How are the Predict benchmarks calculated?

Dive into how we created the benchmarks from a dataset of a whopping 4197 images and 19096 AOIs!


The data consisted of 4197 images, each labeled as belonging to one of the subcategories within the three main categories: Advertisements, Websites, or Products.

A total of 19096 Areas of Interest (AOIs) were drawn across the images. Each image had its own set of relevant AOIs, so the number of AOIs varies from image to image; the totals for the whole dataset are shown below:

Each image, together with its AOIs, was run through the Predict platform to produce Cognitive Demand, Focus, Clarity, and Engagement scores, as well as an Attention score for each AOI.


The benchmarks were produced first for the overall dataset (regardless of category or subcategory), and then per category and subcategory.

First, the distribution of the scores was checked to see whether the average would be representative of most scores, as shown in the histograms of the Predict metrics below. For the overall scores, the average (red vertical line) and the median (green vertical line) were plotted to assess whether the data follows a normal distribution.

When the data follows a normal (bell-curve) distribution, most of the data lies near the average value. A good indicator of this is that the average and the median are the same value, or, in our case, close enough. If the average and the median were far apart, the benchmarks could not be guaranteed to represent the average scores for the industry. When they are close, however, the benchmarks represent the range where most of the data points lie, i.e. the industry average.
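The average-versus-median check described above can be sketched in a few lines of Python. The scores and the symmetry threshold here are purely hypothetical stand-ins for the real dataset:

```python
import statistics

# Hypothetical Clarity scores standing in for the real dataset.
scores = [62, 58, 65, 61, 59, 70, 55, 63, 60, 64, 57, 66]

mean = statistics.mean(scores)
median = statistics.median(scores)

# If the mean and median are close relative to the spread, the
# distribution is roughly symmetric and the mean is a reasonable
# summary of a "typical" score. The 0.25 factor is an illustrative
# threshold, not part of the original method.
spread = statistics.stdev(scores)
is_symmetric_enough = abs(mean - median) < 0.25 * spread
```

In practice the article relies on a visual comparison of the two vertical lines in the histograms; a numeric tolerance like the one above is just one way to make the same check automatic.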

The data was then split into the different categories and subcategories. The benchmark for a category or subcategory was produced by calculating the average score together with the spread (standard deviation) of the scores. The benchmark range runs from the average minus one standard deviation (the lower end) to the average plus one standard deviation (the upper end).
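The range calculation above is simple enough to show directly. This is a minimal sketch with made-up scores for one subcategory, not the actual benchmark data:

```python
import statistics

# Hypothetical Engagement scores for one subcategory.
scores = [40, 45, 50, 55, 60, 48, 52, 47, 53, 50]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)

# Benchmark range: one standard deviation around the average.
benchmark_low = mean - sd
benchmark_high = mean + sd
```

With these toy scores the average is 50, so the benchmark range spans roughly one standard deviation on either side of it.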

Because the benchmarks are calculated this way, a score within the benchmark range is in line with the average scores of images in the given category, while a score outside the range is atypical for the industry.
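The interpretation above amounts to a three-way comparison against the range endpoints. A small illustrative helper (the function name and the example range are assumptions, not part of the Predict platform):

```python
def classify(score, benchmark_low, benchmark_high):
    # A score inside [low, high] is typical for the category;
    # anything outside it is atypical, above or below average.
    if score < benchmark_low:
        return "below benchmark"
    if score > benchmark_high:
        return "above benchmark"
    return "within benchmark"

# Hypothetical benchmark range (average 50, standard deviation ~5.5).
result = classify(58, 44.5, 55.5)  # → "above benchmark"
```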