How are the Predict benchmarks calculated?

Dive into how we created the benchmarks from a dataset of a whopping 4197 images and 19096 AOIs!


The data consisted of 4197 images, each labeled as belonging to one of the subcategories within the three main categories: Advertisement, Websites, or Products.

A total of 19096 Areas of Interest (AOIs) were drawn across the images. Because the relevant AOIs differ from image to image, the number of AOIs per image varies across the dataset, as seen below:

Each of these images, together with its AOIs, was run through the Predict platform to produce scores for Cognitive Demand, Focus, Clarity, and Engagement, plus an Attention score for each AOI.


The benchmarks were produced first for the overall dataset (regardless of category or subcategory), and then separately for each category and subcategory after splitting the data accordingly.
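As a rough sketch of this grouping step, here is how the per-category split could be done with pandas. The column names and score values below are hypothetical; the actual dataset schema is not published. The 15.85% / 84.15% percentiles used here correspond to the 68.3% coverage described further down.

```python
import numpy as np
import pandas as pd

# Hypothetical schema and toy values; the real dataset columns are not published.
df = pd.DataFrame({
    "category": ["Websites", "Websites", "Websites", "Products", "Products"],
    "clarity":  [55.0, 65.0, 48.0, 40.0, 60.0],
})

# One benchmark for the overall dataset, then one per category.
groups = {"Overall": df["clarity"]}
for category, group in df.groupby("category"):
    groups[category] = group["clarity"]

# Lower/upper bounds trimming 15.85% of probability mass from each tail.
benchmarks = {
    name: tuple(np.percentile(scores, [15.85, 84.15]))
    for name, scores in groups.items()
}
```

In practice the same split would be repeated per subcategory and per score type (Clarity, Focus, and so on), producing one benchmark range for each combination.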

The overall distributions of each benchmark category and their corresponding subcategories are not guaranteed to follow a Gaussian, or normal (bell-curve), distribution. We therefore employ a distribution-agnostic statistical procedure to calculate the intervals that constitute the lower and upper ends of the benchmark ranges.


In simple terms, we want to ensure that 68.3% of the distribution, corresponding to the 1-sigma probability volume, lies inside the proposed range, irrespective of the shape of the distribution. To compute this range, we integrate inward from the two extreme ends of the distribution until we have accumulated 15.85% of the probability mass on each side, leaving 100% - 2 × 15.85% = 68.3% in between. The rationale behind this procedure is illustrated in the two schematics above for a Gaussian (normal) and a skewed distribution. The resulting boundaries, which enclose 68.3% of the distribution, constitute the lower and upper ends of the benchmark range. The median value of the distribution is also indicated via a dashed line on the platform.
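For a finite sample of scores, integrating inward from the tails until 15.85% of the mass is trimmed on each side is equivalent to taking the 15.85th and 84.15th percentiles. A minimal sketch of this calculation (the function name and default coverage are illustrative, not the platform's actual implementation):

```python
import numpy as np

def benchmark_range(scores, coverage=0.683):
    """Return (lower, upper, median) such that `coverage` of the
    empirical distribution of `scores` lies between lower and upper,
    with equal probability mass trimmed from each tail."""
    tail = (1.0 - coverage) / 2.0 * 100  # percent trimmed per side, ~15.85
    lower, upper = np.percentile(scores, [tail, 100 - tail])
    return float(lower), float(upper), float(np.median(scores))
```

Because percentiles are used rather than mean ± standard deviation, the bounds remain meaningful even when the distribution is skewed: a long upper tail simply pushes the upper bound out without affecting the lower one.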

By virtue of the above quantitative procedure, you can conclude that a score lying within the benchmark range fits the general standard, i.e. it falls among the central 68.3% of images within the given category, while a score outside the range is atypical for the industry.