# Methodology
Understand how UXit calculates evaluation scores, category results, grades, and trend comparisons.
## Overview
This methodology evaluates an interface against a structured set of usability and design rules. Each rule is scored as pass or fail, producing a conformance metric that shows how closely the interface matches the selected guideline set. Because each result is binary and traceable to a specific guideline, the score is repeatable, comparable over time, and useful for identifying where the design meets requirements or falls short.
## Scoring Model
Each guideline is marked with one of the following:
- Pass = 1
- Fail = 0
- N/A and Unanswered = excluded from scoring
Only Pass and Fail are counted in the score calculation. No weighting or subjective scale is applied.
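As a minimal sketch of this filtering step, the mapping below converts review statuses to score values and drops anything that is not Pass or Fail. The status names (`"pass"`, `"fail"`, `"na"`, `"unanswered"`) are illustrative, not UXit's actual identifiers.

```python
# Illustrative status names; only Pass and Fail carry score values.
SCORE_VALUES = {"pass": 1, "fail": 0}

def countable(results):
    """Keep only Pass/Fail outcomes; N/A and Unanswered are excluded."""
    return [SCORE_VALUES[r] for r in results if r in SCORE_VALUES]

countable(["pass", "fail", "na", "pass", "unanswered"])  # [1, 0, 1]
```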
## Category Scoring
For each category, let:
- Pass = number of guidelines marked Pass
- Fail = number of guidelines marked Fail
Then:

CategoryScore = Pass / (Pass + Fail) × 100

Example: if Pass = 4 and Fail = 1, the category score is 4 / 5 × 100 = 80%.

If Pass + Fail = 0 (every guideline in the category is N/A or unanswered), the category is excluded from aggregation.
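A minimal sketch of the category calculation, returning `None` for a category with no countable guidelines so it can be excluded from aggregation:

```python
def category_score(passes, fails):
    """Category score as a percentage of countable (Pass/Fail) guidelines.

    Returns None when there are no countable guidelines, marking the
    category as excluded from aggregation.
    """
    counted = passes + fails
    if counted == 0:
        return None
    return passes / counted * 100

category_score(4, 1)  # 80.0
```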
## Overall Score

If there are n valid category scores CategoryScore_1, …, CategoryScore_n, the overall score is their unweighted average:

OverallScore = (CategoryScore_1 + … + CategoryScore_n) / n

The result is expressed as a percentage.
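One reading of the aggregation step, sketched below, is an unweighted mean over the valid (non-excluded) category scores; excluded categories are represented as `None` and skipped:

```python
def overall_score(category_scores):
    """Unweighted mean of valid category scores; None entries are excluded."""
    valid = [s for s in category_scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

overall_score([80.0, 50.0, 100.0, None])  # mean of the three valid scores
```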
## Flat Score Variant
Without categories, let:
- TotalPass = total number of Pass across all guidelines
- TotalFail = total number of Fail across all guidelines
If categories are ignored:

FlatScore = TotalPass / (TotalPass + TotalFail) × 100
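The flat variant collapses to a single ratio. A minimal sketch, again returning `None` when nothing is countable:

```python
def flat_score(total_pass, total_fail):
    """Pass rate across all guidelines, ignoring category structure."""
    counted = total_pass + total_fail
    if counted == 0:
        return None
    return total_pass / counted * 100

flat_score(9, 3)  # 75.0
```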
## Worked Example
| Category | Pass | Fail | N/A |
|---|---|---|---|
| A | 4 | 1 | 0 |
| B | 2 | 2 | 1 |
| C | 3 | 0 | 2 |
Using the table above (N/A values are excluded throughout):

- Category A: 4 / (4 + 1) × 100 = 80%
- Category B: 2 / (2 + 2) × 100 = 50%
- Category C: 3 / (3 + 0) × 100 = 100%
- Overall score: (80 + 50 + 100) / 3 ≈ 76.7%
- Flat score: TotalPass = 9, TotalFail = 3, so 9 / 12 × 100 = 75%
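The worked example can be reproduced end to end. This sketch assumes the overall score is the unweighted mean of category scores; the `categories` dict holds (Pass, Fail) counts from the table, with N/A already excluded:

```python
# (pass, fail) counts per category from the table; N/A is excluded.
categories = {"A": (4, 1), "B": (2, 2), "C": (3, 0)}

# Per-category scores: 80.0, 50.0, 100.0
scores = [p / (p + f) * 100 for p, f in categories.values()]

# Overall score: unweighted mean of category scores (~76.67)
overall = sum(scores) / len(scores)

# Flat score: pass rate over all countable guidelines (75.0)
total_pass = sum(p for p, _ in categories.values())  # 9
total_fail = sum(f for _, f in categories.values())  # 3
flat = total_pass / (total_pass + total_fail) * 100
```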
## Grade Thresholds
| Grade | Interval |
|---|---|
| A | 90 to 100% |
| B | 80 to 89.9% |
| C | 70 to 79.9% |
| D | 60 to 69.9% |
| F | 0 to 59.9% |
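A minimal sketch of the threshold lookup, assuming scores falling between listed interval endpoints (e.g. 89.95) take the lower grade:

```python
def grade(score):
    """Map a percentage score (0-100) to a letter grade per the table above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

grade(76.7)  # "C"
```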
## Evaluation Method
A reviewer goes through each guideline and marks whether the current interface satisfies it. Each guideline is treated as a yes-or-no check: it is either satisfied (1) or not (0).
The final score is the average of these binary outcomes across the set. This keeps the model consistent, easy to audit, and traceable to individual failed guidelines. It is intended to measure rule satisfaction and track change over time, not to infer quality from subjective signals that vary between users or sessions.
## What the Score Represents
- Percentage of defined rules that the interface satisfies
- Clear signal of how closely the interface matches standards
- Stable metric for comparing the same system over time
- Directional trend to track improvement or regression
## What the Score Does Not Represent
- User satisfaction or emotional response
- Perceived ease of use or aesthetic appeal
- Efficiency, speed, or task success
- Cognitive demand or user behavior patterns
## Using Results
- Focus on changes over time rather than single scores
- Review failed items to understand specific gaps
- Keep older evaluations to track trends and regressions
- Use the score to guide decisions, not to define success or failure