Methodology
Overview
This methodology defines how the system evaluates an interface against a structured set of usability and design rules. Each rule has a clear outcome: it either passes or fails. The resulting score is the percentage of rules met during evaluation and reflects how well the interface aligns with established standards and expectations. Because the approach does not rely on opinion or interpretation, it provides a stable, repeatable measure of conformance. The results identify where the design meets requirements, where it falls short, and how it changes over time.
Scoring Model
Each guideline is marked with one of the following:
- Pass = 1
- Fail = 0
- N/A and Unanswered = excluded from scoring
Only Pass and Fail are counted in the score calculation.
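The exclusion rule above can be sketched in a few lines. This is a minimal illustration, assuming results are recorded as the strings "Pass", "Fail", "N/A", and "Unanswered" (the string schema is an assumption, not a mandated format):

```python
def countable(results):
    """Keep only Pass/Fail results; N/A and Unanswered are excluded."""
    return [r for r in results if r in ("Pass", "Fail")]

def score(results):
    """Fraction of countable guidelines marked Pass, or None if nothing is scorable."""
    kept = countable(results)
    if not kept:
        return None
    return sum(1 for r in kept if r == "Pass") / len(kept)

# Two of the three countable results pass; N/A and Unanswered are dropped.
print(score(["Pass", "Fail", "N/A", "Pass", "Unanswered"]))
```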
Category Scoring
For each category:
Let:
- Pass = number of guidelines marked Pass
- Fail = number of guidelines marked Fail
Then: CategoryScore = Pass / (Pass + Fail)
Example:
Pass = 6
Fail = 2
CategoryScore = 6 / (6 + 2) = 0.75 = 75%
If Pass + Fail = 0, the category is excluded from aggregation.
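The category formula and its exclusion rule can be expressed directly. A minimal sketch (the function name is illustrative):

```python
def category_score(passes, fails):
    """CategoryScore = Pass / (Pass + Fail); None when the category has no scorable items."""
    total = passes + fails
    if total == 0:
        return None  # excluded from aggregation
    return passes / total

print(category_score(6, 2))  # 0.75
```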
Overall Score
If there are k valid category scores:
OverallScore = (Sum of all CategoryScores) ÷ k
The result is expressed as a percentage.
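Aggregation over valid categories can be sketched as follows, with excluded categories represented as None (a representation choice for this example, not a requirement):

```python
def overall_score(category_scores):
    """Average the valid (non-None) category scores; None if no category is valid."""
    valid = [s for s in category_scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

# One category is excluded; the remaining three are averaged.
print(round(overall_score([0.80, 0.50, None, 1.00]) * 100, 2))  # 76.67
```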
Flat Score Variant
Without categories:
Let:
- TotalPass = total number of Pass across all guidelines
- TotalFail = total number of Fail across all guidelines
If categories are ignored:
Score = TotalPass ÷ (TotalPass + TotalFail)
Worked Example
| Category | Pass | Fail | N/A |
|---|---|---|---|
| A | 4 | 1 | 0 |
| B | 2 | 2 | 1 |
| C | 3 | 0 | 2 |
Category scores:
A = 4 / (4 + 1) = 0.80
B = 2 / (2 + 2) = 0.50
C = 3 / (3 + 0) = 1.00
OverallScore = (0.80 + 0.50 + 1.00) / 3 = 0.7667 = 76.67%
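The worked example above can be reproduced end to end, computing both the category-averaged overall score and the flat-score variant from the same counts (the dictionary layout is illustrative):

```python
# Per-category (Pass, Fail) counts from the table; N/A columns are ignored.
counts = {"A": (4, 1), "B": (2, 2), "C": (3, 0)}

# Category scores and their unweighted average.
cat_scores = {c: p / (p + f) for c, (p, f) in counts.items()}
overall = sum(cat_scores.values()) / len(cat_scores)

# Flat variant: pool all Pass/Fail counts before dividing.
total_pass = sum(p for p, _ in counts.values())
total_fail = sum(f for _, f in counts.values())
flat = total_pass / (total_pass + total_fail)

print({c: round(s, 2) for c, s in cat_scores.items()})
print(round(overall, 4))  # 0.7667
print(round(flat, 4))     # 0.75
```

Note that the two variants differ on the same data (76.67% vs. 75%): the category average weights each category equally, while the flat score weights each guideline equally, so categories with more guidelines pull the flat score harder.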
Grade Thresholds
| Grade | Interval |
|---|---|
| A | 90 to 100% |
| B | 80 to 89.9% |
| C | 70 to 79.9% |
| D | 60 to 69.9% |
| F | 0 to 59.9% |
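The threshold table maps directly to a lookup. A sketch, assuming scores between the listed interval endpoints (e.g. 89.95%) round into the band below the next cutoff:

```python
def grade(score_pct):
    """Map a percentage score (0-100) to a letter grade per the thresholds above."""
    if score_pct >= 90:
        return "A"
    if score_pct >= 80:
        return "B"
    if score_pct >= 70:
        return "C"
    if score_pct >= 60:
        return "D"
    return "F"

print(grade(76.67))  # C
```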
Evaluation Method
A reviewer goes through each guideline and marks whether the current interface satisfies it. Each guideline is treated as a yes-or-no check:
gᵢ(flow) ∈ {0, 1}
The final score is the average of these binary outcomes across the set.
Binary Rules
Using binary values keeps results consistent, traceable, and easy to compare. There is no subjectivity, scaling ambiguity, or weighting. Every failure can be traced back to a specific guideline, which enables full traceability and linear aggregation over time. When evaluations use similar guideline sets for similar flows, adherence improvements can be tracked and targeted for refactoring where needed.
What the Score Represents
- Percentage of defined rules that the interface satisfies
- Clear signal of how closely the interface matches standards
- Stable metric for comparing the same system over time
- Directional trend to track improvement or regression
What the Score Does Not Represent
- User satisfaction or emotional response
- Perceived ease of use or aesthetic appeal
- Efficiency, speed, or task success
- Cognitive demand or user behavior patterns
Using Results
- Focus on changes over time rather than single scores
- Review failed items to understand specific gaps
- Keep older evaluations to track trends and regressions
- Use the score to guide decisions, not to define success or failure