Methodology
Overview
This methodology defines how the system evaluates an interface against a structured set of usability and design rules. Each rule has a clear outcome: it either passes or fails. The resulting score is the percentage of rules met during evaluation and reflects how well the interface aligns with established standards and expectations. Because the approach does not rely on opinion or interpretation, it provides a stable, repeatable measure of conformance. The results identify where the design meets requirements, where it falls short, and how it changes over time.
Scoring Model
Each guideline is marked with one of the following:
- Pass = 1
- Fail = 0
- N/A and Unanswered = excluded from scoring
Only Pass and Fail are counted in the score calculation.
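The exclusion rule above can be sketched in a few lines. This is a minimal illustration, assuming results are recorded as the strings "Pass", "Fail", "N/A", and "Unanswered" (the string schema is an assumption, not a mandated format):

```python
def countable(results):
    """Keep only Pass/Fail results; N/A and Unanswered are excluded."""
    return [r for r in results if r in ("Pass", "Fail")]

def score(results):
    """Fraction of countable guidelines marked Pass, or None if nothing is scorable."""
    kept = countable(results)
    if not kept:
        return None
    return sum(1 for r in kept if r == "Pass") / len(kept)

# Two of the three countable results pass; N/A and Unanswered are dropped.
print(score(["Pass", "Fail", "N/A", "Pass", "Unanswered"]))
```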
Category Scoring
For each category:
Let:
- Pass = number of guidelines marked Pass
- Fail = number of guidelines marked Fail
Then: CategoryScore = Pass / (Pass + Fail)
Example:
Pass = 6
Fail = 2
CategoryScore = 6 / (6 + 2) = 0.75 = 75%
If Pass + Fail = 0, the category is excluded from aggregation.
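The category formula and its exclusion rule can be expressed directly. A minimal sketch (the function name is illustrative):

```python
def category_score(passes, fails):
    """CategoryScore = Pass / (Pass + Fail); None when the category has no scorable items."""
    total = passes + fails
    if total == 0:
        return None  # excluded from aggregation
    return passes / total

print(category_score(6, 2))  # 0.75
```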
Overall Score
If there are k valid category scores:
OverallScore = (Sum of all CategoryScores) ÷ k
The result is expressed as a percentage.
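Aggregation over valid categories can be sketched as follows, with excluded categories represented as None (a representation choice for this example, not a requirement):

```python
def overall_score(category_scores):
    """Average the valid (non-None) category scores; None if no category is valid."""
    valid = [s for s in category_scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

# One category is excluded; the remaining three are averaged.
print(round(overall_score([0.80, 0.50, None, 1.00]) * 100, 2))  # 76.67
```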
Flat Score Variant
Without categories:
Let:
- TotalPass = total number of Pass across all guidelines
- TotalFail = total number of Fail across all guidelines
If categories are ignored:
Score = TotalPass ÷ (TotalPass + TotalFail)
Worked Example
| Category | Pass | Fail | N/A |
|---|---|---|---|
| A | 4 | 1 | 0 |
| B | 2 | 2 | 1 |
| C | 3 | 0 | 2 |
Category scores:
A = 4 / (4 + 1) = 0.80
B = 2 / (2 + 2) = 0.50
C = 3 / (3 + 0) = 1.00
OverallScore = (0.80 + 0.50 + 1.00) / 3 = 0.7667 = 76.67%
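The worked example above can be reproduced end to end, computing both the category-averaged overall score and the flat-score variant from the same counts (the dictionary layout is illustrative):

```python
# Per-category (Pass, Fail) counts from the table; N/A columns are ignored.
counts = {"A": (4, 1), "B": (2, 2), "C": (3, 0)}

# Category scores and their unweighted average.
cat_scores = {c: p / (p + f) for c, (p, f) in counts.items()}
overall = sum(cat_scores.values()) / len(cat_scores)

# Flat variant: pool all Pass/Fail counts before dividing.
total_pass = sum(p for p, _ in counts.values())
total_fail = sum(f for _, f in counts.values())
flat = total_pass / (total_pass + total_fail)

print({c: round(s, 2) for c, s in cat_scores.items()})
print(round(overall, 4))  # 0.7667
print(round(flat, 4))     # 0.75
```

Note that the two variants differ on the same data (76.67% vs. 75%): the category average weights each category equally, while the flat score weights each guideline equally, so categories with more guidelines pull the flat score harder.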
Grade Thresholds
| Grade | Interval |
|---|---|
| A | 90 to 100% |
| B | 80 to 89.9% |
| C | 70 to 79.9% |
| D | 60 to 69.9% |
| F | 0 to 59.9% |
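The threshold table maps directly to a lookup. A sketch, assuming scores between the listed interval endpoints (e.g. 89.95%) round into the band below the next cutoff:

```python
def grade(score_pct):
    """Map a percentage score (0-100) to a letter grade per the thresholds above."""
    if score_pct >= 90:
        return "A"
    if score_pct >= 80:
        return "B"
    if score_pct >= 70:
        return "C"
    if score_pct >= 60:
        return "D"
    return "F"

print(grade(76.67))  # C
```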
Evaluation Method
A reviewer goes through each guideline and marks whether the current interface satisfies it. Each guideline is treated as a yes-or-no check:
gᵢ(flow) ∈ {0, 1}
The final score is the average of these binary outcomes across the set.
Binary Rules
Using binary values keeps results consistent, traceable, and easy to compare. There is no subjectivity, scaling ambiguity, or weighting. Every failure can be traced back to a specific guideline, which enables full traceability and linear aggregation over time. When evaluations use similar guideline sets for similar flows, adherence improvements can be tracked and targeted for refactoring where needed.
What the Score Represents
- Percentage of defined rules that the interface satisfies
- Clear signal of how closely the interface matches standards
- Stable metric for comparing the same system over time
- Directional trend to track improvement or regression
What the Score Does Not Represent
- User satisfaction or emotional response
- Perceived ease of use or aesthetic appeal
- Efficiency, speed, or task success
- Cognitive demand or user behavior patterns
Using Results
- Focus on changes over time rather than single scores
- Review failed items to understand specific gaps
- Keep older evaluations to track trends and regressions
- Use the score to guide decisions, not to define success or failure