# Methodology
Understand how UXit calculates evaluation scores, category results, grades, and trend comparisons.
## Overview
This methodology evaluates an interface against a structured set of usability and design rules. Each rule is scored as pass or fail, producing a conformance metric that shows how closely the interface matches the selected guideline set. Because each result is binary and traceable to a specific guideline, the score is repeatable, comparable over time, and useful for identifying where the design meets requirements or falls short.
## Scoring Model
Each guideline is marked with one of the following:
- Pass = 1
- Fail = 0
- N/A and Unanswered = excluded from scoring
Only Pass and Fail are counted in the score calculation. No weighting or subjective scale is applied.
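As a minimal sketch of this filtering step, the mapping below converts review statuses to score values and drops anything that is not Pass or Fail. The status names (`"pass"`, `"fail"`, `"na"`, `"unanswered"`) are illustrative, not UXit's actual identifiers.

```python
# Illustrative status names; only Pass and Fail carry score values.
SCORE_VALUES = {"pass": 1, "fail": 0}

def countable(results):
    """Keep only Pass/Fail outcomes; N/A and Unanswered are excluded."""
    return [SCORE_VALUES[r] for r in results if r in SCORE_VALUES]

countable(["pass", "fail", "na", "pass", "unanswered"])  # [1, 0, 1]
```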
## Category Scoring
For each category, let:
- Pass = number of guidelines marked Pass
- Fail = number of guidelines marked Fail
Then:

CategoryScore = Pass / (Pass + Fail) × 100

Example: if Pass = 4 and Fail = 1, the category score is 4 / 5 × 100 = 80%.

If Pass + Fail = 0 (every guideline in the category is N/A or unanswered), the category is excluded from aggregation.
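A minimal sketch of the category calculation, returning `None` for a category with no countable guidelines so it can be excluded from aggregation:

```python
def category_score(passes, fails):
    """Category score as a percentage of countable (Pass/Fail) guidelines.

    Returns None when there are no countable guidelines, marking the
    category as excluded from aggregation.
    """
    counted = passes + fails
    if counted == 0:
        return None
    return passes / counted * 100

category_score(4, 1)  # 80.0
```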
## Overall Score

If there are n valid category scores CategoryScore_1, …, CategoryScore_n, the overall score is their unweighted average:

OverallScore = (CategoryScore_1 + … + CategoryScore_n) / n

The result is expressed as a percentage.
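One reading of the aggregation step, sketched below, is an unweighted mean over the valid (non-excluded) category scores; excluded categories are represented as `None` and skipped:

```python
def overall_score(category_scores):
    """Unweighted mean of valid category scores; None entries are excluded."""
    valid = [s for s in category_scores if s is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)

overall_score([80.0, 50.0, 100.0, None])  # mean of the three valid scores
```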
## Flat Score Variant
Without categories, let:
- TotalPass = total number of Pass across all guidelines
- TotalFail = total number of Fail across all guidelines
If categories are ignored:

FlatScore = TotalPass / (TotalPass + TotalFail) × 100
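The flat variant collapses to a single ratio. A minimal sketch, again returning `None` when nothing is countable:

```python
def flat_score(total_pass, total_fail):
    """Pass rate across all guidelines, ignoring category structure."""
    counted = total_pass + total_fail
    if counted == 0:
        return None
    return total_pass / counted * 100

flat_score(9, 3)  # 75.0
```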
## Worked Example
| Category | Pass | Fail | N/A |
|---|---|---|---|
| A | 4 | 1 | 0 |
| B | 2 | 2 | 1 |
| C | 3 | 0 | 2 |
Using the table above (N/A values are excluded throughout):

- Category A: 4 / (4 + 1) × 100 = 80%
- Category B: 2 / (2 + 2) × 100 = 50%
- Category C: 3 / (3 + 0) × 100 = 100%
- Overall score: (80 + 50 + 100) / 3 ≈ 76.7%
- Flat score: TotalPass = 9, TotalFail = 3, so 9 / 12 × 100 = 75%
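The worked example can be reproduced end to end. This sketch assumes the overall score is the unweighted mean of category scores; the `categories` dict holds (Pass, Fail) counts from the table, with N/A already excluded:

```python
# (pass, fail) counts per category from the table; N/A is excluded.
categories = {"A": (4, 1), "B": (2, 2), "C": (3, 0)}

# Per-category scores: 80.0, 50.0, 100.0
scores = [p / (p + f) * 100 for p, f in categories.values()]

# Overall score: unweighted mean of category scores (~76.67)
overall = sum(scores) / len(scores)

# Flat score: pass rate over all countable guidelines (75.0)
total_pass = sum(p for p, _ in categories.values())  # 9
total_fail = sum(f for _, f in categories.values())  # 3
flat = total_pass / (total_pass + total_fail) * 100
```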
## Grade Thresholds
| Grade | Interval |
|---|---|
| A | 90 to 100% |
| B | 80 to 89.9% |
| C | 70 to 79.9% |
| D | 60 to 69.9% |
| F | 0 to 59.9% |
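A minimal sketch of the threshold lookup, assuming scores falling between listed interval endpoints (e.g. 89.95) take the lower grade:

```python
def grade(score):
    """Map a percentage score (0-100) to a letter grade per the table above."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

grade(76.7)  # "C"
```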
## Evaluation Method
A reviewer goes through each guideline and marks whether the current interface satisfies it. Each guideline is treated as a yes-or-no check: it is either satisfied (1) or not (0).
The final score is the average of these binary outcomes across the set. This keeps the model consistent, easy to audit, and traceable to individual failed guidelines. It is intended to measure rule satisfaction and track change over time, not to infer quality from subjective signals that vary between users or sessions.
## What the Score Represents
- Percentage of defined rules that the interface satisfies
- Clear signal of how closely the interface matches standards
- Stable metric for comparing the same system over time
- Directional trend to track improvement or regression
## What the Score Does Not Represent
- User satisfaction or emotional response
- Perceived ease of use or aesthetic appeal
- Efficiency, speed, or task success
- Cognitive demand or user behavior patterns
## Using Results
- Focus on changes over time rather than single scores
- Review failed items to understand specific gaps
- Keep older evaluations to track trends and regressions
- Use the score to guide decisions, not to define success or failure