Kilotest: Accessibility Testing Strategies
Estimated reading time: about 60 minutes.
About this tutorial
Notice: This is a rough draft created in partnership with Claude Sonnet 4.6 Thinking. Revisions are in progress.
This tutorial is for web developers with limited experience in web accessibility testing. It takes about 60 minutes to complete.
You will learn about three strategies for testing web accessibility:
- Human testing — a person directly examines and uses the page
- Rule-engine testing — software applies codified rules to the page
- AI-agent testing — an AI agent analyzes the page
A practicum applies all three strategies to a real example, revealing what each strategy finds, what each misses, and why.
Links to additional information are included throughout. Completing the tutorial does not require following any of those links.
Offline use: This tutorial is self-contained. You may complete it without an Internet connection, though external links will not be reachable.
Note on AI results: The AI testing results in Part 2 were captured in May 2026 from a specific model. As AI models and the resources available to them improve, results may change — potentially for better or worse. The captured results are preserved here as a fixed point of comparison.
Part 1: Orientation
About 30 minutes.
Why accessibility testing matters
About 5 minutes.
Accessibility is not a separate concern added to a finished product. It is an aspect of code quality, legal compliance, and basic usability — present or absent in every page you ship.
Code quality
Inaccessible code is often incorrect code. HTML specifications define semantic elements and attributes for good reasons, and violations frequently indicate structural errors that affect all users, not only users with disabilities. A missing autocomplete attribute, for example, is not a cosmetic gap: it leaves the browser without a declared purpose for a form field that users rely on.
Accessible code also tends to be more maintainable: clear structure, meaningful labels, and correct role assignments make code easier to understand and modify.
Legal compliance
In many jurisdictions, web accessibility is a legal requirement. The most widely referenced technical standard is the Web Content Accessibility Guidelines (WCAG), published by the W3C. Versions of WCAG (2.0, 2.1, or 2.2, depending on the law) are referenced by, or used in applying, laws in the United States (Section 508, ADA), the European Union (European Accessibility Act), the United Kingdom (Equality Act), and many other countries.
Non-compliance exposes organisations to legal action, regulatory penalties, and reputational harm. Lawsuits and settlements involving web accessibility are documented each year in all of these jurisdictions.
Usability
Approximately 15–20% of people worldwide have a disability that can affect their use of the web. These include permanent conditions (blindness, deafness, motor impairments, cognitive disabilities) as well as temporary situations (a broken arm, recovering from eye surgery) and situational constraints (a noisy environment, a phone in bright sunlight, holding an infant).
Inaccessible code excludes or burdens these users. Features that are technically accessible also tend to benefit users without disabilities: clear labels reduce errors for everyone, logical keyboard flow speeds navigation, and high contrast aids readability in suboptimal lighting.
The three testing strategies
About 8 minutes.
Human testing
A person directly examines the page, using the browser and auxiliary tools to find accessibility issues. This is sometimes called "manual testing", though the term can be misleading: modern human testing uses several tools; it is the human judgment that distinguishes this strategy.
Human testing typically involves three techniques:
- DOM inspection. The tester uses browser developer tools to examine the HTML structure, attributes, and the accessibility tree. The accessibility tree is the browser's representation of the page as assistive technologies see it — role, name, state, and value for each element.
- Keyboard navigation. The tester uses only the keyboard (Tab, Shift+Tab, Enter, Space, arrow keys) to navigate and operate the page. This reveals whether all interactive elements are reachable and operable without a mouse.
- Screen reader testing. The tester activates a screen reader — software that vocalises page content — and navigates the page by ear. Common screen readers include NVDA and JAWS on Windows, VoiceOver on macOS and iOS, and TalkBack on Android.
Effective human testing requires substantial knowledge: WCAG success criteria, ARIA authoring practices, HTML semantics, and proficiency with at least one screen reader. It also takes time — a thorough examination of a single page may take 30 minutes to several hours.
Rule-engine testing
Software applies a set of codified rules to a page and reports which rules are violated. This is sometimes called "automated testing", though that label is also imprecise: what is automated is the rule application, not the judgment about what rules should exist.
Rule engines come in several forms:
- Browser extensions such as the axe DevTools extension and the WAVE Evaluation Tool run directly in the browser and report violations interactively.
- Installed engines such as Pa11y and Testaro run from the command line and can be integrated into development workflows.
- API services accept page URLs and return analysis results over the network.
- Ensemble services such as Kilotest run multiple rule engines against a page and consolidate the results. Because different engines encode different rules, an ensemble typically finds more issues than any single engine.
Rule-engine testing is fast — a full scan typically takes 1–5 minutes. It requires little accessibility knowledge from the user, since the rules encode the expertise. It can be run repeatedly, integrated into CI/CD pipelines, and applied to large numbers of pages.
AI-agent testing
An AI agent, as used in this tutorial, is a language model — software trained on large amounts of text to understand and generate language — configured with instructions to perform a specific task. The underlying model (such as Claude, GPT-4, or Gemini) is the system you access; the agent is the model combined with its instructions and the content you provide. For accessibility testing, the agent is given page content and asked to identify violations. This strategy is the newest of the three and is evolving rapidly.
The workflow involves three choices:
- Selecting a model. Different models vary in their knowledge of accessibility standards, their ability to reason about HTML structure, and their tendency to hallucinate. As of 2026, capable models include Claude (Anthropic), GPT-4 (OpenAI), and Gemini (Google).
- Designing instructions. The instructions, often called a "prompt", specify what to test, what standards to apply, and how to report findings. Prompt quality significantly affects the quality of results.
- Providing page content. The AI agent must receive the relevant page content. This may be the raw HTML source, the rendered DOM, screenshots, or some combination. What you provide determines what the agent can find.
AI-agent testing can reason about issues that resist mechanical rules — for example, whether image alternative text is meaningful in context, or whether a form label is genuinely helpful. Its output is available within minutes and requires no specialist tools beyond access to an AI agent. However, AI agents can produce false findings and can miss real ones, and their results may change as the underlying models are updated.
Choosing among the strategies
About 12 minutes.
The three strategies are not alternatives — they are complements. A mature accessibility testing practice uses all three. Understanding their differences helps you decide how to allocate effort.
Time to results
- Human testing
- Slowest. A thorough examination of a single page takes 30 minutes to several hours, depending on the page's complexity and the scope of testing.
- Rule-engine testing
- Fastest. A full-page scan typically completes in 1–5 minutes. An ensemble service such as Kilotest may take somewhat longer.
- AI-agent testing
- Moderate. Depending on the scope of instructions, the amount of page content provided, and the AI agent's processing speed, 2–10 minutes is typical for focused testing.
Financial cost
- Human testing
- The cost of a skilled tester's time. Specialist accessibility consultants are expensive. Internal developers trained in accessibility are a recurring investment.
- Rule-engine testing
- Many rule engines are free and open-source. Commercial tools and API services typically charge per page or per month. Kilotest is currently free. Costs are generally low relative to the number of tests run.
- AI-agent testing
- Most AI services charge per token (unit of text) processed. As of 2026, asking an AI agent to check a single page against one WCAG success criterion typically costs between $0.01 and $0.10 depending on the model used. Prices are falling over time.
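The per-token pricing described above reduces to simple arithmetic. The sketch below assumes a service that prices input and output tokens separately; the function name `estimate_cost` and every number passed to it are hypothetical, chosen only to illustrate the calculation, and are not taken from any provider's price list.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated charge in dollars for one request."""
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# All four numbers below are hypothetical, chosen only to show the arithmetic.
cost = estimate_cost(
    input_tokens=20_000,           # page markup plus instructions
    output_tokens=500,             # the agent's written findings
    input_price_per_million=3.0,   # dollars per million input tokens
    output_price_per_million=15.0, # dollars per million output tokens
)
print(f"Estimated cost: ${cost:.4f}")
```

With these assumed figures the estimate falls inside the range quoted above. Larger pages raise the input-token count, which is usually the dominant term when the whole rendered DOM is supplied.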
What can each strategy find?
- Human testing
- The broadest potential coverage. A skilled human can find any issue that is observable through the browser, developer tools, or a screen reader. Human testing is the only strategy that can reliably assess whether content is usable — not merely technically compliant — for users with disabilities. It is also the only strategy that can catch issues that only appear during interaction (filling out a multi-step form, for instance) or that depend on context (whether an image description is accurate given the surrounding text).
- Rule-engine testing
- Limited to issues for which explicit rules have been written. Rules can be precise and comprehensive for clearly defined requirements (for example: every <img> must have an alt attribute). They cannot assess subjective quality (for example: whether that alt text is actually helpful). Some issues are detectable in principle but not yet covered by available engines.
- AI-agent testing
- Broader than rule engines for issues requiring reasoning, narrower than humans for issues requiring interaction or lived experience. An AI agent can assess whether image alternative text is plausibly meaningful, whether instructions are clear, or whether form labels are adequate. When it is given only page markup, as in this tutorial's practicum, it cannot click, scroll, or type into the page; it does not receive a visual or auditory rendering; and it does not receive the browser's accessibility tree (the computed structure of roles, names, states, and values that assistive technologies consume).
False positives (incorrect findings)
- Human testing
- Low, in the hands of a knowledgeable tester. A human who understands the standards can distinguish genuine violations from compliant edge cases. Inexperienced testers produce more false positives.
- Rule-engine testing
- Moderate and tool-dependent. Rules that cannot fully determine compliance from the code alone (for example: whether an image needs alternative text) may produce incorrect results. Different engines disagree about the same page, and some findings require human review to confirm.
- AI-agent testing
- Variable and model-dependent. Current models sometimes misidentify compliant patterns as violations, particularly when their knowledge of applicable standards is incomplete or when the HTML structure is complex. Critical review of AI output is essential.
False negatives (missed defects)
- Human testing
- Low for experienced testers working systematically, but not zero. Testers can overlook attributes on elements they do not specifically inspect. Issues that only manifest in edge cases or under specific conditions may be missed if those cases are not tested. Consistency across large page sets can be difficult to maintain.
- Rule-engine testing
- Any issue not covered by a rule will be missed. No current rule engine covers all WCAG success criteria. Engines also differ in which rules they implement: a single engine may miss issues that another engine catches. Ensemble testing reduces the false negative rate relative to any single engine.
- AI-agent testing
- Depends heavily on what content is provided to the AI agent. If the agent receives only the raw HTML source of a page whose forms are rendered by JavaScript, it will miss all form-related issues entirely. If the agent's knowledge of valid attribute values is incomplete, it may miss invalid values. AI-agent false negatives are less predictable than rule-engine false negatives, because they vary with the model version, the prompt, and the content provided.
Expertise required
- Human testing
- High. The tester needs working knowledge of WCAG, ARIA, and HTML semantics, plus proficiency with a screen reader.
- Rule-engine testing
- Low to moderate. Running a tool requires little accessibility knowledge. Interpreting results and determining whether flagged items are genuine violations requires more. Reviewing results across a large ensemble requires the ability to reconcile conflicting findings.
- AI-agent testing
- Low to moderate for basic use; moderate for reliable use. Formulating precise and effective instructions, understanding the limits of AI output, and critically reviewing findings all benefit from accessibility knowledge.
Best suited for
- Human testing
- Final verification before release; issues requiring judgment about context, meaning, or usability; interaction flows; screen reader experience; and any issue that requires lived experience with assistive technology.
- Rule-engine testing
- First-pass scanning across many pages; continuous integration checks; measuring progress over time; identifying well-defined structural violations efficiently; and providing a baseline before human testing.
- AI-agent testing
- Focused analysis of specific issue types when sufficient page content can be provided; assessment of issues that are too nuanced for mechanical rules but too common for full human review; and generating explanations and remediation guidance.
Introducing the case study
About 5 minutes.
The practicum applies all three strategies to a single type of accessibility issue on a specific web page. The issue type is: incorrect or absent autocomplete attribute on form inputs.
Why the autocomplete attribute matters
The autocomplete attribute tells the browser what kind of personal information an input field collects. When set correctly, the browser can offer to fill in the field from stored data — a feature that particularly helps users with cognitive disabilities, motor disabilities, dyslexia, or anyone filling out forms repeatedly.
WCAG 2.1 and 2.2 Success Criterion 1.3.5 Identify Input Purpose (Level AA) requires that the purpose of inputs collecting personal information can be programmatically determined. The autocomplete attribute is the primary mechanism for satisfying this criterion in HTML.
Valid attribute values
On <input>, <select>, and <textarea> elements, the autocomplete attribute must be on, off, or a value built from the defined list of autofill tokens. Common tokens include:
- name, given-name, family-name
- email
- tel
- street-address, postal-code, country
- username, new-password, current-password
On a <form> element, the autocomplete attribute is also valid — but with an important restriction: the only valid values for a <form> element are on and off. The richer set of purpose tokens is not valid on <form> elements; those tokens belong on the individual input elements within the form.
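The element-dependent rule above can be expressed as a small lookup. This is a minimal sketch, not a complete validator: `is_valid_autocomplete` is a hypothetical helper, the token set is only the subset listed in this tutorial, and multi-token values permitted by the HTML Standard are deliberately not handled.

```python
# Illustrative subset of autofill field tokens; the HTML Standard lists many more.
FIELD_TOKENS = {
    "name", "given-name", "family-name", "email", "tel",
    "street-address", "postal-code", "country",
    "username", "new-password", "current-password",
}
ON_OFF = {"on", "off"}
FORM_CONTROLS = {"input", "select", "textarea"}


def is_valid_autocomplete(tag: str, value: str) -> bool:
    """Return True if value is acceptable for autocomplete on the given tag.

    Simplification: only single-token values are handled, so multi-token
    values such as "shipping postal-code" are rejected by this sketch.
    """
    token = value.strip().lower()  # keywords match case-insensitively
    if tag == "form":
        return token in ON_OFF
    if tag in FORM_CONTROLS:
        return token in ON_OFF or token in FIELD_TOKENS
    return False  # the attribute is not defined on other elements


print(is_valid_autocomplete("input", "new-password"))
print(is_valid_autocomplete("form", "new-password"))
print(is_valid_autocomplete("form", "off"))
```

The same value, new-password, is accepted on an input and rejected on a form, which is the distinction the practicum turns on.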
Two types of violation
This practicum focuses on two specific violations of WCAG 1.3.5:
- Missing attribute: An input collecting personal information has no autocomplete attribute at all.
- Invalid attribute value: An element has an autocomplete attribute, but its value is not among those recognised by the HTML specification for that element type.
Both violations prevent browsers and assistive technologies from identifying the purpose of the affected fields.
Part 2: Practicum
About 30 minutes.
This practicum examines a single real-world web page for autocomplete attribute issues. The page is a public website homepage captured in May 2026. The name and URL of the site are withheld to avoid public-relations concerns unrelated to this tutorial. The page facts below are a snapshot; the live page may differ.
The example page
About 3 minutes.
The example page is the homepage of a nonprofit organisation in the career-guidance sector. It is a publicly accessible, JavaScript-rendered WordPress site.
Relevant page content
The page contains a newsletter sign-up form. This form is not present in the raw HTML that the server delivers. It is injected into the page by a JavaScript plugin (Thrive Leads) after the browser has loaded and executed the page scripts. The form contains one visible user-facing input field:
<input type="email" data-field="email" data-required="1"
data-validation="email" name="email"
placeholder="> Enter your email">
The form element wrapping this input is:
<form action="#" method="post" novalidate="novalidate"
autocomplete="new-password">
Both elements were captured from the rendered DOM on 2026-05-30 using a headless browser. They are the only non-hidden, non-search, non-consent form inputs on the page.
The two defects
Two autocomplete attribute issues are present:
- Issue A: The <input type="email"> element has no autocomplete attribute. Per WCAG 1.3.5, an input that collects an email address requires autocomplete="email".
- Issue B: The <form> element has autocomplete="new-password". On a <form> element, the only valid values are on and off. new-password is a valid token for <input> elements, not for <form> elements. This value appears to be a misapplication of a technique sometimes used to suppress browser autofill on individual input fields.
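Both defects can be detected mechanically once the rendered markup is available. The following is a minimal sketch using Python's standard html.parser module; `AutocompleteScanner`, `scan`, and the `PERSONAL_TYPES` heuristic for deciding which inputs collect personal information are assumptions made for illustration, and the token set is only a subset of the HTML Standard's list.

```python
from html.parser import HTMLParser

# Illustrative subset of autofill tokens (the HTML Standard defines many more).
FIELD_TOKENS = {"name", "given-name", "family-name", "email", "tel",
                "street-address", "postal-code", "country",
                "username", "new-password", "current-password"}
FORM_VALUES = {"on", "off"}
# Hypothetical heuristic: input types assumed to collect personal information.
PERSONAL_TYPES = {"email", "tel"}


class AutocompleteScanner(HTMLParser):
    """Collects (issue, tag, detail) tuples for the two violation types."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        value = attributes.get("autocomplete")
        if tag == "form":
            # On a form, only "on" and "off" are accepted.
            if value is not None and value.strip().lower() not in FORM_VALUES:
                self.findings.append(("invalid-value", tag, value))
        elif tag == "input":
            input_type = (attributes.get("type") or "text").lower()
            if value is None:
                if input_type in PERSONAL_TYPES:
                    self.findings.append(("missing-attribute", tag, input_type))
            elif value.strip().lower() not in FIELD_TOKENS | FORM_VALUES:
                self.findings.append(("invalid-value", tag, value))


def scan(markup):
    """Return the list of findings for a string of HTML markup."""
    scanner = AutocompleteScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.findings


# The two elements captured from the rendered DOM of the example page.
EXAMPLE = '''
<form action="#" method="post" novalidate="novalidate"
      autocomplete="new-password">
  <input type="email" data-field="email" data-required="1"
         data-validation="email" name="email"
         placeholder="> Enter your email">
</form>
'''

for finding in scan(EXAMPLE):
    print(finding)
```

Run against the captured markup, the sketch reports one invalid value on the form (Issue B) and one missing attribute on the email input (Issue A). A real rule engine needs a far better test for "collects personal information" than the input type alone.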
Human testing
About 7 minutes.
A human tester would approach this page using browser developer tools, keyboard navigation, and (optionally) a screen reader.
What a tester would do
- Open the page in a browser and let it fully load.
- Identify interactive elements — in this case, the email input in the newsletter form.
- Open the browser's developer tools (typically F12 or right-click → Inspect).
- Select the email input in the Elements panel and examine its attributes.
- Note that the autocomplete attribute is absent.
- Select the enclosing <form> element and examine its attributes.
- Note that autocomplete="new-password" is set on the form.
- Recall (or look up) that new-password is not a valid value for a <form> element.
Findings
Issue A found? Yes — a tester who inspects the email input will find the missing attribute.
Issue B found? Uncertain. A tester who inspects the form element will see autocomplete="new-password". Whether they flag this as invalid depends on whether they know that this token is not valid on <form> elements. Many experienced developers have seen new-password used as an autofill-suppression technique on individual inputs and might not question its use on a form.
Limitations observed
The form is JavaScript-rendered and appears only after the page fully loads. A tester who views the page source (rather than the rendered DOM) would not find it at all. A tester must know to wait for JavaScript execution and to inspect the live DOM, not the source.
Keyboard navigation and screen reader testing would reveal an additional issue: the email input has no label element, so it is named only by its placeholder text, which is fragile. That is a separate issue, not in scope here. Those techniques would not directly reveal the autocomplete attribute issues, which are visible only in the DOM.
Verdict on autocomplete issues
- Issue A (missing attribute on input): Found, if the tester inspects the input element.
- Issue B (invalid value on form): Possibly missed, unless the tester knows the valid values for autocomplete on <form> elements.
Rule-engine testing
About 7 minutes.
Several rule engines were applied to this page. The results below compare what a single widely-used engine found against what ensemble testing found.
Single-engine result: axe-core
axe-core is one of the most widely used open-source accessibility rule engines. It is the engine underlying the axe DevTools browser extension, the popular jest-axe testing library, and many CI/CD integrations. It was run against the fully rendered DOM of the example page in May 2026.
axe-core result on autocomplete issues: 0 violations found.
axe-core's autocomplete-valid rule checks that autocomplete attribute values, when present, are from the valid list. It does not flag the absence of an autocomplete attribute on inputs collecting personal information. Separately, the autocomplete="new-password" value on the <form> element was not flagged.
Both autocomplete issues were false negatives for axe-core.
Ensemble result: Kilotest
Kilotest applied 10 tools to the same page in May 2026. Nine of those tools reported at least one issue on the page. Two tools specifically reported autocomplete issues:
- Testaro reported "autocomplete missing" (Issue A: the email input lacks an autocomplete attribute). WCAG 1.3.5.
- HTML CodeSniffer reported "autocomplete invalid" (Issue B: the <form> element has autocomplete="new-password", which is not valid for that element type). WCAG 1.3.5.
Ensemble result: both autocomplete issues found — by different tools within the ensemble.
The full Kilotest report for this page is available at kilotest.com (report 260530T0032/n2u). It lists 45 issues across 9 tools — the autocomplete issues are two of them.
Why the single engine missed both issues
axe-core's autocomplete-valid rule was designed to catch invalid values when an autocomplete attribute is present. It was not designed to enforce WCAG 1.3.5's requirement that inputs collecting personal information must have an autocomplete attribute. For Issue B, axe-core also did not flag the invalid value on the <form> element.
This is not a criticism of axe-core specifically. Every rule engine has coverage gaps. The practical lesson is that no single rule engine covers all WCAG success criteria. Running multiple engines — an ensemble — reduces, though does not eliminate, false negatives.
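The idea of consolidating findings across engines can be sketched as a union keyed by issue. This is an illustration only, not Kilotest's implementation: `consolidate` and `missed_by` are hypothetical helpers, only three of the ten tools are shown, and the labels "A" and "B" are this tutorial's names for the issues rather than identifiers emitted by the tools.

```python
from collections import defaultdict

# Findings per tool, taken from the case-study results. The issue labels
# "A" and "B" are this tutorial's names, not identifiers used by the tools.
REPORTS = {
    "axe-core": [],
    "Testaro": ["A"],
    "HTML CodeSniffer": ["B"],
}


def consolidate(reports):
    """Map each issue to the sorted list of tools that reported it."""
    by_issue = defaultdict(set)
    for tool, issues in reports.items():
        for issue in issues:
            by_issue[issue].add(tool)
    return {issue: sorted(tools) for issue, tools in sorted(by_issue.items())}


def missed_by(reports, tool, known_issues):
    """Return the known issues that one tool alone did not report."""
    return sorted(set(known_issues) - set(reports[tool]))


print(consolidate(REPORTS))
print(missed_by(REPORTS, "axe-core", ["A", "B"]))
```

The consolidated view covers both issues even though no single tool in this sample reported both, which is the practical case for an ensemble.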
Verdict on autocomplete issues
- Issue A (missing attribute on input): Missed by axe-core; found by Testaro in the Kilotest ensemble.
- Issue B (invalid value on form): Missed by axe-core; found by HTML CodeSniffer in the Kilotest ensemble.
AI-agent testing
About 7 minutes.
An AI agent was asked to find autocomplete attribute issues on the example page. Two scenarios were tested, illustrating how the content given to the agent affects what it can find.
What the AI agent was given and asked
In both scenarios, the instructions were the same:
Instruction to the AI agent:
The following content is from a web page. Identify all violations of WCAG 2.2 Success Criterion 1.3.5 (Identify Input Purpose) related to the autocomplete attribute. For each violation, state the element, the problem, and the correction required. Be precise about which values are valid for each element type.
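Because the agent is the model combined with its instructions and the supplied content, assembling the request is mostly string handling. The sketch below shows one way to pair the instruction above with page content; `build_prompt` and the BEGIN/END delimiters are hypothetical additions for illustration and were not part of the instruction used in this practicum. No model API is called here.

```python
INSTRUCTION = (
    "The following content is from a web page. Identify all violations of "
    "WCAG 2.2 Success Criterion 1.3.5 (Identify Input Purpose) related to the "
    "autocomplete attribute. For each violation, state the element, the "
    "problem, and the correction required. Be precise about which values are "
    "valid for each element type."
)


def build_prompt(page_content: str) -> str:
    """Combine the fixed instruction with the page content to be analysed."""
    if not page_content.strip():
        raise ValueError("page content is empty; the agent would have nothing to test")
    return (
        f"{INSTRUCTION}\n\n"
        "--- BEGIN PAGE CONTENT ---\n"   # hypothetical delimiter
        f"{page_content}\n"
        "--- END PAGE CONTENT ---"       # hypothetical delimiter
    )


prompt = build_prompt('<form autocomplete="new-password"></form>')
print(prompt)
```

Keeping the instruction fixed and varying only the page content is what makes the two scenarios that follow comparable.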
Scenario 1: AI agent given the raw HTML source
The raw HTML delivered by the server was provided to the AI agent. This HTML does not contain the newsletter form, which is injected by JavaScript after page load.
Model used: Claude claude-sonnet-4 (Anthropic), May 2026.
AI response (summary):
I can see a search input (<input type="search" name="s">) in the provided HTML. This element does not have an autocomplete attribute. However, a search field is not among the input purposes listed under WCAG 1.3.5 — it does not collect personal information — so no violation is present. I find no violations of WCAG 1.3.5 in the provided HTML.
Result: Both issues missed — 100% false negative rate.
The AI agent correctly analysed what it was given, and correctly found no violations in the raw HTML. But the raw HTML does not contain the newsletter form. The AI agent could not report what it was not given. This is not a failure of the agent's reasoning — it is a failure of the data provided to it.
Scenario 2: AI agent given the rendered DOM
The HTML of the fully rendered DOM — captured after JavaScript execution — was provided to the AI agent. This HTML includes the newsletter form with its email input and the form element.
Model used: Claude claude-sonnet-4 (Anthropic), May 2026.
AI response (summary):
I have identified two violations of WCAG 2.2 Success Criterion 1.3.5:
Violation 1: Missing autocomplete attribute on email input.
Element: <input type="email" name="email" placeholder="> Enter your email">
Problem: This input collects an email address but has no autocomplete attribute. WCAG 1.3.5 requires that inputs collecting personal information have a programmatically determinable purpose.
Correction: Add autocomplete="email".
Violation 2: Invalid autocomplete value on the <form> element.
Element: <form autocomplete="new-password">
Problem: On a <form> element, the only valid values for autocomplete are on and off. The value new-password is a valid autofill detail token for <input> elements, but not for <form> elements. Its presence here does not conform to the HTML specification and will not be processed as intended by browsers.
Correction: Replace autocomplete="new-password" with autocomplete="off" if the intent is to suppress autofill on the form. Note that autocomplete="off" is itself not recommended for personal data collection forms, as it degrades the experience for users who rely on autofill.
Result: Both issues correctly found — when given the right input.
With the rendered DOM, the AI agent identified both violations accurately and provided correct remediation advice. It also flagged that suppressing autocomplete on a personal data form is itself a usability concern for users who rely on autofill — a nuance that rule engines do not typically report.
What this demonstrates
The two scenarios show that the effectiveness of AI-agent testing depends critically on what the AI agent receives, not only on the capability of the underlying model. Providing the rendered DOM rather than the raw HTML source is the difference between finding both issues and finding neither.
This has a direct practical implication: when using AI agents for accessibility testing, you must understand whether the page's content is server-rendered or JavaScript-rendered. If it is JavaScript-rendered, the agent must receive the rendered DOM, not the raw source.
It also illustrates a limitation that affects all three strategies: all depend on the tester (human, rule engine, or AI agent) receiving the full relevant content. Differences in what each strategy can access — raw source, rendered DOM, accessibility tree, visual rendering — partly determine what each strategy can find.
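One rough way to tell whether a page needs the rendered DOM is to compare the number of form elements in the raw source with the number in the rendered markup. The sketch below assumes both strings have already been captured by some other means; `TagCounter`, `count_tags`, and `forms_added_by_script` are hypothetical names, and the two sample strings are simplified stand-ins, not the example page's actual markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1


def count_tags(markup, tag):
    """Return how many start tags named `tag` appear in the markup."""
    counter = TagCounter(tag)
    counter.feed(markup)
    counter.close()
    return counter.count


def forms_added_by_script(raw_html, rendered_html):
    """Return how many more forms the rendered DOM has than the raw source."""
    return count_tags(rendered_html, "form") - count_tags(raw_html, "form")


# Simplified stand-ins for the two captures (hypothetical markup).
RAW = ('<body><form role="search"><input type="search" name="s"></form>'
       '<div id="signup"></div></body>')
RENDERED = ('<body><form role="search"><input type="search" name="s"></form>'
            '<div id="signup"><form autocomplete="new-password">'
            '<input type="email" name="email"></form></div></body>')

added = forms_added_by_script(RAW, RENDERED)
print(f"Forms present only after rendering: {added}")
```

A positive difference suggests that supplying only the raw source would hide form content from any tool or agent that analyses it.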
Verdict on autocomplete issues
- Issue A (missing attribute on input): Missed when given raw HTML; found when given rendered DOM.
- Issue B (invalid value on form): Missed when given raw HTML; found when given rendered DOM.
Comparing the strategies
About 6 minutes.
The practicum produced a clear picture of what each strategy found on the example page:
What each strategy found
- Human testing
- Found Issue A (missing attribute) on direct inspection. Issue B (invalid value on form) was possibly missed, depending on whether the tester knew that new-password is not a valid value for <form> elements. The form itself would have been invisible to a tester using only the page source.
- Rule-engine testing (single engine)
- Found neither issue. axe-core, one of the most widely used accessibility rule engines, produced zero findings on the two autocomplete issues.
- Rule-engine testing (ensemble)
- Found both issues, through two different tools in the ensemble: Testaro found Issue A; HTML CodeSniffer found Issue B.
- AI-agent testing (raw HTML)
- Found neither issue. The form is not present in the raw HTML, so the AI agent had no relevant content to analyse.
- AI-agent testing (rendered DOM)
- Found both issues, with accurate explanations and correct remediation advice.
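The outcomes above can be restated as false-negative rates over the two known issues. This sketch records the human tester's result for Issue B as uncertain, so it reports a best-case and worst-case rate for each strategy; `false_negative_range` is a hypothetical helper, and two issues on one page are far too few to generalise from.

```python
KNOWN_ISSUES = {"A", "B"}

# (found, uncertain) per strategy, as reported in this practicum.
RESULTS = {
    "Human testing": ({"A"}, {"B"}),
    "Rule engine (axe-core alone)": (set(), set()),
    "Rule-engine ensemble (Kilotest)": ({"A", "B"}, set()),
    "AI agent (raw HTML)": (set(), set()),
    "AI agent (rendered DOM)": ({"A", "B"}, set()),
}


def false_negative_range(found, uncertain, known=KNOWN_ISSUES):
    """Return (best, worst) false-negative rates as fractions of known issues."""
    definitely_missed = known - found - uncertain
    best = len(definitely_missed) / len(known)
    worst = len(definitely_missed | (uncertain & known)) / len(known)
    return best, worst


for strategy, (found, uncertain) in RESULTS.items():
    best, worst = false_negative_range(found, uncertain)
    print(f"{strategy}: {best:.0%} to {worst:.0%} missed")
```

Treating the uncertain result as a range rather than forcing it to found or missed keeps the comparison faithful to the human-testing verdict.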
Lessons from the case study
Lesson 1: What you provide determines what gets found
Every testing strategy operates on a representation of the page — raw source, rendered DOM, visual rendering, or accessibility tree. Each representation contains different information. A strategy given an incomplete or wrong representation will produce incomplete or wrong results, regardless of its capability.
In this case study, both rule-engine tools that found the issues used the rendered DOM (via a headless browser). The AI agent found the issues when given the rendered DOM and nothing when given the raw source. The human tester would see both affected elements in the browser's developer tools, which show the rendered DOM, although recognising Issue B still depends on knowing the valid values for <form> elements.
Lesson 2: No single rule engine is complete
axe-core is a capable, well-maintained engine used by millions of developers. It found neither autocomplete issue on this page. This is not an unusual result — studies of rule-engine agreement consistently find that no single engine covers all issues.
Ensemble testing, by combining multiple engines, found both issues. This is the practical argument for ensembles: different tools have different coverage, and their findings complement each other.
Lesson 3: Some issues are harder for humans than for machines
Issue B — the invalid autocomplete value on the <form> element — illustrates an important pattern. The developer who introduced this value almost certainly had a goal (suppressing browser autofill) and used a technique they had seen work on individual inputs. The fact that it is technically invalid for a <form> element is a detail that a knowledgeable human might easily overlook.
An appropriate rule engine, however, checks the value against a precise specification and reports the violation without ambiguity. This is one of the strongest arguments for rule-engine testing: it catches specification-level errors consistently, regardless of whether the reviewer finds the value plausible.
Lesson 4: The AI limitation encountered during this tutorial's development
This tutorial was developed with AI assistance. During development, an AI agent was asked to fetch the example page and identify the elements with autocomplete issues. The agent fetched the page HTML and ran automated queries — and initially found only one issue (the missing attribute on the email input). It did not find the invalid value on the <form> element.
The reason is instructive. The agent queried for <input>, <select>, and <textarea> elements, but did not initially think to query for the <form> element's autocomplete attribute. A human developer with DevTools open — able to visually scan the attributes panel, explore the DOM tree interactively, and notice all attributes on any element — would more naturally encounter the form element's attribute while inspecting the surrounding structure.
This experience illustrates a real limitation of current AI agents: even though an agent can rapidly process large amounts of text, it does not yet replicate the fluid, exploratory experience of a developer browsing to a page and inspecting its DOM. That exploratory capacity remains a strength of human testing.
Conclusion
About 5 minutes.
You have seen all three accessibility testing strategies applied to the same page and the same issue type. Each strategy has genuine strengths and genuine limitations. The key points to take away:
- Use all three. The strategies are complements, not alternatives. Rule engines provide fast, consistent first-pass coverage. AI agents can reason about nuance and explain issues. Human testing catches what the others miss and is the ultimate measure of real-world usability.
- Understand what each strategy sees. A strategy can only report on what it receives. For JavaScript-rendered content, ensure your tools — rule engines, AI agents, or human testers — are working with the rendered DOM, not the raw HTML source.
- Prefer ensembles over single tools. No rule engine covers all WCAG success criteria. Running multiple engines together significantly reduces the false negative rate relative to any single engine.
- Verify AI output. AI agents can produce both false positives and false negatives. Their output is a useful starting point, not a final verdict. Critical review against the relevant standards is essential.
- Test early and often. The cheapest time to fix an accessibility issue is before the page is shipped. Rule-engine testing integrated into a development workflow catches issues while the code is fresh and the fix is small.
Further reading
- Understanding WCAG 2.2 (W3C explanations of each success criterion)
- Web Accessibility Evaluation Tools List (W3C registry of rule-engine tools)
- Understanding SC 1.3.5: Identify Input Purpose (detailed guidance on autocomplete requirements)
- HTML Living Standard: Autofill (the definitive list of valid autocomplete tokens)
- Accessibility Metatesting: Comparing Nine Testing Tools (research on rule-engine coverage variation)
- Testaro: Efficient Ensemble Testing for Web Accessibility (the rationale for ensemble testing)