Tutorial: Accessibility Testing Strategies

About this tutorial

If you are familiar with front-end web technology (HTML, CSS, JavaScript), but you have limited knowledge of digital accessibility testing, this tutorial is designed for you.

Estimated completion time: 1 hour.

The tutorial has two parts:

Orientation. What digital accessibility is, why it is valuable, and how websites can be tested for it.
Practicum. Examining a specific example.

There are links to additional information at the end.

You can help improve this tutorial by submitting your suggestions at the end.

Part 1: Orientation

Accessibility basics

Accessibility has several meanings. Here it refers to universal usability: usability for the widest practical range of people. It often aims to serve people with specific physical, sensory, and cognitive disabilities (an estimated 20% of the world population), but accessibility typically elevates usability for everybody.

Accessibility standards exist for streets, buildings, vehicles, devices, and software. Standards are defined by professional associations, governments, and individual experts. Those defined by individual experts are often called best practices.

Kilotest deals with web accessibility, so that is the focus of this tutorial.

Not everybody agrees with every standard, but some standards are widely accepted and have even been made legally mandatory. The web accessibility standards gaining the greatest legitimacy are those defined by the World Wide Web Consortium (W3C). They include:

Web Content Accessibility Guidelines (WCAG): for websites
Authoring Tool Accessibility Guidelines (ATAG): for web authoring tools
Accessible Rich Internet Applications (ARIA): for scripted web applications

Impacts

Obviously, websites that are insecure or leak private or confidential data are risky. Similarly, inaccessible websites are risky. They expose site owners to prosecution, civil litigation, and negative publicity. Inaccessible websites are also harder to use, deterring what you want users to do, such as understanding your mission or making purchases.

Many users interact with the web using assistive technologies (ATs): hardware and software tools that mediate between websites and users. Some are designed to serve users with disabilities, such as screen readers that explain structure and content aurally, voice input software, eye trackers, and breath-based navigators. ATs typically rely on website accessibility, so AT users may be unable to navigate or operate inaccessible sites. Do you want to see an example of this? Then pause for a minute and view a video about browsing while blind.

Increasingly, web use is being mediated by artificial-intelligence agents. They all make mistakes, but the error rate is likely to jump when a website violates accessibility standards.

Accessibility standards include but exceed HTML, CSS, and JavaScript standards, so if you achieve accessibility you get conformity to those standards as a side effect. This makes your code easier to maintain and debug, even as team members change.

Testing

Accessibility testing is the process of verifying conformity to accessibility standards. There are three main strategies of web accessibility testing.

Strategy 1. Testing by humans

A person directly examines a website, using browsers, developer tools, I/O devices, and assistive technologies to find accessibility problems. This is sometimes called manual testing, but it really is human testing: testing by a person who investigates the site by inspecting and using it.

One subtype of human testing is testing by experts in the use of assistive technologies and atypical I/O methods. These are usually people who have gained their expertise through long-term use arising from disabilities.

Human testing gives you insight into accessibility problems that other strategies are likely to miss. It is also the slowest and most expensive strategy, typically requiring hours per page.

Strategy 2. Testing by rule engines

Software applies a set of codified rules and reports on any violations of the rules. This is sometimes called automated testing.

Rule engines come in several forms:

Browser extensions run directly in the browser and report violations interactively.
Installed engines run from the command line and can be integrated into development workflows.
API services accept page URLs or HTML content and return analysis results over the network.
Ensemble services run multiple rule engines against a page and consolidate the results.

Rule-engine testing is fast and often cheap. Testing one web page typically takes no more than five minutes. It requires little accessibility knowledge from the user, since the rules encode the expertise. It can be run repeatedly, integrated into CI/CD pipelines, and applied to many pages.
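To make the idea of codified rules concrete, here is a minimal sketch of a toy rule engine in Python. It is not how any real rule engine is implemented. The two rules, the class name RuleEngine, the function name run_rules, and the sample page are all hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class RuleEngine(HTMLParser):
    """Toy rule engine: walks the HTML and records rule violations."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Hypothetical rule 1: every img element needs an alt attribute.
        if tag == "img" and "alt" not in attributes:
            self.violations.append("img has no alt attribute")
        # Hypothetical rule 2: the html element needs a non-empty lang attribute.
        if tag == "html" and not attributes.get("lang"):
            self.violations.append("html has no lang attribute")


def run_rules(html):
    """Apply all rules to an HTML string and return the violations found."""
    engine = RuleEngine()
    engine.feed(html)
    engine.close()
    return engine.violations


# Hypothetical sample page: no lang attribute, and one image without alt.
sample = '<html><body><img src="logo.png"><img src="a.png" alt="Team photo"></body></html>'
for violation in run_rules(sample):
    print(violation)
```

Because each rule is just code, the same checks can be rerun on every build and on every page. A sketch like this sees only the HTML it is given. It does not inspect styles or script-driven behavior, which is one reason a single engine's rule set is never complete.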

Strategy 3. Testing by AI agents

Artificial intelligence (AI) has been developing since the mid-1950s and by now has produced systems that can mimic human expert behavior. So it is not preposterous that you could give the URL of a web page to an AI agent and say, “Find accessibility defects on that page and report them to me.” What happens if you try that?

If you ask a mainstream AI agent, powered by a foundation model, to navigate to a web page and look for accessibility defects in it, the response may look right, but it may be:

incomplete
incorrect
fabricated

AI agents tend to have trouble getting the information they need for accessibility testing of a deployed web page. They may need the source code, the HTML after any scripts have been executed, the accessibility tree (an accessibility-related simplification of the code), a graphic image (screenshot) of the page, and access to the live page so they can interact with it. Because of limited capabilities and security restrictions, they are likely to be unable to get all of that and handle it efficiently.

The most commonly used AI agents are powered by large language models. Such a model is trained to give you answers that are highly probable. “Sorry, I don’t have full access to that web page” is often the right answer, but an improbable one, so you most likely will get a fabrication instead. The model will use its vast knowledge to imagine what a page would contain and base its answer on that.

When an AI agent does this, it is called a hallucination. When a human does it, it is called fraud.

The AI industry is making progress in ensuring that general-purpose AI agents delegate tasks they cannot perform themselves to competent tools and specialized AI agents. At present, however, you cannot expect AI agents powered by the mainstream foundation models to reliably navigate to a web page and diagnose accessibility defects on it.

You have other options, however. If you want an AI agent to test a deployed web page, you can give the agent the documents listed above that it cannot otherwise obtain. Even then:

Acquiring and providing these documents costs you time, detracting from the time-saving promise of AI assistance.
You risk overloading the agent with more data than it can efficiently process.
You are not giving the agent access to the live page, so defects that become detectable only during interactions with the page cannot be found.

If you are developing a website, you can ask an AI agent to look for accessibility defects in your codebase. Agents that are integrated into development environments or pipelines can do this and may find issues that were not discovered by rule engines.

Here is one example. The Kilotest home page, when tested by Kilotest itself, is reported to have zero issues. But, when an AI agent was given local access to the Kilotest codebase in June 2026, the agent reported two accessibility issues on the home page. One was that the list of things to do was not wrapped in a nav element, so assistive technologies would not know that it is a list of navigation links. The other was that the “More than a thousand? Really?” line was not coded as a heading, even though it acted as a heading for the introduction to Kilotest that opens when you click that line.

These two issues were debatable, but you may want to make decisions about debatable issues, and an AI agent can make you aware of them.

Hybrid strategies

In reality, none of the above three strategies is pure.

Human testers use automation, including browser extensions, bookmarklets, and assistive technologies.
Rule engines can present dubious cases to humans for decisions and can get AI agent help to test for complex standards.
AI agents can use rule-based tools and depend on humans to provide effective instructions and data.

Orchestrating testing strategies

Sequence

The most mature accessibility testing practice would use all three strategies. Given their capabilities, wait times, and costs, a practical sequence is likely to be: first, test with rule engines and repair the defects that they discover; second, ask AI agents whether they can find any remaining defects and, if so, repair those; and finally, bring in human testers. That gives humans a cleaner, more consistent, and more compliant product than they usually get to test, saving time and money. They can focus on discovering tricky issues that might otherwise be overlooked. Your website becomes even better.

Quality control

Accessibility testing is a type of quality control, but software tests, including accessibility tests, need quality control, too. Human testers make mistakes. Rule engines make mistakes. AI agents make mistakes. Moreover, accessibility is partly judgmental, so whether a tester made a mistake may be debatable.

Competitive testing can provide some internal quality control. Ensemble testing with multiple rule engines, and testing by a combination of rule engines, AI agents, and humans, let you identify inter-tester disagreements and investigate those. Your findings can help you improve not only your websites but also the testing process.
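As a rough sketch of how inter-tester disagreements might be surfaced, the following Python code consolidates findings from several testers and lists the issues that some, but not all, of them reported. The tester names, issue identifiers, and function names are hypothetical. This is not Kilotest's consolidation logic.

```python
from collections import defaultdict

# Hypothetical findings: each tester maps to the set of issue IDs it reported.
findings = {
    "engine_a": {"missing-alt", "low-contrast"},
    "engine_b": {"unlabeled-input"},
    "engine_c": {"low-contrast"},
}


def consolidate(findings):
    """Map each issue to the sorted list of testers that reported it."""
    reporters = defaultdict(list)
    for tester, issues in sorted(findings.items()):
        for issue in sorted(issues):
            reporters[issue].append(tester)
    return dict(reporters)


def disagreements(findings):
    """Return the issues that some, but not all, testers reported."""
    total = len(findings)
    return {
        issue: testers
        for issue, testers in consolidate(findings).items()
        if len(testers) < total
    }


for issue, testers in sorted(disagreements(findings).items()):
    print(f"{issue}: reported only by {', '.join(testers)}")
```

A disagreement does not tell you who is right. It only tells you where to look: the reporting tester may have produced a false positive, or the silent testers may have missed a real defect.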

Part 2: Practicum

The issue

Let’s get concrete now. Out of dozens of accessibility standards, we shall examine one: input purpose identification.

Here is the problem. If you are active on the web, you probably fill out forms often, and you may be annoyed at how often you need to enter the same information about yourself, such as username, name, date of birth, telephone number, email address, postal address, password, and health history. The input purpose identification standard says that a web form asking for some of this information must cooperate with your browser to help get the inputs completed automatically. This helps all users, and especially those who have trouble remembering details and typing fast and accurately.

Input purpose identification is one of the standards codified by the Web Content Accessibility Guidelines (WCAG), where it is Success Criterion 1.3.5 (Identify Input Purpose). There is a list of input purposes subject to the standard. Any input, textarea, or select element with one of those purposes must have an autocomplete attribute with the corresponding value. For example, if it is a username input, the input element must have an autocomplete="username" attribute. Some other commonly used values are:

name, given-name, family-name
email, tel
street-address, postal-code, country
new-password, current-password

A form element may have an autocomplete attribute with an "on" or "off" value to permit or prohibit automated browser completion of all its data, except that autocomplete attributes on any elements within the form take precedence.
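The precedence rule just described can be sketched as a small Python function. The function name and the use of None for an absent attribute are hypothetical conventions, and the fallback to "on" when neither element has the attribute is an assumption about default browser behavior, not something stated in this tutorial.

```python
def effective_autocomplete(form_value, field_value):
    """Return the autocomplete value that governs one field.

    Each argument is the attribute value, or None if the attribute is absent.
    """
    if field_value is not None:
        # The field's own attribute takes precedence over the form's.
        return field_value
    if form_value is not None:
        # Otherwise the form-level "on" or "off" applies to the field.
        return form_value
    # Assumption: with no attribute anywhere, completion is permitted.
    return "on"


print(effective_autocomplete("off", "email"))  # email
print(effective_autocomplete("off", None))     # off
print(effective_autocomplete(None, None))      # on
```

The first call shows the case that matters most here: an autocomplete="off" on the form does not cancel an autocomplete="email" on an input inside it.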

The page

We shall look for this single issue on a single web page. It is the home page of a JavaScript-rendered WordPress site belonging to a nonprofit organization. It was tested in May 2026.

The page contains a form for requesting a guide published by the organization. This form is not present in the raw HTML that the server delivers, but is injected into the page by a JavaScript plugin after the browser has loaded and executed the page scripts. The form contains one visible user-facing input field. Here is a screenshot of the form:

Here is the code for the form:

<form action="#" method="post" novalidate="novalidate" autocomplete="new-password">
  <input
    type="email"
    data-field="email"
    data-required="1"
    data-validation="email"
    name="email"
    placeholder="> Enter your email"
  >
</form>

Testing the page

By rule engines

What do rule engines find about this issue when they test this page? For a quick answer, Kilotest subjected the page to testing by all 10 rule engines in its ensemble. Two of the rule engines reported rule violations belonging to this issue:

The Testaro rule engine reported, “input has no autocomplete="email" attribute.” In other words, Testaro recognized this input as asking for the user’s email address and therefore as requiring an autocomplete="email" attribute, but not having one.
The HTML CodeSniffer rule engine, referencing this form element, reported, “Invalid autocomplete value: new-password. Element does not belong to Password control group.” In other words, HTML CodeSniffer recognized that this form is not a password input (a form is not an input at all!) and therefore is not eligible for an autocomplete="new-password" attribute, but has one.

Both of these findings are correct.

The input asks for the user’s email address and therefore requires an autocomplete="email" attribute but has none; Testaro discovered and reported this.
The form is permitted to have an autocomplete attribute, but its value must be only "on" or "off", so the actual value of "new-password" is invalid; HTML CodeSniffer discovered this attribute and reported that the form element was not eligible to have it with the new-password value, but had it.
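The two findings can be reproduced with a short Python sketch. This is not the implementation used by Testaro or HTML CodeSniffer. The class and function names are hypothetical, the form is abridged to the attributes that matter, and the second check rests on a simplifying assumption: every type="email" input is treated as asking for the user's own email address.

```python
from html.parser import HTMLParser

# The practicum form, abridged to the attributes that matter for these checks.
FORM_HTML = """
<form action="#" method="post" novalidate="novalidate" autocomplete="new-password">
  <input type="email" name="email">
</form>
"""


class AutocompleteChecker(HTMLParser):
    """Collects two kinds of autocomplete findings from an HTML string."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        value = attributes.get("autocomplete")
        # Check 1: a form element may carry only "on" or "off".
        if tag == "form" and value is not None and value not in ("on", "off"):
            self.findings.append(f"form has invalid autocomplete value: {value}")
        # Check 2 (simplifying assumption): treat every type="email" input
        # as asking for the user's own email address.
        if tag == "input" and attributes.get("type") == "email" and value != "email":
            self.findings.append('input lacks autocomplete="email"')


def check(html):
    """Run both checks on an HTML string and return the findings."""
    checker = AutocompleteChecker()
    checker.feed(html)
    checker.close()
    return checker.findings


for finding in check(FORM_HTML):
    print(finding)
```

Run against the abridged form, the sketch reports both defects. Changing the form's attribute to "on" or "off" and adding autocomplete="email" to the input makes both findings disappear.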

You might think that this shows rule engines were effective in revealing the defects related to this issue, but this is not a success story. Yes, two rule engines found related defects, but the other eight rule engines did not. Inspection of those eight rule engines reveals that six of them include rules enforcing this standard in some way. So, did these six rule engines report false negatives? Did they fail to enforce their applicable rules here? Perhaps, but perhaps not. Suppose a rule engine has a rule that requires the value of any autocomplete attribute to be in the list of allowed values. Well, new-password is in the list. So that rule engine would, properly, not report a violation of that rule.
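The following Python sketch illustrates how a correctly implemented rule can stay silent about this form. Both rules are hypothetical, and the allowed-value set is abridged to the values named in this tutorial. Neither function is taken from any real rule engine.

```python
# Abridged subset of autocomplete values: only those named in this tutorial.
ALLOWED_VALUES = {
    "on", "off", "username", "name", "given-name", "family-name", "email", "tel",
    "street-address", "postal-code", "country", "new-password", "current-password",
}


def violates_allowed_values_rule(autocomplete_value):
    """Hypothetical rule: flag a value only if it is not in the allowed list."""
    return autocomplete_value not in ALLOWED_VALUES


def violates_form_rule(tag, autocomplete_value):
    """Stricter hypothetical rule: on a form, only "on" and "off" are valid."""
    if tag == "form":
        return autocomplete_value not in ("on", "off")
    return violates_allowed_values_rule(autocomplete_value)


print(violates_allowed_values_rule("new-password"))  # False: the value is allowed
print(violates_form_rule("form", "new-password"))    # True: not valid on a form
```

The first rule does exactly what it promises and still misses the defect, because it never asks which element carries the value. Only a rule that takes the element into account catches it.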

This example illustrates why no single rule engine can be relied on to find all accessibility defects, or even all defects that rule engines are capable of detecting. That is why Kilotest uses an ensemble of rule engines.

Of course, an ensemble of ten rule engines also has ten sources of false positives, so Kilotest, like any platform that integrates multiple rule engines, faces a greater risk of false positives than a single rule engine would. In this particular case, there were none. For example, no rule engine claimed to find a defect related to this issue someplace else on the page. Kilotest combats false positives by investigating user complaints and deprecating rules whose tests are found faulty.

By AI agents

An AI agent powered by a frontier model was asked to visit the page and report any accessibility defect related to autocomplete attributes. The agent worked for about 3 minutes and then reported a missing autocomplete="email" attribute in the newsletter subscription form. But the page has no newsletter subscription form! It has a form to request a guide, not to subscribe to anything. Moreover, the agent provided code to support its conclusion, and the code differed pervasively from the actual code on the page. This response was a complete fabrication. The agent confessed to it, explaining that it resorted to hallucination when its security restrictions prevented it from accessing the necessary page data.

A similar AI agent returned correct diagnoses when it was fed the page as an HTML document. That worked, because the two defects relevant to this practicum are discoverable from the HTML of the rendered page. Any defects arising from stylesheets or scripts would have been missed.

Even when an agent discovered both defects, it added some misleading advice: “Replace autocomplete="new-password" with autocomplete="off" if the intent is to suppress autofill on the form. Note that autocomplete="off" is itself not recommended for personal data collection forms, as it degrades the experience for users who rely on autofill.” The agent implied that setting autocomplete to "off" in the form element would nullify an autocomplete="email" in the form’s input element, but that is wrong: as explained under “The issue” above, autocomplete attributes on elements within a form take precedence over the form’s own attribute.

By humans

Human testers would handle the defects for this practicum differently, depending on how they test.

At the simplest, a human tester acts like a typical user. A tester doing this would not see the form element and not be able to report its defective autocomplete attribute, because that defect would not prevent completing the email input. When clicking inside the email input, the tester would likely see an autocompletion popup with one or more email addresses. That would be the browser misbehaving, treating the input as asking for the user’s email address merely because the input has a type="email" attribute. This is a lucky guess here, but would be a mistake if the input were asking for the email address of an emergency contact. Thus, because of an overzealous browser, the tester would not be aware of any defect in the input either.

A more complex type of human testing is to use some atypical navigation method or some assistive technology while using the page. A tester doing this would likewise not notice the form defect, and might or might not discover the missing autocomplete attribute on the input element, depending on the details of the navigation method and the behavior of the browser.

The most exhaustive type of human testing includes not only using the page in various ways but also inspecting the code, the styles, and the accessibility tree by means of the browser developer tools. This would allow an attentive human tester to find both the improper autocomplete attribute on the form element and the missing autocomplete attribute on the input element.

Suggest improvements

Please submit suggestions for improving this tutorial below. The Kilotest managers will review them.
