VoxLens: Making Online Data Visualizations Accessible with an Interactive JavaScript Plug-In

Authors: Alida T. Muongchan, Ather Sharif, Jacob O. Wobbrock, Katharina Reinecke, Olivia H. Wang

Ather Sharif
asharif@cs.washington.edu
Paul G. Allen School of Computer Science & Engineering | DUB Group, University of Washington
Seattle, Washington, USA

Olivia H. Wang
wang4@cs.washington.edu
Paul G. Allen School of Computer Science & Engineering, University of Washington
Seattle, Washington, USA

Alida T. Muongchan
alidatm@uw.edu
Human Centered Design and Engineering, University of Washington
Seattle, Washington, USA

Katharina Reinecke
reinecke@cs.washington.edu
Paul G. Allen School of Computer Science & Engineering | DUB Group, University of Washington
Seattle, Washington, USA

Jacob O. Wobbrock
wobbrock@uw.edu
The Information School | DUB Group, University of Washington
Seattle, Washington, USA

Figure 1: VoxLens is an open-source JavaScript plug-in that improves the accessibility of online data visualizations using a multi-modal approach. The code at left shows that integration of VoxLens requires only a single line of code. At right, we portray an example interaction with VoxLens using voice-activated commands for screen-reader users.
ABSTRACT
JavaScript visualization libraries are widely used to create online data visualizations but provide limited access to their information for screen-reader users. Building on prior findings about the experiences of screen-reader users with online data visualizations, we present VoxLens, an open-source JavaScript plug-in that, with a single line of code, improves the accessibility of online data visualizations for screen-reader users using a multi-modal approach. Specifically, VoxLens enables screen-reader users to obtain a holistic summary of presented information, play sonified versions of the data, and interact with visualizations in a “drill-down” manner using voice-activated commands. Through task-based experiments with 21 screen-reader users, we show that VoxLens improves the accuracy of information extraction and interaction time by 122% and 36%, respectively, over existing conventional interaction with online data visualizations. Our interviews with screen-reader users suggest that VoxLens is a “game-changer” in making online data visualizations accessible to screen-reader users, saving them time and effort.

CCS CONCEPTS
• Human-centered computing → Information visualization; Accessibility systems and tools; • Social and professional topics → People with disabilities.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
This work is licensed under a Creative Commons Attribution International 4.0 License.
CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9157-3/22/04.
https://doi.org/10.1145/3491102.3517431

KEYWORDS
Visualizations, accessibility, screen readers, voice-based interaction, blind, low-vision.

ACM Reference Format:
Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock. 2022. VoxLens: Making Online Data Visualizations Accessible with an Interactive JavaScript Plug-In. In CHI Conference on Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 19 pages. https://doi.org/10.1145/3491102.3517431

1 INTRODUCTION
Online data visualizations are present widely on the Web, allowing experts and non-experts alike to explore and analyze data both simple and complex. They assist people in extracting information effectively and efficiently, taking advantage of the ability of the human mind to recognize and interpret visual patterns [57].
However, the visual nature of data visualizations inherently disenfranchises screen-reader users, who may not be able to see or recognize visual patterns [52, 57]. We define “screen-reader users,” following prior work [69], as people who utilize a screen reader (e.g., JAWS [68], NVDA [2], or VoiceOver [44]) to read the contents of a computer screen. They might have conditions including complete or partial blindness, low vision, learning disabilities (such as alexia), motion sensitivity, or vestibular hypersensitivity.
Due to the inaccessibility of data visualizations, screen-reader users commonly cannot access them at all. Even when the data visualization includes basic accessibility functions (e.g., alternative text or a data table), screen-reader users still spend 211% more time interacting with online data visualizations and answer questions about the data in the visualizations 61% less accurately, compared to non-screen-reader users [69]. Screen-reader users rely on the creators of visualizations to provide adequate alternative text, which is often incomplete. Additionally, they have to remember and process more information mentally than is often humanly feasible [74], such as when seeking the maximum or minimum value in a chart.
Prior work has studied the experiences of screen-reader users with online data visualizations and highlighted the challenges they face, the information they seek, and the techniques and strategies that could make online data visualizations more accessible [69]. Building on this work, it is our aim to realize a novel interactive solution to enable screen-reader users to efficiently interact with online data visualizations.
To this end, we created an open-source JavaScript plug-in called “VoxLens,” following an iterative design process [1]. VoxLens provides screen-reader users with a multi-modal solution that supports three modes of interaction: (1) Question-and-Answer mode, where the user verbally interacts with the visualizations on their own; (2) Summary mode, where VoxLens describes the summary of the information contained in the visualization; and (3) Sonification mode, where VoxLens maps the data in the visualization to a musical scale, enabling listeners to interpret the data trend. (Existing sonification tools are either proprietary [39] or written in a programming language other than JavaScript [5], making them unintegratable with popular JavaScript visualization libraries; VoxLens’ sonification feature is open-source, integratable with other libraries, and customizable.) Additionally, VoxLens reduces the burden on visualization creators in applying accessibility features to their data visualizations, requiring inserting only a single line of JavaScript code during visualization creation. Furthermore, VoxLens enables screen-reader users to explore the data as per their individual preferences, without relying on the visualization creators and without having to process data in their minds. VoxLens is the first system to: (1) enable screen-reader users to interact with online data visualizations using voice-activated commands; and (2) offer a multi-modal solution using three different modes of interaction.
To assess the performance of VoxLens, we conducted controlled task-based experiments with 21 screen-reader users. Specifically, we analyzed the accuracy of extracted information and interaction time with online data visualizations. Our results show that with VoxLens, compared to without it, screen-reader users improved their accuracy of extracting information by 122% and reduced their overall interaction time by 36%. Additionally, we conducted follow-up semi-structured interviews with six participants, finding that VoxLens is a positive step forward in making online data visualizations accessible, interactive dialogue is one of the ‘top’ features, sonification helps in ‘visualizing’ data, and data summary is a good starting point. Furthermore, we assessed the perceived workload of VoxLens using the NASA-TLX questionnaire [38], showing that VoxLens leaves users feeling successful in their performance and demands low physical effort.
The main contributions of our work are as follows:
(1) VoxLens, an interactive JavaScript plug-in that improves the accessibility of online data visualizations for screen-reader users. VoxLens offers a multi-modal solution, enabling screen-reader users to explore online data visualizations, both holistically and in a drilled-down manner, using voice-activated commands. We present its design and architecture, functionality, commands, and operations. Additionally, we open-source our implementation at https://github.com/athersharif/voxlens.
(2) Results from formative and summative studies with screen-reader users evaluating the performance of VoxLens. With VoxLens, screen-reader users significantly improved their interaction performance compared to their conventional interaction with online data visualizations. Specifically, VoxLens increased their accuracy of extracting information by 122% and decreased their interaction time by 36% compared to not using VoxLens.

2 RELATED WORK
We review the previous research on the experiences of screen-reader users with online data visualizations and the systems designed to improve the accessibility of data visualizations for screen-reader users. Additionally, we review existing JavaScript libraries used to create online visualizations, and tools that generate audio graphs.

2.1 Experiences of Screen-Reader Users with Online Data Visualizations
Understanding the experiences and needs of users is paramount in the development of tools and systems [8, 40]. Several prior
research efforts have conducted interviews with blind and low-vision (BLV) users to understand their experiences with technology [7, 43, 67, 69, 82]. Most recently, Zhang et al. [82] conducted interviews with 12 BLV users, reporting four major challenges of current emoji entry methods: (1) the entry process is time-consuming; (2) the results from these methods are inconsistent with the expectations of users; (3) there is a lack of support for discovering new emojis; and (4) there is a lack of support for finding the right emojis. They utilized these findings to develop Voicemoji, a speech-based emoji entry system that enables BLV users to input emojis. Schaadhardt et al. [67] conducted contextual interviews with 12 blind users, identifying key accessibility problems with 2-D digital artboards, such as Microsoft PowerPoint and Adobe Illustrator. Similarly, Sharif et al. [69] conducted contextual interviews with 9 screen-reader users, highlighting the inequities screen-reader users face when interacting with online data visualizations. They reported the challenges screen-reader users face, the information they seek, and the techniques and strategies they prefer to make online data visualizations more accessible. We rely upon the findings from Sharif et al. [69] to design VoxLens, an interactive JavaScript plug-in that improves the accessibility of online data visualizations, deriving motivation from Marriott et al.’s [57] call-to-action for creating inclusive data visualizations for people with disabilities.

2.2 Accessibility of Online Data Visualizations
Prior research efforts have explored several techniques to make data visualizations more accessible to BLV users, including automatically generating alternative text for visualization elements [48, 59, 70], sonification [3, 5, 16, 27, 39, 58, 83], haptic graphs [33, 76, 81], 3-D printing [15, 43, 71], and trend categorization [47]. For example, Sharif et al. [70] developed evoGraphs, a jQuery plug-in to create accessible graphs by automatically generating alternative text. Similarly, Kim et al. [47] created a framework that uses multimodal deep learning to generate summarization text from image-based line graphs. Zhao et al. [83] developed iSonic, which assists BLV users in exploring georeferenced data through non-textual sounds and speech output. They conducted in-depth studies with seven blind users, finding that iSonic enabled blind users to find facts and discover trends in georeferenced data. Yu et al. [81] developed a system to create haptic graphs, evaluating their system using an experiment employing both blind and sighted people, finding that haptic interfaces are useful in providing the information contained in a graph to blind computer users. Hurst et al. [43] worked with six individuals with low or limited vision and developed VizTouch, software that leverages affordable 3-D printing to rapidly and automatically generate tangible visualizations.
Although these approaches are plausible solutions for improving the accessibility of visualizations for screen-reader users, at least one of the following is true for all of them: (1) they require additional equipment or devices; (2) they are not practical for spontaneous everyday web browsing; (3) they do not offer a multi-modal solution; and (4) they do not explore the varying preferences of visualization interaction among screen-reader users. In contrast, VoxLens does not need any additional equipment, is designed for spontaneous everyday web browsing, and offers a multi-modal solution catering to the individual needs and abilities of screen-reader users.

2.3 Existing JavaScript Data Visualization Libraries
Several JavaScript data visualization libraries exist that enable visualization creators to make visualizations for the Web. We classified these visualization libraries into two categories based on accessibility features: (1) libraries that rely on developers to append appropriate alternative text (e.g., D3 and ChartJS); and (2) libraries that automatically provide screen-reader users with built-in features for data access (e.g., Google Charts).
Bostock et al. [11] developed D3, a powerful visualization library that uses web standards to generate graphs. D3 uses Scalable Vector Graphics (SVG) [22] to create such visualizations, relying on the developers to provide adequate alternative text for screen-reader users to comprehend the information contained in the visualizations.
Google Charts [23] is a visualization tool widely used to create graphs. An important underlying accessibility feature of Google Charts is the presence of a visually hidden tabular representation of data. While this approach allows screen-reader users to access the raw data, extracting information is a cumbersome task. Furthermore, tabular representations of data introduce excessive user workloads, as screen-reader users have to sequentially go through each data point. The workload is further exacerbated as data cardinality increases, forcing screen-reader users to memorize each data point to extract even the most fundamental information such as minimum or maximum values.
In contrast to these approaches, VoxLens introduces an alternate way for screen-reader users to obtain their desired information without relying on visualization creators, and without mentally computing complex information through memorization of data.

2.4 Audio Graphs
Prior work has developed sonification tools to enable screen-reader users to explore data trends and patterns in online data visualizations [3, 5, 39, 58, 83]. McGookin et al. [58] developed SoundBar, a system that allows blind users to gain a quick overview of bar graphs using musical tones. Highcharts [39], a proprietary commercial charting tool, offers data sonification as an add-on. Apple Audio Graphs [5] is an API for Apple application developers to construct an audible representation of the data in charts and graphs, giving BLV users access to valuable data insights. Similarly, Ahmetovic et al. [3] developed a web app that supports blind people in exploring graphs of mathematical functions using sonification.
At least one of the following is true for all of the aforementioned systems: (1) they are proprietary and cannot be used outside of their respective products [39]; (2) they are standalone hardware or software applications [3]; (3) they require installation of extra hardware or software [58]; or (4) they are incompatible with existing JavaScript libraries [5]. VoxLens provides sonification as a separate open-source library (independent from the VoxLens library) that is customizable and integratable with any JavaScript library or code.

3 DESIGN OF VOXLENS
We present the design and implementation of VoxLens, an open-source JavaScript plug-in that improves the accessibility of online data visualizations. We created VoxLens using a user-centered iterative design process, building on findings and recommendations from
prior work [69]. Specifically, our goal was to provide screen-reader users with a comprehensive means of extracting information from online data visualizations, both holistically and in a drilled-down fashion. Holistic exploration involves overall trend, extremum, and labels and ranges for each axis, whereas drilled-down interaction involves examining individual data points [69]. We named our tool VoxLens, combining “vox,” meaning “voice” in Latin, and “lens,” since it provides a way for screen-reader users to explore, examine, and extract information from online data visualizations. Currently, VoxLens only supports two-dimensional single-series data.

3.1 Interaction Modes
Following the recommendations from prior work [69], our goal was to enable screen-reader users to gain a holistic overview of the data as well as to perform drilled-down explorations. Therefore, we explored three modes of interaction: (1) Question-and-Answer mode, where the user verbally interacts with the visualizations; (2) Summary mode, where VoxLens verbally offers a summary of the information contained in the visualization; and (3) Sonification mode, where VoxLens maps the data in the visualization to a musical scale, enabling listeners to interpret possible data trends or patterns. We iteratively built the features for these modes, seeking feedback from screen-reader users through our Wizard-of-Oz studies. VoxLens channels all voice outputs through the user’s local screen reader, providing screen-reader users with a familiar and comfortable experience. These three modes of interaction can be activated by pressing their respective keyboard shortcuts (Table 1).

3.1.1 Wizard-of-Oz Studies. Our goal was to gather feedback and identify areas of improvement for the VoxLens features. Therefore, we conducted a Wizard-of-Oz study [21, 35] with five screen-reader users (see Appendix B, Table 7). (For clarity, we prefix the codes for participants in our Wizard-of-Oz studies with “W.”) We used the findings from the studies to inform design decisions when iteratively building VoxLens. In our studies, we, the “wizards,” simulated the auditory responses from a hypothetical screen reader.

Figure 2: Sample visualization showing price by car brands.

Participants interacted with all of the aforementioned VoxLens modes and were briefly interviewed in a semi-structured manner with open prompts at the end of each of their interactions. Specifically, we asked them to identify the features that they liked and the areas of improvement for each mode. We qualitatively analyzed the data collected from the Wizard-of-Oz studies. We provide our findings from the Wizard-of-Oz studies for each VoxLens mode in its respective section, below.

3.1.2 Question-and-Answer Mode. In Question-and-Answer mode, screen-reader users can extract information from data visualizations by asking questions verbally using their microphone. We used the Web Speech API [60] and the P5 Speech library [24] for speech input, removing the need for any additional software or hardware installation by the user. Through manual testing, we found the P5 Speech library to perform quite well in recognizing speech from different accents, pronunciations, and background noise levels. After getting the text from the speech, we used an approximate string matching algorithm from Hall and Dowling [36] to recognize the commands. Additionally, we verified VoxLens’ command recognition effectiveness through manual testing, using prior work’s [73] data set on natural language utterances for visualizations.
Our Wizard-of-Oz studies revealed that participants liked clear instructions and responses, integration with the user’s screen reader, and the ability to query by specific terminologies. They identified an interactive tutorial to become familiar with the tool, a help menu to determine which commands are supported, and the ability to include the user’s query in the response as key areas of improvement. Therefore, after recognizing the commands and processing their respective responses, VoxLens delivers a single combined response to the user via their screen reader. This approach enables screen-reader users to get the answers to multiple commands in one response. Additionally, we added each query as feedback in the response (Figure 1). For example, if the user said, “what is the maximum?”, the response was, “I heard you ask about the maximum. The maximum is...” If a command was not recognized, the response was, “I heard you say [user input]. Command not recognized. Please try again.”
Screen-reader users are also able to get a list of supported commands by asking for the commands list. For example, the user can ask, “What are the supported commands?” to hear all of the commands that VoxLens supports. The list of supported commands, along with their aliases, is presented in Table 2.

3.1.3 Summary Mode. Our Wizard-of-Oz studies, in line with the findings from prior work [69], revealed that participants liked the efficiency and the preliminary exploration of the data. They suggested the information be personalized based on the preferences of each user, but by default, it should only expose the minimum amount of information that a user would need to decide if they want to delve further into the data exploration. To delve further, they commonly seek the title, axis labels and ranges, maximum and minimum data points, and the average in online data visualizations. (The title and axis labels are required configuration options for VoxLens, discussed further in section 3.2.2 below. Axis ranges, maximum and minimum data points, and average are computed by VoxLens.) At the same time, screen-reader users preferred concisely stated information. Therefore, the goal for VoxLens’s Summary mode was to generate the summary only as a means of providing the foundational holistic information about the visualization, and not as a replacement for the visualization itself. We used the “language of graphics” [10] through a pre-defined sentence template,

Interaction Mode or Command | Keyboard Shortcuts
Question-and-Answer Mode    | Modifier Keys + A / Modifier Keys + 1
Summary Mode                | Modifier Keys + S / Modifier Keys + 2
Sonification Mode           | Modifier Keys + M / Modifier Keys + 3
Repeat Instructions         | Modifier Keys + I / Modifier Keys + 4
Table 1: Keyboard shortcuts for VoxLens’ interaction modes and preliminary commands. Modifier Keys for Windows and MacOS were Control+Shift and Option, respectively.
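
As a rough illustration of Table 1, the sketch below maps a key press to the mode or command it would trigger. It is not taken from the VoxLens source: the names `SHORTCUTS`, `modifiersHeld`, and `resolveShortcut`, the action labels, and the rule that exactly the listed modifier keys must be held are all assumptions made for this example. The event fields (`ctrlKey`, `shiftKey`, `altKey`, `code`) follow the DOM `KeyboardEvent` interface, where Option on MacOS is reported as `altKey`.

```javascript
// Hypothetical lookup built from Table 1; labels are illustrative only.
const SHORTCUTS = new Map([
  ['KeyA', 'question-and-answer'], ['Digit1', 'question-and-answer'],
  ['KeyS', 'summary'], ['Digit2', 'summary'],
  ['KeyM', 'sonification'], ['Digit3', 'sonification'],
  ['KeyI', 'instructions'], ['Digit4', 'instructions'],
]);

// True when exactly the platform's modifier keys are held:
// Control+Shift on Windows, Option (altKey) on MacOS.
// Requiring "exactly" these keys is an assumption of this sketch.
function modifiersHeld(event, platform) {
  if (platform === 'mac') {
    return Boolean(event.altKey) && !event.ctrlKey && !event.shiftKey;
  }
  return Boolean(event.ctrlKey) && Boolean(event.shiftKey) && !event.altKey;
}

// Returns the action for a key press, or null if it is not a shortcut.
function resolveShortcut(event, platform) {
  if (!modifiersHeld(event, platform)) return null;
  return SHORTCUTS.get(event.code) ?? null;
}
```

The sketch reads `event.code` rather than `event.key` because holding Shift or Option can change the character reported in `key`, whereas `code` identifies the physical key. In a browser, the object passed to a `keydown` listener carries these same fields.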

Information Type       | Command              | Aliases
Extremum               | Maximum              | Highest
                       | Minimum              | Lowest
Axis Labels and Ranges | Axis Labels          | -
                       | Ranges               | -
Statistics             | Mean                 | Average
                       | Median               | -
                       | Mode                 | -
                       | Variance             | -
                       | Standard Deviation   | -
                       | Sum                  | Total
Individual Data Point  | [x-axis label] value | [x-axis label] data
Help                   | Commands             | Instructions
                       | -                    | Directions, Help
Table 2: Voice-activated commands for VoxLens’ Question-and-Answer mode.
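
To make the command lookup in Table 2 concrete, the following is a minimal sketch of recognizing several commands, by name or alias, in one free-form utterance. The paper states only that an approximate string matching algorithm from Hall and Dowling [36] was used; the choice of Levenshtein edit distance, the `maxErrorRate` tolerance, and the names `COMMANDS`, `editDistance`, and `recognizeCommands` are assumptions for illustration, not the VoxLens implementation. Only a subset of Table 2 is included.

```javascript
// Subset of the commands and aliases in Table 2 (illustrative).
const COMMANDS = {
  maximum: ['maximum', 'highest'],
  minimum: ['minimum', 'lowest'],
  mean: ['mean', 'average'],
  median: ['median'],
  'standard deviation': ['standard deviation'],
  sum: ['sum', 'total'],
};

// Levenshtein edit distance, computed row by row.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns every command whose name or alias approximately matches a run
// of words in the transcript. maxErrorRate is an assumed tolerance: the
// fraction of a phrase's characters that may differ.
function recognizeCommands(transcript, maxErrorRate = 0.25) {
  const words = transcript
    .toLowerCase()
    .replace(/[^a-z\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
  const found = [];
  for (const [command, phrases] of Object.entries(COMMANDS)) {
    const matched = phrases.some((phrase) => {
      const wordCount = phrase.split(' ').length;
      const limit = Math.floor(phrase.length * maxErrorRate);
      for (let i = 0; i + wordCount <= words.length; i++) {
        const candidate = words.slice(i, i + wordCount).join(' ');
        if (editDistance(candidate, phrase) <= limit) return true;
      }
      return false;
    });
    if (matched) found.push(command);
  }
  return found;
}
```

Because each phrase is compared against a sliding window of words, the speaker does not need to follow a fixed grammar, and a slightly misrecognized word can still resolve to the intended command. The list of recognized commands could then be used to assemble a single combined response.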

identified as Level 1 by Lundgard et al. [55], to decide the sentence structure. Our sentence template was:

    Graph with title: [title]. The X-axis is [x-axis title]. The Y-axis is [y-axis title] and ranges from [range minimum] to [range maximum]. The maximum data point is [maximum y-axis value] belonging to [corresponding x-axis value], and the minimum data point is [minimum y-axis value] belonging to [corresponding x-axis value]. The average is [average].

For example, here is a generated summary of a data visualization depicting the prices of various car brands (Figure 2):

    Graph with title: Price by Car Brands. The X-axis is car brands. The Y-axis is price and ranges from $0 to $300,000. The maximum data point is $290,000 belonging to Ferrari, and the minimum data point is $20,000 belonging to Kia. The average is $60,000.

As noted in prior work [55, 69], the preference for information varies from one individual to another. Therefore, future work can explore personalization options to generate a summarized response that caters to the individual needs of screen-reader users.
Additionally, VoxLens, at present, does not provide information about the overall trend through the Summary mode. Such information may be useful for screen-reader users in navigating line graphs [47, 48]. Therefore, work is underway to incorporate trend information in the summarized response generated for line graphs, utilizing the findings from prior work [47, 48].

3.1.4 Sonification Mode. For Sonification mode, our Wizard-of-Oz participants liked the ability to preliminarily explore the data trend. As improvements, participants suggested the ability to identify key information, such as the maximum and the minimum data points. Therefore, VoxLens’s sonification mode presents screen-reader users with a sonified response (also known as an “audio graph”
[72]) mapping the data in the visualization to a musical scale. A and in order. You’ll hear a beep sound, after
sonifed response enables the listeners to interpret the data trend which you can ask a question such as, ‘‘what
or pattern and gain a big-picture perspective of the data that is is the average?’’ or ‘‘what is the maximum
not necessarily achievable otherwise [66]. To generate the sonifed value in the graph?’’ To hear the textual
response, we utilized Tone.js [56], a JavaScript library that ofers summary of the graph, press Control + Shift +
a wide variety of customizable options to produce musical notes. S or Control + Shift + 2. To hear the sonified
Our goal was to enable the listeners to directionally distinguish version of the graph, press Control + Shift
between data points and to interpret the overall data trend. + M or Control + Shift + 3. To repeat these
Varying tonal frequency is more efective at representing trends instructions, press Control + Shift + I or
than varying amplitude [26, 42]. Therefore, we mapped each data Control + Shift + 4. Key combinations must
point to a frequency within the range of 130 and 650 Hz based be pressed all together and in order.
on its magnitude. For example, for the minimum data point the
At this stage, screen-reader users can activate question-and-
frequency was 130 Hz, for the maximum data point it was 650Hz,
answer mode, listen to the textual summary, play the sonifed ver-
and the intermediate data points were assigned values linearly in-
sion of the data contained in the visualization, or hear the instruc-
between, similar to prior work [19, 61]. Additionally, similar to
tions again. Activating the question-and-answer mode plays a beep
design choices made by Ohshiro et al. [61], we used the sound of a
sound, after which the user can ask a question in a free-form man-
sawtooth wave to indicate value changes along the x-axis. These
ner, without following any specifc grammar or sentence structure.
approaches enabled us to distinctively diferentiate between data
They are also able to ask for multiple pieces of information, in no
values directionally, especially values that were only minimally dif-
particular order. For example, in a visualization containing prices
ferent from each other. We chose this range based on the frequency
of cars by car brands, a screen-reader user may ask:
range of the human voice [6, 58, 75], and by trying several combina-
tions ourselves, fnding a setting that was comfortable for human Tell me the mean, maximum, and standard
ears. We provide examples of sonifed responses in our paper’s deviation.
supplementary materials. Our open-source sonifcation library is The response from VoxLens would be:
available at https://github.com/athersharif/sonifer.
In our work, we used the three common chart types (bar, scatter, I heard you asking about the mean, maximum,
and line) [65], following prior work [69]. All of these chart types use and standard deviation. The mean is $60,000.
a traditional Cartesian coordinate system. Therefore, VoxLens’s The maximum value of price for car brands is
sonifed response is best applicable to graphs represented using a $290,000 belonging to Ferrari. The standard
Cartesian plane. Future work can study sonifcation responses for deviation is 30,000.
graphs that do not employ a Cartesian plane to represent data (e.g., Similarly, users may choose to hear the textual summary or play
polar plots, pie charts, etc.). the sonifed version, as discussed above.

3.2 Usage and Integration
3.2.1 Screen-Reader User. A pain point for screen-reader users when interacting with online data visualizations is that most visualization elements are undiscoverable and incomprehensible by screen readers. In building VoxLens, we ensured that the visualization elements were recognizable and describable by screen readers. Hence, as the very first step, when the screen reader encounters a visualization created with VoxLens, the following is read to users:

    Bar graph with title: [title]. To listen to instructions on how to interact with the graph, press Control + Shift + I or Control + Shift + 4. Key combinations must be pressed all together and in order.

The modifier keys (Control + Shift on Windows, and Option on MacOS) and command keys were selected to not interfere with the dedicated key combinations of the screen reader, the Google Chrome browser, and the operating system. Each command was additionally assigned a numeric activation key, as per suggestions from our participants.

When a user presses the key combination to listen to the instructions, their screen reader announces the following:

    To interact with the graph, press Control + Shift + A or Control + Shift + 1 all together

3.2.2 Visualization Creators. Typically, the accessibility of online data visualizations relies upon visualization creators and their knowledge and practice of accessibility standards. When an alternative text description is not provided, the visualization is useless to screen-reader users. In cases where alternative text is provided, the quality and quantity of the text are also the developer’s choice, which may or may not be adequate for screen-reader users. For example, a common unfortunate practice is to use the title of the visualization as its alternative text, which helps screen-reader users understand the topic of the visualization but does not help them understand the content contained within the visualization. Therefore, VoxLens is designed to reduce the burden and dependency on developers to make accessible visualizations, keeping the interaction consistent, independent of the visualization library used. Additionally, VoxLens is engineered to require only a single line of code, minimizing any barriers to its adoption (Figure 1).

VoxLens supports the following configuration options: “x” (key name of the independent variable), “y” (key name of the dependent variable), “title” (title of the visualization), “xLabel” (label for x-axis), and “yLabel” (label for y-axis). “x,” “y,” and “title” are required parameters, whereas “xLabel” and “yLabel” are optional and default to the key names of “x” and “y,” respectively. VoxLens allows visualization creators to set the values of these configuration options, as shown in Figure 1.
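
The required and optional options listed above can be illustrated with a small validation helper. This is only a sketch: `resolveOptions` is a hypothetical function written for this example and is not part of the VoxLens API, and the sample keys and title are invented.

```javascript
// Hypothetical helper (not part of VoxLens): validate the options described
// above and fill in the documented defaults for the optional labels.
function resolveOptions(options) {
  for (const key of ["x", "y", "title"]) {
    if (options[key] === undefined || options[key] === "") {
      throw new Error(`Missing required option: ${key}`);
    }
  }
  return {
    ...options,
    // xLabel and yLabel default to the key names given in x and y.
    xLabel: options.xLabel ?? options.x,
    yLabel: options.yLabel ?? options.y,
  };
}

const resolved = resolveOptions({ x: "brand", y: "price", title: "Car prices by brand" });
console.log(resolved.xLabel, resolved.yLabel); // brand price
```

Supplying "xLabel" or "yLabel" explicitly overrides the default, while omitting any of the three required options raises an error.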
VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA

3.3 Channeling VoxLens’ Output to Screen Readers
One of the challenges we faced was to channel the auditory response from VoxLens to the screen reader of the user. As noted by our participants during Wizard-of-Oz studies, screen-reader users have unique preferences for their screen readers, including the voice and speed of the speech output. Therefore, it was important for VoxLens to utilize these preferences, providing screen-reader users with a consistent, familiar, and comfortable experience. To relay the output from VoxLens to the screen reader, we created a temporary div element that was only visible to screen readers, positioning it off-screen, following WebAIM’s recommendations [78].

Then, we added the appropriate Accessible Rich Internet Applications (ARIA) attributes [77] to the temporary element to ensure maximum accessibility. ARIA attributes are a set of attributes to make web content more accessible to people with disabilities. Notably, we added the “aria-live” attribute, allowing screen readers to immediately announce the query responses that VoxLens adds to the temporary element. For MacOS, we had to additionally include the “role” attribute, with its value set to “alert.” This approach enabled VoxLens to promptly respond to screen-reader users’ voice-activated commands using their screen readers. After the response from VoxLens is read by the screen reader, a callback function removes the temporary element from the HTML tree to avoid overloading the HTML Document Object Model (DOM).

3.4 Additional Implementation Details
At present, VoxLens only supports two-dimensional data, containing one independent and one dependent variable, as only the interactive experiences of screen-reader users with two-dimensional data visualizations are well-understood [69]. To support data dimensions greater than two, future work would need to investigate the interactive experiences of screen-reader users with n-dimensional data visualizations. VoxLens is customizable and engineered to support additional modifications in the future.

VoxLens relies on the Web Speech API [60], and is therefore only fully functional on browsers with established support for the API such as Google Chrome. JavaScript was naturally our choice of programming language for VoxLens, as VoxLens is a plug-in for JavaScript visualization libraries. Additionally, we used ECMAScript [37] to take advantage of modern JavaScript features such as destructuring assignments, arrow functions, and the spread operator. We also built a testing tool to test VoxLens on data visualizations, using React [45] as the user-interface framework and Node.js [28] as the back-end server, both of which also use JavaScript as their underlying programming language. Additionally, we used GraphQL [29] as the API layer for querying and connecting with our Postgres [34] database, which we used to store data and participants’ interaction logs.

Creating a tool like VoxLens requires significant engineering effort. Our GitHub repository at https://github.com/athersharif/voxlens has a total of 188 commits and 101,104 lines of developed code, excluding comments. To support testing VoxLens on various operating systems and browsers with different screen readers, we collected 30 data sets of varying data points, created their visualizations using Google Charts, D3, and ChartJS, integrated VoxLens with each of them, and deployed a testing website on our server. The testing website was instrumental in ensuring the correct operation of VoxLens under various configurations, bypassing the challenges of setting up a development environment for testers.

3.5 Conflicts with Other Plug-ins
To the best of our knowledge, two kinds of conflicts are possible with VoxLens: key combination conflicts and ARIA attribute conflicts. As mentioned in section 3.2.1, we selected key combinations to avoid conflicts with the dedicated combinations of the screen reader, the Google Chrome browser, and the operating system. However, it is possible that some users might have external plug-ins using key combinations that would conflict with those from VoxLens. Future work could build a centralized configuration management system, enabling users to specify their own key combinations.

VoxLens modifies the “aria-label” attribute of the visualization container element to describe the interaction instructions for VoxLens, as mentioned in section 3.2.1. It is possible that another plug-in may intend to modify the “aria-label” attribute as well, in which case the execution order of the plug-ins will determine which plug-in achieves the final override. The execution order of the plug-ins depends on several external factors [63], and is, unfortunately, a common limitation for any browser plug-in. However, VoxLens does not affect the “aria-labelledby” attribute, allowing other systems to gracefully override the “aria-label” attribute set by VoxLens, as this attribute takes precedence over the “aria-label” attribute in the accessibility tree. Future iterations of VoxLens will attempt to ensure that VoxLens executes last and that the ARIA labels set by other systems are additionally relayed to screen-reader users.

It is important to note that VoxLens’s sonification library is supplied independently from the main VoxLens plug-in and does not follow the same limitations. Our testing did not reveal any conflicts of the sonification library with other plug-ins.

4 EVALUATION METHOD
We evaluated the performance of VoxLens using a mixed-methods approach. Specifically, we conducted an online mixed-factorial experiment with screen-reader users to assess VoxLens quantitatively. Additionally, we conducted follow-up interviews with our participants for a qualitative assessment of VoxLens.

4.1 Participants
Our participants (see Appendix A, Table 6) were 22 screen-reader users, recruited using word-of-mouth, snowball sampling, and email distribution lists for people with disabilities. Nine participants identified as women and 13 as men. Their average age was 45.3 years (SD=16.8). Twenty participants had complete blindness and two participants had partial blindness; nine participants were blind since birth, 12 lost vision gradually, and one became blind due to a brain tumor. The highest level of education attained or in pursuit was a doctoral degree for two participants, a Master’s degree for seven participants, a Bachelor’s degree for eight participants, and a high school diploma for the remaining five participants. Estimated computer usage was more than 5 hours per day for 12 participants,

2-5 hours per day for eight participants, and 1-2 hours per day for two participants. The average frequency of interacting with online data visualizations was over two visualizations per day, usually in the context of news articles, blog posts, and social media.

For the task-based experiment and questionnaire, participants were compensated with a $20 Amazon gift card for 30-45 minutes of their time. For the follow-up interview, they were compensated $10 for 30 minutes of their time. No participant was allowed to partake in the experiment more than once.

4.2 Apparatus
We conducted our task-based experiment online using a user study platform that we created with the JavaScript React framework [45]. We tested our platform with screen-reader users and ourselves, both with and without a screen reader, ensuring maximum and proper accessibility measures. We deployed the experiment platform as a website hosted on our own server.

We analyzed the performance of VoxLens by comparing the data collected from our task-based experiments with that from prior work [69]. To enable a fair comparison to this prior work, we used the same visualization libraries, visualization data set, question categories, and complexity levels. The visualization libraries (Google Charts, ChartJS, and D3) were chosen based on the variation in their underlying implementations as well as their application of accessibility measures. Google Charts utilizes SVG elements to generate the visualization and appends a tabular representation of the data for screen-reader users, by default; D3 also makes use of SVG elements but does not provide a tabular representation; ChartJS uses HTML Canvas to render the visualization as an image and relies on the developers to add alternative text (“alt-text”) and Accessible Rich Internet Applications (“ARIA”) attributes [77]. Therefore, each of these visualization libraries provides a different experience for screen-reader users, as highlighted in prior work [69].

We provide all of the visualizations and data sets used in this work in this paper’s supplementary materials. Readers can reproduce these visualizations using the supplementary materials in conjunction with the source code and examples presented in our open-source GitHub repository. We implemented the visualizations following the WCAG 2.0 guidelines [17] in combination with the official accessibility recommendations from the visualization libraries. For ChartJS, we added the “role” and “aria-label” attributes to the “canvas” element. The “role” attribute had the value of “img,” and the “aria-label” was given the value of the visualization title, as per the official documentation from ChartJS developers [18]. We did not perform any accessibility scaffolding for Google Charts and D3 visualizations, as these visualizations rely on a combination of internal implementations and the features of SVG for accessibility. Our goal was to replicate an accurate representation of how these visualizations currently exist on the Web.

Recent prior work [46] has reported that the non-visual questions that users ask from graphs mainly comprise compositional questions, similar to the findings from Brehmer and Munzner’s task typology [14]. Therefore, our question categories comprised one “Search” action (lookup and locate) and two “Query” actions (identify and compare), similar to prior work [13]. The categories, in ascending order of difficulty, were: (1) Order Statistics (extremum); (2) Symmetry Comparison (comparison of data points); and (3) Chart Type-Specific Questions (value retrieval for bar charts, trend summary for line charts, and correlation for scatter plots). As in prior work [69], all questions were multiple-choice questions with four choices: the correct answer, two incorrect answers, and the option for “Unable to extract information.” The order of the four choices was randomized per trial.

4.3 Procedure
The study was conducted online by participants without direct supervision. The study comprised six stages. The first stage displayed the study purpose, eligibility criteria, and the statement of IRB approval. In the second stage, the participants were asked to fill out a pre-study questionnaire to record their demographic information, screen-reader software, vision-loss level, and diagnosis (see Appendix A, Table 6). Additionally, participants were asked about their education level, daily computer usage, and their frequency of interacting with visualizations.

In the third stage, participants were presented with a step-by-step interactive tutorial to train and familiarize themselves with the modes, features, and commands that VoxLens offers. Additionally, participants were asked questions at each step to validate their understanding. On average, the tutorial took 12.6 minutes (SD=6.8) to complete. Upon successful completion of the tutorial, participants were taken to the fourth stage, which displayed the instructions for completing the study tasks.

In the fifth stage, each participant was given a total of nine tasks. For each task, participants were shown three Web pages: page 1 contained the question to explore, page 2 displayed the question and visualization, and page 3 presented the question with a set of four multiple-choice responses. Figure 3 shows the three pages of an example task. After the completion of the tasks, participants were asked to fill out the NASA-TLX [38] survey in the last stage. An entire study session ranged from 30-45 minutes in duration.

4.4 Design & Analysis
The experiment was a 2 × 3 × 3 × 3 mixed-factorial design with the following factors and levels:
• VoxLens (VX), between-Ss.: {yes, no}
• Visualization Library (VL), within-Ss.: {ChartJS, D3, Google Charts}
• Data Complexity (CMP), within-Ss.: {Low, Medium, High}
• Question Difficulty (DF), within-Ss.: {Low, Medium, High}

For the screen-reader users who did not use VoxLens (VX=no), we used prior work’s data [69] (N=36) as a baseline for comparison. Our two dependent variables were Accuracy of Extracted Information (AEI) and Interaction Time (IT). We used a dichotomous representation of AEI (i.e., “inaccurate” or 0 if the user was unable to answer the question correctly, and “accurate” or 1 otherwise) for our analysis. We used a mixed logistic regression model [32] with the above factors, interactions with VoxLens, and a covariate to control for Age. We also included Subject_r as a random factor to account for repeated measures. The statistical model was therefore AEI ← VX + VX × VL + VX × CMP + VX × DF + Age + Subject_r. We did not include factors for VL, CMP, or DF because our research

questions centered around VoxLens (VX) and our interest in these factors only extended to their possible interactions with VoxLens.

Figure 3: Participants were shown three pages for each task. (a) Page 1 presented the question to explore. (b) Page 2 displayed the same question and a visualization. (c) Page 3 showed the question again with a set of four multiple-choice responses.

For Interaction Time (IT), we used a linear mixed model [30, 54] with the same model terms as for AEI. IT was calculated as the total time of the screen reader’s focus on the visualization element. Participants were tested over three Visualization Library × Complexity (VL × CMP) conditions, resulting in 3×3 = 9 trials per participant. With 21 participants, a total of 21×9 = 189 trials were produced and analyzed for this study. One participant, who was unable to complete the tutorial, was excluded from the analysis.

4.5 Qualitative and Subjective Evaluation
To qualitatively assess the performance of VoxLens, we conducted follow-up interviews with six screen-reader users, randomly selected from our pool of participants who completed the task-based experiment. Similar to prior work [80], we ceased recruitment of participants once we reached saturation of insights.

To analyze our interviews, we used thematic analysis [12] guided by a semantic approach [62]. We used two interviews to develop an initial set of codes, resulting in a total of 23 open codes. Each interview transcript was coded by three researchers independently, and disagreements were resolved through mutual discussions. As suggested by Lombard et al. [51], we calculated inter-rater reliability (IRR) using pairwise percentage agreement together with Krippendorff’s α [49]. To calculate pairwise percentage agreement, we calculated the average pairwise agreement among the three rater pairs across observations. Our pairwise percentage agreement was 94.3%, showing a high agreement between raters. Krippendorff’s α was calculated using ReCal [31] and found to be 0.81, indicating a high level of reliability [50].

In addition to conducting follow-up interviews, we administered the NASA-TLX survey [38] with all participants (N=21) to assess the perceived workload of VoxLens.

5 RESULTS
We present our experiment results using the Accuracy of Extracted Information (AEI) and Interaction Time (IT) for screen-reader users with and without VoxLens. We also present our interview results and the subjective ratings from the NASA-TLX questionnaire [38].

5.1 Accuracy of Extracted Information
Our results show a significant main effect of VoxLens (VX) on AEI (χ²(1, N=57)=38.16, p<.001, Cramer’s V=.14), with VoxLens users achieving 75% accuracy (SD=18.0%) and non-VoxLens users achieving only 34% accuracy (SD=20.1%). This difference constituted a 122% improvement due to VoxLens.

By analyzing the VoxLens (VX) × Visualization Library (VL) interaction, we investigated whether changes in AEI were proportional across visualization libraries for participants in each VoxLens group. The VX × VL interaction was indeed statistically significant (χ²(4, N=57)=82.82, p<.001, Cramer’s V=.20). This result indicates that AEI significantly differed among visualization libraries for participants in each VoxLens group. Figure 4 and Table 3 show AEI percentages for different visualization libraries for each VoxLens group. Additionally, we report our findings in Table 4.

Prior work [69] has reported a statistically significant difference between screen-reader users (SRU) and non-screen-reader users (non-SRU) in terms of AEI, attributing the difference to the inaccessibility of online data visualizations. We conducted a second analysis, investigating whether AEI was different between screen-reader users who used VoxLens and non-screen-reader users when extracting information from online data visualizations. Specifically, we investigated the effect of SRU on AEI but did not find a statistically significant effect (p ≈ .077). This result itself does not provide evidence in support of VoxLens closing the access gap between the two user groups; further experimentation is necessary to confirm or refute this marginal result. In light of VoxLens’s other benefits, however, this is an encouraging trend.

5.2 Interaction Time
Our preliminary analysis showed that the interaction times were conditionally non-normal, determined using Anderson-Darling [4] tests of normality. To achieve normality, we applied logarithmic transformation prior to analysis, as is common practice for time measures [9, 41, 53]. For ease of interpretation, plots of interaction times are shown using the original non-transformed values.
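
A minimal sketch of this kind of logarithmic transformation is shown below. The sample times are invented for illustration, and the use of the natural logarithm is an assumption, since the base is not stated here.

```javascript
// Illustrative only: these interaction times (in seconds) are invented.
const interactionTimes = [12.5, 48.0, 95.3, 210.7];

// Log-transform the times before analysis (natural log assumed).
const logTimes = interactionTimes.map((t) => Math.log(t));

// Exponentiating recovers the original values, which is why plots can
// still be drawn on the original, non-transformed scale.
const recovered = logTimes.map((lt) => Math.exp(lt));

console.log(logTimes.map((v) => v.toFixed(2)));
console.log(recovered.map((v) => v.toFixed(1)));
```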

Figure 4: Accuracy of Extracted Information (AEI), as a percentage, for screen-reader users without (N=36) and with (N=21) VoxLens, by (a) Visualization Library, (b) Complexity Level, and (c) Difficulty Level. The percentage represents the “accurate” answers. Therefore, higher is better. Error bars represent mean ± standard deviation.

                              Without VoxLens          With VoxLens
                              N      AA     AA (%)     N      AA     AA (%)
Overall                       324    109    34%        189    141    75%
Visualization Library (VL)
  ChartJS                     108    12     11%        63     50     79%
  D3                          108    18     17%        63     47     75%
  Google Charts               108    79     73%        63     44     70%
Complexity Level (CMP)
  Low                         108    40     37%        63     52     83%
  Medium                      108    34     31%        63     48     76%
  High                        108    35     32%        63     41     65%
Difficulty Level (DF)
  Low                         108    35     32%        63     58     92%
  Medium                      108    36     33%        63     38     60%
  High                        108    38     35%        63     45     71%

Table 3: Numerical results for the N = 513 questions asked of screen-reader users with and without VoxLens for each level of Visualization Library, Complexity Level, and Difficulty Level. N is the total number of questions asked, AA is the number of “accurate answers,” and AA (%) is the percentage of “accurate answers.”
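
The overall percentages in Table 3 and the relative improvement reported in Section 5.1 can be reproduced from the raw counts. The snippet below is a worked check using the Overall row; the variable names are ours.

```javascript
// Counts taken from the "Overall" row of Table 3.
const withoutVoxLens = { asked: 324, accurate: 109 };
const withVoxLens = { asked: 189, accurate: 141 };

const accWithout = withoutVoxLens.accurate / withoutVoxLens.asked; // about 0.336
const accWith = withVoxLens.accurate / withVoxLens.asked;          // about 0.746

// Relative improvement of VoxLens users over non-VoxLens users.
const improvement = (accWith - accWithout) / accWithout;

console.log(`${Math.round(accWithout * 100)}%`);  // 34%
console.log(`${Math.round(accWith * 100)}%`);     // 75%
console.log(`${Math.round(improvement * 100)}%`); // 122%
```

Using the rounded percentages (75% and 34%) instead of the raw counts would give roughly 121%, so the reported 122% is consistent with the unrounded values.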

VoxLens (VX) had a significant main effect on Interaction Time (IT) (F(1,54)=12.66, p<.05, ηp²=.19). Specifically, the average IT for non-VoxLens users was 84.6 seconds (SD=75.2). For VoxLens users, it was 54.1 seconds (SD=21.9), 36% lower (faster) than for participants without VoxLens.

The VX × VL and VX × DF interactions were both significant (F(4,444)=33.89, p<.001, ηp²=.23 and F(4,444)=14.41, p<.001, ηp²=.12, respectively). Figure 5 shows interaction times across different visualization libraries, difficulty levels, and complexity levels for each VoxLens group. For VoxLens users, all three visualization libraries

                 N     χ²       p        Cramer’s V
VX (VoxLens)     57    38.16    < .001   .14
VX × VL          57    82.82    < .001   .20
VX × CMP         57    8.90     .064     .07
VX × DF          57    17.95    .001     .09
Age              57    3.58     .058     .04

Table 4: Summary results from 57 screen-reader users with (N=21) and without (N=36) VoxLens. “VL” is the visualization library, “CMP” is data complexity, and “DF” is question difficulty. Cramer’s V is a measure of effect size [25]. All results are statistically significant (p < .05) or marginal (.05 ≤ p < .10).

                 df_n   df_d   F        p        ηp²
VX (VoxLens)     1      54     12.66    .001     .19
VX × VL          4      444    33.89    < .001   .23
VX × CMP         4      444    1.85     .118     .02
VX × DF          4      444    14.41    < .001   .12
Age              1      54     5.03     .029     .09

Table 5: Summary results from 57 screen-reader participants with (N=21) and without (N=36) VoxLens using a linear mixed model [30, 54]. “VL” is the visualization library, “CMP” is data complexity, and “DF” is question difficulty. Partial eta-squared (ηp²) is a measure of effect size [20]. All results are statistically significant (p < .05) except VX × CMP.
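
As a sanity check, partial eta-squared can be approximated from an F statistic and its degrees of freedom as ηp² = (F · df_n) / (F · df_n + df_d). This identity holds for standard ANOVA and is used here only as an approximation for the mixed model; the function name is ours. The example uses two effects whose degrees of freedom are stated in the running text: Age, F(1,54)=5.03, and VX × VL, F(4,444)=33.89.

```javascript
// Approximate partial eta-squared from F and its degrees of freedom.
function partialEtaSquared(f, dfNumerator, dfDenominator) {
  return (f * dfNumerator) / (f * dfNumerator + dfDenominator);
}

console.log(partialEtaSquared(5.03, 1, 54).toFixed(2));   // 0.09 (Age)
console.log(partialEtaSquared(33.89, 4, 444).toFixed(2)); // 0.23 (VX × VL)
```

Both values match the effect sizes reported for these two effects.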

resulted in almost identical interaction times. Figure 5 portrays larger variations in interaction times for users who did not use VoxLens (data used from prior work [69]) compared to VoxLens users. We attribute these observed differences to the different underlying implementations of the visualization libraries.

We investigated the effects of Age on IT. Age had a significant effect on IT (F(1,54)=5.03, p<.05, ηp²=.09), indicating that IT differed significantly across the ages of our participants, with participants aged 50 or older showing higher interaction times by about 7% compared to participants under the age of 50. Table 8 (Appendix C) shows the average IT for each age range by VoxLens group. Additionally, we report our findings in Table 5.

Similar to our investigation of the effect of screen-reader users (SRU) on AEI, we examined the main effect of SRU on IT. Our results show that SRU had a significant effect on IT (F(1,54)=48.84, p<.001, ηp²=.48), with non-screen-reader users performing 99% faster than VoxLens users.

5.3 Interview Results
To assess VoxLens qualitatively, we investigated the overall experiences of our participants with VoxLens, the features they found helpful, the challenges they faced during the interaction, and the improvements and future features that could enhance the performance of VoxLens. We identified five main results from analyzing our participants’ feedback about VoxLens: (1) a positive step forward in making online data visualizations accessible, (2) interactive dialogue is one of the “top” features, (3) sonification helps in “visualizing” data, (4) data summary is a good starting point, and (5) one-size-fits-all is not the optimal solution. We present each of these in turn.

5.3.1 A Positive Step Forward in Making Online Data Visualizations Accessible. All participants found VoxLens to be an overall helpful tool to interact with and quickly extract information from online data visualizations. For example, S1 and S3 expressed their excitement about VoxLens:

    I have never been able to really interact with graphs before online. So without the tool, I am not able to have that picture in my head about what the graph looks like. I mean, like, especially when looking up news articles or really any, sort of, like, social media, there’s a lot of visual representations and graphs and pictographs that I don’t have access to so I could see myself using [VoxLens] a lot. The tool is really great and definitely a positive step forward in creating accessible graphs and data. (S1)

Figure 5: Interaction Times (IT), in seconds, for screen-reader users without (N=36) and with (N=21) VoxLens by (a) Visualization Library (VL), (b) Data Complexity Level (CMP), and (c) Question Difficulty Level (DF). Error bars represent mean ± standard deviation. Lower is better (faster).

    Oh, [VoxLens] was outstanding. It’s definitely a great way to visualize the graphs if you can’t see them in the charts. I mean, it’s just so cool that this is something that allows a blind person to access a graph and a chart and be able to parse data from it. (S3)

Participants highlighted that VoxLens contributes to bridging the access gap between screen-reader- and non-screen-reader users. As S4 said:

    So, as a sighted person looks at a graph and as they can tell where the peak is or which one has the most or whatever, we want to be able to do that quickly as well. And even if there is a text description under the graph, and I’ve not seen that very much, you have to read through everything to find a certain piece of information that you’re looking for. [Using VoxLens], I can find out specific pieces of information without having to read an entire page of text. (S4)

Additionally, participants identified that VoxLens enables them to quickly extract information from online data visualizations. S5 shared his experiences:

    Again, you know, [VoxLens] helps you find data a little bit quicker than navigating with a screen reader, and it’ll give you a brief idea of what the data is about before you start digging deeper into it. (S5)

The findings from our first result show that VoxLens contributes to reducing the access gap for screen-reader users, and is a positive step forward, enabling screen-reader users to interact with and explore online data visualizations.

5.3.2 Interactive Dialogue is One of the “Top” Features. Similar to our first finding, all the participants found the question-and-answer mode of VoxLens a fast and efficient way to extract information from online data visualizations. S2 considered the question-and-answer mode as one of the key features of VoxLens:

    So I believe that one of the really top features is, kind of, interactive dialogue. (S2)

Similarly, S1 found the question-and-answer mode a fast and reliable way to extract information, requiring “a lot less brain power.” She said:

    I especially liked the part of the tool where you can ask it a question and it would give you the information back. I thought it was brilliant actually. I felt like being able to ask it a question made everything go a lot faster and it took a lot less brain power I think. I felt really confident about the answers that it was giving back to me. (S1)

S3 noted the broader utility and applicability of the question-and-answer mode:

    The voice activation was very, very neat. I’m sure it could come in handy for a variety of uses too. I definitely enjoyed that feature. (S3)

S5 faced some challenges in activating the right command but was able to learn the usage of the question-and-answer mode in a few tries:

    You know, sometimes the word was wrong and I think it says something like, it didn’t understand, but basically eventually I got it right. (S5)

Our second finding indicates that VoxLens’ question-and-answer mode is a fast, efficient, and reliable way for screen-reader users to extract information. Additionally, the feedback from the question-and-answer mode assists screen-reader users to resolve the challenges by themselves within a few tries.

5.3.3 Sonification Helps in “Visualizing” Data. Our third result reveals that our participants found sonification helpful in understanding general trends in the data. Specifically, participants were able to infer whether an overall trend was increasing or decreasing, obtaining holistic information about the data. S2 said:

    The idea of sonification of the graph could give a general understanding of the trends. The way that it could summarize the charts was really nice too. The sonification feature was amazing. (S2)

S1, who had never used sonification before, expressed her initial struggles interpreting a sonified response but was able to “visualize” the graph through sonification within a few tries. She said:

    The audio graph... I’d never used one before, so I kind of struggled with that a little bit because I wasn’t sure if the higher pitch meant the bar was higher up in the graph or not. But being able to visualize the graph with this because of the sound was really helpful. (S1)

Overall, our third result shows that sonification is a helpful feature for screen-reader users to interact with data visualizations, providing them with holistic information about data trends.

5.3.4 Data Summary is a Good Starting Point. In keeping with findings from prior work [69], our fourth finding indicates that screen-reader users first seek to obtain a holistic overview of the data, finding a data summary to be a good starting point for visualization exploration. The summary mode of VoxLens enabled our participants to quickly get a “general picture” of the data. S1 and S4 expressed the benefits of VoxLens’ summary mode:

    I thought the summary feature was really great just to get, like, a general picture

5.3.5 One-Size-Fits-All Is Not the Optimal Solution. To enhance the usability of and interaction experience with VoxLens, our participants identified the need to cater to the individual preferences of the screen-reader users. For example, S3 recognized the need to have multiple options to “play” with the sonified response:

    So I was just thinking maybe, you know, that could be some sort of option or like an alternate way to sonify it. Perhaps having an option to do it as continuous cause I noticed, like, they were all discrete. ’Cause sometimes, you know, it’s just preference or that could be something that could add some usability. It’s just some little things to maybe play with or to maybe give an option or something. (S3)

Similarly, S4 was interested in VoxLens predicting what she was going to ask using artificial intelligence (A.I.). She said:

    You know, I think that [VoxLens] would need a lot more artificial intelligence. It could be a lot [more] intuitive when it comes to understanding what I’m going to ask. (S4)

Additionally, S2 suggested adding setting preferences for the summary and the auditory sonified output:

    [Summary mode] could eventually become a setting preference or something that can be disabled. And you, as a screen-reader user, could not control the speed of the [sonification] to you. To go faster or to go slower, even as a blind person, would be [helpful]. (S2)

Our findings indicate that a one-size-fits-all solution is not optimal and instead, a personalizable solution should be provided, a notion supported by recent work in ability-based design [79]. We are working to incorporate the feedback and suggestions from our participants into VoxLens.

5.4 Subjective Workload Ratings
We used the NASA Task Load Index (TLX) [38] workload questionnaire to collect subjective ratings for VoxLens. The NASA-TLX instrument asks participants to rate the workload of a task on six scales: mental demand, physical demand, temporal demand, performance, effort, and frustration. Each scale ranges from low (1) to high (20). We further classified the scale into four categories for a score x: low (x < 6), somewhat low (6 ≤ x < 11), somewhat high (11 ≤ x < 16), and high (16 ≤ x). Our results indicate that
and then diving deeper with the other
VoxLens requires low physical- (M=3.4, SD=3.3) and temporal de-
features to get a more detailed image in
mand (M=5.7, SD=3.8), and has high perceived performance (M=5.6,
my head about what the graphs look like. (S1)
SD=5.6). Mental demand (M=7.8, SD=4.4), efort (M=9.9, SD=6.1),
So, um, the summary option was a good start and frustration (M=8.3, SD=6.6) were somewhat low.
point to know, okay, what is, kind of, on Prior work [69], which is the source of our data for screen-reader
the graph. (S4) users who did not use VoxLens, did not conduct a NASA-TLX sur-
Our fourth result indicates that VoxLens’ summary mode as- vey with their participants. Therefore, a direct workload compari-
sisted screen-reader users to holistically explore the online data son is not possible. However, the subjective ratings from our study
visualizations, helping them in determining if they want to dig could serve as a control for comparisons in future work attempting
deeper into the data. to make online visualizations accessible for screen-reader users.
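The four-way binning of NASA-TLX scores described in Section 5.4 can be sketched in a few lines of JavaScript. This is a minimal illustration, not part of VoxLens: the function name `classifyTlxScore` and the `reportedMeans` object are hypothetical, and only the thresholds and mean values come from the text. Note that the performance mean falls in the "low" bin; this matches the reported "high perceived performance" if one assumes the usual NASA-TLX convention in which a lower performance rating means better perceived performance.

```javascript
// Hypothetical helper (not from VoxLens): bins a NASA-TLX score on the
// 1..20 scale into the four categories defined in Section 5.4.
function classifyTlxScore(x) {
  if (!Number.isFinite(x) || x < 1 || x > 20) {
    throw new RangeError("NASA-TLX scores are expected to lie in [1, 20]");
  }
  if (x < 6) return "low"; // x < 6
  if (x < 11) return "somewhat low"; // 6 <= x < 11
  if (x < 16) return "somewhat high"; // 11 <= x < 16
  return "high"; // 16 <= x
}

// Mean ratings as reported in Section 5.4.
const reportedMeans = {
  mentalDemand: 7.8,
  physicalDemand: 3.4,
  temporalDemand: 5.7,
  performance: 5.6, // assumed reverse-scaled: lower means better perceived performance
  effort: 9.9,
  frustration: 8.3,
};

// Map every scale to its category label.
const categories = Object.fromEntries(
  Object.entries(reportedMeans).map(([scale, mean]) => [
    scale,
    classifyTlxScore(mean),
  ])
);

console.log(categories);
```

Under these thresholds, physical demand, temporal demand, and performance land in the "low" bin, while mental demand, effort, and frustration land in "somewhat low", in agreement with the categories stated above.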

6 DISCUSSION
In this work, we created VoxLens, an interactive JavaScript plug-in to make online data visualizations more accessible to screen-reader users. This work has been guided by the recommendations and findings from prior work [69] that highlight the barriers screen-reader users face in accessing the information contained in online data visualizations. In creating VoxLens, we sought to improve the accessibility of online data visualizations by making them discoverable and comprehensible to screen readers, and by enabling screen-reader users to explore the data both holistically and in a drilled-down manner. To achieve this, we designed three modes of VoxLens: (1) Question-and-Answer mode, (2) Summary mode, and (3) Sonification mode. Our task-based experiments show that screen-reader users extracted information 122% more accurately and spent 36% less time when interacting with online data visualizations using VoxLens than without. Additionally, we observed that screen-reader users utilized VoxLens uniformly across all visualization libraries that were included in our experiments, irrespective of the underlying implementations and accessibility measures of the libraries, achieving a consistent interaction.

6.1 Simultaneous Improvement In Accuracy and Interaction Times

Prior work [69] has reported that due to the inaccessibility of online data visualizations, screen-reader users extract information 62% less accurately than non-screen-reader users. We found that VoxLens improved the accuracy of information extraction of screen-reader users by 122%, reducing the information extraction gap between the two user groups from 62% to 15%. However, in terms of interaction time, while VoxLens reduced the gap from 211% to 99%, the difference is still statistically significant between non-screen-reader and VoxLens users. Non-screen-reader users utilize their visual system’s power to quickly recognize patterns and extrapolate information from graphs, such as overall trends and extrema [57]. In contrast, screen-reader users rely on alternative techniques, such as sonification, to understand data trends. However, hearing a sonified version of the data can be time-consuming, especially when the data cardinality is large, contributing to the difference in the interaction times between the two user groups. Additionally, issuing a voice command, pressing a key combination, and the duration of the auditory response can also contribute to the observed difference. However, it is worth emphasizing that screen-reader users who used VoxLens improved their interaction time by 36% while also increasing their accuracy of extracted information by 122%. In other words, VoxLens users became both faster and more accurate, a fortunate outcome often hard to realize in human performance studies due to speed-accuracy tradeoffs.

6.2 Role of Voice-Recognition Technology
For screen-reader users who used VoxLens, 75% (N=141) of the answers were correct and 11% (N=20) were incorrect. Our participants were unable to extract the answers from the remaining 15% (N=28) of the questions. Further exploration revealed that among the 25% (N=48) of the questions that were not answered correctly, 52% (N=25) involved symmetry comparison. Symmetry comparison requires value retrieval of multiple data points and relies on the effectiveness of voice-recognition technology. Additional investigation showed that to answer the questions in our experiment tasks, screen-reader users utilized the Question-and-Answer mode 71.9% of the time, compared to the Summary (22.5%) and Sonification (5.5%) modes. Out of the 71.9% Question-and-Answer mode usage, VoxLens accurately recognized and responded to commands 49.9% of the time; 34% of the time VoxLens was unable to accurately parse the speech input, and the remaining 16.1% of the time VoxLens received commands that were not supported (e.g., “correlation coefficient”). VoxLens uses the Web Speech API [60] for recognizing voice commands. While the Web Speech API is a great leap forward in terms of speech-input and text-to-speech output features [60], it is still an experimental feature with limited performance of about 70% accuracy [64]. Therefore, future work could evaluate the performance of VoxLens with alternatives to the Web Speech API.

6.3 Qualitative Assessment of VoxLens
All six screen-reader users we interviewed expressed that VoxLens significantly improved their current experiences with online data visualizations. Participants showed their excitement about VoxLens assisting them in “visualizing” the data and in extracting information from important visualizations, such as the ones portraying COVID-19 statistics. Furthermore, some of our participants highlighted that VoxLens reduces the access gap between screen-reader and non-screen-reader users. For example, S4 mentioned that with the help of VoxLens, she was able to “find out specific pieces of information without having to read an entire page of text,” similar to how a “sighted person” would interact with the graph. Additionally, our participants found VoxLens “pretty easy,” “meaningful,” “smooth,” and “intuitive,” without requiring a high mental demand.

6.4 VoxLens is a Response to Call-to-Action for Inclusive Data Visualizations
Taking these findings together, VoxLens is a response to the call-to-action put forward by Marriott et al. [57] that asserts the need to improve accessibility for disabled people disenfranchised by existing data visualizations and tools. VoxLens is an addition to the tools and systems designed to make the Web an equitable place for screen-reader users, aiming to bring their experiences on a par with that of non-screen-reader users. Through effective advertisement, and by encouraging developers to integrate VoxLens into the codebase of visualization libraries, we hope to broadly expand the reach and impact of VoxLens. Additionally, through collecting anonymous usage logs (VoxLens modes used, commands issued, and responses issued) and feedback from users, a feature already implemented in VoxLens, we aspire to continue improving the usability and functionality of VoxLens for a diverse group of users.

7 LIMITATIONS & FUTURE WORK
At present, VoxLens is limited to two-dimensional data visualizations with a single series of data. Future work could study the experiences of screen-reader users with n-dimensional data visualizations and multiple series of data, and extend the functionality of VoxLens based on the findings. Additionally, VoxLens is currently only fully functional on Google Chrome, as the support for the Web Speech API’s speech recognition is currently limited to Google
Chrome. Future work could consider alternatives to the Web Speech API that offer cross-browser support for speech recognition.

Our findings showed that some of our participants preferred to have the ability to control the speed, frequency, and waveform of the sonified response. Therefore, future work could extend the functionality of VoxLens by connecting it to a centralized configuration management system, enabling screen-reader users to specify their preferences. These preferences could then be used to generate appropriate responses, catering to the individual needs of screen-reader users.

8 CONCLUSION
We presented VoxLens, a JavaScript plug-in that improves the accessibility of online data visualizations, enabling screen-reader users to extract information using a multi-modal approach. In creating VoxLens, we sought to address the challenges screen-reader users face with online data visualizations by enabling them to extract information both holistically and in a drilled-down manner, using techniques and strategies that they prefer. Specifically, VoxLens provides three modes of interaction using speech and sonification: Question-and-Answer mode, Summary mode, and Sonification mode.

To assess the performance of VoxLens, we conducted task-based experiments and interviews with screen-reader users. VoxLens significantly improved the interaction experiences of screen-reader users with online data visualizations, both in terms of accuracy of extracted information and interaction time, compared to their conventional interaction with online data visualizations. Our results also show that screen-reader users considered VoxLens to be a “game-changer,” providing them with “exciting new ways” to interact with online data visualizations and saving them time and effort. We hope that by open-sourcing our code for VoxLens and our sonification solution, our work will inspire developers and visualization creators to continually improve the accessibility of online data visualizations. We also hope that our work will motivate and guide future research in making data visualizations accessible.

ACKNOWLEDGMENTS

This work was supported in part by the Mani Charitable Foundation, the University of Washington Center for an Informed Public, and the University of Washington Center for Research and Education on Accessible Technology and Experiences (CREATE). We extend our gratitude to the AccessComputing staff for their support and assistance in recruiting participants. We would also like to thank the anonymous reviewers for their helpful comments and suggestions. Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and do not necessarily reflect those of any supporter.

REFERENCES
[1] Chadia Abras, Diane Maloney-Krichmar, Jenny Preece, et al. 2004. User-centered design. Bainbridge, W. Encyclopedia of Human-Computer Interaction. Thousand Oaks: SAGE Publications 37, 4 (2004), 445–456.
[2] NV Access. n.d. NV Access | Download. https://www.nvaccess.org/download/. (Accessed on 08/08/2021).
[3] Dragan Ahmetovic, Cristian Bernareggi, João Guerreiro, Sergio Mascetti, and Anna Capietto. 2019. Audiofunctions.web: Multimodal exploration of mathematical function graphs. In Proceedings of the 16th International Web for All Conference. 1–10.
[4] Theodore W. Anderson and Donald A. Darling. 1954. A test of goodness of fit. Journal of the American statistical association 49, 268 (1954), 765–769.
[5] Apple. n.d. Audio Graphs | Apple Developer Documentation. https://developer.apple.com/documentation/accessibility/audio_graphs. (Accessed on 08/01/2021).
[6] Ronald J. Baken and Robert F. Orlikoff. 2000. Clinical Measurement of Speech and Voice. Singular Thomson Learning, San Diego, California, USA. https://books.google.com/books?id=ElPyvaJbDiwC
[7] Nikola Banovic, Rachel L. Franz, Khai N. Truong, Jennifer Mankoff, and Anind K. Dey. 2013. Uncovering Information Needs for Independent Spatial Learning for Users Who Are Visually Impaired. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (Bellevue, Washington) (ASSETS ’13). Association for Computing Machinery, New York, NY, USA, Article 24, 8 pages. https://doi.org/10.1145/2513383.2513445
[8] Katja Battarbee. 2004. Co-experience: understanding user experiences in interaction. Aalto University, Helsinki, Finland. 103 + app. 117 pages. http://urn.fi/URN:ISBN:951-558-161-3
[9] Donald A. Berry. 1987. Logarithmic Transformations in ANOVA. Biometrics 43, 2 (1987), 439–456. http://www.jstor.org/stable/2531826
[10] Jacques Bertin. 1983. Semiology of graphics; diagrams networks maps. Technical Report.
[11] Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3 data-driven documents. IEEE transactions on visualization and computer graphics 17, 12 (2011), 2301–2309.
[12] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101.
[13] Matthew Brehmer, Bongshin Lee, Petra Isenberg, and Eun Kyoung Choe. 2018. Visualizing ranges over time on mobile phones: a task-based crowdsourced evaluation. IEEE transactions on visualization and computer graphics 25, 1 (2018), 619–629.
[14] Matthew Brehmer and Tamara Munzner. 2013. A multi-level typology of abstract visualization tasks. IEEE transactions on visualization and computer graphics 19, 12 (2013), 2376–2385.
[15] Craig Brown and Amy Hurst. 2012. VizTouch: Automatically Generated Tactile Visualizations of Coordinate Spaces. In Proceedings of the Sixth International Conference on Tangible, Embedded and Embodied Interaction (Kingston, Ontario, Canada) (TEI ’12). Association for Computing Machinery, New York, NY, USA, 131–138. https://doi.org/10.1145/2148131.2148160
[16] Lorna M. Brown, Stephen A. Brewster, Ramesh Ramloll, Mike Burton, and Beate Riedel. 2003. Design guidelines for audio presentation of graphs and tables. In Proceedings of the 9th International Conference on Auditory Display. Citeseer, Boston University, USA, 284–287.
[17] Ben Caldwell, Michael Cooper, Loretta Guarino Reid, Gregg Vanderheiden, Wendy Chisholm, John Slatin, and Jason White. 2008. Web content accessibility guidelines (WCAG) 2.0.
[18] ChartJS. n.d. Accessibility | Chart.js. https://www.chartjs.org/docs/3.5.1/general/accessibility.html. (Accessed on 01/08/2022).
[19] Peter Ciuha, Bojan Klemenc, and Franc Solina. 2010. Visualization of Concurrent Tones in Music with Colours. In Proceedings of the 18th ACM International Conference on Multimedia (Firenze, Italy) (MM ’10). Association for Computing Machinery, New York, NY, USA, 1677–1680. https://doi.org/10.1145/1873951.1874320
[20] Jacob Cohen. 1973. Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and psychological measurement 33, 1 (1973), 107–112.
[21] Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. 1993. Wizard of Oz studies—why and how. Knowledge-based systems 6, 4 (1993), 258–266.
[22] Patrick Dengler, Anthony Grasso, Chris Lilley, Cameron McCormack, Doug Schepers, and Jonathan Watt. 2011. Scalable Vector Graphics (SVG) 1.1.
[23] Google Developers. 2014. Charts. https://developers.google.com/chart/
[24] Roger L. DuBois. 2017. Web Audio Speech Synthesis / Recognition for p5.js. https://github.com/IDMNYU/p5.js-speech
[25] Christopher J. Ferguson. 2016. An effect size primer: A guide for clinicians and researchers. In Methodological issues and strategies in clinical research, A.E. Kazdin (Ed.). American Psychological Association, Washington, DC, USA, 301–310.
[26] John H. Flowers. 2005. Thirteen years of reflection on auditory graphing: Promises, pitfalls, and potential new directions. In Proceedings of the 11th Meeting of the International Conference on Auditory Display. Citeseer, Limerick, Ireland, 406–409.
[27] John H. Flowers, Dion C. Buhman, and Kimberly D. Turnage. 1997. Cross-modal equivalence of visual and auditory scatterplots for exploring bivariate data samples. Human Factors 39, 3 (1997), 341–351.
[28] OpenJS Foundation. n.d. Node.js. https://nodejs.org/en/. (Accessed on 08/08/2021).
[29] The GraphQL Foundation. n.d. GraphQL | A query language for your API. https://graphql.org/. (Accessed on 08/08/2021).
[30] Brigitte N. Frederick. 1999. Fixed-, random-, and mixed-effects ANOVA models: A user-friendly guide for increasing the generalizability of ANOVA results. In Advances in Social Science Methodology, B. Thompson (Ed.). JAI Press, Stamford, Connecticut, 111–122.

[31] Deen G. Freelon. 2010. ReCal: Intercoder reliability calculation as a web service. International Journal of Internet Science 5, 1 (2010), 20–33.
[32] Arthur Gilmour, Robert D. Anderson, and Alexander L. Rae. 1985. The analysis of binomial data by a generalized linear mixed model. Biometrika 72, 3 (1985), 593–599.
[33] Nicholas A. Giudice, Hari Prasath Palani, Eric Brenner, and Kevin M. Kramer. 2012. Learning Non-Visual Graphical Information Using a Touch-Based Vibro-Audio Interface. In Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (Boulder, Colorado, USA) (ASSETS ’12). Association for Computing Machinery, New York, NY, USA, 103–110. https://doi.org/10.1145/2384916.2384935
[34] The PostgreSQL Global Development Group. n.d. PostgreSQL: The world’s most advanced open source database. https://www.postgresql.org/. (Accessed on 08/08/2021).
[35] Melita Hajdinjak and France Mihelic. 2004. Conducting the Wizard-of-Oz Experiment. Informatica (Slovenia) 28, 4 (2004), 425–429.
[36] Patrick A.V. Hall and Geoff R. Dowling. 1980. Approximate string matching. ACM computing surveys (CSUR) 12, 4 (1980), 381–402.
[37] Jordan Harband, Shu-yu Guo, Michael Ficarra, and Kevin Gibbons. 1999. Standard ecma-262.
[38] Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, North-Holland, Netherlands, 139–183.
[39] Highcharts. n.d. Sonification | Highcharts. https://www.highcharts.com/docs/accessibility/sonification. (Accessed on 08/01/2021).
[40] Clare J. Hooper. 2011. Towards Designing More Effective Systems by Understanding User Experiences. SIGWEB Newsl. Autumn, 4, Article 4 (Sept. 2011), 3 pages. https://doi.org/10.1145/2020936.2020940
[41] Mike H. Hoyle. 1973. Transformations: An Introduction and a Bibliography. International Statistical Review / Revue Internationale de Statistique 41, 2 (1973), 203–223. http://www.jstor.org/stable/1402836
[42] Weijian Hu, Kaiwei Wang, Kailun Yang, Ruiqi Cheng, Yaozu Ye, Lei Sun, and Zhijie Xu. 2020. A comparative study in real-time scene sonification for visually impaired people. Sensors 20, 11 (2020), 3222.
[43] Amy Hurst and Shaun Kane. 2013. Making "Making" Accessible. In Proceedings of the 12th International Conference on Interaction Design and Children (New York, New York, USA) (IDC ’13). Association for Computing Machinery, New York, NY, USA, 635–638. https://doi.org/10.1145/2485760.2485883
[44] Apple Inc. n.d. Accessibility - Vision - Apple. https://www.apple.com/accessibility/vision/. (Accessed on 08/08/2021).
[45] Facebook Inc. n.d. React – A JavaScript library for building user interfaces. https://reactjs.org/. (Accessed on 08/08/2021).
[46] Dae Hyun Kim, Enamul Hoque, and Maneesh Agrawala. 2020. Answering Questions about Charts and Generating Visual Explanations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376467
[47] Edward Kim and Kathleen F McCoy. 2018. Multimodal deep learning using images and text for information graphic classification. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 143–148.
[48] Edward Kim, Connor Onweller, and Kathleen F McCoy. 2021. Information Graphic Summarization using a Collection of Multimodal Deep Neural Networks. In 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 10188–10195.
[49] Klaus Krippendorff. 2011. Computing Krippendorff’s Alpha-Reliability. Retrieved from https://repository.upenn.edu/asc_papers/43
[50] Klaus Krippendorff. 2018. Content analysis: An introduction to its methodology. SAGE Publications Inc., Pennsylvania, USA.
[51] Richard J. Landis and Gary G. Koch. 1977. The Measurement of Observer Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174. http://www.jstor.org/stable/2529310
[52] Bongshin Lee, Arjun Srinivasan, Petra Isenberg, John Stasko, et al. 2021. Post-WIMP Interaction for Information Visualization. Foundations and Trends® in Human-Computer Interaction 14, 1 (2021), 1–95.
[53] Eckhard Limpert, Werner A. Stahel, and Markus Abbt. 2001. Log-normal distributions across the sciences: keys and clues: on the charms of statistics, and how mechanical models resembling gambling machines offer a link to a handy way to characterize log-normal distributions, which can provide deeper insight into variability and probability—normal or log-normal: that is the question. BioScience 51, 5 (2001), 341–352.
[54] Ramon C. Littell, Henry P. Raymond, and Clarence B. Ammerman. 1998. Statistical analysis of repeated measures data using SAS procedures. Journal of animal science 76, 4 (1998), 1216–1231.
[55] Alan Lundgard and Arvind Satyanarayan. 2021. Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE transactions on visualization and computer graphics (2021).
[56] Yotam Mann. n.d. Tone.js. https://tonejs.github.io/. (Accessed on 08/02/2021).
[57] Kim Marriott, Bongshin Lee, Matthew Butler, Ed Cutrell, Kirsten Ellis, Cagatay Goncu, Marti Hearst, Kathleen McCoy, and Danielle Albers Szafir. 2021. Inclusive data visualization for people with disabilities: a call to action. Interactions 28, 3 (2021), 47–51.
[58] David K. McGookin and Stephen A. Brewster. 2006. SoundBar: Exploiting Multiple Views in Multimodal Graph Browsing. In Proceedings of the 4th Nordic Conference on Human-Computer Interaction: Changing Roles (Oslo, Norway) (NordiCHI ’06). Association for Computing Machinery, New York, NY, USA, 145–154. https://doi.org/10.1145/1182475.1182491
[59] Silvia Mirri, Silvio Peroni, Paola Salomoni, Fabio Vitali, and Vincenzo Rubano. 2017. Towards accessible graphs in HTML-based scientific articles. In 2017 14th IEEE Annual Consumer Communications Networking Conference (CCNC). IEEE, Las Vegas, NV, USA, 1067–1072. https://doi.org/10.1109/CCNC.2017.7983287
[60] André Natal, Glen Shires, and Philip Jägenstedt. n.d. Web Speech API. https://wicg.github.io/speech-api/. (Accessed on 08/07/2021).
[61] Keita Ohshiro, Amy Hurst, and Luke DuBois. 2021. Making Math Graphs More Accessible in Remote Learning: Using Sonification to Introduce Discontinuity in Calculus. In The 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 1–4.
[62] Michael Q. Patton. 1990. Qualitative evaluation and research methods. SAGE Publications Inc., Saint Paul, MN, USA.
[63] Pablo Picazo-Sanchez, Juan Tapiador, and Gerardo Schneider. 2020. After you, please: browser extensions order attacks and countermeasures. International Journal of Information Security 19, 6 (2020), 623–638.
[64] Ricardo Sousa Rocha, Pedro Ferreira, Inês Dutra, Ricardo Correia, Rogerio Salvini, and Elizabeth Burnside. 2016. A Speech-to-Text Interface for MammoClass. In 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, Dublin, Ireland and Belfast, Northern Ireland, 1–6. https://doi.org/10.1109/CBMS.2016.25
[65] Bahador Saket, Alex Endert, and Çağatay Demiralp. 2018. Task-based effectiveness of basic visualizations. IEEE transactions on visualization and computer graphics 25, 7 (2018), 2505–2512.
[66] Nik Sawe, Chris Chafe, and Jeffrey Treviño. 2020. Using Data Sonification to Overcome Science Literacy, Numeracy, and Visualization Barriers in Science Communication. Frontiers in Communication 5 (2020), 46.
[67] Anastasia Schaadhardt, Alexis Hiniker, and Jacob O. Wobbrock. 2021. Understanding Blind Screen-Reader Users’ Experiences of Digital Artboards. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 270, 19 pages. https://doi.org/10.1145/3411764.3445242
[68] Freedom Scientific. n.d. JAWS® – Freedom Scientific. https://www.freedomscientific.com/products/software/jaws/. (Accessed on 08/08/2021).
[69] Ather Sharif, Sanjana Chintalapati, Jacob O. Wobbrock, and Katharina Reinecke. 2021. Understanding Screen-Reader Users’ Experiences with Online Data Visualizations. In The 23rd International ACM SIGACCESS Conference on Computers and Accessibility (Virtual Event) (ASSETS ’21). Association for Computing Machinery, New York, NY, USA, To Appear.
[70] Ather Sharif and Babak Forouraghi. 2018. evoGraphs — A jQuery plugin to create web accessible graphs. In 2018 15th IEEE Annual Consumer Communications Networking Conference (CCNC). IEEE, Las Vegas, NV, USA, 1–4. https://doi.org/10.1109/CCNC.2018.8319239
[71] Lei Shi, Idan Zelzer, Catherine Feng, and Shiri Azenkot. 2016. Tickers and Talker: An Accessible Labeling Toolkit for 3D Printed Models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 4896–4907. https://doi.org/10.1145/2858036.2858507
[72] Boris Smus. 2013. Web Audio API: Advanced Sound for Games and Interactive Apps. O’Reilly Media, California, USA. https://books.google.com/books?id=eSPyRuL8b7UC
[73] Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M. Drucker, and John Stasko. 2021. Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 464, 10 pages. https://doi.org/10.1145/3411764.3445400
[74] Jonathan E. Thiele, Michael S. Pratte, and Jeffrey N. Rouder. 2011. On perfect working-memory performance with large numbers of items. Psychonomic Bulletin & Review 18, 5 (2011), 958–963.
[75] Ingo R. Titze and Daniel W. Martin. 1998. Principles of voice production.
[76] Frances Van Scoy, Don McLaughlin, and Angela Fullmer. 2005. Auditory augmentation of haptic graphs: Developing a graphic tool for teaching precalculus skill to blind students. In Proceedings of the 11th Meeting of the International Conference on Auditory Display, Vol. 5. Citeseer, Limerick, Ireland, 5 pages.
[77] W3C. n.d. WAI-ARIA Overview | Web Accessibility Initiative (WAI) | W3C. https://www.w3.org/WAI/standards-guidelines/aria/. (Accessed on 04/11/2021).
[78] WebAIM. n.d. WebAIM: CSS in Action - Invisible Content Just for Screen Reader Users. https://webaim.org/techniques/css/invisiblecontent/. (Accessed on 09/01/2021).

[79] Jacob O. Wobbrock, Krzysztof Z. Gajos, Shaun K. Kane, and Gregg C. Vanderheiden. 2018. Ability-based design. Commun. ACM 61, 6 (2018), 62–71.
[80] Susan P. Wyche and Rebecca E. Grinter. 2009. Extraordinary Computing: Religion as a Lens for Reconsidering the Home. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 749–758. https://doi.org/10.1145/1518701.1518817
[81] Wai Yu, Ramesh Ramloll, and Stephen Brewster. 2001. Haptic graphs for blind computer users. In Haptic Human-Computer Interaction, Stephen Brewster and Roderick Murray-Smith (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 41–51.
[82] Mingrui Ray Zhang, Ruolin Wang, Xuhai Xu, Qisheng Li, Ather Sharif, and Jacob O. Wobbrock. 2021. Voicemoji: Emoji Entry Using Voice for Visually Impaired People. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 37, 18 pages. https://doi.org/10.1145/3411764.3445338
[83] Haixia Zhao, Catherine Plaisant, Ben Shneiderman, and Jonathan Lazar. 2008. Data sonification for users with visual impairment: a case study with georeferenced data. ACM Transactions on Computer-Human Interaction (TOCHI) 15, 1 (2008), 1–28.

A PARTICIPANT DEMOGRAPHICS

ID   Gender  Age  Screen Reader  Vision-Loss Level                          Diagnosis
P1   M       28   NVDA           Blind since birth, Complete blindness      Optic Nerve Hypoplasia
P2   M       61   JAWS           Complete blindness, Lost vision gradually  Optic Neuropathy
P3   M       48   JAWS           Complete blindness, Lost vision gradually  Leber Congenital Amaurosis
P4   F       29   NVDA           Blind since birth, Complete blindness      Optic Nerve Hypoplasia and Glaucoma
P5   F       37   JAWS           Blind since birth, Complete blindness      Leber Congenital Amaurosis
P6   F       51   JAWS           Blind since birth, Complete blindness      Retinopathy of Prematurity
P7   M       58   JAWS           Complete blindness, Lost vision gradually  Glaucoma
P8   M       30   NVDA           Blind since birth, Complete blindness      Leber Congenital Amaurosis
P9   F       64   JAWS           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
P10  F       68   Fusion         Lost vision gradually, Partial blindness   Stargardt’s Maculopathy
P11  F       73   JAWS           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
P12  F       64   JAWS           Complete blindness, Lost vision gradually  Cataracts
P13  M       18   NVDA           Complete blindness                         Brain Tumor
P14  M       36   JAWS           Blind since birth, Complete blindness      Leber Congenital Amaurosis
P15  M       25   NVDA           Lost vision gradually, Partial vision      Retinopathy of Prematurity and Subsequent Cataracts
P16  M       42   JAWS           Blind since birth, Complete blindness      Microphthalmia
P17  M       68   JAWS           Complete blindness, Lost vision gradually  Detached Retinas
P18  F       31   NVDA           Blind since birth, Complete blindness      Retinopathy of Prematurity
P19  F       47   JAWS           Complete blindness, Lost vision gradually  Optic Neuropathy
P20  M       48   NVDA           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
P21  M       43   NVDA           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
P22  M       19   NVDA           Blind since birth, Complete blindness      Retinopathy of Prematurity

Table 6: Screen-reader participants, their gender identification, age, screen reader, vision-loss level, and diagnosis. Under the Gender column, M = Male, F = Female, and NB = Non-binary.

B WIZARD-OF-OZ PARTICIPANT DEMOGRAPHICS

ID   Gender  Age  Screen Reader  Vision-Loss Level  Diagnosis
W1   M       25   VoiceOver      Partial vision     Extremely low vision
W2   M       28   NVDA           Blind since birth  Optic Nerve Hypoplasia
W3   M       23   VoiceOver      Blind since birth  Septo-optic Dysplasia
W4   F       26   JAWS           Blind since birth  Leber Congenital Amaurosis
W5   M       31   JAWS           Blind since birth  Retinopathy of Prematurity

Table 7: Screen-reader participants for the Wizard-of-Oz experiment, their gender identification, age, screen reader, vision-loss level, and diagnosis. Under the Gender column, M = Male, F = Female, and NB = Non-binary.

C INTERACTION TIME PER AGE RANGE

            Both Groups          Without VoxLens      With VoxLens
Age Range   N    Mean    SD      N    Mean    SD      N    Mean    SD
18-19       3    62.5    30.5    1    96.8    -       2    45.3    9.4
20-29       9    40.6    28.0    6    63.7    20.3    3    50.8    7.9
30-39       10   44.2    23.1    7    60.6    16.5    3    43.9    4.5
40-49       15   67.5    78.7    10   106.7   93.1    5    59.8    14.9
50-59       7    47.6    27.3    5    79.1    19.2    2    63.3    3.0
60-69       11   64.8    33.6    6    76.8    33.7    5    55.2    10.8
> 70        2    127.0   127.4   1    217.1   -       1    60.1    -

Table 8: Summary results from 57 screen-reader participants with (N=21) and without (N=36) VoxLens showing the numerical results for Interaction Time (IT), for each age range. N is the total number of participants for the given age range, Mean is the average IT in seconds, and SD represents the standard deviation.
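
For readers who want to work with Table 8 programmatically, the following sketch transcribes the "Without VoxLens" and "With VoxLens" columns and computes, per age range, how much lower the mean interaction time is with VoxLens. The names `table8`, `percentReduction`, and `participantTotals` are hypothetical and introduced only for illustration. The output is descriptive arithmetic on the reported means, not a statistical test, and several cells rest on a single participant.

```javascript
// Rows transcribed from Table 8; mean interaction times (IT) are in seconds.
const table8 = [
  { ageRange: "18-19", without: { n: 1, mean: 96.8 }, withVoxLens: { n: 2, mean: 45.3 } },
  { ageRange: "20-29", without: { n: 6, mean: 63.7 }, withVoxLens: { n: 3, mean: 50.8 } },
  { ageRange: "30-39", without: { n: 7, mean: 60.6 }, withVoxLens: { n: 3, mean: 43.9 } },
  { ageRange: "40-49", without: { n: 10, mean: 106.7 }, withVoxLens: { n: 5, mean: 59.8 } },
  { ageRange: "50-59", without: { n: 5, mean: 79.1 }, withVoxLens: { n: 2, mean: 63.3 } },
  { ageRange: "60-69", without: { n: 6, mean: 76.8 }, withVoxLens: { n: 5, mean: 55.2 } },
  { ageRange: "> 70", without: { n: 1, mean: 217.1 }, withVoxLens: { n: 1, mean: 60.1 } },
];

// Hypothetical helper: relative reduction of the mean IT, as a percentage
// of the "without VoxLens" mean.
function percentReduction(withoutMean, withMean) {
  return ((withoutMean - withMean) / withoutMean) * 100;
}

// Sum the participant counts per group; these should match the caption
// (36 without VoxLens, 21 with VoxLens).
const participantTotals = table8.reduce(
  (acc, row) => ({
    without: acc.without + row.without.n,
    withVoxLens: acc.withVoxLens + row.withVoxLens.n,
  }),
  { without: 0, withVoxLens: 0 }
);

console.log(participantTotals);
for (const row of table8) {
  const pct = percentReduction(row.without.mean, row.withVoxLens.mean);
  console.log(`${row.ageRange}: mean IT is ${pct.toFixed(1)}% lower with VoxLens`);
}
```

Summing the per-row counts is a quick consistency check on the transcription against the totals stated in the caption.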