VoxLens: Making Online Data Visualizations Accessible with an Interactive JavaScript Plug-In

Authors: Alida T. Muongchan, Ather Sharif, Jacob O. Wobbrock, Katharina Reinecke, Olivia H. Wang

License: CC-BY-4.0

Ather Sharif
Paul G. Allen School of Computer Science & Engineering | DUB Group, University of Washington, Seattle, Washington, USA

Olivia H. Wang
Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, Washington, USA

Alida T. Muongchan
Human Centered Design and Engineering, University of Washington, Seattle, Washington, USA

Katharina Reinecke
Paul G. Allen School of Computer Science & Engineering | DUB Group, University of Washington, Seattle, Washington, USA

Jacob O. Wobbrock
The Information School | DUB Group, University of Washington, Seattle, Washington, USA

Figure 1: VoxLens is an open-source JavaScript plug-in that improves the accessibility of online data visualizations using a multi-modal approach. The code at left shows that integration of VoxLens requires only a single line of code. At right, we portray an example interaction with VoxLens using voice-activated commands for screen-reader users.
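
The voice-activated interaction that Figure 1 portrays can be illustrated with a small, self-contained sketch. It is not the VoxLens implementation or its API: the function name `answerQuery`, the `{label, value}` data shape, the keyword matching, and the sample data are all assumptions made for illustration. It shows only the general idea of turning a recognized spoken question into an answer computed from the data behind a chart, so that the listener does not have to track every data point mentally.

```javascript
// Hypothetical sketch: NOT the VoxLens implementation or API.
// Maps a recognized voice transcript to an answer computed from chart data.

/**
 * @param {string} transcript - text returned by a speech recognizer (assumed)
 * @param {{label: string, value: number}[]} data - points behind the chart
 * @returns {string} sentence for the screen reader to announce
 */
function answerQuery(transcript, data) {
  if (data.length === 0) {
    return "No data is available.";
  }
  const query = transcript.toLowerCase();

  // Pick the comparison that matches the keyword the user spoke.
  let name;
  let pickBetter;
  if (query.includes("maximum")) {
    name = "maximum";
    pickBetter = (best, point) => (point.value > best.value ? point : best);
  } else if (query.includes("minimum")) {
    name = "minimum";
    pickBetter = (best, point) => (point.value < best.value ? point : best);
  } else {
    return "Command not recognized. Try asking for the maximum or minimum.";
  }

  // A single pass over the data replaces the listener's mental bookkeeping.
  const found = data.reduce(pickBetter);
  return `The ${name} is ${found.value}, at ${found.label}.`;
}

// Example data, invented for illustration only.
const sales = [
  { label: "January", value: 12 },
  { label: "February", value: 30 },
  { label: "March", value: 7 },
];

console.log(answerQuery("What is the maximum?", sales));
console.log(answerQuery("Tell me the minimum", sales));
```

In a browser, the transcript would typically come from a speech recognizer and the returned sentence would be handed to the screen reader; both steps are omitted here so that the sketch stays runnable on its own.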
ABSTRACT
JavaScript visualization libraries are widely used to create online data visualizations but provide limited access to their information for screen-reader users. Building on prior findings about the experiences of screen-reader users with online data visualizations, we present VoxLens, an open-source JavaScript plug-in that, with a single line of code, improves the accessibility of online data visualizations for screen-reader users using a multi-modal approach. Specifically, VoxLens enables screen-reader users to obtain a holistic summary of presented information, play sonified versions of the data, and interact with visualizations in a “drill-down” manner using voice-activated commands. Through task-based experiments with 21 screen-reader users, we show that VoxLens improves the accuracy of information extraction and interaction time by 122% and 36%, respectively, over existing conventional interaction with online data visualizations. Our interviews with screen-reader users suggest that VoxLens is a “game-changer” in making online data visualizations accessible to screen-reader users, saving them time and effort.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA
© 2022 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-9157-3/22/04.

CCS CONCEPTS
• Human-centered computing → Information visualization; Accessibility systems and tools; • Social and professional topics → People with disabilities.

KEYWORDS
Visualizations, accessibility, screen readers, voice-based interaction, blind, low-vision.

ACM Reference Format:
Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock. 2022. VoxLens: Making Online Data Visualizations Accessible with an Interactive JavaScript Plug-In. In CHI Conference on Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 19 pages. 1145/3491102.3517431

1 INTRODUCTION
Online data visualizations are present widely on the Web, allowing experts and non-experts alike to explore and analyze data both simple and complex. They assist people in extracting information effectively and efficiently, taking advantage of the ability of the human mind to recognize and interpret visual patterns [57].

However, the visual nature of data visualizations inherently disenfranchises screen-reader users, who may not be able to see or recognize visual patterns [52, 57]. We define “screen-reader users,” following prior work [69], as people who utilize a screen reader (e.g., JAWS [68], NVDA [2], or VoiceOver [44]) to read the contents of a computer screen. They might have conditions including complete or partial blindness, low vision, learning disabilities (such as alexia), motion sensitivity, or vestibular hypersensitivity.

Due to the inaccessibility of data visualizations, screen-reader users commonly cannot access them at all. Even when the data visualization includes basic accessibility functions (e.g., alternative text or a data table), screen-reader users still spend 211% more time interacting with online data visualizations and answer questions about the data in the visualizations 61% less accurately, compared to non-screen-reader users [69]. Screen-reader users rely on the creators of visualizations to provide adequate alternative text, which is often incomplete. Additionally, they have to remember and process more information mentally than is often humanly feasible [74], such as when seeking the maximum or minimum value in a chart. Prior work has studied the experiences of screen-reader users with online data visualizations and highlighted the challenges they face, the information they seek, and the techniques and strategies that could make online data visualizations more accessible [69]. Building on this work, it is our aim to realize a novel interactive solution to enable screen-reader users to efficiently interact with online data visualizations.

To this end, we created an open-source JavaScript plug-in called “VoxLens,” following an iterative design process [1]. VoxLens provides screen-reader users with a multi-modal solution that supports three modes of interaction: (1) Question-and-Answer mode, where the user verbally interacts with the visualizations on their own; (2) Summary mode, where VoxLens describes the summary of the information contained in the visualization; and (3) Sonification mode, where VoxLens maps the data in the visualization to a musical scale, enabling listeners to interpret the data trend. (Existing sonification tools are either proprietary [39] or written in a programming language other than JavaScript [5], making them unintegratable with popular JavaScript visualization libraries; VoxLens’ sonification feature is open-source, integratable with other libraries, and customizable.) Additionally, VoxLens reduces the burden on visualization creators in applying accessibility features to their data visualizations, requiring inserting only a single line of JavaScript code during visualization creation. Furthermore, VoxLens enables screen-reader users to explore the data as per their individual preferences, without relying on the visualization creators and without having to process data in their minds. VoxLens is the first system to: (1) enable screen-reader users to interact with online data visualizations using voice-activated commands; and (2) offer a multi-modal solution using three different modes of interaction.

To assess the performance of VoxLens, we conducted controlled task-based experiments with 21 screen-reader users. Specifically, we analyzed the accuracy of extracted information and interaction time with online data visualizations. Our results show that with VoxLens, compared to without it, screen-reader users improved their accuracy of extracting information by 122% and reduced their overall interaction time by 36%. Additionally, we conducted follow-up semi-structured interviews with six participants, finding that VoxLens is a positive step forward in making online data visualizations accessible, interactive dialogue is one of the ‘top’ features, sonification helps in ‘visualizing’ data, and data summary is a good starting point. Furthermore, we assessed the perceived workload of VoxLens using the NASA-TLX questionnaire [38], showing that VoxLens leaves users feeling successful in their performance and demands low physical effort.

The main contributions of our work are as follows:

(1) VoxLens, an interactive JavaScript plug-in that improves the accessibility of online data visualizations for screen-reader users. VoxLens offers a multi-modal solution, enabling screen-reader users to explore online data visualizations, both holistically and in a drilled-down manner, using voice-activated commands. We present its design and architecture, functionality, commands, and operations. Additionally, we open-source our implementation at

(2) Results from formative and summative studies with screen-reader users evaluating the performance of VoxLens. With VoxLens, screen-reader users significantly improved their interaction performance compared to their conventional interaction with online data visualizations. Specifically, VoxLens increased their accuracy of extracting information by 122% and decreased their interaction time by 36% compared to not using VoxLens.

2 RELATED WORK
We review the previous research on the experiences of screen-reader users with online data visualizations and the systems designed to improve the accessibility of data visualizations for screen-reader users. Additionally, we review existing JavaScript libraries used to create online visualizations, and tools that generate audio graphs.

2.1 Experiences of Screen-Reader Users with Online Data Visualizations
Understanding the experiences and needs of users is paramount in the development of tools and systems [8, 40]. Several prior

research efforts have conducted interviews with blind and low-vision (BLV) users to understand their experiences with technology [7, 43, 67, 69, 82]. Most recently, Zhang et al. [82] conducted interviews with 12 BLV users, reporting four major challenges of current emoji entry methods: (1) the entry process is time-consuming; (2) the results from these methods are inconsistent with the expectations of users; (3) there is a lack of support for discovering new emojis; and (4) there is a lack of support for finding the right emojis. They utilized these findings to develop Voicemoji, a speech-based emoji entry system that enables BLV users to input emojis. Schaadhardt et al. [67] conducted contextual interviews with 12 blind users, identifying key accessibility problems with 2-D digital artboards, such as Microsoft PowerPoint and Adobe Illustrator. Similarly, Sharif et al. [69] conducted contextual interviews with 9 screen-reader users, highlighting the inequities screen-reader users face when interacting with online data visualizations. They reported the challenges screen-reader users face, the information they seek, and the techniques and strategies they prefer to make online data visualizations more accessible. We rely upon the findings from Sharif et al. [69] to design VoxLens, an interactive JavaScript plug-in that improves the accessibility of online data visualizations, deriving motivation from Marriott et al.’s [57] call-to-action for creating inclusive data visualizations for people with disabilities.

2.2 Accessibility of Online Data Visualizations
Prior research efforts have explored several techniques to make data visualizations more accessible to BLV users, including automatically generating alternative text for visualization elements [48, 59, 70], sonification [3, 5, 16, 27, 39, 58, 83], haptic graphs [33, 76, 81], 3-D printing [15, 43, 71], and trend categorization [47]. For example, Sharif et al. [70] developed evoGraphs, a jQuery plug-in to create accessible graphs by automatically generating alternative text. Similarly, Kim et al. [47] created a framework that uses multimodal deep learning to generate summarization text from image-based line graphs. Zhao et al. [83] developed iSonic, which assists BLV users in exploring georeferenced data through non-textual sounds and speech output. They conducted in-depth studies with seven blind users, finding that iSonic enabled blind users to find facts and discover trends in georeferenced data. Yu et al. [81] developed a system to create haptic graphs, evaluating their system using an experiment employing both blind and sighted people, finding that haptic interfaces are useful in providing the information contained in a graph to blind computer users. Hurst et al. [43] worked with six individuals with low or limited vision and developed VizTouch, software that leverages affordable 3-D printing to rapidly and automatically generate tangible visualizations.

Although these approaches are plausible solutions for improving the accessibility of visualizations for screen-reader users, at least one of the following is true for all of them: (1) they require additional equipment or devices; (2) they are not practical for spontaneous everyday web browsing; (3) they do not offer a multi-modal solution; and (4) they do not explore the varying preferences of visualization interaction among screen-reader users. In contrast, VoxLens does not need any additional equipment, is designed for spontaneous everyday web browsing, and offers a multi-modal solution catering to the individual needs and abilities of screen-reader users.

2.3 Existing JavaScript Data Visualization Libraries
Several JavaScript data visualization libraries exist that enable visualization creators to make visualizations for the Web. We classified these visualization libraries into two categories based on accessibility features: (1) libraries that rely on developers to append appropriate alternative text (e.g., D3 and ChartJS); and (2) libraries that automatically provide screen-reader users with built-in features for data access (e.g., Google Charts).

Bostock et al. [11] developed D3, a powerful visualization library that uses web standards to generate graphs. D3 uses Scalable Vector Graphics (SVG) [22] to create such visualizations, relying on the developers to provide adequate alternative text for screen-reader users to comprehend the information contained in the visualizations.

Google Charts [23] is a visualization tool widely used to create graphs. An important underlying accessibility feature of Google Charts is the presence of a visually hidden tabular representation of data. While this approach allows screen-reader users to access the raw data, extracting information is a cumbersome task. Furthermore, tabular representations of data introduce excessive user workloads, as screen-reader users have to sequentially go through each data point. The workload is further exacerbated as data cardinality increases, forcing screen-reader users to memorize each data point to extract even the most fundamental information such as minimum or maximum values.

In contrast to these approaches, VoxLens introduces an alternate way for screen-reader users to obtain their desired information without relying on visualization creators, and without mentally computing complex information through memorization of data.

2.4 Audio Graphs
Prior work has developed sonification tools to enable screen-reader users to explore data trends and patterns in online data visualizations [3, 5, 39, 58, 83]. McGookin et al. [58] developed SoundBar, a system that allows blind users to gain a quick overview of bar graphs using musical tones. Highcharts [39], a proprietary commercial charting tool, offers data sonification as an add-on. Apple Audio Graphs [5] is an API for Apple application developers to construct an audible representation of the data in charts and graphs, giving BLV users access to valuable data insights. Similarly, Ahmetovic et al. [3] developed a web app that supports blind people in exploring graphs of mathematical functions using sonification.

At least one of the following is true for all of the aforementioned systems: (1) they are proprietary and cannot be used outside of their respective products [39]; (2) they are standalone hardware or software applications [3]; (3) they require installation of extra hardware or software [58]; or (4) they are incompatible with existing JavaScript libraries [5]. VoxLens provides sonification as a separate open-source library (independent from the VoxLens library) that is customizable and integratable with any JavaScript library or code.

3 DESIGN OF VOXLENS
We present the design and implementation of VoxLens, an open-source JavaScript plug-in that improves the accessibility of online data visualizations. We created VoxLens using a user-centered iterative design process, building on findings and recommendations from
CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA              Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock

prior work [69]. Specifically, our goal was to provide screen-reader users with a comprehensive means of extracting information from online data visualizations, both holistically and in a drilled-down fashion. Holistic exploration involves overall trend, extremum, and labels and ranges for each axis, whereas drilled-down interaction involves examining individual data points [69]. We named our tool VoxLens, combining “vox,” meaning “voice” in Latin, and “lens,” since it provides a way for screen-reader users to explore, examine, and extract information from online data visualizations. Currently, VoxLens only supports two-dimensional single-series data.

3.1     Interaction Modes
Following the recommendations from prior work [69], our goal was to enable screen-reader users to gain a holistic overview of the data as well as to perform drilled-down explorations. Therefore, we explored three modes of interaction: (1) Question-and-Answer mode, where the user verbally interacts with the visualizations; (2) Summary mode, where VoxLens verbally offers a summary of the information contained in the visualization; and (3) Sonification mode, where VoxLens maps the data in the visualization to a musical scale, enabling listeners to interpret possible data trends or patterns. We iteratively built the features for these modes, seeking feedback from screen-reader users through our Wizard-of-Oz studies. VoxLens channels all voice outputs through the user’s local screen reader, providing screen-reader users with a familiar and comfortable experience. These three modes of interaction can be activated by pressing their respective keyboard shortcuts (Table 1).

3.1.1 Wizard-of-Oz Studies. Our goal was to gather feedback and identify areas of improvement for the VoxLens features. Therefore, we conducted a Wizard-of-Oz study [21, 35] with five screen-reader users (see Appendix B, Table 7). (For clarity, we prefix the codes for participants in our Wizard-of-Oz studies with “W.”) We used the findings from the studies to inform design decisions when iteratively building VoxLens. In our studies, we, the “wizards,” simulated the auditory responses from a hypothetical screen reader.

Figure 2: Sample visualization showing price by car brands.

Participants interacted with all of the aforementioned VoxLens modes and were briefly interviewed in a semi-structured manner with open prompts at the end of each of their interactions. Specifically, we asked them to identify the features that they liked and the areas of improvement for each mode. We qualitatively analyzed the data collected from the Wizard-of-Oz studies. We provide our findings from the Wizard-of-Oz studies for each VoxLens mode in its respective section, below.

3.1.2 Question-and-Answer Mode. In Question-and-Answer mode, screen-reader users can extract information from data visualizations by asking questions verbally using their microphone. We used the Web Speech API [60] and the P5 Speech library [24] for speech input, removing the need for any additional software or hardware installation by the user. Through manual testing, we found the P5 Speech library to perform quite well in recognizing speech across different accents, pronunciations, and background noise levels. After getting the text from the speech, we used an approximate string matching algorithm from Hall and Dowling [36] to recognize the commands. Additionally, we verified VoxLens’ command recognition effectiveness through manual testing, using prior work’s [73] data set on natural language utterances for visualizations.

Our Wizard-of-Oz studies revealed that participants liked clear instructions and responses, integration with the user’s screen reader, and the ability to query by specific terminologies. They identified three key areas of improvement: an interactive tutorial to become familiar with the tool, a help menu to determine which commands are supported, and the inclusion of the user’s query in the response. Therefore, after recognizing the commands and processing their respective responses, VoxLens delivers a single combined response to the user via their screen reader. This approach enables screen-reader users to get the answers to multiple commands in one response. We also added each query as feedback in the response (Figure 1). For example, if the user said, “what is the maximum?”, the response was, “I heard you ask about the maximum. The maximum is...” If a command was not recognized, the response was, “I heard you say [user input]. Command not recognized. Please try again.”

Screen-reader users are also able to get a list of supported commands by asking for the commands list. For example, the user can ask, “What are the supported commands?” to hear all of the commands that VoxLens supports. The list of supported commands, along with their aliases, is presented in Table 2.

3.1.3 Summary Mode. Our Wizard-of-Oz studies, in line with the findings from prior work [69], revealed that participants liked the efficiency and the preliminary exploration of the data. They suggested the information be personalized based on the preferences of each user, but by default, it should only expose the minimum amount of information that a user would need to decide if they want to delve further into the data exploration. To delve further, they commonly seek the title, axis labels and ranges, maximum and minimum data points, and the average in online data visualizations. (The title and axis labels are required configuration options for VoxLens, discussed further in section 3.2.2 below. Axis ranges, maximum and minimum data points, and average are computed by VoxLens.) At the same time, screen-reader users preferred concisely stated information. Therefore, the goal for VoxLens’s Summary mode was to generate the summary only as a means of providing the foundational holistic information about the visualization, and not as a replacement for the visualization itself. We used the “language of graphics” [10] through a pre-defined sentence template,

                               Mode or Command                  Keyboard Shortcuts
                               Question-and-Answer Mode         Modifier Keys + A / Modifier Keys + 1
                               Summary Mode                     Modifier Keys + S / Modifier Keys + 2
                               Sonification Mode                Modifier Keys + M / Modifier Keys + 3
                               Repeat Instructions              Modifier Keys + I / Modifier Keys + 4
Table 1: Keyboard shortcuts for VoxLens’ interaction modes and preliminary commands. Modifier Keys for Windows and
MacOS were Control+Shift and Option, respectively.
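
As an illustration of the shortcut scheme in Table 1, the following minimal sketch maps a keyboard event to a mode. It is not taken from the VoxLens source: the names MODE_BY_CODE and modeForKeyEvent, the platform strings, and the mode identifiers are all hypothetical. It assumes the standard KeyboardEvent fields ctrlKey, shiftKey, altKey (the Option key on MacOS), and code.

```javascript
// Hypothetical sketch of the Table 1 shortcuts; not VoxLens's actual code.
// Each mode has a letter key and a numeric key, per Table 1.
const MODE_BY_CODE = {
  KeyA: 'question-and-answer',
  Digit1: 'question-and-answer',
  KeyS: 'summary',
  Digit2: 'summary',
  KeyM: 'sonification',
  Digit3: 'sonification',
  KeyI: 'instructions',
  Digit4: 'instructions',
};

// Modifier keys: Control+Shift on Windows, Option (altKey) on MacOS.
function modifiersPressed(event, platform) {
  if (platform === 'mac') {
    return Boolean(event.altKey);
  }
  return Boolean(event.ctrlKey && event.shiftKey);
}

// Returns the mode name for a key event, or null if it is not a shortcut.
function modeForKeyEvent(event, platform) {
  if (!modifiersPressed(event, platform)) {
    return null;
  }
  return MODE_BY_CODE[event.code] ?? null;
}
```

The sketch reads event.code rather than event.key because holding Option on MacOS can change the character a key produces, while the physical key code stays the same. In a browser, such a function could be called from a keydown listener.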

                            Information Type             Command                   Aliases
                            Extremum                     Maximum                   Highest
                                                         Minimum                   Lowest
                            Axis Labels and Ranges       Axis Labels               -
                                                         Ranges                    -
                            Statistics                   Mean                      Average
                                                         Median                    -
                                                         Mode                      -
                                                         Variance                  -
                                                         Standard Deviation        -
                                                         Sum                       Total
                            Individual Data Point        [x-axis label] value      [x-axis label] data
                            Help                         Commands                  Instructions, Directions, Help
                           Table 2: Voice-activated commands for VoxLens’ Question-and-Answer mode.
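
To make the command recognition described in section 3.1.2 more concrete, here is a minimal sketch of approximate matching against the commands and aliases of Table 2. It is an assumption-laden illustration, not the authors' implementation: it uses plain Levenshtein edit distance as a stand-in for the approximate string matching technique cited from Hall and Dowling [36], the tolerance rule (one edit per four characters) is invented for the example, and the “[x-axis label] value” command is omitted. The names COMMANDS, editDistance, and recognizeCommands are hypothetical.

```javascript
// Hypothetical sketch; not VoxLens's actual matcher.
// Commands and their aliases, following Table 2.
const COMMANDS = {
  maximum: ['maximum', 'highest'],
  minimum: ['minimum', 'lowest'],
  'axis labels': ['axis labels'],
  ranges: ['ranges'],
  mean: ['mean', 'average'],
  median: ['median'],
  mode: ['mode'],
  variance: ['variance'],
  'standard deviation': ['standard deviation'],
  sum: ['sum', 'total'],
  commands: ['commands', 'instructions', 'directions', 'help'],
};

// Levenshtein distance using a single rolling row.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// True if some run of words in `tokens` is within tolerance of `phrase`.
function phraseMatches(tokens, phrase) {
  const size = phrase.split(' ').length;
  const tolerance = Math.floor(phrase.length / 4); // assumed threshold
  for (let i = 0; i + size <= tokens.length; i++) {
    const candidate = tokens.slice(i, i + size).join(' ');
    if (editDistance(candidate, phrase) <= tolerance) {
      return true;
    }
  }
  return false;
}

// Returns every recognized command, in Table 2 order.
function recognizeCommands(transcript) {
  const tokens = transcript.toLowerCase().match(/[a-z0-9]+/g) || [];
  return Object.keys(COMMANDS).filter((command) =>
    COMMANDS[command].some((phrase) => phraseMatches(tokens, phrase))
  );
}
```

Because the function returns all matches rather than the first one, a single utterance can yield several commands, which is what allows one combined response. An empty result corresponds to the “Command not recognized” case.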

identified as Level 1 by Lundgard et al. [55], to decide the sentence structure. Our sentence template was:

      Graph with title: [title]. The X-axis is
      [x-axis title]. The Y-axis is [y-axis title]
      and ranges from [range minimum] to [range
      maximum]. The maximum data point is [maximum
      y-axis value] belonging to [corresponding
      x-axis value], and the minimum data point
      is [minimum y-axis value] belonging to
      [corresponding x-axis value]. The average
      is [average].

For example, here is a generated summary of a data visualization depicting the prices of various car brands (Figure 2):

      Graph with title: Price by Car Brands. The
      X-axis is car brands. The Y-axis is price and
      ranges from $0 to $300,000. The maximum data
      point is $290,000 belonging to Ferrari, and
      the minimum data point is $20,000 belonging
      to Kia. The average is $60,000.

As noted in prior work [55, 69], the preference for information varies from one individual to another. Therefore, future work can explore personalization options to generate a summarized response that caters to the individual needs of screen-reader users.

Additionally, VoxLens, at present, does not provide information about the overall trend through the Summary mode. Such information may be useful for screen-reader users in navigating line graphs [47, 48]. Therefore, work is underway to incorporate trend information in the summarized response generated for line graphs, utilizing the findings from prior work [47, 48].

3.1.4 Sonification Mode. For Sonification mode, our Wizard-of-Oz participants liked the ability to preliminarily explore the data trend. As improvements, participants suggested the ability to identify key information, such as the maximum and the minimum data points. Therefore, VoxLens’s sonification mode presents screen-reader users with a sonified response (also known as an “audio graph”

[72]) mapping the data in the visualization to a musical scale. A sonified response enables the listeners to interpret the data trend or pattern and gain a big-picture perspective of the data that is not necessarily achievable otherwise [66]. To generate the sonified response, we utilized Tone.js [56], a JavaScript library that offers a wide variety of customizable options to produce musical notes. Our goal was to enable the listeners to directionally distinguish between data points and to interpret the overall data trend.

Varying tonal frequency is more effective at representing trends than varying amplitude [26, 42]. Therefore, we mapped each data point to a frequency within the range of 130 to 650 Hz based on its magnitude. For example, for the minimum data point the frequency was 130 Hz, for the maximum data point it was 650 Hz, and the intermediate data points were assigned values linearly in between, similar to prior work [19, 61]. Additionally, similar to design choices made by Ohshiro et al. [61], we used the sound of a sawtooth wave to indicate value changes along the x-axis. These approaches enabled us to distinctly differentiate between data values directionally, especially values that were only minimally different from each other. We chose this range based on the frequency range of the human voice [6, 58, 75], and by trying several combinations ourselves, finding a setting that was comfortable for human ears. We provide examples of sonified responses in our paper’s supplementary materials. Our open-source sonification library is available at

In our work, we used the three common chart types (bar, scatter, and line) [65], following prior work [69]. All of these chart types use a traditional Cartesian coordinate system. Therefore, VoxLens’s sonified response is best applicable to graphs represented using a Cartesian plane. Future work can study sonification responses for graphs that do not employ a Cartesian plane to represent data (e.g., polar plots, pie charts, etc.).

3.2     Usage and Integration
3.2.1 Screen-Reader User. A pain point for screen-reader users when interacting with online data visualizations is that most visualization elements are undiscoverable and incomprehensible to screen readers. In building VoxLens, we ensured that the visualization elements were recognizable and describable by screen readers. Hence, as the very first step, when the screen reader encounters a visualization created with VoxLens, the following is read to users:

        Bar graph with title: [title]. To listen
        to instructions on how to interact with the
        graph, press Control + Shift + I or Control +
        Shift + 4. Key combinations must be pressed
        all together and in order.

The modifier keys (Control + Shift on Windows, and Option on MacOS) and command keys were selected to not interfere with the dedicated key combinations of the screen reader, the Google Chrome browser, and the operating system. Each command was additionally assigned a numeric activation key, as per suggestions from our participants.

When a user presses the key combination to listen to the instructions, their screen reader announces the following:

        To interact with the graph, press Control +
        Shift + A or Control + Shift + 1 all together
        and in order. You’ll hear a beep sound, after
        which you can ask a question such as, “what
        is the average?” or “what is the maximum
        value in the graph?” To hear the textual
        summary of the graph, press Control + Shift +
        S or Control + Shift + 2. To hear the sonified
        version of the graph, press Control + Shift
        + M or Control + Shift + 3. To repeat these
        instructions, press Control + Shift + I or
        Control + Shift + 4. Key combinations must
        be pressed all together and in order.

At this stage, screen-reader users can activate question-and-answer mode, listen to the textual summary, play the sonified version of the data contained in the visualization, or hear the instructions again. Activating the question-and-answer mode plays a beep sound, after which the user can ask a question in a free-form manner, without following any specific grammar or sentence structure. They are also able to ask for multiple pieces of information, in no particular order. For example, in a visualization containing prices of cars by car brands, a screen-reader user may ask:

        Tell me the mean, maximum, and standard
        deviation.

The response from VoxLens would be:

        I heard you asking about the mean, maximum,
        and standard deviation. The mean is $60,000.
        The maximum value of price for car brands is
        $290,000 belonging to Ferrari. The standard
        deviation is 30,000.

Similarly, users may choose to hear the textual summary or play the sonified version, as discussed above.

3.2.2 Visualization Creators. Typically, the accessibility of online data visualizations relies upon visualization creators and their knowledge and practice of accessibility standards. When an alternative text description is not provided, the visualization is useless to screen-reader users. In cases where alternative text is provided, the quality and quantity of the text are also a developer’s choice, which may or may not be adequate for screen-reader users. For example, a common unfortunate practice is to use the title of the visualization as its alternative text, which helps screen-reader users in understanding the topic of the visualization but does not help in understanding the content contained within the visualization. Therefore, VoxLens is designed to reduce the burden and dependency on developers to make accessible visualizations, keeping the interaction consistent, independent of the visualization library used. Additionally, VoxLens is engineered to require only a single line of code, minimizing any barriers to its adoption (Figure 1).

VoxLens supports the following configuration options: “x” (key name of the independent variable), “y” (key name of the dependent variable), “title” (title of the visualization), “xLabel” (label for x-axis), and “yLabel” (label for y-axis). “x,” “y,” and “title” are required parameters, whereas the “xLabel” and “yLabel” are optional and default to the key names of “x” and “y,” respectively. VoxLens allows visualization creators to set the values of these configuration options, as shown in Figure 1.
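
To illustrate how the configuration options above could drive the Summary-mode template from section 3.1.3, here is a minimal sketch. It is not the VoxLens API: resolveOptions and summarize are hypothetical helper names, the sample data values are made up, and the axis range is approximated by the smallest and largest y values, whereas the example in section 3.1.3 reports a wider axis range ($0 to $300,000).

```javascript
// Hypothetical sketch; these helpers are not exported by VoxLens.

// Checks the required options ("x", "y", "title") and applies the
// documented defaults: xLabel and yLabel fall back to the key names.
function resolveOptions(options = {}) {
  for (const key of ['x', 'y', 'title']) {
    if (options[key] === undefined) {
      throw new Error(`Missing required option: ${key}`);
    }
  }
  const { x, y, title, xLabel = x, yLabel = y } = options;
  return { x, y, title, xLabel, yLabel };
}

// Fills the sentence template from section 3.1.3 for an array of records.
function summarize(data, options) {
  if (data.length === 0) {
    throw new Error('No data points to summarize');
  }
  const { x, y, title, xLabel, yLabel } = resolveOptions(options);
  const max = data.reduce((best, d) => (d[y] > best[y] ? d : best));
  const min = data.reduce((best, d) => (d[y] < best[y] ? d : best));
  const average = data.reduce((sum, d) => sum + d[y], 0) / data.length;
  return (
    `Graph with title: ${title}. The X-axis is ${xLabel}. ` +
    // Assumption: range taken from the data, not from the chart axis.
    `The Y-axis is ${yLabel} and ranges from ${min[y]} to ${max[y]}. ` +
    `The maximum data point is ${max[y]} belonging to ${max[x]}, ` +
    `and the minimum data point is ${min[y]} belonging to ${min[x]}. ` +
    `The average is ${average}.`
  );
}

// Made-up sample data, loosely modeled on Figure 2.
const cars = [
  { brand: 'Kia', price: 20000 },
  { brand: 'Toyota', price: 50000 },
  { brand: 'Ferrari', price: 290000 },
];
const summary = summarize(cars, {
  x: 'brand',
  y: 'price',
  title: 'Price by Car Brands',
  yLabel: 'price in dollars',
});
```

In this sketch only “yLabel” is supplied, so “xLabel” falls back to the key name “brand”. Omitting any of the three required options raises an error instead of producing a partial summary.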

3.3     Channeling VoxLens’ Output to Screen Readers
One of the challenges we faced was to channel the auditory response from VoxLens to the screen reader of the user. As noted by our participants during Wizard-of-Oz studies, screen-reader users have unique preferences for their screen readers, including the voice and speed of the speech output. Therefore, it was important for VoxLens to utilize these preferences, providing screen-reader users with a consistent, familiar, and comfortable experience. To relay the output from VoxLens to the screen reader, we created a temporary div element that was only visible to screen readers, positioning it off-screen, following WebAIM’s recommendations [78].

Then, we added the appropriate Accessible Rich Internet Applications (ARIA) attributes [77] to the temporary element to ensure maximum accessibility. ARIA attributes are a set of attributes to make web content more accessible to people with disabilities. Notably, we added the “aria-live” attribute, allowing screen readers to immediately announce the query responses that VoxLens adds to the temporary element. For MacOS, we had to additionally include the “role” attribute, with its value set to “alert.” This approach enabled VoxLens to promptly respond to screen-reader users’ voice-activated commands using their screen readers. After the response from VoxLens is read by the screen reader, a callback function removes the temporary element from the HTML tree to avoid overloading the HTML Document Object Model (DOM).

3.4     Additional Implementation Details
At present, VoxLens only supports two-dimensional data, containing one independent and one dependent variable, as only the interactive experiences of screen-reader users with two-dimensional data visualizations are well-understood [69]. To support data dimensions greater than two, future work would need to investigate the interactive experiences of screen-reader users with n-dimensional data visualizations. VoxLens is customizable and engineered to support additional modifications in the future.

VoxLens relies on the Web Speech API [60], and is therefore only fully functional on browsers with established support for the API such as Google Chrome. JavaScript was naturally our choice of programming language for VoxLens, as VoxLens is a plug-in for JavaScript visualization libraries. Additionally, we used ECMAScript [37] to take advantage of modern JavaScript features such as destructuring assignments, arrow functions, and the spread operator. We also built a testing tool to test VoxLens on data visualizations, using the React [45] framework as the user-interface framework and Node.js [28] as the back-end server, both of which also use JavaScript as their underlying programming language. Additionally, we used GraphQL [29] as the API layer for querying and connecting with our Postgres [34] database, which we used to store data and participants’ interaction logs.

Creating a tool like VoxLens requires significant engineering effort. Our GitHub repository at has a total of 188 commits and 101,104 lines of developed code, excluding comments. To support testing VoxLens on various operating systems and browsers with different screen readers, we collected 30 data sets of varying data points, created their visualizations using Google Charts, D3, and ChartJS, integrated VoxLens with each of them, and deployed a testing website on our server. The testing website was instrumental in ensuring the correct operation of VoxLens under various configurations, bypassing the challenges of setting up a development environment for testers.

3.5     Conflicts with Other Plug-ins
To the best of our knowledge, two kinds of conflicts are possible with VoxLens: key combination conflicts and ARIA attribute conflicts. As mentioned in section 3.2.1, we selected key combinations to avoid conflicts with the dedicated combinations of the screen reader, the Google Chrome browser, and the operating system. However, it is possible that some users might have external plug-ins using key combinations that would conflict with those from VoxLens. Future work could build a centralized configuration management system, enabling users to specify their own key combinations.

VoxLens modifies the “aria-label” attribute of the visualization container element to describe the interaction instructions for VoxLens, as mentioned in section 3.2.1. It is possible that another plug-in may intend to modify the “aria-label” attribute as well, in which case the execution order of the plug-ins will determine which plug-in achieves the final override. The execution order of the plug-ins depends on several external factors [63], and is, unfortunately, a common limitation for any browser plug-in. However, VoxLens does not affect the “aria-labelledby” attribute, allowing other systems to gracefully override the “aria-label” attribute set by VoxLens, as “aria-labelledby” takes precedence over “aria-label” in the accessibility tree. Future iterations of VoxLens will attempt to ensure that VoxLens achieves the last execution order and that the ARIA labels set by other systems are additionally relayed to screen-reader users.

It is important to note that VoxLens’s sonification library is supplied independently from the main VoxLens plug-in and is not subject to the same limitations. Our testing did not reveal any conflicts of the sonification library with other plug-ins.

4     EVALUATION METHOD
We evaluated the performance of VoxLens using a mixed-methods approach. Specifically, we conducted an online mixed-factorial experiment with screen-reader users to assess VoxLens quantitatively. Additionally, we conducted follow-up interviews with our participants for a qualitative assessment of VoxLens.

4.1     Participants
Our participants (see Appendix A, Table 6) were 22 screen-reader users, recruited using word-of-mouth, snowball sampling, and email distribution lists for people with disabilities. Nine participants identified as women and 13 as men. Their average age was 45.3 years (SD=16.8). Twenty participants had complete blindness and two participants had partial blindness; nine participants were blind since birth, 12 lost vision gradually, and one became blind due to a brain tumor. The highest level of education attained or in pursuit was a doctoral degree for two participants, a Master’s degree for seven participants, a Bachelor’s degree for eight participants, and a high school diploma for the remaining five participants. Estimated computer usage was more than 5 hours per day for 12 participants,

2-5 hours per day for eight participants, and 1-2 hours per day for two participants. The average
frequency of interacting with online data visualizations was over two visualizations per day,
usually in the context of news articles, blog posts, and social media.
   For the task-based experiment and questionnaire, participants were compensated with a $20
Amazon gift card for 30-45 minutes of their time. For the follow-up interview, they were
compensated $10 for 30 minutes of their time. No participant was allowed to partake in the
experiment more than once.

4.2     Apparatus
We conducted our task-based experiment online using a user study platform that we created with
the JavaScript React framework [45]. We tested our platform with screen-reader users and
ourselves, both with and without a screen reader, ensuring maximum and proper accessibility
measures. We deployed the experiment platform as a website hosted on our own server.
   We analyzed the performance of VoxLens by comparing the data collected from our task-based
experiments with that from prior work [69]. To enable a fair comparison to this prior work, we
used the same visualization libraries, visualization data set, question categories, and
complexity levels. The visualization libraries (Google Charts, ChartJS, and D3) were chosen
based on the variation in their underlying implementations as well as their application of
accessibility measures. Google Charts utilizes SVG elements to generate the visualization and
appends a tabular representation of the data for screen-reader users, by default; D3 also makes
use of SVG elements but does not provide a tabular representation; ChartJS uses HTML Canvas to
render the visualization as an image and relies on the developers to add alternative text
(“alt-text”) and Accessible Rich Internet Applications (“ARIA”) attributes [77]. Therefore, each
of these visualization libraries provides a different experience for screen-reader users, as
highlighted in prior work [69].
   We provide all of the visualizations and data sets used in this work in this paper’s
supplementary materials. Readers can reproduce these visualizations using the supplementary
materials in conjunction with the source code and examples presented in our open-source GitHub
repository. We implemented the visualizations following the WCAG 2.0 guidelines [17] in
combination with the official accessibility recommendations from the visualization libraries.
For ChartJS, we added the “role” and “aria-label” attributes to the “canvas” element. The “role”
attribute had the value of “img,” and the “aria-label” was given the value of the visualization
title, as per the official documentation from ChartJS developers [18]. We did not perform any
accessibility scaffolding for Google Charts and D3 visualizations, as these visualizations rely
on a combination of internal implementations and the features of SVG for accessibility. Our goal
was to replicate an accurate representation of how these visualizations currently exist on the
Web.
   Recent prior work [46] has reported that the non-visual questions that users ask from graphs
mainly comprise compositional questions, similar to the findings from Brehmer and Munzner’s task
typology [14]. Therefore, our question categories comprised one “Search” action (lookup and
locate) and two “Query” actions (identify and compare), similar to prior work [13]. The
categories, in ascending order of difficulty, were: (1) Order Statistics (extremum);
(2) Symmetry Comparison (comparison of data points); and (3) Chart Type-Specific Questions (value
retrieval for bar charts, trend summary for line charts, and correlation for scatter plots). As
in prior work [69], all questions were multiple-choice questions with four choices: the correct
answer, two incorrect answers, and the option for “Unable to extract information.” The order of
the four choices was randomized per trial.

4.3     Procedure
The study was conducted online by participants without direct supervision. The study comprised
six stages. The first stage displayed the study purpose, eligibility criteria, and the statement
of IRB approval. In the second stage, the participants were asked to fill out a pre-study
questionnaire to record their demographic information, screen-reader software, vision-loss
level, and diagnosis (see Appendix A, Table 6). Additionally, participants were asked about
their education level, daily computer usage, and their frequency of interacting with
visualizations.
   In the third stage, participants were presented with a step-by-step interactive tutorial to
train and familiarize themselves with the modes, features, and commands that VoxLens offers.
Additionally, participants were asked questions at each step to validate their understanding. On
average, the tutorial took 12.6 minutes (SD=6.8) to complete. Upon successful completion of the
tutorial, participants were taken to the fourth stage, which displayed the instructions for
completing the study tasks.
   In the fifth stage, each participant was given a total of nine tasks. For each task,
participants were shown three Web pages: page 1 contained the question to explore, page 2
displayed the question and visualization, and page 3 presented the question with a set of four
multiple-choice responses. Figure 3 shows the three pages of an example task. After the
completion of the tasks, participants were asked to fill out the NASA-TLX [38] survey in the last
stage. An entire study session ranged from 30-45 minutes in duration.

4.4     Design & Analysis
The experiment was a 2 × 3 × 3 × 3 mixed-factorial design with the following factors and levels:
        • VoxLens (VX), between-Ss.: {yes, no}
        • Visualization Library (VL), within-Ss.: {ChartJS, D3, Google Charts}
        • Data Complexity (CMP), within-Ss.: {Low, Medium, High}
        • Question Difficulty (DF), within-Ss.: {Low, Medium, High}
   For the screen-reader users who did not use VoxLens (VX=no), we used prior work’s data [69]
(N=36) as a baseline for comparison.
   Our two dependent variables were Accuracy of Extracted Information (AEI) and Interaction Time
(IT). We used a dichotomous representation of AEI (i.e., “inaccurate” or 0 if the user was
unable to answer the question correctly, and “accurate” or 1 otherwise) for our analysis. We
used a mixed logistic regression model [32] with the above factors, interactions with VoxLens,
and a covariate to control for Age. We also included Subject_r as a random factor to account for
repeated measures. The statistical model was therefore
AEI ← VX + VX×VL + VX×CMP + VX×DF + Age + Subject_r. We did not include factors for VL, CMP, or
DF because our research

questions centered around VoxLens (VX) and our interest in these factors only extended to their
possible interactions with VoxLens.
   For Interaction Time (IT), we used a linear mixed model [30, 54] with the same model terms as
for AEI. IT was calculated as the total time of the screen reader’s focus on the visualization
element. Participants were tested over three Visualization Library × Complexity (VL × CMP)
conditions, resulting in 3×3 = 9 trials per participant. With 21 participants, a total of
21×9 = 189 trials were produced and analyzed for this study. One participant, who was unable to
complete the tutorial, was excluded from the analysis.

Figure 3: Participants were shown three pages for each task. (a) Page 1 presented the question
to explore. (b) Page 2 displayed the same question and a visualization. (c) Page 3 showed the
question again with a set of four multiple-choice responses.

4.5     Qualitative and Subjective Evaluation
To qualitatively assess the performance of VoxLens, we conducted follow-up interviews with six
screen-reader users, randomly selected from our pool of participants who completed the
task-based experiment. Similar to prior work [80], we ceased recruitment of participants once we
reached saturation of insights.
   To analyze our interviews, we used thematic analysis [12] guided by a semantic approach [62].
We used two interviews to develop an initial set of codes, resulting in a total of 23 open
codes. Each interview transcript was coded by three researchers independently, and disagreements
were resolved through mutual discussions. As suggested by Lombard et al. [51], we calculated
inter-rater reliability (IRR) using pairwise percentage agreement together with Krippendorff’s α
[49]. To calculate pairwise percentage agreement, we calculated the average pairwise agreement
among the three rater pairs across observations. Our pairwise percentage agreement was 94.3%,
showing a high agreement between raters. Krippendorff’s α was calculated using ReCal [31] and
found to be 0.81, indicating a high level of reliability [50].
   In addition to conducting follow-up interviews, we administered the NASA-TLX survey [38] with
all participants (N=21) to assess the perceived workload of VoxLens.

5     RESULTS
We present our experiment results using the Accuracy of Extracted Information (AEI) and
Interaction Time (IT) for screen-reader users with and without VoxLens. We also present our
interview results and the subjective ratings from the NASA-TLX questionnaire [38].

5.1    Accuracy of Extracted Information
Our results show a significant main effect of VoxLens (VX) on AEI (χ²(1, N=57)=38.16, p<.001,
Cramér’s V=.14), with VoxLens users achieving 75% accuracy (SD = 18.0%) and non-VoxLens users
achieving only 34% accuracy (SD = 20.1%). This difference constituted a 122% improvement due to
VoxLens.
   By analyzing the VoxLens (VX) × Visualization Library (VL) interaction, we investigated
whether changes in AEI were proportional across visualization libraries for participants in each
VoxLens group. The VX × VL interaction was indeed statistically significant (χ²(4, N=57)=82.82,
p<.001, Cramér’s V=.20). This result indicates that AEI significantly differed among
visualization libraries for participants in each VoxLens group. Figure 4 and Table 3 show AEI
percentages for different visualization libraries for each VoxLens group. Additionally, we
report our findings in Table 4.
   Prior work [69] has reported a statistically significant difference between screen-reader
users (SRU) and non-screen-reader users (non-SRU) in terms of AEI, attributing the difference to
the inaccessibility of online data visualizations. We conducted a second analysis, investigating
whether AEI was different between screen-reader users who used VoxLens and non-screen-reader
users, to extract information from online data visualizations. Specifically, we investigated the
effect of SRU on AEI but did not find a statistically significant effect (p ≈ .077). This result
itself does not provide evidence in support of VoxLens closing the access gap between the two
user groups; further experimentation is necessary to confirm or refute this marginal result. In
light of VoxLens’s other benefits, however, this is an encouraging trend.

5.2    Interaction Time
Our preliminary analysis showed that the interaction times were conditionally non-normal,
determined using Anderson-Darling [4] tests of normality. To achieve normality, we applied
logarithmic transformation prior to analysis, as is common practice for time measures [9, 41,
53]. For ease of interpretation, plots of interaction times are shown using the original
non-transformed values.
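
The effect of the logarithmic transformation described above can be sketched as follows. This is only an illustration: the sample times are invented (they are not study data), the helper names are hypothetical, and skewness is used here as a simple stand-in for asymmetry rather than as a replacement for the Anderson-Darling test.

```python
import math
import statistics


def log_transform(times_s):
    """Return the natural log of each positive interaction time (seconds)."""
    if any(t <= 0 for t in times_s):
        raise ValueError("interaction times must be positive")
    return [math.log(t) for t in times_s]


def skewness(values):
    """Population skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical, right-skewed interaction times in seconds (NOT study data).
sample_times = [22.0, 31.5, 35.0, 40.2, 48.9, 55.0, 61.3, 90.4, 150.0, 310.0]
log_times = log_transform(sample_times)

raw_skew = skewness(sample_times)
log_skew = skewness(log_times)

# Back-transforming the mean of the logs gives the geometric mean, one way
# to report a central value on the original (seconds) scale.
geometric_mean = math.exp(statistics.fmean(log_times))

print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
print(f"geometric mean: {geometric_mean:.1f} s")
```

For right-skewed data such as these, the log-transformed values are noticeably more symmetric, which is why the transformation is commonly applied to time measures before fitting a linear model.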

Figure 4: Accuracy of Extracted Information (AEI), as a percentage, for screen-reader users without (N=36) and with (N=21)
VoxLens, by (a) Visualization Library, (b) Complexity Level, and (c) Difficulty Level. The percentage represents the “accurate”
answers. Therefore, higher is better. Error bars represent mean ± standard deviation.

                                    Without VoxLens               With VoxLens
                                  N       AA     AA (%)        N       AA     AA (%)
Overall                          324      109     34%         189      141     75%
Visualization Library (VL)
  ChartJS                        108       12     11%          63       50     79%
  D3                             108       18     17%          63       47     75%
  Google Charts                  108       79     73%          63       44     70%
Complexity Level (CMP)
  Low                            108       40     37%          63       52     83%
  Medium                         108       34     31%          63       48     76%
  High                           108       35     32%          63       41     65%
Difficulty Level (DF)
  Low                            108       35     32%          63       58     92%
  Medium                         108       36     33%          63       38     60%
  High                           108       38     35%          63       45     71%
Table 3: Numerical results for the N = 513 questions asked of screen-reader users with and without VoxLens for each level
of Visualization Library, Complexity Level, and Difficulty Level. N is the total number of questions asked, AA is the number of
“accurate answers,” and AA (%) is the percentage of “accurate answers.”
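
The overall percentages in Table 3 and the reported 122% improvement follow from the raw counts. The short sketch below recomputes them; the variable names are illustrative, and the per-participant question count of nine for the baseline group is inferred from 324 / 36 rather than stated in the table.

```python
# Overall counts taken from Table 3.
without_vx = {"participants": 36, "questions": 324, "accurate": 109}
with_vx = {"participants": 21, "questions": 189, "accurate": 141}


def accuracy(group):
    """Fraction of questions answered accurately."""
    return group["accurate"] / group["questions"]


acc_without = accuracy(without_vx)
acc_with = accuracy(with_vx)

# Relative improvement of VoxLens users over the baseline group.
relative_improvement = (acc_with - acc_without) / acc_without

total_questions = without_vx["questions"] + with_vx["questions"]

print(f"without VoxLens: {acc_without:.1%}")
print(f"with VoxLens:    {acc_with:.1%}")
print(f"relative improvement: {relative_improvement:.0%}")
print(f"total questions: {total_questions}")
```

The improvement is computed from the unrounded proportions; using the rounded 75% and 34% instead gives a slightly different figure.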

   VoxLens (VX) had a significant main effect on Interaction Time (IT) (F(4,54)=12.66, p<.05,
ηp²=.19). Specifically, the average IT for non-VoxLens users was 84.6 seconds (SD=75.2). For
VoxLens users, it was 54.1 seconds (SD=21.9), 36% lower (faster) than for participants without
VoxLens.
   The VX × VL and VX × DF interactions were both significant (F(4,444)=33.89, p<.001, ηp²=.23
and F(4,444)=14.41, p<.001, ηp²=.12, respectively). Figure 5 shows interaction times across
different visualization libraries, difficulty levels, and complexity levels for each VoxLens
group. For VoxLens users, all three visualization libraries

                     N        χ²          p        Cramér’s V
VX (VoxLens)         57      38.16      < .001        .14
VX × VL              57      82.82      < .001        .20
VX × CMP             57       8.90        .064        .07
VX × DF              57      17.95        .001        .09
Age                  57       3.58        .058        .04
Table 4: Summary results from 57 screen-reader users with (N=21) and without (N=36) VoxLens. “VL” is the visualization
library, “CMP” is data complexity, and “DF” is question difficulty. Cramér’s V is a measure of effect size [25]. All results are
statistically significant (p < .05) or marginal (.05 ≤ p < .10).

                    df_n     df_d        F          p        ηp²
VX (VoxLens)          4        54      12.66       .001      .19
VX × VL               4       444      33.89     < .001      .23
VX × CMP              4       444       1.85       .118      .02
VX × DF               4       444      14.41     < .001      .12
Age                   4        54       5.03       .029      .09
Table 5: Summary results from 57 screen-reader participants with (N=21) and without (N=36) VoxLens using a linear mixed
model [30, 54]. “VL” is the visualization library, “CMP” is data complexity, and “DF” is question difficulty. Partial eta-squared
(ηp²) is a measure of effect size [20]. All results are statistically significant (p < .05) except VX × CMP.
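
As a rough illustration of how partial eta-squared relates to an F statistic, the sketch below uses the common identity ηp² = (F · df_n) / (F · df_n + df_d). It assumes this identity is an adequate approximation for the mixed model used here, which the paper does not state, and it is applied only to the three interaction rows of Table 5; the function and variable names are hypothetical.

```python
def partial_eta_squared(f_value, df_num, df_den):
    """Approximate partial eta-squared from F and its degrees of freedom."""
    return (f_value * df_num) / (f_value * df_num + df_den)


# (F, df_n, df_d, reported effect size) for the interaction rows of Table 5.
interaction_rows = {
    "VX x VL": (33.89, 4, 444, 0.23),
    "VX x CMP": (1.85, 4, 444, 0.02),
    "VX x DF": (14.41, 4, 444, 0.12),
}

estimates = {
    name: partial_eta_squared(f_value, df_num, df_den)
    for name, (f_value, df_num, df_den, _reported) in interaction_rows.items()
}

for name, (_f, _dfn, _dfd, reported) in interaction_rows.items():
    print(f"{name}: estimated {estimates[name]:.3f}, reported {reported:.2f}")
```

Under this assumption the estimates land within about .01 of the reported values; small differences are expected because F is itself reported to two decimals.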

resulted in almost identical interaction times. Figure 5 portrays larger variations in
interaction times for users who did not use VoxLens (data used from prior work [69]) compared to
VoxLens users. We attribute these observed differences to the different underlying
implementations of the visualization libraries.
   We investigated the effects of Age on IT. Age had a significant effect on IT (F(1,54)=5.03,
p<.05, ηp²=.09), indicating that IT differed significantly across the ages of our participants,
with participants aged 50 or older showing higher interaction times by about 7% compared to
participants under the age of 50. Table 8 (Appendix C) shows the average IT for each age range
by VoxLens group. Additionally, we report our findings in Table 5.
   Similar to our exploration of investigating the effect of screen-reader users (SRU) on AEI,
we examined the main effect of SRU on IT. Our results show that SRU had a significant effect on
IT (F(4,54)=48.84, p<.001, ηp²=.48), with non-screen-reader users performing 99% faster than
VoxLens users.

5.3       Interview Results
To assess VoxLens qualitatively, we investigated the overall experiences of our participants
with VoxLens, the features they found helpful, the challenges they faced during the interaction,
and the improvements and future features that could enhance the performance of VoxLens. We
identified five main results from analyzing our participants’ feedback about VoxLens: (1) a
positive step forward in making online data visualizations accessible, (2) interactive dialogue
is one of the “top” features, (3) sonification helps in “visualizing” data, (4) data summary is
a good starting point, and (5) one-size-fits-all is not the optimal solution. We present each of
these in turn.

5.3.1 A Positive Step Forward in Making Online Data Visualizations Accessible. All participants
found VoxLens to be an overall helpful tool to interact with and quickly extract information
from online data visualizations. For example, S1 and S3 expressed their excitement about
VoxLens:

        I have never been able to really interact
        with graphs before online. So without the
        tool, I am not able to have that picture
        in my head about what the graph looks
        like. I mean, like, especially when looking
        up news articles or really any, sort of,
        like, social media, there’s a lot of visual
        representations and graphs and pictographs
        that I don’t have access to so I could see
        myself using [VoxLens] a lot. The tool is
        really great and definitely a positive step
        forward in creating accessible graphs and
        data. (S1)

Figure 5: Interaction Times (IT), in seconds, for screen-reader users without (N=36) and with
(N=21) VoxLens by (a) Visualization Library (VL), (b) Data Complexity Level (CMP), and (c)
Question Difficulty Level (DF). Error bars represent mean ± standard deviation. Lower is better
(faster).

        Oh, [VoxLens] was outstanding. It’s
        definitely a great way to visualize the
        graphs if you can’t see them in the charts.
        I mean, it’s just so cool that this is
        something that allows a blind person to
        access a graph and a chart and be able to
        parse data from it. (S3)
   Participants highlighted that VoxLens contributes to bridging the access gap between
screen-reader- and non-screen-reader users. As S4 said:
        So, as a sighted person looks at a graph
        and as they can tell where the peak is or
        which one has the most or whatever, we want
        to be able to do that quickly as well. And
        even if there is a text description under
        the graph, and I’ve not seen that very much,
        you have to read through everything to find
        a certain piece of information that you’re
        looking for. [Using VoxLens], I can find
        out specific pieces of information without
        having to read an entire page of text. (S4)
   Additionally, participants identified that VoxLens enables them to quickly extract
information from online data visualizations. S5 shared his experiences:
        Again, you know, [VoxLens] helps you find
        data a little bit quicker than navigating
        with a screen reader, and it’ll give you a
        brief idea of what the data is about before
        you start digging deeper into it. (S5)
   The findings from our first result show that VoxLens contributes to reducing the access gap
for screen-reader users, and is a positive step forward, enabling screen-reader users to
interact with and explore online data visualizations.

5.3.2 Interactive Dialogue is One of the “Top” Features. Similar to our first finding, all the
participants found the question-and-answer mode of VoxLens a fast and efficient way to extract
information from online data visualizations. S2 considered the question-and-answer mode as one
of the key features of VoxLens:
        So I believe that one of the really top
        features is, kind of, interactive dialogue.
   Similarly, S1 found the question-and-answer mode a fast and reliable way to extract
information, requiring “a lot less brain power.” She said:
        I especially liked the part of the tool where
        you can ask it a question and it would give
        you the information back. I thought it was
        brilliant actually. I felt like being able
        to ask it a question made everything go a
        lot faster and it took a lot less brain power
        I think. I felt really confident about the
        answers that it was giving back to me. (S1)
   S3 noted the broader utility and applicability of the question-and-answer mode:
        The voice activation was very, very neat. I’m
        sure it could come in handy for a variety of
        uses too. I definitely enjoyed that feature.
        (S3)

   S5 faced some challenges in activating the right command but was able to learn the usage of the question-and-answer mode in a few tries:
       You know, sometimes the word was wrong and
       I think it says something like, it didn’t
       understand, but basically eventually I got
       it right. (S5)
   Our second finding indicates that VoxLens’ question-and-answer mode is a fast, efficient, and reliable way for screen-reader users to extract information. Additionally, the feedback from the question-and-answer mode assists screen-reader users to resolve the challenges by themselves within a few tries.

5.3.3 Sonification Helps in “Visualizing” Data. Our third result reveals that our participants found sonification helpful in understanding general trends in the data. Specifically, participants were able to infer whether an overall trend was increasing or decreasing, obtaining holistic information about the data. S2 said:
       The idea of sonification of the graph
       could give a general understanding of the
       trends. The way that it could summarize the
       charts was really nice too. The sonification
       feature was amazing. (S2)
   S1, who had never used sonification before, expressed her initial struggles interpreting a sonified response but was able to “visualize” the graph through sonification within a few tries. She said:
       The audio graph... I’d never used one before,
       so I kind of struggled with that a little bit
       because I wasn’t sure if the higher pitch
       meant the bar was higher up in the graph or
       not. But being able to visualize the graph
       with this because of the sound was really
       helpful. (S1)
   Overall, our third result shows that sonification is a helpful feature for screen-reader users to interact with data visualizations, providing them with holistic information about data trends.

5.3.4 Data Summary is a Good Starting Point. In keeping with findings from prior work [69], our fourth finding indicates that screen-reader users first seek to obtain a holistic overview of the data, finding a data summary to be a good starting point for visualization exploration. The summary mode of VoxLens enabled our participants to quickly get a “general picture” of the data. S1 and S4 expressed the benefits of VoxLens’ summary mode:
       I thought the summary feature was really
       great just to get, like, a general picture
       and then diving deeper with the other
       features to get a more detailed image in
       my head about what the graphs look like. (S1)

       So, um, the summary option was a good start
       point to know, okay, what is, kind of, on
       the graph. (S4)
   Our fourth result indicates that VoxLens’ summary mode assisted screen-reader users to holistically explore the online data visualizations, helping them in determining if they want to dig deeper into the data.

5.3.5 One-Size-Fits-All Is Not the Optimal Solution. To enhance the usability of and interaction experience with VoxLens, our participants identified the need to cater to the individual preferences of the screen-reader users. For example, S3 recognized the need to have multiple options to “play” with the sonified response:
       So I was just thinking maybe, you know,
       that could be some sort of option or like
       an alternate way to sonify it. Perhaps
       having an option to do it as continuous
       cause I noticed, like, they were all
       discrete. ’Cause sometimes, you know, it’s
       just preference or that could be something
       that could add some usability. It’s just
       some little things to maybe play with or to
       maybe give an option or something. (S3)
   Similarly, S4 was interested in VoxLens predicting what she was going to ask using artificial intelligence (A.I.). She said:
       You know, I think that [VoxLens] would need
       a lot more artificial intelligence. It could
       be a lot [more] intuitive when it comes to
       understanding what I’m going to ask. (S4)
   Additionally, S2 suggested adding setting preferences for the summary and the auditory sonified output:
       [Summary mode] could eventually become a
       setting preference or something that can
       be disabled. And you, as a screen-reader
       user, could not control the speed of the
       [sonification] to you. To go faster or to
       go slower, even as a blind person, would be
       [helpful]. (S2)
   Our findings indicate that a one-size-fits-all solution is not optimal and instead, a personalizable solution should be provided, a notion supported by recent work in ability-based design [79]. We are working to incorporate the feedback and suggestions from our participants into VoxLens.

5.4    Subjective Workload Ratings
We used the NASA Task Load Index (TLX) [38] workload questionnaire to collect subjective ratings for VoxLens. The NASA-TLX instrument asks participants to rate the workload of a task on six scales: mental demand, physical demand, temporal demand, performance, effort, and frustration. Each scale ranges from low (1) to high (20). We further classified the scale into four categories for a score x: low (x < 6), somewhat low (6 ≤ x < 11), somewhat high (11 ≤ x < 16), and high (16 ≤ x). Our results indicate that VoxLens requires low physical- (M=3.4, SD=3.3) and temporal demand (M=5.7, SD=3.8), and has high perceived performance (M=5.6, SD=5.6). Mental demand (M=7.8, SD=4.4), effort (M=9.9, SD=6.1), and frustration (M=8.3, SD=6.6) were somewhat low.
   Prior work [69], which is the source of our data for screen-reader users who did not use VoxLens, did not conduct a NASA-TLX survey with their participants. Therefore, a direct workload comparison is not possible. However, the subjective ratings from our study could serve as a control for comparisons in future work attempting to make online visualizations accessible for screen-reader users.
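
The four workload categories defined in Section 5.4 can be expressed as a small classification function. The following is a minimal sketch, not part of VoxLens: the function name `classifyWorkload` and the object `reportedMeans` are hypothetical names introduced here for illustration, and the only values used are the thresholds and mean ratings reported in Section 5.4. Note that, to our understanding of the NASA-TLX instrument, the performance scale runs from good (low rating) to poor (high rating), so a "low" rating on that scale corresponds to high perceived performance.

```javascript
// Hypothetical helper (not part of VoxLens): maps a NASA-TLX rating on the
// 1-20 scale to one of the four categories defined in Section 5.4.
function classifyWorkload(x) {
  if (!Number.isFinite(x) || x < 1 || x > 20) {
    throw new RangeError("NASA-TLX ratings range from 1 to 20");
  }
  if (x < 6) return "low";            // x < 6
  if (x < 11) return "somewhat low";  // 6 <= x < 11
  if (x < 16) return "somewhat high"; // 11 <= x < 16
  return "high";                      // 16 <= x
}

// Mean ratings as reported in Section 5.4 (the key names are illustrative).
const reportedMeans = {
  physicalDemand: 3.4,
  temporalDemand: 5.7,
  performance: 5.6, // lower rating = better perceived performance
  mentalDemand: 7.8,
  effort: 9.9,
  frustration: 8.3,
};

// Classify every reported mean.
const categories = Object.fromEntries(
  Object.entries(reportedMeans).map(([scale, mean]) => [
    scale,
    classifyWorkload(mean),
  ])
);

console.log(categories);
```

Applied to the reported means, physical demand, temporal demand, and performance fall into the "low" category, while mental demand, effort, and frustration fall into "somewhat low", which matches the classification stated in the text.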
CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA               Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock

6     DISCUSSION
In this work, we created VoxLens, an interactive JavaScript plug-in to make online data visualizations more accessible to screen-reader users. This work has been guided by the recommendations and findings from prior work [69] that highlight the barriers screen-reader users face in accessing the information contained in online data visualizations. In creating VoxLens, we sought to improve the accessibility of online data visualizations by making them discoverable and comprehensible to screen readers, and by enabling screen-reader users to explore the data both holistically and in a drilled-down manner. To achieve this, we designed three modes of VoxLens: (1) Question-and-Answer mode; (2) Summary mode, and (3) Sonification mode. Our task-based experiments show that screen-reader users extracted information 122% more accurately and spent 36% less time when interacting with online data visualizations using VoxLens than without. Additionally, we observed that screen-reader users utilized VoxLens uniformly across all visualization libraries that were included in our experiments, irrespective of the underlying implementations and accessibility measures of the libraries, achieving a consistent interaction.

6.1     Simultaneous Improvement In Accuracy and Interaction Times
Prior work [69] has reported that due to the inaccessibility of online data visualizations, screen-reader users extract information 62% less accurately than non-screen-reader users. We found that VoxLens improved the accuracy of information extraction of screen-reader users by 122%, reducing the information extraction gap between the two user groups from 62% to 15%. However, in terms of interaction time, while VoxLens reduced the gap from 211% to 99%, the difference is still statistically significant between non-screen-reader and VoxLens users. Non-screen-reader users utilize their visual system’s power to quickly recognize patterns and extrapolate information from graphs, such as overall trends and extrema [57]. In contrast, screen-reader users rely on alternative techniques, such as sonification, to understand data trends. However, hearing a sonified version of the data can be time-consuming, especially when the data cardinality is large, contributing to the difference in the interaction times between the two user groups. Additionally, issuing a voice command, pressing a key combination, and the duration of the auditory response can also contribute to the observed difference. However, it is worth emphasizing that screen-reader users who used VoxLens improved their interaction time by 36% while also increasing their accuracy of extracted information by 122%. In other words, VoxLens users became both faster and more accurate, a fortunate outcome often hard to realize in human performance studies due to speed-accuracy tradeoffs.

6.2     Role of Voice-Recognition Technology
For screen-reader users who used VoxLens, 75% (N=141) of the answers were correct and 11% (N=20) were incorrect. Our participants were unable to extract the answers from the remaining 15% (N=28) of the questions. Further exploration revealed that among the 25% (N=48) of the questions that were not answered correctly, 52% (N=25) involved symmetry comparison. Symmetry comparison requires value retrieval of multiple data points and relies on the effectiveness of voice-recognition technology. Additional investigation showed that to answer the questions in our experiment tasks, screen-reader users utilized the Question-and-Answer mode 71.9% of the time, compared to the Summary (22.5%) and Sonification (5.5%) modes. Out of the 71.9% Question-and-Answer mode usage, VoxLens accurately recognized and responded to commands 49.9% of the time; 34% of the time VoxLens was unable to accurately parse the speech input, and the remaining 16.1% of the time VoxLens received commands that were not supported (e.g., “correlation coefficient”). VoxLens uses the Web Speech API [60] for recognizing voice commands. While the Web Speech API is a great leap forward in terms of speech-input and text-to-speech output features [60], it is still an experimental feature with limited performance of about 70% accuracy [64]. Therefore, future work could evaluate the performance of VoxLens with alternatives to the Web Speech API.

6.3     Qualitative Assessment of VoxLens
All six screen-reader users we interviewed expressed that VoxLens significantly improved their current experiences with online data visualizations. Participants showed their excitement about VoxLens assisting them in “visualizing” the data and in extracting information from important visualizations, such as the ones portraying COVID-19 statistics. Furthermore, some of our participants highlighted that VoxLens reduces the access gap between screen-reader- and non-screen-reader users. For example, S4 mentioned that with the help of VoxLens, she was able to “find out specific pieces of information without having to read an entire page of text,” similar to how a “sighted person” would interact with the graph. Additionally, our participants found VoxLens “pretty easy,” “meaningful,” “smooth,” and “intuitive,” without requiring a high mental demand.

6.4     VoxLens is a Response to Call-to-Action for Inclusive Data Visualizations
Taking these findings together, VoxLens is a response to the call-to-action put forward by Marriott et al. [57] that asserts the need to improve accessibility for disabled people disenfranchised by existing data visualizations and tools. VoxLens is an addition to the tools and systems designed to make the Web an equitable place for screen-reader users, aiming to bring their experiences on a par with that of non-screen-reader users. Through effective advertisement, and by encouraging developers to integrate VoxLens into the codebase of visualization libraries, we hope to broadly expand the reach and impact of VoxLens. Additionally, through collecting anonymous usage logs (VoxLens modes used, commands issued, and responses issued) and feedback from users—a feature already implemented in VoxLens—we aspire to continue improving the usability and functionality of VoxLens for a diverse group of users.

7     LIMITATIONS & FUTURE WORK
At present, VoxLens is limited to two-dimensional data visualizations with a single series of data. Future work could study the experiences of screen-reader users with n-dimensional data visualizations and multiple series of data, and extend the functionality of VoxLens based on the findings. Additionally, VoxLens is currently only fully functional on Google Chrome, as the support for the Web Speech API’s speech recognition is currently limited to Google

Chrome. Future work could consider alternatives to the Web Speech API that offer cross-browser support for speech recognition.
   Our findings showed that some of our participants preferred to have the ability to control the speed, frequency, and waveform of the sonified response. Therefore, future work could extend the functionality of VoxLens by connecting it to a centralized configuration management system, enabling screen-reader users to specify their preferences. These preferences could then be used to generate appropriate responses, catering to the individual needs of screen-reader users.

8    CONCLUSION
We presented VoxLens, a JavaScript plug-in that improves the accessibility of online data visualizations, enabling screen-reader users to extract information using a multi-modal approach. In creating VoxLens, we sought to address the challenges screen-reader users face with online data visualizations by enabling them to extract information both holistically and in a drilled-down manner, using techniques and strategies that they prefer. Specifically, VoxLens provides three modes of interaction using speech and sonification: Question-and-Answer mode, Summary mode, and Sonification mode.
   To assess the performance of VoxLens, we conducted task-based experiments and interviews with screen-reader users. VoxLens significantly improved the interaction experiences of screen-reader users with online data visualizations, both in terms of accuracy of extracted information and interaction time, compared to their conventional interaction with online data visualizations. Our results also show that screen-reader users considered VoxLens to be a “game-changer,” providing them with “exciting new ways” to interact with online data visualizations and saving them time and effort. We hope that by open-sourcing our code for VoxLens and our sonification solution, our work will inspire developers and visualization creators to continually improve the accessibility of online data visualizations. We also hope that our work will motivate and guide future research in making data visualizations accessible.

ACKNOWLEDGMENTS
This work was supported in part by the Mani Charitable Foundation, the University of Washington Center for an Informed Public, and the University of Washington Center for Research and Education on Accessible Technology and Experiences (CREATE). We extend our gratitude to the AccessComputing staff for their support and assistance in recruiting participants. We would also like to thank the anonymous reviewers for their helpful comments and suggestions. Any opinions, findings, conclusions, or recommendations expressed in this work are those of the authors and do not necessarily reflect those of any supporter.

REFERENCES
 [1] Chadia Abras, Diane Maloney-Krichmar, Jenny Preece, et al. 2004. User-centered design. Bainbridge, W. Encyclopedia of Human-Computer Interaction. Thousand Oaks: SAGE Publications 37, 4 (2004), 445–456.
 [2] NV Access. n.d.. NV Access | Download. (Accessed on 08/08/2021).
 [3] Dragan Ahmetovic, Cristian Bernareggi, João Guerreiro, Sergio Mascetti, and Anna Capietto. 2019. Audiofunctions. web: Multimodal exploration of mathematical function graphs. In Proceedings of the 16th International Web for All Conference. 1–10.
 [4] Theodore W. Anderson and Donald A. Darling. 1954. A test of goodness of fit. Journal of the American statistical association 49, 268 (1954), 765–769.
 [5] Apple. n.d.. Audio Graphs | Apple Developer Documentation. https://developer. (Accessed on 08/01/2021).
 [6] Ronald J. Baken and Robert F. Orlikoff. 2000. Clinical Measurement of Speech and Voice. Singular Thomson Learning, San Diego, California, USA. https://
 [7] Nikola Banovic, Rachel L. Franz, Khai N. Truong, Jennifer Mankoff, and Anind K. Dey. 2013. Uncovering Information Needs for Independent Spatial Learning for Users Who Are Visually Impaired. In Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility (Bellevue, Washington) (ASSETS ’13). Association for Computing Machinery, New York, NY, USA, Article 24, 8 pages.
 [8] Katja Battarbee. 2004. Co-experience: understanding user experiences in interaction. Aalto University, Helsinki, Finland. 103 + app. 117 pages. http://urn.fi/URN:
 [9] Donald A. Berry. 1987. Logarithmic Transformations in ANOVA. Biometrics 43, 2 (1987), 439–456.
[10] Jacques Bertin. 1983. Semiology of graphics; diagrams networks maps. Technical Report.
[11] Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. 2011. D3 data-driven documents. IEEE transactions on visualization and computer graphics 17, 12 (2011),
[12] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative research in psychology 3, 2 (2006), 77–101.
[13] Matthew Brehmer, Bongshin Lee, Petra Isenberg, and Eun Kyoung Choe. 2018. Visualizing ranges over time on mobile phones: a task-based crowdsourced evaluation. IEEE transactions on visualization and computer graphics 25, 1 (2018), 619–629.
[14] Matthew Brehmer and Tamara Munzner. 2013. A multi-level typology of abstract visualization tasks. IEEE transactions on visualization and computer graphics 19, 12 (2013), 2376–2385.
[15] Craig Brown and Amy Hurst. 2012. VizTouch: Automatically Generated Tactile Visualizations of Coordinate Spaces. In Proceedings of the Sixth International Conference on Tangible, Embedded and Embodied Interaction (Kingston, Ontario, Canada) (TEI ’12). Association for Computing Machinery, New York, NY, USA, 131–138.
[16] Lorna M. Brown, Stephen A. Brewster, Ramesh Ramloll, Mike Burton, and Beate Riedel. 2003. Design guidelines for audio presentation of graphs and tables. In Proceedings of the 9th International Conference on Auditory Display. Citeseer, Boston University, USA, 284–287.
[17] Ben Caldwell, Michael Cooper, Loretta Guarino Reid, Gregg Vanderheiden, Wendy Chisholm, John Slatin, and Jason White. 2008. Web content accessibility guidelines (WCAG) 2.0.
[18] ChartJS. n.d.. Accessibility | Chart.js. general/accessibility.html. (Accessed on 01/08/2022).
[19] Peter Ciuha, Bojan Klemenc, and Franc Solina. 2010. Visualization of Concurrent Tones in Music with Colours. In Proceedings of the 18th ACM International Conference on Multimedia (Firenze, Italy) (MM ’10). Association for Computing Machinery, New York, NY, USA, 1677–1680.
[20] Jacob Cohen. 1973. Eta-squared and partial eta-squared in fixed factor ANOVA designs. Educational and psychological measurement 33, 1 (1973), 107–112.
[21] Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. 1993. Wizard of Oz studies—why and how. Knowledge-based systems 6, 4 (1993), 258–266.
[22] Patrick Dengler, Anthony Grasso, Chris Lilley, Cameron McCormack, Doug Schepers, and Jonathan Watt. 2011. Scalable Vector Graphics (SVG) 1.1.
[23] Google Developers. 2014. Charts.
[24] Roger L. DuBois. 2017. Web Audio Speech Synthesis / Recognition for p5.js.
[25] Christopher J. Ferguson. 2016. An effect size primer: A guide for clinicians and researchers. In Methodological issues and strategies in clinical research, A.E. Kazdin (Ed.). American Psychological Association, Washington, DC, USA, 301–310.
[26] John H. Flowers. 2005. Thirteen years of reflection on auditory graphing: Promises, pitfalls, and potential new directions. In Proceedings of the 11th Meeting of the International Conference on Auditory Display. Citeseer, Limerick, Ireland, 406–409.
[27] John H. Flowers, Dion C. Buhman, and Kimberly D. Turnage. 1997. Cross-modal equivalence of visual and auditory scatterplots for exploring bivariate data samples. Human Factors 39, 3 (1997), 341–351.
[28] OpenJS Foundation. n.d.. Node.js. (Accessed on 08/08/2021).
[29] The GraphQL Foundation. n.d.. GraphQL | A query language for your API. (Accessed on 08/08/2021).
[30] Brigitte N. Frederick. 1999. Fixed-, random-, and mixed-effects ANOVA models: A user-friendly guide for increasing the generalizability of ANOVA results. In Advances in Social Science Methodology, B. Thompson (Ed.). JAI Press, Stamford, Connecticut, 111–122.

[31] Deen G. Freelon. 2010. ReCal: Intercoder reliability calculation as a web service. International Journal of Internet Science 5, 1 (2010), 20–33.
[32] Arthur Gilmour, Robert D. Anderson, and Alexander L. Rae. 1985. The analysis of binomial data by a generalized linear mixed model. Biometrika 72, 3 (1985), 593–599.
[33] Nicholas A. Giudice, Hari Prasath Palani, Eric Brenner, and Kevin M. Kramer. 2012. Learning Non-Visual Graphical Information Using a Touch-Based Vibro-Audio Interface. In Proceedings of the 14th International ACM SIGACCESS Conference on Computers and Accessibility (Boulder, Colorado, USA) (ASSETS ’12). Association for Computing Machinery, New York, NY, USA, 103–110. 1145/2384916.2384935
[34] The PostgreSQL Global Development Group. n.d.. PostgreSQL: The world’s most advanced open source database. (Accessed on 08/08/2021).
[35] Melita Hajdinjak and France Mihelic. 2004. Conducting the Wizard-of-Oz Experiment. Informatica (Slovenia) 28, 4 (2004), 425–429.
[36] Patrick A.V. Hall and Geoff R. Dowling. 1980. Approximate string matching. ACM computing surveys (CSUR) 12, 4 (1980), 381–402.
[37] Jordan Harband, Shu-yu Guo, Michael Ficarra, and Kevin Gibbons. 1999. Standard ecma-262.
[38] Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in psychology. Vol. 52. Elsevier, North-Holland, Netherlands, 139–183.
[39] Highcharts. n.d.. Sonification | Highcharts. accessibility/sonification. (Accessed on 08/01/2021).
[40] Clare J. Hooper. 2011. Towards Designing More Effective Systems by Understanding User Experiences. SIGWEB Newsl. Autumn, 4, Article 4 (Sept. 2011), 3 pages.
[41] Mike H. Hoyle. 1973. Transformations: An Introduction and a Bibliography. International Statistical Review / Revue Internationale de Statistique 41, 2 (1973), 203–223.
[42] Weijian Hu, Kaiwei Wang, Kailun Yang, Ruiqi Cheng, Yaozu Ye, Lei Sun, and Zhijie Xu. 2020. A comparative study in real-time scene sonification for visually impaired people. Sensors 20, 11 (2020), 3222.
[43] Amy Hurst and Shaun Kane. 2013. Making "Making" Accessible. In Proceedings of the 12th International Conference on Interaction Design and Children (New York, New York, USA) (IDC ’13). Association for Computing Machinery, New York, NY, USA, 635–638.
[44] Apple Inc. n.d.. Accessibility - Vision - Apple. accessibility/vision/. (Accessed on 08/08/2021).
[45] Facebook Inc. n.d.. React – A JavaScript library for building user interfaces. (Accessed on 08/08/2021).
[46] Dae Hyun Kim, Enamul Hoque, and Maneesh Agrawala. 2020. Answering Questions about Charts and Generating Visual Explanations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 1–13. 3376467
[47] Edward Kim and Kathleen F McCoy. 2018. Multimodal deep learning using images and text for information graphic classification. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility. 143–148.
[48] Edward Kim, Connor Onweller, and Kathleen F McCoy. 2021. Information Graphic Summarization using a Collection of Multimodal Deep Neural Networks. In 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 10188–10195.
[49] Klaus Krippendorff. 2011. Computing Krippendorff’s Alpha-Reliability. https:// Retrieved from.
[57] Kim Marriott, Bongshin Lee, Matthew Butler, Ed Cutrell, Kirsten Ellis, Cagatay Goncu, Marti Hearst, Kathleen McCoy, and Danielle Albers Szafir. 2021. Inclusive data visualization for people with disabilities: a call to action. Interactions 28, 3 (2021), 47–51.
[58] David K. McGookin and Stephen A. Brewster. 2006. SoundBar: Exploiting Multiple Views in Multimodal Graph Browsing. In Proceedings of the 4th Nordic Conference on Human-Computer Interaction: Changing Roles (Oslo, Norway) (NordiCHI ’06). Association for Computing Machinery, New York, NY, USA, 145–154. https://
[59] Silvia Mirri, Silvio Peroni, Paola Salomoni, Fabio Vitali, and Vincenzo Rubano. 2017. Towards accessible graphs in HTML-based scientific articles. In 2017 14th IEEE Annual Consumer Communications Networking Conference (CCNC). IEEE, Las Vegas, NV, USA, 1067–1072.
[60] André Natal, Glen Shires, and Philip Jägenstedt. n.d.. Web Speech API. https:// (Accessed on 08/07/2021).
[61] Keita Ohshiro, Amy Hurst, and Luke DuBois. 2021. Making Math Graphs More Accessible in Remote Learning: Using Sonification to Introduce Discontinuity in Calculus. In The 23rd International ACM SIGACCESS Conference on Computers and Accessibility. 1–4.
[62] Michael Q. Patton. 1990. Qualitative evaluation and research methods. SAGE Publications Inc., Saint Paul, MN, USA.
[63] Pablo Picazo-Sanchez, Juan Tapiador, and Gerardo Schneider. 2020. After you, please: browser extensions order attacks and countermeasures. International Journal of Information Security 19, 6 (2020), 623–638.
[64] Ricardo Sousa Rocha, Pedro Ferreira, Inês Dutra, Ricardo Correia, Rogerio Salvini, and Elizabeth Burnside. 2016. A Speech-to-Text Interface for MammoClass. In 2016 IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS). IEEE, Dublin, Ireland and Belfast, Northern Ireland, 1–6. https://doi.org/10.1109/CBMS.2016.25
[65] Bahador Saket, Alex Endert, and Çağatay Demiralp. 2018. Task-based effectiveness of basic visualizations. IEEE transactions on visualization and computer graphics 25, 7 (2018), 2505–2512.
[66] Nik Sawe, Chris Chafe, and Jeffrey Treviño. 2020. Using Data Sonification to Overcome Science Literacy, Numeracy, and Visualization Barriers in Science Communication. Frontiers in Communication 5 (2020), 46.
[67] Anastasia Schaadhardt, Alexis Hiniker, and Jacob O. Wobbrock. 2021. Understanding Blind Screen-Reader Users’ Experiences of Digital Artboards. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 270, 19 pages.
[68] Freedom Scientific. n.d.. JAWS® – Freedom Scientific. https://www. (Accessed on 08/08/2021).
[69] Ather Sharif, Sanjana Chintalapati, Jacob O. Wobbrock, and Katharina Reinecke. 2021. Understanding Screen-Reader Users’ Experiences with Online Data Visualizations. In The 23rd International ACM SIGACCESS Conference on Computers and Accessibility (Virtual Event) (ASSETS ’21). Association for Computing Machinery, New York, NY, USA, To Appear.
[70] Ather Sharif and Babak Forouraghi. 2018. evoGraphs — A jQuery plugin to create web accessible graphs. In 2018 15th IEEE Annual Consumer Communications Networking Conference (CCNC). IEEE, Las Vegas, NV, USA, 1–4. 10.1109/CCNC.2018.8319239
[71] Lei Shi, Idan Zelzer, Catherine Feng, and Shiri Azenkot. 2016. Tickers and Talker: An Accessible Labeling Toolkit for 3D Printed Models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 4896–4907. 2858036.2858507
[50] Klaus Krippendorf. 2018. Content analysis: An introduction to its methodology.             [72] Boris Smus. 2013. Web Audio API: Advanced Sound for Games and Interactive
     SAGE Publications Inc., Pennsylvania, USA.                                                      Apps. O’Reilly Media, California, USA.
[51] Richard J. Landis and Gary G. Kock. 1977. The Measurement of Observer                           eSPyRuL8b7UC
     Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174. http:                    [73] Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M. Drucker, and John
     //                                                                  Stasko. 2021. Collecting and Characterizing Natural Language Utterances for
[52] Bongshin Lee, Arjun Srinivasan, Petra Isenberg, John Stasko, et al. 2021. Post-                 Specifying Data Visualizations. In Proceedings of the 2021 CHI Conference on
     WIMP Interaction for Information Visualization. Foundations and Trends® in                      Human Factors in Computing Systems. Association for Computing Machinery,
     Human-Computer Interaction 14, 1 (2021), 1–95.                                                  New York, NY, USA, Article 464, 10 pages.
[53] Eckhard Limpert, Werner A. Stahel, and Markus Abbt. 2001. Log-normal distri-                    3445400
     butions across the sciences: keys and clues: on the charms of statistics, and how          [74] Jonathan E. Thiele, Michael S. Pratte, and Jefrey N. Rouder. 2011. On perfect
     mechanical models resembling gambling machines ofer a link to a handy way                       working-memory performance with large numbers of items. Psychonomic Bulletin
     to characterize log-normal distributions, which can provide deeper insight into                 & Review 18, 5 (2011), 958–963.
     variability and probability—normal or log-normal: that is the question. BioScience         [75] Ingo R. Titze and Daniel W. Martin. 1998. Principles of voice production.
     51, 5 (2001), 341–352.                                                                     [76] Frances Van Scoy, Don McLaughlin, and Angela Fullmer. 2005. Auditory augmen-
[54] Ramon C. Littell, Henry P. Raymond, and Clarence B. Ammerman. 1998. Statistical                 tation of haptic graphs: Developing a graphic tool for teaching precalculus skill
     analysis of repeated measures data using SAS procedures. Journal of animal                      to blind students. In Proceedings of the 11th Meeting of the International Conference
     science 76, 4 (1998), 1216–1231.                                                                on Auditory Display, Vol. 5. Citeseer, Limerick, Ireland, 5 pages.
[55] Alan Lundgard and Arvind Satyanarayan. 2021. Accessible Visualization via                  [77] W3C. n.d.. WAI-ARIA Overview | Web Accessibility Initiative (WAI) | W3C.
     Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE            (Accessed on 04/11/2021).
     transactions on visualization and computer graphics (2021).                                [78] WebAIM. n.d.. WebAIM: CSS in Action - Invisible Content Just for Screen
[56] Yotam Mann. n.d.. Tone.js. (Accessed on 08/02/2021).                 Reader Users. (Accessed
                                                                                                     on 09/01/2021).
[79] Jacob O. Wobbrock, Krzysztof Z. Gajos, Shaun K. Kane, and Gregg C. Vanderheiden.
     2018. Ability-based design. Commun. ACM 61, 6 (2018), 62–71.
[80] Susan P. Wyche and Rebecca E. Grinter. 2009. Extraordinary Computing: Religion
     as a Lens for Reconsidering the Home. In Proceedings of the SIGCHI Conference
     on Human Factors in Computing Systems. Association for Computing Machinery,
     New York, NY, USA, 749–758.
[81] Wai Yu, Ramesh Ramloll, and Stephen Brewster. 2001. Haptic graphs for blind
     computer users. In Haptic Human-Computer Interaction, Stephen Brewster and
     Roderick Murray-Smith (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
[82] Mingrui Ray Zhang, Ruolin Wang, Xuhai Xu, Qisheng Li, Ather Sharif, and
     Jacob O. Wobbrock. 2021. Voicemoji: Emoji Entry Using Voice for Visually
     Impaired People. In Proceedings of the 2021 CHI Conference on Human Factors in
     Computing Systems. Association for Computing Machinery, New York, NY, USA,
     Article 37, 18 pages.
[83] Haixia Zhao, Catherine Plaisant, Ben Shneiderman, and Jonathan Lazar. 2008.
     Data sonification for users with visual impairment: a case study with georeferenced
     data. ACM Transactions on Computer-Human Interaction (TOCHI) 15, 1
     (2008), 1–28.


                     Gender  Age   Screen Reader  Vision-Loss Level                          Diagnosis
             P1      M       28    NVDA           Blind since birth, Complete blindness      Optic Nerve Hypoplasia
             P2      M       61    JAWS           Complete blindness, Lost vision gradually  Optic Neuropathy
             P3      M       48    JAWS           Complete blindness, Lost vision gradually  Leber Congenital Amaurosis
             P4      F       29    NVDA           Blind since birth, Complete blindness      Optic Nerve Hypoplasia and
             P5      F       37    JAWS           Blind since birth, Complete blindness      Leber Congenital Amaurosis
             P6      F       51    JAWS           Blind since birth, Complete blindness      Retinopathy of Prematurity
             P7      M       58    JAWS           Complete blindness, Lost vision gradually  Glaucoma
             P8      M       30    NVDA           Blind since birth, Complete blindness      Leber Congenital Amaurosis
             P9      F       64    JAWS           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
             P10     F       68    Fusion         Lost vision gradually, Partial blindness   Stargardt’s Maculopathy
             P11     F       73    JAWS           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
             P12     F       64    JAWS           Complete blindness, Lost vision gradually  Cataracts
             P13     M       18    NVDA           Complete blindness                         Brain Tumor
             P14     M       36    JAWS           Blind since birth, Complete blindness      Leber Congenital Amaurosis
             P15     M       25    NVDA           Lost vision gradually, Partial vision      Retinopathy of Prematurity and Subsequent Cataracts
             P16     M       42    JAWS           Blind since birth, Complete blindness      Microphthalmia
             P17     M       68    JAWS           Complete blindness, Lost vision gradually  Detached Retinas
             P18     F       31    NVDA           Blind since birth, Complete blindness      Retinopathy of Prematurity
             P19     F       47    JAWS           Complete blindness, Lost vision gradually  Optic Neuropathy
             P20     M       48    NVDA           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
             P21     M       43    NVDA           Complete blindness, Lost vision gradually  Retinitis Pigmentosa
             P22     M       19    NVDA           Blind since birth, Complete blindness      Retinopathy of Prematurity
Table 6: Screen-reader participants, their gender identification, age, screen reader, vision-loss level, and diagnosis. Under the
Gender column, M = Male, F = Female, and NB = Non-binary.


                          Gender       Age     Screen Reader   Vision-Loss Level        Diagnosis
                    W1    M            25      VoiceOver       Partial vision           Extremely low vision
                    W2    M            28      NVDA            Blind since birth        Optic Nerve Hypoplasia
                    W3    M            23      VoiceOver       Blind since birth        Septo-optic Dysplasia
                    W4    F            26      JAWS            Blind since birth        Leber Congenital Amaurosis
                    W5    M            31      JAWS            Blind since birth        Retinopathy of Prematurity
Table 7: Screen-reader participants for the Wizard-of-Oz experiment, their gender identification, age, screen reader, vision-loss
level, and diagnosis. Under the Gender column, M = Male, F = Female, and NB = Non-binary.


                                       Both Groups              Without VoxLens                     With VoxLens
                  Age Range      N          Mean      SD       N      Mean         SD         N          Mean          SD
                  18-19            3         62.5     30.5     1       96.8         -          2          45.3         9.4
                  20-29            9         40.6     28.0     6       63.7        20.3        3          50.8         7.9
                  30-39          10          44.2     23.1     7       60.6        16.5        3          43.9         4.5
                  40-49          15          67.5     78.7     10      106.7       93.1        5          59.8        14.9
                  50-59            7         47.6     27.3     5       79.1        19.2        2          63.3         3.0
                  60-69          11          64.8     33.6     6       76.8        33.7        5          55.2        10.8
                  > 70             2         127.0    127.4    1       217.1        -          1          60.1          -
Table 8: Summary results from 57 screen-reader participants with (N = 21) and without (N = 36) VoxLens showing the numerical
results for Interaction Time (IT), for each age range. N is the total number of participants for the given age range, Mean is the
average IT in seconds, and SD represents the standard deviation.
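
As an illustration of how per-age-range summaries like those in Table 8 can be computed, the following is a minimal sketch. The function name `summarizeByAgeRange`, the record shape, and the input values are hypothetical and are not the study data or the authors' analysis code. It also assumes a sample standard deviation (n - 1 denominator), which is undefined for a single participant; that would be consistent with the "-" entries in rows where N = 1, though the paper does not state which estimator was used.

```javascript
// Hypothetical helper: summarize Interaction Time (IT, in seconds) per age range.
// Each record is assumed to look like { ageRange: "20-29", it: 63.7 }.
function summarizeByAgeRange(records) {
  // Group IT values by age range.
  const groups = new Map();
  for (const { ageRange, it } of records) {
    if (!groups.has(ageRange)) groups.set(ageRange, []);
    groups.get(ageRange).push(it);
  }

  const summary = {};
  for (const [ageRange, values] of groups) {
    const n = values.length;
    const mean = values.reduce((sum, v) => sum + v, 0) / n;
    // Assumption: sample SD with an (n - 1) denominator.
    // It is undefined for n = 1, so null is reported in that case.
    const sd =
      n > 1
        ? Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1))
        : null;
    summary[ageRange] = { n, mean, sd };
  }
  return summary;
}

// Made-up values for illustration only.
const exampleSummary = summarizeByAgeRange([
  { ageRange: "18-19", it: 40 },
  { ageRange: "18-19", it: 50 },
  { ageRange: "> 70", it: 60 },
]);
console.log(exampleSummary);
```

Returning `null` rather than 0 for single-participant groups keeps "no estimate available" distinct from "no variability".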