Authors Alida T. Muongchan Ather Sharif Jacob O. Wobbrock Katharina Reinecke Olivia H. Wang
License CC-BY-4.0
VoxLens: Making VoxLens: Making Online Online Data Data Visualizations Visualizations Accessible Accessible with with an an Interactive JavaScript Interactive JavaScript Plug-In Plug-In Ather Sharif Ather Sharif Olivia H. Olivia H. Wang Wang Alida T. Alida T. Muongchan Muongchan asharif@cs.washington.edu asharif@cs.washington.edu wang4@cs.washington.edu wang4@cs.washington.edu alidatm@uw.edu alidatm@uw.edu Paul G.Allen Paul G. AllenSchool Schoolof ofComputer Computer Paul G.Allen Paul G. AllenSchool Schoolof ofComputer Computer Human CenteredDesign Human Centered Designand and Science & Engineering | DUB Group, Science & Engineering | DUB Group, Science & Engineering, Science & Engineering, Engineering, Engineering, University of Washington University of Washington Universityof University ofWashington Washington Universityof University ofWashington Washington Seattle,Washington, Seattle, Washington,USA USA Seattle, Washington, USA Seattle, Washington, USA Seattle, Washington, USA Seattle, Washington, USA Katharina Reinecke Katharina Reinecke Jacob O. Jacob O. Wobbrock Wobbrock reinecke@cs.washington.edu reinecke@cs.washington.edu wobbrock@uw.edu wobbrock@uw.edu Paul G.Allen Paul G. AllenSchool Schoolof ofComputer Computer The InformationSchool The Information School| | Science & Engineering | DUB Group, Science & Engineering | DUB Group, DUB Group, DUB Group, University of Washington University of Washington Universityof University ofWashington Washington Seattle,Washington, Seattle, Washington,USA USA Seattle, Washington, USA Seattle, Washington, USA Figure 1: VoxLens is an open-source JavaScript plug-in that improves the accessibility of online data visualizations using a Figure 1: VoxLens multi-modal is anThe approach. open-source JavaScript code at left plug-in shows that that improves integration the accessibility of VoxLens requires onlyof aonline singledata line visualizations using of code. At right, wea multi-modal portray approach. an example The code interaction at left with showsusing VoxLens that voice-activated integration of VoxLens commands requires only a single for screen-reader line of code. At right, we users. portray an example interaction with VoxLens using voice-activated commands for screen-reader users. ABSTRACT Specifcally, VoxLens enables screen-reader users to obtain a holis- ABSTRACT JavaScript visualization libraries are widely used to create online Specifically, tic summaryVoxLens enables of presented screen-reader information, playusers to obtain sonifed a holis- versions of JavaScript visualization libraries are widely tic data, the summary of presented and interact information, in with visualizations play sonified versions a “drill-down” manner of data visualizations but provide limited access used to create to their online information data visualizations but provide the data, using and interactcommands. voice-activated with visualizations Throughintask-based a “drill-down” manner experiments for screen-reader users. Buildinglimited on prioraccess to their fndings information about the expe- for screen-reader users. Building ononline prior findings about the expe- using21voice-activated with screen-reader commands. users, we showThrough task-basedimproves that VoxLens experiments the riences of screen-reader users with data visualizations, we riences VoxLens, of screen-reader users with online data visualizations, with 21 screen-reader accuracy of information users, we show extraction andthat VoxLenstime interaction improves the by 122% present an open-source JavaScript plug-in that—withwe a present VoxLens, an open-source JavaScript plug-in that—with accuracy and of information 36%, respectively, overextraction and interaction existing conventional time by with interaction 122% single line of code—improves the accessibility of online data visu-a single lineforof screen-reader code—improves the accessibility of onlineapproach. data visu- and 36%, online datarespectively, over visualizations. Our existing conventional interviews interactionusers with screen-reader with alizations users using a multi-modal alizations for screen-reader users using a multi-modal approach. online data suggest thatvisualizations. Our interviews with VoxLens is a “game-changer” screen-reader in making online users data suggest that VoxLens visualizations accessibleisto a “game-changer” in making screen-reader users, saving online data them time Permission to make digital or hard copies of part or all of this work for personal or visualizations and efort. accessible to screen-reader users, saving them time classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation and effort. This work is licensed under a Creative Commons Attribution International on the first page. Copyrights for third-party components of this work must be honored. 4.0 License. For all other uses, contact the owner/author(s). CCS CONCEPTS CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA © 2022 Copyright held by the owner/author(s). • Human-centered computing → Information visualization; © 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9157-3/22/04. ACM ISBN 978-1-4503-9157-3/22/04. Accessibility systems and tools; • Social and professional top- https://doi.org/10.1145/3491102.3517431 https://doi.org/10.1145/3491102.3517431 ics → People with disabilities. CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock KEYWORDS customizable.) Additionally, VoxLens reduces the burden on visu- Visualizations, accessibility, screen readers, voice-based interaction, alization creators in applying accessibility features to their data blind, low-vision. visualizations, requiring inserting only a single line of JavaScript code during visualization creation. Furthermore, VoxLens enables ACM Reference Format: screen-reader users to explore the data as per their individual pref- Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, erences, without relying on the visualization creators and without and Jacob O. Wobbrock. 2022. VoxLens: Making Online Data Visualizations having to process data in their minds. VoxLens is the frst system to: Accessible with an Interactive JavaScript Plug-In. In CHI Conference on (1) enable screen-reader users to interact with online data visualiza- Human Factors in Computing Systems (CHI ’22), April 29-May 5, 2022, New Orleans, LA, USA. ACM, New York, NY, USA, 19 pages. https://doi.org/10. tions using voice-activated commands; and (2) ofer a multi-modal 1145/3491102.3517431 solution using three diferent modes of interaction. To assess the performance of VoxLens, we conducted controlled task-based experiments with 21 screen-reader users. Specifcally, 1 INTRODUCTION we analyzed the accuracy of extracted information and interaction Online data visualizations are present widely on the Web, allowing time with online data visualizations. Our results show that with experts and non-experts alike to explore and analyze data both VoxLens, compared to without it, screen-reader users improved simple and complex. They assist people in extracting information their accuracy of extracting information by 122% and reduced their efectively and efciently, taking advantage of the ability of the overall interaction time by 36%. Additionally, we conducted follow- human mind to recognize and interpret visual patterns [57]. up semi-structured interviews with six participants, fnding that However, the visual nature of data visualizations inherently dis- VoxLens is a positive step forward in making online data visual- enfranchises screen-reader users, who may not be able to see or izations accessible, interactive dialogue is one of the ‘top’ features, recognize visual patterns [52, 57]. We defne “screen-reader users,” sonifcation helps in ‘visualizing’ data, and data summary is a good following prior work [69], as people who utilize a screen reader (e.g., starting point. Furthermore, we assessed the perceived workload JAWS [68], NVDA [2], or VoiceOver [44]) to read the contents of a of VoxLens using the NASA-TLX questionnaire [38], showing that computer screen. They might have conditions including complete VoxLens leaves users feeling successful in their performance and or partial blindness, low vision, learning disabilities (such as alexia), demands low physical efort. motion sensitivity, or vestibular hypersensitivity. The main contributions of our work are as follows: Due to the inaccessibility of data visualizations, screen-reader (1) VoxLens, an interactive JavaScript plug-in that improves users commonly cannot access them at all. Even when the data the accessibility of online data visualizations for screen- visualization includes basic accessibility functions (e.g., alternative reader users. VoxLens ofers a multi-modal solution, en- text or a data table), screen-reader users still spend 211% more time abling screen-reader users to explore online data visualiza- interacting with online data visualizations and answer questions tions, both holistically and in a drilled-down manner, using about the data in the visualizations 61% less accurately, compared voice-activated commands. We present its design and archi- to non-screen-reader users [69]. Screen-reader users rely on the cre- tecture, functionality, commands, and operations. Addition- ators of visualizations to provide adequate alternative text, which is often incomplete. Additionally, they have to remember and process ally, we open-source our implementation at https://github.com/athersharif/voxlens. more information mentally than is often humanly feasible [74], (2) Results from formative and summative studies with screen- such as when seeking the maximum or minimum value in a chart. reader users evaluating the performance of VoxLens. With Prior work has studied the experiences of screen-reader users with VoxLens, screen-reader users signifcantly improved their in- online data visualizations and highlighted the challenges they face, teraction performance compared to their conventional inter- the information they seek, and the techniques and strategies that action with online data visualizations. Specifcally, VoxLens could make online data visualizations more accessible [69]. Building increased their accuracy of extracting information by 122% on this work, it is our aim to realize a novel interactive solution to and decreased their interaction time by 36% compared to not enable screen-reader users to efciently interact with online data using VoxLens. visualizations. To this end, we created an open-source JavaScript plug-in called “VoxLens,” following an iterative design process [1]. VoxLens pro- 2 RELATED WORK vides screen-reader users with a multi-modal solution that supports We review the previous research on the experiences of screen-reader three modes of interaction: (1) Question-and-Answer mode, where users with online data visualizations and the systems designed to the user verbally interacts with the visualizations on their own; (2) improve the accessibility of data visualizations for screen-reader Summary mode, where VoxLens describes the summary of the in- users. Additionally, we review existing JavaScript libraries used to formation contained in the visualization; and (3) Sonifcation mode, create online visualizations, and tools that generate audio graphs. where VoxLens maps the data in the visualization to a musical scale, enabling listeners to interpret the data trend. (Existing sonif- cation tools are either proprietary [39] or written in a programming 2.1 Experiences of Screen-Reader Users with language other than JavaScript [5], making them unintegratable Online Data Visualizations with popular JavaScript visualization libraries; VoxLens’ sonifca- Understanding the experiences and needs of users is paramount tion feature is open-source, integratable with other libraries, and in the development of tools and systems [8, 40]. Several prior VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA research eforts have conducted interviews with blind and low- 2.3 Existing JavaScript Data Visualization vision (BLV) users to understand their experiences with technology Libraries [7, 43, 67, 69, 82]. Most recently, Zhang et al. [82] conducted inter- Several JavaScript data visualization libraries exist that enable visu- views with 12 BLV users, reporting four major challenges of cur- alization creators to make visualizations for the Web. We classifed rent emoji entry methods: (1) the entry process is time-consuming; these visualization libraries into two categories based on accessibil- (2) the results from these methods are inconsistent with the ex- ity features: (1) libraries that rely on developers to append appro- pectations of users; (3) there is a lack of support for discovering priate alternative text (e.g., D3 and ChartJS); and (2) libraries that new emojis; and (4) there is a lack of support for fnding the right automatically provide screen-reader users with built-in features for emojis. They utilized these fndings to develop Voicemoji, a speech- data access (e.g., Google Charts). based emoji entry system that enables BLV users to input emojis. Bostock et al. [11] developed D3—a powerful visualization library Schaadhardt et al. [67] conducted contextual interviews with 12 that uses web standards to generate graphs. D3 uses Scalable Vector blind users, identifying key accessibility problems with 2-D digi- Graphics (SVG) [22] to create such visualizations, relying on the de- tal artboards, such as Microsoft PowerPoint and Adobe Illustrator. velopers to provide adequate alternative text for screen-reader users Similarly, Sharif et al. [69] conducted contextual interviews with to comprehend the information contained in the visualizations. 9 screen-reader users, highlighting the inequities screen-reader Google Charts [23] is a visualization tool widely used to create users face when interacting with online data visualizations. They graphs. An important underlying accessibility feature of Google reported the challenges screen-reader users face, the information Charts is the presence of a visually hidden tabular representation they seek, and the techniques and strategies they prefer to make on- of data. While this approach allows screen-reader users to access line data visualizations more accessible. We rely upon the fndings the raw data, extracting information is a cumbersome task. Fur- from Sharif et al. [69] to design VoxLens, an interactive JavaScript thermore, tabular representations of data introduce excessive user plug-in that improves the accessibility of online data visualizations, workloads, as screen-reader users have to sequentially go through deriving motivation from Marriott et al.’s [57] call-to-action for each data point. The workload is further exacerbated as data car- creating inclusive data visualizations for people with disabilities. dinality increases, forcing screen-reader users to memorize each data point to extract even the most fundamental information such 2.2 Accessibility of Online Data Visualizations as minimum or maximum values. In contrast to these approaches, VoxLens introduces an alternate Prior research eforts have explored several techniques to make data way for screen-reader users to obtain their desired information visualizations more accessible to BLV users, including automatically without relying on visualization creators, and without mentally generating alternative text for visualization elements [48, 59, 70], computing complex information through memorization of data. sonifcation [3, 5, 16, 27, 39, 58, 83], haptic graphs [33, 76, 81], 3-D printing [15, 43, 71], and trend categorization [47]. For example, 2.4 Audio Graphs Sharif et al. [70] developed evoGraphs, a jQuery plug-in to create accessible graphs by automatically generating alternative text. Sim- Prior work has developed sonifcation tools to enable screen-reader ilarly, Kim et al. [47] created a framework that uses multimodal users to explore data trends and patterns in online data visualiza- deep learning to generate summarization text from image-based tions [3, 5, 39, 58, 83]. McGookin et al. [58] developed SoundBar, line graphs. Zhao et al. [83] developed iSonic, which assists BLV a system that allows blind users to gain a quick overview of bar users in exploring georeferenced data through non-textual sounds graphs using musical tones. Highcharts [39], a proprietary commer- and speech output. They conducted in-depth studies with seven cial charting tool, ofers data sonifcation as an add-on. Apple Audio blind users, fnding that iSonic enabled blind users to fnd facts Graphs [5] is an API for Apple application developers to construct and discover trends in georeferenced data. Yu et al. [81] developed an audible representation of the data in charts and graphs, giving a system to create haptic graphs, evaluating their system using BLV users access to valuable data insights. Similarly, Ahmetovic et an experiment employing both blind and sighted people, fnding al. [3] developed a web app that supports blind people in exploring that haptic interfaces are useful in providing the information con- graphs of mathematical functions using sonifcation. tained in a graph to blind computer users. Hurst et al. [43] worked At least one of the following is true for all of the aforementioned with six individuals with low or limited vision and developed Viz- systems: (1) they are proprietary and cannot be used outside of Touch, software that leverages afordable 3-D printing to rapidly their respective products [39]; (2) they are standalone hardware and automatically generate tangible visualizations. or software applications [3]; (3) they require installation of extra Although these approaches are plausible solutions for improving hardware or software [58]; or (4) they are incompatible with existing the accessibility of visualizations for screen-reader users, at least JavaScript libraries [5]. VoxLens provides sonifcation as a separate one of the following is true for all of them: (1) they require additional open-source library (independent from the VoxLens library) that is equipment or devices; (2) they are not practical for spontaneous customizable and integratable with any JavaScript library or code. everyday web browsing; (3) they do not ofer a multi-modal solution; and (4) they do not explore the varying preferences of visualization 3 DESIGN OF VOXLENS interaction among screen-reader users. In contrast, VoxLens does We present the design and implementation of VoxLens, an open- not need any additional equipment, is designed for spontaneous source JavaScript plug-in that improves the accessibility of online everyday web browsing, and ofers a multi-modal solution catering data visualizations. We created VoxLens using a user-centered itera- to the individual needs and abilities of screen-reader users. tive design process, building on fndings and recommendations from CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock prior work [69]. Specifcally, our goal was to provide screen-reader the data collected from the Wizard-of-Oz studies. We provide our users with a comprehensive means of extracting information from fndings from the Wizard-of-Oz studies for each VoxLens mode in online data visualizations, both holistically and in a drilled-down its respective section, below. fashion. Holistic exploration involves overall trend, extremum, and labels and ranges for each axis, whereas drilled-down interaction 3.1.2 Qestion-and-Answer Mode. In Question-and-Answer mode, involves examining individual data points [69]. We named our tool screen-reader users can extract information from data visualiza- VoxLens, combining “vox,” meaning “voice” in Latin, and “lens,” tions by asking questions verbally using their microphone. We used since it provides a way for screen-reader users to explore, examine, the Web Speech API [60] and the P5 Speech library [24] for speech and extract information from online data visualizations. Currently, input, removing the need for any additional software or hardware VoxLens only supports two-dimensional single-series data. installation by the user. Through manual testing, we found the P5 Speech library to perform quite well in recognizing speech from 3.1 Interaction Modes diferent accents, pronunciations, and background noise levels. Af- Following the recommendations from prior work [69], our goal ter getting the text from the speech, we used an approximate string was to enable screen-reader users to gain a holistic overview of the matching algorithm from Hall and Dowling [36] to recognize the data as well as to perform drilled-down explorations. Therefore, commands. Additionally, we verifed VoxLens’ command recogni- we explored three modes of interaction: (1) Question-and-Answer tion efectiveness through manual testing, using prior work’s [73] mode, where the user verbally interacts with the visualizations; data set on natural language utterances for visualizations. (2) Summary mode, where VoxLens verbally ofers a summary of Our Wizard-of-Oz studies revealed that participants liked clear the information contained in the visualization; and (3) Sonifcation instructions and responses, integration with the user’s screen reader, mode, where VoxLens maps the data in the visualization to a mu- and the ability to query by specifc terminologies. They specifed sical scale, enabling listeners to interpret possible data trends or that having an interactive tutorial to become familiar with the tool, patterns. We iteratively built the features for these modes seeking a help menu to determine which commands are supported, and feedback from screen-reader users through our Wizard-of-Oz stud- the ability to include the user’s query in the response as key areas ies. VoxLens channels all voice outputs through the user’s local of improvement. Therefore, after recognizing the commands and screen reader, providing screen-reader users with a familiar and processing their respective responses, VoxLens delivers a single comfortable experience. These three modes of interaction can be combined response to the user via their screen readers. This ap- activated by pressing their respective keyboard shortcuts (Table 1). proach enables screen-reader users to get a response to multiple commands as one single response. Additionally, we also added each 3.1.1 Wizard-of-Oz Studies. Our goal was to gather feedback and query as feedback in the response (Figure 1). For example, if the identify areas of improvement for the VoxLens features. Therefore, user said, “what is the maximum?”, the response was, “I heard you we conducted a Wizard-of-Oz study [21, 35] with fve screen-reader ask about the maximum. The maximum is...” If a command was users (see Appendix B, Table 7). (For clarity, we prefx the codes for not recognized, the response was, “I heard you say [user input]. participants in our Wizard-of-Oz studies with “W.”) We used the Command not recognized. Please try again.” fndings from the studies to inform design decisions when itera- Screen-reader users are also able to get a list of supported com- tively building VoxLens. In our studies, we, the “wizards,” simulated mands by asking for the commands list. For example, the user can the auditory responses from a hypothetical screen reader. ask, “What are the supported commands?” to hear all of the com- mands that VoxLens supports. The list of supported commands, along with their aliases, are presented in Table 2. 3.1.3 Summary Mode. Our Wizard-of-Oz studies, in line with the fndings from prior work [69], revealed that participants liked the efciency and the preliminary exploration of the data. They sug- gested the information be personalized based on the preferences of each user, but by default, it should only expose the minimum amount of information that a user would need to decide if they want to delve further into the data exploration. To delve further, they commonly seek the title, axis labels and ranges, maximum and minimum data points, and the average in online data visualiza- tions. (The title and axis labels are required confguration options Figure 2: Sample visualization showing price by car brands. for VoxLens, discussed further in section 3.2.2 below. Axis ranges, maximum and minimum data points, and average are computed by VoxLens.) At the same time, screen-reader users preferred concisely Participants interacted with all of the aforementioned VoxLens stated information. Therefore, the goal for VoxLens’s Summary modes and were briefy interviewed in a semi-structured manner mode was to generate the summary only as a means to providing with open prompts at the end of each of their interactions. Specif- the foundational holistic information about the visualization, and ically, we asked them to identify the features that they liked and not as a replacement for the visualization itself. We used the “lan- the areas of improvement for each mode. We qualitatively analyzed guage of graphics” [10] through a pre-defned sentence template, VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Keyboard Shortcuts Question-and-Answer Mode Modifier Keys + A / Modifier Keys + 1 Summary Mode Modifier Keys + S / Modifier Keys + 2 Sonifcation Mode Modifier Keys + M / Modifier Keys + 3 Repeat Instructions Modifier Keys + I / Modifier Keys + 4 Table 1: Keyboard shortcuts for VoxLens’ interaction modes and preliminary commands. Modifier Keys for Windows and MacOS were Control+Shift and Option, respectively. Information Type Command Aliases Extremum Maximum Highest Minimum Lowest Axis Labels and Ranges Axis Labels - Ranges - Statistics Mean Average Median - Mode - Variance - Standard Deviation - Sum Total Individual Data Point [x-axis label] value [x-axis label] data Help Commands Instructions - Directions, Help Table 2: Voice-activated commands for VoxLens’ Question-and-Answer mode. identifed as Level 1 by Lundgard et al. [55], to decide the sentence the minimum data point is $20,000 belonging structure. Our sentence template was: to Kia. The average is $60,000. Graph with title: [title]. The X-axis is As noted in prior work [55, 69], the preference for information [x-axis title]. The Y-axis is [y-axis title] varies from one individual to another. Therefore, future work can and ranges from [range minimum] to [range explore personalization options to generate a summarized response maximum]. The maximum data point is [maximum that caters to the individual needs of screen-reader users. y-axis value] belonging to [corresponding Additionally, VoxLens, at present, does not provide information x-axis value], and the minimum data point about the overall trend through the Summary mode. Such infor- is [minimum y-axis value] belonging to mation may be useful for screen-reader users in navigating line [corresponding x-axis value]. The average graphs [47, 48]. Therefore, work is underway to incorporate trend is [average]. information in the summarized response generated for line graphs, utilizing the fndings from prior work [47, 48]. For example, here is a generated summary of a data visualization depicting the prices of various car brands (Figure 2): 3.1.4 Sonification Mode. For Sonifcation mode, our Wizard-of-Oz participants liked the ability to preliminarily explore the data trend. Graph with title: Price by Car Brands. The As improvements, participants suggested the ability to identify key X-axis is car brands. The Y-axis is price and information, such as the maximum and the minimum data points. ranges from $0 to $300,000. The maximum data Therefore, VoxLens’s sonifcation mode presents screen-reader point is $290,000 belonging to Ferrari, and users with a sonifed response (also known as an “audio graph” CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock [72]) mapping the data in the visualization to a musical scale. A and in order. You’ll hear a beep sound, after sonifed response enables the listeners to interpret the data trend which you can ask a question such as, ‘‘what or pattern and gain a big-picture perspective of the data that is is the average?’’ or ‘‘what is the maximum not necessarily achievable otherwise [66]. To generate the sonifed value in the graph?’’ To hear the textual response, we utilized Tone.js [56], a JavaScript library that ofers summary of the graph, press Control + Shift + a wide variety of customizable options to produce musical notes. S or Control + Shift + 2. To hear the sonified Our goal was to enable the listeners to directionally distinguish version of the graph, press Control + Shift between data points and to interpret the overall data trend. + M or Control + Shift + 3. To repeat these Varying tonal frequency is more efective at representing trends instructions, press Control + Shift + I or than varying amplitude [26, 42]. Therefore, we mapped each data Control + Shift + 4. Key combinations must point to a frequency within the range of 130 and 650 Hz based be pressed all together and in order. on its magnitude. For example, for the minimum data point the At this stage, screen-reader users can activate question-and- frequency was 130 Hz, for the maximum data point it was 650Hz, answer mode, listen to the textual summary, play the sonifed ver- and the intermediate data points were assigned values linearly in- sion of the data contained in the visualization, or hear the instruc- between, similar to prior work [19, 61]. Additionally, similar to tions again. Activating the question-and-answer mode plays a beep design choices made by Ohshiro et al. [61], we used the sound of a sound, after which the user can ask a question in a free-form man- sawtooth wave to indicate value changes along the x-axis. These ner, without following any specifc grammar or sentence structure. approaches enabled us to distinctively diferentiate between data They are also able to ask for multiple pieces of information, in no values directionally, especially values that were only minimally dif- particular order. For example, in a visualization containing prices ferent from each other. We chose this range based on the frequency of cars by car brands, a screen-reader user may ask: range of the human voice [6, 58, 75], and by trying several combina- tions ourselves, fnding a setting that was comfortable for human Tell me the mean, maximum, and standard ears. We provide examples of sonifed responses in our paper’s deviation. supplementary materials. Our open-source sonifcation library is The response from VoxLens would be: available at https://github.com/athersharif/sonifer. In our work, we used the three common chart types (bar, scatter, I heard you asking about the mean, maximum, and line) [65], following prior work [69]. All of these chart types use and standard deviation. The mean is $60,000. a traditional Cartesian coordinate system. Therefore, VoxLens’s The maximum value of price for car brands is sonifed response is best applicable to graphs represented using a $290,000 belonging to Ferrari. The standard Cartesian plane. Future work can study sonifcation responses for deviation is 30,000. graphs that do not employ a Cartesian plane to represent data (e.g., Similarly, users may choose to hear the textual summary or play polar plots, pie charts, etc.). the sonifed version, as discussed above. 3.2 Usage and Integration 3.2.2 Visualization Creators. Typically, the accessibility of online 3.2.1 Screen-Reader User. A pain point for screen-reader users data visualizations relies upon visualization creators and their when interacting with online data visualizations is that most vi- knowledge and practice of accessibility standards. When an alter- sualization elements are undiscoverable and incomprehensible by native text description is not provided, the visualization is useless screen readers. In building VoxLens, we ensured that the visualiza- to screen-reader users. In cases where alternative text is provided, tion elements were recognizable and describable by screen readers. the quality and quantity of the text is also a developer’s choice, Hence, as the very frst step, when the screen reader encounters a which may or may not be adequate for screen-reader users. For visualization created with VoxLens, the following is read to users: example, a common unfortunate practice is to use the title of the visualization as its alternative text, which helps screen-reader users Bar graph with title: [title]. To listen in understanding the topic of the visualization but does not help to instructions on how to interact with the in understanding the content contained within the visualization. graph, press Control + Shift + I or Control + Therefore, VoxLens is designed to reduce the burden and depen- Shift + 4. Key combinations must be pressed dency on developers to make accessible visualizations, keeping the all together and in order. interaction consistent, independent of the visualization library used. The modifer keys (Control + Shift on Windows, and Option Additionally, VoxLens is engineered to require only a single line of on MacOS) and command keys were selected to not interfere with code, minimizing any barriers to its adoption (Figure 1). the dedicated key combinations of the screen reader, the Google VoxLens supports the following confguration options: “x” (key Chrome browser, and the operating system. Each command was name of the independent variable), “y” (key name of the dependent additionally assigned a numeric activation key, as per suggestions variable), “title” (title of the visualization), “xLabel” (label for x- from our participants. axis), and “yLabel” (label for y-axis). “x,” “y,” and “title” are required When a user presses the key combination to listen to the instruc- parameters, whereas the “xLabel” and “yLabel” are optional and tions, their screen reader announces the following: default to the key names of “x” and “y,” respectively. VoxLens To interact with the graph, press Control + allows visualization creators to set the values of these confguration Shift + A or Control + Shift + 1 all together options, as shown in Figure 1. VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA 3.3 Channeling VoxLens’ Output to Screen Charts, D3, and ChartJS, integrated VoxLens with each of them, Readers and deployed a testing website on our server. The testing website was instrumental in ensuring the correct operation of VoxLens One of the challenges we faced was to channel the auditory response under various confgurations, bypassing the challenges of setting from VoxLens to the screen reader of the user. As noted by our up a development environment for testers. participants during Wizard-of-Oz studies, screen-reader users have unique preferences for their screen readers, including the voice and speed of the speech output. Therefore, it was important for 3.5 Conficts with Other Plug-ins VoxLens to utilize these preferences, providing screen-reader users To the best of our knowledge, two kinds of conficts are possible with a consistent, familiar, and comfortable experience. To relay the with VoxLens: key combination conficts and ARIA attribute con- output from VoxLens to the screen reader, we created a temporary ficts. As mentioned in section 4.2.1, we selected key combinations to div element that was only visible to screen readers, positioning it avoid conficts with the dedicated combinations of the screen reader, of-screen, following WebAIM’s recommendations [78]. the Google Chrome browser, and the operating system. However, it Then, we added the appropriate Accessible Rich Internet Ap- is possible that some users might have external plug-ins using key plications (ARIA) attributes [77] to the temporary element to en- combinations that would confict with those from VoxLens. Future sure maximum accessibility. ARIA attributes are a set of attributes work could build a centralized confguration management system, to make web content more accessible to people with disabilities. enabling users to specify their own key combinations. Notably, we added the “aria-live” attribute, allowing screen read- VoxLens modifes the “aria-label” attribute of the visualiza- ers to immediately announce the query responses that VoxLens tion container element to describe the interaction instructions for adds to the temporary element. For MacOS, we had to addition- VoxLens, as mentioned in section 4.2.1. It is possible that another ally include the “role” attribute, with its value set to “alert.” This plug-in may intend to modify the “aria-label” attribute as well, in approach enabled VoxLens to promptly respond to screen-reader which case the execution order of the plug-ins will determine which users’ voice-activated commands using their screen readers. After plug-in achieves the fnal override. The execution order of the plug- the response from VoxLens is read by the screen reader, a callback ins depends on several external factors [63], and is, unfortunately, function removes the temporary element from the HTML tree to a common limitation for any browser plug-in. However, VoxLens avoid overloading the HTML Document Object Model (DOM). does not afect the “aria-labelledby” attribute, allowing other sys- tems to gracefully override the “aria-label” attribute set by VoxLens, as this attribute takes precedence over the “aria-label” attribute in 3.4 Additional Implementation Details the accessibility tree. Future iterations of VoxLens will attempt At present, VoxLens only supports two-dimensional data, contain- to ensure that VoxLens achieves the last execution order and that ing one independent and one dependent variable, as only the in- the ARIA labels set by other systems are additionally relayed to teractive experiences of screen-reader users with two-dimensional screen-reader users. data visualizations are well-understood [69]. To support data dimen- It is important to note that VoxLens’s sonifcation library is sions greater than two, future work would need to investigate the supplied independently from the main VoxLens plug-in and does interactive experiences of screen-reader users with n-dimensional not follow the same limitations. Our testing did not reveal any data visualizations. VoxLens is customizable and engineered to conficts of the sonifcation library with other plug-ins. support additional modifcations in the future. VoxLens relies on the Web Speech API [60], and is therefore 4 EVALUATION METHOD only fully functional on browsers with established support for the API such as Google Chrome. JavaScript was naturally our choice of We evaluated the performance of VoxLens using a mixed-methods programming language for VoxLens, as VoxLens is a plug-in for approach. Specifcally, we conducted an online mixed-factorial ex- JavaScript visualization libraries. Additionally, we used EcmaScript periment with screen-reader users to assess VoxLens quantitatively. [37] to take advantage of modern JavaScript features such as de- Additionally, we conducted follow-up interviews with our partici- structured assignments, arrow functions, and the spread operator. pants for a qualitative assessment of VoxLens. We also built a testing tool to test VoxLens on data visualizations, using the React [45] framework as the user-interface framework 4.1 Participants and Node.js [28] as the back-end server—both of which also use Our participants (see Appendix A, Table 6) were 22 screen-reader JavaScript as their underlying programming language. Additionally, users, recruited using word-of-mouth, snowball sampling, and email we used GraphQL [29] as the API layer for querying and connecting distribution lists for people with disabilities. Nine participants iden- with our Postgres [34] database, which we used to store data and tifed as women and 13 as men. Their average age was 45.3 years participants’ interaction logs. (SD=16.8). Twenty participants had complete blindness and two Creating a tool like VoxLens requires signif- participants had partial blindness; nine participants were blind cant engineering efort. Our GitHub repository at since birth, 12 lost vision gradually, and one became blind due to a https://github.com/athersharif/voxlens has a total of 188 brain tumor. The highest level of education attained or in pursuit commits and 101,104 lines of developed code, excluding comments. was a doctoral degree for two participants, a Master’s degree for To support testing VoxLens on various operating systems and seven participants, a Bachelor’s degree for eight participants, and a browsers with diferent screen readers, we collected 30 data sets high school diploma for the remaining fve participants. Estimated of varying data points, created their visualizations using Google computer usage was more than 5 hours per day for 12 participants, CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock 2-5 hours per day for eight participants, and 1-2 hours per day for (2) Symmetry Comparison (comparison of data points); and (3) Chart two participants. The average frequency of interacting with online Type-Specifc Questions (value retrieval for bar charts, trend sum- data visualizations was over two visualizations per day, usually in mary for line charts, and correlation for scatter plots). As in prior the context of news articles, blog posts, and social media. work [69], all questions were multiple-choice questions with four For the task-based experiment and questionnaire, participants choices: the correct answer, two incorrect answers, and the option were compensated with a $20 Amazon gift card for 30-45 minutes for “Unable to extract information.” The order of the four choices of their time. For the follow-up interview, they were compensated was randomized per trial. $10 for 30 minutes of their time. No participant was allowed to partake in the experiment more than once. 4.3 Procedure The study was conducted online by participants without direct 4.2 Apparatus supervision. The study comprised six stages. The frst stage dis- played the study purpose, eligibility criteria, and the statement of We conducted our task-based experiment online using a user study IRB approval. In the second stage, the participants were asked to fll platform that we created with the JavaScript React framework [45]. out a pre-study questionnaire to record their demographic informa- We tested our platform with screen-reader users and ourselves, both tion, screen-reader software, vision-loss level, and diagnosis (see with and without a screen reader, ensuring maximum and proper Appendix A, Table 6). Additionally, participants were asked about accessibility measures. We deployed the experiment platform as a their education level, daily computer usage, and their frequency of website hosted on our own server. interacting with visualizations. We analyzed the performance of VoxLens comparing the data In the third stage, participants were presented with a step-by- collected from our task-based experiments with that from prior step interactive tutorial to train and familiarize themselves with the work [69]. To enable a fair comparison to this prior work, we used modes, features, and commands that VoxLens ofers. Additionally, the same visualization libraries, visualization data set, question cat- participants were asked questions at each step to validate their egories, and complexity levels. The visualization libraries (Google understanding. On average, the tutorial took 12.6 minutes (SD=6.8) Charts, ChartJS, and D3) were chosen based on the variation in their to complete. Upon successful completion of the tutorial, participants underlying implementations as well as their application of accessi- were taken to the fourth stage, which displayed the instructions for bility measures. Google Charts utilizes SVG elements to generate completing the study tasks. the visualization and appends a tabular representation of the data In the ffth stage, each participant was given a total of nine tasks. for screen-reader users, by default; D3 also makes use of SVG ele- For each task, participants were shown three Web pages: Page 1 ments but does not provide a tabular representation; ChartJS uses contained the question to explore, page 2 displayed the question HTML Canvas to render the visualization as an image and relies and visualization, and page 3 presented the question with a set of on the developers to add alternative text (“alt-text”) and Accessible four multiple-choice responses. Figure 3 shows the three pages of Rich Internet Applications (“ARIA”) attributes [77]. Therefore, each an example task. After the completion of the tasks, participants of these visualization libraries provides a diferent experience for were asked to fll out the NASA-TLX [38] survey in the last stage. screen-reader users, as highlighted in prior work [69]. An entire study session ranged from 30-45 minutes in duration. We provide all of the visualizations and data sets used in this work in this paper’s supplementary materials. Readers can repro- duce these visualizations using the supplementary materials in 4.4 Design & Analysis conjunction with the source code and examples presented in our The experiment was a 2 × 3 × 3 × 3 mixed-factorial design with the open-source GitHub repository. We implemented the visualizations following factors and levels: following the WCAG 2.0 guidelines [17] in combination with the of- • VoxLens (V X ), between-Ss.: {yes, no} fcial accessibility recommendations from the visualization libraries. • Visualization Library (V L), within-Ss.: {ChartJS, D3, Google For ChartJS, we added the “role” and “aria-label” attributes to the Charts} “canvas” element. The “role” attribute had the value of “img,” and • Data Complexity (CMP), within-Ss.: {Low, Medium, High} the “aria-label” was given the value of the visualization title, as per • Question Difculty (DF ), within-Ss.: {Low, Medium, High} the ofcial documentation from ChartJS developers [18]. We did not perform any accessibility scafolding for Google Charts and For the screen-reader users who did not use VoxLens (V X =no), D3 visualizations, as these visualizations rely on a combination of we used prior work’s data [69] (N =36) as a baseline for comparison. internal implementations and the features of SVG for accessibility. Our two dependent variables were Accuracy of Extracted Infor- Our goal was to replicate an accurate representation of how these mation (AEI) and Interaction Time (IT). We used a dichotomous visualizations currently exist on the Web. representation of AEI (i.e., “inaccurate” or 0 if the user was unable Recent prior work [46] has reported that the non-visual ques- to answer the question correctly, and “accurate” or 1 otherwise) for tions that users ask from graphs mainly comprise compositional our analysis. We used a mixed logistic regression model [32] with questions, similar to the fndings from Brehmer and Munzner’s the above factors, interactions with VoxLens, and a covariate to task topology [14]. Therefore, our question categories comprised control for Age. We also included Subjectr as a random factor to one “Search” action (lookup and locate) and two “Query” actions account for repeated measures. The statistical model was therefore (identify and compare), similar to prior work [13]. The categories, in AEI ← V X +V X ×V L +V X ×CMP +V X ×DF + Aдe + Subjectr . We ascending order of difculty, were: (1) Order Statistics (extremum); did not include factors for V L, CMP, or DF because our research VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA interview transcript was coded by three researchers independently, and disagreements were resolved through mutual discussions. As suggested by Lombard et al. [51], we calculated inter-rater relia- bility (IRR) using pairwise percentage agreement together with Krippendorf’s α [49]. To calculate pairwise percentage agreement, we calculated the average pairwise agreement among the three rater pairs across observations. Our pairwise percentage agreement was 94.3%, showing a high agreement between raters. Krippendorf’s α was calculated using ReCal [31] and found to be 0.81, indicating a high level of reliability [50]. In addition to conducting follow-up interviews, we administered the NASA-TLX survey [38] with all participants (N =21) to assess the perceived workload of VoxLens. 5 RESULTS We present our experiment results using the Accuracy of Extracted Information (AEI) and Interaction Time (IT) for screen-reader users with and without VoxLens. We also present our interview results and the subjective ratings from the NASA-TLX questionnaire [38]. 5.1 Accuracy of Extracted Information Our results show a signifcant main efect of VoxLens (VX) on AEI (χ 2 (1, N =57)=38.16, p<.001, Cramer’s V =.14), with VoxLens users achieving 75% accuracy (SD = 18.0%) and non-VoxLens users achieving only 34% accuracy (SD = 20.1%). This diference consti- tuted a 122% improvement due to VoxLens. By analyzing the VoxLens (VX) × Visualization Library (VL) inter- action, we investigated whether changes in AEI were proportional across visualization libraries for participants in each VoxLens group. Figure 3: Participants were shown three pages for each task. The V X × V L interaction was indeed statistically signifcant (χ 2 (4, (a) Page 1 presented the question to explore. (b) Page 2 dis- N =57)=82.82, p<.001, Cramer’s V =.20). This result indicates that played the same question and a visualization. (c) Page 3 AEI signifcantly difered among visualization libraries for partic- showed the question again with a set of four multiple choice ipants in each VoxLens group. Figure 4 and Table 3 show AEI responses. percentages for diferent visualization libraries for each VoxLens group. Additionally, we report our fndings in Table 4. Prior work [69] has reported a statistically signifcant diference questions centered around VoxLens (V X ) and our interest in these between screen-reader users (SRU) and non-screen-reader users factors only extended to their possible interactions with VoxLens. (non-SRU) in terms of AEI , attributing the diference to the inac- For Interaction Time (IT ), we used a linear mixed model [30, cessibility of online data visualizations. We conducted a second 54] with the same model terms as for AEI . IT was calculated as analysis, investigating whether AEI was diferent between screen- the total time of the screen reader’s focus on the visualization reader users who used VoxLens and non-screen-reader users, to element. Participants were tested over three Visualization Library × extract information from online data visualizations. Specifcally, we Complexity (V L × CMP) conditions, resulting in 3×3 = 9 trials per investigated the efect of SRU on AEI but did not fnd a statistically participant. With 21 participants, a total of 21×9 = 189 trials were signifcant efect (p ≈ .077). This result itself does not provide evi- produced and analyzed for this study. One participant, who was dence in support of VoxLens closing the access gap between the unable to complete the tutorial, was excluded from the analysis. two user groups; further experimentation is necessary to confrm or refute this marginal result. In light of VoxLens’s other benefts, 4.5 Qualitative and Subjective Evaluation however, this is an encouraging trend. To qualitatively assess the performance of VoxLens, we conducted follow-up interviews with six screen-reader users, randomly se- 5.2 Interaction Time lected from our pool of participants who completed the task-based Our preliminary analysis showed that the interaction times were experiment. Similar to prior work [80], we ceased recruitment of conditionally non-normal, determined using Anderson-Darling [4] participants once we reached saturation of insights. tests of normality. To achieve normality, we applied logarithmic To analyze our interviews, we used thematic analysis [12] guided transformation prior to analysis, as is common practice for time by a semantic approach [62]. We used two interviews to develop measures [9, 41, 53]. For ease of interpretation, plots of interaction an initial set of codes, resulting in a total of 23 open codes. Each times are shown using the original non-transformed values. CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock Figure 4: Accuracy of Extracted Information (AEI ), as a percentage, for screen-reader users without (N =36) and with (N =21) VoxLens, by (a) Visualization Library, (b) Complexity Level, and (c) Difculty Level. The percentage represents the “accurate” answers. Therefore, higher is better. Error bars represent mean ± standard deviation. Without VoxLens With VoxLens N AA AA (%) N AA AA (%) Overall 324 109 34% 189 141 75% Visualization Library (VL) ChartJS 108 12 11% 63 50 79% D3 108 18 17% 63 47 75% Google Charts 108 79 73% 63 44 70% Complexity Level (CMP) Low 108 40 37% 63 52 83% Medium 108 34 31% 63 48 76% High 108 35 32% 63 41 65% Difculty Level (DF) Low 108 35 32% 63 58 92% Medium 108 36 33% 63 38 60% High 108 38 35% 63 45 71% Table 3: Numerical results for the N = 513 questions asked of screen-reader users with and without VoxLens for each level of Visualization Library, Complexity Level, and Difculty Level. N is the total number of questions asked, AA is the number of “accurate answers,” and AA(%) is the percentage of “accurate answers.” VoxLens (VX) had a signifcant main efect on Interaction Time The VX × VL and VX × DF interactions were both signifcant (IT) (F (4,54)=12.66, p<.05, ηp2 =.19). Specifcally, the average IT for (F (4,444)=33.89, p<.001, ηp2 =.23 and F (444)=14.41, p<.001, ηp2 =.12, non-VoxLens users was 84.6 seconds (SD=75.2). For VoxLens users, respectively). Figure 5 shows interaction times across diferent it was 54.1 seconds (SD=21.9), 36% lower (faster) than for partici- visualization libraries, difculty levels, and complexity levels for pants without VoxLens. VoxLens group. For VoxLens users, all three visualization libraries VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA N χ2 p Cramer’s V V X (VoxLens) 57 38.16 < .001 .14 VX ×VL 57 82.82 < .001 .20 V X × CMP 57 8.90 .064 .07 V X × DF 57 17.95 .001 .09 Aдe 57 3.58 .058 .04 Table 4: Summary results from 57 screen-reader users with (N =21) and without (N =36) VoxLens. “VL” is the visualization library, “CMP” is data complexity, and “DF” is question difculty. Cramer’s V is a measure of efect size [25]. All results are statistically signifcant (p < .05) or marginal (.05 ≤ p < .10). d fn d fd F p ηp2 V X (VoxLens) 4 54 12.66 .001 .19 VX ×VL 4 444 33.89 < .001 .23 V X × CMP 4 444 1.85 .118 .02 V X × DF 4 444 14.41 < .001 .12 Aдe 4 54 5.03 .029 .09 Table 5: Summary results from 57 screen-reader participants with (N =21) and without (N =36) VoxLens using a linear mixed model [30, 54]. “VL” is the visualization library, “CMP” is data complexity, and “DF” is question difculty. Partial eta-squared (ηp2 ) is a measure of efect size [20]. All results are statistically signifcant (p < .05) except V X × CMP. resulted in almost identical interaction times. Figure 5 portrays our participants’ feedback about VoxLens: (1) a positive step for- larger variations in interaction times for users who did not use ward in making online data visualizations accessible, (2) interactive VoxLens (data used from prior work [69]) compared to VoxLens dialogue is one of the “top” features, (3) sonifcation helps in “vi- users. We attribute these observed diferences to the diferent un- sualizing” data, (4) data summary is a good starting point, and (5) derlying implementations of the visualization libraries. one-size-fts-all is not the optimal solution. We present each of We investigated the efects of Age on IT . Age had a signifcant these in turn. efect on IT (F (1,54)=5.03, p<.05, ηp2 =.09), indicating that IT difered signifcantly across the ages of our participants, with participants 5.3.1 A Positive Step Forward in Making Online Data Visualiza- aged 50 or older showing higher interaction times by about 7% tions Accessible. All participants found VoxLens to be an overall compared to participants under the age of 50. Table 8 (Appendix helpful tool to interact with and quickly extract information from C) shows the average IT for each age range by VoxLens group. online data visualizations. For example, S1 and S3 expressed their Additionally, we report our fndings in Table 5. excitement about VoxLens: Similar to our exploration of investigating the efect of screen- reader users (SRU ) on AEI , we examined the main efect of SRU I have never been able to really interact on IT . Our results show that SRU had a signifcant efect on IT with graphs before online. So without the (F (4,54)=48.84, p<.001, ηp2 =.48), with non-screen-reader users per- tool, I am not able to have that picture forming 99% faster than VoxLens users. in my head about what the graph looks like. I mean, like, especially when looking up news articles or really any, sort of, like, social media, there’s a lot of visual 5.3 Interview Results representations and graphs and pictographs To assess VoxLens qualitatively, we investigated the overall expe- that I don’t have access to so I could see riences of our participants with VoxLens, the features they found myself using [VoxLens] a lot. The tool is helpful, the challenges they faced during the interaction, and the really great and definitely a positive step improvements and future features that could enhance the perfor- forward in creating accessible graphs and mance of VoxLens. We identifed fve main results from analyzing data. (S1) CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock access a graph and a chart and be able to parse data from it. (S3) Participants highlighted that VoxLens contributes to bridging the access gap between screen-reader- and non-screen-reader users. As S4 said: So, as a sighted person looks at a graph and as they can tell where the peak is or which one has the most or whatever, we want to be able to do that quickly as well. And even if there is a text description under the graph, and I’ve not seen that very much, you have to read through everything to find a certain piece of information that you’re looking for. [Using VoxLens], I can find out specific pieces of information without having to read an entire page of text. (S4) Additionally, participants identifed that VoxLens enables them to quickly extract information from online data visualizations. S5 shared his experiences: Again, you know, [VoxLens] helps you find data a little bit quicker than navigating with a screen reader, and it’ll give you a brief idea of what the data is about before you start digging deeper into it. (S5) The fndings from our frst result show that VoxLens contributes to reducing the access gap for screen-reader users, and is a positive step forward, enabling screen-reader users to interact with and explore online data visualizations. 5.3.2 Interactive Dialogue is One of the “Top” Features. Similar to our frst fnding, all the participants found the question-and-answer mode of VoxLens a fast and efcient way to extract information from online data visualizations. S2 considered the question-and- answer mode as one of the key features of VoxLens: So I believe that one of the really top features is, kind of, interactive dialogue. (S2) Similarly, S1 found the question-and-answer mode a fast and reliable way to extract information, requiring “a lot less brain power.” She said: I especially liked the part of the tool where you can ask it a question and it would give Figure 5: Interaction Times (IT ), in seconds, for screen- you the information back. I thought it was reader users without (N =36) and with (N =21) VoxLens by (a) brilliant actually. I felt like being able Visualization Library (V L), (b) Data Complexity Level (CMP), to ask it a question made everything go a and (c) Question Difculty Level (DF ). Error bars represent lot faster and it took a lot less brain power mean ± standard deviation. Lower is better (faster). I think. I felt really confident about the answers that it was giving back to me. (S1) S3 noted the broader utility and applicability of the question- Oh, [VoxLens] was outstanding. It’s and-answer mode: definitely a great way to visualize the The voice activation was very, very neat. I’m graphs if you can’t see them in the charts. sure it could come in handy for a variety of I mean, it’s just so cool that this is uses too. I definitely enjoyed that feature. something that allows a blind person to (S3) VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA S5 faced some challenges in activating the right command but 5.3.5 One-Size-Fits-All Is Not the Optimal Solution. To enhance was able to learn the usage of the question-and-answer mode in a the usability of and interaction experience with VoxLens, our par- few tries: ticipants identifed the need to cater to the individual preferences You know, sometimes the word was wrong and of the screen-reader users. For example, S3 recognized the need to I think it says something like, it didn’t have multiple options to “play” with the sonifed response: understand, but basically eventually I got So I was just thinking maybe, you know, it right. (S5) that could be some sort of option or like Our second fnding indicates that VoxLens’ question-and-answer an alternate way to sonify it. Perhaps mode is a fast, efcient, and reliable way for screen-reader users to having an option to do it as continuous extract information. Additionally, the feedback from the question- cause I noticed, like, they were all and-answer mode assists screen-reader users to resolve the chal- discrete. ’Cause sometimes, you know, it’s lenges by themselves within a few tries. just preference or that could be something that could add some usability. It’s just 5.3.3 Sonification Helps in “Visualizing” Data. Our third result some little things to maybe play with or to reveals that our participants found sonifcation helpful in under- maybe give an option or something. (S3) standing general trends in the data. Specifcally, participants were able to infer whether an overall trend was increasing or decreasing, Similarly, S4 was interested in VoxLens predicting what she was obtaining holistic information about the data. S2 said: going to ask using artifcial intelligence (A.I.). She said: The idea of sonification of the graph You know, I think that [VoxLens] would need could give a general understanding of the a lot more artificial intelligence. It could trends. The way that it could summarize the be a lot [more] intuitive when it comes to charts was really nice too. The sonification understanding what I’m going to ask. (S4) feature was amazing. (S2) Additionally, S2 suggested adding setting preferences for the S1, who had never used sonifcation before, expressed her initial summary and the auditory sonifed output: struggles interpreting a sonifed response but was able to “visualize” [Summary mode] could eventually become a the graph through sonifcation within a few tries. She said: setting preference or something that can The audio graph... I’d never used one before, be disabled. And you, as a screen-reader so I kind of struggled with that a little bit user, could not control the speed of the because I wasn’t sure if the higher pitch [sonification] to you. To go faster or to meant the bar was higher up in the graph or go slower, even as a blind person, would be not. But being able to visualize the graph [helpful]. (S2) with this because of the sound was really Our fndings indicate that a one-size-fts-all solution is not op- helpful. (S1) timal and instead, a personalizable solution should be provided, a Overall, our third result shows that sonifcation is a helpful notion supported by recent work in ability-based design [79]. We feature for screen-reader users to interact with data visualizations, are working to incorporate the feedback and suggestions from our providing them with holistic information about data trends. participants into VoxLens. 5.3.4 Data Summary is a Good Starting Point. In keeping with 5.4 Subjective Workload Ratings fndings from prior work [69], our fourth fnding indicates that screen-reader users frst seek to obtain a holistic overview of the We used the NASA Task Load Index (TLX) [38] workload question- data, fnding a data summary to be a good starting point for visual- naire to collect subjective ratings for VoxLens. The NASA-TLX ization exploration. The summary mode of VoxLens enabled our instrument asks participants to rate the workload of a task on six participants to quickly get a “general picture” of the data. S1 and scales: mental demand, physical demand, temporal demand, per- S4 expressed the benefts of VoxLens’ summary mode: formance, efort, and frustration. Each scale ranges from low (1) to high (20). We further classifed the scale into four categories I thought the summary feature was really for a score x: low (x < 6), somewhat low (6 ≤ x < 11), somewhat great just to get, like, a general picture high (11 ≤ x < 16), and high (16 ≤ x). Our results indicate that and then diving deeper with the other VoxLens requires low physical- (M=3.4, SD=3.3) and temporal de- features to get a more detailed image in mand (M=5.7, SD=3.8), and has high perceived performance (M=5.6, my head about what the graphs look like. (S1) SD=5.6). Mental demand (M=7.8, SD=4.4), efort (M=9.9, SD=6.1), So, um, the summary option was a good start and frustration (M=8.3, SD=6.6) were somewhat low. point to know, okay, what is, kind of, on Prior work [69], which is the source of our data for screen-reader the graph. (S4) users who did not use VoxLens, did not conduct a NASA-TLX sur- Our fourth result indicates that VoxLens’ summary mode as- vey with their participants. Therefore, a direct workload compari- sisted screen-reader users to holistically explore the online data son is not possible. However, the subjective ratings from our study visualizations, helping them in determining if they want to dig could serve as a control for comparisons in future work attempting deeper into the data. to make online visualizations accessible for screen-reader users. CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock 6 DISCUSSION efectiveness of voice-recognition technology. Additional investiga- In this work, we created VoxLens, an interactive JavaScript plug-in tion showed that to answer the questions in our experiment tasks, to make online data visualizations more accessible to screen-reader screen-reader users utilized the Question-and-Answer mode 71.9% users. This work has been guided by the recommendations and of the time, compared to the Summary (22.5%) and Sonifcation fndings from prior work [69] that highlight the barriers screen- (5.5%) modes. Out of the 71.9% Question-and-Answer mode usage, reader users face in accessing the information contained in online VoxLens accurately recognized and responded to commands 49.9% data visualizations. In creating VoxLens, we sought to improve of the time; 34% of the time VoxLens was unable to accurately parse the accessibility of online data visualizations by making them dis- the speech input, and the remaining 16.1% of the time VoxLens coverable and comprehensible to screen readers, and by enabling received commands that were not supported (e.g., “correlation co- screen-reader users to explore the data both holistically and in a efcient”). VoxLens uses the Web Speech API [60] for recognizing drilled-down manner. To achieve this, we designed three modes voice commands. While the Web Speech API is a great leap for- of VoxLens: (1) Question-and-Answer mode; (2) Summary mode, ward in terms of speech-input and text-to-speech output features and (3) Sonifcation mode. Our task-based experiments show that [60], it is still an experimental feature with limited performance of screen-reader users extracted information 122% more accurately about 70% accuracy [64]. Therefore, future work could evaluate the and spent 36% less time when interacting with online data visualiza- performance of VoxLens with alternatives to the Web Speech API. tions using VoxLens than without. Additionally, we observed that screen-reader users utilized VoxLens uniformly across all visual- 6.3 Qualitative Assessment of VoxLens ization libraries that were included in our experiments, irrespective All six screen-reader users we interviewed expressed that VoxLens of the underlying implementations and accessibility measures of signifcantly improved their current experiences with online data the libraries, achieving a consistent interaction. visualizations. Participants showed their excitement about VoxLens assisting them in “visualizing” the data and in extracting informa- 6.1 Simultaneous Improvement In Accuracy tion from important visualizations, such as the ones portraying and Interaction Times COVID-19 statistics. Furthermore, some of our participants high- lighted that VoxLens reduces the access gap between screen-reader- Prior work [69] has reported that due to the inaccessibility of online and non-screen-reader users. For example, S4 mentioned that with data visualizations, screen-reader users extract information 62% less the help of VoxLens, she was able to “fnd out specifc pieces of infor- accurately than non-screen-reader users. We found that VoxLens mation without having to read an entire page of text,” similar to how improved the accuracy of information extraction of screen-reader a “sighted person” would interact with the graph. Additionally, our users by 122%, reducing the information extraction gap between participants found VoxLens “pretty easy,” “meaningful,” “smooth,” the two user groups from 62% to 15%. However, in terms of interac- and “intuitive,” without requiring a high mental demand. tion time, while VoxLens reduced the gap from 211% to 99%, the diference is still statistically signifcant between non-screen-reader and VoxLens users. Non-screen-reader users utilize their visual 6.4 VoxLens is a Response to Call-to-Action for system’s power to quickly recognize patterns and extrapolate in- Inclusive Data Visualizations formation from graphs, such as overall trends and extrema [57]. In Taking these fndings together, VoxLens is a response to the call- contrast, screen-reader users rely on alternative techniques, such as to-action put forward by Marriott et al. [57] that asserts the need sonifcation, to understand data trends. However, hearing a sonifed to improve accessibility for disabled people disenfranchised by ex- version of the data can be time-consuming, especially when the isting data visualizations and tools. VoxLens is an addition to the data cardinality is large, contributing to the diference in the inter- tools and systems designed to make the Web an equitable place action times between the two user groups. Additionally, issuing a for screen-reader users, aiming to bring their experiences on a par voice command, pressing a key combination, and the duration of with that of non-screen-reader users. Through efective advertise- the auditory response can also contribute to the observed difer- ment, and by encouraging developers to integrate VoxLens into ence. However, it is worth emphasizing that screen-reader users the codebase of visualization libraries, we hope to broadly expand who used VoxLens improved their interaction time by 36% while the reach and impact of VoxLens. Additionally, through collecting also increasing their accuracy of extracted information by 122%. In anonymous usage logs (VoxLens modes used, commands issued, other words, VoxLens users became both faster and more accurate, and responses issued) and feedback from users—a feature already a fortunate outcome often hard to realize in human performance implemented in VoxLens—we aspire to continue improving the studies due to speed-accuracy tradeofs. usability and functionality of VoxLens for a diverse group of users. 6.2 Role of Voice-Recognition Technology 7 LIMITATIONS & FUTURE WORK For screen-reader users who used VoxLens, 75% (N =141) of the At present, VoxLens is limited to two-dimensional data visualiza- answers were correct and 11% (N =20) were incorrect. Our partici- tions with a single series of data. Future work could study the pants were unable to extract the answers from the remaining 15% experiences of screen-reader users with n-dimensional data visual- (N =28) of the questions. Further exploration revealed that among izations and multiple series of data, and extend the functionality of the 25% (N =48) of the questions that were not answered correctly, VoxLens based on the fndings. Additionally, VoxLens is currently 52% (N =25) involved symmetry comparison. Symmetry comparison only fully functional on Google Chrome, as the support for the requires value retrieval of multiple data points and relies on the Web Speech API’s speech recognition is currently limited to Google VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Chrome. Future work could consider alternatives to the Web Speech [4] Theodore W. Anderson and Donald A. Darling. 1954. A test of goodness of ft. API that ofer cross-browser support for speech recognition. Journal of the American statistical association 49, 268 (1954), 765–769. [5] Apple. n.d.. Audio Graphs | Apple Developer Documentation. https://developer. Our fndings showed that some of our participants preferred apple.com/documentation/accessibility/audio_graphs. (Accessed on 08/01/2021). to have the ability to control the speed, frequency, and waveform [6] Ronald J. Baken and Robert F. Orlikof. 2000. Clinical Measurement of Speech and Voice. Singular Thomson Learning, San Diego, California, USA. https: of the sonifed response. Therefore, future work could extend the //books.google.com/books?id=ElPyvaJbDiwC functionality of VoxLens by connecting it to a centralized con- [7] Nikola Banovic, Rachel L. Franz, Khai N. Truong, Jennifer Mankof, and Anind K. fguration management system, enabling screen-reader users to Dey. 2013. Uncovering Information Needs for Independent Spatial Learning for Users Who Are Visually Impaired. In Proceedings of the 15th International ACM specify their preferences. These preferences could then be used to SIGACCESS Conference on Computers and Accessibility (Bellevue, Washington) generate appropriate responses, catering to the individual needs of (ASSETS ’13). Association for Computing Machinery, New York, NY, USA, Article screen-reader users. 24, 8 pages. https://doi.org/10.1145/2513383.2513445 [8] Katja Battarbee. 2004. Co-experience: understanding user experiences in interaction. Aalto University, Helsinki, Finland. 103 + app. 117 pages. http://urn.f/URN: ISBN:951-558-161-3 8 CONCLUSION [9] Donald A. Berry. 1987. Logarithmic Transformations in ANOVA. Biometrics 43, We presented VoxLens, a JavaScript plug-in that improves the acces- 2 (1987), 439–456. http://www.jstor.org/stable/2531826 [10] Jacques Bertin. 1983. Semiology of graphics; diagrams networks maps. Technical sibility of online data visualizations, enabling screen-reader users Report. to extract information using a multi-modal approach. In creating [11] Michael Bostock, Vadim Ogievetsky, and Jefrey Heer. 2011. D3 data-driven VoxLens, we sought to address the challenges screen-reader users documents. IEEE transactions on visualization and computer graphics 17, 12 (2011), 2301–2309. face with online data visualizations by enabling them to extract [12] Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. information both holistically and in a drilled-down manner, using Qualitative research in psychology 3, 2 (2006), 77–101. techniques and strategies that they prefer. Specifcally, VoxLens [13] Matthew Brehmer, Bongshin Lee, Petra Isenberg, and Eun Kyoung Choe. 2018. Visualizing ranges over time on mobile phones: a task-based crowdsourced provides three modes of interaction using speech and sonifcation: evaluation. IEEE transactions on visualization and computer graphics 25, 1 (2018), Question-and-Answer mode, Summary mode, and Sonifcation mode. 619–629. [14] Matthew Brehmer and Tamara Munzner. 2013. A multi-level typology of abstract To assess the performance of VoxLens, we conducted task-based visualization tasks. IEEE transactions on visualization and computer graphics 19, experiments and interviews with screen-reader users. VoxLens sig- 12 (2013), 2376–2385. nifcantly improved the interaction experiences of screen-reader [15] Craig Brown and Amy Hurst. 2012. VizTouch: Automatically Generated Tactile Visualizations of Coordinate Spaces. In Proceedings of the Sixth International users with online data visualizations, both in terms of accuracy Conference on Tangible, Embedded and Embodied Interaction (Kingston, Ontario, of extracted information and interaction time, compared to their Canada) (TEI ’12). Association for Computing Machinery, New York, NY, USA, conventional interaction with online data visualizations. Our re- 131–138. https://doi.org/10.1145/2148131.2148160 [16] Lorna M. Brown, Stephen A. Brewster, Ramesh Ramloll, Mike Burton, and Beate sults also show that screen-reader users considered VoxLens to Riedel. 2003. Design guidelines for audio presentation of graphs and tables. be a “game-changer,” providing them with “exciting new ways” to In Proceedings of the 9th International Conference on Auditory Display. Citeseer, Boston University, USA, 284–287. interact with online data visualizations and saving them time and [17] Ben Caldwell, Michael Cooper, Loretta Guarino Reid, Gregg Vanderheiden, Wendy efort. We hope that by open-sourcing our code for VoxLens and Chisholm, John Slatin, and Jason White. 2008. Web content accessibility guide- our sonifcation solution, our work will inspire developers and visu- lines (WCAG) 2.0. [18] ChartJS. [n. d.]. Accessibility | Chart.js. https://www.chartjs.org/docs/3.5.1/ alization creators to continually improve the accessibility of online general/accessibility.html. (Accessed on 01/08/2022). data visualizations. We also hope that our work will motivate and [19] Peter Ciuha, Bojan Klemenc, and Franc Solina. 2010. Visualization of Concurrent guide future research in making data visualizations accessible. Tones in Music with Colours. In Proceedings of the 18th ACM International Confer- ence on Multimedia (Firenze, Italy) (MM ’10). Association for Computing Machin- ery, New York, NY, USA, 1677–1680. https://doi.org/10.1145/1873951.1874320 ACKNOWLEDGMENTS [20] Jacob Cohen. 1973. Eta-squared and partial eta-squared in fxed factor ANOVA designs. Educational and psychological measurement 33, 1 (1973), 107–112. This work was supported in part by the Mani Charitable Foundation, [21] Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg. 1993. Wizard of Oz stud- ies—why and how. Knowledge-based systems 6, 4 (1993), 258–266. the University of Washington Center for an Informed Public, and [22] Patrick Dengler, Anthony Grasso, Chris Lilley, Cameron McCormack, Doug the University of Washington Center for Research and Education Schepers, and Jonathan Watt. 2011. Scalable Vector Graphics (SVG) 1.1. on Accessible Technology and Experiences (CREATE). We extend [23] Google Developers. 2014. Charts. https://developers.google.com/chart/ [24] Roger L. DuBois. 2017. Web Audio Speech Synthesis / Recognition for p5.js. our gratitude to the AccessComputing staf for their support and https://github.com/IDMNYU/p5.js-speech assistance in recruiting participants. We would also like to thank the [25] Christopher J. Ferguson. 2016. An efect size primer: A guide for clinicians and anonymous reviewers for their helpful comments and suggestions. researchers. In Methodological issues and strategies in clinical research, A.E. Kazdin (Ed.). American Psychological Association, Washington, DC, USA, 301––310. Any opinions, fndings, conclusions, or recommendations expressed [26] John H. Flowers. 2005. Thirteen years of refection on auditory graphing: in this work are those of the authors and do not necessarily refect Promises, pitfalls, and potential new directions. In Proceedings of the 11th Meeting of the International Conference on Auditory Display. Citeseer, Limerick, Ireland, those of any supporter. 406–409. [27] John H. Flowers, Dion C. Buhman, and Kimberly D. Turnage. 1997. Cross- modal equivalence of visual and auditory scatterplots for exploring bivariate REFERENCES data samples. Human Factors 39, 3 (1997), 341–351. [1] Chadia Abras, Diane Maloney-Krichmar, Jenny Preece, et al. 2004. User-centered [28] OpenJS Foundation. n.d.. Node.js. https://nodejs.org/en/. (Accessed on design. Bainbridge, W. Encyclopedia of Human-Computer Interaction. Thousand 08/08/2021). Oaks: SAGE Publications 37, 4 (2004), 445–456. [29] The GraphQL Foundation. n.d.. GraphQL | A query language for your API. [2] NV Access. n.d.. NV Access | Download. https://www.nvaccess.org/download/. https://graphql.org/. (Accessed on 08/08/2021). (Accessed on 08/08/2021). [30] Brigitte N. Frederick. 1999. Fixed-, random-, and mixed-efects ANOVA models: [3] Dragan Ahmetovic, Cristian Bernareggi, João Guerreiro, Sergio Mascetti, and A user-friendly guide for increasing the generalizability of ANOVA results. In Anna Capietto. 2019. Audiofunctions. web: Multimodal exploration of math- Advances in Social Science Methodology, B. Thompson (Ed.). JAI Press, Stamford, ematical function graphs. In Proceedings of the 16th International Web for All Connecticut, 111–122. Conference. 1–10. CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock [31] Deen G. Freelon. 2010. ReCal: Intercoder reliability calculation as a web service. [57] Kim Marriott, Bongshin Lee, Matthew Butler, Ed Cutrell, Kirsten Ellis, Cagatay International Journal of Internet Science 5, 1 (2010), 20–33. Goncu, Marti Hearst, Kathleen McCoy, and Danielle Albers Szafr. 2021. Inclusive [32] Arthur Gilmour, Robert D. Anderson, and Alexander L. Rae. 1985. The analysis data visualization for people with disabilities: a call to action. Interactions 28, 3 of binomial data by a generalized linear mixed model. Biometrika 72, 3 (1985), (2021), 47–51. 593–599. [58] David K. McGookin and Stephen A. Brewster. 2006. SoundBar: Exploiting Multiple [33] Nicholas A. Giudice, Hari Prasath Palani, Eric Brenner, and Kevin M. Kramer. 2012. Views in Multimodal Graph Browsing. In Proceedings of the 4th Nordic Conference Learning Non-Visual Graphical Information Using a Touch-Based Vibro-Audio on Human-Computer Interaction: Changing Roles (Oslo, Norway) (NordiCHI ’06). Interface. In Proceedings of the 14th International ACM SIGACCESS Conference on Association for Computing Machinery, New York, NY, USA, 145–154. https: Computers and Accessibility (Boulder, Colorado, USA) (ASSETS ’12). Association //doi.org/10.1145/1182475.1182491 for Computing Machinery, New York, NY, USA, 103–110. https://doi.org/10. [59] Silvia Mirri, Silvio Peroni, Paola Salomoni, Fabio Vitali, and Vincenzo Rubano. 1145/2384916.2384935 2017. Towards accessible graphs in HTML-based scientifc articles. In 2017 14th [34] The PostgreSQL Global Development Group. n.d.. PostgreSQL: The world’s most IEEE Annual Consumer Communications Networking Conference (CCNC). IEEE, advanced open source database. https://www.postgresql.org/. (Accessed on Las Vegas, NV, USA, 1067–1072. https://doi.org/10.1109/CCNC.2017.7983287 08/08/2021). [60] André Natal, Glen Shires, and Philip Jägenstedt. n.d.. Web Speech API. https: [35] Melita Hajdinjak and France Mihelic. 2004. Conducting the Wizard-of-Oz Exper- //wicg.github.io/speech-api/. (Accessed on 08/07/2021). iment. Informatica (Slovenia) 28, 4 (2004), 425–429. [61] Keita Ohshiro, Amy Hurst, and Luke DuBois. 2021. Making Math Graphs More [36] Patrick A.V. Hall and Geof R. Dowling. 1980. Approximate string matching. Accessible in Remote Learning: Using Sonifcation to Introduce Discontinuity in ACM computing surveys (CSUR) 12, 4 (1980), 381–402. Calculus. In The 23rd International ACM SIGACCESS Conference on Computers [37] Jordan Harband, Shu-yu Guo, Michael Ficarra, and Kevin Gibbons. 1999. Standard and Accessibility. 1–4. ecma-262. [62] Michael Q. Patton. 1990. Qualitative evaluation and research methods. SAGE [38] Sandra G. Hart and Lowell E. Staveland. 1988. Development of NASA-TLX Publications Inc., Saint Paul, MN, USA. (Task Load Index): Results of empirical and theoretical research. In Advances in [63] Pablo Picazo-Sanchez, Juan Tapiador, and Gerardo Schneider. 2020. After you, psychology. Vol. 52. Elsevier, North-Holland, Netherlands, 139–183. please: browser extensions order attacks and countermeasures. International [39] Highcharts. n.d.. Sonifcation | Highcharts. https://www.highcharts.com/docs/ Journal of Information Security 19, 6 (2020), 623–638. accessibility/sonifcation. (Accessed on 08/01/2021). [64] Ricardo Sousa Rocha, Pedro Ferreira, Inês Dutra, Ricardo Correia, Rogerio Salvini, [40] Clare J. Hooper. 2011. Towards Designing More Efective Systems by Under- and Elizabeth Burnside. 2016. A Speech-to-Text Interface for MammoClass. In standing User Experiences. SIGWEB Newsl. Autumn, 4, Article 4 (Sept. 2011), 2016 IEEE 29th International Symposium on Computer-Based Medical Systems 3 pages. https://doi.org/10.1145/2020936.2020940 (CBMS). IEEE, Dublin, Ireland and Belfast, Northern Ireland, 1–6. https://doi. [41] Mike H. Hoyle. 1973. Transformations: An Introduction and a Bibliography. org/10.1109/CBMS.2016.25 International Statistical Review / Revue Internationale de Statistique 41, 2 (1973), [65] Bahador Saket, Alex Endert, and Çağatay Demiralp. 2018. Task-based efective- 203–223. http://www.jstor.org/stable/1402836 ness of basic visualizations. IEEE transactions on visualization and computer [42] Weijian Hu, Kaiwei Wang, Kailun Yang, Ruiqi Cheng, Yaozu Ye, Lei Sun, and graphics 25, 7 (2018), 2505–2512. Zhijie Xu. 2020. A comparative study in real-time scene sonifcation for visually [66] Nik Sawe, Chris Chafe, and Jefrey Treviño. 2020. Using Data Sonifcation to impaired people. Sensors 20, 11 (2020), 3222. Overcome Science Literacy, Numeracy, and Visualization Barriers in Science [43] Amy Hurst and Shaun Kane. 2013. Making "Making" Accessible. In Proceedings Communication. Frontiers in Communication 5 (2020), 46. of the 12th International Conference on Interaction Design and Children (New York, [67] Anastasia Schaadhardt, Alexis Hiniker, and Jacob O. Wobbrock. 2021. Under- New York, USA) (IDC ’13). Association for Computing Machinery, New York, NY, standing Blind Screen-Reader Users’ Experiences of Digital Artboards. In Pro- USA, 635–638. https://doi.org/10.1145/2485760.2485883 ceedings of the 2021 CHI Conference on Human Factors in Computing Systems. [44] Apple Inc. n.d.. Accessibility - Vision - Apple. https://www.apple.com/ Association for Computing Machinery, New York, NY, USA, Article 270, 19 pages. accessibility/vision/. (Accessed on 08/08/2021). https://doi.org/10.1145/3411764.3445242 [45] Facebook Inc. n.d.. React – A JavaScript library for building user interfaces. [68] Freedom Scientifc. n.d.. JAWS® – Freedom Scientifc. https://www. https://reactjs.org/. (Accessed on 08/08/2021). freedomscientifc.com/products/software/jaws/. (Accessed on 08/08/2021). [46] Dae Hyun Kim, Enamul Hoque, and Maneesh Agrawala. 2020. Answering Ques- [69] Ather Sharif, Sanjana Chintalapati, Jacob O. Wobbrock, and Katharina Reinecke. tions about Charts and Generating Visual Explanations. In Proceedings of the 2020 2021. Understanding Screen-Reader Users’ Experiences with Online Data Visual- CHI Conference on Human Factors in Computing Systems. Association for Com- izations. In The 23rd International ACM SIGACCESS Conference on Computers and puting Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831. Accessibility (Virtual Event) (ASSETS ’21). Association for Computing Machinery, 3376467 New York, NY, USA, To Appear. [47] Edward Kim and Kathleen F McCoy. 2018. Multimodal deep learning using [70] Ather Sharif and Babak Forouraghi. 2018. evoGraphs — A jQuery plugin to create images and text for information graphic classifcation. In Proceedings of the 20th web accessible graphs. In 2018 15th IEEE Annual Consumer Communications International ACM SIGACCESS Conference on Computers and Accessibility. 143– Networking Conference (CCNC). IEEE, Las Vegas, NV, USA, 1–4. https://doi.org/ 148. 10.1109/CCNC.2018.8319239 [48] Edward Kim, Connor Onweller, and Kathleen F McCoy. 2021. Information Graphic [71] Lei Shi, Idan Zelzer, Catherine Feng, and Shiri Azenkot. 2016. Tickers and Talker: Summarization using a Collection of Multimodal Deep Neural Networks. In 2020 An Accessible Labeling Toolkit for 3D Printed Models. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR). IEEE, 10188–10195. 2016 CHI Conference on Human Factors in Computing Systems. Association for [49] Klaus Krippendorf. 2011. Computing Krippendorf’s Alpha-Reliability. https: Computing Machinery, New York, NY, USA, 4896–4907. https://doi.org/10.1145/ //repository.upenn.edu/asc_papers/43 Retrieved from. 2858036.2858507 [50] Klaus Krippendorf. 2018. Content analysis: An introduction to its methodology. [72] Boris Smus. 2013. Web Audio API: Advanced Sound for Games and Interactive SAGE Publications Inc., Pennsylvania, USA. Apps. O’Reilly Media, California, USA. https://books.google.com/books?id= [51] Richard J. Landis and Gary G. Kock. 1977. The Measurement of Observer eSPyRuL8b7UC Agreement for Categorical Data. Biometrics 33, 1 (1977), 159–174. http: [73] Arjun Srinivasan, Nikhila Nyapathy, Bongshin Lee, Steven M. Drucker, and John //www.jstor.org/stable/2529310 Stasko. 2021. Collecting and Characterizing Natural Language Utterances for [52] Bongshin Lee, Arjun Srinivasan, Petra Isenberg, John Stasko, et al. 2021. Post- Specifying Data Visualizations. In Proceedings of the 2021 CHI Conference on WIMP Interaction for Information Visualization. Foundations and Trends® in Human Factors in Computing Systems. Association for Computing Machinery, Human-Computer Interaction 14, 1 (2021), 1–95. New York, NY, USA, Article 464, 10 pages. https://doi.org/10.1145/3411764. [53] Eckhard Limpert, Werner A. Stahel, and Markus Abbt. 2001. Log-normal distri- 3445400 butions across the sciences: keys and clues: on the charms of statistics, and how [74] Jonathan E. Thiele, Michael S. Pratte, and Jefrey N. Rouder. 2011. On perfect mechanical models resembling gambling machines ofer a link to a handy way working-memory performance with large numbers of items. Psychonomic Bulletin to characterize log-normal distributions, which can provide deeper insight into & Review 18, 5 (2011), 958–963. variability and probability—normal or log-normal: that is the question. BioScience [75] Ingo R. Titze and Daniel W. Martin. 1998. Principles of voice production. 51, 5 (2001), 341–352. [76] Frances Van Scoy, Don McLaughlin, and Angela Fullmer. 2005. Auditory augmen- [54] Ramon C. Littell, Henry P. Raymond, and Clarence B. Ammerman. 1998. Statistical tation of haptic graphs: Developing a graphic tool for teaching precalculus skill analysis of repeated measures data using SAS procedures. Journal of animal to blind students. In Proceedings of the 11th Meeting of the International Conference science 76, 4 (1998), 1216–1231. on Auditory Display, Vol. 5. Citeseer, Limerick, Ireland, 5 pages. [55] Alan Lundgard and Arvind Satyanarayan. 2021. Accessible Visualization via [77] W3C. n.d.. WAI-ARIA Overview | Web Accessibility Initiative (WAI) | W3C. Natural Language Descriptions: A Four-Level Model of Semantic Content. IEEE https://www.w3.org/WAI/standards-guidelines/aria/. (Accessed on 04/11/2021). transactions on visualization and computer graphics (2021). [78] WebAIM. n.d.. WebAIM: CSS in Action - Invisible Content Just for Screen [56] Yotam Mann. n.d.. Tone.js. https://tonejs.github.io/. (Accessed on 08/02/2021). Reader Users. https://webaim.org/techniques/css/invisiblecontent/. (Accessed on 09/01/2021). VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA [79] Jacob O. Wobbrock, Krzysztof Z. Gajos, Shaun K. Kane, and Gregg C. Vanderhei- den. 2018. Ability-based design. Commun. ACM 61, 6 (2018), 62–71. [80] Susan P. Wyche and Rebecca E. Grinter. 2009. Extraordinary Computing: Religion as a Lens for Reconsidering the Home. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, 749–758. https://doi.org/10.1145/1518701.1518817 [81] Wai Yu, Ramesh Ramloll, and Stephen Brewster. 2001. Haptic graphs for blind computer users. In Haptic Human-Computer Interaction, Stephen Brewster and Roderick Murray-Smith (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 41–51. [82] Mingrui Ray Zhang, Ruolin Wang, Xuhai Xu, Qisheng Li, Ather Sharif, and Jacob O. Wobbrock. 2021. Voicemoji: Emoji Entry Using Voice for Visually Impaired People. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 37, 18 pages. https://doi.org/10.1145/3411764.3445338 [83] Haixia Zhao, Catherine Plaisant, Ben Shneiderman, and Jonathan Lazar. 2008. Data sonifcation for users with visual impairment: a case study with georefer- enced data. ACM Transactions on Computer-Human Interaction (TOCHI) 15, 1 (2008), 1–28. CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA Ather Sharif, Olivia H. Wang, Alida T. Muongchan, Katharina Reinecke, and Jacob O. Wobbrock A PARTICIPANT DEMOGRAPHICS Gender Age Screen Reader Vision-Loss Level Diagnosis P1 M 28 NVDA Blind since birth, Complete blindness Optic Nerve Hypoplasia P2 M 61 JAWS Complete blindness, Lost vision gradually Optic Neuropathy P3 M 48 JAWS Complete blindness, Lost vision gradually Leber Congenital Amauro- sis P4 F 29 NVDA Blind since birth, Complete blindness Optic Nerve Hypoplasia and Glaucoma P5 F 37 JAWS Blind since birth, Complete blindness Leber Congenital Amauro- sis P6 F 51 JAWS Blind since birth, Complete blindness Retinopathy of Prematurity P7 M 58 JAWS Complete blindness, Lost vision gradually Glaucoma P8 M 30 NVDA Blind since birth, Complete blindness Leber Congenital Amauro- sis P9 F 64 JAWS Complete blindness, Lost vision gradually Retinitis Pigmentosa P10 F 68 Fusion Lost vision gradually, Partial blindness Stargaart’s Maculopathy P11 F 73 JAWS Complete blindness, Lost vision gradually Retinitis Pigmentosa P12 F 64 JAWS Complete blindness, Lost vision gradually Cataracts P13 M 18 NVDA Complete blindness Brain Tumor P14 M 36 JAWS Blind since birth, Complete blindness Leber Congenital Amauro- sis P15 M 25 NVDA Lost vision gradually, Partial vision Retinopathy of Prematurity and Subsequent Cataracts P16 M 42 JAWS Blind since birth, Complete blindness Microphthalmia P17 M 68 JAWS Complete blindness, Lost vision gradually Detached Retinas P18 F 31 NVDA Blind since birth, Complete blindness Retinopathy of Prematurity P19 F 47 JAWS Complete blindness, Lost vision gradually Optic Neuropathy P20 M 48 NVDA Complete blindness, Lost vision gradually Retinitis Pigmentosa P21 M 43 NVDA Complete blindness, Lost vision gradually Retinitis Pigmentosa P22 M 19 NVDA Blind since birth, Complete blindness Retinopathy of Prematurity Table 6: Screen-reader participants, their gender identifcation, age, screen reader, vision-loss level, and diagnosis. Under the Gender column, M = Male, F = Female, and N B = Non-binary. VoxLens CHI ’22, April 29-May 5, 2022, New Orleans, LA, USA B WIZARD-OF-OZ PARTICIPANT DEMOGRAPHICS Gender Age Screen Reader Vision-Loss Level Diagnosis W1 M 25 VoiceOver Partial vision Extremely low vision W2 M 28 NVDA Blind since birth Optic Nerve Hypoplaxia W3 M 23 VoiceOver Blind since birth Septo-optic Dysplasia W4 F 26 JAWS Blind since birth Leber Congenital Amaurosis W5 M 31 JAWS Blind since birth Retinopathy of Prematurity Table 7: Screen-reader participants for the Wizard-of-Oz experiment, their gender identifcation, age, screen reader, vision-loss level, and diagnosis. Under the Gender column, M = Male, F = Female, and N B = Non-binary. C INTERACTION TIME PER AGE RANGE Both Groups Without VoxLens With VoxLens Age Range N Mean SD N Mean SD N Mean SD 18-19 3 62.5 30.5 1 96.8 - 2 45.3 9.4 20-29 9 40.6 28.0 6 63.7 20.3 3 50.8 7.9 30-39 10 44.2 23.1 7 60.6 16.5 3 43.9 4.5 40-49 15 67.5 78.7 10 106.7 93.1 5 59.8 14.9 50-59 7 47.6 27.3 5 79.1 19.2 2 63.3 3.0 60-69 11 64.8 33.6 6 76.8 33.7 5 55.2 10.8 > 70 2 127.0 127.4 1 217.1 - 1 60.1 - Table 8: Summary results from 57 screen-reader participants with (N =21) and without (N =36) VoxLens showing the numerical results for Interaction Time (IT), for each age range. N is the total number of participants for the given age range, Mean is the average IT in seconds, and SD represents the standard deviation.