DOKK Library

Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript

Authors Anders Møller, Benjamin Barslev Nielsen,

License CC-BY-3.0

Plaintext
Value Partitioning: A Lightweight Approach to
Relational Static Analysis for JavaScript
Benjamin Barslev Nielsen
Aarhus University, Denmark
barslev@cs.au.dk
Anders Møller
Aarhus University, Denmark
amoeller@cs.au.dk

      Abstract
In static analysis of modern JavaScript libraries, relational analysis at key locations is critical to
provide sound and useful results. Prior work addresses this challenge by the use of various forms of
trace partitioning and syntactic patterns, which is fragile and does not scale well, or by incorporating
complex backwards analysis. In this paper, we propose a new lightweight variant of trace partitioning
named value partitioning that refines individual abstract values instead of entire abstract states.
We describe how this approach can effectively capture important relational properties involving
dynamic property accesses, functions with free variables, and predicate functions. Furthermore, we
extend an existing JavaScript analyzer with value partitioning and demonstrate experimentally that
it is a simple, precise, and efficient alternative to the existing approaches for analyzing widely used
JavaScript libraries.

2012 ACM Subject Classification Theory of computation → Program analysis

Keywords and phrases JavaScript, dataflow analysis, abstract interpretation

Digital Object Identifier 10.4230/LIPIcs.ECOOP.2020.16

Funding This work was supported by the European Research Council (ERC) under the European
Union’s Horizon 2020 research and innovation program (grant agreement No 647544).


 1      Introduction
JavaScript programs are challenging to analyze statically due to the dynamic nature of the
language. One of the main obstacles is the presence of dynamic property access operations
that allow objects to be manipulated using object property names that are dynamically
computed strings. A typical pattern that has received much attention is correlated read/write
pairs [25], a simple variant of which looks as follows:

                                    t = x[p];        ...   y[p] = t;

At run-time, this code copies a property whose name is the value of p from the x object to the y
object. If the static analysis does not know precisely the string value of p, then the properties
of x will be mixed together in y. Experience with analyzers such as WALA [25, 24, 28],
SAFE [17, 22], JSAI [13], and TAJS [11, 2, 26] has shown that when analyzing real-world
JavaScript code, including jQuery, Lodash, Underscore and other widely used libraries, such
situations often cause an avalanche of spurious dataflow that makes the analysis results
useless. If, for example, x is the object {m1: f1, m2: f2, ..., m10: f10} where f1, f2,
. . . , f10 are functions, then any subsequent function call, for example y.m3(...), will be
treated by the analysis as a call to any of the 10 functions.
       Several analysis techniques have been proposed to address this challenge. The techniques
based on correlation tracking [25], static/dynamic determinacy [24, 2], and loop sensitivity [22]
aim to increase precision by the use of context sensitivity or loop unrolling to ensure that
            © Benjamin Barslev Nielsen and Anders Møller;
            licensed under Creative Commons License CC-BY
34th European Conference on Object-Oriented Programming (ECOOP 2020).
Editors: Robert Hirschfeld and Tobias Pape; Article No. 16; pp. 16:1–16:28
                   Leibniz International Proceedings in Informatics
                   Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
16:2   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


       the analysis has precise information about p in the example above. Although this approach
       works well in many cases, the aggressive use of context sensitivity or loop unrolling can be
       expensive on analysis time. Even more importantly, it falls short when p is not determinate
       (i.e., when its value is not fixed even when the call context is known).
           An important step forward is the approach used in the CompAbs analyzer, which is built
       on SAFE [16]: Even if p is imprecise, the loss of precision at the property write operation
       can be avoided by applying trace partitioning [23] at the property read operation, based on
       which properties exist on x. Intuitively, it is often not necessary to have precise information
       about p; instead we can refine the current abstract state into a collection of more precise
       partitions, one for each of the 10 properties of x (plus one extra for the case where p is none
       of those strings, but let us ignore that for now), and after the property write operation merge
       them again. (The same idea was used earlier in TAJS, but only at for-in loops, not at
       dynamic property reads [2].) This approach, however, also has drawbacks. Trace partitioning
       is expensive, so it must be used scarcely: in the example, the code between the dynamic
       property read and the dynamic property write is essentially analyzed 10 times. For this
       reason, CompAbs relies on a syntactic pre-analysis to recognize different kinds of correlated
       read/write pairs for guiding the creation and merging of partitions.
           Recent work [26] has shown that the syntactic pre-analysis approach of CompAbs is too
       fragile, for example, it is incapable of analyzing the Lodash library (see Section 2), and
       demand-driven value refinement has been proposed as an alternative. Instead of relying on
       context sensitivity, loop unrolling, or trace partitioning, that approach applies, during the
       analysis when encountering a dynamic property write operation with an imprecise property
       name, a separate backwards analysis to regain the relation between the property name and
       the value to be written. Although demand-driven value refinement has been shown to work
       quite well in practice, building a backwards analysis for the full JavaScript language and its
       standard library is a major endeavor, so developing simpler alternatives is desirable.
           Our approach builds upon the observation from CompAbs that sufficient precision can be
       obtained using trace partitioning based on the properties of the object being read. Our key
       insight is that we do not need to partition the entire abstract state as done by CompAbs: It
       suffices to only partition the abstract values for the property name p and the value being
       read x[p] in the above example. This means that instead of analyzing the code 10 times,
       we only analyze it once, but using partitioned abstract values that retain the correlation
       between p and x[p]. The partitioned abstract values are introduced at t = x[p] and used
       at y[p] = t by means of specialized transfer functions. We refer to this variant of trace
       partitioning as value partitioning. Since partitioning individual abstract values does not
       increase the analysis complexity as much as partitioning entire states, it becomes feasible
       to apply value partitioning more extensively, at every dynamic property read where the
       property name is imprecise, thereby obviating the need for the syntactic pre-analysis.
           In this paper we present a theoretical framework for value partitioning, together with
       three instantiations: property-name partitioning (which is the one used in the example
       above), free-variable partitioning (to improve precision for free variables of closures), and
       type partitioning (to improve precision for predicate functions). Additionally, we extend
       the static analyzer TAJS with all three kinds of value partitioning and demonstrate that
       the approach is effective for analyzing popular JavaScript libraries. Value partitioning is
       a lightweight alternative to the existing approaches to relational analysis for JavaScript:
       Compared to CompAbs-style trace partitioning it avoids many redundant computations
       caused by similarities between different partitions, and compared to demand-driven value
       refinement it avoids the need for creating a separate backwards analysis.
B. B. Nielsen and A. Møller                                                                           16:3


                   1 function mixin(object, source) {
                   2     baseFor(source, function (func, methodName) {
                   3       if (!isFunction(func))
                   4         return;
                   5       object[methodName] = func;
                   6       if (isFunction(object))
                   7         object.prototype[methodName] = function() {
                   8            ...
                   9            func.apply(...);
                  10         }
                  11     });
                  12 }
                  13
                  14 function baseFor(source, iteratee) {
                  15     Object.keys(source).forEach(function (key) {
                  16       iteratee(source[key], key);
                  17     });
                  18 }
                  19
                  20 // usage of mixin during initialization
                  21 mixin(lodash, lodash);


        Figure 1 Motivating example based on code from the Lodash library.


In summary our contributions are:
    Value partitioning: a general static analysis technique that is capable of reasoning about
    relations between abstract values.
    Three instantiations of value partitioning, which tackle different challenges in static
    analysis for JavaScript, each involving relational properties:
       property-name partitioning: relations between dynamically computed object property
       names and values;
       free-variable partitioning: relations between functions and their free variables; and
       type partitioning: relations between arguments and return values of predicate functions.
    Experimental results: We show that value partitioning makes TAJS more precise than
    CompAbs [16] for several real-world JavaScript libraries, including Lodash, which is
    the most widely used library. The resulting precision is comparable to (and in case of
    the Lodash4 benchmark group substantially higher than) that of demand-driven value
    refinement [26], without the need for a separate backwards analysis.


    2      Motivating Example and Overview
Figure 1 shows a small code example based on Lodash (version 4.17.10), which is the most
depended-upon of all npm packages.1 Lines 1–12 define the function mixin, which copies all
function properties from source to object. If object is a function, a new function (which


1
    Lodash (https://lodash.com/) has more than 115 000 dependents in npm and more than 27 million
    weekly downloads as of May 2020.




                                                                                                    ECOOP 2020
16:4   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


       on invocation calls the function to be copied) is also copied to object.prototype, such that
       instantiations of object (using the keyword new) also will have these functions. In line 21,
       which is executed during the initialization of Lodash, mixin is called with the library object
       as both arguments. The function mixin uses a helper function baseFor defined in lines 14–
       18. It is called with source and a callback function defined in lines 2–11. The baseFor
       function then gets all the object property names from the source object using Object.keys,
       and the callback function is called (line 16) for each property name and corresponding
       property value. Line 3 checks whether func is a function. If so, the function is copied to
       object[methodName] in line 5. Note that func actually is the value source[methodName].
       Line 6 checks whether object is also a function and if so, a new function is declared and
       written to object.prototype[methodName] in line 7. When invoked, that new function
       calls func using func.apply(...) in line 9.
           Such complex code is not unusual in modern JavaScript libraries. For a static analysis
       reasoning about the dataflow in this code, the correlation between methodName and func is
       critical. An analysis that loses track of this correlation will mix together all the properties of
       the library object lodash when analyzing the call mixin(lodash, lodash) in line 21. As a
       consequence, if the program being analyzed contains a call to, for example, lodash.map, that
       will be treated by the analysis as a call to any of Lodash’s more than 100 different functions,
       not only the actual map function, thereby triggering an avalanche of spurious dataflow.



       Existing approaches

       Existing JavaScript analyzers do not have precise information about the value of key in
       line 16, for various different reasons. (Most importantly, Object.keys produces an array
       of property names in unspecified order.) Previous work has suggested two approaches to
       analyze such code precisely even when key is imprecise. The CompAbs [16] approach uses
       trace partitioning guided by syntactic patterns. If trace partitioning is used at the dynamic
       property read operation in line 16, the abstract state is partitioned into a set of refined
       abstract states corresponding to the properties of the source object. This way the value
       of key is precise in each of those states, and the call in line 16 is analyzed separately for
       each of them. Trace partitioning, however, is expensive, so CompAbs limits the use of
       trace partitioning according to certain syntactic patterns. At this specific dynamic property
       operation, CompAbs chooses not to apply trace partitioning and fails to detect that the
       relation between methodName and func is important.
           The second approach is demand-driven value refinement [26], which can analyze the
       example code with sufficient precision to avoid mixing together the Lodash functions. With
       this approach, the analysis detects imprecision at the dynamic property write in line 5:
       methodName is an imprecise string and func can be many different functions. It then queries
       a backwards abstract interpreter asking for the possible value of methodName for each of
       the functions. The backwards analysis returns a precise property name for each function
       and thereby enables the dynamic property write operation to be modeled precisely. For the
       dynamic property write in line 7, the function defined in lines 7–10 is written to all properties
       of object.prototype, but the abstract value being written is augmented, such that the
       value of methodName remains precise. When reading func in line 9, the backwards analysis
       is queried to get the value of func relative to the value of methodName, thereby retrieving
       a precise value for func. This ensures the desired precision, but the approach requires a
       complicated backwards analysis.
B. B. Nielsen and A. Møller                                                                                   16:5


Value partitioning
We will now informally explain how value partitioning can provide similar precision as
demand-driven value refinement, but without the need for a backwards abstract interpreter.
With traditional trace partitioning, as used by, for example, CompAbs, the analysis can
track multiple abstract states for each program point, such that the different abstract states
cover different assumptions about the execution paths that lead to that point. (Correlation
tracking [25], determinacy-based analysis [24, 2], and loop sensitivity [22] can also be viewed
as variations of trace partitioning.) The key idea behind value partitioning is that we can
obtain a similar effect as trace partitioning by instead performing the partitioning at the
level of individual abstract values. In principle, the resulting abstract domain is isomorphic
to a traditional trace partitioning domain, but this approach provides more flexibility for
using different kinds of partitioning for different parts of the abstract states. This general
idea can be instantiated in multiple ways to track different kinds of relational properties. We
next describe three instantiations that enable precise analysis of challenging JavaScript code,
including the Lodash example.

Property name partitioning
One instantiation is property name partitioning, which performs partitioning at dynamic
property reads, similar to the CompAbs technique, but on abstract values instead of abstract
states. To illustrate this mechanism by example, consider the read operation in line 16
and the correlated write operation in line 5. Assume for simplicity that the source ob-
ject has only two properties, {map: f1, trim: f2} where f1 and f2 are functions, and
methodName is an abstract value that overapproximates all valid property names. When read-
ing source[methodName], an analysis without value partitioning will read all the properties of
source. When using value partitioning, we instead partition this value according to the prop-
erty names of source, meaning that we obtain a value [t1 7→ f1, t2 7→ f2, t3 7→ undefined]
where t1 , t2 , and t3 represent different partitions.2 Intuitively, t1 represents the execution
traces where the property name being read is map, t2 similarly represents traces where the
property name being read is trim, and t3 represents all other traces. We similarly write the
partitioned value [t1 7→ "map", t2 7→ "trim", t3 7→ AnyString] to methodName.3 In this way,
the resulting abstract state retains the correlation between the values of methodName and
source[methodName].
     Later the analysis reaches the write operation object[methodName] = func, with an
abstract state where methodName is [t1 7→ "map", t2 7→ "trim", t3 7→ AnyString] and func is
[t1 7→ f1, t2 7→ f2, t3 7→ undefined]. Since the property name and the value to be written
have the same partitions, we can perform the dynamic property write separately for each
partition, meaning that f1 is written to the map property, and analogously for the other two
partitions, thereby avoiding mixing together the properties.
     Since the partitioning is performed at the value level, unlike traditional trace partitioning
we do not need any extra call contexts to the callback function defined in line 2, so the
overhead of value partitioning is negligible, even when the correlated read/write pairs span
multiple functions. For this reason, we can apply property name partitioning at all dynamic
property reads where the property name is imprecise, without the use of syntactic patterns.


2
    In JavaScript, reading an absent property yields the special value undefined.
3
    AnyString is an abstract value that represents any string. In practice we instead use a slightly more
    precise abstract value representing AnyString\{"map", "trim"}.




                                                                                                            ECOOP 2020
16:6   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


       Free variable partitioning

       A second instantiation of value partitioning is for handling free variables more precisely.
       In the example, this is useful for func in line 9, which is a free variable in the function
       defined in lines 7–10. At that function definition, we partition both the resulting abstract
       function value ` and the abstract value of func according to the existing partitioning of func,
       intuitively to be able to distinguish functions created with different values of the free variable.
       This means that the function value being written at the dynamic property write in line 7 is
       [t1 7→ `t01 , t2 7→ `t02 , t3 7→ `t03 ] where `t01 represents the function created at a point where func
       is f1 (i.e., that point is at the end of a t1 trace), and similarly for the other partitions. At
       the same time, the value of func becomes [t1 7→ f1, t2 7→ f2, t3 7→ undefined, t01 7→ f1, t02 7→
       f2, t03 7→ undefined] where the three new partitions t01 , t02 , and t03 denote the new partitioning
       we have made (one abstract value can thus have multiple partitionings simultaneously). Using
       the property name partitioning mechanism described above, at the dynamic property write
       in line 7, `t01 is written to the map property of object.prototype, and similarly for the other
       properties.
            We can exploit the free variable partitioning information when the function is later
       called. Assume the analysis encounters a call to the map method. The abstract value of
       lodash.prototype.map is then `t01 . We now use t01 as a context in ordinary context sensitive
       analysis of the function, so that when reaching func in line 9, it suffices to consider only
       the t01 partition of func, which yields the precise value f1, so again, we successfully avoided
       mixing together the properties.


       Type partitioning

       The above two uses of value partitioning are sufficient for analyzing the motivating example
       without critical precision losses, but we can make the analysis even more precise using a
       third variant. The function named isFunction used in the branch condition in line 6 is a
       typical example of a predicate function, i.e., a one-parameter function that returns a boolean,
       in this case testing whether the value passed in is a function. Assume the abstract value of
       the argument object is fun1|obj2, meaning that it represents either a function fun1 or a
       non-function object obj2. With a simple analysis, the abstract return value and hence the
       branch condition is Bool representing any boolean value, so the analysis does not know that
       object cannot be obj2 inside the branch. This causes the analysis to spuriously raise a type
       error when writing to object.prototype in line 7.
           Type partitioning avoids that imprecision as follows. Type partitioning is triggered at
       any call to a function with one argument, and partitions that argument according to its
       types. In this case, the value of object is partitioned into [a 7→ fun1, b 7→ obj2]. The
       result value from isFunction then becomes [a 7→ true, b 7→ false], which we can exploit using
       ordinary control sensitivity [10] (also called type refinement [14]) at the “true” branch such
       that object in line 7 will only be fun1 and not obj2.


       Overview

       In Section 3 we give a brief introduction to the analysis domain of TAJS. Section 4 explains
       the general value partitioning mechanism, and Section 5 details the three instantiations:
       property name partitioning, free variable partitioning, and type partitioning. Section 6
       describes our experimental evaluation, and Section 7 discusses related work.
B. B. Nielsen and A. Møller                                                                                 16:7


    r1 [r2 ] ← r3 :             Writes r3 to the property named r2 of the object r1
    r1 ← r2 [r3 ]:              Reads the property named r3 of the object r2 to r1
    r1 ← x:                     Reads the value of the variable x to r1
    x ← r1 :                    Writes r1 to the variable x
    r1 ← c:                     Assigns the constant c to r1
    r1 ← function(x){· · · }:   Creates a closure for the function and stores it in r1
    if(r1 ):                    Conditionally propagates dataflow (to model if and while)
    r1 ← r2 (r3 ):              Calls the function r2 with argument r3 and stores the result in r1
    r1 ← r2 ⊕ r3 :              Computes the binary operation r2 ⊕ r3 and stores the result in r1


        Figure 2 The main flow graph instructions in TAJS.


                                                    X ∈ AnalysisLattice = L → State
                 n ∈ N : Nodes
                                                                  σ ∈ State = (L → Obj) × Registers
                  c ∈ C : Contexts
                                                                    o ∈ Obj = P → Value
                 p ∈ P : Property names
                                                               r ∈ Registers = R → Value
        ` ∈ L = N × C : Locations
                                                                  v ∈ Value = Prim × P(L)
        Figure 3 Simplified abstract domain.


    3      Background: The TAJS Analyzer
In this section we give a brief introduction to a heavily simplified version of the analysis
domain and program representation used in TAJS [11, 2], which lays the foundation for our
extensions in the following sections.
    TAJS is an open-source dataflow analysis tool for JavaScript built as a monotone frame-
work [12]. A JavaScript program is represented as a control flow graph for each function,
with nodes representing primitive instructions of the different kinds listed in Figure 2. Each
instruction operates on registers, which can be thought of as special local variables. For
simplicity, we ignore this and receiver objects at calls, and we assume all functions have only
one parameter. As an example, the single JavaScript statement y[p] = x[p] is represented
as six flow graph nodes as shown in Figure 4.
    The components of the abstract domain are summarized in Figure 3. A location is a pair of
a node and a context. The contexts allow for context sensitivity (using the context-sensitivity
strategy described by Andreasen and Møller [2]). The main abstract domain, AnalysisLattice,
is a lattice that maps locations to abstract states, where each state contains abstract values
of object properties and registers. Objects are modeled using context-sensitive allocation-site
abstraction [6, 20], so abstract object addresses are simply locations.4 Functions are special
kinds of objects. Abstract values are modeled using a product of a constant-propagation
lattice [15] named Prim of primitive values (strings, numbers, etc.) and a powerset lattice of
object addresses.
    The analysis is control sensitive by pruning infeasible dataflow at if nodes. This includes
not only eliminating flow along unreachable branches, for example when a branch condition
is definitely false [27], but also filtering abstract values based on the branch condition [10, 14].


4
    TAJS models absence/presence of object properties and uses two artificial properties DefaultNumeric
    and DefaultOther to model properties with unknown numeric/non-numeric names; we ignore that
    here.




                                                                                                          ECOOP 2020
16:8   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


                                                     n1 :   r1 ← y


                                                     n2 :   r2 ← p


                                                     n3 :   r3 ← x


                                                     n4 :   r4 ← p


                                                    n5 : r5 ← r3 [r4 ]


                                                    n6 : r1 [r2 ] ← r5


            Figure 4 Fragment of a control flow graph, for the single statement y[p] = x[p].


       As an example, the JavaScript code if(z) is represented by two primitive instructions,
       r6 ← z and if(r6 ). In the “true” branch, not only r6 but also z must have the value true.5
       To track the connection between r6 and z, a simple intraprocedural must-equals analysis is
       performed alongside the main dataflow analysis. We leverage this mechanism in Section 5,
       for example to obtain the information that r2 , r4 , and p must have the same value at the
       property read operation in Figure 4 (unless a property accessor changes p). To keep Figure 3
       simple, we omit the must-equals information in the description of the State lattice.
            In the following sections, with a slight abuse of notation we let σ(r) denote the value of
       register r in state σ, and similarly, σ(x) denotes the value of variable x. Also, we use the
       notation σ(r) := . . . to describe the operation of writing a given value to register r and also
       to the variables and registers that are equal to r according to the must-equals information.
       If ` ∈ L is a location representing an object address, we sometimes write ` for the abstract
       value (⊥, {`}) ∈ Value. Similarly, for abstract values that represent primitive values only, we
       omit the location sets, for example, "foo" denotes the abstract value ("foo", ∅) ∈ Value.
            We omit many details of TAJS, including the definitions of the concretizations of the
       lattice elements, the definitions of the transfer functions for the different instructions, how
       values of variables are being stored in special activation objects, and how a call graph is
       built during the analysis. Analyzing full JavaScript also requires reasoning about prototypes,
       scope chains, implicit type conversions, exceptions, the standard library, property accessors
       (getters and setters), and much more. It suffices to know that the resulting abstract states
       soundly overapproximate the possible program behavior [7].
            A trace is a concrete execution of the program expressed as a finite sequence of pairs
       (`, γ) where ` is a location and γ is a concrete state, starting at the program entry point with
       the initial call context in an empty state. The semantics of a program is defined as a set
       of traces. The collecting semantics is the program semantics projected onto the program
       locations: Given a location `, the collecting semantics for `, denoted J`K, is the set of states
       that appear at ` in the set of traces defined by the program semantics. The analysis result is
       thus a lattice element X ∈ AnalysisLattice such that J`K is a subset of the concretization of
       X(`) for all locations ` ∈ L.


       5
           In actual JavaScript, the value must be truthy, which also includes nonempty strings, nonzero numbers,
           and objects.
B. B. Nielsen and A. Møller                                                                                      16:9



                           t ∈ T : Partition tokens
                       o ∈ Obj = P → PartitionedValue
                 r ∈ Registers = R → PartitionedValue
        pv ∈ PartitionedValue = T ,→ Value


        Figure 5 Extension of the abstract domain for value partitioning.


    4      Value Partitioning
To prepare the analysis for value partitioning, we introduce a set T of partition tokens
and replace occurrences of Value by PartitionedValue in the abstract domain, as shown in
Figure 5. A partitioned value is a partial map from partition tokens to ordinary values. We
use the notation [t1 7→ v1 , . . . , tk 7→ vk ] (or set-builder notation like [ti 7→ vi | i = 1, . . . , k])
to denote the partitioned value that maps ti to vi for each i = 1, . . . , k and is undefined for
all other partition tokens.
      The partition tokens play a similar role as in trace partitioning [23], but at the level
of abstract values. (We explain the differences between value partitioning and traditional
trace partitioning in more detail in Section 7.) A partition token intuitively represents a set
of execution traces. The special token any represents all traces, so the partitioned value
[any 7→ v] has the same meaning as the ordinary value v in the original abstract domain.
As an invariant, all partitioned values we use are defined for the token any.6 We extend
partitioned values to be total functions pv : T → Value by defining pv(t) = pv(any) when
t∈ / dom(pv).7
      Assume X ∈ AnalysisLattice is the result of analyzing a given program, σ = X(`) is the
abstract state at some location `, and [. . . , t 7→ v, . . . ] = σ(r) is the partitioned value of some
register r. The meaning of such a partitioned value is that for any trace that ends at ` and
is in the set of traces represented by t, the concrete value of r is in the concretization of the
abstract value v.
      A covering 8 at a location ` is a set of partition tokens where the union of the sets of
traces they represent is the set of all traces that lead to `. This means that if σ(x) =
[. . . , t1 7→ v1 , ..., tk 7→ vk , . . . ] where σ = X(`) for some program variable x at location `
where {t1 , . . . , tk } is a covering, then for every concrete state in J`K, the value of x is in the
concretization of at least one of the abstract values v1 , . . . , vk . For the initial abstract state
at the program entry, all partitioned values use the trivial covering {any}.
      Now that we have generalized the abstract domain, it is easy to adjust all transfer functions
for the different kinds of nodes to operate on partitioned values instead of ordinary values. As
an example, the original transfer function for r1 ← r2 ⊕ r3 updates a given abstract state σ by
σ(r1 ) := σ(r2 )⊕σ(r3 ) (where ⊕ applied to abstract values works as in constant propagation).9


6
  When we define a partitioned value [ti 7→ vi | i = 1, . . . , k] without an any token, an any partition is
  implicitly created with value v1 t · · · t vk .
7
  In trace partitioning terminology, this use of any corresponds to a simple pre-ordering of partition
  tokens.
8
  For formal definitions of the notions of traces and coverings, see Rival and Mauborgne [23]. Basing
  our approach on partitions instead of coverings (a partition is a covering where all the trace sets are
  disjoint) could improve precision but would complicate the analysis without much practical benefit.
9
  The actual TAJS analysis also models implicit type conversions.




                                                                                                               ECOOP 2020
16:10   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript



                 T ::    =   any                                              (Section 4)
                         |   valhN, R, Valuei                                 (Section 5.1)
                         |   funhF, C, T i                                    (Section 5.2)
                         |   typehN, R, Typesi                                (Section 5.3)

              Types ::   = undefined | null | number | string | boolean
                         | object | array | function | regexp


           Figure 6 Partition tokens used by property name partitioning, free variable partitioning, and
        type partitioning.


        When switching to the domain with partitioned values, we simply replace σ(r2 ) ⊕ σ(r3 ) by
        [t 7→ pv 2 (t) ⊕ pv 3 (t) | t ∈ dom(pv 2 ) ∪ dom(pv 3 ) where pv 2 = σ(r2 ) and pv 3 = σ(r3 )]. The
        other transfer functions and least-upper-bound are adapted similarly.
             A small example can illustrate how partitioning can make the analysis relational. Assume
        the binary operation is equality, r1 ← r2 == r3 , and that we have two partitions, t1 and t2 ,
        where both registers r2 and r3 have the value 42 in partition t1 , and both have the value
        "foo" in partition t2 . With partitioning, the value of r1 becomes [t1 7→ true, t2 7→ true]
        (i.e., definitely true), whereas without partitioning, r2 and r3 both have the value 42|"foo",
        so the value of r1 becomes AnyBool (i.e., true or false).
             To get any advantage of the new abstract domain, we of course need to modify specific
        transfer functions to selectively introduce partition tokens and further exploit the extra
        information available regarding relational properties between values. We show how that
        can be accomplished in Section 5. Those mechanisms rely on some general operations for
        manipulating the partitions in partitioned values. Most importantly, we use an operation ]
        when introducing new coverings: pv 1 ] pv 2 where pv 1 , pv 2 ∈ PartitionedValue denotes the
        combined partitioned value. For each token that is only present in one of pv 1 or pv 2 , the
        new value will be the value for that token, and for each token shared by pv 1 and pv 2 , the
        new value will be the join of the two respective values.

         5      Three Instantiations of Value Partitioning
        We now present three instantiations of the value partitioning framework. Each of them
        targets a category of relational properties that are relevant to analysis of JavaScript libraries.
        Each instantiation introduces a family of partition tokens, as shown in Figure 6, along with
        some modification of the analysis transfer functions. Each partition token represents a set of
        traces, as explained in the following.

        5.1     Property Name Partitioning
        The first use of value partitioning is for improving precision at correlated object property
        read/write operations as in the motivating example.

        Partition tokens for property name partitioning
        We introduce a family of partition tokens, valhn, r, vi, where n ∈ N , r ∈ R, and v ∈ Value.
        Such a token represents the set of traces where at the last occurrence of n, the value of
        register r is v. In all valhn, r, vi tokens we use in property name partitioning, the node n
B. B. Nielsen and A. Møller                                                                                        16:11


                
                σ(r3 ) ] [valhn, r3 , pi 7→ p | p ∈ propNames(σ(r2 ))]
                
                
 σ(r3 ) :=                 ] [valhn, r3 , otheri 7→ AnyString]                       if σ(r3 )(any) = AnyString
                
                
                σ(r )                                                               otherwise
                       3

                
                [valhn, r3 , pi 7→ readProp(σ(r2 )(any), p) | p ∈ propNames(σ(r2 ))]
                
                
 σ(r1 ) :=             ] [valhn, r3 , otheri 7→ undefined]                           if σ(r3 )(any) = AnyString
                
                [any 7→ readProp(σ(r )(any), σ(r )(any))]
                
                                                                                     otherwise
                                                  2                3



   Figure 7 Introduction of partitioned values at a dynamic property read node n of the form
r1 ← r2 [r3 ].


is a property read node (i.e., of the form r1 ← r2 [r3 ]), the register r is the one holding the
property name in that instruction (i.e., r3 in r1 ← r2 [r3 ]), and the value v is a property
name (i.e., an element of P ).10
    As an example, assume the code from Figure 4 appears inside a loop, and consider the
following two traces that both end at n6 :

     τa = · · · (n1 , γ1a )(n2 , γ2a )(n3 , γ3a )(n4 , γ4a )(n5 , γ5a )(n6 , γ6a )

and

     τb = · · · (n1 , γ1b )(n2 , γ2b )(n3 , γ3b )(n4 , γ4b )(n5 , γ5b )(n6 , γ6b )

where each “· · · ” is a trace prefix leading from the program entry point to this part of the
code, γ1a , . . . , γ6b are concrete states, and τa is a prefix of τb . The last occurrence of n5 (which
is the instruction r5 ← r3 [r4 ]) is emphasized in each of the traces. Also assume that the
value of the register r4 is "foo" in γ5a and "bar" in γ5b . Note that r4 is the register holding
the property name at the n5 instruction. In this situation, the token valhn5 , r4 , "foo"i
represents τa but not τb .

Dynamic property reads
Figure 7 shows the modified transfer function for read-property nodes, r1 ← r2 [r3 ]. The
function readProp(v1 , v2 ) looks up the abstract value of properties named v2 in the abstract
objects pointed to by v1 in the current state σ.11 Property name partitioning is triggered if
the property name is not precise (here modeled as AnyString), so in that case we partition the
property name r3 with respect to the properties that appear in the abstract objects pointed
to by r2 (expressed as propNames(σ(r2 ))), and perform the property read for each partition
to obtain a partitioned value for r1 . We use the artificial abstract value other ∈ Value to
represent all other properties; for that partition, the result value becomes undefined.12 If


10
   In JavaScript, property names are either strings, which we model in the sub-lattice Prim, or symbols,
   which can be modeled as special heap locations.
11
   Reading an object property is a nontrivial operation in JavaScript because of prototypes, getters,
   and implicit type conversions. Importantly, the value partitioning mechanism is orthogonal to such
   JavaScript technicalities.
12
   In our implementation we use a more precise string lattice, which allows us to express more precisely
   that σ(r3 ) for the valhn, r3 , otheri partition is AnyString\propNames(σ(r2 )), i.e., any string except
   for the property names that are covered by other partitions. See also footnote 3.




                                                                                                                  ECOOP 2020
16:12   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


                 for each t ∈ chooseCommonCovering(σ(r2), σ(r3 )):
                     writeProp σ(r1 )(any), σ(r2 )(t), σ(r3 )(t)

             Figure 8 Exploiting partitioned values at a dynamic property write node, r1 [r2 ] ← r3 .


        the property name r3 is already precise (corresponding to the “otherwise” cases), there is no
        need to introduce new partitions, so in that case r3 is unmodified and the result value r1 is
        obtained directly using readProp and the any partition token.
            Recall that a valhn, r, pi token represents the set of traces where at the last occurrence
        of n, the value of register r is v. To respect this property we need to remove all existing
        valhn, _, _i tokens from the abstract state before applying the transfer function for dynamic
        property reads. (This is safe because every abstract value still has other coverings, in
        particular {any}.)
            Notice that for both r3 and r1 , the new partitions use the partition tokens valhn, r3 , pi
        where n is the read-property node. Evidently, the new partition tokens form a covering. Also,
        this new transfer function respects the interpretation of the newly added valhn, r, pi tokens,
        and due to the partitioning, the resulting abstract states maintain the relation between the
        involved object property names and values.

        Dynamic property writes
        Next, we modify the transfer function for dynamic property writes, r1 [r2 ] ← r3 , as shown in
        Figure 8, to take advantage of the partitionings introduced at dynamic property reads. The
        function writeProp(v1 , v2 , v3 ) writes v3 to the properties named v2 in the objects referred
        to by v1 .13 The function chooseCommonCovering finds a covering shared by the property
        name σ(r2 ) and the value to be written σ(r3 ). (An example is given below.) If multiple
        such coverings exist, a largest one (i.e., one with the largest number of partition tokens) is
        selected.14 Recall that the two values always share the {any} covering, which will be used
        if no other covering exist. When a covering has been chosen, the value is written to the
        appropriate object property for each partition, thereby exploiting the relational information.
        In case the {any} covering is chosen, the transfer function behaves as the original version
        without value partitioning.

        Example
        To better understand property name partitioning, we now explain the mechanism in more
        detail on the example given in Figure 4. Let us assume that σ(p) = [any 7→ AnyString],
        σ(x) = [any 7→ obj 2 ] and σ(y) = [any 7→ obj 1 ] where in state σ, obj 1 is the location of an
        empty abstract object and obj 2 is the location of an abstract object with two properties,
        {foo: 1, bar: 2}. This means when analyzing the read property node r5 ← r3 [r4 ] we
        have σ(r3 ) = [any 7→ obj 2 ] and σ(r4 ) = [any 7→ AnyString]. Since the property name r4 is


        13
           We omit the details of how the implementation of writeProp in TAJS handles strong/weak updates,
           setters, and implicit type conversions. Importantly, the value partitioning mechanism is orthogonal to
           such JavaScript technicalities.
        14
           Multiple coverings can arise if, for example, the same property name is used at two different property
           read operations. We choose the largest covering based on the heuristic that fine-grained coverings lead
           to higher precision than coarse-grained coverings. The most important consequence of this heuristic is
           that we avoid the {any} covering if others are available. In case of multiple largest ones, an arbitrary
           one is selected among them.
B. B. Nielsen and A. Møller                                                                                         16:13



        pv ∈ PartitionedValue = T ,→ FPValue
                   fv ∈ FPValue = FunctionPartitions × Value
      fp ∈ FunctionPartitions = P(T )
                 AnalysisLattice = C 0 × N → State
                           c ∈ C 0 = FunctionPartitions × C

      Figure 9 Extensions of the abstract domain for free variable partitioning.


imprecise, the first case in each definition in Figure 7 applies, meaning that value partitioning is
triggered. Since propNames(σ(r2 )) = {"foo", "bar"}, we update r4 such that σ(r4 ) equals
[valhn, r4 , "foo"i 7→ "foo", valhn, r4 , "bar"i 7→ "bar", valhn, r4 , otheri 7→ AnyString],
where n is the read property node. Recall from Section 3 that the operation σ(r4 ) := . . .
does not only modify r4 but also the must-equals variables and registers, meaning that this
partitioned value is also written to r2 and p. The value being read gets the same partitions,
such that σ(r5 ) becomes [valhn, r4 , "foo"i 7→ 1, valhn, r4 , "bar"i 7→ 2, valhn, r4 , otheri 7→
undefined].
     When reaching the property write operation r1 [r2 ] ← r5 , the state σ contains σ(r2 ) =
[valhn, r4 , "foo"i 7→ "foo", valhn, r4 , "bar"i 7→ "bar", valhn, r4 , otheri 7→ AnyString]
and σ(r5 ) = [valhn, r4 , "foo"i 7→ 1, valhn, r4 , "bar"i 7→ 2, valhn, r4 , otheri 7→ undefined].
We now apply the transfer function from Figure 8. The two values σ(r2 ) and σ(r5 ) share two
coverings: {any} and {valhn, r4 , "foo"i, valhn, r4 , "bar"i, valhn, r4 , otheri}. Since the
second covering is largest, that one is picked by chooseCommonCovering(σ(r2 ), σ(r5 )).
We therefore perform three writes corresponding to the abstract assignments obj 1["foo"]=1,
obj 1["bar"]=2, and obj 1[AnyString]=undefined; notably, the properties foo and bar are
not mixed together.

5.2       Free Variable Partitioning
We now explain how to leverage value partitioning to gain precision for free variables, such
as func in line 9 in the motivating example from Figure 1.

Extending the abstract domain
The first step is to extend the abstract domain as shown in Figure 9. The Value component in
PartitionedValue is replaced by FPValue, which is a product of FunctionPartitions and Value.
The component FunctionPartitions is a set of partition tokens, which we use for tracking
which partitions the functions described in the Value component may have been declared in.
(For instance, for the motivating example from Figure 1, the function declared in lines 7–10 was
created in the partitions t01 , t02 , and t03 so the corresponding abstract values become15 ({t01 }, `),
({t02 }, `), and ({t03 }, `), where ` denotes the location for the created closure.) To preserve this
information when analyzing calls to such functions, we also augment the set of contexts to
include this information (replacing C by C 0 in AnalysisLattice). The FunctionPartitions set
is empty for values and contexts that do not use free variable partitioning.


15
     These three abstract values are denoted `t01 , `t02 , and `t03 , respectively, in the motivating example in
     Section 2.




                                                                                                                   ECOOP 2020
16:14   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


                            
                            
                             pv(t)   if t ∈ dom(pv)
                            
                                      otherwise if t = funhf, c, t0 i ∧
                            
                            ⊥
                pv(t) =
                            
                            
                                                ∃c0 , t00 : c 6= c0 ∧ funhf, c0 , t00 i ∈ dom(pv)
                            
                              pv(any) otherwise
                            

            Figure 10 Redefining how partitioned values are extended to total functions, exploiting free
        variable partitioning.


           Next, we introduce a new kind of partition tokens, and we then describe how elements of
        FunctionPartitions are created at function expressions and used at read variable nodes.


        Partition tokens for free variable partitioning
        We introduce a new kind of partition tokens, funhf, c, ti, where f is a function, c ∈ C 0 is a
        context, and t ∈ T is a partition token. A trace is represented by such a token if (1) the trace
        ends at a program location that belongs to a closure that was created when the trace up to
        that point was a t trace, and (2) that point in the trace is in function f in context c. (For
        instance, in the motivating example, a trace ending in line 9 where the currently executed
        closure was created in line 7 at the end of a t1 trace can be represented by funhf, c, t1 i,
        where f is the function at lines 2–11 and c is the context for the call to that function.) We
        only allow such partition tokens to appear in abstract values of variables that are declared in
        f . Intuitively, we use these partition tokens to obtain a form of heap specialization (also
        called heap cloning or context sensitive heap) [20] for the activation objects of f .16
            An important property is that if the abstract value of a variable x declared in a function
        f contains partition token funhf, c0 , t00 i for some c0 , t00 but not funhf, c, t0 i for any c, t0 where
        c=6 c0 , then f has not been invoked with context c in any trace represented by funhf, c, t0 i.
        This means that it is safe to redefine how partitioned values are extended to total functions
        as shown in Figure 10. The only difference between the new and the original definition from
        Section 4 is the second case, where ⊥ is returned to indicate that the set of traces for the
        given partition is empty due to the above mentioned property being satisfied.


        Function definitions
        Assume the analysis reaches a function definition node, r1 ← function(· · · ){· · · }, while
        analyzing a function f in context c, and that the function being defined has free variables
        x1 , . . . , xn that are declared in f (i.e., as parameters or local variables). Note that f is
        the function containing the function definition node being analyzed, not the function being
        defined. Let ` denote the location of the newly created closure according to the original
        transfer function without free variable partitioning. We now partition both the resulting
        function value of register r1 and the values of x1 , . . . , xn as shown in Figure 11.
             First, we use a function chooseCovering that finds a largest covering, denoted LC ,
        among the values of x1 , . . . , xn . (If multiple such coverings exist, an arbitrary one is selected
        among them, as before.)


        16
             Local variables and arguments are stored as properties on activation objects, which are created on each
             invocation.
B. B. Nielsen and A. Møller                                                                                     16:15



   LC = chooseCovering(σ(x1 ), . . . , σ(xn ))
              (
                σ(xi ) ] [funhf, c, ti 7→ σ(xi )(t) | t ∈ LC ] if LC ⊆ dom(σ(xi ))
    σ(xi ) :=
                σ(xi )                                         otherwise
                  (
                      [t 7→ ({funhf, c, ti} ∪ fp, `) | t ∈ LC ] if LC ⊆ dom(σ(xi )) for some xi
     σ(r1 ) :=
                      [any 7→ (∅, `)]                             otherwise

   Figure 11 Introduction of partitioned values at a function definition node r1 ← function(· · · ){· · · }.



    For each xi for i = 1, . . . , n, if the current value of xi contains the covering LC , we
add funhf, c, ti 7→ σ(xi )(t) to the value of xi for each t ∈ LC . (This evidently respects
the meaning of funhf, c, ti tokens informally described in the beginning of the section.)
Otherwise, if the current value of xi does not contain LC , we leave xi unmodified.
    For the result register r1 , we augment the function location ` by the same partition
tokens. If at least one free variable has been partitioned (i.e., LC ⊆ dom(σ(xi )) for some xi ),
then for each of the partition tokens t ∈ LC , the value of r1 becomes the augmented value
({funhf, c, ti} ∪ fp, `) where fp is the set of function partitions in the current context c. By
augmenting the value using the funhf, c, ti token, the information about the partitioning
is available when ` is later invoked, which is explained below. (The function partitions fp
of the current context describe how the current function was declared in an outer scope,
so by inheriting those, the partitioning also works for multiple layers of nested functions.)
Otherwise, if none of the free variables have been partitioned, register r1 is assigned the
partitioned value [any 7→ (∅, `)], which is equivalent to the original transfer function without
free variable partitioning.


Function calls

At a function call r1 ← r2 (r3 ) where σ(r2 ) is an augmented function value (fp, v) (i.e., fp
is a set of partition tokens introduced at function definitions and v refers to the set of
closures that may be invoked), we use fp to augment the context for each callee. (The set of
augmented contexts C 0 contains the FunctionPartitions component exactly for this purpose.)
Assume for simplicity that v refers to a single closure location so we only have one callee. By
augmenting the context, when analyzing the body of the callee we retain the information
about the partitions where the callee closure was created, which we can exploit when reading
its free variables as explained next.


Variable reads

Figure 12 shows the updated transfer function for read variable nodes r1 ← x, where we read
a variable x in a calling context with function partitions fp. The set of function partitions fp
tells us which partitions the current closure may have been created in. For this reason, if the
abstract value of x contains partition tokens that are also in fp, we can obtain a covering for
x by considering only those partition tokens. If there is no such partition token, we just read
the value of x as in the original transfer function.




                                                                                                               ECOOP 2020
16:16   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


                         (      F                              
                           any 7→ {σ(x)(t) | t ∈ dom(σ(x)) ∩ fp} if dom(σ(x)) ∩ fp 6= ∅
             σ(r1 ) :=
                          σ(x)                                           otherwise


           Figure 12 Exploiting partitioned values at a read variable node, r1 ← x.

                                                              32 var o1 = {x: 1, y: 2};
            22 function f(v) {
                                                              33 var o2 = {};
            23     return function g() {
                                                              34 Object.keys(o1).forEach(
            24       return v;
                                                              35   function h(p) {
            25     }
                                                              36     var v = o1[p];
            26 }
                                                              37     o2[p] = function j() {
            27
                                                              38       return v;
            28 var foo = f("foo");
                                                              39     }
            29 var bar = f("bar");
                                                              40   }
            30
                                                              41 );
            31 assert(bar() != "foo");
                                                              42 assert(o2.y() != 1);

               Figure 13 Free variable partitioning
                                                                 Figure 14 Free variable partitioning
            example with different contexts.
                                                              example with partitioned argument.


            As an example, assume σ(x) = [funhf, c, ti 7→ 1, funhf, c0 , t0 i 7→ 2, . . . ] and fp =
        {funhf, c, ti}. The value of x tells us that x must be a local variable in function f which may
        have been called in contexts c and context c0 , and that x’s value is 1 or 2, respectively. Since
        fp = {funhf, c, ti}, we know that the current function is defined inside the lexical scope of f
        in context c, meaning that the value of x must be 1.


        Examples

        To better understand free variable partitioning, we provide two examples. The first example
        (Figure 13) shows how free variable partitioning can preserve precision when a function is
        called in multiple contexts, in a way that resembles traditional heap specialization [20]. The
        second example (Figure 14) shows how free variable partitioning can preserve the precision
        of free variables partitioned with property name partitioning.
            In Figure 13, lines 22–26 define a function f that returns a closure, which on invocation
        returns the argument passed to f. Lines 28 and 29 call f with the arguments "foo" and
        "bar" and store the returned closures in the variables foo and bar, respectively. Line 31
        calls the closure stored in bar and asserts that the resulting value is not the string "foo".
        The two calls to f are analyzed in different contexts c and c0 (due to the context sensitivity
        mechanism mentioned in Section 3, as "foo" and "bar" are determinate values). For the
        invocation bar(), the resulting value is the value of the free variable v in the closure stored
        in bar. If not using heap specialization, the two concrete activation objects at the two calls
        to f would be modeled by a single abstract object, so the free variable v would have the
        imprecise abstract value AnyString. To reason precisely about the assertion in line 31, the
        analysis has to distinguish the value of v at the two calls. The baseline TAJS analyzer
        accomplishes this by the use of heap specialization [2], which provides two different abstract
        activation objects for the calls to f, so the two values "foo" and "bar" are kept separate.
B. B. Nielsen and A. Møller                                                                            16:17


    With free variable partitioning we obtain the same degree of precision as with heap
specialization in this situation. Since v is a free variable in the closure created in line 23, we
apply the top cases in the transfer functions shown in Figure 11 with LC = {any}. This means
that v after the call to f("foo") will have the value [any 7→ "foo", funhf, c, anyi 7→ "foo"]
and the value written to the foo variable is ({funhf, c, anyi}, `g ) where `g is the location of
the closure created in line 23. For the call to f("bar"), the value for v will similarly be [any 7→
"bar", funhf, c0 , anyi 7→ "bar"] and the value written to bar is ({funhf, c0 , anyi}, `g ). Note
the difference in the context part of the fun token (c at the "foo" call and c0 at the "bar"
call), since the calls to f are in those two different contexts. The value of v then becomes
[any 7→ AnyString, funhf, c, anyi 7→ "foo", funhf, c0 , anyi 7→ "bar"], so that the fun
partitions preserve the precise values.
    Now when analyzing bar(), bar has the value ({funhf, c0 , anyi}, `g ), which means
the calling context to the function g is augmented with the set of function partitions
{funhf, c0 , anyi} as described above. When reading the free variable v in line 24, we
use the first case in the transfer function defined in Figure 12, since dom(σ(x)) ∩ fp is
{funhf, c0 , anyi}. This means that the resulting value from the variable read is the value
[any 7→ "bar"], so we obtain the same precision as with heap specialization.
    This first example shows how the free variable partitioning mechanism works and how
it relates to heap specialization, but it does not demonstrate any precision improvements
compared to the existing TAJS analyzer, which does apply heap specialization. The second
example, Figure 14, illustrates a simplified version of how free variable partitioning was used
in the motivating example in combination with property name partitioning, which leads to
a precision improvement of TAJS. Line 32 defines the object o1 with two properties, and
line 33 defines o2 as an empty object. Lines 34–41 iterate over the properties of o1. For each
property, it writes a function returning the value of o1[p] to the p property of o2. To prove
that the assertion at line 42 always holds, it is critical that the values of v are not mixed
together in the iterations.
    Using property name partitioning at line 36, the value of v becomes [valhn, r, "x"i 7→ 1,
valhn, r, "y"i 7→ 2] and the value of p becomes [valhn, r, "x"i 7→ "x", valhn, r, "y"i 7→ "y"],
where n is the read property node and r is the register storing the property name. (For
clarity we ignore the other partition in this example.) When analyzing the closure creation
at line 37, we use the top rules in Figure 11 with LC = {valhn, r, "x"i, valhn, r, "y"i}. This
means that v is augmented with the additional partitions [funhh, c, valhn, r, "x"ii 7→ 1,
funhh, c, valhn, r, "y"ii 7→ 2], and the value being written to o2[p] is [valhn, r, "x"i 7→
({funhh, c, valhn, r, "x"ii}, `j ), valhn, r, "y"i 7→ ({funhh, c, valhn, r, "y"ii}, `j )]. Here, `j
denotes the location of the closure created in line 37. At the dynamic property write, the
property name and value to be written share the covering {valhn, r, "x"i, valhn, r, "y"i},
meaning that the write happens as described in Figure 8, so that o2.x becomes
({funhh, c, valhn, r, "x"ii}, `j ) and o2.y becomes ({funhh, c, valhn, r, "y"ii}, `j ). Now when
o2.y is called in line 42, the call to j is augmented with the the set of function partitions
{funhh, c, valhn, r, "y"ii}. Therefore when reading the value v in line 38, according to
Figure 12 we only read the funhh, c, valhn, r, "y"ii partition. The result of reading v is then
[any 7→ 2], so the analysis is precise enough to prove that the assertion at line 42 holds.

5.3    Type Partitioning
Value partitioning can also be useful for partitioning values based on their types. Since
JavaScript does not have function overloading, it is common to reflectively find the type of
an argument, and based on the type run different pieces of code (as in line 3 in Figure 1).



                                                                                                      ECOOP 2020
16:18   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


                         
                         σ(r3 ) ] [typehn, r3 , tyi 7→ filter(σ(r3 ), ty) | ty ∈ types(σ(r3 ))]
                         
                         
             σ(r3 ) :=                                                   if |types(σ(r3 ))| > 1
                         
                         
                         σ(r )                                          otherwise
                              3



            Figure 15 Addition to the transfer function for a call node with one argument, r1 ← r2 (r3 ).
        filter restricts a partitioned value to represent only values that match the given type, and types
        returns the possible types of the given partitioned value.


        This is often done through the use of predicate functions, which are one-parameter functions
        that return a boolean value. By partitioning the arguments at calls to predicate functions,
        the analysis becomes able to track the relations between the arguments and the return values,
        and thereby boost the control sensitivity mechanism (see Section 3) at branches that involve
        such calls. Since the analysis does not know in advance whether a function returns boolean
        values, we simply perform this partitioning at all function calls with one argument, without
        considering what values the function may return.


        Partition tokens for type partitioning

        We introduce type partitioning tokens of the form typehn, r, tyi, where n ∈ N is a call node
        r1 ← r2 (r3 ), r ∈ R is the argument register in n (in this case r3 ), and ty ∈ Types using the
        set of types shown in Figure 6. Such a token represents the set of traces where the type of r
        is ty at the last occurrence of n. For example, the traces that reach line 7 in Figure 1 are
        represented by the token typehn, r, functioni where n is the call to isFunction in line 6 and
        r is the argument register of that call node.


        Function calls

        Figure 15 shows an addition to the transfer function for call nodes, r1 ← r2 (r3 ), to partition
        the argument value before the call takes place. The first case applies if the argument σ(r3 )
        abstractly represents values of multiple types (i.e., |types(σ(r3 ))| > 1, where types returns
        the set of all the types the given abstract value may have). In this case we introduce a
        partition typehn, r3 , tyi for each ty ∈ types(σ(r3 )), such that the value in that partition is
        filter(σ(r3 ), ty), where filter restricts σ(r3 ) to only represent values of type ty. Since all
        the possible types are represented, the new partitions together form a covering.
            Recall that a typehn, r, tyi token only represents information about the last occurrence
        of n in a given trace. To ensure this property we always remove all existing typehn, _, _i
        tokens from the abstract state immediately before applying the modified transfer function
        for call node n.


        Example

        As an example consider the code in Figure 16, and assume x has the abstract value fun1|obj2
        (representing either the function fun1 or the object obj2). Without type partitioning, the
        result of analyzing the isObj(x) call is the abstract value AnyBool (representing true or false),
        so both branches are analyzed with x being fun1|obj2; however, in a concrete execution,
        fun1 will never flow to the “true” branch, and obj2 will never flow to the “false” branch.
B. B. Nielsen and A. Møller                                                                             16:19


                                                  47 function isObj(arg) {
                                                  48   if (typeof arg == ’object’)
43 function isObj(arg) {                          49     return true;
44     return typeof arg == ’object’;             50   else
45 }                                  51                 return false;
46 if (isObj(x)) { ... } else { ... } 52 }
                                                  53 if (isObj(x)) { ... } else { ... }
     Figure 16 Type partitioning example.
                                                    Figure 17 Type partitioning example with
                                                 control dependent relations.


    By using type partitioning, we partition x before calling the predicate function. In
this example let n be the call node and let r be its argument register. Then x be-
comes [typehn, r, functioni 7→ fun1, typehn, r, objecti 7→ obj2]. Now when analyzing the
body of isObj, the expression typeof arg == ’object’ evaluates to the partitioned value
[typehn, r, functioni 7→ false, typehn, r, objecti 7→ true]. When reaching the if branch,
control sensitivity ensures that only the object partition flows to the “true” branch (i.e., x’s
value becomes [typehn, r, functioni 7→ ⊥, typehn, r, objecti 7→ obj2] in that branch), and only
the function partition flows to the “false” branch (i.e., x’s value becomes [typehn, r, functioni 7→
fun1, typehn, r, objecti 7→ ⊥] in that branch).




Control dependent relations

Predicate functions are sometimes implemented with control dependent relations between
the argument and the result, as in the example in Figure 17. The example is contrived but it
is not uncommon in predicate functions that the result values appear as the literals true or
false in branches. With the type partitioning mechanism described above, the returned
values will not be partitioned in this situation, since the partitions in arg do not propagate
to the values true and false.
    To mitigate this issue, we augment the abstract states as shown in Figure 18 to keep track
of partitions that must be dead or may be live (represented by the two P(T ) components,
respectively). A partition is dead if the set of traces it represents is empty, and it is live
otherwise. (We only keep track of the live partitions in coverings where there are any
dead partitions.) Since the branch condition typeof arg == ’object’ is analyzed with
a partitioned value for arg, by control sensitivity we know that the only traces that can
reach the “true” branch are those represented by the object partition, so we record that
typehn, r, objecti is live and typehn, r, functioni is dead in that branch, and conversely in the
other branch. To exploit this information, we also update the transfer function for constants,
r1 ← c, as shown in Figure 19. Basically, it assigns ⊥ to all dead partitions and the constant
c to all live partitions. If there are no dead partitions, it behaves as usual, where the constant
is written to the any partition. When the analysis reaches true (line 49), we obtain the
partitioned value [typehn, r, functioni 7→ ⊥, typehn, r, objecti 7→ true], and similarly when
analyzing false (line 51) we get [typehn, r, functioni 7→ false, typehn, r, objecti 7→ ⊥]. The
join of these two values is [typehn, r, functioni 7→ false, typehn, r, objecti 7→ true], which
becomes the result of isObj(x). Due to the control sensitivity mechanism, only obj2 then
flows to the “true” branch, and only fun1 flows to the “false” branch in line 53.



                                                                                                       ECOOP 2020
16:20   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript



             State 0 = State × P(T ) × P(T )


             Figure 18 Abstract states updated to keep track of dead and live partitions.

                          
                          [t 7→ c | t ∈ livePartitions(σ)] ] [t 7→ ⊥ | t ∈ deadPartitions(σ)]
                          
                          
              σ(r1 ) :=                                                    if deadPartitions(σ) 6= ∅
                          
                          [any 7→ c]
                          
                                                                             otherwise


             Figure 19 Updated transfer function for constant nodes, r1 ← c, for improved type partitioning.


         6      Evaluation
        We have implemented the value partitioning framework (Section 4) and the three instantiations
        (Section 5) on top of TAJS v0.24. Implementing the general framework in TAJS required
        900 lines of code, however most of this is boilerplate code for lifting operations on ordinary
        abstract values to also work on partitioned values. With the general framework in place,
        instantiations are easy to implement: property name partitioning (Section 5.1), free variable
        partitioning (Section 5.2), and type partitioning (Section 5.3) required only around 230,
        250, and 60 lines of code, respectively. We disable TAJS’s for-in specialization technique,
        since it is subsumed by property name partitioning.17 We refer to our new analysis tool
        as TAJSValPar .18 Using this tool we evaluate our techniques by answering the following
        research questions:

        RQ1 How does TAJSValPar compare to existing state-of-the-art analyses for JavaScript?
        RQ2 What are the effects of the three different instantiations of value partitioning?

           All our experiments are conducted on an Ubuntu machine with a 2.6 GHz Intel Xeon
        E5-2697A CPU running a JVM with 10 GB RAM.

        6.1     RQ1: Comparison with State-Of-The-Art Analyses
        We start by comparing TAJSValPar against the current state-of-the-art analyses for JavaScript:
        the baseline TAJS analyzer with static determinacy [2], TAJSVR [26] with demand-driven
        value refinement, and the CompAbs analyzer [16] based on the SAFE analyzer [17]. We use
        the same benchmarks as those used in the evaluation of TAJSVR , which is the most recent
        related work.

        Micro benchmarks
        We first evaluate TAJSValPar against a small collection of micro benchmarks that capture
        some of the main challenges that appear in analysis of modern JavaScript libraries and are
        used in previous work [16, 26]. The benchmarks all contain dynamic read/write pairs that


        17
           The motivation for introducing for-in specialization in [2] was to reason about correlated read/write
           pairs inside for-in loops. This relational information is now provided by property name partitioning.
        18
           TAJSValPar : TAJS with Value Partitioning
B. B. Nielsen and A. Møller                                                                              16:21


     Table 1 Micro-benchmarks that check how state-of-the-art analyses handle various dynamic
read/write pairs that represent typical challenges in JavaScript library code. A 7 indicates that the
analysis mixes together the properties of the object being manipulated, while a X indicates that
it is sufficiently precise to keep them distinct. The CF, CG, AF, and AG benchmarks are drawn
directly from [16], while M1, M2, and M3 are drawn directly from [26].

Benchmark               TAJS                CompAbs                TAJSVR                TAJSValPar

CF                        X                     X                     X                       X
CG                        X                     X                     X                       X
AF                        7                     X                     X                       X
AG                        7                     X                     X                       X
M1                        7                     7                     X                       X
M2                        7                     7                     X                       X
M3                        7                     7                     X                       X



are variations of the pattern shown in the introduction and the motivating example. The
results of the comparison are shown in Table 1. For these benchmarks, a test succeeds if it
avoids mixing together properties in the dynamic read/write pairs.
    The first two examples, CF and CG, are loops where the static analyses have enough
information to be able to unroll all the iterations and thereby analyze the read/write patterns
with precise property names. For CF, property name partitioning in TAJSValPar gives the
same degree of precision without loop unrolling.
    AF and AG are loops where the static analyses are incapable of obtaining a precise value
for the property name used in the dynamic read/write pairs. TAJS fails to analyze these, but
CompAbs detects the pattern syntactically and therefore applies trace partitioning to analyze
the code precisely. TAJSVR also succeeds on these tests, because its backwards abstract
interpreter is capable of providing the necessary relational information. In comparison,
TAJSValPar can reason about the relational information on its own.
    Both TAJS and CompAbs fail on the last three tests (M1, M2, and M3). CompAbs
fails on M1 and M3 because it does not apply partitioning due to the fragility of syntactic
patterns, and it fails on M2 because the partitioning does not provide the necessary precision
about free variables. Again, TAJSVR can analyze them all, since the backwards abstract
interpreter is powerful enough to reason about all the cases, whereas TAJSValPar successfully
preserves the relational properties by the use of value partitioning.
   These results demonstrate that for these benchmarks, TAJSValPar is capable of providing
comparable precision to the demand-driven value refinement technique without the need for
a complicated backwards analysis, and provides better precision than the other analyses.



Library benchmarks

The next set of benchmarks is taken from the evaluation of TAJSVR and consists of small
test cases for popular real-world libraries. The libraries include the widely used functional
utility library Underscore (which has more than 20 000 dependents in npm) v1.8.3 with 1 548
LoC and the most depended-upon package Lodash (more than 115 000 dependents). We
analyze both Lodash3 (v3.0.0, 10 785 LoC) and Lodash4 (v4.17.10, 17 105 LoC), since their
code bases are substantially different and therefore pose distinct challenges for static analysis.



                                                                                                        ECOOP 2020
16:22   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


            Table 2 Analysis results for real-world benchmarks (from [26]). For each group of benchmarks
        and for each of the four analyzers, we show the number of tests that are analyzed successfully and
        (in parentheses) the average analysis time per successful test.

        Benchmark group                     TAJS             CompAbs               TAJSVR       TAJSValPar
        Underscore (182 tests)             0     (-)         0        (-)         173 (2.9s)    173    (2.7s)
        Lodash3     (176 tests)            7 (2.4s)          0        (-)         172 (5.5s)    173    (5.3s)
        Lodash4     (306 tests)            0     (-)         0        (-)         266 (24.7s)   289   (26.3s)
        Prototype     (6 tests)            0     (-)         2    (23.1s)         5 (97.7s)     5     (34.1s)
        Scriptaculous (1 tests)            0     (-)         1    (62.0s)         1 (236.9s)    1     (55.2s)
        jQuery       (71 tests)            3 (16.0s)         0        (-)         3 (13.5s)     3     (20.4s)



        The other libraries, Prototype v1.7.2, Scriptaculous v1.9.0, and jQuery v1.10,19 are popular
        libraries for client-side web programming.
            The analysis results are shown in Table 2. We classify an analysis of a benchmark as
        successful if it terminates within 5 minutes and the analysis result to our knowledge is sound.
        In particular, an analysis run is considered a failure if the analysis result does not have
        dataflow to the ordinary exit of the program. (All the tests pass in normal execution, so
        an analysis result is obviously unsound if there is no dataflow to the ordinary exit.) To
        increase confidence in the soundness of the analysis results for TAJSValPar , we apply thorough
        soundness testing as described at the end of this section. Increasing the time budget does
        not help for these benchmarks: as reported previously for JavaScript analysis tools, critical
        precision losses tend to cause a proliferation of spurious dataflow that drastically increases
        analysis time and renders the analysis results useless [26, 11, 22, 16].
            The results for TAJSValPar are comparable to those of TAJSVR , which outperforms
        the other analyzers. TAJSValPar succeeds in analyzing all the benchmarks that TAJSVR
        can handle, plus 24 more (one Lodash3 test and 23 Lodash4 tests). Note the substantial
        improvement for the Lodash4 tests: the number of Lodash4 tests that are not analyzed
        successfully is reduced from 40 to 17. None of the analyzers do well on the jQuery benchmarks;
        a preliminary manual study shows that the reasons are unrelated to relational analysis. The
        results are as expected, since property name partitioning and free variable partitioning are
        alternative techniques to provide the relational information that TAJSVR obtains from its
        demand-driven value refinement. Furthermore, value partitioning is triggered more often
        during the analysis, which means that the precision improvements are not limited to the few
        critical cases where value refinement is triggered. On top of this, type partitioning provides
        some additional precision beyond the capabilities of TAJSVR .
            Comparing the performance between TAJSValPar and TAJSVR , the most significant
        differences are for the Prototype and Scriptaculous benchmarks. TAJSValPar is around 3–4
        times faster than TAJSVR , which is mainly because property name partitioning makes the
        for-in specialization technique in TAJS obsolete. For Underscore and Lodash3, TAJSValPar
        is slightly faster than TAJSVR . This is encouraging, because analyzing dynamic property
        writes as the one in line 7 in Figure 1 is more expensive in TAJSValPar than in TAJSVR .
        In TAJSVR such an operation is handled as a single imprecise write (since the precision
        is recovered on demand), whereas TAJSValPar performs the write for each property that
        is copied. To soundly handle setters, all the writes happen in different states that are


        19
             This is the version of jQuery used in [2]. Note that [16] used the older v1.4.4.
B. B. Nielsen and A. Møller                                                                          16:23


subsequently joined together, which causes TAJSValPar to spend some extra time at such
writes. Since the analysis time is nevertheless similar, we can conclude that value partitioning
is cheaper for analyzing other parts of the libraries. For Lodash4 and jQuery, TAJSVR is
slightly faster than TAJSValPar . For Lodash4, the main reason is the handling of dynamic
property writes, and for the jQuery benchmarks, type partitioning adds little performance
overhead as seen in Table 3.



Precision

Previous work [2, 22, 26] established that type analysis and call-graph construction are
useful metrics for measuring the precision of an analysis for JavaScript, and therefore we
use these metrics to evaluate the analysis precision of TAJSValPar . All locations are treated
context-sensitively in these measurements, meaning that we count the same location once for
each reachable context. We count the number of possible types for the resulting value in
each variable or property read and find that in 99.19% of the reads, a single unique type is
read, with the average number of types being 1.02. For measuring precision of the call-graph
construction, we measure the number of call-sites with unique callees, and find this number
to be 99.95% of all call-sites. These numbers show that when the analysis succeeds, it does
so with very high precision.



Soundness

Formally proving soundness of the three variants of value partitioning is out of scope of this
paper, however, we will informally justify that the general approach is sound. Since general
trace partitioning is known to be sound, it suffices to argue that the precision gained by value
partitioning is equivalent to that obtained through trace partitioning. The key reason why
this holds for property name partitioning and type partitioning is that the partition tokens
represent the last occurrence of some node, meaning that if two values share partitions,
they represent information about the same execution traces. This means that we could
(if ignoring performance) instead have applied traditional trace partitioning, with exactly
the same partition tokens and at the same nodes, resulting in the same precision. (For
further discussion about the connection between value partitioning and trace partitioning,
see Section 7.) Similarly for free variable partitioning, since the partitions are only allowed
on activation objects, the precision is never higher than what would be obtained using heap
specialization (where each partition would be represented by a distinct abstract activation
object), and therefore soundness follows from soundness of heap specialization.
    Furthermore, to increase confidence in the soundness of our implementation, all the
TAJSValPar results have been thoroughly soundness tested [3]. This means that the analysis
results overapproximate all the dataflow facts that have been observed during concrete
executions of the analyzed benchmarks. For every variable and property read observed
concretely, we have checked that the concrete value is in the concretization of the corresponding
abstract value in the analysis results, and similarly for property writes and function calls.
All our benchmarks except one pass in total more than 7.6 million soundness tests. The
one benchmark that fails is a Lodash4 test, which uses ES6 iterators in combination with
Arrays.from, which is not fully supported in the latest version of TAJS and is unrelated to
the use of value partitioning.



                                                                                                    ECOOP 2020
16:24   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


            Table 3 Analysis results for real-world benchmarks (from [26]) using different instantiations of
        value partitioning. “None” is without value partitioning, “P” is with property name partitioning, “P
        + FV” is with property name and free variable partitioning, and “F + PV + T” is with property
        name, free variable, and type partitioning.

        Benchmark group                None                P             P + FV            P + FV + T
        Underscore (182 tests)        0     (-)       149 (2.0s)        173 (2.5s)         173      (2.7s)
        Lodash3     (176 tests)       7 (2.4s)        167 (4.7s)        173 (5.1s)         173      (5.3s)
        Lodash4     (306 tests)       0     (-)       268 (16.8s)       274 (27.7s)        289     (26.3s)
        Prototype     (6 tests)       0     (-)       0       (-)       5 (32.7s)          5       (34.1s)
        Scriptaculous (1 tests)       0     (-)       0       (-)       1 (53.1s)          1       (55.2s)
        jQuery       (71 tests)       3 (16.0s)       3 (15.2s)         3 (16.5s)          3       (20.4s)



        6.2    RQ2: Effects of the Three Instantiations
        We now investigate how much each of the three uses of value partitioning contributes to
        the results reported in the previous section. The results from running our analysis with
        only some instantiations enabled can be seen in Table 3. The column “P” is with only
        property name partitioning enabled; we see that it is sufficient for analyzing many of the
        Underscore and Lodash test cases, but not for any of the Prototype or Scriptaculous test
        cases. (Without property name partitioning but with the other two instantiations enabled,
        the analysis is not able to analyze more benchmarks than TAJS.) The column “P + FV” uses
        both property name partitioning and free variable partitioning. Also enabling free variable
        partitioning makes the analysis capable of analyzing many additional benchmarks: more
        Underscore and Lodash test cases, as well as some Prototype and Scriptaculous test cases.
        Compared to only property name partitioning, the analysis times are higher (for the reason
        discussed above regarding additional state joins). The last column “P + FV + T” is with
        all instantiations enabled and therefore contains the same numbers as shown in Table 2.
        We see that type partitioning enables the analysis of 15 additional Lodash4 tests, without
        significantly increasing the analysis time.
            We conclude that all three instantiations contribute to the results, where property name
        partitioning is the most important one, followed by free variable partitioning and then
        type partitioning. (TAJS already performs filtering at branches, as mentioned in Section 3;
        without that feature the effect of type partitioning would likely be larger.)


         7     Related Work

        Trace partitioning

        Value partitioning can be viewed as a variant of trace partitioning [23] as explained in
        Sections 1, 2 and 4, but there are some important differences. Changing the original abstract
        domain in Section 4 to support traditional trace partitioning can be done by replacing
        L → State by L → T → State, so that an abstract state is maintained for each partition, at
        every location. Thus, different locations can partition the abstract states differently. Value
        partitioning instead has only one abstract state per location but partitions the individual
        abstract values, which adds an additional degree of flexibility: different parts of each abstract
        state can be partitioned differently. In particular, for the large parts of the states where we
B. B. Nielsen and A. Møller                                                                           16:25


are not interested in relational information, we can use the {any} partitioning,20 while for the
important registers and object properties, we can have nontrivial partitions. With traditional
trace partitioning, the normal transfer functions are applied for each partition, which causes
redundant computations because of the similarities between the different partitions21 ; with
value partitioning, we only pay a price for partitioning at operations that involve abstract
values with nontrivial partitions. This is the main reason for the low overhead of the
technique.
    Another difference is that the partition tokens in traditional trace partitioning are actually
lists of “directives” (the language of directives used by Rival and Mauborgne [23] is similar
to our language of tokens in Figure 6), which can lead to a combinatorial explosion. By
partitioning at the level of values and allowing multiple coverings in each partitioned value,
we avoid the need to maintain such combinations.


Relational analysis
Traditional techniques for achieving relational analysis, as exemplified by the octagon abstract
domain [19], focus on numeric relations, such as, linear inequalities. To reduce the cost of this
approach, a syntactic pre-analysis called variable packing is typically used for partitioning
the set of program variables, and one octagon is then used for each pack instead of tracking
all possible combinations of inequalities. This kind of partitioning is reminiscent of value
partitioning, but with the important difference that variable packing and octagons operate
on sets of program variables whereas value partitioning works on individual abstract values.
In our work with analysis of JavaScript libraries, we have not encountered a critical need for
tracking numeric relations.
    The well-known analyzer Astrée [4] applies not only trace partitioning and octagons, but
also a decision tree abstract domain that is used for tracking relations between booleans
and numerical variables that affect control flow. That technique has some similarities with
our type partitioning mechanism but relies on variable packing to avoid combinatorial
explosions, whereas type partitioning uses the more lightweight value partitioning technique
in combination with the existing control sensitivity mechanism of TAJS.
    The main purpose of value partitioning is to be able to reason about relations between
different parts of the abstract state (i.e., program variables and registers) at the various
program points. Some literature uses the term relational analysis with a slightly different
meaning: to relate information across program points, typically relations between the entry
and exits of functions [8, 5].


Static analysis for JavaScript
Through the last decade, several static analyzers for JavaScript have been developed, including
WALA [25, 24, 28], SAFE [17, 22], JSAI [13], and TAJS [11, 2, 26]. Although we focus on
TAJS, the designs of SAFE and JSAI are reasonably similar, so we believe value partitioning
could also be incorporated into those tools with little effort.
   As discussed in the introduction, much work has been put into improving precision of the
analyses through different kinds of context sensitivity and elaborate abstract domains. The
techniques include parameter sensitivity and heap context sensitivity [2], loop unrolling [22],


20
     In our experiments, 99.4% of all abstract values have the trivial {any} partitioning.
21
     This was shown experimentally in the work on TAJSVR [26, Section 7.1].




                                                                                                     ECOOP 2020
16:26   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


        and syntactic patterns for detecting correlated read/write pairs and guiding context sensi-
        tivity [25]. Other works have explored more expressive string abstractions to reason more
        precisely about property names in dynamic property accesses [18, 1, 21]. Our abstract domain
        extension for value partitioning has few assumptions about the underlying abstract domain,
        so most of these techniques can be combined with value partitioning.
            Despite such precision improvement techniques, imprecision is inevitable, and only a few
        techniques have been designed to handle dynamic property accesses with imprecise property
        names, most importantly, CompAbs-style trace partitioning [16] and demand-driven value
        refinement [26]. Previous work has shown that demand-driven value refinement enables
        analysis of many more challenging benchmarks than CompAbs-style trace partitioning (as
        also discussed in Section 6), and that the trace partitioning approach causes a large amount
        of redundant computation [26, Section 7.1]. The fundamental drawback of demand-driven
        value refinement is that it requires a separate backwards abstract interpreter for not only the
        entire JavaScript language but also the standard library. The backwards abstract interpreter
        of TAJSVR is not simply the dual of TAJS but works goal-directed and with its own abstract
        domain based on intuitionistic separation logic. In contrast, value partitioning directly
        leverages the existing forward analyzer and thereby supports both the JavaScript language
        and the standard library essentially for free, which makes this approach substantially easier
        to develop and maintain. Furthermore, value partitioning is more general (for example, it
        enables type partitioning), and the three instantiations we have presented lead to better
        precision (for the Lodash4 tests).
            The HOO (heap with open objects) abstract domain [9] is a relational abstraction that is
        designed to reason more precisely about abstract objects whose properties cannot be known
        statically. That approach is highly expressive but not scalable to real-world JavaScript
        libraries as those considered in Section 6.




         8     Conclusion

        We have presented value partitioning, a static analysis technique for reasoning about relational
        properties. It is a lightweight alternative to traditional trace partitioning techniques that
        allows relational information to be incorporated into the abstract values instead of requiring
        separate abstract states for the partitions. We have proposed three instantiations of value
        partitioning in JavaScript analysis: property name partitioning, free variable partitioning,
        and type partitioning, which enable precise reasoning for dynamic read/write pairs, free
        variables, and predicate functions, respectively.
            The experimental results show that extending the TAJS analyzer with the three variants
        of value partitioning enables precise and efficient analysis of complex JavaScript libraries
        including Lodash and Underscore, thereby outperforming a state-of-the-art technique that
        relies on trace partitioning and without requiring a complicated backwards analysis. For the
        libraries considered in this study, property name partitioning has the largest effect among
        the proposed variants.
           An interesting direction for future research is to investigate whether some of the traditional
        context sensitivity strategies used in TAJS and other JavaScript analyzers can be reformulated
        as new value partitioning instantiations, to make analysis faster while retaining precision.
B. B. Nielsen and A. Møller                                                                           16:27


     References
 1   Roberto Amadini, Alexander Jordan, Graeme Gange, François Gauthier, Peter Schachte,
     Harald Søndergaard, Peter J. Stuckey, and Chenyi Zhang. Combining string abstract domains
     for JavaScript analysis: An evaluation. In Tools and Algorithms for the Construction and
     Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the
     European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala,
     Sweden, April 22-29, 2017, Proceedings, Part I, volume 10205 of Lecture Notes in Computer
     Science, pages 41–57, 2017.
 2   Esben Andreasen and Anders Møller. Determinacy in static analysis for jQuery. In Proc. ACM
     SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications,
     pages 17–31, 2014.
 3   Esben Sparre Andreasen, Anders Møller, and Benjamin Barslev Nielsen. Systematic approaches
     for increasing soundness and precision of static analyzers. In Proceedings of the 6th ACM
     SIGPLAN International Workshop on State Of the Art in Program Analysis, SOAP@PLDI
     2017, Barcelona, Spain, June 18, 2017, pages 31–36, 2017.
 4   Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine
     Miné, David Monniaux, and Xavier Rival. A static analyzer for large safety-critical software.
     In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and
     Implementation 2003, San Diego, California, USA, June 9-11, 2003, pages 196–207. ACM,
     2003.
 5   Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn, and Hongseok Yang. Compositional
     shape analysis by means of bi-abduction. J. ACM, 58(6):26:1–26:66, 2011.
 6   David R. Chase, Mark N. Wegman, and F. Kenneth Zadeck. Analysis of pointers and structures.
     In Proceedings of the ACM SIGPLAN’90 Conference on Programming Language Design and
     Implementation (PLDI), White Plains, New York, USA, June 20-22, 1990, pages 296–310.
     ACM, 1990.
 7   Patrick Cousot and Radhia Cousot. Abstract interpretation: A unified lattice model for static
     analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th
     ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’77,
     pages 238–252, New York, NY, USA, 1977. ACM.
 8   Patrick Cousot and Radhia Cousot. Modular static program analysis. In Compiler Construction,
     11th International Conference, CC 2002, Held as Part of the Joint European Conferences on
     Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings,
     volume 2304 of Lecture Notes in Computer Science, pages 159–178. Springer, 2002.
 9   Arlen Cox, Bor-Yuh Evan Chang, and Xavier Rival. Automatic analysis of open objects in
     dynamic language programs. In Static Analysis - 21st International Symposium, SAS 2014,
     Munich, Germany, September 11-13, 2014. Proceedings, pages 134–150, 2014.
10   Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi. Typing local control and state
     using flow analysis. In Programming Languages and Systems - 20th European Symposium
     on Programming, ESOP 2011, Held as Part of the Joint European Conferences on Theory
     and Practice of Software, ETAPS 2011, Saarbrücken, Germany, March 26-April 3, 2011.
     Proceedings, volume 6602 of Lecture Notes in Computer Science, pages 256–275. Springer,
     2011.
11   Simon Holm Jensen, Anders Møller, and Peter Thiemann. Type analysis for JavaScript. In
     Proc. 16th International Static Analysis Symposium, pages 238–255, 2009.
12   John B. Kam and Jeffrey D. Ullman. Monotone data flow analysis frameworks. Acta Inf.,
     7(3):305–317, September 1977.
13   Vineeth Kashyap, Kyle Dewey, Ethan A. Kuefner, John Wagner, Kevin Gibbons, John
     Sarracino, Ben Wiedermann, and Ben Hardekopf. JSAI: a static analysis platform for
     JavaScript. In Proc. 22nd ACM SIGSOFT International Symposium on Foundations of
     Software Engineering, pages 121–132, 2014.




                                                                                                     ECOOP 2020
16:28   Value Partitioning: A Lightweight Approach to Relational Static Analysis for JavaScript


        14    Vineeth Kashyap, John Sarracino, John Wagner, Ben Wiedermann, and Ben Hardekopf. Type
              refinement for static analysis of JavaScript. In DLS’13, Proceedings of the 9th Symposium
              on Dynamic Languages, part of SPLASH 2013, Indianapolis, IN, USA, October 26-31, 2013,
              pages 17–26. ACM, 2013.
        15    Gary A. Kildall. A unified approach to global program optimization. In Conference Record of
              the ACM Symposium on Principles of Programming Languages, Boston, Massachusetts, USA,
             October 1973, pages 194–206. ACM Press, 1973.
        16    Yoonseok Ko, Xavier Rival, and Sukyoung Ryu. Weakly sensitive analysis for JavaScript
              object-manipulating programs. Softw., Pract. Exper., 49(5):840–884, 2019.
        17    Hongki Lee, Sooncheol Won, Joonho Jin, Junhee Cho, and Sukyoung Ryu. SAFE: formal
              specification and implementation of a scalable analysis framework for ECMAScript. In Proc.
              International Workshop on Foundations of Object Oriented Languages, 2012.
        18    Magnus Madsen and Esben Andreasen. String analysis for dynamic field access. In Proc.
              23rd International Conference on Compiler Construction, volume 8409 of Lecture Notes in
             Computer Science. Springer, 2014.
        19    Antoine Miné. The octagon abstract domain. Higher-Order and Symbolic Computation,
              19(1):31–100, 2006.
        20    Erik M. Nystrom, Hong-Seok Kim, and Wen-mei W. Hwu. Importance of heap specialization in
              pointer analysis. In Proceedings of the 2004 ACM SIGPLAN-SIGSOFT Workshop on Program
              Analysis For Software Tools and Engineering, PASTE’04, Washington, DC, USA, June 7-8,
              2004, pages 43–48. ACM, 2004.
        21    Changhee Park, Hyeonseung Im, and Sukyoung Ryu. Precise and scalable static analysis of
              jquery using a regular expression domain. In Proceedings of the 12th Symposium on Dynamic
              Languages, DLS 2016, pages 25–36, New York, NY, USA, 2016. ACM.
        22    Changhee Park and Sukyoung Ryu. Scalable and precise static analysis of JavaScript applica-
              tions via loop-sensitivity. In Proc. 29th European Conference on Object-Oriented Programming,
              pages 735–756, 2015.
        23    Xavier Rival and Laurent Mauborgne. The trace partitioning abstract domain. ACM Trans.
              Program. Lang. Syst., 29(5), August 2007.
        24    Max Schäfer, Manu Sridharan, Julian Dolby, and Frank Tip. Dynamic determinacy analysis.
              In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI
             ’13, Seattle, WA, USA, June 16-19, 2013, pages 165–174. ACM, 2013.
        25    Manu Sridharan, Julian Dolby, Satish Chandra, Max Schäfer, and Frank Tip. Correlation
              tracking for points-to analysis of JavaScript. In Proc. 26th European Conference on Object-
             Oriented Programming, 2012.
        26    Benno Stein, Benjamin Barslev Nielsen, Bor-Yuh Evan Chang, and Anders Møller. Static
              analysis with demand-driven value refinement. Proceedings of the ACM on Programming
              Languages (PACMPL), 3:140:1–140:29, 2019.
        27    Mark N. Wegman and F. Kenneth Zadeck. Constant propagation with conditional branches.
              ACM Trans. Program. Lang. Syst., 13(2):181–210, 1991.
        28    Shiyi Wei, Omer Tripp, Barbara G. Ryder, and Julian Dolby. Revamping JavaScript static
              analysis via localization and remediation of root causes of imprecision. In Proceedings of the
              24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE
              2016, Seattle, WA, USA, November 13-18, 2016, pages 487–498. ACM, 2016.