J-Force: Forced Execution on JavaScript

Authors Chung Hwan Kim Dongyan Xu I Luk Kim Kyungtae Kim Xiangyu Zhang Yonghwi Kwon Yunhui Zheng

License CC-BY-4.0

J-Force: Forced Execution on JavaScript

Kyungtae Kim, I Luk Kim, Chung Hwan Kim, Yonghwi Kwon, Yunhui Zheng∗, Xiangyu Zhang, Dongyan Xu
Department of Computer Science, Purdue University, USA
∗IBM T.J. Watson Research Center, USA
{kim1798, kim1634, chungkim, kwon58, xyzhang, dxu}@cs.purdue.edu, zhengyu@us.ibm.com




ABSTRACT
   Web-based malware equipped with stealthy cloaking and obfuscation techniques is becoming more sophisticated. In this paper, we propose J-FORCE, a crash-free forced JavaScript execution engine that systematically explores possible execution paths and reveals malicious behaviors in such malware. In particular, J-FORCE records branch outcomes and mutates them for further exploration. J-FORCE inspects function parameter values that may reveal malicious intentions and expose suspicious DOM injections. We address a number of technical challenges. For instance, we keep track of missing objects and DOM elements and create them on demand. To verify the efficacy of our techniques, we apply J-FORCE to detect Exploit Kit (EK) attacks and malicious Chrome extensions. We observe that J-FORCE is more effective than existing tools.

Keywords
JavaScript; Security; Malware; Evasion

1. INTRODUCTION
   Web-based applications powered by JavaScript are becoming more widespread, interactive and powerful. At the same time, they are attractive targets of various attacks. Unfortunately, detecting and analyzing malicious web apps against diverse combinations of exploits and evasive techniques is complicated and challenging. Although various detection schemes have been proposed [14, 27, 13], they still suffer from sophisticated attacks such as cloaking attacks [21, 35, 22].
   Both static and dynamic approaches have been applied to detect JavaScript malware. Static analysis (e.g., [9, 8]) considers multiple execution paths and usually achieves better code coverage. However, JavaScript is highly dynamic. Static approaches may be imprecise, or may fail altogether, because of over-approximation and obfuscation. This is a critical limitation since obfuscation is the most common practice for hiding real intentions, whether for protection or for malicious reasons. By contrast, dynamic analysis techniques (e.g., [16, 32]) execute the program and thus can reveal concrete behaviors even in an obfuscated program. However, a downside is that they can only cover one concrete execution path in one run and may be unable to hit the spot that conceals malicious behaviors.
   To address these limitations, symbolic and concolic execution based techniques [32, 31, 33] have also been proposed to analyze JavaScript programs. While they can generate program inputs and drive the execution along various feasible paths, due to the limitations of the constraint solvers, overcoming state explosion and handling complex JavaScript operations (e.g., dynamic type conversions, arithmetic/string operations) are still open problems, especially for non-trivial programs built atop various frameworks and other obfuscated programs.
   In this paper, we propose J-FORCE, a crash-free¹ JavaScript forced execution engine. J-FORCE combines the advantages of static and dynamic approaches: similar to dynamic analysis, J-FORCE executes the program so that obfuscation is no longer an obstacle. To increase the coverage, J-FORCE forces the execution to go along different paths. In particular, J-FORCE records the outcomes of branch predicates, mutates them, and explores unvisited paths via multiple executions. This iterative path exploration process continues until all possible paths are explored. Hence, J-FORCE can expose not only malicious code that can only be triggered by conditions that are hard to meet, but also code blocks that are dynamically created and injected. Additionally, J-FORCE uncovers paths hidden in event and exception handlers. J-FORCE can detect evasive attacks triggered by non-deterministic events.
   We evaluate J-FORCE on 50 real-world exploits in popular EKs [1, 2] and over 12,000 Chrome extensions. J-FORCE successfully exposed the hidden code of 41 exploits and found that more than 300 Chrome extensions inject advertisements. We also run J-FORCE on 100 JavaScript samples and measure its code coverage capacity. The results show that J-FORCE can cover 95% of the code with 2-8x overhead, which is significantly more effective than a popular concolic execution technique (68% coverage, 10-10,000x overhead).
   In summary, this paper makes the following contributions.
   • We propose J-FORCE, a JavaScript forced execution engine that explores all possible paths to expose hidden malware behaviors. J-FORCE records and switches branch outcomes to explore new paths. J-FORCE unveils function parameter values to detect malicious intentions and DOM injection attacks.
   • We address several technical challenges to avoid crashes during the continuous path explorations. For instance, J-FORCE keeps track of missing objects/DOM nodes and creates them on demand. J-FORCE can tolerate critical exceptions and handle infinite loops/recursions.
   • We validate the efficacy of J-FORCE through an extensive set of experiments on real-world exploits and web browser extensions. J-FORCE successfully disclosed the hidden code in 41 exploits and detected more than 300 ad-injecting extensions. Also, we show that J-FORCE can achieve 95% code coverage and is 2-8x faster than the state-of-the-art on 100 JavaScript samples.
   Our work focuses on understanding malicious code that is present on the client, so server-side cloaking or evasion is out-of-scope.

¹ In our paper, crash-free is about avoiding or handling JavaScript exceptions.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017, April 3–7, 2017, Perth, Australia.
ACM 978-1-4503-4913-0/17/04.
http://dx.doi.org/10.1145/3038912.3052674

Figure 1: Stealthy Exploit Kit Attack. The figure shows the following chain of snippets.

   http://bbb.com/shop2.html
      <html>
      <script>
       …
      </script>
      …
      </html>

   http://ppp.org/abc.js (obfuscated)
      function FC3d(DzV, lm8H2) {
      …
      for(HPFY=0;DVz.length>HPFY;HPFY+=8)
      ... d5+=String.fromCharCode(...)...return
      unescape(d5);}
      ...
      lTZI04 = FC3d(VkpZF,MG6V);eval(lTZI04);

   Timer handler (produced by eval)
      EDXGD= function() {
        …
        elem.appendChild(script);
      }
      setTimeout(EDXGD, 10);

   http://ggg.net/opq.js (obfuscated)
      k=document[‘createElement’](‘script’)
      …
      k[‘text’]=S5SSQ(“AWFRMWtbFnshSQGIESFJaRB94ZxUBXVMbUeEVXXnddR9QGmpXbR9aa....”);
      ...
      d.appendChild(k);

   Exploit / Payload
      ieTrue = navigator.userAgent.toLowerCase()
      browser = /msie[\/s]d+/i.test(ieTrue)
      …
      if(browser) {
         ...
         e.insertBefore(a,b);
      }

Figure 2: Overview of J-FORCE. Components shown: J-Force Driver (Exec #1, Exec #2, …), Object Management, Exception Management, DOM Management.

    1.   <script>
    2.       if (...) {
    3.           btn = document.createElement("button");
    4.           btn.id = "mybutton";
    5.           btn.innerHTML = "Remove";
    6.       } else {
    7.           btn = document.createElement("button");
    8.           btn.id = "mybutton";
    9.           btn.innerHTML = "Skip";
    10.      }
    11.      document.body.appendChild(btn);
    12.  </script>
    13.  ...
    14.  <script>
    15.      x = document.getElementById("mybutton");
    16.      if (...) {...}
    17.      if (...) {...}
    18.  </script>

Figure 3: Example for per-block path exploration.

2. MOTIVATION
   Recently, Exploit Kits (EKs) have been favored by cybercriminals to perform web-based attacks. In the last year alone, more than 14 attacks were reported to CVE². Since EKs are specially designed to exploit known browser related defects, such attacks are highly effective: once a vulnerable client reaches the actual EK landing page, the EK will silently download and install malware.
Therefore, as a defense, it is critical to identify suspicious EK delivery in the first place. Among various delivery vectors, malvertising [10, 37] is one of the most dangerous and successful delivery approaches. In this section, we show a real-world EK delivery equipped with layered obfuscation and cloaking techniques to demonstrate our approach.
   Fig. 1 presents a carefully designed multi-layer EK attack chain featuring collaborative cloaking techniques such as code obfuscation, dynamically created scripts and evasive paths: (1) The first obfuscated JS (JavaScript) snippet (http://ppp.org/abc.js) is delivered to a legitimate website via malvertising. (2) When it is evaluated during page loading, it creates a piece of dynamic code from strings using eval. (3) The function EDXGD in the resulting snippet injects code for the next step. Interestingly, EDXGD is injected as an event handler and can only be invoked when the timeout event is fired. Once evaluated, the second obfuscated snippet (http://ggg.net/opq.js) will be injected into the DOM tree and executed. (4) As a result, another dynamic script is created and injected (d.appendChild(k)). (5) The injected code uses a cloaking method to hide the malicious payload: it first checks if the client browser can be the target (navigator.userAgent and msie). The hidden code is executed only if the check result (browser) is true.
   Existing Approaches. As two pieces of JavaScript (abc.js and opq.js) in the chain are obfuscated, static analysis based detection mechanisms [14, 9, 28, 11] may have difficulties in understanding the real semantics and thus are ineffective in handling such cases. Discovering the execution path that can reveal the final exploit payload using dynamic approaches is also difficult. Particularly, it requires invocations of event handlers and proper environment settings (e.g., IE browser), which are conditions not easily met in general. Symbolic and concolic execution techniques [32, 31, 33] can be used to explore multiple feasible paths. However, it is challenging for such techniques to scale to complicated and large real-world JavaScript programs due to the limitations imposed by the underlying constraint solvers.
   Unfortunately, as shown in Table 1, existing JavaScript malware detection tools are not effective at detecting such malware in a scalable way. In particular, while Rozzle [22] performs path explorations on JavaScript programs to reveal evasive malicious behaviors, it cannot disclose code in event handlers as its analysis scope is limited to functions that are explicitly invoked.
   J-FORCE Overview. J-FORCE employs a forced execution technique by switching branch outcomes and invoking event handlers. As shown in Fig. 2, J-FORCE explores feasible paths and reveals all the instructions irrespective of branch conditions in multiple concrete executions. Also, event and exception handlers are forcibly invoked without emulating the events. By doing so, J-FORCE is able to reach and expose malicious logic that can only be triggered by a particular combination of events and inputs. Moreover, J-FORCE is a dynamic analysis. Hence, it can handle obfuscations and disclose concrete function parameter values, which could further reveal malware behaviors (e.g., identifying eval content).

3. DESIGN OF J-FORCE
   In this section, we present the details of J-FORCE. We first discuss the J-FORCE execution model. Then we describe how J-FORCE explores multiple execution paths.

3.1 J-Force Execution Model
   The execution model of J-FORCE is designed based on the default page rendering model.

² CVE-2015-3090, CVE-2015-3105, CVE-2015-5122, CVE-2015-1671, CVE-2015-5119, CVE-2015-5560, CVE-2015-7645, CVE-2015-8651, CVE-2015-8446, CVE-2016-1019, CVE-2016-1001, CVE-2016-0189, CVE-2016-0034, CVE-2016-4117

Table 1: The comparison of the approaches for JavaScript malware detection.

Name           Category                   Obfuscation  Path Exploration  State Explosion  Events   Exceptions  Target Scope
                                          Resilient    Support           Free             Covered  Covered
WebEval [18]   Static & Dynamic Analysis  ✓            ✗                 ✓                ✗        ✗           Chrome Extension
Expector [37]  Dynamic Analysis           ✓            ✗                 ✓                ✓        ✗           Chrome Extension
Hulk [20]      Static & Dynamic Analysis  ✓            ✗                 ✓                ✓        ✗           Chrome Extension
Revolver [21]  Static & Dynamic Analysis  ✓            ✗                 ✓                ✗        ✗           Generic
JSAND [13]     Dynamic Analysis           ✓            ✗                 ✓                ✗        ✗           Generic
Nozzle [27]    Dynamic Analysis           ✓            ✗                 ✓                ✗        ✗           Generic
Zozzle [14]    Static Analysis            ✗            ✗                 N/A              ✗        ✗           Generic
Rozzle [22]    Dynamic (Symbolic Value)   ✓            ✓                 ✗                ✗        ✗           Generic
J-FORCE        Forced Execution           ✓            ✓                 ✓                ✓        ✓           Generic

    function __necdel()
    {
        var script = document.createElement("script");
        //...
        script.src = "http://xxx.xxxxxxx.net/";
        var protocol = ("https:" == document.location.protocol ? "https://" : "http://");

        var head = document.getElementsByTagName("head")[0];
        if ((protocol === "http://") && head)
            head.appendChild(script);
    }
    window.addEventListener("mouseover", __necdel, false);

Figure 4: Code injection upon “mouseover” event.

3.1.1 Per-block Exploration
   The default page rendering order drives the execution of J-FORCE. Once a <script> block is evaluated, J-FORCE starts exploring all other possible paths within the block. In particular, when J-FORCE reaches the exit of the block, it goes back and explores another unvisited path. Consider the example in Fig. 3. J-FORCE explores the two paths in lines 1-12 before exploring the paths in the next <script> block in lines 14-18.
   An alternative is to consider all code blocks as one giant block and explore paths in the “merged” block. However, it can hardly scale because the total number of paths to be explored is the product of the path numbers in every individual block, whereas in the per-block strategy it is the sum of the number of paths in every block.
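
The sum-versus-product argument can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper names `perBlockPaths` and `mergedPaths` are hypothetical, and the path counts are assumed inputs (2 paths for the first block of Fig. 3, and 4 for the second block if its two `if` statements are treated as independent and choice points are ignored).

```javascript
// Illustrative arithmetic only; these helpers are not part of J-Force.
// Each array entry is the number of paths inside one code block.

// Per-block strategy: blocks are explored one after another, so costs add up.
function perBlockPaths(pathCounts) {
  return pathCounts.reduce((total, n) => total + n, 0);
}

// "Merged" strategy: every path of one block combines with every path
// of every other block, so costs multiply.
function mergedPaths(pathCounts) {
  return pathCounts.reduce((total, n) => total * n, 1);
}

// Fig. 3, assuming 2 paths in block 1 and 2 * 2 = 4 paths in block 2.
const fig3 = [2, 4];
console.log(perBlockPaths(fig3), mergedPaths(fig3));

// The gap widens quickly: ten blocks with 4 paths each.
const tenBlocks = Array(10).fill(4);
console.log(perBlockPaths(tenBlocks), mergedPaths(tenBlocks));
```

For the two blocks of Fig. 3 the difference is small (6 versus 8), but for ten blocks of four paths each it is 40 versus 4^10 = 1,048,576, which is why the merged alternative does not scale.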
   Please note that an external JS script is essentially a single code block and hence can be explored in a similar way.

3.1.2 Handling Inter-Block Dependencies
   One challenge brought by the per-block design is how to consider the dependences across code blocks. For example, in Fig. 3, the same button is set with different texts (Remove and Skip) along different paths in lines 2-11. Without storing states along different execution paths, our analysis may miss critical states that may lead to malicious behavior. For instance, if we explore the path 7-9 after 2-5, “Remove” will be overwritten by “Skip” and become invisible to subsequent blocks.
   While exploring paths globally is the ideal solution, it is unscalable and impractical. Instead, we develop the following technique based on the observation that most inter-block dependences are caused by DOM objects. Since it is valid to have multiple elements with the same name or id on the DOM tree, J-FORCE allows any DOM injections along any paths. Also, J-FORCE intercepts relevant DOM APIs (e.g., getElementById) and injects choice points, which are conceptually equivalent to switch-case statements. So, each execution returns a DOM element (with the same id or name) until all such elements are explored. For example, in Fig. 3, both buttons will be appended to the DOM tree. J-FORCE further inserts a choice point at line 15. As a result, a total of 8 paths are explored in the second block, where 4 correspond to the “Remove” button and the remaining 4 to the “Skip” button.
   In theory, dependencies caused by global variables can be handled in the same way. However, it is very expensive to do so for all global variables. Given that our focus is stealthy behaviors that are usually based on string operations, we selectively support global string variables. Furthermore, J-FORCE also overwrites container interfaces (e.g., hashmap) to support inserting multiple strings with the same key to a global container. String attributes of DOM objects are handled similarly, where choice points are injected to access the different versions.

3.1.3 Handling Event Handlers
   Some event handlers, such as onload, are automatically executed when the corresponding DOM objects are loaded or created. The exploration is driven by the rendering procedure. However, another set of handlers can only be triggered by user and timer events. In our experience, JS malware extensively leverages the event handling mechanism to lay out the attack agenda. Fig. 4 shows a simplified step in the malware delivery chain. __necdel() is registered as an event handler of the mouseover event. The script for the next step will not be injected unless the event is triggered. Indeed, we observed that many malicious payloads only get triggered by a series of carefully organized user or timer events to escape detection by honey-client systems or other automatic detection tools. Therefore, exploring event handlers is critical.
   J-FORCE remembers functions registered as event handlers and forces them to be executed. In particular, after the exploration of the current code block, handlers that are registered during exploration are executed, without requiring the triggering events. The individual handlers are considered as code blocks that are explored separately. To the best of our knowledge, most existing honey-client systems and JS symbolic execution engines (e.g., [31]) do not emulate events. Hence, they cannot reveal sophisticated handler-related behaviors.

3.1.4 Handling Asynchronous Execution
   Currently, J-FORCE does not focus on exposing race conditions caused by asynchrony [29, 38]. In fact, most JS races are transient [24]. In our experience, we have not observed any real-world malicious attacks leveraging race conditions, due to their non-deterministic and unreliable nature.
   J-FORCE respects the browser’s decision on which block runs first. Note that JavaScript execution is single threaded and the execution of a code block cannot be interrupted. J-FORCE only steps in when a block is being evaluated for the purpose of per-block code exploration.

3.1.5 Handling Dynamic Code Evaluation
   JavaScript is highly dynamic. Malicious JS snippets can be dynamically created from strings. For example, a common practice is to create a <script> element, specify its source and attach it to the DOM tree. eval() is another way to run dynamic code.
   J-FORCE admits all code injections found along different paths during the path exploration. Consequently, they will be explored like other code on the DOM tree. Some code snippets may be added to DOM elements that have already been rendered and explored by J-FORCE. For such cases, J-FORCE restarts the rendering procedure but only explores the uncovered injected snippets.
   For code dynamically evaluated by functions like eval, J-FORCE explores the code snippet concealed in the function parameter, as a part of the parent code block exploration. Note that J-FORCE provides versioning support for strings so that different but concrete parameter values produced by previous logic will be explored.

3.2 Path Exploration

    Line #   Code                                    Defines
    1.       obj = new XMLHttpRequest();    // D1    D1
    2.       //...                                   D1
    3.       if (cond)                               D1
    4.         obj = null; // D2                     D2
    5.       if (obj == null)                        D1 | D2
    6.         return;                               D1 | D2
    7.       obj.send();                             D1 | D2

    Line #   Execution #1             Execution #2             Value (obj)
    1.       obj := XMLHttpRequest    obj := XMLHttpRequest    XMLHttpRequest
    2.       ----                     ----                     XMLHttpRequest
    3.       (taken)                  (taken)                  XMLHttpRequest
    4.       obj := null              obj := null              null
    5.       (taken)                  (untaken)                null
    6.       return                   ----                     null
    7.       ----                     obj.send (crash!)        null
Figure 5: Handling crashes caused by missing objects.

   J-FORCE explores different paths in multiple runs. In each run, it looks for opportunities where mutating a predicate leads to unexplored instructions. Once found, it forces the execution to cover them in future iterations. It repeats this procedure until all instructions are covered. We designed two exploration strategies depending on the needs.
     • L-path executes each instruction at least once with linear time complexity. Exploring all distinct paths is not its priority. For JS malware analysis, this strategy is sufficient in most cases as malicious behaviors are usually hidden in blocks.
     • E-path aims at exploring all possible execution paths with exponential time complexity. We observed that only a few advanced malware examples require the E-path strategy.

plored instructions. At line 16, J-FORCE starts the execution with no forced execution scheme and just runs the whole program normally. The purpose of this step is to obtain a list of predicates on one path. Then, J-FORCE can develop a new scheme by mutating a predicate at line 22 to execute uncovered instructions (line 21). The driver repeats this until the worklist is empty, meaning that no further opportunities can be discovered. Although the exploration algorithm stems from the L-path strategy, E-path follows the same procedure except at line 21. Particularly, at the given branch, instead of checking if its feasible targets are disclosed, E-path makes sure the branch is followed along both of its targets.

Algorithm 1 Path Exploration.
Input: I: JavaScript instructions in a program
                                                                                         4.      CRASH-FREE FORCED EXECUTION
      // σ is a list of forced predicates. A predicate p is represented as a tuple          As J-F ORCE ignores path conditions, a program may execute
      // (psrc , pdst ) that specifies the source src and forced target dst              along an infeasible path and crash. In this section, we describe the
 1:   function F ORCED E XEC(σ)                                                          challenges and our solutions to avoid crashing.
 2:       σe ← [ ]         // σe is a list of executed predicates
 3:       p ← P OP _ FRONT( σ )
 4:       for each i in I do                                                             4.1       Missing Object
 5:            if i is a condition branch instruction then                                  Fig. 5 shows a typical example of the crashes caused by missing
 6:                 if isrc ≡ psrc then           // isrc : source address of i
 7:                     idst ← pdst          // specify the instruction to be executed   objects. At line 1, variable obj is initialized to an Ajax object.
 8:                     p ← P OP _ FRONT( σ )                                            Suppose the true branches of the two predicates (line 3 and 5) are
 9:                 else                                                                 taken in the first run. Since line 7 is not explored, in the second run,
10:                     E ← E ∪ {idst }
11:                 σe ← σe · (isrc , idst )
                                                                                         the predicate at line 5 is mutated. However, as obj has been set to
12:            Execute the instruction i                                                 null at line 4, the program will crash at line 7.
13:       return σe                                                                         To handle this, when resolving an object accessed, J-F ORCE first
14: function PATH E XPLORATION( )                                                        identifies a set of candidates, which can be collected using an ex-
15:    E ← {} // explored instructions                                                   isting data flow analysis. In addition, candidates without correct
16:    W ← {F ORCED E XEC (nil)} // initial execution. W : worklist                      properties and types are filtered out. As shown in the defines table
17:    while W 6= ∅ do
18:        σ 0 ← P OP(W )                                                                in Fig. 5, at line 7, D1 and D2 are possible objects to be accessed.
19:        σt ← nil                                                                      However, only D1 has the correct field send. Therefore, J-F ORCE
20:        for each p in σ 0 do                                                          selects D1 and continues the forced execution.
21:            if H AS A NY U NEXPLORED TARGET (E, p) then
22:                σt0 ← σt · S WITCHING TARGET(p)
23:                W ← W ∪ {F ORCED E XEC(σt0 )}                                         4.2       Handling Missing DOM Elements
24:            else                                                                         Another common kind of crashes in forced execution is caused
25:                σt ← σt · p
                                                                                         by missing DOM elements. Our strategy is to create and insert
                                                                                         the missing ones to the DOM tree on demand. Note that simply
   Algorithm 1 shows the details of the path exploration approach.                       creating a new DOM element on each access without appending it
Function F ORCED E XEC explains how to drive the execution to a                          to the right place will not work in practice. If multiple accesses to
desired branch. In particular, it takes a forced execution schema                        a same element yield different newly created objects, the program
σ as the input. σ is a list of tuple (psrc , pdst ), where psrc is the                   semantics will be violated. However, as DOM elements can be
address of a predicate p and pdst is the forced target. Intuitively, it                  selected in various ways (e.g., by id, XPath, etc.), the challenge lies
specifies the next step (pdst ) when J-F ORCE sees p. The logic of                       in how to put the new elements in the right place.
forced execution is specified in the loop starting at line 4 interpreted                    If the selection is by element id, name, tag and class, the solution
by JS engine. If a rerouting schema is provided for the current                          is straightforward. Particularly, as shown in Algorithm 2, if the el-
branching instruction i (line 6), J-F ORCE forces the execution to                       ement returned by the original selector is invalid (line 4), J-F ORCE
take the branch specified in the scheme at line 7. Otherwise, the                        creates a new one and inserts it to the children list of the current
instruction will be executed normally.                                                   element (line 8-9).
   Function PATH E XPLORATION is the top-level driver. It main-                             Handling XPath selectors is more challenging. An XPath may be
tains a worklist W, which is a set of forced execution schemes. E                        fully specified (e.g., “/A/B/C” means C is an immediate child of
is a set of covered instructions. J-F ORCE uses it to discover unex-                     B and B is a child of A) or partially specified (e.g., “/A//C” means
1. if (window.attachEvent) {
2.   window.attachEvent("onload", window["load" + initialize]); // ...
3. } else {
4.   window.addEventListener("load", initialize, false); // ...
5. }

Figure 6: Browser-compatibility exception in forced execution.

1. if (...) {
2.   var script = document.createElement("script");
3.   script.src = "http://.../a.js";
4.   document.body.appendChild(script);
5. } else {
6.   window.location = "http://.../b.html"; /* page redirection */
7. }

Figure 7: An example of page redirections.

all C objects with an ancestor A). An XPath may also contain wildcards to select all elements
satisfying the filtering conditions (e.g., “/A[@exchange]” selects A with attribute exchange).
In a forced run, an XPath selector may be partially broken due to missing elements. Consider
selector “p · s”. The prefix p correctly locates a DOM element. However, the suffix s fails
because there is no such element. To handle this issue, J-Force identifies the longest p that
can locate a valid element o, creates element(s) corresponding to s and makes them a subtree
of o.
   Function PathRecognizer() in Algorithm 2 describes the procedure. Particularly, at line 13,
an XPath p is split by delimiters (i.e., ‘/’ and ‘//’). Each delimited segment τ contains three
parts: (1) the delimiter τp (“”, “/” or “//”); (2) the id τe (e.g., A), and (3) the filter τa
(e.g., [@exchange]).
   If τp is “//”, GetOffsprings is invoked to identify the offspring of the current object θ
that match τe and τa (line 16). Otherwise, GetChildren is called to get the direct children of
the current object that match τe and τa (line 18). If no element is found (line 19), a new
element corresponding to τe and τa is created as a child of θ (line 20). The above procedure
continues until the original selector becomes valid.
   An important design choice is that the elements created during one (forced) run are retained
for later executions. This avoids creating duplicated elements in multiple executions, and the
DOM tree grows monotonically. In practice, we found the size of a DOM tree usually increases
slowly and gradually becomes stable.

Algorithm 2 Handle missing DOM elements.
Input: σ ∈ {id, name, name_tag, name_class, XPath}
  1: function CheckAndInsertion(σ)
  2: E ← GetElements(σ)
  3: τ ← GetCurrentObject()
  4: if ¬IsValid(E) then
  5:    if σ ∈ XPath then
  6:      return PathRecognizer(σ)
  7:    else
  8:      τ.Insert(CreateElement(σ))
  9:      E ← GetElements(σ)
 10: return E
 11: function PathRecognizer(p)
 12: θ ← the current node
 13: p′ ← PartitionByDelimiter(p)
 14: for each segment (τp, τe, τa) in p′ do // τp: delimiter, τe: identifier, τa: filter
 15:    if τp ≡ “//” then
 16:      E ← θ.GetOffsprings(τe, τa)
 17:    else /* τp ≡ ‘/’ ∨ τp is empty */
 18:      E ← θ.GetChildren(τe, τa)
 19:    if ¬IsValid(E) then
 20:      θ.Insert(CreateElement(τe, τa))
 21:      E ← θ.GetChildren(τe, τa)
 22:    θ ← E
 23: return E

4.3      Handling Exception
   Being able to recover from crashes caused by exceptions is one of the most important features
of J-Force for robustness. As the program may be forced to run on an infeasible path, various
exceptions may occur. For example, Fig. 6 shows a common practice to make the program compatible
with different browsers. J-Force will execute line 2 without considering its predicate and thus
trigger an exception. Since the corresponding handler is absent, the forced execution will be
interrupted and terminated.
   To avoid terminations due to such exceptions, J-Force captures all unhandled exceptions using
a top-level exception handler in the global scope and resumes the interrupted execution from the
nearest legacy function by unwinding the stack. In addition, to preserve the semantics of the
exception triggering statement, J-Force includes a set of selective legacy APIs, which will be
invoked based on the context. For instance, in Fig. 6, the attachEvent is redirected to the
addEventListener so that the original program semantics are preserved. Algorithm 3 explains the
details:
   (a) Exceptions that can be handled by the original program: J-Force remembers the triggering
       location (line 3) and then explores the corresponding catch block. The code after the
       triggering point will be covered in a later iteration.
   (b) Uncaught exceptions due to missing handlers: They will be taken care of by the top-level
       handler inserted by J-Force (lines 6-12).
   (c) Exception handlers present but no exception was triggered in one run. In our experience,
       a catch block is a high-value target for exploration, as malware authors often place
       their malicious code here for cloaking [22, 21]. These handlers hence should be explored
       regardless of whether an exception occurs: J-Force employs the same strategy as for (a).
       J-Force remembers the block entry point and explores it later.

Algorithm 3 Exception Handling.
  1: function ExceptionOccurrence(σ)
  2:    if IsCaught(σ) then
  3:         SaveExceptionLoc(σ)
  4:         return // Allow to run catch block
  5:    else
  6:         return TopLevelHandler(σ)
  7: function TopLevelHandler(σ)
  8:    t ← FindLegacyFunc(σ)
  9:    if IsValid(t) then
 10:         return Call(t)
 11:    else
 12:         return and allow to run the following.

4.4      Page Redirection
   Page redirections are commonly used to send visitors to a new destination by setting the
location attribute of the window object in JavaScript. A page redirection cancels the current
page rendering procedure (including the JavaScript execution and resource downloading) and hence
interrupts J-Force’s code exploration strategy (J-Force explores paths in multiple runs).
   Fig. 7 shows an example. The true branch of the if statement injects a new <script> element
while the else branch redirects visitors to b.html. Consider the following forced execution. In
the 1st run, the true branch is covered and a new piece of JavaScript in a.js will be downloaded
and executed (lines 2-4). As explained in the forced execution model, J-Force explores the
current code block before processing the next block. Hence, in the next iteration, it explores
the else branch before executing a.js. However, since the page redirection happens at line 6,
the forced execution will be interrupted so that a.js will not be explored. In fact, if there
are other uncovered paths/blocks in the same page, they will not be explored due to the page
redirection.
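
The recovery scheme of Sec. 4.3 (Algorithm 3) can be illustrated with a small sketch. It forces the attachEvent call of Fig. 6 on a window-like object that only offers addEventListener, catches the resulting exception, and redirects the call to a replacement API. The names legacyApis, forcedCall and the stand-in win object are hypothetical; the paper implements this inside the JavaScript engine and does not show its code, so this is only a source-level approximation.

```javascript
// Hypothetical table of legacy APIs and their replacements.
// Only the attachEvent -> addEventListener pair comes from the paper's example.
const legacyApis = {
  attachEvent(target, eventName, handler) {
    // Legacy event names carry an "on" prefix ("onload"); strip it.
    target.addEventListener(eventName.replace(/^on/, ""), handler, false);
  },
};

// Calls target[method](...args). If the call throws on a forced path,
// fall back to a legacy replacement (if one is known) instead of letting
// the exception terminate the run.
function forcedCall(target, method, args, log) {
  try {
    return target[method](...args);
  } catch (err) {
    log.push(`recovered: ${method}`);
    const replacement = legacyApis[method];
    if (replacement) {
      return replacement(target, ...args);
    }
    return undefined; // no replacement known: skip the statement and continue
  }
}

// Stand-in for a modern browser's window object (it has no attachEvent).
const registered = [];
const win = {
  addEventListener(name, handler) {
    registered.push(name);
  },
};

const log = [];
// Forced execution of line 2 in Fig. 6, ignoring the predicate at line 1.
forcedCall(win, "attachEvent", ["onload", () => {}], log);
console.log(registered, log);
```

The sketch covers only case (b) of the list above (an uncaught exception with a known replacement); remembering catch-block entry points for later exploration would need the worklist of Algorithm 1.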
                            # of samples whose obfuscations / evasions can be handled
 Exploit Kits   # of samples   Native run   Rozzle [22]   WebEval [18]   J-Force
 Angler              10           2/1           7/6            3/3         10/10
 RIG                 10           5/0           7/2            5/0         10/10
 Nuclear             10           3/0           6/2            3/1         10/7
 Magnitude           10           6/2          10/6            6/4         10/10
 SweetOrange         10           2/0           8/4            4/4         10/6

               Table 2: Comparing detection techniques on EKs.

   Our solution is to load the target page in a separate frame so that J-Force can continue
exploring the current page. Since frames are isolated from each other, the effect of loading the
destination page in a frame is functionally equivalent to a page redirection. In this particular
example, J-Force loads b.html in an iframe and thus is able to explore the behaviors in a.js.
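
The frame-based handling of redirections in Sec. 4.4 can be sketched as follows. The sketch replaces the effect of assigning to location with the creation of an iframe, so the page under analysis keeps running. Everything here is an assumption made for illustration: makeFakeDocument and makeForcedWindow are hypothetical helpers, the example.test URLs are placeholders for the elided URLs of Fig. 7, and a real browser does not allow redefining window.location from script, which is why J-Force has to do this inside the engine.

```javascript
// Minimal stand-ins for the DOM so the sketch runs outside a browser.
function makeFakeDocument() {
  const body = {
    children: [],
    appendChild(el) {
      this.children.push(el);
      return el;
    },
  };
  return {
    body,
    createElement(tag) {
      return { tagName: tag.toUpperCase(), src: "" };
    },
  };
}

// Returns a window-like object whose `location` setter loads the target
// in a new iframe instead of navigating away from the current page.
function makeForcedWindow(doc) {
  const win = { document: doc };
  const current = "http://example.test/index.html"; // hypothetical start URL
  Object.defineProperty(win, "location", {
    get() {
      return current;
    },
    set(url) {
      const frame = doc.createElement("iframe");
      frame.src = url; // the destination page is loaded in isolation
      doc.body.appendChild(frame);
      // `current` stays unchanged: exploration of this page can continue.
    },
  });
  return win;
}

const doc = makeFakeDocument();
const win = makeForcedWindow(doc);
win.location = "http://example.test/b.html"; // forced else branch of Fig. 7
console.log(win.location, doc.body.children.length);
```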
                       # of Ad-injecting                    # of Info. leakage
                  Total   Ajax   Script Injection      Total   Ajax   Script Injection
 Hulk [20]         195     29          166               14      9           5
 Expector [37]     187     28          159                9      6           3
 WebEval [18]      158     15          143                8      5           3
 J-Force           322     45          277               30     21           9

          Table 3: The analysis result of 12,132 Chrome extensions.

4.5    Infinite Loop and Recursion
   J-Force may suffer from infinite loops or endless recursions because it ignores the loop and
recursion conditions. To handle this issue, we set an upper bound on the number of times a loop
or a recursive function can be invoked. For loops, J-Force monitors the loop executions and
makes sure that they do not go beyond the threshold. Otherwise, J-Force forces the execution to
skip the loop. Similarly, for recursions, we use a threshold to limit recursion depth. We make
sure that whenever a new stack frame is created, the stack depth is smaller than the threshold.
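
A minimal source-level sketch of the two bounds described in Sec. 4.5 is given below. J-Force enforces them inside the engine; here they are approximated with wrapper functions. The helper names boundedLoop and boundRecursion and the threshold values are hypothetical, since the paper does not state the thresholds it uses.

```javascript
const LOOP_LIMIT = 1000; // hypothetical threshold; the paper gives no value
const DEPTH_LIMIT = 100; // hypothetical threshold; the paper gives no value

// Runs `body` while `cond()` holds, but leaves the loop once the bound
// on the number of iterations is reached.
function boundedLoop(cond, body, limit = LOOP_LIMIT) {
  let iterations = 0;
  while (cond()) {
    if (iterations >= limit) break; // force the execution to skip the loop
    body();
    iterations += 1;
  }
  return iterations;
}

// Wraps a recursive function so that a call beyond the depth bound returns
// `fallback` instead of creating another stack frame. `fn` receives the
// wrapped function as its first argument and must recurse through it.
function boundRecursion(fn, limit = DEPTH_LIMIT, fallback = undefined) {
  let depth = 0;
  const wrapped = (...args) => {
    if (depth >= limit) return fallback;
    depth += 1;
    try {
      return fn(wrapped, ...args);
    } finally {
      depth -= 1;
    }
  };
  return wrapped;
}

// A loop whose exit condition is effectively ignored still terminates.
const ran = boundedLoop(() => true, () => {}, 50);

// A recursion without a base case still terminates.
let calls = 0;
const endless = boundRecursion((self, n) => {
  calls += 1;
  return self(n + 1);
}, 20);
endless(0);
console.log(ran, calls);
```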

5.    EVALUATION
   J-Force is implemented atop WebKit-r171233 with GTK+ port. Our evaluation consists of two
experiments. The first one is a systematic study on 50 EK samples and 12,132 Chrome extensions
to see if J-Force is able to detect (malicious) behaviors covered by sophisticated cloaking and
obfuscation techniques. Also, since being able to explore more code is important, in the second
experiment, we further quantify J-Force’s performance by measuring the coverage and the overhead
on 100 real-world JavaScript programs. All experiments are performed on a machine with an Intel
Core i7 3.40 GHz CPU and 12 GB RAM running Ubuntu 14.04 LTS.

5.1    Detecting Suspicious Hidden Behaviors

5.1.1 Detecting Obfuscations and Evasions in EKs
   We have collected 50 EK samples from various sources [1, 2], and classified them based on the
underlying EKs, namely Angler, RIG, Nuclear, Magnitude and SweetOrange. Although they differ, we
observed that they all share similar mechanisms, listed as follows:
    • Obfuscation. Obfuscation conceals program functionalities using string operations to make
       detecting malware challenging. In EKs, obfuscation is used more than once throughout
       multiple layers of code injection.
    • Evasion. To minimize the possibility of being caught (e.g., by honey-pot based
       approaches), an EK only invokes the malicious logic when certain conditions are
       satisfied. Specifically, an EK usually scans the visitor’s system (e.g. the signatures of
       browsers, extensions, etc.) before moving on to the next stage. An example is shown in
       Fig. 1 in Sec. 2.
    • Exploiting Vulnerabilities. An EK is designed to exploit particular vulnerabilities in
       browsers or add-ons by hijacking the control flow and elevating permissions. The typical
       targets of such exploitation are Adobe Flash, MS Silverlight and Java runtime as well as
       browsers themselves.
    • Payload Delivery. As the last step, a malicious binary is downloaded and executed without
       the user’s consent. Ransomware [7] and click fraud [6] are two common examples.
   As J-Force focuses on detecting malicious JavaScript behaviors, only the JavaScript parts
(obfuscation and evasion) are included for evaluation. Analyzing non-JavaScript code, such as
exploiting vulnerabilities in the web browser or plug-ins, is beyond the scope of this paper.
The results of experiments on 50 EK samples (10 for each EK type) are presented in Table 2. It
shows the number of samples that can be handled by each tool, in terms of obfuscation handled
and evasion passed. Since we know the ground truth about deobfuscation, counting successful
de-obfuscations is straightforward. For evasions, if the exploitation entry point (e.g.
<object>) is reached, we say the evasion is detected.
   The results show that J-Force is able to handle more obfuscations and evasions than others,
and hence can expose more hidden malicious behaviors in EK attacks. In particular, J-Force is
significantly effective in detecting evasions. While J-Force outperforms other techniques, it
misses a few evasions in Nuclear and SweetOrange. We manually inspected these cases and found
that they use Visual Basic (VB) scripts which are not currently supported by J-Force. However,
our design is general and can be implemented on VB scripts too.

5.1.2      Detecting Ads Injections in Chrome Extensions
   Browser extensions are commonly used nowadays to enhance user experience and are thus
becoming a target of adversaries. Several recent works [20, 18, 37] have been proposed to
analyze extensions. In this section, we show how J-Force can effectively disclose suspicious
behaviors in Chrome extensions.
   We crawled and obtained 12,132 extensions from Chrome Web Store [5] in July 2016. The
analysis is done offline. As the JavaScript APIs used in extensions are slightly different from
those in web applications, we enhance J-Force to support such Chrome APIs (e.g.,
chrome.browserAction.onClicked). In this experiment, we are particularly interested in detecting
ad-injections and information leaks. We also compare with recent work on Chrome extension
analysis [20, 18, 37].
   Table 3 summarizes the experiment results. J-Force detected 322 extensions that inject
advertisement, where 277 deliver ad contents using script injections and the remaining ones
bring in ads via Ajax. Compared to other techniques, J-Force is able to find 195 more
ad-injecting extensions, which confirms its effectiveness in handling cloaking and
fingerprinting techniques. In addition, J-Force detected 30 extensions that send out sensitive
information such as passwords and cookies via Ajax, while other techniques can detect at most 14
of them.
   Table 4 presents the statistics of the Chrome extension execution analysis. We report the
minimum, average and maximum number of JavaScript IR instructions, script injections, Ajax
requests, eval function invocations, event handlers and page redirections observed in exploring
one extension. The results show that J-Force can exercise more instructions and discover more
behaviors than the native run. We also report the number of runs required by J-Force to cover
all instructions (using the L-path search strategy explained
                 JavaScript IR        Script Injections        Ajax              Eval          Event Handlers     Redirections     Handled Crashes       # of Runs
               avg    min    max      avg   min   max     avg   min   max   avg   min   max    avg   min   max    avg   min   max    avg   min   max    avg    min   max
 J-Force      1,478    10   31,248    0.71    0    28     0.21    0     5   0.27    0    10    1.57    0    19    0.15    0     5    2.74    0   117    11.32    1   609
 Native run     406    10   14,151    0.46    0    13     0.03    0     2   0.15    0     8    0.85    0    12    0.02    0     2         N/A                  N/A

                                                     Table 4: The statistics of Chrome extensions analysis


in Sec. 3.2). We show the number of potential crashes caused by                               ternal script included at line 1 must be blocked by an adblocker,
the forced execution. We observed 2.74 crashes per extension on                               which is highly dependent on the execution environment. If the
average and they are mostly caused by missing objects and DOM                                 adblocker has not been configured correctly or the URL of the ex-
elements. All of them are handled correctly using the approach                                ternal resource is not on the blacklist anymore, dynamic analysis
discussed in Sec. 4.                                                                          cannot unveil the stealthy operations either.
                                                                                                 By contrast, J-F ORCE decouples the dependencies on the en-
5.1.3         Case Study - Anti-adblocker                                                     vironment and hence allows us to effectively and deterministically
    Unlike traditional programs, web applications have various ex-                            observe unusual behaviors. On the left hand side of Fig. 8, we com-
ternal dependences. For example, they can navigate the execution                              pare the control flow graphs that highlight the differences between
depending on browser environment settings. They can download                                  J-Force and dynamic analysis based approaches. J-Force is able
and load different external JavaScript on the fly from third parties                          to explore both paths while the dynamic analysis only covers one
during executions. Therefore, although mutating input values may                              path. As such, J-Force is able to discover the real ads contents by
change the execution paths, in general, it is highly                                          forced execution without requiring complicated system settings to
nontrivial or even infeasible for an automatic exploration tool to                            actually trigger the logic in traditional dynamic approaches.
satisfy the triggering conditions of the execution environment and                               More importantly, through J-F ORCE, we can uncover the actual
third party scripts. In this case study, we showcase a real-world                             values of function parameters (the right side of Fig. 8) and track
anti-adblocker [4] to demonstrate how J-F ORCE bypasses sophisti-                             the origin of suspicious values. With such capabilities (especially
cated predicates and thus can be helpful for understanding stealthy                           the hidden contents that can only be obtained dynamically), it is
program behaviors.                                                                            straightforward to conclude the ads are included in the image file.
    An ad-blocker (e.g., [3]) is a piece of software that allows
clients to roam the web without encountering any ads. In particular,                          5.2         Efficiency
it utilizes network control and in-page manipulation to help users                               As described in Sec. 3.2, J-Force can be configured to im-
block advertisements loaded from ad networks. As many content                                 prove coverage on instructions (the L-path strategy) or paths (the
publishers make their primary legitimate income from ads, there are                           E-path strategy). To measure its efficiency, we extracted 100 exam-
growing demands for delivering ads even when ad-blockers are running                          ples (from Alexa.com) and evaluated J-Force on these real-world
in client browsers. As a result, anti-adblockers have been developed                          JavaScript programs. We compare J-Force with Jalangi, a con-
and deployed by publishers on their websites. Anti-adblockers are                             colic JavaScript execution engine [32], which is one of the closest
usually scripts delivered by publishers to detect if adblockers are                           alternative approaches available at present.
enabled in the client browsers. Once one is found, the script either                             Fig. 9 presents the code coverage comparison results. The num-
hides the content or delivers the ads by circumventing the ad filters.                        ber of branches of the benchmarks varies from 109 to 1,200. In
    Fig. 8 presents a simplified version of a popular anti-adblocker                          Fig. 9, the JavaScript benchmarks on the X-axis are sorted by the
BlockAdblock [4], where the arrows denote important call edges. It                            branch count in ascending order. The result shows that, on aver-
first detects if an adblocker is enabled on the client-side and loads                         age, J-F ORCE is able to cover 95% of the code (the same result
the real ads contents that are delivered as an image. In particular,                          for both exploration strategies), which is significantly more than
line 1 includes an external script (“advertising.js”). If it can be                           Jalangi (less than 68%). We found that the main reason for the im-
successfully loaded, variable __haz will be set to false. If an                               provement is that the concolic execution based approach does not
adblocker presents, the script will not be blocked and the value of                           explore the code in event and timer handlers. In addition, Jalangi
__haz remains undefined. Therefore, BlockAdblock can tell if                                  often fails to handle complex arithmetic operations such as division
an adblocker is running by checking the value of __haz. At line 4,                            and modulo. By contrast, J-F ORCE does not suffer from such lim-
it invokes function __ac() and defines the function to be invoked                             itation and is able to expand its analysis scope to event and excep-
for the next step. Depending on the presence of an adblocker, it will                         tion handlers. Besides, J-F ORCE does not miss conditional blocks
invoke a function (defined in lines 13-23) or do nothing. In function                         as our exploration technique is designed to cover both branches by
__dec, it loads an image, where its URL is specified at line 3 and                            switching branch outcomes. We also manually inspect the scenar-
further transformed at line 4. Interestingly, instead of displaying the                       ios where J-F ORCE fails to cover all instructions. We found that
image, it uses this image as a circumvention of ad-blocking rules                             this is mainly due to coding errors in the sample JavaScript pro-
and loads the raw data of the images. At line 21, function __cb                               grams.
is invoked, which creates a div element and displays the HTML                                    Beside the coverage, we also measure the runtime performance
hidden in the image at line 27.                                                               of J-F ORCE. Fig. 10 summarizes the comparison result of the over-
    It is highly nontrivial for static analysis based approaches to pre-                      heads collected during the coverage test. For each approach, the
cisely analyze such complicated call relations, as it requires ad-                            overhead is normalized to the native run. The result shows that the
vanced alias and string analysis (e.g., the operations in line 4 and                          overhead of J-F ORCE is 2-8x (2-300x for E-path) whereas Jalangi
20). More importantly, as the ads contents are actually hidden in                             has much higher overhead 10-10, 000x. Observe that such a differ-
an image, they may not even be in the analysis scope. As a result, it                         ence is caused by the fact that concolic execution based approaches
is very unlikely that the static analysis can handle such cases. An-                          may not scale well with the number of branches, showing expo-
other option is to actually run the program. However, one important                           nentially increasing overhead. Particularly, generating and solving
triggering condition of the secret loading procedure is that the ex-                          path constraints is more expensive than mutating branch outcomes.
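Covering both sides of every conditional by switching branch outcomes can be illustrated with a small worklist explorer. This is only a sketch under stated assumptions and not the authors' implementation: it assumes each conditional in the program under test has been rewritten to call a hook `branch(id, cond)`, and the names `explore`, `branch`, `sample`, and `revealed` are hypothetical. It follows neither the L-path nor the E-path strategy specifically; it simply schedules one extra run for every branch outcome that has not been observed yet.

```javascript
// Hypothetical sketch: each conditional in the program under test is assumed
// to be rewritten as branch(id, cond) so the explorer can override its outcome.
function explore(program) {
  const covered = new Set();    // "id:true" / "id:false" outcomes actually executed
  const scheduled = new Set();  // outcomes already queued for a forced run
  const worklist = [new Map()]; // each entry maps branch id -> forced outcome
  let runs = 0;

  while (worklist.length > 0) {
    const forced = worklist.shift();
    const trace = []; // [id, outcome] pairs observed during this run

    const branch = (id, cond) => {
      // Use the forced outcome if one exists, otherwise the real predicate value.
      const outcome = forced.has(id) ? forced.get(id) : Boolean(cond);
      trace.push([id, outcome]);
      covered.add(id + ":" + outcome);
      return outcome;
    };

    try {
      program(branch);
    } catch (err) {
      // A forced path may throw; this sketch simply abandons the run and moves on.
    }
    runs += 1;

    // Queue one new run for every outcome that has not been seen or scheduled.
    for (const [id, outcome] of trace) {
      const key = id + ":" + !outcome;
      if (!covered.has(key) && !scheduled.has(key)) {
        scheduled.add(key);
        const next = new Map(forced);
        next.set(id, !outcome);
        worklist.push(next);
      }
    }
  }
  return { covered, runs };
}

// Toy program with a nested, environment-dependent guard (illustrative only).
const revealed = [];
function sample(branch) {
  const userAgent = "benign-crawler";
  if (branch("ua", userAgent.includes("MSIE"))) {
    if (branch("plugin", false)) {
      revealed.push("hidden payload");
    }
  }
}

const result = explore(sample);
console.log(result.runs, [...result.covered], revealed);
```

No constraint solving is involved: each additional run only replays the program with one more predicate overridden, which is the reason the text gives for the lower overhead compared with concolic execution.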
     1  <script src="http://.../advertising.js" ..></script> // "var __haz = false;"
     2  ...
     3  __durl = '//.../hallon-p12065a-:r:.gif';
     4  __ac(function(){ __dec(__durl.replace(":r:", __s(5, 12)), __cb);
     5  });
        …
     6  function __ac(f) {
     7    …
     8    if (typeof __haz === 'undefined')
     9        return f();
    10    ...
    11    return;
    12  }
    13  function __dec(src, callback) {
    14    i = new Image();
    15    i.onload = function() {
    16      …
    17      t.drawImage(i, 0, 0);
    18      b = __p24(t.getImageData(...).data);
    19      for (...)
    20        if (b[x]) s+= str.fromCharCode(b[x]);
    21      callback(s);
    22    }
    23    i.src = src;
    24  __cb = function (s) {
    25    …
    26    _new = d.createElement('div');
    27    _new.innerHTML = s.html;
    28    k.insertBefore(_new, k);
    29  …

    [Diagram annotations, left: two control-flow sketches of "if (typeof …)" and "return f();", one labeled J-Force and one labeled Native exec. Right, labeled J-Force: callback(s) with s: "..html: <div class=\fram\></div>\n<divclass=\k3rwpj9jwhynv\>\n<div class=\gbqfwapg\>\n<span class=\gbqfwaabemdey ….</div>"]

Figure 8: Analyzing Anti-Adblocker using J-Force.
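The two mechanisms in Fig. 8, the bait-variable check at line 8 and the recovery of markup from raw image data at lines 18-20, can be sketched without a browser. The sketch below rests on assumptions: the real `__p24` helper is not shown in the figure, so it is assumed here to drop the alpha channel of RGBA pixel data; `hideInPixels`, `revealFromPixels`, and `adblockerSuspected` are hypothetical names; and a plain `Uint8ClampedArray` stands in for the `data` array that a canvas `getImageData` call would return.

```javascript
// Bait check mirroring line 8 of Fig. 8: if the bait script was blocked,
// __haz was never assigned and is still undefined in the given scope.
function adblockerSuspected(scope) {
  return typeof scope.__haz === "undefined";
}

// Hypothetical encoder: store one ASCII character per R, G, or B channel and
// set alpha to 255, producing RGBA bytes in the layout used by ImageData.data.
function hideInPixels(text) {
  const pixelCount = Math.ceil(text.length / 3);
  const data = new Uint8ClampedArray(pixelCount * 4);
  for (let p = 0; p < pixelCount; p++) {
    for (let c = 0; c < 3; c++) {
      const idx = p * 3 + c;
      data[p * 4 + c] = idx < text.length ? text.charCodeAt(idx) : 0;
    }
    data[p * 4 + 3] = 255; // opaque alpha
  }
  return data;
}

// Decoder in the spirit of lines 18-20: skip alpha (the assumed role of
// __p24), ignore zero padding, and rebuild the string byte by byte.
function revealFromPixels(data) {
  let s = "";
  for (let i = 0; i < data.length; i++) {
    if (i % 4 === 3) continue; // alpha channel carries no payload here
    if (data[i]) s += String.fromCharCode(data[i]);
  }
  return s;
}

const pixels = hideInPixels('<div class="ad">hi</div>');
console.log(adblockerSuspected({}), revealFromPixels(pixels));
```

Because the payload only exists as pixel values until the decoder runs, it stays invisible to an analysis that never executes the `onload` handler, which is the point the surrounding text makes about static analysis.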


[Figure 9 plot: x-axis "JS files" (0 to 100); y-axis "Coverage (%)" (0 to 100); series: J-Force, Native, Concolic.]
Figure 9: Coverage of J-Force in comparison with native run and concolic execution.

[Figure 10 plot: x-axis "JS files" (0 to 100); y-axis "Overhead (times)" (0 to 2000); series: Concolic, J-Force (L-path), J-Force (E-path).]
Figure 10: Performance overhead of J-Force in comparison with concolic execution.

6.     RELATED WORK
Multiple Path Execution. The concept of forced execution was employed in previous research [26, 15, 36, 19]. Although the concept has been applied in various domains, such as native binary programs [26], mobile apps [15, 19], and identifying kernel rootkits [36], our work is, to the best of our knowledge, the first to propose a forced execution engine for JavaScript. Furthermore, the challenges that J-Force solves, such as handling missing objects/DOM, handling event/exception handlers and more (Sec. 4), are unique to JavaScript and are not proposed (or solved) by previous work. Rozzle [22] also places emphasis on analyzing self-revealing program behaviors. It explores multiple execution paths within a single execution. However, it does so via a different approach, which is based on symbolic values. More importantly, it has limited support for program faults and exception handling. By contrast, our tool can explore all feasible paths without being interrupted by exceptions. Symbolic (or concolic) execution has been applied to analyze JavaScript-based Web applications [32, 31, 33]. Due to the limitations of the underlying constraint solvers, it is challenging for these techniques to support the dynamic nature of JavaScript and scale to real-world applications built atop various JavaScript frameworks.

JavaScript Malware. EVILSEED [17] leverages characteristics of known malicious web pages to discover other likely malicious web pages, including JavaScript. Revolver [21] aims to find JavaScript malware based on code similarity. In particular, it tries to classify evasive malware by comparing it with a large amount of JavaScript collected in advance. It relies heavily on the result of pre-classification by an oracle, and may not be robust against newly crafted malware (e.g., zero-day exploits). MineSpider [34] extracts URLs from JS snippets equipped with evasion techniques that perform drive-by download attacks. It collects execution paths relevant to redirections using program slicing methods. While it is useful for tracking page redirections, it is not able to handle dynamic remote code injection using an iframe or a simple <script> tag. Lekies et al. [23] show attack methods enabled by the object scoping and dynamic nature of JavaScript. They investigate a set of high-ranked domains and verify that those are vulnerable to Cross-Site Script Inclusion (XSSI) attacks. ScriptInspector [39] examines third-party script injection to restrict accesses to critical resources. This is achieved by allowing site administrators to establish their own security policies. WebCapsule [25] records and replays web content executions for forensic analysis. It records all non-deterministic inputs to the core web rendering engine, including user interactions. RAIL [12] can verify security patches of web applications by rerunning patched web applications with previous bug-inducing inputs such as exploits. The system can tolerate state divergences caused by the patches. Unlike the record and replay approaches, J-Force explores all possible paths to reveal evasive malicious logic that is difficult to expose.

Browser Extensions. Hulk [20] analyzes Chrome browser extensions and detects malicious (or suspicious) behaviors, such as ad injection and information leaks. Expector [37] tries to figure out the correlation between malvertising and plug-ins. It shows that malvertising is more likely to appear when a specific extension is active. WebEval [18] inspects Chrome extensions using a combination of static and dynamic analysis. In order to trigger malicious activities, it sets up simulations by recording complex interactions between web pages and network events. Observe that, though such techniques have their own ways to increase coverage and unveil hidden malicious actions, they would not be sufficient to induce all possible behaviors.
7.     DISCUSSION
   As our solution aims to expose malware hidden under a certain program path, detecting data-driven attacks is still challenging. Although diverting control flow by forced execution occasionally breaks the program semantics, due to the stealthy pattern and conditional nature of the hidden code, we are confident that J-Force is able to disclose most evasive malware in the wild. Since J-Force is currently designed to detect client-side JavaScript malware, handling cloaking schemes in server-side scripts (e.g., SQL, PHP, etc. [30]) is beyond the scope of this paper.

8.     CONCLUSION
   In this paper, we proposed J-Force, a forced execution engine for JavaScript to expose hidden and even malicious program behaviors. J-Force explores all possible execution paths by mutating the outcomes of branch predicates. We solved multiple technical challenges to make J-Force a practical, robust and crash-free tool. We validated the efficacy of J-Force through an extensive set of experiments. J-Force has been evaluated on 50 exploits of popular exploit kits and more than 12,000 Chrome extensions. It successfully unveiled the hidden code in 41 exploits and detected more than 300 Chrome extensions injecting advertisements. The experiments on 100 real-world JavaScript samples show that J-Force is able to achieve 95% code coverage and perform 2-8x better than existing approaches.

9.     ACKNOWLEDGMENTS
   We thank the anonymous reviewers for their constructive comments. This research was supported, in part, by DARPA under contract FA8650-15-C-7562, NSF under awards 1409668, 1320444, and 1320306, ONR under contract N000141410468, and Cisco Systems under an unrestricted gift. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

10.     REFERENCES
 [1]   http://malware.dontneedcoffee.com.
 [2]   http://malware-traffic-analysis.net.
 [3]   Adblock plus. https://adblockplus.org.
 [4]   Blockadblock. http://blockadblock.com.
 [5]   Chrome Web Store. https://chrome.google.com/webstore.
 [6]   Clickfraud. http://digitalmarketingmagazine.co.uk/digital-marketing-advertising/the-crooks-willing-to-put-you-out-of-business-for-5/1740.
 [7]   Cryptolocker: What is and how to avoid it. http://www.pandasecurity.com/mediacenter/malware/cryptolocker/.
 [8]   JSHint. http://jshint.com.
 [9]   JSLint. http://www.jslint.com.
[10]   Malvertising, Exploit Kits, ClickFraud & Ransomware: A Thriving Underground Economy. https://www.zscaler.com/blogs/research/malvertising-exploit-kits-clickfraud-ransomware-thriving-underground-economy.
[11]   Y. Cao, X. Pan, Y. Chen, and J. Zhuge. Jshield: towards real-time and vulnerability-based detection of polluted drive-by download attacks. In Proceedings of the 30th Annual Computer Security Applications Conference, pages 466–475. ACM, 2014.
[12]   H. Chen, T. Kim, X. Wang, N. Zeldovich, and M. F. Kaashoek. Identifying information disclosure in web applications with retroactive auditing. In OSDI, pages 555–569, 2014.
[13]   M. Cova, C. Kruegel, and G. Vigna. Detection and analysis of drive-by-download attacks and malicious javascript code. In Proceedings of the 19th international conference on World wide web, pages 281–290. ACM, 2010.
[14]   C. Curtsinger, B. Livshits, B. G. Zorn, and C. Seifert. Zozzle: Fast and precise in-browser javascript malware detection. In USENIX Security Symposium, pages 33–48, 2011.
[15]   Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu. iris: Vetting private api abuse in ios applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 44–56. ACM, 2015.
[16]   L. Gong, M. Pradel, M. Sridharan, and K. Sen. Dlint: Dynamically checking bad coding practices in javascript. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, pages 94–105. ACM, 2015.
[17]   L. Invernizzi and P. M. Comparetti. Evilseed: A guided approach to finding malicious web pages. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 428–442. IEEE, 2012.
[18]   N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, N. Provos, M. A. Rajab, and K. Thomas. Trends and lessons from three years fighting malicious extensions. In 24th USENIX Security Symposium (USENIX Security 15), pages 579–593, 2015.
[19]   R. Johnson and A. Stavrou. Forced-path execution for android applications on x86 platforms. In Software Security and Reliability-Companion (SERE-C), 2013 IEEE 7th International Conference on, pages 188–197. IEEE, 2013.
[20]   A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and V. Paxson. Hulk: Eliciting malicious behavior in browser extensions. In Proceedings of the 23rd Usenix Security Symposium, 2014.
[21]   A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection of evasive web-based malware. In USENIX Security, pages 637–652. Citeseer, 2013.
[22]   C. Kolbitsch, B. Livshits, B. Zorn, and C. Seifert. Rozzle: De-cloaking internet malware. In Security and Privacy (SP), 2012 IEEE Symposium on, pages 443–457. IEEE, 2012.
[23]   S. Lekies, B. Stock, M. Wentzel, and M. Johns. The unexpected dangers of dynamic javascript. In 24th USENIX Security Symposium (USENIX Security 15), pages 723–735, Washington, D.C., Aug. 2015. USENIX Association.
[24]   E. Mutlu, S. Tasiran, and B. Livshits. Detecting javascript races that matter. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 381–392, New York, NY, USA, 2015. ACM.
[25]   C. Neasbitt, B. Li, R. Perdisci, L. Lu, K. Singh, and K. Li. Webcapsule: Towards a lightweight forensic engine for web browsers. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 133–145. ACM, 2015.
[26]   F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. X-force: Force-executing binary programs for security applications. In Proceedings of the 2014 USENIX Security Symposium, San Diego, CA (August 2014), 2014.
[27]   P. Ratanaworabhan, V. B. Livshits, and B. G. Zorn. Nozzle: A defense against heap-spraying code injection attacks. In USENIX Security Symposium, pages 169–186, 2009.
[28]   V. Raychev, M. Vechev, and A. Krause. Predicting program properties from big code. In ACM SIGPLAN Notices, volume 50, pages 111–124. ACM, 2015.
[29]   V. Raychev, M. Vechev, and M. Sridharan. Effective race detection for event-driven programs. In ACM SIGPLAN Notices, volume 48, pages 151–166. ACM, 2013.
[30]   K. Sadalkar, R. Mohandas, and A. R. Pais. Model based hybrid approach to prevent sql injection attacks in php. In Security Aspects in Information Technology, pages 3–15. Springer, 2011.
[31]   P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and D. Song. A symbolic execution framework for javascript. In Security and Privacy (SP), 2010 IEEE Symposium on, pages 513–528. IEEE, 2010.
[32]   K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A selective record-replay and dynamic analysis framework for javascript. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 488–498. ACM, 2013.
[33]   K. Sen, G. Necula, L. Gong, and W. Choi. Multise: Multi-path symbolic execution using value summaries. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 842–853. ACM, 2015.
[34]   Y. Takata, M. Akiyama, T. Yagi, T. Hariu, and S. Goto. Minespider: Extracting urls from environment-dependent drive-by download attacks. In Computer Software and Applications Conference (COMPSAC), 2015 IEEE 39th Annual, volume 2, pages 444–449. IEEE, 2015.
[35]   D. Y. Wang, S. Savage, and G. M. Voelker. Cloak and dagger: dynamics of web search cloaking. In Proceedings of the 18th ACM conference on Computer and communications security, pages 477–490. ACM, 2011.
[36]   J. Wilhelm and T.-c. Chiueh. A forced sampled execution approach to kernel rootkit identification. In International Workshop on Recent Advances in Intrusion Detection, pages 219–235. Springer, 2007.
[37]   X. Xing, W. Meng, B. Lee, U. Weinsberg, A. Sheth, R. Perdisci, and W. Lee. Understanding malvertising through ad-injecting browser extensions. In Proceedings of the 24th International Conference on World Wide Web, pages 1286–1295. International World Wide Web Conferences Steering Committee, 2015.
[38]   Y. Zheng, T. Bao, and X. Zhang. Statically locating web application bugs caused by asynchronous calls. In Proceedings of the 20th international conference on World wide web, pages 805–814. ACM, 2011.
[39]   Y. Zhou and D. Evans. Understanding and monitoring embedded web scripts. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 850–865. IEEE, 2015.