Authors Chung Hwan Kim Dongyan Xu I Luk Kim Kyungtae Kim Xiangyu Zhang Yonghwi Kwon Yunhui Zheng
License CC-BY-4.0
J-Force: Forced Execution on JavaScript

Kyungtae Kim, I Luk Kim, Chung Hwan Kim, Yonghwi Kwon, Yunhui Zheng∗, Xiangyu Zhang, Dongyan Xu
Department of Computer Science, Purdue University, USA
∗IBM T.J. Watson Research Center, USA
{kim1798, kim1634, chungkim, kwon58, xyzhang, dxu}@cs.purdue.edu, zhengyu@us.ibm.com

ABSTRACT
Web-based malware equipped with stealthy cloaking and obfuscation techniques is becoming more sophisticated. In this paper, we propose J-FORCE, a crash-free forced JavaScript execution engine that systematically explores possible execution paths and reveals malicious behaviors in such malware. In particular, J-FORCE records branch outcomes and mutates them for further exploration. It also inspects function parameter values that may reveal malicious intentions and expose suspicious DOM injections. We address a number of technical challenges along the way. For instance, we keep track of missing objects and DOM elements and create them on demand. To verify the efficacy of our techniques, we apply J-FORCE to detect Exploit Kit (EK) attacks and malicious Chrome extensions. We observe that J-FORCE is more effective than existing tools.

Keywords
JavaScript; Security; Malware; Evasion

1. INTRODUCTION
Web-based applications powered by JavaScript are becoming more widespread, interactive and powerful. At the same time, they are attractive targets of various attacks. Unfortunately, detecting and analyzing malicious web apps that combine diverse exploits and evasive techniques is complicated and challenging. Although various detection schemes have been proposed [14, 27, 13], they still suffer from sophisticated attacks such as cloaking attacks [21, 35, 22].

Both static and dynamic approaches have been applied to detect JavaScript malware. Static analysis (e.g., [9, 8]) considers multiple execution paths and usually achieves better code coverage. However, JavaScript is highly dynamic, so static approaches may be imprecise or even inapplicable due to over-approximation and obfuscation. This is a critical limitation, since obfuscation has become the most common way to hide a program's real intentions, whether for protection or for malicious purposes. By contrast, dynamic analysis techniques (e.g., [16, 32]) execute the program and thus can reveal concrete behaviors even in an obfuscated program. The downside is that they cover only one concrete execution path per run and may never reach the spot that conceals malicious behaviors.

To address these limitations, symbolic and concolic execution based techniques [32, 31, 33] have also been proposed to analyze JavaScript programs. They can generate program inputs and drive the execution along various feasible paths. However, constraint solvers are limited, so overcoming state explosion and handling complex JavaScript operations (e.g., dynamic type conversions, arithmetic/string operations) remain open problems. This is especially true for non-trivial programs built atop various frameworks and for obfuscated programs.

In this paper, we propose J-FORCE, a crash-free¹ JavaScript forced execution engine. J-FORCE combines the advantages of static and dynamic approaches. Like dynamic analysis, J-FORCE executes the program, so obfuscation is no longer an obstacle. To increase coverage, J-FORCE forces the execution to go along different paths. In particular, J-FORCE records the outcomes of branch predicates, mutates them, and explores unvisited paths via multiple executions. This iterative path exploration process continues until all possible paths are explored. Hence, J-FORCE can expose malicious code that is triggered only by conditions that are hard to meet. It can also expose code blocks that are dynamically created and injected. Additionally, J-FORCE uncovers paths hidden in event and exception handlers, so it can detect evasive attacks triggered by non-deterministic events.

We evaluate J-FORCE on 50 real-world exploits in popular EKs [1, 2] and over 12,000 Chrome extensions. J-FORCE successfully exposed the hidden code of 41 exploits and found that more than 300 Chrome extensions inject advertisements. We also run J-FORCE on 100 JavaScript samples and measure its code coverage. The results show that J-FORCE covers 95% of the code with 2-8x overhead. This is significantly more effective than a popular concolic execution technique (68% coverage, 10-10,000x overhead).

In summary, this paper makes the following contributions.
• We propose J-FORCE, a JavaScript forced execution engine that explores all possible paths to expose hidden malware behaviors. J-FORCE records and switches branch outcomes to explore new paths. It also unveils function parameter values to detect malicious intentions and DOM injection attacks.
• We address several technical challenges to avoid crashes during continuous path exploration. For instance, J-FORCE keeps track of missing objects/DOM nodes and creates them on demand. J-FORCE can tolerate critical exceptions and handle infinite loops/recursions.
• We validate the efficacy of J-FORCE through an extensive set of experiments on real-world exploits and web browser extensions. J-FORCE successfully disclosed the hidden code in 41 exploits and detected more than 300 ad-injecting extensions. We also show that J-FORCE achieves 95% code coverage and is 2-8x faster than the state of the art on 100 JavaScript samples.

Our work focuses on understanding malicious code that is present on the client, so server-side cloaking or evasion is out of scope.

¹ In our paper, crash-free is about avoiding or handling JavaScript exceptions.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License.
WWW 2017, April 3–7, 2017, Perth, Australia.
ACM 978-1-4503-4913-0/17/04.
http://dx.doi.org/10.1145/3038912.3052674

Figure 1: Stealthy Exploit Kit Attack. [Diagram: http://bbb.com/shop2.html → obfuscated http://ppp.org/abc.js (decoded by FC3d and passed to eval; registers the timer handler EDXGD via setTimeout(EDXGD, 10)) → obfuscated http://ggg.net/opq.js (creates and appends script k) → navigator.userAgent / msie check → exploit payload.]

Figure 2: Overview of J-FORCE. [Diagram: the J-Force driver loads the page from the Internet and runs multiple executions (Exec #1, Exec #2, ...), supported by object management, exception management and DOM management.]

2. MOTIVATION
Recently, Exploit Kits (EKs) have been favored by cybercriminals to perform web-based attacks. In the last year alone, more than 14 attacks were reported to CVE². EKs are specially designed to exploit known browser-related defects, so such attacks are highly effective. Once a vulnerable client reaches the actual EK landing page, the EK silently downloads and installs malware. Therefore, as a defense, it is critical to identify suspicious EK delivery in the first place.

1.  <script>
2.    if (...) {
3.      btn = document.createElement("button");
4.      btn.id = "mybutton";
5.      btn.innerHTML = "Remove";
6.    } else {
7.      btn = document.createElement("button");
8.      btn.id = "mybutton";
9.      btn.innerHTML = "Skip";
10.   }
11.   document.body.appendChild(btn);
12. </script>
13. ...
14. <script>
15.   x = document.getElementById("mybutton");
16.   if (...) {...}
17.   if (...) {...}
18. </script>
Figure 3: Example for per-block path exploration.
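Figure 3 is also the running example for the per-block strategy of Sec. 3.1.1. That section's scalability argument is plain arithmetic. Exploring each <script> block separately costs the sum of the blocks' path counts, while treating all blocks as one merged block costs their product.

The JavaScript sketch below illustrates this arithmetic under a deliberately simplified, hypothetical model. Each block is reduced to a number of independent two-way branches, and the DOM choice points of Sec. 3.1.2 are ignored. The function names are illustrative only and are not part of J-FORCE.

```javascript
// Hypothetical cost model for Sec. 3.1.1 (not J-FORCE code): each <script>
// block is summarized only by its number of independent two-way branches,
// and DOM choice points are ignored.

// Number of distinct paths through one block with `branches` two-way branches.
function pathsInBlock(branches) {
  return 2 ** branches;
}

// Per-block strategy: blocks are explored one after another, so costs add up.
function perBlockPaths(blocks) {
  return blocks.reduce((total, branches) => total + pathsInBlock(branches), 0);
}

// "Merged" alternative: every combination of per-block paths is a distinct
// path through the giant block, so costs multiply.
function mergedPaths(blocks) {
  return blocks.reduce((total, branches) => total * pathsInBlock(branches), 1);
}

// Figure 3 without choice points: block 1 has one if/else, block 2 has two ifs.
const figure3 = [1, 2];
console.log(perBlockPaths(figure3)); // 2 + 4 = 6
console.log(mergedPaths(figure3));   // 2 * 4 = 8

// Ten blocks with three branches each show how quickly the gap grows.
const tenBlocks = new Array(10).fill(3);
console.log(perBlockPaths(tenBlocks)); // 10 * 8 = 80
console.log(mergedPaths(tenBlocks));   // 8 ** 10 = 1073741824
```

Under these assumptions the per-block cost grows linearly with the number of blocks, whereas the merged cost grows exponentially. This is why the merged alternative can hardly scale.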
Among various delivery vectors, malvertising [10, 37] is one of the most dangerous and successful. In this section, we use a real-world EK delivery equipped with layered obfuscation and cloaking techniques to demonstrate our approach.

Fig. 1 presents a carefully designed multi-layer EK attack chain. It features collaborative cloaking techniques such as code obfuscation, dynamically created scripts and evasive paths:
(1) The first obfuscated JS (JavaScript) snippet (http://ppp.org/abc.js) is delivered to a legitimate website via malvertising.
(2) When it is evaluated during page loading, it creates a piece of dynamic code from strings using eval.
(3) The function EDXGD in the resulting snippet injects the code for the next step. Interestingly, EDXGD is injected as an event handler and can only be invoked when the timeout event fires. Once evaluated, the second obfuscated snippet (http://ggg.net/opq.js) is injected into the DOM tree and executed.
(4) As a result, another dynamic script is created and injected (d.appendChild(k)).
(5) The injected code uses a cloaking method to hide the malicious payload. It first checks whether the client browser can be a target (navigator.userAgent and msie). The hidden code is executed only if the check result (browser) is true.

Existing Approaches. Two pieces of JavaScript (abc.js and opq.js) in the chain are obfuscated. Static analysis based detection mechanisms [14, 9, 28, 11] may therefore have difficulty understanding the real semantics and are ineffective in such cases. Dynamic approaches also have difficulty discovering the execution path that reveals the final exploit payload. In particular, that path requires invoking event handlers and proper environment settings (e.g., an IE browser), conditions that are not easily met in general. Symbolic and concolic execution techniques [32, 31, 33] can be used to explore multiple feasible paths. However, it is challenging for such techniques to scale to large, complicated real-world JavaScript programs due to the limitations of the underlying constraint solvers.

Unfortunately, as shown in Table 1, existing JavaScript malware detection tools are not effective in detecting such malware in a scalable way. In particular, Rozzle [22] performs path exploration on JavaScript programs to reveal evasive malicious behaviors. However, it cannot disclose code in event handlers, because its analysis scope is limited to functions that are explicitly invoked.

J-FORCE Overview. J-FORCE employs a forced execution technique that switches branch outcomes and invokes event handlers. As shown in Fig. 2, J-FORCE explores feasible paths and reveals all instructions irrespective of branch conditions over multiple concrete executions. Event and exception handlers are also forcibly invoked, without emulating the events. By doing so, J-FORCE can reach and expose malicious logic that is triggered only by a particular combination of events and inputs. Moreover, J-FORCE is a dynamic analysis. Hence, it can handle obfuscation and disclose concrete function parameter values, which can further reveal malware behaviors (e.g., identifying eval content).

² CVE-2015-3090, CVE-2015-3105, CVE-2015-5122, CVE-2015-1671, CVE-2015-5119, CVE-2015-5560, CVE-2015-7645, CVE-2015-8651, CVE-2015-8446, CVE-2016-1019, CVE-2016-1001, CVE-2016-0189, CVE-2016-0034, CVE-2016-4117

Table 1: The comparison of the approaches for JavaScript malware detection.
Name | Category | Target Scope | Obfuscation Resilient | Path Exploration Support | State Explosion Free | Events Covered | Exceptions Covered
WebEval [18] | Static & Dynamic Analysis | Chrome Extension | ✓ | ✗ | ✓ | ✗ | ✗
Expector [37] | Dynamic Analysis | Chrome Extension | ✓ | ✗ | ✓ | ✓ | ✗
Hulk [20] | Static & Dynamic Analysis | Chrome Extension | ✓ | ✗ | ✓ | ✓ | ✗
Revolver [21] | Static & Dynamic Analysis | Generic | ✓ | ✗ | ✓ | ✗ | ✗
JSAND [13] | Dynamic Analysis | Generic | ✓ | ✗ | ✓ | ✗ | ✗
Nozzle [27] | Dynamic Analysis | Generic | ✓ | ✗ | ✓ | ✗ | ✗
Zozzle [14] | Static Analysis | Generic | ✗ | ✗ | N/A | ✗ | ✗
Rozzle [22] | Dynamic (Symbolic Value) | Generic | ✓ | ✓ | ✗ | ✗ | ✗
J-FORCE | Forced Execution | Generic | ✓ | ✓ | ✓ | ✓ | ✓

3. DESIGN OF J-FORCE
In this section, we present the details of J-FORCE. We first discuss the J-FORCE execution model. Then we describe how J-FORCE explores multiple execution paths.

3.1 J-Force Execution Model
The execution model of J-FORCE is based on the default page rendering model.

3.1.1 Per-block Exploration
The default page rendering order drives the execution of J-FORCE. Once a <script> block is evaluated, J-FORCE starts exploring all other possible paths within the block. In particular, when J-FORCE reaches the exit of the block, it goes back and explores another unvisited path. Consider the example in Fig. 3. J-FORCE explores the two paths in lines 1-12 before exploring the paths in the next <script> block (lines 14-18).

An alternative is to treat all code blocks as one giant block and explore paths in the "merged" block. However, this can hardly scale. The number of paths in the merged block is the product of the path counts of the individual blocks, whereas in the per-block strategy it is their sum. Note that an external JS script is essentially a single code block and can be explored in the same way.

3.1.2 Handling Inter-Block Dependencies
One challenge brought by the per-block design is how to handle dependences across code blocks. For example, in Fig. 3, the same button is given different texts (Remove and Skip) along different paths in lines 2-11. Without storing states along different execution paths, our analysis may miss critical states that could lead to malicious behavior. For instance, if we explore the path in lines 7-9 after the path in lines 2-5, "Remove" is overwritten by "Skip" and becomes invisible to later blocks.

Exploring paths globally would be the ideal solution, but it is unscalable and impractical. Instead, we develop the following technique, based on the observation that most inter-block dependences are caused by DOM objects. Browsers accept multiple elements with the same name or id in the DOM tree, so J-FORCE allows DOM injections along any path. J-FORCE also intercepts relevant DOM APIs (e.g., getElementById) and injects choice points, which are conceptually equivalent to switch-case statements. Each execution then returns one of the DOM elements with the same id or name, until all such elements have been explored. For example, in Fig. 3, both buttons are appended to the DOM tree, and J-FORCE inserts a choice point at line 15. As a result, a total of 8 paths are explored in the second block. Four of them correspond to the "Remove" button and the remaining four to the "Skip" button.

In theory, dependencies caused by global variables can be handled in the same way. However, doing so for all global variables is very expensive. Our focus is stealthy behavior, which is usually based on string operations, so we selectively support global string variables. J-FORCE also overrides container interfaces (e.g., hashmap) so that multiple strings can be inserted with the same key into a global container. String attributes of DOM objects are handled similarly, with choice points injected to access the different versions.

3.1.3 Handling Event Handlers
Some event handlers, such as onload, are executed automatically when the corresponding DOM objects are loaded or created, so their exploration is driven by the rendering procedure. However, another set of handlers can only be triggered by user and timer events. In our experience, JS malware extensively leverages the event handling mechanism to lay out its attack agenda. Fig. 4 shows a simplified step in a malware delivery chain. __necdel() is registered as a handler of the mouseover event, and the script for the next step is not injected unless the event is triggered. Indeed, we observed many malicious payloads that are triggered only by a series of carefully organized user or timer events. This lets them escape detection by honey-client systems and other automatic detection tools. Therefore, exploring event handlers is critical.

1.  function __necdel()
2.  {
3.    var script = document.createElement("script");
4.    //...
5.    script.src = "http://xxx.xxxxxxx.net/";
6.    var protocol = ("https:" == document.location.protocol ? "https://" : "http://");
7.
8.    var head = document.getElementsByTagName("head")[0];
9.    if ((protocol === "http://") && head)
10.     head.appendChild(script);
11. }
12. window.addEventListener("mouseover", __necdel, false);
Figure 4: Code injection upon "mouseover" event.

J-FORCE remembers the functions registered as event handlers and forces them to execute. In particular, after the current code block has been explored, the handlers registered during that exploration are executed without waiting for the triggering events. Each handler is treated as a separate code block and explored on its own. To the best of our knowledge, most existing honey-client systems and JS symbolic execution engines (e.g., [31]) do not emulate events. Hence, they cannot reveal sophisticated handler-related behaviors.

3.1.4 Handling Asynchronous Execution
Currently, J-FORCE does not aim to expose race conditions caused by asynchronous execution [29, 38]. In fact, most JS races are transient [24]. We have not observed any real-world malicious attacks that leverage race conditions, which is likely due to their non-deterministic and unreliable nature. J-FORCE respects the browser's decision on which block runs first. JavaScript execution is single-threaded, and the execution of a code block cannot be interrupted. J-FORCE only steps in while a block is being evaluated, for the purpose of per-block code exploration.

3.1.5 Handling Dynamic Code Evaluation
JavaScript is highly dynamic, and malicious JS snippets can be created dynamically from strings. For example, a common practice is to create a <script> element, specify its source and attach it to the DOM tree. eval() is another way to run dynamic code. J-FORCE admits all code injections found along different paths during path exploration, so they are explored like any other code on the DOM tree. Some code snippets may be added to DOM elements that J-FORCE has already rendered and explored. For such cases, J-FORCE restarts the rendering procedure but only explores the uncovered injected snippets.

For code evaluated dynamically by functions like eval, J-FORCE explores the code snippet concealed in the function parameter as part of the parent code block's exploration. J-FORCE also provides versioning support for strings, so the different concrete parameter values produced by earlier logic are all explored.

3.2 Path Exploration
J-FORCE explores different paths in multiple runs. In each run, it looks for opportunities where mutating a predicate leads to unexplored instructions. Once such a predicate is found, J-FORCE forces the execution to cover those instructions in later iterations. It repeats this procedure until all instructions are covered. We designed two exploration strategies for different needs.
• L-path executes each instruction at least once, with linear time complexity. Exploring all distinct paths is not its priority. For JS malware analysis, this strategy is sufficient in most cases, because malicious behaviors are usually hidden in blocks.
• E-path aims to explore all possible execution paths, with exponential time complexity. We observed that only a few advanced malware examples require the E-path strategy.

Algorithm 1 Path Exploration.
Input: I: JavaScript instructions in a program
// σ is a list of forced predicates. A predicate p is represented as a tuple
// (p_src, p_dst) that specifies the source src and the forced target dst
1:  function ForcedExec(σ)
2:    σ_e ← [ ]                          // σ_e is a list of executed predicates
3:    p ← PopFront(σ)
4:    for each i in I do
5:      if i is a conditional branch instruction then
6:        if i_src ≡ p_src then          // i_src: source address of i
7:          i_dst ← p_dst                // specify the instruction to be executed
8:          p ← PopFront(σ)
9:        else                           // no forced target: i is evaluated normally
10:       E ← E ∪ {i_dst}
11:       σ_e ← σ_e · (i_src, i_dst)
12:     Execute the instruction i
13:   return σ_e
14: function PathExploration()
15:   E ← {}                             // explored instructions
16:   W ← {ForcedExec(nil)}              // initial execution. W: worklist
17:   while W ≠ ∅ do
18:     σ' ← Pop(W)
19:     σ_t ← nil
20:     for each p in σ' do
21:       if HasAnyUnexploredTarget(E, p) then
22:         σ'_t ← σ_t · SwitchingTarget(p)
23:         W ← W ∪ {ForcedExec(σ'_t)}
24:       else
25:         σ_t ← σ_t · p

Algorithm 1 shows the details of the path exploration approach. Function ForcedExec shows how to drive the execution to a desired branch. It takes a forced execution scheme σ as input. σ is a list of tuples (p_src, p_dst), where p_src is the address of a predicate p and p_dst is the forced target. Intuitively, the tuple specifies the next step (p_dst) when J-FORCE encounters p. The forced execution logic is the loop starting at line 4, which the JS engine interprets. If the scheme provides a rerouting for the current branch instruction i (line 6), J-FORCE forces the execution to take the branch specified in the scheme (line 7). Otherwise, the instruction is executed normally.

Function PathExploration is the top-level driver. It maintains a worklist W, which is a set of forced execution schemes. E is the set of covered instructions, and J-FORCE uses it to discover unexplored instructions. At line 16, J-FORCE starts an execution with no forced execution scheme and simply runs the whole program normally. The purpose of this step is to obtain the list of predicates on one path. J-FORCE then builds a new scheme by mutating a predicate (line 22) whose other target contains uncovered instructions (line 21). The driver repeats this until the worklist is empty, meaning that no further opportunities can be discovered. The algorithm as shown implements the L-path strategy. E-path follows the same steps except at line 21. Instead of checking whether the branch's feasible targets have already been covered, E-path makes sure the branch is followed to both of its targets.

4. CRASH-FREE FORCED EXECUTION
Because J-FORCE ignores path conditions, a program may execute along an infeasible path and crash. In this section, we describe these challenges and our solutions for avoiding crashes.

4.1 Missing Object
Fig. 5 shows a typical example of a crash caused by a missing object. At line 1, variable obj is initialized to an Ajax object. Suppose the true branches of the two predicates (lines 3 and 5) are taken in the first run. Since line 7 is not explored, the predicate at line 5 is mutated in the second run. However, obj has been set to null at line 4, so the program crashes at line 7.

To handle this, when resolving an accessed object, J-FORCE first identifies a set of candidate objects, which can be collected using an existing data flow analysis. Candidates without the correct properties and types are then filtered out. As shown in the defines table in Fig. 5, D1 and D2 are possible objects to be accessed at line 7. However, only D1 has the correct field send. Therefore, J-FORCE selects D1 and continues the forced execution.

1. obj = new XMLHttpRequest();   // D1
2. //...
3. if (cond)
4.   obj = null;                  // D2
5. if (obj == null)
6.   return;
7. obj.send();

Line #  Defines
1       D1
2       D1
3       D1
4       D2
5       D1 | D2
6       D1 | D2
7       D1 | D2

Line  Execution #1            Execution #2            Value (obj)
1     obj := XMLHttpRequest   obj := XMLHttpRequest   XMLHttpRequest
2     ----                    ----                    XMLHttpRequest
3     (taken)                 (taken)                 XMLHttpRequest
4     obj := null             obj := null             null
5     (taken)                 (untaken)               null
6     return                  ----                    null
7     ----                    obj.send (crash!)       null
Figure 5: Handling crashes caused by missing objects.

4.2 Handling Missing DOM Elements
Another common kind of crash in forced execution is caused by missing DOM elements. Our strategy is to create the missing elements and insert them into the DOM tree on demand. Simply creating a new DOM element on each access without appending it to the right place does not work in practice. If multiple accesses to the same element yield different newly created objects, the program semantics are violated. DOM elements can be selected in various ways (e.g., by id or XPath), so the challenge lies in putting the new elements in the right place.

If the selection is by element id, name, tag or class, the solution is straightforward. As shown in Algorithm 2, if the element returned by the original selector is invalid (line 4), J-FORCE creates a new one and inserts it into the children list of the current element (lines 8-9).

Handling XPath selectors is more challenging. An XPath may be fully specified (e.g., "/A/B/C" means C is an immediate child of B and B is a child of A). It may also be partially specified (e.g., "/A//C" means all C objects with an ancestor A). An XPath may also contain filters that select all elements satisfying a condition (e.g., "/A[@exchange]" selects the A elements with attribute exchange). In a forced run, an XPath selector may be partially broken due to missing elements. Consider the selector "p · s", where the prefix p correctly locates a DOM element but the suffix s fails because no such element exists. To handle this issue, J-FORCE finds the longest prefix p that locates a valid element o. It then creates the element(s) corresponding to s and makes them a subtree of o.

Function PathRecognizer() in Algorithm 2 describes the procedure. At line 13, the XPath p is split by its delimiters (i.e., '/' and '//'). Each delimited segment τ has three parts: (1) the delimiter τ_p ("", "/" or "//"); (2) the id τ_e (e.g., A); and (3) the filter τ_a (e.g., [@exchange]).

If τ_p is "//", GetOffsprings is invoked to find the offspring of the current object θ that match τ_e and τ_a (line 16). Otherwise, GetChildren is called to get the direct children of the current object that match τ_e and τ_a (line 18). If no element is found (line 19), a new element corresponding to τ_e and τ_a is created as a child of θ (line 20). The procedure continues until the original selector becomes valid.

An important design choice is that elements created during one (forced) run are retained for later executions. This avoids creating duplicate elements across executions, and the DOM tree grows monotonically. In practice, we found that the size of a DOM tree usually increases slowly and gradually becomes stable.

Algorithm 2 Handle missing DOM elements.
Input: σ ∈ {id, name, name_tag, name_class, XPath}
1:  function CheckAndInsertion(σ)
2:    E ← GetElements(σ)
3:    τ ← GetCurrentObject()
4:    if ¬IsValid(E) then
5:      if σ ∈ XPath then
6:        return PathRecognizer(σ)
7:      else
8:        τ.Insert(CreateElement(σ))
9:        E ← GetElements(σ)
10:   return E
11: function PathRecognizer(p)
12:   θ ← the current node
13:   p' ← PartitionByDelimiter(p)
14:   for each segment (τ_p, τ_e, τ_a) in p' do     // τ_p: delimiter, τ_e: identifier, τ_a: filter
15:     if τ_p ≡ "//" then
16:       E ← θ.GetOffsprings(τ_e, τ_a)
17:     else                                        /* τ_p ≡ "/" ∨ τ_p is empty */
18:       E ← θ.GetChildren(τ_e, τ_a)
19:     if ¬IsValid(E) then
20:       θ.Insert(CreateElement(τ_e, τ_a))
21:       E ← θ.GetChildren(τ_e, τ_a)
22:     θ ← E
23:   return E

4.3 Handling Exception
Recovering from crashes caused by exceptions is one of the most important robustness features of J-FORCE. Because the program may be forced along an infeasible path, various exceptions may occur. For example, Fig. 6 shows a common practice for making a program compatible with different browsers. J-FORCE executes line 2 without considering its predicate, which triggers an exception. Because no corresponding handler exists, the forced execution would be interrupted and terminated.

1. if (window.attachEvent) {
2.   window.attachEvent("onload", window["load" + initialize]); // ...
3. } else {
4.   window.addEventListener("load", initialize, false); // ...
5. }
Figure 6: Browser-compatibility exception in forced execution.

To avoid terminations due to such exceptions, J-FORCE captures all unhandled exceptions with a top-level exception handler in the global scope. It then resumes the interrupted execution from the nearest legacy function by unwinding the stack. To preserve the semantics of the statement that triggered the exception, J-FORCE also includes a set of selected legacy APIs, which are invoked based on the context. For instance, in Fig. 6, attachEvent is redirected to addEventListener so that the original program semantics are preserved. Algorithm 3 explains the details:
(a) Exceptions that the original program can handle: J-FORCE remembers the triggering location (line 3) and then explores the corresponding catch block. The code after the triggering point is covered in a later iteration.
(b) Uncaught exceptions due to missing handlers: these are handled by the top-level handler that J-FORCE inserts (lines 6 and 7-12).
(c) Exception handlers that are present but whose exception was not triggered in a run: in our experience, a catch block is a high-value exploration target, because malware authors often place malicious code there for cloaking [22, 21]. These handlers should therefore be explored regardless of whether the exception occurs. J-FORCE uses the same strategy as in (a): it remembers the block entry point and explores it later.

Algorithm 3 Exception Handling.
1:  function ExceptionOccurrence(σ)
2:    if IsCaught(σ) then
3:      SaveExceptionLoc(σ)
4:      return                         // allow the catch block to run
5:    else
6:      return TopLevelHandler(σ)
7:  function TopLevelHandler(σ)
8:    t ← FindLegacyFunc(σ)
9:    if IsValid(t) then
10:     return Call(t)
11:   else
12:     return and allow the following code to run

4.4 Page Redirection
Page redirections are commonly used to send visitors to a new destination by setting the location attribute of the window object in JavaScript. A page redirection cancels the current page rendering procedure, including JavaScript execution and resource downloading. It therefore interrupts J-FORCE's code exploration strategy, which explores paths over multiple runs.

Fig. 7 shows an example. The true branch of the if statement injects a new <script> element, while the else branch redirects visitors to b.html. Consider the following forced execution. In the first run, the true branch is covered, and the JavaScript in a.js is downloaded and executed (lines 2-4). As explained in the forced execution model, J-FORCE explores the current code block before processing the next block. Hence, in the next iteration, it explores the else branch before executing a.js. However, the page redirection at line 6 interrupts the forced execution, so a.js is never explored. In fact, any other uncovered paths or blocks in the same page are also left unexplored because of the redirection.

1. if (...) {
2.   var script = document.createElement("script");
3.   script.src = "http://.../a.js";
4.   document.body.appendChild(script);
5. } else {
6.   window.location = "http://.../b.html"; /* page redirection */
7. }
Figure 7: An example of page redirections.

Our solution is to load the target page in a separate frame so that J-FORCE can continue exploring the current page. Frames are isolated from each other, so loading the destination page in a frame is functionally equivalent to a page redirection. In this example, J-FORCE loads b.html in an iframe and is thus able to explore the behaviors in a.js.

4.5 Infinite Loop and Recursion
J-FORCE may run into infinite loops or endless recursion because it ignores loop and recursion conditions. To handle this issue, we set an upper bound on the number of times a loop body or a recursive function can be executed. For loops, J-FORCE monitors loop executions and ensures that they stay within the threshold. Once the threshold is reached, J-FORCE forces the execution to leave the loop. Similarly, for recursion, we use a threshold to limit recursion depth. Whenever a new stack frame is created, J-FORCE checks that the stack depth is below the threshold.

5. EVALUATION
J-FORCE is implemented atop WebKit-r171233 with the GTK+ port. Our evaluation consists of two experiments. The first is a systematic study on 50 EK samples and 12,132 Chrome extensions. It checks whether J-FORCE can detect (malicious) behaviors concealed by sophisticated cloaking and obfuscation techniques. Being able to explore more code is also important, so the second experiment quantifies J-FORCE's coverage and overhead on 100 real-world JavaScript programs. All experiments are performed on a machine with an Intel Core i7 3.40 GHz CPU and 12 GB RAM running Ubuntu 14.04 LTS.

5.1 Detecting Suspicious Hidden Behaviors

5.1.1 Detecting Obfuscations and Evasions in EKs
We collected 50 EK samples from various sources [1, 2] and classified them by their underlying EKs: Angler, RIG, Nuclear, Magnitude and SweetOrange. Although the kits differ, we observed that they all share the following mechanisms:
• Obfuscation. Obfuscation conceals program functionality using string operations, which makes malware detection challenging. In EKs, obfuscation is applied more than once, across multiple layers of code injection.
• Evasion. To minimize the chance of being caught (e.g., by honeypot-based approaches), an EK invokes its malicious logic only when certain conditions are satisfied. Specifically, an EK usually scans the visitor's system (e.g., the signatures of browsers and extensions) before moving on to the next stage. An example is shown in Fig. 1 in Sec. 2.
• Exploiting Vulnerabilities. An EK is designed to exploit particular vulnerabilities in browsers or add-ons by hijacking the control flow and elevating permissions. Typical targets of such exploitation are Adobe Flash, MS Silverlight and the Java runtime, as well as the browsers themselves.

Exploit Kit | # of samples | Native run | Rozzle [22] | WebEval [18] | J-FORCE
Angler | 10 | 2 / 1 | 7 / 6 | 3 / 3 | 10 / 10
RIG | 10 | 5 / 0 | 7 / 2 | 5 / 0 | 10 / 10
Nuclear | 10 | 3 / 0 | 6 / 2 | 3 / 1 | 10 / 7
Magnitude | 10 | 6 / 2 | 10 / 6 | 6 / 4 | 10 / 10
SweetOrange | 10 | 2 / 0 | 8 / 4 | 4 / 4 | 10 / 6
(Tool columns: # of samples whose obfuscations / evasions can be handled.)
Table 2: Comparing detection techniques on EKs.

… number of the samples can be handled by each tool, in terms of obfuscations handled and evasions passed. Since we know the ground truth about deobfuscation, counting successful deobfuscations is straightforward. For evasions, we count an evasion as detected if the exploitation entry point (e.g., <object>) is reached.

The results show that J-FORCE handles more obfuscations and evasions than the other tools, and can therefore expose more hidden malicious behaviors in EK attacks. In particular, J-FORCE is significantly more effective at detecting evasions. Although J-FORCE outperforms the other techniques, it misses a few evasions in Nuclear and SweetOrange. We manually inspected these cases and found that they use Visual Basic (VB) scripts, which J-FORCE does not currently support. However, our design is general and could be implemented for VB scripts as well.

5.1.2 Detecting Ad Injections in Chrome Extensions
Browser extensions are commonly used to enhance the user experience and have therefore become a target of adversaries. Several recent works [20, 18, 37] analyze extensions. In this section, we show how J-FORCE can effectively disclose suspicious behaviors in Chrome extensions.

We crawled 12,132 extensions from the Chrome Web Store [5] in July 2016, and the analysis is done offline. The JavaScript APIs used in extensions differ slightly from those in web applications, so we extended J-FORCE to support such Chrome APIs (e.g., chrome.browserAction.onClicked). In this experiment, we are particularly interested in detecting ad injections and information leaks. We also compare with recent work on Chrome extension analysis [20, 18, 37].

Tool | Ad-injecting: Total | Ad-injecting: Ajax | Ad-injecting: Script Injection | Info. leakage: Total | Info. leakage: Ajax | Info. leakage: Script Injection
Hulk [20] | 195 | 29 | 166 | 14 | 9 | 5
Expector [37] | 187 | 28 | 159 | 9 | 6 | 3
WebEval [18] | 158 | 15 | 143 | 8 | 5 | 3
J-FORCE | 322 | 45 | 277 | 30 | 21 | 9
Table 3: The analysis result of 12,132 Chrome extensions.

Table 3 summarizes the results. J-FORCE detected 322 extensions that inject advertisements. Of these, 277 deliver ad content through script injection and the remaining 45 bring in ads via Ajax. J-FORCE finds 127 more ad-injecting extensions than the best of the other techniques (Hulk, 195). This confirms its effectiveness in handling cloaking and fingerprinting techniques. In addition, J-FORCE detected 30 extensions that send out sensitive information
such as passwords and cookies via Ajax, while other techniques • Payload Delivery. As the last step, a malicious binary is can detect at most 14 of them. downloaded and executed without user’s consent. Ransomware [7] Table 4 presents the statistics of the Chrome extension execution and click fraud [6] are two common examples. analysis. We report the minimum, average and maximum number As J-F ORCE focuses on detecting malicious JavaScript behav- of JavaScript IR instructions, script injections, Ajax requests, eval iors, only the JavaScript parts (obfuscation and evasion) are function invocations, event handlers and page redirections observed included for evaluation. Analyzing non-JavaScript code, such as in exploring one extension. The results show that J-F ORCE can exploiting vulnerabilities in the web browser or plug-ins, is beyond exercise more instructions and discover more behaviors than the the scope of this paper. The results of experiments on 50 EK sam- native run. We also report the number of runs required by J-F ORCE ples (10 for each EK type) are presented in Table 2. It shows the to cover all instructions (using the L-path search strategy explained JavaScript IR Script Injections Ajax Eval Event Handlers Redirections Handled Crashes # of Runs avg min max avg min max avg min max avg min max avg min max avg min max avg min max avg min max J-F ORCE 1, 478 10 31, 248 0.71 0 28 0.21 0 5 0.27 0 10 1.57 0 19 0.15 0 5 2.74 0 117 11.32 1 609 Native run 406 10 14, 151 0.46 0 13 0.03 0 2 0.15 0 8 0.85 0 12 0.02 0 2 N/A N/A Table 4: The statistics of Chrome extensions analysis in Sec. 3.2). We show the number of potential crashes caused by ternal script included at line 1 must be blocked by an adblocker, the forced execution. We observed 2.74 crashes per extension on which is highly dependent on the execution environment. If the average and they are mostly caused by missing objects and DOM adblocker has not been configured correctly or the URL of the ex- elements. 
All of them are handled correctly using the approach ternal resource is not on the blacklist anymore, dynamic analysis discussed in Sec. 4. cannot unveil the stealthy operations either. By contrast, J-F ORCE decouples the dependencies on the en- 5.1.3 Case Study - Anti-adblocker vironment and hence allows us to effectively and deterministically Unlike traditional programs, web applications have various ex- observe unusual behaviors. On the left hand side of Fig. 8, we com- ternal dependences. For example, they can navigate the execution pare the control flow graphs that highlight the differences between depending on browsers environment settings. They can download J-F ORCE and dynamic analysis based approaches. J-F ORCE is able and load different external JavaScript on the fly from third parties to explore both paths while the dynamic analysis only covers one during executions. Therefore, although it is possible mutating in- path. As such, J-F ORCE is able to discover the real ads contents by put values may change the execution paths, in general, it is highly forced execution without requiring complicated system settings to nontrivial or even infeasible for an automatic exploration tool to actually trigger the logic in traditional dynamic approaches. satisfy the triggering conditions of the execution environment and More importantly, through J-F ORCE, we can uncover the actual third party scripts. In this case study, we showcase a real-world values of function parameters (the right side of Fig. 8) and track anti-adblocker [4] to demonstrate how J-F ORCE bypasses sophisti- the origin of suspicious values. With such capabilities (especially cated predicates and thus can be helpful for understanding stealthy the hidden contents that can only be obtained dynamically), it is program behaviors. straightforward to conclude the ads are included in the image file. Ad-blocker (e.g., [3]) is a piece of software that allows clients to roam the web without encountering any Ads. 
In particular, it uti- 5.2 Efficiency lizes network control and in-page manipulation to help users block As described in Sec. 3.2, J-F ORCE can be configured to im- advertisements loaded from ad-network. As many content publish- prove coverage on instructions (the L-path strategy) or paths (the ers make their primary legitimate income from Ads, there are grow- E-path strategy). To measure its efficiency, we extracted 100 exam- ing demands for delivering ads even the ad-blockers are running in ples (from Alexa.com) and evaluate J-F ORCE on these real-world client browsers. As a result, anti-adblockers have been developed JavaScript programs. We compare J-F ORCE with Jalangi, a con- and deployed by publishers on their websites. Anti-adblockers are colic JavaScript execution engine [32], which is one of the closest usually scripts delivered by publishers to detect if adblockers are alternate approaches available at present. enabled in the client browsers. Once found, it either hides the con- Fig. 9 presents the code coverage comparison results. The num- tent or delivers the ads by circumventing the ads filters. ber of branches of the benchmarks varies from 109 to 1, 200. In Fig. 8 presents a simplified version of a popular anti-adblocker Fig.9, the JavaScript benchmarks on the X-axis are sorted by the BlockAdblock [4], where the arrows denote important call edges. It branch count in ascending order. The result shows that, on aver- first detects if an adblocker is enabled on the client-side and loads age, J-F ORCE is able to cover 95% of the code (the same result the real ads contents that are delivered as an image. In particular, for both exploration strategies), which is significantly more than line 1 includes an external script (“advertising.js”). If it can be Jalangi (less than 68%). We found that the main reason for the im- successfully loaded, variable __haz will be set to false. 
If an provement is that the concolic execution based approach does not adblocker presents, the script will not be blocked and the value of explore the code in event and timer handlers. In addition, Jalangi __haz remains undefined. Therefore, BlockAdblock can tell if often fails to handle complex arithmetic operations such as division an adblocker is running by checking the value of __haz. At line 4, and modulo. By contrast, J-F ORCE does not suffer from such lim- it invokes function __ac() and defines the function to be invoked itation and is able to expand its analysis scope to event and excep- for the next step. Depending on the presence of an adblocker, it will tion handlers. Besides, J-F ORCE does not miss conditional blocks invoke a function (defined in lines 13-23) or do nothing. In function as our exploration technique is designed to cover both branches by __dec, it loads an image, where its URL is specified at line 3 and switching branch outcomes. We also manually inspect the scenar- further transformed at line 4. Interestingly, instead of displaying the ios where J-F ORCE fails to cover all instructions. We found that image, it uses this image as a circumvention of ad-blocking rules this is mainly due to coding errors in the sample JavaScript pro- and loads the raw data of the images. At line 21, function __cb grams. is invoked, which creates a div element and displays the HTML Beside the coverage, we also measure the runtime performance hidden in the image at line 27. of J-F ORCE. Fig. 10 summarizes the comparison result of the over- It is highly nontrivial for static analysis based approaches to pre- heads collected during the coverage test. For each approach, the cisely analyze such complicated call relations, as it requires ad- overhead is normalized to the native run. The result shows that the vanced alias and string analysis (e.g., the operations in line 4 and overhead of J-F ORCE is 2-8x (2-300x for E-path) whereas Jalangi 20). 
More importantly, as the ads contents are actually hidden in has much higher overhead 10-10, 000x. Observe that such a differ- an image, they may not even be in the analysis scope. As a result, it ence is caused by the fact that concolic execution based approaches is very unlikely that the static analysis can handle such cases. An- may not scale well with the number of branches, showing expo- other option is to actually run the program. However, one important nentially increasing overhead. Particularly, generating and solving triggering condition of the secret loading procedure is that the ex- path constraints is more expensive than mutating branch outcomes. J-Force 1 <script src=“http://.../advertising.js” ..></script> // “var __haz = false;” 2 ... 24 __cb = function (s) { if (typeof …) 25 … 3 __durl = ‘//.../hallon-p12065a-:r:.gif’; 4 __ac(function(){ __dec(__durl.replace(“:r:”, __s(5, 12)), __cb); 26 _new = d.createElement(‘div’); return f(); 5 }); 27 _new.innerHTML = s.html; … 28 k.insertBefore(_new, k); … 6 function __ac(f) { 13 function __dec(src, callback) { 29 … 7 … 14 i = new Image(); 8 if (typeof __haz === ‘undefined’) 15 i.onload = function() { Native 9 return f(); 16 … exec 10 ... 17 t.drawImage(i, 0, 0); J-Force if (typeof …) 11 return; 18 b = __p24(t.getImageData(...).data); 19 for (...) callback(s) 12 } return f(); 20 if (b[x]) s+= str.fromCharCode(b[x]); s: “..html: <div class=\fram \></div>\n<divclass=\k3rwp 21 callback(s); j9jwhynv\>\n<div class=\gb … 22 } qfwapg\>\n<span class=\g 23 i.src = src; bqfwaabemdey ….</div>” Figure 8: Analyzing Anti-Adblocker using J-F ORCE. 100 support dynamic nature and scale to real-world applications built J-Force Native atop various JavaScript frameworks. 80 Concolic Coverage (%) JavaScript Malware. EVILSEED [17] leverages characteristics of 60 known malicious web pages to discover other likely malicious web 100 40 pages including JavaScript. 
Revolver [21] aims to find JavaScript J-Force Native 2080 Concolic malware based on code similarity. In particular, it tries to classify Coverage (%) 060 evasive malware by comparing with a large amount of JavaScript 0 20 40 60 80 100 40 JS files collected in advance. It heavily resorts to the result of pre-classification by oracle, and may not be robust against newly crafted malware 200020 Figure 9: Coverage Concolic of J-F ORCE in comparison with native run (e.g., zero-day exploit). MineSpider [34] extracts URLs from JS J-Force(L-path) Overhead (times) and concolic 0 1500 0 execution. 20 J-Force(E-path) 40 60 80 100 snippets equipped with evasion techniques that performs drive-by JS files download attacks. It collects execution paths relevant to redirec- 1000 2000 Concolic tions using program slicing methods. While it is useful to track J-Force(L-path) page redirections, it is not able to handle the dynamic remote code Overhead (times) 500 1500 J-Force(E-path) injection using iframe or simple <script> tag. Lekies et al. [23] 0 1000 show attack methods enabled by the object scoping and dynamic 0 20 40 60 80 100 JS files nature of JavaScript. They investigate a set of high-ranked do- 500 mains and verify that those are vulnerable to Cross-Site Script In- 0 0 20 40 60 80 100 clusion(XSSI) attacks. ScriptInspector [39] examines third-party JS files script injection to restrict accesses to critical resources. This is Figure 10: Performance overhead of J-F ORCE in comparison achieved by allowing site administrators to establish their own se- with concolic execution. curity policies. WebCapsule [25] records and replays web contents executions for forensic analysis. It records and all non-deterministic inputs to the core web rendering engine including user interactions. 6. RELATED WORK RAIL [12] can verify security patches of web applications by rerun- Multiple Path Execution. 
The concept of forced execution was ning patched web applications with previous buggy inducing inputs employed in previous researches [26, 15, 36, 19]. Although the such as exploits. The system can tolerate state divergences caused concept has been applied in various domains, such as native binary by the patches. Unlike the record and replay approaches, J-Force programs [26], mobile apps [15, 19], and identifying kernel rootk- explores all possible paths to reveal evasive malicious logics which its [36], our work is the first to propose the forced execution en- are difficult to expose. gine for JavaScript to the best of our knowledge. Furthermore, the Browser Extensions. Hulk [20] analyzes Chrome browser exten- challenges that J-F ORCE solves, such as handling missing object- sions and detects malicious (or suspicious) behaviors, such as ad- s/DOM, handling event/exception handlers and more (Sec. 4) are injecting and information leak. Expector [37] tries to figure out unique to JavaScript and are not proposed (or solved) by previous the correlation between malvertising and plug-ins. It shows that, work. Rozzle [22] also places emphasis on analyzing self-revealing in a condition where a specific extension is working, malvertising program behaviors. It explores multiple execution paths with sin- is more likely to appear. WebEval [18] inspects Chrome exten- gle execution. However, it is done via a different approach which sions upon the combination of static and dynamic analysis. In order is based on symbolic values. More importantly, they have limited to trigger malicious activities, it sets up simulations by recording support for program faults and exceptions handling. By contrast, complex interactions between web pages and network events. Ob- our tool can explore all feasible paths without being interrupted by serve that though such techniques have their own way to increase exceptions. 
Symbolic (or concolic) execution has been applied to coverage and unveil hidden malicious actions, it would not be suf- analyze JavaScript based Web applications [32, 31, 33]. Due to ficient to induce all possible behaviors. the limitations in underlying constraint solvers, it is challenging to 7. DISCUSSION applications with retroactive auditing. In OSDI, pages As our solution aims to expose malware hidden under a certain 555–569, 2014. program path, detecting data driven attacks is still challenging. Al- [13] M. Cova, C. Kruegel, and G. Vigna. Detection and analysis though diverting control flow by the forced execution occasionally of drive-by-download attacks and malicious javascript code. breaks the program semantics, due to the stealthy pattern and con- In Proceedings of the 19th international conference on World ditional nature of the hidden code, we are confident that J-F ORCE wide web, pages 281–290. ACM, 2010. is able to disclose most of evasive malware in the wild. Since J- [14] C. Curtsinger, B. Livshits, B. G. Zorn, and C. Seifert. Zozzle: F ORCE is currently designed to detect client-side JavaScript mal- Fast and precise in-browser javascript malware detection. In ware, handling cloaking schemes in the server-side scripts (e.g. USENIX Security Symposium, pages 33–48, 2011. SQL, PHP, etc. [30]) is beyond the scope of this paper. [15] Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu. iris: Vetting private api abuse in ios applications. In Proceedings 8. CONCLUSION of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 44–56. ACM, 2015. In this paper, we proposed J-F ORCE, a forced execution engine for JavaScript to expose hidden and even malicious program behav- [16] L. Gong, M. Pradel, M. Sridharan, and K. Sen. Dlint: iors. J-F ORCE explores all possible execution paths by mutating Dynamically checking bad coding practices in javascript. In the outcomes of branch predicates. 
We solved multiple technical Proceedings of the 2015 International Symposium on challenges and make J-F ORCE a practical, robust and crash-free Software Testing and Analysis, pages 94–105. ACM, 2015. tool. We validate the efficacy of J-F ORCE through an extensive set [17] L. Invernizzi and P. M. Comparetti. Evilseed: A guided of experiments. J-F ORCE has been evaluated on 50 exploits of pop- approach to finding malicious web pages. In Security and ular exploit kits and more than 12, 000 Chrome extensions. It suc- Privacy (SP), 2012 IEEE Symposium on, pages 428–442. cessfully unveiled the hidden code in 41 exploits and detected more IEEE, 2012. than 300 Chrome extensions injecting advertisements. The exper- [18] N. Jagpal, E. Dingle, J.-P. Gravel, P. Mavrommatis, iments on 100 real-world JavaScript samples show that J-F ORCE N. Provos, M. A. Rajab, and K. Thomas. Trends and lessons is able to achieve 95% code coverage and perform 2-8x better than from three years fighting malicious extensions. In 24th existing approaches. USENIX Security Symposium (USENIX Security 15), pages 579–593, 2015. 9. ACKNOWLEDGMENTS [19] R. Johnson and A. Stavrou. Forced-path execution for android applications on x86 platforms. In Software Security We thank the anonymous reviewers for their constructive com- and Reliability-Companion (SERE-C), 2013 IEEE 7th ments. This research was supported, in part, by DARPA under con- International Conference on, pages 188–197. IEEE, 2013. tract FA8650-15-C-7562, NSF under awards 1409668, 1320444, [20] A. Kapravelos, C. Grier, N. Chachra, C. Kruegel, G. Vigna, and 1320306, ONR under contract N000141410468, and Cisco and V. Paxson. Hulk: Eliciting malicious behavior in browser Systems under an unrestricted gift. Any opinions, findings, and extensions. In Proceedings of the 23rd Usenix Security conclusions in this paper are those of the authors only and do not Symposium, 2014. necessarily reflect the views of our sponsors. [21] A. Kapravelos, Y. 
Shoshitaishvili, M. Cova, C. Kruegel, and G. Vigna. Revolver: An automated approach to the detection 10. REFERENCES of evasive web-based malware. In USENIX Security, pages [1] http://malware.dontneedcoffee.com. 637–652. Citeseer, 2013. [2] http://http://malware-traffic-analysis.net. [22] C. Kolbitsch, B. Livshits, B. Zorn, and C. Seifert. Rozzle: [3] Adblock plus. https://adblockplus.org. De-cloaking internet malware. In Security and Privacy (SP), [4] Blockadblock. http://blockadblock.com. 2012 IEEE Symposium on, pages 443–457. IEEE, 2012. [5] Chrome Web Store. https://chrome.google.com/webstore. [23] S. Lekies, B. Stock, M. Wentzel, and M. Johns. The [6] Clickfraud. http://digitalmarketingmagazine.co.uk/digital- unexpected dangers of dynamic javascript. In 24th USENIX marketing-advertising/the-crooks-willing-to-put-you-out-of- Security Symposium (USENIX Security 15), pages 723–735, business-for-5/1740. Washington, D.C., Aug. 2015. USENIX Association. [7] Cryptolocker: What is and how to avoid it. [24] E. Mutlu, S. Tasiran, and B. Livshits. Detecting javascript http://www.pandasecurity.com/mediacenter/malware/cryptolocker/. races that matter. In Proceedings of the 2015 10th Joint [8] JSHint. http://jshint.com. Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pages 381–392, New York, NY, USA, [9] JSLint. http://www.jslint.com. 2015. ACM. [10] Malvertising, Exploit Kits, ClickFraud & Ransomware: A [25] C. Neasbitt, B. Li, R. Perdisci, L. Lu, K. Singh, and K. Li. Thriving Underground Economy. Webcapsule: Towards a lightweight forensic engine for web https://www.zscaler.com/blogs/research/malvertising- browsers. In Proceedings of the 22nd ACM SIGSAC exploit-kits-clickfraud-ransomware-thriving-underground- Conference on Computer and Communications Security, economy. pages 133–145. ACM, 2015. [11] Y. Cao, X. Pan, Y. Chen, and J. Zhuge. Jshield: towards [26] F. Peng, Z. Deng, X. Zhang, D. Xu, Z. Lin, and Z. Su. 
real-time and vulnerability-based detection of polluted X-force: Force-executing binary programs for security drive-by download attacks. In Proceedings of the 30th applications. In Proceedings of the 2014 USENIX Security Annual Computer Security Applications Conference, pages Symposium, San Diego, CA (August 2014), 2014. 466–475. ACM, 2014. [27] P. Ratanaworabhan, V. B. Livshits, and B. G. Zorn. Nozzle: [12] H. Chen, T. Kim, X. Wang, N. Zeldovich, and M. F. A defense against heap-spraying code injection attacks. In Kaashoek. Identifying information disclosure in web USENIX Security Symposium, pages 169–186, 2009. [28] V. Raychev, M. Vechev, and A. Krause. Predicting program drive-by download attacks. In Computer Software and properties from big code. In ACM SIGPLAN Notices, Applications Conference (COMPSAC), 2015 IEEE 39th volume 50, pages 111–124. ACM, 2015. Annual, volume 2, pages 444–449. IEEE, 2015. [29] V. Raychev, M. Vechev, and M. Sridharan. Effective race [35] D. Y. Wang, S. Savage, and G. M. Voelker. Cloak and detection for event-driven programs. In ACM SIGPLAN dagger: dynamics of web search cloaking. In Proceedings of Notices, volume 48, pages 151–166. ACM, 2013. the 18th ACM conference on Computer and communications [30] K. Sadalkar, R. Mohandas, and A. R. Pais. Model based security, pages 477–490. ACM, 2011. hybrid approach to prevent sql injection attacks in php. In [36] J. Wilhelm and T.-c. Chiueh. A forced sampled execution Security Aspects in Information Technology, pages 3–15. approach to kernel rootkit identification. In International Springer, 2011. Workshop on Recent Advances in Intrusion Detection, pages [31] P. Saxena, D. Akhawe, S. Hanna, F. Mao, S. McCamant, and 219–235. Springer, 2007. D. Song. A symbolic execution framework for javascript. In [37] X. Xing, W. Meng, B. Lee, U. Weinsberg, A. Sheth, Security and Privacy (SP), 2010 IEEE Symposium on, pages R. Perdisci, and W. Lee. Understanding malvertising through 513–528. IEEE, 2010. 
ad-injecting browser extensions. In Proceedings of the 24th [32] K. Sen, S. Kalasapur, T. Brutch, and S. Gibbs. Jalangi: A International Conference on World Wide Web, pages selective record-replay and dynamic analysis framework for 1286–1295. International World Wide Web Conferences javascript. In Proceedings of the 2013 9th Joint Meeting on Steering Committee, 2015. Foundations of Software Engineering, pages 488–498. ACM, [38] Y. Zheng, T. Bao, and X. Zhang. Statically locating web 2013. application bugs caused by asynchronous calls. In [33] K. Sen, G. Necula, L. Gong, and W. Choi. Multise: Proceedings of the 20th international conference on World Multi-path symbolic execution using value summaries. In wide web, pages 805–814. ACM, 2011. Proceedings of the 2015 10th Joint Meeting on Foundations [39] Y. Zhou and D. Evans. Understanding and monitoring of Software Engineering, pages 842–853. ACM, 2015. embedded web scripts. In Security and Privacy (SP), 2015 [34] Y. Takata, M. Akiyama, T. Yagi, T. Hariu, and S. Goto. IEEE Symposium on, pages 850–865. IEEE, 2015. Minespider: Extracting urls from environment-dependent
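The crashes reported for the Chrome extensions (Sec. 5.1.2) are mostly caused by missing objects, which are created on demand (Sec. 4). J-Force does this inside WebKit. The plain-JavaScript sketch below only illustrates the idea using an ES `Proxy`, which is a swapped-in technique and not the authors' implementation. The names `makeStub` and `withMissingObjects` are hypothetical. Reading a name that does not exist in the wrapped global object returns a stub that absorbs further property reads, calls, and `new`, so a forced branch such as `new ActiveXObject(...)` does not throw.

```javascript
// Builds a stand-in object that absorbs property reads, calls and `new`,
// recording each access in `log`. The target is a strict-mode function so
// it is callable and constructible (needed by the apply/construct traps)
// and has no non-configurable `arguments`/`caller` own properties.
function makeStub(name, log) {
  const target = function () { "use strict"; };
  return new Proxy(target, {
    get(_t, prop) {
      // Let string conversion of a stub yield "" instead of throwing.
      if (prop === Symbol.toPrimitive) return () => "";
      const path = `${name}.${String(prop)}`;
      log.push(path);
      return makeStub(path, log);
    },
    apply() {
      return makeStub(`${name}()`, log);
    },
    construct() {
      return makeStub(`new ${name}`, log);
    },
  });
}

// Hypothetical wrapper: reading a missing name on `globals` creates a stub
// on demand instead of yielding undefined (and a later TypeError).
function withMissingObjects(globals, log) {
  return new Proxy(globals, {
    get(t, prop) {
      if (prop in t) return Reflect.get(t, prop);
      log.push(`created ${String(prop)}`);
      return makeStub(String(prop), log);
    },
  });
}
```

Existing globals pass through unchanged, and only missing ones are stubbed. The `log` records which missing objects a forced path touched, which is useful for later inspection.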
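The PathRecognizer pseudocode at the start of this section (lines 11-23) walks a path segment by segment and inserts an element whenever a segment matches nothing. Below is a runnable sketch of that loop over a tiny in-memory tree. It makes several assumptions: `DomNode` is a hypothetical stand-in rather than a WebKit or browser DOM API, filters are restricted to the form `[@name='value']`, and after each step the walk continues from the first match only.

```javascript
// Minimal in-memory stand-in for DOM nodes (hypothetical, not a WebKit API).
class DomNode {
  constructor(tag, attrs = {}) {
    this.tag = tag;
    this.attrs = attrs;
    this.children = [];
  }
  insert(child) {
    this.children.push(child);
    return child;
  }
  // Direct children that match (GetChildren in the pseudocode).
  getChildren(tag, filter) {
    return this.children.filter((c) => matches(c, tag, filter));
  }
  // All matching descendants in document order (GetOffsprings).
  getOffsprings(tag, filter) {
    const out = [];
    const walk = (node) => {
      for (const c of node.children) {
        if (matches(c, tag, filter)) out.push(c);
        walk(c);
      }
    };
    walk(this);
    return out;
  }
}

function matches(node, tag, filter) {
  if (tag !== "*" && node.tag !== tag) return false;
  return filter === null || node.attrs[filter.name] === filter.value;
}

// Assumption: filters are limited to the form [@name='value'].
function parseFilter(raw) {
  if (!raw) return null;
  const m = /^\[@([\w-]+)='([^']*)'\]$/.exec(raw);
  if (!m) throw new Error(`unsupported filter: ${raw}`);
  return { name: m[1], value: m[2] };
}

// PartitionByDelimiter: "//div[@id='x']/span" -> (delimiter, identifier, filter).
function partitionByDelimiter(path) {
  const segments = [];
  const re = /(\/\/|\/)?([\w*-]+)(\[[^\]]*\])?/g;
  let m;
  while ((m = re.exec(path)) !== null) {
    segments.push({ delim: m[1] || "", tag: m[2], filter: parseFilter(m[3]) });
  }
  return segments;
}

function pathRecognizer(root, path) {
  let theta = root; // the current node
  let found = [];
  for (const { delim, tag, filter } of partitionByDelimiter(path)) {
    found = delim === "//" ? theta.getOffsprings(tag, filter) : theta.getChildren(tag, filter);
    if (found.length === 0) {
      // Missing element: create it on demand under the current node.
      const attrs = filter ? { [filter.name]: filter.value } : {};
      theta.insert(new DomNode(tag === "*" ? "div" : tag, attrs));
      found = theta.getChildren(tag, filter);
    }
    theta = found[0]; // assumption: continue from the first match
  }
  return found;
}
```

A second lookup of the same path finds the element created by the first one, so repeated forced runs do not keep adding duplicates.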
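Sec. 4.4 handles page redirection by loading the destination in a separate frame, so the current page keeps running. The sketch below models this outside a browser. A hypothetical fake window defines a `location` accessor: natively, assignment aborts the page, and in frame mode the destination is only recorded as loaded in a frame. The pending work follows the Fig. 7 order, with the else branch first and then a.js. `NavigationAbort`, `makeWindow`, and `explore` are illustrative names, not J-Force internals.

```javascript
// Models a real navigation that tears down the current page.
class NavigationAbort extends Error {}

// Hypothetical fake window: assigning `location` either aborts the page
// (native behaviour) or, when `useFrame` is set, records the destination
// as loaded in a separate frame so the current page keeps running.
function makeWindow(useFrame) {
  const frames = [];
  let current = "a.html";
  const win = { frames };
  Object.defineProperty(win, "location", {
    get() { return current; },
    set(url) {
      if (!useFrame) {
        current = url;
        throw new NavigationAbort(`navigated to ${url}`);
      }
      frames.push(url);
    },
  });
  return win;
}

// Pending work in the order described for Fig. 7: the unexplored else
// branch of the current block first, then the injected a.js.
function explore(useFrame) {
  const win = makeWindow(useFrame);
  const covered = [];
  const pending = [
    () => { covered.push("else-branch"); win.location = "b.html"; },
    () => { covered.push("a.js"); },
  ];
  for (const task of pending) {
    try {
      task();
    } catch (e) {
      if (!(e instanceof NavigationAbort)) throw e;
      break; // nothing else on this page runs after a real redirect
    }
  }
  return { covered, frames: win.frames, location: win.location };
}
```

With `explore(false)`, exploration stops after the redirect and a.js is never covered. With `explore(true)`, both blocks are covered and b.html is recorded as a frame load.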
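Sec. 4.5 bounds loops by an iteration threshold and recursion by a stack-depth threshold, checked whenever a new frame is created. The paper does not state the threshold values, so the defaults below are hypothetical. `ExecutionBudget` is an illustrative helper rather than J-Force's API. A real engine would also reset a loop's counter each time the loop is re-entered, which this sketch omits.

```javascript
// Hypothetical thresholds; the values J-Force uses are not given here.
const MAX_LOOP_ITERATIONS = 1000;
const MAX_STACK_DEPTH = 100;

class ExecutionBudget {
  constructor(maxIter = MAX_LOOP_ITERATIONS, maxDepth = MAX_STACK_DEPTH) {
    this.maxIter = maxIter;
    this.maxDepth = maxDepth;
    this.iterations = new Map(); // loop id -> iterations so far
    this.depth = 0;
  }
  // Checked at each loop back-edge; false means "force the loop to exit".
  allowIteration(loopId) {
    const n = (this.iterations.get(loopId) || 0) + 1;
    this.iterations.set(loopId, n);
    return n <= this.maxIter;
  }
  // Checked before pushing a frame: the new depth must stay below the bound.
  enterFrame() {
    if (this.depth + 1 >= this.maxDepth) return false;
    this.depth += 1;
    return true;
  }
  exitFrame() {
    this.depth -= 1;
  }
}

// Under forced execution the loop's own predicate is ignored, so only the
// budget can end this loop.
function forcedLoop(budget) {
  let count = 0;
  while (budget.allowIteration("L1")) count += 1;
  return count;
}

// A recursion with no base case: the call is skipped once the bound is hit.
function endless(budget, n) {
  if (!budget.enterFrame()) return n;
  try {
    return endless(budget, n + 1);
  } finally {
    budget.exitFrame();
  }
}
```

For example, `forcedLoop(new ExecutionBudget(5, 10))` stops after 5 iterations. `endless(new ExecutionBudget(5, 10), 0)` returns 9, the deepest level reached with the stack depth kept below 10.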
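J-Force records branch outcomes and mutates them for further runs (Sec. 5.2, Sec. 8). The sketch below shows this idea on a simplified model of the Fig. 8 anti-adblocker. It is a coverage-guided flip of one recorded outcome per new run, which is an assumption. It is not the paper's L-path or E-path strategy from Sec. 3.2. The `branch(id, natural)` hook, `forceExplore`, `decodePixels`, and the pixel bytes are hypothetical. `decodePixels` mirrors line 20 of Fig. 8 by keeping every non-zero byte as a character code.

```javascript
// Run `program` repeatedly; after each run, schedule runs that keep the
// trace prefix and flip one branch outcome that has not been observed yet.
function forceExplore(program, maxRuns = 64) {
  const seen = new Map(); // branch id -> Set of observed outcomes
  const scheduled = new Set(); // "id=outcome" flips already queued
  const worklist = [new Map()]; // forced outcomes for each pending run
  const runs = [];
  while (worklist.length > 0 && runs.length < maxRuns) {
    const forced = worklist.shift();
    const trace = [];
    const branch = (id, natural) => {
      const outcome = forced.has(id) ? forced.get(id) : Boolean(natural);
      trace.push([id, outcome]);
      if (!seen.has(id)) seen.set(id, new Set());
      seen.get(id).add(outcome);
      return outcome;
    };
    let result;
    try {
      result = program(branch);
    } catch (e) {
      result = `exception: ${e.message}`; // recorded instead of aborting
    }
    runs.push({ trace, result });
    trace.forEach(([id, outcome], i) => {
      const key = `${id}=${!outcome}`;
      if (!seen.get(id).has(!outcome) && !scheduled.has(key)) {
        scheduled.add(key);
        const next = new Map(trace.slice(0, i));
        next.set(id, !outcome);
        worklist.push(next);
      }
    });
  }
  return { runs, seen };
}

// Line 20 of Fig. 8: keep every non-zero byte as a character code.
function decodePixels(bytes) {
  let s = "";
  for (const b of bytes) if (b) s += String.fromCharCode(b);
  return s;
}

// Simplified Fig. 8 model: the bait script loaded, so __haz is false and a
// native run never reaches the decoding path.
const scope = { __haz: false };
function antiAdblock(branch) {
  if (branch("haz-undefined", typeof scope.__haz === "undefined")) {
    return decodePixels([60, 98, 62, 0, 104, 105, 0]);
  }
  return "no ads injected";
}
```

`forceExplore(antiAdblock)` first follows the natural outcome and returns "no ads injected". It then flips the recorded outcome of the `typeof __haz` check, which exposes the decoded content "<b>hi" without configuring an adblocker.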