Authors Eli Tilevich Kijin An
License CC-BY-4.0
Client Insourcing: Bringing Ops In-House for Seamless Re-engineering of Full-Stack JavaScript Applications

Kijin An, Software Innovations Lab, Virginia Tech, ankijin@vt.edu
Eli Tilevich, Software Innovations Lab, Virginia Tech, tilevich@cs.vt.edu

ABSTRACT

Modern web applications are distributed across a browser-based client and a cloud-based server. Distribution provides access to remote resources, accessed over the web and shared by clients. Much of the complexity of inspecting and evolving web applications lies in their distributed nature. Also, the majority of mature program analysis and transformation tools work only with centralized software. Inspired by business process re-engineering, in which remote operations can be insourced back in house to restructure and outsource anew, we bring an analogous approach to the re-engineering of web applications. Our target domain is full-stack JavaScript applications that implement both the client and server code in this language. Our approach is enabled by Client Insourcing, a novel automatic refactoring that creates a semantically equivalent centralized version of a distributed application. This centralized version is then inspected, modified, and redistributed to meet new requirements. After describing the design and implementation of Client Insourcing, we demonstrate its utility and value in addressing changes in security, reliability, and performance requirements. By reducing the complexity of the non-trivial program inspection and evolution tasks performed to meet these requirements, our approach can become a helpful aid in the re-engineering of web applications in this domain.

CCS CONCEPTS

• Software and its engineering → Software maintenance tools; Dynamic analysis; Automated static analysis.

KEYWORDS

Software Engineering, Re-Engineering, Web Applications, JavaScript, Mobile Apps, Program Analysis & Transformation, Middleware

ACM Reference Format:
Kijin An and Eli Tilevich. 2020. Client Insourcing: Bringing Ops In-House for Seamless Re-engineering of Full-Stack JavaScript Applications. In Proceedings of The Web Conference 2020 (WWW ’20), April 20–24, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3366423.3380105

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’20, April 20–24, 2020, Taipei, Taiwan
© 2020 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-7023-3/20/04.
https://doi.org/10.1145/3366423.3380105

1 INTRODUCTION

Developers often need to re-engineer web applications to address requirement changes made only after deployment and usage. Re-engineering captures evolutionary modifications that range from maintenance tasks to architecture-level changes [7]. A re-engineering effort can involve adding a major feature, protecting against security vulnerabilities, or removing performance bottlenecks. Modifying existing web applications requires complex program analysis and modification operations that are hard to perform and even harder to verify. One of the main causes of this complexity is the distributed execution model of web applications.

In this model, a web application’s execution flows across the separate address spaces of its client and server parts. All remote interactions are typically implemented by means of middleware libraries. As a result, the control flow of web applications can be highly complex, with their business and communication logic intermingled. That complexity hinders all tracing and debugging tasks. In addition, distributed execution over the network makes web applications vulnerable to partial failure and non-determinism.

Program analysis is central to software comprehension. The web is predominated by dynamic languages, which defeat static analysis techniques. Hence, comprehending programs written in dynamic languages, such as JavaScript, requires dynamic analysis. Software debugging hinges on the ability to repeat executions deterministically [26, 31]. However, many web applications are stateful, with certain client-server interactions changing the server’s state. It can be quite laborious and error-prone to restore the original state to be able to repeat a remote buggy operation [16, 23, 33]. All in all, it is the presence of both distribution and stateful execution that makes it so hard to trace and modify web applications.

In this paper, we draw inspiration from business process re-engineering that can bring remote operations in-house via insourcing. Once the insourced operations are redesigned and restructured, some of them can be outsourced anew. As argued above, the notion of local operations being easier to analyze and restructure than remote ones equally applies to web applications.

Specifically, the approach presented herein first automatically transforms a web application, comprising a client communicating with a remote server, to run as a centralized program. The resulting centralized variant retains to a large degree the semantics of the original application, but replaces all remote operations with local ones. The centralized variant becomes easier to analyze and modify not only because it has no remote operations, but also because the majority of program analysis and transformation approaches and tools have been developed for centralized programs. After the centralized variant is modified to address the new requirements and the modifications have been verified, it is then redistributed again into a re-engineered distributed web application. Our target domain is web applications written entirely in JavaScript, both the client and server parts; such applications are referred to as full-stack JavaScript applications. We take advantage of the monolingual nature of such applications to streamline our implementation.

In a web application, clients communicate with the server by means of the HTTP protocol, typically in a request/response pattern. However, from the implementation perspective, the HTTP functionality can be supported by a variety of middleware libraries with vastly dissimilar APIs [29]. To be able to identify and replace the HTTP communication functionality, a web application may need to be executed multiple times under different inputs. However, some remote interactions cause the server to change its state. For example, a client can pass a parameter to the server, which would store that parameter in the server-side database. In addition, the non-database state can change as well (e.g., adding the parameter to the JavaScript list of displayed items). In different states, the server may respond dissimilarly, thus making it impossible to identify HTTP middleware API calls, so they can be correctly replaced with corresponding local calls of the insourced functionalities.

The focal point of our approach is Client Insourcing, a new automatic refactoring that undoes distribution by gluing the local and remote parts of a distributed application together. Our approach can precisely identify the functionality of HTTP middleware, irrespective of its API and in the presence of stateful operations, by combining program instrumentation, profiling, and fuzzing in a novel way. Our ideas are realized in our reference implementation, Java Script Remote Client Insourcing (JS-RCI). We evaluate our approach’s value, correctness, and utility by applying JS-RCI to re-engineer a set of real-world web applications.

The contribution of this paper is three-fold:

(1) We introduce a technique that identifies the HTTP middleware functions used to send and receive HTTP commands in a full-stack JavaScript application. This technique eliminates the need to specialize our approach for the multitude of HTTP middleware libraries and their APIs.
(2) We create Client Insourcing, a novel automatic refactoring that creates a semantically equivalent centralized version of a distributed application by integrating remote functionalities with local code and replacing middleware communication with direct function calls. This refactoring moves to the client not only the server’s business logic implemented in JavaScript, but also the referenced database functionality, including the relational database schema in SQL.
(3) We evaluate the wide applicability of Client Insourcing in re-engineering real-world full-stack JavaScript applications. Specifically, we apply our approach to re-engineer 10 subject distributed applications, both two-tier and three-tier (including the database), to meet new security, reliability, and performance requirements.

The rest of this paper is structured as follows. Section 2 motivates and summarizes our approach. Sections 3 and 4 present the design and implementation specifics of the Client Insourcing refactoring, respectively. Section 5 reports on how we applied Client Insourcing to streamline three representative re-engineering scenarios of web applications. Section 6 discusses various applicability issues pertaining to our approach. Section 7 compares our approach with the related state of the art. Section 8 outlines future work directions and presents concluding remarks.

2 RE-ENGINEERING WEB APPLICATIONS

Developers often find themselves having to re-engineer an actively used application to ensure its continued utility, reliability, and safety. When interacting with an application in real-world settings, users may discover and report inefficiencies and imperfections. Users may request that new features be added to an application to increase its utility. As users discover existing faults and request new features, developers can decide to re-engineer the application to deliver an improved version to the users. Re-engineering modifications can range from routine maintenance and evolution tasks to major architectural transformations. Next, we demonstrate two examples of re-engineering full-stack JavaScript applications.

2.1 Example Apps

The code snippets in Figure 1 come from two third-party full-stack JavaScript applications, realty-rest¹ (left) and recipebook² (right), with both of their client and server parts shown. Both applications rely on the network for their client and server parts to communicate with each other. The primary user base of realty-rest is real-estate brokers, licensed professionals that sell and purchase various properties on behalf of their clients. Due to the nature of their business operations, real-estate brokers lead highly mobile professional lives, moving from location to location to show properties to potential buyers. Hence, as a mobile app, realty-rest is well-aligned with the needs of its users, who rely on the app to be readily available, responsive, and reliable. To start using the app, a user selects a property from the list of all properties registered with the system. The selected property can then be updated or deleted, with the app’s client then sending HTTP commands to the server (e.g., DELETE /property/favorite to remove a property from the list of favorites). The HTTP commands are wrapped into distribution middleware (angular2/http) with a JavaScript API. Specifically, the client invokes HTTP.delete passing a URL parameter, with angular2/http delivering the invocation to the server and calling function unfavorite there. This function finds and deletes the passed property, returning the updated list of favorites to the client. angular2/http marshals both property and the result-to-return as JSON-encoded messages. The client unmarshals the returned result to update the GUI. The recipebook app maintains a list of cooking recipes at the server, so different clients can retrieve and update the maintained recipes. recipebook uses a different middleware library to wrap its HTTP commands: angularJS, whose JavaScript API differs from that of angular2/http. While realty-rest is a two-tier app (JavaScript client and server), recipebook is three-tier (adding a database tier).

Next, we present examples of how realty-rest and recipebook may need to be re-engineered to address new requirements.

¹ realty-rest (https://github.com/ccoenraets/ionic2-realty-rest)
² recipebook (https://github.com/9bitStudios/recipebook)

2.2 Adapting to Disconnected Operation

Examining the history of realty-rest reveals that some of this app’s functionalities have been moved between its client and server sites³. Since scant documentation makes it hard to ascertain the reason for these moves, we next discuss a typical new feature that enables distributed apps to continue operation in the absence of a network
connection. In particular, if users need to operate a mobile app in locations with limited or intermittent network connectivity, the app has to deliver its core business functionality without relying on any remote services. To enable such offline operations, several strategies have been proposed [32]. One such strategy is replication, which replicates a remote component locally, so the local copy acts as a proxy of its remote counterpart. A consistency protocol keeps both copies in sync. A naïve strategy for replicating a remote functionality would be just to copy its complete source files to the client, adapting the copied code by hand as necessary. However, such complete copying unnecessarily replicates functionalities, some of which become “dead code.”

³ ionic2-realty (https://github.com/ccoenraets/ionic2-realty)

Figure 1: Motivating Distributed Apps realty-rest/recipebook and highlighted Client-/Server-side code (legend: Client parameter, Server Return, JSCode Performance Bottleneck)

realty-rest, server side:

    //SERVER: server.js
    app.delete('/properties/favorites/'
      , properties.unfavorite);

    //server/properties.js
    var favorites = require('./property').favs;
    function unfavorite(request, response) {
      var id = request.body.id;//unMarshalling
      for (var i=0; i<favorites.length; i++){
        if (favorites[i].id == id){
          favorites.splice(i, 1);
          break;}}
      response.json( favorites )//Marshalling
    }

    //SERVER: server/property.js
    exports.data = [{id: 1,...}];
    exports.favs = [{id:2,…},…,{…}];

realty-rest, client side:

    //CLIENT: app/../property-details.ts
    unfavorite(event, property){
      //Marshalling
      this.pServ.unfavorite(property)
        .subscribe(favorite //unMarshalling
          =>{ this.favorites =favorites;});
    }

realty-rest, HTTP messages:

    1) HTTP Request From Client        2) HTTP Response From Server
    DELETE /properties/favorite        HTTP/1.1 200 OK
    HOST ..                            Content-t: json
    [{"id":2,"city"…]                  Content-Len: …
                                       [{"id":1,"city":..}]

recipebook, server side:

    //SERVER: api/recipes.js
    var db = require('../utilities/SQL');
    var Auth = require('../utilities/Auth');
    app.get('/api/recipes/:id', Auth.BAuth,
      function(req,res){ //unMarshalling
        db.query('SELECT * FROM recipes
          WHERE id = ${req.params.id}',
          function (results) {
            if(error)
              res.status(500).send({'Error'});
            else {
              var data = [];
              results.forEach(function(item) {
                data.push({'id':item['id'],
                  'name':item['name']})
              });
              res.json(data); //Marshalling
          }});//Query Invocation
      });

recipebook, client side:

    //CLIENT: angular/…/RecipeCtrls.js
    function init(){ //marshalling
      recipe.getRecipe($rParams.id)
        .then(function(data){ //unmarshalling
          $scope.recipe = data;
          appSync.prepForBroadcast(…);},
          function(error){…}
        );
    }

recipebook, HTTP messages:

    1) HTTP Request From Client        2) HTTP Response From Server
    GET /api/recipes/1                 HTTP/1.1 200 OK
    HOST ..                            Content-t:json
    User-Agent:..                      Content-Len:…
                                       [{"id":1,"name":..}]

2.3 Enhancing Privacy

Enterprises often find themselves in need to enhance user privacy in a released application. Consider a request to keep the realty-rest user’s property browsing histories private from other real-estate brokers due to business competition reasons. To ensure user privacy, certain server-side functionalities (e.g., Customer Relationship Management (CRM)) can be redistributed to a special server that requires authentication before giving access to sensitive information. In fact, realty-rest indeed has gone through a similar modification, as evidenced by the existence of realty-salesforce⁴, which provides the same business functionality, but takes advantage of third-party trusted identification and security features. To re-engineer realty-rest into realty-salesforce, programmers would have to identify and migrate the relevant functionality to another server, modifying the client to communicate with different servers (regular and secure).

⁴ realty-salesforce (https://github.com/ccoenraets/ionic2-realty-salesforce)

2.4 Improving Performance

If a substantial subset of users becomes unsatisfied with application performance, programmers may be asked to identify and remove performance bottlenecks. The left side of Figure 1 displays the server function unfavorite, which contains a known performance bottleneck, rooted in the usage of favorites.splice(i, 1), an inefficient API for removing collection items. In fact, an actual pull request⁵ states that Array.splice() is between 1.5 and 10 times slower than a customized implementation, comprising a for loop iteration and Array.pop(). To be able to identify this particular source of the experienced performance bottleneck, programmers would have to either be intimately familiar with the peculiarities of JavaScript APIs or rely on detailed execution profiling, typically available only for centralized programs. recipebook also contains a similarly inefficient forEach loop⁶. Notice the distributed control flow that invokes these inefficient functions: it starts from the graphical actions at the client, traverses the network through layers and layers of middleware, and finally executes the functions at the server. The invocation flows can be interrupted by network volatility and authentication failures. Hence, it is both complex control flows and possible failures that make it hard to isolate the performance of a web application’s function.

⁵ Perfective Modification for Array.splice() (https://github.com/nodejs/node/pull/20453)
⁶ A modification request to remove this inefficiency appears here: https://github.com/elastic/apm-agent-nodejs/pull/1275

2.5 Client Insourcing to the Rescue

Next, we explain how Client Insourcing can facilitate the re-engineering tasks outlined above.

Redistribution. Client Insourcing creates a redistributable centralized variant devoid of the unnecessary middleware functionality. Once the variant is modified, it can be redistributed automatically. Numerous complementary research efforts have focused on automating the process of distributing centralized applications, with automatic transformation tools released to the public [19, 21]. Because the majority of existing refactoring techniques are designed for regular centralized applications, they can be applied at will to centralized variants. For example, the Extract Function refactoring can be used to separate some privacy-sensitive code within a function into a separate function to be executed in a different environment. After the sensitive code portions are separated into their own encapsulation units, the resulting program can be redistributed, placing the sensitive units to execute in separate privacy-enforcing server environments.

Isolated Profiling. What if business logic can be precisely isolated from middleware and distribution-related functionality? Then the isolated code can be easily profiled to ascertain its performance characteristics and identify any performance bottlenecks. Client Insourcing enables such isolated profiling by removing middleware and gluing the remote parts of a web application together.

Offline Operation. Client Insourcing can enable offline operation, without copying any unnecessary code from the server to the client, by replicating only the remote functionality’s subset needed at the client. The replicated subset can include both JavaScript code and data persisted in a database.

3 DESIGN & REFERENCE IMPLEMENTATION

In this section, we explain our design options and then detail the specifics of our implementation of the Client Insourcing refactoring.

3.1 Design Overview

We give an overview of the main design decisions behind Client Insourcing via specific examples. Consider the task of moving the server functionalities of DEL /favorite or GET /recipe/:id to execute at the clients (Figure 1). Instead of invoking these functions via middleware that handles communication, partial failures, and authentication, they would become regular local functions to be called directly. Hence, all middleware-based code would have to be replaced with direct function calls.

Consider the service DEL /favorite, whose business logic is encapsulated within the server-side unfavorite function. We want to insource unfavorite so it can be called as a regular local function. However, we cannot simply move this function from the server to the client, as its business logic and middleware functionality are intermingled. In addition, the exports.favorite array, referenced in the body of unfavorite, is declared externally. If unfavorite and exports.favorite are not moved together, invoking the function locally would raise an error. Hence, we must move all the referenced externally declared program elements to the client as well. JS-RCI identifies the exact boundaries of the server functionality to insource. However, some dependent business logic of GET /recipe/:id is not confined to JavaScript code only. JS-RCI also transparently insources code that persists data in a relational database.

Figure 2: Transformed and generated code to insource a functionality DELETE /properties/favorite in realty-rest app

    //app/../property-details.js
    import {j5ga2} from './j5ga2';
    unfavorite {
      const IS_SYNC = false;
      if (IS_SYNC) {//synchronous call
        this.favorites = j5ga2(property.id);
        return;
      } //default: non-blocking call
      new Promise((resolve,reject) => {
        var out_j5ga2 = j5ga2(property.id);
        resolve(out_j5ga2);
      }).then(res => this.favorites = res);
    }

    //app/../b8f9a.js
    exports.favs = [{id: 1,city:'B,..}];

    //app/../j5ga2.js
    var favorites=require('./b8f9a').favs;
    export function j5ga2(input){
      var tmpv1 = input;
      var id = tmpv1;
      for (var i=0; i< favorites.length;
        i++){… favorites.splice(i, 1);…}
      tmpv0 = favorites;
      var output = tmpv0;
      return output;}//extracted function

3.2 Identifying the Code to Insource

Next, we present our solution for automating the steps above, realized as the Client Insourcing Refactoring. One of our design goals was to make sure that this domain-specific refactoring is not too burdensome for the programmer. We assume that the refactored applications come with a set of standard test cases, and that the application of these cases is automated. It is during the application of such test cases that JS-RCI detects the marshalling/unmarshalling points of the functionality to insource at the client invocations. Intuitively, the purpose of detecting these marshalling/unmarshalling points in the client code is to identify the entry/exit execution points of the remote functionality to insource. These points correspond to the locations in the client code, at which remote invocation parameters are marshalled to be transferred across the network, and the remote invocation’s results are unmarshalled to be used in the subsequent client execution.

To extract all the server code of the remote functionality to insource, JS-RCI uses symbolic execution. We assume that the server is implemented in Node.js and define the execution rules as pertaining to this framework’s architectural conventions. First of all, JS-RCI normalizes server code to facilitate detecting entry/exit execution points and extracting the executed JavaScript code. To that end, JS-RCI additionally introduces temporary local variables and makes each JavaScript statement perform a single operation (i.e., tmpv0 and tmpv1 in Figure 2). For symbolic execution, we use z3 [10], parameterized with our own set of rules and facts. For example, the profiled parameters and return results of a remote functionality are added as new z3 facts. Figure 4 shows the overall process of Client Insourcing.

3.3 Exploiting Asynchrony

Notice that in a distributed client-server application, the remotely invoked functionalities running at the server, and the client code invoking these functionalities, run in separate address spaces that are not shared (unless the application runs on top of some distributed shared memory system [35], which is not a standard option for web applications). The parameters passed to remote invocations and the invocation results are copied between the client and the server heaps, always creating a new copy rather than mutating any existing program state. Hence, in a distributed application that uses application-layer middleware (e.g., HTTPClient), the client and the server parts share no mutable state (see Figure 3). Following this observation, one can conclude that the client and the server parts have no non-middleware dependencies between them. That is, in such distributed applications, the only way for the client code to invoke a server-side functionality is by making a remote invocation via middleware. To maintain this semantics, our design also provides a single entry point to invoke the insourced functionality, a function previously invoked via a middleware API call at the server. It is these insights that make it possible to safely execute the insourced code asynchronously, without any need for synchronization! Our design of Client Insourcing takes advantage of these insights by executing the insourced functionality asynchronously by default. In particular, the generated code makes use of the Promise framework that exposes asynchronous execution via a standardized interface that uses the programming idioms congruent with the design of JavaScript.

For a specific example, consider the code listing in Figure 2 that shows the generated client code for DEL /favorite. Notice that the default invocation model for this insourced function is asynchronous, a runtime behavior that is put into effect by creating a new instance of a Promise closure. Once the asynchronous execution of j5ga2 completes, the Promise framework invokes the callback resolve to handle the successful execution. Since our design aims for versatility, we provide an option for the insourced functionality to be invoked synchronously as a regular blocking local call. This behavior can be put into effect by setting the value of the boolean variable IS_SYNC to true.

Figure 3: Reachable States between Server and Client parts [diagram labels: Distributed App (Server’s Address Space, Client’s Address Space, Middleware, No Shared Mutable State); After Client Insourcing (Centralized Ver., Client’s Address Space holding the Insourced Part and the Original Client); each part shown with its Set of Reachable States and Set of References]

4 IMPLEMENTATION SPECIFICS

In this section, we provide some additional details pertaining to our implementation choices.

4.1 Detecting Marshalling Points in Client/Server Program

In a full-stack JavaScript application, the client interacts with the server in the request/response pattern, exchanging data in JSON or XML formats. Client Insourcing determines which middleware API calls send and receive the HTTP protocol commands through the following automatic and application-agnostic procedure.

First, the round-trip traffic of the client/server interactions is recorded. Then, JS-RCI parses the request/response data to obtain the deserialized values of client parameters and server return. To that end, JS-RCI captures live network traffic, not only to record/replay the HTTP interactions, but also to extract the used HTTP commands. To capture business logic (as compared to fault handling logic), JS-RCI only processes the responses with the status code of 200 (i.e., successful execution).

Next, JS-RCI replays the recorded round-trip execution that invokes the remote functionality to insource. Both the client and server parts are dynamically instrumented to keep track of values for (1) arguments and returns of the function invocations and (2) reading and writing variables. JS-RCI keeps comparing the values of the invocations and variables to identify the ones equal to client parameter and server return. To instrument the invocations and variable accesses, JS-RCI uses the Jalangi2 callback APIs [42].

To identify the entry points at the server, JS-RCI keeps comparing the values for recorded client parameter of the remote functionality. That is, the parameter has been unmarshalled and is about to be used. To identify the exit point at the server, JS-RCI follows a similar procedure, but looks for the value recorded as the server return of executing the remote functionality. Finding an equal value read or written determines the exit point of the remote functionality. That is, the return value is about to be marshalled and sent across the network to the client. One may wonder: how does our approach determine that the equality comparison indeed identifies the entry and exit points of the remote functionality rather than some intermediate values that also happen to be equal to the values of client parameter and server return? To identify the entry and exit points at the server, our analysis identifies the first instance of the client parameter equality and the last instance of the server return value equality. Unlike its server-side logic, at the client the analysis identifies the last instance of the client parameter equality and the first instance of the server return value equality.

4.1.1 Fuzzing Request/Response Messages. Even with these arrangements in place, it is still possible to misidentify the correct entry and exit points, particularly if the parameters or return results are primitive types, such as built-in numbers or strings (i.e., 0 or 1 values of id in findById service). To prevent such misidentification, JS-RCI populates the original round-trip content by padding the HTTP header and body data with random bits. A fuzzing dictionary is also applied to fuzzable primitive types: a string takes the value “JSRCIStr”, and an integer takes values from 90,000 to 100,000. For instance, JS-RCI encodes “1” as “90,001”. For a service without a client parameter (i.e., findAll type services), JS-RCI fuzzes the request with “JSRCIStr” so JS-RCI can locate the beginning of the function block as the entry point.

4.1.2 Achieving the Idempotency for Record/Replay Executions. Despite the stateless nature of the RESTful architecture that guides the design of WWW, few realistic web applications are truly stateless. In fact, every HTTP request can change the server’s state. These changes hinder the precision of our detection of the server’s marshalling points, introducing false negatives. Even if HTTP traffic were replayed with identical requests, a stateful server is likely to behave differently in 1) marshalling its response output or 2) entering the remote functionality through a different point (e.g., if a visited entry is deleted, it cannot be revisited).

Testing web applications deterministically requires that test cases be isolated [16, 33]. Otherwise, the same test case can yield dissimilar results when executed with the same input. Restoring the server to its original state by hand would be expensive in realistic web applications, requiring a manual reset of the relevant database tables and a fresh restart of the server. In contrast, JS-RCI fully automates the process to achieve the idempotent execution of all HTTP requests. To maintain the original server’s state, JS-RCI interleaves an automatically generated restore operation, run between all successive record or replay executions. Similarly to a prior approach that checkpoints PHP web applications [16], JS-RCI initiates the restore operation with a special HTTP request. Similarly to manipulating fuzzed request messages, JS-RCI generates the restore operations by enhancing original HTTP requests with the new “JSRCIRestore” parameter. To be able to restore the server state, JS-RCI first saves the initial values of all server’s global variables, so they can be restored on demand. Also, as part of its restore operation, JS-RCI executes transaction control operations around every SQL invocation, so the database rolls back to its previous state.

As its specific implementation strategy, JS-RCI uses jalangi2, whose shadow execution instruments the original JavaScript code, so the server events can be hooked dynamically. First, JS-RCI detects all (1) post declarations of global variables (g) and (2) pre/post Call Expressions of SQL statements (f). Then, it uses two customized shadow executions at (1): g′ = store(g) to serialize and store the state of all global variables, and restore(g, g′) to reset all global variables to their original values, hooked by restore HTTP commands. To restore the database state, JS-RCI uses shadow execution invoke(f, sql_stat), which invokes Call Expression of a SQL statement f with a new SQL clause as the argument. invoke(f, "Start TRANSACTION") and invoke(f, "ROLLBACK") are executed at pre and post invocations of f, respectively. JS-RCI executes these operations only once for the nested SQL invocations.

Figure 4: Overall process for Client Insourcing [diagram labels: Full-Stack JS App; Client and Server JS Code; Record & Replay REQ/RES traffics; REQ/RES Fuzzer; Normalization & Instrumentation; Restoring Server Init State; HTTP Cmd Position; Entry/Exit Points; Dependency Analysis (z3); Extracted Remote Functionality; Database-dep Code; AST Rewriter; Client JS Code for Remote Invocation; Equivalent Centralized Code; Re-Engineering]

4.2 Identifying the Relevant Server Code

One of the factors that complicates the Client Insourcing refactoring is that the code comprising the insourced functionality may not be confined to the boundaries of a single function or even the same script. While the entry point of the remote execution can be a JavaScript function, this function can invoke other functions or reference variables declared elsewhere. When insourcing a remote functionality, all this dependent code must be moved together to the client to create a self-sufficient local call that no longer relies on any server-based code.

To determine the data dependencies between the entry/exit points of a distributed application’s remote functionality, we draw on lessons provided by the state-of-the-art JavaScript analysis frameworks [17, 18, 46]. JSdep [46] logically hypothesizes a Data-Dep relation between JavaScript statements based on read/write facts, a points-to analysis model of GateKeeper [17], and a control flow analysis [18].

[…] clauses: the first clause expresses the dependent statements for the parameter marshalling statement, while the second clause expresses the dependent statements for the result unmarshalling statement, both specific to the server execution. Because the Data-Dep relation is transitive, one can obtain the executed statements from the entry/exit points, as expressed by the following set operations:

\[
\mathit{ExecutedStmts}(stmt_n, V^{uid}_{unMar}, V^{uid}_{Mar}) \leftarrow \big(\text{Data-Dep}(stmt_n, stmt_1) \wedge \mathit{Marshal}(stmt_1, v_1, V^{uid}_{Mar})\big) \wedge \big(\neg\,\text{Data-Dep}(stmt_n, stmt_2) \wedge \mathit{UnMarshal}(stmt_2, v_2, V^{uid}_{unMar})\big)
\]

4.3 Insourcing Database-Dependent Code

Our approach can also insource code that persists data in a relational database. To that end, we take advantage of the ubiquity of SQL. Recall that JS-RCI dynamically instruments string values used as arguments and return values in all function calls. To identify the entry point for database-related operations, JS-RCI examines the function calls whose string arguments represent the CRUD operations (Create, Read, Update, and Delete). Consider the code snippet in Figure 1. JS-RCI detects that the following Call Expression
For instance, an assignment statement Assign becomes is a READ operation, as it is a SQL SELECT statement: db.query("SELECT * FROM recipes WHERE id=id", function(result)..); a fact that implies Read and Write relations for the variables in- volved. Read and Write on the same variable between different Although the server and the client are written in JavaScript and statements imply a Data-Dep relation at the statement level. their respective database engines accept the same SQL statements, the JavaScript APIs of these engines differ. So it would be impossible Assign(stmt 1 ,v 1 ,v 2 ) //var v 1 = v 2 ; v is variable, stmt 1 is statement to simply move this Call Expression and its dependent statements Write(stmt 1 ,v 1 ) ← Assign(stmt 1 v 1 ,v 2 ) (e.g., var db = require('../utilities/SQL');) to the client. Hence, JS- Read(stmt ,v 2 ) ← Assign(stmt ,v 1 ,v 2 ) Data-Dep(stmt 1 ,stmt 2 ) ← Read(stmt 1 ,v 1 ) ∧ Write(stmt 2 ,v 1 ) RCI adapts the server-side database API to that of the client rather ... than copying the database-specific statements verbatim. With these API calls translated, developers can simply migrate the server-side We extend JSdep’s knowledge base with the rules and facts, nec- data schema and tables. Notice that database engines store their essary to model the execution of middleware-based statements. In data in dissimilar proprietary formats. particular, we define the UnMarshal/Marshal rules to identify As a specific example, consider how JS-RCI translates the data- the entry and exit points, whose Write clauses are inferred from base API calls of MySQL7 to those of alasql 8 . the logged profiling data. 
To that end, JS-RCI encodes the Ref facts By extracting the arguments and return values of function calls, by using the logged values to symbolically copy the unmarshalled/- u id u id JS-RCI extracts table names and their columns, thereby inferring a marshalled values (VunMar /VMar , uid is an unique execution id complete data schema of the insourced code. Extracting the actual such as "J5ga2") into the local variables as follows: table content requires a different approach, as the WHERE clause //the entry point at the server and numerical functions, such as COUNT, return only a subset u id UnMarshal(stmt 1 ,vunM ar , VunM ) ← u id ar of table rows. To retrieve all database data, JS-RCI instruments Write(stmt 1 ,vunM ar ) ∧ Ref(vunM ar ,VunM ar ) the server code by using the shadow execution invoke(db.query, //the exit point at the server “SELECT * FROM recipes”), which is introduced in Section 4.1.2. To u Marshal(stmt 1 ,v M ar , VMid ar ) ← infer the database schema from the extracted entries, JS-RCI uses uid Write(stmt 1 ,v M ar ) ∧ Ref(v M ar ,VM ar ) tableschema-py9 . Finally, JS-RCI uses the CREATE and INSERT Based on the resulting knowledge base, JS-RCI can query the 7 https://github.com/mysqljs/mysql executed statements stmtn for the presence of unmarshalled/mar- 8 https://github.com/agershun/alasql shalled values. Predicate ExecutedStmts is a conjunction of two 9 https://github.com/frictionlessdata/tableschema-py 184 Client Insourcing WWW ’20, April 20–24, 2020, Taipei, Taiwan commands with alasql to create tables and insert the extracted data Table 1: Subject Distributed Apps and Client Insourcing Results into them, respectively, for the client-side database. 
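The database steps described above (recognizing CRUD calls from their SQL strings, then recreating tables from extracted rows) can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions, not JS-RCI's implementation: the helper names `classifyCrud`, `inferColumnType`, `toSqlLiteral`, and `buildClientSideSql` are hypothetical, the type inference is a crude stand-in for what the paper delegates to tableschema-py, and the emitted type names (INT, FLOAT, VARCHAR(255)) are generic SQL choices made for this example.

```javascript
// Illustrative sketch only; all helper names are hypothetical, not JS-RCI APIs.

// Map the leading SQL keyword of a query string to a CRUD category.
function classifyCrud(sql) {
  const keyword = sql.trim().split(/\s+/)[0].toUpperCase();
  const crud = { INSERT: "CREATE", SELECT: "READ", UPDATE: "UPDATE", DELETE: "DELETE" };
  return crud[keyword] || null; // null: not a CRUD statement
}

// Guess a generic SQL column type from the JavaScript values seen in one column.
function inferColumnType(values) {
  const present = values.filter((v) => v !== null && v !== undefined);
  if (present.length > 0 && present.every((v) => Number.isInteger(v))) return "INT";
  if (present.length > 0 && present.every((v) => typeof v === "number")) return "FLOAT";
  return "VARCHAR(255)";
}

// Render one value as a SQL literal (strings single-quoted, quotes doubled).
function toSqlLiteral(value) {
  if (value === null || value === undefined) return "NULL";
  if (typeof value === "number") return String(value);
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build CREATE TABLE and INSERT statements from rows captured by a
// full-table SELECT (assumed to be an array of plain objects).
function buildClientSideSql(table, rows) {
  if (rows.length === 0) return [];
  const columns = Object.keys(rows[0]);
  const columnDefs = columns.map(
    (c) => `${c} ${inferColumnType(rows.map((r) => r[c]))}`
  );
  const create = `CREATE TABLE ${table} (${columnDefs.join(", ")})`;
  const inserts = rows.map(
    (r) => `INSERT INTO ${table} VALUES (${columns.map((c) => toSqlLiteral(r[c])).join(", ")})`
  );
  return [create, ...inserts];
}

const extractedRows = [
  { id: 1, name: "Pancakes", rating: 4.5 },
  { id: 2, name: "Mom's pie", rating: 5 },
];
console.log(classifyCrud("SELECT * FROM recipes WHERE id=1"));
console.log(buildClientSideSql("recipes", extractedRows).join("\n"));
```

The generated statements are plain SQL text, so in principle they could be handed to whichever client-side engine is in use; how they are executed is deliberately left out of the sketch.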
5 EVALUATION

To determine how feasible and useful our approach is, we conduct an empirical evaluation driven by the following questions:

• RQ1. Effort Saved by Client Insourcing: How much programmer effort is saved by applying JS-RCI? We measure the saved effort as the number of lines of code that would need to be copied and modified by hand. JS-RCI saves this effort by automating these manual source code changes. (Section 5.2)
• RQ2. Correctness of Client Insourcing: Does Client Insourcing preserve the business logic of full-stack JavaScript applications? Are existing standard use-cases still applicable to the centralized variants of the subject applications? (Section 5.3)
• RQ3. Value for Adaptive Tasks: How much redundant code can Client Insourcing eliminate by replicating only the necessary remote functionality? Are our centralized variants amenable to being redistributed with a third-party automated distribution tool? (Section 5.4)
• RQ4. Value for Perfective Tasks: How suitable are the centralized variants of distributed subjects for isolating and removing common performance bottlenecks? How much does Client Insourcing reduce the task complexity as compared to the original debugging process? (Section 5.5)

5.1 Evaluation Setup

To evaluate our approach, we have applied it to insource 61 different remote executions of 10 full-stack JavaScript applications [6, 9, 11, 12, 30, 36–38, 44, 47]. Table 1 summarizes the information about invoking these remote functionalities for each application. These remote services differ in their HTTP methods (e.g., GET, POST, PUT), types of parameters, return results, and business logic.

To confirm that our approach is widely applicable, we selected as our evaluation subjects open-source full-stack JavaScript applications with dissimilar HTTP frameworks used to implement their client (Tier 1), server (Tier 2), and database (Tier 3) parts. Tier 1: JQuery, Ajax, fetch, axios, AngularJS, and Angular2-TS; Tier 2: Express, koa.js, and Restify; Tier 3: MySQL, Postgres, and knex.js.

Table 1: Subject Distributed Apps and Client Insourcing Results

| Subject App (Tier 1 ↔ Tier 2 ↔ Tier 3) | HTTP Methods | Remote Service | C&P / M (ULOC) |
|---|---|---|---|
| recipebook (AngularJS ↔ Express ↔ MySQL) | GET | /recipes | 22/45 |
| | GET/PUT/POST/DEL | /recipes:id | 72/172 |
| | POST | /ingredients | 25/48 |
| | GET/PUT/DEL | /ingredients:id | 74/207 |
| | POST | /directions | 26/57 |
| | GET/PUT/DEL | /directions:id | 60/130 |
| DonutShop (Ajax ↔ Express ↔ knex) | GET/POST | /donuts | 22/88 |
| | GET/POST/DEL | /donuts:id | 29/155 |
| | GET/POST | /employee | 20/71 |
| | GET/POST/DEL | /employee:id | 29/138 |
| | GET/POST | /shops | 16/83 |
| | GET/DEL | /shops:id | 19/128 |
| res-postgresql (axios ↔ restify ↔ Postgres) | GET/POST | /user | 22/71 |
| | GET/PUT/DEL | /user | 40/120 |
| med-chem-rules (fetch ↔ koa.js ↔ knex) | GET | /hbone | 9971/9994 |
| | GET | /molecular | 9974/9997 |
| theBrownNode (JQuery ↔ Express) | GET | /users/search | 37/65 |
| | GET | /users/search/id | 36/64 |
| Bookworm (AngularJS ↔ Express) | GET | /api/ladywithpet | 394/409 |
| | GET | /api/thedea | 394/409 |
| | GET | /api/theredroom | 394/409 |
| | GET | /api/thegift | 394/409 |
| | GET | /api/wallpaper | 394/409 |
| | GET | /api/offshore | 394/409 |
| | GET | /api/bigtripup | 394/409 |
| | GET | /api/amont | 394/409 |
| realty-rest (Angular2 ↔ Express) | GET | /properties | 284/297 |
| | GET | /properties:id | 287/300 |
| | GET | /brokers | 86/99 |
| | GET | /brokers:id | 90/103 |
| | GET/POST/DEL | /prprts/favs | 34/73 |
| | POST | /prprts/likes | 291/304 |
| ConferenceApp (Angular2 ↔ Express) | GET | /findAllSpeakers | 13/66 |
| | GET | /findSpeakerById | 15/68 |
| | GET | /findAllSessions | 43/117 |
| | GET | /findSessionById | 46/119 |
| EmployeeDir (Angular2 ↔ Express) | GET | /employees | 22/44 |
| | GET | /employees/id | 38/60 |
| shopping-cart (Angular2 ↔ Express) | GET/POST/DEL | /cart-items | 79/130 |
| Total | 61 | | 24.9K/26.6K |

5.2 Saving Effort with Client Insourcing

Although developers can insource remote components by hand, the resulting program transformations can quickly become laborious and error-prone, especially for functionalities scattered across multiple script files and database-dependent code appearing in non-JavaScript files. Hence, the value of JS-RCI lies in automating the transformations required to insource these components. With JS-RCI completely automating the refactoring, the programmer would not have to modify any code by hand. To estimate the effort saved by JS-RCI, we use the ULOC (Uncommented Lines of Code) that would have to be copied at the server and pasted to the client as well as the ULOC that would have to be modified at the client for each remote service. Thus, the modified client code (M) includes the copied/pasted code (C&P). For the 61 remote services of the 10 applications, JS-RCI eliminates the need to modify by hand as many as 26,685 ULOC of client code in total, of which 20,073 ULOC are database code.

5.3 Correctness of Client Insourcing

The applicability of JS-RCI hinges on whether Client Insourcing preserves the execution semantics (i.e., business logic) of the refactored applications, a property we refer to as correctness. A subject application's original and refactored versions are expected to successfully pass the same test cases. Some of the tests that come with our subjects are also distributed, invoking server-side functionalities through HTTP middleware. To use their remote parameters and results as test invariants, we manually transformed these tests for local execution without middleware. Altogether we ran 61 test cases against the original and insourced versions of our subject applications, with all of them successfully passing. It is possible that for some complex or esoteric cases, the correctness of Client Insourcing would not be as stellar, but by examining why a test case failed, the programmer can then correct the insourced code.

5.3.1 The Effectiveness and Correctness of Detecting the Marshalling Points. Recall that in Section 4, we proposed two search strategies, Idempotent Execution and Fuzzing, to detect the marshalling points of a refactored application. To compare and contrast the effectiveness and correctness of these strategies, we ran our analysis procedure with each of these strategies in isolation.
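As a reminder of what the Idempotent Execution strategy amounts to, the following JavaScript sketch snapshots a server's global state and restores it before every replay, so each replay of a state-changing request observes the same initial state. It is a simplified illustration under stated assumptions: the names `store`, `restore`, and `replayIdempotently` echo the paper's notation but are hypothetical helpers written for this example, the state is assumed to be JSON-serializable, and the real JS-RCI performs these steps through Jalangi2 shadow execution and SQL transactions rather than direct function calls.

```javascript
// Illustrative sketch only; these helpers are hypothetical, not JS-RCI code.

// Serialize a snapshot of the global state (the role of g' = store(g)).
// Assumes the state is JSON-serializable.
function store(globals) {
  return JSON.stringify(globals);
}

// Reset the live state object in place from a snapshot (the role of restore(g, g')).
function restore(globals, snapshot) {
  for (const key of Object.keys(globals)) delete globals[key];
  Object.assign(globals, JSON.parse(snapshot));
}

// Replay a handler several times, restoring the state before each run,
// so that every replay starts from the same initial state.
function replayIdempotently(globals, handler, request, times) {
  const snapshot = store(globals);
  const responses = [];
  for (let i = 0; i < times; i++) {
    restore(globals, snapshot);
    responses.push(handler(globals, request));
  }
  restore(globals, snapshot); // leave the state as it was found
  return responses;
}

// A stateful toy handler: deleting an entry changes what later requests see.
function deleteFavorite(state, request) {
  const index = state.favorites.indexOf(request.id);
  if (index === -1) return { status: 404 };
  state.favorites.splice(index, 1);
  return { status: 200, remaining: state.favorites.length };
}

const serverState = { favorites: [1, 2, 3] };
const responses = replayIdempotently(serverState, deleteFavorite, { id: 2 }, 3);
console.log(responses.map((r) => r.status)); // [ 200, 200, 200 ]
console.log(serverState.favorites); // [ 1, 2, 3 ]
```

Without the restore step, the second replay of the same DELETE request would return 404 instead of 200, which is the kind of divergence between record and replay that the strategy is meant to rule out.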
We observed that Idempotent Execution with its Record/Replay phases removes the false negatives in the detected marshalling points for stateful servers. Our results show that subject applications with only safe (or read-only) operations are not affected by the restoring process (20/61). However, we discovered that idempotent execution is critical for the majority of our subjects (41/61). Specifically, having been changed by HTTP PUT/POST/DELETE requests, global variables were restored correctly in realty-rest and database entries were restored in other subjects.

In contrast, Fuzzing removes false positives in detecting marshalling points. We discovered that Fuzzing also proved effective in twelve cases of our subjects (12/61). Hence, to infer the correct set of marshalling points, while removing both false negatives and false positives, JS-RCI applies both strategies in turn.

Table 2: Correctness affected by Search Strategies

| Subject Apps | Stateless | Database | All | w/o Fuzzing | w/o Idem_Ex |
|---|---|---|---|---|---|
| theBrownNode | ✓ | X | 2/2 | 0/2 | 2/2 |
| Bookworm | ✓ | X | 8/8 | 0/8 | 8/8 |
| ConferenceApp | ✓ | X | 4/4 | 4/4 | 4/4 |
| EmployeeDir | ✓ | X | 2/2 | 2/2 | 2/2 |
| shopping-cart | X | X | 3/3 | 3/3 | 0/3 |
| realty-rest | X | X | 8/8 | 6/8 | 2/8 |
| recipebook | X | ✓ | 13/13 | 13/13 | 0/13 |
| DonutShop | X | ✓ | 14/14 | 14/14 | 0/14 |
| res-postgresql | X | ✓ | 5/5 | 5/5 | 0/5 |
| med-chem-rules | ✓ | ✓ | 2/2 | 2/2 | 2/2 |
| Total | | | 100% (61/61) | 80% (49/61) | 32% (20/61) |

5.4 Insourcing's Value for Adaptive Tasks

5.4.1 Value of Automated Enabling of Disconnected Operation. In lieu of Client Insourcing, developers would have to replicate remote functionalities by hand. Unassisted by program analysis, a programmer remains unaware which specific code entities comprise a remote functionality that needs to be replicated. Hence, a safe option for manually replicating any non-trivial remote functionality would be to first duplicate the entire server-side source file at the client, and then adapt the duplicated code as necessary. Notice that such copy-and-modify procedures invariably introduce some unnecessary code, which is never used but still needs to be deployed and maintained. Hence, in our evaluation, we count the number of lines of such unnecessary code that could result from copying the entire source file from the server to the client.

To identify the code portions that are indeed unnecessary to replicate the remote functionalities under consideration, we first count the total lines of JavaScript code taken to implement the original server parts of each subject app ($S_{LOC}$). To replicate all remote functionalities, programmers would copy $S_{LOC}$ to the client and adapt them as necessary. The copied $S_{LOC}$ are intermingled with various unnecessary parts, including middleware, fault handling, or no-longer relevant comments. The values of $S_{LOC}$ are computed by examining the programmer-written files and their dependencies deployed in the Node.js server. In contrast, Client Insourcing extracts from the server only the lines of code required to replicate the functionality for disconnected operation ($S_{LOC}^{CI}$). For simplicity, we assume that the entire remote functionality is replicated for each subject application. To estimate the number of lines of code that Client Insourcing saves from being replicated unnecessarily, we subtract $S_{LOC}^{CI}$ from $S_{LOC}$, as shown in Table 3.

Table 3: Replication

| Subject Apps | $S_{LOC}$ | $S_{LOC}^{CI}$ | $S_{LOC} - S_{LOC}^{CI}$ (Unnecessary LOC) |
|---|---|---|---|
| theBrownNode | 120 | 76 | 44 |
| Bookworm | 340 | 299 | 41 |
| realty-rest | 457 | 420 | 37 |
| ConferenceApp | 78 | 51 | 27 |
| EmployeeDir | 56 | 35 | 21 |
| shopping-cart | 48 | 26 | 22 |
| recipebook | 624 | 376 | 126 |
| DonutShop | 455 | 308 | 147 |
| res-postgresql | 73 | 28 | 45 |
| med-chem-rules | 10228 | 9976 | 252 |

5.4.2 Value of Centralized Variants for Redistribution. Client Insourcing creates a redistributable (centralized) application variant that can be refactored and enhanced using any state-of-the-practice program transformation tools and then distributed anew using any state-of-the-art distribution tools. We applied two JavaScript refactoring tools on our centralized variants: Node-SandBox¹⁰ for security enhancements and extremeJS [49] for redistribution. Node-SandBox prevents untrusted JavaScript code from executing infinite loops or consuming large volumes of heap memory in the isolated code. However, sandboxing frameworks incur a heavy performance penalty on the isolated code, and as such must be used sparingly, if the application is to remain usable. Hence, the code to sandbox is typically isolated from the rest of the application to run in its own process and address space. extremeJS automatically distributes centralized JavaScript applications at the function level of granularity.

\[
App_{dist} \xrightarrow{\text{JS-RCI}} App_{cent} \xrightarrow{\text{node-SandBox}} App_{cent}^{sandboxed} \xrightarrow{\text{extremeJS}} App_{dist}^{sandboxed} \quad (\text{Client Stub} \rightarrow \text{Remote Stub})
\]

[Figure 5: Redistribution with Sandboxing (y-axis: seconds). Panels: ConferenceApp, Bookworm, Shopping-cart, EmployeeDir, theBrownNode, ionic2-realty-rest; each panel compares Before, SandBox_part, and SandBox_all.]

In our evaluation, we measure the additional execution time incurred by sandboxing only a subset of the remote functionality vs. the entire original remote functionality. This comparison highlights the importance of isolating only the code that needs to be sandboxed. Figure 5 shows by how much sandboxing increases the execution time for two versions of the subject applications: (1) only the needed subset of the server part is isolated (SandBox_part); (2) the entire server part is isolated (SandBox_all). The observed differences in execution time between these two versions are quite striking, clearly showing that sandboxing the entire server part is impractical.

¹⁰ Node-SandBox (https://github.com/patriksimek/vm2)

5.5 Insourcing's Value for Perfective Tasks

Consider the problem of identifying the source of a performance inefficiency or bottleneck in a distributed app. First, one has to be able to exclude misconfiguration and network volatility from the potential causes. Then, one has to make sure the app is free of known architectural anti-patterns [39]. For example, consecutive fine-grained remote invocations can be batched to take advantage of the better progress being made in increasing the bandwidth as compared to the latency characteristics of modern networks [34]. However, the sources of inefficiency can be more subtle than those stemming from ill-conceived architectural decisions. At some point, the debugging focus may need to switch to the programmer-written code. The JavaScript ecosystem features numerous libraries, so the same functionality can be implemented in a variety of ways, each of which may have its own performance characteristics. Choosing one programming idiom over another can have a dramatic effect on the overall app performance [14, 15]. Given the divergent performance characteristics of different JavaScript APIs, several prior work directions have focused on identifying and removing common sources of inefficiency. The approach presented in [41] empirically identifies recurring patterns of inefficient program performance, so they can be restructured, thereby improving the overall performance. That kind of restructuring is a common example of perfective modifications. However, the majority of the state-of-the-art approaches that identify and remove performance inefficiencies target centralized programs. Client Insourcing can make these approaches applicable to distributed apps.

[Figure 6: Scatter Plot and Regression Test for $\Delta T_{dist}$ versus $\Delta T_{cent}$. Axes: $\Delta T_{dist}$ (%) and $\Delta T_{cent}$ (%); points are labeled by remote service and grouped into "Inefficient loop" and "misused APIs".]

We applied the approach presented in [41] to the centralized variants of our subject apps produced by means of Client Insourcing. Out of 61 subjects, 11 ended up containing some known patterns of performance inefficiency. For example, Bookworm repetitively misused unoptimized string API patterns: data.split("...").join("asdf").split(".").join("asdf"). By taking the network and middleware functionality out of the list of suspected causes of performance problems, Client Insourcing enables so-called "isolated profiling," which isolates the programmer-written code to be used as the sole target of analysis and optimization efforts. To demonstrate the value of Client Insourcing, we removed all the pointed-out 11 performance bottlenecks from both the original subjects and their centralized variants. As it turns out, the bottleneck removals improved the performance of both versions (distributed and centralized) of each subject. Figure 6 summarizes the observed performance improvements. For the original distributed subjects ($\Delta T_{dist}$), the improvements range between 29.5% and 2.0%. For their centralized variants ($\Delta T_{cent}$), the improvements range between 34.8% and 1.6%. We also applied a linear regression analysis to compute how closely $\Delta T_{dist}$ and $\Delta T_{cent}$ correlate with each other, resulting in $\Delta T_{cent} = 1.0089 \cdot \Delta T_{dist} + 1.556$. This equation shows that $\Delta T_{cent}$ and $\Delta T_{dist}$ are almost perfectly correlated, so centralized variants can indeed serve as reliable and convenient proxies for an important class of performance debugging and optimization tasks.

In addition, Client Insourcing reduces the complexity of the debugging process by streamlining the debugged subject's execution flow: from the complexity of distributed execution over the Web to the simplicity of centralized execution. To quantify the actual value of debugging the centralized variant of a web application instead of its original distributed version, we compared the total execution time taken by invoking distributed functionalities vs. their local insourced counterparts. We assumed that the debugging task was identifying performance bottlenecks, so we heavily instrumented our benchmarks before measuring their execution performance. As it turns out, insourcing reduces a distributed functionality's execution time by more than 90% on average. Given that debugging typically involves repeated executions, having much faster subjects to debug should improve the efficiency of the debugging process.

5.6 Threats to Validity

The validity of our evaluation results is subject to both internal and external threats that we discuss in turn next.

Internal Threats. One of our evaluation criteria is the performance of the JavaScript code generated by our implementation of the Client Insourcing refactoring. The performance of JavaScript code is known to be heavily affected by specific design and implementation choices. Similarly, our own JavaScript coding practices are likely to have affected the observed performance characteristics. For example, rather than directly injecting the insourced code segments into the client source files, we chose to create brand new source files for each insourced language declaration, with the new files simply included in the original files. Client Insourcing could have been implemented in a variety of other ways, possibly yielding different software engineering and performance metrics.

External Threats. All our performance measurements were performed on a Dell OptiPlex 5050, running the JavaScript V8 Engine (v6.11.2). Due to the popularity of JavaScript, the issue of maximizing the efficiency of JavaScript engines has come to the forefront of system design [41]. Although V8 is a state-of-the-art JavaScript engine, it has its competitors, such as SpiderMonkey. Hence, the absolute performance of our experiments could differ if our measurements were run in a different execution environment.

6 DISCUSSION

Our approach works only with relational databases accessed by means of SQL queries. Some non-SQL databases, such as MongoDB, use a distinct syntax in their client APIs. It should be possible to support the dissimilar CRUD operations of non-SQL databases, and we plan to explore such support as a future work direction.

For various reasons, some remote functionalities cannot be insourced to run on the client, thus making it impossible to create a centralized variant of certain distributed applications. In those cases in which distribution is inevitable, some application resources, naturally remote to the rest of the functionality, cannot change their locality. For instance, news readers display the stories deposited to some centralized repository. It would be impossible to move the news functionality away from the repository to the client, without manually creating some mock components that realistically emulate the appearance of news content locally. In other words, some remote functionalities may depend on resources that cannot be easily migrated away from their host environment for reasons that include relying on server-specific APIs or being dependent on some hard-to-move infrastructure components.

In addition to standard commands, HTTP also provides a separate WebSocket interface that opens a dedicated TCP connection after a round-trip handshake. WebSocket-based communication is fundamentally asynchronous and is used mostly in streaming scenarios. Although Client Insourcing can also help in the re-engineering of web apps that use WebSocket for non-streaming scenarios, we left the support for this part as a future work direction.

Some web applications may span more than two tiers. Our reference implementation assumes a two-tier client-server application with a possible server-side SQL database, in which both tiers are implemented in JavaScript. It should be possible to extend Client Insourcing to multi-tier applications, perhaps by applying the two-tier technique pairwise to each respective pair of tiers. At the same time, flattening tiers may not work well for mobile execution environments, which are known to be resource-scarce.

7 RELATED WORK

Several prior approaches conquer the complexity introduced by middleware functionality through abstraction and modeling techniques. A dynamic analysis platform analyzes full-stack JavaScript applications by abstracting away middleware communication, so it can be emulated in dynamic profiling scenarios [8]. Alimadadi et al. [1] study implicit relations between asynchronous and event-driven entities that are spread over the client and server sides of a distributed execution. JS-RCI is unique in its ability to remove the no-longer-necessary middleware functionality and compute the server-side dependent source code, which may not even be declared in the same source file as the insourced functionality's entry point.

Several recent techniques automatically integrate portions of a program's source in another program, with systems such as CodeCarbonCopy [45] and Scalpel [5] supporting this functionality for C/C++ programs. However, these works studied how to integrate two independent centralized programs.

Our reference implementation of Client Insourcing, JS-RCI, relates to advanced program analysis techniques for JavaScript, due to its target domain: full-stack JavaScript applications. The JavaScript language constructs for programming event-based applications that communicate asynchronously (i.e., callbacks, promises) have been statically analyzed via formal reasoning based on a calculus [27, 28]. Existing dynamic analysis tools [22, 42] are known to scale poorly to whole JavaScript program analysis. In dynamic symbolic execution (DSE), a program is executed with symbolic values in place of concrete input values [40]. MultiSE [43] effectively generates testing input values of a JavaScript program by using a value summary in Jalangi2 to speed up dynamic symbolic execution. JS-RCI is related to re-engineering tools that automatically transform apps [4, 13, 21, 24, 25]. Guided by zero-knowledge proofs, the ZØ compiler [13] preserves user privacy by splitting existing code into multi-tier apps. Cloud offloading [21] improves the energy efficiency of an app by splitting it into the client and server.

JavaScript debugging is an active research area [3, 20, 48, 50]. BLeak [48] and MemInsight [20] identify memory leaks by checking for sustained memory growth patterns between consecutive executions. JSweeter [50] detects performance bottlenecks caused by JavaScript type mutations. However, these tools work only with centralized JavaScript apps that are run on a single V8 engine.

8 FUTURE WORK AND CONCLUSION

We designed and implemented our approach with the assumption that it would be applied to monolingual execution environments, such as that of full-stack JavaScript applications. However, many modern distributed applications are multilingual, with the client and server parts written in different languages, often quite dissimilar. It might be possible to extend Client Insourcing to such multilingual environments by supplementing our design with automatic cross-language translation. In other words, to extend the applicability of Client Insourcing to multilingual distributed applications, one can build upon our design and reference implementation by adding a cross-language translation component to the last phase of the refactoring process [2]. This new component would automatically translate the insourced code from the server language to that of the client. When invoking the insourced translated code, the differences between the calling conventions would have to be reconciled.

We have presented an approach that facilitates the profiling, adaptation, and securing of full-stack JavaScript applications. The approach is enabled by Client Insourcing, a novel automated refactoring that integrates remote functionalities with local code, thereby creating a semantically equivalent centralized variant of a distributed application. We showed how this centralized version can be analyzed and modified more easily than its distributed counterpart, to be then redistributed automatically with all the modifications in place. The pervasiveness of distribution highlights the need for novel automated techniques for the re-engineering of web applications, and Client Insourcing can potentially become a useful building block for such techniques.

Availability. Our reference implementation and all benchmarks are publicly available at: https://github.com/kjproj84/JS-RCI.

ACKNOWLEDGMENTS

This research is supported by the NSF through the grants # 1650540 and 1717065.

REFERENCES

[1] Saba Alimadadi, Ali Mesbah, and Karthik Pattabiraman. 2016. Understanding Asynchronous Interactions in Full-stack JavaScript. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). 1169–1180.
[2] Kijin An. 2019. Facilitating the Evolutionary Modifications in Distributed Apps via Automated Refactoring. In Web Engineering. Springer International Publishing, 548–553.
[3] Kijin An and Eli Tilevich. 2019. Catch & Release: An Approach to Debugging Distributed Full-Stack JavaScript Applications. In Web Engineering. 459–473.
[4] Kijin An and Eli Tilevich. 2020. D-Goldilocks: Automatic Redistribution of Remote Functionalities for Performance and Efficiency. In 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 251–260.
[5] Earl T. Barr, Mark Harman, Yue Jia, Alexandru Marginean, and Justyna Petke. 2015. Automated Software Transplantation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). 257–269.
[6] Bookworm. 2019. https://github.com/davidwoodsandersen/Bookworm.
[7] Eric J Byrne. 1992. A conceptual foundation for software re-engineering. In Proceedings of the Conference on Software Maintenance. 226–235.
[8] Laurent Christophe, Coen De Roover, Elisa Gonzalez Boix, and Wolfgang De Meuter. 2018. Orchestrating Dynamic Analyses of Distributed Processes for Full-stack JavaScript Programs. In Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2018). 107–118.
[9] ConfApp. 2019. https://github.com/tkssharma/Ionic-conferenceApp.
[10] Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337–340.
[11] Donuts. 2019. https://github.com/VinniiOtchkov/Donuts.
[12] EmployeeDir. 2019. https://github.com/ccoenraets/employee-directory-services.
[13] Matthew Fredrikson and Benjamin Livshits. 2014.
[26] Qingzhou Luo, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. An Empirical Analysis of Flaky Tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014). 643–653.
[27] Magnus Madsen, Ondřej Lhoták, and Frank Tip. 2017. A Model for Reasoning About JavaScript Promises. Proceedings of the ACM on Programming Languages OOPSLA (Oct. 2017), 86:1–86:24.
[28] Magnus Madsen, Frank Tip, and Ondřej Lhoták. 2015. Static Analysis of Event-driven Node.Js JavaScript Applications. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). 505–519.
[29] Josip Maras, Jan Carlson, and Ivica Crnkovi. 2012. Extracting Client-side Web Application Code. In Proceedings of the 21st International Conference on World Wide Web (WWW '12). 819–828.
[30] med-chem rules. 2019. https://github.com/acarl005/med-chem-rules.
[31] James Mickens, Jeremy Elson, and Jon Howell. 2010. Mugshot: Deterministic Capture and Replay for Javascript Applications. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI'10). 11–11.
[32] Marija Mikic-Rakic and Nenad Medvidovic. 2006. A classification of disconnected operation techniques. In 32nd EUROMICRO Conference on Software Engineering and Advanced Applications (EUROMICRO'06). IEEE, 144–151.
[33] Kivanç Muşlu, Bilge Soran, and Jochen Wuttke. 2011. Finding Bugs by Isolating Unit Tests. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE '11). 496–499.
[34] David A Patterson. 2004. Latency lags bandwidth. Commun. ACM (2004).
[35] Jelica Protic, Milo Tomasevic, and Veljko Milutinovic. 1996. Distributed shared memory: Concepts and systems. IEEE Parallel & Distributed Technology: Systems & Applications (1996).
[36] realty-rest. 2019. https://github.com/ccoenraets/ionic2-realty-rest.
ZØ: An Optimizing Distribut- [37] recipebook. 2019. https://github.com/9bitStudios/recipebook. ing Zero-Knowledge Compiler. In 23rd USENIX Security Symposium (USENIX [38] res-postgresql. 2019. https://github.com/u4bi-sev/node-postgresql. Security 14). 909–924. [39] Ganesh Samarthyam, Girish Suryanarayana, and Tushar Sharma. 2016. Refac- [14] Liang Gong, Michael Pradel, and Koushik Sen. 2015. JITProf: Pinpointing JIT- toring for software architecture smells. In Proceedings of the 1st International unfriendly JavaScript Code. In Proceedings of the 2015 10th Joint Meeting on Workshop on Software Refactoring. ACM, 1–4. Foundations of Software Engineering (ESEC/FSE 2015). 357–368. [40] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, [15] Liang Gong, Michael Pradel, Manu Sridharan, and Koushik Sen. 2015. DLint: and Dawn Song. 2010. A symbolic execution framework for JavaScript. In 2010 Dynamically Checking Bad Coding Practices in JavaScript. In Proceedings of IEEE Symposium on Security and Privacy. IEEE, 513–528. the 2015 International Symposium on Software Testing and Analysis (ISSTA 2015). [41] M. Selakovic and M. Pradel. 2016. Performance Issues and Optimizations in 94–105. JavaScript: An Empirical Study. In 2016 IEEE/ACM 38th International Conference [16] Marco Guarnieri, Petar Tsankov, Tristan Buchs, Mohammad Torabi Dashti, and on Software Engineering (ICSE). 61–72. David Basin. 2017. Test execution checkpointing for web applications. In Pro- [42] Koushik Sen, Swaroop Kalasapur, Tasneem Brutch, and Simon Gibbs. 2013. ceedings of the 26th ACM SIGSOFT International Symposium on Software Testing Jalangi: A Selective Record-replay and Dynamic Analysis Framework for and Analysis. 203–214. JavaScript. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software [17] Salvatore Guarnieri and Benjamin Livshits. 2009. GATEKEEPER: Mostly Static Engineering (ESEC/FSE 2013). 488–498. 
Enforcement of Security and Reliability Policies for JavaScript Code. In USENIX [43] Koushik Sen, George Necula, Liang Gong, and Wontae Choi. 2015. MultiSE: Security Symposium. 78–85. Multi-path Symbolic Execution Using Value Summaries. In Proceedings of the [18] Salvatore Guarnieri, Marco Pistoia, Omer Tripp, Julian Dolby, Stephen Teilhet, 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015). and Ryan Berg. 2011. Saving the World Wide Web from vulnerable JavaScript. In [44] shopping cart. 2019. https://github.com/ComeAlongErica/full-stack-express-lab- Proceedings of the 2011 International Symposium on Software Testing and Analysis. shopping-cart. ACM, 177–187. [45] Stelios Sidiroglou-Douskos, Eric Lahtinen, Anthony Eden, Fan Long, and Martin [19] Michael Hilton, Arpit Christi, Danny Dig, Michał Moskal, Sebastian Burckhardt, Rinard. 2017. CodeCarbonCopy. In Proceedings of the 2017 11th Joint Meeting on and Nikolai Tillmann. 2014. Refactoring local to cloud data types for mobile apps. Foundations of Software Engineering (ESEC/FSE 2017). 95–105. In Proceedings of the 1st International Conference on Mobile Software Engineering [46] Chungha Sung, Markus Kusano, Nishant Sinha, and Chao Wang. 2016. Static and Systems. 83–92. DOM Event Dependency Analysis for Testing Web Applications. In Proceedings of [20] Simon Holm Jensen, Manu Sridharan, Koushik Sen, and Satish Chandra. 2015. the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software MemInsight: platform-independent memory debugging for JavaScript. In Pro- Engineering (FSE 2016). 447–459. ceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. [47] theBrownNode. 2019. https://github.com/clintcparker/theBrownNode. [21] Y. Kwon and E. Tilevich. 2012. Energy-Efficient and Fault-Tolerant Distributed [48] John Vilk and Emery D Berger. 2018. BLeak: automatically debugging memory Mobile Execution. 
In 2012 IEEE 32nd International Conference on Distributed leaks in web applications. In Proceedings of the 39th ACM SIGPLAN Conference on Computing Systems. 586–595. Programming Language Design and Implementation. 15–29. [22] Guodong Li, Esben Andreasen, and Indradeep Ghosh. 2014. SymJS: Automatic [49] Xudong Wang, Xuanzhe Liu, Ying Zhang, and Gang Huang. 2012. Migration and Symbolic Testing of JavaScript Web Applications. In Proceedings of the 22nd ACM execution of JavaScript applications between mobile devices and cloud. In Pro- SIGSOFT International Symposium on Foundations of Software Engineering (FSE ceedings of the 3rd annual conference on Systems, programming, and applications: 2014). 449–459. software for humanity. 83–84. [23] Mario Linares-Vásquez, Kevin Moran, and Denys Poshyvanyk. 2017. Continuous, [50] Xiao Xiao, Shi Han, Charles Zhang, and Dongmei Zhang. 2015. Uncovering evolutionary and large-scale: A new perspective for automated mobile app testing. JavaScript performance code smells relevant to type mutations. In Asian Sympo- In 2017 IEEE International Conference on Software Maintenance and Evolution sium on Programming Languages and Systems. 335–355. (ICSME). 399–410. [24] Yin Liu, Kijin An, and Eli Tilevich. 2018. RT-Trust: Automated Refactoring for Trusted Execution Under Real-Time Constraints. In Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE 2018). ACM, 175–187. [25] Yin Liu, Kijin An, and Eli Tilevich. 2020. RT-Trust: Automated refactoring for different trusted execution environments under real-time constraints. Journal of Computer Languages 56 (2020), 100939. https://doi.org/10.1016/j.cola.2019.100939 189