**Authors**
Elena Boldyreva
Vadim Kholoshnia

**License**
CC-BY-4.0

**The System for Finding the Least Resource-Intensive Path in Two- or Three-Dimensional Space Using Machine Learning**

Vadim Kholoshnia [0000-0001-6485-6340] and Elena Boldyreva [0000-0001-6357-4448]

ITMO University, 197101, St. Petersburg, Russia
vdkholoshnia@gmail.com, eaboldyreva@itmo.ru

**Abstract.** In the course of this research, a system for finding the least resource-intensive path in two- or three-dimensional space was developed, based on machine learning and including visualization of the algorithms' execution process. The universality of the system lies in the fact that the user determines the script for its execution: choosing the optimal learning algorithm, loading or creating the desired map of the area, and setting the logic of the object's movement. To demonstrate these features, a user interface was developed for the software implementation. To formulate user recommendations and simplify the choice of a machine learning algorithm, a software implementation visualizing the learning process of the genetic algorithm and of reinforcement learning was developed. In relation to the task of finding the least resource-intensive path in two- or three-dimensional space, the genetic algorithm and reinforcement learning were tested and compared, and recommendations on the choice of a learning algorithm were formulated on the basis of this comparative analysis.

**Keywords:** least resource-intensive path finding · shortest path finding · machine learning · genetic algorithm · reinforcement learning · visualization of learning

**1 Introduction**

Currently, there is an increasing need for systems that automatically find the least resource-intensive path in two- or three-dimensional space for objects without human intervention. One way to achieve this goal is to use machine learning methods such as the genetic algorithm and reinforcement learning.
Reinforcement learning is a machine learning method in which a test system (agent) learns by interacting with a certain environment. The genetic algorithm is a heuristic search algorithm used to solve optimization and modeling problems by randomly selecting, combining, and varying the desired parameters using mechanisms similar to natural selection. These algorithms can build paths through space given only information about the environment. They are also universal: their configurability allows them to be used on almost any map and under almost any conditions.

*Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).*

The purpose of this study is to reduce the cost of finding and traversing a path for an object with arbitrary movement logic on an arbitrary map. To achieve this goal, a least resource-intensive path finding algorithm was developed based on the NeuroEvolution of Augmenting Topologies (NEAT) [1] genetic algorithm, together with a software implementation using multi-threaded programming. For the case when the user needs information about the least resource-intensive paths from every possible starting point, the Q-Learning [2] reinforcement learning algorithm is used. The least resource-intensive path means not only the shortest path from the starting point to the goal, but also the path during which the object uses the least amount of resources (e.g. the time required to complete the path).

The novelty of the results is that, in contrast to existing systems, the developed least resource-intensive path finding system is adaptable.
That is, the user can adjust the script for the execution of the algorithms: choosing the optimal algorithm for training the neural network model, selecting the criteria by which the optimal path is chosen, creating the necessary map, and determining the logic of the object's movement. The least resource-intensive path finding algorithm based on reinforcement learning (Q-Learning), unlike existing ones, is implemented for training on a map of the area represented as a spatial grid of possible states and obstacles, which the user can specify. The visualization of the machine learning processes not only aids the process of formulating user recommendations, but also allows the presented system to be used for educational purposes, such as creating a virtual data science laboratory. In addition, the software implementation of the developed system allows a comparative analysis of the machine learning algorithms used, which is of scientific value in its own right.

**2 Literature review**

In the context of this study, the NEAT genetic algorithm and Q-Learning reinforcement learning were considered as the main machine learning algorithms.

The NeuroEvolution of Augmenting Topologies learning method (trial and error) is less resource-intensive (lower computational power usage), but no less accurate when constructing a path. This algorithm also often reaches effective networks faster than other modern neuroevolutionary methods and reinforcement learning methods [3, 4].

The Q-Learning method (a reinforcement learning algorithm) is more resource-intensive during learning (higher computational power usage), but its result is an array of possible states and actions that allows the initial position of the object to be chosen after training.

In addition, an analysis of existing algorithms and implementations of least resource-intensive path finding was carried out.
Because there are many different formulations of this problem, there are many algorithms for solving the shortest path finding problem, such as the Dijkstra algorithm [5], the Bellman-Ford algorithm [6], the A* algorithm [7], the Floyd-Warshall algorithm [8], the Johnson algorithm [9], the Lee algorithm [10] and others. Unlike the presented system, these shortest path finding algorithms assume the presence of an already generated weighted graph, or they build a path to the goal for each initial state. These algorithms also do not take into account the specifics of the object's movement logic.

**3 Content of research**

To develop and demonstrate the effectiveness and value of the least resource-intensive path finding system, the following steps must be completed:

1. Develop a functional structure (implementation) of the least resource-intensive path finding system;
2. Adapt and refine the presented system depending on the chosen learning algorithms, NEAT and Q-Learning;
3. Develop a software implementation of the shortest path finding algorithm;
4. Develop a user interface for the software implementation;
5. Analyze the results of the software implementation of the least resource-intensive path finding system and of the machine learning algorithms used.

**3.1 Development of the functional structure of the algorithm**

At this stage, the least resource-intensive path finding system was formed on the basis of the above machine learning methods. The algorithm diagram is shown in Fig. 1.

After launching the presented system, one of the learning algorithms must be chosen. In the context of this study, the NEAT genetic algorithm and Q-Learning reinforcement learning were chosen as the main machine learning algorithms, since each of them functions optimally only under certain conditions, which together cover the maximum range of user needs.
Fig. 1. Software implementation scheme. After system launch, the parameters are set (learning algorithm: NEAT or Q-Learning; dimension: 2D or 3D; mode: learn or check results). In the NEAT branch, the map is loaded or created; the mutation rate, number of layers, number of agents and autocompletion parameter are set; the agents move on the map, and the agent with the greatest "fitness" seeds a new agent layer. In the Q-Learning branch, the map is loaded or created; the discount coefficient, number of iterations and number of repetitions are set; the Q and R arrays are initialized, and the agent passes to the target from each starting position of the map. Both branches end with results output.

**3.2 Adaptation of the shortest path finding algorithm depending on the chosen learning algorithm**

*Implementation of the least resource-intensive path finding algorithm using the genetic algorithm (NEAT).*

The least resource-intensive path finding algorithm using the genetic algorithm (NEAT) works according to the following principle. When the program launches, each agent randomly generates an array of directions, represented as positions for movements formed from angles (the size of this array can be adjusted, for example, to limit the number of possible moves of the agents). After that, each agent starts moving in accordance with the elements of the directions array and stops as soon as it touches a user-defined area (e.g. an obstacle on the map), reaches the goal, or exhausts the elements of the directions array (the number of moves ends).
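As a sketch, the movement loop described above might look as follows in C++; the `Agent` structure, the `isObstacle` check, the unit step length and the goal radius are illustrative assumptions, not the system's actual code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Each agent holds a fixed-size array of movement angles and advances
// along it until it touches an obstacle, reaches the goal, or runs out
// of moves, as described in the text.
struct Agent {
    std::vector<double> directions;  // angles in radians, one per move
    double x = 0.0, y = 0.0;
    int steps = 0;
    bool reachedGoal = false;
};

bool isObstacle(double x, double y) {
    // Placeholder for the user-defined forbidden area: here, map bounds.
    return x < 0.0 || y < 0.0 || x > 25.0 || y > 25.0;
}

void run(Agent& a, double gx, double gy) {
    for (double angle : a.directions) {
        a.x += std::cos(angle);  // one unit step along the chosen angle
        a.y += std::sin(angle);
        ++a.steps;
        if (isObstacle(a.x, a.y)) return;            // touched an obstacle
        if (std::hypot(gx - a.x, gy - a.y) < 1.0) {  // close enough to goal
            a.reachedGoal = true;
            return;
        }
    }  // directions exhausted: the number of moves has ended
}
```

In the full system each agent's angles are generated randomly; here they would be supplied by the caller, which keeps the sketch deterministic.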
Then selection (on the principle of natural selection) of the best agents for further training occurs: the next generation is formed from the data of the previous generation's best agent and the mutation rate parameter. The larger this parameter, the more the directions arrays of the next generation's agents differ from the previous one's.

The "fitness" of an agent is set by a formula; for the presented algorithm, the agent that is closest to the target has the greatest "fitness" (1):

F = 1 / dist(g, a)²    (1)

where F is the "fitness", g is the goal position, a is the agent position, and dist(g, a) is the distance between the goal and the agent.

If one of the agents has reached the goal, then among the agents of the next generations the one that has made the fewest moves has the greatest "fitness" (2):

F = d + 1/s²    (2)

where F is the "fitness", d is the directions array size, and s is the number of steps made before reaching the goal.

The agents with the highest "fitness" are remembered as the best. The execution of the algorithm can be divided into stages, or generations. The more generations the algorithm goes through, the more accurate the result (the less resource-intensive the path) will be. After the "fitness" is determined, the agents of the next generation receive the properties of the best agent from the previous one, but with changes obtained during selection, depending on the mutation rate parameter. The first stage (generation) ends here. This process is repeated for subsequent generations until the goal is achieved. After the goal is reached, the "fitness" formula changes: the best agent is now the one that has made the fewest moves before reaching the goal (once at least one agent has reached the goal, the transition to the next generation takes place).
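The fitness rules (1) and (2) and the mutation step can be sketched as follows; the function names, the Gaussian perturbation of angles and the exact signatures are assumptions for illustration, not the paper's code:

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// (1) Before any agent reaches the goal: closer to the goal is fitter.
double fitnessByDistance(double gx, double gy, double ax, double ay) {
    double d = std::hypot(gx - ax, gy - ay);  // dist(g, a)
    return 1.0 / (d * d);
}

// (2) Once the goal has been reached: fewer steps is fitter
// (d is the directions-array size, s the steps taken to the goal).
double fitnessBySteps(int directionsSize, int steps) {
    return directionsSize + 1.0 / (double(steps) * steps);
}

// Next generation: copy the best agent's directions, perturbing each
// angle; a larger mutation rate means a larger deviation from the parent.
std::vector<double> mutate(const std::vector<double>& best,
                           double mutationRate, std::mt19937& rng) {
    std::normal_distribution<double> noise(0.0, mutationRate);
    std::vector<double> child = best;
    for (double& angle : child) angle += noise(rng);
    return child;
}
```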
The presented algorithm also offers the possibility of using several layers of agents. Each layer is independent of the others, which allows more possible paths to be considered. The layer whose best agent has the greatest "fitness" is considered the best. This feature is intended for maps on which there are several possible ways to build a path from the starting point to the goal.

The execution of the algorithm ends automatically according to the autocompletion parameter, which sets how many generations the agents must go through after reaching the goal. The user can also interrupt execution and save the results at any time.

*Implementation of the least resource-intensive path finding algorithm using reinforcement learning (Q-Learning).*

When the reinforcement learning algorithm is launched, an array of possible states and actions R is created, which encodes where the agent can and cannot go, as well as the location of the target on the map (numbering starts at zero and runs horizontally, top down, in ascending order). After that, a second array Q is created and filled with zeros. Then training takes place: in order for the agent to learn to find the least resource-intensive path from any position on the map, it needs to check each initial state (after the array R is created, an array of possible initial states is also created). The agent then leaves each state and calculates its next move by comparing the rewards for all possible subsequent actions. The array Q of reward weights is filled in accordance with formula (3) (the Markov decision process [11]):

Q(s, a) = r(s, a) + γ · max Q(s′, a′)    (3)

where s is the current state, a is the next action, and γ is the discount coefficient, used to balance immediate and future rewards; typically its value ranges from 0.8 to 0.99.
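A minimal sketch of update (3) over the R and Q arrays described above; the reward encoding (negative entries mark impossible moves, a positive entry marks the move into the goal state) and the use of the action index as the next grid cell are assumptions for illustration:

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Maximum reward over the actions available from `state`.
// Q values are non-negative in this encoding, so 0 is a safe floor.
double maxQ(const Matrix& Q, int state) {
    double best = 0.0;
    for (double q : Q[state]) if (q > best) best = q;
    return best;
}

// One application of formula (3): Q(s,a) = r(s,a) + gamma * max Q(s',a').
// On a grid numbered cell by cell, the action index doubles as the next
// state s'.
void updateQ(Matrix& Q, const Matrix& R, double gamma,
             int state, int action) {
    if (R[state][action] < 0.0) return;  // move not allowed
    Q[state][action] = R[state][action] + gamma * maxQ(Q, action);
}
```

Sweeping this update over every state-action pair for the configured number of iterations fills Q, after which the best action from any cell is simply the one with the largest stored reward.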
The discount coefficient controls the speed of learning: the larger this parameter, the faster the neural network learns, but the more errors it can make. Here s′ is the next state after the next action, a′ is the next action after the next state, and max Q(s′, a′) returns the maximum reward for an action available from state s′.

After training is completed, the array Q stores the rewards of the possible actions for each state. The user can enter any initial state and obtain the least resource-intensive path. Such an algorithm can be used for devices that operate on a static map, for example, a robot vacuum cleaner: during the first scan of the apartment the algorithm composes the array R and is trained, and on subsequent trips the vacuum cleaner already has a ready array Q of rewards that allows it to instantly obtain the least resource-intensive path to the goal (the dock station) from any starting location, and that also improves with every trip.

Based on the results of this stage, it was decided to use both described learning algorithms when solving the least resource-intensive path finding problem, since together they cover almost all possible path finding needs and environmental conditions.

**3.3 Software implementation of the algorithm**

For the visualization of the menu, the C++ Windows Forms API was used. The proposed least resource-intensive path finding algorithm on a two-dimensional map is implemented in the C++ programming language. To visualize the operation of the algorithm, the SFML graphics library was used (Fig. 2).

Fig. 2. Visualization of the learning algorithm on a two-dimensional map

To visualize a three-dimensional map, the Unity engine was used; here the algorithm is implemented in the C# programming language (Fig. 3).

Fig. 3.
Visualization of the learning algorithm on a three-dimensional map

An algorithm for creating two-dimensional and three-dimensional maps is also included in the software implementation. It allows the necessary environmental conditions to be created, a map to be loaded from a graphic file (supported formats: BMP, PNG, TGA, JPG, GIF, PSD, HDR, PIC), and 3D objects to be loaded for use during training (supported formats: FBX, DAE, 3DS, DXF, OBJ, SKP). Data files in the CSV format simplify editing with table editors when necessary.

To represent dynamic arrays, pointers, and multithreading, the standard libraries were used, which avoids possible errors and memory leaks. Unity.Jobs was used to distribute processor power: this library not only takes care of the details and pitfalls of working with threads, but also provides maximum performance when using multi-threaded programming in the Unity engine.

Implementation in C-like languages, as well as splitting the software implementation into separate files from which only the desired algorithm can be selected, simplifies embedding it into other systems.

For the genetic algorithm, the user can choose the speed of training and, accordingly, the accuracy of the result. The speed and accuracy of training can be regulated by changing the number of agents: the more agents, the higher the learning speed, as long as the number of agents does not require more computing power than is available. The multi-threaded programming technology used splits processor power across separate threads, one per layer, so as long as the number of layers does not exceed the number of threads supported by the processor, there will be no performance issues.

For reinforcement learning, the user can select the learning speed by adjusting the parameter γ. The larger the parameter, the higher the learning speed, but
also the probability of missing a useful learning outcome.

To make it possible to use a spatial grid as a map, an algorithm for representing the map as a two-dimensional array of possible states and actions was added.

**3.4 Analysis of the results**

Based on the results of the software implementation, a comparative analysis of the two training algorithms was carried out. The analysis data are presented in Table 1.

Table 1. Comparative analysis of the presented machine learning algorithms

| | Q-Learning (map resolution: 25×25) | NEAT (map resolution: 80×80) |
|---|---|---|
| Ability to choose an arbitrary starting point after training | + | − |
| Adjustable parameters | Obstacles on the map, map resolution, discount coefficient, number of iterations, number of repetitions | Obstacles on the map, map resolution, mutation rate, number of layers, number of agents, size of directions array, autocompletion parameter |

The complexity of the map affects the runtime of the reinforcement learning algorithm: as the number of obstacles on the map increases, the execution time decreases, since the number of possible initial positions of the agent decreases. Due to the structure of the genetic algorithm, the complexity of the map increases the learning time, but with the correct settings the increase in the running time of the algorithm can be almost completely avoided.

Thus, the decision to combine the two training algorithms within the same least resource-intensive path finding system can be considered justified, since the user can choose a training algorithm suited to the given environment (map). Based on the results of the comparative analysis, the following recommendations were formed:

1. If an exact path is needed for a specific environment, the NEAT genetic algorithm is proposed; its result is an array of object movements for a specific starting point.
But when using the NEAT training method with the presented "fitness" formula, it must be taken into account that the starting position should not lie behind an obstacle that the agent would need to travel a greater distance to go around than the possible deviation regulated by the mutation rate parameter. To solve this problem, the mutation rate parameter can be increased, which may lead to a loss of accuracy for the same training time, or a system of additional rewards can be used, placed on the map and indicating to the agent how to get around the barrier. Using this logic, the user can also adjust the trajectory if necessary.

2. When using Q-Learning reinforcement learning, the user receives as a result a two-dimensional array of possible states with a reward for each action, which allows any available starting point on the map to be selected after training. However, a more accurate result requires more resources (the number of map cells must be increased, which entails an increase in the amount of memory used).

**4 Conclusion**

In the course of this research work:

– The system for finding the least resource-intensive path in two- or three-dimensional space using machine learning was developed. The universality of the algorithm lies in the fact that the user can choose the optimal learning algorithm, create the necessary map of the area, and set the parameters.
– An effective software implementation was developed. Its effectiveness is achieved through multi-threaded programming and processor power distribution using the built-in thread control libraries, as well as the use of modern methods and standard libraries.
– A software implementation visualizing the learning process of the genetic algorithm and reinforcement learning for least resource-intensive path finding was developed.
The user interface was developed and tested for convenient user interaction with the software implementation of the algorithm.
– The genetic algorithm and reinforcement learning were tested and compared. Based on the comparative analysis, recommendations on the choice of a learning algorithm were formulated.

**References**

1. Stanley, K.O.: Efficient Evolution of Neural Networks Through Complexification. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin (2004)
2. Watkins, C.J.C.H.: Learning from Delayed Rewards. Ph.D. thesis, University of Cambridge (1989)
3. Stanley, K.O., Miikkulainen, R.: Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation (2002)
4. Taylor, M.E., Whiteson, S., Stone, P.: Comparing Evolutionary and Temporal Difference Methods in a Reinforcement Learning Domain. In: Proceedings of the Genetic and Evolutionary Computation Conference (2006)
5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Dijkstra's algorithm. In: Introduction to Algorithms, 2nd ed. MIT Press and McGraw-Hill (2001)
6. Bellman, R.: On a routing problem. Quarterly of Applied Mathematics 16 (1958)
7. Zeng, W., Church, R.L.: Finding shortest paths on real road networks: the case for A*. International Journal of Geographical Information Science 23(4), 531-543 (2009)
8. Floyd, R.W.: Algorithm 97: Shortest Path. Communications of the ACM 5(6) (1962)
9. Johnson, D.B.: Efficient algorithms for shortest paths in sparse networks. Journal of the ACM 24 (1977)
10. Lee, C.Y.: An Algorithm for Path Connections and Its Applications. IRE Transactions on Electronic Computers EC-10 (1961)
11. Bellman, R.: A Markovian Decision Process. Indiana University Mathematics Journal 6(4), 679-684 (1957)