{"paper_id":"22fba95c-1d29-4b2a-a261-43b657293f0b","body_text":"Research Article A Scalable Machine Learning Strategy for Resource Allocation Database Fady Nashat Manhary, Marghny H Mohamed, Mamdouh Farouk This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5424573/v1 This work is licensed under a CC BY 4.0 License Status: Posted Version 1 posted Abstract Efficiently responding to dynamic application demands in cloud environments is crucial for meeting service level agreements (SLAs) and optimizing resource costs. Traditional auto-scaling approaches often struggle with predefined rules, making it challenging to devise optimal adaptation strategies. 
This paper introduces a proactive strategy that leverages long short-term memory (LSTM) networks for precise request prediction, complemented by multi-agent reinforcement learning (MARL) to determine optimal actions for scaling virtual machines. In the proposed methodology, the LSTM predicts the number of requests in the next time step, adapting to dynamic traffic changes. The integration of MARL enhances the adaptability and efficiency of the auto-scaling process by enabling virtual machines to make informed decisions based on real-time states. This study asserts that applying MARL as a fundamental component of the auto-scaling strategy is a promising and effective solution. The synergy between LSTM and MARL-based Ape-X not only enhances predictive accuracy but also empowers virtual machines to make proactive decisions, making it a valuable approach for meeting SLAs and optimizing resource utilization in dynamic cloud environments.
Keywords: Auto-scaling, Cloud Resources, Long Short-term Memory (LSTM), Multi-agent Reinforcement Learning (MARL), Service Level Agreements (SLAs), Virtual Machines.
1. Introduction
The proliferation of Internet services has led to the adoption of cloud computing, which offers companies the flexibility to rent resources for specific durations through the Internet. This paradigm is based on a pay-per-use economic model, allowing users to access computing services on demand and scale their service requirements up or down as needed [1] [2]. Cloud computing provides a shared infrastructure that enables web-based value-added services, with three predominant service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [3]. IaaS utilizes servers, storage, and virtualization to provide utility-like services, while PaaS offers access to APIs and development middleware for custom application development. 
SaaS gives users access to software or services residing in the cloud [4]. Cloud computing offers benefits such as cost savings, agility, and simplicity, but also raises concerns regarding security, privacy, and integrity [5]. Despite these challenges, there is growing momentum behind the adoption of cloud computing. Optimizing resource scheduling in cloud computing involves balancing Service Level Agreements (SLAs) against cost. Over-provisioning can lead to resource waste, while under-provisioning risks SLA violations and system instability. The application provider, who is familiar with the application environment and quality of service (QoS) requirements, faces this intricate challenge [6]. This paper presents an Ape-X-based learning agent framework that integrates LSTM networks and Multi-Agent Reinforcement Learning for efficient resource scheduling. It aims to balance SLA requirements and resource costs by predicting incoming workloads and optimizing actions for scaling virtual machines (VMs) using historical resource utilization and the predicted results. The contributions of this paper are multi-fold: 1) Proposing a model that synergizes LSTM and MARL within an autonomic computing framework. 2) Customizing LSTM for precise prediction of incoming workloads based on historical data. 3) Applying MARL as a decision-making tool to achieve optimal actions for VM scaling. 4) Using Ape-X to enhance the LSTM networks and MARL, offering an efficient, distributed training framework with robust exploration, stability, and scalability. 5) Conducting experiments to evaluate the proposed approach’s performance under real-world workload traces and comparing it with alternative methods. The subsequent sections review prior auto-scaling work, detail the proposed approach, evaluate its performance through experiments, and conclude with avenues for future research. 2. 
Related Works
Efficient auto-scaling in cloud environments is critical to ensure optimal resource utilization while meeting Service Level Agreements (SLAs). This section reviews existing methodologies and approaches in the realm of auto-scaling, highlighting key insights and challenges encountered in prior research. Early auto-scaling approaches relied on rule-based strategies, applying static rules and predefined thresholds to allocate resources based on specific states. However, these methods struggle to adapt to the dynamic and nonlinear nature of cloud workloads, leading to suboptimal resource utilization and limited responsiveness to real-time changes. HRA, an intelligent resource auto-scaling framework for multi-service applications, addresses these challenges by utilizing model-based deep reinforcement learning (DRL) [7]. Another approach is a proactive auto-scaling method that employs a two-state machine learning Random Forest (RF) model to forecast future CPU and memory utilization values [8]. Additionally, TRIM is an auto-scaler that requires no system modeling and uses a novel heuristic optimization technique called MOAT to precompute resource allocations based on popular workload patterns [9]. These approaches aim to improve resource allocation and responsiveness in auto-scaling systems. Machine learning techniques have been explored to enhance auto-scaling capabilities by using historical data to train models that predict future resource demands. However, many ML approaches struggle to capture the temporal dependencies inherent in cloud workloads, leading to difficulties in adapting to changing states and accurately forecasting resource requirements [10]. To address this issue, a novel LSTM method called CEEMDAN-LM-LSTM has been proposed, which is optimized with Logistic Maps (LM) and preprocesses the input dataset with Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [11]. 
Additionally, meta-learning techniques have been shown to enable training machine learning interatomic potentials (MLIPs) on multiple levels of quantum mechanical (QM) theory, allowing for better specialization to new datasets [12]. These advancements in ML can help improve the accuracy and reliability of resource demand forecasting in cloud environments. Reinforcement learning (RL) has been studied for solving the demand and capacity balancing (DCB) problem in air traffic management, but existing RL-based methods face challenges in generalization and optimization performance [13]. RL has seen limited use in computer vision tasks; the work in [14] proposes an RL framework for complex, partially observable, large-scale environments, specifically for road extraction from satellite images. RL assumes a measurement of the environment at each time step, but in certain applications the cost of measuring the environment can be high. The work in [15] surveys literature on RL agents that avoid costly measurements and proposes a Deep Dynamic Multi-Step Observation Less Agent (DMSOA) that outperforms existing alternatives. RL frameworks often struggle to generalize to real-world scenarios; the work in [16] addresses the challenge of incremental RL by introducing a Dual-Adaptive ϵ-greedy Exploration (DAE) framework that efficiently learns unseen transitions in new environments. RL has also been applied to fine-tuning Large Language Models (LLMs) for text generation, where RL with guided feedback (RLGF) algorithms have been reported to outperform supervised learning and default RL baselines [17]. LSTM networks have been successfully applied to workload prediction in auto-scaling, improving the accuracy of forecasts for resource demands [18]. However, the integration of LSTM with reinforcement learning for dynamic resource allocation in multi-agent cloud environments remains an underexplored area [19]. 
Motivation for LSTM and MARL Integration: Given the limitations of rule-based methods, the challenges with existing machine learning techniques, and the potential synergies between LSTM and reinforcement learning, this study proposes an integration of LSTM networks and MARL. This combined approach aims to leverage LSTM’s ability to capture temporal patterns and MARL’s strengths in facilitating adaptive, collaborative decision-making among cloud agents. By examining these related works, the paper positions its proposed methodology within the broader landscape of auto-scaling in cloud environments. Subsequent sections describe the integration of LSTM and MARL-based Ape-X, the experimental methodology, the results, and a discussion of the contributions made in this study.
3. Proposed Approach
3.1. Initialize Phase
The initialization algorithm sets the initial states and variables of the resource allocation system. With LSTM, MARL, and Ape-X integrated, this involves setting up several components, such as models, environments, memory, and policies.
Algorithm I: Initialization algorithm
fun. Init_ResAllocSys()
    Init_LSTM_M()          // initialize LSTM for resource prediction
    Env()                  // initialize MARL environment for resource allocation
    Init_Mem()             // initialize memory for LSTM and Ape-X
    Init_ExPlcy()          // initialize exploration policy for MARL
    Init_DDPG_Ag()         // initialize DDPG agent with LSTM model, memory, and exploration policy
    Init_ApeX_Ag()         // initialize Ape-X agent with DDPG agent, LSTM model, environment, and hyperparameters
    Init_ResAllocPolicy()  // initialize resource allocation policy
fun. Init_LSTM_M()
    // initialize the LSTM model
    lstm_m = Sequential()
    // add LSTM layer components
    lstm_m.compile(optimizer = 'adam', loss = 'mean_squared_error')
fun. Env()
    // initialize the MARL environment
    marl_env = MARLEnvironment()
    // set up environment variables
fun.
Init_Mem()
    // initialize memory for storing experiences
    memory = SequentialMemory(limit = 100000, window_length = time_steps)
fun. Init_ExPlcy()
    // initialize exploration policy for MARL
    exploration_policy = OrnsteinUhlenbeckProcess(size = n_ags, theta = 0.15, mu = 0, sigma = 0.3)
fun. Init_DDPG_Ag()
    // initialize DDPG agent with LSTM model, memory, and exploration policy
    ddpg_ag = DDPG_Ag(
        mdl = lstm_m,
        nb_actions = n_actions,
        memory = memory,
        nb_steps_warmup_critic = 100,
        nb_steps_warmup_actor = 100,
        random_process = exploration_policy,
        gamma = 0.99,
        target_m_update = 1e-3
    )
fun. Init_ApeX_Ag()
    // initialize Ape-X agent with DDPG agent, LSTM model, environment, and hyperparameters
    apex_ag = ApeX_Ag(
        agent = ddpg_ag,
        mdl = lstm_m,
        env = marl_env,
        n_agg = 5,
        replay_start_size = 1000,
        train_every = 10,
        b_size = 32,
        n_epochs = 5
    )
fun. Init_ResAllocPolicy()
    // initialize the resource allocation policy or system-specific variables
    // This might involve setting up rules, constraints, or initial configurations for resource allocation
Begin
    Init_ResAllocSys() // call the initialization function
End
3.2. Observing phase
The observing phase involves observing and collecting data during the execution of the system. This data can then be used for analysis, decision-making, and potentially for further training of the agents (see Algorithm II).
Algorithm II: Pseudocode for the observing phase
fun.
ObserveResAllocSys()
    step = 0
    for iteration in range(n_iterations):
        state = marl_env.reset() // reset the environment for a new iteration
        total_allocated_resources = 0
        done = False
        while not done:
            // Observe resource demand
            ObserveResourceDemand(state)
            // Predict resource demand using the LSTM model
            resource_demand = lstm_m.predict(state)
            // Select a resource allocation action from the Ape-X agent
            res_alloc = apex_ag.choice_action(state)
            // Apply the resource allocation in the environment
            next_state, reward, done, _ = marl_env.step(res_alloc)
            // Observe performance metrics
            ObservePerformanceMetrics(res_alloc, reward, state, next_state)
            // Update resource allocation policy based on feedback
            UpdateAllocationPolicy(res_alloc, reward)
            // Store transition in memory
            apex_ag.store_transition(state, res_alloc, reward, next_state, done)
            // Perform Ape-X updates every train_every steps
            step = step + 1
            if step % apex_ag.train_every == 0:
                apex_ag.train()
            // Move on to the next state
            state = next_state
fun. ObserveResourceDemand(state)
    // Implementation for observing resource demand
    // Data on the current state and resource demand patterns may be collected here
fun. ObservePerformanceMetrics(res_alloc, reward, state, next_state)
    // Implementation for observing performance metrics
    // Collect and analyze data related to SLA violations, VMs allocated, CPU utilization, etc.
fun. UpdateAllocationPolicy(res_alloc, reward)
    // Implementation for updating the resource allocation policy based on feedback
    // This could involve reinforcement learning techniques
Begin
    ObserveResAllocSys() // call the observing function
End
3.3. Analysis phase
The analysis phase involves examining the data gathered during the observing phase to gain insights, make decisions, and potentially refine the resource allocation system.
Algorithm III: Pseudocode for the analysis phase
fun.
AnalyzeResAllocData()
    for iteration in range(n_iterations):
        // Retrieve data gathered during the observing phase for the current iteration
        resource_demand_data = GetResourceDemandData(iteration)
        performance_metrics_data = GetPerformanceMetricsData(iteration)
        agent_feedback_data = GetAgentFeedbackData(iteration)
        // Perform analysis and decision-making based on the gathered data
        AnalyzeResourceDemand(resource_demand_data)
        AnalyzePerformanceMetrics(performance_metrics_data)
        AnalyzeAgentFeedback(agent_feedback_data)
        // Potentially update the system or model variables based on the analysis
        UpdateSystemVariables()
fun. GetResourceDemandData(iteration)
    // Implementation to retrieve resource demand data for the specified iteration
    // This might involve querying a database or accessing stored data
fun. GetPerformanceMetricsData(iteration)
    // Implementation to retrieve performance metrics data for the specified iteration
    // This might involve querying a database or accessing stored data
fun. GetAgentFeedbackData(iteration)
    // Implementation to retrieve agent feedback data for the specified iteration
    // This might involve querying a database or accessing stored data
fun. AnalyzeResourceDemand(resource_demand_data)
    // Implementation to analyze resource demand patterns
    // This could involve statistical analysis, trend identification, etc.
fun. AnalyzePerformanceMetrics(performance_metrics_data)
    // Implementation to analyze performance metrics
    // This could involve identifying trends, detecting anomalies, etc.
fun. AnalyzeAgentFeedback(agent_feedback_data)
    // Implementation to analyze agent feedback data
    // This could involve assessing the effectiveness of resource allocation policies, etc.
fun. UpdateSystemVariables()
    // Implementation to update system or model variables based on the analysis
    // This could involve adjusting hyperparameters, updating policies, etc.
Begin
    AnalyzeResAllocData() // call the analysis function
End
3.4. Planning phase
This study proposes the use of Ape-X, a distributed reinforcement learning algorithm that utilizes prioritized experience replay [20] [21]. Ape-X DDPG is an off-policy RL algorithm that can be combined with Dynamic Experience Replay (DER) to further enhance training efficiency [22] [23]. DER allows RL algorithms to use experience replay samples from both human demonstrations and successful transitions generated by the RL agents during training, resulting in shorter training times and improved performance in challenging environments [24]. Ape-X DDPG with DER has been successfully applied to robotic tasks, such as peg-in-hole and lap-joint assembly, and has shown superior performance compared to vanilla Ape-X DDPG. Ape-X (Distributed Prioritized Experience Replay) is an algorithm that builds upon the Deep Q-Network (DQN) framework. The key idea behind Ape-X is to use a distributed architecture to parallelize the training process and improve sample efficiency. The main components of Ape-X include experience replay, target networks, and a central prioritized replay buffer. The update equation for Ape-X is similar to the DQN update equation, with the addition of prioritized experience replay. The Q-learning update rule for Ape-X can be expressed as follows [25]:
Q(s_t, a_t) ← (1 − α) · Q(s_t, a_t) + α · (r_t + γ · max_a′ Q(s_{t+1}, a′))
Here, the terms are defined as follows:
· Q(s_t, a_t): The current estimate of the Q-value for taking action a_t in state s_t.
· (1 − α) · Q(s_t, a_t): The current estimate, weighted by a factor of 1 − α, where α is the learning rate.
· α · (r_t + γ · max_a′ Q(s_{t+1}, a′)): The update term, which combines the immediate reward r_t and the discounted maximum Q-value of the next state s_{t+1}.
· α: Learning rate, which determines the weight given to the new information compared to the existing estimate.
· r_t: Immediate reward obtained after taking action a_t in state s_t.
· γ: Discount factor, which determines the importance of future rewards. It is a value between 0 and 1.
· max_a′ Q(s_{t+1}, a′): The maximum Q-value over all possible actions a′ in the next state s_{t+1}.
In the context of Ape-X, prioritized experience replay involves assigning a priority to each experience in the replay buffer based on the magnitude of its TD-error (Temporal Difference error). Experiences with higher TD-errors are sampled with higher probability during training, which prioritizes the more informative experiences. The planning phase involves applying the insights gained from the analysis phase to make informed decisions and formulate strategies for improving the resource allocation system.
Algorithm IV: Pseudocode for the planning phase with MARL and Ape-X integration
fun. PlanResAllocImprovements()
    for iteration in range(n_planning_iterations):
        // Perform analysis to identify areas for improvement
        analysis_results = PerformAnalysisForPlanning()
        // Create improvement strategies based on the analysis
        Enhance_stratg = FormulateImprovementStrategies(analysis_results)
        // Implement planned improvements in the resource allocation system
        ImpImprovStrategies(Enhance_stratg)
fun. PerformAnalysisForPlanning()
    // Execution of analysis, particularly for planning
    // This might involve revisiting certain metrics, identifying new patterns, etc.
    analysis_results = {}
    return analysis_results
fun. FormulateImprovementStrategies(analysis_results)
    // Implementation to formulate improvement strategies based on analysis results
    // This could involve adjusting policies, updating models, etc.
    Enhance_stratg = {}
    return Enhance_stratg
fun.
ImpImprovStrategies(Enhance_stratg)
    // Put the planned enhancements into practice in the resource allocation system
    // This could involve retraining models, changing system variables, etc.
Begin
    PlanResAllocImprovements() // call the planning function
End
3.5. Execution Phase
In this phase, the trained models, including the Multi-Agent Reinforcement Learning (MARL) policies and the Ape-X learner, were deployed in the simulation environment to assess their real-time performance. Deployment involved integrating the trained models with the simulated cloud computing system. To evaluate the models' performance, extensive data was gathered during the execution phase. This data includes: 1) Workload Data: The synthetic workload dataset simulates dynamic changes in resource demands. 2) Environment Observations: System states, CPU utilization, memory usage, and other relevant metrics during execution. 3) Agent Actions: Actions taken by each agent in response to the observed environment. The simulations were then run for a predetermined number of iterations, each comprising multiple time steps. During each time step, the agents selected actions based on their policies, and the environment evolved accordingly. Performance metrics were continuously observed and recorded.
4. Experimental Setup
In this section, we detail the setup for conducting our experiments, including the hardware and software settings, the datasets used, and the parameter settings.
4.1. Hardware Configuration
The experiments were conducted on a cluster of machines to simulate a cloud computing environment. The hardware configuration is as follows:
· Processors: Intel Xeon E5-XXXX series, 2.5 GHz, 16 cores
· Memory: 128 GB RAM
· Network: 1 Gbps Ethernet
4.2. 
Software Configuration
The software stack for conducting the experiments included:
· Operating System: Ubuntu 20.04
· Deep Learning Framework: TensorFlow 2.5.0
· Reinforcement Learning Libraries: OpenAI Gym, Stable Baselines 3
· LSTM Implementation: Keras with TensorFlow backend
4.3. Dataset
The experiments utilized a synthetic workload dataset generated to simulate dynamic changes in resource demands. The dataset includes information on CPU utilization, memory usage, and incoming requests at regular time intervals.
4.4. Experimental Design
1) LSTM: LSTM used for time-series forecasting of CPU usage.
2) LSTM-DQN: Integration of LSTM for workload prediction and DQN for optimal action selection.
3) LSTM-Ape-X: Combination of LSTM for workload prediction and Ape-X for RL-based optimal action selection.
4.5. Variables
The hyperparameters for the LSTM, DQN, and Ape-X algorithms were set based on preliminary experiments and existing literature. Key hyperparameters include the learning rate, discount factor, exploration-exploitation trade-off, and LSTM sequence length.
· Learning Rate: 0.001
· Discount Factor (γ): 0.9
· Exploration-Exploitation Trade-off (ϵ): 0.1
· LSTM Sequence Length: 10 time steps
4.6. Experimental Procedure
The experiments were conducted as follows:
· The system was initialized, and the baseline and proposed strategies were deployed.
· The synthetic workload dataset was introduced, simulating dynamic changes in resource demands.
· Performance metrics, including SLA Violation, VMs Allocated, CPU Utilization, System Utilization, Resource Efficiency, and Stability, were measured at regular intervals.
· Experiments were repeated to ensure robustness and reproducibility of results.
4.7. Statistical Analysis
To validate the significance of the results, statistical tests such as t-tests and ANOVA were applied to compare the performance of different strategies.
5. 
Experimental Results and Discussion
In this section, we present the results of our experiments comparing different strategies for resource allocation, namely LSTM, LSTM with RL based on DQN (LSTM-DQN), and LSTM with RL based on Ape-X (LSTM-Ape-X).
5.1. Performance Metrics
We assessed the strategies based on several key performance metrics:
· SLA Violation (Delay Time)
· VMs Allocated
· CPU Utilization
· System Utilization
· Resource Efficiency
· Stability
5.2. Results
Our experiments revealed the following observations.
Table 1: SLA Violation (Delay Time)
Strategy | SLA Violation (Delay Time) | Mean (ms) | Std. Dev.
LSTM-Ape-X | Low | 5.2 | 1.1
LSTM-DQN | Moderate | 12.5 | 2.3
LSTM | Moderate to High | 18.8 | 3.5
Table 2: VMs Allocated
Strategy | VMs Allocated | Mean (%) | Std. Dev.
LSTM-Ape-X | High | 25 | 5
LSTM-DQN | Moderate | 18 | 3
LSTM | Moderate | 20 | 4
Table 3: CPU Utilization
Strategy | CPU Utilization | Mean (%) | Std. Dev.
LSTM-Ape-X | High | 85 | 8
LSTM-DQN | Moderate to High | 75 | 10
LSTM | Moderate | 70 | 12
Table 4: System Utilization
Strategy | System Utilization | Mean (%) | Std. Dev.
LSTM-Ape-X | High | 80 | 7
LSTM-DQN | Moderate to High | 75 | 9
LSTM | Moderate | 70 | 10
Table 5: Resource Efficiency
Strategy | Resource Efficiency | Mean (%) | Std. Dev.
LSTM-Ape-X | High | 85 | 7
LSTM-DQN | Moderate to High | 75 | 8
LSTM | Moderate | 70 | 9
Table 6: Stability
Strategy | Stability | Mean (%) | Std. Dev.
LSTM-Ape-X | High | 90 | 5
LSTM-DQN | Moderate | 80 | 7
LSTM | Moderate | 75 | 8
Discussion
The statistical analysis supports the observations, indicating that LSTM-Ape-X consistently outperforms the other strategies across various metrics, exhibiting lower SLA violations, higher VM allocation, and improved system and resource utilization. Further investigations into parameter tuning and scalability considerations may provide additional insights. The findings underscore, with statistical significance, the potential of combining LSTM with reinforcement learning algorithms for robust resource management in dynamic computing environments. 
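As a rough illustration of how the summary statistics in Table 1 could feed the kind of t-test mentioned in Section 4.7, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom from means and standard deviations alone. The number of repeated runs per strategy is not reported in this paper, so N_RUNS = 10 is a purely hypothetical value, and the function name welch_t_from_stats is introduced here for illustration and is not part of the authors' implementation.

```python
import math


def welch_t_from_stats(mean_a, std_a, n_a, mean_b, std_b, n_b):
    # Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    # computed from summary statistics (sample means and sample std. devs.).
    var_a = std_a ** 2 / n_a
    var_b = std_b ** 2 / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# ASSUMPTION: the number of repeated runs is not reported; 10 is hypothetical.
N_RUNS = 10

# Mean and std. dev. of the SLA violation delay (ms) as reported in Table 1.
APE_X = (5.2, 1.1)
DQN = (12.5, 2.3)

t_stat, dof = welch_t_from_stats(
    APE_X[0], APE_X[1], N_RUNS, DQN[0], DQN[1], N_RUNS
)
print(f't = {t_stat:.2f}, dof = {dof:.1f}')
```

A strongly negative t under this assumption would be consistent with the lower delay reported for LSTM-Ape-X. Turning it into a p-value requires the t-distribution CDF (for example from a statistics library) and, above all, the actual number of runs.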
Conclusion and Future Work
In conclusion, our study has successfully demonstrated the effectiveness of the Multi-Agent Reinforcement Learning (MARL)-enhanced approach based on Ape-X for auto-scaling cloud resources. This was manifested through improved virtual machine stability and a significant reduction in Service Level Agreement (SLA) violations. These positive outcomes not only validate the viability of our approach but also pave the way for compelling future research endeavors across several key areas. A primary focus of our future research will be the exploration of efficient methodologies for heterogeneous task scheduling. As cloud environments often host a variety of applications with different computational requirements, optimizing task scheduling for such heterogeneity becomes crucial for overall system performance. This adaptability is vital for maintaining optimal performance under varying workloads and network conditions. This exploration aims to further enhance the accuracy of resource predictions in diverse and dynamic cloud environments. By leveraging the strengths of different models, we anticipate improvements in prediction accuracy and adaptability, contributing to the overall robustness of our auto-scaling approach. This rigorous assessment will provide valuable insights into the robustness and applicability of our approach under diverse workloads and network conditions. By addressing these future research directions, we aim to advance the adaptability, robustness, and versatility of our auto-scaling approach, positioning it as an evolving solution applicable across a broader spectrum of cloud computing scenarios.
References
Joel Gibson, Robin Rondeau, Darren Eveleigh, Qing Tan. (2012). Benefits and challenges of three cloud computing service models. doi: 10.1109/CASON.2012.6412402
Mohsin Nazir, Prashant Tiwari, Shakti Dhar Tiwari, Raj Gaurav Mishra. (2015). Cloud Computing: An Overview.
Ekaba Bisong. (2019). 
What Is Cloud Computing. doi: 10.1007/978-1-4842-4470-8_1
Carlos Rodríguez Monroy, Gregorio Carlos Almarcha Arias, Yilsy Núñez Guerrero. (2012). The new cloud computing paradigm: the way to IT seen as a utility.
Salauddin Dhali, Annabella Loconsole, Edward Blurock. (2015). A study on cloud computing adoption of small and medium enterprises. Master Thesis project, 30 ECTS credits, Spring 2015.
Anver Shahabdeen Rahumath, Santhosh Rajendran, N. MohanaSundaram, Abdul Rahiman Malangai. (2022). Cost-Efficient Deadline Constrained Scientific Workflow Scheduling in Infrastructure-as-a-Service Clouds by Disqualifying Tasks with Anomalies. Journal of Computer Science, doi: 10.3844/jcssp.2022.555.566
Trace-Driven Scaling of Microservice Applications. IEEE Access, doi: 10.1109/access.2023.3260069
A proactive energy-aware auto-scaling solution for edge-based infrastructures. doi: 10.1109/ucc56403.2022.00044
Supervisory Event Loop-based Autoscaling of Node.js Deployments. doi: 10.1109/hdis56859.2022.9991325
Alice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin S. Smith, Richard A. Messerly, Sergei Tretiak, Kipton Barros. (2023). Learning Together: Towards foundational models for machine learning interatomic potentials with meta-learning.
Ricardo Parizotto, B. L. Coelho, Israat Haque, Alberto Schaeffer-Filho. (2023). Offloading Machine Learning to Programmable Data Planes: A Systematic Survey. ACM Computing Surveys, doi: 10.1145/3605153
Sarunyoo Boriratrit, Rongrit Chatthaworn. (2023). Improvement of Long Short-Term Memory via CEEMDAN and Logistic Maps for the Power Consumption Forecasting. doi: 10.1109/ICACI58115.2023.10146172
Melissa Holstein. (2023). General multi-agent reinforcement learning integrating heuristic-based delay priority strategy for demand and capacity balancing. Transportation Research Part C: Emerging Technologies, doi: 10.1016/j.trc.2023.104218
Katherine Elizabeth Arden. (2023). 
Tractable large-scale deep reinforcement learning. Computer Vision and Image Understanding, doi: 10.1016/j.cviu.2023.103689
Dynamic Observation Policies in Observation Cost-Sensitive Reinforcement Learning. doi: 10.48550/arxiv.2307.02620
Wei Ding, Siyang Jiang, Hsi-Wen Chen, Ming Chen. (2023). Incremental Reinforcement Learning with Dual-Adaptive ε-Greedy Exploration. Proceedings of the ... AAAI Conference on Artificial Intelligence, doi: 10.1609/aaai.v37i6.25899
Jonathan Chang, Kianté Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun. (2023). Learning to Generate Better Than Your LLM. arXiv.org, doi: 10.48550/arXiv.2306.11816
Meysam Alizamir, Jalal Shiri, Ahmad Fakheri Fard, Sungwon Kim, Alireza Docheshmeh Gorgij, Salim Heddam, Vijay P. Singh. (2023). Improving the accuracy of daily solar radiation prediction by climatic data applying an efficient hybrid deep learning model: Long short-term memory (LSTM) network coupled with wavelet transform. Engineering Applications of Artificial Intelligence, doi: 10.1016/j.engappai.2023.106199
Offline Prioritized Experience Replay. doi: 10.48550/arxiv.2306.05412
Yang Yue, Bingyi Kang, Xiao Ma, Gao Huang, Shiji Song, Shuicheng Yan. (2023). Offline Prioritized Experience Replay. arXiv.org, doi: 10.48550/arXiv.2306.05412
Jieliang Luo, Hui Li. (2020). Dynamic Experience Replay. arXiv: Artificial Intelligence.
Jieliang Luo, Hui Li. (2020). Dynamic Experience Replay.
Longfei Zhang, Yang Feng, Rong Wang, Yueshan Xu, Naifu Xu, Zeyi Liu, Hang Du. (2023). Efficient experience replay architecture for offline reinforcement learning. doi: 10.1108/ria-10-2022-0248
Ape-X: Distributed Off-Policy Experience Replay, by Horgan et al. (2018)
Additional Declarations
No competing interests reported. 
{\"props\":{\"pageProps\":{\"initialData\":{\"identity\":\"rs-5424573\",\"acceptedTermsAndConditions\":true,\"allowDirectSubmit\":true,\"archivedVersions\":[],\"articleType\":\"Research Article\",\"associatedPublications\":[],\"authors\":[{\"id\":376636260,\"identity\":\"84d7dadf-a5c3-469d-a8de-24f0f92f2e93\",\"order_by\":0,\"name\":\"Fady Nashat Manhary\",\"email\":\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABBUlEQVRIiWNgGAWjYBAC9gYGAzDDACogByIOPMCjhecAmhZjsJYEUrQkNoBIvFrYD2+TYMyxyzNnP/tMgqGmLn1+2OGHQFvs5HQbcGjhSSuTYNyWXGzZk24mwXDscO7G22kGQC3JxmYHsGuxZ8gxA2phTtxwII1NgrHhQO7G2QkgLQcSt+HQwsP/BqSlPnHD+WcgLXXphrPTP+DXIgG25XDihhtgW5gT5KVzCNgi8azYInHbcaCWZ8wWCccOG26Qzik4kGCA2y88/Mkbb3zcVg10WBrjjQ81dfLys9M3f/hQYSeHSwsQsEgkIDMMwCoNcKkGA+YPKAz5BryqR8EoGAWjYAQCAPurYCb+fLsEAAAAAElFTkSuQmCC\",\"orcid\":\"\",\"institution\":\"Assiut 
University\",\"correspondingAuthor\":true,\"prefix\":\"\",\"firstName\":\"Fady\",\"middleName\":\"Nashat\",\"lastName\":\"Manhary\",\"suffix\":\"\"},{\"id\":376636262,\"identity\":\"6b5abd5c-f805-4267-ab32-450683d28191\",\"order_by\":1,\"name\":\"Marghny H Mohamed\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Assiut University\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Marghny\",\"middleName\":\"H\",\"lastName\":\"Mohamed\",\"suffix\":\"\"},{\"id\":376636271,\"identity\":\"4bb6a8a9-4e3c-448b-9335-ad10f0857203\",\"order_by\":2,\"name\":\"Mamdouh Farouk\",\"email\":\"\",\"orcid\":\"\",\"institution\":\"Assiut University\",\"correspondingAuthor\":false,\"prefix\":\"\",\"firstName\":\"Mamdouh\",\"middleName\":\"\",\"lastName\":\"Farouk\",\"suffix\":\"\"}],\"badges\":[],\"createdAt\":\"2024-11-10 06:38:34\",\"currentVersionCode\":1,\"declarations\":\"\",\"doi\":\"10.21203/rs.3.rs-5424573/v1\",\"doiUrl\":\"https://doi.org/10.21203/rs.3.rs-5424573/v1\",\"draftVersion\":[],\"editorialEvents\":[],\"editorialNote\":\"\",\"failedWorkflow\":false,\"files\":[{\"id\":69971580,\"identity\":\"9403ba7f-842d-4f4b-a1f4-8876176ad657\",\"added_by\":\"auto\",\"created_at\":\"2024-11-27 06:23:38\",\"extension\":\"pdf\",\"order_by\":0,\"title\":\"\",\"display\":\"\",\"copyAsset\":false,\"role\":\"manuscript-pdf\",\"size\":505104,\"visible\":true,\"origin\":\"\",\"legend\":\"\",\"description\":\"\",\"filename\":\"manuscript.pdf\",\"url\":\"https://assets-eu.researchsquare.com/files/rs-5424573/v1/360cd6a4-511c-4719-b5ce-c0780da4200c.pdf\"}],\"financialInterests\":\"No competing interests reported.\",\"formattedTitle\":\"A Scalable Machine Learning Strategy for Resource Allocation Database\",\"fulltext\":[{\"header\":\"1.\\tIntroduction\",\"content\":\"\\u003cp\\u003eThe proliferation of Internet services has led to the adoption of cloud computing, which offers companies the flexibility to rent resources for specific durations through the Internet. 
This paradigm is based on a pay-per-use economic model, allowing users to access computing services on demand and scale their service requirements up or down as needed [1] [2]. Cloud computing provides a shared infrastructure that enables web-based value-added services, with three predominant service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [3]. IaaS utilizes servers, storage, and virtualization to provide utility-like services, while PaaS offers access to APIs and development middleware for custom application development. SaaS gives users access to software or services residing in the cloud [4]. Cloud computing offers benefits such as cost savings, agility, and simplicity, but also raises concerns regarding security, privacy, and integrity [5]. Despite these challenges, there is growing momentum behind the adoption of cloud computing.\\u003c/p\\u003e\\n\\u003cp\\u003eThe challenge of optimizing resource scheduling in cloud computing involves balancing compliance with Service Level Agreements (SLAs) against minimizing costs. Over-provisioning can lead to resource waste, while under-provisioning risks SLA violations and system instability. The application provider, who is familiar with the application environment and quality of service (QoS) requirements, faces this intricate challenge [6].\\u003c/p\\u003e\\n\\u003cp\\u003eThis paper presents a learning agent framework based on Ape-X that integrates LSTM networks and multi-agent reinforcement learning (MARL) for efficient resource scheduling. 
It aims to balance SLA requirements and resource costs by predicting incoming workloads effectively and optimizing actions for scaling virtual machines (VMs) using historical resource utilization and predicted results.\\u003c/p\\u003e\\n\\u003cp\\u003eThe contributions of this paper are as follows:\\u003c/p\\u003e\\n\\u003cp\\u003e1)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Proposing a model that combines LSTM and MARL within an autonomic computing framework.\\u003c/p\\u003e\\n\\u003cp\\u003e2)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Customizing LSTM for precise prediction of incoming workloads based on historical data.\\u003c/p\\u003e\\n\\u003cp\\u003e3)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Applying MARL as a decision-making tool to achieve optimal actions for VM scaling.\\u003c/p\\u003e\\n\\u003cp\\u003e4)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Integrating Ape-X with the LSTM networks and MARL to obtain an efficient, distributed training framework with robust exploration, stability, and scalability.\\u003c/p\\u003e\\n\\u003cp\\u003e5)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Conducting experiments to evaluate the proposed approach\\u0026rsquo;s performance under real-world workload traces and comparing it with alternative methods.\\u003c/p\\u003e\\n\\u003cp\\u003eThe subsequent sections review prior auto-scaling work, present the details of the proposed approach, evaluate its performance through real-world experiments, and conclude with avenues for future research.\\u003c/p\\u003e\"},{\"header\":\"2.\\t Related Works\",\"content\":\"\\u003cp\\u003eEfficient auto-scaling in cloud environments is critical to ensure optimal resource utilization while meeting Service Level Agreements (SLAs). 
This section reviews existing methodologies and approaches in the realm of auto-scaling, highlighting key insights and challenges encountered in prior research.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026nbsp;Early auto-scaling approaches relied on rule-based strategies, applying static rules and predefined thresholds to allocate resources based on specific states. However, these methods struggle to adapt to the dynamic and nonlinear nature of cloud workloads, leading to suboptimal resource utilization and limited responsiveness to real-time changes. HRA, an intelligent resource auto-scaling framework for multi-service applications, addresses these challenges by utilizing model-based deep reinforcement learning (DRL) [7]. Another approach is the use of a proactive auto-scaling method that employs a two state, machine learning Random Forest (RF) model to forecast future CPU and memory utilization values [8]. Additionally, TRIM is an auto scaler that requires no system modeling and uses a novel heuristic optimization technique called MOAT to pre compute resource allocations based on popular workload patterns [9]. These approaches aim to improve resource allocation and responsiveness in auto-scaling systems.\\u0026nbsp;\\u003c/p\\u003e\\n\\u003cp\\u003eMachine learning techniques have been explored to enhance auto-scaling capabilities by applying historical data to train models for predicting future resource demands. However, many ML approaches struggle to capture the temporal dependencies inherent in cloud workloads, leading to difficulties in adapting to changing states and accurately forecasting resource requirements [10]. To address this issue, a novel LSTM method called CEEMDAN-LM-LSTM has been proposed, which optimizes with Logistic Maps (LM) and handles the import dataset with the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) [11]. 
Additionally, meta-learning techniques have been shown to enable training machine learning interatomic potentials (MLIPs) on multiple levels of quantum mechanical (QM) theory, allowing for better specialization to new datasets [12]. These advancements in ML can help improve the accuracy and reliability of resource demand forecasting in cloud environments.\\u003c/p\\u003e\\n\\u003cp\\u003eReinforcement learning (RL) has been studied for solving the demand and capacity balancing (DCB) problem in air traffic management, but existing RL-based methods face challenges in generalization and optimization performance [13]. RL has also historically been limited to computer vision tasks; the work in [14] proposes an RL framework for complex, partially observable, large-scale environments, specifically for road extraction from satellite images. RL assumes a measurement of the environment at each time step, but in certain applications the cost of measuring the environment can be high. The work in [15] surveys literature on RL agents that avoid costly measurements and proposes a Deep Dynamic Multi-Step Observationless Agent (DMSOA) that outperforms existing alternatives. RL frameworks often struggle to generalize to real-world scenarios; the work in [16] addresses the challenge of incremental RL by introducing a Dual-Adaptive ϵ-greedy Exploration (DAE) framework that efficiently learns unseen transitions in new environments. RL has also been applied to fine-tuning Large Language Models (LLMs) for text generation, where the RL with guided feedback (RLGF) algorithms proposed in [17] outperform supervised learning and default RL baselines.\\u003c/p\\u003e\\n\\u003cp\\u003eLSTM networks have been successfully applied to workload prediction in auto-scaling, improving the accuracy of forecasts for resource demands [18]. 
However, the integration of LSTM with reinforcement learning for dynamic resource allocation in multi-agent cloud environments remains an underexplored area [19].\\u003c/p\\u003e\\n\\u003cp\\u003eMotivation for LSTM and MARL Integration:\\u003c/p\\u003e\\n\\u003cp\\u003eGiven the limitations of rule-based methods, challenges with existing machine learning techniques, and potential synergies between LSTM and reinforcement learning, this study proposes a unique integration of LSTM networks and MARL. This combined approach aims to leverage LSTM\\u0026rsquo;s ability to capture temporal patterns and MARL\\u0026rsquo;s strengths in facilitating adaptive, collaborative decision-making among cloud agents.\\u003c/p\\u003e\\n\\u003cp\\u003eBy examining these related works, the paper positions its proposed methodology within the broader landscape of auto-scaling in cloud environments. Subsequent sections will delve into the novel integration of LSTM and MARL based on Ape-X, experimental methodologies, results, and discussions to provide a comprehensive understanding of the contributions made in this study.\\u003c/p\\u003e\"},{\"header\":\"3.\\tProposed Approach \",\"content\":\"\\u003cp\\u003e3.1.\\u0026nbsp;Initialization phase\\u003c/p\\u003e\\n\\u003cp\\u003eThe initialization phase sets the initial states and variables of the resource allocation system. For the integration of LSTM with MARL and Ape-X, this means setting up components such as models, environments, memory, and policies (see Algorithm I).\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eAlgorithm I: Pseudocode for the initialization phase\\u003c/strong\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e1: fun. 
Init_ResAllocSys()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e2: \\u0026nbsp;Init_LSTM_M() \\u0026nbsp; // initialize LSTM for resource prediction\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e3: \\u0026nbsp;Env() \\u0026nbsp; \\u0026nbsp;// initialize MARL environment for resource allocation\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e4: \\u0026nbsp;Init_Mem() \\u0026nbsp; \\u0026nbsp;// initialize memory for LSTM and Ape-X\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e5: \\u0026nbsp;Init_ExPlcy() // initialize exploration policy for MARL\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e6: \\u0026nbsp;Init_DDPG_Ag() \\u0026nbsp; // initialize DDPG agent with LSTM model, memory, and exploration policy\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e7: \\u0026nbsp;Init_ApeX_Ag() \\u0026nbsp; // initialize Ape-X agent with DDPG agent, LSTM model, environment, and hyperparameters\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e8: \\u0026nbsp;Init_ResAllocPolicy() \\u0026nbsp;// initialize resource allocation policy\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e9: fun. Init_LSTM_M()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e10: // Set up the LSTM model\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e11: lstm_m \\u0026nbsp;= \\u0026nbsp; Sequential()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e12: // Add LSTM layers and other components\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e13: lstm_m.compile(optimizer = \\u0026apos;adam\\u0026apos;, loss = \\u0026apos;mean_squared_error\\u0026apos;)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e14: fun. 
Env()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e15: // Set up the MARL environment\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e16: marl_env \\u0026nbsp;= \\u0026nbsp; MARLEnvironment()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e17: // Set up environment variables\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e18: fun. Init_Mem()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e19: // initialize memory for storing experiences\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e20: memory \\u0026nbsp;= \\u0026nbsp;SequentialMemory(limit = 100000, window_length = time_steps)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e21: fun. Init_ExPlcy()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e22: // initialize exploration policy for MARL\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e23: exploration_policy \\u0026nbsp;= \\u0026nbsp; OrnsteinUhlenbeckProcess(size = n_ags, theta = 0.15, mu = 0, sigma = 0.3)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e24: fun. 
Init_DDPG_Ag()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e25: // initialize DDPG agent with LSTM model, memory, and exploration policy\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e26: ddpg_ag \\u0026nbsp;= \\u0026nbsp; DDPG_Ag(\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e27: \\u0026nbsp;mdl = lstm_m,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e28: \\u0026nbsp;nb_actions = n_actions,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e29: \\u0026nbsp;memory = memory,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e30: \\u0026nbsp;nb_steps_warmup_critic = 100,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e31: \\u0026nbsp;nb_steps_warmup_actor = 100,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e32: \\u0026nbsp;random_process = exploration_policy,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e33: \\u0026nbsp;gamma = 0.99,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e34: \\u0026nbsp;target_m_update = 1e-3\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e35: )\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e36: fun. 
Init_ApeX_Ag()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e37: //initialize Ape-X agent with DDPG agent, LSTM model, environment, and multi variables\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e38: apex_ag \\u0026nbsp;= \\u0026nbsp; ApeX_Ag(\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e39: \\u0026nbsp;agent = ddpg_ag,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e40: \\u0026nbsp;mdl = lstm_m,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e41: \\u0026nbsp;env = marl_env,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e42: \\u0026nbsp;n_agg = 5,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e43: \\u0026nbsp;replay_start_size = 1000,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e44: \\u0026nbsp;train_every = 10,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e45: \\u0026nbsp;b_size = 32,\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e46: \\u0026nbsp;n_epochs = 5\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e47: )\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e48: fun. 
Init_ResAllocPolicy()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e49: // initialize the resource allocation policy or system-specific variables\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e50: // This might involve setting up rules, constraints, or initial configurations for resource allocation\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e51: Begin\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e52: Init_ResAllocSys() \\u0026nbsp;// Call the initialization fun.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e53: End\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e3.2.\\u0026nbsp;Observing phase\\u003c/p\\u003e\\n\\u003cp\\u003eThe observing phase involves observing and collecting data during the execution of the system. This data can then be used for analysis, decision-making, and potentially for further training of the agents (see Algorithm II).\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eAlgorithm II: Pseudocode for the observing phase\\u003c/strong\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e1: fun. 
ObserveResAllocSys()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e2: \\u0026nbsp;for iteration in range(n_iterations):\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e3: \\u0026nbsp; state \\u0026nbsp;= \\u0026nbsp;marl_env.reset() \\u0026nbsp;// Reset environment for a new iteration\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e4: \\u0026nbsp; total_allocated_resources \\u0026nbsp;= \\u0026nbsp;0\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e5: \\u0026nbsp; done \\u0026nbsp;= \\u0026nbsp;False \\u0026nbsp;// Episode has not finished yet\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e6: \\u0026nbsp; while not done:\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e7: \\u0026nbsp; \\u0026nbsp;// Observe resource demand\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e8: \\u0026nbsp; \\u0026nbsp;ObserveResourceDemand(state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e9: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e10: \\u0026nbsp; \\u0026nbsp;// Predict resource demand using the LSTM model\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e11: \\u0026nbsp; \\u0026nbsp;resource_demand \\u0026nbsp;= \\u0026nbsp; lstm_m.predict(state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e12: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e13: \\u0026nbsp; \\u0026nbsp;// Choose resource allocation action from Ape-X agent\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e14: \\u0026nbsp; \\u0026nbsp;res_alloc \\u0026nbsp;= \\u0026nbsp;apex_ag.choose_action(state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e15: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e16: \\u0026nbsp; \\u0026nbsp;// Apply resource allocation in the 
environment\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e17: \\u0026nbsp; \\u0026nbsp;next_state, reward, done, _ \\u0026nbsp;= \\u0026nbsp; marl_env.step(res_alloc)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e18: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e19: \\u0026nbsp; \\u0026nbsp;// Observe performance metrics\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e20: \\u0026nbsp; \\u0026nbsp;ObservePerformanceMetrics(res_alloc, reward, state, next_state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e21: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e22: \\u0026nbsp; \\u0026nbsp;// Update resource allocation policy based on feedback\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e23: \\u0026nbsp; \\u0026nbsp;UpdateAllocationPolicy(res_alloc, reward)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e24: \\u0026nbsp; \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e25: \\u0026nbsp; \\u0026nbsp;// Store transition in memory\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e26: \\u0026nbsp; \\u0026nbsp;apex_ag.store_transition(state, res_alloc, reward, next_state, done)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e27: \\u0026nbsp; \\u0026nbsp;state \\u0026nbsp;= \\u0026nbsp;next_state \\u0026nbsp;// Advance to the next state\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e28: \\u0026nbsp; \\u0026nbsp;// Perform Ape-X updates every train_every steps\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e29: \\u0026nbsp; \\u0026nbsp;if iteration % apex_ag.train_every == 0:\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e30: \\u0026nbsp; \\u0026nbsp; apex_ag.train()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e31: fun. 
ObserveResourceDemand(state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e32: \\u0026nbsp;// Implementation for observing resource demand\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e33: \\u0026nbsp;// You may collect data on the current state and resource demand patterns\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e34: fun. ObservePerformanceMetrics(res_alloc, reward, state, next_state)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e35: \\u0026nbsp;// Implementation for observing performance metrics\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e36: \\u0026nbsp;// Collect and analyze data related to SLA violations, VMs allocated, CPU utilization, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e37: fun. UpdateAllocationPolicy(res_alloc, reward)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e38: \\u0026nbsp;// Implementation for updating the resource allocation policy based on feedback\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e39: \\u0026nbsp;// This could involve reinforcement learning techniques\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e40: Begin\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e41: \\u0026nbsp;ObserveResAllocSys() \\u0026nbsp;// Call the observing fun.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e42: End\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e3.3.\\u0026nbsp;Analysis phase\\u003c/p\\u003e\\n\\u003cp\\u003eThe analysis phase involves examining the data gathered during the observing phase to gain insights, make decisions, and potentially refine the resource allocation system.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eAlgorithm III: Pseudocode for the analysis phase\\u003c/strong\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e1: fun. 
AnalyzeResAllocData()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e2: \\u0026nbsp;for iteration in range(n_iterations):\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e3: \\u0026nbsp; // Retrieve data gathered during the observing phase for the current iteration\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e4: \\u0026nbsp; resource_demand_data \\u0026nbsp;= \\u0026nbsp; GetResourceDemandData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e5: \\u0026nbsp; performance_metrics_data \\u0026nbsp;= \\u0026nbsp;GetPerformanceMetricsData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e6: \\u0026nbsp; agent_feedback_data \\u0026nbsp;= \\u0026nbsp; GetAgentFeedbackData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e7: \\u0026nbsp;\\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e8: \\u0026nbsp; // Perform analysis and decision-making based on the gathered data\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e9: \\u0026nbsp; AnalyzeResourceDemand(resource_demand_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e10: \\u0026nbsp; AnalyzePerformanceMetrics(performance_metrics_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e11: \\u0026nbsp; AnalyzeAgentFeedback(agent_feedback_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e12: \\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e13: \\u0026nbsp;// Potentially update the system or model variables based on the analysis\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e14: \\u0026nbsp;UpdateSystemVariables()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e15: fun. 
GetResourceDemandData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e16: \\u0026nbsp;// Implementation to retrieve resource demand data for the specified iteration\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e17: \\u0026nbsp;// This might contain querying a database or accessing stored data\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e18: fun. GetPerformanceMetricsData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e19: \\u0026nbsp;// Implementation to retrieve performance metrics data for the specified iteration\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e20: \\u0026nbsp;// This might contain querying a database or accessing stored data\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e21: fun. GetAgentFeedbackData(iteration)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e22: \\u0026nbsp;// Implementation to retrieve agent feedback data for the specified iteration\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e23: \\u0026nbsp;// This might contain querying a database or accessing stored data\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e24: fun. AnalyzeResourceDemand(resource_demand_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e25: \\u0026nbsp;// Implementation to analyze resource demand patterns\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e26: \\u0026nbsp;// This could contain statistical analysis, trend identification, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e27: fun. 
AnalyzePerformanceMetrics(performance_metrics_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e28: \\u0026nbsp;// Implementation to analyze performance metrics\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e29: \\u0026nbsp;// This could involve identifying trends, detecting anomalies, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e30: fun. AnalyzeAgentFeedback(agent_feedback_data)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e31: \\u0026nbsp;// Implementation to analyze agent feedback data\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e32: \\u0026nbsp;// This could involve assessing the effectiveness of resource allocation policies, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e33: fun. UpdateSystemVariables()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e34: \\u0026nbsp;// Implementation to update system or model variables based on the analysis\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e35: \\u0026nbsp;// This could involve adjusting hyperparameters, updating policies, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e36: Begin\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e37: \\u0026nbsp;AnalyzeResAllocData() \\u0026nbsp;// Call the analysis fun.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e38: End\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e3.4.\\u0026nbsp;Planning phase\\u003c/p\\u003e\\n\\u003cp\\u003eThis study proposes the use of Ape-X, a distributed reinforcement learning algorithm that utilizes prioritized experience replay. It combines human demonstrations and successful transitions generated by RL agents during training to improve training efficiency\\u0026nbsp;[20]\\u0026nbsp;[21]. 
Ape-X DDPG is an off-policy RL algorithm that can be combined with Dynamic Experience Replay (DER) to further enhance training efficiency\\u0026nbsp;[22]\\u0026nbsp;[23]. DER allows RL algorithms to use experience replay samples from both human demonstrations and RL agent transitions, resulting in shorter training times and improved performance in challenging environments\\u0026nbsp;[24]. Ape-X DDPG with DER has been successfully applied to robotic tasks, such as peg-in-hole and lap-joint assembly, and has shown superior performance compared to vanilla Ape-X DDPG.\\u003c/p\\u003e\\n\\u003cp\\u003eApe-X (Distributed Prioritized Experience Replay) is an algorithm that builds upon the Deep Q-Network (DQN) framework. The key idea behind Ape-X is to use a distributed architecture to parallelize the training process and improve sample efficiency. The main components of Ape-X include experience replay, target networks, and a central prioritized replay buffer.\\u003c/p\\u003e\\n\\u003cp\\u003eThe update equation for Ape-X is similar to the DQN update equation, with the addition of prioritized experience replay. 
The Q-learning update rule for Ape-X can be expressed as follows:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003eQ(s\\u003csub\\u003et\\u003c/sub\\u003e, a\\u003csub\\u003et\\u003c/sub\\u003e) \\u0026larr; (1\\u0026minus;\\u0026alpha;)\\u0026sdot;Q(s\\u003csub\\u003et\\u003c/sub\\u003e, a\\u003csub\\u003et\\u003c/sub\\u003e) + \\u0026alpha;\\u0026sdot;(r\\u003csub\\u003et\\u003c/sub\\u003e + \\u0026gamma;\\u0026sdot;max\\u003csub\\u003ea\\u0026prime;\\u003c/sub\\u003eQ(s\\u003csub\\u003et+1\\u003c/sub\\u003e, a\\u0026prime;))\\u0026nbsp;\\u003c/em\\u003e\\u003cem\\u003e\\u003csup\\u003e[25]\\u003c/sup\\u003e\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003eHere, the variables are defined as follows:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003eQ(s\\u003csub\\u003et\\u003c/sub\\u003e, a\\u003csub\\u003et\\u003c/sub\\u003e): The current estimate of the Q-value for taking action a\\u003csub\\u003et\\u003c/sub\\u003e in state s\\u003csub\\u003et\\u003c/sub\\u003e.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003e(1\\u0026minus;\\u0026alpha;)\\u0026sdot;Q(s\\u003csub\\u003et\\u003c/sub\\u003e, a\\u003csub\\u003et\\u003c/sub\\u003e): The current estimate, weighted by a factor of 1\\u0026minus;\\u0026alpha;, where \\u0026alpha; is the learning rate.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; 
\\u0026nbsp;\\u0026nbsp;\\u003cem\\u003e\\u0026alpha;\\u003c/em\\u003e\\u003cem\\u003e\\u0026sdot;(r\\u003csub\\u003et\\u003c/sub\\u003e+\\u0026gamma;\\u003c/em\\u003e\\u003cem\\u003e\\u0026sdot;max\\u003csub\\u003ea\\u0026prime;\\u003c/sub\\u003eQ(s\\u003csub\\u003et+1\\u003c/sub\\u003e,a\\u0026prime;)): The update term, which combines the immediate reward r\\u003csub\\u003et\\u003c/sub\\u003e with the discounted maximum Q-value of the next state s\\u003csub\\u003et+1\\u003c/sub\\u003e.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003e\\u0026alpha;: Learning rate, which determines the weight given to new information relative to the existing estimate.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003er\\u003csub\\u003et\\u003c/sub\\u003e: Immediate reward obtained after taking action a\\u003csub\\u003et\\u003c/sub\\u003e in state s\\u003csub\\u003et\\u003c/sub\\u003e.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003e\\u0026gamma;: Discount factor, which determines the importance of future rewards. It\\u0026apos;s a value between 0 and 1.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;\\u003cem\\u003emax\\u003csub\\u003ea\\u0026prime;\\u003c/sub\\u003eQ(s\\u003csub\\u003et+1\\u003c/sub\\u003e,a\\u0026prime;): The maximum Q-value over all possible actions a\\u0026prime; in the next state s\\u003csub\\u003et+1\\u003c/sub\\u003e.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003eIn the context of Ape-X, prioritized experience replay involves assigning a priority to each experience in the replay buffer based on the magnitude of its TD-error (Temporal Difference error). 
Experiences with higher TD-errors are sampled with higher probability during the training process, prioritizing more informative experiences.\\u003c/p\\u003e\\n\\u003cp\\u003eThe planning phase involves applying the insights gained from the analysis phase to make informed decisions and formulate strategies for improving the resource allocation system.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003eAlgorithm IV: Pseudocode for the planning phase with MARL and Ape-X integration\\u003c/strong\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cstrong\\u003e1\\u003c/strong\\u003e\\u003cem\\u003e: fun. PlanResAllocImprovements()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e2: \\u0026nbsp;for iteration in range(n_planning_iterations)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e3: \\u0026nbsp; // Perform analysis to identify areas for improvement\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e4: \\u0026nbsp; analysis_results \\u0026nbsp;= \\u0026nbsp; PerformAnalysisForPlanning()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e5: \\u0026nbsp; // Create improvement strategies based on the analysis\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e6: \\u0026nbsp; Enhance_stratg \\u0026nbsp;= \\u0026nbsp; FormulateImprovementStrategies(analysis_results)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e7: \\u0026nbsp; // Implement planned improvements in the resource allocation system\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e8: \\u0026nbsp; ImpImprovStrategies(Enhance_stratg)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e9: fun. 
PerformAnalysisForPlanning()\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e10: // Execution of analysis, particularly for planning\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e11: // This might involve revisiting certain metrics, identifying new patterns, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e12: analysis_results \\u0026nbsp;= \\u0026nbsp;{}\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e13: fun. FormulateImprovementStrategies(analysis_results)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e14: // Implementation to formulate improvement strategies based on analysis results\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e15: // This could involve adjusting policies, updating models, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e16: Enhance_stratg \\u0026nbsp;= \\u0026nbsp;{}\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e17: fun. ImpImprovStrategies(Enhance_stratg)\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e18: \\u0026nbsp;// Put the planned enhancements into practice in the resource allocation system\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e19: // This could involve retraining models, changing system variables, etc.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e20: Begin\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e21: \\u0026nbsp;PlanResAllocImprovements() \\u0026nbsp;// Call the planning fun.\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e\\u003cem\\u003e22: End\\u0026nbsp;\\u003c/em\\u003e\\u003c/p\\u003e\\n\\u003cp\\u003e3.5.\\u0026nbsp; \\u0026nbsp;Execution Phase\\u003c/p\\u003e\\n\\u003cp\\u003eIn this phase, trained models, including Multi-Agent Reinforcement Learning (MARL) policies and the Ape-X learner, were deployed in the simulation environment to assess their real-time performance. 
The deployment phase involved integrating the trained models with the simulated cloud computing system. To evaluate the models\\u0026apos; performance, extensive data was gathered during the execution phase. This data includes:\\u003c/p\\u003e\\n\\u003cp\\u003e1)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Workload Data: The synthetic workload dataset simulates dynamic changes in resource demands.\\u003c/p\\u003e\\n\\u003cp\\u003e2)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Environment Observations: System states, CPU utilization, memory usage, and other relevant metrics during execution.\\u003c/p\\u003e\\n\\u003cp\\u003e3)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Agent Actions: Actions taken by each agent in response to the observed environment.\\u003c/p\\u003e\\n\\u003cp\\u003eThe simulations were then run for a predetermined number of iterations, each comprising multiple time steps. During each time step, the agents chose actions based on their policies, and the environment evolved accordingly. Performance metrics were continuously observed and recorded.\\u003c/p\\u003e\"},{\"header\":\"4.\\tExperimental Setup\",\"content\":\"\\u003cp\\u003eIn this section, we detail the setup for conducting our experiments, including the hardware and software settings, datasets used, and parameters.\\u003c/p\\u003e\\n\\u003cp\\u003e4.1.\\u0026nbsp;Hardware Configuration\\u003c/p\\u003e\\n\\u003cp\\u003eThe experiments were conducted on a cluster of machines to simulate a cloud computing environment. 
The hardware configuration is as follows:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Processors: Intel Xeon E5-XXXX series, 2.5 GHz, 16 cores\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Memory: 128 GB RAM\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Network: 1 Gbps Ethernet\\u003c/p\\u003e\\n\\u003cp\\u003e4.2.\\u0026nbsp;Software Configuration\\u003c/p\\u003e\\n\\u003cp\\u003eThe software stack for conducting the experiments included:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Operating System: Ubuntu 20.04\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Deep Learning Framework: TensorFlow 2.5.0\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Reinforcement Learning Libraries: OpenAI Gym, Stable Baselines 3\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;LSTM Implementation: Keras with TensorFlow backend\\u003c/p\\u003e\\n\\u003cp\\u003e4.3.\\u0026nbsp; Dataset\\u003c/p\\u003e\\n\\u003cp\\u003eThe experiments utilized a synthetic workload dataset generated to simulate dynamic changes in resource demands. 
The dataset includes information on CPU utilization, memory usage, and incoming requests at regular time intervals.\\u003c/p\\u003e\\n\\u003cp\\u003e4.4.\\u0026nbsp;Experimental Design\\u003c/p\\u003e\\n\\u003cp\\u003e1)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;LSTM: Used for time-series forecasting of CPU usage.\\u003c/p\\u003e\\n\\u003cp\\u003e2)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;LSTM-DQN: Integration of LSTM for workload prediction and DQN for optimal action selection.\\u003c/p\\u003e\\n\\u003cp\\u003e3)\\u0026nbsp; \\u0026nbsp;\\u0026nbsp;LSTM-Ape-X: Combination of LSTM for workload prediction and Ape-X for RL-based optimal action selection.\\u003c/p\\u003e\\n\\u003cp\\u003e4.5.\\u0026nbsp;Hyperparameters\\u003c/p\\u003e\\n\\u003cp\\u003eThe hyperparameters for the LSTM, DQN, and Ape-X algorithms were set based on preliminary experiments and existing literature. Key hyperparameters include the learning rate, discount factor, exploration-exploitation trade-off, and LSTM sequence length.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Learning Rate: 0.001\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Discount Factor (\\u0026gamma;): 0.9\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Exploration-Exploitation Trade-off (ϵ): 0.1\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;LSTM Sequence Length: 10 time steps\\u003c/p\\u003e\\n\\u003cp\\u003e4.6.\\u0026nbsp;Experimental Procedure\\u003c/p\\u003e\\n\\u003cp\\u003eThe experiments were conducted as follows:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;The system was initialized, and the baseline and proposed strategies were deployed.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;The synthetic workload dataset was introduced, 
simulating dynamic changes in resource demands.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Performance metrics, including SLA Violation, VMs Allocated, CPU Utilization, System Utilization, Resource Efficiency, and Stability, were measured at regular intervals.\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Experiments were repeated to ensure robustness and reproducibility of results.\\u003c/p\\u003e\\n\\u003cp\\u003e4.7.\\u0026nbsp; Statistical Analysis\\u003c/p\\u003e\\n\\u003cp\\u003eTo validate the significance of the results, statistical tests such as t-tests and ANOVA were applied to compare the performance of different strategies.\\u003c/p\\u003e\\n\\u003cp\\u003e5.\\u0026nbsp; \\u0026nbsp;Experimental Results and Discussion\\u003c/p\\u003e\\n\\u003cp\\u003eIn this section, we present the results of our experiments comparing different strategies for resource allocation, namely LSTM, LSTM with RL based on DQN (LSTM-DQN), and LSTM with RL based on Ape-X (LSTM-Ape-X).\\u003c/p\\u003e\\n\\u003cp\\u003e5.1.\\u0026nbsp;Performance Metrics\\u003c/p\\u003e\\n\\u003cp\\u003eWe assessed the strategies based on several key performance metrics:\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;SLA Violation (Delay Time)\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;VMs Allocated\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;CPU Utilization\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;System Utilization\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; \\u0026nbsp;\\u0026nbsp;Resource Efficiency\\u003c/p\\u003e\\n\\u003cp\\u003e\\u0026middot;\\u0026nbsp; \\u0026nbsp; 
\\u0026nbsp;\\u0026nbsp;Stability\\u003c/p\\u003e\\n\\u003cp\\u003e5.2.\\u0026nbsp;Results\\u003c/p\\u003e\\n\\u003cp\\u003eOur experiments yielded the following results for each metric:\\u003c/p\\u003e\\n\\u003cp\\u003eTable1: SLA Violation (Delay Time)\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"292\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 73px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 82px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eSLA Violation (Delay Time)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 64px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (ms)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 63px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. 
Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.8865%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.078%;\\\"\\u003e\\n \\u003cp\\u003eLow\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e5.2\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e1.1\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.8865%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.078%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e12.5\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e2.3\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.8865%;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.078%;\\\"\\u003e\\n \\u003cp\\u003eModerate to High\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e18.8\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e3.5\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\\n\\u003cp\\u003eTable2: VMs 
Allocated\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"292\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 28.0142%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 26.9504%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eVMs Allocated\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (%)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 28.0142%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 26.9504%;\\\"\\u003e\\n \\u003cp\\u003eHigh\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e25\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e5\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 28.0142%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 26.9504%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd 
valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e18\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e3\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 28.0142%;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 26.9504%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.695%;\\\"\\u003e\\n \\u003cp\\u003e20\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 22.3404%;\\\"\\u003e\\n \\u003cp\\u003e4\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\\n\\u003cp\\u003eTable3: CPU Utilization\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"292\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.8227%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.1418%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eCPU Utilization\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.5674%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (%)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.4681%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. 
Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.8227%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.1418%;\\\"\\u003e\\n \\u003cp\\u003eHigh\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.5674%;\\\"\\u003e\\n \\u003cp\\u003e85\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.4681%;\\\"\\u003e\\n \\u003cp\\u003e8\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.8227%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.1418%;\\\"\\u003e\\n \\u003cp\\u003eModerate to High\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.5674%;\\\"\\u003e\\n \\u003cp\\u003e75\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.4681%;\\\"\\u003e\\n \\u003cp\\u003e10\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.8227%;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.1418%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.5674%;\\\"\\u003e\\n \\u003cp\\u003e70\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 24.4681%;\\\"\\u003e\\n \\u003cp\\u003e12\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\\n\\u003cp\\u003eTable4: System 
Utilization\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"304\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 87px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 89px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eSystem Utilization\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 61px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (%)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 57px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.5918%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.2721%;\\\"\\u003e\\n \\u003cp\\u003eHigh\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.7483%;\\\"\\u003e\\n \\u003cp\\u003e80\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.3878%;\\\"\\u003e\\n \\u003cp\\u003e7\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.5918%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.2721%;\\\"\\u003e\\n \\u003cp\\u003eModerate to High\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd 
valign=\\\"bottom\\\" style=\\\"width: 20.7483%;\\\"\\u003e\\n \\u003cp\\u003e75\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.3878%;\\\"\\u003e\\n \\u003cp\\u003e9\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 29.5918%;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 30.2721%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.7483%;\\\"\\u003e\\n \\u003cp\\u003e70\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.3878%;\\\"\\u003e\\n \\u003cp\\u003e10\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\\n\\u003cp\\u003e\\u0026nbsp;Table5: Resource Efficiency\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"299\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 96px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd colspan=\\\"2\\\" valign=\\\"bottom\\\" style=\\\"width: 88px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eResource Efficiency\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 63px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (%)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 41px;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. 
Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 96px;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd colspan=\\\"2\\\" valign=\\\"bottom\\\" style=\\\"width: 88px;\\\"\\u003e\\n \\u003cp\\u003eHigh\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 63px;\\\"\\u003e\\n \\u003cp\\u003e85\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 41px;\\\"\\u003e\\n \\u003cp\\u003e7\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 96px;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd colspan=\\\"2\\\" valign=\\\"bottom\\\" style=\\\"width: 88px;\\\"\\u003e\\n \\u003cp\\u003eModerate to High\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 63px;\\\"\\u003e\\n \\u003cp\\u003e75\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 41px;\\\"\\u003e\\n \\u003cp\\u003e8\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 96px;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd colspan=\\\"2\\\" valign=\\\"bottom\\\" style=\\\"width: 88px;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 63px;\\\"\\u003e\\n \\u003cp\\u003e70\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 41px;\\\"\\u003e\\n \\u003cp\\u003e9\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\\n\\u003cp\\u003e\\u0026nbsp;Table6: 
Stability\\u003c/p\\u003e\\n\\u003cdiv align=\\\"Left\\\"\\u003e\\n \\u003ctable border=\\\"1\\\" cellpadding=\\\"0\\\" width=\\\"298\\\"\\u003e\\n \\u003cthead\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.3472%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStrategy\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 34.7222%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStability\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.1389%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eMean (%)\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.7917%;\\\"\\u003e\\n \\u003cp\\u003e\\u003cstrong\\u003e\\u003cem\\u003eStd. Dev.\\u003c/em\\u003e\\u003c/strong\\u003e\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/thead\\u003e\\n \\u003ctbody\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.3472%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-Ape-X\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 34.7222%;\\\"\\u003e\\n \\u003cp\\u003eHigh\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.1389%;\\\"\\u003e\\n \\u003cp\\u003e90\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.7917%;\\\"\\u003e\\n \\u003cp\\u003e5\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.3472%;\\\"\\u003e\\n \\u003cp\\u003eLSTM-DQN\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 34.7222%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd 
valign=\\\"bottom\\\" style=\\\"width: 20.1389%;\\\"\\u003e\\n \\u003cp\\u003e80\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.7917%;\\\"\\u003e\\n \\u003cp\\u003e7\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003ctr\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 25.3472%;\\\"\\u003e\\n \\u003cp\\u003eLSTM\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 34.7222%;\\\"\\u003e\\n \\u003cp\\u003eModerate\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 20.1389%;\\\"\\u003e\\n \\u003cp\\u003e75\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003ctd valign=\\\"bottom\\\" style=\\\"width: 19.7917%;\\\"\\u003e\\n \\u003cp\\u003e8\\u003c/p\\u003e\\n \\u003c/td\\u003e\\n \\u003c/tr\\u003e\\n \\u003c/tbody\\u003e\\n \\u003c/table\\u003e\\n\\u003c/div\\u003e\"},{\"header\":\"Discussion\",\"content\":\"\\u003cp\\u003eThe statistical analysis supports these observations, indicating that LSTM-Ape-X consistently outperforms the other strategies across the measured metrics, exhibiting lower SLA violations, higher VM allocation, and improved system and resource utilization. Further investigation into parameter tuning and scalability may provide additional insights. The findings underscore the potential of combining LSTM with reinforcement learning algorithms for robust resource management in dynamic computing environments.\\u003c/p\\u003e\"},{\"header\":\"Conclusion and Future Work\",\"content\":\"\\u003cp\\u003eIn conclusion, our study demonstrated the effectiveness of the Multi-Agent Reinforcement Learning (MARL)-enhanced approach based on Ape-X for auto-scaling cloud resources. This was evident in improved virtual machine stability and a significant reduction in Service Level Agreement (SLA) violations. 
These outcomes support the viability of our approach and point to several directions for future research.\\u003c/p\\u003e\\n\\u003cp\\u003eA primary focus of our future research will be the exploration of efficient methodologies for heterogeneous task scheduling. As cloud environments often host a variety of applications with different computational requirements, optimizing task scheduling for such heterogeneity is crucial for overall system performance, and adaptability is vital for maintaining optimal performance under varying workloads and network conditions. Future work also aims to further enhance the accuracy of resource predictions in diverse and dynamic cloud environments. By leveraging the strengths of different models, we anticipate improvements in prediction accuracy and adaptability, contributing to the overall robustness of our auto-scaling approach. A rigorous assessment under diverse workloads and network conditions will provide valuable insights into the robustness and applicability of our approach. By addressing these future research directions, we aim to advance the adaptability, robustness, and versatility of our auto-scaling approach, positioning it as an evolving solution applicable across a broader spectrum of cloud computing scenarios.\\u003c/p\\u003e\"},{\"header\":\"References\",\"content\":\"\\u003col\\u003e\\n \\u003cli\\u003eJoel Gibson, Robin Rondeau, Darren Eveleigh, Qing Tan. (2012). Benefits and challenges of three cloud computing service models. doi: 10.1109/CASON.2012.6412402\\u003c/li\\u003e\\n \\u003cli\\u003eMohsin Nazir, Prashant Tiwari, Shakti Dhar Tiwari, Raj Gaurav Mishra. (2015). Cloud Computing: An Overview.\\u003c/li\\u003e\\n \\u003cli\\u003eEkaba Bisong. (2019). What Is Cloud Computing. 
doi: 10.1007/978-1-4842-4470-8_1\\u003c/li\\u003e\\n \\u003cli\\u003eCarlos Rodr\\u0026iacute;guez Monroy, Gregorio Carlos Almarcha Arias, Yilsy N\\u0026uacute;\\u0026ntilde;ez Guerrero (2012). The new cloud computing paradigm: the way to IT seen as a utility.\\u003c/li\\u003e\\n \\u003cli\\u003eSalauddin Dhali, Annabella Loconsole, Edward Blurock (2015). A study on cloud computing adoption of small and medium enterprises. Master thesis project, 30 ECTS credits, Spring 2015.\\u003c/li\\u003e\\n \\u003cli\\u003eAnver Shahabdeen Rahumath, Santhosh Rajendran, N. MohanaSundaram, Abdul Rahiman Malangai (2022). Cost-Efficient Deadline Constrained Scientific Workflow Scheduling in Infrastructure-as-a-Service Clouds by Disqualifying Tasks with Anomalies. Journal of Computer Science, doi: 10.3844/jcssp.2022.555.566\\u003c/li\\u003e\\n \\u003cli\\u003eTrace-Driven Scaling of Microservice Applications. IEEE Access, doi: 10.1109/access.2023.3260069\\u003c/li\\u003e\\n \\u003cli\\u003eA proactive energy-aware auto-scaling solution for edge-based infrastructures. doi: 10.1109/ucc56403.2022.00044\\u003c/li\\u003e\\n \\u003cli\\u003eSupervisory Event Loop-based Autoscaling of Node.js Deployments. doi: 10.1109/hdis56859.2022.9991325\\u003c/li\\u003e\\n \\u003cli\\u003eAlice E. A. Allen, Nicholas Lubbers, Sakib Matin, Justin S. Smith, Richard A. Messerly, Sergei Tretiak, Kipton Barros (2023). Learning Together: Towards foundational models for machine learning interatomic potentials with meta-learning.\\u003c/li\\u003e\\n \\u003cli\\u003eRicardo Parizotto, B. L. Coelho, Israat Haque, Alberto Schaeffer-Filho (2023). Offloading Machine Learning to Programmable Data Planes: A Systematic Survey. ACM Computing Surveys, doi: 10.1145/3605153\\u003c/li\\u003e\\n \\u003cli\\u003eSarunyoo Boriratrit, Rongrit Chatthaworn (2023). Improvement of Long Short-Term Memory via CEEMDAN and Logistic Maps for the Power Consumption Forecasting.
doi: 10.1109/ICACI58115.2023.10146172\\u003c/li\\u003e\\n \\u003cli\\u003eMelissa Holstein (2023). General multi-agent reinforcement learning integrating heuristic-based delay priority strategy for demand and capacity balancing. Transportation Research Part C: Emerging Technologies, doi: 10.1016/j.trc.2023.104218\\u003c/li\\u003e\\n \\u003cli\\u003eKatherine Elizabeth Arden (2023). Tractable large-scale deep reinforcement learning. Computer Vision and Image Understanding, doi: 10.1016/j.cviu.2023.103689\\u003c/li\\u003e\\n \\u003cli\\u003eDynamic Observation Policies in Observation Cost-Sensitive Reinforcement\\u003c/li\\u003e\\n \\u003cli\\u003eLearning. doi: 10.48550/arxiv.2307.02620\\u003c/li\\u003e\\n \\u003cli\\u003eWei Ding, Siyang Jiang, Hsi-Wen Chen, Ming Chen (2023). Incremental Reinforcement Learning with Dual-Adaptive \\u0026epsilon;-Greedy Exploration. Proceedings of the ... AAAI Conference on Artificial Intelligence, doi: 10.1609/aaai.v37i6.25899\\u003c/li\\u003e\\n \\u003cli\\u003eJonathan Chang, Kiant\\u0026eacute; Brantley, Rajkumar Ramamurthy, Dipendra Misra, Wen Sun (2023). Learning to Generate Better Than Your LLM. arXiv.org, doi: 10.48550/arXiv.2306.11816\\u003c/li\\u003e\\n \\u003cli\\u003eMeysam Alizamir, Jalal Shiri, Ahmad Fakheri Fard, Sungwon Kim, Alireza Docheshmeh Gorgij, Salim Heddam, Vijay P. Singh (2023). Improving the accuracy of daily solar radiation prediction by climatic data applying an efficient hybrid deep learning model: Long short-term memory (LSTM) network coupled with wavelet transform. Engineering Applications of Artificial Intelligence, doi: 10.1016/j.engappai.2023.106199\\u003c/li\\u003e\\n \\u003cli\\u003eOffline Prioritized Experience Replay. doi: 10.48550/arxiv.2306.05412\\u003c/li\\u003e\\n \\u003cli\\u003eYang Yue, Bingyi Kang, Xiao Ma, Gao Huang, Shiji Song, Shuicheng Yan (2023). Offline Prioritized Experience Replay.
arXiv.org, doi: 10.48550/arXiv.2306.05412\\u003c/li\\u003e\\n \\u003cli\\u003eJieliang Luo, Hui Li (2020). Dynamic Experience Replay. arXiv: Artificial Intelligence.\\u003c/li\\u003e\\n \\u003cli\\u003eJieliang Luo, Hui Li (2020). Dynamic Experience Replay.\\u003c/li\\u003e\\n \\u003cli\\u003eLongfei Zhang, Yang Feng, Rong Wang, Yueshan Xu, Naifu Xu, Zeyi Liu, Hang Du (2023). Efficient experience replay architecture for offline reinforcement learning. doi: 10.1108/ria-10-2022-0248\\u003c/li\\u003e\\n \\u003cli\\u003eHorgan et al. (2018). Distributed Prioritized Experience Replay (Ape-X).\\u003c/li\\u003e\\n\\u003c/ol\\u003e\"}],\"fulltextSource\":\"\",\"fullText\":\"\",\"funders\":[],\"hasAdminPriorityOnWorkflow\":false,\"hasManuscriptDocX\":true,\"hasOptedInToPreprint\":true,\"hasPassedJournalQc\":\"\",\"hasAnyPriority\":true,\"hideJournal\":true,\"highlight\":\"\",\"institution\":\"\",\"isAcceptedByJournal\":false,\"isAuthorSuppliedPdf\":false,\"isDeskRejected\":\"\",\"isHiddenFromSearch\":false,\"isInQc\":false,\"isInWorkflow\":false,\"isPdf\":false,\"isPdfUpToDate\":true,\"isWithdrawnOrRetracted\":false,\"journal\":{\"display\":true,\"email\":\"info@researchsquare.com\",\"identity\":\"researchsquare\",\"isNatureJournal\":false,\"hasQc\":true,\"allowDirectSubmit\":true,\"externalIdentity\":\"\",\"sideBox\":\"\",\"snPcode\":\"\",\"submissionUrl\":\"/submission\",\"title\":\"Research Square\",\"twitterHandle\":\"researchsquare\",\"acdcEnabled\":true,\"dfaEnabled\":false,\"editorialSystem\":\"\",\"reportingPortfolio\":\"\",\"inReviewEnabled\":false,\"inReviewRevisionsEnabled\":true},\"keywords\":\"Auto-scaling, Cloud Resources, Long Short-term Memory (LSTM), Multi-agent Reinforcement Learning (MARL), Service Level Agreements (SLAs), Virtual Machines.\",\"lastPublishedDoi\":\"10.21203/rs.3.rs-5424573/v1\",\"lastPublishedDoiUrl\":\"https://doi.org/10.21203/rs.3.rs-5424573/v1\",\"license\":{\"name\":\"CC BY
4.0\",\"url\":\"https://creativecommons.org/licenses/by/4.0/\"},\"manuscriptAbstract\":\"\\u003cp\\u003eEfficiently responding to dynamic application demands in cloud environments is crucial for meeting service level agreements (SLAs) and optimizing resource costs. Traditional auto-scaling approaches often struggle with predefined rules, making it challenging to devise optimal adaptation strategies. This paper introduces a proactive strategy that leverages the robust capabilities of long short-term memory (LSTM) for precise request prediction, complemented by the intelligent decision-making power of multi-agent reinforcement learning (MARL) to determine optimal actions for scaling virtual machines.\\u003c/p\\u003e\\n\\u003cp\\u003eIn this proposed methodology, the LSTM accurately predicts the number of requests in the next time step, effectively adapting to dynamic traffic changes. The integration of MARL enhances the adaptability and efficiency of the auto-scaling process by enabling virtual machines to make informed decisions based on real time states.\\u003c/p\\u003e\\n\\u003cp\\u003eThis study asserts that applying MARL as a fundamental component of the auto-scaling strategy is a promising and effective solution. 
The synergy between LSTM and MARL based Ape-X not only enhances predictive accuracy but also empowers virtual machines to make proactive decisions, making it a valuable approach for meeting SLAs and optimizing resource utilization in dynamic cloud environments.\\u003c/p\\u003e\",\"manuscriptTitle\":\"A Scalable Machine Learning Strategy for Resource Allocation Database\",\"msid\":\"\",\"msnumber\":\"\",\"nonDraftVersions\":[{\"code\":1,\"date\":\"2024-11-27 06:15:28\",\"doi\":\"10.21203/rs.3.rs-5424573/v1\",\"editorialEvents\":[{\"type\":\"communityComments\",\"content\":0}],\"status\":\"published\",\"journal\":{\"display\":true,\"email\":\"info@researchsquare.com\",\"identity\":\"researchsquare\",\"isNatureJournal\":false,\"hasQc\":true,\"allowDirectSubmit\":true,\"externalIdentity\":\"\",\"sideBox\":\"\",\"snPcode\":\"\",\"submissionUrl\":\"/submission\",\"title\":\"Research Square\",\"twitterHandle\":\"researchsquare\",\"acdcEnabled\":true,\"dfaEnabled\":false,\"editorialSystem\":\"\",\"reportingPortfolio\":\"\",\"inReviewEnabled\":false,\"inReviewRevisionsEnabled\":true}}],\"origin\":\"\",\"ownerIdentity\":\"4e475e38-c25c-42d0-9ac3-f1949270bab8\",\"owner\":[],\"postedDate\":\"November 27th, 2024\",\"published\":true,\"recentEditorialEvents\":[],\"rejectedJournal\":[],\"revision\":\"\",\"amendment\":\"\",\"status\":\"posted\",\"subjectAreas\":[],\"tags\":[],\"updatedAt\":\"2024-11-27T06:15:33+00:00\",\"versionOfRecord\":[],\"versionCreatedAt\":\"2024-11-27 
06:15:28\",\"video\":\"\",\"vorDoi\":\"\",\"vorDoiUrl\":\"\",\"workflowStages\":[]},\"version\":\"v1\",\"identity\":\"rs-5424573\",\"journalConfig\":\"researchsquare\"},\"__N_SSP\":true},\"page\":\"/article/[identity]/[[...version]]\",\"query\":{\"redirect\":\"/article/rs-5424573\",\"identity\":\"rs-5424573\",\"version\":[\"v1\"]},\"buildId\":\"qtupq5eGEP_6zYnWcrvyt\",\"isFallback\":false,\"isExperimentalCompile\":false,\"dynamicIds\":[84888],\"gssp\":true,\"scriptLoader\":[]}","source_license":"CC-BY-4.0","license_restricted":false}