Found 1358 results; showing the newest relevant preprints, sorted by relevance.

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

Reinforcement learning (RL) methods have been shown to be capable of learning intelligent behavior in rich domains. However, this has largely been done in simulated domains without adequate focus on the process of building the simulator. In this paper, we consider a setting where we have access to an ensemble of pre-trained and possibly inaccurate simulators (models). We approximate the real environment using a state-dependent linear combination of the ensemble, where the coefficients are determined by the given state features and some unknown parameters. Our proposed algorithm provably learns a near-optimal policy with a sample complexity polynomial in the number of unknown parameters, and incurs no dependence on the size of the state (or action) space. As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set. We provide exponential lower bounds that illustrate the fundamental hardness of this problem, and develop a provably efficient algorithm under additional natural assumptions.
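The state-dependent linear combination can be sketched as follows. This is an illustrative toy, not the paper's construction: the softmax weighting, the feature map `phi`, and all numbers are assumptions made for the example.

```python
import math

def mixture_weights(phi, theta):
    """Softmax weights over the K ensemble members for one state,
    parameterized linearly by state features phi and parameters theta."""
    scores = [sum(t * f for t, f in zip(row, phi)) for row in theta]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def combined_transition(models, phi, theta, s, a):
    """P(s' | s, a) under the state-dependent weighted ensemble."""
    w = mixture_weights(phi, theta)
    combined = {}
    for wk, model in zip(w, models):
        for s_next, p in model[(s, a)].items():
            combined[s_next] = combined.get(s_next, 0.0) + wk * p
    return combined

# Two toy pre-trained models over next states {0, 1} for state-action (0, 0).
models = [{(0, 0): {0: 0.9, 1: 0.1}}, {(0, 0): {0: 0.2, 1: 0.8}}]
phi = [1.0]                    # one state feature
theta = [[0.0], [0.0]]         # equal parameters -> uniform mixture
dist = combined_transition(models, phi, theta, 0, 0)
# equal weights give the average of the two models: {0: 0.55, 1: 0.45}
```

Learning then reduces to fitting the unknown `theta`, which is why the sample complexity scales with the number of parameters rather than the size of the state space.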

50 days ago

10/10 relevant

arXiv


Robust Domain Randomization for Reinforcement Learning

Producing agents that can generalize to a wide range of environments is a significant challenge in reinforcement learning. One method for overcoming this issue is domain randomization, whereby at the start of each training episode some parameters of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. In this work, we formalize the domain randomization problem, and show that minimizing the policy's Lipschitz constant with respect to the randomization parameters leads to low variance in the learned policies. We propose a method where the agent only needs to be trained on one variation of the environment, and its learned state representations are regularized during training to minimize this constant. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.

50 days ago
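The regularization idea can be sketched as a penalty on the distance between the encodings of the same underlying state rendered under two different randomization parameters; the linear "encoder" and all numbers below are toy stand-ins, not the paper's network.

```python
def representation(obs, weights):
    """A toy linear 'encoder' standing in for a learned network layer."""
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]

def lipschitz_penalty(obs_a, obs_b, weights):
    """Squared distance between encodings of two randomized renderings
    of the same state; minimizing it pushes down the policy's
    sensitivity to the randomization parameters."""
    za = representation(obs_a, weights)
    zb = representation(obs_b, weights)
    return sum((x - y) ** 2 for x, y in zip(za, zb))

weights = [[1.0, 0.0], [0.0, 1.0]]
obs_a = [0.5, 1.0]   # state under nominal randomization parameters
obs_b = [0.6, 0.9]   # same state under perturbed parameters
penalty = lipschitz_penalty(obs_a, obs_b, weights)
# training would minimize rl_loss + lambda * penalty
```

In this sketch the agent only ever acts in one environment variation; the second observation is used purely for the regularization term.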

10/10 relevant

arXiv


Robust Model Predictive Shielding for Safe Reinforcement Learning with Stochastic Dynamics

This paper proposes a framework for safe reinforcement learning that can handle stochastic nonlinear dynamical systems. We focus on the setting where the nominal dynamics are known, and are subject to additive stochastic disturbances with known distribution. Our goal is to ensure the safety of a control policy trained using reinforcement learning, e.g., in a simulated environment. We build on the idea of model predictive shielding (MPS), where a backup controller is used to override the learned policy as needed to ensure safety. The key challenge is how to compute a backup policy in the context of stochastic dynamics. We propose to use a tube-based robust NMPC controller as the backup controller. We estimate the tubes using sampled trajectories, leveraging ideas from statistical learning theory to obtain high-probability guarantees. We empirically demonstrate that our approach can ensure safety in stochastic systems, including cart-pole and a non-holonomic particle with random obstacles.

50 days ago
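The shielding logic can be sketched on a scalar toy system: before executing the learned action, check (here by sampling disturbed rollouts of the backup controller) that safety can still be maintained from the resulting state, and otherwise override with the backup. The dynamics, safe set, and controllers below are invented stand-ins for the paper's tube-based NMPC.

```python
import random

def step(x, u, w):
    return x + u + w                      # toy scalar dynamics with disturbance w

def backup_controller(x):
    return -0.5 * x                       # simple stabilizing backup law

def is_safe(x):
    return abs(x) <= 2.0                  # safe set: |x| <= 2

def backup_certifies(x, horizon=10, samples=50, noise=0.05):
    """Sampled-rollout check that the backup keeps the state safe,
    a crude stand-in for the paper's high-probability tube estimate."""
    rng = random.Random(0)
    for _ in range(samples):
        xi = x
        for _ in range(horizon):
            xi = step(xi, backup_controller(xi), rng.uniform(-noise, noise))
            if not is_safe(xi):
                return False
    return True

def shielded_action(x, learned_action):
    x_next = step(x, learned_action, 0.0)        # nominal one-step lookahead
    if is_safe(x_next) and backup_certifies(x_next):
        return learned_action                    # learned action passes the shield
    return backup_controller(x)                  # otherwise override with backup

safe_u = shielded_action(0.0, 0.3)    # mild action near the origin: kept
risky_u = shielded_action(1.9, 1.0)   # would leave the safe set: overridden
```

The learned policy is thus free to act whenever recovery is certified, and the backup only intervenes near the boundary of the safe set.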

10/10 relevant

arXiv


Reciprocal Collision Avoidance for General Nonlinear Agents using Reinforcement Learning

Finding feasible and collision-free paths for multiple nonlinear agents is challenging in decentralized scenarios due to the limited information available about other agents and complex dynamics constraints. In this paper, we propose a fast multi-agent collision avoidance algorithm for general nonlinear agents with continuous action space, where each agent observes only the positions and velocities of nearby agents. To reduce online computation, we first decompose the multi-agent scenario and solve a two-agent collision avoidance problem using reinforcement learning (RL). When extending the trained policy to the multi-agent problem, safety is ensured by introducing optimal reciprocal collision avoidance (ORCA) as linear constraints, and the overall collision avoidance action can be found through simple convex optimization. Most existing RL-based multi-agent collision avoidance algorithms rely on direct control of agent velocities. In sharp contrast, our approach is applicable to general nonlinear agents. Realistic simulations based on nonlinear bicycle agent models are performed in various challenging scenarios, indicating the competitive performance of the proposed method in avoiding collisions, congestion and deadlock with smooth trajectories.

50 days ago
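The "ORCA as linear constraints" step can be sketched as projecting the RL policy's preferred velocity onto half-planes of the form n . v >= b. Cyclic projection below is a simple stand-in for the paper's convex optimization; the constraint values are invented for illustration.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def project_halfplane(v, n, b):
    """Project v onto the half-plane {v : n . v >= b}; n is unit length."""
    slack = dot(n, v) - b
    if slack >= 0:
        return v
    return [v[0] - slack * n[0], v[1] - slack * n[1]]

def safe_velocity(v_pref, constraints, iters=50):
    """Cyclic projection onto all ORCA half-planes; for a feasible
    intersection this converges to a constraint-satisfying velocity."""
    v = list(v_pref)
    for _ in range(iters):
        for n, b in constraints:
            v = project_halfplane(v, n, b)
    return v

v_rl = [1.0, 0.0]                 # velocity preferred by the trained RL policy
orca = [([0.0, 1.0], 0.2)]        # one ORCA half-plane: v_y >= 0.2
v_exec = safe_velocity(v_rl, orca)
# executed velocity keeps v_x and lifts v_y to the constraint boundary
```

Because the constraints are linear in the velocity, this safety layer is cheap enough to run online on top of the two-agent policy.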

10/10 relevant

arXiv


Learning Humanoid Robot Running Skills through Proximal Policy Optimization

At the current level of evolution of Soccer 3D, motion control is a key factor in a team's performance. Recent works take advantage of model-free approaches based on machine learning to exploit robot dynamics in order to obtain faster locomotion skills, achieving running policies and, therefore, opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on deep reinforcement learning that learns running skills without any prior knowledge, using a neural network whose inputs are related to the robot's dynamics. Our results outperformed the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The approach also demonstrated improved sample efficiency, learning how to run in just a few hours. We report our results by analyzing the training procedure and evaluating the policies in terms of speed, reliability and human similarity. Finally, we present the key factors that led us to improve on previous results and share some ideas for future work.

51 days ago
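The objective behind the method named in the title, PPO's clipped surrogate loss, can be shown for a single sample; the network and advantage estimation are omitted, and the ratios below are made up for illustration.

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: L = -min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return -min(ratio * advantage, clipped * advantage)

# A small policy change on a positive-advantage sample is used as-is,
# while a large jump is clipped at 1 + eps, keeping updates conservative:
loss_small_step = ppo_clip_loss(1.05, 1.0)   # inside the clip range
loss_big_step = ppo_clip_loss(2.00, 1.0)     # clipped at 1.2
```

This clipping is what lets PPO take many gradient steps per batch of robot experience without the policy collapsing, which is relevant to the sample efficiency claimed above.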

4/10 relevant

arXiv


Resource Allocation in Mobility-Aware Federated Learning Networks: A Deep Reinforcement Learning Approach

Federated learning allows mobile devices, i.e., workers, to use their local data to collaboratively train a global model required by the model owner, and thus addresses the privacy issues of traditional machine learning. However, federated learning faces the energy constraints of the workers and a high network resource cost, since a number of global model transmissions may be required to achieve the target accuracy. To address the energy constraint, a power beacon can be used to recharge the workers; however, the model owner may need to pay an energy cost to the power beacon for the recharge. To address the high network resource cost, the model owner can use a WiFi channel, called the default channel, for the global model transmissions, but communication interruptions may occur due to the instability of the default channel quality. Special channels such as LTE channels can be used instead, but this incurs a channel cost. As such, the problem of the model owner is to decide the amounts of energy recharged to the workers and to choose the channels used to transmit its global model to the workers, so as to maximize the number of global model transmissions while minimizing the energy and channel costs. This is challenging for the model owner under uncertainty about the channel, energy and mobility states of the workers. In this paper, we therefore propose to employ a Deep Q-Network (DQN) that enables the model owner to find the optimal decisions on the energy and the channels without any a priori network knowledge. Simulation results show that the proposed DQN always achieves better performance than the conventional algorithms.

52 days ago
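The shape of the decision problem can be sketched with a tabular Q-learner choosing a joint (energy-recharge, channel) action; the paper's DQN replaces the table with a neural network, but the Bellman update is the same. The toy reward and transition below are invented for illustration, not the paper's model.

```python
import random
from collections import defaultdict

# Joint action: recharge amount (0 or 1 unit) and channel choice.
ACTIONS = [(e, ch) for e in (0, 1) for ch in ("wifi", "lte")]

def toy_env(state, action):
    """Invented stand-in: reward trades off a transmission payoff
    against energy and channel costs; state cycles to mimic mobility."""
    energy, channel = action
    reward = 1.0 - 0.3 * energy - (0.2 if channel == "lte" else 0.0)
    next_state = (state + 1) % 3
    return next_state, reward

def q_learn(steps=200, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    state = 0
    for _ in range(steps):
        if rng.random() < eps:                      # epsilon-greedy exploration
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, r = toy_env(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = nxt
    return q

q = q_learn()
greedy = max(ACTIONS, key=lambda a: q[(0, a)])
# under this toy reward, the cheapest action (no recharge, WiFi) scores best
```

A DQN would generalize this across the continuous channel, energy and mobility states that make the real problem intractable for a table.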

8/10 relevant

arXiv


Deep Reinforcement Learning Control of Quantum Cartpoles

We generalize a standard benchmark of reinforcement learning, the classical cartpole balancing problem, to the quantum regime by stabilizing a particle in an unstable potential through measurement and feedback. We use state-of-the-art deep reinforcement learning to stabilize the quantum cartpole and find that our deep learning approach performs comparably to or better than other strategies in standard control theory. Our approach also applies to measurement-feedback cooling of quantum oscillators, showing the applicability of deep learning to general continuous-space quantum control.

52 days ago

10/10 relevant

arXiv


Dealing with Sparse Rewards in Reinforcement Learning

Successfully navigating a complex environment to obtain a desired outcome is a difficult task that until recently was believed to be achievable only by humans. This perception has been broken down over time, especially with the introduction of deep reinforcement learning, which has greatly increased the difficulty of tasks that can be automated. However, traditional reinforcement learning agents require an environment that provides frequent extrinsic rewards, which are not known or accessible in many real-world environments. This project explores and contrasts existing reinforcement learning solutions that circumvent the difficulties of environments providing only sparse rewards. Different solutions are implemented over several video game environments with varying difficulty and varying frequency of rewards, so as to properly investigate their applicability. The project introduces a novel reinforcement learning solution that combines aspects of two existing state-of-the-art sparse reward solutions: curiosity driven exploration and unsupervised auxiliary tasks.

52 days ago
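The curiosity component can be sketched as an intrinsic reward equal to the prediction error of a learned forward model, added to the (often zero) extrinsic reward; the linear "model" and numbers below are toy stand-ins, not the project's implementation.

```python
def forward_model(state, action, params):
    """Toy next-state predictor: s' ~ a*s + b*u, standing in for a network."""
    a, b = params
    return a * state + b * action

def intrinsic_reward(state, action, next_state, params):
    """'Surprise' = squared error of the forward model's prediction."""
    pred = forward_model(state, action, params)
    return (next_state - pred) ** 2

def shaped_reward(extrinsic, state, action, next_state, params, beta=0.5):
    return extrinsic + beta * intrinsic_reward(state, action, next_state, params)

params = (1.0, 1.0)                       # the model believes s' = s + u
r_familiar = shaped_reward(0.0, 1.0, 0.5, 1.5, params)  # perfectly predicted
r_novel = shaped_reward(0.0, 1.0, 0.5, 3.5, params)     # surprising transition
# the surprising transition earns reward even with zero extrinsic reward,
# which is what drives exploration when rewards are sparse
```

Auxiliary tasks complement this by giving the representation dense learning signal (e.g. predicting pixel changes) even when both extrinsic and intrinsic rewards are small.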

10/10 relevant

arXiv


Momentum in Reinforcement Learning

We adapt the concept of momentum from optimization to reinforcement learning. Viewing the state-action value functions as an analog of the gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI, and evaluate it on Atari games.

52 days ago
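The averaging idea can be sketched on a tiny MDP: run value iteration while keeping a running average h of the successive q-functions, and act greedily with respect to h. This is a simplified illustration in the spirit of MoVI, not the paper's exact recursion; the 2-state MDP is invented.

```python
STATES, ACTIONS = [0, 1], [0, 1]
# Deterministic toy MDP: next state is (s + a) mod 2; only (0, 1) pays.
P = {(s, a): (s + a) % 2 for s in STATES for a in ACTIONS}
R = {(s, a): 1.0 if (s, a) == (0, 1) else 0.0 for s in STATES for a in ACTIONS}
GAMMA = 0.9

def bellman(q):
    """One step of value iteration: T q = r + gamma * max_b q(s', b)."""
    return {(s, a): R[(s, a)] + GAMMA * max(q[(P[(s, a)], b)] for b in ACTIONS)
            for s in STATES for a in ACTIONS}

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
h = dict(q)                                # momentum: average of past q-functions
for k in range(1, 51):
    q = bellman(q)
    h = {sa: h[sa] + (q[sa] - h[sa]) / k for sa in q}   # running average

policy = {s: max(ACTIONS, key=lambda a: h[(s, a)]) for s in STATES}
# the averaged h still identifies the rewarding action in each state
```

Because h averages the iterates, zero-mean errors injected at individual iterations tend to cancel, which is the error-averaging property the analysis formalizes.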

9/10 relevant

arXiv


Towards a Reinforcement Learning Environment Toolbox for Intelligent Electric Motor Control

Electric motors are used in many applications, and their efficiency is strongly dependent on their control. Among others, PI approaches and model predictive control methods are well known in the scientific literature and industrial practice. A novel approach is to use reinforcement learning (RL) to have an agent learn electric drive control from scratch merely by interacting with a suitable control environment. RL has achieved remarkable results with super-human performance in many games (e.g. Atari classics or Go) and is also becoming more popular in control tasks like the cartpole or swinging pendulum benchmarks. In this work, the open-source Python package gym-electric-motor (GEM) is developed to ease the training of RL agents for electric motor control. Furthermore, this package can be used to compare the trained agents with other state-of-the-art control approaches. It is based on the OpenAI Gym framework, which provides a widely used interface for the evaluation of RL agents. The initial package version covers different DC motor variants and the prevalent permanent magnet synchronous motor, as well as different power electronic converters and a mechanical load model. Due to the modular setup of the proposed toolbox, additional motor, load, and power electronic models can easily be added in the future. Furthermore, different secondary effects like controller interlocking time or noise are considered. An intelligent controller example based on the deep deterministic policy gradient algorithm, which controls a series DC motor, is presented and compared to a cascaded PI controller as a baseline for future research. Fellow researchers are encouraged to use the framework in their RL investigations or to contribute to the functional scope (e.g. further motor types) of the package.

52 days ago
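The reset/step interaction pattern such a toolbox exposes, and the PI baseline mentioned above, can be sketched together. `ToyMotorEnv` is a made-up first-order stand-in, not a GEM environment; the real package provides motor, converter, and load models behind a Gym-style interface.

```python
class ToyMotorEnv:
    """Made-up first-order 'motor': current i relaxes toward applied voltage u."""
    def __init__(self, tau=0.1):
        self.tau, self.i = tau, 0.0

    def reset(self):
        self.i = 0.0
        return self.i

    def step(self, u):
        self.i += self.tau * (u - self.i)   # first-order lag dynamics
        reward = -abs(1.0 - self.i)         # track the reference current i* = 1
        return self.i, reward, False, {}

def pi_controller(error, integral, kp=2.0, ki=0.5):
    """Simple PI law: u = Kp*e + Ki*sum(e)."""
    return kp * error + ki * integral

env = ToyMotorEnv()
obs, integral = env.reset(), 0.0
for _ in range(200):
    error = 1.0 - obs
    integral += error
    obs, reward, done, info = env.step(pi_controller(error, integral))
# after the transient, the PI baseline holds the current near the reference
```

An RL agent trained on the same interface would replace `pi_controller` with a learned policy, which is exactly the comparison the toolbox is built to support.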

7/10 relevant

arXiv
