Reinforcement learning (RL) is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent learning a policy through interaction with its environment so as to maximize cumulative return or achieve a specific goal.
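The agent-environment interaction loop described above can be sketched with tabular Q-learning on a toy problem. This is a minimal illustration, not drawn from any of the papers below; the 5-state chain MDP, reward scheme, and hyperparameters are all hypothetical choices for the example.

```python
import random

# Toy environment (hypothetical): a 5-state chain, states 0..4,
# actions 0 (left) / 1 (right), reward 1 only on reaching state 4.
N_STATES = 5

def step(state, action):
    """Environment transition: move one step left or right along the chain."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by repeatedly interacting with the environment."""
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection balances exploration/exploitation.
            if random.random() < epsilon:
                action = random.randint(0, 1)
            else:
                action = 0 if Q[state][0] > Q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Temporal-difference update toward the Bellman target.
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q

Q = q_learning()
# The learned greedy policy moves right in every non-terminal state.
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
```

After training, the greedy policy extracted from `Q` heads straight for the rewarding terminal state, which is exactly the "learn a policy to maximize return" loop the definition refers to.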
This column collects recent papers on reinforcement learning (RL) from top international conferences, including but not limited to ICML, AAAI, IJCAI, NIPS, ICLR, AAMAS, CVPR, and ICRA.
Today we share the papers on reinforcement learning from the 2021 International Conference on Machine Learning (ICML). ICML has grown into the annual flagship international conference on machine learning, organized by the International Machine Learning Society (IMLS).
- [1]. Safe Reinforcement Learning with Linear Function Approximation.
- [2]. Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees.
- [3]. Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision.
- [4]. Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
- [5]. Learning Routines for Effective Off-Policy Reinforcement Learning.
- [6]. Goal-Conditioned Reinforcement Learning with Imagined Subgoals.
- [7]. Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment.
- [8]. Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning.
- [9]. Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills.
- [10]. Improved Corruption Robust Algorithms for Episodic Reinforcement Learning.
- [11]. Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning.
- [12]. Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing.
- [13]. Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning.
- [14]. Offline Reinforcement Learning with Pseudometric Learning.
- [15]. Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation.
- [16]. SAINT-ACC: Safety-Aware Intelligent Adaptive Cruise Control for Autonomous Vehicles Using Deep Reinforcement Learning.
- [17]. Kernel-Based Reinforcement Learning: A Finite-Time Analysis.
- [18]. Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning.
- [19]. Reinforcement Learning Under Moral Uncertainty.
- [20]. Self-Paced Context Evaluation for Contextual Reinforcement Learning.
- [21]. Model-based Reinforcement Learning for Continuous Control with Posterior Sampling.
- [22]. Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach.
- [23]. PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning.
- [24]. A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation.
- [25]. Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning.
- [26]. Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective.
- [27]. Detecting Rewards Deterioration in Episodic Reinforcement Learning.
- [28]. UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning.
- [29]. Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning.
- [30]. Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient.
- [31]. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation.
- [32]. Generalizable Episodic Memory for Deep Reinforcement Learning.
- [33]. Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning.
- [34]. Randomized Exploration in Reinforcement Learning with General Value Function Approximation.
- [35]. Emphatic Algorithms for Deep Reinforcement Learning.
- [36]. Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations.
- [37]. Reward Identification in Inverse Reinforcement Learning.
- [38]. A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning.
- [39]. A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning.
- [40]. High Confidence Generalization for Reinforcement Learning.
- [41]. Offline Reinforcement Learning with Fisher Divergence Critic Regularization.
- [42]. Revisiting Peng’s Q(λ) for Modern Reinforcement Learning.
- [43]. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning.
- [44]. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training.
- [45]. Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot.
- [46]. MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning.
- [47]. Parallel Droplet Control in MEDA Biochips using Multi-Agent Reinforcement Learning.
- [48]. Cooperative Exploration for Multi-Agent Deep Reinforcement Learning.
- [49]. Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition.
- [50]. Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices.
- [51]. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play.
- [52]. Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning.
- [53]. Inverse Constrained Reinforcement Learning.
- [54]. Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.
- [55]. Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs.
- [56]. Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks.
- [57]. Counterfactual Credit Assignment in Model-Free Reinforcement Learning.
- [58]. Offline Meta-Reinforcement Learning with Advantage Weighting.
- [59]. Emergent Social Learning via Multi-agent Reinforcement Learning.
- [60]. Density Constrained Reinforcement Learning.
- [61]. Decoupling Value and Policy for Generalization in Reinforcement Learning.
- [62]. Model-Based Reinforcement Learning via Latent-Space Collocation.
- [63]. Recomposing the Reinforcement Learning Building Blocks with Hypernetworks.
- [64]. RRL: Resnet as representation for Reinforcement Learning.
- [65]. Structured World Belief for Reinforcement Learning in POMDP.
- [66]. Multi-Task Reinforcement Learning with Context-based Representations.
- [67]. Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks.
- [68]. PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration.
- [69]. Decoupling Representation Learning from Reinforcement Learning.
- [70]. Reinforcement Learning for Cost-Aware Markov Decision Processes.
- [71]. REPAINT: Knowledge Transfer in Deep Reinforcement Learning.
- [72]. Safe Reinforcement Learning Using Advantage-Based Intervention.
- [73]. Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing.
- [74]. On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP.
- [75]. Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning.
- [76]. Deep Reinforcement Learning amidst Continual Structured Non-Stationarity.
- [77]. CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee.
- [78]. Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies.
- [79]. Reinforcement Learning with Prototypical Representations.
- [80]. Continuous-time Model-based Reinforcement Learning.
- [81]. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL.
- [82]. DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning.
- [83]. Near Optimal Reward-Free Reinforcement Learning.
- [84]. FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning.
- [85]. On-Policy Deep Reinforcement Learning for the Average-Reward Criterion.
- [86]. MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration.
- [87]. Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity.
- [88]. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping.
- [89]. Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning.
- [90]. Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning.