[RL Paper Collection] Reinforcement Learning Papers at ICML 2021


Reinforcement learning (RL) is one of the major paradigms of machine learning. It describes and solves the problem of an agent learning a policy through interaction with an environment, so as to maximize cumulative reward or achieve a specific goal.
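To make that agent-environment loop concrete, below is a minimal tabular Q-learning sketch. It is an illustration only, not code from any paper in this collection; the `env` object with `reset()`, `step()`, and `n_actions` is a hypothetical interface, loosely modeled on the common Gym convention.

```python
# Minimal sketch of the RL agent-environment interaction loop,
# using tabular Q-learning. `env` is a hypothetical discrete
# environment (reset/step/n_actions), assumed for illustration.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # Q(s, a) table, defaults to 0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection: explore vs. exploit.
            if random.random() < epsilon:
                action = random.randrange(env.n_actions)
            else:
                action = max(range(env.n_actions), key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # TD update: move Q(s, a) toward reward + discounted best next value.
            best_next = max(q[(next_state, a)] for a in range(env.n_actions))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```

Given any environment exposing that interface, acting greedily with respect to the learned `q` table yields the resulting policy.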
This column collects papers in the reinforcement learning (RL) field from top international conferences in recent years, including but not limited to ICML, AAAI, IJCAI, NIPS, ICLR, AAMAS, CVPR, and ICRA.


This post shares the papers on the topic of reinforcement learning from the 2021 International Conference on Machine Learning (ICML). ICML has grown into a premier annual international machine learning conference, organized by the International Machine Learning Society (IMLS).

  • [1]. Safe Reinforcement Learning with Linear Function Approximation.
  • [2]. Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees.
  • [3]. Low-Precision Reinforcement Learning: Running Soft Actor-Critic in Half Precision.
  • [4]. Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
  • [5]. Learning Routines for Effective Off-Policy Reinforcement Learning.
  • [6]. Goal-Conditioned Reinforcement Learning with Imagined Subgoals.
  • [7]. Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment.
  • [8]. Solving Challenging Dexterous Manipulation Tasks With Trajectory Optimisation and Reinforcement Learning.
  • [9]. Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills.
  • [10]. Improved Corruption Robust Algorithms for Episodic Reinforcement Learning.
  • [11]. Variational Empowerment as Representation Learning for Goal-Conditioned Reinforcement Learning.
  • [12]. Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing.
  • [13]. Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning.
  • [14]. Offline Reinforcement Learning with Pseudometric Learning.
  • [15]. Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation.
  • [16]. SAINT-ACC: Safety-Aware Intelligent Adaptive Cruise Control for Autonomous Vehicles Using Deep Reinforcement Learning.
  • [17]. Kernel-Based Reinforcement Learning: A Finite-Time Analysis.
  • [18]. Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning.
  • [19]. Reinforcement Learning Under Moral Uncertainty.
  • [20]. Self-Paced Context Evaluation for Contextual Reinforcement Learning.
  • [21]. Model-based Reinforcement Learning for Continuous Control with Posterior Sampling.
  • [22]. Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach.
  • [23]. PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning.
  • [24]. A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation.
  • [25]. Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning.
  • [26]. Spectral Normalisation for Deep Reinforcement Learning: An Optimisation Perspective.
  • [27]. Detecting Rewards Deterioration in Episodic Reinforcement Learning.
  • [28]. UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning.
  • [29]. Grounding Language to Entities and Dynamics for Generalization in Reinforcement Learning.
  • [30]. Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient.
  • [31]. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation.
  • [32]. Generalizable Episodic Memory for Deep Reinforcement Learning.
  • [33]. Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning.
  • [34]. Randomized Exploration in Reinforcement Learning with General Value Function Approximation.
  • [35]. Emphatic Algorithms for Deep Reinforcement Learning.
  • [36]. Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations.
  • [37]. Reward Identification in Inverse Reinforcement Learning.
  • [38]. A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning.
  • [39]. A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning.
  • [40]. High Confidence Generalization for Reinforcement Learning.
  • [41]. Offline Reinforcement Learning with Fisher Divergence Critic Regularization.
  • [42]. Revisiting Peng’s Q(λ) for Modern Reinforcement Learning.
  • [43]. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning.
  • [44]. PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training.
  • [45]. Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot.
  • [46]. MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning.
  • [47]. Parallel Droplet Control in MEDA Biochips using Multi-Agent Reinforcement Learning.
  • [48]. Cooperative Exploration for Multi-Agent Deep Reinforcement Learning.
  • [49]. Coach-Player Multi-agent Reinforcement Learning for Dynamic Team Composition.
  • [50]. Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices.
  • [51]. A Sharp Analysis of Model-based Reinforcement Learning with Self-Play.
  • [52]. Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning.
  • [53]. Inverse Constrained Reinforcement Learning.
  • [54]. Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.
  • [55]. Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs.
  • [56]. Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks.
  • [57]. Counterfactual Credit Assignment in Model-Free Reinforcement Learning.
  • [58]. Offline Meta-Reinforcement Learning with Advantage Weighting.
  • [59]. Emergent Social Learning via Multi-agent Reinforcement Learning.
  • [60]. Density Constrained Reinforcement Learning.
  • [61]. Decoupling Value and Policy for Generalization in Reinforcement Learning.
  • [62]. Model-Based Reinforcement Learning via Latent-Space Collocation.
  • [63]. Recomposing the Reinforcement Learning Building Blocks with Hypernetworks.
  • [64]. RRL: Resnet as representation for Reinforcement Learning.
  • [65]. Structured World Belief for Reinforcement Learning in POMDP.
  • [66]. Multi-Task Reinforcement Learning with Context-based Representations.
  • [67]. Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks.
  • [68]. PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration.
  • [69]. Decoupling Representation Learning from Reinforcement Learning.
  • [70]. Reinforcement Learning for Cost-Aware Markov Decision Processes.
  • [71]. REPAINT: Knowledge Transfer in Deep Reinforcement Learning.
  • [72]. Safe Reinforcement Learning Using Advantage-Based Intervention.
  • [73]. Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing.
  • [74]. On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP.
  • [75]. Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning.
  • [76]. Deep Reinforcement Learning amidst Continual Structured Non-Stationarity.
  • [77]. CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee.
  • [78]. Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies.
  • [79]. Reinforcement Learning with Prototypical Representations.
  • [80]. Continuous-time Model-based Reinforcement Learning.
  • [81]. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL.
  • [82]. DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning.
  • [83]. Near Optimal Reward-Free Reinforcement Learning.
  • [84]. FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning.
  • [85]. On-Policy Deep Reinforcement Learning for the Average-Reward Criterion.
  • [86]. MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration.
  • [87]. Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity.
  • [88]. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping.
  • [89]. Learning Fair Policies in Decentralized Cooperative Multi-Agent Reinforcement Learning.
  • [90]. Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning.
