RLJ proceedings are available at https://rlj.cs.umass.edu/2024/2024issue.html
Poster sessions for a paper take place on the same day as its oral presentation.
Aug 10, Oral Track 1: Evaluation - Room 168
-
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
Rafael Rafailov, Kyle Beltran Hatch, Anikait Singh, Aviral Kumar, Laura Smith, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip J. Ball, Jiajun Wu, Sergey Levine, Chelsea Finn
-
Harnessing Discrete Representations for Continual Reinforcement Learning
Edan Jacob Meyer, Adam White, Marlos C. Machado
-
Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning
Davide Corsi, Davide Camponogara, Alessandro Farinelli
-
Investigating the Interplay of Prioritized Replay and Generalization
Parham Mohammad Panahi, Andrew Patterson, Martha White, Adam White
-
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
Kartik Choudhary, Dhawal Gupta, Philip S. Thomas
-
Resource Usage Evaluation of Discrete Model-Free Deep Reinforcement Learning Algorithms
Olivia P. Dizon-Paradis, Stephen E. Wormald, Daniel E. Capecci, Avanti Bhandarkar, Damon L. Woodard
-
OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments
Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, Kristian Kersting
-
The Cross-environment Hyperparameter Setting Benchmark for Reinforcement Learning
Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, Adam White
-
An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks
Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schaeffer, João Silvério, Freek Stulp
-
Combining Automated Optimisation of Hyperparameters and Reward Shape
Julian Dierkes, Emma Cramer, Holger Hoos, Sebastian Trimpe
-
Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning
Marcel Hussing, Jorge Mendez-Mendez, Anisha Singrodia, Cassandra Kent, Eric Eaton
-
Stable-Baselines3: Reliable Reinforcement Learning Implementations
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, Noah Dormann
Aug 10, Oral Track 2: Theoretical RL and bandit algorithms - Room 165/169
-
A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli
-
Bandits with Multimodal Structure
Hassan Saber, Odalric-Ambrym Maillard
-
Policy Gradient with Active Importance Sampling
Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli
-
Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes
He Wang, Laixi Shi, Yuejie Chi
-
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
Woojin Jeong, Seungki Min
-
A Batch Sequential Halving Algorithm without Performance Degradation
Sotetsu Koyamada, Soichiro Nishimori, Shin Ishii
-
Graph Neural Thompson Sampling
Shuang Wu, Arash A. Amini
-
A Tighter Convergence Proof of Reverse Experience Replay
Nan Jiang, Jinzhao Li, Yexiang Xue
-
Cost Aware Best Arm Identification
Kellen Kanarios, Qining Zhang, Lei Ying
-
Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs
Michael Lu, Matin Aghaei, Anant Raj, Sharan Vaswani
-
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms
Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, Mohammad Ghavamzadeh
-
Causal Contextual Bandits with Adaptive Context
Rahul Madhavan, Aurghya Maiti, Gaurav Sinha, Siddharth Barman
-
Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch
Shangtong Zhang, Remi Tachet des Combes, Romain Laroche
Aug 10, Oral Track 3: Multi-agent RL and planning algorithms - Room 174/176
-
Co-Learning Empirical Games & World Models
Max Olan Smith, Michael P. Wellman
-
Best Response Shaping
Milad Aghajohari, Tim Cooijmans, Juan Agustin Duque, Shunichi Akatsuka, Aaron Courville
-
Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments
Daniel Melcer, Christopher Amato, Stavros Tripakis
-
Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning
Yuxin Chen, Chen Tang, Thomas Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, Wei Zhan
-
Reinforcement Learning from Delayed Observations via World Models
Armin Karamzade, Kyungmin Kim, Montek Kalsi, Roy Fox
-
Cyclicity-Regularized Coordination Graphs
Oliver Järnefelt, Mahdi Kallel, Carlo D'Eramo
-
Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization
Aditya Kapoor, Benjamin Freed, Jeff Schneider, Howie Choset
-
Trust-based Consensus in Multi-Agent Reinforcement Learning Systems
Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi
-
Inception: Efficiently Computable Misinformation Attacks on Markov Games
Jeremy McMahan, Young Wu, Yudong Chen, Jerry Zhu, Qiaomin Xie
-
Human-compatible driving agents through data-regularized self-play reinforcement learning
Daphne Cornelisse, Eugene Vinitsky
-
On Welfare-Centric Fair Reinforcement Learning
Cyrus Cousins, Kavosh Asadi, Elita Lobo, Michael Littman
-
BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations
Robert J. Moss, Anthony Corso, Jef Caers, Mykel Kochenderfer
Aug 10, Oral Track 4: Deep reinforcement learning - Room 163
-
Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
Marcel Hussing, Claas A Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton
-
Mixture of Experts in a Mixture of RL settings
Timon Willi, Johan Samir Obando Ceron, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, Pablo Samuel Castro
-
Light-weight Probing of Unsupervised Representations for Reinforcement Learning
Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, Nicolas Carion
-
Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace
Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen
-
PASTA: Pretrained Action-State Transformer Agents
Raphael Boige, Yannis Flet-Berliac, Lars C.P.M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
-
Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL
Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann
-
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Abdulaziz Almuzairee, Nicklas Hansen, Henrik I Christensen
-
On the consistency of hyper-parameter selection in value-based deep reinforcement learning
Johan Samir Obando Ceron, João Guilherme Madeira Araújo, Aaron Courville, Pablo Samuel Castro
-
Policy-Guided Diffusion
Matthew Thomas Jackson, Michael Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Nicolaus Foerster
-
SplAgger: Split Aggregation for Meta-Reinforcement Learning
Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson
-
Learning to Optimize for Reinforcement Learning
Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, Zhongwen Xu
-
Investigating the properties of neural network representations in reinforcement learning
Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White
-
Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Andreas Look, Barbara Rakitsch, Melih Kandemir, Jan Peters
Aug 11, Oral Track 1: RL from human feedback and imitation learning - Room 168
-
Learning Action-based Representations Using Invariance
Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, Amy Zhang
-
Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations
Connor Mattson, Anurag Sidharth Aribandi, Daniel S. Brown
-
Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
Johannes Ackermann, Takayuki Osa, Masashi Sugiyama
-
Offline Diversity Maximization under Imitation Constraints
Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev
-
Imitation Learning from Observation through Optimal Transport
Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek
-
ROIL: Robust Offline Imitation Learning without Trajectories
Gersi Doko, Guang Yang, Daniel S. Brown, Marek Petrik
-
Agent-Centric Human Demonstrations Train World Models
James Staley, Elaine Short, Shivam Goel, Yash Shukla
-
Inverse Reinforcement Learning with Multiple Planning Horizons
Jiayu Yao, Weiwei Pan, Finale Doshi-Velez, Barbara E Engelhardt
-
Semi-Supervised One Shot Imitation Learning
Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel
-
Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis
Qining Zhang, Honghao Wei, Lei Ying
-
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, Josiah P. Hanna
-
Reward (Mis)design for autonomous driving
W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone
-
Models of human preference for learning reward functions
W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro G Allievi
Aug 11, Oral Track 2: Foundations - Room 165/169
-
The Cliff of Overcommitment with Policy Gradient Step Sizes
Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White, Philip S. Thomas
-
Demystifying the Recency Heuristic in Temporal-Difference Learning
Brett Daley, Marlos C. Machado, Martha White
-
When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning
Claas A Voelcker, Tyler Kastner, Igor Gilitschenski, Amir-massoud Farahmand
-
A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage
Kevin Tan, Ziping Xu
-
States as goal-directed concepts: an epistemic approach to state-representation learning
Nadav Amir, Yael Niv, Angela J Langdon
-
Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior
Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael Littman
-
Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets
Benjamin Freed, Thomas Wei, Roberto Calandra, Jeff Schneider, Howie Choset
-
Multistep Inverse Is Not All You Need
Alexander Levine, Peter Stone, Amy Zhang
-
An Idiosyncrasy of Time-discretization in Reinforcement Learning
Kris De Asis, Richard S. Sutton
-
Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL
Miguel Suau, Matthijs T. J. Spaan, Frans A Oliehoek
-
Mitigating the Curse of Horizon in Monte-Carlo Returns
Alex Ayoub, David Szepesvari, Francesco Zanini, Bryan Chan, Dhawal Gupta, Bruno Castro da Silva, Dale Schuurmans
-
Structure in Deep Reinforcement Learning: A Survey and Open Problems
Aditya Mohan, Amy Zhang, Marius Lindauer
Aug 11, Oral Track 3: Applied reinforcement learning - Room 174/176
-
Sequential Decision-Making for Inline Text Autocomplete
Rohan Chitnis, Shentao Yang, Alborz Geramifard
-
A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo
Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, Peter Stone
-
Towards General Negotiation Strategies with End-to-End Reinforcement Learning
Bram M. Renting, Thomas M. Moerland, Holger Hoos, Catholijn M Jonker
-
JoinGym: An Efficient Join Order Selection Environment
Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, Wen Sun
-
Policy Architectures for Compositional Generalization in Control
Allan Zhou, Vikash Kumar, Chelsea Finn, Aravind Rajeswaran
-
Verification-Guided Shielding for Deep Reinforcement Learning
Davide Corsi, Guy Amir, Andoni Rodríguez, Guy Katz, César Sánchez, Roy Fox
-
Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning
Zakariae El Asri, Olivier Sigaud, Nicolas Thome
-
Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies
Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan
-
RL for Consistency Models: Reward Guided Text-to-Image Generation with Fast Inference
Owen Oertell, Jonathan Daniel Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
-
Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps
Linfeng Zhao, Lawson L.S. Wong
-
Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning
Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, A. Rupam Mahmood
-
Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras
Mhairi Dunion, Stefano V Albrecht
-
Emergent behaviour and neural dynamics in artificial agents tracking odour plumes
Satpreet H. Singh, Floris van Breugel, Rajesh P. N. Rao, Bingni W. Brunton
-
GVFs in the real world: making predictions online for water treatment
Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White
Aug 11, Oral Track 4: RL algorithms - Room 163
-
Weight Clipping for Deep Continual and Reinforcement Learning
Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood
-
Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes
Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang
-
ROER: Regularized Optimal Experience Replay
Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen
-
Learning Discrete World Models for Heuristic Search
Forest Agostinelli, Misagh Soltani
-
Boosting Soft Q-Learning by Bounding
Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V Kulkarni
-
Reward Centering
Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton
-
Stabilizing Extreme Q-learning by Maclaurin Expansion
Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
-
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe
-
A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization
Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart
-
PID Accelerated Temporal Difference Algorithms
Mark Bedaywi, Amin Rakhsha, Amir-massoud Farahmand
-
SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning
Khurram Javed, Arsalan Sharifnassab, Richard S. Sutton
-
Posterior Sampling for Continuing Environments
Wanqiao Xu, Shi Dong, Benjamin Van Roy
-
Off-Policy Actor-Critic with Emphatic Weightings
Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White
Aug 12, Oral Track 1: Social and economic aspects - Room 168
-
Value Internalization: Learning and Generalizing from Social Reward
Frieda Rong, Max Kleiman-Weiner
-
Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?
Akansha Kalra, Daniel S. Brown
-
Three Dogmas of Reinforcement Learning
David Abel, Mark K Ho, Anna Harutyunyan
-
MultiHyRL: Robust Hybrid RL for Obstacle Avoidance against Adversarial Attacks on the Observation Space
Jan de Priester, Zachary Bell, Prashant Ganesh, Ricardo Sanfelice
-
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, Bin Liu
-
Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning
Taylor W. Killian, Sonali Parbhoo, Marzyeh Ghassemi
Aug 12, Oral Track 2: Theoretical RL - Room 165/169
-
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
Noah Golowich, Ankur Moitra
-
Optimizing Rewards while meeting ω-regular Constraints
Christopher Zeitler, Kristina Miller, Sayan Mitra, John Schierman, Mahesh Viswanathan
-
Distributionally Robust Constrained Reinforcement Learning under Strong Duality
Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue
-
Non-adaptive Online Finetuning for Offline Reinforcement Learning
Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, Marek Petrik
-
Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation
Yixuan Zhang, Qiaomin Xie
-
An Optimal Tightness Bound for the Simulation Lemma
Sam Lobel, Ronald Parr
-
Temporal Difference Learning with Compressed Updates: Error-Feedback meets Reinforcement Learning
Aritra Mitra, George J. Pappas, Hamed Hassani
Aug 12, Oral Track 3: Exploration - Room 174/176
-
The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough
Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti
-
Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth
-
Exploring Uncertainty in Distributional Reinforcement Learning
Georgy Antonov, Peter Dayan
-
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu
-
Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning
Trevor McInroe, Adam Jelley, Stefano V Albrecht, Amos Storkey
-
Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance
Jakob Hollenstein, Sayantan Auddy, Matteo Saveriano, Erwan Renaudo, Justus Piater
Aug 12, Oral Track 4: Hierarchical RL and planning algorithms - Room 163
-
Online Planning in POMDPs with State-Requests
Raphaël Avalos, Eugenio Bargiacchi, Ann Nowé, Diederik Roijers, Frans A Oliehoek
-
Informed POMDP: Leveraging Additional Information in Model-Based RL
Gaspard Lambrechts, Adrien Bolland, Damien Ernst
-
Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning
Erin J Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang
-
Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization
Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp
-
Learning Abstract World Models for Value-preserving Planning with Options
Rafael Rodriguez-Sanchez, George Konidaris
-
Granger Causal Interaction Skill Chains
Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum
-
On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning
Philipp Becker, Gerhard Neumann
-
Mitigating Value Hallucination in Dyna-Style Planning via Multistep Predecessor Models
Farzane Aminmansour, Taher Jafferjee, Ehsan Imani, Erin J. Talvitie, Michael Bowling, Martha White