
RLC 2025 Schedule: August 5–9

Tuesday, August 5

8:30 AM - 5 PM: Coffee and drinks
9 AM - 5 PM: Workshops (8 parallel sessions)
12:30 PM - 2 PM: Lunch

Wednesday, August 6

8:30 AM - 5 PM: Coffee and drinks
8:45 AM - 9 AM: Opening Comments
9 AM - 10 AM: Keynote: Leslie Kaelbling
20-minute Break
10:20 AM - 11:15 AM: Orals (4 parallel sessions)
30-minute Break
11:45 AM - 12:30 PM: Orals (4 parallel sessions)
12:30 PM - 2 PM: Lunch
2 PM - 3 PM: Keynote: Dale Schuurmans
3 PM - 6 PM: Poster Session
6 PM: Banquet (Edmonton Convention Centre)

Thursday, August 7

8:30 AM - 5 PM: Coffee and drinks
9 AM - 10 AM: Keynote: Joelle Pineau
20-minute Break
10:20 AM - 11:15 AM: Orals (4 parallel sessions)
30-minute Break
11:45 AM - 12:30 PM: Orals (4 parallel sessions)
12:30 PM - 2 PM: Lunch
2 PM - 3 PM: Keynote: Michael Littman
3 PM - 6 PM: Poster Session
6 PM: Dinner on your own

Friday, August 8

8:30 AM - 5 PM: Coffee and drinks
9 AM - 10 AM: Keynote: Peter Dayan
20-minute Break
10:20 AM - 11:15 AM: Orals (4 parallel sessions)
30-minute Break
11:45 AM - 12:30 PM: Orals (4 parallel sessions)
12:30 PM - 2 PM: Lunch
2 PM - 3 PM: Keynote: Richard S. Sutton
3 PM - 6 PM: Poster Session
6 PM: Dinner on your own

Saturday, August 9

9 AM - 10 AM: Breakfast & Meet-ups
10 AM - 11 AM: Town Hall
11 AM - 1 PM: Socials, Meet-ups, Excursions

Oral Talks

August 6

Track 1: RL algorithms

  1. Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes
  2. RL\(^3\): Boosting Meta Reinforcement Learning via RL inside RL\(^2\)
  3. Fast Adaptation with Behavioral Foundation Models
  4. Understanding Learned Representations and Action Collapse in Visual Reinforcement Learning
  5. Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
  6. ProtoCRL: Prototype-based Network for Continual Reinforcement Learning
  7. Offline Reinforcement Learning with Domain-Unlabeled Data
  8. SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning
  9. Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
  10. Zero-Shot Reinforcement Learning Under Partial Observability
  11. Adaptive Submodular Policy Optimization

Track 2: RL from human feedback, Imitation Learning

  1. Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
  2. Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations
  3. DisDP: Robust Imitation Learning via Disentangled Diffusion Policies
  4. Mitigating Goal Misgeneralization via Minimax Regret
  5. Modelling human exploration with light-weight meta reinforcement learning algorithms
  6. Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
  7. PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning
  8. Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
  9. One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
  10. Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms
  11. Reward Distance Comparisons Under Transition Sparsity

Track 3: Hierarchical RL, Planning algorithms

  1. AVID: Adapting Video Diffusion Models to World Models
  2. The Confusing Instance Principle for Online Linear Quadratic Control
  3. Long-Horizon Planning with Predictable Skills
  4. Optimal discounting for offline input-driven MDP
  5. DeepCubeAF: A Foundation Model for Generalizable Pathfinding Heuristics
  6. A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching
  7. Focused Skill Discovery: Using Per-Factor Empowerment to Control State Variables
  8. Representation Learning and Skill Discovery with Empowerment
  9. Compositional Instruction Following with Language Models and Reinforcement Learning
  10. Composition and Zero-Shot Transfer with Lattice Structures in Reinforcement Learning
  11. Double Horizon Model-Based Policy Optimization

Track 4: Evaluation, Benchmarks

  1. Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
  2. Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
  3. Multi-Task Reinforcement Learning Enables Parameter Scaling
  4. Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
  5. PufferLib 2.0: Reinforcement Learning at 1M steps/s
  6. Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL
  7. Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains
  8. How Should We Meta-Learn Reinforcement Learning Algorithms?
  9. AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
  10. Mental Modelling of Reinforcement Learning Agents by Language Models

August 7

Track 1: Deep RL

  1. Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
  2. Impoola: The Power of Average Pooling for Image-based Deep Reinforcement Learning
  3. Eau De \(Q\)-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning
  4. Disentangling Recognition and Decision Regrets in Image-Based Reinforcement Learning
  5. Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions
  6. Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
  7. Sampling from Energy-based Policies using Diffusion
  8. Optimistic critics can empower small actors
  9. Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
  10. AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

Track 2: Social and economic aspects, Neuroscience and cognitive science

  1. Pareto Optimal Learning from Preferences with Hidden Context
  2. When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions
  3. Reinforcement Learning from Human Feedback with High-Confidence Safety Guarantees
  4. Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
  5. Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference
  6. High-Confidence Policy Improvement from Human Feedback
  7. MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight
  8. Building Sequential Resource Allocation Mechanisms without Payments
  9. From Explainability to Interpretability: Interpretable Reinforcement Learning Via Model Explanations
  10. Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
  11. AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Track 3: Exploration

  1. Uncertainty Prioritized Experience Replay
  2. Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
  3. Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
  4. Syllabus: Portable Curricula for Reinforcement Learning Agents
  5. Exploration-Free Reinforcement Learning with Linear Function Approximation
  6. Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
  7. Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World
  8. An Optimisation Framework for Unsupervised Environment Design
  9. Epistemically-guided forward-backward exploration
  10. RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Track 4: Theoretical RL, Bandit algorithms

  1. A Finite-Time Analysis of Distributed Q-Learning
  2. Finite-Time Analysis of Minimax Q-Learning
  3. Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
  4. Non-Stationary Latent Auto-Regressive Bandits
  5. A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
  6. Leveraging priors on distribution functions for multi-arm bandits
  7. Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits
  8. On Slowly-varying Non-stationary Bandits
  9. Empirical Bound Information-Directed Sampling
  10. Thompson Sampling for Constrained Bandits
  11. Achieving Limited Adaptivity for Multinomial Logistic Bandits

August 8

Track 1: RL algorithms, Deep RL

  1. Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks
  2. Cascade - A sequential ensemble method for continuous control tasks
  3. HANQ: Hypergradients, Asymmetry, and Normalization for Fast and Stable Deep \(Q\)-Learning
  4. Rectifying Regression in Reinforcement Learning
  5. Efficient Morphology-Aware Policy Transfer to New Embodiments
  6. Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
  7. Concept-Based Off-Policy Evaluation
  8. Multiple-Frequencies Population-Based Training
  9. AVG-DICE: Stationary Distribution Correction by Regression
  10. Deep Reinforcement Learning with Gradient Eligibility Traces
  11. Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning

Track 2: Applied RL

  1. Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
  2. Chargax: A JAX Accelerated EV Charging Simulator
  3. WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies
  4. Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
  5. Quantitative Resilience Modeling for Autonomous Cyber Defense
  6. Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits
  7. Gaussian Process Q-Learning for Finite-Horizon Markov Decision Process
  8. Hybrid Classical/RL Local Planner for Ground Robot Navigation
  9. V-Max: Making RL Practical for Autonomous Driving
  10. Shaping Laser Pulses with Reinforcement Learning
  11. Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Track 3: Multi-agent RL

  1. Reinforcement Learning for Finite Space Mean-Field Type Game
  2. Collaboration Promotes Group Resilience in Multi-Agent RL
  3. Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models
  4. Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense
  5. Efficient Information Sharing for Training Decentralized Multi-Agent World Models
  6. Adaptive Reward Sharing to Enhance Learning in the Context of Multiagent Teams
  7. Seldonian Reinforcement Learning for Ad Hoc Teamwork
  8. Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
  9. TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems via Local Trajectory Encoding
  10. PEnGUiN: Partially Equivariant Graph NeUral Networks for Sample Efficient MARL
  11. Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers

Track 4: Foundations

  1. Effect of a slowdown correlated to the current state of the environment on an asynchronous learning architecture
  2. Average-Reward Soft Actor-Critic
  3. Your Learned Constraint is Secretly a Backward Reachable Tube
  4. Recursive Reward Aggregation
  5. On the Effect of Regularization in Policy Mirror Descent
  6. Investigating the Utility of Mirror Descent in Off-policy Actor-Critic
  7. Rethinking the Foundations for Continual Reinforcement Learning
  8. An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
  9. Reinforcement Learning with Adaptive Temporal Discounting
  10. Learning in complex action spaces without policy gradients