Track 1: RL algorithms
📍 Room: CCIS 1-430 (508, TF)
- Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
- Fast Adaptation with Behavioral Foundation Models
- Understanding Learned Representations and Action Collapse in Visual Reinforcement Learning
- Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
- ProtoCRL: Prototype-based Network for Continual Reinforcement Learning
- Offline Reinforcement Learning with Domain-Unlabeled Data
- SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning
- Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps
- Zero-Shot Reinforcement Learning Under Partial Observability
- Adaptive Submodular Policy Optimization
Track 2: RL from human feedback, Imitation Learning
📍 Room: CCIS 1-440 (413, TF)
- Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
- Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations
- DisDP: Robust Imitation Learning via Disentangled Diffusion Policies
- Mitigating Goal Misgeneralization via Minimax Regret
- Modelling human exploration with light-weight meta reinforcement learning algorithms
- Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
- PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning
- Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets
- One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise
- Goals vs. Rewards: A Comparative Study of Objective Specification Mechanisms
- Reward Distance Comparisons Under Transition Sparsity
Track 3: Hierarchical RL, Planning algorithms
📍 Room: CCIS 1-140 (155, EC)
- AVID: Adapting Video Diffusion Models to World Models
- The Confusing Instance Principle for Online Linear Quadratic Control
- Long-Horizon Planning with Predictable Skills
- Optimal discounting for offline input-driven MDP
- DeepCubeAF: A Foundation Model for Generalizable Pathfinding Heuristics
- A Timer-Enforced Hybrid Supervisor for Robust, Chatter-Free Policy Switching
- Focused Skill Discovery: Learning to Control Specific State Variables while Minimizing Side Effects
- Representation Learning and Skill Discovery with Empowerment
- Compositional Instruction Following with Language Models and Reinforcement Learning
- Composition and Zero-Shot Transfer with Lattice Structures in Reinforcement Learning
- Double Horizon Model-Based Policy Optimization
Track 4: Evaluation, Benchmarks
📍 Room: CCIS 1-160 (155, EC)
- Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences
- Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies
- Multi-Task Reinforcement Learning Enables Parameter Scaling
- Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks
- PufferLib 2.0: Reinforcement Learning at 1M steps/s
- Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL
- Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains
- How Should We Meta-Learn Reinforcement Learning Algorithms?
- AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents
- MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight
Track 1: Deep RL
📍 Room: CCIS 1-430 (508, TF)
- Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
- Impoola: The Power of Average Pooling for Image-based Deep Reinforcement Learning
- Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning
- Disentangling Recognition and Decision Regrets in Image-Based Reinforcement Learning
- Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions
- Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning
- Sampling from Energy-based Policies using Diffusion
- Optimistic critics can empower small actors
- Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks
- AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning
- Deep Reinforcement Learning with Gradient Eligibility Traces
Track 2: Social and economic aspects, Neuroscience and cognitive science
📍 Room: CCIS 1-440 (413, TF)
- Pareto Optimal Learning from Preferences with Hidden Context
- When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions
- Reinforcement Learning from Human Feedback with High-Confidence Safety Guarantees
- Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
- Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference
- High-Confidence Policy Improvement from Human Feedback
- Building Sequential Resource Allocation Mechanisms without Payments
- From Explainability to Interpretability: Interpretable Reinforcement Learning Via Model Explanations
- Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning
- AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing
Track 3: Exploration
📍 Room: CCIS 1-140 (155, EC)
- Uncertainty Prioritized Experience Replay
- Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
- Quantitative Resilience Modeling for Autonomous Cyber Defense
- Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
- Syllabus: Portable Curricula for Reinforcement Learning Agents
- Exploration-Free Reinforcement Learning with Linear Function Approximation
- Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning
- Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World
- An Optimisation Framework for Unsupervised Environment Design
- Epistemically-guided forward-backward exploration
- RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning
Track 4: Theoretical RL, Bandit algorithms
📍 Room: CCIS 1-160 (155, EC)
- A Finite-Time Analysis of Distributed Q-Learning
- Finite-Time Analysis of Minimax Q-Learning
- Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
- Non-Stationary Latent Auto-Regressive Bandits
- A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP
- Leveraging priors on distribution functions for multi-arm bandits
- Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits
- On Slowly-varying Non-stationary Bandits
- Empirical Bound Information-Directed Sampling
- Thompson Sampling for Constrained Bandits
- Achieving Limited Adaptivity for Multinomial Logistic Bandits
Track 1: RL algorithms, Deep RL
📍 Room: CCIS 1-430 (508, TF)
- Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks
- Cascade - A sequential ensemble method for continuous control tasks
- HANQ: Hypergradients, Asymmetry, and Normalization for Fast and Stable Deep $Q$-Learning
- Rectifying Regression in Reinforcement Learning
- Efficient Morphology-Aware Policy Transfer to New Embodiments
- Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting
- Concept-Based Off-Policy Evaluation
- Multiple-Frequencies Population-Based Training
- AVG-DICE: Stationary Distribution Correction by Regression
- Iterated Q-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning
Track 2: Applied RL
📍 Room: CCIS 1-440 (413, TF)
- Action Mapping for Reinforcement Learning in Continuous Environments with Constraints
- Chargax: A JAX Accelerated EV Charging Simulator
- WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies
- Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
- Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits
- Gaussian Process Q-Learning for Finite-Horizon Markov Decision Process
- Hybrid Classical/RL Local Planner for Ground Robot Navigation
- V-Max: Making RL Practical for Autonomous Driving
- Shaping Laser Pulses with Reinforcement Learning
- Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics
Track 3: Multi-agent RL
📍 Room: CCIS 1-140 (155, EC)
- Reinforcement Learning for Finite Space Mean-Field Type Game
- Collaboration Promotes Group Resilience in Multi-Agent RL
- Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models
- Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense
- Efficient Information Sharing for Training Decentralized Multi-Agent World Models
- Adaptive Reward Sharing to Enhance Learning in the Context of Multiagent Teams
- Seldonian Reinforcement Learning for Ad Hoc Teamwork
- Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control
- TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems via Local Trajectory Encoding
- PEnGUiN: Partially Equivariant Graph NeUral Networks for Sample Efficient MARL
- Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers
Track 4: Foundations
📍 Room: CCIS 1-160 (155, EC)
- Effect of a slowdown correlated to the current state of the environment on an asynchronous learning architecture
- Average-Reward Soft Actor-Critic
- Your Learned Constraint is Secretly a Backward Reachable Tube
- Recursive Reward Aggregation
- On the Effect of Regularization in Policy Mirror Descent
- Investigating the Utility of Mirror Descent in Off-policy Actor-Critic
- Rethinking the Foundations for Continual Reinforcement Learning
- An Analysis of Action-Value Temporal-Difference Methods That Learn State Values
- Reinforcement Learning with Adaptive Temporal Discounting
- Learning in complex action spaces without policy gradients