학술논문

Selecting the Partial State Abstractions of MDPs: A Metareasoning Approach with Deep Reinforcement Learning

Document Type

Conference

Author

Nashed, Samer B.; Svegliato, Justin; Bhatia, Abhinav; Russell, Stuart; Zilberstein, Shlomo

Source

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Intelligent Robots and Systems (IROS), 2022 IEEE/RSJ International Conference on. :11665-11670 Oct, 2022

Subject

Bioengineering
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Deep learning
Computational modeling
Decision making
Reinforcement learning
Benchmark testing
Markov processes
Topology

Language

ISSN

2153-0866

Abstract

Markov decision processes (MDPs) are a common general-purpose model used in robotics for representing sequential decision-making problems. Given the complexity of robotics applications, a popular approach for approximately solving MDPs relies on state aggregation to reduce the size of the state space but at the expense of policy fidelity-offering a trade-off between policy quality and computation time. Naturally, this poses a challenging metareasoning problem: how can an autonomous system dynamically select different state abstractions that optimize this trade-off as it operates online? In this paper, we formalize this metareasoning problem with a notion of time-dependent utility and solve it using deep reinforcement learning. To do this, we develop several general, cheap heuristics that summarize the reward structure and transition topology of the MDP at hand to serve as effective features. Empirically, we demonstrate that our metareasoning approach outperforms several baseline approaches and a strong heuristic approach on a standard benchmark domain.

Online Access

Full Text (IEEE) Find it@PNU

이메일

부산대학교 도서관

Online Access

메일 발송