Academic Paper

Reinforcement Learning for Temporal Logic Control Synthesis with Probabilistic Satisfaction Guarantees
Document Type
Conference
Source
2019 IEEE 58th Conference on Decision and Control (CDC), pp. 5338-5343, Dec. 2019
Subject
Aerospace
General Topics for Engineers
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Automata
Probabilistic logic
Uncertainty
Computational modeling
Learning (artificial intelligence)
Markov processes
Language
English
ISSN
2576-2370
Abstract
We present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method.
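The abstract outlines a concrete pipeline: translate the LTL formula into an LDBA, build the product with the PL-MDP on the fly, emit reward on the automaton's accepting transitions, and run model-free RL on the product. As a rough illustration of that reward idea only, the following is a minimal Python sketch assuming a toy four-state line MDP with slip noise, a probabilistic label on one state, and a hand-built single-state LDBA for the formula GF a; the environment, all names, and the tabular Q-learning update are illustrative assumptions, not the paper's construction.

```python
import random
from collections import defaultdict

# Hypothetical toy setup (not from the paper): a 4-state line MDP whose
# states are probabilistically labeled, plus a one-state LDBA for "GF a"
# (visit label 'a' infinitely often).
STATES = range(4)
ACTIONS = ("left", "right")
SLIP = 0.1  # stand-in for stochastic agent actions

def mdp_step(s, a):
    """Sample the next MDP state; with probability SLIP the move reverses."""
    move = 1 if a == "right" else -1
    if random.random() < SLIP:
        move = -move
    return min(max(s + move, 0), 3)

def label(s):
    """Probabilistic labeling: state 3 carries proposition 'a' w.p. 0.9."""
    return {"a"} if (s == 3 and random.random() < 0.9) else set()

def ldba_step(q, labels):
    """Toy LDBA for GF a: single state q0 with a self-loop; reading 'a'
    takes the accepting transition."""
    return 0, ("a" in labels)

# Q-learning on the on-the-fly product (s, q); reward 1 on accepting
# LDBA transitions, 0 otherwise (a crude stand-in for the paper's
# synchronous reward function).
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.99, 0.1

def policy(s, q):
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, q, a)])

s, q = 0, 0
for _ in range(50_000):
    a = policy(s, q)
    s2 = mdp_step(s, a)
    q2, accepting = ldba_step(q, label(s2))
    r = 1.0 if accepting else 0.0
    best_next = max(Q[(s2, q2, b)] for b in ACTIONS)
    Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
    s, q = s2, q2

# Greedy policy over product states; should learn to move right toward 'a'.
print({(s, 0): max(ACTIONS, key=lambda a: Q[(s, 0, a)]) for s in STATES})
```

The design point mirrored from the abstract is that learning happens on product states (s, q) and reward is tied to the LDBA's acceptance condition rather than to the MDP alone; the paper's actual reward definition and its asymptotic satisfaction-probability guarantee are more involved than this sketch.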