Deep Generative Models for Decision-Making and Control [electronic resource]
- Material Type
- Dissertation file (foreign)
- Last Processed
- 2024-02-14 10:05:00
- ISBN
- 9798380380911
- DDC
- 629.8
- Author
- Janner, Michael.
- Title/Author
- Deep Generative Models for Decision-Making and Control [electronic resource]
- Publication
- [S.l.] : University of California, Berkeley, 2023
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical Description
- 1 online resource (98 p.)
- Note
- Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
- Note
- Advisor: Levine, Sergey.
- Dissertation Note
- Thesis (Ph.D.)--University of California, Berkeley, 2023.
- Restrictions on Use
- This item must not be sold to any third party vendors.
- Abstract
- Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning for the purpose of estimating an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. However, this combination has a number of empirical shortcomings, limiting the usefulness of model-based methods in practice. The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions for the uncovered problems. We begin by generalizing the dynamics model itself, replacing the standard single-step formulation with a model that predicts over probabilistic latent horizons. The resulting model, trained with a generative reinterpretation of temporal difference learning, leads to infinite-horizon variants of the procedures central to model-based control, including the model rollout and model-based value estimation. Next, we show that poor predictive accuracy of commonly used deep dynamics models is a major bottleneck to effective planning, and describe how to use high-capacity sequence models to overcome this limitation. Framing reinforcement learning as sequence modeling simplifies a range of design decisions, allowing us to dispense with many of the components normally integral to reinforcement learning algorithms. However, despite their predictive accuracy, such sequence models are limited by the search algorithms in which they are embedded. As such, we demonstrate how to fold the entire trajectory optimization pipeline into the generative model itself, such that sampling from the model and planning with it become nearly identical. The culmination of this endeavor is a method that improves its planning capabilities, and not just its predictive accuracy, with more data and experience.
Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.
- Subject
- Robotics.
- Subject
- Computer science.
- Keyword
- Dynamics model
- Added Entry (Corporate)
- University of California, Berkeley Computer Science
- Host Item Entry
- Dissertations Abstracts International. 85-03B.
- Host Item Entry
- Dissertations Abstracts International
- Electronic Location and Access
- Full text is available after login.