Reinforcement Learning From Static Datasets: Algorithms, Analysis, and Applications- [electronic resource]
- Material type
- Dissertation file (foreign)
- Last processed
- 20240214101657
- ISBN
- 9798380877275
- DDC
- 004
- Author
- Kumar, Aviral.
- Title/Author
- Reinforcement Learning From Static Datasets: Algorithms, Analysis, and Applications - [electronic resource]
- Publication
- [S.l.] : University of California, Berkeley, 2023
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical description
- 1 online resource (385 p.)
- Notes
- Source: Dissertations Abstracts International, Volume: 85-06, Section: B.
- Notes
- Advisor: Levine, Sergey.
- Dissertation note
- Thesis (Ph.D.)--University of California, Berkeley, 2023.
- Use restriction note
- This item must not be sold to any third party vendors.
- Abstract
- Reinforcement learning (RL) provides a formalism for learning-based control. By attempting to learn behavioral policies that optimize a user-specified reward function, RL methods have been able to acquire novel decision-making strategies that can outperform the best humans, even with highly complex dynamics and even when the space of all possible outcomes is huge (e.g., robotic manipulation, chip floorplanning). Yet RL has had limited applicability compared to standard machine learning (ML) in real-world scenarios. Why? The central issue with RL is that it relies crucially on running large amounts of trial-and-error active data collection to learn policies. Unfortunately, in the real world, active data collection is generally very expensive (e.g., running wet lab experiments for drug design) and/or dangerous (e.g., robots operating around humans), and accurate simulators are hard to build. Overall, this means that while RL carries the potential to broadly unlock ML in real-world decision-making problems, we are unable to realize this potential via current RL techniques. To realize this potential of RL, in this dissertation we develop an alternate paradigm that aims to utilize static datasets of experience for learning policies. Such a "dataset-driven" paradigm broadens the applicability of RL to a variety of decision-making problems where historical datasets already exist or can be collected via domain-specific strategies. It also brings the scalability and reliability benefits that modern supervised and unsupervised ML methods enjoy into RL. That said, instantiating this paradigm is challenging, as it requires reconciling the static nature of learning from a dataset with the traditionally active nature of RL, which results in challenges of distributional shift, generalization, and optimization.
After theoretically and empirically understanding these challenges, we develop algorithmic ideas for addressing them and discuss several extensions that convert these ideas into practical methods capable of training modern high-capacity neural network function approximators on large and diverse datasets. Finally, we show how these techniques can enable us to pre-train generalist policies for real robots and video games and enable fast and efficient hardware accelerator design.
- General subject
- Computer science.
- General subject
- Robotics.
- Keyword
- Control
- Keyword
- Decision-making
- Other author
- University of California, Berkeley Electrical Engineering & Computer Sciences
- Source record
- Dissertations Abstracts International. 85-06B.
- Source record
- Dissertations Abstracts International
- Electronic location and access
- Full text available after login.