Baekseok Arts University Library


Statistically Efficient Reinforcement Learning [electronic resource]

Material type: Thesis/dissertation file (international)

Last processed: 2024-02-14 10:15:31

ISBN: 9798380318662

DDC: 004

Author: Uehara, Masatoshi.

Title/Author: Statistically Efficient Reinforcement Learning [electronic resource]

Publication: [S.l.] : Cornell University, 2023

Publication: Ann Arbor : ProQuest Dissertations & Theses, 2023

Physical description: 1 online resource (296 p.)

Note: Source: Dissertations Abstracts International, Volume: 85-03, Section: B.

Note: Advisor: Kallus, Nathan.

Dissertation note: Thesis (Ph.D.)--Cornell University, 2023.

Use restriction: This item must not be sold to any third party vendors.

Use restriction: This item must not be added to any third party search indexes.

Abstract: My research focuses on developing algorithms and statistical theory for sequential decision making at the intersection of reinforcement learning (RL) and causal inference. RL is concerned with how agents learn to make sequential decisions in unknown environments. It has been one of the most vibrant research frontiers in machine learning over the last few years, with empirical success in a variety of applications, especially games, as demonstrated by AlphaGo (Silver et al., 2016). Despite its popularity, the real-world application of RL in fields such as biomedicine and social science is still limited. Unlike games, these applications lack good simulators, and experimentation is often expensive and risky (e.g., running clinical trials or deploying new marketing strategies in companies). Although running new experiments can be difficult, in an era of big data we often have access to massive historical datasets such as web-logged data and large electronic health records. This motivated me to find ways to use offline data in a statistically efficient manner, which is a central topic in offline RL and causal machine learning. However, offline RL is limited when the quality of the offline data is poor. In that scenario, we want to find the best policy by adaptively collecting data. This motivated me to find ways to collect data while searching for the best policy, which is a central topic in online RL. Since experiments are often costly, data collection again needs to be performed in a statistically efficient way. Hence, building statistically efficient RL algorithms in both offline and online settings is the key to bringing RL to a variety of real-world applications.