Data-Efficient Decision-Making [electronic resource]
- Material type
- Thesis/dissertation file (overseas)
- Last processed
- 2024-02-14 10:01:26
- ISBN
- 9798379711665
- DDC
- 310
- Author
- Hu, Yichun.
- Title/Author
- Data-Efficient Decision-Making [electronic resource]
- Publication
- [S.l.] : Cornell University, 2023
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical description
- 1 online resource (322 p.)
- Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- Note
- Advisor: Kallus, Nathan.
- Dissertation note
- Thesis (Ph.D.)--Cornell University, 2023.
- Use restriction note
- This item must not be sold to any third party vendors.
- Abstract
- This thesis focuses on the development of sample-efficient algorithms for personalized data-driven decision-making. In particular, the dissertation aims to address the following questions in both online (sequential) and offline (batch) settings: (i) What problem structures allow for achieving instance-specific fast regret rates? (ii) How can these problem structures be leveraged to design practical algorithms that achieve fast theoretical rates? Part I of this thesis investigates these questions from an online perspective. Chapter 2 studies the smooth contextual bandit problem, using the smoothness of the function class to design contextual bandit algorithms that interpolate between two extremes previously studied in isolation: nondifferentiable bandits and parametric-response bandits. Chapter 3 examines the DTR bandit problem, developing the first online algorithm with logarithmic regret for dynamic treatment regimes, which involve personalized, adaptive, multi-stage treatment plans. Part II delves into fast regret rates for offline problems by leveraging a probabilistic condition on the distribution of the reward gap between the optimal and second-best decisions, which we term the margin condition. In the case of contextual linear optimization, Chapter 4 shows that the naive plug-in approach achieves regret convergence rates significantly faster than those of methods that directly optimize downstream decision performance. In the case of offline reinforcement learning, Chapter 5 presents a finer regret analysis that characterizes the faster-than-square-root regret convergence rates observed in practice.
- Subject
- Statistics.
- Subject
- Computer science.
- Keyword
- Data
- Keyword
- Decision-making
- Keyword
- Algorithms
- Additional author
- Cornell University Operations Research and Information Engineering
- Related record
- Dissertations Abstracts International. 84-12B.
- Related record
- Dissertations Abstracts International
- Electronic location and access
- Full text available after login.