
Robust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets [electronic resource]
Material Type
Dissertation file (international)
Last Processed
2024-02-14 10:12:35
ISBN  
9798380329958
DDC  
658
Author
Ramani, Sivaramakrishnan.
Title/Author
Robust Markov Decision Processes With Data-Driven, Distance-Based Ambiguity Sets [electronic resource]
Publication
[S.l.] : University of Washington, 2023
Publication
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical Description
1 online resource (131 p.)
Note
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
Note
Advisor: Ghate, Archis.
Dissertation Note
Thesis (Ph.D.)--University of Washington, 2023.
Use Restriction Note
This item must not be sold to any third party vendors.
Abstract
We consider finite- and infinite-horizon Markov decision processes (MDPs) with unknown state-transition probabilities. These transition probabilities are assumed to belong to certain ambiguity sets, and the goal is to maximize the worst-case expected total discounted reward over all probabilities from these sets. Specifically, the ambiguity set is a ball: it includes all probability distributions within a certain distance of the empirical distribution constructed from historical, independent observations of state transitions. We therefore call these problems robust MDPs (RMDPs) with data-driven, distance-based ambiguity sets.

The literature on data-driven robust optimization identifies (i) robust value convergence with respect to sample size, (ii) out-of-sample value convergence with respect to sample size, (iii) probabilistic performance guarantees on out-of-sample values, and (iv) a probabilistic convergence rate as desirable properties. The research objective of this dissertation is to establish an essentially minimal set of conditions under which RMDPs with data-driven, distance-based ambiguity sets exhibit these four properties.

We first achieve this for the (s, a)-rectangular RMDPs ((s, a)-RMDPs) studied in Chapter 2. There, the ambiguity set for the whole MDP equals a Cartesian product of ambiguity sets for individual state-action pairs. We establish robust and out-of-sample value convergence under a generalized Pinsker's inequality, provided the radii of the ambiguity balls approach zero as the sample size diverges to infinity. This inequality links convergence of probability distributions with respect to the distance function to their topological convergence in Euclidean space. We also establish that, for finite sample sizes, the optimal value of the RMDP provides a lower bound on the value of the robust optimal policy in the true MDP with high probability. This probabilistic performance guarantee relies on a certain concentration inequality. Under these generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. The two inequalities hold for several well-known distance functions, including total variation, Burg, Hellinger, and Wasserstein. We present computational results on a generic MDP and a machine-repair example using the total variation, Burg, and Wasserstein distances. These results illustrate that the generality of our framework offers broad choices when trading off the conservativeness of robust optimal policies against their out-of-sample performance by tuning the ambiguity ball radii.

In Chapter 3, we extend the results of Chapter 2 to a so-called s-rectangular framework. In this more general context, the ambiguity set for the MDP is a Cartesian product of ambiguity sets for individual states. We introduce a family of distance-based s-rectangular RMDPs (s-RMDPs) indexed by ρ ∈ [1, ∞]. In each state, the ambiguity set of transition probability mass functions (pmfs) across different actions equals a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs. Setting ρ = ∞ in this family recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because the s- and (s, a)-RMDPs can employ different ambiguity-set radii, casting doubt on the anecdotal belief that (s, a)-RMDPs are more conservative than s-RMDPs. More strongly, if the distance function is lower semicontinuous and convex, then for any s-RMDP there exists an (s, a)-RMDP with an identical robust optimal value. This suggests that appropriate caution should be exercised before interpreting too broadly any anecdotal claims that (s, a)-RMDPs are more conservative than s-rectangular ones. We also study data-driven versions of our s-RMDPs, where the reference pmf equals the empirical pmf constructed from state-transition samples. We prove that both the robust optimal values and the out-of-sample values of robust optimal policies converge to the true optimal value, asymptotically in the sample size, if the distance function satisfies the generalized Pinsker's inequality introduced in Chapter 2. The robust optimal value also provides a probabilistic lower bound on the out-of-sample value of a robust optimal policy when the distance function satisfies the concentration inequality. This finite-sample guarantee admits a surprising conclusion: (s, a)-RMDPs are the least conservative among all s-RMDPs in our family. As in Chapter 2, under the generalized Pinsker and concentration inequalities, we also derive a probabilistic rate of convergence of the robust and out-of-sample values to the true optimal value. Although similar asymptotic and finite-sample results were developed for (s, a)-RMDPs in Chapter 2, the proof techniques in this chapter are different and more sophisticated. These more involved proofs are needed because the structure of s-RMDPs introduces new analytical hurdles in establishing the sufficiency of the generalized Pinsker and concentration inequalities. For example, it is no longer adequate to limit attention to deterministic policies; randomization may be needed for optimality. We also present computational experiments on a machine-repair example using the total variation distance and ρ = 1. The results of those experiments validate the claims established in that chapter.

Finally, in Chapter 4, we develop a data-driven, distance-based RMDP framework on separable complete metric spaces and extend our asymptotic and finite-sample results to that setting. Unlike the first two studies, this more general endeavor relies on measure-theoretic concepts from minimax control.
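The abstract states the robust objective only in words. As a concrete orientation for the (s, a)-rectangular setting of Chapter 2, the sketch below runs robust value iteration with a total-variation ambiguity ball of radius eps around each empirical transition pmf; the adversary's inner minimization over such a ball has a simple greedy solution. This is a minimal illustration written for this record, not code from the dissertation; the function names, the toy data, and the 1/√n radius schedule are assumptions.

import numpy as np

def worst_case_value(p_hat, v, eps):
    # Adversary's inner problem for one (s, a) pair:
    #   minimize p @ v  subject to  TV(p, p_hat) <= eps,  p in the simplex.
    # Greedy optimum: shift up to eps probability mass from the states
    # with the largest values of v onto the state with the smallest value.
    p = np.asarray(p_hat, dtype=float).copy()
    worst = int(np.argmin(v))               # state the adversary inflates
    budget = min(eps, 1.0 - p[worst])       # mass the adversary may shift
    p[worst] += budget
    for j in np.argsort(v)[::-1]:           # drain high-value states first
        if j == worst:
            continue
        take = min(p[j], budget)
        p[j] -= take
        budget -= take
        if budget <= 0.0:
            break
    return float(p @ v)

def robust_value_iteration(P_hat, R, eps, gamma=0.95, tol=1e-8, max_iter=1000):
    # (s, a)-rectangular robust Bellman iteration: every state-action pair
    # carries its own total-variation ambiguity ball of radius eps.
    S, A = R.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        q = np.array([[R[s, a] + gamma * worst_case_value(P_hat[s, a], v, eps)
                       for a in range(A)] for s in range(S)])
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    return v, q.argmax(axis=1)

# Toy 2-state, 2-action example: Dirichlet draws stand in for empirical
# pmfs, and the radius shrinks like 1/sqrt(n), mirroring the shrinking-ball
# condition under which the abstract's convergence results hold.
rng = np.random.default_rng(0)
n = 100                                     # hypothetical sample size
P_hat = rng.dirichlet(np.ones(2), size=(2, 2))
R = np.array([[1.0, 0.5], [0.0, 2.0]])
v_robust, policy = robust_value_iteration(P_hat, R, eps=1.0 / np.sqrt(n))
print(v_robust, policy)

Tuning eps is the knob the abstract refers to when it describes trading off the conservativeness of robust optimal policies against their out-of-sample performance.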
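For the s-rectangular family of Chapter 3, the verbal definition ("a sublevel set of the ℓρ-norm of a vector of distances from reference pmfs") can be written compactly. The display below is a reconstruction from the abstract's wording; the radius symbol ε_s and the distance function d are notation introduced here, not taken from the dissertation.

\mathcal{P}_s = \left\{ (p_{s,a})_{a \in A} \,:\, \left\| \left( d(p_{s,a}, \hat{p}_{s,a}) \right)_{a \in A} \right\|_{\rho} \le \varepsilon_s \right\}, \qquad \rho \in [1, \infty].

Setting ρ = ∞ turns the single ℓ∞ constraint into one constraint d(p_{s,a}, p̂_{s,a}) ≤ ε_s per action, which is exactly the (s, a)-rectangular case the abstract says this family recovers.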
Subject
Industrial engineering.
Keyword
Robust optimization
Keyword
Dynamic programming
Keyword
Probabilistic performance guarantee
Keyword
Value convergence
Keyword
Markov decision processes
Additional Author
University of Washington Industrial and Systems Engineering
Source Record
Dissertations Abstracts International. 85-03B.
Source Record
Dissertations Abstracts International
Electronic Location and Access
Full text available after login.
Holdings
Registration No.: TF09085 · Location: E-book