CS861: Theoretical Foundations of Machine Learning Lecture 20 – 10/20/2023 University of Wisconsin–Madison, Fall 2023
Lecture 20: Structured Bandits, Martingales
Lecturer: Kirthevasan Kandasamy Scribed by: Alex Clinton, Chenghui Zheng
Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the instructor.
In the last lecture we introduced a more general bandit framework and proposed an analogous UCB algorithm. In this lecture we will analyze that algorithm, and then introduce the formalism of martingales so that we can make use of standard martingale concentration results.
1 Structured Bandits
Theorem 1. Consider the algorithm introduced at the end of the previous lecture. For sufficiently large $T$,
$$R_T = T f(\theta_*^\top a^*) - \mathbb{E}\left[\sum_{t=1}^{T} X_t\right] \in \tilde{O}(d\sqrt{T}),$$
where $a^* = \operatorname{argmax}_{a \in \mathcal{A}} f(\theta_*^\top a)$.
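For concreteness, the following is a minimal Python sketch of the kind of algorithm being analyzed; it is not the official pseudocode from lecture. It assumes the first $d$ rounds play the standard basis vectors (so that $V_d = I$), that $f$ is a known vectorized link function, and that the width function $\rho$ is supplied by the caller; all names here (`ucb_structured_bandit`, `pull`, `rho`) are our own.

```python
import numpy as np

def ucb_structured_bandit(actions, pull, f, rho, T):
    """Sketch of the UCB algorithm for structured bandits.

    actions: (K, d) array of available action vectors.
    pull:    callable a -> noisy reward with mean f(theta_*^T a).
    f:       the known (vectorized) link function.
    rho:     confidence-width function t -> rho(t).
    """
    K, d = actions.shape
    V = np.zeros((d, d))   # V_t = sum_{s<=t} A_s A_s^T
    b = np.zeros(d)        # sum_{s<=t} A_s X_s, the least-squares right-hand side
    history = []
    for t in range(1, T + 1):
        if t <= d:
            A_t = np.eye(d)[t - 1]        # initialization: basis vectors => V_d = I
        else:
            V_inv = np.linalg.inv(V)
            theta_hat = V_inv @ b         # least-squares estimate of theta_*
            widths = np.sqrt(np.einsum('kd,de,ke->k', actions, V_inv, actions))
            A_t = actions[np.argmax(f(actions @ theta_hat) + rho(t) * widths)]
        X_t = pull(A_t)                   # observe the reward for round t
        V += np.outer(A_t, A_t)
        b += A_t * X_t
        history.append((A_t, X_t))
    return history
```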
Proof: We start by defining a good event analogous to the one we defined in the analysis of UCB. Let
$$G = \left\{ \left| f(\theta_*^\top a) - f(\hat{\theta}_{t-1}^\top a) \right| \le \rho(t)\, \|a\|_{V_{t-1}^{-1}}, \ \forall a \in \mathcal{A},\ \forall t \in \{d+1, \dots, T\} \right\}.$$
Observe that here $\rho(t)\|a\|_{V_{t-1}^{-1}}$ is playing the role of an upper confidence bound. We will use the following two claims to aid in the proof.

Claim 1. $\mathbb{P}(G^c) \le \frac{1}{T}$. We will prove this later using some martingale concentration results.

Claim 2. Under $G$, $f(\theta_*^\top a^*) - f(\theta_*^\top A_t) \le 2\rho(T)\|A_t\|_{V_{t-1}^{-1}}$ for all $t > d$.
Claim 2 can be verified via the following simple calculation. For $t > d$, under $G$,
\begin{align*}
f(\theta_*^\top a^*) - f(\theta_*^\top A_t)
&\le f(\hat{\theta}_{t-1}^\top a^*) + \rho(t)\|a^*\|_{V_{t-1}^{-1}} - \left( f(\hat{\theta}_{t-1}^\top A_t) - \rho(t)\|A_t\|_{V_{t-1}^{-1}} \right) && \text{(by $G$)} \\
&\le f(\hat{\theta}_{t-1}^\top A_t) + \rho(t)\|A_t\|_{V_{t-1}^{-1}} - f(\hat{\theta}_{t-1}^\top A_t) + \rho(t)\|A_t\|_{V_{t-1}^{-1}} && \text{(since $A_t$ maximizes the UCB)} \\
&= 2\rho(t)\|A_t\|_{V_{t-1}^{-1}} \\
&\le 2\rho(T)\|A_t\|_{V_{t-1}^{-1}}, && \text{(as $\rho$ is non-decreasing)}
\end{align*}
Next, to bound the regret, write the pseudo-regret as $\bar{R}_T = T f(\theta_*^\top a^*) - \sum_{t=1}^{T} f(\theta_*^\top A_t)$, so that $R_T = \mathbb{E}[\bar{R}_T]$. Using the tower property, we have
$$R_T = \mathbb{E}[\bar{R}_T \mid G] \underbrace{\mathbb{P}(G)}_{\le 1} + \underbrace{\mathbb{E}[\bar{R}_T \mid G^c]}_{\le T f_{\max}} \underbrace{\mathbb{P}(G^c)}_{\le \frac{1}{T}},$$
where $f_{\max} = \max_{a, a' \in \mathcal{A}} \left( f(\theta_*^\top a) - f(\theta_*^\top a') \right)$.
Under the good event $G$,
\begin{align*}
\bar{R}_T &= \sum_{t=1}^{T} \left( f(\theta_*^\top a^*) - f(\theta_*^\top A_t) \right) \\
&\le d f_{\max} + \sum_{t=d+1}^{T} \min\left( f_{\max},\, f(\theta_*^\top a^*) - f(\theta_*^\top A_t) \right) \\
&\le d f_{\max} + \sum_{t=d+1}^{T} \min\left( f_{\max},\, 2\rho(t)\|A_t\|_{V_{t-1}^{-1}} \right) && \text{(by Claim 2)} \\
&\le d f_{\max} + 2\rho(T) \sum_{t=d+1}^{T} \min\left( 1,\, \|A_t\|_{V_{t-1}^{-1}} \right) && \text{(as $f_{\max} \le 2\rho(T)$ for sufficiently large $T$)} \\
&\le d f_{\max} + 2\rho(T) \sqrt{ T \sum_{t=d+1}^{T} \min\left( 1,\, \|A_t\|^2_{V_{t-1}^{-1}} \right) }, && \text{(by the Cauchy–Schwarz inequality)}
\end{align*}
where the Cauchy–Schwarz step uses $a_t = 1$ and $b_t = \min(1, \|A_t\|_{V_{t-1}^{-1}})$, together with $\min(1,x)^2 \le \min(1, x^2)$.

Next, we will bound $\sum_{t=d+1}^{T} \min(1, \|A_t\|^2_{V_{t-1}^{-1}})$. Consider, for $t > d$,
\begin{align*}
\det(V_t) &= \det\left( V_{t-1} + A_t A_t^\top \right), \quad \text{where } V_t = \sum_{s=1}^{t} A_s A_s^\top \\
&= \det\left( V_{t-1}^{1/2} \left( I + V_{t-1}^{-1/2} A_t A_t^\top V_{t-1}^{-1/2} \right) V_{t-1}^{1/2} \right) \\
&= \det(V_{t-1}) \det\left( I + (V_{t-1}^{-1/2} A_t)(V_{t-1}^{-1/2} A_t)^\top \right), && \text{since } \det(AB) = \det(A)\det(B) \\
&= \det(V_{t-1}) \left( 1 + \|A_t\|^2_{V_{t-1}^{-1}} \right), && \text{since } \det(I + uv^\top) = 1 + u^\top v.
\end{align*}
Therefore, $\det(V_T) = \det(V_d) \prod_{t=d+1}^{T} \left( 1 + \|A_t\|^2_{V_{t-1}^{-1}} \right) = \prod_{t=d+1}^{T} \left( 1 + \|A_t\|^2_{V_{t-1}^{-1}} \right)$, since the initialization rounds give $\det(V_d) = 1$.
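As a sanity check, the rank-one determinant identity used above is easy to verify numerically; a minimal numpy sketch (the matrices here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
V_prev = np.eye(d) + sum(np.outer(a, a) for a in rng.normal(size=(10, d)))
A_t = rng.normal(size=d)

lhs = np.linalg.det(V_prev + np.outer(A_t, A_t))
rhs = np.linalg.det(V_prev) * (1 + A_t @ np.linalg.inv(V_prev) @ A_t)
assert np.isclose(lhs, rhs)  # det(V_{t-1} + A A^T) = det(V_{t-1}) (1 + ||A||^2_{V_{t-1}^{-1}})
```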
Taking logarithms,
$$\sum_{t=d+1}^{T} \log\left( 1 + \|A_t\|^2_{V_{t-1}^{-1}} \right) = \log(\det(V_T)) \le d \log(T),$$
where the last inequality follows from the fact that
$$\det(V_T) \le \left( \frac{\operatorname{Trace}(V_T)}{d} \right)^d = \left( \frac{\sum_{s=1}^{T} \|A_s\|_2^2}{d} \right)^d \le \left( \frac{dT}{d} \right)^d = T^d,$$
using the AM–GM inequality on the eigenvalues of $V_T$ and the bound $\|A_s\|_2^2 \le d$ on the actions. We will now use the inequality $x \le 2\log(1+x)$, $\forall x \in [0, 2\log(2)] \supseteq [0,1]$, to get
\begin{align*}
\sum_{t=d+1}^{T} \min\left( 1,\, \|A_t\|^2_{V_{t-1}^{-1}} \right)
&\le \sum_{t=d+1}^{T} 2 \log\left( 1 + \min\left( 1,\, \|A_t\|^2_{V_{t-1}^{-1}} \right) \right) \\
&\le \sum_{t=d+1}^{T} 2 \log\left( 1 + \|A_t\|^2_{V_{t-1}^{-1}} \right) \\
&= 2 \log(\det(V_T)) \le 2 d \log(T).
\end{align*}
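The resulting "elliptical potential" bound $\sum_{t=d+1}^{T} \min(1, \|A_t\|^2_{V_{t-1}^{-1}}) \le 2d\log(T)$ can likewise be checked empirically for an arbitrary sequence of unit-norm actions; a small sketch under the assumption $V_d = I$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 5, 2000
V = np.eye(d)                    # V_d = I after the initialization phase
total = 0.0
for t in range(d + 1, T + 1):
    A = rng.normal(size=d)
    A /= np.linalg.norm(A)       # unit-norm action
    total += min(1.0, A @ np.linalg.inv(V) @ A)
    V += np.outer(A, A)
assert total <= 2 * d * np.log(T)
print(total, 2 * d * np.log(T))
```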
Therefore, under $G$, $\bar{R}_T \le d f_{\max} + 2\rho(T)\sqrt{2dT\log(T)}$. Using
$$R_T = \mathbb{E}[\bar{R}_T \mid G]\, \mathbb{P}(G) + \underbrace{\mathbb{E}[\bar{R}_T \mid G^c]}_{\le T f_{\max}} \underbrace{\mathbb{P}(G^c)}_{\le \frac{1}{T}},$$
we can conclude that
$$R_T \le (d+1) f_{\max} + 2\rho(T)\sqrt{2dT\log(T)} \in \tilde{O}(d\sqrt{T}),$$
recalling that $\rho(T) \in \tilde{O}(\sqrt{d})$.
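To illustrate the theorem numerically, one can run the `ucb_structured_bandit` sketch from above on a small linear instance ($f$ the identity) and compare the realized pseudo-regret to the $d\sqrt{T}$ scale; the choice $\rho(t) = \sqrt{d\log t}$ below is purely illustrative, not the constant from the analysis:

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T = 5, 50, 5000
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)

f = lambda z: z                                   # linear special case
pull = lambda a: theta_star @ a + rng.normal()    # 1-sub-Gaussian rewards
rho = lambda t: np.sqrt(d * np.log(t))            # illustrative width only

hist = ucb_structured_bandit(actions, pull, f, rho, T)
best = np.max(actions @ theta_star)
pseudo_regret = sum(best - theta_star @ a for a, _ in hist)
print(f"pseudo-regret: {pseudo_regret:.1f}, d*sqrt(T) = {d * np.sqrt(T):.1f}")
```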
Next, we need to prove Claim 1. To do so, we will begin with a review of martingales.
2 Review of Martingales
In sequential decision making, information is revealed to the learner sequentially, and the learner makes decisions based on the information available. Filtrations are a construct used to formalize the amount of information available to the learner at a given time.
Definition 1. $F = \{\mathcal{F}_t\}_{t \in \mathbb{N}}$ is a filtration if, for all $t$, $\mathcal{F}_t$ is a $\sigma$-algebra and $\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$.
Example 1. In the context of stochastic bandits, $\mathcal{F}_t = \sigma\left( \{A_s, X_s\}_{s=1}^{t-1} \right)$ is the $\sigma$-algebra generated by the actions and rewards up to round $t-1$.

Definition 2. Predictable processes and adapted processes:
1. A stochastic process $\{X_t\}_{t \in \mathbb{N}}$ is predictable with respect to a filtration $\{\mathcal{F}_t\}_{t \in \mathbb{N}}$ if $X_{t+1}$ is $\mathcal{F}_t$-measurable (i.e., $X_{t+1}$ can be determined from the information available at time $t$).

2. A stochastic process $\{X_t\}_{t \in \mathbb{N}}$ is adapted to a filtration $\{\mathcal{F}_t\}_{t \in \mathbb{N}}$ if $X_t$ is $\mathcal{F}_t$-measurable.
Example 2. In stochastic bandits, the actions $A_t$ are predictable, as $A_t$ is determined by the actions and rewards up to round $t-1$.
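In code, the distinction is simply about when each quantity becomes computable from the history; a toy bandit loop (all names and the alternating rule are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
T, history = 10, []                 # after round t-1, F_t = sigma(history)
for t in range(1, T + 1):
    # A_t uses only rounds 1..t-1, so it is F_t-measurable: {A_t} is predictable.
    A_t = len(history) % 2          # toy rule: alternate between two arms
    X_t = rng.normal(loc=A_t)       # reward revealed during round t
    history.append((A_t, X_t))      # X_t is measurable only w.r.t. F_{t+1}: adapted
```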
Definition 3. Martingales and martingale difference sequences:

1. An $F$-adapted sequence of random variables $\{X_t\}_{t \in \mathbb{N}}$ is a martingale if (i) $\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = X_{t-1}$ and (ii) $\mathbb{E}[|X_t|] < \infty$.

2. An $F$-adapted sequence of random variables $\{Y_t\}_{t \in \mathbb{N}}$ is a martingale difference sequence if (i) $\mathbb{E}[Y_t \mid \mathcal{F}_{t-1}] = 0$ and (ii) $\mathbb{E}[|Y_t|] < \infty$.
Example 3. If $\{X_t\}_{t \in \mathbb{N}}$ is a martingale, then $Y_t = X_t - X_{t-1}$ is a martingale difference sequence.
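As a quick numerical illustration of Example 3 (a toy setup of our own): a symmetric random walk is a martingale, and its $\pm 1$ increments form a martingale difference sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, T = 200_000, 5
Y = rng.choice([-1.0, 1.0], size=(n_paths, T))  # MDS: E[Y_t | F_{t-1}] = 0
X = np.cumsum(Y, axis=1)                        # martingale: X_t = X_{t-1} + Y_t

# Empirically, E[X_3 | X_2 = 2] should be about 2 (= X_2), as the
# martingale property E[X_t | F_{t-1}] = X_{t-1} predicts.
mask = X[:, 1] == 2.0
print(X[mask, 2].mean())   # ~ 2.0
```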
2.1 Martingale Concentration

There are many popular martingale concentration results that we can use, such as the Hoeffding–Azuma inequality and a martingale version of the Bernstein inequality (e.g., Freedman, 2009). Often, however, in sequential feedback settings we may need to develop a customized result suited to our problem setting. To that end, we will introduce and prove the following result.
Lemma 1. Let $F = \{\mathcal{F}_t\}_{t \ge 0}$ be a filtration. Let $\{A_t\}_{t \ge 1}$ be an $\mathbb{R}^d$-valued stochastic process predictable with respect to $F$, and let $\{\varepsilon_t\}_{t \ge 1}$ be a real-valued martingale difference sequence adapted to $F$. Assume $\varepsilon_t$ is $\sigma$-sub-Gaussian, i.e., $\forall \lambda$, $\mathbb{E}[e^{\lambda \varepsilon_t}] \le e^{\lambda^2 \sigma^2 / 2}$. Let $V_t = \sum_{s=1}^{t} A_s A_s^\top$ and $\xi_t = \sum_{s=1}^{t} A_s \varepsilon_s$, where $A_s^\top A_s \le c$ for all $s$. Suppose $V_t \succeq I$ for all $t > t_0$. Then, for all $\delta \le e^{-1}$, with probability at least $1 - \delta$,
$$\|\xi_t\|_{V_t^{-1}} \le \gamma \sigma \sqrt{2 d \log(t) \log\left(\tfrac{1}{\delta}\right)},$$
where $\gamma = 3 + 2\log(1 + 2c)$.
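Since the proof is deferred, a small Monte-Carlo experiment can at least make the scaling plausible: for Gaussian noise and a fixed (hence trivially predictable) action sequence, the self-normalized norm $\|\xi_t\|_{V_t^{-1}}$ stays well below the $\sqrt{d \log(t) \log(1/\delta)}$ scale of the lemma. All parameters below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, trials, sigma = 5, 1000, 2000, 1.0
A = rng.normal(size=(t, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # ||A_s||_2^2 <= 1, i.e. c = 1
V_t_inv = np.linalg.inv(A.T @ A)                # V_t = sum_s A_s A_s^T

norms = []
for _ in range(trials):
    eps = sigma * rng.normal(size=t)            # sigma-sub-Gaussian noise
    xi = A.T @ eps                              # xi_t = sum_s A_s eps_s
    norms.append(np.sqrt(xi @ V_t_inv @ xi))    # ||xi_t||_{V_t^{-1}}

delta = 0.01
print(np.quantile(norms, 1 - delta))                    # empirical (1-delta)-quantile
print(np.sqrt(2 * d * np.log(t) * np.log(1 / delta)))   # lemma's scale (gamma*sigma omitted)
```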