Deep Reinforcement Learning (7): Policy Gradient

The goal of policy learning is to solve an optimization problem and learn the optimal policy function, or an approximation of it (such as a policy network).

1. The Policy Network

Suppose the action space is discrete, e.g. $\mathcal{A}=\{\text{left},\text{right},\text{up}\}$. The policy function $\pi$ is a conditional probability mass function:

$$\pi(a\mid s)=\mathbb{P}(A=a\mid S=s)$$

As with DQN, we can use a neural network $\pi(a\mid s;\boldsymbol{\theta})$ to approximate the policy function $\pi(a\mid s)$, where $\boldsymbol{\theta}$ denotes the network parameters to be trained.

Recall that the action-value function is defined as

$$Q_\pi(s_t,a_t)=\mathbb{E}_{A_{t+1},S_{t+1},\ldots}\left[U_t\mid A_t=a_t,S_t=s_t\right]$$

and the state-value function as

$$V_\pi(s_t)=\mathbb{E}_{A_t\sim\pi(\cdot\mid s_t;\boldsymbol{\theta})}\left[Q_\pi(s_t,A_t)\right]$$

The state value depends both on the current state $s_t$ and on the parameters $\boldsymbol{\theta}$ of the policy network $\pi$.

To remove the dependence on the state, we take the expectation over $S_t$:

$$J(\boldsymbol{\theta})=\mathbb{E}_{S_t}\left[V_\pi(S_t)\right]$$

This objective function eliminates the state $S$ and depends only on the parameters $\boldsymbol{\theta}$ of the policy network $\pi$; the better the policy, the larger $J$. Policy learning can therefore be cast as the optimization problem

$$\max_{\boldsymbol{\theta}}\quad J(\boldsymbol{\theta})$$

Since this is a maximization problem, we can update $\boldsymbol{\theta}$ by gradient ascent on $J(\boldsymbol{\theta})$; the key is computing the gradient $\nabla_{\boldsymbol{\theta}}J(\boldsymbol{\theta})$.
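Gradient ascent itself is just the repeated update $\boldsymbol{\theta}\leftarrow\boldsymbol{\theta}+\beta\cdot\nabla_{\boldsymbol{\theta}}J(\boldsymbol{\theta})$ with learning rate $\beta$. A minimal sketch on a toy concave objective; the surrogate $J$ and its gradient estimator are stand-ins introduced here (the real policy-gradient estimator is derived in the next section):

```python
import numpy as np

def estimate_grad_J(theta: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder for an unbiased estimate of grad_theta J(theta).
    # Here it is the exact gradient of the toy concave surrogate
    # J(theta) = -||theta - 1||^2, so the loop has something to maximize.
    return -2.0 * (theta - 1.0)

theta = np.zeros(8)       # toy parameter vector
beta = 0.05               # learning rate
for _ in range(200):
    theta = theta + beta * estimate_grad_J(theta)   # ascend, since we maximize J
# theta is now close to the maximizer (all ones) of the toy objective
```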

2. Derivation of the Policy Gradient Theorem

Theorem (recursive formula): with $S'$ denoting the state at the next time step,
$$\frac{\partial V_\pi(s)}{\partial\boldsymbol{\theta}}=\mathbb{E}_{A\sim\pi(\cdot\mid s;\boldsymbol{\theta})}\left[\frac{\partial\ln\pi(A\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(s,A)+\gamma\cdot\mathbb{E}_{S'\sim p(\cdot\mid s,A)}\left[\frac{\partial V_\pi(S')}{\partial\boldsymbol{\theta}}\right]\right]\tag{2.1}$$

Proof:
$$\begin{aligned}
\frac{\partial V_\pi(s)}{\partial\boldsymbol{\theta}}
&=\frac{\partial}{\partial\boldsymbol{\theta}}\,\mathbb{E}_{A\sim\pi(\cdot\mid s;\boldsymbol{\theta})}\left[Q_\pi(s,A)\right]\\
&=\frac{\partial}{\partial\boldsymbol{\theta}}\sum_{a\in\mathcal{A}}\pi(a\mid s;\boldsymbol{\theta})\,Q_\pi(s,a)\\
&=\sum_{a\in\mathcal{A}}\left[\frac{\partial\pi(a\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\,Q_\pi(s,a)+\pi(a\mid s;\boldsymbol{\theta})\,\frac{\partial Q_\pi(s,a)}{\partial\boldsymbol{\theta}}\right]\\
&=\sum_{a\in\mathcal{A}}\left[\pi(a\mid s;\boldsymbol{\theta})\cdot\frac{\partial\ln\pi(a\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(s,a)+\pi(a\mid s;\boldsymbol{\theta})\,\frac{\partial Q_\pi(s,a)}{\partial\boldsymbol{\theta}}\right]\\
&=\mathbb{E}_{A\sim\pi(\cdot\mid s;\boldsymbol{\theta})}\left[\frac{\partial\ln\pi(A\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(s,A)\right]+\mathbb{E}_{A\sim\pi(\cdot\mid s;\boldsymbol{\theta})}\left[\frac{\partial Q_\pi(s,A)}{\partial\boldsymbol{\theta}}\right]\\
&=\mathbb{E}_{A\sim\pi(\cdot\mid s;\boldsymbol{\theta})}\left[\frac{\partial\ln\pi(A\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(s,A)+\frac{\partial Q_\pi(s,A)}{\partial\boldsymbol{\theta}}\right]
\end{aligned}$$

The fourth equality uses the log-derivative identity $\frac{\partial\pi}{\partial\boldsymbol{\theta}}=\pi\cdot\frac{\partial\ln\pi}{\partial\boldsymbol{\theta}}$.
It remains to show that $\frac{\partial Q_\pi(s,a)}{\partial\boldsymbol{\theta}}=\gamma\,\mathbb{E}_{S'\sim p(\cdot\mid s,a)}\left[\frac{\partial V_\pi(S')}{\partial\boldsymbol{\theta}}\right]$. The Bellman equation gives

$$\begin{aligned}
Q_\pi(s,a)&=\mathbb{E}_{S'\sim p(\cdot\mid s,a)}\left[R(s,a,S')+\gamma\cdot V_\pi(S')\right]\\
&=\sum_{s'\in\mathcal{S}}p(s'\mid s,a)\cdot\left[R(s,a,s')+\gamma\cdot V_\pi(s')\right]\\
&=\sum_{s'\in\mathcal{S}}p(s'\mid s,a)\cdot R(s,a,s')+\gamma\cdot\sum_{s'\in\mathcal{S}}p(s'\mid s,a)\cdot V_\pi(s')
\end{aligned}$$

Given the observed $s$, $a$, and $s'$, neither $p(s'\mid s,a)$ nor $R(s,a,s')$ depends on the policy network $\pi$, so
$$\frac{\partial}{\partial\boldsymbol{\theta}}\left[p(s'\mid s,a)\cdot R(s,a,s')\right]=0$$

It follows that
$$\begin{aligned}
\frac{\partial Q_\pi(s,a)}{\partial\boldsymbol{\theta}}
&=\sum_{s'\in\mathcal{S}}\underbrace{\frac{\partial}{\partial\boldsymbol{\theta}}\left[p(s'\mid s,a)\cdot R(s,a,s')\right]}_{=\,0}+\gamma\cdot\sum_{s'\in\mathcal{S}}\frac{\partial}{\partial\boldsymbol{\theta}}\left[p(s'\mid s,a)\cdot V_\pi(s')\right]\\
&=\gamma\cdot\sum_{s'\in\mathcal{S}}p(s'\mid s,a)\cdot\frac{\partial V_\pi(s')}{\partial\boldsymbol{\theta}}\\
&=\gamma\cdot\mathbb{E}_{S'\sim p(\cdot\mid s,a)}\left[\frac{\partial V_\pi(S')}{\partial\boldsymbol{\theta}}\right]
\end{aligned}$$

This completes the proof.

Theorem: Define $\boldsymbol{g}(s,a;\boldsymbol{\theta})\triangleq Q_\pi(s,a)\cdot\frac{\partial\ln\pi(a\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$, and suppose an episode ends after step $n$. Then
$$\begin{aligned}
\frac{\partial J(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}
=\;&\mathbb{E}_{S_1,A_1}\left[\boldsymbol{g}(S_1,A_1;\boldsymbol{\theta})\right]\\
&+\gamma\cdot\mathbb{E}_{S_1,A_1,S_2,A_2}\left[\boldsymbol{g}(S_2,A_2;\boldsymbol{\theta})\right]\\
&+\gamma^2\cdot\mathbb{E}_{S_1,A_1,S_2,A_2,S_3,A_3}\left[\boldsymbol{g}(S_3,A_3;\boldsymbol{\theta})\right]\\
&+\cdots\\
&+\gamma^{n-1}\cdot\mathbb{E}_{S_1,A_1,S_2,A_2,\ldots,S_n,A_n}\left[\boldsymbol{g}(S_n,A_n;\boldsymbol{\theta})\right]
\end{aligned}\tag{2.2}$$

Proof: From Eq. (2.1),

$$\begin{aligned}
\nabla_{\boldsymbol{\theta}}V_\pi(s_t)
&=\mathbb{E}_{A_t\sim\pi(\cdot\mid s_t;\boldsymbol{\theta})}\left[\frac{\partial\ln\pi(A_t\mid s_t;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(s_t,A_t)+\gamma\cdot\mathbb{E}_{S_{t+1}\sim p(\cdot\mid s_t,A_t)}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})\right]\right]\\
&=\mathbb{E}_{A_t\sim\pi(\cdot\mid s_t;\boldsymbol{\theta})}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})+\gamma\cdot\mathbb{E}_{S_{t+1}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})\mid A_t,S_t=s_t\right]\right]\\
&=\mathbb{E}_{A_t}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma\,\mathbb{E}_{A_t}\left[\mathbb{E}_{S_{t+1}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})\mid A_t,S_t=s_t\right]\mid S_t=s_t\right]\\
&=\mathbb{E}_{A_t}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma\,\mathbb{E}_{A_t,S_{t+1}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})\mid S_t=s_t\right]
\end{aligned}$$

where the last step uses the tower property of conditional expectation.
Shifting the index by one, $\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})=\mathbb{E}_{A_{t+1}}\left[\boldsymbol{g}(S_{t+1},A_{t+1};\boldsymbol{\theta})\mid S_{t+1}\right]+\gamma\,\mathbb{E}_{A_{t+1},S_{t+2}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+2})\mid S_{t+1}\right]$. Substituting this into the expression above gives

$$\begin{aligned}
\nabla_{\boldsymbol{\theta}}V_\pi(s_t)
&=\mathbb{E}_{A_t}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma\,\mathbb{E}_{A_t,S_{t+1}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+1})\mid S_t=s_t\right]\\
&=\mathbb{E}_{A_t}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma\,\mathbb{E}_{A_t,S_{t+1}}\Big[\mathbb{E}_{A_{t+1}}\left[\boldsymbol{g}(S_{t+1},A_{t+1};\boldsymbol{\theta})\mid S_{t+1}\right]+\gamma\,\mathbb{E}_{A_{t+1},S_{t+2}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+2})\mid S_{t+1}\right]\,\Big|\,S_t=s_t\Big]\\
&=\mathbb{E}_{A_t}\left[\boldsymbol{g}(s_t,A_t;\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma\,\mathbb{E}_{A_t,S_{t+1},A_{t+1}}\left[\boldsymbol{g}(S_{t+1},A_{t+1};\boldsymbol{\theta})\mid S_t=s_t\right]+\gamma^2\,\mathbb{E}_{A_t,S_{t+1},A_{t+1},S_{t+2}}\left[\nabla_{\boldsymbol{\theta}}V_\pi(S_{t+2})\mid S_t=s_t\right]
\end{aligned}$$

where the last step uses the Markov property: conditioning $\boldsymbol{g}(S_{t+1},A_{t+1};\boldsymbol{\theta})$ on $S_{t+1}$ alone is equivalent to conditioning on $(S_{t+1},S_t=s_t,A_t)$.
Repeatedly substituting the recursion in this way, we finally obtain

$$\begin{aligned}
\frac{\partial V_\pi(S_1)}{\partial\boldsymbol{\theta}}
=\;&\mathbb{E}_{A_1}\left[\boldsymbol{g}(S_1,A_1;\boldsymbol{\theta})\mid S_1\right]\\
&+\gamma\cdot\mathbb{E}_{A_1,S_2,A_2}\left[\boldsymbol{g}(S_2,A_2;\boldsymbol{\theta})\mid S_1\right]\\
&+\gamma^2\cdot\mathbb{E}_{A_1,S_2,A_2,S_3,A_3}\left[\boldsymbol{g}(S_3,A_3;\boldsymbol{\theta})\mid S_1\right]\\
&+\cdots\\
&+\gamma^{n-1}\cdot\mathbb{E}_{A_1,S_2,A_2,\ldots,S_n,A_n}\left[\boldsymbol{g}(S_n,A_n;\boldsymbol{\theta})\mid S_1\right]\\
&+\gamma^{n}\cdot\mathbb{E}_{A_1,S_2,A_2,\ldots,S_n,A_n,S_{n+1}}\Big[\underbrace{\frac{\partial V_\pi(S_{n+1})}{\partial\boldsymbol{\theta}}}_{=\,0}\,\Big|\,S_1\Big]
\end{aligned}$$
The last term is zero because the episode ends after step $n$: there are no rewards after step $n+1$, so both the return and the value at step $n+1$ are zero. Finally, by the definition of $J(\boldsymbol{\theta})$,

$$\frac{\partial J(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}=\mathbb{E}_{S_1}\left[\frac{\partial V_\pi(S_1)}{\partial\boldsymbol{\theta}}\right]$$

and taking the expectation over $S_1$ of the expansion above yields Eq. (2.2).
This completes the proof.
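Eq. (2.2) suggests a Monte Carlo estimator: sample one trajectory, replace each expectation with its sampled $\boldsymbol{g}(s_t,a_t;\boldsymbol{\theta})$, and approximate $Q_\pi(s_t,a_t)$ by the observed discounted return $u_t$; this substitution is the REINFORCE algorithm, which the text above does not spell out. A minimal PyTorch sketch; the tiny random-walk environment is a toy stand-in introduced here so the example is self-contained:

```python
import torch
from torch.distributions import Categorical

class ToyEnv:
    """Random walk on {0,...,4}: start at 2; the episode ends at either
    boundary, with reward 1 only for reaching state 4. A toy stand-in."""
    def reset(self):
        self.pos = 2
        return torch.tensor([float(self.pos)])
    def step(self, a: int):
        self.pos += 1 if a == 1 else -1
        done = self.pos in (0, 4)
        reward = 1.0 if self.pos == 4 else 0.0
        return torch.tensor([float(self.pos)]), reward, done

policy = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.ReLU(),
                             torch.nn.Linear(16, 2))   # logits over 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.98

env = ToyEnv()
s, done = env.reset(), False
log_probs, rewards = [], []
while not done:                                  # sample one trajectory
    dist = Categorical(logits=policy(s))         # pi(. | s_t; theta)
    a = dist.sample()
    log_probs.append(dist.log_prob(a))           # ln pi(a_t | s_t; theta)
    s, r, done = env.step(int(a))
    rewards.append(r)

# Discounted returns u_t = sum_{k >= t} gamma^(k-t) * r_k, computed backwards;
# u_t serves as an unbiased sample of Q_pi(s_t, a_t).
returns, u = [], 0.0
for r in reversed(rewards):
    u = r + gamma * u
    returns.insert(0, u)

# Negative of the Eq. (2.2) sample sum_t gamma^(t-1) * u_t * ln pi(a_t | s_t);
# minimizing this loss performs gradient ascent on J(theta).
loss = -sum((gamma ** t) * u_t * lp
            for t, (u_t, lp) in enumerate(zip(returns, log_probs)))
opt.zero_grad()
loss.backward()
opt.step()
```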

Stationary distribution: A rigorous proof of the policy gradient theorem relies on the stationary distribution of a Markov chain. Suppose the state $S'$ is generated as $S\rightarrow A\rightarrow S'$. Recall that the state-transition function $p(S'\mid S,A)$ is a probability mass function. Let $f(S)$ be the probability mass function of the state $S$; then the marginal distribution $f(S')$ of $S'$ is

$$\begin{aligned}
f(S')&=\mathbb{E}_{S,A}\left[p(S'\mid S,A)\right]\\
&=\mathbb{E}_{S}\left[\mathbb{E}_{A}\left[p(S'\mid S,A)\mid S\right]\right]\\
&=\mathbb{E}_{S}\left[\sum_{a\in\mathcal{A}}p(S'\mid S,a)\cdot\pi(a\mid S)\right]\\
&=\sum_{s\in\mathcal{S}}\sum_{a\in\mathcal{A}}p(S'\mid s,a)\cdot\pi(a\mid s)\cdot f(s)
\end{aligned}$$
If $f(S')$ and $f(S)$ are the same probability mass function, i.e. $f(S)=f(S')$, then the Markov chain has reached its steady state, and $f(S)$ is the stationary probability mass function.
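As a concrete check of the stationarity condition, one can fold the policy into a state-to-state transition matrix and iterate until the distribution stops changing. A minimal NumPy sketch; the 3-state, 2-action transition tensor and policy below are randomly generated illustrative assumptions, not from the text:

```python
import numpy as np

n_states, n_actions = 3, 2
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # p[s, a, s']
pi = rng.dirichlet(np.ones(n_actions), size=n_states)             # pi[s, a]

# Induced state-transition matrix: P[s, s'] = sum_a pi(a|s) * p(s'|s, a).
P = np.einsum("sa,sat->st", pi, p)

f = np.full(n_states, 1.0 / n_states)   # start from the uniform distribution
for _ in range(1000):
    f = f @ P                           # one step: f(s') = sum_s f(s) * P[s, s']

print(f, np.allclose(f, f @ P))         # f is (numerically) stationary: f = f P
```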

Theorem:

f ( S ) f(S) f(S) 是马尔科夫链稳态时的概率质量 (密度) 函数。那么对于任意函数 G ( S ′ ) G\left(S^{\prime}\right) G(S),
$$\mathbb{E}_{S\sim f(\cdot)}\left[\mathbb{E}_{A\sim\pi(\cdot\mid S;\boldsymbol{\theta})}\left[\mathbb{E}_{S'\sim p(\cdot\mid S,A)}\left[G(S')\right]\right]\right]=\mathbb{E}_{S'\sim f(\cdot)}\left[G(S')\right]\tag{2.3}$$

Proof:
$$\begin{aligned}
\mathbb{E}_{S\sim f(\cdot)}\left[\mathbb{E}_{A\sim\pi(\cdot\mid S;\boldsymbol{\theta})}\left[\mathbb{E}_{S'\sim p(\cdot\mid S,A)}\left[G(S')\right]\right]\right]
&=\mathbb{E}_{S\sim f(\cdot)}\left[\mathbb{E}_{A}\left[\mathbb{E}_{S'}\left[G(S')\mid S,A\right]\mid S\right]\right]\\
&=\mathbb{E}_{S\sim f(\cdot)}\left[\mathbb{E}_{A,S'}\left[G(S')\mid S\right]\right]\\
&=\mathbb{E}_{S,A,S'}\left[G(S')\right]\\
&=\mathbb{E}_{S'}\left[G(S')\right]
\end{aligned}$$
Moreover, since $S$ is drawn from the stationary distribution, $S'$ has the same distribution $f(\cdot)$ as $S$, so $\mathbb{E}_{S'}\left[G(S')\right]=\mathbb{E}_{S'\sim f(\cdot)}\left[G(S')\right]$. This completes the proof.

Theorem (policy gradient theorem):

Let the objective function be $J(\boldsymbol{\theta})=\mathbb{E}_{S\sim f(\cdot)}\left[V_\pi(S)\right]$, where $f(S)$ is the probability mass (density) function of the stationary distribution of the Markov chain. Then
$$\frac{\partial J(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}=\left(1+\gamma+\gamma^2+\cdots+\gamma^{n-1}\right)\cdot\mathbb{E}_{S\sim f(\cdot)}\left[\mathbb{E}_{A\sim\pi(\cdot\mid S;\boldsymbol{\theta})}\left[\frac{\partial\ln\pi(A\mid S;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}\cdot Q_\pi(S,A)\right]\right]$$

Proof: Suppose the initial state $S_1$ follows the stationary distribution of the Markov chain, with probability mass function $f(S_1)$. For every $t=1,\cdots,n$, the action $A_t$ is sampled from the policy network:
$$A_t\sim\pi(\cdot\mid S_t;\boldsymbol{\theta})$$
For any function $G$, repeatedly applying Eq. (2.3) gives
$$\begin{aligned}
\mathbb{E}_{A_1,\ldots,A_{t-1},S_1,\ldots,S_t}\left[G(S_t)\right]
&=\mathbb{E}_{S_1\sim f}\left\{\mathbb{E}_{A_1\sim\pi,\,S_2\sim p}\left\{\mathbb{E}_{A_2,S_3,A_3,S_4,\cdots,A_{t-1},S_t}\left[G(S_t)\right]\right\}\right\}\\
&=\mathbb{E}_{S_2\sim f}\left\{\mathbb{E}_{A_2,S_3,A_3,S_4,\cdots,A_{t-1},S_t}\left[G(S_t)\right]\right\}\\
&=\mathbb{E}_{S_2\sim f}\left\{\mathbb{E}_{A_2\sim\pi,\,S_3\sim p}\left\{\mathbb{E}_{A_3,S_4,A_4,S_5,\cdots,A_{t-1},S_t}\left[G(S_t)\right]\right\}\right\}\\
&=\mathbb{E}_{S_3\sim f}\left\{\mathbb{E}_{A_3,S_4,A_4,S_5,\cdots,A_{t-1},S_t}\left[G(S_t)\right]\right\}\\
&\;\;\vdots\\
&=\mathbb{E}_{S_{t-1}\sim f}\left\{\mathbb{E}_{A_{t-1}\sim\pi,\,S_t\sim p}\left\{G(S_t)\right\}\right\}\\
&=\mathbb{E}_{S_t\sim f}\left\{G(S_t)\right\}
\end{aligned}$$
Recall that $\boldsymbol{g}(s,a;\boldsymbol{\theta})\triangleq Q_\pi(s,a)\cdot\frac{\partial\ln\pi(a\mid s;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}$ and that the episode ends after step $n$. Combining Eq. (2.2) with the identity above:
$$\begin{aligned}
\frac{\partial J(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}
=\;&\mathbb{E}_{S_1,A_1}\left[\boldsymbol{g}(S_1,A_1;\boldsymbol{\theta})\right]\\
&+\gamma\cdot\mathbb{E}_{S_1,A_1,S_2,A_2}\left[\boldsymbol{g}(S_2,A_2;\boldsymbol{\theta})\right]\\
&+\gamma^2\cdot\mathbb{E}_{S_1,A_1,S_2,A_2,S_3,A_3}\left[\boldsymbol{g}(S_3,A_3;\boldsymbol{\theta})\right]\\
&+\cdots\\
&+\gamma^{n-1}\cdot\mathbb{E}_{S_1,A_1,S_2,A_2,\ldots,S_n,A_n}\left[\boldsymbol{g}(S_n,A_n;\boldsymbol{\theta})\right]\\
=\;&\mathbb{E}_{S_1\sim f(\cdot)}\left\{\mathbb{E}_{A_1\sim\pi(\cdot\mid S_1;\boldsymbol{\theta})}\left[\boldsymbol{g}(S_1,A_1;\boldsymbol{\theta})\right]\right\}\\
&+\gamma\cdot\mathbb{E}_{S_2\sim f(\cdot)}\left\{\mathbb{E}_{A_2\sim\pi(\cdot\mid S_2;\boldsymbol{\theta})}\left[\boldsymbol{g}(S_2,A_2;\boldsymbol{\theta})\right]\right\}\\
&+\gamma^2\cdot\mathbb{E}_{S_3\sim f(\cdot)}\left\{\mathbb{E}_{A_3\sim\pi(\cdot\mid S_3;\boldsymbol{\theta})}\left[\boldsymbol{g}(S_3,A_3;\boldsymbol{\theta})\right]\right\}\\
&+\cdots\\
&+\gamma^{n-1}\cdot\mathbb{E}_{S_n\sim f(\cdot)}\left\{\mathbb{E}_{A_n\sim\pi(\cdot\mid S_n;\boldsymbol{\theta})}\left[\boldsymbol{g}(S_n,A_n;\boldsymbol{\theta})\right]\right\}\\
=\;&\left(1+\gamma+\gamma^2+\cdots+\gamma^{n-1}\right)\cdot\mathbb{E}_{S\sim f(\cdot)}\left\{\mathbb{E}_{A\sim\pi(\cdot\mid S;\boldsymbol{\theta})}\left[\boldsymbol{g}(S,A;\boldsymbol{\theta})\right]\right\}
\end{aligned}$$

This completes the proof.
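In practice the constant factor $(1+\gamma+\gamma^2+\cdots+\gamma^{n-1})$ is typically absorbed into the learning rate: one samples $s$ from the (approximately stationary) state distribution, samples $a\sim\pi(\cdot\mid s;\boldsymbol{\theta})$, and ascends along the single-sample stochastic gradient $\boldsymbol{g}(s,a;\boldsymbol{\theta})$. A minimal sketch of one such update; the scalar `q_value` standing in for $Q_\pi(s,a)$ (in practice supplied by a critic or a sampled return) and the 1-D state are illustrative assumptions:

```python
import torch
from torch.distributions import Categorical

policy = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.ReLU(),
                             torch.nn.Linear(16, 2))   # logits over 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

s = torch.tensor([0.7])             # pretend: s drawn from the stationary f
dist = Categorical(logits=policy(s))
a = dist.sample()                   # a ~ pi(. | s; theta)
q_value = 1.3                       # pretend: Q_pi(s, a), e.g. from a critic

# loss = -g(s, a; theta) up to the constant factor, which the learning rate
# absorbs; minimizing it ascends along Q_pi(s,a) * grad ln pi(a|s; theta).
loss = -q_value * dist.log_prob(a)
opt.zero_grad()
loss.backward()
opt.step()
```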
