Deep Learning: Chapter 18
18 Confronting the Partition Function
18.1 The Log-Likelihood Gradient
\[\nabla_\theta \log p(x;\theta) = \nabla_\theta \log \tilde{p}(x;\theta) - \nabla_\theta \log Z(\theta).\]For most undirected models of interest, the negative phase is difficult.