Talk:Product of experts
This is a nice review article.
A few comments:
1) In the section Training a Product of Experts the term mini-batch is not clearly defined.
2) In the section Training a Product of Experts the use of "momentum" needs attention: (i) it is not clear to a novice reader what this means, and (ii) it should be referenced to a source, e.g. Plaut et al., 1986 (see e.g. Bishop's book, 1995, or Hertz, Krogh and Palmer, 1990).
3) For the paragraph that starts
"The special bipartite structure of the RBM and EFH results in a very efficient Gibbs sampler that alternates between sampling all hidden variables independently given values ..."
I strongly recommend using (a copy of) Fig 1 from Hinton (2002) to illustrate this point.
4) In eq (2) the $p_j$'s are undefined.
5) Below (3) you say
"Over-complete variants of ICA that retain marginal independence have also been proposed (Lewicki and Sejnowski 1998). This model has ..."
What does "this" refer to here? Is it Lewicki and Sejnowski's model? Rephrase to make this clear.
6) It might be helpful to point out that a product of Gaussians is (up to normalization) a Gaussian, with inverse covariance $C^{-1}_{prod} = \sum_i C^{-1}_i$, see e.g. C. K. I. Williams and F. V. Agakov, Neural Computation, 14(5), 1169-1182 (2002).
7) Please number all equations: even if you don't want to refer to a given equation in your text, someone else might want to later.
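On comment 6, a short derivation of the kind that could accompany the statement is sketched here. It assumes each expert is a Gaussian $\mathcal{N}(x; \mu_i, C_i)$ with invertible covariance $C_i$; the means $\mu_i$ and the symbol $\mu_{prod}$ are my notation and do not appear in the article. Collecting the quadratic and linear terms in $x$ in the exponent gives

$$\prod_i \mathcal{N}(x; \mu_i, C_i) \;\propto\; \exp\Big(-\tfrac{1}{2} \sum_i (x - \mu_i)^\top C_i^{-1} (x - \mu_i)\Big) \;=\; \exp\Big(-\tfrac{1}{2}\, x^\top \Big(\sum_i C_i^{-1}\Big) x \;+\; x^\top \sum_i C_i^{-1} \mu_i \;+\; \text{const}\Big)$$

and completing the square identifies the product as an unnormalized Gaussian with

$$C^{-1}_{prod} = \sum_i C^{-1}_i, \qquad \mu_{prod} = C_{prod} \sum_i C^{-1}_i \mu_i .$$

Since the inverse covariances add, the product is at least as sharp as any single expert, which may be a useful intuition to give the reader at that point.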