Talk:Deep belief networks
This article gives a nice summary of DBNs and their main properties, and these ideas have been influential enough that the inclusion of this article in Scholarpedia seems well worthwhile. I think it should be fine to accept it.
However, some suggestions: (1) This article may be dense and difficult to understand for readers not already familiar with DBNs. A figure showing a DBN, and perhaps a separate figure explaining the greedy layer-wise training process, might help. This would also make clearer the sense in which DBNs are graphical models. (2) If I were writing this, I would have said more about applications of DBNs. To keep the total amount of space constant, the text on "DBNs and other types of variable" could be reduced.
Very nice contribution. I think it is a little compact. I agree with the above comments that a figure could be helpful. I would not remove any text because the contribution is very short as is. I would definitely expand on the core of the algorithm. The explanation:
"After learning W, we keep p(v|h,W) but we replace p(h|W) by a better model of the aggregated posterior distribution over hidden vectors – i.e. the non-factorial distribution produced by averaging the factorial posterior distributions produced by the individual data vectors. The better model is learned by treating the hidden activity vectors produced from the training data as the training data for the next learning module."
is somewhat cryptic to the uninitiated. Perhaps Yee Whye Teh's explanation from the NIPS workshop on deep learning, in terms of unrolling a Markov chain, would be useful here.
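For concreteness, a minimal sketch of the greedy layer-wise procedure that the quoted passage describes might help the uninitiated. The sketch below is my own illustration in plain numpy; the layer sizes, learning rate, epoch count, and the use of CD-1 updates with biases omitted are assumptions for brevity, not details fixed by the article.

```python
# Sketch: greedy layer-wise training of a DBN by stacking RBMs.
# Each RBM is trained with one-step contrastive divergence (CD-1),
# and its hidden activities on the training data become the training
# data for the next layer. Hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10):
    """Train one RBM on `data` (rows are binary visible vectors) with CD-1."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    for _ in range(epochs):
        # Positive phase: factorial posterior over hidden units given the data.
        h_prob = sigmoid(data @ W)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one step of alternating Gibbs sampling.
        v_recon = sigmoid(h_sample @ W.T)
        h_recon = sigmoid(v_recon @ W)
        # CD-1 weight update (bias terms omitted for brevity).
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
    return W

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: each RBM is trained on the hidden
    activities produced by the already-trained layer below it."""
    weights = []
    layer_input = data
    for n_hidden in layer_sizes:
        W = train_rbm(layer_input, n_hidden)
        weights.append(W)
        # The hidden activity vectors for the training data become the
        # training data for the next learning module.
        layer_input = sigmoid(layer_input @ W)
    return weights

# Toy usage: a two-hidden-layer DBN on random binary data.
toy_data = (rng.random((100, 20)) < 0.5).astype(float)
dbn_weights = train_dbn(toy_data, layer_sizes=[16, 8])
```

The key line is the last one in train_dbn: after a layer's RBM is learned, the hidden activity vectors it produces on the training data become the training data for the next module, which is exactly what the quoted passage means by replacing p(h|W) with a better model of the aggregated posterior.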
This is an excellent, timely, and necessary contribution to Scholarpedia. I would certainly not shorten it, and I agree with the comments from the second review in this respect, i.e., I would try to clarify the concepts for the uninitiated. I am not sure that Yee Whye Teh's unfolding of a Markov chain is the most pedagogical route, though. Having both explanations would be nice, but if I had to choose, I would prefer the approach currently presented and would simply expand the presentation of the argument a bit to help the non-expert.
One thing I would really like to see added is a motivational section explaining why a deep architecture is interesting from the representational point of view. If I were to write it, I would talk about the inspiration from the brain, about the fact that humans tend to explain and understand through multiple levels of concepts and representations, and about the theoretical results suggesting that whereas a 2-level architecture can represent anything, it might need exponential capacity where a deeper architecture would not (see my paper 'Learning Deep Architectures for AI', to appear in Foundations & Trends in Machine Learning). Connected to this motivation is the importance of hidden variables in graphical models when there is not enough prior knowledge to fully specify the semantics of all the random variables involved in a parametric model: introducing generic hidden variables brings in a great deal of modeling flexibility, and the model even becomes non-parametric when the number of hidden variables can be controlled.
Regarding the piece on auto-encoders, I would mention denoising auto-encoders (Vincent et al., ICML 2008). They come with a bound on the log-likelihood of a generative model, and, like RBMs, they put more emphasis on representing the information in the input that is predictable from one part of the input given another part. They also generalize substantially better than ordinary auto-encoders. Finally, it might be worth mentioning that there are intimate connections between auto-encoders and RBMs trained by CD-1 (Bengio and Delalleau, Neural Computation 2009, "Justifying and Generalizing Contrastive Divergence").
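To make the suggestion concrete, here is a minimal sketch of the denoising auto-encoder idea in plain numpy: corrupt each input with masking noise, then train the network to reconstruct the uncorrupted input. The tied weights, corruption level, and all hyperparameters below are my own illustrative assumptions, not prescriptions from Vincent et al. or the article.

```python
# Sketch: a denoising auto-encoder with tied weights and masking noise.
# The encoder sees a corrupted input; the reconstruction target is the
# clean input. All sizes and hyperparameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_denoising_autoencoder(data, n_hidden, corruption=0.3,
                                lr=0.1, epochs=20):
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))  # tied weights
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # Masking noise: zero out a random fraction of each input.
        mask = rng.random(data.shape) >= corruption
        corrupted = data * mask
        # Encode the corrupted input; decode with the transposed weights.
        h = sigmoid(corrupted @ W + b_h)
        recon = sigmoid(h @ W.T + b_v)
        # For a sigmoid output with cross-entropy loss, the gradient
        # w.r.t. the output pre-activation is simply (recon - data).
        d_recon = (recon - data) / len(data)
        d_h = (d_recon @ W) * h * (1 - h)
        # Tied weights get gradient contributions from encoder and decoder.
        W -= lr * (corrupted.T @ d_h + d_recon.T @ h)
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_recon.sum(axis=0)
    return W, b_h, b_v

# Toy usage on random binary data.
toy_data = (rng.random((100, 20)) < 0.5).astype(float)
W, b_h, b_v = train_denoising_autoencoder(toy_data, n_hidden=10)
```

The design choice that distinguishes this from an ordinary auto-encoder is that the encoder sees `corrupted` while the reconstruction target is the clean `data`, which forces the hidden code to capture how one part of the input predicts another.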