# Favorite things NIPS

I always enjoy reading conference reports, so I thought I’d mention a few papers that caught my eye.  (I welcome any corrections to my summaries of any of these.)

1. Recent Progress in the Structure of Large-Treewidth Graphs and Some Applications, by Chandra Chekuri. Not a paper, but rather an exceptionally clear tutorial on material that should probably be mandatory for anyone interested in computational complexity and graphical models. The talk isn’t online yet, but slides are available.

2. Learning Distributed Representations for Structured Output Prediction, by Vivek Srikumar and Christopher Manning. The idea is that when doing structured prediction, labels are often “similar” in some sense, and so should be encouraged to have similar weights. This is done by defining weights not over a discrete label space, but over a finite-dimensional distributed representation, with one vector for each label. Learning alternates between updating the representation of each label and regular structured learning.

3. Clamping Variables and Approximate Inference, by Adrian Weller and Tony Jebara. The paper provides a first-principles based proof that the Bethe approximation gives a lower-bound on the true partition function for attractive binary pairwise models. (A result originally proved by Ruozzi in 2012) The basic idea is that “clamping” (i.e. brute-force summing over all values of a variable, and using the Bethe approximation on the resulting model) can only increase the Bethe approximation. But, if you clamp everything, you have the exact partition function, so Bethe must be a lower-bound.  The proofs provide a lot of insight into why the results are true.

4. Making Pairwise Binary Graphical Models Attractive, by Nicholas Ruozzi and Tony Jebara. This builds on similar ideas to the above paper. There is an idea of a cover of a graph, where vertices have multiple copies, but neighborhoods look locally the same. The paper gives an appropriate cover construction such that the cover is attractive (up to a “flipping” of the variables). Thus, the Bethe partition function of the cover lower-bounds the true partition function of the cover. This is shown to be a lower-bound on the tree-reweighted partition function. As far as I can tell, Bethe on the cover may upper or lower-bound the true partition function, so Bethe-on-cover isn’t necessarily better than tree-reweighted.  Still, since tree-reweighted is typically a loose upper-bound, it is likely to be better in practice.

5. Distributed Bayesian Posterior Sampling via Moment Sharing, by Minjie Xu, Balaji Lakshminarayanan, Yee Whye Teh, Jun Zhu, and Bo Zhang. You want to do Bayesian MCMC inference using a cluster. This paper partitions the data over the nodes of the cluster, does local MCMC approximations, and uses an exponential-family approximation and expectation propagation to send information about the posterior between the nodes, thus approximating full/normal Bayesian inference. A couple days ago Andrew Gelman mentioned a similar method, considering other local approximations.

6. Mode Estimation for High Dimensional Discrete Tree Graphical Models, by Chao Chen, Han Liu, Dimitris Metaxas, and Tianqi Zhao. The goal is to find the M highest modes in a tree-structured graphical model. They find a set of necessary local conditions to be a mode, and then propose a junction-tree type algorithm operating with local mode configurations as values to find global modes.

7. Efficient Inference of Continuous Markov Random
Fields with Polynomial Potentials
, by Shenlong Wang, Alexander Schwing, and Raquel Urtasun. The goal is to find a local minimum of a polynomial MRF by use of the CCCP method where one iteratively splits the objective into convex and concave parts and linearizes the concave part to get an upper-bound. I’d have expected this to be easy, but it turns out to be quite difficult to split up a polynomial into convex and concave parts. This paper gives an SDP problem to find a function to add to the objective to make it convex, yielding the desired split.

8. Hardness of parameter estimation in graphical models, by Guy Bresler, David Gamarnik, and Devavrat Shah. In principle, this is the ultimate non-surprising result: given the mean parameters (data marginals), it is computationally intractable to find the natural parameters (MRF weights) that produce these marginals. Formally given an oracle to do maximum likelihood learning in the normal way (find parameters to match a given mean vector), you could use that oracle to approximate the partition function, which is known to be hard. The really surprising thing here is that this wasn’t proved until 2014!

9. A* Sampling, by Chris Maddison, Daniel Tarlow, and Tom Minka. A method for exact sampling for low-dimensional continuous distributions, based on fundamentally different principles from existing methods. Intuitively, the algorithm simultaneously generates Gumbel-type noise for every possible (real-valued) configuration and optimizes the resulting function by branch-and-bound.

10. The workshops, as usual, were great aside from the usual annoyance of it only being possible to be in one place at one time.

11. Montreal.  Charming, friendly.