I’m releasing a “toolbox” of code for learning and inference with graphical models. It focuses on parameter learning using marginalization in the high-treewidth setting. Though the code is, in principle, domain independent, I’ve developed it with vision problems in mind. This means that the code A) is efficient (all the inference algorithms are implemented in C++) and B) can handle arbitrary graph structures.

There are, at present, a bunch of limitations:

- All the inference algorithms are for *marginal* inference. No MAP inference, at all.
- The code handles pairwise graphs only.
- All variables must have the same number of possible values.
- For tree-reweighted belief propagation, a *single* edge appearance probability must be used for all edges.

For vision, these are usually no big deal. (Except if you are a MAP inference person. But that is not advisable.) In other domains, though, these might be showstoppers.
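
To make the setting concrete, here is a minimal Python sketch (not the toolbox's API; every name in it is made up for illustration) of what "marginal inference in a pairwise model where all variables have the same number of values" computes. It enumerates all configurations of a tiny loopy graph, which is only feasible for toy problems; the toolbox's algorithms approximate the same quantities on graphs where enumeration is hopeless.

```python
from itertools import product

def brute_force_marginals(n_nodes, n_states, edges, unary, pairwise):
    """Exact univariate marginals of a pairwise model, by enumeration.

    unary[i][a]            : potential for node i taking value a
    pairwise[(i, j)][a][b] : potential for edge (i, j) with x_i = a, x_j = b
    Every node has the same number of states (n_states).
    """
    marginals = [[0.0] * n_states for _ in range(n_nodes)]
    Z = 0.0  # partition function
    for x in product(range(n_states), repeat=n_nodes):
        w = 1.0
        for i in range(n_nodes):
            w *= unary[i][x[i]]
        for (i, j) in edges:
            w *= pairwise[(i, j)][x[i]][x[j]]
        Z += w
        for i in range(n_nodes):
            marginals[i][x[i]] += w
    return [[m / Z for m in row] for row in marginals], Z

# Hypothetical toy model: a 3-cycle (so the graph is loopy) over binary variables.
edges = [(0, 1), (1, 2), (0, 2)]
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
attract = [[2.0, 1.0], [1.0, 2.0]]  # neighbours prefer to agree
pairwise = {e: attract for e in edges}

marginals, Z = brute_force_marginals(3, 2, edges, unary, pairwise)
for i, row in enumerate(marginals):
    print(i, row)
```

The cost of this loop grows as `n_states ** n_nodes`, which is why approximate methods such as TRW or mean-field are needed in practice.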

The code can be used in a bunch of different ways, depending on whether you are looking for a specific tool or a large framework.

- Just use the low-level [Inference] algorithms, namely A) Tree-Reweighted Belief propagation + variants (Loopy BP, TRW-S) or B) Mean-field. Take care of everything else yourself.
- Use the [Differentiation] methods (back-TRW or implicit differentiation) to calculate parameter gradients by providing your own loss functions. Do everything else on your own.
- Use the [Loss] methods (em, implicit_loss) to calculate parameter gradients by providing a true vector x and a loss name (univariate likelihood, clique likelihood, etc.) Unlike the above usages, these methods explicitly consider the conditional learning setting where one has an input and an output.
- Use the [CRF] methods to calculate almost everything (deal with parameter ties for a specific type of model, etc.). These methods consider specific classes of CRFs and, given an input, output, loss function, inference method, etc., give the parameter gradient. Employing this gradient in a learning framework is quite straightforward.
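
As a rough illustration of that last point, here is a Python sketch of plugging a parameter gradient into plain gradient descent. The `loss_and_grad` callable is a hypothetical stand-in for whatever routine returns a loss and its gradient; the toy quadratic below is only there so the sketch runs, and it is not how the toolbox itself is called.

```python
def gradient_descent(loss_and_grad, theta0, step=0.1, n_iters=200):
    """Plain gradient descent on a function returning (loss, gradient)."""
    theta = list(theta0)
    for _ in range(n_iters):
        _, grad = loss_and_grad(theta)
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta

def toy_loss_and_grad(theta):
    """Stand-in objective: 0.5 * ||theta - target||^2 and its gradient."""
    target = [1.0, -2.0]
    diff = [t - s for t, s in zip(theta, target)]
    loss = 0.5 * sum(d * d for d in diff)
    return loss, diff

theta_hat = gradient_descent(toy_loss_and_grad, [0.0, 0.0])
print(theta_hat)
```

In a real learning problem one would more likely hand the same callable to a quasi-Newton or stochastic optimizer; the point is only that a routine returning a loss and a gradient is all the optimizer needs.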

Congrats on your cool toolbox!

Couldn’t you use your toolbox for MAP inference by raising your potentials to a large power? I.e., along the lines of Koller/Friedman’s 13.5.3.2 “Max-Product as Sum-Product Limit.” One technical condition for the equivalence is that the ground state has to be unique, but that shouldn’t be an issue since you can add small random numbers to the potential tables.

For pure inference, that should be fine. However, I don’t think it would be possible to do MAP-based parameter learning that way…
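
The sum-product-limit idea from the comment above can be checked on a toy problem. The Python sketch below uses brute-force enumeration rather than the toolbox, and all names and numbers in it are made up for illustration. It raises every potential to a power `beta`, computes exact marginals, and decodes each node by its largest marginal. With `beta = 1` the decoded assignment differs from the MAP assignment on this model; with a large `beta` the two agree, since the MAP state here is unique.

```python
from itertools import product

def joint_weight(x, edges, unary, pairwise, beta=1.0):
    """Unnormalized probability of configuration x, potentials raised to beta."""
    w = 1.0
    for i, xi in enumerate(x):
        w *= unary[i][xi] ** beta
    for (i, j) in edges:
        w *= pairwise[(i, j)][x[i]][x[j]] ** beta
    return w

def decode_by_marginals(n_nodes, n_states, edges, unary, pairwise, beta):
    """Pick, for each node, the value with the largest exact marginal."""
    marg = [[0.0] * n_states for _ in range(n_nodes)]
    for x in product(range(n_states), repeat=n_nodes):
        w = joint_weight(x, edges, unary, pairwise, beta)
        for i in range(n_nodes):
            marg[i][x[i]] += w
    # Normalization does not change the argmax, so it is skipped here.
    return tuple(max(range(n_states), key=lambda a: row[a]) for row in marg)

def map_assignment(n_nodes, n_states, edges, unary, pairwise):
    """Exact MAP configuration by enumeration."""
    return max(product(range(n_states), repeat=n_nodes),
               key=lambda x: joint_weight(x, edges, unary, pairwise))

# Hypothetical toy model: binary 3-cycle with attractive edges.
edges = [(0, 1), (1, 2), (0, 2)]
unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
attract = [[2.0, 1.0], [1.0, 2.0]]
pairwise = {e: attract for e in edges}

x_map = map_assignment(3, 2, edges, unary, pairwise)
x_beta1 = decode_by_marginals(3, 2, edges, unary, pairwise, beta=1.0)
x_beta20 = decode_by_marginals(3, 2, edges, unary, pairwise, beta=20.0)
print(x_map, x_beta1, x_beta20)
```

For larger powers or larger models, raising potentials directly will overflow; scaling log-potentials by `beta` and working in the log domain is the usual way around that.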