Here is an example of how to use the graphical models toolbox I’ve just released. I’ll use the term “perturbation” to refer to computing loss gradients from the difference of two problems, as in this paper, and “truncated fitting” to refer to backpropagating derivatives through the TRW inference process for a fixed number of iterations, as in this paper.
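To make the perturbation idea concrete, here is a minimal sketch on a toy problem. It is not the toolbox's code. I assume an "inference" routine that is just an independent logistic model (a stand-in for TRW), and the names `infer`, `theta`, `x`, and `r` are all hypothetical. The gradient with respect to the parameters is approximated by running inference twice, once at the original parameters and once at parameters nudged along the loss gradient with respect to the marginals.

```matlab
% Toy illustration of the perturbation idea (NOT the toolbox's code).
% Assumption: "inference" is an independent logistic model, so the
% marginal of label 1 at pixel i is simply sigmoid(theta_i).
infer = @(theta) 1 ./ (1 + exp(-theta));   % stand-in for TRW inference

theta = [-1.0; 0.5; 2.0; -0.3];   % hypothetical parameters
x     = [0; 1; 1; 0];             % hypothetical true binary labels
N     = numel(x);

mu = infer(theta);                % first problem: ordinary inference

% Gradient of the mean univariate negative log-likelihood w.r.t. mu.
dL_dmu = -(x ./ mu - (1 - x) ./ (1 - mu)) / N;

% Second problem: inference with parameters nudged along dL/dmu.
r = 1e-5;                         % perturbation size (assumed value)
mu_pert = infer(theta + r * dL_dmu);

% The difference of the two problems approximates the parameter gradient.
grad_pert = (mu_pert - mu) / r;

% For this toy model the exact gradient is known in closed form.
grad_exact = (mu - x) / N;

fprintf('max abs difference: %g\n', max(abs(grad_pert - grad_exact)));
```

In this toy case the two gradients agree closely because the Jacobian of the marginals with respect to the parameters is symmetric (here, diagonal), which is the property the trick relies on.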

This is basically the simplest possible vision problem. We will train a conditional random field (CRF) to take some noisy image as input:

and produce marginals that well predict a binary image as output:

The noisy image is produced by setting where is random on and is a noise level as described in this paper. For now, we set which is a pretty challenging amount of noise, as you see above.

The main file, which does the learning described here, can be found in the toolbox at `examples/train_binary_denoisers.m`.

First off, we will train a model using perturbation, with the univariate likelihood loss function, based on TRW inference, with a convergence threshold of 1e-5. We do this by typing:

`>> train_binary_denoisers('pert_ul_trw_1e-5')`

                                                        First-order
    Iteration  Func-count          f(x)     Step-size    optimality
            0           1      0.692759                       0.096
            1           3      0.686739      0.432616        0.0225
            2           5      0.682054            10        0.0182
            3           6      0.670785             1        0.0317
            4          10      0.542285       48.8796         0.932
            5          12      0.515509           0.1         0.965
            6          13      0.439039             1         0.355
            7          14      0.302082             1         0.279
            8          15      0.228832             1         0.471
            9          17      0.223659      0.344464        0.0159
           10          18      0.223422             1       0.00417
           11          19      0.223231             1       0.00269
           12          20      0.223227             1       0.00122
           13          22      0.223221       4.33583      0.000857
           14          23      0.223201             1       0.00121
           15          24      0.223138             1       0.00306
           16          25      0.223035             1       0.00509
           17          26      0.222903             1       0.00564
           18          27      0.222824             1        0.0035
           19          28      0.222806             1      0.000945
           20          29      0.222803             1      0.000798
           21          30      0.222802             1       0.00079
           22          31      0.222798             1       0.00111
           23          32      0.222788             1       0.00168
           24          33      0.222763             1       0.00255
           25          34      0.222707             1       0.00364
           26          35      0.222603             1       0.00435
           27          36      0.222479             1       0.00339
           28          37      0.222408             1       0.00117
           29          38      0.222394             1      9.64e-05
           30          39      0.222393             1      2.05e-05
           31          40      0.222393             1      4.06e-06
           32          41      0.222393             1      2.86e-07
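For intuition about the f(x) column, here is a rough sketch of a univariate likelihood loss computed from marginals. The variable names `p1` and `x` are hypothetical, and averaging over pixels is my assumption about the normalization, suggested by the first f(x) value in the log (0.692759) being close to log 2, which is what uniform marginals would give.

```matlab
% Hypothetical inputs: p1(i) is the predicted marginal probability that
% pixel i has label 1; x(i) is the true binary label.
p1 = [0.9 0.2; 0.6 0.1];
x  = [1   0;   1   1  ];

% Probability the model assigns to the true label of each pixel.
p_true = x .* p1 + (1 - x) .* (1 - p1);

% Mean negative log-likelihood of the true labels under the marginals
% (averaging over pixels is an assumption, see above).
loss = -mean(log(p_true(:)));

% Reference point: uniform marginals give -log(0.5) = log(2).
loss_uniform = -mean(log(0.5 * ones(numel(x), 1)));

fprintf('loss = %.4f, uniform baseline = %.4f\n', loss, loss_uniform);
```

Under that reading, the optimization log starts near the "know nothing" baseline and ends with a much lower average loss per pixel.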

The final model trained has an error rate of 0.096. We can visualize the marginals produced by making an image where each pixel has an intensity proportional to the predicted probability that that pixel takes label “1”.
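Roughly, the error rate and the visualization could be computed as in the sketch below. The arrays `p1` and `x` are made-up stand-ins for the predicted marginals and the true image, and thresholding at 0.5 is an assumed decision rule, not necessarily what the toolbox does internally.

```matlab
% Hypothetical inputs: p1 holds predicted marginals for label 1,
% x holds the true binary image of the same size.
p1 = [0.9 0.2 0.7; 0.4 0.1 0.8];
x  = [1   0   0;   1   0   1  ];

% Predict label 1 wherever its marginal exceeds 0.5 (assumed rule).
x_hat = p1 > 0.5;

% Error rate: fraction of pixels whose predicted label is wrong.
error_rate = mean(x_hat(:) ~= x(:));
fprintf('error rate = %.4f\n', error_rate);

% Show the marginals as a grayscale image: intensity proportional to
% the predicted probability of label 1.
imagesc(p1, [0 1]);
colormap(gray);
axis image;
axis off;
```

Fixing the color limits to `[0 1]` keeps intensities comparable across images, so a washed-out picture really does mean uncertain marginals.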

On the other hand, we might train a model using truncated fitting, with the univariate likelihood, and 5 iterations of TRW.

`>> train_binary_denoisers('trunc_ul_trw_5')`

Sparing you the details of the optimization, this yields a total error rate of .0984 and the marginals:

Thus, restricting inference to only 5 iterations pays only a small accuracy penalty (.0984 versus 0.096) compared to using a tight convergence threshold.

Or, we could fit using the surrogate conditional likelihood. (Here called E.M., though we don’t happen to have any hidden variables.)

`>> train_binary_denoisers('em_trw_1e-5')`

This yields an error rate of .1016, and the marginals:

There are many permutations of loss functions, inference algorithms, etc. (Some of which have not yet been published.) Rather than exhaust all the possibilities, I’ll just list a bunch of examples:

`'pert_ul_trw_1e-5'` (Perturbation + univariate likelihood + TRW with 1e-5 threshold)

`'trunc_cl_trw_5'` (Truncated Fitting + clique likelihood + TRW with 5 iterations)

`'trunc_cl_mnf_5'` (Truncated Fitting + clique likelihood + Mean Field with 5 iterations)

`'trunc_em_trw_5'` (Truncated EM, with TRW used to compute both upper and lower bounds on partition function + 5 iterations)

`'em_trw_1e-5'` (Regular EM, with TRW used to compute both upper and lower bounds on partition function + 1e-5 threshold)

`'em_trw/mnf_1e-5'` (Regular EM, with TRW used for upper bound and Mean Field used for lower bound + 1e-5 threshold)

`'pseudo'` (Pseudolikelihood)
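The naming pattern in the examples above can be read off mechanically. The sketch below only illustrates that convention. It is not how the toolbox parses these strings (I make no claim about that), and the rule "a trailing number of at least 1 is an iteration count, anything smaller is a threshold" is my own heuristic for this list.

```matlab
% Illustration of the naming convention only; this is NOT the
% toolbox's own parsing code.
names = {'pert_ul_trw_1e-5', 'trunc_cl_mnf_5', 'em_trw/mnf_1e-5', 'pseudo'};
settings = zeros(1, numel(names));

for k = 1:numel(names)
    parts = strsplit(names{k}, '_');       % split on underscores
    settings(k) = str2double(parts{end});  % NaN when the last part is not a number
    if isnan(settings(k))
        fprintf('%-18s no numeric setting\n', names{k});
    elseif settings(k) >= 1
        % Heuristic (assumed): whole numbers are iteration counts.
        fprintf('%-18s %d inference iterations\n', names{k}, settings(k));
    else
        % Heuristic (assumed): small values are convergence thresholds.
        fprintf('%-18s convergence threshold %g\n', names{k}, settings(k));
    end
end
```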

As for the pseudolikelihood, let’s try it:

`>> train_binary_denoisers('pseudo')`

This yields a near-chance error rate of .44, and the marginals (produced by TRW):

Which is why you probably shouldn’t use the pseudolikelihood…
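To put the numbers from this post side by side, here is a small tabulation. The error rates are copied from the runs above; the variable names are just for illustration.

```matlab
% Error rates quoted in this post for each training method.
names = {'pert_ul_trw_1e-5', 'trunc_ul_trw_5', 'em_trw_1e-5', 'pseudo'};
err   = [0.096, 0.0984, 0.1016, 0.44];

% Gap of each method relative to the best one.
[best, i_best] = min(err);
for k = 1:numel(names)
    fprintf('%-18s error %.4f  (+%.4f vs %s)\n', ...
            names{k}, err(k), err(k) - best, names{i_best});
end
```

The first three methods land within about half a percentage point of each other, while the pseudolikelihood is off by more than thirty points.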