Example usage of JGMT

Here is an example of how to use the graphical models toolbox I’ve just released. I’ll use the term “perturbation” to refer to computing loss gradients from the difference of two inference problems, as in this paper, and “truncated fitting” to refer to backpropagating derivatives through the TRW inference process run for a fixed number of iterations, as in this paper.
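(To make the perturbation idea slightly more concrete — this is only a sketch of the idea, not a statement of the toolbox internals: if $\mu(\theta)$ denotes the marginals produced by inference at parameters $\theta$ and $L$ is a loss defined on those marginals, then the parameter gradient can be recovered from two inference calls, roughly as $\frac{dL}{d\theta} \approx \frac{1}{r}\bigl(\mu(\theta + r\,\nabla_\mu L) - \mu(\theta)\bigr)$ for a small perturbation size $r$; see the linked paper for the precise statement.)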

This is basically the simplest possible vision problem. We will train a conditional random field (CRF) to take some noisy image $\mathbf{x}$ as input:

and produce marginals that accurately predict a binary image $\mathbf{y}$ as output:

The noisy image $\mathbf{x}$ is produced by setting $x_i = y_i(1-t_i^n) + (1-y_i)t_i^n$, where $t_i$ is drawn randomly on $[0,1]$ and $n$ is a noise level, as described in this paper. For now, we set $n=1.25$, which is a pretty challenging amount of noise, as you can see above.
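(As a minimal MATLAB sketch of that noise model, assuming the clean binary image $\mathbf{y}$ is stored as a 0/1 matrix y and $t_i$ is drawn uniformly:)

n = 1.25;                             % noise level
t = rand(size(y));                    % t_i drawn on [0,1]
x = y.*(1 - t.^n) + (1 - y).*t.^n;    % x_i = y_i(1-t_i^n) + (1-y_i)t_i^n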

The main file, which does the learning described here, can be found in the toolbox at examples/train_binary_denoisers.m.

First off, we will train a model using perturbation, with the univariate likelihood loss function, based on TRW inference, with a convergence threshold of 1e-5. We do this by typing:

>> train_binary_denoisers('pert_ul_trw_1e-5')
                                                        First-order 
 Iteration  Func-count       f(x)        Step-size       optimality
     0           1         0.692759                         0.096
     1           3         0.686739       0.432616         0.0225  
     2           5         0.682054             10         0.0182  
     3           6         0.670785              1         0.0317  
     4          10         0.542285        48.8796          0.932  
     5          12         0.515509            0.1          0.965  
     6          13         0.439039              1          0.355  
     7          14         0.302082              1          0.279  
     8          15         0.228832              1          0.471  
     9          17         0.223659       0.344464         0.0159  
    10          18         0.223422              1        0.00417  
    11          19         0.223231              1        0.00269  
    12          20         0.223227              1        0.00122  
    13          22         0.223221        4.33583       0.000857  
    14          23         0.223201              1        0.00121  
    15          24         0.223138              1        0.00306  
    16          25         0.223035              1        0.00509  
    17          26         0.222903              1        0.00564  
    18          27         0.222824              1         0.0035  
    19          28         0.222806              1       0.000945  
    20          29         0.222803              1       0.000798  
    21          30         0.222802              1        0.00079  
    22          31         0.222798              1        0.00111  
    23          32         0.222788              1        0.00168  
    24          33         0.222763              1        0.00255  
    25          34         0.222707              1        0.00364  
    26          35         0.222603              1        0.00435  
    27          36         0.222479              1        0.00339  
    28          37         0.222408              1        0.00117  
    29          38         0.222394              1       9.64e-05  
    30          39         0.222393              1       2.05e-05  
    31          40         0.222393              1       4.06e-06  
    32          41         0.222393              1       2.86e-07  

The final trained model has an error rate of 0.096. We can visualize the marginals it produces by making an image where each pixel has an intensity proportional to the predicted probability that that pixel takes label “1”.
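(A minimal sketch of that visualization in MATLAB, with hypothetical names: mu1 holds the predicted probability of label “1” for each pixel, and imsize is the [rows cols] size of the image.)

imagesc(reshape(mu1, imsize), [0 1]);   % intensity proportional to p(y_i = 1)
colormap gray; axis image; axis off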

On the other hand, we might train a model using truncated fitting, with the univariate likelihood, and 5 iterations of TRW.

>> train_binary_denoisers('trunc_ul_trw_5')

Sparing you the details of the optimization, this yields a total error rate of 0.0984 and the marginals:

Thus, restricting to only 5 iterations of inference pays only a small accuracy penalty compared to running to a tight convergence threshold.

Or, we could fit using the surrogate conditional likelihood. (Here called E.M., though we don’t happen to have any hidden variables.)
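(Roughly speaking, and only as a sketch of the idea: where exact training would maximize the conditional log-likelihood $\theta \cdot f(\mathbf{y}) - \log Z(\theta)$, the surrogate replaces the intractable $\log Z(\theta)$ with TRW’s upper bound on it, so the quantity being maximized is a lower bound on the true conditional log-likelihood.)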

>> train_binary_denoisers('em_trw_1e-5')

This yields an error rate of 0.1016, and the marginals:

There are many permutations of loss functions, inference algorithms, etc., some of which have not yet been published. Rather than exhaust all the possibilities, I’ll just list a bunch of examples:

'pert_ul_trw_1e-5' (Perturbation + univariate likelihood + TRW with 1e-5 threshold)

'trunc_cl_trw_5' (Truncated Fitting + clique likelihood + TRW with 5 iterations)

'trunc_cl_mnf_5' (Truncated Fitting + clique likelihood + Mean Field with 5 iterations)

'trunc_em_trw_5' (Truncated EM, with TRW used to compute both upper and lower bounds on partition function + 5 iterations)

'em_trw_1e-5' (Regular EM, with TRW used to compute both upper and lower bounds on partition function + 1e-5 threshold)

'em_trw/mnf_1e-5' (Regular EM, with TRW used for upper bound and meanfield used for lower bound + 1e-5 threshold)

'pseudo' (Pseudolikelihood)
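If you want to run several of these configurations back to back, a simple loop over the option strings works (a sketch; each call prints its own optimization trace and results as above):

methods = {'pert_ul_trw_1e-5', 'trunc_cl_trw_5', 'em_trw_1e-5', 'pseudo'};
for i = 1:length(methods)
    fprintf('training with %s\n', methods{i});
    train_binary_denoisers(methods{i});
end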

Speaking of the pseudolikelihood, let’s try it.

>> train_binary_denoisers('pseudo')

This yields a near-chance error rate of 0.44, and the marginals (produced by TRW):

Which is why you probably shouldn’t use the pseudolikelihood…
