Marginal Beliefs of Random MRFs

A pairwise Markov Random Field is a way of defining a probability distribution over some vector {\bf x}. One way to write one is

p({\bf x}) \propto \exp( \sum_i \phi(x_i) + \sum_{(i,j)} \psi(x_i,x_j) ).

where the first sum is over all the variables and the second sum is over neighboring pairs. Here, I generated some random distributions over binary-valued variables. For each i, I set \phi(x_i=0)=0 and \phi(x_i=1)=r_i, where r_i is a value drawn at random from a standard Gaussian. For the pairwise terms, I used \psi(x_i,x_j) = .75 \cdot I(x_i=x_j). (That is, \psi(x_i,x_j) is .75 when the arguments are equal, and zero otherwise.) This is an “attractive network”, where neighboring variables want to have the same value.

Computing marginals p(x_i) is hard in graphs that are not treelike. Here, I approximate them using a nonlinear minimization of a “free energy” similar to that used in loopy belief propagation.
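
To make this concrete, here is a minimal, self-contained numpy sketch of this kind of experiment. It is not the code behind the figures: the n-by-n 4-connected grid, the random seed, the number of message-passing sweeps, and the use of plain sum-product loopy belief propagation (whose fixed points are stationary points of the Bethe free energy, a close relative of the free energy minimized here) are all assumptions made for illustration.

```python
import numpy as np

# Illustrative choices: grid size, seed, and sweep count are arbitrary.
rng = np.random.default_rng(0)
n = 20
N = n * n

# Unary terms: phi(x_i = 0) = 0 and phi(x_i = 1) = r_i with r_i ~ N(0, 1).
phi = np.zeros((N, 2))
phi[:, 1] = rng.standard_normal(N)

# Attractive pairwise term: psi(x_i, x_j) = 0.75 when x_i == x_j, else 0.
psi = 0.75 * np.eye(2)

# 4-connected grid edges over flattened node indices.
idx = lambda r, c: r * n + c
edges = [(idx(r, c), idx(r, c + 1)) for r in range(n) for c in range(n - 1)] \
      + [(idx(r, c), idx(r + 1, c)) for r in range(n - 1) for c in range(n)]
nbrs = {i: [] for i in range(N)}
for a, b in edges:
    nbrs[a].append(b)
    nbrs[b].append(a)

node_fac = np.exp(phi)          # exponentiated unary potentials, shape (N, 2)
edge_fac = np.exp(psi)          # exponentiated pairwise potential, shape (2, 2)
msgs = {(i, j): np.ones(2) for a, b in edges for (i, j) in ((a, b), (b, a))}

for _ in range(50):             # a fixed number of sweeps, for simplicity
    new = {}
    for (i, j) in msgs:
        # m_{i->j}(x_j) = sum_{x_i} exp(psi(x_i, x_j)) * exp(phi(x_i)) *
        #                 product of messages into i from neighbors other than j
        prod = node_fac[i].copy()
        for k in nbrs[i]:
            if k != j:
                prod *= msgs[(k, i)]
        m = edge_fac.T @ prod
        new[(i, j)] = m / m.sum()   # normalize for numerical stability
    msgs = new

# Beliefs b_i(x_i): approximate marginals p(x_i).
beliefs = node_fac.copy()
for i in range(N):
    for k in nbrs[i]:
        beliefs[i] *= msgs[(k, i)]
beliefs /= beliefs.sum(axis=1, keepdims=True)
p_x1 = beliefs[:, 1].reshape(n, n)   # approximate P(x_i = 1), as an image
```

The array p_x1 holds the approximate P(x_i = 1) at every grid position, which is the kind of belief image shown below.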

Here, I show the random single-variable biases r_i and the resulting beliefs. What we see are constant-valued regions (encouraged by \psi), interrupted where \phi is very strong.

Now, with more variables.

Now, a “repellent” network. I repeated the procedure above, but changed the pairwise interactions to \psi(x_i,x_j) = -.75 \cdot I(x_i=x_j), so that neighboring variables want to have different values. Notice this is the opposite of the above behavior: regions of “checkerboard” pattern, interrupted where \phi outvotes \psi.
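
In a sketch like the one above, only the pairwise table would change (a hypothetical snippet, matching the sign convention used here):

```python
import numpy as np

# "Repellent" variant: agreement is penalized instead of rewarded, so
# neighboring variables prefer to take different values.
psi = -0.75 * np.eye(2)   # psi(x_i, x_j) = -0.75 when x_i == x_j, 0 otherwise
```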

Now, the repellent network with more variables.


More Vector Calculus Identities

I realize that “rule 2” from the previous post is actually just a special case of the vector chain rule.

Rule 4. (Chain rule)

If {\bf f}({\bf x}) = {\bf g}({\bf h}({\bf x})), then

J[{\bf f}] = J[{\bf g}]J[{\bf h}], or equivalently,

\frac{ \partial{\bf f} }{ \partial{\bf x}^T} = \frac{ \partial{\bf g} }{ \partial{\bf h}^T} \frac{ \partial{\bf h} }{ \partial{\bf x}^T}.

Here, I have used {\bf h} to denote the argument of {\bf g}. (That makes it look more like the usual chain rule.)
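
As a quick sanity check (not part of the original post), the rule is easy to verify with finite differences; the test functions g(h) = A tanh(h) and h(x) = B x^2 below are arbitrary choices.

```python
import numpy as np

def jac(f, x, eps=1e-5):
    """Central-difference Jacobian, J[i, j] = d f_i / d x_j."""
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((3, 5))
g = lambda h: A @ np.tanh(h)
h = lambda x: B @ (x ** 2)
f = lambda x: g(h(x))

x = rng.standard_normal(5)
# J[f](x) = J[g](h(x)) J[h](x), up to finite-difference error.
print(np.allclose(jac(f, x), jac(g, h(x)) @ jac(h, x), atol=1e-6))   # True
```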

From this, you get the special case where g is a scalar function. (I use the non-boldface g in g({\bf h}) to suggest that g is a scalar function that operates ‘element-wise’ on vector input.)

Rule 4. (Chain rule: special case for a scalar function)

If {\bf f}({\bf x}) = g({\bf h}({\bf x})), then

J[{\bf f}]({\bf x}) = \text{diag}[ g'({\bf h})] J[{\bf h}], or equivalently,

J[{\bf f}]({\bf x}) = g'({\bf h}) {\bf 1}^T \odot J[{\bf h}].

In the last line, I use the fact that \text{diag}({\bf x})A = {\bf x}{\bf 1}^T \odot A.
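
This identity is easy to see numerically (a small illustrative check, not from the post); in numpy, the {\bf x}{\bf 1}^T \odot A pattern is ordinary broadcasting.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
A = rng.standard_normal((4, 6))
print(np.allclose(np.diag(x) @ A, np.outer(x, np.ones(6)) * A))   # diag(x) A = x 1^T ⊙ A
print(np.allclose(np.outer(x, np.ones(6)) * A, x[:, None] * A))   # same thing via broadcasting
```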

Finally, substituting g = g' = \exp recovers Rule 2 from the previous post (shown below).

Vector Calculus Identities

Suppose

g({\bf x}) = {\bf p}({\bf x})^T {\bf q}({\bf x}).

That is, g is a scalar function of a vector {\bf x}, given by taking the inner product of the two vector-valued functions {\bf p} and {\bf q}. Now, we would like the gradient of g, i.e. \frac{\partial g}{\partial \bf x}. What is it?

I frequently need to find such derivatives, and I have never been able to find any reference for rules to calculate them. (Though such rules surely exist somewhere!) Today, Alap and I derived a couple of simple rules. The first answers the above question.

Rule 1.

If g({\bf x}) = {\bf p}({\bf x})^T {\bf q}({\bf x}), then,

\frac{\partial g}{\partial \bf x} = J[{\bf p}]^T {\bf q}({\bf x}) + J[{\bf q}]^T{\bf p}({\bf x}).

Here, J[{\bf p}]=\frac{\partial {\bf p}}{\partial {\bf x}^T} is the Jacobian, i.e. J[{\bf p}]_{i,j} = \frac{\partial p_i}{\partial x_j}.

This rule is a generalization of the calculus 101 product rule, where g(x)=p(x) q(x) (everything scalar), and g'(x) = p'(x) q(x) + q'(x) p(x).
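
Here is a small numerical check of Rule 1 (again, not from the post; p(x) = Ax and q(x) = sin(Bx) are arbitrary test functions).

```python
import numpy as np

def jac(f, x, eps=1e-5):
    """Central-difference Jacobian, J[i, j] = d f_i / d x_j."""
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
p = lambda x: A @ x
q = lambda x: np.sin(B @ x)
g = lambda x: p(x) @ q(x)                  # inner product p(x)^T q(x)

x = rng.standard_normal(3)
grad_rule = jac(p, x).T @ q(x) + jac(q, x).T @ p(x)
grad_fd = np.array([(g(x + 1e-5 * e) - g(x - 1e-5 * e)) / 2e-5 for e in np.eye(3)])
print(np.allclose(grad_rule, grad_fd, atol=1e-6))   # True
```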

A different rule concerning exponentials is

Rule 2.

If {\bf f}({\bf x}) = \exp({\bf q}({\bf x})), then

J[{\bf f}]= \frac{\partial {\bf f}}{\partial {\bf x}^T} = \exp( {\bf q}({\bf x})) {\bf 1}^T \odot J[{\bf q}].

Here, \odot is the element-wise product. The strange product with {\bf 1}^T can be understood as “copying” the first vector. That is, {\bf x} {\bf 1}^T is a matrix where each column consists of \bf x. (The number of columns must be understood from context.)
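
A small numerical check of Rule 2 (not from the post; q(x) = cos(Ax) is an arbitrary test function). In numpy, the {\bf 1}^T “copying” trick is just broadcasting.

```python
import numpy as np

def jac(f, x, eps=1e-5):
    """Central-difference Jacobian, J[i, j] = d f_i / d x_j."""
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
q = lambda x: np.cos(A @ x)
f = lambda x: np.exp(q(x))

x = rng.standard_normal(3)
# exp(q) 1^T ⊙ J[q] is np.exp(q(x))[:, None] * jac(q, x) via broadcasting.
print(np.allclose(jac(f, x), np.exp(q(x))[:, None] * jac(q, x), atol=1e-6))   # True
```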

Rule 3.

If {\bf f}({\bf x}) = {\bf p}({\bf x}) \odot {\bf q}({\bf x}), then

J[{\bf f}] = {\bf p}({\bf x}){\bf 1}^T \odot J[{\bf q}] + J[{\bf p}] \odot {\bf q}({\bf x}){\bf 1}^T.
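
And a check of Rule 3 (not from the post; p(x) = tanh(Ax) and q(x) = Bx are arbitrary test functions).

```python
import numpy as np

def jac(f, x, eps=1e-5):
    """Central-difference Jacobian, J[i, j] = d f_i / d x_j."""
    return np.column_stack([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                            for e in np.eye(len(x))])

rng = np.random.default_rng(3)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
p = lambda x: np.tanh(A @ x)
q = lambda x: B @ x
f = lambda x: p(x) * q(x)                  # element-wise product

x = rng.standard_normal(3)
lhs = jac(f, x)
rhs = p(x)[:, None] * jac(q, x) + jac(p, x) * q(x)[:, None]   # p 1^T ⊙ J[q] + J[p] ⊙ q 1^T
print(np.allclose(lhs, rhs, atol=1e-6))    # True
```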

Surely there exists a large set of rules like this somewhere, but I have not been able to find them, despite needing things like this for years now. What I would really like to do is use a computer algebra system, such as Maple, Mathematica, or Sage, to do this, but that doesn’t seem possible at present. (It is suggested in that thread that Mathematica is up to the job, but I have tested it out and found that not to be true.)