logoiMonad.com


Restricted Boltzmann Machine - Short Tutorial

Posted in python, science by zoo on the October 8th, 2008

I have read a lot of papers on RBM and it seems to be difficult to grasp all the implementation details.
So, I want to share my experience with people facing the same problems. My tutorial is based on variant of RBM-s named Continuous Restricted Boltzmann Machine or CRBM for short. CRBM have very close implementation to original RBM with binomial neurons (0,1) as possible values of activation. At the end of the article I provide some code in Python. No guarantee is given that this implementation is correct so let me know of any bugs found.
First you may have a look at the original papers that describe theory behind RBM neural networks:

RBM application to Netflix challenge

Boltzmann Machine - Scholarpedia

Continuous RBM

What is a Restricted Boltzmann Machine

Restricted Boltzmann Machine is a stochastic neural network (that is a network of neurons where each neuron have some random behavior when activated). It consist of one layer of visible units (neurons) and one layer of hidden units. Units in each layer have no connections between them and are connected to all other units in other layer (fig.1). Connections between neurons are bidirectional and symmetric . This means that information flows in both directions during the training and during the usage of the network and that weights are the same in both directions.

Restricted Boltzmann Machine

fig.1 Restricted Boltzmann Machine

RBM Network works in the following way:
First the network is trained by using some data set and setting the neurons on visible layer to match data points in this data set.
After the network is trained we can use it on new unknown data to make classification of the data (this is known as unsupervised learning)

Data set

For purpose of this tutorial I will use artificially generated data set. It is 2D data that form a pattern shown in fig.2. I choose to use 500 data points for training. Because each data points is formed by to numbers between 0 and 1 neurons have to accept continuous values and I use Continuous RBM. I have tried different 2D patterns and this one seems to be relative difficult to be learned by CRBM.

Training Data

fig. 2 Training data

Learning Algorithm

Training a RBM is performed by algorithm known as “Contrastive Divergence Learning”.

More info on Contrastive Divergence
Let W be the matrix of IxJ (I - number of visible neurons, J - number of hidden neurons) that represents
weights between neurons. Each neuron input is provided by connections from all neurons in other layer.
Current neuron state S is formed by multiplication of each input by weight, summation over all inputs and application of this sum as a argument of nonlinear sigmoidal function:
Sj = F( Sum( Si x Wij + N(0,sigma)) ) - here Si are all neurons in given layer plus one bias neuron that stays constantly set at 1.0
N(0,1) is random number from normal distribution with mean 0.0 and standard deviation sigma (I use sigma=0.2).
This nonlinear function in my case is:
F = lo + (hi - lo)/(1.0 + exp(-A*Sj))

Where lo and hi are the lower and higher bound of input values (in my case 0,1), so it
becomes: F = 1.0/(1.0 + exp(-A*Sj))
A - is some parameter that is determined during the learning process.

Contrastive divergence is a value that is computed (actually matrix of values) and that is used to adjust values of W matrix. Changing W incrementally lead to training of W values.

Let W0 be the initial matrix of weights that are set to some random small values. I use N(0, 0.1) for this.
Let CD = <Si.Sj>0 - <Si.Sj>n  - contrastive divergence
Then on each step (epoch) W is updated to new value W’:
W’ = W + alpha*CD
Here alpha is some small step - “learning rate”. There exist more complex ways for W update that involve
some “momentum” and “cost” of update to avoid W values to become very large.

Contrastive Divergence explanation

There seems to be big confusion what exactly Contrastive Divergence means and how to implement it.
I have spend a lot of time to understand it.
First of all CD as shown in the formula above is a matrix of size IxJ. So this formula have to be
computed for each combination of I and J.
<…> is a average over each data point in the data set.

Si.Sj is just a multiplication of current activation (state) of neuron I and neuron J (obviously :) ). Where Si is the state of a visible neuron, and Sj is the state of a hidden neuron.
Indexes after <…> mean that average is taken after 0 or N-th reconstruction step performed.

fig. 2 Training Restricted Boltzmann Machine

fig. 2 Training Restricted Boltzmann Machine

How is the reconstruction performed?

  1. get one data point from data set.
  2. use values of this data point to set state of visible neurons Si
  3. compute Sj for each hidden neuron based on formula above and states of visible neurons Si
  4. now Si and Sj values can be used to compute (Si.Sj)0 - here (…) means just values not average
  5. on visible neurons compute Si using the Sj computed in step3. This is known as “reconstruction”
  6. compute state of hidden neurons Sj again using Si from 5 step.
  7. now use Si and Sj to compute (Si.Sj)1 (fig.3)
  8. repeating multiple times steps 5,6 and 7 compute (Si.Sj)n. Where n is small number and can increase with learning steps to achieve better accuracy.

The algorithm as a whole is:

  • For each epoch do:
    • For each data point in data set do:
      • CDpos =0, CDneg=0 (matrices)
      • perform steps 1…8
      • accumulate CDpos = CDpos + (Si.Sj)0
      • accumulate CDneg = CDneg + (Si.Sj)n
    • At the end compute average of CDpos and CDneg by dividing them by number of data points.
    • Compute CD = < Si.Sj >0 - < Si.Sj >n = CDpos - CDneg
    • Update weights and biases W’ = W + alpha*CD (biases are just weights to neurons that stay always 1.0)
    • Compute some “error function” like sum of squared difference between Si in 1) and Si in 5)
      e.g between data and reconstruction.
  • repeat for the next epoch until error is small or some fixed number of epoch.

The value of parameter A for visible units stay constant and for hidden unit is adjusted
by the same CD calculation but instead of formula above the following formula is used:
CD = (<Sj.Sj>0 - <Sj.Sj>n)/(Aj.Aj)

In my code i use n=20 initially and gradually increase it to 50.
Most of the steps in the algorithm can be performed by some simple matrix multiplication.

RBM usage

After the RBM learning process finishes it can be shown how well it performs on new data points.
I use set of 500 data points that are random and uniformly distributed in the 2D interval (0..1, 0..1).
Visible layer neurons states are set with values of each data point and steps 1) 2) 3) 5) are repeated
multiple times. Number of repetitions is the max number of “n” used in the learning process.

At the end of n-th step, neurons states in visible layer represent a data reconstruction. This is repeated for each data point. All the reconstruction points form a 2D image pattern that can be compared with initial image.

Python implementation of CRBM

At the end of this article I provide some implementation of Continuous Restricted Boltzmann Machine in Python. Most of the steps are performed by matrix operations using NumPy library. I used PyLab and SciPy to make some nice visualizations. Images of data that are shown (fig.4) are interpolated with Gaussian kernel in order to have some “feeling” of the density of data points and reconstructed data points.

RBM reconstruction

fig. 4 Training data and reconstruction.

Main difficulty I observed is to fine tune the set of learning parameters.

I used the following parameters and if somebody is able to find better values - let me know.
sigma=0.2
A=0.1 (on visible neurons)
Learning Rate W = 0.5
Learning Rate A = 0.5
Learning Cost = 0.00001
Learning Moment = 0.9

RBM architecture is 2 visible neurons for 2D data points and 8 hidden neurons. On some experiments with simple patterns 4 neurons are enough to reconstruct pattern successfully.

Python code for Restricted Boltzmann Machine

Chaotic scattering with Povray

Posted in povray, science by zoo on the July 27th, 2008

In this post I will talk about the four-sphere scattering system. This is one of the simples examples of chaotic scattering. We can envision scattering as a small particle (electron for example) that moves toward a fixed target (group of atoms), interacts with it, and then leaves the region. These systems can have multiple exit modes and typically have fractal phase space boundaries separating the sets of initial conditions (basins) going to the various exits. In many cases, the scattering may have exceedingly complex behavior. Chaotic scattering problems are studied intensively recently. There are connections between them and chaotic behavior in quantum systems, periodic orbit theory and Poincaré recurrence theorem.

I have read a paper on chaotic scattering long time ago and decided to reproduce the experiment described in it (sorry, lost the original reference but you can search for “chaotic scattering”). In original paper leasers and mirror spheres are used but I have no spare lasers around so I decided to simulate experiment with Povray.

Experiment:
Basically we put 4 spheres together that have surfaces perfectly reflecting the light. They are arranged in the vertices of tetrahedron in the way that each sphere is touching the other 3 (fig. 1). We can use light beam, point it at the one face of the tetrahedron and look at the other 3 faces. It is expected that light will scatter multiple times between spheres and escape from one of the 4 faces chaotically which consequently led to fractal image similar to Sierpiński triangle.
I have used another approach. Camera is positioned to look at one tetrahedron face. All faces are illuminated by infinite surfaces colored differently. When light escapes from given point of the first face there can be only one source where it is coming from (ask yourself why?). Color of the point depend from the color of the source surface.

four-spheres

fig. 1 - Spheres arrangement

This is the code I used for the spheres definition:

#declare Sphere_Color = color rgb < 1.0, 1.0, 1.0> ;

#declare SphereObject = sphere {
< 0 , 0 , 0 > , 1
texture { finish { DefaultFinish } }
}
#declare R = 3.00000;
#declare CRadius = R * 0.866025 ;
#declare s4 = union {
object { SphereObject // C_1
pigment { Sphere_Color }
scale CRadius
translate <0.000000*R,1.000000*R,0.000000*R>
}
object { SphereObject // C_2
pigment { Sphere_Color }
scale CRadius
translate <0.866025*R,-0.500000*R,0.000000*R>
}
object { SphereObject // C_3
pigment { Sphere_Color }
scale CRadius
translate <-0.866025*R,-0.500000*R,0.000000*R>
}
object { SphereObject // C_4
pigment { Sphere_Color }
scale CRadius
translate <0.000000*R,0.000000*R,-1.43*R>
}
}

You can check the spheres coordinates, one of them is probably not touching others, but it is not so important and the fractal behavior can be seen in this case too.

This is the definition of light emitting planes:

#declare RR = -10000;
#declare IN = 0.9;
#declare Lightplane1 = plane {<0.000000*RR,1.000000*RR,1.802776*RR>, 1
pigment {color rgb <0.70, 0.01, 0.01>}
finish { PlaneFinish }
}

#declare Lightplane2 = plane {<0.866025*RR,-0.500000*RR,1.802776*RR>, 1
pigment {color rgb <0.93, 0.70, 0.01>}
finish { PlaneFinish }
}

#declare Lightplane3 = plane {<-0.866025*RR,-0.500000*RR,1.802776*RR >, 1
pigment {color rgb <0.45, 0.35, 0.01>}
finish { PlaneFinish }
}

#declare Lightplane4 = plane { < 0,0,1 >, 100
pigment {color rgb <0.45, 0.05, 0.45>}
finish { PlaneFinish }
}

light_source {
< 0.000000 * RR, 1.000000 * RR, 1.802776 * RR >
color rgb < IN, IN, IN >
looks_like { Lightplane1 }
photons {refraction off reflection on}
}

light_source {
< 0.866025 * RR,-0.500000 * RR, 1.802776 * RR >
color rgb < IN, IN, IN >
looks_like { Lightplane2 }
photons {refraction off reflection on}
}

light_source {
< -0.866025 * RR,-0.500000 * RR, 1.802776 * RR >
color rgb < IN, IN, IN >
looks_like { Lightplane3 }
photons {refraction off reflection on}
}

light_source {
<0, 0, 0>
color rgb < IN, IN, IN >
looks_like { Lightplane4 }
photons {refraction off reflection on}
}

Here you can play a lot with the “plane finish” and colors. I have used the following settings:

#declare PlaneFinish = finish {
ambient 0.6
diffuse 0.2
reflection 0.1
brilliance 0.3
specular 0.4
}

Also fully reflecting spheres are defined with:

#declare DefaultFinish = finish {
ambient 0.0
diffuse 0.00
specular 1.0
roughness 0.01
metallic
reflection {
1.0
metallic
}
}

I found that adding a little “roughness” helps to the realistic view of the sphere surfaces.

During the rendering light rays reflects multiple times from sphere surfaces. Povray have some restrictions on how many times to reflect a ray before it stops. This is controlled by max_trace_level and (since version 3.0) adc_bailout value settings. I have set max_trace_level to maximum of 256. This leads to very long time to render some frames.

Some raytraced images:

Here pattern like Sierpiński triangle is observed. This pattern is appearing because each sphere reflect other 3 spheres that are arranged in triangle.

The best way to see and zoom fractals is to play with the camera zooming and look_at. I have prepared also .INI file to control rendering options and create animation.


Google direct link video

Video Encoding:

Animation is created from 3000 frames. To render them I used two separate rendering processes each rendering different frames on different computing core. New version 3.7 (beta) of Povray now supports SMP and multi-core systems and will make this easier.

Each frame is saved in TGA format and video is encoded with mencoder.

This is basic set of encoding options:
mencoder mf://*.tga -mf w=480:h=360:fps=25:type=tga -ovc lavc -oac copy -o out.avi

Quality of resulting movie is not high so I played a little with the options and
finally used this set
mencoder mf://*.tga -mf w=400:h=304:fps=25:type=tga -ovc lavc -lavcopts
vcodec=mpeg4:mbd=2:trell:v4mv:last_pred=2:dia=-1:
vmax_b_frames=1:cmp=3:subcmp=3:precmp=0:keyint=2:
sc_threshold=-100000:psnr -oac copy -o out.avi

Probably there can be found better combination. The most important seems to be
the keyint option that sets the maximum interval between keyframes in frames. These options give PSNR (Peak signal-to-noise ratio) of approximately 38 dB. Sadly google process the videos that are uploaded and most of quality is lost.

POV code and INI file.

Roman Cheplyaka - Hpysics demo

Posted in Haskell, functional programming by zoo on the July 11th, 2008

Roman Cheplyaka have posted a short video demonstrating the physics engine in Haskell he is working on during his Summer of Code Project - Hphysics


YouTube direct link video

More information about progress on this project can be found on his blog: Building physics engine in Data Parallel Haskell

More: Hphysics project

Concurrent and multicore programming in Haskell

Posted in Haskell, functional programming by zoo on the July 5th, 2008

Bryan O’Sullivan Lecture on writing programs in Haskell for multicore systems.

Bryan is is also a co-author of the upcoming O’Reilly book Real World Haskell

Open standard for parallel computing across GPUs and CPUs

Posted in GPU by zoo on the July 5th, 2008

There is new initiative to create open standarts for parallel computing across GPUs and CPUs. I think this will be the next step in GPU computing. As I underspend this will be new open standard (something like OpenGL) that will be portable across different platforms. The name of the new standard is Open Computing Language (OpenCL):

Khronos™ Group announced today the formation of a new Compute Working Group to create royalty-free, open standards for programming heterogeneous data and task parallel computing across GPUs and CPUs…”

It is targeted at C language but I do not think there will be any obstacles to use it from other languages too.

More on the Khronos Web Page

Haskell plug-in for Eclipse

Posted in Haskell, functional programming by zoo on the June 29th, 2008

I use Eclipse to all sort of development on C, Java and Haskell. It gives me uniform look and feel on every project. In this post I wold like to show steps necessary to run Haskell Eclipse plug-in.

First of all you need to have Eclipse installed. I will use Ubuntu 8.10 and the installation command is:

sudo apt-get install eclipse

My Eclipse version is: Europa 3.2.2, Haskell plug-in version is: 0.10.0 and Java version “1.6.0_06″

When the installation finish run Eclipse and select “Workbench” from initial page.

  1. From menu on the top select: Help -> Software Updates -> Find and Install ….
  2. On “Install/Update” form select “Search for new features to Install”, click “Next”.
  3. On “Install” form click “New Remote Site …”
  4. On “Edit Remote Site” form enter:
    Name: Haskell Plug-in
    URL: http://eclipsefp.sf.net/updates

    - this is the location of Project that develops Functional Programming plug-in for Haskell and Ocaml. Click “OK”.

  5. On “Install” form select check-box near the “Haskell Plug-in” if it is not already selected and click “Finish”

After the Installation is complete it may be necessary to restart Eclipse.

  1. From menu go to: “Window”-> “Preferences…”
  2. On “Preferences” form select “Functional Programming” and set appropriate compiler and editor options for you system.
  3. On top right corner of Eclipse there is “Java perspective” that can be changed to “Haskell perspective”.

Enjoy happy programming.

Next Page »