<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for iMonad.com</title>
	<atom:link href="http://imonad.com/blog/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://imonad.com/blog</link>
	<description>Software engineering, Functional programming, Predictive Analytics</description>
	<lastBuildDate>Tue, 03 Nov 2009 15:54:47 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Installing Haskell plugin for Eclipse by JP Moresmau</title>
		<link>http://imonad.com/blog/2009/10/installing-haskell-plugin-for-eclipse/comment-page-1/#comment-222</link>
		<dc:creator>JP Moresmau</dc:creator>
		<pubDate>Tue, 03 Nov 2009 15:54:47 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=222#comment-222</guid>
		<description>Thanks for the instructions!! 
To get all the features of the 1.108 build you need to build Scion from source, on every platform.</description>
		<content:encoded><![CDATA[<p>Thanks for the instructions!!<br />
To get all the features of the 1.108 build you need to build Scion from source, on every platform.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Installing Haskell plugin for Eclipse by Axman6</title>
		<link>http://imonad.com/blog/2009/10/installing-haskell-plugin-for-eclipse/comment-page-1/#comment-221</link>
		<dc:creator>Axman6</dc:creator>
		<pubDate>Sun, 01 Nov 2009 05:18:25 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=222#comment-221</guid>
		<description>Thanks a lot for posting these instructions, I had tried in the past to install EclipseFP, without any luck whatsoever. Got it working first try thanks to you :)</description>
		<content:encoded><![CDATA[<p>Thanks a lot for posting these instructions, I had tried in the past to install EclipseFP, without any luck whatsoever. Got it working first try thanks to you <img src='http://imonad.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Installing Haskell plugin for Eclipse by iMonad.com &#187; Blog Archive &#187; Haskell plug-in for Eclipse</title>
		<link>http://imonad.com/blog/2009/10/installing-haskell-plugin-for-eclipse/comment-page-1/#comment-219</link>
		<dc:creator>iMonad.com &#187; Blog Archive &#187; Haskell plug-in for Eclipse</dc:creator>
		<pubDate>Sat, 31 Oct 2009 17:48:10 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=222#comment-219</guid>
		<description>[...] I posted updated version with installation instructions for Windows XP and screencast [...]</description>
		<content:encoded><![CDATA[<p>[...] I posted updated version with installation instructions for Windows XP and screencast [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Papers on implementing RBM in GPU by zoo</title>
		<link>http://imonad.com/blog/2009/07/papers-on-implementing-rbm-in-gpu/comment-page-1/#comment-207</link>
		<dc:creator>zoo</dc:creator>
		<pubDate>Mon, 03 Aug 2009 06:45:29 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=189#comment-207</guid>
		<description>&quot;Do you mean have a CPU based version to check that the GPU version runs in the same way? I would like to do that. &quot; 
Yes, it is always better to have a &quot;reference implementation&quot;, and when measuring the speed-up this implementation has to be optimised.</description>
		<content:encoded><![CDATA[<p>&#8220;Do you mean have a CPU based version to check that the GPU version runs in the same way? I would like to do that. &#8221;<br />
Yes, it is always better to have a &#8220;reference implementation&#8221;, and when measuring the speed-up this implementation has to be optimised.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Papers on implementing RBM in GPU by Ian Calvert</title>
		<link>http://imonad.com/blog/2009/07/papers-on-implementing-rbm-in-gpu/comment-page-1/#comment-204</link>
		<dc:creator>Ian Calvert</dc:creator>
		<pubDate>Fri, 31 Jul 2009 16:53:29 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=189#comment-204</guid>
		<description>Thanks. Sorry, I&#039;m not quite sure what you mean by:

&quot;I can only suggest to keep a pure C++ together with GPU implementation for testing purposes. It is hard to get it right. &quot;

Do you mean have a CPU based version to check that the GPU version runs in the same way? I would like to do that. 

&quot;It seems you are using 8600GT like me. Do you observe 5-10 times speedup?&quot;

I haven&#039;t got a reference point with an optimised C++ program. Using the measure from the last paper, I get 279MCUPS (for a single layer 512x512, batch size of 32), so that would suggest a 27 times speedup. If I&#039;ve got my maths right that is :) That includes all file reading (though the file is written as a series of floats), conversion to column major format, transfer, weight updates (including momentum) and random number generation.

I&#039;m not sure of the meaning of a few bits of the paper though, the update period is defined as &quot;the time it takes for the implementation to complete a single batch of data&quot;. What&#039;s the batch size? Does it mean a single 512 vector? That&#039;s what my calculation above assumes btw. 

Raw performance figures, for training a 784x512x512x2048 network with 10 label units (not softmaxed though) I get a speed of roughly 900 samples per second. That&#039;s for training all layers in sequence. As in, to train all layers in sequence over 50k training samples it takes just under a minute for everything. I get about 3k/s speeds for recognition.

I&#039;m curious about their implementation, they seem to use a matrix multiplication and a matrix addition for the weight updates, as well as a transpose. The transpose worries me a lot, since there&#039;s no need to store a transposed matrix in global memory. You can just change the access pattern (as I assume sgemm does). 

Maybe my program is doing something wrong :)</description>
		<content:encoded><![CDATA[<p>Thanks. Sorry, I&#8217;m not quite sure what you mean by:</p>
<p>&#8220;I can only suggest to keep a pure C++ together with GPU implementation for testing purposes. It is hard to get it right. &#8221;</p>
<p>Do you mean have a CPU based version to check that the GPU version runs in the same way? I would like to do that. </p>
<p>&#8220;It seems you are using 8600GT like me. Do you observe 5-10 times speedup?&#8221;</p>
<p>I haven&#8217;t got a reference point with an optimised C++ program. Using the measure from the last paper, I get 279MCUPS (for a single layer 512&#215;512, batch size of 32), so that would suggest a 27 times speedup. If I&#8217;ve got my maths right that is <img src='http://imonad.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  That includes all file reading (though the file is written as a series of floats), conversion to column major format, transfer, weight updates (including momentum) and random number generation.</p>
<p>I&#8217;m not sure of the meaning of a few bits of the paper though, the update period is defined as &#8220;the time it takes for the implementation to complete a single batch of data&#8221;. What&#8217;s the batch size? Does it mean a single 512 vector? That&#8217;s what my calculation above assumes btw. </p>
<p>Raw performance figures, for training a 784&#215;512&#215;512&#215;2048 network with 10 label units (not softmaxed though) I get a speed of roughly 900 samples per second. That&#8217;s for training all layers in sequence. As in, to train all layers in sequence over 50k training samples it takes just under a minute for everything. I get about 3k/s speeds for recognition.</p>
<p>I&#8217;m curious about their implementation, they seem to use a matrix multiplication and a matrix addition for the weight updates, as well as a transpose. The transpose worries me a lot, since there&#8217;s no need to store a transposed matrix in global memory. You can just change the access pattern (as I assume sgemm does). </p>
<p>Maybe my program is doing something wrong <img src='http://imonad.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Papers on implementing RBM in GPU by zoo</title>
		<link>http://imonad.com/blog/2009/07/papers-on-implementing-rbm-in-gpu/comment-page-1/#comment-203</link>
		<dc:creator>zoo</dc:creator>
		<pubDate>Thu, 30 Jul 2009 20:36:27 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=189#comment-203</guid>
		<description>Nice. I can only suggest to keep a pure C++ together with GPU implementation for testing purposes. It is hard to get it right. My current implementation is in C++ with some interface for future parallelization. 
It seems you are using 8600GT like me. Do you observe 5-10 times speedup?</description>
		<content:encoded><![CDATA[<p>Nice. I can only suggest to keep a pure C++ together with GPU implementation for testing purposes. It is hard to get it right. My current implementation is in C++ with some interface for future parallelization.<br />
It seems you are using 8600GT like me. Do you observe 5-10 times speedup?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Papers on implementing RBM in GPU by Ian Calvert</title>
		<link>http://imonad.com/blog/2009/07/papers-on-implementing-rbm-in-gpu/comment-page-1/#comment-202</link>
		<dc:creator>Ian Calvert</dc:creator>
		<pubDate>Thu, 30 Jul 2009 16:51:20 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=189#comment-202</guid>
		<description>I&#039;m working on a GPU implementation myself. I&#039;ve got horrible code up and about at http://wiki.github.com/IanCal/leonard.

It&#039;s just a start, but over the next month or so it should become a usable library for others working with RBMs and C++. It&#039;ll be heavily restructured soon to make it easy to create experiments :).</description>
		<content:encoded><![CDATA[<p>I&#8217;m working on a GPU implementation myself. I&#8217;ve got horrible code up and about at <a href="http://wiki.github.com/IanCal/leonard" rel="nofollow">http://wiki.github.com/IanCal/leonard</a>.</p>
<p>It&#8217;s just a start, but over the next month or so it should become a usable library for others working with RBMs and C++. It&#8217;ll be heavily restructured soon to make it easy to create experiments <img src='http://imonad.com/blog/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> .</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Restricted Boltzmann Machine &#8211; Short Tutorial by chef-ele</title>
		<link>http://imonad.com/blog/2008/10/restricted-boltzmann-machine/comment-page-1/#comment-53</link>
		<dc:creator>chef-ele</dc:creator>
		<pubDate>Thu, 09 Oct 2008 16:44:31 +0000</pubDate>
		<guid isPermaLink="false">http://imonad.com/blog/?p=65#comment-53</guid>
		<description>Here are some nit-picky editorial comments that might help clarify the text (feel free to ignore these, though!): 

In the section that starts with &quot;The algorithm as a whole is:&quot;,  you do not have an explicit step that updates the weights &amp; biases.  The needed formulas are in a section above, where you wrote that &quot;CD = 0 - n&quot;  and  &quot;W’ = W + alpha*CD&quot;.  Copying those two lines into the list of steps for the algorithm would make the algorithm a bit clearer, I think.  

Next, on the diagrams, it might be helpful to explicitly label the blue nodes as visible nodes &amp; the brown nodes as hidden nodes. 

Also, when you say, &quot;Si.Sj is just a multiplication of current activation (state) of neuron I and neuron J&quot;   it might be slightly clearer to explicitly say that Si is the state of a visible neuron, and Sj is the state of a hidden neuron.  This becomes more obvious later on, but at that point in the text one might think Si &amp; Sj could be the states of two different neurons in the same layer, which they&#039;re not.</description>
		<content:encoded><![CDATA[<p>Here are some nit-picky editorial comments that might help clarify the text (feel free to ignore these, though!): </p>
<p>In the section that starts with &#8220;The algorithm as a whole is:&#8221;,  you do not have an explicit step that updates the weights &amp; biases.  The needed formulas are in a section above, where you wrote that &#8220;CD = 0 &#8211; n&#8221;  and  &#8220;W’ = W + alpha*CD&#8221;.  Copying those two lines into the list of steps for the algorithm would make the algorithm a bit clearer, I think.  </p>
<p>Next, on the diagrams, it might be helpful to explicitly label the blue nodes as visible nodes &amp; the brown nodes as hidden nodes. </p>
<p>Also, when you say, &#8220;Si.Sj is just a multiplication of current activation (state) of neuron I and neuron J&#8221;   it might be slightly clearer to explicitly say that Si is the state of a visible neuron, and Sj is the state of a hidden neuron.  This becomes more obvious later on, but at that point in the text one might think Si &amp; Sj could be the states of two different neurons in the same layer, which they&#8217;re not.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
