<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>

<channel>
	<title>Eriky.com</title>
	<atom:link href="http://www.eriky.com/feed" rel="self" type="application/rss+xml" />
	<link>http://www.eriky.com</link>
	<description>Just another Technology Freak with a blog</description>
	<pubDate>Tue, 11 Nov 2008 12:44:56 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>MySQL: Error No. 1033 Incorrect information in file: &#8216;filename&#8217;</title>
		<link>http://www.eriky.com/2008/11/mysql-error-no-1033-incorrect-information-in-file-filename</link>
		<comments>http://www.eriky.com/2008/11/mysql-error-no-1033-incorrect-information-in-file-filename#comments</comments>
		<pubDate>Tue, 11 Nov 2008 12:41:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Software]]></category>

		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://www.eriky.com/?p=25</guid>
		<description><![CDATA[This is one of those &#8216;OMG Eriky, you saved my ass!&#8217; posts.
You probably came here after searching for this error. Before you try anything else, check if your /tmp directory exists and if it has the right permissions. If not, create it and do a &#8220;chmod 777 /tmp&#8221;. MySQL will give you the weirdest errors [...]]]></description>
			<content:encoded><![CDATA[<p>This is one of those &#8216;OMG Eriky, you saved my ass!&#8217; posts.</p>
<p>You probably came here after searching for this error. Before you try anything else, check whether your /tmp directory exists and has the right permissions. If not, create it and do a &#8220;chmod 777 /tmp&#8221;. MySQL will give you the weirdest errors if it cannot use the /tmp folder.</p>
<p>If this does not help, check your /etc/mysql/my.cnf. There could be a tmpdir line somewhere that tells MySQL which directory to use for temporary files. Create that directory, or remove/change the line so MySQL uses your /tmp folder instead.</p>
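<p>As a rough sketch of those two checks: the snippet below assumes a Linux box with GNU <code>stat</code> and the my.cnf path mentioned above, and the function name <code>check_tmpdir</code> is made up for illustration. It reports whether a directory exists and which mode it has, then looks for a <code>tmpdir</code> setting. On most Linux systems /tmp has mode 1777 (world-writable plus the sticky bit).</p>
<pre># check_tmpdir is a hypothetical helper, not part of MySQL.
# Prints "missing" if the directory does not exist, otherwise its octal mode.
check_tmpdir() {
    if [ -d "$1" ]; then
        stat -c '%a' "$1"    # GNU stat; assumed to be available
    else
        echo "missing"
    fi
}

check_tmpdir /tmp

# Look for a tmpdir setting. No output means none was found in this file
# (or the file is not at this path), so MySQL uses its default, usually /tmp.
grep -n 'tmpdir' /etc/mysql/my.cnf || true</pre>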
]]></content:encoded>
			<wfw:commentRss>http://www.eriky.com/2008/11/mysql-error-no-1033-incorrect-information-in-file-filename/feed</wfw:commentRss>
		</item>
		<item>
		<title>Importing the complete Wikipedia database in 5 hours</title>
		<link>http://www.eriky.com/2008/11/importing-the-complete-english-wikipedia-database</link>
		<comments>http://www.eriky.com/2008/11/importing-the-complete-english-wikipedia-database#comments</comments>
		<pubDate>Sat, 08 Nov 2008 15:10:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
		<category><![CDATA[Research]]></category>

		<category><![CDATA[mediawiki]]></category>

		<category><![CDATA[wikipedia]]></category>

		<guid isPermaLink="false">http://www.eriky.com/?p=3</guid>
		<description><![CDATA[Say you want to set up a local installation of Wikipedia. Installing the software is easy: just go to www.mediawiki.org and download the MediaWiki software, created by the Wikimedia Foundation, which is the foundation of Wikipedia. That&#8217;s right, you&#8217;re guaranteed to mess up the names at some point in your life.
OK, we have it running on [...]]]></description>
			<content:encoded><![CDATA[<p>Say you want to set up a local installation of Wikipedia. Installing the software is easy: just go to www.mediawiki.org and download the MediaWiki software, created by the Wikimedia Foundation, which is the foundation of Wikipedia. That&#8217;s right, you&#8217;re guaranteed to mess up the names at some point in your life.</p>
<p>OK, we have it running on a local Ubuntu installation, on Apache and MySQL5 with PHP5. Not a big deal. Now it&#8217;s time to download the Wikipedia data, which is licensed in such a way that you are free to use it. In my case, I will be using it for scientific research, about which I&#8217;ll surely post more in the coming months. The English data of just the current version of all the articles is more than 4 gigabytes in compressed bzip2 format. Decompress it and you are left with a whopping 18.2GB XML file. Ouch. Importing this XML file with the importDump.php tool from MediaWiki will take lots of time. It starts importing at a rate of 4 pages per second, but this rate goes down to 2.5 per second after about 20,000 pages. After 30,000 pages it seemed to stabilize at 2.25 pages/sec, so I started to do some math. There are about 15 million pages in total, if I remember correctly. That is 15,000,000 / (2.25 * 60*60*24) = roughly 77 days.</p>
<p>Unfortunately, but as expected, the rate keeps going down. After 150,000 pages I&#8217;m now at 1.7 pages/sec. Maybe this wasn&#8217;t such a good idea...</p>
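<p>The estimate is easy to redo for other rates. A small sketch, assuming awk is installed (the function name <code>import_days</code> is just for illustration):</p>
<pre># import_days PAGES RATE: days needed to import PAGES at RATE pages per second.
import_days() {
    awk -v pages="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", pages / (rate * 60 * 60 * 24) }'
}

import_days 15000000 2.25   # the rate behind the 77-day estimate
import_days 15000000 1.7    # the rate seen after 150,000 pages</pre>
<p>The second call comes out at over 100 days, and that still assumes the rate stops dropping.</p>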
<h2>Plan B</h2>
<p>I can almost hear you screaming now: &#8216;thank god, there is a plan B&#8217;.</p>
<p>Head over to the <a href="http://www.mediawiki.org/wiki/MWDumper">MWDumper page</a> and download the jar file. Follow the instructions and use the example they give, which looks like this:</p>
<pre>java -jar mwdumper.jar --format=sql:1.5 pages_full.xml.bz2 |
   mysql -u &lt;username&gt; -p &lt;databasename&gt;</pre>
<p>This is a lot better already: it imports at 200 to 300 pages/sec. But it gets better. If you remove ALL indexes and auto_increment attributes from the tables, the speed goes up beyond 2000 pages per second! Don&#8217;t forget to re-add the indexes and auto_increment fields when the import is done.</p>
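<p>What removing and re-adding an index looks like depends on the schema of your MediaWiki version, so treat this as a sketch only. The helper names and the table, index and column names in the example calls are placeholders, not taken from the real schema; <code>SHOW CREATE TABLE</code> tells you what your tables actually contain. The helpers only print the statements, so you can review them before feeding them to mysql. The auto_increment attributes are not covered here.</p>
<pre># Hypothetical helpers that print MySQL statements; they do not touch the database.
drop_index_sql() {
    printf 'ALTER TABLE `%s` DROP INDEX `%s`;\n' "$1" "$2"
}
add_index_sql() {
    printf 'ALTER TABLE `%s` ADD INDEX `%s` (%s);\n' "$1" "$2" "$3"
}

# Placeholder names: replace them with what SHOW CREATE TABLE reports.
drop_index_sql some_table some_index              # statement to run before the import
add_index_sql some_table some_index some_column   # statement to run after the import</pre>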
<p>You can have your own locally installed Wikipedia in about 5 hours, or less if your PC is fast. The PC I used is a relatively dated AMD Athlon 3200+ with 1GB of memory and a regular SATA disk.</p>
<p>P.S.: you might want to look into MySQL&#8217;s binary logging. Turning it off or reducing the maximum log size to 1MB will increase performance too!</p>
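<p>To see what your server is currently configured to do, you can look for the binary log options in the config file. A minimal sketch, assuming the config lives at /etc/mysql/my.cnf and that the settings are spelled <code>log_bin</code> (or <code>log-bin</code>) and <code>max_binlog_size</code>; the function name <code>binlog_settings</code> is made up for illustration:</p>
<pre># binlog_settings FILE: print the active (uncommented) binary log settings in FILE.
binlog_settings() {
    grep -nE '^[[:space:]]*(log[-_]bin|max_binlog_size)' "$1" || true
}

# No output means neither option is set in this file.
binlog_settings /etc/mysql/my.cnf</pre>
<p>Commenting out the log_bin line turns binary logging off, and a line like max_binlog_size = 1M caps the log size at 1MB. Restart MySQL after changing either one.</p>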
]]></content:encoded>
			<wfw:commentRss>http://www.eriky.com/2008/11/importing-the-complete-english-wikipedia-database/feed</wfw:commentRss>
		</item>
	</channel>
</rss>
