
Archive for February, 2009

Note to self: randomly drop lines in a text file

If you ever need to randomly drop lines from a stream of text, you can use this short awk command:

Example: awk '{if (int(rand()*100) < 10) print $0;}' file

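This example keeps roughly 10% of the lines. Each line is kept independently with a 10% chance, so the exact count varies. Change the 10 to another percentage to keep more or fewer lines. Note that many awk implementations, including GNU awk, start rand() from the same seed on every run unless you call srand(), so repeated runs select the same lines.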
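If you want a different sample on every run, or a sample you can reproduce exactly, the awk random generator has to be seeded with srand(). Below is a minimal sketch, assuming a POSIX shell and a POSIX awk. The name sample_lines is a hypothetical helper, and the percentage is passed in as an awk variable instead of being hard-coded:

```shell
#!/bin/sh
# sample_lines PCT [SEED]   (hypothetical helper name)
# Copies stdin to stdout, keeping each line independently with
# probability PCT/100. With SEED the selection is reproducible for a
# given awk; without it, srand() seeds from the time of day (typically
# one-second resolution), so runs started in the same second match.
sample_lines() {
    if [ -n "$2" ]; then
        # Fixed seed: same input and same SEED give the same output.
        awk -v pct="$1" -v seed="$2" 'BEGIN { srand(seed) } rand() * 100 < pct'
    else
        # Time-based seed: a different sample on (almost) every run.
        awk -v pct="$1" 'BEGIN { srand() } rand() * 100 < pct'
    fi
}
```

For example, `sample_lines 10 < file` keeps about 10% of the lines, and `sample_lines 10 42 < file` keeps the same lines every time you run it with the same awk.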

As an example, I use this to warm up my MediaWiki installation before running a real WikiBench benchmark:

head -n 100000 benchmarks/1pct.trace | grep -e '-$' | \
awk '{if (int(rand()*100) < 10) print $0;}' | ./start_controller.sh -verbose

MediaWiki: “1048: Column ‘old_id’ cannot be null”

If you get this error with MediaWiki (or any other software), look at the structure of your table. The most likely cause is a missing AUTO_INCREMENT attribute on the ID column, here old_id. MySQL only turns a NULL inserted into an ID column into the next ID when that column is AUTO_INCREMENT; without it, a NOT NULL column rejects the insert with exactly this error. This problem took me quite a while to track down, especially because lots of people on the web come up with the weirdest explanations and solutions, like simply reinstalling and re-importing the data. Not fun if you are in a hurry and have a table with 7.5 million rows of text.
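To check the column without reading through the whole table definition, you can ask MySQL's information_schema. Below is a minimal sketch. It assumes MySQL 5.0 or later, credentials in ~/.my.cnf, and MediaWiki's standard text table without a $wgDBprefix. The database name is passed as the first argument, and old_id_has_auto_increment is a hypothetical helper name:

```shell
#!/bin/sh
# old_id_has_auto_increment DB   (hypothetical helper name)
# Succeeds if column old_id of table text in database DB is AUTO_INCREMENT.
# Assumes no table prefix, trusted input for DB, credentials in ~/.my.cnf.
old_id_has_auto_increment() {
    # -N: no column header, -B: plain tab-separated batch output.
    extra=$(mysql -N -B -e "SELECT EXTRA FROM information_schema.COLUMNS
        WHERE TABLE_SCHEMA = '$1' AND TABLE_NAME = 'text'
        AND COLUMN_NAME = 'old_id'")
    # The EXTRA column contains 'auto_increment' when the attribute is set.
    case "$extra" in
        *auto_increment*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Run it as `old_id_has_auto_increment wikidb && echo ok`, with your own database name. If the check fails, compare the column with the definition in your MediaWiki version's maintenance/tables.sql. Then, after taking a backup, restore the attribute with an ALTER TABLE ... MODIFY statement.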

WikiBench presentation

Today I presented my master's research project to a group of people at the Vrije Universiteit. The project is called “WikiBench, a distributed Wikipedia based web application benchmark”. You can view my slides at this URL if you are interested. The thesis (and source code!) will be released towards the end of March.

Oh crap... Debian 5 has been released

The Debian Project is pleased to announce the official release of Debian GNU/Linux version 5.0 (codenamed Lenny) after 22 months of constant development.

Although I should be glad, I’m not, since I’m still running a server with Debian 3.1 on it. It’s stable as a rock, though.

The availability and updates of OpenJDK, GNU Java compiler, GNU Java bytecode interpreter, Classpath and other free versions of Sun’s Java technology, into Debian GNU/Linux 5.0 allow us to ship Java-based applications in Debian’s main repository.

That is good news for Java as a language, and it will make Java software easier to install on Debian.

Return home early

OK, this is a bit old, but I wanted to link to it anyway, just in case you haven’t heard of this programming style yet. It is called “return home early”, and it basically means restructuring the logic of your code so that you end up with fewer levels of nesting. I tend to think about it when I see lots of curly braces, and it often helps me reduce code size and complexity. If it sounds interesting to you, read about it here!
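To illustrate the idea, here is a minimal shell sketch that writes the same check twice. The first version uses nested if blocks, and the second returns early from each failed check. The functions check_file_nested and check_file_early are hypothetical, and this is an illustration rather than code from the linked article. Both print the same message for every input; only the shape of the control flow differs:

```shell
#!/bin/sh
# Illustration of "return early"; both functions are hypothetical helpers.

# Nested version: every extra check adds a level of indentation,
# and each error message ends up far away from the check that causes it.
check_file_nested() {
    if [ -n "$1" ]; then
        if [ -f "$1" ]; then
            if [ -r "$1" ]; then
                echo "ok: $1"
            else
                echo "not readable: $1"
            fi
        else
            echo "no such file: $1"
        fi
    else
        echo "no file given"
    fi
}

# Early-return version: handle each failure first and leave the function,
# so the normal case sits at the bottom with no nesting at all.
check_file_early() {
    if [ -z "$1" ]; then
        echo "no file given"
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "no such file: $1"
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "not readable: $1"
        return 1
    fi
    echo "ok: $1"
}
```

The early-return version also gets a meaningful exit status for free, which makes it easy to use as `check_file_early "$f" || exit 1`.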