February 25th, 2009
If you ever need to drop lines from a stream of text randomly, you can use this simple and short awk command:
Example: cat file | awk '{if (int(rand()*100) < 10) print $0;}'
This example keeps roughly 10% of the lines and drops the rest. Change the 10 to any other percentage to keep more or fewer lines.
As an example, I use this to warm up my MediaWiki installation before doing a real WikiBench benchmark:
cat benchmarks/1pct.trace | head -n 100000 | grep "\-$" | \
awk '{if (int(rand()*100) < 10) print $0;}' | ./start_controller.sh -verbose
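One caveat with the one-liner: some awk implementations (gawk, for example) start rand() from the same seed on every run, so the same lines are kept each time unless srand() is called first. A minimal sketch of a reusable variant follows; the function name sample_lines and the pct variable are made up for illustration and are not part of WikiBench.

```shell
# sample_lines PERCENT: keep roughly PERCENT% of the lines read from stdin.
# srand() without an argument seeds from the time of day, so two runs
# within the same second may still select the same lines.
sample_lines() {
    awk -v pct="$1" 'BEGIN { srand() } rand() * 100 < pct'
}

# Keep about 10% of a 1000-line stream and count what is left.
seq 1 1000 | sample_lines 10 | wc -l
```

The count printed at the end varies from run to run, since each line is kept independently with the given probability.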
February 25th, 2009
If you get this error with MediaWiki (or any other software), look at the structure of the table involved. Most likely an “auto_increment” attribute is missing. This problem took me quite a while to find, especially because lots of people on the web come up with the weirdest explanations and solutions, like simply reinstalling and re-importing the data. Not fun if you are in a hurry and have a table with 7.5 million text files.
February 19th, 2009
Today I presented my master's research project to a group of people at the Vrije Universiteit. The project is called “WikiBench, a distributed Wikipedia based web application benchmark”. You can view my slides at this URL if you are interested. The thesis (and source code!) will be released towards the end of March.
February 15th, 2009
The Debian Project is pleased to announce the official release of Debian GNU/Linux version 5.0 (codenamed Lenny) after 22 months of constant development.
Although I should be glad, I’m not, since I’m still running a server with Debian 3.1 on it. It’s stable as a rock though.
The availability and updates of OpenJDK, GNU Java compiler, GNU Java bytecode interpreter, Classpath and other free versions of Sun’s Java technology, into Debian GNU/Linux 5.0 allow us to ship Java-based applications in Debian’s main repository.
That is good news for Java as a language. This will obviously make it easier to install Java software.
February 15th, 2009
OK, this is a bit old, but I wanted to link to it anyway, just in case you haven’t heard about this programming style yet. It is called “return home early” and it basically means that you can change the logic of your code in such a way that you get less nesting. I tend to think about this when I see lots of curly braces, and it often helps me reduce code size and complexity. If it sounds interesting enough for you, read about it here!
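To give a rough idea of the style, here is a small sketch in shell. Both functions are hypothetical examples written for this post, not taken from the linked article. They do the same thing, but the second one handles each failure case first and returns right away, so the normal path ends up without any nesting.

```shell
# Nested style: every check adds another level of indentation.
describe_file_nested() {
    if [ -n "$1" ]; then
        if [ -e "$1" ]; then
            if [ -s "$1" ]; then
                echo "ok"
            else
                echo "empty"
            fi
        else
            echo "missing"
        fi
    else
        echo "no name given"
    fi
}

# Early-return style: deal with each failure case and leave immediately.
describe_file() {
    if [ -z "$1" ]; then echo "no name given"; return; fi
    if [ ! -e "$1" ]; then echo "missing"; return; fi
    if [ ! -s "$1" ]; then echo "empty"; return; fi
    echo "ok"
}
```

The second version is shorter, and each condition can be read on its own without keeping track of which else belongs to which if.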