Note to self: randomly drop lines in a text file

If you ever need to drop lines from a stream of text randomly, you can use this simple and short awk command:

Example: cat file | awk '{if (int(rand()*100) < 10) print $0;}'

This example keeps only 10%. You can change the 10 to any other percentage to drop more or less.

As an example, I use this to warmup my MediaWiki installation before doing a real WikiBench benchmark:

cat benchmarks/1pct.trace | head -n 100000 | grep "\-$" | \
awk '{if (int(rand()*100) < 10) print $0;}' | ./start_controller.sh -verbose
This entry was posted on Wednesday, February 25th, 2009 at 3:40 pm and is filed under Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Leave a reply

Name (*)
Mail (will not be published) (*)
URI
Comment