Mastering ElasticSearch [book review]

I’ve recently had the chance to read Mastering ElasticSearch from Packt Publishing.

I’ll start off by quoting a review from

The authors, Rafal included, are well respected authors in the Solr/ElasticSearch/Lucene community so they clearly have a good grasp on the subject. This shines through in chapters like the Query DSL. However, the editing is pretty poor and frankly, PacktPub would be best placed to at least re-edit this title for the English quality.

So yes, the English is not the best out there, but it is understandable. A small amount of editing would greatly benefit this book. Apart from that, this book is aimed at more experienced ElasticSearch users. It could be seen as a sequel to the book “ElasticSearch Server” by the same authors. It covers more advanced topics like routing, shard allocation, the advanced query DSL and some advanced Java topics like garbage collection, to name a few! In short, this book is for those who are running ElasticSearch on a production site and need to go beyond the defaults. If you fall into that category, this book is a valuable addition to your bookshelf.

How to listen for key presses with AngularJS

I needed to create a table that can be navigated with the arrow keys in an AngularJS app. The documentation about ng-keypress on the AngularJS site is pretty minimal. For example, it does not mention that you can call a function and pass it an $event object that describes the key press. I created a small JSFiddle demo to show you how to catch virtually all key presses on your Angular site: arrow keys, return, function keys, all letters and numbers, etc.

One word of warning: this works for AngularJS 1.1.5 and up. Check out the JSFiddle: Catching key presses with AngularJS


Understanding HBase and BigTable

Just wanted to share this. For those of you that can’t really get a grasp on what HBase is, this blog post explains it really well.

ElasticSearch backups

So you discovered ElasticSearch, great! A question that should come to mind once you start using it seriously is: can I create backups of my indexes? Luckily you can! There are several options, although none of them is very user friendly.

First, you can make filesystem level backups of the indexes, but with a big cluster this means you have to copy the data on each node.

You can also use a shared gateway and back up the data from the gateway. I would not advise a shared gateway because the whole point of ElasticSearch is not having a single point of failure. Actually, the shared gateway has since been deprecated by ElasticSearch, so don’t even think about exploring that option!

The third option is to use the scan and scroll API calls that ElasticSearch offers. These two calls allow you to scan all (or a subset) of your data and walk over the result set by repeatedly calling scroll. I have tested this on a fair amount of data (200 GB) and it works surprisingly well. That is why I decided to add a dump and import script to my open source project ESClient (Python), to save you the trouble of having to reinvent the wheel ;-)

If you install ESClient (with pip install esclient or easy_install esclient) you get these two scripts installed automatically. You can use them by simply entering esdump or esimport on the command line and they will show you usage information.

As an example, suppose you have an index called ‘items’ and another called ‘customers’. You can back up both indexes to a single bz2 file using:

esdump --url http://localhost:9200/ --indexes items customers --bzip2 --file items_customers.bz2

You can import this data using:

esimport --url http://localhost:9200 --file items_customers.bz2

Alternatively, you can import the data back to another index, e.g. items_test, by using the --index option on esimport.
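The scan and scroll approach described above can be sketched in a few lines of Python. This is not the code from esdump, just a minimal illustration under some assumptions: it follows the REST calls as they worked in the ElasticSearch versions of that time (search_type=scan, with the scroll id sent as the request body), and the names http_json, scan_and_scroll and the fetch parameter are made up for this example. The HTTP call is passed in as a function so the loop can be tried without a running cluster.

```python
import json
from urllib.request import Request, urlopen


def http_json(method, url, body=None):
    """Send an HTTP request and decode the JSON response (hypothetical helper)."""
    data = body.encode("utf-8") if body is not None else None
    with urlopen(Request(url, data=data, method=method)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scan_and_scroll(base_url, index, fetch=http_json, keep_alive="10m", size=50):
    """Yield every hit in `index` by opening a scan and scrolling until empty."""
    base_url = base_url.rstrip("/")
    # Opening the scan returns a scroll id; the hits only arrive on scroll calls.
    page = fetch(
        "POST",
        "%s/%s/_search?search_type=scan&scroll=%s&size=%d"
        % (base_url, index, keep_alive, size),
        json.dumps({"query": {"match_all": {}}}),
    )
    scroll_id = page["_scroll_id"]
    while True:
        # Assumption: the scroll id is sent as the raw request body.
        page = fetch(
            "POST",
            "%s/_search/scroll?scroll=%s" % (base_url, keep_alive),
            scroll_id,
        )
        hits = page["hits"]["hits"]
        if not hits:
            break  # an empty page means the result set is exhausted
        for hit in hits:
            yield hit
        # Always continue with the most recent scroll id.
        scroll_id = page["_scroll_id"]
```

Each hit carries the _source of a document, which is why an index that does not store _source cannot be dumped this way.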

Some notes

These two scripts currently support indexes that have the following fields: _parent, _routing. If you supplied a specific routing at index time, that will be restored too. The same holds true if you specified a parent/child relation.

Not supported are indexes in which you don’t store the _source field. You cannot back up an index without this field.

Future plans

It is relatively simple to also back up the mapping of the data, so this is high on my priority list. Also, I want to check the cluster state before dumping the data, to ensure you are not backing up a cluster that is in a bad state (Yellow or Red).
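The planned health check could look roughly like the sketch below. It assumes you have already fetched and decoded the JSON response of the cluster health API, which reports a status of green, yellow or red. The function name ensure_cluster_is_green is hypothetical and not part of ESClient.

```python
def ensure_cluster_is_green(health):
    """Raise an error unless the decoded cluster health response is green.

    `health` is assumed to be the parsed JSON from the cluster health API.
    """
    status = health.get("status")
    if status != "green":
        raise RuntimeError("refusing to dump: cluster status is %r" % (status,))
    return True
```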

P.S.: from what I understand, the ElasticSearch team is working hard towards a 1.0 version which will offer backup and restore functionality out of the box!

ESClient: Python ElasticSearch Client

I’ve been working on a little project: a Python client for ElasticSearch. Are there other clients out there? Yeah sure, and they are even pretty decent. pyes in particular is OK. But I’m missing documentation on that one and I don’t like the approach they took to implementing the API. I wanted to create a simple client that stays close to the ElasticSearch REST API. It allows you to directly submit either JSON or a tree of Python objects that can be converted to JSON. ESClient has documented code and comes with unit tests that should get you a long way. I plan to write good documentation to get people started quickly as soon as the API starts to reach a stable state.
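To illustrate the idea of accepting either a JSON string or a tree of Python objects, here is a minimal sketch. It is not ESClient’s actual code; the helper name to_request_body is made up for this example.

```python
import json


def to_request_body(query):
    """Return a JSON string for `query`, which may be JSON text or Python objects."""
    if isinstance(query, str):
        # Already JSON text: parse it once so malformed input fails early.
        json.loads(query)
        return query
    # Otherwise treat it as a tree of dicts, lists, strings and numbers.
    return json.dumps(query)
```

Either way, the caller ends up with a string that can be sent as the body of a REST request.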

In a few days I have learned a lot about:

It’s simply awesome how quickly and easily you can get something up and running these days. With just a few lines of code and Python’s distutils you can create a source package and a Windows binary, upload it all to PyPI and make it installable for everyone in the world with a single command:

pip install esclient
easy_install esclient

ESClient is still in its early stages and I can assure you the API will change over the coming weeks. I need to implement more API methods and I want to implement bulk indexing. Once all API methods are implemented and bulk indexing works, I will work towards a 1.0.0 release with a stable API. I also want ESClient to handle errors well. This is something that I missed in some other libraries that I have found, like pyelasticsearch.

Install Canon software without the original CD

Canon has an annoying and useless policy which makes it impossible to install the updates from their website without having an original CD. Apparently they want to make the lives of their customers harder than they already are. (My MacBook Air doesn’t even have a DVD player!)

Luckily for us, this website has some tricks to circumvent this. For me, simply removing an update.plist file was enough to get ImageBrowser 6.7.2-updater (Mac OS 10.5-10.6) to install without the original CD.

Twitter’s list of 401 banned passwords

I was looking at the source of a page when I noticed a JavaScript array with banned passwords. They were ROT13 encoded, so I decided to decode them and have a look. This has been done before but the list is now a bit longer, so here you go:

(warning: this list contains quite a few dirty words, which could be the reason why Twitter applies ROT13 encoding :-0)
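Decoding ROT13 takes very little code. The sketch below uses the rot_13 codec from Python’s standard library. The two sample words are made up for illustration and are not taken from Twitter’s list.

```python
import codecs


def decode_rot13(words):
    """Decode a list of ROT13-encoded strings."""
    # ROT13 is its own inverse, so the same codec both encodes and decodes.
    return [codecs.decode(word, "rot_13") for word in words]


# Made-up sample input, not from the actual list.
print(decode_rot13(["cnffjbeq", "frperg"]))  # prints ['password', 'secret']
```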

MacBook Air fan stuck and making ticking noise?

Does your MacBook Air fan make a repeating, predictable ticking noise? Does it get hotter than usual when watching YouTube videos? Does iStat Pro report a fan speed of 0?

A few taps with your finger help! If you have an older model (2009), tap on the right side. If you have a newer model (mine is late 2010), tap on the right side. Don’t overdo it, but don’t be too gentle either.

See also this discussion:

Want to beat the hackers at their own game?

Google shows how web application vulnerabilities can be exploited and how to defend against these attacks. The best way to learn things is by doing, so you’ll get a chance to do some real penetration testing, actually exploiting a real application.

Anti-Counterfeiting Trade Agreement leaked

The leaked ACTA agreement!
Soon to be copy-pasted into a law near you.

Check out this page on The Pirate Bay