Data Grids & Infinispan
A (partner) talk by Manik Surtani, lead and founder of the Infinispan project at JBoss.
What are they? Memory storage nodes connected by a network, be it UDP or TCP, that together give the illusion of a lot of memory. Data grids are supposed to be very fast since the data is supposed to live in memory. Of course, for this they need consistent hashing (of the keys) to allow fast retrieval, and a certain degree of data locality (your data is close to you, so the transit time is short).
They should also be easy to use, with things like auto recovery and auto node detection.
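To make the consistent hashing idea a bit more concrete, here is a minimal sketch of a hash ring in Java. It is not Infinispan's actual code: the class name ConsistentHashRing, the CRC32 hash and the 100 virtual nodes per server are all assumptions picked for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative sketch only: not Infinispan's actual hashing code.
class ConsistentHashRing {
    // Assumed value: several points per node smooth out the key distribution.
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> nodes) {
        for (String node : nodes) {
            addNode(node);
        }
    }

    // CRC32 is used only because it ships with the JDK; any well-mixed hash works.
    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            // putIfAbsent keeps an existing point if two hashes ever collide.
            ring.putIfAbsent(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            // Only remove points that really belong to this node.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    // Walk clockwise from the key's position to the first node point.
    String ownerOf(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Long point = ring.ceilingKey(hash(key));
        return ring.get(point != null ? point : ring.firstKey());
    }

    public static void main(String[] args) {
        ConsistentHashRing ring =
                new ConsistentHashRing(Arrays.asList("node-a", "node-b", "node-c"));
        System.out.println("user:42 lives on " + ring.ownerOf("user:42"));
        ring.removeNode("node-b");
        System.out.println("user:42 now lives on " + ring.ownerOf("user:42"));
    }
}
```

The nice property is that when a node leaves, only the keys that lived on that node move; everything else stays where it was, which is what makes auto recovery and node detection cheap.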
What’s the main purpose of data grids?
To make your RDBMS go faster and to give it some degree of fault tolerance and high availability.
Data grids are usually NoSQL (Not Only SQL) solutions. Most NoSQL systems offer good scalability, high availability, fault tolerance, etc. However, not all of them have all of these.
Infinispan is an example of a data grid. It's built with Java and Scala, and it took some concepts from the Amazon Dynamo paper. It uses consistent-hashing-based distribution to locate data, which is very efficient and fast. There is no single point of failure since it's distributed. Infinispan uses MVCC locking, which is a highly concurrent locking mechanism. It also supports XA transactions through a two-phase commit cycle, as well as deadlock detection. Currently under development is an atomic broadcast mechanism, for which they are working together with some research groups. It should make it possible to maintain consistency while reducing the network load by 50%.
The plan is to integrate map/reduce in Infinispan; part of it is already in the Infinispan trunk.
Through Hibernate Search or Lucene you can even query Infinispan. In the future some sort of JPA/QL will be supported.
Infinispan also flushes data to disk to be able to recover after a crash. This is a pluggable mechanism, so you can create your own persistence mechanism.
Infinispan supports different eviction techniques, from FIFO and LIFO to other, more obscure new techniques.
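To show what an eviction policy actually decides, here is a small sketch of a bounded map built on the JDK's LinkedHashMap. It is not Infinispan's eviction code: BoundedCache is a made-up name, and LRU is my own choice of contrast to FIFO, since the JDK happens to support both orderings out of the box.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a bounded map, not Infinispan's eviction implementation.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // lru = false evicts in insertion order (FIFO);
    // lru = true evicts the least recently accessed entry (LRU).
    BoundedCache(int maxEntries, boolean lru) {
        super(16, 0.75f, lru);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insert; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCache<String, Integer> fifo = new BoundedCache<>(2, false);
        fifo.put("a", 1);
        fifo.put("b", 2);
        fifo.get("a");      // reading does not protect "a" under FIFO
        fifo.put("c", 3);
        System.out.println(fifo.keySet()); // prints [b, c]

        BoundedCache<String, Integer> lru = new BoundedCache<>(2, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.get("a");       // reading makes "a" the most recently used entry
        lru.put("c", 3);
        System.out.println(lru.keySet());  // prints [a, c]
    }
}
```

Same inserts, same reads, different survivors: that is the whole difference between the policies.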
The API is fairly simple: it's a key-value store, hence it's like a map. There are talks about other high-level APIs to allow Infinispan to be used with other languages.
Every method to talk to Infinispan has an async equivalent which returns a java.util.concurrent.Future.
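The sync/async pairing is easy to picture with a toy example. The sketch below is a hypothetical stand-in and not the Infinispan API itself: TinyStore is a made-up class backed by a ConcurrentHashMap, and the putAsync/getAsync names are simply modeled on the pairing described in the talk.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Future;

// Hypothetical stand-in for a cache; not the Infinispan API itself.
class TinyStore<K, V> {
    private final ConcurrentMap<K, V> data = new ConcurrentHashMap<>();

    // Blocking, map-style calls.
    V put(K key, V value) {
        return data.put(key, value);
    }

    V get(K key) {
        return data.get(key);
    }

    // Async twins: return immediately and hand back a Future.
    Future<V> putAsync(K key, V value) {
        return CompletableFuture.supplyAsync(() -> put(key, value));
    }

    Future<V> getAsync(K key) {
        return CompletableFuture.supplyAsync(() -> get(key));
    }

    public static void main(String[] args) throws Exception {
        TinyStore<String, String> store = new TinyStore<>();
        Future<String> pending = store.putAsync("talk", "Infinispan");
        // ... do other work while the write is in flight ...
        pending.get(); // block only when the result is actually needed
        System.out.println(store.getAsync("talk").get()); // prints Infinispan
    }
}
```

In a real grid the async variant matters a lot more than here, because the "work in flight" is a network round trip to other nodes rather than a local map update.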
Manik mentions that if you happen to be using JBoss Cache or Ehcache, configuring Infinispan happens the same way. I wonder, who the hell is still using JBoss Cache…
Infinispan supports REST, Memcached (the text-based protocol, a nice commercial move since Memcached is widely used from different languages) and Hot Rod (wtf?) as communication protocols. It uses Netty for creating server sockets (Netty is an NIO server framework).
Hot Rod is a wire protocol for client-server communication, developed for Infinispan itself. It's Memcached-like, but it is binary and not text based. It's also a two-way protocol, meaning that both client and server talk to each other. Hot Rod clients have built-in failover, load balancing and smart routing. The server can share the consistent hashing algorithm with the client, which allows a client to pick a server that is closer by or that already contains the key, so that one server does not have to reroute the request to the server that should actually hold the data. The client can decide up front to send to the right server.
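The failover and load-balancing part of such a client can be sketched in a few lines. This is not the Hot Rod client implementation: FailoverBalancer is a hypothetical class, plain round-robin is an assumed strategy, and a real client would learn about dead servers from the protocol instead of being told via markDown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: not the Hot Rod client implementation.
// Not thread-safe; a real client would need synchronization.
class FailoverBalancer {
    private final List<String> servers;
    private final Set<String> down = new HashSet<>();
    private int cursor = 0;

    FailoverBalancer(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    void markDown(String server) {
        down.add(server);
    }

    void markUp(String server) {
        down.remove(server);
    }

    // Round-robin over the server list, skipping servers marked as down.
    String next() {
        for (int tried = 0; tried < servers.size(); tried++) {
            String candidate = servers.get(cursor);
            cursor = (cursor + 1) % servers.size();
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no server available");
    }

    public static void main(String[] args) {
        FailoverBalancer balancer =
                new FailoverBalancer(Arrays.asList("srv-1", "srv-2", "srv-3"));
        System.out.println(balancer.next()); // prints srv-1
        System.out.println(balancer.next()); // prints srv-2
        balancer.markDown("srv-3");
        System.out.println(balancer.next()); // prints srv-1, srv-3 is skipped
    }
}
```

Smart routing would replace the round-robin pick with a lookup in the shared consistent hash, so the request goes straight to a server that owns the key.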
The first version of Infinispan is version 4.0.0. Manik does not want to bore the audience with why that is, but I'll tell you. JBoss Cache was his previous project and went up to 3.x.x. Infinispan is actually the successor of JBoss Cache, which is a tree cache (resulting in a lot of problems), so they started almost from scratch for Infinispan.
A nice note is that on a network failure between a couple of nodes Infinispan does not know what to do, but you can write your own callback handlers for that 🙂