Last Saturday I was at the Spring Italian Meeting in Cagliari, for an enjoyable meet-up with colleagues, friends, and passionate Spring users.
First of all, thanks to
Massimiliano Dessi', the man behind this event ;)
Then, if you missed my presentation about Grid Computing, Grid Gain, and the Spring Framework, here it is:
I really enjoyed presenting it, and I think the attendees enjoyed it too: probably because I gave three cool
Sourcesense hats to the people who asked me questions about the presentation topics ;)
Too bad I can't give you a Sourcesense hat, but I can write down some of the most interesting questions!
Enjoy!
Q: Splitting a task into jobs and sending them to grid nodes involves some overhead due to data transfer: do you have a threshold percentage that tells you when this overhead outweighs what you gain by parallelizing your jobs?
A: I don't believe in magic numbers :)
I'd rather answer your question in a different way: keep your overhead as low as possible by applying data affinity, that is, by keeping jobs and the data they need together, so that data transfers are minimized.
If you don't transfer any data, your overhead will be at its minimum.
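To make the idea a bit more concrete, here is a minimal sketch of affinity-based routing in plain Java. It is not the Grid Gain API: the `AffinityRouter` class, the node names, and the data keys are all made up for illustration. The only point is that data placement and job placement use the same function, so a job always lands on the node that already holds its data.

```java
import java.util.List;

// Hypothetical sketch of data affinity: NOT the Grid Gain API, just the routing idea.
final class AffinityRouter {
    private final List<String> nodes;

    AffinityRouter(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = List.copyOf(nodes);
    }

    // The same function decides where the data for a key is stored and
    // where the job working on that key is sent, so the two always meet
    // on the same node and no data has to travel with the job.
    String nodeFor(String dataKey) {
        int index = Math.floorMod(dataKey.hashCode(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        AffinityRouter router = new AffinityRouter(List.of("node-a", "node-b", "node-c"));
        for (String key : List.of("customer-1", "customer-2", "customer-3")) {
            System.out.println("data and job for " + key + " both go to " + router.nodeFor(key));
        }
    }
}
```

A simple modulo like this reshuffles most keys whenever a node joins or leaves, so take it as an illustration of the principle rather than a placement strategy to copy as is.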
Q: You talked about data affinity and data grid solutions: what about my database?
A: To really scale out your application, you must scale your full application stack: hence, your database must scale, too.
I think one of the most effective ways of making your database scale is to partition it, by splitting data across several instances and making every job access a different partition, depending on the data it needs.
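As a rough illustration of partitioning, the sketch below maps ranges of record ids to different database instances, so that each job can look up the one partition it needs. Everything here is an assumption made for the example: the `PartitionMap` class, the id ranges, and the JDBC URLs are hypothetical, and range-based splitting is just one of several possible partitioning schemes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of range-based database partitioning.
final class PartitionMap {
    // First id of each range -> JDBC URL of the instance holding that range.
    private final TreeMap<Long, String> partitions = new TreeMap<>();

    void addPartition(long firstId, String jdbcUrl) {
        partitions.put(firstId, jdbcUrl);
    }

    // Finds the partition whose range starts at or below the given id.
    String partitionFor(long id) {
        Map.Entry<Long, String> entry = partitions.floorEntry(id);
        if (entry == null) {
            throw new IllegalArgumentException("no partition covers id " + id);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        PartitionMap map = new PartitionMap();
        // Made-up URLs: two instances, each holding one range of ids.
        map.addPartition(0L, "jdbc:postgresql://db0.example/app");
        map.addPartition(1_000_000L, "jdbc:postgresql://db1.example/app");

        System.out.println(map.partitionFor(42L));        // jdbc:postgresql://db0.example/app
        System.out.println(map.partitionFor(1_500_000L)); // jdbc:postgresql://db1.example/app
    }
}
```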
Another strategy would be to use a master/replica scenario, where you have a master instance and several read-only replicas, which you map your jobs to for read-intensive operations.
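The master/replica scenario can be sketched in a similar way: writes always go to the master, while reads are spread over the replicas. The `ReplicaRouter` class and the instance names below are hypothetical, and round-robin is only one possible way of choosing a replica.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of master/replica routing with round-robin reads.
final class ReplicaRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReplicaRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = List.copyOf(replicas);
    }

    // Every write goes to the single master instance.
    String forWrite() {
        return master;
    }

    // Reads rotate over the read-only replicas; with no replicas
    // configured they fall back to the master.
    String forRead() {
        if (replicas.isEmpty()) {
            return master;
        }
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        ReplicaRouter router = new ReplicaRouter("master", List.of("replica-1", "replica-2"));
        System.out.println(router.forWrite()); // master
        System.out.println(router.forRead());  // replica-1
        System.out.println(router.forRead());  // replica-2
        System.out.println(router.forRead());  // replica-1
    }
}
```

Keep in mind that replicas usually lag a little behind the master, so a job that must read its own fresh writes may still need to go to the master.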
Q: Is there any Grid Gain success story? Do you really use it?
A: Yes, we do :)
We recently developed a custom Content Management System for the Italian Public Broadcasting Service, with extended capabilities for life cycle management and rule-based publishing of editorial content.
The publishing infrastructure is made up of a Grid Gain-based application managing the publishing cycle of all public web sites, ranging from the main web portal to all related web sites.
It was designed to scale out the publication process linearly from one to a hundred sites by distributing publishing operations across grid nodes, each capable of publishing the content of one or more sites independently of the others. This means that, with as many physical nodes as there are sites to publish, the whole publication process takes the same time as if there were just one site.
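The arithmetic behind that last claim can be sketched with an idealized model. The assumptions are mine, not measurements from the real system: every site takes the same time to publish, each node publishes one site at a time, and distribution overhead is ignored. The class name and the numbers are invented for the example.

```java
// Hypothetical, idealized model of the publication time; not real measurements.
final class PublishingEstimate {

    // Sites are published in "rounds": in each round every node handles one site.
    static long wallClockMinutes(int sites, int nodes, long minutesPerSite) {
        if (sites < 0 || nodes < 1) {
            throw new IllegalArgumentException("need sites >= 0 and nodes >= 1");
        }
        long rounds = ((long) sites + nodes - 1) / nodes; // ceiling of sites / nodes
        return rounds * minutesPerSite;
    }

    public static void main(String[] args) {
        // Made-up figure: 3 minutes to publish a single site.
        System.out.println(wallClockMinutes(1, 1, 3));     // 3
        System.out.println(wallClockMinutes(100, 1, 3));   // 300
        System.out.println(wallClockMinutes(100, 50, 3));  // 6
        System.out.println(wallClockMinutes(100, 100, 3)); // 3
    }
}
```

With as many nodes as sites there is a single round, so the estimate for a hundred sites equals the estimate for one.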