Sunday, October 28, 2012

Cloud and Shannon's Limit

I've been on the road (or in the air) so much over the past few months that some things I thought I'd blogged about turn out to have been either dreams or to have only made it onto Twitter. One of them is Shannon's Limit and its impact on the Cloud, which I've been discussing in presentations for about 18 months. There's a lot of information out there on Shannon's Limit, but I first came across it in the mid-1980s as part of my physics undergraduate degree. Unfortunately the book I learned from is no longer published. A couple of texts are accessible via Google, but I can't really recommend them (they may be good, but I simply don't have the context to say so with certainty). However, if you're looking for a very simple, yet accurate, discussion of what Shannon's Limit says, it can be found here.

So what has this got to do with the Cloud? Put simply, Shannon's Limit shows that the Cloud (public or private) only really works well today because not everyone is using it. Shannon's Limit puts a hard ceiling on how much information a channel can carry for a given bandwidth and signal-to-noise ratio, and every user of that channel shares the ceiling. Bandwidth and capacity are therefore limited by the properties of the media we use to communicate between clients and services, no matter where those services reside. For cloud, the limitation is the physical interconnects over which we try to route our interactions and data. Unfortunately, no matter how quickly your cloud provider can improve its back-end equipment, the network to and from those cloud servers will rarely change or improve, and when it does, it will happen at comparatively glacial speeds.
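
Shannon's Limit is usually stated through the Shannon-Hartley theorem, C = B log2(1 + S/N). Here C is the maximum error-free bit rate, B the bandwidth in hertz and S/N the signal-to-noise power ratio. The sketch below computes that ceiling for a hypothetical link. It then assumes, as a simplification, that active users split the link evenly, to show how quickly each user's slice shrinks. The 20 MHz and 30 dB figures are purely illustrative, not measurements of any real network.

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley channel capacity in bits per second: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def db_to_linear(snr_db: float) -> float:
    """Convert a signal-to-noise ratio in decibels to a linear power ratio."""
    return 10 ** (snr_db / 10)

def per_user_share(capacity_bps: float, active_users: int) -> float:
    """Capacity per user if the link is shared evenly (a simplifying assumption)."""
    return capacity_bps / active_users

if __name__ == "__main__":
    # Hypothetical link: 20 MHz of bandwidth at a 30 dB signal-to-noise ratio.
    capacity = shannon_capacity(20e6, db_to_linear(30))
    print(f"Link capacity: {capacity / 1e6:.1f} Mbit/s")
    for users in (1, 10, 100, 1000):
        share = per_user_share(capacity, users)
        print(f"{users:>5} users -> {share / 1e6:.3f} Mbit/s each")
```

Improving the servers at either end changes none of these numbers. Only more bandwidth, a better signal-to-noise ratio, or fewer bits on the wire do.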

What this means is that, for the cloud to continue to work and grow with the increasing number of people who want to use it, we need more intelligence in the connections between (and including) the client and the service (or peers). This includes not just gateways and routers but, probably more importantly, mobile devices. Many people now use mobile hardware (phones, pads, etc.) to connect to cloud services, so adding intelligence there makes a lot of sense.
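
As one hypothetical example of "intelligence" at the client end, a device can cache what it has already fetched so that repeated reads never touch the network. The `EdgeCache` class, its time-to-live policy and the injectable clock are all assumptions made for illustration. They are not taken from any particular product or library.

```python
import time
from typing import Callable, Dict, Tuple

class EdgeCache:
    """Hypothetical client-side cache: serve repeat reads locally for ttl_seconds
    before going back over the network."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # performs the real (expensive) network call
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so behaviour can be tested deterministically
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.network_calls = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # fresh local copy: no bandwidth used
        value = self._fetch(key)   # miss or stale entry: go to the network
        self.network_calls += 1
        self._entries[key] = (now, value)
        return value

if __name__ == "__main__":
    cache = EdgeCache(fetch=lambda k: f"value-for-{k}", ttl_seconds=60)
    for _ in range(100):
        cache.get("profile")
    print(cache.network_calls)  # 1: the other 99 reads were served locally
```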

Mobile also has another role to play in the evolution of the cloud. As I've said before, and presented elsewhere, ubiquitous computing is a reality today. I remember back in 2000 when we (HP) and IBM were talking about it, but back then we were too early. Today there are billions of processors, hundreds of millions of pads, 6 billion phones, etc. Most of these devices are networked. Most of them are more powerful than the machines we used a decade ago for developing software or running critical services. And many of them are idle most of the time! It is this group of processors that is the true cloud, and it needs to be encompassed within anything we do in the future around "cloud".

Friday, October 26, 2012

NoSQL and transactions


I've been thinking about ACID and non-ACID transactions for a number of years, and I've spent almost as long working in industry and on standards, trying to evolve them to cater for environments where strict ACID transactions are too much. Throughout all of this I've been convinced that transactions are the right abstraction for many fault tolerance, reliability and consistency requirements. Over the years transactions have received bad press in some quarters. Sometimes it comes from people who don't understand them, overuse them, or don't really want to have to implement them. At times various waves of technology have either helped or hindered the adoption of transactions outside the traditional database. For instance, some NoSQL efforts eschew transactions entirely (ACID and extended), citing CAP when it's not always right to do so.

I think a good transactions implementation should be at the core of all middleware platforms and databases. If it's well thought out, it won't add overhead when it's not needed, yet it will provide obvious benefits when it is. It should be able to offer a wide range of transaction models (well, at least more than one) and a model that makes it easier to reason about the correctness and consistency of applications and services developed with it.
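
To make "more than one transaction model" a little more concrete, here is a hypothetical in-memory sketch of two. The first is all-or-nothing: writes are buffered and only applied on commit. The second is a compensation-based extended model in the style of sagas: each step takes effect immediately and registers an action that undoes it if the overall unit later fails. The class names are invented for illustration, and neither class provides isolation or durability.

```python
from typing import Callable, Dict, List

class AtomicTransaction:
    """All-or-nothing (atomicity only): buffer writes, apply them together on commit."""

    def __init__(self, store: Dict[str, int]):
        self._store = store
        self._pending: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self._pending[key] = value          # not visible in the store yet

    def commit(self) -> None:
        self._store.update(self._pending)   # all buffered writes applied together
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()               # nothing was applied, so nothing to undo


class CompensatingTransaction:
    """Extended model: steps take effect immediately; failure runs compensations
    in reverse order of the steps they undo."""

    def __init__(self):
        self._compensations: List[Callable[[], None]] = []

    def step(self, action: Callable[[], None], compensation: Callable[[], None]) -> None:
        action()
        self._compensations.append(compensation)

    def commit(self) -> None:
        self._compensations.clear()         # work is final; forget how to undo it

    def rollback(self) -> None:
        while self._compensations:
            self._compensations.pop()()     # undo the most recent step first

if __name__ == "__main__":
    store = {"a": 1}
    tx = AtomicTransaction(store)
    tx.write("a", 2)
    tx.rollback()
    print(store)  # {'a': 1}

    log: List[str] = []
    saga = CompensatingTransaction()
    saga.step(lambda: log.append("reserve"), lambda: log.append("cancel reservation"))
    saga.step(lambda: log.append("charge"), lambda: log.append("refund"))
    saga.rollback()
    print(log)  # ['reserve', 'charge', 'refund', 'cancel reservation']
```

The trade-off shows even at this size. The atomic model never exposes partial results but must hold everything until commit. The compensating model lets intermediate effects be seen and relies on every step having a meaningful undo.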

At the moment most NoSQL or BigData solutions either ignore transactions or support ACID or limited ACID (only within the scope of a single instance). But it's nice to see a change occurring, such as with Google's Spanner work. As they say in the paper: "We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions."
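
Going beyond single-instance ACID means coordinating several instances, and the classic building block for that is two-phase commit. Spanner, for example, runs two-phase commit across its Paxos-replicated groups. What follows is only a minimal in-memory sketch of the protocol's shape, not how Spanner or any particular NoSQL store implements it. It has no durable logging, locking, timeouts or coordinator recovery. The `Participant` class and its `will_vote_yes` flag are hypothetical test hooks.

```python
from typing import Dict

class Participant:
    """Hypothetical stand-in for one store instance that is only atomic locally."""

    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.data: Dict[str, str] = {}
        self._staged: Dict[str, str] = {}
        self._will_vote_yes = will_vote_yes

    def prepare(self, writes: Dict[str, str]) -> bool:
        # Phase 1: stage the writes and vote. A real participant would also
        # force the staged state to stable storage and hold locks first.
        if not self._will_vote_yes:
            return False
        self._staged = dict(writes)
        return True

    def commit(self) -> None:
        # Phase 2, commit outcome: make the staged writes visible.
        self.data.update(self._staged)
        self._staged = {}

    def abort(self) -> None:
        # Phase 2, abort outcome: discard anything staged.
        self._staged = {}


def two_phase_commit(writes_by_participant: Dict[Participant, Dict[str, str]]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    for participant, writes in writes_by_participant.items():
        if not participant.prepare(writes):
            for p in writes_by_participant:
                p.abort()                   # any "no" vote aborts the whole transaction
            return False
    for participant in writes_by_participant:
        participant.commit()
    return True

if __name__ == "__main__":
    a, b = Participant("node-a"), Participant("node-b")
    print(two_phase_commit({a: {"x": "1"}, b: {"y": "2"}}))  # True
    c = Participant("node-c", will_vote_yes=False)
    print(two_phase_commit({a: {"x": "3"}, c: {"z": "4"}}))  # False
    print(a.data)  # {'x': '1'}: node-a's staged write was discarded
```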

And whilst I agree with my long-time friend, colleague and co-author on RDBMS versus the efficacy of new approaches, I don't think transactions should be confined to the history books or to traditional back-end data stores. More research and development needs to happen, but transactions (ACID and extended) should form a core component of this new infrastructure. Preconceived notions based on overuse or misunderstanding of transactions shouldn't dissuade people from using them in future where it really makes sense, which I obviously think it does.