Wednesday, September 08, 2010

Cloud and multitenancy

Over the past year or so I've been doing a lot of thinking about various aspects of cloud. One issue I keep coming back to is multitenancy. Many years ago, when I was formulating the categorization of transactional replication strategies, we found that splitting the object server (methods) from the state, and allowing each to be replicated independently, made it possible to express the full range of replica consistency protocols, from passive through active. Whether to share the executable or the state depends on factors such as determinism of execution (passive is the only option in some cases) and speed of fail-over (active tends to do much better).

So what has this to do with multitenancy? Well, I've been thinking about what it means to have a true multitenant application and hence PaaS. Some people seem to suggest that multitenancy is somehow new, complex or the domain of a new breed of engineers and vendors. But if you think about it, we've been using multitenant environments for years. Your operating system is multitenant, with multiple applications resident and often running concurrently, perhaps modifying shared data structures as a result. Even your modest Web server can be considered multitenant. And there is a range of strategies for achieving multitenancy based on a similar server/state split. Therefore, whether you're a SaaS implementer or a PaaS architect, the same factors come into play. And critically for SaaS implementers, if your PaaS doesn't support these things then you're going to have to!

For PaaS, there are really three components that can be played with when it comes to multitenancy and the deployment of applications and data. These are the VM, the application server (the container of business logic) and the data (database, file system etc.). For instance, you could have each tenant in a separate application server but running on the same VM, with data split between different database instances. Or you could have each tenant in the same application server on the same VM, with data in the same database, perhaps split across different tables. (Other options and configurations exist, of course.)
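To make the two configurations above a little more concrete, here is a minimal sketch in Python. It is only an illustration: the `Placement` record, the two strategy functions and all of the names (`vm-1`, `appserver-shared`, `orders` and so on) are hypothetical and not taken from any particular PaaS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one tenant's logic and data live (all names are illustrative)."""
    tenant: str
    vm: str
    app_server: str
    database: str
    table: str


def isolated_servers(tenants, vm="vm-1"):
    """One application server and one database instance per tenant, on a shared VM."""
    return [Placement(t, vm, f"appserver-{t}", f"db-{t}", "orders") for t in tenants]


def shared_everything(tenants, vm="vm-1"):
    """One application server and one database for all tenants, split by table."""
    return [
        Placement(t, vm, "appserver-shared", "db-shared", f"orders_{t}")
        for t in tenants
    ]


def shared_components(placements):
    """Return the component kinds that all of the given tenants have in common."""
    kinds = ("vm", "app_server", "database")
    return [k for k in kinds if len({getattr(p, k) for p in placements}) == 1]


if __name__ == "__main__":
    tenants = ["acme", "globex"]
    print(shared_components(isolated_servers(tenants)))
    print(shared_components(shared_everything(tenants)))
```

The point of the sketch is simply that the same tenants can be mapped onto the three components in different ways, and that what ends up shared (and so what has to be protected) falls out of that mapping.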

Now all of these various combinations have more or less been around before, as I said earlier. The big problem though has been ease of use. I've written enough Unix kernel services, for instance, to know most of the ins-and-outs of sharing state without causing the operating system to crash, and the best way to use pthreads or memory-mapped files in a shared environment, but it's not always intuitive or based on well-documented processes. This is precisely where a PaaS fits in: it must support the widest range of approaches, and in a way that does not require the SaaS developer to worry about them. We can learn a lot from what's been done in the past around the intricacies of achieving multitenancy, but we do need to make it far more consumable than has perhaps been the case so far.

Tuesday, September 07, 2010

Interesting new paper on transactions

I met Daniel Abadi at HPTS 2009 last year, where he presented on HadoopDB. It was a very interesting talk and he's a good presenter. So it's good to see his latest paper on determinism and transactions in database systems. It's a good read, particularly for me, because it mixes the two areas that have always been my interests: transactions and replication. The authors have some good things to say about NoSQL, specifically around relaxing ACID semantics, the trade-offs that incurs and how there is perhaps an alternative. Again, this is an area I've had a bit to do with over the past few decades. I'll have to think about how and whether this is applicable to some of the work we're doing at the moment in large scale data grids.