Analytics as a Service
Written by Oliver Ratzesberger
Monday, 21 April 2008 00:00
Turning utility computing into a service model for analytics.
With the needs of Enterprise Analytics growing at ever increasing speed, it has become clear that traditional hub-and-spoke architectures are in no way able to sustain the demands driven by increasingly complex business analytics. As with any proliferation of systems, the overhead of managing, maintaining and developing trees of increasingly complex dependencies quickly outpaces an organization's ability to deal with its challenges. What may work well at first turns into a real evolution nightmare.
It rapidly becomes more and more difficult to react to ongoing changes in business demands and growth. For many years, some of the largest corporations in the world have realized this and have focused on re-integrating islands and stovepipes of information into much more centralized analytical infrastructures.
Quite often, however, this is also seen as a step towards reducing flexibility - in terms of time to market - for individual groups to respond quickly to rapidly changing business demands. It's a very typical love-hate relationship with these so-called departmental systems or data marts.
They are great for a localized team that wants to 'bang' out new capabilities, but they become a data integration nightmare with huge TCO (Total Cost of Ownership) implications that are quite often not visible to the overall organization.
Various industry studies have estimated the cost per departmental system to start at a minimum of $500k over a typical 3-year lifetime, and as these systems last longer or grow over time, that minimum cost only keeps going up. It is not only the cheap hardware and a few software licenses that get you there; it is the operational cost of building and maintaining them, the source system and networking overhead to move data into them, and the incremental headcounts and services here and there that quickly add up. Not to mention Disaster Recovery (DR) and Business Continuity requirements: are you sure all those departmental systems have a DR plan or are backed up every week?
Most organizations might argue that you will only build a few of them - in the beginning. Even 5 of them are too many, and their aggregate TCO quickly outpaces the cost of more centralized implementations. Yet many solution providers still propose this architecture as a covert means of overcoming the scalability and mixed workload problems of more centralized systems.
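The arithmetic behind that claim can be sketched in a few lines. The $500k-per-mart figure comes from the studies cited above; the per-slice cost of a shared central platform is a purely hypothetical number chosen for illustration.

```python
# Back-of-the-envelope 3-year TCO comparison. MART_TCO_3YR is the article's
# cited minimum; CENTRAL_SLICE_TCO_3YR is a hypothetical illustrative figure.
MART_TCO_3YR = 500_000
CENTRAL_SLICE_TCO_3YR = 120_000  # assumed cost of one virtual slice

def aggregate_tco(n_systems: int, per_system: int) -> int:
    """Total 3-year cost of n independent systems at a given unit cost."""
    return n_systems * per_system

marts = aggregate_tco(5, MART_TCO_3YR)            # 2,500,000
slices = aggregate_tco(5, CENTRAL_SLICE_TCO_3YR)  # 600,000
print(f"5 data marts:     ${marts:,}")
print(f"5 virtual slices: ${slices:,}")
```

Even with generous assumptions for the central platform, five departmental systems already cost millions over their lifetime.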
So how do you make the best out of both worlds and deliver the ultimate flexibility with the lowest possible cost of ownership?
The answer is simple: Analytics as a Service - referred to by some as Agile Analytics.
Think of it as providing utility computing for analytics to anyone within or even outside your organization. It is not to be confused with Software as a Service (SaaS) - although some database vendors do offer small to medium scale warehousing as SaaS. Analytics as a Service is not limited to a single database or piece of software; it is the ability to turn a general purpose analytical platform into a shared utility for an enterprise.
'Bring us your data feeds or streams and analyze them in any way shape or form you can imagine.'
It's an advanced self-service model that allows groups to leverage virtual systems for their individual data processing needs, at a cost that undercuts any data mart or cascaded systems implementation. In addition, the biggest advantage to the user groups of such services is that ALL other organizational data is automatically at their immediate disposal.
One could think of it as enriching an enterprise infrastructure with more departmental data and the ability to combine that data with anything already processed in this central infrastructure.
You can start small, providing web services that allow the user base to bring in whatever data sources they require, and then grow those services into full access to a virtual slice of the infrastructure - whether programmatic through any programming language, or via SQL, Business Intelligence or Data Mining tools.
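One way to picture the self-service entry point is a shared catalog that any team can register feeds into, with everything registered becoming visible across the platform. This is a minimal sketch under assumed semantics; the class and field names are hypothetical, not the author's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DataFeed:
    owner: str        # team that registered the feed
    name: str
    fmt: str          # e.g. "csv", "json", "stream"
    registered: datetime = field(default_factory=datetime.now)

class SelfServiceCatalog:
    """Hypothetical registry behind the self-service web services:
    teams bring their own feeds, and every registered feed is
    discoverable by all other virtual slices."""

    def __init__(self):
        self._feeds: dict[str, DataFeed] = {}

    def register(self, owner: str, name: str, fmt: str) -> DataFeed:
        feed = DataFeed(owner, name, fmt)
        self._feeds[f"{owner}.{name}"] = feed
        return feed

    def visible_feeds(self) -> list[str]:
        # ALL organizational data is at every group's disposal.
        return sorted(self._feeds)

catalog = SelfServiceCatalog()
catalog.register("marketing", "clickstream", "stream")
catalog.register("finance", "settlements", "csv")
print(catalog.visible_feeds())  # ['finance.settlements', 'marketing.clickstream']
```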
Of course Information Security, Data Integrity, Resource Management and other aspects cannot be ignored, but practice has shown that there are rather simple solutions that can be implemented to help automate these important dimensions of analytical processing. This is not the place to elaborate on those solutions in detail; the focus here is on the virtualization of analytical services.
The real beauty of this concept is that the more virtual analytical systems you deploy, the better the overall scalability and the higher the cost savings. With dozens or hundreds of virtual systems, chances are that more and more of them perform their processing at different times and frequencies - one of the main selling points of virtualization in the first place.
And if there is ever a need to turn around a mission critical analysis for the leadership of your organization, all the processing resources are at your disposal and you can apply more and more of them to deliver results much faster - a completely impossible task for physically separated instances of analytical systems. For example: think of it as a temporary 500% upgrade to one of your virtual systems that can be redeployed in seconds. Try that with a bunch of local systems or data marts.
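A share-based pool makes the "temporary 500% upgrade" concrete: boost one virtual system's allocation out of spare capacity, then hand it back. The share model and all names here are assumptions for illustration, not the author's platform.

```python
class ResourcePool:
    """Sketch of share-based capacity for virtual analytical systems
    (hypothetical model; real platforms differ)."""

    def __init__(self, total_shares: int):
        self.total = total_shares
        self.alloc: dict[str, int] = {}

    def assign(self, system: str, shares: int) -> None:
        in_use = sum(self.alloc.values()) - self.alloc.get(system, 0)
        if in_use + shares > self.total:
            raise ValueError("pool exhausted")
        self.alloc[system] = shares

    def boost(self, system: str, factor: float) -> int:
        """Temporarily multiply a system's shares, e.g. factor=5.0
        for the 500% upgrade; returns the baseline to restore later."""
        baseline = self.alloc[system]
        self.assign(system, int(baseline * factor))
        return baseline

pool = ResourcePool(total_shares=1000)
pool.assign("finance", 50)
baseline = pool.boost("finance", 5.0)  # mission critical run: 250 shares
pool.assign("finance", baseline)       # redeployed back in seconds
```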
Virtualizing database by database, however, is not the solution to the problem. It would still require your analytics customers to source data from other existing systems. An enterprise solution is required that can scale a single central instance beyond the sum of all the individual virtual slices required. There are not many players in this area, but you certainly have your choices.
It also does not stop at large databases. More and more of today's analytical data is highly unstructured or morphing so fast that designing structures for it is nearly impossible, at least at the scale we operate. This calls for a very close integration of relational and non-relational systems.
Back to Analytics as a Service.
Imagine a central infrastructure with hundreds of virtual use cases next to each other, sharing virtual resources that allow an organization to leverage and build upon existing data feeds and streams and shorten the time to market for any future use cases.
It's all about picking the right dimensions to decentralize, while leveraging core infrastructure at the lowest possible cost of ownership. With today's technology you don't need thousands of servers to build out such an infrastructure. Most companies will require surprisingly small footprints to implement a virtual analytics service model.
The Analytics Service Provider - internal or external - will take care of all the enterprise requirements that often get overlooked with localized departmental systems. Simple chargeback models allow for funding from the individual groups.
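One simple chargeback scheme, sketched here under assumed numbers, splits the platform's monthly cost across groups in proportion to their measured consumption. Both the cost figure and the usage units are hypothetical.

```python
def chargeback(monthly_cost: float, usage: dict[str, float]) -> dict[str, float]:
    """Usage-proportional chargeback: each group funds a share of the
    platform cost proportional to its consumption (illustrative model)."""
    total = sum(usage.values())
    return {group: round(monthly_cost * u / total, 2)
            for group, u in usage.items()}

# Hypothetical month: $90k platform cost, usage in arbitrary consumption units.
bills = chargeback(90_000, {"marketing": 400, "finance": 350, "risk": 250})
print(bills)  # {'marketing': 36000.0, 'finance': 31500.0, 'risk': 22500.0}
```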
You might even consider outsourcing certain aspects of running such a centralized 'Analytics as a Service' infrastructure.
All of this is only possible if you are able to implement a highly resilient infrastructure that delivers availability and can handle virtually any workload - known or unknown, optimized or not, even and especially 'bad' workload. The platform has to be able to tightly control dozens to hundreds of virtual partitions, with variable, workload-dependent prioritization schemes, hard or soft limits, any form of mixed workload, and batch and streaming data feeds.
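The hard/soft-limit idea can be sketched as a small admission policy: requests are admitted in priority order, a partition over its soft limit drops behind everyone else, and its hard limit is never exceeded. This is an illustrative scheme with made-up names, not a description of any specific product's workload manager.

```python
class Partition:
    """One virtual partition with a priority and soft/hard concurrency limits."""
    def __init__(self, name: str, priority: int, soft_limit: int, hard_limit: int):
        self.name, self.priority = name, priority
        self.soft_limit, self.hard_limit = soft_limit, hard_limit
        self.running = 0  # requests currently admitted

def admit(partitions: list[Partition], requests: list[str]) -> list[str]:
    """Admit one batch of requests (given as partition names)."""
    by_name = {p.name: p for p in partitions}

    def key(name: str):
        p = by_name[name]
        # Over the soft limit -> demoted behind all in-limit partitions;
        # otherwise highest priority first.
        return (p.running >= p.soft_limit, -p.priority)

    admitted = []
    for name in sorted(requests, key=key):
        p = by_name[name]
        if p.running < p.hard_limit:  # the hard limit is never exceeded
            p.running += 1
            admitted.append(name)
    return admitted

parts = [Partition("finance", priority=10, soft_limit=1, hard_limit=2),
         Partition("adhoc", priority=1, soft_limit=2, hard_limit=3)]
print(admit(parts, ["adhoc", "finance", "finance", "finance"]))
# ['finance', 'finance', 'adhoc'] - third finance request hits the hard limit
```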
On any given day our systems process models and queries that have never been seen before. With millions of requests served daily, it is impossible to have every request reviewed by a systems expert or DBA. We estimate that less than 1% of the workload gets reviewed by a human being at some stage of its processing. The platform has to be self-optimizing, requiring minimal manual intervention, and must provide the capabilities to automate the tasks needed to manage the rapidly changing workload.
For us, Analytics as a Service is becoming more and more a utility computing platform that enables agile prototyping for the business. With a Service Oriented Architecture, we deploy our capabilities throughout the enterprise while maintaining consistency, availability, security and other important factors at never before seen scale: tens of petabytes a day.
As a wise man once said: 'Learn to fail fast.' (Jack Welch)
And that is what this is all about: taking the risk out of trying something new. Analytics is about exploring the unknown, and chances are you will do something wrong. Better to know after a few days or maybe weeks than to spend months or more on a monster project that fails miserably.
Today we run 45+ virtual analytical systems - a utility that we provide to our business customers at no charge. Aside from the training we deliver to new teams joining, we give all business teams total freedom of choice: bring whatever data you need, use whatever tools you prefer, and analyze in whatever way, shape or form you like. We take care of the infrastructure and provide high availability; the business gets to prototype rapidly and develop new capabilities with never before seen agility.
Sure, you need some boundaries and rules around this infrastructure. You don't want these prototypes to become semi-production systems in the mid to long term. Therefore, no virtual analytics slice can ever share its private data; if there is a need for that, it warrants promoting the prototype into a true production subject area. There are no SLAs for any of this workload: as this is rapid prototyping, the business unit leveraging it cannot become dependent on the data without promoting it into a production subject area. There are also time and retention limits - after 3-6 months the prototype should become obsolete. And once in a while you might want to make them unavailable, for example during outages and system upgrades.
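The time-and-retention rule above is straightforward to enforce mechanically. A minimal sketch, assuming a 6-month window at the top of the article's 3-6 month range; the names and window are illustrative only.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=180)  # ~6 months, an assumed policy window

def is_expired(created: date, today: date,
               retention: timedelta = RETENTION) -> bool:
    """A virtual slice past its retention window should be promoted
    into a production subject area or decommissioned."""
    return today - created > retention

print(is_expired(date(2008, 1, 1), date(2008, 8, 1)))  # True: promote or retire
print(is_expired(date(2008, 6, 1), date(2008, 8, 1)))  # False: still in window
```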
This is not about making things difficult for the users, but about providing the right incentives to keep a balance between openness and production rigor.
In the end, the benefit to the overall organization is exceeding our wildest dreams. We prototype more and faster than ever, data marts have virtually disappeared (who would argue with a highly available supercomputer that you can get for free?), and development cycles are shrinking, as these prototypes are the perfect starting point for a production version of a new subject area on the Enterprise Data Warehouse (EDW).
Our development teams can quickly take a prototype and deliver a fully integrated production version using our tools and methodologies of choice and following our preferred production development cycles.
Analytics as a Service - or PET, as we internally call these systems - has become THE win-win for the organization. We will talk about the needs of workload management in a separate thread - let me just assure you that it is much easier than anybody had expected.
Last Updated on Saturday, 12 February 2011 08:02