Sep 03, 2011
Analytics as a Service - Social SQL
Posted by Oliver Ratzesberger in social, bigdata, agile
Over the past 12 months we have moved Analytics as a Service (A3S) to new levels of maturity. For the very first time we have a single interface for all of our A3S services: the DataHub. I recently presented an overview and demo to a group of industry analysts, and the feedback was overwhelmingly positive.
The combination of social features, private cloud and Analytics as a Service, delivered through a social portal built on open source (Joomla + Kunena), is turning into a killer application for the global enterprise. Never before have we seen agile and community, BI and Analytics brought together in a fully social experience that allows users, analysts, scientists, executives, PMs (pretty much anybody in the organization) to follow each other, link up, like, create groups, publish Analytics, and discover new data, new analytics and new insights on the fly.
Search and metadata are great, as long as you know what to look for. In today's world of BigData this is becoming increasingly complex and overwhelming, and in some cases nearly impossible. Sure, if you are in a tiny startup where everybody knows everybody, it's usually very easy to stay agile: you just walk up to your neighbor, colleague or business partner and get what you need. Now think 1000x bigger. Think of companies with thousands, tens or even hundreds of thousands of employees. Traditional BI and Data Warehousing are just not going to cut it there (data-marts and data-silos aside).
That is why we believe A3S and a social service like the DataHub are the next evolution of Analytics in the large enterprise: service-enabling hundreds or thousands of virtual sandboxes and virtual data-marts through a social layer of visualization and collaborative business intelligence. As part of that we built front-end services for virtual data-mart management (provisioning, access controls), for point-and-click data upload from anything local (Excel, Access, CSV, to name a few), and for large-scale data movement on demand (pick source and target systems and move serious amounts of data through a simple service).
The one area we have not built out yet, but are about to, is what we sometimes call social SQL. Not sure the name will stick, but that is beside the point.
Let me describe the vision for the next A3S service. Imagine you have captured billions of analytical queries: basically every single request made against all of our data over the past several years, from small to very large. Now add the detailed execution plan of every one of these jobs. Of course we also know who ran them and when.
Now take all of that information, add a dose of machine learning, and score and rate any new request people are working on, while they are working on it. The emphasis here is on WHILE they are working on it.
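To make "score and rate a request while it is being written" a little more concrete, here is a minimal Python sketch. It swaps the machine learning the post envisions for a much simpler nearest-neighbor heuristic: token-level Jaccard similarity between the draft SQL and past queries, followed by a similarity-weighted average of the neighbors' recorded cost. The `HistoricalQuery` record, its `cpu_seconds` field, the thresholds and the `feedback` helper are all hypothetical and not part of any DataHub API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class HistoricalQuery:
    # Hypothetical query-log entry: SQL text, who ran it, and the measured
    # cost (CPU seconds) taken from its execution record.
    sql: str
    user: str
    cpu_seconds: float


def tokens(sql: str) -> Set[str]:
    """Lower-cased identifier/keyword tokens; a crude stand-in for a SQL parser."""
    return set(re.findall(r"[a-z_][a-z0-9_]*", sql.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_cost(draft_sql: str, history: List[HistoricalQuery],
                  k: int = 3, min_similarity: float = 0.5) -> Optional[float]:
    """Similarity-weighted average cost of the k most similar past queries."""
    draft = tokens(draft_sql)
    neighbors = sorted(
        ((jaccard(draft, tokens(q.sql)), q) for q in history),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Ignore weak matches so one unrelated expensive query cannot dominate.
    neighbors = [(sim, q) for sim, q in neighbors if sim >= min_similarity]
    if not neighbors:
        return None  # no sufficiently similar history, no basis for an estimate
    total = sum(sim for sim, _ in neighbors)
    return sum(sim * q.cpu_seconds for sim, q in neighbors) / total


def feedback(draft_sql: str, history: List[HistoricalQuery],
             threshold: float = 60.0) -> str:
    """Message an editor could show before the user submits the query."""
    cost = estimate_cost(draft_sql, history)
    if cost is None:
        return "no similar past queries"
    if cost > threshold:
        return f"warning: similar past queries averaged {cost:.0f} CPU seconds"
    return "ok"
```

A real engine would learn from execution plans rather than raw tokens, but the shape is the same: compare the draft against history and surface the result before the query runs.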
The tool itself is an online SQL tool that lets people search for queries, metadata, and who is using what in our data, combined with a real-time recommendation engine that proposes other data elements you might not be aware of. Add in your social network of analysts you are working with, and you start merchandising data elements you did not even know existed.
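The recommendation idea can be sketched with plain co-occurrence counting over the query log, with a boost for queries run by people in your network. This is a minimal Python illustration, not the DataHub's recommendation engine; the log format (user, tables referenced), the `recommend` function and the `network_boost` weight are assumptions made for the example.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical query-log extract: (user who ran the query, tables it referenced).
QueryLog = List[Tuple[str, Set[str]]]


def recommend(draft_tables: Set[str], log: QueryLog, network: Set[str],
              top_n: int = 3, network_boost: float = 2.0) -> List[str]:
    """Suggest tables that past queries used alongside the draft's tables.

    Each past query sharing at least one table with the draft votes for the
    tables the draft does not use yet; votes from people in the user's
    network count `network_boost` times as much.
    """
    scores: Counter = Counter()
    for user, tables in log:
        if not draft_tables & tables:
            continue  # unrelated query, no signal
        weight = network_boost if user in network else 1.0
        for table in tables - draft_tables:
            scores[table] += weight
    return [table for table, _ in scores.most_common(top_n)]
```

A production engine would extract tables from parsed execution plans rather than rely on a precomputed set, but counting "what else appears next to what I am already using" is the core of the suggestion.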
You end up with an online SQL tool on steroids, with all the collective history of the enterprise behind it. It "knows" what is easy and simple, it "knows" what is potentially very expensive to run, and it can provide appropriate feedback to the user before she or he even fires off the next request. Now couple all of that with scheduling (run that query for me every Monday morning) and dependency tracking (run it after several important subject areas have completed), and all of a sudden you get a smart workflow engine that allows you to populate your VDMs (virtual data-marts) or analytical sandboxes with data.
Add on top of all that progressive execution of statements, and you have solved the painful process of initial loads, or of massive query management. Think of progressive execution as a way to specify one chunk of the problem (e.g. what the query would look like for a single day of data) and hand it to a background scheduler that will, for example, go back in time and run that chunk for each day of the past several years of data you have stored. Next morning the analyst comes back, finds 20% of the massive analysis completed and starts looking into the results; she or he can halt or pause further execution and make adjustments if needed.
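Progressive execution can be sketched as a job that holds a one-day query template, walks backwards through the date range, runs a bounded number of chunks per scheduler window, and can be paused at any time. The Python below is a hypothetical illustration: the `events` table, the template, `ProgressiveJob` and its `budget` parameter are assumptions, and `execute` stands in for whatever actually submits SQL to the warehouse.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Iterator, Set

# Hypothetical one-day template: the analyst only writes the query for a single day.
TEMPLATE = (
    "SELECT event_date, COUNT(*) AS events FROM events "
    "WHERE event_date = DATE '{day}' GROUP BY event_date"
)


def days_backwards(start: date, end: date) -> Iterator[date]:
    """Yield each day from end back to start, newest first ("go back in time")."""
    day = end
    while day >= start:
        yield day
        day -= timedelta(days=1)


@dataclass
class ProgressiveJob:
    template: str
    start: date
    end: date
    done: Set[date] = field(default_factory=set)
    paused: bool = False

    def total_chunks(self) -> int:
        return (self.end - self.start).days + 1

    def progress(self) -> float:
        return len(self.done) / self.total_chunks()

    def run(self, execute: Callable[[str], None], budget: int) -> None:
        """Run at most `budget` pending chunks, e.g. one overnight scheduler window."""
        for day in days_backwards(self.start, self.end):
            if self.paused or budget <= 0:
                return
            if day in self.done:
                continue  # already executed in an earlier window
            execute(self.template.format(day=day.isoformat()))
            self.done.add(day)
            budget -= 1
```

With a 10-day range and a budget of 2 chunks, the first run leaves the job at 20% complete, newest days first, which mirrors the "next morning" scenario in the post. Setting `paused = True` stops further chunks without losing the record of what already ran.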
You have arrived at the next generation of SQL tooling, which allows your organization to become more independent of IT and more self-sufficient than ever before.
Last but not least, we are about to capture the first n (e.g. 200) result records of every SQL query ever run. Imagine what search looks like when you not only show who ran what query, but also show part of the result right next to it, together with the time it was last executed. You will know what to expect from a particular request; you might even find the result you were looking for right there. Of course this needs to be coupled with proper authorization management and filtering, so people cannot see what they should not see.
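A minimal sketch of capturing the first n rows of each query and filtering them at search time might look like the following. The `QuerySnapshot` structure, the table-level permission model (`allowed_tables`) and the choice to show the query but withhold its preview rows are all assumptions for illustration; a stricter policy could hide the whole hit.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Set, Tuple

SAMPLE_SIZE = 200  # the "first n" rows kept per query, following the post's example


@dataclass
class QuerySnapshot:
    sql: str
    user: str
    last_run: str     # e.g. an ISO timestamp of the most recent execution
    tables: Set[str]  # hypothetical: tables the query read, from its execution plan
    rows: List[tuple]


def capture(sql: str, user: str, last_run: str, tables: Set[str],
            result_rows: Iterable[tuple], n: int = SAMPLE_SIZE) -> QuerySnapshot:
    """Keep only the first n rows; islice stops reading the cursor after n rows."""
    return QuerySnapshot(sql, user, last_run, tables,
                         [tuple(row) for row in islice(result_rows, n)])


def search(snapshots: List[QuerySnapshot], term: str,
           allowed_tables: Set[str]) -> List[Tuple[str, str, str, List[tuple]]]:
    """Return (sql, user, last_run, preview) for queries whose SQL contains term.

    Preview rows are shown only if the searcher may read every table the
    query touched; otherwise the hit is listed with an empty preview.
    """
    hits = []
    for snap in snapshots:
        if term.lower() not in snap.sql.lower():
            continue
        preview = snap.rows if snap.tables <= allowed_tables else []
        hits.append((snap.sql, snap.user, snap.last_run, preview))
    return hits
```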
Not sure what we will call it by the time we are done with it; the point is, it's one of the important pieces of Analytics as a Service that we are finally getting to.

written by Matthew Tod, September 17, 2011
A memory of the queries used sounds like a great idea and should speed up future insight as well as save money.
One thought: it would be good to get a user rating for the query when they have finished. A number of dimensions could be captured here, and this would be very helpful for future users of the query.
