It has been a while since I actively blogged on this personal site of ours. It has been a busy couple of years and our teams have pushed the boundaries of pretty much any technology out there that deals with Data and Analytics.
Some 4-5 years ago we started an internal project and, inspired by Ray Kurzweil's The Singularity Is Near, dubbed it Singularity.
We are only weeks away from launching V3 of our Singularity platform and it's nothing short of amazing. We set out to scale big and economically, to make the complex easy, and to put the impossible in the hands of all our analysts, without special training or knowledge of complex programming languages. That means putting hundreds of trillions of behavioral patterns to use, structuring complex data just enough to make it simple to use while keeping loosely structured patterns the way they are, and storing unstructured data as is and projecting logic and structure at runtime.
Soft data projection is what I would call it: the ability to apply structural patterns when you analyze the data, not when you load it. We wanted the best of both worlds - structure where it helps us, no structure where the incoming data changes so frequently that the effort of transformation becomes prohibitive.
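To make the idea a bit more concrete, here is a minimal sketch in Python of what projecting structure at read time can look like. This is not how Singularity implements it; the sample events, the `get_path` and `project` helpers and the dotted-path notation are all made up for illustration.

```python
import json

# Raw events are kept exactly as they arrived; nothing is enforced at load time.
# (Hypothetical sample data, for illustration only.)
RAW_EVENTS = [
    '{"user": "a", "action": "click", "ts": 1}',
    '{"user": "b", "action": "view", "ts": 2, "device": {"os": "ios"}}',
    '{"user": "a", "ts": 3, "extra": [1, 2, 3]}',
]


def get_path(record, path, default=None):
    """Walk a dotted path such as 'device.os' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # a missing field yields a default, not a load error
        current = current[key]
    return current


def project(raw_lines, projection):
    """Apply a projection (column name -> dotted path) while reading."""
    for line in raw_lines:
        record = json.loads(line)
        yield {col: get_path(record, path) for col, path in projection.items()}


# The "schema" lives in the query, not in the stored data.
rows = list(project(RAW_EVENTS, {"user": "user", "os": "device.os"}))
```

The stored events never change. If tomorrow's events carry a new field, only the projection passed to `project` needs to know about it, which is the whole point of applying structure at analysis time rather than at load time.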
We started like so many others and considered everything from a completely self-written stack to existing projects to enterprise-grade solutions. As with most things we do, we did not want to invest hundreds of man-years to write yet another optimizer, or yet another variation of SQL or NoSQL. A team of 5 had to be able to design, build and implement multi-petabyte systems, and these systems had better be affordable.
In the end we took one of the best #BigData SQL engines out there, on one of the best workload managers, with solid security and management features, built-in high availability and shared-nothing scalability, and focused on what was key to us: we added the soft projection capabilities and native compression, and threw new - never before seen - SQL extensions into the mix.
In partnering up with a company that has been doing this for many years, we did something very different from many others: we did not reinvent the wheel. We did not sink hundreds of man-years into the basics; instead, a few of our brightest experts focused on the key differentiator for semi-structured #BigData.
Adoption is exponential. With up to a million requests per day, hundreds of analysts dig deep into our behavioral data. And with a strong workload management framework in place we can afford to load in near real time and drop data off every few minutes, process patterns 24x7 and put trillions of events at the fingertips of a broad community of analysts that don’t need to be retrained from scratch.