Welcome

Welcome to xlmpp, a blog primarily focused on large-scale, data-intensive applications such as traditional Data Warehousing, Decision Support, Analytics, Data Mining, Data Transformation, and Historical Deep Storage, and less on Online Transaction Processing (OLTP) or related application technologies.

We are a team of seasoned experts from various backgrounds who have one thing in common: we live and breathe large-scale data processing. The systems we work on are amongst the largest in the world, and the data volumes are staggering. Growth is close to exponential. Analytics are our daily bread and butter.

The Great Waste

Extremely large clusters of parallel systems are also highly susceptible to throughput problems. Systems of such magnitude very often have to fight bottlenecks.

It takes good algorithms and a solid system architecture to avoid massive throughput loss. Soon!
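One common way a parallel system loses throughput, offered here only as an illustration and not as the specific problem this teaser refers to, is a straggler: in a synchronous parallel step that finishes only when its slowest node finishes, every faster node sits idle while it waits. The sketch below estimates the fraction of reserved node-time wasted that way; the `wasted_fraction` helper and the timings are hypothetical.

```python
def wasted_fraction(node_seconds: list[float]) -> float:
    """Fraction of reserved node-time spent idle, waiting for the slowest node.

    Assumes a synchronous step: it completes only when the slowest node
    completes, and every node stays reserved for that whole duration.
    """
    if not node_seconds:
        raise ValueError("need at least one node timing")
    slowest = max(node_seconds)
    if slowest == 0:
        return 0.0
    reserved = slowest * len(node_seconds)  # node-seconds held by the step
    busy = sum(node_seconds)                # node-seconds of useful work
    return 1.0 - busy / reserved


# Hypothetical timings: 99 nodes finish in 10 s, one slow node needs 30 s.
timings = [10.0] * 99 + [30.0]
print(f"{wasted_fraction(timings):.1%} of node-time wasted")  # prints: 66.0% of node-time wasted
```

With just one node three times slower than the rest, about two thirds of the reserved capacity is idle, which is why balancing work across nodes can matter as much as adding nodes.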

Analytics to the Rescue

Analyzing the behavior of massive computer clusters

As our extremely large MPP systems grew bigger and bigger, we quickly realized that we had to use them to analyze their own usage and performance patterns. Massive mixed workloads with millions of jobs per day from thousands of users in a true 24x7 environment require a solid set of analytical capabilities to manage and optimize these systems. Soon!
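As a minimal sketch of turning a cluster's own job log into usage analytics, assume a hypothetical log of `(user, cpu_seconds)` records; the record layout and the `top_consumers` helper are illustrative, not the authors' tooling. The aggregation ranks users by total CPU time:

```python
from collections import Counter
from typing import Iterable


def top_consumers(jobs: Iterable[tuple[str, float]], n: int = 3) -> list[tuple[str, float]]:
    """Total the CPU seconds per user in one pass and return the n heaviest users."""
    totals: Counter = Counter()
    for user, cpu_seconds in jobs:
        totals[user] += cpu_seconds
    return totals.most_common(n)


# Hypothetical job-log records: (user, CPU seconds consumed by one job).
jobs = [("alice", 120.0), ("bob", 30.0), ("alice", 60.0), ("carol", 90.0)]
print(top_consumers(jobs, n=2))  # prints: [('alice', 180.0), ('carol', 90.0)]
```

A single pass with one counter per user keeps memory proportional to the number of users (thousands) rather than the number of jobs (millions per day). On the MPP system itself, the same idea would typically be expressed as a GROUP BY over its query log.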

Our Systems

Overview of our Infrastructure

By now you should expect big stats, and that's exactly what you are about to see. Over time we will update these stats with the latest numbers from our infrastructure.

As we can only share so much, most stats are at a high level, across multiple platforms and systems.

Project Singularity

It has been a while since I actively blogged on this personal site of ours. It has been a busy couple of years and our teams have pushed the boundaries of pretty much any technology out there that deals with Data and Analytics.

Some 4-5 years ago we started an internal project and, inspired by Ray Kurzweil's book The Singularity Is Near, dubbed it Singularity.

We are only weeks away from launching V3 of our Singularity platform, and it's nothing short of amazing. We set out to scale big and economically, to make the complex easy, and to put the impossible in the hands of all our analysts, without special training or knowledge of complex programming languages. The platform puts hundreds of trillions of behavioral patterns to use: it structures complex data just enough to make it simple to use, keeps loosely structured patterns the way they are, stores unstructured data as is, and projects logic and structure at runtime.
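The last idea, storing unstructured data as is and projecting structure at runtime, is often called schema-on-read. A minimal sketch, assuming raw tab-separated event lines and a hypothetical `project` function (neither is Singularity's actual format or API), keeps the raw text untouched and applies a field layout only when a query runs:

```python
from typing import Iterator, Optional

# Raw records are stored exactly as they arrived; nothing is enforced on write.
RAW_LINES = [
    "2011-02-12\tuser42\tsearch\tgarden chairs",
    "2011-02-12\tuser7\tview",  # missing a field: still stored as is
    "2011-02-13\tuser42\tpurchase\titem=991",
]


def project(lines: list[str], fields: list[str]) -> Iterator[dict[str, Optional[str]]]:
    """Apply a field layout to raw tab-separated lines at query time.

    Fields beyond the end of a line become None instead of rejecting the line;
    trailing values that the layout does not name are ignored.
    """
    for line in lines:
        parts = line.split("\t")
        yield {name: (parts[i] if i < len(parts) else None)
               for i, name in enumerate(fields)}


# This query chooses its structure at runtime; another query could use another layout.
layout = ["date", "user", "action", "detail"]
searches = [row for row in project(RAW_LINES, layout) if row["action"] == "search"]
print(searches)
```

The trade-off is that parsing cost moves from load time to every query, in exchange for never rejecting or reshaping data that does not fit a predefined schema.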

Analytics as a Service
Written by Oliver Ratzesberger   
Monday, 21 April 2008 00:00

Turning utility computing into a service model for analytics.

With the needs of Enterprise Analytics growing at an ever-increasing pace, it becomes clear that traditional hub-and-spoke architectures cannot sustain the demands of increasingly complex business analytics. As with any proliferation of systems, the overhead of managing, maintaining, and developing trees of increasingly complex dependencies quickly outpaces an organization's ability to deal with it. What may work well at first turns into a real evolution nightmare.

Last Updated on Saturday, 12 February 2011 08:02
 
Introducing xlmpp

xlmpp is a multi-author blog about the latest trends in extremely large-scale massively parallel processing (MPP).

This site is not about products or vendors but about approaches, architecture, and algorithms: the how, the what, and, most importantly, what to avoid. Extremely large data volumes present unique challenges.

Processing hundreds to thousands of billions of records, rows, or lines of text, whether inside a database or not, requires not only massively parallel systems but also great attention to detail.
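One place where that attention to detail shows up is in how rows are spread across nodes. Shared-nothing MPP systems commonly hash-distribute rows on a key so each node scans only its own share. The sketch below, assuming a hypothetical 4-node cluster and made-up keys, shows how a poorly chosen key (here a placeholder value like `"unknown"` on most rows) piles data onto a single node:

```python
import hashlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node with a stable hash.

    Python's built-in hash() for str is randomized per process, so a
    cryptographic digest is used here to get the same placement every run.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def rows_per_node(keys: list[str], num_nodes: int) -> Counter:
    """Count how many rows each node would receive."""
    return Counter(node_for(k, num_nodes) for k in keys)


# Hypothetical data: well-spread user IDs vs. a feed dominated by one placeholder key.
spread_keys = [f"user{i}" for i in range(10_000)]
skewed_keys = ["unknown"] * 9_000 + [f"user{i}" for i in range(1_000)]

for name, keys in (("spread", spread_keys), ("skewed", skewed_keys)):
    counts = rows_per_node(keys, num_nodes=4)
    share = max(counts.values()) / len(keys)
    print(f"{name}: busiest node holds {share:.0%} of the rows")
```

In the skewed case at least 90% of the rows land on one node, so a full scan runs roughly at the speed of that node alone, however many nodes the cluster has.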

Our Systems

To give you a little background on the types of systems we work on, we felt it would be useful to share some high-level stats about our infrastructure.

Incoming data volumes exceed 50TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 12PB of physical storage with over 4.5PB in our largest cluster.
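A quick back-of-the-envelope check on these figures, assuming decimal units (1 TB = 10^12 bytes) and taking exactly 50TB and 10^11 records per day (the stated lower bounds), gives the average record size and the sustained ingest rate:

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
bytes_per_day = 50 * 10**12      # 50 TB/day, decimal units assumed
records_per_day = 10**11         # 10^11 new records/day

avg_record_bytes = bytes_per_day / records_per_day          # 500 bytes/record
ingest_mb_per_s = bytes_per_day / SECONDS_PER_DAY / 10**6   # about 578.7 MB/s
records_per_s = records_per_day / SECONDS_PER_DAY           # about 1.16 million/s

print(f"{avg_record_bytes:.0f} B/record, {ingest_mb_per_s:.1f} MB/s, "
      f"{records_per_s / 1e6:.2f}M records/s")
# prints: 500 B/record, 578.7 MB/s, 1.16M records/s
```

These are 24-hour averages; because both inputs are lower bounds, the 500-byte average record size is only indicative, and peak ingest rates are likely higher than the average.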

We leverage compression technologies wherever possible and are achieving compression ratios as high as 96% on our highest volume data feeds.
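The post does not define "compression ratio"; assuming 96% means the compressed data is 96% smaller than the raw data, it occupies 4% of the original size, which is a 25:1 ratio. A small sketch of that conversion, using a hypothetical `compression_factor` helper:

```python
def compression_factor(space_savings: float) -> float:
    """Convert a space saving (0.96 = 96% smaller) into an N:1 compression factor."""
    if not 0.0 <= space_savings < 1.0:
        raise ValueError("space savings must be in [0, 1)")
    return 1.0 / (1.0 - space_savings)


raw_tb = 1.0  # hypothetical 1 TB of raw feed data
factor = compression_factor(0.96)
print(f"{factor:.0f}:1, {raw_tb / factor * 1000:.0f} GB on disk per TB raw")
# prints: 25:1, 40 GB on disk per TB raw
```

Since the 96% figure applies to the highest-volume feeds only, it should not be applied to the full daily intake.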

Today's Poll

How many servers/nodes does it take to scan for a rare 6-character pattern in 10^10 records of 120 bytes each in less than 20 seconds?
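For a brute-force scan, the question reduces to aggregate scan bandwidth: 10^10 records of 120 bytes is 1.2 TB, and reading that in 20 seconds takes 60 GB/s across the cluster. The sketch below assumes every byte is read once, work splits evenly, and each node sustains a purely hypothetical scan rate (the poll gives no hardware figures):

```python
import math


def nodes_needed(records: int, record_bytes: int, seconds: float,
                 node_scan_bytes_per_s: float) -> int:
    """Minimum node count to scan all data within the time limit.

    Assumes a full scan (no index), even division of work across nodes,
    and a sustained per-node scan rate of node_scan_bytes_per_s.
    """
    total_bytes = records * record_bytes
    required_rate = total_bytes / seconds  # aggregate bytes/s needed
    return math.ceil(required_rate / node_scan_bytes_per_s)


total = 10**10 * 120  # 1.2 * 10^12 bytes = 1.2 TB
print(f"need {total / 20 / 1e9:.0f} GB/s aggregate")  # prints: need 60 GB/s aggregate
# Hypothetical per-node scan rates of 0.5, 1 and 2 GB/s:
for rate in (0.5e9, 1e9, 2e9):
    print(f"{rate / 1e9} GB/s per node -> {nodes_needed(10**10, 120, 20, rate)} nodes")
```

The pattern being rare does not reduce the bytes a brute-force scan must read; it mainly keeps the result set small. Actual node counts also depend on skew and on how fast each node can match the pattern, not just on raw I/O.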