
Welcome

Welcome to xlmpp

xlmpp is a blog primarily focused on large-scale, data-intensive applications such as traditional Data Warehousing, Decision Support, Analytics, Data Mining, Data Transformation and Historical Deep Storage, rather than on Online Transaction Processing (OLTP) or related application technologies.

We are a team of seasoned experts from various backgrounds who have one thing in common: we live and breathe large-scale data processing. The systems we work on are among the largest in the world, and the data volumes are staggering. Growth is close to exponential. Analytics is our daily bread and butter.

The Great Waste

Xtreme Large clusters of parallel systems are also extremely susceptible to throughput problems. Systems of this magnitude very often struggle with bottlenecks.

It takes good algorithms and a solid system architecture to avoid massive throughput loss. Soon!
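One well-known source of this kind of throughput loss in shared-nothing MPP systems (not necessarily the one the upcoming article covers) is data skew: a parallel step finishes only when its most heavily loaded node finishes, so every other node sits idle in the meantime. The sketch below quantifies that under a simplifying assumption that all nodes process rows at the same rate; the function names and load figures are hypothetical.

```python
def skew_efficiency(rows_per_node: list[int]) -> float:
    """Fraction of cluster capacity doing useful work in one parallel step.

    Assumes every node processes rows at the same rate, so the step's
    elapsed time is proportional to the most heavily loaded node.
    """
    if not rows_per_node or max(rows_per_node) == 0:
        raise ValueError("need at least one node with work")
    mean_load = sum(rows_per_node) / len(rows_per_node)
    return mean_load / max(rows_per_node)


def wasted_node_seconds(rows_per_node: list[int], rows_per_sec: float) -> float:
    """Node-seconds spent idle while waiting for the slowest node."""
    elapsed = max(rows_per_node) / rows_per_sec
    busy = sum(rows_per_node) / rows_per_sec
    return elapsed * len(rows_per_node) - busy


# Hypothetical loads: both layouts hold the same 4M rows on 4 nodes.
balanced = [1_000_000] * 4
skewed = [400_000, 400_000, 400_000, 2_800_000]

print(skew_efficiency(balanced))                # 1.0
print(skew_efficiency(skewed))                  # ~0.357 (1M mean / 2.8M max)
print(wasted_node_seconds(skewed, 1_000_000))   # ~7.2 idle node-seconds
```

With the same total work, the skewed layout keeps the cluster roughly a third busy, which is why even data distribution matters as much as raw node count.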

Analytics to the Rescue

Analyzing the behavior of massive computer clusters

As our extremely large MPP systems grew, we quickly realized that we had to use them to analyze their own usage and performance patterns. Massive mixed workloads, with millions of jobs per day from thousands of users in a true 24x7 environment, require a solid set of analytical capabilities to manage and optimize these systems. Soon!
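As a toy illustration of turning a system's own job logs into usage analytics, the sketch below ranks users by total CPU time consumed. The record layout (`JobRecord` with `user` and `cpu_seconds`) and the sample values are hypothetical; at the scale described here, such an aggregation would itself run as a parallel query on the MPP system rather than as a Python loop.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical, minimal log schema; real job logs carry many more fields.
    user: str
    cpu_seconds: float


def top_consumers(jobs: list[JobRecord], n: int = 3) -> list[tuple[str, float, int]]:
    """Return (user, total CPU seconds, job count) for the n heaviest users."""
    cpu: dict[str, float] = defaultdict(float)
    count: dict[str, int] = defaultdict(int)
    for job in jobs:
        cpu[job.user] += job.cpu_seconds
        count[job.user] += 1
    # Rank users by total CPU consumed, largest first.
    ranked = sorted(cpu, key=lambda user: cpu[user], reverse=True)
    return [(user, cpu[user], count[user]) for user in ranked[:n]]


jobs = [
    JobRecord("etl_batch", 5400.0),
    JobRecord("analyst_a", 120.0),
    JobRecord("etl_batch", 6100.0),
    JobRecord("analyst_b", 45.0),
    JobRecord("analyst_a", 300.0),
]
print(top_consumers(jobs, n=2))
# [('etl_batch', 11500.0, 2), ('analyst_a', 420.0, 2)]
```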

Our Systems

Overview of our Infrastructure

By now you should expect big stats, and that's exactly what you are about to see. Over time we will update these stats with the latest numbers from our infrastructure.

As we can only share so much, most stats are high-level aggregates across multiple platforms and systems.

Analytics as a Service
Written by Oliver Ratzesberger   
Monday, 21 April 2008 00:00

Turning utility computing into a service model for analytics.

With the needs of Enterprise Analytics growing at ever-increasing speed, it becomes clear that traditional hub-and-spoke architectures cannot sustain the demands of increasingly complex business analytics. As with any proliferation of systems, the overhead of managing, maintaining and developing trees of increasingly complex dependencies quickly outpaces an organization's ability to deal with its challenges. What works well at first turns into a real evolution nightmare.

It rapidly becomes more and more difficult to react to ongoing changes in business demands and growth. For many years, some of the largest corporations in the world have realized this and have focused on re-integrating islands and stovepipes of information into much more centralized analytical infrastructures.

Quite often, however, this is also seen as reducing flexibility (in terms of time to market) for individual groups that need to deliver quickly against rapidly changing business demands. It's a very typical love-hate relationship with these so-called departmental systems or data marts.

Such systems are great for a localized team to 'bang out' new capabilities, but they become a data integration nightmare with huge TCO (Total Cost of Ownership) implications that are quite often not visible to the overall organization.
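To make the integration cost concrete under a simplifying assumption (ours, not the article's): if $n$ departmental marts each keep their own copy of a shared business metric and every pair must be reconciled, the number of reconciliation paths grows quadratically, whereas reconciling each mart against a single centralized copy grows only linearly.

```latex
P_{\text{pairwise}}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
P_{\text{central}}(n) = n
\qquad\Longrightarrow\qquad
P_{\text{pairwise}}(20) = 190 \ \text{vs.}\ P_{\text{central}}(20) = 20
```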

Last Updated (Monday, 21 April 2008 22:40)
 
Our Systems

To give you some background on the types of systems we work on, we felt it would be useful to share some high-level stats about our infrastructure.

Incoming data volumes exceed 50TB per day, with more than 10^11 new items/lines/records being added per day. Our analytical processing infrastructure exceeds 12PB of physical storage with over 4.5PB in our largest cluster.
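A quick back-of-envelope conversion of the daily figures into sustained rates helps put them in perspective. This assumes decimal units (1 TB = 10^12 bytes) and perfectly even arrival over 24 hours; real ingest is bursty, so peak rates will be higher. Both inputs are the stated lower bounds, so the bytes-per-record figure is only a rough ratio.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

bytes_per_day = 50 * 10**12     # "exceeds 50TB per day" (lower bound)
records_per_day = 10**11        # "more than 10^11 new records per day" (lower bound)

ingest_mb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**6
records_per_sec = records_per_day / SECONDS_PER_DAY
avg_bytes_per_record = bytes_per_day / records_per_day

print(f"{ingest_mb_per_sec:,.0f} MB/s sustained")        # 579 MB/s sustained
print(f"{records_per_sec:,.0f} records/s sustained")     # 1,157,407 records/s sustained
print(f"{avg_bytes_per_record:.0f} bytes/record (avg)")  # 500 bytes/record (avg)
```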

We leverage compression technologies wherever possible and are achieving compression ratios as high as 96% on our highest volume data feeds.
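Compression percentages are easy to misread. Assuming "96%" means the compressed data occupies 4% of its original size, the equivalent ratio is 25:1. The helper names below are hypothetical, and the 10 TB input is an illustrative figure, not a number from our systems.

```python
def reduction_to_ratio(space_saving: float) -> float:
    """Convert a fractional space saving (0.96 = 96% smaller) to an N:1 ratio."""
    if not 0.0 <= space_saving < 1.0:
        raise ValueError("space_saving must be in [0, 1)")
    return 1.0 / (1.0 - space_saving)


def physical_tb(raw_tb: float, space_saving: float) -> float:
    """Physical storage needed to hold raw_tb of uncompressed data."""
    return raw_tb * (1.0 - space_saving)


print(round(reduction_to_ratio(0.96), 2))  # 25.0, i.e. 25:1
print(round(physical_tb(10, 0.96), 2))     # 0.4 TB for a hypothetical 10 TB raw feed
```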

 
Introducing xlmpp

xlmpp is a multi-author blog about the latest trends in extremely large-scale massively parallel processing (MPP).

This site is not about products or vendors but about approaches, architecture and algorithms: the how, the what and, most importantly, what to avoid. Extremely large data volumes present unique challenges.

Processing hundreds to thousands of billions of records, rows or lines of text, whether inside a database or not, requires not only massively parallel systems but also great attention to detail.
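The basic shape of such processing in a shared-nothing design is: partition the data across nodes, let each node scan only its own partition, then merge the partial results. The sketch below shows that dataflow only; threads stand in for nodes, so it does not demonstrate real speedup (CPython threads do not parallelize CPU-bound scanning). All function names and the sample pattern are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition(records: list[str], num_nodes: int) -> list[list[str]]:
    """Hash-partition records across hypothetical nodes (shared-nothing layout)."""
    parts: list[list[str]] = [[] for _ in range(num_nodes)]
    for rec in records:
        # crc32 is stable across runs, unlike Python's per-process salted hash() for str.
        parts[zlib.crc32(rec.encode()) % num_nodes].append(rec)
    return parts


def scan_partition(part: list[str], pattern: str) -> list[str]:
    """Each node scans only its own local partition."""
    return [rec for rec in part if pattern in rec]


def parallel_scan(records: list[str], pattern: str, num_nodes: int = 4) -> list[str]:
    parts = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        per_node_hits = list(pool.map(scan_partition, parts, [pattern] * num_nodes))
    # Merge step: combine per-node results; sort so output order is deterministic.
    return sorted(hit for hits in per_node_hits for hit in hits)


records = [f"row-{i:05d}" + ("XQZ7PA" if i % 2500 == 0 else "") for i in range(10_000)]
print(parallel_scan(records, "XQZ7PA"))
# ['row-00000XQZ7PA', 'row-02500XQZ7PA', 'row-05000XQZ7PA', 'row-07500XQZ7PA']
```

Because each partition is scanned independently, the result is the same for any node count; only the work per node changes.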

 

Today's Poll

How many servers/nodes does it take to scan 10^10 records of 120 bytes each for a rare 6-character pattern in less than 20 seconds?
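The first step in answering is the aggregate bandwidth the scan needs. The sketch below assumes a full scan with no index, compression or partition pruning, perfectly even data distribution, decimal units, and that I/O bandwidth rather than pattern matching is the limit. The per-node scan rates are hypothetical inputs, not the poll's answer.

```python
import math

RECORDS = 10**10
BYTES_PER_RECORD = 120
TIME_BUDGET_SEC = 20

total_bytes = RECORDS * BYTES_PER_RECORD                      # 1.2 * 10**12 bytes = 1.2 TB
required_gb_per_sec = total_bytes / TIME_BUDGET_SEC / 10**9   # 60 GB/s aggregate


def nodes_needed(per_node_gb_per_sec: float) -> int:
    """Nodes required if each node sustains the given scan rate (hypothetical input)."""
    return math.ceil(required_gb_per_sec / per_node_gb_per_sec)


for rate in (0.5, 1.0, 2.0):  # hypothetical per-node scan rates in GB/s
    print(f"{rate} GB/s per node -> {nodes_needed(rate)} nodes")
# 0.5 GB/s per node -> 120 nodes
# 1.0 GB/s per node -> 60 nodes
# 2.0 GB/s per node -> 30 nodes
```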
 


