Oliver Ratzesberger's Blog
Description:
This is Oliver Ratzesberger's portion of the multi-author blog at www.xlmpp.com.
To learn more about me and my colleagues, check out: The Authors

Oct 23
2008

CCA08 - Cloud Computing and its Applications

Posted by Oliver Ratzesberger in mpp, efficiency, cost

Just got back from Chicago, where over the past 2 days a small group of scientists, academics and industry practitioners discussed various aspects of cloud computing and related topics.

One of the topics was comparing extremely large-scale analytical problems and the systems leveraged to solve them. In order to compare classes of supercomputers, Alex Szalay (Johns Hopkins University) explained a simple yet interesting figure: the Amdahl number (Amdahl's Law; Bell, Gray and Szalay 2006).

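Alex explained the Amdahl number (BW) as one bit of IO/sec per instruction/sec. In other words, BW is the ratio of a system's IO rate in bits per second to its instruction rate per second.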
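To make the definition concrete, here is a minimal Python sketch of the ratio. The function name `amdahl_number` and the hardware figures in the usage lines are hypothetical and only illustrate the arithmetic; they are not measurements of any system mentioned in this post.

```python
def amdahl_number(io_bytes_per_sec: float, instructions_per_sec: float) -> float:
    """Return bits of IO per second divided by instructions per second."""
    if instructions_per_sec <= 0:
        raise ValueError("instructions_per_sec must be positive")
    # Convert bytes to bits so the ratio matches the "bit of IO" definition.
    return (io_bytes_per_sec * 8) / instructions_per_sec


# Hypothetical node (assumed numbers): 500 MB/s of IO, 10 billion instructions/s.
bw = amdahl_number(io_bytes_per_sec=500e6, instructions_per_sec=10e9)
print(f"BW = {bw:.2f}")
```

A value well below 1 suggests a system with far more CPU than IO, while a value near 1 suggests a more IO-balanced design.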

Why is this figure interesting? To compare the analytical capabilities of various extremely large clusters, it is important to categorize them into different groups based on their processing capabilities. Typical commercial applications of large-scale analytics require large amounts of IO per available CPU, while various scientific applications require less IO per available CPU.

For a Blue Gene, BW = 0.013; for the JHU cluster, BW = 0.664. So off

Apr 21
2008

Analytics as a Service

Posted by Oliver Ratzesberger in xldb, mpp, efficiency, agile

What do you think about Agile Analytics? Ever heard of it? Well, here are a couple of thoughts from the guys who deal with it on a daily basis.

Analytics as a Service 

Looking forward to seeing your comments on this.

Mar 22
2008

Science - DB Research Meeting

Posted by Oliver Ratzesberger in xldb, super computing, mpp

Next week I will be attending the next iteration of the xldb group events, organized around eXtreme Large Database applications: xldb workshop

With hundreds of petabytes of information waiting to be captured and analyzed, new concepts are required to scale today's platforms by one to three orders of magnitude.

Today we 'limit' ourselves to capturing 'only' 40TB/day of incremental incoming data, but next-generation requirements demand a much more detailed collection of event data. 100TB/day is already on the horizon, giving us just 10 days of history per petabyte. With deep historical requirements of 3+ years of information, data volume growth will outpace Moore's Law. And I would not be surprised if this time next year we are thinking about how to deal with 250TB/day. The writing is on the wall.
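The arithmetic behind these figures can be sketched in a few lines of Python. This is a rough illustration only: it assumes decimal units (1 PB = 1000 TB), 365-day years, and no compression or replication, and the function names are made up for this example.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def days_of_history_per_pb(tb_per_day: float) -> float:
    """How many days of incoming data fit into one petabyte."""
    return TB_PER_PB / tb_per_day


def petabytes_for_retention(tb_per_day: float, years: float) -> float:
    """Raw storage in PB needed to retain `years` of data at a constant daily rate."""
    return tb_per_day * 365 * years / TB_PER_PB


# Daily rates taken from the paragraph above: today, on the horizon, next year.
for rate in (40, 100, 250):
    days = days_of_history_per_pb(rate)
    three_years = petabytes_for_retention(rate, years=3)
    print(f"{rate} TB/day: {days:.0f} days per PB, {three_years:.1f} PB for 3 years")
```

At 100TB/day this reproduces the 10 days of history per petabyte mentioned above, and under the same assumptions three years of retention lands at roughly 110 PB.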

Improvements in processing power per CPU and advances in memory and storage are not going to be able to make up for the exponential growth of data processing requirements.


Read More...
Mar 21
2008

TACC Ranger goes live

Posted by Oliver Ratzesberger in super computing, mpp

On February 22nd, 2008, TACC formally announced the go-live of RANGER, a massive-scale supercomputer. While it is not a traditional relational processing system, its design shares many components and basic principles with large-scale processing platforms.

Of particular interest is the multi-terabit InfiniBand interconnect that allows the system to (re)distribute massive amounts of data.

One of the early learnings from the system is that loading massive amounts of data can at times be a larger challenge than processing that very same data once it is loaded into the system. It points out a very common issue with large-scale data processing:


Read More...
Mar 21
2008

A Systems overview

Posted by Oliver Ratzesberger in mpp, general

Finally I got to complete a high-level systems overview. I realize it does not contain much detail, but as you can imagine, we are bound by pretty strict NDAs.

Nevertheless, it should give you a good feel for how much data we process on any given day. The stats are roughly the figures going into 2008, and they are growing rapidly.

Here is a link to the article: Our Systems

Enjoy the reading and post your comments!

Oliver

Feb 16
2008

Welcome to the blog @ xlmpp

Posted by Oliver Ratzesberger in general

Welcome to the xlmpp blog!

 

It's time to kick off an exciting blog about ultra-large-scale information processing architecture and real-world analytical systems.

Over the next weeks and months Michael, Darren and I will be blogging about the largest data processing systems, common issues, scalability of large-scale mpp clusters, time to market, analytics, ultra-high data volumes and lots more.

Be sure to bookmark the site and subscribe to our news feed.
