Apr 29
2013

Back in Action

Posted by Oliver Ratzesberger in myblog, general, bigdata

Time to get this blog back into activity. The past few years have been so busy with projects in the tens of PB that I did not get a chance to do any public blogging. We started this site when #BigData was not a term but an oversimplification. How times have changed. We are working on brand new massive-scale projects and are turning more and more to real-time analytical processing. Time to write up what we have been up to. Check back over the next weeks as I will update the site with new stories and insights we found along the

Mar 22
2008

Science - DB Research Meeting

Posted by Oliver Ratzesberger in xldb, super computing, mpp

Next week I will be attending the xldb workshop, the next iteration of the xldb group events organized around eXtreme Large Database Applications.

With 100s of Peta Bytes of information waiting to be captured and analyzed, new concepts are required to scale today's platforms by 1-3 orders of magnitude.

Today we 'limit' ourselves to capturing 'only' 40TB/day of incremental incoming data, but next-generation requirements demand a much more detailed collection of event data. 100TB/day is already on the horizon, giving us just 10 days of history per Peta Byte. With deep historical requirements of 3+ years of information, data volume growth will outpace Moore's Law. And I would not be surprised if this time next year we are thinking about how to deal with 250TB/day. The writing is on the wall.
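
To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes decimal units (1 PB = 1000 TB), a 365-day year, and raw uncompressed volume with no replication or indexing overhead; the function names are made up for illustration and do not come from any real system.

```python
# Rough capacity arithmetic for the ingest rates mentioned above.
# Assumptions (not from the original post): 1 PB = 1000 TB, 365-day years,
# raw data only (no compression, replication, or index overhead).

TB_PER_PB = 1000


def days_of_history_per_pb(daily_tb):
    """How many days of incoming data fit into one petabyte."""
    return TB_PER_PB / daily_tb


def raw_pb_for_retention(daily_tb, years, days_per_year=365):
    """Raw petabytes needed to keep `years` of history at a flat daily rate."""
    return daily_tb * years * days_per_year / TB_PER_PB


if __name__ == "__main__":
    for rate in (40, 100, 250):
        print(
            f"{rate} TB/day: "
            f"{days_of_history_per_pb(rate):.0f} days per PB, "
            f"{raw_pb_for_retention(rate, 3):.1f} PB for 3 years"
        )
```

At a flat 100TB/day, three years of history alone comes to roughly 110 PB under these assumptions, which is where the "100s of Peta Bytes" figure starts to look unavoidable once the daily rate itself keeps growing.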

Improvements in processing power per CPU and advances in memory and storage are not going to be able to make up for the exponential growth of data processing requirements.

Mar 21
2008

TACC Ranger goes live

Posted by Oliver Ratzesberger in super computing, mpp

On February 22nd, 2008, TACC formally announced the go-live of RANGER, a massive-scale supercomputer. While not a traditional relational processing system, its design shares many components and basic principles with large-scale processing platforms.

Of particular interest is the multi-terabit InfiniBand interconnect that allows the system to (re)distribute massive amounts of data.

One of the early lessons from the system is that loading massive amounts of data can at times be a larger challenge than processing that very same data once it is loaded. It points out a very common issue with large-scale data processing:
