Data Challenges and HDF5 – Past, Present, Future

Mike Folk, The HDF Group

A hopeful band of folks from three US national laboratories and a supercomputing center started HDF5 in 1997 to address the need for a scientific data format for high-end applications. HDF5 had to achieve fast I/O speeds on immense parallel systems that in 1997 were still on the drawing boards, and to accommodate large, complex, high-volume data, while still offering the earlier HDF benefits of portability, sharability and flexibility. How well has HDF5 met these needs?

Some funny things happened on the way to 2007. Computers got bigger and new architectures emerged; data volumes grew in size and complexity as data production accelerated; the web became our personal computer; whole new applications and users found HDF5 enticing; people came up with clever new ways to do things in HDF5; paper-centric institutions became digital, and people asked serious questions about preserving their data for hundreds of years. These challenges have brought new expectations and requirements for HDF5. How has HDF5 responded?

And, we know, more funny things will happen in the next decade. Petascale systems and storage will go mainstream; nanotechnology and highly multiplexed instrumentation will collect data in volumes and at speeds that alter our notions of I/O speed; new applications will discover weaknesses in their ability to handle large, heterogeneous, complex data and will take a look at HDF5; and the scientific mainstream will finally, really embrace traditional database technology. What is HDF5 doing about these challenges?

"Data Challenges and HDF5" will describe HDF5, assess its ability to address challenges posed by recent technological change and new applications, and take a look toward meeting future data challenges.