Tal Lavian, Randy H. Katz; Doctoral Thesis, University of California at Berkeley. January 2006.

The practice of science experienced a number of paradigm shifts in the 20th century, including the growth of large, geographically dispersed teams and the use of simulations and computational science as a third branch of research, complementing theory and laboratory experiments. The recent exponential growth in network capacity, brought about by the rapid development of agile optical transport, is producing another such shift as the 21st century progresses. Essential to this new branch of e-Science applications is the capability to transfer immense amounts of data: tens to hundreds of terabytes, and even petabytes.

The invention of the transistor at Bell Labs in 1947 was the triggering event for the technology revolution of the 20th century. The completion of the Human Genome Project (HGP) in 2003 was the triggering event for the life science revolution of the 21st century. Understanding the genome, DNA, proteins, and enzymes is a prerequisite to modifying their properties and to advancing systematic biology. Grid Computing has become the fundamental platform for conducting this e-Science research. Vast increases in the data generated by e-Science applications, along with advances in computation, storage, and communication, are changing the nature of scientific research. During this decade, crossing the “Peta” line is expected: petabytes in data size, petaflops in CPU processing, and petabits per second in network bandwidth.

Numerous challenges arise from a network with a capacity millions of times greater than that of the public Internet. Currently, the distribution of large amounts of data is restricted by the bottlenecks inherent in today’s public Internet architecture, which is based on packet switching. The bandwidth limitations of the Internet inhibit the advancement and use of new e-Science applications in Grid Computing. These emerging e-Science applications are evolving within data centers and clusters; however, the potential of a globally distributed system spanning long distances has yet to be realized. Today, network resources and services are orchestrated manually through multi-party conference calls, emails, yellow sticky notes, and reminders, all of which depend on human interaction to get results. The work in this thesis automates the orchestration of networks together with other resources, making better and more time-efficient use of all of them. Automation allows far more comprehensive use of every component and removes human limitations from the process. We demonstrated automatic Lambda setup and teardown, integrated with application servers, over a MEMS-based testbed in the Chicago metro area in a matter of seconds, and across domains over transatlantic links in about a minute.

The main goal of this thesis is to build a new grid-computing paradigm that fully harnesses the available communication infrastructure. In this paradigm, the optical network functions as the third leg, orchestrated together with computation and storage. This tripod architecture becomes the foundation for the global distribution of vast amounts of data in emerging e-Science applications.

A key area of investigation in this thesis is the set of fundamental technologies that allow e-Science applications in a Grid Virtual Organization (VO) to access abundant optical bandwidth through the new technology of Lambda on demand. This technology provides essential networking capabilities that are presently missing from the Grid Computing environment. Further, it overcomes current bandwidth limitations, making the VO a reality and consequently removing some basic limitations to the growth of this new branch of big science.

In this thesis, the Lambda Data Grid provides the knowledge plane that allows e-Science applications to transfer enormous amounts of data over a dedicated Lightpath, making global VOs truly viable. This enhances scientific research by allowing large distributed teams to work efficiently, using simulations and computational science as a third branch of research.