Xiaodong Zhang, Professor

Department of Computer Science and Engineering

The Ohio State University

Overcoming Major Hurdles of Paralleling in Big Data Analytics

Thursday, November 14, 4:00 PM

Packard Lab Room 466

Abstract: The effectiveness of parallel processing in big data analytics plays a key role in achieving high performance, high throughput, and high scalability. Existing big data software systems, such as MapReduce, are designed with restrictive and synchronous communication structures, which can process big data with simple analytics in a highly parallel mode. However, processing big data with complex analytics under a MapReduce-like structure is neither efficient nor scalable. In contrast to high performance computing, the remaining hurdles of parallel processing in big data analytics include: (1) specialized communication hardware support is not affordable; (2) data sets are difficult or even impossible to move once they are stored in systems; (3) runtime and global coordination in cluster systems is lacking; and (4) software tools to optimize and standardize big data analytics programs are lacking. In this talk, I will present our software design and implementation, which aim to overcome these major hurdles in big data analytics in three aspects: (1) a data placement structure for storage that balances the requirements of performance, scalability, and fault tolerance; (2) a communication facility that delivers critically necessary information to coordinate big data analytics in clusters; and (3) a software tool that translates SQL queries into MapReduce programs with optimizations. We have collaborated with the Facebook data infrastructure team and Hortonworks to test and deploy our research results, some of which have been adopted in their production systems, serving billions of users worldwide.

Bio: Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering and Chair of the Computer Science and Engineering Department at The Ohio State University. His research interests focus on data management in computer and distributed systems. He has made strong and effective efforts to transfer his academic research into advanced technology that updates the design and implementation of major general-purpose computing systems. He received his Ph.D. in Computer Science from the University of Colorado at Boulder, where he received the Distinguished Engineering Alumni Award in 2011. He is a Fellow of the ACM and a Fellow of the IEEE.

© 2014-2016 Computer Science and Engineering, P.C. Rossin College of Engineering & Applied Science, Lehigh University, Bethlehem PA 18015.