Performance Analysis of Deep Learning Workloads Using Roofline Trajectories on Chameleon

Image Courtesy of Dr. Lu

Dr. Xiaoyi Lu is a research assistant professor at The Ohio State University focusing on High Performance Interconnects and Protocols, Big Data Computing, Deep Learning, Parallel Computing, Virtualization, and Cloud Computing. In this blog post, we explore his research and his use of Chameleon Cloud.

On his current research project: Over the last decade, technologies derived from convolutional neural networks (CNNs), known as Deep Learning applications, have revolutionized fields as diverse as cancer detection, self-driving cars, and virtual assistants. However, many users of such applications are not experts in Machine Learning itself. Consequently, knowledge of how to run such applications in an optimized manner is limited within the community. For such users to make effective use of the resources at their disposal, concerted efforts are necessary to figure out optimal hardware and software configurations.

On approaching the research challenge: In this context, my team at The Ohio State University, in collaboration with Dr. Ibrahim from Lawrence Berkeley National Laboratory, proposed using the Roofline Trajectory model to perform a systematic analysis of representative CNN models and identify opportunities for black-box and application-aware optimizations. Our approach identifies various bottlenecks, and addressing them allows us to significantly improve the training performance of these CNN models. We hope that our study will be helpful for scientists, especially those who may not have enough knowledge of low-level systems, to optimize their Deep Learning model training processes and maximize performance. Using the findings from our study, we are able to obtain up to 3.5× speedup compared to vanilla TensorFlow-based CNN training with default configurations. We perform our research tasks on the Chameleon Cloud platform, including writing and debugging code, running experiments, collecting results, and analyzing data.
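For readers unfamiliar with the model: the classic Roofline model bounds attainable performance by the smaller of peak compute throughput and arithmetic intensity multiplied by peak memory bandwidth. The sketch below illustrates that bound and treats a "trajectory" simply as the sequence of measured points as some configuration parameter changes (for example, thread count). It is not the code used in the study; the `Machine` class, the function names, and any numbers passed in are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine description; values are illustrative, not measured."""
    peak_gflops: float     # peak compute rate in GFLOP/s
    peak_bandwidth: float  # peak memory bandwidth in GB/s


def attainable_gflops(machine, arithmetic_intensity):
    """Classic Roofline bound: min(peak compute, AI * peak bandwidth)."""
    return min(machine.peak_gflops, arithmetic_intensity * machine.peak_bandwidth)


def classify(machine, arithmetic_intensity):
    """Label a kernel by comparing its intensity to the ridge point."""
    ridge = machine.peak_gflops / machine.peak_bandwidth  # FLOP/byte
    return "memory-bound" if arithmetic_intensity < ridge else "compute-bound"


def trajectory(machine, samples):
    """Turn raw measurements into Roofline points.

    samples: list of (label, flops, bytes_moved, seconds), one per
    configuration (assumed here to differ in, e.g., thread count).
    Returns (label, arithmetic intensity, achieved GFLOP/s,
    fraction of the Roofline bound) for each sample, in order.
    """
    points = []
    for label, flops, bytes_moved, seconds in samples:
        ai = flops / bytes_moved            # FLOP per byte
        achieved = flops / seconds / 1e9    # GFLOP/s
        bound = attainable_gflops(machine, ai)
        points.append((label, ai, achieved, achieved / bound))
    return points
```

In this reading, the counts of floating-point operations and bytes moved would come from hardware counters, and watching how a point moves relative to the roof as the configuration changes hints at whether a workload is limited by memory traffic or by compute.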

On testbed needs: Chameleon Cloud provides bare-metal machines with root access. This is very important because many of our experiments need to monitor and collect low-level hardware and system counters. Without bare-metal machines and root access, these kinds of studies would be impossible. This is one of the major differences between Chameleon Cloud and other HPC platforms.

On the origins of the direction of this research: This work was done by one of my students, Mr. Haseeb Javed. He completed an internship at LBNL with Dr. Ibrahim. We think this is an interesting research direction for his thesis.

On his most powerful piece of advice for students beginning research: Think Big, Start Small, Move Fast!

 

Interested readers can explore the results of Dr. Lu’s research, published as "Performance analysis of deep learning workloads using roofline trajectories."

