Michael Garland


I am currently a member of NVIDIA Research, where I lead the Programming Systems and Applications Research Group. Prior to joining NVIDIA, I was an assistant professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. I received my Ph.D. from the Computer Science Department at Carnegie Mellon University.

Recent Publications

  1. M. Bauer, W. Lee, M. Papadakis, M. Zalewski and M. Garland. Supercomputing in Python With Legate. Computing in Science & Engineering, vol. 23, no. 4, pp. 73-79, 2021.

    Legate is a recently developed software system for constructing scalable simulation and data analysis programs using convenient, familiar notation. It demonstrates how coupling Python with a runtime system originally designed for high-performance computing can enable the creation of libraries that mimic the familiar interfaces of NumPy and Pandas for execution on both desktops and supercomputers.
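    The drop-in style described above can be illustrated with an ordinary NumPy program. This is a minimal sketch: the intent of the Legate design is that only the import line would change (e.g. to Legate's NumPy-compatible module; the exact module name is an assumption here, not taken from the abstract), and the identical array code would then run distributed.

    ```python
    # Ordinary NumPy program; under Legate's NumPy-compatible library,
    # the idea is that only the import would change (module name assumed)
    # and the same code would scale from a desktop to a supercomputer.
    import numpy as np

    def jacobi_step(grid):
        """One Jacobi relaxation sweep over the interior of a 2-D grid."""
        interior = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                           grid[1:-1, :-2] + grid[1:-1, 2:])
        out = grid.copy()
        out[1:-1, 1:-1] = interior
        return out

    grid = np.zeros((8, 8))
    grid[0, :] = 1.0          # hot boundary on one edge
    for _ in range(50):
        grid = jacobi_step(grid)
    ```

    Because the program is expressed entirely in whole-array operations, a runtime system can partition each array across many nodes without any change to the source.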

  2. M. Bauer, W. Lee, E. Slaughter, Z. Jia, M. Di Renzo, M. Papadakis, G. Shipman, P. McCormick, M. Garland, and A. Aiken. Scaling Implicit Parallelism via Dynamic Control Replication. In Proc. Principles and Practices of Parallel Programming (PPoPP), Feb 2021.

    We present dynamic control replication, a run-time program analysis that enables scalable execution of implicitly parallel programs on large machines through a distributed and efficient dynamic dependence analysis. Dynamic control replication distributes dependence analysis by executing multiple copies of an implicitly parallel program while ensuring that they still collectively behave as a single execution. By distributing and parallelizing the dependence analysis, dynamic control replication supports efficient, on-the-fly computation of dependences for programs with arbitrary control flow at scale. We describe an asymptotically scalable algorithm for implementing dynamic control replication that maintains the sequential semantics of implicitly parallel programs.

    An implementation of dynamic control replication in the Legion runtime delivers the same programmer productivity as writing in other implicitly parallel programming models, such as Dask or TensorFlow, while providing better performance (11.4X and 14.9X, respectively, in our experiments) and scalability to hundreds of nodes. We also show that dynamic control replication provides good absolute performance and scaling for HPC applications, competitive in many cases with explicitly parallel programming systems.
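    The core idea, several replicas collectively behaving as one sequential analysis, can be sketched in a toy model. This is a deliberately simplified illustration, not Legion's actual implementation: each shard scans the same task stream but records dependences only for the tasks it owns, so the union of all shards' results matches the sequential analysis.

    ```python
    # Toy model of distributing dependence analysis across shards
    # (a simplistic last-writer dependence model; not Legion's algorithm).

    def dependences(tasks):
        """Sequential analysis: task j depends on the latest earlier
        task touching any of the same regions."""
        deps, last_writer = set(), {}
        for j, regions in enumerate(tasks):
            for r in regions:
                if r in last_writer:
                    deps.add((last_writer[r], j))
                last_writer[r] = j
        return deps

    def sharded_dependences(tasks, num_shards):
        """Each shard records dependences only for tasks it owns;
        the union over shards reproduces the sequential result."""
        union = set()
        for shard in range(num_shards):
            last_writer = {}
            for j, regions in enumerate(tasks):
                for r in regions:
                    if r in last_writer and j % num_shards == shard:
                        union.add((last_writer[r], j))
                    last_writer[r] = j
        return union

    tasks = [("a",), ("b",), ("a", "b"), ("b",), ("a",)]
    assert sharded_dependences(tasks, 3) == dependences(tasks)
    ```

    In this toy version every shard still scans the whole stream; the point of the sketch is only the correctness invariant, that replicated analyses partition the work yet collectively behave as a single execution.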

  3. V. Joseph, G. L. Gopalakrishnan, S. Muralidharan, M. Garland, and A. Garg. A Programmable Approach to Neural Network Compression. IEEE Micro, vol. 40, no. 5, pp. 17-25, Sep/Oct 2020.

    Deep neural networks (DNNs) frequently contain far more weights, represented at higher precision, than are required for the specific task they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both the model size and inference time without appreciable loss in accuracy. However, finding the best compression strategy and corresponding target sparsity for a given DNN, hardware platform, and optimization objective currently requires expensive, frequently manual, trial-and-error experimentation. In this article, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators in Python to build more complex and practically interesting compression strategies. Given a strategy and a user-provided objective (such as minimization of running time), Condensa uses a novel Bayesian optimization-based algorithm to automatically infer desirable sparsities. Our experiments on four real-world DNNs demonstrate memory footprint and hardware runtime throughput improvements of 188x and 2.59x, respectively, using at most ten samples per search.
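    The operator-composition idea can be sketched as follows. The operator names and the `compose` helper below are hypothetical illustrations of the programming style, not Condensa's actual API.

    ```python
    import numpy as np

    # Hypothetical sketch of composing compression operators in Python,
    # in the spirit of the system described above (not Condensa's API).

    def prune(sparsity):
        """Zero out the smallest-magnitude fraction of the weights."""
        def op(w):
            k = int(sparsity * w.size)
            if k == 0:
                return w.copy()
            thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
            out = w.copy()
            out[np.abs(out) <= thresh] = 0.0
            return out
        return op

    def quantize(bits):
        """Uniformly quantize weights onto a 2**bits-level grid."""
        def op(w):
            scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
            if scale == 0.0:
                return w.copy()
            return np.round(w / scale) * scale
        return op

    def compose(*ops):
        """Chain simple operators into one compression strategy."""
        def strategy(w):
            for op in ops:
                w = op(w)
            return w
        return strategy

    strategy = compose(prune(sparsity=0.5), quantize(bits=8))
    weights = np.random.default_rng(0).normal(size=(4, 4))
    compressed = strategy(weights)
    ```

    Because strategies are ordinary Python values, they can be passed to a search procedure that tunes parameters such as the target sparsity against a user-provided objective.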

Read more on my complete list of publications.