CSE 220 Paper Review Swizzle Inventor - Data Movement Synthesis for GPU Kernels

This is a repository for my research, paper reading summaries/reviews, and relevant blog-like posts in markdown.

CSE 220 Paper Review
Swizzle Inventor - Data Movement Synthesis for GPU Kernels

Summary

Swizzle inventor is a code synthesis algorithm that takes a “program sketch” and fills “holes” with “swizzles”. A program sketch is similar to a template, where the program is provided fully written except for placeholders, called holes. Holes are specified by a developer and define where synthesized code should be placed and parameters for how to synthesize the code. A swizzle is a transformation of data that optimizes access for a given hardware architecture.

A cited paper by Ben-Sasson et al., “Fast Multiplication in Binary Fields on GPUs via Register Cache”, describes a way to transform data loaded into registers of a GPU such that subsequent access by a GPU kernel (matrix mutliplication, for example) has a huge performance benefit. Mangpo, et al. took this idea as a motivation for synthesizing code that is difficult to write but beneficial for many types of computation. The transform of loaded data into different thread-local registers is considered to be a swizzle, and swizzle inventor is then able to synthesize the code for shuffling data amongst registers to optimize access paths for whatever computation is provided in a program sketch. Not only can swizzle inventor synthesize swizzles for a program sketch, it can find relatively optimized swizzles that were previously unknown.

What are the paper’s strengths?

The paper does a great job of illustrating their solution in their figures and motivating the types of operations they think are useful to include in their swizzle search space. I also believe that this paper does an exceptional job in describing a generalizable solution that can be extended to data movement in storage and for other architectures. Essentially, I think the ideas in this paper provides a framework for new research utilizing code synthesis in heterogeneous systems.

What are the paper’s weaknesses?

I don’t find any notable weaknesses, though I suppose that some criticisms may include an implementation that is narrowly applicable–requiring the use of rosette and cuda. I don’t really find these to be weaknesses, but if I had to say something I think this would be it.

Suggestions to improve paper

I don’t have any suggestions. I liked this paper the way it was and thought it was well written.

Impact and more recent work

This paper was very recent (published 6 months ago) but I imagine it will have a high impact. The implementation may not be used extensively, but the ideas provide an approach for optimizing complex, mechanical operations that are useful for a variety of architectures. I think many workloads today are primarily bottlenecked by data movement. Compute is becoming very cheap and many types of accelerators are becoming more common. The bottleneck for utilizing much of this compute is data movement. In my own work with storage systems, I have been interested in applying the work described in this paper to see if there are ways to generate optimal data access functions to be used in the storage layer. I find this work exciting and even if I don’t use it directly, I think this is a convincing application of code synthesis that will become necessary in the near future.

Citation information

@inproceedings{conf:swizzle-inventor,
    author    = {Phitchaya Mangpo Phothilimthana and Archibald Samuel Elliott
                 and An Wang and Abhinav Jangda and Bastian Hagedorn
                 and Henrik Barthels and Samuel J. Kaufman and Vinod Grover
                 and Emina Torlak and Rastislav Bodik},
    title     = {Swizzle Inventor: Data Movement Synthesis for GPU Kernels},
    booktitle = {Proceedings of the Twenty-Fourth International Conference on Architectural
                 Support for Programming Languages and Operating Systems},
    series    = {ASPLOS '19},
    year      = {2019},
    isbn      = {978-1-4503-6240-5},
    location  = {Providence, RI, USA},
    pages     = {65--78},
    numpages  = {14},
    url       = {http://doi.acm.org/10.1145/3297858.3304059},
    doi       = {10.1145/3297858.3304059},
    acmid     = {3304059},
    publisher = {ACM},
    address   = {New York, NY, USA},
    keywords  = {GPGPU, program synthesis, swizzling},
}