Career Profile
I’m currently looking for a software engineering position in backend, infrastructure, and machine learning systems (ML infrastructure). I received my PhD from New York University in September 2019, advised by Prof. Jinyang Li. My research interest is distributed systems; more specifically, my PhD research focused on building machine learning systems that help AI developers distribute their algorithms across a multi-GPU machine or a large-scale cluster. Before my PhD, I worked for four years as a software engineer at MediaTek, Taiwan, developing system software (e.g., hardware drivers and management software) for multi-core mobile phone processors.
Experiences
The following are the three projects I worked on during my PhD study.
- SwapAdvisor: Support Large Deep Learning Models via Smart Swapping
SwapAdvisor automatically swaps temporarily unused tensors from GPU memory to CPU memory to support running larger DNN models. To minimize the communication overhead, SwapAdvisor analyzes the dataflow graph of the given DNN model and uses a custom-designed genetic algorithm to optimize operator scheduling and memory allocation. Based on the optimized schedule and allocation, SwapAdvisor determines what and when to swap to achieve good performance.
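The schedule-search idea can be sketched as follows. This is a toy, mutation-only evolutionary search standing in for SwapAdvisor's genetic algorithm; the dataflow graph, tensor sizes, memory budget, and cost model are all invented for illustration and are far simpler than the real system:

```python
import random

random.seed(0)

# Toy dataflow graph: op i produces a tensor of SIZES[i] MB, which stays
# live until all of its consumers (CONSUMERS[i]) have run.  All numbers
# here are made up for illustration.
SIZES = [4, 16, 2, 8, 32, 1, 6, 12]
CONSUMERS = {0: [3], 1: [4], 2: [4], 3: [5], 4: [6], 5: [7], 6: [7], 7: []}
GPU_BUDGET = 40  # MB

def swap_traffic(order):
    """Estimated swap volume: how far peak live memory exceeds the budget.
    Schedules that violate a dependency get an infinite cost."""
    pos = {op: i for i, op in enumerate(order)}
    if any(pos[c] < pos[p] for p, cs in CONSUMERS.items() for c in cs):
        return float("inf")
    peak = 0
    for step in range(len(order)):
        live = 0
        for op in order:
            if pos[op] > step:
                continue  # not produced yet
            cons = CONSUMERS[op]
            # A tensor is live until its last consumer runs; graph outputs
            # (no consumers) stay live to the end.
            if not cons or max(pos[c] for c in cons) >= step:
                live += SIZES[op]
        peak = max(peak, live)
    return max(0, peak - GPU_BUDGET)

def evolve(pop_size=30, generations=60):
    """Evolve operator schedules to minimize estimated swap traffic."""
    ops = list(range(len(SIZES)))
    # Seed with one known-valid topological order plus random permutations.
    pop = [ops[:]] + [random.sample(ops, len(ops)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=swap_traffic)
        survivors = pop[: pop_size // 2]   # keep the fitter half
        children = []
        for parent in survivors:
            child = parent[:]
            i, j = random.sample(range(len(ops)), 2)
            child[i], child[j] = child[j], child[i]  # swap mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=swap_traffic)

best = evolve()
print("best schedule:", best, "swap traffic (MB):", swap_traffic(best))
```

The naive topological order `[0, 1, 2, ..., 7]` keeps both large intermediate chains live at once; the search finds schedules that interleave the chains so less memory spills over the budget.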
- Tofu: Distributing Tensor Computation for Large-scale Deep Learning
Tofu partitions very large DNN models across multiple GPU devices to reduce the per-GPU memory footprint while achieving good performance. To determine the feasible partition strategies for each operator, Tofu provides a simple domain-specific language for describing an operator's semantics. Tofu analyzes the semantics of every operator in the target DNN model and applies a recursive search algorithm to minimize the total communication cost.
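The flavor of searching partition choices to minimize communication can be illustrated with a small memoized search. The operator chain, the "row"/"col" layout menu, and all costs below are invented; the real Tofu search works over tensor dimensions derived from each operator's described semantics, not a fixed list like this:

```python
from functools import lru_cache

# Toy operator chain.  For each op, the partition layouts it supports and
# the intrinsic communication each incurs (e.g. a partial-sum layout needs
# an all-reduce).  Costs are invented for illustration.
OPS = (
    ("matmul1", (("row", 2), ("col", 8))),
    ("relu",    (("row", 0), ("col", 0))),
    ("matmul2", (("row", 2), ("col", 8))),
)
RESHUFFLE = 4  # cost to convert between row- and col-partitioned layouts

@lru_cache(maxsize=None)
def best_cost(i, prev):
    """Minimum total communication for OPS[i:], given op i-1's layout."""
    if i == len(OPS):
        return 0
    best = float("inf")
    for layout, cost in OPS[i][1]:
        # Pay a reshuffle if this op's layout differs from the previous one.
        change = RESHUFFLE if prev is not None and layout != prev else 0
        best = min(best, cost + change + best_cost(i + 1, layout))
    return best

print("minimum communication cost:", best_cost(0, None))
```

Because subproblems are keyed only by the position and the incoming layout, memoization keeps the recursive search polynomial for a chain, even though the number of whole-model partition assignments grows exponentially.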
- Spartan: Distributed Array Programming Framework with Smart Tiling
Spartan is a distributed array framework built on top of a small set of higher-order dataflow operators. On top of these operators, Spartan provides a collection of Numpy-like array APIs. To achieve good performance for distributed applications, Spartan analyzes the communication pattern of the dataflow graph captured through the operators and applies a greedy strategy to find a tiling scheme that minimizes the communication cost.
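A minimal sketch of the higher-order-operator idea, with an invented API (not Spartan's real one): a "distributed" array is just a dict of tiles, and Numpy-like operations are derived from one generic tile-mapping operator:

```python
class DistArray:
    """A toy 'distributed' 1-D array stored as a dict of tiles."""
    def __init__(self, tiles):
        self.tiles = tiles  # {tile_id: list of elements}

    @classmethod
    def from_list(cls, data, n_tiles):
        # Assumes len(data) is divisible by n_tiles, for simplicity.
        size = len(data) // n_tiles
        return cls({i: data[i * size:(i + 1) * size] for i in range(n_tiles)})

def map_tiles(fn, *arrays):
    """Higher-order operator: apply fn tile-by-tile (tiles aligned by id).
    In a real system each application would run on the worker owning the tile."""
    ids = arrays[0].tiles
    return DistArray({i: fn(*(a.tiles[i] for a in arrays)) for i in ids})

# Numpy-like APIs built on top of the generic operator:
def add(a, b):
    return map_tiles(lambda t, u: [x + y for x, y in zip(t, u)], a, b)

def sum_all(a):
    # Per-tile partial sums, then a final combine.
    return sum(sum(t) for t in a.tiles.values())

x = DistArray.from_list(list(range(8)), 4)
y = DistArray.from_list([1] * 8, 4)
print(sum_all(add(x, y)))  # (0+1+...+7) plus eight ones = 36
```

Expressing the user-facing APIs through a few generic operators is what lets the framework see the whole dataflow graph, which is the information the tiling optimizer needs.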