Making Neural Network Models More Efficient- [electronic resource]
- Material type
- Dissertation file (foreign)
- Last processed
- 2024-02-14 10:04:45
- ISBN
- 9798379717254
- DDC
- 004
- Author
- Su, Yushan.
- Title/Author
- Making Neural Network Models More Efficient - [electronic resource]
- Publication
- [S.l.] : Princeton University, 2023
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical description
- 1 online resource (94 p.)
- Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- Note
- Advisor: Li, Kai.
- Dissertation note
- Thesis (Ph.D.)--Princeton University, 2023.
- Restrictions on use note
- This item must not be sold to any third party vendors.
- Abstract
- Summary: Complex machine learning tasks typically require large neural network models. However, training and inference on neural models require substantial compute power and large memory footprints, and incur significant costs. My thesis studies methods to make neural networks efficient at a relatively low cost. First, we explore how to utilize CPU servers for training and inference. CPU servers are more readily available, have larger memories, and cost much less than GPUs or hardware accelerators. However, they are much less efficient for training and inference tasks. My thesis studies how to design efficient software kernels for sparse neural networks that allow unstructured pruning to achieve efficient training and inference. Our evaluation shows that our sparse kernels can achieve 6.4x-20.0x speedups at medium sparsities over the commonly used Intel MKL sparse library for CPUs and greatly reduce the performance gap with GPU kernels. Second, we study how to achieve high-throughput inference for large models. We propose PruMUX, a method that combines data multiplexing with model compression. We find that in most cases, PruMUX achieves better throughput than either approach alone for a given accuracy-loss budget. Third, we study how to find the best parameter sets for PruMUX in order to make it practical. We propose Auto-PruMUX, which uses performance modeling based on a set of data points to predict multiplexing parameters for DataMUX and sparsity parameters for a given model compression technique. Our evaluation shows that Auto-PruMUX can successfully find or predict parameters that achieve the best throughput given an accuracy-loss budget. This dissertation also proposes several future research directions in these areas.
- General subject
- Computer science.
- General subject
- Computer engineering.
- Keyword
- Neural network
- Keyword
- Compute power
- Added author
- Princeton University Computer Science
- Entry for original item
- Dissertations Abstracts International. 84-12B.
- Entry for original item
- Dissertations Abstracts International
- Electronic location and access
- Full text available after login.