Making Neural Network Models More Efficient- [electronic resource]
- Material type
- Dissertation file (foreign)
- Last processed
- 2024-02-14 10:04:45
- ISBN
- 9798379717254
- DDC
- 004
- Author
- Su, Yushan.
- Title/Author
- Making Neural Network Models More Efficient - [electronic resource]
- Publication
- [S.l.] : Princeton University, 2023
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2023
- Physical description
- 1 online resource (94 p.)
- Note
- Source: Dissertations Abstracts International, Volume: 84-12, Section: B.
- Note
- Advisor: Li, Kai.
- Dissertation note
- Thesis (Ph.D.)--Princeton University, 2023.
- Restrictions on use note
- This item must not be sold to any third party vendors.
- Abstract
- Summary: Complex machine learning tasks typically require large neural network models. However, training and inference on neural models require substantial compute power and large memory footprints, and incur significant costs. My thesis studies methods to make neural networks efficient at a relatively low cost. First, we explore how to utilize CPU servers for training and inference. CPU servers are more readily available, have larger memories, and cost much less than GPUs or hardware accelerators. However, they are much less efficient for training and inference tasks. My thesis studies how to design efficient software kernels for sparse neural networks that allow unstructured pruning to achieve efficient training and inference. Our evaluation shows that our sparse kernels can achieve 6.4x-20.0x speedups at medium sparsities over the commonly used Intel MKL sparse library for CPUs and greatly reduce the performance gap with GPU kernels. Second, we study how to achieve high-throughput inference for large models. We propose PruMUX, a method that combines data multiplexing with model compression. We find that in most cases, PruMUX achieves better throughput than either approach alone for a given accuracy-loss budget. Third, we study how to find the best parameter sets for PruMUX in order to make it practical. We propose Auto-PruMUX, which uses performance modeling based on a set of data points to predict multiplexing parameters for DataMUX and sparsity parameters for a given model compression technique. Our evaluation shows that Auto-PruMUX can successfully find or predict parameters that achieve the best throughput given an accuracy-loss budget. This dissertation also proposes several future research directions in these areas.
- General subject
- Computer science.
- General subject
- Computer engineering.
- Keyword
- Neural network
- Keyword
- Compute power
- Added author
- Princeton University Computer Science
- Entry for original item
- Dissertations Abstracts International. 84-12B.
- Entry for original item
- Dissertations Abstracts International
- Electronic location and access
- Full text available after login.