Learning to Design Protein and DNA Libraries [electronic resource]
Material type
Dissertation file (international)
Last processed
20240214101125
ISBN
9798380380782
DDC
004
Author
Busia, Akosua.
Title/Author
Learning to Design Protein and DNA Libraries [electronic resource]
Publication
[S.l.] : University of California, Berkeley, 2023
Publication
Ann Arbor : ProQuest Dissertations & Theses, 2023
Physical description
1 online resource (106 p.)
Note
Source: Dissertations Abstracts International, Volume: 85-03, Section: B.
Note
Advisor: Listgarten, Jennifer; Jordan, Michael.
Dissertation note
Thesis (Ph.D.)--University of California, Berkeley, 2023.
Use restriction note
This item must not be sold to any third party vendors.
Abstract
Using next-generation sequencing, it is now possible to screen up to billions of protein or DNA sequences in parallel for a property of interest. Consequently, high-throughput sequencing has vastly accelerated the rate of biological discovery for both basic scientific inquiry and for engineering novel enzymes, therapeutics, antibodies, regulatory elements, and beyond. In such high-throughput sequencing-based screens and selections, the quality of the starting sequence library greatly influences the overall chance of successfully identifying sequences with the desired property. Generalizable in silico methods for designing high-quality sequence libraries promise to reduce wet lab experimental burden and improve the speed with which new, functional sequences can be discovered. Machine learning, in particular, provides a useful set of tools for implementing such methods, as it is well-suited to analyzing the large quantities of data produced by high-throughput sequencing. In this dissertation, we discuss several aspects of machine learning-guided library design, and propose solutions to challenges posed by existing technologies.

First, we introduce a framework for machine learning-guided library design, and showcase its ability to design diverse, functional libraries in a gene therapy context. Specifically, we (i) outline a modeling approach for predicting the property selected for in a high-throughput sequencing-based selection experiment that explicitly accounts for uncertainty in the observed sequencing data, and (ii) describe a novel machine learning-guided design procedure that optimally trades off between a library's average predicted property values and its sequence diversity. We use these methods to design a clinically relevant adeno-associated virus (AAV) peptide insertion library. AAVs hold tremendous promise as delivery vectors for clinical gene therapy, and packaging is a general prerequisite for delivering genetic material to a target tissue. Standard diversified libraries for engineering effective AAV delivery vectors contain a high proportion of variants that are unable to assemble or package their genomes, which often limits the effectiveness of downstream selections for desired properties such as efficient infection of human tissues. Using our machine learning-guided design framework, we systematically design effective starting libraries that are as diverse as possible whilst being biased towards variants that are able to assemble and package the viral genome efficiently. Specifically, we design a library of peptide insertions into the AAV capsid that achieves five-fold higher packaging fitness than the standard insertion library, known as the "NNK" library, with negligible reduction in diversity. We further demonstrate the general utility of our designed library on a downstream task to which our design approach was agnostic: infection of primary human brain tissue. Compared to the standard NNK library, our machine learning-designed library contains approximately 10-fold more variants that successfully infect the human brain.

Next, we highlight a key shortcoming of the above predictive modeling approach, namely its extremely limited ability to share information across related but non-identical reads, which prevents it from making effective use of sequencing data in many settings of interest. We introduce model-based enrichment (MBE) to overcome this shortcoming. MBE is based on a new perspective of differential sequencing analysis that uses sound theoretical principles from the density ratio estimation field in machine learning, is easy to implement, and can trivially make use of advances in modern-day machine learning classification architectures or related innovations. We evaluate MBE empirically, both in simulation and on real experimental data, and show that it improves accuracy compared to current ways of performing sequencing-based differential analyses, including the previous section's predictive modeling approach. The greater flexibility of our new approach enables effective analysis across a broader range of common experimental setups than can currently be achieved, thereby expanding the set of biological applications for which one can learn accurate predictive models to guide library design.

Finally, we highlight some remaining challenges for machine learning-guided library design, including research opportunities into combining multiple sources of biological information in the design process. In summary, this dissertation presents a number of machine learning techniques that can be brought to bear on the problem of designing improved starting libraries for biological screens and selection experiments. The insights from this work provide further motivation for researchers to combine laboratory experiments with tools from machine learning to efficiently engineer novel functional protein and DNA sequences.
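
The abstract describes model-based enrichment only at a high level: density ratio estimation carried out with a classifier. The sketch below illustrates that general idea and is not the dissertation's implementation. It assumes toy 4-base DNA reads, a simulated selection that favors reads starting with "A", and a plain logistic regression on one-hot features. All function names and parameters (`active_features`, `fit_logistic`, `log_enrichment`, the keep probabilities) are hypothetical. A classifier trained to separate pre-selection from post-selection reads has a logit equal to the log density ratio plus the log ratio of class sizes, so subtracting that constant gives an estimated log enrichment for any sequence, including ones never observed.

```python
import math
import random

ALPHABET = "ACGT"
SEQ_LEN = 4  # assumed toy read length


def active_features(seq):
    """Indices of the active one-hot features (one per position)."""
    return [pos * len(ALPHABET) + ALPHABET.index(base) for pos, base in enumerate(seq)]


def fit_logistic(examples, labels, n_features, epochs=200, lr=1.0):
    """Logistic regression on sparse one-hot features, full-batch gradient descent."""
    w, b, n = [0.0] * n_features, 0.0, len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for idx, y in zip(examples, labels):
            p = 1.0 / (1.0 + math.exp(-(b + sum(w[j] for j in idx))))
            grad_b += p - y
            for j in idx:
                grad_w[j] += p - y
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def log_enrichment(seq, w, b, n_pre, n_post):
    """Estimated log p_post(seq) / p_pre(seq): classifier logit minus log class-size ratio."""
    logit = b + sum(w[j] for j in active_features(seq))
    return logit - math.log(n_post / n_pre)


# Simulated data (an assumption for illustration, not experimental data).
rng = random.Random(0)


def random_read():
    """A read from the unselected library: uniform over all sequences."""
    return "".join(rng.choice(ALPHABET) for _ in range(SEQ_LEN))


def selected_read():
    """A read that survived selection: reads starting with 'A' are favored."""
    while True:
        seq = random_read()
        keep_prob = 0.9 if seq[0] == "A" else 0.1
        if rng.random() < keep_prob:
            return seq


pre_reads = [random_read() for _ in range(500)]
post_reads = [selected_read() for _ in range(500)]

# Label 0 = pre-selection, label 1 = post-selection.
examples = [active_features(s) for s in pre_reads + post_reads]
labels = [0] * len(pre_reads) + [1] * len(post_reads)
w, b = fit_logistic(examples, labels, n_features=SEQ_LEN * len(ALPHABET))

scores = {
    s: log_enrichment(s, w, b, len(pre_reads), len(post_reads))
    for s in ("AAAA", "CAAA")
}
print(scores)
```

Because the model shares parameters across positions, reads that differ at a single base inform one another's estimates, which is the kind of information sharing that per-sequence count ratios cannot provide. Any classifier that outputs calibrated probabilities could be swapped in for the logistic regression used here.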
Subject
Computer science.
Subject
Biomedical engineering.
Subject
Electrical engineering.
Keyword
Model-based enrichment
Keyword
DNA sequences
Keyword
Machine learning
Keyword
DNA libraries
Keyword
Design protein
Added author
University of California, Berkeley. Electrical Engineering & Computer Sciences
Host item
Dissertations Abstracts International. 85-03B.
Host item
Dissertation Abstracts International

Holdings
Registration no.  Call no.  Location  Availability  Loan info
TF05526  E-book