Learning Visually Grounded Intelligence With Language
Detailed Information
- Material Type
- Dissertation (Western)
- Last Processed
- 2025-02-11 15:20:12
- ISBN
- 9798382832883
- DDC
- 400
- Author
- Li, Liunian.
- Title/Author
- Learning Visually Grounded Intelligence With Language
- Publication
- [S.l.] : University of California, Los Angeles, 2024
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 128 p.
- Notes
- Source: Dissertations Abstracts International, Volume: 85-12, Section: A.
- Notes
- Advisor: Chang, Kai-Wei.
- Dissertation Note
- Thesis (Ph.D.)--University of California, Los Angeles, 2024.
- Abstract
- To build an artificial intelligence system that can assist us in our daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotating data against the fixed vocabulary. Because it is hard to define a comprehensive vocabulary covering every concept, such models generalize poorly to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool for building visually grounded models. Intuitively, natural language is the most effective medium of learning and communication for humans. I introduce two lines of work that train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling approaches such as BERT and extends them to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts: users can specify custom needs and domain knowledge in a text prompt, and the model conditions its predictions on that text on the fly.
- General Subject
- Language
- General Subject
- Linguistics
- General Subject
- Computer science
- Keyword
- Vocabularies
- Keyword
- Visual input
- Other Author
- University of California, Los Angeles Computer Science 0201
- Host Item Entry
- Dissertations Abstracts International. 85-12A.
- Electronic Location and Access
- Full text is available after login.
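The prompt-based detection setting described in the abstract can be illustrated with a toy sketch: each candidate image region is scored against the words of the user's text prompt, so the detectable vocabulary is whatever the prompt contains. The hand-crafted embeddings and function names below are hypothetical stand-ins for the learned components, not the thesis's actual implementation.

```python
# Toy sketch of prompt-conditioned detection scoring: regions are matched
# to prompt words by embedding similarity, so changing the prompt changes
# what the detector looks for. All vectors here are illustrative.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hand-crafted word embeddings standing in for learned ones.
word_embeddings = {
    "cat":    [1.0, 0.0, 0.0],
    "helmet": [0.0, 1.0, 0.0],
    "person": [0.0, 0.0, 1.0],
}

def score_regions(region_features, prompt_words):
    """For each region feature, return the best-matching prompt word
    and its alignment score."""
    results = []
    for feat in region_features:
        best = max(prompt_words, key=lambda w: dot(feat, word_embeddings[w]))
        results.append((best, dot(feat, word_embeddings[best])))
    return results

# Two candidate regions; the prompt decides which concepts are detectable.
regions = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
print(score_regions(regions, ["cat", "person"]))
# → [('cat', 0.9), ('person', 0.8)]
```

Swapping `["cat", "person"]` for another prompt re-targets the same regions without retraining, which is the "on the fly" behavior the abstract describes.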
MARC
008250123s2024 us c eng d■001000017162441
■00520250211152012
■006m o d
■007cr#unu||||||||
■020 ▼a9798382832883
■035 ▼a(MiAaPQ)AAI31331374
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a400
■1001 ▼aLi, Liunian.
■24510▼aLearning Visually Grounded Intelligence With Language
■260 ▼a[Sl]▼bUniversity of California, Los Angeles▼c2024
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300 ▼a128 p
■500 ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: A.
■500 ▼aAdvisor: Chang, Kai-Wei.
■5021 ▼aThesis (Ph.D.)--University of California, Los Angeles, 2024.
■520 ▼aTo build an artificial intelligence system that can assist us in our daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotating data against the fixed vocabulary. Because it is hard to define a comprehensive vocabulary covering every concept, such models generalize poorly to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool for building visually grounded models. Intuitively, natural language is the most effective medium of learning and communication for humans. I introduce two lines of work that train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling approaches such as BERT and extends them to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts: users can specify custom needs and domain knowledge in a text prompt, and the model conditions its predictions on that text on the fly.
■590 ▼aSchool code: 0031.
■650 4▼aLanguage
■650 4▼aLinguistics
■650 4▼aComputer science
■653 ▼aVisual perception
■653 ▼aMasked language modeling
■653 ▼aVocabularies
■653 ▼aVisual input
■690 ▼a0800
■690 ▼a0984
■690 ▼a0679
■690 ▼a0290
■71020▼aUniversity of California, Los Angeles▼bComputer Science 0201.
■7730 ▼tDissertations Abstracts International▼g85-12A.
■790 ▼a0031
■791 ▼aPh.D.
■792 ▼a2024
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162441▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).


