Learning Visually Grounded Intelligence With Language

Detailed Information

Material type
Dissertation (Western)
Last processed
20250211152012
ISBN  
9798382832883
DDC  
400
Author
Li, Liunian.
Title/Author
Learning Visually Grounded Intelligence With Language
Publication
[S.l.] : University of California, Los Angeles, 2024
Publication
Ann Arbor : ProQuest Dissertations & Theses, 2024
Physical description
128 p
Note
Source: Dissertations Abstracts International, Volume: 85-12, Section: A.
Note
Advisor: Chang, Kai-Wei.
Dissertation note
Thesis (Ph.D.)--University of California, Los Angeles, 2024.
Abstract
To build an Artificial Intelligence system that can assist us in our daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotating data against the fixed vocabulary. It is hard to define a comprehensive vocabulary of everything, so such models struggle to generalize to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool for building visually grounded models. Intuitively, natural language is the most effective medium of learning and communication for humans. I introduce two lines of work that train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling such as BERT and extends it to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts: users can specify custom needs and domain knowledge in a text prompt, and the model grounds its predictions in the text on the fly.
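The abstract's second line of work casts detection as matching candidate image regions against the tokens of a free-form text prompt. As a minimal illustrative sketch of that idea (not the dissertation's actual model): if region features and prompt-token embeddings live in a shared space, grounding reduces to scoring each region against each token and keeping high-scoring pairs. All names, shapes, and the toy random embeddings below are this sketch's own assumptions.

```python
import numpy as np

def ground_regions(region_feats, token_feats, threshold=0.8):
    """Score each image region against each prompt token by cosine
    similarity in a shared embedding space; keep regions whose
    best-matching token clears the threshold."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = token_feats / np.linalg.norm(token_feats, axis=1, keepdims=True)
    scores = r @ t.T                   # (num_regions, num_tokens)
    best = scores.max(axis=1)          # best token score per region
    labels = scores.argmax(axis=1)     # which token each region matched
    keep = np.flatnonzero(best >= threshold)
    return [(int(i), int(labels[i]), float(best[i])) for i in keep]

# Toy example: 3 candidate regions, a prompt with 2 concept tokens.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 8))       # stand-ins for e.g. "cat", "dog"
regions = np.vstack([
    tokens[0] + 0.05 * rng.normal(size=8),  # region resembling token 0
    tokens[1] + 0.05 * rng.normal(size=8),  # region resembling token 1
    rng.normal(size=8),                     # background clutter
])
detections = ground_regions(regions, tokens, threshold=0.8)
```

Because the prompt is ordinary text, swapping in a new vocabulary or domain term means changing the prompt, not retraining against a fixed label set, which is the generalization benefit the abstract describes.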
Subject
Language
Subject
Linguistics
Subject
Computer science
Keyword
Visual perception
Keyword
Masked language modeling
Keyword
Vocabularies
Keyword
Visual input
Added author
University of California, Los Angeles Computer Science 0201
Host item entry
Dissertations Abstracts International. 85-12A.
Electronic location and access
Full text available after login.

MARC

 008250123s2024        us                              c    eng  d
■001000017162441
■00520250211152012
■006m          o    d                
■007cr#unu||||||||
■020    ▼a9798382832883
■035    ▼a(MiAaPQ)AAI31331374
■040    ▼aMiAaPQ▼cMiAaPQ
■0820  ▼a400
■1001  ▼aLi, Liunian.
■24510▼aLearning Visually Grounded Intelligence With Language
■260    ▼a[Sl]▼bUniversity of California, Los Angeles▼c2024
■260  1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300    ▼a128 p
■500    ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: A.
■500    ▼aAdvisor: Chang, Kai-Wei.
■5021  ▼aThesis (Ph.D.)--University of California, Los Angeles, 2024.
■520    ▼aTo build an Artificial Intelligence system that can assist us in daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotate data against the fixed vocabulary. It is hard to define a comprehensive set of everything, and thus they are hard to generalize to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool to build visually grounded models. Intuitively, natural languages are the most effective medium of learning and communication for humans. I will introduce two lines of work to train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling such as BERT, and extends that to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts, where the users could specify custom needs and domain knowledge in a text prompt, and the model situates its predictions based on the text on the fly.
■590    ▼aSchool code: 0031.
■650  4▼aLanguage
■650  4▼aLinguistics
■650  4▼aComputer science
■653    ▼aVisual perception
■653    ▼aMasked language modeling
■653    ▼aVocabularies
■653    ▼aVisual input
■690    ▼a0800
■690    ▼a0984
■690    ▼a0679
■690    ▼a0290
■71020▼aUniversity of California, Los Angeles▼bComputer Science 0201.
■7730  ▼tDissertations Abstracts International▼g85-12A.
■790    ▼a0031
■791    ▼aPh.D.
■792    ▼a2024
■793    ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162441▼nKERIS▼zThe full text of this material is provided by KERIS.


    Holdings

    Registration No. | Call No. | Location | Availability
    TF13848 | | E-book | Available for loan
