Learning Visually Grounded Intelligence With Language
Detailed Information
- Material Type
- Dissertation (Western)
- Last Processed
- 2025-02-11 15:20:12
- ISBN
- 9798382832883
- DDC
- 400
- Author
- Li, Liunian.
- Title/Author
- Learning Visually Grounded Intelligence With Language
- Publication
- [S.l.] : University of California, Los Angeles, 2024
- Publication
- Ann Arbor : ProQuest Dissertations & Theses, 2024
- Physical Description
- 128 p.
- Notes
- Source: Dissertations Abstracts International, Volume: 85-12, Section: A.
- Notes
- Advisor: Chang, Kai-Wei.
- Dissertation Note
- Thesis (Ph.D.)--University of California, Los Angeles, 2024.
- Abstract
- To build an artificial intelligence system that can assist us in our daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotating data against the fixed vocabulary. Because it is hard to define a comprehensive vocabulary covering every concept, such models generalize poorly to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool for building visually grounded models. Intuitively, natural language is the most effective medium of learning and communication for humans. I introduce two lines of work that train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling approaches such as BERT and extends them to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts: users can specify custom needs and domain knowledge in a text prompt, and the model conditions its predictions on that text on the fly.
- General Subject
- Language
- General Subject
- Linguistics
- General Subject
- Computer science
- Keyword
- Vocabularies
- Keyword
- Visual input
- Other Author
- University of California, Los Angeles Computer Science 0201
- Host Item Entry
- Dissertations Abstracts International. 85-12A.
- Electronic Location and Access
- Full text is available after login.
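The prompt-based detection setting described in the abstract can be illustrated with a toy sketch: each candidate image region is scored against the words of the user's text prompt, so the detectable vocabulary is whatever the prompt contains. The hand-crafted embeddings and function names below are hypothetical stand-ins for the learned components, not the thesis's actual implementation.

```python
# Toy sketch of prompt-conditioned detection scoring: regions are matched
# to prompt words by embedding similarity, so changing the prompt changes
# what the detector looks for. All vectors here are illustrative.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hand-crafted word embeddings standing in for learned ones.
word_embeddings = {
    "cat":    [1.0, 0.0, 0.0],
    "helmet": [0.0, 1.0, 0.0],
    "person": [0.0, 0.0, 1.0],
}

def score_regions(region_features, prompt_words):
    """For each region feature, return the best-matching prompt word
    and its alignment score."""
    results = []
    for feat in region_features:
        best = max(prompt_words, key=lambda w: dot(feat, word_embeddings[w]))
        results.append((best, dot(feat, word_embeddings[best])))
    return results

# Two candidate regions; the prompt decides which concepts are detectable.
regions = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
print(score_regions(regions, ["cat", "person"]))
# → [('cat', 0.9), ('person', 0.8)]
```

Swapping `["cat", "person"]` for another prompt re-targets the same regions without retraining, which is the "on the fly" behavior the abstract describes.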
MARC
008250123s2024 us c eng d■001000017162441
■00520250211152012
■006m o d
■007cr#unu||||||||
■020 ▼a9798382832883
■035 ▼a(MiAaPQ)AAI31331374
■040 ▼aMiAaPQ▼cMiAaPQ
■0820 ▼a400
■1001 ▼aLi, Liunian.
■24510▼aLearning Visually Grounded Intelligence With Language
■260 ▼a[Sl]▼bUniversity of California, Los Angeles▼c2024
■260 1▼aAnn Arbor▼bProQuest Dissertations & Theses▼c2024
■300 ▼a128 p
■500 ▼aSource: Dissertations Abstracts International, Volume: 85-12, Section: A.
■500 ▼aAdvisor: Chang, Kai-Wei.
■5021 ▼aThesis (Ph.D.)--University of California, Los Angeles, 2024.
■520 ▼aTo build an artificial intelligence system that can assist us in our daily lives, the ability to understand the world around us through visual input is essential. Prior studies train visual perception models by defining concept vocabularies and annotating data against the fixed vocabulary. Because it is hard to define a comprehensive vocabulary covering every concept, such models generalize poorly to novel concepts and domains. In this thesis, I turn to language as a scalable and effective tool for building visually grounded models. Intuitively, natural language is the most effective medium of learning and communication for humans. I introduce two lines of work that train models to understand the visual world with language as supervision. The first line of work is inspired by masked language modeling approaches such as BERT and extends them to build contextualized representation models for vision and language. These models can be fine-tuned to perform vision-language tasks such as answering questions about an image. The second line of work uses language to supervise object detection models and enables object detection with prompts: users can specify custom needs and domain knowledge in a text prompt, and the model conditions its predictions on that text on the fly.
■590 ▼aSchool code: 0031.
■650 4▼aLanguage
■650 4▼aLinguistics
■650 4▼aComputer science
■653 ▼aVisual perception
■653 ▼aMasked language modeling
■653 ▼aVocabularies
■653 ▼aVisual input
■690 ▼a0800
■690 ▼a0984
■690 ▼a0679
■690 ▼a0290
■71020▼aUniversity of California, Los Angeles▼bComputer Science 0201.
■7730 ▼tDissertations Abstracts International▼g85-12A.
■790 ▼a0031
■791 ▼aPh.D.
■792 ▼a2024
■793 ▼aEnglish
■85640▼uhttp://www.riss.kr/pdu/ddodLink.do?id=T17162441▼nKERIS▼zThe full text of this material is provided by KERIS (Korea Education and Research Information Service).


