Publications

2024

  1. preprint
    II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
    Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
    arXiv 2024
  2. ICML
    GPT-4V(ision) is a Generalist Web Agent, if Grounded
    Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
    ICML 2024
  3. CVPR
    Dual-View Visual Contextualization for Web Navigation
    Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao
    CVPR 2024

2023

  1. ICCV
    PreSTU: Pre-Training for Scene-Text Understanding
    Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut
    ICCV 2023

2022

  1. CVPR
    One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
    Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su
    CVPR 2022

2021

  1. EMNLP
    Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
    Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
    EMNLP 2021
  2. NAACL
    Revisiting Document Representations for Large-Scale Zero-Shot Learning
    Jihyung Kil, Wei-Lun Chao
    NAACL 2021