Jihyung Kil

Research Scientist
Adobe Research
LinkedIn / Twitter / Google Scholar / GitHub


I am a Research Scientist at Adobe Research. I earned my Ph.D. in Computer Science and Engineering from The Ohio State University, where I was fortunate to work with Wei-Lun (Harry) Chao. Prior to Adobe, I interned at Google Research and Amazon Alexa AI.

I am broadly interested in Vision and Language, with a recent focus on multimodal document understanding and web/UI agents. Feel free to contact me at jkil@adobe.com regarding university collaborations and internships.

News
Sep, 2024 Our CompBench on MLLMs accepted to NeurIPS 2024 Datasets and Benchmarks.
Sep, 2024 Our ARES on multimodal CoT accepted to EMNLP 2024.
May, 2024 Our II-MMR on Visual Question Answering accepted to ACL 2024.
May, 2024 Our SeeAct on Web Navigation accepted to ICML 2024.
Mar, 2024 I was selected for the Doctoral Consortium at CVPR 2024.
Feb, 2024 Our Dual-VCR on Web Navigation accepted to CVPR 2024.
Jul, 2023 Our PreSTU on Scene-Text Understanding accepted to ICCV 2023.
Mar, 2022 Our M-Track on Vision and Language Navigation accepted to CVPR 2022.
Dec, 2021 Our team was selected to participate in the Amazon Alexa Prize SimBot Challenge.
Aug, 2021 Our SimpleAug on Visual Question Answering accepted to EMNLP 2021.
Apr, 2021 Our paper on Zero-Shot Learning accepted to NAACL 2021.
Research
  1. NeurIPS
    CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
    Jihyung Kil*, Zheda Mai*, Justin Lee, Zihe Wang, Kerrie Cheng, Lemeng Wang, Ye Liu, Arpita Chowdhury, Wei-Lun Chao
    NeurIPS 2024 Datasets and Benchmarks
  2. EMNLP
    ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
    Ju-Seung Byun*, Jiyun Chun*, Jihyung Kil, Andrew Perrault
    EMNLP 2024
  3. ACL
    II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
    Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
    ACL Findings 2024
  4. ICML
    GPT-4V(ision) is a Generalist Web Agent, if Grounded
    Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
    ICML 2024
  5. CVPR
    Dual-View Visual Contextualization for Web Navigation
    Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao
    CVPR 2024
  6. ICCV
    PreSTU: Pre-Training for Scene-Text Understanding
    Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut
    ICCV 2023
  7. CVPR
    One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
    Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M Sadler, Wei-Lun Chao, Yu Su
    CVPR 2022
  8. EMNLP
    Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
    Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
    EMNLP 2021
  9. NAACL
    Revisiting Document Representations for Large-Scale Zero-Shot Learning
    Jihyung Kil, Wei-Lun Chao
    NAACL 2021