Jihyung Kil

Research Scientist
Adobe Research
LinkedIn / Twitter / Google Scholar


I am a Research Scientist at Adobe Research. I earned my Ph.D. in Computer Science and Engineering from The Ohio State University, advised by Wei-Lun (Harry) Chao. My research centers on AI agents, with a recent focus on GUI (computer-using) and embodied agents. I am also interested in multimodal understanding and generation for long-form content such as documents and textbooks. Feel free to reach out at jkil@adobe.com for collaborations and internships.

Work
Adobe Research - Research Scientist (2024 - present)
Amazon - Research Intern (2023)
Google Research (now DeepMind) - Research Intern (2022)

Research [see all]

GUI (Computer-Using) / Embodied Agents

  1. GUI Agents: A Survey
Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, et al.
    ACL 2025
  2. GPT-4V(ision) is a Generalist Web Agent, if Grounded
    Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
    ICML 2024
  3. Dual-View Visual Contextualization for Web Navigation
    Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao
    CVPR 2024
  4. One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
    Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M Sadler, Wei-Lun Chao, Yu Su
    CVPR 2022

Multimodal Learning

  1. VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
    Jian Chen, Ming Li, Jihyung Kil, Chenguang Wang, Tong Yu, Ryan Rossi, Tianyi Zhou, Changyou Chen, Ruiyi Zhang
    arXiv 2025
  2. MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
    Jihyung Kil*, Zheda Mai*, Justin Lee, Zihe Wang, Kerrie Cheng, Lemeng Wang, Ye Liu, Arpita Chowdhury, Wei-Lun Chao
    NeurIPS 2024
  3. ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
    Ju-Seung Byun*, Jiyun Chun*, Jihyung Kil, Andrew Perrault
    EMNLP 2024
  4. II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
    Jihyung Kil, Farideh Tavazoee, Dongyeop Kang, Joo-Kyung Kim
    ACL 2024
  5. PreSTU: Pre-Training for Scene-Text Understanding
    Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut
    ICCV 2023
  6. Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
    Jihyung Kil, Cheng Zhang, Dong Xuan, Wei-Lun Chao
    EMNLP 2021

Other

  1. Representation Shift: Unifying Token Compression with FlashAttention
    Joonmyung Choi*, Sanghyeok Lee*, Byungoh Ko, Eunseo Kim, Jihyung Kil, Hyunwoo J. Kim
    ICCV 2025
  2. Revisiting Document Representations for Large-Scale Zero-Shot Learning
    Jihyung Kil, Wei-Lun Chao
    NAACL 2021