
Jihyung Kil
Research Scientist
Adobe Research
LinkedIn / Twitter / Google Scholar
I am a Research Scientist at Adobe Research. I earned my Ph.D. in Computer Science and Engineering from The Ohio State University, advised by Wei-Lun (Harry) Chao. My research interests include AI agents, with a recent focus on GUI/computer-using and embodied agents. I am also interested in multimodal understanding and generation for long-form content such as documents or textbooks. Feel free to reach out at jkil@adobe.com for collaborations and internships.
Work
Adobe Research - Research Scientist (2024 - present)
Amazon - Research Intern (2023)
Google Research (now DeepMind) - Research Intern (2022)
Research
GUI (Computer-Using) / Embodied Agents
- GUI Agents: A Survey
- GPT-4V(ision) is a Generalist Web Agent, if Grounded
- Dual-View Visual Contextualization for Web Navigation
- One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones
Multimodal Learning
- VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
- MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs
- ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
- II-MMR: Identifying and Improving Multi-modal Multi-hop Reasoning in Visual Question Answering
- PreSTU: Pre-Training for Scene-Text Understanding
- Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Other
- Representation Shift: Unifying Token Compression with FlashAttention
- Revisiting Document Representations for Large-Scale Zero-Shot Learning