About

Hello, I am Yerim Jeon, a Ph.D. candidate at Sungkyunkwan University (SKKU), South Korea, working in the Visual Computing Lab (VCLab) under the supervision of Prof. Jae-Pil Heo. Prior to my Ph.D., I received both my B.S. and M.S. degrees from the same institution.

My research focuses on multimodal large language models (MLLMs) for 3D scene understanding. Specifically, I am interested in how MLLMs can interpret and reason about 3D scenes through natural language, bridging 3D perception and language understanding.

Publications

Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding
Yerim Jeon, Miso Lee, WonJun Moon, and Jae-Pil Heo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
arXiv / Code

Temporally Consistent Long-Term Memory for 3D Single Object Tracking
Jaejoon Yoo, SuBeen Lee, Yerim Jeon, Miso Lee, and Jae-Pil Heo
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026 [Findings]
arXiv / Code

Boundary-Recovering Network for Temporal Action Detection
Jihwan Kim, Jaehyun Choi, Yerim Jeon, and Jae-Pil Heo
Pattern Recognition, 2026
Paper / arXiv

Mutually-Aware Feature Learning for Few-Shot Object Counting
Yerim Jeon, Subeen Lee, Jihwan Kim, and Jae-Pil Heo
Pattern Recognition, 2025
Paper / arXiv

Projects

GUS HT Document Recognition AI for Logistics Automation (Mar 2025 – Mar 2026)

Sponsored by Hyundai Glovis
Developed a multimodal LLM-based pipeline that automatically extracts text and image information from diverse logistics documents (BillingInvoice, InboundPOD, OutboundPOD).

3D Perception Technology for Autonomous Pallet Stacker (Jun 2024 – Mar 2025)

Sponsored by Mobyus
Developed a LiDAR-based model that estimates the center position of pallet holes for forklift cargo handling, and an RGB stereo-based perception system that recognizes truck-loading and open-yard stacking environments.

Detection of Manipulated Images/Videos for Criminal Investigation (Jan 2023 – Dec 2023)

Sponsored by IITP
Developed models that robustly detect manipulated images produced by diverse generative methods, including deepfake, diffusion, and GAN-based approaches.

Reconstruction of Non-Line-of-Sight Scene for VR/AR Contents (Mar 2021 – Dec 2022)

Sponsored by IITP
Built a multimodal deep learning pipeline that fuses laser, RF, and sound sensor data for object detection, depth estimation, and scene reconstruction in non-line-of-sight environments.