Visible and Clear: Finding Tiny Objects in Difference Map.- Rethinking Image Super Resolution from Training Data Perspectives.- BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering.- Efficient Inference of Vision Instruction-Following Models with Elastic Cache.- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior.- Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams.- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection.- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models.
- Interactive 3D Object Detection with Prompts.- How Video Meetings Change Your Expression.- GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering.- Neural Volumetric World Models for Autonomous Driving.- IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models.- RegionDrag: Fast Region-Based Image Editing with Diffusion Models.- On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy.- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding.
- Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration.- GRA: Detecting Oriented Objects through Group-wise Rotating and Attention.- A Simple Knowledge Distillation Framework for Generalizable Vision-Language Models.- Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer.- CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection.- Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation.- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions.- Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation.
- Invertible Neural Warp for NeRF.- Enhancing Vectorized Map Perception with Historical Rasterized Maps.- Efficient and Versatile Robust Fine-Tuning of Zero-shot Models.