Recursive Visual Programming.- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models.- Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks.- Learning to Adapt SAM for Segmenting Cross-domain Point Clouds.- Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging.- ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers.- Fine-grained Dynamic Network for Generic Event Boundary Detection.- Take A Step Back: Rethinking the Two Stages in Visual Reasoning.
- AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation.- Learning with Counterfactual Explanations for Radiology Report Generation.- SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models.- Better Regression Makes Better Test-time Adaptive 3D Object Detection.- ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.- Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization.- Finding Visual Task Vectors.- Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation.
- Event Camera Data Dense Pre-training.- Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning.- Rethinking Image-to-Video Adaptation: An Object-centric Perspective.- Layer-Wise Relevance Propagation with Conservation Property for ResNet.- DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism.- EgoLifter: Open-world 3D Segmentation for Egocentric Perception.- MEVG: Multi-event Video Generation with Text-to-Video Models.- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively.
- Data-to-Model Distillation: Data-Efficient Learning Framework.- DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays.- AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network.