- Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
- PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
- MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
- Zero-Shot Detection of AI-Generated Images
- Language-Image Pre-training with Long Captions
- GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
- DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
- You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
- Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
- Facial Affective Behavior Analysis with Instruction Tuning
- CoReS: Orchestrating the Dance of Reasoning and Segmentation
- MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
- MambaIR: A Simple Baseline for Image Restoration with State-Space Model
- I Can't Believe It's Not Scene Flow!
- Rethinking Unsupervised Outlier Detection via Multiple Thresholding
- Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
- Scalable Group Choreography via Variational Phase Manifold Learning
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
- PoseSOR: Human Pose Can Guide Our Attention
- TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
- Bi-directional Contextual Attention for 3D Dense Captioning
- Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
- InfMAE: A Foundation Model in The Infrared Modality
- TPA3D: Triplane Attention for Fast Text-to-3D Generation
- Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
- LivePhoto: Real Image Animation with Text-guided Motion Control