Upper-body Hierarchical Graph for Skeleton Based Emotion Recognition in Assistive Driving.- Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction.- Exploring Guided Sampling of Conditional GANs.- MotionChain: Conversational Motion Controllers via Multimodal Prompts.- Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition.- Latent Guard: a Safety Framework for Text-to-image Generation.- MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion.- TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection.
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection.- FoundPose: Unseen Object Pose Estimation with Foundation Features.- Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation.- Kalman-Inspired Feature Propagation for Video Face Super-Resolution.- Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models.- VideoMamba: State Space Model for Efficient Video Understanding.- SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging.- Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds.
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving.- Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models.- Deep Cost Ray Fusion for Sparse Depth Video Completion.- GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection.- DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video.- GraspXL: Generating Grasping Motions for Diverse Objects at Scale.- Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models.- Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models.
- JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation.- Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals.- Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection.