Score Distillation Sampling with Learned Manifold Corrective.- FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving.- Benchmarking the Robustness of Cross-view Geo-localization Models.- GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth.- SUMix: Mixup with Semantic and Uncertain Information.- Flatness-aware Sequential Learning Generates Resilient Backdoors.- Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models.- IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception.
- DiffClass: Diffusion-Based Class Incremental Learning.- Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees.- Instant 3D Human Avatar Generation using Image Diffusion Models.- PromptFusion: Decoupling Stability and Plasticity for Continual Learning.- Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance.- Adapting to Shifting Correlations with Unlabeled Data Calibration.- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity.- Information Bottleneck Based Data Correction in Continual Learning.
- On Spectral Properties of Gradient-based Explanation Methods.- Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization.- O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation.- Dataset Distillation by Automatic Training Trajectories.- FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation.- EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding.- UniIR: Training and Benchmarking Universal Multimodal Information Retrievers.- SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning.
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation.- Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation through Collaborative Error Revision.- latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction.