Open-Set Recognition in the Age of Vision-Language Models.- Unsqueeze [CLS] Bottleneck to Learn Rich Representations.- Robust Multimodal Learning via Representation Decoupling.- Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models.- WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing.- Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation.- VeCLIP: Improving CLIP Training via Visual-enriched Captions.- Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks.
- Learning Representations from Foundation Models for Domain Generalized Stereo Matching.- Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction.- Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer.- Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts.- Event-Adapted Video Super-Resolution.- Look Hear: Gaze Prediction for Speech-directed Human Attention.- Raising the Ceiling: Conflict-Free Local Feature Matching with Dynamic View Switching.- Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge.
- Catastrophic Overfitting: A Potential Blessing in Disguise.- Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework.- SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models.- Visual Alignment Pre-training for Sign Language Translation.- Parrot Captions Teach CLIP to Spot Text.- Solving Motion Planning Tasks with a Scalable Generative Model.- Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models.- Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment.
- Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation.- BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow.- Diffusion Reward: Learning Rewards via Conditional Video Diffusion.