ProMerge: Prompt and Merge for Unsupervised Instance Segmentation.- M2D2M: Multi-Motion Generation from Text with Discrete Diffusion Models.- The Hard Positive Truth about Vision-Language Compositionality.- GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing.- Shapefusion: 3D Localized Human Diffusion Models.- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing.- Prompting Language-Informed Distribution for Compositional Zero-Shot Learning.- Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment.
- 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting.- Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels.- Free-Viewpoint Video of Outdoor Sports Using a Drone.- Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing.- ConGeo: Robust Cross-view Geo-localization across Ground View Variations.- Generalizable Facial Expression Recognition.- GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views.- Self-Supervised Any-Point Tracking by Contrastive Random Walks.
- MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization.- Siamese Vision Transformers are Scalable Audio-visual Learners.- LCM-Lookahead for Encoder-based Text-to-Image Personalization.- Towards Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization.- Towards Open-Ended Visual Recognition with Large Language Models.- Ray-Distance Volume Rendering for Neural Scene Reconstruction.- ReNoise: Real Image Inversion Through Iterative Noising.- Attention Decomposition for Cross-Domain Semantic Segmentation.
- Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation.- Handling the Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework.- RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models.