- Large-scale Reinforcement Learning for Diffusion Models
- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion
- FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
- EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
- Global Counterfactual Directions
- TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
- RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
- EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
- RICA^2: Rubric-Informed, Calibrated Assessment of Actions
- Region-centric Image-Language Pretraining for Open-Vocabulary Detection
- Commonly Interesting Images
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
- CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching
- Caltech Aerial RGB-Thermal Dataset in the Wild
- Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
- Volumetric Rendering with Baked Quadrature Fields
- CityGuessr: City-Level Video Geo-Localization on a Global Scale
- Pseudo-Labelling Should Be Aware of Disguising Channel Activations
- Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
- Revising Densification in Gaussian Splatting
- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
- Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with DIV Loss
- Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions
- UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation
- PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
- R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
- A Graph-Based Approach for Category-Agnostic Pose Estimation