- Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding.
- Self-Supervised Audio-Visual Soundscape Stylization.
- SAVE: Protagonist Diversification with Structure Agnostic Video Editing.
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent.
- Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning.
- Source-Free Domain-Invariant Performance Prediction.
- Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures.
- Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort.
- Direct Distillation between Different Domains.
- Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery.
- V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation.
- GRiT: A Generative Region-to-text Transformer for Object Understanding.
- LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System.
- Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning.
- Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending.
- Geometry Fidelity for Spherical Images.
- BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling.
- CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning.
- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation.
- Benchmarking Spurious Bias in Few-Shot Image Classifiers.
- TurboEdit: Real-time text-based disentangled real image editing.
- Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy.
- Augmented Neural Fine-tuning for Efficient Backdoor Purification.
- REDIR: Refocus-free Event-based De-occlusion Image Reconstruction.
- Free-Editor: Zero-shot Text-driven 3D Scene Editing.
- DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly.
- An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation.