FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation.- BugNIST - a Large Volumetric Dataset for Detection under Domain Shift.- SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis.- PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture.- PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.- Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection.- A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks.- Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach.
- HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting.- DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM.- Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction.- HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance.- Multiscale Graph Texture Network.- HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis.- Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection.- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception.
- Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation.- Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection.- CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization.- SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection.- S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition.- ∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions.- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing.- Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition.
- Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing.- ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation.- Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos.