Paper Notes
Tag: paper/multimodal-generation/pretraining-architecture
13 notes under this tag.
May 2026
Qwen-Image-2.0 Technical Report
May 2026
Qwen-Image-VAE-2.0 Technical Report
May 2026
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
April 2026
Context Unrolling in Omni Models
April 2026
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
March 2026
Beyond Language Modeling: An Exploration of Multimodal Pretraining
March 2026
Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis
March 2026
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
February 2026
DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
February 2026
Unified Latents (UL): How to train your latents
January 2026
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
December 2025
UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
May 2025
Emerging Properties in Unified Multimodal Pretraining (BAGEL)