Paper Notes

Tag: paper/multimodal-generation/pretraining-architecture

There are 13 notes under this tag.

  • May 2026

    Qwen-Image-2.0 Technical Report

  • May 2026

    Qwen-Image-VAE-2.0 Technical Report

  • May 2026

    SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

  • April 2026

    Context Unrolling in Omni Models

  • April 2026

    Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

  • March 2026

    Beyond Language Modeling: An Exploration of Multimodal Pretraining

  • March 2026

    Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis

  • March 2026

    Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

  • February 2026

    DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

  • February 2026

    Unified Latents (UL): How to train your latents

  • January 2026

    NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

  • December 2025

    UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

  • May 2025

    Emerging Properties in Unified Multimodal Pretraining (BAGEL)

