Paper Notes

Folder: LLM--and--VLM/RL--and--Post-Training

This folder contains 9 notes.

  • April 2026

    In-Place Test-Time Training

  • April 2026

    Rethinking Generalization in Reasoning SFT

  • March 2026

    Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

  • January 2026

    Learning to Discover at Test Time

  • December 2025

    QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

  • September 2025

    Reinforcement Learning with Inverse Rewards for World Model Post-training (RLIR)

  • March 2025

    DAPO: An Open-Source LLM Reinforcement Learning System at Scale

  • GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

  • LongViTU: Instruction Tuning for Long-Form Video Understanding

