Paper Notes
Folder: LLM--and--VLM/RL--and--Post-Training
This folder contains 9 notes.
April 2026
In-Place Test-Time Training
April 2026
Rethinking Generalization in Reasoning SFT
March 2026
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
January 2026
Learning to Discover at Test Time
December 2025
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
September 2025
Reinforcement Learning with Inverse Rewards for World Model Post-training (RLIR)
March 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
LongViTU: Instruction Tuning for Long-Form Video Understanding