Paper Notes
Tag: paper/llm-vlm/rl-post-training
There are 9 notes under this tag.
April 2026
In-Place Test-Time Training
April 2026
Rethinking Generalization in Reasoning SFT
March 2026
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
January 2026
Learning to Discover at Test Time
December 2025
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
September 2025
Reinforcement Learning with Inverse Rewards for World Model Post-training (RLIR)
March 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
LongViTU: Instruction Tuning for Long-Form Video Understanding