Paper Notes

RL & Post Training

Folder: LLM--and--VLM/RL--and--Post-Training

This folder contains 9 notes.

April 2026
In-Place Test-Time Training
April 2026
Rethinking Generalization in Reasoning SFT
March 2026
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought
January 2026
Learning to Discover at Test Time
December 2025
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
September 2025
Reinforcement Learning with Inverse Rewards for World Model Post-training (RLIR)
March 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
LongViTU: Instruction Tuning for Long-Form Video Understanding

