Paper Notes

Tag: RLHF

5 notes with this tag.

May 2026
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
May 2026
Qwen-Image-2.0 Technical Report
April 2026
How to Balance Multiple Losses in Deep Learning
March 2026
MV-GRPO: From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
July 2025
Group Sequence Policy Optimization
