Paper Notes

Tag: reward-modeling

There are 5 notes under this tag.

May 2026
AutoRubric-T2I: Robust Rule-Based Reward Model for Text-to-Image Alignment
May 2026
Beyond Pairwise Preferences: Listwise Reward-Aware Alignment for Diffusion Models
May 2026
EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics
May 2026
Reinforce Adjoint Matching: Scaling RL Post-Training of Diffusion and Flow-Matching Models
May 2026
RewardHarness: Self-Evolving Agentic Post-Training
RewardHarness: Self-Evolving Agentic Post-Training

