Selective Preference Optimization via Token-Level Reward Function Estimation
Type of publication: Misc
Citation: yang:2024b
Year: 2024
Howpublished: arXiv
URL: https://arxiv.org/abs/2408.135...
Keywords:
Authors:
Added by: [PRT]
Total mark: 0
Attachments
Notes
Topics