Improve multi-hop reasoning in LLMs by learning from rich human feedback
Recent large language models (LLMs) have enabled tremendous progress in natural language understanding. However, they are prone to generating confident but nonsensical explanations, which poses a significant obstacle to establishing trust with users. In this post, we show how to incorporate human feedback on incorrect reasoning chains in multi-hop reasoning to improve performance on …