Blog | Yiyang Feng

Learning RLHF (PPO) with codes (Huggingface TRL)

A technical essay on Reinforcement Learning from Human Feedback (RLHF) and Proximal Policy Optimization (PPO), with code from Huggingface TRL.

10 min read · September 16, 2023

2023 · NLP LLM · TechEssays

Reading Notes of How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources

Reading notes on Yao's notes, "How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources".

6 min read · February 19, 2023

2023 · NLP LLM · ReadingNotes

Huggingface parallel training for solving the CUDA out of memory issue

Documents a workable solution to the annoying CUDA out-of-memory (OOM) error.

3 min read · February 12, 2023

2023 · NLP CUDA · TechEssays

Could you give me a hint? Generating inference graphs for defeasible reasoning

Reading notes on a paper about defeasible reasoning.

3 min read · January 24, 2023

2023 · NLP CausalReasoning · ReadingNotes