Group Relative Policy Optimization
DeepSeekMath 7B: Open-Source Math Model Surpasses GPT-4 | Byte Goose AI posted on the topic | LinkedIn
115 views
3 months ago
linkedin.com
DeepSeek-AI's GRPO Revolution: Boosting AI Reasoning with New Variants | Byte Goose AI posted on the topic | LinkedIn
103 views
4 months ago
linkedin.com
24:21
Group Relative Policy Optimization (GRPO) Explained – Formula and PyTorch Implementation
6 months ago
MSN
Deep Learning with Yacine
Group Policy Loopback Problems and Solutions
Jan 8, 2021
policypak.com
2:42
New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead. Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving
38.8K views
11 months ago
Facebook
Andrew Ng
1:06
Revolutionizing AI: OpenVLThinkerV2
117 views
1 month ago
YouTube
60s Research
4:47
Turn-PPO: Multi-Turn Reinforcement Learning Optimization for LLM Agents and a Comparative Analysis with GRPO
2 views
4 months ago
YouTube
CosmoX
18:55
Soft Adaptive Policy Optimization
73 views
5 months ago
YouTube
Xiaol.x
1:43
Gemma_GRPO
1 views
3 months ago
YouTube
Laveena TB21E225
22:37
Unsloth RL Training. Nvidia NeMO RL using GRPO. Reinforcement Learning from Verifiable Rewards RLVR
275 views
1 month ago
YouTube
Byte Goose AI.
17:43
[RL Fine-Tuning] From RLHF to GRPO: The Evolution and Optimization of AI LLM Models Alignment.
275 views
3 months ago
YouTube
AI Podcast Series. Byte Goose AI.
0:06
Introducing Target Policy Optimization (TPO): TPO turns GRPO into supervised learning: build a target distribution over sampled completions, then fit with cross-entropy. The gradient vanishes once the target is matched, making multi-epoch training smooth. 🧵 (1/4)
14.2K views
3 weeks ago
x.com
Jean Kaddour @ ICLR 2026
0:03
LLM fine-tuning techniques I'd learn if I were to customize them: Bookmark this. 1. LoRA 2. QLoRA 3. Prefix Tuning 4. Adapter Tuning 5. Instruction Tuning 6. P-Tuning 7. BitFit 8. Soft Prompts 9. RLHF 10. RLAIF 11. DPO (Direct Preference Optimization) 12. GRPO (Group Relative Policy Optimization) 13. RLAIF (RL with AI Feedback) 14. Multi-Task Fine-Tuning 15. Federated Fine-Tuning. My favourite is GRPO for building reasoning models. What about you? I've shared my full tutorial on GRPO in the replies.
55.5K views
3 weeks ago
x.com
Akshay 🚀
17:27
[GRPO] The GRPO Algorithm, Explained So Even Complete Beginners Can Follow
15.4K views
2 months ago
bilibili
东川路第一可爱猫猫虫
Advanced Concepts in Large Language Models. RL / SFT / MHA / GQA / RoPE, RLVR / DPO/ GRPO Arch
5 months ago
linkedin.com
Unsloth RL Training. Nvidia NeMO RL using GRPO. Reinforcement Learning from Verifiable Rewards RLVR | Byte Goose AI
184 views
1 month ago
linkedin.com
Daily ML Papers on Instagram: "🚀 Reinforcement Learning Enables Advanced Reasoning 🤖 "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (2025) shows large language models can sharpen their logic purely through RL—no massive supervised dataset needed. 🔸 Uses Group Relative Policy Optimization (GRPO) for adaptive feedback 🔸 Emergent chain-of-thought for math & coding tasks 🔸 Distills insights into smaller, efficient models A fresh direction for LLM training!
94.4K views
Feb 26, 2025
Instagram
daily.ml.papers
17:50
Proximal Policy Optimization Explained
78.2K views
May 20, 2021
YouTube
Edan Meyer
2:27
What is a GPO
49K views
May 15, 2018
YouTube
WellLink Group Purchasing
1:02:47
Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO Tutorial
86.5K views
Dec 24, 2020
YouTube
Machine Learning with Phil
23:53
Can a Heavily Modified GRPO Raise Scores Without Training Any Parameters? The Training-Free GRPO Paper Explained
1.3K views
4 months ago
YouTube
EZ.Encoder Academy
2:54
Google Tunix Hackathon: Gemma2 2B GRPO Post Training
49 views
4 months ago
YouTube
limit less
1:10
What is Proximal Policy Optimization (PPO)?
87 views
5 months ago
YouTube
Data Science Made Easy
7:03
GRPO: The Reinforcement Learning Trick That Changed Everything
156 views
5 months ago
YouTube
mathtartic
1:19
Policy Gradient in One Minute
2.8K views
10 months ago
YouTube
Jia-Bin Huang
13:26
Proximal Policy Optimization | ChatGPT uses this
43.2K views
Dec 4, 2023
YouTube
CodeEmporium
3:12
Teaching AI to Show Its Work: GRPO Training with Google Tunix
14 views
4 months ago
YouTube
monish devineni
11:44
Understanding R1-Zero-Like Training: A Critical Perspective
203 views
6 months ago
YouTube
Conference on Language Modeling
3:38
Improving Speech LLMs with GRPO Rewards
15 views
7 months ago
YouTube
AI Research Roundup
23:32
How LLMs Learn to Reason [GRPO]
11.7K views
1 year ago
YouTube
Jia-Bin Huang