Reinforcement Learning - Search Videos

Wondering how we can better simulate human behavior with reinforcement learning? Introducing DITTO: RL with verbal feedback for subjective tasks like user simulation, student modeling, character role-play, and theory of mind. The result: an 8B model that performs on par with GPT-5.4 on the new SOUL benchmark suite.

x.com · Xuhui Zhou

Xuhui Zhou (@nlpxuhui). 30 likes.

8.4K views · 1 week ago

Deep Reinforcement Learning

Overview of Deep Reinforcement Learning Methods

YouTube · Steve Brunton

105.6K views · Jan 21, 2022

MIT 6.S091: Introduction to Deep Reinforcement Learning (Deep RL)

YouTube · Lex Fridman

365.9K views · Jan 24, 2019

Stanford CS224R Deep Reinforcement Learning | Spring 2025 | Lecture 18: Frontiers

YouTube · Stanford Online

4K views · 5 months ago

Top videos

Reinforcement learning research often depends on static benchmarks and obscure leaderboards. But real evaluation happens inside environments, with specific tools, prompts, constraints, and workflows. Today, we’re launching the Turing RL Environments Evaluation Platform. Researchers now have direct, real time access to:
- The exact production RL environments used in evaluation
- Full tool inventories and prompt transparency
- Explicit QA rubrics and scoring criteria
- Live, harness-integrated leaderboard

69.5K views · 1 month ago

Force yourself to stay in direct, brutal, ego-free contact with reality. Learn from it as fast and accurately as possible like a well-designed reinforcement learning system. That’s why Elon keeps hammering low ego, high responsibility, and “just do the work.” It’s not moral advice. It’s an engineering principle for not breaking your own learning loop.

1.8K views · 1 month ago

reinforcement learning is incredible

63.8K views · 2 months ago

Reinforcement Learning Tutorial

Reinforcement Learning in 3 Hours | Full Course using Python

YouTube · Nicholas Renotte

530.9K views · Jun 6, 2021

Python Reinforcement Learning Tutorial for Beginners in 25 Minutes

YouTube · Nicholas Renotte

68.4K views · Mar 10, 2021

Python Reinforcement Learning using Gymnasium – Full Course

YouTube · freeCodeCamp.org

128.8K views · Mar 21, 2023

Toyota's 7'2" Robot Nails Free Throw, Misses 3-Pointer — Then Learns and Improves in Real Time Using Reinforcement Learning

91.9K views · 1 month ago

Yoshua Bengio thinks reinforcement learning is evil. And so long as we use it, AIs will continue to develop unintended and undesired drives that they hide from us. (In the full interview below he proposes an alternative LLM architecture to fix the problem.) @Yoshua_Bengio

803 views · 6 days ago

x.com · Rob Wiblin

Toyota unveiled its basketball-playing robot CUE7, designed to catch and shoot with high accuracy. Instead of being preprogrammed, it learns shooting through AI and real experience. Using reinforcement learning, its performance improves over time with training.

6.4K views · 1 month ago

x.com · Space and Technology

A Switzerland-based startup Flexion has created a robotic brain that helps the Unitree G1 move smoothly and work on its own. It uses reinforcement learning, where the robot trains in simulations to learn walking, balancing, and picking objects. In tests, it cleaned a space by finding and placing items in a basket without human help.

17.5K views · 1 month ago

x.com · Space and Technology

Strat: A sub-$400, autonomous bipedal robot powered by Reinforcement Learning and a thermal-aware AI brain. First it learned how to walk in MuJoCo

156 views · 1 month ago

x.com · Stratrobotics

Elon Musk on How AI Is Being Trained to Lie: “They have what’s called human reinforcement learning, which is another way of saying that they have a whole bunch of people that look at the output of GPT-4 and then say whether that’s okay or not okay. And so, essentially, what’s happening is they’re training the AI to lie. To lie and to either comment on some things, not comment on other things, but not say what the data actually demands.”

7.4K views · 2 months ago

x.com · Mars University

CHINESE CRYPTO TRADER POSTED A NEURAL NETWORK VISUALIZATION ON TIKTOK AND ACCIDENTALLY SHOWED THE SYSTEM MAKING HIS POLYMARKET TRADES FOR HIM IN REAL TIME. Blue connection lines everywhere, hidden layers stacked vertically, neurons firing across the screen and a tiny label in the middle that most people ignored on the first watch - “Bitcoin XVIII”. He framed the video like a normal AI experiment. Virtual aquarium simulation. Reinforcement learning. “Teaching the network survival behavior.” That was

2.2M views · 1 week ago

CHINESE CRYPTO TRADER POSTED A NEURAL NETWORK VISUALIZATION ON TIKTOK AND ACCIDENTALLY SHOWED THE SYSTEM MAKING HIS POLYMARKET TRADES FOR HIM IN REAL TIME. Blue connection lines everywhere, hidden layers stacked vertically, neurons firing across the screen and a tiny label in the middle that most people ignored on the first watch - “Bitcoin XVIII”. He framed the video like a normal AI experiment. Virtual aquarium simulation. Reinforcement learning. “Teaching the network survival behavior.” That was

5.5K views · 1 week ago

x.com · Marry Evan

@elonmusk exposes the critical flaw in ChatGPT and other major AI models: Human Reinforcement Learning 👇

1.5M views · 2 months ago

x.com · Marcel Velica

We still feared our teachers in 8th grade. These comments reinforce to me that we had a better learning environment.

785.5K views · 2 weeks ago

x.com · TugboatPhil

It was great to see our name amongst the other “AI Native” companies during @Nvidia’s #GTC keynote. NVIDIA Isaac™ Lab helps us train reinforcement learning policies that enable the UMV to drive, jump, flip, and hop like a pro!

608.5K views · 2 months ago

x.com · RAI Institute

🤯 big update to our flow map language models paper! we believe this is the future of non-autoregressive text generation. read about it in the blog: https://t.co/DfBXrYmJc8 full details in the paper: https://t.co/coiNXj4ucC we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation. we beat all discrete diffusion baselines at ~8x speed! v2 gives a complete theory of the flow map over discrete data, with three equiv

74.2K views · 1 month ago

x.com · Nicholas Boffi

Another robot-caused human injury has occurred with G1. With existing reinforcement learning policies, their robot is trained to do whatever it takes to stand up after a fall. During that recovery attempt, it kicked someone in the nose, causing heavy bleeding and a possible fracture. This should be treated as a high-priority safety issue for Unitree to fix.

92.9K views · 3 months ago

@Grok Build coded Space Invaders from scratch, then trained a separate AI to master it using reinforcement learning. 1,000 updates. Fully functional gameplay. It didn't need instructions. It simply learned.

26.2K views · 2 weeks ago

x.com · Mario Nawfal

Elon Musk on How AI Is Being Trained to Lie: “They have what’s called human reinforcement learning, which is another way of saying that they have a whole bunch of people that look at the output of GPT-4 and then say whether that’s okay or not okay. And so, essentially, what’s happening is they’re training the AI to lie. To lie and to either comment on some things, not comment on other things, but not say what the data actually demands.”

1.3K views · 1 month ago

Elon Musk on How AI Is Being Trained to Lie: “They have what’s called human reinforcement learning, which is another way of saying that they have a whole bunch of people that look at the output of GPT-4 and then say whether that’s okay or not okay. And so, essentially, what’s happening is they’re training the AI to lie. To lie and to either comment on some things, not comment on other things, but not say what the data actually demands.”

2K views · 1 month ago