📌 Monograph "Reinforcement Learning: An Overview"

A comprehensive treatment of reinforcement learning (RL) that describes various environment models and optimization problems in detail and examines the trade-off between theory and the practical application of RL.

Related topics are covered separately: distributed RL, hierarchical RL, off-policy learning, and VLMs.

The work surveys the following RL algorithms:

🟢 SARSA;
🟢 Q-learning;
🟢 REINFORCE;
🟢 A2C;
🟢 TRPO/PPO;
🟢 DDPG;
🟢 Soft actor-critic;
🟢 MBRL.

The author is Kevin Murphy, a principal scientist who leads a team of 28 researchers and engineers at Google DeepMind. The group works on generative models (diffusion and LLMs), RL, robotics, Bayesian inference, and other topics.

Kevin has published more than 140 papers in peer-reviewed conferences and journals, as well as 3 ML textbooks, published by MIT Press in 2012, 2022, and 2023. (The 2012 book was awarded the DeGroot Prize for the best book in the field of statistical science.)

🔜 The monograph was published in open access on December 9, 2024.


@ai_machinelearning_big_data

#AI #ML #Book #RL



group-telegram.com/ai_machinelearning_big_data/6338

BY Machinelearning



