Forwarded from Machinelearning
📌 Monograph "Reinforcement Learning: An Overview"

A comprehensive treatment of reinforcement learning (RL) that describes in detail various environment models and optimization problems, and examines the trade-off between theory and the practical use of RL.

Related topics are covered separately: distributed RL, hierarchical RL, off-policy learning, and VLMs.

The work surveys the following RL algorithms:

🟢 SARSA;
🟢 Q-learning;
🟢 REINFORCE;
🟢 A2C;
🟢 TRPO/PPO;
🟢 DDPG;
🟢 Soft actor-critic;
🟢 MBRL.

The author is Kevin Murphy, a principal research scientist at Google DeepMind who leads a team of 28 researchers and engineers. The group works on generative models (diffusion and LLMs), RL, robotics, Bayesian inference, and other topics.

Kevin has published more than 140 papers in peer-reviewed conferences and journals, as well as 3 ML textbooks, published by MIT Press in 2012, 2022, and 2023. (The 2012 book was awarded the DeGroot Prize for the best book in statistical science.)

🔜 The monograph was released in open access on December 9, 2024.


@ai_machinelearning_big_data

#AI #ML #Book #RL
group-telegram.com/datascienceiot/3072

BY Data Science



