Telegram Group & Telegram Channel
Forwarded from Machinelearning
🌟MiniMax-M1: открытя reasoning‑LLM с контекстом 1M

MiniMax-M1 — первая в мире open-weight гибридная reasoning‑LLM c 1M контекстом (8× DeepSeek R1) и гибридной архитектурой MoE + lightning attention.
• 456 млрд параметров (45,9 млрд активируются на токен), сверхэффективная генерация — 25% FLOPs DeepSeek R1 на 100K токенов
• Обучение через RL с новым алгоритмом CISPO, решающим реальные задачи от математики до кодинга
• На обучение было потрачено $534K, две версии — 40K/80K “thinking budget”
• Обходит DeepSeek R1 и Qwen3-235B на бенчмарках по математике и кодингу,
• Топ результат на задачах для software engineering и reasoning



Бенчмарки:
AIME 2024: 86.0 (M1-80K) vs 85.7 (Qwen3) vs 79.8 (DeepSeek R1)

SWE-bench Verified: 56.0 vs 34.4 (Qwen3)

OpenAI-MRCR (128k): 73.4 vs 27.7 (Qwen3)

TAU-bench (airline): 62.0 vs 34.7 (Qwen3)

LongBench-v2: 61.5 vs 50.1 (Qwen3)


➡️ Попробовать можно здесь

Hugging Face: https://huggingface.co/collections/MiniMaxAI/minimax-m1-68502ad9634ec0eeac8cf094
GitHub: https://github.com/MiniMax-AI/MiniMax-M1
Tech Report: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M1_tech_report.pdf


@ai_machinelearning_big_data

#llm #reasoningmodels #minimaxm1
Please open Telegram to view this post
VIEW IN TELEGRAM



group-telegram.com/machinelearning_interview/1862
Create:
Last Update:

🌟MiniMax-M1: открытя reasoning‑LLM с контекстом 1M

MiniMax-M1 — первая в мире open-weight гибридная reasoning‑LLM c 1M контекстом (8× DeepSeek R1) и гибридной архитектурой MoE + lightning attention.
• 456 млрд параметров (45,9 млрд активируются на токен), сверхэффективная генерация — 25% FLOPs DeepSeek R1 на 100K токенов
• Обучение через RL с новым алгоритмом CISPO, решающим реальные задачи от математики до кодинга
• На обучение было потрачено $534K, две версии — 40K/80K “thinking budget”
• Обходит DeepSeek R1 и Qwen3-235B на бенчмарках по математике и кодингу,
• Топ результат на задачах для software engineering и reasoning



Бенчмарки:
AIME 2024: 86.0 (M1-80K) vs 85.7 (Qwen3) vs 79.8 (DeepSeek R1)

SWE-bench Verified: 56.0 vs 34.4 (Qwen3)

OpenAI-MRCR (128k): 73.4 vs 27.7 (Qwen3)

TAU-bench (airline): 62.0 vs 34.7 (Qwen3)

LongBench-v2: 61.5 vs 50.1 (Qwen3)


➡️ Попробовать можно здесь

Hugging Face: https://huggingface.co/collections/MiniMaxAI/minimax-m1-68502ad9634ec0eeac8cf094
GitHub: https://github.com/MiniMax-AI/MiniMax-M1
Tech Report: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M1_tech_report.pdf


@ai_machinelearning_big_data

#llm #reasoningmodels #minimaxm1

BY Machine learning Interview





Share with your friend now:
group-telegram.com/machinelearning_interview/1862

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

At its heart, Telegram is little more than a messaging app like WhatsApp or Signal. But it also offers open channels that enable a single user, or a group of users, to communicate with large numbers in a method similar to a Twitter account. This has proven to be both a blessing and a curse for Telegram and its users, since these channels can be used for both good and ill. Right now, as Wired reports, the app is a key way for Ukrainians to receive updates from the government during the invasion. He said that since his platform does not have the capacity to check all channels, it may restrict some in Russia and Ukraine "for the duration of the conflict," but then reversed course hours later after many users complained that Telegram was an important source of information. In a statement, the regulator said the search and seizure operation was carried out against seven individuals and one corporate entity at multiple locations in Ahmedabad and Bhavnagar in Gujarat, Neemuch in Madhya Pradesh, Delhi, and Mumbai. In addition, Telegram now supports the use of third-party streaming tools like OBS Studio and XSplit to broadcast live video, allowing users to add overlays and multi-screen layouts for a more professional look. Investors took profits on Friday while they could ahead of the weekend, explained Tom Essaye, founder of Sevens Report Research. Saturday and Sunday could easily bring unfortunate news on the war front—and traders would rather be able to sell any recent winnings at Friday’s earlier prices than wait for a potentially lower price at Monday’s open.
from fr


Telegram Machine learning Interview
FROM American