Telegram Group & Telegram Channel
Forwarded from Machinelearning
🌟MiniMax-M1: открытя reasoning‑LLM с контекстом 1M

MiniMax-M1 — первая в мире open-weight гибридная reasoning‑LLM c 1M контекстом (8× DeepSeek R1) и гибридной архитектурой MoE + lightning attention.
• 456 млрд параметров (45,9 млрд активируются на токен), сверхэффективная генерация — 25% FLOPs DeepSeek R1 на 100K токенов
• Обучение через RL с новым алгоритмом CISPO, решающим реальные задачи от математики до кодинга
• На обучение было потрачено $534K, две версии — 40K/80K “thinking budget”
• Обходит DeepSeek R1 и Qwen3-235B на бенчмарках по математике и кодингу,
• Топ результат на задачах для software engineering и reasoning



Бенчмарки:
AIME 2024: 86.0 (M1-80K) vs 85.7 (Qwen3) vs 79.8 (DeepSeek R1)

SWE-bench Verified: 56.0 vs 34.4 (Qwen3)

OpenAI-MRCR (128k): 73.4 vs 27.7 (Qwen3)

TAU-bench (airline): 62.0 vs 34.7 (Qwen3)

LongBench-v2: 61.5 vs 50.1 (Qwen3)


➡️ Попробовать можно здесь

Hugging Face: https://huggingface.co/collections/MiniMaxAI/minimax-m1-68502ad9634ec0eeac8cf094
GitHub: https://github.com/MiniMax-AI/MiniMax-M1
Tech Report: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M1_tech_report.pdf


@ai_machinelearning_big_data

#llm #reasoningmodels #minimaxm1
Please open Telegram to view this post
VIEW IN TELEGRAM



group-telegram.com/machinelearning_interview/1862
Create:
Last Update:

🌟MiniMax-M1: открытя reasoning‑LLM с контекстом 1M

MiniMax-M1 — первая в мире open-weight гибридная reasoning‑LLM c 1M контекстом (8× DeepSeek R1) и гибридной архитектурой MoE + lightning attention.
• 456 млрд параметров (45,9 млрд активируются на токен), сверхэффективная генерация — 25% FLOPs DeepSeek R1 на 100K токенов
• Обучение через RL с новым алгоритмом CISPO, решающим реальные задачи от математики до кодинга
• На обучение было потрачено $534K, две версии — 40K/80K “thinking budget”
• Обходит DeepSeek R1 и Qwen3-235B на бенчмарках по математике и кодингу,
• Топ результат на задачах для software engineering и reasoning



Бенчмарки:
AIME 2024: 86.0 (M1-80K) vs 85.7 (Qwen3) vs 79.8 (DeepSeek R1)

SWE-bench Verified: 56.0 vs 34.4 (Qwen3)

OpenAI-MRCR (128k): 73.4 vs 27.7 (Qwen3)

TAU-bench (airline): 62.0 vs 34.7 (Qwen3)

LongBench-v2: 61.5 vs 50.1 (Qwen3)


➡️ Попробовать можно здесь

Hugging Face: https://huggingface.co/collections/MiniMaxAI/minimax-m1-68502ad9634ec0eeac8cf094
GitHub: https://github.com/MiniMax-AI/MiniMax-M1
Tech Report: https://github.com/MiniMax-AI/MiniMax-M1/blob/main/MiniMax_M1_tech_report.pdf


@ai_machinelearning_big_data

#llm #reasoningmodels #minimaxm1

BY Machine learning Interview





Share with your friend now:
group-telegram.com/machinelearning_interview/1862

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

In view of this, the regulator has cautioned investors not to rely on such investment tips / advice received through social media platforms. It has also said investors should exercise utmost caution while taking investment decisions while dealing in the securities market. The fake Zelenskiy account reached 20,000 followers on Telegram before it was shut down, a remedial action that experts say is all too rare. That hurt tech stocks. For the past few weeks, the 10-year yield has traded between 1.72% and 2%, as traders moved into the bond for safety when Russia headlines were ugly—and out of it when headlines improved. Now, the yield is touching its pandemic-era high. If the yield breaks above that level, that could signal that it’s on a sustainable path higher. Higher long-dated bond yields make future profits less valuable—and many tech companies are valued on the basis of profits forecast for many years in the future. Under the Sebi Act, the regulator has the power to carry out search and seizure of books, registers, documents including electronics and digital devices from any person associated with the securities market. This provided opportunity to their linked entities to offload their shares at higher prices and make significant profits at the cost of unsuspecting retail investors.
from nl


Telegram Machine learning Interview
FROM American