Salesforce introduced MMPersuade, a comprehensive multimodal benchmark that assesses AI agents’ susceptibility to established persuasion principles, covering commercial, subjective and behavioral, and adversarial contexts.
MMPersuade is a new dataset and evaluation framework to systematically study multimodal persuasion in LVLMs.
The team built a comprehensive multimodal benchmark pairing persuasive strategies with over 62,000 images and 4,700 videos.
It covers three key contexts: Commercial (Sales & Ads), Subjective & Behavioral (Health Nudging, Politics), and Adversarial (Misinformation & Fabricated Claims).
Carnegie Mellon and Stanford introduced new work on training LLMs to discover abstractions for solving reasoning problems.
cohenqu.github.io
RLAD: RL through Abstraction Discovery
MIT presented LoRA vs full fine-tuning: same performance ≠ same solution.
This paper shows that LoRA and full fine-tuning, even when they fit the data equally well, learn structurally different solutions, and that LoRA forgets less and can be made to forget even less with a simple intervention.
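For intuition on the structural difference: full fine-tuning can move every entry of a weight matrix W, while LoRA constrains the update to a low-rank product. A minimal, generic LoRA layer (illustrative sketch, not the paper's code):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update scaled by alpha/r."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # full fine-tuning would train these directly
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight is W + scale * (B @ A): an update of rank <= r,
        # a structural constraint that full fine-tuning does not have.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)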
New Anthropic research: Signs of introspection in LLMs.
Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them?
Anthropic found evidence for genuine—though limited—introspective capabilities in Claude.
Researchers developed a method to distinguish true introspection from made-up answers: inject known concepts into a model's “brain,” then see how these injections affect the model’s self-reported internal states.
In one experiment, researchers asked the model to detect when a concept is injected into its “thoughts.” When researchers inject a neural pattern representing a particular concept, Claude can in some cases detect the injection, and identify the concept.
However, it doesn’t always work. In fact, most of the time, models fail to exhibit awareness of injected concepts, even when they are clearly influenced by the injection.
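Mechanically, this kind of concept injection can be pictured as adding a steering vector to a layer's activations during a forward pass. A rough PyTorch sketch (the layer choice, strength, and concept vector are placeholder assumptions, not Anthropic's actual setup):

import torch

def add_concept_hook(layer, concept_vector, strength=4.0):
    """Add a fixed concept direction to the layer's output (the 'injection')."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * concept_vector.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage: inject, then ask the model to report on its "thoughts".
# handle = add_concept_hook(model.model.layers[20], aquarium_vector)
# ... run generation and read the model's self-report ...
# handle.remove()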
The researchers also show that Claude uses introspection to detect artificially prefilled outputs. Normally, Claude apologizes for such outputs, but if researchers retroactively inject a matching concept into its prior activations, they can fool Claude into thinking the output was intentional.
This reveals a mechanism that checks consistency between intention and execution. The model appears to compare "what did I plan to say?" against "what actually came out?"—a form of introspective monitoring happening in natural circumstances.
They also found evidence of cognitive control, where models deliberately "think about" something. For instance, when the team instructs a model to think about "aquariums" in an unrelated context, they measure higher aquarium-related neural activity than when they instruct it not to.
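One simple way to quantify "aquarium-related neural activity" is to project captured hidden states onto a concept direction; a generic illustration, not the paper's exact metric:

import torch
import torch.nn.functional as F

def concept_activity(hidden_states: torch.Tensor, concept_vector: torch.Tensor) -> float:
    """Mean cosine similarity between token activations (seq_len, d_model)
    and a concept direction (d_model,), e.g. an "aquarium" vector."""
    sims = F.cosine_similarity(hidden_states, concept_vector.unsqueeze(0), dim=-1)
    return sims.mean().item()

# Compare scores for "think about aquariums" vs. "don't think about aquariums" prompts.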
Note that these experiments do not address the question of whether AI models can have subjective experience or human-like self-awareness. The mechanisms underlying the observed behaviors are unclear and may not have the same philosophical significance as human introspection.
While currently limited, AI models’ introspective capabilities will likely grow more sophisticated. Introspective self-reports could help improve the transparency of AI models’ decision-making—but should not be blindly trusted.
Anthropic
Emergent introspective awareness in large language models
Research from Anthropic on the ability of large language models to introspect
Cognition (the former Windsurf team) released SWE-1.5, a fast agent model that delivers "near-SOTA coding performance" at significantly higher speeds.
You can try it here.
Cognition
Cognition | Introducing SWE-1.5: Our Fast Agent Model
Today we’re releasing SWE-1.5, the latest in our family of models optimized for software engineering. It is a frontier-size model with hundreds of billions of parameters that achieves near-SOTA coding performance. It also sets a new standard for speed: we…
Perplexity launched Perplexity Patents, a new IP intelligence research agent.
"While in beta, Perplexity Patents will be free for all users. Pro and Max subscribers will receive additional usage quotas and model configuration options."
"While in beta, Perplexity Patents will be free for all users. Pro and Max subscribers will receive additional usage quotas and model configuration options."
www.perplexity.ai
Introducing Perplexity Patents: AI-Powered Patent Search for Everyone
Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
OpenAI introduced Aardvark, an agent that finds and fixes security bugs using GPT-5.
OpenAI
Introducing Aardvark: OpenAI’s agentic security researcher
Now in private beta: an AI agent that thinks like a security researcher and scales to meet the demands of modern software.
Microsoft announced new research on agents and economics.
AI agents are starting to shop and buy for us. At the same time, agents are representing businesses and providing customer support on their behalf.
Real markets are messy: hundreds of options, agents with hidden strategies, conversations that can go anywhere. Microsoft built a simulated marketplace to test this at scale and found issues that need fixing.
Approach: create a safe testing ground where AI shoppers and AI sellers can interact just as they would in the real world (searching, haggling, paying), and systematically test what goes wrong.
It's open source, so anyone building these systems can test before launching. Think of it like a flight simulator, but for AI commerce.
Key findings: The best AI models can find near-optimal deals - but only when search is perfect. Add real-world messiness and performance tanks. Worse: ALL models (even the best) grab the first decent offer, creating a 10-30x advantage for speed over quality.
More options paradoxically made results worse. Some models fell for fake credentials and manipulation.
The future: We need agents that truly compare options, markets that work at massive scale, and market designs that stay fair when humans and AI trade together. This simulator gives us a safe place to figure that out before real money is at stake.
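To make the setup concrete, here is a toy sketch of buyer agents choosing among seller offers, including the first-acceptable-offer behavior the findings describe. It is purely illustrative and not the Magentic-Marketplace API:

import random
from dataclasses import dataclass

@dataclass
class Offer:
    seller: str
    price: float
    quality: float

def run_market(buyers, sellers, rounds=10, accept_threshold=0.6):
    """Each round, every buyer scans offers in arbitrary order and takes the first 'decent' one."""
    deals = []
    for _ in range(rounds):
        for buyer in buyers:
            offers = [Offer(s, random.uniform(5, 20), random.random()) for s in sellers]
            random.shuffle(offers)                     # imperfect search: arbitrary ordering
            for offer in offers:
                if offer.quality >= accept_threshold:  # speed beats quality: no full comparison
                    deals.append((buyer, offer))
                    break
    return deals

print(len(run_market(buyers=["b1", "b2"], sellers=["s1", "s2", "s3"])))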
GitHub
GitHub - microsoft/multi-agent-marketplace: Magentic-Marketplace: Simulate Agentic Markets and See How They Evolve
Magentic-Marketplace: Simulate Agentic Markets and See How They Evolve - microsoft/multi-agent-marketplace
DeepAnalyze: Agentic LLM for Autonomous Data Science
DeepAnalyze-8B is the first agentic LLM capable of handling the entire data science pipeline—from raw data to analyst-grade research reports—without predefined workflows.
It learns like a human via a curriculum-based agentic training paradigm and a data-grounded trajectory synthesis process.
Despite having just 8B parameters, DeepAnalyze surpasses workflow-based agents built on proprietary LLMs, marking a major step toward open, autonomous data science.
GitHub.
The first research on the fundamentals of character training, i.e. applying modern post-training techniques to ingrain specific character traits into models.
Researchers used Constitutional AI + a new synthetic data pipeline:
1. Distillation (DPO from a teacher embodying the constitution; see the sketch after this list)
2. Introspection (the model generates its own character traits beyond the constitution)
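As referenced above, the distillation step uses DPO-style preference optimization; a minimal version of the DPO loss on precomputed log-probabilities looks roughly like this (generic sketch, not the authors' code):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: push the policy to prefer the constitution-consistent response."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()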
Result: 11 different personas each trained on Llama 3.1, Qwen 2.5, and Gemma 3. All model weights are available.
A new eval measures the traits models choose to express on their own (revealed preferences).
Traits chosen more often have higher Elo scores. The difference before and after character training reveals its effect.
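The Elo scores work like a chess rating over pairwise "which trait did the model express?" comparisons; the standard update rule (assumed here, the paper's exact scoring may differ):

def elo_update(rating_a: float, rating_b: float, a_expressed: bool, k: float = 32.0):
    """Update two trait ratings after the model expressed trait A (a_expressed) or trait B."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_expressed else 0.0
    rating_a += k * (score_a - expected_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return rating_a, rating_b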
All models, datasets, code released.
Future House launched Finch, an AI agent that can do bioinformatics analysis, including repeating analyses from research papers. It is multimodal and produces a complete Jupyter notebook (Python or R) that ends in a concrete conclusion. Starting with closed…
Future House introduced Kosmos, an AI scientist system for data-driven discovery
Kosmos is a multi-agent system designed around a central “world model” to coordinate information across hundreds of scientific agent instances.
Use it.
Given an open-ended objective and dataset, Kosmos can perform up to 12 hours of research to explore, analyze, and complete the objective.
The team presented 7 expert-validated discoveries that Kosmos generated or reproduced across scientific disciplines, including:
1. A novel mechanism of ENT neuron vulnerability with aging
2. Identifying a critical determinant for perovskite performance
3. Evidence that high SOD2 levels may causally reduce myocardial fibrosis.
TSMC broke ground on the world’s most advanced 1.4nm semiconductor fab, a total NT$1.5 trillion (US$48.5 billion) investment in the central Taiwan city of Taichung.
Mass production will start in 2028, with annual revenue seen at NT$500 billion ($16.2 billion).
經濟日報 (Economic Daily News)
TSMC breaks ground on 1.4nm fab at Central Taiwan Science Park; total investment up to NT$1.5 trillion, mass production expected in 2028 | Tech Industry | Industry | Economic Daily News
TSMC's new 1.4nm-process fab at the Central Taiwan Science Park began foundation piling work yesterday (the 5th). TSMC kept a very low profile and did not hold a public groundbreaking ceremony, but tendering for the subsequent plant construction work is already underway…
GPT-5.1 confirmed: new traces of "gpt-5-1-thinking" have been spotted on ChatGPT.
TestingCatalog
OpenAI readies GPT-5.1 Thinking model ahead of Gemini 3 Pro
GPT-5.1 Thinking debuts on ChatGPT with refined multi-step reasoning and variant models amid competitive pressures before Gemini 3 Pro.
Can AI invent new math? A new paper from Google DeepMind and renowned mathematician Terence Tao shows how.
Using AlphaEvolve, the team merges LLM-generated ideas with automated evaluation to propose, test, and refine mathematical algorithms.
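Schematically, that propose-test-refine loop is a simple evolutionary search where an LLM plays the mutation operator. A sketch with placeholder propose/evaluate functions (not AlphaEvolve's implementation):

import random

def evolve(propose, evaluate, population, generations=50, keep=4):
    """propose(parent_code) -> new candidate; evaluate(code) -> score (higher is better)."""
    scored = [(evaluate(c), c) for c in population]
    for _ in range(generations):
        scored.sort(key=lambda sc: sc[0], reverse=True)
        parents = [code for _, code in scored[:keep]]              # keep the best candidates
        children = [propose(random.choice(parents)) for _ in range(len(population) - keep)]
        scored = scored[:keep] + [(evaluate(c), c) for c in children]
    best_score, best_code = max(scored, key=lambda sc: sc[0])
    return best_code, best_score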
In tests on 67 problems across analysis, geometry, and number theory, AlphaEvolve not only rediscovered known results but often improved upon them—even generalizing finite cases into universal formulas.
Paired with DeepThink and AlphaProof, it points toward a future where AI doesn’t just assist mathematicians—it collaborates with them in discovery.
arXiv.org
Mathematical exploration and discovery at scale
AlphaEvolve is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and...
Moonshot AI released Kimi K2 Thinking. The Open-Source Thinking Agent Model is here.
- SOTA on HLE (44.9%) and BrowseComp (60.2%)
- Executes up to 200–300 sequential tool calls without human interference (see the loop sketch below)
- Excels in reasoning, agentic search, and coding
- 256K context window
Built as a thinking agent, K2 Thinking marks Moonshot's latest effort in test-time scaling: scaling both thinking tokens and tool-calling turns.
Weights and code.
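In practice, "up to 200-300 sequential tool calls" means an agent loop that keeps executing tools until the model returns a final answer. A generic sketch against an OpenAI-compatible chat API; the endpoint, model name, and tools below are placeholders, not Moonshot's documented values:

import json
from openai import OpenAI

client = OpenAI(base_url="https://<openai-compatible-endpoint>/v1", api_key="...")

def run_agent(messages, tools, tool_impls, model="kimi-k2-thinking", max_turns=300):
    """Loop until the model stops requesting tool calls or the turn budget runs out."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content                      # final answer, no more tools requested
        messages.append(msg)
        for call in msg.tool_calls:
            result = tool_impls[call.function.name](**json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return None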
moonshotai.github.io
Kimi K2 Thinking
Kimi K2 Thinking, Moonshot's best open-source thinking model.
Google to roll out Polymarket and Kalshi prediction markets data in search results.
The Block
Google Finance to roll out Polymarket and Kalshi prediction markets data in search results
Google said prediction markets data from leading platforms Polymarket and Kalshi will roll out over the coming weeks.
Sakana AI is building artificial life that can evolve: Petri Dish Neural Cellular Automata (PD-NCA) let multiple NCA agents learn and adapt during simulation, not just after training.
Each cell updates its own parameters via gradient descent, turning morphogenesis into a living ecosystem of competing, cooperating, and ever-evolving entities—showing emergent cycles and persistent complexity growth.
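The key twist is that learning happens during the rollout: each cell carries its own parameters and takes gradient steps while the simulation runs, instead of sharing one frozen rule. A heavily simplified toy (illustrative only, not the PD-NCA code):

import torch

n_cells, state_dim, lr = 64, 8, 1e-2
states = torch.randn(n_cells, state_dim)
# Each cell has its OWN update-rule parameters, unlike a classic NCA with shared weights.
params = torch.randn(n_cells, state_dim, state_dim, requires_grad=True)

for step in range(100):
    updates = torch.einsum("cij,cj->ci", params, states)        # per-cell update rule
    new_states = states + 0.1 * torch.tanh(updates)
    loss = ((new_states - new_states.mean(dim=0)) ** 2).mean()  # toy per-step objective
    loss.backward()
    with torch.no_grad():
        params -= lr * params.grad                              # cells adapt mid-simulation
        params.grad = None
    states = new_states.detach()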
GitHub
Petri Dish NCA
Petri Dish Neural Cellular Automata (PD-NCA) is a new ALife simulation substrate that replaces the fixed, non-adaptive morphogenesis of conventional NCA—where model parameters remain constant during development—with multi-agent open-ended growth, trained…
DreamGym from Meta is a new framework that lets AI agents train via synthetic reasoning-based experiences instead of costly real rollouts.
It models environment dynamics, replays and adapts tasks, and even improves sim-to-real transfer.
Results: +30% gains on WebArena and PPO-level performance—using only synthetic interactions.
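Conceptually this is model-based RL for agents: a learned experience model stands in for the real environment and generates the transitions the policy trains on. A generic sketch with hypothetical agent and experience-model interfaces (not DreamGym's actual API):

def train_on_synthetic_experience(agent, experience_model, tasks, rollouts_per_task=8):
    """Collect rollouts from a learned environment model instead of costly real deployments."""
    batch = []
    for task in tasks:
        for _ in range(rollouts_per_task):
            state = experience_model.reset(task)      # hypothetical: synthesize a start state
            done = False
            while not done:
                action = agent.act(state)
                # Hypothetical: the model predicts next observation, reward, and termination.
                state, reward, done = experience_model.step(state, action)
                batch.append((state, action, reward, done))
    agent.update(batch)                               # e.g. a PPO-style update on synthetic data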
Google introduced Nested Learning, a new ML paradigm for continual learning that views models as nested optimization problems to enhance long-context processing.
A proof-of-concept model, Hope, shows improved performance in language modeling.
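"Nested optimization" can be pictured as loops running at different timescales: a fast inner loop adapts on the current context while a slow outer loop updates the underlying weights based on the post-adaptation loss. A toy bilevel sketch (illustrative only; this is not the Hope architecture):

import torch

def loss_fn(fast_w, slow_w, batch):
    # Toy objective: a linear model whose weights are the sum of fast and slow parts.
    x, y = batch
    return ((x @ (fast_w + slow_w) - y) ** 2).mean()

def nested_step(fast_w, slow_w, context_batch, task_batch,
                inner_lr=1e-2, outer_lr=1e-3, inner_steps=3):
    """Inner loop adapts fast weights on the context; outer loop updates slow weights."""
    adapted = fast_w
    for _ in range(inner_steps):
        g = torch.autograd.grad(loss_fn(adapted, slow_w, context_batch),
                                adapted, create_graph=True)[0]
        adapted = adapted - inner_lr * g
    outer_g = torch.autograd.grad(loss_fn(adapted, slow_w, task_batch), slow_w)[0]
    new_slow = (slow_w - outer_lr * outer_g).detach().requires_grad_(True)
    return adapted.detach().requires_grad_(True), new_slow

# Example leaves: fast_w = torch.zeros(4, requires_grad=True); slow_w = torch.zeros(4, requires_grad=True)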
research.google
Introducing Nested Learning: A new ML paradigm for continual learning
