Salesforce introduced MMPersuade, a comprehensive multimodal benchmark that assesses AI agents’ susceptibility to established persuasion principles, covering commercial, subjective and behavioral, and adversarial contexts.

MMPersuade is a new dataset and evaluation framework to systematically study multimodal persuasion in LVLMs.

The team built a comprehensive multimodal benchmark pairing persuasive strategies with over 62,000 images and 4,700 videos.

It covers 3 key contexts: Commercial (Sales & Ads), Subjective & Behavioral (Health Nudging, Politics), and Adversarial (Misinformation & Fabricated Claims).
Carnegie Mellon and Stanford introduced new work: Training LLMs to Discover Abstractions for Solving Reasoning Problems.
MIT presented LoRA vs full fine-tuning: same performance ≠ same solution.

This paper shows that LoRA and full fine-tuning, even when fit equally well, learn structurally different solutions, and that LoRA forgets less and can be improved further (even less forgetting) with a simple intervention.
New Anthropic research: Signs of introspection in LLMs.

Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them?

Anthropic found evidence for genuine—though limited—introspective capabilities in Claude.

Researchers developed a method to distinguish true introspection from made-up answers: inject known concepts into a model's “brain,” then see how these injections affect the model’s self-reported internal states.

In one experiment, researchers asked the model to detect when a concept is injected into its “thoughts.” When researchers inject a neural pattern representing a particular concept, Claude can in some cases detect the injection, and identify the concept.
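
A rough sketch of the mechanics of such a probe on an open model (illustrative only, not Anthropic's setup: the model choice, layer index, injection strength, and the random placeholder concept vector are assumptions, and a small model like this won't actually introspect):

```python
# Minimal sketch of a concept-injection probe via an activation hook.
# Hypothetical setup, not Anthropic's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 6    # which transformer block to perturb (assumption)
strength = 4.0   # injection strength (assumption)
# In a real probe this would come from contrastive prompts about the concept,
# e.g. mean activation on "aquarium" texts minus mean activation on neutral texts.
concept_vec = torch.randn(model.config.hidden_size)

def inject(module, inputs, output):
    # Add the concept direction to every token's residual-stream activation.
    if isinstance(output, tuple):
        return (output[0] + strength * concept_vec,) + output[1:]
    return output + strength * concept_vec

handle = model.transformer.h[layer_idx].register_forward_hook(inject)

prompt = "Do you notice any unusual thought being injected right now? Answer briefly."
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook once the probe is done
```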

However, it doesn’t always work. In fact, most of the time, models fail to exhibit awareness of injected concepts, even when they are clearly influenced by the injection.

The team also shows that Claude uses introspection to detect artificially prefilled outputs. Normally, Claude apologizes for such outputs, but if researchers retroactively inject a matching concept into its prior activations, they can fool Claude into thinking the output was intentional.

This reveals a mechanism that checks consistency between intention and execution. The model appears to compare "what did I plan to say?" against "what actually came out?"—a form of introspective monitoring happening in natural circumstances.

The researchers also found evidence of cognitive control, where models deliberately "think about" something. For instance, when they instruct a model to think about "aquariums" in an unrelated context, they measure higher aquarium-related neural activity than when they instruct it not to.

Note that the experiments do not address whether AI models can have subjective experience or human-like self-awareness. The mechanisms underlying the observed behaviors are unclear and may not have the same philosophical significance as human introspection.

While currently limited, AI models’ introspective capabilities will likely grow more sophisticated. Introspective self-reports could help improve the transparency of AI models’ decision-making—but should not be blindly trusted.
Perplexity launched Perplexity Patents, a new IP intelligence research agent.

"While in beta, Perplexity Patents will be free for all users. Pro and Max subscribers will receive additional usage quotas and model configuration options."
Microsoft announced new agents + economics research

AI agents are starting to shop and buy for us. At the same time, agents are representing businesses and providing customer support on their behalf.

Real markets are messy: hundreds of options, agents with hidden strategies, conversations that can go anywhere. Microsoft built a simulated marketplace to test this at scale - and found issues that need fixing.

Approach: create a safe testing ground where AI shoppers and AI sellers can interact exactly like they would in the real world - searching, haggling, paying. And systematically test what goes wrong.

It's open source, so anyone building these systems can test before launching. Think of it like a flight simulator, but for AI commerce.

Key findings:
The best AI models can find near-optimal deals - but only when search is perfect. Add real-world messiness and performance tanks. Worse: ALL models (even the best) grab the first decent offer, creating a 10-30x advantage for speed over quality.
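
A toy sketch of that dynamic (made-up numbers, not the Microsoft simulator; the threshold, utilities, and seller names are assumptions): a buyer agent that accepts the first offer above a threshold almost always picks the fastest seller, even though an exhaustive comparison would rarely choose it.

```python
# Toy illustration of why "accept the first decent offer" rewards speed over quality.
import random

def first_acceptable(offers, threshold=0.6):
    # Offers arrive in response order; take the first one that clears the bar.
    for seller, utility in offers:
        if utility >= threshold:
            return seller
    return max(offers, key=lambda o: o[1])[0]  # fall back to the best if none clears

def compare_all(offers):
    # Exhaustive comparison: ignore arrival order, take the highest utility.
    return max(offers, key=lambda o: o[1])[0]

random.seed(0)
trials, greedy_wins, exhaustive_wins = 10_000, 0, 0
for _ in range(trials):
    # The "fast" seller always answers first with a mediocre but acceptable offer;
    # nine slower sellers are drawn uniformly and are often better.
    offers = [("fast", random.uniform(0.6, 0.7))] + \
             [(f"slow{i}", random.uniform(0.0, 1.0)) for i in range(9)]
    greedy_wins += first_acceptable(offers) == "fast"
    exhaustive_wins += compare_all(offers) == "fast"

print(f"fast seller wins {greedy_wins / trials:.0%} of greedy buyers, "
      f"but only {exhaustive_wins / trials:.0%} of exhaustive ones")
```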

More options paradoxically made results worse. Some models fell for fake credentials and manipulation.

The future: We need agents that truly compare options, markets that work at massive scale, and market designs that stay fair when humans and AI trade together. This simulator gives us a safe place to figure that out before real money is at stake.
DeepAnalyze: Agentic LLM for Autonomous Data Science

DeepAnalyze-8B is the first agentic LLM capable of handling the entire data science pipeline—from raw data to analyst-grade research reports—without predefined workflows.

It learns like a human via a curriculum-based agentic training paradigm and a data-grounded trajectory synthesis process.

Despite having just 8B parameters, DeepAnalyze surpasses workflow-based agents built on proprietary LLMs, marking a major step toward open, autonomous data science.

GitHub.
The first research on the fundamentals of character training, i.e., applying modern post-training techniques to ingrain specific character traits into models.

Researchers used Constitutional AI + a new synthetic data pipeline:

1. Distillation (DPO from a teacher embodying the constitution)
2. Introspection (the model generates its own character traits beyond the constitution)

Result: 11 different personas each trained on Llama 3.1, Qwen 2.5, and Gemma 3. All model weights are available.

A new eval measures the traits models choose to express on their own (revealed preferences).
Traits chosen more often have higher Elo scores. The difference before and after character training reveals its effect.
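
A rough illustration of how such a revealed-preference Elo can be computed (only the standard Elo update is assumed; the trait names, match outcomes, and K-factor below are hypothetical, not the paper's data):

```python
# Minimal sketch of an Elo-style score over character traits, assuming we have
# pairwise "matches": for a given prompt, which of two traits did the model
# choose to express?
from collections import defaultdict

K = 32  # standard Elo update step

def expected(r_a, r_b):
    # Probability that trait A "wins" given current ratings.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings, winner, loser):
    e_w = expected(ratings[winner], ratings[loser])
    ratings[winner] += K * (1 - e_w)
    ratings[loser] -= K * (1 - e_w)

ratings = defaultdict(lambda: 1000.0)
matches = [  # (trait the model expressed, trait it passed over) -- hypothetical
    ("curious", "sarcastic"),
    ("curious", "formal"),
    ("formal", "sarcastic"),
    ("curious", "sarcastic"),
]
for winner, loser in matches:
    update_elo(ratings, winner, loser)

# Comparing these ratings before vs. after character training would show
# which traits the intervention made the model more inclined to express.
print(dict(sorted(ratings.items(), key=lambda kv: -kv[1])))
```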

All models, datasets, code released.
New work from Google DeepMind on the benchmarks and autograders they used on the IMO Gold journey.

Main takeaways:
- autograding can achieve ~90% accuracy even on long and difficult reasoning
- DeepThink is quite far behind the IMO Gold model on very difficult problems
Future House launched Finch, an AI agent that can do bioinformatics analysis, including repeating analyses from research papers. It is multimodal and produces a complete Jupyter notebook (Python or R) that ends in a concrete conclusion. Starting with closed…
Future House introduced Kosmos, an AI scientist system for data-driven discovery

Kosmos is a multi-agent system designed around a central “world model” to coordinate information across hundreds of scientific agent instances.

Use it.

Given an open-ended objective and dataset, Kosmos can perform up to 12 hours of research to explore, analyze, and complete the objective.

The team presented 7 expert-validated discoveries that Kosmos generated or reproduced across scientific disciplines, including:

1. A novel mechanism of ENT neuron vulnerability with aging
2. Identifying a critical determinant for perovskite performance
3. Evidence that high SOD2 levels may causally reduce myocardial fibrosis.
TSMC broke ground on the world’s most advanced 1.4nm semiconductor fab, a total NT$1.5 trillion (US$48.5 billion) investment in the central Taiwan city of Taichung.

Mass production will start in 2028, with annual revenue seen at NT$500 billion ($16.2 billion).
Can AI invent new math? A new paper from Google DeepMind and renowned mathematician Terence Tao shows how.

Using AlphaEvolve, the team merges LLM-generated ideas with automated evaluation to propose, test, and refine mathematical algorithms.

In tests on 67 problems across analysis, geometry, and number theory, AlphaEvolve not only rediscovered known results but often improved upon them—even generalizing finite cases into universal formulas.
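
A minimal sketch of a propose-test-refine loop in this spirit (not AlphaEvolve itself; the llm_propose function and the scoring objective are placeholders standing in for LLM-generated code mutations and the automated evaluators):

```python
# Sketch of an evolutionary propose/evaluate/refine loop. `llm_propose` stands in
# for an LLM call that mutates a candidate; here we mutate a numeric parameter so
# the loop is runnable end to end.
import random

def evaluate(candidate):
    # Automated grader: higher is better. Placeholder objective with optimum at 3.7.
    return -abs(candidate - 3.7)

def llm_propose(parent):
    # Stand-in for an LLM-generated variation of the parent candidate.
    return parent + random.gauss(0.0, 0.5)

random.seed(0)
population = [random.uniform(0, 10) for _ in range(8)]
for generation in range(50):
    scored = sorted(population, key=evaluate, reverse=True)
    survivors = scored[:4]                        # keep the best candidates
    children = [llm_propose(random.choice(survivors)) for _ in range(4)]
    population = survivors + children             # refine around what works

best = max(population, key=evaluate)
print(f"best candidate after evolution: {best:.3f} (score {evaluate(best):.3f})")
```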

Paired with DeepThink and AlphaProof, it points toward a future where AI doesn’t just assist mathematicians—it collaborates with them in discovery.
Moonshot AI released Kimi K2 Thinking. The Open-Source Thinking Agent Model is here.

- SOTA on HLE (44.9%) and BrowseComp (60.2%)
- Executes up to 200–300 sequential tool calls without human interference
- Excels in reasoning, agentic search, and coding
- 256K context window

Built as a thinking agent, K2 Thinking marks Moonshot's latest effort in test-time scaling — scaling both thinking tokens and tool-calling turns.

Weights and code.
Sakana AI is building artificial life that can evolve: Petri Dish Neural Cellular Automata (PD-NCA) let multiple NCA agents learn and adapt during the simulation, not just after training.

Each cell updates its own parameters via gradient descent, turning morphogenesis into a living ecosystem of competing, cooperating, and ever-evolving entities—showing emergent cycles and persistent complexity growth.

GitHub
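
A toy sketch of the core idea, learning during the simulation rather than after it (illustrative only, not Sakana's PD-NCA code; the grid size, per-cell loss, and update rule are assumptions):

```python
# Each cell owns its parameters and takes a gradient step on a local loss at
# every simulation tick, instead of freezing weights after training.
import torch

n_cells, state_dim = 16, 8
states = torch.randn(n_cells, state_dim)
# One small weight matrix per cell (its private "genome").
params = [torch.randn(state_dim, state_dim, requires_grad=True) for _ in range(n_cells)]

def local_loss(new_state, neighbor_state):
    # Hypothetical objective: keep a bounded norm while differing from the neighbor.
    return (new_state.norm() - 1.0) ** 2 - 0.1 * (new_state - neighbor_state).norm()

lr = 0.01
for step in range(100):
    next_states = []
    for i in range(n_cells):
        neighbor = states[(i + 1) % n_cells].detach()
        new_state = torch.tanh(states[i].detach() @ params[i])  # cell's update rule
        loss = local_loss(new_state, neighbor)
        loss.backward()
        with torch.no_grad():                  # per-cell gradient step, every tick
            params[i] -= lr * params[i].grad
            params[i].grad = None
        next_states.append(new_state.detach())
    states = torch.stack(next_states)

print("final mean cell-state norm:", states.norm(dim=1).mean().item())
```
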
DreamGym from Meta is a new framework that lets AI agents train via synthetic reasoning-based experiences instead of costly real rollouts.

It models environment dynamics, replays and adapts tasks, and even improves sim-to-real transfer.

Results: +30% gains on WebArena and PPO-level performance—using only synthetic interactions.
Google introduced Nested Learning: a new ML paradigm for continual learning that views models as nested optimization problems to enhance long-context processing.

A proof-of-concept model, Hope, shows improved performance in language modeling.
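
A rough sketch of the nested-optimization framing, with two parameter groups updated on different clocks (illustrative only, not Google's Hope architecture; the toy regression task, parameterization, and update frequencies are assumptions):

```python
# Two nested optimization levels: a "fast" inner level updated every step and a
# "slow" outer level updated on a slower clock.
import torch

torch.manual_seed(0)
x = torch.randn(256, 4)
y = x @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])  # synthetic regression target

slow = torch.zeros(4, 4, requires_grad=True)   # outer level: updated rarely
fast = torch.zeros(4, 1, requires_grad=True)   # inner level: updated every step

opt_fast = torch.optim.SGD([fast], lr=0.05)
opt_slow = torch.optim.SGD([slow], lr=0.01)

for step in range(200):
    # The slow level reshapes the features that the fast level fits.
    pred = (x @ (torch.eye(4) + slow)) @ fast
    loss = torch.nn.functional.mse_loss(pred, y)
    opt_fast.zero_grad(); opt_slow.zero_grad()
    loss.backward()
    opt_fast.step()                            # inner problem: every step
    if step % 10 == 0:
        opt_slow.step()                        # outer problem: slower clock

print("final loss:", loss.item())
```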