Show HN: Factorio Learning Environment – Agents Build Factories (🔥 Score: 159+ in 2 hours)

Link: https://readhacker.news/s/6qKug
Comments: https://readhacker.news/c/6qKug

I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).
FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.
A critical advantage of Factorio as a benchmark is its unbounded nature. Unlike many evals that are quickly saturated by newer models, Factorio's geometric complexity scaling means it won't be "solved" in the next 6 months (or possibly even years). This allows us to meaningfully compare models by the order-of-magnitude of resources they can produce - creating a benchmark with longevity.
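As a rough illustration of comparing by order of magnitude, the sketch below buckets a production total by its base-10 exponent. The function name and the two totals are made up for the example; they are not FLE results or part of its API.

```python
import math


def order_of_magnitude(resources_produced: int) -> int:
    """Return the base-10 exponent of a positive production total."""
    if resources_produced <= 0:
        raise ValueError("production total must be positive")
    return math.floor(math.log10(resources_produced))


# Made-up totals for two hypothetical agents (not measured results).
made_up_totals = {"agent_a": 5_000, "agent_b": 300_000}

# Difference in exponents between the two hypothetical agents.
gap = order_of_magnitude(made_up_totals["agent_b"]) - order_of_magnitude(
    made_up_totals["agent_a"]
)
```

Comparing exponents rather than raw totals keeps the ranking stable as factories grow, which is the property the benchmark relies on.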
The project began 18 months ago when, after years of playing Factorio, I recognised its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.
Two technical innovations drove this project forward: First, we discovered that piping Lua into the Factorio console over TCP enables running (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs to provide a clean, type-hinted interface for AI agents to interact with Factorio through familiar programming paradigms.
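To make the first point concrete, here is a minimal sketch of framing a Lua snippet for the console. It assumes the TCP channel is Factorio's RCON interface, which uses the Source RCON packet layout, and that Lua is run with the `/c` console command; the post does not spell out either detail. The helper names are hypothetical and are not FLE's own code.

```python
import struct

# Packet types from the Source RCON protocol (assumed to be what Factorio speaks).
SERVERDATA_AUTH = 3
SERVERDATA_EXECCOMMAND = 2


def encode_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Frame a body as: size, id, type (little-endian int32s), body, two NUL bytes."""
    payload = (
        struct.pack("<ii", request_id, packet_type)
        + body.encode("utf-8")
        + b"\x00\x00"
    )
    # The size field counts everything after itself.
    return struct.pack("<i", len(payload)) + payload


def decode_packet(data: bytes) -> tuple[int, int, str]:
    """Inverse of encode_packet for a single complete packet."""
    (size,) = struct.unpack_from("<i", data, 0)
    request_id, packet_type = struct.unpack_from("<ii", data, 4)
    # Body sits between the 12-byte header and the two trailing NUL bytes.
    body = data[12 : 4 + size - 2].decode("utf-8")
    return request_id, packet_type, body


def lua_command(lua_source: str) -> str:
    """Wrap Lua source in the console command assumed to execute it."""
    return "/c " + lua_source


packet = encode_packet(7, SERVERDATA_EXECCOMMAND, lua_command("rcon.print(1 + 1)"))
```

In a real session the bytes would be written to a socket after an authentication packet of type `SERVERDATA_AUTH`; the sketch stops at framing so it can run without a game server.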
Agents interact with FLE through a REPL pattern:
1. They observe the world (seeing the output of their last action)
2. They generate Python code to perform their next action
3. They receive detailed feedback (including exceptions and stdout)
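The loop above can be sketched in a few lines. This is a toy stand-in, not FLE's implementation: `run_step`, `repl_loop`, and the `policy` callable are hypothetical names, and the code is executed with plain `exec` rather than against a Factorio server.

```python
import contextlib
import io
import traceback
from typing import Callable


def run_step(code: str, namespace: dict) -> str:
    """Execute agent code and return its stdout plus any traceback as feedback."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        # Exceptions are reported to the agent rather than ending the episode.
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


def repl_loop(policy: Callable[[str], str], steps: int) -> list[str]:
    """Alternate between asking the policy for code and feeding back the result."""
    namespace: dict = {}  # persists across steps, like a REPL session
    observation = ""
    history = []
    for _ in range(steps):
        code = policy(observation)
        observation = run_step(code, namespace)
        history.append(observation)
    return history


# A scripted "agent" that ignores observations and replays three programs.
programs = iter(["x = 2\nprint(x)", "print(x * 3)", "1 / 0"])
history = repl_loop(lambda observation: next(programs), steps=3)
```

Keeping one namespace across steps means variables defined in an earlier action are still available later, and the third program shows an exception coming back as feedback instead of crashing the loop.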
We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: An unbounded task of building the largest possible factory on a procedurally generated map
We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude 3.5 Sonnet is currently the best model (by a significant margin).
The code is available at https://github.com/JackHopkins/factorio-learning-environment.
You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+
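A quick way to check two of these prerequisites before following the README is sketched below. The function name and dictionary keys are made up for this example, and it does not check the Factorio version, since how to query that depends on your install.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report whether the Python version and Docker requirements look satisfied."""
    return {
        # The interpreter running this script must be 3.10 or newer.
        "python>=3.10": sys.version_info >= (3, 10),
        # Only checks that a `docker` executable is on PATH, not that the daemon runs.
        "docker_on_path": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```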
The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents.
We would love to hear your thoughts and see what others can do with this framework!



group-telegram.com/hacker_news_feed/115978

BY Hacker News

