↓Salta al contenuto principale

The LLM Red Teaming Framework

4 settembre 2025·381 parole·2 minuti

GitHub Framework Open Source Python LLM Best Practices

Articoli Interessanti - This article is part of a series.

Part : Perche' la tua azienda ha bisogno di AI privata (e non di ChatGPT)

Part : Keycloak

Part : GitHub - zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive

Part : GitHub - EricLBuehler/mistral.rs: Fast, flexible LLM inference

Part : GitHub - antirez/voxtral.c: Pure C inference of Mistral Voxtral Realtime 4B speech to text model

Part : GitHub - alexziskind1/llama-throughput-lab: Interactive launcher and benchmarking harness for llama.cpp server throughput, with tests, sweeps, and round-robin load tools.

Part : GitHub - qwibitai/nanoclaw: A lightweight alternative to Clawdbot / OpenClaw that runs in Apple containers for security. Connect

Part : GitHub - moltbot/moltbot: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

Part : GitHub - aiming-lab/SimpleMem: SimpleMem: Efficient Lifelong Memory for LLM Agents

Part : GitHub - mikekelly/claude-sneakpeek: Get a parallel build of Claude code that unlocks feature-flagged capabilities like swarm mode.

Part : GitHub - virattt/ai-hedge-fund: An AI Hedge Fund Team

Part : moonshotai/Kimi-K2.5 · Hugging Face

Part : Welcome - Poke Documentation

Part : Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Part : NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice - NVIDIA ADLR

Part : GitHub - different-ai/openwork: An open-source alternative to Claude Cowork, powered by OpenCode

Part : GitHub - google/langextract: A Python library for extracting structured information from unstructured text using LLMs with precis

Part : GitHub - memodb-io/Acontext: Data platform for context engineering. Context data platform that stores, observes and learns. Join

Part : GitHub - rberg27/doom-coding: A guide for how to use your smartphone to code anywhere at anytime.

Part : GitHub - bolt-foundry/gambit: Agent harness framework for building, running, and verifying LLM workflows

Part : GitHub - unclecode/crawl4ai: 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Part : GitHub - finbarr/yolobox: Let your AI go full send. Your home directory stays home.

Part : GitHub - mistralai/mistral-vibe: Minimal CLI coding agent by Mistral

Part : GitHub - eigent-ai/eigent: Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity.

Part : GitHub - NVlabs/ToolOrchestra: ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.

Part : Ask HN: What is the best way to provide continuous context to models?

Part : Recursive Language Models

Part : Recursive Language Models | Alex L. Zhang

Part : Recursive Language Models: the paradigm of 2026

Part : The Art of Context Windows: Our AI Had Alzheimer's: Here's How We Taught It To Remember

Part : Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

Part : Il Disclaimer muore.

Part : GitHub - fullstackwebdev/rlm_repl: Recursive Language Models (RLMs) implementation based on the paper by Zhang, Kraska, and Khattab

Part : Cowork: Claude Code for the rest of your work

Part : Show HN: Agent-of-empires: OpenCode and Claude Code session manager

Part : ToolOrchestra

Part : OpenCode | The open source AI coding agent

Part : You Should Write An Agent · The Fly Blog

Part : Getting Started - SWE-agent documentation

Part : How to Build an Agent - Amp

Part : How to code Claude Code in 200 lines of code

Part : SAM Audio

Part : We Got Claude to Fine-Tune an Open Source LLM

Part : Use Claude Code with Chrome (beta) - Claude Code Docs

Part : GitHub - microsoft/VibeVoice: Open-Source Frontier Voice AI

Part : GitHub - GVCLab/PersonaLive: PersonaLive! : Expressive Portrait Image Animation for Live Streaming

Part : GitHub - NevaMind-AI/memU: Memory infrastructure for LLMs and AI agents

Part : GitHub - VibiumDev/vibium: Browser automation for AI agents and humans

Part : GitHub - yichuan-w/LEANN: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Part : GitHub - DGoettlich/history-llms: Information hub for our project training the largest possible historical LLMs.

Part : LLMRouter - LLMRouter

Part : Everything as Code: How We Manage Our Company In One Monorepo | Kasava

Part : GitHub - Search code, repositories, users, issues, pull requests...: 🔥 A tool to analyze your website's AI-readiness, powered by Firecrawl

Part : Fundamentals of Building Autonomous LLM Agents This paper is based on a seminar technical report from the course Trends in Autonomous Agents: Advances in Architecture and Practice offered at TUM

Part : Introduction | MCP Toolbox for Databases

Part : GitHub - Tencent-Hunyuan/HunyuanOCR

Part : Effective harnesses for long-running agents Anthropic

Part : GitHub - pixeltable/pixeltable: Pixeltable — Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads

Part : AI Explained - Stanford Research Paper.pdf - Google Drive

Part : We present Olmo 3, our next family of fully open, leading language models

Part : Nano Banana Pro is making millions of interior designers obsolete I upload my floor plan and it design the whole house for me, and even generate real images for each room based on the dimension

Part : How to Segment Videos with Segment Anything 3 (SAM3)

Part : Introducing MagicPath, an infinite canvas to create, refine, and explore with AI

Part : Nano Banana Pro is wild

Part : Next up… Slide Decks! Turn your sources into a detailed deck for reading OR a set of presentation-ready slides

Part : Presentations — Benedict Evans

Part : Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind

Part : Google Antigravity

Part : GitHub - GibsonAI/Memori: Open-Source Memory Engine for LLMs, AI Agents & Multi-Agent Systems

Part : GitHub Projects Community (@GithubProjects) on X

Part : I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs

Part : Love this framing！ This is exactly what we’re building at Weco: - you write an eval script (your verifier) - Weco iterates on the code to optimize it against that eval Software 1

Part : Supercharge your OCR Pipelines with Open Models

Part : [2511.09030] Solving a Million-Step LLM Task with Zero Errors

Part : Gemini 3: Introducing the latest Gemini AI model from Google

Part : [2511.10395] AgentEvolver: Towards Efficient Self-Evolving Agent System

Part : GitHub - rbalestr-lab/lejepa

Part : Use Cases | Claude

Part : Improving frontend design through Skills | Claude

Part : Sim: Open-source platform to build and deploy AI agent workflows

Part : Context Retrieval for AI Agents across Apps & Databases

Part : said we should delete tokenizers

Part : You Should Write An Agent · The Fly Blog

Part : "🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here"

Part : Link to the Strix GitHub repo: (don't forget to star 🌟)

Part : Source: Thanks and Bharat for showing the world you can in fact tra...

Part : This Claude Code prompt literally turns Claude Code into ultrathink...

Part : Wren AI | Official Blog

Part : Tongyi DeepResearch: A New Era of Open-Source AI Researchers | Tongyi DeepResearch

Part : Syllabi – Open-source agentic AI with tools, RAG, and multi-channel deploy

Part : OpenSkills

Part : MiniMax-M2

Part : AI Act Single Information Platform | AI Act Service Desk

Part : eurollm.io

Part : Introducing Mistral AI Studio. | Mistral AI

Part : OpenSnowcat - Enterprise-grade behavioral data platform.

Part : Dr Milan Milanović (@milan_milanovic) on X

Part : Game Theory | Open Yale Courses

Part : DeepSeek-OCR

Part : Airbyte: The Leading Data Integration Platform for ETL/ELT Pipelines

Part : Enterprise Deep Research

Part : I quite like the new DeepSeek-OCR paper

Part : olmOCR 2: Unit test rewards for document OCR | Ai2

Part : We used DeepSeek OCR to extract every dataset from tables/charts ac...

Part : Scripts I wrote that I use all the time

Part : DeepSeek OCR - More than OCR - YouTube

Part : How to Get Consistent Classification From Inconsistent LLMs?

Part : Production RAG: what I learned from processing 5M+ documents

Part : Stanford's ALL FREE Courses [2024 & 2025] ❯ CS230 - Deep Learni...

Part : Syllabus

Part : Make Any App Searchable for AI Agents

Part : PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Part : Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

Part : Recursive Language Models (RLMs)

Part : nanochat

Part : ROMA: Recursive Open Meta-Agents

Part : NeuTTS Air

Part : Cua: Open-source infrastructure for Computer-Use Agents

Part : MCP Analytics and Authentication Platform

Part : My trick for getting consistent classification from LLMs

Part : If you're late to the whole "memory in AI agents" topic like me, I recommend investing 43 minutes to watch this video

Part : DeepLearning.AI: Start or Advance Your Career in AI

Part : Claude Code best practices | Code w/ Claude - YouTube

Part : EU-funded TildeOpen LLM delivers European AI breakthrough for multilingual innovation | Shaping Europe’s digital future

Part : The RAG Obituary: Killed by Agents, Buried by Context Windows

Part : Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy

Part : RAG-Anything: All-in-One RAG Framework

Part : RAGLight

Part : Turns Codebase into Easy Tutorial with AI

Part : Failing to Understand the Exponential, Again

Part : Prompt Packs | OpenAI Academy

Part : AI-Researcher: Autonomous Scientific Innovation

Part : Context Engineering for AI Agents: Lessons from Building Manus

Part : AgenticSeek: Private, Local Manus Alternative

Part : Learn Your Way

Part : Qwen-Image-Edit-2509: Multi-Image Support，Improved Consistency

Part : Qwen-Image

Part : Introducing Tongyi Deep Research

Part : 💾🎉 copyparty

Part : AI Engineering Hub

Part : Deep Chat

Part : ibm-granite/granite-docling-258M · Hugging Face

Part : Google just dropped an ace 64-page guide on building AI Agents

Part : opcode - The Elegant Desktop Companion for Claude Code

Part : NocoDB Cloud

Part : A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch

Part : MemoRAG: Moving Towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Part : Enable AI to control your browser 🤖

Part : Total monthly distance traveled by passengers in California’s driverless taxis - Our World in Data

Part : A must-bookmark for vibe-coders

Part : Huge AI market opportunity in 2025

Part : The Anthropic Economic Index Anthropic

Part : dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Part : PaddleOCR

Part : DeepSite v2 - a Hugging Face Space by enzostvs

Part : How to Use Claude Code Subagents to Parallelize Development

Part : Show HN: CLAVIER-36 – A programming environment for generative music

Part : Small models are the future of agentic ai

Part : Kimi K2: Open Agentic Intelligence

Part : Introducing Qwen3-Max-Preview (Instruct)

Part : Scientific Paper Agent with LangGraph

Part : Anthropic's Interactive Prompt Engineering Tutorial

Part : swiss-ai/Apertus-70B-2509 · Hugging Face

Part : Making a font of my handwriting · Chameth.com

Part : SurfSense

Part : LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Part : NextChat

Part : This Article

Part : Colette - ci ricorda molto Kotaemon

Part : VibeVoice: A Frontier Open-Source Text-to-Speech Model

Part : [2502.12110] A-MEM: Agentic Memory for LLM Agents

Part : [2504.19413] Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Part : Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS

Part : HumanLayer

Part : PageIndex: Document Index for Reasoning-based RAG

Part : Deploying DeepSeek on 96 H100 GPUs

Part : Claude Code: A Highly Agentic Coding Assistant - DeepLearning.AI

Part : DyG-RAG: Dynamic Graph Retrieval-Augmented Generation with Event-Centric Reasoning

Part : [2508.15126] aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

Part : Alexander Kruel - Links for 2025-08-24

Part : AI Agents for Beginners - A Course

Part : Turning Claude Code into my best design partner

Part : How to build a coding agent

Part : Tiledesk Design Studio

Part : Build a Large Language Model (From Scratch)

Part : Data Formulator: Create Rich Visualizations with AI

Part : browser-use/web-ui

Part : Casper Capital - 100 AI Tools You Can’t Ignore in 2025...

Part : CS294/194-196 Large Language Model Agents | CS 194/294-196 Large Language Model Agents

Part : Show HN: Whispering – Open-source, local-first dictation you can trust

Part : Fallinorg v1.0.0-beta

Part : paperetl

Part : Automatically annotate papers using LLMs

Part : My AI Had Already Fixed the Code Before I Saw It

Part : Llama-Scan: Convert PDFs to Text W Local LLMs

Part : Claudia – Desktop companion for Claude code

Part : Show HN: Fallinorg - Offline Mac app that organizes files by meaning

Part : Focalboard

Part : Elysia: Agentic Framework Powered by Decision Trees

Part : LangExtract

Part : +1 for "context engineering" over "prompt engineering"

Part : The race for LLM cognitive core

Part : [2507.07935] Working with AI: Measuring the Occupational Implications of Generative AI

Part : Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

Part : Prava - Teaching GPT‑5 to use a computer

Part : InstaVM - Secure Code Execution Platform

Part : Litestar is worth a look

Part : Jobs at Kaizen | Y Combinator

Part : Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production

Part : Introducing pay per crawl: Enabling content owners to charge AI crawlers for access

Part : Agentic Design Patterns - Documenti Google

Part : [2507.14447] Routine: A Structural Planning Framework for LLM Agent System in Enterprise

Part : Qwen3-Coder: Agentic coding in the world

Part : FutureHouse Platform

Part : Voxtral | Mistral AI

Part : Research Agent with Gemini 2.5 Pro and LlamaIndex | Gemini API | Google AI for Developers

Part : AI Act, c'è il codice di condotta per un approccio responsabile e facilitato per le Pmi - Cyber Security 360

Part : [2507.06398] Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI

Part : MindsDB, an AI Data Solution - MindsDB

Part : Backlog.md – Markdown-native Task Manager and Kanban visualizer for any Git repo

Part : Opencode: AI coding agent, built for the terminal

Part : The new skill in AI is not prompting, it's context engineering

Part : SymbolicAI: A neuro-symbolic perspective on LLMs

Part : Gemini for Google Workspace Prompting Guide 101

Part : Judge Rules Training AI on Copyrighted Works Is Fair Use, Agentic Biology Evolves, and more...

Part : MCP is eating the world—and it's here to stay

Part : How Dataherald Makes Natural Language to SQL Easy

Part : Field Notes From Shipping Real Code With Claude

Part : Nice - my AI startup school talk is now up!

Part : Nice - my AI startup school talk is now up! Chapters: 0:00 Imo fair to say that software is changing quite fundamentally again

Part : Automated 73% of his remote job using basic automation tools, told his manager everything, and got a promotion

Part : Building Effective AI Agents

Part : How Anthropic Teams Use Claude Code

Part : Snorting the AGI with Claude Code

Part : Nanonets-OCR-s – OCR model that transforms documents into structured markdown

Part : The Illusion of Thinking

Part : Trends – Artificial Intelligence | BOND

Part : Claude Code is My Computer | Peter Steinberger

Part : [2505.24863] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Part : [2505.24864] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Part : My AI Skeptic Friends Are All Nuts · The Fly Blog

Part : Designing Pareto-optimal GenAI workflows with syftr

Part : "BillionMail 📧 An Open-Source MailServer, NewsLetter, Email Marketing Solution for Smarter Campaigns"

Part : Ask HN: What is the best LLM for consumer grade hardware?

Part : [2411.06037] Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Part : Show HN: Onlook – Open-source, visual-first Cursor for designers

Part : Agent Development Kit (ADK)

Part : Strands Agents

Part : Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning

Part : Introduction - IntelOwl Project Documentation

Part : Show HN: My LLM CLI tool can run tools now, from Python code or plugins

Part : [2505.03335v2] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Part : Codex’s Robot Dev Team, Grok's Fixation on South Africa, Saudi Arabia’s AI Power Play, and more...

Part : [2502.00032v1] Querying Databases with Function Calling

Part : Come Addestrare un LLM con i Tuoi Dati Personali: Guida Completa con LLaMA 3.2

Part : AI Hedge Fund

Part : Troy Hunt: Have I Been Pwned 2.0 is Now Live!

Part : A Research Preview of Codex

Part : [2505.06120] LLMs Get Lost In Multi-Turn Conversation

Part : Ollama's new engine for multimodal models

Part : Vision Now Available in Llama.cpp

Part : [2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Part : Requests for Startups | Y Combinator

Part : Token & Token Usage | DeepSeek API Docs

Part : Cua is Docker for Computer-Use AI Agents

Part : [2504.07139] Artificial Intelligence Index Report 2025

Part : Gemma 3 QAT Models: Bringing state-of-the-Art AI to consumer GPUs

Part : GitHub - HandsOnLLM/Hands-On-Large-Language-Models: Official code repo for the O'Reilly Book - 'Hands-On Large Language Models'

Part : Deep Tech Revolution - Area Science Park

Part : GitHub - humanlayer/12-factor-agents: What are the principles we can use to build LLM-powered software that is actually good enough to put

Part : Pagina LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Part : Pagina SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Part : DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature

Part : A foundation model to predict and capture human cognition | Nature

Part : Large language models are proficient in solving and creating emotional intelligence tests | Communications Psychology

Part : MALADE: Multi-Agent Architecture for Pharmacovigilance - langroid

Part : Everything About Transformers

Default featured image

#### Fonte

Tipo: GitHub Repository
Link originale: https://github.com/confident-ai/deepteam
Data pubblicazione: 2025-09-04

Sintesi
#

WHAT - DeepTeam è un framework open-source per il red teaming di Large Language Models (LLMs) e sistemi basati su LLMs. Permette di simulare attacchi avversari e identificare vulnerabilità come bias, leak di informazioni personali (PII) e robustezza.

WHY - È rilevante per il business AI perché consente di testare e migliorare la sicurezza degli LLMs, riducendo il rischio di attacchi avversari e garantendo la conformità alle normative sulla privacy e sicurezza dei dati.

WHO - Gli attori principali sono Confident AI, l’azienda che sviluppa DeepTeam, e la community open-source che contribuisce al progetto. Competitor includono altre soluzioni di sicurezza per LLMs come AI Red Teaming di Microsoft.

WHERE - DeepTeam si posiziona nel mercato della sicurezza AI, specificamente nel settore del red teaming per LLMs. È parte dell’ecosistema di strumenti per la valutazione e la sicurezza dei modelli linguistici.

WHEN - DeepTeam è un progetto relativamente nuovo ma in rapida crescita, con una comunità attiva e una documentazione ben strutturata. Il trend temporale mostra un aumento di interesse e adozione.

BUSINESS IMPACT:

Opportunità: Integrazione di DeepTeam nel processo di sviluppo per migliorare la sicurezza degli LLMs, riducendo il rischio di attacchi e migliorando la fiducia degli utenti.
Rischi: Dipendenza da un progetto open-source potrebbe comportare rischi di manutenzione e supporto a lungo termine.
Integrazione: Possibile integrazione con lo stack esistente di valutazione e sicurezza dei modelli linguistici.

TECHNICAL SUMMARY:

Core technology stack: Python, DeepEval (framework di valutazione per LLMs), tecniche di red teaming come jailbreaking e prompt injection.
Scalabilità: Eseguibile localmente, scalabile in base alle risorse hardware disponibili.
Differenziatori tecnici: Simulazione di attacchi avanzati e identificazione di vulnerabilità specifiche come bias e leak di PII.

Casi d’uso
#

Private AI Stack: Integrazione in pipeline proprietarie
Client Solutions: Implementazione per progetti clienti
Development Acceleration: Riduzione time-to-market progetti
Strategic Intelligence: Input per roadmap tecnologica
Competitive Analysis: Monitoring ecosystem AI

Risorse
#

Link Originali
#

The LLM Red Teaming Framework - Link originale

Articolo segnalato e selezionato dal team Human Technology eXcellence elaborato tramite intelligenza artificiale (in questo caso con LLM HTX-EU-Mistral3.1Small) il 2025-09-04 19:37 Fonte originale: https://github.com/confident-ai/deepteam

Articoli Correlati
#

Automatically annotate papers using LLMs - LLM, Open Source
HumanLayer - Best Practices, AI, LLM
paperetl - Open Source

Articoli Interessanti - This article is part of a series.

Part : Perche' la tua azienda ha bisogno di AI privata (e non di ChatGPT)

Part : Keycloak

Part : GitHub - zai-org/GLM-OCR: GLM-OCR: Accurate × Fast × Comprehensive

Part : GitHub - EricLBuehler/mistral.rs: Fast, flexible LLM inference

Part : GitHub - antirez/voxtral.c: Pure C inference of Mistral Voxtral Realtime 4B speech to text model

Part : GitHub - alexziskind1/llama-throughput-lab: Interactive launcher and benchmarking harness for llama.cpp server throughput, with tests, sweeps, and round-robin load tools.

Part : GitHub - qwibitai/nanoclaw: A lightweight alternative to Clawdbot / OpenClaw that runs in Apple containers for security. Connect

Part : GitHub - moltbot/moltbot: Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

Part : GitHub - aiming-lab/SimpleMem: SimpleMem: Efficient Lifelong Memory for LLM Agents

Part : GitHub - mikekelly/claude-sneakpeek: Get a parallel build of Claude code that unlocks feature-flagged capabilities like swarm mode.

Part : GitHub - virattt/ai-hedge-fund: An AI Hedge Fund Team

Part : moonshotai/Kimi-K2.5 · Hugging Face

Part : Welcome - Poke Documentation

Part : Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Part : NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice - NVIDIA ADLR

Part : GitHub - different-ai/openwork: An open-source alternative to Claude Cowork, powered by OpenCode

Part : GitHub - google/langextract: A Python library for extracting structured information from unstructured text using LLMs with precis

Part : GitHub - memodb-io/Acontext: Data platform for context engineering. Context data platform that stores, observes and learns. Join

Part : GitHub - rberg27/doom-coding: A guide for how to use your smartphone to code anywhere at anytime.

Part : GitHub - bolt-foundry/gambit: Agent harness framework for building, running, and verifying LLM workflows

Part : GitHub - unclecode/crawl4ai: 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN

Part : GitHub - finbarr/yolobox: Let your AI go full send. Your home directory stays home.

Part : GitHub - mistralai/mistral-vibe: Minimal CLI coding agent by Mistral

Part : GitHub - eigent-ai/eigent: Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity.

Part : GitHub - NVlabs/ToolOrchestra: ToolOrchestra is an end-to-end RL training framework for orchestrating tools and agentic workflows.

Part : Ask HN: What is the best way to provide continuous context to models?

Part : Recursive Language Models

Part : Recursive Language Models | Alex L. Zhang

Part : Recursive Language Models: the paradigm of 2026

Part : The Art of Context Windows: Our AI Had Alzheimer's: Here's How We Taught It To Remember

Part : Reimagining LLM Memory: Using Context as Training Data Unlocks Models That Learn at Test-Time

Part : Il Disclaimer muore.

Part : GitHub - fullstackwebdev/rlm_repl: Recursive Language Models (RLMs) implementation based on the paper by Zhang, Kraska, and Khattab

Part : Cowork: Claude Code for the rest of your work

Part : Show HN: Agent-of-empires: OpenCode and Claude Code session manager

Part : ToolOrchestra

Part : OpenCode | The open source AI coding agent

Part : You Should Write An Agent · The Fly Blog

Part : Getting Started - SWE-agent documentation

Part : How to Build an Agent - Amp

Part : How to code Claude Code in 200 lines of code

Part : SAM Audio

Part : We Got Claude to Fine-Tune an Open Source LLM

Part : Use Claude Code with Chrome (beta) - Claude Code Docs

Part : GitHub - microsoft/VibeVoice: Open-Source Frontier Voice AI

Part : GitHub - GVCLab/PersonaLive: PersonaLive! : Expressive Portrait Image Animation for Live Streaming

Part : GitHub - NevaMind-AI/memU: Memory infrastructure for LLMs and AI agents

Part : GitHub - VibiumDev/vibium: Browser automation for AI agents and humans

Part : GitHub - yichuan-w/LEANN: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on your personal device.

Part : GitHub - DGoettlich/history-llms: Information hub for our project training the largest possible historical LLMs.

Part : LLMRouter - LLMRouter

Part : Everything as Code: How We Manage Our Company In One Monorepo | Kasava

Part : GitHub - Search code, repositories, users, issues, pull requests...: 🔥 A tool to analyze your website's AI-readiness, powered by Firecrawl

Part : Fundamentals of Building Autonomous LLM Agents This paper is based on a seminar technical report from the course Trends in Autonomous Agents: Advances in Architecture and Practice offered at TUM

Part : Introduction | MCP Toolbox for Databases

Part : GitHub - Tencent-Hunyuan/HunyuanOCR

Part : Effective harnesses for long-running agents Anthropic

Part : GitHub - pixeltable/pixeltable: Pixeltable — Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads

Part : AI Explained - Stanford Research Paper.pdf - Google Drive

Part : We present Olmo 3, our next family of fully open, leading language models

Part : Nano Banana Pro is making millions of interior designers obsolete I upload my floor plan and it design the whole house for me, and even generate real images for each room based on the dimension

Part : How to Segment Videos with Segment Anything 3 (SAM3)

Part : Introducing MagicPath, an infinite canvas to create, refine, and explore with AI

Part : Nano Banana Pro is wild

Part : Next up… Slide Decks! Turn your sources into a detailed deck for reading OR a set of presentation-ready slides

Part : Presentations — Benedict Evans

Part : Nano Banana Pro: Gemini 3 Pro Image model from Google DeepMind

Part : Google Antigravity

Part : GitHub - GibsonAI/Memori: Open-Source Memory Engine for LLMs, AI Agents & Multi-Agent Systems

Part : GitHub Projects Community (@GithubProjects) on X

Part : I’m starting to get into a habit of reading everything (blogs, articles, book chapters,…) with LLMs

Part : Love this framing！ This is exactly what we’re building at Weco: - you write an eval script (your verifier) - Weco iterates on the code to optimize it against that eval Software 1

Part : Supercharge your OCR Pipelines with Open Models

Part : [2511.09030] Solving a Million-Step LLM Task with Zero Errors

Part : Gemini 3: Introducing the latest Gemini AI model from Google

Part : [2511.10395] AgentEvolver: Towards Efficient Self-Evolving Agent System

Part : GitHub - rbalestr-lab/lejepa

Part : Use Cases | Claude

Part : Improving frontend design through Skills | Claude

Part : Sim: Open-source platform to build and deploy AI agent workflows

Part : Context Retrieval for AI Agents across Apps & Databases

Part : said we should delete tokenizers

Part : You Should Write An Agent · The Fly Blog

Part : "🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here"

Part : Link to the Strix GitHub repo: (don't forget to star 🌟)

Part : Source: Thanks and Bharat for showing the world you can in fact tra...

Part : This Claude Code prompt literally turns Claude Code into ultrathink...

Part : Wren AI | Official Blog

Part : Tongyi DeepResearch: A New Era of Open-Source AI Researchers | Tongyi DeepResearch

Part : Syllabi – Open-source agentic AI with tools, RAG, and multi-channel deploy

Part : OpenSkills

Part : MiniMax-M2

Part : AI Act Single Information Platform | AI Act Service Desk

Part : eurollm.io

Part : Introducing Mistral AI Studio. | Mistral AI

Part : OpenSnowcat - Enterprise-grade behavioral data platform.

Part : Dr Milan Milanović (@milan_milanovic) on X

Part : Game Theory | Open Yale Courses

Part : DeepSeek-OCR

Part : Airbyte: The Leading Data Integration Platform for ETL/ELT Pipelines

Part : Enterprise Deep Research

Part : I quite like the new DeepSeek-OCR paper

Part : olmOCR 2: Unit test rewards for document OCR | Ai2

Part : We used DeepSeek OCR to extract every dataset from tables/charts ac...

Part : Scripts I wrote that I use all the time

Part : DeepSeek OCR - More than OCR - YouTube

Part : How to Get Consistent Classification From Inconsistent LLMs?

Part : Production RAG: what I learned from processing 5M+ documents

Part : Stanford's ALL FREE Courses [2024 & 2025] ❯ CS230 - Deep Learni...

Part : Syllabus

Part : Make Any App Searchable for AI Agents

Part : PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Part : Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

Part : Recursive Language Models (RLMs)

Part : nanochat

Part : ROMA: Recursive Open Meta-Agents

Part : NeuTTS Air

Part : Cua: Open-source infrastructure for Computer-Use Agents

Part : MCP Analytics and Authentication Platform

Part : My trick for getting consistent classification from LLMs

Part : If you're late to the whole "memory in AI agents" topic like me, I recommend investing 43 minutes to watch this video

Part : DeepLearning.AI: Start or Advance Your Career in AI

Part : Claude Code best practices | Code w/ Claude - YouTube

Part : EU-funded TildeOpen LLM delivers European AI breakthrough for multilingual innovation | Shaping Europe’s digital future

Part : The RAG Obituary: Killed by Agents, Buried by Context Windows

Part : Anthropic releases Claude Sonnet 4.5 in latest bid for AI agents and coding supremacy

Part : RAG-Anything: All-in-One RAG Framework

Part : RAGLight

Part : Turns Codebase into Easy Tutorial with AI

Part : Failing to Understand the Exponential, Again

Part : Prompt Packs | OpenAI Academy

Part : AI-Researcher: Autonomous Scientific Innovation

Part : Context Engineering for AI Agents: Lessons from Building Manus

Part : AgenticSeek: Private, Local Manus Alternative

Part : Learn Your Way

Part : Qwen-Image-Edit-2509: Multi-Image Support，Improved Consistency

Part : Qwen-Image

Part : Introducing Tongyi Deep Research

Part : 💾🎉 copyparty

Part : AI Engineering Hub

Part : Deep Chat

Part : ibm-granite/granite-docling-258M · Hugging Face

Part : Google just dropped an ace 64-page guide on building AI Agents

Part : opcode - The Elegant Desktop Companion for Claude Code

Part : NocoDB Cloud

Part : A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch

Part : MemoRAG: Moving Towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery

Part : Enable AI to control your browser 🤖

Part : Total monthly distance traveled by passengers in California’s driverless taxis - Our World in Data

Part : A must-bookmark for vibe-coders

Part : Huge AI market opportunity in 2025

Part : The Anthropic Economic Index Anthropic

Part : dots.ocr: Multilingual Document Layout Parsing in a Single Vision-Language Model

Part : PaddleOCR

Part : DeepSite v2 - a Hugging Face Space by enzostvs

Part : How to Use Claude Code Subagents to Parallelize Development

Part : Show HN: CLAVIER-36 – A programming environment for generative music

Part : Small models are the future of agentic ai

Part : Kimi K2: Open Agentic Intelligence

Part : Introducing Qwen3-Max-Preview (Instruct)

Part : Scientific Paper Agent with LangGraph

Part : Anthropic's Interactive Prompt Engineering Tutorial

Part : swiss-ai/Apertus-70B-2509 · Hugging Face

Part : Making a font of my handwriting · Chameth.com

Part : SurfSense

Part : LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Part : NextChat

Part : This Article

Part : Colette - ci ricorda molto Kotaemon

Part : VibeVoice: A Frontier Open-Source Text-to-Speech Model

Part : [2502.12110] A-MEM: Agentic Memory for LLM Agents

Part : [2504.19413] Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Part : Apertus 70B: Truly Open - Swiss LLM by ETH, EPFL and CSCS

Part : HumanLayer

Part : PageIndex: Document Index for Reasoning-based RAG

Part : Deploying DeepSeek on 96 H100 GPUs

Part : Claude Code: A Highly Agentic Coding Assistant - DeepLearning.AI

Part : DyG-RAG: Dynamic Graph Retrieval-Augmented Generation with Event-Centric Reasoning

Part : [2508.15126] aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

Part : Alexander Kruel - Links for 2025-08-24

Part : AI Agents for Beginners - A Course

Part : Turning Claude Code into my best design partner

Part : How to build a coding agent

Part : Tiledesk Design Studio

Part : Build a Large Language Model (From Scratch)

Part : Data Formulator: Create Rich Visualizations with AI

Part : browser-use/web-ui

Part : Casper Capital - 100 AI Tools You Can’t Ignore in 2025...

Part : CS294/194-196 Large Language Model Agents | CS 194/294-196 Large Language Model Agents

Part : Show HN: Whispering – Open-source, local-first dictation you can trust

Part : Fallinorg v1.0.0-beta

Part : paperetl

Part : Automatically annotate papers using LLMs

Part : My AI Had Already Fixed the Code Before I Saw It

Part : Llama-Scan: Convert PDFs to Text W Local LLMs

Part : Claudia – Desktop companion for Claude code

Part : Show HN: Fallinorg - Offline Mac app that organizes files by meaning

Part : Focalboard

Part : Elysia: Agentic Framework Powered by Decision Trees

Part : LangExtract

Part : +1 for "context engineering" over "prompt engineering"

Part : The race for LLM cognitive core

Part : [2507.07935] Working with AI: Measuring the Occupational Implications of Generative AI

Part : Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting

Part : Prava - Teaching GPT‑5 to use a computer

Part : InstaVM - Secure Code Execution Platform

Part : Litestar is worth a look

Part : Jobs at Kaizen | Y Combinator

Part : Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production

Part : Introducing pay per crawl: Enabling content owners to charge AI crawlers for access

Part : Agentic Design Patterns - Documenti Google

Part : [2507.14447] Routine: A Structural Planning Framework for LLM Agent System in Enterprise

Part : Qwen3-Coder: Agentic coding in the world

Part : FutureHouse Platform

Part : Voxtral | Mistral AI

Part : Research Agent with Gemini 2.5 Pro and LlamaIndex | Gemini API | Google AI for Developers

Part : AI Act, c'è il codice di condotta per un approccio responsabile e facilitato per le Pmi - Cyber Security 360

Part : [2507.06398] Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI

Part : MindsDB, an AI Data Solution - MindsDB

Part : Backlog.md – Markdown-native Task Manager and Kanban visualizer for any Git repo

Part : Opencode: AI coding agent, built for the terminal

Part : The new skill in AI is not prompting, it's context engineering

Part : SymbolicAI: A neuro-symbolic perspective on LLMs

Part : Gemini for Google Workspace Prompting Guide 101

Part : Judge Rules Training AI on Copyrighted Works Is Fair Use, Agentic Biology Evolves, and more...

Part : MCP is eating the world—and it's here to stay

Part : How Dataherald Makes Natural Language to SQL Easy

Part : Field Notes From Shipping Real Code With Claude

Part : Nice - my AI startup school talk is now up!

Part : Nice - my AI startup school talk is now up! Chapters: 0:00 Imo fair to say that software is changing quite fundamentally again

Part : Automated 73% of his remote job using basic automation tools, told his manager everything, and got a promotion

Part : Building Effective AI Agents

Part : How Anthropic Teams Use Claude Code

Part : Snorting the AGI with Claude Code

Part : Nanonets-OCR-s – OCR model that transforms documents into structured markdown

Part : The Illusion of Thinking

Part : Trends – Artificial Intelligence | BOND

Part : Claude Code is My Computer | Peter Steinberger

Part : [2505.24863] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Part : [2505.24864] ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Part : My AI Skeptic Friends Are All Nuts · The Fly Blog

Part : Designing Pareto-optimal GenAI workflows with syftr

Part : "BillionMail 📧 An Open-Source MailServer, NewsLetter, Email Marketing Solution for Smarter Campaigns"

Part : Ask HN: What is the best LLM for consumer grade hardware?

Part : [2411.06037] Sufficient Context: A New Lens on Retrieval Augmented Generation Systems

Part : Show HN: Onlook – Open-source, visual-first Cursor for designers

Part : Agent Development Kit (ADK)

Part : Strands Agents

Part : Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning

Part : Introduction - IntelOwl Project Documentation

Part : Show HN: My LLM CLI tool can run tools now, from Python code or plugins

Part : [2505.03335v2] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Part : Codex’s Robot Dev Team, Grok's Fixation on South Africa, Saudi Arabia’s AI Power Play, and more...

Part : [2502.00032v1] Querying Databases with Function Calling

Part : Come Addestrare un LLM con i Tuoi Dati Personali: Guida Completa con LLaMA 3.2

Part : AI Hedge Fund

Part : Troy Hunt: Have I Been Pwned 2.0 is Now Live!

Part : A Research Preview of Codex

Part : [2505.06120] LLMs Get Lost In Multi-Turn Conversation

Part : Ollama's new engine for multimodal models

Part : Vision Now Available in Llama.cpp

Part : [2505.03335] Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Part : Requests for Startups | Y Combinator

Part : Token & Token Usage | DeepSeek API Docs

Part : Cua is Docker for Computer-Use AI Agents

Part : [2504.07139] Artificial Intelligence Index Report 2025

Part : Gemma 3 QAT Models: Bringing state-of-the-Art AI to consumer GPUs

Part : GitHub - HandsOnLLM/Hands-On-Large-Language-Models: Official code repo for the O'Reilly Book - 'Hands-On Large Language Models'

Part : Deep Tech Revolution - Area Science Park

Part : GitHub - humanlayer/12-factor-agents: What are the principles we can use to build LLM-powered software that is actually good enough to put

Part : Pagina LLM Post-Training: A Deep Dive into Reasoning Large Language Models

Part : Pagina SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Part : DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature

Part : A foundation model to predict and capture human cognition | Nature

Part : Large language models are proficient in solving and creating emotional intelligence tests | Communications Psychology

Part : MALADE: Multi-Agent Architecture for Pharmacovigilance - langroid

Part : Everything About Transformers