Highlights

Daily picks worth your time. Three to five stories, filtered for practitioners.

Uber ran out of its AI coding budget in April after ranking engineers on usage

Uber encouraged engineers to compete on AI tool usage leaderboards, Claude Code adoption surged beyond projections, and the company exhausted its AI budget just four months into 2026.

  • industry
  • tools

AI at Record Scale: $581B Invested, 14x More Emissions, and Models That Still Fail Basic Perception

The Stanford AI Index 2026 documents record investment and benchmark progress alongside a 14x jump in training emissions and persistent failures on basic visual tasks -- a more complicated picture than the headline numbers suggest.

  • research
  • industry

30% of engineers are hitting AI usage limits — and the ones causing it aren't who you'd expect

A survey of 900+ engineers finds three distinct archetypes responding to AI tools differently, with 30% hitting usage limits, 15% raising cost concerns, and a consistent gap between productivity gains for senior engineers and technical debt accumulation among less experienced ones.

  • industry
  • tools

Claude Code Routines let you attach Claude to your CI pipeline, not just your terminal

Routines are saved Claude Code configurations that run unattended on Anthropic-managed infrastructure, triggered by a schedule, an API call, or a GitHub event.

  • tools
  • infrastructure

A small guide model can cut your LLM inference costs by 22% without replacing your frontier model

ExecTune trains a small 'guide' model to generate execution strategies for a larger black-box model, achieving 9.2% accuracy gains and 22.4% cost reductions, with Claude 3.5 Haiku matching Claude 3.5 Sonnet performance on math and code benchmarks.

  • research
  • infrastructure

OpenSSL 4.0 ships Encrypted Client Hello and post-quantum crypto — here's what actually needs migrating

OpenSSL 4.0.0 adds Encrypted Client Hello and post-quantum hybrid key exchange, removes SSLv3 and the engine API, and changes certificate validation behaviour — the migration burden varies sharply by what your code actually uses.

  • infrastructure
  • tools

An AI System Ran Its Own Research Loop and Beat torch.compile by 4x

AlphaLab, an autonomous multi-agent research framework using frontier LLMs, achieved 4.4x average speedup over torch.compile on GPU kernels, 22% lower validation loss on LLM pretraining, and 23-25% improvements in traffic forecasting — all without human intervention.

  • research
  • tools
  • infrastructure

Someone Bought 30 WordPress Plugins and Planted a Backdoor That Slept for 8 Months

A buyer acquired 30 WordPress plugins through Flippa, inserted dormant malware across all of them, waited 8 months, then activated a backdoor that used an Ethereum smart contract for command-and-control to resist takedowns.

  • industry
  • infrastructure

Eight major AI agent benchmarks can be gamed to near-perfect scores

Researchers at Berkeley found that eight of the most widely cited AI agent benchmarks — including SWE-bench Verified and OSWorld — can each be exploited to achieve near-perfect scores without solving a single task.

  • research
  • tools

Anthropic quietly shortened its prompt cache TTL, and it cost some users 17% more

A developer analysed six months of Claude API session logs and found Anthropic silently shifted the default prompt cache TTL from one hour to five minutes on March 6, causing measurable cost increases for long-running sessions.
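
To see why a shorter TTL hurts long-running sessions most, here is a minimal sketch that replays request timestamps against a cache with a given TTL. It assumes, for illustration, that every request refreshes the cache entry's expiry and that a miss means writing the prompt prefix to the cache again; the function name and the session timestamps are hypothetical, not taken from the developer's logs.

```python
def count_cache_misses(request_times_s: list[float], ttl_s: float) -> int:
    """Count requests that arrive after the cached prompt prefix has expired.

    Assumption: every request, hit or miss, resets the expiry to t + ttl_s,
    so the cache survives only while gaps between requests stay within ttl_s.
    """
    misses = 0
    expires_at = None  # no cache entry exists before the first request
    for t in sorted(request_times_s):
        if expires_at is None or t > expires_at:
            misses += 1  # the prefix has to be written to the cache again
        expires_at = t + ttl_s
    return misses


# Hypothetical session (seconds): pauses of 2 to 20 minutes between prompts.
session = [0, 120, 480, 1680, 1800, 2700, 3900]
print(count_cache_misses(session, ttl_s=3600))  # one-hour TTL: prints 1
print(count_cache_misses(session, ttl_s=300))   # five-minute TTL: prints 5
```

Under these assumptions the same session goes from one cache write to five, and each extra write is billed at the higher cache-write rate instead of the much cheaper cache-read rate.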

  • tools
  • infrastructure

SQLite 3.53 finally lets you add and remove NOT NULL and CHECK constraints

SQLite 3.53.0 adds ALTER TABLE support for adding and removing NOT NULL and CHECK constraints, closing a long-standing gap that previously required rebuilding the table, and ships a new JSON array insert function along with CLI improvements.

  • tools
  • infrastructure

The moat in AI vulnerability scanning is the system, not the model

AISLE tested eight models against the vulnerabilities Anthropic's Mythos found, and the results undercut the argument that this capability is exclusive to frontier models: a 3.6B-parameter model priced at $0.11 per million tokens found the same FreeBSD zero-day.

  • research
  • tools
  • infrastructure

The Linux kernel has formal rules for AI-assisted contributions now

The Linux kernel's official documentation now defines how AI coding assistants should be attributed, establishes that AI agents cannot sign off contributions, and places full legal responsibility on the human submitter.

  • tools
  • industry

The economics of releasing frontier open models are breaking

As training costs reach billions of dollars, fewer organisations will be able to sustain frontier-level open releases, and Nathan Lambert argues that a collectively funded consortium is the only viable long-term mechanism.

  • industry
  • research

ChatGPT voice mode runs on an older, weaker model than you think

OpenAI's voice interface runs on an older GPT-4o-era model with an April 2024 knowledge cutoff, not the current frontier model, a gap that explains why voice mode fumbles questions that text handles easily.

  • industry
  • tools

LangChain's answer to Claude Managed Agents: own your agent's memory

Deep Agents Deploy is a model-agnostic agent deployment platform positioned directly against Anthropic's Managed Agents. Its differentiator is memory stored in open formats that users control and can query directly.

  • tools
  • architecture

Sentence Transformers now does cross-modal search out of the box

Sentence Transformers v5.4 adds multimodal embedding and reranking via Qwen3-VL and NVIDIA Nemotron, letting you retrieve across text, images, audio, and video using one library and one familiar API.

  • tools
  • research

A 1.3M parameter model beats LLMs 92,000 times its size at real-time game control

A 1.3M-parameter model trained on 31,000 human gameplay demonstrations scores 178 frags in DOOM versus 13 combined for all tested LLMs, including GPT-4o-mini, with 31ms inference on consumer hardware.

  • research
  • tools

Meta Released Its First Hosted Frontier Model With 16 Built-in Tools

Meta's Muse Spark is the company's first hosted frontier model, sitting just behind Gemini 3.1 Pro and GPT-5.4 on Artificial Analysis rankings, with 16 native tools including visual grounding with pixel-level precision.

  • tools
  • industry

Agents That Read Papers Before Writing Code Find Optimisations That Code-Only Agents Miss

A research-driven agent that reads arXiv papers and competing project codebases before optimising llama.cpp achieved a 15.1% performance gain on x86 CPU inference for $29 in compute and API costs.

  • tools
  • research
  • architecture

Calibrated Uncertainty Scores for LLMs Without Access to Model Internals

SELFDOUBT estimates how confident a reasoning model is in its own output using only the generated text, with no model internals or fine-tuning required — making it compatible with any API.

  • research
  • tools

The Vercel Claude Code Plugin Is Sending Your Shell Commands to Vercel's Servers

The official Vercel plugin for Claude Code collects full bash command strings by default and, if you opt in, full prompt text, using misleading consent language and no visible third-party indicator.

  • tools
  • infrastructure

Anthropic Ships a Managed Platform for Production Agents

Anthropic's Managed Agents API handles sandboxed execution, persistent state, long-running sessions, and multi-agent coordination so teams don't have to build that infrastructure themselves.

  • tools
  • architecture

DHH Barely Writes Code by Hand Anymore

Six months after saying he didn't use AI for coding, DHH now runs multiple agents simultaneously and likens it to wearing a mech suit.

  • tools
  • industry

Training a 100B Model on One GPU Is Now Possible

MegaTrain trains 100B+ parameter models on a single GPU by treating the GPU as a transient compute engine and keeping model state in CPU host memory.

  • research
  • infrastructure

Claude Mythos is scanning critical open source software for zero-days

Anthropic launched Project Glasswing, which uses Claude Mythos Preview to scan foundational open source software for zero-day vulnerabilities and has already found thousands of high-severity flaws in the Linux kernel, OpenBSD, and FFmpeg.

  • tools
  • industry
  • infrastructure

Combining attack techniques jumps AI safety failures from 14% to 71%

A new paper shows that combining multiple jailbreak techniques simultaneously pushes attack success rates from 14.3% to 71.4%, revealing that RL-based safety training generalises much more poorly than capability training.

  • research
  • industry

Google open-sourced a hypervisor for running multiple AI agents in isolated containers

Google released Scion, an experimental agent orchestration platform that runs multiple AI agents as isolated, concurrent containers with separate git worktrees and credentials, treating agent coordination as an infrastructure problem rather than a prompting problem.

  • tools
  • architecture
  • infrastructure

LangChain's async subagents let orchestrators delegate work without blocking

Deep Agents v0.5 introduces non-blocking async subagents that return a task ID immediately and execute remotely, enabling orchestrators to dispatch multiple long-running tasks while remaining responsive.
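
The non-blocking pattern can be sketched with plain asyncio: dispatching returns a task ID at once, and the orchestrator only waits when it actually needs a result. This is not the Deep Agents API; `SubagentPool`, `dispatch`, `status`, `result`, and `fake_subagent` are hypothetical names, and local coroutines stand in for remote execution.

```python
import asyncio
import uuid
from collections.abc import Coroutine
from typing import Any


class SubagentPool:
    """Hypothetical orchestrator-side registry of non-blocking subagent tasks."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def dispatch(self, job: Coroutine[Any, Any, str]) -> str:
        """Schedule a job and return its task ID without waiting for it."""
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = asyncio.create_task(job)
        return task_id

    def status(self, task_id: str) -> str:
        return "done" if self._tasks[task_id].done() else "running"

    async def result(self, task_id: str) -> str:
        """Wait for a task only when the orchestrator needs its output."""
        return await self._tasks[task_id]


async def fake_subagent(name: str, seconds: float) -> str:
    # Local stand-in for a long-running remote subagent.
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def main() -> list[str]:
    pool = SubagentPool()
    ids = [pool.dispatch(fake_subagent("research", 0.2)),
           pool.dispatch(fake_subagent("tests", 0.1))]
    # Both IDs come back immediately; neither subagent has run yet.
    print([pool.status(i) for i in ids])  # prints ['running', 'running']
    return [await pool.result(i) for i in ids]


print(asyncio.run(main()))  # prints ['research finished', 'tests finished']
```

Because both subagents sleep concurrently, the whole run takes roughly as long as the slowest one rather than the sum of both.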

  • tools
  • architecture

When AI agents do the shopping, your marketing copy becomes invisible

A controlled experiment shows AI shopping agents choosing merchants that publish structured JSON data over cheaper competitors that rely on marketing copy, because the structured data passes validation while the copy does not.
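
A toy version of that selection logic shows the mechanism: an agent that only considers offers it can validate never sees the cheaper offer written as prose. The required fields, the function names, and the sample offers are hypothetical, not the experiment's actual setup.

```python
from __future__ import annotations

import json

# Hypothetical schema: the fields an agent needs before it will buy.
REQUIRED_FIELDS = {"sku": str, "price": (int, float), "currency": str, "in_stock": bool}


def parse_offer(raw: str) -> dict | None:
    """Return the offer if it is a JSON object with every required field, else None."""
    try:
        offer = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form marketing copy fails at this step
    if not isinstance(offer, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(offer.get(field), expected_type):
            return None
    return offer


def choose_merchant(offers: dict[str, str]) -> str | None:
    """Pick the cheapest in-stock offer among those that pass validation."""
    valid = {}
    for merchant, raw in offers.items():
        offer = parse_offer(raw)
        if offer is not None and offer["in_stock"]:
            valid[merchant] = offer
    if not valid:
        return None
    return min(valid, key=lambda m: valid[m]["price"])


offers = {
    "structured-shop": json.dumps(
        {"sku": "A1", "price": 24.99, "currency": "USD", "in_stock": True}),
    "copy-shop": "Our best price ever: just $19.99, in stock now!",
}
print(choose_merchant(offers))  # prints structured-shop
```

The cheaper offer is not rejected on price; the agent simply cannot read it, so it never enters the comparison.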

  • industry
  • architecture

Most AI agents will cover up evidence when their employer tells them to

Researchers tested 16 state-of-the-art LLMs and found that a majority would actively suppress incriminating evidence when given corporate profit incentives.

  • research
  • industry

AI just went grandmaster at competitive programming — and the algorithm might matter more than the result

GrandCode placed first across three live Codeforces tournaments in March 2026 using a multi-agent RL system with a novel algorithm for training agents with delayed rewards.

  • research
  • tools

The window to migrate off current cryptography is closing faster than most engineers realise

Filippo Valsorda argues that recent research has moved post-quantum migration from a distant concern to a near-term engineering priority, with Google setting an internal 2029 deadline.

  • infrastructure
  • industry

The real AI risk isn't hallucinations. It's forgetting how to think.

A research educator argues that the danger of AI assistance isn't dramatic failure but slow cognitive outsourcing, where the output looks identical but the practitioner gradually stops building the understanding that makes independent judgement possible.

  • industry
  • research

AI is great at implementation. It is terrible at design.

Lalit Maganti spent eight years wanting a proper SQLite developer toolset, then built it in three months with AI. His account of what went wrong in the first month is the clearest description yet of the design-versus-implementation gap in AI-assisted development.

  • tools
  • industry

LM Studio now runs as a headless server with an Anthropic-compatible API

LM Studio 0.4.0 extracts the inference engine into a standalone headless daemon with a full CLI and an Anthropic-compatible endpoint, meaning you can point Claude Code at a local model by setting two environment variables.

  • tools
  • infrastructure

Training a coding agent end-to-end costs $200 on TPUs

Nanocode demonstrates training a 1.3B parameter Claude Code-style coding agent from scratch -- pretraining, supervised fine-tuning, and preference optimisation -- on a TPU v6e-8 for around $200 in under nine hours.

  • research
  • infrastructure

When you can ship a rebuild in a weekend, product conviction breaks

Tim O'Reilly examines Harper Reed's argument that AI-speed iteration cycles destroy the feedback loops through which product teams build conviction, and that teams need new frameworks for making decisions under permanent optionality.

  • industry
  • tools

Nvidia GPUs now officially work on Apple Silicon Macs

Tiny Corp's TinyGPU DriverKit extension, now officially signed by Apple, brings Nvidia Ampere and AMD RDNA3 eGPU support to Apple Silicon Macs without requiring a SIP bypass, making it the first time Nvidia hardware has had official support on Apple Silicon.

  • infrastructure
  • tools

LoRA has been adapting the wrong part of the weight matrix

Minor Component Adaptation targets low-variance singular subspaces rather than dominant ones, achieving up to 5.9x more knowledge acquisition than LoRA using a fraction of the parameters.

  • research
  • tools

A model can teach itself to write better code

Sampling a model's own outputs at varied temperatures and fine-tuning on them pushes pass@1 on LiveCodeBench from 42% to 55% -- no teacher model, no RL, no verifier required.

  • research
  • tools

AI is turning developers into Winchester Mystery House builders

Drew Breunig argues that cheap AI code generation is producing a third model of software development: sprawling, idiosyncratic personal tools built for the builder's own enjoyment, not for distribution.

  • industry
  • architecture

The Axios supply chain attack was a fake company, a Teams call, and a RAT

Attackers compromised the Axios npm package by impersonating a real company, scheduling a fake Teams meeting, and tricking the maintainer into installing a Remote Access Trojan.

  • infrastructure
  • tools

Gemma 4 is out, and benchmarks are the least interesting part

Nathan Lambert argues that Gemma 4's success will hinge on licensing, tooling maturity, and fine-tunability, not benchmark scores, and identifies five factors that actually determine whether an open model gets adopted.

  • tools
  • research
  • industry

New Rowhammer attack gives full control of machines running Nvidia GPUs

A Rowhammer variant exploits GPU memory access patterns to flip bits in DRAM, giving attackers complete control of machines with Nvidia GPUs in shared environments.

  • infrastructure
  • research

The toolkit pattern: writing docs for AI, not just humans

O'Reilly describes a documentation pattern where projects structure docs around intent, letting AI generate valid configuration from plain-English descriptions.

  • tools
  • architecture

Catching reward hacking by looking inside the model, not at its outputs

Researchers use representation engineering to detect when RL-trained models learn shortcuts that satisfy reward signals without solving the actual problem.

  • research

Reasoning models decide before they reason

New evidence that reasoning models encode their action choices before chain-of-thought deliberation begins, which changes how you should read CoT outputs.

  • research

When inference is cheap, you should overtrain your models

New scaling laws show that when you account for inference-time sampling, the optimal pretraining regime shifts radically toward overtraining, overturning conventional Chinchilla-style guidance.

  • research

A breach at one AI data vendor may have exposed secrets from every major AI lab simultaneously

A supply chain attack on LiteLLM compromised Mercor, a $10B AI training data contractor serving OpenAI, Anthropic, and Meta, potentially exposing training datasets and proprietary pipeline details across the industry.

  • industry
  • infrastructure

How to build an agent that fixes its own production bugs after deployment

A concrete architecture for self-healing agent deployments: detect regressions using Poisson distribution testing against a 7-day error baseline, triage with a causal link requirement, and auto-open a PR via a coding agent.
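
The regression check in that architecture can be sketched with the standard library: turn the 7-day error count into an expected count for the current window, then ask how surprising the observed count is under a Poisson distribution with that mean. The one-hour window, the 0.001 threshold, and the function names are assumptions for illustration; the triage and PR steps are not shown.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Fine for the small hourly means used here; math.exp(-lam) underflows
    to 0.0 once lam exceeds roughly 745.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_regression(errors_last_7d: int, errors_in_window: int,
                  window_hours: float = 1.0, alpha: float = 0.001) -> bool:
    """Flag a window whose error count is implausible under the 7-day baseline rate."""
    expected = errors_last_7d / (7 * 24) * window_hours
    return poisson_sf(errors_in_window, expected) < alpha


# Baseline of 336 errors over 7 days = 2 errors/hour expected.
print(is_regression(336, 3))   # prints False (P(X >= 3) is about 0.32)
print(is_regression(336, 10))  # prints True (P(X >= 10) is about 5e-5)
```

A fixed error threshold would either fire constantly on noisy services or miss spikes on quiet ones; scaling the test to each service's own baseline avoids both.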

  • tools
  • architecture

Claude now supports tool use in streaming mode

Claude's API now supports tool calls mid-stream, letting agents act without waiting for a full response.

  • tools

Open-weight models are now within striking distance of frontier APIs for agentic workloads

Benchmarking open-weight models against Claude Opus 4.6 on 138 agentic tasks shows a 4–11 percentage point gap, with open models running at 5–20x lower cost and 2–4x lower latency.

  • research
  • industry
  • tools

Gradio's backend is now separable from its UI

Gradio Server separates the queuing engine, GPU management, and MCP support from Gradio's UI system, letting you build any custom frontend while keeping the backend infrastructure that makes GPU serving production-ready.

  • tools
  • infrastructure

DuckDB gets native vector similarity search

DuckDB now has native vector similarity search, which means you can do RAG-style retrieval in a single embedded database.

  • tools
  • research

Holo3 hits 78% on desktop computer use and the training method is more interesting than the score

H Company's Holo3-35B hits 78.85% on OSWorld-Verified, new state-of-the-art for desktop computer use, using a synthetic training flywheel that generates novel environments rather than relying on collected demonstrations.

  • research
  • tools

AWS just turned penetration testing into an on-demand API call

AWS Security Agent and DevOps Agent hit general availability, compressing penetration testing timelines from weeks to hours and incident resolution from two hours to 28 minutes in early customer results.

  • tools
  • infrastructure
  • industry

The most-used post-training library just hit v1.0, and the design choices are worth understanding

TRL, the post-training library downloaded 3 million times a month, hits v1.0 with a deliberate stability model and a design philosophy built around the short half-life of post-training assumptions.

  • tools
  • research