Two announcements in one week

On March 25, GitHub updated its privacy statement. From April 24, interaction data from Copilot Free, Pro, and Pro+ accounts will train AI models by default. That means the prompts you send, the code snippets Copilot sees, the suggestions you accept or modify, your file names, and your repository structure. The opt-out exists but requires finding it: go to /settings/copilot/features and, under Privacy, disable “Allow GitHub to use my data for AI model training.” Copilot Business and Copilot Enterprise users are explicitly exempt. Students and teachers are also exempt.

The same week, a developer published an analysis of the official Vercel MCP plugin for Claude Code. By default, with no consent required, the plugin sends full bash command strings, file paths, project names, OS version, and a persistent device UUID to telemetry.vercel.com. That UUID links every session together across projects. The consent language describes this as “anonymous usage data such as skill injection patterns.” There is also an opt-in tier — explicit permission required — for your complete prompt text. The collection fires regardless of whether the project has any connection to Vercel.

These are two different companies, two different products, two different mechanisms. But they hit the same population in the same fortnight, and the exemption structure in each case points at the same underlying logic.

The tier split is the business model

GitHub’s exemption is unambiguous: Business and Enterprise customers are out. Those tiers cost $19–$39 per user per month under annual contracts, often negotiated with procurement and legal review. The individual and team tiers — Free, Pro, Pro+ — are opted in. The price of cheaper access to the tool is contribution to the training pipeline for the next version of it.

This is not a coincidence or an oversight. It is a coherent pricing model. Frontier model training is expensive, and the data used to train those models has direct dollar value. Enterprise customers pay enough that the data trade is not needed. Everyone else funds the data flywheel that will make the tool better for enterprise customers next year. Calling it a “privacy setting” is technically accurate and functionally misleading — it is a tier-based data extraction arrangement with an opt-out buried in settings.

The Vercel plugin operates on similar logic. The plugin is free to install, extends a tool (Claude Code) that developers are already paying for, and captures data that a distribution platform would find extremely valuable: what commands developers run, what projects they work on, what problems they are solving. The persistent device UUID means Vercel can build a cross-session picture of individual developer behaviour, even though the collection is described as anonymous.

What the data actually is

Developer tooling is a qualitatively richer data source than typical product telemetry. A SaaS analytics tool might tell a vendor which features you use and how often. An AI coding tool sees how you think before you’ve finished thinking.

GitHub’s collected data includes: prompts you send to Copilot (your problem statements, debugging questions, architectural queries), code snippets the model receives as context (your codebase, not just your query), outputs you accept or modify (indicating what you found useful), and file names and documentation that reveal the domain and structure of what you’re building. This is training data for a future model that will be better at the exact kind of work you do, because it was trained on how you do it.

Vercel’s bash commands contain more than people typically realise: file paths that reveal directory structure and naming conventions, environment variable names that indicate which services a project connects to, infrastructure commands that describe deployment architecture, and occasionally secrets that end up in command arguments. The collection fires on every project, not just Vercel projects. What Vercel gets is a map of a developer’s full workflow, not just their use of Vercel’s services.

GitHub’s approach is the more honest of the two. The policy change is documented in a public blog post, the opt-out is functional and clearly described, and the prior preference (if you had previously opted out) is preserved. The problem is structural: the default is collection, the deadline is two weeks away for an announcement buried in a privacy statement update, and the language does not make clear that the exemption for Business and Enterprise is a commercial arrangement rather than a privacy protection.

The Vercel plugin approach is harder to defend. Describing bash command collection — with a persistent cross-session device identifier — as “anonymous usage data such as skill injection patterns” is not accurate. The data is not anonymous when it is linked to a device UUID. The term “skill injection patterns” does not describe commands like git push origin main --force or psql -U admin production_db. The consent form for this is a Claude prompt injection that instructs the AI to ask questions on Vercel’s behalf, with no visual indicator that a third-party plugin is operating.

What both have in common is collection as the default, which is where the practical leverage lies. The majority of users will never change a default. That is why the default matters.

Why individuals and small teams are the target

Enterprise procurement involves legal review of data handling terms. A company with 500 Copilot seats will have someone whose job includes reading the data use clauses in vendor contracts. The Business and Enterprise tiers offer contractual protections because the vendor cannot risk losing the contract over a data clause.

Individual developers and small teams are in a different negotiating position. There is no procurement review, no legal sign-off, no ability to negotiate terms. The vendor can set defaults that benefit the vendor, and the friction of opting out means most users never will.

That framing assumes the user at least knows the change happened, which is a generous assumption.

GitHub put a banner in their UI. That is worth acknowledging. It is not the norm. The standard playbook is a privacy policy update email that filters into promotions and gets bulk-deleted, a changelog entry that reaches only the subset of users who actively follow it, or a terms of service revision where continued use constitutes acceptance. The legal disclosure requirement is satisfied; the practical visibility to the affected user is close to zero. The burden of staying informed sits entirely with the individual developer, across every tool in their stack, indefinitely. Most will find out when someone else surfaces it — as happened here, via a developer publishing their findings and the story reaching Hacker News two weeks before the deadline.

This is not unique to developer tools. It is the same structure as social media advertising and consumer health app data collection. The difference is the sensitivity of what developer tools collect, and the fact that the developers building on these tools often do not think of themselves as the data source being monetised.

The Vercel and GitHub announcements landing in the same week reflects an industry that has identified the same opportunity simultaneously. The tools most widely adopted by individual developers and small teams — the free-tier AI assistants, the coding plugins, the developer workflow tools — sit in a data collection sweet spot: high contextual richness, low negotiating leverage, large number of users, and a population that tends to trust developer-friendly brands.

What to do this week

For GitHub Copilot: go to /settings/copilot/features. Under Privacy, disable “Allow GitHub to use my data for AI model training.” Do this before April 24. If you previously opted out of data collection for product improvement purposes, GitHub says that preference has been preserved and you do not need to act again. But it is worth verifying.

For the Vercel Claude Code plugin: set export VERCEL_PLUGIN_TELEMETRY=off in your shell profile, or remove the plugin entirely. Removing the plugin does not affect Claude Code’s core functionality.
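A minimal sketch of the opt-out, assuming VERCEL_PLUGIN_TELEMETRY is the variable the plugin checks (as reported in the analysis) and that your shell reads ~/.bashrc — adjust the profile file (~/.zshrc, ~/.profile) to match your setup:

```shell
# Disable telemetry in the current session.
export VERCEL_PLUGIN_TELEMETRY=off

# Persist the setting for future shells, without adding duplicate lines.
# Assumes a bash profile at ~/.bashrc; use your shell's actual startup file.
grep -qx 'export VERCEL_PLUGIN_TELEMETRY=off' ~/.bashrc 2>/dev/null \
  || echo 'export VERCEL_PLUGIN_TELEMETRY=off' >> ~/.bashrc
```

The grep guard keeps the profile idempotent if you run the snippet more than once. Environment-variable opt-outs only apply to shells that inherit the setting, which is why both the immediate export and the profile entry are needed.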

For everything else in your workflow: any free or low-cost AI tool has an economic incentive to collect training data. The Copilot tier model is a template that other tools will follow. The practical audit question is: what data does this tool see during normal operation, and does the pricing tier suggest the product economics require a data contribution from users who pay below enterprise rates? If the answer is yes, look for the opt-out before assuming one exists.

The trade is real — better models do require better training data, and using AI tools means your work is part of what makes them improve. The problem is not the trade. The problem is defaults set to maximise capture, consent language that obscures what is being taken, and a tier structure that makes privacy a premium feature rather than a baseline right.

Sources: