App Swap

GitHub Copilot → Tabby

Copilot moved to usage-based billing on June 1, 2026; Tabby runs comparable autocomplete on your own GPU with no subscription fee.

Price model: Free
Open source: Yes

On June 1, 2026, GitHub Copilot’s flat monthly plans gave way to usage-based billing, and developer forums filled with screenshots of bills that had jumped from $39 to several hundred dollars in a single cycle. Heavy users of agent mode and repo-wide features were hit hardest. If your day is mostly tab-completion rather than autonomous agents, Tabby delivers a similar inline-suggestion experience on a machine you already own, with no meter running in the background.

For the core loop (ghost-text completions, fill-in-the-middle, and a chat panel for quick questions), Tabby covers what most engineers actually use Copilot for all day. It ships as a self-contained server with editor extensions for VS Code, JetBrains, Neovim and more, and it lets you pick the model behind it: StarCoder, CodeLlama, DeepSeek-Coder, or your own fine-tune. Completions are informed by context from your repository, so they stay relevant to the codebase in front of you.
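Fill-in-the-middle is worth a closer look, because it is what lets a completion model see the code after your cursor as well as the code before it. The Python sketch below wraps both sides in sentinel tokens and then splices the generated text back in at the cursor. It assumes StarCoder-style `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` tokens; CodeLlama and DeepSeek-Coder use their own markers, and a server such as Tabby normally applies the matching template itself, so read this as an illustration of the idea rather than Tabby's internals. The function names are hypothetical.

```python
# Minimal sketch of fill-in-the-middle (FIM) prompting.
# Assumption: StarCoder-style sentinel tokens. Other model families use
# different markers, so check the model card before reusing this template.
STARCODER_FIM_TEMPLATE = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str,
                     template: str = STARCODER_FIM_TEMPLATE) -> str:
    """Wrap the code before and after the cursor in FIM sentinel tokens.

    Only the template is parsed by str.format, so braces inside the
    user's code are passed through untouched.
    """
    return template.format(prefix=prefix, suffix=suffix)


def splice_completion(prefix: str, middle: str, suffix: str) -> str:
    """Insert the model's generated 'middle' at the cursor position."""
    return prefix + middle + suffix


if __name__ == "__main__":
    before = "def add(a, b):\n    return "
    after = "\n\nprint(add(2, 3))\n"
    print(build_fim_prompt(before, after))
    # Pretend the model generated "a + b" for the hole at the cursor.
    print(splice_completion(before, "a + b", after))
```

The model only ever generates the middle; the editor keeps the prefix and suffix it already has, which is why FIM suggestions can close a line cleanly against the code that follows it.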

The honest trade-offs are hardware and the frontier gap. Tabby wants a consumer GPU to keep latency low; on CPU alone it is sluggish. It also does not try to match Copilot’s newest agentic tier, the multi-file, plan-and-execute mode that can refactor across a whole repository. And you inherit the operational work: updates, uptime, and model upgrades are yours now. In exchange, your source never leaves the building, which matters for regulated or air-gapped teams, and the software bill is fixed at zero.
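How much GPU is enough depends mostly on the size of the model you pick. A common back-of-the-envelope rule is parameters times bytes per parameter (2 bytes at fp16, 1 at 8-bit, about 0.5 at 4-bit), which puts a 7B model near 13 GiB of weights at fp16 and a 1B model under 2 GiB. The Python sketch below encodes that rule. It counts weights only, ignoring the KV cache and runtime overhead, and the 80% headroom factor and the precision labels are illustrative assumptions, not figures published by Tabby.

```python
# Rough, weights-only VRAM estimate: parameters x bytes per parameter.
# Ignores KV cache, activations and runtime overhead, so leave headroom.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}  # illustrative labels


def weight_gib(params_billion: float, precision: str = "fp16") -> float:
    """Approximate memory needed for the model weights, in GiB."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return total_bytes / 2**30


def fits(params_billion: float, vram_gib: float,
         precision: str = "fp16", headroom: float = 0.8) -> bool:
    """True if the weights use at most `headroom` of the card's VRAM.

    The 0.8 default is an assumed safety margin, not a measured value.
    """
    return weight_gib(params_billion, precision) <= vram_gib * headroom


if __name__ == "__main__":
    for size, prec in [(1.0, "fp16"), (7.0, "fp16"), (7.0, "q4")]:
        print(f"{size}B @ {prec}: {weight_gib(size, prec):.2f} GiB, "
              f"fits 8 GiB card: {fits(size, 8.0, prec)}")
```

On an 8 GiB card, `fits(7.0, 8.0)` is False at fp16 while `fits(7.0, 8.0, "q4")` is True, which illustrates why smaller cards push you toward a smaller or quantized model.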

Migration takes about half a day for an individual and a day or two for a team. Spin up the Tabby server with Docker, point it at a model that fits your GPU’s VRAM, install the editor extension, and disable Copilot so the two extensions are not both offering suggestions. The muscle memory carries straight over: the keybindings and accept-suggestion flow feel nearly identical. Budget extra time only if you want to run Tabby on a shared server so a whole team hits one endpoint rather than each person running it locally.
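In the shared-server setup, every editor extension simply talks HTTP to the one endpoint. The Python sketch below builds the kind of completion request a client would send, using only the standard library. It assumes Tabby's `/v1/completions` route, a JSON body with `language` and `segments.prefix`/`segments.suffix`, bearer-token auth, and a `choices[0].text` field in the response; the host `tabby.internal:8080` and the token are hypothetical placeholders. Check the API documentation for your Tabby version before relying on any of these details.

```python
import json
import urllib.request

# Hypothetical shared endpoint and token; replace with your own deployment.
TABBY_URL = "http://tabby.internal:8080"
TABBY_TOKEN = "replace-me"


def build_completion_request(prefix: str, suffix: str, language: str,
                             base_url: str = TABBY_URL,
                             token: str = TABBY_TOKEN) -> urllib.request.Request:
    """Build (but do not send) a completion request for a Tabby server.

    Assumed body shape: {"language": ..., "segments": {"prefix", "suffix"}}.
    """
    body = {
        "language": language,
        "segments": {"prefix": prefix, "suffix": suffix},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def complete(prefix: str, suffix: str, language: str = "python") -> str:
    """Send the request and return the first suggestion's text, or ''."""
    req = build_completion_request(prefix, suffix, language)
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"choices": [{"index": 0, "text": ...}, ...]}
    choices = payload.get("choices", [])
    return choices[0]["text"] if choices else ""
```

With a server running, `complete("def fib(n):\n    ", "\n")` should return the first suggestion as a string. Keeping request construction separate from sending means you can inspect or unit-test the payload without a live endpoint.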