It’s more complex than that. The weights of big models are distributed, and then tokens are processed in parallel for multiple users. The setup varies, but it could be 8 GPUs serving many dozens of users at once, or bigger sets with even more parallelism.
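To make that concrete, here is a toy sketch of the idea, not how any real serving stack is written. Plain Python lists stand in for GPU tensors, and the names (`shard_columns`, `sharded_forward`, the two "devices") are made up for illustration. The weight matrix is split column-wise across devices, and every device processes the whole batch of users' tokens against its own slice.

```python
# Toy illustration only: plain Python lists stand in for GPU tensors,
# and "devices" are just entries in a list (hypothetical setup).

def matmul(x, w):
    """Multiply a (batch x d_in) matrix by a (d_in x d_out) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def shard_columns(w, n_devices):
    """Split the weight matrix column-wise, one slice per device."""
    d_out = len(w[0])
    step = d_out // n_devices  # assumes d_out divides evenly by n_devices
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(n_devices)]

def sharded_forward(batch, shards):
    """Each device multiplies the whole batch by its slice; slices are then concatenated."""
    partials = [matmul(batch, shard) for shard in shards]
    return [sum((p[i] for p in partials), []) for i in range(len(batch))]

weights = [[1, 2, 3, 4], [5, 6, 7, 8]]   # d_in=2, d_out=4
batch = [[1, 0], [0, 1], [1, 1]]         # one token vector per user, 3 users
shards = shard_columns(weights, n_devices=2)
out = sharded_forward(batch, shards)

# The sharded result matches the single-device result.
assert out == matmul(batch, weights)
```

The point is that the weights get read once per step no matter how many users are in the batch, which is why batching many users onto the same set of GPUs is so much cheaper per user than serving them one at a time.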
I think the bigger problem is that Copilot is… shit.
It’s probably some ancient, inefficient architecture, not something super sparse and hardware-efficient like (say) DeepSeek V4, or Kimi 2.6, or Gemini Pro.
And literally every interesting team Microsoft has ever had or acquired (Phi, WizardLM, many more), and any interesting innovation they figured out, has just disappeared into a black hole.
They don’t have custom hardware, either, like Huawei NPUs, Cerebras WSEs, or Google TPUs. They’ve written some very interesting papers on that, and proceeded to do squat with them.
Also, it is AWFUL for its size. Tiny models that are basically free run circles around Copilot.
What I’m getting at is that Copilot is probably the most inefficient LLM out there. Like, it’s impressive how bad it is.
Umm, Copilot is just a link to other models. My work VS instance defaults to Claude, but there are several others available. “Copilot” itself is not its own model.
Really? I don’t use it for work, but I could have sworn I was hitting some internal MS model for chat/code, as it was one of the worst experiences I’ve had with LLMs over 24B parameters.