Knowing the specific matrix multiplies, QKV, and how attention works doesn't develop your intuition for how LLMs behave. Knowing that the effective output is a list of tokens with associated probabilities is of marginal use. Knowing about rotary position embeddings, temperature, batching, beam search, different techniques for preventing repetition and so on doesn't really develop intuition about behavior either. Those things mostly improve the worst cases (babbling, repeating nonsense at the absolute worst), and you wouldn't know even that from first principles without playing with the things.
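For anyone who hasn't seen these knobs, here's a rough sketch of two of them, temperature and a repetition penalty, applied to raw next-token scores. The function names, the penalty value, and the divide-or-multiply penalty scheme are my own assumptions for illustration, not any particular library's API.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def penalize_repeats(logits, seen_token_ids, penalty=1.3):
    """Push down the scores of tokens that were already generated (assumed scheme)."""
    out = list(logits)
    for t in set(seen_token_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, 1.0, 0.5]  # made-up scores for a three-token vocabulary
cold = softmax_with_temperature(logits, temperature=0.5)
hot = softmax_with_temperature(logits, temperature=2.0)
penalized = penalize_repeats(logits, seen_token_ids=[0])
```

Here `cold` piles more probability on the top token than `hot` does, and `penalized` lowers only the score of token 0. Neither tells you anything about what a given model will actually say, which is kind of the point.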
The truth is that the inference implementation is more like a VM, and the interesting thing is the model, the set of learned weights. It's like a program being executed one token at a time. How that program behaves is the interesting thing. How it degrades. What circumstances it behaves really well in, and its failure modes. That's the thing where you want to be able to switch and swap a dozen models around and get a feel for things, have forking conversations, etc. It's what LM Studio is decent at.
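To make the VM analogy concrete, here's a minimal toy sketch. The `generate` loop stands in for the inference implementation, and the two "models" are hypothetical stand-ins for learned weights (real ones are neural networks, not hand-written functions). The loop never changes; only the thing it executes does.

```python
import random

def generate(model, prompt_tokens, max_new_tokens, rng):
    """The 'VM': repeatedly ask the model for a next-token distribution and sample from it."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)  # dict mapping token -> probability
        choices = list(probs.keys())
        weights = list(probs.values())
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return tokens

def parrot_model(tokens):
    """Toy 'weights' #1: always repeats the last token (the babbling failure mode)."""
    return {tokens[-1]: 1.0}

def alternating_model(tokens):
    """Toy 'weights' #2: flips between 'a' and 'b'."""
    return {"b": 1.0} if tokens[-1] == "a" else {"a": 1.0}

rng = random.Random(0)
out_parrot = generate(parrot_model, ["a"], 4, rng)
out_alt = generate(alternating_model, ["a"], 4, rng)
```

Same loop, same prompt, completely different behavior depending on which "program" you load. Swapping models in LM Studio is the same move at a much larger scale.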
But those things are all so cool. Like... how could you not want to learn about them?
Seriously though, I guess I'm just kind of uncomfortable with "treating inference implementation like a VM" as you put it. It seems like a bad idea. We are turning implementation details into user interfaces in a space that is undergoing such rapid and extreme change. People spent a lot of time learning the Stable Diffusion web UI, and then Flux came out and upended the whole space. But maybe foundational knowledge isn't as valuable as I'm thinking and it's fine that people just re-learn whatever UIs emerge, I don't know.