Knowing the specific multiplies and QKV and how attention works doesn't develop ...

BaculumMeumEst · on Aug 25, 2024

But those things are all so cool, though. Like... how could you not want to learn about them?

Seriously though, I guess I'm just kind of uncomfortable with "treating inference implementation like a VM" as you put it. It seems like a bad idea. We are turning implementation details into user interfaces in a space that is undergoing such rapid and extreme change. Like, people spent a lot of time learning the Stable Diffusion web UI, and then Flux came out and upended the whole space. But maybe foundational knowledge isn't as valuable as I'm thinking and it's fine that people just re-learn whatever UIs emerge, I don't know.