Unfortunately, this looks to cover only the larger MoE models. I imagine the smaller models are what most people would target. The 9B just dropped two days ago, so I'm not surprised it's not explicitly documented, but it does use a hybrid Mamba architecture that I expect needs some special consideration.