You addressed storage but skipped the CPU portion: Yes, Opus can create good quality audio with low storage requirements, but it cannot do so without a high CPU cost.
ARM CPUs can be rather performant, especially for specialist tasks. These devices are always plugged in and don't need to save battery, so they can always run at full performance.
The HomePod, for example, uses the Apple A8, a very capable chip that also powered the iPhone 6, which did far more than encode/decode audio.
Wouldn't heat generation become an issue? As far as I know, none of these assistants have fans. I think the average consumer would notice that their device starts giving off a lot of heat if there's a lot of speech in range of its microphone: "I wonder why my Echo turns into a space heater when I leave the TV on."
> I think the average consumer would notice that their device starts giving off a lot of heat if there's a lot of speech in range of its microphone.
Interesting point. My guess is that not many people would notice, since a HomePod/Alexa/Google Home usually sits in a corner of a room or under the TV and isn't touched regularly; most of the time you don't need to touch it to control it.
I'm not even sure it would produce that much heat. My x86 laptop can play video for a very long time before getting noticeably hot (granted, it has a fan), and these ARM CPUs run noticeably cooler than the average Intel chip, even without a fan.
True, but it's not expensive either, especially considering it's on sale all the time.
Even in the cheaper devices, the CPU is probably capable enough (except maybe the cheapest Echo Dot/Nest models).
The Echo Show devices even have a screen and are designed to play video from all kinds of sources (decoding) and to handle video calls (encoding), and they're £60 right now on Amazon UK.
Look further up the thread: nobody has observed the network traffic that transporting such data would require. I don't think anyone disputes that these companies could build a device that streamed everything home. The point is that doing so would have an observable effect on the device's network usage, and no such effect has been observed.
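For a rough sense of scale, here's a back-of-envelope sketch of how much upload traffic continuously streamed speech would add up to. The bitrates and hours of speech per day are assumptions picked for illustration, not measurements of any real device (Opus itself supports bitrates from 6 kbit/s up to 510 kbit/s), and the function name `daily_bytes` is just a hypothetical helper.

```python
# Back-of-envelope estimate of upload volume for streamed audio.
# ASSUMPTIONS: constant bitrate (Opus can also run at a variable bitrate,
# so real traffic would fluctuate) and no packet/transport overhead.

def daily_bytes(bitrate_bps: int, hours_per_day: float) -> int:
    """Bytes uploaded per day at a given audio bitrate (bits per second)."""
    seconds = hours_per_day * 3600
    return int(bitrate_bps * seconds / 8)  # 8 bits per byte


if __name__ == "__main__":
    # Hypothetical scenarios: (bitrate in bits/s, hours of speech per day).
    scenarios = [(16_000, 4), (16_000, 24), (32_000, 24)]
    for bitrate, hours in scenarios:
        mb_per_day = daily_bytes(bitrate, hours) / 1_000_000
        gb_per_month = mb_per_day * 30 / 1000
        print(f"{bitrate // 1000} kbit/s for {hours} h/day: "
              f"{mb_per_day:.1f} MB/day, {gb_per_month:.2f} GB/month")
```

For example, 16 kbit/s for 4 hours a day works out to 28.8 MB per day per device (before any overhead), so even the modest scenarios produce a steady upload stream rather than a negligible trickle.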