ilaksh's comments

Remember when everyone was predicting that GPT-5 would take over the planet?

It was truly scary, according to Sam...

iTs lITeRaLlY AGI bro

Does anyone have working code for fine-tuning PersonaPlex for outgoing calls? I have tried to take the LoRA fine-tuning stuff from Kyutai's moshi-finetune and apply it to the PersonaPlex code. Or, more accurately, various LLMs have worked on that.

I have something that seems to work in a rough way, but only if I turn the LoRA scaling factor up to 5, and that generally screws it up in other ways.
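For context on why cranking that knob is risky: in standard LoRA the scaling factor multiplies the low-rank update before it is added to the frozen base weights, so a 5x scale amplifies everything the adapter learned, artifacts included. A minimal numpy sketch (shapes and names are mine, not from moshi-finetune):

```python
import numpy as np

def merge_lora(W, A, B, alpha, r):
    """Merge a LoRA adapter into a frozen base weight matrix.

    W: (out, in) base weights; A: (r, in) and B: (out, r) are the
    low-rank factors. The update B @ A is scaled by alpha / r, so a
    larger scaling factor amplifies the adapter's effect everywhere.
    """
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
A = rng.normal(size=(2, 4))
B = rng.normal(size=(4, 2))

delta_1x = merge_lora(W, A, B, alpha=2, r=2) - W
delta_5x = merge_lora(W, A, B, alpha=10, r=2) - W
# a 5x scaling factor scales the entire update linearly
assert np.allclose(delta_5x, 5 * delta_1x)
```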

And then of course when GPT-5.3 Codex looked at it, it said that speaker A and speaker B were switched in the LoRA code. So that is now completely changed and I am going to do another dataset generation and training run.

If anyone is curious, it's a bit of a mess, but it's on my GitHub under runvnc/moshi-finetune and runvnc/personaplex. It even has a Gradio app to generate data and train. But so far no usable results.


There are OpenAI gpt-realtime and Gemini Flash or whatever, which are great, but they do not seem to reach quite the same level of overlapping, realistic full duplex as moshi/personaplex.

tavus.io

Hmm. Would this let me replace my own face in a live videoconferencing session? It seems like it's more of a video chatbot than a v-tuber style overlay.

I had no idea that was what you were asking for. Search for "Zoom face filter", "OBS face filter", "OBS deepfake live", etc.

Different type of model but you can buy those on Amazon etc.

For my framework, since I am using it for outgoing calls, what I am thinking is that I will add a tool command call_full_duplex(number, persona_name). It will get PersonaPlex warmed up and connected, pause the streams, then connect the SIP leg, attach the audio I/O streams to the call, and return to the agent. Then I'll send the Deepgram and PersonaPlex text in as messages during the conversation and tell the agent to call a hangup() command when PersonaPlex says goodbye or gets off track, and otherwise just wait(). It could also use speak() commands to take over with TTS if necessary, maybe with a shutup() command first. It needs a very fast and smart model for the agent monitoring the call.
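A rough sketch of that control flow. Every helper name here is a hypothetical placeholder for whatever the framework's SIP/PersonaPlex plumbing ends up being; the only concrete part is the monitor decision, which maps each transcript message to wait() or hangup():

```python
def monitor_action(transcript_line: str, on_track: bool = True) -> str:
    """Decide what the supervising agent does after each transcript
    message relayed from Deepgram / PersonaPlex: keep waiting, or
    hang up when the persona says goodbye or has drifted off track."""
    if not on_track or "goodbye" in transcript_line.lower():
        return "hangup"
    return "wait"

async def call_full_duplex(number: str, persona_name: str):
    # All awaited helpers below are hypothetical placeholders, not a
    # real API: they stand in for the SIP and PersonaPlex plumbing.
    session = await warm_up_personaplex(persona_name)  # load + connect model
    session.pause_streams()                            # hold audio until the call is live
    call = await sip_connect(number)                   # dial out over SIP
    attach_audio(session, call)                        # wire model I/O into the call
    session.resume_streams()
    async for line in transcript_stream(session, call):
        if monitor_action(line) == "hangup":
            await call.hangup()
            break
```

The point of keeping monitor_action as a plain synchronous function is that the fast "smart model" can later replace the keyword check without touching the call plumbing.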

Does anyone know when the small Qwen 3.5 models are going to be on OpenRouter?


Like 4B, 2B, 9B. Supposedly they are surprisingly smart.

Yep. The 9B has excellent image recognition. I showed it a PCB photo and it correctly identified all components and the board type from part numbers and shape. OCR quality was solid. Tool calling with opencode worked without issues, but general coding ability is still far from Sonnet-tier. I asked it to add a feature to an existing React app; it couldn't produce an error-free build and fell into a delete-redo loop. Even when I fixed the errors, the UI looked really bad. A more explicit prompt probably would have helped. Opus one-shotted it with the same prompt, and the component looked exactly as expected.

But I'll be running this locally for note summarization, code review, and OCR. Very coherent for its size.


> Very coherent for its size.

I found them to be less than stellar at writing coherent prose. Qwen 3.5 9B was worse in my tests than Gemma 3 4B.


There are smaller ones on HuggingFace https://huggingface.co/models?other=qwen3_5&sort=least_param... with 0.8B, 2B, 4B and 9B parameters.

Only slightly related, but six years ago I was able to run 400 ZX Spectrum (Z80) emulator instances simultaneously on an AWS graphics workstation.

https://youtu.be/BjeVzEQW4C8?si=0I7UGU0Xz5WUT4ek


I remember that. Neat stuff.

Just to mention, I have a similar solution on GitHub under my username runvnc: repo mindroot, with plugins from repos mr_sip (should work with any SIP vendor, although only tested on Telnyx), mr_eleven_stream or mr_pocket-tts (which is free since it runs on CPU), and an LLM plugin like ah_openrouter, ah_anthropic, or mr_gemini.

I also have a setting in mr_sip to use gpt-realtime via the plugin ah_openai, which is very low-latency speech-to-speech but quite expensive.

But my client saw the Sesame demo page, and so now I am trying to fine tune PersonaPlex.


It just about works for our current use case but can't comprehend the concept of an outgoing call. So I am trying to fine-tune it. The tricky thing is that PersonaPlex forked some of the Kyutai code and has not integrated the LoRA stuff they added. So we tried to update personaplex with the fine-tuning stuff. Going to find out tonight or tomorrow whether it's actually feasible when I finish debugging/testing.
