

That looks pretty good. Looks like Portainer is getting replaced this weekend.


Do you actually train the LLM or use RAG? I have been looking for a local LLM + Wikipedia RAG solution for a while now.
For now I just have kiwix-serve + searxng doing a simple search, but the Kiwix search is…questionable.
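If it helps, the glue I'd try is pretty small: kiwix-serve as the retriever and llama.cpp's OpenAI-compatible server as the generator. A rough sketch (ports and the ZIM name are placeholders, and the /search parameters are from memory, so check them against your kiwix-serve version):

```python
import re
import requests

KIWIX = "http://localhost:8080"                     # kiwix-serve
LLM = "http://localhost:8081/v1/chat/completions"   # llama-server

def retrieve(query: str) -> str:
    # kiwix-serve's full-text search returns an HTML results page with
    # a short snippet per hit; crudely strip the tags and use that text.
    r = requests.get(f"{KIWIX}/search",
                     params={"books.name": "wikipedia", "pattern": query})
    r.raise_for_status()
    text = re.sub(r"<[^>]+>", " ", r.text)
    return re.sub(r"\s+", " ", text)[:4000]  # keep the prompt small

def answer(query: str) -> str:
    context = retrieve(query)
    r = requests.post(LLM, json={
        "model": "local",
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "Say so if the context is insufficient."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    })
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

print(answer("Who designed the Saturn V?"))
```

Proper RAG would fetch and chunk the actual articles instead of scraping the results page, but even this beats plain keyword search for question-style queries.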


I wrote an application which runs on my server and monitors my favorites on Tidal/Deezer/Qobuz. It downloads them in bulk whenever I have a premium account with one of them. Usually I purchase a month of premium every few months, at which point I get nice clean FLACs for local use.
The FLACs are moved to Jellyfin and I stream them using Finamp, which also supports transcoding, so I keep 128 kbps Opus files for offline playback and stream the raw FLAC files when bandwidth is no concern.
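For anyone wanting to replicate the offline copies, the FLAC to Opus step is the easy part. A minimal sketch, assuming ffmpeg with libopus on PATH and placeholder paths:

```python
import subprocess
from pathlib import Path

SRC = Path("/srv/music/flac")   # hypothetical library root
DST = Path("/srv/music/opus")   # mirror tree with Opus files

for flac in SRC.rglob("*.flac"):
    out = (DST / flac.relative_to(SRC)).with_suffix(".opus")
    if out.exists():
        continue  # already transcoded on a previous run
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-n", "-i", str(flac),        # -n: never overwrite
         "-c:a", "libopus", "-b:a", "128k", str(out)],
        check=True,
    )
```

Jellyfin/Finamp can also transcode on the fly; pre-transcoding like this just saves battery and bandwidth on the phone.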
I have amassed a huge music library over the last few decades, so even if every streaming service goes under tomorrow, I have enough music locally to last me a lifetime.


I miss when tech updates were good news.
Open source has your back, though.
Still excited every time I get a new KDE version.


Same, 8 heads is way overkill for my simple 0.4/0.6/0.8 mm nozzle-swap use case, but I wanted to build something. :)
Fedora 43 with the Rawhide kernel.
gpt-oss is pretty much unusable without a custom system prompt.
Sycophancy turned up to 11, bullet points everywhere, and you get a summary of the summary of the summary.
Of course, self-hosted with llama-swap and llama.cpp. :)
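The fix really is just a strict system prompt. A minimal sketch against llama-server's OpenAI-compatible endpoint (port, model name, and prompt wording are placeholders for my setup):

```python
import requests

SYSTEM = ("Be terse and direct. No flattery. No bullet points unless "
          "explicitly asked. Do not summarize your own answer.")

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        # llama-swap routes the request to the right llama.cpp
        # instance based on the model name.
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Explain io_uring in two sentences."},
        ],
    },
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```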
I have a Strix Halo machine with 128 GB of VRAM, so I’m definitely going to give this a try with gpt-oss-120b this weekend.
We need a fourth one for “User error”.
As a side note, Qwen3.6-27B is much more capable than Qwen3.6-35B, even though it is much slower.
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
For coding tasks where you don’t mind waiting, you should just barely be able to squeeze the 8-bit quant into 32 GB RAM + 8 GB VRAM and have a pretty competent local model. 4-bit quants work, but they have issues with complex tool calls.
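To put a number on “barely squeeze”: a back-of-envelope fit check, assuming llama.cpp’s Q8_0 at roughly 8.5 bits per weight and ignoring the KV cache (which grows with context):

```python
params = 27e9                             # Qwen3.6-27B parameter count
weights_gib = params * 8.5 / 8 / 2**30    # Q8_0: ~8.5 bits per weight
budget_gib = 32 + 8                       # system RAM + VRAM, roughly
print(f"{weights_gib:.1f} GiB of weights vs {budget_gib} GiB total")
# ~26.7 GiB of weights leaves ~13 GiB for the OS, context, and KV cache.
```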
If you use the MTP branch of llama.cpp (and a suitable model), you can even double or triple your token generation speed: https://github.com/ggml-org/llama.cpp/pull/22673
For easier tasks, disable reasoning for instant responses.
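By disabling reasoning I mean the Qwen3-style soft switch, which I’m assuming Qwen3.6 keeps: append /no_think to the user turn and the model answers directly (endpoint and model name are placeholders):

```python
import requests

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen3.6-27B",
        "messages": [
            # Trailing /no_think skips the thinking phase entirely.
            {"role": "user", "content": "Convert 0x2F to decimal. /no_think"},
        ],
    },
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```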