

That looks pretty good. Looks like Portainer is getting replaced this weekend.


Do you actually train the LLM or use RAG? I have been looking for a local LLM + Wikipedia RAG solution for a while now.
For now I just have kiwix-serve + searxng doing a simple search, but the Kiwix search is…questionable.
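If it helps, the glue I'd try is pretty small: kiwix-serve as the retriever and llama.cpp's OpenAI-compatible server as the generator. A rough sketch (ports and the ZIM name are placeholders, and the /search parameters are from memory, so check them against your kiwix-serve version):

```python
import re
import requests

KIWIX = "http://localhost:8080"                     # kiwix-serve
LLM = "http://localhost:8081/v1/chat/completions"   # llama-server

def retrieve(query: str) -> str:
    # kiwix-serve's full-text search returns an HTML results page with
    # a short snippet per hit; crudely strip the tags and use that text.
    r = requests.get(f"{KIWIX}/search",
                     params={"books.name": "wikipedia", "pattern": query})
    r.raise_for_status()
    text = re.sub(r"<[^>]+>", " ", r.text)
    return re.sub(r"\s+", " ", text)[:4000]  # keep the prompt small

def answer(query: str) -> str:
    context = retrieve(query)
    r = requests.post(LLM, json={
        "model": "local",
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "Say so if the context is insufficient."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    })
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

print(answer("Who designed the Saturn V?"))
```

Proper RAG would fetch and chunk the actual articles instead of scraping the results page, but even this beats plain keyword search for question-style queries.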


I wrote an application which runs on my server and monitors my favorites on Tidal/Deezer/Qobuz. It downloads them in bulk whenever I have a premium account with one of them. Usually I purchase a month of premium every few months, at which point I get nice clean FLACs for local use.
The FLACs are moved to Jellyfin and I stream them using Finamp, which also supports transcoding, so I keep 128 kbps Opus files for offline playback and stream the raw FLAC files when bandwidth is no concern.
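For anyone wanting to replicate the offline copies, the FLAC to Opus step is the easy part. A minimal sketch, assuming ffmpeg with libopus on PATH and placeholder paths:

```python
import subprocess
from pathlib import Path

SRC = Path("/srv/music/flac")   # hypothetical library root
DST = Path("/srv/music/opus")   # mirror tree with Opus files

for flac in SRC.rglob("*.flac"):
    out = (DST / flac.relative_to(SRC)).with_suffix(".opus")
    if out.exists():
        continue  # already transcoded on a previous run
    out.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-n", "-i", str(flac),        # -n: never overwrite
         "-c:a", "libopus", "-b:a", "128k", str(out)],
        check=True,
    )
```

Jellyfin/Finamp can also transcode on the fly; pre-transcoding like this just saves battery and bandwidth on the phone.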
I have amassed a huge music library over the last few decades, so even if every streaming service goes under tomorrow, I have enough music locally to last me a lifetime.


I miss when tech updates were good news.
Open source has your back, though.
Still excited every time I get a new KDE version.


Same, 8 heads is way overkill for my simple 0.4/0.6/0.8 mm nozzle-swap use case, but I wanted to build something. :)
Fedora 43 with the Rawhide kernel.
gpt-oss is pretty much unusable without a custom system prompt.
Sycophancy turned up to 11, bullet points everywhere, and you get a summary of the summary of the summary.
Of course, self-hosted with llama-swap and llama.cpp. :)
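The fix really is just a strict system prompt. A minimal sketch against llama-server's OpenAI-compatible endpoint (port, model name, and prompt wording are placeholders for my setup):

```python
import requests

SYSTEM = ("Be terse and direct. No flattery. No bullet points unless "
          "explicitly asked. Do not summarize your own answer.")

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        # llama-swap routes the request to the right llama.cpp
        # instance based on the model name.
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "Explain io_uring in two sentences."},
        ],
    },
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```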
I have a Strix Halo machine with 128 GB of VRAM, so I’m definitely going to give this a try with gpt-oss-120b this weekend.
We need a fourth one for “User error”.
As a side note, Qwen3.6-27B is much more capable than Qwen3.6-35B, even though it is much slower.
https://huggingface.co/unsloth/Qwen3.6-27B-GGUF
For coding tasks where you don’t mind waiting, you should just barely be able to squeeze the 8-bit quant into 32 GB RAM + 8 GB VRAM and have a pretty competent local model. 4-bit quants work, but they have issues with complex tool calls.
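To put a number on “barely squeeze”: a back-of-envelope fit check, assuming llama.cpp’s Q8_0 at roughly 8.5 bits per weight and ignoring the KV cache (which grows with context):

```python
params = 27e9                             # Qwen3.6-27B parameter count
weights_gib = params * 8.5 / 8 / 2**30    # Q8_0: ~8.5 bits per weight
budget_gib = 32 + 8                       # system RAM + VRAM, roughly
print(f"{weights_gib:.1f} GiB of weights vs {budget_gib} GiB total")
# ~26.7 GiB of weights leaves ~13 GiB for the OS, context, and KV cache.
```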
If you use the MTP branch of llama.cpp (and a suitable model), you can even double or triple your token generation speed: https://github.com/ggml-org/llama.cpp/pull/22673
For easier tasks, disable reasoning for instant responses.
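By disabling reasoning I mean the Qwen3-style soft switch, which I’m assuming Qwen3.6 keeps: append /no_think to the user turn and the model answers directly (endpoint and model name are placeholders):

```python
import requests

r = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "Qwen3.6-27B",
        "messages": [
            # Trailing /no_think skips the thinking phase entirely.
            {"role": "user", "content": "Convert 0x2F to decimal. /no_think"},
        ],
    },
)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])
```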