@vapeloki

vapeloki@lemmy.world · 11 hours ago

Out of interest, what about open weight models?

vapeloki@lemmy.world · 11 hours ago

Because USB sticks now speed up inference or what? Without a GPU no LLM. No matter how many USB sticks …

vapeloki@lemmy.world · 4 days ago

To avoid context switching on the GPU. Open WebUI, for example, uses it for memory and title generation.

Those are background tasks that are not performance critical, so instead of slowing down Qwen, we just offload them to the NPU.

Edit: see here for more details
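
A rough sketch of the idea, not how Open WebUI does it internally: background tasks go to the small model on the NPU, everything else goes to the main model on the GPU. The endpoint URLs, task names, and the `pick_endpoint` helper are made up for illustration.

```python
# Hypothetical endpoints; real hosts and ports depend on your setup.
GPU_ENDPOINT = "http://localhost:8080/v1"  # main model on the GPU (assumed)
NPU_ENDPOINT = "http://localhost:8081/v1"  # small helper model on the NPU (assumed)

# Task names are illustrative labels, not an actual Open WebUI setting.
BACKGROUND_TASKS = {"title_generation", "memory"}


def pick_endpoint(task: str) -> str:
    """Send non-critical background tasks to the NPU, the rest to the GPU."""
    return NPU_ENDPOINT if task in BACKGROUND_TASKS else GPU_ENDPOINT


for task in ("chat", "title_generation", "memory"):
    print(task, "->", pick_endpoint(task))
```

The point is only that the main model never gets interrupted by a title or memory request.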

vapeloki@lemmy.world · 4 days ago

ThinkPad and Dell have a bunch of Linux-compatible notebooks.

If you are in a European country, not being locked into Apple’s ecosystem would be a major argument for me.

vapeloki@lemmy.world · 4 days ago

AMD Strix is an APU optimized for AI. It is the cheapest option I am aware of for running bigger models at home: 2k for 56GB VRAM, and less than 300W total power budget.

One could run smaller models. But for the context sizes required for research work, that is nearly impossible.

Also, external services like OpenRouter can be used to access models hosted in the cloud.

But for self-hosting, you need something that can run models with at least 15GB of VRAM plus context. For comparison: our highly quantized model uses 20GB of VRAM. For our 4 slots we need another 20GB on top of that (around 5GB per 256k tokens), making it 40GB.
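
The arithmetic behind that 40GB, as a small sketch. The function name is made up; the numbers are the ones from this comment.

```python
def vram_budget_gb(model_gb, slots, context_gb_per_slot):
    """Total memory needed: model weights plus one context buffer per slot."""
    return model_gb + slots * context_gb_per_slot


# 20GB model, 4 slots, around 5GB of context per slot
total = vram_budget_gb(model_gb=20, slots=4, context_gb_per_slot=5)
print(total)  # 40
```

Swap in your own model size and slot count to see whether a given board is big enough.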

vapeloki@lemmy.world · 4 days ago

Not imatrix or advanced quants, but yes.

But there are more than enough stock models for this task, I would say. For specialized use cases, though, custom quants can be very, very powerful.

vapeloki@lemmy.world · 4 days ago

Marketing slides and claims from people profiting from hosted models aside, you do not need a 1 trillion parameter model.

vapeloki@lemmy.world · 4 days ago

OpenRouter is also nice for this. You can use really cheap models for embedding and the bigger ones for the actual research.

vapeloki@lemmy.world · 4 days ago

“Hey Claude, look into the current state of nuclear fusion research for me. What are the biggest hurdles, what are the next steps, and how promising is private research?” With the research feature enabled, this will give you a report, fact-checked (not perfectly, but okay-ish), with all the sources for it.

Claude will spin up a bunch of workers and search the web, following leads, and so on.

One of the few actually useful features of AI, IMHO.

vapeloki@lemmy.world · 4 days ago

I am on Gentoo for it, but anything with a decent ROCm should work.

Have a look at llama-swap, which handles multi-head endpoints.

Also, as you are on a big board, you can quantize yourself, since the BF16 version of Qwen is only 72GB.

I will try to post a full writeup in the next few days. But feel free to DM me if you need guidance on quantizing or anything else.

I am using this fork currently: https://github.com/charlie12345/ROCmFPX

Things are moving fast right now, so it may be worth waiting a week or two if you need something super stable, but if you are up for experimenting, that’s the way to go.

vapeloki@lemmy.world · 4 days ago

For those who want to know more, rough setup:

llama-cpp rocmfp4 fork
currently custom quantized qwen3.6 35B A3B model, working on publishing
be3 embedding and reranker, also GPU
gemma4-e4b via FastFlowLM on NPU!
OpenWebUI and searxng as docker containers on a Pi currently

We get 70-100 tok/s generation. Four slots with 256k context length each.

We use a smaller board with “only” 64GB of shared LPDDR5X. The bottleneck is memory speed; rocmfp4 quants help a lot.
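
Why memory speed is the bottleneck, as a back-of-the-envelope sketch: for every generated token, the active weights have to be read from memory once, so bandwidth divided by bytes read per token gives a ceiling. The 256 GB/s bandwidth and 0.5 bytes per parameter (4-bit, ignoring scales and KV cache reads) are assumptions for illustration, not measurements from this board; the 3B active parameters come from the "A3B" in the model name.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_b: float,
                         bytes_per_param: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_per_token


# Assumed, illustrative inputs (not measured values):
ceiling = decode_ceiling_tok_s(bandwidth_gb_s=256.0,
                               active_params_b=3.0,
                               bytes_per_param=0.5)
print(f"{ceiling:.0f} tok/s theoretical ceiling")
```

Real numbers land well below such a ceiling, but the relation shows why smaller quants help: halving the bytes per parameter doubles the ceiling.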

As soon as I get my imatrix calibration right, I will publish the quantized versions.

Most existing quantized models are broken. The authors did unsupported things (like taking an already quantized model and requantizing it), so you may get coherence issues or sudden Chinese words in the output.

That is not an issue with rocmfp4 but with vibe coders and agent psychosis.

vapeloki@lemmy.world · 4 days ago

Open WebUI + SearXNG on an AMD Strix board.

Pro: works like a charm, low power consumption, fast, “big” LLM (running qwen3.6 35B A3B + gemma4 E4B for website summaries and other smaller tasks)

Con: Strix boards start at 2k€, more in the USA because of tariffs

vapeloki@lemmy.world · 6 days ago

I use the right tool for the right job. Letting fresh air into the house or escaping in case of fire are the only valid use cases for Windows I can imagine.

vapeloki@lemmy.world · 6 days ago

What the hell is this logo? I got an aneurysm trying to read it.

vapeloki@lemmy.world · 6 days ago

NASA Engineers Packed 100 Tampons in Legendary Astronaut Sally Ride’s Toiletry Kit for Just One Week In Space

vapeloki@lemmy.world · 12 days ago

Security research, software development, strange hardware configurations.

And of course the software I need does not include its deps.

vapeloki@lemmy.world · 13 days ago

That was an example. And as someone who works in sec, I know the benefits of a package manager.

“I only need to trust Brave”.

I don’t get it: static linking, curl-to-bash pipes, and userspace installs, and everybody thinks that is fine. But as someone who needs to write a security concept for Linux in the office so I can finally use it at work: no, that is not OK. That is shit.

Rust on desktop is also a nightmare, for example.

No, I do not hate Arch; I hate the concepts and mindsets creeping into the Linux world.

vapeloki@lemmy.world · 13 days ago

Anywhere else, I just need the package from the Brave project or their repo. I trust the Brave project; I do not trust the AUR, for reasons.

vapeloki@lemmy.world · 14 days ago

True, I also have a choice to use a distro that delivers the software I need in the main repo.

vapeloki@lemmy.world · 14 days ago

And then, where do I get the 70% of packages I need? For example, a useful browser like Brave? Yeah …