Local AI
Aurora fully supports local AI workflows on both Nvidia and AMD GPUs. Acceleration packages for both vendors are included by default and work without any manual intervention.
Use the robots to your advantage!
Ramalama
Ramalama is a tool included in the developer experience images that makes managing and working with AI models on your local machine easy and smooth.
It can pull models from Hugging Face, Ollama, and other providers without extra configuration or hassle. By default, it pulls models from the Ollama Model Registry, which you can find here: Ollama Model Registry.
Here are some example commands to run models:
ramalama pull llama3.1:8b
ramalama run llama3.1:8b
ramalama run deepseek-r1
If you want a small web UI for using the model rather than the CLI, use the serve command that Ramalama provides out of the box, like this:
ramalama serve deepseek-r1
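Besides the web UI, the served model also answers over an OpenAI-compatible REST API. As a rough sketch, assuming the default llama.cpp backend and its default port of 8080 (check ramalama serve --help if your version differs), you can test it with curl:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1", "messages": [{"role": "user", "content": "Say hello"}]}'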
If you are on a laptop or another machine whose GPU is not supported by CUDA or ROCm but does have Vulkan acceleration available (e.g. the iGPU of a Ryzen APU), you can use the Vulkan-accelerated image to serve models, like so:
ramalama --image=quay.io/ramalama/vulkan:latest --ngl 999 run llama3.2:3b
Ramalama will automatically pull down the required container image and model files.
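If you are unsure which backend Ramalama picked for your hardware, recent versions ship an info subcommand that prints the detected configuration; treat the exact output fields as version-dependent:

ramalama info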
Models and OCI images of runners are stored in your local podman storage, like the rest of your container images.
aurora ~> podman images
REPOSITORY                         TAG      IMAGE ID      CREATED      SIZE
registry.fedoraproject.org/fedora  latest   9f3411e5c4ba  10 days ago  330 MB
quay.io/admiller/ramalama-fedora   rocm-42  79126d1c9dbd  3 weeks ago  7.87 GB
quay.io/ramalama/rocm              latest   9ec451829443  4 weeks ago  7.71 GB
quay.io/ramalama/vulkan            latest   db288c77cab0  5 weeks ago  1.07 GB
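To see which models are taking up space and to drop ones you no longer need, Ramalama provides list and rm subcommands; the model name below is just a placeholder:

ramalama list
ramalama rm llama3.2:3b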
Alpaca
For a more graphical experience managing and interacting with your models, you can use the graphical client Alpaca. It runs an embedded Ollama engine and features a beautiful interface to answer your most burning questions.
To get proper AMD GPU acceleration support, install the com.jeffser.Alpaca.Plugins.AMD plugin via the CLI (flatpak install com.jeffser.Alpaca.Plugins.AMD) or from Discover by searching for it. This add-on requires a ROCm-compatible AMD GPU, like a Radeon RX 7900 XTX or RX 6900 XT.
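If Alpaca itself is not installed yet, it is available on Flathub as com.jeffser.Alpaca. Assuming the Flathub remote is configured (which it is by default on Aurora), installing and launching it from the terminal looks like this:

flatpak install flathub com.jeffser.Alpaca
flatpak run com.jeffser.Alpaca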
Ollama
This part is heavily based on the guide from Bluefin, which is found here.
Ollama can also be used if preferred. It integrates very well with many programs and with plugins for popular IDEs and code editors like Zed, IntelliJ IDEA, or VS Code via Continue.dev.
To run the Ollama container on an Nvidia machine using Aurora, you first have to configure the Nvidia container runtime like so:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
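Before moving on to compose files, it can be worth confirming that containers actually see the GPU. This is only a sketch: the CUDA image tag below is an assumption, and any recent nvidia/cuda base image will do:

# should print the same table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi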
Choose a folder where you would like to store the docker-compose.yml file that holds your Ollama container configuration.
Create a compose file with the following configuration:
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - 11434:11434
    volumes:
      - ./ollama_v:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - capabilities:
                - gpu
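If Compose does not pick up your GPU with the minimal reservation above, a more explicit variant spells out the driver and device count (a sketch using standard Docker Compose GPU reservation syntax; count: all can be replaced with a number to pin specific GPUs):

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu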
For folks rocking a supported ROCm AMD GPU, you can use this compose file to get you started:
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    restart: unless-stopped
    devices:
      - /dev/kfd
      - /dev/dri
    volumes:
      - ./ollama_v:/root/.ollama
    ports:
      - "11434:11434"
Then run docker compose up -d from inside that folder in your terminal. You should now have an accelerated Ollama server running on your local machine.
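To pull a first model and chat with it, you can exec into the running container; the container name matches container_name from the compose file, and llama3.1:8b is just an example model:

docker exec -it ollama ollama pull llama3.1:8b
docker exec -it ollama ollama run llama3.1:8b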
The endpoint on which you can connect to Ollama is http://localhost:11434.
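A quick way to verify the endpoint from the terminal is Ollama's REST API; the model has to be pulled first, and the model name below is only an example:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?"
}'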