GMKtec Ryzen 7640HS Mini PC — AI Server Setup Guide
A complete setup guide for turning the GMKtec NucBox M5 Pro (Ryzen 7640HS) into a local AI server for your business. Covers hardware assessment, two deployment paths, security, and common mistakes.
Last updated: March 2026
TL;DR — Buy or Don't Buy?
Buy if: You want a low-cost, always-on local AI server that runs lightweight models privately without monthly API fees. The 7640HS has a solid integrated GPU (Radeon 760M) that can run 7B quantized models at acceptable speeds.
Skip if: You need fast inference on 13B+ models, you need to process video or large image workloads, or you want a zero-maintenance setup. For those cases, start with cloud APIs and come back to local hardware later.
Verdict: For most small businesses running assistants, document summarization, or light automation — this box is excellent value for the price.
What You Can Build With This Box
- A local Ollama server running Llama 3, Mistral, Phi-3, or Gemma models
- A private document Q&A tool (connect to your PDFs, notes, or internal wiki)
- An always-on AI assistant accessible from any device on your network
- A webhook-triggered automation agent (n8n or similar)
- A local API endpoint your team can hit from custom apps
Path A — API-Only (Recommended to Start)
Skip local models entirely and use this machine as an always-on automation host that calls cloud AI APIs. Simpler, faster, and easier to maintain.
What you need
- The mini PC running Ubuntu 24.04 LTS or Windows 11
- An OpenAI, Anthropic, or Google Gemini API key
- Docker (for running n8n, Open WebUI, or similar tools)
Step 1 — Install Ubuntu Server
Download Ubuntu 24.04 LTS from ubuntu.com. Flash it to a USB drive using Balena Etcher. Boot from USB and install with default options. Choose "Minimal installation" to keep it lean.
Step 2 — Secure the machine
sudo apt update && sudo apt upgrade -y
sudo ufw enable
sudo ufw allow ssh
sudo ufw allow 80
sudo ufw allow 443
Set a strong password. Disable password SSH login and use SSH keys if you plan to connect remotely. One caveat: Docker publishes container ports by writing iptables rules directly, so ports published with -p (like Open WebUI's 3000) bypass UFW — don't rely on UFW alone to restrict access to containers.
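If you do enable remote SSH, the hardening steps above can be sketched as follows (assuming OpenSSH server on Ubuntu 24.04; `user@your-machine-ip` is a placeholder):

```shell
# On YOUR laptop (not the server): generate a key and copy it over
ssh-keygen -t ed25519
ssh-copy-id user@your-machine-ip

# On the server: disable password logins via a drop-in config
sudo tee /etc/ssh/sshd_config.d/99-hardening.conf <<'EOF'
PasswordAuthentication no
PermitRootLogin no
EOF
sudo systemctl restart ssh
```

Verify that key-based login works in a second terminal before closing your current session, or you can lock yourself out.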
Step 3 — Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
Log out and back in. Confirm Docker works: docker run hello-world
Step 4 — Run your first tool
To run Open WebUI (a ChatGPT-like interface connected to your API key):
docker run -d -p 3000:8080 \
-e OPENAI_API_KEY=your-key-here \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
The -v flag stores chat history and settings in a named Docker volume, so they survive container upgrades.
Visit http://your-machine-ip:3000 from any device on your network.
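A quick sanity check that the container came up and the UI is reachable (replace the IP placeholder with your server's address):

```shell
docker ps --filter name=open-webui     # the container should show status "Up"
curl -s -o /dev/null -w "%{http_code}\n" http://your-machine-ip:3000
```

If curl prints anything other than 200, check the container logs with `docker logs open-webui`.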
Why Path A first?
You get a working AI tool on your network today. No GPU tuning, no model downloads, no prompt-to-speed frustration. Once you know what you actually need, Path B (local models) makes much more sense.
Path B — Hybrid Local + API
Run smaller models locally for privacy-sensitive tasks and fall back to cloud APIs for heavier work.
What you need
- Ubuntu 24.04 LTS (same as above)
- Ollama installed
- At least 16 GB RAM (the M5 Pro comes with 32 GB — you're good)
Step 1 — Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 2 — Pull a model
Start with a small quantized model. Ollama's default tags use Q4_K_M quantization, which gives the best speed/quality balance on this hardware:
ollama pull llama3.2:3b
Test it: ollama run llama3.2:3b "Summarize this in one sentence: ..."
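Ollama also exposes a REST API on port 11434, which is what your own scripts and apps would call. A minimal request against the documented /api/generate endpoint (the prompt here is just an example):

```shell
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Summarize in one sentence: Ollama runs language models locally.",
  "stream": false
}'
```

With "stream": false the reply arrives as a single JSON object whose "response" field contains the generated text.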
Step 3 — Expose Ollama on your local network
By default Ollama only listens on localhost. To expose it to other devices:
sudo systemctl edit ollama.service
Add under [Service]:
Environment="OLLAMA_HOST=0.0.0.0"
Then: sudo systemctl restart ollama
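If you enabled UFW in Step 2, port 11434 is still blocked for other devices (Ollama runs natively, not in Docker, so UFW does apply to it). A sketch of opening it to the LAN only — adjust the subnet and server IP to match your network:

```shell
# Allow Ollama's port from the local subnet only (192.168.1.0/24 is a placeholder)
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp

# From another device on the LAN, list installed models to confirm it's reachable
curl -s http://192.168.1.50:11434/api/tags
```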
Step 4 — Connect Open WebUI to local Ollama
If the Path A container is still running under the same name, remove it first with docker rm -f open-webui. Then:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Expected performance on 7640HS
| Model | Speed |
|---|---|
| Llama 3.2 3B (Q4_K_M) | ~25–35 tokens/sec |
| Mistral 7B (Q4_K_M) | ~10–15 tokens/sec |
| Phi-3 Mini (Q4) | ~30–40 tokens/sec |
These speeds are usable for one or two concurrent users. Not suitable for high-traffic production use.
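Your numbers will vary with RAM speed and power limits, so measure your own hardware. Ollama's --verbose flag prints timing statistics after each response:

```shell
ollama run llama3.2:3b --verbose "Explain DNS in two sentences."
```

The stats printed after the reply include an "eval rate" line — that figure in tokens/s is what the table above estimates.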
Security Checklist
Before putting any of this on a network that has access to real business data:
- Change all default passwords immediately
- Enable UFW firewall with the minimum ports needed
- Do not expose the Ollama API port (11434) to the internet — keep it LAN-only
- Do not expose Open WebUI to the internet without authentication (enable the built-in auth)
- Use SSH keys instead of password SSH if enabling remote access
- Keep the OS and Docker images updated weekly
- Do not store API keys in plain text files; use a .env file with restricted permissions (chmod 600 .env)
- If using cloud APIs, use API keys scoped to minimum permissions
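A sketch of the .env approach from the checklist (the key value is a placeholder):

```shell
# Create the .env file and restrict it to your user only
printf 'OPENAI_API_KEY=your-key-here\n' > .env
chmod 600 .env
ls -l .env   # should show -rw------- (owner read/write only)
```

docker run can then load it with --env-file .env instead of pasting the key on the command line, which keeps it out of your shell history.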
Common Mistakes
Running a model that's too large. Llama 3 8B fully loaded needs ~6 GB of RAM. If your system is also running Docker containers, Open WebUI, and background processes, you can run out of headroom quickly. Start with 3B models.
Skipping quantization. Full-precision models are 2–4x larger and slower. Always use Q4_K_M or Q5_K_M quantized versions from Ollama's library.
Exposing Ollama to the internet. The Ollama API has no built-in authentication. If port 11434 is open to the internet, anyone can use your compute and your local files. Use a reverse proxy with auth (Caddy or Nginx) if remote access is needed.
Not setting up auto-start. Ollama installs as a systemd service and starts on boot automatically. Docker containers do not. Add --restart unless-stopped to all your docker run commands.
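If you already started a container without a restart policy, you don't need to recreate it — docker update can set one on an existing container:

```shell
docker update --restart unless-stopped open-webui
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' open-webui   # prints: unless-stopped
```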
Using the wrong Ubuntu version. Stick with Ubuntu 24.04 LTS. Newer Ubuntu versions sometimes have driver issues with the 760M integrated GPU that require extra configuration.
Buy the GMKtec NucBox M5 Pro
Buy the GMKtec NucBox M5 Pro on Amazon
Disclosure: The link above is an affiliate link. If you purchase through it, we may earn a small commission at no extra cost to you. See our full disclosure.
Get the Free AI Setup Blueprint
Before you buy any hardware, get clear on what you actually need to build. The Blueprint walks you through picking your stack, estimating real costs, and avoiding the three most common mistakes.