GMKtec Ryzen 7640HS Mini PC — AI Server Setup Guide
A complete setup guide for turning the GMKtec NucBox M5 Pro (Ryzen 7640HS) into a local AI server for your business. Covers hardware assessment, two deployment paths, security, and common mistakes.
Last updated: March 2026
TL;DR — Buy or Don't Buy?
Buy if: You want a low-cost, always-on local AI server that runs lightweight models privately without monthly API fees. The 7640HS has a solid integrated GPU (Radeon 760M) that can run 7B quantized models at acceptable speeds.
Skip if: You need fast inference on 13B+ models, you need to process video or large image workloads, or you want a zero-maintenance setup. For those cases, start with cloud APIs and come back to local hardware later.
Verdict: For most small businesses running assistants, document summarization, or light automation — this box is excellent value for the price.
What You Can Build With This Box
- A local Ollama server running Llama 3, Mistral, Phi-3, or Gemma models
- A private document Q&A tool (connect to your PDFs, notes, or internal wiki)
- An always-on AI assistant accessible from any device on your network
- A webhook-triggered automation agent (n8n or similar)
- A local API endpoint your team can hit from custom apps
Path A — API-Only (Recommended to Start)
Skip local models entirely and use this machine as an always-on automation host that calls cloud AI APIs. Simpler, faster, and easier to maintain.
What you need
- The mini PC running Ubuntu 24.04 LTS or Windows 11
- An OpenAI, Anthropic, or Google Gemini API key
- Docker (for running n8n, Open WebUI, or similar tools)
Step 1 — Install Ubuntu Server
Download Ubuntu 24.04 LTS from ubuntu.com. Flash it to a USB drive using Balena Etcher. Boot from USB and install with default options. Choose "Minimal installation" to keep it lean.
Step 2 — Secure the machine
sudo apt update && sudo apt upgrade -y
sudo ufw enable
sudo ufw allow ssh
sudo ufw allow 80
sudo ufw allow 443
Set a strong password. Disable password SSH login and use SSH keys if you plan to connect remotely. One caveat: Docker publishes container ports by writing iptables rules directly, so ports published with -p (like Open WebUI's 3000) bypass UFW — don't rely on UFW alone to restrict access to containers.
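If you do enable remote SSH, the hardening steps above can be sketched as follows (assuming OpenSSH server on Ubuntu 24.04; `user@your-machine-ip` is a placeholder):

```shell
# On YOUR laptop (not the server): generate a key and copy it over
ssh-keygen -t ed25519
ssh-copy-id user@your-machine-ip

# On the server: disable password logins via a drop-in config
sudo tee /etc/ssh/sshd_config.d/99-hardening.conf <<'EOF'
PasswordAuthentication no
PermitRootLogin no
EOF
sudo systemctl restart ssh
```

Verify that key-based login works in a second terminal before closing your current session, or you can lock yourself out.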
Step 3 — Install Docker
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
Log out and back in. Confirm Docker works: docker run hello-world
Step 4 — Run your first tool
To run Open WebUI (a ChatGPT-like interface connected to your API key):
docker run -d -p 3000:8080 \
-e OPENAI_API_KEY=your-key-here \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
The -v flag stores chat history and settings in a named Docker volume, so they survive container upgrades.
Visit http://your-machine-ip:3000 from any device on your network.
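A quick sanity check that the container came up and the UI is reachable (replace the IP placeholder with your server's address):

```shell
docker ps --filter name=open-webui     # the container should show status "Up"
curl -s -o /dev/null -w "%{http_code}\n" http://your-machine-ip:3000
```

If curl prints anything other than 200, check the container logs with `docker logs open-webui`.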
Why Path A first?
You get a working AI tool on your network today. No GPU tuning, no model downloads, no prompt-to-speed frustration. Once you know what you actually need, Path B (local models) makes much more sense.
Path B — Hybrid Local + API
Run smaller models locally for privacy-sensitive tasks and fall back to cloud APIs for heavier work.
What you need
- Ubuntu 24.04 LTS (same as above)
- Ollama installed
- At least 16 GB RAM (the M5 Pro comes with 32 GB — you're good)
Step 1 — Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Step 2 — Pull a model
Start with a small quantized model. Ollama's default tags use Q4_K_M quantization, which gives the best speed/quality balance on this hardware:
ollama pull llama3.2:3b
Test it: ollama run llama3.2:3b "Summarize this in one sentence: ..."
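Ollama also exposes a REST API on port 11434, which is what your own scripts and apps would call. A minimal request against the documented /api/generate endpoint (the prompt here is just an example):

```shell
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Summarize in one sentence: Ollama runs language models locally.",
  "stream": false
}'
```

With "stream": false the reply arrives as a single JSON object whose "response" field contains the generated text.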
Step 3 — Expose Ollama on your local network
By default Ollama only listens on localhost. To expose it to other devices:
sudo systemctl edit ollama.service
Add under [Service]:
Environment="OLLAMA_HOST=0.0.0.0"
Then: sudo systemctl restart ollama
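If you enabled UFW in Step 2, port 11434 is still blocked for other devices (Ollama runs natively, not in Docker, so UFW does apply to it). A sketch of opening it to the LAN only — adjust the subnet and server IP to match your network:

```shell
# Allow Ollama's port from the local subnet only (192.168.1.0/24 is a placeholder)
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp

# From another device on the LAN, list installed models to confirm it's reachable
curl -s http://192.168.1.50:11434/api/tags
```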
Step 4 — Connect Open WebUI to local Ollama
If the Path A container is still running under the same name, remove it first with docker rm -f open-webui. Then:
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Expected performance on 7640HS
| Model | Speed |
|---|---|
| Llama 3.2 3B (Q4_K_M) | ~25–35 tokens/sec |
| Mistral 7B (Q4_K_M) | ~10–15 tokens/sec |
| Phi-3 Mini (Q4) | ~30–40 tokens/sec |
These speeds are usable for one or two concurrent users. Not suitable for high-traffic production use.
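Your numbers will vary with RAM speed and power limits, so measure your own hardware. Ollama's --verbose flag prints timing statistics after each response:

```shell
ollama run llama3.2:3b --verbose "Explain DNS in two sentences."
```

The stats printed after the reply include an "eval rate" line — that figure in tokens/s is what the table above estimates.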
Security Checklist
Before putting any of this on a network that has access to real business data:
- Change all default passwords immediately
- Enable UFW firewall with the minimum ports needed
- Do not expose the Ollama API port (11434) to the internet — keep it LAN-only
- Do not expose Open WebUI to the internet without authentication (enable the built-in auth)
- Use SSH keys instead of password SSH if enabling remote access
- Keep the OS and Docker images updated weekly
- Do not store API keys in plain text files; use a .env file with restricted permissions (chmod 600 .env)
- If using cloud APIs, use API keys scoped to minimum permissions
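A sketch of the .env approach from the checklist (the key value is a placeholder):

```shell
# Create the .env file and restrict it to your user only
printf 'OPENAI_API_KEY=your-key-here\n' > .env
chmod 600 .env
ls -l .env   # should show -rw------- (owner read/write only)
```

docker run can then load it with --env-file .env instead of pasting the key on the command line, which keeps it out of your shell history.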
Common Mistakes
Running a model that's too large. Llama 3 8B fully loaded needs ~6 GB of RAM. If your system is also running Docker containers, Open WebUI, and background processes, you can run out of headroom quickly. Start with 3B models.
Skipping quantization. Full-precision models are 2–4x larger and slower. Always use Q4_K_M or Q5_K_M quantized versions from Ollama's library.
Exposing Ollama to the internet. The Ollama API has no built-in authentication. If port 11434 is open to the internet, anyone can use your compute and your local files. Use a reverse proxy with auth (Caddy or Nginx) if remote access is needed.
Not setting up auto-start. Ollama installs as a systemd service and starts on boot automatically. Docker containers do not. Add --restart unless-stopped to all your docker run commands.
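If you already started a container without a restart policy, you don't need to recreate it — docker update can set one on an existing container:

```shell
docker update --restart unless-stopped open-webui
docker inspect -f '{{.HostConfig.RestartPolicy.Name}}' open-webui   # prints: unless-stopped
```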
Using the wrong Ubuntu version. Stick with Ubuntu 24.04 LTS. Newer Ubuntu versions sometimes have driver issues with the 760M integrated GPU that require extra configuration.
Buy the GMKtec NucBox M5 Pro
Buy the GMKtec NucBox M5 Pro on Amazon
Disclosure: The link above is an affiliate link. If you purchase through it, we may earn a small commission at no extra cost to you. See our full disclosure.
Get the Free AI Setup Blueprint
Before you buy any hardware, get clear on what you actually need to build. The Blueprint walks you through picking your stack, estimating real costs, and avoiding the three most common mistakes.