
Ollama (Local LLM Runner)

Ollama runs open-source LLMs locally inside your environment — Llama 3, Mistral, Phi-3, CodeLlama — with no API key required and no data leaving the sandbox.

Aithroyz deploys Ollama on a dedicated VM and exposes it at an HTTPS subdomain. Open WebUI and Flowise are pre-wired to the Ollama internal IP when deployed in the same plan, so models pulled into Ollama appear in both UIs automatically.

Access

API URL: https://ollama.<env-name>.ops.aithroyz.com
OpenAI-compatible: https://ollama.<env-name>.ops.aithroyz.com/v1
Auth: No API key required. Access is restricted to users authenticated via Google SSO at the gateway.
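
To confirm the server is reachable before pulling anything, hit the version endpoint (a quick sanity check; /api/version is part of the standard Ollama API, and <env-name> is a placeholder for your environment name):

# Check that Ollama is up and report its version
curl https://ollama.<env-name>.ops.aithroyz.com/api/version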

Pulling a model

Ollama starts with no models installed. Pull a model via the REST API or from Open WebUI's admin panel:

# Pull a model via the Ollama API
curl -X POST https://ollama.<env-name>.ops.aithroyz.com/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3:8b"}'

# Pull a smaller coding model
curl -X POST https://ollama.<env-name>.ops.aithroyz.com/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "codellama:7b-instruct"}'

# List installed models
curl https://ollama.<env-name>.ops.aithroyz.com/api/tags

ℹ Model downloads can take several minutes depending on size. A 7B model is roughly 4–5 GB. The pull endpoint streams progress as newline-delimited JSON until the download completes.
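
To watch that progress from a terminal, you can stream the status lines as they arrive. A minimal sketch, assuming curl and jq are installed on your workstation:

# Stream pull progress; each NDJSON line carries a status and, while
# downloading, completed/total byte counts
curl -sN -X POST https://ollama.<env-name>.ops.aithroyz.com/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3:8b"}' \
  | jq -r '.status + (if .completed and .total then " (" + ((.completed / .total * 100) | floor | tostring) + "%)" else "" end)'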

Chatting via the API

Ollama exposes an OpenAI-compatible endpoint at /v1, so any OpenAI SDK or tool works without modification:

# Chat completions — OpenAI-compatible format
curl -X POST https://ollama.<env-name>.ops.aithroyz.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3:8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the OSI model in two sentences."}
    ]
  }'

# Streaming response
curl -X POST https://ollama.<env-name>.ops.aithroyz.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "messages": [...], "stream": true}'

Model sizing guide

Choose your environment VM size based on the models you want to run. Larger models produce higher quality outputs but require more RAM and are slower on CPU-only instances:

e2-standard-2 (8 GB RAM)
7B parameter models — Llama 3 8B, Mistral 7B, Phi-3 Mini. Good quality, ~3–8 tokens/sec on CPU.
e2-standard-4 (16 GB RAM)
13B parameter models — Llama 2 13B, CodeLlama 13B. Better reasoning, ~2–4 tokens/sec on CPU.
e2-standard-8 (32 GB RAM)
30B+ parameter models — Llama 3 70B (Q4 quantized), Mixtral 8x7B. Near-GPT-3.5 quality; CPU inference only.

ℹ Aithroyz sandbox environments run on CPU-only GCE instances; GPU instances are not currently supported. For latency-sensitive workloads that need faster inference, route through the LLM Gateway to hosted models (Claude, GPT-4) instead.
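
To check how much memory a loaded model actually occupies, list the running models (recent Ollama releases expose this at /api/ps; treat the exact response fields as version-dependent):

# List loaded models and their in-memory size
curl https://ollama.<env-name>.ops.aithroyz.com/api/ps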

Tips

Model storage
Downloaded models are stored at /root/.ollama/models on the Ollama VM and persist across restarts. Destroying and re-provisioning the environment clears them.
Use :instruct variants
For chat use cases, prefer instruct-tuned variants (e.g. llama3:8b-instruct, mistral:7b-instruct) over base models. Base models are for text completion only.
Embeddings
Ollama also serves embeddings via /api/embeddings or /v1/embeddings. Use nomic-embed-text or mxbai-embed-large for document indexing in Qdrant or Flowise (see the example after these tips).
Open WebUI integration
If Open WebUI is in the same plan, models pulled into Ollama appear in the Open WebUI model selector within 30 seconds — no configuration needed.
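
A minimal embeddings request against the native endpoint, assuming you have already pulled nomic-embed-text (the response contains a single "embedding" vector):

# Generate an embedding with the native API
curl -X POST https://ollama.<env-name>.ops.aithroyz.com/api/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "prompt": "What is lateral movement in an attack chain?"}'
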
Related Articles

Open WebUI (AI Chat)
Flowise (LangChain Visual Builder)
LLM Gateway