Initial commit
.gitignore (vendored, new file, 142 lines)
@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# Project-specific
/inventory_raw/
/llamacpp_runs_remote/
/ollama_runs_remote/
/reports/
/tmp/
*.log
/C:/Users/Rushabh/.gemini/tmp/bff31f86566324f77927540d72088ce62479fd0563c197318c9f0594af2e69ee/

# OS-generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
AGENTS.full.md (new file, 4206 lines; diff suppressed because it is too large)
AGENTS.md (new file, 20 lines)
@@ -0,0 +1,20 @@
# AGENTS (compressed)

This is the compact working context. For the full historical inventory and detailed snapshots, see `AGENTS.full.md` and `inventory_raw/`.

## Access + basics
- SSH: `ssh -p 55555 rushabh@192.168.1.2`
- Sudo: `sudo -n true`
- TrueNAS UI: `http://192.168.1.2`

## Full context pointers
- Full inventory snapshot and extra system details: `AGENTS.full.md`
- Raw captured data: `inventory_raw/`
- Documentation notes: `docs/*`

## Projects
- n8n Thesis Builder checkpoint (2026-01-04): `docs/n8n-thesis-builder-checkpoint-20260104.md`
- llamaCpp wrapper: a Python-based OpenAI-compatible API wrapper and model manager for the TrueNAS llama.cpp app.
  - Location: `llamaCpp.Wrapper.app/`
  - API port: `9093`
  - UI port: `9094`
  - See the `README.md` inside the folder for full details.
README.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# Codex TrueNAS Helper

This project is a collection of scripts, configurations, and applications to manage and enhance a TrueNAS SCALE server, with a special focus on running and interacting with large language models (LLMs) like those powered by `llama.cpp` and `Ollama`.

## Features

* **`llama.cpp` Wrapper:** A sophisticated wrapper for the `llama.cpp` TrueNAS application that provides:
  * An OpenAI-compatible API for chat completions and embeddings.
  * A web-based UI for managing models (listing, downloading).
  * The ability to hot-swap models without restarting the `llama.cpp` container by interacting with the TrueNAS API.
* **TrueNAS Inventory:** A snapshot of the TrueNAS server's configuration, including hardware, storage, networking, and running applications.
* **Automation Scripts:** A set of PowerShell and Python scripts for tasks like deploying the wrapper and testing remote endpoints.
* **LLM Integration:** Tools and configurations for working with various LLMs.

## Directory Structure

* `AGENTS.md` & `AGENTS.full.md`: These files contain detailed information and a complete inventory of the TrueNAS server's configuration.
* `llamaCpp.Wrapper.app/`: A Python-based application that wraps the `llama.cpp` TrueNAS app with an OpenAI-compatible API and a model management UI.
* `scripts/`: Contains various scripts for deployment, testing, and other tasks.
* `inventory_raw/`: Raw data dumps from the TrueNAS server, used to generate the inventory in `AGENTS.full.md`.
* `reports/`: Contains generated reports, test results, and other artifacts.
* `llamacpp_runs_remote/` & `ollama_runs_remote/`: Logs and results from running LLMs.
* `modelfiles/`: Modelfiles for different language models.
* `tests/`: Python tests for the `llamaCpp.Wrapper.app`.

## `llamaCpp.Wrapper.app`

This is the core component of the project. It's a Python application that acts as a proxy to the `llama.cpp` server running on TrueNAS, but with added features.

### Running Locally

1. Install the required Python packages:

   ```bash
   pip install -r llamaCpp.Wrapper.app/requirements.txt
   ```

2. Run the application:

   ```bash
   python -m llamaCpp.Wrapper.app.run
   ```

This will start two web servers: one for the API (default port 9093) and one for the UI (default port 9094).
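Once both servers are up, a quick way to confirm the API is healthy (a minimal sketch, assuming the default API port and running from the same machine; the `/health` and `/v1/models` routes are defined in `api_app.py`):

```python
# Minimal smoke test against a locally running wrapper (default API port).
# httpx is used because the wrapper itself depends on it.
import httpx

base = "http://127.0.0.1:9093"

print(httpx.get(f"{base}/health").json())      # wrapper status and agents-derived config
print(httpx.get(f"{base}/v1/models").json())   # OpenAI-style list of models in /models
```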
### Docker (TrueNAS)

The wrapper can be run as a Docker container on TrueNAS. See the `llamaCpp.Wrapper.app/README.md` file for a detailed example of the `docker run` command. The wrapper needs to be configured with the appropriate environment variables to connect to the TrueNAS API and the `llama.cpp` container.

### Model Hot-Swapping

The wrapper can switch models in the `llama.cpp` server by updating the application's command via the TrueNAS API, which allows for dynamic model management without manual intervention; an example request is sketched below.
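As a sketch of what that looks like from a client (host, ports, and the model name are examples taken from `docs/llamacpp-wrapper-notes.md`; the model id must exactly match an entry from `/v1/models`):

```python
import httpx

MODEL = "Qwen2.5-7B-Instruct-Q4_K_M.gguf"  # must exactly match an id from /v1/models

# Implicit switch: requesting a non-active model on the OpenAI endpoint makes
# the wrapper switch llama.cpp to it before proxying the chat request.
resp = httpx.post(
    "http://192.168.1.2:9093/v1/chat/completions",
    json={"model": MODEL, "messages": [{"role": "user", "content": "ping"}]},
    timeout=600,  # the first request after a switch waits for the model to load
)

# Explicit switch through the UI server's documented endpoint.
httpx.post("http://192.168.1.2:9094/ui/api/switch-model", json={"model_id": MODEL}, timeout=600)
```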
## Scripts

* `deploy_truenas_wrapper.py`: A Python script to deploy the `llamaCpp.Wrapper.app` to TrueNAS.
* `remote_wrapper_test.py`: A Python script for testing the remote wrapper.
* `update_llamacpp_flags.ps1`: A PowerShell script to update the `llama.cpp` flags.
* `llamacpp_remote_test.ps1` & `ollama_remote_test.ps1`: PowerShell scripts for testing `llama.cpp` and `Ollama` remote endpoints.

## Getting Started

1. **Explore the Inventory:** Start by reading `AGENTS.md` and `AGENTS.full.md` to understand the TrueNAS server's configuration.
2. **Set up the Wrapper:** If you want to use the `llama.cpp` wrapper, follow the instructions in `llamaCpp.Wrapper.app/README.md` to run it either locally or as a Docker container on TrueNAS.
3. **Use the Scripts:** The scripts in the `scripts` directory can be used to automate various tasks.

## Development

The `llamaCpp.Wrapper.app` has a suite of tests located in the `tests/` directory. To run the tests, use `pytest`:

```bash
pytest
```
docs/llamacpp-wrapper-notes.md (new file, 60 lines)
@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes

Last updated: 2026-01-04

## Purpose
OpenAI-compatible wrapper for the existing `llamacpp` app with a model manager UI,
model switching, and parameter management via TrueNAS middleware.

## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`

## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`

## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows active model and supports switching with verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via Docker socket fallback when TrueNAS log APIs are unavailable.
## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
  OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp; see the sketch below.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to a JSON-only system prompt if llama.cpp rejects `response_format`).
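For illustration, the flat and normalized shapes side by side (the tool name `get_quote` is a made-up example; the transformation matches `_normalize_tools` in `openai_translate.py`):

```python
# Flat format as sent by some clients (n8n/OpenWebUI):
flat_tool = {
    "type": "function",
    "name": "get_quote",
    "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}},
}

# What the wrapper forwards to llama.cpp after normalization:
openai_tool = {
    "type": "function",
    "function": {"name": "get_quote", "parameters": flat_tool["parameters"]},
}
```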
## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.

## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args; example below)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)

## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies the switch by sending a minimal prompt.
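A short read-only check of the config endpoint (a sketch, using the pinned UI port above):

```python
import httpx

# Active model, current llama.cpp command params, and extra args.
cfg = httpx.get("http://192.168.1.2:9094/ui/api/llamacpp-config").json()
print(cfg)
```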
## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
  - `WRAPPER_BASE=http://192.168.1.2:9093`
  - `UI_BASE=http://192.168.1.2:9094`
  - `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
  - `TRUENAS_API_KEY=...`
  - `MODEL_REQUEST=<exact model id from /v1/models>`

## Runtime Validation (2026-01-04)
- Fixed llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect transient `503 Loading model` during warmup; retry after the load completes.
- Verified a `yarn-llama-2-13b-64k.Q4_K_M.gguf` model switch from the wrapper; a tool-enabled chat request completed after load (~107s).
docs/n8n-thesis-builder-checkpoint-20260104.md (new file, 53 lines)
@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)

## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when the workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).

## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of AI output.
- `HTTP Request13` query forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted between `Aggregate2` -> `AI - Thesis Builder`.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.

## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (example `symbol` and `thesis` fields full of junk tokens).
- `AI - Technicals Auditor` output looks like valid JSON (see sample below).
- `Aggregate2` payload size ~6.7 KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response-format handling.

### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`

### Sample `AI - Technicals Auditor` output (valid JSON)

```json
{
  "output": {
    "timeframes": [
      { "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
    ],
    "optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
    "dataQualityScore": 0.5,
    "error": "INSUFFICIENT_DATA"
  }
}
```

## Open issues
- Thesis Builder garbling persists even with a small prompt; likely a model/wrapper output issue.
- Need to confirm whether the llama.cpp wrapper is corrupting output or the model is misconfigured for JSON-only output; a probe sketch follows.
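One way to separate the two suspects is to send the same JSON-mode prompt through the wrapper and directly to the native llama.cpp server and compare the contents (a sketch; ports per `docs/llamacpp-wrapper-notes.md`, and the model id must match `/v1/models`):

```python
import httpx

messages = [{"role": "user", "content": 'Return {"ok": true} and nothing else.'}]
model = "Qwen2.5-7B-Instruct-Q4_K_M.gguf"  # any exact id from /v1/models

# Through the wrapper, using its wrapper-specific return_format flag.
via_wrapper = httpx.post(
    "http://192.168.1.2:9093/v1/chat/completions",
    json={"model": model, "messages": messages, "return_format": "json"},
    timeout=300,
)
# Directly against llama.cpp; return_format is wrapper-specific, so omit it.
direct = httpx.post(
    "http://192.168.1.2:8071/v1/chat/completions",
    json={"model": model, "messages": messages},
    timeout=300,
)
print(via_wrapper.json()["choices"][0]["message"]["content"])
print(direct.json()["choices"][0]["message"]["content"])
```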
## Useful commands
- Last runs:
  `SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
  `sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`
llamaCpp.Wrapper.app/Dockerfile (new file, 16 lines)
@@ -0,0 +1,16 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app/app
COPY trades_company_stock.txt /app/trades_company_stock.txt

EXPOSE 8000 8001

CMD ["python", "-m", "app.run"]
llamaCpp.Wrapper.app/README.md (new file, 134 lines)
@@ -0,0 +1,134 @@
# llama.cpp OpenAI-Compatible Wrapper

This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` (build-time) into `app/agents_config.json`.

## Current Agents-Derived Details

- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from AGENTS snapshot)

Regenerate the derived config after updating `AGENTS.md`:

```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```
## Running Locally

```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```

Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to container name or localhost based on agents config)
- Model dir: `MODEL_DIR=/models`
## Docker (TrueNAS)

Example (join existing llama.cpp network and mount models):

```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
  --network ix-llamacpp_default \
  -v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e LLAMACPP_RESTART_METHOD=docker \
  -e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
  -e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
  -e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
  -e TRUENAS_API_KEY=YOUR_KEY \
  -e TRUENAS_API_USER=YOUR_USER \
  -e TRUENAS_APP_NAME=llamacpp \
  -e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
  -e PORT_A=9093 -e PORT_B=9094 \
  llama-cpp-openai-wrapper:latest
```
## Model Hot-Swap / Restart Hooks

This wrapper does not modify llama.cpp by default. To enable hot-swap/restart for new models or model selection,
provide one of the restart methods below (a hypothetical `http` receiver is sketched after the list):

- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`

or

- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`

or (requires mounting docker socket)

- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`
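For the `http` method, the wrapper hands its restart payload (the `model_id`, `model_path`, `llamacpp_args`, `llamacpp_extra_args`, and `gpu_count` fields visible in `api_app.py`) to `LLAMACPP_RESTART_URL`. The receiver below is a hypothetical sketch only; it assumes the payload arrives as a JSON POST, and `restart.py` defines the actual contract:

```python
# Hypothetical restart receiver for LLAMACPP_RESTART_METHOD=http (sketch only).
# Assumes the wrapper POSTs its restart payload as JSON; verify against restart.py.
import subprocess

from fastapi import FastAPI

app = FastAPI()


@app.post("/restart")
async def restart(payload: dict) -> dict:
    # Restart the llama.cpp container for the requested model; the container
    # name below is this deployment's, adjust for yours.
    subprocess.run(["docker", "restart", "ix-llamacpp-llamacpp-1"], check=True)
    return {"ok": True, "model_path": payload.get("model_path")}
```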
## Model switching via TrueNAS middleware (P0)

Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:

```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```

The wrapper preserves existing flags in the compose command and only updates `--model`, while optionally adding
missing GPU split flags from `LLAMACPP_*` if not already set (see the defaulting sketch below).

Optional arguments passed to restart handlers:

```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```

You can also pass arbitrary llama.cpp flags (space-separated) via:

```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```
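For reference, when two or more GPUs are visible and no split flags are provided, the wrapper derives an even split itself; a minimal sketch of that defaulting logic, mirroring `config.py` (not a public API):

```python
# Even tensor-split defaulting, as done in load_config() for multi-GPU hosts.
def default_split_flags(gpu_count: int, llamacpp_args: dict) -> dict:
    if gpu_count and gpu_count >= 2:
        if "tensor_split" not in llamacpp_args:
            ratio = 1.0 / float(gpu_count)
            llamacpp_args["tensor_split"] = ",".join([f"{ratio:.2f}"] * gpu_count)
        llamacpp_args.setdefault("split_mode", "layer")
    return llamacpp_args


print(default_split_flags(2, {}))  # {'tensor_split': '0.50,0.50', 'split_mode': 'layer'}
```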
## Model Manager UI

Open `http://HOST:PORT_B/`.

Features:
- List existing models
- Download models via URL
- Live progress + cancel
## Testing

Tests are parameterized with 100+ cases per endpoint.

```bash
pytest -q
```

## llama.cpp flags reference

Scraped from upstream docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`.

```bash
pwsh scripts/update_llamacpp_flags.ps1
```
llamaCpp.Wrapper.app/__init__.py (new file, 1 empty line)
llamaCpp.Wrapper.app/agents_config.json (new file, 22 lines)
@@ -0,0 +1,22 @@
{
  "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
  "container_name": "ix-llamacpp-llamacpp-1",
  "host_port": 8071,
  "container_port": 8080,
  "web_ui_url": "http://0.0.0.0:8071/",
  "model_host_path": "/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models",
  "model_container_path": "/models",
  "models": [
    "GPT-OSS",
    "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    "openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
    "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
  ],
  "network": "ix-llamacpp_default",
  "subnets": [
    "172.16.18.0/24",
    "fdb7:86ec:b1dd:11::/64"
  ],
  "gpu_count": 2,
  "gpu_name": "NVIDIA RTX 5060 Ti, 16 GB each (per `nvidia-smi` in prior runs)."
}
llamaCpp.Wrapper.app/agents_parser.py (new file, 119 lines)
@@ -0,0 +1,119 @@
import json
import re
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Optional


APP_HEADER_RE = re.compile(r"^### App: (?P<name>.+?)\s*$")
IMAGE_RE = re.compile(r"image=(?P<image>[^\s]+)")
PORT_MAP_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host>\d+|0\.0\.0\.0:(?P<host_ip_port>\d+))")
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")
NETWORK_RE = re.compile(r"- (?P<name>ix-[^\s]+)_default")
SUBNET_RE = re.compile(r"subnets=\[(?P<subnets>[^\]]+)\]")
MODELS_RE = re.compile(r"Models in /models: (?P<models>.+)$")
PORTAL_RE = re.compile(r"Portals: \{\'Web UI\': \'(?P<url>[^\']+)\'\}")
GPU_RE = re.compile(r"GPUs:\s*(?P<count>\d+)x\s*(?P<name>.+)$")
CONTAINER_NAME_RE = re.compile(r"^(?P<name>ix-llamacpp-[^\s]+)")


@dataclass
class LlamacppConfig:
    image: Optional[str] = None
    container_name: Optional[str] = None
    host_port: Optional[int] = None
    container_port: Optional[int] = None
    web_ui_url: Optional[str] = None
    model_host_path: Optional[str] = None
    model_container_path: Optional[str] = None
    models: List[str] = None
    network: Optional[str] = None
    subnets: List[str] = None
    gpu_count: Optional[int] = None
    gpu_name: Optional[str] = None


def _find_section(lines: List[str], app_name: str) -> List[str]:
    start = None
    for i, line in enumerate(lines):
        m = APP_HEADER_RE.match(line.strip())
        if m and m.group("name") == app_name:
            start = i
            break
    if start is None:
        return []
    for j in range(start + 1, len(lines)):
        if APP_HEADER_RE.match(lines[j].strip()):
            return lines[start:j]
    return lines[start:]


def parse_agents(path: Path) -> LlamacppConfig:
    text = path.read_text(encoding="utf-8", errors="ignore")
    lines = text.splitlines()
    section = _find_section(lines, "llamacpp")
    cfg = LlamacppConfig(models=[], subnets=[])

    for line in section:
        if cfg.image is None:
            m = IMAGE_RE.search(line)
            if m:
                cfg.image = m.group("image")
        if cfg.web_ui_url is None:
            m = PORTAL_RE.search(line)
            if m:
                cfg.web_ui_url = m.group("url")
        if cfg.container_port is None or cfg.host_port is None:
            m = PORT_LINE_RE.search(line)
            if m:
                cfg.container_port = int(m.group("container"))
                cfg.host_port = int(m.group("host"))
        if cfg.model_host_path is None or cfg.model_container_path is None:
            m = VOLUME_RE.search(line)
            if m and "/models" in m.group("container"):
                cfg.model_host_path = m.group("host")
                cfg.model_container_path = m.group("container")
        if cfg.network is None:
            m = NETWORK_RE.search(line)
            if m:
                cfg.network = f"{m.group('name')}_default"
        if "subnets=" in line:
            m = SUBNET_RE.search(line)
            if m:
                subnets_raw = m.group("subnets")
                subnets = [s.strip().strip("'") for s in subnets_raw.split(",")]
                cfg.subnets.extend([s for s in subnets if s])
        if "Models in /models:" in line:
            m = MODELS_RE.search(line)
            if m:
                models_raw = m.group("models")
                cfg.models = [s.strip() for s in models_raw.split(",") if s.strip()]

    for line in lines:
        if cfg.gpu_count is None:
            m = GPU_RE.search(line)
            if m:
                cfg.gpu_count = int(m.group("count"))
                cfg.gpu_name = m.group("name").strip()
        if cfg.container_name is None:
            m = CONTAINER_NAME_RE.match(line.strip())
            if m:
                cfg.container_name = m.group("name")

    return cfg


def main() -> None:
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--agents", default="AGENTS.md")
    parser.add_argument("--out", default="app/agents_config.json")
    args = parser.parse_args()

    cfg = parse_agents(Path(args.agents))
    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(asdict(cfg), indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()
llamaCpp.Wrapper.app/api_app.py (new file, 309 lines)
@@ -0,0 +1,309 @@
import asyncio
import logging
import time
from pathlib import Path
from typing import Any, Dict

from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from fastapi.responses import JSONResponse, StreamingResponse
import httpx

from app.config import load_config
from app.llamacpp_client import proxy_json, proxy_raw, proxy_stream
from app.logging_utils import configure_logging
from app.model_registry import find_model, resolve_model, scan_models
from app.openai_translate import responses_to_chat_payload, chat_to_responses, normalize_chat_payload
from app.restart import RestartPlan, trigger_restart
from app.stream_transform import stream_chat_to_responses
from app.truenas_middleware import TrueNASConfig, get_active_model_id, switch_model
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry


configure_logging()
log = logging.getLogger("api_app")


def _model_list_payload(model_dir: str) -> Dict[str, Any]:
    data = []
    for model in scan_models(model_dir):
        data.append({
            "id": model.model_id,
            "object": "model",
            "created": model.created,
            "owned_by": "llama.cpp",
        })
    return {"object": "list", "data": data}


def _requires_json_mode(payload: Dict[str, Any]) -> bool:
    response_format = payload.get("response_format")
    if isinstance(response_format, dict) and response_format.get("type") == "json_object":
        return True
    if payload.get("return_format") == "json":
        return True
    return False


def _apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Strip JSON-mode fields and force a JSON-only system prompt instead.
    payload = dict(payload)
    payload.pop("response_format", None)
    payload.pop("return_format", None)
    messages = payload.get("messages")
    if isinstance(messages, list):
        system_msg = {"role": "system", "content": "Respond only with a valid JSON object."}
        if not messages or messages[0].get("role") != "system":
            payload["messages"] = [system_msg, *messages]
        else:
            payload["messages"] = [system_msg, *messages[1:]]
    return payload


async def _proxy_json_with_retry(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
    delay_s: float = 3.0,
) -> httpx.Response:
    # Retry while llama.cpp reports 503 "Loading model" or the connection fails.
    deadline = time.time() + timeout_s
    attempt = 0
    last_exc: Exception | None = None
    while time.time() < deadline:
        attempt += 1
        try:
            resp = await proxy_json(base_url, path, method, headers, payload, timeout_s)
            if resp.status_code == 503:
                try:
                    data = resp.json()
                except Exception:
                    data = {}
                message = ""
                if isinstance(data, dict):
                    err = data.get("error")
                    if isinstance(err, dict):
                        message = str(err.get("message") or "")
                    else:
                        message = str(data.get("message") or "")
                if "loading model" in message.lower():
                    log.warning("llama.cpp still loading model, retrying (attempt %s)", attempt)
                    await asyncio.sleep(delay_s)
                    continue
            return resp
        except httpx.RequestError as exc:
            last_exc = exc
            log.warning("Proxy request failed (attempt %s): %s", attempt, exc)
            await asyncio.sleep(delay_s)
    if last_exc:
        raise last_exc
    raise RuntimeError("proxy retry deadline exceeded")


async def _get_active_model_from_truenas(cfg: TrueNASConfig) -> str:
    try:
        return await get_active_model_id(cfg)
    except Exception as exc:
        log.warning("Failed to read active model from TrueNAS config: %s", exc)
        return ""


async def _wait_for_active_model(cfg: TrueNASConfig, model_id: str, timeout_s: float) -> None:
    deadline = asyncio.get_event_loop().time() + timeout_s
    while asyncio.get_event_loop().time() < deadline:
        active = await _get_active_model_from_truenas(cfg)
        if active == model_id:
            return
        await asyncio.sleep(2)
    raise RuntimeError(f"active model did not switch to {model_id}")


async def _ensure_model_loaded(model_id: str, model_dir: str) -> str:
    # Prefer switching via TrueNAS middleware when credentials are configured;
    # otherwise fall back to the configured restart hook.
    cfg = load_config()
    model = resolve_model(model_dir, model_id, cfg.model_aliases)
    if not model:
        log.warning("Requested model not found: %s", model_id)
        raise HTTPException(status_code=404, detail="model not found")
    if model.model_id != model_id:
        log.info("Resolved model alias %s -> %s", model_id, model.model_id)

    truenas_cfg = None
    if cfg.truenas_ws_url and cfg.truenas_api_key:
        truenas_cfg = TrueNASConfig(
            ws_url=cfg.truenas_ws_url,
            api_key=cfg.truenas_api_key,
            api_user=cfg.truenas_api_user,
            app_name=cfg.truenas_app_name,
            verify_ssl=cfg.truenas_verify_ssl,
        )
        active_id = await _get_active_model_from_truenas(truenas_cfg)
        if active_id and active_id == model.model_id:
            return model.model_id

    if truenas_cfg:
        log.info("Switching model via API model=%s args=%s extra_args=%s", model.model_id, cfg.llamacpp_args, cfg.llamacpp_extra_args)
        try:
            model_path = str((Path(cfg.model_container_dir) / model.model_id))
            await switch_model(
                truenas_cfg,
                model_path,
                cfg.llamacpp_args,
                cfg.llamacpp_extra_args,
            )
            await _wait_for_active_model(truenas_cfg, model.model_id, cfg.switch_timeout_s)
        except Exception as exc:
            log.exception("TrueNAS model switch failed")
            raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
        warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
        log.info("Running warmup prompt after model switch: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
        await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
        return model.model_id

    plan = RestartPlan(
        method=cfg.restart_method,
        command=cfg.restart_command,
        url=cfg.restart_url,
        allowed_container=cfg.allowed_container,
    )
    log.info("Triggering restart for model=%s method=%s", model.model_id, cfg.restart_method)
    payload = {
        "model_id": model.model_id,
        "model_path": str(Path(cfg.model_container_dir) / model.model_id),
        "gpu_count": cfg.gpu_count_runtime or cfg.agents.gpu_count,
        "llamacpp_args": cfg.llamacpp_args,
        "llamacpp_extra_args": cfg.llamacpp_extra_args,
    }
    await trigger_restart(plan, payload=payload)
    warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
    log.info("Running warmup prompt after restart: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
    await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
    return model.model_id


def create_api_app() -> FastAPI:
    cfg = load_config()
    app = FastAPI(title="llama.cpp OpenAI Wrapper", version="0.1.0")
    router = APIRouter()

    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        log.info("Request %s %s", request.method, request.url.path)
        return await call_next(request)

    @app.exception_handler(Exception)
    async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
        log.exception("Unhandled error")
        return JSONResponse(status_code=500, content={"detail": str(exc)})

    @router.get("/health")
    async def health() -> Dict[str, Any]:
        return {
            "status": "ok",
            "base_url": cfg.base_url,
            "model_dir": cfg.model_dir,
            "agents": {
                "image": cfg.agents.image,
                "container_name": cfg.agents.container_name,
                "network": cfg.agents.network,
                "gpu_count": cfg.agents.gpu_count,
            },
            "gpu_count_runtime": cfg.gpu_count_runtime,
        }

    @router.get("/v1/models")
    async def list_models() -> Dict[str, Any]:
        log.info("Listing models")
        return _model_list_payload(cfg.model_dir)

    @router.get("/v1/models/{model_id}")
    async def get_model(model_id: str) -> Dict[str, Any]:
        log.info("Get model %s", model_id)
        model = resolve_model(cfg.model_dir, model_id, cfg.model_aliases) or find_model(cfg.model_dir, model_id)
        if not model:
            raise HTTPException(status_code=404, detail="model not found")
        return {
            "id": model.model_id,
            "object": "model",
            "created": model.created,
            "owned_by": "llama.cpp",
        }

    @router.post("/v1/chat/completions")
    async def chat_completions(request: Request) -> Response:
        payload = await request.json()
        payload = normalize_chat_payload(payload)
        model_id = payload.get("model")
        log.info("Chat completions model=%s stream=%s", model_id, bool(payload.get("stream")))
        if model_id:
            resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
            payload["model"] = resolved
        stream = bool(payload.get("stream"))
        if stream and _requires_json_mode(payload):
            payload = _apply_json_fallback(payload)
        if stream:
            streamer = proxy_stream(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
            return StreamingResponse(streamer, media_type="text/event-stream")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
        if resp.status_code >= 500 and _requires_json_mode(payload):
            log.info("Retrying chat completion with JSON fallback prompt")
            fallback_payload = _apply_json_fallback(payload)
            resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
        try:
            return JSONResponse(status_code=resp.status_code, content=resp.json())
        except Exception:
            return Response(
                status_code=resp.status_code,
                content=resp.content,
                media_type=resp.headers.get("content-type"),
            )

    @router.post("/v1/responses")
    async def responses(request: Request) -> Response:
        payload = await request.json()
        chat_payload, model_id = responses_to_chat_payload(payload)
        log.info("Responses model=%s stream=%s", model_id, bool(chat_payload.get("stream")))
        if model_id:
            resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
            chat_payload["model"] = resolved
        stream = bool(chat_payload.get("stream"))
        if stream and _requires_json_mode(chat_payload):
            chat_payload = _apply_json_fallback(chat_payload)
        if stream:
            streamer = stream_chat_to_responses(
                cfg.base_url,
                dict(request.headers),
                chat_payload,
                cfg.proxy_timeout_s,
            )
            return StreamingResponse(streamer, media_type="text/event-stream")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), chat_payload, cfg.proxy_timeout_s)
        if resp.status_code >= 500 and _requires_json_mode(chat_payload):
            log.info("Retrying responses with JSON fallback prompt")
            fallback_payload = _apply_json_fallback(chat_payload)
            resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
        resp.raise_for_status()
        return JSONResponse(status_code=200, content=chat_to_responses(resp.json(), model_id))

    @router.post("/v1/embeddings")
    async def embeddings(request: Request) -> Response:
        payload = await request.json()
        log.info("Embeddings")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/embeddings", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
        try:
            return JSONResponse(status_code=resp.status_code, content=resp.json())
        except Exception:
            return Response(
                status_code=resp.status_code,
                content=resp.content,
                media_type=resp.headers.get("content-type"),
            )

    @router.api_route("/proxy/llamacpp/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"])
    async def passthrough(path: str, request: Request) -> Response:
        body = await request.body()
        resp = await proxy_raw(cfg.base_url, f"/{path}", request.method, dict(request.headers), body, cfg.proxy_timeout_s)
        return Response(status_code=resp.status_code, content=resp.content, headers=dict(resp.headers))

    app.include_router(router)
    return app
llamaCpp.Wrapper.app/config.py (new file, 214 lines)
@@ -0,0 +1,214 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class AgentsRuntime:
    image: Optional[str]
    container_name: Optional[str]
    host_port: Optional[int]
    container_port: Optional[int]
    web_ui_url: Optional[str]
    model_host_path: Optional[str]
    model_container_path: Optional[str]
    models: List[str]
    network: Optional[str]
    subnets: List[str]
    gpu_count: Optional[int]
    gpu_name: Optional[str]


@dataclass
class AppConfig:
    api_port: int
    ui_port: int
    base_url: str
    model_dir: str
    model_container_dir: str
    download_dir: str
    download_max_concurrent: int
    download_allowlist: List[str]
    restart_method: str
    restart_command: Optional[str]
    restart_url: Optional[str]
    reload_on_new_model: bool
    proxy_timeout_s: float
    switch_timeout_s: float
    gpu_count_runtime: Optional[int]
    llamacpp_args: Dict[str, str]
    llamacpp_extra_args: str
    truenas_api_key: Optional[str]
    truenas_api_user: Optional[str]
    truenas_app_name: str
    truenas_ws_url: Optional[str]
    truenas_verify_ssl: bool
    allowed_container: Optional[str]
    warmup_prompt_path: str
    llamacpp_container_name: Optional[str]
    model_aliases: Dict[str, str]
    agents: AgentsRuntime


def _load_agents_config(path: Path) -> AgentsRuntime:
    if not path.exists():
        return AgentsRuntime(
            image=None,
            container_name=None,
            host_port=None,
            container_port=None,
            web_ui_url=None,
            model_host_path=None,
            model_container_path=None,
            models=[],
            network=None,
            subnets=[],
            gpu_count=None,
            gpu_name=None,
        )
    raw = json.loads(path.read_text(encoding="utf-8"))
    return AgentsRuntime(
        image=raw.get("image"),
        container_name=raw.get("container_name"),
        host_port=raw.get("host_port"),
        container_port=raw.get("container_port"),
        web_ui_url=raw.get("web_ui_url"),
        model_host_path=raw.get("model_host_path"),
        model_container_path=raw.get("model_container_path"),
        models=raw.get("models") or [],
        network=raw.get("network"),
        subnets=raw.get("subnets") or [],
        gpu_count=raw.get("gpu_count"),
        gpu_name=raw.get("gpu_name"),
    )


def _infer_gpu_count_runtime() -> Optional[int]:
    visible = os.getenv("CUDA_VISIBLE_DEVICES") or os.getenv("NVIDIA_VISIBLE_DEVICES")
    if visible and visible not in {"all", "void"}:
        parts = [p.strip() for p in visible.split(",") if p.strip()]
        if parts:
            return len(parts)
    return None


def _default_base_url(agents: AgentsRuntime) -> str:
    if agents.container_name and agents.container_port:
        return f"http://{agents.container_name}:{agents.container_port}"
    if agents.host_port:
        return f"http://127.0.0.1:{agents.host_port}"
    return "http://127.0.0.1:8080"


def load_config() -> AppConfig:
    agents_path = Path(os.getenv("AGENTS_CONFIG_PATH", "app/agents_config.json"))
    agents = _load_agents_config(agents_path)

    api_port = int(os.getenv("PORT_A", "9093"))
    ui_port = int(os.getenv("PORT_B", "9094"))

    base_url = os.getenv("LLAMACPP_BASE_URL") or _default_base_url(agents)
    model_dir = os.getenv("MODEL_DIR") or agents.model_container_path or "/models"
    model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or model_dir

    download_dir = os.getenv("MODEL_DOWNLOAD_DIR") or model_dir
    download_max = int(os.getenv("MODEL_DOWNLOAD_MAX_CONCURRENT", "2"))

    allowlist_raw = os.getenv("MODEL_DOWNLOAD_ALLOWLIST", "")
    allowlist = [item.strip() for item in allowlist_raw.split(",") if item.strip()]

    restart_method = os.getenv("LLAMACPP_RESTART_METHOD", "none").lower()
    restart_command = os.getenv("LLAMACPP_RESTART_COMMAND")
    restart_url = os.getenv("LLAMACPP_RESTART_URL")

    reload_on_new_model = os.getenv("RELOAD_ON_NEW_MODEL", "false").lower() in {"1", "true", "yes"}
    proxy_timeout_s = float(os.getenv("LLAMACPP_PROXY_TIMEOUT_S", "600"))
    switch_timeout_s = float(os.getenv("LLAMACPP_SWITCH_TIMEOUT_S", "300"))

    gpu_count_runtime = _infer_gpu_count_runtime()

    llamacpp_args = {}
    args_map = {
        "LLAMACPP_TENSOR_SPLIT": "tensor_split",
        "LLAMACPP_SPLIT_MODE": "split_mode",
        "LLAMACPP_N_GPU_LAYERS": "n_gpu_layers",
        "LLAMACPP_CTX_SIZE": "ctx_size",
        "LLAMACPP_BATCH_SIZE": "batch_size",
        "LLAMACPP_UBATCH_SIZE": "ubatch_size",
        "LLAMACPP_CACHE_TYPE_K": "cache_type_k",
        "LLAMACPP_CACHE_TYPE_V": "cache_type_v",
        "LLAMACPP_FLASH_ATTN": "flash_attn",
    }
    for env_key, arg_key in args_map.items():
        value = os.getenv(env_key)
        if value is not None and value != "":
            llamacpp_args[arg_key] = value
    llamacpp_extra_args = os.getenv("LLAMACPP_EXTRA_ARGS", "")

    truenas_api_key = os.getenv("TRUENAS_API_KEY")
    truenas_api_user = os.getenv("TRUENAS_API_USER")
    truenas_app_name = os.getenv("TRUENAS_APP_NAME", "llamacpp")
    truenas_ws_url = os.getenv("TRUENAS_WS_URL")
    truenas_api_url = os.getenv("TRUENAS_API_URL")
    if not truenas_ws_url and truenas_api_url:
        if truenas_api_url.startswith("https://"):
            truenas_ws_url = "wss://" + truenas_api_url[len("https://") :].rstrip("/") + "/websocket"
        elif truenas_api_url.startswith("http://"):
            truenas_ws_url = "ws://" + truenas_api_url[len("http://") :].rstrip("/") + "/websocket"
    truenas_verify_ssl = os.getenv("TRUENAS_VERIFY_SSL", "false").lower() in {"1", "true", "yes"}
    allowed_container = os.getenv("LLAMACPP_TARGET_CONTAINER") or agents.container_name
    llamacpp_container_name = os.getenv("LLAMACPP_CONTAINER_NAME") or agents.container_name
    warmup_prompt_path = os.getenv("WARMUP_PROMPT_PATH", str(Path("trades_company_stock.txt").resolve()))
    if truenas_ws_url and (":" in model_container_dir[:3] or "\\" in model_container_dir):
        # Guard against a Windows-style path leaking in via MODEL_DIR when the
        # middleware expects a container path.
        model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or "/models"
    aliases_raw = os.getenv("MODEL_ALIASES", "")
    model_aliases: Dict[str, str] = {}
    if aliases_raw:
        try:
            model_aliases = json.loads(aliases_raw)
        except json.JSONDecodeError:
            for item in aliases_raw.split(","):
                if "=" in item:
                    key, value = item.split("=", 1)
                    model_aliases[key.strip()] = value.strip()

    gpu_count = gpu_count_runtime or agents.gpu_count
    if gpu_count and gpu_count >= 2:
        if "tensor_split" not in llamacpp_args:
            ratio = 1.0 / float(gpu_count)
            split = ",".join([f"{ratio:.2f}"] * gpu_count)
            llamacpp_args["tensor_split"] = split
        if "split_mode" not in llamacpp_args:
            llamacpp_args["split_mode"] = "layer"

    return AppConfig(
        api_port=api_port,
        ui_port=ui_port,
        base_url=base_url,
        model_dir=model_dir,
        model_container_dir=model_container_dir,
        download_dir=download_dir,
        download_max_concurrent=download_max,
        download_allowlist=allowlist,
        restart_method=restart_method,
        restart_command=restart_command,
        restart_url=restart_url,
        reload_on_new_model=reload_on_new_model,
        proxy_timeout_s=proxy_timeout_s,
        switch_timeout_s=switch_timeout_s,
        gpu_count_runtime=gpu_count_runtime,
        llamacpp_args=llamacpp_args,
        llamacpp_extra_args=llamacpp_extra_args,
        truenas_api_key=truenas_api_key,
        truenas_api_user=truenas_api_user,
        truenas_app_name=truenas_app_name,
        truenas_ws_url=truenas_ws_url,
        truenas_verify_ssl=truenas_verify_ssl,
        allowed_container=allowed_container,
        warmup_prompt_path=warmup_prompt_path,
        llamacpp_container_name=llamacpp_container_name,
        model_aliases=model_aliases,
        agents=agents,
    )
llamaCpp.Wrapper.app/docker_logs.py (new file, 61 lines)
@@ -0,0 +1,61 @@
import json
import logging
import os
from typing import Optional

import httpx


log = logging.getLogger("docker_logs")


def _docker_transport() -> httpx.AsyncHTTPTransport:
    sock_path = os.getenv("DOCKER_SOCK", "/var/run/docker.sock")
    return httpx.AsyncHTTPTransport(uds=sock_path)


async def _docker_get(path: str, params: Optional[dict] = None) -> httpx.Response:
    timeout = httpx.Timeout(10.0, read=10.0)
    async with httpx.AsyncClient(transport=_docker_transport(), base_url="http://docker", timeout=timeout) as client:
        resp = await client.get(path, params=params)
        resp.raise_for_status()
        return resp


def _decode_docker_stream(data: bytes) -> str:
    # Docker multiplexes stdout/stderr into frames with an 8-byte header
    # (stream type + big-endian payload size). Both stream types are kept.
    if not data:
        return ""
    out = bytearray()
    idx = 0
    while idx + 8 <= len(data):
        stream_type = data[idx]  # 1 = stdout, 2 = stderr; both treated alike
        size = int.from_bytes(data[idx + 4: idx + 8], "big")
        idx += 8
        if idx + size > len(data):
            break
        chunk = data[idx: idx + size]
        idx += size
        out.extend(chunk)
    if out:
        return out.decode("utf-8", errors="replace")
    # Fall back to treating the payload as a plain (non-multiplexed) stream.
    return data.decode("utf-8", errors="replace")


async def docker_container_logs(container_name: str, tail_lines: int = 200) -> str:
    filters = json.dumps({"name": [container_name]})
    resp = await _docker_get("/containers/json", params={"filters": filters})
    containers = resp.json() or []
    if not containers:
        log.info("No docker container found for name=%s", container_name)
        return ""
    container_id = containers[0].get("Id")
    if not container_id:
        return ""
    resp = await _docker_get(
        f"/containers/{container_id}/logs",
        params={"stdout": 1, "stderr": 1, "tail": tail_lines},
    )
    return _decode_docker_stream(resp.content)
llamaCpp.Wrapper.app/download_manager.py (new file, 141 lines)
@@ -0,0 +1,141 @@
import asyncio
import fnmatch
import logging
import os
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Optional

import httpx

from app.config import AppConfig
from app.logging_utils import configure_logging
from app.restart import RestartPlan, trigger_restart

configure_logging()
log = logging.getLogger("download_manager")


@dataclass
class DownloadStatus:
    download_id: str
    url: str
    filename: str
    status: str
    bytes_total: Optional[int] = None
    bytes_downloaded: int = 0
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    error: Optional[str] = None


class DownloadManager:
    def __init__(self, cfg: AppConfig, broadcaster=None) -> None:
        self.cfg = cfg
        self._downloads: Dict[str, DownloadStatus] = {}
        self._tasks: Dict[str, asyncio.Task] = {}
        self._semaphore = asyncio.Semaphore(cfg.download_max_concurrent)
        self._broadcaster = broadcaster

    async def _emit(self, payload: dict) -> None:
        if self._broadcaster:
            await self._broadcaster.publish(payload)

    def list_downloads(self) -> Dict[str, dict]:
        return {k: asdict(v) for k, v in self._downloads.items()}

    def get(self, download_id: str) -> Optional[DownloadStatus]:
        return self._downloads.get(download_id)

    def _is_allowed(self, url: str) -> bool:
        if not self.cfg.download_allowlist:
            return True
        return any(fnmatch.fnmatch(url, pattern) for pattern in self.cfg.download_allowlist)

    async def start(self, url: str, filename: Optional[str] = None) -> DownloadStatus:
        if not self._is_allowed(url):
            raise ValueError("url not allowed by allowlist")
        if not filename:
            filename = os.path.basename(url.split("?")[0]) or f"model-{uuid.uuid4().hex}.gguf"
        log.info("Download requested url=%s filename=%s", url, filename)
        download_id = uuid.uuid4().hex
        status = DownloadStatus(download_id=download_id, url=url, filename=filename, status="queued")
        self._downloads[download_id] = status
        task = asyncio.create_task(self._run_download(status))
        self._tasks[download_id] = task
        await self._emit({"type": "download_status", "download": asdict(status)})
        return status

    async def cancel(self, download_id: str) -> bool:
        task = self._tasks.get(download_id)
        if task:
            task.cancel()
            status = self._downloads.get(download_id)
            if status:
                log.info("Download cancelled id=%s filename=%s", download_id, status.filename)
                await self._emit({"type": "download_status", "download": asdict(status)})
            return True
        return False

    async def _run_download(self, status: DownloadStatus) -> None:
        status.status = "downloading"
        base = Path(self.cfg.download_dir)
        base.mkdir(parents=True, exist_ok=True)
        tmp_path = base / f".{status.filename}.partial"
        final_path = base / status.filename
        last_emit = 0.0

        try:
            async with self._semaphore:
                async with httpx.AsyncClient(timeout=None, follow_redirects=True) as client:
                    async with client.stream("GET", status.url) as resp:
                        resp.raise_for_status()
                        length = resp.headers.get("content-length")
                        if length:
                            status.bytes_total = int(length)
                        with tmp_path.open("wb") as f:
                            async for chunk in resp.aiter_bytes():
                                if chunk:
                                    f.write(chunk)
                                    status.bytes_downloaded += len(chunk)
                                    now = time.time()
                                    if now - last_emit >= 1:
                                        last_emit = now
                                        await self._emit({"type": "download_progress", "download": asdict(status)})
            if tmp_path.exists():
                tmp_path.replace(final_path)
            status.status = "completed"
            status.finished_at = time.time()
            log.info("Download completed id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_completed", "download": asdict(status)})
            if self.cfg.reload_on_new_model:
                plan = RestartPlan(
                    method=self.cfg.restart_method,
                    command=self.cfg.restart_command,
                    url=self.cfg.restart_url,
                    allowed_container=self.cfg.allowed_container,
                )
                await trigger_restart(
                    plan,
                    payload={
                        "reason": "new_model",
                        "model_id": status.filename,
                        "llamacpp_args": self.cfg.llamacpp_args,
                        "llamacpp_extra_args": self.cfg.llamacpp_extra_args,
                    },
                )
        except asyncio.CancelledError:
            status.status = "cancelled"
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download cancelled id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_cancelled", "download": asdict(status)})
        except Exception as exc:
            status.status = "error"
            status.error = str(exc)
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download error id=%s filename=%s error=%s", status.download_id, status.filename, exc)
            await self._emit({"type": "download_error", "download": asdict(status)})
llamaCpp.Wrapper.app/llamacpp_client.py (new file, 52 lines)
@@ -0,0 +1,52 @@
|
||||
import logging
|
||||
from typing import AsyncIterator, Dict, Optional
|
||||
|
||||
import httpx
|
||||
|
||||
|
||||
log = logging.getLogger("llamacpp_client")
|
||||
|
||||
|
||||
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
|
||||
drop = {"host", "content-length"}
|
||||
return {k: v for k, v in headers.items() if k.lower() not in drop}
|
||||
|
||||
|
||||
async def proxy_json(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
payload: Optional[dict],
|
||||
timeout_s: float,
|
||||
) -> httpx.Response:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
return await client.request(method, path, headers=_filter_headers(headers), json=payload)
|
||||
|
||||
|
||||
async def proxy_raw(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
body: Optional[bytes],
|
||||
timeout_s: float,
|
||||
) -> httpx.Response:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
return await client.request(method, path, headers=_filter_headers(headers), content=body)
|
||||
|
||||
|
||||
async def proxy_stream(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
payload: Optional[dict],
|
||||
timeout_s: float,
|
||||
) -> AsyncIterator[bytes]:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
async with client.stream(method, path, headers=_filter_headers(headers), json=payload) as resp:
|
||||
resp.raise_for_status()
|
||||
async for chunk in resp.aiter_bytes():
|
||||
if chunk:
|
||||
yield chunk
|
||||
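A usage sketch for the proxy helpers above, assuming llama.cpp listens on 127.0.0.1:8080 (the address is an assumption); `_filter_headers` strips `host` and `content-length` so httpx re-frames the forwarded request:

import asyncio

from app.llamacpp_client import proxy_json


async def main() -> None:
    resp = await proxy_json(
        base_url="http://127.0.0.1:8080",  # assumed llama.cpp address
        path="/v1/chat/completions",
        method="POST",
        headers={"host": "dropped-anyway", "content-type": "application/json"},
        payload={"model": "any", "messages": [{"role": "user", "content": "hi"}]},
        timeout_s=60.0,
    )
    print(resp.status_code, resp.json())


asyncio.run(main())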
13  llamaCpp.Wrapper.app/logging_utils.py  Normal file
@@ -0,0 +1,13 @@
import logging
import os


def configure_logging() -> None:
    if logging.getLogger().handlers:
        return
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
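Because `configure_logging()` returns early once the root logger has handlers, it is safe to call from every module; a small sketch (the LOG_LEVEL override is purely illustrative):

import os

os.environ.setdefault("LOG_LEVEL", "DEBUG")  # illustrative override

from app.logging_utils import configure_logging

configure_logging()
configure_logging()  # second call is a no-op: handlers already installed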
45  llamaCpp.Wrapper.app/model_registry.py  Normal file
@@ -0,0 +1,45 @@
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ModelInfo:
    model_id: str
    created: int
    size: int
    path: Path


def scan_models(model_dir: str) -> List[ModelInfo]:
    base = Path(model_dir)
    if not base.exists():
        return []
    models: List[ModelInfo] = []
    now = int(time.time())
    for entry in base.iterdir():
        if entry.name.endswith(".partial"):
            continue
        if entry.is_file():
            size = entry.stat().st_size
            models.append(ModelInfo(model_id=entry.name, created=now, size=size, path=entry))
        elif entry.is_dir():
            models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
    models.sort(key=lambda m: m.model_id.lower())
    return models


def find_model(model_dir: str, model_id: str) -> Optional[ModelInfo]:
    for model in scan_models(model_dir):
        if model.model_id == model_id:
            return model
    return None


def resolve_model(model_dir: str, requested: str, aliases: Dict[str, str]) -> Optional[ModelInfo]:
    if not requested:
        return None
    if requested in aliases:
        requested = aliases[requested]
    return find_model(model_dir, requested)
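A short sketch of the registry functions, with a hypothetical model directory and alias map (both placeholders); note that `.partial` files written mid-download are skipped by `scan_models`:

from app.model_registry import resolve_model, scan_models

for model in scan_models("/mnt/models"):  # placeholder path
    print(model.model_id, model.size)

info = resolve_model("/mnt/models", "default", {"default": "qwen2.5-7b-q4_k_m.gguf"})  # placeholder alias
print(info.path if info else "not found")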
140  llamaCpp.Wrapper.app/openai_translate.py  Normal file
@@ -0,0 +1,140 @@
import time
import uuid
from typing import Any, Dict, List, Tuple


def _messages_from_input(input_value: Any) -> List[Dict[str, Any]]:
    if isinstance(input_value, str):
        return [{"role": "user", "content": input_value}]
    if isinstance(input_value, list):
        messages: List[Dict[str, Any]] = []
        for item in input_value:
            if isinstance(item, str):
                messages.append({"role": "user", "content": item})
            elif isinstance(item, dict):
                role = item.get("role") or "user"
                content = item.get("content") or item.get("text") or ""
                if item.get("type") == "input_image":
                    content = [{"type": "image_url", "image_url": {"url": item.get("image_url", "")}}]
                messages.append({"role": role, "content": content})
        return messages
    return [{"role": "user", "content": str(input_value)}]


def _normalize_tools(tools: Any) -> Any:
    if not isinstance(tools, list):
        return tools
    normalized = []
    for tool in tools:
        if not isinstance(tool, dict):
            normalized.append(tool)
            continue
        if "function" in tool:
            normalized.append(tool)
            continue
        if tool.get("type") == "function" and ("name" in tool or "parameters" in tool or "description" in tool):
            function = {
                "name": tool.get("name"),
                "parameters": tool.get("parameters"),
                "description": tool.get("description"),
            }
            function = {k: v for k, v in function.items() if v is not None}
            normalized.append({"type": "function", "function": function})
            continue
        normalized.append(tool)
    return normalized


def _normalize_tool_choice(tool_choice: Any) -> Any:
    if not isinstance(tool_choice, dict):
        return tool_choice
    if "function" in tool_choice:
        return tool_choice
    if tool_choice.get("type") == "function" and "name" in tool_choice:
        return {"type": "function", "function": {"name": tool_choice.get("name")}}
    return tool_choice


def normalize_chat_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    if "return_format" in payload and "response_format" not in payload:
        if payload["return_format"] == "json":
            payload["response_format"] = {"type": "json_object"}
    if "functions" in payload and "tools" not in payload:
        functions = payload.get("functions")
        if isinstance(functions, list):
            tools = []
            for func in functions:
                if isinstance(func, dict):
                    tools.append({"type": "function", "function": func})
            if tools:
                payload["tools"] = tools
        payload.pop("functions", None)
    if "tools" in payload:
        payload["tools"] = _normalize_tools(payload.get("tools"))
    if "tool_choice" in payload:
        payload["tool_choice"] = _normalize_tool_choice(payload.get("tool_choice"))
    return payload


def responses_to_chat_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    model = payload.get("model") or "unknown"
    messages = _messages_from_input(payload.get("input", ""))

    chat_payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
    }

    passthrough_keys = [
        "temperature",
        "top_p",
        "max_output_tokens",
        "stream",
        "tools",
        "tool_choice",
        "response_format",
        "return_format",
        "frequency_penalty",
        "presence_penalty",
        "seed",
        "stop",
    ]

    for key in passthrough_keys:
        if key in payload:
            if key == "max_output_tokens":
                chat_payload["max_tokens"] = payload[key]
            elif key == "return_format" and payload[key] == "json":
                chat_payload["response_format"] = {"type": "json_object"}
            else:
                chat_payload[key] = payload[key]

    return normalize_chat_payload(chat_payload), model


def chat_to_responses(chat: Dict[str, Any], model: str) -> Dict[str, Any]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    content = ""
    if chat.get("choices"):
        choice = chat["choices"][0]
        message = choice.get("message") or {}
        content = message.get("content") or ""

    return {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": f"msg_{uuid.uuid4().hex}",
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": content}
                ],
            }
        ],
        "usage": chat.get("usage", {}),
    }
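A round-trip sketch of the translation layer above: a Responses-style request becomes a Chat Completions request, and a Chat Completions reply is wrapped back into the Responses shape (the payloads are made up for illustration):

from app.openai_translate import chat_to_responses, responses_to_chat_payload

chat_payload, model = responses_to_chat_payload(
    {"model": "local-model", "input": "Say hi", "max_output_tokens": 16}
)
assert chat_payload["max_tokens"] == 16
assert chat_payload["messages"] == [{"role": "user", "content": "Say hi"}]

fake_chat = {"choices": [{"message": {"content": "hi"}}], "usage": {}}
print(chat_to_responses(fake_chat, model)["output"][0]["content"][0]["text"])  # -> hi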
51  llamaCpp.Wrapper.app/restart.py  Normal file
@@ -0,0 +1,51 @@
import asyncio
import logging
import shlex
from dataclasses import dataclass
from typing import Optional

import httpx


log = logging.getLogger("llamacpp_restart")


@dataclass
class RestartPlan:
    method: str
    command: Optional[str]
    url: Optional[str]
    allowed_container: Optional[str] = None


async def trigger_restart(plan: RestartPlan, payload: Optional[dict] = None) -> None:
    if plan.method == "none":
        log.warning("Restart requested but restart method is none")
        return
    if plan.method == "http":
        if not plan.url:
            raise RuntimeError("restart url is required for http method")
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(plan.url, json=payload or {})
            resp.raise_for_status()
        return
    if plan.method == "docker":
        if not plan.command:
            raise RuntimeError("restart command must include container id or name for docker method")
        if plan.allowed_container and plan.command != plan.allowed_container:
            raise RuntimeError("docker restart command not allowed for non-target container")
        async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds="/var/run/docker.sock"), timeout=30) as client:
            resp = await client.post(f"http://docker/containers/{plan.command}/restart")
            resp.raise_for_status()
        return
    if plan.method == "shell":
        if not plan.command:
            raise RuntimeError("restart command is required for shell method")
        cmd = plan.command
        args = shlex.split(cmd)
        proc = await asyncio.create_subprocess_exec(*args)
        code = await proc.wait()
        if code != 0:
            raise RuntimeError(f"restart command failed with exit code {code}")
        return
    raise RuntimeError(f"unknown restart method {plan.method}")
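A sketch of triggering the "http" restart path; the URL is a placeholder, and the call raises unless something is actually listening there:

import asyncio

from app.restart import RestartPlan, trigger_restart

plan = RestartPlan(method="http", command=None, url="http://127.0.0.1:9000/restart")  # placeholder URL
asyncio.run(trigger_restart(plan, payload={"reason": "manual"}))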
35  llamaCpp.Wrapper.app/run.py  Normal file
@@ -0,0 +1,35 @@
import os
import signal
import subprocess
import sys

from app.config import load_config


def main() -> None:
    cfg = load_config()
    python = sys.executable

    api_cmd = [python, "-m", "uvicorn", "app.api_app:create_api_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.api_port)]
    ui_cmd = [python, "-m", "uvicorn", "app.ui_app:create_ui_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.ui_port)]

    procs = [subprocess.Popen(api_cmd)]
    if cfg.ui_port != cfg.api_port:
        procs.append(subprocess.Popen(ui_cmd))

    def shutdown(_sig, _frame):
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait(timeout=10)
        sys.exit(0)

    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)

    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    main()
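The launcher spawns uvicorn with `--factory`, so `create_api_app`/`create_ui_app` are called at startup rather than imported as module-level apps. A single-process equivalent for local debugging, assuming the same module path (the port is an assumption):

import uvicorn

uvicorn.run("app.api_app:create_api_app", factory=True, host="0.0.0.0", port=8000)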
102  llamaCpp.Wrapper.app/stream_transform.py  Normal file
@@ -0,0 +1,102 @@
import json
import time
import uuid
from typing import Any, AsyncIterator, Dict

import httpx


def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
    drop = {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in drop}


async def stream_chat_to_responses(
    base_url: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
) -> AsyncIterator[bytes]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    model = payload.get("model") or "unknown"
    msg_id = f"msg_{uuid.uuid4().hex}"
    output_text = ""

    response_stub = {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": msg_id,
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": ""}
                ],
            }
        ],
    }

    yield _sse_event("response.created", {"type": "response.created", "response": response_stub})

    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        async with client.stream(
            "POST",
            "/v1/chat/completions",
            headers=_filter_headers(headers),
            json=payload,
        ) as resp:
            resp.raise_for_status()
            buffer = ""
            async for chunk in resp.aiter_text():
                buffer += chunk
                while "\n\n" in buffer:
                    block, buffer = buffer.split("\n\n", 1)
                    lines = [line for line in block.splitlines() if line.startswith("data:")]
                    if not lines:
                        continue
                    data_str = "\n".join(line[len("data:"):].strip() for line in lines)
                    if data_str == "[DONE]":
                        continue
                    try:
                        data = json.loads(data_str)
                    except json.JSONDecodeError:
                        continue
                    choices = data.get("choices") or []
                    if not choices:
                        continue
                    delta = choices[0].get("delta") or {}
                    text_delta = delta.get("content")
                    if text_delta:
                        output_text += text_delta
                        yield _sse_event(
                            "response.output_text.delta",
                            {
                                "type": "response.output_text.delta",
                                "delta": text_delta,
                                "item_id": msg_id,
                                "output_index": 0,
                                "content_index": 0,
                            },
                        )

    yield _sse_event(
        "response.output_text.done",
        {
            "type": "response.output_text.done",
            "text": output_text,
            "item_id": msg_id,
            "output_index": 0,
            "content_index": 0,
        },
    )

    response_stub["output"][0]["content"][0]["text"] = output_text
    yield _sse_event("response.completed", {"type": "response.completed", "response": response_stub})
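A consumer sketch for the SSE transformer above; the base_url is an assumption, and the payload should request streaming so the upstream actually emits deltas:

import asyncio

from app.stream_transform import stream_chat_to_responses


async def main() -> None:
    async for event in stream_chat_to_responses(
        base_url="http://127.0.0.1:8080",  # assumed llama.cpp address
        headers={},
        payload={"model": "any", "messages": [{"role": "user", "content": "hi"}], "stream": True},
        timeout_s=120.0,
    ):
        print(event.decode("utf-8"), end="")


asyncio.run(main())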
313  llamaCpp.Wrapper.app/truenas_middleware.py  Normal file
@@ -0,0 +1,313 @@
import json
import logging
import shlex
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional

import websockets
import yaml


log = logging.getLogger("truenas_middleware")


@dataclass
class TrueNASConfig:
    ws_url: str
    api_key: str
    api_user: Optional[str]
    app_name: str
    verify_ssl: bool = False


def _parse_compose(raw: Any) -> Dict[str, Any]:
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        text = raw.strip()
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return yaml.safe_load(text)
    raise ValueError("Unsupported compose payload")


def _command_to_list(command: Any) -> list:
    if isinstance(command, list):
        return command
    if isinstance(command, str):
        return shlex.split(command)
    return []


def _extract_command(config: Dict[str, Any], service_name: str = "llamacpp") -> list:
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        svc = services.get(service_name) or {}
        return _command_to_list(svc.get("command"))
    return _command_to_list(config.get("command"))


def _model_id_from_command(cmd: list) -> Optional[str]:
    if "--model" in cmd:
        idx = cmd.index("--model")
        if idx + 1 < len(cmd):
            return Path(cmd[idx + 1]).name
    return None


def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
    if value is None:
        return cmd
    if flag in cmd:
        idx = cmd.index(flag)
        if idx + 1 < len(cmd):
            cmd[idx + 1] = value
        else:
            cmd.append(value)
        return cmd
    cmd.extend([flag, value])
    return cmd


def _merge_args(cmd: list, args: Dict[str, str]) -> list:
    flag_map = {
        "device": "--device",
        "tensor_split": "--tensor-split",
        "split_mode": "--split-mode",
        "n_gpu_layers": "--n-gpu-layers",
        "ctx_size": "--ctx-size",
        "batch_size": "--batch-size",
        "ubatch_size": "--ubatch-size",
        "cache_type_k": "--cache-type-k",
        "cache_type_v": "--cache-type-v",
        "flash_attn": "--flash-attn",
    }
    for key, value in args.items():
        flag = flag_map.get(key)
        if flag:
            if flag in cmd:
                continue
            _set_arg(cmd, flag, value)
    return cmd


def _merge_extra_args(cmd: list, extra: str) -> list:
    if not extra:
        return cmd
    extra_list = shlex.split(extra)
    filtered: list[str] = []
    skip_next = False
    for item in extra_list:
        if skip_next:
            skip_next = False
            continue
        if item in {"--device", "-dev"}:
            log.warning("Dropping --device from extra args to avoid llama.cpp device errors.")
            skip_next = True
            continue
        filtered.append(item)
    for flag in filtered:
        if flag not in cmd:
            cmd.append(flag)
    return cmd


def _update_model_command(command: Any, model_path: str, args: Dict[str, str], extra: str) -> list:
    cmd = _command_to_list(command)
    if "--device" in cmd:
        idx = cmd.index("--device")
        del cmd[idx: idx + 2]
    cmd = _set_arg(cmd, "--model", model_path)
    cmd = _merge_args(cmd, args)
    cmd = _merge_extra_args(cmd, extra)
    return cmd


def _replace_flags(cmd: list, flags: Dict[str, Optional[str]], extra: str) -> list:
    result = list(cmd)
    for flag in flags.keys():
        while flag in result:
            idx = result.index(flag)
            del result[idx: idx + 2]
    if "--device" in result:
        idx = result.index("--device")
        del result[idx: idx + 2]
    for flag, value in flags.items():
        if value is not None and value != "":
            result = _set_arg(result, flag, value)
    result = _merge_extra_args(result, extra)
    return result


async def get_app_config(cfg: TrueNASConfig) -> Dict[str, Any]:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    return config


async def get_app_command(cfg: TrueNASConfig, service_name: str = "llamacpp") -> list:
    config = await get_app_config(cfg)
    return _extract_command(config, service_name=service_name)


async def get_active_model_id(cfg: TrueNASConfig, service_name: str = "llamacpp") -> str:
    config = await get_app_config(cfg)
    cmd = _extract_command(config, service_name=service_name)
    return _model_id_from_command(cmd) or ""


async def get_app_logs(
    cfg: TrueNASConfig,
    tail_lines: int = 200,
    service_name: str = "llamacpp",
) -> str:
    tail_payloads = [
        {"tail": tail_lines},
        {"tail_lines": tail_lines},
        {"tail": str(tail_lines)},
    ]
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.container_logs", [cfg.app_name, service_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.container_logs failed (%s): %s", payload, exc)
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.logs", [cfg.app_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.logs failed (%s): %s", payload, exc)
    return ""


async def update_app_command(
    cfg: TrueNASConfig,
    command: list,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        svc["command"] = command
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    config["command"] = command
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def update_command_flags(
    cfg: TrueNASConfig,
    flags: Dict[str, Optional[str]],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _replace_flags(_command_to_list(cmd), flags, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    cmd = _replace_flags(_command_to_list(config.get("command")), flags, extra)
    config["command"] = cmd
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def _rpc_call(cfg: TrueNASConfig, method: str, params: Optional[list] = None) -> Any:
    ssl_ctx = None
    if cfg.ws_url.startswith("wss://") and not cfg.verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(cfg.ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(
            json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [cfg.api_key]})
        )
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            if not cfg.api_user:
                raise RuntimeError("API key rejected and TRUENAS_API_USER not set")
            await ws.send(
                json.dumps(
                    {
                        "id": 2,
                        "msg": "method",
                        "method": "auth.login_ex",
                        "params": [
                            {
                                "mechanism": "API_KEY_PLAIN",
                                "username": cfg.api_user,
                                "api_key": cfg.api_key,
                            }
                        ],
                    }
                )
            )
            auth_ex = json.loads(await ws.recv())
            if auth_ex.get("result", {}).get("response_type") != "SUCCESS":
                raise RuntimeError("API key authentication failed")

        req_id = 3
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def switch_model(
    cfg: TrueNASConfig,
    model_path: str,
    args: Dict[str, str],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")

    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _update_model_command(cmd, model_path, args, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        log.info("Requested model switch to %s via TrueNAS middleware (custom app)", model_path)
        return

    cmd = config.get("command")
    config["command"] = _update_model_command(cmd, model_path, args, extra)
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
    log.info("Requested model switch to %s via TrueNAS middleware (catalog app)", model_path)
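A read-only sketch against the middleware client above; the websocket URL, app name, and key are placeholders for a real TrueNAS instance:

import asyncio

from app.truenas_middleware import TrueNASConfig, get_active_model_id

cfg = TrueNASConfig(
    ws_url="ws://192.168.1.2/websocket",  # placeholder endpoint
    api_key="CHANGE-ME",                  # placeholder key
    api_user=None,
    app_name="llamacpp",
    verify_ssl=False,
)
print(asyncio.run(get_active_model_id(cfg)))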
357  llamaCpp.Wrapper.app/ui_app.py  Normal file
@@ -0,0 +1,357 @@
import asyncio
import json
import logging
from pathlib import Path
from typing import Any, Dict, Optional

import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse

from app.config import load_config
from app.docker_logs import docker_container_logs
from app.download_manager import DownloadManager
from app.logging_utils import configure_logging
from app.model_registry import scan_models
from app.truenas_middleware import (
    TrueNASConfig,
    get_active_model_id,
    get_app_command,
    get_app_logs,
    switch_model,
    update_command_flags,
)
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry


configure_logging()
log = logging.getLogger("ui_app")


class EventBroadcaster:
    def __init__(self) -> None:
        self._queues: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._queues.discard(queue)

    async def publish(self, payload: dict) -> None:
        for queue in list(self._queues):
            queue.put_nowait(payload)


def _static_path() -> Path:
    return Path(__file__).parent / "ui_static"


async def _fetch_active_model(truenas_cfg: Optional[TrueNASConfig]) -> Optional[str]:
    if not truenas_cfg:
        return None
    try:
        return await get_active_model_id(truenas_cfg)
    except Exception as exc:
        log.warning("Failed to read active model from TrueNAS config: %s", exc)
        return None


def _model_list(model_dir: str, active_model: Optional[str]) -> Dict[str, Any]:
    data = []
    for model in scan_models(model_dir):
        data.append({
            "id": model.model_id,
            "size": model.size,
            "active": model.model_id == active_model,
        })
    return {"models": data, "active_model": active_model}


def create_ui_app() -> FastAPI:
    cfg = load_config()
    app = FastAPI(title="llama.cpp Model Manager", version="0.1.0")
    broadcaster = EventBroadcaster()
    manager = DownloadManager(cfg, broadcaster=broadcaster)
    truenas_cfg = None
    if cfg.truenas_ws_url and cfg.truenas_api_key:
        truenas_cfg = TrueNASConfig(
            ws_url=cfg.truenas_ws_url,
            api_key=cfg.truenas_api_key,
            api_user=cfg.truenas_api_user,
            app_name=cfg.truenas_app_name,
            verify_ssl=cfg.truenas_verify_ssl,
        )

    async def monitor_active_model() -> None:
        last_model = None
        while True:
            current = await _fetch_active_model(truenas_cfg)
            if current and current != last_model:
                last_model = current
                await broadcaster.publish({"type": "active_model", "model_id": current})
            await asyncio.sleep(3)

    async def _fetch_logs() -> str:
        logs = ""
        if truenas_cfg:
            try:
                logs = await asyncio.wait_for(get_app_logs(truenas_cfg, tail_lines=200), timeout=5)
            except asyncio.TimeoutError:
                logs = ""
        if not logs and cfg.llamacpp_container_name:
            try:
                logs = await asyncio.wait_for(
                    docker_container_logs(cfg.llamacpp_container_name, tail_lines=200),
                    timeout=10,
                )
            except asyncio.TimeoutError:
                logs = ""
        return logs

    @app.on_event("startup")
    async def start_tasks() -> None:
        asyncio.create_task(monitor_active_model())

    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        log.info("UI request %s %s", request.method, request.url.path)
        return await call_next(request)

    @app.get("/health")
    async def health() -> Dict[str, Any]:
        return {"status": "ok", "model_dir": cfg.model_dir}

    @app.get("/")
    async def index() -> FileResponse:
        return FileResponse(_static_path() / "index.html")

    @app.get("/ui/styles.css")
    async def styles() -> FileResponse:
        return FileResponse(_static_path() / "styles.css")

    @app.get("/ui/app.js")
    async def app_js() -> FileResponse:
        return FileResponse(_static_path() / "app.js")

    @app.get("/ui/api/models")
    async def list_models() -> JSONResponse:
        active_model = await _fetch_active_model(truenas_cfg)
        log.info("UI list models active=%s", active_model)
        return JSONResponse(_model_list(cfg.model_dir, active_model))

    @app.get("/ui/api/downloads")
    async def list_downloads() -> JSONResponse:
        log.info("UI list downloads")
        return JSONResponse({"downloads": manager.list_downloads()})

    @app.post("/ui/api/downloads")
    async def start_download(request: Request) -> JSONResponse:
        payload = await request.json()
        url = payload.get("url")
        filename = payload.get("filename")
        log.info("UI download start url=%s filename=%s", url, filename)
        if not url:
            raise HTTPException(status_code=400, detail="url is required")
        try:
            status = await manager.start(url, filename=filename)
        except ValueError as exc:
            raise HTTPException(status_code=403, detail=str(exc))
        return JSONResponse({"download": status.__dict__})

    @app.delete("/ui/api/downloads/{download_id}")
    async def cancel_download(download_id: str) -> JSONResponse:
        log.info("UI download cancel id=%s", download_id)
        ok = await manager.cancel(download_id)
        if not ok:
            raise HTTPException(status_code=404, detail="download not found")
        return JSONResponse({"status": "cancelled"})

    @app.get("/ui/api/events")
    async def events() -> StreamingResponse:
        queue = broadcaster.connect()

        async def event_stream():
            try:
                while True:
                    payload = await queue.get()
                    data = json.dumps(payload, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
            finally:
                broadcaster.disconnect(queue)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    @app.post("/ui/api/switch-model")
    async def switch_model_ui(request: Request) -> JSONResponse:
        payload = await request.json()
        model_id = payload.get("model_id")
        warmup_override = payload.get("warmup_prompt") or ""
        if not model_id:
            raise HTTPException(status_code=400, detail="model_id is required")

        model_path = Path(cfg.model_dir) / model_id
        if not model_path.exists():
            raise HTTPException(status_code=404, detail="model not found")

        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")

        try:
            container_model_path = str(Path(cfg.model_container_dir) / model_id)
            await switch_model(truenas_cfg, container_model_path, cfg.llamacpp_args, cfg.llamacpp_extra_args)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")

        warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
        log.info("UI warmup after switch model=%s prompt_len=%s", model_id, len(warmup_prompt))
        try:
            await run_warmup_with_retry(cfg.base_url, model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch warmup failed: {exc}")

        try:
            async with httpx.AsyncClient(base_url=cfg.base_url, timeout=120) as client:
                resp = await client.post(
                    "/v1/chat/completions",
                    json={
                        "model": model_id,
                        "messages": [{"role": "user", "content": "ok"}],
                        "max_tokens": 4,
                        "temperature": 0,
                    },
                )
                resp.raise_for_status()
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch verification failed: {exc}")

        await broadcaster.publish({"type": "model_switched", "model_id": model_id})
        log.info("UI model switched model=%s", model_id)
        return JSONResponse({"status": "ok", "model_id": model_id})

    @app.get("/ui/api/llamacpp-config")
    async def get_llamacpp_config() -> JSONResponse:
        active_model = await _fetch_active_model(truenas_cfg)
        log.info("UI get llama.cpp config active=%s", active_model)
        params: Dict[str, Optional[str]] = {}
        command_raw = []
        if truenas_cfg:
            command_raw = await get_app_command(truenas_cfg)
        flag_map = {
            "--ctx-size": "ctx_size",
            "--n-gpu-layers": "n_gpu_layers",
            "--tensor-split": "tensor_split",
            "--split-mode": "split_mode",
            "--cache-type-k": "cache_type_k",
            "--cache-type-v": "cache_type_v",
            "--flash-attn": "flash_attn",
            "--temp": "temp",
            "--top-k": "top_k",
            "--top-p": "top_p",
            "--repeat-penalty": "repeat_penalty",
            "--repeat-last-n": "repeat_last_n",
            "--frequency-penalty": "frequency_penalty",
            "--presence-penalty": "presence_penalty",
        }
        if isinstance(command_raw, list):
            for flag, key in flag_map.items():
                if flag in command_raw:
                    idx = command_raw.index(flag)
                    if idx + 1 < len(command_raw):
                        params[key] = command_raw[idx + 1]
        known_flags = set(flag_map.keys()) | {"--model"}
        extra = []
        if isinstance(command_raw, list):
            skip_next = False
            for item in command_raw:
                if skip_next:
                    skip_next = False
                    continue
                if item in known_flags:
                    skip_next = True
                    continue
                extra.append(item)
        return JSONResponse(
            {
                "active_model": active_model,
                "params": params,
                "extra_args": " ".join(extra),
            }
        )

    @app.post("/ui/api/llamacpp-config")
    async def update_llamacpp_config(request: Request) -> JSONResponse:
        payload = await request.json()
        params = payload.get("params") or {}
        extra_args = payload.get("extra_args") or ""
        warmup_override = payload.get("warmup_prompt") or ""
        log.info("UI save llama.cpp config params=%s extra_args=%s", params, extra_args)
        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
        flags = {
            "--ctx-size": params.get("ctx_size"),
            "--n-gpu-layers": params.get("n_gpu_layers"),
            "--tensor-split": params.get("tensor_split"),
            "--split-mode": params.get("split_mode"),
            "--cache-type-k": params.get("cache_type_k"),
            "--cache-type-v": params.get("cache_type_v"),
            "--flash-attn": params.get("flash_attn"),
            "--temp": params.get("temp"),
            "--top-k": params.get("top_k"),
            "--top-p": params.get("top_p"),
            "--repeat-penalty": params.get("repeat_penalty"),
            "--repeat-last-n": params.get("repeat_last_n"),
            "--frequency-penalty": params.get("frequency_penalty"),
            "--presence-penalty": params.get("presence_penalty"),
        }
        try:
            await update_command_flags(truenas_cfg, flags, extra_args)
        except Exception as exc:
            log.exception("UI update llama.cpp config failed")
            raise HTTPException(status_code=500, detail=f"config update failed: {exc}")
        active_model = await _fetch_active_model(truenas_cfg)
        if active_model:
            warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
            log.info("UI warmup after config update model=%s prompt_len=%s", active_model, len(warmup_prompt))
            try:
                await run_warmup_with_retry(cfg.base_url, active_model, warmup_prompt, timeout_s=cfg.switch_timeout_s)
            except Exception as exc:
                raise HTTPException(status_code=500, detail=f"config warmup failed: {exc}")
        await broadcaster.publish({"type": "llamacpp_config_updated"})
        return JSONResponse({"status": "ok"})

    @app.get("/ui/api/llamacpp-logs")
    async def get_llamacpp_logs() -> JSONResponse:
        logs = await _fetch_logs()
        return JSONResponse({"logs": logs})

    @app.get("/ui/api/llamacpp-logs/stream")
    async def stream_llamacpp_logs() -> StreamingResponse:
        async def event_stream():
            last_lines: list[str] = []
            while True:
                logs = await _fetch_logs()
                lines = logs.splitlines()
                if last_lines:
                    last_tail = last_lines[-1]
                    idx = -1
                    for i in range(len(lines) - 1, -1, -1):
                        if lines[i] == last_tail:
                            idx = i
                            break
                    if idx >= 0:
                        lines = lines[idx + 1 :]
                if lines:
                    last_lines = (last_lines + lines)[-200:]
                    data = json.dumps({"type": "logs", "lines": lines}, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
                await asyncio.sleep(2)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    return app
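A quick client sketch against the UI API above, assuming the UI app is served on port 8081 (the port is an assumption):

import asyncio

import httpx


async def main() -> None:
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8081") as client:
        print((await client.get("/ui/api/models")).json())
        print((await client.get("/ui/api/llamacpp-config")).json())


asyncio.run(main())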
306  llamaCpp.Wrapper.app/ui_static/app.js  Normal file
@@ -0,0 +1,306 @@
const modelsList = document.getElementById("models-list");
const downloadsList = document.getElementById("downloads-list");
const refreshModels = document.getElementById("refresh-models");
const refreshDownloads = document.getElementById("refresh-downloads");
const form = document.getElementById("download-form");
const errorEl = document.getElementById("download-error");
const statusEl = document.getElementById("switch-status");
const configStatusEl = document.getElementById("config-status");
const configForm = document.getElementById("config-form");
const refreshConfig = document.getElementById("refresh-config");
const warmupPromptEl = document.getElementById("warmup-prompt");
const refreshLogs = document.getElementById("refresh-logs");
const logsOutput = document.getElementById("logs-output");
const logsStatus = document.getElementById("logs-status");
const themeToggle = document.getElementById("theme-toggle");

const applyTheme = (theme) => {
  document.documentElement.setAttribute("data-theme", theme);
  themeToggle.textContent = theme === "dark" ? "Light" : "Dark";
  themeToggle.setAttribute("aria-pressed", theme === "dark" ? "true" : "false");
};

const savedTheme = localStorage.getItem("theme") || "light";
applyTheme(savedTheme);
themeToggle.addEventListener("click", () => {
  const next = document.documentElement.getAttribute("data-theme") === "dark" ? "light" : "dark";
  localStorage.setItem("theme", next);
  applyTheme(next);
});

const cfgFields = {
  ctx_size: document.getElementById("cfg-ctx-size"),
  n_gpu_layers: document.getElementById("cfg-n-gpu-layers"),
  tensor_split: document.getElementById("cfg-tensor-split"),
  split_mode: document.getElementById("cfg-split-mode"),
  cache_type_k: document.getElementById("cfg-cache-type-k"),
  cache_type_v: document.getElementById("cfg-cache-type-v"),
  flash_attn: document.getElementById("cfg-flash-attn"),
  temp: document.getElementById("cfg-temp"),
  top_k: document.getElementById("cfg-top-k"),
  top_p: document.getElementById("cfg-top-p"),
  repeat_penalty: document.getElementById("cfg-repeat-penalty"),
  repeat_last_n: document.getElementById("cfg-repeat-last-n"),
  frequency_penalty: document.getElementById("cfg-frequency-penalty"),
  presence_penalty: document.getElementById("cfg-presence-penalty"),
};
const extraArgsEl = document.getElementById("cfg-extra-args");

const fmtBytes = (bytes) => {
  if (!bytes && bytes !== 0) return "-";
  const units = ["B", "KB", "MB", "GB", "TB"];
  let idx = 0;
  let value = bytes;
  while (value >= 1024 && idx < units.length - 1) {
    value /= 1024;
    idx += 1;
  }
  return `${value.toFixed(1)} ${units[idx]}`;
};

const setStatus = (message, type) => {
  statusEl.textContent = message || "";
  statusEl.className = "status";
  if (type) {
    statusEl.classList.add(type);
  }
};

const setConfigStatus = (message, type) => {
  configStatusEl.textContent = message || "";
  configStatusEl.className = "status";
  if (type) {
    configStatusEl.classList.add(type);
  }
};

async function loadModels() {
  const res = await fetch("/ui/api/models");
  const data = await res.json();
  modelsList.innerHTML = "";
  const activeModel = data.active_model;
  data.models.forEach((model) => {
    const li = document.createElement("li");
    if (model.active) {
      li.classList.add("active");
    }
    const row = document.createElement("div");
    row.className = "model-row";

    const name = document.createElement("span");
    name.textContent = `${model.id} (${fmtBytes(model.size)})`;

    const actions = document.createElement("div");
    if (model.active) {
      const badge = document.createElement("span");
      badge.className = "badge";
      badge.textContent = "Active";
      actions.appendChild(badge);
    } else {
      const button = document.createElement("button");
      button.className = "ghost";
      button.textContent = "Switch";
      button.onclick = async () => {
        setStatus(`Switching to ${model.id}...`);
        const warmupPrompt = warmupPromptEl.value.trim();
        const res = await fetch("/ui/api/switch-model", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model_id: model.id, warmup_prompt: warmupPrompt }),
        });
        const payload = await res.json();
        if (!res.ok) {
          setStatus(payload.detail || "Switch failed.", "error");
          return;
        }
        warmupPromptEl.value = "";
        setStatus(`Active model: ${model.id}`, "ok");
        await loadModels();
      };
      actions.appendChild(button);
    }

    row.appendChild(name);
    row.appendChild(actions);
    li.appendChild(row);
    modelsList.appendChild(li);
  });
  if (activeModel) {
    setStatus(`Active model: ${activeModel}`, "ok");
  }
}

async function loadDownloads() {
  const res = await fetch("/ui/api/downloads");
  const data = await res.json();
  downloadsList.innerHTML = "";
  const entries = Object.values(data.downloads || {});
  if (!entries.length) {
    downloadsList.innerHTML = "<p>No active downloads.</p>";
    return;
  }
  entries.forEach((download) => {
    const card = document.createElement("div");
    card.className = "download-card";

    const title = document.createElement("strong");
    title.textContent = download.filename;

    const meta = document.createElement("div");
    const percent = download.bytes_total
      ? Math.round((download.bytes_downloaded / download.bytes_total) * 100)
      : 0;
    meta.textContent = `${download.status} · ${fmtBytes(download.bytes_downloaded)} / ${fmtBytes(download.bytes_total)}`;

    const progress = document.createElement("div");
    progress.className = "progress";
    const bar = document.createElement("span");
    bar.style.width = `${Math.min(percent, 100)}%`;
    progress.appendChild(bar);

    const actions = document.createElement("div");
    if (download.status === "downloading" || download.status === "queued") {
      const cancel = document.createElement("button");
      cancel.className = "ghost";
      cancel.textContent = "Cancel";
      cancel.onclick = async () => {
        await fetch(`/ui/api/downloads/${download.download_id}`, { method: "DELETE" });
        await loadDownloads();
      };
      actions.appendChild(cancel);
    }

    card.appendChild(title);
    card.appendChild(meta);
    card.appendChild(progress);
    card.appendChild(actions);
    downloadsList.appendChild(card);
  });
}

async function loadConfig() {
  const res = await fetch("/ui/api/llamacpp-config");
  const data = await res.json();
  Object.entries(cfgFields).forEach(([key, el]) => {
    el.value = data.params?.[key] || "";
  });
  extraArgsEl.value = data.extra_args || "";
  if (data.active_model) {
    setConfigStatus(`Active model: ${data.active_model}`, "ok");
  }
}

async function loadLogs() {
  const res = await fetch("/ui/api/llamacpp-logs");
  if (!res.ok) {
    logsStatus.textContent = "Unavailable";
    return;
  }
  const data = await res.json();
  logsOutput.textContent = data.logs || "";
  logsStatus.textContent = data.logs ? "Snapshot" : "Empty";
}

form.addEventListener("submit", async (event) => {
  event.preventDefault();
  errorEl.textContent = "";
  const url = document.getElementById("model-url").value.trim();
  const filename = document.getElementById("model-filename").value.trim();
  if (!url) {
    errorEl.textContent = "URL is required.";
    return;
  }
  const payload = { url };
  if (filename) payload.filename = filename;
  const res = await fetch("/ui/api/downloads", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    const err = await res.json();
    errorEl.textContent = err.detail || "Failed to start download.";
    return;
  }
  document.getElementById("model-url").value = "";
  document.getElementById("model-filename").value = "";
  await loadDownloads();
});

configForm.addEventListener("submit", async (event) => {
  event.preventDefault();
  setConfigStatus("Applying parameters...");
  const params = {};
  Object.entries(cfgFields).forEach(([key, el]) => {
    if (el.value.trim()) {
      params[key] = el.value.trim();
    }
  });
  const warmupPrompt = warmupPromptEl.value.trim();
  const res = await fetch("/ui/api/llamacpp-config", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ params, extra_args: extraArgsEl.value.trim(), warmup_prompt: warmupPrompt }),
  });
  const payload = await res.json();
  if (!res.ok) {
    setConfigStatus(payload.detail || "Update failed.", "error");
    return;
  }
  setConfigStatus("Parameters updated.", "ok");
  warmupPromptEl.value = "";
});

refreshModels.addEventListener("click", loadModels);
refreshDownloads.addEventListener("click", loadDownloads);
refreshConfig.addEventListener("click", loadConfig);
refreshLogs.addEventListener("click", loadLogs);

loadModels();
loadDownloads();
loadConfig();
loadLogs();

const eventSource = new EventSource("/ui/api/events");
eventSource.onmessage = async (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type === "download_progress" || payload.type === "download_completed" || payload.type === "download_status") {
    await loadDownloads();
  }
  if (payload.type === "active_model") {
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switched") {
    setStatus(`Active model: ${payload.model_id}`, "ok");
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switch_failed") {
    setStatus(payload.error || "Model switch failed.", "error");
  }
  if (payload.type === "llamacpp_config_updated") {
    await loadConfig();
  }
};

const logsSource = new EventSource("/ui/api/llamacpp-logs/stream");
logsSource.onopen = () => {
  logsStatus.textContent = "Streaming";
};
logsSource.onmessage = (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type !== "logs") {
    return;
  }
  const lines = payload.lines || [];
  if (!lines.length) return;
  const current = logsOutput.textContent.split("\n").filter((line) => line.length);
  const merged = current.concat(lines).slice(-400);
  logsOutput.textContent = merged.join("\n");
  logsOutput.scrollTop = logsOutput.scrollHeight;
  logsStatus.textContent = "Streaming";
};
logsSource.onerror = () => {
  logsStatus.textContent = "Disconnected";
};
151  llamaCpp.Wrapper.app/ui_static/index.html  Normal file
@@ -0,0 +1,151 @@
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>llama.cpp Model Manager</title>
  <link rel="stylesheet" href="/ui/styles.css" />
</head>
<body>
  <div class="page">
    <header class="topbar">
      <div class="brand">
        <p class="eyebrow">llama.cpp wrapper</p>
        <h1>Model Manager</h1>
        <p class="lede">Curate models, tune runtime parameters, and keep llama.cpp responsive.</p>
      </div>
      <div class="header-actions">
        <button id="theme-toggle" class="ghost" type="button" aria-pressed="false">Dark</button>
        <div class="quick-actions card">
          <h2>Quick Add</h2>
          <form id="download-form">
            <label>
              Model URL
              <input type="url" id="model-url" placeholder="https://.../model.gguf" required />
            </label>
            <label>
              Optional filename
              <input type="text" id="model-filename" placeholder="custom-name.gguf" />
            </label>
            <button type="submit">Start Download</button>
            <p id="download-error" class="error"></p>
          </form>
        </div>
      </div>
    </header>

    <main class="layout">
      <section class="column">
        <div class="card">
          <div class="card-header">
            <h3>Models</h3>
            <button id="refresh-models" class="ghost">Refresh</button>
          </div>
          <div id="switch-status" class="status"></div>
          <label class="config-wide">
            Warmup prompt (one-time)
            <textarea id="warmup-prompt" rows="3" placeholder="Optional warmup prompt for the next restart only"></textarea>
          </label>
          <ul id="models-list" class="list"></ul>
        </div>

        <div class="card">
          <div class="card-header">
            <h3>Downloads</h3>
            <button id="refresh-downloads" class="ghost">Refresh</button>
          </div>
          <div id="downloads-list" class="downloads"></div>
        </div>
      </section>

      <section class="column">
        <div class="card">
          <div class="card-header">
            <h3>Runtime Parameters</h3>
            <button id="refresh-config" class="ghost">Refresh</button>
          </div>
          <div id="config-status" class="status"></div>
          <form id="config-form" class="config-grid">
            <label>
              ctx-size
              <input type="text" id="cfg-ctx-size" placeholder="e.g. 8192" />
            </label>
            <label>
              n-gpu-layers
              <input type="text" id="cfg-n-gpu-layers" placeholder="e.g. 999" />
            </label>
            <label>
              tensor-split
              <input type="text" id="cfg-tensor-split" placeholder="e.g. 0.5,0.5" />
            </label>
            <label>
              split-mode
              <input type="text" id="cfg-split-mode" placeholder="e.g. layer" />
            </label>
            <label>
              cache-type-k
              <input type="text" id="cfg-cache-type-k" placeholder="e.g. q8_0" />
            </label>
            <label>
              cache-type-v
              <input type="text" id="cfg-cache-type-v" placeholder="e.g. q8_0" />
            </label>
            <label>
              flash-attn
              <input type="text" id="cfg-flash-attn" placeholder="on/off" />
            </label>
            <label>
              temp
              <input type="text" id="cfg-temp" placeholder="e.g. 0.7" />
            </label>
            <label>
              top-k
              <input type="text" id="cfg-top-k" placeholder="e.g. 40" />
            </label>
            <label>
              top-p
              <input type="text" id="cfg-top-p" placeholder="e.g. 0.9" />
            </label>
            <label>
              repeat-penalty
              <input type="text" id="cfg-repeat-penalty" placeholder="e.g. 1.1" />
            </label>
            <label>
              repeat-last-n
              <input type="text" id="cfg-repeat-last-n" placeholder="e.g. 256" />
            </label>
            <label>
              frequency-penalty
              <input type="text" id="cfg-frequency-penalty" placeholder="e.g. 0.1" />
            </label>
            <label>
              presence-penalty
              <input type="text" id="cfg-presence-penalty" placeholder="e.g. 0.0" />
            </label>
            <label class="config-wide">
              extra args
              <textarea id="cfg-extra-args" rows="3" placeholder="--mlock --no-mmap"></textarea>
            </label>
            <button type="submit" class="config-wide">Apply Parameters</button>
          </form>
        </div>
      </section>
    </main>

    <section class="card logs-panel">
      <div class="card-header">
        <div>
          <h3>llama.cpp Logs</h3>
          <p class="lede small">Live tail from the llama.cpp container.</p>
        </div>
        <div class="log-actions">
          <span id="logs-status" class="badge muted">Idle</span>
          <button id="refresh-logs" class="ghost">Refresh</button>
        </div>
      </div>
      <pre id="logs-output" class="log-output"></pre>
    </section>
  </div>
  <script src="/ui/app.js"></script>
</body>
</html>
337
llamaCpp.Wrapper.app/ui_static/styles.css
Normal file
337
llamaCpp.Wrapper.app/ui_static/styles.css
Normal file
@@ -0,0 +1,337 @@
:root {
  --bg: #f5f6f8;
  --panel: #ffffff;
  --panel-muted: #f2f3f6;
  --text: #111318;
  --muted: #5b6472;
  --border: rgba(17, 19, 24, 0.08);
  --accent: #0a84ff;
  --accent-ink: #005ad6;
  --shadow: 0 20px 60px rgba(17, 19, 24, 0.08);
}

* {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}

body {
  font-family: "SF Pro Text", "SF Pro Display", "Helvetica Neue", "Segoe UI", sans-serif;
  background: radial-gradient(circle at top, #ffffff 0%, var(--bg) 60%);
  color: var(--text);
}

.page {
  max-width: 1200px;
  margin: 0 auto;
  padding: 48px 28px 72px;
}

.topbar {
  display: grid;
  grid-template-columns: minmax(240px, 1.2fr) minmax(280px, 0.8fr);
  gap: 32px;
  align-items: stretch;
  margin-bottom: 36px;
}

.header-actions {
  display: grid;
  gap: 16px;
  justify-items: end;
}

.header-actions .quick-actions {
  width: 100%;
}

.header-actions #theme-toggle {
  justify-self: end;
}

.brand h1 {
  font-size: clamp(2.2rem, 4vw, 3.2rem);
  letter-spacing: -0.02em;
}

.eyebrow {
  text-transform: uppercase;
  letter-spacing: 0.2em;
  font-size: 0.68rem;
  color: var(--muted);
}

.lede {
  margin-top: 12px;
  font-size: 1rem;
  color: var(--muted);
}

.lede.small {
  font-size: 0.85rem;
}

.card {
  background: var(--panel);
  padding: 22px;
  border-radius: 22px;
  border: 1px solid var(--border);
  box-shadow: var(--shadow);
}

.quick-actions h2 {
  margin-bottom: 14px;
  font-size: 1.1rem;
}

.layout {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
  gap: 24px;
}

.column {
  display: grid;
  gap: 24px;
}

.logs-panel {
  margin-top: 28px;
}

.card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 12px;
  margin-bottom: 16px;
}

.card-header h3 {
  font-size: 1.1rem;
}

.log-actions {
  display: flex;
  align-items: center;
  gap: 12px;
}

form {
  display: grid;
  gap: 12px;
}

label {
  display: grid;
  gap: 6px;
  font-size: 0.85rem;
  color: var(--muted);
}

input,
textarea,
button {
  font: inherit;
}

input,
textarea {
  padding: 10px 12px;
  border-radius: 12px;
  border: 1px solid var(--border);
  background: #fff;
}

button {
  border: none;
  padding: 10px 16px;
  border-radius: 12px;
  background: var(--accent);
  color: #fff;
  font-weight: 600;
  cursor: pointer;
  transition: transform 0.2s ease, background 0.2s ease;
}

button:hover {
  transform: translateY(-1px);
  background: var(--accent-ink);
}

button.ghost {
  background: transparent;
  color: var(--accent);
  border: 1px solid rgba(10, 132, 255, 0.4);
  padding: 8px 12px;
}

.list {
  list-style: none;
  padding: 0;
  margin: 0;
  display: grid;
  gap: 10px;
}

.list li {
  padding: 12px;
  border-radius: 14px;
  background: var(--panel-muted);
  border: 1px solid var(--border);
  font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
  font-size: 0.85rem;
}

.list li.active {
  border-color: rgba(10, 132, 255, 0.4);
  background: #eef5ff;
}

.model-row {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 12px;
}

.badge {
  display: inline-block;
  padding: 4px 8px;
  border-radius: 999px;
  background: var(--accent);
  color: #fff;
  font-size: 0.7rem;
  font-weight: 600;
}

.badge.muted {
  background: rgba(17, 19, 24, 0.1);
  color: var(--muted);
}

.status {
  margin-bottom: 12px;
  font-size: 0.9rem;
  color: var(--muted);
}

.status.ok {
  color: #1a7f37;
}

.status.error {
  color: #b02a14;
}

.downloads {
  display: grid;
  gap: 12px;
}

.download-card {
  border-radius: 16px;
  border: 1px solid var(--border);
  padding: 12px;
  background: #f7f8fb;
}

.download-card strong {
  display: block;
  font-size: 0.9rem;
  margin-bottom: 6px;
}

.progress {
  height: 8px;
  border-radius: 999px;
  background: #dfe3ea;
  overflow: hidden;
  margin: 8px 0;
}

.progress > span {
  display: block;
  height: 100%;
  background: var(--accent);
  width: 0;
}

.error {
  color: #b02a14;
  font-size: 0.85rem;
}

.config-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
  gap: 14px;
}

.config-wide {
  grid-column: 1 / -1;
}

textarea {
  padding: 10px 12px;
  border-radius: 12px;
  border: 1px solid var(--border);
  font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
  font-size: 0.85rem;
  resize: vertical;
}

.log-output {
  background: #0f141b;
  color: #dbe6f3;
  padding: 16px;
  border-radius: 16px;
  min-height: 260px;
  max-height: 420px;
  overflow: auto;
  font-size: 12px;
  line-height: 1.6;
  white-space: pre-wrap;
}

[data-theme="dark"] {
  --bg: #0b0d12;
  --panel: #141824;
  --panel-muted: #1b2132;
  --text: #f1f4f9;
  --muted: #a5afc2;
  --border: rgba(241, 244, 249, 0.1);
  --accent: #4aa3ff;
  --accent-ink: #1f7ae0;
  --shadow: 0 20px 60px rgba(0, 0, 0, 0.4);
}

[data-theme="dark"] body {
  background: radial-gradient(circle at top, #131826 0%, var(--bg) 60%);
}

[data-theme="dark"] .download-card {
  background: #121826;
}

[data-theme="dark"] .progress {
  background: #2a3349;
}

[data-theme="dark"] .log-output {
  background: #080b12;
  color: #d8e4f3;
}

@media (max-width: 900px) {
  .topbar {
    grid-template-columns: 1fr;
  }
}

@media (max-width: 640px) {
  .page {
    padding: 32px 16px 48px;
  }
}
74
llamaCpp.Wrapper.app/warmup.py
Normal file
@@ -0,0 +1,74 @@
import asyncio
import logging
import time
from pathlib import Path

import httpx


log = logging.getLogger("llamacpp_warmup")


def _is_loading_error(response: httpx.Response) -> bool:
    if response.status_code != 503:
        return False
    try:
        payload = response.json()
    except Exception:
        return False
    message = ""
    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict):
            message = str(error.get("message") or "")
        else:
            message = str(payload.get("message") or "")
    return "loading model" in message.lower()


def resolve_warmup_prompt(override: str | None, fallback_path: str) -> str:
    if override:
        prompt = override.strip()
        if prompt:
            return prompt
    try:
        prompt = Path(fallback_path).read_text(encoding="utf-8").strip()
        if prompt:
            return prompt
    except Exception as exc:
        log.warning("Failed to read warmup prompt from %s: %s", fallback_path, exc)
    return "ok"


async def run_warmup(base_url: str, model_id: str, prompt: str, timeout_s: float) -> None:
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,
        "temperature": 0,
    }
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        resp = await client.post("/v1/chat/completions", json=payload)
        if resp.status_code == 503 and _is_loading_error(resp):
            raise RuntimeError("llama.cpp still loading model")
        resp.raise_for_status()


async def run_warmup_with_retry(
    base_url: str,
    model_id: str,
    prompt: str,
    timeout_s: float,
    interval_s: float = 3.0,
) -> None:
    deadline = time.time() + timeout_s
    last_exc: Exception | None = None
    while time.time() < deadline:
        try:
            await run_warmup(base_url, model_id, prompt, timeout_s=timeout_s)
            return
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(interval_s)
    if last_exc:
        raise last_exc
464
llamacpp_remote_test.ps1
Normal file
@@ -0,0 +1,464 @@
param(
    [Parameter(Mandatory = $true)][string]$Model,
    [string]$BaseUrl = "http://192.168.1.2:8071",
    [string]$PromptPath = "prompt_crwv.txt",
    [int]$Runs = 3,
    [int]$MaxTokens = 2000,
    [int]$NumCtx = 131072,
    [int]$TopK = 1,
    [double]$TopP = 1.0,
    [int]$Seed = 42,
    [double]$RepeatPenalty = 1.05,
    [double]$Temperature = 0,
    [string]$JsonSchema = "",
    [int]$TimeoutSec = 1800,
    [string]$BatchId,
    [switch]$EnableGpuMonitor = $true,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555,
    [int]$GpuMonitorIntervalSec = 1,
    [int]$GpuMonitorSeconds = 120
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"

function Normalize-Strike([object]$value) {
    if ($null -eq $value) { return $null }
    if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
        return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
    }
    return ($value.ToString().Trim())
}

function Get-AllowedLegs([string]$promptText) {
    $pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
    $match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    if (-not $match.Success) {
        throw "Options Chain JSON block not found in prompt."
    }
    $chains = $match.Groups[1].Value | ConvertFrom-Json
    $allowedExpiry = @{}
    $allowedLegs = @{}
    foreach ($exp in $chains) {
        $expiry = [string]$exp.expiry
        if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
        $allowedExpiry[$expiry] = $true
        foreach ($leg in $exp.liquidSet) {
            if ($null -eq $leg) { continue }
            if ($leg.liquid -ne $true) { continue }
            $side = [string]$leg.side
            $strikeNorm = Normalize-Strike $leg.strike
            if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
                $key = "$expiry|$side|$strikeNorm"
                $allowedLegs[$key] = $true
            }
        }
    }
    return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}

function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
    $errors = New-Object System.Collections.Generic.List[string]

    $requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
    foreach ($key in $requiredTop) {
        if (-not ($obj.PSObject.Properties.Name -contains $key)) {
            $errors.Add("Missing top-level key: $key")
        }
    }

    if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
        $errors.Add("Invalid strategyBias: $($obj.strategyBias)")
    }

    if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
        if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
            $errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
        }
    } else {
        $errors.Add("selectedExpiry is missing or empty")
    }

    if ($obj.confidenceScore -ne $null) {
        if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
            $errors.Add("confidenceScore is not numeric")
        } elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
            $errors.Add("confidenceScore out of range 0-100")
        }
    }

    if ($obj.recommendedTrades -eq $null) {
        $errors.Add("recommendedTrades is null")
    } elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
        $errors.Add("recommendedTrades is not an array")
    }

    if ($obj.strategyBias -eq "NO_TRADE") {
        if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
            $errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
        }
    } else {
        if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
            $errors.Add("recommendedTrades must contain 1-3 trades")
        }
    }

    if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
        $errors.Add("whyOthersRejected is not an array")
    }

    if ($obj.recommendedTrades) {
        foreach ($trade in $obj.recommendedTrades) {
            $tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
            foreach ($tkey in $tradeRequired) {
                if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
                    $errors.Add("Trade missing key: $tkey")
                }
            }

            if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
            if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
            if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
            if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
                if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
            }
            if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
                if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
            }

            if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
                $errors.Add("Trade legs missing or not an array")
                continue
            }

            $legs = @($trade.legs)

            $hasBuy = $false
            $hasSell = $false
            foreach ($leg in $trade.legs) {
                $side = ([string]$leg.side).ToLowerInvariant()
                $action = ([string]$leg.action).ToLowerInvariant()
                $expiry = [string]$leg.expiry
                $strikeNorm = Normalize-Strike $leg.strike

                if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
                if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
                if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
                if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
                    $key = "$expiry|$side|$strikeNorm"
                    if (-not $allowedLegs.ContainsKey($key)) {
                        $errors.Add("Leg not in liquid set: $key")
                    }
                }

                if ($action -eq "buy") { $hasBuy = $true }
                if ($action -eq "sell") { $hasSell = $true }
            }

            if ($obj.selectedExpiry -and $legs) {
                foreach ($leg in $legs) {
                    if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
                        $errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
                    }
                }
            }

            if ($hasSell -and -not $hasBuy) {
                $errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
            }

            if ($trade.greekProfile) {
                $gp = $trade.greekProfile
                $gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                foreach ($gkey in $gpRequired) {
                    if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
                        $errors.Add("Missing greekProfile.$gkey")
                    }
                }
                if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
                if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
                if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
                if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }

                if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
                    $errors.Add("ThetaExposure POS on all-long legs")
                }
            } else {
                $errors.Add("Missing greekProfile")
            }

            $structure = ([string]$trade.structure).ToLowerInvariant()
            $tradeName = ([string]$trade.name).ToLowerInvariant()
            $isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
            $isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
            $isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
            $isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")

            if ($isStraddle -or $isStrangle) {
                if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
                if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
                    $callStrike = Normalize-Strike $callLegs[0].strike
                    $putStrike = Normalize-Strike $putLegs[0].strike
                    if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
                    if ($isStrangle) {
                        try {
                            if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
                        } catch {
                            $errors.Add("Strangle strike comparison failed")
                        }
                    }
                    if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
                        $errors.Add("Straddle/Strangle must be long (buy) legs")
                    }
                }
                if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
                    $errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
                }
            }

            if ($isCallDebit) {
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
                $buy = $callLegs | Where-Object { $_.action -eq "buy" }
                $sell = $callLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Call debit spread buy strike must be below sell strike")
                        }
                    } catch {
                        $errors.Add("Call debit spread strike comparison failed")
                    }
                }
            }

            if ($isPutDebit) {
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
                $buy = $putLegs | Where-Object { $_.action -eq "buy" }
                $sell = $putLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Put debit spread buy strike must be above sell strike")
                        }
                    } catch {
                        $errors.Add("Put debit spread strike comparison failed")
                    }
                }
            }
        }
    }

    return $errors
}

function Parse-GpuLog {
    param([string]$Path)
    $summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
    if (-not (Test-Path $Path)) {
        $summary.error = "gpu log missing"
        return $summary
    }
    $lines = Get-Content -Path $Path
    $currentIndex = -1
    $gpuIndex = -1
    $inUtilBlock = $false
    foreach ($line in $lines) {
        if ($line -match '^Timestamp') {
            $gpuIndex = -1
            $currentIndex = -1
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
            $gpuIndex += 1
            $currentIndex = $gpuIndex
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^\s*Utilization\s*$') {
            $inUtilBlock = $true
            continue
        }
        if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
            $util = [int]$Matches[1]
            if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
            if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
            $summary.samples += 1
        }
    }
    return $summary
}

$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs

if ([string]::IsNullOrWhiteSpace($BatchId)) {
    $BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}

$outBase = Join-Path -Path (Get-Location) -ChildPath "llamacpp_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }

$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }

$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }

$summary = [ordered]@{
    model = $Model
    baseUrl = $BaseUrl
    batchId = $BatchId
    params = [ordered]@{
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
        num_ctx = $NumCtx
    }
    gpuMonitor = [ordered]@{
        enabled = [bool]$EnableGpuMonitor
        sshHost = $SshHost
        sshPort = $SshPort
        intervalSec = $GpuMonitorIntervalSec
        durationSec = $GpuMonitorSeconds
    }
    modelMeta = $null
    runs = @()
}

if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    try {
        $schemaObject = $JsonSchema | ConvertFrom-Json
    } catch {
        throw "JsonSchema is not valid JSON: $($_.Exception.Message)"
    }
}

try {
    $modelsResponse = Invoke-RestMethod -Uri "$BaseUrl/v1/models" -TimeoutSec 30
    $meta = $modelsResponse.data | Where-Object { $_.id -eq $Model } | Select-Object -First 1
    if ($meta) { $summary.modelMeta = $meta.meta }
} catch {
    $summary.modelMeta = @{ error = $_.Exception.Message }
}

for ($i = 1; $i -le $Runs; $i++) {
    Write-Host "Running $Model (run $i/$Runs)"

    $runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
    $gpuJob = $null
    $gpuLogPath = $null

    if ($EnableGpuMonitor) {
        $samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
        $gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
        $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
        $gpuJob = Start-Job -ScriptBlock {
            param($sshExe, $target, $port, $samples, $interval, $logPath)
            for ($s = 1; $s -le $samples; $s++) {
                Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
                try {
                    $out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
                    Add-Content -Path $logPath -Value $out
                } catch {
                    Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
                }
                Start-Sleep -Seconds $interval
            }
        } -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
        Start-Sleep -Seconds 1
    }

    $body = @{
        model = $Model
        messages = @(@{ role = "user"; content = $prompt })
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
    }

    if ($schemaObject) {
        $body.response_format = @{
            type = "json_schema"
            json_schema = @{
                name = "trade_schema"
                schema = $schemaObject
                strict = $true
            }
        }
    }

    $body = $body | ConvertTo-Json -Depth 12

    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/v1/chat/completions" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
    } catch {
        $runResult.errors = @("API error: $($_.Exception.Message)")
        $summary.runs += $runResult
        if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
        continue
    } finally {
        if ($gpuJob) {
            Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
            if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
            Remove-Job -Job $gpuJob | Out-Null
        }
    }

    $raw = [string]$resp.choices[0].message.content

    $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
    Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

    try {
        $parsed = $raw | ConvertFrom-Json
        $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
        if ($errors.Count -eq 0) {
            $runResult.ok = $true
        } else {
            $runResult.errors = $errors
        }
    } catch {
        $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
    }

    if ($gpuLogPath) {
        $runResult.gpuLog = $gpuLogPath
        $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
    }
    if ($resp.timings) {
        $runResult.timings = $resp.timings
    }
    if ($resp.usage) {
        $runResult.usage = $resp.usage
    }

    $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
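A minimal invocation sketch for the harness above, assuming the model id matches an entry served by /v1/models on the llama.cpp box (the id shown here is hypothetical):

# Three deterministic runs against the default endpoint; only -Model is mandatory.
.\llamacpp_remote_test.ps1 -Model "qwen2.5-72b-instruct" -Runs 3 -MaxTokens 2000

Each run writes runN.json plus a gpu_runN.csv sample log under llamacpp_runs_remote\batch_<id>\<model>\, and summary.json aggregates the schema-validation verdicts.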
117
llamacpp_set_command.ps1
Normal file
@@ -0,0 +1,117 @@
param(
    [Parameter(Mandatory = $true)][string]$ModelPath,
    [Parameter(Mandatory = $true)][int]$CtxSize,
    [int]$BatchSize = 1024,
    [int]$UBatchSize = 256,
    [string]$TensorSplit = "0.5,0.5",
    [string]$Devices = "0,1",
    [int]$GpuLayers = 999,
    [string]$CacheTypeK = "q4_0",
    [string]$CacheTypeV = "q4_0",
    [string]$GrammarFile = "",
    [string]$JsonSchema = "",
    [string]$BaseUrl = "http://192.168.1.2:8071",
    [int]$TimeoutSec = 600,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"

$commandArgs = @(
    "--model", $ModelPath,
    "--ctx-size", $CtxSize.ToString(),
    "--n-gpu-layers", $GpuLayers.ToString(),
    "--split-mode", "layer",
    "--tensor-split", $TensorSplit,
    "--batch-size", $BatchSize.ToString(),
    "--ubatch-size", $UBatchSize.ToString(),
    "--cache-type-k", $CacheTypeK,
    "--cache-type-v", $CacheTypeV,
    "--flash-attn", "on"
)

if (-not [string]::IsNullOrWhiteSpace($Devices)) {
    $commandArgs = @("--device", $Devices) + $commandArgs
}

if (-not [string]::IsNullOrWhiteSpace($GrammarFile)) {
    $commandArgs += @("--grammar-file", $GrammarFile)
}

if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    $commandArgs += @("--json-schema", $JsonSchema)
}

$argJson = $commandArgs | ConvertTo-Json -Compress

$py = @"
import json
path = r"/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
new_cmd = json.loads(r'''$argJson''')
lines = open(path, "r", encoding="utf-8").read().splitlines()
out = []
in_cmd = False
def yaml_quote(value):
    text = str(value)
    return "'" + text.replace("'", "''") + "'"
for line in lines:
    if line.startswith('"command":'):
        out.append('"command":')
        for arg in new_cmd:
            out.append(f"- {yaml_quote(arg)}")
        in_cmd = True
        continue
    if in_cmd:
        if line.startswith('"') and not line.startswith('"command":'):
            in_cmd = False
            out.append(line)
        else:
            continue
    else:
        out.append(line)
if in_cmd:
    pass
open(path, "w", encoding="utf-8").write("\n".join(out) + "\n")
"@

$py | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -"

$pyCompose = @"
import json, yaml, subprocess
compose_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/templates/rendered/docker-compose.yaml"
user_config_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
with open(compose_path, "r", encoding="utf-8") as f:
    compose = json.load(f)
with open(user_config_path, "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)
command = config.get("command")
if not command:
    raise SystemExit("command list missing from user_config")
svc = compose["services"]["llamacpp"]
svc["command"] = command
with open(compose_path, "w", encoding="utf-8") as f:
    json.dump(compose, f)
payload = {"custom_compose_config": compose}
subprocess.run(["midclt", "call", "app.update", "llamacpp", json.dumps(payload)], check=True)
"@

$pyCompose | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -" | Out-Null

$start = Get-Date
while ((Get-Date) - $start -lt [TimeSpan]::FromSeconds($TimeoutSec)) {
    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/health" -TimeoutSec 10
        if ($resp.status -eq "ok") {
            Write-Host "llamacpp healthy at $BaseUrl"
            exit 0
        }
    } catch {
        Start-Sleep -Seconds 5
    }
}

throw "Timed out waiting for llama.cpp server at $BaseUrl"
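A usage sketch, assuming the GGUF path below exists under the container's model mount (the path and split values are illustrative):

# Point the llamacpp app at a new model with a 32k context, then wait for /health.
.\llamacpp_set_command.ps1 -ModelPath "/models/example-q4_k_m.gguf" -CtxSize 32768 -TensorSplit "0.5,0.5"

Because the script exits 0 only after the health probe succeeds (and throws on timeout), it can be chained directly in front of llamacpp_remote_test.ps1.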
14
modelfiles/options-json-deepseek14b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM deepseek-r1:14b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
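These Modelfiles are registered with Ollama in the usual way; a sketch, run from the repo root against the daemon:

# Build the named model once; the name is then passed as -Model to ollama_remote_test.ps1.
ollama create options-json-deepseek14b -f modelfiles/options-json-deepseek14b.Modelfile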
14
modelfiles/options-json-llama31-70b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM llama3.1:70b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
14
modelfiles/options-json-phi3mini.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM phi3:mini-128k
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
561
ollama_remote_test.ps1
Normal file
@@ -0,0 +1,561 @@
|
||||
param(
|
||||
[Parameter(Mandatory = $true)][string]$Model,
|
||||
[string]$BaseUrl = "http://192.168.1.2:30068",
|
||||
[string]$PromptPath = "prompt_crwv.txt",
|
||||
[int]$Runs = 3,
|
||||
[int]$NumPredict = 1200,
|
||||
[int]$NumCtx = 131072,
|
||||
[int]$NumBatch = 0,
|
||||
[int]$NumGpuLayers = 0,
|
||||
[int]$TimeoutSec = 900,
|
||||
[int]$TopK = 1,
|
||||
[double]$TopP = 1.0,
|
||||
[int]$Seed = 42,
|
||||
[double]$RepeatPenalty = 1.05,
|
||||
[string]$BatchId,
|
||||
[switch]$UseSchemaFormat = $false,
|
||||
[switch]$EnableGpuMonitor = $true,
|
||||
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
|
||||
[switch]$CheckProcessor = $true,
|
||||
[string]$SshUser = "rushabh",
|
||||
[string]$SshHost = "192.168.1.2",
|
||||
[int]$SshPort = 55555,
|
||||
[int]$GpuMonitorIntervalSec = 1,
|
||||
[int]$GpuMonitorSeconds = 120
|
||||
)
|
||||
|
||||
$ErrorActionPreference = "Stop"
|
||||
$ProgressPreference = "SilentlyContinue"
|
||||
|
||||
function Normalize-Strike([object]$value) {
|
||||
if ($null -eq $value) { return $null }
|
||||
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
|
||||
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
|
||||
}
|
||||
return ($value.ToString().Trim())
|
||||
}
|
||||
|
||||
function Get-AllowedLegs([string]$promptText) {
|
||||
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
|
||||
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
|
||||
if (-not $match.Success) {
|
||||
throw "Options Chain JSON block not found in prompt."
|
||||
}
|
||||
$chains = $match.Groups[1].Value | ConvertFrom-Json
|
||||
$allowedExpiry = @{}
|
||||
$allowedLegs = @{}
|
||||
foreach ($exp in $chains) {
|
||||
$expiry = [string]$exp.expiry
|
||||
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
|
||||
$allowedExpiry[$expiry] = $true
|
||||
foreach ($leg in $exp.liquidSet) {
|
||||
if ($null -eq $leg) { continue }
|
||||
if ($leg.liquid -ne $true) { continue }
|
||||
$side = [string]$leg.side
|
||||
$strikeNorm = Normalize-Strike $leg.strike
|
||||
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
|
||||
$key = "$expiry|$side|$strikeNorm"
|
||||
$allowedLegs[$key] = $true
|
||||
}
|
||||
}
|
||||
}
|
||||
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
|
||||
}
|
||||
|
||||
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
|
||||
$errors = New-Object System.Collections.Generic.List[string]
|
||||
|
||||
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
|
||||
foreach ($key in $requiredTop) {
|
||||
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
|
||||
$errors.Add("Missing top-level key: $key")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
|
||||
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
|
||||
}
|
||||
|
||||
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
|
||||
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
|
||||
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
|
||||
}
|
||||
} else {
|
||||
$errors.Add("selectedExpiry is missing or empty")
|
||||
}
|
||||
|
||||
if ($obj.confidenceScore -ne $null) {
|
||||
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
|
||||
$errors.Add("confidenceScore is not numeric")
|
||||
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
|
||||
$errors.Add("confidenceScore out of range 0-100")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.recommendedTrades -eq $null) {
|
||||
$errors.Add("recommendedTrades is null")
|
||||
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("recommendedTrades is not an array")
|
||||
}
|
||||
|
||||
if ($obj.strategyBias -eq "NO_TRADE") {
|
||||
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
|
||||
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
|
||||
}
|
||||
} else {
|
||||
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
|
||||
$errors.Add("recommendedTrades must contain 1-3 trades")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("whyOthersRejected is not an array")
|
||||
}
|
||||
|
||||
if ($obj.recommendedTrades) {
|
||||
foreach ($trade in $obj.recommendedTrades) {
|
||||
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
|
||||
foreach ($tkey in $tradeRequired) {
|
||||
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
|
||||
$errors.Add("Trade missing key: $tkey")
|
||||
}
|
||||
}
|
||||
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
|
||||
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
|
||||
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
|
||||
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
|
||||
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
|
||||
}
|
||||
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
|
||||
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
|
||||
}
|
||||
|
||||
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("Trade legs missing or not an array")
|
||||
continue
|
||||
}
|
||||
|
||||
$legs = @($trade.legs)
|
||||
|
||||
$hasBuy = $false
|
||||
$hasSell = $false
|
||||
foreach ($leg in $trade.legs) {
|
||||
$side = ([string]$leg.side).ToLowerInvariant()
|
||||
$action = ([string]$leg.action).ToLowerInvariant()
|
||||
$expiry = [string]$leg.expiry
|
||||
$strikeNorm = Normalize-Strike $leg.strike
|
||||
|
||||
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
|
||||
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
|
||||
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
|
||||
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
|
||||
$key = "$expiry|$side|$strikeNorm"
|
||||
if (-not $allowedLegs.ContainsKey($key)) {
|
||||
$errors.Add("Leg not in liquid set: $key")
|
||||
}
|
||||
}
|
||||
|
||||
if ($action -eq "buy") { $hasBuy = $true }
|
||||
if ($action -eq "sell") { $hasSell = $true }
|
||||
}
|
||||
|
||||
if ($obj.selectedExpiry -and $legs) {
|
||||
foreach ($leg in $legs) {
|
||||
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
|
||||
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ($hasSell -and -not $hasBuy) {
|
||||
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
|
||||
}
|
||||
|
||||
if ($trade.greekProfile) {
|
||||
$gp = $trade.greekProfile
|
||||
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
|
||||
foreach ($gkey in $gpRequired) {
|
||||
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
|
||||
$errors.Add("Missing greekProfile.$gkey")
|
||||
}
|
||||
}
|
||||
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
|
||||
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
|
||||
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
|
||||
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
|
||||
|
||||
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
|
||||
$errors.Add("ThetaExposure POS on all-long legs")
|
||||
}
|
||||
} else {
|
||||
$errors.Add("Missing greekProfile")
|
||||
}
|
||||
|
||||
$structure = ([string]$trade.structure).ToLowerInvariant()
|
||||
$tradeName = ([string]$trade.name).ToLowerInvariant()
|
||||
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
|
||||
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
|
||||
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
|
||||
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
|
||||
|
||||
if ($isStraddle -or $isStrangle) {
|
||||
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
|
||||
$callLegs = $legs | Where-Object { $_.side -eq "call" }
|
||||
$putLegs = $legs | Where-Object { $_.side -eq "put" }
|
||||
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
|
||||
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
|
||||
$callStrike = Normalize-Strike $callLegs[0].strike
|
||||
$putStrike = Normalize-Strike $putLegs[0].strike
|
||||
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
|
||||
if ($isStrangle) {
|
||||
try {
|
||||
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
|
||||
} catch {
|
||||
$errors.Add("Strangle strike comparison failed")
|
||||
}
|
||||
}
|
||||
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
|
||||
$errors.Add("Straddle/Strangle must be long (buy) legs")
|
||||
}
|
||||
}
|
||||
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
|
||||
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
|
||||
}
|
||||
}
|
||||
|
||||
if ($isCallDebit) {
|
||||
$callLegs = $legs | Where-Object { $_.side -eq "call" }
|
||||
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
|
||||
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
|
||||
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
|
||||
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
|
||||
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
|
||||
try {
|
||||
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
|
||||
$errors.Add("Call debit spread buy strike must be below sell strike")
|
||||
}
|
||||
} catch {
|
||||
$errors.Add("Call debit spread strike comparison failed")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ($isPutDebit) {
|
||||
$putLegs = $legs | Where-Object { $_.side -eq "put" }
|
||||
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
|
||||
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
|
||||
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
|
||||
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
|
||||
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
|
||||
try {
|
||||
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
|
||||
$errors.Add("Put debit spread buy strike must be above sell strike")
|
||||
}
|
||||
} catch {
|
||||
$errors.Add("Put debit spread strike comparison failed")
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return $errors
|
||||
}
|
||||
|
||||
function Parse-GpuLog {
|
||||
param([string]$Path)
|
||||
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
|
||||
if (-not (Test-Path $Path)) {
|
||||
$summary.error = "gpu log missing"
|
||||
return $summary
|
||||
}
|
||||
$lines = Get-Content -Path $Path
|
||||
$currentIndex = -1
|
||||
$gpuIndex = -1
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
foreach ($line in $lines) {
|
||||
if ($line -match '^Timestamp') {
|
||||
$gpuIndex = -1
|
||||
$currentIndex = -1
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^GPU\\s+[0-9A-Fa-f:.]+$') {
|
||||
$gpuIndex += 1
|
||||
$currentIndex = $gpuIndex
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*Utilization\\s*$') {
|
||||
$inUtilBlock = $true
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*GPU Utilization Samples') {
|
||||
$inGpuUtilSamples = $true
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*(Memory|ENC|DEC) Utilization Samples') {
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($inUtilBlock -and $line -match '^\\s*GPU\\s*:\\s*([0-9]+)\\s*%') {
|
||||
$util = [int]$Matches[1]
|
||||
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
|
||||
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
|
||||
$summary.samples += 1
|
||||
continue
|
||||
}
|
||||
if ($inGpuUtilSamples -and $line -match '^\\s*Max\\s*:\\s*([0-9]+)\\s*%') {
|
||||
$util = [int]$Matches[1]
|
||||
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
|
||||
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
|
||||
$summary.samples += 1
|
||||
}
|
||||
}
|
||||
return $summary
|
||||
}
|
||||
|
||||
function Get-ProcessorShare {
|
||||
param(
|
||||
[string]$SshExePath,
|
||||
[string]$Target,
|
||||
[int]$Port,
|
||||
[string]$ModelName
|
||||
)
|
||||
$result = [ordered]@{ cpuPct = $null; gpuPct = $null; raw = $null; error = $null }
|
||||
try {
|
||||
$out = & $SshExePath -p $Port $Target "sudo -n docker exec ix-ollama-ollama-1 ollama ps"
|
||||
$line = $out | Select-String -SimpleMatch $ModelName | Select-Object -First 1
|
||||
if ($null -eq $line) {
|
||||
$result.error = "model not found in ollama ps"
|
||||
return $result
|
||||
}
|
||||
$raw = $line.ToString().Trim()
|
||||
$result.raw = $raw
|
||||
if ($raw -match '([0-9]+)%\\/([0-9]+)%\\s+CPU\\/GPU') {
|
||||
$result.cpuPct = [int]$Matches[1]
|
||||
$result.gpuPct = [int]$Matches[2]
|
||||
} elseif ($raw -match '([0-9]+)%\\s+GPU') {
|
||||
$result.cpuPct = 0
|
||||
$result.gpuPct = [int]$Matches[1]
|
||||
} else {
|
||||
$result.error = "CPU/GPU split not parsed"
|
||||
}
|
||||
} catch {
|
||||
$result.error = $_.Exception.Message
|
||||
}
|
||||
return $result
|
||||
}
|
||||
|
||||
$prompt = [string](Get-Content -Raw -Path $PromptPath)
|
||||
$allowed = Get-AllowedLegs -promptText $prompt
|
||||
$allowedExpiry = $allowed.AllowedExpiry
|
||||
$allowedLegs = $allowed.AllowedLegs
|
||||
|
||||
if ([string]::IsNullOrWhiteSpace($BatchId)) {
|
||||
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
|
||||
}
|
||||
|
||||
$outBase = Join-Path -Path (Get-Location) -ChildPath "ollama_runs_remote"
|
||||
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
|
||||
|
||||
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
|
||||
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
|
||||
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
|
||||
|
||||
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
|
||||
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
|
||||
|
||||
$summary = [ordered]@{
|
||||
model = $Model
|
||||
baseUrl = $BaseUrl
|
||||
formatMode = $(if ($UseSchemaFormat) { "schema" } else { "json" })
|
||||
batchId = $BatchId
|
||||
gpuMonitor = [ordered]@{
|
||||
enabled = [bool]$EnableGpuMonitor
|
||||
sshHost = $SshHost
|
||||
sshPort = $SshPort
|
||||
intervalSec = $GpuMonitorIntervalSec
|
||||
durationSec = $GpuMonitorSeconds
|
||||
}
|
||||
runs = @()
|
||||
}
|
||||
|
||||
for ($i = 1; $i -le $Runs; $i++) {
|
||||
Write-Host "Running $Model (run $i/$Runs)"
|
||||
|
||||
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
|
||||
$gpuJob = $null
|
||||
$gpuLogPath = $null
|
||||
|
||||
if ($EnableGpuMonitor) {
|
||||
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
|
||||
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
|
||||
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
|
||||
$gpuJob = Start-Job -ScriptBlock {
|
||||
param($sshExe, $target, $port, $samples, $interval, $logPath)
|
||||
for ($s = 1; $s -le $samples; $s++) {
|
||||
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
|
||||
try {
|
||||
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
|
||||
Add-Content -Path $logPath -Value $out
|
||||
} catch {
|
||||
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
|
||||
}
|
||||
Start-Sleep -Seconds $interval
|
||||
}
|
||||
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
|
||||
Start-Sleep -Seconds 1
|
||||
}
|
||||
|
||||
$format = "json"
|
||||
if ($UseSchemaFormat) {
|
||||
$format = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("selectedExpiry","expiryRationale","strategyBias","recommendedTrades","whyOthersRejected","confidenceScore")
|
||||
properties = @{
|
||||
selectedExpiry = @{ type = "string"; minLength = 1 }
|
||||
expiryRationale = @{ type = "string"; minLength = 1 }
|
||||
strategyBias = @{ type = "string"; enum = @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE") }
|
||||
recommendedTrades = @{
|
||||
type = "array"
|
||||
minItems = 0
|
||||
maxItems = 3
|
||||
items = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
|
||||
properties = @{
|
||||
name = @{ type = "string"; minLength = 1 }
|
||||
structure = @{ type = "string"; minLength = 1 }
|
||||
legs = @{
|
||||
type = "array"
|
||||
minItems = 1
|
||||
maxItems = 4
|
||||
items = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("side","action","strike","expiry")
|
||||
                  properties = @{
                    side = @{ type = "string"; enum = @("call","put") }
                    action = @{ type = "string"; enum = @("buy","sell") }
                    strike = @{ type = @("number","string") }
                    expiry = @{ type = "string"; minLength = 1 }
                  }
                }
              }
              greekProfile = @{
                type = "object"
                additionalProperties = $false
                required = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                properties = @{
                  deltaBias = @{ type = "string"; enum = @("POS","NEG","NEUTRAL") }
                  gammaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
                  thetaExposure = @{ type = "string"; enum = @("POS","NEG","LOW") }
                  vegaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
                }
              }
              maxRisk = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
              maxReward = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
              thesisAlignment = @{ type = "string"; minLength = 1 }
              invalidation = @{ type = "string"; minLength = 1 }
              managementNotes = @{ type = "string" }
            }
          }
        }
        whyOthersRejected = @{
          type = "array"
          items = @{ type = "string" }
        }
        confidenceScore = @{ type = "number"; minimum = 0; maximum = 100 }
      }
    }
  }

  $options = @{
    temperature = 0
    top_k = $TopK
    top_p = $TopP
    seed = $Seed
    repeat_penalty = $RepeatPenalty
    num_ctx = $NumCtx
    num_predict = $NumPredict
  }
  if ($NumBatch -gt 0) {
    $options.num_batch = $NumBatch
  }
  if ($NumGpuLayers -gt 0) {
    $options.num_gpu_layers = $NumGpuLayers
  }

  $body = @{
    model = $Model
    prompt = $prompt
    format = $format
    stream = $false
    options = $options
  } | ConvertTo-Json -Depth 10

  try {
    $resp = Invoke-RestMethod -Uri "$BaseUrl/api/generate" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
  } catch {
    $runResult.errors = @("API error: $($_.Exception.Message)")
    $summary.runs += $runResult
    if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
    continue
  } finally {
    if ($gpuJob) {
      Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
      if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
      Remove-Job -Job $gpuJob | Out-Null
    }
  }

  $raw = [string]$resp.response

  $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
  Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

  try {
    $parsed = $raw | ConvertFrom-Json
    $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
    if ($errors.Count -eq 0) {
      $runResult.ok = $true
    } else {
      $runResult.errors = $errors
    }
  } catch {
    $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
  }

  if ($gpuLogPath) {
    $runResult.gpuLog = $gpuLogPath
    $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
  }

  if ($CheckProcessor) {
    $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
    $proc = Get-ProcessorShare -SshExePath $SshExe -Target $sshTarget -Port $SshPort -ModelName $Model
    $runResult.processor = $proc
    if ($proc.cpuPct -ne $null) {
      $runResult.gpuOnly = ($proc.cpuPct -eq 0)
    }
  }

  $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
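For orientation, a minimal sketch of one per-run record that this loop appends to summary.json (field names are taken from the script above; values and the gpuUsage/processor shapes are illustrative, since they come from the Parse-GpuLog and Get-ProcessorShare helpers defined earlier in the file):

{
  "ok": false,
  "errors": ["Invalid JSON: ..."],
  "gpuLog": "<gpu log path>",
  "gpuUsage": "<as returned by Parse-GpuLog>",
  "processor": { "cpuPct": 0 },
  "gpuOnly": true
}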
155
prompt_crwv.txt
Normal file
File diff suppressed because one or more lines are too long
1
query.sql
Normal file
@@ -0,0 +1 @@
SELECT p.title, p.privacy FROM playlists p JOIN users u ON p.author = u.email WHERE u.email = 'rushabh';
8
requirements.txt
Normal file
@@ -0,0 +1,8 @@
fastapi==0.115.6
uvicorn==0.30.6
httpx==0.27.2
pytest==8.3.3
respx==0.21.1
pytest-asyncio==0.24.0
PyYAML==6.0.3
websockets==12.0
116
scripts/deploy_truenas_wrapper.py
Normal file
@@ -0,0 +1,116 @@
import argparse
import asyncio
import json
import ssl
from typing import Any, Optional

import websockets


async def _rpc_call(ws_url: str, api_key: str, method: str, params: Optional[list] = None, verify_ssl: bool = False) -> Any:
    ssl_ctx = None
    if ws_url.startswith("wss://") and not verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [api_key]}))
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            raise RuntimeError("API key authentication failed")

        req_id = 2
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--ws-url", required=True)
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--api-user")
    parser.add_argument("--app-name", required=True)
    parser.add_argument("--image", required=True)
    parser.add_argument("--model-host-path", required=True)
    parser.add_argument("--llamacpp-base-url", required=True)
    parser.add_argument("--network", required=True)
    parser.add_argument("--api-port", type=int, default=9091)
    parser.add_argument("--ui-port", type=int, default=9092)
    parser.add_argument("--verify-ssl", action="store_true")
    args = parser.parse_args()

    api_port = args.api_port
    ui_port = args.ui_port

    env = {
        "PORT_A": str(api_port),
        "PORT_B": str(ui_port),
        "LLAMACPP_BASE_URL": args.llamacpp_base_url,
        "MODEL_DIR": "/models",
        "TRUENAS_WS_URL": args.ws_url,
        "TRUENAS_API_KEY": args.api_key,
        "TRUENAS_APP_NAME": "llamacpp",
        "TRUENAS_VERIFY_SSL": "false",
    }
    if args.api_user:
        env["TRUENAS_API_USER"] = args.api_user

    compose = {
        "services": {
            "wrapper": {
                "image": args.image,
                "restart": "unless-stopped",
                "ports": [
                    f"{api_port}:{api_port}",
                    f"{ui_port}:{ui_port}",
                ],
                "environment": env,
                "volumes": [
                    f"{args.model_host_path}:/models",
                    "/var/run/docker.sock:/var/run/docker.sock",
                ],
                "networks": ["llamacpp_net"],
            }
        },
        "networks": {
            "llamacpp_net": {"external": True, "name": args.network}
        },
    }

    create_payload = {
        "custom_app": True,
        "app_name": args.app_name,
        "custom_compose_config": compose,
    }

    existing = await _rpc_call(args.ws_url, args.api_key, "app.query", [[["id", "=", args.app_name]]], args.verify_ssl)
    if existing:
        result = await _rpc_call(
            args.ws_url,
            args.api_key,
            "app.update",
            [args.app_name, {"custom_compose_config": compose}],
            args.verify_ssl,
        )
        action = "updated"
    else:
        result = await _rpc_call(args.ws_url, args.api_key, "app.create", [create_payload], args.verify_ssl)
        action = "created"

    print(json.dumps({"action": action, "api_port": api_port, "ui_port": ui_port, "result": result}, indent=2))


if __name__ == "__main__":
    asyncio.run(main())
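A quick way to confirm the result out-of-band is to reuse the same RPC helper; a minimal sketch, assuming the script is importable as a module and with placeholder connection values:

import asyncio

from deploy_truenas_wrapper import _rpc_call  # import path is an assumption

async def check() -> None:
    # Mirrors the existence check above: app.query with an id filter.
    apps = await _rpc_call(
        "wss://192.168.1.2/websocket",  # --ws-url value (placeholder)
        "<api-key>",                    # --api-key value (placeholder)
        "app.query",
        [[["id", "=", "<app-name>"]]],
    )
    print(apps if apps else "app not found")

asyncio.run(check())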
162
scripts/remote_wrapper_test.py
Normal file
@@ -0,0 +1,162 @@
import json
import os
import time
from datetime import datetime

import requests


BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9000")
UPSTREAM = os.getenv("LLAMACPP_BASE", "http://192.168.1.2:8071")
RUNS = int(os.getenv("RUNS", "100"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4"))
TIMEOUT = int(os.getenv("REQ_TIMEOUT", "300"))


def _now():
    return datetime.utcnow().isoformat() + "Z"


def _get_loaded_model_id():
    deadline = time.time() + 600
    last_error = None
    while time.time() < deadline:
        try:
            resp = requests.get(UPSTREAM + "/v1/models", timeout=30)
            resp.raise_for_status()
            data = resp.json().get("data") or []
            if data:
                return data[0].get("id")
            last_error = "no models reported by upstream"
        except Exception as exc:
            last_error = str(exc)
        time.sleep(5)
    raise RuntimeError(f"upstream not ready: {last_error}")


def _stream_ok(resp):
    got_data = False
    got_done = False
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            got_data = True
        if line.strip() == "data: [DONE]":
            got_done = True
            break
    return got_data, got_done


def run_suite(model_id, idx):
    results = {}

    # Models
    r = requests.get(BASE + "/v1/models", timeout=30)
    results["models"] = r.status_code

    r = requests.get(BASE + f"/v1/models/{model_id}", timeout=30)
    results["model_get"] = r.status_code

    # Chat completions non-stream
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": f"Run {idx}: say ok."}],
        "max_tokens": MAX_TOKENS,
        "temperature": (idx % 5) / 10.0,
    }
    r = requests.post(BASE + "/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["chat"] = r.status_code

    # Chat completions stream
    payload_stream = dict(payload)
    payload_stream["stream"] = True
    r = requests.post(BASE + "/v1/chat/completions", json=payload_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["chat_stream"] = r.status_code
    results["chat_stream_ok"] = ok_data and ok_done

    # Responses non-stream
    payload_resp = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "max_output_tokens": MAX_TOKENS,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp, timeout=TIMEOUT)
    results["responses"] = r.status_code

    # Responses stream
    payload_resp_stream = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "stream": True,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["responses_stream"] = r.status_code
    results["responses_stream_ok"] = ok_data and ok_done

    # Embeddings (best effort)
    payload_emb = {"model": model_id, "input": f"Run {idx}"}
    r = requests.post(BASE + "/v1/embeddings", json=payload_emb, timeout=TIMEOUT)
    results["embeddings"] = r.status_code

    # Proxy
    r = requests.post(BASE + "/proxy/llamacpp/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["proxy"] = r.status_code

    return results


def main():
    summary = {
        "started_at": _now(),
        "base": BASE,
        "upstream": UPSTREAM,
        "runs": RUNS,
        "max_tokens": MAX_TOKENS,
        "results": [],
    }

    model_id = _get_loaded_model_id()
    summary["model_id"] = model_id

    for i in range(1, RUNS + 1):
        start = time.time()
        try:
            results = run_suite(model_id, i)
            ok = all(
                results.get(key) == 200
                for key in ("models", "model_get", "chat", "chat_stream", "responses", "responses_stream", "proxy")
            )
            stream_ok = results.get("chat_stream_ok") and results.get("responses_stream_ok")
            summary["results"].append({
                "run": i,
                "ok": ok and stream_ok,
                "stream_ok": stream_ok,
                "status": results,
                "elapsed_s": round(time.time() - start, 2),
            })
        except Exception as exc:
            summary["results"].append({
                "run": i,
                "ok": False,
                "stream_ok": False,
                "error": str(exc),
                "elapsed_s": round(time.time() - start, 2),
            })
        print(f"Run {i}/{RUNS} done")

    summary["finished_at"] = _now()

    os.makedirs("reports", exist_ok=True)
    out_path = os.path.join("reports", "remote_wrapper_test.json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)

    # Print a compact summary
    ok_count = sum(1 for r in summary["results"] if r.get("ok"))
    print(f"OK {ok_count}/{RUNS}")


if __name__ == "__main__":
    main()
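For later analysis, a minimal sketch that re-reads the report written above (path and keys match the summary dict in this script; the 30-second "slow" threshold is an arbitrary example):

import json

with open("reports/remote_wrapper_test.json", encoding="utf-8") as f:
    summary = json.load(f)

ok = sum(1 for r in summary["results"] if r.get("ok"))
slow = [r["run"] for r in summary["results"] if r.get("elapsed_s", 0) > 30]
print(f"{ok}/{summary['runs']} runs fully OK; slow runs: {slow}")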
29
scripts/update_llamacpp_flags.ps1
Normal file
@@ -0,0 +1,29 @@
param(
    [string]$OutDocs = "reports\llamacpp_docs.md",
    [string]$OutFlags = "reports\llamacpp_flags.txt"
)

$urls = @(
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README-llama-server.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
)

$out = @()
foreach ($u in $urls) {
    try {
        $content = Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30
        $out += "# Source: $u"
        $out += $content.Content
    } catch {
        $out += "# Source: $u"
        $out += "(failed to fetch)"
    }
}

$out | Set-Content -Encoding UTF8 $OutDocs

$docs = Get-Content $OutDocs -Raw
$flags = [regex]::Matches($docs, "--[a-zA-Z0-9\-]+") | ForEach-Object { $_.Value }
$flags = $flags | Sort-Object -Unique
$flags | Set-Content -Encoding UTF8 $OutFlags
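The same flag scrape in Python, as a rough equivalent sketch (single URL shown; assumes network access):

import re
import urllib.request

url = "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
text = urllib.request.urlopen(url, timeout=30).read().decode("utf-8", "replace")
# Same idea as the PowerShell regex: collect long-form flags such as --ctx-size.
flags = sorted(set(re.findall(r"--[a-zA-Z0-9\-]+", text)))
print("\n".join(flags))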
60
tests/conftest.py
Normal file
@@ -0,0 +1,60 @@
import json
from pathlib import Path

import pytest
from fastapi.testclient import TestClient
import respx

from app.api_app import create_api_app
from app.ui_app import create_ui_app


@pytest.fixture()
def agents_config(tmp_path: Path) -> Path:
    data = {
        "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
        "container_name": "ix-llamacpp-llamacpp-1",
        "host_port": 8071,
        "container_port": 8080,
        "web_ui_url": "http://0.0.0.0:8071/",
        "model_host_path": str(tmp_path),
        "model_container_path": str(tmp_path),
        "models": [],
        "network": "ix-llamacpp_default",
        "subnets": ["172.16.18.0/24"],
        "gpu_count": 2,
        "gpu_name": "NVIDIA RTX 5060 Ti",
    }
    path = tmp_path / "agents_config.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


@pytest.fixture()
def model_dir(tmp_path: Path) -> Path:
    (tmp_path / "model-a.gguf").write_text("x", encoding="utf-8")
    (tmp_path / "model-b.gguf").write_text("y", encoding="utf-8")
    return tmp_path


@pytest.fixture()
def api_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    monkeypatch.setenv("LLAMACPP_BASE_URL", "http://llama.test")
    app = create_api_app()
    return TestClient(app)


@pytest.fixture()
def ui_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    app = create_ui_app()
    return TestClient(app)


@pytest.fixture()
def respx_mock():
    with respx.mock(assert_all_called=False) as mock:
        yield mock
76
tests/test_chat_completions.py
Normal file
@@ -0,0 +1,76 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"id": f"chatcmpl-{case}", "choices": [{"message": {"content": "ok"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "temperature": (case % 10) / 10,
    }
    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["choices"][0]["message"]["content"] == "ok"


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        content = b"data: {\"id\": \"chunk\"}\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "stream": True,
    }
    with api_client.stream("POST", "/v1/chat/completions", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
        assert b"data:" in body


def test_chat_completions_tools_normalize(api_client, respx_mock):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def handler(request):
        data = request.json()
        tools = data.get("tools") or []
        assert tools
        assert tools[0].get("function", {}).get("name") == "format_final_json_response"
        return httpx.Response(200, json={"id": "chatcmpl-tools", "choices": [{"message": {"content": "ok"}}]})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=handler)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": "hello"}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "parameters": {"type": "object"},
            }
        ],
        "tool_choice": {"type": "function", "name": "format_final_json_response"},
    }

    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
14
tests/test_embeddings.py
Normal file
@@ -0,0 +1,14 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_embeddings(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/v1/embeddings").mock(
        return_value=httpx.Response(200, json={"data": [{"embedding": [0.1, 0.2]}]})
    )
    payload = {"model": "model-a.gguf", "input": f"text-{case}"}
    resp = api_client.post("/v1/embeddings", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert "data" in data
24
tests/test_models.py
Normal file
@@ -0,0 +1,24 @@
import pytest


@pytest.mark.parametrize("case", list(range(120)))
def test_list_models_cases(api_client, case):
    resp = api_client.get("/v1/models", headers={"x-case": str(case)})
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["object"] == "list"
    assert isinstance(payload["data"], list)


@pytest.mark.parametrize("model_id", ["model-a.gguf"] * 120)
def test_get_model_ok(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["id"] == model_id


@pytest.mark.parametrize("model_id", [f"missing-{i}" for i in range(120)])
def test_get_model_not_found(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 404
12
tests/test_proxy.py
Normal file
@@ -0,0 +1,12 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_proxy_passthrough(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/test/path").mock(
        return_value=httpx.Response(200, content=f"ok-{case}".encode())
    )
    resp = api_client.post("/proxy/llamacpp/test/path", content=b"hello")
    assert resp.status_code == 200
    assert resp.content.startswith(b"ok-")
282
tests/test_remote_wrapper.py
Normal file
@@ -0,0 +1,282 @@
import json
import os
import ssl
import time
from typing import Dict, List

import pytest
import requests
import websockets

WRAPPER_BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9093")
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
TRUENAS_WS_URL = os.getenv("TRUENAS_WS_URL", "wss://192.168.1.2/websocket")
TRUENAS_API_KEY = os.getenv("TRUENAS_API_KEY", "")
TRUENAS_APP_NAME = os.getenv("TRUENAS_APP_NAME", "llamacpp")
MODEL_REQUEST = os.getenv("MODEL_REQUEST", "")


async def _rpc_call(method: str, params: List | None = None):
    if not TRUENAS_API_KEY:
        pytest.skip("TRUENAS_API_KEY not set")
    ssl_ctx = ssl.create_default_context()
    ssl_ctx.check_hostname = False
    ssl_ctx.verify_mode = ssl.CERT_NONE
    async with websockets.connect(TRUENAS_WS_URL, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect")
        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [TRUENAS_API_KEY]}))
        auth = json.loads(await ws.recv())
        if not auth.get("result"):
            raise RuntimeError("auth failed")
        await ws.send(json.dumps({"id": 2, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != 2:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


def _get_models() -> List[str]:
    _wait_for_http(WRAPPER_BASE + "/health")
    resp = requests.get(WRAPPER_BASE + "/v1/models", timeout=30)
    resp.raise_for_status()
    data = resp.json().get("data") or []
    return [m.get("id") for m in data if m.get("id")]


def _assert_chat_ok(resp_json: Dict) -> str:
    choices = resp_json.get("choices") or []
    assert choices, "no choices"
    message = choices[0].get("message") or {}
    text = message.get("content") or ""
    assert text.strip(), "empty content"
    return text


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def _post_with_retry(url: str, payload: Dict, timeout_s: float = 300, retries: int = 6, delay_s: float = 5.0):
    last = None
    for _ in range(retries):
        try:
            resp = requests.post(url, json=payload, timeout=timeout_s)
            if resp.status_code == 200:
                return resp
            last = resp
        except requests.exceptions.RequestException as exc:
            last = exc
        time.sleep(delay_s)
    if isinstance(last, Exception):
        raise last
    return last


@pytest.mark.asyncio
async def test_active_model_and_multi_gpu_flags():
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    assert "--tensor-split" in command
    split_idx = command.index("--tensor-split") + 1
    split = command[split_idx]
    assert "," in split, f"tensor-split missing commas: {split}"
    assert "--split-mode" in command


def test_models_listed():
    models = _get_models()
    assert models, "no models discovered"


def test_chat_completions_switch_and_prompts():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 12,
        "temperature": 0,
    }
    for _ in range(3):
        resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
        assert resp.status_code == 200
        _assert_chat_ok(resp.json())


def test_tools_flat_format():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_functions_payload_normalized():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "functions": [
            {
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_return_format_json():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Return JSON with key ok true."}],
        "return_format": "json",
        "max_tokens": 32,
        "temperature": 0,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    text = _assert_chat_ok(resp.json())
    parsed = json.loads(text)
    assert isinstance(parsed, dict)


def test_responses_endpoint():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "input": "Say OK.",
        "max_output_tokens": 16,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/responses", payload)
    assert resp.status_code == 200
    output = resp.json().get("output") or []
    assert output, "responses output empty"
    content = output[0].get("content") or []
    text = content[0].get("text") if content else ""
    assert text and text.strip()


@pytest.mark.asyncio
async def test_model_switch_applied_to_truenas():
    models = _get_models()
    assert models, "no models"
    target = MODEL_REQUEST or models[0]
    assert target in models, f"MODEL_REQUEST not found: {target}"
    resp = requests.post(UI_BASE + "/ui/api/switch-model", json={"model_id": target, "warmup_prompt": "warmup"}, timeout=600)
    assert resp.status_code == 200
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    model_path = command[command.index("--model") + 1]
    assert model_path.endswith(target)


def test_invalid_model_rejected():
    models = _get_models()
    assert models, "no models"
    payload = {
        "model": "modelx-q8:4b",
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 8,
        "temperature": 0,
    }
    resp = requests.post(WRAPPER_BASE + "/v1/chat/completions", json=payload, timeout=60)
    assert resp.status_code == 404


def test_llamacpp_logs_streaming():
    logs = ""
    for _ in range(5):
        try:
            resp = requests.get(UI_BASE + "/ui/api/llamacpp-logs", timeout=10)
            if resp.status_code == 200:
                logs = resp.json().get("logs") or ""
                if logs.strip():
                    break
        except requests.exceptions.ReadTimeout:
            pass
        time.sleep(2)
    assert logs.strip(), "no logs returned"

    # Force a log line before streaming.
    try:
        requests.get(WRAPPER_BASE + "/proxy/llamacpp/health", timeout=5)
    except Exception:
        pass

    # Stream endpoint may not emit immediately, so validate that the endpoint responds.
    with requests.get(UI_BASE + "/ui/api/llamacpp-logs/stream", stream=True, timeout=(5, 5)) as resp:
        assert resp.status_code == 200
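Unlike the respx-mocked suites above, these tests exercise a live deployment; a sketch of the environment they read (defaults come from the module constants above, concrete values illustrative):

# WRAPPER_BASE=http://192.168.1.2:9093        wrapper API under test
# UI_BASE=http://192.168.1.2:9094             wrapper UI under test
# TRUENAS_WS_URL=wss://192.168.1.2/websocket
# TRUENAS_API_KEY=<key>                       RPC-backed tests skip when unset
# TRUENAS_APP_NAME=llamacpp
# MODEL_REQUEST=<model.gguf>                  optional: pin a specific model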
55
tests/test_responses.py
Normal file
@@ -0,0 +1,55 @@
import json
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"choices": [{"message": {"content": f"reply-{case}"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "max_output_tokens": 32,
    }
    resp = api_client.post("/v1/responses", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["object"] == "response"
    assert data["output"][0]["content"][0]["text"].startswith("reply-")


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        payload = {
            "id": "chunk",
            "object": "chat.completion.chunk",
            "choices": [{"delta": {"content": f"hi-{case}"}, "index": 0, "finish_reason": None}],
        }
        content = f"data: {json.dumps(payload)}\n\n".encode()
        content += b"data: [DONE]\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "stream": True,
    }
    with api_client.stream("POST", "/v1/responses", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
        assert b"event: response.created" in body
        assert b"event: response.output_text.delta" in body
        assert b"event: response.completed" in body
53
tests/test_truenas_switch.py
Normal file
@@ -0,0 +1,53 @@
import pytest

from app.truenas_middleware import TrueNASConfig, switch_model


@pytest.mark.asyncio
@pytest.mark.parametrize("case", list(range(120)))
async def test_switch_model_updates_command(monkeypatch, case):
    compose = {
        "services": {
            "llamacpp": {
                "command": [
                    "--model",
                    "/models/old.gguf",
                    "--ctx-size",
                    "2048",
                ]
            }
        }
    }

    captured = {}

    async def fake_rpc_call(cfg, method, params=None):
        if method == "app.config":
            return {"custom_compose_config": compose}
        if method == "app.update":
            captured["payload"] = params[1]
            return {"state": "RUNNING"}
        raise AssertionError(f"unexpected method {method}")

    monkeypatch.setattr("app.truenas_middleware._rpc_call", fake_rpc_call)

    cfg = TrueNASConfig(
        ws_url="ws://truenas.test/websocket",
        api_key="key",
        api_user=None,
        app_name="llamacpp",
        verify_ssl=False,
    )

    await switch_model(
        cfg,
        f"/models/new-{case}.gguf",
        {"n_gpu_layers": "999"},
        "--flash-attn on",
    )

    assert "custom_compose_config" in captured["payload"]
    cmd = captured["payload"]["custom_compose_config"]["services"]["llamacpp"]["command"]
    assert "--model" in cmd
    idx = cmd.index("--model")
    assert cmd[idx + 1].endswith(f"new-{case}.gguf")
46
tests/test_ui.py
Normal file
@@ -0,0 +1,46 @@
import os
import time

import requests

UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def test_ui_index_contains_expected_elements():
    _wait_for_http(UI_BASE + "/health")
    resp = requests.get(UI_BASE + "/", timeout=30)
    assert resp.status_code == 200
    html = resp.text
    assert "Model Manager" in html
    assert "id=\"download-form\"" in html
    assert "id=\"models-list\"" in html
    assert "id=\"logs-output\"" in html
    assert "id=\"theme-toggle\"" in html


def test_ui_assets_available():
    resp = requests.get(UI_BASE + "/ui/styles.css", timeout=30)
    assert resp.status_code == 200
    css = resp.text
    assert "data-theme" in css

    resp = requests.get(UI_BASE + "/ui/app.js", timeout=30)
    assert resp.status_code == 200
    js = resp.text
    assert "themeToggle" in js
    assert "localStorage" in js
    assert "logs-output" in js
1
tmp_channels_cols.sql
Normal file
@@ -0,0 +1 @@
SELECT column_name, data_type FROM information_schema.columns WHERE table_name='channels' ORDER BY ordinal_position;
1
tmp_pref_type.sql
Normal file
@@ -0,0 +1 @@
SELECT data_type FROM information_schema.columns WHERE table_name='users' AND column_name='preferences';
1
tmp_update_max_results.sql
Normal file
@@ -0,0 +1 @@
UPDATE users SET preferences = (jsonb_set(preferences::jsonb, '{max_results}', '200'::jsonb, true))::text WHERE email='rushabh';
56
trades_company_stock.txt
Normal file
@@ -0,0 +1,56 @@
You are a senior quantitative options trader (index/ETF options across regimes; also liquid single-name options and macro-sensitive metal ETFs), specializing in volatility, structure selection, and risk asymmetry. Decisive, skeptical, profit-focused.

You are given:
- A validated market thesis (authoritative): multi-timeframe technicals, regime, volatility context, news impact.
- Pre-processed options chains for three expiries (short / medium / extended) with liquidity-filtered contracts, ATM/delta anchors, delta ladders, and a liquid execution set.
- All pricing, greeks, spreads, and liquidity metrics required for execution-quality decisions.

Assume:
- Data is correct and cleaned.
- You must NOT re-analyze technicals or news; the thesis is authoritative.
- Your job is to convert thesis + surface into executable options trades.

Objective:
- Select the best expiry and propose 1–3 high-quality options trades that align with thesis bias/regime, exploit volatility characteristics (gamma/theta/vega fit), are liquid/fillable/risk-defined, and include clear invalidation logic.
- If no trade offers favorable risk/reward: strategyBias=NO_TRADE and explain why.

How to decide:
1) Compare expiries: match time-to-playout vs confidence/uncertainty; match vol regime (expansion vs decay); reject poor liquidity density; reject misaligned vega/theta; avoid overpaying for time/vol.
2) Choose structure class (explicitly justify vs alternatives): directional debit (single/vertical), volatility (straddle/strangle), defined-risk premium selling only if the regime supports it.
3) Select strikes ONLY from provided data (ATM anchor, delta ladder, liquidSet). Prefer tight spreads, meaningful volume & OI, and greeks that express the thesis.
4) Risk discipline: every trade must include max risk, what must go right, and what breaks the trade (invalidation).

Optional tools (use only when they materially improve decision quality; otherwise do not call):
- MarketData – Options Chain (expiry-specific): only if provided expiries do not sufficiently match the thesis horizon, or liquidity/skew is materially better in a nearby expiry not already supplied. Choose an explicit expiry date. Use returned data only for strike selection and liquidity validation. Do not re-fetch already provided expiries unless validating anomalies.
- Fear & Greed Index (FGI): only for index/ETF/macro-sensitive underlyings (e.g., SPX, NDX, IWM, SLV). Contextual only (risk appetite / convexity vs tempered), not a primary signal.

Hard constraints:
- Do NOT invent strikes, expiries, or prices.
- Do NOT suggest illiquid contracts.
- Do NOT recommend naked risk.
- Do NOT hedge unless justified.
- Do NOT repeat raw data back.

Return ONLY valid JSON in exactly this shape:
{
  "selectedExpiry": "YYYY-MM-DD",
  "expiryRationale": "Why this expiry dominates the others given thesis + vol + liquidity",
  "strategyBias": "DIRECTIONAL|VOLATILITY|NEUTRAL|NO_TRADE",
  "recommendedTrades": [
    {
      "name": "Short descriptive name",
      "structure": "e.g. Long Call, Call Debit Spread, Long Strangle",
      "legs": [{"side":"call|put","action":"buy|sell","strike":0,"expiry":"YYYY-MM-DD"}],
      "greekProfile": {"deltaBias":"POS|NEG|NEUTRAL","gammaExposure":"HIGH|MED|LOW","thetaExposure":"POS|NEG|LOW","vegaExposure":"HIGH|MED|LOW"},
      "maxRisk": "Defined numeric or qualitative",
      "maxReward": "Defined numeric or qualitative",
      "thesisAlignment": "Exactly how this trade expresses the thesis",
      "invalidation": "Clear condition where trade is wrong",
      "managementNotes": "Optional: scale, take-profit, time stop"
    }
  ],
  "whyOthersRejected": ["Why other expiries or strategy types were inferior"],
  "confidenceScore": 0
}

Final note: optimize for repeatable profitability under uncertainty. If conditions are marginal, say NO_TRADE with conviction.
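One illustrative response that satisfies this shape and the validation schema earlier in the commit (the strikes, dates, and wording are invented for the example only):

{
  "selectedExpiry": "2025-03-21",
  "expiryRationale": "Medium expiry matches the thesis playout window with the densest liquid strikes.",
  "strategyBias": "DIRECTIONAL",
  "recommendedTrades": [
    {
      "name": "Call debit spread into strength",
      "structure": "Call Debit Spread",
      "legs": [
        {"side": "call", "action": "buy", "strike": 100, "expiry": "2025-03-21"},
        {"side": "call", "action": "sell", "strike": 110, "expiry": "2025-03-21"}
      ],
      "greekProfile": {"deltaBias": "POS", "gammaExposure": "MED", "thetaExposure": "NEG", "vegaExposure": "LOW"},
      "maxRisk": "Net debit paid",
      "maxReward": "Spread width minus debit",
      "thesisAlignment": "Defined-risk long delta expresses the bullish thesis without overpaying for vol.",
      "invalidation": "Close below the thesis support level before expiry.",
      "managementNotes": "Take profit at 2x debit; time stop at 7 DTE."
    }
  ],
  "whyOthersRejected": ["Short expiry leaves insufficient time to playout; extended expiry overpays vega."],
  "confidenceScore": 72
}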