Initial commit

This commit is contained in:
Rushabh Gosar
2026-01-07 16:54:39 -08:00
commit 5d1a0ee72b
53 changed files with 9885 additions and 0 deletions

142
.gitignore vendored Normal file

@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# Project-specific
/inventory_raw/
/llamacpp_runs_remote/
/ollama_runs_remote/
/reports/
/tmp/
*.log
/C:/Users/Rushabh/.gemini/tmp/bff31f86566324f77927540d72088ce62479fd0563c197318c9f0594af2e69ee/
# OS-generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

4206
AGENTS.full.md Normal file

File diff suppressed because it is too large

20
AGENTS.md Normal file

@@ -0,0 +1,20 @@
# AGENTS (compressed)
This is the compact working context. For the full historical inventory and detailed snapshots, see `AGENTS.full.md` and `inventory_raw/`.
## Access + basics
- SSH: `ssh -p 55555 rushabh@192.168.1.2`
- Sudo: `sudo -n true`
- TrueNAS UI: `http://192.168.1.2`
## Full context pointers
- Full inventory snapshot and extra system details: `AGENTS.full.md`
- Raw captured data: `inventory_raw/`
- Documentation notes: `docs/*`
## Projects
- n8n Thesis Builder checkpoint (2026-01-04): `docs/n8n-thesis-builder-checkpoint-20260104.md`
- llamaCpp wrapper: A Python-based OpenAI-compatible API wrapper and model manager for the TrueNAS llama.cpp app.
- Location: `llamaCpp.Wrapper.app/`
- API Port: `9093`
- UI Port: `9094`
- See the `README.md` inside the folder for full details.

69
README.md Normal file

@@ -0,0 +1,69 @@
# Codex TrueNAS Helper
This project is a collection of scripts, configurations, and applications to manage and enhance a TrueNAS SCALE server, with a special focus on running and interacting with large language models (LLMs) like those powered by `llama.cpp` and `Ollama`.
## Features
* **`llama.cpp` Wrapper:** A sophisticated wrapper for the `llama.cpp` TrueNAS application that provides:
* An OpenAI-compatible API for chat completions and embeddings.
* A web-based UI for managing models (listing, downloading).
* The ability to hot-swap models without restarting the `llama.cpp` container by interacting with the TrueNAS API.
* **TrueNAS Inventory:** A snapshot of the TrueNAS server's configuration, including hardware, storage, networking, and running applications.
* **Automation Scripts:** A set of PowerShell and Python scripts for tasks like deploying the wrapper and testing remote endpoints.
* **LLM Integration:** Tools and configurations for working with various LLMs.
## Directory Structure
* `AGENTS.md` & `AGENTS.full.md`: These files contain detailed information and a complete inventory of the TrueNAS server's configuration.
* `llamaCpp.Wrapper.app/`: A Python-based application that wraps the `llama.cpp` TrueNAS app with an OpenAI-compatible API and a model management UI.
* `scripts/`: Contains various scripts for deployment, testing, and other tasks.
* `inventory_raw/`: Raw data dumps from the TrueNAS server, used to generate the inventory in `AGENTS.full.md`.
* `reports/`: Contains generated reports, test results, and other artifacts.
* `llamacpp_runs_remote/` & `ollama_runs_remote/`: Logs and results from running LLMs.
* `modelfiles/`: Modelfiles for different language models.
* `tests/`: Python tests for the `llamaCpp.Wrapper.app`.
## `llamaCpp.Wrapper.app`
This is the core component of the project: a Python application that proxies requests to the `llama.cpp` server running on TrueNAS while layering extra features on top.
### Running Locally
1. Install the required Python packages:
```bash
pip install -r llamaCpp.Wrapper.app/requirements.txt
```
2. Run the application from inside the wrapper directory (the folder name contains dots, so `python -m llamaCpp.Wrapper.app.run` is not an importable module path):
```bash
cd llamaCpp.Wrapper.app
python -m app.run
```
This will start two web servers: one for the API (default port 9093) and one for the UI (default port 9094).
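To confirm both servers came up, a quick check (a sketch, assuming the default ports and `httpx`, which the wrapper already depends on):
```python
import httpx

# API health endpoint (FastAPI app on PORT_A); reports base_url, model_dir and GPU info.
print(httpx.get("http://127.0.0.1:9093/health", timeout=10).json())

# UI root (PORT_B); a 200 means the model manager page is being served.
print(httpx.get("http://127.0.0.1:9094/", timeout=10).status_code)
```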
### Docker (TrueNAS)
The wrapper can be run as a Docker container on TrueNAS. See the `llamaCpp.Wrapper.app/README.md` file for a detailed example of the `docker run` command. The wrapper needs to be configured with the appropriate environment variables to connect to the TrueNAS API and the `llama.cpp` container.
### Model Hot-Swapping
The wrapper can switch models in the `llama.cpp` server by updating the application's command via the TrueNAS API. This is a powerful feature that allows for dynamic model management without manual intervention.
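Because the wrapper resolves the `model` field on every request, a chat completion naming a different model file is enough to trigger a swap (a sketch; host, port, and model id come from the notes in this repo and may differ on another setup):
```python
import httpx

payload = {
    # Must exactly match an id returned by GET /v1/models (exact-match resolution).
    "model": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "Reply with the word 'ready'."}],
}

# The wrapper updates the llama.cpp app command via the TrueNAS API, waits for the
# new model to load, runs a warmup prompt, then proxies this request.
resp = httpx.post("http://192.168.1.2:9093/v1/chat/completions", json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])
```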
## Scripts
* `deploy_truenas_wrapper.py`: A Python script to deploy the `llamaCpp.Wrapper.app` to TrueNAS.
* `remote_wrapper_test.py`: A Python script for testing the remote wrapper.
* `update_llamacpp_flags.ps1`: A PowerShell script to update the `llama.cpp` flags.
* `llamacpp_remote_test.ps1` & `ollama_remote_test.ps1`: PowerShell scripts for testing `llama.cpp` and `Ollama` remote endpoints.
## Getting Started
1. **Explore the Inventory:** Start by reading `AGENTS.md` and `AGENTS.full.md` to understand the TrueNAS server's configuration.
2. **Set up the Wrapper:** If you want to use the `llama.cpp` wrapper, follow the instructions in `llamaCpp.Wrapper.app/README.md` to run it either locally or as a Docker container on TrueNAS.
3. **Use the Scripts:** The scripts in the `scripts` directory can be used to automate various tasks.
## Development
The `llamaCpp.Wrapper.app` has a suite of tests located in the `tests/` directory. To run the tests, use `pytest`:
```bash
pytest
```


@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes
Last updated: 2026-01-04
## Purpose
OpenAI-compatible wrapper for the existing `llamacpp` app with a model manager UI,
model switching, and parameter management via TrueNAS middleware.
## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`
## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`
## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows active model and supports switching with verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via Docker socket fallback when TrueNAS log APIs are unavailable.
## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to JSON-only system prompt if llama.cpp rejects `response_format`).
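Illustrative before/after of the tool normalization (the tool itself is invented; the shapes follow `app/openai_translate.py`):
```python
# Flat tool as sent by some n8n/OpenWebUI nodes:
flat = {"type": "function", "name": "get_quote",
        "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}}}

# Equivalent OpenAI-format tool forwarded to llama.cpp:
normalized = {"type": "function",
              "function": {"name": "get_quote",
                           "parameters": {"type": "object",
                                          "properties": {"symbol": {"type": "string"}}}}}
```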
## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.
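For friendlier names, an explicit alias can map onto the exact file name (a sketch; the alias is a placeholder and would normally come from the `MODEL_ALIASES` env var):
```python
from app.model_registry import resolve_model

aliases = {"llama3-8b": "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"}  # e.g. parsed from MODEL_ALIASES
model = resolve_model("/models", "llama3-8b", aliases)
print(model.model_id if model else "no exact match -> wrapper returns 404")
```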
## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)
## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies switch by sending a minimal prompt.
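Both UI endpoints can also be driven directly (a sketch; host, port, and model id are placeholders):
```python
import httpx

UI = "http://192.168.1.2:9094"

# Read the active model plus current llama.cpp flags and extra args.
print(httpx.get(f"{UI}/ui/api/llamacpp-config", timeout=30).json())

# Ask the wrapper to switch models; this redeploys llama.cpp, so allow a long timeout.
resp = httpx.post(f"{UI}/ui/api/switch-model",
                  json={"model_id": "Qwen2.5-7B-Instruct-Q4_K_M.gguf"}, timeout=600)
print(resp.status_code, resp.text)
```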
## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
- `WRAPPER_BASE=http://192.168.1.2:9093`
- `UI_BASE=http://192.168.1.2:9094`
- `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
- `TRUENAS_API_KEY=...`
- `MODEL_REQUEST=<exact model id from /v1/models>`
## Runtime Validation (2026-01-04)
- Fixed llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect transient `503 Loading model` during warmup; retry after load completes.
- Verified a switch to `yarn-llama-2-13b-64k.Q4_K_M.gguf` from the wrapper; a tool-enabled chat request completed after the model loaded (~107s).


@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)
## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).
## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of AI output.
- `HTTP Request13` query forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted on the `Aggregate2` -> `AI - Thesis Builder` connection.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.
## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (example `symbol` and `thesis` fields full of junk tokens).
- `AI - Technicals Auditor` output looks valid JSON (see sample below).
- `Aggregate2` payload size ~6.7KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response format handling.
### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`
### Sample `AI - Technicals Auditor` output (valid JSON)
```
{
"output": {
"timeframes": [
{ "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
],
"optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
"dataQualityScore": 0.5,
"error": "INSUFFICIENT_DATA"
}
}
```
## Open issues
- Thesis Builder garbling persists even with small prompt; likely model/wrapper output issue.
- Need to confirm whether llama.cpp wrapper is corrupting output or model is misconfigured for JSON-only output.
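One way to narrow this down is to send the same JSON-mode prompt to the wrapper (`:9093`) and to llama.cpp's native port (`:8071`) and compare the raw outputs (a sketch; the model id is a placeholder):
```python
import httpx

payload = {
    "model": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",  # placeholder: use the currently loaded model
    "messages": [{"role": "user", "content": 'Return {"ok": true} as JSON.'}],
    "response_format": {"type": "json_object"},
}

for name, base in [("wrapper", "http://192.168.1.2:9093"), ("llama.cpp", "http://192.168.1.2:8071")]:
    resp = httpx.post(f"{base}/v1/chat/completions", json=payload, timeout=300)
    data = resp.json()
    text = data.get("choices", [{}])[0].get("message", {}).get("content", data)
    print(name, resp.status_code, str(text)[:200])
```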
## Useful commands
- Last runs:
`SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
`sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`


@@ -0,0 +1,16 @@
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
COPY app /app/app
COPY trades_company_stock.txt /app/trades_company_stock.txt
EXPOSE 8000 8001
CMD ["python", "-m", "app.run"]


@@ -0,0 +1,134 @@
# llama.cpp OpenAI-Compatible Wrapper
This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` (build-time) into `app/agents_config.json`.
## Current Agents-Derived Details
- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from AGENTS snapshot)
Regenerate the derived config after updating `AGENTS.md`:
```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```
## Running Locally
```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```
Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to container name or localhost based on agents config)
- Model dir: `MODEL_DIR=/models`
## Docker (TrueNAS)
Example (join existing llama.cpp network and mount models):
```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
--network ix-llamacpp_default \
-v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
-v /var/run/docker.sock:/var/run/docker.sock \
-e LLAMACPP_RESTART_METHOD=docker \
-e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
-e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
-e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
-e TRUENAS_API_KEY=YOUR_KEY \
-e TRUENAS_API_USER=YOUR_USER \
-e TRUENAS_APP_NAME=llamacpp \
-e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
-e PORT_A=9093 -e PORT_B=9094 \
llama-cpp-openai-wrapper:latest
```
## Model Hot-Swap / Restart Hooks
This wrapper does not modify llama.cpp by default. To enable hot-swap/restart for new models or model selection, provide one of the restart methods below (a sketch of an `http` restart helper follows the list):
- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`
or
- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`
or (requires mounting docker socket)
- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`
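For the `http` method the wrapper POSTs a JSON payload (model id, model path, GPU count, flag overrides) to `LLAMACPP_RESTART_URL`. A minimal sketch of such a helper; it is not part of this repo and only illustrates the contract:
```python
from fastapi import FastAPI
import subprocess

app = FastAPI()

@app.post("/restart")
async def restart(payload: dict):
    # The wrapper sends model_id, model_path, gpu_count, llamacpp_args and
    # llamacpp_extra_args; a real helper would rewrite the llama.cpp launch
    # command from these values before restarting the container.
    subprocess.run(["docker", "restart", "ix-llamacpp-llamacpp-1"], check=True)
    return {"ok": True, "model_id": payload.get("model_id")}
```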
## Model switching via TrueNAS middleware (P0)
Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:
```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```
The wrapper preserves existing flags in the compose command and only updates `--model`, adding any GPU split flags from `LLAMACPP_*` that are not already present.
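Illustrative before/after of the command rewrite on a 2-GPU host (values are examples; the merge logic lives in `app/truenas_middleware.py` and `app/config.py`):
```python
# Command currently stored in the llamacpp app's compose config:
before = ["--model", "/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
          "--ctx-size", "8192",
          "--device", "CUDA0"]          # always stripped: crashes llama.cpp on this host

# After switching models with no tensor split configured yet:
after = ["--model", "/models/Qwen2.5-7B-Instruct-Q4_K_M.gguf",   # only --model is rewritten
         "--ctx-size", "8192",                                    # existing flags preserved
         "--tensor-split", "0.50,0.50",                           # added from GPU count
         "--split-mode", "layer"]
```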
Optional arguments passed to restart handlers:
```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```
You can also pass arbitrary llama.cpp flags (space-separated) via:
```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```
## Model Manager UI
Open `http://HOST:PORT_B/`.
Features:
- List existing models
- Download models via URL
- Live progress + cancel
## Testing
Tests are parameterized with 100+ cases per endpoint.
```bash
pytest -q
```
## llama.cpp flags reference
Scraped from upstream docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`.
```
pwsh scripts/update_llamacpp_flags.ps1
```


@@ -0,0 +1 @@


@@ -0,0 +1,22 @@
{
"image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
"container_name": "ix-llamacpp-llamacpp-1",
"host_port": 8071,
"container_port": 8080,
"web_ui_url": "http://0.0.0.0:8071/",
"model_host_path": "/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models",
"model_container_path": "/models",
"models": [
"GPT-OSS",
"Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
"openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
"Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
],
"network": "ix-llamacpp_default",
"subnets": [
"172.16.18.0/24",
"fdb7:86ec:b1dd:11::/64"
],
"gpu_count": 2,
"gpu_name": "NVIDIA RTX 5060 Ti, 16 GB each (per `nvidia-smi` in prior runs)."
}


@@ -0,0 +1,119 @@
import json
import re
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Optional
APP_HEADER_RE = re.compile(r"^### App: (?P<name>.+?)\s*$")
IMAGE_RE = re.compile(r"image=(?P<image>[^\s]+)")
PORT_MAP_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host>\d+|0\.0\.0\.0:(?P<host_ip_port>\d+))")
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")
NETWORK_RE = re.compile(r"- (?P<name>ix-[^\s]+)_default")
SUBNET_RE = re.compile(r"subnets=\[(?P<subnets>[^\]]+)\]")
MODELS_RE = re.compile(r"Models in /models: (?P<models>.+)$")
PORTAL_RE = re.compile(r"Portals: \{\'Web UI\': \'(?P<url>[^\']+)\'\}")
GPU_RE = re.compile(r"GPUs:\s*(?P<count>\d+)x\s*(?P<name>.+)$")
CONTAINER_NAME_RE = re.compile(r"^(?P<name>ix-llamacpp-[^\s]+)")
@dataclass
class LlamacppConfig:
image: Optional[str] = None
container_name: Optional[str] = None
host_port: Optional[int] = None
container_port: Optional[int] = None
web_ui_url: Optional[str] = None
model_host_path: Optional[str] = None
model_container_path: Optional[str] = None
models: List[str] = None
network: Optional[str] = None
subnets: List[str] = None
gpu_count: Optional[int] = None
gpu_name: Optional[str] = None
def _find_section(lines: List[str], app_name: str) -> List[str]:
start = None
for i, line in enumerate(lines):
m = APP_HEADER_RE.match(line.strip())
if m and m.group("name") == app_name:
start = i
break
if start is None:
return []
for j in range(start + 1, len(lines)):
if APP_HEADER_RE.match(lines[j].strip()):
return lines[start:j]
return lines[start:]
def parse_agents(path: Path) -> LlamacppConfig:
text = path.read_text(encoding="utf-8", errors="ignore")
lines = text.splitlines()
section = _find_section(lines, "llamacpp")
cfg = LlamacppConfig(models=[], subnets=[])
for line in section:
if cfg.image is None:
m = IMAGE_RE.search(line)
if m:
cfg.image = m.group("image")
if cfg.web_ui_url is None:
m = PORTAL_RE.search(line)
if m:
cfg.web_ui_url = m.group("url")
if cfg.container_port is None or cfg.host_port is None:
m = PORT_LINE_RE.search(line)
if m:
cfg.container_port = int(m.group("container"))
cfg.host_port = int(m.group("host"))
if cfg.model_host_path is None or cfg.model_container_path is None:
m = VOLUME_RE.search(line)
if m and "/models" in m.group("container"):
cfg.model_host_path = m.group("host")
cfg.model_container_path = m.group("container")
if cfg.network is None:
m = NETWORK_RE.search(line)
if m:
cfg.network = f"{m.group('name')}_default"
if "subnets=" in line:
m = SUBNET_RE.search(line)
if m:
subnets_raw = m.group("subnets")
subnets = [s.strip().strip("'") for s in subnets_raw.split(",")]
cfg.subnets.extend([s for s in subnets if s])
if "Models in /models:" in line:
m = MODELS_RE.search(line)
if m:
models_raw = m.group("models")
cfg.models = [s.strip() for s in models_raw.split(",") if s.strip()]
for line in lines:
if cfg.gpu_count is None:
m = GPU_RE.search(line)
if m:
cfg.gpu_count = int(m.group("count"))
cfg.gpu_name = m.group("name").strip()
if cfg.container_name is None:
m = CONTAINER_NAME_RE.match(line.strip())
if m:
cfg.container_name = m.group("name")
return cfg
def main() -> None:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--agents", default="AGENTS.md")
parser.add_argument("--out", default="app/agents_config.json")
args = parser.parse_args()
cfg = parse_agents(Path(args.agents))
out_path = Path(args.out)
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(asdict(cfg), indent=2), encoding="utf-8")
if __name__ == "__main__":
main()


@@ -0,0 +1,309 @@
import asyncio
import logging
import time
from pathlib import Path
from typing import Any, Dict
from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from fastapi.responses import JSONResponse, StreamingResponse
import httpx
from app.config import load_config
from app.llamacpp_client import proxy_json, proxy_raw, proxy_stream
from app.logging_utils import configure_logging
from app.model_registry import find_model, resolve_model, scan_models
from app.openai_translate import responses_to_chat_payload, chat_to_responses, normalize_chat_payload
from app.restart import RestartPlan, trigger_restart
from app.stream_transform import stream_chat_to_responses
from app.truenas_middleware import TrueNASConfig, get_active_model_id, switch_model
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry
configure_logging()
log = logging.getLogger("api_app")
def _model_list_payload(model_dir: str) -> Dict[str, Any]:
data = []
for model in scan_models(model_dir):
data.append({
"id": model.model_id,
"object": "model",
"created": model.created,
"owned_by": "llama.cpp",
})
return {"object": "list", "data": data}
def _requires_json_mode(payload: Dict[str, Any]) -> bool:
response_format = payload.get("response_format")
if isinstance(response_format, dict) and response_format.get("type") == "json_object":
return True
if payload.get("return_format") == "json":
return True
return False
def _apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
payload = dict(payload)
payload.pop("response_format", None)
payload.pop("return_format", None)
messages = payload.get("messages")
if isinstance(messages, list):
system_msg = {"role": "system", "content": "Respond only with a valid JSON object."}
if not messages or messages[0].get("role") != "system":
payload["messages"] = [system_msg, *messages]
else:
payload["messages"] = [system_msg, *messages[1:]]
return payload
async def _proxy_json_with_retry(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Dict[str, Any],
timeout_s: float,
delay_s: float = 3.0,
) -> httpx.Response:
deadline = time.time() + timeout_s
attempt = 0
last_exc: Exception | None = None
while time.time() < deadline:
attempt += 1
try:
resp = await proxy_json(base_url, path, method, headers, payload, timeout_s)
if resp.status_code == 503:
try:
data = resp.json()
except Exception:
data = {}
message = ""
if isinstance(data, dict):
err = data.get("error")
if isinstance(err, dict):
message = str(err.get("message") or "")
else:
message = str(data.get("message") or "")
if "loading model" in message.lower():
log.warning("llama.cpp still loading model, retrying (attempt %s)", attempt)
await asyncio.sleep(delay_s)
continue
return resp
except httpx.RequestError as exc:
last_exc = exc
log.warning("Proxy request failed (attempt %s): %s", attempt, exc)
await asyncio.sleep(delay_s)
if last_exc:
raise last_exc
raise RuntimeError("proxy retry deadline exceeded")
async def _get_active_model_from_truenas(cfg: TrueNASConfig) -> str:
try:
return await get_active_model_id(cfg)
except Exception as exc:
log.warning("Failed to read active model from TrueNAS config: %s", exc)
return ""
async def _wait_for_active_model(cfg: TrueNASConfig, model_id: str, timeout_s: float) -> None:
deadline = asyncio.get_event_loop().time() + timeout_s
while asyncio.get_event_loop().time() < deadline:
active = await _get_active_model_from_truenas(cfg)
if active == model_id:
return
await asyncio.sleep(2)
raise RuntimeError(f"active model did not switch to {model_id}")
async def _ensure_model_loaded(model_id: str, model_dir: str) -> str:
cfg = load_config()
model = resolve_model(model_dir, model_id, cfg.model_aliases)
if not model:
log.warning("Requested model not found: %s", model_id)
raise HTTPException(status_code=404, detail="model not found")
if model.model_id != model_id:
log.info("Resolved model alias %s -> %s", model_id, model.model_id)
truenas_cfg = None
if cfg.truenas_ws_url and cfg.truenas_api_key:
truenas_cfg = TrueNASConfig(
ws_url=cfg.truenas_ws_url,
api_key=cfg.truenas_api_key,
api_user=cfg.truenas_api_user,
app_name=cfg.truenas_app_name,
verify_ssl=cfg.truenas_verify_ssl,
)
active_id = await _get_active_model_from_truenas(truenas_cfg)
if active_id and active_id == model.model_id:
return model.model_id
if truenas_cfg:
log.info("Switching model via API model=%s args=%s extra_args=%s", model.model_id, cfg.llamacpp_args, cfg.llamacpp_extra_args)
try:
model_path = str((Path(cfg.model_container_dir) / model.model_id))
await switch_model(
truenas_cfg,
model_path,
cfg.llamacpp_args,
cfg.llamacpp_extra_args,
)
await _wait_for_active_model(truenas_cfg, model.model_id, cfg.switch_timeout_s)
except Exception as exc:
log.exception("TrueNAS model switch failed")
raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
log.info("Running warmup prompt after model switch: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
return model.model_id
plan = RestartPlan(
method=cfg.restart_method,
command=cfg.restart_command,
url=cfg.restart_url,
allowed_container=cfg.allowed_container,
)
log.info("Triggering restart for model=%s method=%s", model.model_id, cfg.restart_method)
payload = {
"model_id": model.model_id,
"model_path": str(Path(cfg.model_container_dir) / model.model_id),
"gpu_count": cfg.gpu_count_runtime or cfg.agents.gpu_count,
"llamacpp_args": cfg.llamacpp_args,
"llamacpp_extra_args": cfg.llamacpp_extra_args,
}
await trigger_restart(plan, payload=payload)
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
log.info("Running warmup prompt after restart: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
return model.model_id
def create_api_app() -> FastAPI:
cfg = load_config()
app = FastAPI(title="llama.cpp OpenAI Wrapper", version="0.1.0")
router = APIRouter()
@app.middleware("http")
async def log_requests(request: Request, call_next):
log.info("Request %s %s", request.method, request.url.path)
return await call_next(request)
@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
log.exception("Unhandled error")
return JSONResponse(status_code=500, content={"detail": str(exc)})
@router.get("/health")
async def health() -> Dict[str, Any]:
return {
"status": "ok",
"base_url": cfg.base_url,
"model_dir": cfg.model_dir,
"agents": {
"image": cfg.agents.image,
"container_name": cfg.agents.container_name,
"network": cfg.agents.network,
"gpu_count": cfg.agents.gpu_count,
},
"gpu_count_runtime": cfg.gpu_count_runtime,
}
@router.get("/v1/models")
async def list_models() -> Dict[str, Any]:
log.info("Listing models")
return _model_list_payload(cfg.model_dir)
@router.get("/v1/models/{model_id}")
async def get_model(model_id: str) -> Dict[str, Any]:
log.info("Get model %s", model_id)
model = resolve_model(cfg.model_dir, model_id, cfg.model_aliases) or find_model(cfg.model_dir, model_id)
if not model:
raise HTTPException(status_code=404, detail="model not found")
return {
"id": model.model_id,
"object": "model",
"created": model.created,
"owned_by": "llama.cpp",
}
@router.post("/v1/chat/completions")
async def chat_completions(request: Request) -> Response:
payload = await request.json()
payload = normalize_chat_payload(payload)
model_id = payload.get("model")
log.info("Chat completions model=%s stream=%s", model_id, bool(payload.get("stream")))
if model_id:
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
payload["model"] = resolved
stream = bool(payload.get("stream"))
if stream and _requires_json_mode(payload):
payload = _apply_json_fallback(payload)
if stream:
streamer = proxy_stream(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
return StreamingResponse(streamer, media_type="text/event-stream")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
if resp.status_code >= 500 and _requires_json_mode(payload):
log.info("Retrying chat completion with JSON fallback prompt")
fallback_payload = _apply_json_fallback(payload)
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
try:
return JSONResponse(status_code=resp.status_code, content=resp.json())
except Exception:
return Response(
status_code=resp.status_code,
content=resp.content,
media_type=resp.headers.get("content-type"),
)
@router.post("/v1/responses")
async def responses(request: Request) -> Response:
payload = await request.json()
chat_payload, model_id = responses_to_chat_payload(payload)
log.info("Responses model=%s stream=%s", model_id, bool(chat_payload.get("stream")))
if model_id:
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
chat_payload["model"] = resolved
stream = bool(chat_payload.get("stream"))
if stream and _requires_json_mode(chat_payload):
chat_payload = _apply_json_fallback(chat_payload)
if stream:
streamer = stream_chat_to_responses(
cfg.base_url,
dict(request.headers),
chat_payload,
cfg.proxy_timeout_s,
)
return StreamingResponse(streamer, media_type="text/event-stream")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), chat_payload, cfg.proxy_timeout_s)
if resp.status_code >= 500 and _requires_json_mode(chat_payload):
log.info("Retrying responses with JSON fallback prompt")
fallback_payload = _apply_json_fallback(chat_payload)
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
resp.raise_for_status()
return JSONResponse(status_code=200, content=chat_to_responses(resp.json(), model_id))
@router.post("/v1/embeddings")
async def embeddings(request: Request) -> Response:
payload = await request.json()
log.info("Embeddings")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/embeddings", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
try:
return JSONResponse(status_code=resp.status_code, content=resp.json())
except Exception:
return Response(
status_code=resp.status_code,
content=resp.content,
media_type=resp.headers.get("content-type"),
)
@router.api_route("/proxy/llamacpp/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"])
async def passthrough(path: str, request: Request) -> Response:
body = await request.body()
resp = await proxy_raw(cfg.base_url, f"/{path}", request.method, dict(request.headers), body, cfg.proxy_timeout_s)
return Response(status_code=resp.status_code, content=resp.content, headers=dict(resp.headers))
app.include_router(router)
return app


@@ -0,0 +1,214 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional
@dataclass
class AgentsRuntime:
image: Optional[str]
container_name: Optional[str]
host_port: Optional[int]
container_port: Optional[int]
web_ui_url: Optional[str]
model_host_path: Optional[str]
model_container_path: Optional[str]
models: List[str]
network: Optional[str]
subnets: List[str]
gpu_count: Optional[int]
gpu_name: Optional[str]
@dataclass
class AppConfig:
api_port: int
ui_port: int
base_url: str
model_dir: str
model_container_dir: str
download_dir: str
download_max_concurrent: int
download_allowlist: List[str]
restart_method: str
restart_command: Optional[str]
restart_url: Optional[str]
reload_on_new_model: bool
proxy_timeout_s: float
switch_timeout_s: float
gpu_count_runtime: Optional[int]
llamacpp_args: Dict[str, str]
llamacpp_extra_args: str
truenas_api_key: Optional[str]
truenas_api_user: Optional[str]
truenas_app_name: str
truenas_ws_url: Optional[str]
truenas_verify_ssl: bool
allowed_container: Optional[str]
warmup_prompt_path: str
llamacpp_container_name: Optional[str]
model_aliases: Dict[str, str]
agents: AgentsRuntime
def _load_agents_config(path: Path) -> AgentsRuntime:
if not path.exists():
return AgentsRuntime(
image=None,
container_name=None,
host_port=None,
container_port=None,
web_ui_url=None,
model_host_path=None,
model_container_path=None,
models=[],
network=None,
subnets=[],
gpu_count=None,
gpu_name=None,
)
raw = json.loads(path.read_text(encoding="utf-8"))
return AgentsRuntime(
image=raw.get("image"),
container_name=raw.get("container_name"),
host_port=raw.get("host_port"),
container_port=raw.get("container_port"),
web_ui_url=raw.get("web_ui_url"),
model_host_path=raw.get("model_host_path"),
model_container_path=raw.get("model_container_path"),
models=raw.get("models") or [],
network=raw.get("network"),
subnets=raw.get("subnets") or [],
gpu_count=raw.get("gpu_count"),
gpu_name=raw.get("gpu_name"),
)
def _infer_gpu_count_runtime() -> Optional[int]:
visible = os.getenv("CUDA_VISIBLE_DEVICES") or os.getenv("NVIDIA_VISIBLE_DEVICES")
if visible and visible not in {"all", "void"}:
parts = [p.strip() for p in visible.split(",") if p.strip()]
if parts:
return len(parts)
return None
def _default_base_url(agents: AgentsRuntime) -> str:
if agents.container_name and agents.container_port:
return f"http://{agents.container_name}:{agents.container_port}"
if agents.host_port:
return f"http://127.0.0.1:{agents.host_port}"
return "http://127.0.0.1:8080"
def load_config() -> AppConfig:
agents_path = Path(os.getenv("AGENTS_CONFIG_PATH", "app/agents_config.json"))
agents = _load_agents_config(agents_path)
api_port = int(os.getenv("PORT_A", "9093"))
ui_port = int(os.getenv("PORT_B", "9094"))
base_url = os.getenv("LLAMACPP_BASE_URL") or _default_base_url(agents)
model_dir = os.getenv("MODEL_DIR") or agents.model_container_path or "/models"
model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or model_dir
download_dir = os.getenv("MODEL_DOWNLOAD_DIR") or model_dir
download_max = int(os.getenv("MODEL_DOWNLOAD_MAX_CONCURRENT", "2"))
allowlist_raw = os.getenv("MODEL_DOWNLOAD_ALLOWLIST", "")
allowlist = [item.strip() for item in allowlist_raw.split(",") if item.strip()]
restart_method = os.getenv("LLAMACPP_RESTART_METHOD", "none").lower()
restart_command = os.getenv("LLAMACPP_RESTART_COMMAND")
restart_url = os.getenv("LLAMACPP_RESTART_URL")
reload_on_new_model = os.getenv("RELOAD_ON_NEW_MODEL", "false").lower() in {"1", "true", "yes"}
proxy_timeout_s = float(os.getenv("LLAMACPP_PROXY_TIMEOUT_S", "600"))
switch_timeout_s = float(os.getenv("LLAMACPP_SWITCH_TIMEOUT_S", "300"))
gpu_count_runtime = _infer_gpu_count_runtime()
llamacpp_args = {}
args_map = {
"LLAMACPP_TENSOR_SPLIT": "tensor_split",
"LLAMACPP_SPLIT_MODE": "split_mode",
"LLAMACPP_N_GPU_LAYERS": "n_gpu_layers",
"LLAMACPP_CTX_SIZE": "ctx_size",
"LLAMACPP_BATCH_SIZE": "batch_size",
"LLAMACPP_UBATCH_SIZE": "ubatch_size",
"LLAMACPP_CACHE_TYPE_K": "cache_type_k",
"LLAMACPP_CACHE_TYPE_V": "cache_type_v",
"LLAMACPP_FLASH_ATTN": "flash_attn",
}
for env_key, arg_key in args_map.items():
value = os.getenv(env_key)
if value is not None and value != "":
llamacpp_args[arg_key] = value
llamacpp_extra_args = os.getenv("LLAMACPP_EXTRA_ARGS", "")
truenas_api_key = os.getenv("TRUENAS_API_KEY")
truenas_api_user = os.getenv("TRUENAS_API_USER")
truenas_app_name = os.getenv("TRUENAS_APP_NAME", "llamacpp")
truenas_ws_url = os.getenv("TRUENAS_WS_URL")
truenas_api_url = os.getenv("TRUENAS_API_URL")
if not truenas_ws_url and truenas_api_url:
if truenas_api_url.startswith("https://"):
truenas_ws_url = "wss://" + truenas_api_url[len("https://") :].rstrip("/") + "/websocket"
elif truenas_api_url.startswith("http://"):
truenas_ws_url = "ws://" + truenas_api_url[len("http://") :].rstrip("/") + "/websocket"
truenas_verify_ssl = os.getenv("TRUENAS_VERIFY_SSL", "false").lower() in {"1", "true", "yes"}
allowed_container = os.getenv("LLAMACPP_TARGET_CONTAINER") or agents.container_name
llamacpp_container_name = os.getenv("LLAMACPP_CONTAINER_NAME") or agents.container_name
warmup_prompt_path = os.getenv("WARMUP_PROMPT_PATH", str(Path("trades_company_stock.txt").resolve()))
if truenas_ws_url and (":" in model_container_dir[:3] or "\\" in model_container_dir):
model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or "/models"
aliases_raw = os.getenv("MODEL_ALIASES", "")
model_aliases: Dict[str, str] = {}
if aliases_raw:
try:
model_aliases = json.loads(aliases_raw)
except json.JSONDecodeError:
for item in aliases_raw.split(","):
if "=" in item:
key, value = item.split("=", 1)
model_aliases[key.strip()] = value.strip()
gpu_count = gpu_count_runtime or agents.gpu_count
if gpu_count and gpu_count >= 2:
if "tensor_split" not in llamacpp_args:
ratio = 1.0 / float(gpu_count)
split = ",".join([f"{ratio:.2f}"] * gpu_count)
llamacpp_args["tensor_split"] = split
if "split_mode" not in llamacpp_args:
llamacpp_args["split_mode"] = "layer"
return AppConfig(
api_port=api_port,
ui_port=ui_port,
base_url=base_url,
model_dir=model_dir,
model_container_dir=model_container_dir,
download_dir=download_dir,
download_max_concurrent=download_max,
download_allowlist=allowlist,
restart_method=restart_method,
restart_command=restart_command,
restart_url=restart_url,
reload_on_new_model=reload_on_new_model,
proxy_timeout_s=proxy_timeout_s,
switch_timeout_s=switch_timeout_s,
gpu_count_runtime=gpu_count_runtime,
llamacpp_args=llamacpp_args,
llamacpp_extra_args=llamacpp_extra_args,
truenas_api_key=truenas_api_key,
truenas_api_user=truenas_api_user,
truenas_app_name=truenas_app_name,
truenas_ws_url=truenas_ws_url,
truenas_verify_ssl=truenas_verify_ssl,
allowed_container=allowed_container,
warmup_prompt_path=warmup_prompt_path,
llamacpp_container_name=llamacpp_container_name,
model_aliases=model_aliases,
agents=agents,
)


@@ -0,0 +1,61 @@
import json
import logging
import os
from typing import Optional
import httpx
log = logging.getLogger("docker_logs")
def _docker_transport() -> httpx.AsyncHTTPTransport:
sock_path = os.getenv("DOCKER_SOCK", "/var/run/docker.sock")
return httpx.AsyncHTTPTransport(uds=sock_path)
async def _docker_get(path: str, params: Optional[dict] = None) -> httpx.Response:
timeout = httpx.Timeout(10.0, read=10.0)
async with httpx.AsyncClient(transport=_docker_transport(), base_url="http://docker", timeout=timeout) as client:
resp = await client.get(path, params=params)
resp.raise_for_status()
return resp
def _decode_docker_stream(data: bytes) -> str:
if not data:
return ""
out = bytearray()
idx = 0
while idx + 8 <= len(data):
stream_type = data[idx]
size = int.from_bytes(data[idx + 4: idx + 8], "big")
idx += 8
if idx + size > len(data):
break
chunk = data[idx: idx + size]
idx += size
if stream_type in (1, 2):
out.extend(chunk)
else:
out.extend(chunk)
if out:
return out.decode("utf-8", errors="replace")
return data.decode("utf-8", errors="replace")
async def docker_container_logs(container_name: str, tail_lines: int = 200) -> str:
filters = json.dumps({"name": [container_name]})
resp = await _docker_get("/containers/json", params={"filters": filters})
containers = resp.json() or []
if not containers:
log.info("No docker container found for name=%s", container_name)
return ""
container_id = containers[0].get("Id")
if not container_id:
return ""
resp = await _docker_get(
f"/containers/{container_id}/logs",
params={"stdout": 1, "stderr": 1, "tail": tail_lines},
)
return _decode_docker_stream(resp.content)


@@ -0,0 +1,141 @@
import asyncio
import fnmatch
import logging
import os
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Optional
import httpx
from app.config import AppConfig
from app.logging_utils import configure_logging
from app.restart import RestartPlan, trigger_restart
configure_logging()
log = logging.getLogger("download_manager")
@dataclass
class DownloadStatus:
download_id: str
url: str
filename: str
status: str
bytes_total: Optional[int] = None
bytes_downloaded: int = 0
started_at: float = field(default_factory=time.time)
finished_at: Optional[float] = None
error: Optional[str] = None
class DownloadManager:
def __init__(self, cfg: AppConfig, broadcaster=None) -> None:
self.cfg = cfg
self._downloads: Dict[str, DownloadStatus] = {}
self._tasks: Dict[str, asyncio.Task] = {}
self._semaphore = asyncio.Semaphore(cfg.download_max_concurrent)
self._broadcaster = broadcaster
async def _emit(self, payload: dict) -> None:
if self._broadcaster:
await self._broadcaster.publish(payload)
def list_downloads(self) -> Dict[str, dict]:
return {k: asdict(v) for k, v in self._downloads.items()}
def get(self, download_id: str) -> Optional[DownloadStatus]:
return self._downloads.get(download_id)
def _is_allowed(self, url: str) -> bool:
if not self.cfg.download_allowlist:
return True
return any(fnmatch.fnmatch(url, pattern) for pattern in self.cfg.download_allowlist)
async def start(self, url: str, filename: Optional[str] = None) -> DownloadStatus:
if not self._is_allowed(url):
raise ValueError("url not allowed by allowlist")
if not filename:
filename = os.path.basename(url.split("?")[0]) or f"model-{uuid.uuid4().hex}.gguf"
log.info("Download requested url=%s filename=%s", url, filename)
download_id = uuid.uuid4().hex
status = DownloadStatus(download_id=download_id, url=url, filename=filename, status="queued")
self._downloads[download_id] = status
task = asyncio.create_task(self._run_download(status))
self._tasks[download_id] = task
await self._emit({"type": "download_status", "download": asdict(status)})
return status
async def cancel(self, download_id: str) -> bool:
task = self._tasks.get(download_id)
if task:
task.cancel()
status = self._downloads.get(download_id)
if status:
log.info("Download cancelled id=%s filename=%s", download_id, status.filename)
await self._emit({"type": "download_status", "download": asdict(status)})
return True
return False
async def _run_download(self, status: DownloadStatus) -> None:
status.status = "downloading"
base = Path(self.cfg.download_dir)
base.mkdir(parents=True, exist_ok=True)
tmp_path = base / f".{status.filename}.partial"
final_path = base / status.filename
last_emit = 0.0
try:
async with self._semaphore:
async with httpx.AsyncClient(timeout=None, follow_redirects=True) as client:
async with client.stream("GET", status.url) as resp:
resp.raise_for_status()
length = resp.headers.get("content-length")
if length:
status.bytes_total = int(length)
with tmp_path.open("wb") as f:
async for chunk in resp.aiter_bytes():
if chunk:
f.write(chunk)
status.bytes_downloaded += len(chunk)
now = time.time()
if now - last_emit >= 1:
last_emit = now
await self._emit({"type": "download_progress", "download": asdict(status)})
if tmp_path.exists():
tmp_path.replace(final_path)
status.status = "completed"
status.finished_at = time.time()
log.info("Download completed id=%s filename=%s", status.download_id, status.filename)
await self._emit({"type": "download_completed", "download": asdict(status)})
if self.cfg.reload_on_new_model:
plan = RestartPlan(
method=self.cfg.restart_method,
command=self.cfg.restart_command,
url=self.cfg.restart_url,
allowed_container=self.cfg.allowed_container,
)
await trigger_restart(
plan,
payload={
"reason": "new_model",
"model_id": status.filename,
"llamacpp_args": self.cfg.llamacpp_args,
"llamacpp_extra_args": self.cfg.llamacpp_extra_args,
},
)
except asyncio.CancelledError:
status.status = "cancelled"
if tmp_path.exists():
tmp_path.unlink(missing_ok=True)
log.info("Download cancelled id=%s filename=%s", status.download_id, status.filename)
await self._emit({"type": "download_cancelled", "download": asdict(status)})
except Exception as exc:
status.status = "error"
status.error = str(exc)
if tmp_path.exists():
tmp_path.unlink(missing_ok=True)
log.info("Download error id=%s filename=%s error=%s", status.download_id, status.filename, exc)
await self._emit({"type": "download_error", "download": asdict(status)})


@@ -0,0 +1,52 @@
import logging
from typing import AsyncIterator, Dict, Optional
import httpx
log = logging.getLogger("llamacpp_client")
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
drop = {"host", "content-length"}
return {k: v for k, v in headers.items() if k.lower() not in drop}
async def proxy_json(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Optional[dict],
timeout_s: float,
) -> httpx.Response:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
return await client.request(method, path, headers=_filter_headers(headers), json=payload)
async def proxy_raw(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
body: Optional[bytes],
timeout_s: float,
) -> httpx.Response:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
return await client.request(method, path, headers=_filter_headers(headers), content=body)
async def proxy_stream(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Optional[dict],
timeout_s: float,
) -> AsyncIterator[bytes]:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
async with client.stream(method, path, headers=_filter_headers(headers), json=payload) as resp:
resp.raise_for_status()
async for chunk in resp.aiter_bytes():
if chunk:
yield chunk


@@ -0,0 +1,13 @@
import logging
import os
def configure_logging() -> None:
if logging.getLogger().handlers:
return
level_name = os.getenv("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)
logging.basicConfig(
level=level,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)


@@ -0,0 +1,45 @@
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional
@dataclass
class ModelInfo:
model_id: str
created: int
size: int
path: Path
def scan_models(model_dir: str) -> List[ModelInfo]:
base = Path(model_dir)
if not base.exists():
return []
models: List[ModelInfo] = []
now = int(time.time())
for entry in base.iterdir():
if entry.name.endswith(".partial"):
continue
if entry.is_file():
size = entry.stat().st_size
models.append(ModelInfo(model_id=entry.name, created=now, size=size, path=entry))
elif entry.is_dir():
models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
models.sort(key=lambda m: m.model_id.lower())
return models
def find_model(model_dir: str, model_id: str) -> Optional[ModelInfo]:
for model in scan_models(model_dir):
if model.model_id == model_id:
return model
return None
def resolve_model(model_dir: str, requested: str, aliases: Dict[str, str]) -> Optional[ModelInfo]:
if not requested:
return None
if requested in aliases:
requested = aliases[requested]
return find_model(model_dir, requested)


@@ -0,0 +1,140 @@
import time
import uuid
from typing import Any, Dict, List, Tuple
def _messages_from_input(input_value: Any) -> List[Dict[str, Any]]:
if isinstance(input_value, str):
return [{"role": "user", "content": input_value}]
if isinstance(input_value, list):
messages: List[Dict[str, Any]] = []
for item in input_value:
if isinstance(item, str):
messages.append({"role": "user", "content": item})
elif isinstance(item, dict):
role = item.get("role") or "user"
content = item.get("content") or item.get("text") or ""
if item.get("type") == "input_image":
content = [{"type": "image_url", "image_url": {"url": item.get("image_url", "")}}]
messages.append({"role": role, "content": content})
return messages
return [{"role": "user", "content": str(input_value)}]
def _normalize_tools(tools: Any) -> Any:
if not isinstance(tools, list):
return tools
normalized = []
for tool in tools:
if not isinstance(tool, dict):
normalized.append(tool)
continue
if "function" in tool:
normalized.append(tool)
continue
if tool.get("type") == "function" and ("name" in tool or "parameters" in tool or "description" in tool):
function = {
"name": tool.get("name"),
"parameters": tool.get("parameters"),
"description": tool.get("description"),
}
function = {k: v for k, v in function.items() if v is not None}
normalized.append({"type": "function", "function": function})
continue
normalized.append(tool)
return normalized
def _normalize_tool_choice(tool_choice: Any) -> Any:
if not isinstance(tool_choice, dict):
return tool_choice
if "function" in tool_choice:
return tool_choice
if tool_choice.get("type") == "function" and "name" in tool_choice:
return {"type": "function", "function": {"name": tool_choice.get("name")}}
return tool_choice
def normalize_chat_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
if "return_format" in payload and "response_format" not in payload:
if payload["return_format"] == "json":
payload["response_format"] = {"type": "json_object"}
if "functions" in payload and "tools" not in payload:
functions = payload.get("functions")
if isinstance(functions, list):
tools = []
for func in functions:
if isinstance(func, dict):
tools.append({"type": "function", "function": func})
if tools:
payload["tools"] = tools
payload.pop("functions", None)
if "tools" in payload:
payload["tools"] = _normalize_tools(payload.get("tools"))
if "tool_choice" in payload:
payload["tool_choice"] = _normalize_tool_choice(payload.get("tool_choice"))
return payload
def responses_to_chat_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
model = payload.get("model") or "unknown"
messages = _messages_from_input(payload.get("input", ""))
chat_payload: Dict[str, Any] = {
"model": model,
"messages": messages,
}
passthrough_keys = [
"temperature",
"top_p",
"max_output_tokens",
"stream",
"tools",
"tool_choice",
"response_format",
"return_format",
"frequency_penalty",
"presence_penalty",
"seed",
"stop",
]
for key in passthrough_keys:
if key in payload:
if key == "max_output_tokens":
chat_payload["max_tokens"] = payload[key]
elif key == "return_format" and payload[key] == "json":
chat_payload["response_format"] = {"type": "json_object"}
else:
chat_payload[key] = payload[key]
return normalize_chat_payload(chat_payload), model
def chat_to_responses(chat: Dict[str, Any], model: str) -> Dict[str, Any]:
response_id = f"resp_{uuid.uuid4().hex}"
created = int(time.time())
content = ""
if chat.get("choices"):
choice = chat["choices"][0]
message = choice.get("message") or {}
content = message.get("content") or ""
return {
"id": response_id,
"object": "response",
"created": created,
"model": model,
"output": [
{
"id": f"msg_{uuid.uuid4().hex}",
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": content}
],
}
],
"usage": chat.get("usage", {}),
}


@@ -0,0 +1,51 @@
import asyncio
import logging
import shlex
from dataclasses import dataclass
from typing import Optional
import httpx
log = logging.getLogger("llamacpp_restart")
@dataclass
class RestartPlan:
method: str
command: Optional[str]
url: Optional[str]
allowed_container: Optional[str] = None
async def trigger_restart(plan: RestartPlan, payload: Optional[dict] = None) -> None:
if plan.method == "none":
log.warning("Restart requested but restart method is none")
return
if plan.method == "http":
if not plan.url:
raise RuntimeError("restart url is required for http method")
async with httpx.AsyncClient(timeout=60) as client:
resp = await client.post(plan.url, json=payload or {})
resp.raise_for_status()
return
if plan.method == "docker":
if not plan.command:
raise RuntimeError("restart command must include container id or name for docker method")
if plan.allowed_container and plan.command != plan.allowed_container:
raise RuntimeError("docker restart command not allowed for non-target container")
async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds="/var/run/docker.sock"), timeout=30) as client:
resp = await client.post(f"http://docker/containers/{plan.command}/restart")
resp.raise_for_status()
return
if plan.method == "shell":
if not plan.command:
raise RuntimeError("restart command is required for shell method")
cmd = plan.command
args = shlex.split(cmd)
proc = await asyncio.create_subprocess_exec(*args)
code = await proc.wait()
if code != 0:
raise RuntimeError(f"restart command failed with exit code {code}")
return
raise RuntimeError(f"unknown restart method {plan.method}")


@@ -0,0 +1,35 @@
import os
import signal
import subprocess
import sys
from app.config import load_config
def main() -> None:
cfg = load_config()
python = sys.executable
api_cmd = [python, "-m", "uvicorn", "app.api_app:create_api_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.api_port)]
ui_cmd = [python, "-m", "uvicorn", "app.ui_app:create_ui_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.ui_port)]
procs = [subprocess.Popen(api_cmd)]
if cfg.ui_port != cfg.api_port:
procs.append(subprocess.Popen(ui_cmd))
def shutdown(_sig, _frame):
for proc in procs:
proc.terminate()
for proc in procs:
proc.wait(timeout=10)
sys.exit(0)
signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)
for proc in procs:
proc.wait()
if __name__ == "__main__":
main()


@@ -0,0 +1,102 @@
import json
import time
import uuid
from typing import Any, AsyncIterator, Dict
import httpx
def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
payload = json.dumps(data, separators=(",", ":"))
return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
drop = {"host", "content-length"}
return {k: v for k, v in headers.items() if k.lower() not in drop}
async def stream_chat_to_responses(
base_url: str,
headers: Dict[str, str],
payload: Dict[str, Any],
timeout_s: float,
) -> AsyncIterator[bytes]:
response_id = f"resp_{uuid.uuid4().hex}"
created = int(time.time())
model = payload.get("model") or "unknown"
msg_id = f"msg_{uuid.uuid4().hex}"
output_text = ""
response_stub = {
"id": response_id,
"object": "response",
"created": created,
"model": model,
"output": [
{
"id": msg_id,
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": ""}
],
}
],
}
yield _sse_event("response.created", {"type": "response.created", "response": response_stub})
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
async with client.stream(
"POST",
"/v1/chat/completions",
headers=_filter_headers(headers),
json=payload,
) as resp:
resp.raise_for_status()
buffer = ""
async for chunk in resp.aiter_text():
buffer += chunk
while "\n\n" in buffer:
block, buffer = buffer.split("\n\n", 1)
lines = [line for line in block.splitlines() if line.startswith("data:")]
if not lines:
continue
data_str = "\n".join(line[len("data:"):].strip() for line in lines)
if data_str == "[DONE]":
continue
try:
data = json.loads(data_str)
except json.JSONDecodeError:
continue
choices = data.get("choices") or []
if not choices:
continue
delta = choices[0].get("delta") or {}
text_delta = delta.get("content")
if text_delta:
output_text += text_delta
yield _sse_event(
"response.output_text.delta",
{
"type": "response.output_text.delta",
"delta": text_delta,
"item_id": msg_id,
"output_index": 0,
"content_index": 0,
},
)
yield _sse_event(
"response.output_text.done",
{
"type": "response.output_text.done",
"text": output_text,
"item_id": msg_id,
"output_index": 0,
"content_index": 0,
},
)
response_stub["output"][0]["content"][0]["text"] = output_text
yield _sse_event("response.completed", {"type": "response.completed", "response": response_stub})


@@ -0,0 +1,313 @@
import json
import logging
import shlex
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional
import websockets
import yaml
log = logging.getLogger("truenas_middleware")
@dataclass
class TrueNASConfig:
ws_url: str
api_key: str
api_user: Optional[str]
app_name: str
verify_ssl: bool = False
def _parse_compose(raw: Any) -> Dict[str, Any]:
if isinstance(raw, dict):
return raw
if isinstance(raw, str):
text = raw.strip()
try:
return json.loads(text)
except json.JSONDecodeError:
return yaml.safe_load(text)
raise ValueError("Unsupported compose payload")
def _command_to_list(command: Any) -> list:
if isinstance(command, list):
return command
if isinstance(command, str):
return shlex.split(command)
return []
def _extract_command(config: Dict[str, Any], service_name: str = "llamacpp") -> list:
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
svc = services.get(service_name) or {}
return _command_to_list(svc.get("command"))
return _command_to_list(config.get("command"))
def _model_id_from_command(cmd: list) -> Optional[str]:
if "--model" in cmd:
idx = cmd.index("--model")
if idx + 1 < len(cmd):
return Path(cmd[idx + 1]).name
return None
def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
if value is None:
return cmd
if flag in cmd:
idx = cmd.index(flag)
if idx + 1 < len(cmd):
cmd[idx + 1] = value
else:
cmd.append(value)
return cmd
cmd.extend([flag, value])
return cmd
def _merge_args(cmd: list, args: Dict[str, str]) -> list:
flag_map = {
"device": "--device",
"tensor_split": "--tensor-split",
"split_mode": "--split-mode",
"n_gpu_layers": "--n-gpu-layers",
"ctx_size": "--ctx-size",
"batch_size": "--batch-size",
"ubatch_size": "--ubatch-size",
"cache_type_k": "--cache-type-k",
"cache_type_v": "--cache-type-v",
"flash_attn": "--flash-attn",
}
for key, value in args.items():
flag = flag_map.get(key)
if flag:
if flag in cmd:
continue
_set_arg(cmd, flag, value)
return cmd
def _merge_extra_args(cmd: list, extra: str) -> list:
if not extra:
return cmd
extra_list = shlex.split(extra)
filtered: list[str] = []
skip_next = False
for item in extra_list:
if skip_next:
skip_next = False
continue
if item in {"--device", "-dev"}:
log.warning("Dropping --device from extra args to avoid llama.cpp device errors.")
skip_next = True
continue
filtered.append(item)
for flag in filtered:
if flag not in cmd:
cmd.append(flag)
return cmd
def _update_model_command(command: Any, model_path: str, args: Dict[str, str], extra: str) -> list:
cmd = _command_to_list(command)
if "--device" in cmd:
idx = cmd.index("--device")
del cmd[idx: idx + 2]
cmd = _set_arg(cmd, "--model", model_path)
cmd = _merge_args(cmd, args)
cmd = _merge_extra_args(cmd, extra)
return cmd
def _replace_flags(cmd: list, flags: Dict[str, Optional[str]], extra: str) -> list:
result = list(cmd)
for flag in flags.keys():
while flag in result:
idx = result.index(flag)
del result[idx: idx + 2]
if "--device" in result:
idx = result.index("--device")
del result[idx: idx + 2]
for flag, value in flags.items():
if value is not None and value != "":
result = _set_arg(result, flag, value)
result = _merge_extra_args(result, extra)
return result
async def get_app_config(cfg: TrueNASConfig) -> Dict[str, Any]:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
return config
async def get_app_command(cfg: TrueNASConfig, service_name: str = "llamacpp") -> list:
config = await get_app_config(cfg)
return _extract_command(config, service_name=service_name)
async def get_active_model_id(cfg: TrueNASConfig, service_name: str = "llamacpp") -> str:
config = await get_app_config(cfg)
cmd = _extract_command(config, service_name=service_name)
return _model_id_from_command(cmd) or ""
async def get_app_logs(
cfg: TrueNASConfig,
tail_lines: int = 200,
service_name: str = "llamacpp",
) -> str:
tail_payloads = [
{"tail": tail_lines},
{"tail_lines": tail_lines},
{"tail": str(tail_lines)},
]
for payload in tail_payloads:
try:
result = await _rpc_call(cfg, "app.container_logs", [cfg.app_name, service_name, payload])
if isinstance(result, str):
return result
except Exception as exc:
log.debug("app.container_logs failed (%s): %s", payload, exc)
for payload in tail_payloads:
try:
result = await _rpc_call(cfg, "app.logs", [cfg.app_name, payload])
if isinstance(result, str):
return result
except Exception as exc:
log.debug("app.logs failed (%s): %s", payload, exc)
return ""
async def update_app_command(
cfg: TrueNASConfig,
command: list,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
svc["command"] = command
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
return
config["command"] = command
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
async def update_command_flags(
cfg: TrueNASConfig,
flags: Dict[str, Optional[str]],
extra: str,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
cmd = svc.get("command")
svc["command"] = _replace_flags(_command_to_list(cmd), flags, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
return
cmd = _replace_flags(_command_to_list(config.get("command")), flags, extra)
config["command"] = cmd
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
async def _rpc_call(cfg: TrueNASConfig, method: str, params: Optional[list] = None) -> Any:
ssl_ctx = None
if cfg.ws_url.startswith("wss://") and not cfg.verify_ssl:
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(cfg.ws_url, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect to TrueNAS websocket")
await ws.send(
json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [cfg.api_key]})
)
auth_resp = json.loads(await ws.recv())
if not auth_resp.get("result"):
if not cfg.api_user:
raise RuntimeError("API key rejected and TRUENAS_API_USER not set")
await ws.send(
json.dumps(
{
"id": 2,
"msg": "method",
"method": "auth.login_ex",
"params": [
{
"mechanism": "API_KEY_PLAIN",
"username": cfg.api_user,
"api_key": cfg.api_key,
}
],
}
)
)
auth_ex = json.loads(await ws.recv())
if auth_ex.get("result", {}).get("response_type") != "SUCCESS":
raise RuntimeError("API key authentication failed")
req_id = 3
await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != req_id:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
async def switch_model(
cfg: TrueNASConfig,
model_path: str,
args: Dict[str, str],
extra: str,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
cmd = svc.get("command")
svc["command"] = _update_model_command(cmd, model_path, args, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
log.info("Requested model switch to %s via TrueNAS middleware (custom app)", model_path)
return
cmd = config.get("command")
config["command"] = _update_model_command(cmd, model_path, args, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
log.info("Requested model switch to %s via TrueNAS middleware (catalog app)", model_path)

View File

@@ -0,0 +1,357 @@
import asyncio
import json
import logging
from pathlib import Path
from typing import Any, Dict, Optional
import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse, StreamingResponse
from app.config import load_config
from app.docker_logs import docker_container_logs
from app.download_manager import DownloadManager
from app.logging_utils import configure_logging
from app.model_registry import scan_models
from app.truenas_middleware import (
TrueNASConfig,
get_active_model_id,
get_app_command,
get_app_logs,
switch_model,
update_command_flags,
)
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry
configure_logging()
log = logging.getLogger("ui_app")
class EventBroadcaster:
def __init__(self) -> None:
self._queues: set[asyncio.Queue] = set()
def connect(self) -> asyncio.Queue:
queue: asyncio.Queue = asyncio.Queue()
self._queues.add(queue)
return queue
def disconnect(self, queue: asyncio.Queue) -> None:
self._queues.discard(queue)
async def publish(self, payload: dict) -> None:
for queue in list(self._queues):
queue.put_nowait(payload)
def _static_path() -> Path:
return Path(__file__).parent / "ui_static"
async def _fetch_active_model(truenas_cfg: Optional[TrueNASConfig]) -> Optional[str]:
if not truenas_cfg:
return None
try:
return await get_active_model_id(truenas_cfg)
except Exception as exc:
log.warning("Failed to read active model from TrueNAS config: %s", exc)
return None
def _model_list(model_dir: str, active_model: Optional[str]) -> Dict[str, Any]:
data = []
for model in scan_models(model_dir):
data.append({
"id": model.model_id,
"size": model.size,
"active": model.model_id == active_model,
})
return {"models": data, "active_model": active_model}
def create_ui_app() -> FastAPI:
cfg = load_config()
app = FastAPI(title="llama.cpp Model Manager", version="0.1.0")
broadcaster = EventBroadcaster()
manager = DownloadManager(cfg, broadcaster=broadcaster)
truenas_cfg = None
if cfg.truenas_ws_url and cfg.truenas_api_key:
truenas_cfg = TrueNASConfig(
ws_url=cfg.truenas_ws_url,
api_key=cfg.truenas_api_key,
api_user=cfg.truenas_api_user,
app_name=cfg.truenas_app_name,
verify_ssl=cfg.truenas_verify_ssl,
)
async def monitor_active_model() -> None:
last_model = None
while True:
current = await _fetch_active_model(truenas_cfg)
if current and current != last_model:
last_model = current
await broadcaster.publish({"type": "active_model", "model_id": current})
await asyncio.sleep(3)
async def _fetch_logs() -> str:
logs = ""
if truenas_cfg:
try:
logs = await asyncio.wait_for(get_app_logs(truenas_cfg, tail_lines=200), timeout=5)
except asyncio.TimeoutError:
logs = ""
if not logs and cfg.llamacpp_container_name:
try:
logs = await asyncio.wait_for(
docker_container_logs(cfg.llamacpp_container_name, tail_lines=200),
timeout=10,
)
except asyncio.TimeoutError:
logs = ""
return logs
@app.on_event("startup")
async def start_tasks() -> None:
asyncio.create_task(monitor_active_model())
@app.middleware("http")
async def log_requests(request: Request, call_next):
log.info("UI request %s %s", request.method, request.url.path)
return await call_next(request)
@app.get("/health")
async def health() -> Dict[str, Any]:
return {"status": "ok", "model_dir": cfg.model_dir}
@app.get("/")
async def index() -> HTMLResponse:
return FileResponse(_static_path() / "index.html")
@app.get("/ui/styles.css")
async def styles() -> FileResponse:
return FileResponse(_static_path() / "styles.css")
@app.get("/ui/app.js")
async def app_js() -> FileResponse:
return FileResponse(_static_path() / "app.js")
@app.get("/ui/api/models")
async def list_models() -> JSONResponse:
active_model = await _fetch_active_model(truenas_cfg)
log.info("UI list models active=%s", active_model)
return JSONResponse(_model_list(cfg.model_dir, active_model))
@app.get("/ui/api/downloads")
async def list_downloads() -> JSONResponse:
log.info("UI list downloads")
return JSONResponse({"downloads": manager.list_downloads()})
@app.post("/ui/api/downloads")
async def start_download(request: Request) -> JSONResponse:
payload = await request.json()
url = payload.get("url")
filename = payload.get("filename")
log.info("UI download start url=%s filename=%s", url, filename)
if not url:
raise HTTPException(status_code=400, detail="url is required")
try:
status = await manager.start(url, filename=filename)
except ValueError as exc:
raise HTTPException(status_code=403, detail=str(exc))
return JSONResponse({"download": status.__dict__})
@app.delete("/ui/api/downloads/{download_id}")
async def cancel_download(download_id: str) -> JSONResponse:
log.info("UI download cancel id=%s", download_id)
ok = await manager.cancel(download_id)
if not ok:
raise HTTPException(status_code=404, detail="download not found")
return JSONResponse({"status": "cancelled"})
@app.get("/ui/api/events")
async def events() -> StreamingResponse:
queue = broadcaster.connect()
async def event_stream():
try:
while True:
payload = await queue.get()
data = json.dumps(payload, separators=(",", ":"))
yield f"data: {data}\n\n".encode("utf-8")
finally:
broadcaster.disconnect(queue)
return StreamingResponse(event_stream(), media_type="text/event-stream")
@app.post("/ui/api/switch-model")
async def switch_model_ui(request: Request) -> JSONResponse:
payload = await request.json()
model_id = payload.get("model_id")
warmup_override = payload.get("warmup_prompt") or ""
if not model_id:
raise HTTPException(status_code=400, detail="model_id is required")
model_path = Path(cfg.model_dir) / model_id
if not model_path.exists():
raise HTTPException(status_code=404, detail="model not found")
if not truenas_cfg:
raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
try:
container_model_path = str(Path(cfg.model_container_dir) / model_id)
await switch_model(truenas_cfg, container_model_path, cfg.llamacpp_args, cfg.llamacpp_extra_args)
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
log.info("UI warmup after switch model=%s prompt_len=%s", model_id, len(warmup_prompt))
try:
await run_warmup_with_retry(cfg.base_url, model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch warmup failed: {exc}")
try:
async with httpx.AsyncClient(base_url=cfg.base_url, timeout=120) as client:
resp = await client.post(
"/v1/chat/completions",
json={
"model": model_id,
"messages": [{"role": "user", "content": "ok"}],
"max_tokens": 4,
"temperature": 0,
},
)
resp.raise_for_status()
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch verification failed: {exc}")
await broadcaster.publish({"type": "model_switched", "model_id": model_id})
log.info("UI model switched model=%s", model_id)
return JSONResponse({"status": "ok", "model_id": model_id})
@app.get("/ui/api/llamacpp-config")
async def get_llamacpp_config() -> JSONResponse:
active_model = await _fetch_active_model(truenas_cfg)
log.info("UI get llama.cpp config active=%s", active_model)
params: Dict[str, Optional[str]] = {}
command_raw = []
if truenas_cfg:
command_raw = await get_app_command(truenas_cfg)
flag_map = {
"--ctx-size": "ctx_size",
"--n-gpu-layers": "n_gpu_layers",
"--tensor-split": "tensor_split",
"--split-mode": "split_mode",
"--cache-type-k": "cache_type_k",
"--cache-type-v": "cache_type_v",
"--flash-attn": "flash_attn",
"--temp": "temp",
"--top-k": "top_k",
"--top-p": "top_p",
"--repeat-penalty": "repeat_penalty",
"--repeat-last-n": "repeat_last_n",
"--frequency-penalty": "frequency_penalty",
"--presence-penalty": "presence_penalty",
}
if isinstance(command_raw, list):
for flag, key in flag_map.items():
if flag in command_raw:
idx = command_raw.index(flag)
if idx + 1 < len(command_raw):
params[key] = command_raw[idx + 1]
known_flags = set(flag_map.keys()) | {"--model"}
extra = []
if isinstance(command_raw, list):
skip_next = False
for item in command_raw:
if skip_next:
skip_next = False
continue
if item in known_flags:
skip_next = True
continue
extra.append(item)
return JSONResponse(
{
"active_model": active_model,
"params": params,
"extra_args": " ".join(extra),
}
)
@app.post("/ui/api/llamacpp-config")
async def update_llamacpp_config(request: Request) -> JSONResponse:
payload = await request.json()
params = payload.get("params") or {}
extra_args = payload.get("extra_args") or ""
warmup_override = payload.get("warmup_prompt") or ""
log.info("UI save llama.cpp config params=%s extra_args=%s", params, extra_args)
if not truenas_cfg:
raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
flags = {
"--ctx-size": params.get("ctx_size"),
"--n-gpu-layers": params.get("n_gpu_layers"),
"--tensor-split": params.get("tensor_split"),
"--split-mode": params.get("split_mode"),
"--cache-type-k": params.get("cache_type_k"),
"--cache-type-v": params.get("cache_type_v"),
"--flash-attn": params.get("flash_attn"),
"--temp": params.get("temp"),
"--top-k": params.get("top_k"),
"--top-p": params.get("top_p"),
"--repeat-penalty": params.get("repeat_penalty"),
"--repeat-last-n": params.get("repeat_last_n"),
"--frequency-penalty": params.get("frequency_penalty"),
"--presence-penalty": params.get("presence_penalty"),
}
try:
await update_command_flags(truenas_cfg, flags, extra_args)
except Exception as exc:
log.exception("UI update llama.cpp config failed")
raise HTTPException(status_code=500, detail=f"config update failed: {exc}")
active_model = await _fetch_active_model(truenas_cfg)
if active_model:
warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
log.info("UI warmup after config update model=%s prompt_len=%s", active_model, len(warmup_prompt))
try:
await run_warmup_with_retry(cfg.base_url, active_model, warmup_prompt, timeout_s=cfg.switch_timeout_s)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"config warmup failed: {exc}")
await broadcaster.publish({"type": "llamacpp_config_updated"})
return JSONResponse({"status": "ok"})
@app.get("/ui/api/llamacpp-logs")
async def get_llamacpp_logs() -> JSONResponse:
logs = await _fetch_logs()
return JSONResponse({"logs": logs})
@app.get("/ui/api/llamacpp-logs/stream")
async def stream_llamacpp_logs() -> StreamingResponse:
async def event_stream():
last_lines: list[str] = []
while True:
logs = await _fetch_logs()
lines = logs.splitlines()
if last_lines:
last_tail = last_lines[-1]
idx = -1
for i in range(len(lines) - 1, -1, -1):
if lines[i] == last_tail:
idx = i
break
if idx >= 0:
lines = lines[idx + 1 :]
if lines:
last_lines = (last_lines + lines)[-200:]
data = json.dumps({"type": "logs", "lines": lines}, separators=(",", ":"))
yield f"data: {data}\n\n".encode("utf-8")
await asyncio.sleep(2)
return StreamingResponse(event_stream(), media_type="text/event-stream")
return app
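
Since create_ui_app() returns a plain FastAPI instance, it can be served by any ASGI server. A minimal sketch, assuming the module path and port are placeholders rather than values defined in this file:

import uvicorn

from app.ui_app import create_ui_app  # assumed module path

if __name__ == "__main__":
    uvicorn.run(create_ui_app(), host="0.0.0.0", port=9094)  # port is illustrative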

View File

@@ -0,0 +1,306 @@
const modelsList = document.getElementById("models-list");
const downloadsList = document.getElementById("downloads-list");
const refreshModels = document.getElementById("refresh-models");
const refreshDownloads = document.getElementById("refresh-downloads");
const form = document.getElementById("download-form");
const errorEl = document.getElementById("download-error");
const statusEl = document.getElementById("switch-status");
const configStatusEl = document.getElementById("config-status");
const configForm = document.getElementById("config-form");
const refreshConfig = document.getElementById("refresh-config");
const warmupPromptEl = document.getElementById("warmup-prompt");
const refreshLogs = document.getElementById("refresh-logs");
const logsOutput = document.getElementById("logs-output");
const logsStatus = document.getElementById("logs-status");
const themeToggle = document.getElementById("theme-toggle");
const applyTheme = (theme) => {
document.documentElement.setAttribute("data-theme", theme);
themeToggle.textContent = theme === "dark" ? "Light" : "Dark";
themeToggle.setAttribute("aria-pressed", theme === "dark" ? "true" : "false");
};
const savedTheme = localStorage.getItem("theme") || "light";
applyTheme(savedTheme);
themeToggle.addEventListener("click", () => {
const next = document.documentElement.getAttribute("data-theme") === "dark" ? "light" : "dark";
localStorage.setItem("theme", next);
applyTheme(next);
});
const cfgFields = {
ctx_size: document.getElementById("cfg-ctx-size"),
n_gpu_layers: document.getElementById("cfg-n-gpu-layers"),
tensor_split: document.getElementById("cfg-tensor-split"),
split_mode: document.getElementById("cfg-split-mode"),
cache_type_k: document.getElementById("cfg-cache-type-k"),
cache_type_v: document.getElementById("cfg-cache-type-v"),
flash_attn: document.getElementById("cfg-flash-attn"),
temp: document.getElementById("cfg-temp"),
top_k: document.getElementById("cfg-top-k"),
top_p: document.getElementById("cfg-top-p"),
repeat_penalty: document.getElementById("cfg-repeat-penalty"),
repeat_last_n: document.getElementById("cfg-repeat-last-n"),
frequency_penalty: document.getElementById("cfg-frequency-penalty"),
presence_penalty: document.getElementById("cfg-presence-penalty"),
};
const extraArgsEl = document.getElementById("cfg-extra-args");
const fmtBytes = (bytes) => {
if (!bytes && bytes !== 0) return "-";
const units = ["B", "KB", "MB", "GB", "TB"];
let idx = 0;
let value = bytes;
while (value >= 1024 && idx < units.length - 1) {
value /= 1024;
idx += 1;
}
return `${value.toFixed(1)} ${units[idx]}`;
};
const setStatus = (message, type) => {
statusEl.textContent = message || "";
statusEl.className = "status";
if (type) {
statusEl.classList.add(type);
}
};
const setConfigStatus = (message, type) => {
configStatusEl.textContent = message || "";
configStatusEl.className = "status";
if (type) {
configStatusEl.classList.add(type);
}
};
async function loadModels() {
const res = await fetch("/ui/api/models");
const data = await res.json();
modelsList.innerHTML = "";
const activeModel = data.active_model;
data.models.forEach((model) => {
const li = document.createElement("li");
if (model.active) {
li.classList.add("active");
}
const row = document.createElement("div");
row.className = "model-row";
const name = document.createElement("span");
name.textContent = `${model.id} (${fmtBytes(model.size)})`;
const actions = document.createElement("div");
if (model.active) {
const badge = document.createElement("span");
badge.className = "badge";
badge.textContent = "Active";
actions.appendChild(badge);
} else {
const button = document.createElement("button");
button.className = "ghost";
button.textContent = "Switch";
button.onclick = async () => {
setStatus(`Switching to ${model.id}...`);
const warmupPrompt = warmupPromptEl.value.trim();
const res = await fetch("/ui/api/switch-model", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ model_id: model.id, warmup_prompt: warmupPrompt }),
});
const payload = await res.json();
if (!res.ok) {
setStatus(payload.detail || "Switch failed.", "error");
return;
}
warmupPromptEl.value = "";
setStatus(`Active model: ${model.id}`, "ok");
await loadModels();
};
actions.appendChild(button);
}
row.appendChild(name);
row.appendChild(actions);
li.appendChild(row);
modelsList.appendChild(li);
});
if (activeModel) {
setStatus(`Active model: ${activeModel}`, "ok");
}
}
async function loadDownloads() {
const res = await fetch("/ui/api/downloads");
const data = await res.json();
downloadsList.innerHTML = "";
const entries = Object.values(data.downloads || {});
if (!entries.length) {
downloadsList.innerHTML = "<p>No active downloads.</p>";
return;
}
entries.forEach((download) => {
const card = document.createElement("div");
card.className = "download-card";
const title = document.createElement("strong");
title.textContent = download.filename;
const meta = document.createElement("div");
const percent = download.bytes_total
? Math.round((download.bytes_downloaded / download.bytes_total) * 100)
: 0;
meta.textContent = `${download.status} · ${fmtBytes(download.bytes_downloaded)} / ${fmtBytes(download.bytes_total)}`;
const progress = document.createElement("div");
progress.className = "progress";
const bar = document.createElement("span");
bar.style.width = `${Math.min(percent, 100)}%`;
progress.appendChild(bar);
const actions = document.createElement("div");
if (download.status === "downloading" || download.status === "queued") {
const cancel = document.createElement("button");
cancel.className = "ghost";
cancel.textContent = "Cancel";
cancel.onclick = async () => {
await fetch(`/ui/api/downloads/${download.download_id}`, { method: "DELETE" });
await loadDownloads();
};
actions.appendChild(cancel);
}
card.appendChild(title);
card.appendChild(meta);
card.appendChild(progress);
card.appendChild(actions);
downloadsList.appendChild(card);
});
}
async function loadConfig() {
const res = await fetch("/ui/api/llamacpp-config");
const data = await res.json();
Object.entries(cfgFields).forEach(([key, el]) => {
el.value = data.params?.[key] || "";
});
extraArgsEl.value = data.extra_args || "";
if (data.active_model) {
setConfigStatus(`Active model: ${data.active_model}`, "ok");
}
}
async function loadLogs() {
const res = await fetch("/ui/api/llamacpp-logs");
if (!res.ok) {
logsStatus.textContent = "Unavailable";
return;
}
const data = await res.json();
logsOutput.textContent = data.logs || "";
logsStatus.textContent = data.logs ? "Snapshot" : "Empty";
}
form.addEventListener("submit", async (event) => {
event.preventDefault();
errorEl.textContent = "";
const url = document.getElementById("model-url").value.trim();
const filename = document.getElementById("model-filename").value.trim();
if (!url) {
errorEl.textContent = "URL is required.";
return;
}
const payload = { url };
if (filename) payload.filename = filename;
const res = await fetch("/ui/api/downloads", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(payload),
});
if (!res.ok) {
const err = await res.json();
errorEl.textContent = err.detail || "Failed to start download.";
return;
}
document.getElementById("model-url").value = "";
document.getElementById("model-filename").value = "";
await loadDownloads();
});
configForm.addEventListener("submit", async (event) => {
event.preventDefault();
setConfigStatus("Applying parameters...");
const params = {};
Object.entries(cfgFields).forEach(([key, el]) => {
if (el.value.trim()) {
params[key] = el.value.trim();
}
});
const warmupPrompt = warmupPromptEl.value.trim();
const res = await fetch("/ui/api/llamacpp-config", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ params, extra_args: extraArgsEl.value.trim(), warmup_prompt: warmupPrompt }),
});
const payload = await res.json();
if (!res.ok) {
setConfigStatus(payload.detail || "Update failed.", "error");
return;
}
setConfigStatus("Parameters updated.", "ok");
warmupPromptEl.value = "";
});
refreshModels.addEventListener("click", loadModels);
refreshDownloads.addEventListener("click", loadDownloads);
refreshConfig.addEventListener("click", loadConfig);
refreshLogs.addEventListener("click", loadLogs);
loadModels();
loadDownloads();
loadConfig();
loadLogs();
const eventSource = new EventSource("/ui/api/events");
eventSource.onmessage = async (event) => {
const payload = JSON.parse(event.data);
if (payload.type === "download_progress" || payload.type === "download_completed" || payload.type === "download_status") {
await loadDownloads();
}
if (payload.type === "active_model") {
await loadModels();
await loadConfig();
}
if (payload.type === "model_switched") {
setStatus(`Active model: ${payload.model_id}`, "ok");
await loadModels();
await loadConfig();
}
if (payload.type === "model_switch_failed") {
setStatus(payload.error || "Model switch failed.", "error");
}
if (payload.type === "llamacpp_config_updated") {
await loadConfig();
}
};
const logsSource = new EventSource("/ui/api/llamacpp-logs/stream");
logsSource.onopen = () => {
logsStatus.textContent = "Streaming";
};
logsSource.onmessage = (event) => {
const payload = JSON.parse(event.data);
if (payload.type !== "logs") {
return;
}
const lines = payload.lines || [];
if (!lines.length) return;
const current = logsOutput.textContent.split("\n").filter((line) => line.length);
const merged = current.concat(lines).slice(-400);
logsOutput.textContent = merged.join("\n");
logsOutput.scrollTop = logsOutput.scrollHeight;
logsStatus.textContent = "Streaming";
};
logsSource.onerror = () => {
logsStatus.textContent = "Disconnected";
};

View File

@@ -0,0 +1,151 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>llama.cpp Model Manager</title>
<link rel="stylesheet" href="/ui/styles.css" />
</head>
<body>
<div class="page">
<header class="topbar">
<div class="brand">
<p class="eyebrow">llama.cpp wrapper</p>
<h1>Model Manager</h1>
<p class="lede">Curate models, tune runtime parameters, and keep llama.cpp responsive.</p>
</div>
<div class="header-actions">
<button id="theme-toggle" class="ghost" type="button" aria-pressed="false">Dark</button>
<div class="quick-actions card">
<h2>Quick Add</h2>
<form id="download-form">
<label>
Model URL
<input type="url" id="model-url" placeholder="https://.../model.gguf" required />
</label>
<label>
Optional filename
<input type="text" id="model-filename" placeholder="custom-name.gguf" />
</label>
<button type="submit">Start Download</button>
<p id="download-error" class="error"></p>
</form>
</div>
</div>
</header>
<main class="layout">
<section class="column">
<div class="card">
<div class="card-header">
<h3>Models</h3>
<button id="refresh-models" class="ghost">Refresh</button>
</div>
<div id="switch-status" class="status"></div>
<label class="config-wide">
Warmup prompt (one-time)
<textarea id="warmup-prompt" rows="3" placeholder="Optional warmup prompt for the next restart only"></textarea>
</label>
<ul id="models-list" class="list"></ul>
</div>
<div class="card">
<div class="card-header">
<h3>Downloads</h3>
<button id="refresh-downloads" class="ghost">Refresh</button>
</div>
<div id="downloads-list" class="downloads"></div>
</div>
</section>
<section class="column">
<div class="card">
<div class="card-header">
<h3>Runtime Parameters</h3>
<button id="refresh-config" class="ghost">Refresh</button>
</div>
<div id="config-status" class="status"></div>
<form id="config-form" class="config-grid">
<label>
ctx-size
<input type="text" id="cfg-ctx-size" placeholder="e.g. 8192" />
</label>
<label>
n-gpu-layers
<input type="text" id="cfg-n-gpu-layers" placeholder="e.g. 999" />
</label>
<label>
tensor-split
<input type="text" id="cfg-tensor-split" placeholder="e.g. 0.5,0.5" />
</label>
<label>
split-mode
<input type="text" id="cfg-split-mode" placeholder="e.g. layer" />
</label>
<label>
cache-type-k
<input type="text" id="cfg-cache-type-k" placeholder="e.g. q8_0" />
</label>
<label>
cache-type-v
<input type="text" id="cfg-cache-type-v" placeholder="e.g. q8_0" />
</label>
<label>
flash-attn
<input type="text" id="cfg-flash-attn" placeholder="on/off" />
</label>
<label>
temp
<input type="text" id="cfg-temp" placeholder="e.g. 0.7" />
</label>
<label>
top-k
<input type="text" id="cfg-top-k" placeholder="e.g. 40" />
</label>
<label>
top-p
<input type="text" id="cfg-top-p" placeholder="e.g. 0.9" />
</label>
<label>
repeat-penalty
<input type="text" id="cfg-repeat-penalty" placeholder="e.g. 1.1" />
</label>
<label>
repeat-last-n
<input type="text" id="cfg-repeat-last-n" placeholder="e.g. 256" />
</label>
<label>
frequency-penalty
<input type="text" id="cfg-frequency-penalty" placeholder="e.g. 0.1" />
</label>
<label>
presence-penalty
<input type="text" id="cfg-presence-penalty" placeholder="e.g. 0.0" />
</label>
<label class="config-wide">
extra args
<textarea id="cfg-extra-args" rows="3" placeholder="--mlock --no-mmap"></textarea>
</label>
<button type="submit" class="config-wide">Apply Parameters</button>
</form>
</div>
</section>
</main>
<section class="card logs-panel">
<div class="card-header">
<div>
<h3>llama.cpp Logs</h3>
<p class="lede small">Live tail from the llama.cpp container.</p>
</div>
<div class="log-actions">
<span id="logs-status" class="badge muted">Idle</span>
<button id="refresh-logs" class="ghost">Refresh</button>
</div>
</div>
<pre id="logs-output" class="log-output"></pre>
</section>
</div>
<script src="/ui/app.js"></script>
</body>
</html>

View File

@@ -0,0 +1,337 @@
:root {
--bg: #f5f6f8;
--panel: #ffffff;
--panel-muted: #f2f3f6;
--text: #111318;
--muted: #5b6472;
--border: rgba(17, 19, 24, 0.08);
--accent: #0a84ff;
--accent-ink: #005ad6;
--shadow: 0 20px 60px rgba(17, 19, 24, 0.08);
}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
font-family: "SF Pro Text", "SF Pro Display", "Helvetica Neue", "Segoe UI", sans-serif;
background: radial-gradient(circle at top, #ffffff 0%, var(--bg) 60%);
color: var(--text);
}
.page {
max-width: 1200px;
margin: 0 auto;
padding: 48px 28px 72px;
}
.topbar {
display: grid;
grid-template-columns: minmax(240px, 1.2fr) minmax(280px, 0.8fr);
gap: 32px;
align-items: stretch;
margin-bottom: 36px;
}
.header-actions {
display: grid;
gap: 16px;
justify-items: end;
}
.header-actions .quick-actions {
width: 100%;
}
.header-actions #theme-toggle {
justify-self: end;
}
.brand h1 {
font-size: clamp(2.2rem, 4vw, 3.2rem);
letter-spacing: -0.02em;
}
.eyebrow {
text-transform: uppercase;
letter-spacing: 0.2em;
font-size: 0.68rem;
color: var(--muted);
}
.lede {
margin-top: 12px;
font-size: 1rem;
color: var(--muted);
}
.lede.small {
font-size: 0.85rem;
}
.card {
background: var(--panel);
padding: 22px;
border-radius: 22px;
border: 1px solid var(--border);
box-shadow: var(--shadow);
}
.quick-actions h2 {
margin-bottom: 14px;
font-size: 1.1rem;
}
.layout {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
gap: 24px;
}
.column {
display: grid;
gap: 24px;
}
.logs-panel {
margin-top: 28px;
}
.card-header {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
margin-bottom: 16px;
}
.card-header h3 {
font-size: 1.1rem;
}
.log-actions {
display: flex;
align-items: center;
gap: 12px;
}
form {
display: grid;
gap: 12px;
}
label {
display: grid;
gap: 6px;
font-size: 0.85rem;
color: var(--muted);
}
input,
textarea,
button {
font: inherit;
}
input,
textarea {
padding: 10px 12px;
border-radius: 12px;
border: 1px solid var(--border);
background: #fff;
}
button {
border: none;
padding: 10px 16px;
border-radius: 12px;
background: var(--accent);
color: #fff;
font-weight: 600;
cursor: pointer;
transition: transform 0.2s ease, background 0.2s ease;
}
button:hover {
transform: translateY(-1px);
background: var(--accent-ink);
}
button.ghost {
background: transparent;
color: var(--accent);
border: 1px solid rgba(10, 132, 255, 0.4);
padding: 8px 12px;
}
.list {
list-style: none;
padding: 0;
margin: 0;
display: grid;
gap: 10px;
}
.list li {
padding: 12px;
border-radius: 14px;
background: var(--panel-muted);
border: 1px solid var(--border);
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
font-size: 0.85rem;
}
.list li.active {
border-color: rgba(10, 132, 255, 0.4);
background: #eef5ff;
}
.model-row {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
}
.badge {
display: inline-block;
padding: 4px 8px;
border-radius: 999px;
background: var(--accent);
color: #fff;
font-size: 0.7rem;
font-weight: 600;
}
.badge.muted {
background: rgba(17, 19, 24, 0.1);
color: var(--muted);
}
.status {
margin-bottom: 12px;
font-size: 0.9rem;
color: var(--muted);
}
.status.ok {
color: #1a7f37;
}
.status.error {
color: #b02a14;
}
.downloads {
display: grid;
gap: 12px;
}
.download-card {
border-radius: 16px;
border: 1px solid var(--border);
padding: 12px;
background: #f7f8fb;
}
.download-card strong {
display: block;
font-size: 0.9rem;
margin-bottom: 6px;
}
.progress {
height: 8px;
border-radius: 999px;
background: #dfe3ea;
overflow: hidden;
margin: 8px 0;
}
.progress > span {
display: block;
height: 100%;
background: var(--accent);
width: 0;
}
.error {
color: #b02a14;
font-size: 0.85rem;
}
.config-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
gap: 14px;
}
.config-wide {
grid-column: 1 / -1;
}
textarea {
padding: 10px 12px;
border-radius: 12px;
border: 1px solid var(--border);
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
font-size: 0.85rem;
resize: vertical;
}
.log-output {
background: #0f141b;
color: #dbe6f3;
padding: 16px;
border-radius: 16px;
min-height: 260px;
max-height: 420px;
overflow: auto;
font-size: 12px;
line-height: 1.6;
white-space: pre-wrap;
}
[data-theme="dark"] {
--bg: #0b0d12;
--panel: #141824;
--panel-muted: #1b2132;
--text: #f1f4f9;
--muted: #a5afc2;
--border: rgba(241, 244, 249, 0.1);
--accent: #4aa3ff;
--accent-ink: #1f7ae0;
--shadow: 0 20px 60px rgba(0, 0, 0, 0.4);
}
[data-theme="dark"] body {
background: radial-gradient(circle at top, #131826 0%, var(--bg) 60%);
}
[data-theme="dark"] .download-card {
background: #121826;
}
[data-theme="dark"] .progress {
background: #2a3349;
}
[data-theme="dark"] .log-output {
background: #080b12;
color: #d8e4f3;
}
@media (max-width: 900px) {
.topbar {
grid-template-columns: 1fr;
}
}
@media (max-width: 640px) {
.page {
padding: 32px 16px 48px;
}
}

View File

@@ -0,0 +1,74 @@
import asyncio
import logging
import time
from pathlib import Path
import httpx
log = logging.getLogger("llamacpp_warmup")
def _is_loading_error(response: httpx.Response) -> bool:
if response.status_code != 503:
return False
try:
payload = response.json()
except Exception:
return False
message = ""
if isinstance(payload, dict):
error = payload.get("error")
if isinstance(error, dict):
message = str(error.get("message") or "")
else:
message = str(payload.get("message") or "")
return "loading model" in message.lower()
def resolve_warmup_prompt(override: str | None, fallback_path: str) -> str:
if override:
prompt = override.strip()
if prompt:
return prompt
try:
prompt = Path(fallback_path).read_text(encoding="utf-8").strip()
if prompt:
return prompt
except Exception as exc:
log.warning("Failed to read warmup prompt from %s: %s", fallback_path, exc)
return "ok"
async def run_warmup(base_url: str, model_id: str, prompt: str, timeout_s: float) -> None:
payload = {
"model": model_id,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 8,
"temperature": 0,
}
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
resp = await client.post("/v1/chat/completions", json=payload)
if resp.status_code == 503 and _is_loading_error(resp):
raise RuntimeError("llama.cpp still loading model")
resp.raise_for_status()
async def run_warmup_with_retry(
base_url: str,
model_id: str,
prompt: str,
timeout_s: float,
interval_s: float = 3.0,
) -> None:
deadline = time.time() + timeout_s
last_exc: Exception | None = None
while time.time() < deadline:
try:
await run_warmup(base_url, model_id, prompt, timeout_s=timeout_s)
return
except Exception as exc:
last_exc = exc
await asyncio.sleep(interval_s)
if last_exc:
raise last_exc
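
A short sketch of calling the warmup helpers directly, for example from a maintenance script; the base URL, model id, and prompt file below are placeholders:

import asyncio

from app.warmup import resolve_warmup_prompt, run_warmup_with_retry  # assumed module path

async def warm() -> None:
    prompt = resolve_warmup_prompt(None, "warmup_prompt.txt")  # falls back to "ok" if unreadable
    await run_warmup_with_retry(
        base_url="http://192.168.1.2:8071",  # placeholder llama.cpp URL
        model_id="example.gguf",             # placeholder model id
        prompt=prompt,
        timeout_s=300,
    )

if __name__ == "__main__":
    asyncio.run(warm())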

464
llamacpp_remote_test.ps1 Normal file
View File

@@ -0,0 +1,464 @@
param(
[Parameter(Mandatory = $true)][string]$Model,
[string]$BaseUrl = "http://192.168.1.2:8071",
[string]$PromptPath = "prompt_crwv.txt",
[int]$Runs = 3,
[int]$MaxTokens = 2000,
[int]$NumCtx = 131072,
[int]$TopK = 1,
[double]$TopP = 1.0,
[int]$Seed = 42,
[double]$RepeatPenalty = 1.05,
[double]$Temperature = 0,
[string]$JsonSchema = "",
[int]$TimeoutSec = 1800,
[string]$BatchId,
[switch]$EnableGpuMonitor = $true,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555,
[int]$GpuMonitorIntervalSec = 1,
[int]$GpuMonitorSeconds = 120
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
function Normalize-Strike([object]$value) {
if ($null -eq $value) { return $null }
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
}
return ($value.ToString().Trim())
}
function Get-AllowedLegs([string]$promptText) {
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
if (-not $match.Success) {
throw "Options Chain JSON block not found in prompt."
}
$chains = $match.Groups[1].Value | ConvertFrom-Json
$allowedExpiry = @{}
$allowedLegs = @{}
foreach ($exp in $chains) {
$expiry = [string]$exp.expiry
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
$allowedExpiry[$expiry] = $true
foreach ($leg in $exp.liquidSet) {
if ($null -eq $leg) { continue }
if ($leg.liquid -ne $true) { continue }
$side = [string]$leg.side
$strikeNorm = Normalize-Strike $leg.strike
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
$key = "$expiry|$side|$strikeNorm"
$allowedLegs[$key] = $true
}
}
}
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
$errors = New-Object System.Collections.Generic.List[string]
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
foreach ($key in $requiredTop) {
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
$errors.Add("Missing top-level key: $key")
}
}
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
}
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
}
} else {
$errors.Add("selectedExpiry is missing or empty")
}
if ($obj.confidenceScore -ne $null) {
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
$errors.Add("confidenceScore is not numeric")
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
$errors.Add("confidenceScore out of range 0-100")
}
}
if ($obj.recommendedTrades -eq $null) {
$errors.Add("recommendedTrades is null")
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
$errors.Add("recommendedTrades is not an array")
}
if ($obj.strategyBias -eq "NO_TRADE") {
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
}
} else {
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
$errors.Add("recommendedTrades must contain 1-3 trades")
}
}
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
$errors.Add("whyOthersRejected is not an array")
}
if ($obj.recommendedTrades) {
foreach ($trade in $obj.recommendedTrades) {
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
foreach ($tkey in $tradeRequired) {
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
$errors.Add("Trade missing key: $tkey")
}
}
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
}
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
}
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
$errors.Add("Trade legs missing or not an array")
continue
}
$legs = @($trade.legs)
$hasBuy = $false
$hasSell = $false
foreach ($leg in $trade.legs) {
$side = ([string]$leg.side).ToLowerInvariant()
$action = ([string]$leg.action).ToLowerInvariant()
$expiry = [string]$leg.expiry
$strikeNorm = Normalize-Strike $leg.strike
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
$key = "$expiry|$side|$strikeNorm"
if (-not $allowedLegs.ContainsKey($key)) {
$errors.Add("Leg not in liquid set: $key")
}
}
if ($action -eq "buy") { $hasBuy = $true }
if ($action -eq "sell") { $hasSell = $true }
}
if ($obj.selectedExpiry -and $legs) {
foreach ($leg in $legs) {
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
}
}
}
if ($hasSell -and -not $hasBuy) {
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
}
if ($trade.greekProfile) {
$gp = $trade.greekProfile
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
foreach ($gkey in $gpRequired) {
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
$errors.Add("Missing greekProfile.$gkey")
}
}
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
$errors.Add("ThetaExposure POS on all-long legs")
}
} else {
$errors.Add("Missing greekProfile")
}
$structure = ([string]$trade.structure).ToLowerInvariant()
$tradeName = ([string]$trade.name).ToLowerInvariant()
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
if ($isStraddle -or $isStrangle) {
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
$callLegs = $legs | Where-Object { $_.side -eq "call" }
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
$callStrike = Normalize-Strike $callLegs[0].strike
$putStrike = Normalize-Strike $putLegs[0].strike
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
if ($isStrangle) {
try {
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
} catch {
$errors.Add("Strangle strike comparison failed")
}
}
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
$errors.Add("Straddle/Strangle must be long (buy) legs")
}
}
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
}
}
if ($isCallDebit) {
$callLegs = $legs | Where-Object { $_.side -eq "call" }
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Call debit spread buy strike must be below sell strike")
}
} catch {
$errors.Add("Call debit spread strike comparison failed")
}
}
}
if ($isPutDebit) {
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Put debit spread buy strike must be above sell strike")
}
} catch {
$errors.Add("Put debit spread strike comparison failed")
}
}
}
}
}
return $errors
}
function Parse-GpuLog {
param([string]$Path)
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
if (-not (Test-Path $Path)) {
$summary.error = "gpu log missing"
return $summary
}
$lines = Get-Content -Path $Path
$currentIndex = -1
$gpuIndex = -1
$inUtilBlock = $false
foreach ($line in $lines) {
if ($line -match '^Timestamp') {
$gpuIndex = -1
$currentIndex = -1
$inUtilBlock = $false
continue
}
if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
$gpuIndex += 1
$currentIndex = $gpuIndex
$inUtilBlock = $false
continue
}
if ($line -match '^\s*Utilization\s*$') {
$inUtilBlock = $true
continue
}
if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
}
}
return $summary
}
$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs
if ([string]::IsNullOrWhiteSpace($BatchId)) {
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}
$outBase = Join-Path -Path (Get-Location) -ChildPath "llamacpp_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
$summary = [ordered]@{
model = $Model
baseUrl = $BaseUrl
batchId = $BatchId
params = [ordered]@{
temperature = $Temperature
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
max_tokens = $MaxTokens
num_ctx = $NumCtx
}
gpuMonitor = [ordered]@{
enabled = [bool]$EnableGpuMonitor
sshHost = $SshHost
sshPort = $SshPort
intervalSec = $GpuMonitorIntervalSec
durationSec = $GpuMonitorSeconds
}
modelMeta = $null
runs = @()
}
$schemaObject = $null
if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
try {
$schemaObject = $JsonSchema | ConvertFrom-Json
} catch {
throw "JsonSchema is not valid JSON: $($_.Exception.Message)"
}
}
try {
$modelsResponse = Invoke-RestMethod -Uri "$BaseUrl/v1/models" -TimeoutSec 30
$meta = $modelsResponse.data | Where-Object { $_.id -eq $Model } | Select-Object -First 1
if ($meta) { $summary.modelMeta = $meta.meta }
} catch {
$summary.modelMeta = @{ error = $_.Exception.Message }
}
for ($i = 1; $i -le $Runs; $i++) {
Write-Host "Running $Model (run $i/$Runs)"
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
$gpuJob = $null
$gpuLogPath = $null
if ($EnableGpuMonitor) {
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$gpuJob = Start-Job -ScriptBlock {
param($sshExe, $target, $port, $samples, $interval, $logPath)
for ($s = 1; $s -le $samples; $s++) {
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
try {
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
Add-Content -Path $logPath -Value $out
} catch {
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
}
Start-Sleep -Seconds $interval
}
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
Start-Sleep -Seconds 1
}
$body = @{
model = $Model
messages = @(@{ role = "user"; content = $prompt })
temperature = $Temperature
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
max_tokens = $MaxTokens
}
if ($schemaObject) {
$body.response_format = @{
type = "json_schema"
json_schema = @{
name = "trade_schema"
schema = $schemaObject
strict = $true
}
}
}
$body = $body | ConvertTo-Json -Depth 12
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/v1/chat/completions" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
} catch {
$runResult.errors = @("API error: $($_.Exception.Message)")
$summary.runs += $runResult
if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
continue
} finally {
if ($gpuJob) {
Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
Remove-Job -Job $gpuJob | Out-Null
}
}
$raw = [string]$resp.choices[0].message.content
$jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
Set-Content -Path $jsonPath -Value $raw -Encoding ASCII
try {
$parsed = $raw | ConvertFrom-Json
$errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
if ($errors.Count -eq 0) {
$runResult.ok = $true
} else {
$runResult.errors = $errors
}
} catch {
$runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
}
if ($gpuLogPath) {
$runResult.gpuLog = $gpuLogPath
$runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
}
if ($resp.timings) {
$runResult.timings = $resp.timings
}
if ($resp.usage) {
$runResult.usage = $resp.usage
}
$summary.runs += $runResult
}
$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII
$summary | ConvertTo-Json -Depth 6
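
For reference, the core of each run above is a single chat-completions call; a rough Python equivalent is sketched below (URL and model are placeholders, and the script's schema enforcement, GPU monitoring, and trade validation are omitted).

import requests

body = {
    "model": "example.gguf",  # placeholder model id
    "messages": [{"role": "user", "content": "prompt text"}],
    "temperature": 0,
    "top_k": 1,
    "top_p": 1.0,
    "seed": 42,
    "repeat_penalty": 1.05,
    "max_tokens": 2000,
}
resp = requests.post("http://192.168.1.2:8071/v1/chat/completions", json=body, timeout=1800)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])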

117
llamacpp_set_command.ps1 Normal file
View File

@@ -0,0 +1,117 @@
param(
[Parameter(Mandatory = $true)][string]$ModelPath,
[Parameter(Mandatory = $true)][int]$CtxSize,
[int]$BatchSize = 1024,
[int]$UBatchSize = 256,
[string]$TensorSplit = "0.5,0.5",
[string]$Devices = "0,1",
[int]$GpuLayers = 999,
[string]$CacheTypeK = "q4_0",
[string]$CacheTypeV = "q4_0",
[string]$GrammarFile = "",
[string]$JsonSchema = "",
[string]$BaseUrl = "http://192.168.1.2:8071",
[int]$TimeoutSec = 600,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
$commandArgs = @(
"--model", $ModelPath,
"--ctx-size", $CtxSize.ToString(),
"--n-gpu-layers", $GpuLayers.ToString(),
"--split-mode", "layer",
"--tensor-split", $TensorSplit,
"--batch-size", $BatchSize.ToString(),
"--ubatch-size", $UBatchSize.ToString(),
"--cache-type-k", $CacheTypeK,
"--cache-type-v", $CacheTypeV,
"--flash-attn", "on"
)
if (-not [string]::IsNullOrWhiteSpace($Devices)) {
$commandArgs = @("--device", $Devices) + $commandArgs
}
if (-not [string]::IsNullOrWhiteSpace($GrammarFile)) {
$commandArgs += @("--grammar-file", $GrammarFile)
}
if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
$commandArgs += @("--json-schema", $JsonSchema)
}
$argJson = $commandArgs | ConvertTo-Json -Compress
$py = @"
import json
path = r"/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
new_cmd = json.loads(r'''$argJson''')
lines = open(path, "r", encoding="utf-8").read().splitlines()
out = []
in_cmd = False
def yaml_quote(value):
text = str(value)
return "'" + text.replace("'", "''") + "'"
for line in lines:
if line.startswith('"command":'):
out.append('"command":')
for arg in new_cmd:
out.append(f"- {yaml_quote(arg)}")
in_cmd = True
continue
if in_cmd:
if line.startswith('"') and not line.startswith('"command":'):
in_cmd = False
out.append(line)
else:
continue
else:
out.append(line)
if in_cmd:
pass
open(path, "w", encoding="utf-8").write("\n".join(out) + "\n")
"@
$py | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -"
$pyCompose = @"
import json, yaml, subprocess
compose_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/templates/rendered/docker-compose.yaml"
user_config_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
with open(compose_path, "r", encoding="utf-8") as f:
compose = json.load(f)
with open(user_config_path, "r", encoding="utf-8") as f:
config = yaml.safe_load(f)
command = config.get("command")
if not command:
raise SystemExit("command list missing from user_config")
svc = compose["services"]["llamacpp"]
svc["command"] = command
with open(compose_path, "w", encoding="utf-8") as f:
json.dump(compose, f)
payload = {"custom_compose_config": compose}
subprocess.run(["midclt", "call", "app.update", "llamacpp", json.dumps(payload)], check=True)
"@
$pyCompose | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -" | Out-Null
$start = Get-Date
while ((Get-Date) - $start -lt [TimeSpan]::FromSeconds($TimeoutSec)) {
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/health" -TimeoutSec 10
if ($resp.status -eq "ok") {
Write-Host "llamacpp healthy at $BaseUrl"
exit 0
}
} catch {
Start-Sleep -Seconds 5
}
}
throw "Timed out waiting for llama.cpp server at $BaseUrl"

View File

@@ -0,0 +1,14 @@
FROM deepseek-r1:14b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42

View File

@@ -0,0 +1,14 @@
FROM llama3.1:70b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42

View File

@@ -0,0 +1,14 @@
FROM phi3:mini-128k
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
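The three Modelfiles above differ only in their `FROM` line. Assuming they are saved under names such as `Modelfile.deepseek-r1-14b` (hypothetical; the original filenames are not shown in this diff), they can be registered with the Ollama CLI roughly as follows:

```powershell
# Hypothetical file and tag names.
ollama create deepseek-r1-14b-trader -f .\Modelfile.deepseek-r1-14b
ollama run deepseek-r1-14b-trader "ping"   # quick check that the baked-in system prompt is applied
```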

561
ollama_remote_test.ps1 Normal file
View File

@@ -0,0 +1,561 @@
param(
[Parameter(Mandatory = $true)][string]$Model,
[string]$BaseUrl = "http://192.168.1.2:30068",
[string]$PromptPath = "prompt_crwv.txt",
[int]$Runs = 3,
[int]$NumPredict = 1200,
[int]$NumCtx = 131072,
[int]$NumBatch = 0,
[int]$NumGpuLayers = 0,
[int]$TimeoutSec = 900,
[int]$TopK = 1,
[double]$TopP = 1.0,
[int]$Seed = 42,
[double]$RepeatPenalty = 1.05,
[string]$BatchId,
[switch]$UseSchemaFormat = $false,
[switch]$EnableGpuMonitor = $true,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[switch]$CheckProcessor = $true,
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555,
[int]$GpuMonitorIntervalSec = 1,
[int]$GpuMonitorSeconds = 120
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
function Normalize-Strike([object]$value) {
if ($null -eq $value) { return $null }
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
}
return ($value.ToString().Trim())
}
function Get-AllowedLegs([string]$promptText) {
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
if (-not $match.Success) {
throw "Options Chain JSON block not found in prompt."
}
$chains = $match.Groups[1].Value | ConvertFrom-Json
$allowedExpiry = @{}
$allowedLegs = @{}
foreach ($exp in $chains) {
$expiry = [string]$exp.expiry
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
$allowedExpiry[$expiry] = $true
foreach ($leg in $exp.liquidSet) {
if ($null -eq $leg) { continue }
if ($leg.liquid -ne $true) { continue }
$side = [string]$leg.side
$strikeNorm = Normalize-Strike $leg.strike
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
$key = "$expiry|$side|$strikeNorm"
$allowedLegs[$key] = $true
}
}
}
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
$errors = New-Object System.Collections.Generic.List[string]
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
foreach ($key in $requiredTop) {
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
$errors.Add("Missing top-level key: $key")
}
}
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
}
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
}
} else {
$errors.Add("selectedExpiry is missing or empty")
}
if ($obj.confidenceScore -ne $null) {
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
$errors.Add("confidenceScore is not numeric")
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
$errors.Add("confidenceScore out of range 0-100")
}
}
if ($obj.recommendedTrades -eq $null) {
$errors.Add("recommendedTrades is null")
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
$errors.Add("recommendedTrades is not an array")
}
if ($obj.strategyBias -eq "NO_TRADE") {
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
}
} else {
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
$errors.Add("recommendedTrades must contain 1-3 trades")
}
}
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
$errors.Add("whyOthersRejected is not an array")
}
if ($obj.recommendedTrades) {
foreach ($trade in $obj.recommendedTrades) {
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
foreach ($tkey in $tradeRequired) {
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
$errors.Add("Trade missing key: $tkey")
}
}
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
}
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
}
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
$errors.Add("Trade legs missing or not an array")
continue
}
$legs = @($trade.legs)
$hasBuy = $false
$hasSell = $false
foreach ($leg in $trade.legs) {
$side = ([string]$leg.side).ToLowerInvariant()
$action = ([string]$leg.action).ToLowerInvariant()
$expiry = [string]$leg.expiry
$strikeNorm = Normalize-Strike $leg.strike
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
$key = "$expiry|$side|$strikeNorm"
if (-not $allowedLegs.ContainsKey($key)) {
$errors.Add("Leg not in liquid set: $key")
}
}
if ($action -eq "buy") { $hasBuy = $true }
if ($action -eq "sell") { $hasSell = $true }
}
if ($obj.selectedExpiry -and $legs) {
foreach ($leg in $legs) {
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
}
}
}
if ($hasSell -and -not $hasBuy) {
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
}
if ($trade.greekProfile) {
$gp = $trade.greekProfile
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
foreach ($gkey in $gpRequired) {
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
$errors.Add("Missing greekProfile.$gkey")
}
}
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
$errors.Add("ThetaExposure POS on all-long legs")
}
} else {
$errors.Add("Missing greekProfile")
}
$structure = ([string]$trade.structure).ToLowerInvariant()
$tradeName = ([string]$trade.name).ToLowerInvariant()
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
if ($isStraddle -or $isStrangle) {
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
$callLegs = $legs | Where-Object { $_.side -eq "call" }
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
$callStrike = Normalize-Strike $callLegs[0].strike
$putStrike = Normalize-Strike $putLegs[0].strike
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
if ($isStrangle) {
try {
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
} catch {
$errors.Add("Strangle strike comparison failed")
}
}
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
$errors.Add("Straddle/Strangle must be long (buy) legs")
}
}
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
}
}
if ($isCallDebit) {
$callLegs = $legs | Where-Object { $_.side -eq "call" }
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Call debit spread buy strike must be below sell strike")
}
} catch {
$errors.Add("Call debit spread strike comparison failed")
}
}
}
if ($isPutDebit) {
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Put debit spread buy strike must be above sell strike")
}
} catch {
$errors.Add("Put debit spread strike comparison failed")
}
}
}
}
}
return $errors
}
function Parse-GpuLog {
param([string]$Path)
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
if (-not (Test-Path $Path)) {
$summary.error = "gpu log missing"
return $summary
}
$lines = Get-Content -Path $Path
$currentIndex = -1
$gpuIndex = -1
$inGpuUtilSamples = $false
$inUtilBlock = $false
foreach ($line in $lines) {
if ($line -match '^Timestamp') {
$gpuIndex = -1
$currentIndex = -1
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
$gpuIndex += 1
$currentIndex = $gpuIndex
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($line -match '^\s*Utilization\s*$') {
$inUtilBlock = $true
continue
}
if ($line -match '^\s*GPU Utilization Samples') {
$inGpuUtilSamples = $true
$inUtilBlock = $false
continue
}
if ($line -match '^\s*(Memory|ENC|DEC) Utilization Samples') {
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
continue
}
if ($inGpuUtilSamples -and $line -match '^\s*Max\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
}
}
return $summary
}
function Get-ProcessorShare {
param(
[string]$SshExePath,
[string]$Target,
[int]$Port,
[string]$ModelName
)
$result = [ordered]@{ cpuPct = $null; gpuPct = $null; raw = $null; error = $null }
try {
$out = & $SshExePath -p $Port $Target "sudo -n docker exec ix-ollama-ollama-1 ollama ps"
$line = $out | Select-String -SimpleMatch $ModelName | Select-Object -First 1
if ($null -eq $line) {
$result.error = "model not found in ollama ps"
return $result
}
$raw = $line.ToString().Trim()
$result.raw = $raw
if ($raw -match '([0-9]+)%/([0-9]+)%\s+CPU/GPU') {
$result.cpuPct = [int]$Matches[1]
$result.gpuPct = [int]$Matches[2]
} elseif ($raw -match '([0-9]+)%\s+GPU') {
$result.cpuPct = 0
$result.gpuPct = [int]$Matches[1]
} else {
$result.error = "CPU/GPU split not parsed"
}
} catch {
$result.error = $_.Exception.Message
}
return $result
}
$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs
if ([string]::IsNullOrWhiteSpace($BatchId)) {
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}
$outBase = Join-Path -Path (Get-Location) -ChildPath "ollama_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
$summary = [ordered]@{
model = $Model
baseUrl = $BaseUrl
formatMode = $(if ($UseSchemaFormat) { "schema" } else { "json" })
batchId = $BatchId
gpuMonitor = [ordered]@{
enabled = [bool]$EnableGpuMonitor
sshHost = $SshHost
sshPort = $SshPort
intervalSec = $GpuMonitorIntervalSec
durationSec = $GpuMonitorSeconds
}
runs = @()
}
for ($i = 1; $i -le $Runs; $i++) {
Write-Host "Running $Model (run $i/$Runs)"
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
$gpuJob = $null
$gpuLogPath = $null
if ($EnableGpuMonitor) {
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$gpuJob = Start-Job -ScriptBlock {
param($sshExe, $target, $port, $samples, $interval, $logPath)
for ($s = 1; $s -le $samples; $s++) {
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
try {
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
Add-Content -Path $logPath -Value $out
} catch {
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
}
Start-Sleep -Seconds $interval
}
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
Start-Sleep -Seconds 1
}
$format = "json"
if ($UseSchemaFormat) {
$format = @{
type = "object"
additionalProperties = $false
required = @("selectedExpiry","expiryRationale","strategyBias","recommendedTrades","whyOthersRejected","confidenceScore")
properties = @{
selectedExpiry = @{ type = "string"; minLength = 1 }
expiryRationale = @{ type = "string"; minLength = 1 }
strategyBias = @{ type = "string"; enum = @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE") }
recommendedTrades = @{
type = "array"
minItems = 0
maxItems = 3
items = @{
type = "object"
additionalProperties = $false
required = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
properties = @{
name = @{ type = "string"; minLength = 1 }
structure = @{ type = "string"; minLength = 1 }
legs = @{
type = "array"
minItems = 1
maxItems = 4
items = @{
type = "object"
additionalProperties = $false
required = @("side","action","strike","expiry")
properties = @{
side = @{ type = "string"; enum = @("call","put") }
action = @{ type = "string"; enum = @("buy","sell") }
strike = @{ type = @("number","string") }
expiry = @{ type = "string"; minLength = 1 }
}
}
}
greekProfile = @{
type = "object"
additionalProperties = $false
required = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
properties = @{
deltaBias = @{ type = "string"; enum = @("POS","NEG","NEUTRAL") }
gammaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
thetaExposure = @{ type = "string"; enum = @("POS","NEG","LOW") }
vegaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
}
}
maxRisk = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
maxReward = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
thesisAlignment = @{ type = "string"; minLength = 1 }
invalidation = @{ type = "string"; minLength = 1 }
managementNotes = @{ type = "string" }
}
}
}
whyOthersRejected = @{
type = "array"
items = @{ type = "string" }
}
confidenceScore = @{ type = "number"; minimum = 0; maximum = 100 }
}
}
}
$options = @{
temperature = 0
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
num_ctx = $NumCtx
num_predict = $NumPredict
}
if ($NumBatch -gt 0) {
$options.num_batch = $NumBatch
}
if ($NumGpuLayers -gt 0) {
$options.num_gpu = $NumGpuLayers  # Ollama's generate API uses num_gpu for the offloaded layer count
}
$body = @{
model = $Model
prompt = $prompt
format = $format
stream = $false
options = $options
} | ConvertTo-Json -Depth 10
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/api/generate" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
} catch {
$runResult.errors = @("API error: $($_.Exception.Message)")
$summary.runs += $runResult
if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
continue
} finally {
if ($gpuJob) {
Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
Remove-Job -Job $gpuJob | Out-Null
}
}
$raw = [string]$resp.response
$jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
Set-Content -Path $jsonPath -Value $raw -Encoding ASCII
try {
$parsed = $raw | ConvertFrom-Json
$errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
if ($errors.Count -eq 0) {
$runResult.ok = $true
} else {
$runResult.errors = $errors
}
} catch {
$runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
}
if ($gpuLogPath) {
$runResult.gpuLog = $gpuLogPath
$runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
}
if ($CheckProcessor) {
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$proc = Get-ProcessorShare -SshExePath $SshExe -Target $sshTarget -Port $SshPort -ModelName $Model
$runResult.processor = $proc
if ($proc.cpuPct -ne $null) {
$runResult.gpuOnly = ($proc.cpuPct -eq 0)
}
}
$summary.runs += $runResult
}
$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII
$summary | ConvertTo-Json -Depth 6
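A typical invocation (model tag and batch id are illustrative) looks like the sketch below; the `-UseSchemaFormat` switch swaps Ollama's plain `json` format for the full JSON-schema format built above:

```powershell
# Hypothetical run against the remote Ollama endpoint configured via -BaseUrl.
.\ollama_remote_test.ps1 `
    -Model "deepseek-r1:14b" `
    -Runs 3 `
    -UseSchemaFormat `
    -BatchId "smoke"
```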

155
prompt_crwv.txt Normal file

File diff suppressed because one or more lines are too long

1
query.sql Normal file
View File

@@ -0,0 +1 @@
SELECT p.title, p.privacy FROM playlists p JOIN users u ON p.author = u.email WHERE u.email = 'rushabh';

8
requirements.txt Normal file
View File

@@ -0,0 +1,8 @@
fastapi==0.115.6
uvicorn==0.30.6
httpx==0.27.2
pytest==8.3.3
respx==0.21.1
pytest-asyncio==0.24.0
PyYAML==6.0.3
websockets==12.0

View File

@@ -0,0 +1,116 @@
import argparse
import asyncio
import json
import ssl
from typing import Any, Dict, List, Optional
import websockets
async def _rpc_call(ws_url: str, api_key: str, method: str, params: Optional[list] = None, verify_ssl: bool = False) -> Any:
ssl_ctx = None
if ws_url.startswith("wss://") and not verify_ssl:
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(ws_url, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect to TrueNAS websocket")
await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [api_key]}))
auth_resp = json.loads(await ws.recv())
if not auth_resp.get("result"):
raise RuntimeError("API key authentication failed")
req_id = 2
await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != req_id:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
async def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--ws-url", required=True)
parser.add_argument("--api-key", required=True)
parser.add_argument("--api-user")
parser.add_argument("--app-name", required=True)
parser.add_argument("--image", required=True)
parser.add_argument("--model-host-path", required=True)
parser.add_argument("--llamacpp-base-url", required=True)
parser.add_argument("--network", required=True)
parser.add_argument("--api-port", type=int, default=9091)
parser.add_argument("--ui-port", type=int, default=9092)
parser.add_argument("--verify-ssl", action="store_true")
args = parser.parse_args()
api_port = args.api_port
ui_port = args.ui_port
env = {
"PORT_A": str(api_port),
"PORT_B": str(ui_port),
"LLAMACPP_BASE_URL": args.llamacpp_base_url,
"MODEL_DIR": "/models",
"TRUENAS_WS_URL": args.ws_url,
"TRUENAS_API_KEY": args.api_key,
"TRUENAS_APP_NAME": "llamacpp",
"TRUENAS_VERIFY_SSL": "false",
}
if args.api_user:
env["TRUENAS_API_USER"] = args.api_user
compose = {
"services": {
"wrapper": {
"image": args.image,
"restart": "unless-stopped",
"ports": [
f"{api_port}:{api_port}",
f"{ui_port}:{ui_port}",
],
"environment": env,
"volumes": [
f"{args.model_host_path}:/models",
"/var/run/docker.sock:/var/run/docker.sock",
],
"networks": ["llamacpp_net"],
}
},
"networks": {
"llamacpp_net": {"external": True, "name": args.network}
},
}
create_payload = {
"custom_app": True,
"app_name": args.app_name,
"custom_compose_config": compose,
}
existing = await _rpc_call(args.ws_url, args.api_key, "app.query", [[["id", "=", args.app_name]]], args.verify_ssl)
if existing:
result = await _rpc_call(
args.ws_url,
args.api_key,
"app.update",
[args.app_name, {"custom_compose_config": compose}],
args.verify_ssl,
)
action = "updated"
else:
result = await _rpc_call(args.ws_url, args.api_key, "app.create", [create_payload], args.verify_ssl)
action = "created"
print(json.dumps({"action": action, "api_port": api_port, "ui_port": ui_port, "result": result}, indent=2))
if __name__ == "__main__":
asyncio.run(main())
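Assuming this deployment helper is saved as, say, `deploy_wrapper_app.py` (hypothetical; the filename is not shown in this diff), it could be driven from PowerShell along these lines. The API key, image tag, app name, and host model path are placeholders:

```powershell
# Hypothetical invocation; registers (or updates) the wrapper as a custom TrueNAS app.
python .\deploy_wrapper_app.py `
    --ws-url "wss://192.168.1.2/websocket" `
    --api-key $env:TRUENAS_API_KEY `
    --app-name "llamacpp-wrapper" `
    --image "llamacpp-wrapper:latest" `
    --model-host-path "/mnt/tank/models" `
    --llamacpp-base-url "http://192.168.1.2:8071" `
    --network "ix-llamacpp_default" `
    --api-port 9093 --ui-port 9094
```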

View File

@@ -0,0 +1,162 @@
import json
import os
import time
from datetime import datetime
import requests
BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9000")
UPSTREAM = os.getenv("LLAMACPP_BASE", "http://192.168.1.2:8071")
RUNS = int(os.getenv("RUNS", "100"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4"))
TIMEOUT = int(os.getenv("REQ_TIMEOUT", "300"))
def _now():
return datetime.utcnow().isoformat() + "Z"
def _get_loaded_model_id():
deadline = time.time() + 600
last_error = None
while time.time() < deadline:
try:
resp = requests.get(UPSTREAM + "/v1/models", timeout=30)
resp.raise_for_status()
data = resp.json().get("data") or []
if data:
return data[0].get("id")
last_error = "no models reported by upstream"
except Exception as exc:
last_error = str(exc)
time.sleep(5)
raise RuntimeError(f"upstream not ready: {last_error}")
def _stream_ok(resp):
got_data = False
got_done = False
for line in resp.iter_lines(decode_unicode=True):
if not line:
continue
if line.startswith("data:"):
got_data = True
if line.strip() == "data: [DONE]":
got_done = True
break
return got_data, got_done
def run_suite(model_id, idx):
results = {}
# Models
r = requests.get(BASE + "/v1/models", timeout=30)
results["models"] = r.status_code
r = requests.get(BASE + f"/v1/models/{model_id}", timeout=30)
results["model_get"] = r.status_code
# Chat completions non-stream
payload = {
"model": model_id,
"messages": [{"role": "user", "content": f"Run {idx}: say ok."}],
"max_tokens": MAX_TOKENS,
"temperature": (idx % 5) / 10.0,
}
r = requests.post(BASE + "/v1/chat/completions", json=payload, timeout=TIMEOUT)
results["chat"] = r.status_code
# Chat completions stream
payload_stream = dict(payload)
payload_stream["stream"] = True
r = requests.post(BASE + "/v1/chat/completions", json=payload_stream, stream=True, timeout=TIMEOUT)
ok_data, ok_done = _stream_ok(r)
results["chat_stream"] = r.status_code
results["chat_stream_ok"] = ok_data and ok_done
# Responses non-stream
payload_resp = {
"model": model_id,
"input": f"Run {idx}: say ok.",
"max_output_tokens": MAX_TOKENS,
}
r = requests.post(BASE + "/v1/responses", json=payload_resp, timeout=TIMEOUT)
results["responses"] = r.status_code
# Responses stream
payload_resp_stream = {
"model": model_id,
"input": f"Run {idx}: say ok.",
"stream": True,
}
r = requests.post(BASE + "/v1/responses", json=payload_resp_stream, stream=True, timeout=TIMEOUT)
ok_data, ok_done = _stream_ok(r)
results["responses_stream"] = r.status_code
results["responses_stream_ok"] = ok_data and ok_done
# Embeddings (best effort)
payload_emb = {"model": model_id, "input": f"Run {idx}"}
r = requests.post(BASE + "/v1/embeddings", json=payload_emb, timeout=TIMEOUT)
results["embeddings"] = r.status_code
# Proxy
r = requests.post(BASE + "/proxy/llamacpp/v1/chat/completions", json=payload, timeout=TIMEOUT)
results["proxy"] = r.status_code
return results
def main():
summary = {
"started_at": _now(),
"base": BASE,
"upstream": UPSTREAM,
"runs": RUNS,
"max_tokens": MAX_TOKENS,
"results": [],
}
model_id = _get_loaded_model_id()
summary["model_id"] = model_id
for i in range(1, RUNS + 1):
start = time.time()
try:
results = run_suite(model_id, i)
ok = all(
results.get(key) == 200
for key in ("models", "model_get", "chat", "chat_stream", "responses", "responses_stream", "proxy")
)
stream_ok = results.get("chat_stream_ok") and results.get("responses_stream_ok")
summary["results"].append({
"run": i,
"ok": ok and stream_ok,
"stream_ok": stream_ok,
"status": results,
"elapsed_s": round(time.time() - start, 2),
})
except Exception as exc:
summary["results"].append({
"run": i,
"ok": False,
"stream_ok": False,
"error": str(exc),
"elapsed_s": round(time.time() - start, 2),
})
print(f"Run {i}/{RUNS} done")
summary["finished_at"] = _now()
os.makedirs("reports", exist_ok=True)
out_path = os.path.join("reports", "remote_wrapper_test.json")
with open(out_path, "w", encoding="utf-8") as f:
json.dump(summary, f, indent=2)
# Print a compact summary
ok_count = sum(1 for r in summary["results"] if r.get("ok"))
print(f"OK {ok_count}/{RUNS}")
if __name__ == "__main__":
main()
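The soak test above is configured entirely through environment variables; a hedged example of running it from PowerShell (the script filename is assumed, and the wrapper port here overrides the script's own default of 9000):

```powershell
# Hypothetical run; results are written to reports\remote_wrapper_test.json.
$env:WRAPPER_BASE  = "http://192.168.1.2:9093"
$env:LLAMACPP_BASE = "http://192.168.1.2:8071"
$env:RUNS          = "10"
$env:MAX_TOKENS    = "4"
python .\remote_wrapper_test.py
```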

View File

@@ -0,0 +1,29 @@
param(
[string]$OutDocs = "reports\\llamacpp_docs.md",
[string]$OutFlags = "reports\\llamacpp_flags.txt"
)
$urls = @(
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README.md",
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README-llama-server.md",
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
)
$out = @()
foreach ($u in $urls) {
try {
$content = Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30
$out += "# Source: $u"
$out += $content.Content
} catch {
$out += "# Source: $u"
$out += "(failed to fetch)"
}
}
$out | Set-Content -Encoding UTF8 $OutDocs
$docs = Get-Content $OutDocs -Raw
$flags = [regex]::Matches($docs, "--[a-zA-Z0-9\\-]+") | ForEach-Object { $_.Value }
$flags = $flags | Sort-Object -Unique
$flags | Set-Content -Encoding UTF8 $OutFlags

61
tests/conftest.py Normal file
View File

@@ -0,0 +1,61 @@
import json
import os
from pathlib import Path
import pytest
from fastapi.testclient import TestClient
import respx
from app.api_app import create_api_app
from app.ui_app import create_ui_app
@pytest.fixture()
def agents_config(tmp_path: Path) -> Path:
data = {
"image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
"container_name": "ix-llamacpp-llamacpp-1",
"host_port": 8071,
"container_port": 8080,
"web_ui_url": "http://0.0.0.0:8071/",
"model_host_path": str(tmp_path),
"model_container_path": str(tmp_path),
"models": [],
"network": "ix-llamacpp_default",
"subnets": ["172.16.18.0/24"],
"gpu_count": 2,
"gpu_name": "NVIDIA RTX 5060 Ti",
}
path = tmp_path / "agents_config.json"
path.write_text(json.dumps(data), encoding="utf-8")
return path
@pytest.fixture()
def model_dir(tmp_path: Path) -> Path:
(tmp_path / "model-a.gguf").write_text("x", encoding="utf-8")
(tmp_path / "model-b.gguf").write_text("y", encoding="utf-8")
return tmp_path
@pytest.fixture()
def api_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
monkeypatch.setenv("MODEL_DIR", str(model_dir))
monkeypatch.setenv("LLAMACPP_BASE_URL", "http://llama.test")
app = create_api_app()
return TestClient(app)
@pytest.fixture()
def ui_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
monkeypatch.setenv("MODEL_DIR", str(model_dir))
app = create_ui_app()
return TestClient(app)
@pytest.fixture()
def respx_mock():
with respx.mock(assert_all_called=False) as mock:
yield mock
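The fixtures above stub out both the agents config and the upstream llama.cpp server (via respx), so the unit-test suite can run fully offline; a minimal sketch, assuming Python and pip are on PATH:

```powershell
# Run from the wrapper project root.
pip install -r requirements.txt
python -m pytest tests -q
```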

View File

@@ -0,0 +1,77 @@
import json
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_non_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
respx_mock.post("http://llama.test/v1/chat/completions").mock(
return_value=httpx.Response(200, json={"id": f"chatcmpl-{case}", "choices": [{"message": {"content": "ok"}}]})
)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": f"hello {case}"}],
"temperature": (case % 10) / 10,
}
resp = api_client.post("/v1/chat/completions", json=payload)
assert resp.status_code == 200
data = resp.json()
assert data["choices"][0]["message"]["content"] == "ok"
@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def stream_response(request):
content = b"data: {\"id\": \"chunk\"}\n\n"
return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": f"hello {case}"}],
"stream": True,
}
with api_client.stream("POST", "/v1/chat/completions", json=payload) as resp:
assert resp.status_code == 200
body = b"".join(resp.iter_bytes())
assert b"data:" in body
def test_chat_completions_tools_normalize(api_client, respx_mock):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def handler(request):
data = request.json()
tools = data.get("tools") or []
assert tools
assert tools[0].get("function", {}).get("name") == "format_final_json_response"
return httpx.Response(200, json={"id": "chatcmpl-tools", "choices": [{"message": {"content": "ok"}}]})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=handler)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": "hello"}],
"tools": [
{
"type": "function",
"name": "format_final_json_response",
"parameters": {"type": "object"},
}
],
"tool_choice": {"type": "function", "name": "format_final_json_response"},
}
resp = api_client.post("/v1/chat/completions", json=payload)
assert resp.status_code == 200

14
tests/test_embeddings.py Normal file
View File

@@ -0,0 +1,14 @@
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_embeddings(api_client, respx_mock, case):
respx_mock.post("http://llama.test/v1/embeddings").mock(
return_value=httpx.Response(200, json={"data": [{"embedding": [0.1, 0.2]}]})
)
payload = {"model": "model-a.gguf", "input": f"text-{case}"}
resp = api_client.post("/v1/embeddings", json=payload)
assert resp.status_code == 200
data = resp.json()
assert "data" in data

24
tests/test_models.py Normal file
View File

@@ -0,0 +1,24 @@
import pytest
@pytest.mark.parametrize("case", list(range(120)))
def test_list_models_cases(api_client, case):
resp = api_client.get("/v1/models", headers={"x-case": str(case)})
assert resp.status_code == 200
payload = resp.json()
assert payload["object"] == "list"
assert isinstance(payload["data"], list)
@pytest.mark.parametrize("model_id", [f"model-a.gguf" for _ in range(120)])
def test_get_model_ok(api_client, model_id):
resp = api_client.get(f"/v1/models/{model_id}")
assert resp.status_code == 200
payload = resp.json()
assert payload["id"] == model_id
@pytest.mark.parametrize("model_id", [f"missing-{i}" for i in range(120)])
def test_get_model_not_found(api_client, model_id):
resp = api_client.get(f"/v1/models/{model_id}")
assert resp.status_code == 404

12
tests/test_proxy.py Normal file
View File

@@ -0,0 +1,12 @@
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_proxy_passthrough(api_client, respx_mock, case):
respx_mock.post("http://llama.test/test/path").mock(
return_value=httpx.Response(200, content=f"ok-{case}".encode())
)
resp = api_client.post("/proxy/llamacpp/test/path", content=b"hello")
assert resp.status_code == 200
assert resp.content.startswith(b"ok-")

View File

@@ -0,0 +1,283 @@
import asyncio
import json
import os
import ssl
import time
from typing import Dict, List
import pytest
import requests
import websockets
WRAPPER_BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9093")
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
TRUENAS_WS_URL = os.getenv("TRUENAS_WS_URL", "wss://192.168.1.2/websocket")
TRUENAS_API_KEY = os.getenv("TRUENAS_API_KEY", "")
TRUENAS_APP_NAME = os.getenv("TRUENAS_APP_NAME", "llamacpp")
MODEL_REQUEST = os.getenv("MODEL_REQUEST", "")
async def _rpc_call(method: str, params: List | None = None):
if not TRUENAS_API_KEY:
pytest.skip("TRUENAS_API_KEY not set")
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(TRUENAS_WS_URL, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect")
await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [TRUENAS_API_KEY]}))
auth = json.loads(await ws.recv())
if not auth.get("result"):
raise RuntimeError("auth failed")
await ws.send(json.dumps({"id": 2, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != 2:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
def _get_models() -> List[str]:
_wait_for_http(WRAPPER_BASE + "/health")
resp = requests.get(WRAPPER_BASE + "/v1/models", timeout=30)
resp.raise_for_status()
data = resp.json().get("data") or []
return [m.get("id") for m in data if m.get("id")]
def _assert_chat_ok(resp_json: Dict) -> str:
choices = resp_json.get("choices") or []
assert choices, "no choices"
message = choices[0].get("message") or {}
text = message.get("content") or ""
assert text.strip(), "empty content"
return text
def _wait_for_http(url: str, timeout_s: float = 90) -> None:
deadline = time.time() + timeout_s
last_err = None
while time.time() < deadline:
try:
resp = requests.get(url, timeout=5)
if resp.status_code == 200:
return
last_err = f"status {resp.status_code}"
except Exception as exc:
last_err = str(exc)
time.sleep(2)
raise RuntimeError(f"service not ready: {url} ({last_err})")
def _post_with_retry(url: str, payload: Dict, timeout_s: float = 300, retries: int = 6, delay_s: float = 5.0):
last = None
for _ in range(retries):
try:
resp = requests.post(url, json=payload, timeout=timeout_s)
if resp.status_code == 200:
return resp
last = resp
except requests.exceptions.RequestException as exc:
last = exc
time.sleep(delay_s)
if isinstance(last, Exception):
raise last
return last
@pytest.mark.asyncio
async def test_active_model_and_multi_gpu_flags():
cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
command = cfg.get("command") or []
assert "--model" in command
assert "--tensor-split" in command
split_idx = command.index("--tensor-split") + 1
split = command[split_idx]
assert "," in split, f"tensor-split missing commas: {split}"
assert "--split-mode" in command
def test_models_listed():
models = _get_models()
assert models, "no models discovered"
def test_chat_completions_switch_and_prompts():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK."}],
"max_tokens": 12,
"temperature": 0,
}
for _ in range(3):
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_tools_flat_format():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK and do not call tools."}],
"tools": [
{
"type": "function",
"name": "format_final_json_response",
"description": "format output",
"parameters": {
"type": "object",
"properties": {"ok": {"type": "boolean"}},
"required": ["ok"],
},
}
],
"max_tokens": 12,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_functions_payload_normalized():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK and do not call tools."}],
"functions": [
{
"name": "format_final_json_response",
"description": "format output",
"parameters": {
"type": "object",
"properties": {"ok": {"type": "boolean"}},
"required": ["ok"],
},
}
],
"max_tokens": 12,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_return_format_json():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Return JSON with key ok true."}],
"return_format": "json",
"max_tokens": 32,
"temperature": 0,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
text = _assert_chat_ok(resp.json())
parsed = json.loads(text)
assert isinstance(parsed, dict)
def test_responses_endpoint():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"input": "Say OK.",
"max_output_tokens": 16,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/responses", payload)
assert resp.status_code == 200
output = resp.json().get("output") or []
assert output, "responses output empty"
content = output[0].get("content") or []
text = content[0].get("text") if content else ""
assert text and text.strip()
@pytest.mark.asyncio
async def test_model_switch_applied_to_truenas():
models = _get_models()
assert models, "no models"
target = MODEL_REQUEST or models[0]
assert target in models, f"MODEL_REQUEST not found: {target}"
resp = requests.post(UI_BASE + "/ui/api/switch-model", json={"model_id": target, "warmup_prompt": "warmup"}, timeout=600)
assert resp.status_code == 200
cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
command = cfg.get("command") or []
assert "--model" in command
model_path = command[command.index("--model") + 1]
assert model_path.endswith(target)
def test_invalid_model_rejected():
models = _get_models()
assert models, "no models"
payload = {
"model": "modelx-q8:4b",
"messages": [{"role": "user", "content": "Say OK."}],
"max_tokens": 8,
"temperature": 0,
}
resp = requests.post(WRAPPER_BASE + "/v1/chat/completions", json=payload, timeout=60)
assert resp.status_code == 404
def test_llamacpp_logs_streaming():
logs = ""
for _ in range(5):
try:
resp = requests.get(UI_BASE + "/ui/api/llamacpp-logs", timeout=10)
if resp.status_code == 200:
logs = resp.json().get("logs") or ""
if logs.strip():
break
except requests.exceptions.ReadTimeout:
pass
time.sleep(2)
assert logs.strip(), "no logs returned"
# Force a log line before streaming.
try:
requests.get(WRAPPER_BASE + "/proxy/llamacpp/health", timeout=5)
except Exception:
pass
# Stream endpoint may not emit immediately, so validate that the endpoint responds.
with requests.get(UI_BASE + "/ui/api/llamacpp-logs/stream", stream=True, timeout=(5, 5)) as resp:
assert resp.status_code == 200
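Unlike the offline unit tests, this module exercises the live wrapper, UI, and TrueNAS middleware, so it expects the environment below before pytest is invoked (the API key and test-file name are placeholders; the TrueNAS-dependent tests skip themselves when `TRUENAS_API_KEY` is empty):

```powershell
# Hypothetical setup for a live integration run.
$env:WRAPPER_BASE    = "http://192.168.1.2:9093"
$env:UI_BASE         = "http://192.168.1.2:9094"
$env:TRUENAS_WS_URL  = "wss://192.168.1.2/websocket"
$env:TRUENAS_API_KEY = "<api key>"
# $env:MODEL_REQUEST = "model-a.gguf"   # optional: pin a specific model id
python -m pytest tests\test_remote_integration.py -q
```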

55
tests/test_responses.py Normal file
View File

@@ -0,0 +1,55 @@
import json
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_responses_non_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
respx_mock.post("http://llama.test/v1/chat/completions").mock(
return_value=httpx.Response(200, json={"choices": [{"message": {"content": f"reply-{case}"}}]})
)
payload = {
"model": "model-a.gguf",
"input": f"prompt-{case}",
"max_output_tokens": 32,
}
resp = api_client.post("/v1/responses", json=payload)
assert resp.status_code == 200
data = resp.json()
assert data["object"] == "response"
assert data["output"][0]["content"][0]["text"].startswith("reply-")
@pytest.mark.parametrize("case", list(range(120)))
def test_responses_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def stream_response(request):
payload = {
"id": "chunk",
"object": "chat.completion.chunk",
"choices": [{"delta": {"content": f"hi-{case}"}, "index": 0, "finish_reason": None}],
}
content = f"data: {json.dumps(payload)}\n\n".encode()
content += b"data: [DONE]\n\n"
return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)
payload = {
"model": "model-a.gguf",
"input": f"prompt-{case}",
"stream": True,
}
with api_client.stream("POST", "/v1/responses", json=payload) as resp:
assert resp.status_code == 200
body = b"".join(resp.iter_bytes())
assert b"event: response.created" in body
assert b"event: response.output_text.delta" in body
assert b"event: response.completed" in body

View File

@@ -0,0 +1,54 @@
import json
import pytest
from app.truenas_middleware import TrueNASConfig, switch_model
@pytest.mark.asyncio
@pytest.mark.parametrize("case", list(range(120)))
async def test_switch_model_updates_command(monkeypatch, case):
compose = {
"services": {
"llamacpp": {
"command": [
"--model",
"/models/old.gguf",
"--ctx-size",
"2048",
]
}
}
}
captured = {}
async def fake_rpc_call(cfg, method, params=None):
if method == "app.config":
return {"custom_compose_config": compose}
if method == "app.update":
captured["payload"] = params[1]
return {"state": "RUNNING"}
raise AssertionError(f"unexpected method {method}")
monkeypatch.setattr("app.truenas_middleware._rpc_call", fake_rpc_call)
cfg = TrueNASConfig(
ws_url="ws://truenas.test/websocket",
api_key="key",
api_user=None,
app_name="llamacpp",
verify_ssl=False,
)
await switch_model(
cfg,
f"/models/new-{case}.gguf",
{"n_gpu_layers": "999"},
"--flash-attn on",
)
assert "custom_compose_config" in captured["payload"]
cmd = captured["payload"]["custom_compose_config"]["services"]["llamacpp"]["command"]
assert "--model" in cmd
idx = cmd.index("--model")
assert cmd[idx + 1].endswith(f"new-{case}.gguf")

48
tests/test_ui.py Normal file
View File

@@ -0,0 +1,48 @@
import json
import os
import time
import pytest
import requests
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
def _wait_for_http(url: str, timeout_s: float = 90) -> None:
deadline = time.time() + timeout_s
last_err = None
while time.time() < deadline:
try:
resp = requests.get(url, timeout=5)
if resp.status_code == 200:
return
last_err = f"status {resp.status_code}"
except Exception as exc:
last_err = str(exc)
time.sleep(2)
raise RuntimeError(f"service not ready: {url} ({last_err})")
def test_ui_index_contains_expected_elements():
_wait_for_http(UI_BASE + "/health")
resp = requests.get(UI_BASE + "/", timeout=30)
assert resp.status_code == 200
html = resp.text
assert "Model Manager" in html
assert "id=\"download-form\"" in html
assert "id=\"models-list\"" in html
assert "id=\"logs-output\"" in html
assert "id=\"theme-toggle\"" in html
def test_ui_assets_available():
resp = requests.get(UI_BASE + "/ui/styles.css", timeout=30)
assert resp.status_code == 200
css = resp.text
assert "data-theme" in css
resp = requests.get(UI_BASE + "/ui/app.js", timeout=30)
assert resp.status_code == 200
js = resp.text
assert "themeToggle" in js
assert "localStorage" in js
assert "logs-output" in js

1
tmp_channels_cols.sql Normal file
View File

@@ -0,0 +1 @@
SELECT column_name, data_type FROM information_schema.columns WHERE table_name='channels' ORDER BY ordinal_position;

1
tmp_pref_type.sql Normal file
View File

@@ -0,0 +1 @@
SELECT data_type FROM information_schema.columns WHERE table_name='users' AND column_name='preferences';

View File

@@ -0,0 +1 @@
UPDATE users SET preferences = (jsonb_set(preferences::jsonb, '{max_results}', '200'::jsonb, true))::text WHERE email='rushabh';

56
trades_company_stock.txt Normal file
View File

@@ -0,0 +1,56 @@
You are a senior quantitative options trader (index/ETF options across regimes; also liquid single-name options and macro-sensitive metal ETFs), specializing in volatility, structure selection, and risk asymmetry. Decisive, skeptical, profit-focused.
You are given:
- A validated market thesis (authoritative): multi-timeframe technicals, regime, volatility context, news impact.
- Pre-processed options chains for three expiries (short / medium / extended) with liquidity-filtered contracts, ATM/delta anchors, delta ladders, and a liquid execution set.
- All pricing, greeks, spreads, and liquidity metrics required for execution-quality decisions.
Assume:
- Data is correct and cleaned.
- You must NOT re-analyze technicals or news; the thesis is authoritative.
- Your job is to convert thesis + surface into executable options trades.
Objective:
- Select the best expiry and propose 1-3 high-quality options trades that align with thesis bias/regime, exploit volatility characteristics (gamma/theta/vega fit), are liquid/fillable/risk-defined, and include clear invalidation logic.
- If no trade offers favorable risk/reward: strategyBias=NO_TRADE and explain why.
How to decide:
1) Compare expiries: match time-to-playout vs confidence/uncertainty; match vol regime (expansion vs decay); reject poor liquidity density; reject misaligned vega/theta; avoid overpaying for time/vol.
2) Choose structure class (explicitly justify vs alternatives): directional debit (single/vertical), volatility (straddle/strangle), defined-risk premium selling only if the regime supports it.
3) Select strikes ONLY from provided data (ATM anchor, delta ladder, liquidSet). Prefer tight spreads, meaningful volume & OI, and greeks that express the thesis.
4) Risk discipline: every trade must include max risk, what must go right, and what breaks the trade (invalidation).
Optional tools (use only when they materially improve decision quality; otherwise do not call):
- MarketData Options Chain (expiry-specific): only if provided expiries do not sufficiently match the thesis horizon, or liquidity/skew is materially better in a nearby expiry not already supplied. Choose an explicit expiry date. Use returned data only for strike selection and liquidity validation. Do not re-fetch already provided expiries unless validating anomalies.
- Fear & Greed Index (FGI): only for index/ETF/macro-sensitive underlyings (e.g., SPX, NDX, IWM, SLV). Contextual only (risk appetite / convexity vs tempered), not a primary signal.
Hard constraints:
- Do NOT invent strikes, expiries, or prices.
- Do NOT suggest illiquid contracts.
- Do NOT recommend naked risk.
- Do NOT hedge unless justified.
- Do NOT repeat raw data back.
Return ONLY valid JSON in exactly this shape:
{
"selectedExpiry": "YYYY-MM-DD",
"expiryRationale": "Why this expiry dominates the others given thesis + vol + liquidity",
"strategyBias": "DIRECTIONAL|VOLATILITY|NEUTRAL|NO_TRADE",
"recommendedTrades": [
{
"name": "Short descriptive name",
"structure": "e.g. Long Call, Call Debit Spread, Long Strangle",
"legs": [{"side":"call|put","action":"buy|sell","strike":0,"expiry":"YYYY-MM-DD"}],
"greekProfile": {"deltaBias":"POS|NEG|NEUTRAL","gammaExposure":"HIGH|MED|LOW","thetaExposure":"POS|NEG|LOW","vegaExposure":"HIGH|MED|LOW"},
"maxRisk": "Defined numeric or qualitative",
"maxReward": "Defined numeric or qualitative",
"thesisAlignment": "Exactly how this trade expresses the thesis",
"invalidation": "Clear condition where trade is wrong",
"managementNotes": "Optional: scale, take-profit, time stop"
}
],
"whyOthersRejected": ["Why other expiries or strategy types were inferior"],
"confidenceScore": 0
}
Final note: optimize for repeatable profitability under uncertainty. If conditions are marginal, say NO_TRADE with conviction.