Initial commit

This commit is contained in:
Rushabh Gosar
2026-01-07 16:54:39 -08:00
commit 5d1a0ee72b
53 changed files with 9885 additions and 0 deletions

142
.gitignore vendored Normal file

@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# Project-specific
/inventory_raw/
/llamacpp_runs_remote/
/ollama_runs_remote/
/reports/
/tmp/
*.log
/C:/Users/Rushabh/.gemini/tmp/bff31f86566324f77927540d72088ce62479fd0563c197318c9f0594af2e69ee/
# OS-generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

4206
AGENTS.full.md Normal file

File diff suppressed because it is too large

20
AGENTS.md Normal file

@@ -0,0 +1,20 @@
# AGENTS (compressed)
This is the compact working context. For the full historical inventory and detailed snapshots, see `AGENTS.full.md` and `inventory_raw/`.
## Access + basics
- SSH: `ssh -p 55555 rushabh@192.168.1.2`
- Sudo: `sudo -n true`
- TrueNAS UI: `http://192.168.1.2`
## Full context pointers
- Full inventory snapshot and extra system details: `AGENTS.full.md`
- Raw captured data: `inventory_raw/`
- Documentation notes: `docs/*`
## Projects
- n8n Thesis Builder checkpoint (2026-01-04): `docs/n8n-thesis-builder-checkpoint-20260104.md`
- llamaCpp wrapper: A Python-based OpenAI-compatible API wrapper and model manager for the TrueNAS llama.cpp app.
- Location: `llamaCpp.Wrapper.app/`
- API Port: `9093`
- UI Port: `9094`
- See the `README.md` inside the folder for full details.

69
README.md Normal file

@@ -0,0 +1,69 @@
# Codex TrueNAS Helper
This project is a collection of scripts, configurations, and applications to manage and enhance a TrueNAS SCALE server, with a special focus on running and interacting with large language models (LLMs) like those powered by `llama.cpp` and `Ollama`.
## Features
* **`llama.cpp` Wrapper:** A sophisticated wrapper for the `llama.cpp` TrueNAS application that provides:
* An OpenAI-compatible API for chat completions and embeddings.
* A web-based UI for managing models (listing, downloading).
* The ability to hot-swap models without restarting the `llama.cpp` container by interacting with the TrueNAS API.
* **TrueNAS Inventory:** A snapshot of the TrueNAS server's configuration, including hardware, storage, networking, and running applications.
* **Automation Scripts:** A set of PowerShell and Python scripts for tasks like deploying the wrapper and testing remote endpoints.
* **LLM Integration:** Tools and configurations for working with various LLMs.
## Directory Structure
* `AGENTS.md` & `AGENTS.full.md`: These files contain detailed information and a complete inventory of the TrueNAS server's configuration.
* `llamaCpp.Wrapper.app/`: A Python-based application that wraps the `llama.cpp` TrueNAS app with an OpenAI-compatible API and a model management UI.
* `scripts/`: Contains various scripts for deployment, testing, and other tasks.
* `inventory_raw/`: Raw data dumps from the TrueNAS server, used to generate the inventory in `AGENTS.full.md`.
* `reports/`: Contains generated reports, test results, and other artifacts.
* `llamacpp_runs_remote/` & `ollama_runs_remote/`: Logs and results from running LLMs.
* `modelfiles/`: Modelfiles for different language models.
* `tests/`: Python tests for the `llamaCpp.Wrapper.app`.
## `llamaCpp.Wrapper.app`
This is the core component of the project: a Python application that proxies requests to the `llama.cpp` server running on TrueNAS while layering extra features on top.
### Running Locally
1. Install the required Python packages:
```bash
pip install -r llamaCpp.Wrapper.app/requirements.txt
```
2. Run the application from inside the wrapper directory (the folder name contains dots, so `python -m llamaCpp.Wrapper.app.run` is not an importable module path):
```bash
cd llamaCpp.Wrapper.app
python -m app.run
```
This will start two web servers: one for the API (default port 9093) and one for the UI (default port 9094).
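To confirm both servers came up, a quick check (a sketch, assuming the default ports and `httpx`, which the wrapper already depends on):
```python
import httpx

# API health endpoint (FastAPI app on PORT_A); reports base_url, model_dir and GPU info.
print(httpx.get("http://127.0.0.1:9093/health", timeout=10).json())

# UI root (PORT_B); a 200 means the model manager page is being served.
print(httpx.get("http://127.0.0.1:9094/", timeout=10).status_code)
```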
### Docker (TrueNAS)
The wrapper can be run as a Docker container on TrueNAS. See the `llamaCpp.Wrapper.app/README.md` file for a detailed example of the `docker run` command. The wrapper needs to be configured with the appropriate environment variables to connect to the TrueNAS API and the `llama.cpp` container.
### Model Hot-Swapping
The wrapper can switch models in the `llama.cpp` server by updating the application's command via the TrueNAS API. This is a powerful feature that allows for dynamic model management without manual intervention.
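Because the wrapper resolves the `model` field on every request, a chat completion naming a different model file is enough to trigger a swap (a sketch; host, port, and model id come from the notes in this repo and may differ on another setup):
```python
import httpx

payload = {
    # Must exactly match an id returned by GET /v1/models (exact-match resolution).
    "model": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "Reply with the word 'ready'."}],
}

# The wrapper updates the llama.cpp app command via the TrueNAS API, waits for the
# new model to load, runs a warmup prompt, then proxies this request.
resp = httpx.post("http://192.168.1.2:9093/v1/chat/completions", json=payload, timeout=600)
print(resp.json()["choices"][0]["message"]["content"])
```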
## Scripts
* `deploy_truenas_wrapper.py`: A Python script to deploy the `llamaCpp.Wrapper.app` to TrueNAS.
* `remote_wrapper_test.py`: A Python script for testing the remote wrapper.
* `update_llamacpp_flags.ps1`: A PowerShell script to update the `llama.cpp` flags.
* `llamacpp_remote_test.ps1` & `ollama_remote_test.ps1`: PowerShell scripts for testing `llama.cpp` and `Ollama` remote endpoints.
## Getting Started
1. **Explore the Inventory:** Start by reading `AGENTS.md` and `AGENTS.full.md` to understand the TrueNAS server's configuration.
2. **Set up the Wrapper:** If you want to use the `llama.cpp` wrapper, follow the instructions in `llamaCpp.Wrapper.app/README.md` to run it either locally or as a Docker container on TrueNAS.
3. **Use the Scripts:** The scripts in the `scripts` directory can be used to automate various tasks.
## Development
The `llamaCpp.Wrapper.app` has a suite of tests located in the `tests/` directory. To run the tests, use `pytest`:
```bash
pytest
```


@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes
Last updated: 2026-01-04
## Purpose
OpenAI-compatible wrapper for the existing `llamacpp` app with a model manager UI,
model switching, and parameter management via TrueNAS middleware.
## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`
## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`
## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows active model and supports switching with verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via Docker socket fallback when TrueNAS log APIs are unavailable.
## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to JSON-only system prompt if llama.cpp rejects `response_format`).
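Illustrative before/after of the tool normalization (the tool itself is invented; the shapes follow `app/openai_translate.py`):
```python
# Flat tool as sent by some n8n/OpenWebUI nodes:
flat = {"type": "function", "name": "get_quote",
        "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}}}

# Equivalent OpenAI-format tool forwarded to llama.cpp:
normalized = {"type": "function",
              "function": {"name": "get_quote",
                           "parameters": {"type": "object",
                                          "properties": {"symbol": {"type": "string"}}}}}
```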
## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.
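For friendlier names, an explicit alias can map onto the exact file name (a sketch; the alias is a placeholder and would normally come from the `MODEL_ALIASES` env var):
```python
from app.model_registry import resolve_model

aliases = {"llama3-8b": "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"}  # e.g. parsed from MODEL_ALIASES
model = resolve_model("/models", "llama3-8b", aliases)
print(model.model_id if model else "no exact match -> wrapper returns 404")
```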
## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)
## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies switch by sending a minimal prompt.
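Both UI endpoints can also be driven directly (a sketch; host, port, and model id are placeholders):
```python
import httpx

UI = "http://192.168.1.2:9094"

# Read the active model plus current llama.cpp flags and extra args.
print(httpx.get(f"{UI}/ui/api/llamacpp-config", timeout=30).json())

# Ask the wrapper to switch models; this redeploys llama.cpp, so allow a long timeout.
resp = httpx.post(f"{UI}/ui/api/switch-model",
                  json={"model_id": "Qwen2.5-7B-Instruct-Q4_K_M.gguf"}, timeout=600)
print(resp.status_code, resp.text)
```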
## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
- `WRAPPER_BASE=http://192.168.1.2:9093`
- `UI_BASE=http://192.168.1.2:9094`
- `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
- `TRUENAS_API_KEY=...`
- `MODEL_REQUEST=<exact model id from /v1/models>`
## Runtime Validation (2026-01-04)
- Fixed llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect transient `503 Loading model` during warmup; retry after load completes.
- Verified a switch to `yarn-llama-2-13b-64k.Q4_K_M.gguf` from the wrapper; a tool-enabled chat request completed after the model loaded (~107s).


@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)
## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).
## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of AI output.
- `HTTP Request13` query forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted on the `Aggregate2` -> `AI - Thesis Builder` connection.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.
## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (example `symbol` and `thesis` fields full of junk tokens).
- `AI - Technicals Auditor` output looks valid JSON (see sample below).
- `Aggregate2` payload size ~6.7KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response format handling.
### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`
### Sample `AI - Technicals Auditor` output (valid JSON)
```
{
"output": {
"timeframes": [
{ "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
{ "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
],
"optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
"dataQualityScore": 0.5,
"error": "INSUFFICIENT_DATA"
}
}
```
## Open issues
- Thesis Builder garbling persists even with small prompt; likely model/wrapper output issue.
- Need to confirm whether llama.cpp wrapper is corrupting output or model is misconfigured for JSON-only output.
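One way to narrow this down is to send the same JSON-mode prompt to the wrapper (`:9093`) and to llama.cpp's native port (`:8071`) and compare the raw outputs (a sketch; the model id is a placeholder):
```python
import httpx

payload = {
    "model": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",  # placeholder: use the currently loaded model
    "messages": [{"role": "user", "content": 'Return {"ok": true} as JSON.'}],
    "response_format": {"type": "json_object"},
}

for name, base in [("wrapper", "http://192.168.1.2:9093"), ("llama.cpp", "http://192.168.1.2:8071")]:
    resp = httpx.post(f"{base}/v1/chat/completions", json=payload, timeout=300)
    data = resp.json()
    text = data.get("choices", [{}])[0].get("message", {}).get("content", data)
    print(name, resp.status_code, str(text)[:200])
```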
## Useful commands
- Last runs:
`SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
`sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`


@@ -0,0 +1,16 @@
FROM python:3.11-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt
COPY app /app/app
COPY trades_company_stock.txt /app/trades_company_stock.txt
EXPOSE 8000 8001
CMD ["python", "-m", "app.run"]


@@ -0,0 +1,134 @@
# llama.cpp OpenAI-Compatible Wrapper
This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` (build-time) into `app/agents_config.json`.
## Current Agents-Derived Details
- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from AGENTS snapshot)
Regenerate the derived config after updating `AGENTS.md`:
```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```
## Running Locally
```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```
Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to container name or localhost based on agents config)
- Model dir: `MODEL_DIR=/models`
## Docker (TrueNAS)
Example (join existing llama.cpp network and mount models):
```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
--network ix-llamacpp_default \
-v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
-v /var/run/docker.sock:/var/run/docker.sock \
-e LLAMACPP_RESTART_METHOD=docker \
-e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
-e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
-e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
-e TRUENAS_API_KEY=YOUR_KEY \
-e TRUENAS_API_USER=YOUR_USER \
-e TRUENAS_APP_NAME=llamacpp \
-e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
-e PORT_A=9093 -e PORT_B=9094 \
llama-cpp-openai-wrapper:latest
```
## Model Hot-Swap / Restart Hooks
This wrapper does not modify llama.cpp by default. To enable hot-swap/restart for new models or model selection, provide one of the restart methods below (a sketch of an `http` restart helper follows the list):
- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`
or
- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`
or (requires mounting docker socket)
- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`
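For the `http` method the wrapper POSTs a JSON payload (model id, model path, GPU count, flag overrides) to `LLAMACPP_RESTART_URL`. A minimal sketch of such a helper; it is not part of this repo and only illustrates the contract:
```python
from fastapi import FastAPI
import subprocess

app = FastAPI()

@app.post("/restart")
async def restart(payload: dict):
    # The wrapper sends model_id, model_path, gpu_count, llamacpp_args and
    # llamacpp_extra_args; a real helper would rewrite the llama.cpp launch
    # command from these values before restarting the container.
    subprocess.run(["docker", "restart", "ix-llamacpp-llamacpp-1"], check=True)
    return {"ok": True, "model_id": payload.get("model_id")}
```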
## Model switching via TrueNAS middleware (P0)
Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:
```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```
The wrapper preserves existing flags in the compose command and only updates `--model`, adding any GPU split flags from `LLAMACPP_*` that are not already present.
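Illustrative before/after of the command rewrite on a 2-GPU host (values are examples; the merge logic lives in `app/truenas_middleware.py` and `app/config.py`):
```python
# Command currently stored in the llamacpp app's compose config:
before = ["--model", "/models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
          "--ctx-size", "8192",
          "--device", "CUDA0"]          # always stripped: crashes llama.cpp on this host

# After switching models with no tensor split configured yet:
after = ["--model", "/models/Qwen2.5-7B-Instruct-Q4_K_M.gguf",   # only --model is rewritten
         "--ctx-size", "8192",                                    # existing flags preserved
         "--tensor-split", "0.50,0.50",                           # added from GPU count
         "--split-mode", "layer"]
```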
Optional arguments passed to restart handlers:
```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```
You can also pass arbitrary llama.cpp flags (space-separated) via:
```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```
## Model Manager UI
Open `http://HOST:PORT_B/`.
Features:
- List existing models
- Download models via URL
- Live progress + cancel
## Testing
Tests are parameterized with 100+ cases per endpoint.
```bash
pytest -q
```
## llama.cpp flags reference
Scraped from upstream docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`.
```
pwsh scripts/update_llamacpp_flags.ps1
```


@@ -0,0 +1 @@


@@ -0,0 +1,22 @@
{
"image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
"container_name": "ix-llamacpp-llamacpp-1",
"host_port": 8071,
"container_port": 8080,
"web_ui_url": "http://0.0.0.0:8071/",
"model_host_path": "/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models",
"model_container_path": "/models",
"models": [
"GPT-OSS",
"Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
"openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
"Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
],
"network": "ix-llamacpp_default",
"subnets": [
"172.16.18.0/24",
"fdb7:86ec:b1dd:11::/64"
],
"gpu_count": 2,
"gpu_name": "NVIDIA RTX 5060 Ti, 16 GB each (per `nvidia-smi` in prior runs)."
}


@@ -0,0 +1,119 @@
import json
import re
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Optional
APP_HEADER_RE = re.compile(r"^### App: (?P<name>.+?)\s*$")
IMAGE_RE = re.compile(r"image=(?P<image>[^\s]+)")
PORT_MAP_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host>\d+|0\.0\.0\.0:(?P<host_ip_port>\d+))")
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")
NETWORK_RE = re.compile(r"- (?P<name>ix-[^\s]+)_default")
SUBNET_RE = re.compile(r"subnets=\[(?P<subnets>[^\]]+)\]")
MODELS_RE = re.compile(r"Models in /models: (?P<models>.+)$")
PORTAL_RE = re.compile(r"Portals: \{\'Web UI\': \'(?P<url>[^\']+)\'\}")
GPU_RE = re.compile(r"GPUs:\s*(?P<count>\d+)x\s*(?P<name>.+)$")
CONTAINER_NAME_RE = re.compile(r"^(?P<name>ix-llamacpp-[^\s]+)")
@dataclass
class LlamacppConfig:
image: Optional[str] = None
container_name: Optional[str] = None
host_port: Optional[int] = None
container_port: Optional[int] = None
web_ui_url: Optional[str] = None
model_host_path: Optional[str] = None
model_container_path: Optional[str] = None
models: List[str] = None
network: Optional[str] = None
subnets: List[str] = None
gpu_count: Optional[int] = None
gpu_name: Optional[str] = None
def _find_section(lines: List[str], app_name: str) -> List[str]:
start = None
for i, line in enumerate(lines):
m = APP_HEADER_RE.match(line.strip())
if m and m.group("name") == app_name:
start = i
break
if start is None:
return []
for j in range(start + 1, len(lines)):
if APP_HEADER_RE.match(lines[j].strip()):
return lines[start:j]
return lines[start:]
def parse_agents(path: Path) -> LlamacppConfig:
text = path.read_text(encoding="utf-8", errors="ignore")
lines = text.splitlines()
section = _find_section(lines, "llamacpp")
cfg = LlamacppConfig(models=[], subnets=[])
for line in section:
if cfg.image is None:
m = IMAGE_RE.search(line)
if m:
cfg.image = m.group("image")
if cfg.web_ui_url is None:
m = PORTAL_RE.search(line)
if m:
cfg.web_ui_url = m.group("url")
if cfg.container_port is None or cfg.host_port is None:
m = PORT_LINE_RE.search(line)
if m:
cfg.container_port = int(m.group("container"))
cfg.host_port = int(m.group("host"))
if cfg.model_host_path is None or cfg.model_container_path is None:
m = VOLUME_RE.search(line)
if m and "/models" in m.group("container"):
cfg.model_host_path = m.group("host")
cfg.model_container_path = m.group("container")
if cfg.network is None:
m = NETWORK_RE.search(line)
if m:
cfg.network = f"{m.group('name')}_default"
if "subnets=" in line:
m = SUBNET_RE.search(line)
if m:
subnets_raw = m.group("subnets")
subnets = [s.strip().strip("'") for s in subnets_raw.split(",")]
cfg.subnets.extend([s for s in subnets if s])
if "Models in /models:" in line:
m = MODELS_RE.search(line)
if m:
models_raw = m.group("models")
cfg.models = [s.strip() for s in models_raw.split(",") if s.strip()]
for line in lines:
if cfg.gpu_count is None:
m = GPU_RE.search(line)
if m:
cfg.gpu_count = int(m.group("count"))
cfg.gpu_name = m.group("name").strip()
if cfg.container_name is None:
m = CONTAINER_NAME_RE.match(line.strip())
if m:
cfg.container_name = m.group("name")
return cfg
def main() -> None:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--agents", default="AGENTS.md")
parser.add_argument("--out", default="app/agents_config.json")
args = parser.parse_args()
cfg = parse_agents(Path(args.agents))
out_path = Path(args.out)
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(asdict(cfg), indent=2), encoding="utf-8")
if __name__ == "__main__":
main()


@@ -0,0 +1,309 @@
import asyncio
import logging
import time
from pathlib import Path
from typing import Any, Dict
from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from fastapi.responses import JSONResponse, StreamingResponse
import httpx
from app.config import load_config
from app.llamacpp_client import proxy_json, proxy_raw, proxy_stream
from app.logging_utils import configure_logging
from app.model_registry import find_model, resolve_model, scan_models
from app.openai_translate import responses_to_chat_payload, chat_to_responses, normalize_chat_payload
from app.restart import RestartPlan, trigger_restart
from app.stream_transform import stream_chat_to_responses
from app.truenas_middleware import TrueNASConfig, get_active_model_id, switch_model
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry
configure_logging()
log = logging.getLogger("api_app")
def _model_list_payload(model_dir: str) -> Dict[str, Any]:
data = []
for model in scan_models(model_dir):
data.append({
"id": model.model_id,
"object": "model",
"created": model.created,
"owned_by": "llama.cpp",
})
return {"object": "list", "data": data}
def _requires_json_mode(payload: Dict[str, Any]) -> bool:
response_format = payload.get("response_format")
if isinstance(response_format, dict) and response_format.get("type") == "json_object":
return True
if payload.get("return_format") == "json":
return True
return False
def _apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
payload = dict(payload)
payload.pop("response_format", None)
payload.pop("return_format", None)
messages = payload.get("messages")
if isinstance(messages, list):
system_msg = {"role": "system", "content": "Respond only with a valid JSON object."}
if not messages or messages[0].get("role") != "system":
payload["messages"] = [system_msg, *messages]
else:
payload["messages"] = [system_msg, *messages[1:]]
return payload
async def _proxy_json_with_retry(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Dict[str, Any],
timeout_s: float,
delay_s: float = 3.0,
) -> httpx.Response:
deadline = time.time() + timeout_s
attempt = 0
last_exc: Exception | None = None
while time.time() < deadline:
attempt += 1
try:
resp = await proxy_json(base_url, path, method, headers, payload, timeout_s)
if resp.status_code == 503:
try:
data = resp.json()
except Exception:
data = {}
message = ""
if isinstance(data, dict):
err = data.get("error")
if isinstance(err, dict):
message = str(err.get("message") or "")
else:
message = str(data.get("message") or "")
if "loading model" in message.lower():
log.warning("llama.cpp still loading model, retrying (attempt %s)", attempt)
await asyncio.sleep(delay_s)
continue
return resp
except httpx.RequestError as exc:
last_exc = exc
log.warning("Proxy request failed (attempt %s): %s", attempt, exc)
await asyncio.sleep(delay_s)
if last_exc:
raise last_exc
raise RuntimeError("proxy retry deadline exceeded")
async def _get_active_model_from_truenas(cfg: TrueNASConfig) -> str:
try:
return await get_active_model_id(cfg)
except Exception as exc:
log.warning("Failed to read active model from TrueNAS config: %s", exc)
return ""
async def _wait_for_active_model(cfg: TrueNASConfig, model_id: str, timeout_s: float) -> None:
deadline = asyncio.get_event_loop().time() + timeout_s
while asyncio.get_event_loop().time() < deadline:
active = await _get_active_model_from_truenas(cfg)
if active == model_id:
return
await asyncio.sleep(2)
raise RuntimeError(f"active model did not switch to {model_id}")
async def _ensure_model_loaded(model_id: str, model_dir: str) -> str:
cfg = load_config()
model = resolve_model(model_dir, model_id, cfg.model_aliases)
if not model:
log.warning("Requested model not found: %s", model_id)
raise HTTPException(status_code=404, detail="model not found")
if model.model_id != model_id:
log.info("Resolved model alias %s -> %s", model_id, model.model_id)
truenas_cfg = None
if cfg.truenas_ws_url and cfg.truenas_api_key:
truenas_cfg = TrueNASConfig(
ws_url=cfg.truenas_ws_url,
api_key=cfg.truenas_api_key,
api_user=cfg.truenas_api_user,
app_name=cfg.truenas_app_name,
verify_ssl=cfg.truenas_verify_ssl,
)
active_id = await _get_active_model_from_truenas(truenas_cfg)
if active_id and active_id == model.model_id:
return model.model_id
if truenas_cfg:
log.info("Switching model via API model=%s args=%s extra_args=%s", model.model_id, cfg.llamacpp_args, cfg.llamacpp_extra_args)
try:
model_path = str((Path(cfg.model_container_dir) / model.model_id))
await switch_model(
truenas_cfg,
model_path,
cfg.llamacpp_args,
cfg.llamacpp_extra_args,
)
await _wait_for_active_model(truenas_cfg, model.model_id, cfg.switch_timeout_s)
except Exception as exc:
log.exception("TrueNAS model switch failed")
raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
log.info("Running warmup prompt after model switch: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
return model.model_id
plan = RestartPlan(
method=cfg.restart_method,
command=cfg.restart_command,
url=cfg.restart_url,
allowed_container=cfg.allowed_container,
)
log.info("Triggering restart for model=%s method=%s", model.model_id, cfg.restart_method)
payload = {
"model_id": model.model_id,
"model_path": str(Path(cfg.model_container_dir) / model.model_id),
"gpu_count": cfg.gpu_count_runtime or cfg.agents.gpu_count,
"llamacpp_args": cfg.llamacpp_args,
"llamacpp_extra_args": cfg.llamacpp_extra_args,
}
await trigger_restart(plan, payload=payload)
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
log.info("Running warmup prompt after restart: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
return model.model_id
def create_api_app() -> FastAPI:
cfg = load_config()
app = FastAPI(title="llama.cpp OpenAI Wrapper", version="0.1.0")
router = APIRouter()
@app.middleware("http")
async def log_requests(request: Request, call_next):
log.info("Request %s %s", request.method, request.url.path)
return await call_next(request)
@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
log.exception("Unhandled error")
return JSONResponse(status_code=500, content={"detail": str(exc)})
@router.get("/health")
async def health() -> Dict[str, Any]:
return {
"status": "ok",
"base_url": cfg.base_url,
"model_dir": cfg.model_dir,
"agents": {
"image": cfg.agents.image,
"container_name": cfg.agents.container_name,
"network": cfg.agents.network,
"gpu_count": cfg.agents.gpu_count,
},
"gpu_count_runtime": cfg.gpu_count_runtime,
}
@router.get("/v1/models")
async def list_models() -> Dict[str, Any]:
log.info("Listing models")
return _model_list_payload(cfg.model_dir)
@router.get("/v1/models/{model_id}")
async def get_model(model_id: str) -> Dict[str, Any]:
log.info("Get model %s", model_id)
model = resolve_model(cfg.model_dir, model_id, cfg.model_aliases) or find_model(cfg.model_dir, model_id)
if not model:
raise HTTPException(status_code=404, detail="model not found")
return {
"id": model.model_id,
"object": "model",
"created": model.created,
"owned_by": "llama.cpp",
}
@router.post("/v1/chat/completions")
async def chat_completions(request: Request) -> Response:
payload = await request.json()
payload = normalize_chat_payload(payload)
model_id = payload.get("model")
log.info("Chat completions model=%s stream=%s", model_id, bool(payload.get("stream")))
if model_id:
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
payload["model"] = resolved
stream = bool(payload.get("stream"))
if stream and _requires_json_mode(payload):
payload = _apply_json_fallback(payload)
if stream:
streamer = proxy_stream(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
return StreamingResponse(streamer, media_type="text/event-stream")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
if resp.status_code >= 500 and _requires_json_mode(payload):
log.info("Retrying chat completion with JSON fallback prompt")
fallback_payload = _apply_json_fallback(payload)
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
try:
return JSONResponse(status_code=resp.status_code, content=resp.json())
except Exception:
return Response(
status_code=resp.status_code,
content=resp.content,
media_type=resp.headers.get("content-type"),
)
@router.post("/v1/responses")
async def responses(request: Request) -> Response:
payload = await request.json()
chat_payload, model_id = responses_to_chat_payload(payload)
log.info("Responses model=%s stream=%s", model_id, bool(chat_payload.get("stream")))
if model_id:
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
chat_payload["model"] = resolved
stream = bool(chat_payload.get("stream"))
if stream and _requires_json_mode(chat_payload):
chat_payload = _apply_json_fallback(chat_payload)
if stream:
streamer = stream_chat_to_responses(
cfg.base_url,
dict(request.headers),
chat_payload,
cfg.proxy_timeout_s,
)
return StreamingResponse(streamer, media_type="text/event-stream")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), chat_payload, cfg.proxy_timeout_s)
if resp.status_code >= 500 and _requires_json_mode(chat_payload):
log.info("Retrying responses with JSON fallback prompt")
fallback_payload = _apply_json_fallback(chat_payload)
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
resp.raise_for_status()
return JSONResponse(status_code=200, content=chat_to_responses(resp.json(), model_id))
@router.post("/v1/embeddings")
async def embeddings(request: Request) -> Response:
payload = await request.json()
log.info("Embeddings")
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/embeddings", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
try:
return JSONResponse(status_code=resp.status_code, content=resp.json())
except Exception:
return Response(
status_code=resp.status_code,
content=resp.content,
media_type=resp.headers.get("content-type"),
)
@router.api_route("/proxy/llamacpp/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"])
async def passthrough(path: str, request: Request) -> Response:
body = await request.body()
resp = await proxy_raw(cfg.base_url, f"/{path}", request.method, dict(request.headers), body, cfg.proxy_timeout_s)
return Response(status_code=resp.status_code, content=resp.content, headers=dict(resp.headers))
app.include_router(router)
return app


@@ -0,0 +1,214 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional
@dataclass
class AgentsRuntime:
image: Optional[str]
container_name: Optional[str]
host_port: Optional[int]
container_port: Optional[int]
web_ui_url: Optional[str]
model_host_path: Optional[str]
model_container_path: Optional[str]
models: List[str]
network: Optional[str]
subnets: List[str]
gpu_count: Optional[int]
gpu_name: Optional[str]
@dataclass
class AppConfig:
api_port: int
ui_port: int
base_url: str
model_dir: str
model_container_dir: str
download_dir: str
download_max_concurrent: int
download_allowlist: List[str]
restart_method: str
restart_command: Optional[str]
restart_url: Optional[str]
reload_on_new_model: bool
proxy_timeout_s: float
switch_timeout_s: float
gpu_count_runtime: Optional[int]
llamacpp_args: Dict[str, str]
llamacpp_extra_args: str
truenas_api_key: Optional[str]
truenas_api_user: Optional[str]
truenas_app_name: str
truenas_ws_url: Optional[str]
truenas_verify_ssl: bool
allowed_container: Optional[str]
warmup_prompt_path: str
llamacpp_container_name: Optional[str]
model_aliases: Dict[str, str]
agents: AgentsRuntime
def _load_agents_config(path: Path) -> AgentsRuntime:
if not path.exists():
return AgentsRuntime(
image=None,
container_name=None,
host_port=None,
container_port=None,
web_ui_url=None,
model_host_path=None,
model_container_path=None,
models=[],
network=None,
subnets=[],
gpu_count=None,
gpu_name=None,
)
raw = json.loads(path.read_text(encoding="utf-8"))
return AgentsRuntime(
image=raw.get("image"),
container_name=raw.get("container_name"),
host_port=raw.get("host_port"),
container_port=raw.get("container_port"),
web_ui_url=raw.get("web_ui_url"),
model_host_path=raw.get("model_host_path"),
model_container_path=raw.get("model_container_path"),
models=raw.get("models") or [],
network=raw.get("network"),
subnets=raw.get("subnets") or [],
gpu_count=raw.get("gpu_count"),
gpu_name=raw.get("gpu_name"),
)
def _infer_gpu_count_runtime() -> Optional[int]:
visible = os.getenv("CUDA_VISIBLE_DEVICES") or os.getenv("NVIDIA_VISIBLE_DEVICES")
if visible and visible not in {"all", "void"}:
parts = [p.strip() for p in visible.split(",") if p.strip()]
if parts:
return len(parts)
return None
def _default_base_url(agents: AgentsRuntime) -> str:
if agents.container_name and agents.container_port:
return f"http://{agents.container_name}:{agents.container_port}"
if agents.host_port:
return f"http://127.0.0.1:{agents.host_port}"
return "http://127.0.0.1:8080"
def load_config() -> AppConfig:
agents_path = Path(os.getenv("AGENTS_CONFIG_PATH", "app/agents_config.json"))
agents = _load_agents_config(agents_path)
api_port = int(os.getenv("PORT_A", "9093"))
ui_port = int(os.getenv("PORT_B", "9094"))
base_url = os.getenv("LLAMACPP_BASE_URL") or _default_base_url(agents)
model_dir = os.getenv("MODEL_DIR") or agents.model_container_path or "/models"
model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or model_dir
download_dir = os.getenv("MODEL_DOWNLOAD_DIR") or model_dir
download_max = int(os.getenv("MODEL_DOWNLOAD_MAX_CONCURRENT", "2"))
allowlist_raw = os.getenv("MODEL_DOWNLOAD_ALLOWLIST", "")
allowlist = [item.strip() for item in allowlist_raw.split(",") if item.strip()]
restart_method = os.getenv("LLAMACPP_RESTART_METHOD", "none").lower()
restart_command = os.getenv("LLAMACPP_RESTART_COMMAND")
restart_url = os.getenv("LLAMACPP_RESTART_URL")
reload_on_new_model = os.getenv("RELOAD_ON_NEW_MODEL", "false").lower() in {"1", "true", "yes"}
proxy_timeout_s = float(os.getenv("LLAMACPP_PROXY_TIMEOUT_S", "600"))
switch_timeout_s = float(os.getenv("LLAMACPP_SWITCH_TIMEOUT_S", "300"))
gpu_count_runtime = _infer_gpu_count_runtime()
llamacpp_args = {}
args_map = {
"LLAMACPP_TENSOR_SPLIT": "tensor_split",
"LLAMACPP_SPLIT_MODE": "split_mode",
"LLAMACPP_N_GPU_LAYERS": "n_gpu_layers",
"LLAMACPP_CTX_SIZE": "ctx_size",
"LLAMACPP_BATCH_SIZE": "batch_size",
"LLAMACPP_UBATCH_SIZE": "ubatch_size",
"LLAMACPP_CACHE_TYPE_K": "cache_type_k",
"LLAMACPP_CACHE_TYPE_V": "cache_type_v",
"LLAMACPP_FLASH_ATTN": "flash_attn",
}
for env_key, arg_key in args_map.items():
value = os.getenv(env_key)
if value is not None and value != "":
llamacpp_args[arg_key] = value
llamacpp_extra_args = os.getenv("LLAMACPP_EXTRA_ARGS", "")
truenas_api_key = os.getenv("TRUENAS_API_KEY")
truenas_api_user = os.getenv("TRUENAS_API_USER")
truenas_app_name = os.getenv("TRUENAS_APP_NAME", "llamacpp")
truenas_ws_url = os.getenv("TRUENAS_WS_URL")
truenas_api_url = os.getenv("TRUENAS_API_URL")
if not truenas_ws_url and truenas_api_url:
if truenas_api_url.startswith("https://"):
truenas_ws_url = "wss://" + truenas_api_url[len("https://") :].rstrip("/") + "/websocket"
elif truenas_api_url.startswith("http://"):
truenas_ws_url = "ws://" + truenas_api_url[len("http://") :].rstrip("/") + "/websocket"
truenas_verify_ssl = os.getenv("TRUENAS_VERIFY_SSL", "false").lower() in {"1", "true", "yes"}
allowed_container = os.getenv("LLAMACPP_TARGET_CONTAINER") or agents.container_name
llamacpp_container_name = os.getenv("LLAMACPP_CONTAINER_NAME") or agents.container_name
warmup_prompt_path = os.getenv("WARMUP_PROMPT_PATH", str(Path("trades_company_stock.txt").resolve()))
if truenas_ws_url and (":" in model_container_dir[:3] or "\\" in model_container_dir):
model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or "/models"
aliases_raw = os.getenv("MODEL_ALIASES", "")
model_aliases: Dict[str, str] = {}
if aliases_raw:
try:
model_aliases = json.loads(aliases_raw)
except json.JSONDecodeError:
for item in aliases_raw.split(","):
if "=" in item:
key, value = item.split("=", 1)
model_aliases[key.strip()] = value.strip()
gpu_count = gpu_count_runtime or agents.gpu_count
if gpu_count and gpu_count >= 2:
if "tensor_split" not in llamacpp_args:
ratio = 1.0 / float(gpu_count)
split = ",".join([f"{ratio:.2f}"] * gpu_count)
llamacpp_args["tensor_split"] = split
if "split_mode" not in llamacpp_args:
llamacpp_args["split_mode"] = "layer"
return AppConfig(
api_port=api_port,
ui_port=ui_port,
base_url=base_url,
model_dir=model_dir,
model_container_dir=model_container_dir,
download_dir=download_dir,
download_max_concurrent=download_max,
download_allowlist=allowlist,
restart_method=restart_method,
restart_command=restart_command,
restart_url=restart_url,
reload_on_new_model=reload_on_new_model,
proxy_timeout_s=proxy_timeout_s,
switch_timeout_s=switch_timeout_s,
gpu_count_runtime=gpu_count_runtime,
llamacpp_args=llamacpp_args,
llamacpp_extra_args=llamacpp_extra_args,
truenas_api_key=truenas_api_key,
truenas_api_user=truenas_api_user,
truenas_app_name=truenas_app_name,
truenas_ws_url=truenas_ws_url,
truenas_verify_ssl=truenas_verify_ssl,
allowed_container=allowed_container,
warmup_prompt_path=warmup_prompt_path,
llamacpp_container_name=llamacpp_container_name,
model_aliases=model_aliases,
agents=agents,
)


@@ -0,0 +1,61 @@
import json
import logging
import os
from typing import Optional
import httpx
log = logging.getLogger("docker_logs")
def _docker_transport() -> httpx.AsyncHTTPTransport:
sock_path = os.getenv("DOCKER_SOCK", "/var/run/docker.sock")
return httpx.AsyncHTTPTransport(uds=sock_path)
async def _docker_get(path: str, params: Optional[dict] = None) -> httpx.Response:
timeout = httpx.Timeout(10.0, read=10.0)
async with httpx.AsyncClient(transport=_docker_transport(), base_url="http://docker", timeout=timeout) as client:
resp = await client.get(path, params=params)
resp.raise_for_status()
return resp
def _decode_docker_stream(data: bytes) -> str:
if not data:
return ""
out = bytearray()
idx = 0
while idx + 8 <= len(data):
stream_type = data[idx]
size = int.from_bytes(data[idx + 4: idx + 8], "big")
idx += 8
if idx + size > len(data):
break
chunk = data[idx: idx + size]
idx += size
if stream_type in (1, 2):
out.extend(chunk)
else:
out.extend(chunk)
if out:
return out.decode("utf-8", errors="replace")
return data.decode("utf-8", errors="replace")
async def docker_container_logs(container_name: str, tail_lines: int = 200) -> str:
filters = json.dumps({"name": [container_name]})
resp = await _docker_get("/containers/json", params={"filters": filters})
containers = resp.json() or []
if not containers:
log.info("No docker container found for name=%s", container_name)
return ""
container_id = containers[0].get("Id")
if not container_id:
return ""
resp = await _docker_get(
f"/containers/{container_id}/logs",
params={"stdout": 1, "stderr": 1, "tail": tail_lines},
)
return _decode_docker_stream(resp.content)


@@ -0,0 +1,141 @@
import asyncio
import fnmatch
import logging
import os
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Optional
import httpx
from app.config import AppConfig
from app.logging_utils import configure_logging
from app.restart import RestartPlan, trigger_restart
configure_logging()
log = logging.getLogger("download_manager")
@dataclass
class DownloadStatus:
download_id: str
url: str
filename: str
status: str
bytes_total: Optional[int] = None
bytes_downloaded: int = 0
started_at: float = field(default_factory=time.time)
finished_at: Optional[float] = None
error: Optional[str] = None
class DownloadManager:
def __init__(self, cfg: AppConfig, broadcaster=None) -> None:
self.cfg = cfg
self._downloads: Dict[str, DownloadStatus] = {}
self._tasks: Dict[str, asyncio.Task] = {}
self._semaphore = asyncio.Semaphore(cfg.download_max_concurrent)
self._broadcaster = broadcaster
async def _emit(self, payload: dict) -> None:
if self._broadcaster:
await self._broadcaster.publish(payload)
def list_downloads(self) -> Dict[str, dict]:
return {k: asdict(v) for k, v in self._downloads.items()}
def get(self, download_id: str) -> Optional[DownloadStatus]:
return self._downloads.get(download_id)
def _is_allowed(self, url: str) -> bool:
if not self.cfg.download_allowlist:
return True
return any(fnmatch.fnmatch(url, pattern) for pattern in self.cfg.download_allowlist)
async def start(self, url: str, filename: Optional[str] = None) -> DownloadStatus:
if not self._is_allowed(url):
raise ValueError("url not allowed by allowlist")
if not filename:
filename = os.path.basename(url.split("?")[0]) or f"model-{uuid.uuid4().hex}.gguf"
log.info("Download requested url=%s filename=%s", url, filename)
download_id = uuid.uuid4().hex
status = DownloadStatus(download_id=download_id, url=url, filename=filename, status="queued")
self._downloads[download_id] = status
task = asyncio.create_task(self._run_download(status))
self._tasks[download_id] = task
await self._emit({"type": "download_status", "download": asdict(status)})
return status
async def cancel(self, download_id: str) -> bool:
task = self._tasks.get(download_id)
if task:
task.cancel()
status = self._downloads.get(download_id)
if status:
log.info("Download cancelled id=%s filename=%s", download_id, status.filename)
await self._emit({"type": "download_status", "download": asdict(status)})
return True
return False
async def _run_download(self, status: DownloadStatus) -> None:
status.status = "downloading"
base = Path(self.cfg.download_dir)
base.mkdir(parents=True, exist_ok=True)
tmp_path = base / f".{status.filename}.partial"
final_path = base / status.filename
last_emit = 0.0
try:
async with self._semaphore:
async with httpx.AsyncClient(timeout=None, follow_redirects=True) as client:
async with client.stream("GET", status.url) as resp:
resp.raise_for_status()
length = resp.headers.get("content-length")
if length:
status.bytes_total = int(length)
with tmp_path.open("wb") as f:
async for chunk in resp.aiter_bytes():
if chunk:
f.write(chunk)
status.bytes_downloaded += len(chunk)
now = time.time()
if now - last_emit >= 1:
last_emit = now
await self._emit({"type": "download_progress", "download": asdict(status)})
if tmp_path.exists():
tmp_path.replace(final_path)
status.status = "completed"
status.finished_at = time.time()
log.info("Download completed id=%s filename=%s", status.download_id, status.filename)
await self._emit({"type": "download_completed", "download": asdict(status)})
if self.cfg.reload_on_new_model:
plan = RestartPlan(
method=self.cfg.restart_method,
command=self.cfg.restart_command,
url=self.cfg.restart_url,
allowed_container=self.cfg.allowed_container,
)
await trigger_restart(
plan,
payload={
"reason": "new_model",
"model_id": status.filename,
"llamacpp_args": self.cfg.llamacpp_args,
"llamacpp_extra_args": self.cfg.llamacpp_extra_args,
},
)
except asyncio.CancelledError:
status.status = "cancelled"
if tmp_path.exists():
tmp_path.unlink(missing_ok=True)
log.info("Download cancelled id=%s filename=%s", status.download_id, status.filename)
await self._emit({"type": "download_cancelled", "download": asdict(status)})
except Exception as exc:
status.status = "error"
status.error = str(exc)
if tmp_path.exists():
tmp_path.unlink(missing_ok=True)
log.info("Download error id=%s filename=%s error=%s", status.download_id, status.filename, exc)
await self._emit({"type": "download_error", "download": asdict(status)})


@@ -0,0 +1,52 @@
import logging
from typing import AsyncIterator, Dict, Optional
import httpx
log = logging.getLogger("llamacpp_client")
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
drop = {"host", "content-length"}
return {k: v for k, v in headers.items() if k.lower() not in drop}
async def proxy_json(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Optional[dict],
timeout_s: float,
) -> httpx.Response:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
return await client.request(method, path, headers=_filter_headers(headers), json=payload)
async def proxy_raw(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
body: Optional[bytes],
timeout_s: float,
) -> httpx.Response:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
return await client.request(method, path, headers=_filter_headers(headers), content=body)
async def proxy_stream(
base_url: str,
path: str,
method: str,
headers: Dict[str, str],
payload: Optional[dict],
timeout_s: float,
) -> AsyncIterator[bytes]:
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
async with client.stream(method, path, headers=_filter_headers(headers), json=payload) as resp:
resp.raise_for_status()
async for chunk in resp.aiter_bytes():
if chunk:
yield chunk


@@ -0,0 +1,13 @@
import logging
import os
def configure_logging() -> None:
if logging.getLogger().handlers:
return
level_name = os.getenv("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)
logging.basicConfig(
level=level,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)


@@ -0,0 +1,45 @@
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional
@dataclass
class ModelInfo:
model_id: str
created: int
size: int
path: Path
def scan_models(model_dir: str) -> List[ModelInfo]:
base = Path(model_dir)
if not base.exists():
return []
models: List[ModelInfo] = []
now = int(time.time())
for entry in base.iterdir():
if entry.name.endswith(".partial"):
continue
if entry.is_file():
size = entry.stat().st_size
models.append(ModelInfo(model_id=entry.name, created=now, size=size, path=entry))
elif entry.is_dir():
models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
models.sort(key=lambda m: m.model_id.lower())
return models
def find_model(model_dir: str, model_id: str) -> Optional[ModelInfo]:
for model in scan_models(model_dir):
if model.model_id == model_id:
return model
return None
def resolve_model(model_dir: str, requested: str, aliases: Dict[str, str]) -> Optional[ModelInfo]:
if not requested:
return None
if requested in aliases:
requested = aliases[requested]
return find_model(model_dir, requested)


@@ -0,0 +1,140 @@
import time
import uuid
from typing import Any, Dict, List, Tuple
def _messages_from_input(input_value: Any) -> List[Dict[str, Any]]:
if isinstance(input_value, str):
return [{"role": "user", "content": input_value}]
if isinstance(input_value, list):
messages: List[Dict[str, Any]] = []
for item in input_value:
if isinstance(item, str):
messages.append({"role": "user", "content": item})
elif isinstance(item, dict):
role = item.get("role") or "user"
content = item.get("content") or item.get("text") or ""
if item.get("type") == "input_image":
content = [{"type": "image_url", "image_url": {"url": item.get("image_url", "")}}]
messages.append({"role": role, "content": content})
return messages
return [{"role": "user", "content": str(input_value)}]
def _normalize_tools(tools: Any) -> Any:
if not isinstance(tools, list):
return tools
normalized = []
for tool in tools:
if not isinstance(tool, dict):
normalized.append(tool)
continue
if "function" in tool:
normalized.append(tool)
continue
if tool.get("type") == "function" and ("name" in tool or "parameters" in tool or "description" in tool):
function = {
"name": tool.get("name"),
"parameters": tool.get("parameters"),
"description": tool.get("description"),
}
function = {k: v for k, v in function.items() if v is not None}
normalized.append({"type": "function", "function": function})
continue
normalized.append(tool)
return normalized
def _normalize_tool_choice(tool_choice: Any) -> Any:
if not isinstance(tool_choice, dict):
return tool_choice
if "function" in tool_choice:
return tool_choice
if tool_choice.get("type") == "function" and "name" in tool_choice:
return {"type": "function", "function": {"name": tool_choice.get("name")}}
return tool_choice
def normalize_chat_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
if "return_format" in payload and "response_format" not in payload:
if payload["return_format"] == "json":
payload["response_format"] = {"type": "json_object"}
if "functions" in payload and "tools" not in payload:
functions = payload.get("functions")
if isinstance(functions, list):
tools = []
for func in functions:
if isinstance(func, dict):
tools.append({"type": "function", "function": func})
if tools:
payload["tools"] = tools
payload.pop("functions", None)
if "tools" in payload:
payload["tools"] = _normalize_tools(payload.get("tools"))
if "tool_choice" in payload:
payload["tool_choice"] = _normalize_tool_choice(payload.get("tool_choice"))
return payload
def responses_to_chat_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
model = payload.get("model") or "unknown"
messages = _messages_from_input(payload.get("input", ""))
chat_payload: Dict[str, Any] = {
"model": model,
"messages": messages,
}
passthrough_keys = [
"temperature",
"top_p",
"max_output_tokens",
"stream",
"tools",
"tool_choice",
"response_format",
"return_format",
"frequency_penalty",
"presence_penalty",
"seed",
"stop",
]
for key in passthrough_keys:
if key in payload:
if key == "max_output_tokens":
chat_payload["max_tokens"] = payload[key]
elif key == "return_format" and payload[key] == "json":
chat_payload["response_format"] = {"type": "json_object"}
else:
chat_payload[key] = payload[key]
return normalize_chat_payload(chat_payload), model
def chat_to_responses(chat: Dict[str, Any], model: str) -> Dict[str, Any]:
response_id = f"resp_{uuid.uuid4().hex}"
created = int(time.time())
content = ""
if chat.get("choices"):
choice = chat["choices"][0]
message = choice.get("message") or {}
content = message.get("content") or ""
return {
"id": response_id,
"object": "response",
"created": created,
"model": model,
"output": [
{
"id": f"msg_{uuid.uuid4().hex}",
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": content}
],
}
],
"usage": chat.get("usage", {}),
}


@@ -0,0 +1,51 @@
import asyncio
import logging
import shlex
from dataclasses import dataclass
from typing import Optional
import httpx
log = logging.getLogger("llamacpp_restart")
@dataclass
class RestartPlan:
method: str
command: Optional[str]
url: Optional[str]
allowed_container: Optional[str] = None
async def trigger_restart(plan: RestartPlan, payload: Optional[dict] = None) -> None:
if plan.method == "none":
log.warning("Restart requested but restart method is none")
return
if plan.method == "http":
if not plan.url:
raise RuntimeError("restart url is required for http method")
async with httpx.AsyncClient(timeout=60) as client:
resp = await client.post(plan.url, json=payload or {})
resp.raise_for_status()
return
if plan.method == "docker":
if not plan.command:
raise RuntimeError("restart command must include container id or name for docker method")
if plan.allowed_container and plan.command != plan.allowed_container:
raise RuntimeError("docker restart command not allowed for non-target container")
async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds="/var/run/docker.sock"), timeout=30) as client:
resp = await client.post(f"http://docker/containers/{plan.command}/restart")
resp.raise_for_status()
return
if plan.method == "shell":
if not plan.command:
raise RuntimeError("restart command is required for shell method")
cmd = plan.command
args = shlex.split(cmd)
proc = await asyncio.create_subprocess_exec(*args)
code = await proc.wait()
if code != 0:
raise RuntimeError(f"restart command failed with exit code {code}")
return
raise RuntimeError(f"unknown restart method {plan.method}")


@@ -0,0 +1,35 @@
import os
import signal
import subprocess
import sys
from app.config import load_config
def main() -> None:
cfg = load_config()
python = sys.executable
api_cmd = [python, "-m", "uvicorn", "app.api_app:create_api_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.api_port)]
ui_cmd = [python, "-m", "uvicorn", "app.ui_app:create_ui_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.ui_port)]
procs = [subprocess.Popen(api_cmd)]
if cfg.ui_port != cfg.api_port:
procs.append(subprocess.Popen(ui_cmd))
def shutdown(_sig, _frame):
for proc in procs:
proc.terminate()
for proc in procs:
proc.wait(timeout=10)
sys.exit(0)
signal.signal(signal.SIGTERM, shutdown)
signal.signal(signal.SIGINT, shutdown)
for proc in procs:
proc.wait()
if __name__ == "__main__":
main()


@@ -0,0 +1,102 @@
import json
import time
import uuid
from typing import Any, AsyncIterator, Dict
import httpx
def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
payload = json.dumps(data, separators=(",", ":"))
return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
drop = {"host", "content-length"}
return {k: v for k, v in headers.items() if k.lower() not in drop}
async def stream_chat_to_responses(
base_url: str,
headers: Dict[str, str],
payload: Dict[str, Any],
timeout_s: float,
) -> AsyncIterator[bytes]:
response_id = f"resp_{uuid.uuid4().hex}"
created = int(time.time())
model = payload.get("model") or "unknown"
msg_id = f"msg_{uuid.uuid4().hex}"
output_text = ""
response_stub = {
"id": response_id,
"object": "response",
"created": created,
"model": model,
"output": [
{
"id": msg_id,
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": ""}
],
}
],
}
yield _sse_event("response.created", {"type": "response.created", "response": response_stub})
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
async with client.stream(
"POST",
"/v1/chat/completions",
headers=_filter_headers(headers),
json=payload,
) as resp:
resp.raise_for_status()
buffer = ""
async for chunk in resp.aiter_text():
buffer += chunk
while "\n\n" in buffer:
block, buffer = buffer.split("\n\n", 1)
lines = [line for line in block.splitlines() if line.startswith("data:")]
if not lines:
continue
data_str = "\n".join(line[len("data:"):].strip() for line in lines)
if data_str == "[DONE]":
continue
try:
data = json.loads(data_str)
except json.JSONDecodeError:
continue
choices = data.get("choices") or []
if not choices:
continue
delta = choices[0].get("delta") or {}
text_delta = delta.get("content")
if text_delta:
output_text += text_delta
yield _sse_event(
"response.output_text.delta",
{
"type": "response.output_text.delta",
"delta": text_delta,
"item_id": msg_id,
"output_index": 0,
"content_index": 0,
},
)
yield _sse_event(
"response.output_text.done",
{
"type": "response.output_text.done",
"text": output_text,
"item_id": msg_id,
"output_index": 0,
"content_index": 0,
},
)
response_stub["output"][0]["content"][0]["text"] = output_text
yield _sse_event("response.completed", {"type": "response.completed", "response": response_stub})


@@ -0,0 +1,313 @@
import json
import logging
import shlex
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional
import websockets
import yaml
log = logging.getLogger("truenas_middleware")
@dataclass
class TrueNASConfig:
ws_url: str
api_key: str
api_user: Optional[str]
app_name: str
verify_ssl: bool = False
def _parse_compose(raw: Any) -> Dict[str, Any]:
if isinstance(raw, dict):
return raw
if isinstance(raw, str):
text = raw.strip()
try:
return json.loads(text)
except json.JSONDecodeError:
return yaml.safe_load(text)
raise ValueError("Unsupported compose payload")
def _command_to_list(command: Any) -> list:
if isinstance(command, list):
return command
if isinstance(command, str):
return shlex.split(command)
return []
def _extract_command(config: Dict[str, Any], service_name: str = "llamacpp") -> list:
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
svc = services.get(service_name) or {}
return _command_to_list(svc.get("command"))
return _command_to_list(config.get("command"))
def _model_id_from_command(cmd: list) -> Optional[str]:
if "--model" in cmd:
idx = cmd.index("--model")
if idx + 1 < len(cmd):
return Path(cmd[idx + 1]).name
return None
def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
if value is None:
return cmd
if flag in cmd:
idx = cmd.index(flag)
if idx + 1 < len(cmd):
cmd[idx + 1] = value
else:
cmd.append(value)
return cmd
cmd.extend([flag, value])
return cmd
def _merge_args(cmd: list, args: Dict[str, str]) -> list:
flag_map = {
"device": "--device",
"tensor_split": "--tensor-split",
"split_mode": "--split-mode",
"n_gpu_layers": "--n-gpu-layers",
"ctx_size": "--ctx-size",
"batch_size": "--batch-size",
"ubatch_size": "--ubatch-size",
"cache_type_k": "--cache-type-k",
"cache_type_v": "--cache-type-v",
"flash_attn": "--flash-attn",
}
for key, value in args.items():
flag = flag_map.get(key)
if flag:
if flag in cmd:
continue
_set_arg(cmd, flag, value)
return cmd
def _merge_extra_args(cmd: list, extra: str) -> list:
if not extra:
return cmd
extra_list = shlex.split(extra)
filtered: list[str] = []
skip_next = False
for item in extra_list:
if skip_next:
skip_next = False
continue
if item in {"--device", "-dev"}:
log.warning("Dropping --device from extra args to avoid llama.cpp device errors.")
skip_next = True
continue
filtered.append(item)
for flag in filtered:
if flag not in cmd:
cmd.append(flag)
return cmd
def _update_model_command(command: Any, model_path: str, args: Dict[str, str], extra: str) -> list:
cmd = _command_to_list(command)
if "--device" in cmd:
idx = cmd.index("--device")
del cmd[idx: idx + 2]
cmd = _set_arg(cmd, "--model", model_path)
cmd = _merge_args(cmd, args)
cmd = _merge_extra_args(cmd, extra)
return cmd
def _replace_flags(cmd: list, flags: Dict[str, Optional[str]], extra: str) -> list:
result = list(cmd)
for flag in flags.keys():
while flag in result:
idx = result.index(flag)
del result[idx: idx + 2]
if "--device" in result:
idx = result.index("--device")
del result[idx: idx + 2]
for flag, value in flags.items():
if value is not None and value != "":
result = _set_arg(result, flag, value)
result = _merge_extra_args(result, extra)
return result
async def get_app_config(cfg: TrueNASConfig) -> Dict[str, Any]:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
return config
async def get_app_command(cfg: TrueNASConfig, service_name: str = "llamacpp") -> list:
config = await get_app_config(cfg)
return _extract_command(config, service_name=service_name)
async def get_active_model_id(cfg: TrueNASConfig, service_name: str = "llamacpp") -> str:
config = await get_app_config(cfg)
cmd = _extract_command(config, service_name=service_name)
return _model_id_from_command(cmd) or ""
async def get_app_logs(
cfg: TrueNASConfig,
tail_lines: int = 200,
service_name: str = "llamacpp",
) -> str:
tail_payloads = [
{"tail": tail_lines},
{"tail_lines": tail_lines},
{"tail": str(tail_lines)},
]
for payload in tail_payloads:
try:
result = await _rpc_call(cfg, "app.container_logs", [cfg.app_name, service_name, payload])
if isinstance(result, str):
return result
except Exception as exc:
log.debug("app.container_logs failed (%s): %s", payload, exc)
for payload in tail_payloads:
try:
result = await _rpc_call(cfg, "app.logs", [cfg.app_name, payload])
if isinstance(result, str):
return result
except Exception as exc:
log.debug("app.logs failed (%s): %s", payload, exc)
return ""
async def update_app_command(
cfg: TrueNASConfig,
command: list,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
svc["command"] = command
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
return
config["command"] = command
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
async def update_command_flags(
cfg: TrueNASConfig,
flags: Dict[str, Optional[str]],
extra: str,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if not isinstance(config, dict):
raise RuntimeError("app.config returned unsupported payload")
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
cmd = svc.get("command")
svc["command"] = _replace_flags(_command_to_list(cmd), flags, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
return
cmd = _replace_flags(_command_to_list(config.get("command")), flags, extra)
config["command"] = cmd
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
async def _rpc_call(cfg: TrueNASConfig, method: str, params: Optional[list] = None) -> Any:
ssl_ctx = None
if cfg.ws_url.startswith("wss://") and not cfg.verify_ssl:
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(cfg.ws_url, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect to TrueNAS websocket")
await ws.send(
json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [cfg.api_key]})
)
auth_resp = json.loads(await ws.recv())
if not auth_resp.get("result"):
if not cfg.api_user:
raise RuntimeError("API key rejected and TRUENAS_API_USER not set")
await ws.send(
json.dumps(
{
"id": 2,
"msg": "method",
"method": "auth.login_ex",
"params": [
{
"mechanism": "API_KEY_PLAIN",
"username": cfg.api_user,
"api_key": cfg.api_key,
}
],
}
)
)
auth_ex = json.loads(await ws.recv())
if auth_ex.get("result", {}).get("response_type") != "SUCCESS":
raise RuntimeError("API key authentication failed")
req_id = 3
await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != req_id:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
async def switch_model(
cfg: TrueNASConfig,
model_path: str,
args: Dict[str, str],
extra: str,
service_name: str = "llamacpp",
) -> None:
config = await _rpc_call(cfg, "app.config", [cfg.app_name])
if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
services = compose.get("services") or {}
if service_name not in services:
raise RuntimeError(f"service {service_name} not found in compose")
svc = services[service_name]
cmd = svc.get("command")
svc["command"] = _update_model_command(cmd, model_path, args, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
log.info("Requested model switch to %s via TrueNAS middleware (custom app)", model_path)
return
cmd = config.get("command")
config["command"] = _update_model_command(cmd, model_path, args, extra)
await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
log.info("Requested model switch to %s via TrueNAS middleware (catalog app)", model_path)

View File

@@ -0,0 +1,357 @@
import asyncio
import json
import logging
from pathlib import Path
from typing import Any, Dict, Optional
import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse, StreamingResponse
from app.config import load_config
from app.docker_logs import docker_container_logs
from app.download_manager import DownloadManager
from app.logging_utils import configure_logging
from app.model_registry import scan_models
from app.truenas_middleware import (
TrueNASConfig,
get_active_model_id,
get_app_command,
get_app_logs,
switch_model,
update_command_flags,
)
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry
configure_logging()
log = logging.getLogger("ui_app")
class EventBroadcaster:
def __init__(self) -> None:
self._queues: set[asyncio.Queue] = set()
def connect(self) -> asyncio.Queue:
queue: asyncio.Queue = asyncio.Queue()
self._queues.add(queue)
return queue
def disconnect(self, queue: asyncio.Queue) -> None:
self._queues.discard(queue)
async def publish(self, payload: dict) -> None:
for queue in list(self._queues):
queue.put_nowait(payload)
def _static_path() -> Path:
return Path(__file__).parent / "ui_static"
async def _fetch_active_model(truenas_cfg: Optional[TrueNASConfig]) -> Optional[str]:
if not truenas_cfg:
return None
try:
return await get_active_model_id(truenas_cfg)
except Exception as exc:
log.warning("Failed to read active model from TrueNAS config: %s", exc)
return None
def _model_list(model_dir: str, active_model: Optional[str]) -> Dict[str, Any]:
data = []
for model in scan_models(model_dir):
data.append({
"id": model.model_id,
"size": model.size,
"active": model.model_id == active_model,
})
return {"models": data, "active_model": active_model}
def create_ui_app() -> FastAPI:
cfg = load_config()
app = FastAPI(title="llama.cpp Model Manager", version="0.1.0")
broadcaster = EventBroadcaster()
manager = DownloadManager(cfg, broadcaster=broadcaster)
truenas_cfg = None
if cfg.truenas_ws_url and cfg.truenas_api_key:
truenas_cfg = TrueNASConfig(
ws_url=cfg.truenas_ws_url,
api_key=cfg.truenas_api_key,
api_user=cfg.truenas_api_user,
app_name=cfg.truenas_app_name,
verify_ssl=cfg.truenas_verify_ssl,
)
async def monitor_active_model() -> None:
last_model = None
while True:
current = await _fetch_active_model(truenas_cfg)
if current and current != last_model:
last_model = current
await broadcaster.publish({"type": "active_model", "model_id": current})
await asyncio.sleep(3)
async def _fetch_logs() -> str:
logs = ""
if truenas_cfg:
try:
logs = await asyncio.wait_for(get_app_logs(truenas_cfg, tail_lines=200), timeout=5)
except asyncio.TimeoutError:
logs = ""
if not logs and cfg.llamacpp_container_name:
try:
logs = await asyncio.wait_for(
docker_container_logs(cfg.llamacpp_container_name, tail_lines=200),
timeout=10,
)
except asyncio.TimeoutError:
logs = ""
return logs
@app.on_event("startup")
async def start_tasks() -> None:
asyncio.create_task(monitor_active_model())
@app.middleware("http")
async def log_requests(request: Request, call_next):
log.info("UI request %s %s", request.method, request.url.path)
return await call_next(request)
@app.get("/health")
async def health() -> Dict[str, Any]:
return {"status": "ok", "model_dir": cfg.model_dir}
@app.get("/")
async def index() -> HTMLResponse:
return FileResponse(_static_path() / "index.html")
@app.get("/ui/styles.css")
async def styles() -> FileResponse:
return FileResponse(_static_path() / "styles.css")
@app.get("/ui/app.js")
async def app_js() -> FileResponse:
return FileResponse(_static_path() / "app.js")
@app.get("/ui/api/models")
async def list_models() -> JSONResponse:
active_model = await _fetch_active_model(truenas_cfg)
log.info("UI list models active=%s", active_model)
return JSONResponse(_model_list(cfg.model_dir, active_model))
@app.get("/ui/api/downloads")
async def list_downloads() -> JSONResponse:
log.info("UI list downloads")
return JSONResponse({"downloads": manager.list_downloads()})
@app.post("/ui/api/downloads")
async def start_download(request: Request) -> JSONResponse:
payload = await request.json()
url = payload.get("url")
filename = payload.get("filename")
log.info("UI download start url=%s filename=%s", url, filename)
if not url:
raise HTTPException(status_code=400, detail="url is required")
try:
status = await manager.start(url, filename=filename)
except ValueError as exc:
raise HTTPException(status_code=403, detail=str(exc))
return JSONResponse({"download": status.__dict__})
@app.delete("/ui/api/downloads/{download_id}")
async def cancel_download(download_id: str) -> JSONResponse:
log.info("UI download cancel id=%s", download_id)
ok = await manager.cancel(download_id)
if not ok:
raise HTTPException(status_code=404, detail="download not found")
return JSONResponse({"status": "cancelled"})
@app.get("/ui/api/events")
async def events() -> StreamingResponse:
queue = broadcaster.connect()
async def event_stream():
try:
while True:
payload = await queue.get()
data = json.dumps(payload, separators=(",", ":"))
yield f"data: {data}\n\n".encode("utf-8")
finally:
broadcaster.disconnect(queue)
return StreamingResponse(event_stream(), media_type="text/event-stream")
@app.post("/ui/api/switch-model")
async def switch_model_ui(request: Request) -> JSONResponse:
payload = await request.json()
model_id = payload.get("model_id")
warmup_override = payload.get("warmup_prompt") or ""
if not model_id:
raise HTTPException(status_code=400, detail="model_id is required")
model_path = Path(cfg.model_dir) / model_id
if not model_path.exists():
raise HTTPException(status_code=404, detail="model not found")
if not truenas_cfg:
raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
try:
container_model_path = str(Path(cfg.model_container_dir) / model_id)
await switch_model(truenas_cfg, container_model_path, cfg.llamacpp_args, cfg.llamacpp_extra_args)
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
log.info("UI warmup after switch model=%s prompt_len=%s", model_id, len(warmup_prompt))
try:
await run_warmup_with_retry(cfg.base_url, model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch warmup failed: {exc}")
try:
async with httpx.AsyncClient(base_url=cfg.base_url, timeout=120) as client:
resp = await client.post(
"/v1/chat/completions",
json={
"model": model_id,
"messages": [{"role": "user", "content": "ok"}],
"max_tokens": 4,
"temperature": 0,
},
)
resp.raise_for_status()
except Exception as exc:
await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
raise HTTPException(status_code=500, detail=f"model switch verification failed: {exc}")
await broadcaster.publish({"type": "model_switched", "model_id": model_id})
log.info("UI model switched model=%s", model_id)
return JSONResponse({"status": "ok", "model_id": model_id})
@app.get("/ui/api/llamacpp-config")
async def get_llamacpp_config() -> JSONResponse:
active_model = await _fetch_active_model(truenas_cfg)
log.info("UI get llama.cpp config active=%s", active_model)
params: Dict[str, Optional[str]] = {}
command_raw = []
if truenas_cfg:
command_raw = await get_app_command(truenas_cfg)
flag_map = {
"--ctx-size": "ctx_size",
"--n-gpu-layers": "n_gpu_layers",
"--tensor-split": "tensor_split",
"--split-mode": "split_mode",
"--cache-type-k": "cache_type_k",
"--cache-type-v": "cache_type_v",
"--flash-attn": "flash_attn",
"--temp": "temp",
"--top-k": "top_k",
"--top-p": "top_p",
"--repeat-penalty": "repeat_penalty",
"--repeat-last-n": "repeat_last_n",
"--frequency-penalty": "frequency_penalty",
"--presence-penalty": "presence_penalty",
}
if isinstance(command_raw, list):
for flag, key in flag_map.items():
if flag in command_raw:
idx = command_raw.index(flag)
if idx + 1 < len(command_raw):
params[key] = command_raw[idx + 1]
known_flags = set(flag_map.keys()) | {"--model"}
extra = []
if isinstance(command_raw, list):
skip_next = False
for item in command_raw:
if skip_next:
skip_next = False
continue
if item in known_flags:
skip_next = True
continue
extra.append(item)
return JSONResponse(
{
"active_model": active_model,
"params": params,
"extra_args": " ".join(extra),
}
)
@app.post("/ui/api/llamacpp-config")
async def update_llamacpp_config(request: Request) -> JSONResponse:
payload = await request.json()
params = payload.get("params") or {}
extra_args = payload.get("extra_args") or ""
warmup_override = payload.get("warmup_prompt") or ""
log.info("UI save llama.cpp config params=%s extra_args=%s", params, extra_args)
if not truenas_cfg:
raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
flags = {
"--ctx-size": params.get("ctx_size"),
"--n-gpu-layers": params.get("n_gpu_layers"),
"--tensor-split": params.get("tensor_split"),
"--split-mode": params.get("split_mode"),
"--cache-type-k": params.get("cache_type_k"),
"--cache-type-v": params.get("cache_type_v"),
"--flash-attn": params.get("flash_attn"),
"--temp": params.get("temp"),
"--top-k": params.get("top_k"),
"--top-p": params.get("top_p"),
"--repeat-penalty": params.get("repeat_penalty"),
"--repeat-last-n": params.get("repeat_last_n"),
"--frequency-penalty": params.get("frequency_penalty"),
"--presence-penalty": params.get("presence_penalty"),
}
try:
await update_command_flags(truenas_cfg, flags, extra_args)
except Exception as exc:
log.exception("UI update llama.cpp config failed")
raise HTTPException(status_code=500, detail=f"config update failed: {exc}")
active_model = await _fetch_active_model(truenas_cfg)
if active_model:
warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
log.info("UI warmup after config update model=%s prompt_len=%s", active_model, len(warmup_prompt))
try:
await run_warmup_with_retry(cfg.base_url, active_model, warmup_prompt, timeout_s=cfg.switch_timeout_s)
except Exception as exc:
raise HTTPException(status_code=500, detail=f"config warmup failed: {exc}")
await broadcaster.publish({"type": "llamacpp_config_updated"})
return JSONResponse({"status": "ok"})
@app.get("/ui/api/llamacpp-logs")
async def get_llamacpp_logs() -> JSONResponse:
logs = await _fetch_logs()
return JSONResponse({"logs": logs})
@app.get("/ui/api/llamacpp-logs/stream")
async def stream_llamacpp_logs() -> StreamingResponse:
async def event_stream():
last_lines: list[str] = []
while True:
logs = await _fetch_logs()
lines = logs.splitlines()
if last_lines:
last_tail = last_lines[-1]
idx = -1
for i in range(len(lines) - 1, -1, -1):
if lines[i] == last_tail:
idx = i
break
if idx >= 0:
lines = lines[idx + 1 :]
if lines:
last_lines = (last_lines + lines)[-200:]
data = json.dumps({"type": "logs", "lines": lines}, separators=(",", ":"))
yield f"data: {data}\n\n".encode("utf-8")
await asyncio.sleep(2)
return StreamingResponse(event_stream(), media_type="text/event-stream")
return app
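
Since create_ui_app() returns a plain FastAPI instance, it can be served by any ASGI server. A minimal sketch, assuming the module path and port are placeholders rather than values defined in this file:

import uvicorn

from app.ui_app import create_ui_app  # assumed module path

if __name__ == "__main__":
    uvicorn.run(create_ui_app(), host="0.0.0.0", port=9094)  # port is illustrative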

View File

@@ -0,0 +1,306 @@
const modelsList = document.getElementById("models-list");
const downloadsList = document.getElementById("downloads-list");
const refreshModels = document.getElementById("refresh-models");
const refreshDownloads = document.getElementById("refresh-downloads");
const form = document.getElementById("download-form");
const errorEl = document.getElementById("download-error");
const statusEl = document.getElementById("switch-status");
const configStatusEl = document.getElementById("config-status");
const configForm = document.getElementById("config-form");
const refreshConfig = document.getElementById("refresh-config");
const warmupPromptEl = document.getElementById("warmup-prompt");
const refreshLogs = document.getElementById("refresh-logs");
const logsOutput = document.getElementById("logs-output");
const logsStatus = document.getElementById("logs-status");
const themeToggle = document.getElementById("theme-toggle");
const applyTheme = (theme) => {
document.documentElement.setAttribute("data-theme", theme);
themeToggle.textContent = theme === "dark" ? "Light" : "Dark";
themeToggle.setAttribute("aria-pressed", theme === "dark" ? "true" : "false");
};
const savedTheme = localStorage.getItem("theme") || "light";
applyTheme(savedTheme);
themeToggle.addEventListener("click", () => {
const next = document.documentElement.getAttribute("data-theme") === "dark" ? "light" : "dark";
localStorage.setItem("theme", next);
applyTheme(next);
});
const cfgFields = {
ctx_size: document.getElementById("cfg-ctx-size"),
n_gpu_layers: document.getElementById("cfg-n-gpu-layers"),
tensor_split: document.getElementById("cfg-tensor-split"),
split_mode: document.getElementById("cfg-split-mode"),
cache_type_k: document.getElementById("cfg-cache-type-k"),
cache_type_v: document.getElementById("cfg-cache-type-v"),
flash_attn: document.getElementById("cfg-flash-attn"),
temp: document.getElementById("cfg-temp"),
top_k: document.getElementById("cfg-top-k"),
top_p: document.getElementById("cfg-top-p"),
repeat_penalty: document.getElementById("cfg-repeat-penalty"),
repeat_last_n: document.getElementById("cfg-repeat-last-n"),
frequency_penalty: document.getElementById("cfg-frequency-penalty"),
presence_penalty: document.getElementById("cfg-presence-penalty"),
};
const extraArgsEl = document.getElementById("cfg-extra-args");
const fmtBytes = (bytes) => {
if (!bytes && bytes !== 0) return "-";
const units = ["B", "KB", "MB", "GB", "TB"];
let idx = 0;
let value = bytes;
while (value >= 1024 && idx < units.length - 1) {
value /= 1024;
idx += 1;
}
return `${value.toFixed(1)} ${units[idx]}`;
};
const setStatus = (message, type) => {
statusEl.textContent = message || "";
statusEl.className = "status";
if (type) {
statusEl.classList.add(type);
}
};
const setConfigStatus = (message, type) => {
configStatusEl.textContent = message || "";
configStatusEl.className = "status";
if (type) {
configStatusEl.classList.add(type);
}
};
async function loadModels() {
const res = await fetch("/ui/api/models");
const data = await res.json();
modelsList.innerHTML = "";
const activeModel = data.active_model;
data.models.forEach((model) => {
const li = document.createElement("li");
if (model.active) {
li.classList.add("active");
}
const row = document.createElement("div");
row.className = "model-row";
const name = document.createElement("span");
name.textContent = `${model.id} (${fmtBytes(model.size)})`;
const actions = document.createElement("div");
if (model.active) {
const badge = document.createElement("span");
badge.className = "badge";
badge.textContent = "Active";
actions.appendChild(badge);
} else {
const button = document.createElement("button");
button.className = "ghost";
button.textContent = "Switch";
button.onclick = async () => {
setStatus(`Switching to ${model.id}...`);
const warmupPrompt = warmupPromptEl.value.trim();
const res = await fetch("/ui/api/switch-model", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ model_id: model.id, warmup_prompt: warmupPrompt }),
});
const payload = await res.json();
if (!res.ok) {
setStatus(payload.detail || "Switch failed.", "error");
return;
}
warmupPromptEl.value = "";
setStatus(`Active model: ${model.id}`, "ok");
await loadModels();
};
actions.appendChild(button);
}
row.appendChild(name);
row.appendChild(actions);
li.appendChild(row);
modelsList.appendChild(li);
});
if (activeModel) {
setStatus(`Active model: ${activeModel}`, "ok");
}
}
async function loadDownloads() {
const res = await fetch("/ui/api/downloads");
const data = await res.json();
downloadsList.innerHTML = "";
const entries = Object.values(data.downloads || {});
if (!entries.length) {
downloadsList.innerHTML = "<p>No active downloads.</p>";
return;
}
entries.forEach((download) => {
const card = document.createElement("div");
card.className = "download-card";
const title = document.createElement("strong");
title.textContent = download.filename;
const meta = document.createElement("div");
const percent = download.bytes_total
? Math.round((download.bytes_downloaded / download.bytes_total) * 100)
: 0;
meta.textContent = `${download.status} · ${fmtBytes(download.bytes_downloaded)} / ${fmtBytes(download.bytes_total)}`;
const progress = document.createElement("div");
progress.className = "progress";
const bar = document.createElement("span");
bar.style.width = `${Math.min(percent, 100)}%`;
progress.appendChild(bar);
const actions = document.createElement("div");
if (download.status === "downloading" || download.status === "queued") {
const cancel = document.createElement("button");
cancel.className = "ghost";
cancel.textContent = "Cancel";
cancel.onclick = async () => {
await fetch(`/ui/api/downloads/${download.download_id}`, { method: "DELETE" });
await loadDownloads();
};
actions.appendChild(cancel);
}
card.appendChild(title);
card.appendChild(meta);
card.appendChild(progress);
card.appendChild(actions);
downloadsList.appendChild(card);
});
}
async function loadConfig() {
const res = await fetch("/ui/api/llamacpp-config");
const data = await res.json();
Object.entries(cfgFields).forEach(([key, el]) => {
el.value = data.params?.[key] || "";
});
extraArgsEl.value = data.extra_args || "";
if (data.active_model) {
setConfigStatus(`Active model: ${data.active_model}`, "ok");
}
}
async function loadLogs() {
const res = await fetch("/ui/api/llamacpp-logs");
if (!res.ok) {
logsStatus.textContent = "Unavailable";
return;
}
const data = await res.json();
logsOutput.textContent = data.logs || "";
logsStatus.textContent = data.logs ? "Snapshot" : "Empty";
}
form.addEventListener("submit", async (event) => {
event.preventDefault();
errorEl.textContent = "";
const url = document.getElementById("model-url").value.trim();
const filename = document.getElementById("model-filename").value.trim();
if (!url) {
errorEl.textContent = "URL is required.";
return;
}
const payload = { url };
if (filename) payload.filename = filename;
const res = await fetch("/ui/api/downloads", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(payload),
});
if (!res.ok) {
const err = await res.json();
errorEl.textContent = err.detail || "Failed to start download.";
return;
}
document.getElementById("model-url").value = "";
document.getElementById("model-filename").value = "";
await loadDownloads();
});
configForm.addEventListener("submit", async (event) => {
event.preventDefault();
setConfigStatus("Applying parameters...");
const params = {};
Object.entries(cfgFields).forEach(([key, el]) => {
if (el.value.trim()) {
params[key] = el.value.trim();
}
});
const warmupPrompt = warmupPromptEl.value.trim();
const res = await fetch("/ui/api/llamacpp-config", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ params, extra_args: extraArgsEl.value.trim(), warmup_prompt: warmupPrompt }),
});
const payload = await res.json();
if (!res.ok) {
setConfigStatus(payload.detail || "Update failed.", "error");
return;
}
setConfigStatus("Parameters updated.", "ok");
warmupPromptEl.value = "";
});
refreshModels.addEventListener("click", loadModels);
refreshDownloads.addEventListener("click", loadDownloads);
refreshConfig.addEventListener("click", loadConfig);
refreshLogs.addEventListener("click", loadLogs);
loadModels();
loadDownloads();
loadConfig();
loadLogs();
const eventSource = new EventSource("/ui/api/events");
eventSource.onmessage = async (event) => {
const payload = JSON.parse(event.data);
if (payload.type === "download_progress" || payload.type === "download_completed" || payload.type === "download_status") {
await loadDownloads();
}
if (payload.type === "active_model") {
await loadModels();
await loadConfig();
}
if (payload.type === "model_switched") {
setStatus(`Active model: ${payload.model_id}`, "ok");
await loadModels();
await loadConfig();
}
if (payload.type === "model_switch_failed") {
setStatus(payload.error || "Model switch failed.", "error");
}
if (payload.type === "llamacpp_config_updated") {
await loadConfig();
}
};
const logsSource = new EventSource("/ui/api/llamacpp-logs/stream");
logsSource.onopen = () => {
logsStatus.textContent = "Streaming";
};
logsSource.onmessage = (event) => {
const payload = JSON.parse(event.data);
if (payload.type !== "logs") {
return;
}
const lines = payload.lines || [];
if (!lines.length) return;
const current = logsOutput.textContent.split("\n").filter((line) => line.length);
const merged = current.concat(lines).slice(-400);
logsOutput.textContent = merged.join("\n");
logsOutput.scrollTop = logsOutput.scrollHeight;
logsStatus.textContent = "Streaming";
};
logsSource.onerror = () => {
logsStatus.textContent = "Disconnected";
};

View File

@@ -0,0 +1,151 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>llama.cpp Model Manager</title>
<link rel="stylesheet" href="/ui/styles.css" />
</head>
<body>
<div class="page">
<header class="topbar">
<div class="brand">
<p class="eyebrow">llama.cpp wrapper</p>
<h1>Model Manager</h1>
<p class="lede">Curate models, tune runtime parameters, and keep llama.cpp responsive.</p>
</div>
<div class="header-actions">
<button id="theme-toggle" class="ghost" type="button" aria-pressed="false">Dark</button>
<div class="quick-actions card">
<h2>Quick Add</h2>
<form id="download-form">
<label>
Model URL
<input type="url" id="model-url" placeholder="https://.../model.gguf" required />
</label>
<label>
Optional filename
<input type="text" id="model-filename" placeholder="custom-name.gguf" />
</label>
<button type="submit">Start Download</button>
<p id="download-error" class="error"></p>
</form>
</div>
</div>
</header>
<main class="layout">
<section class="column">
<div class="card">
<div class="card-header">
<h3>Models</h3>
<button id="refresh-models" class="ghost">Refresh</button>
</div>
<div id="switch-status" class="status"></div>
<label class="config-wide">
Warmup prompt (one-time)
<textarea id="warmup-prompt" rows="3" placeholder="Optional warmup prompt for the next restart only"></textarea>
</label>
<ul id="models-list" class="list"></ul>
</div>
<div class="card">
<div class="card-header">
<h3>Downloads</h3>
<button id="refresh-downloads" class="ghost">Refresh</button>
</div>
<div id="downloads-list" class="downloads"></div>
</div>
</section>
<section class="column">
<div class="card">
<div class="card-header">
<h3>Runtime Parameters</h3>
<button id="refresh-config" class="ghost">Refresh</button>
</div>
<div id="config-status" class="status"></div>
<form id="config-form" class="config-grid">
<label>
ctx-size
<input type="text" id="cfg-ctx-size" placeholder="e.g. 8192" />
</label>
<label>
n-gpu-layers
<input type="text" id="cfg-n-gpu-layers" placeholder="e.g. 999" />
</label>
<label>
tensor-split
<input type="text" id="cfg-tensor-split" placeholder="e.g. 0.5,0.5" />
</label>
<label>
split-mode
<input type="text" id="cfg-split-mode" placeholder="e.g. layer" />
</label>
<label>
cache-type-k
<input type="text" id="cfg-cache-type-k" placeholder="e.g. q8_0" />
</label>
<label>
cache-type-v
<input type="text" id="cfg-cache-type-v" placeholder="e.g. q8_0" />
</label>
<label>
flash-attn
<input type="text" id="cfg-flash-attn" placeholder="on/off" />
</label>
<label>
temp
<input type="text" id="cfg-temp" placeholder="e.g. 0.7" />
</label>
<label>
top-k
<input type="text" id="cfg-top-k" placeholder="e.g. 40" />
</label>
<label>
top-p
<input type="text" id="cfg-top-p" placeholder="e.g. 0.9" />
</label>
<label>
repeat-penalty
<input type="text" id="cfg-repeat-penalty" placeholder="e.g. 1.1" />
</label>
<label>
repeat-last-n
<input type="text" id="cfg-repeat-last-n" placeholder="e.g. 256" />
</label>
<label>
frequency-penalty
<input type="text" id="cfg-frequency-penalty" placeholder="e.g. 0.1" />
</label>
<label>
presence-penalty
<input type="text" id="cfg-presence-penalty" placeholder="e.g. 0.0" />
</label>
<label class="config-wide">
extra args
<textarea id="cfg-extra-args" rows="3" placeholder="--mlock --no-mmap"></textarea>
</label>
<button type="submit" class="config-wide">Apply Parameters</button>
</form>
</div>
</section>
</main>
<section class="card logs-panel">
<div class="card-header">
<div>
<h3>llama.cpp Logs</h3>
<p class="lede small">Live tail from the llama.cpp container.</p>
</div>
<div class="log-actions">
<span id="logs-status" class="badge muted">Idle</span>
<button id="refresh-logs" class="ghost">Refresh</button>
</div>
</div>
<pre id="logs-output" class="log-output"></pre>
</section>
</div>
<script src="/ui/app.js"></script>
</body>
</html>

View File

@@ -0,0 +1,337 @@
:root {
--bg: #f5f6f8;
--panel: #ffffff;
--panel-muted: #f2f3f6;
--text: #111318;
--muted: #5b6472;
--border: rgba(17, 19, 24, 0.08);
--accent: #0a84ff;
--accent-ink: #005ad6;
--shadow: 0 20px 60px rgba(17, 19, 24, 0.08);
}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
font-family: "SF Pro Text", "SF Pro Display", "Helvetica Neue", "Segoe UI", sans-serif;
background: radial-gradient(circle at top, #ffffff 0%, var(--bg) 60%);
color: var(--text);
}
.page {
max-width: 1200px;
margin: 0 auto;
padding: 48px 28px 72px;
}
.topbar {
display: grid;
grid-template-columns: minmax(240px, 1.2fr) minmax(280px, 0.8fr);
gap: 32px;
align-items: stretch;
margin-bottom: 36px;
}
.header-actions {
display: grid;
gap: 16px;
justify-items: end;
}
.header-actions .quick-actions {
width: 100%;
}
.header-actions #theme-toggle {
justify-self: end;
}
.brand h1 {
font-size: clamp(2.2rem, 4vw, 3.2rem);
letter-spacing: -0.02em;
}
.eyebrow {
text-transform: uppercase;
letter-spacing: 0.2em;
font-size: 0.68rem;
color: var(--muted);
}
.lede {
margin-top: 12px;
font-size: 1rem;
color: var(--muted);
}
.lede.small {
font-size: 0.85rem;
}
.card {
background: var(--panel);
padding: 22px;
border-radius: 22px;
border: 1px solid var(--border);
box-shadow: var(--shadow);
}
.quick-actions h2 {
margin-bottom: 14px;
font-size: 1.1rem;
}
.layout {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
gap: 24px;
}
.column {
display: grid;
gap: 24px;
}
.logs-panel {
margin-top: 28px;
}
.card-header {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
margin-bottom: 16px;
}
.card-header h3 {
font-size: 1.1rem;
}
.log-actions {
display: flex;
align-items: center;
gap: 12px;
}
form {
display: grid;
gap: 12px;
}
label {
display: grid;
gap: 6px;
font-size: 0.85rem;
color: var(--muted);
}
input,
textarea,
button {
font: inherit;
}
input,
textarea {
padding: 10px 12px;
border-radius: 12px;
border: 1px solid var(--border);
background: #fff;
}
button {
border: none;
padding: 10px 16px;
border-radius: 12px;
background: var(--accent);
color: #fff;
font-weight: 600;
cursor: pointer;
transition: transform 0.2s ease, background 0.2s ease;
}
button:hover {
transform: translateY(-1px);
background: var(--accent-ink);
}
button.ghost {
background: transparent;
color: var(--accent);
border: 1px solid rgba(10, 132, 255, 0.4);
padding: 8px 12px;
}
.list {
list-style: none;
padding: 0;
margin: 0;
display: grid;
gap: 10px;
}
.list li {
padding: 12px;
border-radius: 14px;
background: var(--panel-muted);
border: 1px solid var(--border);
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
font-size: 0.85rem;
}
.list li.active {
border-color: rgba(10, 132, 255, 0.4);
background: #eef5ff;
}
.model-row {
display: flex;
align-items: center;
justify-content: space-between;
gap: 12px;
}
.badge {
display: inline-block;
padding: 4px 8px;
border-radius: 999px;
background: var(--accent);
color: #fff;
font-size: 0.7rem;
font-weight: 600;
}
.badge.muted {
background: rgba(17, 19, 24, 0.1);
color: var(--muted);
}
.status {
margin-bottom: 12px;
font-size: 0.9rem;
color: var(--muted);
}
.status.ok {
color: #1a7f37;
}
.status.error {
color: #b02a14;
}
.downloads {
display: grid;
gap: 12px;
}
.download-card {
border-radius: 16px;
border: 1px solid var(--border);
padding: 12px;
background: #f7f8fb;
}
.download-card strong {
display: block;
font-size: 0.9rem;
margin-bottom: 6px;
}
.progress {
height: 8px;
border-radius: 999px;
background: #dfe3ea;
overflow: hidden;
margin: 8px 0;
}
.progress > span {
display: block;
height: 100%;
background: var(--accent);
width: 0;
}
.error {
color: #b02a14;
font-size: 0.85rem;
}
.config-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
gap: 14px;
}
.config-wide {
grid-column: 1 / -1;
}
textarea {
padding: 10px 12px;
border-radius: 12px;
border: 1px solid var(--border);
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
font-size: 0.85rem;
resize: vertical;
}
.log-output {
background: #0f141b;
color: #dbe6f3;
padding: 16px;
border-radius: 16px;
min-height: 260px;
max-height: 420px;
overflow: auto;
font-size: 12px;
line-height: 1.6;
white-space: pre-wrap;
}
[data-theme="dark"] {
--bg: #0b0d12;
--panel: #141824;
--panel-muted: #1b2132;
--text: #f1f4f9;
--muted: #a5afc2;
--border: rgba(241, 244, 249, 0.1);
--accent: #4aa3ff;
--accent-ink: #1f7ae0;
--shadow: 0 20px 60px rgba(0, 0, 0, 0.4);
}
[data-theme="dark"] body {
background: radial-gradient(circle at top, #131826 0%, var(--bg) 60%);
}
[data-theme="dark"] .download-card {
background: #121826;
}
[data-theme="dark"] .progress {
background: #2a3349;
}
[data-theme="dark"] .log-output {
background: #080b12;
color: #d8e4f3;
}
@media (max-width: 900px) {
.topbar {
grid-template-columns: 1fr;
}
}
@media (max-width: 640px) {
.page {
padding: 32px 16px 48px;
}
}

View File

@@ -0,0 +1,74 @@
import asyncio
import logging
import time
from pathlib import Path
import httpx
log = logging.getLogger("llamacpp_warmup")
def _is_loading_error(response: httpx.Response) -> bool:
if response.status_code != 503:
return False
try:
payload = response.json()
except Exception:
return False
message = ""
if isinstance(payload, dict):
error = payload.get("error")
if isinstance(error, dict):
message = str(error.get("message") or "")
else:
message = str(payload.get("message") or "")
return "loading model" in message.lower()
def resolve_warmup_prompt(override: str | None, fallback_path: str) -> str:
if override:
prompt = override.strip()
if prompt:
return prompt
try:
prompt = Path(fallback_path).read_text(encoding="utf-8").strip()
if prompt:
return prompt
except Exception as exc:
log.warning("Failed to read warmup prompt from %s: %s", fallback_path, exc)
return "ok"
async def run_warmup(base_url: str, model_id: str, prompt: str, timeout_s: float) -> None:
payload = {
"model": model_id,
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 8,
"temperature": 0,
}
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
resp = await client.post("/v1/chat/completions", json=payload)
if resp.status_code == 503 and _is_loading_error(resp):
raise RuntimeError("llama.cpp still loading model")
resp.raise_for_status()
async def run_warmup_with_retry(
base_url: str,
model_id: str,
prompt: str,
timeout_s: float,
interval_s: float = 3.0,
) -> None:
deadline = time.time() + timeout_s
last_exc: Exception | None = None
while time.time() < deadline:
try:
await run_warmup(base_url, model_id, prompt, timeout_s=timeout_s)
return
except Exception as exc:
last_exc = exc
await asyncio.sleep(interval_s)
if last_exc:
raise last_exc
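
A short sketch of calling the warmup helpers directly, for example from a maintenance script; the base URL, model id, and prompt file below are placeholders:

import asyncio

from app.warmup import resolve_warmup_prompt, run_warmup_with_retry  # assumed module path

async def warm() -> None:
    prompt = resolve_warmup_prompt(None, "warmup_prompt.txt")  # falls back to "ok" if unreadable
    await run_warmup_with_retry(
        base_url="http://192.168.1.2:8071",  # placeholder llama.cpp URL
        model_id="example.gguf",             # placeholder model id
        prompt=prompt,
        timeout_s=300,
    )

if __name__ == "__main__":
    asyncio.run(warm())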

464
llamacpp_remote_test.ps1 Normal file
View File

@@ -0,0 +1,464 @@
param(
[Parameter(Mandatory = $true)][string]$Model,
[string]$BaseUrl = "http://192.168.1.2:8071",
[string]$PromptPath = "prompt_crwv.txt",
[int]$Runs = 3,
[int]$MaxTokens = 2000,
[int]$NumCtx = 131072,
[int]$TopK = 1,
[double]$TopP = 1.0,
[int]$Seed = 42,
[double]$RepeatPenalty = 1.05,
[double]$Temperature = 0,
[string]$JsonSchema = "",
[int]$TimeoutSec = 1800,
[string]$BatchId,
[switch]$EnableGpuMonitor = $true,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555,
[int]$GpuMonitorIntervalSec = 1,
[int]$GpuMonitorSeconds = 120
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
function Normalize-Strike([object]$value) {
if ($null -eq $value) { return $null }
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
}
return ($value.ToString().Trim())
}
function Get-AllowedLegs([string]$promptText) {
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
if (-not $match.Success) {
throw "Options Chain JSON block not found in prompt."
}
$chains = $match.Groups[1].Value | ConvertFrom-Json
$allowedExpiry = @{}
$allowedLegs = @{}
foreach ($exp in $chains) {
$expiry = [string]$exp.expiry
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
$allowedExpiry[$expiry] = $true
foreach ($leg in $exp.liquidSet) {
if ($null -eq $leg) { continue }
if ($leg.liquid -ne $true) { continue }
$side = [string]$leg.side
$strikeNorm = Normalize-Strike $leg.strike
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
$key = "$expiry|$side|$strikeNorm"
$allowedLegs[$key] = $true
}
}
}
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
$errors = New-Object System.Collections.Generic.List[string]
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
foreach ($key in $requiredTop) {
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
$errors.Add("Missing top-level key: $key")
}
}
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
}
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
}
} else {
$errors.Add("selectedExpiry is missing or empty")
}
if ($obj.confidenceScore -ne $null) {
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
$errors.Add("confidenceScore is not numeric")
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
$errors.Add("confidenceScore out of range 0-100")
}
}
if ($obj.recommendedTrades -eq $null) {
$errors.Add("recommendedTrades is null")
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
$errors.Add("recommendedTrades is not an array")
}
if ($obj.strategyBias -eq "NO_TRADE") {
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
}
} else {
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
$errors.Add("recommendedTrades must contain 1-3 trades")
}
}
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
$errors.Add("whyOthersRejected is not an array")
}
if ($obj.recommendedTrades) {
foreach ($trade in $obj.recommendedTrades) {
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
foreach ($tkey in $tradeRequired) {
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
$errors.Add("Trade missing key: $tkey")
}
}
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
}
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
}
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
$errors.Add("Trade legs missing or not an array")
continue
}
$legs = @($trade.legs)
$hasBuy = $false
$hasSell = $false
foreach ($leg in $trade.legs) {
$side = ([string]$leg.side).ToLowerInvariant()
$action = ([string]$leg.action).ToLowerInvariant()
$expiry = [string]$leg.expiry
$strikeNorm = Normalize-Strike $leg.strike
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
$key = "$expiry|$side|$strikeNorm"
if (-not $allowedLegs.ContainsKey($key)) {
$errors.Add("Leg not in liquid set: $key")
}
}
if ($action -eq "buy") { $hasBuy = $true }
if ($action -eq "sell") { $hasSell = $true }
}
if ($obj.selectedExpiry -and $legs) {
foreach ($leg in $legs) {
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
}
}
}
if ($hasSell -and -not $hasBuy) {
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
}
if ($trade.greekProfile) {
$gp = $trade.greekProfile
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
foreach ($gkey in $gpRequired) {
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
$errors.Add("Missing greekProfile.$gkey")
}
}
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
$errors.Add("ThetaExposure POS on all-long legs")
}
} else {
$errors.Add("Missing greekProfile")
}
$structure = ([string]$trade.structure).ToLowerInvariant()
$tradeName = ([string]$trade.name).ToLowerInvariant()
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
if ($isStraddle -or $isStrangle) {
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
$callLegs = $legs | Where-Object { $_.side -eq "call" }
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
$callStrike = Normalize-Strike $callLegs[0].strike
$putStrike = Normalize-Strike $putLegs[0].strike
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
if ($isStrangle) {
try {
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
} catch {
$errors.Add("Strangle strike comparison failed")
}
}
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
$errors.Add("Straddle/Strangle must be long (buy) legs")
}
}
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
}
}
if ($isCallDebit) {
$callLegs = $legs | Where-Object { $_.side -eq "call" }
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Call debit spread buy strike must be below sell strike")
}
} catch {
$errors.Add("Call debit spread strike comparison failed")
}
}
}
if ($isPutDebit) {
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Put debit spread buy strike must be above sell strike")
}
} catch {
$errors.Add("Put debit spread strike comparison failed")
}
}
}
}
}
return $errors
}
function Parse-GpuLog {
param([string]$Path)
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
if (-not (Test-Path $Path)) {
$summary.error = "gpu log missing"
return $summary
}
$lines = Get-Content -Path $Path
$currentIndex = -1
$gpuIndex = -1
$inUtilBlock = $false
foreach ($line in $lines) {
if ($line -match '^Timestamp') {
$gpuIndex = -1
$currentIndex = -1
$inUtilBlock = $false
continue
}
if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
$gpuIndex += 1
$currentIndex = $gpuIndex
$inUtilBlock = $false
continue
}
if ($line -match '^\s*Utilization\s*$') {
$inUtilBlock = $true
continue
}
if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
}
}
return $summary
}
$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs
if ([string]::IsNullOrWhiteSpace($BatchId)) {
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}
$outBase = Join-Path -Path (Get-Location) -ChildPath "llamacpp_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
$summary = [ordered]@{
model = $Model
baseUrl = $BaseUrl
batchId = $BatchId
params = [ordered]@{
temperature = $Temperature
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
max_tokens = $MaxTokens
num_ctx = $NumCtx
}
gpuMonitor = [ordered]@{
enabled = [bool]$EnableGpuMonitor
sshHost = $SshHost
sshPort = $SshPort
intervalSec = $GpuMonitorIntervalSec
durationSec = $GpuMonitorSeconds
}
modelMeta = $null
runs = @()
}
$schemaObject = $null
if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
try {
$schemaObject = $JsonSchema | ConvertFrom-Json
} catch {
throw "JsonSchema is not valid JSON: $($_.Exception.Message)"
}
}
try {
$modelsResponse = Invoke-RestMethod -Uri "$BaseUrl/v1/models" -TimeoutSec 30
$meta = $modelsResponse.data | Where-Object { $_.id -eq $Model } | Select-Object -First 1
if ($meta) { $summary.modelMeta = $meta.meta }
} catch {
$summary.modelMeta = @{ error = $_.Exception.Message }
}
for ($i = 1; $i -le $Runs; $i++) {
Write-Host "Running $Model (run $i/$Runs)"
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
$gpuJob = $null
$gpuLogPath = $null
if ($EnableGpuMonitor) {
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$gpuJob = Start-Job -ScriptBlock {
param($sshExe, $target, $port, $samples, $interval, $logPath)
for ($s = 1; $s -le $samples; $s++) {
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
try {
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
Add-Content -Path $logPath -Value $out
} catch {
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
}
Start-Sleep -Seconds $interval
}
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
Start-Sleep -Seconds 1
}
$body = @{
model = $Model
messages = @(@{ role = "user"; content = $prompt })
temperature = $Temperature
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
max_tokens = $MaxTokens
}
if ($schemaObject) {
$body.response_format = @{
type = "json_schema"
json_schema = @{
name = "trade_schema"
schema = $schemaObject
strict = $true
}
}
}
$body = $body | ConvertTo-Json -Depth 12
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/v1/chat/completions" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
} catch {
$runResult.errors = @("API error: $($_.Exception.Message)")
$summary.runs += $runResult
if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
continue
} finally {
if ($gpuJob) {
Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
Remove-Job -Job $gpuJob | Out-Null
}
}
$raw = [string]$resp.choices[0].message.content
$jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
Set-Content -Path $jsonPath -Value $raw -Encoding ASCII
try {
$parsed = $raw | ConvertFrom-Json
$errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
if ($errors.Count -eq 0) {
$runResult.ok = $true
} else {
$runResult.errors = $errors
}
} catch {
$runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
}
if ($gpuLogPath) {
$runResult.gpuLog = $gpuLogPath
$runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
}
if ($resp.timings) {
$runResult.timings = $resp.timings
}
if ($resp.usage) {
$runResult.usage = $resp.usage
}
$summary.runs += $runResult
}
$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII
$summary | ConvertTo-Json -Depth 6
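
For reference, the core of each run above is a single chat-completions call; a rough Python equivalent is sketched below (URL and model are placeholders, and the script's schema enforcement, GPU monitoring, and trade validation are omitted).

import requests

body = {
    "model": "example.gguf",  # placeholder model id
    "messages": [{"role": "user", "content": "prompt text"}],
    "temperature": 0,
    "top_k": 1,
    "top_p": 1.0,
    "seed": 42,
    "repeat_penalty": 1.05,
    "max_tokens": 2000,
}
resp = requests.post("http://192.168.1.2:8071/v1/chat/completions", json=body, timeout=1800)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])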

117
llamacpp_set_command.ps1 Normal file
View File

@@ -0,0 +1,117 @@
param(
[Parameter(Mandatory = $true)][string]$ModelPath,
[Parameter(Mandatory = $true)][int]$CtxSize,
[int]$BatchSize = 1024,
[int]$UBatchSize = 256,
[string]$TensorSplit = "0.5,0.5",
[string]$Devices = "0,1",
[int]$GpuLayers = 999,
[string]$CacheTypeK = "q4_0",
[string]$CacheTypeV = "q4_0",
[string]$GrammarFile = "",
[string]$JsonSchema = "",
[string]$BaseUrl = "http://192.168.1.2:8071",
[int]$TimeoutSec = 600,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
$commandArgs = @(
"--model", $ModelPath,
"--ctx-size", $CtxSize.ToString(),
"--n-gpu-layers", $GpuLayers.ToString(),
"--split-mode", "layer",
"--tensor-split", $TensorSplit,
"--batch-size", $BatchSize.ToString(),
"--ubatch-size", $UBatchSize.ToString(),
"--cache-type-k", $CacheTypeK,
"--cache-type-v", $CacheTypeV,
"--flash-attn", "on"
)
if (-not [string]::IsNullOrWhiteSpace($Devices)) {
$commandArgs = @("--device", $Devices) + $commandArgs
}
if (-not [string]::IsNullOrWhiteSpace($GrammarFile)) {
$commandArgs += @("--grammar-file", $GrammarFile)
}
if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
$commandArgs += @("--json-schema", $JsonSchema)
}
$argJson = $commandArgs | ConvertTo-Json -Compress
$py = @"
import json
path = r"/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
new_cmd = json.loads(r'''$argJson''')
lines = open(path, "r", encoding="utf-8").read().splitlines()
out = []
in_cmd = False
def yaml_quote(value):
text = str(value)
return "'" + text.replace("'", "''") + "'"
for line in lines:
if line.startswith('"command":'):
out.append('"command":')
for arg in new_cmd:
out.append(f"- {yaml_quote(arg)}")
in_cmd = True
continue
if in_cmd:
if line.startswith('"') and not line.startswith('"command":'):
in_cmd = False
out.append(line)
else:
continue
else:
out.append(line)
if in_cmd:
pass
open(path, "w", encoding="utf-8").write("\n".join(out) + "\n")
"@
$py | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -"
$pyCompose = @"
import json, yaml, subprocess
compose_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/templates/rendered/docker-compose.yaml"
user_config_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
with open(compose_path, "r", encoding="utf-8") as f:
compose = json.load(f)
with open(user_config_path, "r", encoding="utf-8") as f:
config = yaml.safe_load(f)
command = config.get("command")
if not command:
raise SystemExit("command list missing from user_config")
svc = compose["services"]["llamacpp"]
svc["command"] = command
with open(compose_path, "w", encoding="utf-8") as f:
json.dump(compose, f)
payload = {"custom_compose_config": compose}
subprocess.run(["midclt", "call", "app.update", "llamacpp", json.dumps(payload)], check=True)
"@
$pyCompose | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -" | Out-Null
$start = Get-Date
while ((Get-Date) - $start -lt [TimeSpan]::FromSeconds($TimeoutSec)) {
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/health" -TimeoutSec 10
if ($resp.status -eq "ok") {
Write-Host "llamacpp healthy at $BaseUrl"
exit 0
}
} catch {
Start-Sleep -Seconds 5
}
}
throw "Timed out waiting for llama.cpp server at $BaseUrl"

View File

@@ -0,0 +1,14 @@
FROM deepseek-r1:14b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42

View File

@@ -0,0 +1,14 @@
FROM llama3.1:70b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42

View File

@@ -0,0 +1,14 @@
FROM phi3:mini-128k
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
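The three Modelfiles above differ only in their `FROM` line. Assuming they are saved under names such as `Modelfile.deepseek-r1-14b` (hypothetical; the original filenames are not shown in this diff), they can be registered with the Ollama CLI roughly as follows:

```powershell
# Hypothetical file and tag names.
ollama create deepseek-r1-14b-trader -f .\Modelfile.deepseek-r1-14b
ollama run deepseek-r1-14b-trader "ping"   # quick check that the baked-in system prompt is applied
```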

561
ollama_remote_test.ps1 Normal file
View File

@@ -0,0 +1,561 @@
param(
[Parameter(Mandatory = $true)][string]$Model,
[string]$BaseUrl = "http://192.168.1.2:30068",
[string]$PromptPath = "prompt_crwv.txt",
[int]$Runs = 3,
[int]$NumPredict = 1200,
[int]$NumCtx = 131072,
[int]$NumBatch = 0,
[int]$NumGpuLayers = 0,
[int]$TimeoutSec = 900,
[int]$TopK = 1,
[double]$TopP = 1.0,
[int]$Seed = 42,
[double]$RepeatPenalty = 1.05,
[string]$BatchId,
[switch]$UseSchemaFormat = $false,
[switch]$EnableGpuMonitor = $true,
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
[switch]$CheckProcessor = $true,
[string]$SshUser = "rushabh",
[string]$SshHost = "192.168.1.2",
[int]$SshPort = 55555,
[int]$GpuMonitorIntervalSec = 1,
[int]$GpuMonitorSeconds = 120
)
$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
function Normalize-Strike([object]$value) {
if ($null -eq $value) { return $null }
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
}
return ($value.ToString().Trim())
}
function Get-AllowedLegs([string]$promptText) {
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
if (-not $match.Success) {
throw "Options Chain JSON block not found in prompt."
}
$chains = $match.Groups[1].Value | ConvertFrom-Json
$allowedExpiry = @{}
$allowedLegs = @{}
foreach ($exp in $chains) {
$expiry = [string]$exp.expiry
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
$allowedExpiry[$expiry] = $true
foreach ($leg in $exp.liquidSet) {
if ($null -eq $leg) { continue }
if ($leg.liquid -ne $true) { continue }
$side = [string]$leg.side
$strikeNorm = Normalize-Strike $leg.strike
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
$key = "$expiry|$side|$strikeNorm"
$allowedLegs[$key] = $true
}
}
}
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
$errors = New-Object System.Collections.Generic.List[string]
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
foreach ($key in $requiredTop) {
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
$errors.Add("Missing top-level key: $key")
}
}
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
}
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
}
} else {
$errors.Add("selectedExpiry is missing or empty")
}
if ($obj.confidenceScore -ne $null) {
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
$errors.Add("confidenceScore is not numeric")
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
$errors.Add("confidenceScore out of range 0-100")
}
}
if ($obj.recommendedTrades -eq $null) {
$errors.Add("recommendedTrades is null")
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
$errors.Add("recommendedTrades is not an array")
}
if ($obj.strategyBias -eq "NO_TRADE") {
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
}
} else {
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
$errors.Add("recommendedTrades must contain 1-3 trades")
}
}
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
$errors.Add("whyOthersRejected is not an array")
}
if ($obj.recommendedTrades) {
foreach ($trade in $obj.recommendedTrades) {
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
foreach ($tkey in $tradeRequired) {
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
$errors.Add("Trade missing key: $tkey")
}
}
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
}
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
}
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
$errors.Add("Trade legs missing or not an array")
continue
}
$legs = @($trade.legs)
$hasBuy = $false
$hasSell = $false
foreach ($leg in $trade.legs) {
$side = ([string]$leg.side).ToLowerInvariant()
$action = ([string]$leg.action).ToLowerInvariant()
$expiry = [string]$leg.expiry
$strikeNorm = Normalize-Strike $leg.strike
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
$key = "$expiry|$side|$strikeNorm"
if (-not $allowedLegs.ContainsKey($key)) {
$errors.Add("Leg not in liquid set: $key")
}
}
if ($action -eq "buy") { $hasBuy = $true }
if ($action -eq "sell") { $hasSell = $true }
}
if ($obj.selectedExpiry -and $legs) {
foreach ($leg in $legs) {
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
}
}
}
if ($hasSell -and -not $hasBuy) {
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
}
if ($trade.greekProfile) {
$gp = $trade.greekProfile
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
foreach ($gkey in $gpRequired) {
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
$errors.Add("Missing greekProfile.$gkey")
}
}
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
$errors.Add("ThetaExposure POS on all-long legs")
}
} else {
$errors.Add("Missing greekProfile")
}
$structure = ([string]$trade.structure).ToLowerInvariant()
$tradeName = ([string]$trade.name).ToLowerInvariant()
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
if ($isStraddle -or $isStrangle) {
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
$callLegs = $legs | Where-Object { $_.side -eq "call" }
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
$callStrike = Normalize-Strike $callLegs[0].strike
$putStrike = Normalize-Strike $putLegs[0].strike
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
if ($isStrangle) {
try {
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
} catch {
$errors.Add("Strangle strike comparison failed")
}
}
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
$errors.Add("Straddle/Strangle must be long (buy) legs")
}
}
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
}
}
if ($isCallDebit) {
$callLegs = $legs | Where-Object { $_.side -eq "call" }
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Call debit spread buy strike must be below sell strike")
}
} catch {
$errors.Add("Call debit spread strike comparison failed")
}
}
}
if ($isPutDebit) {
$putLegs = $legs | Where-Object { $_.side -eq "put" }
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
try {
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
$errors.Add("Put debit spread buy strike must be above sell strike")
}
} catch {
$errors.Add("Put debit spread strike comparison failed")
}
}
}
}
}
return $errors
}
function Parse-GpuLog {
param([string]$Path)
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
if (-not (Test-Path $Path)) {
$summary.error = "gpu log missing"
return $summary
}
$lines = Get-Content -Path $Path
$currentIndex = -1
$gpuIndex = -1
$inGpuUtilSamples = $false
$inUtilBlock = $false
foreach ($line in $lines) {
if ($line -match '^Timestamp') {
$gpuIndex = -1
$currentIndex = -1
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
$gpuIndex += 1
$currentIndex = $gpuIndex
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($line -match '^\s*Utilization\s*$') {
$inUtilBlock = $true
continue
}
if ($line -match '^\s*GPU Utilization Samples') {
$inGpuUtilSamples = $true
$inUtilBlock = $false
continue
}
if ($line -match '^\s*(Memory|ENC|DEC) Utilization Samples') {
$inGpuUtilSamples = $false
$inUtilBlock = $false
continue
}
if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
continue
}
if ($inGpuUtilSamples -and $line -match '^\s*Max\s*:\s*([0-9]+)\s*%') {
$util = [int]$Matches[1]
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
$summary.samples += 1
}
}
return $summary
}
function Get-ProcessorShare {
param(
[string]$SshExePath,
[string]$Target,
[int]$Port,
[string]$ModelName
)
$result = [ordered]@{ cpuPct = $null; gpuPct = $null; raw = $null; error = $null }
try {
$out = & $SshExePath -p $Port $Target "sudo -n docker exec ix-ollama-ollama-1 ollama ps"
$line = $out | Select-String -SimpleMatch $ModelName | Select-Object -First 1
if ($null -eq $line) {
$result.error = "model not found in ollama ps"
return $result
}
$raw = $line.ToString().Trim()
$result.raw = $raw
if ($raw -match '([0-9]+)%/([0-9]+)%\s+CPU/GPU') {
$result.cpuPct = [int]$Matches[1]
$result.gpuPct = [int]$Matches[2]
} elseif ($raw -match '([0-9]+)%\s+GPU') {
$result.cpuPct = 0
$result.gpuPct = [int]$Matches[1]
} else {
$result.error = "CPU/GPU split not parsed"
}
} catch {
$result.error = $_.Exception.Message
}
return $result
}
$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs
if ([string]::IsNullOrWhiteSpace($BatchId)) {
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}
$outBase = Join-Path -Path (Get-Location) -ChildPath "ollama_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
$summary = [ordered]@{
model = $Model
baseUrl = $BaseUrl
formatMode = $(if ($UseSchemaFormat) { "schema" } else { "json" })
batchId = $BatchId
gpuMonitor = [ordered]@{
enabled = [bool]$EnableGpuMonitor
sshHost = $SshHost
sshPort = $SshPort
intervalSec = $GpuMonitorIntervalSec
durationSec = $GpuMonitorSeconds
}
runs = @()
}
for ($i = 1; $i -le $Runs; $i++) {
Write-Host "Running $Model (run $i/$Runs)"
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
$gpuJob = $null
$gpuLogPath = $null
if ($EnableGpuMonitor) {
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$gpuJob = Start-Job -ScriptBlock {
param($sshExe, $target, $port, $samples, $interval, $logPath)
for ($s = 1; $s -le $samples; $s++) {
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
try {
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
Add-Content -Path $logPath -Value $out
} catch {
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
}
Start-Sleep -Seconds $interval
}
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
Start-Sleep -Seconds 1
}
$format = "json"
if ($UseSchemaFormat) {
$format = @{
type = "object"
additionalProperties = $false
required = @("selectedExpiry","expiryRationale","strategyBias","recommendedTrades","whyOthersRejected","confidenceScore")
properties = @{
selectedExpiry = @{ type = "string"; minLength = 1 }
expiryRationale = @{ type = "string"; minLength = 1 }
strategyBias = @{ type = "string"; enum = @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE") }
recommendedTrades = @{
type = "array"
minItems = 0
maxItems = 3
items = @{
type = "object"
additionalProperties = $false
required = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
properties = @{
name = @{ type = "string"; minLength = 1 }
structure = @{ type = "string"; minLength = 1 }
legs = @{
type = "array"
minItems = 1
maxItems = 4
items = @{
type = "object"
additionalProperties = $false
required = @("side","action","strike","expiry")
properties = @{
side = @{ type = "string"; enum = @("call","put") }
action = @{ type = "string"; enum = @("buy","sell") }
strike = @{ type = @("number","string") }
expiry = @{ type = "string"; minLength = 1 }
}
}
}
greekProfile = @{
type = "object"
additionalProperties = $false
required = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
properties = @{
deltaBias = @{ type = "string"; enum = @("POS","NEG","NEUTRAL") }
gammaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
thetaExposure = @{ type = "string"; enum = @("POS","NEG","LOW") }
vegaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
}
}
maxRisk = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
maxReward = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
thesisAlignment = @{ type = "string"; minLength = 1 }
invalidation = @{ type = "string"; minLength = 1 }
managementNotes = @{ type = "string" }
}
}
}
whyOthersRejected = @{
type = "array"
items = @{ type = "string" }
}
confidenceScore = @{ type = "number"; minimum = 0; maximum = 100 }
}
}
}
$options = @{
temperature = 0
top_k = $TopK
top_p = $TopP
seed = $Seed
repeat_penalty = $RepeatPenalty
num_ctx = $NumCtx
num_predict = $NumPredict
}
if ($NumBatch -gt 0) {
$options.num_batch = $NumBatch
}
if ($NumGpuLayers -gt 0) {
$options.num_gpu = $NumGpuLayers  # Ollama's generate API uses num_gpu for the offloaded layer count
}
$body = @{
model = $Model
prompt = $prompt
format = $format
stream = $false
options = $options
} | ConvertTo-Json -Depth 10
try {
$resp = Invoke-RestMethod -Uri "$BaseUrl/api/generate" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
} catch {
$runResult.errors = @("API error: $($_.Exception.Message)")
$summary.runs += $runResult
if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
continue
} finally {
if ($gpuJob) {
Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
Remove-Job -Job $gpuJob | Out-Null
}
}
$raw = [string]$resp.response
$jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
Set-Content -Path $jsonPath -Value $raw -Encoding ASCII
try {
$parsed = $raw | ConvertFrom-Json
$errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
if ($errors.Count -eq 0) {
$runResult.ok = $true
} else {
$runResult.errors = $errors
}
} catch {
$runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
}
if ($gpuLogPath) {
$runResult.gpuLog = $gpuLogPath
$runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
}
if ($CheckProcessor) {
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
$proc = Get-ProcessorShare -SshExePath $SshExe -Target $sshTarget -Port $SshPort -ModelName $Model
$runResult.processor = $proc
if ($proc.cpuPct -ne $null) {
$runResult.gpuOnly = ($proc.cpuPct -eq 0)
}
}
$summary.runs += $runResult
}
$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII
$summary | ConvertTo-Json -Depth 6
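A typical invocation (model tag and batch id are illustrative) looks like the sketch below; the `-UseSchemaFormat` switch swaps Ollama's plain `json` format for the full JSON-schema format built above:

```powershell
# Hypothetical run against the remote Ollama endpoint configured via -BaseUrl.
.\ollama_remote_test.ps1 `
    -Model "deepseek-r1:14b" `
    -Runs 3 `
    -UseSchemaFormat `
    -BatchId "smoke"
```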

155
prompt_crwv.txt Normal file

File diff suppressed because one or more lines are too long

1
query.sql Normal file
View File

@@ -0,0 +1 @@
SELECT p.title, p.privacy FROM playlists p JOIN users u ON p.author = u.email WHERE u.email = 'rushabh';

8
requirements.txt Normal file
View File

@@ -0,0 +1,8 @@
fastapi==0.115.6
uvicorn==0.30.6
httpx==0.27.2
pytest==8.3.3
respx==0.21.1
pytest-asyncio==0.24.0
PyYAML==6.0.3
websockets==12.0

View File

@@ -0,0 +1,116 @@
import argparse
import asyncio
import json
import ssl
from typing import Any, Dict, List, Optional
import websockets
async def _rpc_call(ws_url: str, api_key: str, method: str, params: Optional[list] = None, verify_ssl: bool = False) -> Any:
ssl_ctx = None
if ws_url.startswith("wss://") and not verify_ssl:
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(ws_url, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect to TrueNAS websocket")
await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [api_key]}))
auth_resp = json.loads(await ws.recv())
if not auth_resp.get("result"):
raise RuntimeError("API key authentication failed")
req_id = 2
await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != req_id:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
async def main() -> None:
parser = argparse.ArgumentParser()
parser.add_argument("--ws-url", required=True)
parser.add_argument("--api-key", required=True)
parser.add_argument("--api-user")
parser.add_argument("--app-name", required=True)
parser.add_argument("--image", required=True)
parser.add_argument("--model-host-path", required=True)
parser.add_argument("--llamacpp-base-url", required=True)
parser.add_argument("--network", required=True)
parser.add_argument("--api-port", type=int, default=9091)
parser.add_argument("--ui-port", type=int, default=9092)
parser.add_argument("--verify-ssl", action="store_true")
args = parser.parse_args()
api_port = args.api_port
ui_port = args.ui_port
env = {
"PORT_A": str(api_port),
"PORT_B": str(ui_port),
"LLAMACPP_BASE_URL": args.llamacpp_base_url,
"MODEL_DIR": "/models",
"TRUENAS_WS_URL": args.ws_url,
"TRUENAS_API_KEY": args.api_key,
"TRUENAS_APP_NAME": "llamacpp",
"TRUENAS_VERIFY_SSL": "false",
}
if args.api_user:
env["TRUENAS_API_USER"] = args.api_user
compose = {
"services": {
"wrapper": {
"image": args.image,
"restart": "unless-stopped",
"ports": [
f"{api_port}:{api_port}",
f"{ui_port}:{ui_port}",
],
"environment": env,
"volumes": [
f"{args.model_host_path}:/models",
"/var/run/docker.sock:/var/run/docker.sock",
],
"networks": ["llamacpp_net"],
}
},
"networks": {
"llamacpp_net": {"external": True, "name": args.network}
},
}
create_payload = {
"custom_app": True,
"app_name": args.app_name,
"custom_compose_config": compose,
}
existing = await _rpc_call(args.ws_url, args.api_key, "app.query", [[["id", "=", args.app_name]]], args.verify_ssl)
if existing:
result = await _rpc_call(
args.ws_url,
args.api_key,
"app.update",
[args.app_name, {"custom_compose_config": compose}],
args.verify_ssl,
)
action = "updated"
else:
result = await _rpc_call(args.ws_url, args.api_key, "app.create", [create_payload], args.verify_ssl)
action = "created"
print(json.dumps({"action": action, "api_port": api_port, "ui_port": ui_port, "result": result}, indent=2))
if __name__ == "__main__":
asyncio.run(main())
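Assuming this deployment helper is saved as, say, `deploy_wrapper_app.py` (hypothetical; the filename is not shown in this diff), it could be driven from PowerShell along these lines. The API key, image tag, app name, and host model path are placeholders:

```powershell
# Hypothetical invocation; registers (or updates) the wrapper as a custom TrueNAS app.
python .\deploy_wrapper_app.py `
    --ws-url "wss://192.168.1.2/websocket" `
    --api-key $env:TRUENAS_API_KEY `
    --app-name "llamacpp-wrapper" `
    --image "llamacpp-wrapper:latest" `
    --model-host-path "/mnt/tank/models" `
    --llamacpp-base-url "http://192.168.1.2:8071" `
    --network "ix-llamacpp_default" `
    --api-port 9093 --ui-port 9094
```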

View File

@@ -0,0 +1,162 @@
import json
import os
import time
from datetime import datetime
import requests
BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9000")
UPSTREAM = os.getenv("LLAMACPP_BASE", "http://192.168.1.2:8071")
RUNS = int(os.getenv("RUNS", "100"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4"))
TIMEOUT = int(os.getenv("REQ_TIMEOUT", "300"))
def _now():
return datetime.utcnow().isoformat() + "Z"
def _get_loaded_model_id():
deadline = time.time() + 600
last_error = None
while time.time() < deadline:
try:
resp = requests.get(UPSTREAM + "/v1/models", timeout=30)
resp.raise_for_status()
data = resp.json().get("data") or []
if data:
return data[0].get("id")
last_error = "no models reported by upstream"
except Exception as exc:
last_error = str(exc)
time.sleep(5)
raise RuntimeError(f"upstream not ready: {last_error}")
def _stream_ok(resp):
got_data = False
got_done = False
for line in resp.iter_lines(decode_unicode=True):
if not line:
continue
if line.startswith("data:"):
got_data = True
if line.strip() == "data: [DONE]":
got_done = True
break
return got_data, got_done
def run_suite(model_id, idx):
results = {}
# Models
r = requests.get(BASE + "/v1/models", timeout=30)
results["models"] = r.status_code
r = requests.get(BASE + f"/v1/models/{model_id}", timeout=30)
results["model_get"] = r.status_code
# Chat completions non-stream
payload = {
"model": model_id,
"messages": [{"role": "user", "content": f"Run {idx}: say ok."}],
"max_tokens": MAX_TOKENS,
"temperature": (idx % 5) / 10.0,
}
r = requests.post(BASE + "/v1/chat/completions", json=payload, timeout=TIMEOUT)
results["chat"] = r.status_code
# Chat completions stream
payload_stream = dict(payload)
payload_stream["stream"] = True
r = requests.post(BASE + "/v1/chat/completions", json=payload_stream, stream=True, timeout=TIMEOUT)
ok_data, ok_done = _stream_ok(r)
results["chat_stream"] = r.status_code
results["chat_stream_ok"] = ok_data and ok_done
# Responses non-stream
payload_resp = {
"model": model_id,
"input": f"Run {idx}: say ok.",
"max_output_tokens": MAX_TOKENS,
}
r = requests.post(BASE + "/v1/responses", json=payload_resp, timeout=TIMEOUT)
results["responses"] = r.status_code
# Responses stream
payload_resp_stream = {
"model": model_id,
"input": f"Run {idx}: say ok.",
"stream": True,
}
r = requests.post(BASE + "/v1/responses", json=payload_resp_stream, stream=True, timeout=TIMEOUT)
ok_data, ok_done = _stream_ok(r)
results["responses_stream"] = r.status_code
results["responses_stream_ok"] = ok_data and ok_done
# Embeddings (best effort)
payload_emb = {"model": model_id, "input": f"Run {idx}"}
r = requests.post(BASE + "/v1/embeddings", json=payload_emb, timeout=TIMEOUT)
results["embeddings"] = r.status_code
# Proxy
r = requests.post(BASE + "/proxy/llamacpp/v1/chat/completions", json=payload, timeout=TIMEOUT)
results["proxy"] = r.status_code
return results
def main():
summary = {
"started_at": _now(),
"base": BASE,
"upstream": UPSTREAM,
"runs": RUNS,
"max_tokens": MAX_TOKENS,
"results": [],
}
model_id = _get_loaded_model_id()
summary["model_id"] = model_id
for i in range(1, RUNS + 1):
start = time.time()
try:
results = run_suite(model_id, i)
ok = all(
results.get(key) == 200
for key in ("models", "model_get", "chat", "chat_stream", "responses", "responses_stream", "proxy")
)
stream_ok = results.get("chat_stream_ok") and results.get("responses_stream_ok")
summary["results"].append({
"run": i,
"ok": ok and stream_ok,
"stream_ok": stream_ok,
"status": results,
"elapsed_s": round(time.time() - start, 2),
})
except Exception as exc:
summary["results"].append({
"run": i,
"ok": False,
"stream_ok": False,
"error": str(exc),
"elapsed_s": round(time.time() - start, 2),
})
print(f"Run {i}/{RUNS} done")
summary["finished_at"] = _now()
os.makedirs("reports", exist_ok=True)
out_path = os.path.join("reports", "remote_wrapper_test.json")
with open(out_path, "w", encoding="utf-8") as f:
json.dump(summary, f, indent=2)
# Print a compact summary
ok_count = sum(1 for r in summary["results"] if r.get("ok"))
print(f"OK {ok_count}/{RUNS}")
if __name__ == "__main__":
main()
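The soak test above is configured entirely through environment variables; a hedged example of running it from PowerShell (the script filename is assumed, and the wrapper port here overrides the script's own default of 9000):

```powershell
# Hypothetical run; results are written to reports\remote_wrapper_test.json.
$env:WRAPPER_BASE  = "http://192.168.1.2:9093"
$env:LLAMACPP_BASE = "http://192.168.1.2:8071"
$env:RUNS          = "10"
$env:MAX_TOKENS    = "4"
python .\remote_wrapper_test.py
```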

View File

@@ -0,0 +1,29 @@
param(
[string]$OutDocs = "reports\\llamacpp_docs.md",
[string]$OutFlags = "reports\\llamacpp_flags.txt"
)
$urls = @(
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README.md",
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README-llama-server.md",
"https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
)
$out = @()
foreach ($u in $urls) {
try {
$content = Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30
$out += "# Source: $u"
$out += $content.Content
} catch {
$out += "# Source: $u"
$out += "(failed to fetch)"
}
}
$out | Set-Content -Encoding UTF8 $OutDocs
$docs = Get-Content $OutDocs -Raw
$flags = [regex]::Matches($docs, "--[a-zA-Z0-9\\-]+") | ForEach-Object { $_.Value }
$flags = $flags | Sort-Object -Unique
$flags | Set-Content -Encoding UTF8 $OutFlags

61
tests/conftest.py Normal file
View File

@@ -0,0 +1,61 @@
import json
import os
from pathlib import Path
import pytest
from fastapi.testclient import TestClient
import respx
from app.api_app import create_api_app
from app.ui_app import create_ui_app
@pytest.fixture()
def agents_config(tmp_path: Path) -> Path:
data = {
"image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
"container_name": "ix-llamacpp-llamacpp-1",
"host_port": 8071,
"container_port": 8080,
"web_ui_url": "http://0.0.0.0:8071/",
"model_host_path": str(tmp_path),
"model_container_path": str(tmp_path),
"models": [],
"network": "ix-llamacpp_default",
"subnets": ["172.16.18.0/24"],
"gpu_count": 2,
"gpu_name": "NVIDIA RTX 5060 Ti",
}
path = tmp_path / "agents_config.json"
path.write_text(json.dumps(data), encoding="utf-8")
return path
@pytest.fixture()
def model_dir(tmp_path: Path) -> Path:
(tmp_path / "model-a.gguf").write_text("x", encoding="utf-8")
(tmp_path / "model-b.gguf").write_text("y", encoding="utf-8")
return tmp_path
@pytest.fixture()
def api_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
monkeypatch.setenv("MODEL_DIR", str(model_dir))
monkeypatch.setenv("LLAMACPP_BASE_URL", "http://llama.test")
app = create_api_app()
return TestClient(app)
@pytest.fixture()
def ui_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
monkeypatch.setenv("MODEL_DIR", str(model_dir))
app = create_ui_app()
return TestClient(app)
@pytest.fixture()
def respx_mock():
with respx.mock(assert_all_called=False) as mock:
yield mock
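The fixtures above stub out both the agents config and the upstream llama.cpp server (via respx), so the unit-test suite can run fully offline; a minimal sketch, assuming Python and pip are on PATH:

```powershell
# Run from the wrapper project root.
pip install -r requirements.txt
python -m pytest tests -q
```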

View File

@@ -0,0 +1,77 @@
import json
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_non_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
respx_mock.post("http://llama.test/v1/chat/completions").mock(
return_value=httpx.Response(200, json={"id": f"chatcmpl-{case}", "choices": [{"message": {"content": "ok"}}]})
)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": f"hello {case}"}],
"temperature": (case % 10) / 10,
}
resp = api_client.post("/v1/chat/completions", json=payload)
assert resp.status_code == 200
data = resp.json()
assert data["choices"][0]["message"]["content"] == "ok"
@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def stream_response(request):
content = b"data: {\"id\": \"chunk\"}\n\n"
return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": f"hello {case}"}],
"stream": True,
}
with api_client.stream("POST", "/v1/chat/completions", json=payload) as resp:
assert resp.status_code == 200
body = b"".join(resp.iter_bytes())
assert b"data:" in body
def test_chat_completions_tools_normalize(api_client, respx_mock):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def handler(request):
data = request.json()
tools = data.get("tools") or []
assert tools
assert tools[0].get("function", {}).get("name") == "format_final_json_response"
return httpx.Response(200, json={"id": "chatcmpl-tools", "choices": [{"message": {"content": "ok"}}]})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=handler)
payload = {
"model": "model-a.gguf",
"messages": [{"role": "user", "content": "hello"}],
"tools": [
{
"type": "function",
"name": "format_final_json_response",
"parameters": {"type": "object"},
}
],
"tool_choice": {"type": "function", "name": "format_final_json_response"},
}
resp = api_client.post("/v1/chat/completions", json=payload)
assert resp.status_code == 200

14
tests/test_embeddings.py Normal file
View File

@@ -0,0 +1,14 @@
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_embeddings(api_client, respx_mock, case):
respx_mock.post("http://llama.test/v1/embeddings").mock(
return_value=httpx.Response(200, json={"data": [{"embedding": [0.1, 0.2]}]})
)
payload = {"model": "model-a.gguf", "input": f"text-{case}"}
resp = api_client.post("/v1/embeddings", json=payload)
assert resp.status_code == 200
data = resp.json()
assert "data" in data

24
tests/test_models.py Normal file
View File

@@ -0,0 +1,24 @@
import pytest
@pytest.mark.parametrize("case", list(range(120)))
def test_list_models_cases(api_client, case):
resp = api_client.get("/v1/models", headers={"x-case": str(case)})
assert resp.status_code == 200
payload = resp.json()
assert payload["object"] == "list"
assert isinstance(payload["data"], list)
@pytest.mark.parametrize("model_id", [f"model-a.gguf" for _ in range(120)])
def test_get_model_ok(api_client, model_id):
resp = api_client.get(f"/v1/models/{model_id}")
assert resp.status_code == 200
payload = resp.json()
assert payload["id"] == model_id
@pytest.mark.parametrize("model_id", [f"missing-{i}" for i in range(120)])
def test_get_model_not_found(api_client, model_id):
resp = api_client.get(f"/v1/models/{model_id}")
assert resp.status_code == 404

12
tests/test_proxy.py Normal file
View File

@@ -0,0 +1,12 @@
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_proxy_passthrough(api_client, respx_mock, case):
respx_mock.post("http://llama.test/test/path").mock(
return_value=httpx.Response(200, content=f"ok-{case}".encode())
)
resp = api_client.post("/proxy/llamacpp/test/path", content=b"hello")
assert resp.status_code == 200
assert resp.content.startswith(b"ok-")

View File

@@ -0,0 +1,283 @@
import asyncio
import json
import os
import ssl
import time
from typing import Dict, List
import pytest
import requests
import websockets
WRAPPER_BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9093")
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
TRUENAS_WS_URL = os.getenv("TRUENAS_WS_URL", "wss://192.168.1.2/websocket")
TRUENAS_API_KEY = os.getenv("TRUENAS_API_KEY", "")
TRUENAS_APP_NAME = os.getenv("TRUENAS_APP_NAME", "llamacpp")
MODEL_REQUEST = os.getenv("MODEL_REQUEST", "")
async def _rpc_call(method: str, params: List | None = None):
if not TRUENAS_API_KEY:
pytest.skip("TRUENAS_API_KEY not set")
ssl_ctx = ssl.create_default_context()
ssl_ctx.check_hostname = False
ssl_ctx.verify_mode = ssl.CERT_NONE
async with websockets.connect(TRUENAS_WS_URL, ssl=ssl_ctx) as ws:
await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
connected = json.loads(await ws.recv())
if connected.get("msg") != "connected":
raise RuntimeError("failed to connect")
await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [TRUENAS_API_KEY]}))
auth = json.loads(await ws.recv())
if not auth.get("result"):
raise RuntimeError("auth failed")
await ws.send(json.dumps({"id": 2, "msg": "method", "method": method, "params": params or []}))
while True:
raw = json.loads(await ws.recv())
if raw.get("id") != 2:
continue
if raw.get("msg") == "error":
raise RuntimeError(raw.get("error"))
return raw.get("result")
def _get_models() -> List[str]:
_wait_for_http(WRAPPER_BASE + "/health")
resp = requests.get(WRAPPER_BASE + "/v1/models", timeout=30)
resp.raise_for_status()
data = resp.json().get("data") or []
return [m.get("id") for m in data if m.get("id")]
def _assert_chat_ok(resp_json: Dict) -> str:
choices = resp_json.get("choices") or []
assert choices, "no choices"
message = choices[0].get("message") or {}
text = message.get("content") or ""
assert text.strip(), "empty content"
return text
def _wait_for_http(url: str, timeout_s: float = 90) -> None:
deadline = time.time() + timeout_s
last_err = None
while time.time() < deadline:
try:
resp = requests.get(url, timeout=5)
if resp.status_code == 200:
return
last_err = f"status {resp.status_code}"
except Exception as exc:
last_err = str(exc)
time.sleep(2)
raise RuntimeError(f"service not ready: {url} ({last_err})")
def _post_with_retry(url: str, payload: Dict, timeout_s: float = 300, retries: int = 6, delay_s: float = 5.0):
last = None
for _ in range(retries):
try:
resp = requests.post(url, json=payload, timeout=timeout_s)
if resp.status_code == 200:
return resp
last = resp
except requests.exceptions.RequestException as exc:
last = exc
time.sleep(delay_s)
if isinstance(last, Exception):
raise last
return last
@pytest.mark.asyncio
async def test_active_model_and_multi_gpu_flags():
cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
command = cfg.get("command") or []
assert "--model" in command
assert "--tensor-split" in command
split_idx = command.index("--tensor-split") + 1
split = command[split_idx]
assert "," in split, f"tensor-split missing commas: {split}"
assert "--split-mode" in command
def test_models_listed():
models = _get_models()
assert models, "no models discovered"
def test_chat_completions_switch_and_prompts():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK."}],
"max_tokens": 12,
"temperature": 0,
}
for _ in range(3):
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_tools_flat_format():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK and do not call tools."}],
"tools": [
{
"type": "function",
"name": "format_final_json_response",
"description": "format output",
"parameters": {
"type": "object",
"properties": {"ok": {"type": "boolean"}},
"required": ["ok"],
},
}
],
"max_tokens": 12,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_functions_payload_normalized():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Say OK and do not call tools."}],
"functions": [
{
"name": "format_final_json_response",
"description": "format output",
"parameters": {
"type": "object",
"properties": {"ok": {"type": "boolean"}},
"required": ["ok"],
},
}
],
"max_tokens": 12,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
_assert_chat_ok(resp.json())
def test_return_format_json():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"messages": [{"role": "user", "content": "Return JSON with key ok true."}],
"return_format": "json",
"max_tokens": 32,
"temperature": 0,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
assert resp.status_code == 200
text = _assert_chat_ok(resp.json())
parsed = json.loads(text)
assert isinstance(parsed, dict)
def test_responses_endpoint():
models = _get_models()
assert models, "no models"
if MODEL_REQUEST:
assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
model_id = MODEL_REQUEST
else:
model_id = models[0]
payload = {
"model": model_id,
"input": "Say OK.",
"max_output_tokens": 16,
}
resp = _post_with_retry(WRAPPER_BASE + "/v1/responses", payload)
assert resp.status_code == 200
output = resp.json().get("output") or []
assert output, "responses output empty"
content = output[0].get("content") or []
text = content[0].get("text") if content else ""
assert text and text.strip()
@pytest.mark.asyncio
async def test_model_switch_applied_to_truenas():
models = _get_models()
assert models, "no models"
target = MODEL_REQUEST or models[0]
assert target in models, f"MODEL_REQUEST not found: {target}"
resp = requests.post(UI_BASE + "/ui/api/switch-model", json={"model_id": target, "warmup_prompt": "warmup"}, timeout=600)
assert resp.status_code == 200
cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
command = cfg.get("command") or []
assert "--model" in command
model_path = command[command.index("--model") + 1]
assert model_path.endswith(target)
def test_invalid_model_rejected():
models = _get_models()
assert models, "no models"
payload = {
"model": "modelx-q8:4b",
"messages": [{"role": "user", "content": "Say OK."}],
"max_tokens": 8,
"temperature": 0,
}
resp = requests.post(WRAPPER_BASE + "/v1/chat/completions", json=payload, timeout=60)
assert resp.status_code == 404
def test_llamacpp_logs_streaming():
logs = ""
for _ in range(5):
try:
resp = requests.get(UI_BASE + "/ui/api/llamacpp-logs", timeout=10)
if resp.status_code == 200:
logs = resp.json().get("logs") or ""
if logs.strip():
break
except requests.exceptions.ReadTimeout:
pass
time.sleep(2)
assert logs.strip(), "no logs returned"
# Force a log line before streaming.
try:
requests.get(WRAPPER_BASE + "/proxy/llamacpp/health", timeout=5)
except Exception:
pass
# Stream endpoint may not emit immediately, so validate that the endpoint responds.
with requests.get(UI_BASE + "/ui/api/llamacpp-logs/stream", stream=True, timeout=(5, 5)) as resp:
assert resp.status_code == 200
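Unlike the offline unit tests, this module exercises the live wrapper, UI, and TrueNAS middleware, so it expects the environment below before pytest is invoked (the API key and test-file name are placeholders; the TrueNAS-dependent tests skip themselves when `TRUENAS_API_KEY` is empty):

```powershell
# Hypothetical setup for a live integration run.
$env:WRAPPER_BASE    = "http://192.168.1.2:9093"
$env:UI_BASE         = "http://192.168.1.2:9094"
$env:TRUENAS_WS_URL  = "wss://192.168.1.2/websocket"
$env:TRUENAS_API_KEY = "<api key>"
# $env:MODEL_REQUEST = "model-a.gguf"   # optional: pin a specific model id
python -m pytest tests\test_remote_integration.py -q
```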

55
tests/test_responses.py Normal file
View File

@@ -0,0 +1,55 @@
import json
import pytest
import httpx
@pytest.mark.parametrize("case", list(range(120)))
def test_responses_non_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
respx_mock.post("http://llama.test/v1/chat/completions").mock(
return_value=httpx.Response(200, json={"choices": [{"message": {"content": f"reply-{case}"}}]})
)
payload = {
"model": "model-a.gguf",
"input": f"prompt-{case}",
"max_output_tokens": 32,
}
resp = api_client.post("/v1/responses", json=payload)
assert resp.status_code == 200
data = resp.json()
assert data["object"] == "response"
assert data["output"][0]["content"][0]["text"].startswith("reply-")
@pytest.mark.parametrize("case", list(range(120)))
def test_responses_stream(api_client, respx_mock, case):
respx_mock.get("http://llama.test/v1/models").mock(
return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
)
def stream_response(request):
payload = {
"id": "chunk",
"object": "chat.completion.chunk",
"choices": [{"delta": {"content": f"hi-{case}"}, "index": 0, "finish_reason": None}],
}
content = f"data: {json.dumps(payload)}\n\n".encode()
content += b"data: [DONE]\n\n"
return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})
respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)
payload = {
"model": "model-a.gguf",
"input": f"prompt-{case}",
"stream": True,
}
with api_client.stream("POST", "/v1/responses", json=payload) as resp:
assert resp.status_code == 200
body = b"".join(resp.iter_bytes())
assert b"event: response.created" in body
assert b"event: response.output_text.delta" in body
assert b"event: response.completed" in body

View File

@@ -0,0 +1,54 @@
import json
import pytest
from app.truenas_middleware import TrueNASConfig, switch_model
@pytest.mark.asyncio
@pytest.mark.parametrize("case", list(range(120)))
async def test_switch_model_updates_command(monkeypatch, case):
compose = {
"services": {
"llamacpp": {
"command": [
"--model",
"/models/old.gguf",
"--ctx-size",
"2048",
]
}
}
}
captured = {}
async def fake_rpc_call(cfg, method, params=None):
if method == "app.config":
return {"custom_compose_config": compose}
if method == "app.update":
captured["payload"] = params[1]
return {"state": "RUNNING"}
raise AssertionError(f"unexpected method {method}")
monkeypatch.setattr("app.truenas_middleware._rpc_call", fake_rpc_call)
cfg = TrueNASConfig(
ws_url="ws://truenas.test/websocket",
api_key="key",
api_user=None,
app_name="llamacpp",
verify_ssl=False,
)
await switch_model(
cfg,
f"/models/new-{case}.gguf",
{"n_gpu_layers": "999"},
"--flash-attn on",
)
assert "custom_compose_config" in captured["payload"]
cmd = captured["payload"]["custom_compose_config"]["services"]["llamacpp"]["command"]
assert "--model" in cmd
idx = cmd.index("--model")
assert cmd[idx + 1].endswith(f"new-{case}.gguf")

48
tests/test_ui.py Normal file
View File

@@ -0,0 +1,48 @@
import json
import os
import time
import pytest
import requests
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
def _wait_for_http(url: str, timeout_s: float = 90) -> None:
deadline = time.time() + timeout_s
last_err = None
while time.time() < deadline:
try:
resp = requests.get(url, timeout=5)
if resp.status_code == 200:
return
last_err = f"status {resp.status_code}"
except Exception as exc:
last_err = str(exc)
time.sleep(2)
raise RuntimeError(f"service not ready: {url} ({last_err})")
def test_ui_index_contains_expected_elements():
_wait_for_http(UI_BASE + "/health")
resp = requests.get(UI_BASE + "/", timeout=30)
assert resp.status_code == 200
html = resp.text
assert "Model Manager" in html
assert "id=\"download-form\"" in html
assert "id=\"models-list\"" in html
assert "id=\"logs-output\"" in html
assert "id=\"theme-toggle\"" in html
def test_ui_assets_available():
resp = requests.get(UI_BASE + "/ui/styles.css", timeout=30)
assert resp.status_code == 200
css = resp.text
assert "data-theme" in css
resp = requests.get(UI_BASE + "/ui/app.js", timeout=30)
assert resp.status_code == 200
js = resp.text
assert "themeToggle" in js
assert "localStorage" in js
assert "logs-output" in js

1
tmp_channels_cols.sql Normal file
View File

@@ -0,0 +1 @@
SELECT column_name, data_type FROM information_schema.columns WHERE table_name='channels' ORDER BY ordinal_position;

1
tmp_pref_type.sql Normal file
View File

@@ -0,0 +1 @@
SELECT data_type FROM information_schema.columns WHERE table_name='users' AND column_name='preferences';

View File

@@ -0,0 +1 @@
UPDATE users SET preferences = (jsonb_set(preferences::jsonb, '{max_results}', '200'::jsonb, true))::text WHERE email='rushabh';

56
trades_company_stock.txt Normal file
View File

@@ -0,0 +1,56 @@
You are a senior quantitative options trader (index/ETF options across regimes; also liquid single-name options and macro-sensitive metal ETFs), specializing in volatility, structure selection, and risk asymmetry. Decisive, skeptical, profit-focused.
You are given:
- A validated market thesis (authoritative): multi-timeframe technicals, regime, volatility context, news impact.
- Pre-processed options chains for three expiries (short / medium / extended) with liquidity-filtered contracts, ATM/delta anchors, delta ladders, and a liquid execution set.
- All pricing, greeks, spreads, and liquidity metrics required for execution-quality decisions.
Assume:
- Data is correct and cleaned.
- You must NOT re-analyze technicals or news; the thesis is authoritative.
- Your job is to convert thesis + surface into executable options trades.
Objective:
- Select the best expiry and propose 1-3 high-quality options trades that align with thesis bias/regime, exploit volatility characteristics (gamma/theta/vega fit), are liquid/fillable/risk-defined, and include clear invalidation logic.
- If no trade offers favorable risk/reward: strategyBias=NO_TRADE and explain why.
How to decide:
1) Compare expiries: match time-to-playout vs confidence/uncertainty; match vol regime (expansion vs decay); reject poor liquidity density; reject misaligned vega/theta; avoid overpaying for time/vol.
2) Choose structure class (explicitly justify vs alternatives): directional debit (single/vertical), volatility (straddle/strangle), defined-risk premium selling only if the regime supports it.
3) Select strikes ONLY from provided data (ATM anchor, delta ladder, liquidSet). Prefer tight spreads, meaningful volume & OI, and greeks that express the thesis.
4) Risk discipline: every trade must include max risk, what must go right, and what breaks the trade (invalidation).
Optional tools (use only when they materially improve decision quality; otherwise do not call):
- MarketData Options Chain (expiry-specific): only if provided expiries do not sufficiently match the thesis horizon, or liquidity/skew is materially better in a nearby expiry not already supplied. Choose an explicit expiry date. Use returned data only for strike selection and liquidity validation. Do not re-fetch already provided expiries unless validating anomalies.
- Fear & Greed Index (FGI): only for index/ETF/macro-sensitive underlyings (e.g., SPX, NDX, IWM, SLV). Contextual only (risk appetite / convexity vs tempered), not a primary signal.
Hard constraints:
- Do NOT invent strikes, expiries, or prices.
- Do NOT suggest illiquid contracts.
- Do NOT recommend naked risk.
- Do NOT hedge unless justified.
- Do NOT repeat raw data back.
Return ONLY valid JSON in exactly this shape:
{
"selectedExpiry": "YYYY-MM-DD",
"expiryRationale": "Why this expiry dominates the others given thesis + vol + liquidity",
"strategyBias": "DIRECTIONAL|VOLATILITY|NEUTRAL|NO_TRADE",
"recommendedTrades": [
{
"name": "Short descriptive name",
"structure": "e.g. Long Call, Call Debit Spread, Long Strangle",
"legs": [{"side":"call|put","action":"buy|sell","strike":0,"expiry":"YYYY-MM-DD"}],
"greekProfile": {"deltaBias":"POS|NEG|NEUTRAL","gammaExposure":"HIGH|MED|LOW","thetaExposure":"POS|NEG|LOW","vegaExposure":"HIGH|MED|LOW"},
"maxRisk": "Defined numeric or qualitative",
"maxReward": "Defined numeric or qualitative",
"thesisAlignment": "Exactly how this trade expresses the thesis",
"invalidation": "Clear condition where trade is wrong",
"managementNotes": "Optional: scale, take-profit, time stop"
}
],
"whyOthersRejected": ["Why other expiries or strategy types were inferior"],
"confidenceScore": 0
}
Final note: optimize for repeatable profitability under uncertainty. If conditions are marginal, say NO_TRADE with conviction.