Initial commit
142
.gitignore
vendored
Normal file
@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# Project-specific
/inventory_raw/
/llamacpp_runs_remote/
/ollama_runs_remote/
/reports/
/tmp/
*.log
/C:/Users/Rushabh/.gemini/tmp/bff31f86566324f77927540d72088ce62479fd0563c197318c9f0594af2e69ee/

# OS-generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
4206
AGENTS.full.md
Normal file
File diff suppressed because it is too large
20
AGENTS.md
Normal file
@@ -0,0 +1,20 @@
# AGENTS (compressed)

This is the compact working context. For the full historical inventory and detailed snapshots, see `AGENTS.full.md` and `inventory_raw/`.

## Access + basics
- SSH: `ssh -p 55555 rushabh@192.168.1.2`
- Sudo: `sudo -n true`
- TrueNAS UI: `http://192.168.1.2`

## Full context pointers
- Full inventory snapshot and extra system details: `AGENTS.full.md`
- Raw captured data: `inventory_raw/`
- Documentation notes: `docs/*`

## Projects
- n8n Thesis Builder checkpoint (2026-01-04): `docs/n8n-thesis-builder-checkpoint-20260104.md`
- llamaCpp wrapper: a Python-based OpenAI-compatible API wrapper and model manager for the TrueNAS llama.cpp app.
  - Location: `llamaCpp.Wrapper.app/`
  - API port: `9093`
  - UI port: `9094`
  - See the `README.md` inside the folder for full details.
69
README.md
Normal file
@@ -0,0 +1,69 @@
# Codex TrueNAS Helper

This project is a collection of scripts, configurations, and applications for managing and enhancing a TrueNAS SCALE server, with a focus on running and interacting with large language models (LLMs) served by `llama.cpp` and `Ollama`.

## Features

* **`llama.cpp` Wrapper:** A wrapper for the `llama.cpp` TrueNAS application that provides:
  * An OpenAI-compatible API for chat completions and embeddings.
  * A web-based UI for managing models (listing, downloading).
  * The ability to hot-swap models without restarting the `llama.cpp` container, by interacting with the TrueNAS API.
* **TrueNAS Inventory:** A snapshot of the TrueNAS server's configuration, including hardware, storage, networking, and running applications.
* **Automation Scripts:** PowerShell and Python scripts for tasks like deploying the wrapper and testing remote endpoints.
* **LLM Integration:** Tools and configurations for working with various LLMs.

## Directory Structure

* `AGENTS.md` & `AGENTS.full.md`: Detailed information and a complete inventory of the TrueNAS server's configuration.
* `llamaCpp.Wrapper.app/`: A Python application that wraps the `llama.cpp` TrueNAS app with an OpenAI-compatible API and a model management UI.
* `scripts/`: Scripts for deployment, testing, and other tasks.
* `inventory_raw/`: Raw data dumps from the TrueNAS server, used to generate the inventory in `AGENTS.full.md`.
* `reports/`: Generated reports, test results, and other artifacts.
* `llamacpp_runs_remote/` & `ollama_runs_remote/`: Logs and results from LLM runs.
* `modelfiles/`: Modelfiles for different language models.
* `tests/`: Python tests for the `llamaCpp.Wrapper.app`.

## `llamaCpp.Wrapper.app`

This is the core component of the project: a Python application that acts as a proxy to the `llama.cpp` server running on TrueNAS, with added features.

### Running Locally

1. Install the required Python packages:

   ```bash
   pip install -r llamaCpp.Wrapper.app/requirements.txt
   ```

2. Run the application from inside the app directory (the directory name contains dots, so it cannot be used as a `python -m` package path):

   ```bash
   cd llamaCpp.Wrapper.app && python -m app.run
   ```

This starts two web servers: one for the API (default port 9093) and one for the UI (default port 9094).

### Docker (TrueNAS)

The wrapper can run as a Docker container on TrueNAS. See `llamaCpp.Wrapper.app/README.md` for a detailed example of the `docker run` command. The wrapper must be configured with the appropriate environment variables to connect to the TrueNAS API and the `llama.cpp` container.

### Model Hot-Swapping

The wrapper can switch models in the `llama.cpp` server by updating the application's command via the TrueNAS API, which allows dynamic model management without manual intervention.

## Scripts

* `deploy_truenas_wrapper.py`: Deploys the `llamaCpp.Wrapper.app` to TrueNAS.
* `remote_wrapper_test.py`: Tests the remote wrapper.
* `update_llamacpp_flags.ps1`: Updates the `llama.cpp` flags reference.
* `llamacpp_remote_test.ps1` & `ollama_remote_test.ps1`: Test the `llama.cpp` and `Ollama` remote endpoints.

## Getting Started

1. **Explore the Inventory:** Start by reading `AGENTS.md` and `AGENTS.full.md` to understand the TrueNAS server's configuration.
2. **Set up the Wrapper:** To use the `llama.cpp` wrapper, follow the instructions in `llamaCpp.Wrapper.app/README.md` to run it either locally or as a Docker container on TrueNAS.
3. **Use the Scripts:** The scripts in `scripts/` automate various tasks.

## Development

The `llamaCpp.Wrapper.app` has a test suite in the `tests/` directory. Run it with `pytest`:

```bash
pytest
```
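Since the wrapper exposes an OpenAI-compatible API, a chat request against it can be sketched as follows (a minimal sketch: the `/v1/chat/completions` path follows the OpenAI convention the README names, while the host address and model id are assumptions taken from elsewhere in this repo):

```python
import json

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    url = f"{base_url}/v1/chat/completions"
    body = json.dumps({
        "model": model,  # must exactly match an id from GET /v1/models
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

# Example against the wrapper's default API port (host IP is an assumption):
url, body = build_chat_request("http://192.168.1.2:9093",
                               "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
                               "Say hello in one word.")
print(url)  # → http://192.168.1.2:9093/v1/chat/completions
```

The body can then be sent with `urllib.request` or `curl`; note that the wrapper returns `404` for model ids that do not exactly match a listed model.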
60
docs/llamacpp-wrapper-notes.md
Normal file
@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes

Last updated: 2026-01-04

## Purpose
OpenAI-compatible wrapper for the existing `llamacpp` app with a model manager UI,
model switching, and parameter management via TrueNAS middleware.

## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`

## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`

## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- The `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows the active model and supports switching with a verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports a dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via a Docker socket fallback when TrueNAS log APIs are unavailable.

## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
  OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to a JSON-only system prompt if llama.cpp rejects `response_format`).

## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.

## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)

## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies the switch by sending a minimal prompt.

## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
  - `WRAPPER_BASE=http://192.168.1.2:9093`
  - `UI_BASE=http://192.168.1.2:9094`
  - `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
  - `TRUENAS_API_KEY=...`
  - `MODEL_REQUEST=<exact model id from /v1/models>`

## Runtime Validation (2026-01-04)
- Fixed a llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect a transient `503 Loading model` during warmup; retry after the load completes.
- Verified the `yarn-llama-2-13b-64k.Q4_K_M.gguf` model switch from the wrapper; a tool-enabled chat request completes after load (~107s).
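The flat-to-OpenAI tools normalization described in these notes can be sketched as follows (a minimal sketch of the idea, not the wrapper's actual implementation; the field shapes follow the "Tools Support" bullets above):

```python
def normalize_tools(payload: dict) -> dict:
    """Rewrite flat tool specs ({type,name,parameters}) into OpenAI
    format ({type:"function", function:{...}}) before proxying."""
    payload = dict(payload)
    # Legacy `functions` payloads become `tools`.
    if "functions" in payload and "tools" not in payload:
        payload["tools"] = payload.pop("functions")
    tools = []
    for tool in payload.get("tools", []):
        if "function" in tool:
            tools.append(tool)  # already in OpenAI format; pass through
        else:
            tools.append({
                "type": "function",
                "function": {
                    "name": tool.get("name"),
                    "parameters": tool.get("parameters", {}),
                },
            })
    if tools:
        payload["tools"] = tools
    return payload

flat = {"tools": [{"type": "function", "name": "get_quote",
                   "parameters": {"type": "object"}}]}
print(normalize_tools(flat)["tools"][0]["function"]["name"])  # → get_quote
```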
53
docs/n8n-thesis-builder-checkpoint-20260104.md
Normal file
@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)

## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when the workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).

## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of AI output.
- `HTTP Request13` query forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted between `Aggregate2` -> `AI - Thesis Builder`.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.

## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (example `symbol` and `thesis` fields full of junk tokens).
- `AI - Technicals Auditor` output looks like valid JSON (see sample below).
- `Aggregate2` payload size ~6.7KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response-format handling.

### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`

### Sample `AI - Technicals Auditor` output (valid JSON)

```
{
  "output": {
    "timeframes": [
      { "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
    ],
    "optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
    "dataQualityScore": 0.5,
    "error": "INSUFFICIENT_DATA"
  }
}
```

## Open issues
- Thesis Builder garbling persists even with a small prompt; likely a model/wrapper output issue.
- Need to confirm whether the llama.cpp wrapper is corrupting output or the model is misconfigured for JSON-only output.

## Useful commands
- Last runs:
  `SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
  `sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`
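The capping that `Code10` applies can be re-expressed in Python to make the idea concrete (a hypothetical sketch: the real node is an n8n JavaScript Code node, and the limits here are illustrative, not the workflow's actual values):

```python
def trim_news(items: list[str], max_items: int = 5, max_chars: int = 200) -> list[str]:
    """Cap the number of news items and the length of each item so the
    downstream thesis prompt stays small."""
    return [s[:max_chars] for s in items[:max_items]]

headlines = [f"headline {i} " + "x" * 500 for i in range(10)]
trimmed = trim_news(headlines)
print(len(trimmed), max(len(s) for s in trimmed))  # → 5 200
```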
16
llamaCpp.Wrapper.app/Dockerfile
Normal file
@@ -0,0 +1,16 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app/app
COPY trades_company_stock.txt /app/trades_company_stock.txt

EXPOSE 8000 8001

CMD ["python", "-m", "app.run"]
134
llamaCpp.Wrapper.app/README.md
Normal file
@@ -0,0 +1,134 @@
# llama.cpp OpenAI-Compatible Wrapper

This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` (at build time) into `app/agents_config.json`.

## Current Agents-Derived Details

- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from the AGENTS snapshot)

Regenerate the derived config after updating `AGENTS.md`:

```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```

## Running Locally

```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```

Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to the container name or localhost, based on the agents config)
- Model dir: `MODEL_DIR=/models`

## Docker (TrueNAS)

Example (join the existing llama.cpp network and mount models):

```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
  --network ix-llamacpp_default \
  -v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e LLAMACPP_RESTART_METHOD=docker \
  -e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
  -e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
  -e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
  -e TRUENAS_API_KEY=YOUR_KEY \
  -e TRUENAS_API_USER=YOUR_USER \
  -e TRUENAS_APP_NAME=llamacpp \
  -e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
  -e PORT_A=9093 -e PORT_B=9094 \
  llama-cpp-openai-wrapper:latest
```

## Model Hot-Swap / Restart Hooks

This wrapper does not modify llama.cpp by default. To enable hot-swap/restart for new models or model selection,
provide one of the restart methods below:

- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`

or

- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`

or (requires mounting the Docker socket)

- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`

## Model switching via TrueNAS middleware (P0)

Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:

```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```

The wrapper preserves the existing flags in the compose command and only updates `--model`, while optionally adding
missing GPU split flags from `LLAMACPP_*` if not already set.

Optional arguments passed to restart handlers:

```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```

You can also pass arbitrary llama.cpp flags (space-separated) via:

```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```

## Model Manager UI

Open `http://HOST:PORT_B/`.

Features:
- List existing models
- Download models via URL
- Live progress + cancel

## Testing

Tests are parameterized with 100+ cases per endpoint.

```bash
pytest -q
```

## llama.cpp flags reference

Scraped from the upstream docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`:

```
pwsh scripts/update_llamacpp_flags.ps1
```
1
llamaCpp.Wrapper.app/__init__.py
Normal file
@@ -0,0 +1 @@
22
llamaCpp.Wrapper.app/agents_config.json
Normal file
@@ -0,0 +1,22 @@
{
  "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
  "container_name": "ix-llamacpp-llamacpp-1",
  "host_port": 8071,
  "container_port": 8080,
  "web_ui_url": "http://0.0.0.0:8071/",
  "model_host_path": "/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models",
  "model_container_path": "/models",
  "models": [
    "GPT-OSS",
    "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    "openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
    "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
  ],
  "network": "ix-llamacpp_default",
  "subnets": [
    "172.16.18.0/24",
    "fdb7:86ec:b1dd:11::/64"
  ],
  "gpu_count": 2,
  "gpu_name": "NVIDIA RTX 5060 Ti, 16 GB each (per `nvidia-smi` in prior runs)."
}
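One way the wrapper might derive its default llama.cpp base URL from this config (a sketch under stated assumptions: the real default logic lives in the app's config module, and `default_base_url` is a hypothetical helper; the host IP comes from the notes elsewhere in this repo):

```python
import json

# Inline copy of the relevant fields from agents_config.json above.
agents_config = json.loads("""{
  "container_name": "ix-llamacpp-llamacpp-1",
  "host_port": 8071,
  "container_port": 8080
}""")

def default_base_url(cfg: dict, on_container_network: bool) -> str:
    """Prefer the container DNS name when joined to the app's Docker
    network, otherwise fall back to the host-mapped port."""
    if on_container_network:
        return f"http://{cfg['container_name']}:{cfg['container_port']}"
    return f"http://192.168.1.2:{cfg['host_port']}"  # host IP is an assumption

print(default_base_url(agents_config, True))  # → http://ix-llamacpp-llamacpp-1:8080
```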
119
llamaCpp.Wrapper.app/agents_parser.py
Normal file
@@ -0,0 +1,119 @@
import json
import re
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Optional

APP_HEADER_RE = re.compile(r"^### App: (?P<name>.+?)\s*$")
IMAGE_RE = re.compile(r"image=(?P<image>[^\s]+)")
PORT_MAP_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host>\d+|0\.0\.0\.0:(?P<host_ip_port>\d+))")
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")
NETWORK_RE = re.compile(r"- (?P<name>ix-[^\s]+)_default")
SUBNET_RE = re.compile(r"subnets=\[(?P<subnets>[^\]]+)\]")
MODELS_RE = re.compile(r"Models in /models: (?P<models>.+)$")
PORTAL_RE = re.compile(r"Portals: \{\'Web UI\': \'(?P<url>[^\']+)\'\}")
GPU_RE = re.compile(r"GPUs:\s*(?P<count>\d+)x\s*(?P<name>.+)$")
CONTAINER_NAME_RE = re.compile(r"^(?P<name>ix-llamacpp-[^\s]+)")


@dataclass
class LlamacppConfig:
    image: Optional[str] = None
    container_name: Optional[str] = None
    host_port: Optional[int] = None
    container_port: Optional[int] = None
    web_ui_url: Optional[str] = None
    model_host_path: Optional[str] = None
    model_container_path: Optional[str] = None
    models: List[str] = None
    network: Optional[str] = None
    subnets: List[str] = None
    gpu_count: Optional[int] = None
    gpu_name: Optional[str] = None


def _find_section(lines: List[str], app_name: str) -> List[str]:
    start = None
    for i, line in enumerate(lines):
        m = APP_HEADER_RE.match(line.strip())
        if m and m.group("name") == app_name:
            start = i
            break
    if start is None:
        return []
    for j in range(start + 1, len(lines)):
        if APP_HEADER_RE.match(lines[j].strip()):
            return lines[start:j]
    return lines[start:]


def parse_agents(path: Path) -> LlamacppConfig:
    text = path.read_text(encoding="utf-8", errors="ignore")
    lines = text.splitlines()
    section = _find_section(lines, "llamacpp")
    cfg = LlamacppConfig(models=[], subnets=[])

    for line in section:
        if cfg.image is None:
            m = IMAGE_RE.search(line)
            if m:
                cfg.image = m.group("image")
        if cfg.web_ui_url is None:
            m = PORTAL_RE.search(line)
            if m:
                cfg.web_ui_url = m.group("url")
        if cfg.container_port is None or cfg.host_port is None:
            m = PORT_LINE_RE.search(line)
            if m:
                cfg.container_port = int(m.group("container"))
                cfg.host_port = int(m.group("host"))
        if cfg.model_host_path is None or cfg.model_container_path is None:
            m = VOLUME_RE.search(line)
            if m and "/models" in m.group("container"):
                cfg.model_host_path = m.group("host")
                cfg.model_container_path = m.group("container")
        if cfg.network is None:
            m = NETWORK_RE.search(line)
            if m:
                cfg.network = f"{m.group('name')}_default"
        if "subnets=" in line:
            m = SUBNET_RE.search(line)
            if m:
                subnets_raw = m.group("subnets")
                subnets = [s.strip().strip("'") for s in subnets_raw.split(",")]
                cfg.subnets.extend([s for s in subnets if s])
        if "Models in /models:" in line:
            m = MODELS_RE.search(line)
            if m:
                models_raw = m.group("models")
                cfg.models = [s.strip() for s in models_raw.split(",") if s.strip()]

    for line in lines:
        if cfg.gpu_count is None:
            m = GPU_RE.search(line)
            if m:
                cfg.gpu_count = int(m.group("count"))
                cfg.gpu_name = m.group("name").strip()
        if cfg.container_name is None:
            m = CONTAINER_NAME_RE.match(line.strip())
            if m:
                cfg.container_name = m.group("name")

    return cfg


def main() -> None:
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("--agents", default="AGENTS.md")
    parser.add_argument("--out", default="app/agents_config.json")
    args = parser.parse_args()

    cfg = parse_agents(Path(args.agents))
    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(asdict(cfg), indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()
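As a concrete example of the AGENTS.md line formats the parser's regexes expect (the sample lines are hypothetical, shaped after the patterns in `agents_parser.py`; the actual AGENTS.md lines may differ):

```python
import re

# Same patterns as in agents_parser.py.
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")

m = PORT_LINE_RE.search("- tcp 8080 -> 0.0.0.0:8071")
print(m.group("container"), m.group("host"))  # → 8080 8071

v = VOLUME_RE.search("- /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models -> /models")
print(v.group("container"))  # → /models
```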
309
llamaCpp.Wrapper.app/api_app.py
Normal file
@@ -0,0 +1,309 @@
import asyncio
import logging
import time
from pathlib import Path
from typing import Any, Dict

from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from fastapi.responses import JSONResponse, StreamingResponse
import httpx

from app.config import load_config
from app.llamacpp_client import proxy_json, proxy_raw, proxy_stream
from app.logging_utils import configure_logging
from app.model_registry import find_model, resolve_model, scan_models
from app.openai_translate import responses_to_chat_payload, chat_to_responses, normalize_chat_payload
from app.restart import RestartPlan, trigger_restart
from app.stream_transform import stream_chat_to_responses
from app.truenas_middleware import TrueNASConfig, get_active_model_id, switch_model
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry


configure_logging()
log = logging.getLogger("api_app")


def _model_list_payload(model_dir: str) -> Dict[str, Any]:
    data = []
    for model in scan_models(model_dir):
        data.append({
            "id": model.model_id,
            "object": "model",
            "created": model.created,
            "owned_by": "llama.cpp",
        })
    return {"object": "list", "data": data}


def _requires_json_mode(payload: Dict[str, Any]) -> bool:
    response_format = payload.get("response_format")
    if isinstance(response_format, dict) and response_format.get("type") == "json_object":
        return True
    if payload.get("return_format") == "json":
        return True
    return False


def _apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
    payload = dict(payload)
    payload.pop("response_format", None)
    payload.pop("return_format", None)
    messages = payload.get("messages")
    if isinstance(messages, list):
        system_msg = {"role": "system", "content": "Respond only with a valid JSON object."}
        if not messages or messages[0].get("role") != "system":
            payload["messages"] = [system_msg, *messages]
        else:
            payload["messages"] = [system_msg, *messages[1:]]
    return payload


async def _proxy_json_with_retry(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
    delay_s: float = 3.0,
) -> httpx.Response:
    deadline = time.time() + timeout_s
    attempt = 0
    last_exc: Exception | None = None
    while time.time() < deadline:
        attempt += 1
        try:
            resp = await proxy_json(base_url, path, method, headers, payload, timeout_s)
            if resp.status_code == 503:
                try:
                    data = resp.json()
                except Exception:
                    data = {}
                message = ""
                if isinstance(data, dict):
                    err = data.get("error")
                    if isinstance(err, dict):
                        message = str(err.get("message") or "")
                    else:
                        message = str(data.get("message") or "")
                if "loading model" in message.lower():
                    log.warning("llama.cpp still loading model, retrying (attempt %s)", attempt)
                    await asyncio.sleep(delay_s)
                    continue
            return resp
        except httpx.RequestError as exc:
            last_exc = exc
            log.warning("Proxy request failed (attempt %s): %s", attempt, exc)
            await asyncio.sleep(delay_s)
    if last_exc:
        raise last_exc
    raise RuntimeError("proxy retry deadline exceeded")


async def _get_active_model_from_truenas(cfg: TrueNASConfig) -> str:
    try:
        return await get_active_model_id(cfg)
    except Exception as exc:
        log.warning("Failed to read active model from TrueNAS config: %s", exc)
        return ""


async def _wait_for_active_model(cfg: TrueNASConfig, model_id: str, timeout_s: float) -> None:
    deadline = asyncio.get_event_loop().time() + timeout_s
    while asyncio.get_event_loop().time() < deadline:
        active = await _get_active_model_from_truenas(cfg)
        if active == model_id:
            return
        await asyncio.sleep(2)
    raise RuntimeError(f"active model did not switch to {model_id}")


async def _ensure_model_loaded(model_id: str, model_dir: str) -> str:
    cfg = load_config()
    model = resolve_model(model_dir, model_id, cfg.model_aliases)
    if not model:
        log.warning("Requested model not found: %s", model_id)
        raise HTTPException(status_code=404, detail="model not found")
    if model.model_id != model_id:
        log.info("Resolved model alias %s -> %s", model_id, model.model_id)

    truenas_cfg = None
    if cfg.truenas_ws_url and cfg.truenas_api_key:
        truenas_cfg = TrueNASConfig(
            ws_url=cfg.truenas_ws_url,
            api_key=cfg.truenas_api_key,
            api_user=cfg.truenas_api_user,
            app_name=cfg.truenas_app_name,
            verify_ssl=cfg.truenas_verify_ssl,
|
||||||
|
)
|
||||||
|
active_id = await _get_active_model_from_truenas(truenas_cfg)
|
||||||
|
if active_id and active_id == model.model_id:
|
||||||
|
return model.model_id
|
||||||
|
|
||||||
|
if truenas_cfg:
|
||||||
|
log.info("Switching model via API model=%s args=%s extra_args=%s", model.model_id, cfg.llamacpp_args, cfg.llamacpp_extra_args)
|
||||||
|
try:
|
||||||
|
model_path = str((Path(cfg.model_container_dir) / model.model_id))
|
||||||
|
await switch_model(
|
||||||
|
truenas_cfg,
|
||||||
|
model_path,
|
||||||
|
cfg.llamacpp_args,
|
||||||
|
cfg.llamacpp_extra_args,
|
||||||
|
)
|
||||||
|
await _wait_for_active_model(truenas_cfg, model.model_id, cfg.switch_timeout_s)
|
||||||
|
except Exception as exc:
|
||||||
|
log.exception("TrueNAS model switch failed")
|
||||||
|
raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
|
||||||
|
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
|
||||||
|
log.info("Running warmup prompt after model switch: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
|
||||||
|
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
|
||||||
|
return model.model_id
|
||||||
|
|
||||||
|
plan = RestartPlan(
|
||||||
|
method=cfg.restart_method,
|
||||||
|
command=cfg.restart_command,
|
||||||
|
url=cfg.restart_url,
|
||||||
|
allowed_container=cfg.allowed_container,
|
||||||
|
)
|
||||||
|
log.info("Triggering restart for model=%s method=%s", model.model_id, cfg.restart_method)
|
||||||
|
payload = {
|
||||||
|
"model_id": model.model_id,
|
||||||
|
"model_path": str(Path(cfg.model_container_dir) / model.model_id),
|
||||||
|
"gpu_count": cfg.gpu_count_runtime or cfg.agents.gpu_count,
|
||||||
|
"llamacpp_args": cfg.llamacpp_args,
|
||||||
|
"llamacpp_extra_args": cfg.llamacpp_extra_args,
|
||||||
|
}
|
||||||
|
await trigger_restart(plan, payload=payload)
|
||||||
|
warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
|
||||||
|
log.info("Running warmup prompt after restart: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
|
||||||
|
await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
|
||||||
|
return model.model_id
|
||||||
|
|
||||||
|
|
||||||
|
def create_api_app() -> FastAPI:
|
||||||
|
cfg = load_config()
|
||||||
|
app = FastAPI(title="llama.cpp OpenAI Wrapper", version="0.1.0")
|
||||||
|
router = APIRouter()
|
||||||
|
|
||||||
|
@app.middleware("http")
|
||||||
|
async def log_requests(request: Request, call_next):
|
||||||
|
log.info("Request %s %s", request.method, request.url.path)
|
||||||
|
return await call_next(request)
|
||||||
|
|
||||||
|
@app.exception_handler(Exception)
|
||||||
|
async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
|
||||||
|
log.exception("Unhandled error")
|
||||||
|
return JSONResponse(status_code=500, content={"detail": str(exc)})
|
||||||
|
|
||||||
|
@router.get("/health")
|
||||||
|
async def health() -> Dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"status": "ok",
|
||||||
|
"base_url": cfg.base_url,
|
||||||
|
"model_dir": cfg.model_dir,
|
||||||
|
"agents": {
|
||||||
|
"image": cfg.agents.image,
|
||||||
|
"container_name": cfg.agents.container_name,
|
||||||
|
"network": cfg.agents.network,
|
||||||
|
"gpu_count": cfg.agents.gpu_count,
|
||||||
|
},
|
||||||
|
"gpu_count_runtime": cfg.gpu_count_runtime,
|
||||||
|
}
|
||||||
|
|
||||||
|
@router.get("/v1/models")
|
||||||
|
async def list_models() -> Dict[str, Any]:
|
||||||
|
log.info("Listing models")
|
||||||
|
return _model_list_payload(cfg.model_dir)
|
||||||
|
|
||||||
|
@router.get("/v1/models/{model_id}")
|
||||||
|
async def get_model(model_id: str) -> Dict[str, Any]:
|
||||||
|
log.info("Get model %s", model_id)
|
||||||
|
model = resolve_model(cfg.model_dir, model_id, cfg.model_aliases) or find_model(cfg.model_dir, model_id)
|
||||||
|
if not model:
|
||||||
|
raise HTTPException(status_code=404, detail="model not found")
|
||||||
|
return {
|
||||||
|
"id": model.model_id,
|
||||||
|
"object": "model",
|
||||||
|
"created": model.created,
|
||||||
|
"owned_by": "llama.cpp",
|
||||||
|
}
|
||||||
|
|
||||||
|
@router.post("/v1/chat/completions")
|
||||||
|
async def chat_completions(request: Request) -> Response:
|
||||||
|
payload = await request.json()
|
||||||
|
payload = normalize_chat_payload(payload)
|
||||||
|
model_id = payload.get("model")
|
||||||
|
log.info("Chat completions model=%s stream=%s", model_id, bool(payload.get("stream")))
|
||||||
|
if model_id:
|
||||||
|
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
|
||||||
|
payload["model"] = resolved
|
||||||
|
stream = bool(payload.get("stream"))
|
||||||
|
if stream and _requires_json_mode(payload):
|
||||||
|
payload = _apply_json_fallback(payload)
|
||||||
|
if stream:
|
||||||
|
streamer = proxy_stream(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
|
||||||
|
return StreamingResponse(streamer, media_type="text/event-stream")
|
||||||
|
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
|
||||||
|
if resp.status_code >= 500 and _requires_json_mode(payload):
|
||||||
|
log.info("Retrying chat completion with JSON fallback prompt")
|
||||||
|
fallback_payload = _apply_json_fallback(payload)
|
||||||
|
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
|
||||||
|
try:
|
||||||
|
return JSONResponse(status_code=resp.status_code, content=resp.json())
|
||||||
|
except Exception:
|
||||||
|
return Response(
|
||||||
|
status_code=resp.status_code,
|
||||||
|
content=resp.content,
|
||||||
|
media_type=resp.headers.get("content-type"),
|
||||||
|
)
|
||||||
|
|
||||||
|
@router.post("/v1/responses")
|
||||||
|
async def responses(request: Request) -> Response:
|
||||||
|
payload = await request.json()
|
||||||
|
chat_payload, model_id = responses_to_chat_payload(payload)
|
||||||
|
log.info("Responses model=%s stream=%s", model_id, bool(chat_payload.get("stream")))
|
||||||
|
if model_id:
|
||||||
|
resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
|
||||||
|
chat_payload["model"] = resolved
|
||||||
|
stream = bool(chat_payload.get("stream"))
|
||||||
|
if stream and _requires_json_mode(chat_payload):
|
||||||
|
chat_payload = _apply_json_fallback(chat_payload)
|
||||||
|
if stream:
|
||||||
|
streamer = stream_chat_to_responses(
|
||||||
|
cfg.base_url,
|
||||||
|
dict(request.headers),
|
||||||
|
chat_payload,
|
||||||
|
cfg.proxy_timeout_s,
|
||||||
|
)
|
||||||
|
return StreamingResponse(streamer, media_type="text/event-stream")
|
||||||
|
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), chat_payload, cfg.proxy_timeout_s)
|
||||||
|
if resp.status_code >= 500 and _requires_json_mode(chat_payload):
|
||||||
|
log.info("Retrying responses with JSON fallback prompt")
|
||||||
|
fallback_payload = _apply_json_fallback(chat_payload)
|
||||||
|
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
|
||||||
|
resp.raise_for_status()
|
||||||
|
return JSONResponse(status_code=200, content=chat_to_responses(resp.json(), model_id))
|
||||||
|
|
||||||
|
@router.post("/v1/embeddings")
|
||||||
|
async def embeddings(request: Request) -> Response:
|
||||||
|
payload = await request.json()
|
||||||
|
log.info("Embeddings")
|
||||||
|
resp = await _proxy_json_with_retry(cfg.base_url, "/v1/embeddings", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
|
||||||
|
try:
|
||||||
|
return JSONResponse(status_code=resp.status_code, content=resp.json())
|
||||||
|
except Exception:
|
||||||
|
return Response(
|
||||||
|
status_code=resp.status_code,
|
||||||
|
content=resp.content,
|
||||||
|
media_type=resp.headers.get("content-type"),
|
||||||
|
)
|
||||||
|
|
||||||
|
@router.api_route("/proxy/llamacpp/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"])
|
||||||
|
async def passthrough(path: str, request: Request) -> Response:
|
||||||
|
body = await request.body()
|
||||||
|
resp = await proxy_raw(cfg.base_url, f"/{path}", request.method, dict(request.headers), body, cfg.proxy_timeout_s)
|
||||||
|
return Response(status_code=resp.status_code, content=resp.content, headers=dict(resp.headers))
|
||||||
|
|
||||||
|
app.include_router(router)
|
||||||
|
return app
|
||||||
|
|
||||||
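The JSON-mode fallback above can be sanity-checked in isolation. A minimal standalone sketch of the same transformation (a hypothetical copy for illustration, not an import from the module):

```python
from typing import Any, Dict

JSON_SYSTEM_MSG = {"role": "system", "content": "Respond only with a valid JSON object."}


def apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Drop structured-output hints and force JSON via the system message instead.
    payload = dict(payload)
    payload.pop("response_format", None)
    payload.pop("return_format", None)
    messages = payload.get("messages")
    if isinstance(messages, list):
        if not messages or messages[0].get("role") != "system":
            payload["messages"] = [JSON_SYSTEM_MSG, *messages]
        else:
            # Replace an existing system message rather than stacking two.
            payload["messages"] = [JSON_SYSTEM_MSG, *messages[1:]]
    return payload


out = apply_json_fallback({
    "response_format": {"type": "json_object"},
    "messages": [{"role": "user", "content": "list three colors"}],
})
print(out["messages"][0]["role"])  # system
```

Note the fallback replaces an existing leading system message; if the caller's system prompt must survive, the JSON instruction would need to be appended to it instead.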
214
llamaCpp.Wrapper.app/config.py
Normal file
@@ -0,0 +1,214 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class AgentsRuntime:
    image: Optional[str]
    container_name: Optional[str]
    host_port: Optional[int]
    container_port: Optional[int]
    web_ui_url: Optional[str]
    model_host_path: Optional[str]
    model_container_path: Optional[str]
    models: List[str]
    network: Optional[str]
    subnets: List[str]
    gpu_count: Optional[int]
    gpu_name: Optional[str]


@dataclass
class AppConfig:
    api_port: int
    ui_port: int
    base_url: str
    model_dir: str
    model_container_dir: str
    download_dir: str
    download_max_concurrent: int
    download_allowlist: List[str]
    restart_method: str
    restart_command: Optional[str]
    restart_url: Optional[str]
    reload_on_new_model: bool
    proxy_timeout_s: float
    switch_timeout_s: float
    gpu_count_runtime: Optional[int]
    llamacpp_args: Dict[str, str]
    llamacpp_extra_args: str
    truenas_api_key: Optional[str]
    truenas_api_user: Optional[str]
    truenas_app_name: str
    truenas_ws_url: Optional[str]
    truenas_verify_ssl: bool
    allowed_container: Optional[str]
    warmup_prompt_path: str
    llamacpp_container_name: Optional[str]
    model_aliases: Dict[str, str]
    agents: AgentsRuntime


def _load_agents_config(path: Path) -> AgentsRuntime:
    if not path.exists():
        return AgentsRuntime(
            image=None,
            container_name=None,
            host_port=None,
            container_port=None,
            web_ui_url=None,
            model_host_path=None,
            model_container_path=None,
            models=[],
            network=None,
            subnets=[],
            gpu_count=None,
            gpu_name=None,
        )
    raw = json.loads(path.read_text(encoding="utf-8"))
    return AgentsRuntime(
        image=raw.get("image"),
        container_name=raw.get("container_name"),
        host_port=raw.get("host_port"),
        container_port=raw.get("container_port"),
        web_ui_url=raw.get("web_ui_url"),
        model_host_path=raw.get("model_host_path"),
        model_container_path=raw.get("model_container_path"),
        models=raw.get("models") or [],
        network=raw.get("network"),
        subnets=raw.get("subnets") or [],
        gpu_count=raw.get("gpu_count"),
        gpu_name=raw.get("gpu_name"),
    )


def _infer_gpu_count_runtime() -> Optional[int]:
    visible = os.getenv("CUDA_VISIBLE_DEVICES") or os.getenv("NVIDIA_VISIBLE_DEVICES")
    if visible and visible not in {"all", "void"}:
        parts = [p.strip() for p in visible.split(",") if p.strip()]
        if parts:
            return len(parts)
    return None


def _default_base_url(agents: AgentsRuntime) -> str:
    if agents.container_name and agents.container_port:
        return f"http://{agents.container_name}:{agents.container_port}"
    if agents.host_port:
        return f"http://127.0.0.1:{agents.host_port}"
    return "http://127.0.0.1:8080"


def load_config() -> AppConfig:
    agents_path = Path(os.getenv("AGENTS_CONFIG_PATH", "app/agents_config.json"))
    agents = _load_agents_config(agents_path)

    api_port = int(os.getenv("PORT_A", "9093"))
    ui_port = int(os.getenv("PORT_B", "9094"))

    base_url = os.getenv("LLAMACPP_BASE_URL") or _default_base_url(agents)
    model_dir = os.getenv("MODEL_DIR") or agents.model_container_path or "/models"
    model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or model_dir

    download_dir = os.getenv("MODEL_DOWNLOAD_DIR") or model_dir
    download_max = int(os.getenv("MODEL_DOWNLOAD_MAX_CONCURRENT", "2"))

    allowlist_raw = os.getenv("MODEL_DOWNLOAD_ALLOWLIST", "")
    allowlist = [item.strip() for item in allowlist_raw.split(",") if item.strip()]

    restart_method = os.getenv("LLAMACPP_RESTART_METHOD", "none").lower()
    restart_command = os.getenv("LLAMACPP_RESTART_COMMAND")
    restart_url = os.getenv("LLAMACPP_RESTART_URL")

    reload_on_new_model = os.getenv("RELOAD_ON_NEW_MODEL", "false").lower() in {"1", "true", "yes"}
    proxy_timeout_s = float(os.getenv("LLAMACPP_PROXY_TIMEOUT_S", "600"))
    switch_timeout_s = float(os.getenv("LLAMACPP_SWITCH_TIMEOUT_S", "300"))

    gpu_count_runtime = _infer_gpu_count_runtime()

    llamacpp_args: Dict[str, str] = {}
    args_map = {
        "LLAMACPP_TENSOR_SPLIT": "tensor_split",
        "LLAMACPP_SPLIT_MODE": "split_mode",
        "LLAMACPP_N_GPU_LAYERS": "n_gpu_layers",
        "LLAMACPP_CTX_SIZE": "ctx_size",
        "LLAMACPP_BATCH_SIZE": "batch_size",
        "LLAMACPP_UBATCH_SIZE": "ubatch_size",
        "LLAMACPP_CACHE_TYPE_K": "cache_type_k",
        "LLAMACPP_CACHE_TYPE_V": "cache_type_v",
        "LLAMACPP_FLASH_ATTN": "flash_attn",
    }
    for env_key, arg_key in args_map.items():
        value = os.getenv(env_key)
        if value:
            llamacpp_args[arg_key] = value
    llamacpp_extra_args = os.getenv("LLAMACPP_EXTRA_ARGS", "")

    truenas_api_key = os.getenv("TRUENAS_API_KEY")
    truenas_api_user = os.getenv("TRUENAS_API_USER")
    truenas_app_name = os.getenv("TRUENAS_APP_NAME", "llamacpp")
    truenas_ws_url = os.getenv("TRUENAS_WS_URL")
    truenas_api_url = os.getenv("TRUENAS_API_URL")
    if not truenas_ws_url and truenas_api_url:
        if truenas_api_url.startswith("https://"):
            truenas_ws_url = "wss://" + truenas_api_url[len("https://"):].rstrip("/") + "/websocket"
        elif truenas_api_url.startswith("http://"):
            truenas_ws_url = "ws://" + truenas_api_url[len("http://"):].rstrip("/") + "/websocket"
    truenas_verify_ssl = os.getenv("TRUENAS_VERIFY_SSL", "false").lower() in {"1", "true", "yes"}
    allowed_container = os.getenv("LLAMACPP_TARGET_CONTAINER") or agents.container_name
    llamacpp_container_name = os.getenv("LLAMACPP_CONTAINER_NAME") or agents.container_name
    warmup_prompt_path = os.getenv("WARMUP_PROMPT_PATH", str(Path("trades_company_stock.txt").resolve()))
    if truenas_ws_url and (":" in model_container_dir[:3] or "\\" in model_container_dir):
        # A Windows-style host path cannot resolve inside the container; fall back to /models.
        model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or "/models"
    aliases_raw = os.getenv("MODEL_ALIASES", "")
    model_aliases: Dict[str, str] = {}
    if aliases_raw:
        try:
            model_aliases = json.loads(aliases_raw)
        except json.JSONDecodeError:
            # Fall back to the "alias=target,alias2=target2" form.
            for item in aliases_raw.split(","):
                if "=" in item:
                    key, value = item.split("=", 1)
                    model_aliases[key.strip()] = value.strip()

    gpu_count = gpu_count_runtime or agents.gpu_count
    if gpu_count and gpu_count >= 2:
        if "tensor_split" not in llamacpp_args:
            ratio = 1.0 / float(gpu_count)
            split = ",".join([f"{ratio:.2f}"] * gpu_count)
            llamacpp_args["tensor_split"] = split
        if "split_mode" not in llamacpp_args:
            llamacpp_args["split_mode"] = "layer"

    return AppConfig(
        api_port=api_port,
        ui_port=ui_port,
        base_url=base_url,
        model_dir=model_dir,
        model_container_dir=model_container_dir,
        download_dir=download_dir,
        download_max_concurrent=download_max,
        download_allowlist=allowlist,
        restart_method=restart_method,
        restart_command=restart_command,
        restart_url=restart_url,
        reload_on_new_model=reload_on_new_model,
        proxy_timeout_s=proxy_timeout_s,
        switch_timeout_s=switch_timeout_s,
        gpu_count_runtime=gpu_count_runtime,
        llamacpp_args=llamacpp_args,
        llamacpp_extra_args=llamacpp_extra_args,
        truenas_api_key=truenas_api_key,
        truenas_api_user=truenas_api_user,
        truenas_app_name=truenas_app_name,
        truenas_ws_url=truenas_ws_url,
        truenas_verify_ssl=truenas_verify_ssl,
        allowed_container=allowed_container,
        warmup_prompt_path=warmup_prompt_path,
        llamacpp_container_name=llamacpp_container_name,
        model_aliases=model_aliases,
        agents=agents,
    )
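The multi-GPU defaulting at the end of `load_config` builds an even `tensor_split` string when two or more GPUs are detected and none was configured. The same computation as a standalone sketch:

```python
def default_tensor_split(gpu_count: int) -> str:
    # Even split across GPUs, rendered with two decimals: 2 GPUs -> "0.50,0.50".
    ratio = 1.0 / float(gpu_count)
    return ",".join([f"{ratio:.2f}"] * gpu_count)


print(default_tensor_split(2))  # 0.50,0.50
print(default_tensor_split(3))  # 0.33,0.33,0.33
```

Note the rounded fractions need not sum to exactly 1.0 (e.g. three GPUs give 0.99 total); llama.cpp normalizes the ratios, so only their relative sizes matter.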
61
llamaCpp.Wrapper.app/docker_logs.py
Normal file
@@ -0,0 +1,61 @@
import json
import logging
import os
from typing import Optional

import httpx


log = logging.getLogger("docker_logs")


def _docker_transport() -> httpx.AsyncHTTPTransport:
    sock_path = os.getenv("DOCKER_SOCK", "/var/run/docker.sock")
    return httpx.AsyncHTTPTransport(uds=sock_path)


async def _docker_get(path: str, params: Optional[dict] = None) -> httpx.Response:
    timeout = httpx.Timeout(10.0, read=10.0)
    async with httpx.AsyncClient(transport=_docker_transport(), base_url="http://docker", timeout=timeout) as client:
        resp = await client.get(path, params=params)
        resp.raise_for_status()
        return resp


def _decode_docker_stream(data: bytes) -> str:
    """Decode Docker's multiplexed log stream (8-byte frame headers)."""
    if not data:
        return ""
    out = bytearray()
    idx = 0
    while idx + 8 <= len(data):
        # Header: 1 byte stream type (1=stdout, 2=stderr), 3 bytes padding,
        # then a 4-byte big-endian payload size. Both streams are kept.
        size = int.from_bytes(data[idx + 4: idx + 8], "big")
        idx += 8
        if idx + size > len(data):
            break
        out.extend(data[idx: idx + size])
        idx += size
    if out:
        return out.decode("utf-8", errors="replace")
    # No valid frames found: assume the stream was not multiplexed (TTY mode).
    return data.decode("utf-8", errors="replace")


async def docker_container_logs(container_name: str, tail_lines: int = 200) -> str:
    filters = json.dumps({"name": [container_name]})
    resp = await _docker_get("/containers/json", params={"filters": filters})
    containers = resp.json() or []
    if not containers:
        log.info("No docker container found for name=%s", container_name)
        return ""
    container_id = containers[0].get("Id")
    if not container_id:
        return ""
    resp = await _docker_get(
        f"/containers/{container_id}/logs",
        params={"stdout": 1, "stderr": 1, "tail": tail_lines},
    )
    return _decode_docker_stream(resp.content)
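`_decode_docker_stream` walks the Docker Engine API's multiplexed log framing: each frame is an 8-byte header (stream type, three padding bytes, big-endian payload length) followed by the payload. A standalone round-trip sketch of that framing (the encoder is hypothetical, for testing only):

```python
def encode_frame(stream_type: int, payload: bytes) -> bytes:
    # 1 byte stream type (1=stdout, 2=stderr), 3 bytes padding, 4-byte big-endian length.
    return bytes([stream_type, 0, 0, 0]) + len(payload).to_bytes(4, "big") + payload


def decode_frames(data: bytes) -> str:
    # Mirrors the decode loop above: read header, then slice out the payload.
    out = bytearray()
    idx = 0
    while idx + 8 <= len(data):
        size = int.from_bytes(data[idx + 4: idx + 8], "big")
        idx += 8
        if idx + size > len(data):
            break  # truncated trailing frame
        out.extend(data[idx: idx + size])
        idx += size
    return out.decode("utf-8", errors="replace")


raw = encode_frame(1, b"hello ") + encode_frame(2, b"world")
print(decode_frames(raw))  # hello world
```

Interleaving stdout and stderr frames concatenates in arrival order, which is what you want for a combined log tail.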
141
llamaCpp.Wrapper.app/download_manager.py
Normal file
@@ -0,0 +1,141 @@
import asyncio
import fnmatch
import logging
import os
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Optional

import httpx

from app.config import AppConfig
from app.logging_utils import configure_logging
from app.restart import RestartPlan, trigger_restart

configure_logging()
log = logging.getLogger("download_manager")


@dataclass
class DownloadStatus:
    download_id: str
    url: str
    filename: str
    status: str
    bytes_total: Optional[int] = None
    bytes_downloaded: int = 0
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    error: Optional[str] = None


class DownloadManager:
    def __init__(self, cfg: AppConfig, broadcaster=None) -> None:
        self.cfg = cfg
        self._downloads: Dict[str, DownloadStatus] = {}
        self._tasks: Dict[str, asyncio.Task] = {}
        self._semaphore = asyncio.Semaphore(cfg.download_max_concurrent)
        self._broadcaster = broadcaster

    async def _emit(self, payload: dict) -> None:
        if self._broadcaster:
            await self._broadcaster.publish(payload)

    def list_downloads(self) -> Dict[str, dict]:
        return {k: asdict(v) for k, v in self._downloads.items()}

    def get(self, download_id: str) -> Optional[DownloadStatus]:
        return self._downloads.get(download_id)

    def _is_allowed(self, url: str) -> bool:
        if not self.cfg.download_allowlist:
            return True
        return any(fnmatch.fnmatch(url, pattern) for pattern in self.cfg.download_allowlist)

    async def start(self, url: str, filename: Optional[str] = None) -> DownloadStatus:
        if not self._is_allowed(url):
            raise ValueError("url not allowed by allowlist")
        if not filename:
            filename = os.path.basename(url.split("?")[0]) or f"model-{uuid.uuid4().hex}.gguf"
        log.info("Download requested url=%s filename=%s", url, filename)
        download_id = uuid.uuid4().hex
        status = DownloadStatus(download_id=download_id, url=url, filename=filename, status="queued")
        self._downloads[download_id] = status
        task = asyncio.create_task(self._run_download(status))
        self._tasks[download_id] = task
        await self._emit({"type": "download_status", "download": asdict(status)})
        return status

    async def cancel(self, download_id: str) -> bool:
        task = self._tasks.get(download_id)
        if task:
            task.cancel()
        status = self._downloads.get(download_id)
        if status:
            log.info("Download cancelled id=%s filename=%s", download_id, status.filename)
            await self._emit({"type": "download_status", "download": asdict(status)})
            return True
        return False

    async def _run_download(self, status: DownloadStatus) -> None:
        status.status = "downloading"
        base = Path(self.cfg.download_dir)
        base.mkdir(parents=True, exist_ok=True)
        tmp_path = base / f".{status.filename}.partial"
        final_path = base / status.filename
        last_emit = 0.0

        try:
            async with self._semaphore:
                async with httpx.AsyncClient(timeout=None, follow_redirects=True) as client:
                    async with client.stream("GET", status.url) as resp:
                        resp.raise_for_status()
                        length = resp.headers.get("content-length")
                        if length:
                            status.bytes_total = int(length)
                        with tmp_path.open("wb") as f:
                            async for chunk in resp.aiter_bytes():
                                if chunk:
                                    f.write(chunk)
                                    status.bytes_downloaded += len(chunk)
                                    now = time.time()
                                    if now - last_emit >= 1:
                                        last_emit = now
                                        await self._emit({"type": "download_progress", "download": asdict(status)})
            if tmp_path.exists():
                tmp_path.replace(final_path)
            status.status = "completed"
            status.finished_at = time.time()
            log.info("Download completed id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_completed", "download": asdict(status)})
            if self.cfg.reload_on_new_model:
                plan = RestartPlan(
                    method=self.cfg.restart_method,
                    command=self.cfg.restart_command,
                    url=self.cfg.restart_url,
                    allowed_container=self.cfg.allowed_container,
                )
                await trigger_restart(
                    plan,
                    payload={
                        "reason": "new_model",
                        "model_id": status.filename,
                        "llamacpp_args": self.cfg.llamacpp_args,
                        "llamacpp_extra_args": self.cfg.llamacpp_extra_args,
                    },
                )
        except asyncio.CancelledError:
            status.status = "cancelled"
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download cancelled id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_cancelled", "download": asdict(status)})
        except Exception as exc:
            status.status = "error"
            status.error = str(exc)
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download error id=%s filename=%s error=%s", status.download_id, status.filename, exc)
            await self._emit({"type": "download_error", "download": asdict(status)})
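`DownloadManager._is_allowed` is plain glob matching over the full URL via `fnmatch`, with an empty allowlist meaning "allow everything". A standalone sketch (the example pattern is hypothetical):

```python
import fnmatch
from typing import List


def is_allowed(url: str, allowlist: List[str]) -> bool:
    # An empty allowlist permits any URL; otherwise at least one glob must match.
    if not allowlist:
        return True
    return any(fnmatch.fnmatch(url, pattern) for pattern in allowlist)


patterns = ["https://huggingface.co/*"]  # hypothetical allowlist entry
print(is_allowed("https://huggingface.co/org/model.gguf", patterns))  # True
print(is_allowed("https://example.com/model.gguf", patterns))  # False
```

Because `*` in `fnmatch` also crosses `/`, a single `https://host/*` pattern covers every path on that host.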
52
llamaCpp.Wrapper.app/llamacpp_client.py
Normal file
@@ -0,0 +1,52 @@
import logging
from typing import AsyncIterator, Dict, Optional

import httpx


log = logging.getLogger("llamacpp_client")


def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
    drop = {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in drop}


async def proxy_json(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    payload: Optional[dict],
    timeout_s: float,
) -> httpx.Response:
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        return await client.request(method, path, headers=_filter_headers(headers), json=payload)


async def proxy_raw(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    body: Optional[bytes],
    timeout_s: float,
) -> httpx.Response:
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        return await client.request(method, path, headers=_filter_headers(headers), content=body)


async def proxy_stream(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    payload: Optional[dict],
    timeout_s: float,
) -> AsyncIterator[bytes]:
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        async with client.stream(method, path, headers=_filter_headers(headers), json=payload) as resp:
            resp.raise_for_status()
            async for chunk in resp.aiter_bytes():
                if chunk:
                    yield chunk
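All three proxy helpers share `_filter_headers`, which strips headers that must be recomputed for the upstream request (`Host` for the new origin, `Content-Length` for the re-serialized body). A standalone sketch:

```python
from typing import Dict


def filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
    # Case-insensitive drop of hop-specific headers; everything else passes through.
    drop = {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in drop}


print(filter_headers({"Host": "wrapper", "Content-Length": "42", "Authorization": "Bearer x"}))
```

Forwarding a stale `Content-Length` after httpx re-encodes the JSON body would corrupt the upstream request, which is why it is dropped rather than copied.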
13
llamaCpp.Wrapper.app/logging_utils.py
Normal file
@@ -0,0 +1,13 @@
import logging
import os


def configure_logging() -> None:
    if logging.getLogger().handlers:
        return
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
45
llamaCpp.Wrapper.app/model_registry.py
Normal file
@@ -0,0 +1,45 @@
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ModelInfo:
    model_id: str
    created: int
    size: int
    path: Path


def scan_models(model_dir: str) -> List[ModelInfo]:
    base = Path(model_dir)
    if not base.exists():
        return []
    models: List[ModelInfo] = []
    now = int(time.time())
    for entry in base.iterdir():
        if entry.name.endswith(".partial"):
            continue
        if entry.is_file():
            size = entry.stat().st_size
            models.append(ModelInfo(model_id=entry.name, created=now, size=size, path=entry))
        elif entry.is_dir():
            models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
    models.sort(key=lambda m: m.model_id.lower())
    return models


def find_model(model_dir: str, model_id: str) -> Optional[ModelInfo]:
    for model in scan_models(model_dir):
        if model.model_id == model_id:
            return model
    return None


def resolve_model(model_dir: str, requested: str, aliases: Dict[str, str]) -> Optional[ModelInfo]:
    if not requested:
        return None
    if requested in aliases:
        requested = aliases[requested]
    return find_model(model_dir, requested)
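For reference, the registry scan above can be exercised against a throwaway directory. The function body is inlined below so the snippet runs standalone (the model filenames are made up for illustration); it shows the two behaviors that matter: `.partial` downloads are skipped, and ids sort case-insensitively.

```python
import tempfile
import time
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class ModelInfo:
    model_id: str
    created: int
    size: int
    path: Path


def scan_models(model_dir: str) -> List[ModelInfo]:
    # Same logic as model_registry.py: skip *.partial, sort case-insensitively.
    base = Path(model_dir)
    if not base.exists():
        return []
    models: List[ModelInfo] = []
    now = int(time.time())
    for entry in base.iterdir():
        if entry.name.endswith(".partial"):
            continue
        if entry.is_file():
            models.append(ModelInfo(model_id=entry.name, created=now, size=entry.stat().st_size, path=entry))
        elif entry.is_dir():
            models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
    models.sort(key=lambda m: m.model_id.lower())
    return models


with tempfile.TemporaryDirectory() as d:
    (Path(d) / "b-model.gguf").write_bytes(b"x" * 4)
    (Path(d) / "A-model.gguf").write_bytes(b"x" * 8)
    (Path(d) / "incomplete.gguf.partial").write_bytes(b"x")
    ids = [m.model_id for m in scan_models(d)]
    print(ids)  # .partial excluded, case-insensitive order
```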
140
llamaCpp.Wrapper.app/openai_translate.py
Normal file
@@ -0,0 +1,140 @@
import time
import uuid
from typing import Any, Dict, List, Tuple


def _messages_from_input(input_value: Any) -> List[Dict[str, Any]]:
    if isinstance(input_value, str):
        return [{"role": "user", "content": input_value}]
    if isinstance(input_value, list):
        messages: List[Dict[str, Any]] = []
        for item in input_value:
            if isinstance(item, str):
                messages.append({"role": "user", "content": item})
            elif isinstance(item, dict):
                role = item.get("role") or "user"
                content = item.get("content") or item.get("text") or ""
                if item.get("type") == "input_image":
                    content = [{"type": "image_url", "image_url": {"url": item.get("image_url", "")}}]
                messages.append({"role": role, "content": content})
        return messages
    return [{"role": "user", "content": str(input_value)}]


def _normalize_tools(tools: Any) -> Any:
    if not isinstance(tools, list):
        return tools
    normalized = []
    for tool in tools:
        if not isinstance(tool, dict):
            normalized.append(tool)
            continue
        if "function" in tool:
            normalized.append(tool)
            continue
        if tool.get("type") == "function" and ("name" in tool or "parameters" in tool or "description" in tool):
            function = {
                "name": tool.get("name"),
                "parameters": tool.get("parameters"),
                "description": tool.get("description"),
            }
            function = {k: v for k, v in function.items() if v is not None}
            normalized.append({"type": "function", "function": function})
            continue
        normalized.append(tool)
    return normalized


def _normalize_tool_choice(tool_choice: Any) -> Any:
    if not isinstance(tool_choice, dict):
        return tool_choice
    if "function" in tool_choice:
        return tool_choice
    if tool_choice.get("type") == "function" and "name" in tool_choice:
        return {"type": "function", "function": {"name": tool_choice.get("name")}}
    return tool_choice


def normalize_chat_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    if "return_format" in payload and "response_format" not in payload:
        if payload["return_format"] == "json":
            payload["response_format"] = {"type": "json_object"}
    if "functions" in payload and "tools" not in payload:
        functions = payload.get("functions")
        if isinstance(functions, list):
            tools = []
            for func in functions:
                if isinstance(func, dict):
                    tools.append({"type": "function", "function": func})
            if tools:
                payload["tools"] = tools
        payload.pop("functions", None)
    if "tools" in payload:
        payload["tools"] = _normalize_tools(payload.get("tools"))
    if "tool_choice" in payload:
        payload["tool_choice"] = _normalize_tool_choice(payload.get("tool_choice"))
    return payload


def responses_to_chat_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    model = payload.get("model") or "unknown"
    messages = _messages_from_input(payload.get("input", ""))

    chat_payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
    }

    passthrough_keys = [
        "temperature",
        "top_p",
        "max_output_tokens",
        "stream",
        "tools",
        "tool_choice",
        "response_format",
        "return_format",
        "frequency_penalty",
        "presence_penalty",
        "seed",
        "stop",
    ]

    for key in passthrough_keys:
        if key in payload:
            if key == "max_output_tokens":
                chat_payload["max_tokens"] = payload[key]
            elif key == "return_format" and payload[key] == "json":
                chat_payload["response_format"] = {"type": "json_object"}
            else:
                chat_payload[key] = payload[key]

    return normalize_chat_payload(chat_payload), model


def chat_to_responses(chat: Dict[str, Any], model: str) -> Dict[str, Any]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    content = ""
    if chat.get("choices"):
        choice = chat["choices"][0]
        message = choice.get("message") or {}
        content = message.get("content") or ""

    return {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": f"msg_{uuid.uuid4().hex}",
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": content}
                ],
            }
        ],
        "usage": chat.get("usage", {}),
    }
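A condensed sketch of the Responses-to-chat mapping above, for illustration only: `to_chat` is a hypothetical helper (not part of the module) that shows the two core renames — a string `input` becomes a single user message, and `max_output_tokens` becomes `max_tokens`.

```python
from typing import Any, Dict


def to_chat(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Condensed version of responses_to_chat_payload for illustration.
    chat: Dict[str, Any] = {
        "model": payload.get("model") or "unknown",
        "messages": [{"role": "user", "content": payload["input"]}],
    }
    if "max_output_tokens" in payload:
        # Chat Completions uses max_tokens, not max_output_tokens.
        chat["max_tokens"] = payload["max_output_tokens"]
    return chat


req = {"model": "qwen2.5", "input": "hello", "max_output_tokens": 64}
result = to_chat(req)
print(result)
```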
51
llamaCpp.Wrapper.app/restart.py
Normal file
@@ -0,0 +1,51 @@
import asyncio
import logging
import shlex
from dataclasses import dataclass
from typing import Optional

import httpx


log = logging.getLogger("llamacpp_restart")


@dataclass
class RestartPlan:
    method: str
    command: Optional[str]
    url: Optional[str]
    allowed_container: Optional[str] = None


async def trigger_restart(plan: RestartPlan, payload: Optional[dict] = None) -> None:
    if plan.method == "none":
        log.warning("Restart requested but restart method is none")
        return
    if plan.method == "http":
        if not plan.url:
            raise RuntimeError("restart url is required for http method")
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(plan.url, json=payload or {})
            resp.raise_for_status()
        return
    if plan.method == "docker":
        if not plan.command:
            raise RuntimeError("restart command must include container id or name for docker method")
        if plan.allowed_container and plan.command != plan.allowed_container:
            raise RuntimeError("docker restart command not allowed for non-target container")
        async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds="/var/run/docker.sock"), timeout=30) as client:
            resp = await client.post(f"http://docker/containers/{plan.command}/restart")
            resp.raise_for_status()
        return
    if plan.method == "shell":
        if not plan.command:
            raise RuntimeError("restart command is required for shell method")
        cmd = plan.command
        args = shlex.split(cmd)
        proc = await asyncio.create_subprocess_exec(*args)
        code = await proc.wait()
        if code != 0:
            raise RuntimeError(f"restart command failed with exit code {code}")
        return
    raise RuntimeError(f"unknown restart method {plan.method}")
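A small note on the shell path above: the configured command string is tokenized with `shlex.split` and run via `create_subprocess_exec`, so no shell interprets it and quoted arguments survive as single tokens. For example (the command string here is made up):

```python
import shlex

# Quoted segments stay together, exactly as create_subprocess_exec will see them.
command = 'docker restart "llama cpp"'
args = shlex.split(command)
print(args)
```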
35
llamaCpp.Wrapper.app/run.py
Normal file
@@ -0,0 +1,35 @@
import os
import signal
import subprocess
import sys

from app.config import load_config


def main() -> None:
    cfg = load_config()
    python = sys.executable

    api_cmd = [python, "-m", "uvicorn", "app.api_app:create_api_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.api_port)]
    ui_cmd = [python, "-m", "uvicorn", "app.ui_app:create_ui_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.ui_port)]

    procs = [subprocess.Popen(api_cmd)]
    if cfg.ui_port != cfg.api_port:
        procs.append(subprocess.Popen(ui_cmd))

    def shutdown(_sig, _frame):
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait(timeout=10)
        sys.exit(0)

    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)

    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    main()
102
llamaCpp.Wrapper.app/stream_transform.py
Normal file
@@ -0,0 +1,102 @@
import json
import time
import uuid
from typing import Any, AsyncIterator, Dict

import httpx


def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
    drop = {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in drop}


async def stream_chat_to_responses(
    base_url: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
) -> AsyncIterator[bytes]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    model = payload.get("model") or "unknown"
    msg_id = f"msg_{uuid.uuid4().hex}"
    output_text = ""

    response_stub = {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": msg_id,
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": ""}
                ],
            }
        ],
    }

    yield _sse_event("response.created", {"type": "response.created", "response": response_stub})

    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        async with client.stream(
            "POST",
            "/v1/chat/completions",
            headers=_filter_headers(headers),
            json=payload,
        ) as resp:
            resp.raise_for_status()
            buffer = ""
            async for chunk in resp.aiter_text():
                buffer += chunk
                while "\n\n" in buffer:
                    block, buffer = buffer.split("\n\n", 1)
                    lines = [line for line in block.splitlines() if line.startswith("data:")]
                    if not lines:
                        continue
                    data_str = "\n".join(line[len("data:"):].strip() for line in lines)
                    if data_str == "[DONE]":
                        continue
                    try:
                        data = json.loads(data_str)
                    except json.JSONDecodeError:
                        continue
                    choices = data.get("choices") or []
                    if not choices:
                        continue
                    delta = choices[0].get("delta") or {}
                    text_delta = delta.get("content")
                    if text_delta:
                        output_text += text_delta
                        yield _sse_event(
                            "response.output_text.delta",
                            {
                                "type": "response.output_text.delta",
                                "delta": text_delta,
                                "item_id": msg_id,
                                "output_index": 0,
                                "content_index": 0,
                            },
                        )

    yield _sse_event(
        "response.output_text.done",
        {
            "type": "response.output_text.done",
            "text": output_text,
            "item_id": msg_id,
            "output_index": 0,
            "content_index": 0,
        },
    )

    response_stub["output"][0]["content"][0]["text"] = output_text
    yield _sse_event("response.completed", {"type": "response.completed", "response": response_stub})
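The SSE framing used throughout this module is the standard `text/event-stream` shape: an `event:` line, a `data:` line with compact JSON, and a blank-line terminator. Inlining `_sse_event` shows the exact bytes produced for one delta frame:

```python
import json
from typing import Any, Dict


def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
    # Same framing as stream_transform.py: event line, data line, blank line.
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


frame = _sse_event("response.output_text.delta", {"delta": "Hi"})
print(frame.decode())
```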
313
llamaCpp.Wrapper.app/truenas_middleware.py
Normal file
@@ -0,0 +1,313 @@
import json
import logging
import shlex
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional

import websockets
import yaml


log = logging.getLogger("truenas_middleware")


@dataclass
class TrueNASConfig:
    ws_url: str
    api_key: str
    api_user: Optional[str]
    app_name: str
    verify_ssl: bool = False


def _parse_compose(raw: Any) -> Dict[str, Any]:
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        text = raw.strip()
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return yaml.safe_load(text)
    raise ValueError("Unsupported compose payload")


def _command_to_list(command: Any) -> list:
    if isinstance(command, list):
        return command
    if isinstance(command, str):
        return shlex.split(command)
    return []


def _extract_command(config: Dict[str, Any], service_name: str = "llamacpp") -> list:
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        svc = services.get(service_name) or {}
        return _command_to_list(svc.get("command"))
    return _command_to_list(config.get("command"))


def _model_id_from_command(cmd: list) -> Optional[str]:
    if "--model" in cmd:
        idx = cmd.index("--model")
        if idx + 1 < len(cmd):
            return Path(cmd[idx + 1]).name
    return None


def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
    if value is None:
        return cmd
    if flag in cmd:
        idx = cmd.index(flag)
        if idx + 1 < len(cmd):
            cmd[idx + 1] = value
        else:
            cmd.append(value)
        return cmd
    cmd.extend([flag, value])
    return cmd


def _merge_args(cmd: list, args: Dict[str, str]) -> list:
    flag_map = {
        "device": "--device",
        "tensor_split": "--tensor-split",
        "split_mode": "--split-mode",
        "n_gpu_layers": "--n-gpu-layers",
        "ctx_size": "--ctx-size",
        "batch_size": "--batch-size",
        "ubatch_size": "--ubatch-size",
        "cache_type_k": "--cache-type-k",
        "cache_type_v": "--cache-type-v",
        "flash_attn": "--flash-attn",
    }
    for key, value in args.items():
        flag = flag_map.get(key)
        if flag:
            if flag in cmd:
                continue
            _set_arg(cmd, flag, value)
    return cmd


def _merge_extra_args(cmd: list, extra: str) -> list:
    if not extra:
        return cmd
    extra_list = shlex.split(extra)
    filtered: list[str] = []
    skip_next = False
    for item in extra_list:
        if skip_next:
            skip_next = False
            continue
        if item in {"--device", "-dev"}:
            log.warning("Dropping --device from extra args to avoid llama.cpp device errors.")
            skip_next = True
            continue
        filtered.append(item)
    for flag in filtered:
        if flag not in cmd:
            cmd.append(flag)
    return cmd


def _update_model_command(command: Any, model_path: str, args: Dict[str, str], extra: str) -> list:
    cmd = _command_to_list(command)
    if "--device" in cmd:
        idx = cmd.index("--device")
        del cmd[idx: idx + 2]
    cmd = _set_arg(cmd, "--model", model_path)
    cmd = _merge_args(cmd, args)
    cmd = _merge_extra_args(cmd, extra)
    return cmd


def _replace_flags(cmd: list, flags: Dict[str, Optional[str]], extra: str) -> list:
    result = list(cmd)
    for flag in flags.keys():
        while flag in result:
            idx = result.index(flag)
            del result[idx: idx + 2]
    if "--device" in result:
        idx = result.index("--device")
        del result[idx: idx + 2]
    for flag, value in flags.items():
        if value is not None and value != "":
            result = _set_arg(result, flag, value)
    result = _merge_extra_args(result, extra)
    return result


async def get_app_config(cfg: TrueNASConfig) -> Dict[str, Any]:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    return config


async def get_app_command(cfg: TrueNASConfig, service_name: str = "llamacpp") -> list:
    config = await get_app_config(cfg)
    return _extract_command(config, service_name=service_name)


async def get_active_model_id(cfg: TrueNASConfig, service_name: str = "llamacpp") -> str:
    config = await get_app_config(cfg)
    cmd = _extract_command(config, service_name=service_name)
    return _model_id_from_command(cmd) or ""


async def get_app_logs(
    cfg: TrueNASConfig,
    tail_lines: int = 200,
    service_name: str = "llamacpp",
) -> str:
    tail_payloads = [
        {"tail": tail_lines},
        {"tail_lines": tail_lines},
        {"tail": str(tail_lines)},
    ]
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.container_logs", [cfg.app_name, service_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.container_logs failed (%s): %s", payload, exc)
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.logs", [cfg.app_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.logs failed (%s): %s", payload, exc)
    return ""


async def update_app_command(
    cfg: TrueNASConfig,
    command: list,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        svc["command"] = command
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    config["command"] = command
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def update_command_flags(
    cfg: TrueNASConfig,
    flags: Dict[str, Optional[str]],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _replace_flags(_command_to_list(cmd), flags, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    cmd = _replace_flags(_command_to_list(config.get("command")), flags, extra)
    config["command"] = cmd
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def _rpc_call(cfg: TrueNASConfig, method: str, params: Optional[list] = None) -> Any:
    ssl_ctx = None
    if cfg.ws_url.startswith("wss://") and not cfg.verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(cfg.ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(
            json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [cfg.api_key]})
        )
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            if not cfg.api_user:
                raise RuntimeError("API key rejected and TRUENAS_API_USER not set")
            await ws.send(
                json.dumps(
                    {
                        "id": 2,
                        "msg": "method",
                        "method": "auth.login_ex",
                        "params": [
                            {
                                "mechanism": "API_KEY_PLAIN",
                                "username": cfg.api_user,
                                "api_key": cfg.api_key,
                            }
                        ],
                    }
                )
            )
            auth_ex = json.loads(await ws.recv())
            if auth_ex.get("result", {}).get("response_type") != "SUCCESS":
                raise RuntimeError("API key authentication failed")

        req_id = 3
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def switch_model(
    cfg: TrueNASConfig,
    model_path: str,
    args: Dict[str, str],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    # Validate the payload before touching it (previously this check ran after
    # the compose branch had already called config.get()).
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _update_model_command(cmd, model_path, args, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        log.info("Requested model switch to %s via TrueNAS middleware (custom app)", model_path)
        return

    cmd = config.get("command")
    config["command"] = _update_model_command(cmd, model_path, args, extra)
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
    log.info("Requested model switch to %s via TrueNAS middleware (catalog app)", model_path)
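The command-rewrite helpers above are pure list manipulation, so they are easy to check in isolation. This standalone sketch inlines `_set_arg` (same behavior as the module's version: replace the value in place if the flag exists, otherwise append flag and value); the `llama-server` command line is an illustrative example.

```python
import shlex
from typing import Optional


def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
    # Mirrors truenas_middleware._set_arg: replace in place, or append.
    if value is None:
        return cmd
    if flag in cmd:
        idx = cmd.index(flag)
        if idx + 1 < len(cmd):
            cmd[idx + 1] = value
        else:
            cmd.append(value)
        return cmd
    cmd.extend([flag, value])
    return cmd


cmd = shlex.split("llama-server --model /models/old.gguf --ctx-size 4096")
_set_arg(cmd, "--model", "/models/new.gguf")   # existing flag: value replaced
_set_arg(cmd, "--n-gpu-layers", "99")          # new flag: appended
print(" ".join(cmd))
```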
357
llamaCpp.Wrapper.app/ui_app.py
Normal file
@@ -0,0 +1,357 @@
|
|||||||
|
import asyncio
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
|
import httpx
|
||||||
|
from fastapi import FastAPI, HTTPException, Request
|
||||||
|
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse, StreamingResponse
|
||||||
|
|
||||||
|
from app.config import load_config
|
||||||
|
from app.docker_logs import docker_container_logs
|
||||||
|
from app.download_manager import DownloadManager
|
||||||
|
from app.logging_utils import configure_logging
|
||||||
|
from app.model_registry import scan_models
|
||||||
|
from app.truenas_middleware import (
|
||||||
|
TrueNASConfig,
|
||||||
|
get_active_model_id,
|
||||||
|
get_app_command,
|
||||||
|
get_app_logs,
|
||||||
|
switch_model,
|
||||||
|
update_command_flags,
|
||||||
|
)
|
||||||
|
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry
|
||||||
|
|
||||||
|
|
||||||
|
configure_logging()
|
||||||
|
log = logging.getLogger("ui_app")
|
||||||
|
|
||||||
|
|
||||||
|
class EventBroadcaster:
|
||||||
|
def __init__(self) -> None:
|
||||||
|
self._queues: set[asyncio.Queue] = set()
|
||||||
|
|
||||||
|
def connect(self) -> asyncio.Queue:
|
||||||
|
queue: asyncio.Queue = asyncio.Queue()
|
||||||
|
self._queues.add(queue)
|
||||||
|
return queue
|
||||||
|
|
||||||
|
def disconnect(self, queue: asyncio.Queue) -> None:
|
||||||
|
self._queues.discard(queue)
|
||||||
|
|
||||||
|
async def publish(self, payload: dict) -> None:
|
||||||
|
for queue in list(self._queues):
|
||||||
|
queue.put_nowait(payload)
|
||||||
|
|
||||||
|
|
||||||
|
def _static_path() -> Path:
|
||||||
|
return Path(__file__).parent / "ui_static"
|
||||||
|
|
||||||
|
|
||||||
|
async def _fetch_active_model(truenas_cfg: Optional[TrueNASConfig]) -> Optional[str]:
|
||||||
|
if not truenas_cfg:
|
||||||
|
return None
|
||||||
|
try:
|
||||||
|
return await get_active_model_id(truenas_cfg)
|
||||||
|
except Exception as exc:
|
||||||
|
log.warning("Failed to read active model from TrueNAS config: %s", exc)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _model_list(model_dir: str, active_model: Optional[str]) -> Dict[str, Any]:
|
||||||
|
data = []
|
||||||
|
for model in scan_models(model_dir):
|
||||||
|
data.append({
|
||||||
|
"id": model.model_id,
|
||||||
|
"size": model.size,
|
||||||
|
"active": model.model_id == active_model,
|
||||||
|
})
|
||||||
|
return {"models": data, "active_model": active_model}
|
||||||
|
|
||||||
|
|
||||||
|
def create_ui_app() -> FastAPI:
|
||||||
|
cfg = load_config()
|
||||||
|
app = FastAPI(title="llama.cpp Model Manager", version="0.1.0")
|
||||||
|
broadcaster = EventBroadcaster()
|
||||||
|
manager = DownloadManager(cfg, broadcaster=broadcaster)
|
||||||
|
truenas_cfg = None
|
||||||
|
if cfg.truenas_ws_url and cfg.truenas_api_key:
|
||||||
|
truenas_cfg = TrueNASConfig(
|
||||||
|
ws_url=cfg.truenas_ws_url,
|
||||||
|
api_key=cfg.truenas_api_key,
|
||||||
|
api_user=cfg.truenas_api_user,
|
||||||
|
app_name=cfg.truenas_app_name,
|
||||||
|
verify_ssl=cfg.truenas_verify_ssl,
|
||||||
|
)
|
||||||
|
|
||||||
|
async def monitor_active_model() -> None:
|
||||||
|
last_model = None
|
||||||
|
while True:
|
||||||
|
current = await _fetch_active_model(truenas_cfg)
|
||||||
|
if current and current != last_model:
|
||||||
|
last_model = current
|
||||||
|
await broadcaster.publish({"type": "active_model", "model_id": current})
|
||||||
|
await asyncio.sleep(3)
|
||||||
|
|
||||||
|
async def _fetch_logs() -> str:
|
||||||
|
logs = ""
|
||||||
|
if truenas_cfg:
|
||||||
|
try:
|
||||||
|
logs = await asyncio.wait_for(get_app_logs(truenas_cfg, tail_lines=200), timeout=5)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
logs = ""
|
||||||
|
if not logs and cfg.llamacpp_container_name:
|
||||||
|
try:
|
||||||
|
logs = await asyncio.wait_for(
|
||||||
|
docker_container_logs(cfg.llamacpp_container_name, tail_lines=200),
|
||||||
|
timeout=10,
|
||||||
|
)
|
||||||
|
except asyncio.TimeoutError:
|
||||||
|
logs = ""
|
||||||
|
return logs
|
||||||
|
|
||||||
|
@app.on_event("startup")
|
||||||
|
async def start_tasks() -> None:
|
||||||
|
asyncio.create_task(monitor_active_model())
|
||||||
|
|
||||||
|
@app.middleware("http")
|
||||||
|
async def log_requests(request: Request, call_next):
|
||||||
|
log.info("UI request %s %s", request.method, request.url.path)
|
||||||
|
return await call_next(request)
|
||||||
|
|
||||||
|
@app.get("/health")
|
||||||
|
async def health() -> Dict[str, Any]:
|
||||||
|
return {"status": "ok", "model_dir": cfg.model_dir}
|
||||||
|
|
||||||
|
@app.get("/")
|
||||||
|
async def index() -> HTMLResponse:
|
||||||
|
return FileResponse(_static_path() / "index.html")
|
||||||
|
|
||||||
|
@app.get("/ui/styles.css")
|
||||||
|
async def styles() -> FileResponse:
|
||||||
|
return FileResponse(_static_path() / "styles.css")
|
||||||
|
|
||||||
|
@app.get("/ui/app.js")
|
||||||
|
async def app_js() -> FileResponse:
|
||||||
|
return FileResponse(_static_path() / "app.js")
|
||||||
|
|
||||||
|
@app.get("/ui/api/models")
|
||||||
|
async def list_models() -> JSONResponse:
|
||||||
|
active_model = await _fetch_active_model(truenas_cfg)
|
||||||
|
log.info("UI list models active=%s", active_model)
|
||||||
|
return JSONResponse(_model_list(cfg.model_dir, active_model))
|
||||||
|
|
||||||
|
    @app.get("/ui/api/downloads")
    async def list_downloads() -> JSONResponse:
        log.info("UI list downloads")
        return JSONResponse({"downloads": manager.list_downloads()})

    @app.post("/ui/api/downloads")
    async def start_download(request: Request) -> JSONResponse:
        payload = await request.json()
        url = payload.get("url")
        filename = payload.get("filename")
        log.info("UI download start url=%s filename=%s", url, filename)
        if not url:
            raise HTTPException(status_code=400, detail="url is required")
        try:
            status = await manager.start(url, filename=filename)
        except ValueError as exc:
            raise HTTPException(status_code=403, detail=str(exc))
        return JSONResponse({"download": status.__dict__})

    @app.delete("/ui/api/downloads/{download_id}")
    async def cancel_download(download_id: str) -> JSONResponse:
        log.info("UI download cancel id=%s", download_id)
        ok = await manager.cancel(download_id)
        if not ok:
            raise HTTPException(status_code=404, detail="download not found")
        return JSONResponse({"status": "cancelled"})

    @app.get("/ui/api/events")
    async def events() -> StreamingResponse:
        queue = broadcaster.connect()

        async def event_stream():
            try:
                while True:
                    payload = await queue.get()
                    data = json.dumps(payload, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
            finally:
                broadcaster.disconnect(queue)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    @app.post("/ui/api/switch-model")
    async def switch_model_ui(request: Request) -> JSONResponse:
        payload = await request.json()
        model_id = payload.get("model_id")
        warmup_override = payload.get("warmup_prompt") or ""
        if not model_id:
            raise HTTPException(status_code=400, detail="model_id is required")

        model_path = Path(cfg.model_dir) / model_id
        if not model_path.exists():
            raise HTTPException(status_code=404, detail="model not found")

        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")

        try:
            container_model_path = str(Path(cfg.model_container_dir) / model_id)
            await switch_model(truenas_cfg, container_model_path, cfg.llamacpp_args, cfg.llamacpp_extra_args)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")

        warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
        log.info("UI warmup after switch model=%s prompt_len=%s", model_id, len(warmup_prompt))
        try:
            await run_warmup_with_retry(cfg.base_url, model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch warmup failed: {exc}")

        try:
            async with httpx.AsyncClient(base_url=cfg.base_url, timeout=120) as client:
                resp = await client.post(
                    "/v1/chat/completions",
                    json={
                        "model": model_id,
                        "messages": [{"role": "user", "content": "ok"}],
                        "max_tokens": 4,
                        "temperature": 0,
                    },
                )
                resp.raise_for_status()
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch verification failed: {exc}")

        await broadcaster.publish({"type": "model_switched", "model_id": model_id})
        log.info("UI model switched model=%s", model_id)
        return JSONResponse({"status": "ok", "model_id": model_id})

    @app.get("/ui/api/llamacpp-config")
    async def get_llamacpp_config() -> JSONResponse:
        active_model = await _fetch_active_model(truenas_cfg)
        log.info("UI get llama.cpp config active=%s", active_model)
        params: Dict[str, Optional[str]] = {}
        command_raw = []
        if truenas_cfg:
            command_raw = await get_app_command(truenas_cfg)
        flag_map = {
            "--ctx-size": "ctx_size",
            "--n-gpu-layers": "n_gpu_layers",
            "--tensor-split": "tensor_split",
            "--split-mode": "split_mode",
            "--cache-type-k": "cache_type_k",
            "--cache-type-v": "cache_type_v",
            "--flash-attn": "flash_attn",
            "--temp": "temp",
            "--top-k": "top_k",
            "--top-p": "top_p",
            "--repeat-penalty": "repeat_penalty",
            "--repeat-last-n": "repeat_last_n",
            "--frequency-penalty": "frequency_penalty",
            "--presence-penalty": "presence_penalty",
        }
        if isinstance(command_raw, list):
            for flag, key in flag_map.items():
                if flag in command_raw:
                    idx = command_raw.index(flag)
                    if idx + 1 < len(command_raw):
                        params[key] = command_raw[idx + 1]
        known_flags = set(flag_map.keys()) | {"--model"}
        extra = []
        if isinstance(command_raw, list):
            skip_next = False
            for item in command_raw:
                if skip_next:
                    skip_next = False
                    continue
                if item in known_flags:
                    skip_next = True
                    continue
                extra.append(item)
        return JSONResponse(
            {
                "active_model": active_model,
                "params": params,
                "extra_args": " ".join(extra),
            }
        )

    @app.post("/ui/api/llamacpp-config")
    async def update_llamacpp_config(request: Request) -> JSONResponse:
        payload = await request.json()
        params = payload.get("params") or {}
        extra_args = payload.get("extra_args") or ""
        warmup_override = payload.get("warmup_prompt") or ""
        log.info("UI save llama.cpp config params=%s extra_args=%s", params, extra_args)
        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
        flags = {
            "--ctx-size": params.get("ctx_size"),
            "--n-gpu-layers": params.get("n_gpu_layers"),
            "--tensor-split": params.get("tensor_split"),
            "--split-mode": params.get("split_mode"),
            "--cache-type-k": params.get("cache_type_k"),
            "--cache-type-v": params.get("cache_type_v"),
            "--flash-attn": params.get("flash_attn"),
            "--temp": params.get("temp"),
            "--top-k": params.get("top_k"),
            "--top-p": params.get("top_p"),
            "--repeat-penalty": params.get("repeat_penalty"),
            "--repeat-last-n": params.get("repeat_last_n"),
            "--frequency-penalty": params.get("frequency_penalty"),
            "--presence-penalty": params.get("presence_penalty"),
        }
        try:
            await update_command_flags(truenas_cfg, flags, extra_args)
        except Exception as exc:
            log.exception("UI update llama.cpp config failed")
            raise HTTPException(status_code=500, detail=f"config update failed: {exc}")
        active_model = await _fetch_active_model(truenas_cfg)
        if active_model:
            warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
            log.info("UI warmup after config update model=%s prompt_len=%s", active_model, len(warmup_prompt))
            try:
                await run_warmup_with_retry(cfg.base_url, active_model, warmup_prompt, timeout_s=cfg.switch_timeout_s)
            except Exception as exc:
                raise HTTPException(status_code=500, detail=f"config warmup failed: {exc}")
        await broadcaster.publish({"type": "llamacpp_config_updated"})
        return JSONResponse({"status": "ok"})

    @app.get("/ui/api/llamacpp-logs")
    async def get_llamacpp_logs() -> JSONResponse:
        logs = await _fetch_logs()
        return JSONResponse({"logs": logs})

    @app.get("/ui/api/llamacpp-logs/stream")
    async def stream_llamacpp_logs() -> StreamingResponse:
        async def event_stream():
            last_lines: list[str] = []
            while True:
                logs = await _fetch_logs()
                lines = logs.splitlines()
                if last_lines:
                    last_tail = last_lines[-1]
                    idx = -1
                    for i in range(len(lines) - 1, -1, -1):
                        if lines[i] == last_tail:
                            idx = i
                            break
                    if idx >= 0:
                        lines = lines[idx + 1 :]
                if lines:
                    last_lines = (last_lines + lines)[-200:]
                    data = json.dumps({"type": "logs", "lines": lines}, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
                await asyncio.sleep(2)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    return app
306
llamaCpp.Wrapper.app/ui_static/app.js
Normal file
@@ -0,0 +1,306 @@
const modelsList = document.getElementById("models-list");
const downloadsList = document.getElementById("downloads-list");
const refreshModels = document.getElementById("refresh-models");
const refreshDownloads = document.getElementById("refresh-downloads");
const form = document.getElementById("download-form");
const errorEl = document.getElementById("download-error");
const statusEl = document.getElementById("switch-status");
const configStatusEl = document.getElementById("config-status");
const configForm = document.getElementById("config-form");
const refreshConfig = document.getElementById("refresh-config");
const warmupPromptEl = document.getElementById("warmup-prompt");
const refreshLogs = document.getElementById("refresh-logs");
const logsOutput = document.getElementById("logs-output");
const logsStatus = document.getElementById("logs-status");
const themeToggle = document.getElementById("theme-toggle");

const applyTheme = (theme) => {
  document.documentElement.setAttribute("data-theme", theme);
  themeToggle.textContent = theme === "dark" ? "Light" : "Dark";
  themeToggle.setAttribute("aria-pressed", theme === "dark" ? "true" : "false");
};

const savedTheme = localStorage.getItem("theme") || "light";
applyTheme(savedTheme);
themeToggle.addEventListener("click", () => {
  const next = document.documentElement.getAttribute("data-theme") === "dark" ? "light" : "dark";
  localStorage.setItem("theme", next);
  applyTheme(next);
});

const cfgFields = {
  ctx_size: document.getElementById("cfg-ctx-size"),
  n_gpu_layers: document.getElementById("cfg-n-gpu-layers"),
  tensor_split: document.getElementById("cfg-tensor-split"),
  split_mode: document.getElementById("cfg-split-mode"),
  cache_type_k: document.getElementById("cfg-cache-type-k"),
  cache_type_v: document.getElementById("cfg-cache-type-v"),
  flash_attn: document.getElementById("cfg-flash-attn"),
  temp: document.getElementById("cfg-temp"),
  top_k: document.getElementById("cfg-top-k"),
  top_p: document.getElementById("cfg-top-p"),
  repeat_penalty: document.getElementById("cfg-repeat-penalty"),
  repeat_last_n: document.getElementById("cfg-repeat-last-n"),
  frequency_penalty: document.getElementById("cfg-frequency-penalty"),
  presence_penalty: document.getElementById("cfg-presence-penalty"),
};
const extraArgsEl = document.getElementById("cfg-extra-args");

const fmtBytes = (bytes) => {
  if (!bytes && bytes !== 0) return "-";
  const units = ["B", "KB", "MB", "GB", "TB"];
  let idx = 0;
  let value = bytes;
  while (value >= 1024 && idx < units.length - 1) {
    value /= 1024;
    idx += 1;
  }
  return `${value.toFixed(1)} ${units[idx]}`;
};

const setStatus = (message, type) => {
  statusEl.textContent = message || "";
  statusEl.className = "status";
  if (type) {
    statusEl.classList.add(type);
  }
};

const setConfigStatus = (message, type) => {
  configStatusEl.textContent = message || "";
  configStatusEl.className = "status";
  if (type) {
    configStatusEl.classList.add(type);
  }
};

async function loadModels() {
  const res = await fetch("/ui/api/models");
  const data = await res.json();
  modelsList.innerHTML = "";
  const activeModel = data.active_model;
  data.models.forEach((model) => {
    const li = document.createElement("li");
    if (model.active) {
      li.classList.add("active");
    }
    const row = document.createElement("div");
    row.className = "model-row";

    const name = document.createElement("span");
    name.textContent = `${model.id} (${fmtBytes(model.size)})`;

    const actions = document.createElement("div");
    if (model.active) {
      const badge = document.createElement("span");
      badge.className = "badge";
      badge.textContent = "Active";
      actions.appendChild(badge);
    } else {
      const button = document.createElement("button");
      button.className = "ghost";
      button.textContent = "Switch";
      button.onclick = async () => {
        setStatus(`Switching to ${model.id}...`);
        const warmupPrompt = warmupPromptEl.value.trim();
        const res = await fetch("/ui/api/switch-model", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model_id: model.id, warmup_prompt: warmupPrompt }),
        });
        const payload = await res.json();
        if (!res.ok) {
          setStatus(payload.detail || "Switch failed.", "error");
          return;
        }
        warmupPromptEl.value = "";
        setStatus(`Active model: ${model.id}`, "ok");
        await loadModels();
      };
      actions.appendChild(button);
    }

    row.appendChild(name);
    row.appendChild(actions);
    li.appendChild(row);
    modelsList.appendChild(li);
  });
  if (activeModel) {
    setStatus(`Active model: ${activeModel}`, "ok");
  }
}

async function loadDownloads() {
  const res = await fetch("/ui/api/downloads");
  const data = await res.json();
  downloadsList.innerHTML = "";
  const entries = Object.values(data.downloads || {});
  if (!entries.length) {
    downloadsList.innerHTML = "<p>No active downloads.</p>";
    return;
  }
  entries.forEach((download) => {
    const card = document.createElement("div");
    card.className = "download-card";

    const title = document.createElement("strong");
    title.textContent = download.filename;

    const meta = document.createElement("div");
    const percent = download.bytes_total
      ? Math.round((download.bytes_downloaded / download.bytes_total) * 100)
      : 0;
    meta.textContent = `${download.status} · ${fmtBytes(download.bytes_downloaded)} / ${fmtBytes(download.bytes_total)}`;

    const progress = document.createElement("div");
    progress.className = "progress";
    const bar = document.createElement("span");
    bar.style.width = `${Math.min(percent, 100)}%`;
    progress.appendChild(bar);

    const actions = document.createElement("div");
    if (download.status === "downloading" || download.status === "queued") {
      const cancel = document.createElement("button");
      cancel.className = "ghost";
      cancel.textContent = "Cancel";
      cancel.onclick = async () => {
        await fetch(`/ui/api/downloads/${download.download_id}`, { method: "DELETE" });
        await loadDownloads();
      };
      actions.appendChild(cancel);
    }

    card.appendChild(title);
    card.appendChild(meta);
    card.appendChild(progress);
    card.appendChild(actions);
    downloadsList.appendChild(card);
  });
}

async function loadConfig() {
  const res = await fetch("/ui/api/llamacpp-config");
  const data = await res.json();
  Object.entries(cfgFields).forEach(([key, el]) => {
    el.value = data.params?.[key] || "";
  });
  extraArgsEl.value = data.extra_args || "";
  if (data.active_model) {
    setConfigStatus(`Active model: ${data.active_model}`, "ok");
  }
}

async function loadLogs() {
  const res = await fetch("/ui/api/llamacpp-logs");
  if (!res.ok) {
    logsStatus.textContent = "Unavailable";
    return;
  }
  const data = await res.json();
  logsOutput.textContent = data.logs || "";
  logsStatus.textContent = data.logs ? "Snapshot" : "Empty";
}

form.addEventListener("submit", async (event) => {
  event.preventDefault();
  errorEl.textContent = "";
  const url = document.getElementById("model-url").value.trim();
  const filename = document.getElementById("model-filename").value.trim();
  if (!url) {
    errorEl.textContent = "URL is required.";
    return;
  }
  const payload = { url };
  if (filename) payload.filename = filename;
  const res = await fetch("/ui/api/downloads", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    const err = await res.json();
    errorEl.textContent = err.detail || "Failed to start download.";
    return;
  }
  document.getElementById("model-url").value = "";
  document.getElementById("model-filename").value = "";
  await loadDownloads();
});

configForm.addEventListener("submit", async (event) => {
  event.preventDefault();
  setConfigStatus("Applying parameters...");
  const params = {};
  Object.entries(cfgFields).forEach(([key, el]) => {
    if (el.value.trim()) {
      params[key] = el.value.trim();
    }
  });
  const warmupPrompt = warmupPromptEl.value.trim();
  const res = await fetch("/ui/api/llamacpp-config", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ params, extra_args: extraArgsEl.value.trim(), warmup_prompt: warmupPrompt }),
  });
  const payload = await res.json();
  if (!res.ok) {
    setConfigStatus(payload.detail || "Update failed.", "error");
    return;
  }
  setConfigStatus("Parameters updated.", "ok");
  warmupPromptEl.value = "";
});

refreshModels.addEventListener("click", loadModels);
refreshDownloads.addEventListener("click", loadDownloads);
refreshConfig.addEventListener("click", loadConfig);
refreshLogs.addEventListener("click", loadLogs);

loadModels();
loadDownloads();
loadConfig();
loadLogs();

const eventSource = new EventSource("/ui/api/events");
eventSource.onmessage = async (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type === "download_progress" || payload.type === "download_completed" || payload.type === "download_status") {
    await loadDownloads();
  }
  if (payload.type === "active_model") {
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switched") {
    setStatus(`Active model: ${payload.model_id}`, "ok");
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switch_failed") {
    setStatus(payload.error || "Model switch failed.", "error");
  }
  if (payload.type === "llamacpp_config_updated") {
    await loadConfig();
  }
};

const logsSource = new EventSource("/ui/api/llamacpp-logs/stream");
logsSource.onopen = () => {
  logsStatus.textContent = "Streaming";
};
logsSource.onmessage = (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type !== "logs") {
    return;
  }
  const lines = payload.lines || [];
  if (!lines.length) return;
  const current = logsOutput.textContent.split("\n").filter((line) => line.length);
  const merged = current.concat(lines).slice(-400);
  logsOutput.textContent = merged.join("\n");
  logsOutput.scrollTop = logsOutput.scrollHeight;
  logsStatus.textContent = "Streaming";
};
logsSource.onerror = () => {
  logsStatus.textContent = "Disconnected";
};
151
llamaCpp.Wrapper.app/ui_static/index.html
Normal file
@@ -0,0 +1,151 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <title>llama.cpp Model Manager</title>
    <link rel="stylesheet" href="/ui/styles.css" />
  </head>
  <body>
    <div class="page">
      <header class="topbar">
        <div class="brand">
          <p class="eyebrow">llama.cpp wrapper</p>
          <h1>Model Manager</h1>
          <p class="lede">Curate models, tune runtime parameters, and keep llama.cpp responsive.</p>
        </div>
        <div class="header-actions">
          <button id="theme-toggle" class="ghost" type="button" aria-pressed="false">Dark</button>
          <div class="quick-actions card">
            <h2>Quick Add</h2>
            <form id="download-form">
              <label>
                Model URL
                <input type="url" id="model-url" placeholder="https://.../model.gguf" required />
              </label>
              <label>
                Optional filename
                <input type="text" id="model-filename" placeholder="custom-name.gguf" />
              </label>
              <button type="submit">Start Download</button>
              <p id="download-error" class="error"></p>
            </form>
          </div>
        </div>
      </header>

      <main class="layout">
        <section class="column">
          <div class="card">
            <div class="card-header">
              <h3>Models</h3>
              <button id="refresh-models" class="ghost">Refresh</button>
            </div>
            <div id="switch-status" class="status"></div>
            <label class="config-wide">
              Warmup prompt (one-time)
              <textarea id="warmup-prompt" rows="3" placeholder="Optional warmup prompt for the next restart only"></textarea>
            </label>
            <ul id="models-list" class="list"></ul>
          </div>

          <div class="card">
            <div class="card-header">
              <h3>Downloads</h3>
              <button id="refresh-downloads" class="ghost">Refresh</button>
            </div>
            <div id="downloads-list" class="downloads"></div>
          </div>
        </section>

        <section class="column">
          <div class="card">
            <div class="card-header">
              <h3>Runtime Parameters</h3>
              <button id="refresh-config" class="ghost">Refresh</button>
            </div>
            <div id="config-status" class="status"></div>
            <form id="config-form" class="config-grid">
              <label>
                ctx-size
                <input type="text" id="cfg-ctx-size" placeholder="e.g. 8192" />
              </label>
              <label>
                n-gpu-layers
                <input type="text" id="cfg-n-gpu-layers" placeholder="e.g. 999" />
              </label>
              <label>
                tensor-split
                <input type="text" id="cfg-tensor-split" placeholder="e.g. 0.5,0.5" />
              </label>
              <label>
                split-mode
                <input type="text" id="cfg-split-mode" placeholder="e.g. layer" />
              </label>
              <label>
                cache-type-k
                <input type="text" id="cfg-cache-type-k" placeholder="e.g. q8_0" />
              </label>
              <label>
                cache-type-v
                <input type="text" id="cfg-cache-type-v" placeholder="e.g. q8_0" />
              </label>
              <label>
                flash-attn
                <input type="text" id="cfg-flash-attn" placeholder="on/off" />
              </label>
              <label>
                temp
                <input type="text" id="cfg-temp" placeholder="e.g. 0.7" />
              </label>
              <label>
                top-k
                <input type="text" id="cfg-top-k" placeholder="e.g. 40" />
              </label>
              <label>
                top-p
                <input type="text" id="cfg-top-p" placeholder="e.g. 0.9" />
              </label>
              <label>
                repeat-penalty
                <input type="text" id="cfg-repeat-penalty" placeholder="e.g. 1.1" />
              </label>
              <label>
                repeat-last-n
                <input type="text" id="cfg-repeat-last-n" placeholder="e.g. 256" />
              </label>
              <label>
                frequency-penalty
                <input type="text" id="cfg-frequency-penalty" placeholder="e.g. 0.1" />
              </label>
              <label>
                presence-penalty
                <input type="text" id="cfg-presence-penalty" placeholder="e.g. 0.0" />
              </label>
              <label class="config-wide">
                extra args
                <textarea id="cfg-extra-args" rows="3" placeholder="--mlock --no-mmap"></textarea>
              </label>
              <button type="submit" class="config-wide">Apply Parameters</button>
            </form>
          </div>
        </section>
      </main>

      <section class="card logs-panel">
        <div class="card-header">
          <div>
            <h3>llama.cpp Logs</h3>
            <p class="lede small">Live tail from the llama.cpp container.</p>
          </div>
          <div class="log-actions">
            <span id="logs-status" class="badge muted">Idle</span>
            <button id="refresh-logs" class="ghost">Refresh</button>
          </div>
        </div>
        <pre id="logs-output" class="log-output"></pre>
      </section>
    </div>
    <script src="/ui/app.js"></script>
  </body>
</html>
337
llamaCpp.Wrapper.app/ui_static/styles.css
Normal file
@@ -0,0 +1,337 @@
:root {
  --bg: #f5f6f8;
  --panel: #ffffff;
  --panel-muted: #f2f3f6;
  --text: #111318;
  --muted: #5b6472;
  --border: rgba(17, 19, 24, 0.08);
  --accent: #0a84ff;
  --accent-ink: #005ad6;
  --shadow: 0 20px 60px rgba(17, 19, 24, 0.08);
}

* {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}

body {
  font-family: "SF Pro Text", "SF Pro Display", "Helvetica Neue", "Segoe UI", sans-serif;
  background: radial-gradient(circle at top, #ffffff 0%, var(--bg) 60%);
  color: var(--text);
}

.page {
  max-width: 1200px;
  margin: 0 auto;
  padding: 48px 28px 72px;
}

.topbar {
  display: grid;
  grid-template-columns: minmax(240px, 1.2fr) minmax(280px, 0.8fr);
  gap: 32px;
  align-items: stretch;
  margin-bottom: 36px;
}

.header-actions {
  display: grid;
  gap: 16px;
  justify-items: end;
}

.header-actions .quick-actions {
  width: 100%;
}

.header-actions #theme-toggle {
  justify-self: end;
}

.brand h1 {
  font-size: clamp(2.2rem, 4vw, 3.2rem);
  letter-spacing: -0.02em;
}

.eyebrow {
  text-transform: uppercase;
  letter-spacing: 0.2em;
  font-size: 0.68rem;
  color: var(--muted);
}

.lede {
  margin-top: 12px;
  font-size: 1rem;
  color: var(--muted);
}

.lede.small {
  font-size: 0.85rem;
}

.card {
  background: var(--panel);
  padding: 22px;
  border-radius: 22px;
  border: 1px solid var(--border);
  box-shadow: var(--shadow);
}

.quick-actions h2 {
  margin-bottom: 14px;
  font-size: 1.1rem;
}

.layout {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
  gap: 24px;
}

.column {
  display: grid;
  gap: 24px;
}

.logs-panel {
  margin-top: 28px;
}

.card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 12px;
  margin-bottom: 16px;
}

.card-header h3 {
  font-size: 1.1rem;
}

.log-actions {
  display: flex;
  align-items: center;
  gap: 12px;
}

form {
  display: grid;
  gap: 12px;
}

label {
  display: grid;
  gap: 6px;
  font-size: 0.85rem;
  color: var(--muted);
}

input,
textarea,
button {
  font: inherit;
}

input,
|
||||||
|
textarea {
|
||||||
|
padding: 10px 12px;
|
||||||
|
border-radius: 12px;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
background: #fff;
|
||||||
|
}
|
||||||
|
|
||||||
|
button {
|
||||||
|
border: none;
|
||||||
|
padding: 10px 16px;
|
||||||
|
border-radius: 12px;
|
||||||
|
background: var(--accent);
|
||||||
|
color: #fff;
|
||||||
|
font-weight: 600;
|
||||||
|
cursor: pointer;
|
||||||
|
transition: transform 0.2s ease, background 0.2s ease;
|
||||||
|
}
|
||||||
|
|
||||||
|
button:hover {
|
||||||
|
transform: translateY(-1px);
|
||||||
|
background: var(--accent-ink);
|
||||||
|
}
|
||||||
|
|
||||||
|
button.ghost {
|
||||||
|
background: transparent;
|
||||||
|
color: var(--accent);
|
||||||
|
border: 1px solid rgba(10, 132, 255, 0.4);
|
||||||
|
padding: 8px 12px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.list {
|
||||||
|
list-style: none;
|
||||||
|
padding: 0;
|
||||||
|
margin: 0;
|
||||||
|
display: grid;
|
||||||
|
gap: 10px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.list li {
|
||||||
|
padding: 12px;
|
||||||
|
border-radius: 14px;
|
||||||
|
background: var(--panel-muted);
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.list li.active {
|
||||||
|
border-color: rgba(10, 132, 255, 0.4);
|
||||||
|
background: #eef5ff;
|
||||||
|
}
|
||||||
|
|
||||||
|
.model-row {
|
||||||
|
display: flex;
|
||||||
|
align-items: center;
|
||||||
|
justify-content: space-between;
|
||||||
|
gap: 12px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.badge {
|
||||||
|
display: inline-block;
|
||||||
|
padding: 4px 8px;
|
||||||
|
border-radius: 999px;
|
||||||
|
background: var(--accent);
|
||||||
|
color: #fff;
|
||||||
|
font-size: 0.7rem;
|
||||||
|
font-weight: 600;
|
||||||
|
}
|
||||||
|
|
||||||
|
.badge.muted {
|
||||||
|
background: rgba(17, 19, 24, 0.1);
|
||||||
|
color: var(--muted);
|
||||||
|
}
|
||||||
|
|
||||||
|
.status {
|
||||||
|
margin-bottom: 12px;
|
||||||
|
font-size: 0.9rem;
|
||||||
|
color: var(--muted);
|
||||||
|
}
|
||||||
|
|
||||||
|
.status.ok {
|
||||||
|
color: #1a7f37;
|
||||||
|
}
|
||||||
|
|
||||||
|
.status.error {
|
||||||
|
color: #b02a14;
|
||||||
|
}
|
||||||
|
|
||||||
|
.downloads {
|
||||||
|
display: grid;
|
||||||
|
gap: 12px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.download-card {
|
||||||
|
border-radius: 16px;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
padding: 12px;
|
||||||
|
background: #f7f8fb;
|
||||||
|
}
|
||||||
|
|
||||||
|
.download-card strong {
|
||||||
|
display: block;
|
||||||
|
font-size: 0.9rem;
|
||||||
|
margin-bottom: 6px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.progress {
|
||||||
|
height: 8px;
|
||||||
|
border-radius: 999px;
|
||||||
|
background: #dfe3ea;
|
||||||
|
overflow: hidden;
|
||||||
|
margin: 8px 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
.progress > span {
|
||||||
|
display: block;
|
||||||
|
height: 100%;
|
||||||
|
background: var(--accent);
|
||||||
|
width: 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
.error {
|
||||||
|
color: #b02a14;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
}
|
||||||
|
|
||||||
|
.config-grid {
|
||||||
|
display: grid;
|
||||||
|
grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
|
||||||
|
gap: 14px;
|
||||||
|
}
|
||||||
|
|
||||||
|
.config-wide {
|
||||||
|
grid-column: 1 / -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
textarea {
|
||||||
|
padding: 10px 12px;
|
||||||
|
border-radius: 12px;
|
||||||
|
border: 1px solid var(--border);
|
||||||
|
font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
|
||||||
|
font-size: 0.85rem;
|
||||||
|
resize: vertical;
|
||||||
|
}
|
||||||
|
|
||||||
|
.log-output {
|
||||||
|
background: #0f141b;
|
||||||
|
color: #dbe6f3;
|
||||||
|
padding: 16px;
|
||||||
|
border-radius: 16px;
|
||||||
|
min-height: 260px;
|
||||||
|
max-height: 420px;
|
||||||
|
overflow: auto;
|
||||||
|
font-size: 12px;
|
||||||
|
line-height: 1.6;
|
||||||
|
white-space: pre-wrap;
|
||||||
|
}
|
||||||
|
|
||||||
|
[data-theme="dark"] {
|
||||||
|
--bg: #0b0d12;
|
||||||
|
--panel: #141824;
|
||||||
|
--panel-muted: #1b2132;
|
||||||
|
--text: #f1f4f9;
|
||||||
|
--muted: #a5afc2;
|
||||||
|
--border: rgba(241, 244, 249, 0.1);
|
||||||
|
--accent: #4aa3ff;
|
||||||
|
--accent-ink: #1f7ae0;
|
||||||
|
--shadow: 0 20px 60px rgba(0, 0, 0, 0.4);
|
||||||
|
}
|
||||||
|
|
||||||
|
[data-theme="dark"] body {
|
||||||
|
background: radial-gradient(circle at top, #131826 0%, var(--bg) 60%);
|
||||||
|
}
|
||||||
|
|
||||||
|
[data-theme="dark"] .download-card {
|
||||||
|
background: #121826;
|
||||||
|
}
|
||||||
|
|
||||||
|
[data-theme="dark"] .progress {
|
||||||
|
background: #2a3349;
|
||||||
|
}
|
||||||
|
|
||||||
|
[data-theme="dark"] .log-output {
|
||||||
|
background: #080b12;
|
||||||
|
color: #d8e4f3;
|
||||||
|
}
|
||||||
|
|
||||||
|
@media (max-width: 900px) {
|
||||||
|
.topbar {
|
||||||
|
grid-template-columns: 1fr;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
@media (max-width: 640px) {
|
||||||
|
.page {
|
||||||
|
padding: 32px 16px 48px;
|
||||||
|
}
|
||||||
|
}
|
74
llamaCpp.Wrapper.app/warmup.py
Normal file
@@ -0,0 +1,74 @@
import asyncio
import logging
import time
from pathlib import Path

import httpx


log = logging.getLogger("llamacpp_warmup")


def _is_loading_error(response: httpx.Response) -> bool:
    if response.status_code != 503:
        return False
    try:
        payload = response.json()
    except Exception:
        return False
    message = ""
    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict):
            message = str(error.get("message") or "")
        else:
            message = str(payload.get("message") or "")
    return "loading model" in message.lower()


def resolve_warmup_prompt(override: str | None, fallback_path: str) -> str:
    if override:
        prompt = override.strip()
        if prompt:
            return prompt
    try:
        prompt = Path(fallback_path).read_text(encoding="utf-8").strip()
        if prompt:
            return prompt
    except Exception as exc:
        log.warning("Failed to read warmup prompt from %s: %s", fallback_path, exc)
    return "ok"


async def run_warmup(base_url: str, model_id: str, prompt: str, timeout_s: float) -> None:
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,
        "temperature": 0,
    }
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        resp = await client.post("/v1/chat/completions", json=payload)
        if resp.status_code == 503 and _is_loading_error(resp):
            raise RuntimeError("llama.cpp still loading model")
        resp.raise_for_status()


async def run_warmup_with_retry(
    base_url: str,
    model_id: str,
    prompt: str,
    timeout_s: float,
    interval_s: float = 3.0,
) -> None:
    deadline = time.time() + timeout_s
    last_exc: Exception | None = None
    while time.time() < deadline:
        try:
            await run_warmup(base_url, model_id, prompt, timeout_s=timeout_s)
            return
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(interval_s)
    if last_exc:
        raise last_exc
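`run_warmup_with_retry` treats exactly one failure mode as "keep waiting": a 503 whose JSON body says the model is still loading. The same rule restated without httpx, so it can be checked directly (the sample payloads below are illustrative, not captured server output):

```python
import json

# Sketch of the rule _is_loading_error applies to a warmup response:
# retry only on HTTP 503 whose JSON body mentions "loading model".
def is_loading_payload(status_code: int, body: str) -> bool:
    if status_code != 503:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    message = ""
    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict):
            message = str(error.get("message") or "")
        else:
            message = str(payload.get("message") or "")
    return "loading model" in message.lower()

print(is_loading_payload(503, '{"error": {"message": "Loading model"}}'))  # True
print(is_loading_payload(503, 'not json'))                                 # False
print(is_loading_payload(200, '{"error": {"message": "loading model"}}'))  # False
```

Any other error (bad request, OOM, timeout) is re-raised once the deadline passes, so a genuinely broken server does not spin forever.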
464
llamacpp_remote_test.ps1
Normal file
@@ -0,0 +1,464 @@
param(
    [Parameter(Mandatory = $true)][string]$Model,
    [string]$BaseUrl = "http://192.168.1.2:8071",
    [string]$PromptPath = "prompt_crwv.txt",
    [int]$Runs = 3,
    [int]$MaxTokens = 2000,
    [int]$NumCtx = 131072,
    [int]$TopK = 1,
    [double]$TopP = 1.0,
    [int]$Seed = 42,
    [double]$RepeatPenalty = 1.05,
    [double]$Temperature = 0,
    [string]$JsonSchema = "",
    [int]$TimeoutSec = 1800,
    [string]$BatchId,
    [switch]$EnableGpuMonitor = $true,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555,
    [int]$GpuMonitorIntervalSec = 1,
    [int]$GpuMonitorSeconds = 120
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"

function Normalize-Strike([object]$value) {
    if ($null -eq $value) { return $null }
    if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
        return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
    }
    return ($value.ToString().Trim())
}

function Get-AllowedLegs([string]$promptText) {
    $pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
    $match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    if (-not $match.Success) {
        throw "Options Chain JSON block not found in prompt."
    }
    $chains = $match.Groups[1].Value | ConvertFrom-Json
    $allowedExpiry = @{}
    $allowedLegs = @{}
    foreach ($exp in $chains) {
        $expiry = [string]$exp.expiry
        if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
        $allowedExpiry[$expiry] = $true
        foreach ($leg in $exp.liquidSet) {
            if ($null -eq $leg) { continue }
            if ($leg.liquid -ne $true) { continue }
            $side = [string]$leg.side
            $strikeNorm = Normalize-Strike $leg.strike
            if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
                $key = "$expiry|$side|$strikeNorm"
                $allowedLegs[$key] = $true
            }
        }
    }
    return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}

function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
    $errors = New-Object System.Collections.Generic.List[string]

    $requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
    foreach ($key in $requiredTop) {
        if (-not ($obj.PSObject.Properties.Name -contains $key)) {
            $errors.Add("Missing top-level key: $key")
        }
    }

    if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
        $errors.Add("Invalid strategyBias: $($obj.strategyBias)")
    }

    if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
        if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
            $errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
        }
    } else {
        $errors.Add("selectedExpiry is missing or empty")
    }

    if ($obj.confidenceScore -ne $null) {
        if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
            $errors.Add("confidenceScore is not numeric")
        } elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
            $errors.Add("confidenceScore out of range 0-100")
        }
    }

    if ($obj.recommendedTrades -eq $null) {
        $errors.Add("recommendedTrades is null")
    } elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
        $errors.Add("recommendedTrades is not an array")
    }

    if ($obj.strategyBias -eq "NO_TRADE") {
        if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
            $errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
        }
    } else {
        if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
            $errors.Add("recommendedTrades must contain 1-3 trades")
        }
    }

    if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
        $errors.Add("whyOthersRejected is not an array")
    }

    if ($obj.recommendedTrades) {
        foreach ($trade in $obj.recommendedTrades) {
            $tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
            foreach ($tkey in $tradeRequired) {
                if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
                    $errors.Add("Trade missing key: $tkey")
                }
            }

            if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
            if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
            if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
            if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
                if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
            }
            if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
                if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
            }

            if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
                $errors.Add("Trade legs missing or not an array")
                continue
            }

            $legs = @($trade.legs)

            $hasBuy = $false
            $hasSell = $false
            foreach ($leg in $trade.legs) {
                $side = ([string]$leg.side).ToLowerInvariant()
                $action = ([string]$leg.action).ToLowerInvariant()
                $expiry = [string]$leg.expiry
                $strikeNorm = Normalize-Strike $leg.strike

                if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
                if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
                if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
                if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
                    $key = "$expiry|$side|$strikeNorm"
                    if (-not $allowedLegs.ContainsKey($key)) {
                        $errors.Add("Leg not in liquid set: $key")
                    }
                }

                if ($action -eq "buy") { $hasBuy = $true }
                if ($action -eq "sell") { $hasSell = $true }
            }

            if ($obj.selectedExpiry -and $legs) {
                foreach ($leg in $legs) {
                    if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
                        $errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
                    }
                }
            }

            if ($hasSell -and -not $hasBuy) {
                $errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
            }

            if ($trade.greekProfile) {
                $gp = $trade.greekProfile
                $gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                foreach ($gkey in $gpRequired) {
                    if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
                        $errors.Add("Missing greekProfile.$gkey")
                    }
                }
                if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
                if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
                if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
                if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }

                if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
                    $errors.Add("ThetaExposure POS on all-long legs")
                }
            } else {
                $errors.Add("Missing greekProfile")
            }

            $structure = ([string]$trade.structure).ToLowerInvariant()
            $tradeName = ([string]$trade.name).ToLowerInvariant()
            $isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
            $isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
            $isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
            $isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")

            if ($isStraddle -or $isStrangle) {
                if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
                if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
                    $callStrike = Normalize-Strike $callLegs[0].strike
                    $putStrike = Normalize-Strike $putLegs[0].strike
                    if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
                    if ($isStrangle) {
                        try {
                            if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
                        } catch {
                            $errors.Add("Strangle strike comparison failed")
                        }
                    }
                    if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
                        $errors.Add("Straddle/Strangle must be long (buy) legs")
                    }
                }
                if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
                    $errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
                }
            }

            if ($isCallDebit) {
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
                $buy = $callLegs | Where-Object { $_.action -eq "buy" }
                $sell = $callLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Call debit spread buy strike must be below sell strike")
                        }
                    } catch {
                        $errors.Add("Call debit spread strike comparison failed")
                    }
                }
            }

            if ($isPutDebit) {
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
                $buy = $putLegs | Where-Object { $_.action -eq "buy" }
                $sell = $putLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Put debit spread buy strike must be above sell strike")
                        }
                    } catch {
                        $errors.Add("Put debit spread strike comparison failed")
                    }
                }
            }
        }
    }

    return $errors
}

function Parse-GpuLog {
    param([string]$Path)
    $summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
    if (-not (Test-Path $Path)) {
        $summary.error = "gpu log missing"
        return $summary
    }
    $lines = Get-Content -Path $Path
    $currentIndex = -1
    $gpuIndex = -1
    $inUtilBlock = $false
    foreach ($line in $lines) {
        if ($line -match '^Timestamp') {
            $gpuIndex = -1
            $currentIndex = -1
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
            $gpuIndex += 1
            $currentIndex = $gpuIndex
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^\s*Utilization\s*$') {
            $inUtilBlock = $true
            continue
        }
        if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
            $util = [int]$Matches[1]
            if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
            if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
            $summary.samples += 1
        }
    }
    return $summary
}

$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs

if ([string]::IsNullOrWhiteSpace($BatchId)) {
    $BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}

$outBase = Join-Path -Path (Get-Location) -ChildPath "llamacpp_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }

$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }

$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }

$summary = [ordered]@{
    model = $Model
    baseUrl = $BaseUrl
    batchId = $BatchId
    params = [ordered]@{
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
        num_ctx = $NumCtx
    }
    gpuMonitor = [ordered]@{
        enabled = [bool]$EnableGpuMonitor
        sshHost = $SshHost
        sshPort = $SshPort
        intervalSec = $GpuMonitorIntervalSec
        durationSec = $GpuMonitorSeconds
    }
    modelMeta = $null
    runs = @()
}

$schemaObject = $null
if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    try {
        $schemaObject = $JsonSchema | ConvertFrom-Json
    } catch {
        throw "JsonSchema is not valid JSON: $($_.Exception.Message)"
    }
}

try {
    $modelsResponse = Invoke-RestMethod -Uri "$BaseUrl/v1/models" -TimeoutSec 30
    $meta = $modelsResponse.data | Where-Object { $_.id -eq $Model } | Select-Object -First 1
    if ($meta) { $summary.modelMeta = $meta.meta }
} catch {
    $summary.modelMeta = @{ error = $_.Exception.Message }
}

for ($i = 1; $i -le $Runs; $i++) {
    Write-Host "Running $Model (run $i/$Runs)"

    $runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
    $gpuJob = $null
    $gpuLogPath = $null

    if ($EnableGpuMonitor) {
        $samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
        $gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
        $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
        $gpuJob = Start-Job -ScriptBlock {
            param($sshExe, $target, $port, $samples, $interval, $logPath)
            for ($s = 1; $s -le $samples; $s++) {
                Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
                try {
                    $out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
                    Add-Content -Path $logPath -Value $out
                } catch {
                    Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
                }
                Start-Sleep -Seconds $interval
            }
        } -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
        Start-Sleep -Seconds 1
    }

    $body = @{
        model = $Model
        messages = @(@{ role = "user"; content = $prompt })
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
    }

    if ($schemaObject) {
        $body.response_format = @{
            type = "json_schema"
            json_schema = @{
                name = "trade_schema"
                schema = $schemaObject
                strict = $true
            }
        }
    }

    $body = $body | ConvertTo-Json -Depth 12

    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/v1/chat/completions" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
    } catch {
        $runResult.errors = @("API error: $($_.Exception.Message)")
        $summary.runs += $runResult
        if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
        continue
    } finally {
        if ($gpuJob) {
            Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
            if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
            Remove-Job -Job $gpuJob | Out-Null
        }
    }

    $raw = [string]$resp.choices[0].message.content

    $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
    Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

    try {
        $parsed = $raw | ConvertFrom-Json
        $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
        if ($errors.Count -eq 0) {
            $runResult.ok = $true
        } else {
            $runResult.errors = $errors
        }
    } catch {
        $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
    }

    if ($gpuLogPath) {
        $runResult.gpuLog = $gpuLogPath
        $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
    }
    if ($resp.timings) {
        $runResult.timings = $resp.timings
    }
    if ($resp.usage) {
        $runResult.usage = $resp.usage
    }

    $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
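A detail worth noting in the validator: Normalize-Strike renders numeric strikes with a trailing-zero-free invariant-culture format, so 152.50 (a number) and "152.5" (a string) hash to the same `expiry|side|strike` key. A rough Python restatement of that normalization (the `%.15g` format here is an approximation of .NET's "0.################" pattern, not an exact port):

```python
# Approximate Python port of Normalize-Strike: numbers are formatted without
# trailing zeros so numeric and string strikes produce identical lookup keys.
def normalize_strike(value):
    if value is None:
        return None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # %.15g trims trailing zeros and keeps up to 15 significant digits
        return f"{float(value):.15g}"
    return str(value).strip()

print(normalize_strike(152.50))     # 152.5
print(normalize_strike(150))        # 150
print(normalize_strike(" 152.5 "))  # 152.5
```

Without this step, a model that emits `"strike": 152.5` while the chain lists `"strike": "152.50"` would be flagged as "Leg not in liquid set" even though the leg is valid.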
117
llamacpp_set_command.ps1
Normal file
@@ -0,0 +1,117 @@
|
|||||||
|
param(
|
||||||
|
[Parameter(Mandatory = $true)][string]$ModelPath,
|
||||||
|
[Parameter(Mandatory = $true)][int]$CtxSize,
|
||||||
|
[int]$BatchSize = 1024,
|
||||||
|
[int]$UBatchSize = 256,
|
||||||
|
[string]$TensorSplit = "0.5,0.5",
|
||||||
|
[string]$Devices = "0,1",
|
||||||
|
[int]$GpuLayers = 999,
|
||||||
|
[string]$CacheTypeK = "q4_0",
|
||||||
|
[string]$CacheTypeV = "q4_0",
|
||||||
|
[string]$GrammarFile = "",
|
||||||
|
[string]$JsonSchema = "",
|
||||||
|
[string]$BaseUrl = "http://192.168.1.2:8071",
|
||||||
|
[int]$TimeoutSec = 600,
|
||||||
|
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
|
||||||
|
[string]$SshUser = "rushabh",
|
||||||
|
[string]$SshHost = "192.168.1.2",
|
||||||
|
[int]$SshPort = 55555
|
||||||
|
)
|
||||||
|
|
||||||
|
$ErrorActionPreference = "Stop"
|
||||||
|
$ProgressPreference = "SilentlyContinue"
|
||||||
|
|
||||||
|
$commandArgs = @(
|
||||||
|
"--model", $ModelPath,
|
||||||
|
"--ctx-size", $CtxSize.ToString(),
|
||||||
|
"--n-gpu-layers", $GpuLayers.ToString(),
|
||||||
|
"--split-mode", "layer",
|
||||||
|
"--tensor-split", $TensorSplit,
|
||||||
|
"--batch-size", $BatchSize.ToString(),
|
||||||
    "--ubatch-size", $UBatchSize.ToString(),
    "--cache-type-k", $CacheTypeK,
    "--cache-type-v", $CacheTypeV,
    "--flash-attn", "on"
)

if (-not [string]::IsNullOrWhiteSpace($Devices)) {
    $commandArgs = @("--device", $Devices) + $commandArgs
}

if (-not [string]::IsNullOrWhiteSpace($GrammarFile)) {
    $commandArgs += @("--grammar-file", $GrammarFile)
}

if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    $commandArgs += @("--json-schema", $JsonSchema)
}

$argJson = $commandArgs | ConvertTo-Json -Compress

$py = @"
import json

path = r"/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
new_cmd = json.loads(r'''$argJson''')
lines = open(path, "r", encoding="utf-8").read().splitlines()
out = []
in_cmd = False

def yaml_quote(value):
    text = str(value)
    return "'" + text.replace("'", "''") + "'"

for line in lines:
    if line.startswith('"command":'):
        out.append('"command":')
        for arg in new_cmd:
            out.append(f"- {yaml_quote(arg)}")
        in_cmd = True
        continue
    if in_cmd:
        if line.startswith('"') and not line.startswith('"command":'):
            in_cmd = False
            out.append(line)
        else:
            continue
    else:
        out.append(line)

open(path, "w", encoding="utf-8").write("\n".join(out) + "\n")
"@

$py | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -"

$pyCompose = @"
import json, yaml, subprocess

compose_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/templates/rendered/docker-compose.yaml"
user_config_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"

# The rendered compose file is YAML, not JSON, so parse/emit it with yaml.
with open(compose_path, "r", encoding="utf-8") as f:
    compose = yaml.safe_load(f)
with open(user_config_path, "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

command = config.get("command")
if not command:
    raise SystemExit("command list missing from user_config")

svc = compose["services"]["llamacpp"]
svc["command"] = command

with open(compose_path, "w", encoding="utf-8") as f:
    yaml.safe_dump(compose, f)

payload = {"custom_compose_config": compose}
subprocess.run(["midclt", "call", "app.update", "llamacpp", json.dumps(payload)], check=True)
"@

$pyCompose | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -" | Out-Null

$start = Get-Date
while ((Get-Date) - $start -lt [TimeSpan]::FromSeconds($TimeoutSec)) {
    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/health" -TimeoutSec 10
        if ($resp.status -eq "ok") {
            Write-Host "llamacpp healthy at $BaseUrl"
            exit 0
        }
    } catch {
        Start-Sleep -Seconds 5
    }
}

throw "Timed out waiting for llama.cpp server at $BaseUrl"
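For reference, the `yaml_quote` helper embedded in the first remote Python snippet above can be exercised standalone; a minimal sketch (not part of the repo) showing the single-quoted-YAML escaping rule it implements:

```python
def yaml_quote(value):
    # Single-quoted YAML scalar: an embedded single quote is escaped
    # by doubling it; everything else passes through literally.
    text = str(value)
    return "'" + text.replace("'", "''") + "'"

print(yaml_quote("--flash-attn"))  # '--flash-attn'
print(yaml_quote("it's"))          # 'it''s'
```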
14
modelfiles/options-json-deepseek14b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM deepseek-r1:14b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""

PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
14
modelfiles/options-json-llama31-70b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM llama3.1:70b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""

PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
14
modelfiles/options-json-phi3mini.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM phi3:mini-128k
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""

PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
561
ollama_remote_test.ps1
Normal file
@@ -0,0 +1,561 @@
param(
    [Parameter(Mandatory = $true)][string]$Model,
    [string]$BaseUrl = "http://192.168.1.2:30068",
    [string]$PromptPath = "prompt_crwv.txt",
    [int]$Runs = 3,
    [int]$NumPredict = 1200,
    [int]$NumCtx = 131072,
    [int]$NumBatch = 0,
    [int]$NumGpuLayers = 0,
    [int]$TimeoutSec = 900,
    [int]$TopK = 1,
    [double]$TopP = 1.0,
    [int]$Seed = 42,
    [double]$RepeatPenalty = 1.05,
    [string]$BatchId,
    [switch]$UseSchemaFormat = $false,
    [switch]$EnableGpuMonitor = $true,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [switch]$CheckProcessor = $true,
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555,
    [int]$GpuMonitorIntervalSec = 1,
    [int]$GpuMonitorSeconds = 120
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"
function Normalize-Strike([object]$value) {
    if ($null -eq $value) { return $null }
    if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
        return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
    }
    return ($value.ToString().Trim())
}

function Get-AllowedLegs([string]$promptText) {
    $pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
    $match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    if (-not $match.Success) {
        throw "Options Chain JSON block not found in prompt."
    }
    $chains = $match.Groups[1].Value | ConvertFrom-Json
    $allowedExpiry = @{}
    $allowedLegs = @{}
    foreach ($exp in $chains) {
        $expiry = [string]$exp.expiry
        if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
        $allowedExpiry[$expiry] = $true
        foreach ($leg in $exp.liquidSet) {
            if ($null -eq $leg) { continue }
            if ($leg.liquid -ne $true) { continue }
            $side = [string]$leg.side
            $strikeNorm = Normalize-Strike $leg.strike
            if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
                $key = "$expiry|$side|$strikeNorm"
                $allowedLegs[$key] = $true
            }
        }
    }
    return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}
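Get-AllowedLegs keys each liquid leg as "expiry|side|strike", with strikes normalized so numeric 170.0 and string "170" produce the same key. A rough Python equivalent of that normalization (names are mine, not from the repo; ".16g" approximates the "0.################" format for realistic strike magnitudes):

```python
def normalize_strike(value):
    # Render numbers without trailing zeros so 170.0 and "170" collide;
    # strings are just trimmed.
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return format(float(value), ".16g")
    return str(value).strip()

def leg_key(expiry, side, strike):
    # Same "expiry|side|strike" scheme the validator uses for lookups.
    return f"{expiry}|{side}|{normalize_strike(strike)}"
```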
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
    $errors = New-Object System.Collections.Generic.List[string]

    $requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
    foreach ($key in $requiredTop) {
        if (-not ($obj.PSObject.Properties.Name -contains $key)) {
            $errors.Add("Missing top-level key: $key")
        }
    }

    if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
        $errors.Add("Invalid strategyBias: $($obj.strategyBias)")
    }

    if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
        if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
            $errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
        }
    } else {
        $errors.Add("selectedExpiry is missing or empty")
    }

    if ($obj.confidenceScore -ne $null) {
        if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
            $errors.Add("confidenceScore is not numeric")
        } elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
            $errors.Add("confidenceScore out of range 0-100")
        }
    }

    if ($obj.recommendedTrades -eq $null) {
        $errors.Add("recommendedTrades is null")
    } elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
        $errors.Add("recommendedTrades is not an array")
    }

    if ($obj.strategyBias -eq "NO_TRADE") {
        if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
            $errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
        }
    } else {
        if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
            $errors.Add("recommendedTrades must contain 1-3 trades")
        }
    }

    if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
        $errors.Add("whyOthersRejected is not an array")
    }

    if ($obj.recommendedTrades) {
        foreach ($trade in $obj.recommendedTrades) {
            $tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
            foreach ($tkey in $tradeRequired) {
                if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
                    $errors.Add("Trade missing key: $tkey")
                }
            }

            if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
            if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
            if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
            if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
                if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
            }
            if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
                if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
            }

            if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
                $errors.Add("Trade legs missing or not an array")
                continue
            }

            $legs = @($trade.legs)

            $hasBuy = $false
            $hasSell = $false
            foreach ($leg in $trade.legs) {
                $side = ([string]$leg.side).ToLowerInvariant()
                $action = ([string]$leg.action).ToLowerInvariant()
                $expiry = [string]$leg.expiry
                $strikeNorm = Normalize-Strike $leg.strike

                if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
                if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
                if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
                if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
                    $key = "$expiry|$side|$strikeNorm"
                    if (-not $allowedLegs.ContainsKey($key)) {
                        $errors.Add("Leg not in liquid set: $key")
                    }
                }

                if ($action -eq "buy") { $hasBuy = $true }
                if ($action -eq "sell") { $hasSell = $true }
            }

            if ($obj.selectedExpiry -and $legs) {
                foreach ($leg in $legs) {
                    if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
                        $errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
                    }
                }
            }

            if ($hasSell -and -not $hasBuy) {
                $errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
            }

            if ($trade.greekProfile) {
                $gp = $trade.greekProfile
                $gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                foreach ($gkey in $gpRequired) {
                    if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
                        $errors.Add("Missing greekProfile.$gkey")
                    }
                }
                if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
                if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
                if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
                if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }

                if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
                    $errors.Add("ThetaExposure POS on all-long legs")
                }
            } else {
                $errors.Add("Missing greekProfile")
            }

            $structure = ([string]$trade.structure).ToLowerInvariant()
            $tradeName = ([string]$trade.name).ToLowerInvariant()
            $isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
            $isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
            $isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
            $isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")

            if ($isStraddle -or $isStrangle) {
                if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
                if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
                    $callStrike = Normalize-Strike $callLegs[0].strike
                    $putStrike = Normalize-Strike $putLegs[0].strike
                    if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
                    if ($isStrangle) {
                        try {
                            if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
                        } catch {
                            $errors.Add("Strangle strike comparison failed")
                        }
                    }
                    if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
                        $errors.Add("Straddle/Strangle must be long (buy) legs")
                    }
                }
                if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
                    $errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
                }
            }

            if ($isCallDebit) {
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
                $buy = $callLegs | Where-Object { $_.action -eq "buy" }
                $sell = $callLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Call debit spread buy strike must be below sell strike")
                        }
                    } catch {
                        $errors.Add("Call debit spread strike comparison failed")
                    }
                }
            }

            if ($isPutDebit) {
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
                $buy = $putLegs | Where-Object { $_.action -eq "buy" }
                $sell = $putLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Put debit spread buy strike must be above sell strike")
                        }
                    } catch {
                        $errors.Add("Put debit spread strike comparison failed")
                    }
                }
            }
        }
    }

    return $errors
}
function Parse-GpuLog {
    param([string]$Path)
    $summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
    if (-not (Test-Path $Path)) {
        $summary.error = "gpu log missing"
        return $summary
    }
    $lines = Get-Content -Path $Path
    $currentIndex = -1
    $gpuIndex = -1
    $inGpuUtilSamples = $false
    $inUtilBlock = $false
    foreach ($line in $lines) {
        if ($line -match '^Timestamp') {
            $gpuIndex = -1
            $currentIndex = -1
            $inGpuUtilSamples = $false
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
            $gpuIndex += 1
            $currentIndex = $gpuIndex
            $inGpuUtilSamples = $false
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^\s*Utilization\s*$') {
            $inUtilBlock = $true
            continue
        }
        if ($line -match '^\s*GPU Utilization Samples') {
            $inGpuUtilSamples = $true
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^\s*(Memory|ENC|DEC) Utilization Samples') {
            $inGpuUtilSamples = $false
            $inUtilBlock = $false
            continue
        }
        if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
            $util = [int]$Matches[1]
            if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
            if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
            $summary.samples += 1
            continue
        }
        if ($inGpuUtilSamples -and $line -match '^\s*Max\s*:\s*([0-9]+)\s*%') {
            $util = [int]$Matches[1]
            if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
            if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
            $summary.samples += 1
        }
    }
    return $summary
}
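The core of Parse-GpuLog is a small state machine over an `nvidia-smi -q -d UTILIZATION` dump. A minimal Python sketch of the same idea (not part of the repo; `SAMPLE` is made-up text in the shape nvidia-smi emits), reporting which GPU indices ever showed nonzero utilization:

```python
import re

def gpus_used(log_text):
    used = set()
    idx = -1          # current per-GPU section index
    in_util = False   # inside the instantaneous "Utilization" block
    for line in log_text.splitlines():
        if re.match(r"^GPU\s+[0-9A-Fa-f:.]+$", line):
            idx += 1          # a new "GPU <bus id>" section starts
            in_util = False
            continue
        if re.match(r"^\s*Utilization\s*$", line):
            in_util = True
            continue
        m = re.match(r"^\s*Gpu\s*:\s*(\d+)\s*%", line)
        if in_util and m and int(m.group(1)) > 0:
            used.add(idx)
    return used

SAMPLE = """GPU 00000000:01:00.0
    Utilization
        Gpu                               : 87 %
        Memory                            : 41 %
GPU 00000000:02:00.0
    Utilization
        Gpu                               : 0 %
"""
```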
function Get-ProcessorShare {
    param(
        [string]$SshExePath,
        [string]$Target,
        [int]$Port,
        [string]$ModelName
    )
    $result = [ordered]@{ cpuPct = $null; gpuPct = $null; raw = $null; error = $null }
    try {
        $out = & $SshExePath -p $Port $Target "sudo -n docker exec ix-ollama-ollama-1 ollama ps"
        $line = $out | Select-String -SimpleMatch $ModelName | Select-Object -First 1
        if ($null -eq $line) {
            $result.error = "model not found in ollama ps"
            return $result
        }
        $raw = $line.ToString().Trim()
        $result.raw = $raw
        if ($raw -match '([0-9]+)%/([0-9]+)%\s+CPU/GPU') {
            $result.cpuPct = [int]$Matches[1]
            $result.gpuPct = [int]$Matches[2]
        } elseif ($raw -match '([0-9]+)%\s+GPU') {
            $result.cpuPct = 0
            $result.gpuPct = [int]$Matches[1]
        } else {
            $result.error = "CPU/GPU split not parsed"
        }
    } catch {
        $result.error = $_.Exception.Message
    }
    return $result
}
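Get-ProcessorShare parses the PROCESSOR column of `ollama ps`, which reads like "13%/87% CPU/GPU" for a split model or "100% GPU" for a fully offloaded one. The same two regexes in a standalone Python sketch (names are mine, not from the repo):

```python
import re

def processor_split(line):
    # "13%/87% CPU/GPU" -> (cpu_pct, gpu_pct)
    m = re.search(r"(\d+)%/(\d+)%\s+CPU/GPU", line)
    if m:
        return int(m.group(1)), int(m.group(2))
    # "100% GPU" -> fully on GPU, so CPU share is 0
    m = re.search(r"(\d+)%\s+GPU", line)
    if m:
        return 0, int(m.group(1))
    return None
```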
$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs

if ([string]::IsNullOrWhiteSpace($BatchId)) {
    $BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}

$outBase = Join-Path -Path (Get-Location) -ChildPath "ollama_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }

$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }

$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
$summary = [ordered]@{
    model = $Model
    baseUrl = $BaseUrl
    formatMode = $(if ($UseSchemaFormat) { "schema" } else { "json" })
    batchId = $BatchId
    gpuMonitor = [ordered]@{
        enabled = [bool]$EnableGpuMonitor
        sshHost = $SshHost
        sshPort = $SshPort
        intervalSec = $GpuMonitorIntervalSec
        durationSec = $GpuMonitorSeconds
    }
    runs = @()
}
for ($i = 1; $i -le $Runs; $i++) {
    Write-Host "Running $Model (run $i/$Runs)"

    $runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
    $gpuJob = $null
    $gpuLogPath = $null

    if ($EnableGpuMonitor) {
        $samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
        $gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
        $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
        $gpuJob = Start-Job -ScriptBlock {
            param($sshExe, $target, $port, $samples, $interval, $logPath)
            for ($s = 1; $s -le $samples; $s++) {
                Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
                try {
                    $out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
                    Add-Content -Path $logPath -Value $out
                } catch {
                    Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
                }
                Start-Sleep -Seconds $interval
            }
        } -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
        Start-Sleep -Seconds 1
    }
$format = "json"
|
||||||
|
if ($UseSchemaFormat) {
|
||||||
|
$format = @{
|
||||||
|
type = "object"
|
||||||
|
additionalProperties = $false
|
||||||
|
required = @("selectedExpiry","expiryRationale","strategyBias","recommendedTrades","whyOthersRejected","confidenceScore")
|
||||||
|
properties = @{
|
||||||
|
selectedExpiry = @{ type = "string"; minLength = 1 }
|
||||||
|
expiryRationale = @{ type = "string"; minLength = 1 }
|
||||||
|
strategyBias = @{ type = "string"; enum = @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE") }
|
||||||
|
recommendedTrades = @{
|
||||||
|
type = "array"
|
||||||
|
minItems = 0
|
||||||
|
maxItems = 3
|
||||||
|
items = @{
|
||||||
|
type = "object"
|
||||||
|
additionalProperties = $false
|
||||||
|
required = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
|
||||||
|
properties = @{
|
||||||
|
name = @{ type = "string"; minLength = 1 }
|
||||||
|
structure = @{ type = "string"; minLength = 1 }
|
||||||
|
legs = @{
|
||||||
|
type = "array"
|
||||||
|
minItems = 1
|
||||||
|
maxItems = 4
|
||||||
|
items = @{
|
||||||
|
type = "object"
|
||||||
|
additionalProperties = $false
|
||||||
|
required = @("side","action","strike","expiry")
|
||||||
|
properties = @{
|
||||||
|
side = @{ type = "string"; enum = @("call","put") }
|
||||||
|
action = @{ type = "string"; enum = @("buy","sell") }
|
||||||
|
strike = @{ type = @("number","string") }
|
||||||
|
expiry = @{ type = "string"; minLength = 1 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
greekProfile = @{
|
||||||
|
type = "object"
|
||||||
|
additionalProperties = $false
|
||||||
|
required = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
|
||||||
|
properties = @{
|
||||||
|
deltaBias = @{ type = "string"; enum = @("POS","NEG","NEUTRAL") }
|
||||||
|
gammaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
|
||||||
|
thetaExposure = @{ type = "string"; enum = @("POS","NEG","LOW") }
|
||||||
|
vegaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
maxRisk = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
|
||||||
|
maxReward = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
|
||||||
|
thesisAlignment = @{ type = "string"; minLength = 1 }
|
||||||
|
invalidation = @{ type = "string"; minLength = 1 }
|
||||||
|
managementNotes = @{ type = "string" }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
whyOthersRejected = @{
|
||||||
|
type = "array"
|
||||||
|
items = @{ type = "string" }
|
||||||
|
}
|
||||||
|
confidenceScore = @{ type = "number"; minimum = 0; maximum = 100 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
    $options = @{
        temperature = 0
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        num_ctx = $NumCtx
        num_predict = $NumPredict
    }
    if ($NumBatch -gt 0) {
        $options.num_batch = $NumBatch
    }
    if ($NumGpuLayers -gt 0) {
        $options.num_gpu_layers = $NumGpuLayers
    }

    $body = @{
        model = $Model
        prompt = $prompt
        format = $format
        stream = $false
        options = $options
    } | ConvertTo-Json -Depth 10
    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/api/generate" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
    } catch {
        $runResult.errors = @("API error: $($_.Exception.Message)")
        $summary.runs += $runResult
        if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
        continue
    } finally {
        if ($gpuJob) {
            Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
            if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
            Remove-Job -Job $gpuJob | Out-Null
        }
    }
    $raw = [string]$resp.response

    $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
    Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

    try {
        $parsed = $raw | ConvertFrom-Json
        $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
        if ($errors.Count -eq 0) {
            $runResult.ok = $true
        } else {
            $runResult.errors = $errors
        }
    } catch {
        $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
    }

    if ($gpuLogPath) {
        $runResult.gpuLog = $gpuLogPath
        $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
    }

    if ($CheckProcessor) {
        $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
        $proc = Get-ProcessorShare -SshExePath $SshExe -Target $sshTarget -Port $SshPort -ModelName $Model
        $runResult.processor = $proc
        if ($proc.cpuPct -ne $null) {
            $runResult.gpuOnly = ($proc.cpuPct -eq 0)
        }
    }

    $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
155
prompt_crwv.txt
Normal file
File diff suppressed because one or more lines are too long
1
query.sql
Normal file
@@ -0,0 +1 @@
SELECT p.title, p.privacy FROM playlists p JOIN users u ON p.author = u.email WHERE u.email = 'rushabh';
8
requirements.txt
Normal file
@@ -0,0 +1,8 @@
fastapi==0.115.6
uvicorn==0.30.6
httpx==0.27.2
pytest==8.3.3
respx==0.21.1
pytest-asyncio==0.24.0
PyYAML==6.0.3
websockets==12.0
116
scripts/deploy_truenas_wrapper.py
Normal file
@@ -0,0 +1,116 @@
import argparse
import asyncio
import json
import ssl
from typing import Any, Dict, List, Optional

import websockets


async def _rpc_call(ws_url: str, api_key: str, method: str, params: Optional[list] = None, verify_ssl: bool = False) -> Any:
    ssl_ctx = None
    if ws_url.startswith("wss://") and not verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [api_key]}))
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            raise RuntimeError("API key authentication failed")

        req_id = 2
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--ws-url", required=True)
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--api-user")
    parser.add_argument("--app-name", required=True)
    parser.add_argument("--image", required=True)
    parser.add_argument("--model-host-path", required=True)
    parser.add_argument("--llamacpp-base-url", required=True)
    parser.add_argument("--network", required=True)
    parser.add_argument("--api-port", type=int, default=9091)
    parser.add_argument("--ui-port", type=int, default=9092)
    parser.add_argument("--verify-ssl", action="store_true")
    args = parser.parse_args()

    api_port = args.api_port
    ui_port = args.ui_port

    env = {
        "PORT_A": str(api_port),
        "PORT_B": str(ui_port),
        "LLAMACPP_BASE_URL": args.llamacpp_base_url,
        "MODEL_DIR": "/models",
        "TRUENAS_WS_URL": args.ws_url,
        "TRUENAS_API_KEY": args.api_key,
        "TRUENAS_APP_NAME": "llamacpp",
        "TRUENAS_VERIFY_SSL": "false",
    }
    if args.api_user:
        env["TRUENAS_API_USER"] = args.api_user

    compose = {
        "services": {
            "wrapper": {
                "image": args.image,
                "restart": "unless-stopped",
                "ports": [
                    f"{api_port}:{api_port}",
                    f"{ui_port}:{ui_port}",
                ],
                "environment": env,
                "volumes": [
                    f"{args.model_host_path}:/models",
                    "/var/run/docker.sock:/var/run/docker.sock",
                ],
                "networks": ["llamacpp_net"],
            }
        },
        "networks": {
            "llamacpp_net": {"external": True, "name": args.network}
        },
    }

    create_payload = {
        "custom_app": True,
        "app_name": args.app_name,
        "custom_compose_config": compose,
    }

    existing = await _rpc_call(args.ws_url, args.api_key, "app.query", [[["id", "=", args.app_name]]], args.verify_ssl)
    if existing:
        result = await _rpc_call(
            args.ws_url,
            args.api_key,
            "app.update",
            [args.app_name, {"custom_compose_config": compose}],
            args.verify_ssl,
        )
        action = "updated"
    else:
|
result = await _rpc_call(args.ws_url, args.api_key, "app.create", [create_payload], args.verify_ssl)
|
||||||
|
action = "created"
|
||||||
|
|
||||||
|
print(json.dumps({"action": action, "api_port": api_port, "ui_port": ui_port, "result": result}, indent=2))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
asyncio.run(main())
|
||||||
162  scripts/remote_wrapper_test.py  Normal file
@@ -0,0 +1,162 @@
import json
import os
import time
from datetime import datetime

import requests

BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9000")
UPSTREAM = os.getenv("LLAMACPP_BASE", "http://192.168.1.2:8071")
RUNS = int(os.getenv("RUNS", "100"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4"))
TIMEOUT = int(os.getenv("REQ_TIMEOUT", "300"))


def _now():
    return datetime.utcnow().isoformat() + "Z"


def _get_loaded_model_id():
    deadline = time.time() + 600
    last_error = None
    while time.time() < deadline:
        try:
            resp = requests.get(UPSTREAM + "/v1/models", timeout=30)
            resp.raise_for_status()
            data = resp.json().get("data") or []
            if data:
                return data[0].get("id")
            last_error = "no models reported by upstream"
        except Exception as exc:
            last_error = str(exc)
        time.sleep(5)
    raise RuntimeError(f"upstream not ready: {last_error}")


def _stream_ok(resp):
    got_data = False
    got_done = False
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            got_data = True
        if line.strip() == "data: [DONE]":
            got_done = True
            break
    return got_data, got_done


def run_suite(model_id, idx):
    results = {}

    # Models
    r = requests.get(BASE + "/v1/models", timeout=30)
    results["models"] = r.status_code

    r = requests.get(BASE + f"/v1/models/{model_id}", timeout=30)
    results["model_get"] = r.status_code

    # Chat completions non-stream
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": f"Run {idx}: say ok."}],
        "max_tokens": MAX_TOKENS,
        "temperature": (idx % 5) / 10.0,
    }
    r = requests.post(BASE + "/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["chat"] = r.status_code

    # Chat completions stream
    payload_stream = dict(payload)
    payload_stream["stream"] = True
    r = requests.post(BASE + "/v1/chat/completions", json=payload_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["chat_stream"] = r.status_code
    results["chat_stream_ok"] = ok_data and ok_done

    # Responses non-stream
    payload_resp = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "max_output_tokens": MAX_TOKENS,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp, timeout=TIMEOUT)
    results["responses"] = r.status_code

    # Responses stream
    payload_resp_stream = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "stream": True,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["responses_stream"] = r.status_code
    results["responses_stream_ok"] = ok_data and ok_done

    # Embeddings (best effort)
    payload_emb = {"model": model_id, "input": f"Run {idx}"}
    r = requests.post(BASE + "/v1/embeddings", json=payload_emb, timeout=TIMEOUT)
    results["embeddings"] = r.status_code

    # Proxy
    r = requests.post(BASE + "/proxy/llamacpp/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["proxy"] = r.status_code

    return results


def main():
    summary = {
        "started_at": _now(),
        "base": BASE,
        "upstream": UPSTREAM,
        "runs": RUNS,
        "max_tokens": MAX_TOKENS,
        "results": [],
    }

    model_id = _get_loaded_model_id()
    summary["model_id"] = model_id

    for i in range(1, RUNS + 1):
        start = time.time()
        try:
            results = run_suite(model_id, i)
            ok = all(
                results.get(key) == 200
                for key in ("models", "model_get", "chat", "chat_stream", "responses", "responses_stream", "proxy")
            )
            stream_ok = results.get("chat_stream_ok") and results.get("responses_stream_ok")
            summary["results"].append({
                "run": i,
                "ok": ok and stream_ok,
                "stream_ok": stream_ok,
                "status": results,
                "elapsed_s": round(time.time() - start, 2),
            })
        except Exception as exc:
            summary["results"].append({
                "run": i,
                "ok": False,
                "stream_ok": False,
                "error": str(exc),
                "elapsed_s": round(time.time() - start, 2),
            })
        print(f"Run {i}/{RUNS} done")

    summary["finished_at"] = _now()

    os.makedirs("reports", exist_ok=True)
    out_path = os.path.join("reports", "remote_wrapper_test.json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)

    # Print a compact summary
    ok_count = sum(1 for r in summary["results"] if r.get("ok"))
    print(f"OK {ok_count}/{RUNS}")


if __name__ == "__main__":
    main()
29  scripts/update_llamacpp_flags.ps1  Normal file
@@ -0,0 +1,29 @@
param(
    [string]$OutDocs = "reports\llamacpp_docs.md",
    [string]$OutFlags = "reports\llamacpp_flags.txt"
)

$urls = @(
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README-llama-server.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
)

$out = @()
foreach ($u in $urls) {
    try {
        $content = Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30
        $out += "# Source: $u"
        $out += $content.Content
    } catch {
        $out += "# Source: $u"
        $out += "(failed to fetch)"
    }
}

$out | Set-Content -Encoding UTF8 $OutDocs

$docs = Get-Content $OutDocs -Raw
$flags = [regex]::Matches($docs, "--[a-zA-Z0-9\-]+") | ForEach-Object { $_.Value }
$flags = $flags | Sort-Object -Unique
$flags | Set-Content -Encoding UTF8 $OutFlags
61  tests/conftest.py  Normal file
@@ -0,0 +1,61 @@
import json
import os
from pathlib import Path

import pytest
from fastapi.testclient import TestClient
import respx

from app.api_app import create_api_app
from app.ui_app import create_ui_app


@pytest.fixture()
def agents_config(tmp_path: Path) -> Path:
    data = {
        "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
        "container_name": "ix-llamacpp-llamacpp-1",
        "host_port": 8071,
        "container_port": 8080,
        "web_ui_url": "http://0.0.0.0:8071/",
        "model_host_path": str(tmp_path),
        "model_container_path": str(tmp_path),
        "models": [],
        "network": "ix-llamacpp_default",
        "subnets": ["172.16.18.0/24"],
        "gpu_count": 2,
        "gpu_name": "NVIDIA RTX 5060 Ti",
    }
    path = tmp_path / "agents_config.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


@pytest.fixture()
def model_dir(tmp_path: Path) -> Path:
    (tmp_path / "model-a.gguf").write_text("x", encoding="utf-8")
    (tmp_path / "model-b.gguf").write_text("y", encoding="utf-8")
    return tmp_path


@pytest.fixture()
def api_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    monkeypatch.setenv("LLAMACPP_BASE_URL", "http://llama.test")
    app = create_api_app()
    return TestClient(app)


@pytest.fixture()
def ui_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    app = create_ui_app()
    return TestClient(app)


@pytest.fixture()
def respx_mock():
    with respx.mock(assert_all_called=False) as mock:
        yield mock
77  tests/test_chat_completions.py  Normal file
@@ -0,0 +1,77 @@
import json
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"id": f"chatcmpl-{case}", "choices": [{"message": {"content": "ok"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "temperature": (case % 10) / 10,
    }
    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["choices"][0]["message"]["content"] == "ok"


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        content = b"data: {\"id\": \"chunk\"}\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "stream": True,
    }
    with api_client.stream("POST", "/v1/chat/completions", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
    assert b"data:" in body


def test_chat_completions_tools_normalize(api_client, respx_mock):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def handler(request):
        # httpx.Request has no .json() helper; parse the raw request body instead.
        data = json.loads(request.content)
        tools = data.get("tools") or []
        assert tools
        assert tools[0].get("function", {}).get("name") == "format_final_json_response"
        return httpx.Response(200, json={"id": "chatcmpl-tools", "choices": [{"message": {"content": "ok"}}]})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=handler)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": "hello"}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "parameters": {"type": "object"},
            }
        ],
        "tool_choice": {"type": "function", "name": "format_final_json_response"},
    }

    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
14  tests/test_embeddings.py  Normal file
@@ -0,0 +1,14 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_embeddings(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/v1/embeddings").mock(
        return_value=httpx.Response(200, json={"data": [{"embedding": [0.1, 0.2]}]})
    )
    payload = {"model": "model-a.gguf", "input": f"text-{case}"}
    resp = api_client.post("/v1/embeddings", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert "data" in data
24  tests/test_models.py  Normal file
@@ -0,0 +1,24 @@
import pytest


@pytest.mark.parametrize("case", list(range(120)))
def test_list_models_cases(api_client, case):
    resp = api_client.get("/v1/models", headers={"x-case": str(case)})
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["object"] == "list"
    assert isinstance(payload["data"], list)


@pytest.mark.parametrize("model_id", ["model-a.gguf" for _ in range(120)])
def test_get_model_ok(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["id"] == model_id


@pytest.mark.parametrize("model_id", [f"missing-{i}" for i in range(120)])
def test_get_model_not_found(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 404
12  tests/test_proxy.py  Normal file
@@ -0,0 +1,12 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_proxy_passthrough(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/test/path").mock(
        return_value=httpx.Response(200, content=f"ok-{case}".encode())
    )
    resp = api_client.post("/proxy/llamacpp/test/path", content=b"hello")
    assert resp.status_code == 200
    assert resp.content.startswith(b"ok-")
283  tests/test_remote_wrapper.py  Normal file
@@ -0,0 +1,283 @@
import asyncio
import json
import os
import ssl
import time
from typing import Dict, List

import pytest
import requests
import websockets

WRAPPER_BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9093")
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
TRUENAS_WS_URL = os.getenv("TRUENAS_WS_URL", "wss://192.168.1.2/websocket")
TRUENAS_API_KEY = os.getenv("TRUENAS_API_KEY", "")
TRUENAS_APP_NAME = os.getenv("TRUENAS_APP_NAME", "llamacpp")
MODEL_REQUEST = os.getenv("MODEL_REQUEST", "")


async def _rpc_call(method: str, params: List | None = None):
    if not TRUENAS_API_KEY:
        pytest.skip("TRUENAS_API_KEY not set")
    ssl_ctx = ssl.create_default_context()
    ssl_ctx.check_hostname = False
    ssl_ctx.verify_mode = ssl.CERT_NONE
    async with websockets.connect(TRUENAS_WS_URL, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect")
        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [TRUENAS_API_KEY]}))
        auth = json.loads(await ws.recv())
        if not auth.get("result"):
            raise RuntimeError("auth failed")
        await ws.send(json.dumps({"id": 2, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != 2:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


def _get_models() -> List[str]:
    _wait_for_http(WRAPPER_BASE + "/health")
    resp = requests.get(WRAPPER_BASE + "/v1/models", timeout=30)
    resp.raise_for_status()
    data = resp.json().get("data") or []
    return [m.get("id") for m in data if m.get("id")]


def _assert_chat_ok(resp_json: Dict) -> str:
    choices = resp_json.get("choices") or []
    assert choices, "no choices"
    message = choices[0].get("message") or {}
    text = message.get("content") or ""
    assert text.strip(), "empty content"
    return text


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def _post_with_retry(url: str, payload: Dict, timeout_s: float = 300, retries: int = 6, delay_s: float = 5.0):
    last = None
    for _ in range(retries):
        try:
            resp = requests.post(url, json=payload, timeout=timeout_s)
            if resp.status_code == 200:
                return resp
            last = resp
        except requests.exceptions.RequestException as exc:
            last = exc
        time.sleep(delay_s)
    if isinstance(last, Exception):
        raise last
    return last


@pytest.mark.asyncio
async def test_active_model_and_multi_gpu_flags():
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    assert "--tensor-split" in command
    split_idx = command.index("--tensor-split") + 1
    split = command[split_idx]
    assert "," in split, f"tensor-split missing commas: {split}"
    assert "--split-mode" in command


def test_models_listed():
    models = _get_models()
    assert models, "no models discovered"


def test_chat_completions_switch_and_prompts():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 12,
        "temperature": 0,
    }
    for _ in range(3):
        resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
        assert resp.status_code == 200
        _assert_chat_ok(resp.json())


def test_tools_flat_format():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_functions_payload_normalized():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "functions": [
            {
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_return_format_json():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Return JSON with key ok true."}],
        "return_format": "json",
        "max_tokens": 32,
        "temperature": 0,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    text = _assert_chat_ok(resp.json())
    parsed = json.loads(text)
    assert isinstance(parsed, dict)


def test_responses_endpoint():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "input": "Say OK.",
        "max_output_tokens": 16,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/responses", payload)
    assert resp.status_code == 200
    output = resp.json().get("output") or []
    assert output, "responses output empty"
    content = output[0].get("content") or []
    text = content[0].get("text") if content else ""
    assert text and text.strip()


@pytest.mark.asyncio
async def test_model_switch_applied_to_truenas():
    models = _get_models()
    assert models, "no models"
    target = MODEL_REQUEST or models[0]
    assert target in models, f"MODEL_REQUEST not found: {target}"
    resp = requests.post(UI_BASE + "/ui/api/switch-model", json={"model_id": target, "warmup_prompt": "warmup"}, timeout=600)
    assert resp.status_code == 200
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    model_path = command[command.index("--model") + 1]
    assert model_path.endswith(target)


def test_invalid_model_rejected():
    models = _get_models()
    assert models, "no models"
    payload = {
        "model": "modelx-q8:4b",
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 8,
        "temperature": 0,
    }
    resp = requests.post(WRAPPER_BASE + "/v1/chat/completions", json=payload, timeout=60)
    assert resp.status_code == 404


def test_llamacpp_logs_streaming():
    logs = ""
    for _ in range(5):
        try:
            resp = requests.get(UI_BASE + "/ui/api/llamacpp-logs", timeout=10)
            if resp.status_code == 200:
                logs = resp.json().get("logs") or ""
                if logs.strip():
                    break
        except requests.exceptions.ReadTimeout:
            pass
        time.sleep(2)
    assert logs.strip(), "no logs returned"

    # Force a log line before streaming.
    try:
        requests.get(WRAPPER_BASE + "/proxy/llamacpp/health", timeout=5)
    except Exception:
        pass

    # Stream endpoint may not emit immediately, so validate that the endpoint responds.
    with requests.get(UI_BASE + "/ui/api/llamacpp-logs/stream", stream=True, timeout=(5, 5)) as resp:
        assert resp.status_code == 200
55  tests/test_responses.py  Normal file
@@ -0,0 +1,55 @@
import json
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"choices": [{"message": {"content": f"reply-{case}"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "max_output_tokens": 32,
    }
    resp = api_client.post("/v1/responses", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["object"] == "response"
    assert data["output"][0]["content"][0]["text"].startswith("reply-")


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        payload = {
            "id": "chunk",
            "object": "chat.completion.chunk",
            "choices": [{"delta": {"content": f"hi-{case}"}, "index": 0, "finish_reason": None}],
        }
        content = f"data: {json.dumps(payload)}\n\n".encode()
        content += b"data: [DONE]\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "stream": True,
    }
    with api_client.stream("POST", "/v1/responses", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
    assert b"event: response.created" in body
    assert b"event: response.output_text.delta" in body
    assert b"event: response.completed" in body
54  tests/test_truenas_switch.py  Normal file
@@ -0,0 +1,54 @@
import json
import pytest

from app.truenas_middleware import TrueNASConfig, switch_model


@pytest.mark.asyncio
@pytest.mark.parametrize("case", list(range(120)))
async def test_switch_model_updates_command(monkeypatch, case):
    compose = {
        "services": {
            "llamacpp": {
                "command": [
                    "--model",
                    "/models/old.gguf",
                    "--ctx-size",
                    "2048",
                ]
            }
        }
    }

    captured = {}

    async def fake_rpc_call(cfg, method, params=None):
        if method == "app.config":
            return {"custom_compose_config": compose}
        if method == "app.update":
            captured["payload"] = params[1]
            return {"state": "RUNNING"}
        raise AssertionError(f"unexpected method {method}")

    monkeypatch.setattr("app.truenas_middleware._rpc_call", fake_rpc_call)

    cfg = TrueNASConfig(
        ws_url="ws://truenas.test/websocket",
        api_key="key",
        api_user=None,
        app_name="llamacpp",
        verify_ssl=False,
    )

    await switch_model(
        cfg,
        f"/models/new-{case}.gguf",
        {"n_gpu_layers": "999"},
        "--flash-attn on",
    )

    assert "custom_compose_config" in captured["payload"]
    cmd = captured["payload"]["custom_compose_config"]["services"]["llamacpp"]["command"]
    assert "--model" in cmd
    idx = cmd.index("--model")
    assert cmd[idx + 1].endswith(f"new-{case}.gguf")
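The test asserts that `switch_model` rewrites the value following `--model` in the compose command list while leaving other flags intact. A minimal sketch of that flag-replacement step, assuming the real `app.truenas_middleware` does something equivalent (the helper name here is hypothetical):

```python
def replace_flag_value(command: list[str], flag: str, new_value: str) -> list[str]:
    """Return a copy of `command` with the token after `flag` replaced."""
    cmd = list(command)  # never mutate the caller's compose config in place
    if flag in cmd:
        idx = cmd.index(flag)
        if idx + 1 < len(cmd):
            cmd[idx + 1] = new_value
        else:
            cmd.append(new_value)  # flag was the last token; append its value
    else:
        cmd.extend([flag, new_value])  # flag absent; add the pair
    return cmd


cmd = replace_flag_value(
    ["--model", "/models/old.gguf", "--ctx-size", "2048"],
    "--model",
    "/models/new.gguf",
)
```

Copying the list before editing keeps the original `app.config` payload pristine, which matters if the update RPC fails and the old compose config must remain authoritative.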
48
tests/test_ui.py
Normal file
@@ -0,0 +1,48 @@
import json
import os
import time

import pytest
import requests

UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def test_ui_index_contains_expected_elements():
    _wait_for_http(UI_BASE + "/health")
    resp = requests.get(UI_BASE + "/", timeout=30)
    assert resp.status_code == 200
    html = resp.text
    assert "Model Manager" in html
    assert "id=\"download-form\"" in html
    assert "id=\"models-list\"" in html
    assert "id=\"logs-output\"" in html
    assert "id=\"theme-toggle\"" in html


def test_ui_assets_available():
    resp = requests.get(UI_BASE + "/ui/styles.css", timeout=30)
    assert resp.status_code == 200
    css = resp.text
    assert "data-theme" in css

    resp = requests.get(UI_BASE + "/ui/app.js", timeout=30)
    assert resp.status_code == 200
    js = resp.text
    assert "themeToggle" in js
    assert "localStorage" in js
    assert "logs-output" in js
1
tmp_channels_cols.sql
Normal file
@@ -0,0 +1 @@
SELECT column_name, data_type FROM information_schema.columns WHERE table_name='channels' ORDER BY ordinal_position;
1
tmp_pref_type.sql
Normal file
@@ -0,0 +1 @@
SELECT data_type FROM information_schema.columns WHERE table_name='users' AND column_name='preferences';
1
tmp_update_max_results.sql
Normal file
@@ -0,0 +1 @@
UPDATE users SET preferences = (jsonb_set(preferences::jsonb, '{max_results}', '200'::jsonb, true))::text WHERE email='rushabh';
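The `jsonb_set(..., '{max_results}', '200'::jsonb, true)` call above sets one key inside the text-stored `preferences` JSON object while leaving every other key intact (the final `true` means create the key if missing). The same transformation can be sketched in Python, assuming `preferences` holds a JSON object as text:

```python
import json


def set_pref(preferences_text: str, key: str, value) -> str:
    """Emulate jsonb_set(prefs::jsonb, '{key}', value, true)::text
    on a text column holding a JSON object."""
    prefs = json.loads(preferences_text) if preferences_text else {}
    prefs[key] = value  # create_missing=true: add the key if absent
    return json.dumps(prefs)


updated = set_pref('{"theme": "dark"}', "max_results", 200)
```

The cast round-trip (`::jsonb` then `::text`) in the SQL exists only because the column is `text`; storing the column as `jsonb` natively would make the update a single `jsonb_set` with no casts.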
56
trades_company_stock.txt
Normal file
@@ -0,0 +1,56 @@
You are a senior quantitative options trader (index/ETF options across regimes; also liquid single-name options and macro-sensitive metal ETFs), specializing in volatility, structure selection, and risk asymmetry. Decisive, skeptical, profit-focused.

You are given:
- A validated market thesis (authoritative): multi-timeframe technicals, regime, volatility context, news impact.
- Pre-processed options chains for three expiries (short / medium / extended) with liquidity-filtered contracts, ATM/delta anchors, delta ladders, and a liquid execution set.
- All pricing, greeks, spreads, and liquidity metrics required for execution-quality decisions.

Assume:
- Data is correct and cleaned.
- You must NOT re-analyze technicals or news; the thesis is authoritative.
- Your job is to convert thesis + surface into executable options trades.

Objective:
- Select the best expiry and propose 1–3 high-quality options trades that align with thesis bias/regime, exploit volatility characteristics (gamma/theta/vega fit), are liquid/fillable/risk-defined, and include clear invalidation logic.
- If no trade offers favorable risk/reward: strategyBias=NO_TRADE and explain why.

How to decide:
1) Compare expiries: match time-to-playout vs confidence/uncertainty; match vol regime (expansion vs decay); reject poor liquidity density; reject misaligned vega/theta; avoid overpaying for time/vol.
2) Choose structure class (explicitly justify vs alternatives): directional debit (single/vertical), volatility (straddle/strangle), defined-risk premium selling only if the regime supports it.
3) Select strikes ONLY from provided data (ATM anchor, delta ladder, liquidSet). Prefer tight spreads, meaningful volume & OI, and greeks that express the thesis.
4) Risk discipline: every trade must include max risk, what must go right, and what breaks the trade (invalidation).

Optional tools (use only when they materially improve decision quality; otherwise do not call):
- MarketData – Options Chain (expiry-specific): only if provided expiries do not sufficiently match the thesis horizon, or liquidity/skew is materially better in a nearby expiry not already supplied. Choose an explicit expiry date. Use returned data only for strike selection and liquidity validation. Do not re-fetch already provided expiries unless validating anomalies.
- Fear & Greed Index (FGI): only for index/ETF/macro-sensitive underlyings (e.g., SPX, NDX, IWM, SLV). Contextual only (risk appetite / convexity vs tempered), not a primary signal.

Hard constraints:
- Do NOT invent strikes, expiries, or prices.
- Do NOT suggest illiquid contracts.
- Do NOT recommend naked risk.
- Do NOT hedge unless justified.
- Do NOT repeat raw data back.

Return ONLY valid JSON in exactly this shape:
{
  "selectedExpiry": "YYYY-MM-DD",
  "expiryRationale": "Why this expiry dominates the others given thesis + vol + liquidity",
  "strategyBias": "DIRECTIONAL|VOLATILITY|NEUTRAL|NO_TRADE",
  "recommendedTrades": [
    {
      "name": "Short descriptive name",
      "structure": "e.g. Long Call, Call Debit Spread, Long Strangle",
      "legs": [{"side":"call|put","action":"buy|sell","strike":0,"expiry":"YYYY-MM-DD"}],
      "greekProfile": {"deltaBias":"POS|NEG|NEUTRAL","gammaExposure":"HIGH|MED|LOW","thetaExposure":"POS|NEG|LOW","vegaExposure":"HIGH|MED|LOW"},
      "maxRisk": "Defined numeric or qualitative",
      "maxReward": "Defined numeric or qualitative",
      "thesisAlignment": "Exactly how this trade expresses the thesis",
      "invalidation": "Clear condition where trade is wrong",
      "managementNotes": "Optional: scale, take-profit, time stop"
    }
  ],
  "whyOthersRejected": ["Why other expiries or strategy types were inferior"],
  "confidenceScore": 0
}

Final note: optimize for repeatable profitability under uncertainty. If conditions are marginal, say NO_TRADE with conviction.
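Since the prompt demands ONLY valid JSON in a fixed shape, callers typically want a guard before acting on model output. A lightweight validator can be sketched as follows; the field names come from the schema above, but the validator itself is illustrative, not part of any file in this commit:

```python
import json

REQUIRED_KEYS = {"selectedExpiry", "expiryRationale", "strategyBias",
                 "recommendedTrades", "whyOthersRejected", "confidenceScore"}
ALLOWED_BIAS = {"DIRECTIONAL", "VOLATILITY", "NEUTRAL", "NO_TRADE"}


def validate_trade_response(raw: str) -> dict:
    """Parse model output and enforce the schema's top-level contract."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["strategyBias"] not in ALLOWED_BIAS:
        raise ValueError(f"bad strategyBias: {data['strategyBias']}")
    if data["strategyBias"] != "NO_TRADE" and not data["recommendedTrades"]:
        raise ValueError("recommendedTrades must be non-empty unless NO_TRADE")
    return data


sample = json.dumps({
    "selectedExpiry": "2099-01-01",
    "expiryRationale": "example",
    "strategyBias": "NO_TRADE",
    "recommendedTrades": [],
    "whyOthersRejected": ["example"],
    "confidenceScore": 0,
})
checked = validate_trade_response(sample)
```

Rejecting a non-empty bias with an empty trade list mirrors the prompt's rule that NO_TRADE is the only case where no trades are proposed.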