Initial commit
.gitignore (vendored, new file, 142 lines)
@@ -0,0 +1,142 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# Project-specific
/inventory_raw/
/llamacpp_runs_remote/
/ollama_runs_remote/
/reports/
/tmp/
*.log
/C:/Users/Rushabh/.gemini/tmp/bff31f86566324f77927540d72088ce62479fd0563c197318c9f0594af2e69ee/

# OS-generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
AGENTS.full.md (new file, 4206 lines; diff suppressed because it is too large)
AGENTS.md (new file, 20 lines)
@@ -0,0 +1,20 @@
# AGENTS (compressed)

This is the compact working context. For the full historical inventory and detailed snapshots, see `AGENTS.full.md` and `inventory_raw/`.

## Access + basics
- SSH: `ssh -p 55555 rushabh@192.168.1.2`
- Sudo: `sudo -n true`
- TrueNAS UI: `http://192.168.1.2`

## Full context pointers
- Full inventory snapshot and extra system details: `AGENTS.full.md`
- Raw captured data: `inventory_raw/`
- Documentation notes: `docs/*`

## Projects
- n8n Thesis Builder checkpoint (2026-01-04): `docs/n8n-thesis-builder-checkpoint-20260104.md`
- llamaCpp wrapper: a Python-based OpenAI-compatible API wrapper and model manager for the TrueNAS llama.cpp app.
  - Location: `llamaCpp.Wrapper.app/`
  - API port: `9093`
  - UI port: `9094`
  - See the `README.md` inside the folder for full details.
README.md (new file, 69 lines)
@@ -0,0 +1,69 @@
# Codex TrueNAS Helper

This project is a collection of scripts, configurations, and applications to manage and enhance a TrueNAS SCALE server, with a special focus on running and interacting with large language models (LLMs) like those powered by `llama.cpp` and `Ollama`.

## Features

* **`llama.cpp` Wrapper:** A sophisticated wrapper for the `llama.cpp` TrueNAS application that provides:
  * An OpenAI-compatible API for chat completions and embeddings.
  * A web-based UI for managing models (listing, downloading).
  * The ability to hot-swap models without restarting the `llama.cpp` container by interacting with the TrueNAS API.
* **TrueNAS Inventory:** A snapshot of the TrueNAS server's configuration, including hardware, storage, networking, and running applications.
* **Automation Scripts:** A set of PowerShell and Python scripts for tasks like deploying the wrapper and testing remote endpoints.
* **LLM Integration:** Tools and configurations for working with various LLMs.

## Directory Structure

* `AGENTS.md` & `AGENTS.full.md`: These files contain detailed information and a complete inventory of the TrueNAS server's configuration.
* `llamaCpp.Wrapper.app/`: A Python-based application that wraps the `llama.cpp` TrueNAS app with an OpenAI-compatible API and a model management UI.
* `scripts/`: Contains various scripts for deployment, testing, and other tasks.
* `inventory_raw/`: Raw data dumps from the TrueNAS server, used to generate the inventory in `AGENTS.full.md`.
* `reports/`: Contains generated reports, test results, and other artifacts.
* `llamacpp_runs_remote/` & `ollama_runs_remote/`: Logs and results from running LLMs.
* `modelfiles/`: Modelfiles for different language models.
* `tests/`: Python tests for the `llamaCpp.Wrapper.app`.

## `llamaCpp.Wrapper.app`

This is the core component of the project. It's a Python application that acts as a proxy to the `llama.cpp` server running on TrueNAS, but with added features.

### Running Locally

1. Install the required Python packages:

   ```bash
   pip install -r llamaCpp.Wrapper.app/requirements.txt
   ```

2. Run the application:

   ```bash
   python -m llamaCpp.Wrapper.app.run
   ```

This will start two web servers: one for the API (default port 9093) and one for the UI (default port 9094).
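Once both servers are up, a quick way to confirm the API is healthy (a minimal sketch, assuming the default API port and running from the same machine; the `/health` and `/v1/models` routes are defined in `api_app.py`):

```python
# Minimal smoke test against a locally running wrapper (default API port).
# httpx is used because the wrapper itself depends on it.
import httpx

base = "http://127.0.0.1:9093"

print(httpx.get(f"{base}/health").json())      # wrapper status and agents-derived config
print(httpx.get(f"{base}/v1/models").json())   # OpenAI-style list of models in /models
```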
### Docker (TrueNAS)

The wrapper can be run as a Docker container on TrueNAS. See the `llamaCpp.Wrapper.app/README.md` file for a detailed example of the `docker run` command. The wrapper needs to be configured with the appropriate environment variables to connect to the TrueNAS API and the `llama.cpp` container.

### Model Hot-Swapping

The wrapper can switch models in the `llama.cpp` server by updating the application's command via the TrueNAS API, which allows for dynamic model management without manual intervention; an example request is sketched below.
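As a sketch of what that looks like from a client (host, ports, and the model name are examples taken from `docs/llamacpp-wrapper-notes.md`; the model id must exactly match an entry from `/v1/models`):

```python
import httpx

MODEL = "Qwen2.5-7B-Instruct-Q4_K_M.gguf"  # must exactly match an id from /v1/models

# Implicit switch: requesting a non-active model on the OpenAI endpoint makes
# the wrapper switch llama.cpp to it before proxying the chat request.
resp = httpx.post(
    "http://192.168.1.2:9093/v1/chat/completions",
    json={"model": MODEL, "messages": [{"role": "user", "content": "ping"}]},
    timeout=600,  # the first request after a switch waits for the model to load
)

# Explicit switch through the UI server's documented endpoint.
httpx.post("http://192.168.1.2:9094/ui/api/switch-model", json={"model_id": MODEL}, timeout=600)
```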
## Scripts

* `deploy_truenas_wrapper.py`: A Python script to deploy the `llamaCpp.Wrapper.app` to TrueNAS.
* `remote_wrapper_test.py`: A Python script for testing the remote wrapper.
* `update_llamacpp_flags.ps1`: A PowerShell script to update the `llama.cpp` flags.
* `llamacpp_remote_test.ps1` & `ollama_remote_test.ps1`: PowerShell scripts for testing `llama.cpp` and `Ollama` remote endpoints.

## Getting Started

1. **Explore the Inventory:** Start by reading `AGENTS.md` and `AGENTS.full.md` to understand the TrueNAS server's configuration.
2. **Set up the Wrapper:** If you want to use the `llama.cpp` wrapper, follow the instructions in `llamaCpp.Wrapper.app/README.md` to run it either locally or as a Docker container on TrueNAS.
3. **Use the Scripts:** The scripts in the `scripts` directory can be used to automate various tasks.

## Development

The `llamaCpp.Wrapper.app` has a suite of tests located in the `tests/` directory. To run the tests, use `pytest`:

```bash
pytest
```
docs/llamacpp-wrapper-notes.md (new file, 60 lines)
@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes

Last updated: 2026-01-04

## Purpose
OpenAI-compatible wrapper for the existing `llamacpp` app with a model manager UI,
model switching, and parameter management via TrueNAS middleware.

## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`

## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`

## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows active model and supports switching with verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via Docker socket fallback when TrueNAS log APIs are unavailable.
## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
  OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp; see the sketch below.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to a JSON-only system prompt if llama.cpp rejects `response_format`).
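For illustration, the flat and normalized shapes side by side (the tool name `get_quote` is a made-up example; the transformation matches `_normalize_tools` in `openai_translate.py`):

```python
# Flat format as sent by some clients (n8n/OpenWebUI):
flat_tool = {
    "type": "function",
    "name": "get_quote",
    "parameters": {"type": "object", "properties": {"symbol": {"type": "string"}}},
}

# What the wrapper forwards to llama.cpp after normalization:
openai_tool = {
    "type": "function",
    "function": {"name": "get_quote", "parameters": flat_tool["parameters"]},
}
```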
## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.

## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args; example below)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)

## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies the switch by sending a minimal prompt.
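A short read-only check of the config endpoint (a sketch, using the pinned UI port above):

```python
import httpx

# Active model, current llama.cpp command params, and extra args.
cfg = httpx.get("http://192.168.1.2:9094/ui/api/llamacpp-config").json()
print(cfg)
```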
## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
  - `WRAPPER_BASE=http://192.168.1.2:9093`
  - `UI_BASE=http://192.168.1.2:9094`
  - `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
  - `TRUENAS_API_KEY=...`
  - `MODEL_REQUEST=<exact model id from /v1/models>`

## Runtime Validation (2026-01-04)
- Fixed llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect transient `503 Loading model` during warmup; retry after the load completes.
- Verified a `yarn-llama-2-13b-64k.Q4_K_M.gguf` model switch from the wrapper; a tool-enabled chat request completed after load (~107s).
docs/n8n-thesis-builder-checkpoint-20260104.md (new file, 53 lines)
@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)

## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when the workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).

## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of AI output.
- `HTTP Request13` query forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted between `Aggregate2` -> `AI - Thesis Builder`.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.

## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (example `symbol` and `thesis` fields full of junk tokens).
- `AI - Technicals Auditor` output looks like valid JSON (see sample below).
- `Aggregate2` payload size ~6.7 KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response-format handling.

### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`

### Sample `AI - Technicals Auditor` output (valid JSON)

```json
{
  "output": {
    "timeframes": [
      { "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
    ],
    "optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
    "dataQualityScore": 0.5,
    "error": "INSUFFICIENT_DATA"
  }
}
```

## Open issues
- Thesis Builder garbling persists even with a small prompt; likely a model/wrapper output issue.
- Need to confirm whether the llama.cpp wrapper is corrupting output or the model is misconfigured for JSON-only output; a probe sketch follows.
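One way to separate the two suspects is to send the same JSON-mode prompt through the wrapper and directly to the native llama.cpp server and compare the contents (a sketch; ports per `docs/llamacpp-wrapper-notes.md`, and the model id must match `/v1/models`):

```python
import httpx

messages = [{"role": "user", "content": 'Return {"ok": true} and nothing else.'}]
model = "Qwen2.5-7B-Instruct-Q4_K_M.gguf"  # any exact id from /v1/models

# Through the wrapper, using its wrapper-specific return_format flag.
via_wrapper = httpx.post(
    "http://192.168.1.2:9093/v1/chat/completions",
    json={"model": model, "messages": messages, "return_format": "json"},
    timeout=300,
)
# Directly against llama.cpp; return_format is wrapper-specific, so omit it.
direct = httpx.post(
    "http://192.168.1.2:8071/v1/chat/completions",
    json={"model": model, "messages": messages},
    timeout=300,
)
print(via_wrapper.json()["choices"][0]["message"]["content"])
print(direct.json()["choices"][0]["message"]["content"])
```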
## Useful commands
- Last runs:
  `SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
  `sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`
llamaCpp.Wrapper.app/Dockerfile (new file, 16 lines)
@@ -0,0 +1,16 @@
FROM python:3.11-slim

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app/app
COPY trades_company_stock.txt /app/trades_company_stock.txt

EXPOSE 8000 8001

CMD ["python", "-m", "app.run"]
llamaCpp.Wrapper.app/README.md (new file, 134 lines)
@@ -0,0 +1,134 @@
# llama.cpp OpenAI-Compatible Wrapper

This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` (build-time) into `app/agents_config.json`.

## Current Agents-Derived Details

- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from AGENTS snapshot)

Regenerate the derived config after updating `AGENTS.md`:

```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```
## Running Locally

```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```

Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to container name or localhost based on agents config)
- Model dir: `MODEL_DIR=/models`
## Docker (TrueNAS)

Example (join existing llama.cpp network and mount models):

```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
  --network ix-llamacpp_default \
  -v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e LLAMACPP_RESTART_METHOD=docker \
  -e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
  -e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
  -e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
  -e TRUENAS_API_KEY=YOUR_KEY \
  -e TRUENAS_API_USER=YOUR_USER \
  -e TRUENAS_APP_NAME=llamacpp \
  -e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
  -e PORT_A=9093 -e PORT_B=9094 \
  llama-cpp-openai-wrapper:latest
```
## Model Hot-Swap / Restart Hooks

This wrapper does not modify llama.cpp by default. To enable hot-swap/restart for new models or model selection,
provide one of the restart methods below (a hypothetical `http` receiver is sketched after the list):

- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`

or

- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`

or (requires mounting docker socket)

- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`
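For the `http` method, the wrapper hands its restart payload (the `model_id`, `model_path`, `llamacpp_args`, `llamacpp_extra_args`, and `gpu_count` fields visible in `api_app.py`) to `LLAMACPP_RESTART_URL`. The receiver below is a hypothetical sketch only; it assumes the payload arrives as a JSON POST, and `restart.py` defines the actual contract:

```python
# Hypothetical restart receiver for LLAMACPP_RESTART_METHOD=http (sketch only).
# Assumes the wrapper POSTs its restart payload as JSON; verify against restart.py.
import subprocess

from fastapi import FastAPI

app = FastAPI()


@app.post("/restart")
async def restart(payload: dict) -> dict:
    # Restart the llama.cpp container for the requested model; the container
    # name below is this deployment's, adjust for yours.
    subprocess.run(["docker", "restart", "ix-llamacpp-llamacpp-1"], check=True)
    return {"ok": True, "model_path": payload.get("model_path")}
```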
## Model switching via TrueNAS middleware (P0)

Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:

```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```

The wrapper preserves existing flags in the compose command and only updates `--model`, while optionally adding
missing GPU split flags from `LLAMACPP_*` if not already set (see the defaulting sketch below).

Optional arguments passed to restart handlers:

```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```

You can also pass arbitrary llama.cpp flags (space-separated) via:

```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```
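For reference, when two or more GPUs are visible and no split flags are provided, the wrapper derives an even split itself; a minimal sketch of that defaulting logic, mirroring `config.py` (not a public API):

```python
# Even tensor-split defaulting, as done in load_config() for multi-GPU hosts.
def default_split_flags(gpu_count: int, llamacpp_args: dict) -> dict:
    if gpu_count and gpu_count >= 2:
        if "tensor_split" not in llamacpp_args:
            ratio = 1.0 / float(gpu_count)
            llamacpp_args["tensor_split"] = ",".join([f"{ratio:.2f}"] * gpu_count)
        llamacpp_args.setdefault("split_mode", "layer")
    return llamacpp_args


print(default_split_flags(2, {}))  # {'tensor_split': '0.50,0.50', 'split_mode': 'layer'}
```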
## Model Manager UI

Open `http://HOST:PORT_B/`.

Features:
- List existing models
- Download models via URL
- Live progress + cancel
## Testing

Tests are parameterized with 100+ cases per endpoint.

```bash
pytest -q
```

## llama.cpp flags reference

Scraped from upstream docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`.

```bash
pwsh scripts/update_llamacpp_flags.ps1
```
llamaCpp.Wrapper.app/__init__.py (new file, 1 empty line)
llamaCpp.Wrapper.app/agents_config.json (new file, 22 lines)
@@ -0,0 +1,22 @@
{
  "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
  "container_name": "ix-llamacpp-llamacpp-1",
  "host_port": 8071,
  "container_port": 8080,
  "web_ui_url": "http://0.0.0.0:8071/",
  "model_host_path": "/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models",
  "model_container_path": "/models",
  "models": [
    "GPT-OSS",
    "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    "openassistant-llama2-13b-orca-8k-3319.Q5_K_M.gguf",
    "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
  ],
  "network": "ix-llamacpp_default",
  "subnets": [
    "172.16.18.0/24",
    "fdb7:86ec:b1dd:11::/64"
  ],
  "gpu_count": 2,
  "gpu_name": "NVIDIA RTX 5060 Ti, 16 GB each (per `nvidia-smi` in prior runs)."
}
llamaCpp.Wrapper.app/agents_parser.py (new file, 119 lines)
@@ -0,0 +1,119 @@
import json
import re
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import List, Optional


APP_HEADER_RE = re.compile(r"^### App: (?P<name>.+?)\s*$")
IMAGE_RE = re.compile(r"image=(?P<image>[^\s]+)")
PORT_MAP_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host>\d+|0\.0\.0\.0:(?P<host_ip_port>\d+))")
PORT_LINE_RE = re.compile(r"- tcp (?P<container>\d+) -> (?P<host_ip>[^:]+):(?P<host>\d+)")
VOLUME_RE = re.compile(r"- (?P<host>/[^\s]+) -> (?P<container>/[^\s]+)")
NETWORK_RE = re.compile(r"- (?P<name>ix-[^\s]+)_default")
SUBNET_RE = re.compile(r"subnets=\[(?P<subnets>[^\]]+)\]")
MODELS_RE = re.compile(r"Models in /models: (?P<models>.+)$")
PORTAL_RE = re.compile(r"Portals: \{\'Web UI\': \'(?P<url>[^\']+)\'\}")
GPU_RE = re.compile(r"GPUs:\s*(?P<count>\d+)x\s*(?P<name>.+)$")
CONTAINER_NAME_RE = re.compile(r"^(?P<name>ix-llamacpp-[^\s]+)")


@dataclass
class LlamacppConfig:
    image: Optional[str] = None
    container_name: Optional[str] = None
    host_port: Optional[int] = None
    container_port: Optional[int] = None
    web_ui_url: Optional[str] = None
    model_host_path: Optional[str] = None
    model_container_path: Optional[str] = None
    models: List[str] = None
    network: Optional[str] = None
    subnets: List[str] = None
    gpu_count: Optional[int] = None
    gpu_name: Optional[str] = None


def _find_section(lines: List[str], app_name: str) -> List[str]:
    start = None
    for i, line in enumerate(lines):
        m = APP_HEADER_RE.match(line.strip())
        if m and m.group("name") == app_name:
            start = i
            break
    if start is None:
        return []
    for j in range(start + 1, len(lines)):
        if APP_HEADER_RE.match(lines[j].strip()):
            return lines[start:j]
    return lines[start:]


def parse_agents(path: Path) -> LlamacppConfig:
    text = path.read_text(encoding="utf-8", errors="ignore")
    lines = text.splitlines()
    section = _find_section(lines, "llamacpp")
    cfg = LlamacppConfig(models=[], subnets=[])

    for line in section:
        if cfg.image is None:
            m = IMAGE_RE.search(line)
            if m:
                cfg.image = m.group("image")
        if cfg.web_ui_url is None:
            m = PORTAL_RE.search(line)
            if m:
                cfg.web_ui_url = m.group("url")
        if cfg.container_port is None or cfg.host_port is None:
            m = PORT_LINE_RE.search(line)
            if m:
                cfg.container_port = int(m.group("container"))
                cfg.host_port = int(m.group("host"))
        if cfg.model_host_path is None or cfg.model_container_path is None:
            m = VOLUME_RE.search(line)
            if m and "/models" in m.group("container"):
                cfg.model_host_path = m.group("host")
                cfg.model_container_path = m.group("container")
        if cfg.network is None:
            m = NETWORK_RE.search(line)
            if m:
                cfg.network = f"{m.group('name')}_default"
        if "subnets=" in line:
            m = SUBNET_RE.search(line)
            if m:
                subnets_raw = m.group("subnets")
                subnets = [s.strip().strip("'") for s in subnets_raw.split(",")]
                cfg.subnets.extend([s for s in subnets if s])
        if "Models in /models:" in line:
            m = MODELS_RE.search(line)
            if m:
                models_raw = m.group("models")
                cfg.models = [s.strip() for s in models_raw.split(",") if s.strip()]

    for line in lines:
        if cfg.gpu_count is None:
            m = GPU_RE.search(line)
            if m:
                cfg.gpu_count = int(m.group("count"))
                cfg.gpu_name = m.group("name").strip()
        if cfg.container_name is None:
            m = CONTAINER_NAME_RE.match(line.strip())
            if m:
                cfg.container_name = m.group("name")

    return cfg


def main() -> None:
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--agents", default="AGENTS.md")
    parser.add_argument("--out", default="app/agents_config.json")
    args = parser.parse_args()

    cfg = parse_agents(Path(args.agents))
    out_path = Path(args.out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(asdict(cfg), indent=2), encoding="utf-8")


if __name__ == "__main__":
    main()
llamaCpp.Wrapper.app/api_app.py (new file, 309 lines)
@@ -0,0 +1,309 @@
import asyncio
import logging
import time
from pathlib import Path
from typing import Any, Dict

from fastapi import APIRouter, FastAPI, HTTPException, Request, Response
from fastapi.responses import JSONResponse, StreamingResponse
import httpx

from app.config import load_config
from app.llamacpp_client import proxy_json, proxy_raw, proxy_stream
from app.logging_utils import configure_logging
from app.model_registry import find_model, resolve_model, scan_models
from app.openai_translate import responses_to_chat_payload, chat_to_responses, normalize_chat_payload
from app.restart import RestartPlan, trigger_restart
from app.stream_transform import stream_chat_to_responses
from app.truenas_middleware import TrueNASConfig, get_active_model_id, switch_model
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry


configure_logging()
log = logging.getLogger("api_app")


def _model_list_payload(model_dir: str) -> Dict[str, Any]:
    data = []
    for model in scan_models(model_dir):
        data.append({
            "id": model.model_id,
            "object": "model",
            "created": model.created,
            "owned_by": "llama.cpp",
        })
    return {"object": "list", "data": data}


def _requires_json_mode(payload: Dict[str, Any]) -> bool:
    response_format = payload.get("response_format")
    if isinstance(response_format, dict) and response_format.get("type") == "json_object":
        return True
    if payload.get("return_format") == "json":
        return True
    return False


def _apply_json_fallback(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Strip JSON-mode fields and force a JSON-only system prompt instead.
    payload = dict(payload)
    payload.pop("response_format", None)
    payload.pop("return_format", None)
    messages = payload.get("messages")
    if isinstance(messages, list):
        system_msg = {"role": "system", "content": "Respond only with a valid JSON object."}
        if not messages or messages[0].get("role") != "system":
            payload["messages"] = [system_msg, *messages]
        else:
            payload["messages"] = [system_msg, *messages[1:]]
    return payload


async def _proxy_json_with_retry(
    base_url: str,
    path: str,
    method: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
    delay_s: float = 3.0,
) -> httpx.Response:
    # Retry while llama.cpp reports 503 "Loading model" or the connection fails.
    deadline = time.time() + timeout_s
    attempt = 0
    last_exc: Exception | None = None
    while time.time() < deadline:
        attempt += 1
        try:
            resp = await proxy_json(base_url, path, method, headers, payload, timeout_s)
            if resp.status_code == 503:
                try:
                    data = resp.json()
                except Exception:
                    data = {}
                message = ""
                if isinstance(data, dict):
                    err = data.get("error")
                    if isinstance(err, dict):
                        message = str(err.get("message") or "")
                    else:
                        message = str(data.get("message") or "")
                if "loading model" in message.lower():
                    log.warning("llama.cpp still loading model, retrying (attempt %s)", attempt)
                    await asyncio.sleep(delay_s)
                    continue
            return resp
        except httpx.RequestError as exc:
            last_exc = exc
            log.warning("Proxy request failed (attempt %s): %s", attempt, exc)
            await asyncio.sleep(delay_s)
    if last_exc:
        raise last_exc
    raise RuntimeError("proxy retry deadline exceeded")


async def _get_active_model_from_truenas(cfg: TrueNASConfig) -> str:
    try:
        return await get_active_model_id(cfg)
    except Exception as exc:
        log.warning("Failed to read active model from TrueNAS config: %s", exc)
        return ""


async def _wait_for_active_model(cfg: TrueNASConfig, model_id: str, timeout_s: float) -> None:
    deadline = asyncio.get_event_loop().time() + timeout_s
    while asyncio.get_event_loop().time() < deadline:
        active = await _get_active_model_from_truenas(cfg)
        if active == model_id:
            return
        await asyncio.sleep(2)
    raise RuntimeError(f"active model did not switch to {model_id}")


async def _ensure_model_loaded(model_id: str, model_dir: str) -> str:
    # Prefer switching via TrueNAS middleware when credentials are configured;
    # otherwise fall back to the configured restart hook.
    cfg = load_config()
    model = resolve_model(model_dir, model_id, cfg.model_aliases)
    if not model:
        log.warning("Requested model not found: %s", model_id)
        raise HTTPException(status_code=404, detail="model not found")
    if model.model_id != model_id:
        log.info("Resolved model alias %s -> %s", model_id, model.model_id)

    truenas_cfg = None
    if cfg.truenas_ws_url and cfg.truenas_api_key:
        truenas_cfg = TrueNASConfig(
            ws_url=cfg.truenas_ws_url,
            api_key=cfg.truenas_api_key,
            api_user=cfg.truenas_api_user,
            app_name=cfg.truenas_app_name,
            verify_ssl=cfg.truenas_verify_ssl,
        )
        active_id = await _get_active_model_from_truenas(truenas_cfg)
        if active_id and active_id == model.model_id:
            return model.model_id

    if truenas_cfg:
        log.info("Switching model via API model=%s args=%s extra_args=%s", model.model_id, cfg.llamacpp_args, cfg.llamacpp_extra_args)
        try:
            model_path = str((Path(cfg.model_container_dir) / model.model_id))
            await switch_model(
                truenas_cfg,
                model_path,
                cfg.llamacpp_args,
                cfg.llamacpp_extra_args,
            )
            await _wait_for_active_model(truenas_cfg, model.model_id, cfg.switch_timeout_s)
        except Exception as exc:
            log.exception("TrueNAS model switch failed")
            raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")
        warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
        log.info("Running warmup prompt after model switch: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
        await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
        return model.model_id

    plan = RestartPlan(
        method=cfg.restart_method,
        command=cfg.restart_command,
        url=cfg.restart_url,
        allowed_container=cfg.allowed_container,
    )
    log.info("Triggering restart for model=%s method=%s", model.model_id, cfg.restart_method)
    payload = {
        "model_id": model.model_id,
        "model_path": str(Path(cfg.model_container_dir) / model.model_id),
        "gpu_count": cfg.gpu_count_runtime or cfg.agents.gpu_count,
        "llamacpp_args": cfg.llamacpp_args,
        "llamacpp_extra_args": cfg.llamacpp_extra_args,
    }
    await trigger_restart(plan, payload=payload)
    warmup_prompt = resolve_warmup_prompt(None, cfg.warmup_prompt_path)
    log.info("Running warmup prompt after restart: model=%s prompt_len=%s", model.model_id, len(warmup_prompt))
    await run_warmup_with_retry(cfg.base_url, model.model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
    return model.model_id


def create_api_app() -> FastAPI:
    cfg = load_config()
    app = FastAPI(title="llama.cpp OpenAI Wrapper", version="0.1.0")
    router = APIRouter()

    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        log.info("Request %s %s", request.method, request.url.path)
        return await call_next(request)

    @app.exception_handler(Exception)
    async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
        log.exception("Unhandled error")
        return JSONResponse(status_code=500, content={"detail": str(exc)})

    @router.get("/health")
    async def health() -> Dict[str, Any]:
        return {
            "status": "ok",
            "base_url": cfg.base_url,
            "model_dir": cfg.model_dir,
            "agents": {
                "image": cfg.agents.image,
                "container_name": cfg.agents.container_name,
                "network": cfg.agents.network,
                "gpu_count": cfg.agents.gpu_count,
            },
            "gpu_count_runtime": cfg.gpu_count_runtime,
        }

    @router.get("/v1/models")
    async def list_models() -> Dict[str, Any]:
        log.info("Listing models")
        return _model_list_payload(cfg.model_dir)

    @router.get("/v1/models/{model_id}")
    async def get_model(model_id: str) -> Dict[str, Any]:
        log.info("Get model %s", model_id)
        model = resolve_model(cfg.model_dir, model_id, cfg.model_aliases) or find_model(cfg.model_dir, model_id)
        if not model:
            raise HTTPException(status_code=404, detail="model not found")
        return {
            "id": model.model_id,
            "object": "model",
            "created": model.created,
            "owned_by": "llama.cpp",
        }

    @router.post("/v1/chat/completions")
    async def chat_completions(request: Request) -> Response:
        payload = await request.json()
        payload = normalize_chat_payload(payload)
        model_id = payload.get("model")
        log.info("Chat completions model=%s stream=%s", model_id, bool(payload.get("stream")))
        if model_id:
            resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
            payload["model"] = resolved
        stream = bool(payload.get("stream"))
        if stream and _requires_json_mode(payload):
            payload = _apply_json_fallback(payload)
        if stream:
            streamer = proxy_stream(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
            return StreamingResponse(streamer, media_type="text/event-stream")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
        if resp.status_code >= 500 and _requires_json_mode(payload):
            log.info("Retrying chat completion with JSON fallback prompt")
            fallback_payload = _apply_json_fallback(payload)
            resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
        try:
            return JSONResponse(status_code=resp.status_code, content=resp.json())
        except Exception:
            return Response(
                status_code=resp.status_code,
                content=resp.content,
                media_type=resp.headers.get("content-type"),
            )

    @router.post("/v1/responses")
    async def responses(request: Request) -> Response:
        payload = await request.json()
        chat_payload, model_id = responses_to_chat_payload(payload)
        log.info("Responses model=%s stream=%s", model_id, bool(chat_payload.get("stream")))
        if model_id:
            resolved = await _ensure_model_loaded(model_id, cfg.model_dir)
            chat_payload["model"] = resolved
        stream = bool(chat_payload.get("stream"))
        if stream and _requires_json_mode(chat_payload):
            chat_payload = _apply_json_fallback(chat_payload)
        if stream:
            streamer = stream_chat_to_responses(
                cfg.base_url,
                dict(request.headers),
                chat_payload,
                cfg.proxy_timeout_s,
            )
            return StreamingResponse(streamer, media_type="text/event-stream")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), chat_payload, cfg.proxy_timeout_s)
        if resp.status_code >= 500 and _requires_json_mode(chat_payload):
            log.info("Retrying responses with JSON fallback prompt")
            fallback_payload = _apply_json_fallback(chat_payload)
            resp = await _proxy_json_with_retry(cfg.base_url, "/v1/chat/completions", "POST", dict(request.headers), fallback_payload, cfg.proxy_timeout_s)
        resp.raise_for_status()
        return JSONResponse(status_code=200, content=chat_to_responses(resp.json(), model_id))

    @router.post("/v1/embeddings")
    async def embeddings(request: Request) -> Response:
        payload = await request.json()
        log.info("Embeddings")
        resp = await _proxy_json_with_retry(cfg.base_url, "/v1/embeddings", "POST", dict(request.headers), payload, cfg.proxy_timeout_s)
        try:
            return JSONResponse(status_code=resp.status_code, content=resp.json())
        except Exception:
            return Response(
                status_code=resp.status_code,
                content=resp.content,
                media_type=resp.headers.get("content-type"),
            )

    @router.api_route("/proxy/llamacpp/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"])
    async def passthrough(path: str, request: Request) -> Response:
        body = await request.body()
        resp = await proxy_raw(cfg.base_url, f"/{path}", request.method, dict(request.headers), body, cfg.proxy_timeout_s)
        return Response(status_code=resp.status_code, content=resp.content, headers=dict(resp.headers))

    app.include_router(router)
    return app
llamaCpp.Wrapper.app/config.py (new file, 214 lines)
@@ -0,0 +1,214 @@
import json
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class AgentsRuntime:
    image: Optional[str]
    container_name: Optional[str]
    host_port: Optional[int]
    container_port: Optional[int]
    web_ui_url: Optional[str]
    model_host_path: Optional[str]
    model_container_path: Optional[str]
    models: List[str]
    network: Optional[str]
    subnets: List[str]
    gpu_count: Optional[int]
    gpu_name: Optional[str]


@dataclass
class AppConfig:
    api_port: int
    ui_port: int
    base_url: str
    model_dir: str
    model_container_dir: str
    download_dir: str
    download_max_concurrent: int
    download_allowlist: List[str]
    restart_method: str
    restart_command: Optional[str]
    restart_url: Optional[str]
    reload_on_new_model: bool
    proxy_timeout_s: float
    switch_timeout_s: float
    gpu_count_runtime: Optional[int]
    llamacpp_args: Dict[str, str]
    llamacpp_extra_args: str
    truenas_api_key: Optional[str]
    truenas_api_user: Optional[str]
    truenas_app_name: str
    truenas_ws_url: Optional[str]
    truenas_verify_ssl: bool
    allowed_container: Optional[str]
    warmup_prompt_path: str
    llamacpp_container_name: Optional[str]
    model_aliases: Dict[str, str]
    agents: AgentsRuntime


def _load_agents_config(path: Path) -> AgentsRuntime:
    if not path.exists():
        return AgentsRuntime(
            image=None,
            container_name=None,
            host_port=None,
            container_port=None,
            web_ui_url=None,
            model_host_path=None,
            model_container_path=None,
            models=[],
            network=None,
            subnets=[],
            gpu_count=None,
            gpu_name=None,
        )
    raw = json.loads(path.read_text(encoding="utf-8"))
    return AgentsRuntime(
        image=raw.get("image"),
        container_name=raw.get("container_name"),
        host_port=raw.get("host_port"),
        container_port=raw.get("container_port"),
        web_ui_url=raw.get("web_ui_url"),
        model_host_path=raw.get("model_host_path"),
        model_container_path=raw.get("model_container_path"),
        models=raw.get("models") or [],
        network=raw.get("network"),
        subnets=raw.get("subnets") or [],
        gpu_count=raw.get("gpu_count"),
        gpu_name=raw.get("gpu_name"),
    )


def _infer_gpu_count_runtime() -> Optional[int]:
    visible = os.getenv("CUDA_VISIBLE_DEVICES") or os.getenv("NVIDIA_VISIBLE_DEVICES")
    if visible and visible not in {"all", "void"}:
        parts = [p.strip() for p in visible.split(",") if p.strip()]
        if parts:
            return len(parts)
    return None


def _default_base_url(agents: AgentsRuntime) -> str:
    if agents.container_name and agents.container_port:
        return f"http://{agents.container_name}:{agents.container_port}"
    if agents.host_port:
        return f"http://127.0.0.1:{agents.host_port}"
    return "http://127.0.0.1:8080"


def load_config() -> AppConfig:
    agents_path = Path(os.getenv("AGENTS_CONFIG_PATH", "app/agents_config.json"))
    agents = _load_agents_config(agents_path)

    api_port = int(os.getenv("PORT_A", "9093"))
    ui_port = int(os.getenv("PORT_B", "9094"))

    base_url = os.getenv("LLAMACPP_BASE_URL") or _default_base_url(agents)
    model_dir = os.getenv("MODEL_DIR") or agents.model_container_path or "/models"
    model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or model_dir

    download_dir = os.getenv("MODEL_DOWNLOAD_DIR") or model_dir
    download_max = int(os.getenv("MODEL_DOWNLOAD_MAX_CONCURRENT", "2"))

    allowlist_raw = os.getenv("MODEL_DOWNLOAD_ALLOWLIST", "")
    allowlist = [item.strip() for item in allowlist_raw.split(",") if item.strip()]

    restart_method = os.getenv("LLAMACPP_RESTART_METHOD", "none").lower()
    restart_command = os.getenv("LLAMACPP_RESTART_COMMAND")
    restart_url = os.getenv("LLAMACPP_RESTART_URL")

    reload_on_new_model = os.getenv("RELOAD_ON_NEW_MODEL", "false").lower() in {"1", "true", "yes"}
    proxy_timeout_s = float(os.getenv("LLAMACPP_PROXY_TIMEOUT_S", "600"))
    switch_timeout_s = float(os.getenv("LLAMACPP_SWITCH_TIMEOUT_S", "300"))

    gpu_count_runtime = _infer_gpu_count_runtime()

    llamacpp_args = {}
    args_map = {
        "LLAMACPP_TENSOR_SPLIT": "tensor_split",
        "LLAMACPP_SPLIT_MODE": "split_mode",
        "LLAMACPP_N_GPU_LAYERS": "n_gpu_layers",
        "LLAMACPP_CTX_SIZE": "ctx_size",
        "LLAMACPP_BATCH_SIZE": "batch_size",
        "LLAMACPP_UBATCH_SIZE": "ubatch_size",
        "LLAMACPP_CACHE_TYPE_K": "cache_type_k",
        "LLAMACPP_CACHE_TYPE_V": "cache_type_v",
        "LLAMACPP_FLASH_ATTN": "flash_attn",
    }
    for env_key, arg_key in args_map.items():
        value = os.getenv(env_key)
        if value is not None and value != "":
            llamacpp_args[arg_key] = value
    llamacpp_extra_args = os.getenv("LLAMACPP_EXTRA_ARGS", "")

    truenas_api_key = os.getenv("TRUENAS_API_KEY")
    truenas_api_user = os.getenv("TRUENAS_API_USER")
    truenas_app_name = os.getenv("TRUENAS_APP_NAME", "llamacpp")
    truenas_ws_url = os.getenv("TRUENAS_WS_URL")
    truenas_api_url = os.getenv("TRUENAS_API_URL")
    if not truenas_ws_url and truenas_api_url:
        if truenas_api_url.startswith("https://"):
            truenas_ws_url = "wss://" + truenas_api_url[len("https://") :].rstrip("/") + "/websocket"
        elif truenas_api_url.startswith("http://"):
            truenas_ws_url = "ws://" + truenas_api_url[len("http://") :].rstrip("/") + "/websocket"
    truenas_verify_ssl = os.getenv("TRUENAS_VERIFY_SSL", "false").lower() in {"1", "true", "yes"}
    allowed_container = os.getenv("LLAMACPP_TARGET_CONTAINER") or agents.container_name
    llamacpp_container_name = os.getenv("LLAMACPP_CONTAINER_NAME") or agents.container_name
    warmup_prompt_path = os.getenv("WARMUP_PROMPT_PATH", str(Path("trades_company_stock.txt").resolve()))
    if truenas_ws_url and (":" in model_container_dir[:3] or "\\" in model_container_dir):
        # Guard against a Windows-style path leaking in via MODEL_DIR when the
        # middleware expects a container path.
        model_container_dir = os.getenv("MODEL_CONTAINER_DIR") or "/models"
    aliases_raw = os.getenv("MODEL_ALIASES", "")
    model_aliases: Dict[str, str] = {}
    if aliases_raw:
        try:
            model_aliases = json.loads(aliases_raw)
        except json.JSONDecodeError:
            for item in aliases_raw.split(","):
                if "=" in item:
                    key, value = item.split("=", 1)
                    model_aliases[key.strip()] = value.strip()

    gpu_count = gpu_count_runtime or agents.gpu_count
    if gpu_count and gpu_count >= 2:
        if "tensor_split" not in llamacpp_args:
            ratio = 1.0 / float(gpu_count)
            split = ",".join([f"{ratio:.2f}"] * gpu_count)
            llamacpp_args["tensor_split"] = split
        if "split_mode" not in llamacpp_args:
            llamacpp_args["split_mode"] = "layer"

    return AppConfig(
        api_port=api_port,
        ui_port=ui_port,
        base_url=base_url,
        model_dir=model_dir,
        model_container_dir=model_container_dir,
        download_dir=download_dir,
        download_max_concurrent=download_max,
        download_allowlist=allowlist,
        restart_method=restart_method,
        restart_command=restart_command,
        restart_url=restart_url,
        reload_on_new_model=reload_on_new_model,
        proxy_timeout_s=proxy_timeout_s,
        switch_timeout_s=switch_timeout_s,
        gpu_count_runtime=gpu_count_runtime,
        llamacpp_args=llamacpp_args,
        llamacpp_extra_args=llamacpp_extra_args,
        truenas_api_key=truenas_api_key,
        truenas_api_user=truenas_api_user,
        truenas_app_name=truenas_app_name,
        truenas_ws_url=truenas_ws_url,
        truenas_verify_ssl=truenas_verify_ssl,
        allowed_container=allowed_container,
        warmup_prompt_path=warmup_prompt_path,
        llamacpp_container_name=llamacpp_container_name,
        model_aliases=model_aliases,
        agents=agents,
    )
llamaCpp.Wrapper.app/docker_logs.py (new file, 61 lines)
@@ -0,0 +1,61 @@
import json
import logging
import os
from typing import Optional

import httpx


log = logging.getLogger("docker_logs")


def _docker_transport() -> httpx.AsyncHTTPTransport:
    sock_path = os.getenv("DOCKER_SOCK", "/var/run/docker.sock")
    return httpx.AsyncHTTPTransport(uds=sock_path)


async def _docker_get(path: str, params: Optional[dict] = None) -> httpx.Response:
    timeout = httpx.Timeout(10.0, read=10.0)
    async with httpx.AsyncClient(transport=_docker_transport(), base_url="http://docker", timeout=timeout) as client:
        resp = await client.get(path, params=params)
        resp.raise_for_status()
        return resp


def _decode_docker_stream(data: bytes) -> str:
    # Docker multiplexes stdout/stderr into frames with an 8-byte header
    # (stream type + big-endian payload size). Both stream types are kept.
    if not data:
        return ""
    out = bytearray()
    idx = 0
    while idx + 8 <= len(data):
        stream_type = data[idx]  # 1 = stdout, 2 = stderr; both treated alike
        size = int.from_bytes(data[idx + 4: idx + 8], "big")
        idx += 8
        if idx + size > len(data):
            break
        chunk = data[idx: idx + size]
        idx += size
        out.extend(chunk)
    if out:
        return out.decode("utf-8", errors="replace")
    # Fall back to treating the payload as a plain (non-multiplexed) stream.
    return data.decode("utf-8", errors="replace")


async def docker_container_logs(container_name: str, tail_lines: int = 200) -> str:
    filters = json.dumps({"name": [container_name]})
    resp = await _docker_get("/containers/json", params={"filters": filters})
    containers = resp.json() or []
    if not containers:
        log.info("No docker container found for name=%s", container_name)
        return ""
    container_id = containers[0].get("Id")
    if not container_id:
        return ""
    resp = await _docker_get(
        f"/containers/{container_id}/logs",
        params={"stdout": 1, "stderr": 1, "tail": tail_lines},
    )
    return _decode_docker_stream(resp.content)
llamaCpp.Wrapper.app/download_manager.py (new file, 141 lines)
@@ -0,0 +1,141 @@
import asyncio
import fnmatch
import logging
import os
import time
import uuid
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, Optional

import httpx

from app.config import AppConfig
from app.logging_utils import configure_logging
from app.restart import RestartPlan, trigger_restart

configure_logging()
log = logging.getLogger("download_manager")


@dataclass
class DownloadStatus:
    download_id: str
    url: str
    filename: str
    status: str
    bytes_total: Optional[int] = None
    bytes_downloaded: int = 0
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    error: Optional[str] = None


class DownloadManager:
    def __init__(self, cfg: AppConfig, broadcaster=None) -> None:
        self.cfg = cfg
        self._downloads: Dict[str, DownloadStatus] = {}
        self._tasks: Dict[str, asyncio.Task] = {}
        self._semaphore = asyncio.Semaphore(cfg.download_max_concurrent)
        self._broadcaster = broadcaster

    async def _emit(self, payload: dict) -> None:
        if self._broadcaster:
            await self._broadcaster.publish(payload)

    def list_downloads(self) -> Dict[str, dict]:
        return {k: asdict(v) for k, v in self._downloads.items()}

    def get(self, download_id: str) -> Optional[DownloadStatus]:
        return self._downloads.get(download_id)

    def _is_allowed(self, url: str) -> bool:
        if not self.cfg.download_allowlist:
            return True
        return any(fnmatch.fnmatch(url, pattern) for pattern in self.cfg.download_allowlist)

    async def start(self, url: str, filename: Optional[str] = None) -> DownloadStatus:
        if not self._is_allowed(url):
            raise ValueError("url not allowed by allowlist")
        if not filename:
            filename = os.path.basename(url.split("?")[0]) or f"model-{uuid.uuid4().hex}.gguf"
        log.info("Download requested url=%s filename=%s", url, filename)
        download_id = uuid.uuid4().hex
        status = DownloadStatus(download_id=download_id, url=url, filename=filename, status="queued")
        self._downloads[download_id] = status
        task = asyncio.create_task(self._run_download(status))
        self._tasks[download_id] = task
        await self._emit({"type": "download_status", "download": asdict(status)})
        return status

    async def cancel(self, download_id: str) -> bool:
        task = self._tasks.get(download_id)
        if task:
            task.cancel()
            status = self._downloads.get(download_id)
            if status:
                log.info("Download cancelled id=%s filename=%s", download_id, status.filename)
                await self._emit({"type": "download_status", "download": asdict(status)})
            return True
        return False

    async def _run_download(self, status: DownloadStatus) -> None:
        status.status = "downloading"
        base = Path(self.cfg.download_dir)
        base.mkdir(parents=True, exist_ok=True)
        tmp_path = base / f".{status.filename}.partial"
        final_path = base / status.filename
        last_emit = 0.0

        try:
            async with self._semaphore:
                async with httpx.AsyncClient(timeout=None, follow_redirects=True) as client:
                    async with client.stream("GET", status.url) as resp:
                        resp.raise_for_status()
                        length = resp.headers.get("content-length")
                        if length:
                            status.bytes_total = int(length)
                        with tmp_path.open("wb") as f:
                            async for chunk in resp.aiter_bytes():
                                if chunk:
                                    f.write(chunk)
                                    status.bytes_downloaded += len(chunk)
                                    now = time.time()
                                    if now - last_emit >= 1:
                                        last_emit = now
                                        await self._emit({"type": "download_progress", "download": asdict(status)})
            if tmp_path.exists():
                tmp_path.replace(final_path)
            status.status = "completed"
            status.finished_at = time.time()
            log.info("Download completed id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_completed", "download": asdict(status)})
            if self.cfg.reload_on_new_model:
                plan = RestartPlan(
                    method=self.cfg.restart_method,
                    command=self.cfg.restart_command,
                    url=self.cfg.restart_url,
                    allowed_container=self.cfg.allowed_container,
                )
                await trigger_restart(
                    plan,
                    payload={
                        "reason": "new_model",
                        "model_id": status.filename,
                        "llamacpp_args": self.cfg.llamacpp_args,
                        "llamacpp_extra_args": self.cfg.llamacpp_extra_args,
                    },
                )
        except asyncio.CancelledError:
            status.status = "cancelled"
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download cancelled id=%s filename=%s", status.download_id, status.filename)
            await self._emit({"type": "download_cancelled", "download": asdict(status)})
        except Exception as exc:
            status.status = "error"
            status.error = str(exc)
            if tmp_path.exists():
                tmp_path.unlink(missing_ok=True)
            log.info("Download error id=%s filename=%s error=%s", status.download_id, status.filename, exc)
            await self._emit({"type": "download_error", "download": asdict(status)})
llamaCpp.Wrapper.app/llamacpp_client.py (new file, 52 lines)
@@ -0,0 +1,52 @@
|
||||
import logging
|
||||
from typing import AsyncIterator, Dict, Optional
|
||||
|
||||
import httpx
|
||||
|
||||
|
||||
log = logging.getLogger("llamacpp_client")
|
||||
|
||||
|
||||
def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
|
||||
drop = {"host", "content-length"}
|
||||
return {k: v for k, v in headers.items() if k.lower() not in drop}
|
||||
|
||||
|
||||
async def proxy_json(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
payload: Optional[dict],
|
||||
timeout_s: float,
|
||||
) -> httpx.Response:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
return await client.request(method, path, headers=_filter_headers(headers), json=payload)
|
||||
|
||||
|
||||
async def proxy_raw(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
body: Optional[bytes],
|
||||
timeout_s: float,
|
||||
) -> httpx.Response:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
return await client.request(method, path, headers=_filter_headers(headers), content=body)
|
||||
|
||||
|
||||
async def proxy_stream(
|
||||
base_url: str,
|
||||
path: str,
|
||||
method: str,
|
||||
headers: Dict[str, str],
|
||||
payload: Optional[dict],
|
||||
timeout_s: float,
|
||||
) -> AsyncIterator[bytes]:
|
||||
async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
|
||||
async with client.stream(method, path, headers=_filter_headers(headers), json=payload) as resp:
|
||||
resp.raise_for_status()
|
||||
async for chunk in resp.aiter_bytes():
|
||||
if chunk:
|
||||
yield chunk
|
||||
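A usage sketch for the proxy helpers above, assuming llama.cpp listens on 127.0.0.1:8080 (the address is an assumption); `_filter_headers` strips `host` and `content-length` so httpx re-frames the forwarded request:

import asyncio

from app.llamacpp_client import proxy_json


async def main() -> None:
    resp = await proxy_json(
        base_url="http://127.0.0.1:8080",  # assumed llama.cpp address
        path="/v1/chat/completions",
        method="POST",
        headers={"host": "dropped-anyway", "content-type": "application/json"},
        payload={"model": "any", "messages": [{"role": "user", "content": "hi"}]},
        timeout_s=60.0,
    )
    print(resp.status_code, resp.json())


asyncio.run(main())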
13  llamaCpp.Wrapper.app/logging_utils.py  Normal file
@@ -0,0 +1,13 @@
import logging
import os


def configure_logging() -> None:
    if logging.getLogger().handlers:
        return
    level_name = os.getenv("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
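Because `configure_logging()` returns early once the root logger has handlers, it is safe to call from every module; a small sketch (the LOG_LEVEL override is purely illustrative):

import os

os.environ.setdefault("LOG_LEVEL", "DEBUG")  # illustrative override

from app.logging_utils import configure_logging

configure_logging()
configure_logging()  # second call is a no-op: handlers already installed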
45  llamaCpp.Wrapper.app/model_registry.py  Normal file
@@ -0,0 +1,45 @@
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class ModelInfo:
    model_id: str
    created: int
    size: int
    path: Path


def scan_models(model_dir: str) -> List[ModelInfo]:
    base = Path(model_dir)
    if not base.exists():
        return []
    models: List[ModelInfo] = []
    now = int(time.time())
    for entry in base.iterdir():
        if entry.name.endswith(".partial"):
            continue
        if entry.is_file():
            size = entry.stat().st_size
            models.append(ModelInfo(model_id=entry.name, created=now, size=size, path=entry))
        elif entry.is_dir():
            models.append(ModelInfo(model_id=entry.name, created=now, size=0, path=entry))
    models.sort(key=lambda m: m.model_id.lower())
    return models


def find_model(model_dir: str, model_id: str) -> Optional[ModelInfo]:
    for model in scan_models(model_dir):
        if model.model_id == model_id:
            return model
    return None


def resolve_model(model_dir: str, requested: str, aliases: Dict[str, str]) -> Optional[ModelInfo]:
    if not requested:
        return None
    if requested in aliases:
        requested = aliases[requested]
    return find_model(model_dir, requested)
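A short sketch of the registry functions, with a hypothetical model directory and alias map (both placeholders); note that `.partial` files written mid-download are skipped by `scan_models`:

from app.model_registry import resolve_model, scan_models

for model in scan_models("/mnt/models"):  # placeholder path
    print(model.model_id, model.size)

info = resolve_model("/mnt/models", "default", {"default": "qwen2.5-7b-q4_k_m.gguf"})  # placeholder alias
print(info.path if info else "not found")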
140  llamaCpp.Wrapper.app/openai_translate.py  Normal file
@@ -0,0 +1,140 @@
import time
import uuid
from typing import Any, Dict, List, Tuple


def _messages_from_input(input_value: Any) -> List[Dict[str, Any]]:
    if isinstance(input_value, str):
        return [{"role": "user", "content": input_value}]
    if isinstance(input_value, list):
        messages: List[Dict[str, Any]] = []
        for item in input_value:
            if isinstance(item, str):
                messages.append({"role": "user", "content": item})
            elif isinstance(item, dict):
                role = item.get("role") or "user"
                content = item.get("content") or item.get("text") or ""
                if item.get("type") == "input_image":
                    content = [{"type": "image_url", "image_url": {"url": item.get("image_url", "")}}]
                messages.append({"role": role, "content": content})
        return messages
    return [{"role": "user", "content": str(input_value)}]


def _normalize_tools(tools: Any) -> Any:
    if not isinstance(tools, list):
        return tools
    normalized = []
    for tool in tools:
        if not isinstance(tool, dict):
            normalized.append(tool)
            continue
        if "function" in tool:
            normalized.append(tool)
            continue
        if tool.get("type") == "function" and ("name" in tool or "parameters" in tool or "description" in tool):
            function = {
                "name": tool.get("name"),
                "parameters": tool.get("parameters"),
                "description": tool.get("description"),
            }
            function = {k: v for k, v in function.items() if v is not None}
            normalized.append({"type": "function", "function": function})
            continue
        normalized.append(tool)
    return normalized


def _normalize_tool_choice(tool_choice: Any) -> Any:
    if not isinstance(tool_choice, dict):
        return tool_choice
    if "function" in tool_choice:
        return tool_choice
    if tool_choice.get("type") == "function" and "name" in tool_choice:
        return {"type": "function", "function": {"name": tool_choice.get("name")}}
    return tool_choice


def normalize_chat_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    if "return_format" in payload and "response_format" not in payload:
        if payload["return_format"] == "json":
            payload["response_format"] = {"type": "json_object"}
    if "functions" in payload and "tools" not in payload:
        functions = payload.get("functions")
        if isinstance(functions, list):
            tools = []
            for func in functions:
                if isinstance(func, dict):
                    tools.append({"type": "function", "function": func})
            if tools:
                payload["tools"] = tools
        payload.pop("functions", None)
    if "tools" in payload:
        payload["tools"] = _normalize_tools(payload.get("tools"))
    if "tool_choice" in payload:
        payload["tool_choice"] = _normalize_tool_choice(payload.get("tool_choice"))
    return payload


def responses_to_chat_payload(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    model = payload.get("model") or "unknown"
    messages = _messages_from_input(payload.get("input", ""))

    chat_payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
    }

    passthrough_keys = [
        "temperature",
        "top_p",
        "max_output_tokens",
        "stream",
        "tools",
        "tool_choice",
        "response_format",
        "return_format",
        "frequency_penalty",
        "presence_penalty",
        "seed",
        "stop",
    ]

    for key in passthrough_keys:
        if key in payload:
            if key == "max_output_tokens":
                chat_payload["max_tokens"] = payload[key]
            elif key == "return_format" and payload[key] == "json":
                chat_payload["response_format"] = {"type": "json_object"}
            else:
                chat_payload[key] = payload[key]

    return normalize_chat_payload(chat_payload), model


def chat_to_responses(chat: Dict[str, Any], model: str) -> Dict[str, Any]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    content = ""
    if chat.get("choices"):
        choice = chat["choices"][0]
        message = choice.get("message") or {}
        content = message.get("content") or ""

    return {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": f"msg_{uuid.uuid4().hex}",
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": content}
                ],
            }
        ],
        "usage": chat.get("usage", {}),
    }
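A round-trip sketch of the translation layer above: a Responses-style request becomes a Chat Completions request, and a Chat Completions reply is wrapped back into the Responses shape (the payloads are made up for illustration):

from app.openai_translate import chat_to_responses, responses_to_chat_payload

chat_payload, model = responses_to_chat_payload(
    {"model": "local-model", "input": "Say hi", "max_output_tokens": 16}
)
assert chat_payload["max_tokens"] == 16
assert chat_payload["messages"] == [{"role": "user", "content": "Say hi"}]

fake_chat = {"choices": [{"message": {"content": "hi"}}], "usage": {}}
print(chat_to_responses(fake_chat, model)["output"][0]["content"][0]["text"])  # -> hi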
51  llamaCpp.Wrapper.app/restart.py  Normal file
@@ -0,0 +1,51 @@
import asyncio
import logging
import shlex
from dataclasses import dataclass
from typing import Optional

import httpx


log = logging.getLogger("llamacpp_restart")


@dataclass
class RestartPlan:
    method: str
    command: Optional[str]
    url: Optional[str]
    allowed_container: Optional[str] = None


async def trigger_restart(plan: RestartPlan, payload: Optional[dict] = None) -> None:
    if plan.method == "none":
        log.warning("Restart requested but restart method is none")
        return
    if plan.method == "http":
        if not plan.url:
            raise RuntimeError("restart url is required for http method")
        async with httpx.AsyncClient(timeout=60) as client:
            resp = await client.post(plan.url, json=payload or {})
            resp.raise_for_status()
        return
    if plan.method == "docker":
        if not plan.command:
            raise RuntimeError("restart command must include container id or name for docker method")
        if plan.allowed_container and plan.command != plan.allowed_container:
            raise RuntimeError("docker restart command not allowed for non-target container")
        async with httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(uds="/var/run/docker.sock"), timeout=30) as client:
            resp = await client.post(f"http://docker/containers/{plan.command}/restart")
            resp.raise_for_status()
        return
    if plan.method == "shell":
        if not plan.command:
            raise RuntimeError("restart command is required for shell method")
        cmd = plan.command
        args = shlex.split(cmd)
        proc = await asyncio.create_subprocess_exec(*args)
        code = await proc.wait()
        if code != 0:
            raise RuntimeError(f"restart command failed with exit code {code}")
        return
    raise RuntimeError(f"unknown restart method {plan.method}")
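A sketch of triggering the "http" restart path; the URL is a placeholder, and the call raises unless something is actually listening there:

import asyncio

from app.restart import RestartPlan, trigger_restart

plan = RestartPlan(method="http", command=None, url="http://127.0.0.1:9000/restart")  # placeholder URL
asyncio.run(trigger_restart(plan, payload={"reason": "manual"}))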
35  llamaCpp.Wrapper.app/run.py  Normal file
@@ -0,0 +1,35 @@
import os
import signal
import subprocess
import sys

from app.config import load_config


def main() -> None:
    cfg = load_config()
    python = sys.executable

    api_cmd = [python, "-m", "uvicorn", "app.api_app:create_api_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.api_port)]
    ui_cmd = [python, "-m", "uvicorn", "app.ui_app:create_ui_app", "--factory", "--host", "0.0.0.0", "--port", str(cfg.ui_port)]

    procs = [subprocess.Popen(api_cmd)]
    if cfg.ui_port != cfg.api_port:
        procs.append(subprocess.Popen(ui_cmd))

    def shutdown(_sig, _frame):
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait(timeout=10)
        sys.exit(0)

    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)

    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    main()
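The launcher spawns uvicorn with `--factory`, so `create_api_app`/`create_ui_app` are called at startup rather than imported as module-level apps. A single-process equivalent for local debugging, assuming the same module path (the port is an assumption):

import uvicorn

uvicorn.run("app.api_app:create_api_app", factory=True, host="0.0.0.0", port=8000)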
102  llamaCpp.Wrapper.app/stream_transform.py  Normal file
@@ -0,0 +1,102 @@
import json
import time
import uuid
from typing import Any, AsyncIterator, Dict

import httpx


def _sse_event(event: str, data: Dict[str, Any]) -> bytes:
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n".encode("utf-8")


def _filter_headers(headers: Dict[str, str]) -> Dict[str, str]:
    drop = {"host", "content-length"}
    return {k: v for k, v in headers.items() if k.lower() not in drop}


async def stream_chat_to_responses(
    base_url: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
    timeout_s: float,
) -> AsyncIterator[bytes]:
    response_id = f"resp_{uuid.uuid4().hex}"
    created = int(time.time())
    model = payload.get("model") or "unknown"
    msg_id = f"msg_{uuid.uuid4().hex}"
    output_text = ""

    response_stub = {
        "id": response_id,
        "object": "response",
        "created": created,
        "model": model,
        "output": [
            {
                "id": msg_id,
                "type": "message",
                "role": "assistant",
                "content": [
                    {"type": "output_text", "text": ""}
                ],
            }
        ],
    }

    yield _sse_event("response.created", {"type": "response.created", "response": response_stub})

    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        async with client.stream(
            "POST",
            "/v1/chat/completions",
            headers=_filter_headers(headers),
            json=payload,
        ) as resp:
            resp.raise_for_status()
            buffer = ""
            async for chunk in resp.aiter_text():
                buffer += chunk
                while "\n\n" in buffer:
                    block, buffer = buffer.split("\n\n", 1)
                    lines = [line for line in block.splitlines() if line.startswith("data:")]
                    if not lines:
                        continue
                    data_str = "\n".join(line[len("data:"):].strip() for line in lines)
                    if data_str == "[DONE]":
                        continue
                    try:
                        data = json.loads(data_str)
                    except json.JSONDecodeError:
                        continue
                    choices = data.get("choices") or []
                    if not choices:
                        continue
                    delta = choices[0].get("delta") or {}
                    text_delta = delta.get("content")
                    if text_delta:
                        output_text += text_delta
                        yield _sse_event(
                            "response.output_text.delta",
                            {
                                "type": "response.output_text.delta",
                                "delta": text_delta,
                                "item_id": msg_id,
                                "output_index": 0,
                                "content_index": 0,
                            },
                        )

    yield _sse_event(
        "response.output_text.done",
        {
            "type": "response.output_text.done",
            "text": output_text,
            "item_id": msg_id,
            "output_index": 0,
            "content_index": 0,
        },
    )

    response_stub["output"][0]["content"][0]["text"] = output_text
    yield _sse_event("response.completed", {"type": "response.completed", "response": response_stub})
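A consumer sketch for the SSE transformer above; the base_url is an assumption, and the payload should request streaming so the upstream actually emits deltas:

import asyncio

from app.stream_transform import stream_chat_to_responses


async def main() -> None:
    async for event in stream_chat_to_responses(
        base_url="http://127.0.0.1:8080",  # assumed llama.cpp address
        headers={},
        payload={"model": "any", "messages": [{"role": "user", "content": "hi"}], "stream": True},
        timeout_s=120.0,
    ):
        print(event.decode("utf-8"), end="")


asyncio.run(main())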
313  llamaCpp.Wrapper.app/truenas_middleware.py  Normal file
@@ -0,0 +1,313 @@
import json
import logging
import shlex
import ssl
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional

import websockets
import yaml


log = logging.getLogger("truenas_middleware")


@dataclass
class TrueNASConfig:
    ws_url: str
    api_key: str
    api_user: Optional[str]
    app_name: str
    verify_ssl: bool = False


def _parse_compose(raw: Any) -> Dict[str, Any]:
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        text = raw.strip()
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            return yaml.safe_load(text)
    raise ValueError("Unsupported compose payload")


def _command_to_list(command: Any) -> list:
    if isinstance(command, list):
        return command
    if isinstance(command, str):
        return shlex.split(command)
    return []


def _extract_command(config: Dict[str, Any], service_name: str = "llamacpp") -> list:
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        svc = services.get(service_name) or {}
        return _command_to_list(svc.get("command"))
    return _command_to_list(config.get("command"))


def _model_id_from_command(cmd: list) -> Optional[str]:
    if "--model" in cmd:
        idx = cmd.index("--model")
        if idx + 1 < len(cmd):
            return Path(cmd[idx + 1]).name
    return None


def _set_arg(cmd: list, flag: str, value: Optional[str]) -> list:
    if value is None:
        return cmd
    if flag in cmd:
        idx = cmd.index(flag)
        if idx + 1 < len(cmd):
            cmd[idx + 1] = value
        else:
            cmd.append(value)
        return cmd
    cmd.extend([flag, value])
    return cmd


def _merge_args(cmd: list, args: Dict[str, str]) -> list:
    flag_map = {
        "device": "--device",
        "tensor_split": "--tensor-split",
        "split_mode": "--split-mode",
        "n_gpu_layers": "--n-gpu-layers",
        "ctx_size": "--ctx-size",
        "batch_size": "--batch-size",
        "ubatch_size": "--ubatch-size",
        "cache_type_k": "--cache-type-k",
        "cache_type_v": "--cache-type-v",
        "flash_attn": "--flash-attn",
    }
    for key, value in args.items():
        flag = flag_map.get(key)
        if flag:
            if flag in cmd:
                continue
            _set_arg(cmd, flag, value)
    return cmd


def _merge_extra_args(cmd: list, extra: str) -> list:
    if not extra:
        return cmd
    extra_list = shlex.split(extra)
    filtered: list[str] = []
    skip_next = False
    for item in extra_list:
        if skip_next:
            skip_next = False
            continue
        if item in {"--device", "-dev"}:
            log.warning("Dropping --device from extra args to avoid llama.cpp device errors.")
            skip_next = True
            continue
        filtered.append(item)
    for flag in filtered:
        if flag not in cmd:
            cmd.append(flag)
    return cmd


def _update_model_command(command: Any, model_path: str, args: Dict[str, str], extra: str) -> list:
    cmd = _command_to_list(command)
    if "--device" in cmd:
        idx = cmd.index("--device")
        del cmd[idx: idx + 2]
    cmd = _set_arg(cmd, "--model", model_path)
    cmd = _merge_args(cmd, args)
    cmd = _merge_extra_args(cmd, extra)
    return cmd


def _replace_flags(cmd: list, flags: Dict[str, Optional[str]], extra: str) -> list:
    result = list(cmd)
    for flag in flags.keys():
        while flag in result:
            idx = result.index(flag)
            del result[idx: idx + 2]
    if "--device" in result:
        idx = result.index("--device")
        del result[idx: idx + 2]
    for flag, value in flags.items():
        if value is not None and value != "":
            result = _set_arg(result, flag, value)
    result = _merge_extra_args(result, extra)
    return result


async def get_app_config(cfg: TrueNASConfig) -> Dict[str, Any]:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    return config


async def get_app_command(cfg: TrueNASConfig, service_name: str = "llamacpp") -> list:
    config = await get_app_config(cfg)
    return _extract_command(config, service_name=service_name)


async def get_active_model_id(cfg: TrueNASConfig, service_name: str = "llamacpp") -> str:
    config = await get_app_config(cfg)
    cmd = _extract_command(config, service_name=service_name)
    return _model_id_from_command(cmd) or ""


async def get_app_logs(
    cfg: TrueNASConfig,
    tail_lines: int = 200,
    service_name: str = "llamacpp",
) -> str:
    tail_payloads = [
        {"tail": tail_lines},
        {"tail_lines": tail_lines},
        {"tail": str(tail_lines)},
    ]
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.container_logs", [cfg.app_name, service_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.container_logs failed (%s): %s", payload, exc)
    for payload in tail_payloads:
        try:
            result = await _rpc_call(cfg, "app.logs", [cfg.app_name, payload])
            if isinstance(result, str):
                return result
        except Exception as exc:
            log.debug("app.logs failed (%s): %s", payload, exc)
    return ""


async def update_app_command(
    cfg: TrueNASConfig,
    command: list,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        svc["command"] = command
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    config["command"] = command
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def update_command_flags(
    cfg: TrueNASConfig,
    flags: Dict[str, Optional[str]],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")
    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _replace_flags(_command_to_list(cmd), flags, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        return
    cmd = _replace_flags(_command_to_list(config.get("command")), flags, extra)
    config["command"] = cmd
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])


async def _rpc_call(cfg: TrueNASConfig, method: str, params: Optional[list] = None) -> Any:
    ssl_ctx = None
    if cfg.ws_url.startswith("wss://") and not cfg.verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(cfg.ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(
            json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [cfg.api_key]})
        )
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            if not cfg.api_user:
                raise RuntimeError("API key rejected and TRUENAS_API_USER not set")
            await ws.send(
                json.dumps(
                    {
                        "id": 2,
                        "msg": "method",
                        "method": "auth.login_ex",
                        "params": [
                            {
                                "mechanism": "API_KEY_PLAIN",
                                "username": cfg.api_user,
                                "api_key": cfg.api_key,
                            }
                        ],
                    }
                )
            )
            auth_ex = json.loads(await ws.recv())
            if auth_ex.get("result", {}).get("response_type") != "SUCCESS":
                raise RuntimeError("API key authentication failed")

        req_id = 3
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def switch_model(
    cfg: TrueNASConfig,
    model_path: str,
    args: Dict[str, str],
    extra: str,
    service_name: str = "llamacpp",
) -> None:
    config = await _rpc_call(cfg, "app.config", [cfg.app_name])
    if not isinstance(config, dict):
        raise RuntimeError("app.config returned unsupported payload")

    if config.get("custom_compose_config") or config.get("custom_compose_config_string"):
        compose = _parse_compose(config.get("custom_compose_config") or config.get("custom_compose_config_string") or {})
        services = compose.get("services") or {}
        if service_name not in services:
            raise RuntimeError(f"service {service_name} not found in compose")
        svc = services[service_name]
        cmd = svc.get("command")
        svc["command"] = _update_model_command(cmd, model_path, args, extra)
        await _rpc_call(cfg, "app.update", [cfg.app_name, {"custom_compose_config": compose}])
        log.info("Requested model switch to %s via TrueNAS middleware (custom app)", model_path)
        return

    cmd = config.get("command")
    config["command"] = _update_model_command(cmd, model_path, args, extra)
    await _rpc_call(cfg, "app.update", [cfg.app_name, {"values": config}])
    log.info("Requested model switch to %s via TrueNAS middleware (catalog app)", model_path)
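A read-only sketch against the middleware client above; the websocket URL, app name, and key are placeholders for a real TrueNAS instance:

import asyncio

from app.truenas_middleware import TrueNASConfig, get_active_model_id

cfg = TrueNASConfig(
    ws_url="ws://192.168.1.2/websocket",  # placeholder endpoint
    api_key="CHANGE-ME",                  # placeholder key
    api_user=None,
    app_name="llamacpp",
    verify_ssl=False,
)
print(asyncio.run(get_active_model_id(cfg)))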
357  llamaCpp.Wrapper.app/ui_app.py  Normal file
@@ -0,0 +1,357 @@
import asyncio
import json
import logging
from pathlib import Path
from typing import Any, Dict, Optional

import httpx
from fastapi import FastAPI, HTTPException, Request
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse

from app.config import load_config
from app.docker_logs import docker_container_logs
from app.download_manager import DownloadManager
from app.logging_utils import configure_logging
from app.model_registry import scan_models
from app.truenas_middleware import (
    TrueNASConfig,
    get_active_model_id,
    get_app_command,
    get_app_logs,
    switch_model,
    update_command_flags,
)
from app.warmup import resolve_warmup_prompt, run_warmup_with_retry


configure_logging()
log = logging.getLogger("ui_app")


class EventBroadcaster:
    def __init__(self) -> None:
        self._queues: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._queues.discard(queue)

    async def publish(self, payload: dict) -> None:
        for queue in list(self._queues):
            queue.put_nowait(payload)


def _static_path() -> Path:
    return Path(__file__).parent / "ui_static"


async def _fetch_active_model(truenas_cfg: Optional[TrueNASConfig]) -> Optional[str]:
    if not truenas_cfg:
        return None
    try:
        return await get_active_model_id(truenas_cfg)
    except Exception as exc:
        log.warning("Failed to read active model from TrueNAS config: %s", exc)
        return None


def _model_list(model_dir: str, active_model: Optional[str]) -> Dict[str, Any]:
    data = []
    for model in scan_models(model_dir):
        data.append({
            "id": model.model_id,
            "size": model.size,
            "active": model.model_id == active_model,
        })
    return {"models": data, "active_model": active_model}


def create_ui_app() -> FastAPI:
    cfg = load_config()
    app = FastAPI(title="llama.cpp Model Manager", version="0.1.0")
    broadcaster = EventBroadcaster()
    manager = DownloadManager(cfg, broadcaster=broadcaster)
    truenas_cfg = None
    if cfg.truenas_ws_url and cfg.truenas_api_key:
        truenas_cfg = TrueNASConfig(
            ws_url=cfg.truenas_ws_url,
            api_key=cfg.truenas_api_key,
            api_user=cfg.truenas_api_user,
            app_name=cfg.truenas_app_name,
            verify_ssl=cfg.truenas_verify_ssl,
        )

    async def monitor_active_model() -> None:
        last_model = None
        while True:
            current = await _fetch_active_model(truenas_cfg)
            if current and current != last_model:
                last_model = current
                await broadcaster.publish({"type": "active_model", "model_id": current})
            await asyncio.sleep(3)

    async def _fetch_logs() -> str:
        logs = ""
        if truenas_cfg:
            try:
                logs = await asyncio.wait_for(get_app_logs(truenas_cfg, tail_lines=200), timeout=5)
            except asyncio.TimeoutError:
                logs = ""
        if not logs and cfg.llamacpp_container_name:
            try:
                logs = await asyncio.wait_for(
                    docker_container_logs(cfg.llamacpp_container_name, tail_lines=200),
                    timeout=10,
                )
            except asyncio.TimeoutError:
                logs = ""
        return logs

    @app.on_event("startup")
    async def start_tasks() -> None:
        asyncio.create_task(monitor_active_model())

    @app.middleware("http")
    async def log_requests(request: Request, call_next):
        log.info("UI request %s %s", request.method, request.url.path)
        return await call_next(request)

    @app.get("/health")
    async def health() -> Dict[str, Any]:
        return {"status": "ok", "model_dir": cfg.model_dir}

    @app.get("/")
    async def index() -> FileResponse:
        return FileResponse(_static_path() / "index.html")

    @app.get("/ui/styles.css")
    async def styles() -> FileResponse:
        return FileResponse(_static_path() / "styles.css")

    @app.get("/ui/app.js")
    async def app_js() -> FileResponse:
        return FileResponse(_static_path() / "app.js")

    @app.get("/ui/api/models")
    async def list_models() -> JSONResponse:
        active_model = await _fetch_active_model(truenas_cfg)
        log.info("UI list models active=%s", active_model)
        return JSONResponse(_model_list(cfg.model_dir, active_model))

    @app.get("/ui/api/downloads")
    async def list_downloads() -> JSONResponse:
        log.info("UI list downloads")
        return JSONResponse({"downloads": manager.list_downloads()})

    @app.post("/ui/api/downloads")
    async def start_download(request: Request) -> JSONResponse:
        payload = await request.json()
        url = payload.get("url")
        filename = payload.get("filename")
        log.info("UI download start url=%s filename=%s", url, filename)
        if not url:
            raise HTTPException(status_code=400, detail="url is required")
        try:
            status = await manager.start(url, filename=filename)
        except ValueError as exc:
            raise HTTPException(status_code=403, detail=str(exc))
        return JSONResponse({"download": status.__dict__})

    @app.delete("/ui/api/downloads/{download_id}")
    async def cancel_download(download_id: str) -> JSONResponse:
        log.info("UI download cancel id=%s", download_id)
        ok = await manager.cancel(download_id)
        if not ok:
            raise HTTPException(status_code=404, detail="download not found")
        return JSONResponse({"status": "cancelled"})

    @app.get("/ui/api/events")
    async def events() -> StreamingResponse:
        queue = broadcaster.connect()

        async def event_stream():
            try:
                while True:
                    payload = await queue.get()
                    data = json.dumps(payload, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
            finally:
                broadcaster.disconnect(queue)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    @app.post("/ui/api/switch-model")
    async def switch_model_ui(request: Request) -> JSONResponse:
        payload = await request.json()
        model_id = payload.get("model_id")
        warmup_override = payload.get("warmup_prompt") or ""
        if not model_id:
            raise HTTPException(status_code=400, detail="model_id is required")

        model_path = Path(cfg.model_dir) / model_id
        if not model_path.exists():
            raise HTTPException(status_code=404, detail="model not found")

        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")

        try:
            container_model_path = str(Path(cfg.model_container_dir) / model_id)
            await switch_model(truenas_cfg, container_model_path, cfg.llamacpp_args, cfg.llamacpp_extra_args)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch failed: {exc}")

        warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
        log.info("UI warmup after switch model=%s prompt_len=%s", model_id, len(warmup_prompt))
        try:
            await run_warmup_with_retry(cfg.base_url, model_id, warmup_prompt, timeout_s=cfg.switch_timeout_s)
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch warmup failed: {exc}")

        try:
            async with httpx.AsyncClient(base_url=cfg.base_url, timeout=120) as client:
                resp = await client.post(
                    "/v1/chat/completions",
                    json={
                        "model": model_id,
                        "messages": [{"role": "user", "content": "ok"}],
                        "max_tokens": 4,
                        "temperature": 0,
                    },
                )
                resp.raise_for_status()
        except Exception as exc:
            await broadcaster.publish({"type": "model_switch_failed", "model_id": model_id, "error": str(exc)})
            raise HTTPException(status_code=500, detail=f"model switch verification failed: {exc}")

        await broadcaster.publish({"type": "model_switched", "model_id": model_id})
        log.info("UI model switched model=%s", model_id)
        return JSONResponse({"status": "ok", "model_id": model_id})

    @app.get("/ui/api/llamacpp-config")
    async def get_llamacpp_config() -> JSONResponse:
        active_model = await _fetch_active_model(truenas_cfg)
        log.info("UI get llama.cpp config active=%s", active_model)
        params: Dict[str, Optional[str]] = {}
        command_raw = []
        if truenas_cfg:
            command_raw = await get_app_command(truenas_cfg)
        flag_map = {
            "--ctx-size": "ctx_size",
            "--n-gpu-layers": "n_gpu_layers",
            "--tensor-split": "tensor_split",
            "--split-mode": "split_mode",
            "--cache-type-k": "cache_type_k",
            "--cache-type-v": "cache_type_v",
            "--flash-attn": "flash_attn",
            "--temp": "temp",
            "--top-k": "top_k",
            "--top-p": "top_p",
            "--repeat-penalty": "repeat_penalty",
            "--repeat-last-n": "repeat_last_n",
            "--frequency-penalty": "frequency_penalty",
            "--presence-penalty": "presence_penalty",
        }
        if isinstance(command_raw, list):
            for flag, key in flag_map.items():
                if flag in command_raw:
                    idx = command_raw.index(flag)
                    if idx + 1 < len(command_raw):
                        params[key] = command_raw[idx + 1]
        known_flags = set(flag_map.keys()) | {"--model"}
        extra = []
        if isinstance(command_raw, list):
            skip_next = False
            for item in command_raw:
                if skip_next:
                    skip_next = False
                    continue
                if item in known_flags:
                    skip_next = True
                    continue
                extra.append(item)
        return JSONResponse(
            {
                "active_model": active_model,
                "params": params,
                "extra_args": " ".join(extra),
            }
        )

    @app.post("/ui/api/llamacpp-config")
    async def update_llamacpp_config(request: Request) -> JSONResponse:
        payload = await request.json()
        params = payload.get("params") or {}
        extra_args = payload.get("extra_args") or ""
        warmup_override = payload.get("warmup_prompt") or ""
        log.info("UI save llama.cpp config params=%s extra_args=%s", params, extra_args)
        if not truenas_cfg:
            raise HTTPException(status_code=500, detail="TrueNAS credentials not configured")
        flags = {
            "--ctx-size": params.get("ctx_size"),
            "--n-gpu-layers": params.get("n_gpu_layers"),
            "--tensor-split": params.get("tensor_split"),
            "--split-mode": params.get("split_mode"),
            "--cache-type-k": params.get("cache_type_k"),
            "--cache-type-v": params.get("cache_type_v"),
            "--flash-attn": params.get("flash_attn"),
            "--temp": params.get("temp"),
            "--top-k": params.get("top_k"),
            "--top-p": params.get("top_p"),
            "--repeat-penalty": params.get("repeat_penalty"),
            "--repeat-last-n": params.get("repeat_last_n"),
            "--frequency-penalty": params.get("frequency_penalty"),
            "--presence-penalty": params.get("presence_penalty"),
        }
        try:
            await update_command_flags(truenas_cfg, flags, extra_args)
        except Exception as exc:
            log.exception("UI update llama.cpp config failed")
            raise HTTPException(status_code=500, detail=f"config update failed: {exc}")
        active_model = await _fetch_active_model(truenas_cfg)
        if active_model:
            warmup_prompt = resolve_warmup_prompt(warmup_override, cfg.warmup_prompt_path)
            log.info("UI warmup after config update model=%s prompt_len=%s", active_model, len(warmup_prompt))
            try:
                await run_warmup_with_retry(cfg.base_url, active_model, warmup_prompt, timeout_s=cfg.switch_timeout_s)
            except Exception as exc:
                raise HTTPException(status_code=500, detail=f"config warmup failed: {exc}")
        await broadcaster.publish({"type": "llamacpp_config_updated"})
        return JSONResponse({"status": "ok"})

    @app.get("/ui/api/llamacpp-logs")
    async def get_llamacpp_logs() -> JSONResponse:
        logs = await _fetch_logs()
        return JSONResponse({"logs": logs})

    @app.get("/ui/api/llamacpp-logs/stream")
    async def stream_llamacpp_logs() -> StreamingResponse:
        async def event_stream():
            last_lines: list[str] = []
            while True:
                logs = await _fetch_logs()
                lines = logs.splitlines()
                if last_lines:
                    last_tail = last_lines[-1]
                    idx = -1
                    for i in range(len(lines) - 1, -1, -1):
                        if lines[i] == last_tail:
                            idx = i
                            break
                    if idx >= 0:
                        lines = lines[idx + 1 :]
                if lines:
                    last_lines = (last_lines + lines)[-200:]
                    data = json.dumps({"type": "logs", "lines": lines}, separators=(",", ":"))
                    yield f"data: {data}\n\n".encode("utf-8")
                await asyncio.sleep(2)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    return app
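A quick client sketch against the UI API above, assuming the UI app is served on port 8081 (the port is an assumption):

import asyncio

import httpx


async def main() -> None:
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8081") as client:
        print((await client.get("/ui/api/models")).json())
        print((await client.get("/ui/api/llamacpp-config")).json())


asyncio.run(main())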
306  llamaCpp.Wrapper.app/ui_static/app.js  Normal file
@@ -0,0 +1,306 @@
const modelsList = document.getElementById("models-list");
const downloadsList = document.getElementById("downloads-list");
const refreshModels = document.getElementById("refresh-models");
const refreshDownloads = document.getElementById("refresh-downloads");
const form = document.getElementById("download-form");
const errorEl = document.getElementById("download-error");
const statusEl = document.getElementById("switch-status");
const configStatusEl = document.getElementById("config-status");
const configForm = document.getElementById("config-form");
const refreshConfig = document.getElementById("refresh-config");
const warmupPromptEl = document.getElementById("warmup-prompt");
const refreshLogs = document.getElementById("refresh-logs");
const logsOutput = document.getElementById("logs-output");
const logsStatus = document.getElementById("logs-status");
const themeToggle = document.getElementById("theme-toggle");

const applyTheme = (theme) => {
  document.documentElement.setAttribute("data-theme", theme);
  themeToggle.textContent = theme === "dark" ? "Light" : "Dark";
  themeToggle.setAttribute("aria-pressed", theme === "dark" ? "true" : "false");
};

const savedTheme = localStorage.getItem("theme") || "light";
applyTheme(savedTheme);
themeToggle.addEventListener("click", () => {
  const next = document.documentElement.getAttribute("data-theme") === "dark" ? "light" : "dark";
  localStorage.setItem("theme", next);
  applyTheme(next);
});

const cfgFields = {
  ctx_size: document.getElementById("cfg-ctx-size"),
  n_gpu_layers: document.getElementById("cfg-n-gpu-layers"),
  tensor_split: document.getElementById("cfg-tensor-split"),
  split_mode: document.getElementById("cfg-split-mode"),
  cache_type_k: document.getElementById("cfg-cache-type-k"),
  cache_type_v: document.getElementById("cfg-cache-type-v"),
  flash_attn: document.getElementById("cfg-flash-attn"),
  temp: document.getElementById("cfg-temp"),
  top_k: document.getElementById("cfg-top-k"),
  top_p: document.getElementById("cfg-top-p"),
  repeat_penalty: document.getElementById("cfg-repeat-penalty"),
  repeat_last_n: document.getElementById("cfg-repeat-last-n"),
  frequency_penalty: document.getElementById("cfg-frequency-penalty"),
  presence_penalty: document.getElementById("cfg-presence-penalty"),
};
const extraArgsEl = document.getElementById("cfg-extra-args");

const fmtBytes = (bytes) => {
  if (!bytes && bytes !== 0) return "-";
  const units = ["B", "KB", "MB", "GB", "TB"];
  let idx = 0;
  let value = bytes;
  while (value >= 1024 && idx < units.length - 1) {
    value /= 1024;
    idx += 1;
  }
  return `${value.toFixed(1)} ${units[idx]}`;
};

const setStatus = (message, type) => {
  statusEl.textContent = message || "";
  statusEl.className = "status";
  if (type) {
    statusEl.classList.add(type);
  }
};

const setConfigStatus = (message, type) => {
  configStatusEl.textContent = message || "";
  configStatusEl.className = "status";
  if (type) {
    configStatusEl.classList.add(type);
  }
};

async function loadModels() {
  const res = await fetch("/ui/api/models");
  const data = await res.json();
  modelsList.innerHTML = "";
  const activeModel = data.active_model;
  data.models.forEach((model) => {
    const li = document.createElement("li");
    if (model.active) {
      li.classList.add("active");
    }
    const row = document.createElement("div");
    row.className = "model-row";

    const name = document.createElement("span");
    name.textContent = `${model.id} (${fmtBytes(model.size)})`;

    const actions = document.createElement("div");
    if (model.active) {
      const badge = document.createElement("span");
      badge.className = "badge";
      badge.textContent = "Active";
      actions.appendChild(badge);
    } else {
      const button = document.createElement("button");
      button.className = "ghost";
      button.textContent = "Switch";
      button.onclick = async () => {
        setStatus(`Switching to ${model.id}...`);
        const warmupPrompt = warmupPromptEl.value.trim();
        const res = await fetch("/ui/api/switch-model", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ model_id: model.id, warmup_prompt: warmupPrompt }),
        });
        const payload = await res.json();
        if (!res.ok) {
          setStatus(payload.detail || "Switch failed.", "error");
          return;
        }
        warmupPromptEl.value = "";
        setStatus(`Active model: ${model.id}`, "ok");
        await loadModels();
      };
      actions.appendChild(button);
    }

    row.appendChild(name);
    row.appendChild(actions);
    li.appendChild(row);
    modelsList.appendChild(li);
  });
  if (activeModel) {
    setStatus(`Active model: ${activeModel}`, "ok");
  }
}

async function loadDownloads() {
  const res = await fetch("/ui/api/downloads");
  const data = await res.json();
  downloadsList.innerHTML = "";
  const entries = Object.values(data.downloads || {});
  if (!entries.length) {
    downloadsList.innerHTML = "<p>No active downloads.</p>";
    return;
  }
  entries.forEach((download) => {
    const card = document.createElement("div");
    card.className = "download-card";

    const title = document.createElement("strong");
    title.textContent = download.filename;

    const meta = document.createElement("div");
    const percent = download.bytes_total
      ? Math.round((download.bytes_downloaded / download.bytes_total) * 100)
      : 0;
    meta.textContent = `${download.status} · ${fmtBytes(download.bytes_downloaded)} / ${fmtBytes(download.bytes_total)}`;

    const progress = document.createElement("div");
    progress.className = "progress";
    const bar = document.createElement("span");
    bar.style.width = `${Math.min(percent, 100)}%`;
    progress.appendChild(bar);

    const actions = document.createElement("div");
    if (download.status === "downloading" || download.status === "queued") {
      const cancel = document.createElement("button");
      cancel.className = "ghost";
      cancel.textContent = "Cancel";
      cancel.onclick = async () => {
        await fetch(`/ui/api/downloads/${download.download_id}`, { method: "DELETE" });
        await loadDownloads();
      };
      actions.appendChild(cancel);
    }

    card.appendChild(title);
    card.appendChild(meta);
    card.appendChild(progress);
    card.appendChild(actions);
    downloadsList.appendChild(card);
  });
}

async function loadConfig() {
  const res = await fetch("/ui/api/llamacpp-config");
  const data = await res.json();
  Object.entries(cfgFields).forEach(([key, el]) => {
    el.value = data.params?.[key] || "";
  });
  extraArgsEl.value = data.extra_args || "";
  if (data.active_model) {
    setConfigStatus(`Active model: ${data.active_model}`, "ok");
  }
}

async function loadLogs() {
  const res = await fetch("/ui/api/llamacpp-logs");
  if (!res.ok) {
    logsStatus.textContent = "Unavailable";
    return;
  }
  const data = await res.json();
  logsOutput.textContent = data.logs || "";
  logsStatus.textContent = data.logs ? "Snapshot" : "Empty";
}

form.addEventListener("submit", async (event) => {
  event.preventDefault();
  errorEl.textContent = "";
  const url = document.getElementById("model-url").value.trim();
  const filename = document.getElementById("model-filename").value.trim();
  if (!url) {
    errorEl.textContent = "URL is required.";
    return;
  }
  const payload = { url };
  if (filename) payload.filename = filename;
  const res = await fetch("/ui/api/downloads", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  if (!res.ok) {
    const err = await res.json();
    errorEl.textContent = err.detail || "Failed to start download.";
    return;
  }
  document.getElementById("model-url").value = "";
  document.getElementById("model-filename").value = "";
  await loadDownloads();
});

configForm.addEventListener("submit", async (event) => {
  event.preventDefault();
  setConfigStatus("Applying parameters...");
  const params = {};
  Object.entries(cfgFields).forEach(([key, el]) => {
    if (el.value.trim()) {
      params[key] = el.value.trim();
    }
  });
  const warmupPrompt = warmupPromptEl.value.trim();
  const res = await fetch("/ui/api/llamacpp-config", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ params, extra_args: extraArgsEl.value.trim(), warmup_prompt: warmupPrompt }),
  });
  const payload = await res.json();
  if (!res.ok) {
    setConfigStatus(payload.detail || "Update failed.", "error");
    return;
  }
  setConfigStatus("Parameters updated.", "ok");
  warmupPromptEl.value = "";
});

refreshModels.addEventListener("click", loadModels);
refreshDownloads.addEventListener("click", loadDownloads);
refreshConfig.addEventListener("click", loadConfig);
refreshLogs.addEventListener("click", loadLogs);

loadModels();
loadDownloads();
loadConfig();
loadLogs();

const eventSource = new EventSource("/ui/api/events");
eventSource.onmessage = async (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type === "download_progress" || payload.type === "download_completed" || payload.type === "download_status") {
    await loadDownloads();
  }
  if (payload.type === "active_model") {
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switched") {
    setStatus(`Active model: ${payload.model_id}`, "ok");
    await loadModels();
    await loadConfig();
  }
  if (payload.type === "model_switch_failed") {
    setStatus(payload.error || "Model switch failed.", "error");
  }
  if (payload.type === "llamacpp_config_updated") {
    await loadConfig();
  }
};

const logsSource = new EventSource("/ui/api/llamacpp-logs/stream");
logsSource.onopen = () => {
  logsStatus.textContent = "Streaming";
};
logsSource.onmessage = (event) => {
  const payload = JSON.parse(event.data);
  if (payload.type !== "logs") {
    return;
  }
  const lines = payload.lines || [];
  if (!lines.length) return;
  const current = logsOutput.textContent.split("\n").filter((line) => line.length);
  const merged = current.concat(lines).slice(-400);
  logsOutput.textContent = merged.join("\n");
  logsOutput.scrollTop = logsOutput.scrollHeight;
  logsStatus.textContent = "Streaming";
};
logsSource.onerror = () => {
  logsStatus.textContent = "Disconnected";
};
151  llamaCpp.Wrapper.app/ui_static/index.html  Normal file
@@ -0,0 +1,151 @@
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <title>llama.cpp Model Manager</title>
  <link rel="stylesheet" href="/ui/styles.css" />
</head>
<body>
  <div class="page">
    <header class="topbar">
      <div class="brand">
        <p class="eyebrow">llama.cpp wrapper</p>
        <h1>Model Manager</h1>
        <p class="lede">Curate models, tune runtime parameters, and keep llama.cpp responsive.</p>
      </div>
      <div class="header-actions">
        <button id="theme-toggle" class="ghost" type="button" aria-pressed="false">Dark</button>
        <div class="quick-actions card">
          <h2>Quick Add</h2>
          <form id="download-form">
            <label>
              Model URL
              <input type="url" id="model-url" placeholder="https://.../model.gguf" required />
            </label>
            <label>
              Optional filename
              <input type="text" id="model-filename" placeholder="custom-name.gguf" />
            </label>
            <button type="submit">Start Download</button>
            <p id="download-error" class="error"></p>
          </form>
        </div>
      </div>
    </header>

    <main class="layout">
      <section class="column">
        <div class="card">
          <div class="card-header">
            <h3>Models</h3>
            <button id="refresh-models" class="ghost">Refresh</button>
          </div>
          <div id="switch-status" class="status"></div>
          <label class="config-wide">
            Warmup prompt (one-time)
            <textarea id="warmup-prompt" rows="3" placeholder="Optional warmup prompt for the next restart only"></textarea>
          </label>
          <ul id="models-list" class="list"></ul>
        </div>

        <div class="card">
          <div class="card-header">
            <h3>Downloads</h3>
            <button id="refresh-downloads" class="ghost">Refresh</button>
          </div>
          <div id="downloads-list" class="downloads"></div>
        </div>
      </section>

      <section class="column">
        <div class="card">
          <div class="card-header">
            <h3>Runtime Parameters</h3>
            <button id="refresh-config" class="ghost">Refresh</button>
          </div>
          <div id="config-status" class="status"></div>
          <form id="config-form" class="config-grid">
            <label>
              ctx-size
              <input type="text" id="cfg-ctx-size" placeholder="e.g. 8192" />
            </label>
            <label>
              n-gpu-layers
              <input type="text" id="cfg-n-gpu-layers" placeholder="e.g. 999" />
            </label>
            <label>
              tensor-split
              <input type="text" id="cfg-tensor-split" placeholder="e.g. 0.5,0.5" />
            </label>
            <label>
              split-mode
              <input type="text" id="cfg-split-mode" placeholder="e.g. layer" />
            </label>
            <label>
              cache-type-k
              <input type="text" id="cfg-cache-type-k" placeholder="e.g. q8_0" />
            </label>
            <label>
              cache-type-v
              <input type="text" id="cfg-cache-type-v" placeholder="e.g. q8_0" />
            </label>
            <label>
              flash-attn
              <input type="text" id="cfg-flash-attn" placeholder="on/off" />
            </label>
            <label>
              temp
              <input type="text" id="cfg-temp" placeholder="e.g. 0.7" />
            </label>
            <label>
              top-k
              <input type="text" id="cfg-top-k" placeholder="e.g. 40" />
            </label>
            <label>
              top-p
              <input type="text" id="cfg-top-p" placeholder="e.g. 0.9" />
            </label>
            <label>
              repeat-penalty
              <input type="text" id="cfg-repeat-penalty" placeholder="e.g. 1.1" />
            </label>
            <label>
              repeat-last-n
              <input type="text" id="cfg-repeat-last-n" placeholder="e.g. 256" />
            </label>
            <label>
              frequency-penalty
              <input type="text" id="cfg-frequency-penalty" placeholder="e.g. 0.1" />
            </label>
            <label>
              presence-penalty
              <input type="text" id="cfg-presence-penalty" placeholder="e.g. 0.0" />
            </label>
            <label class="config-wide">
              extra args
              <textarea id="cfg-extra-args" rows="3" placeholder="--mlock --no-mmap"></textarea>
            </label>
            <button type="submit" class="config-wide">Apply Parameters</button>
          </form>
        </div>
      </section>
    </main>

    <section class="card logs-panel">
      <div class="card-header">
        <div>
          <h3>llama.cpp Logs</h3>
          <p class="lede small">Live tail from the llama.cpp container.</p>
        </div>
        <div class="log-actions">
          <span id="logs-status" class="badge muted">Idle</span>
          <button id="refresh-logs" class="ghost">Refresh</button>
        </div>
      </div>
      <pre id="logs-output" class="log-output"></pre>
    </section>
  </div>
  <script src="/ui/app.js"></script>
</body>
</html>
337
llamaCpp.Wrapper.app/ui_static/styles.css
Normal file
337
llamaCpp.Wrapper.app/ui_static/styles.css
Normal file
@@ -0,0 +1,337 @@
:root {
  --bg: #f5f6f8;
  --panel: #ffffff;
  --panel-muted: #f2f3f6;
  --text: #111318;
  --muted: #5b6472;
  --border: rgba(17, 19, 24, 0.08);
  --accent: #0a84ff;
  --accent-ink: #005ad6;
  --shadow: 0 20px 60px rgba(17, 19, 24, 0.08);
}

* {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}

body {
  font-family: "SF Pro Text", "SF Pro Display", "Helvetica Neue", "Segoe UI", sans-serif;
  background: radial-gradient(circle at top, #ffffff 0%, var(--bg) 60%);
  color: var(--text);
}

.page {
  max-width: 1200px;
  margin: 0 auto;
  padding: 48px 28px 72px;
}

.topbar {
  display: grid;
  grid-template-columns: minmax(240px, 1.2fr) minmax(280px, 0.8fr);
  gap: 32px;
  align-items: stretch;
  margin-bottom: 36px;
}

.header-actions {
  display: grid;
  gap: 16px;
  justify-items: end;
}

.header-actions .quick-actions {
  width: 100%;
}

.header-actions #theme-toggle {
  justify-self: end;
}

.brand h1 {
  font-size: clamp(2.2rem, 4vw, 3.2rem);
  letter-spacing: -0.02em;
}

.eyebrow {
  text-transform: uppercase;
  letter-spacing: 0.2em;
  font-size: 0.68rem;
  color: var(--muted);
}

.lede {
  margin-top: 12px;
  font-size: 1rem;
  color: var(--muted);
}

.lede.small {
  font-size: 0.85rem;
}

.card {
  background: var(--panel);
  padding: 22px;
  border-radius: 22px;
  border: 1px solid var(--border);
  box-shadow: var(--shadow);
}

.quick-actions h2 {
  margin-bottom: 14px;
  font-size: 1.1rem;
}

.layout {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(320px, 1fr));
  gap: 24px;
}

.column {
  display: grid;
  gap: 24px;
}

.logs-panel {
  margin-top: 28px;
}

.card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 12px;
  margin-bottom: 16px;
}

.card-header h3 {
  font-size: 1.1rem;
}

.log-actions {
  display: flex;
  align-items: center;
  gap: 12px;
}

form {
  display: grid;
  gap: 12px;
}

label {
  display: grid;
  gap: 6px;
  font-size: 0.85rem;
  color: var(--muted);
}

input,
textarea,
button {
  font: inherit;
}

input,
textarea {
  padding: 10px 12px;
  border-radius: 12px;
  border: 1px solid var(--border);
  background: #fff;
}

button {
  border: none;
  padding: 10px 16px;
  border-radius: 12px;
  background: var(--accent);
  color: #fff;
  font-weight: 600;
  cursor: pointer;
  transition: transform 0.2s ease, background 0.2s ease;
}

button:hover {
  transform: translateY(-1px);
  background: var(--accent-ink);
}

button.ghost {
  background: transparent;
  color: var(--accent);
  border: 1px solid rgba(10, 132, 255, 0.4);
  padding: 8px 12px;
}

.list {
  list-style: none;
  padding: 0;
  margin: 0;
  display: grid;
  gap: 10px;
}

.list li {
  padding: 12px;
  border-radius: 14px;
  background: var(--panel-muted);
  border: 1px solid var(--border);
  font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
  font-size: 0.85rem;
}

.list li.active {
  border-color: rgba(10, 132, 255, 0.4);
  background: #eef5ff;
}

.model-row {
  display: flex;
  align-items: center;
  justify-content: space-between;
  gap: 12px;
}

.badge {
  display: inline-block;
  padding: 4px 8px;
  border-radius: 999px;
  background: var(--accent);
  color: #fff;
  font-size: 0.7rem;
  font-weight: 600;
}

.badge.muted {
  background: rgba(17, 19, 24, 0.1);
  color: var(--muted);
}

.status {
  margin-bottom: 12px;
  font-size: 0.9rem;
  color: var(--muted);
}

.status.ok {
  color: #1a7f37;
}

.status.error {
  color: #b02a14;
}

.downloads {
  display: grid;
  gap: 12px;
}

.download-card {
  border-radius: 16px;
  border: 1px solid var(--border);
  padding: 12px;
  background: #f7f8fb;
}

.download-card strong {
  display: block;
  font-size: 0.9rem;
  margin-bottom: 6px;
}

.progress {
  height: 8px;
  border-radius: 999px;
  background: #dfe3ea;
  overflow: hidden;
  margin: 8px 0;
}

.progress > span {
  display: block;
  height: 100%;
  background: var(--accent);
  width: 0;
}

.error {
  color: #b02a14;
  font-size: 0.85rem;
}

.config-grid {
  display: grid;
  grid-template-columns: repeat(auto-fit, minmax(180px, 1fr));
  gap: 14px;
}

.config-wide {
  grid-column: 1 / -1;
}

textarea {
  padding: 10px 12px;
  border-radius: 12px;
  border: 1px solid var(--border);
  font-family: "SF Mono", "JetBrains Mono", "Menlo", monospace;
  font-size: 0.85rem;
  resize: vertical;
}

.log-output {
  background: #0f141b;
  color: #dbe6f3;
  padding: 16px;
  border-radius: 16px;
  min-height: 260px;
  max-height: 420px;
  overflow: auto;
  font-size: 12px;
  line-height: 1.6;
  white-space: pre-wrap;
}

[data-theme="dark"] {
  --bg: #0b0d12;
  --panel: #141824;
  --panel-muted: #1b2132;
  --text: #f1f4f9;
  --muted: #a5afc2;
  --border: rgba(241, 244, 249, 0.1);
  --accent: #4aa3ff;
  --accent-ink: #1f7ae0;
  --shadow: 0 20px 60px rgba(0, 0, 0, 0.4);
}

[data-theme="dark"] body {
  background: radial-gradient(circle at top, #131826 0%, var(--bg) 60%);
}

[data-theme="dark"] .download-card {
  background: #121826;
}

[data-theme="dark"] .progress {
  background: #2a3349;
}

[data-theme="dark"] .log-output {
  background: #080b12;
  color: #d8e4f3;
}

@media (max-width: 900px) {
  .topbar {
    grid-template-columns: 1fr;
  }
}

@media (max-width: 640px) {
  .page {
    padding: 32px 16px 48px;
  }
}
74
llamaCpp.Wrapper.app/warmup.py
Normal file
@@ -0,0 +1,74 @@
import asyncio
import logging
import time
from pathlib import Path

import httpx


log = logging.getLogger("llamacpp_warmup")


def _is_loading_error(response: httpx.Response) -> bool:
    if response.status_code != 503:
        return False
    try:
        payload = response.json()
    except Exception:
        return False
    message = ""
    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict):
            message = str(error.get("message") or "")
        else:
            message = str(payload.get("message") or "")
    return "loading model" in message.lower()


def resolve_warmup_prompt(override: str | None, fallback_path: str) -> str:
    if override:
        prompt = override.strip()
        if prompt:
            return prompt
    try:
        prompt = Path(fallback_path).read_text(encoding="utf-8").strip()
        if prompt:
            return prompt
    except Exception as exc:
        log.warning("Failed to read warmup prompt from %s: %s", fallback_path, exc)
    return "ok"


async def run_warmup(base_url: str, model_id: str, prompt: str, timeout_s: float) -> None:
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,
        "temperature": 0,
    }
    async with httpx.AsyncClient(base_url=base_url, timeout=timeout_s) as client:
        resp = await client.post("/v1/chat/completions", json=payload)
        if resp.status_code == 503 and _is_loading_error(resp):
            raise RuntimeError("llama.cpp still loading model")
        resp.raise_for_status()


async def run_warmup_with_retry(
    base_url: str,
    model_id: str,
    prompt: str,
    timeout_s: float,
    interval_s: float = 3.0,
) -> None:
    deadline = time.time() + timeout_s
    last_exc: Exception | None = None
    while time.time() < deadline:
        try:
            await run_warmup(base_url, model_id, prompt, timeout_s=timeout_s)
            return
        except Exception as exc:
            last_exc = exc
            await asyncio.sleep(interval_s)
    if last_exc:
        raise last_exc
464
llamacpp_remote_test.ps1
Normal file
@@ -0,0 +1,464 @@
param(
    [Parameter(Mandatory = $true)][string]$Model,
    [string]$BaseUrl = "http://192.168.1.2:8071",
    [string]$PromptPath = "prompt_crwv.txt",
    [int]$Runs = 3,
    [int]$MaxTokens = 2000,
    [int]$NumCtx = 131072,
    [int]$TopK = 1,
    [double]$TopP = 1.0,
    [int]$Seed = 42,
    [double]$RepeatPenalty = 1.05,
    [double]$Temperature = 0,
    [string]$JsonSchema = "",
    [int]$TimeoutSec = 1800,
    [string]$BatchId,
    [switch]$EnableGpuMonitor = $true,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555,
    [int]$GpuMonitorIntervalSec = 1,
    [int]$GpuMonitorSeconds = 120
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"

function Normalize-Strike([object]$value) {
    if ($null -eq $value) { return $null }
    if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
        return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
    }
    return ($value.ToString().Trim())
}

function Get-AllowedLegs([string]$promptText) {
    $pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
    $match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
    if (-not $match.Success) {
        throw "Options Chain JSON block not found in prompt."
    }
    $chains = $match.Groups[1].Value | ConvertFrom-Json
    $allowedExpiry = @{}
    $allowedLegs = @{}
    foreach ($exp in $chains) {
        $expiry = [string]$exp.expiry
        if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
        $allowedExpiry[$expiry] = $true
        foreach ($leg in $exp.liquidSet) {
            if ($null -eq $leg) { continue }
            if ($leg.liquid -ne $true) { continue }
            $side = [string]$leg.side
            $strikeNorm = Normalize-Strike $leg.strike
            if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
                $key = "$expiry|$side|$strikeNorm"
                $allowedLegs[$key] = $true
            }
        }
    }
    return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
}

function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
    $errors = New-Object System.Collections.Generic.List[string]

    $requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
    foreach ($key in $requiredTop) {
        if (-not ($obj.PSObject.Properties.Name -contains $key)) {
            $errors.Add("Missing top-level key: $key")
        }
    }

    if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
        $errors.Add("Invalid strategyBias: $($obj.strategyBias)")
    }

    if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
        if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
            $errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
        }
    } else {
        $errors.Add("selectedExpiry is missing or empty")
    }

    if ($obj.confidenceScore -ne $null) {
        if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
            $errors.Add("confidenceScore is not numeric")
        } elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
            $errors.Add("confidenceScore out of range 0-100")
        }
    }

    if ($obj.recommendedTrades -eq $null) {
        $errors.Add("recommendedTrades is null")
    } elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
        $errors.Add("recommendedTrades is not an array")
    }

    if ($obj.strategyBias -eq "NO_TRADE") {
        if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
            $errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
        }
    } else {
        if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
            $errors.Add("recommendedTrades must contain 1-3 trades")
        }
    }

    if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
        $errors.Add("whyOthersRejected is not an array")
    }

    if ($obj.recommendedTrades) {
        foreach ($trade in $obj.recommendedTrades) {
            $tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
            foreach ($tkey in $tradeRequired) {
                if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
                    $errors.Add("Trade missing key: $tkey")
                }
            }

            if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
            if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
            if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
            if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
            if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
                if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
            }
            if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
                if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
            }

            if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
                $errors.Add("Trade legs missing or not an array")
                continue
            }

            $legs = @($trade.legs)

            $hasBuy = $false
            $hasSell = $false
            foreach ($leg in $trade.legs) {
                $side = ([string]$leg.side).ToLowerInvariant()
                $action = ([string]$leg.action).ToLowerInvariant()
                $expiry = [string]$leg.expiry
                $strikeNorm = Normalize-Strike $leg.strike

                if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
                if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
                if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
                if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
                    $key = "$expiry|$side|$strikeNorm"
                    if (-not $allowedLegs.ContainsKey($key)) {
                        $errors.Add("Leg not in liquid set: $key")
                    }
                }

                if ($action -eq "buy") { $hasBuy = $true }
                if ($action -eq "sell") { $hasSell = $true }
            }

            if ($obj.selectedExpiry -and $legs) {
                foreach ($leg in $legs) {
                    if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
                        $errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
                    }
                }
            }

            if ($hasSell -and -not $hasBuy) {
                $errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
            }

            if ($trade.greekProfile) {
                $gp = $trade.greekProfile
                $gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                foreach ($gkey in $gpRequired) {
                    if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
                        $errors.Add("Missing greekProfile.$gkey")
                    }
                }
                if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
                if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
                if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
                if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }

                if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
                    $errors.Add("ThetaExposure POS on all-long legs")
                }
            } else {
                $errors.Add("Missing greekProfile")
            }

            $structure = ([string]$trade.structure).ToLowerInvariant()
            $tradeName = ([string]$trade.name).ToLowerInvariant()
            $isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
            $isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
            $isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
            $isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")

            if ($isStraddle -or $isStrangle) {
                if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
                if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
                    $callStrike = Normalize-Strike $callLegs[0].strike
                    $putStrike = Normalize-Strike $putLegs[0].strike
                    if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
                    if ($isStrangle) {
                        try {
                            if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
                        } catch {
                            $errors.Add("Strangle strike comparison failed")
                        }
                    }
                    if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
                        $errors.Add("Straddle/Strangle must be long (buy) legs")
                    }
                }
                if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
                    $errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
                }
            }

            if ($isCallDebit) {
                $callLegs = $legs | Where-Object { $_.side -eq "call" }
                if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
                $buy = $callLegs | Where-Object { $_.action -eq "buy" }
                $sell = $callLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Call debit spread buy strike must be below sell strike")
                        }
                    } catch {
                        $errors.Add("Call debit spread strike comparison failed")
                    }
                }
            }

            if ($isPutDebit) {
                $putLegs = $legs | Where-Object { $_.side -eq "put" }
                if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
                $buy = $putLegs | Where-Object { $_.action -eq "buy" }
                $sell = $putLegs | Where-Object { $_.action -eq "sell" }
                if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
                if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
                    try {
                        if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
                            $errors.Add("Put debit spread buy strike must be above sell strike")
                        }
                    } catch {
                        $errors.Add("Put debit spread strike comparison failed")
                    }
                }
            }
        }
    }

    return $errors
}

function Parse-GpuLog {
    param([string]$Path)
    $summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
    if (-not (Test-Path $Path)) {
        $summary.error = "gpu log missing"
        return $summary
    }
    $lines = Get-Content -Path $Path
    $currentIndex = -1
    $gpuIndex = -1
    $inUtilBlock = $false
    foreach ($line in $lines) {
        if ($line -match '^Timestamp') {
            $gpuIndex = -1
            $currentIndex = -1
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^GPU\s+[0-9A-Fa-f:.]+$') {
            $gpuIndex += 1
            $currentIndex = $gpuIndex
            $inUtilBlock = $false
            continue
        }
        if ($line -match '^\s*Utilization\s*$') {
            $inUtilBlock = $true
            continue
        }
        if ($inUtilBlock -and $line -match '^\s*GPU\s*:\s*([0-9]+)\s*%') {
            $util = [int]$Matches[1]
            if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
            if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
            $summary.samples += 1
        }
    }
    return $summary
}

$prompt = [string](Get-Content -Raw -Path $PromptPath)
$allowed = Get-AllowedLegs -promptText $prompt
$allowedExpiry = $allowed.AllowedExpiry
$allowedLegs = $allowed.AllowedLegs

if ([string]::IsNullOrWhiteSpace($BatchId)) {
    $BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
}

$outBase = Join-Path -Path (Get-Location) -ChildPath "llamacpp_runs_remote"
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }

$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }

$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }

$summary = [ordered]@{
    model = $Model
    baseUrl = $BaseUrl
    batchId = $BatchId
    params = [ordered]@{
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
        num_ctx = $NumCtx
    }
    gpuMonitor = [ordered]@{
        enabled = [bool]$EnableGpuMonitor
        sshHost = $SshHost
        sshPort = $SshPort
        intervalSec = $GpuMonitorIntervalSec
        durationSec = $GpuMonitorSeconds
    }
    modelMeta = $null
    runs = @()
}

if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    try {
        $schemaObject = $JsonSchema | ConvertFrom-Json
    } catch {
        throw "JsonSchema is not valid JSON: $($_.Exception.Message)"
    }
}

try {
    $modelsResponse = Invoke-RestMethod -Uri "$BaseUrl/v1/models" -TimeoutSec 30
    $meta = $modelsResponse.data | Where-Object { $_.id -eq $Model } | Select-Object -First 1
    if ($meta) { $summary.modelMeta = $meta.meta }
} catch {
    $summary.modelMeta = @{ error = $_.Exception.Message }
}

for ($i = 1; $i -le $Runs; $i++) {
    Write-Host "Running $Model (run $i/$Runs)"

    $runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
    $gpuJob = $null
    $gpuLogPath = $null

    if ($EnableGpuMonitor) {
        $samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
        $gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
        $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
        $gpuJob = Start-Job -ScriptBlock {
            param($sshExe, $target, $port, $samples, $interval, $logPath)
            for ($s = 1; $s -le $samples; $s++) {
                Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
                try {
                    $out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
                    Add-Content -Path $logPath -Value $out
                } catch {
                    Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
                }
                Start-Sleep -Seconds $interval
            }
        } -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
        Start-Sleep -Seconds 1
    }

    $body = @{
        model = $Model
        messages = @(@{ role = "user"; content = $prompt })
        temperature = $Temperature
        top_k = $TopK
        top_p = $TopP
        seed = $Seed
        repeat_penalty = $RepeatPenalty
        max_tokens = $MaxTokens
    }

    if ($schemaObject) {
        $body.response_format = @{
            type = "json_schema"
            json_schema = @{
                name = "trade_schema"
                schema = $schemaObject
                strict = $true
            }
        }
    }

    $body = $body | ConvertTo-Json -Depth 12

    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/v1/chat/completions" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
    } catch {
        $runResult.errors = @("API error: $($_.Exception.Message)")
        $summary.runs += $runResult
        if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
        continue
    } finally {
        if ($gpuJob) {
            Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
            if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
            Remove-Job -Job $gpuJob | Out-Null
        }
    }

    $raw = [string]$resp.choices[0].message.content

    $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
    Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

    try {
        $parsed = $raw | ConvertFrom-Json
        $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
        if ($errors.Count -eq 0) {
            $runResult.ok = $true
        } else {
            $runResult.errors = $errors
        }
    } catch {
        $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
    }

    if ($gpuLogPath) {
        $runResult.gpuLog = $gpuLogPath
        $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
    }
    if ($resp.timings) {
        $runResult.timings = $resp.timings
    }
    if ($resp.usage) {
        $runResult.usage = $resp.usage
    }

    $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
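A minimal invocation sketch for the harness above, assuming the model id matches an entry served by /v1/models on the llama.cpp box (the id shown here is hypothetical):

# Three deterministic runs against the default endpoint; only -Model is mandatory.
.\llamacpp_remote_test.ps1 -Model "qwen2.5-72b-instruct" -Runs 3 -MaxTokens 2000

Each run writes runN.json plus a gpu_runN.csv sample log under llamacpp_runs_remote\batch_<id>\<model>\, and summary.json aggregates the schema-validation verdicts.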
117
llamacpp_set_command.ps1
Normal file
@@ -0,0 +1,117 @@
param(
    [Parameter(Mandatory = $true)][string]$ModelPath,
    [Parameter(Mandatory = $true)][int]$CtxSize,
    [int]$BatchSize = 1024,
    [int]$UBatchSize = 256,
    [string]$TensorSplit = "0.5,0.5",
    [string]$Devices = "0,1",
    [int]$GpuLayers = 999,
    [string]$CacheTypeK = "q4_0",
    [string]$CacheTypeV = "q4_0",
    [string]$GrammarFile = "",
    [string]$JsonSchema = "",
    [string]$BaseUrl = "http://192.168.1.2:8071",
    [int]$TimeoutSec = 600,
    [string]$SshExe = "$env:SystemRoot\System32\OpenSSH\ssh.exe",
    [string]$SshUser = "rushabh",
    [string]$SshHost = "192.168.1.2",
    [int]$SshPort = 55555
)

$ErrorActionPreference = "Stop"
$ProgressPreference = "SilentlyContinue"

$commandArgs = @(
    "--model", $ModelPath,
    "--ctx-size", $CtxSize.ToString(),
    "--n-gpu-layers", $GpuLayers.ToString(),
    "--split-mode", "layer",
    "--tensor-split", $TensorSplit,
    "--batch-size", $BatchSize.ToString(),
    "--ubatch-size", $UBatchSize.ToString(),
    "--cache-type-k", $CacheTypeK,
    "--cache-type-v", $CacheTypeV,
    "--flash-attn", "on"
)

if (-not [string]::IsNullOrWhiteSpace($Devices)) {
    $commandArgs = @("--device", $Devices) + $commandArgs
}

if (-not [string]::IsNullOrWhiteSpace($GrammarFile)) {
    $commandArgs += @("--grammar-file", $GrammarFile)
}

if (-not [string]::IsNullOrWhiteSpace($JsonSchema)) {
    $commandArgs += @("--json-schema", $JsonSchema)
}

$argJson = $commandArgs | ConvertTo-Json -Compress

$py = @"
import json
path = r"/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
new_cmd = json.loads(r'''$argJson''')
lines = open(path, "r", encoding="utf-8").read().splitlines()
out = []
in_cmd = False
def yaml_quote(value):
    text = str(value)
    return "'" + text.replace("'", "''") + "'"
for line in lines:
    if line.startswith('"command":'):
        out.append('"command":')
        for arg in new_cmd:
            out.append(f"- {yaml_quote(arg)}")
        in_cmd = True
        continue
    if in_cmd:
        if line.startswith('"') and not line.startswith('"command":'):
            in_cmd = False
            out.append(line)
        else:
            continue
    else:
        out.append(line)
if in_cmd:
    pass
open(path, "w", encoding="utf-8").write("\n".join(out) + "\n")
"@

$py | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -"

$pyCompose = @"
import json, yaml, subprocess
compose_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/templates/rendered/docker-compose.yaml"
user_config_path = "/mnt/.ix-apps/app_configs/llamacpp/versions/1.2.17/user_config.yaml"
with open(compose_path, "r", encoding="utf-8") as f:
    compose = json.load(f)
with open(user_config_path, "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)
command = config.get("command")
if not command:
    raise SystemExit("command list missing from user_config")
svc = compose["services"]["llamacpp"]
svc["command"] = command
with open(compose_path, "w", encoding="utf-8") as f:
    json.dump(compose, f)
payload = {"custom_compose_config": compose}
subprocess.run(["midclt", "call", "app.update", "llamacpp", json.dumps(payload)], check=True)
"@

$pyCompose | & $SshExe -p $SshPort "$SshUser@$SshHost" "sudo -n python3 -" | Out-Null

$start = Get-Date
while ((Get-Date) - $start -lt [TimeSpan]::FromSeconds($TimeoutSec)) {
    try {
        $resp = Invoke-RestMethod -Uri "$BaseUrl/health" -TimeoutSec 10
        if ($resp.status -eq "ok") {
            Write-Host "llamacpp healthy at $BaseUrl"
            exit 0
        }
    } catch {
        Start-Sleep -Seconds 5
    }
}

throw "Timed out waiting for llama.cpp server at $BaseUrl"
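A usage sketch, assuming the GGUF path below exists under the container's model mount (the path and split values are illustrative):

# Point the llamacpp app at a new model with a 32k context, then wait for /health.
.\llamacpp_set_command.ps1 -ModelPath "/models/example-q4_k_m.gguf" -CtxSize 32768 -TensorSplit "0.5,0.5"

Because the script exits 0 only after the health probe succeeds (and throws on timeout), it can be chained directly in front of llamacpp_remote_test.ps1.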
14
modelfiles/options-json-deepseek14b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM deepseek-r1:14b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
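These Modelfiles are registered with Ollama in the usual way; a sketch, run from the repo root against the daemon:

# Build the named model once; the name is then passed as -Model to ollama_remote_test.ps1.
ollama create options-json-deepseek14b -f modelfiles/options-json-deepseek14b.Modelfile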
14
modelfiles/options-json-llama31-70b.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM llama3.1:70b
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
14
modelfiles/options-json-phi3mini.Modelfile
Normal file
@@ -0,0 +1,14 @@
FROM phi3:mini-128k
SYSTEM """
You are a senior quantitative options trader specializing in index and ETF options.
Return ONLY a single valid JSON object that matches the exact schema described in the user prompt.
No markdown, no code fences, no commentary, no extra keys, no trailing text.
Use ONLY strikes/expiries from the provided options chain; do NOT invent data.
If no trade qualifies, set "strategyBias" to "NO_TRADE" and "recommendedTrades" to [].
Begin output with { and end with }.
"""
PARAMETER temperature 0
PARAMETER top_k 1
PARAMETER top_p 1
PARAMETER repeat_penalty 1.05
PARAMETER seed 42
561
ollama_remote_test.ps1
Normal file
@@ -0,0 +1,561 @@
|
||||
param(
|
||||
[Parameter(Mandatory = $true)][string]$Model,
|
||||
[string]$BaseUrl = "http://192.168.1.2:30068",
|
||||
[string]$PromptPath = "prompt_crwv.txt",
|
||||
[int]$Runs = 3,
|
||||
[int]$NumPredict = 1200,
|
||||
[int]$NumCtx = 131072,
|
||||
[int]$NumBatch = 0,
|
||||
[int]$NumGpuLayers = 0,
|
||||
[int]$TimeoutSec = 900,
|
||||
[int]$TopK = 1,
|
||||
[double]$TopP = 1.0,
|
||||
[int]$Seed = 42,
|
||||
[double]$RepeatPenalty = 1.05,
|
||||
[string]$BatchId,
|
||||
[switch]$UseSchemaFormat = $false,
|
||||
[switch]$EnableGpuMonitor = $true,
|
||||
[string]$SshExe = "$env:SystemRoot\\System32\\OpenSSH\\ssh.exe",
|
||||
[switch]$CheckProcessor = $true,
|
||||
[string]$SshUser = "rushabh",
|
||||
[string]$SshHost = "192.168.1.2",
|
||||
[int]$SshPort = 55555,
|
||||
[int]$GpuMonitorIntervalSec = 1,
|
||||
[int]$GpuMonitorSeconds = 120
|
||||
)
|
||||
|
||||
$ErrorActionPreference = "Stop"
|
||||
$ProgressPreference = "SilentlyContinue"
|
||||
|
||||
function Normalize-Strike([object]$value) {
|
||||
if ($null -eq $value) { return $null }
|
||||
if ($value -is [double] -or $value -is [float] -or $value -is [int] -or $value -is [long]) {
|
||||
return ([double]$value).ToString("0.################", [System.Globalization.CultureInfo]::InvariantCulture)
|
||||
}
|
||||
return ($value.ToString().Trim())
|
||||
}
|
||||
|
||||
function Get-AllowedLegs([string]$promptText) {
|
||||
$pattern = 'Options Chain\s*```\s*(\[[\s\S]*?\])\s*```'
|
||||
$match = [regex]::Match($promptText, $pattern, [System.Text.RegularExpressions.RegexOptions]::Singleline)
|
||||
if (-not $match.Success) {
|
||||
throw "Options Chain JSON block not found in prompt."
|
||||
}
|
||||
$chains = $match.Groups[1].Value | ConvertFrom-Json
|
||||
$allowedExpiry = @{}
|
||||
$allowedLegs = @{}
|
||||
foreach ($exp in $chains) {
|
||||
$expiry = [string]$exp.expiry
|
||||
if ([string]::IsNullOrWhiteSpace($expiry)) { continue }
|
||||
$allowedExpiry[$expiry] = $true
|
||||
foreach ($leg in $exp.liquidSet) {
|
||||
if ($null -eq $leg) { continue }
|
||||
if ($leg.liquid -ne $true) { continue }
|
||||
$side = [string]$leg.side
|
||||
$strikeNorm = Normalize-Strike $leg.strike
|
||||
if (-not [string]::IsNullOrWhiteSpace($side) -and $strikeNorm) {
|
||||
$key = "$expiry|$side|$strikeNorm"
|
||||
$allowedLegs[$key] = $true
|
||||
}
|
||||
}
|
||||
}
|
||||
return @{ AllowedExpiry = $allowedExpiry; AllowedLegs = $allowedLegs }
|
||||
}
|
||||
|
||||
function Test-TradeSchema($obj, $allowedExpiry, $allowedLegs) {
|
||||
$errors = New-Object System.Collections.Generic.List[string]
|
||||
|
||||
$requiredTop = @("selectedExpiry", "expiryRationale", "strategyBias", "recommendedTrades", "whyOthersRejected", "confidenceScore")
|
||||
foreach ($key in $requiredTop) {
|
||||
if (-not ($obj.PSObject.Properties.Name -contains $key)) {
|
||||
$errors.Add("Missing top-level key: $key")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.strategyBias -and ($obj.strategyBias -notin @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE"))) {
|
||||
$errors.Add("Invalid strategyBias: $($obj.strategyBias)")
|
||||
}
|
||||
|
||||
if (-not [string]::IsNullOrWhiteSpace([string]$obj.selectedExpiry)) {
|
||||
if (-not $allowedExpiry.ContainsKey([string]$obj.selectedExpiry)) {
|
||||
$errors.Add("selectedExpiry not in provided expiries: $($obj.selectedExpiry)")
|
||||
}
|
||||
} else {
|
||||
$errors.Add("selectedExpiry is missing or empty")
|
||||
}
|
||||
|
||||
if ($obj.confidenceScore -ne $null) {
|
||||
if (-not ($obj.confidenceScore -is [double] -or $obj.confidenceScore -is [int])) {
|
||||
$errors.Add("confidenceScore is not numeric")
|
||||
} elseif ($obj.confidenceScore -lt 0 -or $obj.confidenceScore -gt 100) {
|
||||
$errors.Add("confidenceScore out of range 0-100")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.recommendedTrades -eq $null) {
|
||||
$errors.Add("recommendedTrades is null")
|
||||
} elseif (-not ($obj.recommendedTrades -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("recommendedTrades is not an array")
|
||||
}
|
||||
|
||||
if ($obj.strategyBias -eq "NO_TRADE") {
|
||||
if ($obj.recommendedTrades -and $obj.recommendedTrades.Count -gt 0) {
|
||||
$errors.Add("strategyBias is NO_TRADE but recommendedTrades is not empty")
|
||||
}
|
||||
} else {
|
||||
if (-not $obj.recommendedTrades -or $obj.recommendedTrades.Count -lt 1 -or $obj.recommendedTrades.Count -gt 3) {
|
||||
$errors.Add("recommendedTrades must contain 1-3 trades")
|
||||
}
|
||||
}
|
||||
|
||||
if ($obj.whyOthersRejected -ne $null -and -not ($obj.whyOthersRejected -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("whyOthersRejected is not an array")
|
||||
}
|
||||
|
||||
if ($obj.recommendedTrades) {
|
||||
foreach ($trade in $obj.recommendedTrades) {
|
||||
$tradeRequired = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
|
||||
foreach ($tkey in $tradeRequired) {
|
||||
if (-not ($trade.PSObject.Properties.Name -contains $tkey)) {
|
||||
$errors.Add("Trade missing key: $tkey")
|
||||
}
|
||||
}
|
||||
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.name)) { $errors.Add("Trade name is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.structure)) { $errors.Add("Trade structure is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.thesisAlignment)) { $errors.Add("Trade thesisAlignment is empty") }
|
||||
if ([string]::IsNullOrWhiteSpace([string]$trade.invalidation)) { $errors.Add("Trade invalidation is empty") }
|
||||
if ($trade.maxRisk -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxRisk)) { $errors.Add("Trade maxRisk is empty") }
|
||||
if ($trade.maxReward -eq $null -or [string]::IsNullOrWhiteSpace([string]$trade.maxReward)) { $errors.Add("Trade maxReward is empty") }
|
||||
if ($trade.maxRisk -is [double] -or $trade.maxRisk -is [int]) {
|
||||
if ($trade.maxRisk -le 0) { $errors.Add("Trade maxRisk must be > 0") }
|
||||
}
|
||||
if ($trade.maxReward -is [double] -or $trade.maxReward -is [int]) {
|
||||
if ($trade.maxReward -le 0) { $errors.Add("Trade maxReward must be > 0") }
|
||||
}
|
||||
|
||||
if (-not $trade.legs -or -not ($trade.legs -is [System.Collections.IEnumerable])) {
|
||||
$errors.Add("Trade legs missing or not an array")
|
||||
continue
|
||||
}
|
||||
|
||||
$legs = @($trade.legs)
|
||||
|
||||
$hasBuy = $false
|
||||
$hasSell = $false
|
||||
foreach ($leg in $trade.legs) {
|
||||
$side = ([string]$leg.side).ToLowerInvariant()
|
||||
$action = ([string]$leg.action).ToLowerInvariant()
|
||||
$expiry = [string]$leg.expiry
|
||||
$strikeNorm = Normalize-Strike $leg.strike
|
||||
|
||||
if ($side -notin @("call","put")) { $errors.Add("Invalid leg side: $side") }
|
||||
if ($action -notin @("buy","sell")) { $errors.Add("Invalid leg action: $action") }
|
||||
if (-not $allowedExpiry.ContainsKey($expiry)) { $errors.Add("Leg expiry not allowed: $expiry") }
|
||||
if (-not $strikeNorm) { $errors.Add("Leg strike missing") } else {
|
||||
$key = "$expiry|$side|$strikeNorm"
|
||||
if (-not $allowedLegs.ContainsKey($key)) {
|
||||
$errors.Add("Leg not in liquid set: $key")
|
||||
}
|
||||
}
|
||||
|
||||
if ($action -eq "buy") { $hasBuy = $true }
|
||||
if ($action -eq "sell") { $hasSell = $true }
|
||||
}
|
||||
|
||||
if ($obj.selectedExpiry -and $legs) {
|
||||
foreach ($leg in $legs) {
|
||||
if ([string]$leg.expiry -ne [string]$obj.selectedExpiry) {
|
||||
$errors.Add("Leg expiry does not match selectedExpiry: $($leg.expiry)")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ($hasSell -and -not $hasBuy) {
|
||||
$errors.Add("Naked short detected: trade has sell leg(s) with no buy leg")
|
||||
}
|
||||
|
||||
if ($trade.greekProfile) {
|
||||
$gp = $trade.greekProfile
|
||||
$gpRequired = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
|
||||
foreach ($gkey in $gpRequired) {
|
||||
if (-not ($gp.PSObject.Properties.Name -contains $gkey)) {
|
||||
$errors.Add("Missing greekProfile.$gkey")
|
||||
}
|
||||
}
|
||||
if ($gp.deltaBias -and ($gp.deltaBias -notin @("POS","NEG","NEUTRAL"))) { $errors.Add("Invalid deltaBias") }
|
||||
if ($gp.gammaExposure -and ($gp.gammaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid gammaExposure") }
|
||||
if ($gp.thetaExposure -and ($gp.thetaExposure -notin @("POS","NEG","LOW"))) { $errors.Add("Invalid thetaExposure") }
|
||||
if ($gp.vegaExposure -and ($gp.vegaExposure -notin @("HIGH","MED","LOW"))) { $errors.Add("Invalid vegaExposure") }
|
||||
|
||||
if (-not $hasSell -and $gp.thetaExposure -eq "POS") {
|
||||
$errors.Add("ThetaExposure POS on all-long legs")
|
||||
}
|
||||
} else {
|
||||
$errors.Add("Missing greekProfile")
|
||||
}
|
||||
|
||||
$structure = ([string]$trade.structure).ToLowerInvariant()
|
||||
$tradeName = ([string]$trade.name).ToLowerInvariant()
|
||||
$isStraddle = $structure -match "straddle" -or $tradeName -match "straddle"
|
||||
$isStrangle = $structure -match "strangle" -or $tradeName -match "strangle"
|
||||
$isCallDebit = ($structure -match "call") -and ($structure -match "debit") -and ($structure -match "spread")
|
||||
$isPutDebit = ($structure -match "put") -and ($structure -match "debit") -and ($structure -match "spread")
|
||||
|
||||
if ($isStraddle -or $isStrangle) {
|
||||
if ($legs.Count -ne 2) { $errors.Add("Straddle/Strangle must have exactly 2 legs") }
|
||||
$callLegs = $legs | Where-Object { $_.side -eq "call" }
|
||||
$putLegs = $legs | Where-Object { $_.side -eq "put" }
|
||||
if ($callLegs.Count -ne 1 -or $putLegs.Count -ne 1) { $errors.Add("Straddle/Strangle must have 1 call and 1 put") }
|
||||
if ($callLegs.Count -eq 1 -and $putLegs.Count -eq 1) {
|
||||
$callStrike = Normalize-Strike $callLegs[0].strike
|
||||
$putStrike = Normalize-Strike $putLegs[0].strike
|
||||
if ($isStraddle -and $callStrike -ne $putStrike) { $errors.Add("Straddle strikes must match") }
|
||||
if ($isStrangle) {
|
||||
try {
|
||||
if ([double]$callStrike -le [double]$putStrike) { $errors.Add("Strangle call strike must be above put strike") }
|
||||
} catch {
|
||||
$errors.Add("Strangle strike comparison failed")
|
||||
}
|
||||
}
|
||||
if ($callLegs[0].action -ne "buy" -or $putLegs[0].action -ne "buy") {
|
||||
$errors.Add("Straddle/Strangle must be long (buy) legs")
|
||||
}
|
||||
}
|
||||
if ($trade.greekProfile -and $trade.greekProfile.deltaBias -and $trade.greekProfile.deltaBias -ne "NEUTRAL") {
|
||||
$errors.Add("DeltaBias must be NEUTRAL for straddle/strangle")
|
||||
}
|
||||
}
|
||||
|
||||
if ($isCallDebit) {
|
||||
$callLegs = $legs | Where-Object { $_.side -eq "call" }
|
||||
if ($callLegs.Count -ne 2) { $errors.Add("Call debit spread must have 2 call legs") }
|
||||
$buy = $callLegs | Where-Object { $_.action -eq "buy" }
|
||||
$sell = $callLegs | Where-Object { $_.action -eq "sell" }
|
||||
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Call debit spread must have 1 buy and 1 sell") }
|
||||
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
|
||||
try {
|
||||
if ([double](Normalize-Strike $buy[0].strike) -ge [double](Normalize-Strike $sell[0].strike)) {
|
||||
$errors.Add("Call debit spread buy strike must be below sell strike")
|
||||
}
|
||||
} catch {
|
||||
$errors.Add("Call debit spread strike comparison failed")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ($isPutDebit) {
|
||||
$putLegs = $legs | Where-Object { $_.side -eq "put" }
|
||||
if ($putLegs.Count -ne 2) { $errors.Add("Put debit spread must have 2 put legs") }
|
||||
$buy = $putLegs | Where-Object { $_.action -eq "buy" }
|
||||
$sell = $putLegs | Where-Object { $_.action -eq "sell" }
|
||||
if ($buy.Count -ne 1 -or $sell.Count -ne 1) { $errors.Add("Put debit spread must have 1 buy and 1 sell") }
|
||||
if ($buy.Count -eq 1 -and $sell.Count -eq 1) {
|
||||
try {
|
||||
if ([double](Normalize-Strike $buy[0].strike) -le [double](Normalize-Strike $sell[0].strike)) {
|
||||
$errors.Add("Put debit spread buy strike must be above sell strike")
|
||||
}
|
||||
} catch {
|
||||
$errors.Add("Put debit spread strike comparison failed")
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return $errors
|
||||
}
|
||||
|
||||
function Parse-GpuLog {
|
||||
param([string]$Path)
|
||||
$summary = [ordered]@{ gpu0Used = $false; gpu1Used = $false; samples = 0; error = $null }
|
||||
if (-not (Test-Path $Path)) {
|
||||
$summary.error = "gpu log missing"
|
||||
return $summary
|
||||
}
|
||||
$lines = Get-Content -Path $Path
|
||||
$currentIndex = -1
|
||||
$gpuIndex = -1
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
foreach ($line in $lines) {
|
||||
if ($line -match '^Timestamp') {
|
||||
$gpuIndex = -1
|
||||
$currentIndex = -1
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^GPU\\s+[0-9A-Fa-f:.]+$') {
|
||||
$gpuIndex += 1
|
||||
$currentIndex = $gpuIndex
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*Utilization\\s*$') {
|
||||
$inUtilBlock = $true
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*GPU Utilization Samples') {
|
||||
$inGpuUtilSamples = $true
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($line -match '^\\s*(Memory|ENC|DEC) Utilization Samples') {
|
||||
$inGpuUtilSamples = $false
|
||||
$inUtilBlock = $false
|
||||
continue
|
||||
}
|
||||
if ($inUtilBlock -and $line -match '^\\s*GPU\\s*:\\s*([0-9]+)\\s*%') {
|
||||
$util = [int]$Matches[1]
|
||||
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
|
||||
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
|
||||
$summary.samples += 1
|
||||
continue
|
||||
}
|
||||
if ($inGpuUtilSamples -and $line -match '^\\s*Max\\s*:\\s*([0-9]+)\\s*%') {
|
||||
$util = [int]$Matches[1]
|
||||
if ($currentIndex -eq 0 -and $util -gt 0) { $summary.gpu0Used = $true }
|
||||
if ($currentIndex -eq 1 -and $util -gt 0) { $summary.gpu1Used = $true }
|
||||
$summary.samples += 1
|
||||
}
|
||||
}
|
||||
return $summary
|
||||
}
|
||||
|
||||
function Get-ProcessorShare {
|
||||
param(
|
||||
[string]$SshExePath,
|
||||
[string]$Target,
|
||||
[int]$Port,
|
||||
[string]$ModelName
|
||||
)
|
||||
$result = [ordered]@{ cpuPct = $null; gpuPct = $null; raw = $null; error = $null }
|
||||
try {
|
||||
$out = & $SshExePath -p $Port $Target "sudo -n docker exec ix-ollama-ollama-1 ollama ps"
|
||||
$line = $out | Select-String -SimpleMatch $ModelName | Select-Object -First 1
|
||||
if ($null -eq $line) {
|
||||
$result.error = "model not found in ollama ps"
|
||||
return $result
|
||||
}
|
||||
$raw = $line.ToString().Trim()
|
||||
$result.raw = $raw
|
||||
if ($raw -match '([0-9]+)%\\/([0-9]+)%\\s+CPU\\/GPU') {
|
||||
$result.cpuPct = [int]$Matches[1]
|
||||
$result.gpuPct = [int]$Matches[2]
|
||||
} elseif ($raw -match '([0-9]+)%\\s+GPU') {
|
||||
$result.cpuPct = 0
|
||||
$result.gpuPct = [int]$Matches[1]
|
||||
} else {
|
||||
$result.error = "CPU/GPU split not parsed"
|
||||
}
|
||||
} catch {
|
||||
$result.error = $_.Exception.Message
|
||||
}
|
||||
return $result
|
||||
}
|
||||
|
||||
$prompt = [string](Get-Content -Raw -Path $PromptPath)
|
||||
$allowed = Get-AllowedLegs -promptText $prompt
|
||||
$allowedExpiry = $allowed.AllowedExpiry
|
||||
$allowedLegs = $allowed.AllowedLegs
|
||||
|
||||
if ([string]::IsNullOrWhiteSpace($BatchId)) {
|
||||
$BatchId = (Get-Date).ToString("yyyyMMdd_HHmmss")
|
||||
}
|
||||
|
||||
$outBase = Join-Path -Path (Get-Location) -ChildPath "ollama_runs_remote"
|
||||
if (-not (Test-Path $outBase)) { New-Item -ItemType Directory -Path $outBase | Out-Null }
|
||||
|
||||
$safeModel = $Model -replace '[\\/:*?"<>|]', '_'
|
||||
$batchDir = Join-Path -Path $outBase -ChildPath ("batch_{0}" -f $BatchId)
|
||||
if (-not (Test-Path $batchDir)) { New-Item -ItemType Directory -Path $batchDir | Out-Null }
|
||||
|
||||
$outDir = Join-Path -Path $batchDir -ChildPath $safeModel
|
||||
if (-not (Test-Path $outDir)) { New-Item -ItemType Directory -Path $outDir | Out-Null }
|
||||
|
||||
$summary = [ordered]@{
|
||||
model = $Model
|
||||
baseUrl = $BaseUrl
|
||||
formatMode = $(if ($UseSchemaFormat) { "schema" } else { "json" })
|
||||
batchId = $BatchId
|
||||
gpuMonitor = [ordered]@{
|
||||
enabled = [bool]$EnableGpuMonitor
|
||||
sshHost = $SshHost
|
||||
sshPort = $SshPort
|
||||
intervalSec = $GpuMonitorIntervalSec
|
||||
durationSec = $GpuMonitorSeconds
|
||||
}
|
||||
runs = @()
|
||||
}
|
||||
|
||||
for ($i = 1; $i -le $Runs; $i++) {
|
||||
Write-Host "Running $Model (run $i/$Runs)"
|
||||
|
||||
$runResult = [ordered]@{ run = $i; ok = $false; errors = @() }
|
||||
$gpuJob = $null
|
||||
$gpuLogPath = $null
|
||||
|
||||
if ($EnableGpuMonitor) {
|
||||
$samples = [math]::Max(5, [int]([math]::Ceiling($GpuMonitorSeconds / [double]$GpuMonitorIntervalSec)))
|
||||
$gpuLogPath = Join-Path $outDir ("gpu_run{0}.csv" -f $i)
|
||||
$sshTarget = "{0}@{1}" -f $SshUser, $SshHost
|
||||
$gpuJob = Start-Job -ScriptBlock {
|
||||
param($sshExe, $target, $port, $samples, $interval, $logPath)
|
||||
for ($s = 1; $s -le $samples; $s++) {
|
||||
Add-Content -Path $logPath -Value ("=== SAMPLE {0} {1}" -f $s, (Get-Date).ToString('s'))
|
||||
try {
|
||||
$out = & $sshExe -p $port $target "nvidia-smi -q -d UTILIZATION"
|
||||
Add-Content -Path $logPath -Value $out
|
||||
} catch {
|
||||
Add-Content -Path $logPath -Value ("GPU monitor error: $($_.Exception.Message)")
|
||||
}
|
||||
Start-Sleep -Seconds $interval
|
||||
}
|
||||
} -ArgumentList $SshExe, $sshTarget, $SshPort, $samples, $GpuMonitorIntervalSec, $gpuLogPath
|
||||
Start-Sleep -Seconds 1
|
||||
}
|
||||
|
||||
$format = "json"
|
||||
if ($UseSchemaFormat) {
|
||||
$format = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("selectedExpiry","expiryRationale","strategyBias","recommendedTrades","whyOthersRejected","confidenceScore")
|
||||
properties = @{
|
||||
selectedExpiry = @{ type = "string"; minLength = 1 }
|
||||
expiryRationale = @{ type = "string"; minLength = 1 }
|
||||
strategyBias = @{ type = "string"; enum = @("DIRECTIONAL","VOLATILITY","NEUTRAL","NO_TRADE") }
|
||||
recommendedTrades = @{
|
||||
type = "array"
|
||||
minItems = 0
|
||||
maxItems = 3
|
||||
items = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("name","structure","legs","greekProfile","maxRisk","maxReward","thesisAlignment","invalidation")
|
||||
properties = @{
|
||||
name = @{ type = "string"; minLength = 1 }
|
||||
structure = @{ type = "string"; minLength = 1 }
|
||||
legs = @{
|
||||
type = "array"
|
||||
minItems = 1
|
||||
maxItems = 4
|
||||
items = @{
|
||||
type = "object"
|
||||
additionalProperties = $false
|
||||
required = @("side","action","strike","expiry")
|
||||
                  properties = @{
                    side = @{ type = "string"; enum = @("call","put") }
                    action = @{ type = "string"; enum = @("buy","sell") }
                    strike = @{ type = @("number","string") }
                    expiry = @{ type = "string"; minLength = 1 }
                  }
                }
              }
              greekProfile = @{
                type = "object"
                additionalProperties = $false
                required = @("deltaBias","gammaExposure","thetaExposure","vegaExposure")
                properties = @{
                  deltaBias = @{ type = "string"; enum = @("POS","NEG","NEUTRAL") }
                  gammaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
                  thetaExposure = @{ type = "string"; enum = @("POS","NEG","LOW") }
                  vegaExposure = @{ type = "string"; enum = @("HIGH","MED","LOW") }
                }
              }
              maxRisk = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
              maxReward = @{ anyOf = @(@{ type = "string"; minLength = 1 }, @{ type = "number" }) }
              thesisAlignment = @{ type = "string"; minLength = 1 }
              invalidation = @{ type = "string"; minLength = 1 }
              managementNotes = @{ type = "string" }
            }
          }
        }
        whyOthersRejected = @{
          type = "array"
          items = @{ type = "string" }
        }
        confidenceScore = @{ type = "number"; minimum = 0; maximum = 100 }
      }
    }
  }

  $options = @{
    temperature = 0
    top_k = $TopK
    top_p = $TopP
    seed = $Seed
    repeat_penalty = $RepeatPenalty
    num_ctx = $NumCtx
    num_predict = $NumPredict
  }
  if ($NumBatch -gt 0) {
    $options.num_batch = $NumBatch
  }
  if ($NumGpuLayers -gt 0) {
    $options.num_gpu_layers = $NumGpuLayers
  }

  $body = @{
    model = $Model
    prompt = $prompt
    format = $format
    stream = $false
    options = $options
  } | ConvertTo-Json -Depth 10

  try {
    $resp = Invoke-RestMethod -Uri "$BaseUrl/api/generate" -Method Post -Body $body -ContentType "application/json" -TimeoutSec $TimeoutSec
  } catch {
    $runResult.errors = @("API error: $($_.Exception.Message)")
    $summary.runs += $runResult
    if ($gpuJob) { Stop-Job -Job $gpuJob | Out-Null }
    continue
  } finally {
    if ($gpuJob) {
      Wait-Job -Job $gpuJob -Timeout 5 | Out-Null
      if ($gpuJob.State -eq "Running") { Stop-Job -Job $gpuJob | Out-Null }
      Remove-Job -Job $gpuJob | Out-Null
    }
  }

  $raw = [string]$resp.response

  $jsonPath = Join-Path $outDir ("run{0}.json" -f $i)
  Set-Content -Path $jsonPath -Value $raw -Encoding ASCII

  try {
    $parsed = $raw | ConvertFrom-Json
    $errors = Test-TradeSchema -obj $parsed -allowedExpiry $allowedExpiry -allowedLegs $allowedLegs
    if ($errors.Count -eq 0) {
      $runResult.ok = $true
    } else {
      $runResult.errors = $errors
    }
  } catch {
    $runResult.errors = @("Invalid JSON: $($_.Exception.Message)")
  }

  if ($gpuLogPath) {
    $runResult.gpuLog = $gpuLogPath
    $runResult.gpuUsage = Parse-GpuLog -Path $gpuLogPath
  }

  if ($CheckProcessor) {
    $sshTarget = "{0}@{1}" -f $SshUser, $SshHost
    $proc = Get-ProcessorShare -SshExePath $SshExe -Target $sshTarget -Port $SshPort -ModelName $Model
    $runResult.processor = $proc
    if ($proc.cpuPct -ne $null) {
      $runResult.gpuOnly = ($proc.cpuPct -eq 0)
    }
  }

  $summary.runs += $runResult
}

$summaryPath = Join-Path $outDir "summary.json"
$summary | ConvertTo-Json -Depth 6 | Set-Content -Path $summaryPath -Encoding ASCII

$summary | ConvertTo-Json -Depth 6
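For orientation, a minimal sketch of one per-run record that this loop appends to summary.json (field names are taken from the script above; values and the gpuUsage/processor shapes are illustrative, since they come from the Parse-GpuLog and Get-ProcessorShare helpers defined earlier in the file):

{
  "ok": false,
  "errors": ["Invalid JSON: ..."],
  "gpuLog": "<gpu log path>",
  "gpuUsage": "<as returned by Parse-GpuLog>",
  "processor": { "cpuPct": 0 },
  "gpuOnly": true
}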
155
prompt_crwv.txt
Normal file
File diff suppressed because one or more lines are too long
1
query.sql
Normal file
@@ -0,0 +1 @@
SELECT p.title, p.privacy FROM playlists p JOIN users u ON p.author = u.email WHERE u.email = 'rushabh';
8
requirements.txt
Normal file
@@ -0,0 +1,8 @@
fastapi==0.115.6
uvicorn==0.30.6
httpx==0.27.2
pytest==8.3.3
respx==0.21.1
pytest-asyncio==0.24.0
PyYAML==6.0.3
websockets==12.0
116
scripts/deploy_truenas_wrapper.py
Normal file
@@ -0,0 +1,116 @@
import argparse
import asyncio
import json
import ssl
from typing import Any, Optional

import websockets


async def _rpc_call(ws_url: str, api_key: str, method: str, params: Optional[list] = None, verify_ssl: bool = False) -> Any:
    ssl_ctx = None
    if ws_url.startswith("wss://") and not verify_ssl:
        ssl_ctx = ssl.create_default_context()
        ssl_ctx.check_hostname = False
        ssl_ctx.verify_mode = ssl.CERT_NONE

    async with websockets.connect(ws_url, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect to TrueNAS websocket")

        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [api_key]}))
        auth_resp = json.loads(await ws.recv())
        if not auth_resp.get("result"):
            raise RuntimeError("API key authentication failed")

        req_id = 2
        await ws.send(json.dumps({"id": req_id, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != req_id:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


async def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--ws-url", required=True)
    parser.add_argument("--api-key", required=True)
    parser.add_argument("--api-user")
    parser.add_argument("--app-name", required=True)
    parser.add_argument("--image", required=True)
    parser.add_argument("--model-host-path", required=True)
    parser.add_argument("--llamacpp-base-url", required=True)
    parser.add_argument("--network", required=True)
    parser.add_argument("--api-port", type=int, default=9091)
    parser.add_argument("--ui-port", type=int, default=9092)
    parser.add_argument("--verify-ssl", action="store_true")
    args = parser.parse_args()

    api_port = args.api_port
    ui_port = args.ui_port

    env = {
        "PORT_A": str(api_port),
        "PORT_B": str(ui_port),
        "LLAMACPP_BASE_URL": args.llamacpp_base_url,
        "MODEL_DIR": "/models",
        "TRUENAS_WS_URL": args.ws_url,
        "TRUENAS_API_KEY": args.api_key,
        "TRUENAS_APP_NAME": "llamacpp",
        "TRUENAS_VERIFY_SSL": "false",
    }
    if args.api_user:
        env["TRUENAS_API_USER"] = args.api_user

    compose = {
        "services": {
            "wrapper": {
                "image": args.image,
                "restart": "unless-stopped",
                "ports": [
                    f"{api_port}:{api_port}",
                    f"{ui_port}:{ui_port}",
                ],
                "environment": env,
                "volumes": [
                    f"{args.model_host_path}:/models",
                    "/var/run/docker.sock:/var/run/docker.sock",
                ],
                "networks": ["llamacpp_net"],
            }
        },
        "networks": {
            "llamacpp_net": {"external": True, "name": args.network}
        },
    }

    create_payload = {
        "custom_app": True,
        "app_name": args.app_name,
        "custom_compose_config": compose,
    }

    existing = await _rpc_call(args.ws_url, args.api_key, "app.query", [[["id", "=", args.app_name]]], args.verify_ssl)
    if existing:
        result = await _rpc_call(
            args.ws_url,
            args.api_key,
            "app.update",
            [args.app_name, {"custom_compose_config": compose}],
            args.verify_ssl,
        )
        action = "updated"
    else:
        result = await _rpc_call(args.ws_url, args.api_key, "app.create", [create_payload], args.verify_ssl)
        action = "created"

    print(json.dumps({"action": action, "api_port": api_port, "ui_port": ui_port, "result": result}, indent=2))


if __name__ == "__main__":
    asyncio.run(main())
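A quick way to confirm the result out-of-band is to reuse the same RPC helper; a minimal sketch, assuming the script is importable as a module and with placeholder connection values:

import asyncio

from deploy_truenas_wrapper import _rpc_call  # import path is an assumption

async def check() -> None:
    # Mirrors the existence check above: app.query with an id filter.
    apps = await _rpc_call(
        "wss://192.168.1.2/websocket",  # --ws-url value (placeholder)
        "<api-key>",                    # --api-key value (placeholder)
        "app.query",
        [[["id", "=", "<app-name>"]]],
    )
    print(apps if apps else "app not found")

asyncio.run(check())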
162
scripts/remote_wrapper_test.py
Normal file
@@ -0,0 +1,162 @@
import json
import os
import time
from datetime import datetime

import requests


BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9000")
UPSTREAM = os.getenv("LLAMACPP_BASE", "http://192.168.1.2:8071")
RUNS = int(os.getenv("RUNS", "100"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "4"))
TIMEOUT = int(os.getenv("REQ_TIMEOUT", "300"))


def _now():
    return datetime.utcnow().isoformat() + "Z"


def _get_loaded_model_id():
    deadline = time.time() + 600
    last_error = None
    while time.time() < deadline:
        try:
            resp = requests.get(UPSTREAM + "/v1/models", timeout=30)
            resp.raise_for_status()
            data = resp.json().get("data") or []
            if data:
                return data[0].get("id")
            last_error = "no models reported by upstream"
        except Exception as exc:
            last_error = str(exc)
        time.sleep(5)
    raise RuntimeError(f"upstream not ready: {last_error}")


def _stream_ok(resp):
    got_data = False
    got_done = False
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data:"):
            got_data = True
        if line.strip() == "data: [DONE]":
            got_done = True
            break
    return got_data, got_done


def run_suite(model_id, idx):
    results = {}

    # Models
    r = requests.get(BASE + "/v1/models", timeout=30)
    results["models"] = r.status_code

    r = requests.get(BASE + f"/v1/models/{model_id}", timeout=30)
    results["model_get"] = r.status_code

    # Chat completions non-stream
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": f"Run {idx}: say ok."}],
        "max_tokens": MAX_TOKENS,
        "temperature": (idx % 5) / 10.0,
    }
    r = requests.post(BASE + "/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["chat"] = r.status_code

    # Chat completions stream
    payload_stream = dict(payload)
    payload_stream["stream"] = True
    r = requests.post(BASE + "/v1/chat/completions", json=payload_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["chat_stream"] = r.status_code
    results["chat_stream_ok"] = ok_data and ok_done

    # Responses non-stream
    payload_resp = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "max_output_tokens": MAX_TOKENS,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp, timeout=TIMEOUT)
    results["responses"] = r.status_code

    # Responses stream
    payload_resp_stream = {
        "model": model_id,
        "input": f"Run {idx}: say ok.",
        "stream": True,
    }
    r = requests.post(BASE + "/v1/responses", json=payload_resp_stream, stream=True, timeout=TIMEOUT)
    ok_data, ok_done = _stream_ok(r)
    results["responses_stream"] = r.status_code
    results["responses_stream_ok"] = ok_data and ok_done

    # Embeddings (best effort)
    payload_emb = {"model": model_id, "input": f"Run {idx}"}
    r = requests.post(BASE + "/v1/embeddings", json=payload_emb, timeout=TIMEOUT)
    results["embeddings"] = r.status_code

    # Proxy
    r = requests.post(BASE + "/proxy/llamacpp/v1/chat/completions", json=payload, timeout=TIMEOUT)
    results["proxy"] = r.status_code

    return results


def main():
    summary = {
        "started_at": _now(),
        "base": BASE,
        "upstream": UPSTREAM,
        "runs": RUNS,
        "max_tokens": MAX_TOKENS,
        "results": [],
    }

    model_id = _get_loaded_model_id()
    summary["model_id"] = model_id

    for i in range(1, RUNS + 1):
        start = time.time()
        try:
            results = run_suite(model_id, i)
            ok = all(
                results.get(key) == 200
                for key in ("models", "model_get", "chat", "chat_stream", "responses", "responses_stream", "proxy")
            )
            stream_ok = results.get("chat_stream_ok") and results.get("responses_stream_ok")
            summary["results"].append({
                "run": i,
                "ok": ok and stream_ok,
                "stream_ok": stream_ok,
                "status": results,
                "elapsed_s": round(time.time() - start, 2),
            })
        except Exception as exc:
            summary["results"].append({
                "run": i,
                "ok": False,
                "stream_ok": False,
                "error": str(exc),
                "elapsed_s": round(time.time() - start, 2),
            })
        print(f"Run {i}/{RUNS} done")

    summary["finished_at"] = _now()

    os.makedirs("reports", exist_ok=True)
    out_path = os.path.join("reports", "remote_wrapper_test.json")
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)

    # Print a compact summary
    ok_count = sum(1 for r in summary["results"] if r.get("ok"))
    print(f"OK {ok_count}/{RUNS}")


if __name__ == "__main__":
    main()
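For later analysis, a minimal sketch that re-reads the report written above (path and keys match the summary dict in this script; the 30-second "slow" threshold is an arbitrary example):

import json

with open("reports/remote_wrapper_test.json", encoding="utf-8") as f:
    summary = json.load(f)

ok = sum(1 for r in summary["results"] if r.get("ok"))
slow = [r["run"] for r in summary["results"] if r.get("elapsed_s", 0) > 30]
print(f"{ok}/{summary['runs']} runs fully OK; slow runs: {slow}")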
29
scripts/update_llamacpp_flags.ps1
Normal file
@@ -0,0 +1,29 @@
param(
    [string]$OutDocs = "reports\llamacpp_docs.md",
    [string]$OutFlags = "reports\llamacpp_flags.txt"
)

$urls = @(
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/examples/server/README-llama-server.md",
    "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
)

$out = @()
foreach ($u in $urls) {
    try {
        $content = Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 30
        $out += "# Source: $u"
        $out += $content.Content
    } catch {
        $out += "# Source: $u"
        $out += "(failed to fetch)"
    }
}

$out | Set-Content -Encoding UTF8 $OutDocs

$docs = Get-Content $OutDocs -Raw
$flags = [regex]::Matches($docs, "--[a-zA-Z0-9\-]+") | ForEach-Object { $_.Value }
$flags = $flags | Sort-Object -Unique
$flags | Set-Content -Encoding UTF8 $OutFlags
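The same flag scrape in Python, as a rough equivalent sketch (single URL shown; assumes network access):

import re
import urllib.request

url = "https://raw.githubusercontent.com/ggerganov/llama.cpp/master/README.md"
text = urllib.request.urlopen(url, timeout=30).read().decode("utf-8", "replace")
# Same idea as the PowerShell regex: collect long-form flags such as --ctx-size.
flags = sorted(set(re.findall(r"--[a-zA-Z0-9\-]+", text)))
print("\n".join(flags))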
60
tests/conftest.py
Normal file
@@ -0,0 +1,60 @@
import json
from pathlib import Path

import pytest
from fastapi.testclient import TestClient
import respx

from app.api_app import create_api_app
from app.ui_app import create_ui_app


@pytest.fixture()
def agents_config(tmp_path: Path) -> Path:
    data = {
        "image": "ghcr.io/ggml-org/llama.cpp:server-cuda",
        "container_name": "ix-llamacpp-llamacpp-1",
        "host_port": 8071,
        "container_port": 8080,
        "web_ui_url": "http://0.0.0.0:8071/",
        "model_host_path": str(tmp_path),
        "model_container_path": str(tmp_path),
        "models": [],
        "network": "ix-llamacpp_default",
        "subnets": ["172.16.18.0/24"],
        "gpu_count": 2,
        "gpu_name": "NVIDIA RTX 5060 Ti",
    }
    path = tmp_path / "agents_config.json"
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


@pytest.fixture()
def model_dir(tmp_path: Path) -> Path:
    (tmp_path / "model-a.gguf").write_text("x", encoding="utf-8")
    (tmp_path / "model-b.gguf").write_text("y", encoding="utf-8")
    return tmp_path


@pytest.fixture()
def api_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    monkeypatch.setenv("LLAMACPP_BASE_URL", "http://llama.test")
    app = create_api_app()
    return TestClient(app)


@pytest.fixture()
def ui_client(monkeypatch: pytest.MonkeyPatch, agents_config: Path, model_dir: Path):
    monkeypatch.setenv("AGENTS_CONFIG_PATH", str(agents_config))
    monkeypatch.setenv("MODEL_DIR", str(model_dir))
    app = create_ui_app()
    return TestClient(app)


@pytest.fixture()
def respx_mock():
    with respx.mock(assert_all_called=False) as mock:
        yield mock
76
tests/test_chat_completions.py
Normal file
@@ -0,0 +1,76 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"id": f"chatcmpl-{case}", "choices": [{"message": {"content": "ok"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "temperature": (case % 10) / 10,
    }
    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["choices"][0]["message"]["content"] == "ok"


@pytest.mark.parametrize("case", list(range(120)))
def test_chat_completions_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        content = b"data: {\"id\": \"chunk\"}\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": f"hello {case}"}],
        "stream": True,
    }
    with api_client.stream("POST", "/v1/chat/completions", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
        assert b"data:" in body


def test_chat_completions_tools_normalize(api_client, respx_mock):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def handler(request):
        data = request.json()
        tools = data.get("tools") or []
        assert tools
        assert tools[0].get("function", {}).get("name") == "format_final_json_response"
        return httpx.Response(200, json={"id": "chatcmpl-tools", "choices": [{"message": {"content": "ok"}}]})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=handler)

    payload = {
        "model": "model-a.gguf",
        "messages": [{"role": "user", "content": "hello"}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "parameters": {"type": "object"},
            }
        ],
        "tool_choice": {"type": "function", "name": "format_final_json_response"},
    }

    resp = api_client.post("/v1/chat/completions", json=payload)
    assert resp.status_code == 200
14
tests/test_embeddings.py
Normal file
@@ -0,0 +1,14 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_embeddings(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/v1/embeddings").mock(
        return_value=httpx.Response(200, json={"data": [{"embedding": [0.1, 0.2]}]})
    )
    payload = {"model": "model-a.gguf", "input": f"text-{case}"}
    resp = api_client.post("/v1/embeddings", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert "data" in data
24
tests/test_models.py
Normal file
@@ -0,0 +1,24 @@
import pytest


@pytest.mark.parametrize("case", list(range(120)))
def test_list_models_cases(api_client, case):
    resp = api_client.get("/v1/models", headers={"x-case": str(case)})
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["object"] == "list"
    assert isinstance(payload["data"], list)


@pytest.mark.parametrize("model_id", ["model-a.gguf"] * 120)
def test_get_model_ok(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 200
    payload = resp.json()
    assert payload["id"] == model_id


@pytest.mark.parametrize("model_id", [f"missing-{i}" for i in range(120)])
def test_get_model_not_found(api_client, model_id):
    resp = api_client.get(f"/v1/models/{model_id}")
    assert resp.status_code == 404
12
tests/test_proxy.py
Normal file
@@ -0,0 +1,12 @@
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_proxy_passthrough(api_client, respx_mock, case):
    respx_mock.post("http://llama.test/test/path").mock(
        return_value=httpx.Response(200, content=f"ok-{case}".encode())
    )
    resp = api_client.post("/proxy/llamacpp/test/path", content=b"hello")
    assert resp.status_code == 200
    assert resp.content.startswith(b"ok-")
282
tests/test_remote_wrapper.py
Normal file
@@ -0,0 +1,282 @@
import json
import os
import ssl
import time
from typing import Dict, List

import pytest
import requests
import websockets

WRAPPER_BASE = os.getenv("WRAPPER_BASE", "http://192.168.1.2:9093")
UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")
TRUENAS_WS_URL = os.getenv("TRUENAS_WS_URL", "wss://192.168.1.2/websocket")
TRUENAS_API_KEY = os.getenv("TRUENAS_API_KEY", "")
TRUENAS_APP_NAME = os.getenv("TRUENAS_APP_NAME", "llamacpp")
MODEL_REQUEST = os.getenv("MODEL_REQUEST", "")


async def _rpc_call(method: str, params: List | None = None):
    if not TRUENAS_API_KEY:
        pytest.skip("TRUENAS_API_KEY not set")
    ssl_ctx = ssl.create_default_context()
    ssl_ctx.check_hostname = False
    ssl_ctx.verify_mode = ssl.CERT_NONE
    async with websockets.connect(TRUENAS_WS_URL, ssl=ssl_ctx) as ws:
        await ws.send(json.dumps({"msg": "connect", "version": "1", "support": ["1"]}))
        connected = json.loads(await ws.recv())
        if connected.get("msg") != "connected":
            raise RuntimeError("failed to connect")
        await ws.send(json.dumps({"id": 1, "msg": "method", "method": "auth.login_with_api_key", "params": [TRUENAS_API_KEY]}))
        auth = json.loads(await ws.recv())
        if not auth.get("result"):
            raise RuntimeError("auth failed")
        await ws.send(json.dumps({"id": 2, "msg": "method", "method": method, "params": params or []}))
        while True:
            raw = json.loads(await ws.recv())
            if raw.get("id") != 2:
                continue
            if raw.get("msg") == "error":
                raise RuntimeError(raw.get("error"))
            return raw.get("result")


def _get_models() -> List[str]:
    _wait_for_http(WRAPPER_BASE + "/health")
    resp = requests.get(WRAPPER_BASE + "/v1/models", timeout=30)
    resp.raise_for_status()
    data = resp.json().get("data") or []
    return [m.get("id") for m in data if m.get("id")]


def _assert_chat_ok(resp_json: Dict) -> str:
    choices = resp_json.get("choices") or []
    assert choices, "no choices"
    message = choices[0].get("message") or {}
    text = message.get("content") or ""
    assert text.strip(), "empty content"
    return text


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def _post_with_retry(url: str, payload: Dict, timeout_s: float = 300, retries: int = 6, delay_s: float = 5.0):
    last = None
    for _ in range(retries):
        try:
            resp = requests.post(url, json=payload, timeout=timeout_s)
            if resp.status_code == 200:
                return resp
            last = resp
        except requests.exceptions.RequestException as exc:
            last = exc
        time.sleep(delay_s)
    if isinstance(last, Exception):
        raise last
    return last


@pytest.mark.asyncio
async def test_active_model_and_multi_gpu_flags():
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    assert "--tensor-split" in command
    split_idx = command.index("--tensor-split") + 1
    split = command[split_idx]
    assert "," in split, f"tensor-split missing commas: {split}"
    assert "--split-mode" in command


def test_models_listed():
    models = _get_models()
    assert models, "no models discovered"


def test_chat_completions_switch_and_prompts():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 12,
        "temperature": 0,
    }
    for _ in range(3):
        resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
        assert resp.status_code == 200
        _assert_chat_ok(resp.json())


def test_tools_flat_format():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "tools": [
            {
                "type": "function",
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_functions_payload_normalized():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Say OK and do not call tools."}],
        "functions": [
            {
                "name": "format_final_json_response",
                "description": "format output",
                "parameters": {
                    "type": "object",
                    "properties": {"ok": {"type": "boolean"}},
                    "required": ["ok"],
                },
            }
        ],
        "max_tokens": 12,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    _assert_chat_ok(resp.json())


def test_return_format_json():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": "Return JSON with key ok true."}],
        "return_format": "json",
        "max_tokens": 32,
        "temperature": 0,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/chat/completions", payload)
    assert resp.status_code == 200
    text = _assert_chat_ok(resp.json())
    parsed = json.loads(text)
    assert isinstance(parsed, dict)


def test_responses_endpoint():
    models = _get_models()
    assert models, "no models"
    if MODEL_REQUEST:
        assert MODEL_REQUEST in models, f"MODEL_REQUEST not found: {MODEL_REQUEST}"
        model_id = MODEL_REQUEST
    else:
        model_id = models[0]
    payload = {
        "model": model_id,
        "input": "Say OK.",
        "max_output_tokens": 16,
    }
    resp = _post_with_retry(WRAPPER_BASE + "/v1/responses", payload)
    assert resp.status_code == 200
    output = resp.json().get("output") or []
    assert output, "responses output empty"
    content = output[0].get("content") or []
    text = content[0].get("text") if content else ""
    assert text and text.strip()


@pytest.mark.asyncio
async def test_model_switch_applied_to_truenas():
    models = _get_models()
    assert models, "no models"
    target = MODEL_REQUEST or models[0]
    assert target in models, f"MODEL_REQUEST not found: {target}"
    resp = requests.post(UI_BASE + "/ui/api/switch-model", json={"model_id": target, "warmup_prompt": "warmup"}, timeout=600)
    assert resp.status_code == 200
    cfg = await _rpc_call("app.config", [TRUENAS_APP_NAME])
    command = cfg.get("command") or []
    assert "--model" in command
    model_path = command[command.index("--model") + 1]
    assert model_path.endswith(target)


def test_invalid_model_rejected():
    models = _get_models()
    assert models, "no models"
    payload = {
        "model": "modelx-q8:4b",
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 8,
        "temperature": 0,
    }
    resp = requests.post(WRAPPER_BASE + "/v1/chat/completions", json=payload, timeout=60)
    assert resp.status_code == 404


def test_llamacpp_logs_streaming():
    logs = ""
    for _ in range(5):
        try:
            resp = requests.get(UI_BASE + "/ui/api/llamacpp-logs", timeout=10)
            if resp.status_code == 200:
                logs = resp.json().get("logs") or ""
                if logs.strip():
                    break
        except requests.exceptions.ReadTimeout:
            pass
        time.sleep(2)
    assert logs.strip(), "no logs returned"

    # Force a log line before streaming.
    try:
        requests.get(WRAPPER_BASE + "/proxy/llamacpp/health", timeout=5)
    except Exception:
        pass

    # Stream endpoint may not emit immediately, so validate that the endpoint responds.
    with requests.get(UI_BASE + "/ui/api/llamacpp-logs/stream", stream=True, timeout=(5, 5)) as resp:
        assert resp.status_code == 200
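Unlike the respx-mocked suites above, these tests exercise a live deployment; a sketch of the environment they read (defaults come from the module constants above, concrete values illustrative):

# WRAPPER_BASE=http://192.168.1.2:9093        wrapper API under test
# UI_BASE=http://192.168.1.2:9094             wrapper UI under test
# TRUENAS_WS_URL=wss://192.168.1.2/websocket
# TRUENAS_API_KEY=<key>                       RPC-backed tests skip when unset
# TRUENAS_APP_NAME=llamacpp
# MODEL_REQUEST=<model.gguf>                  optional: pin a specific model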
55
tests/test_responses.py
Normal file
@@ -0,0 +1,55 @@
import json
import pytest
import httpx


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_non_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )
    respx_mock.post("http://llama.test/v1/chat/completions").mock(
        return_value=httpx.Response(200, json={"choices": [{"message": {"content": f"reply-{case}"}}]})
    )

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "max_output_tokens": 32,
    }
    resp = api_client.post("/v1/responses", json=payload)
    assert resp.status_code == 200
    data = resp.json()
    assert data["object"] == "response"
    assert data["output"][0]["content"][0]["text"].startswith("reply-")


@pytest.mark.parametrize("case", list(range(120)))
def test_responses_stream(api_client, respx_mock, case):
    respx_mock.get("http://llama.test/v1/models").mock(
        return_value=httpx.Response(200, json={"data": [{"id": "model-a.gguf"}]})
    )

    def stream_response(request):
        payload = {
            "id": "chunk",
            "object": "chat.completion.chunk",
            "choices": [{"delta": {"content": f"hi-{case}"}, "index": 0, "finish_reason": None}],
        }
        content = f"data: {json.dumps(payload)}\n\n".encode()
        content += b"data: [DONE]\n\n"
        return httpx.Response(200, content=content, headers={"Content-Type": "text/event-stream"})

    respx_mock.post("http://llama.test/v1/chat/completions").mock(side_effect=stream_response)

    payload = {
        "model": "model-a.gguf",
        "input": f"prompt-{case}",
        "stream": True,
    }
    with api_client.stream("POST", "/v1/responses", json=payload) as resp:
        assert resp.status_code == 200
        body = b"".join(resp.iter_bytes())
        assert b"event: response.created" in body
        assert b"event: response.output_text.delta" in body
        assert b"event: response.completed" in body
53
tests/test_truenas_switch.py
Normal file
@@ -0,0 +1,53 @@
import pytest

from app.truenas_middleware import TrueNASConfig, switch_model


@pytest.mark.asyncio
@pytest.mark.parametrize("case", list(range(120)))
async def test_switch_model_updates_command(monkeypatch, case):
    compose = {
        "services": {
            "llamacpp": {
                "command": [
                    "--model",
                    "/models/old.gguf",
                    "--ctx-size",
                    "2048",
                ]
            }
        }
    }

    captured = {}

    async def fake_rpc_call(cfg, method, params=None):
        if method == "app.config":
            return {"custom_compose_config": compose}
        if method == "app.update":
            captured["payload"] = params[1]
            return {"state": "RUNNING"}
        raise AssertionError(f"unexpected method {method}")

    monkeypatch.setattr("app.truenas_middleware._rpc_call", fake_rpc_call)

    cfg = TrueNASConfig(
        ws_url="ws://truenas.test/websocket",
        api_key="key",
        api_user=None,
        app_name="llamacpp",
        verify_ssl=False,
    )

    await switch_model(
        cfg,
        f"/models/new-{case}.gguf",
        {"n_gpu_layers": "999"},
        "--flash-attn on",
    )

    assert "custom_compose_config" in captured["payload"]
    cmd = captured["payload"]["custom_compose_config"]["services"]["llamacpp"]["command"]
    assert "--model" in cmd
    idx = cmd.index("--model")
    assert cmd[idx + 1].endswith(f"new-{case}.gguf")
46
tests/test_ui.py
Normal file
@@ -0,0 +1,46 @@
import os
import time

import requests

UI_BASE = os.getenv("UI_BASE", "http://192.168.1.2:9094")


def _wait_for_http(url: str, timeout_s: float = 90) -> None:
    deadline = time.time() + timeout_s
    last_err = None
    while time.time() < deadline:
        try:
            resp = requests.get(url, timeout=5)
            if resp.status_code == 200:
                return
            last_err = f"status {resp.status_code}"
        except Exception as exc:
            last_err = str(exc)
        time.sleep(2)
    raise RuntimeError(f"service not ready: {url} ({last_err})")


def test_ui_index_contains_expected_elements():
    _wait_for_http(UI_BASE + "/health")
    resp = requests.get(UI_BASE + "/", timeout=30)
    assert resp.status_code == 200
    html = resp.text
    assert "Model Manager" in html
    assert "id=\"download-form\"" in html
    assert "id=\"models-list\"" in html
    assert "id=\"logs-output\"" in html
    assert "id=\"theme-toggle\"" in html


def test_ui_assets_available():
    resp = requests.get(UI_BASE + "/ui/styles.css", timeout=30)
    assert resp.status_code == 200
    css = resp.text
    assert "data-theme" in css

    resp = requests.get(UI_BASE + "/ui/app.js", timeout=30)
    assert resp.status_code == 200
    js = resp.text
    assert "themeToggle" in js
    assert "localStorage" in js
    assert "logs-output" in js
1
tmp_channels_cols.sql
Normal file
@@ -0,0 +1 @@
SELECT column_name, data_type FROM information_schema.columns WHERE table_name='channels' ORDER BY ordinal_position;
1
tmp_pref_type.sql
Normal file
@@ -0,0 +1 @@
SELECT data_type FROM information_schema.columns WHERE table_name='users' AND column_name='preferences';
1
tmp_update_max_results.sql
Normal file
@@ -0,0 +1 @@
UPDATE users SET preferences = (jsonb_set(preferences::jsonb, '{max_results}', '200'::jsonb, true))::text WHERE email='rushabh';
56
trades_company_stock.txt
Normal file
@@ -0,0 +1,56 @@
You are a senior quantitative options trader (index/ETF options across regimes; also liquid single-name options and macro-sensitive metal ETFs), specializing in volatility, structure selection, and risk asymmetry. Decisive, skeptical, profit-focused.

You are given:
- A validated market thesis (authoritative): multi-timeframe technicals, regime, volatility context, news impact.
- Pre-processed options chains for three expiries (short / medium / extended) with liquidity-filtered contracts, ATM/delta anchors, delta ladders, and a liquid execution set.
- All pricing, greeks, spreads, and liquidity metrics required for execution-quality decisions.

Assume:
- Data is correct and cleaned.
- You must NOT re-analyze technicals or news; the thesis is authoritative.
- Your job is to convert thesis + surface into executable options trades.

Objective:
- Select the best expiry and propose 1–3 high-quality options trades that align with thesis bias/regime, exploit volatility characteristics (gamma/theta/vega fit), are liquid/fillable/risk-defined, and include clear invalidation logic.
- If no trade offers favorable risk/reward: strategyBias=NO_TRADE and explain why.

How to decide:
1) Compare expiries: match time-to-playout vs confidence/uncertainty; match vol regime (expansion vs decay); reject poor liquidity density; reject misaligned vega/theta; avoid overpaying for time/vol.
2) Choose structure class (explicitly justify vs alternatives): directional debit (single/vertical), volatility (straddle/strangle), defined-risk premium selling only if the regime supports it.
3) Select strikes ONLY from provided data (ATM anchor, delta ladder, liquidSet). Prefer tight spreads, meaningful volume & OI, and greeks that express the thesis.
4) Risk discipline: every trade must include max risk, what must go right, and what breaks the trade (invalidation).

Optional tools (use only when they materially improve decision quality; otherwise do not call):
- MarketData – Options Chain (expiry-specific): only if provided expiries do not sufficiently match the thesis horizon, or liquidity/skew is materially better in a nearby expiry not already supplied. Choose an explicit expiry date. Use returned data only for strike selection and liquidity validation. Do not re-fetch already provided expiries unless validating anomalies.
- Fear & Greed Index (FGI): only for index/ETF/macro-sensitive underlyings (e.g., SPX, NDX, IWM, SLV). Contextual only (risk appetite / convexity vs tempered), not a primary signal.

Hard constraints:
- Do NOT invent strikes, expiries, or prices.
- Do NOT suggest illiquid contracts.
- Do NOT recommend naked risk.
- Do NOT hedge unless justified.
- Do NOT repeat raw data back.

Return ONLY valid JSON in exactly this shape:
{
  "selectedExpiry": "YYYY-MM-DD",
  "expiryRationale": "Why this expiry dominates the others given thesis + vol + liquidity",
  "strategyBias": "DIRECTIONAL|VOLATILITY|NEUTRAL|NO_TRADE",
  "recommendedTrades": [
    {
      "name": "Short descriptive name",
      "structure": "e.g. Long Call, Call Debit Spread, Long Strangle",
      "legs": [{"side":"call|put","action":"buy|sell","strike":0,"expiry":"YYYY-MM-DD"}],
      "greekProfile": {"deltaBias":"POS|NEG|NEUTRAL","gammaExposure":"HIGH|MED|LOW","thetaExposure":"POS|NEG|LOW","vegaExposure":"HIGH|MED|LOW"},
      "maxRisk": "Defined numeric or qualitative",
      "maxReward": "Defined numeric or qualitative",
      "thesisAlignment": "Exactly how this trade expresses the thesis",
      "invalidation": "Clear condition where trade is wrong",
      "managementNotes": "Optional: scale, take-profit, time stop"
    }
  ],
  "whyOthersRejected": ["Why other expiries or strategy types were inferior"],
  "confidenceScore": 0
}

Final note: optimize for repeatable profitability under uncertainty. If conditions are marginal, say NO_TRADE with conviction.
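One illustrative response that satisfies this shape and the validation schema earlier in the commit (the strikes, dates, and wording are invented for the example only):

{
  "selectedExpiry": "2025-03-21",
  "expiryRationale": "Medium expiry matches the thesis playout window with the densest liquid strikes.",
  "strategyBias": "DIRECTIONAL",
  "recommendedTrades": [
    {
      "name": "Call debit spread into strength",
      "structure": "Call Debit Spread",
      "legs": [
        {"side": "call", "action": "buy", "strike": 100, "expiry": "2025-03-21"},
        {"side": "call", "action": "sell", "strike": 110, "expiry": "2025-03-21"}
      ],
      "greekProfile": {"deltaBias": "POS", "gammaExposure": "MED", "thetaExposure": "NEG", "vegaExposure": "LOW"},
      "maxRisk": "Net debit paid",
      "maxReward": "Spread width minus debit",
      "thesisAlignment": "Defined-risk long delta expresses the bullish thesis without overpaying for vol.",
      "invalidation": "Close below the thesis support level before expiry.",
      "managementNotes": "Take profit at 2x debit; time stop at 7 DTE."
    }
  ],
  "whyOthersRejected": ["Short expiry leaves insufficient time to playout; extended expiry overpays vega."],
  "confidenceScore": 72
}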