# llama.cpp OpenAI-Compatible Wrapper

This project wraps the existing llama.cpp TrueNAS app with OpenAI-compatible endpoints and a model management UI.
The wrapper reads deployment details from `AGENTS.md` at build time into `app/agents_config.json`.

## Current Agents-Derived Details

- llama.cpp image: `ghcr.io/ggml-org/llama.cpp:server-cuda`
- Host port: `8071` -> container port `8080`
- Model mount: `/mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models` -> `/models`
- Network: `ix-llamacpp_default`
- Container name: `ix-llamacpp-llamacpp-1`
- GPUs: 2x NVIDIA RTX 5060 Ti (from the AGENTS snapshot)

Regenerate the derived config after updating `AGENTS.md`:

```bash
python app/agents_parser.py --agents AGENTS.md --out app/agents_config.json
```
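
At runtime the wrapper consumes this derived config. As a rough illustration of that flow (the key names below are assumptions, not the parser's actual output schema):

```python
# Sketch only: read the agents-derived config produced by agents_parser.py.
# The key names here are assumptions, not the parser's actual schema.
import json
from pathlib import Path

cfg = json.loads(Path("app/agents_config.json").read_text())
print(cfg.get("container_name"), cfg.get("base_url"), cfg.get("model_dir"))
```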
## Running Locally

```bash
python -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
python -m app.run
```

Defaults:
- API: `PORT_A=9093`
- UI: `PORT_B=9094`
- Base URL: `LLAMACPP_BASE_URL` (defaults to the container name or localhost, depending on the agents config)
- Model dir: `MODEL_DIR=/models`
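
For a quick smoke test against the API port, assuming the wrapper exposes the standard OpenAI chat-completions path (the model name is a placeholder and the `requests` package is required):

```python
# Minimal sketch: call the wrapper's OpenAI-compatible API on PORT_A.
# Assumes the standard /v1/chat/completions path; the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:9093/v1/chat/completions",
    json={
        "model": "your-model.gguf",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```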
## Docker (TrueNAS)

Example (join existing llama.cpp network and mount models):

```bash
docker run --rm -p 9093:9093 -p 9094:9094 \
  --network ix-llamacpp_default \
  -v /mnt/fast.storage.rushg.me/datasets/apps/llama-cpp.models:/models \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e LLAMACPP_RESTART_METHOD=docker \
  -e LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1 \
  -e LLAMACPP_TARGET_CONTAINER=ix-llamacpp-llamacpp-1 \
  -e TRUENAS_WS_URL=ws://192.168.1.2/websocket \
  -e TRUENAS_API_KEY=YOUR_KEY \
  -e TRUENAS_API_USER=YOUR_USER \
  -e TRUENAS_APP_NAME=llamacpp \
  -e LLAMACPP_BASE_URL=http://ix-llamacpp-llamacpp-1:8080 \
  -e PORT_A=9093 -e PORT_B=9094 \
  llama-cpp-openai-wrapper:latest
```
## Model Hot-Swap / Restart Hooks

This wrapper does not modify llama.cpp by default. To enable hot-swap/restart when new models are added or a different model is selected, provide one of the restart methods below (a minimal sketch of an HTTP restart helper follows the list):

- `LLAMACPP_RESTART_METHOD=http`
- `LLAMACPP_RESTART_URL=http://host-or-helper/restart`

or

- `LLAMACPP_RESTART_METHOD=shell`
- `LLAMACPP_RESTART_COMMAND="/usr/local/bin/your-restart-script --arg"`

or (requires mounting the docker socket)

- `LLAMACPP_RESTART_METHOD=docker`
- `LLAMACPP_RESTART_COMMAND=ix-llamacpp-llamacpp-1`
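
For the `http` method, the restart URL can point at any small helper you run yourself. A hypothetical sketch (the container name comes from the details above; the listen port and the assumption that the wrapper issues a POST are illustrative only):

```python
# Hypothetical helper that LLAMACPP_RESTART_URL could point to when using
# LLAMACPP_RESTART_METHOD=http. It restarts the llama.cpp container via the
# Docker CLI; port 9095 and the accepted HTTP methods are assumptions.
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

CONTAINER = "ix-llamacpp-llamacpp-1"  # from the AGENTS-derived details above

class RestartHandler(BaseHTTPRequestHandler):
    def _restart(self):
        if self.path != "/restart":
            self.send_response(404)
            self.end_headers()
            return
        result = subprocess.run(
            ["docker", "restart", CONTAINER], capture_output=True, text=True
        )
        self.send_response(200 if result.returncode == 0 else 500)
        self.end_headers()
        self.wfile.write((result.stdout or result.stderr).encode())

    do_GET = _restart
    do_POST = _restart

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 9095), RestartHandler).serve_forever()
```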
## Model switching via TrueNAS middleware (P0)

Provide TrueNAS API credentials so the wrapper can update the llama.cpp app command when a new model is selected:

```
TRUENAS_WS_URL=ws://192.168.1.2/websocket
TRUENAS_API_KEY=YOUR_KEY
TRUENAS_API_USER=YOUR_USER
TRUENAS_APP_NAME=llamacpp
TRUENAS_VERIFY_SSL=false
```

The wrapper preserves the existing flags in the compose command and only updates `--model`, optionally adding any missing GPU split flags from `LLAMACPP_*` variables that are not already set (a rough sketch of this rewrite follows the flag lists below).

Optional arguments passed to restart handlers:

```
LLAMACPP_DEVICES=0,1
LLAMACPP_TENSOR_SPLIT=0.5,0.5
LLAMACPP_SPLIT_MODE=layer
LLAMACPP_N_GPU_LAYERS=999
LLAMACPP_CTX_SIZE=8192
LLAMACPP_BATCH_SIZE=1024
LLAMACPP_UBATCH_SIZE=256
LLAMACPP_CACHE_TYPE_K=q4_0
LLAMACPP_CACHE_TYPE_V=q4_0
LLAMACPP_FLASH_ATTN=on
```

You can also pass arbitrary llama.cpp flags (space-separated) via:

```
LLAMACPP_EXTRA_ARGS="--mlock --no-mmap --rope-scaling linear"
```
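
For illustration, a rough sketch of the `--model` rewrite described above: swap the model path in an existing server command and append GPU-related flags from the environment only when they are missing. Function and variable names are illustrative, not the wrapper's internals.

```python
# Illustrative sketch of the command rewrite, not the wrapper's actual code.
# Flag names follow the llama.cpp server CLI (--model, --tensor-split, ...).
import os
import shlex

def rewrite_command(command: str, model_path: str) -> str:
    args = shlex.split(command)
    # Update the existing --model value in place, or append one if missing.
    if "--model" in args:
        args[args.index("--model") + 1] = model_path
    else:
        args += ["--model", model_path]
    # Add GPU split flags from LLAMACPP_* env vars only if not already set.
    optional = {
        "--tensor-split": os.getenv("LLAMACPP_TENSOR_SPLIT"),
        "--split-mode": os.getenv("LLAMACPP_SPLIT_MODE"),
        "--n-gpu-layers": os.getenv("LLAMACPP_N_GPU_LAYERS"),
        "--ctx-size": os.getenv("LLAMACPP_CTX_SIZE"),
    }
    for flag, value in optional.items():
        if value and flag not in args:
            args += [flag, value]
    return shlex.join(args)

print(rewrite_command(
    "/app/llama-server --host 0.0.0.0 --model /models/old.gguf",
    "/models/new.gguf",
))
```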
## Model Manager UI

Open `http://HOST:PORT_B/`.

Features:
- List existing models
- Download models via URL
- Live progress + cancel (see the download sketch below)
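
The download flow follows the usual streamed-download pattern; a rough, hypothetical sketch (URL, destination, and the cancel hook are placeholders, not the wrapper's code):

```python
# Hypothetical sketch of a streamed download with progress and cooperative
# cancel; not the wrapper's actual implementation.
import requests

def download(url: str, dest: str, cancelled=lambda: False) -> bool:
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        total = int(resp.headers.get("Content-Length", 0))
        done = 0
        with open(dest, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                if cancelled():  # cooperative cancel check between chunks
                    return False
                fh.write(chunk)
                done += len(chunk)
                if total:
                    print(f"\rprogress: {done * 100 // total}%", end="")
    return True
```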
## Testing

Tests are parameterized with 100+ cases per endpoint.

```bash
pytest -q
```
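
The parameterized style looks roughly like the sketch below; the endpoint path, payload shape, and expected status codes are assumptions, not the suite's actual fixtures.

```python
# Illustrative pytest parameterization sketch; endpoint and payloads are
# assumptions, not the real test suite.
import pytest
import requests

BASE_URL = "http://localhost:9093"  # PORT_A default

CASES = [{"model": f"model-{i}.gguf", "prompt": f"case {i}"} for i in range(100)]

@pytest.mark.parametrize("payload", CASES)
def test_completions_accepts_payload(payload):
    resp = requests.post(f"{BASE_URL}/v1/completions", json=payload, timeout=30)
    assert resp.status_code in (200, 400, 404)
```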
## llama.cpp flags reference

Flag documentation is scraped from the upstream llama.cpp docs into `reports/llamacpp_docs.md` and `reports/llamacpp_flags.txt`:

```
pwsh scripts/update_llamacpp_flags.ps1
```