Initial commit

Rushabh Gosar
2026-01-07 16:54:39 -08:00
commit 5d1a0ee72b
53 changed files with 9885 additions and 0 deletions


@@ -0,0 +1,60 @@
# llama.cpp Wrapper Notes
Last updated: 2026-01-04
## Purpose
An OpenAI-compatible wrapper around the existing `llamacpp` app, adding a model manager UI,
model switching, and parameter management via TrueNAS middleware.
## Deployed Image
- `rushabhtechie/llamacpp-wrapper-rushg-d:20260104-112221`
## Ports (current)
- API (pinned): `http://192.168.1.2:9093`
- UI (pinned): `http://192.168.1.2:9094`
- llama.cpp native: `http://192.168.1.2:8071`
## Key Behaviors
- Model switching uses TrueNAS middleware `app.update` to update `--model`.
- `--device` flag is explicitly removed because it crashes llama.cpp on this host.
- UI shows active model and supports switching with verification prompt.
- UI auto-refreshes on download progress and on llama.cpp model changes (SSE).
- UI allows editing llama.cpp command parameters (ctx-size, temp, top-k/p, etc.).
- UI supports dark theme toggle (persisted in localStorage).
- UI streams llama.cpp logs via Docker socket fallback when TrueNAS log APIs are unavailable.
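The Docker socket fallback in the last bullet can be sketched with the Docker SDK for Python. This is a minimal sketch, not the wrapper's actual code; the socket path and container name are assumptions.

```
import docker

# Talk to the local Docker socket directly (assumed path) when the
# TrueNAS log APIs are unavailable.
client = docker.DockerClient(base_url="unix:///var/run/docker.sock")

# Container name is a placeholder; the real llama.cpp container name differs.
container = client.containers.get("ix-llamacpp-llamacpp-1")

# Follow the log stream and forward each chunk (e.g. into an SSE response).
for chunk in container.logs(stream=True, follow=True, tail=100):
    print(chunk.decode("utf-8", errors="replace"), end="")
```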
## Tools Support (n8n/OpenWebUI)
- Incoming `tools` in flat format (`{type,name,parameters}`) are normalized to
OpenAI format (`{type:"function", function:{...}}`) before proxying to llama.cpp.
- Legacy `functions` payloads are normalized into `tools`.
- `tool_choice` is normalized to OpenAI format as well.
- `return_format=json` is supported (falls back to JSON-only system prompt if llama.cpp rejects `response_format`).
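A minimal sketch of the normalization described above, not the wrapper's actual code; the function name and the handling of string `tool_choice` values are assumptions.

```
def normalize_tools(payload: dict) -> dict:
    """Normalize flat tools / legacy functions into OpenAI tool format (sketch)."""
    tools = list(payload.get("tools") or [])
    # Legacy `functions` payloads become tools.
    tools += payload.pop("functions", [])

    normalized = []
    for tool in tools:
        if "function" in tool:
            normalized.append(tool)  # already OpenAI-shaped
        else:
            # Flat {type, name, parameters, description?} -> OpenAI format.
            normalized.append({
                "type": "function",
                "function": {
                    "name": tool.get("name"),
                    "description": tool.get("description", ""),
                    "parameters": tool.get("parameters", {}),
                },
            })
    if normalized:
        payload["tools"] = normalized

    # A bare tool name in tool_choice becomes the OpenAI object form.
    choice = payload.get("tool_choice")
    if isinstance(choice, str) and choice not in ("auto", "none", "required"):
        payload["tool_choice"] = {"type": "function", "function": {"name": choice}}
    return payload
```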
## Model Resolution
- Exact string match only (with optional explicit alias mapping).
- Requests that do not exactly match a listed model return `404`.
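A rough sketch of this resolution rule; the `ALIASES` contents and the exception-to-404 mapping are assumptions for illustration.

```
# Optional explicit alias map (contents are illustrative only).
ALIASES = {"tinyllama": "TinyLlama-1.1B-Chat-v1.0.Q4_K_M.gguf"}

def resolve_model(requested: str, available: list[str]) -> str:
    """Exact-match model resolution; unknown names are rejected (404 upstream)."""
    requested = ALIASES.get(requested, requested)
    if requested not in available:
        # The wrapper maps this to an HTTP 404 response.
        raise LookupError(f"unknown model: {requested}")
    return requested
```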
## Parameters UI
- Endpoint: `GET /ui/api/llamacpp-config` (active model + params + extra args)
- Endpoint: `POST /ui/api/llamacpp-config` (updates command flags + extra args)
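Example client usage of these two endpoints with `requests`. Only the URLs come from the notes above; the payload field names (`params`, `ctx-size`) are assumptions about the response shape.

```
import requests

UI_BASE = "http://192.168.1.2:9094"

# Read the active model plus current llama.cpp flags.
cfg = requests.get(f"{UI_BASE}/ui/api/llamacpp-config", timeout=10).json()
print(cfg)

# Field names below are assumptions about the payload shape.
cfg["params"]["ctx-size"] = 8192
requests.post(f"{UI_BASE}/ui/api/llamacpp-config", json=cfg, timeout=30).raise_for_status()
```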
## Model Switch UI
- Endpoint: `POST /ui/api/switch-model` with `{ "model_id": "..." }`
- Verifies switch by sending a minimal prompt.
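Example of driving the switch from Python. The endpoint, body, and verification prompt follow the notes; the retry loop through the transient `503 Loading model` window, and the timeouts, are assumptions.

```
import time
import requests

WRAPPER_BASE = "http://192.168.1.2:9093"
UI_BASE = "http://192.168.1.2:9094"

# Request the switch (model id must exactly match /v1/models).
requests.post(
    f"{UI_BASE}/ui/api/switch-model",
    json={"model_id": "Qwen2.5-7B-Instruct-Q4_K_M.gguf"},
    timeout=60,
).raise_for_status()

# Verify with a minimal prompt, retrying through the 503 warmup window.
for _ in range(30):
    r = requests.post(
        f"{WRAPPER_BASE}/v1/chat/completions",
        json={
            "model": "Qwen2.5-7B-Instruct-Q4_K_M.gguf",
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 4,
        },
        timeout=120,
    )
    if r.status_code != 503:
        r.raise_for_status()
        break
    time.sleep(10)
```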
## Tests
- Remote functional tests: `tests/test_remote_wrapper.py` (chat/responses/tools/JSON mode, model switch, logs, multi-GPU flags).
- UI checks: `tests/test_ui.py` (UI elements, assets, theme toggle wiring).
- Run with env vars:
- `WRAPPER_BASE=http://192.168.1.2:9093`
- `UI_BASE=http://192.168.1.2:9094`
- `TRUENAS_WS_URL=wss://192.168.1.2/websocket`
- `TRUENAS_API_KEY=...`
- `MODEL_REQUEST=<exact model id from /v1/models>`
## Runtime Validation (2026-01-04)
- Fixed llama.cpp init failure by enabling `--flash-attn on` (required with KV cache quantization).
- Confirmed TinyLlama loads and answers prompts with `return_format=json`.
- Switched via UI to `Qwen2.5-7B-Instruct-Q4_K_M.gguf` and validated prompt success.
- Expect transient `503 Loading model` during warmup; retry after load completes.
- Verified the `yarn-llama-2-13b-64k.Q4_K_M.gguf` model switch from the wrapper; a tool-enabled chat request completed after load (~107s).


@@ -0,0 +1,53 @@
# n8n Thesis Builder Debug Checkpoint (2026-01-04)
## Summary
- Workflow: `Options recommendation Engine Core LOCAL v2` (id `Nupt4vBG82JKFoGc`).
- Primary issue: `AI - Thesis Builder` returns garbled output even when the workflow succeeds.
- Confirmed execution with garbled output: execution `7890` (status `success`).
## What changed in the workflow
Only this workflow was modified:
- `Code in JavaScript9` now pulls `symbol` from `Code7` (trigger) instead of the AI output.
- `HTTP Request13` query is forced to the stock symbol to avoid NewsAPI query-length errors.
- `Trim Thesis Data` node inserted between `Aggregate2` and `AI - Thesis Builder`.
- `AI - Thesis Builder` prompt simplified to only: symbol, price, news, technicals.
- `Code10` now caps news items and string length.
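The trimming/capping behavior above (`Trim Thesis Data` / `Code10`), sketched in Python for clarity. The actual n8n Code nodes are JavaScript; the field names follow the payload fields noted below, and the caps are illustrative, not the nodes' exact values.

```
MAX_NEWS_ITEMS = 5        # illustrative caps, not the node's exact values
MAX_FIELD_CHARS = 1500

def trim_thesis_input(payload: dict) -> dict:
    """Cap news item count and per-field string length before the LLM call."""
    news = payload.get("news", [])[:MAX_NEWS_ITEMS]
    payload["news"] = [str(item)[:MAX_FIELD_CHARS] for item in news]
    for key in ("tech", "thesis_prompt"):
        if isinstance(payload.get(key), str):
            payload[key] = payload[key][:MAX_FIELD_CHARS]
    return payload
```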
## Last successful run details (execution 7890)
- `AI - Thesis Builder` output is garbled (sample `symbol` and `thesis` fields are full of junk tokens).
- `AI - Technicals Auditor` output looks valid JSON (see sample below).
- `Aggregate2` payload size ~6.7KB; `news` ~859 chars; `tech` ~1231 chars; `thesis_prompt` ~4448 chars.
- Garbling persists despite trimming input size; likely model/wrapper settings or response format handling.
### Sample `AI - Thesis Builder` output (garbled)
- symbol: `6097ig5ear18etymac3ofy4ppystugamp2llcashackicset0ovagates-hstt.20t*6fthm--offate9noptooth(2ccods+5ing, or 7ACYntat?9ur);8ot1ut`
- thesis: (junk tokens, mostly non-words)
- confidence: `0`
### Sample `AI - Technicals Auditor` output (valid JSON)
```
{
  "output": {
    "timeframes": [
      { "interval": "1m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "5m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "15m", "valid": true, "features": { "trend": "BEARISH" } },
      { "interval": "1h", "valid": true, "features": { "trend": "BULLISH" } }
    ],
    "optionsRegime": { "priceRegime": "TRENDING", "volRegime": "EXPANDING", "nearTermSensitivity": "HIGH" },
    "dataQualityScore": 0.5,
    "error": "INSUFFICIENT_DATA"
  }
}
```
## Open issues
- Thesis Builder garbling persists even with a small prompt; likely a model/wrapper output issue.
- Need to confirm whether the llama.cpp wrapper is corrupting output or the model is misconfigured for JSON-only output.
## Useful commands
- Last runs:
`SELECT id, status, finished, "startedAt" FROM execution_entity WHERE "workflowId"='Nupt4vBG82JKFoGc' ORDER BY "startedAt" DESC LIMIT 5;`
- Export workflow:
`sudo docker exec ix-n8n-n8n-1 n8n export:workflow --id Nupt4vBG82JKFoGc --output /tmp/n8n_local_v2.json`