# AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)

This file captures the full context, decisions, failures, fixes, commands, and paths used to fine-tune gpt-oss-20b and deploy it into Ollama as `trained-options-model`. It is meant to be a literal step-by-step recipe for retraining with new data. Read it end-to-end before touching anything.

------------------------------------------------------------------------------

## 0) Hard Requirements (User Directives)

- Use local documents in this repo only.
- Dedupe repeated docs across formats; do not ingest duplicates.
- Manually remove non-relevant ebook content (preface, index, author/publisher pages, etc.). Options-trading content only.
- Use the GPU heavily (not the CPU).
- If the local AMD 7900 XTX is not available, use the remote NVIDIA box.
- All long-running tasks must show progress and **post progress at least every 2 minutes** (print progress or size updates; never run silently).
- Retraining must complete locally (no cloud).
- The final Ollama model name must be **trained-options-model**.
- The final Ollama model **must support tool/function calls**.
- Any destructive commands require explicit approval (do not run them silently).

------------------------------------------------------------------------------

## 1) Machines, OS, Access, and Credentials

### Local Windows
- Repo path: `C:\Users\Rushabh\projects\ingest-ebook-options`
- Local AMD GPU: 7900 XTX (not used here; the remote NVIDIA box was used instead).
- A local Ollama install exists but was not used for training.

### Remote TrueNAS SCALE (Used for Training + Ollama)
- Host: `192.168.1.2`
- SSH port: `55555`
- User: `rushabh`
- Password: none required (key-based / no password).
- SSH example:
  - `ssh -p 55555 rushabh@192.168.1.2`
- Ollama HTTP endpoint (remote): `http://192.168.1.2:30068`

### TrueNAS UI / middlewared
- The user explicitly required that containers be created and managed as TrueNAS Apps (middlewared / TrueNAS UI), not ad-hoc docker only.
- If an app does not show up in the UI, check middlewared and re-create it via the UI.

------------------------------------------------------------------------------

## 2) Storage Layout and Mounts (Critical)

### Remote TrueNAS storage root
- `/mnt/fast.storage.rushg.me/datasets/apps`

### Remote training workspace (folder, not ZFS dataset)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch`
- IMPORTANT: the user requested a plain folder, not a ZFS dataset.

### Repo copy on remote
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options`

### Ollama model storage mount (remote)
- Host path: `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
- Container path: `/root/.ollama`
- Actual model store:
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests`

### Ollama imports folder (created by us)
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports`

### Hugging Face cache (remote)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache`
- When retraining, set `HF_HOME` or `HF_HUB_CACHE` to this path so downloads stay on fast storage and are not re-downloaded.
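
A minimal sketch of pinning the cache for a retraining shell session, assuming the repo sits at the path above and the standard Hugging Face convention that `HF_HUB_CACHE` defaults to `$HF_HOME/hub` (setting both keeps it unambiguous):

```
# Assumed paths; adjust if the repo was synced somewhere else.
export HF_HOME=/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache
export HF_HUB_CACHE="$HF_HOME/hub"

# Sanity check: the cache should grow on fast storage, not in ~/.cache.
mkdir -p "$HF_HUB_CACHE"
du -sh "$HF_HOME" 2>/dev/null || true
```

Run these exports in the same shell (or container) that launches `tools/finetune_lora.py`, otherwise the base model will be re-downloaded into the default cache.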
------------------------------------------------------------------------------

## 3) TrueNAS App Setup (GPU Training + Ollama)

### Ollama App
- Container name: `ix-ollama-ollama-1`
- Exposes: `0.0.0.0:30068`
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- Observed Ollama version: 0.13.5
- Maps `/root/.ollama` to `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`

### Training App (Created in TrueNAS UI)
- App name: `options-train`
- GPU: NVIDIA RTX 5060 Ti
- Reason: the user required app creation through the TrueNAS UI; the app also guarantees GPU access.
- We explicitly stopped the `llamacpp` app to free the GPU before training.

### Docker permission note
- The non-root user lacks docker socket permission.
- Use `sudo -n docker ...` for all docker commands on the host.

### Shell note (remote)
- The default shell is `zsh`.
- Use `bash -lc '...'` to avoid quote-parsing issues and missing tools.
- `rg` is not installed on the remote host; use `grep`/`find`.

------------------------------------------------------------------------------

## 4) Data Prep Pipeline (Dedup + Manual Relevance)

### Source docs
- Local docs in `eBooks/` (PDF/EPUB/etc.).
- Relevant pages (options-trading content only) must be selected **manually**.
- Skip prefaces, indexes, author/publisher info, boilerplate, etc.

### Step A - Extract full text and doc-level dedupe
Script: `tools/extract_corpus.py`
- Supports .pdf/.epub/.txt/.md.
- Dedupes by SHA256 of normalized text, so the same book in different formats is ingested once.
- Outputs:
  - `training_data/manifest.json`
  - `training_data/corpus.txt`
  - `training_data/text/*.txt`
  - `training_data/rejected.json`

Example:
```
python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000
```

Dependencies:
- `pypdf`, `ebooklib`, `beautifulsoup4`, `lxml`, `chardet`

### Step B - Page/section relevance filtering (Options-focused)
Script: `tools/select_relevant.py`
- Scores segments for options-trading keywords.
- Drops TOC/index/front matter.
- Dedupes by SHA256 of the normalized segment.
- Includes neighboring pages via `--neighbors`.

Outputs in `training_data/relevant`:
- `text/*.txt`
- `manifest.json`
- `report.csv`
- `corpus.txt`

Example:
```
python tools/select_relevant.py --input eBooks --out training_data/relevant \
  --min-score 10 --min-chars 800 --neighbors 1
```

### Step C - Chunk to JSONL dataset
Script: `tools/build_dataset.py`
- Splits text into overlapping chunks.
- Optional junk filter and keyword score.

Outputs (destination follows `--out`; the example below writes to `training_data/curated/` rather than `training_data/relevant/`):
- `training_data/relevant/dataset.jsonl`
- `training_data/relevant/dataset.stats.json`

Example:
```
python tools/build_dataset.py \
  --manifest training_data/relevant/manifest.json \
  --text-dir training_data/relevant/text \
  --out training_data/curated/dataset.jsonl \
  --chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk
```

### Manual curation requirement
- The scripts are helper filters only. You must still **manually review** for relevance, especially to remove prefaces, indexes, disclaimers, etc.
- Use `training_data/relevant/corpus.txt` to scan the human-readable content.

### Dataset used in this run
- Remote dataset path: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl`
- Count: 1778 chunks.

------------------------------------------------------------------------------

## 5) Training Pipeline (LoRA fine-tune on NVIDIA box)

### Why the local AMD GPU was not used
- The user explicitly requested the remote NVIDIA box.
- The local AMD 7900 XTX was not used in this run.
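
Before launching a run on that box, it helps to spot-check the curated dataset from Step C above. A minimal sketch, assuming the JSONL has one record per line (paths as documented; this run produced 1778 chunks):

```
DATASET=training_data/curated/dataset.jsonl

# One chunk per line; expect a count matching the dataset report (1778 in this run).
wc -l "$DATASET"

# Peek at the first record to confirm it is options content, not front matter.
head -n 1 "$DATASET" | cut -c1-300
```

If the count or the first record looks wrong, go back to the manual curation step before spending GPU time.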
### Training script (repo)
- `tools/finetune_lora.py`
- Modified so gradient checkpointing works with LoRA:
  - `model.enable_input_require_grads()` is required.
  - Without it, the MXFP4 path fails with: `RuntimeError: element 0 of tensors does not require grad...`

### Key training args used
- `--model openai/gpt-oss-20b`
- `--data training_data/curated/dataset.jsonl`
- `--out training_data/lora_adapter`
- `--max-length 256`
- `--epochs 1` (adjust as needed)
- `--lora-r 8 --lora-alpha 16 --lora-dropout 0.05`
- `--grad-accum 4`
- `--quant auto` (MXFP4 on GPU)
- `--log-seconds 120` (must show progress every 2 minutes)
- `--log-steps 10` (extra progress)

### Progress requirement (must follow)
- Use `--log-seconds 120` so training prints logs every ~2 minutes.
- For long copies or merges, print `date` + file size in a loop every 120 seconds.

### GPU requirements
- An NVIDIA GPU is required for quantized loading; MXFP4 needs a GPU.
- GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.

### What failed and how we fixed it
1) **MXFP4 grad error**
   - Error: `RuntimeError: element 0 of tensors does not require grad`
   - Fix: in `tools/finetune_lora.py`, after `model.gradient_checkpointing_enable()` add `model.enable_input_require_grads()`.
2) **Bitsandbytes 4-bit OOM**
   - With `--quant 4bit` the model OOMed even with max-memory limits.
   - CPU offload is not supported in this setup; it still OOMed.
   - Fix: use `--quant auto` (MXFP4) instead.
3) **Triton/compile issues**
   - Triton kernels required a compiler inside the container.
   - Fix: use a PyTorch **CUDA devel** image (not runtime) or install `build-essential` inside the container.

### Output artifacts (LoRA)
`training_data/lora_adapter/` contains:
- `adapter_model.safetensors`
- `adapter_config.json`
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- `training_summary.json` (includes steps and loss EMA)

------------------------------------------------------------------------------

## 6) GGUF Conversion and Merge (Required; Ollama Does Not Support LoRA Adapters)

### Why the merge is required
- Ollama errors when a Modelfile uses ADAPTER: `Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
- Therefore the LoRA must be merged into the base GGUF.

### llama.cpp setup (remote)
- Clone location: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp`
- Build:

```
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
```

- Note: `-DLLAMA_CURL=OFF` is used because libcurl is missing on the host.
- Binaries:
  - `build/bin/llama-export-lora`
  - `build/bin/llama-gguf`
- When running, set:
  - `LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin`

### Convert LoRA to GGUF
Use `convert_lora_to_gguf.py`:
```
python convert_lora_to_gguf.py \
  --lora /path/to/training_data/lora_adapter \
  --outfile /path/to/training_data/lora_adapter/options-lora.gguf
```

### Architecture mismatch pitfall (critical)
- The base GGUF from Ollama uses `general.architecture = gptoss`.
- The LoRA GGUF from the converter uses `general.architecture = gpt-oss`.
- `llama-export-lora` throws: `model arch and LoRA arch mismatch`

### Fix: rewrite LoRA GGUF metadata to `gptoss`
We used `gguf-py` to rewrite the metadata.
Example (run inside a Python container):
```
# Rewrite the LoRA GGUF so general.architecture matches the base ("gptoss").
from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"

r = GGUFReader(inp)
w = GGUFWriter(out, "gptoss", endianess=r.endianess)

# Copy KV fields except general.architecture (GGUF.* bookkeeping keys are skipped too).
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)

# Copy tensors unchanged.
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data,
                 raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type,
                 tensor_endianess=r.endianess)

# Finalize the output file (header, KV data, tensor data).
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()
```

### Tensor orientation mismatch (critical)
- After the arch fix, the merge failed with: `GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
- Root cause: the LoRA A/B tensors were stored in an orientation incompatible with the base GGUF.
- Fix: transpose the LoRA A and B **data** when re-serializing the GGUF.

**Important GGUF detail:**
- GGUF stores tensor dims reversed internally.
- You must transpose the data while keeping the *original raw_shape*.
- Working approach:
```
# Transpose only the data; keep the original raw_shape (GGUF dims are stored reversed).
if name.endswith(".lora_a") or name.endswith(".lora_b"):
    data = np.ascontiguousarray(data.T)
w.add_tensor(name, data, raw_shape=shape, raw_dtype=..., ...)
```

### Working LoRA GGUF for merge
- `options-lora-gptoss-transposed2.gguf`

### Merge LoRA into base GGUF
Base GGUF path (from the Ollama blob store):
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb`

Merge command:
```
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
  -m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
  --lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
  -o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
```

### Merged output (final)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
- Size: ~13 GB
- File type: F16

### Intermediate artifacts kept (not deleted)
- `options-lora-gptoss.gguf`
- `options-lora-gptoss-transposed.gguf`
- `options-lora-gptoss-transposed-debug.gguf`
- `options-lora-gptoss-transposed2.gguf`
- `gpt-oss-20b-options-merged-f16-v2.gguf` (14 MB, failed)
- `gpt-oss-20b-options-merged-f16.gguf` (0 bytes, failed)

------------------------------------------------------------------------------

## 7) Ollama Integration (Final Model)

### Why ADAPTER does not work
A Modelfile with ADAPTER fails:
```
Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented
```
Therefore the merged GGUF is mandatory.

### Copy merged GGUF into Ollama imports
```
mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
  /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/
```

### Modelfile (with tool support)
**Important:** tools only work if the TEMPLATE block matches the base model template. Without a TEMPLATE, Ollama shows `{{ .Prompt }}` and tools are disabled.
We extracted the template from the base model:
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
  > /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template
```

Then we built the Modelfile:
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`
```
FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
"""
SYSTEM """You are a knowledgeable options trading assistant. Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions. If information is uncertain, say so rather than guessing."""
```
The TEMPLATE body is shown empty here; paste the exact contents of `gptoss.template` between the triple quotes, otherwise tools will not work.

### Create the model
```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```

### Verify in Ollama
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model
```
Expected capabilities include: `completion`, `tools`, `thinking`.

### Runtime note
- `ollama run` can take a long time to load and may time out.
- Use the HTTP API for reliable results:
```
curl http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
```

------------------------------------------------------------------------------

## 8) Tool/Function Call Requirement (Mandatory)

### How to verify tool support
1) `ollama show trained-options-model` should list `tools` under Capabilities.
2) `ollama show trained-options-model --template` should show the full template (not `{{ .Prompt }}`).

### Tool-call test (HTTP)
```
curl http://192.168.1.2:30068/api/chat -d '{
  "model":"trained-options-model:latest",
  "stream":false,
  "messages":[
    {"role":"system","content":"Use tools when available."},
    {"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
  ],
  "tools":[
    {"type":"function","function":{
      "name":"calc_total",
      "description":"Compute total cost for a trade",
      "parameters":{
        "type":"object",
        "properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
        "required":["quantity","price"]
      }
    }}
  ]
}'
```
Expected: `tool_calls` in the response.

------------------------------------------------------------------------------

## 9) Known Failures + Fixes (Summary)

- **Ollama ADAPTER fails** -> merge the LoRA into the GGUF.
- **Arch mismatch** (`gpt-oss` vs `gptoss`) -> rewrite the LoRA metadata.
- **ggml_can_mul_mat assertion** -> transpose the LoRA A/B data.
- **MXFP4 gradient error** -> `model.enable_input_require_grads()`.
- **Bitsandbytes 4-bit OOM** -> use MXFP4 (`--quant auto`) on the GPU.
- **Triton compile error** -> use a PyTorch CUDA *devel* image or install gcc.
- **WSL convert_lora_to_gguf.py missing transformers** -> use docker or install transformers in WSL.
- **`ollama run` hangs** -> use `/api/generate` or `/api/chat` via curl.

------------------------------------------------------------------------------

## 10) Retrain Checklist (Minimal Friction)

1) **Prepare data locally**
   - Put docs in `eBooks/`.
   - Run:
     - `python tools/select_relevant.py ...`
     - `python tools/build_dataset.py ...`
   - Manually inspect `training_data/relevant/corpus.txt`.
2) **Sync to remote**
   - Example (PowerShell):
     - `scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/`
3) **Stop GPU-conflicting apps**
   - Stop the `llamacpp` app in the TrueNAS UI.
4) **Train LoRA in the TrueNAS app**
   - Ensure the GPU is attached.
   - Use `tools/finetune_lora.py` with `--log-seconds 120`.
   - Confirm the adapter is saved in `training_data/lora_adapter`.
5) **Convert LoRA to GGUF**
   - `convert_lora_to_gguf.py` -> `options-lora.gguf`
6) **Fix arch + transpose**
   - Rewrite the metadata to `gptoss`.
   - Transpose the LoRA A/B data.
   - Output: `options-lora-gptoss-transposed2.gguf`
7) **Merge into the base GGUF**
   - Use `llama-export-lora`.
   - Output: `gpt-oss-20b-options-merged-f16-v3.gguf`
8) **Ollama import**
   - Copy the GGUF to `/mnt/.../ollama.models/imports`.
   - Build the Modelfile with TEMPLATE.
   - `ollama create trained-options-model -f ...`
9) **Verify tool support**
   - `ollama show trained-options-model`
   - `/api/chat` tool-call test

------------------------------------------------------------------------------

## 11) Commands Used in This Run (Examples)

### Remote file listing (progress + verify)
```
ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"
```

### GGUF metadata check
```
python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY
```

### Merge with progress updates every 2 minutes
```
BASE=/mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid
```

------------------------------------------------------------------------------

## 12) Notes About Local Files in This Repo

- `Modelfile.trained-options-model` (local) still references ADAPTER and is **not** valid for current Ollama (ADAPTER is unsupported).
- Use the remote Modelfile in `/mnt/.../ollama.models/imports/`.
- `_tmp_*` scripts exist from prior automation attempts (TrueNAS app creation, GPU checks, etc.). Use them only if you know what they do.

------------------------------------------------------------------------------

## 13) Progress Reporting Policy (Non-Negotiable)

During any long run (training, merge, large copy):
- Print a progress line every 120 seconds.
- Example: `date` + file size, or a training loss line.
- Do not allow silent runs.

------------------------------------------------------------------------------

## 14) Quick Sanity Checks (After Retrain)

1) `ollama list` shows `trained-options-model:latest`
2) `ollama show trained-options-model` lists `tools`
3) `/api/generate` returns a coherent answer
4) `/api/chat` returns a tool call when tools are provided

(A combined sketch of checks 1-3 appears after section 15.)

------------------------------------------------------------------------------

## 15) Do NOT Forget These Pitfalls

- Arch mismatch (`gpt-oss` vs `gptoss`) **will break the merge**.
- LoRA tensor orientation mismatch **will break the merge**.
- ADAPTER in a Modelfile **does not work** in current Ollama.
- Tool calls **only** work if the TEMPLATE is included.
- The remote shell is zsh; use `bash -lc` for complex quoting.
- Docker requires `sudo -n`.
- Use the remote GPU as requested; do not train on the CPU.
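
A minimal sketch that bundles sanity checks 1-3 from section 14 into one script, run from the local machine and assuming the host, SSH port, container name, and HTTP endpoint documented above:

```
#!/usr/bin/env bash
set -euo pipefail

HOST=192.168.1.2
SSH="ssh -p 55555 rushabh@$HOST"

# 1) The model is registered, and 2) its capabilities include tools.
$SSH "sudo -n docker exec -i ix-ollama-ollama-1 ollama list | grep trained-options-model"
$SSH "sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model | grep -i tools"

# 3) /api/generate returns a coherent answer (stream disabled for a single JSON reply).
curl -s "http://$HOST:30068/api/generate" -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
echo
echo "For check 4, run the /api/chat tool-call test from section 8."
```

This is a convenience wrapper only; the authoritative verification steps remain sections 7, 8, and 14.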
------------------------------------------------------------------------------

## 16) Current "Final" Artifacts (Reference)

### LoRA adapter
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/`

### Merged GGUF (final)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`

### Ollama Modelfile
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`

### Ollama Model Name
`trained-options-model:latest`

------------------------------------------------------------------------------

## 17) If You Need to Rebuild Tools Support

1) Extract the base template:
```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
```
2) Create the Modelfile with a TEMPLATE block.
3) Re-run `ollama create`.
4) Verify that `ollama show trained-options-model` lists `tools`.

------------------------------------------------------------------------------

## 18) Git Repo + Source Inventory (This Repo)

### Remote git repo
- URL (HTTP): `https://git.rushg.me/rushabh/ollama-model-training-5060ti`
- URL (git): `https://git.rushg.me/rushabh/ollama-model-training-5060ti.git`
- Auth: the user will authenticate on push when prompted (username/password).

### What is committed (and why)
- `AGENTS.md` (this runbook; full end-to-end context).
- `README.md` (quick overview + links to AGENTS).
- `tools/` scripts for extraction, filtering, dataset build, and training.
- `training_data/` curated dataset, manifests, reports, and LoRA outputs used for the run (kept for reproducibility).
- `remote/ollama/Modelfile.trained-options-model.remote` (exact remote Modelfile used to enable tools).
- `remote/ollama/gptoss.template` (base template pulled from gpt-oss:20b).
- `Modelfile.trained-options-model` (local reference; see the remote Modelfile for the tool-enabled version).

### What is excluded (and why)
- `eBooks/` raw source data (large; keep local and private).
- `_llama_cpp/` (upstream repo; clone on demand).
- `.venv/` and Python caches.
- Any base model weights or Ollama blobs (too large; download via Ollama/HF).

### How to recreate missing external assets
- Base model:
  - `ollama pull gpt-oss:20b` on the Ollama host
  - or `huggingface-cli download openai/gpt-oss-20b` into the HF cache
- llama.cpp:
  - `git clone https://github.com/ggml-org/llama.cpp.git`
  - Build with `-DLLAMA_CURL=OFF` if libcurl is missing.

------------------------------------------------------------------------------

End of AGENTS.md