Add training workflow, datasets, and runbook

2025-12-23 21:17:22 -08:00
commit 619e87aacc
2140 changed files with 2513895 additions and 0 deletions

# AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)
This file captures the full context, decisions, failures, fixes, commands, and
paths used to fine-tune gpt-oss-20b and deploy it into Ollama as
`trained-options-model`. It is meant to be a literal step-by-step recipe for
retraining with new data. Read this end-to-end before touching anything.
------------------------------------------------------------------------------
## 0) Hard Requirements (User Directives)
- Use local documents in this repo only.
- Dedupe repeated docs across formats; do not ingest duplicates.
- Manually remove non-relevant ebook content (preface, index, author/publisher
pages, etc). Options-trading content only.
- Use GPU heavily (not CPU).
- If local AMD 7900XTX is not available, use the remote NVIDIA box.
- All long-running tasks must show progress and **post progress at least every
2 minutes** (print progress or size updates, not silent).
- Retraining must complete locally (no cloud).
- Final Ollama model name must be **trained-options-model**.
- Final Ollama model **must support tool/function calls**.
- Any destructive commands must require explicit approval (do not run them
silently).
------------------------------------------------------------------------------
## 1) Machines, OS, Access, and Credentials
### Local Windows
- Repo path: `C:\Users\Rushabh\projects\ingest-ebook-options`
- Local AMD GPU: 7900XTX (not used here; remote NVIDIA box was used instead).
- Local Ollama install exists but was not used for training.
### Remote TrueNAS SCALE (Used for Training + Ollama)
- Host: `192.168.1.2`
- SSH port: `55555`
- User: `rushabh`
- Password: none required (key-based / no password).
- SSH example:
- `ssh -p 55555 rushabh@192.168.1.2`
- Ollama HTTP endpoint (remote): `http://192.168.1.2:30068`
### TrueNAS UI / middlewared
- The user explicitly required that containers be created and managed as TrueNAS Apps
  (via middlewared / the TrueNAS UI), not as ad-hoc docker containers.
- If an app does not show up in the UI, check middlewared and re-create it via the UI.
------------------------------------------------------------------------------
## 2) Storage Layout and Mounts (Critical)
### Remote TrueNAS storage root
- `/mnt/fast.storage.rushg.me/datasets/apps`
### Remote training workspace (folder, not ZFS dataset)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch`
- IMPORTANT: user requested a folder, not a ZFS dataset.
### Repo copy on remote
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options`
### Ollama model storage mount (remote)
- Host path: `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
- Container path: `/root/.ollama`
- Actual model store:
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models`
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs`
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests`
### Ollama imports folder (created by us)
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports`
### Hugging Face cache (remote)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache`
- When retraining, set `HF_HOME` or `HF_HUB_CACHE` to this path to keep downloads
on fast storage and avoid redownloading.
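Illustrative exports for a retrain run (adjust for your shell; path is the cache location above):
```
# Keep Hugging Face downloads on fast storage during retraining.
export HF_HOME=/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache
# Alternatively, HF_HUB_CACHE can be pointed at the same location.
```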
------------------------------------------------------------------------------
## 3) TrueNAS App Setup (GPU Training + Ollama)
### Ollama App
- Container name: `ix-ollama-ollama-1`
- Exposes: `0.0.0.0:30068`
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- Observed Ollama version: 0.13.5
- Uses `/root/.ollama` mapped to `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
### Training App (Created in TrueNAS UI)
- App name: `options-train`
- GPU: NVIDIA RTX 5060 Ti
- Reason: the user required app creation through the TrueNAS UI; it also guarantees GPU access.
- We explicitly stopped the `llamacpp` app to free GPU before training.
### Docker permission note
- Non-root user lacks docker socket permission.
- Use `sudo -n docker ...` for all docker commands on the host.
### Shell note (remote)
- Default shell is `zsh`.
- Use `bash -lc '...'` to avoid quote parsing issues and missing tools.
- `rg` is not installed on remote; use `grep`/`find`.
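An illustrative remote invocation following these notes (the commands shown are ones already used elsewhere in this runbook):
```
ssh -p 55555 rushabh@192.168.1.2 \
  "bash -lc 'sudo -n docker ps; ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch'"
```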
------------------------------------------------------------------------------
## 4) Data Prep Pipeline (Dedup + Manual Relevance)
### Source docs
- Local docs in `eBooks/` (PDF/EPUB/etc).
- Must **manually** select relevant pages (options trading content only).
- Skip: prefaces, index, author/publisher info, boilerplate, etc.
### Step A - Extract full text and doc-level dedupe
Script: `tools/extract_corpus.py`
- Supports .pdf/.epub/.txt/.md
- Dedupes by SHA256 of normalized text across different formats.
- Outputs:
- `training_data/manifest.json`
- `training_data/corpus.txt`
- `training_data/text/*.txt`
- `training_data/rejected.json`
Example:
```
python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000
```
Dependencies:
- `pypdf`, `ebooklib`, `beautifulsoup4`, `lxml`, `chardet`
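A minimal install of those dependencies (assumes the Python environment you run the tools from):
```
python -m pip install pypdf ebooklib beautifulsoup4 lxml chardet
```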
### Step B - Page/section relevance filtering (Options-focused)
Script: `tools/select_relevant.py`
- Scores segments for options-trading keywords.
- Drops TOC/index/front matter.
- Dedupes by SHA256 of the normalized segment.
- Includes neighboring pages, controlled by `--neighbors`.
Outputs in `training_data/relevant`:
- `text/*.txt`
- `manifest.json`
- `report.csv`
- `corpus.txt`
Example:
```
python tools/select_relevant.py --input eBooks --out training_data/relevant \
--min-score 10 --min-chars 800 --neighbors 1
```
### Step C - Chunk to JSONL dataset
Script: `tools/build_dataset.py`
- Splits into overlapping chunks.
- Optional junk filter and keyword score.
Outputs (written next to the `--out` path; this run used `training_data/curated/`):
- `training_data/curated/dataset.jsonl`
- `training_data/curated/dataset.stats.json`
Example:
```
python tools/build_dataset.py \
--manifest training_data/relevant/manifest.json \
--text-dir training_data/relevant/text \
--out training_data/curated/dataset.jsonl \
--chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk
```
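A quick, illustrative sanity check on the built dataset (no field names are assumed; it prints whatever keys the first record has):
```
wc -l training_data/curated/dataset.jsonl
python3 - <<'PY'
import json
# Inspect the first JSONL record to confirm chunking produced sane output.
with open("training_data/curated/dataset.jsonl") as f:
    first = json.loads(f.readline())
print("fields in first record:", sorted(first.keys()))
PY
cat training_data/curated/dataset.stats.json
```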
### Manual curation requirement
- The scripts are helper filters only. You must still **manually review** for
relevance, especially to remove prefaces, indexes, disclaimers, etc.
- Use `training_data/relevant/corpus.txt` to scan human-readable content.
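One illustrative way to spot leftover front/back matter while reviewing (the keyword list is only an example, not from the original run):
```
grep -n -i -E 'preface|table of contents|about the author|all rights reserved|index' \
  training_data/relevant/corpus.txt | head -n 40
```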
### Dataset used in this run
- Remote dataset path:
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl`
- Count: 1778 chunks.
------------------------------------------------------------------------------
## 5) Training Pipeline (LoRA fine-tune on NVIDIA box)
### Why local AMD GPU was not used
- User explicitly requested the remote NVIDIA box.
- Local AMD 7900XTX was not used in this run.
### Training script (repo)
- `tools/finetune_lora.py`
- Modified to fix gradient checkpointing + LoRA:
- `model.enable_input_require_grads()` is required.
- Without it, MXFP4 path fails with:
`RuntimeError: element 0 of tensors does not require grad...`
### Key training args used
- `--model openai/gpt-oss-20b`
- `--data training_data/curated/dataset.jsonl`
- `--out training_data/lora_adapter`
- `--max-length 256`
- `--epochs 1` (adjust as needed)
- `--lora-r 8 --lora-alpha 16 --lora-dropout 0.05`
- `--grad-accum 4`
- `--quant auto` (MXFP4 on GPU)
- `--log-seconds 120` (must show progress every 2 minutes)
- `--log-steps 10` (extra progress)
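Putting those flags together, an illustrative invocation (run from the repo root; the script may accept additional flags not listed here):
```
python tools/finetune_lora.py \
  --model openai/gpt-oss-20b \
  --data training_data/curated/dataset.jsonl \
  --out training_data/lora_adapter \
  --max-length 256 \
  --epochs 1 \
  --lora-r 8 --lora-alpha 16 --lora-dropout 0.05 \
  --grad-accum 4 \
  --quant auto \
  --log-seconds 120 --log-steps 10
```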
### Progress requirement (must follow)
- Use `--log-seconds 120` so training prints logs every ~2 minutes.
- For long copies or merges, print `date` + file size in a loop every 120 sec.
### GPU requirements
- NVIDIA GPU required for quantized loading; MXFP4 needs GPU.
- GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.
### What failed and how we fixed it
1) **MXFP4 grad error**
- Error: `RuntimeError: element 0 of tensors does not require grad`
- Fix: In `tools/finetune_lora.py`, after
`model.gradient_checkpointing_enable()` add:
`model.enable_input_require_grads()`
2) **Bitsandbytes 4-bit OOM**
- With `--quant 4bit` the model OOMed even with max memory limits.
- CPU offload was not supported with this setup; it still OOMed.
- Fix: use `--quant auto` (MXFP4) instead.
3) **Triton/compile issues**
- Triton kernels required a compiler in the container.
- Fix: Use a PyTorch **CUDA devel** image (not runtime) or install
`build-essential` inside the container.
### Output artifacts (LoRA)
`training_data/lora_adapter/` contains:
- `adapter_model.safetensors`
- `adapter_config.json`
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- `training_summary.json` (includes steps and loss EMA)
------------------------------------------------------------------------------
## 6) GGUF Conversion and Merge (Required; Ollama LoRA not supported)
### Why merge is required
- Ollama error when using ADAPTER:
`Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
- Therefore, must merge LoRA into base GGUF.
### llama.cpp setup (remote)
- Clone location: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp`
- Build:
```
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
```
- Note: `-DLLAMA_CURL=OFF` was used because libcurl is missing on the host.
- Binaries:
- `build/bin/llama-export-lora`
- `build/bin/llama-gguf`
- When running, set:
- `LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin`
### Convert LoRA to GGUF
Use `convert_lora_to_gguf.py` (from the llama.cpp checkout):
```
python convert_lora_to_gguf.py \
--lora /path/to/training_data/lora_adapter \
--outfile /path/to/training_data/lora_adapter/options-lora.gguf
```
### Architecture mismatch pitfall (critical)
- Base GGUF from Ollama uses `general.architecture = gptoss`
- LoRA GGUF from converter uses `general.architecture = gpt-oss`
- `llama-export-lora` throws:
`model arch and LoRA arch mismatch`
### Fix: rewrite LoRA GGUF metadata to `gptoss`
We used `gguf-py` to rewrite metadata. Example (run inside a Python container):
```
from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"
r = GGUFReader(inp)
w = GGUFWriter(out, "gptoss", endianess=r.endianess)
# Copy KV fields except general.architecture (and fields the writer sets itself)
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)
# Copy tensors unchanged (see the transpose note below for lora_a/lora_b)
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type, tensor_endianess=r.endianess)
# Flush the rewritten GGUF to disk
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()
```
### Tensor orientation mismatch (critical)
- After arch fix, merge failed with:
`GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
- Root cause: LoRA A/B tensors had orientation incompatible with base GGUF.
- Fix: transpose LoRA A and B **data** when re-serializing GGUF.
**Important GGUF detail:**
- GGUF stores tensor dims reversed internally.
- You must transpose the data while keeping the *original raw_shape*.
- Working approach:
```
if name.endswith(".lora_a") or name.endswith(".lora_b"):
data = np.ascontiguousarray(data.T)
w.add_tensor(name, data, raw_shape=shape, raw_dtype=..., ...)
```
### Working LoRA GGUF for merge
- `options-lora-gptoss-transposed2.gguf`
### Merge LoRA into base GGUF
Base GGUF path (from Ollama blob):
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb`
Merge command:
```
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
-m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
--lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
-o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
```
### Merged output (final)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
- Size: ~13 GB
- File type: F16
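A quick, illustrative post-merge check (assumes `gguf-py` is installed, as in the metadata check in section 11):
```
python3 - <<'PY'
from gguf import GGUFReader
# Confirm the merged file reports the gptoss architecture and a plausible tensor count.
p = "/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf"
r = GGUFReader(p)
print("arch:", r.get_field("general.architecture").contents())
print("tensors:", len(r.tensors))
PY
```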
### Intermediate artifacts kept (not deleted)
- `options-lora-gptoss.gguf`
- `options-lora-gptoss-transposed.gguf`
- `options-lora-gptoss-transposed-debug.gguf`
- `options-lora-gptoss-transposed2.gguf`
- `gpt-oss-20b-options-merged-f16-v2.gguf` (14 MB, failed)
- `gpt-oss-20b-options-merged-f16.gguf` (0 bytes, failed)
------------------------------------------------------------------------------
## 7) Ollama Integration (Final Model)
### Why ADAPTER does not work
Modelfile with ADAPTER fails:
```
Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented
```
Therefore, merged GGUF is mandatory.
### Copy merged GGUF into Ollama imports
```
mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/
```
### Modelfile (with tool support)
**Important:** tools only work if the TEMPLATE block matches the base model's
template. Without TEMPLATE, `ollama show --template` falls back to `{{ .Prompt }}`
and tool calls are disabled.
We extracted the template from the base model:
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
> /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template
```
Then built Modelfile:
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`
```
FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
<paste full gpt-oss:20b template here>
"""
SYSTEM """You are a knowledgeable options trading assistant.
Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions.
If information is uncertain, say so rather than guessing."""
```
### Create the model
```
sudo -n docker exec -i ix-ollama-ollama-1 \
ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```
### Verify in Ollama
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model
```
Expected capabilities include: `completion`, `tools`, `thinking`.
### Runtime note
- `ollama run` can take a long time to load and may time out.
- Use HTTP API for reliable results:
```
curl http://192.168.1.2:30068/api/generate -d '{
"model":"trained-options-model:latest",
"prompt":"Explain delta and gamma briefly.",
"stream":false
}'
```
------------------------------------------------------------------------------
## 8) Tool/Function Call Requirement (Mandatory)
### How to verify tool support
1) `ollama show trained-options-model` should list `tools` in Capabilities.
2) `ollama show trained-options-model --template` should show the full template
(not `{{ .Prompt }}`).
### Tool-call test (HTTP)
```
curl http://192.168.1.2:30068/api/chat -d '{
"model":"trained-options-model:latest",
"stream":false,
"messages":[
{"role":"system","content":"Use tools when available."},
{"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
],
"tools":[
{"type":"function","function":{
"name":"calc_total",
"description":"Compute total cost for a trade",
"parameters":{
"type":"object",
"properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
"required":["quantity","price"]
}
}}
]
}'
```
Expected: `tool_calls` in response.
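If the curl output is awkward to eyeball, here is an illustrative equivalent check using Python's standard library (not part of the original run); it sends the same request and prints only `message.tool_calls`:
```
python3 - <<'PY'
import json, urllib.request
req = {
    "model": "trained-options-model:latest",
    "stream": False,
    "messages": [{"role": "user", "content": "Compute total for quantity=3 price=4. Use tool."}],
    "tools": [{"type": "function", "function": {
        "name": "calc_total",
        "description": "Compute total cost for a trade",
        "parameters": {"type": "object",
                       "properties": {"quantity": {"type": "number"},
                                      "price": {"type": "number"}},
                       "required": ["quantity", "price"]}}}],
}
resp = urllib.request.urlopen(urllib.request.Request(
    "http://192.168.1.2:30068/api/chat",
    data=json.dumps(req).encode(),
    headers={"Content-Type": "application/json"}))
print(json.dumps(json.loads(resp.read()).get("message", {}).get("tool_calls"), indent=2))
PY
```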
------------------------------------------------------------------------------
## 9) Known Failures + Fixes (Summary)
- **Ollama ADAPTER fails** -> Merge LoRA into GGUF.
- **Arch mismatch** (`gpt-oss` vs `gptoss`) -> Rewrite LoRA metadata.
- **ggml_can_mul_mat assertion** -> Transpose LoRA A/B data.
- **MXFP4 gradient error** -> `model.enable_input_require_grads()`.
- **Bitsandbytes 4-bit OOM** -> Use MXFP4 auto on GPU.
- **Triton compile error** -> Use PyTorch CUDA *devel* image or install gcc.
- **`convert_lora_to_gguf.py` under WSL missing `transformers`** -> Use docker or
  install `transformers` in WSL.
- **`ollama run` hangs** -> Use `/api/generate` or `/api/chat` via curl.
------------------------------------------------------------------------------
## 10) Retrain Checklist (Minimal Friction)
1) **Prepare data locally**
- Put docs in `eBooks/`.
- Run:
- `python tools/select_relevant.py ...`
- `python tools/build_dataset.py ...`
- Manually inspect `training_data/relevant/corpus.txt`.
2) **Sync to remote**
- Example (PowerShell):
- `scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/`
3) **Stop GPU-conflicting apps**
- Stop `llamacpp` app in TrueNAS UI.
4) **Train LoRA in TrueNAS app**
- Ensure GPU attached.
- Use `tools/finetune_lora.py` with `--log-seconds 120`.
- Confirm adapter saved in `training_data/lora_adapter`.
5) **Convert LoRA to GGUF**
- `convert_lora_to_gguf.py` -> `options-lora.gguf`
6) **Fix arch + transpose**
- Rewrite to `gptoss`
- Transpose LoRA A/B data
- Output `options-lora-gptoss-transposed2.gguf`
7) **Merge into base GGUF**
- Use `llama-export-lora`
- Output `gpt-oss-20b-options-merged-f16-v3.gguf`
8) **Ollama import**
- Copy GGUF to `/mnt/.../ollama.models/imports`
- Build Modelfile with TEMPLATE
- `ollama create trained-options-model -f ...`
9) **Verify tool support**
- `ollama show trained-options-model`
- `/api/chat` tool-call test
------------------------------------------------------------------------------
## 11) Commands Used in This Run (Examples)
### Remote file listing (progress + verify)
```
ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"
```
### GGUF metadata check
```
python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY
```
### Merge with progress updates every 2 minutes
```
BASE=/mnt/.../ollama.models/models/blobs/<base-blob>
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid
```
------------------------------------------------------------------------------
## 12) Notes About Local Files in This Repo
- `Modelfile.trained-options-model` (local) still references ADAPTER and is
**not** valid for current Ollama (ADAPTER unsupported).
- Use the remote Modelfile in `/mnt/.../ollama.models/imports/`.
- `_tmp_*` scripts are left over from prior automation attempts (TrueNAS app
  creation, GPU checks, etc.). Use them only if you know what they do.
------------------------------------------------------------------------------
## 13) Progress Reporting Policy (Non-Negotiable)
During any long run (training, merge, large copy):
- Print a progress line every 120 seconds.
- Example: `date` + file size, or a training loss line.
- Do not allow silent runs.
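A small reusable wrapper in the same spirit as the merge loop in section 11 (illustrative helper, not an existing repo script):
```
# Run any long command in the background and print timestamp + output size every 120 s.
run_with_progress() {
  local outfile="$1"; shift
  "$@" &
  local pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    date; ls -lh "$outfile" 2>/dev/null || true
    sleep 120
  done
  wait "$pid"
}
# Example: run_with_progress "$OUT" .../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT"
```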
------------------------------------------------------------------------------
## 14) Quick Sanity Checks (After Retrain)
1) `ollama list` shows `trained-options-model:latest`
2) `ollama show trained-options-model` lists `tools`
3) `/api/generate` returns a coherent answer
4) `/api/chat` returns a tool call when tools are provided
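The first three checks as one illustrative script (container name, host, and port as documented above):
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list | grep trained-options-model
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model | grep -i tools
curl -s http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
# For check 4, reuse the /api/chat tool-call request from section 8.
```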
------------------------------------------------------------------------------
## 15) Do NOT Forget These Pitfalls
- Arch mismatch (`gpt-oss` vs `gptoss`) **will break merge**.
- LoRA tensor orientation mismatch **will break merge**.
- ADAPTER in Modelfile **does not work** in current Ollama.
- Tool calls **only** work if TEMPLATE is included.
- Remote shell is zsh; use `bash -lc` for complex quoting.
- Docker requires `sudo -n`.
- Use the remote GPU as requested; do not train on CPU.
------------------------------------------------------------------------------
## 16) Current "Final" Artifacts (Reference)
### LoRA adapter
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/`
### Merged GGUF (final)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
### Ollama Modelfile
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`
### Ollama Model Name
`trained-options-model:latest`
------------------------------------------------------------------------------
## 17) If You Need to Rebuild Tools Support
1) Extract base template:
```
sudo -n docker exec -i ix-ollama-ollama-1 \
ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
```
2) Create Modelfile with TEMPLATE block.
3) Re-run `ollama create`.
4) Verify `ollama show trained-options-model` lists `tools`.
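A minimal sketch of steps 2-3, assembling the Modelfile from the extracted template (paths are the ones used in section 7; the SYSTEM prompt is shortened here for brevity):
```
IMPORTS=/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
{
  echo 'FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf'
  echo 'TEMPLATE """'
  cat "$IMPORTS/gptoss.template"
  echo '"""'
  echo 'SYSTEM """You are a knowledgeable options trading assistant."""'
} > "$IMPORTS/Modelfile.trained-options-model"
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```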
------------------------------------------------------------------------------
## 18) Git Repo + Source Inventory (This Repo)
### Remote git repo
- URL (HTTP): `https://git.rushg.me/rushabh/ollama-model-training-5060ti`
- URL (git): `https://git.rushg.me/rushabh/ollama-model-training-5060ti.git`
- Auth: user will authenticate on push when prompted (username/password).
### What is committed (and why)
- `AGENTS.md` (this runbook; full end-to-end context).
- `README.md` (quick overview + links to AGENTS).
- `tools/` scripts for extraction, filtering, dataset build, and training.
- `training_data/` curated dataset, manifests, reports, and LoRA outputs used
for the run (kept for reproducibility).
- `remote/ollama/Modelfile.trained-options-model.remote` (exact remote Modelfile
used to enable tools).
- `remote/ollama/gptoss.template` (base template pulled from gpt-oss:20b).
- `Modelfile.trained-options-model` (local reference; see remote Modelfile for
tool-enabled version).
### What is excluded (and why)
- `eBooks/` raw source data (large; keep local and private).
- `_llama_cpp/` (upstream repo; clone on demand).
- `.venv/` and Python caches.
- Any base model weights or Ollama blobs (too large; download via Ollama/HF).
### How to recreate missing external assets
- Base model:
- `ollama pull gpt-oss:20b` on the Ollama host
- or `huggingface-cli download openai/gpt-oss-20b` into HF cache
- llama.cpp:
- `git clone https://github.com/ggml-org/llama.cpp.git`
- build with `-DLLAMA_CURL=OFF` if libcurl is missing.
------------------------------------------------------------------------------
End of AGENTS.md