AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)

This file captures the full context, decisions, failures, fixes, commands, and paths used to fine-tune gpt-oss-20b and deploy it into Ollama as trained-options-model. It is meant to be a literal step-by-step recipe for retraining with new data. Read this end-to-end before touching anything.


0) Hard Requirements (User Directives)

  • Use local documents in this repo only.
  • Dedupe repeated docs across formats; do not ingest duplicates.
  • Manually remove non-relevant ebook content (preface, index, author/publisher pages, etc). Options-trading content only.
  • Use GPU heavily (not CPU).
  • If local AMD 7900XTX is not available, use the remote NVIDIA box.
  • All long-running tasks must report progress at least every 2 minutes (print step/loss or file-size updates; never run silently).
  • Retraining must complete locally (no cloud).
  • Final Ollama model name must be trained-options-model.
  • Final Ollama model must support tool/function calls.
  • Any destructive commands must require explicit approval (do not run them silently).

1) Machines, OS, Access, and Credentials

Local Windows

  • Repo path: C:\Users\Rushabh\projects\ingest-ebook-options
  • Local AMD GPU: 7900XTX (not used here; remote NVIDIA box was used instead).
  • Local Ollama install exists but was not used for training.

Remote TrueNAS SCALE (Used for Training + Ollama)

  • Host: 192.168.1.2
  • SSH port: 55555
  • User: rushabh
  • Password: none required (key-based / no password).
  • SSH example:
    • ssh -p 55555 rushabh@192.168.1.2
  • Ollama HTTP endpoint (remote): http://192.168.1.2:30068
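  • A quick reachability/version check against that endpoint (assumes the Ollama app is up; the version observed in this run was 0.13.5):
    curl http://192.168.1.2:30068/api/version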

TrueNAS UI / middlewared

  • User explicitly required: create and manage containers as TrueNAS Apps (middlewared/TrueNAS UI), not ad-hoc docker only.
  • If an app does not show in UI, check middlewared and re-create via UI.

2) Storage Layout and Mounts (Critical)

Remote TrueNAS storage root

  • /mnt/fast.storage.rushg.me/datasets/apps

Remote training workspace (folder, not ZFS dataset)

  • /mnt/fast.storage.rushg.me/datasets/apps/pytorch
  • IMPORTANT: user requested a folder, not a ZFS dataset.

Repo copy on remote

  • /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options

Ollama model storage mount (remote)

  • Host path: /mnt/fast.storage.rushg.me/datasets/apps/ollama.models
  • Container path: /root/.ollama
  • Actual model store:
    • /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models
    • /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs
    • /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests

Ollama imports folder (created by us)

  • /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports

Hugging Face cache (remote)

  • /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache
  • When retraining, set HF_HOME or HF_HUB_CACHE to this path to keep downloads on fast storage and avoid redownloading.
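  • For example, set this in the training shell before launching (a minimal sketch; setting HF_HOME is usually enough, since the hub cache defaults to a subdirectory of it):
    export HF_HOME=/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache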

3) TrueNAS App Setup (GPU Training + Ollama)

Ollama App

  • Container name: ix-ollama-ollama-1
  • Exposes: 0.0.0.0:30068
  • GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
  • Observed Ollama version: 0.13.5
  • Uses /root/.ollama mapped to /mnt/fast.storage.rushg.me/datasets/apps/ollama.models

Training App (Created in TrueNAS UI)

  • App name: options-train
  • GPU: NVIDIA RTX 5060 Ti
  • Reason: user demanded TrueNAS UI app creation; also to ensure GPU access.
  • We explicitly stopped the llamacpp app to free GPU before training.

Docker permission note

  • Non-root user lacks docker socket permission.
  • Use sudo -n docker ... for all docker commands on the host.

Shell note (remote)

  • Default shell is zsh.
  • Use bash -lc '...' to avoid quote parsing issues and missing tools.
  • rg is not installed on remote; use grep/find.
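  • Combined example of both notes, run from the local machine (illustrative; adjust the command as needed):
    ssh -p 55555 rushabh@192.168.1.2 "bash -lc 'sudo -n docker ps | grep -i ollama'"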

4) Data Prep Pipeline (Dedup + Manual Relevance)

Source docs

  • Local docs in eBooks/ (PDF/EPUB/etc).
  • Must manually select relevant pages (options trading content only).
  • Skip: prefaces, index, author/publisher info, boilerplate, etc.

Step A - Extract full text and doc-level dedupe

Script: tools/extract_corpus.py

  • Supports .pdf/.epub/.txt/.md
  • Dedup by SHA256 of normalized text across different formats.
  • Outputs:
    • training_data/manifest.json
    • training_data/corpus.txt
    • training_data/text/*.txt
    • training_data/rejected.json

Example:

python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000

Dependencies:

  • pypdf, ebooklib, beautifulsoup4, lxml, chardet
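These can be installed in one go (assumes pip in the active environment):

python -m pip install pypdf ebooklib beautifulsoup4 lxml chardet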

Step B - Page/section relevance filtering (Options-focused)

Script: tools/select_relevant.py

  • Scores segments for options-trading keywords.
  • Drops TOC/index/front matter.
  • Dedupe by SHA256 of normalized segment.
  • Includes neighboring pages via --neighbors.
  • Outputs in training_data/relevant:
    • text/*.txt
    • manifest.json
    • report.csv
    • corpus.txt

Example:

python tools/select_relevant.py --input eBooks --out training_data/relevant \
  --min-score 10 --min-chars 800 --neighbors 1

Step C - Chunk to JSONL dataset

Script: tools/build_dataset.py

  • Splits into overlapping chunks.
  • Optional junk filter and keyword scoring.
  • Outputs (written at the --out path; this run used training_data/curated/):
    • dataset.jsonl
    • dataset.stats.json

Example:

python tools/build_dataset.py \
  --manifest training_data/relevant/manifest.json \
  --text-dir training_data/relevant/text \
  --out training_data/curated/dataset.jsonl \
  --chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk

Manual curation requirement

  • The scripts are helper filters only. You must still manually review for relevance, especially to remove prefaces, indexes, disclaimers, etc.
  • Use training_data/relevant/corpus.txt to scan human-readable content.
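  • One way to spot leftover front/back matter while scanning (illustrative grep; it only flags candidates and does not replace manual review):
    grep -n -i -E "preface|foreword|index|copyright|about the author|all rights reserved" \
      training_data/relevant/corpus.txt | head -n 50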

Dataset used in this run

  • Remote dataset path: /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl
  • Count: 1778 chunks.
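  • To confirm the chunk count after a rebuild (each JSONL line is one chunk):
    wc -l /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl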

5) Training Pipeline (LoRA fine-tune on NVIDIA box)

Why local AMD GPU was not used

  • User explicitly requested the remote NVIDIA box.
  • Local AMD 7900XTX was not used in this run.

Training script (repo)

  • tools/finetune_lora.py
  • Modified to fix gradient checkpointing + LoRA:
    • model.enable_input_require_grads() is required.
    • Without it, MXFP4 path fails with: RuntimeError: element 0 of tensors does not require grad...

Key training args used

  • --model openai/gpt-oss-20b
  • --data training_data/curated/dataset.jsonl
  • --out training_data/lora_adapter
  • --max-length 256
  • --epochs 1 (adjust as needed)
  • --lora-r 8 --lora-alpha 16 --lora-dropout 0.05
  • --grad-accum 4
  • --quant auto (MXFP4 on GPU)
  • --log-seconds 120 (must show progress every 2 minutes)
  • --log-steps 10 (extra progress)
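
Assembled into a single invocation, the flags above look roughly like this (a sketch; run inside the options-train container, with paths relative to the repo copy on the remote):

cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options
python tools/finetune_lora.py \
  --model openai/gpt-oss-20b \
  --data training_data/curated/dataset.jsonl \
  --out training_data/lora_adapter \
  --max-length 256 --epochs 1 \
  --lora-r 8 --lora-alpha 16 --lora-dropout 0.05 \
  --grad-accum 4 --quant auto \
  --log-seconds 120 --log-steps 10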

Progress requirement (must follow)

  • Use --log-seconds 120 so training prints logs every ~2 minutes.
  • For long copies or merges, print date + file size in a loop every 120 sec.

GPU requirements

  • NVIDIA GPU required for quantized loading; MXFP4 needs GPU.
  • GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.

What failed and how we fixed it

  1. MXFP4 grad error

    • Error: RuntimeError: element 0 of tensors does not require grad
    • Fix: In tools/finetune_lora.py, after model.gradient_checkpointing_enable() add: model.enable_input_require_grads()
  2. Bitsandbytes 4-bit OOM

    • With --quant 4bit the model OOMed even with max memory limits.
    • CPU offload not supported with this setup; still OOM.
    • Fix: use --quant auto (MXFP4) instead.
  3. Triton/compile issues

    • Triton kernels required a compiler in the container.
    • Fix: Use a PyTorch CUDA devel image (not runtime) or install build-essential inside the container.

Output artifacts (LoRA)

training_data/lora_adapter/ contains:

  • adapter_model.safetensors
  • adapter_config.json
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json
  • training_summary.json (includes steps and loss EMA)

6) GGUF Conversion and Merge (Required; Ollama LoRA not supported)

Why merge is required

  • Ollama error when using ADAPTER: Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented
  • Therefore, must merge LoRA into base GGUF.

llama.cpp setup (remote)

  • Clone location: /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
  • Build:
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
  • Note: -DLLAMA_CURL=OFF used due to missing libcurl.
  • Binaries:
    • build/bin/llama-export-lora
    • build/bin/llama-gguf
  • When running, set:
    • LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin

Convert LoRA to GGUF

Use convert_lora_to_gguf.py:

python convert_lora_to_gguf.py \
  --lora /path/to/training_data/lora_adapter \
  --outfile /path/to/training_data/lora_adapter/options-lora.gguf

Architecture mismatch pitfall (critical)

  • Base GGUF from Ollama uses general.architecture = gptoss
  • LoRA GGUF from converter uses general.architecture = gpt-oss
  • llama-export-lora throws: model arch and LoRA arch mismatch

Fix: rewrite LoRA GGUF metadata to gptoss

We used gguf-py to rewrite metadata. Example (run inside a Python container):

from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"
r = GGUFReader(inp)
w = GGUFWriter(out, "gptoss", endianess=r.endianess)

# Copy KV fields except general.architecture (and general.alignment, which the writer sets)
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)

# Copy tensors unchanged (the transpose fix comes later; see below)
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type, tensor_endianess=r.endianess)

# Finalize and close the output file
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()

Tensor orientation mismatch (critical)

  • After arch fix, merge failed with: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
  • Root cause: LoRA A/B tensors had orientation incompatible with base GGUF.
  • Fix: transpose LoRA A and B data when re-serializing GGUF.

Important GGUF detail:

  • GGUF stores tensor dims reversed internally.
  • You must transpose the data while keeping the original raw_shape.
  • Working approach:
if name.endswith(".lora_a") or name.endswith(".lora_b"):
    data = np.ascontiguousarray(data.T)
w.add_tensor(name, data, raw_shape=shape, raw_dtype=..., ...)

Working LoRA GGUF for merge

  • options-lora-gptoss-transposed2.gguf

Merge LoRA into base GGUF

Base GGUF path (from Ollama blob): /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb
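
If the base blob hash differs after a fresh pull, one way to locate it is via the base model's Modelfile (the FROM line points at the blob path):

sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --modelfile | grep ^FROM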

Merge command:

export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
  -m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
  --lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
  -o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf

Merged output (final)

  • /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
  • Size: ~13 GB
  • File type: F16

Intermediate artifacts kept (not deleted)

  • options-lora-gptoss.gguf
  • options-lora-gptoss-transposed.gguf
  • options-lora-gptoss-transposed-debug.gguf
  • options-lora-gptoss-transposed2.gguf
  • gpt-oss-20b-options-merged-f16-v2.gguf (14 MB, failed)
  • gpt-oss-20b-options-merged-f16.gguf (0 bytes, failed)

7) Ollama Integration (Final Model)

Why ADAPTER does not work

Modelfile with ADAPTER fails:

Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented

Therefore, merged GGUF is mandatory.

Copy merged GGUF into Ollama imports

mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
   /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/

Modelfile (with tool support)

Important: tools only work if the TEMPLATE block matches the base model template. If TEMPLATE is omitted, ollama show --template prints only {{ .Prompt }} and tool calls are disabled.

We extracted template from base:

sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
  > /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template

Then built Modelfile: /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model

FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
<paste full gpt-oss:20b template here>
"""

SYSTEM """You are a knowledgeable options trading assistant.
Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions.
If information is uncertain, say so rather than guessing."""

Create the model

sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model

Verify in Ollama

sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model

Expected capabilities include: completion, tools, thinking.

Runtime note

  • ollama run can take a long time to load and may time out.
  • Use HTTP API for reliable results:
curl http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'

8) Tool/Function Call Requirement (Mandatory)

How to verify tool support

  1. ollama show trained-options-model should list tools in Capabilities.
  2. ollama show trained-options-model --template should show the full template (not {{ .Prompt }}).

Tool-call test (HTTP)

curl http://192.168.1.2:30068/api/chat -d '{
  "model":"trained-options-model:latest",
  "stream":false,
  "messages":[
    {"role":"system","content":"Use tools when available."},
    {"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
  ],
  "tools":[
    {"type":"function","function":{
      "name":"calc_total",
      "description":"Compute total cost for a trade",
      "parameters":{
        "type":"object",
        "properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
        "required":["quantity","price"]
      }
    }}
  ]
}'

Expected: tool_calls in response.
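
An abbreviated, illustrative response shape (exact fields vary by Ollama version; the key thing is a non-empty message.tool_calls array):

{
  "model": "trained-options-model:latest",
  "message": {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {"function": {"name": "calc_total", "arguments": {"quantity": 3, "price": 4}}}
    ]
  },
  "done": true
}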


9) Known Failures + Fixes (Summary)

  • Ollama ADAPTER fails -> Merge LoRA into GGUF.
  • Arch mismatch (gpt-oss vs gptoss) -> Rewrite LoRA metadata.
  • ggml_can_mul_mat assertion -> Transpose LoRA A/B data.
  • MXFP4 gradient error -> model.enable_input_require_grads().
  • Bitsandbytes 4-bit OOM -> Use MXFP4 auto on GPU.
  • Triton compile error -> Use PyTorch CUDA devel image or install gcc.
  • WSL convert_lora_to_gguf.py missing transformers -> Use docker or install transformers in WSL.
  • ollama run hangs -> Use /api/generate or /api/chat via curl.

10) Retrain Checklist (Minimal Friction)

  1. Prepare data locally

    • Put docs in eBooks/.
    • Run:
      • python tools/select_relevant.py ...
      • python tools/build_dataset.py ...
    • Manually inspect training_data/relevant/corpus.txt.
  2. Sync to remote

    • Example (PowerShell):
      • scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/
  3. Stop GPU-conflicting apps

    • Stop llamacpp app in TrueNAS UI.
  4. Train LoRA in TrueNAS app

    • Ensure GPU attached.
    • Use tools/finetune_lora.py with --log-seconds 120.
    • Confirm adapter saved in training_data/lora_adapter.
  5. Convert LoRA to GGUF

    • convert_lora_to_gguf.py -> options-lora.gguf
  6. Fix arch + transpose

    • Rewrite to gptoss
    • Transpose LoRA A/B data
    • Output options-lora-gptoss-transposed2.gguf
  7. Merge into base GGUF

    • Use llama-export-lora
    • Output gpt-oss-20b-options-merged-f16-v3.gguf
  8. Ollama import

    • Copy GGUF to /mnt/.../ollama.models/imports
    • Build Modelfile with TEMPLATE
    • ollama create trained-options-model -f ...
  9. Verify tool support

    • ollama show trained-options-model
    • /api/chat tool-call test

11) Commands Used in This Run (Examples)

Remote file listing (progress + verify)

ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"

GGUF metadata check

python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY

Merge with progress updates every 2 minutes

BASE=/mnt/.../ollama.models/models/blobs/<base-blob>
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid

12) Notes About Local Files in This Repo

  • Modelfile.trained-options-model (local) still references ADAPTER and is not valid for current Ollama (ADAPTER unsupported).
  • Use the remote Modelfile in /mnt/.../ollama.models/imports/.
  • _tmp_* scripts remain from prior automation attempts (TrueNAS app creation, GPU checks, etc). Use them only if you know what they do.

13) Progress Reporting Policy (Non-Negotiable)

During any long run (training, merge, large copy):

  • Print a progress line every 120 seconds.
  • Example: date + file size, or a training loss line.
  • Do not allow silent runs.

14) Quick Sanity Checks (After Retrain)

  1. ollama list shows trained-options-model:latest
  2. ollama show trained-options-model lists tools
  3. /api/generate returns a coherent answer
  4. /api/chat returns a tool call when tools are provided
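
A quick pass over checks 1-3 from the remote host (sketch; the tool-call test in section 8 covers check 4):

sudo -n docker exec -i ix-ollama-ollama-1 ollama list | grep trained-options-model
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model | grep -A3 -i capabilities
curl -s http://192.168.1.2:30068/api/generate -d '{"model":"trained-options-model:latest","prompt":"Define theta decay in one sentence.","stream":false}'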

15) Do NOT Forget These Pitfalls

  • Arch mismatch (gpt-oss vs gptoss) will break merge.
  • LoRA tensor orientation mismatch will break merge.
  • ADAPTER in Modelfile does not work in current Ollama.
  • Tool calls only work if TEMPLATE is included.
  • Remote shell is zsh; use bash -lc for complex quoting.
  • Docker requires sudo -n.
  • Use the remote GPU as requested; do not train on CPU.

16) Current "Final" Artifacts (Reference)

LoRA adapter

/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/

Merged GGUF (final)

/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf

Ollama Modelfile

/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model

Ollama Model Name

trained-options-model:latest


17) If You Need to Rebuild Tools Support

  1. Extract base template:
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
  2. Create Modelfile with TEMPLATE block.
  3. Re-run ollama create.
  4. Verify ollama show trained-options-model lists tools.

18) Git Repo + Source Inventory (This Repo)

Remote git repo

  • URL (HTTP): https://git.rushg.me/rushabh/ollama-model-training-5060ti
  • URL (git): https://git.rushg.me/rushabh/ollama-model-training-5060ti.git
  • Auth: user will authenticate on push when prompted (username/password).

What is committed (and why)

  • AGENTS.md (this runbook; full end-to-end context).
  • README.md (quick overview + links to AGENTS).
  • tools/ scripts for extraction, filtering, dataset build, and training.
  • training_data/ curated dataset, manifests, reports, and LoRA outputs used for the run (kept for reproducibility).
  • remote/ollama/Modelfile.trained-options-model.remote (exact remote Modelfile used to enable tools).
  • remote/ollama/gptoss.template (base template pulled from gpt-oss:20b).
  • Modelfile.trained-options-model (local reference; see remote Modelfile for tool-enabled version).

What is excluded (and why)

  • eBooks/ raw source data (large; keep local and private).
  • _llama_cpp/ (upstream repo; clone on demand).
  • .venv/ and Python caches.
  • Any base model weights or Ollama blobs (too large; download via Ollama/HF).

How to recreate missing external assets

  • Base model:
    • ollama pull gpt-oss:20b on the Ollama host
    • or huggingface-cli download openai/gpt-oss-20b into HF cache
  • llama.cpp:
    • git clone https://github.com/ggml-org/llama.cpp.git
    • build with -DLLAMA_CURL=OFF if libcurl is missing.

End of AGENTS.md