# AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)

This file captures the full context, decisions, failures, fixes, commands, and
paths used to fine-tune `gpt-oss-20b` and deploy it into Ollama as
`trained-options-model`. It is meant to be a literal step-by-step recipe for
retraining with new data. Read this end-to-end before touching anything.
## 0) Hard Requirements (User Directives)
- Use local documents in this repo only.
- Dedupe repeated docs across formats; do not ingest duplicates.
- Manually remove non-relevant ebook content (preface, index, author/publisher pages, etc). Options-trading content only.
- Use GPU heavily (not CPU).
- If local AMD 7900XTX is not available, use the remote NVIDIA box.
- All long-running tasks must post progress at least every 2 minutes (print progress or size updates; never run silently).
- Retraining must complete locally (no cloud).
- Final Ollama model name must be trained-options-model.
- Final Ollama model must support tool/function calls.
- Any destructive commands must require explicit approval (do not run them silently).
## 1) Machines, OS, Access, and Credentials

### Local Windows
- Repo path: `C:\Users\Rushabh\projects\ingest-ebook-options`
- Local AMD GPU: 7900XTX (not used here; the remote NVIDIA box was used instead).
- Local Ollama install exists but was not used for training.
### Remote TrueNAS SCALE (Used for Training + Ollama)
- Host: `192.168.1.2`
- SSH port: `55555`
- User: `rushabh`
- Password: none required (key-based / no password).
- SSH example: `ssh -p 55555 rushabh@192.168.1.2`
- Ollama HTTP endpoint (remote): `http://192.168.1.2:30068`
### TrueNAS UI / middlewared
- The user explicitly required creating and managing containers as TrueNAS Apps (middlewared/TrueNAS UI), not as ad-hoc docker containers.
- If an app does not show in the UI, check middlewared and re-create it via the UI.
## 2) Storage Layout and Mounts (Critical)

### Remote TrueNAS storage root
`/mnt/fast.storage.rushg.me/datasets/apps`

### Remote training workspace (folder, not ZFS dataset)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch`
- IMPORTANT: the user requested a folder, not a ZFS dataset.

### Repo copy on remote
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options`
### Ollama model storage mount (remote)
- Host path: `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
- Container path: `/root/.ollama`
- Actual model store:
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests`

### Ollama imports folder (created by us)
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports`
### Hugging Face cache (remote)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache`
- When retraining, set `HF_HOME` or `HF_HUB_CACHE` to this path to keep downloads on fast storage and avoid redownloading.
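A minimal sketch of the environment setup before launching training (either variable works per the note above; `HF_HUB_CACHE` normally defaults to `$HF_HOME/hub`):

```bash
# Keep Hugging Face downloads on the fast storage mount
export HF_HOME=/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache
# Optional: pin the hub cache explicitly as well
export HF_HUB_CACHE="$HF_HOME/hub"
```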
## 3) TrueNAS App Setup (GPU Training + Ollama)

### Ollama App
- Container name: `ix-ollama-ollama-1`
- Exposes: `0.0.0.0:30068`
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- Observed Ollama version: 0.13.5
- Uses `/root/.ollama` mapped to `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
### Training App (Created in TrueNAS UI)
- App name: `options-train`
- GPU: NVIDIA RTX 5060 Ti
- Reason: the user demanded TrueNAS UI app creation; it also ensures GPU access.
- We explicitly stopped the `llamacpp` app to free the GPU before training.
### Docker permission note
- The non-root user lacks docker socket permission.
- Use `sudo -n docker ...` for all docker commands on the host.
### Shell note (remote)
- Default shell is `zsh`.
- Use `bash -lc '...'` to avoid quote-parsing issues and missing tools.
- `rg` is not installed on the remote; use `grep`/`find`.
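For example, a sketch of a one-shot docker command run from the local machine (any docker subcommand can be substituted):

```bash
# Run a docker command on the TrueNAS host from local.
# bash -lc sidesteps zsh quoting quirks; sudo -n fails fast instead of prompting.
ssh -p 55555 rushabh@192.168.1.2 "bash -lc 'sudo -n docker ps'"
```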
## 4) Data Prep Pipeline (Dedup + Manual Relevance)

### Source docs
- Local docs in `eBooks/` (PDF/EPUB/etc).
- You must manually select relevant pages (options-trading content only).
- Skip: prefaces, index, author/publisher info, boilerplate, etc.
### Step A - Extract full text and doc-level dedupe
Script: `tools/extract_corpus.py`
- Supports `.pdf`/`.epub`/`.txt`/`.md`.
- Dedupes by SHA256 of normalized text across different formats.
- Outputs:
  - `training_data/manifest.json`
  - `training_data/corpus.txt`
  - `training_data/text/*.txt`
  - `training_data/rejected.json`

Example:

```
python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000
```

Dependencies: `pypdf`, `ebooklib`, `beautifulsoup4`, `lxml`, `chardet`
### Step B - Page/section relevance filtering (options-focused)
Script: `tools/select_relevant.py`
- Scores segments for options-trading keywords.
- Drops TOC/index/front matter.
- Dedupes by SHA256 of the normalized segment.
- Includes neighboring pages via `--neighbors`.
- Outputs in `training_data/relevant`:
  - `text/*.txt`
  - `manifest.json`
  - `report.csv`
  - `corpus.txt`

Example:

```
python tools/select_relevant.py --input eBooks --out training_data/relevant \
  --min-score 10 --min-chars 800 --neighbors 1
```
### Step C - Chunk to JSONL dataset
Script: `tools/build_dataset.py`
- Splits into overlapping chunks.
- Optional junk filter and keyword score.
- Outputs:
  - `training_data/relevant/dataset.jsonl`
  - `training_data/relevant/dataset.stats.json`

Example:

```
python tools/build_dataset.py \
  --manifest training_data/relevant/manifest.json \
  --text-dir training_data/relevant/text \
  --out training_data/curated/dataset.jsonl \
  --chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk
```
### Manual curation requirement
- The scripts are helper filters only. You must still manually review for relevance, especially to remove prefaces, indexes, disclaimers, etc.
- Use `training_data/relevant/corpus.txt` to scan human-readable content.
### Dataset used in this run
- Remote dataset path: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl`
- Count: 1778 chunks.
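A quick sanity check before training (a sketch; assumes one JSON object per line, which is the JSONL layout `build_dataset.py` emits):

```bash
# Line count should match the chunk count above (1778 in this run)
wc -l training_data/curated/dataset.jsonl
# Peek at the first record to confirm it parses as JSON
head -n 1 training_data/curated/dataset.jsonl | python3 -m json.tool | head
```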
## 5) Training Pipeline (LoRA fine-tune on NVIDIA box)

### Why the local AMD GPU was not used
- The user explicitly requested the remote NVIDIA box.
- The local AMD 7900XTX was not used in this run.
### Training script (repo)
`tools/finetune_lora.py`
- Modified to fix gradient checkpointing + LoRA: `model.enable_input_require_grads()` is required.
- Without it, the MXFP4 path fails with:
  `RuntimeError: element 0 of tensors does not require grad...`
### Key training args used
- `--model openai/gpt-oss-20b`
- `--data training_data/curated/dataset.jsonl`
- `--out training_data/lora_adapter`
- `--max-length 256`
- `--epochs 1` (adjust as needed)
- `--lora-r 8 --lora-alpha 16 --lora-dropout 0.05`
- `--grad-accum 4`
- `--quant auto` (MXFP4 on GPU)
- `--log-seconds 120` (must show progress every 2 minutes)
- `--log-steps 10` (extra progress)
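The flags above assembled into a single invocation (a sketch; run from the repo root inside the training app):

```bash
python tools/finetune_lora.py \
  --model openai/gpt-oss-20b \
  --data training_data/curated/dataset.jsonl \
  --out training_data/lora_adapter \
  --max-length 256 --epochs 1 \
  --lora-r 8 --lora-alpha 16 --lora-dropout 0.05 \
  --grad-accum 4 --quant auto \
  --log-seconds 120 --log-steps 10
```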
### Progress requirement (must follow)
- Use `--log-seconds 120` so training prints logs every ~2 minutes.
- For long copies or merges, print `date` + file size in a loop every 120 seconds (section 11 shows a working loop).
### GPU requirements
- An NVIDIA GPU is required for quantized loading; MXFP4 needs a GPU.
- GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.
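To confirm the GPU is attached before starting a long run (a sketch; `<training-container>` is a placeholder, since the container name of the `options-train` app is not recorded here):

```bash
# Find the training app's container, then check GPU visibility inside it
sudo -n docker ps
sudo -n docker exec -i <training-container> nvidia-smi
```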
### What failed and how we fixed it

1. MXFP4 grad error
   - Error: `RuntimeError: element 0 of tensors does not require grad`
   - Fix: in `tools/finetune_lora.py`, after `model.gradient_checkpointing_enable()`, add `model.enable_input_require_grads()`.

2. Bitsandbytes 4-bit OOM
   - With `--quant 4bit` the model OOMed even with max-memory limits.
   - CPU offload is not supported with this setup; still OOM.
   - Fix: use `--quant auto` (MXFP4) instead.

3. Triton/compile issues
   - Triton kernels required a compiler in the container.
   - Fix: use a PyTorch CUDA devel image (not runtime) or install `build-essential` inside the container (see the sketch below).
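If staying on a runtime image, installing the toolchain in place is enough (a sketch; assumes a Debian/Ubuntu-based PyTorch image):

```bash
# Give Triton a C compiler inside the training container
apt-get update && apt-get install -y build-essential
```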
### Output artifacts (LoRA)
`training_data/lora_adapter/` contains:
- `adapter_model.safetensors`
- `adapter_config.json`
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- `training_summary.json` (includes steps and loss EMA)
## 6) GGUF Conversion and Merge (Required; Ollama LoRA Not Supported)

### Why merge is required
- Ollama error when using `ADAPTER`:
  `Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
- Therefore, the LoRA must be merged into the base GGUF.
### llama.cpp setup (remote)
- Clone location: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp`
- Build:

```
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
```

- Note: `-DLLAMA_CURL=OFF` was used due to missing libcurl.
- Binaries:
  - `build/bin/llama-export-lora`
  - `build/bin/llama-gguf`
- When running, set: `LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin`
### Convert LoRA to GGUF
Use `convert_lora_to_gguf.py`:

```
python convert_lora_to_gguf.py \
  --lora /path/to/training_data/lora_adapter \
  --outfile /path/to/training_data/lora_adapter/options-lora.gguf
```
### Architecture mismatch pitfall (critical)
- The base GGUF from Ollama uses `general.architecture = gptoss`.
- The LoRA GGUF from the converter uses `general.architecture = gpt-oss`.
- `llama-export-lora` then throws: `model arch and LoRA arch mismatch`
### Fix: rewrite LoRA GGUF metadata to gptoss
We used gguf-py to rewrite the metadata. Example (run inside a Python container). The final write/close calls are the standard gguf-py finalize sequence, added here so the snippet actually emits the output file:

```python
from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"

r = GGUFReader(inp)
# The "gptoss" arch string replaces general.architecture in the new file
w = GGUFWriter(out, "gptoss", endianess=r.endianess)

# Copy KV fields except general.architecture (set above) and general.alignment
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)

# Copy tensors (see the transpose fix below for the lora_a/lora_b tensors)
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type, tensor_endianess=r.endianess)

# Finalize: write header, KV data, and tensor data, then close
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()
```
### Tensor orientation mismatch (critical)
- After the arch fix, the merge failed with: `GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
- Root cause: the LoRA A/B tensors had an orientation incompatible with the base GGUF.
- Fix: transpose the LoRA A and B data when re-serializing the GGUF.

Important GGUF detail:
- GGUF stores tensor dims reversed internally.
- You must transpose the data while keeping the original `raw_shape`.
- Working approach (inside the tensor-copy loop above):

```python
if name.endswith(".lora_a") or name.endswith(".lora_b"):
    data = np.ascontiguousarray(data.T)
w.add_tensor(name, data, raw_shape=shape, raw_dtype=..., ...)
```
### Working LoRA GGUF for merge
`options-lora-gptoss-transposed2.gguf`
### Merge LoRA into base GGUF
Base GGUF path (from the Ollama blob store):
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb`

Merge command:

```
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
  -m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
  --lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
  -o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
```
### Merged output (final)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
- Size: ~13 GB
- File type: F16
### Intermediate artifacts kept (not deleted)
- `options-lora-gptoss.gguf`
- `options-lora-gptoss-transposed.gguf`
- `options-lora-gptoss-transposed-debug.gguf`
- `options-lora-gptoss-transposed2.gguf`
- `gpt-oss-20b-options-merged-f16-v2.gguf` (14 MB, failed)
- `gpt-oss-20b-options-merged-f16.gguf` (0 bytes, failed)
## 7) Ollama Integration (Final Model)

### Why ADAPTER does not work
A Modelfile with `ADAPTER` fails:
`Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
Therefore, a merged GGUF is mandatory.
### Copy merged GGUF into Ollama imports

```
mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
  /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/
```
### Modelfile (with tool support)
Important: tools only work if the TEMPLATE block matches the base model
template. Without TEMPLATE, Ollama shows `{{ .Prompt }}` and tools are disabled.

We extracted the template from the base model:

```
sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
  > /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template
```

Then built the Modelfile at
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`:

```
FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
<paste full gpt-oss:20b template here>
"""
SYSTEM """You are a knowledgeable options trading assistant.
Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions.
If information is uncertain, say so rather than guessing."""
```
### Create the model

```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```
### Verify in Ollama

```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model
```

Expected capabilities include: completion, tools, thinking.
### Runtime note
- `ollama run` can take a long time to load and may time out.
- Use the HTTP API for reliable results:

```
curl http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
```
## 8) Tool/Function Call Requirement (Mandatory)

### How to verify tool support
- `ollama show trained-options-model` should list `tools` under Capabilities.
- `ollama show trained-options-model --template` should show the full template (not `{{ .Prompt }}`).
### Tool-call test (HTTP)

```
curl http://192.168.1.2:30068/api/chat -d '{
  "model":"trained-options-model:latest",
  "stream":false,
  "messages":[
    {"role":"system","content":"Use tools when available."},
    {"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
  ],
  "tools":[
    {"type":"function","function":{
      "name":"calc_total",
      "description":"Compute total cost for a trade",
      "parameters":{
        "type":"object",
        "properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
        "required":["quantity","price"]
      }
    }}
  ]
}'
```

Expected: `tool_calls` in the response.
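To check programmatically (a sketch; `toolcall.json` is a hypothetical file holding the request body above):

```bash
# Prints the tool_calls array from the chat response; "None" means no tool call was made
curl -s http://192.168.1.2:30068/api/chat -d @toolcall.json \
  | python3 -c "import sys, json; print(json.load(sys.stdin)['message'].get('tool_calls'))"
```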
## 9) Known Failures + Fixes (Summary)
- Ollama ADAPTER fails -> merge the LoRA into the GGUF.
- Arch mismatch (`gpt-oss` vs `gptoss`) -> rewrite the LoRA metadata.
- `ggml_can_mul_mat` assertion -> transpose the LoRA A/B data.
- MXFP4 gradient error -> `model.enable_input_require_grads()`.
- Bitsandbytes 4-bit OOM -> use MXFP4 (`--quant auto`) on GPU.
- Triton compile error -> use a PyTorch CUDA devel image or install gcc.
- WSL `convert_lora_to_gguf.py` missing transformers -> use docker or install transformers in WSL.
- `ollama run` hangs -> use `/api/generate` or `/api/chat` via curl.
## 10) Retrain Checklist (Minimal Friction)

1. Prepare data locally
   - Put docs in `eBooks/`.
   - Run:
     - `python tools/select_relevant.py ...`
     - `python tools/build_dataset.py ...`
   - Manually inspect `training_data/relevant/corpus.txt`.

2. Sync to remote
   - Example (PowerShell):
     `scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/`

3. Stop GPU-conflicting apps
   - Stop the `llamacpp` app in the TrueNAS UI.

4. Train LoRA in the TrueNAS app
   - Ensure the GPU is attached.
   - Use `tools/finetune_lora.py` with `--log-seconds 120`.
   - Confirm the adapter is saved in `training_data/lora_adapter`.

5. Convert LoRA to GGUF
   - `convert_lora_to_gguf.py` -> `options-lora.gguf`

6. Fix arch + transpose
   - Rewrite the architecture to `gptoss`.
   - Transpose the LoRA A/B data.
   - Output: `options-lora-gptoss-transposed2.gguf`

7. Merge into base GGUF
   - Use `llama-export-lora`.
   - Output: `gpt-oss-20b-options-merged-f16-v3.gguf`

8. Ollama import
   - Copy the GGUF to `/mnt/.../ollama.models/imports`.
   - Build the Modelfile with TEMPLATE.
   - `ollama create trained-options-model -f ...`

9. Verify tool support
   - `ollama show trained-options-model`
   - `/api/chat` tool-call test
## 11) Commands Used in This Run (Examples)

### Remote file listing (progress + verify)

```
ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"
```

### GGUF metadata check

```
python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY
```

### Merge with progress updates every 2 minutes

```
BASE=/mnt/.../ollama.models/models/blobs/<base-blob>
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid
```
## 12) Notes About Local Files in This Repo
- `Modelfile.trained-options-model` (local) still references ADAPTER and is not valid for current Ollama (ADAPTER unsupported).
- Use the remote Modelfile in `/mnt/.../ollama.models/imports/`.
- `_tmp_*` scripts exist from prior automation attempts (TrueNAS app creation, GPU checks, etc). Use them only if you know what they do.
## 13) Progress Reporting Policy (Non-Negotiable)
During any long run (training, merge, large copy):
- Print a progress line every 120 seconds.
- Example: `date` + file size, or a training loss line.
- Do not allow silent runs.
## 14) Quick Sanity Checks (After Retrain)
- `ollama list` shows `trained-options-model:latest`
- `ollama show trained-options-model` lists `tools`
- `/api/generate` returns a coherent answer
- `/api/chat` returns a tool call when tools are provided
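The first two checks can be scripted (a sketch; the chained `&&` stops at the first failing check):

```bash
sudo -n docker exec -i ix-ollama-ollama-1 ollama list | grep -q trained-options-model \
  && sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model | grep -qi tools \
  && echo "OK: model present and tools listed"
```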
## 15) Do NOT Forget These Pitfalls
- Arch mismatch (`gpt-oss` vs `gptoss`) will break the merge.
- LoRA tensor orientation mismatch will break the merge.
- ADAPTER in a Modelfile does not work in current Ollama.
- Tool calls only work if TEMPLATE is included.
- The remote shell is zsh; use `bash -lc` for complex quoting.
- Docker requires `sudo -n`.
- Use the remote GPU as requested; do not train on CPU.
16) Current "Final" Artifacts (Reference)
LoRA adapter
/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/
Merged GGUF (final)
/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
Ollama Modelfile
/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model
Ollama Model Name
trained-options-model:latest
## 17) If You Need to Rebuild Tools Support
- Extract the base template:

```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
```

- Create a Modelfile with a TEMPLATE block.
- Re-run `ollama create`.
- Verify that `ollama show trained-options-model` lists `tools`.
## 18) Git Repo + Source Inventory (This Repo)

### Remote git repo
- URL (HTTP): `https://git.rushg.me/rushabh/ollama-model-training-5060ti`
- URL (git): `https://git.rushg.me/rushabh/ollama-model-training-5060ti.git`
- Auth: the user will authenticate on push when prompted (username/password).

### What is committed (and why)
- `AGENTS.md` (this runbook; full end-to-end context).
- `README.md` (quick overview + links to AGENTS).
- `tools/` scripts for extraction, filtering, dataset build, and training.
- `training_data/` curated dataset, manifests, reports, and LoRA outputs used for the run (kept for reproducibility).
- `remote/ollama/Modelfile.trained-options-model.remote` (exact remote Modelfile used to enable tools).
- `remote/ollama/gptoss.template` (base template pulled from gpt-oss:20b).
- `Modelfile.trained-options-model` (local reference; see the remote Modelfile for the tool-enabled version).

### What is excluded (and why)
- `eBooks/` raw source data (large; keep local and private).
- `_llama_cpp/` (upstream repo; clone on demand).
- `.venv/` and Python caches.
- Any base model weights or Ollama blobs (too large; download via Ollama/HF).

### How to recreate missing external assets
- Base model: `ollama pull gpt-oss:20b` on the Ollama host, or `huggingface-cli download openai/gpt-oss-20b` into the HF cache.
- llama.cpp: `git clone https://github.com/ggml-org/llama.cpp.git`; build with `-DLLAMA_CURL=OFF` if libcurl is missing.
End of AGENTS.md