# AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)

This file captures the full context, decisions, failures, fixes, commands, and
paths used to fine-tune gpt-oss-20b and deploy it into Ollama as
`trained-options-model`. It is meant to be a literal step-by-step recipe for
retraining with new data. Read this end-to-end before touching anything.

------------------------------------------------------------------------------
## 0) Hard Requirements (User Directives)

- Use local documents in this repo only.
- Dedupe repeated docs across formats; do not ingest duplicates.
- Manually remove non-relevant ebook content (preface, index, author/publisher
  pages, etc.). Options-trading content only.
- Use the GPU heavily (not the CPU).
- If the local AMD 7900XTX is not available, use the remote NVIDIA box.
- All long-running tasks must show progress and **post progress at least every
  2 minutes** (print progress or size updates; never run silently).
- Retraining must complete locally (no cloud).
- The final Ollama model name must be **trained-options-model**.
- The final Ollama model **must support tool/function calls**.
- Any destructive commands require explicit approval (do not run them
  silently).

------------------------------------------------------------------------------
## 1) Machines, OS, Access, and Credentials

### Local Windows
- Repo path: `C:\Users\Rushabh\projects\ingest-ebook-options`
- Local AMD GPU: 7900XTX (not used here; the remote NVIDIA box was used instead).
- A local Ollama install exists but was not used for training.

### Remote TrueNAS SCALE (Used for Training + Ollama)
- Host: `192.168.1.2`
- SSH port: `55555`
- User: `rushabh`
- Password: none required (key-based auth).
- SSH example:
  - `ssh -p 55555 rushabh@192.168.1.2`
- Ollama HTTP endpoint (remote): `http://192.168.1.2:30068`

### TrueNAS UI / middlewared
- The user explicitly required containers to be created and managed as TrueNAS
  Apps (middlewared/TrueNAS UI), not ad-hoc docker only.
- If an app does not show up in the UI, check middlewared and re-create it via
  the UI.

------------------------------------------------------------------------------
## 2) Storage Layout and Mounts (Critical)

### Remote TrueNAS storage root
- `/mnt/fast.storage.rushg.me/datasets/apps`

### Remote training workspace (folder, not a ZFS dataset)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch`
- IMPORTANT: the user requested a plain folder, not a ZFS dataset.

### Repo copy on remote
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options`

### Ollama model storage mount (remote)
- Host path: `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
- Container path: `/root/.ollama`
- Actual model store:
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests`

### Ollama imports folder (created by us)
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports`

### Hugging Face cache (remote)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache`
- When retraining, set `HF_HOME` or `HF_HUB_CACHE` to this path to keep
  downloads on fast storage and avoid re-downloading.
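
For example, a minimal sketch of pointing the cache at fast storage from inside
a training script (set the variable before importing `transformers`/`datasets`;
an equivalent shell `export HF_HOME=...` works the same way):
```
import os

# Point the Hugging Face cache at fast storage before importing transformers.
os.environ.setdefault(
    "HF_HOME",
    "/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache",
)
```
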
------------------------------------------------------------------------------
## 3) TrueNAS App Setup (GPU Training + Ollama)

### Ollama App
- Container name: `ix-ollama-ollama-1`
- Exposes: `0.0.0.0:30068`
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- Observed Ollama version: 0.13.5
- Maps `/root/.ollama` to `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`

### Training App (Created in TrueNAS UI)
- App name: `options-train`
- GPU: NVIDIA RTX 5060 Ti
- Reason: the user required TrueNAS UI app creation; this also guarantees GPU
  access.
- We explicitly stopped the `llamacpp` app to free the GPU before training.

### Docker permission note
- The non-root user lacks docker socket permission.
- Use `sudo -n docker ...` for all docker commands on the host.

### Shell note (remote)
- The default shell is `zsh`.
- Use `bash -lc '...'` to avoid quote-parsing issues and missing tools.
- `rg` is not installed on the remote; use `grep`/`find`.

------------------------------------------------------------------------------
## 4) Data Prep Pipeline (Dedup + Manual Relevance)

### Source docs
- Local docs in `eBooks/` (PDF/EPUB/etc.).
- Relevant pages must be selected **manually** (options-trading content only).
- Skip: prefaces, indexes, author/publisher info, boilerplate, etc.

### Step A - Extract full text and doc-level dedupe
Script: `tools/extract_corpus.py`
- Supports .pdf/.epub/.txt/.md
- Dedupes by SHA256 of the normalized text, so the same book in different
  formats is ingested only once.
- Outputs:
  - `training_data/manifest.json`
  - `training_data/corpus.txt`
  - `training_data/text/*.txt`
  - `training_data/rejected.json`
Example:
```
python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000
```
Dependencies:
- `pypdf`, `ebooklib`, `beautifulsoup4`, `lxml`, `chardet`
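
To make the dedupe behavior concrete, here is a minimal sketch of doc-level
dedupe by SHA256 of normalized text (an illustration of the idea, not the exact
code in `tools/extract_corpus.py`; the `normalize`/`seen` names are
hypothetical):
```
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so PDF and EPUB extractions
    # of the same book hash to the same value.
    return " ".join(text.lower().split())

seen: set[str] = set()

def is_duplicate(text: str) -> bool:
    digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
    if digest in seen:
        return True
    seen.add(digest)
    return False
```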

### Step B - Page/section relevance filtering (options-focused)
Script: `tools/select_relevant.py`
- Scores segments for options-trading keywords.
- Drops TOC/index/front matter.
- Dedupes by SHA256 of the normalized segment.
- Includes neighboring pages via `--neighbors`.
Outputs in `training_data/relevant`:
- `text/*.txt`
- `manifest.json`
- `report.csv`
- `corpus.txt`
Example:
```
python tools/select_relevant.py --input eBooks --out training_data/relevant \
  --min-score 10 --min-chars 800 --neighbors 1
```
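
As an illustration of the scoring idea (not the actual implementation; the
keyword list and threshold here are hypothetical):
```
KEYWORDS = ("delta", "gamma", "theta", "vega", "strike", "implied volatility",
            "spread", "assignment", "premium", "expiration")

def options_score(segment: str) -> int:
    # Count keyword hits; segments scoring below --min-score are dropped.
    text = segment.lower()
    return sum(text.count(k) for k in KEYWORDS)
```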

### Step C - Chunk to JSONL dataset
Script: `tools/build_dataset.py`
- Splits text into overlapping chunks.
- Optional junk filter and keyword score.
Outputs (paths follow `--out`; this run used `training_data/curated/`):
- `training_data/curated/dataset.jsonl`
- `training_data/curated/dataset.stats.json`
Example:
```
python tools/build_dataset.py \
  --manifest training_data/relevant/manifest.json \
  --text-dir training_data/relevant/text \
  --out training_data/curated/dataset.jsonl \
  --chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk
```
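
A minimal sketch of the overlapping chunking (illustrative only; the real
script also applies the junk filter and keyword score):
```
def chunk_text(text: str, chunk_chars: int = 6000, overlap_chars: int = 400,
               min_chars: int = 1200) -> list[str]:
    # Step forward by (chunk_chars - overlap_chars) so consecutive
    # chunks share overlap_chars of context.
    step = chunk_chars - overlap_chars
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), step)]
    return [c for c in chunks if len(c) >= min_chars]
```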

### Manual curation requirement
- The scripts are helper filters only. You must still **manually review** for
  relevance, especially to remove prefaces, indexes, disclaimers, etc.
- Use `training_data/relevant/corpus.txt` to scan human-readable content.

### Dataset used in this run
- Remote dataset path:
  `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl`
- Count: 1778 chunks.

------------------------------------------------------------------------------
## 5) Training Pipeline (LoRA fine-tune on the NVIDIA box)

### Why the local AMD GPU was not used
- The user explicitly requested the remote NVIDIA box.
- The local AMD 7900XTX was therefore not used in this run.

### Training script (repo)
- `tools/finetune_lora.py`
- Modified to make gradient checkpointing work with LoRA:
  - `model.enable_input_require_grads()` is required.
  - Without it, the MXFP4 path fails with:
    `RuntimeError: element 0 of tensors does not require grad...`

### Key training args used
- `--model openai/gpt-oss-20b`
- `--data training_data/curated/dataset.jsonl`
- `--out training_data/lora_adapter`
- `--max-length 256`
- `--epochs 1` (adjust as needed)
- `--lora-r 8 --lora-alpha 16 --lora-dropout 0.05`
- `--grad-accum 4`
- `--quant auto` (MXFP4 on GPU)
- `--log-seconds 120` (must show progress every 2 minutes)
- `--log-steps 10` (extra progress)
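
For orientation, a minimal sketch of how flags like these usually map onto
`peft`/`transformers` objects (standard APIs assumed; this is not the actual
body of `tools/finetune_lora.py`):
```
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# --lora-r 8 --lora-alpha 16 --lora-dropout 0.05
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      task_type="CAUSAL_LM")
# model = get_peft_model(base_model, lora_cfg)

# --epochs 1 --grad-accum 4 --log-steps 10 --out training_data/lora_adapter
train_args = TrainingArguments(
    output_dir="training_data/lora_adapter",
    num_train_epochs=1,
    gradient_accumulation_steps=4,
    logging_steps=10,
)
```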

### Progress requirement (must follow)
- Use `--log-seconds 120` so training prints a log line every ~2 minutes.
- For long copies or merges, print `date` plus the file size in a loop every
  120 seconds (see the sketch below and the merge example in section 11).
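
A minimal Python sketch of such a watcher (a hypothetical helper; the bash loop
in section 11 does the same thing for the merge):
```
import os
import time
from datetime import datetime

def watch_file(path: str, interval: int = 120) -> None:
    # Print a timestamp and the current file size every `interval` seconds
    # so long-running jobs are never silent. Run in the background; Ctrl-C
    # (or killing the process) stops it.
    while True:
        size = os.path.getsize(path) if os.path.exists(path) else 0
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S} {path}: {size / 1e9:.2f} GB",
              flush=True)
        time.sleep(interval)
```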

### GPU requirements
- An NVIDIA GPU is required for quantized loading; MXFP4 needs a GPU.
- GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.

### What failed and how we fixed it

1) **MXFP4 grad error**
   - Error: `RuntimeError: element 0 of tensors does not require grad`
   - Fix: in `tools/finetune_lora.py`, after
     `model.gradient_checkpointing_enable()`, add:
     `model.enable_input_require_grads()`
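
That is, the order in the training script looks like this (two real
`transformers` model methods, shown in isolation):
```
# Gradient checkpointing discards activations; the embedding outputs must
# then explicitly require grad or backward() finds nothing to differentiate
# through the frozen quantized base.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
```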

2) **Bitsandbytes 4-bit OOM**
   - With `--quant 4bit` the model OOMed even with max-memory limits.
   - CPU offload is not supported in this setup, so it still OOMed.
   - Fix: use `--quant auto` (MXFP4) instead.

3) **Triton/compile issues**
   - The Triton kernels require a C compiler inside the container.
   - Fix: use a PyTorch **CUDA devel** image (not runtime) or install
     `build-essential` inside the container.

### Output artifacts (LoRA)
`training_data/lora_adapter/` contains:
- `adapter_model.safetensors`
- `adapter_config.json`
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- `training_summary.json` (includes step count and loss EMA)

------------------------------------------------------------------------------
## 6) GGUF Conversion and Merge (Required; Ollama LoRA not supported)

### Why the merge is required
- Ollama errors out when a Modelfile uses ADAPTER:
  `Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
- Therefore the LoRA must be merged into the base GGUF.

### llama.cpp setup (remote)
- Clone location: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp`
- Build:
```
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
```
- Note: `-DLLAMA_CURL=OFF` is used because libcurl is missing.
- Binaries:
  - `build/bin/llama-export-lora`
  - `build/bin/llama-gguf`
- When running, set:
  - `LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin`

### Convert LoRA to GGUF
Use llama.cpp's `convert_lora_to_gguf.py`:
```
python convert_lora_to_gguf.py \
  --lora /path/to/training_data/lora_adapter \
  --outfile /path/to/training_data/lora_adapter/options-lora.gguf
```
### Architecture mismatch pitfall (critical)
- The base GGUF from Ollama uses `general.architecture = gptoss`.
- The LoRA GGUF from the converter uses `general.architecture = gpt-oss`.
- `llama-export-lora` therefore throws:
  `model arch and LoRA arch mismatch`

### Fix: rewrite LoRA GGUF metadata to `gptoss`
We used `gguf-py` to rewrite the metadata. Example (run inside a Python
container):
```
from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"
r = GGUFReader(inp)
w = GGUFWriter(out, "gptoss", endianess=r.endianess)

# Copy KV fields except general.architecture (and the GGUF header fields,
# which the writer regenerates itself)
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)

# Copy tensors
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type)

# Finalize the output file (without these write calls nothing is written)
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()
```

### Tensor orientation mismatch (critical)
- After the arch fix, the merge failed with:
  `GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
- Root cause: the LoRA A/B tensors were stored in an orientation incompatible
  with the base GGUF.
- Fix: transpose the LoRA A and B **data** when re-serializing the GGUF.

**Important GGUF detail:**
- GGUF stores tensor dims reversed internally.
- You must transpose the data while keeping the *original raw_shape*.
- Working approach (inside the tensor-copy loop above):
```
name = t.name
if name.endswith(".lora_a") or name.endswith(".lora_b"):
    data = np.ascontiguousarray(data.T)
w.add_tensor(name, data, raw_shape=list(map(int, t.shape)),
             raw_dtype=t.tensor_type)
```

### Working LoRA GGUF for merge
- `options-lora-gptoss-transposed2.gguf`

### Merge LoRA into base GGUF
Base GGUF path (from the Ollama blob store):
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb`

Merge command:
```
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
  -m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
  --lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
  -o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
```

### Merged output (final)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
- Size: ~13 GB
- File type: F16

### Intermediate artifacts kept (not deleted)
- `options-lora-gptoss.gguf`
- `options-lora-gptoss-transposed.gguf`
- `options-lora-gptoss-transposed-debug.gguf`
- `options-lora-gptoss-transposed2.gguf`
- `gpt-oss-20b-options-merged-f16-v2.gguf` (14 MB, failed)
- `gpt-oss-20b-options-merged-f16.gguf` (0 bytes, failed)

------------------------------------------------------------------------------
## 7) Ollama Integration (Final Model)

### Why ADAPTER does not work
A Modelfile with ADAPTER fails:
```
Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented
```
Therefore the merged GGUF is mandatory.

### Copy the merged GGUF into Ollama imports
```
mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
  /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/
```

### Modelfile (with tool support)
**Important:** tools only work if the TEMPLATE block matches the base model's
template. Without TEMPLATE, Ollama falls back to `{{ .Prompt }}` and tools are
disabled.

We extracted the template from the base model:
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
  > /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template
```

Then built the Modelfile at
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`:
```
FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
<paste full gpt-oss:20b template here>
"""

SYSTEM """You are a knowledgeable options trading assistant.
Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions.
If information is uncertain, say so rather than guessing."""
```

### Create the model
```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```

### Verify in Ollama
```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model
```
Expected capabilities include: `completion`, `tools`, `thinking`.

### Runtime note
- `ollama run` can take a long time to load and may time out.
- Use the HTTP API for reliable results:
```
curl http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
```

------------------------------------------------------------------------------
## 8) Tool/Function Call Requirement (Mandatory)

### How to verify tool support
1) `ollama show trained-options-model` should list `tools` in Capabilities.
2) `ollama show trained-options-model --template` should show the full template
   (not `{{ .Prompt }}`).

### Tool-call test (HTTP)
```
curl http://192.168.1.2:30068/api/chat -d '{
  "model":"trained-options-model:latest",
  "stream":false,
  "messages":[
    {"role":"system","content":"Use tools when available."},
    {"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
  ],
  "tools":[
    {"type":"function","function":{
      "name":"calc_total",
      "description":"Compute total cost for a trade",
      "parameters":{
        "type":"object",
        "properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
        "required":["quantity","price"]
      }
    }}
  ]
}'
```
Expected: `tool_calls` in the response's `message`.
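
The same check from Python, as a minimal sketch using `requests` (payload as
above, trimmed here):
```
import requests

resp = requests.post(
    "http://192.168.1.2:30068/api/chat",
    json={
        "model": "trained-options-model:latest",
        "stream": False,
        "messages": [{"role": "user",
                      "content": "Compute total for quantity=3 price=4. Use tool."}],
        "tools": [{"type": "function", "function": {
            "name": "calc_total",
            "description": "Compute total cost for a trade",
            "parameters": {
                "type": "object",
                "properties": {"quantity": {"type": "number"},
                               "price": {"type": "number"}},
                "required": ["quantity", "price"],
            },
        }}],
    },
    timeout=600,
)
# A tool-capable model answers with message.tool_calls instead of prose.
print(resp.json()["message"].get("tool_calls"))
```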

------------------------------------------------------------------------------
## 9) Known Failures + Fixes (Summary)

- **Ollama ADAPTER fails** -> merge the LoRA into the GGUF.
- **Arch mismatch** (`gpt-oss` vs `gptoss`) -> rewrite the LoRA metadata.
- **ggml_can_mul_mat assertion** -> transpose the LoRA A/B data.
- **MXFP4 gradient error** -> `model.enable_input_require_grads()`.
- **Bitsandbytes 4-bit OOM** -> use MXFP4 (`--quant auto`) on GPU.
- **Triton compile error** -> use a PyTorch CUDA *devel* image or install gcc.
- **WSL convert_lora_to_gguf.py missing transformers** -> use docker or install
  transformers in WSL.
- **`ollama run` hangs** -> use `/api/generate` or `/api/chat` via curl.

------------------------------------------------------------------------------
## 10) Retrain Checklist (Minimal Friction)

1) **Prepare data locally**
   - Put docs in `eBooks/`.
   - Run:
     - `python tools/select_relevant.py ...`
     - `python tools/build_dataset.py ...`
   - Manually inspect `training_data/relevant/corpus.txt`.

2) **Sync to remote**
   - Example (PowerShell):
     - `scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/`

3) **Stop GPU-conflicting apps**
   - Stop the `llamacpp` app in the TrueNAS UI.

4) **Train LoRA in the TrueNAS app**
   - Ensure the GPU is attached.
   - Use `tools/finetune_lora.py` with `--log-seconds 120`.
   - Confirm the adapter is saved in `training_data/lora_adapter`.

5) **Convert LoRA to GGUF**
   - `convert_lora_to_gguf.py` -> `options-lora.gguf`

6) **Fix arch + transpose**
   - Rewrite the architecture to `gptoss`.
   - Transpose the LoRA A/B data.
   - Output: `options-lora-gptoss-transposed2.gguf`

7) **Merge into base GGUF**
   - Use `llama-export-lora`.
   - Output: `gpt-oss-20b-options-merged-f16-v3.gguf`

8) **Ollama import**
   - Copy the GGUF to `/mnt/.../ollama.models/imports`.
   - Build the Modelfile with TEMPLATE.
   - `ollama create trained-options-model -f ...`

9) **Verify tool support**
   - `ollama show trained-options-model`
   - `/api/chat` tool-call test

------------------------------------------------------------------------------
## 11) Commands Used in This Run (Examples)

### Remote file listing (progress + verify)
```
ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"
```

### GGUF metadata check
```
python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY
```

### Merge with progress updates every 2 minutes
```
BASE=/mnt/.../ollama.models/models/blobs/<base-blob>
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid
```

------------------------------------------------------------------------------
## 12) Notes About Local Files in This Repo

- `Modelfile.trained-options-model` (local) still references ADAPTER and is
  **not** valid for current Ollama (ADAPTER is unsupported).
- Use the remote Modelfile in `/mnt/.../ollama.models/imports/`.
- `_tmp_*` scripts are left over from earlier automation attempts (TrueNAS app
  creation, GPU checks, etc.). Use them only if you know what they do.

------------------------------------------------------------------------------
## 13) Progress Reporting Policy (Non-Negotiable)

During any long run (training, merge, large copy):
- Print a progress line every 120 seconds.
- Example: `date` plus a file size, or a training-loss line.
- Do not allow silent runs.

------------------------------------------------------------------------------
## 14) Quick Sanity Checks (After Retrain)

1) `ollama list` shows `trained-options-model:latest`
2) `ollama show trained-options-model` lists `tools`
3) `/api/generate` returns a coherent answer
4) `/api/chat` returns a tool call when tools are provided
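
A minimal sketch that automates checks 1 and 3 over HTTP (assuming the standard
Ollama `/api/tags` and `/api/generate` endpoints; checks 2 and 4 are covered in
sections 8 and 17):
```
import requests

BASE = "http://192.168.1.2:30068"

# Check 1: the model is listed.
tags = requests.get(f"{BASE}/api/tags", timeout=30).json()
assert any(m["name"] == "trained-options-model:latest" for m in tags["models"])

# Check 3: generation returns a non-empty answer.
gen = requests.post(f"{BASE}/api/generate", timeout=600, json={
    "model": "trained-options-model:latest",
    "prompt": "Explain delta and gamma briefly.",
    "stream": False,
}).json()
assert gen["response"].strip()
print("sanity checks passed")
```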

------------------------------------------------------------------------------
## 15) Do NOT Forget These Pitfalls

- Arch mismatch (`gpt-oss` vs `gptoss`) **will break the merge**.
- LoRA tensor orientation mismatch **will break the merge**.
- ADAPTER in a Modelfile **does not work** in current Ollama.
- Tool calls work **only** if TEMPLATE is included.
- The remote shell is zsh; use `bash -lc` for complex quoting.
- Docker requires `sudo -n`.
- Use the remote GPU as requested; do not train on CPU.

------------------------------------------------------------------------------
## 16) Current "Final" Artifacts (Reference)

### LoRA adapter
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/`

### Merged GGUF (final)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`

### Ollama Modelfile
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`

### Ollama Model Name
`trained-options-model:latest`

------------------------------------------------------------------------------
## 17) If You Need to Rebuild Tool Support

1) Extract the base template:
```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
```
2) Create a Modelfile with the TEMPLATE block.
3) Re-run `ollama create`.
4) Verify that `ollama show trained-options-model` lists `tools`.

------------------------------------------------------------------------------
## 18) Git Repo + Source Inventory (This Repo)

### Remote git repo
- URL (HTTP): `https://git.rushg.me/rushabh/ollama-model-training-5060ti`
- URL (git): `https://git.rushg.me/rushabh/ollama-model-training-5060ti.git`
- Auth: authenticate on push when prompted (username/password).

### What is committed (and why)
- `AGENTS.md` (this runbook; full end-to-end context).
- `README.md` (quick overview + links to AGENTS.md).
- `tools/` scripts for extraction, filtering, dataset build, and training.
- `training_data/` curated dataset, manifests, reports, and the LoRA outputs
  used for the run (kept for reproducibility).
- `remote/ollama/Modelfile.trained-options-model.remote` (the exact remote
  Modelfile used to enable tools).
- `remote/ollama/gptoss.template` (base template pulled from gpt-oss:20b).
- `Modelfile.trained-options-model` (local reference; see the remote Modelfile
  for the tool-enabled version).

### What is excluded (and why)
- `eBooks/` raw source data (large; keep local and private).
- `_llama_cpp/` (upstream repo; clone on demand).
- `.venv/` and Python caches.
- Any base model weights or Ollama blobs (too large; download via Ollama/HF).

### How to recreate missing external assets
- Base model:
  - `ollama pull gpt-oss:20b` on the Ollama host, or
  - `huggingface-cli download openai/gpt-oss-20b` into the HF cache.
- llama.cpp:
  - `git clone https://github.com/ggml-org/llama.cpp.git`
  - Build with `-DLLAMA_CURL=OFF` if libcurl is missing.

------------------------------------------------------------------------------
End of AGENTS.md