# AGENTS.md - ingest-ebook-options Runbook (Deep Context + Retrain Guide)

This file captures the full context, decisions, failures, fixes, commands, and
paths used to fine-tune gpt-oss-20b and deploy it into Ollama as
`trained-options-model`. It is meant to be a literal step-by-step recipe for
retraining with new data. Read this end-to-end before touching anything.

------------------------------------------------------------------------------

## 0) Hard Requirements (User Directives)

- Use local documents in this repo only.
- Dedupe repeated docs across formats; do not ingest duplicates.
- Manually remove non-relevant ebook content (preface, index, author/publisher
  pages, etc.). Options-trading content only.
- Use the GPU heavily (not the CPU).
- If the local AMD 7900XTX is not available, use the remote NVIDIA box.
- All long-running tasks must show progress and **post progress at least every
  2 minutes** (print progress or size updates; never run silently).
- Retraining must complete locally (no cloud).
- Final Ollama model name must be **trained-options-model**.
- The final Ollama model **must support tool/function calls**.
- Any destructive commands require explicit approval (do not run them
  silently).

------------------------------------------------------------------------------

## 1) Machines, OS, Access, and Credentials

### Local Windows
- Repo path: `C:\Users\Rushabh\projects\ingest-ebook-options`
- Local AMD GPU: 7900XTX (not used here; the remote NVIDIA box was used instead).
- A local Ollama install exists but was not used for training.

### Remote TrueNAS SCALE (Used for Training + Ollama)
- Host: `192.168.1.2`
- SSH port: `55555`
- User: `rushabh`
- Password: none required (key-based auth).
- SSH example: `ssh -p 55555 rushabh@192.168.1.2`
- Ollama HTTP endpoint (remote): `http://192.168.1.2:30068`

### TrueNAS UI / middlewared
- The user explicitly required that containers be created and managed as TrueNAS
  Apps (middlewared/TrueNAS UI), not as ad-hoc docker containers.
- If an app does not show in the UI, check middlewared and re-create it via the UI.

------------------------------------------------------------------------------

## 2) Storage Layout and Mounts (Critical)

### Remote TrueNAS storage root
- `/mnt/fast.storage.rushg.me/datasets/apps`

### Remote training workspace (folder, not ZFS dataset)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch`
- IMPORTANT: the user requested a plain folder, not a ZFS dataset.

### Repo copy on remote
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options`

### Ollama model storage mount (remote)
- Host path: `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`
- Container path: `/root/.ollama`
- Actual model store:
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs`
  - `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/manifests`

### Ollama imports folder (created by us)
- `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports`

### Hugging Face cache (remote)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache`
- When retraining, set `HF_HOME` or `HF_HUB_CACHE` to this path to keep downloads
  on fast storage and avoid redownloading.
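
A minimal sketch of wiring that up in Python (the variable must be set before
the first `transformers`/`huggingface_hub` import, which reads it at import time):

```
import os

# Point Hugging Face at fast storage; must run before importing transformers.
os.environ.setdefault(
    "HF_HOME",
    "/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/hf_cache",
)

from transformers import AutoTokenizer  # now resolves against the fast-storage cache
```
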
------------------------------------------------------------------------------

## 3) TrueNAS App Setup (GPU Training + Ollama)

### Ollama App
- Container name: `ix-ollama-ollama-1`
- Exposes: `0.0.0.0:30068`
- GPU: NVIDIA RTX 5060 Ti (16 GB VRAM)
- Observed Ollama version: 0.13.5
- Maps `/root/.ollama` to `/mnt/fast.storage.rushg.me/datasets/apps/ollama.models`

### Training App (Created in TrueNAS UI)
- App name: `options-train`
- GPU: NVIDIA RTX 5060 Ti
- Reason: the user demanded TrueNAS UI app creation; it also guarantees GPU access.
- We explicitly stopped the `llamacpp` app to free the GPU before training.

### Docker permission note
- The non-root user lacks docker socket permission.
- Use `sudo -n docker ...` for all docker commands on the host.

### Shell note (remote)
- The default shell is `zsh`.
- Use `bash -lc '...'` to avoid quote-parsing issues and missing tools.
- `rg` is not installed on the remote; use `grep`/`find`.

------------------------------------------------------------------------------

## 4) Data Prep Pipeline (Dedup + Manual Relevance)

### Source docs
- Local docs in `eBooks/` (PDF/EPUB/etc.).
- You must **manually** select relevant pages (options-trading content only).
- Skip: prefaces, indexes, author/publisher info, boilerplate, etc.

### Step A - Extract full text and doc-level dedupe
Script: `tools/extract_corpus.py`
- Supports .pdf/.epub/.txt/.md
- Dedupes by SHA256 of the normalized text, so the same book in different
  formats is ingested only once (see the sketch below).
- Outputs:
  - `training_data/manifest.json`
  - `training_data/corpus.txt`
  - `training_data/text/*.txt`
  - `training_data/rejected.json`

Example:

```
python tools/extract_corpus.py --input eBooks --out training_data --min-chars 2000
```

Dependencies:
- `pypdf`, `ebooklib`, `beautifulsoup4`, `lxml`, `chardet`
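
The dedupe idea in miniature (an illustrative sketch; the function names here
are hypothetical, and the real normalization and bookkeeping live in
`tools/extract_corpus.py`):

```
import hashlib

def content_key(text: str) -> str:
    # Normalize whitespace and case so the same book extracted from
    # PDF vs EPUB hashes identically despite layout differences.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

seen: set[str] = set()

def is_duplicate(text: str) -> bool:
    key = content_key(text)
    if key in seen:
        return True
    seen.add(key)
    return False
```
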
### Step B - Page/section relevance filtering (Options-focused)
Script: `tools/select_relevant.py`
- Scores segments for options-trading keywords.
- Drops TOC/index/front matter.
- Dedupes by SHA256 of the normalized segment.
- Includes neighboring pages via `--neighbors`.

Outputs in `training_data/relevant`:
- `text/*.txt`
- `manifest.json`
- `report.csv`
- `corpus.txt`

Example:

```
python tools/select_relevant.py --input eBooks --out training_data/relevant \
  --min-score 10 --min-chars 800 --neighbors 1
```
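
The scoring idea in miniature (illustrative only; this keyword list is
hypothetical, and the real list and thresholds live in `tools/select_relevant.py`):

```
OPTIONS_KEYWORDS = ("call", "put", "strike", "delta", "gamma", "theta", "vega",
                    "implied volatility", "spread", "assignment", "expiration")

def score_segment(text: str) -> int:
    # Count keyword occurrences; segments below --min-score (10 in this run)
    # are dropped as non-options content.
    t = text.lower()
    return sum(t.count(k) for k in OPTIONS_KEYWORDS)
```
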
### Step C - Chunk to JSONL dataset
Script: `tools/build_dataset.py`
- Splits text into overlapping chunks.
- Optional junk filter and keyword score.

Outputs (written at the path given by `--out`; `training_data/curated/` in this run):
- `dataset.jsonl`
- `dataset.stats.json`

Example:

```
python tools/build_dataset.py \
  --manifest training_data/relevant/manifest.json \
  --text-dir training_data/relevant/text \
  --out training_data/curated/dataset.jsonl \
  --chunk-chars 6000 --overlap-chars 400 --min-chars 1200 --drop-junk
```

### Manual curation requirement
- The scripts are helper filters only. You must still **manually review** for
  relevance, especially to remove prefaces, indexes, disclaimers, etc.
- Use `training_data/relevant/corpus.txt` to scan the human-readable content.

### Dataset used in this run
- Remote dataset path:
  `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/curated/dataset.jsonl`
- Count: 1778 chunks.
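
Because the dataset is one JSON object per line, the chunk count can be
sanity-checked directly:

```
# One chunk per JSONL line; expect 1778 for this run.
with open("training_data/curated/dataset.jsonl", encoding="utf-8") as f:
    print(sum(1 for _ in f))
```
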
------------------------------------------------------------------------------

## 5) Training Pipeline (LoRA fine-tune on NVIDIA box)

### Why the local AMD GPU was not used
- The user explicitly requested the remote NVIDIA box.
- The local AMD 7900XTX was not used in this run.

### Training script (repo)
- `tools/finetune_lora.py`
- Modified to make gradient checkpointing work with LoRA:
  - `model.enable_input_require_grads()` is required.
  - Without it, the MXFP4 path fails with:
    `RuntimeError: element 0 of tensors does not require grad...`

### Key training args used
- `--model openai/gpt-oss-20b`
- `--data training_data/curated/dataset.jsonl`
- `--out training_data/lora_adapter`
- `--max-length 256`
- `--epochs 1` (adjust as needed)
- `--lora-r 8 --lora-alpha 16 --lora-dropout 0.05`
- `--grad-accum 4`
- `--quant auto` (MXFP4 on GPU)
- `--log-seconds 120` (must show progress every 2 minutes)
- `--log-steps 10` (extra progress)
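
Assembled into a single invocation (a sketch only; the flags are exactly those
recorded above, run from the repo copy on the remote box):

```
python tools/finetune_lora.py \
  --model openai/gpt-oss-20b \
  --data training_data/curated/dataset.jsonl \
  --out training_data/lora_adapter \
  --max-length 256 --epochs 1 \
  --lora-r 8 --lora-alpha 16 --lora-dropout 0.05 \
  --grad-accum 4 --quant auto \
  --log-seconds 120 --log-steps 10
```
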
### Progress requirement (must follow)
- Use `--log-seconds 120` so training prints logs every ~2 minutes.
- For long copies or merges, print `date` + file size in a loop every 120 seconds.

### GPU requirements
- An NVIDIA GPU is required for quantized loading; MXFP4 needs a GPU.
- GPU observed: RTX 5060 Ti, 16 GB VRAM, CUDA 12.8.

### What failed and how we fixed it

1) **MXFP4 grad error**
   - Error: `RuntimeError: element 0 of tensors does not require grad`
   - Fix: in `tools/finetune_lora.py`, after
     `model.gradient_checkpointing_enable()` add:
     `model.enable_input_require_grads()`
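
A minimal sketch of where the fix sits, assuming the usual `transformers` +
`peft` flow (the actual loading/quantization logic lives in `tools/finetune_lora.py`):

```
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", device_map="auto")
model.gradient_checkpointing_enable()
# The fix: checkpointed inputs must require grad, otherwise backward
# fails on the MXFP4 path with "element 0 of tensors does not require grad".
model.enable_input_require_grads()
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05))
```
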
2) **Bitsandbytes 4-bit OOM**
   - With `--quant 4bit` the model OOMed even with max-memory limits.
   - CPU offload is not supported with this setup; it still OOMed.
   - Fix: use `--quant auto` (MXFP4) instead.

3) **Triton/compile issues**
   - Triton kernels required a compiler in the container.
   - Fix: use a PyTorch **CUDA devel** image (not runtime) or install
     `build-essential` inside the container.

### Output artifacts (LoRA)
`training_data/lora_adapter/` contains:
- `adapter_model.safetensors`
- `adapter_config.json`
- `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`
- `training_summary.json` (includes steps and loss EMA)

------------------------------------------------------------------------------

## 6) GGUF Conversion and Merge (Required; Ollama LoRA not supported)

### Why the merge is required
- Ollama errors out when a Modelfile uses ADAPTER:
  `Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented`
- Therefore the LoRA must be merged into the base GGUF.

### llama.cpp setup (remote)
- Clone location: `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp`
- Build:

```
cd /mnt/fast.storage.rushg.me/datasets/apps/pytorch/llama.cpp
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=OFF
cmake --build build -j $(nproc)
```

- Note: `-DLLAMA_CURL=OFF` is used because libcurl is missing on the host.
- Binaries:
  - `build/bin/llama-export-lora`
  - `build/bin/llama-gguf`
- When running, set:
  - `LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin`

### Convert LoRA to GGUF
Use llama.cpp's `convert_lora_to_gguf.py`:

```
python convert_lora_to_gguf.py \
  --lora /path/to/training_data/lora_adapter \
  --outfile /path/to/training_data/lora_adapter/options-lora.gguf
```

### Architecture mismatch pitfall (critical)
- The base GGUF from Ollama uses `general.architecture = gptoss`.
- The LoRA GGUF from the converter uses `general.architecture = gpt-oss`.
- `llama-export-lora` then throws:
  `model arch and LoRA arch mismatch`

### Fix: rewrite LoRA GGUF metadata to `gptoss`
We used `gguf-py` to rewrite the metadata. Example (run inside a Python container):

```
from gguf import GGUFReader, GGUFWriter, GGUFValueType
import numpy as np

inp = "options-lora.gguf"
out = "options-lora-gptoss.gguf"
r = GGUFReader(inp)
w = GGUFWriter(out, "gptoss", endianess=r.endianess)

# Copy KV fields except general.architecture (and the virtual GGUF.*/alignment keys)
for key, field in r.fields.items():
    if key.startswith("GGUF.") or key in ("general.architecture", "general.alignment"):
        continue
    vtype = field.types[0]
    if vtype == GGUFValueType.ARRAY:
        w.add_key_value(key, field.contents(), vtype, field.types[-1])
    else:
        w.add_key_value(key, field.contents(), vtype)

# Copy tensors
for t in r.tensors:
    data = t.data
    if not data.flags["C_CONTIGUOUS"]:
        data = np.ascontiguousarray(data)
    w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
                 raw_dtype=t.tensor_type, tensor_endianess=r.endianess)

# Finalize the output file
w.write_header_to_file()
w.write_kv_data_to_file()
w.write_tensors_to_file()
w.close()
```

### Tensor orientation mismatch (critical)
- After the arch fix, the merge failed with:
  `GGML_ASSERT(ggml_can_mul_mat(a, b)) failed`
- Root cause: the LoRA A/B tensors had an orientation incompatible with the base GGUF.
- Fix: transpose the LoRA A and B **data** when re-serializing the GGUF.

**Important GGUF detail:**
- GGUF stores tensor dims reversed internally.
- You must transpose the data while keeping the *original raw_shape*.
- Working approach (inside the tensor-copy loop above):

```
if t.name.endswith(".lora_a") or t.name.endswith(".lora_b"):
    data = np.ascontiguousarray(data.T)
w.add_tensor(t.name, data, raw_shape=list(map(int, t.shape)),
             raw_dtype=t.tensor_type, tensor_endianess=r.endianess)
```

### Working LoRA GGUF for merge
- `options-lora-gptoss-transposed2.gguf`

### Merge LoRA into base GGUF
Base GGUF path (the Ollama blob):
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb`

Merge command:

```
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama.cpp/build/bin/llama-export-lora \
  -m /mnt/.../ollama.models/models/blobs/sha256-e7b273f9636059a689e3ddcab3716e4f65abe0143ac978e46673ad0e52d09efb \
  --lora /mnt/.../training_data/lora_adapter/options-lora-gptoss-transposed2.gguf \
  -o /mnt/.../training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf
```

### Merged output (final)
- `/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`
- Size: ~13 GB
- File type: F16

### Intermediate artifacts kept (not deleted)
- `options-lora-gptoss.gguf`
- `options-lora-gptoss-transposed.gguf`
- `options-lora-gptoss-transposed-debug.gguf`
- `options-lora-gptoss-transposed2.gguf`
- `gpt-oss-20b-options-merged-f16-v2.gguf` (14 MB, failed)
- `gpt-oss-20b-options-merged-f16.gguf` (0 bytes, failed)

------------------------------------------------------------------------------

## 7) Ollama Integration (Final Model)

### Why ADAPTER does not work
A Modelfile with ADAPTER fails:

```
Error: 500 Internal Server Error: failed to initialize model: loras are not yet implemented
```

Therefore the merged GGUF is mandatory.

### Copy merged GGUF into Ollama imports

```
mkdir -p /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports
cp /mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf \
  /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/
```

### Modelfile (with tool support)
**Important:** tools only work if the TEMPLATE block matches the base model's
template. Without TEMPLATE, Ollama shows `{{ .Prompt }}` as the template and
tools are disabled.

We extracted the template from the base model:

```
sudo -n docker exec -i ix-ollama-ollama-1 ollama show gpt-oss:20b --template \
  > /mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/gptoss.template
```

Then built the Modelfile:
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`

```
FROM /root/.ollama/imports/gpt-oss-20b-options-merged-f16-v3.gguf
TEMPLATE """
<paste full gpt-oss:20b template here>
"""

SYSTEM """You are a knowledgeable options trading assistant.
Explain concepts clearly, use correct terminology (Greeks, volatility, spreads, assignment), and be explicit about assumptions.
If information is uncertain, say so rather than guessing."""
```

### Create the model

```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama create trained-options-model -f /root/.ollama/imports/Modelfile.trained-options-model
```

### Verify in Ollama

```
sudo -n docker exec -i ix-ollama-ollama-1 ollama list
sudo -n docker exec -i ix-ollama-ollama-1 ollama show trained-options-model
```

Expected capabilities include: `completion`, `tools`, `thinking`.

### Runtime note
- `ollama run` can take a long time to load the model and may time out.
- Use the HTTP API for reliable results:

```
curl http://192.168.1.2:30068/api/generate -d '{
  "model":"trained-options-model:latest",
  "prompt":"Explain delta and gamma briefly.",
  "stream":false
}'
```

------------------------------------------------------------------------------

## 8) Tool/Function Call Requirement (Mandatory)

### How to verify tool support
1) `ollama show trained-options-model` should list `tools` under Capabilities.
2) `ollama show trained-options-model --template` should show the full template
   (not `{{ .Prompt }}`).

### Tool-call test (HTTP)

```
curl http://192.168.1.2:30068/api/chat -d '{
  "model":"trained-options-model:latest",
  "stream":false,
  "messages":[
    {"role":"system","content":"Use tools when available."},
    {"role":"user","content":"Compute total for quantity=3 price=4. Use tool."}
  ],
  "tools":[
    {"type":"function","function":{
      "name":"calc_total",
      "description":"Compute total cost for a trade",
      "parameters":{
        "type":"object",
        "properties":{"quantity":{"type":"number"},"price":{"type":"number"}},
        "required":["quantity","price"]
      }
    }}
  ]
}'
```

Expected: `tool_calls` in the response.
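
A quick pass/fail check on that response, assuming Ollama's chat tool-call
shape (`message.tool_calls[].function`); pipe the curl output into it:

```
import json, sys

# Usage: curl ... /api/chat -d '...' | python check_tool_call.py
resp = json.load(sys.stdin)
calls = resp.get("message", {}).get("tool_calls", [])
assert calls, "no tool_calls returned; re-check TEMPLATE and capabilities"
fn = calls[0]["function"]
print(fn["name"], fn["arguments"])  # expect calc_total with quantity=3, price=4
```
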
------------------------------------------------------------------------------

## 9) Known Failures + Fixes (Summary)

- **Ollama ADAPTER fails** -> merge the LoRA into the GGUF.
- **Arch mismatch** (`gpt-oss` vs `gptoss`) -> rewrite the LoRA metadata.
- **ggml_can_mul_mat assertion** -> transpose the LoRA A/B data.
- **MXFP4 gradient error** -> `model.enable_input_require_grads()`.
- **Bitsandbytes 4-bit OOM** -> use MXFP4 (`--quant auto`) on GPU.
- **Triton compile error** -> use a PyTorch CUDA *devel* image or install gcc.
- **`convert_lora_to_gguf.py` in WSL missing transformers** -> use docker or install
  transformers in WSL.
- **`ollama run` hangs** -> use `/api/generate` or `/api/chat` via curl.

------------------------------------------------------------------------------

## 10) Retrain Checklist (Minimal Friction)

1) **Prepare data locally**
   - Put docs in `eBooks/`.
   - Run:
     - `python tools/select_relevant.py ...`
     - `python tools/build_dataset.py ...`
   - Manually inspect `training_data/relevant/corpus.txt`.

2) **Sync to remote**
   - Example (PowerShell):
     `scp -P 55555 -r .\ingest-ebook-options rushabh@192.168.1.2:/mnt/fast.storage.rushg.me/datasets/apps/pytorch/`

3) **Stop GPU-conflicting apps**
   - Stop the `llamacpp` app in the TrueNAS UI.

4) **Train LoRA in the TrueNAS app**
   - Ensure the GPU is attached.
   - Use `tools/finetune_lora.py` with `--log-seconds 120`.
   - Confirm the adapter is saved in `training_data/lora_adapter`.

5) **Convert LoRA to GGUF**
   - `convert_lora_to_gguf.py` -> `options-lora.gguf`

6) **Fix arch + transpose**
   - Rewrite the metadata to `gptoss`.
   - Transpose the LoRA A/B data.
   - Output: `options-lora-gptoss-transposed2.gguf`

7) **Merge into base GGUF**
   - Use `llama-export-lora`.
   - Output: `gpt-oss-20b-options-merged-f16-v3.gguf`

8) **Ollama import**
   - Copy the GGUF to `/mnt/.../ollama.models/imports`.
   - Build the Modelfile with TEMPLATE.
   - `ollama create trained-options-model -f ...`

9) **Verify tool support**
   - `ollama show trained-options-model`
   - `/api/chat` tool-call test

------------------------------------------------------------------------------

## 11) Commands Used in This Run (Examples)

### Remote file listing (progress + verify)

```
ssh -p 55555 rushabh@192.168.1.2 "ls -la /mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter"
```

### GGUF metadata check

```
python - <<'PY'
from gguf import GGUFReader
r = GGUFReader("options-lora.gguf")
print(r.get_field("general.architecture").contents())
PY
```

### Merge with progress updates every 2 minutes

```
BASE=/mnt/.../ollama.models/models/blobs/<base-blob>
LORA=/mnt/.../options-lora-gptoss-transposed2.gguf
OUT=/mnt/.../gpt-oss-20b-options-merged-f16-v3.gguf
export LD_LIBRARY_PATH=/mnt/.../llama.cpp/build/bin
/mnt/.../llama-export-lora -m "$BASE" --lora "$LORA" -o "$OUT" &
pid=$!
while kill -0 $pid 2>/dev/null; do date; ls -lh "$OUT" || true; sleep 120; done
wait $pid
```

------------------------------------------------------------------------------

## 12) Notes About Local Files in This Repo

- `Modelfile.trained-options-model` (local) still references ADAPTER and is
  **not** valid for current Ollama (ADAPTER is unsupported).
- Use the remote Modelfile in `/mnt/.../ollama.models/imports/`.
- `_tmp_*` scripts exist from prior automation attempts (TrueNAS app creation,
  GPU checks, etc.). Use them only if you know what they do.

------------------------------------------------------------------------------

## 13) Progress Reporting Policy (Non-Negotiable)

During any long run (training, merge, large copy):
- Print a progress line every 120 seconds.
- Example: `date` + file size, or a training-loss line.
- Do not allow silent runs.

------------------------------------------------------------------------------

## 14) Quick Sanity Checks (After Retrain)

1) `ollama list` shows `trained-options-model:latest`
2) `ollama show trained-options-model` lists `tools`
3) `/api/generate` returns a coherent answer
4) `/api/chat` returns a tool call when tools are provided
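
Checks 1 and 3 can be scripted against the HTTP API (a sketch using the
standard Ollama `/api/tags` and `/api/generate` routes; endpoint and model
name as above):

```
import requests

BASE = "http://192.168.1.2:30068"

# Check 1: the model is registered.
tags = requests.get(f"{BASE}/api/tags").json()
names = [m["name"] for m in tags.get("models", [])]
assert "trained-options-model:latest" in names, names

# Check 3: generation returns a coherent answer (allow a long first-load).
r = requests.post(f"{BASE}/api/generate", json={
    "model": "trained-options-model:latest",
    "prompt": "Explain delta and gamma briefly.",
    "stream": False,
}, timeout=600)
print(r.json()["response"][:300])
```
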
------------------------------------------------------------------------------

## 15) Do NOT Forget These Pitfalls

- Arch mismatch (`gpt-oss` vs `gptoss`) **will break the merge**.
- LoRA tensor orientation mismatch **will break the merge**.
- ADAPTER in a Modelfile **does not work** in current Ollama.
- Tool calls work **only** if TEMPLATE is included.
- The remote shell is zsh; use `bash -lc` for complex quoting.
- Docker requires `sudo -n`.
- Use the remote GPU as requested; do not train on CPU.

------------------------------------------------------------------------------

## 16) Current "Final" Artifacts (Reference)

### LoRA adapter
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/`

### Merged GGUF (final)
`/mnt/fast.storage.rushg.me/datasets/apps/pytorch/ingest-ebook-options/training_data/lora_adapter/gpt-oss-20b-options-merged-f16-v3.gguf`

### Ollama Modelfile
`/mnt/fast.storage.rushg.me/datasets/apps/ollama.models/imports/Modelfile.trained-options-model`

### Ollama Model Name
`trained-options-model:latest`

------------------------------------------------------------------------------

## 17) If You Need to Rebuild Tools Support

1) Extract the base template:

```
sudo -n docker exec -i ix-ollama-ollama-1 \
  ollama show gpt-oss:20b --template > /mnt/.../gptoss.template
```

2) Create a Modelfile with the TEMPLATE block.
3) Re-run `ollama create`.
4) Verify that `ollama show trained-options-model` lists `tools`.

------------------------------------------------------------------------------

## 18) Git Repo + Source Inventory (This Repo)

### Remote git repo
- URL (HTTP): `https://git.rushg.me/rushabh/ollama-model-training-5060ti`
- URL (git): `https://git.rushg.me/rushabh/ollama-model-training-5060ti.git`
- Auth: the user will authenticate on push when prompted (username/password).

### What is committed (and why)
- `AGENTS.md` (this runbook; full end-to-end context).
- `README.md` (quick overview + links to AGENTS).
- `tools/` scripts for extraction, filtering, dataset build, and training.
- `training_data/` curated dataset, manifests, reports, and LoRA outputs used
  for the run (kept for reproducibility).
- `remote/ollama/Modelfile.trained-options-model.remote` (exact remote Modelfile
  used to enable tools).
- `remote/ollama/gptoss.template` (base template pulled from gpt-oss:20b).
- `Modelfile.trained-options-model` (local reference; see the remote Modelfile
  for the tool-enabled version).

### What is excluded (and why)
- `eBooks/` raw source data (large; keep local and private).
- `_llama_cpp/` (upstream repo; clone on demand).
- `.venv/` and Python caches.
- Any base model weights or Ollama blobs (too large; download via Ollama/HF).

### How to recreate missing external assets
- Base model:
  - `ollama pull gpt-oss:20b` on the Ollama host
  - or `huggingface-cli download openai/gpt-oss-20b` into the HF cache
- llama.cpp:
  - `git clone https://github.com/ggml-org/llama.cpp.git`
  - build with `-DLLAMA_CURL=OFF` if libcurl is missing.

------------------------------------------------------------------------------
End of AGENTS.md
|