AGENTS.md

Context

  • This project exposes a Flask API that uses Playwright to scrape Yahoo Finance options chains.
  • Entry point: scraper_service.py (launched via runner.bat or directly with Python).
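  • API route: GET /scrape_sync with a stock query parameter and an optional expiration passed under any of the aliases expiration, expiry, or date.
  • Expiration inputs: Unix epoch seconds (the value Yahoo uses in its date URL parameter) or a date string matching one of the DATE_FORMATS patterns. Within scrape_yahoo_options, all-digit input is checked first and treated as epoch seconds, so a compact YYYYMMDD string such as 20250117 is read as a timestamp, not a date. Use 2025-01-17 instead.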

Docker

  • Build: docker build -t <image>:latest .
  • Run: docker run --rm -p 9777:9777 <image>:latest
  • The container uses the Playwright base image with bundled browsers.

Line-by-line explanation of scraper_service.py

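  • Line 1: Import the Flask app class plus jsonify (JSON responses) and request (query-string access). Code: from flask import Flask, jsonify, request
  • Line 2: Import Playwright's synchronous API entry point, used to drive headless Chromium. Code: from playwright.sync_api import sync_playwright
  • Line 3: Import BeautifulSoup for parsing the rendered option tables. Code: from bs4 import BeautifulSoup
  • Line 4: Import datetime and timezone for epoch and date conversions. Code: from datetime import datetime, timezone
  • Line 5: Import urllib.parse for URL-encoding the ticker symbol. Code: import urllib.parse
  • Line 6: Import logging for scrape diagnostics. Code: import logging
  • Line 7: Import json for decoding the JSON payload embedded in the page HTML. Code: import json
  • Line 8: Import re for the regex-based expiration and contract-symbol extraction. Code: import re
  • Line 9: Import time for the one-second polling sleep in wait_for_tables. Code: import time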
  • Line 10: Blank line for readability. Code: <blank>
  • Line 11: Create the Flask application instance. Code: app = Flask(__name__)
  • Line 12: Blank line for readability. Code: <blank>
  • Line 13: Section comment for the logging setup. Code: # Logging
  • Line 14: Configure the root logger. Code: logging.basicConfig(
  • Line 15: Emit messages at INFO level and above. Code: level=logging.INFO,
  • Line 16: Prefix each message with a timestamp and the level name. Code: format="%(asctime)s [%(levelname)s] %(message)s"
  • Line 17: Close the basicConfig call. Code: )
  • Line 18: Set the Flask app logger to INFO as well. Code: app.logger.setLevel(logging.INFO)
  • Line 19: Blank line for readability. Code: <blank>
  • Line 20: Define the accepted expiration date string formats, tried in order by parse_date. Code: DATE_FORMATS = (
  • Line 21: ISO date, e.g. 2025-01-17. Code: "%Y-%m-%d",
  • Line 22: Slash-separated date, e.g. 2025/01/17. Code: "%Y/%m/%d",
  • Line 23: Compact date, e.g. 20250117. All-digit input is routed to the epoch-seconds branch before parse_date is tried, so this format is effectively unused for API input. Code: "%Y%m%d",
  • Line 24: Abbreviated month name, e.g. Jan 17, 2025 (month names depend on the process locale). Code: "%b %d, %Y",
  • Line 25: Full month name, e.g. January 17, 2025. Code: "%B %d, %Y",
  • Line 26: Close the tuple. Code: )
  • Line 27: Blank line for readability. Code: <blank>
  • Line 28: Blank line for readability. Code: <blank>
  • Line 29: Define parse_date(value), which returns a datetime.date for the first matching DATE_FORMATS pattern, or None. Code: def parse_date(value):
  • Line 30: Try each format in order. Code: for fmt in DATE_FORMATS:
  • Line 31: Guard the parse attempt. Code: try:
  • Line 32: Return the date part on the first successful parse. Code: return datetime.strptime(value, fmt).date()
  • Line 33: strptime raises ValueError when the string does not match the format. Code: except ValueError:
  • Line 34: Move on to the next format. Code: continue
  • Line 35: No format matched. Code: return None
  • Line 36: Blank line for readability. Code: <blank>
  • Line 37: Blank line for readability. Code: <blank>
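A minimal, self-contained sketch of the parse_date behavior above, reusing the same DATE_FORMATS tuple. It also shows why the compact YYYYMMDD format is unreachable from the API: the service checks str.isdigit() before it calls parse_date. classify_expiration is a hypothetical helper that only mirrors that ordering.

```python
from datetime import date, datetime
from typing import Optional

# Same patterns, in the same order, as DATE_FORMATS in scraper_service.py.
DATE_FORMATS = (
    "%Y-%m-%d",
    "%Y/%m/%d",
    "%Y%m%d",
    "%b %d, %Y",
    "%B %d, %Y",
)


def parse_date(value: str) -> Optional[date]:
    """Return the date for the first matching format, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # not this format; try the next one
    return None


def classify_expiration(raw: str):
    """Hypothetical helper mirroring the service's order of checks."""
    raw = raw.strip()
    if raw.isdigit():
        return ("epoch", int(raw))  # digits win, even for "20250117"
    return ("date", parse_date(raw))


if __name__ == "__main__":
    for text in ("2025-01-17", "January 17, 2025", "17/01/2025"):
        print(f"{text!r} -> {parse_date(text)}")
    print(classify_expiration("20250117"))
```

The last line prints ('epoch', 20250117), which corresponds to a moment in August 1970 rather than 2025-01-17.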
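  • Line 38: Define normalize_label(value), used for case- and whitespace-insensitive label comparison. Code: def normalize_label(value):
  • Line 39: Trim the string, collapse internal whitespace runs to single spaces, and lowercase it. Code: return " ".join(value.strip().split()).lower()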
  • Line 40: Blank line for readability. Code: <blank>
  • Line 41: Blank line for readability. Code: <blank>
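  • Line 42: Define format_expiration_label(timestamp), which renders an epoch-seconds expiration as a YYYY-MM-DD label. Code: def format_expiration_label(timestamp):
  • Line 43: Guard the conversion. Code: try:
  • Line 44: Interpret the timestamp as UTC and format it. datetime.utcfromtimestamp is deprecated since Python 3.12; datetime.fromtimestamp(timestamp, timezone.utc) gives the same date. Code: return datetime.utcfromtimestamp(timestamp).strftime("%Y-%m-%d")
  • Line 45: Catch any conversion failure (wrong type, out-of-range value). Code: except Exception:
  • Line 46: Fall back to the raw value as a string. Code: return str(timestamp)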
  • Line 47: Blank line for readability. Code: <blank>
  • Line 48: Blank line for readability. Code: <blank>
  • Line 49: Define format_percent(value), which renders a fraction (0.25) as a percentage string ("25.00%"). Code: def format_percent(value):
  • Line 50: Pass missing values through. Code: if value is None:
  • Line 51: Return None for missing input. Code: return None
  • Line 52: Guard the arithmetic and formatting. Code: try:
  • Line 53: Multiply by 100 and format with two decimals. Code: return f"{value * 100:.2f}%"
  • Line 54: Catch non-numeric input. Code: except Exception:
  • Line 55: Return None when the value cannot be formatted. Code: return None
  • Line 56: Blank line for readability. Code: <blank>
  • Line 57: Blank line for readability. Code: <blank>
  • Line 58: Define extract_raw_value(value). Yahoo often wraps numbers as {"raw": ..., "fmt": ...}; this returns the numeric raw field. Code: def extract_raw_value(value):
  • Line 59: Detect the wrapped form. Code: if isinstance(value, dict):
  • Line 60: Return the raw number (None if the key is absent). Code: return value.get("raw")
  • Line 61: Return plain values unchanged. Code: return value
  • Line 62: Blank line for readability. Code: <blank>
  • Line 63: Blank line for readability. Code: <blank>
  • Line 64: Define extract_fmt_value(value), which returns Yahoo's preformatted display string when present. Code: def extract_fmt_value(value):
  • Line 65: Only wrapped values carry a fmt field. Code: if isinstance(value, dict):
  • Line 66: Return the display string (None if the key is absent). Code: return value.get("fmt")
  • Line 67: Plain values have no preformatted form. Code: return None
  • Line 68: Blank line for readability. Code: <blank>
  • Line 69: Blank line for readability. Code: <blank>
  • Line 70: Define format_percent_value(value), which prefers Yahoo's own formatting and falls back to formatting the raw fraction. Code: def format_percent_value(value):
  • Line 71: Look up a preformatted string. Code: fmt = extract_fmt_value(value)
  • Line 72: Use it when available. Code: if fmt is not None:
  • Line 73: Return Yahoo's display string unchanged. Code: return fmt
  • Line 74: Otherwise format the raw fraction as a percentage. Code: return format_percent(extract_raw_value(value))
  • Line 75: Blank line for readability. Code: <blank>
  • Line 76: Blank line for readability. Code: <blank>
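A runnable copy of the four helpers above, exercised with inputs shaped like Yahoo's wrapped {"raw", "fmt"} values. The sample numbers are illustrative and were not captured from Yahoo.

```python
def extract_raw_value(value):
    """Unwrap Yahoo-style {"raw": ..., "fmt": ...} values; pass others through."""
    if isinstance(value, dict):
        return value.get("raw")
    return value


def extract_fmt_value(value):
    """Return Yahoo's preformatted string when the value is wrapped."""
    if isinstance(value, dict):
        return value.get("fmt")
    return None


def format_percent(value):
    """Render a fraction such as 0.2534 as '25.34%'."""
    if value is None:
        return None
    try:
        return f"{value * 100:.2f}%"
    except Exception:
        return None


def format_percent_value(value):
    """Prefer Yahoo's own formatting, else format the raw fraction."""
    fmt = extract_fmt_value(value)
    if fmt is not None:
        return fmt
    return format_percent(extract_raw_value(value))


if __name__ == "__main__":
    print(format_percent_value({"raw": 0.2534, "fmt": "25.34%"}))  # wrapped, has fmt
    print(format_percent_value({"raw": 0.2534}))                    # wrapped, raw only
    print(format_percent_value(0.5))                                # plain float
```

A string such as "n/a" also yields None. Multiplying a string by 100 succeeds (it repeats the string), but the ".2f" format then raises and is caught.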
  • Line 77: Define format_last_trade_date(timestamp), which renders a last-trade epoch as MM/DD/YYYY HH:MM AM/PM with an " EST" suffix. Code: def format_last_trade_date(timestamp):
  • Line 78: Unwrap a {"raw": ...} value if present. Code: timestamp = extract_raw_value(timestamp)
  • Line 79: Treat missing or zero timestamps as absent. Code: if not timestamp:
  • Line 80: Return None for absent timestamps. Code: return None
  • Line 81: Guard the conversion. Code: try:
  • Line 82: Convert using the host's local time zone, then append a literal " EST". The suffix is accurate only when the host clock is on US Eastern standard time. On a host left on UTC (e.g. a container with no TZ set), or during daylight saving time (EDT), the label is wrong. Code: return datetime.fromtimestamp(timestamp).strftime("%m/%d/%Y %I:%M %p") + " EST"
  • Line 83: Catch conversion failures. Code: except Exception:
  • Line 84: Return None on failure. Code: return None
  • Line 85: Blank line for readability. Code: <blank>
  • Line 86: Blank line for readability. Code: <blank>
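The sketch below is a time-zone-explicit alternative to format_last_trade_date. It assumes the intent is to show US Eastern time. format_trade_time and its tz parameter are hypothetical.

In production, zoneinfo.ZoneInfo("America/New_York") would handle the EST/EDT switch; it needs the system tz database or the tzdata package. The fixed UTC-5 offset below is used only so the example runs anywhere, and it is correct only outside daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset named "EST"; a stand-in for ZoneInfo("America/New_York").
EST = timezone(timedelta(hours=-5), "EST")


def format_trade_time(timestamp, tz=EST):
    """Format an epoch-seconds trade time in an explicit zone, not host-local time."""
    if isinstance(timestamp, dict):  # unwrap Yahoo's {"raw": ...} form
        timestamp = timestamp.get("raw")
    if not timestamp:
        return None
    moment = datetime.fromtimestamp(timestamp, tz=tz)
    # %Z prints the zone name carried by tz, so the suffix always matches the conversion.
    return moment.strftime("%m/%d/%Y %I:%M %p %Z")


if __name__ == "__main__":
    print(format_trade_time(1737072000))                         # 2025-01-17 00:00 UTC
    print(format_trade_time({"raw": 1737072000}, timezone.utc))
```

With the default zone, 1737072000 (midnight UTC on 2025-01-17) prints as 01/16/2025 07:00 PM EST, whatever the host clock is set to.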
  • Line 87: Define extract_option_chain_from_html(html), which searches the page for an embedded JSON response body containing a non-empty optionChain.result. Code: def extract_option_chain_from_html(html):
  • Line 88: Guard against empty input. Code: if not html:
  • Line 89: Nothing to search. Code: return None
  • Line 90: Blank line for readability. Code: <blank>
  • Line 91: The marker that precedes each embedded response body, which is stored as a JSON-escaped string. Code: token = "\"body\":\""
  • Line 92: Search offset into html. Code: start = 0
  • Line 93: Scan every occurrence of the marker. Code: while True:
  • Line 94: Find the next marker at or after start. Code: idx = html.find(token, start)
  • Line 95: Stop when no marker remains. Code: if idx == -1:
  • Line 96: Exit the scan loop. Code: break
  • Line 97: Position i at the first character of the string value. Code: i = idx + len(token)
  • Line 98: Track whether the previous character was a backslash. Code: escaped = False
  • Line 99: Accumulate the still-escaped string contents. Code: raw_chars = []
  • Line 100: Walk forward one character at a time. Code: while i < len(html):
  • Line 101: Read the current character. Code: ch = html[i]
  • Line 102: The previous character was a backslash. Code: if escaped:
  • Line 103: Keep the escaped character verbatim. Code: raw_chars.append(ch)
  • Line 104: Clear the escape flag. Code: escaped = False
  • Line 105: Otherwise inspect the character. Code: else:
  • Line 106: A backslash starts an escape sequence. Code: if ch == "\\":
  • Line 107: Keep the backslash so json.loads can decode the sequence later. Code: raw_chars.append(ch)
  • Line 108: Mark the next character as escaped. Code: escaped = True
  • Line 109: An unescaped quote ends the string. Code: elif ch == "\"":
  • Line 110: Stop reading. Code: break
  • Line 111: Any other character. Code: else:
  • Line 112: Keep it. Code: raw_chars.append(ch)
  • Line 113: Advance to the next character. Code: i += 1
  • Line 114: Join the collected characters. Code: raw = "".join(raw_chars)
  • Line 115: Decode the JSON string escapes. Code: try:
  • Line 116: Re-wrap the fragment in quotes and parse it, yielding the unescaped body text. Code: body_text = json.loads(f"\"{raw}\"")
  • Line 117: The fragment was not a valid JSON string. Code: except json.JSONDecodeError:
  • Line 118: Resume scanning after this marker. Code: start = idx + len(token)
  • Line 119: Try the next marker. Code: continue
  • Line 120: Cheap substring check before full parsing. Code: if "optionChain" not in body_text:
  • Line 121: Resume scanning after this marker. Code: start = idx + len(token)
  • Line 122: Try the next marker. Code: continue
  • Line 123: Parse the body as a JSON document. Code: try:
  • Line 124: Decode the option-chain payload. Code: payload = json.loads(body_text)
  • Line 125: The body was not valid JSON. Code: except json.JSONDecodeError:
  • Line 126: Resume scanning after this marker. Code: start = idx + len(token)
  • Line 127: Try the next marker. Code: continue
  • Line 128: Pull out the optionChain object. This assumes the payload is a JSON object; a top-level array or string would raise AttributeError here. Code: option_chain = payload.get("optionChain")
  • Line 129: Accept the chain only if it has a non-empty result list. Code: if option_chain and option_chain.get("result"):
  • Line 130: Return the first usable chain. Code: return option_chain
  • Line 131: Blank line for readability. Code: <blank>
  • Line 132: Resume scanning after this marker. Code: start = idx + len(token)
  • Line 133: Blank line for readability. Code: <blank>
  • Line 134: No marker yielded a usable chain. Code: return None
  • Line 135: Blank line for readability. Code: <blank>
  • Line 136: Blank line for readability. Code: <blank>
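The next sketch condenses the same scanning technique into a runnable form:

1. Find each "body":" marker.
2. Read the JSON-escaped string up to the first unescaped quote.
3. Decode twice: once to undo the string escaping, once to parse the embedded document.

find_option_chain and read_escaped_string are illustrative names, and the HTML is synthetic; real Yahoo pages embed many more bodies. Unlike the source, this sketch also skips payloads that are not JSON objects instead of raising.

```python
import json

TOKEN = '"body":"'


def read_escaped_string(text, pos):
    """Collect characters from pos up to the first unescaped double quote."""
    chars = []
    escaped = False
    for ch in text[pos:]:
        if escaped:
            chars.append(ch)
            escaped = False
        elif ch == "\\":
            chars.append(ch)  # keep the backslash for json.loads
            escaped = True
        elif ch == '"':
            break
        else:
            chars.append(ch)
    return "".join(chars)


def find_option_chain(html):
    """Return the first embedded optionChain with a non-empty result, else None."""
    start = 0
    while html:
        idx = html.find(TOKEN, start)
        if idx == -1:
            break
        start = idx + len(TOKEN)
        raw = read_escaped_string(html, start)
        try:
            body_text = json.loads(f'"{raw}"')  # undo string escaping
            payload = json.loads(body_text)      # parse the embedded document
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, dict):
            continue  # guard that the source function lacks
        chain = payload.get("optionChain")
        if isinstance(chain, dict) and chain.get("result"):
            return chain
    return None


if __name__ == "__main__":
    inner = {"optionChain": {"result": [{"expirationDates": [1737072000]}]}}
    html = (
        '<script>{"body":"not json"}</script>'
        '<script>{"status":200,"body":' + json.dumps(json.dumps(inner)) + "}</script>"
    )
    print(find_option_chain(html))
```

json.dumps(json.dumps(...)) reproduces the double encoding the scanner expects: a JSON document serialized as a string value inside another JSON object.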
  • Line 137: Define extract_expiration_dates_from_chain(chain), which returns the list of available expiration timestamps from a parsed chain. Code: def extract_expiration_dates_from_chain(chain):
  • Line 138: Guard against a missing chain. Code: if not chain:
  • Line 139: No chain, no dates. Code: return []
  • Line 140: Blank line for readability. Code: <blank>
  • Line 141: Read the result list. Code: result = chain.get("result", [])
  • Line 142: Guard against an empty result. Code: if not result:
  • Line 143: No result, no dates. Code: return []
  • Line 144: Return the epoch-seconds list from the first result, coercing a null value to []. Code: return result[0].get("expirationDates", []) or []
  • Line 145: Blank line for readability. Code: <blank>
  • Line 146: Blank line for readability. Code: <blank>
  • Line 147: Define normalize_chain_rows(rows), which maps Yahoo's camelCase contract fields to display column names. Code: def normalize_chain_rows(rows):
  • Line 148: Output list. Code: normalized = []
  • Line 149: Iterate over contracts, treating None as an empty list. Code: for row in rows or []:
  • Line 150: Append one output row per contract. Code: normalized.append(
  • Line 151: Open the row dictionary. Code: {
  • Line 152: OCC-style contract symbol, e.g. AAPL250117C00150000. Code: "Contract Name": row.get("contractSymbol"),
  • Line 153: Last trade time, formatted by format_last_trade_date. Code: "Last Trade Date (EST)": format_last_trade_date(
  • Line 154: Source field for the last trade time. Code: row.get("lastTradeDate")
  • Line 155: Close the call. Code: ),
  • Line 156: Strike price as a number. Code: "Strike": extract_raw_value(row.get("strike")),
  • Line 157: Last traded price. Code: "Last Price": extract_raw_value(row.get("lastPrice")),
  • Line 158: Current bid. Code: "Bid": extract_raw_value(row.get("bid")),
  • Line 159: Current ask. Code: "Ask": extract_raw_value(row.get("ask")),
  • Line 160: Price change. Code: "Change": extract_raw_value(row.get("change")),
  • Line 161: Percent change, preferring Yahoo's formatted string. Code: "% Change": format_percent_value(row.get("percentChange")),
  • Line 162: Contracts traded. Code: "Volume": extract_raw_value(row.get("volume")),
  • Line 163: Open interest. Code: "Open Interest": extract_raw_value(row.get("openInterest")),
  • Line 164: Implied volatility as a percentage string. Code: "Implied Volatility": format_percent_value(
  • Line 165: Source field for implied volatility. Code: row.get("impliedVolatility")
  • Line 166: Close the call. Code: ),
  • Line 167: Close the row dictionary. Code: }
  • Line 168: Close the append call. Code: )
  • Line 169: Return the normalized rows. Code: return normalized
  • Line 170: Blank line for readability. Code: <blank>
  • Line 171: Blank line for readability. Code: <blank>
  • Line 172: Define build_rows_from_chain(chain), which returns a (calls, puts) pair of normalized row lists. Code: def build_rows_from_chain(chain):
  • Line 173: Read the result list, tolerating a missing chain. Code: result = chain.get("result", []) if chain else []
  • Line 174: Guard against an empty result. Code: if not result:
  • Line 175: Return two empty lists. Code: return [], []
  • Line 176: Read the per-expiration options list. Code: options = result[0].get("options", [])
  • Line 177: Guard against missing option data. Code: if not options:
  • Line 178: Return two empty lists. Code: return [], []
  • Line 179: Use only the first expiration block. Code: option = options[0]
  • Line 180: Return normalized calls and puts. Code: return (
  • Line 181: Normalize the calls list. Code: normalize_chain_rows(option.get("calls")),
  • Line 182: Normalize the puts list. Code: normalize_chain_rows(option.get("puts")),
  • Line 183: Close the tuple. Code: )
  • Line 184: Blank line for readability. Code: <blank>
  • Line 185: Blank line for readability. Code: <blank>
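Below is a trimmed sketch of build_rows_from_chain and normalize_chain_rows that keeps only a few columns so it stays short. raw, normalize_rows and build_rows are hypothetical simplified stand-ins. The sample chain is hand-written to use the field names the source reads (contractSymbol, strike, bid, ask, openInterest); it is not real market data.

```python
def raw(value):
    """Unwrap {"raw": ...} values the way extract_raw_value does."""
    return value.get("raw") if isinstance(value, dict) else value


def normalize_rows(rows):
    """Map Yahoo contract fields to display column names (subset only)."""
    return [
        {
            "Contract Name": row.get("contractSymbol"),
            "Strike": raw(row.get("strike")),
            "Bid": raw(row.get("bid")),
            "Ask": raw(row.get("ask")),
            "Open Interest": raw(row.get("openInterest")),
        }
        for row in rows or []
    ]


def build_rows(chain):
    """Return (calls, puts) for the first expiration block, or two empty lists."""
    result = chain.get("result", []) if chain else []
    if not result:
        return [], []
    options = result[0].get("options", [])
    if not options:
        return [], []
    block = options[0]
    return normalize_rows(block.get("calls")), normalize_rows(block.get("puts"))


if __name__ == "__main__":
    sample_chain = {
        "result": [{
            "options": [{
                "calls": [{"contractSymbol": "AAPL250117C00150000",
                           "strike": {"raw": 150.0, "fmt": "150.00"},
                           "bid": 1.2, "ask": 1.3, "openInterest": 10}],
                "puts": None,  # null lists become [] via `rows or []`
            }]
        }]
    }
    calls, puts = build_rows(sample_chain)
    print(calls)
    print(puts)
```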
  • Line 186: Define extract_contract_expiry_code(contract_name), which pulls the YYMMDD expiry out of an OCC-style contract symbol. Code: def extract_contract_expiry_code(contract_name):
  • Line 187: Guard against a missing name. Code: if not contract_name:
  • Line 188: No name, no code. Code: return None
  • Line 189: Take the first run of six digits, e.g. 250117 in AAPL250117C00150000. Code: match = re.search(r"(\d{6})", contract_name)
  • Line 190: Return the digits, or None if there is no such run. Code: return match.group(1) if match else None
  • Line 191: Blank line for readability. Code: <blank>
  • Line 192: Blank line for readability. Code: <blank>
  • Line 193: Define expected_expiry_code(timestamp), which converts an expiration epoch into the YYMMDD code that matching contract symbols should contain. Code: def expected_expiry_code(timestamp):
  • Line 194: Guard against a missing timestamp. Code: if not timestamp:
  • Line 195: No timestamp, no code. Code: return None
  • Line 196: Guard the conversion. Code: try:
  • Line 197: Format the UTC date as YYMMDD (datetime.utcfromtimestamp is deprecated since Python 3.12). Code: return datetime.utcfromtimestamp(timestamp).strftime("%y%m%d")
  • Line 198: Catch conversion failures. Code: except Exception:
  • Line 199: Return None on failure. Code: return None
  • Line 200: Blank line for readability. Code: <blank>
  • Line 201: Blank line for readability. Code: <blank>
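The next sketch shows how the two helpers pair up. It uses datetime.fromtimestamp(..., tz=timezone.utc) in place of the deprecated utcfromtimestamp, which gives the same result. The contract symbol follows the OCC layout: root, then YYMMDD, then C or P, then the strike × 1000 as eight digits. The specific contract is illustrative.

```python
import re
from datetime import datetime, timezone


def extract_contract_expiry_code(contract_name):
    """Return the first six-digit run (YYMMDD in OCC-style symbols), else None."""
    if not contract_name:
        return None
    match = re.search(r"(\d{6})", contract_name)
    return match.group(1) if match else None


def expected_expiry_code(timestamp):
    """YYMMDD for an epoch-seconds expiration, read in UTC."""
    if not timestamp:
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%y%m%d")


if __name__ == "__main__":
    code = expected_expiry_code(1737072000)  # 2025-01-17 00:00 UTC
    print(code, extract_contract_expiry_code("AAPL250117C00150000") == code)
```

Because the regex takes the first six-digit run, a root symbol that itself contained six consecutive digits would be misread. Ordinary equity tickers do not.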
  • Line 202: Define extract_expiration_dates_from_html(html), a regex fallback that recovers expiration timestamps when no full option chain was found. Code: def extract_expiration_dates_from_html(html):
  • Line 203: Guard against empty input. Code: if not html:
  • Line 204: Nothing to search. Code: return []
  • Line 205: Blank line for readability. Code: <blank>
  • Line 206: Patterns to try, in order. Code: patterns = (
  • Line 207: Match the key inside a JSON-escaped string (\"expirationDates\":[...]). Code: r'\\"expirationDates\\":\[(.*?)\]',
  • Line 208: Match the plain, unescaped key. Both patterns expect compact JSON with no space after the colon. Code: r'"expirationDates":\[(.*?)\]',
  • Line 209: Close the tuple. Code: )
  • Line 210: No match yet. Code: match = None
  • Line 211: Try each pattern. Code: for pattern in patterns:
  • Line 212: Capture the array contents non-greedily, up to the first closing bracket; DOTALL lets the match span newlines. Code: match = re.search(pattern, html, re.DOTALL)
  • Line 213: Stop at the first pattern that matches. Code: if match:
  • Line 214: Exit the loop. Code: break
  • Line 215: Neither pattern matched. Code: if not match:
  • Line 216: Return no dates. Code: return []
  • Line 217: Blank line for readability. Code: <blank>
  • Line 218: The captured array body, e.g. 1737072000,1737676800. Code: raw = match.group(1)
  • Line 219: Output list. Code: values = []
  • Line 220: Split the array body on commas. Code: for part in raw.split(","):
  • Line 221: Trim whitespace around each entry. Code: part = part.strip()
  • Line 222: Keep only all-digit entries. Code: if part.isdigit():
  • Line 223: Guard the conversion. Code: try:
  • Line 224: Convert the entry to an int. Code: values.append(int(part))
  • Line 225: Catch conversion failures. Code: except Exception:
  • Line 226: Skip the entry. Code: continue
  • Line 227: Return the timestamps in page order. Code: return values
  • Line 228: Blank line for readability. Code: <blank>
  • Line 229: Blank line for readability. Code: <blank>
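Below is a standalone sketch of the regex fallback, using the same two patterns. extract_expiration_dates is a hypothetical name. Like the source, it expects compact JSON with no space after the colon. It also restricts entries to ASCII digits: str.isdigit() accepts characters such as "²" that int() rejects.

```python
import re

PATTERNS = (
    r'\\"expirationDates\\":\[(.*?)\]',  # key inside a JSON-escaped string
    r'"expirationDates":\[(.*?)\]',      # plain compact JSON
)


def extract_expiration_dates(html):
    """Return integer timestamps from the first expirationDates array found."""
    if not html:
        return []
    for pattern in PATTERNS:
        match = re.search(pattern, html, re.DOTALL)
        if match:
            break
    else:
        return []  # no pattern matched
    values = []
    for part in match.group(1).split(","):
        part = part.strip()
        if part.isascii() and part.isdigit():
            values.append(int(part))
    return values


if __name__ == "__main__":
    escaped = '{"body":"{\\"expirationDates\\":[1737072000,1737676800]}"}'
    print(extract_expiration_dates(escaped))
```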
  • Line 230: Define build_expiration_options(expiration_dates), which turns raw timestamps into {"value", "label", "date"} dicts sorted by timestamp. Code: def build_expiration_options(expiration_dates):
  • Line 231: Output list. Code: options = []
  • Line 232: Iterate over timestamps, treating None as empty. Code: for value in expiration_dates or []:
  • Line 233: Guard the int conversion. Code: try:
  • Line 234: Coerce to int (numeric strings are accepted). Code: value_int = int(value)
  • Line 235: Catch invalid values. Code: except Exception:
  • Line 236: Skip them. Code: continue
  • Line 237: Blank line for readability. Code: <blank>
  • Line 238: Build the YYYY-MM-DD label. Code: label = format_expiration_label(value_int)
  • Line 239: Guard the date conversion. Code: try:
  • Line 240: UTC calendar date, later compared against parsed date strings. Code: date_value = datetime.utcfromtimestamp(value_int).date()
  • Line 241: Catch conversion failures. Code: except Exception:
  • Line 242: Leave the date unset. Code: date_value = None
  • Line 243: Blank line for readability. Code: <blank>
  • Line 244: Record the option. Code: options.append({"value": value_int, "label": label, "date": date_value})
  • Line 245: Return the options in ascending timestamp order. Code: return sorted(options, key=lambda x: x["value"])
  • Line 246: Blank line for readability. Code: <blank>
  • Line 247: Blank line for readability. Code: <blank>
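A sketch of the same transformation that uses timezone-aware datetime.fromtimestamp instead of the deprecated utcfromtimestamp. day.isoformat() produces the same YYYY-MM-DD text as strftime("%Y-%m-%d"). Catching specific exceptions rather than bare Exception is a stylistic choice of this sketch.

```python
from datetime import datetime, timezone


def build_expiration_options(expiration_dates):
    """Turn epoch-second expirations into {"value", "label", "date"} dicts sorted by value."""
    options = []
    for value in expiration_dates or []:
        try:
            value_int = int(value)
        except (TypeError, ValueError):
            continue  # skip entries that are not integers
        try:
            day = datetime.fromtimestamp(value_int, tz=timezone.utc).date()
            label = day.isoformat()  # same text as strftime("%Y-%m-%d")
        except (OverflowError, OSError, ValueError):
            day, label = None, str(value_int)  # mirror the source's fallbacks
        options.append({"value": value_int, "label": label, "date": day})
    return sorted(options, key=lambda opt: opt["value"])


if __name__ == "__main__":
    for opt in build_expiration_options([1737676800, "1737072000", "bad", None]):
        print(opt)
```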
  • Line 248: Define resolve_expiration(expiration, options), which maps user input to a (value, label) pair from the available options, or (None, None) if it cannot. Code: def resolve_expiration(expiration, options):
  • Line 249: Guard against missing input. Code: if not expiration:
  • Line 250: Nothing requested. Code: return None, None
  • Line 251: Blank line for readability. Code: <blank>
  • Line 252: Trim surrounding whitespace. Code: raw = expiration.strip()
  • Line 253: Guard against whitespace-only input. Code: if not raw:
  • Line 254: Nothing requested. Code: return None, None
  • Line 255: Blank line for readability. Code: <blank>
  • Line 256: Treat all-digit input as epoch seconds; this includes compact YYYYMMDD strings. Code: if raw.isdigit():
  • Line 257: Parse the timestamp. Code: value = int(raw)
  • Line 258: When options are known, the timestamp must be one of them. Code: if options:
  • Line 259: Search the options. Code: for opt in options:
  • Line 260: Require an exact timestamp match. Code: if opt.get("value") == value:
  • Line 261: Return the matched value and its label. Code: return value, opt.get("label")
  • Line 262: The timestamp is not offered. Code: return None, None
  • Line 263: With no options to check against, accept the timestamp unverified. Code: return value, format_expiration_label(value)
  • Line 264: Blank line for readability. Code: <blank>
  • Line 265: Otherwise try the DATE_FORMATS patterns. Code: requested_date = parse_date(raw)
  • Line 266: The input parsed as a date. Code: if requested_date:
  • Line 267: Search the options. Code: for opt in options:
  • Line 268: Compare against each option's UTC date. Code: if opt.get("date") == requested_date:
  • Line 269: Return the matched value and label. Code: return opt.get("value"), opt.get("label")
  • Line 270: The date is not offered; a parsed date never falls through to label matching. Code: return None, None
  • Line 271: Blank line for readability. Code: <blank>
  • Line 272: Last resort: compare normalized labels. Labels are YYYY-MM-DD strings that parse_date already accepts, so in practice this branch rarely matches. Code: normalized = normalize_label(raw)
  • Line 273: Search the options. Code: for opt in options:
  • Line 274: Match labels case- and whitespace-insensitively. Code: if normalize_label(opt.get("label", "")) == normalized:
  • Line 275: Return the matched value and label. Code: return opt.get("value"), opt.get("label")
  • Line 276: Blank line for readability. Code: <blank>
  • Line 277: Nothing matched. Code: return None, None
  • Line 278: Blank line for readability. Code: <blank>
  • Line 279: Blank line for readability. Code: <blank>
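A compact restatement of resolve_expiration's decision order, using two hand-built options. The timestamps are illustrative, for 2025-01-17 and 2025-01-24 at 00:00 UTC. The three search loops are collapsed into list comprehensions, but the precedence is unchanged: digits first, then DATE_FORMATS, then label text. The 20250117 case shows the digit-first pitfall.

```python
from datetime import date, datetime, timezone

DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%b %d, %Y", "%B %d, %Y")

# Illustrative options, shaped like build_expiration_options output.
OPTIONS = [
    {"value": 1737072000, "label": "2025-01-17", "date": date(2025, 1, 17)},
    {"value": 1737676800, "label": "2025-01-24", "date": date(2025, 1, 24)},
]


def parse_date(value):
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def normalize_label(value):
    return " ".join(value.strip().split()).lower()


def resolve_expiration(expiration, options):
    """Return (value, label) for the matching option, or (None, None)."""
    raw = (expiration or "").strip()
    if not raw:
        return None, None
    if raw.isdigit():  # 1) epoch seconds, checked before any date parsing
        value = int(raw)
        if not options:  # nothing to validate against: accept as-is
            label = datetime.fromtimestamp(value, tz=timezone.utc).strftime("%Y-%m-%d")
            return value, label
        matches = [o for o in options if o["value"] == value]
    elif (requested := parse_date(raw)) is not None:  # 2) a DATE_FORMATS date
        matches = [o for o in options if o["date"] == requested]
    else:  # 3) free-form label text
        key = normalize_label(raw)
        matches = [o for o in options if normalize_label(o["label"]) == key]
    if matches:
        return matches[0]["value"], matches[0]["label"]
    return None, None


if __name__ == "__main__":
    for text in ("1737072000", "Jan 24, 2025", "2025-02-21", "20250117"):
        print(text, "->", resolve_expiration(text, OPTIONS))
```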
  • Line 280: Define wait_for_tables(page), which waits for the calls and puts tables to render and returns their element handles, or [] if they never appear. Code: def wait_for_tables(page):
  • Line 281: Prefer Yahoo's options-list container. Code: try:
  • Line 282: Wait up to 30 s for a matching table to become visible (wait_for_selector's default state). Code: page.wait_for_selector(
  • Line 283: CSS selector for tables inside the data-testid="options-list-table" section. Code: "section[data-testid='options-list-table'] table",
  • Line 284: Timeout in milliseconds. Code: timeout=30000,
  • Line 285: Close the call. Code: )
  • Line 286: The specific selector timed out or failed. Code: except Exception:
  • Line 287: Fall back to any table, again for up to 30 s. This call is not wrapped in try, so its timeout error propagates to the caller. Code: page.wait_for_selector("table", timeout=30000)
  • Line 288: Blank line for readability. Code: <blank>
  • Line 289: Then poll up to 30 times, one second apart, which adds roughly 30 s on top of the selector waits. Code: for _ in range(30): # 30 * 1s = 30 seconds
  • Line 290: Query the tables inside the options section. Code: tables = page.query_selector_all(
  • Line 291: Same selector as the first wait. Code: "section[data-testid='options-list-table'] table"
  • Line 292: Close the call. Code: )
  • Line 293: Both the calls and puts tables are present. Code: if len(tables) >= 2:
  • Line 294: Return them. Code: return tables
  • Line 295: Otherwise query every table on the page. Code: tables = page.query_selector_all("table")
  • Line 296: Accept any two tables. Code: if len(tables) >= 2:
  • Line 297: Return them. Code: return tables
  • Line 298: Sleep one second before the next poll. Code: time.sleep(1)
  • Line 299: Give up after the polling window. Code: return []
  • Line 300: Blank line for readability. Code: <blank>
  • Line 301: Blank line for readability. Code: <blank>
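wait_for_tables combines a blocking selector wait with a fixed-count polling loop. The generic polling half is sketched below with a hypothetical poll_until helper; the sleep function is injectable so the example runs instantly under test. Unlike the loop in the source, this sketch does not sleep after the final failed attempt.

```python
import time


def poll_until(check, attempts=30, interval=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `interval` seconds between tries.

    Returns the first truthy result, or None if every attempt came back falsy.
    """
    for attempt in range(attempts):
        result = check()
        if result:
            return result
        if attempt < attempts - 1:
            sleep(interval)  # no pointless sleep after the last attempt
    return None


if __name__ == "__main__":
    # Simulate a page whose second table appears on the third check.
    snapshots = iter([["calls"], ["calls"], ["calls", "puts"]])

    def two_tables():
        tables = next(snapshots)
        return tables if len(tables) >= 2 else None

    print(poll_until(two_tables, sleep=lambda _: None))
```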
  • Line 302: Define scrape_yahoo_options(symbol, expiration=None), the main scrape routine. The helpers defined next are nested inside it and close over symbol. Code: def scrape_yahoo_options(symbol, expiration=None):
  • Line 303: Define the nested helper parse_table(table_html, side), which converts one rendered HTML table into a list of dicts keyed by column header; side labels the table in log messages. Code: def parse_table(table_html, side):
  • Line 304: Guard against missing HTML. Code: if not table_html:
  • Line 305: Log a warning naming the side and symbol. Code: app.logger.warning("No %s table HTML for %s", side, symbol)
  • Line 306: Return no rows. Code: return []
  • Line 307: Blank line for readability. Code: <blank>
  • Line 308: Parse the table fragment with Python's built-in html.parser backend. Code: soup = BeautifulSoup(table_html, "html.parser")
  • Line 309: Blank line for readability. Code: <blank>
  • Line 310: Read the column headers from thead. Code: headers = [th.get_text(strip=True) for th in soup.select("thead th")]
  • Line 311: Collect the body rows. Code: rows = soup.select("tbody tr")
  • Line 312: Blank line for readability. Code: <blank>
  • Line 313: Output list. Code: parsed = []
  • Line 314: Iterate over the body rows. Code: for r in rows:
  • Line 315: Collect the row's cells. Code: tds = r.find_all("td")
  • Line 316: Skip rows whose cell count does not match the header count (e.g. spacer or malformed rows). Code: if len(tds) != len(headers):
  • Line 317: Move to the next row. Code: continue
  • Line 318: Blank line for readability. Code: <blank>
  • Line 319: Start a dict for this row. Code: item = {}
  • Line 320: Walk the cells with their column index. Code: for i, c in enumerate(tds):
  • Line 321: Column name for this cell. Code: key = headers[i]
  • Line 322: Cell text, with nested text nodes joined by spaces. Code: val = c.get_text(" ", strip=True)
  • Line 323: Blank line for readability. Code: <blank>
  • Line 324: Section comment for the type conversion. Code: # Convert numeric fields
  • Line 325: Price-like columns become floats. Code: if key in ["Strike", "Last Price", "Bid", "Ask", "Change"]:
  • Line 326: Guard the conversion. Code: try:
  • Line 327: Strip thousands separators and parse. Code: val = float(val.replace(",", ""))
  • Line 328: Non-numeric text such as "-" lands here. Code: except Exception:
  • Line 329: Store None instead. Code: val = None
  • Line 330: Count columns become ints. Code: elif key in ["Volume", "Open Interest"]:
  • Line 331: Guard the conversion. Code: try:
  • Line 332: Strip thousands separators and parse. Code: val = int(val.replace(",", ""))
  • Line 333: Non-numeric text lands here. Code: except Exception:
  • Line 334: Store None instead. Code: val = None
  • Line 335: Other columns (e.g. % Change, Implied Volatility) stay as text, except placeholders. Code: elif val in ["-", ""]:
  • Line 336: Map "-" and empty cells to None. Code: val = None
  • Line 337: Blank line for readability. Code: <blank>
  • Line 338: Store the cell under its header. Code: item[key] = val
  • Line 339: Blank line for readability. Code: <blank>
  • Line 340: Add the finished row. Code: parsed.append(item)
  • Line 341: Blank line for readability. Code: <blank>
  • Line 342: Log how many rows were parsed for this side. Code: app.logger.info("Parsed %d %s rows", len(parsed), side)
  • Line 343: Return the rows. Code: return parsed
  • Line 344: Blank line for readability. Code: <blank>
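The per-cell conversion in parse_table does not depend on BeautifulSoup, so it can be isolated and exercised with plain strings. convert_cell and row_to_dict are hypothetical names; the column groups are copied from the source. The cell strings are illustrative, not scraped.

```python
# Column groups copied from parse_table in scraper_service.py.
FLOAT_COLUMNS = {"Strike", "Last Price", "Bid", "Ask", "Change"}
INT_COLUMNS = {"Volume", "Open Interest"}


def convert_cell(header, text):
    """Convert one cell's text the way parse_table does."""
    if header in FLOAT_COLUMNS:
        try:
            return float(text.replace(",", ""))  # drop thousands separators
        except ValueError:
            return None  # e.g. the "-" placeholder
    if header in INT_COLUMNS:
        try:
            return int(text.replace(",", ""))
        except ValueError:
            return None
    return None if text in ("-", "") else text  # other columns stay as text


def row_to_dict(headers, cells):
    """Pair headers with converted cells; None for rows parse_table would skip."""
    if len(cells) != len(headers):
        return None
    return {h: convert_cell(h, c) for h, c in zip(headers, cells)}


if __name__ == "__main__":
    headers = ["Contract Name", "Strike", "Bid", "Volume", "% Change"]
    cells = ["AAPL250117C00150000", "1,150.00", "-", "1,234", "+2.50%"]
    print(row_to_dict(headers, cells))
```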
  • Line 345: Define the nested helper read_option_chain(page), which returns (option_chain, expiration_dates) for the currently loaded page. Code: def read_option_chain(page):
  • Line 346: Capture the rendered page HTML. Code: html = page.content()
  • Line 347: Look for an embedded optionChain JSON payload. Code: option_chain = extract_option_chain_from_html(html)
  • Line 348: A chain was found. Code: if option_chain:
  • Line 349: Take the expiration timestamps from the parsed chain. Code: expiration_dates = extract_expiration_dates_from_chain(option_chain)
  • Line 350: No chain was found. Code: else:
  • Line 351: Fall back to the regex scan of the raw HTML. Code: expiration_dates = extract_expiration_dates_from_html(html)
  • Line 352: Return both results. Code: return option_chain, expiration_dates
  • Line 353: Blank line for readability. Code: <blank>
  • Line 354: Define the nested helper has_expected_expiry(options, expected_code), which reports whether any normalized row's contract symbol carries the expected YYMMDD code. Code: def has_expected_expiry(options, expected_code):
  • Line 355: Without an expected code there is nothing to confirm. Code: if not expected_code:
  • Line 356: Report no match. Code: return False
  • Line 357: Iterate over the rows, treating None as empty. Code: for row in options or []:
  • Line 358: Read the contract symbol. Code: name = row.get("Contract Name")
  • Line 359: Compare its embedded YYMMDD code. Code: if extract_contract_expiry_code(name) == expected_code:
  • Line 360: At least one row matches. Code: return True
  • Line 361: No row matches. Code: return False
  • Line 362: Blank line for readability. Code: <blank>
  • Line 363: URL-encode the symbol with safe="" so that even / is escaped (e.g. ^SPX becomes %5ESPX). Code: encoded = urllib.parse.quote(symbol, safe="")
  • Line 364: Build the base Yahoo Finance options URL. Code: base_url = f"https://finance.yahoo.com/quote/{encoded}/options/"
  • Line 365: Trim the expiration input, keeping None when it is absent. Code: requested_expiration = expiration.strip() if expiration else None
  • Line 366: Catch inputs that were whitespace only. Code: if not requested_expiration:
  • Line 367: Treat them as absent. Code: requested_expiration = None
  • Line 368: Default to the base URL, without a date parameter. Code: url = base_url
  • Line 369: Blank line for readability. Code: <blank>
  • Line 370: Log the start of the scrape. Code: app.logger.info(
  • Line 371: Log format string. Code: "Starting scrape for symbol=%s expiration=%s url=%s",
  • Line 372: Symbol argument. Code: symbol,
  • Line 373: Normalized expiration argument. Code: requested_expiration,
  • Line 374: Base URL argument (the URL actually loaded may add ?date=). Code: base_url,
  • Line 375: Close the call. Code: )
  • Line 376: Blank line for readability. Code: <blank>
  • Line 377: Placeholder for the raw calls table HTML. Code: calls_html = None
  • Line 378: Placeholder for the raw puts table HTML. Code: puts_html = None
  • Line 379: Initialize the parsed calls rows as an empty list. Code: calls_full = []
  • Line 380: Initialize the parsed puts rows as an empty list. Code: puts_full = []
  • Line 381: Initialize the price as unknown. Code: price = None
  • Line 382: Resolved expiration timestamp to report. Code: selected_expiration_value = None
  • Line 383: Resolved expiration label to report. Code: selected_expiration_label = None
  • Line 384: Available expirations, as built by build_expiration_options. Code: expiration_options = []
  • Line 385: Expiration epoch to request via ?date=. Code: target_date = None
  • Line 386: Set when the input must be resolved against the base page's expiration list. Code: fallback_to_base = False
  • Line 387: Blank line for readability. Code: <blank>
  • Line 388: Start Playwright; the context manager stops it on exit. Code: with sync_playwright() as p:
  • Line 389: Launch headless Chromium. Code: browser = p.chromium.launch(headless=True)
  • Line 390: Open a new page. Code: page = browser.new_page()
  • Line 391: Send extra HTTP headers with every request this page initiates. This overrides the request header only; to change navigator.userAgent as well, Playwright accepts a user_agent argument on browser.new_context() or browser.new_page(). Code: page.set_extra_http_headers(
  • Line 392: Open the header dict. Code: {
  • Line 393: Override the User-Agent with a desktop Chrome string. Code: "User-Agent": (
  • Line 394: First half of the User-Agent string. Code: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
  • Line 395: Second half; adjacent string literals are concatenated. Code: "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120 Safari/537.36"
  • Line 396: Close the string group. Code: )
  • Line 397: Close the dict. Code: }
  • Line 398: Close the call. Code: )
  • Line 399: Set a 60 s default timeout for Playwright operations on this page. Code: page.set_default_timeout(60000)
  • Line 400: Blank line for readability. Code: <blank>
  • Line 401: Guard the scrape. Code: try:
  • Line 402: An expiration was requested. Code: if requested_expiration:
  • Line 403: All-digit input is taken as epoch seconds. Code: if requested_expiration.isdigit():
  • Line 404: Use it directly as the target; it is not checked against Yahoo's available expirations at this point. Code: target_date = int(requested_expiration)
  • Line 405: Report it as the selected value. Code: selected_expiration_value = target_date
  • Line 406: Report its YYYY-MM-DD label. Code: selected_expiration_label = format_expiration_label(target_date)
  • Line 407: Otherwise treat the input as a date string. Code: else:
  • Line 408: Try the DATE_FORMATS patterns. Code: parsed_date = parse_date(requested_expiration)
  • Line 409: The string parsed as a date. Code: if parsed_date:
  • Line 410: Convert it to epoch seconds at midnight UTC, the form the code assumes Yahoo uses for expiration timestamps. Code: target_date = int(
  • Line 411: Build a timezone-aware datetime. Code: datetime(
  • Line 412: Year. Code: parsed_date.year,
  • Line 413: Month. Code: parsed_date.month,
  • Line 414: Day. Code: parsed_date.day,
  • Line 415: Pin it to UTC. Code: tzinfo=timezone.utc,
  • Line 416: Convert to a float timestamp. Code: ).timestamp()
  • Line 417: Close the int() call. Code: )
  • Line 418: Report it as the selected value. Code: selected_expiration_value = target_date
  • Line 419: Report its label. Code: selected_expiration_label = format_expiration_label(target_date)
  • Line 420: The string is neither epoch seconds nor a known date format. Code: else:
  • Line 421: Load the base page first and resolve the input against its expiration list. Code: fallback_to_base = True
  • Line 422: Blank line for readability. Code: <blank>
  • Line 423: A target timestamp is known. Code: if target_date:
  • Line 424: Request that expiration via Yahoo's date query parameter. Code: url = f"{base_url}?date={target_date}"
  • Line 425: Blank line for readability. Code: <blank>
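Below is a self-contained sketch of how the lines above turn a symbol and a parsed date into the URL that gets loaded. build_options_url is a hypothetical helper. It assumes, as the source does, that Yahoo keys expirations by midnight-UTC epoch seconds.

```python
import urllib.parse
from datetime import date, datetime, timezone
from typing import Optional


def build_options_url(symbol: str, day: Optional[date] = None) -> str:
    """Yahoo options URL for symbol, plus ?date=<midnight-UTC epoch> when day is given."""
    encoded = urllib.parse.quote(symbol, safe="")  # escape everything, including "/"
    url = f"https://finance.yahoo.com/quote/{encoded}/options/"
    if day is not None:
        epoch = int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())
        url += f"?date={epoch}"
    return url


if __name__ == "__main__":
    print(build_options_url("AAPL"))
    print(build_options_url("^SPX", date(2025, 1, 17)))
```

The second call produces https://finance.yahoo.com/quote/%5ESPX/options/?date=1737072000.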
  • Line 426: Navigate the Playwright page to the target URL. Code: page.goto(url, wait_until="domcontentloaded", timeout=60000)
  • Line 427: Emit or configure a log message. Code: app.logger.info("Page loaded (domcontentloaded) for %s", symbol)
  • Line 428: Blank line for readability. Code: <blank>
  • Line 429: Read the embedded option chain and the list of expiration dates from the page. Code: option_chain, expiration_dates = read_option_chain(page)
  • Line 430: Emit or configure a log message. Code: app.logger.info("Option chain found: %s", bool(option_chain))
  • Line 431: Prepare or update the list of available expirations. Code: expiration_options = build_expiration_options(expiration_dates)
  • Line 432: Blank line for readability. Code: <blank>
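  • Line 433: If the expiration could not be parsed, resolve it against the expirations listed on the base page. Code: if fallback_to_base:
  • Line 434: Match the raw input to an available expiration value and label. Code: resolved_value, resolved_label = resolve_expiration(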
  • Line 435: Execute the statement as written. Code: requested_expiration, expiration_options
  • Line 436: Close the current block or container. Code: )
  • Line 437: Return an error listing the available expirations if resolution failed. Code: if resolved_value is None:
  • Line 438: Return a value to the caller. Code: return {
  • Line 439: Execute the statement as written. Code: "error": "Requested expiration not available",
  • Line 440: Execute the statement as written. Code: "stock": symbol,
  • Line 441: Execute the statement as written. Code: "requested_expiration": requested_expiration,
  • Line 442: Execute the statement as written. Code: "available_expirations": [
  • Line 443: Execute the statement as written. Code: {"label": opt.get("label"), "value": opt.get("value")}
  • Line 444: Loop over items. Code: for opt in expiration_options
  • Line 445: Close the current block or container. Code: ],
  • Line 446: Close the current block or container. Code: }
  • Line 447: Blank line for readability. Code: <blank>
  • Line 448: Track the resolved expiration epoch timestamp. Code: target_date = resolved_value
  • Line 449: Track the resolved expiration metadata. Code: selected_expiration_value = resolved_value
  • Line 450: Track the resolved expiration metadata. Code: selected_expiration_label = resolved_label or format_expiration_label(
  • Line 451: Execute the statement as written. Code: resolved_value
  • Line 452: Close the current block or container. Code: )
  • Line 453: Set the URL to load. Code: url = f"{base_url}?date={resolved_value}"
  • Line 454: Navigate the Playwright page to the target URL. Code: page.goto(url, wait_until="domcontentloaded", timeout=60000)
  • Line 455: Emit or configure a log message. Code: app.logger.info("Page loaded (domcontentloaded) for %s", symbol)
  • Line 456: Blank line for readability. Code: <blank>
  • Line 457: Re-read the option chain and expiration dates from the reloaded page. Code: option_chain, expiration_dates = read_option_chain(page)
  • Line 458: Prepare or update the list of available expirations. Code: expiration_options = build_expiration_options(expiration_dates)
  • Line 459: Blank line for readability. Code: <blank>
  • Line 460: Verify that the target date appears among the listed expirations. Code: if target_date and expiration_options:
  • Line 461: Start with no matching expiration. Code: matched = None
  • Line 462: Loop over items. Code: for opt in expiration_options:
  • Line 463: Conditional branch. Code: if opt.get("value") == target_date:
  • Line 464: Record the matching expiration option. Code: matched = opt
  • Line 465: Stop searching once a match is found. Code: break
  • Line 466: Reject the request if no listed expiration matches. Code: if not matched:
  • Line 467: Return a value to the caller. Code: return {
  • Line 468: Execute the statement as written. Code: "error": "Requested expiration not available",
  • Line 469: Execute the statement as written. Code: "stock": symbol,
  • Line 470: Execute the statement as written. Code: "requested_expiration": requested_expiration,
  • Line 471: Execute the statement as written. Code: "available_expirations": [
  • Line 472: Execute the statement as written. Code: {"label": opt.get("label"), "value": opt.get("value")}
  • Line 473: Loop over items. Code: for opt in expiration_options
  • Line 474: Close the current block or container. Code: ],
  • Line 475: Close the current block or container. Code: }
  • Line 476: Track the resolved expiration metadata. Code: selected_expiration_value = matched.get("value")
  • Line 477: Track the resolved expiration metadata. Code: selected_expiration_label = matched.get("label")
  • Line 478: With no requested expiration, default to the first listed expiration. Code: elif expiration_options and not target_date:
  • Line 479: Track the resolved expiration metadata. Code: selected_expiration_value = expiration_options[0].get("value")
  • Line 480: Track the resolved expiration metadata. Code: selected_expiration_label = expiration_options[0].get("label")
  • Line 481: Blank line for readability. Code: <blank>
  • Line 482: Build call and put rows from the embedded option chain. Code: calls_full, puts_full = build_rows_from_chain(option_chain)
  • Line 483: Emit or configure a log message. Code: app.logger.info(
  • Line 484: Execute the statement as written. Code: "Option chain rows: calls=%d puts=%d",
  • Line 485: Execute the statement as written. Code: len(calls_full),
  • Line 486: Execute the statement as written. Code: len(puts_full),
  • Line 487: Close the current block or container. Code: )
  • Line 488: Blank line for readability. Code: <blank>
  • Line 489: If the embedded chain yielded no rows, fall back to the rendered HTML tables. Code: if not calls_full and not puts_full:
  • Line 490: Emit or configure a log message. Code: app.logger.info("Waiting for options tables...")
  • Line 491: Blank line for readability. Code: <blank>
  • Line 492: Collect option tables from the page. Code: tables = wait_for_tables(page)
  • Line 493: Conditional branch. Code: if len(tables) < 2:
  • Line 494: Emit or configure a log message. Code: app.logger.error(
  • Line 495: Execute the statement as written. Code: "Only %d tables found; expected 2. HTML may have changed.",
  • Line 496: Execute the statement as written. Code: len(tables),
  • Line 497: Close the current block or container. Code: )
  • Line 498: Return a value to the caller. Code: return {"error": "Could not locate options tables", "stock": symbol}
  • Line 499: Blank line for readability. Code: <blank>
  • Line 500: Emit or configure a log message. Code: app.logger.info("Found %d tables. Extracting Calls & Puts.", len(tables))
  • Line 501: Blank line for readability. Code: <blank>
  • Line 502: Capture the outer HTML of the calls table. Code: calls_html = tables[0].evaluate("el => el.outerHTML")
  • Line 503: Capture the outer HTML of the puts table. Code: puts_html = tables[1].evaluate("el => el.outerHTML")
  • Line 504: Blank line for readability. Code: <blank>
  • Line 505: Comment describing the next block. Code: # --- Extract current price ---
  • Line 506: Start a try block for error handling. Code: try:
  • Line 507: Comment describing the next block. Code: # Primary selector
  • Line 508: Read the current price text from the page. Code: price_text = page.locator(
  • Line 509: Execute the statement as written. Code: "fin-streamer[data-field='regularMarketPrice']"
  • Line 510: Execute the statement as written. Code: ).inner_text()
  • Line 511: Initialize or assign the current price. Code: price = float(price_text.replace(",", ""))
  • Line 512: Handle exceptions for the preceding try block. Code: except Exception:
  • Line 513: Start a try block for error handling. Code: try:
  • Line 514: Comment describing the next block. Code: # Fallback
  • Line 515: Read the current price text from the page. Code: price_text = page.locator("span[data-testid='qsp-price']").inner_text()
  • Line 516: Initialize or assign the current price. Code: price = float(price_text.replace(",", ""))
  • Line 517: Handle exceptions for the preceding try block. Code: except Exception as e:
  • Line 518: Emit or configure a log message. Code: app.logger.warning("Failed to extract price for %s: %s", symbol, e)
  • Line 519: Blank line for readability. Code: <blank>
  • Line 520: Emit or configure a log message. Code: app.logger.info("Current price for %s = %s", symbol, price)
  • Line 521: Start the cleanup clause, which runs whether or not scraping succeeded. Code: finally:
  • Line 522: Close the browser. Code: browser.close()
  • Line 523: Blank line for readability. Code: <blank>
  • Line 524: If the embedded chain produced no rows, parse the captured table HTML instead. Code: if not calls_full and not puts_full and calls_html and puts_html:
  • Line 525: Parse the full calls and puts tables. Code: calls_full = parse_table(calls_html, "calls")
  • Line 526: Parse the full calls and puts tables. Code: puts_full = parse_table(puts_html, "puts")
  • Line 527: Blank line for readability. Code: <blank>
  • Line 528: Derive the expiry code expected for the target date. Code: expected_code = expected_expiry_code(target_date)
  • Line 529: Conditional branch. Code: if expected_code:
  • Line 530: Reject the result if neither the calls nor the puts contain the expected expiry code. Code: if not has_expected_expiry(calls_full, expected_code) and not has_expected_expiry(
  • Line 531: Execute the statement as written. Code: puts_full, expected_code
  • Line 532: Close the current block or container. Code: ):
  • Line 533: Return a value to the caller. Code: return {
  • Line 534: Execute the statement as written. Code: "error": "Options chain does not match requested expiration",
  • Line 535: Execute the statement as written. Code: "stock": symbol,
  • Line 536: Execute the statement as written. Code: "requested_expiration": requested_expiration,
  • Line 537: Execute the statement as written. Code: "expected_expiration_code": expected_code,
  • Line 538: Execute the statement as written. Code: "selected_expiration": {
  • Line 539: Execute the statement as written. Code: "value": selected_expiration_value,
  • Line 540: Execute the statement as written. Code: "label": selected_expiration_label,
  • Line 541: Close the current block or container. Code: },
  • Line 542: Close the current block or container. Code: }
  • Line 543: Blank line for readability. Code: <blank>
  • Line 544: Comment describing the next block. Code: # ----------------------------------------------------------------------
  • Line 545: Comment describing the next block. Code: # Pruning logic
  • Line 546: Comment describing the next block. Code: # ----------------------------------------------------------------------
  • Line 547: Define the prune_nearest helper, which keeps the limit strikes closest to the price (the side argument is currently unused). Code: def prune_nearest(options, price_value, limit=26, side=""):
  • Line 548: Conditional branch. Code: if price_value is None:
  • Line 549: Without a price, return the options unpruned. Code: return options, 0
  • Line 550: Blank line for readability. Code: <blank>
  • Line 551: Filter options to numeric strike entries. Code: numeric = [o for o in options if isinstance(o.get("Strike"), (int, float))]
  • Line 552: Blank line for readability. Code: <blank>
  • Line 553: Conditional branch. Code: if len(numeric) <= limit:
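  • Line 554: Within the limit, return all numeric rows with a pruned count of 0 (non-numeric rows dropped by the filter are not counted here). Code: return numeric, 0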
  • Line 555: Blank line for readability. Code: <blank>
  • Line 556: Sort options by distance to current price. Code: sorted_opts = sorted(numeric, key=lambda x: abs(x["Strike"] - price_value))
  • Line 557: Keep the closest strike entries. Code: pruned = sorted_opts[:limit]
  • Line 558: Compute how many rows were pruned, including any non-numeric rows dropped by the filter. Code: pruned_count = len(options) - len(pruned)
  • Line 559: Return a value to the caller. Code: return pruned, pruned_count
  • Line 560: Blank line for readability. Code: <blank>
  • Line 561: Apply pruning to calls. Code: calls, pruned_calls = prune_nearest(calls_full, price, side="calls")
  • Line 562: Apply pruning to puts. Code: puts, pruned_puts = prune_nearest(puts_full, price, side="puts")
  • Line 563: Blank line for readability. Code: <blank>
  • Line 564: Define the strike_range function. Code: def strike_range(opts):
  • Line 565: Collect strike prices from the option list. Code: strikes = [o["Strike"] for o in opts if isinstance(o.get("Strike"), (int, float))]
  • Line 566: Return a value to the caller. Code: return [min(strikes), max(strikes)] if strikes else [None, None]
  • Line 567: Blank line for readability. Code: <blank>
  • Line 568: Return a value to the caller. Code: return {
  • Line 569: Execute the statement as written. Code: "stock": symbol,
  • Line 570: Execute the statement as written. Code: "url": url,
  • Line 571: Execute the statement as written. Code: "requested_expiration": requested_expiration,
  • Line 572: Execute the statement as written. Code: "selected_expiration": {
  • Line 573: Execute the statement as written. Code: "value": selected_expiration_value,
  • Line 574: Execute the statement as written. Code: "label": selected_expiration_label,
  • Line 575: Close the current block or container. Code: },
  • Line 576: Execute the statement as written. Code: "current_price": price,
  • Line 577: Execute the statement as written. Code: "calls": calls,
  • Line 578: Execute the statement as written. Code: "puts": puts,
  • Line 579: Execute the statement as written. Code: "calls_strike_range": strike_range(calls),
  • Line 580: Execute the statement as written. Code: "puts_strike_range": strike_range(puts),
  • Line 581: Execute the statement as written. Code: "total_calls": len(calls),
  • Line 582: Execute the statement as written. Code: "total_puts": len(puts),
  • Line 583: Execute the statement as written. Code: "pruned_calls_count": pruned_calls,
  • Line 584: Execute the statement as written. Code: "pruned_puts_count": pruned_puts,
  • Line 585: Close the current block or container. Code: }
  • Line 586: Blank line for readability. Code: <blank>
  • Line 587: Blank line for readability. Code: <blank>
  • Line 588: Attach the route decorator to the handler. Code: @app.route("/scrape_sync")
  • Line 589: Define the scrape_sync function. Code: def scrape_sync():
  • Line 590: Read the stock symbol parameter, defaulting to MSFT. Code: symbol = request.args.get("stock", "MSFT")
  • Line 591: Take the first non-empty value of expiration, expiry, or date. Code: expiration = (
  • Line 592: Execute the statement as written. Code: request.args.get("expiration")
  • Line 593: Execute the statement as written. Code: or request.args.get("expiry")
  • Line 594: Execute the statement as written. Code: or request.args.get("date")
  • Line 595: Close the current block or container. Code: )
  • Line 596: Emit or configure a log message. Code: app.logger.info(
  • Line 597: Execute the statement as written. Code: "Received /scrape_sync request for symbol=%s expiration=%s",
  • Line 598: Execute the statement as written. Code: symbol,
  • Line 599: Execute the statement as written. Code: expiration,
  • Line 600: Close the current block or container. Code: )
  • Line 601: Run the scrape and return its result, including any error payload, as JSON. Code: return jsonify(scrape_yahoo_options(symbol, expiration))
  • Line 602: Blank line for readability. Code: <blank>
  • Line 603: Blank line for readability. Code: <blank>
  • Line 604: Conditional branch. Code: if __name__ == "__main__":
  • Line 605: Run the Flask development server on all interfaces, port 9777. Code: app.run(host="0.0.0.0", port=9777)
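
The expiration handling in Lines 403 to 480 follows three steps. An all-digit value is used as epoch seconds. Otherwise the value is parsed as a date and converted to midnight UTC. If parsing fails, the value is looked up against the base page's list. The sketch below is standard-library only and not the service's code. `parse_date_sketch`, `resolve_target_epoch`, and `match_expiration` are hypothetical names, and the `DATE_FORMATS` tuple here is an assumed subset, not the real one defined near Line 20.

```python
from datetime import datetime, timezone

# Hypothetical subset of DATE_FORMATS; the real tuple lives in scraper_service.py.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date_sketch(text):
    """Try each accepted format in order; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def resolve_target_epoch(requested):
    """Return (epoch_or_None, needs_base_page_lookup) for a raw expiration value."""
    if requested.isdigit():
        # Already a Yahoo `date` value in epoch seconds.
        return int(requested), False
    parsed = parse_date_sketch(requested)
    if parsed:
        # Midnight UTC on the parsed calendar day, as in Lines 410 to 417.
        midnight = datetime(parsed.year, parsed.month, parsed.day, tzinfo=timezone.utc)
        return int(midnight.timestamp()), False
    # Unparseable input: load the base page and resolve against its expiration list.
    return None, True


def match_expiration(target, options):
    """Return the option whose value equals target, or None (Lines 460 to 465)."""
    for opt in options:
        if opt.get("value") == target:
            return opt
    return None
```

Under these assumed formats, "2024-06-21" and "06/21/2024" both resolve to 1718928000, the same value Yahoo would accept as `date=1718928000`.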
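
Lines 528 to 542 reject a chain whose rows do not carry the expected expiry code. The bodies of `expected_expiry_code` and `has_expected_expiry` are not shown in this walkthrough. The sketch below assumes the code is the YYMMDD segment of OCC-style contract symbols such as `MSFT240621C00400000`, and that rows store the symbol under a "Contract Name" key. Both points are assumptions, and the `*_sketch` names are hypothetical.

```python
from datetime import datetime, timezone


def expected_expiry_code_sketch(epoch):
    """Return the YYMMDD code for an epoch in UTC, or None when no target is set."""
    if not epoch:
        return None
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%y%m%d")


def has_expected_expiry_sketch(rows, code, key="Contract Name"):
    """True if any row's contract symbol (assumed key) contains the expiry code."""
    return any(code in str(row.get(key, "")) for row in rows)
```

Returning None for a missing target mirrors the `if expected_code:` guard on Line 529, which skips the check when no expiration was requested.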
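
The price extraction in Lines 505 to 518 tries a primary selector, then a fallback, and strips thousands separators before converting to float. The sketch below is hypothetical and generalizes that idea to an ordered list of readers. It returns None when every reader fails, which is an assumption: the service itself only logs a warning. In the service, each reader would be a Playwright call such as `lambda: page.locator("span[data-testid='qsp-price']").inner_text()`. Here plain callables keep the sketch runnable without a browser.

```python
def parse_price_text(text):
    """Convert text like '1,234.56' to a float."""
    return float(text.replace(",", ""))


def first_price(readers):
    """Return the first price any reader yields; None if all of them fail."""
    for read in readers:
        try:
            return parse_price_text(read())
        except Exception:
            # A missing element or unparsable text moves on to the next reader,
            # matching the broad `except Exception` used in the service.
            continue
    return None
```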
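
To see the pruning in Lines 547 to 566 on concrete input, `prune_nearest` and `strike_range` can be run standalone. They are copied from the walkthrough above, and the sample rows are made up. One consequence is easy to miss: pruned rows come back ordered by distance to the current price, not by strike. A consumer that expects ascending strikes must re-sort.

```python
def prune_nearest(options, price_value, limit=26, side=""):
    # Same logic as Lines 547 to 559; `side` is accepted but unused there too.
    if price_value is None:
        return options, 0
    numeric = [o for o in options if isinstance(o.get("Strike"), (int, float))]
    if len(numeric) <= limit:
        return numeric, 0
    sorted_opts = sorted(numeric, key=lambda x: abs(x["Strike"] - price_value))
    pruned = sorted_opts[:limit]
    return pruned, len(options) - len(pruned)


def strike_range(opts):
    strikes = [o["Strike"] for o in opts if isinstance(o.get("Strike"), (int, float))]
    return [min(strikes), max(strikes)] if strikes else [None, None]


# Hypothetical rows: five numeric strikes plus one non-numeric placeholder.
rows = [{"Strike": s} for s in (90, 95, 100, 105, 110)] + [{"Strike": "-"}]
kept, dropped = prune_nearest(rows, 101.0, limit=3)
print([r["Strike"] for r in kept], dropped)  # [100, 105, 95] 3
print(strike_range(kept))                    # [95, 105]
```

The non-numeric row counts toward `dropped` here because the count is taken against the full input. With the default `limit=26`, the same rows return all five numeric strikes and a count of 0.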
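
A client calls `GET /scrape_sync` on port 9777. Line 601 passes the result dictionary straight to `jsonify` without a status code, so error payloads also arrive with HTTP 200. Callers should therefore check for an "error" key rather than rely on the status code. The sketch below assumes a server running locally. `build_scrape_url`, `pick_expiration`, `check_payload`, and `fetch` are hypothetical helpers; `pick_expiration` reproduces the parameter precedence from Lines 591 to 595.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE = "http://localhost:9777/scrape_sync"  # assumes the default host/port from Line 605


def build_scrape_url(stock="MSFT", expiration=None, base=BASE):
    """Build a request URL; expiration may be epoch seconds or a date string."""
    params = {"stock": stock}
    if expiration is not None:
        params["expiration"] = str(expiration)
    return f"{base}?{urlencode(params)}"


def pick_expiration(args):
    """First non-empty of expiration, expiry, date (mirrors Lines 591 to 595)."""
    return args.get("expiration") or args.get("expiry") or args.get("date")


def check_payload(payload):
    """Raise if the service reported an error in the JSON body."""
    if "error" in payload:
        raise RuntimeError(payload["error"])
    return payload


def fetch(url, timeout=120):
    """GET the URL and decode the JSON body (needs the service to be running)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return check_payload(json.load(resp))
```

For example, `fetch(build_scrape_url("AAPL", "2024-06-21"))` requests `http://localhost:9777/scrape_sync?stock=AAPL&expiration=2024-06-21`. It raises `RuntimeError` on responses such as "Requested expiration not available".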