GitHub Models Integration Guide

Comprehensive guide for using GitHub Models in the Nemotron v3 Home Security Intelligence project. GitHub Models provides free access to cutting-edge AI models directly from GitHub.

Table of Contents

  • Overview
  • Available Models
  • Rate Limits
  • Authentication
  • Using the gh CLI
  • Using the REST API
  • Current Project Usage
  • Use Cases for This Project
  • Best Practices
  • Troubleshooting
  • Additional Resources

Overview

GitHub Models provides free access to AI models for experimentation and prototyping. The service is integrated into GitHub Actions and can be used via:

  1. GitHub CLI Extension (gh models)
  2. REST API (https://models.github.ai)
  3. Python SDK (via OpenAI-compatible API)

Marketplace: https://github.com/marketplace/models

Key Benefits

  • Free tier available for experimentation
  • No separate account needed - uses GitHub authentication
  • GitHub Actions integration - uses existing GITHUB_TOKEN
  • Multiple model providers - OpenAI, Meta, Microsoft, Mistral

Available Models

OpenAI Models

| Model | Context Window | Best For | Rate Tier |
| --- | --- | --- | --- |
| openai/gpt-4o | 128K tokens | Complex reasoning, code review | High |
| openai/gpt-4o-mini | 128K tokens | Fast, cost-effective tasks | Low |
| openai/o1 | 200K tokens | Deep reasoning, math | High |
| openai/o1-mini | 128K tokens | Faster reasoning tasks | Low |
| openai/o3-mini | 200K tokens | Latest reasoning model | High |

Meta Llama Models

| Model | Context Window | Best For | Rate Tier |
| --- | --- | --- | --- |
| meta/llama-3.3-70b-instruct | 128K tokens | General purpose | High |
| meta/llama-3.1-8b-instruct | 128K tokens | Lightweight tasks | Low |
| meta/llama-3.2-90b-vision | 128K tokens | Vision + text | High |

Microsoft Phi Models

| Model | Context Window | Best For | Rate Tier |
| --- | --- | --- | --- |
| microsoft/phi-4 | 16K tokens | Efficient reasoning | Low |
| microsoft/phi-3.5-mini | 128K tokens | Fast inference | Low |

Mistral Models

| Model | Context Window | Best For | Rate Tier |
| --- | --- | --- | --- |
| mistral/mistral-large-2411 | 128K tokens | Enterprise tasks | High |
| mistral/mistral-small-2503 | 32K tokens | Cost-effective | Low |
| mistral/codestral-2501 | 256K tokens | Code generation | Low |

Rate Limits

GitHub Models has two rate tiers based on model capability:

High-Tier Models (GPT-4o, o1, Llama-70B, etc.)

| Limit Type | Free Tier Limit |
| --- | --- |
| Requests per minute | 10 |
| Requests per day | 50 |
| Tokens per minute | 8,000 |
| Tokens per day | 16,000 |

Low-Tier Models (GPT-4o-mini, Phi, Mistral-small, etc.)

| Limit Type | Free Tier Limit |
| --- | --- |
| Requests per minute | 15 |
| Requests per day | 150 |
| Tokens per minute | 15,000 |
| Tokens per day | 150,000 |

Rate Limit Headers

API responses include rate limit headers:

x-ratelimit-limit-requests: 50
x-ratelimit-remaining-requests: 45
x-ratelimit-limit-tokens: 16000
x-ratelimit-remaining-tokens: 14500
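
These headers can also drive proactive throttling: check the remaining budget on each response and pause before the next call. A minimal sketch using requests; the header names follow the listing above, and the 60-second pause is an assumption rather than a documented reset window.

import time

import requests

def throttled_post(url: str, headers: dict, payload: dict) -> requests.Response:
    """POST, then pause if the per-window request budget is exhausted."""
    response = requests.post(url, headers=headers, json=payload, timeout=60)

    remaining = int(response.headers.get("x-ratelimit-remaining-requests", "1"))
    if remaining == 0:
        # Assumed backoff: wait out the window before the next request
        print("Request budget exhausted - sleeping 60s...")
        time.sleep(60)

    return response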

Handling Rate Limits

import time

def call_with_retry(prompt, max_retries=3):
    """Call GitHub Models with exponential backoff on 429 responses.

    Note: this assumes call_github_models returns the raw
    requests.Response; the helper in the REST API section below
    returns the response text instead.
    """
    for attempt in range(max_retries):
        response = call_github_models(prompt)
        if response.status_code == 429:
            wait_time = 2 ** attempt * 10  # 10s, 20s, 40s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue
        return response
    raise RuntimeError("Max retries exceeded")

Authentication

In GitHub Actions

GitHub Actions automatically provides a GITHUB_TOKEN; grant it the models: read permission in the workflow to call GitHub Models:

permissions:
  contents: read
  models: read

steps:
  - name: Call Model
    env:
      GH_TOKEN: ${{ github.token }}
    run: |
      gh models run openai/gpt-4o "Hello, world!"

Local Development

For local development, use a GitHub Personal Access Token (PAT):

# Set environment variable
export GH_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"

# Or use gh auth
gh auth login

Required Token Permissions:

  • models:read - Access to GitHub Models API

Token Generation

  1. Go to: github.com/settings/tokens
  2. Click "Generate new token (classic)" or "Fine-grained token"
  3. For fine-grained: Select "Account permissions" > "GitHub Copilot" (includes Models)
  4. For classic: Check the copilot scope
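
To sanity-check a new token before wiring it into workflows, one option is to list the model catalog. A minimal sketch, assuming the catalog endpoint GET https://models.github.ai/catalog/models and a JSON array response:

import os

import requests

# Verify the token can reach the GitHub Models API
token = os.environ["GH_TOKEN"]
response = requests.get(
    "https://models.github.ai/catalog/models",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
response.raise_for_status()  # 401 here means the token or its permissions are wrong
print(f"Token OK - {len(response.json())} models visible")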

Using the gh CLI

Installation

# Install gh-models extension
gh extension install github/gh-models

# Verify installation
gh models --help

Basic Usage

# Simple prompt
gh models run openai/gpt-4o "Explain async/await in Python"

# Interactive mode
gh models run openai/gpt-4o

# Pipe input
echo "What is this code doing?" | gh models run openai/gpt-4o

# With file context
cat script.py | gh models run openai/gpt-4o "Review this code"

Model Selection

# List available models
gh models list

# Use specific model
gh models run meta/llama-3.3-70b-instruct "Prompt here"
gh models run openai/gpt-4o-mini "Prompt here"
gh models run mistral/codestral-2501 "Prompt here"

Advanced Options

# Set temperature (0.0-2.0, default 1.0)
gh models run openai/gpt-4o --temperature 0.7 "Be creative"

# Set max tokens
gh models run openai/gpt-4o --max-tokens 500 "Short answer"

# System prompt
gh models run openai/gpt-4o \
  --system "You are a security expert" \
  "Review this code for vulnerabilities"

Using the REST API

Endpoint

POST https://models.github.ai/inference/chat/completions

Python Example

#!/usr/bin/env python3
"""GitHub Models API client example."""

import os
import requests

def call_github_models(
    prompt: str,
    model: str = "openai/gpt-4o",
    temperature: float = 0.7,
    max_tokens: int = 1000,
    system_prompt: str | None = None,
) -> str:
    """
    Call GitHub Models API.

    Args:
        prompt: User prompt
        model: Model identifier (e.g., "openai/gpt-4o")
        temperature: Sampling temperature (0.0-2.0)
        max_tokens: Maximum response tokens
        system_prompt: Optional system prompt

    Returns:
        Model response text
    """
    token = os.environ.get("GH_TOKEN") or os.environ.get("GITHUB_TOKEN")
    if not token:
        raise ValueError("GH_TOKEN or GITHUB_TOKEN environment variable required")

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    response = requests.post(
        "https://models.github.ai/inference/chat/completions",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        timeout=60,
    )
    response.raise_for_status()

    return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    result = call_github_models("What is 2 + 2?")
    print(result)

Using OpenAI SDK (Compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.github.ai/inference",
    api_key=os.environ["GH_TOKEN"],
)

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(response.choices[0].message.content)
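
Because the endpoint is OpenAI-compatible, streaming through the same SDK should also work. This is standard OpenAI SDK usage (stream=True), sketched here rather than confirmed against GitHub Models; it reuses the client from the example above.

# Print tokens as they arrive instead of waiting for the full response
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Summarize async/await in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()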

Current Project Usage

AI Code Review Workflow

This project uses GitHub Models for automated code review on pull requests:

File: .github/workflows/ai-code-review.yml

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]
    branches: [main]

permissions:
  contents: read
  pull-requests: write
  models: read

jobs:
  ai-review:
    name: GPT-4o Code Review
    runs-on: ubuntu-latest
    if: github.event.pull_request.draft == false

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Generate PR diff
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Truncate the diff so the prompt stays within token limits
          gh pr diff ${{ github.event.pull_request.number }} | head -c 20000 > pr_diff.txt

      - name: Install gh-models extension
        run: gh extension install github/gh-models || true

      - name: Run AI Code Review
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Review prompt with tech stack context
          cat << 'PROMPT_END' > review_prompt.txt
          You are an expert code reviewer for a home security application.
          Tech stack: Python FastAPI, React TypeScript, YOLO26, Nemotron.

          Review for:
          1. Security Issues
          2. Performance
          3. Best Practices
          4. Bugs
          PROMPT_END

          cat pr_diff.txt >> review_prompt.txt

          # Call GPT-4o with fallback to mini
          REVIEW=$(cat review_prompt.txt | gh models run openai/gpt-4o 2>&1) || \
          REVIEW=$(cat review_prompt.txt | gh models run openai/gpt-4o-mini 2>&1)

          echo "$REVIEW" > review_output.md

Key Features:

  • Automatic trigger on PR open/sync
  • Diff truncation for token limits
  • Fallback from GPT-4o to GPT-4o-mini
  • Posted as PR comment

Use Cases for This Project

1. AI Code Review (Implemented)

Already implemented in .github/workflows/ai-code-review.yml. Reviews PRs for:

  • Security vulnerabilities
  • Performance issues
  • Code style
  • Logic errors

2. Documentation Generation

Generate or update documentation from code:

# Generate docstrings
cat backend/services/detector_client.py | \
  gh models run openai/gpt-4o "Add comprehensive docstrings to this Python code"

# Generate README section
cat backend/api/routes/*.py | \
  gh models run openai/gpt-4o "Create API documentation in markdown format"

3. Test Case Suggestions

Generate test cases for new code:

# Suggest unit tests
cat backend/services/batch_aggregator.py | \
  gh models run openai/gpt-4o \
  --system "You are a Python testing expert using pytest" \
  "Suggest comprehensive test cases for this service"

# Generate test file skeleton
cat backend/services/nemotron_analyzer.py | \
  gh models run mistral/codestral-2501 \
  "Generate pytest test file with fixtures and edge cases"

4. Security Analysis

Analyze code for security issues:

# Security review
cat backend/api/routes/events.py | \
  gh models run openai/gpt-4o \
  --system "You are a security expert. Focus on OWASP Top 10" \
  "Analyze this code for security vulnerabilities"

# SQL injection check
grep -r "execute\|raw_sql" backend/ | \
  gh models run openai/gpt-4o "Check for SQL injection risks"

5. PR Description Generation

Generate PR descriptions from diffs:

# Generate PR description
gh pr diff 123 | \
  gh models run openai/gpt-4o \
  --system "Generate a clear PR description with summary and test plan" \
  "Describe these changes"

6. Commit Message Generation

Generate commit messages from staged changes:

# Generate commit message
git diff --cached | \
  gh models run openai/gpt-4o-mini \
  "Generate a conventional commit message (feat/fix/docs/refactor)"

7. Detection Prompt Refinement

Improve Nemotron prompts for risk analysis:

# Refine risk analysis prompt
cat docs/plans/2024-12-21-dashboard-mvp-design.md | \
  grep -A 50 "Nemotron Prompt" | \
  gh models run openai/gpt-4o \
  "Suggest improvements for this security risk analysis prompt"

Best Practices

1. Use Appropriate Models

| Task | Recommended Model | Reason |
| --- | --- | --- |
| Complex code review | openai/gpt-4o | Best reasoning |
| Simple formatting | openai/gpt-4o-mini | Fast, saves quota |
| Code generation | mistral/codestral-2501 | Optimized for code |
| General tasks | meta/llama-3.3-70b-instruct | Good balance |

2. Implement Fallbacks

# Fallback chain
RESPONSE=$(gh models run openai/gpt-4o "$PROMPT" 2>&1) || \
RESPONSE=$(gh models run openai/gpt-4o-mini "$PROMPT" 2>&1) || \
RESPONSE="Model unavailable"
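
The same chain in Python, reusing the call_github_models helper from the REST API section (the model order mirrors the shell version above):

def call_with_fallback(prompt: str,
                       models=("openai/gpt-4o", "openai/gpt-4o-mini")) -> str:
    """Try each model in order, falling through on any error."""
    last_error = None
    for model in models:
        try:
            return call_github_models(prompt, model=model)
        except Exception as exc:  # e.g., 429 rate limit or model outage
            last_error = exc
    raise RuntimeError(f"All models failed: {last_error}")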

3. Truncate Large Inputs

# Limit to ~20KB (leaves room for prompt overhead)
head -c 20000 large_file.txt > truncated.txt
cat truncated.txt | gh models run openai/gpt-4o "..."
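
The same budget in Python, using the common rough approximation of ~4 characters per token (an estimate, not a tokenizer):

def truncate_for_model(text: str, max_tokens: int = 6000) -> str:
    """Trim input to a rough character budget (~4 chars per token)."""
    max_chars = max_tokens * 4
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n\n[... truncated ...]"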

4. Use System Prompts

gh models run openai/gpt-4o \
  --system "You are reviewing code for a home security system. \
            The stack is Python FastAPI, React TypeScript, PostgreSQL, Redis. \
            AI models are YOLO26 (detection) and Nemotron (reasoning)." \
  "Review this code"

5. Cache Results When Possible

For deterministic queries, cache results to avoid hitting rate limits:

import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".github-models-cache")

def cached_query(prompt: str, model: str = "openai/gpt-4o") -> str:
    """Query with file-based caching."""
    cache_key = hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()[:16]
    cache_file = CACHE_DIR / f"{cache_key}.json"

    if cache_file.exists():
        return json.loads(cache_file.read_text())["response"]

    response = call_github_models(prompt, model=model)

    CACHE_DIR.mkdir(exist_ok=True)
    cache_file.write_text(json.dumps({"prompt": prompt, "response": response}))

    return response

6. Handle Errors Gracefully

import requests

try:
    response = call_github_models(prompt)
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 429:
        print("Rate limited - try again later")
    elif e.response.status_code == 401:
        print("Authentication failed - check GH_TOKEN")
    elif e.response.status_code == 400:
        print(f"Bad request: {e.response.text}")
    else:
        raise

Troubleshooting

"gh: command not found"

Install GitHub CLI:

# macOS
brew install gh

# Ubuntu/Debian
curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | \
  sudo dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] \
  https://cli.github.com/packages stable main" | \
  sudo tee /etc/apt/sources.list.d/github-cli.list
sudo apt update && sudo apt install gh

# Fedora
sudo dnf install gh

"gh models: command not found"

Install the extension:

gh extension install github/gh-models

"401 Unauthorized"

Check authentication:

# Verify token is set
echo $GH_TOKEN

# Test authentication
gh auth status

# Re-authenticate if needed
gh auth login

"429 Rate Limited"

You've exceeded the rate limit:

# Check remaining quota via the x-ratelimit-* response headers
# (see "Rate Limit Headers" above)

# Wait and retry
sleep 60

"Model not found"

Verify model name:

# List available models
gh models list

# Use exact name from list
gh models run openai/gpt-4o "test"

Large Response Truncated

Increase max tokens:

gh models run openai/gpt-4o --max-tokens 4000 "Generate detailed output"

Timeout Errors

Increase timeout in Python:

response = requests.post(url, json=data, timeout=120)  # 2 minutes

Additional Resources