Run SEO audits locally with Gemma, Ollama, and React

Most SEO audit tools send your page data to a cloud API. That means your metadata, heading structure, internal links, schema markup, and content details leave your machine every time you run a scan. For agencies, freelancers working under NDA, or anyone auditing a site before launch, that’s a real concern.

Developer Avraham Aminov built a different approach: Local AI SEO Agent, an open source tool that runs AI analysis entirely on your own hardware using Gemma through Ollama. No external AI provider. No per-token billing. No remote inference dependency.

What It Does

The tool takes a public webpage URL, runs a deterministic technical SEO scan, sends a compact structured summary to a local Gemma model, validates the response, and renders a practical audit report in a React frontend.

The scanner extracts the following signals from any public page:

  • Title and meta description
  • Canonical, robots, and viewport tags
  • Heading structure (H1 through H6 counts)
  • Image alt text coverage
  • Internal, external, and empty link counts
  • Open Graph tags
  • JSON-LD schema count
  • Visible text length and word count

Gemma then takes those structured facts and produces:

  • An SEO score
  • A plain-language summary
  • Critical issues
  • Medium-priority issues
  • Actionable recommendations
  • A suggested title rewrite
  • A suggested meta description rewrite

How the Architecture Works

The separation between scanning and reasoning is the most interesting design decision here. Gemma never fetches a URL or parses HTML. The backend does all of that with Axios and Cheerio, then hands Gemma a compact structured summary rather than a raw HTML dump.
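The scanning half of that split can be sketched without any dependencies. The project itself parses with Cheerio; the regexes below are only meant to illustrate the kind of compact, structured summary the backend hands to the model, and the field names are illustrative rather than the project's actual code.

```javascript
// Dependency-free sketch of the deterministic scanning step.
// The real project uses Axios + Cheerio; regexes here only
// illustrate the shape of the summary passed to Gemma.
function scanHtml(html) {
  const pick = (re) => (html.match(re) || [])[1] || null;
  const count = (re) => (html.match(re) || []).length;
  return {
    title: pick(/<title[^>]*>([^<]*)<\/title>/i),
    metaDescription: pick(/<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i),
    headings: {
      h1: count(/<h1[\s>]/gi),
      h2: count(/<h2[\s>]/gi),
    },
    images: count(/<img[\s>]/gi),
    imagesWithAlt: count(/<img[^>]*\salt=["'][^"']+["']/gi),
    jsonLdBlocks: count(/<script[^>]*application\/ld\+json/gi),
  };
}

const sample = `<html><head><title>Demo</title>
<meta name="description" content="A demo page"></head>
<body><h1>Hi</h1><h2>A</h2><h2>B</h2>
<img src="a.png" alt="logo"><img src="b.png"></body></html>`;

const summary = scanHtml(sample);
```

The model never sees raw HTML; it only sees an object like `summary`, which keeps prompts small and inference predictable.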

The full flow looks like this:

React UI
  -> Express API
  -> SEO scanner (Axios + Cheerio)
  -> prompt builder
  -> Ollama
  -> Gemma (gemma4:e4b)
  -> JSON validator (Zod)
  -> report UI

The frontend never communicates directly with Ollama. It only calls the Express backend. This keeps the Ollama service unexposed and makes the system easier to reason about.

The backend also rejects risky input before any fetch happens: localhost URLs, loopback IPs, private network addresses, malformed URLs, and unsupported protocols are all blocked. That matters because the backend is fetching user-provided URLs.
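A guard like that can be sketched with Node's built-in `URL` parser. This is an illustrative version of the checks described above, not the project's actual implementation, and the IPv4 ranges shown are the common private blocks rather than an exhaustive list.

```javascript
// Sketch of a pre-fetch URL guard against SSRF-style input.
// Function name and exact checks are assumptions.
function isSafeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // malformed URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  const host = url.hostname;
  if (host === 'localhost' || host === '[::1]') return false;
  // Block loopback, private, and link-local IPv4 ranges.
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (m) {
    const [a, b] = [Number(m[1]), Number(m[2])];
    if (a === 127 || a === 10) return false;           // loopback, 10/8
    if (a === 192 && b === 168) return false;          // 192.168/16
    if (a === 172 && b >= 16 && b <= 31) return false; // 172.16/12
    if (a === 169 && b === 254) return false;          // link-local
  }
  return true;
}
```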

Why gemma4:e4b Specifically

Aminov chose gemma4:e4b because it’s stronger than the smallest edge variant while still being practical for local development. The model weighs in at 9.6GB. On his local Docker setup, a full audit takes around 1-2 minutes depending on whether the model is already loaded into memory.

The first request after a cold start is the slowest. Ollama needs to load the model before it can respond. The app handles this with an extended request timeout and a loading state that shows elapsed time so the user knows the process is still running.
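One minimal way to implement an extended timeout is a `Promise.race` wrapper; this is a generic sketch of the idea, not the app's actual code, and the 180-second budget in the usage comment is an assumption.

```javascript
// Minimal sketch of an extended-timeout wrapper for slow
// cold-start requests. The real app's handling may differ.
function withTimeout(promise, ms, label = 'request') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage sketch: give the model call a generous cold-start budget.
// withTimeout(fetch('http://localhost:3001/api/audit', { /* ... */ }), 180_000);
```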

Prompt Design and Output Validation

The prompt instructs Gemma to return JSON only. The required output shape is strict:

{
  "score": 92,
  "summary": "Short SEO summary",
  "criticalIssues": [],
  "mediumIssues": [],
  "recommendations": [],
  "suggestedTitle": "",
  "suggestedMetaDescription": ""
}

The backend validates every response with Zod before it touches the frontend. If Gemma returns malformed JSON, missing required fields, or an invalid score, the API returns a clean error instead of rendering unreliable data.
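The project does this validation with Zod; the dependency-free sketch below performs equivalent checks on the required shape. The 0-100 score range is an assumption, since the article only says an invalid score is rejected.

```javascript
// Zod-free sketch of the report validation step. The project
// itself uses a Zod schema; the checks here mirror the required
// output shape. Score bounds (0-100) are an assumption.
function validateReport(data) {
  const errors = [];
  if (typeof data !== 'object' || data === null) {
    return { ok: false, errors: ['response is not an object'] };
  }
  if (!Number.isFinite(data.score) || data.score < 0 || data.score > 100) {
    errors.push('score must be a number between 0 and 100');
  }
  if (typeof data.summary !== 'string' || data.summary.length === 0) {
    errors.push('summary must be a non-empty string');
  }
  for (const key of ['criticalIssues', 'mediumIssues', 'recommendations']) {
    if (!Array.isArray(data[key]) || !data[key].every((s) => typeof s === 'string')) {
      errors.push(`${key} must be an array of strings`);
    }
  }
  for (const key of ['suggestedTitle', 'suggestedMetaDescription']) {
    if (typeof data[key] !== 'string') errors.push(`${key} must be a string`);
  }
  return { ok: errors.length === 0, errors };
}

const good = {
  score: 92,
  summary: 'Short SEO summary',
  criticalIssues: [],
  mediumIssues: [],
  recommendations: [],
  suggestedTitle: '',
  suggestedMetaDescription: '',
};
```

If any check fails, the API can return the `errors` list instead of rendering unreliable data.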

Aminov also trimmed the prompt by sending a scanner summary rather than the full raw scan object. Smaller prompts made local inference more predictable.
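A prompt builder along those lines might look like the following. Everything here, including field names and wording, is a hypothetical illustration of the summary-not-raw-HTML approach rather than the project's exact prompt.

```javascript
// Hypothetical prompt builder: turns the scanner summary into a
// compact, JSON-only instruction for the model. Field names and
// wording are illustrative, not the project's actual prompt.
function buildPrompt(scan) {
  const facts = [
    `title: ${scan.title ?? 'MISSING'}`,
    `meta description: ${scan.metaDescription ?? 'MISSING'}`,
    `h1 count: ${scan.h1Count}`,
    `images without alt text: ${scan.imagesMissingAlt}`,
    `internal links: ${scan.internalLinks}`,
    `word count: ${scan.wordCount}`,
  ].join('\n');
  return [
    'You are an SEO auditor. Based only on the facts below,',
    'return JSON only, with keys: score, summary, criticalIssues,',
    'mediumIssues, recommendations, suggestedTitle, suggestedMetaDescription.',
    '',
    facts,
  ].join('\n');
}

const prompt = buildPrompt({
  title: 'Demo',
  metaDescription: null,
  h1Count: 1,
  imagesMissingAlt: 3,
  internalLinks: 12,
  wordCount: 850,
});
```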

Running It Locally with Docker

The project runs with Docker Compose. Three services spin up: frontend, backend, and Ollama.

docker compose up -d --build

Default ports after startup:

  • Frontend: http://localhost:5174
  • Backend: http://localhost:3001
  • Ollama: http://localhost:11435
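A compose file matching those ports might look like the sketch below. The service names, build contexts, volume, and `OLLAMA_URL` environment variable are all assumptions, not the project's actual file; only the host ports come from the article.

```yaml
# Hypothetical docker-compose.yml matching the ports above.
# Service names, build paths, and env vars are assumptions.
services:
  frontend:
    build: ./frontend
    ports:
      - "5174:5174"
    depends_on:
      - backend
  backend:
    build: ./backend
    ports:
      - "3001:3001"
    environment:
      - OLLAMA_URL=http://ollama:11434   # assumed env var name
    depends_on:
      - ollama
  ollama:
    image: ollama/ollama
    ports:
      - "11435:11434"
    volumes:
      - ollama-data:/root/.ollama
volumes:
  ollama-data:
```

Note the port mapping pattern: inside the Compose network the backend reaches Ollama on its default port 11434, while the host sees it on 11435.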

After the containers are running, pull the model into the Ollama service:

docker compose exec ollama ollama pull gemma4:e4b

That’s the full setup. No API keys, no accounts, no external services.

The Pattern Worth Stealing

The architectural lesson here isn’t specific to SEO. It’s a general pattern for using local models in developer tools:

Deterministic extraction + local AI reasoning + strict output validation

The scanner does the work that needs to be precise. Gemma does the interpretation work that benefits from natural language. Zod catches anything malformed before it reaches the UI. Each layer has a single, bounded job.

Aminov notes this works well because the task is clearly bounded. Gemma doesn’t need to browse the web or guess what’s on the page. It receives structured facts and focuses on interpretation.
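The pattern in miniature is three composed stages. The stage bodies below are deliberately trivial stubs, with a fake function standing in for Gemma via Ollama; only the composition is the point.

```javascript
// The pattern: deterministic extraction -> local model reasoning
// -> strict validation. Stage bodies are stubs; the composition
// is what the article's architecture demonstrates.
function runAudit(html, model) {
  const facts = extract(html);              // deterministic scan
  const raw = model(JSON.stringify(facts)); // model interprets structured facts
  return validate(raw);                     // reject anything malformed
}

function extract(html) {
  return { h1Count: (html.match(/<h1[\s>]/gi) || []).length };
}

function validate(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new Error('model returned non-JSON');
  }
  if (!Number.isFinite(data.score)) throw new Error('report is missing a score');
  return data;
}

// A fake "model" standing in for Gemma behind Ollama:
const fakeModel = (factsJson) =>
  JSON.stringify({ score: 80, summary: `based on ${factsJson.length} chars of facts` });

const report = runAudit('<h1>Hello</h1>', fakeModel);
```

Swap the stubs for Cheerio, an Ollama call, and a Zod schema and you have the article's architecture; swap the domain and the same skeleton works for any bounded local-model tool.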

The Catch

This is an MVP with intentional scope limits. It audits one page at a time. There’s no crawling, no sitemap support, no report history, and no PDF export. The developer lists multi-page crawling, Lighthouse integration, a browser extension, and a WordPress plugin as potential future additions, but none of those exist yet.

Performance is also hardware-dependent. A 9.6GB model will run differently on a mid-range laptop than on a machine with a dedicated GPU. The 1-2 minute audit time is from the developer's own Docker setup.

Where to Get It

The project is open source on GitHub. MIT licensed, so you can fork it, modify it, or wire it into your own tooling.
