Building a ChatGPT Alternative That Controls Your Browser: Complete Open Source Guide

Keywords: open source browser automation, ChatGPT alternative, AI browser control, Chrome extension development, LLM integration, browser agents

Imagine having your own personal ChatGPT that does more than chat: it controls your browser, fills in forms, extracts data, and completes complex web tasks on your behalf. It is 100% open source and runs as an extension on your own machine.

This is not science fiction. This guide will walk you through building exactly that: a production-ready, open-source browser automation system powered by AI.

What you'll learn:

  • Chrome extension architecture (Manifest V3)
  • Multi-agent AI system design
  • LLM integration (OpenAI, Anthropic, local models)
  • Browser automation APIs
  • Production deployment strategies

What you'll build: A fully functional Chrome extension that:

  • Accepts natural language commands
  • Plans execution strategies with AI
  • Executes browser actions autonomously
  • Handles errors and edge cases
  • Provides real-time feedback

Let's build it.

Reading Time: ~35 minutes | Difficulty: Advanced | Last Updated: January 19, 2026

Code Repository: All code snippets are production-ready and based on Onpiste

Project Overview and Architecture

What We're Building

Product Name: BrowserGPT (example name)

Core Features:

  • Natural language browser control
  • Multi-agent execution system
  • Support for multiple LLM providers
  • Real-time progress tracking
  • Data extraction and manipulation
  • Form automation
  • Web scraping capabilities

High-Level Architecture

┌─────────────────────────────────────────────────────────┐
│                  Chrome Extension                       │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────┐    ┌──────────────┐   ┌───────────┐ │
│  │  Side Panel │───▶│Service Worker│──▶│Content    │ │
│  │  (React UI) │◀───│  (Agents)    │◀──│Scripts    │ │
│  └─────────────┘    └──────────────┘   └───────────┘ │
│                            │                           │
│                            ▼                           │
│                    ┌──────────────┐                    │
│                    │ Browser APIs │                    │
│                    │ - Tabs       │                    │
│                    │ - Scripting  │                    │
│                    │ - Storage    │                    │
│                    └──────────────┘                    │
└─────────────────────────────────────────────────────────┘
              ┌────────────────────────┐
              │   External LLM APIs    │
              │ OpenAI / Anthropic /   │
              │ Google / Local Models  │
              └────────────────────────┘

Technology Stack

Core:

  • TypeScript (strict mode)
  • Chrome Extension Manifest V3
  • Vite (build tool)
  • pnpm (package manager)

UI:

  • React 18
  • Tailwind CSS
  • Radix UI (accessible components)

Agent System:

  • Custom multi-agent framework
  • Zod (schema validation)
  • LLM provider abstraction

Testing:

  • Vitest (unit tests)
  • Playwright (E2E tests)

Build & Deploy:

  • Turbo (monorepo)
  • GitHub Actions (CI/CD)
  • Chrome Web Store (distribution)

Setting Up the Development Environment

Prerequisites

# Check versions
node --version    # v18+
pnpm --version    # v8+
chrome --version  # v138+

Project Initialization

# Create project structure
mkdir browsergpt && cd browsergpt
pnpm init

# Initialize as monorepo
mkdir -p packages/extension packages/ui packages/shared
mkdir -p packages/extension/src/{background,content,side-panel}

# Initialize package.json for workspace
cat > package.json <<'EOF'
{
  "name": "browsergpt",
  "private": true,
  "scripts": {
    "dev": "turbo run dev",
    "build": "turbo run build",
    "test": "turbo run test"
  },
  "devDependencies": {
    "turbo": "^2.0.0",
    "typescript": "^5.3.0",
    "@types/chrome": "^0.0.268",
    "vite": "^5.0.0",
    "vitest": "^1.0.0"
  }
}
EOF

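# Declare the workspace packages so pnpm treats this as a monorepo
cat > pnpm-workspace.yaml <<'EOF'
packages:
  - "packages/*"
EOF

# Install dependencies
pnpm install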

TypeScript Configuration

// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "lib": ["ES2022", "DOM"],
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "resolveJsonModule": true,
    "allowSyntheticDefaultImports": true,
    "types": ["chrome", "vite/client"],
    "paths": {
      "@/*": ["./src/*"]
    }
  }
}

Vite Configuration

// packages/extension/vite.config.ts
import { defineConfig } from 'vite';
import { crx } from '@crxjs/vite-plugin';
import manifest from './manifest.json';

export default defineConfig({
  plugins: [
    crx({ manifest })
  ],
  build: {
    rollupOptions: {
      input: {
        'side-panel': 'src/side-panel/index.html'
      }
    }
  }
});

Building the Chrome Extension Foundation

Manifest V3 Configuration

// packages/extension/manifest.json
{
  "manifest_version": 3,
  "name": "BrowserGPT",
  "version": "1.0.0",
  "description": "AI-powered browser automation",

  "permissions": [
    "activeTab",
    "scripting",
    "storage",
    "tabs",
    "sidePanel"
  ],

  "host_permissions": [
    "<all_urls>"
  ],

  "background": {
    "service_worker": "src/background/index.ts",
    "type": "module"
  },

  "side_panel": {
    "default_path": "src/side-panel/index.html"
  },

  "action": {
    "default_title": "BrowserGPT",
    "default_icon": {
      "16": "icons/icon16.png",
      "48": "icons/icon48.png",
      "128": "icons/icon128.png"
    }
  },

  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["src/content/index.ts"],
      "run_at": "document_idle"
    }
  ],

  "web_accessible_resources": [
    {
      "resources": ["src/content/*"],
      "matches": ["<all_urls>"]
    }
  ]
}
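
The manifest above declares a `side_panel` and an `action` with no popup. As far as the manifest goes, nothing yet tells Chrome to open the panel when the toolbar icon is clicked. One way to wire that up is `chrome.sidePanel.setPanelBehavior`, called from the service worker. The sketch below is a hedged illustration: the `SidePanelApi` interface and the `enableSidePanelOnClick` helper are hypothetical names, introduced only so the logic can be exercised without a browser.

```typescript
// Minimal structural type for the part of chrome.sidePanel used here.
// (Hypothetical interface; in the extension you would pass chrome.sidePanel.)
interface SidePanelApi {
  setPanelBehavior(behavior: { openPanelOnActionClick: boolean }): Promise<void>;
}

// Ask Chrome to open the side panel when the toolbar icon is clicked.
// Returns false instead of throwing so a failure here cannot break startup.
async function enableSidePanelOnClick(api: SidePanelApi): Promise<boolean> {
  try {
    await api.setPanelBehavior({ openPanelOnActionClick: true });
    return true;
  } catch (error) {
    console.error('Could not configure side panel behavior:', error);
    return false;
  }
}

// In the service worker (assumption: called once at the top level):
// void enableSidePanelOnClick(chrome.sidePanel);
```

Passing the API in as a parameter keeps the helper testable with a fake object in Vitest.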

Service Worker Setup

// packages/extension/src/background/index.ts
import { Executor } from './agent/executor';
import { MessageRouter } from './messaging/router';

// Initialize on install
chrome.runtime.onInstalled.addListener(() => {
  console.log('BrowserGPT installed');
  initializeStorage();
});

// Handle messages from UI
const messageRouter = new MessageRouter();

chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  messageRouter.route(message, sender, sendResponse);
  return true; // Keep channel open for async response
});

// Initialize storage
async function initializeStorage() {
  const defaults = {
    llmProvider: 'openai',
    llmApiKey: '',
    maxSteps: 100,
    useVision: false
  };

  const existing = await chrome.storage.local.get(Object.keys(defaults));

  // Only set defaults for missing keys
  const toSet = Object.entries(defaults).reduce((acc, [key, value]) => {
    if (!(key in existing)) {
      acc[key] = value;
    }
    return acc;
  }, {} as Record<string, any>);

  if (Object.keys(toSet).length > 0) {
    await chrome.storage.local.set(toSet);
  }
}
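
The "only set defaults for missing keys" step in `initializeStorage` is plain object logic, so it can be pulled out and unit tested without `chrome.storage`. A minimal sketch, assuming a hypothetical helper named `missingDefaults`:

```typescript
type Settings = Record<string, unknown>;

// Return only the default entries whose keys are absent from `existing`.
// A key that exists with a falsy value ('' or false) is left untouched.
function missingDefaults(defaults: Settings, existing: Settings): Settings {
  const toSet: Settings = {};
  for (const [key, value] of Object.entries(defaults)) {
    if (!(key in existing)) {
      toSet[key] = value;
    }
  }
  return toSet;
}

const exampleDefaults = { llmProvider: 'openai', llmApiKey: '', maxSteps: 100, useVision: false };
const exampleStored = { llmProvider: 'anthropic', maxSteps: 50 };

// Only the two keys the user has never saved are returned.
console.log(missingDefaults(exampleDefaults, exampleStored));
// logs { llmApiKey: '', useVision: false }
```

Because the check uses `key in existing`, a user who deliberately saved an empty API key or `useVision: false` keeps that value on the next install event.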

Message Routing System

// packages/extension/src/background/messaging/router.ts
import type { Message, MessageHandler } from './types';
import { Executor } from '../agent/executor';

export class MessageRouter {
  private handlers = new Map<string, MessageHandler>();

  constructor() {
    this.registerHandlers();
  }

  private registerHandlers() {
    // Register all message handlers
    this.register('execute_task', this.handleExecuteTask.bind(this));
    this.register('cancel_task', this.handleCancelTask.bind(this));
    this.register('get_config', this.handleGetConfig.bind(this));
    this.register('set_config', this.handleSetConfig.bind(this));
  }

  register(type: string, handler: MessageHandler) {
    this.handlers.set(type, handler);
  }

  async route(message: Message, sender: chrome.runtime.MessageSender, sendResponse: (response: any) => void) {
    const handler = this.handlers.get(message.type);

    if (!handler) {
      sendResponse({ success: false, error: `Unknown message type: ${message.type}` });
      return;
    }

    try {
      const result = await handler(message.payload, sender);
      sendResponse({ success: true, data: result });
    } catch (error) {
      console.error(`Error handling ${message.type}:`, error);
      sendResponse({
        success: false,
        error: error instanceof Error ? error.message : 'Unknown error'
      });
    }
  }

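  private async handleExecuteTask(payload: { task: string }) {
    // Load the saved settings so the executor receives the provider and API key
    const config = await chrome.storage.local.get(['llmProvider', 'llmApiKey', 'maxSteps']);

    const executor = new Executor();
    return executor.execute(payload.task, {
      llmProvider: config.llmProvider,
      apiKey: config.llmApiKey,
      maxSteps: config.maxSteps
    });
  }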

  private async handleCancelTask() {
    // Implement cancellation logic
    return { cancelled: true };
  }

  private async handleGetConfig() {
    return await chrome.storage.local.get();
  }

  private async handleSetConfig(payload: Record<string, any>) {
    await chrome.storage.local.set(payload);
    return { updated: true };
  }
}
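
The router imports `Message` from `./types`, but that file is not shown. The sketch below is one possible shape for it, inferred from the four message types registered above. Every name here (`MESSAGE_TYPES`, `isMessage`) is hypothetical. Messages cross the extension boundary as untyped JSON, so a small runtime guard is included next to the compile-time union.

```typescript
// Hypothetical contents for messaging/types.ts; the original file is not shown.
type Message =
  | { type: 'execute_task'; payload: { task: string } }
  | { type: 'cancel_task'; payload?: undefined }
  | { type: 'get_config'; payload?: undefined }
  | { type: 'set_config'; payload: Record<string, unknown> };

const MESSAGE_TYPES = ['execute_task', 'cancel_task', 'get_config', 'set_config'] as const;

// Runtime guard: check the shape of an incoming message before routing it.
function isMessage(value: unknown): value is Message {
  if (typeof value !== 'object' || value === null) return false;

  const candidate = value as { type?: unknown; payload?: unknown };
  if (typeof candidate.type !== 'string') return false;
  if (!(MESSAGE_TYPES as readonly string[]).includes(candidate.type)) return false;

  if (candidate.type === 'execute_task') {
    // execute_task needs a non-empty task string
    const task = (candidate.payload as { task?: unknown } | undefined)?.task;
    return typeof task === 'string' && task.trim().length > 0;
  }
  if (candidate.type === 'set_config') {
    // set_config needs an object of settings to store
    return typeof candidate.payload === 'object' && candidate.payload !== null;
  }
  return true; // cancel_task and get_config carry no payload
}
```

With a guard like this, `route` could reject malformed input before looking up a handler, rather than failing inside the handler.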

Creating the Multi-Agent System

Agent Base Class

// packages/extension/src/background/agent/base.ts
import { z } from 'zod';
import type { AgentConfig, AgentContext, AgentResult } from './types';
import { UniversalLLMProvider, type LLMProvider } from '../llm/provider';

export abstract class BaseAgent {
  protected llm: LLMProvider;

  constructor(
    protected context: AgentContext,
    protected config: AgentConfig
  ) {
    // LLMProvider is the interface; UniversalLLMProvider is the concrete class
    this.llm = new UniversalLLMProvider({
      provider: config.provider,
      model: config.model,
      apiKey: config.apiKey
    });
  }

  /**
   * Execute agent's main responsibility
   */
  abstract execute(): Promise<AgentResult>;

  /**
   * Generate structured output using LLM
   */
  protected async generateStructured<T>(
    prompt: string,
    schema: z.ZodSchema<T>
  ): Promise<T> {
    const response = await this.llm.generate(prompt, {
      temperature: this.config.temperature ?? 0.7,
      maxTokens: this.config.maxTokens ?? 4096
    });

    // Parse and validate response
    const parsed = JSON.parse(response);
    return schema.parse(parsed);
  }

  /**
   * Build prompt with context
   */
  protected buildPrompt(template: string): string {
    return template
      .replace('{{task}}', this.context.task)
      .replace('{{history}}', this.formatHistory())
      .replace('{{page}}', this.context.browserContext.url);
  }

  protected formatHistory(): string {
    return this.context.actionResults
      .map((result, i) => `${i + 1}. ${result.action}: ${result.success ? '✓' : '✗'}`)
      .join('\n');
  }
}
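
`generateStructured` passes the raw model output straight to `JSON.parse`. In practice, models often wrap JSON in a Markdown code fence or add a sentence before it, which makes the parse throw. The sketch below shows one defensive approach, assuming a hypothetical helper named `extractJson` that would run before schema validation. It is not part of the original code.

```typescript
// Build the three-backtick fence marker programmatically so this snippet can
// itself live inside a fenced block.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

// Pull the first JSON object out of a model response that may be wrapped in
// a Markdown fence or surrounded by explanatory text.
function extractJson(raw: string): unknown {
  const fenced = FENCED_BLOCK.exec(raw);
  const candidate = (fenced?.[1] ?? raw).trim();

  // Keep everything from the first "{" to the last "}"
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('No JSON object found in model response');
  }

  return JSON.parse(candidate.slice(start, end + 1));
}
```

The result would still go through `schema.parse`, so this step only removes wrapping and does not replace validation.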

Navigator Agent Implementation

// packages/extension/src/background/agent/agents/navigator.ts
import { BaseAgent } from '../base';
import { z } from 'zod';
import type { AgentResult } from '../types';

// Action schema
const ActionSchema = z.object({
  type: z.enum(['go_to_url', 'click_element', 'input_text', 'scroll', 'extract_data', 'done']),
  target: z.string().optional(),
  value: z.string().optional(),
  reasoning: z.string()
});

const NavigatorResponseSchema = z.object({
  actions: z.array(ActionSchema).max(10),
  done: z.boolean(),
  summary: z.string()
});

export class NavigatorAgent extends BaseAgent {
  async execute(): Promise<AgentResult> {
    // Build prompt
    const prompt = await this.buildNavigatorPrompt();

    // Get structured actions from LLM
    const response = await this.generateStructured(
      prompt,
      NavigatorResponseSchema
    );

    // Execute actions
    const results = [];
    for (const action of response.actions) {
      const result = await this.executeAction(action);
      results.push(result);

      // Stop if action failed critically
      if (!result.success && action.type !== 'extract_data') {
        break;
      }
    }

    return {
      done: response.done,
      results,
      summary: response.summary
    };
  }

  private async buildNavigatorPrompt(): Promise<string> {
    // getAccessibility() is async, so resolve it before building the template
    const pageStructure = await this.context.browserContext.getAccessibility();

    return `
You are a browser navigation agent. Your goal is to complete this task:

Task: ${this.context.task}

Current page: ${this.context.browserContext.url}
Page structure: ${pageStructure}

Previous steps:
${this.formatHistory()}

Available actions:
- go_to_url: Navigate to a URL
- click_element: Click an element (provide selector or description)
- input_text: Type text into an input (provide selector and text)
- scroll: Scroll the page (provide direction: up/down or element)
- extract_data: Extract data from page (provide selector)
- done: Mark task as complete

Generate up to 10 actions to progress toward the goal. Return JSON:

{
  "actions": [
    {
      "type": "click_element",
      "target": "#login-button",
      "reasoning": "Click login to access the dashboard"
    }
  ],
  "done": false,
  "summary": "Navigated to login page and clicked login button"
}

If the task is complete, set done: true.
    `.trim();
  }

  private async executeAction(action: z.infer<typeof ActionSchema>) {
    const { type, target, value } = action;

    try {
      switch (type) {
        case 'go_to_url':
          await this.context.browserContext.navigate(target!);
          return { success: true, action: `Navigated to ${target}` };

        case 'click_element':
          await this.context.browserContext.click(target!);
          return { success: true, action: `Clicked ${target}` };

        case 'input_text':
          await this.context.browserContext.type(target!, value!);
          return { success: true, action: `Typed into ${target}` };

        case 'scroll':
          await this.context.browserContext.scroll(target!);
          return { success: true, action: `Scrolled ${target}` };

        case 'extract_data': {
          const data = await this.context.browserContext.extractData(target!);
          return { success: true, action: `Extracted data from ${target}`, data };
        }

        case 'done':
          return { success: true, action: 'Task marked complete' };

        default:
          return { success: false, action: `Unknown action type: ${type}` };
      }
    } catch (error) {
      return {
        success: false,
        action: `Failed to ${type}`,
        error: error instanceof Error ? error.message : 'Unknown error'
      };
    }
  }
}
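
`ActionSchema` marks `target` and `value` as optional, while `executeAction` uses non-null assertions (`target!`, `value!`). If the model omits a field, the browser call receives `undefined`. One way to catch this earlier is a per-action check before execution. The following is a minimal sketch without Zod. `REQUIRED_FIELDS` and `validateAction` are hypothetical names, and the field requirements are read off the `switch` statement above.

```typescript
type ActionType = 'go_to_url' | 'click_element' | 'input_text' | 'scroll' | 'extract_data' | 'done';

interface Action {
  type: ActionType;
  target?: string;
  value?: string;
  reasoning: string;
}

// Which optional fields each action type actually needs at execution time.
const REQUIRED_FIELDS: Record<ActionType, Array<'target' | 'value'>> = {
  go_to_url: ['target'],
  click_element: ['target'],
  input_text: ['target', 'value'],
  scroll: ['target'],
  extract_data: ['target'],
  done: []
};

// Return a list of problems; an empty list means the action can be executed.
function validateAction(action: Action): string[] {
  return REQUIRED_FIELDS[action.type]
    .filter((field) => {
      const fieldValue = action[field];
      if (fieldValue === undefined) return true;
      // An empty `value` is allowed (it clears an input); an empty target is not.
      return field === 'target' && fieldValue.trim() === '';
    })
    .map((field) => `${action.type} requires "${field}"`);
}
```

The returned messages could be fed back to the model in the next prompt, so it can correct the action rather than having the step fail with no explanation.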

Planner Agent Implementation

// packages/extension/src/background/agent/agents/planner.ts
import { BaseAgent } from '../base';
import { z } from 'zod';
import type { AgentResult } from '../types';

const PlannerResponseSchema = z.object({
  done: z.boolean(),
  next_goal: z.string().optional(),
  final_answer: z.string().optional(),
  confidence: z.number().min(0).max(1),
  reasoning: z.string()
});

export class PlannerAgent extends BaseAgent {
  async execute(): Promise<AgentResult> {
    const prompt = this.buildPlannerPrompt();

    const response = await this.generateStructured(
      prompt,
      PlannerResponseSchema
    );

    return {
      done: response.done,
      nextGoal: response.next_goal,
      finalAnswer: response.final_answer,
      confidence: response.confidence
    };
  }

  private buildPlannerPrompt(): string {
    return `
You are a strategic planning agent evaluating task progress.

Original task: ${this.context.task}

Actions taken so far:
${this.formatHistory()}

Current page: ${this.context.browserContext.url}

Evaluate:
1. Is the original task complete? If yes, provide final_answer with the result.
2. If not complete, what should be the next high-level goal?
3. How confident are you? (0.0 - 1.0)

Return JSON:
{
  "done": false,
  "next_goal": "Navigate to the pricing page",
  "confidence": 0.8,
  "reasoning": "Successfully logged in, now need to find pricing information"
}

Or if complete:
{
  "done": true,
  "final_answer": "Found 3 products with prices: Product A ($29), Product B ($49), Product C ($99)",
  "confidence": 0.95,
  "reasoning": "All requested data extracted successfully"
}
    `.trim();
  }
}

Executor: Orchestrating Agents

// packages/extension/src/background/agent/executor.ts
import { NavigatorAgent } from './agents/navigator';
import { PlannerAgent } from './agents/planner';
import { BrowserContext } from '../browser/context';
import type { AgentContext, ExecutionOptions, ExecutionResult } from './types';

export class Executor {
  async execute(task: string, options: ExecutionOptions = {}): Promise<ExecutionResult> {
    // Initialize context
    const browserContext = new BrowserContext();
    const context: AgentContext = {
      task,
      browserContext,
      actionResults: [],
      step: 0,
      done: false,
      maxSteps: options.maxSteps ?? 100,
      planningInterval: options.planningInterval ?? 3
    };

    // Create agents
    const navigator = new NavigatorAgent(context, {
      provider: options.llmProvider ?? 'openai',
      model: options.navigatorModel ?? 'gpt-4o-mini',
      apiKey: options.apiKey
    });

    const planner = new PlannerAgent(context, {
      provider: options.llmProvider ?? 'openai',
      model: options.plannerModel ?? 'gpt-4o',
      apiKey: options.apiKey
    });

    // Execution loop
    while (!context.done && context.step < context.maxSteps) {
      try {
        // Navigator: Execute actions
        const navResult = await navigator.execute();
        context.actionResults.push(...navResult.results);

        // Check if Navigator marked task as done
        if (navResult.done) {
          context.done = true;
          context.finalAnswer = navResult.summary;
          break;
        }

        // Planner: Evaluate progress every N steps
        if (context.step % context.planningInterval === 0) {
          const planResult = await planner.execute();

          if (planResult.done) {
            context.done = true;
            context.finalAnswer = planResult.finalAnswer;
            break;
          }

          // Update goal if Planner suggests new direction
          if (planResult.nextGoal) {
            context.task = planResult.nextGoal;
          }
        }

        context.step++;

      } catch (error) {
        console.error('Execution error:', error);

        // Attempt recovery
        const recovered = await this.attemptRecovery(context, error);
        if (!recovered) {
          throw error;
        }

        // Count the failed attempt so a recoverable error cannot loop forever
        context.step++;
      }
    }

    return {
      success: context.done,
      result: context.finalAnswer,
      steps: context.step,
      actions: context.actionResults
    };
  }

  private async attemptRecovery(
    context: AgentContext,
    error: unknown
  ): Promise<boolean> {
    // Implement retry logic, fallback strategies, etc.
    // For now, just fail
    return false;
  }
}
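
`attemptRecovery` is left as a stub with a note to implement retry logic. A common starting point is retrying the failed operation with exponential backoff. The sketch below is one possible shape, not the project's implementation. `backoffDelay` and `withRetry` are hypothetical helpers, and the 500 ms base and 8 s cap are arbitrary assumptions.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run `operation` up to `maxAttempts` times, sleeping between failed attempts.
// Rethrows the last error if every attempt fails.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // No point sleeping after the final attempt
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt));
      }
    }
  }
  throw lastError;
}
```

In the executor this might wrap a single agent call, for example `await withRetry(() => navigator.execute())`. The `sleep` parameter is injectable so tests do not have to wait for real timers.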

Implementing Browser Automation

Browser Context API

// packages/extension/src/background/browser/context.ts
export class BrowserContext {
  private tabId?: number;

  /**
   * Get current active tab or create new one
   */
  async getOrCreateTab(): Promise<number> {
    if (this.tabId) {
      try {
        await chrome.tabs.get(this.tabId);
        return this.tabId;
      } catch {
        // Tab no longer exists
        this.tabId = undefined;
      }
    }

    const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
    if (tab?.id) {
      this.tabId = tab.id;
      return tab.id;
    }

    // Create new tab
    const newTab = await chrome.tabs.create({ active: true });
    this.tabId = newTab.id!;
    return this.tabId;
  }

  /**
   * Navigate to URL
   */
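  async navigate(url: string): Promise<void> {
    const tabId = await this.getOrCreateTab();

    await chrome.tabs.update(tabId, { url });

    // Wait for page load
    await this.waitForLoad(tabId);

    // Record the final URL (after redirects) so the `url` getter reflects the page
    const tab = await chrome.tabs.get(tabId);
    this.currentUrl = tab.url ?? url;
  }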

  /**
   * Click element
   */
  async click(selector: string): Promise<void> {
    const tabId = await this.getOrCreateTab();

    await chrome.scripting.executeScript({
      target: { tabId },
      func: (sel: string) => {
        const element = document.querySelector(sel);
        if (element instanceof HTMLElement) {
          element.click();
        } else {
          throw new Error(`Element not found: ${sel}`);
        }
      },
      args: [selector]
    });

    // Wait for potential navigation
    await this.delay(500);
  }

  /**
   * Type text into input
   */
  async type(selector: string, text: string): Promise<void> {
    const tabId = await this.getOrCreateTab();

    await chrome.scripting.executeScript({
      target: { tabId },
      func: (sel: string, value: string) => {
        const element = document.querySelector(sel);
        if (element instanceof HTMLInputElement || element instanceof HTMLTextAreaElement) {
          element.value = value;
          element.dispatchEvent(new Event('input', { bubbles: true }));
          element.dispatchEvent(new Event('change', { bubbles: true }));
        } else {
          throw new Error(`Input element not found: ${sel}`);
        }
      },
      args: [selector, text]
    });
  }

  /**
   * Scroll page or to element
   */
  async scroll(target: 'up' | 'down' | string): Promise<void> {
    const tabId = await this.getOrCreateTab();

    if (target === 'up' || target === 'down') {
      const amount = target === 'down' ? 500 : -500;
      await chrome.scripting.executeScript({
        target: { tabId },
        func: (pixels: number) => {
          window.scrollBy({ top: pixels, behavior: 'smooth' });
        },
        args: [amount]
      });
    } else {
      // Scroll to element
      await chrome.scripting.executeScript({
        target: { tabId },
        func: (sel: string) => {
          const element = document.querySelector(sel);
          element?.scrollIntoView({ behavior: 'smooth', block: 'center' });
        },
        args: [target]
      });
    }

    await this.delay(300);
  }

  /**
   * Extract data from page
   */
  async extractData(selector: string): Promise<any[]> {
    const tabId = await this.getOrCreateTab();

    const [result] = await chrome.scripting.executeScript({
      target: { tabId },
      func: (sel: string) => {
        const elements = document.querySelectorAll(sel);
        return Array.from(elements).map(el => ({
          text: el.textContent?.trim(),
          html: el.innerHTML,
          attributes: Object.fromEntries(
            Array.from(el.attributes).map(attr => [attr.name, attr.value])
          )
        }));
      },
      args: [selector]
    });

    return result?.result ?? [];
  }

  /**
   * Get page accessibility tree (simplified)
   */
  async getAccessibility(): Promise<string> {
    const tabId = await this.getOrCreateTab();

    const [result] = await chrome.scripting.executeScript({
      target: { tabId },
      func: () => {
        // Extract semantic structure
        const headings = Array.from(document.querySelectorAll('h1, h2, h3')).map(h => h.textContent?.trim());
        const links = Array.from(document.querySelectorAll('a')).map(a => a.textContent?.trim()).filter(Boolean);
        const buttons = Array.from(document.querySelectorAll('button')).map(b => b.textContent?.trim()).filter(Boolean);
        const inputs = Array.from(document.querySelectorAll('input')).map(i => i.getAttribute('placeholder') || i.getAttribute('name')).filter(Boolean);

        return {
          headings,
          links: links.slice(0, 20),
          buttons: buttons.slice(0, 20),
          inputs: inputs.slice(0, 20)
        };
      }
    });

    return JSON.stringify(result.result, null, 2);
  }

  /**
   * Get current URL
   */
  get url(): string {
    return this.currentUrl ?? '';
  }

  private currentUrl?: string;

  private async waitForLoad(tabId: number): Promise<void> {
    return new Promise((resolve) => {
      // Shared cleanup: stop the timer and the listener, whichever fires first
      const finish = () => {
        clearTimeout(timer);
        chrome.tabs.onUpdated.removeListener(listener);
        resolve();
      };

      const listener = (
        updatedTabId: number,
        changeInfo: chrome.tabs.TabChangeInfo
      ) => {
        if (updatedTabId === tabId && changeInfo.status === 'complete') {
          finish();
        }
      };

      chrome.tabs.onUpdated.addListener(listener);

      // Give up after 30 seconds so a stalled page cannot hang the agent
      const timer = setTimeout(finish, 30000);
    });
  }

  private delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
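
`navigate` hands whatever string the model produced to `chrome.tabs.update`. Since that string is untrusted, it may be worth normalizing it and restricting the allowed schemes first. The following is a minimal sketch, assuming a hypothetical `normalizeUrl` helper and an assumed policy of allowing only `http` and `https`. It relies on the standard `URL` constructor.

```typescript
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

// Turn model output such as "example.com/pricing" into an absolute URL and
// reject anything that does not resolve to http(s), e.g. file: or chrome: URLs.
function normalizeUrl(input: string): string {
  const trimmed = input.trim();

  // Inputs without an explicit "scheme://" prefix are treated as https hosts.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const parsed = new URL(hasScheme ? trimmed : `https://${trimmed}`); // throws on malformed input

  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) {
    throw new Error(`Blocked URL scheme: ${parsed.protocol}`);
  }
  return parsed.href;
}
```

`navigate` could then call `normalizeUrl(url)` before `chrome.tabs.update`. The thrown error would surface through the existing `try/catch` in `executeAction`.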

Adding LLM Integration

Provider Abstraction

// packages/extension/src/background/llm/provider.ts
export interface LLMProvider {
  generate(prompt: string, options: GenerateOptions): Promise<string>;
}

export interface GenerateOptions {
  temperature?: number;
  maxTokens?: number;
  stopSequences?: string[];
}

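// Settings passed in by the agents (fields inferred from BaseAgent's constructor)
export interface ProviderConfig {
  provider: string;
  model: string;
  apiKey: string;
}

export class UniversalLLMProvider implements LLMProvider {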
  constructor(private config: ProviderConfig) {}

  async generate(prompt: string, options: GenerateOptions = {}): Promise<string> {
    switch (this.config.provider) {
      case 'openai':
        return await this.generateOpenAI(prompt, options);
      case 'anthropic':
        return await this.generateAnthropic(prompt, options);
      case 'google':
        return await this.generateGoogle(prompt, options);
      case 'groq':
        return await this.generateGroq(prompt, options);
      case 'ollama':
        return await this.generateOllama(prompt, options);
      default:
        throw new Error(`Unsupported provider: ${this.config.provider}`);
    }
  }

  private async generateOpenAI(prompt: string, options: GenerateOptions): Promise<string> {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${this.config.apiKey}`
      },
      body: JSON.stringify({
        model: this.config.model,
        messages: [{ role: 'user', content: prompt }],
        temperature: options.temperature ?? 0.7,
        max_tokens: options.maxTokens ?? 4096,
        stop: options.stopSequences
      })
    });

    if (!response.ok) {
      throw new Error(`OpenAI API error: ${response.status} ${response.statusText}`);
    }

    const data = await response.json();
    return data.choices[0].message.content;
  }

  private async generateAnthropic(prompt: string, options: GenerateOptions): Promise<string> {
    const response = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
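      headers: {
        'Content-Type': 'application/json',
        'x-api-key': this.config.apiKey,
        'anthropic-version': '2023-06-01',
        // Opt in to direct browser calls; Anthropic otherwise rejects requests that carry a browser Origin
        'anthropic-dangerous-direct-browser-access': 'true'
      },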
      body: JSON.stringify({
        model: this.config.model,
        messages: [{ role: 'user', content: prompt }],
        temperature: options.temperature ?? 0.7,
        max_tokens: options.maxTokens ?? 4096
      })
    });

    if (!response.ok) {
      throw new Error(`Anthropic API error: ${response.status}`);
    }

    const data = await response.json();
    return data.content[0].text;
  }

  private async generateGoogle(prompt: string, options: GenerateOptions): Promise<string> {
    const url = `https://generativelanguage.googleapis.com/v1beta/models/${this.config.model}:generateContent?key=${this.config.apiKey}`;

    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        contents: [{ parts: [{ text: prompt }] }],
        generationConfig: {
          temperature: options.temperature ?? 0.7,
          maxOutputTokens: options.maxTokens ?? 4096
        }
      })
    });

    if (!response.ok) {
      throw new Error(`Google API error: ${response.status}`);
    }

    const data = await response.json();
    return data.candidates[0].content.parts[0].text;
  }

  // generateGroq and generateOllama follow the same pattern; both must be
  // implemented before this class compiles.
}
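
The provider class leaves the local-model path unimplemented. As a hedged illustration, here is how an Ollama call could look as standalone functions. It assumes a default Ollama server at `http://localhost:11434` and uses its `/api/generate` endpoint with `stream: false`. The names `buildOllamaRequest` and `generateWithOllama` are hypothetical. Ollama may also need its `OLLAMA_ORIGINS` setting adjusted before it accepts requests from an extension origin.

```typescript
interface OllamaGenerateOptions {
  temperature?: number;
  maxTokens?: number;
  stopSequences?: string[];
}

// Build the JSON body for Ollama's POST /api/generate endpoint.
function buildOllamaRequest(model: string, prompt: string, options: OllamaGenerateOptions = {}) {
  return {
    model,
    prompt,
    stream: false, // ask for a single JSON object instead of streamed chunks
    options: {
      temperature: options.temperature ?? 0.7,
      num_predict: options.maxTokens ?? 4096, // Ollama's name for the max token count
      ...(options.stopSequences ? { stop: options.stopSequences } : {})
    }
  };
}

// Send the prompt to a local Ollama server and return the generated text.
async function generateWithOllama(
  model: string,
  prompt: string,
  options: OllamaGenerateOptions = {},
  baseUrl = 'http://localhost:11434' // assumption: default local Ollama address
): Promise<string> {
  const response = await fetch(`${baseUrl}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildOllamaRequest(model, prompt, options))
  });

  if (!response.ok) {
    throw new Error(`Ollama API error: ${response.status}`);
  }

  const data = (await response.json()) as { response?: unknown };
  if (typeof data.response !== 'string') {
    throw new Error('Ollama response did not contain generated text');
  }
  return data.response;
}
```

Inside `UniversalLLMProvider`, the `generateOllama` method could delegate to a function like this, passing `this.config.model` as the model name.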

Building the User Interface

Side Panel with React

// packages/extension/src/side-panel/App.tsx
import React, { useState } from 'react';
import { ChatInterface } from './components/ChatInterface';
import { SettingsPanel } from './components/SettingsPanel';

export function App() {
  const [view, setView] = useState<'chat' | 'settings'>('chat');

  return (
    <div className="h-screen flex flex-col bg-gray-50">
      {/* Header */}
      <header className="bg-white border-b px-4 py-3 flex justify-between items-center">
        <h1 className="text-lg font-semibold">BrowserGPT</h1>
        <button
          onClick={() => setView(view === 'chat' ? 'settings' : 'chat')}
          className="text-sm text-gray-600 hover:text-gray-900"
        >
          {view === 'chat' ? 'Settings' : 'Chat'}
        </button>
      </header>

      {/* Main content */}
      <main className="flex-1 overflow-hidden">
        {view === 'chat' ? <ChatInterface /> : <SettingsPanel />}
      </main>
    </div>
  );
}

Chat Interface Component

// packages/extension/src/side-panel/components/ChatInterface.tsx
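import React, { useState, useRef, useEffect } from 'react';
import { sendMessage } from '../lib/messaging';
import { LoadingSpinner } from './LoadingSpinner'; // assumed: a small local spinner component

// Shape of one chat entry, inferred from how messages are used below
interface Message {
  id: string;
  role: 'user' | 'assistant' | 'error';
  content: string;
  timestamp: Date;
}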

export function ChatInterface() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [isExecuting, setIsExecuting] = useState(false);
  const messagesEndRef = useRef<HTMLDivElement>(null);

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();

    if (!input.trim() || isExecuting) return;

    const userMessage: Message = {
      id: Date.now().toString(),
      role: 'user',
      content: input,
      timestamp: new Date()
    };

    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setIsExecuting(true);

    try {
      // Send task to background script
      const response = await sendMessage('execute_task', { task: input });

      const assistantMessage: Message = {
        id: (Date.now() + 1).toString(),
        role: 'assistant',
        content: response.data.result ?? 'The task ended without a final answer.',
        timestamp: new Date()
      };

      setMessages(prev => [...prev, assistantMessage]);

    } catch (error) {
      const errorMessage: Message = {
        id: (Date.now() + 1).toString(),
        role: 'error',
        content: error instanceof Error ? error.message : 'Unknown error',
        timestamp: new Date()
      };

      setMessages(prev => [...prev, errorMessage]);
    } finally {
      setIsExecuting(false);
    }
  };

  // Auto-scroll to bottom
  useEffect(() => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  return (
    <div className="h-full flex flex-col">
      {/* Messages */}
      <div className="flex-1 overflow-y-auto p-4 space-y-4">
        {messages.length === 0 && (
          <div className="text-center text-gray-500 mt-8">
            <p>Start by describing what you want to automate</p>
            <p className="text-sm mt-2">Example: "Find product prices on Amazon"</p>
          </div>
        )}

        {messages.map((message) => (
          <MessageBubble key={message.id} message={message} />
        ))}

        {isExecuting && (
          <div className="flex items-center gap-2 text-sm text-gray-600">
            <LoadingSpinner />
            <span>Executing...</span>
          </div>
        )}

        <div ref={messagesEndRef} />
      </div>

      {/* Input */}
      <form onSubmit={handleSubmit} className="border-t p-4 bg-white">
        <div className="flex gap-2">
          <input
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            placeholder="What do you want to automate?"
            className="flex-1 px-4 py-2 border rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
            disabled={isExecuting}
          />
          <button
            type="submit"
            disabled={!input.trim() || isExecuting}
            className="px-6 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed"
          >
            Send
          </button>
        </div>
      </form>
    </div>
  );
}

function MessageBubble({ message }: { message: Message }) {
  const isUser = message.role === 'user';
  const isError = message.role === 'error';

  return (
    <div className={`flex ${isUser ? 'justify-end' : 'justify-start'}`}>
      <div
        className={`max-w-[80%] px-4 py-2 rounded-lg ${
          isUser
            ? 'bg-blue-600 text-white'
            : isError
            ? 'bg-red-100 text-red-900'
            : 'bg-white border'
        }`}
      >
        <p className="text-sm whitespace-pre-wrap">{message.content}</p>
        <p className="text-xs mt-1 opacity-70">
          {message.timestamp.toLocaleTimeString()}
        </p>
      </div>
    </div>
  );
}

Messaging Utility

// packages/extension/src/side-panel/lib/messaging.ts
export async function sendMessage(type: string, payload: any): Promise<any> {
  return new Promise((resolve, reject) => {
    chrome.runtime.sendMessage(
      { type, payload },
      (response) => {
        if (chrome.runtime.lastError) {
          reject(new Error(chrome.runtime.lastError.message));
          return;
        }

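        // response is undefined when no listener replied, so guard the access
        if (response?.success) {
          resolve(response);
        } else {
          reject(new Error(response?.error ?? 'No response from background'));
        }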
      }
    );
  });
}
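
The sendMessage helper above expects every reply to carry a success flag and, on failure, an error string. A minimal sketch of how the receiving side could guarantee that shape follows. The names MessageResponse, Handler, and wrapHandler are hypothetical and not part of the original code or the Chrome API; the sketch assumes handlers signal failure by throwing.

```typescript
// Hypothetical response envelope matching what sendMessage() checks for.
type MessageResponse<T> =
  | { success: true; data: T }
  | { success: false; error: string };

type Handler<P, T> = (payload: P) => Promise<T> | T;

// Wraps a handler so that thrown errors become { success: false, error }
// instead of leaving the caller without a usable response.
function wrapHandler<P, T>(handler: Handler<P, T>) {
  return async (payload: P): Promise<MessageResponse<T>> => {
    try {
      const data = await handler(payload);
      return { success: true, data };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { success: false, error };
    }
  };
}

// Example handlers (illustrative only).
const echo = wrapHandler(async (payload: { text: string }) => payload.text.toUpperCase());
const failing = wrapHandler(async (_payload: unknown): Promise<string> => {
  throw new Error('boom');
});
```

In a service worker, the wrapped handler's result would be passed to sendResponse inside a chrome.runtime.onMessage listener, and that listener would return true to keep the channel open for the asynchronous reply.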

Handling Errors and Edge Cases

Error Recovery System

// packages/extension/src/background/errors/recovery.ts
export class ErrorRecoverySystem {
  async attemptRecovery(error: Error, context: AgentContext): Promise<boolean> {
    // Categorize error
    const category = this.categorizeError(error);

    switch (category) {
      case 'network':
        return await this.recoverFromNetwork(error, context);

      case 'element_not_found':
        return await this.recoverFromMissingElement(error, context);

      case 'permission_denied':
        return await this.recoverFromPermission(error, context);

      case 'rate_limit':
        return await this.recoverFromRateLimit(error, context);

      default:
        return false;
    }
  }

  private categorizeError(error: Error): string {
    // Compare in lowercase so messages like "NetworkError" or "Not Found" still match
    const message = error.message.toLowerCase();

    if (message.includes('fetch') || message.includes('network')) {
      return 'network';
    }

    if (message.includes('not found') || message.includes('queryselector')) {
      return 'element_not_found';
    }

    if (message.includes('permission') || message.includes('denied')) {
      return 'permission_denied';
    }

    if (message.includes('rate limit') || message.includes('429')) {
      return 'rate_limit';
    }

    return 'unknown';
  }

  private async recoverFromNetwork(error: Error, context: AgentContext): Promise<boolean> {
    // Retry with exponential backoff
    for (let attempt = 1; attempt <= 3; attempt++) {
      await this.delay(Math.pow(2, attempt) * 1000);

      try {
        // Retry last action
        await context.browserContext.refresh();
        return true;
      } catch {
        continue;
      }
    }

    return false;
  }

  private async recoverFromMissingElement(error: Error, context: AgentContext): Promise<boolean> {
    // Try alternative selectors or ask Navigator to find element differently
    // For now, just wait and retry
    await this.delay(2000);

    try {
      await context.browserContext.refresh();
      return true;
    } catch {
      return false;
    }
  }

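  private async recoverFromPermission(error: Error, context: AgentContext): Promise<boolean> {
    // Permission errors need user action (for example, granting host access),
    // so report them as unrecoverable instead of retrying.
    return false;
  }

  private async recoverFromRateLimit(error: Error, context: AgentContext): Promise<boolean> {
    // Wait before retrying
    await this.delay(60000); // 1 minute
    return true;
  }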

  private delay(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
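
The backoff loop in recoverFromNetwork can be pulled out into a reusable helper. The sketch below is one possible shape, not the project's implementation: retryWithBackoff and RetryOptions are hypothetical names, and the sleep function is injectable so the delays can be checked in tests without waiting. Unlike the loop above, it retries the failed action itself and does not wait before the first attempt.

```typescript
interface RetryOptions {
  maxAttempts: number;                   // total tries, including the first
  baseDelayMs: number;                   // delay before the second try
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function retryWithBackoff<T>(
  action: () => Promise<T>,
  { maxAttempts, baseDelayMs, sleep = defaultSleep }: RetryOptions
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Delay doubles each time: base, 2 * base, 4 * base, ...
        await sleep(baseDelayMs * Math.pow(2, attempt - 1));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

function defaultSleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```

With maxAttempts set to 3 and baseDelayMs set to 1000, a persistently failing action is tried three times with waits of 1 second and 2 seconds in between.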

Testing and Debugging

Unit Tests with Vitest

// packages/extension/src/background/agent/__tests__/navigator.test.ts
import { describe, it, expect, vi } from 'vitest';
import { NavigatorAgent } from '../agents/navigator';

describe('NavigatorAgent', () => {
  it('generates valid actions from prompt', async () => {
    const mockContext = createMockContext();
    const navigator = new NavigatorAgent(mockContext, testConfig);

    const result = await navigator.execute();

    expect(result).toHaveProperty('actions');
    expect(result.actions).toBeInstanceOf(Array);
    expect(result.actions.length).toBeGreaterThan(0);
    expect(result.actions.length).toBeLessThanOrEqual(10);
  });

  it('handles click actions correctly', async () => {
    const mockContext = createMockContext();
    const navigator = new NavigatorAgent(mockContext, testConfig);

    // Pass undefined explicitly: mockResolvedValue expects a value argument
    vi.spyOn(mockContext.browserContext, 'click').mockResolvedValue(undefined);

    await navigator.executeAction({
      type: 'click_element',
      target: '#button',
      reasoning: 'Test click'
    });

    expect(mockContext.browserContext.click).toHaveBeenCalledWith('#button');
  });
});

function createMockContext(): AgentContext {
  return {
    task: 'Test task',
    browserContext: {
      navigate: vi.fn(),
      click: vi.fn(),
      type: vi.fn(),
      scroll: vi.fn(),
      extractData: vi.fn(),
      getAccessibility: vi.fn(() => Promise.resolve('mock accessibility tree')),
      url: 'https://example.com'
    },
    actionResults: [],
    step: 0,
    done: false,
    maxSteps: 100,
    planningInterval: 3
  };
}

E2E Tests with Playwright

// e2e/basic-automation.spec.ts
import { test, expect } from '@playwright/test';
import path from 'path';

test.describe('BrowserGPT Extension', () => {
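  // Browser launch arguments belong under launchOptions. Playwright's docs
  // load extensions through chromium.launchPersistentContext, so a custom
  // context fixture may be needed in addition to these flags.
  test.use({
    headless: false,
    launchOptions: {
      args: [
        `--disable-extensions-except=${path.join(__dirname, '../dist')}`,
        `--load-extension=${path.join(__dirname, '../dist')}`
      ]
    }
  });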

  test('successfully automates Google search', async ({ page, context }) => {
    // Open side panel
    const [sidePanelPage] = await Promise.all([
      context.waitForEvent('page'),
      page.click('[data-testid="extension-action"]')
    ]);

    // Enter task
    await sidePanelPage.fill('[data-testid="task-input"]', 'Search for "web automation" on Google');
    await sidePanelPage.click('[data-testid="submit-button"]');

    // Wait for execution
    await sidePanelPage.waitForSelector('[data-testid="result-message"]', { timeout: 30000 });

    // Verify result
    const result = await sidePanelPage.textContent('[data-testid="result-message"]');
    expect(result).toContain('Found search results');
  });
});

Deployment and Distribution

Build for Production

# Build all packages
pnpm build

# Create distributable zip
pnpm zip

# Output: dist-zip/browsergpt-1.0.0.zip

Chrome Web Store Preparation

// store-assets/manifest-overrides.json
{
  "name": "BrowserGPT - AI Browser Automation",
  "description": "Automate browser tasks with natural language. Powered by AI, privacy-first, open source.",
  "icons": {
    "16": "icons/icon-16.png",
    "48": "icons/icon-48.png",
    "128": "icons/icon-128.png"
  },
  "permissions": [
    "activeTab",
    "scripting",
    "storage",
    "tabs",
    "sidePanel"
  ],
  "host_permissions": [
    "<all_urls>"
  ]
}

Required assets:

  • Icon (128x128, 48x48, 16x16)
  • Screenshots (1280x800, minimum 1)
  • Promotional image (440x280)
  • Privacy policy URL
  • Support URL

Publish to Chrome Web Store

  1. Create developer account ($5 one-time fee)
  2. Upload zip: dist-zip/browsergpt-1.0.0.zip
  3. Fill metadata:
    • Name, description, category
    • Screenshots, promotional images
    • Privacy policy, support URL
  4. Submit for review (typically 1-3 days)
  5. Publish once approved

Self-Hosting for Enterprises

# Package as self-hosted extension
pnpm build
zip -r browsergpt-enterprise.zip dist/

# Documentation for loading
cat > INSTALL.md <<'EOF'
# BrowserGPT Enterprise Installation

1. Download browsergpt-enterprise.zip
2. Extract to a directory
3. Open chrome://extensions/
4. Enable "Developer mode"
5. Click "Load unpacked"
6. Select extracted directory
7. Configure LLM endpoint in settings

For self-hosted LLM:
- Set provider to "custom"
- Enter your internal LLM API endpoint
- Provide authentication credentials

EOF

Advanced Features

Vision Support

// Add screenshot analysis to Navigator
class NavigatorAgent extends BaseAgent {
  async executeWithVision(): Promise<AgentResult> {
    // Capture screenshot
    const screenshot = await this.context.browserContext.captureScreenshot();

    // Send to vision-capable LLM
    const prompt = `
Analyze this screenshot and determine next actions for: ${this.context.task}

[Image attached]

Return JSON with actions.
    `;

    const response = await this.llm.generateWithVision(prompt, screenshot);

    // Execute actions
    return await this.executeActions(response.actions);
  }
}

Parallel Execution

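// Execute across multiple tabs simultaneously
class ParallelExecutor {
  // The context factory is injected here (an assumption): it is expected to
  // open a new tab and wrap it in an AgentContext.
  constructor(private createContext: () => Promise<AgentContext>) {}

  async executeParallel(tasks: string[]): Promise<ExecutionResult[]> {
    // Promise.all rejects as soon as any single task fails
    return await Promise.all(
      tasks.map(async (task) => {
        const context = await this.createContext();
        const executor = new Executor();
        return await executor.execute(task, context);
      })
    );
  }
}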
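
Promise.all starts every task at once and rejects on the first failure, which can open many tabs and discard results that did succeed. A minimal sketch of a bounded alternative follows. The names runWithLimit and Settled are hypothetical; the worker stands in for whatever runs a single task.

```typescript
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

// Runs worker over items with at most `limit` calls in flight,
// collecting one outcome per item instead of failing fast.
async function runWithLimit<I, O>(
  items: I[],
  limit: number,
  worker: (item: I) => Promise<O>
): Promise<Settled<O>[]> {
  const results: Settled<O>[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index]) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Results come back in the same order as the input, so a caller can pair each outcome with its task and retry only the rejected ones.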

Scheduling and Automation

// Add scheduling capabilities
interface ScheduledTask {
  id: string;
  task: string;
  schedule: string; // cron format
  enabled: boolean;
}

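class Scheduler {
  private tasks = new Map<string, ScheduledTask>();

  constructor() {
    // Register one listener for all tasks; adding a listener inside
    // scheduleTask would stack a duplicate handler on every call.
    chrome.alarms.onAlarm.addListener(async (alarm) => {
      const task = this.tasks.get(alarm.name);
      if (task?.enabled) {
        const executor = new Executor();
        await executor.execute(task.task);
      }
    });
  }

  async scheduleTask(task: ScheduledTask) {
    this.tasks.set(task.id, task);

    // Use chrome.alarms API for scheduling (needs the "alarms" permission).
    // parseSchedule is not shown here; it is assumed to turn the cron
    // string into the next run time in milliseconds since the epoch.
    chrome.alarms.create(task.id, {
      when: this.parseSchedule(task.schedule)
    });
  }
}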
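
The schedule field is described as cron format, but the snippet does not define parseSchedule. As a rough sketch, the function below maps a small cron subset onto the when and periodInMinutes fields that chrome.alarms.create accepts. The name cronToAlarmInfo and the local AlarmInfo interface are hypothetical, and only two expression shapes are handled; a full cron parser would need a dedicated library.

```typescript
// Mirrors the two alarm fields used here.
interface AlarmInfo {
  when?: number;            // absolute time in ms since the epoch
  periodInMinutes?: number; // repeat interval
}

// Supports only two cron shapes (an assumption for illustration):
//   "*/N * * * *"  -> every N minutes
//   "M H * * *"    -> daily at H:M local time
function cronToAlarmInfo(expr: string, now: Date = new Date()): AlarmInfo {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${parts.length}`);
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parts;
  if (dayOfMonth !== '*' || month !== '*' || dayOfWeek !== '*') {
    throw new Error('Only "*" is supported for the last three fields');
  }

  const everyN = /^\*\/(\d+)$/.exec(minute);
  if (everyN && hour === '*') {
    const n = Number(everyN[1]);
    if (n < 1) throw new Error('Interval must be at least 1 minute');
    return { periodInMinutes: n };
  }

  if (/^\d+$/.test(minute) && /^\d+$/.test(hour)) {
    const m = Number(minute);
    const h = Number(hour);
    if (m > 59 || h > 23) throw new Error('Minute or hour out of range');
    const next = new Date(now);
    next.setHours(h, m, 0, 0);
    // If today's slot has already passed, move to tomorrow.
    if (next.getTime() <= now.getTime()) next.setDate(next.getDate() + 1);
    return { when: next.getTime(), periodInMinutes: 24 * 60 };
  }

  throw new Error(`Unsupported cron expression: ${expr}`);
}
```

The returned object could be passed as the second argument to chrome.alarms.create. A fixed 24-hour period drifts by an hour across daylight saving changes, so a daily task may be better rescheduled after each run.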

Production Optimization

Performance

// Add caching to reduce LLM calls
class CachedLLMProvider implements LLMProvider {
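  private cache = new Map<string, string>();

  // Wrap the underlying provider whose responses are cached
  constructor(private provider: LLMProvider) {}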

  async generate(prompt: string, options: GenerateOptions): Promise<string> {
    const cacheKey = `${prompt}-${JSON.stringify(options)}`;

    if (this.cache.has(cacheKey)) {
      return this.cache.get(cacheKey)!;
    }

    const result = await this.provider.generate(prompt, options);
    this.cache.set(cacheKey, result);

    return result;
  }
}
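
The Map in CachedLLMProvider grows without limit and never expires entries, which matters in a long-lived extension where page content changes. One way to bound it is a small cache with a size cap and a time-to-live, sketched below. BoundedCache is a hypothetical class, and the injectable clock exists only to make expiry testable.

```typescript
class BoundedCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries: number,
    private ttlMs: number,
    private now: () => number = Date.now // injectable clock for tests
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: drop it and report a miss.
      this.entries.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A provider wrapper could hold a BoundedCache of strings in place of the plain Map and keep the same cache-key logic.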

Cost Optimization

// Use cheaper models for simple tasks
class SmartModelSelector {
  selectModel(task: string, complexity: number): string {
    if (complexity < 0.3) {
      return 'gpt-4o-mini'; // Cheap, fast
    } else if (complexity < 0.7) {
      return 'gpt-4o'; // Balanced
    } else {
      return 'claude-sonnet-4'; // Premium reasoning
    }
  }

  estimateComplexity(task: string): number {
    // Heuristics:
    // - Multi-step tasks = high complexity
    // - Simple extraction = low complexity
    const keywords = ['compare', 'analyze', 'research', 'multiple'];
    const score = keywords.filter(k => task.toLowerCase().includes(k)).length / keywords.length;
    return score;
  }
}

Monitoring

// Add telemetry
class Telemetry {
  trackExecution(result: ExecutionResult) {
    chrome.storage.local.get('metrics', (data) => {
      const metrics = data.metrics || {
        totalExecutions: 0,
        successfulExecutions: 0,
        failedExecutions: 0,
        averageSteps: 0,
        totalCost: 0
      };

      metrics.totalExecutions++;
      if (result.success) {
        metrics.successfulExecutions++;
      } else {
        metrics.failedExecutions++;
      }

      metrics.averageSteps =
        (metrics.averageSteps * (metrics.totalExecutions - 1) + result.steps) /
        metrics.totalExecutions;

      chrome.storage.local.set({ metrics });
    });
  }
}
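
The running-average update inside trackExecution is easier to check when separated from storage. The sketch below keeps the counter names from the snippet above but moves the arithmetic into a pure function; updateMetrics, Metrics, and ExecutionSummary are hypothetical names, and ExecutionSummary assumes only the success and steps fields that the original code reads.

```typescript
interface Metrics {
  totalExecutions: number;
  successfulExecutions: number;
  failedExecutions: number;
  averageSteps: number;
}

interface ExecutionSummary {
  success: boolean;
  steps: number;
}

const emptyMetrics: Metrics = {
  totalExecutions: 0,
  successfulExecutions: 0,
  failedExecutions: 0,
  averageSteps: 0
};

// Pure update: returns a new object and leaves the input untouched.
function updateMetrics(prev: Metrics, result: ExecutionSummary): Metrics {
  const total = prev.totalExecutions + 1;
  return {
    totalExecutions: total,
    successfulExecutions: prev.successfulExecutions + (result.success ? 1 : 0),
    failedExecutions: prev.failedExecutions + (result.success ? 0 : 1),
    // Incremental mean: avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n
    averageSteps: prev.averageSteps + (result.steps - prev.averageSteps) / total
  };
}
```

For example, a successful run of 4 steps followed by a failed run of 8 steps gives an average of 4 and then 4 + (8 - 4) / 2 = 6, the same value the formula in trackExecution produces.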

Conclusion

You've just built a production-ready, open-source ChatGPT alternative that actually controls browsers.

What you created:

  ✅ Chrome extension with Manifest V3
  ✅ Multi-agent AI system (Navigator + Planner)
  ✅ Browser automation with Chrome APIs
  ✅ LLM integration (multiple providers)
  ✅ React-based user interface
  ✅ Error handling and recovery
  ✅ Testing infrastructure
  ✅ Distribution-ready package

Next steps:

  1. Customize for your specific use cases
  2. Add domain-specific agents (e.g., e-commerce, testing, data extraction)
  3. Extend with additional LLM providers
  4. Deploy to Chrome Web Store or self-host
  5. Contribute to the open-source community

Try it now: Install Onpiste - the production version of what you just built.



Ready to build your own? Clone the starter template and start coding.
