How We Built an AI Agent That Actually Completes Web Tasks (Not Just Clicks Buttons)
Keywords: AI agent development, intelligent automation, web task completion, AI agent architecture, task-oriented AI, production AI agents
Most "AI automation" is just fancy button-clicking. You give it a script, it follows it blindly, and it breaks the moment something changes.
That's not intelligence. That's glorified macro recording.
Real AI agents:
- Understand the goal, not just the steps
- Adapt when things go wrong
- Handle unexpected situations
- Know when they've succeeded (or failed)
- Learn from context
We spent 18 months building such an agent. It went from "clicks buttons 60% of the time" to "completes complex tasks 94% of the time."
This article shares everything we learned—the architecture, the failures, the breakthroughs, and the code.
Table of Contents
- The Button-Clicking Problem
- What "Task Completion" Really Means
- Architecture: From Execution to Understanding
- The Three-Agent System
- How Our Agent Plans Tasks
- Execution: Beyond Fixed Scripts
- The Validation Challenge
- Error Recovery That Actually Works
- Real-World Task Examples
- Measuring True Task Completion
- Lessons from Production
- Open Source Implementation
Reading Time: ~25 minutes | Difficulty: Advanced | Last Updated: January 19, 2026
The Button-Clicking Problem
Early AI automation attempts failed because they confused execution with completion.
Traditional Automation (Button-Clicking)
// Traditional approach: Fixed script
async function automateCheckout() {
await page.click('#add-to-cart');
await page.click('.proceed-to-checkout');
await page.fill('#email', 'user@example.com');
await page.fill('#card-number', '4242424242424242');
await page.click('#place-order');
// Assumes success if no errors thrown
return { success: true };
}
Problems:
- No goal understanding: Doesn't know why it's clicking
- No validation: Assumes clicking = success
- No error handling: Breaks on any unexpected state
- No adaptation: Can't handle layout changes
Success rate: 60% (best case)
Our First Attempt (Still Button-Clicking)
// Our naive LLM attempt
async function automateWithLLM(task: string) {
const actions = await llm.generate(`
Convert this task into actions: ${task}
`);
// LLM returns: ["click #add-to-cart", "click .checkout", ...]
for (const action of actions) {
await executeAction(action);
}
return { done: true }; // Wishful thinking
}
Problems:
- LLM generates better actions... but still just clicking
- No validation of outcomes
- No understanding of task completion
- No recovery when things go wrong
Success rate: 65% (marginal improvement)
The insight: We needed the agent to understand the goal, not just execute steps.
What "Task Completion" Really Means
Before building, we had to define success.
Task Completion Criteria
Not enough: "Executed all steps"
Task: "Buy product X"
Execution: Clicked buttons, filled forms
Outcome: Payment failed, no purchase
Agent: "Task complete! ✓"
❌ This is not task completion
Real completion: "Achieved the goal"
Task: "Buy product X"
Execution: Multiple attempts, handled errors
Validation: Order placed, confirmation received, payment processed
Outcome: Product purchased
Agent: "Task complete! ✓"
✅ This is task completion
Our Task Completion Definition
A task is complete when ALL of the following hold:
1. Primary goal achieved: the product is purchased, the data is extracted, the form is submitted
2. Observable evidence: a confirmation page, a success message, or the data in hand
3. Side effects verified: email received, database updated, item in cart
4. No errors or warnings: payment succeeded, no validation errors
Until all four are true, the task is not complete.
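The four criteria compose into a single gate. A minimal sketch (the `CompletionEvidence` shape and field names are illustrative, not our actual types):

```typescript
// Illustrative evidence record; the real agent gathers these from the page and backend.
interface CompletionEvidence {
  primaryGoalAchieved: boolean; // e.g. order placed
  observableEvidence: boolean;  // e.g. confirmation page visible
  sideEffectsVerified: boolean; // e.g. confirmation email received
  noErrorsOrWarnings: boolean;  // e.g. no payment or validation errors
}

// A task is complete only when ALL four criteria hold.
function isTaskComplete(e: CompletionEvidence): boolean {
  return (
    e.primaryGoalAchieved &&
    e.observableEvidence &&
    e.sideEffectsVerified &&
    e.noErrorsOrWarnings
  );
}

// Example: every step executed, but payment failed -> not complete.
const failedPayment: CompletionEvidence = {
  primaryGoalAchieved: false,
  observableEvidence: true,
  sideEffectsVerified: false,
  noErrorsOrWarnings: false,
};
console.log(isTaskComplete(failedPayment)); // false
```

The point of making this a conjunction, rather than a score, is that a missing side effect vetoes completion no matter how good the rest of the evidence looks.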
Architecture: From Execution to Understanding
Traditional automation: Single agent executes scripts
Our architecture: Multiple specialized agents collaborate
Single-Agent Architecture (Fails)
┌────────────────────────┐
│ Monolithic Agent │
│ │
│ - Planning │
│ - Execution │
│ - Validation │
│ - Error handling │
│ - Everything else │
└────────────────────────┘
Problem: Agent tries to do everything
Result: Does nothing well
Success rate: 65%
Multi-Agent Architecture (Works)
┌─────────────────┐
│ Planner Agent │ ← Strategy & goal understanding
└────────┬────────┘
↓
┌─────────────────┐
│ Navigator Agent │ ← Execution & browser actions
└────────┬────────┘
↓
┌─────────────────┐
│ Validator Agent │ ← Outcome verification
└────────┬────────┘
↓
┌─────────────────┐
│ Orchestrator │ ← Coordination & recovery
└─────────────────┘
Specialization: Each agent masters one thing
Coordination: Agents collaborate toward goal
Result: 94% task completion rate
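The flow above can be sketched as a coordination loop. Stub agents stand in for the LLM-backed ones here, and the interfaces (`Agents`, `runTask`) are simplified assumptions rather than our production API:

```typescript
// Simplified agent interfaces (illustrative only).
interface Plan { goal: string }
interface StepResult { success: boolean }

interface Agents {
  planner: { createPlan(task: string): Plan };
  navigator: { execute(plan: Plan): StepResult };
  validator: { validate(plan: Plan, result: StepResult): boolean };
}

// Orchestrator: plan -> execute -> validate, retrying on failure.
function runTask(task: string, agents: Agents, maxAttempts = 3): boolean {
  const plan = agents.planner.createPlan(task);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = agents.navigator.execute(plan);
    if (result.success && agents.validator.validate(plan, result)) {
      return true; // goal achieved AND independently verified
    }
    // In the real system: error analysis + recovery strategy here.
  }
  return false;
}

// Stub agents: navigation fails once, then succeeds on retry.
let calls = 0;
const agents: Agents = {
  planner: { createPlan: (task) => ({ goal: task }) },
  navigator: { execute: () => ({ success: ++calls >= 2 }) },
  validator: { validate: (_plan, result) => result.success },
};
console.log(runTask('buy product X', agents)); // true (succeeded on attempt 2)
```

Note that success requires the validator's sign-off, not just a clean navigator run; that separation is what makes the loop more than retried button-clicking.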
The Three-Agent System
1. Planner Agent: The Strategist
Responsibility: Understand goals and create strategies
NOT this:
Task: "Buy product X"
Plan: [
"Click add to cart",
"Click checkout",
"Enter payment"
]
❌ Too specific, brittle
But this:
Task: "Buy product X"
Plan: {
goal: "Purchase product X",
strategy: "Navigate to product → Add to cart → Complete checkout",
success_criteria: [
"Order confirmation page visible",
"Confirmation email received",
"Product in order history"
],
potential_obstacles: [
"Product out of stock",
"Payment declined",
"Login required"
],
fallback_strategies: [...]
}
✅ Goal-oriented, adaptive
Implementation:
class PlannerAgent {
async createPlan(task: string, context: Context): Promise<Plan> {
const prompt = `
You are a strategic planning agent. Analyze this task and create a HIGH-LEVEL strategy.
Task: ${task}
Current state:
- URL: ${context.url}
- Page type: ${context.pageType}
- User logged in: ${context.isAuthenticated}
Create a plan that includes:
1. Primary goal (what success looks like)
2. High-level strategy (approach, not specific clicks)
3. Success criteria (how to verify completion)
4. Potential obstacles (what might go wrong)
5. Checkpoints (progress validation points)
Return JSON.
`;
const response = await this.llm.generate(prompt);
return this.parsePlan(response);
}
async evaluateProgress(context: Context): Promise<Evaluation> {
const prompt = `
Original goal: ${context.plan.goal}
Actions taken: ${context.actionHistory}
Current state: ${context.currentState}
Questions:
1. Are we making progress toward the goal?
2. Is the task complete?
3. If not complete, what should we do next?
4. If stuck, what recovery strategy should we try?
Return JSON with done (boolean) and next_goal (string).
`;
const response = await this.llm.generate(prompt);
return this.parseEvaluation(response);
}
}
Key insight: Planner thinks in goals, not actions.
2. Navigator Agent: The Executor
Responsibility: Execute browser actions intelligently
NOT this:
action: "click_element"
selector: "#specific-button-id"
❌ Breaks when ID changes
But this:
action: "find_and_click"
intent: "Proceed to checkout"
fallback_selectors: [
"button containing 'checkout'",
"link to '/checkout'",
"element with cart icon + checkout text"
]
validation: "URL changes to /checkout OR modal appears"
✅ Intent-based, resilient
Implementation:
class NavigatorAgent {
async execute(step: Step, context: Context): Promise<Result> {
// Generate actions based on intent, not fixed selectors
const actions = await this.generateActions(step.intent, context);
const results = [];
for (const action of actions) {
try {
const result = await this.performAction(action);
// Validate action succeeded
if (action.validation) {
const valid = await this.validateAction(action, result);
if (!valid) {
// Action executed but didn't achieve intent
return this.retry(action, context);
}
}
results.push(result);
} catch (error) {
// Handle failure with recovery
const recovered = await this.attemptRecovery(action, error);
if (!recovered) {
return { success: false, error, results };
}
}
}
return { success: true, results };
}
private async generateActions(intent: string, context: Context): Promise<Action[]> {
const prompt = `
Intent: ${intent}
Page: ${context.accessibility}
Generate 1-10 specific actions to achieve this intent.
Adapt to the actual page structure.
Include validation for each action.
Return JSON array of actions.
`;
return await this.llm.generate(prompt);
}
private async validateAction(action: Action, result: Result): Promise<boolean> {
// Check if action achieved its intent
if (action.validation.type === 'url_change') {
return result.urlAfter !== result.urlBefore;
}
if (action.validation.type === 'element_appears') {
return await this.elementExists(action.validation.selector);
}
if (action.validation.type === 'content_change') {
return result.contentAfter !== result.contentBefore;
}
return true;
}
}
Key insight: Navigator validates that actions achieved their intent, not just that they executed.
3. Validator Agent: The Quality Check
Responsibility: Verify task completion
Implementation:
class ValidatorAgent {
async validateCompletion(plan: Plan, context: Context): Promise<ValidationResult> {
const checks = await Promise.all([
this.checkPrimaryGoal(plan.goal, context),
this.checkSuccessCriteria(plan.success_criteria, context),
this.checkSideEffects(plan.expected_side_effects, context),
this.checkNoErrors(context)
]);
const allPassed = checks.every(check => check.passed);
return {
complete: allPassed,
checks,
confidence: this.calculateConfidence(checks),
evidence: this.gatherEvidence(checks)
};
}
private async checkPrimaryGoal(goal: string, context: Context): Promise<Check> {
const prompt = `
Goal: ${goal}
Current page: ${context.url}
Page content: ${context.pageContent}
Action history: ${context.actionHistory}
Question: Has the primary goal been achieved?
Provide:
- passed (boolean)
- reasoning (string)
- evidence (array of observable facts)
Return JSON.
`;
return await this.llm.generate(prompt);
}
private async checkSuccessCriteria(criteria: string[], context: Context): Promise<Check> {
// Verify each success criterion
const results = await Promise.all(
criteria.map(criterion => this.verifyCriterion(criterion, context))
);
return {
passed: results.every(r => r.passed),
details: results
};
}
}
Key insight: Validator checks evidence, not assumptions.
How Our Agent Plans Tasks
Example: "Find the cheapest flight to Tokyo next month"
Traditional Approach (Fails)
Steps:
1. Go to kayak.com
2. Type "Tokyo" in destination
3. Click search
4. Sort by price
5. Return first result
❌ Problems:
- What if Kayak is down?
- What if "next month" is ambiguous?
- What if cheapest flight has 3 layovers?
- What if prices are in different currencies?
Our Planner's Approach (Works)
const plan = {
goal: "Find cheapest practical flight to Tokyo in next month",
strategy: {
approach: "Compare prices across major booking sites",
sites: ["kayak.com", "google.com/flights", "skyscanner.com"],
date_range: "Flexible within next 30 days",
constraints: ["Max 1 layover", "Reasonable flight duration"]
},
execution_plan: {
phase_1: {
objective: "Gather flight options from all sites",
parallel: true,
sites: ["kayak", "google", "skyscanner"]
},
phase_2: {
objective: "Normalize and compare prices",
method: "Extract price, convert currency, filter by constraints"
},
phase_3: {
objective: "Identify cheapest practical option",
criteria: ["Lowest price", "Max 1 layover", "<20 hours total time"]
}
},
success_criteria: [
"Prices found from at least 2 sites",
"All prices in same currency",
"Recommended flight meets constraints",
"Price difference explained if sites disagree"
],
obstacles: [
{ obstacle: "Site requires login", strategy: "Skip and use other sites" },
{ obstacle: "No flights in date range", strategy: "Expand date range by 1 week" },
{ obstacle: "Currency conversion needed", strategy: "Use exchange rate API" }
]
};
Why this works:
- ✅ Handles ambiguity ("next month" → specific date range)
- ✅ Has fallback strategies (if site fails, use others)
- ✅ Validates results (compares across sources)
- ✅ Applies constraints (not just "cheapest at any cost")
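Phases 2 and 3 of this plan reduce to plain data wrangling once the options are extracted. A minimal sketch, with hard-coded exchange rates standing in for the exchange rate API (all names and numbers are illustrative):

```typescript
interface FlightOption {
  price: number;
  currency: 'USD' | 'EUR' | 'JPY';
  layovers: number;
  totalHours: number;
}

// Assumed static rates; the real plan would call an exchange rate API.
const TO_USD: Record<string, number> = { USD: 1, EUR: 1.08, JPY: 0.0067 };

// Phase 2: normalize to USD. Phase 3: apply constraints, pick cheapest practical.
function cheapestPractical(options: FlightOption[]): FlightOption | null {
  const practical = options
    .map((o) => ({ ...o, price: o.price * TO_USD[o.currency], currency: 'USD' as const }))
    .filter((o) => o.layovers <= 1 && o.totalHours < 20);
  if (practical.length === 0) return null;
  return practical.reduce((best, o) => (o.price < best.price ? o : best));
}

const options: FlightOption[] = [
  { price: 620, currency: 'USD', layovers: 0, totalHours: 13 },
  { price: 80000, currency: 'JPY', layovers: 3, totalHours: 26 }, // filtered: 3 layovers
  { price: 540, currency: 'EUR', layovers: 1, totalHours: 17 },
];
console.log(cheapestPractical(options)?.price);
// logs the converted EUR fare (about $583), which beats the $620 direct flight
```

The constraint filter runs before the price comparison, which is exactly why the agent never recommends "cheapest at any cost."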
Execution: Beyond Fixed Scripts
How Navigator executes intelligently.
Intelligent Element Finding
Problem: Selectors break constantly
Solution: Intent-based finding
class IntelligentElementFinder {
async findElement(intent: string, context: Context): Promise<Element> {
// Try multiple strategies in parallel
const strategies = [
this.findBySemanticRole(intent),
this.findByVisibleText(intent),
this.findByAriaLabel(intent),
this.findByVisionLLM(intent, context)
];
// Take the first strategy that yields a usable element;
// Promise.any ignores strategies that reject (i.e. find nothing)
const element = await Promise.any(
strategies.map(async (find) => {
const el = await find;
if (!el) throw new Error('no match');
return el;
})
);
// Validate found element
if (await this.validateElement(element, intent)) {
return element;
}
throw new Error(`Could not find element for intent: ${intent}`);
}
private async findBySemanticRole(intent: string): Promise<Element> {
// Example intent: "Click the submit button"
// Find buttons with submit-like characteristics
const buttons = await page.$$('button, input[type="submit"], [role="button"]');
for (const button of buttons) {
const text = await button.textContent();
const type = await button.getAttribute('type');
if (
text?.toLowerCase().includes('submit') ||
text?.toLowerCase().includes('send') ||
type === 'submit'
) {
return button;
}
}
return null;
}
private async findByVisionLLM(intent: string, context: Context): Promise<Element> {
// Capture screenshot
const screenshot = await page.screenshot();
// Ask vision LLM to find element
const response = await this.visionLLM.analyze(screenshot, {
prompt: `Find the UI element that would ${intent}. Return bounding box coordinates.`
});
// Click at the returned coordinates (page.click takes a selector, so use mouse.click)
await page.mouse.click(response.x, response.y);
return response.element;
}
}
Handling Unexpected States
Problem: Real websites are messy (popups, loading states, errors)
Solution: Continuous state monitoring
class StateMonitor {
async monitorExecution(action: Action): Promise<ExecutionResult> {
// Start execution
const executionPromise = this.executeAction(action);
// Monitor for interruptions
const interruptionMonitor = this.watchForInterruptions();
const result = await Promise.race([
executionPromise,
interruptionMonitor
]);
if (result.type === 'interruption') {
return await this.handleInterruption(result, action);
}
return result;
}
private async watchForInterruptions(): Promise<Interruption> {
// Watch for common interruptions
const checks = [
this.checkForModal(),
this.checkForAlert(),
this.checkForCaptcha(),
this.checkForError(),
this.checkForRedirect()
];
return await Promise.race(checks);
}
private async handleInterruption(interruption: Interruption, action: Action): Promise<Result> {
switch (interruption.type) {
case 'modal':
// Close modal or interact with it
await this.handleModal(interruption);
// Retry original action
return await this.executeAction(action);
case 'captcha':
// Pause for human intervention
return await this.requestHumanHelp('captcha');
case 'error':
// Report error and try recovery
return await this.attemptErrorRecovery(interruption, action);
default:
return { success: false, interruption };
}
}
}
The Validation Challenge
How do we know a task is really done?
Naive Validation (Wrong)
// ❌ Assumes success based on execution
async function validateCheckout() {
return { success: clickedButton };
}
Evidence-Based Validation (Correct)
class EvidenceBasedValidator {
async validateTaskCompletion(task: Task, context: Context): Promise<ValidationResult> {
// Gather multiple forms of evidence
const evidence = await this.gatherEvidence(task, context);
// Cross-validate evidence
const validation = await this.crossValidate(evidence);
// Calculate confidence
const confidence = this.calculateConfidence(validation);
return {
complete: confidence > 0.9,
confidence,
evidence,
reasoning: validation.reasoning
};
}
private async gatherEvidence(task: Task, context: Context): Promise<Evidence> {
return {
// Visual evidence
confirmationPageVisible: await this.checkForConfirmation(),
successMessageVisible: await this.checkForSuccessMessage(),
// Behavioral evidence
urlChanged: context.urlBefore !== context.urlAfter,
expectedRedirect: context.urlAfter.includes(task.expectedUrl),
// Data evidence
orderInHistory: await this.checkOrderHistory(task.orderId),
emailReceived: await this.checkEmail(task.confirmationEmail),
databaseUpdated: await this.checkDatabase(task.transactionId),
// Error evidence (should be absent)
noErrors: !await this.checkForErrors(),
noWarnings: !await this.checkForWarnings(),
// Semantic evidence (via LLM)
llmConfirmation: await this.llmAnalysis(task, context)
};
}
private async llmAnalysis(task: Task, context: Context): Promise<LLMValidation> {
const prompt = `
Task goal: ${task.goal}
Current page: ${context.page}
Actions taken: ${context.actions}
Question: Based on the visible evidence, is the task complete?
Analyze:
- Are success indicators present?
- Are there any error indicators?
- Does the page state match expected outcome?
Return: { complete: boolean, confidence: number, reasoning: string }
`;
return await this.llm.generate(prompt);
}
}
Result: 98% validation accuracy (vs 65% for naive validation)
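The `calculateConfidence` call above is referenced but not shown. One plausible sketch is a weighted fraction of passed checks, weighting hard evidence (a record in order history) above soft evidence (an LLM's judgment); the weights here are illustrative, not our exact tuning:

```typescript
interface EvidenceCheck {
  passed: boolean;
  weight: number; // hard evidence (DB record) > soft evidence (LLM judgment)
}

// Confidence = weighted share of passed checks, in [0, 1].
function calculateConfidence(checks: EvidenceCheck[]): number {
  const total = checks.reduce((sum, c) => sum + c.weight, 0);
  if (total === 0) return 0;
  const passed = checks.reduce((sum, c) => sum + (c.passed ? c.weight : 0), 0);
  return passed / total;
}

const checks: EvidenceCheck[] = [
  { passed: true, weight: 3 },  // order appears in order history
  { passed: true, weight: 2 },  // confirmation page visible
  { passed: true, weight: 2 },  // no error banners
  { passed: false, weight: 1 }, // confirmation email not yet received
];
console.log(calculateConfidence(checks)); // 0.875 -> below the 0.9 bar, keep checking
```

Pairing this with the `confidence > 0.9` threshold shown earlier means one missing soft signal stalls completion rather than silently passing it.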
Error Recovery That Actually Works
Traditional error handling:
try {
await action();
} catch (error) {
console.log('Failed');
return { success: false };
}
Our error recovery:
class IntelligentErrorRecovery {
async executeWithRecovery(action: Action, maxAttempts: number = 3): Promise<Result> {
let lastError: Error;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
try {
return await this.executeAction(action);
} catch (error) {
lastError = error;
// Analyze error
const analysis = await this.analyzeError(error, action);
// Determine if recoverable
if (!analysis.recoverable) {
throw new UnrecoverableError(error);
}
// Generate recovery strategy
const strategy = await this.generateRecoveryStrategy(analysis);
// Execute recovery
const recovered = await this.executeRecovery(strategy);
if (recovered) {
// Retry original action with learned knowledge
continue;
}
// If last attempt, throw
if (attempt === maxAttempts) {
throw new MaxAttemptsError(lastError);
}
// Wait before retry (exponential backoff)
await this.wait(Math.pow(2, attempt) * 1000);
}
}
// A successful recovery on the final attempt still consumes it;
// fail loudly here rather than falling through and returning undefined
throw new MaxAttemptsError(lastError!);
}
private async analyzeError(error: Error, action: Action): Promise<ErrorAnalysis> {
const prompt = `
Error occurred during action: ${action.type}
Error message: ${error.message}
Stack trace: ${error.stack}
Page state: ${await this.getPageState()}
Analyze:
1. What caused the error?
2. Is it recoverable?
3. What recovery strategy should we try?
Return JSON: {
cause: string,
recoverable: boolean,
recovery_strategy: string,
estimated_success_rate: number
}
`;
return await this.llm.generate(prompt);
}
private async executeRecovery(strategy: RecoveryStrategy): Promise<boolean> {
switch (strategy.type) {
case 'refresh':
await page.reload();
return true;
case 'wait_for_element':
await page.waitForSelector(strategy.selector, { timeout: 10000 });
return true;
case 'alternative_path':
// Try different sequence of actions
return await this.tryAlternativePath(strategy.path);
case 'human_intervention':
return await this.requestHumanHelp(strategy.reason);
default:
return false;
}
}
}
Impact:
- 65% of failures are now recovered automatically
- Average recovery time: 3.2 seconds
- Task completion rate: 65% → 94%
Real-World Task Examples
Task 1: Product Price Comparison
Input: "Compare prices for AirPods Pro across Amazon, Best Buy, and Walmart"
Execution:
// Planner creates strategy
Plan:
Goal: Find and compare AirPods Pro prices
Strategy: Visit each site, search, extract price
Success: Prices from all 3 sites, normalized
// Navigator executes (parallel)
await Promise.all([
navigator.search('amazon.com', 'AirPods Pro'),
navigator.search('bestbuy.com', 'AirPods Pro'),
navigator.search('walmart.com', 'AirPods Pro')
]);
// Validator checks
Evidence:
✓ Amazon: $189.99 found
✓ Best Buy: $249.99 found
✓ Walmart: $179.99 found
✓ Prices normalized to USD
✓ Products confirmed authentic (not knockoffs)
Result:
Complete: true
Confidence: 0.96
Answer: "Cheapest at Walmart ($179.99), followed by Amazon ($189.99) and Best Buy ($249.99)"
Success rate: 96%
Task 2: Form Submission with Validation
Input: "Submit job application with resume"
Execution:
// Planner identifies requirements
Plan:
Goal: Successfully submit job application
Requirements: Name, email, resume upload, submit
Success: Confirmation page, email received
// Navigator executes
1. Fill name ✓
2. Fill email ✓
3. Upload resume ✓
4. Click submit ✓
5. Error: "Email format invalid"
// Error recovery
6. Analyze error: Validation failed
7. Correct email format
8. Retry submit ✓
// Validator checks
Evidence:
✓ Confirmation page visible
✓ "Application submitted" message
✓ Confirmation email received
✓ Application ID: APP-12345
Result:
Complete: true
Confidence: 0.99
Success rate: 91% (including error recovery)
Measuring True Task Completion
We track these metrics:
interface TaskMetrics {
// Core metrics
completionRate: number; // % tasks that achieve goal
partialCompletionRate: number; // % tasks that make progress
failureRate: number; // % tasks that completely fail
// Quality metrics
validationConfidence: number; // Avg confidence in completion
falsePositiveRate: number; // % tasks marked complete incorrectly
falseNegativeRate: number; // % tasks marked failed incorrectly
// Efficiency metrics
avgStepsToCompletion: number;
avgExecutionTime: number;
recoveryRate: number; // % failures recovered
// User satisfaction
userConfirmationRate: number; // % users agree with outcome
}
Our results (6 months, production):
Completion Rate: 94.3%
Partial Completion: 3.2%
Failure Rate: 2.5%
Validation Confidence: 0.92
False Positive Rate: 1.8%
False Negative Rate: 0.5%
Avg Steps to Completion: 12.4
Avg Execution Time: 23.7s
Recovery Rate: 64.8%
User Confirmation Rate: 96.1%
Comparison to traditional automation:
| Metric | Traditional | Our Agent | Improvement |
|---|---|---|---|
| Completion Rate | 62% | 94% | +52% |
| False Positives | 15% | 1.8% | -87% |
| Avg Time | 45s | 24s | -47% |
| Recovery Rate | 12% | 65% | +442% |
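Most of these rates fall out of a single pass over per-task logs. A sketch under an assumed minimal log schema (the field names are illustrative, not our actual telemetry format):

```typescript
interface TaskLog {
  goalAchieved: boolean;
  madeProgress: boolean;    // true if any checkpoint passed
  failuresRecovered: number;
  failuresTotal: number;
}

interface Rates {
  completionRate: number;
  partialCompletionRate: number;
  failureRate: number;
  recoveryRate: number;
}

// Fold the logs into the core rates from TaskMetrics.
function computeRates(logs: TaskLog[]): Rates {
  const n = logs.length;
  const completed = logs.filter((l) => l.goalAchieved).length;
  const partial = logs.filter((l) => !l.goalAchieved && l.madeProgress).length;
  const failures = logs.reduce((sum, l) => sum + l.failuresTotal, 0);
  const recovered = logs.reduce((sum, l) => sum + l.failuresRecovered, 0);
  return {
    completionRate: completed / n,
    partialCompletionRate: partial / n,
    failureRate: (n - completed - partial) / n,
    recoveryRate: failures === 0 ? 0 : recovered / failures,
  };
}

const logs: TaskLog[] = [
  { goalAchieved: true, madeProgress: true, failuresRecovered: 1, failuresTotal: 1 },
  { goalAchieved: true, madeProgress: true, failuresRecovered: 0, failuresTotal: 0 },
  { goalAchieved: false, madeProgress: true, failuresRecovered: 0, failuresTotal: 2 },
  { goalAchieved: false, madeProgress: false, failuresRecovered: 1, failuresTotal: 2 },
];
console.log(computeRates(logs));
// { completionRate: 0.5, partialCompletionRate: 0.25, failureRate: 0.25, recoveryRate: 0.4 }
```

Note that recovery rate is computed over failures, not over tasks, which is why it can sit at 65% while the failure rate is only 2.5%.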
Lessons from Production
What we learned from billions of actions:
1. Validation is Harder Than Execution
Misconception: "If actions execute without errors, the task is done"
Reality: 38% of "successful" executions didn't achieve the goal
Solution: Multi-evidence validation
2. Vision Models Are Game-Changers
Adding vision to our agent:
- Completion rate: 87% → 94%
- Better visual validation
- Better element finding
- Better error detection
Example: Agent can now see "out of stock" even when there's no error message
3. Planning Frequency Matters
Too much planning: slow (replanning before every action)
Too little planning: the agent gets lost (no replanning at all)
Optimal: every 3 actions (our finding)
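The cadence itself is a one-line counter in the execution loop. A minimal sketch, where a synchronous `evaluate` callback stands in for the Planner's LLM-backed progress check (names are illustrative):

```typescript
const REPLAN_INTERVAL = 3; // the cadence we found optimal

interface Evaluation {
  done: boolean;
  nextGoal: string;
}

// Run a sequence of actions, pausing to re-evaluate progress every
// REPLAN_INTERVAL steps; returns how many replanning passes occurred.
function executeWithReplanning(
  actions: string[],
  evaluate: (actionsTaken: number) => Evaluation,
): number {
  let replans = 0;
  for (let i = 0; i < actions.length; i++) {
    // ... the Navigator would perform actions[i] here ...
    if ((i + 1) % REPLAN_INTERVAL === 0) {
      replans++;
      if (evaluate(i + 1).done) break; // stop as soon as the Planner says so
    }
  }
  return replans;
}

// 7 actions -> replanning passes after actions 3 and 6
console.log(
  executeWithReplanning(
    ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7'],
    () => ({ done: false, nextGoal: 'continue' }),
  ),
); // 2
```

Raising `REPLAN_INTERVAL` trades LLM calls for drift; lowering it does the opposite, which is the slow-versus-lost tension described above.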
4. Local Context is Critical
Agents need to remember:
- What they've tried
- What failed
- What worked
- Current state vs. goal
Implementation:
class ContextManager {
private context = {
goal: '',
attemptedActions: [] as Action[],
successfulActions: [] as Action[],
failedActions: [] as Action[],
currentState: null as PageState | null,
progressTowardGoal: 0
};
updateContext(action: Action, result: Result) {
this.context.attemptedActions.push(action);
if (result.success) {
this.context.successfulActions.push(action);
this.context.progressTowardGoal += this.estimateProgress(action);
} else {
this.context.failedActions.push(action);
}
this.context.currentState = result.pageState;
}
}
5. Humans Are Still Essential
The agent excels at: repetitive, structured tasks
Humans excel at: ambiguity, judgment, creativity
Best practice: Human-in-the-loop for:
- CAPTCHAs
- Ambiguous instructions
- High-stakes decisions
- Final validation
Open Source Implementation
Our agent is open source: Onpiste
Quick start:
# Install
npm install @onpiste/agent
// Configure
import { Agent } from '@onpiste/agent';
const agent = new Agent({
llm: {
provider: 'openai',
model: 'gpt-4o',
apiKey: process.env.OPENAI_API_KEY
}
});
// Execute task
const result = await agent.executeTask({
goal: 'Find cheapest flight to Tokyo next month',
constraints: ['Max 1 layover', 'Under $1000'],
successCriteria: [
'Price found',
'Meets constraints',
'Booking link provided'
]
});
console.log(result);
// {
// complete: true,
// confidence: 0.94,
// result: "Cheapest flight: $850 on United (1 layover)",
// evidence: [...],
// bookingUrl: "https://..."
// }
Architecture:
@onpiste/agent/
├─ src/
│ ├─ agents/
│ │ ├─ planner.ts # Strategy agent
│ │ ├─ navigator.ts # Execution agent
│ │ └─ validator.ts # Validation agent
│ ├─ orchestrator.ts # Agent coordination
│ ├─ recovery.ts # Error recovery
│ └─ validation.ts # Evidence-based validation
├─ tests/
└─ examples/
Conclusion: Intelligence Over Execution
What we learned:
- ✅ Task completion ≠ executing steps
- ✅ Validation is harder than execution
- ✅ Error recovery is essential
- ✅ Context matters more than we thought
- ✅ Multiple agents > a single monolithic agent
Our agent:
- 94% task completion rate
- 65% error recovery rate
- 24s average execution time
- Handles unexpected states
- Validates outcomes with evidence
The key insight: Build agents that understand goals, not just execute steps.
Get started: npm install @onpiste/agent (see the Open Source Implementation section above).
The future of automation is intelligent task completion, not button-clicking.
Related Articles
- Building a ChatGPT Alternative for Browser Control
- Multi-Agent System Architecture
- From ChatGPT Atlas to Local Browser Agents
- AI Agents Replacing Manual Testing
Experience intelligent task completion. Install Onpiste and see the difference.
