
Automatic Pagination Handling: How to Scrape Thousands of Records Across Multiple Pages

Keywords: pagination handling, multi-page scraping, automatic pagination, next button detection, infinite scroll

You found the perfect data source. There's just one problem: the data spans 47 pages.

Manually clicking "Next" 47 times? Copy-pasting data from each page? Not happening.

This is where automatic pagination handling transforms a tedious multi-hour task into a 2-minute automated extraction.

This guide shows you how intelligent pagination detection works, handles edge cases, and extracts data across unlimited pages—without writing a single line of code.


The Pagination Problem

Why Pagination Exists

Websites paginate data for good reasons:

  • Performance: Loading 10,000 products at once crashes browsers
  • User experience: A single endless page overwhelms users
  • Server load: Reducing database queries and bandwidth

The Scraper's Challenge

Traditional manual approaches fail at scale:

Manual clicking:

Page 1 → Extract data → Click next
Page 2 → Extract data → Click next
Page 3 → Extract data → Click next
...
Page 47 → Extract data → Done

Problems:

  • ❌ Time-consuming (5-10 minutes per page)
  • ❌ Error-prone (miss a page, lose data)
  • ❌ Tedious and boring
  • ❌ Doesn't scale to 100+ pages

Selenium/Playwright approach:

# Traditional code-based approach
import time
from selenium.webdriver.common.by import By

while True:
    data = scrape_current_page()
    # find_elements returns [] instead of raising when the button is missing
    next_buttons = driver.find_elements(By.CSS_SELECTOR, 'a.next')
    if not next_buttons:
        break
    next_buttons[0].click()
    time.sleep(2)  # hope 2 seconds is enough for the next page to load

Problems:

  • ❌ Requires coding skills
  • ❌ Brittle selectors break with site changes
  • ❌ Hard to determine when page fully loads
  • ❌ Manual sleep timings are unreliable

How Automatic Pagination Detection Works

Visual Next Button Detection

User workflow:

  1. Navigate to first page of results
  2. Extract table data
  3. Click "Mark Next Button" mode
  4. Click on the "Next" button visually
  5. System learns the button selector
  6. Automatic extraction begins

What happens behind the scenes:

// User clicks on next button
async function markNextButton(tabId: number) {
  return new Promise((resolve) => {
    chrome.tabs.sendMessage(tabId,
      { action: "getNextButton" },
      (response) => {
        // System captures button selector
        resolve(response.selector);
      }
    );
  });
}

Benefits:

  • ✅ No CSS knowledge required
  • ✅ Works with any button style
  • ✅ Adapts to site changes (you re-click)
  • ✅ Visual, intuitive process

Smart Page Loading Detection

The challenge: How do you know when the next page finished loading?

Naive approach (unreliable):

click_next_button()
wait(2000) // Hope 2 seconds is enough
extract_data()

Problems:

  • Too short → extracts before page loads
  • Too long → wastes time
  • Network speed varies
  • Dynamic content loads asynchronously

Intelligent approach (reliable):

async function clickNextAndWait(selector: string, options) {
  // Click next button (bail out if the selector no longer matches)
  const button = document.querySelector<HTMLElement>(selector);
  if (!button) {
    throw new Error("Next button not found");
  }
  button.click();

  // Monitor multiple signals
  await Promise.race([
    waitForNetworkQuiet(options.networkQuietMs),
    waitForDOMChanges(),
    waitForLoadingIndicators(),
    timeout(options.maxWaitMs)
  ]);

  // Ensure minimum wait time
  await sleep(options.minWaitMs);
}

Monitors:

  1. Network activity: Waits until no new requests for 500ms
  2. DOM changes: Detects when content stops updating
  3. Loading indicators: Watches for spinners to disappear
  4. Timeout safety: Max wait time prevents infinite loops

Deduplication Across Pages

Problem: Same data appears on multiple pages (overlap)

Example scenario:

  • Page 1: Items 1-20
  • Page 2: Items 18-40 (overlap of items 18-20)
  • Page 3: Items 38-60 (overlap of items 38-40)

Solution: Visited URL tracking

const visitedUrls = new Set();

function shouldExtractPage(url: string) {
  if (visitedUrls.has(url)) {
    return false; // Skip already visited
  }
  visitedUrls.add(url);
  return true;
}

Ensures:

  • No duplicate records
  • Clean, unique dataset
  • Proper progress tracking
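
Note that URL tracking prevents re-extracting a page you have already visited, but when the items themselves overlap across pages (as in the example above), a row-level key is what removes the remaining duplicates. A minimal sketch, assuming each record has a reasonably stable field (such as a product URL) to key on:

const seenKeys = new Set<string>();

function dedupeRows(rows: Array<Record<string, string>>, keyField: string) {
  return rows.filter((row) => {
    // Fall back to the whole serialized row if the key field is missing
    const key = row[keyField] ?? JSON.stringify(row);
    if (seenKeys.has(key)) {
      return false; // already captured on an earlier page
    }
    seenKeys.add(key);
    return true;
  });
}

// Usage: allData.push(...dedupeRows(pageData, 'url'));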

Intelligent Page Load Detection

Network Quiet Detection

Concept: Page is "loaded" when no new network requests for N milliseconds.

Implementation:

function waitForNetworkQuiet(quietMs: number) {
  return new Promise((resolve) => {
    let timeoutId;

    const finish = () => {
      observer.disconnect();
      resolve();
    };

    // Observe resource timing entries (fetches, XHRs, images, scripts)
    const observer = new PerformanceObserver(() => {
      // New request seen: reset the quiet timer
      clearTimeout(timeoutId);
      timeoutId = setTimeout(finish, quietMs);
    });

    observer.observe({ entryTypes: ['resource'] });

    // Also resolve if the network is already quiet
    timeoutId = setTimeout(finish, quietMs);
  });
}

Default: 500ms of network silence = page loaded

Works for:

  • AJAX-loaded content
  • Lazy-loaded images
  • Dynamic API calls
  • Asynchronous updates

DOM Mutation Monitoring

Concept: Watch for changes to page content.

function waitForDOMChanges() {
  return new Promise((resolve) => {
    let timeoutId;

    const finish = () => {
      observer.disconnect();
      resolve();
    };

    const observer = new MutationObserver(() => {
      // Content changed, page still loading: restart the settle timer
      clearTimeout(timeoutId);
      timeoutId = setTimeout(finish, 1000);
    });

    observer.observe(document.body, {
      childList: true,
      subtree: true
    });

    // Resolve once the DOM has been stable for a second
    timeoutId = setTimeout(finish, 1000);
  });
}

Detects:

  • New elements added
  • Content updated
  • Re-renders complete

Loading Indicator Detection

Common loading patterns:

const loadingSelectors = [
  '.loading',
  '.spinner',
  '[class*="loading"]',
  '[aria-busy="true"]',
  '.skeleton'
];

function waitForLoadingIndicators() {
  return new Promise((resolve) => {
    const checkInterval = setInterval(() => {
      const hasLoadingIndicator = loadingSelectors.some(
        selector => document.querySelector(selector)
      );

      if (!hasLoadingIndicator) {
        clearInterval(checkInterval);
        resolve();
      }
    }, 100);
  });
}

Recognizes:

  • CSS loading spinners
  • "Loading..." text
  • Skeleton screens
  • ARIA busy states

Combined Strategy

Best results: Use ALL signals together

const waitOptions = {
  minWaitMs: 1000,      // Always wait at least 1 second
  maxWaitMs: 20000,     // Never wait more than 20 seconds
  networkQuietMs: 500   // 500ms of network silence
};

await Promise.race([
  Promise.all([
    waitForNetworkQuiet(waitOptions.networkQuietMs),
    waitForDOMChanges(),
    waitForLoadingIndicators()
  ]),
  timeout(waitOptions.maxWaitMs) // Safety timeout
]);

// Ensure minimum wait
await sleep(waitOptions.minWaitMs);

Guarantees:

  • Page is stable before extraction
  • Doesn't wait unnecessarily
  • Handles slow connections
  • Prevents infinite waiting
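
One note on the helpers: sleep() and timeout() used in these snippets aren't built-ins. A minimal version of each:

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

function timeout(ms: number): Promise<void> {
  // Safety branch for Promise.race: resolves after ms even if no other signal fires
  return sleep(ms);
}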

Next Button Detection Strategies

Visual Click-to-Mark

Simplest approach: Let user show you the button

Process:

  1. User enters "Mark Next Button" mode
  2. Button highlight feature activates
  3. User clicks desired button
  4. System captures element selector
  5. Selector stored for automation

Advantages:

  • ✅ Works with any button design
  • ✅ No pattern recognition needed
  • ✅ User explicitly defines intent
  • ✅ Handles unusual layouts
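
Under the hood, "mark mode" comes down to a one-shot click listener in the content script that captures the clicked element and turns it into a selector. A minimal sketch (highlight styling omitted; generateSelector() is shown in the next section):

function enterMarkNextButtonMode(): Promise<string> {
  return new Promise((resolve) => {
    function onClick(event: MouseEvent) {
      // Don't actually follow the link while marking
      event.preventDefault();
      event.stopPropagation();

      const target = event.target as HTMLElement;
      document.removeEventListener('click', onClick, true);
      resolve(generateSelector(target));
    }

    // Capture phase so we see the click before the page's own handlers
    document.addEventListener('click', onClick, true);
  });
}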

CSS Selector Preservation

Captured information:

interface NextButtonInfo {
  selector: string;                   // "button.pagination-next"
  text: string;                       // "Next"
  ariaLabel: string;                  // "Next page"
  position: { x: number; y: number }; // Visual location
}

Selector generation:

function generateSelector(element) {
  // Priority 1: Unique ID
  if (element.id) {
    return `#${element.id}`;
  }

  // Priority 2: Class-based selector
  if (element.className && typeof element.className === 'string') {
    return `.${element.className.trim().split(/\s+/).join('.')}`;
  }

  // Priority 3: Element path
  return getElementPath(element);
}

Handling Button State Changes

Problem: "Next" button changes when disabled

Example:

<!-- Active state -->
<button class="pagination-next" aria-disabled="false">Next</button>

<!-- Disabled state (last page) -->
<button class="pagination-next disabled" aria-disabled="true">Next</button>

Detection strategy:

function isNextButtonAvailable(selector) {
  const button = document.querySelector(selector);

  if (!button) return false;

  // Check multiple disabled indicators
  return !(
    button.disabled ||
    button.classList.contains('disabled') ||
    button.getAttribute('aria-disabled') === 'true' ||
    button.style.pointerEvents === 'none'
  );
}

End-of-pagination detection:

  • Button disabled → Stop scraping
  • Button missing → Stop scraping
  • Button redirects to same page → Stop scraping
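
The first two conditions are covered by isNextButtonAvailable(). The third, where the button is still clickable but no longer advances, can be caught by comparing the URL before and after the click. A minimal sketch building on clickNextAndWait() from earlier (this check only applies when each page has its own URL):

async function advancedToNewPage(
  selector: string,
  options: { minWaitMs: number; maxWaitMs: number; networkQuietMs: number }
): Promise<boolean> {
  const urlBefore = location.href;

  await clickNextAndWait(selector, options);

  // Same URL after the click means we didn't advance: treat it as the last page
  return location.href !== urlBefore;
}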

Infinite Scroll vs Traditional Pagination

Traditional Pagination (Button-Based)

Characteristics:

  • "Next" or page number buttons
  • Discrete pages (Page 1, 2, 3...)
  • URL changes per page
  • Clear page boundaries

Scraping approach:

const allData = [];
let hasNextPage = true;
let pageCount = 0;

while (hasNextPage) {
  // Extract current page
  const pageData = await extractTableData(tabId);
  allData.push(...pageData);

  // Check if next button exists
  hasNextPage = await isNextButtonAvailable(nextButtonSelector);

  if (hasNextPage) {
    await clickNextPage(tabId, nextButtonSelector);
    pageCount++;
  }
}

Best for:

  • Product catalogs
  • Search results
  • Directory listings
  • Archive pages

Infinite Scroll (Scroll-to-Load)

Characteristics:

  • No "Next" button
  • Scrolling triggers loading
  • Continuous feed
  • URL typically doesn't change

Detection:

async function scrollToLoadMore(tabId, selector, options = { networkQuietMs: 500 }) {
  // Scroll to bottom of container
  const container = document.querySelector(selector);
  if (!container) return;
  container.scrollTop = container.scrollHeight;

  // Wait for new content
  await waitForNetworkQuiet(options.networkQuietMs);
}

Scraping approach:

let previousItemCount = 0;
let currentItemCount = 0;
let noNewContentCount = 0;

while (noNewContentCount < 3) {
  currentItemCount = document.querySelectorAll('.item').length;

  if (currentItemCount === previousItemCount) {
    noNewContentCount++;
  } else {
    noNewContentCount = 0;
  }

  await scrollToLoadMore(tabId, '.scroll-container');
  previousItemCount = currentItemCount;
}

Best for:

  • Social media feeds
  • News feeds
  • Product grids
  • Image galleries

Hybrid Scenarios

Some sites use both:

  • Infinite scroll within a page
  • "Load More" button at bottom
  • Pagination after N scrolls

Adaptive strategy:

async function handleHybridPagination() {
  // Try infinite scroll first
  await scrollToLoadMore();

  // Check for "Load More" button
  const loadMoreButton = document.querySelector('.load-more');
  if (loadMoreButton) {
    loadMoreButton.click();
    await waitForNetworkQuiet();
  }

  // Check for traditional next button
  const nextButton = document.querySelector('.next-page');
  if (nextButton) {
    await clickNextPage();
  }
}

Real-World Pagination Scenarios

Scenario 1: E-commerce Product Search (Amazon-style)

Page structure:

  • 48 products per page
  • 23 pages total
  • "Next" button at bottom

Extraction workflow:

1. Search for "wireless keyboards"
2. Land on results page 1
3. Click "Find Tables" → AI detects product grid
4. Extract page 1 data (48 products)
5. Click "Mark Next Button" → Click "Next" link
6. Enable "Auto-paginate"
7. System extracts pages 2-23 automatically
8. Total: 1,104 products extracted in ~2 minutes

Export:

  • CSV with fields: Title, Price, Rating, Review Count, URL
  • 1,104 rows
  • Ready for analysis in Excel
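
Serializing the extracted rows into that CSV is a small final step. A minimal sketch, assuming the row objects use field names like title, price, rating, reviewCount, and url (quoting handles commas inside titles):

function toCsv(rows: Array<Record<string, string>>): string {
  const header = ['Title', 'Price', 'Rating', 'Review Count', 'URL'];
  const escape = (value: string) => `"${(value ?? '').replace(/"/g, '""')}"`;

  const lines = rows.map((row) =>
    [row.title, row.price, row.rating, row.reviewCount, row.url]
      .map(escape)
      .join(',')
  );

  return [header.join(','), ...lines].join('\n');
}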

Scenario 2: Job Listings with Infinite Scroll (LinkedIn-style)

Page structure:

  • Infinite scroll
  • ~20 jobs load per scroll
  • No traditional pagination

Extraction workflow:

1. Navigate to job search page
2. Click "Find Tables" → AI detects job listing cards
3. Extract initially visible jobs (~20)
4. Click "Auto-scroll Mode"
5. System scrolls, waits for loading, extracts
6. Repeats until no new jobs appear
7. Total: 380 jobs extracted in ~5 minutes

Smart stopping:

  • Detects when same jobs appear (no new content)
  • Stops after 3 scroll attempts with no new data

Scenario 3: Multi-Level Navigation (Category → Subcategory → Products)

Page structure:

  • Category page with subcategories
  • Each subcategory has paginated products
  • Need to scrape ALL subcategories

Manual approach required:

  1. Extract category list
  2. For each category:
     a. Navigate to category page
     b. Enable auto-pagination
     c. Extract all products
  3. Aggregate data

Tip: Use MCP integration for complex multi-level automation
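
If you do script the outer loop (for example via the MCP integration), it stays simple: visit each category, run the usual per-page extraction, and tag every row with its category. A rough sketch, where navigateTo() and extractAllPages() are hypothetical helpers wrapping the steps above:

async function scrapeAllCategories(categoryUrls: string[]) {
  const allRows: Array<Record<string, string>> = [];

  for (const categoryUrl of categoryUrls) {
    // Hypothetical helper: navigates the active tab and waits for load
    await navigateTo(categoryUrl);

    // Hypothetical helper: runs auto-pagination and returns every row
    const rows = await extractAllPages();

    // Keep the category context on each row
    allRows.push(...rows.map((row) => ({ ...row, category: categoryUrl })));
  }

  return allRows;
}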

Scenario 4: Date-Ranged Data (Transaction History)

Page structure:

  • Results filtered by date
  • Pagination within each date range
  • Must maintain date context

Workflow:

For each month:
  1. Set date filter (e.g., January 2026)
  2. Extract page 1 data
  3. Auto-paginate through all pages for that month
  4. Add month metadata to extracted data
  5. Move to next month

Data enrichment:

  • Append "Month: January 2026" to each row
  • Maintains temporal context
  • Enables time-series analysis
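
The enrichment itself is just an extra column added before the rows join the combined dataset. A minimal sketch:

function tagRowsWithMonth(rows: Array<Record<string, string>>, monthLabel: string) {
  // e.g. monthLabel = "January 2026"
  return rows.map((row) => ({ ...row, month: monthLabel }));
}

// Usage: allData.push(...tagRowsWithMonth(pageData, 'January 2026'));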

Handling Edge Cases and Errors

Case 1: Back/Forward Cache (bfcache) Errors

Problem: Browser navigation uses cache, breaks message channels

Error message:

"The message port closed before a response was received"

Solution: Detect bfcache and re-inject script

if (error.includes('back/forward cache')) {
  // Wait for navigation
  while (Date.now() - startTime < maxWait) {
    const tab = await chrome.tabs.get(tabId);
    if (tab.url !== initialUrl && tab.status === 'complete') {
      // Re-inject script
      await ensureScriptInjected(tabId);
      resolve();
      return;
    }
    await sleep(200);
  }
}

Case 2: Slow-Loading Pages

Problem: Page takes 10+ seconds to load fully

Solution: Configurable wait times

const waitOptions = {
  minWaitMs: 2000,      // Wait at least 2 seconds
  maxWaitMs: 30000,     // Don't wait forever (30s max)
  networkQuietMs: 1000  // 1 second of network silence
};

Adaptive waiting:

  • Fast pages: Finishes in ~2 seconds
  • Slow pages: Waits up to 30 seconds
  • Extremely slow: Times out, moves to next page

Case 3: No Next Button Found

Problem: User marked wrong element, or button selector changed

Detection:

if (!document.querySelector(nextButtonSelector)) {
  throw new Error("Next button not found. Please re-mark the button.");
}

User action:

  • Re-enter "Mark Next Button" mode
  • Click correct button
  • Resume extraction

Case 4: Captcha or Login Walls

Problem: Site requires authentication or human verification

Current limitation: Cannot auto-solve captchas (intentional - respects site security)

User workflow:

  1. Scraper detects captcha (page doesn't load)
  2. Pauses extraction
  3. Notifies user: "Please solve captcha in browser"
  4. User solves captcha manually
  5. User clicks "Resume" in extension
  6. Extraction continues
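
Conceptually, the pause/resume flow is a promise the extraction loop awaits until the user clicks Resume. A rough sketch; the captcha check is a simple heuristic and the resumeExtraction message name is illustrative, not the extension's actual API:

function looksLikeCaptcha(): boolean {
  // Heuristic: common captcha widgets (illustrative selectors, not exhaustive)
  return Boolean(
    document.querySelector('iframe[src*="recaptcha"], iframe[src*="hcaptcha"]')
  );
}

function waitForUserResume(): Promise<void> {
  return new Promise((resolve) => {
    // Illustrative message name; sent when the user clicks "Resume"
    chrome.runtime.onMessage.addListener(function onMessage(message) {
      if (message.action === 'resumeExtraction') {
        chrome.runtime.onMessage.removeListener(onMessage);
        resolve();
      }
    });
  });
}

// In the extraction loop:
// if (looksLikeCaptcha()) { /* notify the user */ await waitForUserResume(); }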

Case 5: Rate Limiting

Problem: Site blocks requests after too many pages

Solution: Built-in delays + respectful scraping

const DELAY_BETWEEN_PAGES = 1500; // 1.5 seconds

async function scrapeWithDelay() {
  for (let page = 0; page < totalPages; page++) {
    await extractPage(page);
    await sleep(DELAY_BETWEEN_PAGES);
  }
}

Best practices:

  • 1-2 seconds between pages
  • Respect robots.txt
  • Use reasonable batch sizes (< 100 pages per session)

Best Practices for Multi-Page Extraction

1. Test on First 3 Pages

Before full extraction:

1. Extract page 1 → Verify data quality
2. Click next → Extract page 2 → Verify consistency
3. Click next → Extract page 3 → Verify pagination works
4. Review 3 pages of data in preview
5. If good → Enable auto-pagination for remaining pages

Why: Catches issues early before extracting 1,000+ records

2. Set Reasonable Limits

Prevent runaway scraping:

const MAX_PAGES = 100; // Safety limit

let pageCount = 0;
while (hasNextPage && pageCount < MAX_PAGES) {
  await extractPage();
  pageCount++;
}

Protects against:

  • Infinite loops
  • Site blocking
  • Excessive resource usage

3. Monitor Progress

User feedback during extraction:

Extracting page 15 of 47...
Records extracted: 720
Estimated time remaining: 2 minutes

Provides:

  • Confidence extraction is working
  • Ability to stop if issues arise
  • Progress tracking for large datasets
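
The progress readout can be derived from the pages completed so far and the average time per page. A minimal sketch:

function formatProgress(
  pagesDone: number,
  totalPages: number,
  recordsExtracted: number,
  startedAtMs: number
): string {
  const elapsedMs = Date.now() - startedAtMs;
  const msPerPage = pagesDone > 0 ? elapsedMs / pagesDone : 0;
  const minutesLeft = Math.ceil(((totalPages - pagesDone) * msPerPage) / 60000);

  return [
    `Extracting page ${pagesDone + 1} of ${totalPages}...`,
    `Records extracted: ${recordsExtracted}`,
    `Estimated time remaining: ${minutesLeft} minute(s)`,
  ].join('\n');
}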

4. Handle Partial Failures Gracefully

Strategy:

const failedPages = [];

for (let page = 0; page < totalPages; page++) {
  try {
    await extractPage(page);
  } catch (error) {
    failedPages.push(page);
    console.error(`Failed to extract page ${page}:`, error);
    // Continue to next page
  }
}

if (failedPages.length > 0) {
  alert(`Extraction complete. Failed pages: ${failedPages.join(', ')}`);
}

Benefits:

  • Doesn't stop entire extraction due to one failure
  • Tracks which pages failed
  • Allows retry of failed pages
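
A retry pass over failedPages follows the same pattern; pages that fail a second time stay in the list for manual follow-up. A minimal sketch reusing the extractPage() helper from above:

async function retryFailedPages(failedPages: number[]): Promise<number[]> {
  const stillFailing: number[] = [];

  for (const page of failedPages) {
    try {
      await extractPage(page);
    } catch (error) {
      // Failed on the second attempt too; keep it for manual review
      stillFailing.push(page);
    }
  }

  return stillFailing;
}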

5. Export Incrementally for Large Datasets

For 10,000+ records:

Option 1: Export every 100 pages
  - Reduces memory usage
  - Provides intermediate backups
  - Safer for very large scrapes

Option 2: Stream to file
  - Append to CSV as pages complete
  - Never hold entire dataset in memory
  - Works for unlimited page counts
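
Option 1 amounts to flushing the in-memory buffer every N pages. A rough sketch, assuming a toCsv() serializer like the one sketched in Scenario 1 and a hypothetical downloadFile() helper (in an extension this would typically wrap a Blob URL or chrome.downloads):

const EXPORT_EVERY_N_PAGES = 100;
let buffer: Array<Record<string, string>> = [];
let chunkIndex = 0;

async function onPageExtracted(pageRows: Array<Record<string, string>>, pageNumber: number) {
  buffer.push(...pageRows);

  if ((pageNumber + 1) % EXPORT_EVERY_N_PAGES === 0) {
    // downloadFile() is a hypothetical helper that saves a string to disk
    await downloadFile(`export-part-${chunkIndex}.csv`, toCsv(buffer));
    chunkIndex++;
    buffer = []; // free memory before the next batch
  }
}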

6. Respect Website Terms of Service

Legal and ethical considerations:

  • ✅ Check robots.txt
  • ✅ Read Terms of Service
  • ✅ Use reasonable request rates
  • ✅ Don't scrape private/protected data
  • ❌ Don't bypass paywalls
  • ❌ Don't scrape for commercial resale without permission

Frequently Asked Questions

How fast can I scrape multiple pages?

Speed factors:

  • Page load time (2-5 seconds per page)
  • Your wait settings (1-2 second delays recommended)
  • Site speed and server response
  • Network connection

Realistic speeds:

  • Fast sites: 15-20 pages/minute
  • Average sites: 10-15 pages/minute
  • Slow sites: 5-10 pages/minute

Example: 100 pages = 6-10 minutes

What's the maximum number of pages I can scrape?

Technical limit: No hard limit

Practical limits:

  • Browser memory: ~1,000-2,000 pages before slowdown
  • Site rate limiting: Varies by website
  • Your patience: Scraping 10,000 pages takes hours

Recommendation: Batch large scrapes

  • Scrape 100-500 pages per session
  • Export data
  • Continue in new session

Can I scrape while doing other work?

Current limitation: Extension requires active tab

Workaround:

  1. Start extraction on target site
  2. Use second browser window for other work
  3. Check progress periodically

Future feature: Background extraction mode (in development)

What if pagination uses POST requests?

Challenge: Some sites use form POST instead of GET links

Current solution:

  • Visual "Mark Next Button" still works
  • Clicking a form submit button works the same as clicking a link

Advanced scenario (AJAX POST): If the site swaps in new results via an AJAX POST without changing the URL, the same load-detection signals (network quiet, DOM mutations) still tell the scraper when the new content is ready, so extraction proceeds as usual.

Does it work with JavaScript-disabled pagination?

Yes: If pagination works with JavaScript disabled (plain "Next" links), the scraper handles it directly.

Also yes: If the site requires JavaScript for pagination (most modern sites do), the scraper still works, because it runs in your browser with JavaScript enabled.

Can I resume a cancelled extraction?

Current limitation: No built-in resume feature

Workaround:

  1. Note last successfully extracted page
  2. Navigate to next page manually
  3. Restart extraction from there

Data deduplication: Built-in visited-URL tracking prevents the same pages from being extracted twice

Conclusion

Automatic pagination handling transforms multi-page data extraction from a tedious manual process into a fast, automated operation. By intelligently detecting next buttons, monitoring page load states, and handling edge cases, modern scrapers can extract thousands of records across unlimited pages without coding.

Key capabilities:

  • ✅ Visual next button detection (point and click)
  • ✅ Intelligent page load monitoring (network + DOM + loading indicators)
  • ✅ Deduplication across pages
  • ✅ Infinite scroll support
  • ✅ Hybrid pagination handling
  • ✅ Graceful error recovery

When to use automatic pagination:

  • Product catalogs with many pages
  • Search results across multiple pages
  • Historical data archives
  • Directory listings
  • Any scenario with 5+ pages

Best practices:

  • Test on first 3 pages before full extraction
  • Set reasonable page limits
  • Add 1-2 second delays between pages
  • Monitor progress
  • Export incrementally for large datasets

Ready to scrape thousands of records? Install the OnPiste Chrome extension and extract multi-page data in minutes, not hours.

