Building a Full-Stack Job Scraping Application: React, FastAPI & SeleniumBase

Introduction

In today’s competitive job market, efficiently aggregating job listings from multiple sources can be a game-changer for job seekers. In this article, I’ll walk you through how I built a professional job scraping application that extracts data from Indeed.com using modern technologies like React.js, FastAPI, and SeleniumBase.

What You’ll Learn:

  • Building a RESTful API with FastAPI
  • Web scraping with SeleniumBase CDP mode
  • Creating a React frontend with real-time updates
  • Bypassing anti-bot detection systems
  • Implementing multi-format data export (CSV, Excel, JSON)

The Problem I Solved

Job hunting is time-consuming. Manually browsing through hundreds of listings, keeping track of companies, and organizing applications is tedious. I wanted to create a tool that:

✅ Automates job listing collection from Indeed.com

✅ Provides real-time analytics on job postings

✅ Exports data in multiple formats for easy analysis

✅ Bypasses anti-scraping mechanisms reliably


Technology Stack & Architecture

Backend: Python FastAPI

I chose FastAPI for several reasons:

  • Lightning-fast performance with async support
  • Automatic API documentation (Swagger UI)
  • Type safety with Pydantic models
  • Modern Python features (async/await)

Frontend: React.js

React provides:

  • Component-based architecture for maintainability
  • State management for real-time updates
  • Responsive UI for all devices
  • Rich ecosystem of libraries

Scraping Engine: SeleniumBase

SeleniumBase with CDP mode offers:

  • Anti-bot bypass capabilities
  • Reliable browser automation
  • Chrome DevTools Protocol for direct control
  • Undetected mode for stealth scraping

Architecture Diagram

User Interface (React)
        ↓
    API Layer (FastAPI)
        ↓
Scraping Engine (SeleniumBase)
        ↓
    Indeed.com
        ↓
    Data Processing
        ↓
Export (CSV/Excel/JSON)

Key Features Implemented

1. Smart Job Scraping

The scraper intelligently navigates Indeed’s search results, handling pagination and extracting:

  • Job titles
  • Company names
  • Locations (including remote positions)
  • Direct job URLs
  • Posting dates

2. Anti-Bot Protection Bypass

Using SeleniumBase’s CDP mode, the scraper:

  • Runs in undetected mode
  • Mimics human behavior
  • Handles dynamic content loading
  • Manages rate limiting

Code Snippet:

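from seleniumbase import SB

# Example search URL; the query values are placeholders
search_url = "https://www.indeed.com/jobs?q=software+engineer&l=remote"

# uc=True turns on SeleniumBase's undetected (UC) mode
with SB(uc=True, headless=True) as sb:
    sb.open(search_url)
    # Wait for the job cards to render before collecting them
    sb.wait_for_element(".job_seen_beacon")
    jobs = sb.find_elements(".job_seen_beacon")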
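
The "mimics human behavior" and "manages rate limiting" points are not visible in the snippet itself. One simple way to approximate both is a randomized pause between page loads. The following is only a minimal sketch: `polite_delay` and its default bounds are hypothetical, not taken from the project.

```python
import random
import time


def polite_delay(min_seconds: float = 2.0, max_seconds: float = 5.0, sleep=time.sleep) -> float:
    """Pause for a random interval and return the number of seconds waited.

    The default bounds are illustrative assumptions. `sleep` is injectable
    so the helper can be exercised without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Calling `polite_delay()` once per results page would space requests out unevenly, which is gentler on the server than a fixed-rate loop.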

3. Real-Time Analytics

The application provides instant insights:

  • Total jobs found
  • Number of unique companies
  • Unique locations
  • Top hiring companies
  • 100% link validation
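
Most of these figures can be derived from the scraped list in a few lines. Here is a minimal sketch, assuming each job is a dictionary with `company` and `location` keys; the function name `summarize_jobs` is hypothetical and the project may compute its analytics differently.

```python
from collections import Counter


def summarize_jobs(jobs: list[dict], top_n: int = 5) -> dict:
    """Compute summary statistics for a list of job dictionaries.

    Assumes every job has "company" and "location" keys.
    """
    companies = Counter(job["company"] for job in jobs)
    locations = {job["location"] for job in jobs}
    return {
        "total_jobs": len(jobs),
        "unique_companies": len(companies),
        "unique_locations": len(locations),
        # (company, count) pairs, most frequent first
        "top_companies": companies.most_common(top_n),
    }
```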

4. Multi-Format Export

Users can download data in three formats:

  • CSV: Excel-compatible, perfect for spreadsheets
  • Excel: Native .xlsx format with preserved formatting
  • JSON: Developer-friendly, ideal for API integration
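
For illustration, the CSV and JSON exports can be sketched with the standard library alone. The field list and function names below are assumptions rather than the project's actual code. The .xlsx export is left out because it needs a third-party package such as openpyxl or pandas.

```python
import csv
import io
import json

# Assumed column order for the export
FIELDS = ["title", "company", "location", "url"]


def jobs_to_csv(jobs: list[dict]) -> str:
    """Serialize jobs to CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in FIELDS
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


def jobs_to_json(jobs: list[dict]) -> str:
    """Serialize jobs to pretty-printed JSON, keeping non-ASCII characters readable."""
    return json.dumps(jobs, indent=2, ensure_ascii=False)
```

The `csv` module quotes values that contain commas, so a company name like "Acme, Inc." survives a round trip.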

Technical Implementation

Backend API Structure

main.py – FastAPI application with CORS configuration:

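from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

# Request schema from models.py (assumes the module is importable from here)
from models import ScrapeRequest

app = FastAPI(title="Indeed Job Scraper API")

# Allow the React frontend to call the API.
# "*" is convenient during development; restrict origins in production.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/scrape")
async def scrape_jobs(request: ScrapeRequest):
    # Scraping logic here
    pass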
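
One thing to watch with an `async def` endpoint is that browser automation is blocking, so calling it directly would stall the event loop for the duration of the scrape. A possible way around this is to hand the work to a worker thread. The sketch below uses `asyncio.to_thread` (Python 3.9+) with a hypothetical stand-in function, `blocking_scrape`; it is not necessarily how the project handles this.

```python
import asyncio
import time


def blocking_scrape(job_title: str, location: str, pages: int) -> list[dict]:
    """Hypothetical stand-in for the real, blocking scraper call."""
    time.sleep(0.1)  # simulates time spent driving the browser
    return [{"title": job_title, "location": location, "page": p} for p in range(pages)]


async def scrape_without_blocking(job_title: str, location: str, pages: int) -> list[dict]:
    """Run the blocking scraper in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_scrape, job_title, location, pages)
```

Inside a FastAPI handler, the same `await asyncio.to_thread(...)` line would go in the body of `scrape_jobs`. Alternatively, declaring the endpoint with plain `def` lets FastAPI run it in its own thread pool.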

scraper.py – Core scraping logic:

def scrape_indeed(job_title: str, location: str, pages: int):
    jobs = []
    
    for page in range(pages):
        # Navigate to search results
        # Extract job data
        # Validate links
        pass
    
    return jobs
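
The "navigate to search results" step needs a URL for each page. A minimal sketch of that piece follows, assuming Indeed's search accepts `q`, `l`, and `start` query parameters and that `start` advances by 10 per page. Both the parameters and the helper name `build_search_url` are assumptions to verify against the live site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10  # assumption: the `start` offset advances by 10 per page


def build_search_url(job_title: str, location: str, page: int) -> str:
    """Return the search URL for a zero-based results page."""
    params = {"q": job_title, "l": location, "start": page * RESULTS_PER_PAGE}
    # urlencode escapes spaces and special characters in the query values
    return f"{BASE_URL}?{urlencode(params)}"
```

Inside the loop, `build_search_url(job_title, location, page)` would supply the address to open on each iteration.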

models.py – Pydantic data validation:

from pydantic import BaseModel

class ScrapeRequest(BaseModel):
    job_title: str
    location: str
    pages: int

class JobListing(BaseModel):
    title: str
    company: str
    location: str
    url: str

Frontend Component Structure

JobScraper.js – Search form:

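import { useState } from 'react';
import api from './api'; // assumed path to the API client module

const JobScraper = () => {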
  const [jobTitle, setJobTitle] = useState('');
  const [location, setLocation] = useState('');
  const [pages, setPages] = useState(1);
  
  const handleSubmit = async (e) => {
    e.preventDefault();
    const response = await api.scrapeJobs({
      job_title: jobTitle,
      location,
      pages
    });
    // Handle response
  };
  
  return (
    <form onSubmit={handleSubmit}>
      {/* Form fields */}
    </form>
  );
};

JobTable.js – Results display with filtering:

import { useState } from 'react';

const JobTable = ({ jobs }) => {
  const [filter, setFilter] = useState('');
  const query = filter.toLowerCase();

  // Match the filter text against title or company, case-insensitively
  const filteredJobs = jobs.filter(job =>
    job.title.toLowerCase().includes(query) ||
    job.company.toLowerCase().includes(query)
  );

  return (
    <div>
      <input
        placeholder="Filter jobs..."
        value={filter}
        onChange={(e) => setFilter(e.target.value)}
      />
      <table>
        <tbody>
          {filteredJobs.map(job => (
            <tr key={job.url}>
              <td>{job.title}</td>
              <td>{job.company}</td>
              <td>{job.location}</td>
            </tr>
          ))}
        </tbody>
      </table>
    </div>
  );
};

Challenges & Solutions

Challenge 1: Anti-Bot Detection

Problem: Indeed implements sophisticated bot detection

Solution: Used SeleniumBase CDP mode with undetected Chrome driver

Challenge 2: Dynamic Content Loading

Problem: Job listings load asynchronously

Solution: Implemented smart waiting strategies with explicit waits
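
SeleniumBase's `wait_for_element` already covers this for CSS selectors. To illustrate the idea behind an explicit wait, here is a generic polling helper. It is a sketch under stated assumptions (the name `wait_until` and its defaults are hypothetical), not the project's code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.25) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

Unlike a fixed `sleep`, this returns as soon as the content is ready and fails loudly when it never arrives.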

Challenge 3: Data Consistency

Problem: Inconsistent HTML structure across job listings

Solution: Created robust parsing with fallback mechanisms
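
A fallback mechanism of this kind can be as simple as trying several selectors in order and taking the first one that yields text. The sketch below assumes a hypothetical `lookup` callable that returns an element's text for a selector, or raises when nothing matches; the real scraper would wrap its own element-text call in that role.

```python
from typing import Callable, Iterable, Optional


def first_match(lookup: Callable[[str], Optional[str]], selectors: Iterable[str], default: str = "N/A") -> str:
    """Try each selector in order and return the first non-empty text found."""
    for selector in selectors:
        try:
            text = lookup(selector)
        except Exception:
            # A missing element should not abort parsing of the whole card
            continue
        if text and text.strip():
            return text.strip()
    return default
```

Listing the newest selector first and older variants after it means one layout change does not leave a column empty.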

Challenge 4: Performance

Problem: Scraping multiple pages was slow

Solution: Optimized selectors and implemented efficient data extraction


Results & Performance

Scraping Speed: ~15-20 jobs per page in 5-7 seconds

Link Validation: 100% of extracted job URLs passed validation in my test runs

Reliability: CDP mode ensures consistent results

Scalability: Can handle 1-10 pages per search

Analytics Provided:

  • Real-time job count
  • Company distribution
  • Location insights
  • Top hiring companies

Screenshots

[Screenshots: Main Interface, Results Dashboard, Indeed Job Scraper, Export Options]


Installation & Usage

Quick Start

# Clone the repository
git clone https://github.com/seotanvirbd/indeed_fastapi_reactjs_scraper_app.git
cd indeed_fastapi_reactjs_scraper_app

# Backend setup
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python run.py

# Frontend setup (new terminal, from the repository root)
cd frontend
npm install
npm start

Usage Example

  1. Enter job title: “Software Engineer”
  2. Set location: “Remote”
  3. Choose pages: 3
  4. Click “START SCRAPING”
  5. View results with analytics
  6. Export in preferred format

Lessons Learned

Technical Insights

  1. FastAPI is incredibly fast – The async nature makes it perfect for I/O-bound tasks
  2. SeleniumBase CDP mode is powerful – Reliable bot detection bypass
  3. React’s component model scales well – Easy to maintain and extend
  4. Type safety matters – Pydantic caught many bugs during development

Development Practices

  1. Start with MVP – Built core scraping first, added features incrementally
  2. Test thoroughly – Multiple test runs on different searches
  3. Handle errors gracefully – Robust error handling prevents crashes
  4. Document as you go – Made future development easier

Ethical Considerations

Important: This tool is for educational and personal use only. When scraping:

  • ✅ Respect robots.txt
  • ✅ Implement rate limiting
  • ✅ Review Terms of Service
  • ✅ Use data responsibly
  • ❌ Don’t overwhelm servers
  • ❌ Don’t violate privacy
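
The robots.txt check can be automated with Python's standard library. This is a minimal sketch that evaluates rules passed in as text; the helper name `is_allowed` is hypothetical, and the check is no substitute for reading a site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

To check a live site instead, `RobotFileParser` can download the file itself via `set_url(...)` followed by `read()`.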

Conclusion

Building this full-stack job scraping application taught me valuable lessons about:

  • Modern API development with FastAPI
  • Advanced web scraping techniques
  • React state management
  • Bypassing anti-bot systems ethically
  • Creating user-friendly interfaces

The project demonstrates proficiency in:

  • Backend Development (Python, FastAPI, async programming)
  • Frontend Development (React, JavaScript, responsive design)
  • Web Scraping (SeleniumBase, CDP mode, anti-detection)
  • API Design (RESTful principles, documentation)
  • Data Processing (CSV, Excel, JSON export)

Resources & Links

  • GitHub Repository: https://github.com/seotanvirbd/indeed_fastapi_reactjs_scraper_app

Get In Touch

Found this helpful? Have questions or suggestions?

  • GitHub: @seotanvirbd
  • LinkedIn: https://www.linkedin.com/in/seotanvirbd/
  • Email: tanvirafra1@gmail.com

⭐ Star the repository if you find it useful!

Tags: #Python #FastAPI #React #WebScraping #SeleniumBase #FullStack #API #JobSearch #Automation #WebDevelopment


This article is part of my portfolio showcasing full-stack development skills. Check out my other projects on GitHub.
