ESP32-CAM Intelligent Camera Web Server

A standalone embedded system for real-time MJPEG video streaming and on-device face detection using the ESP32-CAM module.

Embedded Systems · IoT · Edge AI · HTTP Streaming

Project Overview

This project demonstrates a low-cost embedded vision system built on the ESP32-CAM platform. It enables live video streaming and on-device face detection through a browser-accessible web interface.

The system operates entirely locally, without cloud services or PC-based processing, making it suitable for IoT prototyping, edge AI experimentation, and embedded systems education.

  • Low-cost embedded vision system
  • No cloud or external server dependency
  • Fully browser-accessible via HTTP

Hardware Used

ESP32-CAM (AI Thinker)
OV2640 Camera Module
On-board PSRAM
On-board LED Flash
FTDI USB-to-TTL Programmer
Web Browser (Client)

Key Features

Core Capabilities

Live MJPEG Video Streaming

Real-time MJPEG video streaming directly from the ESP32-CAM to any modern web browser.

Browser-based Image Capture

Capture still images instantly through the web interface.

Web-based Camera Controls

Adjust resolution, image quality, and camera parameters directly from the browser.
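
As a rough illustration of browser-driven control, the sketch below builds the kind of HTTP request a client could send to change a camera parameter. It assumes the firmware exposes a `/control?var=...&val=...` endpoint in the style of Espressif's CameraWebServer example. The device address, the parameter name, and the helper `control_url` are assumptions for illustration, not details confirmed by this project.

```python
from urllib.parse import urlencode


def control_url(host: str, var: str, val: int) -> str:
    """Build a camera-control URL (the endpoint layout is an assumption)."""
    query = urlencode({"var": var, "val": val})
    return f"http://{host}/control?{query}"


# Hypothetical device address and parameter name.
print(control_url("192.168.4.1", "quality", 12))
```

Keeping each setting as a plain GET request means the browser needs nothing beyond ordinary HTTP to drive the camera.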

Standalone Operation

Operates independently without cloud services, external servers, or PC-based processing.

Processing & Intelligence

On-device Face Detection

Face detection runs entirely on the ESP32-CAM, identifying faces and drawing bounding boxes in real time.

PSRAM Frame Buffering

On-board PSRAM enables efficient frame buffering for smoother streaming and processing.
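
To see why external PSRAM matters, the following sketch estimates the size of one uncompressed frame at a few common OV2640 resolutions and compares it with the ESP32's 520 KB of on-chip SRAM. The 2 bytes per pixel (RGB565) and the helper name `raw_frame_bytes` are assumptions chosen for illustration. The project's actual pixel format and buffer count are not stated.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 2) -> int:
    """Size of one uncompressed frame; 2 bytes/pixel corresponds to RGB565."""
    return width * height * bytes_per_pixel


# Total on-chip SRAM of the ESP32; only part of it is available to applications.
INTERNAL_SRAM = 520 * 1024

RESOLUTIONS = {"QVGA": (320, 240), "VGA": (640, 480), "UXGA": (1600, 1200)}

for name, (w, h) in RESOLUTIONS.items():
    size = raw_frame_bytes(w, h)
    print(f"{name}: {size} bytes, smaller than internal SRAM: {size < INTERNAL_SRAM}")
```

Under these assumptions a raw QVGA frame is 153,600 bytes, while a raw VGA frame is already 614,400 bytes, which exceeds the chip's entire internal SRAM.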

LED Flash Control

Integrated LED flash improves image capture in low-light conditions.

Real-time Status Monitoring

Camera and system status are dynamically updated through the web interface.

How It Works

1. Camera Captures Frame

The OV2640 camera captures an image frame and sends it to the ESP32.

2. Frame Stored in PSRAM

Frames are temporarily buffered in PSRAM for efficient handling.

3. Optional Face Detection Processing

When face detection is enabled, frames are processed and bounding boxes are drawn on detected faces.

4. JPEG Encoding

Frames are encoded as JPEG images to reduce the amount of data sent per frame.

5. Streamed to Browser via HTTP

The JPEG frames are sent one after another over HTTP (MJPEG), which the browser renders as live video.
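
The streaming step can be sketched on the host side. MJPEG over HTTP is commonly delivered as a `multipart/x-mixed-replace` response in which each part carries one JPEG. The code below wraps JPEG payloads into such parts and splits them back out again. The boundary token `frame` and the function names are hypothetical, and this is a minimal sketch of the framing, not the project's firmware.

```python
BOUNDARY = b"frame"  # hypothetical boundary token


def frame_part(jpeg: bytes) -> bytes:
    """Wrap one JPEG image as a part of a multipart/x-mixed-replace body."""
    header = b"".join([
        b"--", BOUNDARY, b"\r\n",
        b"Content-Type: image/jpeg\r\n",
        b"Content-Length: ", str(len(jpeg)).encode("ascii"), b"\r\n\r\n",
    ])
    return header + jpeg + b"\r\n"


def split_parts(stream: bytes) -> list:
    """Recover JPEG payloads from concatenated parts using Content-Length."""
    frames = []
    pos = 0
    while True:
        start = stream.find(b"--" + BOUNDARY, pos)
        if start < 0:
            break
        header_end = stream.find(b"\r\n\r\n", start)
        if header_end < 0:
            break  # headers not complete yet
        length = None
        for line in stream[start:header_end].split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1].strip().decode("ascii"))
        if length is None:
            break
        body_start = header_end + 4
        if body_start + length > len(stream):
            break  # payload not complete yet
        frames.append(stream[body_start:body_start + length])
        pos = body_start + length
    return frames


# Placeholder payloads standing in for real JPEG data.
fake_frames = [b"\xff\xd8A\xff\xd9", b"\xff\xd8BB\xff\xd9"]
body = b"".join(frame_part(f) for f in fake_frames)
print(split_parts(body) == fake_frames)
```

Reading the payload by `Content-Length` rather than searching for the next boundary avoids false matches when the boundary bytes happen to occur inside image data.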

Simplified Flow Diagram

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   OV2640        │     │   ESP32-CAM     │     │   Web Browser   │
│   Camera        │     │   (Web Server)  │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         │  Capture Frame        │                       │
         │──────────────────────▶│                       │
         │                       │  PSRAM Buffer         │
         │                       │  Face Detection (opt.)│
         │                       │  JPEG Encoding        │
         │                       │                       │
         │                       │  HTTP MJPEG Stream    │
         │                       │──────────────────────▶│
         │                       │                       │
         │                       │  Control Signals      │
         │                       │◀──────────────────────│

Face Detection & Recognition

Face Detection

The system performs real-time face detection entirely on the ESP32-CAM.

  • Runs entirely on-device
  • Detects human faces in real time
  • Draws bounding boxes on frames
  • No external processing required
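
To illustrate the bounding-box step, here is a minimal sketch that draws a rectangle outline into a small grayscale frame stored as a list of rows. The detector itself is out of scope. The box coordinates, the frame layout, and the function `draw_box` are assumptions made for the example.

```python
def draw_box(frame, x, y, w, h, value=255):
    """Draw a 1-pixel rectangle outline in place on a row-major frame.

    The box is clipped to the frame, so edges that fall outside are
    drawn on the nearest border (a simplification for this sketch).
    """
    height, width = len(frame), len(frame[0])
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w - 1, width - 1), min(y + h - 1, height - 1)
    if x0 > x1 or y0 > y1:
        return frame  # box lies entirely outside the frame
    for cx in range(x0, x1 + 1):  # top and bottom edges
        frame[y0][cx] = value
        frame[y1][cx] = value
    for cy in range(y0, y1 + 1):  # left and right edges
        frame[cy][x0] = value
        frame[cy][x1] = value
    return frame


# Hypothetical 6x5 frame and a detected face at (1, 1) sized 3x3.
frame = [[0] * 6 for _ in range(5)]
for row in draw_box(frame, 1, 1, 3, 3):
    print(row)
```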

Face Recognition

Face recognition (identifying whose face was detected) is not supported on ESP32 hardware due to processing limitations.

  • Not supported on ESP32 hardware
  • Requires significantly higher processing capability
  • Intended for ESP32-S3–based systems

Important Note

The "Enroll Face" option is limited by ESP32 hardware constraints. This project focuses on face detection only, not recognition.

Performance & Limitations

Performance

  • Real-time MJPEG video streaming

    Smooth video streaming directly to web browsers

  • Adjustable resolution and image quality

    Configurable settings to balance performance and quality

  • LED-assisted low-light capture

    Built-in LED flash for improved capture in low-light conditions

Limitations

  • Limited FPS due to CPU constraints

    Frame rate is limited by the ESP32's processing power

  • Face detection works best at low resolution

    Higher resolutions reduce frame rate and detection accuracy

  • No real-time face recognition on ESP32

    Face recognition requires more processing power than ESP32 provides
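
The frame-rate limitation can be made concrete with simple arithmetic: if the per-frame stages run one after another, the achievable frame rate is bounded by the reciprocal of their total time. The stage names and millisecond values below are invented for illustration only and are not measurements from this project.

```python
def estimated_fps(stage_ms: dict) -> float:
    """Upper bound on frame rate when stages run sequentially for each frame."""
    total_ms = sum(stage_ms.values())
    return 1000.0 / total_ms


# Hypothetical per-frame stage times in milliseconds (not measured values).
stream_only = {"capture": 40, "send": 30}
with_detection = {"capture": 40, "decode": 30, "detect": 120, "encode": 40, "send": 30}

print(f"stream only:    {estimated_fps(stream_only):.1f} fps")
print(f"with detection: {estimated_fps(with_detection):.1f} fps")
```

Whatever the real timings are, every stage added to the per-frame path lowers the bound, which is why enabling detection or raising the resolution reduces the frame rate.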

Applications

Basic Smart Surveillance (Prototype-level)

Suited to prototype-level home monitoring with real-time face detection.

IoT Visual Monitoring Nodes

Perfect for distributed IoT networks requiring visual monitoring and edge processing.

Embedded AI Demonstrations

Excellent platform for showcasing AI capabilities on resource-constrained devices.

Educational Projects

Great for teaching embedded systems, IoT, and computer vision concepts.

Attendance Systems (ESP32-S3)

With an upgrade to the ESP32-S3, the design could be extended to face recognition-based attendance tracking.

Prototype Industrial Monitoring

Suitable for monitoring production lines, safety compliance, and personnel tracking.

What This Project Does NOT Do

Out of Scope

  • No cloud-based processing
  • No face recognition on ESP32
  • No external AI hardware
  • No advanced video codecs (MJPEG only)

Conclusion

This project demonstrates how a low-cost ESP32-CAM can be used to build a complete, standalone camera web server with real-time MJPEG streaming and on-device face detection.

It highlights:

  • Embedded networking using HTTP
  • Practical limits of edge AI on ESP32
  • Real-time video streaming without cloud dependency

Future Improvements (Optional)

  • ESP32-S3 Migration for face recognition

    The ESP32-S3 adds vector instructions that speed up neural-network workloads, making face recognition more feasible than on the ESP32.

  • Enhanced Web Dashboard UI

    A more advanced web dashboard with analytics, detection history, and enhanced configuration options.

  • Event-driven capture and detection

    External AI processing could be explored in future iterations but is not part of the current implementation.
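
One possible reading of event-driven capture is to save a still only when a face is detected, and no more than once per cooldown period. The sketch below shows that decision logic in isolation. The class `EventCapture`, its parameters, and the 5-second cooldown are hypothetical and are not part of the current implementation.

```python
class EventCapture:
    """Decide when to save a frame: on a detection, at most once per cooldown."""

    def __init__(self, cooldown_s: float = 5.0):
        self.cooldown_s = cooldown_s
        self._last_saved = None  # time of the most recent saved frame

    def should_save(self, faces_detected: int, now_s: float) -> bool:
        if faces_detected <= 0:
            return False
        if self._last_saved is not None and now_s - self._last_saved < self.cooldown_s:
            return False  # still inside the cooldown window
        self._last_saved = now_s
        return True


# Hypothetical sequence of (faces detected, timestamp in seconds).
trigger = EventCapture(cooldown_s=5.0)
for faces, t in [(0, 0.0), (1, 1.0), (1, 3.0), (2, 6.5)]:
    print(t, trigger.should_save(faces, t))
```

The cooldown keeps a person standing in view from producing a saved image on every frame.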

About the Author

Mayank Kulkarni - Founder of MKTechs & Zervista

Embedded Systems | Full-Stack | IoT | AI

https://mayank.wiki

Expert in embedded systems, IoT, and edge AI technologies. Specializing in full-stack development and innovative technology solutions.