Building an Object Detector with TensorFlow

Jan 8, 2024 · 8 min read

Building an Object Detector with TensorFlow

Object detection has emerged as a core innovation with applications in various domains, ranging from computer vision, security and authentication measures, crime detection, transportation, real-time image analysis, etc. By leveraging the capabilities of TensorFlow in a web application, we can achieve efficient object detection using Tensor models. This article aims to provide insight into object detection and a practical example of steps to integrate object detection into a web application.

The importance of object detection in various fields cannot be overstated, as it offers numerous benefits:

AI-driven vehicles: By implementing object detection models, self-driven cars can easily detect and correctly identify street signs, traffic lights, people, and other locomotives, facilitating better navigation and decision-making while driving.
Security and surveillance: Object detection greatly enhances security systems, enabling surveillance systems to track and identify people and objects, monitor behaviors, and enhance security in public places such as airports.
Healthcare: As an application in the medical field, trained models can identify diseases and abnormalities in scans. These models allow such abnormalities to be detected timely and remedied.
Enhanced visual understanding and data representation: Using object detection, a computer system can understand and interpret graphical data and represent this information in a form its users can quickly assimilate.

What is object detection?

Object detection is an element of computer vision that uses Artificial Intelligence (AI) to derive meaningful information from images, videos, and visual inputs to the system. An object detection model detects, categorizes, and outlines objects in the visual input (often within a bounding box) and executes or recommends actions defined by that model.

Object detection models are primarily categorized by two key components: visual input localization and object classification. Visual input localization deals with locating objects in the input and assigning a bounding box around the detected objects. These boxes consider the detected object’s width, height, and position (x and y coordinates). Object classification assigns labels to detected objects, providing context as to what they are.

What is Tensorflow?

According to its docs, TensorFlow is an open-source machine learning framework developed by Google to provide a variety of tools and resources for machine learning-related tasks such as computer vision, deep learning, and natural language processing. It supports various programming languages, including popular languages such as Python and C++, making it accessible to broader users. With TensorFlow, users can create machine learning models that run on any platform, i.e., desktop, mobile, web, and cloud applications.

Setting up the development environment

For this article, we will be working with React.js and TensorFlow. To begin, carry out the steps below:

Set up a React app on your local machine, open up your project directory, and proceed with the next step to add the TensorFlow dependency.
Open up a shell window in your project root directory and install the TensorFlow dependencies with the following command:

npm i @tensorflow/tsfjs-backend-cpu @tensorflow/tfjs-backend-webgl @tensorflow/tfjs-converter @tensorflow/tfjs-core @tensorflow-models/coco-ssd

We will develop a web application that can detect objects using two methods. The first detects objects in an uploaded image, while the second detects objects from a camera source.

Object detection with image uploads

In this section, we will create an object detector that operates on uploaded images to identify objects present in them. First, create a component folder and a new file, ImageDetector.js. In this file, add the code below:

"use client";
import React, { useRef, useState } from "react";
import * as cocoSsd from "@tensorflow-models/coco-ssd";
import "@tensorflow/tfjs-backend-webgl";
import "@tensorflow/tfjs-backend-cpu";

const ImageDetector = () => {
 // We are going to use useRef to handle image selects
 const ImageSelectRef = useRef();

 // Image state data
 const [imageData, setImageData] = useState(null);

 // Function to open image selector
 const openImageSelector = () => {
   if (ImageSelectRef.current) {
     ImageSelectRef.current.click();
   }
 };

 // Read the converted image
 const readImage = (file) => {
   return new Promise((resolve, reject) => {
     const reader = new FileReader();
     reader.addEventListener("load", (e) => resolve(e.target.result));
     reader.addEventListener("error", reject);
     reader.readAsDataURL(file);
   });
 };

 // To display the selected image
 const onImageSelect = async (e) => {
   // Convert the selected image to base64
   const file = e.target.files[0];
   if (file && file.type.substr(0, 5) === "image") {
     console.log("image selected");
   } else {
     console.log("not an image");
   }

   const image = await readImage(file);
   // Set the image data
   setImageData(image);
 };

 return (
   <div
     style={{
       display: "flex",
       height: "100vh",
       justifyContent: "center",
       alignItems: "center",
       gap: "25px",
       flexDirection: "column",
     }}
   >
     <div
       style={{
         border: "1px solid black",
         minWidth: "50%",
         position: "relative"
       }}
     >
       {/* Display uploaded image here */}
       {
         //   If image data is null then display a text
         !imageData ? (
           <p
             style={{
               textAlign: "center",
               padding: "250px 0",
               fontSize: "20px",
             }}
           >
             Upload an image
           </p>
         ) : (
           <img
             src={imageData}
             alt="uploaded image"
           />
         )
       }
     </div>
     {/* Input image file */}
     <input
       style={{
         display: "none",
       }}
       type="file"
       ref={ImageSelectRef}
       onChange={onImageSelect}
     />
     <div
       style={{
         padding: "10px 15px",
         background: "blue",
         color: "#fff",
         fontSize: "48px",
         hover: {
           cursor: "pointer",
         },
       }}
     >
       {/* Upload image button */}
       <button onClick={() => openImageSelector()}>Select an Image</button>
     </div>
   </div>
 );
};

export default ImageDetector;

In the code snippet above, we use useRef and FileReader to handle image selection. When the “Select an Image” button is clicked, the hidden file input is triggered, and the file selection window opens. The selected image is converted to base64 and read using the JavaScript FileReader in an asynchronous function. When this happens, the imageData state is updated to contain the selected image, and a ternary operator displays the image.

To mount this component, we can add and render it in the project directory primary app/page.jsx:

import ImageDetector from "@/components/imagedetector";

export default function Home() {
 return (
   <div>
     <ImageDetector />
   </div>
 );
}

If we run the application, we will get the following result:

Creating the image object detector

To handle object detection in the selected image, we will create a new function that uses the TensorFlow Coco-ssd object detection model we earlier installed:

//...
// Function to detect objects in the image
const handleObjectDetection = async (imageElement) => {
 // load the model
 const model = await cocoSsd.load();
 // Detect objects in the image
 const predictions = await model.detect(imageElement, 5);
 console.log("Predictions: ", predictions);
};

The code above uses the cocoSsd package to handle object detection. We have also limited the number of detections to 5, as running many detections on the browser can sometimes result in slower results if the system it is being run on doesn’t possess sufficient GPU and CPU memory.

To create imageElement, we will create an element within the onImageSelect function and pass the data for the selected image to it.

// Within the onImageSelect function
//...
// Create an image element
const imageElement = document.createElement("img");
imageElement.src = image;
imageElement.onload = async () => {
 handleObjectDetection(imageElement);
};

The predictions returned by the handleObjectDetection function classify objects in the selected image and provide information on their position. Using this information, we will construct a bounding box. For example, we can add a picture of a dog and a cat and get the following results: The image below shows the logged predictions in the browser console: Upon closer inspection, we can see the predictions generated by the object detection model.

bbox: This provides information on the bounding box position and sizes.
class: This is a classification of objects in the image.
score: An estimate on the accuracy of the prediction. The closer this value is to 1, the more accurate the prediction is.

Constructing bounding box for predicted objects

For the predictions, we will create a state object which will be updated with the prediction data returned by the model.

// Handle predictions
const [predictionsData, setPredictionsData] = useState([]);

To update this state with the prediction data, make the following changes to handleObjectDetection:

// Set the predictions data
setPredictionsData(predictions);

Using the predictionsData array, we will create a function that generates a bounding box for each element in the image. To create bounding boxes of unique colors, we will first create an array of colors and a function to shuffle the array and provide a unique color section:

// An array of bounding box colors
const colors = ["blue", "green", "red", "purple", "orange"];

// Shuffle the array to randomize the color selection
function shuffleArray(array) {
 for (let i = array.length - 1; i > 0; i--) {
   const j = Math.floor(Math.random() * (i + 1));
   [array[i], array[j]] = [array[j], array[i]];
 }
}

shuffleArray(colors);

Finally, we can create and display the bounding box with the following lines of code:

{/* Create bounding box */}
{
 // If predictions data is not null then display the bounding box
 // Each box has a unique color
 predictionsData.length > 0 &&
   predictionsData.map((prediction, index) => {
     const selectedColor = colors.pop(); // Get the last color from the array and remove it
     return (
       <div
         key={index}
         style={{
           position: "absolute",
           top: prediction.bbox[1],
           left: prediction.bbox[0],
           width: prediction.bbox[2],
           height: prediction.bbox[3],
           border: `2px solid ${selectedColor}`,
           display: "flex",
           justifyContent: "center",
           alignItems: "center",
           flexDirection: "column",
         }}
       >
         <p
           style={{
             background: selectedColor,
             color: "#fff",
             padding: "10px 15px",
             fontSize: "20px",
             textTransform: "capitalize",
           }}
         >
           {prediction.class}
         </p>
         <p
           style={{
             background: selectedColor,
             color: "#fff",
             padding: "10px 15px",
             fontSize: "20px",
             textTransform: "capitalize",
           }}
         >
           {Math.round(parseFloat(prediction.score) * 100)}%
         </p>
       </div>
     );
   })
}
{/* Display uploaded image here */}
//... Other code below

In the browser, we now have the following results:

Running object detector model on a camera

In this section, we will create a component that runs object detection using visual data from a camera. To access the system webcam, we will install a new dependency, React-webcam:

npm i react-webcam

Accessing visual data with React-webcam

To access the system webcam with the React-webcam package, In the components directory, create a new file CameraObjectDetector.jsx and add the following code to it:

"use client";
import React, { useRef, useState, useEffect } from "react";
import * as cocoSsd from "@tensorflow-models/coco-ssd";
import Webcam from "react-webcam";

const CameraObjectDetector = () => {
 // Use useRef to handle webcam
 const webcamRef = useRef(null);
 // Use useRef to draw canvas
 const canvasRef = useRef(null);

 // Handle predictions
 const [predictionsData, setPredictionsData] = useState([]);

 //   Intialize cocoSsd
 const initCocoSsd = async () => {
   const model = await cocoSsd.load();
   setInterval(() => {
     detectCam(model);
   }, 5);
 };

 // Function to use webcam
 const detectCam = async (model) => {
   if (
     webcamRef.current !== undefined &&
     webcamRef.current !== null &&
     webcamRef.current.video.readyState === 4
   ) {
     // Get video properties
     const video = webcamRef.current.video;
     const videoWidth = webcamRef.current.video.videoWidth;
     const videoHeight = webcamRef.current.video.videoHeight;

     // Set video width and height
     webcamRef.current.video.width = videoWidth;
     webcamRef.current.video.height = videoHeight;

     // Set the canvas width and height
     canvasRef.current.width = videoWidth;
     canvasRef.current.height = videoHeight;

     // Detect objects in the image
     const predictions = await model.detect(video);
     console.log(predictions);

     // Draw canvas
     const ctx = canvasRef.current.getContext("2d");
   }
 };

 // useEffect to handle cocoSsd
 useEffect(() => {
   initCocoSsd();
 }, []);
 return (
   <div
     style={{
       height: "100vh",
       display: "flex",
       alignItems: "center",
       justifyContent: "center",
     }}
   >
     <Webcam
       ref={webcamRef}
       style={{
         position: "relative",
         marginLeft: "auto",
         marginRight: "auto",
         left: 0,
         right: 0,
         textAlign: "center",
         zindex: 9,
         width: 640,
         height: 480,
       }}
     />
     <canvas
       ref={canvasRef}
       style={{
         position: "absolute",
         marginLeft: "auto",
         marginRight: "auto",
         left: 0,
         right: 0,
         textAlign: "center",
         zindex: 9,
         width: 640,
         height: 480,
       }}
     />
   </div>
 );
};

export default CameraObjectDetector;

In the code block above, we have integrated React-webcam to return video footage captured by the detected system webcam. We are also passing this footage on to the cocoSsd model and logging predictions based on it. In the subsequent section, we will use this prediction data to create bounding boxes in the canvas element. To mount the component, make the following change to app/page.jsx:

import CameraObjectDetector from "@/components/CameraObjectDetector";
import ImageDetector from "@/components/imagedetector";

export default function Home() {
 return (
   <div>
     {/* <ImageDetector /> */}
     <CameraObjectDetector />
   </div>
 );
}

Now, if we run the application, we will get the following result in the browser: In the console, we can see the cocoSsd predictions: In the image above, the logged predictions contain the bbox, class, and the score value.

Drawing bounding boxes

To draw bounding boxes, we will create a function, drawBoundingBox, which takes in the predictions and constructs rectangles on objects in the video using the data.

// An array of bounding box colors
const colors = ["blue", "green", "red", "purple", "orange"];

// Shuffle the array to randomize the color selection
function shuffleArray(array) {
 for (let i = array.length - 1; i > 0; i--) {
   const j = Math.floor(Math.random() * (i + 1));
   [array[i], array[j]] = [array[j], array[i]];
 }
}

shuffleArray(colors);

// Counter to keep track of the next color to use
let colorCounter = 0;

// Draw bounding boxes
const drawBoundingBox = (predictions, ctx) => {
 predictions.forEach((prediction) => {
   const selectedColor = colors[colorCounter]; // Use the next color from the array
   colorCounter = (colorCounter + 1) % colors.length; // Increment the counter
   // Get prediction results
   const [x, y, width, height] = prediction.bbox;
   const text = prediction.class;

   // Set styling
   ctx.strokeStyle = selectedColor;
   ctx.font = "40px Montserrat";
   ctx.fillStyle = selectedColor;

   // Draw rectangle and text
   ctx.beginPath();
   ctx.fillText(text, x, y);
   ctx.rect(x, y, width, height);
   ctx.stroke();
 });
};

In the code block above, we created an array of colors that we would use for the bounding boxes. Using the predictions state, we can access the bbox property for the object’s dimensions and the class property to classify the displayed object. Next, we use the canvas prop ctx to draw the rectangle and text for the bounding box.

For the final step, we will pass on the canvas property and predictions to the drawBoundingBox function in detectCam:

const detectCam = async (model) => {
 if (
   webcamRef.current !== undefined &&
   webcamRef.current !== null &&
   webcamRef.current.video.readyState === 4
 ){
   //... Former code here
   // Draw canvas
   const ctx = canvasRef.current.getContext("2d");
   drawBoundingBox(predictions, ctx);
 }
};

Now, we get the bounding boxes on the video when we run the application:

Conclusion

This article explored the integration of object detection capabilities in a web application. We began by introducing the concept of object detection and domains where its applications are beneficial, and we finally created an implementation of object detection on the web using React.js and Tensorflow.

The capabilities presented by object detection using Tensorflow are limitless. Its other custom models can further explore them, creating interactions and applications for real-world scenarios such as surveillance, augmented reality systems, etc.

Understand every bug

Uncover frustrations, understand bugs and fix slowdowns like never before with OpenReplay — the open-source session replay tool for developers. Self-host it in minutes, and have complete control over your customer data. Check our GitHub repo and join the thousands of developers in our community.