
Controlling Operating System and Applications using Hand Motion Gestures

Project Overview

This repository contains the implementation of a real-time hand gesture recognition system developed as a final-year project for the Bachelor of Engineering in Computer Engineering. The system enables users to interact with their computer through intuitive hand motions captured via a standard or integrated webcam. By processing the live video feed with deep learning models, the application identifies specific gestures and translates them into system-level commands or application-specific actions.

The primary goal of this project is to simplify human-computer interaction by providing a natural alternative to traditional input devices. The system is particularly effective for managing media playback, navigating user interfaces, and executing repetitive tasks through simple gestures.

Key Features

  • Live Gesture Recognition: High-frequency detection and classification of hand gestures from real-time video streams.
  • Hybrid 2D-3D CNN Architecture: A neural network design that uses 2D convolutional layers for spatial feature extraction and 3D convolutional layers for capturing temporal motion dynamics.
  • Context-Aware Execution: A modular design allows recognized gestures to be mapped to different commands based on the active application or system state.
  • Support for Diverse Gestures: Pre-trained to recognize various motions, including swiping, sliding, thumb signals, and static signs.
  • Automation Integration: Leverages PyAutoGUI to simulate keyboard and mouse events across the operating system.
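The automation layer described above can be sketched as follows. This is a minimal illustration, not the repository's actual dispatch code: the `press_hotkey` helper and its dry-run mode are assumptions for demonstration, and the `win+d` combination is just an example binding.

```python
# Sketch: turning a recognized gesture into a simulated keyboard event.
# PyAutoGUI is optional here; the dry-run path lets the logic be exercised
# without actually sending input to the operating system.
try:
    import pyautogui
except ImportError:
    pyautogui = None

def press_hotkey(keys, dry_run=False):
    """Send a hotkey combination, or just report it in dry-run mode."""
    if dry_run or pyautogui is None:
        return "would press: " + "+".join(keys)
    pyautogui.hotkey(*keys)  # simulate the key combination system-wide
    return "pressed: " + "+".join(keys)

print(press_hotkey(["win", "d"], dry_run=True))  # → would press: win+d
```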

System Design

The application architecture is divided into three main components:

  1. Preprocessing Module: Handles video acquisition from the webcam, along with frame resizing and normalization, to ensure consistent input for the neural network.
  2. Recognition Engine: A deep learning model that analyzes sequences of 16 frames to identify gestures with high confidence. The model uses spatio-temporal feature extraction to distinguish between similar motions.
  3. Execution Layer: Maps the identified gesture to a specific system hotkey or command, enabling hands-free control of the computer.
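The preprocessing and buffering steps above can be sketched with a sliding 16-frame window. This is an illustrative assumption of how the pipeline might be organized: frame capture and resizing (typically done with OpenCV) are stubbed out with random arrays standing in for 112x112 RGB frames, and the frame size itself is an assumed value.

```python
from collections import deque

import numpy as np

SEQ_LEN = 16  # the recognition engine consumes 16-frame sequences

def normalize(frame):
    """Scale pixel values to [0, 1] and cast to float32 for the network."""
    return frame.astype(np.float32) / 255.0

buffer = deque(maxlen=SEQ_LEN)  # sliding window over the live stream

for _ in range(20):  # stand-in for 20 frames arriving from the webcam
    frame = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
    buffer.append(normalize(frame))
    if len(buffer) == SEQ_LEN:
        clip = np.stack(buffer)  # shape (16, 112, 112, 3), ready for inference
        # the recognition engine would run on `clip` here

print(clip.shape)  # → (16, 112, 112, 3)
```

Using a `deque` with `maxlen` means old frames are discarded automatically, so the engine always sees the most recent 16-frame window.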

Technical Specifications

The core engine is built using PyTorch and follows a hybrid architecture:

  • Spatial Feature Extractor (2D CNN): Processes individual frames to identify hand contours and positions.
  • Temporal Feature Extractor (3D CNN): Analyzes the change in spatial features over time to recognize the direction and nature of the movement.
  • Classifier: A fully connected network that maps extracted features to one of 27 gesture classes.
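A minimal PyTorch sketch of this hybrid layout is shown below. The layer widths, kernel sizes, and single-block depth are illustrative assumptions, not the repository's actual topology or weights; only the overall 2D-then-3D structure and the 27-class output follow the description above.

```python
import torch
import torch.nn as nn

class HybridGestureNet(nn.Module):
    """Toy hybrid 2D-3D CNN: per-frame spatial features, then temporal fusion."""

    def __init__(self, num_classes=27):
        super().__init__()
        # 2D CNN: per-frame spatial features (hand contours, positions)
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # 3D CNN: motion dynamics across the 16-frame window
        self.temporal = nn.Sequential(
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clips):  # clips: (batch, seq, 3, H, W)
        b, t, c, h, w = clips.shape
        feats = self.spatial(clips.view(b * t, c, h, w))  # fold time into batch
        feats = feats.view(b, t, 16, h // 2, w // 2)
        feats = feats.permute(0, 2, 1, 3, 4)              # (b, channels, t, H/2, W/2)
        feats = self.temporal(feats).flatten(1)           # (b, 32)
        return self.classifier(feats)

model = HybridGestureNet()
logits = model(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # → torch.Size([2, 27])
```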

Getting Started

Prerequisites

  • Python 3.6 or newer
  • Standard webcam
  • A CUDA-capable GPU is recommended for optimal performance

Installation

  1. Clone the repository:

    git clone https://github.com/rishabhpatel9/Gesture-Control.git
    cd Gesture-Control
  2. Install the required dependencies:

    pip install -r requirements.txt

Application Configuration

The configs.json file contains paths for the models and datasets. For local execution, ensure that the paths for pre-trained weights in the models_jester directory are correctly configured.
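For illustration, a local configs.json might look like the following. The key names and values here are assumptions for demonstration; consult the repository's actual configs.json for the real schema.

```json
{
    "model_path": "models_jester/pretrained_weights.pth",
    "num_classes": 27,
    "sequence_length": 16,
    "camera_index": 0
}
```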

Usage

To start the gesture recognition system, execute the demo.py script:

    python demo.py

  • A preview window will appear showing the webcam feed.
  • Perform gestures within the indicated green frame.
  • The classified gesture and its confidence value will be displayed in the console and on screen.
  • Press Q to exit the application.

Gesture Mappings

The following default mappings are configured in the current version:

Gesture              Action
Stop Sign            Open Start Menu / System Launcher
Thumb Up             Launch Primary Application (linked to Taskbar 1)
Thumb Down           Open File Explorer
Swiping Up           Increase Volume or Page Up
Swiping Down         Minimize All Windows
Pushing Hand Away    Zoom In
Pulling Hand In      Zoom Out
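The table above could be represented in code as a simple lookup from gesture label to hotkey sequence. The key combinations below are illustrative, Windows-style assumptions; the repository's actual bindings live in its own mapping logic.

```python
# Hypothetical gesture-to-hotkey table mirroring the default mappings above.
GESTURE_ACTIONS = {
    "Stop Sign": ["win"],                # open Start Menu / launcher
    "Thumb Up": ["win", "1"],            # launch app pinned at taskbar slot 1
    "Thumb Down": ["win", "e"],          # open File Explorer
    "Swiping Up": ["volumeup"],          # raise volume
    "Swiping Down": ["win", "m"],        # minimize all windows
    "Pushing Hand Away": ["ctrl", "+"],  # zoom in
    "Pulling Hand In": ["ctrl", "-"],    # zoom out
}

def resolve(gesture):
    """Return the hotkey sequence for a recognized gesture, or None."""
    return GESTURE_ACTIONS.get(gesture)

print(resolve("Thumb Down"))  # → ['win', 'e']
```

Keeping the mapping in a plain dictionary makes it easy to swap in per-application profiles: each profile is just another dictionary with the same gesture keys.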

Future Scope

  • Integration of more complex gesture sequences for advanced application control.
  • Adding script/macro execution for automating tasks.
  • Implementation of higher-precision hand tracking to improve recognition under variable lighting conditions.
  • Expansion of the command mapping library to support more third party applications out of the box.

Acknowledgements

The foundational model architecture and training methodology were adapted from the RT_GestureRecognition project by Fabio Baldissera.

About

Controlling the OS and applications with hand motion gestures, based on a profile of which application is currently open. Each application can be assigned a profile that repurposes all gestures for that specific app.
