This repository contains the implementation of a real-time hand gesture recognition system developed as a final-year project for the Bachelor of Engineering in Computer Engineering. The system enables users to interact with their computer through intuitive hand motions captured via a standard or integrated webcam. By processing live video feeds with deep learning models, the application identifies specific gestures and translates them into system-level commands or application-specific actions.
The primary goal of this project is to simplify human-computer interaction by providing a natural alternative to traditional input devices. The system is particularly effective for managing media playback, navigating user interfaces, and executing repetitive tasks through simple gestures.
- Live Gesture Recognition: High-frequency detection and classification of hand gestures from a real-time video stream.
- Hybrid 2D-3D CNN Architecture: A neural network design that uses 2D convolutional layers for spatial feature extraction and 3D convolutional layers for capturing temporal motion dynamics.
- Context-Aware Execution: A modular design allows recognized gestures to be mapped to different commands based on the active application or system state.
- Diverse Gesture Support: Pre-trained to recognize a variety of motions, including swiping, sliding, thumb signals, and static signs.
- Automation Integration: Leverages PyAutoGUI to simulate keyboard and mouse events across the operating system.
The application architecture is divided into three main components:
- Preprocessing Module: Handles video acquisition from the webcam, frame resizing, and normalization to ensure consistent input for the neural network.
- Recognition Engine: A deep learning model that analyzes sequences of 16 frames to identify gestures with high confidence. The model uses spatio-temporal feature extraction to distinguish between similar motions.
- Execution Layer: Maps the identified gesture to a specific system hotkey or command, enabling hands-free control of the computer.
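The frame-sequencing step of the recognition engine can be sketched as a sliding window: frames are normalized as they arrive, and the most recent 16 are stacked into a clip for the network. This is an illustrative sketch under stated assumptions, not the repository's actual preprocessing code; the class name and array shapes are hypothetical (resizing, e.g. via `cv2.resize`, is omitted).

```python
from collections import deque

import numpy as np

CLIP_LEN = 16  # the recognition engine consumes sequences of 16 frames


class ClipBuffer:
    """Sliding window of preprocessed frames fed to the network."""

    def __init__(self, clip_len=CLIP_LEN):
        self.frames = deque(maxlen=clip_len)

    def push(self, frame):
        # Normalize pixel values to [0, 1]; frame resizing would also
        # happen here in a real preprocessing module.
        self.frames.append(frame.astype(np.float32) / 255.0)

    def ready(self):
        # True once a full 16-frame clip has accumulated
        return len(self.frames) == self.frames.maxlen

    def clip(self):
        # Stack to (T, H, W, C) for the spatio-temporal model
        return np.stack(self.frames)
```

Using a `deque` with `maxlen` keeps the window bounded automatically, so the model can be re-run on every new frame without re-copying the whole history.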
The core engine is built using PyTorch and follows a hybrid architecture:
- Spatial Feature Extractor (2D CNN): Processes individual frames to identify hand contours and positions.
- Temporal Feature Extractor (3D CNN): Analyzes the change in spatial features over time to recognize the direction and nature of the movement.
- Classifier: A fully connected network that maps extracted features to one of 27 gesture classes.
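A minimal PyTorch sketch of such a hybrid network is shown below. The layer widths, kernel sizes, and class name are illustrative assumptions; only the overall 2D-then-3D structure and the 27-class output follow from the description above.

```python
import torch
import torch.nn as nn


class Hybrid2D3DNet(nn.Module):
    """Illustrative 2D-then-3D CNN for gesture clips (hypothetical sizes)."""

    def __init__(self, num_classes=27):
        super().__init__()
        # 2D spatial extractor, applied to each frame independently
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 3D temporal extractor over the stacked per-frame feature maps
        self.temporal = nn.Sequential(
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Fully connected classifier over the pooled features
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                 # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        # Fold time into the batch so the 2D conv sees single frames
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        f = self.spatial(x)               # (B*T, 32, H/4, W/4)
        _, fc, fh, fw = f.shape
        # Restore the time axis for the 3D convolution
        f = f.reshape(b, t, fc, fh, fw).permute(0, 2, 1, 3, 4)
        f = self.temporal(f).flatten(1)   # (B, 64)
        return self.classifier(f)         # (B, num_classes)
```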
- Python 3.6 or newer
- Standard webcam
- Hardware with CUDA support is recommended for optimal performance
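To check whether CUDA acceleration is available before running the demo, a quick probe such as the following can be used (assuming PyTorch is installed):

```python
import torch

# Select the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running inference on: {device}")
```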
Clone the repository:

```shell
git clone https://github.com/rishabhpatel9/Gesture-Control.git
cd Gesture-Control
```
Install the required dependencies:

```shell
pip install -r requirements.txt
```
The `configs.json` file contains paths for the models and datasets. For local execution, ensure that the paths to the pre-trained weights in the `models_jester` directory are correctly configured.
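Purely as an illustration, a configuration of this shape would match the description above; the actual key names and file names in `configs.json` may differ, and the weight file name here is hypothetical:

```json
{
  "model_path": "models_jester/model_weights.pth",
  "num_classes": 27,
  "clip_length": 16
}
```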
To start the gesture recognition system, execute the `demo.py` script:

```shell
python demo.py
```

- A preview window will appear showing the webcam feed.
- Perform gestures within the indicated green frame.
- The classified gesture and its confidence value are displayed in the console and on screen.
- Press `Q` to exit the application.
The following default mappings are configured in the current version:
| Gesture | Action |
|---|---|
| Stop Sign | Open Start Menu / System Launcher |
| Thumb Up | Launch Primary Application (linked to Taskbar 1) |
| Thumb Down | Open File Explorer |
| Swiping Up | Increase Volume or Page Up |
| Swiping Down | Minimize All Windows |
| Pushing Hand Away | Zoom In |
| Pulling Hand In | Zoom Out |
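The mapping table above can be represented as a simple dispatch dictionary. This is a hedged sketch: the label strings and hotkey sequences are assumptions, and the hotkey executor is injected as a parameter so that the real application could pass `pyautogui.hotkey` while tests use a stub.

```python
# Hypothetical mapping from recognized gesture labels to hotkey sequences;
# the names mirror the table above, but the exact strings are assumptions.
GESTURE_HOTKEYS = {
    "Stop Sign":         ("win",),
    "Thumb Up":          ("win", "1"),
    "Thumb Down":        ("win", "e"),
    "Swiping Up":        ("pageup",),
    "Swiping Down":      ("win", "m"),
    "Pushing Hand Away": ("ctrl", "+"),
    "Pulling Hand In":   ("ctrl", "-"),
}


def execute(gesture, hotkey_fn):
    """Dispatch a recognized gesture to a hotkey executor.

    hotkey_fn is injected so the mapping can be unit-tested; in the real
    application it would be pyautogui.hotkey.
    """
    keys = GESTURE_HOTKEYS.get(gesture)
    if keys is None:
        return False  # unmapped gestures are ignored
    hotkey_fn(*keys)
    return True
```

Keeping the mapping in a plain dictionary is what makes the execution layer context-aware: swapping in a different dictionary per active application changes the behavior without touching the recognition engine.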
- Integration of more complex gesture sequences for advanced application control.
- Adding script/macro execution for automating tasks.
- Implementation of higher-precision hand tracking to improve recognition in variable lighting conditions.
- Expansion of the command mapping library to support more third party applications out of the box.
The foundational model architecture and training methodology were adapted from the RT_GestureRecognition project by Fabio Baldissera.