This repository contains the implementation of a real-time hand gesture recognition system developed as a final-year project for the Bachelor of Engineering in Computer Engineering. The system enables users to interact with their computer through intuitive hand motions captured via a standard or integrated webcam. By processing live video feeds with deep learning models, the application identifies specific gestures and translates them into system-level commands or application-specific actions.
The primary goal of this project is to simplify human-computer interaction by providing a natural alternative to traditional input devices. The system is particularly effective for managing media playback, navigating user interfaces, and executing repetitive tasks through simple gestures.
- Live Gesture Recognition: High-frequency detection and classification of hand gestures from a real-time video stream.
- Hybrid 2D-3D CNN Architecture: A neural network design that uses 2D convolutional layers for spatial feature extraction and 3D convolutional layers for capturing temporal motion dynamics.
- Context-Aware Execution: A modular design allows recognized gestures to be mapped to different commands based on the active application or system state.
- Diverse Gesture Support: Pre-trained to recognize a variety of motions, including swiping, sliding, thumb signals, and static signs.
- Automation Integration: Leverages PyAutoGUI to simulate keyboard and mouse events across the operating system.
The application architecture is divided into three main components:
- Preprocessing Module: Handles video acquisition from the webcam, frame resizing, and normalization to ensure consistent input for the neural network.
- Recognition Engine: A deep learning model that analyzes sequences of 16 frames to identify gestures with high confidence. The model uses spatio-temporal feature extraction to distinguish between similar motions.
- Execution Layer: Maps the identified gesture to a specific system hotkey or command, enabling hands-free control of the computer.
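The frame-sequencing step of the recognition engine can be sketched as a sliding window: frames are normalized as they arrive, and the most recent 16 are stacked into a clip for the network. This is an illustrative sketch under stated assumptions, not the repository's actual preprocessing code; the class name and array shapes are hypothetical (resizing, e.g. via `cv2.resize`, is omitted).

```python
from collections import deque

import numpy as np

CLIP_LEN = 16  # the recognition engine consumes sequences of 16 frames


class ClipBuffer:
    """Sliding window of preprocessed frames fed to the network."""

    def __init__(self, clip_len=CLIP_LEN):
        self.frames = deque(maxlen=clip_len)

    def push(self, frame):
        # Normalize pixel values to [0, 1]; frame resizing would also
        # happen here in a real preprocessing module.
        self.frames.append(frame.astype(np.float32) / 255.0)

    def ready(self):
        # True once a full 16-frame clip has accumulated
        return len(self.frames) == self.frames.maxlen

    def clip(self):
        # Stack to (T, H, W, C) for the spatio-temporal model
        return np.stack(self.frames)
```

Using a `deque` with `maxlen` keeps the window bounded automatically, so the model can be re-run on every new frame without re-copying the whole history.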
The core engine is built using PyTorch and follows a hybrid architecture:
- Spatial Feature Extractor (2D CNN): Processes individual frames to identify hand contours and positions.
- Temporal Feature Extractor (3D CNN): Analyzes the change in spatial features over time to recognize the direction and nature of the movement.
- Classifier: A fully connected network that maps extracted features to one of 27 gesture classes.
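A minimal PyTorch sketch of such a hybrid network is shown below. The layer widths, kernel sizes, and class name are illustrative assumptions; only the overall 2D-then-3D structure and the 27-class output follow from the description above.

```python
import torch
import torch.nn as nn


class Hybrid2D3DNet(nn.Module):
    """Illustrative 2D-then-3D CNN for gesture clips (hypothetical sizes)."""

    def __init__(self, num_classes=27):
        super().__init__()
        # 2D spatial extractor, applied to each frame independently
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 3D temporal extractor over the stacked per-frame feature maps
        self.temporal = nn.Sequential(
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Fully connected classifier over the pooled features
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                 # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        # Fold time into the batch so the 2D conv sees single frames
        x = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        f = self.spatial(x)               # (B*T, 32, H/4, W/4)
        _, fc, fh, fw = f.shape
        # Restore the time axis for the 3D convolution
        f = f.reshape(b, t, fc, fh, fw).permute(0, 2, 1, 3, 4)
        f = self.temporal(f).flatten(1)   # (B, 64)
        return self.classifier(f)         # (B, num_classes)
```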
- Python 3.6 or newer
- Standard webcam
- Hardware with CUDA support is recommended for optimal performance
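To check whether CUDA acceleration is available before running the demo, a quick probe such as the following can be used (assuming PyTorch is installed):

```python
import torch

# Select the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running inference on: {device}")
```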
Clone the repository:

```shell
git clone https://github.com/rishabhpatel9/Gesture-Control.git
cd Gesture-Control
```
Install the required dependencies:

```shell
pip install -r requirements.txt
```
The `configs.json` file contains paths for the models and datasets. For local execution, ensure that the paths to the pre-trained weights in the `models_jester` directory are correctly configured.
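Purely as an illustration, a configuration of this shape would match the description above; the actual key names and file names in `configs.json` may differ, and the weight file name here is hypothetical:

```json
{
  "model_path": "models_jester/model_weights.pth",
  "num_classes": 27,
  "clip_length": 16
}
```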
To start the gesture recognition system, execute the `demo.py` script:

```shell
python demo.py
```

- A preview window will appear showing the webcam feed.
- Perform gestures within the indicated green frame.
- The classified gesture and its confidence value are displayed in the console and on screen.
- Press `Q` to exit the application.
The following default mappings are configured in the current version:
| Gesture | Action |
|---|---|
| Stop Sign | Open Start Menu / System Launcher |
| Thumb Up | Launch Primary Application (linked to Taskbar 1) |
| Thumb Down | Open File Explorer |
| Swiping Up | Increase Volume or Page Up |
| Swiping Down | Minimize All Windows |
| Pushing Hand Away | Zoom In |
| Pulling Hand In | Zoom Out |
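The mapping table above can be represented as a simple dispatch dictionary. This is a hedged sketch: the label strings and hotkey sequences are assumptions, and the hotkey executor is injected as a parameter so that the real application could pass `pyautogui.hotkey` while tests use a stub.

```python
# Hypothetical mapping from recognized gesture labels to hotkey sequences;
# the names mirror the table above, but the exact strings are assumptions.
GESTURE_HOTKEYS = {
    "Stop Sign":         ("win",),
    "Thumb Up":          ("win", "1"),
    "Thumb Down":        ("win", "e"),
    "Swiping Up":        ("pageup",),
    "Swiping Down":      ("win", "m"),
    "Pushing Hand Away": ("ctrl", "+"),
    "Pulling Hand In":   ("ctrl", "-"),
}


def execute(gesture, hotkey_fn):
    """Dispatch a recognized gesture to a hotkey executor.

    hotkey_fn is injected so the mapping can be unit-tested; in the real
    application it would be pyautogui.hotkey.
    """
    keys = GESTURE_HOTKEYS.get(gesture)
    if keys is None:
        return False  # unmapped gestures are ignored
    hotkey_fn(*keys)
    return True
```

Keeping the mapping in a plain dictionary is what makes the execution layer context-aware: swapping in a different dictionary per active application changes the behavior without touching the recognition engine.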
- Integration of more complex gesture sequences for advanced application control.
- Adding script/macro execution for automating tasks.
- Implementation of higher-precision hand tracking to improve recognition in variable lighting conditions.
- Expansion of the command mapping library to support more third party applications out of the box.
The foundational model architecture and training methodology were adapted from the RT_GestureRecognition project by Fabio Baldissera.