Personal-Project

About this project:

This project analyses demographic and household statistics across Malaysian states using Python. The dataset includes variables such as population size, age distribution, household structure, and urbanisation rates.

Using libraries such as pandas, matplotlib, and scikit-learn, the project performs data cleaning, correlation analysis, regression, K-Means clustering, and Principal Component Analysis (PCA). The goal is to identify patterns and relationships between demographic factors and household characteristics and visualise how states differ in their demographic profiles.

Dataset:

The dataset was retrieved from the Department of Statistics Malaysia and contains demographic and household statistics for Malaysian states.

Key variables include:

Population (thousands)
Age distribution (0–14, 15–64, 65+)
Total, urban, and rural households
Average household size
Urbanisation rate

The data was cleaned and processed using pandas before analysis.
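
As a rough illustration of the cleaning step, the sketch below strips whitespace from state names, coerces numeric columns, and drops duplicate or incomplete rows. The column names and the inline sample rows are made up for this example; the real layout of pop_stats.csv may differ, and analysis.py may clean the data differently.

```python
import io

import pandas as pd

# Made-up sample rows with hypothetical column names (not the real pop_stats.csv).
RAW_CSV = """state,population_thousands,age_0_14_pct,avg_household_size
 State A ,1000,25.0,4.0
State B,2000,22.5,3.5
State B,2000,22.5,3.5
State C,unknown,20.0,3.8
"""


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a tidied copy: trimmed names, numeric columns, no duplicates or gaps."""
    df = df.copy()
    df["state"] = df["state"].str.strip()

    # Convert every non-name column to numbers; unparseable entries become NaN.
    numeric_cols = [col for col in df.columns if col != "state"]
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # Drop exact duplicate rows, then rows that still lack a numeric value.
    df = df.drop_duplicates().dropna(subset=numeric_cols)
    return df.reset_index(drop=True)


cleaned = clean(pd.read_csv(io.StringIO(RAW_CSV)))
print(cleaned)
```

To try this on the real file, replace `io.StringIO(RAW_CSV)` with the path to pop_stats.csv and adjust the column names to match.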

Analysis:

Correlation Analysis
Regression
K-Means Clustering
Principal Component Analysis (PCA)
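
The four steps listed above can be sketched with pandas and scikit-learn as follows. This is a minimal example under stated assumptions, not the code in analysis.py: the data is synthetic, the column names are hypothetical, and the choices of a single-predictor linear regression, three clusters, and two principal components are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with hypothetical column names (16 made-up rows).
rng = np.random.default_rng(0)
n_rows = 16
youth = rng.uniform(18, 32, n_rows)
urban = rng.uniform(40, 100, n_rows)
df = pd.DataFrame({
    "youth_pct": youth,
    "elderly_pct": rng.uniform(4, 10, n_rows),
    "urbanisation_pct": urban,
    "avg_household_size": 2.0 + 0.08 * youth - 0.005 * urban
                          + rng.normal(0, 0.1, n_rows),
})

# 1. Correlation analysis: pairwise Pearson correlations between all columns.
corr = df.corr()

# 2. Regression: household size as a linear function of the youth share.
X = df[["youth_pct"]]
y = df["avg_household_size"]
reg = LinearRegression().fit(X, y)
r2 = reg.score(X, y)  # coefficient of determination on the same data

# 3. K-Means clustering on standardised features, so that no single
#    variable dominates the distance calculation because of its scale.
scaled = StandardScaler().fit_transform(df)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# 4. PCA: project the standardised features onto two components for plotting.
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

print(corr.round(2))
print("slope:", reg.coef_[0], "R^2:", r2)
print("cluster labels:", labels)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Standardising before K-Means and PCA is a common choice when variables are measured in different units (percentages, persons per household), but whether the project does so is an assumption here.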

Visualisations

The analysis generates several plots, including:

Youth population vs household size
Elderly population vs household size
Urbanisation vs household size
Cluster visualisation with regression lines
PCA plot of Malaysian states
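
A scatter plot with a fitted line, similar in spirit to the plots listed above, could be produced with Matplotlib roughly as shown here. The helper name `scatter_with_fit`, the synthetic data, and the output file name are all hypothetical; the actual plotting code in analysis.py may look different.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_fit(x, y, xlabel, ylabel, out_path):
    """Save a scatter plot of y against x with a least-squares line (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, deg=1)  # degree-1 fit: [slope, intercept]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(x, y, label="States")
    xs = np.linspace(x.min(), x.max(), 100)
    ax.plot(xs, slope * xs + intercept, color="red", label="Least-squares fit")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return slope, intercept


# Synthetic values for illustration only.
rng = np.random.default_rng(1)
youth = rng.uniform(18, 32, 16)
household_size = 2.0 + 0.08 * youth + rng.normal(0, 0.1, 16)

out_path = os.path.join(tempfile.gettempdir(), "example_youth_vs_household_size.png")
slope, intercept = scatter_with_fit(
    youth, household_size,
    "Youth population (%)", "Average household size", out_path,
)
print("saved", out_path)
```

The example writes to the system temporary directory; point `out_path` at the project folder to keep the image alongside the other plots.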

Libraries

pandas
Matplotlib
scikit-learn
NumPy

How to Run

Install dependencies: pip install pandas matplotlib scikit-learn numpy
Place the dataset file pop_stats.csv in the project folder.
Run the script: python analysis.py

Future Improvements

Add interactive visualisations
Include more demographic variables
Apply additional clustering evaluation methods
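
One possible clustering evaluation method is the silhouette score, which can be compared across several values of k. The sketch below is only a suggestion of how that might look: it runs on synthetic two-dimensional points rather than the project's data, and the range of k values is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic points scattered around three made-up centres (illustration only).
rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]])
features = np.vstack([c + rng.normal(0, 0.5, size=(6, 2)) for c in centres])
scaled = StandardScaler().fit_transform(features)

# Fit K-Means for each candidate k and record the mean silhouette score.
# Scores range from -1 to 1; higher means tighter, better separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("highest silhouette at k =", best_k)
```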

