A comprehensive full-stack data visualization dashboard designed to analyze and display insights from over 7,500 StackOverflow threads tagged with Java. This project explores topic trends, tag co-occurrences, common multithreading pitfalls, and factors affecting question solvability.
Demo video: so-java-stats.mp4
.
├── backend/so-java-stats
│   ├── Dockerfile
│   ├── pom.xml
│   ├── src/main
│   │   ├── java/com/sustech/so_java_stats
│   │   │   ├── SoJavaStatsApplication.java
│   │   │   ├── config/       # CORS and OpenAPI/Swagger config
│   │   │   ├── controller/   # REST API Endpoints
│   │   │   ├── dto/          # Data Transfer Objects for API responses
│   │   │   ├── init/         # Database Initializer (JSON to DB logic)
│   │   │   ├── model/        # JPA Entities (Question, Answer, Tag, etc.)
│   │   │   ├── repository/   # Spring Data JPA Repositories
│   │   │   └── service/      # Business Logic interfaces and implementations
│   │   └── resources
│   │       ├── application.yaml   # App configuration
│   │       └── data/              # raw data source [7500 thread_X.json files]
│   └── mvnw
├── frontend
│   ├── Dockerfile
│   ├── index.html
│   ├── nginx.conf           # Production server config
│   ├── package.json
│   ├── postcss.config.js
│   ├── tailwind.config.js   # Styling configuration
│   ├── vite.config.js       # Build tool configuration
│   └── src
│       ├── App.jsx          # Main React component
│       ├── main.jsx         # Entry point
│       ├── components
│       │   ├── QuestionModal.jsx
│       │   └── charts/      # Dashboard Visualization components
│       │       ├── MultithreadingPitfalls.jsx
│       │       ├── QuestionSolvability.jsx
│       │       ├── TopicCooccurrences.jsx
│       │       └── TopicTrends.jsx
│       ├── utils/           # Export/Download utilities
│       └── index.css
├── data-scraper
│   ├── pom.xml
│   └── src/main/java/DataScraper.java   # Scraper entry point
├── docker-compose.yaml
└── README.md
- Topic Trends: A time-series analysis showing the volume of questions for specific Java sub-topics (e.g., Spring, JVM, Multithreading) over time.
- Topic Co-occurrences: A relationship graph/chart identifying which technologies are most frequently paired with Java.
- Concurrency Analysis: A regex-based deep dive into question bodies and answers to identify the most common multithreading "pitfalls" (Deadlocks, Race Conditions, etc.).
- Solvability Analysis: Comparison of metrics between "Solvable" (accepted answer) and "Hard-to-solve" questions, looking at user reputation and body length.
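To make the regex-based Concurrency Analysis more concrete, here is a minimal sketch of how thread texts could be bucketed into pitfall categories. The class name, the two categories, and the patterns themselves are illustrative assumptions; the project's actual regex list is not shown in this README and may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: not the project's actual classifier.
final class PitfallSketch {
    // Assumed categories and patterns, chosen only for illustration.
    static final Map<String, Pattern> PITFALLS = new LinkedHashMap<>();
    static {
        PITFALLS.put("Deadlock",
            Pattern.compile("\\bdead-?lock(s|ed|ing)?\\b", Pattern.CASE_INSENSITIVE));
        PITFALLS.put("Race Condition",
            Pattern.compile("\\brace\\s+conditions?\\b|\\bdata\\s+races?\\b", Pattern.CASE_INSENSITIVE));
    }

    // Counts how many texts mention each pitfall at least once
    // (a text that mentions "deadlock" three times still counts once).
    static Map<String, Integer> countPitfalls(List<String> texts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : PITFALLS.keySet()) {
            counts.put(name, 0);
        }
        for (String text : texts) {
            for (Map.Entry<String, Pattern> entry : PITFALLS.entrySet()) {
                if (entry.getValue().matcher(text).find()) {
                    counts.merge(entry.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Calling `PitfallSketch.countPitfalls(...)` with a list of question and answer bodies yields one count per category, which is the kind of shape a bar chart component could consume.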
The data-scraper module is a standalone Java tool that interfaces with the StackExchange API. It uses a custom filter to fetch rich thread data (including owners, comments, and full answer bodies) and saves them as serialized JSON files for the backend to ingest.
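As a rough illustration of the kind of request the scraper might issue, the sketch below builds a URL for the Stack Exchange API `/questions` endpoint. The class name, the sort order, and the page size are assumptions, and `withbody` (one of the API's built-in filters) stands in for the project's custom filter, whose ID is not given here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: DataScraper.java may build its requests differently.
final class ScraperRequestSketch {
    static final String BASE = "https://api.stackexchange.com/2.3/questions";

    // Builds the request URI for one page of Java-tagged questions.
    // `filter` is a placeholder for the project's custom filter ID.
    static URI buildQuestionsUri(int page, int pageSize, String filter) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("site", "stackoverflow");
        params.put("tagged", "java");
        params.put("order", "desc");
        params.put("sort", "votes"); // assumed ordering
        params.put("page", String.valueOf(page));
        params.put("pagesize", String.valueOf(pageSize));
        params.put("filter", filter);

        // URL-encode each key and value, then join them into a query string.
        String query = params.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return URI.create(BASE + "?" + query);
    }
}
```

A real client would still need to send the request (for example with `java.net.http.HttpClient`), decompress the response, respect the API's rate limits, and write each thread to its own JSON file; none of that is shown here.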
- Docker and Docker Compose (Recommended)
OR Local Environment:
- Java 17+
- Node.js 18+
- PostgreSQL 15+
- Clone the repository.
- Create a .env file in the root directory with the following variables:
DB_USERNAME=your_username
DB_PASSWORD=your_password
- Run the following command:
docker-compose up -d --build
- Access the dashboard at http://localhost:3000.
- Database
  Create a PostgreSQL database named stackoverflow. Ensure your local credentials match your .env or application.yaml.
- Backend
  Navigate to backend/so-java-stats. Create a .env file there with your DB_USERNAME and DB_PASSWORD. Run the application:
      ./mvnw spring-boot:run
  Note: On first run, the DatabaseInitializer will automatically parse the 7,500 JSON files in resources/data and populate your PostgreSQL database.
- Frontend
  Navigate to frontend. Install dependencies and start the dev server:
      npm install
      npm run dev
  Access the app at http://localhost:5173.
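The first-run import mentioned in the Backend step starts by locating the raw thread files. The sketch below shows one way that discovery step could look; the class name is hypothetical, and it assumes the X in thread_X.json is a number. Parsing the JSON and saving entities through the JPA repositories are deliberately left out, and the real DatabaseInitializer may work differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the file-discovery step only.
final class ThreadFileScanSketch {
    // Assumes file names of the form thread_<digits>.json.
    static final Pattern THREAD_FILE = Pattern.compile("thread_\\d+\\.json");

    static boolean isThreadFile(String fileName) {
        return THREAD_FILE.matcher(fileName).matches();
    }

    // Lists matching regular files directly inside dataDir, sorted by path
    // so repeated runs visit them in the same order.
    static List<Path> findThreadFiles(Path dataDir) {
        try (Stream<Path> entries = Files.list(dataDir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> isThreadFile(p.getFileName().toString()))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a local checkout, `ThreadFileScanSketch.findThreadFiles(Path.of("src/main/resources/data"))` would return the files to import, assuming the command is run from backend/so-java-stats.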