Parallelize the unit and integration tests #520
kmontemayor2-sc wants to merge 13 commits into main
/unit_test_py
/integration_test
GiGL Automation @ 03:39:24 UTC: 🔄 @ 04:55:30 UTC: ✅ Workflow completed successfully.
GiGL Automation @ 03:39:30 UTC: 🔄 @ 05:02:55 UTC: ✅ Workflow completed successfully.
Have we tried ways to parallelize the tests on a single machine, e.g. https://github.com/mtreinish/stestr or various pytest add-ons?
This adds a lot of complexity :\ - want to be sure we have considered alternatives.
So we actually already have some form of single-machine parallelization, but last I checked it doesn't allow our tests to run properly. We also have a bigger problem: the runners we use for CI/CD are on the small side and are prone to OOMing if the process tree gets too big (this happened to me quite a bit with the graph_store_integration_test), so even if we enabled single-machine parallelization it wouldn't really help us in CI/CD. For local runs, I don't think parallel tests are that useful either, as debugging parallel test traces is quite difficult, and in practice I don't usually run all of the tests locally, just the ones pertinent to my changes. Then I'd push to GitHub. This change is to support that step, and I agree it isn't strictly necessary, but if we want parallel tests on CI/CD I don't really see another way forward, unless we want custom bigger runners, and even then debugging will be difficult.
GiGL Automation @ 16:25:50 UTC: 🔄 @ 17:40:29 UTC: ✅ Workflow completed successfully.
What deps are needed here? FWIW, see GiGL/.github/workflows/on-pr-comment.yml, lines 24 to 34 in 1efbb31. But preferred would be:
This would be a much cleaner approach; let's do it now?
The code is heavy to read and parse right now.
There is an opportunity to greatly simplify this using the "matrix strategy" execution support built into GitHub Actions.
name: Sharded Unit Tests
on:
  workflow_dispatch:
    inputs:
      shards:
        description: "Number of shards"
        required: true
        default: "5"
        type: number
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4, 5]
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: make unit_test --shard="${{ matrix.shard }}"
This just means unit_test will show 5 different "steps" in the execution flow, so as users we don't lose much when debugging which shard failed. We can also get rid of a lot of code here that might become hard to manage.
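One detail in the workflow sketch above: the shards input is declared, but the matrix list is hardcoded to five entries. A possible way to keep the two in sync is to have an earlier job compute the list and pass it to the matrix as JSON (GitHub Actions can parse a JSON string with fromJSON). The helper below is only an illustration of that first step; the name shard_matrix is hypothetical and nothing like it is confirmed to exist in this PR.

```python
import json


def shard_matrix(total_shards):
    """Return a JSON list of 1-based shard numbers for use as a matrix value.

    Hypothetical helper: the 1-based numbering mirrors the reviewer's
    example matrix, not necessarily the indexing used by the PR itself.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    return json.dumps(list(range(1, total_shards + 1)))
```

A setup job could write this string to its job outputs, and the test job could then read it in its matrix definition instead of listing the shards by hand.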
Scope of work done
Summary
Adds parallel test sharding to CI, splitting Python unit tests and integration tests across 4 Cloud Build shards running concurrently. Slow
test modules are explicitly pinned to specific shards via ordered tuples, preventing hash-collision hotspots. Unpinned modules are assigned via
SHA-256 hash.
and reports pass/fail with build URLs
new CLI args --shard_index/--total_shards
measured durations as comments
that scan tests/unit/ to verify sharded == unsharded
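The assignment rule described in the summary (pinned modules first, SHA-256 for everything else) and the sharded == unsharded check could look roughly like the sketch below. The module names, the layout of PINNED_MODULES, and the helper names shard_for, select_shard, and shards_cover_exactly are assumptions made for illustration; they are not taken from the code in this PR.

```python
import hashlib

TOTAL_SHARDS = 4

# Hypothetical pin table: position i lists the slow modules pinned to shard i.
PINNED_MODULES = (
    ("tests.unit.slow_a_test",),
    ("tests.unit.slow_b_test",),
    (),
    (),
)


def shard_for(module, total_shards=TOTAL_SHARDS, pinned=PINNED_MODULES):
    """Return the shard index for a module: pinned first, SHA-256 otherwise."""
    if len(pinned) != total_shards:
        raise ValueError("pin table must have one entry per shard")
    for index, modules in enumerate(pinned):
        if module in modules:
            return index
    # SHA-256 gives the same answer on every machine and in every process.
    digest = hashlib.sha256(module.encode("utf-8")).hexdigest()
    return int(digest, 16) % total_shards


def select_shard(modules, shard_index, total_shards=TOTAL_SHARDS, pinned=PINNED_MODULES):
    """Return the modules that belong to one shard, preserving input order."""
    return [m for m in modules if shard_for(m, total_shards, pinned) == shard_index]


def shards_cover_exactly(modules, total_shards=TOTAL_SHARDS, pinned=PINNED_MODULES):
    """Check that the shards together contain every module exactly once."""
    combined = []
    for index in range(total_shards):
        combined.extend(select_shard(modules, index, total_shards, pinned))
    return sorted(combined) == sorted(modules)
```

A cryptographic digest is used here instead of Python's built-in hash() because string hashing is randomized per process by default, which would send the same module to different shards on different runners.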
As a followup, we should probably move that inlined bash script to Python, but then we'd need to add some new CI Python deps that we install just for CI purposes, and that seems like a different change.
NOTE: I tested this in a slightly different "on: push" path. I'll test this change when it gets merged, and if the CI setup breaks I can revert.
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO