Skip to content

Add CI comparing ref/ and kokkos/#23

Open
cwpearson wants to merge 4 commits intoMantevo:masterfrom
cwpearson:ci/compare-ref-kokkos
Open

Add CI comparing ref/ and kokkos/#23
cwpearson wants to merge 4 commits intoMantevo:masterfrom
cwpearson:ci/compare-ref-kokkos

Conversation

@cwpearson
Copy link
Copy Markdown

@cwpearson cwpearson commented Apr 3, 2026

This adds a CI job that does a serial+openmpi run and uses MINIFE_DEBUG to dump the raw values from inside the solver to compare the two implementations.

Along the way, two issues fixed:

  1. The kokkos/ dump thing was expecting std::vector, not Kokkos::vector
  2. The use of kokkos/ waxpy had a bug in the decomposed MPI case

With these changes, the internal vector/matrix states match before and after the solves.

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
Signed-off-by: Carl Pearson <cwpears@sandia.gov>
In ref/ cg_solve.hpp, we have waxpby(one, r, beta, p, p)

In kokkos/ we had waxpby(beta, p, one, r, p)

These are algebraicly equivalent:

waxpby(alpha, x, beta, y, w) computes w = alpha*x + beta*y,

so ref/: p = 1*r + beta * p
kokkos/: p = beta*p + 1*r

However, p is length A.num_cols and includes ghost entries after
make_local_matrix, while r has only local rows. For mpi runs, when p is
the x argument it trips

y.local_size < x.local_size || w.local_size < x.local_size

This PR swaps the Kokkos arguments back to the ref ones, so
when p is extra long the check doesn't trip.

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
@cwpearson cwpearson force-pushed the ci/compare-ref-kokkos branch from 901ffd3 to 6778f48 Compare April 3, 2026 16:33
Copy link
Copy Markdown
Member

@jwillenbring jwillenbring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I in-lined a couple of questions. Once we have those resolved, are you ready to have this applied, or were you waiting to stage something else? Seems like ready, but I want to verify.

uses: actions/checkout@v6
with:
repository: kokkos/kokkos
ref: 3.7.02
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 3.7.02? Just curious because this is an older version of Kokkos? Perhaps this is an intermediate step to bring things up to modern Kokkos?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an intermediate step.

Kokkos 4 removed Kokkos_Vector.hpp, so 3.7.02 is the last release that an unmodified miniFE can build with.

run: |
rm -f ./*.mtx.* ./*.vec.*
./kokkos/src/miniFE.x nx=${{ matrix.case.nx }} ny=${{ matrix.case.ny }} nz=${{ matrix.case.nz }}
mpirun --oversubscribe -np 2 ./kokkos/src/miniFE.x nx=${{ matrix.case.nx }} ny=${{ matrix.case.ny }} nz=${{ matrix.case.nz }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason --oversubscribe is used here but not the ref case?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, just an oversight.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here as a defense against the number of CPUs in a hosted runner shrinking. I'll add it to both runs.

Signed-off-by: Carl Pearson <cwpears@sandia.gov>
@cwpearson
Copy link
Copy Markdown
Author

You can see how the CI runs will look on my fork: https://github.com/cwpearson/miniFE/actions/runs/24212040108

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants