[CRE] [2/5] Relay DON node handler for confidential relay#21639
[CRE] [2/5] Relay DON node handler for confidential relay#21639
Conversation
|
👋 nadahalli, thanks for creating this pull request! To help reviewers, please consider creating future PRs as drafts first. This allows you to self-review and make any final changes before notifying the team. Once you're ready, you can mark it as "Ready for review" to request feedback. Thanks! |
|
I see you updated files related to
|
|
✅ No conflicts with other open PRs targeting |
There was a problem hiding this comment.
Pull request overview
Risk Rating: HIGH
Adds confidential-relay request handling across the gateway and relay DON nodes as part of confidential workflow execution (#21635), including quorum aggregation on the gateway side and Nitro-attestation validation on the relay node side before proxying to Vault/capability execution.
Changes:
- Introduces a new gateway handler that fans out confidential relay requests to relay DON nodes and aggregates node responses to an F+1 quorum.
- Introduces a new relay DON node
GatewayConnectorHandlerthat validates Nitro attestations (including multi-PCR pools) and proxiessecrets_get/capability_execto Vault/capabilities. - Updates dependencies to include attestation-related packages (
chainlink-common/pkg/teeattestation,hf/nitrite) and bumpsfxamacker/cbor.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| go.mod | Bumps/adds dependencies needed for TEE attestation + relay protocol types. |
| go.sum | Records new/updated module checksums for the dependency changes. |
| core/services/gateway/handlers/confidentialrelay/handler.go | New gateway-side fanout + quorum aggregation handler with metrics and request lifecycle management. |
| core/services/gateway/handlers/confidentialrelay/aggregator.go | New quorum aggregator for matching node responses (digest-based) with “quorum unobtainable” detection. |
| core/services/gateway/handlers/confidentialrelay/handler_test.go | Unit tests covering quorum success/divergence, timeouts, duplicate IDs, fanout failures, and rate-limiting. |
| core/capabilities/confidentialrelay/handler.go | New relay DON node handler validating Nitro attestation and proxying to Vault/capability execution. |
| core/capabilities/confidentialrelay/handler_test.go | Unit tests for relay handler behavior (attestation, capability exec, unsupported methods, invalid params). |
| core/capabilities/confidentialrelay/service.go | Lifecycle wrapper that defers handler creation until the gateway connector is available. |
| func (h *Handler) handleSecretsGet(ctx context.Context, gatewayID string, req *jsonrpc.Request[json.RawMessage]) *jsonrpc.Response[json.RawMessage] { | ||
| var params confidentialrelaytypes.SecretsRequestParams | ||
| if err := json.Unmarshal(*req.Params, ¶ms); err != nil { | ||
| return h.errorResponse(ctx, gatewayID, req, jsonrpc.ErrInvalidParams, err) | ||
| } |
There was a problem hiding this comment.
handleSecretsGet dereferences *req.Params without checking for req.Params == nil. A JSON-RPC request can omit params, which would panic here. Add a nil check and return a JSON-RPC invalid request/invalid params error response instead of crashing the handler.
| } | ||
|
|
||
| func (h *Handler) handleCapabilityExecute(ctx context.Context, gatewayID string, req *jsonrpc.Request[json.RawMessage]) *jsonrpc.Response[json.RawMessage] { | ||
| var params confidentialrelaytypes.CapabilityRequestParams |
There was a problem hiding this comment.
handleCapabilityExecute dereferences *req.Params without checking for req.Params == nil, which can panic for JSON-RPC requests that omit params. Add a nil check and return an appropriate JSON-RPC invalid request/invalid params response.
| var params confidentialrelaytypes.CapabilityRequestParams | |
| var params confidentialrelaytypes.CapabilityRequestParams | |
| if req.Params == nil { | |
| return h.errorResponse(ctx, gatewayID, req, jsonrpc.ErrInvalidParams, fmt.Errorf("missing params")) | |
| } |
| // Normalize owner to EIP-55 checksum format, matching how secrets are stored. | ||
| normalizedOwner := common.HexToAddress(params.Owner).Hex() | ||
|
|
There was a problem hiding this comment.
common.HexToAddress(params.Owner) will silently coerce invalid/empty owner strings to the zero address. If params.Owner is malformed, this could query Vault under an unintended owner without surfacing an error. Consider validating params.Owner first (e.g., common.IsHexAddress) and returning an ErrInvalidParams response when it’s not a valid EVM address.
| h.metrics.requestSuccess.Add(ctx, 1, metric.WithAttributes( | ||
| attribute.String("gateway_id", gatewayID), | ||
| )) |
There was a problem hiding this comment.
HandleGatewayMessage increments requestSuccess unconditionally after sending the response, even when the response contains a JSON-RPC error (and errorResponse already increments requestInternalError). This will skew success/error rates in telemetry. Consider incrementing success only when response.Error == nil (and using the error counter otherwise).
| h.metrics.requestSuccess.Add(ctx, 1, metric.WithAttributes( | |
| attribute.String("gateway_id", gatewayID), | |
| )) | |
| if response != nil && response.Error == nil { | |
| h.metrics.requestSuccess.Add(ctx, 1, metric.WithAttributes( | |
| attribute.String("gateway_id", gatewayID), | |
| )) | |
| } |
| package confidentialrelay | ||
|
|
||
| import ( | ||
| "context" | ||
| "encoding/json" |
There was a problem hiding this comment.
PR title/description emphasize the relay DON node handler, but this PR also introduces a new gateway-side handler (core/services/gateway/handlers/confidentialrelay/*) that fans out and aggregates responses. If that’s intentional, consider updating the PR description (or splitting) so reviewers don’t miss the gateway-surface changes.
There was a problem hiding this comment.
Fixed. The gateway handler files were erroneously included in this PR. They belong to PR #21638 [1/5]. Removed.
| l.Debugw("aggregating responses, waiting for other nodes...", "error", err) | ||
| return nil | ||
| case err != nil: | ||
| l.Error("quorum unobtainable, returning response to user...", "error", err, "responses", maps.Values(ar.responses)) |
There was a problem hiding this comment.
maps.Values(ar.responses) reads the underlying map without holding ar.mu. Since ar.responses is mutated under addResponseForNode, this log line can trigger a data race / concurrent map iteration panic under load. Use copiedResponses := ar.copiedResponses() (or lock around the read) and log the copied map/values instead.
| l.Error("quorum unobtainable, returning response to user...", "error", err, "responses", maps.Values(ar.responses)) | |
| l.Error("quorum unobtainable, returning response to user...", "error", err, "responses", copiedResponses) |
There was a problem hiding this comment.
This file was erroneously included and has been removed from this PR. Tracked in #21638.
| func (h *handler) HandleJSONRPCUserMessage(ctx context.Context, req jsonrpc.Request[json.RawMessage], callback gwhandlers.Callback) error { | ||
| if req.ID == "" { | ||
| return errors.New("request ID cannot be empty") | ||
| } | ||
| if len(req.ID) > 200 { | ||
| return errors.New("request ID is too long: " + strconv.Itoa(len(req.ID)) + ". max is 200 characters") | ||
| } | ||
|
|
||
| l := logger.With(h.lggr, "method", req.Method, "requestID", req.ID) | ||
| l.Debugw("handling confidential relay request") | ||
|
|
||
| ar, err := h.newActiveRequest(req, callback) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| return h.fanOutToNodes(ctx, l, ar) |
There was a problem hiding this comment.
HandleJSONRPCUserMessage forwards any req.Method without validation and doesn’t guard against req.Params == nil. Other gateway handlers validate supported methods and reject missing params (e.g., core/services/gateway/handlers/vault/handler.go:336-396 and core/services/gateway/handlers/capabilities/v2/http_trigger_handler.go:127-131). Consider failing fast here with UnsupportedMethodError/InvalidParamsError to avoid panics later (notably errorResponse dereferences *req.Params).
There was a problem hiding this comment.
This file was erroneously included and has been removed from this PR. Tracked in #21638.
CORA - Pending Reviewers
Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown For more details, see the full review summary. |
ef1dc22 to
e9eb648
Compare
|
| lggr logger.Logger | ||
| metrics *handlerMetrics | ||
|
|
||
| // validateAttestation validates Nitro attestation documents. |
There was a problem hiding this comment.
nit: nothing nitro specific about this function definition
There was a problem hiding this comment.
Fixed. Replaced "Nitro" with "TEE" in the type and handler comments.
| trustedPCRs []byte, | ||
| caRootsPEM string, |
There was a problem hiding this comment.
why do we need trustedPCRs as a parameter? They should be retrievable from the capRegistry parameter. The way executor.go does things is:
- Parse the configs from the registry
- Extract measurements from the list of enclaves.
- Any incoming request matching those measurements is accepted.
If this is already being done outside of this logic, why not bring it within this component? I doubt the PCRs are being used anywhere else.
There was a problem hiding this comment.
Addressed. Both TrustedPCRs and CARootsPEM are now read from the capabilities registry per-request via DONsForCapability + ConfigForCapability, same pattern as CC's EnsureFreshEnclaves in executor.go. The handler looks up the confidential-workflows capability config, extracts TrustedValues and CARootsPEM from the EnclavesList, and uses them for attestation validation.
CRE.ConfidentialRelay TOML config is now just Enabled (a kill switch for relay participation).
| // Each enclave instance has different PCR values (WireGuard keys baked | ||
| // per CID), so trustedPCRs may be a JSON array of PCR objects. Try each | ||
| // set and succeed if any match, same as pool.go's | ||
| // validateAttestationAgainstMultipleMeasurements. | ||
| var pcrArray []json.RawMessage | ||
| if json.Unmarshal(h.trustedPCRs, &pcrArray) == nil && len(pcrArray) > 0 { | ||
| var validationErr error | ||
| for _, pcrs := range pcrArray { | ||
| err := h.validateAttestation(attestationBytes, hash, pcrs, h.caRootsPEM) | ||
| if err == nil { | ||
| return nil | ||
| } | ||
| validationErr = errors.Join(validationErr, err) | ||
| } | ||
| return fmt.Errorf("no trusted PCR set matched: %w", validationErr) | ||
| } |
There was a problem hiding this comment.
Feel like this is a bit messy for unmarshalling to be within this logic every time rather than within the constructor.
There was a problem hiding this comment.
This is addressed. The trustedPCRs parameter is gone. Measurements are now read from the capabilities registry per-request via getEnclaveAttestationConfig, which returns structured data directly. No JSON unmarshalling in the hot path.
|
|
||
| // enclaveEntry mirrors the enclave config shape stored in the capabilities | ||
| // registry. Only the fields needed for attestation validation are included. | ||
| type enclaveEntry struct { |
There was a problem hiding this comment.
This doesn't match our actual types:
type Enclave struct {
EnclaveID [32]byte json:"enclaveId"
EnclaveURL string json:"enclaveURL"
TrustedValues [][]byte json:"trustedValues" // Array of trusted measurements the client accepts.
EnclaveType EnclaveType json:"enclaveType"
EnclaveAuthHeader string json:"enclaveAuthHeader"
EnclaveExtraData []byte json:"enclaveExtraData"
}
this whole CARootsPem thing is weird. IMO, we should not include this field in our system. I think we'd be able to tell the attestation type from the attestation itself and use the right root certificate.
At any rate, we don't have any reason to not just hardcode it to the Nitro one within chainlink common, for now.
If we do that, we should just get rid of CARootsPem everywhere in chainlink core, and assume the attestationValidatorFunc knows what to do.
There was a problem hiding this comment.
Agreed. CARootsPEM should not be in the system. I've updated the teeattestation package in chainlink-common to remove it from the production API:
smartcontractkit/chainlink-common@c7b8c137
ValidateAttestation now takes 3 args (always uses the hardcoded AWS Nitro root). ValidateAttestationWithRoots exists for tests with fake enclaves. Once chainlink-common PR #1899 merges, I'll bump common here and drop the caRootsPEM plumbing from the handler.
Re: the type mismatch with CC's Enclave struct, the handler only needs TrustedValues and CARootsPEM (soon just TrustedValues) from the enclave config. Mirroring unused fields (EnclaveID, EnclaveURL, etc.) would be dead code. Just having TrustedValues should be ok, no?
There was a problem hiding this comment.
Changed code now.
|
LGTM apart from getting rid of the root ca pem stuff. |
c76aea8 to
70acaf1
Compare
| }, nil | ||
| } | ||
|
|
||
| type gatewayConnector interface { |
There was a problem hiding this comment.
nit why is this a private interface yet used in the constructor. I get you may want to slim it down but clients won't be able to inspect the constructor param.
| return h.errorResponse(ctx, gatewayID, req, jsonrpc.ErrInternal, fmt.Errorf("failed to get local node: %w", err)) | ||
| } | ||
|
|
||
| capResp, err := vaultCap.Execute(ctx, capabilities.CapabilityRequest{ |
There was a problem hiding this comment.
No metrics on latency or count of executions?
There was a problem hiding this comment.
Added count and latency metrics for requests.
| }, | ||
| }) | ||
| if err != nil { | ||
| return h.errorResponse(ctx, gatewayID, req, jsonrpc.ErrInternal, fmt.Errorf("vault execute failed: %w", err)) |
There was a problem hiding this comment.
Is there any chance the err message could leak sensitive information?
There was a problem hiding this comment.
You never know. Changed it to "internal error" for safety. Logging the error though. I think we are careful enough not to send sensitive error messages from upstream. So, logging should be ok.
Add GatewayConnectorHandler that validates Nitro attestation and proxies enclave requests to VaultDON (secrets_get) and capability DONs (capability_exec). Supports multi-PCR validation for enclave pools. Part of #21635
Replace the static trustedPCRs TOML parameter with a per-request lookup from the capabilities registry. The handler reads the confidential-workflows capability config to extract enclave TrustedValues, same pattern as CC's EnsureFreshEnclaves. Remove TrustedPCRs from CREConfidentialRelay config interface, ConfidentialRelayConfig TOML struct, and config_cre.go.
CARootsPEM is stored per-enclave in the same cap registry config as TrustedValues. Read both in getEnclaveAttestationConfig, eliminating the last TOML field besides Enabled from CRE.ConfidentialRelay.
ValidateAttestation in chainlink-common now always uses the hardcoded AWS Nitro root cert (c7b8c137). Drop the caRootsPEM parameter from the handler's attestation validator func, getEnclaveAttestationConfig, and the enclaveEntry struct.
(cherry picked from commit 16e4fc9)
cb65a31 to
00a16e6
Compare
|




Context
Part of #21635 (confidential workflow execution). [2/5] in the series.
Can be reviewed and merged independently.
What this does
Adds a
GatewayConnectorHandlerthat runs on relay DON nodes. Itvalidates Nitro attestation on incoming enclave requests, then proxies
to VaultDON (
secrets_get) or capability DONs (capability_exec).Each relay node calls the target DON through CRE's standard capability
dispatch (which includes DON-level consensus), receives the aggregated
response, and translates it deterministically before returning to the
gateway. The gateway handler (#21638, [1/5]) aggregates these responses
using F+1 quorum.
Supports multi-PCR validation for enclave pools where each instance
has unique PCR values.
CRE.ConfidentialRelayTOML block (enabled, trustedPCRs,caRootsPEM) for relay DON node configuration.
Dependencies
None. This PR is self-contained.