Add Create and Update for Worker Deployment Version, update scaler config and enable per-task-type compute providers#731
Add Create and Update for Worker Deployment Version, update scaler config and enable per-task-type compute providers#73102strich wants to merge 4 commits intoserverlessfrom
Conversation
4b74a6b to
448fcaa
Compare
86892e0 to
7ca9f8b
Compare
42cbcff to
7ce67c6
Compare
|
Semgrep found 1 No explicit |
7ce67c6 to
041a11a
Compare
temporal/api/compute/v1/config.proto
Outdated
| // Informs a worker lifecycle controller *when* and *how often* to perform | ||
| // certain worker lifecycle actions like starting a serverless worker. | ||
| temporal.api.compute.v1.ComputeScaler scaler = 2; | ||
| message ComputeConfigScalingGroup { |
There was a problem hiding this comment.
nit: not need to qualify this as ComputeConfig... if it's embedded inside a ComputeConfig message.
That being said, if you think there's any chance that this message would need to be used outside of this context, I would make it top level.
temporal/api/compute/v1/config.proto
Outdated
|
|
||
| // The region of the temporal server for which this compute config should apply. | ||
| // This is used together with replication to enable clean-hand-off when migrating regions. | ||
| optional string region = 4; |
There was a problem hiding this comment.
nit: why is region field number 4 if it's the second field defined here?
There was a problem hiding this comment.
no special reason, re-numbered
| // type of the compute scaler. this string is implementation-specific and | ||
| // can be used by implementations to understand how to interpret the | ||
| // contents of the scaler_details field. | ||
| string type = 1; |
There was a problem hiding this comment.
nit: use upper case letters when starting sentences to be consistent with the rest of the API.
temporal/api/compute/v1/scaler.proto
Outdated
|
|
||
| // Contains scaler-specific instructions and configuration. | ||
| oneof detail { | ||
| // will be an unencrypted, JSON-encoded object of scaler-specific |
There was a problem hiding this comment.
nit: "will be" doesn't quite sound right here, i wouldn't use the future tense.
| // (-- api-linter: core::0146::any=disabled | ||
| // aip.dev/not-precedent: This needs to be extensible to | ||
| // externally-written compute scalers --) | ||
| string detail_json = 2; |
There was a problem hiding this comment.
Why not use payload always? Payload with metdata["encoding"] = "json/plain" achieves what you want here.
There was a problem hiding this comment.
The detail_json is consumed by the server (i.e. parsed and understood), while the payload is meant to be used with remote providers and passed onward as-is without the server inspecting it.
If even with that context, you believe this should always be a payload, happy to change it (this was specific feedback we had received on the first API PR).
| repeated temporal.api.compute.v1.ComputeConfig.ComputeConfigScalingGroup remove_compute_config_scaling_groups = 5; | ||
|
|
||
| // Optional. The identity of the client who initiated this request. | ||
| string identity = 4; |
There was a problem hiding this comment.
nit: make this field number 2 because it's mostly metadata and common across different requests.
| // Required. | ||
| string deployment_name = 2; | ||
|
|
||
| // Optional. Contains the new worker compute configuration for the Worker |
There was a problem hiding this comment.
What happens if this is empty? Shouldn't it be "required"?
There was a problem hiding this comment.
thanks, copy&paste mistake. Text updated
| string identity = 4; | ||
| } | ||
|
|
||
| message UpdateWorkerDeploymentVersionComputeConfigResponse { |
There was a problem hiding this comment.
Do we need to return a conflict token here?
| } | ||
|
|
||
| // Used to update the compute config of a Worker Deployment Version. | ||
| message UpdateWorkerDeploymentVersionComputeConfigRequest { |
There was a problem hiding this comment.
Shouldn't this take a conflict token and a ComputeConfig object?
|
|
||
| // Optional. The identity of the client who initiated this request. | ||
| string identity = 4; | ||
| // A unique identifier for this create request for idempotence. Typically UUIDv4. |
There was a problem hiding this comment.
Worth noting how this is used, does the server dedupe requests with the same idempotency key?
There was a problem hiding this comment.
@ShahabT can you confirm the details so I am not mis-representing here?
There was a problem hiding this comment.
here is the behavior, we can document:
- Retrying with same request id is a successful no-op.
- Retrying with different request id is an error.
- One the object is deleted, any request id (even one used for a previous creation) will re-create it.
There was a problem hiding this comment.
Updated the comment with Shahab's notes
041a11a to
49d52e9
Compare
| string namespace = 1; | ||
|
|
||
| // Required. | ||
| string deployment_name = 2; |
There was a problem hiding this comment.
@02strich what do you expect the server do for this API? also, related, does deployment name play any role in the validation? if not, should we remove it?
There was a problem hiding this comment.
Only in the sense that a deployment should be present. I think having it can come in beneficial if we decide to go for default values in the deployment or similar approaches down the road. Given that these are mapped to HTTP paths, adding it later won't be super straightforward.
temporal/api/compute/v1/config.proto
Outdated
|
|
||
| // The region of the temporal server for which this compute config should apply. | ||
| // This is used together with replication to enable clean-hand-off when migrating regions. | ||
| optional string region = 2; |
There was a problem hiding this comment.
let's remove optional for consistency and instead clarify in the comment that it is optional. Specify want would be the behavior for a replicated NS if this value is not provided. Can one have a scaling group with region set and another scaling group without region, say as a catch-all?
Also, if the implementation is not going to be ready in the first release, we should not merge it yet.
There was a problem hiding this comment.
Overall, I'm not sure if we should add the region dimension just yet. It's definitely not needed for the MVP and I worry we're complicating things by adding a premature dimension.
There was a problem hiding this comment.
Made consistent using the "Optional." in the comment instead of the protobuf label. And updated the comment to be clear that it is a catch all if not provided.
I am aiming for this being supported in the first release as MRN is a really interesting use case.
There was a problem hiding this comment.
removed it for now until we can settle on the name
| message ComputeConfigEntry { | ||
| message ScalingGroup { | ||
| // The set of task queue types this compute config serves. | ||
| repeated temporal.api.enums.v1.TaskQueueType task_queue_types = 1; |
There was a problem hiding this comment.
Would empty list be supported? if so, would it mean a catch-all?
Also clarify that same region cannot have more than one scaling group for the same tq type.
There was a problem hiding this comment.
yes empty would be supported and catch-all - updated the text
11be549 to
ae4546c
Compare
ae4546c to
309b565
Compare
Adding a Create API for Worker Deployment Versions, as well as a compute config setting at the version level as that is the matching one. A Deployment Version references a specific running version of a worker, which is also what a compute provider tracks.
Both scaler and compute providers are pluggable, so aligning the formats for easier understanding.
Advanced customers might want to define different compute providers for different task types, so enabling splitting that up.
a3cbeea to
387d6ad
Compare
| // Optional. Contains the compute config scaling groups to remove from the Worker Deployment. | ||
| repeated string remove_compute_config_scaling_groups = 7; |
There was a problem hiding this comment.
I'm not a huge fan of separating out a "remove" list from an "add/update" list, but if this is a pattern used elsewhere in the API, ok :) I prefer a Replace-style pattern where the user passes the exact desired state, including any existing list or map fields and the server simply replaces those fields with the supplied values. Additionally, having a add or remove separate API call to add/remove individual list items from a list field is a good idea. So, in this case, we'd add AddWorkerDeploymentVersionScalingGroup and RemoveWorkerDeploymentVersionScalingGroup API calls...
But again, if this is not a pattern used in the existing API, so be it :)
There was a problem hiding this comment.
Yeah, this was changed from replace-style to add/update based on feedback what is used elsewhere in the API
| string namespace = 1; | ||
|
|
||
| // Required. | ||
| string deployment_name = 2; |
There was a problem hiding this comment.
How is the build ID/version of the WDV identified in this request message?
There was a problem hiding this comment.
There is no build ID/version yet when running this validate API as it is not used in validation
There was a problem hiding this comment.
Hmm. Seems odd to have WorkerDeploymentVersion in the API name, then... Maybe better to go with ValidateComputeConfig?
There was a problem hiding this comment.
Yeah, I had been unhappy with that contradiction as well. That said ValidateComputeConfig seems too generic as well which is why I didn't go with it.
I now reframed it a bit: added in the build ID as well. This way the story is more that the validate-compute-config should mirror the update-compute-config more than the create version API. That means it clearly should have the build ID. Down the road this might also be useful once we can validate that the compute reference points at the configured build ID.
387d6ad to
a19c407
Compare
| // Informs a worker lifecycle controller *when* and *how often* to perform | ||
| // certain worker lifecycle actions like starting a serverless worker. | ||
| temporal.api.compute.v1.ComputeScaler scaler = 2; | ||
| message ScalingGroup { |
There was a problem hiding this comment.
nit: I would flatten this out to a top level message if there's any chance that this message will be reused in the future.
| // how to decrypt the payload. | ||
| // Encrypted, encoded bytestring containing provider-specific | ||
| // information. The implementation must understand how to decrypt the payload. | ||
| temporal.api.common.v1.Payload detail_payload = 3; |
There was a problem hiding this comment.
Use only payload please, there's a straightforward way of expressing plain JSON in a payload.
| // will be an encrypted, encoded bytestring containing | ||
| // provider-specific information. The implementation must understand | ||
| // how to decrypt the payload. | ||
| // Encrypted, encoded bytestring containing provider-specific |
There was a problem hiding this comment.
This would be optionally encrypted.
| string detail_json = 2; | ||
|
|
||
| // Encrypted, encoded bytestring containing scaler-specific | ||
| // information. The implementation must understand how to decrypt the payload. | ||
| temporal.api.common.v1.Payload detail_payload = 3; |
There was a problem hiding this comment.
Same here, no need for both of these fields.
|
|
||
| // Optional. Contains the compute config scaling groups to add or updated for the Worker | ||
| // Deployment. | ||
| map<string, temporal.api.compute.v1.ComputeConfig.ScalingGroup> compute_config_scaling_groups = 6; |
There was a problem hiding this comment.
I would recommend using field masks for update APIs. We've done that in a few places (see UpdateActivityOptions). This way if you add fields that the client is not aware of they will not accidentally be deleted on update.
There was a problem hiding this comment.
@ShahabT ?
This was what other APIs around here already used.
What changed?