    [Service]
    Type=simple
    Environment="ENV={{ env }}"

{{ this }} is a Jinja variable, replaced in the bootstrapping process. This is how ENV=stage or whatever gets into the service environment.
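As a minimal sketch of that substitution step (using Jinja2 directly, which is already in the project's requirements; the `stage` value here is hypothetical):

```python
from jinja2 import Environment

# Render the Environment= line the way the bootstrap templating would,
# substituting a hypothetical env value of 'stage' for {{ env }}.
template_text = 'Environment="ENV={{ env }}"'
rendered = Environment().from_string(template_text).render(env='stage')
print(rendered)  # Environment="ENV=stage"
```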
    # do not exist, this service must have permission to create them.
    - name: cloudwatch_logs
      match: cloudwatch.stalwart.mail
      log_group_name: /tb/${ENV}/stalwart

${THIS} is not Jinja, but a syntax native to fluent-bit's configuration. This gets subbed out live by fluent-bit with the value of this environment variable.
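So the two substitution layers resolve at different times; tracing one line through both phases (the `stage` value is hypothetical):

```yaml
# Template (fluent-bit.yaml.j2) -- {{ env }} would be Jinja; ${ENV} is fluent-bit:
#   log_group_name: /tb/${ENV}/stalwart
# After bootstrap templating, ${ENV} survives untouched in the rendered file.
# At runtime, fluent-bit expands it from the service environment (ENV=stage),
# so the effective setting becomes:
log_group_name: /tb/stage/stalwart
```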
    pipeline:
      inputs:
        - name: systemd
          tag: cloudwatch.stalwart.{{ function }}

Tagging things with the function lets us catch them later and route them properly.
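For example, a matching output can then pick up just that stream (an illustrative pairing with `function` rendered as `mail`; the real output block may differ):

```yaml
pipeline:
  inputs:
    - name: systemd
      tag: cloudwatch.stalwart.mail
  outputs:
    - name: cloudwatch_logs
      match: cloudwatch.stalwart.mail   # only catches records tagged by the mail nodes
      log_group_name: /tb/${ENV}/stalwart
```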
    # Map of template files to target files
    TEMPLATE_MAP = {
        'fluent-bit.service.j2': '/usr/lib/systemd/system/fluent-bit.service',
        'fluent-bit.yaml.j2': '/etc/fluent-bit/fluent-bit.yaml',

Add the new fluent-bit files to the list of things we template on a host.
    # Map of template variable to EC2 tags
    TEMPLATE_VALUE_TAG_MAP = {
        'env': 'environment',
        'function': 'postboot.stalwart.function',

These are the new template variables based on instance tags.
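Roughly, the second-phase bootstrap reads the instance's EC2 tags, builds a template context from this map, and renders each file in TEMPLATE_MAP. A simplified, hypothetical sketch (not the actual bootstrap.py):

```python
TEMPLATE_VALUE_TAG_MAP = {
    'env': 'environment',
    'function': 'postboot.stalwart.function',
}

def build_context(instance_tags: dict) -> dict:
    # Map each template variable to the value of its corresponding EC2 tag.
    return {var: instance_tags[tag] for var, tag in TEMPLATE_VALUE_TAG_MAP.items()}

# Hypothetical tags as they might appear on a stage mail node
tags = {'environment': 'stage', 'postboot.stalwart.function': 'mail'}
context = build_context(tags)
print(context)  # {'env': 'stage', 'function': 'mail'}
```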
    self,
    name: str,
    project: tb_pulumi.ThunderbirdPulumiProject,
    log_group_arn: str,

This new variable gets passed down to the IAM setup so we can be sure the node profile includes write access to the Stalwart log group.
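For context, a write policy built from that ARN needs to cover both the log group and its streams, since `logs:PutLogEvents` targets log-stream ARNs. A hypothetical sketch of the statement shape (not the actual tb_pulumi code):

```python
import json

def write_policy(log_group_arn: str) -> str:
    # Hypothetical sketch: PutLogEvents acts on log streams, so the policy
    # should name the log-stream ARNs as well as the group itself.
    return json.dumps({
        'Version': '2012-10-17',
        'Statement': [{
            'Effect': 'Allow',
            'Action': ['logs:CreateLogStream', 'logs:PutLogEvents'],
            'Resource': [log_group_arn, f'{log_group_arn}:log-stream:*'],
        }],
    })

arn = 'arn:aws:logs:us-east-1:123456789012:log-group:/tb/stage/stalwart'
statement = json.loads(write_policy(arn))['Statement'][0]
print(statement['Resource'])
```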
    archive_files = [
        'bootstrap.py',
        'templates/fluent-bit.service.j2',
        'templates/fluent-bit.yaml.j2',

These entries ensure that the new config templates wind up in the first-phase bootstrapping blob.
      private:
        - destination_cidr_block: 10.202.0.0/22 # observability-dev
    -     vpc_peering_connection_id: pcx-0d2027442f0e54ca4
    +     vpc_peering_connection_id: pcx-04d7e54008cd9326c

The peering connection changed when I rebuilt dev for testing.
    tb:cloudwatch:LogDestination:
      stalwart:
        log_group:
          retention_in_days: 3

Shorter data retention in prod. This is actually the default value for the option, but I like to be explicit.
      Jinja2>=3.1,<4.0
      pulumi_cloudflare==6.6.0
    - tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.16
    + tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.18

This is a future version with a bunch of little fixes to the core library that we need here, which is why we need the release to be done before merging this.
    set -x
    set -e

    # Places data get stored

Fun fact: user data can't exceed 16KB. That's why we bzip2 the second-phase blob - great compression ratio! This file, fully rendered, is sitting at 7,257 bytes as of this PR, so we're well within range still.
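The size math is easy to check with `bz2` from the Python standard library (the blob content below is a made-up stand-in for the rendered script):

```python
import bz2

USER_DATA_LIMIT = 16 * 1024  # EC2 caps user data at 16 KB

# Hypothetical stand-in for the rendered second-phase blob; shell/config
# text is repetitive, so bzip2 compresses it very well.
blob = ('# Places data get stored\nset -x\nset -e\n' * 200).encode()
compressed = bz2.compress(blob)
print(len(blob), '->', len(compressed), 'bytes')
```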
    inputs:
      - name: systemd
        tag: cloudwatch.stalwart.{{ function }}
        systemd_filter: _SYSTEMD_UNIT=thundermail.service

Without a persisted cursor (for example `db`) or a `read_from_tail` guard on this systemd input, Fluent Bit will read the existing thundermail.service journal on first boot and can replay entries again after service restarts. That seems likely to backfill or duplicate logs in CloudWatch for these long-lived nodes. Is that intentional?

No, I'll revisit this.

I opted for the `db` option since it seems more likely to not miss messages.
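With the `db` option, the systemd input would look something like this (a sketch; the database path is hypothetical):

```yaml
pipeline:
  inputs:
    - name: systemd
      tag: cloudwatch.stalwart.{{ function }}
      systemd_filter: _SYSTEMD_UNIT=thundermail.service
      # Persist the journal cursor so restarts resume where reading left off
      db: /var/lib/fluent-bit/systemd-cursor.db
```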
    return stalwart.StalwartCluster(
        f'{project.name_prefix}-stalwart',
        project=project,
        log_group_arn=logdests['stalwart'].resources['iam_policies']['write'].arn,

This wires the instances to the new LogDestination write policy, but that policy lives in the upcoming tb_pulumi release. From the implementation I reviewed, it looks like `logs:CreateLogStream` / `logs:PutLogEvents` may be scoped only to the log-group ARN rather than the log-stream ARNs. If that is still true in the cut release, Fluent Bit will bootstrap successfully but get AccessDenied when it tries to write events. Could we double-check the released policy shape before merging?

No, this was a problem, but that was fixed by this PR, which is slated to go out with that release after it gets approved and merged.

I just got that PR merged and have tested that code directly with success. When I go to prod with it, I'll double-check the policy that goes out there before shipping the fluent-bit configs to the live servers.
I cut the new version of tb_pulumi at the end of last week, so we can proceed with this on v0.0.18. I will roll this to stage today, but will wait on review to get it into prod.
Just deployed this to stage after setting the tb_pulumi version to v0.0.18. I also double-checked the read/write log policies, and they are correct. Read:

Write:
Let's hang on just a bit. I need to set the journald config to retain fewer logs as well.
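That could be a journald drop-in along these lines (the file path and limits here are hypothetical, not the values that shipped):

```ini
# /etc/systemd/journald.conf.d/retention.conf (hypothetical values)
[Journal]
# Drop journal entries older than three days and cap persistent journal size.
MaxRetentionSec=3day
SystemMaxUse=500M
```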
Ok, this is ready for review again. Vetted in stage.


A few things are happening here. A brief list is below, but I'll leave some commentary inline as well.
Importantly, this cannot be merged until a new release of tb_pulumi is cut. Update: The new code has been released, and we can proceed with this rollout.

Briefly, we:

- Introduce the `function` concept. This helps us with the api/mail dichotomy. This new parameter in the node definition becomes a tag on the instance, and that gets picked up by the second-phase bootstrapping process and templated into the fluent-bit config.
- Capture the `MESSAGE` portion of Thundermail systemd log events and send them along to CloudWatch Logs.