Ship Stalwart logs to CloudWatch Logs #209
base: main
Changes from all commits: 87e1c13, 39c4d80, da7db96, 68408cd, 91d7de9, 0ee694d, 04b5a4f, 40b5734, 411ac50, ece6194, 9429886, 9e6b550

---

`@@ -10,13 +10,19 @@`

```python
BOOTSTRAP_DIR = '/opt/stalwart-bootstrap'
BOOTSTRAP_LOG = '/var/log/stalwart-bootstrap.log'
INSTANCE_TAGS = {}

# Map of template files to target files
TEMPLATE_MAP = {
    'fluent-bit.service.j2': '/usr/lib/systemd/system/fluent-bit.service',
    'fluent-bit.yaml.j2': '/etc/fluent-bit/fluent-bit.yaml',
```

> **Author:** Add the new fluent-bit files to the list of things we template on a host.

```python
    'journald.conf.j2': '/etc/systemd/journald.conf',
    'stalwart.toml.j2': '/opt/stalwart/etc/config.toml',
    'thundermail.service.j2': '/usr/lib/systemd/system/thundermail.service',
}
# Map of template variable to EC2 tags
TEMPLATE_VALUE_TAG_MAP = {
    'env': 'environment',
    'function': 'postboot.stalwart.function',
```

> **Author:** These are the new template variables based on instance tags.

```python
    'https_paths': 'postboot.stalwart.https_paths',
    'node_services': 'postboot.stalwart.node_services',
    'node_id': 'postboot.stalwart.node_id',
```
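The two maps above drive host templating: `TEMPLATE_VALUE_TAG_MAP` says which EC2 tag feeds each template variable, and `TEMPLATE_MAP` says where each rendered template lands. The sketch below isolates the tag lookup, assuming the instance's tags have already been fetched into a plain dict (as `INSTANCE_TAGS` suggests). `build_template_context` is a hypothetical helper, not a function from the bootstrap script. Rendering is left out to keep the example stdlib-only (the `.j2` suffix suggests Jinja2). Failing loudly on a missing tag is this sketch's own choice; the real bootstrap may behave differently.

```python
# Hypothetical sketch: resolve template variables from EC2 instance tags.
# The map mirrors a subset of TEMPLATE_VALUE_TAG_MAP from the diff above.
TEMPLATE_VALUE_TAG_MAP = {
    'env': 'environment',
    'function': 'postboot.stalwart.function',
    'node_id': 'postboot.stalwart.node_id',
}


def build_template_context(instance_tags: dict[str, str]) -> dict[str, str]:
    """Map each template variable to the value of its EC2 tag.

    Raises KeyError naming the missing tag, so a mis-tagged instance fails
    at bootstrap instead of rendering a config with an empty value.
    """
    context = {}
    for variable, tag in TEMPLATE_VALUE_TAG_MAP.items():
        if tag not in instance_tags:
            raise KeyError(f'instance is missing tag {tag!r} (needed for {variable!r})')
        context[variable] = instance_tags[tag]
    return context


tags = {
    'environment': 'dev',
    'postboot.stalwart.function': 'mail',
    'postboot.stalwart.node_id': '0',
}
context = build_template_context(tags)
print(context)  # {'env': 'dev', 'function': 'mail', 'node_id': '0'}
```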

---

`@@ -0,0 +1,14 @@`

```ini
[Unit]
Description=Fluent Bit
Documentation=https://docs.fluentbit.io/manual/
Requires=network.target
After=network.target

[Service]
Type=simple
Environment="ENV={{ env }}"
ExecStart=/opt/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/fluent-bit.yaml
Restart=always

[Install]
WantedBy=multi-user.target
```

---

`@@ -0,0 +1,41 @@`

```yaml
---

service:
  flush: 1
  grace: 5
  daemon: no
  dns.mode: UDP
  hot_reload: on
  log_level: info
  storage.path: /fluent-bit/buffers
  storage.backlog.flush_on_shutdown: on
  storage.keep.rejected: on
  storage.rejected.path: /fluent-bit/dlq

pipeline:
  inputs:
    - name: systemd
      tag: cloudwatch.stalwart.{{ function }}
```

> **Author:** Tagging things with the function lets us catch them later and route them properly.

```yaml
      db: /opt/fluent-bit/thundermail.cursor
      systemd_filter: _SYSTEMD_UNIT=thundermail.service
```

> **Review comment:** Without a persisted cursor (for example
>
> **Author:** No, I'll revisit this.
>
> **Author:** I opted for the

```yaml
  filters: []

  outputs:
    # Send logs onward to CloudWatch. Log groups by the given name must pre-exist, and this service
    # must have sufficient IAM permissions to post events to these log streams. If these log streams
    # do not exist, this service must have permission to create them.
    - name: cloudwatch_logs
      match: cloudwatch.stalwart.mail
      log_group_name: /tb/${ENV}/stalwart
      log_stream_name: mail
      region: eu-central-1
      log_key: MESSAGE

    - name: cloudwatch_logs
      match: cloudwatch.stalwart.api
      log_group_name: /tb/${ENV}/stalwart
      log_stream_name: api
      region: eu-central-1
      log_key: MESSAGE
```
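Routing hinges on the tag. Each node's systemd input is tagged `cloudwatch.stalwart.{{ function }}`, and each `cloudwatch_logs` output only takes records whose tag matches its `match` value. `${ENV}` in `log_group_name` is expanded by Fluent Bit from the `ENV` variable that the systemd unit's `Environment=` line sets. The stdlib-only sketch below models both steps. It is an approximation: `fnmatch.fnmatchcase` stands in for Fluent Bit's `*` wildcard matching, and `string.Template` stands in for its environment-variable expansion. `OUTPUTS`, `input_tag`, and `route` are hypothetical names.

```python
import fnmatch
from string import Template

# The two cloudwatch_logs outputs from fluent-bit.yaml.j2, reduced to the
# fields that matter for routing (hypothetical in-memory model).
OUTPUTS = [
    {'match': 'cloudwatch.stalwart.mail', 'log_group_name': '/tb/${ENV}/stalwart', 'log_stream_name': 'mail'},
    {'match': 'cloudwatch.stalwart.api', 'log_group_name': '/tb/${ENV}/stalwart', 'log_stream_name': 'api'},
]


def input_tag(function: str) -> str:
    """Tag the systemd input carries once the template is rendered for a node."""
    return f'cloudwatch.stalwart.{function}'


def route(tag: str, env: dict[str, str]) -> list[tuple[str, str]]:
    """Return (log group, log stream) for every output whose match accepts the tag.

    fnmatchcase approximates Fluent Bit's '*' wildcard matching. ${ENV} is
    expanded from env, standing in for the environment systemd provides;
    substitute() raises KeyError if ENV is missing, which may differ from
    how Fluent Bit treats an unset variable.
    """
    destinations = []
    for output in OUTPUTS:
        if fnmatch.fnmatchcase(tag, output['match']):
            group = Template(output['log_group_name']).substitute(env)
            destinations.append((group, output['log_stream_name']))
    return destinations


env = {'ENV': 'dev'}
print(route(input_tag('mail'), env))     # [('/tb/dev/stalwart', 'mail')]
print(route(input_tag('api'), env))      # [('/tb/dev/stalwart', 'api')]
print(route(input_tag('unknown'), env))  # [] (the node() default function matches no output)
```

The last call carries the practical point. In this model, a node built without an explicit `function` keeps the `'unknown'` default from `node()`, so its records match neither output and would not be shipped to CloudWatch.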

---

`@@ -0,0 +1,6 @@`

```ini
[Journal]
{% if env == 'prod' %}
MaxRetentionSec=3day
{% else %}
MaxRetentionSec=7day
{% endif %}
```
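The journald template keeps three days of local journal on prod hosts (`MaxRetentionSec=3day`) and seven days everywhere else. Those figures line up with the `retention_in_days` values the stack configs set on the CloudWatch log group (3 for prod, 7 for dev). If keeping the two aligned is intended, a small check along these lines could catch drift. `journald_retention` and `span_to_days` are hypothetical helpers. The first mirrors the Jinja conditional by hand, and the second parses only the `day` unit used here, although systemd accepts many other time-span units.

```python
import re


def journald_retention(env: str) -> str:
    """Hand-written mirror of the {% if env == 'prod' %} branch in journald.conf.j2."""
    return '3day' if env == 'prod' else '7day'


def span_to_days(span: str) -> int:
    """Parse a systemd time span of the form '<N>day' into whole days.

    Only the 'day' unit is handled; systemd itself accepts many others.
    """
    match = re.fullmatch(r'(\d+)\s*day', span)
    if match is None:
        raise ValueError(f'unsupported time span: {span!r}')
    return int(match.group(1))


# retention_in_days from the tb:cloudwatch:LogDestination blocks in the stack configs
CLOUDWATCH_RETENTION_DAYS = {'dev': 7, 'prod': 3}

for env_name, cloudwatch_days in CLOUDWATCH_RETENTION_DAYS.items():
    local_days = span_to_days(journald_retention(env_name))
    status = 'ok' if local_days == cloudwatch_days else 'MISMATCH'
    print(f'{env_name}: journald={local_days}d cloudwatch={cloudwatch_days}d {status}')
```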

---

```diff
@@ -8,6 +8,15 @@ resources:
 - stalwart.postboot.keycloak_backend
 recovery_window_in_days: 0
 
+tb:cloudwatch:LogDestination:
+stalwart:
+log_group:
+retention_in_days: 7
+log_streams:
+api: api
+mail: mail
+org_name: tb
+
 tb:network:MultiTierVpc:
 vpc:
 cidr_block: 10.2.0.0/16
@@ -43,18 +52,18 @@ resources:
 additional_routes:
 private:
 - destination_cidr_block: 10.202.0.0/22 # observability-dev
-vpc_peering_connection_id: pcx-0d2027442f0e54ca4
+vpc_peering_connection_id: pcx-04d7e54008cd9326c
```

> **Author:** The peering connection changed when I rebuilt dev for testing.

```diff
 public: []
 endpoint_interfaces:
 - secretsmanager
 
-# tb:ec2:SshableInstance: {}
+tb:ec2:SshableInstance: {}
 # Fill out this template to build an SSH bastion
-tb:ec2:SshableInstance:
-bastion:
-ssh_keypair_name: mailstrom-dev
-source_cidrs:
-- 10.2.0.0/16 # Internal access
+# tb:ec2:SshableInstance:
+# bastion:
+# ssh_keypair_name: mailstrom-dev
+# source_cidrs:
+# - 10.2.0.0/16 # Internal access
 
 tb:mailstrom:StalwartCluster:
 thundermail:
@@ -81,6 +90,7 @@ resources:
 nodes:
 "0": # Must be a unique, stringified integer
 disable_api_termination: True
+function: 'mail'
 ignore_ami_changes: True
 ignore_user_data_changes: True
 instance_type: t3.micro
@@ -99,6 +109,7 @@ resources:
 storage_capacity: 20
 "50":
 disable_api_termination: True
+function: 'api'
 ignore_ami_changes: True
 ignore_user_data_changes: True
 instance_type: t3.micro
```

---

```diff
@@ -7,6 +7,15 @@ resources:
 - stalwart.postboot.keycloak_backend
 - stalwart.postboot.postgresql_backend
 
+tb:cloudwatch:LogDestination:
+stalwart:
+log_group:
+retention_in_days: 3
```

> **Author:** Shorter data retention in prod. This is actually the default value for the option, but I like to be explicit.

```diff
+log_streams:
+api: api
+mail: mail
+org_name: tb
+
 tb:network:MultiTierVpc:
 vpc:
 cidr_block: 10.0.0.0/16
@@ -93,6 +102,7 @@ resources:
 nodes:
 "0": # Must be a unique, stringified integer
 disable_api_termination: True
+function: 'mail'
 ignore_ami_changes: True
 ignore_user_data_changes: True
 instance_type: t3a.large
@@ -110,6 +120,7 @@ resources:
 storage_capacity: 20
 "1": # Must be a unique, stringified integer
 disable_api_termination: True
+function: 'mail'
 ignore_ami_changes: True
 ignore_user_data_changes: True
 instance_type: t3a.large
@@ -127,6 +138,7 @@ resources:
 storage_capacity: 20
 "50":
 disable_api_termination: True
+function: 'api'
 ignore_ami_changes: True
 ignore_user_data_changes: True
 instance_type: t3.micro
```
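Both stack configs declare one `stalwart` log destination with `api` and `mail` streams, and the Fluent Bit outputs write to a group named `/tb/${ENV}/stalwart`. As the comment in the Fluent Bit config notes, that group must already exist. The sketch below cross-checks the two sides. It **assumes** that `tb:cloudwatch:LogDestination` names its group `/<org_name>/<env>/<resource name>`. That rule is inferred from the Fluent Bit config, not taken from `tb_pulumi`, so confirm it against the released module. `LOG_DESTINATIONS`, `FLUENT_BIT_OUTPUTS`, and both functions are hypothetical.

```python
# Hypothetical cross-check between the LogDestination stack config and the
# Fluent Bit outputs. ASSUMPTION: the group is named /<org_name>/<env>/<name>.
LOG_DESTINATIONS = {
    'stalwart': {
        # Keys and values are identical in both configs, so either could be
        # the stream name; this sketch uses the values.
        'log_streams': {'api': 'api', 'mail': 'mail'},
        'org_name': 'tb',
    },
}

# (log_group_name, log_stream_name) pairs from fluent-bit.yaml.j2
FLUENT_BIT_OUTPUTS = [
    ('/tb/${ENV}/stalwart', 'mail'),
    ('/tb/${ENV}/stalwart', 'api'),
]


def expected_streams(env: str) -> set[tuple[str, str]]:
    """(group, stream) pairs the stack would create under the assumed naming rule."""
    pairs = set()
    for name, dest in LOG_DESTINATIONS.items():
        group = f"/{dest['org_name']}/{env}/{name}"
        for stream in dest['log_streams'].values():
            pairs.add((group, stream))
    return pairs


def missing_destinations(env: str) -> set[tuple[str, str]]:
    """Fluent Bit destinations that have no matching group/stream in the stack config."""
    wanted = {(group.replace('${ENV}', env), stream) for group, stream in FLUENT_BIT_OUTPUTS}
    return wanted - expected_streams(env)


print(missing_destinations('dev'))   # set()
print(missing_destinations('prod'))  # set()
```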

---

```diff
@@ -1,4 +1,4 @@
 Jinja2>=3.1,<4.0
 pulumi_cloudflare==6.6.0
-tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.16
+tb_pulumi @ git+https://github.com/thunderbird/pulumi.git@v0.0.18
```

> **Author:** This is a future version with a bunch of little fixes to the core library that we need here, which is why the release has to be done before this is merged.

```diff
 toml>=0.10.2,<0.11
```

---

```diff
@@ -274,6 +274,7 @@ def __init__(
 self,
 name: str,
 project: tb_pulumi.ThunderbirdPulumiProject,
+log_group_arn: str,
```

> **Author:** This new variable gets passed down to the IAM setup so we can be sure the node profile includes write access to the Stalwart log group.

```diff
 private_subnets: list[aws.ec2.Subnet],
 public_subnets: list[aws.ec2.Subnet],
 https_features: list = [],
@@ -343,8 +344,16 @@ def __init__(
 s3_bucket, s3_secret, s3_policy = stalwart_s3.s3(self=self)
 
 # Build an IAM role with a policy to enable node bootstrapping
-profile_policy, role, profile_postboot_attachment, profile_s3_attachment, profile = stalwart_iam.iam(
+(
+    profile_policy,
+    role,
+    profile_postboot_attachment,
+    profile_s3_attachment,
+    profile_logwrite_attachment,
+    profile,
+) = stalwart_iam.iam(
     self,
+    log_group_arn=log_group_arn,
     s3_policy=s3_policy,
 )
 
@@ -463,6 +472,7 @@ def __init__(
 'spam_filter_secret': config_secrets['spam_filter'],
 'node_profile': profile,
 'node_profile_policy': profile_policy,
+'node_profile_logwrite_attachment': profile_logwrite_attachment,
 'node_profile_postboot_policy_attachment': profile_postboot_attachment,
 'node_profile_s3_policy_attachment': profile_s3_attachment,
 'node_sgs': self.node_sgs,
@@ -671,6 +681,7 @@ def node(
 depends_on: list = [],
 disable_api_stop: bool = False,
 disable_api_termination: bool = False,
+function: str = 'unknown',
 ignore_ami_changes: bool = True,
 ignore_user_data_changes: bool = True,
 instance_type: str = 't3.micro',
@@ -694,6 +705,9 @@ def node(
     False.
 :type disable_api_termination: bool, optional
 
+:param function: This becomes the ``postboot.stalwart.function`` tag on the instance and the ``function``
+    variable inside of postboot templates.
+
 :param ignore_ami_changes: When True, changes to the instance's AMI will not be applied. This prevents unwanted
     rebuilding of cluster nodes, potentially causing downtime. Set to False if the AMI has changed and you
     intend on rebuilding the node. Defaults to True.
@@ -749,6 +763,7 @@ def node(
 postboot_tags = {
     'postboot.stalwart.aws_region': self.project.aws_region,
     'postboot.stalwart.env': self.project.stack,
+    'postboot.stalwart.function': function,
     'postboot.stalwart.https_paths': ','.join(https_paths),
     'postboot.stalwart.image': self.stalwart_image,
     'postboot.stalwart.node_services': node_services_tag,
@@ -810,6 +825,9 @@ def user_data(self):
 archive_file_base = './bootstrap'
 archive_files = [
     'bootstrap.py',
+    'templates/fluent-bit.service.j2',
+    'templates/fluent-bit.yaml.j2',
```

> **Author:** These entries ensure that the new config templates wind up in the first-phase bootstrapping blob.

```diff
+    'templates/journald.conf.j2',
     'templates/ports.j2',
     'templates/stalwart.toml.j2',
     'templates/thundermail.service.j2',
```
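`archive_files` and the bootstrap script's `TEMPLATE_MAP` have to stay in sync. A template that `TEMPLATE_MAP` expects but that never makes it into the bootstrapping blob would presumably fail at render time on the host. A hypothetical pre-deploy check could look like the following, assuming templates always live under `templates/` as the paths above show. The values are copied from the diffs in this PR, and `templates_missing_from_archive` is an illustrative helper, not part of the codebase. The check only runs in one direction: `ports.j2` is archived without appearing in `TEMPLATE_MAP`, and that is not flagged.

```python
from pathlib import PurePosixPath

# Values copied from the diffs in this PR.
TEMPLATE_MAP_KEYS = {
    'fluent-bit.service.j2',
    'fluent-bit.yaml.j2',
    'journald.conf.j2',
    'stalwart.toml.j2',
    'thundermail.service.j2',
}

ARCHIVE_FILES = [
    'bootstrap.py',
    'templates/fluent-bit.service.j2',
    'templates/fluent-bit.yaml.j2',
    'templates/journald.conf.j2',
    'templates/ports.j2',
    'templates/stalwart.toml.j2',
    'templates/thundermail.service.j2',
]


def templates_missing_from_archive(template_keys, archive_files) -> set[str]:
    """Return template names with no matching templates/<name> entry in the archive list."""
    archived = {
        PurePosixPath(path).name
        for path in archive_files
        if PurePosixPath(path).parent == PurePosixPath('templates')
    }
    return set(template_keys) - archived


print(templates_missing_from_archive(TEMPLATE_MAP_KEYS, ARCHIVE_FILES))  # set()
# Truncating the list shows what the check would catch:
print(templates_missing_from_archive(TEMPLATE_MAP_KEYS, ARCHIVE_FILES[:2]))
```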

> **Review comment:** This wires the instances to the new `LogDestination` write policy, but that policy lives in the upcoming `tb_pulumi` release. From the implementation I reviewed, it looks like `logs:CreateLogStream`/`logs:PutLogEvents` may be scoped only to the log-group ARN rather than the log-stream ARNs. If that is still true in the cut release, Fluent Bit will bootstrap successfully but get `AccessDenied` when it tries to write events. Could we double-check the released policy shape before merging?
>
> **Reply:** No: this was a problem, but it was fixed by this PR, which is slated to go out with that release once it is approved and merged.
>
> **Reply:** I just got that PR merged and have tested that code directly, successfully. When I take it to prod, I'll double-check the policy that goes out there before shipping the fluent-bit configs to the live servers.
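The review concern comes down to resource ARNs. A CloudWatch Logs stream has its own ARN nested under its group's ARN (`arn:aws:logs:<region>:<account>:log-group:<group>:log-stream:<stream>`). According to AWS's service authorization reference, `logs:CreateLogStream` is checked against the log group and `logs:PutLogEvents` against the log stream. A statement that lists only the bare group ARN may therefore allow stream creation but deny writes. The sketch below builds a policy document that covers both resources. It is a plain-Python illustration, not the `tb_pulumi` `LogDestination` policy. `log_write_policy` is a hypothetical helper and the account ID is made up. Check Fluent Bit's `cloudwatch_logs` documentation for the full set of permissions the plugin needs.

```python
import json


def log_write_policy(log_group_arn: str) -> dict:
    """Build an IAM policy document that lets a node create and write streams in one group.

    Sketch only: both the group ARN (for CreateLogStream) and a wildcard
    stream ARN under it (for PutLogEvents) are listed. A trailing ':*',
    which some CloudWatch Logs APIs append to group ARNs, is stripped first
    so the stream ARN isn't malformed.
    """
    group_arn = log_group_arn.removesuffix(':*')
    stream_arn = f'{group_arn}:log-stream:*'
    return {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Sid': 'StalwartLogWrite',
                'Effect': 'Allow',
                'Action': ['logs:CreateLogStream', 'logs:PutLogEvents'],
                'Resource': [group_arn, stream_arn],
            }
        ],
    }


# Hypothetical account ID; group name follows the Fluent Bit config for ENV=dev.
arn = 'arn:aws:logs:eu-central-1:123456789012:log-group:/tb/dev/stalwart'
print(json.dumps(log_write_policy(arn), indent=2))
```

Scoping the second resource to the two known streams (`…:log-stream:api` and `…:log-stream:mail`) instead of a wildcard would be a tighter alternative, at the cost of updating the policy whenever a stream is added.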