[CAI-753] New lambda chatbot monitor with its queue and dlq by uolter · Pull Request #2000 · pagopa/developer-portal

uolter · 2026-02-09T08:19:32Z

List of Changes

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

Chore (nothing changes by a user perspective)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My change requires a change to the documentation.
I have updated the documentation accordingly.

changeset-bot · 2026-02-09T08:19:39Z

🦋 Changeset detected

Latest commit: 076c822

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| infrastructure | Major |

Copilot

Pull request overview

This PR introduces a new “chatbot monitor” Lambda and supporting SQS queue/DLQ wiring, while also reshaping Langfuse infrastructure (module placement, networking access, and ClickHouse resource sizing) and updating the default chatbot generation model.

Changes:

Add a new chatbot monitor Lambda, plus a new monitor SQS FIFO queue and DLQ, and refactor SQS resources to a keyed for_each structure.
Move/encapsulate the Langfuse module under the chatbot module and expose a service-discovery endpoint output for internal access.
Update Langfuse ClickHouse EFS throughput mode and increase ClickHouse task CPU/memory; update default chatbot generation model ID.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| apps/infrastructure/src/variables.tf | Update default chatbot generation model value. |
| apps/infrastructure/src/refacotr.tf | Add `moved` blocks for SQS refactor and Langfuse module relocation. |
| apps/infrastructure/src/README.md | Regenerate docs: remove top-level langfuse module entry; update model default shown. |
| apps/infrastructure/src/modules/langfuse/variables.tf | Add lambda SG input for Langfuse web ingress. |
| apps/infrastructure/src/modules/langfuse/security_group.tf | Change Langfuse web ingress handling to SG rules and attempt to include lambda SG. |
| apps/infrastructure/src/modules/langfuse/README.md | Regenerate module docs (new rule + new output). |
| apps/infrastructure/src/modules/langfuse/outputs.tf | Add service discovery endpoint output for langfuse-web. |
| apps/infrastructure/src/modules/langfuse/efs.tf | Switch ClickHouse EFS to `throughput_mode = "elastic"`. |
| apps/infrastructure/src/modules/langfuse/ecs.tf | Increase ClickHouse task/container CPU & memory. |
| apps/infrastructure/src/modules/chatbot/variables.tf | Extend monitoring ECS config object; add hosted zone id input. |
| apps/infrastructure/src/modules/chatbot/sqs.tf | Refactor evaluate queue/DLQ into `for_each` and add monitor queue/DLQ. |
| apps/infrastructure/src/modules/chatbot/README.md | Regenerate module docs for new resources/inputs. |
| apps/infrastructure/src/modules/chatbot/langfuse.tf | Add nested Langfuse module instantiation from within chatbot. |
| apps/infrastructure/src/modules/chatbot/lambda_monitor.tf | Add the new monitor Lambda + IAM/logging/event source mapping. |
| apps/infrastructure/src/modules/chatbot/lambda_index.tf | Reuse shared lambda assume-role policy document. |
| apps/infrastructure/src/modules/chatbot/lambda_evaluate.tf | Update to new queue addresses; add env vars for monitor/evaluate queues. |
| apps/infrastructure/src/modules/chatbot/lambda_api.tf | Adjust env vars and IAM policy to send to the monitor queue. |
| apps/infrastructure/src/modules/chatbot/ecs.tf | Make Langfuse ECS desired/min/max capacities configurable. |
| apps/infrastructure/src/modules/chatbot/ecr.tf | Add a new ECR repo definition for the monitor lambda image. |
| apps/infrastructure/src/modules/chatbot/data.tf | Add shared `lambda_assume_role` policy document. |
| apps/infrastructure/src/main.tf | Pass hosted zone id into chatbot; remove top-level langfuse module block. |
| apps/infrastructure/src/env/dev/terraform.tfvars | Override monitoring ECS desired count and update model generation. |
| .changeset/small-sheep-like.md | Changeset entry for EFS throughput mode change. |
| .changeset/six-animals-hear.md | Changeset entry for ClickHouse CPU/RAM change. |
| .changeset/shy-hoops-trade.md | Changeset entry for chatbot monitor lambda addition. |
| .changeset/polite-months-design.md | Changeset entry for model id update. |
| .changeset/metal-knives-grow.md | Changeset entry for ClickHouse CPU update. |
| .changeset/dry-llamas-beam.md | Changeset entry for scaling Langfuse down in dev. |
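
For readers unfamiliar with the shared assume-role document listed for `data.tf`, the sketch below shows what such a data source commonly looks like and how a Lambda role can reuse it. Only the name `lambda_assume_role` comes from the review summary; the role resource and its name are hypothetical, and the PR's actual definition may differ.

```hcl
# Trust policy that lets the Lambda service assume a role.
# Defined once and reused by every Lambda execution role in the module.
data "aws_iam_policy_document" "lambda_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

# Hypothetical role, shown only to illustrate the reuse.
resource "aws_iam_role" "lambda_monitor" {
  name               = "${local.prefix}-lambda-monitor" # assumed naming
  assume_role_policy = data.aws_iam_policy_document.lambda_assume_role.json
}
```

Sharing one document avoids repeating the same inline `jsonencode` trust policy in each Lambda file.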

apps/infrastructure/src/modules/langfuse/security_group.tf

+# TODO: the for_each fails when aws_security_group.lb.id is a string known only after apply. How can this be solved?
+resource "aws_security_group_rule" "langfuse_web_lambda_ingress" {
+  for_each = { for id in [aws_security_group.lb.id, var.lambda_security_group_id] : id => id if id != null }
+
+  type                     = "ingress"
+  from_port                = 3000
+  to_port                  = 3000
+  protocol                 = "tcp"
+  security_group_id        = aws_security_group.langfuse_web.id
+  source_security_group_id = each.value
+  description              = "Allow lambda monitor access to langfuse-web"
+
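
One possible answer to the TODO above, offered as a sketch rather than the fix the PR adopted: `for_each` needs its keys at plan time, but its values may stay unknown until apply. Keying the map by literal names instead of by security group ID sidesteps the error. The `enable_lambda_ingress` variable and the `langfuse_web_ingress_sources` local are hypothetical; a plan-time flag replaces the `id != null` filter because a null check on an unknown value is itself unknown.

```hcl
# Hypothetical input: a flag whose value is known at plan time.
variable "enable_lambda_ingress" {
  type    = bool
  default = false
}

locals {
  # Keys are literals, so they are known at plan time even when
  # the security group IDs are not.
  langfuse_web_ingress_sources = {
    lb     = { enabled = true, sg_id = aws_security_group.lb.id }
    lambda = { enabled = var.enable_lambda_ingress, sg_id = var.lambda_security_group_id }
  }
}

resource "aws_security_group_rule" "langfuse_web_lambda_ingress" {
  # Filter on the known flag only; the IDs end up as map values.
  for_each = {
    for name, src in local.langfuse_web_ingress_sources : name => src.sg_id
    if src.enabled
  }

  type                     = "ingress"
  from_port                = 3000
  to_port                  = 3000
  protocol                 = "tcp"
  security_group_id        = aws_security_group.langfuse_web.id
  source_security_group_id = each.value
  description              = "Allow ${each.key} access to langfuse-web"
}
```

With this layout the instances are addressed as `["lb"]` and `["lambda"]`, so the plan no longer depends on IDs that exist only after apply.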


apps/infrastructure/src/modules/chatbot/sqs.tf

+resource "aws_sqs_queue" "chatbot_queue" {
+  for_each = local.chatbot_queues
+
+  name                        = "${local.prefix}-${each.key}-queue.fifo"
  fifo_queue                  = true
  content_based_deduplication = true
  deduplication_scope         = "messageGroup"
  fifo_throughput_limit       = "perMessageGroupId"
+  visibility_timeout_seconds  = 120

  redrive_policy = jsonencode({
-    deadLetterTargetArn = aws_sqs_queue.chatbot_evaluate_queue_dlq.arn
+    deadLetterTargetArn = aws_sqs_queue.chatbot_dlq[each.key].arn
    maxReceiveCount     = 2
  })
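
The hunk above references `local.chatbot_queues` and `aws_sqs_queue.chatbot_dlq` without showing them. A minimal sketch of how the pieces could fit together follows, assuming the local is a set holding the two keys that appear in the diff (`evaluate`, `monitor`), that the DLQ naming mirrors the queue naming, and that the `moved` blocks live inside the chatbot module. The PR places its `moved` blocks in a root-level file, where the addresses would need a `module.chatbot.` prefix.

```hcl
locals {
  # Assumed definition: only the keys "evaluate" and "monitor" appear in the diff.
  chatbot_queues = toset(["evaluate", "monitor"])
}

# One dead-letter queue per key. The DLQ of a FIFO queue must also be FIFO.
resource "aws_sqs_queue" "chatbot_dlq" {
  for_each = local.chatbot_queues

  name       = "${local.prefix}-${each.key}-dlq.fifo" # naming is an assumption
  fifo_queue = true
}

# Record that the existing evaluate queues were re-addressed, not replaced,
# so Terraform does not destroy and recreate them.
moved {
  from = aws_sqs_queue.chatbot_evaluate_queue
  to   = aws_sqs_queue.chatbot_queue["evaluate"]
}

moved {
  from = aws_sqs_queue.chatbot_evaluate_queue_dlq
  to   = aws_sqs_queue.chatbot_dlq["evaluate"]
}
```

Adding another queue then means adding one key to the set; the queue, its DLQ, and the redrive policy follow from `each.key`.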


apps/infrastructure/src/modules/chatbot/lambda_evaluate.tf

@@ -75,15 +64,17 @@ resource "aws_iam_role_policy" "lambda_evaluate_policy" {
          "sqs:DeleteMessage",
          "sqs:GetQueueAttributes",


apps/infrastructure/src/modules/chatbot/lambda_monitor.tf

+          "sqs:ReceiveMessage",
+          "sqs:DeleteMessage",
+          "sqs:GetQueueAttributes",
+          "sqs:SendMessage"
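
The receive, delete, and get-attributes actions listed above are the ones a Lambda execution role needs when the function is triggered from SQS. The event source mapping mentioned in the review summary is not shown in the captured hunks; a sketch of what it could look like is below. The `aws_lambda_function.chatbot_monitor` address and the batch size are assumptions.

```hcl
# Connect the monitor queue to the monitor Lambda.
resource "aws_lambda_event_source_mapping" "chatbot_monitor" {
  event_source_arn = aws_sqs_queue.chatbot_queue["monitor"].arn
  function_name    = aws_lambda_function.chatbot_monitor.arn # hypothetical resource name
  batch_size       = 1                                       # assumed value
}
```

AWS recommends a queue visibility timeout of at least six times the function timeout, so the 120-second visibility timeout set in `sqs.tf` would suit a function timeout of roughly 20 seconds or less.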


.changeset/small-sheep-like.md

+"infrastructure": minor
+---
+
+Update Langfuse ClickHouse EFS from bursting to elastic.


apps/infrastructure/src/modules/chatbot/langfuse.tf

+  count = var.environment == "prod" ? 0 : 1
+
+  environment        = var.environment
+  region             = var.aws_region
+  vpc_id             = var.vpc.id
+  private_subnet_ids = var.vpc.private_subnets
+  public_subnet_ids  = var.vpc.public_subnets
+  custom_domain_id   = var.hosted_zone_id
+  custom_domain_name = var.dns_domain_name
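
Because the module is created with `count`, it has no instance in prod, and anything that reads its outputs has to tolerate that. A sketch of one way to do this follows; `module.langfuse` as the module label and `service_discovery_endpoint` as the output name are both assumptions, since neither is visible in this hunk. The `one()` function requires Terraform 0.15 or later.

```hcl
locals {
  # The splat yields an empty list when count = 0 (prod) and a one-element
  # list otherwise; one() converts that to null or to the single value.
  langfuse_endpoint = one(module.langfuse[*].service_discovery_endpoint) # output name assumed
}
```

A consumer such as a Lambda environment variable can then fall back explicitly, for example `local.langfuse_endpoint != null ? local.langfuse_endpoint : ""`.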


apps/infrastructure/src/modules/chatbot/lambda_api.tf

 resource "aws_iam_policy" "chatbot_monitor_queue" {
  name        = "lambda-sqs-send"
  description = "Allow Lambda to send messages to SQS queue"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["sqs:SendMessage", "sqs:GetQueueUrl"]
-        Resource = aws_sqs_queue.chatbot_evaluate_queue.arn
+        Resource = aws_sqs_queue.chatbot_queue["monitor"].arn
      }
    ]


apps/infrastructure/src/modules/chatbot/lambda_evaluate.tf

      {
        Effect = "Allow"
        Action = [
          "sqs:SendMessage",
-
+          "sqs:GetQueueUrl",
+        ]
+        Resource = [
+          aws_sqs_queue.chatbot_queue["monitor"].arn,
        ]
-        Resource = aws_sqs_queue.chatbot_evaluate_queue_dlq.arn
      },


apps/infrastructure/src/modules/chatbot/lambda_monitor.tf

+          "sqs:SendMessage",
+          "sqs:GetQueueUrl"
+        ]
+        Resource = aws_sqs_queue.chatbot_queue["evaluate"].arn
+      },
+      {
+        Effect = "Allow"
+        Action = [


@mdciri

* lambda api and evaluate new env variable * + change set * Apply suggestion from @mdciri Co-authored-by: Marco Domenico Cirillo <59966344+mdciri@users.noreply.github.com> * lambda evaluate new env variables google service account --------- Co-authored-by: Marco Domenico Cirillo <59966344+mdciri@users.noreply.github.com>

github-actions · 2026-03-24T08:48:53Z

Branch is not up to date with base branch

@uolter it seems this Pull Request is not updated with base branch.
Please proceed with a merge or rebase to solve this.

github-actions · 2026-03-24T08:48:54Z

Jira Pull Request Link

This Pull Request refers to the following Jira issue CAI-753

github-actions · 2026-04-08T04:40:31Z

This pull request is stale because it has been open for 14 days with no activity. If the pull request is still valid, please update it within 21 days to keep it open or merge it, otherwise it will be closed automatically.

New lambda chatbot monitor with its queue and dlq

3373d9a

uolter temporarily deployed to prod February 9, 2026 08:19 — with GitHub Actions Inactive

uolter temporarily deployed to uat February 9, 2026 08:19 — with GitHub Actions Inactive

uolter temporarily deployed to dev February 9, 2026 08:19 — with GitHub Actions Inactive

github-actions bot added the infra label Feb 9, 2026

github-actions bot assigned uolter Feb 9, 2026

refactor SQS queues with DRY principle.

5c6e35e

uolter temporarily deployed to dev February 9, 2026 08:27 — with GitHub Actions Inactive

uolter temporarily deployed to uat February 9, 2026 08:27 — with GitHub Actions Inactive

uolter temporarily deployed to prod February 9, 2026 08:27 — with GitHub Actions Inactive

Refactor iam trusted policy duplicates

f37a946

uolter temporarily deployed to dev February 9, 2026 10:48 — with GitHub Actions Inactive

uolter had a problem deploying to uat February 9, 2026 10:48 — with GitHub Actions Failure

uolter temporarily deployed to prod February 9, 2026 10:48 — with GitHub Actions Inactive

update env variables lambda monitor

15c205b

uolter temporarily deployed to prod February 9, 2026 11:10 — with GitHub Actions Inactive

uolter temporarily deployed to dev February 9, 2026 11:10 — with GitHub Actions Inactive

uolter had a problem deploying to uat February 9, 2026 11:10 — with GitHub Actions Failure

update lambda monitor iam permissions

e2a86b0

uolter had a problem deploying to uat February 9, 2026 11:18 — with GitHub Actions Failure

uolter temporarily deployed to prod February 9, 2026 11:18 — with GitHub Actions Inactive

uolter temporarily deployed to dev February 9, 2026 11:18 — with GitHub Actions Inactive

notes for changes that needs to be apply later

f32423f

uolter temporarily deployed to prod February 9, 2026 13:56 — with GitHub Actions Inactive

uolter temporarily deployed to dev February 9, 2026 13:56 — with GitHub Actions Inactive

uolter had a problem deploying to uat February 9, 2026 13:56 — with GitHub Actions Failure

uolter added 2 commits February 9, 2026 17:26

lambda monitor langfuse service discovery with new service discovery endpoint

509fbef

refactor moved the langfuse module inside chatbot to avoid circular reference

a09b6b7

uolter temporarily deployed to dev February 10, 2026 14:41 — with GitHub Actions Inactive

uolter temporarily deployed to uat February 23, 2026 11:26 — with GitHub Actions Inactive

uolter temporarily deployed to dev February 23, 2026 11:26 — with GitHub Actions Inactive

Merge branch 'main' into CAI-753-lambda-monitor

4da9f81

uolter temporarily deployed to dev February 26, 2026 14:41 — with GitHub Actions Inactive

uolter temporarily deployed to uat February 26, 2026 14:41 — with GitHub Actions Inactive

uolter temporarily deployed to prod February 26, 2026 14:41 — with GitHub Actions Inactive

uolter added the applied dev label Mar 2, 2026

uolter added 16 commits March 2, 2026 10:44

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

944d169

[CAI-814] Langfuse out of memory issue (#2077)

92eba7a

* Update clickhouse ram end cpu * + changeset

[CAI-814] Langfuse out of memory (#2078)

70c5b60

* Update clickhouse ram end cpu * + changeset * clickhouse task definition update

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

f23d2a9

Langfuse 2 ecs desired ecs task 0 RDS status stopped (#2094)

d437f54

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

ea4e5d0

Merge branch 'main' into CAI-753-lambda-monitor

03f1dbb

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

72d92f2

Merge branch 'CAI-753-lambda-monitor' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

1ce5c2b

update sqs visibility time out

2d8088a

rebase with main

0459b4b

[CAI-839] Update chb model (#2109)

9bb309b

* update var chb_moded_id * chb-model set in lambda evaluate too. * terraform fmt

rebase with main

a6c2638

update model id in dev

5d98068

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

5231b33

Merge branch 'main' of https://github.com/pagopa/developer-portal into CAI-753-lambda-monitor

a35c34d

Copilot AI reviewed Mar 19, 2026

uolter and others added 3 commits March 23, 2026 08:59

Merge branch 'main' into CAI-753-lambda-monitor

bf7e1e5

[CAI-877] Missed permission lambda index and evaluate (#2186)

076c822

* refacotr permissions lambda evaluate and index * +changeset

uolter wants to merge 48 commits into main from CAI-753-lambda-monitor

2 participants
