Skip to content

Feature/scalability implementation#255

Closed
lakindu-yl wants to merge 33 commits intodevelopfrom
feature/scalability-implementation
Closed

Feature/scalability implementation#255
lakindu-yl wants to merge 33 commits intodevelopfrom
feature/scalability-implementation

Conversation

@lakindu-yl
Copy link
Copy Markdown
Contributor

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update
  • CI/CD update
  • Other (please describe):

Related Issues

Fixes #
Relates to #

Changes Made

Testing

  • Unit tests pass locally
  • Integration tests pass locally
  • Manual testing completed
  • New tests added for changes

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

Copilot AI review requested due to automatic review settings March 26, 2026 03:40
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a “scalable” deployment path for Agent Kernel, adding shared request-processing logic plus AWS infrastructure and runtime components for queue-based execution and (optionally) WebSocket-based async communication.

Changes:

  • Added a centralized ChatService and ResponseStore abstraction to reuse request parsing/agent execution logic across deployments.
  • Implemented AWS “serverless scalable mode” with SQS-driven agent execution, response handling, optional response storage, and WebSocket connection management.
  • Extended AKConfig with execution settings and added Terraform modules to provision queues, response stores, REST API Gateway, and WebSocket API Gateway.

Reviewed changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 25 comments.

Show a summary per file
File Description
ak-py/src/agentkernel/deployment/common/response_store.py Adds a response-store interface for session/message storage.
ak-py/src/agentkernel/deployment/common/chat_service.py Adds shared request parsing + agent execution wrapper used by deployments.
ak-py/src/agentkernel/deployment/common/init.py Common deployment package init (no content shown).
ak-py/src/agentkernel/deployment/azure/akfunction.py Refactors Azure Functions handler to use shared ChatService.
ak-py/src/agentkernel/deployment/aws/serverless/internal/response_handler.py Adds SQS consumer(s) to store or push responses (REST vs WebSocket).
ak-py/src/agentkernel/deployment/aws/serverless/core/websocket_manager.py Adds DynamoDB-backed WebSocket connection management + send helpers.
ak-py/src/agentkernel/deployment/aws/serverless/core/sqs_consumer.py Adds a base SQS consumer with retry/permanent-failure handling.
ak-py/src/agentkernel/deployment/aws/serverless/core/default_ws_endpoints.py Adds default WebSocket routes ($connect/$disconnect/$default/chat).
ak-py/src/agentkernel/deployment/aws/serverless/core/default_endpoints.py Adds default REST routes for sync/async/stream execution modes.
ak-py/src/agentkernel/deployment/aws/serverless/core/init.py Exposes default endpoints handler.
ak-py/src/agentkernel/deployment/aws/serverless/aklambda.py Adds Lambda router supporting REST and WebSocket dispatch based on mode.
ak-py/src/agentkernel/deployment/aws/serverless/akauthorizer.py Fixes/adjusts authorizer import path for new package layout.
ak-py/src/agentkernel/deployment/aws/serverless/akagentrunner.py Adds SQS-driven agent runner that posts results to output queue.
ak-py/src/agentkernel/deployment/aws/serverless/init.py Exports serverless entrypoints (Lambda/Authorizer/Runner).
ak-py/src/agentkernel/deployment/aws/response_store/redis.py Adds Redis-backed response store implementation.
ak-py/src/agentkernel/deployment/aws/response_store/handler.py Adds store selection based on AKConfig.execution.response_store.
ak-py/src/agentkernel/deployment/aws/response_store/dynamodb.py Adds DynamoDB-backed response store implementation.
ak-py/src/agentkernel/deployment/aws/response_store/init.py Adds response_store package export.
ak-py/src/agentkernel/deployment/aws/aklambda.py Removes legacy AWS Lambda router/handler implementation.
ak-py/src/agentkernel/deployment/aws/init.py Updates AWS deployment exports to new serverless package structure.
ak-py/src/agentkernel/core/model.py Adds ExecutionMode enum for deployment mode selection.
ak-py/src/agentkernel/core/config.py Adds execution config (mode/queues/response_store/websocket table).
ak-deployment/ak-aws/serverless/variables.tf Adds scalable mode inputs: execution mode, runner/handler, queue/store/websocket configs.
ak-deployment/ak-aws/serverless/state.tf Reworks module composition to conditionally create REST vs WebSocket and scalable components.
ak-deployment/ak-aws/serverless/outputs.tf Adds outputs for queues, runner, response handler, stores, and websocket resources.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/variables.tf Defines WebSocket API Gateway module inputs.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/outputs.tf Exposes WebSocket API Gateway outputs.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/main.tf Provisions API Gateway v2 WebSocket API + routes + Lambda permission.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/README.md Documents the WebSocket API Gateway module.
ak-deployment/ak-aws/serverless/modules/response-stores/variables.tf Defines response store module inputs (Redis/DynamoDB).
ak-deployment/ak-aws/serverless/modules/response-stores/outputs.tf Exposes response store outputs.
ak-deployment/ak-aws/serverless/modules/response-stores/main.tf Provisions Redis and/or DynamoDB response storage.
ak-deployment/ak-aws/serverless/modules/response-handler/variables.tf Defines response-handler Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/response-handler/response_handler.tf Provisions response-handler Lambda + IAM + SQS event source mapping.
ak-deployment/ak-aws/serverless/modules/response-handler/outputs.tf Exposes response-handler Lambda outputs.
ak-deployment/ak-aws/serverless/modules/response-handler/build_response_handler.sh Builds a minimal response-handler deployment zip from the main package.
ak-deployment/ak-aws/serverless/modules/response-handler/.gitignore Ignores response-handler build artifacts.
ak-deployment/ak-aws/serverless/modules/request-handler/variables.tf Defines request-handler Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/request-handler/outputs.tf Exposes request-handler outputs.
ak-deployment/ak-aws/serverless/modules/request-handler/main.tf Provisions request-handler Lambda + IAM permissions (incl. SQS/WebSocket perms).
ak-deployment/ak-aws/serverless/modules/queues/variables.tf Defines queue module inputs.
ak-deployment/ak-aws/serverless/modules/queues/outputs.tf Exposes queue outputs.
ak-deployment/ak-aws/serverless/modules/queues/main.tf Provisions input/output SQS queues using shared SQS module.
ak-deployment/ak-aws/serverless/modules/api-gateway/variables.tf Defines REST API Gateway module inputs.
ak-deployment/ak-aws/serverless/modules/api-gateway/outputs.tf Exposes REST API Gateway outputs.
ak-deployment/ak-aws/serverless/modules/api-gateway/main.tf Provisions REST API Gateway v1 resources/methods/integration/logging/authorizer.
ak-deployment/ak-aws/serverless/modules/agent-runner/variables.tf Defines agent-runner Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/agent-runner/outputs.tf Exposes agent-runner Lambda outputs.
ak-deployment/ak-aws/serverless/modules/agent-runner/main.tf Provisions agent-runner Lambda + IAM + SQS event source mapping.
ak-deployment/ak-aws/serverless/README.md Documents new WebSocket/async usage and outputs.
ak-deployment/ak-aws/common/modules/sqs/variables.tf Adds/defines reusable SQS module variables.
ak-deployment/ak-aws/common/modules/sqs/outputs.tf Adds reusable SQS module outputs.
ak-deployment/ak-aws/common/modules/sqs/main.tf Adds reusable SQS queue+DLQ implementation with access policies.
ak-deployment/ak-aws/common/modules/sqs/README.md Documents the reusable SQS module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +186 to +193
# Prepare message for queue
queue_payload = {
"session_id": session_id,
"connection_id": connection_id,
"message": body.get("message", ""),
"metadata": body.get("metadata", {}),
"websocket": True # Flag to indicate WebSocket origin
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue payload uses message/metadata, but ChatService.process_chat_request expects prompt (and optionally agent, images, files). In async WebSocket mode this will consistently produce “No prompt provided…” in the agent runner. Align the payload schema (e.g., send prompt instead of message, and map metadata fields appropriately).

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +74
success = cls._ws_manager.store_connection(
session_id=session_id,
connection_id=connection_id,
metadata={
"connected_at": int(context.get_remaining_time_in_millis() if context else 0)
}
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connected_at metadata is being set to context.get_remaining_time_in_millis() which is not a timestamp and will overwrite the timestamp set in WebSocketManager.store_connection. Use an actual wall-clock timestamp (e.g., epoch ms) or rename the field if you intend to store remaining execution time.

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +30
redis_config = response_store_config.redis
# Determine if SSL should be used based on URL scheme
use_ssl = redis_config.url.startswith("rediss://")

self.store = RedisResponseStore(
host=redis_config.url,
ssl=use_ssl
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Redis, RedisResponseStore is initialized with host=redis_config.url, but redis.Redis(host=...) expects a hostname (no scheme/credentials). Use redis.Redis.from_url(redis_config.url, ...) (or parse URL into host/port/user/pass) so configs like redis://... / rediss://... work.

Copilot uses AI. Check for mistakes.
Comment on lines +24 to +33
session_id = message["session_id"]
key = f"response:session:{session_id}" # TODO:: have to get from AKConfig prefix
self.client.rpush(key, json.dumps(message))

def get_messages(self, session_id: str) -> list[dict]:
key = f"response:session:{session_id}"

messages = self.client.lrange(key, 0, -1) # 0 to -1 means all items

return [json.loads(m) for m in messages]
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Redis key prefix and retention are hardcoded (response:session:{session_id} with no TTL), ignoring the configured prefix/ttl in AKConfig. This can cause unbounded growth. Use the configured prefix and set an expiry when adding messages (and/or trim the list).

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +16
def add_message(self, message: dict) -> None:
self.table.put_item(Item=message)

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynamoDB TTL won’t work as configured (Terraform uses expiry_time), because stored items never set an expiry_time attribute. Consider adding expiry_time = now + ttl when writing messages (and ensuring the table schema matches).

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +21
self.client = redis.Redis(
host=host,
port=port,
username=username,
password=password,
ssl=ssl,
decode_responses=True,
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redis.Redis(host=host, ...) is being fed a full URL via ResponseDBHandler (e.g., redis://...), which will not resolve as a hostname. Prefer redis.Redis.from_url(...) (and then you can drop the manual ssl flag, since the scheme controls TLS).

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +12
def __init__(self, table_name: str, region: str):
dynamodb = boto3.resource("dynamodb", region_name=region)

self.table = dynamodb.Table(table_name)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__init__ requires region: str, but callers pass None (see ResponseDBHandler). Make region optional (defaulting to boto3’s default region resolution) or ensure a real region string is always provided.

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +96
cls._log.info(f"Permanent failure: {record}: Retried message {cls.max_receive_count} times. Sending error message to Output Queue`")
try:
error_message_body = cls._construct_error_message_body(error_msg="Failed to process message. Retried {cls.max_receive_count} times")
queue_input_message = cls._construct_queue_input_message(raw_queue_message=record, queue_input_message_body=error_message_body,)
cls._send_to_output_queue(queue_input_message=queue_input_message)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error_msg is built with a literal {cls.max_receive_count} (missing f prefix), so the user-facing error won’t include the retry count. Use an f-string (or .format) so the retry count is interpolated correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +131 to +139
try:
loop = asyncio.get_event_loop()
if loop.is_closed():
asyncio.set_event_loop(asyncio.new_event_loop())
return asyncio.run(self.service.run_multi(requests=requests))
else:
return loop.run_until_complete(self.service.run_multi(requests=requests))
except RuntimeError:
return asyncio.run(self.service.run_multi(requests=requests))
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_run_agent_service will fail when called from a context where an event loop is already running: loop.run_until_complete(...) raises, and the except RuntimeError fallback to asyncio.run(...) also raises under a running loop. Consider making process_chat_request async, or detect a running loop (get_running_loop) and use asyncio.run_coroutine_threadsafe / create_task depending on context.

Copilot uses AI. Check for mistakes.
from agentkernel.core.model import AgentReplyImage, AgentReplyText, AgentRequestAny, AgentRequestText

from ...core import AgentService
from agentkernel.deployment.core.chat_service import ChatService
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import path looks incorrect: the new ChatService lives under agentkernel.deployment.common.chat_service, not agentkernel.deployment.core.chat_service. As-is, Azure Functions will fail at import time.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 26, 2026 11:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 16 comments.

Comments suppressed due to low confidence (1)

ak-deployment/ak-aws/serverless/modules/request-handler/main.tf:289

  • In scalable/async modes the request handler uses ResponseDBHandler for polling (_get_messages), but this module only sets AK_EXECUTION__QUEUES__INPUT_QUEUE_URL (and WebSocket table) — it never sets AK_EXECUTION__RESPONSE_STORE__... env vars. With the current ResponseDBHandler implementation (which requires execution.response_store), REST_ASYNC polling will crash at runtime. Wire the response store configuration into the request handler environment when scalable_mode is enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +38 to +41
self.store = DynamoDBResponseStore(
table_name=dynamodb_config.table_name,
region=None # Will use default AWS region from environment/IAM role
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynamoDBResponseStore is instantiated with region=None, but its constructor currently declares region: str and passes it into boto3.resource(..., region_name=region). Make region optional in DynamoDBResponseStore.__init__ (defaulting to AWS SDK resolution) or pass an actual region string here.

Copilot uses AI. Check for mistakes.
Comment on lines +218 to +219
mode: ExecutionMode = Field(
default=None,
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mode is typed as ExecutionMode but has default=None. This will either fail Pydantic validation or propagate None into code paths that expect an ExecutionMode (e.g., endpoint routing), causing runtime errors. Make this Optional[ExecutionMode] with a sensible default (likely ExecutionMode.REST_SYNC) or require it explicitly.

Suggested change
mode: ExecutionMode = Field(
default=None,
mode: Optional[ExecutionMode] = Field(
default=ExecutionMode.REST_SYNC,

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +24
def __init__(self):
self._log = logging.getLogger("ak.deployment.chat_service")
# self.service = AgentService is done in the process_chat_request() function
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy is configured with disallow_untyped_defs; __init__ needs an explicit return type (-> None). This will currently fail type-checking for this new module.

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +61
condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "ses_stream", "async"], var.execution_mode)
error_message = "execution_mode must be one of: rest_sync, rest_async, ses_stream, async, or null."
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

execution_mode validation allows ses_stream, but the Python ExecutionMode enum added in this PR uses stream (and the routing code checks ExecutionMode.STREAM). This mismatch will cause deployments configured with ses_stream to fail at runtime. Align the allowed Terraform values with the Python enum (or update the enum to match).

Suggested change
condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "ses_stream", "async"], var.execution_mode)
error_message = "execution_mode must be one of: rest_sync, rest_async, ses_stream, async, or null."
condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "stream", "async"], var.execution_mode)
error_message = "execution_mode must be one of: rest_sync, rest_async, stream, async, or null."

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +67
def register(self, path: str, method: str = None) -> Callable[[Callable], Callable]:
"""
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy strict mode: method defaults to None but is typed as str. Change the signature to method: Optional[str] = None to avoid type-checking errors.

Copilot uses AI. Check for mistakes.
Comment on lines +199 to +203
def sync_operation(payload: Dict[str, Any]) -> Dict[str, Any]:
queue_result = cls._send_to_queue(payload)
db_result = cls._get_messages(payload)
return {"queue_status": queue_result, "response": db_result}

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In REST_SYNC mode, sync_operation sends to SQS and immediately reads from the response store, but there’s no waiting/long-polling for the agent-runner + response-handler pipeline to complete. This will typically return an empty/partial response and behave nondeterministically. Consider making REST_SYNC bypass SQS and call ChatService directly, or implement a bounded wait/long-poll with correlation on message_id.

Copilot uses AI. Check for mistakes.
Comment on lines +265 to +272
local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
local.output_queue_visibility_timeout >= var.response_handler.timeout
) : true
error_message = format(
"[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",
local.input_queue_visibility_timeout,
var.agent_runner.timeout,
local.output_queue_visibility_timeout,
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check.queue_visibility_timeouts compares local.input_queue_visibility_timeout/local.output_queue_visibility_timeout against Lambda timeouts, but those locals can be null when queue_config is omitted (it defaults to {} and try(..., null) returns null). Comparing null >= number will error during plan/apply. Use coalesce(..., <module default>) so the check always evaluates with numbers (or derive the values from the queues module defaults).

Suggested change
local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
local.output_queue_visibility_timeout >= var.response_handler.timeout
) : true
error_message = format(
"[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",
local.input_queue_visibility_timeout,
var.agent_runner.timeout,
local.output_queue_visibility_timeout,
coalesce(local.input_queue_visibility_timeout, var.agent_runner.timeout) >= var.agent_runner.timeout &&
coalesce(local.output_queue_visibility_timeout, var.response_handler.timeout) >= var.response_handler.timeout
) : true
error_message = format(
"[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",
coalesce(local.input_queue_visibility_timeout, var.agent_runner.timeout),
var.agent_runner.timeout,
coalesce(local.output_queue_visibility_timeout, var.response_handler.timeout),

Copilot uses AI. Check for mistakes.
Comment on lines +136 to +140

if not domain_name:
cls._log.error("WebSocket domain_name not found in message or environment")
raise ValueError("Missing WebSocket domain_name")

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

domain_name/stage are required to call the API Gateway Management API, but this handler only looks in message_data (or env vars). Messages produced by ServerlessAgentRunner/ChatService don’t include these fields, and the Terraform in this PR doesn’t set WEBSOCKET_DOMAIN_NAME/WEBSOCKET_STAGE, so async/WebSocket mode will fail. Add domain_name/stage to the queued payload or wire them via env vars.

Copilot uses AI. Check for mistakes.
Comment on lines +184 to +186
response = self._connection_table.scan(
FilterExpression=Key("connection_id").eq(connection_id)
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scan fallback, FilterExpression=Key("connection_id").eq(connection_id) is incorrect: Key is for key conditions; scans should use Attr. This will throw or behave incorrectly when the GSI is missing. Use from boto3.dynamodb.conditions import Attr and FilterExpression=Attr("connection_id").eq(connection_id).

Copilot uses AI. Check for mistakes.
Comment on lines +14 to +18
self.client = redis.Redis(
host=host,
port=port,
username=username,
password=password,
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses redis.Redis(host=host, ...), but the configuration in this repo provides a URL (redis://.../rediss://...). If host is a URL, the client will treat it as a hostname and fail to connect. Prefer redis.Redis.from_url(...) and accept a url: str here.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 27, 2026 04:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 18 comments.

Comments suppressed due to low confidence (1)

ak-deployment/ak-aws/serverless/modules/request-handler/main.tf:58

  • aws_iam_policy.lambda_sqs_policy is not a counted resource, but the attachment references aws_iam_policy.lambda_sqs_policy[0].arn, which will fail Terraform evaluation. Reference it without an index, or add count/for_each consistently on both resources.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +52 to +57
input_queue_url = module.queues[0].input_queue_url
input_queue_arn = module.queues[0].input_queue_arn

# Output queue
output_queue_url = module.queues[0].output_queue_url
output_queue_arn = module.queues[0].output_queue_arn
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locals reference module.queues[0].*, but module "queues" is not a counted module (no count/for_each). Indexing it will fail Terraform evaluation. Reference outputs as module.queues.input_queue_url / module.queues.output_queue_arn etc, or add count = 1 and keep indexing consistent.

Suggested change
input_queue_url = module.queues[0].input_queue_url
input_queue_arn = module.queues[0].input_queue_arn
# Output queue
output_queue_url = module.queues[0].output_queue_url
output_queue_arn = module.queues[0].output_queue_arn
input_queue_url = module.queues.input_queue_url
input_queue_arn = module.queues.input_queue_arn
# Output queue
output_queue_url = module.queues.output_queue_url
output_queue_arn = module.queues.output_queue_arn

Copilot uses AI. Check for mistakes.
module_name = var.module_name
is_production = var.is_production
package_path = local.request_handler_package_path
environment_variables = var.environment_variables
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The request handler Lambda needs to know execution_mode to route correctly (REST vs async/WebSocket). This module currently passes only var.environment_variables and does not inject AK_EXECUTION__MODE (unlike agent-runner/response-handler). Without it, runtime config may default incorrectly (or be None). Consider merging AK_EXECUTION__MODE = var.execution_mode into environment_variables for the request handler too.

Suggested change
environment_variables = var.environment_variables
environment_variables = merge(var.environment_variables, { AK_EXECUTION__MODE = var.execution_mode })

Copilot uses AI. Check for mistakes.
websocket_connections_table_name = local.websocket_connections_table_name
websocket_connections_table_arn = local.websocket_connections_table_arn

depends_on = [module.request_handler_source_package]
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends_on = [module.request_handler_source_package] references a module that is created with count. Referencing it without an index is invalid when count is set, and will also fail when count = 0. Use depends_on = [module.request_handler_source_package[0]] guarded by the same condition, or depend on a resource that always exists for both packaging modes.

Suggested change
depends_on = [module.request_handler_source_package]
depends_on = var.package_type == "S3Zip" ? [module.request_handler_source_package[0]] : []

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +61
value = module.response_handler[0].response_handler_lambda_function_arn
}

output "response_handler_lambda_function_name" {
description = "Name of the response handler Lambda function"
value = module.response_handler[0].response_handler_lambda_function_name
}

output "response_handler_lambda_function_invoke_arn" {
description = "Invoke ARN of the response handler Lambda function"
value = module.response_handler[0].response_handler_lambda_function_invoke_arn
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These outputs index module.response_handler[0], but module "response_handler" is not declared with count/for_each in state.tf. This will fail at plan time. Remove the index (or add count consistently and guard outputs when not created).

Suggested change
value = module.response_handler[0].response_handler_lambda_function_arn
}
output "response_handler_lambda_function_name" {
description = "Name of the response handler Lambda function"
value = module.response_handler[0].response_handler_lambda_function_name
}
output "response_handler_lambda_function_invoke_arn" {
description = "Invoke ARN of the response handler Lambda function"
value = module.response_handler[0].response_handler_lambda_function_invoke_arn
value = module.response_handler.response_handler_lambda_function_arn
}
output "response_handler_lambda_function_name" {
description = "Name of the response handler Lambda function"
value = module.response_handler.response_handler_lambda_function_name
}
output "response_handler_lambda_function_invoke_arn" {
description = "Invoke ARN of the response handler Lambda function"
value = module.response_handler.response_handler_lambda_function_invoke_arn

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +108
value = module.queues[0].input_queue_arn
}

output "input_queue_url" {
description = "URL of the input SQS queue"
value = module.queues[0].input_queue_url
}

output "input_queue_name" {
description = "Name of the input SQS queue"
value = module.queues[0].input_queue_name
}

output "output_queue_arn" {
description = "ARN of the output SQS queue"
value = module.queues[0].output_queue_arn
}

output "output_queue_url" {
description = "URL of the output SQS queue"
value = module.queues[0].output_queue_url
}

output "output_queue_name" {
description = "Name of the output SQS queue"
value = module.queues[0].output_queue_name
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These outputs index module.queues[0], but module "queues" is not declared with count/for_each in state.tf. This will fail at plan time. Reference module.queues.input_queue_arn / module.queues.output_queue_url etc instead.

Suggested change
value = module.queues[0].input_queue_arn
}
output "input_queue_url" {
description = "URL of the input SQS queue"
value = module.queues[0].input_queue_url
}
output "input_queue_name" {
description = "Name of the input SQS queue"
value = module.queues[0].input_queue_name
}
output "output_queue_arn" {
description = "ARN of the output SQS queue"
value = module.queues[0].output_queue_arn
}
output "output_queue_url" {
description = "URL of the output SQS queue"
value = module.queues[0].output_queue_url
}
output "output_queue_name" {
description = "Name of the output SQS queue"
value = module.queues[0].output_queue_name
value = module.queues.input_queue_arn
}
output "input_queue_url" {
description = "URL of the input SQS queue"
value = module.queues.input_queue_url
}
output "input_queue_name" {
description = "Name of the input SQS queue"
value = module.queues.input_queue_name
}
output "output_queue_arn" {
description = "ARN of the output SQS queue"
value = module.queues.output_queue_arn
}
output "output_queue_url" {
description = "URL of the output SQS queue"
value = module.queues.output_queue_url
}
output "output_queue_name" {
description = "Name of the output SQS queue"
value = module.queues.output_queue_name

Copilot uses AI. Check for mistakes.
Comment on lines +265 to +266
local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
local.output_queue_visibility_timeout >= var.response_handler.timeout
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check.queue_visibility_timeouts compares local.*_queue_visibility_timeout values that can be null when queue_config is omitted (since the locals use try(..., null)). Comparing null >= number yields an invalid condition and will break planning. Default these locals to the same defaults used by the queues module (e.g., 60) or compute them via coalesce(try(...), 60).

Suggested change
local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
local.output_queue_visibility_timeout >= var.response_handler.timeout
coalesce(local.input_queue_visibility_timeout, 60) >= var.agent_runner.timeout &&
coalesce(local.output_queue_visibility_timeout, 60) >= var.response_handler.timeout

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +77
value = module.agent_runner[0].agent_runner_lambda_function_arn
}

output "agent_runner_lambda_function_name" {
description = "Name of the agent runner Lambda function"
value = module.agent_runner[0].agent_runner_lambda_function_name
}

output "agent_runner_lambda_function_invoke_arn" {
description = "Invoke ARN of the agent runner Lambda function"
value = module.agent_runner[0].agent_runner_lambda_function_invoke_arn
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These outputs index module.agent_runner[0], but module "agent_runner" is not declared with count/for_each in state.tf. This will fail at plan time. Remove the index (or add count consistently and guard outputs when not created).

Suggested change
value = module.agent_runner[0].agent_runner_lambda_function_arn
}
output "agent_runner_lambda_function_name" {
description = "Name of the agent runner Lambda function"
value = module.agent_runner[0].agent_runner_lambda_function_name
}
output "agent_runner_lambda_function_invoke_arn" {
description = "Invoke ARN of the agent runner Lambda function"
value = module.agent_runner[0].agent_runner_lambda_function_invoke_arn
value = module.agent_runner.agent_runner_lambda_function_arn
}
output "agent_runner_lambda_function_name" {
description = "Name of the agent runner Lambda function"
value = module.agent_runner.agent_runner_lambda_function_name
}
output "agent_runner_lambda_function_invoke_arn" {
description = "Invoke ARN of the agent runner Lambda function"
value = module.agent_runner.agent_runner_lambda_function_invoke_arn

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +30
if response_store_config.redis is not None:
from .redis import RedisResponseStore

redis_config = response_store_config.redis
# Determine if SSL should be used based on URL scheme
use_ssl = redis_config.url.startswith("rediss://")

self.store = RedisResponseStore(
host=redis_config.url,
ssl=use_ssl
)
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RedisResponseStore is constructed with host=redis_config.url, but redis.Redis(host=...) expects a hostname (e.g., localhost) rather than a full URL like redis://... or rediss://.... This will fail DNS resolution for URL strings. Use redis.Redis.from_url(redis_config.url, ...) (or parse the URL into host/port/credentials) and let the scheme drive TLS automatically.

Copilot uses AI. Check for mistakes.
from typing import List, Optional

from pydantic import BaseModel, Field
from pydantic import BaseModel, Field, model_validator
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model_validator is imported but never used in this module. Removing the unused import avoids lint/type-check noise.

Suggested change
from pydantic import BaseModel, Field, model_validator
from pydantic import BaseModel, Field

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +35
redis_url = var.create_redis_cluster == true ? module.redis[0].url : null
dynamodb_memory_table_arn = var.create_dynamodb_memory_table == true ? module.dynamodb_memory[0].table_arn : null
dynamodb_memory_table_name = var.create_dynamodb_memory_table == true ? module.dynamodb_memory[0].table_name : null
dynamodb_multimodal_memory_table_arn = var.create_dynamodb_multimodal_memory_table == true ? module.dynamodb_multimodal_memory[0].table_arn : null
dynamodb_multimodal_memory_table_name = var.create_dynamodb_multimodal_memory_table == true ? module.dynamodb_multimodal_memory[0].table_name : null
create_authorizer = var.authorizer != null ? (var.authorizer.function_name != null && var.authorizer.handler_path != null && var.authorizer.package_type != null && var.authorizer.package_path != null && var.authorizer.module_name != null) : false
create_authorizer = var.authorizer != null ? (var.authorizer.function_name != null && var.authorizer.handler_path != null && var.authorizer.package_type != null && var.authorizer.package_path != null && var.authorizer.module_name != null) : false
request_handler_package_path = var.package_path
response_handler_source_package_path = var.package_path
agent_runner_package_path = try(var.agent_runner.package_path, null)
agent_runner_artifact_module_name = coalesce(var.agent_runner_module_name, "${var.module_name}-agent-runner")
# Response store wiring
is_async_mode = var.execution_mode == "async"
response_store_redis_url = var.create_redis_response_store ? local.redis_url : null
response_store_dynamodb_table_name = var.create_dynamodb_response_store ? module.dynamodb_response_store[0].table_name : null
response_store_dynamodb_table_arn = var.create_dynamodb_response_store ? module.dynamodb_response_store[0].table_arn : null
response_handler_response_store_redis = var.create_redis_response_store ? {
url = local.response_store_redis_url
} : null
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local.redis_url is only set when create_redis_cluster == true, but this module also creates Redis when create_redis_response_store == true. As a result, response_store_redis_url (and downstream response store wiring) can end up null even though a Redis cluster was created. Make local.redis_url depend on both flags (or directly on length(module.redis) > 0).

Copilot uses AI. Check for mistakes.
@lakindu-yl lakindu-yl closed this Mar 28, 2026
@lakindu-yl lakindu-yl reopened this Mar 28, 2026
@lakindu-yl lakindu-yl closed this Mar 28, 2026
@lakindu-yl
Copy link
Copy Markdown
Contributor Author

closing temporarily

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants