Feature/scalability implementation by lakindu-yl · Pull Request #255 · yaalalabs/agent-kernel

lakindu-yl · 2026-03-26T03:40:05Z

Description

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Performance improvement
Test update
CI/CD update
Other (please describe):

Related Issues

Fixes #
Relates to #

Changes Made

Testing

Unit tests pass locally
Integration tests pass locally
Manual testing completed
New tests added for changes

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

…ponse_handler dist creation

Copilot

Pull request overview

This PR introduces a “scalable” deployment path for Agent Kernel, adding shared request-processing logic plus AWS infrastructure and runtime components for queue-based execution and (optionally) WebSocket-based async communication.

Changes:

Added a centralized ChatService and ResponseStore abstraction to reuse request parsing/agent execution logic across deployments.
Implemented AWS “serverless scalable mode” with SQS-driven agent execution, response handling, optional response storage, and WebSocket connection management.
Extended AKConfig with execution settings and added Terraform modules to provision queues, response stores, REST API Gateway, and WebSocket API Gateway.

Reviewed changes

Copilot reviewed 53 out of 54 changed files in this pull request and generated 25 comments.

Show a summary per file

File	Description
ak-py/src/agentkernel/deployment/common/response_store.py	Adds a response-store interface for session/message storage.
ak-py/src/agentkernel/deployment/common/chat_service.py	Adds shared request parsing + agent execution wrapper used by deployments.
ak-py/src/agentkernel/deployment/common/init.py	Common deployment package init (no content shown).
ak-py/src/agentkernel/deployment/azure/akfunction.py	Refactors Azure Functions handler to use shared `ChatService`.
ak-py/src/agentkernel/deployment/aws/serverless/internal/response_handler.py	Adds SQS consumer(s) to store or push responses (REST vs WebSocket).
ak-py/src/agentkernel/deployment/aws/serverless/core/websocket_manager.py	Adds DynamoDB-backed WebSocket connection management + send helpers.
ak-py/src/agentkernel/deployment/aws/serverless/core/sqs_consumer.py	Adds a base SQS consumer with retry/permanent-failure handling.
ak-py/src/agentkernel/deployment/aws/serverless/core/default_ws_endpoints.py	Adds default WebSocket routes ($connect/$disconnect/$default/chat).
ak-py/src/agentkernel/deployment/aws/serverless/core/default_endpoints.py	Adds default REST routes for sync/async/stream execution modes.
ak-py/src/agentkernel/deployment/aws/serverless/core/init.py	Exposes default endpoints handler.
ak-py/src/agentkernel/deployment/aws/serverless/aklambda.py	Adds Lambda router supporting REST and WebSocket dispatch based on mode.
ak-py/src/agentkernel/deployment/aws/serverless/akauthorizer.py	Fixes/adjusts authorizer import path for new package layout.
ak-py/src/agentkernel/deployment/aws/serverless/akagentrunner.py	Adds SQS-driven agent runner that posts results to output queue.
ak-py/src/agentkernel/deployment/aws/serverless/init.py	Exports serverless entrypoints (Lambda/Authorizer/Runner).
ak-py/src/agentkernel/deployment/aws/response_store/redis.py	Adds Redis-backed response store implementation.
ak-py/src/agentkernel/deployment/aws/response_store/handler.py	Adds store selection based on `AKConfig.execution.response_store`.
ak-py/src/agentkernel/deployment/aws/response_store/dynamodb.py	Adds DynamoDB-backed response store implementation.
ak-py/src/agentkernel/deployment/aws/response_store/init.py	Adds response_store package export.
ak-py/src/agentkernel/deployment/aws/aklambda.py	Removes legacy AWS Lambda router/handler implementation.
ak-py/src/agentkernel/deployment/aws/init.py	Updates AWS deployment exports to new serverless package structure.
ak-py/src/agentkernel/core/model.py	Adds `ExecutionMode` enum for deployment mode selection.
ak-py/src/agentkernel/core/config.py	Adds `execution` config (mode/queues/response_store/websocket table).
ak-deployment/ak-aws/serverless/variables.tf	Adds scalable mode inputs: execution mode, runner/handler, queue/store/websocket configs.
ak-deployment/ak-aws/serverless/state.tf	Reworks module composition to conditionally create REST vs WebSocket and scalable components.
ak-deployment/ak-aws/serverless/outputs.tf	Adds outputs for queues, runner, response handler, stores, and websocket resources.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/variables.tf	Defines WebSocket API Gateway module inputs.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/outputs.tf	Exposes WebSocket API Gateway outputs.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/main.tf	Provisions API Gateway v2 WebSocket API + routes + Lambda permission.
ak-deployment/ak-aws/serverless/modules/websocket-api-gateway/README.md	Documents the WebSocket API Gateway module.
ak-deployment/ak-aws/serverless/modules/response-stores/variables.tf	Defines response store module inputs (Redis/DynamoDB).
ak-deployment/ak-aws/serverless/modules/response-stores/outputs.tf	Exposes response store outputs.
ak-deployment/ak-aws/serverless/modules/response-stores/main.tf	Provisions Redis and/or DynamoDB response storage.
ak-deployment/ak-aws/serverless/modules/response-handler/variables.tf	Defines response-handler Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/response-handler/response_handler.tf	Provisions response-handler Lambda + IAM + SQS event source mapping.
ak-deployment/ak-aws/serverless/modules/response-handler/outputs.tf	Exposes response-handler Lambda outputs.
ak-deployment/ak-aws/serverless/modules/response-handler/build_response_handler.sh	Builds a minimal response-handler deployment zip from the main package.
ak-deployment/ak-aws/serverless/modules/response-handler/.gitignore	Ignores response-handler build artifacts.
ak-deployment/ak-aws/serverless/modules/request-handler/variables.tf	Defines request-handler Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/request-handler/outputs.tf	Exposes request-handler outputs.
ak-deployment/ak-aws/serverless/modules/request-handler/main.tf	Provisions request-handler Lambda + IAM permissions (incl. SQS/WebSocket perms).
ak-deployment/ak-aws/serverless/modules/queues/variables.tf	Defines queue module inputs.
ak-deployment/ak-aws/serverless/modules/queues/outputs.tf	Exposes queue outputs.
ak-deployment/ak-aws/serverless/modules/queues/main.tf	Provisions input/output SQS queues using shared SQS module.
ak-deployment/ak-aws/serverless/modules/api-gateway/variables.tf	Defines REST API Gateway module inputs.
ak-deployment/ak-aws/serverless/modules/api-gateway/outputs.tf	Exposes REST API Gateway outputs.
ak-deployment/ak-aws/serverless/modules/api-gateway/main.tf	Provisions REST API Gateway v1 resources/methods/integration/logging/authorizer.
ak-deployment/ak-aws/serverless/modules/agent-runner/variables.tf	Defines agent-runner Lambda module inputs.
ak-deployment/ak-aws/serverless/modules/agent-runner/outputs.tf	Exposes agent-runner Lambda outputs.
ak-deployment/ak-aws/serverless/modules/agent-runner/main.tf	Provisions agent-runner Lambda + IAM + SQS event source mapping.
ak-deployment/ak-aws/serverless/README.md	Documents new WebSocket/async usage and outputs.
ak-deployment/ak-aws/common/modules/sqs/variables.tf	Adds/defines reusable SQS module variables.
ak-deployment/ak-aws/common/modules/sqs/outputs.tf	Adds reusable SQS module outputs.
ak-deployment/ak-aws/common/modules/sqs/main.tf	Adds reusable SQS queue+DLQ implementation with access policies.
ak-deployment/ak-aws/common/modules/sqs/README.md	Documents the reusable SQS module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T03:48:23Z

ak-py/src/agentkernel/deployment/aws/serverless/core/default_ws_endpoints.py

+            # Prepare message for queue
+            queue_payload = {
+                "session_id": session_id,
+                "connection_id": connection_id,
+                "message": body.get("message", ""),
+                "metadata": body.get("metadata", {}),
+                "websocket": True  # Flag to indicate WebSocket origin
+            }


The queue payload uses message/metadata, but ChatService.process_chat_request expects prompt (and optionally agent, images, files). In async WebSocket mode this will consistently produce “No prompt provided…” in the agent runner. Align the payload schema (e.g., send prompt instead of message, and map metadata fields appropriately).

Copilot · 2026-03-26T03:48:23Z

ak-py/src/agentkernel/deployment/aws/serverless/core/default_ws_endpoints.py

+            success = cls._ws_manager.store_connection(
+                session_id=session_id,
+                connection_id=connection_id,
+                metadata={
+                    "connected_at": int(context.get_remaining_time_in_millis() if context else 0)
+                }
+            )


connected_at metadata is being set to context.get_remaining_time_in_millis() which is not a timestamp and will overwrite the timestamp set in WebSocketManager.store_connection. Use an actual wall-clock timestamp (e.g., epoch ms) or rename the field if you intend to store remaining execution time.

Copilot · 2026-03-26T03:48:23Z

ak-py/src/agentkernel/deployment/aws/response_store/handler.py

+            redis_config = response_store_config.redis
+            # Determine if SSL should be used based on URL scheme
+            use_ssl = redis_config.url.startswith("rediss://")
+
+            self.store = RedisResponseStore(
+                host=redis_config.url,
+                ssl=use_ssl
+            )


For Redis, RedisResponseStore is initialized with host=redis_config.url, but redis.Redis(host=...) expects a hostname (no scheme/credentials). Use redis.Redis.from_url(redis_config.url, ...) (or parse URL into host/port/user/pass) so configs like redis://... / rediss://... work.

Copilot · 2026-03-26T03:48:24Z

ak-py/src/agentkernel/deployment/aws/response_store/redis.py

+        session_id = message["session_id"]
+        key = f"response:session:{session_id}" # TODO:: have to get from AKConfig prefix 
+        self.client.rpush(key, json.dumps(message))
+
+    def get_messages(self, session_id: str) -> list[dict]:
+        key = f"response:session:{session_id}"
+
+        messages = self.client.lrange(key, 0, -1) # 0 to -1 means all items
+
+        return [json.loads(m) for m in messages]


The Redis key prefix and retention are hardcoded (response:session:{session_id} with no TTL), ignoring the configured prefix/ttl in AKConfig. This can cause unbounded growth. Use the configured prefix and set an expiry when adding messages (and/or trim the list).

Copilot · 2026-03-26T03:48:24Z

ak-py/src/agentkernel/deployment/aws/response_store/dynamodb.py

+    def add_message(self, message: dict) -> None:
+        self.table.put_item(Item=message)
+


DynamoDB TTL won’t work as configured (Terraform uses expiry_time), because stored items never set an expiry_time attribute. Consider adding expiry_time = now + ttl when writing messages (and ensuring the table schema matches).

Copilot · 2026-03-26T03:48:28Z

ak-py/src/agentkernel/deployment/aws/response_store/redis.py

+        self.client = redis.Redis(
+            host=host,
+            port=port,
+            username=username,
+            password=password,
+            ssl=ssl,
+            decode_responses=True,
+        )


redis.Redis(host=host, ...) is being fed a full URL via ResponseDBHandler (e.g., redis://...), which will not resolve as a hostname. Prefer redis.Redis.from_url(...) (and then you can drop the manual ssl flag, since the scheme controls TLS).

Copilot · 2026-03-26T03:48:28Z

ak-py/src/agentkernel/deployment/aws/response_store/dynamodb.py

+    def __init__(self, table_name: str, region: str):
+        dynamodb = boto3.resource("dynamodb", region_name=region)
+
+        self.table = dynamodb.Table(table_name)


__init__ requires region: str, but callers pass None (see ResponseDBHandler). Make region optional (defaulting to boto3’s default region resolution) or ensure a real region string is always provided.

Copilot · 2026-03-26T03:48:28Z

ak-py/src/agentkernel/deployment/aws/serverless/akagentrunner.py

+        cls._log.info(f"Permanent failure: {record}: Retried message {cls.max_receive_count} times. Sending error message to Output Queue`")
+        try:
+            error_message_body = cls._construct_error_message_body(error_msg="Failed to process message. Retried {cls.max_receive_count} times")
+            queue_input_message = cls._construct_queue_input_message(raw_queue_message=record, queue_input_message_body=error_message_body,)
+            cls._send_to_output_queue(queue_input_message=queue_input_message)


error_msg is built with a literal {cls.max_receive_count} (missing f prefix), so the user-facing error won’t include the retry count. Use an f-string (or .format) so the retry count is interpolated correctly.

Copilot · 2026-03-26T03:48:28Z

ak-py/src/agentkernel/deployment/common/chat_service.py

+        try:
+            loop = asyncio.get_event_loop()
+            if loop.is_closed():
+                asyncio.set_event_loop(asyncio.new_event_loop())
+                return asyncio.run(self.service.run_multi(requests=requests))
+            else:
+                return loop.run_until_complete(self.service.run_multi(requests=requests))
+        except RuntimeError:
+            return asyncio.run(self.service.run_multi(requests=requests))


_run_agent_service will fail when called from a context where an event loop is already running: loop.run_until_complete(...) raises, and the except RuntimeError fallback to asyncio.run(...) also raises under a running loop. Consider making process_chat_request async, or detect a running loop (get_running_loop) and use asyncio.run_coroutine_threadsafe / create_task depending on context.

Copilot · 2026-03-26T03:48:29Z

ak-py/src/agentkernel/deployment/azure/akfunction.py

-from agentkernel.core.model import AgentReplyImage, AgentReplyText, AgentRequestAny, AgentRequestText
-
-from ...core import AgentService
+from agentkernel.deployment.core.chat_service import ChatService


This import path looks incorrect: the new ChatService lives under agentkernel.deployment.common.chat_service, not agentkernel.deployment.core.chat_service. As-is, Azure Functions will fail at import time.

Copilot

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 16 comments.

Comments suppressed due to low confidence (1)

ak-deployment/ak-aws/serverless/modules/request-handler/main.tf:289

In scalable/async modes the request handler uses ResponseDBHandler for polling (_get_messages), but this module only sets AK_EXECUTION__QUEUES__INPUT_QUEUE_URL (and WebSocket table) — it never sets AK_EXECUTION__RESPONSE_STORE__... env vars. With the current ResponseDBHandler implementation (which requires execution.response_store), REST_ASYNC polling will crash at runtime. Wire the response store configuration into the request handler environment when scalable_mode is enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T12:10:57Z

ak-py/src/agentkernel/deployment/aws/response_store/handler.py

+            self.store = DynamoDBResponseStore(
+                table_name=dynamodb_config.table_name,
+                region=None  # Will use default AWS region from environment/IAM role
+            )


DynamoDBResponseStore is instantiated with region=None, but its constructor currently declares region: str and passes it into boto3.resource(..., region_name=region). Make region optional in DynamoDBResponseStore.__init__ (defaulting to AWS SDK resolution) or pass an actual region string here.

Copilot · 2026-03-26T12:10:57Z

ak-py/src/agentkernel/core/config.py

+    mode: ExecutionMode = Field(
+        default=None,


mode is typed as ExecutionMode but has default=None. This will either fail Pydantic validation or propagate None into code paths that expect an ExecutionMode (e.g., endpoint routing), causing runtime errors. Make this Optional[ExecutionMode] with a sensible default (likely ExecutionMode.REST_SYNC) or require it explicitly.

Suggested change

mode: ExecutionMode = Field(

default=None,

mode: Optional[ExecutionMode] = Field(

default=ExecutionMode.REST_SYNC,

Copilot · 2026-03-26T12:10:58Z

ak-py/src/agentkernel/deployment/common/chat_service.py

+    def __init__(self):
+        self._log = logging.getLogger("ak.deployment.chat_service")
+        # self.service = AgentService is done in the process_chat_request() function


Mypy is configured with disallow_untyped_defs; __init__ needs an explicit return type (-> None). This will currently fail type-checking for this new module.

Copilot · 2026-03-26T12:10:58Z

ak-deployment/ak-aws/serverless/variables.tf

+    condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "ses_stream", "async"], var.execution_mode)
+    error_message = "execution_mode must be one of: rest_sync, rest_async, ses_stream, async, or null."


execution_mode validation allows ses_stream, but the Python ExecutionMode enum added in this PR uses stream (and the routing code checks ExecutionMode.STREAM). This mismatch will cause deployments configured with ses_stream to fail at runtime. Align the allowed Terraform values with the Python enum (or update the enum to match).

Suggested change

condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "ses_stream", "async"], var.execution_mode)

error_message = "execution_mode must be one of: rest_sync, rest_async, ses_stream, async, or null."

condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "stream", "async"], var.execution_mode)

error_message = "execution_mode must be one of: rest_sync, rest_async, stream, async, or null."

Copilot · 2026-03-26T12:10:59Z

ak-py/src/agentkernel/deployment/aws/serverless/aklambda.py

+    def register(self, path: str, method: str = None) -> Callable[[Callable], Callable]:
+        """


Mypy strict mode: method defaults to None but is typed as str. Change the signature to method: Optional[str] = None to avoid type-checking errors.

Copilot · 2026-03-26T12:11:01Z

ak-py/src/agentkernel/deployment/aws/serverless/core/default_endpoints.py

+        def sync_operation(payload: Dict[str, Any]) -> Dict[str, Any]:
+            queue_result = cls._send_to_queue(payload)
+            db_result = cls._get_messages(payload)
+            return {"queue_status": queue_result, "response": db_result}
+


In REST_SYNC mode, sync_operation sends to SQS and immediately reads from the response store, but there’s no waiting/long-polling for the agent-runner + response-handler pipeline to complete. This will typically return an empty/partial response and behave nondeterministically. Consider making REST_SYNC bypass SQS and call ChatService directly, or implement a bounded wait/long-poll with correlation on message_id.

Copilot · 2026-03-26T12:11:01Z

ak-deployment/ak-aws/serverless/state.tf

+      local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
+      local.output_queue_visibility_timeout >= var.response_handler.timeout
+    ) : true
+    error_message = format(
+      "[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",
+      local.input_queue_visibility_timeout,
+      var.agent_runner.timeout,
+      local.output_queue_visibility_timeout,


check.queue_visibility_timeouts compares local.input_queue_visibility_timeout/local.output_queue_visibility_timeout against Lambda timeouts, but those locals can be null when queue_config is omitted (it defaults to {} and try(..., null) returns null). Comparing null >= number will error during plan/apply. Use coalesce(..., <module default>) so the check always evaluates with numbers (or derive the values from the queues module defaults).

Suggested change

local.input_queue_visibility_timeout >= var.agent_runner.timeout &&

local.output_queue_visibility_timeout >= var.response_handler.timeout

) : true

error_message = format(

"[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",

local.input_queue_visibility_timeout,

var.agent_runner.timeout,

local.output_queue_visibility_timeout,

coalesce(local.input_queue_visibility_timeout, var.agent_runner.timeout) >= var.agent_runner.timeout &&

coalesce(local.output_queue_visibility_timeout, var.response_handler.timeout) >= var.response_handler.timeout

) : true

error_message = format(

"[IMPORTANT] Invalid queue visibility timeout configuration: input queue visibility timeout (%d) must be >= agent runner timeout (%d), and output queue visibility timeout (%d) must be >= response handler timeout (%d).",

coalesce(local.input_queue_visibility_timeout, var.agent_runner.timeout),

var.agent_runner.timeout,

coalesce(local.output_queue_visibility_timeout, var.response_handler.timeout),

Copilot · 2026-03-26T12:11:01Z

ak-py/src/agentkernel/deployment/aws/serverless/internal/response_handler.py

+
+            if not domain_name:
+                cls._log.error("WebSocket domain_name not found in message or environment")
+                raise ValueError("Missing WebSocket domain_name")
+


domain_name/stage are required to call the API Gateway Management API, but this handler only looks in message_data (or env vars). Messages produced by ServerlessAgentRunner/ChatService don’t include these fields, and the Terraform in this PR doesn’t set WEBSOCKET_DOMAIN_NAME/WEBSOCKET_STAGE, so async/WebSocket mode will fail. Add domain_name/stage to the queued payload or wire them via env vars.

Copilot · 2026-03-26T12:11:02Z

ak-py/src/agentkernel/deployment/aws/serverless/core/websocket_manager.py

+                    response = self._connection_table.scan(
+                        FilterExpression=Key("connection_id").eq(connection_id)
+                    )


In the scan fallback, FilterExpression=Key("connection_id").eq(connection_id) is incorrect: Key is for key conditions; scans should use Attr. This will throw or behave incorrectly when the GSI is missing. Use from boto3.dynamodb.conditions import Attr and FilterExpression=Attr("connection_id").eq(connection_id).

Copilot · 2026-03-26T12:11:02Z

ak-py/src/agentkernel/deployment/aws/response_store/redis.py

+        self.client = redis.Redis(
+            host=host,
+            port=port,
+            username=username,
+            password=password,


This uses redis.Redis(host=host, ...), but the configuration in this repo provides a URL (redis://.../rediss://...). If host is a URL, the client will treat it as a hostname and fail to connect. Prefer redis.Redis.from_url(...) and accept a url: str here.

Copilot

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 18 comments.

Comments suppressed due to low confidence (1)

ak-deployment/ak-aws/serverless/modules/request-handler/main.tf:58

aws_iam_policy.lambda_sqs_policy is not a counted resource, but the attachment references aws_iam_policy.lambda_sqs_policy[0].arn, which will fail Terraform evaluation. Reference it without an index, or add count/for_each consistently on both resources.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-27T05:22:38Z

ak-deployment/ak-aws/serverless/state.tf

+  input_queue_url = module.queues[0].input_queue_url
+  input_queue_arn = module.queues[0].input_queue_arn
+
+  # Output queue
+  output_queue_url = module.queues[0].output_queue_url
+  output_queue_arn = module.queues[0].output_queue_arn


The locals reference module.queues[0].*, but module "queues" is not a counted module (no count/for_each). Indexing it will fail Terraform evaluation. Reference outputs as module.queues.input_queue_url / module.queues.output_queue_arn etc, or add count = 1 and keep indexing consistent.

Suggested change

input_queue_url = module.queues[0].input_queue_url

input_queue_arn = module.queues[0].input_queue_arn

# Output queue

output_queue_url = module.queues[0].output_queue_url

output_queue_arn = module.queues[0].output_queue_arn

input_queue_url = module.queues.input_queue_url

input_queue_arn = module.queues.input_queue_arn

# Output queue

output_queue_url = module.queues.output_queue_url

output_queue_arn = module.queues.output_queue_arn

Copilot · 2026-03-27T05:22:39Z

ak-deployment/ak-aws/serverless/state.tf

+  module_name                             = var.module_name
+  is_production                           = var.is_production
+  package_path                            = local.request_handler_package_path
+  environment_variables                   = var.environment_variables


The request handler Lambda needs to know execution_mode to route correctly (REST vs async/WebSocket). This module currently passes only var.environment_variables and does not inject AK_EXECUTION__MODE (unlike agent-runner/response-handler). Without it, runtime config may default incorrectly (or be None). Consider merging AK_EXECUTION__MODE = var.execution_mode into environment_variables for the request handler too.

Suggested change

environment_variables = var.environment_variables

environment_variables = merge(var.environment_variables, { AK_EXECUTION__MODE = var.execution_mode })

Copilot · 2026-03-27T05:22:39Z

ak-deployment/ak-aws/serverless/state.tf

+  websocket_connections_table_name        = local.websocket_connections_table_name
+  websocket_connections_table_arn         = local.websocket_connections_table_arn
+
+  depends_on = [module.request_handler_source_package]


depends_on = [module.request_handler_source_package] references a module that is created with count. Referencing it without an index is invalid when count is set, and will also fail when count = 0. Use depends_on = [module.request_handler_source_package[0]] guarded by the same condition, or depend on a resource that always exists for both packaging modes.

Suggested change

depends_on = [module.request_handler_source_package]

depends_on = var.package_type == "S3Zip" ? [module.request_handler_source_package[0]] : []

Copilot · 2026-03-27T05:22:39Z

ak-deployment/ak-aws/serverless/outputs.tf

+  value       = module.response_handler[0].response_handler_lambda_function_arn
+}
+
+output "response_handler_lambda_function_name" {
+  description = "Name of the response handler Lambda function"
+  value       = module.response_handler[0].response_handler_lambda_function_name
+}
+
+output "response_handler_lambda_function_invoke_arn" {
+  description = "Invoke ARN of the response handler Lambda function"
+  value       = module.response_handler[0].response_handler_lambda_function_invoke_arn


These outputs index module.response_handler[0], but module "response_handler" is not declared with count/for_each in state.tf. This will fail at plan time. Remove the index (or add count consistently and guard outputs when not created).

Suggested change

value = module.response_handler[0].response_handler_lambda_function_arn

}

output "response_handler_lambda_function_name" {

description = "Name of the response handler Lambda function"

value = module.response_handler[0].response_handler_lambda_function_name

}

output "response_handler_lambda_function_invoke_arn" {

description = "Invoke ARN of the response handler Lambda function"

value = module.response_handler[0].response_handler_lambda_function_invoke_arn

value = module.response_handler.response_handler_lambda_function_arn

}

output "response_handler_lambda_function_name" {

description = "Name of the response handler Lambda function"

value = module.response_handler.response_handler_lambda_function_name

}

output "response_handler_lambda_function_invoke_arn" {

description = "Invoke ARN of the response handler Lambda function"

value = module.response_handler.response_handler_lambda_function_invoke_arn

Copilot · 2026-03-27T05:22:39Z

ak-deployment/ak-aws/serverless/outputs.tf

+  value       = module.queues[0].input_queue_arn
+}
+
+output "input_queue_url" {
+  description = "URL of the input SQS queue"
+  value       = module.queues[0].input_queue_url
+}
+
+output "input_queue_name" {
+  description = "Name of the input SQS queue"
+  value       = module.queues[0].input_queue_name
+}
+
+output "output_queue_arn" {
+  description = "ARN of the output SQS queue"
+  value       = module.queues[0].output_queue_arn
+}
+
+output "output_queue_url" {
+  description = "URL of the output SQS queue"
+  value       = module.queues[0].output_queue_url
+}
+
+output "output_queue_name" {
+  description = "Name of the output SQS queue"
+  value       = module.queues[0].output_queue_name


These outputs index module.queues[0], but module "queues" is not declared with count/for_each in state.tf. This will fail at plan time. Reference module.queues.input_queue_arn / module.queues.output_queue_url etc instead.

Suggested change

value = module.queues[0].input_queue_arn

}

output "input_queue_url" {

description = "URL of the input SQS queue"

value = module.queues[0].input_queue_url

}

output "input_queue_name" {

description = "Name of the input SQS queue"

value = module.queues[0].input_queue_name

}

output "output_queue_arn" {

description = "ARN of the output SQS queue"

value = module.queues[0].output_queue_arn

}

output "output_queue_url" {

description = "URL of the output SQS queue"

value = module.queues[0].output_queue_url

}

output "output_queue_name" {

description = "Name of the output SQS queue"

value = module.queues[0].output_queue_name

value = module.queues.input_queue_arn

}

output "input_queue_url" {

description = "URL of the input SQS queue"

value = module.queues.input_queue_url

}

output "input_queue_name" {

description = "Name of the input SQS queue"

value = module.queues.input_queue_name

}

output "output_queue_arn" {

description = "ARN of the output SQS queue"

value = module.queues.output_queue_arn

}

output "output_queue_url" {

description = "URL of the output SQS queue"

value = module.queues.output_queue_url

}

output "output_queue_name" {

description = "Name of the output SQS queue"

value = module.queues.output_queue_name

Copilot · 2026-03-27T05:22:41Z

ak-deployment/ak-aws/serverless/state.tf

+      local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
+      local.output_queue_visibility_timeout >= var.response_handler.timeout


check.queue_visibility_timeouts compares local.*_queue_visibility_timeout values that can be null when queue_config is omitted (since the locals use try(..., null)). Comparing null >= number yields an invalid condition and will break planning. Default these locals to the same defaults used by the queues module (e.g., 60) or compute them via coalesce(try(...), 60).

Suggested change

local.input_queue_visibility_timeout >= var.agent_runner.timeout &&

local.output_queue_visibility_timeout >= var.response_handler.timeout

coalesce(local.input_queue_visibility_timeout, 60) >= var.agent_runner.timeout &&

coalesce(local.output_queue_visibility_timeout, 60) >= var.response_handler.timeout

Copilot · 2026-03-27T05:22:41Z

ak-deployment/ak-aws/serverless/outputs.tf

+  value       = module.agent_runner[0].agent_runner_lambda_function_arn
+}
+
+output "agent_runner_lambda_function_name" {
+  description = "Name of the agent runner Lambda function"
+  value       = module.agent_runner[0].agent_runner_lambda_function_name
+}
+
+output "agent_runner_lambda_function_invoke_arn" {
+  description = "Invoke ARN of the agent runner Lambda function"
+  value       = module.agent_runner[0].agent_runner_lambda_function_invoke_arn


These outputs index module.agent_runner[0], but module "agent_runner" is not declared with count/for_each in state.tf. This will fail at plan time. Remove the index (or add count consistently and guard outputs when not created).

Suggested change

value = module.agent_runner[0].agent_runner_lambda_function_arn

}

output "agent_runner_lambda_function_name" {

description = "Name of the agent runner Lambda function"

value = module.agent_runner[0].agent_runner_lambda_function_name

}

output "agent_runner_lambda_function_invoke_arn" {

description = "Invoke ARN of the agent runner Lambda function"

value = module.agent_runner[0].agent_runner_lambda_function_invoke_arn

value = module.agent_runner.agent_runner_lambda_function_arn

}

output "agent_runner_lambda_function_name" {

description = "Name of the agent runner Lambda function"

value = module.agent_runner.agent_runner_lambda_function_name

}

output "agent_runner_lambda_function_invoke_arn" {

description = "Invoke ARN of the agent runner Lambda function"

value = module.agent_runner.agent_runner_lambda_function_invoke_arn

Copilot · 2026-03-27T05:22:42Z

ak-py/src/agentkernel/deployment/aws/response_store/handler.py

+        if response_store_config.redis is not None:
+            from .redis import RedisResponseStore
+
+            redis_config = response_store_config.redis
+            # Determine if SSL should be used based on URL scheme
+            use_ssl = redis_config.url.startswith("rediss://")
+
+            self.store = RedisResponseStore(
+                host=redis_config.url,
+                ssl=use_ssl
+            )


RedisResponseStore is constructed with host=redis_config.url, but redis.Redis(host=...) expects a hostname (e.g., localhost) rather than a full URL like redis://... or rediss://.... This will fail DNS resolution for URL strings. Use redis.Redis.from_url(redis_config.url, ...) (or parse the URL into host/port/credentials) and let the scheme drive TLS automatically.

Copilot · 2026-03-27T05:22:42Z

ak-py/src/agentkernel/core/config.py

 from typing import List, Optional

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator


model_validator is imported but never used in this module. Removing the unused import avoids lint/type-check noise.

Suggested change

from pydantic import BaseModel, Field, model_validator

from pydantic import BaseModel, Field

Copilot · 2026-03-27T05:22:42Z

ak-deployment/ak-aws/serverless/state.tf

+  redis_url                             = var.create_redis_cluster == true ? module.redis[0].url : null
+  dynamodb_memory_table_arn             = var.create_dynamodb_memory_table == true ? module.dynamodb_memory[0].table_arn : null
+  dynamodb_memory_table_name            = var.create_dynamodb_memory_table == true ? module.dynamodb_memory[0].table_name : null
  dynamodb_multimodal_memory_table_arn  = var.create_dynamodb_multimodal_memory_table == true ? module.dynamodb_multimodal_memory[0].table_arn : null
  dynamodb_multimodal_memory_table_name = var.create_dynamodb_multimodal_memory_table == true ? module.dynamodb_multimodal_memory[0].table_name : null
-  create_authorizer          = var.authorizer != null ? (var.authorizer.function_name != null && var.authorizer.handler_path != null && var.authorizer.package_type != null && var.authorizer.package_path != null && var.authorizer.module_name != null) : false
+  create_authorizer                     = var.authorizer != null ? (var.authorizer.function_name != null && var.authorizer.handler_path != null && var.authorizer.package_type != null && var.authorizer.package_path != null && var.authorizer.module_name != null) : false
+  request_handler_package_path          = var.package_path
+  response_handler_source_package_path  = var.package_path
+  agent_runner_package_path             = try(var.agent_runner.package_path, null)
+  agent_runner_artifact_module_name     = coalesce(var.agent_runner_module_name, "${var.module_name}-agent-runner")
+  # Response store wiring
+  is_async_mode                   = var.execution_mode == "async"
+  response_store_redis_url        = var.create_redis_response_store ? local.redis_url : null
+  response_store_dynamodb_table_name = var.create_dynamodb_response_store ? module.dynamodb_response_store[0].table_name : null
+  response_store_dynamodb_table_arn  = var.create_dynamodb_response_store ? module.dynamodb_response_store[0].table_arn : null
+  response_handler_response_store_redis = var.create_redis_response_store ? {
+    url = local.response_store_redis_url
+  } : null


local.redis_url is only set when create_redis_cluster == true, but this module also creates Redis when create_redis_response_store == true. As a result, response_store_redis_url (and downstream response store wiring) can end up null even though a Redis cluster was created. Make local.redis_url depend on both flags (or directly on length(module.redis) > 0).

lakindu-yl · 2026-03-28T06:21:22Z

closing temporarily

lakindu-yl added 27 commits March 11, 2026 14:28

SQS common TF module updates

25d9e66

python initial code

df1f355

response-handler serverless module update

6803e97

import path fix

5401cb9

response-handler serverless module updates with null resource for res…

ad53fb9

…ponse_handler dist creation

serverless module updates with null resource and some validations

9a1cd9b

some updates to serverless and response-handler moduels

933aea0

getting env vars -> akconfig conversion

a1ffb68

changed env var export to akconfig env var format

4ac0815

changed normal env var to config env var in state.tf

77d4f84

Merge branch 'develop' into feature/scalability-implementation

233a914

sqs common module updates

3d27e53

terraform modules and infrastructure update for scalability design

aad0259

SOURCE and DESTINATION to INPUT and OUTPUT queue urls repectively update

45518ae

terraform aws serverless code refactoring

b0e9ce5

websocket python updates

09258d3

websocket api update

693a118

websocket updates for aklambda snd sqs_consumer lambda

60783a0

max receive count update

cdf098a

check for visbility timeout < lambda timeout

20ea02d

changed the deafult timeout values

580b9fd

proper indexing for dynamodb response store

c17b4ae

websocket connection table refactor

94a8586

some more code refactoring and fixing udpates

79e7874

source package for request handler update

fa4ca91

s3zip & Image support for AgentRunner

3d72453

small fixes

d5b612e

Copilot AI review requested due to automatic review settings March 26, 2026 03:40

Copilot started reviewing on behalf of lakindu-yl March 26, 2026 03:40 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

lakindu-yl added 2 commits March 26, 2026 17:28

refactoring response store creation logic

d5dea69

change to config.py _ResponseStoreConfig

0476c83

Copilot AI review requested due to automatic review settings March 26, 2026 11:59

Copilot started reviewing on behalf of lakindu-yl March 26, 2026 12:00 View session

import fixes

1b6af3a

Copilot AI reviewed Mar 26, 2026

View reviewed changes

lakindu-yl added 2 commits March 26, 2026 19:40

response handler build script update

2c79bb0

removed scalable mode selection option

e5e7e63

Copilot AI review requested due to automatic review settings March 27, 2026 04:59

Copilot started reviewing on behalf of lakindu-yl March 27, 2026 05:15 View session

Copilot AI reviewed Mar 27, 2026

View reviewed changes

serverless scalable openai example

e2788b5

lakindu-yl closed this Mar 28, 2026

lakindu-yl reopened this Mar 28, 2026

lakindu-yl closed this Mar 28, 2026

		def add_message(self, message: dict) -> None:
		self.table.put_item(Item=message)

		condition = var.execution_mode == null ? true : contains(["rest_sync", "rest_async", "ses_stream", "async"], var.execution_mode)
		error_message = "execution_mode must be one of: rest_sync, rest_async, ses_stream, async, or null."

		def register(self, path: str, method: str = None) -> Callable[[Callable], Callable]:
		"""

	environment_variables = var.environment_variables
	environment_variables = merge(var.environment_variables, { AK_EXECUTION__MODE = var.execution_mode })

	depends_on = [module.request_handler_source_package]
	depends_on = var.package_type == "S3Zip" ? [module.request_handler_source_package[0]] : []

		local.input_queue_visibility_timeout >= var.agent_runner.timeout &&
		local.output_queue_visibility_timeout >= var.response_handler.timeout

	from pydantic import BaseModel, Field, model_validator
	from pydantic import BaseModel, Field

Conversation

lakindu-yl commented Mar 26, 2026

Description

Type of Change

Related Issues

Changes Made

Testing

Checklist

Screenshots (if applicable)

Additional Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!