Skip to content

Add advance health check for database#361

Merged
drtechie merged 17 commits intoPSMRI:mainfrom
DurgaPrasad-54:main
Mar 2, 2026
Merged

Add advance health check for database#361
drtechie merged 17 commits intoPSMRI:mainfrom
DurgaPrasad-54:main

Conversation

@DurgaPrasad-54
Copy link
Contributor

@DurgaPrasad-54 DurgaPrasad-54 commented Feb 23, 2026

📋 Description

JIRA ID:

This PR implements /health and /version endpoints in the Common-API as part of Issue PSMRI/AMRIT#102

What’s changed

  • Update /health endpoint
  • Update /version endpoint

Summary by CodeRabbit

  • New Features

    • Build now emits a git properties file with branch, commit, version, and build time.
    • Version endpoint now returns build metadata as JSON.
    • Health checks enhanced with per-component status, severity reporting, timestamps, and background diagnostics.
  • Security

    • /version and /health endpoints are exempted from authentication for public access.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 23, 2026

Warning

Rate limit exceeded

@DurgaPrasad-54 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 13 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Build now emits git metadata during Maven initialize; /version returns JSON built from generated git.properties; health checks gained a background MySQL diagnostics scheduler, richer per-component status and lifecycle shutdown; auth filter now exempts /health and /version from JWT checks.

Changes

Cohort / File(s) Summary
Build Configuration
pom.xml
Added io.github.git-commit-id:git-commit-id-maven-plugin (v9.0.2) to run revision in initialize, generate ${project.build.outputDirectory}/git.properties, include only git.branch, git.commit.id.abbrev, git.build.version, git.build.time, and disable failures when Git info is absent.
Version Endpoint
src/main/java/com/iemr/common/controller/version/VersionController.java
Refactored /version to @GetMapping producing JSON and returning ResponseEntity<Map<String,String>> populated from git.properties with fallback to "unknown"; added helper to load properties and updated logging.
Health Check Service
src/main/java/com/iemr/common/service/health/HealthService.java
Reworked health logic: added scheduled background MySQL diagnostics (stuck processes, long transactions, deadlocks, slow queries, connection usage), cached DB severity, timeout-bounded SELECT 1 DB connectivity checks, Redis connectivity checks, structured per-component response with checkedAt, and a @PreDestroy shutdownDiagnostics() lifecycle method.
Authentication Filter
src/main/java/com/iemr/common/utils/JwtUserIdValidationFilter.java
Extended shouldSkipAuthentication to treat ${contextPath}/health and ${contextPath}/version as public endpoints so they bypass JWT validation.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Scheduler
    participant HealthService
    participant MySQL as MySQL Database
    participant Cache as Severity Cache
    participant Redis as Redis

    Note over Scheduler,HealthService: Background diagnostics (startup & periodic)
    Scheduler->>HealthService: trigger runAdvancedMySQLDiagnostics
    HealthService->>MySQL: run diagnostic queries (stuck, long tx, deadlocks, slow queries)
    MySQL-->>HealthService: diagnostic results
    HealthService->>HealthService: compute worst-case severity
    HealthService->>Cache: update cached DB severity

    Note over Client,HealthService: Health check request flow
    Client->>HealthService: GET /health
    HealthService->>MySQL: SELECT 1 (bounded timeout)
    MySQL-->>HealthService: connectivity result
    HealthService->>Cache: read cached DB severity
    HealthService->>Redis: check connectivity
    Redis-->>HealthService: connectivity result
    HealthService->>Client: return per-component map (status, severity, checkedAt)
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through builds to fetch the git trace,
Wove health-check threads in a quiet, safe space,
Background diagnostics nibble through night,
Version maps glow with commit-time light,
I leave the endpoints open — swift as a hare's pace! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add advance health check for database' accurately reflects the main change: adding advanced database health checks to the HealthService. It aligns with the primary objective and is concise and clear.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/java/com/iemr/common/utils/JwtUserIdValidationFilter.java (1)

239-256: ⚠️ Potential issue | 🟡 Minor

Duplicate /user/refreshToken entry with inconsistent match semantics.

Line 243 already checks path.equals(contextPath + "/user/refreshToken"), and line 254 adds path.startsWith(contextPath + "/user/refreshToken"). The startsWith variant is strictly more permissive (it would also match /user/refreshTokenEvil) and makes the equals check on line 243 redundant. Remove the duplicate and decide which semantic is intended — equals is safer.

Proposed fix
 				|| path.startsWith(contextPath + "/user/validateSecurityQuestionAndAnswer")
 				|| path.startsWith(contextPath + "/user/logOutUserFromConcurrentSession")
-				|| path.startsWith(contextPath + "/user/refreshToken")
 				|| path.equals(contextPath + "/health")
 				|| path.equals(contextPath + "/version");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/utils/JwtUserIdValidationFilter.java` around
lines 239 - 256, The shouldSkipAuthentication method contains duplicate checks
for "/user/refreshToken": one using equals (path.equals(contextPath +
"/user/refreshToken")) and another using startsWith (path.startsWith(contextPath
+ "/user/refreshToken")); remove the redundant check and choose the intended
semantic (prefer equals for strict match). Edit the shouldSkipAuthentication
function to eliminate the duplicate and keep only the intended condition (either
the equals variant for exact match or the startsWith variant if prefix matching
is required), ensuring no other logic references rely on the removed form.
🧹 Nitpick comments (3)
src/main/java/com/iemr/common/controller/version/VersionController.java (1)

58-76: Consider caching git properties — they never change at runtime.

loadGitProperties() re-reads git.properties from the classpath on every /version call. Since these values are immutable after build time, loading them once (e.g., in @PostConstruct or a lazy-init field) would be cleaner and avoids repeated I/O, even if the endpoint is low-traffic.

Also, the logger.info("version Controller Start/End") calls on lines 61 and 74 will fire on every request to a public, unauthenticated endpoint — consider lowering to debug.

Proposed refactor — load once at startup
 `@RestController`
 public class VersionController {
 
 	private final Logger logger =
 			LoggerFactory.getLogger(this.getClass().getSimpleName());
 	
 	private static final String UNKNOWN_VALUE = "unknown";
+	private final Map<String, String> versionInfo;
+
+	public VersionController() {
+		Map<String, String> info = new LinkedHashMap<>();
+		try {
+			Properties gitProperties = loadGitProperties();
+			info.put("buildTimestamp", gitProperties.getProperty("git.build.time", UNKNOWN_VALUE));
+			info.put("version", gitProperties.getProperty("git.build.version", UNKNOWN_VALUE));
+			info.put("branch", gitProperties.getProperty("git.branch", UNKNOWN_VALUE));
+			info.put("commitHash", gitProperties.getProperty("git.commit.id.abbrev", UNKNOWN_VALUE));
+		} catch (Exception e) {
+			logger.error("Failed to load version information", e);
+			info.put("buildTimestamp", UNKNOWN_VALUE);
+			info.put("version", UNKNOWN_VALUE);
+			info.put("branch", UNKNOWN_VALUE);
+			info.put("commitHash", UNKNOWN_VALUE);
+		}
+		this.versionInfo = Map.copyOf(info);
+	}
 
 	`@Operation`(summary = "Get version information")
 	`@GetMapping`(value = "/version", produces = MediaType.APPLICATION_JSON_VALUE)
 	public ResponseEntity<Map<String, String>> versionInformation() {
-		Map<String, String> response = new LinkedHashMap<>();
-		try {
-			logger.info("version Controller Start");
-			Properties gitProperties = loadGitProperties();
-			response.put("buildTimestamp", gitProperties.getProperty("git.build.time", UNKNOWN_VALUE));
-			response.put("version", gitProperties.getProperty("git.build.version", UNKNOWN_VALUE));
-			response.put("branch", gitProperties.getProperty("git.branch", UNKNOWN_VALUE));
-			response.put("commitHash", gitProperties.getProperty("git.commit.id.abbrev", UNKNOWN_VALUE));
-		} catch (Exception e) {
-			logger.error("Failed to load version information", e);
-			response.put("buildTimestamp", UNKNOWN_VALUE);
-			response.put("version", UNKNOWN_VALUE);
-			response.put("branch", UNKNOWN_VALUE);
-			response.put("commitHash", UNKNOWN_VALUE);
-		}
-		logger.info("version Controller End");
-		return ResponseEntity.ok(response);
+		return ResponseEntity.ok(versionInfo);
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/controller/version/VersionController.java`
around lines 58 - 76, The versionInformation() method currently calls
loadGitProperties() on every request and logs "version Controller Start/End" at
INFO; change this to load git.properties once at startup and use DEBUG for
request tracing: create a private field (e.g., gitPropertiesCache or
cachedGitInfo) populated in a `@PostConstruct` method or as a lazy-initialized
singleton that calls loadGitProperties() once and falls back to UNKNOWN_VALUE on
error, update versionInformation() to read from the cache instead of calling
loadGitProperties(), and change logger.info(...) to logger.debug(...) to avoid
noisy INFO logs for the public endpoint.
src/main/java/com/iemr/common/service/health/HealthService.java (2)

50-62: Unused constant RESPONSE_TIME_SLOW_MS.

RESPONSE_TIME_SLOW_MS (line 56) is declared but never referenced anywhere in this file. Remove it to avoid confusion, or implement the response-time check it was intended for.

Remove dead constant
     private static final String LOG_EVENT_POOL_EXHAUSTED = "MYSQL_POOL_EXHAUSTED";
-    private static final long RESPONSE_TIME_SLOW_MS    = 2000; // > 2s → SLOW
     private static final int  STUCK_PROCESS_THRESHOLD  = 5;    // > 5 stuck → WARNING
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` around lines
50 - 62, The constant RESPONSE_TIME_SLOW_MS is declared but never used; either
remove the unused constant from HealthService or implement the intended
response-time check using it (e.g., in any method that evaluates query latency)
so the value is referenced. Locate RESPONSE_TIME_SLOW_MS in HealthService and
either delete that declaration along with any related comment, or add logic
(using RESPONSE_TIME_SLOW_MS) where query/response times are assessed (e.g., in
methods that compute slow query metrics) to classify responses as SLOW when >
RESPONSE_TIME_SLOW_MS.

202-355: Background diagnostic approach is well-structured overall.

The per-check isolation with individual try-catch blocks, structured log events with tags, and the escalation model are all sound. A few minor observations beyond what's already flagged:

  1. CHECK 2 (long-running transactions, line 253): Escalates directly to CRITICAL, which maps to DOWN in the health endpoint. A single long transaction (e.g., a legitimate batch job > 30s) would mark the entire DB as DOWN. Consider using WARNING here instead, or making the threshold configurable.

  2. CHECK 5 (lines 304-337): Threads_connected is the server-wide thread count, not the application's pool usage. In shared-database deployments, other apps' connections inflate this metric and could trigger false alerts. The log message says "connection pool" but this is actually server-level connection usage — consider clarifying the log or adding a note.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` around lines
202 - 355, runAdvancedMySQLDiagnostics: for CHECK 2 (InnoDB long-running
transactions / LOG_EVENT_LOCK_WAIT) avoid mapping a single long transaction
directly to CRITICAL/DOWN—change the escalation to WARNING (use
escalate(worstSeverity, "WARNING")) or make the threshold/severity configurable
(introduce a config flag used by the check) so legitimate long-running jobs
don't mark DB as DOWN; for CHECK 5 (Threads_connected / LOG_EVENT_POOL_EXHAUSTED
/ LOG_EVENT_CONN_USAGE) update the log messages and variable names to reflect
these are server-level connection metrics (e.g., mention "server-level
Threads_connected (may include other apps)" instead of "connection pool") and/or
add a comment explaining this limitation so alerts aren't misleading in
shared-database deployments.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 141-200: checkDatabaseConnectivity currently returns the
cachedDbSeverity (which defaults to "INFO") on a successful SELECT 1 while Redis
returns "OK", causing inconsistent healthy severities; modify
checkDatabaseConnectivity so that on successful connectivity it sets severity to
"OK" (and uses resolveDatabaseStatus("OK") for status) instead of using
cachedDbSeverity, and ensure any related logic in runAdvancedMySQLDiagnostics
continues to update cachedDbSeverity for degraded states only so healthy checks
remain "OK".
- Around line 284-301: The slow query check uses the cumulative MySQL
"Slow_queries" counter and will permanently warn after the first slow query; add
a long field (e.g., previousSlowQueryCount) to HealthService and update the
CHECK 4 block to compute a delta = slowQueries - previousSlowQueryCount, only
warn/escalate (use LOG_EVENT_SLOW_QUERIES and escalate(...)) when delta > 0,
include the delta and cumulative value in the log, and then assign
previousSlowQueryCount = slowQueries so subsequent runs only flag new slow
queries (follow the same delta-tracking pattern used by previousDeadlockCount).

---

Outside diff comments:
In `@src/main/java/com/iemr/common/utils/JwtUserIdValidationFilter.java`:
- Around line 239-256: The shouldSkipAuthentication method contains duplicate
checks for "/user/refreshToken": one using equals (path.equals(contextPath +
"/user/refreshToken")) and another using startsWith (path.startsWith(contextPath
+ "/user/refreshToken")); remove the redundant check and choose the intended
semantic (prefer equals for strict match). Edit the shouldSkipAuthentication
function to eliminate the duplicate and keep only the intended condition (either
the equals variant for exact match or the startsWith variant if prefix matching
is required), ensuring no other logic references rely on the removed form.

---

Nitpick comments:
In `@src/main/java/com/iemr/common/controller/version/VersionController.java`:
- Around line 58-76: The versionInformation() method currently calls
loadGitProperties() on every request and logs "version Controller Start/End" at
INFO; change this to load git.properties once at startup and use DEBUG for
request tracing: create a private field (e.g., gitPropertiesCache or
cachedGitInfo) populated in a `@PostConstruct` method or as a lazy-initialized
singleton that calls loadGitProperties() once and falls back to UNKNOWN_VALUE on
error, update versionInformation() to read from the cache instead of calling
loadGitProperties(), and change logger.info(...) to logger.debug(...) to avoid
noisy INFO logs for the public endpoint.

In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 50-62: The constant RESPONSE_TIME_SLOW_MS is declared but never
used; either remove the unused constant from HealthService or implement the
intended response-time check using it (e.g., in any method that evaluates query
latency) so the value is referenced. Locate RESPONSE_TIME_SLOW_MS in
HealthService and either delete that declaration along with any related comment,
or add logic (using RESPONSE_TIME_SLOW_MS) where query/response times are
assessed (e.g., in methods that compute slow query metrics) to classify
responses as SLOW when > RESPONSE_TIME_SLOW_MS.
- Around line 202-355: runAdvancedMySQLDiagnostics: for CHECK 2 (InnoDB
long-running transactions / LOG_EVENT_LOCK_WAIT) avoid mapping a single long
transaction directly to CRITICAL/DOWN—change the escalation to WARNING (use
escalate(worstSeverity, "WARNING")) or make the threshold/severity configurable
(introduce a config flag used by the check) so legitimate long-running jobs
don't mark DB as DOWN; for CHECK 5 (Threads_connected / LOG_EVENT_POOL_EXHAUSTED
/ LOG_EVENT_CONN_USAGE) update the log messages and variable names to reflect
these are server-level connection metrics (e.g., mention "server-level
Threads_connected (may include other apps)" instead of "connection pool") and/or
add a comment explaining this limitation so alerts aren't misleading in
shared-database deployments.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/main/java/com/iemr/common/service/health/HealthService.java (2)

258-281: Stuck process check: warn-level log but OK severity for counts 1–5.

When stuckCount is between 1 and STUCK_PROCESS_THRESHOLD (5), the code logs at WARN level but returns SEVERITY_OK, which means no escalation. This is a bit inconsistent — either the situation is worth alerting on (and should return at least INFO or a lower-tier signal) or the log should be at INFO/DEBUG level.

If this is intentional observability without escalation, consider logging at INFO instead of WARN to avoid noisy log-based alerts that don't match the reported severity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` around lines
258 - 281, performStuckProcessCheck logs a WARN when any stuckCount > 0 but
still returns SEVERITY_OK for counts <= STUCK_PROCESS_THRESHOLD, causing
inconsistent noisy warnings; change the non-escalating branch so that when
stuckCount > 0 && stuckCount <= STUCK_PROCESS_THRESHOLD you call logger.info
(not logger.warn) and retain returning SEVERITY_OK, and keep logger.warn +
return SEVERITY_WARNING only when stuckCount > STUCK_PROCESS_THRESHOLD (refer to
performStuckProcessCheck, STUCK_PROCESS_THRESHOLD, STUCK_PROCESS_SECONDS,
logger.warn/info, SEVERITY_OK, SEVERITY_WARNING).

79-79: Remove unused constant RESPONSE_TIME_SLOW_MS.

This constant is declared at line 79 but never used anywhere in the codebase. Remove it to keep the code clean.

🧹 Proposed fix
-    private static final long RESPONSE_TIME_SLOW_MS    = 2000; // > 2s → SLOW
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` at line 79,
Remove the unused constant RESPONSE_TIME_SLOW_MS from HealthService to clean up
dead code: locate the declaration "private static final long
RESPONSE_TIME_SLOW_MS = 2000;" in the HealthService class and delete that field,
then run a build/compile to confirm no references remain (if any exist, replace
usages with the appropriate existing threshold or refactor to use the existing
RESPONSE_TIME_* constants).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 174-183: When a successful SELECT 1 proves DB connectivity, avoid
reporting DOWN based on background diagnostics: in the block that reads
cachedDbSeverity and sets result.put(FIELD_STATUS...) (using
cachedDbSeverity.get() and resolveDatabaseStatus(severity)), check if
resolveDatabaseStatus(severity) would return DOWN and, if so, override/cap the
severity to DEGRADED before putting values into result (e.g., set severity =
"DEGRADED" and use resolveDatabaseStatus("DEGRADED") for FIELD_STATUS); make the
same change in the similar block around lines 392-398 so that DOWN is reserved
for actual connection failures handled in the catch branch.
- Around line 283-304: performLongTransactionCheck currently marks any single
long-running InnoDB transaction (> STUCK_PROCESS_SECONDS) as CRITICAL; change it
to use a count threshold similar to STUCK_PROCESS_THRESHOLD (e.g.,
STUCK_TRANSACTION_THRESHOLD) and implement graduated escalation: if lockCount ==
0 return SEVERITY_OK; if 0 < lockCount <= STUCK_TRANSACTION_THRESHOLD return
SEVERITY_WARNING (and log with LOG_EVENT_LOCK_WAIT, lockCount and threshold); if
lockCount > STUCK_TRANSACTION_THRESHOLD return SEVERITY_CRITICAL (and log
likewise). Keep existing exception logging behavior but include lockCount-based
decision logic inside performLongTransactionCheck and add/define the new
STUCK_TRANSACTION_THRESHOLD constant.

---

Duplicate comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 330-353: The current delta logic in performSlowQueryCheck uses
previousSlowQueryCount.getAndSet(slowQueries) which can produce a false-positive
warning on first run if previousSlowQueryCount defaults to 0; change the
initialization/flow so the first observed slowQueries value is only used to seed
previousSlowQueryCount (no warning) and subsequent runs compute delta normally.
Specifically, in performSlowQueryCheck, after reading slowQueries from
ResultSet, detect an uninitialized sentinel (e.g., previousSlowQueryCount == 0
or a dedicated negative sentinel) and set previousSlowQueryCount to slowQueries
without logging; otherwise compute delta = slowQueries - previousSlow and only
log/warn when delta > 0 (retain STATUS_VALUE and LOG_EVENT_SLOW_QUERIES
references).

---

Nitpick comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 258-281: performStuckProcessCheck logs a WARN when any stuckCount
> 0 but still returns SEVERITY_OK for counts <= STUCK_PROCESS_THRESHOLD, causing
inconsistent noisy warnings; change the non-escalating branch so that when
stuckCount > 0 && stuckCount <= STUCK_PROCESS_THRESHOLD you call logger.info
(not logger.warn) and retain returning SEVERITY_OK, and keep logger.warn +
return SEVERITY_WARNING only when stuckCount > STUCK_PROCESS_THRESHOLD (refer to
performStuckProcessCheck, STUCK_PROCESS_THRESHOLD, STUCK_PROCESS_SECONDS,
logger.warn/info, SEVERITY_OK, SEVERITY_WARNING).
- Line 79: Remove the unused constant RESPONSE_TIME_SLOW_MS from HealthService
to clean up dead code: locate the declaration "private static final long
RESPONSE_TIME_SLOW_MS = 2000;" in the HealthService class and delete that field,
then run a build/compile to confirm no references remain (if any exist, replace
usages with the appropriate existing threshold or refactor to use the existing
RESPONSE_TIME_* constants).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/main/java/com/iemr/common/service/health/HealthService.java (2)

102-103: Both delta counters start at 0, causing a false-positive WARNING on the first diagnostic run.

Innodb_deadlocks and Slow_queries are server-lifetime cumulative counters. On first run after service startup, previousDeadlockCount and previousSlowQueryCount are 0, so any non-zero current value (accumulated before this service started) registers as a new delta and emits a WARNING — triggering alert noise on every deployment.

Consider seeding the counters from MySQL during the first diagnostic run using a sentinel value (e.g., -1L) to skip delta comparison:

♻️ Proposed approach
-    private final AtomicLong previousDeadlockCount  = new AtomicLong(0);
-    private final AtomicLong previousSlowQueryCount = new AtomicLong(0);
+    // -1L = "first run, seed only — do not emit a delta warning"
+    private final AtomicLong previousDeadlockCount  = new AtomicLong(-1L);
+    private final AtomicLong previousSlowQueryCount = new AtomicLong(-1L);

Then in each delta check:

-                long previousDeadlocks = previousDeadlockCount.getAndSet(currentDeadlocks);
-                if (currentDeadlocks > previousDeadlocks) {
+                long previousDeadlocks = previousDeadlockCount.getAndSet(currentDeadlocks);
+                if (previousDeadlocks >= 0 && currentDeadlocks > previousDeadlocks) {

Apply the same guard to performSlowQueryCheck.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` around lines
102 - 103, Previous delta counters previousDeadlockCount and
previousSlowQueryCount are initialized to 0 causing false positives on first
run; change their initial sentinel to -1L and update the diagnostic logic in the
methods that compute deltas (the deadlock check and performSlowQueryCheck) to
treat a previous value of -1 as "first run"—on first run set
previousDeadlockCount/previousSlowQueryCount to the current MySQL-retrieved
count and skip emitting a WARNING or computing a delta, otherwise compute delta
normally and update the previous counter after the check.

79-79: RESPONSE_TIME_SLOW_MS is declared but never used — either wire it in or remove it.

If the intent is to classify a slow-but-alive DB response (e.g., response time > 2 s → SLOW/WARNING), the elapsed-time measurement and comparison logic is missing from checkDatabaseConnectivity. Otherwise, remove the dead constant.

#!/bin/bash
# Confirm RESPONSE_TIME_SLOW_MS is not referenced anywhere else in the codebase.
rg -n "RESPONSE_TIME_SLOW_MS"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/iemr/common/service/health/HealthService.java` at line 79,
RESPONSE_TIME_SLOW_MS is declared but unused; either remove the constant or wire
it into checkDatabaseConnectivity by measuring DB call elapsed time (capture
start/end around the DB ping in checkDatabaseConnectivity), compare the elapsed
millis to RESPONSE_TIME_SLOW_MS and return/mark a SLOW or WARNING health state
accordingly (use the existing health result/enum used by
checkDatabaseConnectivity), or if you choose deletion simply remove the
RESPONSE_TIME_SLOW_MS constant and any related dead code; update unit tests if
present to reflect the chosen behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 177-181: The health check currently only sets
stmt.setQueryTimeout(3) but does not bound dataSource.getConnection(), so
getConnection() can block up to the pool's connectionTimeout and violate the
"/health must never block > 3s" guarantee; either ensure the connection pool
(HikariCP) configuration sets connectionTimeout ≤ 3000 ms or wrap the entire
connection+query block in a timed task (e.g., ExecutorService.submit(() -> { try
(Connection c = dataSource.getConnection(); Statement s = c.createStatement()) {
s.setQueryTimeout(3); s.execute("SELECT 1"); } })).get(3, TimeUnit.SECONDS) and
handle TimeoutException by failing fast and closing resources, referencing
dataSource.getConnection(), stmt.setQueryTimeout(...), and
performConnectionUsageCheck to keep consistent behavior.
- Around line 269-276: The current logic logs sub-threshold stuck MySQL process
counts using logger.warn while still returning SEVERITY_OK; change the logging
level for the non-escalating branch so that when stuckCount > 0 but <=
STUCK_PROCESS_THRESHOLD (use the same check around stuckCount and
STUCK_PROCESS_THRESHOLD) you call logger.info instead of logger.warn and keep
returning SEVERITY_OK, and reserve logger.warn (or leave as-is) only in the
branch where stuckCount > STUCK_PROCESS_THRESHOLD that returns SEVERITY_WARNING;
update the logging call that references LOG_EVENT_STUCK_PROCESS and
STUCK_PROCESS_SECONDS accordingly.
- Line 31: Update the incompatible import in HealthService: replace the javax
annotation import with the Jakarta equivalent by changing the import of
PreDestroy from javax.annotation.PreDestroy to jakarta.annotation.PreDestroy so
the HealthService class compiles under Spring Boot 3.2.2 (locate the import
statement in the HealthService file and update it accordingly).

---

Duplicate comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 183-186: When the background diagnostic cachedDbSeverity is
CRITICAL but the live "SELECT 1" check succeeds, cap the reported severity to
DEGRADED before resolving status so we don't report DOWN; update the block that
currently does result.put(FIELD_STATUS, resolveDatabaseStatus(severity)) and
result.put(FIELD_SEVERITY, severity) to first map SEVERITY_CRITICAL ->
SEVERITY_DEGRADED (e.g., compute a cappedSeverity variable from
cachedDbSeverity.get()), use cappedSeverity when calling resolveDatabaseStatus
and when putting FIELD_SEVERITY, and keep reporting STATUS_DOWN only in the
existing catch branch that handles exceptions.

---

Nitpick comments:
In `@src/main/java/com/iemr/common/service/health/HealthService.java`:
- Around line 102-103: Previous delta counters previousDeadlockCount and
previousSlowQueryCount are initialized to 0 causing false positives on first
run; change their initial sentinel to -1L and update the diagnostic logic in the
methods that compute deltas (the deadlock check and performSlowQueryCheck) to
treat a previous value of -1 as "first run"—on first run set
previousDeadlockCount/previousSlowQueryCount to the current MySQL-retrieved
count and skip emitting a WARNING or computing a delta, otherwise compute delta
normally and update the previous counter after the check.
- Line 79: RESPONSE_TIME_SLOW_MS is declared but unused; either remove the
constant or wire it into checkDatabaseConnectivity by measuring DB call elapsed
time (capture start/end around the DB ping in checkDatabaseConnectivity),
compare the elapsed millis to RESPONSE_TIME_SLOW_MS and return/mark a SLOW or
WARNING health state accordingly (use the existing health result/enum used by
checkDatabaseConnectivity), or if you choose deletion simply remove the
RESPONSE_TIME_SLOW_MS constant and any related dead code; update unit tests if
present to reflect the chosen behavior.

@sonarqubecloud
Copy link

@drtechie drtechie merged commit 4b44045 into PSMRI:main Mar 2, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants