ip,ip6: add rate limiting for ICMP errors, ARP and ICMP input #576
Conversation
Note: Reviews paused. This branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments as new commits arrive. This behavior is configurable.
📝 Walkthrough: Adds per-path rate limiting and selective graph config updates.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/api/gr_infra.h`:
- Around line 391-393: The three rate fields (icmp_error_rate, arp_rate,
icmp_rate) in struct gr_graph_conf cannot use 0 both as “disable” and
“unchanged” because GR_GRAPH_CONF_SET treats zeroed members as “unchanged”;
modify the API to add explicit presence flags (e.g., a bitmask or bools like
has_icmp_error_rate/has_arp_rate/has_icmp_rate) to the struct and update the
setter/serialization logic (the code path used by GR_GRAPH_CONF_SET and callers
such as modules/infra/cli/graph.c) to consult these presence bits when encoding
requests so a caller can send has_*=true with value 0 to mean “disable” while
has_*=false means “leave unchanged.”
In `@modules/infra/datapath/gr_datapath.h`:
- Around line 18-31: The rate limiter currently allows a whole vector when
tokens<nb_pkts and doesn't apply a runtime decrease of max_rate; in rate_limited
(and struct rate_limit_ctx usage) change the logic so: if max_rate == 0 return
true (reject all); on refill (now - last_refill >= rte_get_tsc_hz()) set
ctx->tokens = max_rate and update last_refill; otherwise clamp ctx->tokens =
RTE_MIN(ctx->tokens, max_rate) to handle runtime decreases; then if ctx->tokens
< nb_pkts return true (reject the batch) else subtract nb_pkts from ctx->tokens
(ctx->tokens -= nb_pkts); this ensures batches larger than remaining tokens are
rejected and lowers-of-rate take effect immediately.
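The logic suggested above can be sketched as follows. This is a minimal sketch with illustrative names, not the actual gr_datapath.h code: `now` and `tsc_hz` are passed as plain parameters in place of the DPDK rte_get_tsc_cycles()/rte_get_tsc_hz() calls, and MIN stands in for RTE_MIN. Note that returning true for max_rate == 0 (reject all) follows this comment's wording; the commit messages in this PR instead treat 0 as "limit disabled".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

struct rate_limit_ctx {
	uint64_t tokens;
	uint64_t last_refill;
};

/* Returns true when the whole batch must be rejected. */
static bool rate_limited(struct rate_limit_ctx *ctx, uint64_t max_rate,
			 uint16_t nb_pkts, uint64_t now, uint64_t tsc_hz) {
	if (max_rate == 0)
		return true; /* per this comment: reject all */
	if (now - ctx->last_refill >= tsc_hz) {
		/* refill once per second */
		ctx->tokens = max_rate;
		ctx->last_refill = now;
	} else {
		/* clamp so a runtime decrease takes effect immediately */
		ctx->tokens = MIN(ctx->tokens, max_rate);
	}
	if (ctx->tokens < nb_pkts)
		return true; /* not enough tokens for the whole vector */
	ctx->tokens -= nb_pkts;
	return false;
}
```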
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/gr_graph.h
- modules/infra/control/graph.c
- modules/infra/datapath/gr_datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/control/graph.h`:
- Line 40: The global mutable declaration extern struct gr_graph_conf graph_conf
exposes graph_conf (type gr_graph_conf) to concurrent reads of graph_conf.*_rate
from datapath workers while modules/infra/control/graph.c mutates it, causing
data races; fix by removing the shared mutable global and either (a) change the
*_rate fields to atomic types and update them via atomic_store/atomic_load, or
(b) stop in-place updates and publish an immutable snapshot (allocate a new
gr_graph_conf, populate it in graph.c, then publish via an atomic pointer swap
and provide an accessor like gr_get_graph_conf_snapshot()) so datapath readers
use the snapshot or atomic loads instead of reading a mutable graph_conf
directly.
In `@modules/infra/datapath/datapath.h`:
- Around line 22-31: The limiter currently ignores upward changes to max_rate
because the code returns early when max_rate==0 and only refills tokens on a
periodic timer; update the logic so that when max_rate increases (including 0 ->
N) you immediately reseed the token bucket by setting ctx->tokens = max_rate and
ctx->last_refill = now before packet-drop decisions. Concretely, detect an
upward change by comparing the new max_rate to the previously configured value
stored in the ctx (e.g., ctx->max_rate or a saved prev_rate), perform the
immediate reseed, then update ctx->max_rate to the new value; keep the periodic
refill logic for normal operation and preserve the existing behavior when
max_rate is decreased or set to 0.
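The suggested reseed-on-increase fix can be sketched as below, assuming the limiter otherwise keeps the existing semantics (0 disables the limit, periodic 1-second refill, partial consume of remaining tokens). Struct and function names are illustrative; `now` and `tsc_hz` are plain parameters standing in for the DPDK TSC helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rl_ctx {
	uint64_t tokens;
	uint64_t last_refill;
	uint64_t max_rate; /* last configured rate, to detect changes */
};

/* Returns true when the vector should be dropped. */
static bool rl_should_drop(struct rl_ctx *ctx, uint64_t max_rate,
			   uint16_t nb_pkts, uint64_t now, uint64_t tsc_hz) {
	if (max_rate > ctx->max_rate) {
		/* upward change (including 0 -> N): reseed the bucket
		 * immediately, before any drop decision */
		ctx->tokens = max_rate;
		ctx->last_refill = now;
	}
	ctx->max_rate = max_rate;
	if (max_rate == 0)
		return false; /* limiter disabled */
	if (now - ctx->last_refill >= tsc_hz) {
		/* normal periodic refill, once per second */
		ctx->tokens = max_rate;
		ctx->last_refill = now;
	} else if (ctx->tokens == 0) {
		return true; /* bucket drained within the window */
	}
	ctx->tokens -= (ctx->tokens < nb_pkts) ? ctx->tokens : nb_pkts;
	return false;
}
```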
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/graph.c
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/ip/datapath/icmp_input.c
- modules/infra/control/graph.c
- modules/infra/api/gr_infra.h
rte_edge_t gr_node_attach_parent(const char *parent, const char *node);

extern struct gr_graph_conf graph_conf;
graph_conf is now a live cross-thread data race.
modules/infra/control/graph.c updates this struct in place, while the new datapath fast paths read graph_conf.*_rate on every vector. Exporting it as a plain global here makes those control-plane writes race with worker-lcore reads. Please publish the rate fields through atomics or swap a read-only snapshot instead of sharing a mutable struct.
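Option (a) above could look roughly like the following, with illustrative names (not the actual gr_graph_conf layout): the control plane publishes each rate with a relaxed atomic store, and each datapath worker loads it once per vector.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative sketch: the shared rate fields become C11 atomics so
 * control-plane writes cannot race with datapath reads. */
struct graph_rates {
	_Atomic uint32_t icmp_error_rate;
	_Atomic uint32_t arp_rate;
	_Atomic uint32_t icmp_rate;
};

static struct graph_rates rates;

/* Control plane: publish a new rate. Relaxed ordering is enough since
 * each rate is an independent word-sized value. */
static void set_arp_rate(uint32_t r) {
	atomic_store_explicit(&rates.arp_rate, r, memory_order_relaxed);
}

/* Datapath worker: load the current rate once per vector. */
static uint32_t get_arp_rate(void) {
	return atomic_load_explicit(&rates.arp_rate, memory_order_relaxed);
}
```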
if (max_rate == 0)
	return false;

if (now - ctx->last_refill >= rte_get_tsc_hz()) {
	ctx->tokens = max_rate;
	ctx->last_refill = now;
} else if (ctx->tokens == 0) {
	return true;
}
ctx->tokens -= RTE_MIN(ctx->tokens, nb_pkts);
Upward rate changes are not applied immediately.
Line 22 leaves the bucket state untouched when the limiter is disabled, and Line 25 is the only place that replenishes tokens. If a node has already drained its bucket and the operator later increases max_rate — especially 0 -> N — Line 28 will keep dropping whole vectors until the stale 1-second refill window expires. That contradicts the stated “take effect immediately” behavior and can blackhole traffic right after reconfiguration. Please reseed the bucket on upward changes instead of waiting for the next periodic refill.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/infra/api/gr_infra.h`:
- Around line 388-394: The enum gr_graph_conf_set_attr_t is declared as uint16_t
but only GR_GRAPH_SET_RX_BURST uses GR_BIT16 while the other flags use GR_BIT64;
change GR_BIT64 to GR_BIT16 for GR_GRAPH_SET_VECTOR, GR_GRAPH_SET_ICMP_ERROR,
GR_GRAPH_SET_ARP and GR_GRAPH_SET_ICMP so all bit macros match the uint16_t base
type and preserve unique bit positions within the 16-bit range.
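A sketch of the suggested fix, with assumed bit positions and an assumed GR_BIT16 definition (a 16-bit shifted constant mirroring the GR_BIT64 macro named above); only the enum member names come from the comment.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: a flag macro sized to the uint16_t base type. */
#define GR_BIT16(n) (UINT16_C(1) << (n))

/* All flags use GR_BIT16 so every value fits the uint16_t base type
 * while keeping unique bit positions. Positions are illustrative. */
enum gr_graph_conf_set_attr {
	GR_GRAPH_SET_RX_BURST = GR_BIT16(0),
	GR_GRAPH_SET_VECTOR = GR_BIT16(1),
	GR_GRAPH_SET_ICMP_ERROR = GR_BIT16(2),
	GR_GRAPH_SET_ARP = GR_BIT16(3),
	GR_GRAPH_SET_ICMP = GR_BIT16(4),
};
```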
📒 Files selected for processing (10)
- modules/infra/api/gr_infra.h
- modules/infra/cli/graph.c
- modules/infra/control/graph.c
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/ip/datapath/arp_input.c
- modules/ip/datapath/icmp_input.c
- modules/ip/datapath/ip_error.c
- modules/ip6/datapath/icmp6_input.c
- modules/ip6/datapath/ip6_error.c
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/infra/control/graph.h
- modules/infra/datapath/datapath.h
- modules/infra/control/graph.c
The graph config API uses zero to mean "don't change this field" for rx_burst_max and vector_max. Future fields may need zero as a valid value, so the convention does not scale. Add a set_attrs bitmask to the config set request so that the handler knows which fields the client explicitly provided. Move validation (zero and overflow checks) inside the per-field blocks so they only apply to fields being changed. Signed-off-by: Robin Jarry <rjarry@redhat.com>
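The set_attrs convention described in this commit message might look like the following sketch. Flag values, field names and the handler are all illustrative, not the actual gr_infra.h API; the point is that validation runs only for fields whose bit is set, so 0 stays a valid wire value for fields the client did not provide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for the set_attrs bitmask. */
#define SET_RX_BURST 0x1
#define SET_VECTOR   0x2

struct conf_req {
	uint16_t set_attrs; /* which fields the client explicitly set */
	uint16_t rx_burst_max;
	uint16_t vector_max;
};

struct conf {
	uint16_t rx_burst_max;
	uint16_t vector_max;
};

/* Only touch (and validate) fields whose bit is set in set_attrs. */
static int handle_set(const struct conf_req *req, struct conf *c) {
	if (req->set_attrs & SET_RX_BURST) {
		if (req->rx_burst_max == 0)
			return -1; /* zero check only for changed fields */
		c->rx_burst_max = req->rx_burst_max;
	}
	if (req->set_attrs & SET_VECTOR) {
		if (req->vector_max == 0)
			return -1;
		c->vector_max = req->vector_max;
	}
	return 0;
}
```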
Under a flood of bad packets (TTL=1, no-route, etc.), every packet triggers expensive ICMP error construction: address lookups, header prepend with potential memmove, and checksum computation. Similarly, ARP and ICMP/ICMPv6 input processing can saturate the control plane thread when flooded. None of this is rate limited today.

Add a per-node token bucket to all ICMP error nodes, ARP input, and ICMP/ICMPv6 input nodes. The bucket refills once per second and the check runs before any per-packet work. When the bucket is empty, the whole vector is moved to a dedicated drop node via rte_node_next_stream_move(), avoiding any unnecessary processing.

Three separate rates are configurable via the graph config API: icmp-error-rate, arp-rate and icmp-rate. All default to 1000 packets/sec per node per worker. Setting a rate to 0 disables the limit. Rate changes take effect immediately without graph reload.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
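The per-node fast path described in this commit message can be sketched with the graph API mocked out: drop_vector() stands in for rte_node_next_stream_move() toward the drop node, `now`/`tsc_hz` replace the DPDK TSC helpers, and all names are illustrative. The bucket check runs once per vector, before any per-packet work.

```c
#include <assert.h>
#include <stdint.h>

struct bucket {
	uint64_t tokens;
	uint64_t last_refill;
};

static int dropped_vectors; /* stand-in counter for the drop node edge */

static void drop_vector(void) { dropped_vectors++; }

static uint16_t node_process(struct bucket *b, uint64_t rate, uint16_t nb_pkts,
			     uint64_t now, uint64_t tsc_hz) {
	if (rate != 0) { /* 0 disables the limit */
		if (now - b->last_refill >= tsc_hz) {
			b->tokens = rate; /* refill once per second */
			b->last_refill = now;
		} else if (b->tokens == 0) {
			drop_vector(); /* whole vector to the drop node */
			return nb_pkts;
		}
		b->tokens -= (b->tokens < nb_pkts) ? b->tokens : nb_pkts;
	}
	/* ... per-packet ICMP/ARP work would happen here ... */
	return nb_pkts;
}
```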
Rate limiting for ICMP errors, ARP, and ICMP/ICMPv6 input
Behavior
Implementation
Control-plane and configuration
CLI
Precise semantics from code