Re: [PATCHv2 net-next] tcp: Add TCP ROCCET congestion control module.

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: t.fuechsel@gmx.de
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
	pabeni@redhat.com, horms@kernel.org, ncardwell@google.com,
	kuniyu@google.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, lukas.prause@ikt.uni-hannover.de
Subject: Re: [PATCHv2 net-next] tcp: Add TCP ROCCET congestion control module.
Date: Fri, 15 May 2026 18:12:12 -0700	[thread overview]
Message-ID: <20260516011212.1859280-1-kuba@kernel.org> (raw)
In-Reply-To: <agMMMUotXaG_Ypqz@volt-roccet-vm>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Note: The format of this report is altered to provide a free-form summary of 
the findings due to recitation restrictions. The original patch code is not 
quoted.

commit 838c20dc5dd6210ad01e69f78cc66400822cd8c7
Author: Tim Fuechsel <t.fuechsel@gmx.de>

tcp: Add TCP ROCCET congestion control module.

TCP ROCCET is an extension of TCP CUBIC that improves its overall performance 
in cellular networks by using additional metrics to detect bufferbloat. It 
introduces a modified slow start and uses srRTT and the acknowledgment arrival 
rate to determine when to exit slow start.

Link: https://doi.org/10.1109/VTC2023-Fall60731.2023.10333357
Link: https://doi.org/10.1109/WMNC.2016.7543932
Link: https://doi.org/10.23919/WONS68803.2026.11501781

Can the srRTT calculation cause a catastrophic integer underflow? If a socket 
has not yet received a valid RTT sample, curr_rtt remains 0. When 
roccettcp_cong_avoid() executes, it calls update_min_rtt(), which computes 
curr_min_rtt as max(0, 1), resulting in 1. Then update_srrtt() calculates 
curr_rtt - curr_min_rtt (0 - 1). Since both are unsigned 32-bit integers, 
does this underflow to 0xFFFFFFFF? Multiplying this by 100 yields 0xFFFFFF9C, 
which gets integrated into the curr_srrtt EWMA. Would this resulting massive 
value instantly exceed the threshold and falsely trigger a severe bufferbloat 
response that permanently cripples the congestion window?

Are the congestion control state variables curr_rtt and curr_srrtt properly 
initialized? In roccettcp_reset(), memset zeroes the state struct only up to 
the curr_rtt field, leaving curr_rtt and curr_srrtt uninitialized. If a user 
dynamically switches an active socket's congestion control to ROCCET via 
setsockopt, could the icsk_ca_priv area contain leftover memory from the 
previous algorithm? Might these garbage values be fed into the core RTT logic, 
causing unpredictable bufferbloat detection and spurious throughput drops?

Is it safe to expose internal TCP congestion control callbacks as BPF kfuncs? 
The module natively hooks into the TCP stack, but it annotates its static 
internal functions with __bpf_kfunc and exports them via 
register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS). Does this allow arbitrary 
BPF struct_ops programs to call these native state-modifying functions 
directly? Could a BPF program invoke these functions on a socket using a 
different congestion control algorithm, blindly casting sk->icsk_ca_priv to 
struct roccettcp and corrupting the socket's internal state?

Could the 16-bit ACK rate counter wrap around on high-speed links? The 
ack_rate.cnt field is defined as a u16, accumulating the number of 
acknowledged segments over a 100ms interval. On modern fast links, more than 
83,000 segments can be acknowledged in a single 100ms window. Will this 
silently overflow multiple times per interval, recording an artificially low, 
randomized rate and breaking the slow start exit conditions?

Does the interval boundary logic discard acknowledged packets? In 
update_ack_rate(), when the 100ms interval rolls over, the interval state 
rotates and ack_rate.cnt is reset to 0. This happens inside an if/else block 
where the else branch contains the counter increment. Because the else branch 
is bypassed during a rollover, are any segments acknowledged by the specific 
ACK packet that triggered the boundary cross permanently discarded, causing a 
persistent undercounting of the throughput?

Can a timestamp wraparound freeze ACK rate tracking for idle connections? 
The update_ack_rate() function evaluates elapsed time via an explicit signed 
cast of a 32-bit microsecond timestamp delta. Since this timestamp wraps 
around roughly every 71 minutes, if a socket remains idle for more than about 
35 minutes, the unsigned delta exceeds 0x7FFFFFFF. Would the signed cast 
convert this into a large negative number, rendering the interval condition 
false, freezing the interval logic, and causing the counter to accumulate 
indefinitely without rotation?

Are ECN marks being ignored and cached, leading to spurious later drops? In 
roccet_in_ack_event(), receiving an ECE flag sets ece_received to true. 
However, in roccettcp_cong_avoid(), this flag is only evaluated and cleared 
if curr_srrtt exceeds roccet_xj. If an ECN mark arrives while the RTT is low, 
is it ignored, leaving ece_received stuck at true? Could this stale flag 
falsely trigger a severe congestion window reduction hours later if the RTT 
naturally spikes above the threshold?

Is there a division by zero vulnerability during module initialization? The 
variables beta and bic_scale are configurable module parameters. In 
roccettcp_register(), bic_scale is used directly in a do_div() call, and beta 
is used in a denominator. If the module is loaded with bic_scale set to 0 or 
beta set to 1024, will this instantly cause a division-by-zero kernel panic?

Does the implementation match its stated ACK rate bufferbloat detection 
mechanism? The commit message states that the rate is sampled over 5 RTT 
intervals, and slow start exits if the rate stays on the same level. However, 
the code uses a hardcoded 100ms interval, and the exit condition only triggers 
if the new rate drops by 10 or more segments. If the rate genuinely stays on 
the same level, the difference is 0. Does this completely fail the check?
-- 
pw-bot: cr

     prev parent reply	other threads:[~2026-05-16  1:12 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 11:17 [PATCHv2 net-next] tcp: Add TCP ROCCET congestion control module Tim Fuechsel
2026-05-16  1:12 ` Jakub Kicinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260516011212.1859280-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukas.prause@ikt.uni-hannover.de \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=t.fuechsel@gmx.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox