public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tcp: fix orphan count order in __tcp_close()
@ 2026-04-17 20:25 RubenKelevra
  2026-04-17 20:54 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: RubenKelevra @ 2026-04-17 20:25 UTC (permalink / raw)
  To: netdev, edumazet, ncardwell
  Cc: kuniyu, davem, dsahern, kuba, pabeni, horms, RubenKelevra

__tcp_close() calls sock_orphan(sk) first and drains the backlog with
__release_sock(sk), which might call tcp_done() which decrements the
tcp_orphan_count. After which we will increment tcp_orphan_count again.

Since tcp_orphan_count is an unsigned int, we underflow to uint_max if we
started with a 0 - at least on all current supported platforms.

I could not locate a direct user of this value, and in
tcp_orphan_count_sum() this underflow is contained by adding the unsigned
int value into a signed int sum, causing it to behave like -1 on current
supported platforms and then get clamped by max(n, 0) to 0.

The impact therefore is currently limited to e.g. tcp_too_many_orphans()
checking an artificially low value, if the cached sum is refreshed within
this timeframe.

This fix mirrors the previous fix I found while investigating: commit
75c2d9077c63 ("[TCP]: Fix sock_orphan dead lock")

Later commit eb4dea585304 ("net: Fix percpu counters deadlock") moved the
increment down for old percpu_counter reasons. commit 19757cebf0c5 ("tcp:
switch orphan_count to bare per-cpu counters") changed orphan accounting to
plain per-cpu counters, so that old reason no longer applies the same way
now.

Fixes: 19757cebf0c5 ("tcp: switch orphan_count to bare per-cpu counters")
Signed-off-by: RubenKelevra <rubenkelevra@gmail.com>
---
 net/ipv4/tcp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 2014a6408e93..1a91cb31b02f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3234,11 +3234,10 @@ void __tcp_close(struct sock *sk, long timeout)
 
 	local_bh_disable();
 	bh_lock_sock(sk);
+	tcp_orphan_count_inc();
 	/* remove backlog if any, without releasing ownership. */
 	__release_sock(sk);
 
-	tcp_orphan_count_inc();
-
 	/* Have we already been destroyed by a softirq or backlog? */
 	if (state != TCP_CLOSE && sk->sk_state == TCP_CLOSE)
 		goto out;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] tcp: fix orphan count order in __tcp_close()
  2026-04-17 20:25 [PATCH] tcp: fix orphan count order in __tcp_close() RubenKelevra
@ 2026-04-17 20:54 ` Eric Dumazet
  2026-04-17 21:36   ` Ruben Kelevra
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2026-04-17 20:54 UTC (permalink / raw)
  To: RubenKelevra
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms

On Fri, Apr 17, 2026 at 1:25 PM RubenKelevra <rubenkelevra@gmail.com> wrote:
>
> __tcp_close() calls sock_orphan(sk) first and drains the backlog with
> __release_sock(sk), which might call tcp_done() which decrements the
> tcp_orphan_count. After which we will increment tcp_orphan_count again.
>
> Since tcp_orphan_count is an unsigned int, we underflow to uint_max if we
> started with a 0 - at least on all current supported platforms.
>
> I could not locate a direct user of this value, and in
> tcp_orphan_count_sum() this underflow is contained by adding the unsigned
> int value into a signed int sum, causing it to behave like -1 on current
> supported platforms and then get clamped by max(n, 0) to 0.
>
> The impact therefore is currently limited to e.g. tcp_too_many_orphans()
> checking an artificially low value, if the cached sum is refreshed within
> this timeframe.
>
> This fix mirrors the previous fix I found while investigating: commit
> 75c2d9077c63 ("[TCP]: Fix sock_orphan dead lock")
>
> Later commit eb4dea585304 ("net: Fix percpu counters deadlock") moved the
> increment down for old percpu_counter reasons. commit 19757cebf0c5 ("tcp:
> switch orphan_count to bare per-cpu counters") changed orphan accounting to
> plain per-cpu counters, so that old reason no longer applies the same way
> now.

I find this patch rather confusing. Have you used an LLM to generate it?

You are mentioning old patches that are not relevant (bh disable/enable).

Given we advise only increasing tcp_max_orphans value (default being
262144 on modern hosts),
your patch has really no effect.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] tcp: fix orphan count order in __tcp_close()
  2026-04-17 20:54 ` Eric Dumazet
@ 2026-04-17 21:36   ` Ruben Kelevra
  2026-04-17 21:43     ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Ruben Kelevra @ 2026-04-17 21:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms

On Fri, Apr 17, 2026 at 10:54 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 1:25 PM RubenKelevra <rubenkelevra@gmail.com> wrote:
> >
> > __tcp_close() calls sock_orphan(sk) first and drains the backlog with
> > __release_sock(sk), which might call tcp_done() which decrements the
> > tcp_orphan_count. After which we will increment tcp_orphan_count again.
> >
> > Since tcp_orphan_count is an unsigned int, we underflow to uint_max if we
> > started with a 0 - at least on all current supported platforms.
> >
> > I could not locate a direct user of this value, and in
> > tcp_orphan_count_sum() this underflow is contained by adding the unsigned
> > int value into a signed int sum, causing it to behave like -1 on current
> > supported platforms and then get clamped by max(n, 0) to 0.
> >
> > The impact therefore is currently limited to e.g. tcp_too_many_orphans()
> > checking an artificially low value, if the cached sum is refreshed within
> > this timeframe.
> >
> > This fix mirrors the previous fix I found while investigating: commit
> > 75c2d9077c63 ("[TCP]: Fix sock_orphan dead lock")
> >
> > Later commit eb4dea585304 ("net: Fix percpu counters deadlock") moved the
> > increment down for old percpu_counter reasons. commit 19757cebf0c5 ("tcp:
> > switch orphan_count to bare per-cpu counters") changed orphan accounting to
> > plain per-cpu counters, so that old reason no longer applies the same way
> > now.
>
> I find this patch rather confusing. Have you used an LLM to generate it?
>
> You are mentioning old patches that are not relevant (bh disable/enable).
>
> Given we advise only increasing tcp_max_orphans value (default being
> 262144 on modern hosts),
> your patch has really no effect.

Hey Eric,

no. But I'm not a native speaker. I'm sorry if my explanation did not
meet your or the project's communication quality.

The point I was trying to make is that currently we underflow and
overflow an unsigned and signed int in case we start with 0 in
tcp_orphan_count. And while it currently has no real effect on real
world applications, as we use a 100ms cached value anyway, I just
wanted to clean up this "old" approach which was apparently chosen for
historical reasons to circumvent bugs, not because it was the intent
implementation.

That's what my list of old commits about the history was meant to reference.

Best regards,

Ruben

On Fri, Apr 17, 2026 at 10:54 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 17, 2026 at 1:25 PM RubenKelevra <rubenkelevra@gmail.com> wrote:
> >
> > __tcp_close() calls sock_orphan(sk) first and drains the backlog with
> > __release_sock(sk), which might call tcp_done() which decrements the
> > tcp_orphan_count. After which we will increment tcp_orphan_count again.
> >
> > Since tcp_orphan_count is an unsigned int, we underflow to uint_max if we
> > started with a 0 - at least on all current supported platforms.
> >
> > I could not locate a direct user of this value, and in
> > tcp_orphan_count_sum() this underflow is contained by adding the unsigned
> > int value into a signed int sum, causing it to behave like -1 on current
> > supported platforms and then get clamped by max(n, 0) to 0.
> >
> > The impact therefore is currently limited to e.g. tcp_too_many_orphans()
> > checking an artificially low value, if the cached sum is refreshed within
> > this timeframe.
> >
> > This fix mirrors the previous fix I found while investigating: commit
> > 75c2d9077c63 ("[TCP]: Fix sock_orphan dead lock")
> >
> > Later commit eb4dea585304 ("net: Fix percpu counters deadlock") moved the
> > increment down for old percpu_counter reasons. commit 19757cebf0c5 ("tcp:
> > switch orphan_count to bare per-cpu counters") changed orphan accounting to
> > plain per-cpu counters, so that old reason no longer applies the same way
> > now.
>
> I find this patch rather confusing. Have you used an LLM to generate it?
>
> You are mentioning old patches that are not relevant (bh disable/enable).
>
> Given we advise only increasing tcp_max_orphans value (default being
> 262144 on modern hosts),
> your patch has really no effect.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] tcp: fix orphan count order in __tcp_close()
  2026-04-17 21:36   ` Ruben Kelevra
@ 2026-04-17 21:43     ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2026-04-17 21:43 UTC (permalink / raw)
  To: Ruben Kelevra
  Cc: netdev, ncardwell, kuniyu, davem, dsahern, kuba, pabeni, horms

On Fri, Apr 17, 2026 at 2:36 PM Ruben Kelevra <rubenkelevra@gmail.com> wrote:
>
> Hey Eric,
>
> no. But I'm not a native speaker. I'm sorry if my explanation did not
> meet your or the project's communication quality.
>
> The point I was trying to make is that currently we underflow and
> overflow an unsigned and signed int in case we start with 0 in
> tcp_orphan_count. And while it currently has no real effect on real
> world applications, as we use a 100ms cached value anyway, I just
> wanted to clean up this "old" approach which was apparently chosen for
> historical reasons to circumvent bugs, not because it was the intent
> implementation.
>
> That's what my list of old commits about the history was meant to reference.

We are using per-cpu counters, folded every 100ms, so the precise
value is irrelevant,
as long as it is balanced (does not increase to infinity)

Thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-17 21:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-17 20:25 [PATCH] tcp: fix orphan count order in __tcp_close() RubenKelevra
2026-04-17 20:54 ` Eric Dumazet
2026-04-17 21:36   ` Ruben Kelevra
2026-04-17 21:43     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox