netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org, Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	David Ahern <dsahern@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>,
	Matthieu Baerts <matttbe@kernel.org>
Subject: Re: [PATCH net-next 1/2] tcp: do not set a zero size receive buffer
Date: Mon, 21 Jul 2025 15:32:16 +0200	[thread overview]
Message-ID: <ebc7890c-e239-4a64-99af-df5053245b28@redhat.com> (raw)
In-Reply-To: <CANn89i+baSpvbJM6gcbSjZMmWVyvwsFotvH1czui9ARVRjS5Bw@mail.gmail.com>

On 7/21/25 2:30 PM, Eric Dumazet wrote:
> On Mon, Jul 21, 2025 at 3:50 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 7/21/25 10:04 AM, Eric Dumazet wrote:
>>> On Fri, Jul 18, 2025 at 10:25 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> The nipa CI is reporting frequent failures in the mptcp_connect
>>>> self-tests.
>>>>
>>>> In the failing scenarios (TCP -> MPTCP) the involved sockets are
>>>> actually plain TCP ones, as fallback for passive socket at 2whs
>>>> time cause the MPTCP listener to actually create a TCP socket.
>>>>
>>>> The transfer is stuck due to the receiver buffer being zero.
>>>> With the stronger check in place, tcp_clamp_window() can be invoked
>>>> while the TCP socket has sk_rmem_alloc == 0, and the receive buffer
>>>> will be zeroed, too.
>>>>
>>>> Pass to tcp_clamp_window() even the current skb truesize, so that
>>>> such helper could compute and use the actual limit enforced by
>>>> the stack.
>>>>
>>>> Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>>  net/ipv4/tcp_input.c | 12 ++++++------
>>>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>>> index 672cbfbdcec1..c98de02a3c57 100644
>>>> --- a/net/ipv4/tcp_input.c
>>>> +++ b/net/ipv4/tcp_input.c
>>>> @@ -610,24 +610,24 @@ static void tcp_init_buffer_space(struct sock *sk)
>>>>  }
>>>>
>>>>  /* 4. Recalculate window clamp after socket hit its memory bounds. */
>>>> -static void tcp_clamp_window(struct sock *sk)
>>>> +static void tcp_clamp_window(struct sock *sk, int truesize)
>>>
>>>
>>> I am unsure about this one. truesize can be 1MB here, do we want that
>>> in general ?
>>
>> I'm unsure either. But I can't think of a different approach?!? If the
>> incoming truesize is 1M the socket should allow for at least 1M rcvbuf
>> size to accept it, right?
> 
> What I meant was :
> 
> This is the generic point, accepting skb->truesize as additional input
> here would make us more vulnerable, or we could risk other
> regressions.

Understood, thanks for the clarification.

> The question is : why does MPTCP end up here in the first place.
> Perhaps an older issue with an incorrectly sized sk_rcvbuf ?

I collected a few more data. The issue happens even with plain TCP
sockets[1].

The relevant transfer is on top of the loopback device. The scaling_rate
rapidly grows to 254 - that is `truesize` and `len` are very near.

The stall happens when the received get in a packet with a slightly less
'efficient' layout (in the experiment I have handy len is 71424,
truesize 72320) (almost) filling the receiver window.

On such input, tcp_clamp_window() shrinks the receiver buffer to the
current rmem usage. The same happens on retransmissions until rcvbuf
becomes 0.

I *think* that catching only the !sk_rmem_alloc case would avoid the
stall, but I think it's a bit 'late'. I'm unsure if we could
preventing/forbidding 'too high' values of scaling_rate? (also I'm
unsure where to draw the line exactly.

Cheers,

Paolo


[1] You can run the relevant test by adding '-t' on the mptcp_connect.sh
command line, but it will take a lot of time to run the 10-20 iterations
I need to observe the issue. To make it faster I manually trimmed the
not relevant test-cases.


  parent reply	other threads:[~2025-07-21 13:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-18 17:25 [PATCH net-next 0/2] tcp: a couple of fixes Paolo Abeni
2025-07-18 17:25 ` [PATCH net-next 1/2] tcp: do not set a zero size receive buffer Paolo Abeni
2025-07-18 22:15   ` Matthieu Baerts
2025-07-21  8:04   ` Eric Dumazet
2025-07-21 10:50     ` Paolo Abeni
2025-07-21 12:30       ` Eric Dumazet
2025-07-21 12:41         ` Eric Dumazet
2025-07-21 13:32         ` Paolo Abeni [this message]
2025-07-21 13:52           ` Eric Dumazet
2025-07-21 14:56             ` Paolo Abeni
2025-07-21 15:21               ` Eric Dumazet
2025-07-21 16:17                 ` Paolo Abeni
2025-07-21 15:27               ` Jakub Kicinski
2025-07-21 15:30                 ` Jakub Kicinski
2025-07-21 16:50                 ` Eric Dumazet
2025-07-18 17:25 ` [PATCH net-next 2/2] tcp: do not increment BeyondWindow MIB for old seq Paolo Abeni
2025-07-18 22:16   ` Matthieu Baerts
2025-07-21  8:28   ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ebc7890c-e239-4a64-99af-df5053245b28@redhat.com \
    --to=pabeni@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=matttbe@kernel.org \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).