From: Stefano Brivio <sbrivio@redhat.com>
To: Menglong Dong <menglong8.dong@gmail.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>,
Jon Maloy <jmaloy@redhat.com>, Eric Dumazet <edumazet@google.com>,
Neal Cardwell <ncardwell@google.com>,
netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
passt-dev@passt.top, lvivier@redhat.com, dgibson@redhat.com,
eric.dumazet@gmail.com
Subject: Re: [net,v2] tcp: correct handling of extreme memory squeeze
Date: Mon, 27 Jan 2025 15:03:03 +0100
Message-ID: <20250127150303.46c9d9f5@elisabeth>
In-Reply-To: <CADxym3Zji3NZy2tBAxSm5GaQ8tVG8PmxcyJ_AGnUC-H386tq7g@mail.gmail.com>
On Mon, 27 Jan 2025 21:37:23 +0800
Menglong Dong <menglong8.dong@gmail.com> wrote:
> On Mon, Jan 27, 2025 at 6:32 PM Stefano Brivio <sbrivio@redhat.com> wrote:
> >
> > On Mon, 27 Jan 2025 18:17:28 +0800
> > Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > > I'm not that sure if it's a bug belonging to the Linux kernel.
> >
> > It is, because for at least 20-25 years (before that it's a bit hard to
> > understand from history) a non-zero window would be announced, as
> > obviously expected, once there's again space in the receive window.
>
> Sorry for the late reply. I think the key of this problem is
> what should we do when we receive a tcp packet and we are
> out of memory.
>
> The RFC doesn't define such a thing,
Why not? RFC 9293, 3.8.6:

  There is an assumption that this is related to the data buffer space
  currently available for this connection.

That is, out-of-memory -> zero window.
> so in the commit
> e2142825c120 ("net: tcp: send zero-window ACK when no memory"),
> I reply with a zero-window ACK to the peer.
Your patch is fundamentally correct, nobody is disputing that. The
problem is that it introduces a side effect because it gets the notion
of "current window" out of sync by sending a one-off packet with a
zero-window, without recording that.
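To illustrate the desynchronisation, here is a minimal toy model. It is
not the kernel code: struct toy_rx, its fields and all the toy_*()
functions are hypothetical names, and the "window update due" rule is
deliberately simplified to "announce only if the window grew compared
to what we believe we last advertised".

```c
#include <stdbool.h>

/* Toy receiver state (hypothetical, not the kernel's struct tcp_sock) */
struct toy_rx {
	unsigned int advertised;	/* window we believe the peer last saw */
	unsigned int free_space;	/* buffer space currently available */
};

/* Send a one-off zero-window ACK; 'record' tells whether we remember it */
void toy_send_zero_window_ack(struct toy_rx *rx, bool record)
{
	if (record)
		rx->advertised = 0;
	/* else: the peer now sees 0, but rx->advertised keeps the old value */
}

/* Simplified rule: a window update is due only if the window grew */
bool toy_window_update_due(const struct toy_rx *rx)
{
	return rx->free_space > rx->advertised;
}

/* Memory squeeze, zero-window ACK, then memory becomes available again.
 * Returns true if no window update would be sent, that is, the peer
 * keeps seeing a closed window.
 */
bool toy_peer_stalls(bool record)
{
	struct toy_rx rx = { .advertised = 65535, .free_space = 0 };

	toy_send_zero_window_ack(&rx, record);
	rx.free_space = 65535;		/* space is available again */

	return !toy_window_update_due(&rx);
}
```

In this model, toy_peer_stalls(false) returns true: the receiver still
believes it advertised 65535, sees no growth, and stays silent. With
the zero window recorded, toy_peer_stalls(true) returns false and an
update goes out.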
> And the peer will keep
> probing the window by retransmitting the packet that we dropped if
> the peer is a LINUX SYSTEM.
>
> As I said, the RFC doesn't define such a case, so the behavior of
> the peer is undefined if it is not a LINUX SYSTEM. If the peer doesn't
> keep retransmitting the packet, it will hang the connection, just like
> the problem that described in this commit log.
It's not undefined. RFC 9293 3.8.6.1 (just like RFC 1122 4.2.2.17,
RFC 793 3.7) requires zero-window probes.

But keeping the window closed indefinitely if there's no zero-window
probe is a regression anyway:

- a retransmission timeout must elapse (RFC 9293 3.8.1) before the
  zero-window probe is sent, so relying on zero-window probes means
  introducing an unnecessary delay

- if the peer (as was the case here) fails to send a zero-window
  probe for whatever reason, things break. This is a userspace
  breakage, regardless of the fact that the peer should send a
  zero-window probe
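As a rough sketch of the delay in the first point: assume the sender
issues its first probe one retransmission timeout after the window
closed, and doubles the interval between successive probes (the
exponential backoff suggested by RFC 9293 3.8.6.1). The function name
and the absence of any upper bound on the interval are assumptions made
for illustration; real stacks clamp the interval.

```c
/* Hypothetical helper: time, in seconds since the window closed, of the
 * first zero-window probe sent at or after 't_free', the moment receive
 * space becomes available again. Probes are assumed at rto, 3 * rto,
 * 7 * rto, ... (interval doubling, no cap).
 */
double toy_next_probe_after(double rto, double t_free)
{
	double t = rto;		/* first probe after one RTO */
	double interval = rto;

	while (t < t_free) {
		interval *= 2;	/* exponential backoff */
		t += interval;
	}

	return t;
}
```

With an RTO of 1 s, space freed after 0.1 s is only discovered by the
probe at 1 s, and space freed after 3.5 s by the probe at 7 s, whereas
an unsolicited window update would reach the peer right away.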
> However, we can make some optimization to make it more
> adaptable. We can send a ACK with the right window to the
> peer when the memory is available, and __tcp_cleanup_rbuf()
> is a good choice.
>
> Generally speaking, I think this patch makes sense. However,
> I'm not sure if there is any other influence if we make
> "tp->rcv_wnd=0", but it can trigger a ACK in __tcp_cleanup_rbuf().
I don't understand what your concern is with the patch that was proposed
(and tested quite thoroughly, by the way).
> Following is the code that I thought before to optimize this
> case (the code is totally not tested):
>
> [...]
--
Stefano