From: Leon Hwang <leon.hwang@linux.dev>
To: Eric Dumazet <edumazet@google.com>, Leon Hwang <leon.huangfu@shopee.com>
Cc: netdev@vger.kernel.org, "David S . Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Shuah Khan" <skhan@linuxfoundation.org>,
	"David Ahern" <dsahern@kernel.org>,
	"Neal Cardwell" <ncardwell@google.com>,
	"Kuniyuki Iwashima" <kuniyu@google.com>,
	"Ilpo Järvinen" <ij@kernel.org>,
	"Ido Schimmel" <idosch@nvidia.com>,
	kerneljasonxing@gmail.com, lance.yang@linux.dev,
	jiayuan.chen@linux.dev, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl
Date: Wed, 25 Feb 2026 17:48:09 +0800
Message-ID: <f6eae6e1-b0ad-4027-ac53-26abbfabe2c6@linux.dev>
In-Reply-To: <CANn89i+RZtN0wcyBUxKf83pkcbH4=nN_Cpc62tNwwS8T-LQR2A@mail.gmail.com>



On 25/2/26 16:31, Eric Dumazet wrote:
> On Wed, Feb 25, 2026 at 8:46 AM Leon Hwang <leon.huangfu@shopee.com> wrote:
>>
>> Introduce a new sysctl knob, net.ipv4.tcp_purge_receive_queue, to
>> address a memory leak scenario related to TCP sockets.
> 
> We use the term "memory leak" for a persistent loss of memory (until reboot)
> 

Thanks for the clarification.

> Let's not abuse it and confuse various AI/human agents, which will
> declare emergency situations caused by a nonexistent fatal error.
> 

I'll reword it in the next revision.

>>
>> Issue:
>> When a TCP socket in the CLOSE_WAIT state receives a RST packet, the
>> current implementation does not clear the socket's receive queue. This
>> causes SKBs in the queue to remain allocated until the socket is
>> explicitly closed by the application. As a consequence:
>>
>> 1. The page pool pages held by these SKBs are not released.
> 
> This situation also applies to normal TCP_ESTABLISHED sockets when
> applications do not drain the receive queue.
> 
> As long as the application has not called close(), the kernel should
> not assume the application will _not_ read the data that was received.
> 

Understood.

This patch makes purging the receive queue in the CLOSE_WAIT + RST case
optional, controlled by the sysctl, rather than purging it
unconditionally whenever a RST packet is received.

> 
>> 2. The associated page pool cannot be freed.
>>
>> RFC 9293 Section 3.10.7.4 specifies that when a RST is received in
>> CLOSE_WAIT state, "all segment queues should be flushed." However, the
>> current implementation does not flush the receive queue.
> 
> Some buggy stacks send RST anyway after FIN. I think that forcibly
> purging good data received before the RST would add many surprises.
> 

Understood.

There is a tcp_write_queue_purge(sk) call in tcp_done_with_error(),
which means sk_write_queue is always purged when a RST packet is
received. I assume the reason for purging sk_write_queue is that any
pending transmissions become meaningless once a RST is received.

Would it be better to defer skb_queue_purge(&sk->sk_receive_queue) until
after tcp_done_with_error()?
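
To make the ordering in that question concrete, here is a rough
userspace model of it. This is only a sketch and not kernel code: the
struct, the toy_* helpers, and the purge_rx_on_rst flag are all
hypothetical stand-ins for the socket, the RST error path, and the
proposed sysctl. It shows the write queue always being emptied on RST,
with the receive-queue purge running afterwards and only when the knob
is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; NOT kernel code. All names here are hypothetical. */
struct toy_sock {
	size_t write_queue_len;   /* stand-in for sk_write_queue */
	size_t receive_queue_len; /* stand-in for sk_receive_queue */
	bool purge_rx_on_rst;     /* stand-in for the proposed sysctl */
	bool done;                /* set once the RST has been processed */
};

/* Models the error path: the write queue is always emptied on RST. */
static void toy_done_with_error(struct toy_sock *sk)
{
	assert(sk != NULL);
	sk->write_queue_len = 0;
	sk->done = true;
}

/*
 * Models RST handling with the receive-queue purge deferred until
 * after the error path has run, and only when the knob is set.
 */
static void toy_on_rst(struct toy_sock *sk)
{
	toy_done_with_error(sk);
	if (sk->purge_rx_on_rst)
		sk->receive_queue_len = 0;
}
```

With the flag clear, data queued before the RST stays available to the
application, which matches the behaviour described earlier in this
thread. With it set, the queue is dropped only after the error path has
finished.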

[...]

>>
> 
> Please prepare a packetdrill test.

Ack.

I'll add a packetdrill test in the next revision.

Thanks,
Leon


