netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] tcp: __tcp_close() changes
@ 2025-09-03  8:47 Eric Dumazet
  2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-09-03  8:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

First patch fixes a rare bug.

Second patch adds a corresponding packetdrill test.

Third patch is a small optimization.

Eric Dumazet (3):
  tcp: fix __tcp_close() to only send RST when required
  selftests/net: packetdrill: add tcp_close_no_rst.pkt
  tcp: use tcp_eat_recv_skb in __tcp_close()

 net/ipv4/tcp.c                                | 13 ++++----
 .../net/packetdrill/tcp_close_no_rst.pkt      | 32 +++++++++++++++++++
 2 files changed, 39 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_close_no_rst.pkt

-- 
2.51.0.338.gd7d06c2dae-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-03  8:47 [PATCH net-next 0/3] tcp: __tcp_close() changes Eric Dumazet
@ 2025-09-03  8:47 ` Eric Dumazet
  2025-09-03 15:07   ` Neal Cardwell
  2025-09-04  5:03   ` Jason Xing
  2025-09-03  8:47 ` [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt Eric Dumazet
  2025-09-03  8:47 ` [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close() Eric Dumazet
  2 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-09-03  8:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

If the receive queue contains payload that was already
received, __tcp_close() can send an unexpected RST.

Refine the code to take tp->copied_seq into account,
as we already do in tcp recvmsg().

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 40b774b4f587..39eb03f6d07f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
 
 void __tcp_close(struct sock *sk, long timeout)
 {
+	bool data_was_unread = false;
 	struct sk_buff *skb;
-	int data_was_unread = 0;
 	int state;
 
 	WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
@@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
 	 *  reader process may not have drained the data yet!
 	 */
 	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
-		u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
+		u32 end_seq = TCP_SKB_CB(skb)->end_seq;
 
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
-			len--;
-		data_was_unread += len;
+			end_seq--;
+		if (after(end_seq, tcp_sk(sk)->copied_seq))
+			data_was_unread = true;
 		__kfree_skb(skb);
 	}
 
-- 
2.51.0.338.gd7d06c2dae-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt
  2025-09-03  8:47 [PATCH net-next 0/3] tcp: __tcp_close() changes Eric Dumazet
  2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
@ 2025-09-03  8:47 ` Eric Dumazet
  2025-09-03 15:08   ` Neal Cardwell
  2025-09-04  6:20   ` Jason Xing
  2025-09-03  8:47 ` [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close() Eric Dumazet
  2 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-09-03  8:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

This test makes sure we do send a FIN on close()
if the receive queue contains data that was consumed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 .../net/packetdrill/tcp_close_no_rst.pkt      | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)
 create mode 100644 tools/testing/selftests/net/packetdrill/tcp_close_no_rst.pkt

diff --git a/tools/testing/selftests/net/packetdrill/tcp_close_no_rst.pkt b/tools/testing/selftests/net/packetdrill/tcp_close_no_rst.pkt
new file mode 100644
index 000000000000..eef01d5f1118
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_close_no_rst.pkt
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh`
+
+// Initialize connection
+    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop>
+   +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+  +.1 < . 1:1(0) ack 1 win 32792
+
+
+   +0 accept(3, ..., ...) = 4
+   +0 < . 1:1001(1000) ack 1 win 32792
+   +0 > . 1:1(0) ack 1001
+   +0 read(4, ..., 1000) = 1000
+
+// resend the payload + a FIN
+   +0 < F. 1:1001(1000) ack 1 win 32792
+// Why do we have a delay and no dsack ?
+   +0~+.04 > . 1:1(0) ack 1002
+
+   +0 close(4) = 0
+
+// According to RFC 2525, section 2.17
+// we should _not_ send an RST here, because there was no data to consume.
+   +0 > F. 1:1(0) ack 1002
-- 
2.51.0.338.gd7d06c2dae-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close()
  2025-09-03  8:47 [PATCH net-next 0/3] tcp: __tcp_close() changes Eric Dumazet
  2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
  2025-09-03  8:47 ` [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt Eric Dumazet
@ 2025-09-03  8:47 ` Eric Dumazet
  2025-09-03 15:09   ` Neal Cardwell
  2025-09-04  6:34   ` Jason Xing
  2 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2025-09-03  8:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell
  Cc: Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet,
	Eric Dumazet

Small change to use tcp_eat_recv_skb() instead
of __kfree_skb(). This can help if an application
under attack has to close many sockets with unread data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 39eb03f6d07f..588932c3cf1d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3118,14 +3118,14 @@ void __tcp_close(struct sock *sk, long timeout)
 	 *  descriptor close, not protocol-sourced closes, because the
 	 *  reader process may not have drained the data yet!
 	 */
-	while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
+	while ((skb = skb_peek(&sk->sk_receive_queue)) != NULL) {
 		u32 end_seq = TCP_SKB_CB(skb)->end_seq;
 
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			end_seq--;
 		if (after(end_seq, tcp_sk(sk)->copied_seq))
 			data_was_unread = true;
-		__kfree_skb(skb);
+		tcp_eat_recv_skb(sk, skb);
 	}
 
 	/* If socket has been already reset (e.g. in tcp_reset()) - kill it. */
-- 
2.51.0.338.gd7d06c2dae-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
@ 2025-09-03 15:07   ` Neal Cardwell
  2025-09-03 18:28     ` Kuniyuki Iwashima
  2025-09-04  5:03   ` Jason Xing
  1 sibling, 1 reply; 17+ messages in thread
From: Neal Cardwell @ 2025-09-03 15:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
>
> If the receive queue contains payload that was already
> received, __tcp_close() can send an unexpected RST.
>
> Refine the code to take tp->copied_seq into account,
> as we already do in tcp recvmsg().
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!
neal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt
  2025-09-03  8:47 ` [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt Eric Dumazet
@ 2025-09-03 15:08   ` Neal Cardwell
  2025-09-03 18:28     ` Kuniyuki Iwashima
  2025-09-04  6:20   ` Jason Xing
  1 sibling, 1 reply; 17+ messages in thread
From: Neal Cardwell @ 2025-09-03 15:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This test makes sure we do send a FIN on close()
> if the receive queue contains data that was consumed.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close()
  2025-09-03  8:47 ` [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close() Eric Dumazet
@ 2025-09-03 15:09   ` Neal Cardwell
  2025-09-03 18:30     ` Kuniyuki Iwashima
  2025-09-04  6:34   ` Jason Xing
  1 sibling, 1 reply; 17+ messages in thread
From: Neal Cardwell @ 2025-09-03 15:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
>
> Small change to use tcp_eat_recv_skb() instead
> of __kfree_skb(). This can help if an application
> under attack has to close many sockets with unread data.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Reviewed-by: Neal Cardwell <ncardwell@google.com>

Thanks, Eric!

neal

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-03 15:07   ` Neal Cardwell
@ 2025-09-03 18:28     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-09-03 18:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 8:07 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > If the receive queue contains payload that was already
> > received, __tcp_close() can send an unexpected RST.
> >
> > Refine the code to take tp->copied_seq into account,
> > as we already do in tcp recvmsg().
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
>
> Reviewed-by: Neal Cardwell <ncardwell@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt
  2025-09-03 15:08   ` Neal Cardwell
@ 2025-09-03 18:28     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-09-03 18:28 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 8:09 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > This test makes sure we do send a FIN on close()
> > if the receive queue contains data that was consumed.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
>
> Reviewed-by: Neal Cardwell <ncardwell@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close()
  2025-09-03 15:09   ` Neal Cardwell
@ 2025-09-03 18:30     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-09-03 18:30 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 8:09 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:47 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > Small change to use tcp_eat_recv_skb() instead
> > of __kfree_skb(). This can help if an application
> > under attack has to close many sockets with unread data.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
>
> Reviewed-by: Neal Cardwell <ncardwell@google.com>

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
  2025-09-03 15:07   ` Neal Cardwell
@ 2025-09-04  5:03   ` Jason Xing
  2025-09-04  5:31     ` Kuniyuki Iwashima
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Xing @ 2025-09-04  5:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
>
> If the receive queue contains payload that was already
> received, __tcp_close() can send an unexpected RST.
>
> Refine the code to take tp->copied_seq into account,
> as we already do in tcp recvmsg().
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")

Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.

IIUC, it's not possible that one skb stays in the receive queue and
all of the data has been consumed in tcp_recvmsg() unless it's
MSG_PEEK mode. So my understanding is that the patch tries to cover
the case where partial data of skb is read by applications and the
whole skb has not been unlinked from the receive queue yet. Sure, as
we can learn from tcp_sendsmg(), skb can be partially read.

As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
len, and the skb still exists in the receive queue, it can directly
means some part of skb hasn't been read yet. We can call it the unread
data case then, so the logic before this patch is right.

Two conditions (1. skb still stays in the queue, 2. skb has data) make
sure that the data unread case can be detected and then sends an RST.
No need to replace it with copied_seq, I wonder? At least, it's not a
bug.

Thanks,
Jason




> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/tcp.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 40b774b4f587..39eb03f6d07f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
>
>  void __tcp_close(struct sock *sk, long timeout)
>  {
> +       bool data_was_unread = false;
>         struct sk_buff *skb;
> -       int data_was_unread = 0;
>         int state;
>
>         WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
>          *  reader process may not have drained the data yet!
>          */
>         while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> -               u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> +               u32 end_seq = TCP_SKB_CB(skb)->end_seq;
>
>                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> -                       len--;
> -               data_was_unread += len;
> +                       end_seq--;
> +               if (after(end_seq, tcp_sk(sk)->copied_seq))
> +                       data_was_unread = true;
>                 __kfree_skb(skb);
>         }
>
> --
> 2.51.0.338.gd7d06c2dae-goog
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-04  5:03   ` Jason Xing
@ 2025-09-04  5:31     ` Kuniyuki Iwashima
  2025-09-04  6:19       ` Jason Xing
  0 siblings, 1 reply; 17+ messages in thread
From: Kuniyuki Iwashima @ 2025-09-04  5:31 UTC (permalink / raw)
  To: Jason Xing
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Neal Cardwell, Simon Horman, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > If the receive queue contains payload that was already
> > received, __tcp_close() can send an unexpected RST.
> >
> > Refine the code to take tp->copied_seq into account,
> > as we already do in tcp recvmsg().
> >
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>
> Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
>
> IIUC, it's not possible that one skb stays in the receive queue and
> all of the data has been consumed in tcp_recvmsg() unless it's
> MSG_PEEK mode. So my understanding is that the patch tries to cover
> the case where partial data of skb is read by applications and the
> whole skb has not been unlinked from the receive queue yet. Sure, as
> we can learn from tcp_sendsmg(), skb can be partially read.

You can find a clear example in patch 2 that this patch fixes.

Without patch 1, the test fails:

# ./ksft_runner.sh tcp_close_no_rst.pkt
...
tcp_close_no_rst.pkt:32: error handling packet: live packet field
tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
script packet:  0.140854 F. 1:1(0) ack 1002
actual packet:  0.140844 R. 1:1(0) ack 1002 win 65535
not ok 1 ipv4


>
> As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> len, and the skb still exists in the receive queue, it can directly
> means some part of skb hasn't been read yet. We can call it the unread
> data case then, so the logic before this patch is right.
>
> Two conditions (1. skb still stays in the queue, 2. skb has data) make
> sure that the data unread case can be detected and then sends an RST.
> No need to replace it with copied_seq, I wonder? At least, it's not a
> bug.
>
> Thanks,
> Jason
>
>
>
>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/tcp.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 40b774b4f587..39eb03f6d07f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> >
> >  void __tcp_close(struct sock *sk, long timeout)
> >  {
> > +       bool data_was_unread = false;
> >         struct sk_buff *skb;
> > -       int data_was_unread = 0;
> >         int state;
> >
> >         WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> >          *  reader process may not have drained the data yet!
> >          */
> >         while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > -               u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > +               u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> >
> >                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > -                       len--;
> > -               data_was_unread += len;
> > +                       end_seq--;
> > +               if (after(end_seq, tcp_sk(sk)->copied_seq))
> > +                       data_was_unread = true;
> >                 __kfree_skb(skb);
> >         }
> >
> > --
> > 2.51.0.338.gd7d06c2dae-goog
> >
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-04  5:31     ` Kuniyuki Iwashima
@ 2025-09-04  6:19       ` Jason Xing
  2025-09-04  6:44         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Xing @ 2025-09-04  6:19 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Neal Cardwell, Simon Horman, netdev, eric.dumazet

On Thu, Sep 4, 2025 at 1:32 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > If the receive queue contains payload that was already
> > > received, __tcp_close() can send an unexpected RST.
> > >
> > > Refine the code to take tp->copied_seq into account,
> > > as we already do in tcp recvmsg().
> > >
> > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> >
> > Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
> >
> > IIUC, it's not possible that one skb stays in the receive queue and
> > all of the data has been consumed in tcp_recvmsg() unless it's
> > MSG_PEEK mode. So my understanding is that the patch tries to cover
> > the case where partial data of skb is read by applications and the
> > whole skb has not been unlinked from the receive queue yet. Sure, as
> > we can learn from tcp_sendsmg(), skb can be partially read.
>
> You can find a clear example in patch 2 that this patch fixes.

Oh, great, a very interesting corner case: resending data with FIN....
I just wasn't able to read the second patch in time...

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks,
Jason

>
> Without patch 1, the test fails:
>
> # ./ksft_runner.sh tcp_close_no_rst.pkt
> ...
> tcp_close_no_rst.pkt:32: error handling packet: live packet field
> tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
> script packet:  0.140854 F. 1:1(0) ack 1002
> actual packet:  0.140844 R. 1:1(0) ack 1002 win 65535
> not ok 1 ipv4
>
>
> >
> > As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> > len, and the skb still exists in the receive queue, it can directly
> > means some part of skb hasn't been read yet. We can call it the unread
> > data case then, so the logic before this patch is right.
> >
> > Two conditions (1. skb still stays in the queue, 2. skb has data) make
> > sure that the data unread case can be detected and then sends an RST.
> > No need to replace it with copied_seq, I wonder? At least, it's not a
> > bug.
> >
> > Thanks,
> > Jason
> >
> >
> >
> >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/tcp.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 40b774b4f587..39eb03f6d07f 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> > >
> > >  void __tcp_close(struct sock *sk, long timeout)
> > >  {
> > > +       bool data_was_unread = false;
> > >         struct sk_buff *skb;
> > > -       int data_was_unread = 0;
> > >         int state;
> > >
> > >         WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> > >          *  reader process may not have drained the data yet!
> > >          */
> > >         while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > > -               u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > > +               u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> > >
> > >                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > > -                       len--;
> > > -               data_was_unread += len;
> > > +                       end_seq--;
> > > +               if (after(end_seq, tcp_sk(sk)->copied_seq))
> > > +                       data_was_unread = true;
> > >                 __kfree_skb(skb);
> > >         }
> > >
> > > --
> > > 2.51.0.338.gd7d06c2dae-goog
> > >
> > >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt
  2025-09-03  8:47 ` [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt Eric Dumazet
  2025-09-03 15:08   ` Neal Cardwell
@ 2025-09-04  6:20   ` Jason Xing
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-09-04  6:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
>
> This test makes sure we do send a FIN on close()
> if the receive queue contains data that was consumed.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks for spotting this interesting case.

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks,
Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close()
  2025-09-03  8:47 ` [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close() Eric Dumazet
  2025-09-03 15:09   ` Neal Cardwell
@ 2025-09-04  6:34   ` Jason Xing
  1 sibling, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-09-04  6:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Neal Cardwell,
	Simon Horman, Kuniyuki Iwashima, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 4:48 PM Eric Dumazet <edumazet@google.com> wrote:
>
> Small change to use tcp_eat_recv_skb() instead
> of __kfree_skb(). This can help if an application
> under attack has to close many sockets with unread data.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-04  6:19       ` Jason Xing
@ 2025-09-04  6:44         ` Eric Dumazet
  2025-09-04  6:49           ` Jason Xing
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2025-09-04  6:44 UTC (permalink / raw)
  To: Jason Xing
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Neal Cardwell, Simon Horman, netdev, eric.dumazet

On Wed, Sep 3, 2025 at 11:19 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Thu, Sep 4, 2025 at 1:32 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > If the receive queue contains payload that was already
> > > > received, __tcp_close() can send an unexpected RST.
> > > >
> > > > Refine the code to take tp->copied_seq into account,
> > > > as we already do in tcp recvmsg().
> > > >
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > >
> > > Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
> > >
> > > IIUC, it's not possible that one skb stays in the receive queue and
> > > all of the data has been consumed in tcp_recvmsg() unless it's
> > > MSG_PEEK mode. So my understanding is that the patch tries to cover
> > > the case where partial data of skb is read by applications and the
> > > whole skb has not been unlinked from the receive queue yet. Sure, as
> > > we can learn from tcp_sendsmg(), skb can be partially read.
> >
> > You can find a clear example in patch 2 that this patch fixes.
>
> Oh, great, a very interesting corner case: resending data with FIN....

Linux TCP stack under memory pressure can do that BTW, no need for
another implementation :)

tcp_send_fin()


> I just wasn't able to read the second patch in time...
>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
> Thanks,
> Jason
>
> >
> > Without patch 1, the test fails:
> >
> > # ./ksft_runner.sh tcp_close_no_rst.pkt
> > ...
> > tcp_close_no_rst.pkt:32: error handling packet: live packet field
> > tcp_fin: expected: 1 (0x1) vs actual: 0 (0x0)
> > script packet:  0.140854 F. 1:1(0) ack 1002
> > actual packet:  0.140844 R. 1:1(0) ack 1002 win 65535
> > not ok 1 ipv4
> >
> >
> > >
> > > As long as 'TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq' has data
> > > len, and the skb still exists in the receive queue, it can directly
> > > means some part of skb hasn't been read yet. We can call it the unread
> > > data case then, so the logic before this patch is right.
> > >
> > > Two conditions (1. skb still stays in the queue, 2. skb has data) make
> > > sure that the data unread case can be detected and then sends an RST.
> > > No need to replace it with copied_seq, I wonder? At least, it's not a
> > > bug.
> > >
> > > Thanks,
> > > Jason
> > >
> > >
> > >
> > >
> > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > ---
> > > >  net/ipv4/tcp.c | 9 +++++----
> > > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > index 40b774b4f587..39eb03f6d07f 100644
> > > > --- a/net/ipv4/tcp.c
> > > > +++ b/net/ipv4/tcp.c
> > > > @@ -3099,8 +3099,8 @@ bool tcp_check_oom(const struct sock *sk, int shift)
> > > >
> > > >  void __tcp_close(struct sock *sk, long timeout)
> > > >  {
> > > > +       bool data_was_unread = false;
> > > >         struct sk_buff *skb;
> > > > -       int data_was_unread = 0;
> > > >         int state;
> > > >
> > > >         WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
> > > > @@ -3119,11 +3119,12 @@ void __tcp_close(struct sock *sk, long timeout)
> > > >          *  reader process may not have drained the data yet!
> > > >          */
> > > >         while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) {
> > > > -               u32 len = TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq;
> > > > +               u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> > > >
> > > >                 if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
> > > > -                       len--;
> > > > -               data_was_unread += len;
> > > > +                       end_seq--;
> > > > +               if (after(end_seq, tcp_sk(sk)->copied_seq))
> > > > +                       data_was_unread = true;
> > > >                 __kfree_skb(skb);
> > > >         }
> > > >
> > > > --
> > > > 2.51.0.338.gd7d06c2dae-goog
> > > >
> > > >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required
  2025-09-04  6:44         ` Eric Dumazet
@ 2025-09-04  6:49           ` Jason Xing
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Xing @ 2025-09-04  6:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Kuniyuki Iwashima, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Neal Cardwell, Simon Horman, netdev, eric.dumazet

On Thu, Sep 4, 2025 at 2:44 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Sep 3, 2025 at 11:19 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Thu, Sep 4, 2025 at 1:32 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
> > >
> > > On Wed, Sep 3, 2025 at 10:04 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 3, 2025 at 4:47 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > If the receive queue contains payload that was already
> > > > > received, __tcp_close() can send an unexpected RST.
> > > > >
> > > > > Refine the code to take tp->copied_seq into account,
> > > > > as we already do in tcp recvmsg().
> > > > >
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > >
> > > > Sorry, Eric. I might be wrong, and I don't think it's a bugfix for now.
> > > >
> > > > IIUC, it's not possible that one skb stays in the receive queue and
> > > > all of the data has been consumed in tcp_recvmsg() unless it's
> > > > MSG_PEEK mode. So my understanding is that the patch tries to cover
> > > > the case where partial data of skb is read by applications and the
> > > > whole skb has not been unlinked from the receive queue yet. Sure, as
> > > > we can learn from tcp_sendsmg(), skb can be partially read.
> > >
> > > You can find a clear example in patch 2 that this patch fixes.
> >
> > Oh, great, a very interesting corner case: resending data with FIN....
>
> Linux TCP stack under memory pressure can do that BTW, no need for
> another implementation :)
>
> tcp_send_fin()

Thank you, Eric!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-09-04  6:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-03  8:47 [PATCH net-next 0/3] tcp: __tcp_close() changes Eric Dumazet
2025-09-03  8:47 ` [PATCH net-next 1/3] tcp: fix __tcp_close() to only send RST when required Eric Dumazet
2025-09-03 15:07   ` Neal Cardwell
2025-09-03 18:28     ` Kuniyuki Iwashima
2025-09-04  5:03   ` Jason Xing
2025-09-04  5:31     ` Kuniyuki Iwashima
2025-09-04  6:19       ` Jason Xing
2025-09-04  6:44         ` Eric Dumazet
2025-09-04  6:49           ` Jason Xing
2025-09-03  8:47 ` [PATCH net-next 2/3] selftests/net: packetdrill: add tcp_close_no_rst.pkt Eric Dumazet
2025-09-03 15:08   ` Neal Cardwell
2025-09-03 18:28     ` Kuniyuki Iwashima
2025-09-04  6:20   ` Jason Xing
2025-09-03  8:47 ` [PATCH net-next 3/3] tcp: use tcp_eat_recv_skb in __tcp_close() Eric Dumazet
2025-09-03 15:09   ` Neal Cardwell
2025-09-03 18:30     ` Kuniyuki Iwashima
2025-09-04  6:34   ` Jason Xing

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).