public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Youngmin Nam <youngmin.nam@samsung.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>,
	Youngmin Nam <youngmin.nam@samsung.com>,
	davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, dujeong.lee@samsung.com,
	guo88.liu@samsung.com, yiwang.cai@samsung.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	joonki.min@samsung.com, hajun.sung@samsung.com,
	d7271.choe@samsung.com, sw.ju@samsung.com
Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
Date: Wed, 4 Dec 2024 12:08:59 +0900	[thread overview]
Message-ID: <Z0/HyztKs8UFBOa0@perf> (raw)
In-Reply-To: <CANn89iK+7CKO31=3EvNo6-raUzyibwRRN8HkNXeqzuP9q8k_tA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 12184 bytes --]

Hi Eric.
Thanks for looking at this issue.

On Tue, Dec 03, 2024 at 12:07:05PM +0100, Eric Dumazet wrote:
> On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote:
> >
> > We encountered the following WARNINGs
> > in tcp_sacktag_write_queue()/tcp_fastretrans_alert()
> > which triggered a kernel panic due to panic_on_warn.
> >
> > case 1.
> > ------------[ cut here ]------------
> > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026
> > Call trace:
> >  tcp_sacktag_write_queue+0xae8/0xb60
> >  tcp_ack+0x4ec/0x12b8
> >  tcp_rcv_state_process+0x22c/0xd38
> >  tcp_v4_do_rcv+0x220/0x300
> >  tcp_v4_rcv+0xa5c/0xbb4
> >  ip_protocol_deliver_rcu+0x198/0x34c
> >  ip_local_deliver_finish+0x94/0xc4
> >  ip_local_deliver+0x74/0x10c
> >  ip_rcv+0xa0/0x13c
> > Kernel panic - not syncing: kernel: panic_on_warn set ...
> >
> > case 2.
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004
> > Call trace:
> >  tcp_fastretrans_alert+0x8ac/0xa74
> >  tcp_ack+0x904/0x12b8
> >  tcp_rcv_state_process+0x22c/0xd38
> >  tcp_v4_do_rcv+0x220/0x300
> >  tcp_v4_rcv+0xa5c/0xbb4
> >  ip_protocol_deliver_rcu+0x198/0x34c
> >  ip_local_deliver_finish+0x94/0xc4
> >  ip_local_deliver+0x74/0x10c
> >  ip_rcv+0xa0/0x13c
> > Kernel panic - not syncing: kernel: panic_on_warn set ...
> >
> 
> I have not seen these warnings firing. Neal, have you seen this in the past ?
> 
> Please provide the kernel version (this must be a pristine LTS one).
We are running Android kernel for Android mobile device which is based on LTS kernel 6.6-30.
But we've seen this issue since kernel 5.15 LTS.

> and symbolized stack traces using scripts/decode_stacktrace.sh
Unfortunately, we don't have the matched vmlinux right now. So we need to rebuild and reproduce.
> 
> If this warning was easy to trigger, please provide a packetdrill test ?
I'm not sure if we can run packetdrill test on Android device. Anyway let me check.

FYI, Here are more detailed logs.

Case 1.
[26496.422651]I[4:  napi/wlan0-33:  467] ------------[ cut here ]------------
[26496.422665]I[4:  napi/wlan0-33:  467] WARNING: CPU: 4 PID: 467 at net/ipv4/tcp_input.c:2026 tcp_sacktag_write_queue+0xae8/0xb60
[26496.423420]I[4:  napi/wlan0-33:  467] CPU: 4 PID: 467 Comm: napi/wlan0-33 Tainted: G S         OE      6.6.30-android15-8-geeceb2c9cdf1-ab20240930.125201-4k #1 a1c80b36942fa6e9575b2544032a7536ed502804
[26496.423427]I[4:  napi/wlan0-33:  467] Hardware name: Samsung ERD9955 board based on S5E9955 (DT)
[26496.423432]I[4:  napi/wlan0-33:  467] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[26496.423438]I[4:  napi/wlan0-33:  467] pc : tcp_sacktag_write_queue+0xae8/0xb60
[26496.423446]I[4:  napi/wlan0-33:  467] lr : tcp_ack+0x4ec/0x12b8
[26496.423455]I[4:  napi/wlan0-33:  467] sp : ffffffc096b8b690
[26496.423458]I[4:  napi/wlan0-33:  467] x29: ffffffc096b8b710 x28: 0000000000008001 x27: 000000005526d635
[26496.423469]I[4:  napi/wlan0-33:  467] x26: ffffff8a19079684 x25: 000000005526dbfd x24: 0000000000000001
[26496.423480]I[4:  napi/wlan0-33:  467] x23: 000000000000000a x22: ffffff88e5f5b680 x21: 000000005526dbc9
[26496.423489]I[4:  napi/wlan0-33:  467] x20: ffffff8a19078d80 x19: ffffff88e9f4193e x18: ffffffd083114c80
[26496.423499]I[4:  napi/wlan0-33:  467] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000000
[26496.423508]I[4:  napi/wlan0-33:  467] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000
[26496.423517]I[4:  napi/wlan0-33:  467] x11: 0000000000000000 x10: 0000000000000001 x9 : 00000000fffffffd
[26496.423526]I[4:  napi/wlan0-33:  467] x8 : 0000000000000001 x7 : 0000000000000000 x6 : ffffffd081ec0bc4
[26496.423536]I[4:  napi/wlan0-33:  467] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc096b8b818
[26496.423545]I[4:  napi/wlan0-33:  467] x2 : 000000005526d635 x1 : ffffff88e5f5b680 x0 : ffffff8a19078d80
[26496.423555]I[4:  napi/wlan0-33:  467] Call trace:
[26496.423558]I[4:  napi/wlan0-33:  467]  tcp_sacktag_write_queue+0xae8/0xb60
[26496.423566]I[4:  napi/wlan0-33:  467]  tcp_ack+0x4ec/0x12b8
[26496.423573]I[4:  napi/wlan0-33:  467]  tcp_rcv_state_process+0x22c/0xd38
[26496.423580]I[4:  napi/wlan0-33:  467]  tcp_v4_do_rcv+0x220/0x300
[26496.423590]I[4:  napi/wlan0-33:  467]  tcp_v4_rcv+0xa5c/0xbb4
[26496.423596]I[4:  napi/wlan0-33:  467]  ip_protocol_deliver_rcu+0x198/0x34c
[26496.423607]I[4:  napi/wlan0-33:  467]  ip_local_deliver_finish+0x94/0xc4
[26496.423614]I[4:  napi/wlan0-33:  467]  ip_local_deliver+0x74/0x10c
[26496.423620]I[4:  napi/wlan0-33:  467]  ip_rcv+0xa0/0x13c
[26496.423625]I[4:  napi/wlan0-33:  467]  __netif_receive_skb_core+0xe14/0x1104
[26496.423642]I[4:  napi/wlan0-33:  467]  __netif_receive_skb_list_core+0xb8/0x2dc
[26496.423649]I[4:  napi/wlan0-33:  467]  netif_receive_skb_list_internal+0x234/0x320
[26496.423655]I[4:  napi/wlan0-33:  467]  napi_complete_done+0xb4/0x1a0
[26496.423660]I[4:  napi/wlan0-33:  467]  slsi_rx_netif_napi_poll+0x22c/0x258 [scsc_wlan 16ac2100e65b7c78ce863cecc238b39b162dbe82]
[26496.423822]I[4:  napi/wlan0-33:  467]  __napi_poll+0x5c/0x238
[26496.423829]I[4:  napi/wlan0-33:  467]  napi_threaded_poll+0x110/0x204
[26496.423836]I[4:  napi/wlan0-33:  467]  kthread+0x114/0x138
[26496.423845]I[4:  napi/wlan0-33:  467]  ret_from_fork+0x10/0x20
[26496.423856]I[4:  napi/wlan0-33:  467] Kernel panic - not syncing: kernel: panic_on_warn set ..

Case 2.
[ 1843.463330]I[0: surfaceflinger:  648] ------------[ cut here ]------------
[ 1843.463355]I[0: surfaceflinger:  648] WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 tcp_fastretrans_alert+0x8ac/0xa74
[ 1843.464508]I[0: surfaceflinger:  648] CPU: 0 PID: 648 Comm: surfaceflinger Tainted: G S         OE      6.6.30-android15-8-geeceb2c9cdf1-ab20241017.075836-4k #1 de751202c2c5ab3ec352a00ae470fc5e907bdcfe
[ 1843.464520]I[0: surfaceflinger:  648] Hardware name: Samsung ERD8855 board based on S5E8855 (DT)
[ 1843.464527]I[0: surfaceflinger:  648] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 1843.464535]I[0: surfaceflinger:  648] pc : tcp_fastretrans_alert+0x8ac/0xa74
[ 1843.464548]I[0: surfaceflinger:  648] lr : tcp_ack+0x904/0x12b8
[ 1843.464556]I[0: surfaceflinger:  648] sp : ffffffc0800036e0
[ 1843.464561]I[0: surfaceflinger:  648] x29: ffffffc0800036e0 x28: 0000000000008005 x27: 000000001bc05562
[ 1843.464579]I[0: surfaceflinger:  648] x26: ffffff890418a3c4 x25: 0000000000000000 x24: 000000000000cd02
[ 1843.464595]I[0: surfaceflinger:  648] x23: 000000001bc05562 x22: 0000000000000000 x21: ffffffc080003800
[ 1843.464611]I[0: surfaceflinger:  648] x20: ffffffc08000378c x19: ffffff8904189ac0 x18: 0000000000000000
[ 1843.464627]I[0: surfaceflinger:  648] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000001
[ 1843.464642]I[0: surfaceflinger:  648] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000
[ 1843.464658]I[0: surfaceflinger:  648] x11: ffffff883e9c9540 x10: 0000000000000001 x9 : 0000000000000001
[ 1843.464673]I[0: surfaceflinger:  648] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffd081ec0bc4
[ 1843.464689]I[0: surfaceflinger:  648] x5 : 0000000000000000 x4 : ffffffc08000378c x3 : ffffffc080003800
[ 1843.464704]I[0: surfaceflinger:  648] x2 : 0000000000000000 x1 : 000000001bc05562 x0 : ffffff8904189ac0
[ 1843.464720]I[0: surfaceflinger:  648] Call trace:
[ 1843.464725]I[0: surfaceflinger:  648]  tcp_fastretrans_alert+0x8ac/0xa74
[ 1843.464735]I[0: surfaceflinger:  648]  tcp_ack+0x904/0x12b8
[ 1843.464743]I[0: surfaceflinger:  648]  tcp_rcv_state_process+0x22c/0xd38
[ 1843.464751]I[0: surfaceflinger:  648]  tcp_v4_do_rcv+0x220/0x300
[ 1843.464760]I[0: surfaceflinger:  648]  tcp_v4_rcv+0xa5c/0xbb4
[ 1843.464767]I[0: surfaceflinger:  648]  ip_protocol_deliver_rcu+0x198/0x34c
[ 1843.464776]I[0: surfaceflinger:  648]  ip_local_deliver_finish+0x94/0xc4
[ 1843.464784]I[0: surfaceflinger:  648]  ip_local_deliver+0x74/0x10c
[ 1843.464791]I[0: surfaceflinger:  648]  ip_rcv+0xa0/0x13c
[ 1843.464799]I[0: surfaceflinger:  648]  __netif_receive_skb_core+0xe14/0x1104
[ 1843.464810]I[0: surfaceflinger:  648]  __netif_receive_skb+0x40/0x124
[ 1843.464818]I[0: surfaceflinger:  648]  netif_receive_skb+0x7c/0x234
[ 1843.464825]I[0: surfaceflinger:  648]  slsi_rx_data_deliver_skb+0x1e0/0xdbc [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28]
[ 1843.465025]I[0: surfaceflinger:  648]  slsi_ba_process_complete+0x70/0xa4 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28]
[ 1843.465219]I[0: surfaceflinger:  648]  slsi_ba_aging_timeout_handler+0x324/0x354 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28]
[ 1843.465410]I[0: surfaceflinger:  648]  call_timer_fn+0xd0/0x360
[ 1843.465423]I[0: surfaceflinger:  648]  __run_timers+0x1b4/0x268
[ 1843.465432]I[0: surfaceflinger:  648]  run_timer_softirq+0x24/0x4c
[ 1843.465440]I[0: surfaceflinger:  648]  __do_softirq+0x158/0x45c
[ 1843.465448]I[0: surfaceflinger:  648]  ____do_softirq+0x10/0x20
[ 1843.465455]I[0: surfaceflinger:  648]  call_on_irq_stack+0x3c/0x74
[ 1843.465463]I[0: surfaceflinger:  648]  do_softirq_own_stack+0x1c/0x2c
[ 1843.465470]I[0: surfaceflinger:  648]  __irq_exit_rcu+0x54/0xb4
[ 1843.465480]I[0: surfaceflinger:  648]  irq_exit_rcu+0x10/0x1c
[ 1843.465489]I[0: surfaceflinger:  648]  el0_interrupt+0x54/0xe0
[ 1843.465499]I[0: surfaceflinger:  648]  __el0_irq_handler_common+0x18/0x28
[ 1843.465508]I[0: surfaceflinger:  648]  el0t_64_irq_handler+0x10/0x1c
[ 1843.465516]I[0: surfaceflinger:  648]  el0t_64_irq+0x1a8/0x1ac
[ 1843.465525]I[0: surfaceflinger:  648] Kernel panic - not syncing: kernel: panic_on_warn set ...

> > When we check the socket state value at the time of the issue,
> > it was 0x4.
> >
> > skc_state = 0x4,
> >
> > This is "TCP_FIN_WAIT1" and which means the device closed its socket.
> >
> > enum {
> >         TCP_ESTABLISHED = 1,
> >         TCP_SYN_SENT,
> >         TCP_SYN_RECV,
> >         TCP_FIN_WAIT1,
> >
> > And also this means tp->packets_out was initialized as 0
> > by tcp_write_queue_purge().
> 
> What stack trace leads to this tcp_write_queue_purge() exactly ?
I couldn't find the exact call stack to this.
But I just thought that the function would be called based on ramdump snapshot.

(*(struct tcp_sock *)(0xFFFFFF800467AB00)).packets_out = 0
(*(struct inet_connection_sock *)0xFFFFFF800467AB00).icsk_backoff = 0

> >
> > In a congested network situation, a TCP ACK for
> > an already closed session may be received with a delay from the peer.
> > This can trigger the WARN_ON macro to help debug the situation.
> >
> > To make this situation more meaningful, we would like to call
> > WARN_ON only when the state of the socket is "TCP_ESTABLISHED".
> > This will prevent the kernel from triggering a panic
> > due to panic_on_warn.
> >
> > Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com>
> > ---
> >  net/ipv4/tcp_input.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index 5bdf13ac26ef..62f4c285ab80 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -2037,7 +2037,8 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb,
> >         WARN_ON((int)tp->sacked_out < 0);
> >         WARN_ON((int)tp->lost_out < 0);
> >         WARN_ON((int)tp->retrans_out < 0);
> > -       WARN_ON((int)tcp_packets_in_flight(tp) < 0);
> > +       if (sk->sk_state == TCP_ESTABLISHED)
> 
> In any case this test on sk_state is too specific.
Yes. I agree as wll. Do you have any idea to avoid a kernel panic that we are going through ?

> > +               WARN_ON((int)tcp_packets_in_flight(tp) < 0);
> >  #endif
> >         return state->flag;
> >  }
> > @@ -3080,7 +3081,8 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
> >                 return;
> >
> >         /* C. Check consistency of the current state. */
> > -       tcp_verify_left_out(tp);
> > +       if (sk->sk_state == TCP_ESTABLISHED)
> > +               tcp_verify_left_out(tp);
> >
> >         /* D. Check state exit conditions. State can be terminated
> >          *    when high_seq is ACKed. */
> > --
> > 2.39.2
> >
> 

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



  parent reply	other threads:[~2024-12-04  3:05 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20241203081005epcas2p247b3d05bc767b1a50ba85c4433657295@epcas2p2.samsung.com>
2024-12-03  8:12 ` [PATCH] tcp: check socket state before calling WARN_ON Youngmin Nam
2024-12-03 11:07   ` Eric Dumazet
2024-12-03 15:34     ` Neal Cardwell
2024-12-04  2:18       ` Jakub Kicinski
2024-12-04  3:39         ` Youngmin Nam
2024-12-04  7:13           ` Eric Dumazet
2024-12-04  7:48             ` Dujeong.lee
2024-12-04 14:21               ` Neal Cardwell
2024-12-05 12:31                 ` Dujeong.lee
2025-01-17  5:08                 ` Youngmin Nam
2025-01-17 15:18                   ` Neal Cardwell
2025-01-20  0:18                     ` Youngmin Nam
2025-02-03  5:21                       ` Youngmin Nam
2025-02-24 21:13                         ` Neal Cardwell
2025-02-25 17:24                           ` Neal Cardwell
2025-02-25 18:28                             ` Yuchung Cheng
2025-02-25 18:43                               ` Eric Dumazet
2025-03-01  5:37                             ` Youngmin Nam
2025-03-14  2:49                               ` Youngmin Nam
2024-12-06  5:53             ` Youngmin Nam
2024-12-06  8:35               ` Eric Dumazet
2024-12-06  9:01                 ` Youngmin Nam
2024-12-06  9:08                   ` Eric Dumazet
2024-12-06 15:34                     ` Neal Cardwell
2024-12-09  1:52                       ` Youngmin Nam
2024-12-09  1:32                     ` Youngmin Nam
2024-12-09 10:16                     ` Dujeong.lee
2024-12-09 10:20                       ` Eric Dumazet
2024-12-10  3:38                         ` Dujeong.lee
2024-12-10  7:10                         ` Dujeong.lee
2024-12-18 10:18                         ` Dujeong.lee
2024-12-18 10:27                           ` Eric Dumazet
2024-12-30  0:23                             ` Dujeong.lee
2024-12-30  9:33                               ` Eric Dumazet
2025-01-02  0:22                                 ` Dujeong.lee
2025-01-02  8:16                                   ` Eric Dumazet
2025-01-03  4:16                                     ` Dujeong.lee
2024-12-04  3:26       ` Youngmin Nam
2024-12-04  8:55         ` Eric Dumazet
2024-12-04  3:08     ` Youngmin Nam [this message]
2024-12-04  9:03       ` Eric Dumazet
2024-12-05  2:45         ` Youngmin Nam
2024-12-13  7:14       ` Youngmin Nam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z0/HyztKs8UFBOa0@perf \
    --to=youngmin.nam@samsung.com \
    --cc=d7271.choe@samsung.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=dujeong.lee@samsung.com \
    --cc=edumazet@google.com \
    --cc=guo88.liu@samsung.com \
    --cc=hajun.sung@samsung.com \
    --cc=horms@kernel.org \
    --cc=joonki.min@samsung.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sw.ju@samsung.com \
    --cc=yiwang.cai@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox