From: Youngmin Nam <youngmin.nam@samsung.com>
To: Neal Cardwell <ncardwell@google.com>, Eric Dumazet <edumazet@google.com>
Cc: Youngmin Nam <youngmin.nam@samsung.com>,
davem@davemloft.net, dsahern@kernel.org, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, dujeong.lee@samsung.com,
guo88.liu@samsung.com, yiwang.cai@samsung.com,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
joonki.min@samsung.com, hajun.sung@samsung.com,
d7271.choe@samsung.com, sw.ju@samsung.com
Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
Date: Wed, 4 Dec 2024 12:26:18 +0900
Message-ID: <Z0/L2gDjvXVfj1ho@perf>
In-Reply-To: <CADVnQynUspJL4e3UnZTKps9WmgnE-0ngQnQmn=8gjSmyg4fQ5A@mail.gmail.com>
On Tue, Dec 03, 2024 at 10:34:46AM -0500, Neal Cardwell wrote:
> On Tue, Dec 3, 2024 at 6:07 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote:
> > >
> > > We encountered the following WARNINGs
> > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert()
> > > which triggered a kernel panic due to panic_on_warn.
> > >
> > > case 1.
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026
> > > Call trace:
> > > tcp_sacktag_write_queue+0xae8/0xb60
> > > tcp_ack+0x4ec/0x12b8
> > > tcp_rcv_state_process+0x22c/0xd38
> > > tcp_v4_do_rcv+0x220/0x300
> > > tcp_v4_rcv+0xa5c/0xbb4
> > > ip_protocol_deliver_rcu+0x198/0x34c
> > > ip_local_deliver_finish+0x94/0xc4
> > > ip_local_deliver+0x74/0x10c
> > > ip_rcv+0xa0/0x13c
> > > Kernel panic - not syncing: kernel: panic_on_warn set ...
> > >
> > > case 2.
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004
> > > Call trace:
> > > tcp_fastretrans_alert+0x8ac/0xa74
> > > tcp_ack+0x904/0x12b8
> > > tcp_rcv_state_process+0x22c/0xd38
> > > tcp_v4_do_rcv+0x220/0x300
> > > tcp_v4_rcv+0xa5c/0xbb4
> > > ip_protocol_deliver_rcu+0x198/0x34c
> > > ip_local_deliver_finish+0x94/0xc4
> > > ip_local_deliver+0x74/0x10c
> > > ip_rcv+0xa0/0x13c
> > > Kernel panic - not syncing: kernel: panic_on_warn set ...
> > >
> >
> > I have not seen these warnings firing. Neal, have you seen this in the past ?
>
> I can't recall seeing these warnings over the past 5 years or so, and
> (from checking our monitoring) they don't seem to be firing in our
> fleet recently.
>
> > In any case this test on sk_state is too specific.
>
> I agree with Eric. IMHO TCP_FIN_WAIT1 deserves all the same warnings
> as ESTABLISHED, since in this state the connection may still have a
> big queue of data it is trying to reliably send to the other side,
> with full loss recovery and congestion control logic.
Yes, I agree with Eric as well.
>
> I would suggest that instead of running with panic_on_warn it would
> make more sense to not panic on warning, and instead add more detail
> to these warning messages in your kernels during your testing, to help
> debug what is going wrong. I would suggest adding to the warning
> message:
>
> tp->packets_out
> tp->sacked_out
> tp->lost_out
> tp->retrans_out
> tcp_is_sack(tp)
> tp->mss_cache
> inet_csk(sk)->icsk_ca_state
> inet_csk(sk)->icsk_pmtu_cookie
Hi Neal,
Thanks for your feedback.
We enable panic_on_warn by default for stability reasons.
As you know, panic_on_warn applies to the entire kernel, not to a specific subsystem.
We just want to avoid the kernel panic here.
After reading the LWN article below, I think we could use pr_warn() instead of WARN_ON() in these two places.
https://lwn.net/Articles/969923/
What do you think?
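
To make the two suggestions above concrete (report the counters Neal listed, and log rather than panic), here is a minimal userspace sketch. It is not kernel code and not a proposed patch: struct tcp_counters, check_counters() and the message format are hypothetical names made up for illustration, and the sketch only assumes that the warnings guard counter consistency of the kind checked here, which this thread does not confirm. The two helpers follow the arithmetic of the kernel's tcp_left_out() and tcp_packets_in_flight(). In the kernel, the formatted line would go through something like pr_warn_once() instead of a caller-supplied buffer.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the fields Neal listed.
 * This is NOT struct tcp_sock; it only carries the values to report. */
struct tcp_counters {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int retrans_out;
	unsigned int mss_cache;
	unsigned int pmtu_cookie;	/* stands in for icsk_pmtu_cookie */
	unsigned char ca_state;		/* stands in for icsk_ca_state */
	bool is_sack;			/* stands in for tcp_is_sack(tp) */
};

/* Same arithmetic as the kernel's tcp_left_out(). */
static unsigned int left_out(const struct tcp_counters *c)
{
	return c->sacked_out + c->lost_out;
}

/* Same arithmetic as the kernel's tcp_packets_in_flight(). */
static unsigned int packets_in_flight(const struct tcp_counters *c)
{
	return c->packets_out - left_out(c) + c->retrans_out;
}

/* Returns true when the counters are consistent. Otherwise writes a
 * one-line diagnostic into buf and returns false, so the caller can
 * log it instead of triggering a WARN (and a panic_on_warn panic). */
static bool check_counters(const struct tcp_counters *c, char *buf, size_t len)
{
	bool ok = left_out(c) <= c->packets_out &&
		  (int)packets_in_flight(c) >= 0;

	if (ok) {
		if (len)
			buf[0] = '\0';
		return true;
	}

	snprintf(buf, len,
		 "tcp counters inconsistent: packets_out=%u sacked_out=%u "
		 "lost_out=%u retrans_out=%u sack=%d mss_cache=%u "
		 "ca_state=%u pmtu_cookie=%u",
		 c->packets_out, c->sacked_out, c->lost_out, c->retrans_out,
		 c->is_sack, c->mss_cache, (unsigned int)c->ca_state,
		 c->pmtu_cookie);
	return false;
}
```

A caller would fill a struct tcp_counters from the socket, call check_counters(), and print the buffer when it returns false. Whether logging alone is acceptable here, or whether the WARN should stay so that the underlying bug remains visible, is the open question for the maintainers.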
>
> A hunch would be that this is either firing for (a) non-SACK
> connections, or (b) after an MTU reduction.
>
> In particular, you might try `echo 0 >
> /proc/sys/net/ipv4/tcp_mtu_probing` and see if that makes the warnings
> go away.
>
> cheers,
> neal
>