From: "Dujeong.lee" <dujeong.lee@samsung.com>
To: "'Eric Dumazet'" <edumazet@google.com>,
"'Youngmin Nam'" <youngmin.nam@samsung.com>
Cc: "'Jakub Kicinski'" <kuba@kernel.org>,
"'Neal Cardwell'" <ncardwell@google.com>, <davem@davemloft.net>,
<dsahern@kernel.org>, <pabeni@redhat.com>, <horms@kernel.org>,
<guo88.liu@samsung.com>, <yiwang.cai@samsung.com>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<joonki.min@samsung.com>, <hajun.sung@samsung.com>,
<d7271.choe@samsung.com>, <sw.ju@samsung.com>
Subject: RE: [PATCH] tcp: check socket state before calling WARN_ON
Date: Wed, 4 Dec 2024 16:48:42 +0900
Message-ID: <009e01db4620$f08f42e0$d1adc8a0$@samsung.com>
In-Reply-To: <CANn89iKms_9EX+wArf1FK7Cy3-Cr_ryX+MJ2YC8yt1xmvpY=Uw@mail.gmail.com>
On Wed, Dec 4, 2024 at 4:14 PM Eric Dumazet wrote:
> To: Youngmin Nam <youngmin.nam@samsung.com>
> Cc: Jakub Kicinski <kuba@kernel.org>; Neal Cardwell <ncardwell@google.com>;
> davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com;
> horms@kernel.org; dujeong.lee@samsung.com; guo88.liu@samsung.com;
> yiwang.cai@samsung.com; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; joonki.min@samsung.com; hajun.sung@samsung.com;
> d7271.choe@samsung.com; sw.ju@samsung.com
> Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
>
> On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@samsung.com>
> wrote:
> >
> > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote:
> > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote:
> > > > > I have not seen these warnings firing. Neal, have you seen this in
> > > > > the past ?
> > > >
> > > > I can't recall seeing these warnings over the past 5 years or so,
> > > > and (from checking our monitoring) they don't seem to be firing in
> > > > our fleet recently.
> > >
> > > FWIW I see this at Meta on 5.12 kernels, but nothing since.
> > > Could be that one of our workloads is pinned to 5.12.
> > > Youngmin, what's the newest kernel you can repro this on?
> > >
> > Hi Jakub.
> > Thank you for taking an interest in this issue.
> >
> > We've seen this issue since 5.15 kernel.
> > Now, we can see this on 6.6 kernel which is the newest kernel we are
> > running.
>
> The fact that we are processing ACK packets after the write queue has been
> purged would be a serious bug.
>
> Thus the WARN() makes sense to us.
>
> It would be easy to build a packetdrill test. Please do so, then we can
> fix the root cause.
>
> Thank you !
Please let me share some more details and clarifications on the issue, based on a ramdump snapshot that we secured locally.
1) This issue was first reported on the Android-T Linux kernel when we enabled panic_on_warn for the first time.
The reproduction rate is not high, and the issue can be seen in any test case that uses a public internet connection.
2) Analysis from the ramdump (the ramdump itself is not available at the moment).
2-A) From the ramdump, I was able to find the values below.
tp->packets_out = 0
tp->retrans_out = 1
tp->max_packets_out = 1
tp->max_packets_seq = 1575830358
tp->snd_ssthresh = 5
tp->snd_cwnd = 1
tp->prior_cwnd = 10
tp->write_seq = 1575830359
tp->pushed_seq = 1575830358
tp->lost_out = 1
tp->sacked_out = 0
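For readers who want to cross-check these counters, here is a minimal sketch. It assumes the two helpers mirror the kernel's tcp_left_out() (sacked_out + lost_out) and tcp_packets_in_flight() (packets_out - left_out + retrans_out); the Python class and function names are hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TcpCounters:
    """Subset of the tcp_sock counters listed in 2-A."""
    packets_out: int
    retrans_out: int
    lost_out: int
    sacked_out: int


def left_out(tp: TcpCounters) -> int:
    # Assumed to mirror tcp_left_out(): segments counted as SACKed or lost.
    return tp.sacked_out + tp.lost_out


def packets_in_flight(tp: TcpCounters) -> int:
    # Assumed to mirror tcp_packets_in_flight().
    return tp.packets_out - left_out(tp) + tp.retrans_out


def left_out_exceeds_packets_out(tp: TcpCounters) -> bool:
    # True when more segments are marked SACKed/lost than are outstanding.
    return left_out(tp) > tp.packets_out


# Values copied from the ramdump snapshot in 2-A.
snapshot = TcpCounters(packets_out=0, retrans_out=1, lost_out=1, sacked_out=0)
print(left_out(snapshot), packets_in_flight(snapshot),
      left_out_exceeds_packets_out(snapshot))
```

Under these assumptions left_out is 1 while packets_out is 0, so the snapshot still counts one segment as lost and retransmitted although nothing is counted as outstanding.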
2-B) The last Tx packet from the device (Time: 17371.562934):
Hex:
4500005b95a3400040063e34c0a848188efacf0a888a01bb5ded432f5ad8ab29801800495b5800000101080a3a52197fef299d901703030022f3589123b0523bdd07be137a98ca9b5d3475332d4382c7b420571e6d437a07ba7787
Internet Protocol Version 4
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
Total Length: 91
Identification: 0x95a3 (38307)
010. .... = Flags: 0x2, Don't fragment
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 64
Protocol: TCP (6)
Header Checksum: 0x3e34
Header checksum status: Unverified
Source Address: 192.168.72.24
Destination Address: 142.250.207.10
Stream index: 0
Transmission Control Protocol
Source Port: 34954
Destination Port: 443
Stream index: 0
Conversation completeness: Incomplete (0)
TCP Segment Len: 39
Sequence Number: 0x5ded432f
Sequence Number (raw): 1575830319
Next Sequence Number: 40
Acknowledgment Number: 0x5ad8ab29
Acknowledgment number (raw): 1524149033
1000 .... = Header Length: 32 bytes (8)
Flags: 0x018 (PSH, ACK)
Window: 73
Calculated window size: 73
Window size scaling factor: -1 (unknown)
Checksum: 0x5b58
Checksum Status: Unverified
Urgent Pointer: 0
Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
Timestamps
SEQ/ACK analysis
TCP payload (39 bytes)
Transport Layer Security
TLSv1.2 Record Layer: Application Data Protocol: Hypertext Transfer Protocol
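The sequence numbers in this dissection can be re-derived from the hex dump with a short sketch. It assumes a plain IPv4 + TCP packet without a link-layer header, as in the dump above; the function name and the returned dictionary keys are hypothetical.

```python
import struct

# Hex dump of the last Tx packet, copied from 2-B.
TX_HEX = (
    "4500005b95a3400040063e34c0a848188efacf0a"
    "888a01bb5ded432f5ad8ab29801800495b580000"
    "0101080a3a52197fef299d90"
    "1703030022f3589123b0523bdd07be137a98ca9b5d3475332d4382c7b420571e6d437a07ba7787"
)


def parse_ipv4_tcp(hex_dump: str) -> dict:
    raw = bytes.fromhex(hex_dump)
    ihl = (raw[0] & 0x0F) * 4                        # IPv4 header length in bytes
    total_len = struct.unpack_from("!H", raw, 2)[0]  # IPv4 Total Length field
    # First 14 bytes of the TCP header: ports, seq, ack, data offset + flags.
    sport, dport, seq, ack, off_flags = struct.unpack_from("!HHIIH", raw, ihl)
    tcp_hdr_len = (off_flags >> 12) * 4              # TCP header length in bytes
    payload_len = total_len - ihl - tcp_hdr_len
    return {
        "sport": sport,
        "dport": dport,
        "seq": seq,
        "ack": ack,
        "payload_len": payload_len,
        # Sequence number just past the payload, wrapped to 32 bits.
        "end_seq": (seq + payload_len) & 0xFFFFFFFF,
    }


tx = parse_ipv4_tcp(TX_HEX)
print(tx)
```

The resulting end_seq is 1575830358, which is the same value as tp->pushed_seq in 2-A.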
2-C) When the WARN hit, the DUT was processing the following packet (Time: 17399.502603, about 28 seconds after the last Tx):
Hex:
456000405FA20000720681F08EFACF0AC0A8481801BB888A5AD8AB295DED4356B010010D93D800000101080AEF299EF43A52089F0101050A5DED432F5DED4356
Internet Protocol Version 4
0100 .... = Version: 4
.... 0101 = Header Length: 20 bytes (5)
Differentiated Services Field: 0x60 (DSCP: CS3, ECN: Not-ECT)
Total Length: 64
Identification: 0x5fa2 (24482)
000. .... = Flags: 0x0
...0 0000 0000 0000 = Fragment Offset: 0
Time to Live: 114
Protocol: TCP (6)
Header Checksum: 0x81f0
Header checksum status: Unverified
Source Address: 142.250.207.10
Destination Address: 192.168.72.24
Stream index: 0
Transmission Control Protocol
Source Port: 443
Destination Port: 34954
Stream index: 0
Conversation completeness: Incomplete (0)
TCP Segment Len: 0
Sequence Number: 0x5ad8ab29
Sequence Number (raw): 1524149033
Next Sequence Number: 1
Acknowledgment Number: 0x5ded4356
Acknowledgment number (raw): 1575830358
1011 .... = Header Length: 44 bytes (11)
Flags: 0x010 (ACK)
Window: 269
Calculated window size: 269
Window size scaling factor: -1 (unknown)
Checksum: 0x93d8
Checksum Status: Unverified
Urgent Pointer: 0
Options: (24 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps, No-Operation (NOP), No-Operation (NOP), SACK
Timestamps
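The SACK option is not expanded in the dissection above, so the following sketch reads it from the hex dump. It assumes a plain IPv4 + TCP packet, and it checks only the first D-SACK case from RFC 2883 (first SACK block covered by the cumulative ACK); all function names are hypothetical.

```python
import struct

# Hex dump of the packet being processed when the WARN hit, copied from 2-C.
ACK_HEX = (
    "456000405FA20000720681F08EFACF0AC0A84818"
    "01BB888A5AD8AB295DED4356B010010D93D80000"
    "0101080AEF299EF43A52089F"
    "0101050A5DED432F5DED4356"
)


def parse_ack_and_sack(hex_dump: str):
    raw = bytes.fromhex(hex_dump)
    ihl = (raw[0] & 0x0F) * 4                    # IPv4 header length in bytes
    ack = struct.unpack_from("!I", raw, ihl + 8)[0]
    tcp_hdr_len = (raw[ihl + 12] >> 4) * 4       # TCP header length in bytes
    opts = raw[ihl + 20: ihl + tcp_hdr_len]      # TCP options only
    blocks = []
    i = 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:                            # End of Option List
            break
        if kind == 1:                            # NOP is a single byte
            i += 1
            continue
        if i + 1 >= len(opts):
            break
        length = opts[i + 1]
        if length < 2 or i + length > len(opts):
            break                                # malformed option, stop parsing
        if kind == 5:                            # SACK: pairs of 32-bit edges
            for off in range(i + 2, i + length - 7, 8):
                blocks.append(struct.unpack_from("!II", opts, off))
        i += length
    return ack, blocks


def seq_leq(a: int, b: int) -> bool:
    # a <= b in 32-bit wrapping sequence space.
    return ((b - a) & 0xFFFFFFFF) < 0x80000000


def first_block_is_dsack(ack: int, blocks) -> bool:
    # RFC 2883, first case only: the first block lies at or below the ACK.
    return bool(blocks) and seq_leq(blocks[0][1], ack)


ack, sack_blocks = parse_ack_and_sack(ACK_HEX)
print(ack, sack_blocks, first_block_is_dsack(ack, sack_blocks))
```

The single SACK block is [1575830319, 1575830358), which covers exactly the 39-byte segment from 2-B and does not extend past the cumulative ACK. Under the assumptions above it would therefore be read as a D-SACK, i.e. the peer reporting that it received that segment more than once.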
2-D) The DUT received this ACK from the access point about 28 seconds after the last Tx.
3) Clarification on the "tcp_write_queue_purge" claim
This is just my conjecture based on the ramdump snapshot; it is not shown in the call trace.
Based on the TCP state in the snapshot, I thought tcp_write_queue_purge had been called and packets_out had been cleared.
4) In our kernel, "/proc/sys/net/ipv4/tcp_mtu_probing" is set to 0.