* kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
@ 2023-10-11 10:28 Dmitry Kravkov
2023-10-11 14:02 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Kravkov @ 2023-10-11 10:28 UTC (permalink / raw)
To: netdev, edumazet, Slava (Ice) Sheremet
Hi,
In our attempt to upgrade from the 5.10 to the 6.1 kernel, we noticed a
reproducible kernel crash that we bisected to this commit:
commit 849b425cd091e1804af964b771761cfbefbafb43
Author: Eric Dumazet <edumazet@google.com>
Date: Tue Jun 14 10:17:34 2022 -0700
tcp: fix possible freeze in tx path under memory pressure
Blamed commit only dealt with applications issuing small writes.
Issue here is that we allow to force memory schedule for the sk_buff
allocation, but we have no guarantee that sendmsg() is able to
copy some payload in it.
In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
For example, if we consider tcp_wmem[0] = 4096 (default on x86),
and initial skb->truesize being 1280, tcp_sendmsg() is able to
copy up to 2816 bytes under memory pressure.
Before this patch a sendmsg() sending more than 2816 bytes
would either block forever (if persistent memory pressure),
or return -EAGAIN.
For bigger MTU networks, it is advised to increase tcp_wmem[0]
to avoid sending too small packets.
v2: deal with zero copy paths.
Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
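As a quick check of the arithmetic in the commit message above, here is a minimal sketch in Python. It only models the subtraction described there (the tcp_wmem[0] floor minus the initial skb->truesize); the function name payload_budget is hypothetical, and this is not the kernel's actual memory accounting.

```python
def payload_budget(wmem_min: int, skb_truesize: int) -> int:
    """Payload bytes that still fit under the tcp_wmem[0] floor.

    Simplified model of the commit message's example, not kernel code:
    the socket may use up to wmem_min bytes, and the freshly allocated
    skb already accounts for skb_truesize of them.
    """
    return max(wmem_min - skb_truesize, 0)


# Values taken from the commit message: tcp_wmem[0] = 4096, truesize = 1280.
budget = payload_budget(4096, 1280)
print(budget)
```

With the values quoted in the commit message this gives the 2816 bytes mentioned there.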
This happens under heavy load, when two 100Gb (E810 or ConnectX6) ports
transmit above 150Gbps with most of the data being read from disks, so
the system appears to be constantly under memory pressure. Reverting
the patch in the 6.1.38 kernel eliminates the crash, and the system
appears stable while delivering 180Gbps.
[ 2445.532318] ------------[ cut here ]------------
[ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE 5.19.0-rc2+ #21
[ 2445.560127] ------------[ cut here ]------------
[ 2445.560565] Hardware name: Cisco Systems Inc UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157 11/21/2021
[ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
[ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6 fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28 41 39
[ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
[ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
[ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
[ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
[ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
[ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
[ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000) knlGS:0000000000000000
[ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
[ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2445.561849] PKRU: 55555554
[ 2445.561853] Call Trace:
[ 2445.561858] <TASK>
[ 2445.564202] ------------[ cut here ]------------
[ 2445.568007] ? tcp_tasklet_func+0x120/0x120
[ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.569608] tcp_tsq_handler+0x7c/0xa0
[ 2445.569627] tcp_pace_kick+0x19/0x60
[ 2445.569632] __run_hrtimer+0x5c/0x1d0
[ 2445.572264] ------------[ cut here ]------------
[ 2445.574287] ------------[ cut here ]------------
[ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
[ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
--
Dmitry Kravkov Software Engineer
Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 10:28 kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer Dmitry Kravkov
@ 2023-10-11 14:02 ` Eric Dumazet
2023-10-11 20:20 ` Dmitry Kravkov
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2023-10-11 14:02 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: netdev, Slava (Ice) Sheremet
On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
>
> Hi,
>
> In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> in kernel that bisected to this commit:
>
> commit 849b425cd091e1804af964b771761cfbefbafb43
> Author: Eric Dumazet <edumazet@google.com>
> Date: Tue Jun 14 10:17:34 2022 -0700
>
> tcp: fix possible freeze in tx path under memory pressure
>
> Blamed commit only dealt with applications issuing small writes.
>
> Issue here is that we allow to force memory schedule for the sk_buff
> allocation, but we have no guarantee that sendmsg() is able to
> copy some payload in it.
>
> In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
>
> For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> and initial skb->truesize being 1280, tcp_sendmsg() is able to
> copy up to 2816 bytes under memory pressure.
>
> Before this patch a sendmsg() sending more than 2816 bytes
> would either block forever (if persistent memory pressure),
> or return -EAGAIN.
>
> For bigger MTU networks, it is advised to increase tcp_wmem[0]
> to avoid sending too small packets.
>
> v2: deal with zero copy paths.
>
> Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> Reviewed-by: Wei Wang <weiwan@google.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> This happens in a pretty stressful situation when two 100Gb (E810 or
> ConnectX6) ports transmit above 150Gbps that most of the data is read
> from disks. So it appears that the system is constantly in a memory
> deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> the crash and system appears stable at delivering 180Gbps
>
> [ 2445.532318] ------------[ cut here ]------------
> [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
> 5.19.0-rc2+ #21
> [ 2445.560127] ------------[ cut here ]------------
> [ 2445.560565] Hardware name: Cisco Systems Inc
> UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> 11/21/2021
> [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> 41 39
> [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> knlGS:0000000000000000
> [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 2445.561849] PKRU: 55555554
> [ 2445.561853] Call Trace:
> [ 2445.561858] <TASK>
> [ 2445.564202] ------------[ cut here ]------------
> [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> [ 2445.569627] tcp_pace_kick+0x19/0x60
> [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> [ 2445.572264] ------------[ cut here ]------------
> [ 2445.574287] ------------[ cut here ]------------
> [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> --
> --
>
> --
> --
>
> Dmitry Kravkov Software Engineer
> Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
Hi Dmitry, thanks for the report.
Can you post the contents of /proc/sys/net/ipv4/tcp_wmem and
/proc/sys/net/ipv4/tcp_rmem?
Are you using memcg?
Can you try the following patch?
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			continue;
 
 wait_for_space:
+		tcp_remove_empty_skb(sk);
 		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 		if (copied)
 			tcp_push(sk, flags & ~MSG_MORE, mss_now,
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 14:02 ` Eric Dumazet
@ 2023-10-11 20:20 ` Dmitry Kravkov
2023-10-11 20:54 ` Kuniyuki Iwashima
2023-10-16 11:12 ` Eric Dumazet
0 siblings, 2 replies; 10+ messages in thread
From: Dmitry Kravkov @ 2023-10-11 20:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Slava (Ice) Sheremet
On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> >
> > Hi,
> >
> > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > in kernel that bisected to this commit:
> >
> > commit 849b425cd091e1804af964b771761cfbefbafb43
> > Author: Eric Dumazet <edumazet@google.com>
> > Date: Tue Jun 14 10:17:34 2022 -0700
> >
> > tcp: fix possible freeze in tx path under memory pressure
> >
> > Blamed commit only dealt with applications issuing small writes.
> >
> > Issue here is that we allow to force memory schedule for the sk_buff
> > allocation, but we have no guarantee that sendmsg() is able to
> > copy some payload in it.
> >
> > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> >
> > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > copy up to 2816 bytes under memory pressure.
> >
> > Before this patch a sendmsg() sending more than 2816 bytes
> > would either block forever (if persistent memory pressure),
> > or return -EAGAIN.
> >
> > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > to avoid sending too small packets.
> >
> > v2: deal with zero copy paths.
> >
> > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > Reviewed-by: Wei Wang <weiwan@google.com>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > This happens in a pretty stressful situation when two 100Gb (E810 or
> > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > from disks. So it appears that the system is constantly in a memory
> > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > the crash and system appears stable at delivering 180Gbps
> >
> > [ 2445.532318] ------------[ cut here ]------------
> > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
> > 5.19.0-rc2+ #21
> > [ 2445.560127] ------------[ cut here ]------------
> > [ 2445.560565] Hardware name: Cisco Systems Inc
> > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > 11/21/2021
> > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > 41 39
> > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > knlGS:0000000000000000
> > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 2445.561849] PKRU: 55555554
> > [ 2445.561853] Call Trace:
> > [ 2445.561858] <TASK>
> > [ 2445.564202] ------------[ cut here ]------------
> > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > [ 2445.572264] ------------[ cut here ]------------
> > [ 2445.574287] ------------[ cut here ]------------
> > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > --
> > --
> >
> > --
> > --
> >
> > Dmitry Kravkov Software Engineer
> > Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
>
> Hi Dmitry, thanks for the report.
>
> Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> /proc/sys/net/ipv4/tcp_rmem ?
Thank you, Eric
# cat /proc/sys/net/ipv4/tcp_wmem
786432 1048576 6291456
# cat /proc/sys/net/ipv4/tcp_rmem
4096 87380 6291456
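For readers comparing these values with the commit message, a small sketch in Python that splits such a sysctl line into its min, default and max fields. The helper name parse_tcp_mem and the TcpMem type are hypothetical; the input string is the tcp_wmem value reported above.

```python
from typing import NamedTuple


class TcpMem(NamedTuple):
    minimum: int   # tcp_wmem[0] / tcp_rmem[0]
    default: int   # tcp_wmem[1] / tcp_rmem[1]
    maximum: int   # tcp_wmem[2] / tcp_rmem[2]


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the three whitespace-separated byte counts of tcp_wmem/tcp_rmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}")
    return TcpMem(*(int(f) for f in fields))


# tcp_wmem as reported in this message.
wmem = parse_tcp_mem("786432 1048576 6291456")
print(wmem.minimum, wmem.minimum // 1024)
```

Here tcp_wmem[0] is 786432 bytes (768 KiB), far above the 4096-byte default that the commit message uses in its example.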
>
> Are you using memcg ?
No
>
> Can you try the following patch ?
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> msghdr *msg, size_t size)
> continue;
>
> wait_for_space:
> + tcp_remove_empty_skb(sk);
> set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> if (copied)
> tcp_push(sk, flags & ~MSG_MORE, mss_now,
The patched kernel crashed in the same manner:
[ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
--
Dmitry Kravkov Software Engineer
Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 20:20 ` Dmitry Kravkov
@ 2023-10-11 20:54 ` Kuniyuki Iwashima
2023-10-11 21:19 ` Dmitry Kravkov
2023-10-16 11:12 ` Eric Dumazet
1 sibling, 1 reply; 10+ messages in thread
From: Kuniyuki Iwashima @ 2023-10-11 20:54 UTC (permalink / raw)
To: dmitryk; +Cc: edumazet, netdev, slavas, kuniyu
From: Dmitry Kravkov <dmitryk@qwilt.com>
Date: Wed, 11 Oct 2023 23:20:10 +0300
> On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> > >
> > > Hi,
> > >
> > > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > > in kernel that bisected to this commit:
> > >
> > > commit 849b425cd091e1804af964b771761cfbefbafb43
> > > Author: Eric Dumazet <edumazet@google.com>
> > > Date: Tue Jun 14 10:17:34 2022 -0700
> > >
> > > tcp: fix possible freeze in tx path under memory pressure
> > >
> > > Blamed commit only dealt with applications issuing small writes.
> > >
> > > Issue here is that we allow to force memory schedule for the sk_buff
> > > allocation, but we have no guarantee that sendmsg() is able to
> > > copy some payload in it.
> > >
> > > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> > >
> > > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > > copy up to 2816 bytes under memory pressure.
> > >
> > > Before this patch a sendmsg() sending more than 2816 bytes
> > > would either block forever (if persistent memory pressure),
> > > or return -EAGAIN.
> > >
> > > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > > to avoid sending too small packets.
> > >
> > > v2: deal with zero copy paths.
> > >
> > > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > > Reviewed-by: Wei Wang <weiwan@google.com>
> > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > >
> > > This happens in a pretty stressful situation when two 100Gb (E810 or
> > > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > > from disks. So it appears that the system is constantly in a memory
> > > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > > the crash and system appears stable at delivering 180Gbps
> > >
> > > [ 2445.532318] ------------[ cut here ]------------
> > > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
It seems a third-party module is loaded.
Is it possible to reproduce the issue without out-of-tree modules?
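For reference, a minimal sketch in Python of how the "Tainted:" letters in the oops line can be decoded. The table is a partial one written from the kernel's tainted-kernels documentation as I understand it, and the names TAINT_FLAGS and decode_taint are hypothetical.

```python
from typing import List

# Partial table of taint letters; see the kernel's tainted-kernels
# documentation for the authoritative and complete list.
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "S": "kernel running on an out-of-specification system",
    "D": "kernel died recently (OOPS or BUG)",
    "W": "kernel issued a warning",
    "O": "externally built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
}


def decode_taint(flags: str) -> List[str]:
    """Return descriptions for the known taint letters, skipping blanks."""
    return [TAINT_FLAGS[c] for c in flags if c in TAINT_FLAGS]


# Taint string from the oops quoted in this thread.
for line in decode_taint("G S OE"):
    print(line)
```

The "O" and "E" letters are what point at an out-of-tree, unsigned module in the report.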
> > > 5.19.0-rc2+ #21
> > > [ 2445.560127] ------------[ cut here ]------------
> > > [ 2445.560565] Hardware name: Cisco Systems Inc
> > > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > > 11/21/2021
> > > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > > 41 39
> > > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > > knlGS:0000000000000000
> > > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 2445.561849] PKRU: 55555554
> > > [ 2445.561853] Call Trace:
> > > [ 2445.561858] <TASK>
> > > [ 2445.564202] ------------[ cut here ]------------
> > > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > > [ 2445.572264] ------------[ cut here ]------------
> > > [ 2445.574287] ------------[ cut here ]------------
> > > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > > --
> > > --
> > >
> > > --
> > > --
> > >
> > > Dmitry Kravkov Software Engineer
> > > Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
> >
> > Hi Dmitry, thanks for the report.
> >
> > Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> > /proc/sys/net/ipv4/tcp_rmem ?
> Thank you, Eric
>
> # cat /proc/sys/net/ipv4/tcp_wmem
> 786432 1048576 6291456
> # cat /proc/sys/net/ipv4/tcp_rmem
> 4096 87380 6291456
>
> >
> > Are you using memcg ?
> No
> >
> > Can you try the following patch ?
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> > msghdr *msg, size_t size)
> > continue;
> >
> > wait_for_space:
> > + tcp_remove_empty_skb(sk);
> > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > if (copied)
> > tcp_push(sk, flags & ~MSG_MORE, mss_now,
>
>
> The patched kernel crashed in the same manner:
> [ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 20:54 ` Kuniyuki Iwashima
@ 2023-10-11 21:19 ` Dmitry Kravkov
2023-10-12 16:49 ` Dmitry Kravkov
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Kravkov @ 2023-10-11 21:19 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: edumazet, netdev, slavas
On Wed, Oct 11, 2023 at 11:54 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From: Dmitry Kravkov <dmitryk@qwilt.com>
> Date: Wed, 11 Oct 2023 23:20:10 +0300
> > On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > > > in kernel that bisected to this commit:
> > > >
> > > > commit 849b425cd091e1804af964b771761cfbefbafb43
> > > > Author: Eric Dumazet <edumazet@google.com>
> > > > Date: Tue Jun 14 10:17:34 2022 -0700
> > > >
> > > > tcp: fix possible freeze in tx path under memory pressure
> > > >
> > > > Blamed commit only dealt with applications issuing small writes.
> > > >
> > > > Issue here is that we allow to force memory schedule for the sk_buff
> > > > allocation, but we have no guarantee that sendmsg() is able to
> > > > copy some payload in it.
> > > >
> > > > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> > > >
> > > > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > > > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > > > copy up to 2816 bytes under memory pressure.
> > > >
> > > > Before this patch a sendmsg() sending more than 2816 bytes
> > > > would either block forever (if persistent memory pressure),
> > > > or return -EAGAIN.
> > > >
> > > > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > > > to avoid sending too small packets.
> > > >
> > > > v2: deal with zero copy paths.
> > > >
> > > > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > > > Reviewed-by: Wei Wang <weiwan@google.com>
> > > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > >
> > > > This happens in a pretty stressful situation when two 100Gb (E810 or
> > > > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > > > from disks. So it appears that the system is constantly in a memory
> > > > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > > > the crash and system appears stable at delivering 180Gbps
> > > >
> > > > [ 2445.532318] ------------[ cut here ]------------
> > > > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
>
> It seems 3rd party module is loaded.
>
> Just curious if it is possible to reproduce the issue without
> out-of-tree modules.
Not sure if the ice driver is mature enough there. We will give it a try. Thanks.
>
>
> > > > 5.19.0-rc2+ #21
> > > > [ 2445.560127] ------------[ cut here ]------------
> > > > [ 2445.560565] Hardware name: Cisco Systems Inc
> > > > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > > > 11/21/2021
> > > > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > > > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > > > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > > > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > > > 41 39
> > > > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > > > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > > > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > > > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > > > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > > > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > > > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > > > knlGS:0000000000000000
> > > > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > > > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > [ 2445.561849] PKRU: 55555554
> > > > [ 2445.561853] Call Trace:
> > > > [ 2445.561858] <TASK>
> > > > [ 2445.564202] ------------[ cut here ]------------
> > > > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > > > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > > > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > > > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > > > [ 2445.572264] ------------[ cut here ]------------
> > > > [ 2445.574287] ------------[ cut here ]------------
> > > > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > > > --
> > > > --
> > > >
> > > > --
> > > > --
> > > >
> > > > Dmitry Kravkov Software Engineer
> > > > Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
> > >
> > > Hi Dmitry, thanks for the report.
> > >
> > > Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> > > /proc/sys/net/ipv4/tcp_rmem ?
> > Thank you, Eric
> >
> > # cat /proc/sys/net/ipv4/tcp_wmem
> > 786432 1048576 6291456
> > # cat /proc/sys/net/ipv4/tcp_rmem
> > 4096 87380 6291456
> >
> > >
> > > Are you using memcg ?
> > No
> > >
> > > Can you try the following patch ?
> > >
> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> > > 100644
> > > --- a/net/ipv4/tcp.c
> > > +++ b/net/ipv4/tcp.c
> > > @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> > > msghdr *msg, size_t size)
> > > continue;
> > >
> > > wait_for_space:
> > > + tcp_remove_empty_skb(sk);
> > > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > > if (copied)
> > > tcp_push(sk, flags & ~MSG_MORE, mss_now,
> >
> >
> > The patched kernel crashed in the same manner:
> > [ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
>
--
Dmitry Kravkov Software Engineer
Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 21:19 ` Dmitry Kravkov
@ 2023-10-12 16:49 ` Dmitry Kravkov
2023-10-16 11:10 ` Dmitry Kravkov
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Kravkov @ 2023-10-12 16:49 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: edumazet, netdev, slavas
On Thu, Oct 12, 2023 at 12:19 AM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
>
> On Wed, Oct 11, 2023 at 11:54 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From: Dmitry Kravkov <dmitryk@qwilt.com>
> > Date: Wed, 11 Oct 2023 23:20:10 +0300
> > > On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > > > > in kernel that bisected to this commit:
> > > > >
> > > > > commit 849b425cd091e1804af964b771761cfbefbafb43
> > > > > Author: Eric Dumazet <edumazet@google.com>
> > > > > Date: Tue Jun 14 10:17:34 2022 -0700
> > > > >
> > > > > tcp: fix possible freeze in tx path under memory pressure
> > > > >
> > > > > Blamed commit only dealt with applications issuing small writes.
> > > > >
> > > > > Issue here is that we allow to force memory schedule for the sk_buff
> > > > > allocation, but we have no guarantee that sendmsg() is able to
> > > > > copy some payload in it.
> > > > >
> > > > > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> > > > >
> > > > > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > > > > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > > > > copy up to 2816 bytes under memory pressure.
> > > > >
> > > > > Before this patch a sendmsg() sending more than 2816 bytes
> > > > > would either block forever (if persistent memory pressure),
> > > > > or return -EAGAIN.
> > > > >
> > > > > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > > > > to avoid sending too small packets.
> > > > >
> > > > > v2: deal with zero copy paths.
> > > > >
> > > > > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > > > > Reviewed-by: Wei Wang <weiwan@google.com>
> > > > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > > >
> > > > > This happens in a pretty stressful situation when two 100Gb (E810 or
> > > > > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > > > > from disks. So it appears that the system is constantly in a memory
> > > > > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > > > > the crash and system appears stable at delivering 180Gbps
> > > > >
> > > > > [ 2445.532318] ------------[ cut here ]------------
> > > > > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
> >
> > It seems 3rd party module is loaded.
> >
> > Just curious if it is possible to reproduce the issue without
> > out-of-tree modules.
> Not sure if ice driver is mature enough there. We will give it a try. Thanks
It happens on a non-tainted kernel too (we went for 6.1.38 to get a more mature ice driver):
[ 1057.099780] ------------[ cut here ]------------
[ 1057.100389] RSP: 0018:ffffaa4e10093df0 EFLAGS: 00010286
[ 1057.101060] kernel BUG at net/ipv4/tcp_output.c:2645!
[ 1057.122021] RAX: 00000000ffff4000 RBX: ffff8ccad77e3540 RCX: 0000000000000000
[ 1057.122025] RDX: 0000000000000000 RSI: 00000000ffff4000 RDI: ffff8ccad77e3540
[ 1057.122027] RBP: ffff8ccad77e3480 R08: ffff8ccad77e35d4 R09: 0000000080400013
[ 1057.122029] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ccad77e3480
[ 1057.122031] R13: 7fffffffffffff00 R14: ffff8ccad77e3698 R15: 0000000000000000
[ 1057.122033] FS: 00007fd600d42840(0000) GS:ffff8ce1ffac0000(0000)
knlGS:0000000000000000
[ 1057.122035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1057.122038] CR2: 00007fd57dacdc80 CR3: 00000041dda7a005 CR4: 0000000000770ee0
[ 1057.122041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1057.122042] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1057.122044] PKRU: 55555554
[ 1057.122046] Call Trace:
[ 1057.122880] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 1057.123683] <TASK>
[ 1057.124409] CPU: 112 PID: 51072 Comm: nginx Not tainted 6.1.38 #27
[ 1057.125125] ? show_trace_log_lvl+0x1c4/0x2df
[ 1057.125812] Hardware name: Cisco Systems Inc
UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
11/21/2021
[ 1057.125815] RIP: 0010:tcp_write_xmit+0x70f/0x830
[ 1057.126559] ? show_trace_log_lvl+0x1c4/0x2df
> >
> >
> > > > > 5.19.0-rc2+ #21
> > > > > [ 2445.560127] ------------[ cut here ]------------
> > > > > [ 2445.560565] Hardware name: Cisco Systems Inc
> > > > > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > > > > 11/21/2021
> > > > > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > > > > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > > > > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > > > > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > > > > 41 39
> > > > > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > > > > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > > > > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > > > > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > > > > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > > > > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > > > > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > > > > knlGS:0000000000000000
> > > > > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > > > > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > [ 2445.561849] PKRU: 55555554
> > > > > [ 2445.561853] Call Trace:
> > > > > [ 2445.561858] <TASK>
> > > > > [ 2445.564202] ------------[ cut here ]------------
> > > > > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > > > > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > > > > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > > > > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > > > > [ 2445.572264] ------------[ cut here ]------------
> > > > > [ 2445.574287] ------------[ cut here ]------------
> > > > > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > > > > --
> > > > > --
> > > > >
> > > > > --
> > > > > --
> > > > >
> > > > > Dmitry Kravkov Software Engineer
> > > > > Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
> > > >
> > > > Hi Dmitry, thanks for the report.
> > > >
> > > > Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> > > > /proc/sys/net/ipv4/tcp_rmem ?
> > > Thank you, Eric
> > >
> > > # cat /proc/sys/net/ipv4/tcp_wmem
> > > 786432 1048576 6291456
> > > # cat /proc/sys/net/ipv4/tcp_rmem
> > > 4096 87380 6291456
> > >
> > > >
> > > > Are you using memcg ?
> > > No
> > > >
> > > > Can you try the following patch ?
> > > >
> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> > > > 100644
> > > > --- a/net/ipv4/tcp.c
> > > > +++ b/net/ipv4/tcp.c
> > > > @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> > > > msghdr *msg, size_t size)
> > > > continue;
> > > >
> > > > wait_for_space:
> > > > + tcp_remove_empty_skb(sk);
> > > > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > > > if (copied)
> > > > tcp_push(sk, flags & ~MSG_MORE, mss_now,
> > >
> > >
> > > The patched kernel crashed in the same manner:
> > > [ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
> >
>
>
> --
> --
>
> Dmitry Kravkov Software Engineer
> Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
--
--
Dmitry Kravkov Software Engineer
Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-12 16:49 ` Dmitry Kravkov
@ 2023-10-16 11:10 ` Dmitry Kravkov
2023-10-16 11:13 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Kravkov @ 2023-10-16 11:10 UTC (permalink / raw)
To: Kuniyuki Iwashima; +Cc: edumazet, netdev, slavas
On Thu, Oct 12, 2023 at 7:49 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
>
> On Thu, Oct 12, 2023 at 12:19 AM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> >
> > On Wed, Oct 11, 2023 at 11:54 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From: Dmitry Kravkov <dmitryk@qwilt.com>
> > > Date: Wed, 11 Oct 2023 23:20:10 +0300
> > > > On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > > > > > in kernel that bisected to this commit:
> > > > > >
> > > > > > commit 849b425cd091e1804af964b771761cfbefbafb43
> > > > > > Author: Eric Dumazet <edumazet@google.com>
> > > > > > Date: Tue Jun 14 10:17:34 2022 -0700
> > > > > >
> > > > > > tcp: fix possible freeze in tx path under memory pressure
> > > > > >
> > > > > > Blamed commit only dealt with applications issuing small writes.
> > > > > >
> > > > > > Issue here is that we allow to force memory schedule for the sk_buff
> > > > > > allocation, but we have no guarantee that sendmsg() is able to
> > > > > > copy some payload in it.
> > > > > >
> > > > > > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> > > > > >
> > > > > > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > > > > > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > > > > > copy up to 2816 bytes under memory pressure.
> > > > > >
> > > > > > Before this patch a sendmsg() sending more than 2816 bytes
> > > > > > would either block forever (if persistent memory pressure),
> > > > > > or return -EAGAIN.
> > > > > >
> > > > > > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > > > > > to avoid sending too small packets.
> > > > > >
> > > > > > v2: deal with zero copy paths.
> > > > > >
> > > > > > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > > > > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > > > > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > > > > > Reviewed-by: Wei Wang <weiwan@google.com>
> > > > > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > > > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > > > >
> > > > > > This happens in a pretty stressful situation when two 100Gb (E810 or
> > > > > > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > > > > > from disks. So it appears that the system is constantly in a memory
> > > > > > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > > > > > the crash and system appears stable at delivering 180Gbps
> > > > > >
> > > > > > [ 2445.532318] ------------[ cut here ]------------
> > > > > > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > > > > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
> > >
> > > It seems 3rd party module is loaded.
> > >
> > > Just curious if it is possible to reproduce the issue without
> > > out-of-tree modules.
> > Not sure if ice driver is mature enough there. We will give it a try. Thanks
>
> It happens on a non-tainted kernel too (we went with 6.1.38 to get a more mature ice driver).
>
> [ 1057.099780] ------------[ cut here ]------------
> [ 1057.100389] RSP: 0018:ffffaa4e10093df0 EFLAGS: 00010286
> [ 1057.101060] kernel BUG at net/ipv4/tcp_output.c:2645!
> [ 1057.122021] RAX: 00000000ffff4000 RBX: ffff8ccad77e3540 RCX: 0000000000000000
> [ 1057.122025] RDX: 0000000000000000 RSI: 00000000ffff4000 RDI: ffff8ccad77e3540
> [ 1057.122027] RBP: ffff8ccad77e3480 R08: ffff8ccad77e35d4 R09: 0000000080400013
> [ 1057.122029] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ccad77e3480
> [ 1057.122031] R13: 7fffffffffffff00 R14: ffff8ccad77e3698 R15: 0000000000000000
> [ 1057.122033] FS: 00007fd600d42840(0000) GS:ffff8ce1ffac0000(0000)
> knlGS:0000000000000000
> [ 1057.122035] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1057.122038] CR2: 00007fd57dacdc80 CR3: 00000041dda7a005 CR4: 0000000000770ee0
> [ 1057.122041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1057.122042] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1057.122044] PKRU: 55555554
> [ 1057.122046] Call Trace:
> [ 1057.122880] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 1057.123683] <TASK>
> [ 1057.124409] CPU: 112 PID: 51072 Comm: nginx Not tainted 6.1.38 #27
> [ 1057.125125] ? show_trace_log_lvl+0x1c4/0x2df
> [ 1057.125812] Hardware name: Cisco Systems Inc
> UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> 11/21/2021
> [ 1057.125815] RIP: 0010:tcp_write_xmit+0x70f/0x830
> [ 1057.126559] ? show_trace_log_lvl+0x1c4/0x2df
>
>
>
> > >
> > >
> > > > > > 5.19.0-rc2+ #21
> > > > > > [ 2445.560127] ------------[ cut here ]------------
> > > > > > [ 2445.560565] Hardware name: Cisco Systems Inc
> > > > > > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > > > > > 11/21/2021
> > > > > > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > > > > > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > > > > > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > > > > > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > > > > > 41 39
> > > > > > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > > > > > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > > > > > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > > > > > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > > > > > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > > > > > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > > > > > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > > > > > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > [ 2445.561849] PKRU: 55555554
> > > > > > [ 2445.561853] Call Trace:
> > > > > > [ 2445.561858] <TASK>
> > > > > > [ 2445.564202] ------------[ cut here ]------------
> > > > > > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > > > > > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > > > > > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > > > > > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > > > > > [ 2445.572264] ------------[ cut here ]------------
> > > > > > [ 2445.574287] ------------[ cut here ]------------
> > > > > > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > > > > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > > > > > --
> > > > > > --
> > > > > >
> > > > > > --
> > > > > > --
> > > > > >
> > > > >
> > > > > Hi Dmitry, thanks for the report.
> > > > >
> > > > > Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> > > > > /proc/sys/net/ipv4/tcp_rmem ?
> > > > Thank you, Eric
> > > >
> > > > # cat /proc/sys/net/ipv4/tcp_wmem
> > > > 786432 1048576 6291456
> > > > # cat /proc/sys/net/ipv4/tcp_rmem
> > > > 4096 87380 6291456
> > > >
> > > > >
> > > > > Are you using memcg ?
> > > > No
> > > > >
> > > > > Can you try the following patch ?
> > > > >
> > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > > > > index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> > > > > 100644
> > > > > --- a/net/ipv4/tcp.c
> > > > > +++ b/net/ipv4/tcp.c
> > > > > @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> > > > > msghdr *msg, size_t size)
> > > > > continue;
> > > > >
> > > > > wait_for_space:
> > > > > + tcp_remove_empty_skb(sk);
> > > > > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > > > > if (copied)
> > > > > tcp_push(sk, flags & ~MSG_MORE, mss_now,
> > > >
> > > >
> > > > The patched kernel crashed in the same manner:
> > > > [ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
Hi Eric, do you think we can try something to avoid the crash?
Decreasing tcp_wmem[0] did not help
# cat /proc/sys/net/ipv4/tcp_wmem
4096 1048576 629145
> > >
> >
> >
> > --
> > --
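For reference, the budget arithmetic in the quoted commit message can be written out as a small userspace sketch. It only reproduces the example numbers (tcp_wmem[0] = 4096, skb->truesize = 1280, hence 2816 bytes); `payload_budget` is a hypothetical helper and not a kernel function, and the real check in the TCP send path is more involved than a single subtraction.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): bytes of payload that still fit
 * when a socket is allowed to use up to wmem0 bytes under memory pressure
 * and the freshly allocated skb already accounts for truesize bytes.
 */
size_t payload_budget(size_t wmem0, size_t truesize)
{
    /* Clamp at zero so a truesize larger than the budget cannot wrap. */
    return wmem0 > truesize ? wmem0 - truesize : 0;
}
```

With the values from this thread, `payload_budget(4096, 1280)` gives the 2816 bytes quoted in the commit message, while the original setting of 786432 would leave 785152 bytes under the same assumed truesize.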
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-11 20:20 ` Dmitry Kravkov
2023-10-11 20:54 ` Kuniyuki Iwashima
@ 2023-10-16 11:12 ` Eric Dumazet
1 sibling, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2023-10-16 11:12 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: netdev, Slava (Ice) Sheremet
On Wed, Oct 11, 2023 at 10:20 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
>
> On Wed, Oct 11, 2023 at 5:02 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Wed, Oct 11, 2023 at 12:28 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> > >
> > > Hi,
> > >
> > > In our try to upgrade from 5.10 to 6.1 kernel we noticed stable crash
> > > in kernel that bisected to this commit:
> > >
> > > commit 849b425cd091e1804af964b771761cfbefbafb43
> > > Author: Eric Dumazet <edumazet@google.com>
> > > Date: Tue Jun 14 10:17:34 2022 -0700
> > >
> > > tcp: fix possible freeze in tx path under memory pressure
> > >
> > > Blamed commit only dealt with applications issuing small writes.
> > >
> > > Issue here is that we allow to force memory schedule for the sk_buff
> > > allocation, but we have no guarantee that sendmsg() is able to
> > > copy some payload in it.
> > >
> > > In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
> > >
> > > For example, if we consider tcp_wmem[0] = 4096 (default on x86),
> > > and initial skb->truesize being 1280, tcp_sendmsg() is able to
> > > copy up to 2816 bytes under memory pressure.
> > >
> > > Before this patch a sendmsg() sending more than 2816 bytes
> > > would either block forever (if persistent memory pressure),
> > > or return -EAGAIN.
> > >
> > > For bigger MTU networks, it is advised to increase tcp_wmem[0]
> > > to avoid sending too small packets.
> > >
> > > v2: deal with zero copy paths.
> > >
> > > Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
> > > Reviewed-by: Wei Wang <weiwan@google.com>
> > > Reviewed-by: Shakeel Butt <shakeelb@google.com>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > >
> > > This happens in a pretty stressful situation when two 100Gb (E810 or
> > > ConnectX6) ports transmit above 150Gbps that most of the data is read
> > > from disks. So it appears that the system is constantly in a memory
> > > deficit. Apparently reverting the patch in 6.1.38 kernel eliminates
> > > the crash and system appears stable at delivering 180Gbps
> > >
> > > [ 2445.532318] ------------[ cut here ]------------
> > > [ 2445.532323] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.532334] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > [ 2445.550934] CPU: 61 PID: 109767 Comm: nginx Tainted: G S OE
> > > 5.19.0-rc2+ #21
> > > [ 2445.560127] ------------[ cut here ]------------
> > > [ 2445.560565] Hardware name: Cisco Systems Inc
> > > UCSC-C220-M6N/UCSC-C220-M6N, BIOS C220M6.4.2.1g.0.1121212157
> > > 11/21/2021
> > > [ 2445.560571] RIP: 0010:tcp_write_xmit+0x70b/0x830
> > > [ 2445.561221] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.561821] Code: 84 0b fc ff ff 0f b7 43 32 41 39 c6 0f 84 fe fb
> > > ff ff 8b 43 70 41 39 c6 0f 82 ff 00 00 00 c7 43 30 01 00 00 00 e9 e6
> > > fb ff ff <0f> 0b 8b 74 24 20 8b 85 dc 05 00 00 44 89 ea 01 c8 2b 43 28
> > > 41 39
> > > [ 2445.561828] RSP: 0000:ffffc110ed647dc0 EFLAGS: 00010246
> > > [ 2445.561832] RAX: 0000000000000000 RBX: ffff9fe1f8081a00 RCX: 00000000000005a8
> > > [ 2445.561833] RDX: 000000000000043a RSI: 000002389172f8f4 RDI: 000000000000febf
> > > [ 2445.561835] RBP: ffff9fe5f864e900 R08: 0000000000000000 R09: 0000000000000100
> > > [ 2445.561836] R10: ffffffff9be060d0 R11: 000000000000000e R12: ffff9fe5f864e901
> > > [ 2445.561837] R13: 0000000000000001 R14: 00000000000005a8 R15: 0000000000000000
> > > [ 2445.561839] FS: 00007f342530c840(0000) GS:ffff9ffa7f940000(0000)
> > > knlGS:0000000000000000
> > > [ 2445.561842] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2445.561844] CR2: 00007f20ca4ed830 CR3: 00000045d976e005 CR4: 0000000000770ee0
> > > [ 2445.561846] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 2445.561847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 2445.561849] PKRU: 55555554
> > > [ 2445.561853] Call Trace:
> > > [ 2445.561858] <TASK>
> > > [ 2445.564202] ------------[ cut here ]------------
> > > [ 2445.568007] ? tcp_tasklet_func+0x120/0x120
> > > [ 2445.569107] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.569608] tcp_tsq_handler+0x7c/0xa0
> > > [ 2445.569627] tcp_pace_kick+0x19/0x60
> > > [ 2445.569632] __run_hrtimer+0x5c/0x1d0
> > > [ 2445.572264] ------------[ cut here ]------------
> > > [ 2445.574287] ------------[ cut here ]------------
> > > [ 2445.574292] kernel BUG at net/ipv4/tcp_output.c:2642!
> > > [ 2445.582581] __hrtimer_run_queues+0x7d/0xe0
> > > --
> > > --
> > >
> > > --
> > > --
> > >
> > > Dmitry Kravkov Software Engineer
> > > Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
> >
> > Hi Dmitry, thanks for the report.
> >
> > Can you post content of /proc/sys/net/ipv4/tcp_wmem and
> > /proc/sys/net/ipv4/tcp_rmem ?
> Thank you, Eric
>
> # cat /proc/sys/net/ipv4/tcp_wmem
> 786432 1048576 6291456
Hmm. This does look strange to me.
tcp_wmem[0] is supposed to be small.
> # cat /proc/sys/net/ipv4/tcp_rmem
> 4096 87380 6291456
>
> >
> > Are you using memcg ?
> No
> >
> > Can you try the following patch ?
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 3f66cdeef7decb5b5d2b84212c623781b8ce63db..d74b197e02e94aa2f032f2c3971969e604abc7de
> > 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1286,6 +1286,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct
> > msghdr *msg, size_t size)
> > continue;
> >
> > wait_for_space:
> > + tcp_remove_empty_skb(sk);
> > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > if (copied)
> > tcp_push(sk, flags & ~MSG_MORE, mss_now,
>
>
> The patched kernel crashed in the same manner:
> [ 2214.154278] kernel BUG at net/ipv4/tcp_output.c:2642!
>
>
>
> --
> --
>
> Dmitry Kravkov Software Engineer
> Qwilt | Mobile: +972-54-4839923 | dmitryk@qwilt.com
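The values requested above come from procfs as a "min default max" triple. A minimal sketch for reading them programmatically instead of with cat is shown below; `parse_tcp_mem` and `read_tcp_mem` are illustrative names and not part of any existing library.

```c
#include <stdio.h>

/*
 * Parse one "min default max" line as printed by
 * /proc/sys/net/ipv4/tcp_wmem or /proc/sys/net/ipv4/tcp_rmem.
 * Returns 0 on success, -1 if the line does not hold three numbers.
 */
int parse_tcp_mem(const char *line, long vals[3])
{
    int n = sscanf(line, "%ld %ld %ld", &vals[0], &vals[1], &vals[2]);

    return n == 3 ? 0 : -1;
}

/* Read the triple from a procfs path; returns 0 on success, -1 on error. */
int read_tcp_mem(const char *path, long vals[3])
{
    char buf[128];
    int rc = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        rc = parse_tcp_mem(buf, vals);
    fclose(f);
    return rc;
}
```

Calling `read_tcp_mem("/proc/sys/net/ipv4/tcp_wmem", vals)` on the reporter's machine would be expected to fill `vals[0]` with 786432, the value discussed above.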
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-16 11:10 ` Dmitry Kravkov
@ 2023-10-16 11:13 ` Eric Dumazet
2023-10-16 11:17 ` Eric Dumazet
0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2023-10-16 11:13 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: Kuniyuki Iwashima, netdev, slavas
On Mon, Oct 16, 2023 at 1:10 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
> Hi Eric, do you think we can try something to avoid the crash?
> Decreasing tcp_wmem[0] did not help
> # cat /proc/sys/net/ipv4/tcp_wmem
> 4096 1048576 629145
>
>
Honestly I have no idea, it seems strange that you are the only one to
hit such a bug.
* Re: kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer
2023-10-16 11:13 ` Eric Dumazet
@ 2023-10-16 11:17 ` Eric Dumazet
0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2023-10-16 11:17 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: Kuniyuki Iwashima, netdev, slavas
On Mon, Oct 16, 2023 at 1:13 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 16, 2023 at 1:10 PM Dmitry Kravkov <dmitryk@qwilt.com> wrote:
>
> > Hi Eric, do you think we can try something to avoid the crash?
> > Decreasing tcp_wmem[0] did not help
> > # cat /proc/sys/net/ipv4/tcp_wmem
> > 4096 1048576 629145
> >
> >
>
> Honestly I have no idea, it seems strange that you are the only one to
> hit such a bug.
Please tell us which Congestion Control module and qdisc you use.
If you use BBR and/or SO_MAX_PACING_RATE, I would recommend switching to sch_fq
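One way to answer the congestion control question from inside a process is the Linux TCP_CONGESTION socket option. The sketch below is only an illustration: `tcp_cc_name`, `default_tcp_cc_name` and `CC_NAME_LEN` are names made up for this example, and the buffer size of 16 is an assumption meant to match the kernel's limit on congestion control names.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define CC_NAME_LEN 16 /* assumption: large enough for any module name */

/* Copy the congestion control name used by socket fd into name. */
int tcp_cc_name(int fd, char name[CC_NAME_LEN])
{
    socklen_t len = CC_NAME_LEN;

    memset(name, 0, CC_NAME_LEN);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name, &len) < 0)
        return -1;
    name[CC_NAME_LEN - 1] = '\0'; /* keep the result NUL-terminated */
    return 0;
}

/* Report the name a freshly created TCP socket starts with. */
int default_tcp_cc_name(char name[CC_NAME_LEN])
{
    int rc;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    rc = tcp_cc_name(fd, name);
    close(fd);
    return rc;
}
```

The system-wide default is also visible in /proc/sys/net/ipv4/tcp_congestion_control, and the qdisc attached to an interface can be listed with `tc qdisc show dev <iface>`.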
Thread overview: 10+ messages
2023-10-11 10:28 kernel BUG at net/ipv4/tcp_output.c:2642 with kernel 5.19.0-rc2 and newer Dmitry Kravkov
2023-10-11 14:02 ` Eric Dumazet
2023-10-11 20:20 ` Dmitry Kravkov
2023-10-11 20:54 ` Kuniyuki Iwashima
2023-10-11 21:19 ` Dmitry Kravkov
2023-10-12 16:49 ` Dmitry Kravkov
2023-10-16 11:10 ` Dmitry Kravkov
2023-10-16 11:13 ` Eric Dumazet
2023-10-16 11:17 ` Eric Dumazet
2023-10-16 11:12 ` Eric Dumazet