From: Eric Dumazet <eric.dumazet@gmail.com>
To: Laurent Fasnacht <laurent.fasnacht@proton.ch>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Eric Dumazet <edumazet@google.com>,
Yuchung Cheng <ycheng@google.com>,
Neal Cardwell <ncardwell@google.com>
Subject: Re: Bug in tcp_rtx_synack?
Date: Mon, 30 May 2022 11:09:30 -0700 [thread overview]
Message-ID: <7492f8e7-d4cc-8fe2-a02c-6749a8068be5@gmail.com> (raw)
In-Reply-To: <99ZT3wzzJiMfHBn9Ul-NdFqpZAo3QoZbOGfgFx-X60_EOIzwtUNC6991CzKn0CSNukTVz1ib9TrLSgTlhePSDVK70nTaQlx5oTxXHYbsSyg=@proton.ch>
On 5/30/22 03:08, Laurent Fasnacht wrote:
> Hello,
>
> I'm having the following bug on a 5.16 kernel, it happens periodically (a few times per day, on every of our production server that has this kernel). I'm unable to reproduce on machines with lower load and I also know for sure that it doesn't happen on the 5.10 kernel.
>
> I wonder if it's related to trace_tcp_retransmit_synack?
>
> I'm happy to help, let me know.
>
> Cheers,
> Laurent
>
> ---
>
> (gdb) l *(tcp_rtx_synack+0x8d)
> 0xffffffff817ee76d is in tcp_rtx_synack (arch/x86/include/asm/preempt.h:95).
> 90 * a decrement which hits zero means we have no preempt_count and should
> 91 * reschedule.
> 92 */
> 93 static __always_inline bool __preempt_count_dec_and_test(void)
> 94 {
> 95 return GEN_UNARY_RMWcc("decl", __preempt_count, e, __percpu_arg([var]));
> 96 }
> 97
> 98 /*
> 99 * Returns true when we need to resched and can (barring IRQ state).
> (gdb) l *(tcp_rtx_synack+0x8d-4)
> 0xffffffff817ee769 is in tcp_rtx_synack (include/trace/events/tcp.h:190).
> 185 TP_PROTO(struct sock *sk),
> 186
> 187 TP_ARGS(sk)
> 188 );
> 189
> 190 TRACE_EVENT(tcp_retransmit_synack,
> 191
> 192 TP_PROTO(const struct sock *sk, const struct request_sock *req),
> 193
> 194 TP_ARGS(sk, req),
>
> --
>
> BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
> caller is tcp_rtx_synack.part.0+0x36/0xc0
> CPU: 10 PID: 2180 Comm: epollpep Tainted: G OE 5.16.0-0.bpo.4-amd64 #1 Debian 5.16.12-1~bpo11+1
> Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
> Call Trace:
> <TASK>
> dump_stack_lvl+0x48/0x5e
> check_preemption_disabled+0xde/0xe0
> tcp_rtx_synack.part.0+0x36/0xc0
> tcp_rtx_synack+0x8d/0xa0
> ? kmem_cache_alloc+0x2e0/0x3e0
> ? apparmor_file_alloc_security+0x3b/0x1f0
> inet_rtx_syn_ack+0x16/0x30
> tcp_check_req+0x367/0x610
> tcp_rcv_state_process+0x91/0xf60
> ? get_nohz_timer_target+0x18/0x1a0
> ? lock_timer_base+0x61/0x80
> ? preempt_count_add+0x68/0xa0
> tcp_v4_do_rcv+0xbd/0x270
> __release_sock+0x6d/0xb0
> release_sock+0x2b/0x90
> sock_setsockopt+0x138/0x1140
> ? __sys_getsockname+0x7e/0xc0
> ? aa_sk_perm+0x3e/0x1a0
> __sys_setsockopt+0x198/0x1e0
> __x64_sys_setsockopt+0x21/0x30
> do_syscall_64+0x38/0xc0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fefe7d4441a
> Code: ff ff ff c3 0f 1f 40 00 48 8b 15 71 ea 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 ea 0b 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffca1cd0ab8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
> RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fefe7d4441a
> RDX: 0000000000000009 RSI: 0000000000000001 RDI: 00000000000006f3
> RBP: 00007ffca1cd1410 R08: 0000000000000004 R09: 0000560e9f8a55ec
> R10: 00007ffca1cd10f0 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffca1cd1190 R14: 00007ffca1cd1198 R15: 00007ffca1cd23f0
> </TASK>
>
I think this is related to FastOpen server code.
It seems that we retransmit a SYNACK if we receive the SYN another time.
This means tcp_rtx_synack() can be called from process context (instead
of BH handler) in this particular case.
Fix could be:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index
b4b2284ed4a2c9e2569bd945e3b4e023c5502f25..1c054431e358328fe3849f5a45aaa88308a1e1c8
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -4115,8 +4115,8 @@ int tcp_rtx_synack(const struct sock *sk, struct
request_sock *req)
res = af_ops->send_synack(sk, NULL, &fl, req, NULL,
TCP_SYNACK_NORMAL,
NULL);
if (!res) {
- __TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
- __NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
+ TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
if (unlikely(tcp_passive_fastopen(sk)))
tcp_sk(sk)->total_retrans++;
trace_tcp_retransmit_synack(sk, req);
prev parent reply other threads:[~2022-05-30 18:09 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-05-30 10:08 Bug in tcp_rtx_synack? Laurent Fasnacht
2022-05-30 18:09 ` Eric Dumazet [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7492f8e7-d4cc-8fe2-a02c-6749a8068be5@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=edumazet@google.com \
--cc=laurent.fasnacht@proton.ch \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).