From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Watson
Subject: Re: kTLS in combination with mlx4 is very unstable
Date: Tue, 1 May 2018 09:09:08 -0700
Message-ID: <20180501160908.GA26223@advait-mbp.dhcp.thefacebook.com>
References: <20180424170100.GA40104@fidjisimo-mbp.dhcp.thefacebook.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: netdev , , Aviad Yehezkel
To: Andre Tomt
Return-path:
Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:40212 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753505AbeEAQJV (ORCPT ); Tue, 1 May 2018 12:09:21 -0400
Content-Disposition: inline
In-Reply-To: <20180424170100.GA40104@fidjisimo-mbp.dhcp.thefacebook.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

Hi Andre,

On 04/24/18 10:01 AM, Dave Watson wrote:
> On 04/22/18 11:21 PM, Andre Tomt wrote:
> > The kernel seems to get increasingly unstable as I load it up with client
> > connections. At about 9Gbps and 700 connections, it is okay at least for a
> > while - it might run fine for say 45 minutes. Once it gets to 20 - 30Gbps,
> > the kernel will usually start spewing OOPSes within minutes and the traffic
> > drops.
> >
> > Some bad interaction between mlx4 and kTLS?

I tried to repro, but wasn't able to - of course I don't have an mlx4
test setup. If I manually add a tls_write_space call after
do_tcp_sendpages, I get a similar stack though.

Something like the following should work, can you test?

Thanks

diff --git a/include/net/tls.h b/include/net/tls.h
index 8c56809..ee78f33 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -187,6 +187,7 @@ struct tls_context {
 	struct scatterlist *partially_sent_record;
 	u16 partially_sent_offset;
 	unsigned long flags;
+	bool in_tcp_sendpages;
 
 	u16 pending_open_record_frags;
 	int (*push_pending_record)(struct sock *sk, int flags);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 3aafb87..095af65 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -114,6 +114,7 @@ int tls_push_sg(struct sock *sk,
 	size = sg->length - offset;
 	offset += sg->offset;
 
+	ctx->in_tcp_sendpages = 1;
 	while (1) {
 		if (sg_is_last(sg))
 			sendpage_flags = flags;
@@ -148,6 +149,8 @@ int tls_push_sg(struct sock *sk,
 	}
 
 	clear_bit(TLS_PENDING_CLOSED_RECORD, &ctx->flags);
+	ctx->in_tcp_sendpages = 0;
+	ctx->sk_write_space(sk);
 
 	return 0;
 }
@@ -217,6 +220,9 @@ static void tls_write_space(struct sock *sk)
 {
 	struct tls_context *ctx = tls_get_ctx(sk);
 
+	if (ctx->in_tcp_sendpages)
+		return;
+
 	if (!sk->sk_write_pending && tls_is_pending_closed_record(ctx)) {
 		gfp_t sk_allocation = sk->sk_allocation;
 		int rc;
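
For illustration only, here is a rough user-space sketch of the re-entrancy
guard the patch above adds. It is not kernel code: struct demo_ctx,
demo_push, demo_write_space and run_demo are hypothetical names made up for
this example, and the model assumes the transport invokes the write-space
callback once per record while a send is still in progress. It only shows
that the in-progress flag keeps the push path from nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tls_context; not the kernel type. */
struct demo_ctx {
	bool in_send;    /* plays the role of in_tcp_sendpages in the patch */
	bool use_guard;  /* toggles the early return in demo_write_space */
	int pending;     /* records still waiting to be pushed */
	int depth;       /* current nesting depth of demo_push */
	int max_depth;   /* deepest nesting observed so far */
};

static void demo_write_space(struct demo_ctx *ctx);

/* Models the push path: each send may fire the write-space callback. */
static void demo_push(struct demo_ctx *ctx)
{
	ctx->depth++;
	if (ctx->depth > ctx->max_depth)
		ctx->max_depth = ctx->depth;

	ctx->in_send = true;
	while (ctx->pending > 0) {
		ctx->pending--;
		/* Assumed behaviour: the callback fires in the middle of a send. */
		demo_write_space(ctx);
	}
	ctx->in_send = false;

	ctx->depth--;
}

/* Models the write-space callback, which tries to push pending data. */
static void demo_write_space(struct demo_ctx *ctx)
{
	/* The guard: do nothing if a push is already running on this context. */
	if (ctx->use_guard && ctx->in_send)
		return;

	if (ctx->pending > 0)
		demo_push(ctx);
}

/* Runs one simulation and returns the deepest nesting of demo_push. */
static int run_demo(bool use_guard, int pending)
{
	struct demo_ctx ctx = { .use_guard = use_guard, .pending = pending };

	assert(pending >= 0);
	demo_push(&ctx);
	return ctx.max_depth;
}
```

In this model, calling run_demo with use_guard set to false lets demo_push
nest once per pending record, while setting it to true keeps the nesting
depth at one. The sketch leaves out the explicit ctx->sk_write_space(sk)
call that the patch makes once the flag is cleared.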