From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
Date: Tue, 20 Nov 2018 05:49:47 -0800
Message-ID:
References: <3c8b5aea0c812323d8e15b548789a1e240f499d7.1542709015.git.pabeni@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: "David S. Miller" , Willem de Bruijn , Eric Dumazet
To: Paolo Abeni , netdev@vger.kernel.org
Return-path:
Received: from mail-pf1-f196.google.com ([209.85.210.196]:45745 "EHLO mail-pf1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726119AbeKUATF (ORCPT); Tue, 20 Nov 2018 19:19:05 -0500
Received: by mail-pf1-f196.google.com with SMTP id g62so1017318pfd.12 for; Tue, 20 Nov 2018 05:49:50 -0800 (PST)
In-Reply-To: <3c8b5aea0c812323d8e15b548789a1e240f499d7.1542709015.git.pabeni@redhat.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 11/20/2018 02:17 AM, Paolo Abeni wrote:
> Eric noted that with UDP GRO and napi timeout, we could keep a single
> UDP packet inside the GRO hash forever, if the related NAPI instance
> calls napi_gro_complete() at a higher frequency than the napi timeout.
> Willem noted that even TCP packets could be trapped there, till the
> next retransmission.
> This patch tries to address the issue, flushing the oldest packets before
> scheduling the NAPI timeout. The rationale is that such a timeout should be
> well below a jiffy and we are not flushing packets eligible for sane GRO.
>
> Reported-by: Eric Dumazet
> Signed-off-by: Paolo Abeni
> ---
> Sending as RFC, as I fear I'm missing some relevant pieces.
> Also I'm unsure if this should be considered a fix for "udp: implement
> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"

You can add both, no worries.

Google DC TCP forces a PSH flag on all TSO packets, so for us the flush
is done because of the PSH flag, not upon a timer/jiffie.
Truth be told, relying on jiffies change is a bit fragile for HZ=100 or
HZ=250 kernels. See the recent TCP commit that got rid of the
tcp_tso_should_defer() dependency on HZ/jiffies:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=a682850a114aef947da5d603f7fd2cfe7eabbd72

> ---
>  net/core/dev.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5927f6a7c301..5cc4c4961869 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5975,11 +5975,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>  		if (work_done)
>  			timeout = n->dev->gro_flush_timeout;
>
> +		/* When the NAPI instance uses a timeout, we still need to
> +		 * somehow bound the time packets are kept in the GRO layer
> +		 * under heavy traffic
> +		 */
> +		napi_gro_flush(n, !!timeout);
>  		if (timeout)
>  			hrtimer_start(&n->timer, ns_to_ktime(timeout),
>  				      HRTIMER_MODE_REL_PINNED);
> -		else
> -			napi_gro_flush(n, false);
>  	}
>  	if (unlikely(!list_empty(&n->poll_list))) {
>  		/* If n->poll_list is not empty, we need to mask irqs */
>