[RFC PATCH] net: don't keep lonely packets forever in the gro hash

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH] net: don't keep lonely packets forever in the gro hash
@ 2018-11-20 10:17 Paolo Abeni
  2018-11-20 11:30 ` Sergei Shtylyov
  2018-11-20 13:49 ` Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: Paolo Abeni @ 2018-11-20 10:17 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Willem de Bruijn, Eric Dumazet

Eric noted that with UDP GRO and napi timeout, we could keep a single
UDP packet inside the GRO hash forever, if the related NAPI instance
calls napi_gro_complete() at an higher frequency than the napi timeout.
Willem noted that even TCP packets could be trapped there, till the
next retransmission.
This patch tries to address the issue, flushing the oldest packets before
scheduling the NAPI timeout. The rationale is that such a timeout should be
well below a jiffy and we are not flushing packets eligible for sane GRO.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Sending as RFC, as I fear I'm missing some relevant pieces.
Also I'm unsure if this should considered a fixes for "udp: implement
GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"
---
 net/core/dev.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5927f6a7c301..5cc4c4961869 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5975,11 +5975,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		if (work_done)
 			timeout = n->dev->gro_flush_timeout;
 
+		/* When the NAPI instance uses a timeout, we still need to
+		 * someout bound the time packets are keept in the GRO layer
+		 * under heavy traffic
+		 */
+		napi_gro_flush(n, !!timeout);
 		if (timeout)
 			hrtimer_start(&n->timer, ns_to_ktime(timeout),
 				      HRTIMER_MODE_REL_PINNED);
-		else
-			napi_gro_flush(n, false);
 	}
 	if (unlikely(!list_empty(&n->poll_list))) {
 		/* If n->poll_list is not empty, we need to mask irqs */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
  2018-11-20 10:17 [RFC PATCH] net: don't keep lonely packets forever in the gro hash Paolo Abeni
@ 2018-11-20 11:30 ` Sergei Shtylyov
  2018-11-20 13:49 ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Sergei Shtylyov @ 2018-11-20 11:30 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Eric Dumazet

On 11/20/2018 01:17 PM, Paolo Abeni wrote:

> Eric noted that with UDP GRO and napi timeout, we could keep a single

   NAPI, else you're simply inconsistent.

> UDP packet inside the GRO hash forever, if the related NAPI instance
> calls napi_gro_complete() at an higher frequency than the napi timeout.

   NAPI.

> Willem noted that even TCP packets could be trapped there, till the
> next retransmission.
> This patch tries to address the issue, flushing the oldest packets before
> scheduling the NAPI timeout. The rationale is that such a timeout should be
> well below a jiffy and we are not flushing packets eligible for sane GRO.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Sending as RFC, as I fear I'm missing some relevant pieces.
> Also I'm unsure if this should considered a fixes for "udp: implement
> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"
> ---
>  net/core/dev.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5927f6a7c301..5cc4c4961869 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5975,11 +5975,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>  		if (work_done)
>  			timeout = n->dev->gro_flush_timeout;
>  
> +		/* When the NAPI instance uses a timeout, we still need to
> +		 * someout bound the time packets are keept in the GRO layer

   Somehow?

> +		 * under heavy traffic
> +		 */
> +		napi_gro_flush(n, !!timeout);
>  		if (timeout)
>  			hrtimer_start(&n->timer, ns_to_ktime(timeout),
>  				      HRTIMER_MODE_REL_PINNED);
> -		else
> -			napi_gro_flush(n, false);
>  	}
>  	if (unlikely(!list_empty(&n->poll_list))) {
>  		/* If n->poll_list is not empty, we need to mask irqs */

MBR, Sergei

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
  2018-11-20 10:17 [RFC PATCH] net: don't keep lonely packets forever in the gro hash Paolo Abeni
  2018-11-20 11:30 ` Sergei Shtylyov
@ 2018-11-20 13:49 ` Eric Dumazet
  2018-11-20 15:42   ` Paolo Abeni
  2018-11-21  9:13   ` Daniel Borkmann
  1 sibling, 2 replies; 6+ messages in thread
From: Eric Dumazet @ 2018-11-20 13:49 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn, Eric Dumazet



On 11/20/2018 02:17 AM, Paolo Abeni wrote:
> Eric noted that with UDP GRO and napi timeout, we could keep a single
> UDP packet inside the GRO hash forever, if the related NAPI instance
> calls napi_gro_complete() at an higher frequency than the napi timeout.
> Willem noted that even TCP packets could be trapped there, till the
> next retransmission.
> This patch tries to address the issue, flushing the oldest packets before
> scheduling the NAPI timeout. The rationale is that such a timeout should be
> well below a jiffy and we are not flushing packets eligible for sane GRO.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Sending as RFC, as I fear I'm missing some relevant pieces.
> Also I'm unsure if this should considered a fixes for "udp: implement
> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"

You can add both, now worries.

Google DC TCP forces a PSH flag on all TSO packets, so for us the flush is done
because of the PSH flag, not upon a timer/jiffie.

Truth be told, relying on jiffies change is a bit fragile for HZ=100 or HZ=250 kernels.

See recent TCP commit that got rid of tcp_tso_should_defer() dependency on HZ/jiffies

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=a682850a114aef947da5d603f7fd2cfe7eabbd72


> ---
>  net/core/dev.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 5927f6a7c301..5cc4c4961869 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5975,11 +5975,14 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>  		if (work_done)
>  			timeout = n->dev->gro_flush_timeout;
>  
> +		/* When the NAPI instance uses a timeout, we still need to
> +		 * someout bound the time packets are keept in the GRO layer
> +		 * under heavy traffic
> +		 */
> +		napi_gro_flush(n, !!timeout);
>  		if (timeout)
>  			hrtimer_start(&n->timer, ns_to_ktime(timeout),
>  				      HRTIMER_MODE_REL_PINNED);
> -		else
> -			napi_gro_flush(n, false);
>  	}
>  	if (unlikely(!list_empty(&n->poll_list))) {
>  		/* If n->poll_list is not empty, we need to mask irqs */
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
  2018-11-20 13:49 ` Eric Dumazet
@ 2018-11-20 15:42   ` Paolo Abeni
  2018-11-20 16:28     ` Eric Dumazet
  2018-11-21  9:13   ` Daniel Borkmann
  1 sibling, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2018-11-20 15:42 UTC (permalink / raw)
  To: Eric Dumazet, netdev; +Cc: David S. Miller, Willem de Bruijn

Hi,

On Tue, 2018-11-20 at 05:49 -0800, Eric Dumazet wrote:
> 
> On 11/20/2018 02:17 AM, Paolo Abeni wrote:
> > Eric noted that with UDP GRO and napi timeout, we could keep a single
> > UDP packet inside the GRO hash forever, if the related NAPI instance
> > calls napi_gro_complete() at an higher frequency than the napi timeout.
> > Willem noted that even TCP packets could be trapped there, till the
> > next retransmission.
> > This patch tries to address the issue, flushing the oldest packets before
> > scheduling the NAPI timeout. The rationale is that such a timeout should be
> > well below a jiffy and we are not flushing packets eligible for sane GRO.
> > 
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Sending as RFC, as I fear I'm missing some relevant pieces.
> > Also I'm unsure if this should considered a fixes for "udp: implement
> > GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"

Thank you for your feedback!

> Truth be told, relying on jiffies change is a bit fragile for HZ=100 or HZ=250 kernels.

Yes, we have higher bound there.

> See recent TCP commit that got rid of tcp_tso_should_defer() dependency on HZ/jiffies
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=a682850a114aef947da5d603f7fd2cfe7eabbd72

I'm unsure I follow correctly. Are you suggesting to use ns precision
for skb aging in GRO? If so, could that be a separate change? (looks
more invasive)

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
  2018-11-20 15:42   ` Paolo Abeni
@ 2018-11-20 16:28     ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2018-11-20 16:28 UTC (permalink / raw)
  To: Paolo Abeni, netdev; +Cc: David S. Miller, Willem de Bruijn



On 11/20/2018 07:42 AM, Paolo Abeni wrote:
> Hi,
> 
> On Tue, 2018-11-20 at 05:49 -0800, Eric Dumazet wrote:
>>
>> On 11/20/2018 02:17 AM, Paolo Abeni wrote:
>>> Eric noted that with UDP GRO and napi timeout, we could keep a single
>>> UDP packet inside the GRO hash forever, if the related NAPI instance
>>> calls napi_gro_complete() at an higher frequency than the napi timeout.
>>> Willem noted that even TCP packets could be trapped there, till the
>>> next retransmission.
>>> This patch tries to address the issue, flushing the oldest packets before
>>> scheduling the NAPI timeout. The rationale is that such a timeout should be
>>> well below a jiffy and we are not flushing packets eligible for sane GRO.
>>>
>>> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> Sending as RFC, as I fear I'm missing some relevant pieces.
>>> Also I'm unsure if this should considered a fixes for "udp: implement
>>> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"
> 
> Thank you for your feedback!
> 
>> Truth be told, relying on jiffies change is a bit fragile for HZ=100 or HZ=250 kernels.
> 
> Yes, we have higher bound there.
> 
>> See recent TCP commit that got rid of tcp_tso_should_defer() dependency on HZ/jiffies
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=a682850a114aef947da5d603f7fd2cfe7eabbd72
> 
> I'm unsure I follow correctly. Are you suggesting to use ns precision
> for skb aging in GRO? If so, could that be a separate change? (looks
> more invasive)

I am not suggesting adding ns in your patch.

That can be done later if we care.

I simply warn that some distros have low HZ value and thus the fix wont prevent packet sitting 4 or 10 ms in the queue.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH] net: don't keep lonely packets forever in the gro hash
  2018-11-20 13:49 ` Eric Dumazet
  2018-11-20 15:42   ` Paolo Abeni
@ 2018-11-21  9:13   ` Daniel Borkmann
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2018-11-21  9:13 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni, netdev
  Cc: David S. Miller, Willem de Bruijn, alexei.starovoitov, brakmo

Hi Eric,

On 11/20/2018 02:49 PM, Eric Dumazet wrote:
> On 11/20/2018 02:17 AM, Paolo Abeni wrote:
>> Eric noted that with UDP GRO and napi timeout, we could keep a single
>> UDP packet inside the GRO hash forever, if the related NAPI instance
>> calls napi_gro_complete() at an higher frequency than the napi timeout.
>> Willem noted that even TCP packets could be trapped there, till the
>> next retransmission.
>> This patch tries to address the issue, flushing the oldest packets before
>> scheduling the NAPI timeout. The rationale is that such a timeout should be
>> well below a jiffy and we are not flushing packets eligible for sane GRO.
>>
>> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> Sending as RFC, as I fear I'm missing some relevant pieces.
>> Also I'm unsure if this should considered a fixes for "udp: implement
>> GRO for plain UDP sockets." or for "net: gro: add a per device gro flush timer"
> 
> You can add both, now worries.
> 
> Google DC TCP forces a PSH flag on all TSO packets, so for us the flush is done
> because of the PSH flag, not upon a timer/jiffie.

Any plans to upstream this change? Or is the concern that this should only
be used DC internally? (If the latter, perhaps it would make sense to add
this under DCTCP mode then?)

Thanks a lot,
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-11-21 19:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-11-20 10:17 [RFC PATCH] net: don't keep lonely packets forever in the gro hash Paolo Abeni
2018-11-20 11:30 ` Sergei Shtylyov
2018-11-20 13:49 ` Eric Dumazet
2018-11-20 15:42   ` Paolo Abeni
2018-11-20 16:28     ` Eric Dumazet
2018-11-21  9:13   ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).