netdev.vger.kernel.org archive mirror
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: brouer@redhat.com, David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net-next] udp: under rx pressure, try to condense skbs
Date: Thu, 8 Dec 2016 10:46:20 +0100	[thread overview]
Message-ID: <20161208104620.5fc691b8@redhat.com> (raw)
In-Reply-To: <1481131173.4930.36.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 07 Dec 2016 09:19:33 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> Under UDP flood, many softirq producers try to add packets to
> UDP receive queue, and one user thread is burning one cpu trying
> to dequeue packets as fast as possible.
> 
> Two parts of the per packet cost are :
> - copying payload from kernel space to user space,
> - freeing memory pieces associated with skb.
> 
> If socket is under pressure, softirq handler(s) can try to pull in
> skb->head the payload of the packet if it fits.
> 
> Meaning the softirq handler(s) can free/reuse the page fragment
> immediately, instead of letting udp_recvmsg() do this hundreds of usec
> later, possibly from another node.
> 
> 
> Additional gains :
> - We reduce skb->truesize and thus can store more packets per SO_RCVBUF
> - We avoid cache line misses at copyout() time and consume_skb() time,
> and avoid one put_page() with potential alien freeing on NUMA hosts.
> 
> This comes at the cost of a copy, bounded to available tail room, which
> is usually small. (We might have to fix GRO_MAX_HEAD which looks bigger
> than necessary)
> 
> This patch gave me about 5 % increase in throughput in my tests.

Hmmm... I'm not thrilled to have such heuristics that change memory
behavior once half of the queue size (sk->sk_rcvbuf) is reached.

Most of the win comes from doing a local atomic page-refcnt decrement
as opposed to doing a remote CPU refcnt-dec.  And as you noticed, the
benefit is quite high, saving 241 cycles (see [1]).  And your patch is
"using" these cycles to copy the packet instead.

This might not be a win in the future.  I'm working on a more generic
solution (page_pool) that (as one objective) targets this remote refcnt.


[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench03.c
 Measured on: i7-4790K CPU @ 4.00GHz
 Same CPU release cost  : 251 cycles
 Remote CPU release cost: 492 cycles

 
> skb_condense() helper could probably be used in other contexts.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> ---
[...]
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index b45cd1494243fc99686016949f4546dbba11f424..84151cf40aebb973bad5bee3ee4be0758084d83c 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4931,3 +4931,31 @@ struct sk_buff *pskb_extract(struct sk_buff *skb, int off,
>  EXPORT_SYMBOL(pskb_extract);
> +
> +/**
> + * skb_condense - try to get rid of fragments/frag_list if possible
> + * @skb: buffer
> + *
> + * Can be used to save memory before skb is added to a busy queue.
> + * If packet has bytes in frags and enough tail room in skb->head,
> + * pull all of them, so that we can free the frags right now and adjust
> + * truesize.
> + * Notes:
> + *	We do not reallocate skb->head thus can not fail.
> + *	Caller must re-evaluate skb->truesize if needed.
> + */
> +void skb_condense(struct sk_buff *skb)
> +{
> +	if (!skb->data_len ||
> +	    skb->data_len > skb->end - skb->tail ||
> +	    skb_cloned(skb))
> +		return;

So this is only active depending on how the driver constructed the SKB,
but all callers end up paying for a function call (not inlined).

> +	/* Nice, we can free page frag(s) right now */
> +	__pskb_pull_tail(skb, skb->data_len);
> +
> +	/* Now adjust skb->truesize, since __pskb_pull_tail() does
> +	 * not do this.
> +	 */
> +	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
> +}
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 16d88ba9ff1c402f77063cfb5eea2708d86da2fc..f5628ada47b53f0d92d08210e5d7e4132a107f73 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
[...]
> @@ -1208,6 +1208,16 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
>  	if (rmem > sk->sk_rcvbuf)
>  		goto drop;
>  
> +	/* Under mem pressure, it might be helpful to help udp_recvmsg()
> +	 * having linear skbs :
> +	 * - Reduce memory overhead and thus increase receive queue capacity
> +	 * - Less cache line misses at copyout() time
> +	 * - Less work at consume_skb() (less alien page frag freeing)
> +	 */
> +	if (rmem > (sk->sk_rcvbuf >> 1))
> +		skb_condense(skb);
> +	size = skb->truesize;
> +



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
