Re: [PATCH net-next] page_pool: let the compiler optimize and inline core functions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Alexander Lobakin <alobakin@pm.me>
Cc: brouer@redhat.com, "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Matteo Croce <mcroce@linux.microsoft.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next] page_pool: let the compiler optimize and inline core functions
Date: Tue, 23 Mar 2021 10:01:38 +0100	[thread overview]
Message-ID: <20210323100138.791a77ce@carbon> (raw)
In-Reply-To: <20210322183047.10768-1-alobakin@pm.me>

On Mon, 22 Mar 2021 18:30:55 +0000
Alexander Lobakin <alobakin@pm.me> wrote:

> As per disscussion in Page Pool bulk allocator thread [0],
> there are two functions in Page Pool core code that are marked as
> 'noinline'. The reason for this is not so clear, and even if it
> was made to reduce hotpath overhead, in fact it only makes things
> worse.
> As both of these functions as being called only once through the
> code, they could be inlined/folded into the non-static entry point.
> However, 'noinline' marks effectively prevent from doing that and
> induce totally unneeded fragmentation (baseline -> after removal):
> 
> add/remove: 0/3 grow/shrink: 1/0 up/down: 1024/-1096 (-72)
> Function                                     old     new   delta
> page_pool_alloc_pages                        100    1124   +1024
> page_pool_dma_map                            164       -    -164
> page_pool_refill_alloc_cache                 332       -    -332
> __page_pool_alloc_pages_slow                 600       -    -600
> 
> (taken from Mel's branch, hence factored-out page_pool_dma_map())

I see that the refactor of page_pool_dma_map() caused it to be
uninlined, that were a mistake.  Thanks for high-lighting that again
as I forgot about this (even-though I think Alex Duyck did point this
out earlier).

I am considering if we should allow compiler to inline
page_pool_refill_alloc_cache + __page_pool_alloc_pages_slow, for the
sake of performance and I loose the ability to diagnose the behavior
from perf-report.  Mind that page_pool avoids stat for the sake of
performance, but these noinline makes it possible to diagnose the
behavior anyway.

> 
> 1124 is a normal hotpath frame size, but these jumps between tiny
> page_pool_alloc_pages(), page_pool_refill_alloc_cache() and
> __page_pool_alloc_pages_slow() are really redundant and harmful
> for performance.

Well, I disagree. (this is a NACK)

If pages were recycled then the code never had to visit
__page_pool_alloc_pages_slow().  And today without the bulk page-alloc
(that we are working on adding together with Mel) we have to visit
__page_pool_alloc_pages_slow() every time, which is a bad design, but
I'm trying to fix that.

Matteo is working on recycling here[1]:
 [1] https://lore.kernel.org/netdev/20210322170301.26017-1-mcroce@linux.microsoft.com/

It would be really great if you could try out his patchset, as it will
help your driver avoid the slow path of the page_pool.  Given you are
very detailed oriented, I do want to point out that Matteo's patchset
is only the first step, as to really improve performance for page_pool,
we need to bulk return these page_pool pages (it is requires some
restructure of the core code, that will be confusing at this point).


> This simple removal of 'noinline' keywords bumps the throughput
> on XDP_PASS + napi_build_skb() + napi_gro_receive() on 25+ Mbps
> for 1G embedded NIC.
> 
> [0] https://lore.kernel.org/netdev/20210317222506.1266004-1-alobakin@pm.me
> 
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> ---
>  net/core/page_pool.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ad8b0707af04..589e4df6ef2b 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -102,7 +102,6 @@ EXPORT_SYMBOL(page_pool_create);
> 
>  static void page_pool_return_page(struct page_pool *pool, struct page *page);
> 
> -noinline
>  static struct page *page_pool_refill_alloc_cache(struct page_pool *pool)
>  {
>  	struct ptr_ring *r = &pool->ring;
> @@ -181,7 +180,6 @@ static void page_pool_dma_sync_for_device(struct page_pool *pool,
>  }
> 
>  /* slow path */
> -noinline
>  static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
>  						 gfp_t _gfp)
>  {
> --
> 2.31.0
> 
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

next prev parent reply	other threads:[~2021-03-23  9:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-22 18:30 [PATCH net-next] page_pool: let the compiler optimize and inline core functions Alexander Lobakin
2021-03-23  9:01 ` Jesper Dangaard Brouer [this message]
2021-03-23 15:36   ` Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210323100138.791a77ce@carbon \
    --to=brouer@redhat.com \
    --cc=alobakin@pm.me \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcroce@linux.microsoft.com \
    --cc=mgorman@techsingularity.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.