All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alexander H Duyck <alexander.duyck@gmail.com>
To: Alexander Lobakin <aleksander.lobakin@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Larysa Zaremba <larysa.zaremba@intel.com>,
	Yunsheng Lin <linyunsheng@huawei.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
Date: Thu, 29 Jun 2023 09:45:26 -0700	[thread overview]
Message-ID: <69e827e239dab9fd7986ee43cef599d024c8535f.camel@gmail.com> (raw)
In-Reply-To: <20230629152305.905962-3-aleksander.lobakin@intel.com>

On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
> even when on DMA-coherent platforms (like x86) with no active IOMMU or
> swiotlb, just for the call ladder.
> Indeed, it's
> 
> page_pool_put_page()
>   page_pool_put_defragged_page()                  <- external
>     __page_pool_put_page()
>       page_pool_dma_sync_for_device()             <- non-inline
>         dma_sync_single_range_for_device()
>           dma_sync_single_for_device()            <- external
>             dma_direct_sync_single_for_device()
>               dev_is_dma_coherent()               <- exit
> 
> For the inline functions, no guarantees the compiler won't uninline them
> (they're clearly not one-liners and sometimes compilers uninline even
> 2 + 2). The first external call is necessary, but the rest 2+ are done
> for nothing each time, plus a bunch of checks here and there.
> Since Page Pool mappings are long-term and for one "device + addr" pair
> dma_need_sync() will always return the same value (basically, whether it
> belongs to an swiotlb pool), addresses can be tested once right after
> they're obtained and the result can be reused until the page is unmapped.
> Define new PP flag, which will mean "do DMA syncs for device, but only
> when needed" and turn it on by default when the driver asks to sync
> pages. When a page is mapped, check whether it needs syncs and if so,
> replace that "sync when needed" back to "always do syncs" globally for
> the whole pool (better safe than sorry). As long as a pool has no pages
> requiring DMA syncs, this cuts off a good piece of calls and checks.
> On my x86_64, this gives from 2% to 5% performance benefit with no
> negative impact for cases when IOMMU is on and the shortcut can't be
> used.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/net/page_pool.h |  3 +++
>  net/core/page_pool.c    | 10 ++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 829dc1f8ba6b..ff3772fab707 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -23,6 +23,9 @@
>  					* Please note DMA-sync-for-CPU is still
>  					* device driver responsibility
>  					*/
> +#define PP_FLAG_DMA_MAYBE_SYNC	BIT(2) /* Internal, should not be used in
> +					* drivers
> +					*/
>  #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
>  				 PP_FLAG_DMA_SYNC_DEV)
>  
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index dff0b4fa2316..498e058140b3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -197,6 +197,10 @@ static int page_pool_init(struct page_pool *pool,
>  		/* pool->p.offset has to be set according to the address
>  		 * offset used by the DMA engine to start copying rx data
>  		 */
> +
> +		/* Try to avoid calling no-op syncs */
> +		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
> +		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
>  	}
>  
>  #ifdef CONFIG_PAGE_POOL_STATS
> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>  
>  	page_pool_set_dma_addr(page, dma);
>  
> +	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
> +	    dma_need_sync(pool->p.dev, dma)) {
> +		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
> +		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
> +	}
> +
>  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>  

I am pretty sure the logic is flawed here. The problem is
dma_needs_sync depends on the DMA address being used. In the worst case
scenario we could have a device that has something like a 32b DMA
address space on a system with over 4GB of memory. In such a case the
higher addresses would need to be synced because they will go off to a
swiotlb bounce buffer while the lower addresses wouldn't.

If you were to store a flag like this it would have to be generated per
page.

  reply	other threads:[~2023-06-29 16:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
2023-06-29 16:55   ` Alexander H Duyck
2023-06-30 12:39     ` Alexander Lobakin
2023-06-30 15:11       ` Alexander Duyck
2023-06-30 16:05         ` Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
2023-06-29 16:45   ` Alexander H Duyck [this message]
2023-06-30 12:29     ` Alexander Lobakin
2023-06-30 14:44       ` Alexander Duyck
2023-06-30 15:34         ` Alexander Lobakin
2023-06-30 18:28           ` Alexander Duyck
2023-07-03 13:38             ` Alexander Lobakin
2023-07-03 20:32           ` Jakub Kicinski
2023-07-05 14:41             ` Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 3/4] net: add flag to indicate NAPI/GRO is running right now Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 4/4] net: skbuff: always recycle PP pages directly when inside a NAPI loop Alexander Lobakin
2023-07-02  0:01 ` [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Jakub Kicinski
2023-07-03 13:50   ` Alexander Lobakin
2023-07-03 18:57     ` Jakub Kicinski
2023-07-05 12:31       ` Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=69e827e239dab9fd7986ee43cef599d024c8535f.camel@gmail.com \
    --to=alexander.duyck@gmail.com \
    --cc=aleksander.lobakin@intel.com \
    --cc=alexanderduyck@fb.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=kuba@kernel.org \
    --cc=larysa.zaremba@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.