[PATCH net-next v12 0/5] introduce page_pool

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v12 0/5] introduce page_pool_alloc() related API
@ 2023-10-20  9:59 Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, bpf,
	linux-arm-kernel, linux-mediatek

In [1] & [2] & [3], there are usecases for veth and virtio_net
to use frag support in page pool to reduce memory usage, and it
may request different frag size depending on the head/tail
room space for xdp_frame/shinfo and mtu/packet size. When the
requested frag size is large enough that a single page can not
be split into more than one frag, using frag support only have
performance penalty because of the extra frag count handling
for frag support.

So this patchset provides a page pool API for the driver to
allocate memory with least memory utilization and performance
penalty when it doesn't know the size of memory it need
beforehand.

1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/
3. https://github.com/alobakin/linux/tree/iavf-pp-frag

V12: Rename page_pool_cache_alloc() to page_pool_alloc_va()
     and mask off __GFP_HIGHMEM for page allocation.

V11: Repost based on the latest net-next branch and collect
     Tested-by Tag from Alexander.

V10: Use fragment instead of frag in English docs.
     Remove PP_FLAG_PAGE_FRAG usage in idpf driver.

V9: Update some performance info in patch 2.

V8: Store the dma addr on a shifted u32 instead of using
    dma_addr_t explicitly for 32-bit arch with 64-bit DMA.
    Update document according to discussion in v7.

V7: Fix a compile error, a few typo and use kernel-doc syntax.

V6: Add a PP_FLAG_PAGE_SPLIT_IN_DRIVER flag to fail the page_pool
    creation for 32-bit arch with 64-bit DMA when driver tries to
    do the page splitting itself, adjust the requested size to
    include head/tail room in veth, and rebased on the latest
    next-net.

v5 RFC: Add a new page_pool_cache_alloc() API, and other minor
        change as discussed in v4. As there seems to be three
        comsumers that might be made use of the new API, so
        repost it as RFC and CC the relevant authors to see
        if the new API fits their need.

V4. Fix a typo and add a patch to update document about frag
    API, PAGE_POOL_DMA_USE_PP_FRAG_COUNT is not renamed yet
    as we may need a different thread to discuss that.

V3: Incorporate changes from the disscusion with Alexander,
    mostly the inline wraper, PAGE_POOL_DMA_USE_PP_FRAG_COUNT
    change split to separate patch and comment change.

V2: Add patch to remove PP_FLAG_PAGE_FRAG flags and mention
    virtio_net usecase in the cover letter.

V1: Drop RFC tag and page_pool_frag patch.

Yunsheng Lin (5):
  page_pool: unify frag_count handling in page_pool_is_last_frag()
  page_pool: remove PP_FLAG_PAGE_FRAG
  page_pool: introduce page_pool_alloc() API
  page_pool: update document about fragment API
  net: veth: use newly added page pool API for veth with xdp

 Documentation/networking/page_pool.rst        |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 -
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   3 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |   3 -
 .../marvell/octeontx2/nic/otx2_common.c       |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +-
 drivers/net/veth.c                            |  25 ++-
 drivers/net/wireless/mediatek/mt76/mac80211.c |   2 +-
 include/net/page_pool/helpers.h               | 210 +++++++++++++++---
 include/net/page_pool/types.h                 |   6 +-
 net/core/page_pool.c                          |  17 +-
 net/core/skbuff.c                             |   2 +-
 12 files changed, 220 insertions(+), 58 deletions(-)

-- 
2.33.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
@ 2023-10-20  9:59 ` Yunsheng Lin
  2023-10-23 11:43   ` Ilias Apalodimas
  2023-10-20  9:59 ` [PATCH net-next v12 2/5] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently when page_pool_create() is called with
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() need to be called to setup
   page->pp_frag_count immediately.
2. page_pool_defrag_page() often need to be called to drain
   the page->pp_frag_count when there is no more user will
   be holding on to that page.

Those constraints exist in order to support a page to be
split into multi fragments.

And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic update.

Those constraints are unavoidable for case when we need a
page to be split into more than one fragment, but there is
also case that we want to avoid the above constraints and
their overhead when a page can't be split as it can only
hold a fragment as requested by user, depending on different
use cases:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
            depending on the fragment size.

Currently page pool only provide page_pool_alloc_pages() and
page_pool_alloc_frag() API to enable the 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3, it is
not possible yet because of the per page_pool flag
PP_FLAG_PAGE_FRAG.

So in order to allow allocating unsplit page without the
overhead of split page while still allow allocating split
page we need to remove the per page_pool flag in
page_pool_is_last_frag(), as best as I can think of, it seems
there are two methods as below:
1. Add per page flag/bit to indicate a page is split or
   not, which means we might need to update that flag/bit
   everytime the page is recycled, dirtying the cache line
   of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
   unsplit page by assuming all pages in the page pool is split
   into a big fragment initially.

As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recyclable, we
need to support the above use case 3 with minimal overhead,
especially not adding any noticeable overhead for use case 1,
and we are already doing an optimization by not updating
pp_frag_count in page_pool_defrag_page() for the last fragment
user, this patch chooses to unify the pp_frag_count handling
to support the above use case 3.

There is no noticeable performance degradation and some
justification for unifying the frag_count handling with this
patch applied using a micro-benchmark testing in [1].

1. https://lore.kernel.org/all/bf2591f8-7b3c-4480-bb2c-31dc9da1d6ac@huawei.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 47 ++++++++++++++++++++++++---------
 net/core/page_pool.c            | 10 ++++++-
 2 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 8f64adf86f5b..759489c037c7 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -115,28 +115,49 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 	long ret;
 
 	/* If nr == pp_frag_count then we have cleared all remaining
-	 * references to the page. No need to actually overwrite it, instead
-	 * we can leave this to be overwritten by the calling function.
+	 * references to the page:
+	 * 1. 'n == 1': no need to actually overwrite it.
+	 * 2. 'n != 1': overwrite it with one, which is the rare case
+	 *              for pp_frag_count draining.
 	 *
-	 * The main advantage to doing this is that an atomic_read is
-	 * generally a much cheaper operation than an atomic update,
-	 * especially when dealing with a page that may be partitioned
-	 * into only 2 or 3 pieces.
+	 * The main advantage to doing this is that not only we avoid a atomic
+	 * update, as an atomic_read is generally a much cheaper operation than
+	 * an atomic update, especially when dealing with a page that may be
+	 * partitioned into only 2 or 3 pieces; but also unify the pp_frag_count
+	 * handling by ensuring all pages have partitioned into only 1 piece
+	 * initially, and only overwrite it when the page is partitioned into
+	 * more than one piece.
 	 */
-	if (atomic_long_read(&page->pp_frag_count) == nr)
+	if (atomic_long_read(&page->pp_frag_count) == nr) {
+		/* As we have ensured nr is always one for constant case using
+		 * the BUILD_BUG_ON(), only need to handle the non-constant case
+		 * here for pp_frag_count draining, which is a rare case.
+		 */
+		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
+		if (!__builtin_constant_p(nr))
+			atomic_long_set(&page->pp_frag_count, 1);
+
 		return 0;
+	}
 
 	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
 	WARN_ON(ret < 0);
+
+	/* We are the last user here too, reset pp_frag_count back to 1 to
+	 * ensure all pages have been partitioned into 1 piece initially,
+	 * this should be the rare case when the last two fragment users call
+	 * page_pool_defrag_page() currently.
+	 */
+	if (unlikely(!ret))
+		atomic_long_set(&page->pp_frag_count, 1);
+
 	return ret;
 }
 
-static inline bool page_pool_is_last_frag(struct page_pool *pool,
-					  struct page *page)
+static inline bool page_pool_is_last_frag(struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
-	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-	       (page_pool_defrag_page(page, 1) == 0);
+	/* If page_pool_defrag_page() returns 0, we were the last user */
+	return page_pool_defrag_page(page, 1) == 0;
 }
 
 /**
@@ -161,7 +182,7 @@ static inline void page_pool_put_page(struct page_pool *pool,
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	if (!page_pool_is_last_frag(pool, page))
+	if (!page_pool_is_last_frag(page))
 		return;
 
 	page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 8a9868ea5067..953535cab081 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -376,6 +376,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+
+	/* Ensuring all pages have been split into one fragment initially:
+	 * page_pool_set_pp_info() is only called once for every page when it
+	 * is allocated from the page allocator and page_pool_fragment_page()
+	 * is dirtying the same cache line as the page->pp_magic above, so
+	 * the overhead is negligible.
+	 */
+	page_pool_fragment_page(page, 1);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
@@ -672,7 +680,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 		struct page *page = virt_to_head_page(data[i]);
 
 		/* It is not the last user for the page frag case */
-		if (!page_pool_is_last_frag(pool, page))
+		if (!page_pool_is_last_frag(page))
 			continue;
 
 		page = __page_pool_put_page(pool, page, -1, false);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v12 2/5] page_pool: remove PP_FLAG_PAGE_FRAG
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-10-20  9:59 ` Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 3/5] page_pool: introduce page_pool_alloc() API Yunsheng Lin
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Michael Chan,
	Eric Dumazet, Yisen Zhuang, Salil Mehta, Jesse Brandeburg,
	Tony Nguyen, Sunil Goutham, Geetha sowjanya, Subbaraya Sundeep,
	hariprasad, Saeed Mahameed, Leon Romanovsky, Felix Fietkau,
	Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger,
	AngeloGioacchino Del Regno, Jesper Dangaard Brouer,
	Ilias Apalodimas, intel-wired-lan, linux-rdma, linux-wireless,
	linux-arm-kernel, linux-mediatek

PP_FLAG_PAGE_FRAG is not really needed after pp_frag_count
handling is unified and page_pool_alloc_frag() is supported
in 32-bit arch with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c                | 2 --
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c          | 3 +--
 drivers/net/ethernet/intel/idpf/idpf_txrx.c              | 3 ---
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c        | 2 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c            | 2 +-
 include/net/page_pool/types.h                            | 6 ++----
 net/core/page_pool.c                                     | 3 +--
 net/core/skbuff.c                                        | 2 +-
 9 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 16eb7a7af970..2685d0b7be4b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3250,8 +3250,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.dma_dir = bp->rx_dir;
 	pp.max_len = PAGE_SIZE;
 	pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
-	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
-		pp.flags |= PP_FLAG_PAGE_FRAG;
 
 	rxr->page_pool = page_pool_create(&pp);
 	if (IS_ERR(rxr->page_pool)) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index cf50368441b7..06117502001f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4940,8 +4940,7 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
 static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
 {
 	struct page_pool_params pp_params = {
-		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
-				PP_FLAG_DMA_SYNC_DEV,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.order = hns3_page_order(ring),
 		.pool_size = ring->desc_num * hns3_buf_size(ring) /
 				(PAGE_SIZE << hns3_page_order(ring)),
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 6fa79898c42c..55a099986b55 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -595,9 +595,6 @@ static struct page_pool *idpf_rx_create_page_pool(struct idpf_queue *rxbufq)
 		.offset		= 0,
 	};
 
-	if (rxbufq->rx_buf_size == IDPF_RX_BUF_2048)
-		pp.flags |= PP_FLAG_PAGE_FRAG;
-
 	return page_pool_create(&pp);
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 818ce76185b2..1a42bfded872 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -1404,7 +1404,7 @@ int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
 	}
 
 	pp_params.order = get_order(buf_size);
-	pp_params.flags = PP_FLAG_PAGE_FRAG | PP_FLAG_DMA_MAP;
+	pp_params.flags = PP_FLAG_DMA_MAP;
 	pp_params.pool_size = min(OTX2_PAGE_POOL_SZ, numptrs);
 	pp_params.nid = NUMA_NO_NODE;
 	pp_params.dev = pfvf->dev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 9325b8f00af0..ea58c6917433 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -897,7 +897,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		struct page_pool_params pp_params = { 0 };
 
 		pp_params.order     = 0;
-		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pp_params.pool_size = pool_size;
 		pp_params.nid       = node;
 		pp_params.dev       = rq->pdev;
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index cb76053973aa..51a767121b0d 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -570,7 +570,7 @@ int mt76_create_page_pool(struct mt76_dev *dev, struct mt76_queue *q)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
-		.flags = PP_FLAG_PAGE_FRAG,
+		.flags = 0,
 		.nid = NUMA_NO_NODE,
 		.dev = dev->dma_dev,
 	};
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 887e7946a597..6fc5134095ed 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -17,10 +17,8 @@
 					* Please note DMA-sync-for-CPU is still
 					* device driver responsibility
 					*/
-#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
-				 PP_FLAG_DMA_SYNC_DEV |\
-				 PP_FLAG_PAGE_FRAG)
+				 PP_FLAG_DMA_SYNC_DEV)
 
 /*
  * Fast allocation side cache array/stack
@@ -45,7 +43,7 @@ struct pp_alloc_cache {
 
 /**
  * struct page_pool_params - page pool parameters
- * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG
+ * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
  * @order:	2^order pages on allocation
  * @pool_size:	size of the ptr_ring
  * @nid:	NUMA node id to allocate from pages from
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 953535cab081..2a3671c97ca7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -756,8 +756,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (WARN_ON(size > max_size))
 		return NULL;
 
 	size = ALIGN(size, dma_get_cache_alignment());
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 975c9a6ffb4a..c52ddd6891d9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5765,7 +5765,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* In general, avoid mixing page_pool and non-page_pool allocated
 	 * pages within the same SKB. Additionally avoid dealing with clones
 	 * with page_pool pages, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
+	 * references (page_pool_alloc_frag()). Since we only take full page
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 * In theory we could take full references if @from is cloned and
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v12 3/5] page_pool: introduce page_pool_alloc() API
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 2/5] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
@ 2023-10-20  9:59 ` Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 4/5] page_pool: update document about fragment API Yunsheng Lin
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently page pool supports the below use cases:
use case 1: allocate page without page splitting using
            page_pool_alloc_pages() API if the driver knows
            that the memory it need is always bigger than
            half of the page allocated from page pool.
use case 2: allocate page frag with page splitting using
            page_pool_alloc_frag() API if the driver knows
            that the memory it need is always smaller than
            or equal to the half of the page allocated from
            page pool.

There is emerging use case [1] & [2] that is a mix of the
above two case: the driver doesn't know the size of memory it
need beforehand, so the driver may use something like below to
allocate memory with least memory utilization and performance
penalty:

if (size << 1 > max_size)
	page = page_pool_alloc_pages();
else
	page = page_pool_alloc_frag();

To avoid the driver doing something like above, add the
page_pool_alloc() API to support the above use case, and update
the true size of memory that is acctually allocated by updating
'*size' back to the driver in order to avoid exacerbating
truesize underestimate problem.

Rename page_pool_free() which is used in the destroy process to
__page_pool_destroy() to avoid confusion with the newly added
API.

1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 66 +++++++++++++++++++++++++++++++++
 net/core/page_pool.c            |  4 +-
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 759489c037c7..1b76e05dc4d2 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -82,6 +82,66 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 	return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+					   unsigned int *offset,
+					   unsigned int *size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+	struct page *page;
+
+	if ((*size << 1) > max_size) {
+		*size = max_size;
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	page = page_pool_alloc_frag(pool, offset, *size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	/* There is very likely not enough space for another fragment, so append
+	 * the remaining size to the current fragment to avoid truesize
+	 * underestimate problem.
+	 */
+	if (pool->frag_offset + *size > max_size) {
+		*size = max_size - *offset;
+		pool->frag_offset = max_size;
+	}
+
+	return page;
+}
+
+static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
+					       unsigned int *offset,
+					       unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc(pool, offset, size, gfp);
+}
+
+static inline void *page_pool_alloc_va(struct page_pool *pool,
+				       unsigned int *size, gfp_t gfp)
+{
+	unsigned int offset;
+	struct page *page;
+
+	/* Mask off __GFP_HIGHMEM to ensure we can use page_address() */
+	page = page_pool_alloc(pool, &offset, size, gfp & ~__GFP_HIGHMEM);
+	if (unlikely(!page))
+		return NULL;
+
+	return page_address(page) + offset;
+}
+
+static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
+					   unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc_va(pool, size, gfp);
+}
+
 /**
  * page_pool_get_dma_dir() - Retrieve the stored DMA direction.
  * @pool:	pool from which page was allocated
@@ -221,6 +281,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 #define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
+static inline void page_pool_free_va(struct page_pool *pool, void *va,
+				     bool allow_direct)
+{
+	page_pool_put_page(pool, virt_to_head_page(va), -1, allow_direct);
+}
+
 /**
  * page_pool_get_dma_addr() - Retrieve the stored DMA address.
  * @page:	page allocated from a page pool
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 2a3671c97ca7..5e409b98aba0 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -809,7 +809,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	}
 }
 
-static void page_pool_free(struct page_pool *pool)
+static void __page_pool_destroy(struct page_pool *pool)
 {
 	if (pool->disconnect)
 		pool->disconnect(pool);
@@ -860,7 +860,7 @@ static int page_pool_release(struct page_pool *pool)
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool);
 	if (!inflight)
-		page_pool_free(pool);
+		__page_pool_destroy(pool);
 
 	return inflight;
 }
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v12 4/5] page_pool: update document about fragment API
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (2 preceding siblings ...)
  2023-10-20  9:59 ` [PATCH net-next v12 3/5] page_pool: introduce page_pool_alloc() API Yunsheng Lin
@ 2023-10-20  9:59 ` Yunsheng Lin
  2023-10-20  9:59 ` [PATCH net-next v12 5/5] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Dima Tisnek,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
	Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, linux-doc, bpf

As more drivers begin to use the fragment API, update the
document about how to decide which API to use for the
driver author.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Dima Tisnek <dimaqq@gmail.com>
---
 Documentation/networking/page_pool.rst |  4 +-
 include/net/page_pool/helpers.h        | 93 ++++++++++++++++++++++----
 2 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 215ebc92752c..60993cb56b32 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -58,7 +58,9 @@ a page will cause no race conditions is enough.
 
 .. kernel-doc:: include/net/page_pool/helpers.h
    :identifiers: page_pool_put_page page_pool_put_full_page
-		 page_pool_recycle_direct page_pool_dev_alloc_pages
+		 page_pool_recycle_direct page_pool_free_va
+		 page_pool_dev_alloc_pages page_pool_dev_alloc_frag
+		 page_pool_dev_alloc page_pool_dev_alloc_va
 		 page_pool_get_dma_addr page_pool_get_dma_dir
 
 .. kernel-doc:: net/core/page_pool.c
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 1b76e05dc4d2..4ebd544ae977 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -8,23 +8,46 @@
 /**
  * DOC: page_pool allocator
  *
- * The page_pool allocator is optimized for the XDP mode that
- * uses one frame per-page, but it can fallback on the
- * regular page allocator APIs.
+ * The page_pool allocator is optimized for recycling page or page fragment used
+ * by skb packet and xdp frame.
  *
- * Basic use involves replacing alloc_pages() calls with the
- * page_pool_alloc_pages() call.  Drivers should use
- * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
+ * Basic use involves replacing and alloc_pages() calls with page_pool_alloc(),
+ * which allocate memory with or without page splitting depending on the
+ * requested memory size.
  *
- * The API keeps track of in-flight pages, in order to let API users know
- * when it is safe to free a page_pool object.  Thus, API users
- * must call page_pool_put_page() to free the page, or attach
- * the page to a page_pool-aware object like skbs marked with
+ * If the driver knows that it always requires full pages or its allocations are
+ * always smaller than half a page, it can use one of the more specific API
+ * calls:
+ *
+ * 1. page_pool_alloc_pages(): allocate memory without page splitting when
+ * driver knows that the memory it need is always bigger than half of the page
+ * allocated from page pool. There is no cache line dirtying for 'struct page'
+ * when a page is recycled back to the page pool.
+ *
+ * 2. page_pool_alloc_frag(): allocate memory with page splitting when driver
+ * knows that the memory it need is always smaller than or equal to half of the
+ * page allocated from page pool. Page splitting enables memory saving and thus
+ * avoids TLB/cache miss for data access, but there also is some cost to
+ * implement page splitting, mainly some cache line dirtying/bouncing for
+ * 'struct page' and atomic operation for page->pp_frag_count.
+ *
+ * The API keeps track of in-flight pages, in order to let API users know when
+ * it is safe to free a page_pool object, the API users must call
+ * page_pool_put_page() or page_pool_free_va() to free the page_pool object, or
+ * attach the page_pool object to a page_pool-aware object like skbs marked with
  * skb_mark_for_recycle().
  *
- * API users must call page_pool_put_page() once on a page, as it
- * will either recycle the page, or in case of refcnt > 1, it will
- * release the DMA mapping and in-flight state accounting.
+ * page_pool_put_page() may be called multi times on the same page if a page is
+ * split into multi fragments. For the last fragment, it will either recycle the
+ * page, or in case of page->_refcount > 1, it will release the DMA mapping and
+ * in-flight state accounting.
+ *
+ * dma_sync_single_range_for_device() is only called for the last fragment when
+ * page_pool is created with PP_FLAG_DMA_SYNC_DEV flag, so it depends on the
+ * last freed fragment to do the sync_for_device operation for all fragments in
+ * the same page when a page is split, the API user must setup pool->p.max_len
+ * and pool->p.offset correctly and ensure that page_pool_put_page() is called
+ * with dma_sync_size being -1 for fragment API.
  */
 #ifndef _NET_PAGE_POOL_HELPERS_H
 #define _NET_PAGE_POOL_HELPERS_H
@@ -73,6 +96,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
+/**
+ * page_pool_dev_alloc_frag() - allocate a page fragment.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: requested size
+ *
+ * Get a page fragment from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Return allocated page fragment, otherwise return NULL.
+ */
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
 						    unsigned int size)
@@ -111,6 +145,19 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
 	return page;
 }
 
+/**
+ * page_pool_dev_alloc() - allocate a page or a page fragment.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: in as the requested size, out as the allocated size
+ *
+ * Get a page or a page fragment from the page allocator or page_pool caches
+ * depending on the requested size in order to allocate memory with least memory
+ * utilization and performance penalty.
+ *
+ * Return:
+ * Return allocated page or page fragment, otherwise return NULL.
+ */
 static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
 					       unsigned int *offset,
 					       unsigned int *size)
@@ -134,6 +181,18 @@ static inline void *page_pool_alloc_va(struct page_pool *pool,
 	return page_address(page) + offset;
 }
 
+/**
+ * page_pool_dev_alloc_va() - allocate a page or a page fragment and return its
+ *			      va.
+ * @pool: pool from which to allocate
+ * @size: in as the requested size, out as the allocated size
+ *
+ * This is just a thin wrapper around the page_pool_alloc() API, and
+ * it returns va of the allocated page or page fragment.
+ *
+ * Return:
+ * Return the va for the allocated page or page fragment, otherwise return NULL.
+ */
 static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
 					   unsigned int *size)
 {
@@ -281,6 +340,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 #define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
+/**
+ * page_pool_free_va() - free a va into the page_pool
+ * @pool: pool from which va was allocated
+ * @va: va to be freed
+ * @allow_direct: freed by the consumer, allow lockless caching
+ *
+ * Free a va allocated from page_pool_allo_va().
+ */
 static inline void page_pool_free_va(struct page_pool *pool, void *va,
 				     bool allow_direct)
 {
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v12 5/5] net: veth: use newly added page pool API for veth with xdp
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (3 preceding siblings ...)
  2023-10-20  9:59 ` [PATCH net-next v12 4/5] page_pool: update document about fragment API Yunsheng Lin
@ 2023-10-20  9:59 ` Yunsheng Lin
  2023-10-24  2:24 ` [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Jakub Kicinski
  2023-10-24  2:30 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-20  9:59 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Eric Dumazet,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, bpf

Use page_pool_alloc() API to allocate memory with least
memory utilization and performance penalty.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/veth.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0deefd1573cf..9980517ed8b0 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -737,10 +737,11 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
 	    skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
-		u32 size, len, max_head_size, off;
+		u32 size, len, max_head_size, off, truesize, page_offset;
 		struct sk_buff *nskb;
 		struct page *page;
 		int i, head_off;
+		void *va;
 
 		/* We need a private copy of the skb and data buffers since
 		 * the ebpf program can modify it. We segment the original skb
@@ -753,14 +754,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
 			goto drop;
 
+		size = min_t(u32, skb->len, max_head_size);
+		truesize = SKB_HEAD_ALIGN(size) + VETH_XDP_HEADROOM;
+
 		/* Allocate skb head */
-		page = page_pool_dev_alloc_pages(rq->page_pool);
-		if (!page)
+		va = page_pool_dev_alloc_va(rq->page_pool, &truesize);
+		if (!va)
 			goto drop;
 
-		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+		nskb = napi_build_skb(va, truesize);
 		if (!nskb) {
-			page_pool_put_full_page(rq->page_pool, page, true);
+			page_pool_free_va(rq->page_pool, va, true);
 			goto drop;
 		}
 
@@ -768,7 +772,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		skb_copy_header(nskb, skb);
 		skb_mark_for_recycle(nskb);
 
-		size = min_t(u32, skb->len, max_head_size);
 		if (skb_copy_bits(skb, 0, nskb->data, size)) {
 			consume_skb(nskb);
 			goto drop;
@@ -783,14 +786,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		len = skb->len - off;
 
 		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
-			page = page_pool_dev_alloc_pages(rq->page_pool);
+			size = min_t(u32, len, PAGE_SIZE);
+			truesize = size;
+
+			page = page_pool_dev_alloc(rq->page_pool, &page_offset,
+						   &truesize);
 			if (!page) {
 				consume_skb(nskb);
 				goto drop;
 			}
 
-			size = min_t(u32, len, PAGE_SIZE);
-			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
+			skb_add_rx_frag(nskb, i, page, page_offset, size,
+					truesize);
 			if (skb_copy_bits(skb, off, page_address(page),
 					  size)) {
 				consume_skb(nskb);
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-10-20  9:59 ` [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-10-23 11:43   ` Ilias Apalodimas
  2023-10-23 12:26     ` Yunsheng Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Ilias Apalodimas @ 2023-10-23 11:43 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Eric Dumazet

Hi Yunsheng, 

[...]

> +	 * 1. 'n == 1': no need to actually overwrite it.
> +	 * 2. 'n != 1': overwrite it with one, which is the rare case
> +	 *              for pp_frag_count draining.
>  	 *
> -	 * The main advantage to doing this is that an atomic_read is
> -	 * generally a much cheaper operation than an atomic update,
> -	 * especially when dealing with a page that may be partitioned
> -	 * into only 2 or 3 pieces.
> +	 * The main advantage to doing this is that not only we avoid a atomic
> +	 * update, as an atomic_read is generally a much cheaper operation than
> +	 * an atomic update, especially when dealing with a page that may be
> +	 * partitioned into only 2 or 3 pieces; but also unify the pp_frag_count
> +	 * handling by ensuring all pages have partitioned into only 1 piece
> +	 * initially, and only overwrite it when the page is partitioned into
> +	 * more than one piece.
>  	 */
> -	if (atomic_long_read(&page->pp_frag_count) == nr)
> +	if (atomic_long_read(&page->pp_frag_count) == nr) {
> +		/* As we have ensured nr is always one for constant case using
> +		 * the BUILD_BUG_ON(), only need to handle the non-constant case
> +		 * here for pp_frag_count draining, which is a rare case.
> +		 */
> +		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
> +		if (!__builtin_constant_p(nr))
> +			atomic_long_set(&page->pp_frag_count, 1);

Aren't we changing the behaviour of the current code here? IIRC is
atomic_long_read(&page->pp_frag_count) == nr we never updated the atomic
pp_frag_count and the reasoning was that the next caller can set it
properly. 

> +
>  		return 0;
> +	}
>  
>  	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>  	WARN_ON(ret < 0);
> +
> +	/* We are the last user here too, reset pp_frag_count back to 1 to
> +	 * ensure all pages have been partitioned into 1 piece initially,
> +	 * this should be the rare case when the last two fragment users call
> +	 * page_pool_defrag_page() currently.
> +	 */
> +	if (unlikely(!ret))
> +		atomic_long_set(&page->pp_frag_count, 1);
> +
>  	return ret;
>  }
>  
 
 [....]

 Thanks
 /Ilias

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-10-23 11:43   ` Ilias Apalodimas
@ 2023-10-23 12:26     ` Yunsheng Lin
  2023-10-24  8:26       ` Ilias Apalodimas
  0 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2023-10-23 12:26 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Eric Dumazet

On 2023/10/23 19:43, Ilias Apalodimas wrote:
> Hi Yunsheng, 
> 
> [...]
> 
>> +	 * 1. 'n == 1': no need to actually overwrite it.
>> +	 * 2. 'n != 1': overwrite it with one, which is the rare case
>> +	 *              for pp_frag_count draining.
>>  	 *
>> -	 * The main advantage to doing this is that an atomic_read is
>> -	 * generally a much cheaper operation than an atomic update,
>> -	 * especially when dealing with a page that may be partitioned
>> -	 * into only 2 or 3 pieces.
>> +	 * The main advantage to doing this is that not only we avoid a atomic
>> +	 * update, as an atomic_read is generally a much cheaper operation than
>> +	 * an atomic update, especially when dealing with a page that may be
>> +	 * partitioned into only 2 or 3 pieces; but also unify the pp_frag_count
>> +	 * handling by ensuring all pages have partitioned into only 1 piece
>> +	 * initially, and only overwrite it when the page is partitioned into
>> +	 * more than one piece.
>>  	 */
>> -	if (atomic_long_read(&page->pp_frag_count) == nr)
>> +	if (atomic_long_read(&page->pp_frag_count) == nr) {
>> +		/* As we have ensured nr is always one for constant case using
>> +		 * the BUILD_BUG_ON(), only need to handle the non-constant case
>> +		 * here for pp_frag_count draining, which is a rare case.
>> +		 */
>> +		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
>> +		if (!__builtin_constant_p(nr))
>> +			atomic_long_set(&page->pp_frag_count, 1);
> 
> Aren't we changing the behaviour of the current code here? IIRC is
> atomic_long_read(&page->pp_frag_count) == nr we never updated the atomic
> pp_frag_count and the reasoning was that the next caller can set it
> properly. 

If the next caller is calling the page_pool_alloc_frag(), then yes,
because page_pool_fragment_page() will be used to reset the
page->pp_frag_count, so it does not really matter what is the value
of page->pp_frag_count when we are recycling a page.

If the next caller is calling page_pool_alloc_pages() directly without
fragmenting a page, the above code is used to ensure that pp_frag_count
is always one when page_pool_alloc_pages() fetches a page from pool->alloc
or pool->ring, because page_pool_fragment_page() is not used to reset the
page->pp_frag_count for page_pool_alloc_pages() and we have removed the
per page_pool PP_FLAG_PAGE_FRAG in page_pool_is_last_frag().

As we don't know if the caller is page_pool_alloc_frag() or
page_pool_alloc_pages(), so the above code ensure the page in pool->alloc
or pool->ring always have the pp_frag_count being one.


> 
>> +
>>  		return 0;
>> +	}
>>  
>>  	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>>  	WARN_ON(ret < 0);
>> +
>> +	/* We are the last user here too, reset pp_frag_count back to 1 to
>> +	 * ensure all pages have been partitioned into 1 piece initially,
>> +	 * this should be the rare case when the last two fragment users call
>> +	 * page_pool_defrag_page() currently.
>> +	 */
>> +	if (unlikely(!ret))
>> +		atomic_long_set(&page->pp_frag_count, 1);
>> +
>>  	return ret;
>>  }
>>  
>  
>  [....]
> 
>  Thanks
>  /Ilias
> 
> .
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v12 0/5] introduce page_pool_alloc() related API
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (4 preceding siblings ...)
  2023-10-20  9:59 ` [PATCH net-next v12 5/5] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
@ 2023-10-24  2:24 ` Jakub Kicinski
  2023-10-24  2:30 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2023-10-24  2:24 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, pabeni, netdev, linux-kernel, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, bpf,
	linux-arm-kernel, linux-mediatek

On Fri, 20 Oct 2023 17:59:47 +0800 Yunsheng Lin wrote:
> In [1] & [2] & [3], there are usecases for veth and virtio_net
> to use frag support in page pool to reduce memory usage, and it
> may request different frag size depending on the head/tail
> room space for xdp_frame/shinfo and mtu/packet size. When the
> requested frag size is large enough that a single page can not
> be split into more than one frag, using frag support only have
> performance penalty because of the extra frag count handling
> for frag support.
> 
> So this patchset provides a page pool API for the driver to
> allocate memory with least memory utilization and performance
> penalty when it doesn't know the size of memory it need
> beforehand.

I don't mean to cut off the discussion, if any is still to happen.
But AFAIU we have a general agreement that this is a good direction,
we're at v12 already, and it's getting late in the release cycle.
To give this series a chance of making v6.7 I will apply it now.
If there are any unresolved concerns in a couple of days we can drop it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v12 0/5] introduce page_pool_alloc() related API
  2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (5 preceding siblings ...)
  2023-10-24  2:24 ` [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Jakub Kicinski
@ 2023-10-24  2:30 ` patchwork-bot+netdevbpf
  6 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-10-24  2:30 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, ast, daniel, hawk,
	john.fastabend, matthias.bgg, angelogioacchino.delregno, bpf,
	linux-arm-kernel, linux-mediatek

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 20 Oct 2023 17:59:47 +0800 you wrote:
> In [1] & [2] & [3], there are usecases for veth and virtio_net
> to use frag support in page pool to reduce memory usage, and it
> may request different frag size depending on the head/tail
> room space for xdp_frame/shinfo and mtu/packet size. When the
> requested frag size is large enough that a single page can not
> be split into more than one frag, using frag support only have
> performance penalty because of the extra frag count handling
> for frag support.
> 
> [...]

Here is the summary with links:
  - [net-next,v12,1/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
    https://git.kernel.org/netdev/net-next/c/58d53d8f7da6
  - [net-next,v12,2/5] page_pool: remove PP_FLAG_PAGE_FRAG
    https://git.kernel.org/netdev/net-next/c/09d96ee5674a
  - [net-next,v12,3/5] page_pool: introduce page_pool_alloc() API
    https://git.kernel.org/netdev/net-next/c/de97502e16fc
  - [net-next,v12,4/5] page_pool: update document about fragment API
    https://git.kernel.org/netdev/net-next/c/8ab32fa1c794
  - [net-next,v12,5/5] net: veth: use newly added page pool API for veth with xdp
    https://git.kernel.org/netdev/net-next/c/2d0de67da51a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-10-23 12:26     ` Yunsheng Lin
@ 2023-10-24  8:26       ` Ilias Apalodimas
  0 siblings, 0 replies; 11+ messages in thread
From: Ilias Apalodimas @ 2023-10-24  8:26 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Eric Dumazet

On Mon, 23 Oct 2023 at 15:27, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/10/23 19:43, Ilias Apalodimas wrote:
> > Hi Yunsheng,
> >
> > [...]
> >
> >> +     * 1. 'n == 1': no need to actually overwrite it.
> >> +     * 2. 'n != 1': overwrite it with one, which is the rare case
> >> +     *              for pp_frag_count draining.
> >>       *
> >> -     * The main advantage to doing this is that an atomic_read is
> >> -     * generally a much cheaper operation than an atomic update,
> >> -     * especially when dealing with a page that may be partitioned
> >> -     * into only 2 or 3 pieces.
> >> +     * The main advantage to doing this is that not only we avoid a atomic
> >> +     * update, as an atomic_read is generally a much cheaper operation than
> >> +     * an atomic update, especially when dealing with a page that may be
> >> +     * partitioned into only 2 or 3 pieces; but also unify the pp_frag_count
> >> +     * handling by ensuring all pages have partitioned into only 1 piece
> >> +     * initially, and only overwrite it when the page is partitioned into
> >> +     * more than one piece.
> >>       */
> >> -    if (atomic_long_read(&page->pp_frag_count) == nr)
> >> +    if (atomic_long_read(&page->pp_frag_count) == nr) {
> >> +            /* As we have ensured nr is always one for constant case using
> >> +             * the BUILD_BUG_ON(), only need to handle the non-constant case
> >> +             * here for pp_frag_count draining, which is a rare case.
> >> +             */
> >> +            BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
> >> +            if (!__builtin_constant_p(nr))
> >> +                    atomic_long_set(&page->pp_frag_count, 1);
> >
> > Aren't we changing the behaviour of the current code here? IIRC is
> > atomic_long_read(&page->pp_frag_count) == nr we never updated the atomic
> > pp_frag_count and the reasoning was that the next caller can set it
> > properly.
>
> If the next caller is calling the page_pool_alloc_frag(), then yes,
> because page_pool_fragment_page() will be used to reset the
> page->pp_frag_count, so it does not really matter what is the value
> of page->pp_frag_count when we are recycling a page.
>
> If the next caller is calling page_pool_alloc_pages() directly without
> fragmenting a page, the above code is used to ensure that pp_frag_count
> is always one when page_pool_alloc_pages() fetches a page from pool->alloc
> or pool->ring, because page_pool_fragment_page() is not used to reset the
> page->pp_frag_count for page_pool_alloc_pages() and we have removed the
> per page_pool PP_FLAG_PAGE_FRAG in page_pool_is_last_frag().
>
> As we don't know if the caller is page_pool_alloc_frag() or
> page_pool_alloc_pages(), so the above code ensure the page in pool->alloc
> or pool->ring always have the pp_frag_count being one.


Fair enough, Jakub pulled the series before I managed to ack them, but
that's okay. It's been long overdue apologies. FWIW I went through the
patches and didn't find anything wrong coding-wise

Thanks
/Ilias
>
>
>
> >
> >> +
> >>              return 0;
> >> +    }
> >>
> >>      ret = atomic_long_sub_return(nr, &page->pp_frag_count);
> >>      WARN_ON(ret < 0);
> >> +
> >> +    /* We are the last user here too, reset pp_frag_count back to 1 to
> >> +     * ensure all pages have been partitioned into 1 piece initially,
> >> +     * this should be the rare case when the last two fragment users call
> >> +     * page_pool_defrag_page() currently.
> >> +     */
> >> +    if (unlikely(!ret))
> >> +            atomic_long_set(&page->pp_frag_count, 1);
> >> +
> >>      return ret;
> >>  }
> >>
> >
> >  [....]
> >
> >  Thanks
> >  /Ilias
> >
> > .
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-10-24  8:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-20  9:59 [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Yunsheng Lin
2023-10-20  9:59 ` [PATCH net-next v12 1/5] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
2023-10-23 11:43   ` Ilias Apalodimas
2023-10-23 12:26     ` Yunsheng Lin
2023-10-24  8:26       ` Ilias Apalodimas
2023-10-20  9:59 ` [PATCH net-next v12 2/5] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
2023-10-20  9:59 ` [PATCH net-next v12 3/5] page_pool: introduce page_pool_alloc() API Yunsheng Lin
2023-10-20  9:59 ` [PATCH net-next v12 4/5] page_pool: update document about fragment API Yunsheng Lin
2023-10-20  9:59 ` [PATCH net-next v12 5/5] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
2023-10-24  2:24 ` [PATCH net-next v12 0/5] introduce page_pool_alloc() related API Jakub Kicinski
2023-10-24  2:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).