* [PATCH net-next v3 0/5] devmem TCP fixes
@ 2024-12-09 17:23 Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 1/5] net: page_pool: rename page_pool_alloc_netmem to *_netmems Mina Almasry
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas
A couple of unrelated devmem TCP fixes, bundled into one series for
convenience.
- fix naming and provide page_pool_alloc_netmem for fragged
netmem.
- fix issues with dma-buf dma addresses being potentially
passed to dma_sync_for_* helpers.
---
v3:
- Add documentation patch for memory providers. (Jakub)
- Drop netmem_prefetch helper (Jakub)
v2:
- Fork off the syzbot fixes to net tree, resubmit the rest here.
Mina Almasry (4):
net: page_pool: rename page_pool_alloc_netmem to *_netmems
net: page_pool: create page_pool_alloc_netmem
page_pool: disable sync for cpu for dmabuf memory provider
net: Document memory provider driver support
Samiullah Khawaja (1):
page_pool: Set `dma_sync` to false for devmem memory provider
Documentation/networking/index.rst | 1 +
Documentation/networking/memory-provider.rst | 52 ++++++++++++++++++++
include/net/page_pool/helpers.h | 50 +++++++++++++------
include/net/page_pool/types.h | 2 +-
net/core/devmem.c | 9 ++--
net/core/page_pool.c | 11 +++--
6 files changed, 100 insertions(+), 25 deletions(-)
create mode 100644 Documentation/networking/memory-provider.rst
--
2.47.0.338.g60cca15819-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next v3 1/5] net: page_pool: rename page_pool_alloc_netmem to *_netmems
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
@ 2024-12-09 17:23 ` Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 2/5] net: page_pool: create page_pool_alloc_netmem Mina Almasry
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas,
Stanislav Fomichev
page_pool_alloc_netmem (without an s) was the mirror of
page_pool_alloc_pages (with an s), which was confusing.
Rename to page_pool_alloc_netmems so it's the mirror of
page_pool_alloc_pages.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
include/net/page_pool/types.h | 2 +-
net/core/page_pool.c | 8 ++++----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1ea16b0e9c79..bd1170e16cff 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -242,7 +242,7 @@ struct page_pool {
};
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
-netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp);
+netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp);
struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
unsigned int size, gfp_t gfp);
netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4c85b77cfdac..3c0e19e13e64 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -574,7 +574,7 @@ static noinline netmem_ref __page_pool_alloc_pages_slow(struct page_pool *pool,
/* For using page_pool replace: alloc_pages() API calls, but provide
* synchronization guarantee for allocation side.
*/
-netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp)
+netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
{
netmem_ref netmem;
@@ -590,11 +590,11 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp)
netmem = __page_pool_alloc_pages_slow(pool, gfp);
return netmem;
}
-EXPORT_SYMBOL(page_pool_alloc_netmem);
+EXPORT_SYMBOL(page_pool_alloc_netmems);
struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
{
- return netmem_to_page(page_pool_alloc_netmem(pool, gfp));
+ return netmem_to_page(page_pool_alloc_netmems(pool, gfp));
}
EXPORT_SYMBOL(page_pool_alloc_pages);
ALLOW_ERROR_INJECTION(page_pool_alloc_pages, NULL);
@@ -957,7 +957,7 @@ netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
}
if (!netmem) {
- netmem = page_pool_alloc_netmem(pool, gfp);
+ netmem = page_pool_alloc_netmems(pool, gfp);
if (unlikely(!netmem)) {
pool->frag_page = 0;
return 0;
--
2.47.0.338.g60cca15819-goog
* [PATCH net-next v3 2/5] net: page_pool: create page_pool_alloc_netmem
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 1/5] net: page_pool: rename page_pool_alloc_netmem to *_netmems Mina Almasry
@ 2024-12-09 17:23 ` Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider Mina Almasry
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas,
Stanislav Fomichev
Create page_pool_alloc_netmem to be the mirror of page_pool_alloc.
This enables drivers that currently use page_pool_alloc to
transition to netmem by converting their call sites to
page_pool_alloc_netmem.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
---
include/net/page_pool/helpers.h | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 793e6fd78bc5..8e548ff3044c 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -116,22 +116,22 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
return page_pool_alloc_frag(pool, offset, size, gfp);
}
-static inline struct page *page_pool_alloc(struct page_pool *pool,
- unsigned int *offset,
- unsigned int *size, gfp_t gfp)
+static inline netmem_ref page_pool_alloc_netmem(struct page_pool *pool,
+ unsigned int *offset,
+ unsigned int *size, gfp_t gfp)
{
unsigned int max_size = PAGE_SIZE << pool->p.order;
- struct page *page;
+ netmem_ref netmem;
if ((*size << 1) > max_size) {
*size = max_size;
*offset = 0;
- return page_pool_alloc_pages(pool, gfp);
+ return page_pool_alloc_netmems(pool, gfp);
}
- page = page_pool_alloc_frag(pool, offset, *size, gfp);
- if (unlikely(!page))
- return NULL;
+ netmem = page_pool_alloc_frag_netmem(pool, offset, *size, gfp);
+ if (unlikely(!netmem))
+ return 0;
/* There is very likely not enough space for another fragment, so append
* the remaining size to the current fragment to avoid truesize
@@ -142,7 +142,14 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
pool->frag_offset = max_size;
}
- return page;
+ return netmem;
+}
+
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+ unsigned int *offset,
+ unsigned int *size, gfp_t gfp)
+{
+ return netmem_to_page(page_pool_alloc_netmem(pool, offset, size, gfp));
}
/**
--
2.47.0.338.g60cca15819-goog
* [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 1/5] net: page_pool: rename page_pool_alloc_netmem to *_netmems Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 2/5] net: page_pool: create page_pool_alloc_netmem Mina Almasry
@ 2024-12-09 17:23 ` Mina Almasry
2024-12-11 3:44 ` Jakub Kicinski
2024-12-09 17:23 ` [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf " Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 5/5] net: Document memory provider driver support Mina Almasry
4 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas,
Jason Gunthorpe
From: Samiullah Khawaja <skhawaja@google.com>
Move the `dma_map` and `dma_sync` checks to `page_pool_init` to make
them generic. Set `dma_sync` to false for the devmem memory provider,
because the dma_sync APIs should not be used with dma-buf backed devmem
memory.
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
net/core/devmem.c | 9 ++++-----
net/core/page_pool.c | 3 +++
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 11b91c12ee11..826d0b00159f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -331,11 +331,10 @@ int mp_dmabuf_devmem_init(struct page_pool *pool)
if (!binding)
return -EINVAL;
- if (!pool->dma_map)
- return -EOPNOTSUPP;
-
- if (pool->dma_sync)
- return -EOPNOTSUPP;
+ /* dma-buf dma addresses should not be used with
+ * dma_sync_for_cpu/device. Force disable dma_sync.
+ */
+ pool->dma_sync = false;
if (pool->p.order != 0)
return -E2BIG;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 3c0e19e13e64..060450082342 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -287,6 +287,9 @@ static int page_pool_init(struct page_pool *pool,
}
if (pool->mp_priv) {
+ if (!pool->dma_map || !pool->dma_sync)
+ return -EOPNOTSUPP;
+
err = mp_dmabuf_devmem_init(pool);
if (err) {
pr_warn("%s() mem-provider init failed %d\n", __func__,
--
2.47.0.338.g60cca15819-goog
* [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf memory provider
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
` (2 preceding siblings ...)
2024-12-09 17:23 ` [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider Mina Almasry
@ 2024-12-09 17:23 ` Mina Almasry
2024-12-11 3:47 ` Jakub Kicinski
2024-12-09 17:23 ` [PATCH net-next v3 5/5] net: Document memory provider driver support Mina Almasry
4 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas,
Jason Gunthorpe
dmabuf dma-addresses should not be dma_sync'd for CPU/device. Typically
it's the driver's responsibility to dma_sync for CPU, but the driver
should not dma_sync for CPU if the netmem is actually coming from a
dmabuf memory provider.
The page_pool already exposes a helper for dma_sync_for_cpu:
page_pool_dma_sync_for_cpu. Upgrade this existing helper to handle
netmem, and have it skip dma_sync if the memory is from a dmabuf memory
provider. Drivers should migrate to using this helper when adding
support for netmem.
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
include/net/page_pool/helpers.h | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 8e548ff3044c..ad4fed4a791c 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -429,9 +429,10 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
}
/**
- * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
+ * page_pool_dma_sync_netmem_for_cpu - sync Rx page for CPU after it's written
+ * by HW
* @pool: &page_pool the @page belongs to
- * @page: page to sync
+ * @netmem: netmem to sync
* @offset: offset from page start to "hard" start if using PP frags
* @dma_sync_size: size of the data written to the page
*
@@ -440,16 +441,28 @@ static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
* Note that this version performs DMA sync unconditionally, even if the
* associated PP doesn't perform sync-for-device.
*/
-static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
- const struct page *page,
- u32 offset, u32 dma_sync_size)
+static inline void
+page_pool_dma_sync_netmem_for_cpu(const struct page_pool *pool,
+ const netmem_ref netmem, u32 offset,
+ u32 dma_sync_size)
{
+ if (pool->mp_priv)
+ return;
+
dma_sync_single_range_for_cpu(pool->p.dev,
- page_pool_get_dma_addr(page),
+ page_pool_get_dma_addr_netmem(netmem),
offset + pool->p.offset, dma_sync_size,
page_pool_get_dma_dir(pool));
}
+static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
+ struct page *page, u32 offset,
+ u32 dma_sync_size)
+{
+ page_pool_dma_sync_netmem_for_cpu(pool, page_to_netmem(page), offset,
+ dma_sync_size);
+}
+
static inline bool page_pool_put(struct page_pool *pool)
{
return refcount_dec_and_test(&pool->user_cnt);
--
2.47.0.338.g60cca15819-goog
* [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
` (3 preceding siblings ...)
2024-12-09 17:23 ` [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf " Mina Almasry
@ 2024-12-09 17:23 ` Mina Almasry
2024-12-09 17:43 ` Randy Dunlap
2024-12-11 3:55 ` Jakub Kicinski
4 siblings, 2 replies; 15+ messages in thread
From: Mina Almasry @ 2024-12-09 17:23 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Mina Almasry, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas
Document expectations from drivers looking to add support for device
memory tcp or other memory provider based features.
Signed-off-by: Mina Almasry <almasrymina@google.com>
---
Documentation/networking/index.rst | 1 +
Documentation/networking/memory-provider.rst | 52 ++++++++++++++++++++
2 files changed, 53 insertions(+)
create mode 100644 Documentation/networking/memory-provider.rst
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 46c178e564b3..c184e86a6ce1 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -73,6 +73,7 @@ Contents:
l2tp
lapb-module
mac80211-injection
+ memory-provider
mctp
mpls-sysctl
mptcp
diff --git a/Documentation/networking/memory-provider.rst b/Documentation/networking/memory-provider.rst
new file mode 100644
index 000000000000..4eee3b01eb18
--- /dev/null
+++ b/Documentation/networking/memory-provider.rst
@@ -0,0 +1,52 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Memory providers
+================
+
+
+Intro
+=====
+
+Device memory TCP, and likely more upcoming features, are reliant in memory
+provider support in the driver.
+
+
+Driver support
+==============
+
+1. The driver must support page_pool. The driver must not do its own recycling
+ on top of page_pool.
+
+2. The driver must support tcp-data-split ethtool option.
+
+3. The driver must use the page_pool netmem APIs. The netmem APIs are
+ currently 1-to-1 mapped with page APIs. Conversion to netmem should be
+ achievable by switching the page APIs to netmem APIs and tracking memory via
+ netmem_refs in the driver rather than struct page * :
+
+ - page_pool_alloc -> page_pool_alloc_netmem
+ - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
+ - page_pool_put_page -> page_pool_put_netmem
+
+ Not all page APIs have netmem equivalents at the moment. If your driver
+ relies on a missing netmem API, feel free to add and propose to netdev@ or
+ reach out to almasrymina@google.com for help adding the netmem API.
+
+4. The driver must use the following PP_FLAGS:
+
+ - PP_FLAG_DMA_MAP: netmem is not dma mappable by the driver. The driver
+ must delegate the dma mapping to the page_pool.
+ - PP_FLAG_DMA_SYNC_DEV: netmem dma addr is not necessarily dma-syncable
+ by the driver. The driver must delegate the dma syncing to the page_pool.
+ - PP_FLAG_ALLOW_UNREADABLE_NETMEM. The driver must specify this flag iff
+ tcp-data-split is enabled. In this case the netmem allocated by the
+ page_pool may be unreadable, and netmem_address() will return NULL to
+ indicate that. The driver must not assume that the netmem is readable.
+
+5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
+ dma_sync_single_range_for_cpu(). For some memory providers, dma_syncing for
+ CPU will be done by the page_pool, for others (particularly dmabuf memory
+ provider), dma syncing for CPU is the responsibility of the userspace using
+ dmabuf APIs. The driver must delegate the entire dma-syncing operation to
+ the page_pool which will do it correctly.
--
2.47.0.338.g60cca15819-goog
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-09 17:23 ` [PATCH net-next v3 5/5] net: Document memory provider driver support Mina Almasry
@ 2024-12-09 17:43 ` Randy Dunlap
2024-12-11 3:55 ` Jakub Kicinski
1 sibling, 0 replies; 15+ messages in thread
From: Randy Dunlap @ 2024-12-09 17:43 UTC (permalink / raw)
To: Mina Almasry, netdev, Jakub Kicinski, Pavel Begunkov,
Kaiyuan Zhang, Willem de Bruijn, Samiullah Khawaja, linux-doc,
linux-kernel
Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
Jonathan Corbet, Jesper Dangaard Brouer, Ilias Apalodimas
Hi--
On 12/9/24 9:23 AM, Mina Almasry wrote:
> Document expectations from drivers looking to add support for device
> memory tcp or other memory provider based features.
>
> Signed-off-by: Mina Almasry <almasrymina@google.com>
>
> ---
> Documentation/networking/index.rst | 1 +
> Documentation/networking/memory-provider.rst | 52 ++++++++++++++++++++
> 2 files changed, 53 insertions(+)
> create mode 100644 Documentation/networking/memory-provider.rst
>
> diff --git a/Documentation/networking/memory-provider.rst b/Documentation/networking/memory-provider.rst
> new file mode 100644
> index 000000000000..4eee3b01eb18
> --- /dev/null
> +++ b/Documentation/networking/memory-provider.rst
> @@ -0,0 +1,52 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +================
> +Memory providers
> +================
> +
> +
> +Intro
Full word, please.
> +=====
> +
> +Device memory TCP, and likely more upcoming features, are reliant in memory
> +provider support in the driver.
I can't quite parse after "features." Maybe "are reliant on"?
Maybe "in-memory"?
Should it be "reliable" instead of "reliant"? Internet tells me that
reliant means dependent on.
> +
> +
> +Driver support
> +==============
> +
> +1. The driver must support page_pool. The driver must not do its own recycling
> + on top of page_pool.
> +
> +2. The driver must support tcp-data-split ethtool option.
must support the
> +
> +3. The driver must use the page_pool netmem APIs. The netmem APIs are
> + currently 1-to-1 mapped with page APIs. Conversion to netmem should be
> + achievable by switching the page APIs to netmem APIs and tracking memory via
> + netmem_refs in the driver rather than struct page * :
> +
> + - page_pool_alloc -> page_pool_alloc_netmem
> + - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
> + - page_pool_put_page -> page_pool_put_netmem
> +
> + Not all page APIs have netmem equivalents at the moment. If your driver
> + relies on a missing netmem API, feel free to add and propose to netdev@ or
> + reach out to almasrymina@google.com for help adding the netmem API.
> +
> +4. The driver must use the following PP_FLAGS:
> +
> + - PP_FLAG_DMA_MAP: netmem is not dma mappable by the driver. The driver
dma-mappable
> + must delegate the dma mapping to the page_pool.
> + - PP_FLAG_DMA_SYNC_DEV: netmem dma addr is not necessarily dma-syncable
> + by the driver. The driver must delegate the dma syncing to the page_pool.
> + - PP_FLAG_ALLOW_UNREADABLE_NETMEM. The driver must specify this flag iff
> + tcp-data-split is enabled. In this case the netmem allocated by the
> + page_pool may be unreadable, and netmem_address() will return NULL to
> + indicate that. The driver must not assume that the netmem is readable.
> +
> +5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
> + dma_sync_single_range_for_cpu(). For some memory providers, dma_syncing for
> + CPU will be done by the page_pool, for others (particularly dmabuf memory
> + provider), dma syncing for CPU is the responsibility of the userspace using
> + dmabuf APIs. The driver must delegate the entire dma-syncing operation to
> + the page_pool which will do it correctly.
thanks.
--
~Randy
* Re: [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider
2024-12-09 17:23 ` [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider Mina Almasry
@ 2024-12-11 3:44 ` Jakub Kicinski
0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2024-12-11 3:44 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Mon, 9 Dec 2024 17:23:06 +0000 Mina Almasry wrote:
> + /* dma-buf dma addresses should not be used with
Sounds a bit scary, maybe let's say:
vvvvvvvvvvvvvvv
/* dma-buf DMA addresses do not need and should not be used with
? up to you.
* Re: [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf memory provider
2024-12-09 17:23 ` [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf " Mina Almasry
@ 2024-12-11 3:47 ` Jakub Kicinski
2024-12-11 19:40 ` Mina Almasry
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-12-11 3:47 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Mon, 9 Dec 2024 17:23:07 +0000 Mina Almasry wrote:
> -static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> - const struct page *page,
> - u32 offset, u32 dma_sync_size)
> +static inline void
> +page_pool_dma_sync_netmem_for_cpu(const struct page_pool *pool,
> + const netmem_ref netmem, u32 offset,
> + u32 dma_sync_size)
> {
> + if (pool->mp_priv)
Let's add a dedicated bit to skip sync. The io-uring support feels
quite close. Let's not force those guys to have to rejig this.
> + return;
> +
> dma_sync_single_range_for_cpu(pool->p.dev,
> - page_pool_get_dma_addr(page),
> + page_pool_get_dma_addr_netmem(netmem),
> offset + pool->p.offset, dma_sync_size,
> page_pool_get_dma_dir(pool));
> }
>
> +static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> + struct page *page, u32 offset,
> + u32 dma_sync_size)
> +{
> + page_pool_dma_sync_netmem_for_cpu(pool, page_to_netmem(page), offset,
> + dma_sync_size);
I have the feeling Olek won't thank us for this extra condition and
bit clearing. If driver calls page_pool_dma_sync_for_cpu() we don't
have to check the new bit / mp_priv. Let's copy & paste the
dma_sync_single_range_for_cpu() call directly here.
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-09 17:23 ` [PATCH net-next v3 5/5] net: Document memory provider driver support Mina Almasry
2024-12-09 17:43 ` Randy Dunlap
@ 2024-12-11 3:55 ` Jakub Kicinski
2024-12-11 17:53 ` Mina Almasry
1 sibling, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-12-11 3:55 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Mon, 9 Dec 2024 17:23:08 +0000 Mina Almasry wrote:
> +================
> +Memory providers
> +================
> +
> +
> +Intro
> +=====
> +
> +Device memory TCP, and likely more upcoming features, are reliant in memory
> +provider support in the driver.
Are "memory providers" something driver authors care about?
I'd go with netmem naming in all driver facing APIs.
Or perhaps say placing data into unreadable buffers?
> +Driver support
> +==============
> +
> +1. The driver must support page_pool. The driver must not do its own recycling
> + on top of page_pool.
I like the rule, but is there a specific reason driver can't do its own
recycling?
> +2. The driver must support tcp-data-split ethtool option.
> +
> +3. The driver must use the page_pool netmem APIs. The netmem APIs are
> + currently 1-to-1 mapped with page APIs. Conversion to netmem should be
mapped => correspond ?
> + achievable by switching the page APIs to netmem APIs and tracking memory via
> + netmem_refs in the driver rather than struct page * :
> +
> + - page_pool_alloc -> page_pool_alloc_netmem
> + - page_pool_get_dma_addr -> page_pool_get_dma_addr_netmem
> + - page_pool_put_page -> page_pool_put_netmem
> +
> + Not all page APIs have netmem equivalents at the moment. If your driver
> + relies on a missing netmem API, feel free to add and propose to netdev@ or
> + reach out to almasrymina@google.com for help adding the netmem API.
> +
> +4. The driver must use the following PP_FLAGS:
> +
> + - PP_FLAG_DMA_MAP: netmem is not dma mappable by the driver. The driver
> + must delegate the dma mapping to the page_pool.
> + - PP_FLAG_DMA_SYNC_DEV: netmem dma addr is not necessarily dma-syncable
> + by the driver. The driver must delegate the dma syncing to the page_pool.
> + - PP_FLAG_ALLOW_UNREADABLE_NETMEM. The driver must specify this flag iff
> + tcp-data-split is enabled. In this case the netmem allocated by the
> + page_pool may be unreadable, and netmem_address() will return NULL to
> + indicate that. The driver must not assume that the netmem is readable.
Should we turn the netmem_address() explanation into a point of its own,
calling out restrictions on use of CPU addresses, netmem_to_page() etc?
> +5. The driver must use page_pool_dma_sync_netmem_for_cpu() in lieu of
> + dma_sync_single_range_for_cpu(). For some memory providers, dma_syncing for
> + CPU will be done by the page_pool, for others (particularly dmabuf memory
> + provider), dma syncing for CPU is the responsibility of the userspace using
> + dmabuf APIs. The driver must delegate the entire dma-syncing operation to
> + the page_pool which will do it correctly.
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-11 3:55 ` Jakub Kicinski
@ 2024-12-11 17:53 ` Mina Almasry
2024-12-12 2:28 ` Jakub Kicinski
0 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-12-11 17:53 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Tue, Dec 10, 2024 at 7:55 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 9 Dec 2024 17:23:08 +0000 Mina Almasry wrote:
> > +================
> > +Memory providers
> > +================
> > +
> > +
> > +Intro
> > +=====
> > +
> > +Device memory TCP, and likely more upcoming features, are reliant in memory
> > +provider support in the driver.
>
> Are "memory providers" something driver authors care about?
> I'd go with netmem naming in all driver facing APIs.
> Or perhaps say placing data into unreable buffers?
>
Sure, I can center the docs around netmem rather than 'memory providers'.
> > +Driver support
> > +==============
> > +
> > +1. The driver must support page_pool. The driver must not do its own recycling
> > + on top of page_pool.
>
> I like the rule, but is there a specific reason driver can't do its own
> recycling?
>
Drivers doing their own recycling is not currently supported, AFAICT.
Adding support for it in the future and maintaining it is doable, but
a headache. I also noticed with IDPF you're nacking drivers doing
their own recycling anyway, so I thought why not just declare all such
use cases as not supported to make the whole thing much simpler.
Drivers can deprecate their recycling while adding support for netmem
which brings them in line with what you're enforcing for new drivers
anyway.
The specific reason: currently drivers will get_page pp pages to hold
on to them to do their own recycling, right? We don't even have a
get_netmem equivalent. We could add one (and for the TX path, which is
coming along, I do add one), but even then, the pp needs to detect
elevated references on net_iovs to exclude them from pp recycling. The
mp also needs to understand/keep track of elevated refcounts and make
sure the page is returned to it when the elevated refcounts from the
driver are dropped.
--
Thanks,
Mina
* Re: [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf memory provider
2024-12-11 3:47 ` Jakub Kicinski
@ 2024-12-11 19:40 ` Mina Almasry
0 siblings, 0 replies; 15+ messages in thread
From: Mina Almasry @ 2024-12-11 19:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Tue, Dec 10, 2024 at 7:47 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 9 Dec 2024 17:23:07 +0000 Mina Almasry wrote:
> > -static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> > - const struct page *page,
> > - u32 offset, u32 dma_sync_size)
> > +static inline void
> > +page_pool_dma_sync_netmem_for_cpu(const struct page_pool *pool,
> > + const netmem_ref netmem, u32 offset,
> > + u32 dma_sync_size)
> > {
> > + if (pool->mp_priv)
>
> Let's add a dedicated bit to skip sync. The io-uring support feels
> quite close. Let's not force those guys to have to rejig this.
>
OK.
> > + return;
> > +
> > dma_sync_single_range_for_cpu(pool->p.dev,
> > - page_pool_get_dma_addr(page),
> > + page_pool_get_dma_addr_netmem(netmem),
> > offset + pool->p.offset, dma_sync_size,
> > page_pool_get_dma_dir(pool));
> > }
> >
> > +static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> > + struct page *page, u32 offset,
> > + u32 dma_sync_size)
> > +{
> > + page_pool_dma_sync_netmem_for_cpu(pool, page_to_netmem(page), offset,
> > + dma_sync_size);
>
> I have the feeling Olek won't thank us for this extra condition and
> bit clearing. If driver calls page_pool_dma_sync_for_cpu() we don't
> have to check the new bit / mp_priv. Let's copy & paste the
> dma_sync_single_range_for_cpu() call directly here.
page_pool_get_dma_addr() also does a cast to netmem and bit clearing :/
The whole netmem stuff was written to maximize code reuse. We don't
really special case pages for performance, we convert pages to netmem
then pipe them through common code paths. I can special case pages
here but we would also need to copy the implementation of
page_pool_get_dma_addr() as well. But note the tradeoff is some code
duplication. Seems from the discussions it's worth it, which is fine by
me.
--
Thanks,
Mina
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-11 17:53 ` Mina Almasry
@ 2024-12-12 2:28 ` Jakub Kicinski
2024-12-13 17:50 ` Mina Almasry
0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-12-12 2:28 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Wed, 11 Dec 2024 09:53:36 -0800 Mina Almasry wrote:
> Drivers doing their own recycling is not currently supported, AFAICT.
> Adding support for it in the future and maintaining it is doable, but
> a headache. I also noticed with IDPF you're nacking drivers doing
> their own recycling anyway, so I thought why not just declare all such
> use cases as not supported to make the whole thing much simpler.
> Drivers can deprecate their recycling while adding support for netmem
> which brings them in line with what you're enforcing for new drivers
> anyway.
IIRC IDPF was doing recycling based on the old page ref tricks,
without any use of page pool at all. But without using page pool
the driver will never acquire a netmem_ref in the first place.
> The specific reason: currently drivers will get_page pp pages to hold
> on to them to do their own recycling, right? We don't even have a
> get_netmem equivalent. We could add one (and for the TX path, which is
> coming along, I do add one), but even then, the pp needs to detect
> elevated references on net_iovs to exclude them from pp recycling. The
> mp also needs to understand/keep track of elevated refcounts and make
> sure the page is returned to it when the elevated refcounts from the
> driver are dropped.
No? It should all just work. The page may get split / fragmented by
the driver or page_pool_alloc_netmem() which you're adding in this
series. A fragmented net_iov will have an elevated refcount in exactly
the same way as if the driver was stashing one ref to release later.
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-12 2:28 ` Jakub Kicinski
@ 2024-12-13 17:50 ` Mina Almasry
2024-12-14 1:55 ` Jakub Kicinski
0 siblings, 1 reply; 15+ messages in thread
From: Mina Almasry @ 2024-12-13 17:50 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Wed, Dec 11, 2024 at 6:28 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 11 Dec 2024 09:53:36 -0800 Mina Almasry wrote:
> > Drivers doing their own recycling is not currently supported, AFAICT.
> > Adding support for it in the future and maintaining it is doable, but
> > a headache. I also noticed with IDPF you're nacking drivers doing
> > their own recycling anyway, so I thought why not just declare all such
> > use cases as not supported to make the whole thing much simpler.
> > Drivers can deprecate their recycling while adding support for netmem
> > which brings them in line with what you're enforcing for new drivers
> > anyway.
>
> IIRC IDPF was doing recycling based on the old page ref tricks,
> without any use of page pool at all. But without using page pool
> the driver will never acquire a netmem_ref in the first place.
>
> > The specific reason: currently drivers will get_page pp pages to hold
> > on to them to do their own recycling, right? We don't even have a
> > get_netmem equivalent. We could add one (and for the TX path, which is
> > coming along, I do add one), but even then, the pp needs to detect
> > elevated references on net_iovs to exclude them from pp recycling. The
> > mp also needs to understand/keep track of elevated refcounts and make
> > sure the page is returned to it when the elevated refcounts from the
> > driver are dropped.
>
> No? It should all just work. The page may get split / fragmented by
> the driver or page_pool_alloc_netmem() which you're adding in this
> series. A fragmented net_iov will have an elevated refcount in exactly
> the same way as if the driver was stashing one ref to release later.
Ah, I had not considered that the driver may recycle the page by
holding onto a pp ref count, rather than a page refcount. The former
should indeed just work, although I don't have a driver that does
this, so test coverage may be a bit lacking.
But you mentioned you like the rule above. Do you want this removed
from the docs entirely? Or clarified to something like "drivers can't
perform their own recycling by holding onto page refs, but can hold
onto page_pool refs"?
--
Thanks,
Mina
* Re: [PATCH net-next v3 5/5] net: Document memory provider driver support
2024-12-13 17:50 ` Mina Almasry
@ 2024-12-14 1:55 ` Jakub Kicinski
0 siblings, 0 replies; 15+ messages in thread
From: Jakub Kicinski @ 2024-12-14 1:55 UTC (permalink / raw)
To: Mina Almasry
Cc: netdev, Pavel Begunkov, Kaiyuan Zhang, Willem de Bruijn,
Samiullah Khawaja, linux-doc, linux-kernel, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Jonathan Corbet,
Jesper Dangaard Brouer, Ilias Apalodimas
On Fri, 13 Dec 2024 09:50:15 -0800 Mina Almasry wrote:
> > No? It should all just work. The page may get split / fragmented by
> > the driver or page_pool_alloc_netmem() which you're adding in this
> > series. A fragmented net_iov will have an elevated refcount in exactly
> > the same way as if the driver was stashing one ref to release later.
>
> Ah, I had not considered that the driver may recycle the page by
> holding onto a pp ref count, rather than a page refcount. The former
> should indeed just work, although I don't have a driver that does
> this, so test coverage may be a bit lacking.
>
> But you mentioned you like the rule above. Do you want this removed
> from the docs entirely? Or clarified to something like "driver's can't
> perform their own recycling by holding onto page refs, but can hold
> onto page_pool refs"?
We can call it out, makes sense. I'm not sure how to clearly name
the page ref vs page_pool ref. But yes, driver must not attempt to
hold onto struct page for recycling, as there is no struct page.
I'd add to that that recycling using page pool refs is also discouraged
as the circulation time for buffers is much longer than when data is
copied out during recvmsg().
end of thread, other threads:[~2024-12-14 1:55 UTC | newest]
Thread overview: 15+ messages
2024-12-09 17:23 [PATCH net-next v3 0/5] devmem TCP fixes Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 1/5] net: page_pool: rename page_pool_alloc_netmem to *_netmems Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 2/5] net: page_pool: create page_pool_alloc_netmem Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 3/5] page_pool: Set `dma_sync` to false for devmem memory provider Mina Almasry
2024-12-11 3:44 ` Jakub Kicinski
2024-12-09 17:23 ` [PATCH net-next v3 4/5] page_pool: disable sync for cpu for dmabuf " Mina Almasry
2024-12-11 3:47 ` Jakub Kicinski
2024-12-11 19:40 ` Mina Almasry
2024-12-09 17:23 ` [PATCH net-next v3 5/5] net: Document memory provider driver support Mina Almasry
2024-12-09 17:43 ` Randy Dunlap
2024-12-11 3:55 ` Jakub Kicinski
2024-12-11 17:53 ` Mina Almasry
2024-12-12 2:28 ` Jakub Kicinski
2024-12-13 17:50 ` Mina Almasry
2024-12-14 1:55 ` Jakub Kicinski