* [PATCH net-next 01/12] page_pool: allow mixing PPs within one bulk
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 02/12] xdp: get rid of xdp_frame::mem.id Alexander Lobakin
` (11 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
The main reason for this change was to allow mixing pages from different
&page_pools within one &xdp_buff/&xdp_frame. Why not? With stuff like
devmem and io_uring zerocopy Rx, it's required to have separate PPs for
header buffers and payload buffers.
Adjust xdp_return_frame_bulk() and page_pool_put_netmem_bulk() so that
they won't be tied to a particular pool. Let the latter gather the
netmems whose PP differs from that of the first netmem in the bulk into
a separate bulk and process it after the main loop.
This greatly optimizes xdp_return_frame_bulk(): no more hashtable
lookups and forced flushes on PP mismatch. Also make
xdp_flush_frame_bulk() inline, as it's just one if + function call + one
u32 read, not worth extending the call ladder.
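For illustration only (not part of the diff; ``tx_bufs`` and ``done``
are made-up driver names), a Tx completion handler can now batch frames
coming from any mix of page_pools in a single bulk:

	struct xdp_frame_bulk bq;

	xdp_frame_bulk_init(&bq);

	rcu_read_lock();	/* required by xdp_return_frame_bulk() */

	/* the frames may now belong to different page_pools */
	for (u32 i = 0; i < done; i++)
		xdp_return_frame_bulk(tx_bufs[i].xdpf, &bq);

	xdp_flush_frame_bulk(&bq);
	rcu_read_unlock();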
Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com> # iterative
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org> # while (count)
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/page_pool/types.h | 6 +-
include/net/xdp.h | 16 +++--
net/core/page_pool.c | 109 ++++++++++++++++++++++------------
net/core/xdp.c | 29 +--------
4 files changed, 87 insertions(+), 73 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1ea16b0e9c79..05a864031271 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -259,8 +259,7 @@ void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
const struct xdp_mem_info *mem);
-void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
- u32 count);
+void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
#else
static inline void page_pool_destroy(struct page_pool *pool)
{
@@ -272,8 +271,7 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
{
}
-static inline void page_pool_put_netmem_bulk(struct page_pool *pool,
- netmem_ref *data, u32 count)
+static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
}
#endif
diff --git a/include/net/xdp.h b/include/net/xdp.h
index f4020b29122f..9e7eb8223513 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -11,6 +11,8 @@
#include <linux/netdevice.h>
#include <linux/skbuff.h> /* skb_shared_info */
+#include <net/page_pool/types.h>
+
/**
* DOC: XDP RX-queue information
*
@@ -193,14 +195,12 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
#define XDP_BULK_QUEUE_SIZE 16
struct xdp_frame_bulk {
int count;
- void *xa;
netmem_ref q[XDP_BULK_QUEUE_SIZE];
};
static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
{
- /* bq->count will be zero'ed when bq->xa gets updated */
- bq->xa = NULL;
+ bq->count = 0;
}
static inline struct skb_shared_info *
@@ -317,10 +317,18 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
void xdp_return_buff(struct xdp_buff *xdp);
-void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq);
+static inline void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
+{
+ if (unlikely(!bq->count))
+ return;
+
+ page_pool_put_netmem_bulk(bq->q, bq->count);
+ bq->count = 0;
+}
+
static __always_inline unsigned int
xdp_get_frame_len(const struct xdp_frame *xdpf)
{
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4c85b77cfdac..10cef95f12e3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -839,9 +839,41 @@ void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page,
}
EXPORT_SYMBOL(page_pool_put_unrefed_page);
+static void page_pool_recycle_ring_bulk(struct page_pool *pool,
+ netmem_ref *bulk,
+ u32 bulk_len)
+{
+ bool in_softirq;
+ u32 i;
+
+ /* Bulk produce into ptr_ring page_pool cache */
+ in_softirq = page_pool_producer_lock(pool);
+
+ for (i = 0; i < bulk_len; i++) {
+ if (__ptr_ring_produce(&pool->ring, (__force void *)bulk[i])) {
+ /* ring full */
+ recycle_stat_inc(pool, ring_full);
+ break;
+ }
+ }
+
+ page_pool_producer_unlock(pool, in_softirq);
+ recycle_stat_add(pool, ring, i);
+
+ /* Hopefully all pages were returned into ptr_ring */
+ if (likely(i == bulk_len))
+ return;
+
+ /*
+ * ptr_ring cache is full, free remaining pages outside producer lock
+ * since put_page() with refcnt == 1 can be an expensive operation.
+ */
+ for (; i < bulk_len; i++)
+ page_pool_return_page(pool, bulk[i]);
+}
+
/**
* page_pool_put_netmem_bulk() - release references on multiple netmems
- * @pool: pool from which pages were allocated
* @data: array holding netmem references
* @count: number of entries in @data
*
@@ -854,52 +886,55 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page);
* Please note the caller must not use data area after running
* page_pool_put_netmem_bulk(), as this function overwrites it.
*/
-void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
- u32 count)
+void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
- int i, bulk_len = 0;
- bool allow_direct;
- bool in_softirq;
-
- allow_direct = page_pool_napi_local(pool);
+ u32 bulk_len = 0;
- for (i = 0; i < count; i++) {
+ for (u32 i = 0; i < count; i++) {
netmem_ref netmem = netmem_compound_head(data[i]);
- /* It is not the last user for the page frag case */
- if (!page_pool_is_last_ref(netmem))
- continue;
-
- netmem = __page_pool_put_page(pool, netmem, -1, allow_direct);
- /* Approved for bulk recycling in ptr_ring cache */
- if (netmem)
+ if (page_pool_is_last_ref(netmem))
data[bulk_len++] = netmem;
}
- if (!bulk_len)
- return;
-
- /* Bulk producer into ptr_ring page_pool cache */
- in_softirq = page_pool_producer_lock(pool);
- for (i = 0; i < bulk_len; i++) {
- if (__ptr_ring_produce(&pool->ring, (__force void *)data[i])) {
- /* ring full */
- recycle_stat_inc(pool, ring_full);
- break;
+ count = bulk_len;
+ while (count) {
+ netmem_ref bulk[XDP_BULK_QUEUE_SIZE];
+ struct page_pool *pool = NULL;
+ bool allow_direct;
+ u32 foreign = 0;
+
+ bulk_len = 0;
+
+ for (u32 i = 0; i < count; i++) {
+ struct page_pool *netmem_pp;
+ netmem_ref netmem = data[i];
+
+ netmem_pp = netmem_get_pp(netmem);
+ if (unlikely(!pool)) {
+ pool = netmem_pp;
+ allow_direct = page_pool_napi_local(pool);
+ } else if (netmem_pp != pool) {
+ /*
+ * If the netmem belongs to a different
+ * page_pool, save it for another round.
+ */
+ data[foreign++] = netmem;
+ continue;
+ }
+
+ netmem = __page_pool_put_page(pool, netmem, -1,
+ allow_direct);
+ /* Approved for bulk recycling in ptr_ring cache */
+ if (netmem)
+ bulk[bulk_len++] = netmem;
}
- }
- recycle_stat_add(pool, ring, i);
- page_pool_producer_unlock(pool, in_softirq);
- /* Hopefully all pages was return into ptr_ring */
- if (likely(i == bulk_len))
- return;
+ if (bulk_len)
+ page_pool_recycle_ring_bulk(pool, bulk, bulk_len);
- /* ptr_ring cache full, free remaining pages outside producer lock
- * since put_page() with refcnt == 1 can be an expensive operation
- */
- for (; i < bulk_len; i++)
- page_pool_return_page(pool, data[i]);
+ count = foreign;
+ }
}
EXPORT_SYMBOL(page_pool_put_netmem_bulk);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 938ad15c9857..56127e8ec85f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -511,46 +511,19 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
* xdp_frame_bulk is usually stored/allocated on the function
* call-stack to avoid locking penalties.
*/
-void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
-{
- struct xdp_mem_allocator *xa = bq->xa;
-
- if (unlikely(!xa || !bq->count))
- return;
-
- page_pool_put_netmem_bulk(xa->page_pool, bq->q, bq->count);
- /* bq->xa is not cleared to save lookup, if mem.id same in next bulk */
- bq->count = 0;
-}
-EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
/* Must be called with rcu_read_lock held */
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq)
{
- struct xdp_mem_info *mem = &xdpf->mem;
- struct xdp_mem_allocator *xa;
-
- if (mem->type != MEM_TYPE_PAGE_POOL) {
+ if (xdpf->mem.type != MEM_TYPE_PAGE_POOL) {
xdp_return_frame(xdpf);
return;
}
- xa = bq->xa;
- if (unlikely(!xa)) {
- xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- bq->count = 0;
- bq->xa = xa;
- }
-
if (bq->count == XDP_BULK_QUEUE_SIZE)
xdp_flush_frame_bulk(bq);
- if (unlikely(mem->id != xa->mem.id)) {
- xdp_flush_frame_bulk(bq);
- bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- }
-
if (unlikely(xdp_frame_has_frags(xdpf))) {
struct skb_shared_info *sinfo;
int i;
--
2.47.1
* [PATCH net-next 02/12] xdp: get rid of xdp_frame::mem.id
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 01/12] page_pool: allow mixing PPs within one bulk Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 03/12] xdp: make __xdp_return() MP-agnostic Alexander Lobakin
` (10 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Initially, xdp_frame::mem.id was used to look up the corresponding
&page_pool to return the page correctly.
However, after struct page was extended to carry a direct pointer to
its PP (netmem has it as well), keeping this field makes no sense.
xdp_return_frame_bulk() still used it to do a lookup, and this
leftover is now removed.
Remove xdp_frame::mem and replace it with ::mem_type, as only the
memory type still matters and we need to know it to free the frame
correctly.
As a cute side effect, every scalar field in &xdp_frame can now be
4 bytes wide, speeding up accesses to them.
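Not part of the diff, just to illustrate why the ID became redundant:
with a direct PP pointer carried by the page/netmem, the pool can be
recovered without any lookup, e.g.

	struct page_pool *pp = netmem_get_pp(virt_to_netmem(xdpf->data));

which is what the bulk-return path from the previous patch already
relies on.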
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 14 +++++-----
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
drivers/net/veth.c | 4 +--
kernel/bpf/cpumap.c | 2 +-
net/bpf/test_run.c | 4 +--
net/core/filter.c | 12 ++++----
net/core/xdp.c | 28 +++++++++----------
7 files changed, 33 insertions(+), 33 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 9e7eb8223513..1c260869a353 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -169,13 +169,13 @@ xdp_get_buff_len(const struct xdp_buff *xdp)
struct xdp_frame {
void *data;
- u16 len;
- u16 headroom;
+ u32 len;
+ u32 headroom;
u32 metasize; /* uses lower 8-bits */
/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
- * while mem info is valid on remote CPU.
+ * while mem_type is valid on remote CPU.
*/
- struct xdp_mem_info mem;
+ enum xdp_mem_type mem_type:32;
struct net_device *dev_rx; /* used by cpumap */
u32 frame_sz;
u32 flags; /* supported values defined in xdp_buff_flags */
@@ -306,13 +306,13 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
if (unlikely(xdp_update_frame_from_buff(xdp, xdp_frame) < 0))
return NULL;
- /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
- xdp_frame->mem = xdp->rxq->mem;
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_type */
+ xdp_frame->mem_type = xdp->rxq->mem.type;
return xdp_frame;
}
-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp);
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index bf5baef5c3e0..4948b4906584 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2281,7 +2281,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv,
new_xdpf->len = xdpf->len;
new_xdpf->headroom = priv->tx_headroom;
new_xdpf->frame_sz = DPAA_BP_RAW_SIZE;
- new_xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ new_xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
/* Release the initial buffer */
xdp_return_frame_rx_napi(xdpf);
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 07ebb800edf1..01251868a9c2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -634,7 +634,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
break;
case XDP_TX:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
trace_xdp_exception(rq->dev, xdp_prog, act);
frame = &orig_frame;
@@ -646,7 +646,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
goto xdp_xmit;
case XDP_REDIRECT:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
frame = &orig_frame;
stats->rx_drops++;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a2f46785ac3b..774accbd4a22 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -190,7 +190,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
int err;
rxq.dev = xdpf->dev_rx;
- rxq.mem = xdpf->mem;
+ rxq.mem.type = xdpf->mem_type;
/* TODO: report queue_index to xdp_rxq_info */
xdp_convert_frame_to_buff(xdpf, &xdp);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 501ec4249fed..9ae2a7f1738b 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -153,7 +153,7 @@ static void xdp_test_run_init_page(netmem_ref netmem, void *arg)
new_ctx->data = new_ctx->data_meta + meta_len;
xdp_update_frame_from_buff(new_ctx, frm);
- frm->mem = new_ctx->rxq->mem;
+ frm->mem_type = new_ctx->rxq->mem.type;
memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
}
@@ -246,7 +246,7 @@ static void reset_ctx(struct xdp_page_head *head)
head->ctx.data_meta = head->orig_ctx.data_meta;
head->ctx.data_end = head->orig_ctx.data_end;
xdp_update_frame_from_buff(&head->ctx, head->frame);
- head->frame->mem = head->orig_ctx.rxq->mem;
+ head->frame->mem_type = head->orig_ctx.rxq->mem.type;
}
static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
diff --git a/net/core/filter.c b/net/core/filter.c
index fac245065b0a..6c036708634b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4119,13 +4119,13 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
}
static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
- struct xdp_mem_info *mem_info, bool release)
+ enum xdp_mem_type mem_type, bool release)
{
struct xdp_buff *zc_frag = xsk_buff_get_tail(xdp);
if (release) {
xsk_buff_del_tail(zc_frag);
- __xdp_return(NULL, mem_info, false, zc_frag);
+ __xdp_return(NULL, mem_type, false, zc_frag);
} else {
zc_frag->data_end -= shrink;
}
@@ -4134,18 +4134,18 @@ static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag,
int shrink)
{
- struct xdp_mem_info *mem_info = &xdp->rxq->mem;
+ enum xdp_mem_type mem_type = xdp->rxq->mem.type;
bool release = skb_frag_size(frag) == shrink;
- if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) {
- bpf_xdp_shrink_data_zc(xdp, shrink, mem_info, release);
+ if (mem_type == MEM_TYPE_XSK_BUFF_POOL) {
+ bpf_xdp_shrink_data_zc(xdp, shrink, mem_type, release);
goto out;
}
if (release) {
struct page *page = skb_frag_page(frag);
- __xdp_return(page_address(page), mem_info, false, NULL);
+ __xdp_return(page_address(page), mem_type, false, NULL);
}
out:
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 56127e8ec85f..d367571c5838 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -430,12 +430,12 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
* is used for those calls sites. Thus, allowing for faster recycling
* of xdp_frames/pages in those cases.
*/
-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp)
{
struct page *page;
- switch (mem->type) {
+ switch (mem_type) {
case MEM_TYPE_PAGE_POOL:
page = virt_to_head_page(data);
if (napi_direct && xdp_return_frame_no_direct())
@@ -458,7 +458,7 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
break;
default:
/* Not possible, checked in xdp_rxq_info_reg_mem_model() */
- WARN(1, "Incorrect XDP memory type (%d) usage", mem->type);
+ WARN(1, "Incorrect XDP memory type (%d) usage", mem_type);
break;
}
}
@@ -475,10 +475,10 @@ void xdp_return_frame(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdpf->mem, false, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, false, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, false, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame);
@@ -494,10 +494,10 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdpf->mem, true, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, true, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, true, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, true, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -516,7 +516,7 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq)
{
- if (xdpf->mem.type != MEM_TYPE_PAGE_POOL) {
+ if (xdpf->mem_type != MEM_TYPE_PAGE_POOL) {
xdp_return_frame(xdpf);
return;
}
@@ -553,10 +553,11 @@ void xdp_return_buff(struct xdp_buff *xdp)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+ __xdp_return(page_address(page), xdp->rxq->mem.type, true,
+ xdp);
}
out:
- __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
+ __xdp_return(xdp->data, xdp->rxq->mem.type, true, xdp);
}
EXPORT_SYMBOL_GPL(xdp_return_buff);
@@ -602,7 +603,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
xdpf->headroom = 0;
xdpf->metasize = metasize;
xdpf->frame_sz = PAGE_SIZE;
- xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
xsk_buff_free(xdp);
return xdpf;
@@ -672,7 +673,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
* - RX ring dev queue index (skb_record_rx_queue)
*/
- if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+ if (xdpf->mem_type == MEM_TYPE_PAGE_POOL)
skb_mark_for_recycle(skb);
/* Allow SKB to reuse area used by xdp_frame */
@@ -719,8 +720,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
nxdpf = addr;
nxdpf->data = addr + headroom;
nxdpf->frame_sz = PAGE_SIZE;
- nxdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
- nxdpf->mem.id = 0;
+ nxdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
return nxdpf;
}
--
2.47.1
* [PATCH net-next 03/12] xdp: make __xdp_return() MP-agnostic
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 01/12] page_pool: allow mixing PPs within one bulk Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 02/12] xdp: get rid of xdp_frame::mem.id Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 04/12] xdp: add generic xdp_buff_add_frag() Alexander Lobakin
` (9 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Currently, __xdp_return() takes a pointer to the virtual memory to free
a buffer. Apart from sometimes provoking redundant data <--> page
conversions, taking a data pointer effectively prevents lots of XDP
code from supporting non-page-backed buffers, as there's no mapping for
non-host memory (data is always NULL).
Just convert it to always take a netmem reference. For
xdp_return_{buff,frame*}(), this chops off one page_address() per frag
and adds one virt_to_netmem() (same as virt_to_page()) per header
buffer. For __xdp_return() itself, it removes one virt_to_page() for
MEM_TYPE_PAGE_POOL and another one for MEM_TYPE_PAGE_ORDER0, adding
one page_address() for the [not really common nowadays]
MEM_TYPE_PAGE_SHARED. The main effect, however, is that the
abovementioned functions won't die or leak memory anymore if the frame
has non-host memory attached and will correctly free it.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 4 ++--
net/core/filter.c | 9 +++------
net/core/xdp.c | 47 +++++++++++++++++++----------------------------
3 files changed, 24 insertions(+), 36 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 1c260869a353..d2089cfecefd 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -312,8 +312,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
return xdp_frame;
}
-void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
- struct xdp_buff *xdp);
+void __xdp_return(netmem_ref netmem, enum xdp_mem_type mem_type,
+ bool napi_direct, struct xdp_buff *xdp);
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
void xdp_return_buff(struct xdp_buff *xdp);
diff --git a/net/core/filter.c b/net/core/filter.c
index 6c036708634b..5fea874025d3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4125,7 +4125,7 @@ static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
if (release) {
xsk_buff_del_tail(zc_frag);
- __xdp_return(NULL, mem_type, false, zc_frag);
+ __xdp_return(0, mem_type, false, zc_frag);
} else {
zc_frag->data_end -= shrink;
}
@@ -4142,11 +4142,8 @@ static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag,
goto out;
}
- if (release) {
- struct page *page = skb_frag_page(frag);
-
- __xdp_return(page_address(page), mem_type, false, NULL);
- }
+ if (release)
+ __xdp_return(skb_frag_netmem(frag), mem_type, false, NULL);
out:
return release;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index d367571c5838..f1165a35411b 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -430,27 +430,25 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
* is used for those calls sites. Thus, allowing for faster recycling
* of xdp_frames/pages in those cases.
*/
-void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
- struct xdp_buff *xdp)
+void __xdp_return(netmem_ref netmem, enum xdp_mem_type mem_type,
+ bool napi_direct, struct xdp_buff *xdp)
{
- struct page *page;
-
switch (mem_type) {
case MEM_TYPE_PAGE_POOL:
- page = virt_to_head_page(data);
+ netmem = netmem_compound_head(netmem);
if (napi_direct && xdp_return_frame_no_direct())
napi_direct = false;
/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
* as mem->type knows this a page_pool page
*/
- page_pool_put_full_page(page->pp, page, napi_direct);
+ page_pool_put_full_netmem(netmem_get_pp(netmem), netmem,
+ napi_direct);
break;
case MEM_TYPE_PAGE_SHARED:
- page_frag_free(data);
+ page_frag_free(__netmem_address(netmem));
break;
case MEM_TYPE_PAGE_ORDER0:
- page = virt_to_page(data); /* Assumes order0 page*/
- put_page(page);
+ put_page(__netmem_to_page(netmem));
break;
case MEM_TYPE_XSK_BUFF_POOL:
/* NB! Only valid from an xdp_buff! */
@@ -466,38 +464,34 @@ void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
void xdp_return_frame(struct xdp_frame *xdpf)
{
struct skb_shared_info *sinfo;
- int i;
if (likely(!xdp_frame_has_frags(xdpf)))
goto out;
sinfo = xdp_get_shared_info_from_frame(xdpf);
- for (i = 0; i < sinfo->nr_frags; i++) {
- struct page *page = skb_frag_page(&sinfo->frags[i]);
+ for (u32 i = 0; i < sinfo->nr_frags; i++)
+ __xdp_return(skb_frag_netmem(&sinfo->frags[i]), xdpf->mem_type,
+ false, NULL);
- __xdp_return(page_address(page), xdpf->mem_type, false, NULL);
- }
out:
- __xdp_return(xdpf->data, xdpf->mem_type, false, NULL);
+ __xdp_return(virt_to_netmem(xdpf->data), xdpf->mem_type, false, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
{
struct skb_shared_info *sinfo;
- int i;
if (likely(!xdp_frame_has_frags(xdpf)))
goto out;
sinfo = xdp_get_shared_info_from_frame(xdpf);
- for (i = 0; i < sinfo->nr_frags; i++) {
- struct page *page = skb_frag_page(&sinfo->frags[i]);
+ for (u32 i = 0; i < sinfo->nr_frags; i++)
+ __xdp_return(skb_frag_netmem(&sinfo->frags[i]), xdpf->mem_type,
+ true, NULL);
- __xdp_return(page_address(page), xdpf->mem_type, true, NULL);
- }
out:
- __xdp_return(xdpf->data, xdpf->mem_type, true, NULL);
+ __xdp_return(virt_to_netmem(xdpf->data), xdpf->mem_type, true, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -544,20 +538,17 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
void xdp_return_buff(struct xdp_buff *xdp)
{
struct skb_shared_info *sinfo;
- int i;
if (likely(!xdp_buff_has_frags(xdp)))
goto out;
sinfo = xdp_get_shared_info_from_buff(xdp);
- for (i = 0; i < sinfo->nr_frags; i++) {
- struct page *page = skb_frag_page(&sinfo->frags[i]);
+ for (u32 i = 0; i < sinfo->nr_frags; i++)
+ __xdp_return(skb_frag_netmem(&sinfo->frags[i]),
+ xdp->rxq->mem.type, true, xdp);
- __xdp_return(page_address(page), xdp->rxq->mem.type, true,
- xdp);
- }
out:
- __xdp_return(xdp->data, xdp->rxq->mem.type, true, xdp);
+ __xdp_return(virt_to_netmem(xdp->data), xdp->rxq->mem.type, true, xdp);
}
EXPORT_SYMBOL_GPL(xdp_return_buff);
--
2.47.1
* [PATCH net-next 04/12] xdp: add generic xdp_buff_add_frag()
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (2 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 03/12] xdp: make __xdp_return() MP-agnostic Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff() Alexander Lobakin
` (8 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
The piece of code that attaches a frag to &xdp_buff is almost
identical across the drivers supporting XDP multi-buffer on Rx.
Make it a generic, elegant "oneliner".
Also, I see lots of drivers calculating frags_truesize as
`xdp->frame_sz * nr_frags`. I can't say this is fully correct, since
frags might be backed by chunks of different sizes, especially with
stuff like the header split. Even page_pool_alloc() can give you two
different truesizes on two subsequent requests to allocate the same
buffer size. Add a field to &skb_shared_info (unionized as there's no
free slot currently on x86_64) to track the "true" truesize. It can
be used later when updating an skb.
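A hypothetical Rx-path caller (the names are placeholders, not taken
from this patch):

	/* netmem/off/len/truesize describe the received fragment */
	if (!xdp_buff_add_frag(xdp, netmem, off, len, truesize)) {
		/* shinfo is full: drop the buff; the current frag is
		 * still owned by the caller and must be freed separately
		 */
		xdp_return_buff(xdp);
		break;
	}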
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/linux/skbuff.h | 16 +++++--
include/net/xdp.h | 96 +++++++++++++++++++++++++++++++++++++++++-
net/core/xdp.c | 11 +++++
3 files changed, 118 insertions(+), 5 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 69624b394cd9..8bcf14ae6789 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -608,11 +608,19 @@ struct skb_shared_info {
* Warning : all fields before dataref are cleared in __alloc_skb()
*/
atomic_t dataref;
- unsigned int xdp_frags_size;
- /* Intermediate layers must ensure that destructor_arg
- * remains valid until skb destructor */
- void * destructor_arg;
+ union {
+ struct {
+ u32 xdp_frags_size;
+ u32 xdp_frags_truesize;
+ };
+
+ /*
+ * Intermediate layers must ensure that destructor_arg
+ * remains valid until skb destructor.
+ */
+ void *destructor_arg;
+ };
/* must be last field, see pskb_expand_head() */
skb_frag_t frags[MAX_SKB_FRAGS];
diff --git a/include/net/xdp.h b/include/net/xdp.h
index d2089cfecefd..11139c210b49 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -167,6 +167,93 @@ xdp_get_buff_len(const struct xdp_buff *xdp)
return len;
}
+void xdp_return_frag(netmem_ref netmem, const struct xdp_buff *xdp);
+
+/**
+ * __xdp_buff_add_frag - attach frag to &xdp_buff
+ * @xdp: XDP buffer to attach the frag to
+ * @netmem: network memory containing the frag
+ * @offset: offset at which the frag starts
+ * @size: size of the frag
+ * @truesize: total memory size occupied by the frag
+ * @try_coalesce: whether to try coalescing the frags (not valid for XSk)
+ *
+ * Attach frag to the XDP buffer. If it currently has no frags attached,
+ * initialize the related fields, otherwise check that the frag number
+ * didn't reach the limit of ``MAX_SKB_FRAGS``. If possible, try coalescing
+ * the frag with the previous one.
+ * The function doesn't check/update the pfmemalloc bit. Please use the
+ * non-underscored wrapper in drivers.
+ *
+ * Return: true on success, false if there's no space for the frag in
+ * the shared info struct.
+ */
+static inline bool __xdp_buff_add_frag(struct xdp_buff *xdp, netmem_ref netmem,
+ u32 offset, u32 size, u32 truesize,
+ bool try_coalesce)
+{
+ struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+ skb_frag_t *prev;
+ u32 nr_frags;
+
+ if (!xdp_buff_has_frags(xdp)) {
+ xdp_buff_set_frags_flag(xdp);
+
+ nr_frags = 0;
+ sinfo->xdp_frags_size = 0;
+ sinfo->xdp_frags_truesize = 0;
+
+ goto fill;
+ }
+
+ nr_frags = sinfo->nr_frags;
+ prev = &sinfo->frags[nr_frags - 1];
+
+ if (try_coalesce && netmem == skb_frag_netmem(prev) &&
+ offset == skb_frag_off(prev) + skb_frag_size(prev)) {
+ skb_frag_size_add(prev, size);
+ /* Guaranteed to only decrement the refcount */
+ xdp_return_frag(netmem, xdp);
+ } else if (unlikely(nr_frags == MAX_SKB_FRAGS)) {
+ return false;
+ } else {
+fill:
+ __skb_fill_netmem_desc_noacc(sinfo, nr_frags++, netmem,
+ offset, size);
+ }
+
+ sinfo->nr_frags = nr_frags;
+ sinfo->xdp_frags_size += size;
+ sinfo->xdp_frags_truesize += truesize;
+
+ return true;
+}
+
+/**
+ * xdp_buff_add_frag - attach frag to &xdp_buff
+ * @xdp: XDP buffer to attach the frag to
+ * @netmem: network memory containing the frag
+ * @offset: offset at which the frag starts
+ * @size: size of the frag
+ * @truesize: total memory size occupied by the frag
+ *
+ * Version of __xdp_buff_add_frag() which takes care of the pfmemalloc bit.
+ *
+ * Return: true on success, false if there's no space for the frag in
+ * the shared info struct.
+ */
+static inline bool xdp_buff_add_frag(struct xdp_buff *xdp, netmem_ref netmem,
+ u32 offset, u32 size, u32 truesize)
+{
+ if (!__xdp_buff_add_frag(xdp, netmem, offset, size, truesize, true))
+ return false;
+
+ if (unlikely(netmem_is_pfmemalloc(netmem)))
+ xdp_buff_set_frag_pfmemalloc(xdp);
+
+ return true;
+}
+
struct xdp_frame {
void *data;
u32 len;
@@ -230,7 +317,14 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
unsigned int size, unsigned int truesize,
bool pfmemalloc)
{
- skb_shinfo(skb)->nr_frags = nr_frags;
+ struct skb_shared_info *sinfo = skb_shinfo(skb);
+
+ sinfo->nr_frags = nr_frags;
+ /*
+ * ``destructor_arg`` is unionized with ``xdp_frags_{,true}size``,
+ * reset it once these fields aren't used anymore.
+ */
+ sinfo->destructor_arg = NULL;
skb->len += size;
skb->data_len += size;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index f1165a35411b..a66a4e036f53 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -535,6 +535,17 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
}
EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
+/**
+ * xdp_return_frag -- free one XDP frag or decrement its refcount
+ * @netmem: network memory reference to release
+ * @xdp: &xdp_buff to release the frag for
+ */
+void xdp_return_frag(netmem_ref netmem, const struct xdp_buff *xdp)
+{
+ __xdp_return(netmem, xdp->rxq->mem.type, true, NULL);
+}
+EXPORT_SYMBOL_GPL(xdp_return_frag);
+
void xdp_return_buff(struct xdp_buff *xdp)
{
struct skb_shared_info *sinfo;
--
2.47.1
* [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff()
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (3 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 04/12] xdp: add generic xdp_buff_add_frag() Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-13 2:11 ` Jakub Kicinski
2024-12-11 17:26 ` [PATCH net-next 06/12] xsk: make xsk_buff_add_frag really add the frag via __xdp_buff_add_frag() Alexander Lobakin
` (7 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
The code which builds an skb from an &xdp_buff keeps multiplying itself
around the drivers with almost no changes. Let's try to stop that by
adding a generic function.
Unlike __xdp_build_skb_from_frame(), always allocate an skbuff head
using napi_build_skb() and make use of the available xdp_rxq pointer to
assign the Rx queue index. In case of a PP-backed buffer, mark the skb
to be recycled, as every PP user has been switched to recycle skbs.
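A hypothetical XDP_PASS path in a driver (illustration only, the error
handling is up to the caller):

	skb = xdp_build_skb_from_buff(xdp);
	if (unlikely(!skb)) {
		/* the buff wasn't freed, it's still the driver's to drop */
		xdp_return_buff(xdp);
		break;
	}

	napi_gro_receive(napi, skb);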
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 1 +
net/core/xdp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 11139c210b49..aa24fa78cbe6 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -336,6 +336,7 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
void xdp_warn(const char *msg, const char *func, const int line);
#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp);
struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index a66a4e036f53..c4d824cf27da 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -629,6 +629,61 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
}
EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);
+/**
+ * xdp_build_skb_from_buff - create an skb from &xdp_buff
+ * @xdp: &xdp_buff to convert to an skb
+ *
+ * Perform common operations to create a new skb to pass up the stack from
+ * &xdp_buff: allocate an skb head from the NAPI percpu cache, initialize
+ * skb data pointers and offsets, set the recycle bit if the buff is
+ * PP-backed, Rx queue index, protocol and update frags info.
+ *
+ * Return: new &sk_buff on success, %NULL on error.
+ */
+struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp)
+{
+ const struct xdp_rxq_info *rxq = xdp->rxq;
+ const struct skb_shared_info *sinfo;
+ struct sk_buff *skb;
+ u32 nr_frags = 0;
+ int metalen;
+
+ if (unlikely(xdp_buff_has_frags(xdp))) {
+ sinfo = xdp_get_shared_info_from_buff(xdp);
+ nr_frags = sinfo->nr_frags;
+ }
+
+ skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
+ if (unlikely(!skb))
+ return NULL;
+
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ __skb_put(skb, xdp->data_end - xdp->data);
+
+ metalen = xdp->data - xdp->data_meta;
+ if (metalen > 0)
+ skb_metadata_set(skb, metalen);
+
+ if (rxq->mem.type == MEM_TYPE_PAGE_POOL && is_page_pool_compiled_in())
+ skb_mark_for_recycle(skb);
+
+ skb_record_rx_queue(skb, rxq->queue_index);
+
+ if (unlikely(nr_frags)) {
+ u32 tsize;
+
+ tsize = sinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz;
+ xdp_update_skb_shared_info(skb, nr_frags,
+ sinfo->xdp_frags_size, tsize,
+ xdp_buff_is_frag_pfmemalloc(xdp));
+ }
+
+ skb->protocol = eth_type_trans(skb, rxq->dev);
+
+ return skb;
+}
+EXPORT_SYMBOL_GPL(xdp_build_skb_from_buff);
+
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
struct net_device *dev)
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread* Re: [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff()
2024-12-11 17:26 ` [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff() Alexander Lobakin
@ 2024-12-13 2:11 ` Jakub Kicinski
2024-12-13 17:25 ` Alexander Lobakin
0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2024-12-13 2:11 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Wed, 11 Dec 2024 18:26:42 +0100 Alexander Lobakin wrote:
> + if (rxq->mem.type == MEM_TYPE_PAGE_POOL && is_page_pool_compiled_in())
> + skb_mark_for_recycle(skb);
I feel like the check for mem.type is unnecessary. I can't think of
a driver that would build a skb out of pp pages, and not own pp
refs on those pages. Setting pp_recycle on non-pp skb should be safe,
even if slightly wasteful.
Also:
static inline void skb_mark_for_recycle(struct sk_buff *skb)
{
#ifdef CONFIG_PAGE_POOL
skb->pp_recycle = 1;
#endif
}
You don't have to check if PP is complied in explicitly.
* Re: [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff()
2024-12-13 2:11 ` Jakub Kicinski
@ 2024-12-13 17:25 ` Alexander Lobakin
0 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-13 17:25 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 12 Dec 2024 18:11:29 -0800
> On Wed, 11 Dec 2024 18:26:42 +0100 Alexander Lobakin wrote:
>> + if (rxq->mem.type == MEM_TYPE_PAGE_POOL && is_page_pool_compiled_in())
>> + skb_mark_for_recycle(skb);
>
> I feel like the check for mem.type is unnecessary. I can't think of
> a driver that would build a skb out of pp pages, and not own pp
But there's no restriction that this function is limited to PP pages.
> refs on those pages. Setting pp_recycle on non-pp skb should be safe,
> even if slightly wasteful.
>
> Also:
>
> static inline void skb_mark_for_recycle(struct sk_buff *skb)
> {
> #ifdef CONFIG_PAGE_POOL
> skb->pp_recycle = 1;
> #endif
> }
>
> You don't have to check if PP is complied in explicitly.
Ah I see ._. My idea was to compile out this check/branch entirely when
the PP is not compiled in, but I think it would be optimized out by the
compiler anyway, so correct.
Thanks,
Olek
* [PATCH net-next 06/12] xsk: make xsk_buff_add_frag really add the frag via __xdp_buff_add_frag()
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (4 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 05/12] xdp: add generic xdp_build_skb_from_buff() Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion Alexander Lobakin
` (6 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Currently, xsk_buff_add_frag() only adds the frag to the pool's linked
list, without doing anything with the &xdp_buff. The drivers do that
manually, and the logic is the same everywhere.
Make it really add an skb frag, just like xdp_buff_add_frag() does, and
free the frags on error if needed. This allows removing the repeated
code from i40e and ice and avoids adding the same code again and again
in new drivers.
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp_sock_drv.h | 18 ++++++++++--
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 30 ++------------------
drivers/net/ethernet/intel/ice/ice_xsk.c | 32 ++--------------------
3 files changed, 20 insertions(+), 60 deletions(-)
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index f3175a5d28f7..86620c818965 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -136,11 +136,21 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
xp_free(xskb);
}
-static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
+static inline bool xsk_buff_add_frag(struct xdp_buff *head,
+ struct xdp_buff *xdp)
{
- struct xdp_buff_xsk *frag = container_of(xdp, struct xdp_buff_xsk, xdp);
+ const void *data = xdp->data;
+ struct xdp_buff_xsk *frag;
+
+ if (!__xdp_buff_add_frag(head, virt_to_netmem(data),
+ offset_in_page(data), xdp->data_end - data,
+ xdp->frame_sz, false))
+ return false;
+ frag = container_of(xdp, struct xdp_buff_xsk, xdp);
list_add_tail(&frag->list_node, &frag->pool->xskb_list);
+
+ return true;
}
static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
@@ -357,8 +367,10 @@ static inline void xsk_buff_free(struct xdp_buff *xdp)
{
}
-static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
+static inline bool xsk_buff_add_frag(struct xdp_buff *head,
+ struct xdp_buff *xdp)
{
+ return false;
}
static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 4e885df789ef..e28f1905a4a0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -395,32 +395,6 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring,
WARN_ON_ONCE(1);
}
-static int
-i40e_add_xsk_frag(struct i40e_ring *rx_ring, struct xdp_buff *first,
- struct xdp_buff *xdp, const unsigned int size)
-{
- struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first);
-
- if (!xdp_buff_has_frags(first)) {
- sinfo->nr_frags = 0;
- sinfo->xdp_frags_size = 0;
- xdp_buff_set_frags_flag(first);
- }
-
- if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
- xsk_buff_free(first);
- return -ENOMEM;
- }
-
- __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
- virt_to_page(xdp->data_hard_start),
- XDP_PACKET_HEADROOM, size);
- sinfo->xdp_frags_size += size;
- xsk_buff_add_frag(xdp);
-
- return 0;
-}
-
/**
* i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
* @rx_ring: Rx ring
@@ -486,8 +460,10 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
if (!first)
first = bi;
- else if (i40e_add_xsk_frag(rx_ring, first, bi, size))
+ else if (!xsk_buff_add_frag(first, bi)) {
+ xsk_buff_free(first);
break;
+ }
if (++next_to_process == count)
next_to_process = 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 334ae945d640..8975d2971bc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -801,35 +801,6 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
return result;
}
-static int
-ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
- struct xdp_buff *xdp, const unsigned int size)
-{
- struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(first);
-
- if (!size)
- return 0;
-
- if (!xdp_buff_has_frags(first)) {
- sinfo->nr_frags = 0;
- sinfo->xdp_frags_size = 0;
- xdp_buff_set_frags_flag(first);
- }
-
- if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
- xsk_buff_free(first);
- return -ENOMEM;
- }
-
- __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
- virt_to_page(xdp->data_hard_start),
- XDP_PACKET_HEADROOM, size);
- sinfo->xdp_frags_size += size;
- xsk_buff_add_frag(xdp);
-
- return 0;
-}
-
/**
* ice_clean_rx_irq_zc - consumes packets from the hardware ring
* @rx_ring: AF_XDP Rx ring
@@ -895,7 +866,8 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring,
if (!first) {
first = xdp;
- } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) {
+ } else if (likely(size) && !xsk_buff_add_frag(first, xdp)) {
+ xsk_buff_free(first);
break;
}
--
2.47.1
* [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (5 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 06/12] xsk: make xsk_buff_add_frag really add the frag via __xdp_buff_add_frag() Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-13 2:19 ` Jakub Kicinski
2024-12-11 17:26 ` [PATCH net-next 08/12] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go Alexander Lobakin
` (5 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Same as with converting &xdp_buff to skb on Rx, the code which allocates
a new skb and copies the XSk frame there is identical across the
drivers, so make it generic. This includes copying all the frags if they
are present in the original buff.
System percpu Page Pools help here a lot: when available, allocate pages
from there instead of the MM layer. This greatly improves XDP_PASS
performance on XSk: instead of page_alloc() + page_free(), the net core
recycles the same pages, so the only overhead left is memcpy()s.
Note that the passed buff gets freed if the conversion is done without
any error, assuming you don't need this buffer after you convert it to
an skb.
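A hypothetical caller in a ZC Rx routine (illustration only):

	skb = xdp_build_skb_from_zc(xdp);
	if (unlikely(!skb)) {
		/* conversion failed, the XSk buff is still ours to free */
		xsk_buff_free(xdp);
		break;
	}

	napi_gro_receive(napi, skb);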
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 1 +
net/core/xdp.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 139 insertions(+)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index aa24fa78cbe6..6da0e746cf75 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -337,6 +337,7 @@ void xdp_warn(const char *msg, const char *func, const int line);
#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp);
+struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp);
struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index c4d824cf27da..6e319e00ae78 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -22,6 +22,8 @@
#include <trace/events/xdp.h>
#include <net/xdp_sock_drv.h>
+#include "dev.h"
+
#define REG_STATE_NEW 0x0
#define REG_STATE_REGISTERED 0x1
#define REG_STATE_UNREGISTERED 0x2
@@ -684,6 +686,142 @@ struct sk_buff *xdp_build_skb_from_buff(const struct xdp_buff *xdp)
}
EXPORT_SYMBOL_GPL(xdp_build_skb_from_buff);
+/**
+ * xdp_copy_frags_from_zc - copy frags from XSk buff to skb
+ * @skb: skb to copy frags to
+ * @xdp: XSk &xdp_buff from which the frags will be copied
+ * @pp: &page_pool backing page allocation, if available
+ *
+ * Copy all frags from XSk &xdp_buff to the skb to pass it up the stack.
+ * Allocate a new page / page frag for each frag, copy it and attach to
+ * the skb.
+ *
+ * Return: true on success, false on page allocation fail.
+ */
+static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,
+ const struct xdp_buff *xdp,
+ struct page_pool *pp)
+{
+ const struct skb_shared_info *xinfo;
+ struct skb_shared_info *sinfo;
+ u32 nr_frags, ts;
+
+ xinfo = xdp_get_shared_info_from_buff(xdp);
+ nr_frags = xinfo->nr_frags;
+ sinfo = skb_shinfo(skb);
+
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+ ts = 0;
+#else
+ ts = xinfo->xdp_frags_truesize ? : nr_frags * xdp->frame_sz;
+#endif
+
+ for (u32 i = 0; i < nr_frags; i++) {
+ u32 len = skb_frag_size(&xinfo->frags[i]);
+ void *data;
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+ u32 truesize = len;
+
+ data = page_pool_dev_alloc_va(pp, &truesize);
+ ts += truesize;
+#else
+ data = napi_alloc_frag(len);
+#endif
+ if (unlikely(!data))
+ return false;
+
+ memcpy(data, skb_frag_address(&xinfo->frags[i]),
+ LARGEST_ALIGN(len));
+ __skb_fill_netmem_desc(skb, sinfo->nr_frags++,
+ virt_to_netmem(data),
+ offset_in_page(data), len);
+ }
+
+ xdp_update_skb_shared_info(skb, nr_frags, xinfo->xdp_frags_size,
+ ts, false);
+
+ return true;
+}
+
+/**
+ * xdp_build_skb_from_zc - create an skb from XSk &xdp_buff
+ * @xdp: source XSk buff
+ *
+ * Similar to xdp_build_skb_from_buff(), but for XSk frames. Allocate an skb
+ * head, new page for the head, copy the data and initialize the skb fields.
+ * If there are frags, allocate new pages for them and copy.
+ * If Page Pool is available, the function allocates memory from the system
+ * percpu pools to try recycling the pages, otherwise it uses the NAPI page
+ * frag caches.
+ * If new skb was built successfully, @xdp is returned to XSk pool's freelist.
+ * On error, it remains untouched and the caller must take care of this.
+ *
+ * Return: new &sk_buff on success, %NULL on error.
+ */
+struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
+{
+ const struct xdp_rxq_info *rxq = xdp->rxq;
+ u32 len = xdp->data_end - xdp->data_meta;
+ struct page_pool *pp;
+ struct sk_buff *skb;
+ int metalen;
+#if IS_ENABLED(CONFIG_PAGE_POOL)
+ u32 truesize;
+ void *data;
+
+ pp = this_cpu_read(system_page_pool);
+ truesize = xdp->frame_sz;
+
+ data = page_pool_dev_alloc_va(pp, &truesize);
+ if (unlikely(!data))
+ return NULL;
+
+ skb = napi_build_skb(data, truesize);
+ if (unlikely(!skb)) {
+ page_pool_free_va(pp, data, true);
+ return NULL;
+ }
+
+ skb_mark_for_recycle(skb);
+ skb_reserve(skb, xdp->data_meta - xdp->data_hard_start);
+#else /* !CONFIG_PAGE_POOL */
+ struct napi_struct *napi;
+
+ pp = NULL;
+ napi = napi_by_id(rxq->napi_id);
+ if (likely(napi))
+ skb = napi_alloc_skb(napi, len);
+ else
+ skb = __netdev_alloc_skb_ip_align(rxq->dev, len,
+ GFP_ATOMIC | __GFP_NOWARN);
+ if (unlikely(!skb))
+ return NULL;
+#endif /* !CONFIG_PAGE_POOL */
+
+ memcpy(__skb_put(skb, len), xdp->data_meta, LARGEST_ALIGN(len));
+
+ metalen = xdp->data - xdp->data_meta;
+ if (metalen > 0) {
+ skb_metadata_set(skb, metalen);
+ __skb_pull(skb, metalen);
+ }
+
+ skb_record_rx_queue(skb, rxq->queue_index);
+
+ if (unlikely(xdp_buff_has_frags(xdp)) &&
+ unlikely(!xdp_copy_frags_from_zc(skb, xdp, pp))) {
+ napi_consume_skb(skb, true);
+ return NULL;
+ }
+
+ xsk_buff_free(xdp);
+
+ skb->protocol = eth_type_trans(skb, rxq->dev);
+
+ return skb;
+}
+EXPORT_SYMBOL_GPL(xdp_build_skb_from_zc);
+
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
struct net_device *dev)
--
2.47.1
* Re: [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion
2024-12-11 17:26 ` [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion Alexander Lobakin
@ 2024-12-13 2:19 ` Jakub Kicinski
2024-12-13 17:31 ` Alexander Lobakin
0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2024-12-13 2:19 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Wed, 11 Dec 2024 18:26:44 +0100 Alexander Lobakin wrote:
> +#else /* !CONFIG_PAGE_POOL */
> + struct napi_struct *napi;
> +
> + pp = NULL;
> + napi = napi_by_id(rxq->napi_id);
> + if (likely(napi))
> + skb = napi_alloc_skb(napi, len);
> + else
> + skb = __netdev_alloc_skb_ip_align(rxq->dev, len,
> + GFP_ATOMIC | __GFP_NOWARN);
> + if (unlikely(!skb))
> + return NULL;
> +#endif /* !CONFIG_PAGE_POOL */
What are the chances of having a driver with AF_XDP support
and without page pool support? I think it's zero :S
Can we kill all the if !CONFIG_PAGE_POOL sections and hide
the entire helper under if CONFIG_PAGE_POOL?
* Re: [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion
2024-12-13 2:19 ` Jakub Kicinski
@ 2024-12-13 17:31 ` Alexander Lobakin
2024-12-14 2:31 ` Jakub Kicinski
0 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-13 17:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 12 Dec 2024 18:19:44 -0800
> On Wed, 11 Dec 2024 18:26:44 +0100 Alexander Lobakin wrote:
>> +#else /* !CONFIG_PAGE_POOL */
>> + struct napi_struct *napi;
>> +
>> + pp = NULL;
>> + napi = napi_by_id(rxq->napi_id);
>> + if (likely(napi))
>> + skb = napi_alloc_skb(napi, len);
>> + else
>> + skb = __netdev_alloc_skb_ip_align(rxq->dev, len,
>> + GFP_ATOMIC | __GFP_NOWARN);
>> + if (unlikely(!skb))
>> + return NULL;
>> +#endif /* !CONFIG_PAGE_POOL */
>
> What are the chances of having a driver with AF_XDP support
> and without page pool support? I think it's zero :S
Right, as CONFIG_BPF_SYSCALL selects PAGE_POOL if NET.
I think I wrote this when it wasn't true (before Lorenzo introduced
generic page_pools) and then just forgot to change that ._.
> Can we kill all the if !CONFIG_PAGE_POOL sections and hide
> the entire helper under if CONFIG_PAGE_POLL ?
We can. But I think I'd need to introduce a return-NULL wrapper in case
of !PAGE_POOL to satisfy the linker, as lots of drivers build their XSk
code unconditionally.
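For illustration, such a stub could look roughly like the below (the exact
signature and placement are assumptions, not the final code):

    /* sketch only: keep the real helper under CONFIG_PAGE_POOL and give
     * !PAGE_POOL builds a return-NULL stub, so that XSk driver code
     * still links
     */
    #ifdef CONFIG_PAGE_POOL
    struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp);
    #else
    static inline struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
    {
            return NULL;
    }
    #endif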
Thanks,
Olek
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion
2024-12-13 17:31 ` Alexander Lobakin
@ 2024-12-14 2:31 ` Jakub Kicinski
0 siblings, 0 replies; 25+ messages in thread
From: Jakub Kicinski @ 2024-12-14 2:31 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Fri, 13 Dec 2024 18:31:59 +0100 Alexander Lobakin wrote:
> > Can we kill all the if !CONFIG_PAGE_POOL sections and hide
> > the entire helper under if CONFIG_PAGE_POLL ?
>
> We can. But I think I'd need to introduce a return-NULL wrapper in case
> of !PAGE_POOL to satisfy the linker, as lots of drivers build their XSk
> code unconditionally.
Oh wow, you're right. A bunch of drivers of a certain vendor still don't
use page pool... a return-NULL wrapper SGTM.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next 08/12] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (6 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 07/12] xsk: add generic XSk &xdp_buff -> skb conversion Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts Alexander Lobakin
` (4 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Currently, when you send an XSk frame with metadata, you need to do
the following:
* call external xsk_buff_raw_get_dma();
* call inline xsk_buff_get_metadata(), which calls external
xsk_buff_raw_get_data() and then do some inline checks.
This effectively means that the following piece:
addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
is done twice per frame, plus you have 2 external calls per frame, plus
this:
meta = pool->addrs + addr - pool->tx_metadata_len;
if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
is always inlined, even if there's no meta or it's invalid.
Add xsk_buff_raw_get_ctx() (xp_raw_get_ctx() to be precise) to do that
in one go. It returns a small structure with 2 fields: DMA address,
filled unconditionally, and metadata pointer, valid only if it's
present. The address correction is performed only once and you also
have only 1 external call per XSk frame, which does all the calculations
and checks outside of your hotpath. You only need to check
`if (ctx.meta)` for the metadata presence.
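For illustration only, a driver's XSk Tx hotpath could then look roughly
like below (the descriptor fields and the metadata handler are made-up
names, not part of this patch):

    struct xdp_desc_ctx ctx;

    ctx = xsk_buff_raw_get_ctx(pool, desc.addr);
    tx_desc->addr = cpu_to_le64(ctx.dma);   /* DMA is always filled */

    if (ctx.meta)
            /* metadata is present and valid, handle Tx offloads */
            priv_handle_tx_meta(priv, ctx.meta, tx_desc);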
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp_sock_drv.h | 23 +++++++++++++++++++++
include/net/xsk_buff_pool.h | 8 ++++++++
net/xdp/xsk_buff_pool.c | 40 +++++++++++++++++++++++++++++++++++++
3 files changed, 71 insertions(+)
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 86620c818965..7fd1709deef5 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -205,6 +205,23 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
return xp_raw_get_data(pool, addr);
}
+/**
+ * xsk_buff_raw_get_ctx - get &xdp_desc context
+ * @pool: XSk buff pool desc address belongs to
+ * @addr: desc address (from userspace)
+ *
+ * Wrapper for xp_raw_get_ctx() to be used in drivers, see its kdoc for
+ * details.
+ *
+ * Return: new &xdp_desc_ctx struct containing desc's DMA address and metadata
+ * pointer, if it is present and valid (initialized to %NULL otherwise).
+ */
+static inline struct xdp_desc_ctx
+xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+ return xp_raw_get_ctx(pool, addr);
+}
+
#define XDP_TXMD_FLAGS_VALID ( \
XDP_TXMD_FLAGS_TIMESTAMP | \
XDP_TXMD_FLAGS_CHECKSUM | \
@@ -402,6 +419,12 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
return NULL;
}
+static inline struct xdp_desc_ctx
+xsk_buff_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+ return (struct xdp_desc_ctx){ };
+}
+
static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
{
return false;
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 50779406bc2d..1dcd4d71468a 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -141,6 +141,14 @@ u32 xp_alloc_batch(struct xsk_buff_pool *pool, struct xdp_buff **xdp, u32 max);
bool xp_can_alloc(struct xsk_buff_pool *pool, u32 count);
void *xp_raw_get_data(struct xsk_buff_pool *pool, u64 addr);
dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr);
+
+struct xdp_desc_ctx {
+ dma_addr_t dma;
+ struct xsk_tx_metadata *meta;
+};
+
+struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr);
+
static inline dma_addr_t xp_get_dma(struct xdp_buff_xsk *xskb)
{
return xskb->dma;
diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c
index ae71da7d2cd6..02c42caec9f4 100644
--- a/net/xdp/xsk_buff_pool.c
+++ b/net/xdp/xsk_buff_pool.c
@@ -715,3 +715,43 @@ dma_addr_t xp_raw_get_dma(struct xsk_buff_pool *pool, u64 addr)
(addr & ~PAGE_MASK);
}
EXPORT_SYMBOL(xp_raw_get_dma);
+
+/**
+ * xp_raw_get_ctx - get &xdp_desc context
+ * @pool: XSk buff pool desc address belongs to
+ * @addr: desc address (from userspace)
+ *
+ * Helper for getting desc's DMA address and metadata pointer, if present.
+ * Saves one call on hotpath, double calculation of the actual address,
+ * and inline checks for metadata presence and sanity.
+ * Please use xsk_buff_raw_get_ctx() in drivers instead.
+ *
+ * Return: new &xdp_desc_ctx struct containing desc's DMA address and metadata
+ * pointer, if it is present and valid (initialized to %NULL otherwise).
+ */
+struct xdp_desc_ctx xp_raw_get_ctx(const struct xsk_buff_pool *pool, u64 addr)
+{
+ struct xsk_tx_metadata *meta;
+ struct xdp_desc_ctx ret;
+
+ addr = pool->unaligned ? xp_unaligned_add_offset_to_addr(addr) : addr;
+ ret = (typeof(ret)){
+ /* Same logic as in xp_raw_get_dma() */
+ .dma = (pool->dma_pages[addr >> PAGE_SHIFT] &
+ ~XSK_NEXT_PG_CONTIG_MASK) + (addr & ~PAGE_MASK),
+ };
+
+ if (!pool->tx_metadata_len)
+ goto out;
+
+ /* Same logic as in xp_raw_get_data() + xsk_buff_get_metadata() */
+ meta = pool->addrs + addr - pool->tx_metadata_len;
+ if (unlikely(!xsk_buff_valid_tx_metadata(meta)))
+ goto out;
+
+ ret.meta = meta;
+
+out:
+ return ret;
+}
+EXPORT_SYMBOL(xp_raw_get_ctx);
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (7 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 08/12] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-13 19:13 ` Mina Almasry
2024-12-11 17:26 ` [PATCH net-next 10/12] skbuff: allow 2-4-argument skb_frag_dma_map() Alexander Lobakin
` (3 subsequent siblings)
12 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Add the following Page Pool netmem wrappers to be able to implement
an MP-agnostic driver:
* page_pool{,_dev}_alloc_best_fit_netmem()
Same as page_pool{,_dev}_alloc(). Make the latter a wrapper around
the new helper (as a page is always a netmem, but not vice versa).
'page_pool_alloc_netmem' is already busy, hence '_best_fit' (which
also says what the helper tries to do).
* page_pool_dma_sync_for_cpu_netmem()
Same as page_pool_dma_sync_for_cpu(). Performs DMA sync only if
the netmem comes from the host.
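For illustration only, the intended MP-agnostic Rx usage could look
roughly like this (rx_buf_len, frame_len and the error handling are
made up):

    unsigned int offset, size = rx_buf_len;
    netmem_ref netmem;

    netmem = page_pool_dev_alloc_best_fit_netmem(pool, &offset, &size);
    if (!netmem)
            return false;

    /* ... once the HW has written the frame, before the CPU reads it */
    page_pool_dma_sync_for_cpu_netmem(pool, netmem, offset, frame_len);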
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/page_pool/helpers.h | 46 ++++++++++++++++++++++++++-------
1 file changed, 37 insertions(+), 9 deletions(-)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 26caa2c20912..d75d10678958 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -115,22 +115,22 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
return page_pool_alloc_frag(pool, offset, size, gfp);
}
-static inline struct page *page_pool_alloc(struct page_pool *pool,
- unsigned int *offset,
- unsigned int *size, gfp_t gfp)
+static inline netmem_ref
+page_pool_alloc_best_fit_netmem(struct page_pool *pool, unsigned int *offset,
+ unsigned int *size, gfp_t gfp)
{
unsigned int max_size = PAGE_SIZE << pool->p.order;
- struct page *page;
+ netmem_ref netmem;
if ((*size << 1) > max_size) {
*size = max_size;
*offset = 0;
- return page_pool_alloc_pages(pool, gfp);
+ return page_pool_alloc_netmem(pool, gfp);
}
- page = page_pool_alloc_frag(pool, offset, *size, gfp);
- if (unlikely(!page))
- return NULL;
+ netmem = page_pool_alloc_frag_netmem(pool, offset, *size, gfp);
+ if (unlikely(!netmem))
+ return 0;
/* There is very likely not enough space for another fragment, so append
* the remaining size to the current fragment to avoid truesize
@@ -141,7 +141,25 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
pool->frag_offset = max_size;
}
- return page;
+ return netmem;
+}
+
+static inline netmem_ref
+page_pool_dev_alloc_best_fit_netmem(struct page_pool *pool,
+ unsigned int *offset,
+ unsigned int *size)
+{
+ gfp_t gfp = GFP_ATOMIC | __GFP_NOWARN;
+
+ return page_pool_alloc_best_fit_netmem(pool, offset, size, gfp);
+}
+
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+ unsigned int *offset,
+ unsigned int *size, gfp_t gfp)
+{
+ return netmem_to_page(page_pool_alloc_best_fit_netmem(pool, offset,
+ size, gfp));
}
/**
@@ -440,6 +458,16 @@ static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
page_pool_get_dma_dir(pool));
}
+static inline void
+page_pool_dma_sync_for_cpu_netmem(const struct page_pool *pool,
+ netmem_ref netmem, u32 offset,
+ u32 dma_sync_size)
+{
+ if (!netmem_is_net_iov(netmem))
+ page_pool_dma_sync_for_cpu(pool, netmem_to_page(netmem),
+ offset, dma_sync_size);
+}
+
static inline bool page_pool_put(struct page_pool *pool)
{
return refcount_dec_and_test(&pool->user_cnt);
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts
2024-12-11 17:26 ` [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts Alexander Lobakin
@ 2024-12-13 19:13 ` Mina Almasry
2024-12-16 15:58 ` Alexander Lobakin
0 siblings, 1 reply; 25+ messages in thread
From: Mina Almasry @ 2024-12-13 19:13 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Wed, Dec 11, 2024 at 9:31 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> Add the following Page Pool netmem wrappers to be able to implement
> an MP-agnostic driver:
>
Sorry, we raced a bit here. Jakub merged my "page_pool_alloc_netmem",
which does something similar to what this patch does.
> * page_pool{,_dev}_alloc_best_fit_netmem()
>
> Same as page_pool{,_dev}_alloc(). Make the latter a wrapper around
> the new helper (as a page is always a netmem, but not vice versa).
> 'page_pool_alloc_netmem' is already busy, hence '_best_fit' (which
> also says what the helper tries to do).
>
I freed the page_pool_alloc_netmem name by doing a rename, and now
page_pool_alloc_netmem is the netmem counterpart to page_pool_alloc. I
did not however add a page_pool_dev_alloc equivalent.
> * page_pool_dma_sync_for_cpu_netmem()
>
> Same as page_pool_dma_sync_for_cpu(). Performs DMA sync only if
> the netmem comes from the host.
>
My series also adds page_pool_dma_sync_netmem_for_cpu, which should be
the same as your page_pool_dma_sync_for_cpu_netmem.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts
2024-12-13 19:13 ` Mina Almasry
@ 2024-12-16 15:58 ` Alexander Lobakin
0 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-16 15:58 UTC (permalink / raw)
To: Mina Almasry
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
From: Mina Almasry <almasrymina@google.com>
Date: Fri, 13 Dec 2024 11:13:33 -0800
> On Wed, Dec 11, 2024 at 9:31 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> Add the following Page Pool netmem wrappers to be able to implement
>> an MP-agnostic driver:
>>
>
> Sorry, we raced a bit here. Jakub merged my "page_pool_alloc_netmem",
> which does similar to what this patch does.
>
>> * page_pool{,_dev}_alloc_best_fit_netmem()
>>
>> Same as page_pool{,_dev}_alloc(). Make the latter a wrapper around
>> the new helper (as a page is always a netmem, but not vice versa).
>> 'page_pool_alloc_netmem' is already busy, hence '_best_fit' (which
>> also says what the helper tries to do).
>>
>
> I freed the page_pool_alloc_netmem name by doing a rename, and now
> page_pool_alloc_netmem is the netmem counterpart to page_pool_alloc. I
> did not however add a page_pool_dev_alloc equivalent.
>
>> * page_pool_dma_sync_for_cpu_netmem()
>>
>> Same as page_pool_dma_sync_for_cpu(). Performs DMA sync only if
>> the netmem comes from the host.
>>
>
> My series also adds page_pool_dma_sync_netmem_for_cpu, which should be
> the same as your page_pool_dma_sync_for_cpu_netmem.
Yep, I saw your changes, rebasing soon.
Thanks,
Olek
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next 10/12] skbuff: allow 2-4-argument skb_frag_dma_map()
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (8 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 09/12] page_pool: add a couple of netmem counterparts Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:26 ` [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked() Alexander Lobakin
` (2 subsequent siblings)
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE)
is repeated across dozens of drivers and really wants a shorthand.
Add a macro which counts the args and handles any possible number of them
from 2 to 5. Semantics:
skb_frag_dma_map(dev, frag) ->
__skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag), DMA_TO_DEVICE)
skb_frag_dma_map(dev, frag, offset) ->
__skb_frag_dma_map(dev, frag, offset, skb_frag_size(frag) - offset,
DMA_TO_DEVICE)
skb_frag_dma_map(dev, frag, offset, size) ->
__skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE)
skb_frag_dma_map(dev, frag, offset, size, dir) ->
__skb_frag_dma_map(dev, frag, offset, size, dir)
No object code size changes for the existing callers. Users passing
fewer arguments also won't see a bigger size compared to the full
equivalent call.
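For example, a typical (hypothetical) Tx mapping site shrinks from:

    dma = skb_frag_dma_map(dev, frag, 0, skb_frag_size(frag),
                           DMA_TO_DEVICE);

to just:

    dma = skb_frag_dma_map(dev, frag);

with both forms expanding to the same __skb_frag_dma_map() call.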
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/linux/skbuff.h | 31 ++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8bcf14ae6789..bb2b751d274a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3682,7 +3682,7 @@ static inline void skb_frag_page_copy(skb_frag_t *fragto,
bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio);
/**
- * skb_frag_dma_map - maps a paged fragment via the DMA API
+ * __skb_frag_dma_map - maps a paged fragment via the DMA API
* @dev: the device to map the fragment to
* @frag: the paged fragment to map
* @offset: the offset within the fragment (starting at the
@@ -3692,15 +3692,36 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t prio);
*
* Maps the page associated with @frag to @device.
*/
-static inline dma_addr_t skb_frag_dma_map(struct device *dev,
- const skb_frag_t *frag,
- size_t offset, size_t size,
- enum dma_data_direction dir)
+static inline dma_addr_t __skb_frag_dma_map(struct device *dev,
+ const skb_frag_t *frag,
+ size_t offset, size_t size,
+ enum dma_data_direction dir)
{
return dma_map_page(dev, skb_frag_page(frag),
skb_frag_off(frag) + offset, size, dir);
}
+#define skb_frag_dma_map(dev, frag, ...) \
+ CONCATENATE(_skb_frag_dma_map, \
+ COUNT_ARGS(__VA_ARGS__))(dev, frag, ##__VA_ARGS__)
+
+#define __skb_frag_dma_map1(dev, frag, offset, uf, uo) ({ \
+ const skb_frag_t *uf = (frag); \
+ size_t uo = (offset); \
+ \
+ __skb_frag_dma_map(dev, uf, uo, skb_frag_size(uf) - uo, \
+ DMA_TO_DEVICE); \
+})
+#define _skb_frag_dma_map1(dev, frag, offset) \
+ __skb_frag_dma_map1(dev, frag, offset, __UNIQUE_ID(frag_), \
+ __UNIQUE_ID(offset_))
+#define _skb_frag_dma_map0(dev, frag) \
+ _skb_frag_dma_map1(dev, frag, 0)
+#define _skb_frag_dma_map2(dev, frag, offset, size) \
+ __skb_frag_dma_map(dev, frag, offset, size, DMA_TO_DEVICE)
+#define _skb_frag_dma_map3(dev, frag, offset, size, dir) \
+ __skb_frag_dma_map(dev, frag, offset, size, dir)
+
static inline struct sk_buff *pskb_copy(struct sk_buff *skb,
gfp_t gfp_mask)
{
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (9 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 10/12] skbuff: allow 2-4-argument skb_frag_dma_map() Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-11 17:40 ` Josh Poimboeuf
2024-12-11 17:26 ` [PATCH net-next 12/12] unroll: add generic loop unroll helpers Alexander Lobakin
2024-12-13 2:50 ` [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II patchwork-bot+netdevbpf
12 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Sometimes, there's a need to modify a lot of static keys or modify the
same key multiple times in a loop. In that case, it seems more optimal
to take cpus_read_lock() once and then call the _cpuslocked() variants.
The enable/disable functions are already exported; the refcounted
counterparts, however, are not. Fix that to allow modules to save some
cycles.
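For illustration only, the intended module-side pattern looks roughly
like this (the key array and its size are made up):

    cpus_read_lock();

    for (u32 i = 0; i < priv->num_keys; i++)
            static_key_slow_inc_cpuslocked(&priv->keys[i]);

    cpus_read_unlock();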
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
kernel/jump_label.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 93a822d3c468..1034c0348995 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -182,6 +182,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
}
return true;
}
+EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked);
bool static_key_slow_inc(struct static_key *key)
{
@@ -342,6 +343,7 @@ void static_key_slow_dec_cpuslocked(struct static_key *key)
STATIC_KEY_CHECK_USE(key);
__static_key_slow_dec_cpuslocked(key);
}
+EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);
void __static_key_slow_dec_deferred(struct static_key *key,
struct delayed_work *work,
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
2024-12-11 17:26 ` [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked() Alexander Lobakin
@ 2024-12-11 17:40 ` Josh Poimboeuf
2024-12-13 17:22 ` Alexander Lobakin
0 siblings, 1 reply; 25+ messages in thread
From: Josh Poimboeuf @ 2024-12-11 17:40 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Wed, Dec 11, 2024 at 06:26:48PM +0100, Alexander Lobakin wrote:
> Sometimes, there's a need to modify a lot of static keys or modify the
> same key multiple times in a loop. In that case, it seems more optimal
> to lock cpu_read_lock once and then call _cpuslocked() variants.
> The enable/disable functions are already exported, the refcounted
> counterparts however are not. Fix that to allow modules to save some
> cycles.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> kernel/jump_label.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 93a822d3c468..1034c0348995 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -182,6 +182,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
> }
> return true;
> }
> +EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked);
>
> bool static_key_slow_inc(struct static_key *key)
> {
> @@ -342,6 +343,7 @@ void static_key_slow_dec_cpuslocked(struct static_key *key)
> STATIC_KEY_CHECK_USE(key);
> __static_key_slow_dec_cpuslocked(key);
> }
> +EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);
Where's the code which uses this?
--
Josh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
2024-12-11 17:40 ` Josh Poimboeuf
@ 2024-12-13 17:22 ` Alexander Lobakin
2024-12-14 3:24 ` Josh Poimboeuf
0 siblings, 1 reply; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-13 17:22 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
From: Josh Poimboeuf <jpoimboe@kernel.org>
Date: Wed, 11 Dec 2024 09:40:00 -0800
> On Wed, Dec 11, 2024 at 06:26:48PM +0100, Alexander Lobakin wrote:
>> Sometimes, there's a need to modify a lot of static keys or modify the
>> same key multiple times in a loop. In that case, it seems more optimal
>> to lock cpu_read_lock once and then call _cpuslocked() variants.
>> The enable/disable functions are already exported, the refcounted
>> counterparts however are not. Fix that to allow modules to save some
>> cycles.
>>
>> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>> ---
>> kernel/jump_label.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 93a822d3c468..1034c0348995 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -182,6 +182,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
>> }
>> return true;
>> }
>> +EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked);
>>
>> bool static_key_slow_inc(struct static_key *key)
>> {
>> @@ -342,6 +343,7 @@ void static_key_slow_dec_cpuslocked(struct static_key *key)
>> STATIC_KEY_CHECK_USE(key);
>> __static_key_slow_dec_cpuslocked(key);
>> }
>> +EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);
>
> Where's the code which uses this?
It's not in this series -- the initial one was too large, so it was split.
Thanks,
Olek
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
2024-12-13 17:22 ` Alexander Lobakin
@ 2024-12-14 3:24 ` Josh Poimboeuf
2024-12-16 16:02 ` Alexander Lobakin
0 siblings, 1 reply; 25+ messages in thread
From: Josh Poimboeuf @ 2024-12-14 3:24 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
On Fri, Dec 13, 2024 at 06:22:51PM +0100, Alexander Lobakin wrote:
> From: Josh Poimboeuf <jpoimboe@kernel.org>
> >> +EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);
> >
> > Where's the code which uses this?
>
> It's not in this series -- the initial one was too large, so it was split.
It's best to put the patch exporting the symbol adjacent to (or squashed
with) the patch using it so the justification for exporting it can be
reviewed at the same time.
--
Josh
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
2024-12-14 3:24 ` Josh Poimboeuf
@ 2024-12-16 16:02 ` Alexander Lobakin
0 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-16 16:02 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Peter Zijlstra, Jose E. Marchesi,
Toke Høiland-Jørgensen, Magnus Karlsson,
Maciej Fijalkowski, Przemek Kitszel, Jason Baron, Casey Schaufler,
Nathan Chancellor, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
From: Josh Poimboeuf <jpoimboe@kernel.org>
Date: Fri, 13 Dec 2024 19:24:31 -0800
> On Fri, Dec 13, 2024 at 06:22:51PM +0100, Alexander Lobakin wrote:
>> From: Josh Poimboeuf <jpoimboe@kernel.org>
>>>> +EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);
>>>
>>> Where's the code which uses this?
>>
>> It's not in this series -- the initial one was too large, so it was split.
>
> It's best to put the patch exporting the symbol adjacent to (or squashed
> with) the patch using it so the justification for exporting it can be
> reviewed at the same time.
Sure, I'll move this one to the next series.
Thanks,
Olek
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next 12/12] unroll: add generic loop unroll helpers
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (10 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked() Alexander Lobakin
@ 2024-12-11 17:26 ` Alexander Lobakin
2024-12-13 2:50 ` [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II patchwork-bot+netdevbpf
12 siblings, 0 replies; 25+ messages in thread
From: Alexander Lobakin @ 2024-12-11 17:26 UTC (permalink / raw)
To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Peter Zijlstra, Josh Poimboeuf,
Jose E. Marchesi, Toke Høiland-Jørgensen,
Magnus Karlsson, Maciej Fijalkowski, Przemek Kitszel, Jason Baron,
Casey Schaufler, Nathan Chancellor,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
There are cases when we need to explicitly unroll loops. For example,
cache operations, filling DMA descriptors on very high speeds etc.
Add compiler-specific attribute macros to give the compiler a hint
that we'd like to unroll a loop.
Example usage:
#define UNROLL_BATCH 8
unrolled_count(UNROLL_BATCH)
for (u32 i = 0; i < UNROLL_BATCH; i++)
op(priv, i);
Note that the compiler may still refuse to unroll a loop if it decides
unrolling would hurt optimization and/or performance, and that the unroll
attributes are available only starting with GCC 8. For older compiler
versions, no hints/attributes will be applied.
For better unrolling/parallelization, avoid variables that carry
dependencies between iterations other than the iterator itself.
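As another illustration (the descriptor layout below is made up), a small
fixed-size fill loop can be fully unrolled:

    unrolled_full
    for (u32 i = 0; i < ARRAY_SIZE(desc->qw); i++)
            desc->qw[i] = cpu_to_le64(src[i]);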
Co-developed-by: Jose E. Marchesi <jose.marchesi@oracle.com> # pragmas
Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/linux/unroll.h | 44 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/include/linux/unroll.h b/include/linux/unroll.h
index d42fd6366373..69b6ea74d7c1 100644
--- a/include/linux/unroll.h
+++ b/include/linux/unroll.h
@@ -9,6 +9,50 @@
#include <linux/args.h>
+#ifdef CONFIG_CC_IS_CLANG
+#define __pick_unrolled(x, y) _Pragma(#x)
+#elif CONFIG_GCC_VERSION >= 80000
+#define __pick_unrolled(x, y) _Pragma(#y)
+#else
+#define __pick_unrolled(x, y) /* not supported */
+#endif
+
+/**
+ * unrolled - loop attributes to ask the compiler to unroll it
+ *
+ * Usage:
+ *
+ * #define BATCH 8
+ *
+ * unrolled_count(BATCH)
+ * for (u32 i = 0; i < BATCH; i++)
+ * // loop body without cross-iteration dependencies
+ *
+ * This is only a hint and the compiler is free to disable unrolling if it
+ * thinks the count is suboptimal and may hurt performance and/or hugely
+ * increase object code size.
+ * Not having any cross-iteration dependencies (i.e. when iter x + 1 depends
+ * on what iter x will do with variables) is not a strict requirement, but
+ * provides best performance and object code size.
+ * Available only on Clang and GCC 8.x onwards.
+ */
+
+/* Ask the compiler to pick an optimal unroll count, Clang only */
+#define unrolled \
+ __pick_unrolled(clang loop unroll(enable), /* nothing */)
+
+/* Unroll each @n iterations of a loop */
+#define unrolled_count(n) \
+ __pick_unrolled(clang loop unroll_count(n), GCC unroll n)
+
+/* Unroll the whole loop */
+#define unrolled_full \
+ __pick_unrolled(clang loop unroll(full), GCC unroll 65534)
+
+/* Never unroll a loop */
+#define unrolled_none \
+ __pick_unrolled(clang loop unroll(disable), GCC unroll 1)
+
#define UNROLL(N, MACRO, args...) CONCATENATE(__UNROLL_, N)(MACRO, args)
#define __UNROLL_0(MACRO, args...)
--
2.47.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II
2024-12-11 17:26 [PATCH net-next 00/12] xdp: a fistful of generic changes pt. II Alexander Lobakin
` (11 preceding siblings ...)
2024-12-11 17:26 ` [PATCH net-next 12/12] unroll: add generic loop unroll helpers Alexander Lobakin
@ 2024-12-13 2:50 ` patchwork-bot+netdevbpf
12 siblings, 0 replies; 25+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-12-13 2:50 UTC (permalink / raw)
To: Alexander Lobakin
Cc: andrew+netdev, davem, edumazet, kuba, pabeni, ast, daniel,
john.fastabend, andrii, peterz, jpoimboe, jose.marchesi, toke,
magnus.karlsson, maciej.fijalkowski, przemyslaw.kitszel, jbaron,
casey, nathan, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Wed, 11 Dec 2024 18:26:37 +0100 you wrote:
> XDP for idpf is currently 5.5 chapters:
> * convert Rx to libeth;
> * convert Tx and stats to libeth;
> * generic XDP and XSk code changes;
> * generic XDP and XSk code additions (you are here);
> * actual XDP for idpf via new libeth_xdp;
> * XSk for idpf (via ^).
>
> [...]
Here is the summary with links:
- [net-next,01/12] page_pool: allow mixing PPs within one bulk
https://git.kernel.org/netdev/net-next/c/fcc680a647ba
- [net-next,02/12] xdp: get rid of xdp_frame::mem.id
(no matching commit)
- [net-next,03/12] xdp: make __xdp_return() MP-agnostic
https://git.kernel.org/netdev/net-next/c/207ff83cecae
- [net-next,04/12] xdp: add generic xdp_buff_add_frag()
(no matching commit)
- [net-next,05/12] xdp: add generic xdp_build_skb_from_buff()
(no matching commit)
- [net-next,06/12] xsk: make xsk_buff_add_frag really add the frag via __xdp_buff_add_frag()
(no matching commit)
- [net-next,07/12] xsk: add generic XSk &xdp_buff -> skb conversion
(no matching commit)
- [net-next,08/12] xsk: add helper to get &xdp_desc's DMA and meta pointer in one go
(no matching commit)
- [net-next,09/12] page_pool: add a couple of netmem counterparts
(no matching commit)
- [net-next,10/12] skbuff: allow 2-4-argument skb_frag_dma_map()
https://git.kernel.org/netdev/net-next/c/0dffdb3b3366
- [net-next,11/12] jump_label: export static_key_slow_{inc,dec}_cpuslocked()
(no matching commit)
- [net-next,12/12] unroll: add generic loop unroll helpers
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 25+ messages in thread