* [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I
@ 2024-12-03 17:37 Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder Alexander Lobakin
` (10 more replies)
0 siblings, 11 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
XDP for idpf currently consists of 6 chapters:
* convert Rx to libeth;
* convert Tx and stats to libeth;
* generic XDP and XSk code changes (you are here);
* generic XDP and XSk code additions;
* actual XDP for idpf via new libeth_xdp;
* XSk for idpf (via ^).
Part III does the following:
* improves &xdp_buff_xsk cacheline placement;
* cleans up some generic functions by marking their read-only bpf_prog
and xdp_buff arguments const;
* allows attaching an already registered XDP memory model to an RxQ info;
* makes the system percpu page_pools valid XDP memory models;
* starts using netmems in the XDP core code (1 function);
* allows mixing pages from several page_pools within one XDP frame;
* optimizes the &xdp_frame layout and removes a no-longer-used field.
Bullets 4-6 are the most important ones. All of them are prereqs for
libeth_xdp.
Alexander Lobakin (9):
xsk: align &xdp_buff_xsk harder
bpf, xdp: constify some bpf_prog * function arguments
xdp, xsk: constify read-only arguments of some static inline helpers
xdp: allow attaching already registered memory model to xdp_rxq_info
xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model()
netmem: add a couple of page helper wrappers
page_pool: make page_pool_put_page_bulk() handle array of netmems
page_pool: allow mixing PPs within one bulk
xdp: get rid of xdp_frame::mem.id
Toke Høiland-Jørgensen (1):
xdp: register system page pool as an XDP memory model
include/net/page_pool/types.h | 6 +-
include/linux/bpf.h | 12 +-
include/linux/filter.h | 9 +-
include/linux/netdevice.h | 7 +-
include/linux/skbuff.h | 2 +-
include/net/netmem.h | 78 +++++++++++-
include/net/xdp.h | 93 ++++++++++----
include/net/xdp_sock_drv.h | 11 +-
include/net/xsk_buff_pool.h | 4 +-
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
drivers/net/veth.c | 4 +-
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/devmap.c | 8 +-
net/bpf/test_run.c | 4 +-
net/core/dev.c | 20 ++-
net/core/filter.c | 41 +++---
net/core/page_pool.c | 79 ++++++++----
net/core/skbuff.c | 2 +-
net/core/xdp.c | 118 +++++++++++-------
19 files changed, 348 insertions(+), 154 deletions(-)
---
From v5[0]:
* split the overgrown series into 2 parts: changes and additions
(Jakub);
* 008: future-proof: make the touched function MP-agnostic to avoid
double work in the future;
* send to bpf, which is a better fit now, instead of netdev.
From v4[1]:
* 12: pick RB from Toke;
* 19: drop redundant ';'s (Jakub);
* 19: fix a couple of context imbalance warnings by moving __acquires() /
__releases() to the proper place (smatch);
* no functional changes.
From v3[2]:
* rebase on top of the latest net-next to resolve a conflict (Jakub);
* 09: use an iterative approach instead of a recursive one to avoid
blowing the stack (Toke);
* 12: rephrase the commitmsg, as the functionality changed and the old
wording is no longer accurate (Toke);
* align &xdp_buff_xsk a bit harder since its alignment degraded
recently;
* pick RBs from Toke.
From v2[3]:
* cover: rename the series;
* collect RBs and Acks from Maciej;
* 007: reword the commitmsg;
* 011: fix typos in the commitmsg (M);
* 012: 'ts' -> 'tsize' (M; not 'truesize' to fit into 80 cols =\);
* 016: fix the intro sentence (M);
* no functional changes.
From v1[4]:
* rebase on top of the latest net-next;
* no other changes.
[0] https://lore.kernel.org/netdev/20241113152442.4000468-1-aleksander.lobakin@intel.com
[1] https://lore.kernel.org/netdev/20241107161026.2903044-1-aleksander.lobakin@intel.com
[2] https://lore.kernel.org/netdev/20241030165201.442301-1-aleksander.lobakin@intel.com
[3] https://lore.kernel.org/netdev/20241015145350.4077765-1-aleksander.lobakin@intel.com
[4] https://lore.kernel.org/netdev/20241009152756.3113697-1-aleksander.lobakin@intel.com
--
2.47.0
* [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-04 10:27 ` Toke Høiland-Jørgensen
2024-12-03 17:37 ` [PATCH net-next v6 02/10] bpf, xdp: constify some bpf_prog * function arguments Alexander Lobakin
` (9 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
After the series "XSk buff on a diet" by Maciej, the greatest power of 2
that &xdp_buff_xsk can be divided by got reduced from 16 to 8 on x86_64.
Also, sizeof(xdp_buff_xsk) is now 120 bytes which, taking the previous
sentence into account, means a buff leaves 8 bytes at the end of a
cacheline, so the elements of an array of buffs end up scattered across
cachelines chaotically.
Use __aligned_largest for this struct. This alignment is usually 16
bytes, which makes the structure fill two full cachelines and aligns an
array nicely. ____cacheline_aligned may be excessive here, especially on
arches with 128-256 byte CLs, as well as on 32-bit arches (76 -> 96
bytes on MIPS32R2), while not doing better than _largest.
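To illustrate only (not part of the patch), a minimal standalone sketch
of the effect; the 120-byte size and the 16-byte "largest" alignment are
the x86_64 numbers quoted above, the struct names are made up:

/* __aligned_largest expands to __attribute__((__aligned__)), i.e. the
 * largest natural scalar alignment (16 bytes on x86_64 GCC/Clang).
 */
struct plain_buf { char cb[120]; };	/* sizeof == 120 */
struct padded_buf { char cb[120]; } __attribute__((__aligned__)); /* sizeof == 128 */

/* With 64-byte cachelines, the second element of a plain_buf array
 * starts 56 bytes into a cacheline, while every element of a padded_buf
 * array starts on a cacheline boundary and spans exactly two full lines.
 */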
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xsk_buff_pool.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index bb03cee716b3..7637799b6c19 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -29,7 +29,7 @@ struct xdp_buff_xsk {
dma_addr_t frame_dma;
struct xsk_buff_pool *pool;
struct list_head list_node;
-};
+} __aligned_largest;
#define XSK_CHECK_PRIV_TYPE(t) BUILD_BUG_ON(sizeof(t) > offsetofend(struct xdp_buff_xsk, cb))
#define XSK_TX_COMPL_FITS(t) BUILD_BUG_ON(sizeof(struct xsk_tx_metadata_compl) > sizeof(t))
--
2.47.0
* [PATCH net-next v6 02/10] bpf, xdp: constify some bpf_prog * function arguments
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 03/10] xdp, xsk: constify read-only arguments of some static inline helpers Alexander Lobakin
` (8 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
In lots of places, the bpf_prog pointer is used only for tracing or other
operations that don't modify the structure itself. The same goes for
net_device. Address at least some of them by adding `const` qualifiers
there. The object code didn't change, but this prevents unwanted data
modifications and also allows more helpers to take const arguments.
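As a hypothetical illustration (not from this series), a read-only driver
helper can now propagate constness down to these functions; the helper
name is made up:

/* Sketch: a tracing-only path that never writes to the prog or netdev */
static void dummy_trace_xdp_act(const struct net_device *dev,
				const struct bpf_prog *prog, u32 act)
{
	bpf_warn_invalid_xdp_action(dev, prog, act);
}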
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/linux/bpf.h | 12 ++++++------
include/linux/filter.h | 9 +++++----
include/linux/netdevice.h | 6 +++---
include/linux/skbuff.h | 2 +-
kernel/bpf/devmap.c | 8 ++++----
net/core/dev.c | 10 +++++-----
net/core/filter.c | 29 ++++++++++++++++-------------
net/core/skbuff.c | 2 +-
8 files changed, 41 insertions(+), 37 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index eaee2a819f4c..ec3acb16359e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2591,10 +2591,10 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
struct bpf_map *map, bool exclude_ingress);
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
- struct bpf_prog *xdp_prog);
+ const struct bpf_prog *xdp_prog);
int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress);
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress);
void __cpu_map_flush(struct list_head *flush_list);
int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
@@ -2864,15 +2864,15 @@ struct sk_buff;
static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst,
struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
return 0;
}
static inline
int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress)
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress)
{
return 0;
}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 3a21947f2fd4..9a5d23ae3855 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1179,17 +1179,18 @@ static inline int xdp_ok_fwd_dev(const struct net_device *fwd,
* This does not appear to be a real limitation for existing software.
*/
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct xdp_buff *xdp, struct bpf_prog *prog);
+ struct xdp_buff *xdp, const struct bpf_prog *prog);
int xdp_do_redirect(struct net_device *dev,
struct xdp_buff *xdp,
- struct bpf_prog *prog);
+ const struct bpf_prog *prog);
int xdp_do_redirect_frame(struct net_device *dev,
struct xdp_buff *xdp,
struct xdp_frame *xdpf,
- struct bpf_prog *prog);
+ const struct bpf_prog *prog);
void xdp_do_flush(void);
-void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act);
+void bpf_warn_invalid_xdp_action(const struct net_device *dev,
+ const struct bpf_prog *prog, u32 act);
#ifdef CONFIG_INET
struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ecc686409161..ecca21387a68 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3958,9 +3958,9 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
}
u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog);
-void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb);
+ const struct bpf_prog *xdp_prog);
+void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog);
+int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff **pskb);
int netif_rx(struct sk_buff *skb);
int __netif_rx(struct sk_buff *skb);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 58009fa66102..95452d1a07fc 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3627,7 +3627,7 @@ static inline netmem_ref skb_frag_netmem(const skb_frag_t *frag)
int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
unsigned int headroom);
int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
- struct bpf_prog *prog);
+ const struct bpf_prog *prog);
/**
* skb_frag_address - gets the address of the data contained in a paged fragment
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 7878be18e9d2..effde52bc857 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -678,7 +678,7 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
}
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
int err;
@@ -701,7 +701,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
struct sk_buff *nskb;
int err;
@@ -720,8 +720,8 @@ static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
}
int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress)
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *dst, *last_dst = NULL;
diff --git a/net/core/dev.c b/net/core/dev.c
index 13d00fc10f55..bbb456b86e8b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4931,7 +4931,7 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
}
u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
void *orig_data, *orig_data_end, *hard_start;
struct netdev_rx_queue *rxqueue;
@@ -5033,7 +5033,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
}
static int
-netif_skb_check_for_xdp(struct sk_buff **pskb, struct bpf_prog *prog)
+netif_skb_check_for_xdp(struct sk_buff **pskb, const struct bpf_prog *prog)
{
struct sk_buff *skb = *pskb;
int err, hroom, troom;
@@ -5057,7 +5057,7 @@ netif_skb_check_for_xdp(struct sk_buff **pskb, struct bpf_prog *prog)
static u32 netif_receive_generic_xdp(struct sk_buff **pskb,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
struct sk_buff *skb = *pskb;
u32 mac_len, act = XDP_DROP;
@@ -5110,7 +5110,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff **pskb,
* and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
* queues, so they do not have this starvation issue.
*/
-void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
+void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog)
{
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
@@ -5135,7 +5135,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff **pskb)
+int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff **pskb)
{
struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
diff --git a/net/core/filter.c b/net/core/filter.c
index 6625b3f563a4..fac245065b0a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4348,9 +4348,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
EXPORT_SYMBOL_GPL(xdp_master_redirect);
static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
- struct net_device *dev,
+ const struct net_device *dev,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
@@ -4371,10 +4371,10 @@ static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
return err;
}
-static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
- struct net_device *dev,
- struct xdp_frame *xdpf,
- struct bpf_prog *xdp_prog)
+static __always_inline int
+__xdp_do_redirect_frame(struct bpf_redirect_info *ri, struct net_device *dev,
+ struct xdp_frame *xdpf,
+ const struct bpf_prog *xdp_prog)
{
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
@@ -4443,7 +4443,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
}
int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
enum bpf_map_type map_type = ri->map_type;
@@ -4457,7 +4457,8 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
EXPORT_SYMBOL_GPL(xdp_do_redirect);
int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
- struct xdp_frame *xdpf, struct bpf_prog *xdp_prog)
+ struct xdp_frame *xdpf,
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
enum bpf_map_type map_type = ri->map_type;
@@ -4472,9 +4473,9 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect_frame);
static int xdp_do_generic_redirect_map(struct net_device *dev,
struct sk_buff *skb,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog, void *fwd,
- enum bpf_map_type map_type, u32 map_id,
- u32 flags)
+ const struct bpf_prog *xdp_prog,
+ void *fwd, enum bpf_map_type map_type,
+ u32 map_id, u32 flags)
{
struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
struct bpf_map *map;
@@ -4528,7 +4529,8 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
}
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
+ struct xdp_buff *xdp,
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
enum bpf_map_type map_type = ri->map_type;
@@ -9075,7 +9077,8 @@ static bool xdp_is_valid_access(int off, int size,
return __is_valid_xdp_access(off, size);
}
-void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act)
+void bpf_warn_invalid_xdp_action(const struct net_device *dev,
+ const struct bpf_prog *prog, u32 act)
{
const u32 act_max = XDP_REDIRECT;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6841e61a6bd0..a441613a1e6c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1009,7 +1009,7 @@ int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
EXPORT_SYMBOL(skb_pp_cow_data);
int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
- struct bpf_prog *prog)
+ const struct bpf_prog *prog)
{
if (!prog->aux->xdp_has_frags)
return -EINVAL;
--
2.47.0
* [PATCH net-next v6 03/10] xdp, xsk: constify read-only arguments of some static inline helpers
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 02/10] bpf, xdp: constify some bpf_prog * function arguments Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 04/10] xdp: allow attaching already registered memory model to xdp_rxq_info Alexander Lobakin
` (7 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Lots of read-only helpers for &xdp_buff and &xdp_frame, such as the ones
for getting the frame length, the skb_shared_info etc., lack `const` on
their arguments for no reason. Add the missing annotations to leave less
room for mistakes and more room for optimization.
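For instance (a made-up sketch, not from the series), a stats-only path
can now keep its frame pointer const all the way down:

/* Sketch: counting the bytes of a frame without ever modifying it */
static u32 dummy_count_rx_bytes(const struct xdp_frame *xdpf)
{
	return xdp_get_frame_len(xdpf);
}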
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 29 +++++++++++++++++------------
include/net/xdp_sock_drv.h | 11 ++++++-----
include/net/xsk_buff_pool.h | 2 +-
3 files changed, 24 insertions(+), 18 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index e6770dd40c91..197808df1ee1 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -88,7 +88,7 @@ struct xdp_buff {
u32 flags; /* supported values defined in xdp_buff_flags */
};
-static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp)
+static __always_inline bool xdp_buff_has_frags(const struct xdp_buff *xdp)
{
return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS);
}
@@ -103,7 +103,8 @@ static __always_inline void xdp_buff_clear_frags_flag(struct xdp_buff *xdp)
xdp->flags &= ~XDP_FLAGS_HAS_FRAGS;
}
-static __always_inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *xdp)
+static __always_inline bool
+xdp_buff_is_frag_pfmemalloc(const struct xdp_buff *xdp)
{
return !!(xdp->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
}
@@ -144,15 +145,16 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
static inline struct skb_shared_info *
-xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
+xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
{
return (struct skb_shared_info *)xdp_data_hard_end(xdp);
}
-static __always_inline unsigned int xdp_get_buff_len(struct xdp_buff *xdp)
+static __always_inline unsigned int
+xdp_get_buff_len(const struct xdp_buff *xdp)
{
unsigned int len = xdp->data_end - xdp->data;
- struct skb_shared_info *sinfo;
+ const struct skb_shared_info *sinfo;
if (likely(!xdp_buff_has_frags(xdp)))
goto out;
@@ -177,12 +179,13 @@ struct xdp_frame {
u32 flags; /* supported values defined in xdp_buff_flags */
};
-static __always_inline bool xdp_frame_has_frags(struct xdp_frame *frame)
+static __always_inline bool xdp_frame_has_frags(const struct xdp_frame *frame)
{
return !!(frame->flags & XDP_FLAGS_HAS_FRAGS);
}
-static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame *frame)
+static __always_inline bool
+xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
{
return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
}
@@ -201,7 +204,7 @@ static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
}
static inline struct skb_shared_info *
-xdp_get_shared_info_from_frame(struct xdp_frame *frame)
+xdp_get_shared_info_from_frame(const struct xdp_frame *frame)
{
void *data_hard_start = frame->data - frame->headroom - sizeof(*frame);
@@ -249,7 +252,8 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);
static inline
-void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
+void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
+ struct xdp_buff *xdp)
{
xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
xdp->data = frame->data;
@@ -260,7 +264,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
}
static inline
-int xdp_update_frame_from_buff(struct xdp_buff *xdp,
+int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
struct xdp_frame *xdp_frame)
{
int metasize, headroom;
@@ -317,9 +321,10 @@ void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq);
-static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xdpf)
+static __always_inline unsigned int
+xdp_get_frame_len(const struct xdp_frame *xdpf)
{
- struct skb_shared_info *sinfo;
+ const struct skb_shared_info *sinfo;
unsigned int len = xdpf->len;
if (likely(!xdp_frame_has_frags(xdpf)))
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
index 40085afd9160..f3175a5d28f7 100644
--- a/include/net/xdp_sock_drv.h
+++ b/include/net/xdp_sock_drv.h
@@ -101,7 +101,7 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
return xp_alloc(pool);
}
-static inline bool xsk_is_eop_desc(struct xdp_desc *desc)
+static inline bool xsk_is_eop_desc(const struct xdp_desc *desc)
{
return !xp_mb_desc(desc);
}
@@ -143,7 +143,7 @@ static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
list_add_tail(&frag->list_node, &frag->pool->xskb_list);
}
-static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
+static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
{
struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp);
struct xdp_buff *ret = NULL;
@@ -200,7 +200,8 @@ static inline void *xsk_buff_raw_get_data(struct xsk_buff_pool *pool, u64 addr)
XDP_TXMD_FLAGS_CHECKSUM | \
0)
-static inline bool xsk_buff_valid_tx_metadata(struct xsk_tx_metadata *meta)
+static inline bool
+xsk_buff_valid_tx_metadata(const struct xsk_tx_metadata *meta)
{
return !(meta->flags & ~XDP_TXMD_FLAGS_VALID);
}
@@ -337,7 +338,7 @@ static inline struct xdp_buff *xsk_buff_alloc(struct xsk_buff_pool *pool)
return NULL;
}
-static inline bool xsk_is_eop_desc(struct xdp_desc *desc)
+static inline bool xsk_is_eop_desc(const struct xdp_desc *desc)
{
return false;
}
@@ -360,7 +361,7 @@ static inline void xsk_buff_add_frag(struct xdp_buff *xdp)
{
}
-static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
+static inline struct xdp_buff *xsk_buff_get_frag(const struct xdp_buff *first)
{
return NULL;
}
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index 7637799b6c19..50779406bc2d 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -183,7 +183,7 @@ static inline bool xp_desc_crosses_non_contig_pg(struct xsk_buff_pool *pool,
!(pool->dma_pages[addr >> PAGE_SHIFT] & XSK_NEXT_PG_CONTIG_MASK);
}
-static inline bool xp_mb_desc(struct xdp_desc *desc)
+static inline bool xp_mb_desc(const struct xdp_desc *desc)
{
return desc->options & XDP_PKT_CONTD;
}
--
2.47.0
* [PATCH net-next v6 04/10] xdp: allow attaching already registered memory model to xdp_rxq_info
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (2 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 03/10] xdp, xsk: constify read-only arguments of some static inline helpers Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 05/10] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model() Alexander Lobakin
` (6 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
One may need to register a memory model separately from xdp_rxq_info. One
simple example is the XDP test_run code, but in general it is useful when
memory model registration is managed by one layer and the XDP RxQ info by
a different one.
Allow such scenarios by adding a simple helper which "attaches" an
already registered memory model to the desired xdp_rxq_info. As this is
mostly needed for Page Pool, add a dedicated function to do that for a
&page_pool pointer.
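A rough usage sketch based on the helpers added below; the dummy_*
functions and the pp/rxq objects are placeholders, not part of the patch:

/* Setup: register the pool with the XDP core once, then attach it
 * instead of calling xdp_rxq_info_reg_mem_model().
 */
static int dummy_queue_setup_xdp(struct page_pool *pp,
				 struct xdp_rxq_info *rxq)
{
	int err;

	err = xdp_reg_page_pool(pp);
	if (err)
		return err;

	xdp_rxq_info_attach_page_pool(rxq, pp);

	return 0;
}

/* Teardown: detach before xdp_rxq_info_unreg(), then unregister the pool */
static void dummy_queue_teardown_xdp(struct page_pool *pp,
				     struct xdp_rxq_info *rxq)
{
	xdp_rxq_info_detach_mem_model(rxq);
	xdp_unreg_page_pool(pp);
}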
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 32 +++++++++++++++++++++++++++
net/core/xdp.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 88 insertions(+)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 197808df1ee1..1253fe21ede7 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -356,6 +356,38 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq);
int xdp_reg_mem_model(struct xdp_mem_info *mem,
enum xdp_mem_type type, void *allocator);
void xdp_unreg_mem_model(struct xdp_mem_info *mem);
+int xdp_reg_page_pool(struct page_pool *pool);
+void xdp_unreg_page_pool(const struct page_pool *pool);
+void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq,
+ const struct page_pool *pool);
+
+/**
+ * xdp_rxq_info_attach_mem_model - attach registered mem info to RxQ info
+ * @xdp_rxq: XDP RxQ info to attach the memory info to
+ * @mem: already registered memory info
+ *
+ * If the driver registers its memory providers manually, it must use this
+ * function instead of xdp_rxq_info_reg_mem_model().
+ */
+static inline void
+xdp_rxq_info_attach_mem_model(struct xdp_rxq_info *xdp_rxq,
+ const struct xdp_mem_info *mem)
+{
+ xdp_rxq->mem = *mem;
+}
+
+/**
+ * xdp_rxq_info_detach_mem_model - detach registered mem info from RxQ info
+ * @xdp_rxq: XDP RxQ info to detach the memory info from
+ *
+ * If the driver registers its memory providers manually and then attaches it
+ * via xdp_rxq_info_attach_mem_model(), it must call this function before
+ * xdp_rxq_info_unreg().
+ */
+static inline void xdp_rxq_info_detach_mem_model(struct xdp_rxq_info *xdp_rxq)
+{
+ xdp_rxq->mem = (struct xdp_mem_info){ };
+}
/* Drivers not supporting XDP metadata can use this helper, which
* rejects any room expansion for metadata as a result.
diff --git a/net/core/xdp.c b/net/core/xdp.c
index bcc5551c6424..885a2a664bce 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -365,6 +365,62 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
+/**
+ * xdp_reg_page_pool - register &page_pool as a memory provider for XDP
+ * @pool: &page_pool to register
+ *
+ * Can be used to register pools manually without connecting to any XDP RxQ
+ * info, so that the XDP layer will be aware of them. Then, they can be
+ * attached to an RxQ info manually via xdp_rxq_info_attach_page_pool().
+ *
+ * Return: %0 on success, -errno on error.
+ */
+int xdp_reg_page_pool(struct page_pool *pool)
+{
+ struct xdp_mem_info mem;
+
+ return xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pool);
+}
+EXPORT_SYMBOL_GPL(xdp_reg_page_pool);
+
+/**
+ * xdp_unreg_page_pool - unregister &page_pool from the memory providers list
+ * @pool: &page_pool to unregister
+ *
+ * A shorthand for manual unregistering page pools. If the pool was previously
+ * attached to an RxQ info, it must be detached first.
+ */
+void xdp_unreg_page_pool(const struct page_pool *pool)
+{
+ struct xdp_mem_info mem = {
+ .type = MEM_TYPE_PAGE_POOL,
+ .id = pool->xdp_mem_id,
+ };
+
+ xdp_unreg_mem_model(&mem);
+}
+EXPORT_SYMBOL_GPL(xdp_unreg_page_pool);
+
+/**
+ * xdp_rxq_info_attach_page_pool - attach registered pool to RxQ info
+ * @xdp_rxq: XDP RxQ info to attach the pool to
+ * @pool: pool to attach
+ *
+ * If the pool was registered manually, this function must be called instead
+ * of xdp_rxq_info_reg_mem_model() to connect it to the RxQ info.
+ */
+void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq,
+ const struct page_pool *pool)
+{
+ struct xdp_mem_info mem = {
+ .type = MEM_TYPE_PAGE_POOL,
+ .id = pool->xdp_mem_id,
+ };
+
+ xdp_rxq_info_attach_mem_model(xdp_rxq, &mem);
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
+
/* XDP RX runs under NAPI protection, and in different delivery error
* scenarios (e.g. queue full), it is possible to return the xdp_frame
* while still leveraging this protection. The @napi_direct boolean
--
2.47.0
* [PATCH net-next v6 05/10] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model()
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (3 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 04/10] xdp: allow attaching already registered memory model to xdp_rxq_info Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 06/10] xdp: register system page pool as an XDP memory model Alexander Lobakin
` (5 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
When you register an XSk pool as the XDP RxQ info memory model, you then
need to manually attach it after the registration.
Let the user combine both actions into one by just passing a pointer
to the pool directly to xdp_rxq_info_reg_mem_model(), which will take
care of calling xsk_pool_set_rxq_info(). This mirrors how a &page_pool
gets registered and reduces repetitive driver code.
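In driver terms, this roughly means the following (a sketch; the dummy_*
name and the rxq/xsk_pool objects are placeholders):

static int dummy_xsk_rxq_setup(struct xdp_rxq_info *rxq,
			       struct xsk_buff_pool *xsk_pool)
{
	/* Previously, drivers had to attach the pool manually:
	 *
	 *	xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_XSK_BUFF_POOL, NULL);
	 *	xsk_pool_set_rxq_info(xsk_pool, rxq);
	 *
	 * Now one call is enough:
	 */
	return xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_XSK_BUFF_POOL,
					  xsk_pool);
}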
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
net/core/xdp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 885a2a664bce..de1e9cb78718 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -358,6 +358,9 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
if (IS_ERR(xdp_alloc))
return PTR_ERR(xdp_alloc);
+ if (type == MEM_TYPE_XSK_BUFF_POOL && allocator)
+ xsk_pool_set_rxq_info(allocator, xdp_rxq);
+
if (trace_mem_connect_enabled() && xdp_alloc)
trace_mem_connect(xdp_alloc, xdp_rxq);
return 0;
--
2.47.0
* [PATCH net-next v6 06/10] xdp: register system page pool as an XDP memory model
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (4 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 05/10] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model() Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers Alexander Lobakin
` (4 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
From: Toke Høiland-Jørgensen <toke@redhat.com>
To make the system page pool usable as a source for allocating XDP
frames, we need to register it with xdp_reg_mem_model(), so that page
return works correctly. This is done in preparation for using the system
page_pool to convert XDP_PASS XSk frames to skbs; for the same reason,
make the per-cpu variable non-static so that it can be accessed from
other source files as well (but without exporting it).
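For example, code outside net/core/dev.c could then pick up the current
CPU's pool along these lines (a sketch; only system_page_pool itself
comes from the patch):

/* Sketch: grab the local system page_pool, e.g. to build an skb from an
 * XDP_PASS frame. Only valid in softirq/NAPI context on the local CPU.
 */
static struct page_pool *dummy_get_system_pp(void)
{
	return this_cpu_read(system_page_pool);
}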
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/linux/netdevice.h | 1 +
net/core/dev.c | 10 +++++++++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ecca21387a68..d1a8d98b132c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3322,6 +3322,7 @@ struct softnet_data {
};
DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
+DECLARE_PER_CPU(struct page_pool *, system_page_pool);
#ifndef CONFIG_PREEMPT_RT
static inline int dev_recursion_level(void)
diff --git a/net/core/dev.c b/net/core/dev.c
index bbb456b86e8b..b93dba1e98ee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -460,7 +460,7 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
* PP consumers must pay attention to run APIs in the appropriate context
* (e.g. NAPI context).
*/
-static DEFINE_PER_CPU(struct page_pool *, system_page_pool);
+DEFINE_PER_CPU(struct page_pool *, system_page_pool);
#ifdef CONFIG_LOCKDEP
/*
@@ -12146,11 +12146,18 @@ static int net_page_pool_create(int cpuid)
.nid = cpu_to_mem(cpuid),
};
struct page_pool *pp_ptr;
+ int err;
pp_ptr = page_pool_create_percpu(&page_pool_params, cpuid);
if (IS_ERR(pp_ptr))
return -ENOMEM;
+ err = xdp_reg_page_pool(pp_ptr);
+ if (err) {
+ page_pool_destroy(pp_ptr);
+ return err;
+ }
+
per_cpu(system_page_pool, cpuid) = pp_ptr;
#endif
return 0;
@@ -12284,6 +12291,7 @@ static int __init net_dev_init(void)
if (!pp_ptr)
continue;
+ xdp_unreg_page_pool(pp_ptr);
page_pool_destroy(pp_ptr);
per_cpu(system_page_pool, i) = NULL;
}
--
2.47.0
* [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (5 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 06/10] xdp: register system page pool as an XDP memory model Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-04 10:49 ` Toke Høiland-Jørgensen
2024-12-06 4:03 ` Mina Almasry
2024-12-03 17:37 ` [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems Alexander Lobakin
` (3 subsequent siblings)
10 siblings, 2 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Add the following netmem counterparts:
* virt_to_netmem() -- simple page_to_netmem(virt_to_page()) wrapper;
* netmem_is_pfmemalloc() -- page_is_pfmemalloc() for page-backed
netmems, false otherwise;
and the following "unsafe" versions:
* __netmem_to_page()
* __netmem_get_pp()
* __netmem_address()
They do the same as their non-underscored buddies, but assume the netmem
is always page-backed. When working with header &page_pools, you don't
need to check whether a netmem belongs to host memory and you can never
get NULL instead of a &page. Checking for the LSB, clearing the LSB and
branching take cycles and increase object code size, sometimes
significantly. When you're sure your PP is always backed by host memory,
you can avoid this overhead by using the underscored counterparts.
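A hypothetical user of the unsafe variants could look as follows (sketch
only; the helper name is made up):

/* Sketch: a driver filling its header buffers exclusively from a
 * host-memory page_pool can safely skip the net_iov checks.
 */
static void *dummy_hdr_buf_addr(netmem_ref hdr_netmem, u32 offset)
{
	/* __netmem_address() must never see a net_iov-backed netmem */
	return __netmem_address(hdr_netmem) + offset;
}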
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/netmem.h | 78 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 76 insertions(+), 2 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 8a6e20be4b9d..1b58faa4f20f 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -72,6 +72,22 @@ static inline bool netmem_is_net_iov(const netmem_ref netmem)
return (__force unsigned long)netmem & NET_IOV;
}
+/**
+ * __netmem_to_page - unsafely get pointer to the &page backing @netmem
+ * @netmem: netmem reference to convert
+ *
+ * Unsafe version of netmem_to_page(). When @netmem is always page-backed,
+ * e.g. when it's a header buffer, performs faster and generates smaller
+ * object code (no check for the LSB, no WARN). When @netmem points to IOV,
+ * provokes undefined behaviour.
+ *
+ * Return: pointer to the &page (garbage if @netmem is not page-backed).
+ */
+static inline struct page *__netmem_to_page(netmem_ref netmem)
+{
+ return (__force struct page *)netmem;
+}
+
/* This conversion fails (returns NULL) if the netmem_ref is not struct page
* backed.
*/
@@ -80,7 +96,7 @@ static inline struct page *netmem_to_page(netmem_ref netmem)
if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
return NULL;
- return (__force struct page *)netmem;
+ return __netmem_to_page(netmem);
}
static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem)
@@ -103,6 +119,17 @@ static inline netmem_ref page_to_netmem(struct page *page)
return (__force netmem_ref)page;
}
+/**
+ * virt_to_netmem - convert virtual memory pointer to a netmem reference
+ * @data: host memory pointer to convert
+ *
+ * Return: netmem reference to the &page backing this virtual address.
+ */
+static inline netmem_ref virt_to_netmem(const void *data)
+{
+ return page_to_netmem(virt_to_page(data));
+}
+
static inline int netmem_ref_count(netmem_ref netmem)
{
/* The non-pp refcount of net_iov is always 1. On net_iov, we only
@@ -127,6 +154,22 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
}
+/**
+ * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
+ * @netmem: netmem reference to get the pointer from
+ *
+ * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
+ * e.g. when it's a header buffer, performs faster and generates smaller
+ * object code (avoids clearing the LSB). When @netmem points to IOV,
+ * provokes invalid memory access.
+ *
+ * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
+ */
+static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
+{
+ return __netmem_to_page(netmem)->pp;
+}
+
static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
{
return __netmem_clear_lsb(netmem)->pp;
@@ -158,12 +201,43 @@ static inline netmem_ref netmem_compound_head(netmem_ref netmem)
return page_to_netmem(compound_head(netmem_to_page(netmem)));
}
+/**
+ * __netmem_address - unsafely get pointer to the memory backing @netmem
+ * @netmem: netmem reference to get the pointer for
+ *
+ * Unsafe version of netmem_address(). When @netmem is always page-backed,
+ * e.g. when it's a header buffer, performs faster and generates smaller
+ * object code (no check for the LSB). When @netmem points to IOV, provokes
+ * undefined behaviour.
+ *
+ * Return: pointer to the memory (garbage if @netmem is not page-backed).
+ */
+static inline void *__netmem_address(netmem_ref netmem)
+{
+ return page_address(__netmem_to_page(netmem));
+}
+
static inline void *netmem_address(netmem_ref netmem)
{
if (netmem_is_net_iov(netmem))
return NULL;
- return page_address(netmem_to_page(netmem));
+ return __netmem_address(netmem);
+}
+
+/**
+ * netmem_is_pfmemalloc - check if @netmem was allocated under memory pressure
+ * @netmem: netmem reference to check
+ *
+ * Return: true if @netmem is page-backed and the page was allocated under
+ * memory pressure, false otherwise.
+ */
+static inline bool netmem_is_pfmemalloc(netmem_ref netmem)
+{
+ if (netmem_is_net_iov(netmem))
+ return false;
+
+ return page_is_pfmemalloc(netmem_to_page(netmem));
}
static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
--
2.47.0
* [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (6 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-04 10:50 ` Toke Høiland-Jørgensen
2024-12-06 4:07 ` Mina Almasry
2024-12-03 17:37 ` [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk Alexander Lobakin
` (2 subsequent siblings)
10 siblings, 2 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Currently, page_pool_put_page_bulk() actually takes an array of pointers
to the data, not to pages, despite the name. As one side effect, when
you're freeing frags from &skb_shared_info, xdp_return_frame_bulk()
converts page pointers to virtual addresses and then
page_pool_put_page_bulk() converts them back. Moreover, data pointers
assume every frag is placed in host memory, making this function
non-universal.
Make page_pool_put_page_bulk() handle an array of netmems. Pass frag
netmems directly and use virt_to_netmem() when freeing xdpf->data,
so that the PP core then gets the compound netmem and takes care
of the rest.
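Caller-wise, a Tx completion path can then batch frag netmems directly,
roughly as follows (a sketch; the dummy_* name and the bulk/count
storage are placeholders):

/* Sketch: collect netmems and release them in bulks of up to
 * XDP_BULK_QUEUE_SIZE entries.
 */
static void dummy_complete_frag(struct page_pool *pool, netmem_ref *bulk,
				u32 *count, const skb_frag_t *frag)
{
	bulk[(*count)++] = skb_frag_netmem(frag);

	if (*count == XDP_BULK_QUEUE_SIZE) {
		page_pool_put_netmem_bulk(pool, bulk, *count);
		*count = 0;
	}
}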
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/page_pool/types.h | 8 ++++----
include/net/xdp.h | 2 +-
net/core/page_pool.c | 30 +++++++++++++++---------------
net/core/xdp.c | 6 +++---
4 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index c022c410abe3..1ea16b0e9c79 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -259,8 +259,8 @@ void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
const struct xdp_mem_info *mem);
-void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count);
+void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
+ u32 count);
#else
static inline void page_pool_destroy(struct page_pool *pool)
{
@@ -272,8 +272,8 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
{
}
-static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count)
+static inline void page_pool_put_netmem_bulk(struct page_pool *pool,
+ netmem_ref *data, u32 count)
{
}
#endif
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 1253fe21ede7..f4020b29122f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -194,7 +194,7 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
struct xdp_frame_bulk {
int count;
void *xa;
- void *q[XDP_BULK_QUEUE_SIZE];
+ netmem_ref q[XDP_BULK_QUEUE_SIZE];
};
static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f89cf93f6eb4..4c85b77cfdac 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -840,22 +840,22 @@ void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page,
EXPORT_SYMBOL(page_pool_put_unrefed_page);
/**
- * page_pool_put_page_bulk() - release references on multiple pages
+ * page_pool_put_netmem_bulk() - release references on multiple netmems
* @pool: pool from which pages were allocated
- * @data: array holding page pointers
- * @count: number of pages in @data
+ * @data: array holding netmem references
+ * @count: number of entries in @data
*
- * Tries to refill a number of pages into the ptr_ring cache holding ptr_ring
- * producer lock. If the ptr_ring is full, page_pool_put_page_bulk()
- * will release leftover pages to the page allocator.
- * page_pool_put_page_bulk() is suitable to be run inside the driver NAPI tx
+ * Tries to refill a number of netmems into the ptr_ring cache holding ptr_ring
+ * producer lock. If the ptr_ring is full, page_pool_put_netmem_bulk()
+ * will release leftover netmems to the memory provider.
+ * page_pool_put_netmem_bulk() is suitable to be run inside the driver NAPI tx
* completion loop for the XDP_REDIRECT use case.
*
* Please note the caller must not use data area after running
- * page_pool_put_page_bulk(), as this function overwrites it.
+ * page_pool_put_netmem_bulk(), as this function overwrites it.
*/
-void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count)
+void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
+ u32 count)
{
int i, bulk_len = 0;
bool allow_direct;
@@ -864,7 +864,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
allow_direct = page_pool_napi_local(pool);
for (i = 0; i < count; i++) {
- netmem_ref netmem = page_to_netmem(virt_to_head_page(data[i]));
+ netmem_ref netmem = netmem_compound_head(data[i]);
/* It is not the last user for the page frag case */
if (!page_pool_is_last_ref(netmem))
@@ -873,7 +873,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
netmem = __page_pool_put_page(pool, netmem, -1, allow_direct);
/* Approved for bulk recycling in ptr_ring cache */
if (netmem)
- data[bulk_len++] = (__force void *)netmem;
+ data[bulk_len++] = netmem;
}
if (!bulk_len)
@@ -882,7 +882,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
/* Bulk producer into ptr_ring page_pool cache */
in_softirq = page_pool_producer_lock(pool);
for (i = 0; i < bulk_len; i++) {
- if (__ptr_ring_produce(&pool->ring, data[i])) {
+ if (__ptr_ring_produce(&pool->ring, (__force void *)data[i])) {
/* ring full */
recycle_stat_inc(pool, ring_full);
break;
@@ -899,9 +899,9 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
* since put_page() with refcnt == 1 can be an expensive operation
*/
for (; i < bulk_len; i++)
- page_pool_return_page(pool, (__force netmem_ref)data[i]);
+ page_pool_return_page(pool, data[i]);
}
-EXPORT_SYMBOL(page_pool_put_page_bulk);
+EXPORT_SYMBOL(page_pool_put_netmem_bulk);
static netmem_ref page_pool_drain_frag(struct page_pool *pool,
netmem_ref netmem)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index de1e9cb78718..938ad15c9857 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -518,7 +518,7 @@ void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
if (unlikely(!xa || !bq->count))
return;
- page_pool_put_page_bulk(xa->page_pool, bq->q, bq->count);
+ page_pool_put_netmem_bulk(xa->page_pool, bq->q, bq->count);
/* bq->xa is not cleared to save lookup, if mem.id same in next bulk */
bq->count = 0;
}
@@ -559,12 +559,12 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
for (i = 0; i < sinfo->nr_frags; i++) {
skb_frag_t *frag = &sinfo->frags[i];
- bq->q[bq->count++] = skb_frag_address(frag);
+ bq->q[bq->count++] = skb_frag_netmem(frag);
if (bq->count == XDP_BULK_QUEUE_SIZE)
xdp_flush_frame_bulk(bq);
}
}
- bq->q[bq->count++] = xdpf->data;
+ bq->q[bq->count++] = virt_to_netmem(xdpf->data);
}
EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
--
2.47.0
* [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (7 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-06 2:40 ` Jakub Kicinski
2024-12-03 17:37 ` [PATCH net-next v6 10/10] xdp: get rid of xdp_frame::mem.id Alexander Lobakin
2024-12-06 7:10 ` [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I patchwork-bot+netdevbpf
10 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
The main reason for this change is to allow mixing pages from different
&page_pools within one &xdp_buff/&xdp_frame. Why not? With features like
devmem and io_uring zerocopy Rx, separate PPs for header buffers and
payload buffers are required.
Adjust xdp_return_frame_bulk() and page_pool_put_netmem_bulk() so that
they are no longer tied to a particular pool. Let the latter collect the
netmems whose PP differs from that of the first netmem of the bulk into
a separate bulk and process it after the main loop.
This greatly optimizes xdp_return_frame_bulk(): no more hashtable
lookups and forced flushes on PP mismatch. Also make
xdp_flush_frame_bulk() inline, as it's just one if + one function call +
one u32 read, not worth extending the call ladder.
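From the caller's perspective only the flexibility changes: a completion
path can now feed frames backed by different pools into one bulk (a
sketch; the dummy_* name is a placeholder, NAPI/RCU context is assumed):

static void dummy_complete_frames(struct xdp_frame **frames, u32 n)
{
	struct xdp_frame_bulk bq;
	u32 i;

	xdp_frame_bulk_init(&bq);

	/* Frames (and their frags) may come from different page_pools */
	for (i = 0; i < n; i++)
		xdp_return_frame_bulk(frames[i], &bq);

	xdp_flush_frame_bulk(&bq);
}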
Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com> # iterative
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/page_pool/types.h | 6 ++--
include/net/xdp.h | 16 ++++++---
net/core/page_pool.c | 61 ++++++++++++++++++++++++++---------
net/core/xdp.c | 29 +----------------
4 files changed, 61 insertions(+), 51 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 1ea16b0e9c79..05a864031271 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -259,8 +259,7 @@ void page_pool_disable_direct_recycling(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
const struct xdp_mem_info *mem);
-void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
- u32 count);
+void page_pool_put_netmem_bulk(netmem_ref *data, u32 count);
#else
static inline void page_pool_destroy(struct page_pool *pool)
{
@@ -272,8 +271,7 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
{
}
-static inline void page_pool_put_netmem_bulk(struct page_pool *pool,
- netmem_ref *data, u32 count)
+static inline void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
}
#endif
diff --git a/include/net/xdp.h b/include/net/xdp.h
index f4020b29122f..9e7eb8223513 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -11,6 +11,8 @@
#include <linux/netdevice.h>
#include <linux/skbuff.h> /* skb_shared_info */
+#include <net/page_pool/types.h>
+
/**
* DOC: XDP RX-queue information
*
@@ -193,14 +195,12 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
#define XDP_BULK_QUEUE_SIZE 16
struct xdp_frame_bulk {
int count;
- void *xa;
netmem_ref q[XDP_BULK_QUEUE_SIZE];
};
static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
{
- /* bq->count will be zero'ed when bq->xa gets updated */
- bq->xa = NULL;
+ bq->count = 0;
}
static inline struct skb_shared_info *
@@ -317,10 +317,18 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
void xdp_return_buff(struct xdp_buff *xdp);
-void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq);
+static inline void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
+{
+ if (unlikely(!bq->count))
+ return;
+
+ page_pool_put_netmem_bulk(bq->q, bq->count);
+ bq->count = 0;
+}
+
static __always_inline unsigned int
xdp_get_frame_len(const struct xdp_frame *xdpf)
{
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4c85b77cfdac..62cd1fcb9e97 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -841,7 +841,6 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page);
/**
* page_pool_put_netmem_bulk() - release references on multiple netmems
- * @pool: pool from which pages were allocated
* @data: array holding netmem references
* @count: number of entries in @data
*
@@ -854,35 +853,58 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page);
* Please note the caller must not use data area after running
* page_pool_put_netmem_bulk(), as this function overwrites it.
*/
-void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
- u32 count)
+void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
- int i, bulk_len = 0;
- bool allow_direct;
- bool in_softirq;
+ bool allow_direct, in_softirq, again = false;
+ netmem_ref bulk[XDP_BULK_QUEUE_SIZE];
+ u32 i, bulk_len, foreign;
+ struct page_pool *pool;
- allow_direct = page_pool_napi_local(pool);
+again:
+ pool = NULL;
+ bulk_len = 0;
+ foreign = 0;
for (i = 0; i < count; i++) {
- netmem_ref netmem = netmem_compound_head(data[i]);
+ struct page_pool *netmem_pp;
+ netmem_ref netmem;
+
+ if (!again) {
+ netmem = netmem_compound_head(data[i]);
- /* It is not the last user for the page frag case */
- if (!page_pool_is_last_ref(netmem))
+ /* It is not the last user for the page frag case */
+ if (!page_pool_is_last_ref(netmem))
+ continue;
+ } else {
+ netmem = data[i];
+ }
+
+ netmem_pp = netmem_get_pp(netmem);
+ if (unlikely(!pool)) {
+ pool = netmem_pp;
+ allow_direct = page_pool_napi_local(pool);
+ } else if (netmem_pp != pool) {
+ /*
+ * If the netmem belongs to a different page_pool, save
+ * it for another round after the main loop.
+ */
+ data[foreign++] = netmem;
continue;
+ }
netmem = __page_pool_put_page(pool, netmem, -1, allow_direct);
/* Approved for bulk recycling in ptr_ring cache */
if (netmem)
- data[bulk_len++] = netmem;
+ bulk[bulk_len++] = netmem;
}
if (!bulk_len)
- return;
+ goto out;
/* Bulk producer into ptr_ring page_pool cache */
in_softirq = page_pool_producer_lock(pool);
for (i = 0; i < bulk_len; i++) {
- if (__ptr_ring_produce(&pool->ring, (__force void *)data[i])) {
+ if (__ptr_ring_produce(&pool->ring, (__force void *)bulk[i])) {
/* ring full */
recycle_stat_inc(pool, ring_full);
break;
@@ -893,13 +915,22 @@ void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
/* Hopefully all pages was return into ptr_ring */
if (likely(i == bulk_len))
- return;
+ goto out;
/* ptr_ring cache full, free remaining pages outside producer lock
* since put_page() with refcnt == 1 can be an expensive operation
*/
for (; i < bulk_len; i++)
- page_pool_return_page(pool, data[i]);
+ page_pool_return_page(pool, bulk[i]);
+
+out:
+ if (!foreign)
+ return;
+
+ count = foreign;
+ again = true;
+
+ goto again;
}
EXPORT_SYMBOL(page_pool_put_netmem_bulk);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 938ad15c9857..56127e8ec85f 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -511,46 +511,19 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
* xdp_frame_bulk is usually stored/allocated on the function
* call-stack to avoid locking penalties.
*/
-void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
-{
- struct xdp_mem_allocator *xa = bq->xa;
-
- if (unlikely(!xa || !bq->count))
- return;
-
- page_pool_put_netmem_bulk(xa->page_pool, bq->q, bq->count);
- /* bq->xa is not cleared to save lookup, if mem.id same in next bulk */
- bq->count = 0;
-}
-EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
/* Must be called with rcu_read_lock held */
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq)
{
- struct xdp_mem_info *mem = &xdpf->mem;
- struct xdp_mem_allocator *xa;
-
- if (mem->type != MEM_TYPE_PAGE_POOL) {
+ if (xdpf->mem.type != MEM_TYPE_PAGE_POOL) {
xdp_return_frame(xdpf);
return;
}
- xa = bq->xa;
- if (unlikely(!xa)) {
- xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- bq->count = 0;
- bq->xa = xa;
- }
-
if (bq->count == XDP_BULK_QUEUE_SIZE)
xdp_flush_frame_bulk(bq);
- if (unlikely(mem->id != xa->mem.id)) {
- xdp_flush_frame_bulk(bq);
- bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- }
-
if (unlikely(xdp_frame_has_frags(xdpf))) {
struct skb_shared_info *sinfo;
int i;
--
2.47.0
* [PATCH net-next v6 10/10] xdp: get rid of xdp_frame::mem.id
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (8 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk Alexander Lobakin
@ 2024-12-03 17:37 ` Alexander Lobakin
2024-12-06 7:10 ` [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I patchwork-bot+netdevbpf
10 siblings, 0 replies; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-03 17:37 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Initially, xdp_frame::mem.id was used to search for the corresponding
&page_pool to return the page correctly.
However, struct page has since been extended to carry a direct pointer to
its PP (netmem has one as well), so keeping this field no longer makes
sense. xdp_return_frame_bulk() still used it to do a lookup; that leftover
is now removed.
Remove xdp_frame::mem and replace it with ::mem_type, as only the memory
type still matters and we need to know it to free the frame correctly.
As a cute side effect, every scalar field in &xdp_frame can now be 4 bytes
wide, speeding up accesses to them.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
include/net/xdp.h | 14 +++++-----
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
drivers/net/veth.c | 4 +--
kernel/bpf/cpumap.c | 2 +-
net/bpf/test_run.c | 4 +--
net/core/filter.c | 12 ++++----
net/core/xdp.c | 28 +++++++++----------
7 files changed, 33 insertions(+), 33 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 9e7eb8223513..1c260869a353 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -169,13 +169,13 @@ xdp_get_buff_len(const struct xdp_buff *xdp)
struct xdp_frame {
void *data;
- u16 len;
- u16 headroom;
+ u32 len;
+ u32 headroom;
u32 metasize; /* uses lower 8-bits */
/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
- * while mem info is valid on remote CPU.
+ * while mem_type is valid on remote CPU.
*/
- struct xdp_mem_info mem;
+ enum xdp_mem_type mem_type:32;
struct net_device *dev_rx; /* used by cpumap */
u32 frame_sz;
u32 flags; /* supported values defined in xdp_buff_flags */
@@ -306,13 +306,13 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
if (unlikely(xdp_update_frame_from_buff(xdp, xdp_frame) < 0))
return NULL;
- /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
- xdp_frame->mem = xdp->rxq->mem;
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_type */
+ xdp_frame->mem_type = xdp->rxq->mem.type;
return xdp_frame;
}
-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp);
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index bf5baef5c3e0..4948b4906584 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2281,7 +2281,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv,
new_xdpf->len = xdpf->len;
new_xdpf->headroom = priv->tx_headroom;
new_xdpf->frame_sz = DPAA_BP_RAW_SIZE;
- new_xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ new_xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
/* Release the initial buffer */
xdp_return_frame_rx_napi(xdpf);
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0d6d0d749d44..048eb599a95a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -634,7 +634,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
break;
case XDP_TX:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
trace_xdp_exception(rq->dev, xdp_prog, act);
frame = &orig_frame;
@@ -646,7 +646,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
goto xdp_xmit;
case XDP_REDIRECT:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
frame = &orig_frame;
stats->rx_drops++;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a2f46785ac3b..774accbd4a22 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -190,7 +190,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
int err;
rxq.dev = xdpf->dev_rx;
- rxq.mem = xdpf->mem;
+ rxq.mem.type = xdpf->mem_type;
/* TODO: report queue_index to xdp_rxq_info */
xdp_convert_frame_to_buff(xdpf, &xdp);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 501ec4249fed..9ae2a7f1738b 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -153,7 +153,7 @@ static void xdp_test_run_init_page(netmem_ref netmem, void *arg)
new_ctx->data = new_ctx->data_meta + meta_len;
xdp_update_frame_from_buff(new_ctx, frm);
- frm->mem = new_ctx->rxq->mem;
+ frm->mem_type = new_ctx->rxq->mem.type;
memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
}
@@ -246,7 +246,7 @@ static void reset_ctx(struct xdp_page_head *head)
head->ctx.data_meta = head->orig_ctx.data_meta;
head->ctx.data_end = head->orig_ctx.data_end;
xdp_update_frame_from_buff(&head->ctx, head->frame);
- head->frame->mem = head->orig_ctx.rxq->mem;
+ head->frame->mem_type = head->orig_ctx.rxq->mem.type;
}
static int xdp_recv_frames(struct xdp_frame **frames, int nframes,
diff --git a/net/core/filter.c b/net/core/filter.c
index fac245065b0a..6c036708634b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4119,13 +4119,13 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
}
static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
- struct xdp_mem_info *mem_info, bool release)
+ enum xdp_mem_type mem_type, bool release)
{
struct xdp_buff *zc_frag = xsk_buff_get_tail(xdp);
if (release) {
xsk_buff_del_tail(zc_frag);
- __xdp_return(NULL, mem_info, false, zc_frag);
+ __xdp_return(NULL, mem_type, false, zc_frag);
} else {
zc_frag->data_end -= shrink;
}
@@ -4134,18 +4134,18 @@ static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag,
int shrink)
{
- struct xdp_mem_info *mem_info = &xdp->rxq->mem;
+ enum xdp_mem_type mem_type = xdp->rxq->mem.type;
bool release = skb_frag_size(frag) == shrink;
- if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) {
- bpf_xdp_shrink_data_zc(xdp, shrink, mem_info, release);
+ if (mem_type == MEM_TYPE_XSK_BUFF_POOL) {
+ bpf_xdp_shrink_data_zc(xdp, shrink, mem_type, release);
goto out;
}
if (release) {
struct page *page = skb_frag_page(frag);
- __xdp_return(page_address(page), mem_info, false, NULL);
+ __xdp_return(page_address(page), mem_type, false, NULL);
}
out:
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 56127e8ec85f..d367571c5838 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -430,12 +430,12 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
* is used for those calls sites. Thus, allowing for faster recycling
* of xdp_frames/pages in those cases.
*/
-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp)
{
struct page *page;
- switch (mem->type) {
+ switch (mem_type) {
case MEM_TYPE_PAGE_POOL:
page = virt_to_head_page(data);
if (napi_direct && xdp_return_frame_no_direct())
@@ -458,7 +458,7 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
break;
default:
/* Not possible, checked in xdp_rxq_info_reg_mem_model() */
- WARN(1, "Incorrect XDP memory type (%d) usage", mem->type);
+ WARN(1, "Incorrect XDP memory type (%d) usage", mem_type);
break;
}
}
@@ -475,10 +475,10 @@ void xdp_return_frame(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdpf->mem, false, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, false, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, false, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame);
@@ -494,10 +494,10 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdpf->mem, true, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, true, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, true, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, true, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -516,7 +516,7 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq)
{
- if (xdpf->mem.type != MEM_TYPE_PAGE_POOL) {
+ if (xdpf->mem_type != MEM_TYPE_PAGE_POOL) {
xdp_return_frame(xdpf);
return;
}
@@ -553,10 +553,11 @@ void xdp_return_buff(struct xdp_buff *xdp)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);
- __xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+ __xdp_return(page_address(page), xdp->rxq->mem.type, true,
+ xdp);
}
out:
- __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
+ __xdp_return(xdp->data, xdp->rxq->mem.type, true, xdp);
}
EXPORT_SYMBOL_GPL(xdp_return_buff);
@@ -602,7 +603,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
xdpf->headroom = 0;
xdpf->metasize = metasize;
xdpf->frame_sz = PAGE_SIZE;
- xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
xsk_buff_free(xdp);
return xdpf;
@@ -672,7 +673,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
* - RX ring dev queue index (skb_record_rx_queue)
*/
- if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
+ if (xdpf->mem_type == MEM_TYPE_PAGE_POOL)
skb_mark_for_recycle(skb);
/* Allow SKB to reuse area used by xdp_frame */
@@ -719,8 +720,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
nxdpf = addr;
nxdpf->data = addr + headroom;
nxdpf->frame_sz = PAGE_SIZE;
- nxdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
- nxdpf->mem.id = 0;
+ nxdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
return nxdpf;
}
--
2.47.0
* Re: [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder
2024-12-03 17:37 ` [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder Alexander Lobakin
@ 2024-12-04 10:27 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-12-04 10:27 UTC (permalink / raw)
To: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maciej Fijalkowski, Stanislav Fomichev,
Magnus Karlsson, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
Alexander Lobakin <aleksander.lobakin@intel.com> writes:
> After the series "XSk buff on a diet" by Maciej, the greatest pow-2 by
> which &xdp_buff_xsk can be divided was reduced from 16 to 8 on x86_64.
> Also, sizeof(xdp_buff_xsk) is now 120 bytes, which, taking the previous
> sentence into account, means 8 bytes are left at the end of a cacheline,
> so the elements of a buff array get scattered across cachelines
> chaotically.
> Use __aligned_largest for this struct. This alignment is usually 16
> bytes, which makes the struct fill two full cachelines and align an array
> nicely. ____cacheline_aligned may be excessive here, especially on
> arches with 128-256 byte CLs, as well as on 32-bit arches (76 -> 96
> bytes on MIPS32R2), while not doing better than _largest.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Ohh, didn't know about that attribute - neat!
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
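For reference, here is a minimal sketch of what the attribute buys. demo_buff
is a made-up struct, not the kernel one; the macro definition is assumed to
match the one in the kernel's compiler_attributes.h.

/* __aligned_largest is __attribute__((__aligned__)) with no explicit value,
 * i.e. "use the largest alignment useful for this target", usually 16 bytes
 * on x86_64. Consecutive array elements then start on 16-byte boundaries
 * without forcing full-cacheline alignment.
 */
#define __aligned_largest	__attribute__((__aligned__))

struct demo_buff {
	unsigned long long	addr;	/* 8 bytes */
	unsigned int		len;	/* 4 bytes */
	unsigned int		flags;	/* 4 bytes */
	void			*priv;	/* 8 bytes: 24 total, sizeof() padded to 32 */
} __aligned_largest;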
* Re: [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers
2024-12-03 17:37 ` [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers Alexander Lobakin
@ 2024-12-04 10:49 ` Toke Høiland-Jørgensen
2024-12-06 4:03 ` Mina Almasry
1 sibling, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-12-04 10:49 UTC (permalink / raw)
To: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maciej Fijalkowski, Stanislav Fomichev,
Magnus Karlsson, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
Alexander Lobakin <aleksander.lobakin@intel.com> writes:
> Add the following netmem counterparts:
>
> * virt_to_netmem() -- simple page_to_netmem(virt_to_page()) wrapper;
> * netmem_is_pfmemalloc() -- page_is_pfmemalloc() for page-backed
> netmems, false otherwise;
>
> and the following "unsafe" versions:
>
> * __netmem_to_page()
> * __netmem_get_pp()
> * __netmem_address()
>
> They do the same as their non-underscored buddies, but assume the netmem
> is always page-backed. When working with header &page_pools, you don't
> need to check whether the netmem belongs to host memory, and you can
> never get NULL instead of a &page. Checking for the LSB, clearing the
> LSB, and branching all take cycles and increase object code size,
> sometimes significantly. When you're sure your PP is always host-backed,
> you can avoid this by using the underscored counterparts.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Makes sense to have these as helpers, spelling out the constraints.
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
* Re: [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems
2024-12-03 17:37 ` [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems Alexander Lobakin
@ 2024-12-04 10:50 ` Toke Høiland-Jørgensen
2024-12-06 4:07 ` Mina Almasry
1 sibling, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-12-04 10:50 UTC (permalink / raw)
To: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko
Cc: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Maciej Fijalkowski, Stanislav Fomichev,
Magnus Karlsson, nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev,
linux-kernel
Alexander Lobakin <aleksander.lobakin@intel.com> writes:
> Currently, page_pool_put_page_bulk() indeed takes an array of pointers
> to the data, not pages, despite the name. As one side effect, when
> you're freeing frags from &skb_shared_info, xdp_return_frame_bulk()
> converts page pointers to virtual addresses and then
> page_pool_put_page_bulk() converts them back. Moreover, data pointers
> assume every frag is placed in the host memory, making this function
> non-universal.
> Make page_pool_put_page_bulk() handle array of netmems. Pass frag
> netmems directly and use virt_to_netmem() when freeing xdpf->data,
> so that the PP core will then get the compound netmem and take care
> of the rest.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
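To make the caller-side change described above concrete, here is a rough
sketch, not the exact hunk: bulk_queue_xdpf() is a made-up name, and the code
assumes the existing xdp/netmem helpers, including the skb_frag_netmem()
accessor. Frag netmems are queued as-is and the head buffer goes through
virt_to_netmem(), so page_pool_put_netmem_bulk() receives compound netmems
and resolves the rest itself.

static void bulk_queue_xdpf(struct xdp_frame *xdpf, struct xdp_frame_bulk *bq)
{
	if (unlikely(xdp_frame_has_frags(xdpf))) {
		struct skb_shared_info *sinfo =
			xdp_get_shared_info_from_frame(xdpf);

		for (u32 i = 0; i < sinfo->nr_frags; i++) {
			if (bq->count == XDP_BULK_QUEUE_SIZE)
				xdp_flush_frame_bulk(bq);

			/* frag netmem goes into the bulk queue directly */
			bq->q[bq->count++] =
				skb_frag_netmem(&sinfo->frags[i]);
		}
	}

	if (bq->count == XDP_BULK_QUEUE_SIZE)
		xdp_flush_frame_bulk(bq);

	/* head buffer: virtual address -> compound netmem */
	bq->q[bq->count++] = virt_to_netmem(xdpf->data);
}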
* Re: [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk
2024-12-03 17:37 ` [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk Alexander Lobakin
@ 2024-12-06 2:40 ` Jakub Kicinski
2024-12-06 13:49 ` Alexander Lobakin
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2024-12-06 2:40 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, David S. Miller, Eric Dumazet, Paolo Abeni,
Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Very nice in general, I'll apply the previous 8 but I'd like to offer
some alternatives here..
On Tue, 3 Dec 2024 18:37:32 +0100 Alexander Lobakin wrote:
> +void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
> {
> - int i, bulk_len = 0;
> - bool allow_direct;
> - bool in_softirq;
> + bool allow_direct, in_softirq, again = false;
> + netmem_ref bulk[XDP_BULK_QUEUE_SIZE];
> + u32 i, bulk_len, foreign;
> + struct page_pool *pool;
>
> - allow_direct = page_pool_napi_local(pool);
> +again:
> + pool = NULL;
> + bulk_len = 0;
> + foreign = 0;
>
> for (i = 0; i < count; i++) {
> - netmem_ref netmem = netmem_compound_head(data[i]);
> + struct page_pool *netmem_pp;
> + netmem_ref netmem;
> +
> + if (!again) {
> + netmem = netmem_compound_head(data[i]);
>
> - /* It is not the last user for the page frag case */
> - if (!page_pool_is_last_ref(netmem))
> + /* It is not the last user for the page frag case */
> + if (!page_pool_is_last_ref(netmem))
> + continue;
We check the "again" condition potentially n^2 times; is it written
this way because we expect no mixing? Would it not be fewer cycles
to do a first pass that converts all buffers to heads and filters out
all non-last refs, and then delete the "again" check?
A minor benefit is that it removes a few of the long lines, so it'd be
feasible to drop the "goto again" as well and just turn this function
into a while (count) loop (a rough sketch of that shape is at the end of
this message).
> + } else {
> + netmem = data[i];
> + }
> +
> + netmem_pp = netmem_get_pp(netmem);
nit: netmem_pp is not a great name. Ain't nothing especially netmem
about it, it's just the _current_ page pool.
> + if (unlikely(!pool)) {
> + pool = netmem_pp;
> + allow_direct = page_pool_napi_local(pool);
> + } else if (netmem_pp != pool) {
> + /*
> + * If the netmem belongs to a different page_pool, save
> + * it for another round after the main loop.
> + */
> + data[foreign++] = netmem;
> continue;
> + }
>
> netmem = __page_pool_put_page(pool, netmem, -1, allow_direct);
> /* Approved for bulk recycling in ptr_ring cache */
> if (netmem)
> - data[bulk_len++] = netmem;
> + bulk[bulk_len++] = netmem;
> }
>
> if (!bulk_len)
You can invert this condition, and move all the code from here to the
out label into a small helper with just 3 params (pool, bulk, bulk_len).
Naming will be the tricky part but you can save us a bunch of gotos.
> - return;
> + goto out;
>
> /* Bulk producer into ptr_ring page_pool cache */
> in_softirq = page_pool_producer_lock(pool);
> for (i = 0; i < bulk_len; i++) {
> - if (__ptr_ring_produce(&pool->ring, (__force void *)data[i])) {
> + if (__ptr_ring_produce(&pool->ring, (__force void *)bulk[i])) {
> /* ring full */
> recycle_stat_inc(pool, ring_full);
> break;
> @@ -893,13 +915,22 @@ void page_pool_put_netmem_bulk(struct page_pool *pool, netmem_ref *data,
>
> /* Hopefully all pages was return into ptr_ring */
> if (likely(i == bulk_len))
> - return;
> + goto out;
>
> /* ptr_ring cache full, free remaining pages outside producer lock
> * since put_page() with refcnt == 1 can be an expensive operation
> */
> for (; i < bulk_len; i++)
> - page_pool_return_page(pool, data[i]);
> + page_pool_return_page(pool, bulk[i]);
> +
> +out:
> + if (!foreign)
> + return;
> +
> + count = foreign;
> + again = true;
> +
> + goto again;
> }
> EXPORT_SYMBOL(page_pool_put_netmem_bulk);
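For illustration, the restructuring suggested above could look roughly like
this. It is only a sketch, not code from any posted revision, and
page_pool_recycle_ring_bulk() is a hypothetical name for a helper carrying
the ptr_ring production tail (producer lock, ring_full fallback) that the
patch currently keeps after the out: label.

void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
{
	u32 i, kept = 0;

	/* pass 1: resolve compound heads, drop non-last references */
	for (i = 0; i < count; i++) {
		netmem_ref netmem = netmem_compound_head(data[i]);

		if (page_pool_is_last_ref(netmem))
			data[kept++] = netmem;
	}
	count = kept;

	/* pass 2+: recycle one page_pool's worth of netmems per iteration */
	while (count) {
		struct page_pool *pool = netmem_get_pp(data[0]);
		bool allow_direct = page_pool_napi_local(pool);
		netmem_ref bulk[XDP_BULK_QUEUE_SIZE];
		u32 bulk_len = 0, foreign = 0;

		for (i = 0; i < count; i++) {
			netmem_ref netmem = data[i];

			if (netmem_get_pp(netmem) != pool) {
				/* different pool: keep it for the next round */
				data[foreign++] = netmem;
				continue;
			}

			netmem = __page_pool_put_page(pool, netmem, -1,
						      allow_direct);
			/* approved for bulk recycling in the ptr_ring cache */
			if (netmem)
				bulk[bulk_len++] = netmem;
		}

		if (bulk_len)
			page_pool_recycle_ring_bulk(pool, bulk, bulk_len);

		count = foreign;
	}
}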
* Re: [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers
2024-12-03 17:37 ` [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers Alexander Lobakin
2024-12-04 10:49 ` Toke Høiland-Jørgensen
@ 2024-12-06 4:03 ` Mina Almasry
1 sibling, 0 replies; 20+ messages in thread
From: Mina Almasry @ 2024-12-06 4:03 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
On Tue, Dec 3, 2024 at 9:43 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> Add the following netmem counterparts:
>
> * virt_to_netmem() -- simple page_to_netmem(virt_to_page()) wrapper;
> * netmem_is_pfmemalloc() -- page_is_pfmemalloc() for page-backed
> netmems, false otherwise;
>
> and the following "unsafe" versions:
>
> * __netmem_to_page()
> * __netmem_get_pp()
> * __netmem_address()
>
> They do the same as their non-underscored buddies, but assume the netmem
> is always page-backed. When working with header &page_pools, you don't
> need to check whether the netmem belongs to host memory, and you can
> never get NULL instead of a &page. Checking for the LSB, clearing the
> LSB, and branching all take cycles and increase object code size,
> sometimes significantly. When you're sure your PP is always host-backed,
> you can avoid this by using the underscored counterparts.
>
I can imagine future use cases where net_iov netmems are used for
headers. I'm thinking of a memory provider backed by hugepages
(Eric/Jakub's idea). In that case the netmem may be backed by a tail
page underneath, or by a net_iov that happens to be readable.
But if these ideas ever materialize, we can always revisit this. Some
suggestions for consideration are below, but either way:
Reviewed-by: Mina Almasry <almasrymina@google.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> include/net/netmem.h | 78 ++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/netmem.h b/include/net/netmem.h
> index 8a6e20be4b9d..1b58faa4f20f 100644
> --- a/include/net/netmem.h
> +++ b/include/net/netmem.h
> @@ -72,6 +72,22 @@ static inline bool netmem_is_net_iov(const netmem_ref netmem)
> return (__force unsigned long)netmem & NET_IOV;
> }
>
> +/**
> + * __netmem_to_page - unsafely get pointer to the &page backing @netmem
> + * @netmem: netmem reference to convert
> + *
> + * Unsafe version of netmem_to_page(). When @netmem is always page-backed,
> + * e.g. when it's a header buffer, performs faster and generates smaller
> + * object code (no check for the LSB, no WARN). When @netmem points to IOV,
> + * provokes undefined behaviour.
> + *
> + * Return: pointer to the &page (garbage if @netmem is not page-backed).
> + */
> +static inline struct page *__netmem_to_page(netmem_ref netmem)
> +{
> + return (__force struct page *)netmem;
> +}
> +
nit: I would name it netmem_to_page_unsafe(), just for glaringly
obvious clarity.
> /* This conversion fails (returns NULL) if the netmem_ref is not struct page
> * backed.
> */
> @@ -80,7 +96,7 @@ static inline struct page *netmem_to_page(netmem_ref netmem)
> if (WARN_ON_ONCE(netmem_is_net_iov(netmem)))
> return NULL;
>
> - return (__force struct page *)netmem;
> + return __netmem_to_page(netmem);
> }
>
> static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem)
> @@ -103,6 +119,17 @@ static inline netmem_ref page_to_netmem(struct page *page)
> return (__force netmem_ref)page;
> }
>
> +/**
> + * virt_to_netmem - convert virtual memory pointer to a netmem reference
> + * @data: host memory pointer to convert
> + *
> + * Return: netmem reference to the &page backing this virtual address.
> + */
> +static inline netmem_ref virt_to_netmem(const void *data)
> +{
> + return page_to_netmem(virt_to_page(data));
> +}
> +
> static inline int netmem_ref_count(netmem_ref netmem)
> {
> /* The non-pp refcount of net_iov is always 1. On net_iov, we only
> @@ -127,6 +154,22 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> }
>
> +/**
> + * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
> + * @netmem: netmem reference to get the pointer from
> + *
> + * Unsafe version of netmem_get_pp(). When @netmem is always page-backed,
> + * e.g. when it's a header buffer, performs faster and generates smaller
> + * object code (avoids clearing the LSB). When @netmem points to IOV,
> + * provokes invalid memory access.
> + *
> + * Return: pointer to the &page_pool (garbage if @netmem is not page-backed).
> + */
> +static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
> +{
> + return __netmem_to_page(netmem)->pp;
> +}
> +
> static inline struct page_pool *netmem_get_pp(netmem_ref netmem)
> {
> return __netmem_clear_lsb(netmem)->pp;
> @@ -158,12 +201,43 @@ static inline netmem_ref netmem_compound_head(netmem_ref netmem)
> return page_to_netmem(compound_head(netmem_to_page(netmem)));
> }
>
> +/**
> + * __netmem_address - unsafely get pointer to the memory backing @netmem
> + * @netmem: netmem reference to get the pointer for
> + *
> + * Unsafe version of netmem_address(). When @netmem is always page-backed,
> + * e.g. when it's a header buffer, performs faster and generates smaller
> + * object code (no check for the LSB). When @netmem points to IOV, provokes
> + * undefined behaviour.
> + *
> + * Return: pointer to the memory (garbage if @netmem is not page-backed).
> + */
> +static inline void *__netmem_address(netmem_ref netmem)
> +{
> + return page_address(__netmem_to_page(netmem));
> +}
> +
> static inline void *netmem_address(netmem_ref netmem)
> {
> if (netmem_is_net_iov(netmem))
> return NULL;
>
> - return page_address(netmem_to_page(netmem));
> + return __netmem_address(netmem);
> +}
> +
> +/**
> + * netmem_is_pfmemalloc - check if @netmem was allocated under memory pressure
> + * @netmem: netmem reference to check
> + *
> + * Return: true if @netmem is page-backed and the page was allocated under
> + * memory pressure, false otherwise.
> + */
> +static inline bool netmem_is_pfmemalloc(netmem_ref netmem)
> +{
> + if (netmem_is_net_iov(netmem))
> + return false;
> +
> + return page_is_pfmemalloc(netmem_to_page(netmem));
> }
Is it better to add these pfmemalloc/address helpers, or better to do:
page = netmem_to_page_unsafe(netmem);
page_is_pfmemalloc(page);
page_address(page);
Sure, the latter is a bit more of a pain to type, but it's arguably more
elegant than having to add *_unsafe() versions of all the netmem
helpers eventually.
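To make the comparison concrete, the two styles look roughly like this for a
header buffer that is known to be page-backed (sketch only; hdr_addr_a() and
hdr_addr_b() are made-up names):

/* a) the dedicated wrappers added by this patch */
static void *hdr_addr_a(netmem_ref netmem, bool *pfmemalloc)
{
	*pfmemalloc = netmem_is_pfmemalloc(netmem);

	return __netmem_address(netmem);
}

/* b) the alternative above: one unsafe conversion, then plain page helpers */
static void *hdr_addr_b(netmem_ref netmem, bool *pfmemalloc)
{
	struct page *page = __netmem_to_page(netmem);

	*pfmemalloc = page_is_pfmemalloc(page);

	return page_address(page);
}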
--
Thanks,
Mina
* Re: [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems
2024-12-03 17:37 ` [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems Alexander Lobakin
2024-12-04 10:50 ` Toke Høiland-Jørgensen
@ 2024-12-06 4:07 ` Mina Almasry
1 sibling, 0 replies; 20+ messages in thread
From: Mina Almasry @ 2024-12-06 4:07 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
On Tue, Dec 3, 2024 at 9:43 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> Currently, page_pool_put_page_bulk() indeed takes an array of pointers
> to the data, not pages, despite the name. As one side effect, when
> you're freeing frags from &skb_shared_info, xdp_return_frame_bulk()
> converts page pointers to virtual addresses and then
> page_pool_put_page_bulk() converts them back. Moreover, data pointers
> assume every frag is placed in the host memory, making this function
> non-universal.
> Make page_pool_put_page_bulk() handle array of netmems. Pass frag
> netmems directly and use virt_to_netmem() when freeing xdpf->data,
> so that the PP core will then get the compound netmem and take care
> of the rest.
>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Thank you very much. There are a handful of page_pool APIs that don't
yet have netmem replacements/equivalents. Thanks for taking up this
one as you look into XDP.
Reviewed-by: Mina Almasry <almasrymina@google.com>
* Re: [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
` (9 preceding siblings ...)
2024-12-03 17:37 ` [PATCH net-next v6 10/10] xdp: get rid of xdp_frame::mem.id Alexander Lobakin
@ 2024-12-06 7:10 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-12-06 7:10 UTC (permalink / raw)
To: Alexander Lobakin
Cc: ast, daniel, john.fastabend, andrii, davem, edumazet, kuba,
pabeni, toke, maciej.fijalkowski, sdf, magnus.karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 3 Dec 2024 18:37:23 +0100 you wrote:
> XDP for idpf is currently 6 chapters:
> * convert Rx to libeth;
> * convert Tx and stats to libeth;
> * generic XDP and XSk code changes (you are here);
> * generic XDP and XSk code additions;
> * actual XDP for idpf via new libeth_xdp;
> * XSk for idpf (via ^).
>
> [...]
Here is the summary with links:
- [net-next,v6,01/10] xsk: align &xdp_buff_xsk harder
https://git.kernel.org/netdev/net-next/c/ca5c94949fac
- [net-next,v6,02/10] bpf, xdp: constify some bpf_prog * function arguments
https://git.kernel.org/netdev/net-next/c/7cd1107f48e2
- [net-next,v6,03/10] xdp, xsk: constify read-only arguments of some static inline helpers
https://git.kernel.org/netdev/net-next/c/dcf3827cde86
- [net-next,v6,04/10] xdp: allow attaching already registered memory model to xdp_rxq_info
https://git.kernel.org/netdev/net-next/c/f65966fe0178
- [net-next,v6,05/10] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model()
https://git.kernel.org/netdev/net-next/c/9e25dd9d65d2
- [net-next,v6,06/10] xdp: register system page pool as an XDP memory model
https://git.kernel.org/netdev/net-next/c/e77d9aee9513
- [net-next,v6,07/10] netmem: add a couple of page helper wrappers
https://git.kernel.org/netdev/net-next/c/9bd9f72a7434
- [net-next,v6,08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems
https://git.kernel.org/netdev/net-next/c/024bfd2e9d80
- [net-next,v6,09/10] page_pool: allow mixing PPs within one bulk
(no matching commit)
- [net-next,v6,10/10] xdp: get rid of xdp_frame::mem.id
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk
2024-12-06 2:40 ` Jakub Kicinski
@ 2024-12-06 13:49 ` Alexander Lobakin
2024-12-06 16:20 ` Jakub Kicinski
0 siblings, 1 reply; 20+ messages in thread
From: Alexander Lobakin @ 2024-12-06 13:49 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, David S. Miller, Eric Dumazet, Paolo Abeni,
Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
From: Jakub Kicinski <kuba@kernel.org>
Date: Thu, 5 Dec 2024 18:40:16 -0800
> Very nice in general, I'll apply the previous 8 but I'd like to offer
> some alternatives here..
Great suggestions; I've addressed most of them already, and the function
looks much better now.
One note below.
>
> On Tue, 3 Dec 2024 18:37:32 +0100 Alexander Lobakin wrote:
>> +void page_pool_put_netmem_bulk(netmem_ref *data, u32 count)
>> {
>> - int i, bulk_len = 0;
>> - bool allow_direct;
>> - bool in_softirq;
>> + bool allow_direct, in_softirq, again = false;
>> + netmem_ref bulk[XDP_BULK_QUEUE_SIZE];
>> + u32 i, bulk_len, foreign;
>> + struct page_pool *pool;
>>
>> - allow_direct = page_pool_napi_local(pool);
>> +again:
>> + pool = NULL;
>> + bulk_len = 0;
>> + foreign = 0;
>>
>> for (i = 0; i < count; i++) {
>> - netmem_ref netmem = netmem_compound_head(data[i]);
>> + struct page_pool *netmem_pp;
>> + netmem_ref netmem;
>> +
>> + if (!again) {
>> + netmem = netmem_compound_head(data[i]);
>>
>> - /* It is not the last user for the page frag case */
>> - if (!page_pool_is_last_ref(netmem))
>> + /* It is not the last user for the page frag case */
>> + if (!page_pool_is_last_ref(netmem))
>> + continue;
>
> We check the "again" condition potentially n^2 times, is it written
> this way because we expect no mixing? Would it not be fewer cycles
> to do a first pass, convert all buffers to heads, filter out all
> non-last refs, and delete the "again" check?
>
> Minor benefit is that it removes a few of the long lines so it'd be
> feasible to drop the "goto again" as well and just turn this function
> into a while (count) loop.
>
>> + } else {
>> + netmem = data[i];
>> + }
>> +
>> + netmem_pp = netmem_get_pp(netmem);
>
> nit: netmem_pp is not a great name. Ain't nothing especially netmem
> about it, it's just the _current_ page pool.
It's the page_pool of the @netmem we're processing on this iteration.
"This netmem's PP" => netmem_pp.
The current page_pool which we'll use for recycling is @pool.
>
>> + if (unlikely(!pool)) {
>> + pool = netmem_pp;
>> + allow_direct = page_pool_napi_local(pool);
>> + } else if (netmem_pp != pool) {
>> + /*
>> + * If the netmem belongs to a different page_pool, save
>> + * it for another round after the main loop.
>> + */
>> + data[foreign++] = netmem;
>> continue;
>> + }
>>
>> netmem = __page_pool_put_page(pool, netmem, -1, allow_direct);
>> /* Approved for bulk recycling in ptr_ring cache */
>> if (netmem)
>> - data[bulk_len++] = netmem;
>> + bulk[bulk_len++] = netmem;
>> }
Thanks,
Olek
* Re: [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk
2024-12-06 13:49 ` Alexander Lobakin
@ 2024-12-06 16:20 ` Jakub Kicinski
0 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2024-12-06 16:20 UTC (permalink / raw)
To: Alexander Lobakin
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, David S. Miller, Eric Dumazet, Paolo Abeni,
Toke Høiland-Jørgensen, Maciej Fijalkowski,
Stanislav Fomichev, Magnus Karlsson,
nex.sw.ncis.osdt.itp.upstreaming, bpf, netdev, linux-kernel
On Fri, 6 Dec 2024 14:49:18 +0100 Alexander Lobakin wrote:
> > nit: netmem_pp is not a great name. Ain't nothing especially netmem
> > about it, it's just the _current_ page pool.
>
> It's the page_pool of the @netmem we're processing on this iteration.
> "This netmem's PP" => netmem_pp.
> Current page_pool which we'll use for recycling is @pool.
Heh, yes, I guess there are levels to current..ness :)
Maybe instead of current the one we're servicing could be called tgt_pp
and the one from iterator just pp? No big deal either way, tho, this is
very nit picky and subjective...
Thread overview: 20+ messages
2024-12-03 17:37 [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 01/10] xsk: align &xdp_buff_xsk harder Alexander Lobakin
2024-12-04 10:27 ` Toke Høiland-Jørgensen
2024-12-03 17:37 ` [PATCH net-next v6 02/10] bpf, xdp: constify some bpf_prog * function arguments Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 03/10] xdp, xsk: constify read-only arguments of some static inline helpers Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 04/10] xdp: allow attaching already registered memory model to xdp_rxq_info Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 05/10] xsk: allow attaching XSk pool via xdp_rxq_info_reg_mem_model() Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 06/10] xdp: register system page pool as an XDP memory model Alexander Lobakin
2024-12-03 17:37 ` [PATCH net-next v6 07/10] netmem: add a couple of page helper wrappers Alexander Lobakin
2024-12-04 10:49 ` Toke Høiland-Jørgensen
2024-12-06 4:03 ` Mina Almasry
2024-12-03 17:37 ` [PATCH net-next v6 08/10] page_pool: make page_pool_put_page_bulk() handle array of netmems Alexander Lobakin
2024-12-04 10:50 ` Toke Høiland-Jørgensen
2024-12-06 4:07 ` Mina Almasry
2024-12-03 17:37 ` [PATCH net-next v6 09/10] page_pool: allow mixing PPs within one bulk Alexander Lobakin
2024-12-06 2:40 ` Jakub Kicinski
2024-12-06 13:49 ` Alexander Lobakin
2024-12-06 16:20 ` Jakub Kicinski
2024-12-03 17:37 ` [PATCH net-next v6 10/10] xdp: get rid of xdp_frame::mem.id Alexander Lobakin
2024-12-06 7:10 ` [PATCH net-next v6 00/10] xdp: a fistful of generic changes pt. I patchwork-bot+netdevbpf