Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH v20 bpf-next 10/23] net: mvneta: enable jumbo frames if the loaded XDP program support mb
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Enable the capability to receive jumbo frames even if the interface is
running in XDP mode if the loaded program declare to properly support
xdp multi-buff. At same time reject a xdp program not supporting xdp
multi-buffer if the driver is running in xdp multi-buffer mode.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 332699960b53..98db3d03116a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3750,6 +3750,7 @@ static void mvneta_percpu_disable(void *arg)
 static int mvneta_change_mtu(struct net_device *dev, int mtu)
 {
 	struct mvneta_port *pp = netdev_priv(dev);
+	struct bpf_prog *prog = pp->xdp_prog;
 	int ret;
 
 	if (!IS_ALIGNED(MVNETA_RX_PKT_SIZE(mtu), 8)) {
@@ -3758,8 +3759,11 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
 	}
 
-	if (pp->xdp_prog && mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		netdev_info(dev, "Illegal MTU value %d for XDP mode\n", mtu);
+	if (prog && !prog->aux->xdp_mb && mtu > MVNETA_MAX_RX_BUF_SIZE) {
+		netdev_info(dev,
+			    "Illegal MTU %d for XDP prog without multi-buf\n",
+			    mtu);
+
 		return -EINVAL;
 	}
 
@@ -4428,8 +4432,9 @@ static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct mvneta_port *pp = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 
-	if (prog && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
+	if (prog && !prog->aux->xdp_mb && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "prog does not support XDP multi-buff");
 		return -EOPNOTSUPP;
 	}
 
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 11/23] bpf: introduce bpf_xdp_get_buff_len helper
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce bpf_xdp_get_buff_len helper in order to return the xdp buffer
total size (linear and paged area)

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h              | 14 ++++++++++++++
 include/uapi/linux/bpf.h       |  7 +++++++
 net/core/filter.c              | 15 +++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 4 files changed, 43 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 798b84d86d97..067716d38ebc 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -145,6 +145,20 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
 	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
 }
 
+static __always_inline unsigned int xdp_get_buff_len(struct xdp_buff *xdp)
+{
+	unsigned int len = xdp->data_end - xdp->data;
+	struct skb_shared_info *sinfo;
+
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	len += sinfo->xdp_frags_size;
+out:
+	return len;
+}
+
 struct xdp_frame {
 	void *data;
 	u16 len;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5d3e7d9ec49..d5921a0202f4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4988,6 +4988,12 @@ union bpf_attr {
  *	Return
  *		The number of loops performed, **-EINVAL** for invalid **flags**,
  *		**-E2BIG** if **nr_loops** exceeds the maximum number of loops.
+ *
+ * u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5172,6 +5178,7 @@ union bpf_attr {
 	FN(kallsyms_lookup_name),	\
 	FN(find_vma),			\
 	FN(loop),			\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index fe27c91e3758..122d3a6afe9d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3783,6 +3783,19 @@ static const struct bpf_func_proto sk_skb_change_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 	.arg3_type	= ARG_ANYTHING,
 };
+
+BPF_CALL_1(bpf_xdp_get_buff_len, struct  xdp_buff*, xdp)
+{
+	return xdp_get_buff_len(xdp);
+}
+
+static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 static unsigned long xdp_get_metalen(const struct xdp_buff *xdp)
 {
 	return xdp_data_meta_unsupported(xdp) ? 0 :
@@ -7470,6 +7483,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_redirect_map_proto;
 	case BPF_FUNC_xdp_adjust_tail:
 		return &bpf_xdp_adjust_tail_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
 	case BPF_FUNC_check_mtu:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d5d3e7d9ec49..d5921a0202f4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4988,6 +4988,12 @@ union bpf_attr {
  *	Return
  *		The number of loops performed, **-EINVAL** for invalid **flags**,
  *		**-E2BIG** if **nr_loops** exceeds the maximum number of loops.
+ *
+ * u64 bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5172,6 +5178,7 @@ union bpf_attr {
 	FN(kallsyms_lookup_name),	\
 	FN(find_vma),			\
 	FN(loop),			\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 12/23] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

From: Eelco Chaudron <echaudro@redhat.com>

This change adds support for tail growing and shrinking for XDP multi-buff.

When called on a multi-buffer packet with a grow request, it will work
on the last fragment of the packet. So the maximum grow size is the
last fragments tailroom, i.e. no new buffer will be allocated.
A XDP mb capable driver is expected to set frag_size in xdp_rxq_info data
structure to notify the XDP core the fragment size. frag_size set to 0 is
interpreted by the XDP core as tail growing is not allowed.
Introduce __xdp_rxq_info_reg utility routine to initialize frag_size field.

When shrinking, it will work from the last fragment, all the way down to
the base buffer depending on the shrinking size. It's important to mention
that once you shrink down the fragment(s) are freed, so you can not grow
again to the original size.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |  3 +-
 include/net/xdp.h                     | 16 ++++++-
 net/core/filter.c                     | 65 +++++++++++++++++++++++++++
 net/core/xdp.c                        | 12 ++---
 4 files changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 98db3d03116a..5296a17236ad 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3270,7 +3270,8 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 		return err;
 	}
 
-	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0);
+	err = __xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id, 0,
+				 PAGE_SIZE);
 	if (err < 0)
 		goto err_free_pp;
 
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 067716d38ebc..c45eaf58b3d4 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -60,6 +60,7 @@ struct xdp_rxq_info {
 	u32 reg_state;
 	struct xdp_mem_info mem;
 	unsigned int napi_id;
+	u32 frag_size;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
 struct xdp_txq_info {
@@ -304,6 +305,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
 	return xdp_frame;
 }
 
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp);
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
@@ -340,8 +343,17 @@ static inline void xdp_release_frame(struct xdp_frame *xdpf)
 	__xdp_release_frame(xdpf->data, mem);
 }
 
-int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index, unsigned int napi_id);
+int __xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+		       struct net_device *dev, u32 queue_index,
+		       unsigned int napi_id, u32 frag_size);
+static inline int
+xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+		 struct net_device *dev, u32 queue_index,
+		 unsigned int napi_id)
+{
+	return __xdp_rxq_info_reg(xdp_rxq, dev, queue_index, napi_id, 0);
+}
+
 void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq);
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq);
 bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
diff --git a/net/core/filter.c b/net/core/filter.c
index 122d3a6afe9d..d2d0f31a5498 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3830,11 +3830,76 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static int bpf_xdp_mb_increase_tail(struct xdp_buff *xdp, int offset)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
+	struct xdp_rxq_info *rxq = xdp->rxq;
+	unsigned int tailroom;
+
+	if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz)
+		return -EOPNOTSUPP;
+
+	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);
+	if (unlikely(offset > tailroom))
+		return -EINVAL;
+
+	memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset);
+	skb_frag_size_add(frag, offset);
+	sinfo->xdp_frags_size += offset;
+
+	return 0;
+}
+
+static int bpf_xdp_mb_shrink_tail(struct xdp_buff *xdp, int offset)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	int i, n_frags_free = 0, len_free = 0;
+
+	if (unlikely(offset > (int)xdp_get_buff_len(xdp) - ETH_HLEN))
+		return -EINVAL;
+
+	for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
+		skb_frag_t *frag = &sinfo->frags[i];
+		int shrink = min_t(int, offset, skb_frag_size(frag));
+
+		len_free += shrink;
+		offset -= shrink;
+
+		if (skb_frag_size(frag) == shrink) {
+			struct page *page = skb_frag_page(frag);
+
+			__xdp_return(page_address(page), &xdp->rxq->mem,
+				     false, NULL);
+			n_frags_free++;
+		} else {
+			skb_frag_size_sub(frag, shrink);
+			break;
+		}
+	}
+	sinfo->nr_frags -= n_frags_free;
+	sinfo->xdp_frags_size -= len_free;
+
+	if (unlikely(!sinfo->nr_frags)) {
+		xdp_buff_clear_mb(xdp);
+		xdp->data_end -= offset;
+	}
+
+	return 0;
+}
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
+	if (unlikely(xdp_buff_is_mb(xdp))) { /* xdp multi-buffer */
+		if (offset < 0)
+			return bpf_xdp_mb_shrink_tail(xdp, -offset);
+
+		return bpf_xdp_mb_increase_tail(xdp, offset);
+	}
+
 	/* Notice that xdp_data_hard_end have reserved some tailroom */
 	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 7cfcc93116d7..bf3b3884efb3 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -156,8 +156,9 @@ static void xdp_rxq_info_init(struct xdp_rxq_info *xdp_rxq)
 }
 
 /* Returns 0 on success, negative on failure */
-int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
-		     struct net_device *dev, u32 queue_index, unsigned int napi_id)
+int __xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
+		       struct net_device *dev, u32 queue_index,
+		       unsigned int napi_id, u32 frag_size)
 {
 	if (xdp_rxq->reg_state == REG_STATE_UNUSED) {
 		WARN(1, "Driver promised not to register this");
@@ -179,11 +180,12 @@ int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 	xdp_rxq->dev = dev;
 	xdp_rxq->queue_index = queue_index;
 	xdp_rxq->napi_id = napi_id;
+	xdp_rxq->frag_size = frag_size;
 
 	xdp_rxq->reg_state = REG_STATE_REGISTERED;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(xdp_rxq_info_reg);
+EXPORT_SYMBOL_GPL(__xdp_rxq_info_reg);
 
 void xdp_rxq_info_unused(struct xdp_rxq_info *xdp_rxq)
 {
@@ -337,8 +339,8 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
  * is used for those calls sites.  Thus, allowing for faster recycling
  * of xdp_frames/pages in those cases.
  */
-static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
-			 struct xdp_buff *xdp)
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 13/23] bpf: add multi-buffer support to xdp copy helpers
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

From: Eelco Chaudron <echaudro@redhat.com>

This patch adds support for multi-buffer for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 kernel/trace/bpf_trace.c                      |   3 +
 net/core/filter.c                             |  57 ++++++-
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 151 +++++++++++++-----
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 4 files changed, 168 insertions(+), 45 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 77f13de6f9f9..12f7b6ac9045 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1517,6 +1517,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
 
 extern const struct bpf_func_proto bpf_skb_output_proto;
 extern const struct bpf_func_proto bpf_xdp_output_proto;
+extern const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto;
 
 BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
 	   struct bpf_map *, map, u64, flags)
@@ -1616,6 +1617,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sock_from_file_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_ptr_cookie_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_trace_proto;
 #endif
 	case BPF_FUNC_seq_printf:
 		return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index d2d0f31a5498..a5ca054023d0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3796,6 +3796,15 @@ static const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff)
+
+const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_xdp_get_buff_len_bpf_ids[0],
+};
+
 static unsigned long xdp_get_metalen(const struct xdp_buff *xdp)
 {
 	return xdp_data_meta_unsupported(xdp) ? 0 :
@@ -4615,10 +4624,48 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
 };
 #endif
 
-static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
+static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
 				  unsigned long off, unsigned long len)
 {
-	memcpy(dst_buff, src_buff + off, len);
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	unsigned long ptr_len, ptr_off = 0;
+	skb_frag_t *next_frag, *end_frag;
+	struct skb_shared_info *sinfo;
+	u8 *ptr_buf;
+
+	if (likely(xdp->data_end - xdp->data >= off + len)) {
+		memcpy(dst_buff, xdp->data + off, len);
+		return 0;
+	}
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	end_frag = &sinfo->frags[sinfo->nr_frags];
+	next_frag = &sinfo->frags[0];
+
+	ptr_len = xdp->data_end - xdp->data;
+	ptr_buf = xdp->data;
+
+	while (true) {
+		if (off < ptr_off + ptr_len) {
+			unsigned long copy_off = off - ptr_off;
+			unsigned long copy_len = min(len, ptr_len - copy_off);
+
+			memcpy(dst_buff, ptr_buf + copy_off, copy_len);
+
+			off += copy_len;
+			len -= copy_len;
+			dst_buff += copy_len;
+		}
+
+		if (!len || next_frag == end_frag)
+			break;
+
+		ptr_off += ptr_len;
+		ptr_buf = skb_frag_address(next_frag);
+		ptr_len = skb_frag_size(next_frag);
+		next_frag++;
+	}
+
 	return 0;
 }
 
@@ -4629,11 +4676,11 @@ BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map,
 
 	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
 		return -EINVAL;
-	if (unlikely(!xdp ||
-		     xdp_size > (unsigned long)(xdp->data_end - xdp->data)))
+
+	if (unlikely(!xdp || xdp_size > xdp_get_buff_len(xdp)))
 		return -EFAULT;
 
-	return bpf_event_output(map, flags, meta, meta_size, xdp->data,
+	return bpf_event_output(map, flags, meta, meta_size, xdp,
 				xdp_size, bpf_xdp_copy);
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
index c98a897ad692..c5cff4f2d9de 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
@@ -10,11 +10,20 @@ struct meta {
 	int pkt_len;
 };
 
+struct test_ctx_s {
+	bool passed;
+	int pkt_size;
+};
+
+struct test_ctx_s test_ctx;
+
 static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 {
-	int duration = 0;
 	struct meta *meta = (struct meta *)data;
 	struct ipv4_packet *trace_pkt_v4 = data + sizeof(*meta);
+	unsigned char *raw_pkt = data + sizeof(*meta);
+	struct test_ctx_s *tst_ctx = ctx;
+	int duration = 0;
 
 	if (CHECK(size < sizeof(pkt_v4) + sizeof(*meta),
 		  "check_size", "size %u < %zu\n",
@@ -25,25 +34,114 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 		  "meta->ifindex = %d\n", meta->ifindex))
 		return;
 
-	if (CHECK(meta->pkt_len != sizeof(pkt_v4), "check_meta_pkt_len",
-		  "meta->pkt_len = %zd\n", sizeof(pkt_v4)))
+	if (CHECK(meta->pkt_len != tst_ctx->pkt_size, "check_meta_pkt_len",
+		  "meta->pkt_len = %d\n", tst_ctx->pkt_size))
 		return;
 
 	if (CHECK(memcmp(trace_pkt_v4, &pkt_v4, sizeof(pkt_v4)),
 		  "check_packet_content", "content not the same\n"))
 		return;
 
-	*(bool *)ctx = true;
+	if (meta->pkt_len > sizeof(pkt_v4)) {
+		for (int i = 0; i < (meta->pkt_len - sizeof(pkt_v4)); i++) {
+			if (raw_pkt[i + sizeof(pkt_v4)] != (unsigned char)i) {
+				CHECK(true, "check_packet_content",
+				      "byte %zu does not match %u != %u\n",
+				      i + sizeof(pkt_v4),
+				      raw_pkt[i + sizeof(pkt_v4)],
+				      (unsigned char)i);
+				break;
+			}
+		}
+	}
+
+	tst_ctx->passed = true;
 }
 
-void test_xdp_bpf2bpf(void)
+#define BUF_SZ	9000
+
+static int run_xdp_bpf2bpf_pkt_size(int pkt_fd, struct perf_buffer *pb,
+				    struct test_xdp_bpf2bpf *ftrace_skel,
+				    int pkt_size)
 {
 	__u32 duration = 0, retval, size;
-	char buf[128];
+	__u8 *buf, *buf_in;
+	int err, ret = 0;
+
+	if (pkt_size > BUF_SZ || pkt_size < sizeof(pkt_v4))
+		return -EINVAL;
+
+	buf_in = malloc(BUF_SZ);
+	if (CHECK(!buf_in, "buf_in malloc()", "error:%s\n", strerror(errno)))
+		return -ENOMEM;
+
+	buf = malloc(BUF_SZ);
+	if (CHECK(!buf, "buf malloc()", "error:%s\n", strerror(errno))) {
+		ret = -ENOMEM;
+		goto free_buf_in;
+	}
+
+	test_ctx.passed = false;
+	test_ctx.pkt_size = pkt_size;
+
+	memcpy(buf_in, &pkt_v4, sizeof(pkt_v4));
+	if (pkt_size > sizeof(pkt_v4)) {
+		for (int i = 0; i < (pkt_size - sizeof(pkt_v4)); i++)
+			buf_in[i + sizeof(pkt_v4)] = i;
+	}
+
+	/* Run test program */
+	err = bpf_prog_test_run(pkt_fd, 1, buf_in, pkt_size,
+				buf, &size, &retval, &duration);
+
+	if (CHECK(err || retval != XDP_PASS || size != pkt_size,
+		  "ipv4", "err %d errno %d retval %d size %d\n",
+		  err, errno, retval, size)) {
+		ret = err ? err : -EINVAL;
+		goto free_buf;
+	}
+
+	/* Make sure bpf_xdp_output() was triggered and it sent the expected
+	 * data to the perf ring buffer.
+	 */
+	err = perf_buffer__poll(pb, 100);
+	if (CHECK(err <= 0, "perf_buffer__poll", "err %d\n", err)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	if (CHECK_FAIL(!test_ctx.passed)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	/* Verify test results */
+	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
+		  "result", "fentry failed err %llu\n",
+		  ftrace_skel->bss->test_result_fentry)) {
+		ret = -EINVAL;
+		goto free_buf;
+	}
+
+	if (CHECK(ftrace_skel->bss->test_result_fexit != XDP_PASS, "result",
+		  "fexit failed err %llu\n",
+		  ftrace_skel->bss->test_result_fexit))
+		ret = -EINVAL;
+
+free_buf:
+	free(buf);
+free_buf_in:
+	free(buf_in);
+
+	return ret;
+}
+
+void test_xdp_bpf2bpf(void)
+{
 	int err, pkt_fd, map_fd;
-	bool passed = false;
-	struct iphdr iph;
-	struct iptnl_info value4 = {.family = AF_INET};
+	__u32 duration = 0;
+	int pkt_sizes[] = {sizeof(pkt_v4), 1024, 4100, 8200};
+	struct iptnl_info value4 = {.family = AF_INET6};
 	struct test_xdp *pkt_skel = NULL;
 	struct test_xdp_bpf2bpf *ftrace_skel = NULL;
 	struct vip key4 = {.protocol = 6, .family = AF_INET};
@@ -85,39 +183,14 @@ void test_xdp_bpf2bpf(void)
 		goto out;
 
 	/* Set up perf buffer */
-	pb = perf_buffer__new(bpf_map__fd(ftrace_skel->maps.perf_buf_map), 1,
-			      on_sample, NULL, &passed, NULL);
+	pb = perf_buffer__new(bpf_map__fd(ftrace_skel->maps.perf_buf_map), 8,
+			      on_sample, NULL, &test_ctx, NULL);
 	if (!ASSERT_OK_PTR(pb, "perf_buf__new"))
 		goto out;
 
-	/* Run test program */
-	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v4, sizeof(pkt_v4),
-				buf, &size, &retval, &duration);
-	memcpy(&iph, buf + sizeof(struct ethhdr), sizeof(iph));
-	if (CHECK(err || retval != XDP_TX || size != 74 ||
-		  iph.protocol != IPPROTO_IPIP, "ipv4",
-		  "err %d errno %d retval %d size %d\n",
-		  err, errno, retval, size))
-		goto out;
-
-	/* Make sure bpf_xdp_output() was triggered and it sent the expected
-	 * data to the perf ring buffer.
-	 */
-	err = perf_buffer__poll(pb, 100);
-	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
-		goto out;
-
-	CHECK_FAIL(!passed);
-
-	/* Verify test results */
-	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
-		  "result", "fentry failed err %llu\n",
-		  ftrace_skel->bss->test_result_fentry))
-		goto out;
-
-	CHECK(ftrace_skel->bss->test_result_fexit != XDP_TX, "result",
-	      "fexit failed err %llu\n", ftrace_skel->bss->test_result_fexit);
-
+	for (int i = 0; i < ARRAY_SIZE(pkt_sizes); i++)
+		run_xdp_bpf2bpf_pkt_size(pkt_fd, pb, ftrace_skel,
+					 pkt_sizes[i]);
 out:
 	if (pb)
 		perf_buffer__free(pb);
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
index 58cf4345f5cc..3379d303f41a 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
@@ -49,7 +49,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
 	void *data = (void *)(long)xdp->data;
 
 	meta.ifindex = xdp->rxq->dev->ifindex;
-	meta.pkt_len = data_end - data;
+	meta.pkt_len = bpf_xdp_get_buff_len((struct xdp_md *)xdp);
 	bpf_xdp_output(xdp, &perf_buf_map,
 		       ((__u64) meta.pkt_len << 32) |
 		       BPF_F_CURRENT_CPU,
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 14/23] bpf: move user_size out of bpf_test_init
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Rely on data_size_in in bpf_test_init routine signature. This is a
preliminary patch to introduce xdp multi-buff selftest

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 46dd95755967..dbf1227f437c 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -249,11 +249,10 @@ bool bpf_prog_test_check_kfunc_call(u32 kfunc_id, struct module *owner)
 	return bpf_check_mod_kfunc_call(&prog_test_kfunc_list, kfunc_id, owner);
 }
 
-static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
-			   u32 headroom, u32 tailroom)
+static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
+			   u32 size, u32 headroom, u32 tailroom)
 {
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
-	u32 user_size = kattr->test.data_size_in;
 	void *data;
 
 	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
@@ -581,7 +580,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (kattr->test.flags || kattr->test.cpu)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     size, NET_SKB_PAD + NET_IP_ALIGN,
 			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 	if (IS_ERR(data))
 		return PTR_ERR(data);
@@ -790,7 +790,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
 
-	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     max_data_sz, headroom, tailroom);
 	if (IS_ERR(data)) {
 		ret = PTR_ERR(data);
 		goto free_ctx;
@@ -876,7 +877,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (size < ETH_HLEN)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, 0, 0);
+	data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 15/23] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce the capability to allocate a xdp multi-buff in
bpf_prog_test_run_xdp routine. This is a preliminary patch to introduce
the selftests for new xdp multi-buff ebpf helpers

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 58 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 13 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index dbf1227f437c..9cb5d5eced9a 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -758,16 +758,16 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
+	u32 headroom = XDP_PACKET_HEADROOM;
+	u32 retval, duration, max_data_sz;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
+	struct skb_shared_info *sinfo;
 	struct xdp_buff xdp = {};
-	u32 retval, duration;
+	int i, ret = -EINVAL;
 	struct xdp_md *ctx;
-	u32 max_data_sz;
 	void *data;
-	int ret = -EINVAL;
 
 	if (prog->expected_attach_type == BPF_XDP_DEVMAP ||
 	    prog->expected_attach_type == BPF_XDP_CPUMAP)
@@ -787,27 +787,60 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		headroom -= ctx->data;
 	}
 
-	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
+	size = min_t(u32, size, max_data_sz);
 
-	data = bpf_test_init(kattr, kattr->test.data_size_in,
-			     max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data)) {
 		ret = PTR_ERR(data);
 		goto free_ctx;
 	}
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
-	xdp_init_buff(&xdp, headroom + max_data_sz + tailroom,
-		      &rxqueue->xdp_rxq);
+	rxqueue->xdp_rxq.frag_size = headroom + max_data_sz + tailroom;
+	xdp_init_buff(&xdp, rxqueue->xdp_rxq.frag_size, &rxqueue->xdp_rxq);
 	xdp_prepare_buff(&xdp, data, headroom, size, true);
+	sinfo = xdp_get_shared_info_from_buff(&xdp);
 
 	ret = xdp_convert_md_to_buff(ctx, &xdp);
 	if (ret)
 		goto free_data;
 
+	if (unlikely(kattr->test.data_size_in > size)) {
+		void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+
+		while (size < kattr->test.data_size_in) {
+			struct page *page;
+			skb_frag_t *frag;
+			int data_len;
+
+			page = alloc_page(GFP_KERNEL);
+			if (!page) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			frag = &sinfo->frags[sinfo->nr_frags++];
+			__skb_frag_set_page(frag, page);
+
+			data_len = min_t(int, kattr->test.data_size_in - size,
+					 PAGE_SIZE);
+			skb_frag_size_set(frag, data_len);
+
+			if (copy_from_user(page_address(page), data_in + size,
+					   data_len)) {
+				ret = -EFAULT;
+				goto out;
+			}
+			sinfo->xdp_frags_size += data_len;
+			size += data_len;
+		}
+		xdp_buff_set_mb(&xdp);
+	}
+
 	if (repeat > 1)
 		bpf_prog_change_xdp(NULL, prog);
+
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	/* We convert the xdp_buff back to an xdp_md before checking the return
 	 * code so the reference count of any held netdevice will be decremented
@@ -817,10 +850,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (ret)
 		goto out;
 
-	if (xdp.data_meta != data + headroom ||
-	    xdp.data_end != xdp.data_meta + size)
-		size = xdp.data_end - xdp.data_meta;
-
+	size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
 	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval,
 			      duration);
 	if (!ret)
@@ -831,6 +861,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (repeat > 1)
 		bpf_prog_change_xdp(prog, NULL);
 free_data:
+	for (i = 0; i < sinfo->nr_frags; i++)
+		__free_page(skb_frag_page(&sinfo->frags[i]));
 	kfree(data);
 free_ctx:
 	kfree(ctx);
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 16/23] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

introduce xdp_shared_info pointer in bpf_test_finish signature in order
to copy back paged data from a xdp multi-buff frame to userspace buffer

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 48 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 9cb5d5eced9a..e57c71085dd5 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -130,7 +130,8 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 
 static int bpf_test_finish(const union bpf_attr *kattr,
 			   union bpf_attr __user *uattr, const void *data,
-			   u32 size, u32 retval, u32 duration)
+			   struct skb_shared_info *sinfo, u32 size,
+			   u32 retval, u32 duration)
 {
 	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
 	int err = -EFAULT;
@@ -145,8 +146,36 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 		err = -ENOSPC;
 	}
 
-	if (data_out && copy_to_user(data_out, data, copy_size))
-		goto out;
+	if (data_out) {
+		int len = sinfo ? copy_size - sinfo->xdp_frags_size : copy_size;
+
+		if (copy_to_user(data_out, data, len))
+			goto out;
+
+		if (sinfo) {
+			int i, offset = len, data_len;
+
+			for (i = 0; i < sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &sinfo->frags[i];
+
+				if (offset >= copy_size) {
+					err = -ENOSPC;
+					break;
+				}
+
+				data_len = min_t(int, copy_size - offset,
+						 skb_frag_size(frag));
+
+				if (copy_to_user(data_out + offset,
+						 skb_frag_address(frag),
+						 data_len))
+					goto out;
+
+				offset += data_len;
+			}
+		}
+	}
+
 	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
 		goto out;
 	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
@@ -683,7 +712,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* bpf program can never convert linear skb to non-linear */
 	if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
 		size = skb_headlen(skb);
-	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval,
+			      duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct __sk_buff));
@@ -851,8 +881,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		goto out;
 
 	size = xdp.data_end - xdp.data_meta + sinfo->xdp_frags_size;
-	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, size, retval,
-			      duration);
+	ret = bpf_test_finish(kattr, uattr, xdp.data_meta, sinfo, size,
+			      retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct xdp_md));
@@ -944,8 +974,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (ret < 0)
 		goto out;
 
-	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
-			      retval, duration);
+	ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL,
+			      sizeof(flow_keys), retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
 				     sizeof(struct bpf_flow_keys));
@@ -1049,7 +1079,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
 		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
 	}
 
-	ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
 
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 17/23] bpf: selftests: update xdp_adjust_tail selftest to include multi-buffer
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

From: Eelco Chaudron <echaudro@redhat.com>

This change adds test cases for the multi-buffer scenarios when shrinking
and growing.

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 .../bpf/prog_tests/xdp_adjust_tail.c          | 131 ++++++++++++++++++
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++++-
 3 files changed, 166 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 3f5a17c38be5..d21e0a597b3c 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -130,6 +130,133 @@ static void test_xdp_adjust_tail_grow2(void)
 	bpf_object__close(obj);
 }
 
+void test_xdp_adjust_mb_tail_shrink(void)
+{
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	int err, prog_fd;
+	__u8 *buf;
+
+	/* For the individual test cases, the first byte in the packet
+	 * indicates which test will be run.
+	 */
+	obj = bpf_object__open(file);
+	if (libbpf_get_error(obj))
+		return;
+
+	prog = bpf_object__next_program(obj, NULL);
+	if (bpf_object__load(obj))
+		return;
+
+	prog_fd = bpf_program__fd(prog);
+
+	buf = malloc(9000);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		goto out;
+
+	memset(buf, 0, 9000);
+
+	/* Test case removing 10 bytes from last frag, NOT freeing it */
+	exp_size = 8990; /* 9000 - 10 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-10b", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing one of two pages, assuming 4K pages */
+	buf[0] = 1;
+	exp_size = 4900; /* 9000 - 4100 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-1p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing two pages resulting in a non mb xdp_buff */
+	buf[0] = 2;
+	exp_size = 800; /* 9000 - 8200 */
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-2p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	free(buf);
+out:
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_mb_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	int err, i, prog_fd;
+	__u8 *buf;
+
+	obj = bpf_object__open(file);
+	if (libbpf_get_error(obj))
+		return;
+
+	prog = bpf_object__next_program(obj, NULL);
+	if (bpf_object__load(obj))
+		return;
+
+	prog_fd = bpf_program__fd(prog);
+
+	buf = malloc(16384);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		goto out;
+
+	/* Test case add 10 bytes to last frag */
+	memset(buf, 1, 16384);
+	size = 9000;
+	exp_size = size + 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_TX, size, exp_size);
+
+	for (i = 0; i < 9000; i++)
+		CHECK(buf[i] != 1, "9k+10b-old",
+		      "Old data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9000; i < 9010; i++)
+		CHECK(buf[i] != 0, "9k+10b-new",
+		      "New data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9010; i < 16384; i++)
+		CHECK(buf[i] != 1, "9k+10b-untouched",
+		      "Unused data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	/* Test a too large grow */
+	memset(buf, 1, 16384);
+	size = 9001;
+	exp_size = size;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_DROP || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_TX, size, exp_size);
+
+	free(buf);
+out:
+	bpf_object__close(obj);
+}
+
 void test_xdp_adjust_tail(void)
 {
 	if (test__start_subtest("xdp_adjust_tail_shrink"))
@@ -138,4 +265,8 @@ void test_xdp_adjust_tail(void)
 		test_xdp_adjust_tail_grow();
 	if (test__start_subtest("xdp_adjust_tail_grow2"))
 		test_xdp_adjust_tail_grow2();
+	if (test__start_subtest("xdp_adjust_mb_tail_shrink"))
+		test_xdp_adjust_mb_tail_shrink();
+	if (test__start_subtest("xdp_adjust_mb_tail_grow"))
+		test_xdp_adjust_mb_tail_grow();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
index 199c61b7d062..53b64c999450 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -7,11 +7,10 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 {
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
-	unsigned int data_len;
+	int data_len = bpf_xdp_get_buff_len(xdp);
 	int offset = 0;
 
 	/* Data length determine test case */
-	data_len = data_end - data;
 
 	if (data_len == 54) { /* sizeof(pkt_v4) */
 		offset = 4096; /* test too large offset */
@@ -20,7 +19,12 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 	} else if (data_len == 64) {
 		offset = 128;
 	} else if (data_len == 128) {
-		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+		/* Max tail grow 3520 */
+		offset = 4096 - 256 - 320 - data_len;
+	} else if (data_len == 9000) {
+		offset = 10;
+	} else if (data_len == 9001) {
+		offset = 4096;
 	} else {
 		return XDP_ABORTED; /* No matching test */
 	}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
index b7448253d135..eeff48997b6e 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -12,14 +12,38 @@
 SEC("xdp")
 int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
 {
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
 	int offset = 0;
 
-	if (data_end - data == 54) /* sizeof(pkt_v4) */
+	switch (bpf_xdp_get_buff_len(xdp)) {
+	case 54:
+		/* sizeof(pkt_v4) */
 		offset = 256; /* shrink too much */
-	else
+		break;
+	case 9000:
+		/* Multi-buffer test cases */
+		if (data + 1 > data_end)
+			return XDP_DROP;
+
+		switch (data[0]) {
+		case 0:
+			offset = 10;
+			break;
+		case 1:
+			offset = 4100;
+			break;
+		case 2:
+			offset = 8200;
+			break;
+		default:
+			return XDP_DROP;
+		}
+		break;
+	default:
 		offset = 20;
+		break;
+	}
 	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
 		return XDP_DROP;
 	return XDP_TX;
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 18/23] libbpf: Add SEC name for xdp_mb programs
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce support for the following SEC entries for XDP multi-buff
property:
- SEC("xdp_mb/")
- SEC("xdp_devmap_mb/")
- SEC("xdp_cpumap_mb/")

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 tools/lib/bpf/libbpf.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d027e1d620fc..71781e4516b7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -271,6 +271,8 @@ enum sec_def_flags {
 	SEC_SLEEPABLE = 8,
 	/* allow non-strict prefix matching */
 	SEC_SLOPPY_PFX = 16,
+	/* BPF program support XDP multi-buff */
+	SEC_XDP_MB = 32,
 };
 
 struct bpf_sec_def {
@@ -6565,6 +6567,9 @@ static int libbpf_preload_prog(struct bpf_program *prog,
 	if (def & SEC_SLEEPABLE)
 		opts->prog_flags |= BPF_F_SLEEPABLE;
 
+	if (prog->type == BPF_PROG_TYPE_XDP && (def & SEC_XDP_MB))
+		opts->prog_flags |= BPF_F_XDP_MB;
+
 	if ((prog->type == BPF_PROG_TYPE_TRACING ||
 	     prog->type == BPF_PROG_TYPE_LSM ||
 	     prog->type == BPF_PROG_TYPE_EXT) && !prog->attach_btf_id) {
@@ -8603,8 +8608,11 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("lsm.s/",		LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
 	SEC_DEF("iter/",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter),
 	SEC_DEF("syscall",		SYSCALL, 0, SEC_SLEEPABLE),
+	SEC_DEF("xdp_devmap_mb/",	XDP, BPF_XDP_DEVMAP, SEC_ATTACHABLE | SEC_XDP_MB),
 	SEC_DEF("xdp_devmap/",		XDP, BPF_XDP_DEVMAP, SEC_ATTACHABLE),
+	SEC_DEF("xdp_cpumap_mb/",	XDP, BPF_XDP_CPUMAP, SEC_ATTACHABLE | SEC_XDP_MB),
 	SEC_DEF("xdp_cpumap/",		XDP, BPF_XDP_CPUMAP, SEC_ATTACHABLE),
+	SEC_DEF("xdp_mb/",		XDP, BPF_XDP, SEC_ATTACHABLE_OPT | SEC_SLOPPY_PFX | SEC_XDP_MB),
 	SEC_DEF("xdp",			XDP, BPF_XDP, SEC_ATTACHABLE_OPT | SEC_SLOPPY_PFX),
 	SEC_DEF("perf_event",		PERF_EVENT, 0, SEC_NONE | SEC_SLOPPY_PFX),
 	SEC_DEF("lwt_in",		LWT_IN, 0, SEC_NONE | SEC_SLOPPY_PFX),
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 19/23] bpf: generalise tail call map compatibility check
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

From: Toke Hoiland-Jorgensen <toke@redhat.com>

The check for tail call map compatibility ensures that tail calls only
happen between maps of the same type. To ensure backwards compatibility for
XDP multi-buffer we need a similar type of check for cpumap and devmap
programs, so move the state from bpf_array_aux into bpf_map, add xdp_mb to
the check, and apply the same check to cpumap and devmap.

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Toke Hoiland-Jorgensen <toke@redhat.com>
---
 include/linux/bpf.h   | 31 ++++++++++++++++++++-----------
 kernel/bpf/arraymap.c |  4 +---
 kernel/bpf/core.c     | 28 ++++++++++++++--------------
 kernel/bpf/cpumap.c   |  8 +++++---
 kernel/bpf/devmap.c   |  3 ++-
 kernel/bpf/syscall.c  | 21 +++++++++++----------
 6 files changed, 53 insertions(+), 42 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e516815e35f9..394e33c45513 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -194,6 +194,18 @@ struct bpf_map {
 	struct work_struct work;
 	struct mutex freeze_mutex;
 	u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
+
+	/* 'Ownership' of program-containing map is claimed by the first program
+	 * that is going to use this map or by the first program which FD is
+	 * stored in the map to make sure that all callers and callees have the
+	 * same prog type, JITed flag and xdp_mb flag.
+	 */
+	struct {
+		spinlock_t lock;
+		enum bpf_prog_type type;
+		bool jited;
+		bool xdp_mb;
+	} owner;
 };
 
 static inline bool map_value_has_spin_lock(const struct bpf_map *map)
@@ -936,16 +948,6 @@ struct bpf_prog_aux {
 };
 
 struct bpf_array_aux {
-	/* 'Ownership' of prog array is claimed by the first program that
-	 * is going to use this map or by the first program which FD is
-	 * stored in the map to make sure that all callers and callees have
-	 * the same prog type and JITed flag.
-	 */
-	struct {
-		spinlock_t lock;
-		enum bpf_prog_type type;
-		bool jited;
-	} owner;
 	/* Programs with direct jumps into programs part of this array. */
 	struct list_head poke_progs;
 	struct bpf_map *map;
@@ -1120,7 +1122,14 @@ struct bpf_event_entry {
 	struct rcu_head rcu;
 };
 
-bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
+static inline bool map_type_contains_progs(struct bpf_map *map)
+{
+	return map->map_type == BPF_MAP_TYPE_PROG_ARRAY ||
+	       map->map_type == BPF_MAP_TYPE_DEVMAP ||
+	       map->map_type == BPF_MAP_TYPE_CPUMAP;
+}
+
+bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp);
 int bpf_prog_calc_tag(struct bpf_prog *fp);
 
 const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index c7a5be3bf8be..7f145aefbff8 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -837,13 +837,12 @@ static int fd_array_map_delete_elem(struct bpf_map *map, void *key)
 static void *prog_fd_array_get_ptr(struct bpf_map *map,
 				   struct file *map_file, int fd)
 {
-	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	struct bpf_prog *prog = bpf_prog_get(fd);
 
 	if (IS_ERR(prog))
 		return prog;
 
-	if (!bpf_prog_array_compatible(array, prog)) {
+	if (!bpf_prog_map_compatible(map, prog)) {
 		bpf_prog_put(prog);
 		return ERR_PTR(-EINVAL);
 	}
@@ -1071,7 +1070,6 @@ static struct bpf_map *prog_array_map_alloc(union bpf_attr *attr)
 	INIT_WORK(&aux->work, prog_array_map_clear_deferred);
 	INIT_LIST_HEAD(&aux->poke_progs);
 	mutex_init(&aux->poke_mutex);
-	spin_lock_init(&aux->owner.lock);
 
 	map = array_map_alloc(attr);
 	if (IS_ERR(map)) {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index de3e5bc6781f..4a8beab6071e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1829,28 +1829,30 @@ static unsigned int __bpf_prog_ret0_warn(const void *ctx,
 }
 #endif
 
-bool bpf_prog_array_compatible(struct bpf_array *array,
-			       const struct bpf_prog *fp)
+bool bpf_prog_map_compatible(struct bpf_map *map,
+			     const struct bpf_prog *fp)
 {
 	bool ret;
 
 	if (fp->kprobe_override)
 		return false;
 
-	spin_lock(&array->aux->owner.lock);
-
-	if (!array->aux->owner.type) {
+	spin_lock(&map->owner.lock);
+	if (!map->owner.type) {
 		/* There's no owner yet where we could check for
 		 * compatibility.
 		 */
-		array->aux->owner.type  = fp->type;
-		array->aux->owner.jited = fp->jited;
+		map->owner.type  = fp->type;
+		map->owner.jited = fp->jited;
+		map->owner.xdp_mb = fp->aux->xdp_mb;
 		ret = true;
 	} else {
-		ret = array->aux->owner.type  == fp->type &&
-		      array->aux->owner.jited == fp->jited;
+		ret = map->owner.type  == fp->type &&
+		      map->owner.jited == fp->jited &&
+		      map->owner.xdp_mb == fp->aux->xdp_mb;
 	}
-	spin_unlock(&array->aux->owner.lock);
+	spin_unlock(&map->owner.lock);
+
 	return ret;
 }
 
@@ -1862,13 +1864,11 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
 	mutex_lock(&aux->used_maps_mutex);
 	for (i = 0; i < aux->used_map_cnt; i++) {
 		struct bpf_map *map = aux->used_maps[i];
-		struct bpf_array *array;
 
-		if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
+		if (!map_type_contains_progs(map))
 			continue;
 
-		array = container_of(map, struct bpf_array, map);
-		if (!bpf_prog_array_compatible(array, fp)) {
+		if (!bpf_prog_map_compatible(map, fp)) {
 			ret = -EINVAL;
 			goto out;
 		}
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 585b2b77ccc4..7f9984e7ba1d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -397,7 +397,8 @@ static int cpu_map_kthread_run(void *data)
 	return 0;
 }
 
-static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu, int fd)
+static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu,
+				      struct bpf_map *map, int fd)
 {
 	struct bpf_prog *prog;
 
@@ -405,7 +406,8 @@ static int __cpu_map_load_bpf_program(struct bpf_cpu_map_entry *rcpu, int fd)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
-	if (prog->expected_attach_type != BPF_XDP_CPUMAP) {
+	if (prog->expected_attach_type != BPF_XDP_CPUMAP ||
+	    !bpf_prog_map_compatible(map, prog)) {
 		bpf_prog_put(prog);
 		return -EINVAL;
 	}
@@ -457,7 +459,7 @@ __cpu_map_entry_alloc(struct bpf_map *map, struct bpf_cpumap_val *value,
 	rcpu->map_id = map->id;
 	rcpu->value.qsize  = value->qsize;
 
-	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, fd))
+	if (fd > 0 && __cpu_map_load_bpf_program(rcpu, map, fd))
 		goto free_ptr_ring;
 
 	/* Setup kthread */
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f02d04540c0c..a35edd4a2bf1 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -868,7 +868,8 @@ static struct bpf_dtab_netdev *__dev_map_alloc_node(struct net *net,
 					     BPF_PROG_TYPE_XDP, false);
 		if (IS_ERR(prog))
 			goto err_put_dev;
-		if (prog->expected_attach_type != BPF_XDP_DEVMAP)
+		if (prog->expected_attach_type != BPF_XDP_DEVMAP ||
+		    !bpf_prog_map_compatible(&dtab->map, prog))
 			goto err_put_prog;
 	}
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 82626a95be99..0c45d3e9eda3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -540,16 +540,15 @@ static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
 
 static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 {
-	const struct bpf_map *map = filp->private_data;
-	const struct bpf_array *array;
-	u32 type = 0, jited = 0;
-
-	if (map->map_type == BPF_MAP_TYPE_PROG_ARRAY) {
-		array = container_of(map, struct bpf_array, map);
-		spin_lock(&array->aux->owner.lock);
-		type  = array->aux->owner.type;
-		jited = array->aux->owner.jited;
-		spin_unlock(&array->aux->owner.lock);
+	struct bpf_map *map = filp->private_data;
+	u32 type = 0, jited = 0, xdp_mb = 0;
+
+	if (map_type_contains_progs(map)) {
+		spin_lock(&map->owner.lock);
+		type  = map->owner.type;
+		jited = map->owner.jited;
+		xdp_mb = map->owner.xdp_mb;
+		spin_unlock(&map->owner.lock);
 	}
 
 	seq_printf(m,
@@ -574,6 +573,7 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 	if (type) {
 		seq_printf(m, "owner_prog_type:\t%u\n", type);
 		seq_printf(m, "owner_jited:\t%u\n", jited);
+		seq_printf(m, "owner_xdp_mb:\t%u\n", xdp_mb);
 	}
 }
 #endif
@@ -864,6 +864,7 @@ static int map_create(union bpf_attr *attr)
 	atomic64_set(&map->refcnt, 1);
 	atomic64_set(&map->usercnt, 1);
 	mutex_init(&map->freeze_mutex);
+	spin_lock_init(&map->owner.lock);
 
 	map->spin_lock_off = -EINVAL;
 	map->timer_off = -EINVAL;
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 20/23] net: xdp: introduce bpf_xdp_pointer utility routine
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Similar to skb_header_pointer, introduce bpf_xdp_pointer utility routine
to return a pointer to a given position in the xdp_buff if the requested
area (offset + len) is contained in a contiguous memory area otherwise it
will be copied in a bounce buffer provided by the caller.
Similar to the tc counterpart, introduce the two following xdp helpers:
- bpf_xdp_load_bytes
- bpf_xdp_store_bytes

Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/uapi/linux/bpf.h       |  18 ++++
 net/core/filter.c              | 176 ++++++++++++++++++++++++++-------
 tools/include/uapi/linux/bpf.h |  18 ++++
 3 files changed, 174 insertions(+), 38 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d5921a0202f4..94e43088971d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4994,6 +4994,22 @@ union bpf_attr {
  *		Get the total size of a given xdp buff (linear and paged area)
  *	Return
  *		The total size of a given xdp buffer.
+ *
+ * long bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf, u32 len)
+ *	Description
+ *		This helper is provided as an easy way to load data from a
+ *		xdp buffer. It can be used to load *len* bytes from *offset* from
+ *		the frame associated to *xdp_md*, into the buffer pointed by
+ *		*buf*.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * long bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf, u32 len)
+ *	Description
+ *		Store *len* bytes from buffer *buf* into the frame
+ *		associated to *xdp_md*, at *offset*.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5179,6 +5195,8 @@ union bpf_attr {
 	FN(find_vma),			\
 	FN(loop),			\
 	FN(xdp_get_buff_len),		\
+	FN(xdp_load_bytes),		\
+	FN(xdp_store_bytes),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index a5ca054023d0..14860931733d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3839,6 +3839,138 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
+			     void *buf, unsigned long len, bool flush)
+{
+	unsigned long ptr_len, ptr_off = 0;
+	skb_frag_t *next_frag, *end_frag;
+	struct skb_shared_info *sinfo;
+	void *src, *dst;
+	u8 *ptr_buf;
+
+	if (likely(xdp->data_end - xdp->data >= off + len)) {
+		src = flush ? buf : xdp->data + off;
+		dst = flush ? xdp->data + off : buf;
+		memcpy(dst, src, len);
+		return;
+	}
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	end_frag = &sinfo->frags[sinfo->nr_frags];
+	next_frag = &sinfo->frags[0];
+
+	ptr_len = xdp->data_end - xdp->data;
+	ptr_buf = xdp->data;
+
+	while (true) {
+		if (off < ptr_off + ptr_len) {
+			unsigned long copy_off = off - ptr_off;
+			unsigned long copy_len = min(len, ptr_len - copy_off);
+
+			src = flush ? buf : ptr_buf + copy_off;
+			dst = flush ? ptr_buf + copy_off : buf;
+			memcpy(dst, src, copy_len);
+
+			off += copy_len;
+			len -= copy_len;
+			buf += copy_len;
+		}
+
+		if (!len || next_frag == end_frag)
+			break;
+
+		ptr_off += ptr_len;
+		ptr_buf = skb_frag_address(next_frag);
+		ptr_len = skb_frag_size(next_frag);
+		next_frag++;
+	}
+}
+
+static void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	u32 size = xdp->data_end - xdp->data;
+	void *addr = xdp->data;
+	int i;
+
+	if (unlikely(offset > 0xffff || len > 0xffff))
+		return ERR_PTR(-EFAULT);
+
+	if (offset + len > xdp_get_buff_len(xdp))
+		return ERR_PTR(-EINVAL);
+
+	if (offset < size) /* linear area */
+		goto out;
+
+	offset -= size;
+	for (i = 0; i < sinfo->nr_frags; i++) { /* paged area */
+		u32 frag_size = skb_frag_size(&sinfo->frags[i]);
+
+		if  (offset < frag_size) {
+			addr = skb_frag_address(&sinfo->frags[i]);
+			size = frag_size;
+			break;
+		}
+		offset -= frag_size;
+	}
+out:
+	return offset + len < size ? addr + offset : NULL;
+}
+
+BPF_CALL_4(bpf_xdp_load_bytes, struct xdp_buff *, xdp, u32, offset,
+	   void *, buf, u32, len)
+{
+	void *ptr;
+
+	ptr = bpf_xdp_pointer(xdp, offset, len);
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	if (!ptr)
+		bpf_xdp_copy_buf(xdp, offset, buf, len, false);
+	else
+		memcpy(buf, ptr, len);
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_load_bytes_proto = {
+	.func		= bpf_xdp_load_bytes,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
+};
+
+BPF_CALL_4(bpf_xdp_store_bytes, struct xdp_buff *, xdp, u32, offset,
+	   void *, buf, u32, len)
+{
+	void *ptr;
+
+	ptr = bpf_xdp_pointer(xdp, offset, len);
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	if (!ptr)
+		bpf_xdp_copy_buf(xdp, offset, buf, len, true);
+	else
+		memcpy(ptr, buf, len);
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_xdp_store_bytes_proto = {
+	.func		= bpf_xdp_store_bytes,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
+};
+
 static int bpf_xdp_mb_increase_tail(struct xdp_buff *xdp, int offset)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
@@ -4624,48 +4756,12 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
 };
 #endif
 
-static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
+static unsigned long bpf_xdp_copy(void *dst, const void *ctx,
 				  unsigned long off, unsigned long len)
 {
 	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
-	unsigned long ptr_len, ptr_off = 0;
-	skb_frag_t *next_frag, *end_frag;
-	struct skb_shared_info *sinfo;
-	u8 *ptr_buf;
-
-	if (likely(xdp->data_end - xdp->data >= off + len)) {
-		memcpy(dst_buff, xdp->data + off, len);
-		return 0;
-	}
-
-	sinfo = xdp_get_shared_info_from_buff(xdp);
-	end_frag = &sinfo->frags[sinfo->nr_frags];
-	next_frag = &sinfo->frags[0];
-
-	ptr_len = xdp->data_end - xdp->data;
-	ptr_buf = xdp->data;
-
-	while (true) {
-		if (off < ptr_off + ptr_len) {
-			unsigned long copy_off = off - ptr_off;
-			unsigned long copy_len = min(len, ptr_len - copy_off);
-
-			memcpy(dst_buff, ptr_buf + copy_off, copy_len);
-
-			off += copy_len;
-			len -= copy_len;
-			dst_buff += copy_len;
-		}
-
-		if (!len || next_frag == end_frag)
-			break;
-
-		ptr_off += ptr_len;
-		ptr_buf = skb_frag_address(next_frag);
-		ptr_len = skb_frag_size(next_frag);
-		next_frag++;
-	}
 
+	bpf_xdp_copy_buf(xdp, off, dst, len, false);
 	return 0;
 }
 
@@ -7597,6 +7693,10 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_xdp_get_buff_len:
 		return &bpf_xdp_get_buff_len_proto;
+	case BPF_FUNC_xdp_load_bytes:
+		return &bpf_xdp_load_bytes_proto;
+	case BPF_FUNC_xdp_store_bytes:
+		return &bpf_xdp_store_bytes_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
 	case BPF_FUNC_check_mtu:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d5921a0202f4..94e43088971d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4994,6 +4994,22 @@ union bpf_attr {
  *		Get the total size of a given xdp buff (linear and paged area)
  *	Return
  *		The total size of a given xdp buffer.
+ *
+ * long bpf_xdp_load_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf, u32 len)
+ *	Description
+ *		This helper is provided as an easy way to load data from a
+ *		xdp buffer. It can be used to load *len* bytes from *offset* from
+ *		the frame associated to *xdp_md*, into the buffer pointed by
+ *		*buf*.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
+ *
+ * long bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, void *buf, u32 len)
+ *	Description
+ *		Store *len* bytes from buffer *buf* into the frame
+ *		associated to *xdp_md*, at *offset*.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5179,6 +5195,8 @@ union bpf_attr {
 	FN(find_vma),			\
 	FN(loop),			\
 	FN(xdp_get_buff_len),		\
+	FN(xdp_load_bytes),		\
+	FN(xdp_store_bytes),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 21/23] bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce kernel selftest for new bpf_xdp_{load,store}_bytes helpers.
and bpf_xdp_pointer/bpf_xdp_copy_buf utility routines.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_adjust_frags.c         | 103 ++++++++++++++++++
 .../bpf/progs/test_xdp_update_frags.c         |  42 +++++++
 2 files changed, 145 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_adjust_frags.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_update_frags.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_frags.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_frags.c
new file mode 100644
index 000000000000..4b0d04fb1041
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_frags.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+
+void test_xdp_update_frags(void)
+{
+	const char *file = "./test_xdp_update_frags.o";
+	__u32 duration, retval, size;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	int err, prog_fd;
+	__u32 *offset;
+	__u8 *buf;
+
+	obj = bpf_object__open(file);
+	if (libbpf_get_error(obj))
+		return;
+
+	prog = bpf_object__next_program(obj, NULL);
+	if (bpf_object__load(obj))
+		return;
+
+	prog_fd = bpf_program__fd(prog);
+
+	buf = malloc(128);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		goto out;
+
+	memset(buf, 0, 128);
+	offset = (__u32 *)buf;
+	*offset = 16;
+	buf[*offset] = 0xaa;		/* marker at offset 16 (head) */
+	buf[*offset + 15] = 0xaa;	/* marker at offset 31 (head) */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 128,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[16,31]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS || buf[16] != 0xbb || buf[31] != 0xbb,
+	      "128b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	free(buf);
+
+	buf = malloc(9000);
+	if (CHECK(!buf, "malloc()", "error:%s\n", strerror(errno)))
+		goto out;
+
+	memset(buf, 0, 9000);
+	offset = (__u32 *)buf;
+	*offset = 5000;
+	buf[*offset] = 0xaa;		/* marker at offset 5000 (frag0) */
+	buf[*offset + 15] = 0xaa;	/* marker at offset 5015 (frag0) */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[5000,5015]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS ||
+	      buf[5000] != 0xbb || buf[5015] != 0xbb,
+	      "9000b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	memset(buf, 0, 9000);
+	offset = (__u32 *)buf;
+	*offset = 3510;
+	buf[*offset] = 0xaa;		/* marker at offset 3510 (head) */
+	buf[*offset + 15] = 0xaa;	/* marker at offset 3525 (frag0) */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[3510,3525]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS ||
+	      buf[3510] != 0xbb || buf[3525] != 0xbb,
+	      "9000b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	memset(buf, 0, 9000);
+	offset = (__u32 *)buf;
+	*offset = 7606;
+	buf[*offset] = 0xaa;		/* marker at offset 7606 (frag0) */
+	buf[*offset + 15] = 0xaa;	/* marker at offset 7621 (frag1) */
+
+	err = bpf_prog_test_run(prog_fd, 1, buf, 9000,
+				buf, &size, &retval, &duration);
+
+	/* test_xdp_update_frags: buf[7606,7621]: 0xaa -> 0xbb */
+	CHECK(err || retval != XDP_PASS ||
+	      buf[7606] != 0xbb || buf[7621] != 0xbb,
+	      "9000b", "err %d errno %d retval %d size %d\n",
+	      err, errno, retval, size);
+
+	free(buf);
+out:
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_frags(void)
+{
+	if (test__start_subtest("xdp_adjust_frags"))
+		test_xdp_update_frags();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c b/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c
new file mode 100644
index 000000000000..5801f05219db
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_update_frags.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_mb/xdp_adjust_frags")
+int _xdp_adjust_frags(struct xdp_md *xdp)
+{
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
+	__u8 val[16] = {};
+	__u32 offset;
+	int err;
+
+	if (data + sizeof(__u32) > data_end)
+		return XDP_DROP;
+
+	offset = *(__u32 *)data;
+	err = bpf_xdp_load_bytes(xdp, offset, val, sizeof(val));
+	if (err < 0)
+		return XDP_DROP;
+
+	if (val[0] != 0xaa || val[15] != 0xaa) /* marker */
+		return XDP_DROP;
+
+	val[0] = 0xbb; /* update the marker */
+	val[15] = 0xbb;
+	err = bpf_xdp_store_bytes(xdp, offset, val, sizeof(val));
+	if (err < 0)
+		return XDP_DROP;
+
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 22/23] bpf: selftests: add CPUMAP/DEVMAP selftests for xdp multi-buff
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Verify compatibility checks attaching a XDP multi-buff program to a
CPUMAP/DEVMAP

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_cpumap_attach.c        | 65 ++++++++++++++++++-
 .../bpf/prog_tests/xdp_devmap_attach.c        | 56 ++++++++++++++++
 .../bpf/progs/test_xdp_with_cpumap_helpers.c  |  6 ++
 .../progs/test_xdp_with_cpumap_mb_helpers.c   | 27 ++++++++
 .../bpf/progs/test_xdp_with_devmap_helpers.c  |  7 ++
 .../progs/test_xdp_with_devmap_mb_helpers.c   | 27 ++++++++
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_mb_helpers.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_with_devmap_mb_helpers.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
index fd812bd43600..ee580b50a945 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_cpumap_attach.c
@@ -3,11 +3,12 @@
 #include <linux/if_link.h>
 #include <test_progs.h>
 
+#include "test_xdp_with_cpumap_mb_helpers.skel.h"
 #include "test_xdp_with_cpumap_helpers.skel.h"
 
 #define IFINDEX_LO	1
 
-void serial_test_xdp_cpumap_attach(void)
+void test_xdp_with_cpumap_helpers(void)
 {
 	struct test_xdp_with_cpumap_helpers *skel;
 	struct bpf_prog_info info = {};
@@ -54,6 +55,68 @@ void serial_test_xdp_cpumap_attach(void)
 	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
 	ASSERT_NEQ(err, 0, "Add non-BPF_XDP_CPUMAP program to cpumap entry");
 
+	/* try to attach BPF_XDP_CPUMAP multi-buff program when we have already
+	 * loaded a legacy XDP program on the map
+	 */
+	idx = 1;
+	val.qsize = 192;
+	val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_cm_mb);
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_NEQ(err, 0,
+		   "Add BPF_XDP_CPUMAP multi-buff program to cpumap entry");
+
 out_close:
 	test_xdp_with_cpumap_helpers__destroy(skel);
 }
+
+void test_xdp_with_cpumap_mb_helpers(void)
+{
+	struct test_xdp_with_cpumap_mb_helpers *skel;
+	struct bpf_prog_info info = {};
+	__u32 len = sizeof(info);
+	struct bpf_cpumap_val val = {
+		.qsize = 192,
+	};
+	int err, mb_prog_fd, map_fd;
+	__u32 idx = 0;
+
+	skel = test_xdp_with_cpumap_mb_helpers__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_xdp_with_cpumap_helpers__open_and_load"))
+		return;
+
+	mb_prog_fd = bpf_program__fd(skel->progs.xdp_dummy_cm_mb);
+	map_fd = bpf_map__fd(skel->maps.cpu_map);
+	err = bpf_obj_get_info_by_fd(mb_prog_fd, &info, &len);
+	if (!ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
+		goto out_close;
+
+	val.bpf_prog.fd = mb_prog_fd;
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_OK(err, "Add program to cpumap entry");
+
+	err = bpf_map_lookup_elem(map_fd, &idx, &val);
+	ASSERT_OK(err, "Read cpumap entry");
+	ASSERT_EQ(info.id, val.bpf_prog.id,
+		  "Match program id to cpumap entry prog_id");
+
+	/* try to attach BPF_XDP_CPUMAP program when we have already
+	 * loaded a multi-buff XDP program on the map
+	 */
+	idx = 1;
+	val.qsize = 192;
+	val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_cm);
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_NEQ(err, 0, "Add BPF_XDP_CPUMAP program to cpumap entry");
+
+out_close:
+	test_xdp_with_cpumap_mb_helpers__destroy(skel);
+}
+
+void serial_test_xdp_cpumap_attach(void)
+{
+	if (test__start_subtest("CPUMAP with programs in entries"))
+		test_xdp_with_cpumap_helpers();
+
+	if (test__start_subtest("CPUMAP with multi-buff programs in entries"))
+		test_xdp_with_cpumap_mb_helpers();
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
index 3079d5568f8f..5c0dc3c20fc9 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_devmap_attach.c
@@ -4,6 +4,7 @@
 #include <test_progs.h>
 
 #include "test_xdp_devmap_helpers.skel.h"
+#include "test_xdp_with_devmap_mb_helpers.skel.h"
 #include "test_xdp_with_devmap_helpers.skel.h"
 
 #define IFINDEX_LO 1
@@ -56,6 +57,16 @@ static void test_xdp_with_devmap_helpers(void)
 	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
 	ASSERT_NEQ(err, 0, "Add non-BPF_XDP_DEVMAP program to devmap entry");
 
+	/* try to attach BPF_XDP_DEVMAP multi-buff program when we have already
+	 * loaded a legacy XDP program on the map
+	 */
+	idx = 1;
+	val.ifindex = 1;
+	val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_dm_mb);
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_NEQ(err, 0,
+		   "Add BPF_XDP_DEVMAP multi-buff program to devmap entry");
+
 out_close:
 	test_xdp_with_devmap_helpers__destroy(skel);
 }
@@ -71,12 +82,57 @@ static void test_neg_xdp_devmap_helpers(void)
 	}
 }
 
+void test_xdp_with_devmap_mb_helpers(void)
+{
+	struct test_xdp_with_devmap_mb_helpers *skel;
+	struct bpf_prog_info info = {};
+	struct bpf_devmap_val val = {
+		.ifindex = IFINDEX_LO,
+	};
+	__u32 len = sizeof(info);
+	int err, dm_fd_mb, map_fd;
+	__u32 idx = 0;
+
+	skel = test_xdp_with_devmap_mb_helpers__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_xdp_with_devmap_helpers__open_and_load"))
+		return;
+
+	dm_fd_mb = bpf_program__fd(skel->progs.xdp_dummy_dm_mb);
+	map_fd = bpf_map__fd(skel->maps.dm_ports);
+	err = bpf_obj_get_info_by_fd(dm_fd_mb, &info, &len);
+	if (!ASSERT_OK(err, "bpf_obj_get_info_by_fd"))
+		goto out_close;
+
+	val.bpf_prog.fd = dm_fd_mb;
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_OK(err, "Add multi-buff program to devmap entry");
+
+	err = bpf_map_lookup_elem(map_fd, &idx, &val);
+	ASSERT_OK(err, "Read devmap entry");
+	ASSERT_EQ(info.id, val.bpf_prog.id,
+		  "Match program id to devmap entry prog_id");
+
+	/* try to attach BPF_XDP_DEVMAP program when we have already loaded a
+	 * multi-buff XDP program on the map
+	 */
+	idx = 1;
+	val.ifindex = 1;
+	val.bpf_prog.fd = bpf_program__fd(skel->progs.xdp_dummy_dm);
+	err = bpf_map_update_elem(map_fd, &idx, &val, 0);
+	ASSERT_NEQ(err, 0, "Add BPF_XDP_DEVMAP program to devmap entry");
+
+out_close:
+	test_xdp_with_devmap_mb_helpers__destroy(skel);
+}
 
 void serial_test_xdp_devmap_attach(void)
 {
 	if (test__start_subtest("DEVMAP with programs in entries"))
 		test_xdp_with_devmap_helpers();
 
+	if (test__start_subtest("DEVMAP with multi-buff programs in entries"))
+		test_xdp_with_devmap_mb_helpers();
+
 	if (test__start_subtest("Verifier check of DEVMAP programs"))
 		test_neg_xdp_devmap_helpers();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_helpers.c
index 532025057711..f32e4dab1751 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_helpers.c
@@ -33,4 +33,10 @@ int xdp_dummy_cm(struct xdp_md *ctx)
 	return XDP_PASS;
 }
 
+SEC("xdp_cpumap_mb/mb_dummy_cm")
+int xdp_dummy_cm_mb(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_mb_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_mb_helpers.c
new file mode 100644
index 000000000000..96eedbaef71b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_mb_helpers.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+#define IFINDEX_LO	1
+
+struct {
+	__uint(type, BPF_MAP_TYPE_CPUMAP);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_cpumap_val));
+	__uint(max_entries, 4);
+} cpu_map SEC(".maps");
+
+SEC("xdp_cpumap/dummy_cm")
+int xdp_dummy_cm(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+SEC("xdp_cpumap_mb/mb_dummy_cm")
+int xdp_dummy_cm_mb(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
index 1e6b9c38ea6d..691f2d70dedc 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_helpers.c
@@ -40,4 +40,11 @@ int xdp_dummy_dm(struct xdp_md *ctx)
 
 	return XDP_PASS;
 }
+
+SEC("xdp_devmap_mb/mp_map_prog")
+int xdp_dummy_dm_mb(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_mb_helpers.c b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_mb_helpers.c
new file mode 100644
index 000000000000..05221b1fd9f2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_with_devmap_mb_helpers.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_DEVMAP);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct bpf_devmap_val));
+	__uint(max_entries, 4);
+} dm_ports SEC(".maps");
+
+/* valid program on DEVMAP entry via SEC name;
+ * has access to egress and ingress ifindex
+ */
+SEC("xdp_devmap/map_prog")
+int xdp_dummy_dm(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+SEC("xdp_devmap_mb/mp_map_prog")
+int xdp_dummy_dm_mb(struct xdp_md *ctx)
+{
+	return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.33.1


^ permalink raw reply related

* [PATCH v20 bpf-next 23/23] xdp: disable XDP_REDIRECT for xdp multi-buff
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

XDP_REDIRECT is not fully supported yet for xdp multi-buff since not
all XDP capable drivers can map non-linear xdp_frame in ndo_xdp_xmit
so disable it for the moment.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/core/filter.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 14860931733d..def6e9f451a7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4186,6 +4186,13 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 	struct bpf_map *map;
 	int err;
 
+	/* XDP_REDIRECT is not fully supported yet for xdp multi-buff since
+	 * not all XDP capable drivers can map non-linear xdp_frame in
+	 * ndo_xdp_xmit.
+	 */
+	if (unlikely(xdp_buff_is_mb(xdp) && map_type != BPF_MAP_TYPE_CPUMAP))
+		return -EOPNOTSUPP;
+
 	ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
 	ri->map_type = BPF_MAP_TYPE_UNSPEC;
 
-- 
2.33.1


^ permalink raw reply related

* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Vladimir Oltean @ 2021-12-10 19:27 UTC (permalink / raw)
  To: Ansuel Smith
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <61b3a621.1c69fb81.b4bf5.8dd2@mx.google.com>

On Fri, Dec 10, 2021 at 08:10:21PM +0100, Ansuel Smith wrote:
> > Ok I added more tracing and packet are received to the tagger right
> > after the log from ipv6 "link becomes ready". That log just check if the
> > interface is up and if it does have a valid sched.
> > I notice after link becomes ready we have a CHANGE event for eth0. That
> > should be the correct way to understand when the cpu port is actually
> > usable.
> > (just to make it clear before the link becomes ready no packet is
> > received to the tagger and the completion timeouts)
> > 
> > -- 
> > 	Ansuel
> 
> Sorry for the triple message spam... I have a solution. It seems packet
> are processed as soon as dev_activate is called (so a qdisk is assigned)
> By adding another bool like master_oper_ready and
> 
> void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
>                                       struct net_device *master,
>                                       bool up);
> 
> static void dsa_tree_master_state_change(struct dsa_switch_tree *dst,
>                                         struct net_device *master)
> {
>        struct dsa_notifier_master_state_info info;
>        struct dsa_port *cpu_dp = master->dsa_ptr;
> 
>        info.master = master;
>        info.operational = cpu_dp->master_admin_up && cpu_dp->master_oper_up && cpu_dp->master_oper_ready;
> 
>        dsa_tree_notify(dst, DSA_NOTIFIER_MASTER_STATE_CHANGE, &info);
> }
> 
> void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
>                                       struct net_device *master,
>                                       bool up)
> {
>        struct dsa_port *cpu_dp = master->dsa_ptr;
>        bool notify = false;
> 
>        if ((cpu_dp->master_oper_ready && cpu_dp->master_oper_ready) !=
>            (cpu_dp->master_oper_ready && up))
>                notify = true;
> 
>        cpu_dp->master_oper_ready = up;
> 
>        if (notify)
>                dsa_tree_master_state_change(dst, master);
> }
> 
> In slave.c at the NETDEV_CHANGE event the additional
> dsa_tree_master_oper_state_ready(dst, dev, dev_ingress_queue(dev));
> we have no timeout function. I just tested this and it works right away.
> 
> Think we need this additional check to make sure the tagger can finally
> accept packet from the switch.
> 
> With this added I think this is ready.

Why ingress_queue?
I was looking at dev_activate() too, especially since net/ipv6/addrconf.c uses:

/* Check if link is ready: is it up and is a valid qdisc available */
static inline bool addrconf_link_ready(const struct net_device *dev)
{
	return netif_oper_up(dev) && !qdisc_tx_is_noop(dev);
}

and you can see that qdisc_tx_is_noop() checks for the qdisc on TX
queues, not ingress qdisc (which makes more sense anyway).

Anyway the reason why I didn't say anything about this is because I
don't yet understand how it is supposed to work. Specifically:

rtnl_lock

dev_open()
-> __dev_open()
   -> dev->flags |= IFF_UP;
   -> dev_activate()
      -> transition_one_qdisc()
-> call_netdevice_notifiers(NETDEV_UP, dev);

rtnl_unlock

so the qdisc should have already transitioned by the time NETDEV_UP is
emitted.

and since we already require a NETDEV_UP to have occurred, or dev->flags
to contain IFF_UP, I simply don't understand the following
(a) why would the qdisc be noop when we catch NETDEV_UP
(b) who calls netdev_state_change() (or __dev_notify_flags ?!) after the
    qdisc changes on a TX queue? If no one, then I'm not sure how we can
    reliably check for the state of the qdisc if we aren't notified
    about changes to it.

^ permalink raw reply

* Re: [PATCH net-next 0/2] net: stmmac: add EthType Rx Frame steering
From: Jakub Kicinski @ 2021-12-10 19:38 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Ong Boon Leong, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu, Maxime Coquelin, alexandre.torgue,
	Kurt Kanzenbach, netdev, linux-stm32, linux-arm-kernel,
	Amritha Nambiar
In-Reply-To: <20211210115730.bcdh7jvwt24u5em3@skbuf>

On Fri, 10 Dec 2021 13:57:30 +0200 Vladimir Oltean wrote:
> Is it the canonical approach to perform flow steering via tc-flower hw_tc,
> as opposed to ethtool --config-nfc? My understanding from reading the
> documentation is that tc-flower hw_tc only selects the hardware traffic
> class for a packet, and that this has to do with prioritization
> (although the concept in itself is a bit ill-defined as far as I
> understand it, how does it relate to things like offloaded skbedit priority?).
> But selecting a traffic class, in itself, doesn't (directly or
> necessarily) select a ring per se, as ethtool does? Just like ethtool
> doesn't select packet priority, just RX queue. When the RX queue
> priority is configurable (see the "snps,priority" device tree property
> in stmmac_mtl_setup) and more RX queues have the same priority, I'm not
> sure what hw_tc is supposed to do in terms of RX queue selection?

You didn't mention the mqprio, but I think that's the piece that maps
TCs to queue pairs. You can have multiple queues in a TC.

Obviously that's still pretty weird what the flow rules should select
is an RSS context. mqprio is a qdisc, which means Tx, not Rx.

Adding Amritha who I believe added the concept of selecting Rx queues
via hw_tc. Can you comment?

^ permalink raw reply

* Re: [PATCH net-next] net: Enable neighbor sysctls that is save for userns root
From: Joanne Koong @ 2021-12-10 19:38 UTC (permalink / raw)
  To: cgel.zte, davem
  Cc: kuba, ebiederm, linux-kernel, netdev, daniel, xu xin, Zeal Robot
In-Reply-To: <20211208085844.405570-1-xu.xin16@zte.com.cn>

On 12/8/21 12:58 AM, cgel.zte@gmail.com wrote:

> From: xu xin <xu.xin16@zte.com.cn>
>
> Inside netns owned by non-init userns, sysctls about ARP/neighbor is
> currently not visible and configurable.
>
> For the attributes these sysctls correspond to, any modifications make
> effects on the performance of networking(ARP, especilly) only in the
> scope of netns, which does not affect other netns.
>
> Actually, some tools via netlink can modify these attribute. iproute2 is
> an example. see as follows:
>
> $ unshare -ur -n
> $ cat /proc/sys/net/ipv4/neigh/lo/retrans_time
> cat: can't open '/proc/sys/net/ipv4/neigh/lo/retrans_time': No such file
> or directory
> $ ip ntable show dev lo
> inet arp_cache
>      dev lo
>      refcnt 1 reachable 19494 base_reachable 30000 retrans 1000
>      gc_stale 60000 delay_probe 5000 queue 101
>      app_probes 0 ucast_probes 3 mcast_probes 3
>      anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
>
> inet6 ndisc_cache
>      dev lo
>      refcnt 1 reachable 42394 base_reachable 30000 retrans 1000
>      gc_stale 60000 delay_probe 5000 queue 101
>      app_probes 0 ucast_probes 3 mcast_probes 3
>      anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
> $ ip ntable change name arp_cache dev <if> retrans 2000
> inet arp_cache
>      dev lo
>      refcnt 1 reachable 22917 base_reachable 30000 retrans 2000
>      gc_stale 60000 delay_probe 5000 queue 101
>      app_probes 0 ucast_probes 3 mcast_probes 3
>      anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 1000
>
> inet6 ndisc_cache
>      dev lo
>      refcnt 1 reachable 35524 base_reachable 30000 retrans 1000
>      gc_stale 60000 delay_probe 5000 queue 101
>      app_probes 0 ucast_probes 3 mcast_probes 3
>      anycast_delay 1000 proxy_delay 800 proxy_queue 64 locktime 0
>
> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Signed-off-by: xu xin <xu.xin16@zte.com.cn>
> ---
This LGTM. The neighbour sysctls are registered to the net namespace
associated with the neigh_parms. Any changes made to a net namespace
will be locally scoped to that net namespace (changes won't affect any other
net namespace).

There is also no possibility of a non-privileged user namespace messing 
up the
net namespace sysctls it shares with its parent user namespace. When a 
new user
namespace is created without unsharing the network namespace (eg calling
clone()  with CLONE_NEWUSER), the new user namespace shares its
parent's network namespace. Write access is protected by the mode set
in the sysctl ctl_table (and enforced by procfs). Here in the case of 
the neighbour
sysctls, 0644 is set for every sysctl; only the user owner has write access.


Acked-by: Joanne Koong <joannekoong@fb.com>
>   net/core/neighbour.c | 4 ----
>   1 file changed, 4 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 0cdd4d9ad942..44d90cc341ea 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -3771,10 +3771,6 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
>   			neigh_proc_base_reachable_time;
>   	}
>   
> -	/* Don't export sysctls to unprivileged users */
> -	if (neigh_parms_net(p)->user_ns != &init_user_ns)
> -		t->neigh_vars[0].procname = NULL;
> -
>   	switch (neigh_parms_family(p)) {
>   	case AF_INET:
>   	      p_name = "ipv4";

^ permalink raw reply

* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Ansuel Smith @ 2021-12-10 19:45 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <20211210192723.noa3hb2vso6t7zju@skbuf>

On Fri, Dec 10, 2021 at 07:27:24PM +0000, Vladimir Oltean wrote:
> On Fri, Dec 10, 2021 at 08:10:21PM +0100, Ansuel Smith wrote:
> > > Ok I added more tracing and packet are received to the tagger right
> > > after the log from ipv6 "link becomes ready". That log just check if the
> > > interface is up and if it does have a valid sched.
> > > I notice after link becomes ready we have a CHANGE event for eth0. That
> > > should be the correct way to understand when the cpu port is actually
> > > usable.
> > > (just to make it clear before the link becomes ready no packet is
> > > received to the tagger and the completion timeouts)
> > > 
> > > -- 
> > > 	Ansuel
> > 
> > Sorry for the triple message spam... I have a solution. It seems packet
> > are processed as soon as dev_activate is called (so a qdisk is assigned)
> > By adding another bool like master_oper_ready and
> > 
> > void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
> >                                       struct net_device *master,
> >                                       bool up);
> > 
> > static void dsa_tree_master_state_change(struct dsa_switch_tree *dst,
> >                                         struct net_device *master)
> > {
> >        struct dsa_notifier_master_state_info info;
> >        struct dsa_port *cpu_dp = master->dsa_ptr;
> > 
> >        info.master = master;
> >        info.operational = cpu_dp->master_admin_up && cpu_dp->master_oper_up && cpu_dp->master_oper_ready;
> > 
> >        dsa_tree_notify(dst, DSA_NOTIFIER_MASTER_STATE_CHANGE, &info);
> > }
> > 
> > void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
> >                                       struct net_device *master,
> >                                       bool up)
> > {
> >        struct dsa_port *cpu_dp = master->dsa_ptr;
> >        bool notify = false;
> > 
> >        if ((cpu_dp->master_oper_ready && cpu_dp->master_oper_ready) !=
> >            (cpu_dp->master_oper_ready && up))
> >                notify = true;
> > 
> >        cpu_dp->master_oper_ready = up;
> > 
> >        if (notify)
> >                dsa_tree_master_state_change(dst, master);
> > }
> > 
> > In slave.c at the NETDEV_CHANGE event the additional
> > dsa_tree_master_oper_state_ready(dst, dev, dev_ingress_queue(dev));
> > we have no timeout function. I just tested this and it works right away.
> > 
> > Think we need this additional check to make sure the tagger can finally
> > accept packet from the switch.
> > 
> > With this added I think this is ready.
> 
> Why ingress_queue?
> I was looking at dev_activate() too, especially since net/ipv6/addrconf.c uses:
> 
> /* Check if link is ready: is it up and is a valid qdisc available */
> static inline bool addrconf_link_ready(const struct net_device *dev)
> {
> 	return netif_oper_up(dev) && !qdisc_tx_is_noop(dev);
> }
> 
> and you can see that qdisc_tx_is_noop() checks for the qdisc on TX
> queues, not ingress qdisc (which makes more sense anyway).
> 
> Anyway the reason why I didn't say anything about this is because I
> don't yet understand how it is supposed to work. Specifically:
> 
> rtnl_lock
> 
> dev_open()
> -> __dev_open()
>    -> dev->flags |= IFF_UP;
>    -> dev_activate()
>       -> transition_one_qdisc()
> -> call_netdevice_notifiers(NETDEV_UP, dev);
> 
> rtnl_unlock
> 
> so the qdisc should have already transitioned by the time NETDEV_UP is
> emitted.
> 
> and since we already require a NETDEV_UP to have occurred, or dev->flags
> to contain IFF_UP, I simply don't understand the following
> (a) why would the qdisc be noop when we catch NETDEV_UP
> (b) who calls netdev_state_change() (or __dev_notify_flags ?!) after the
>     qdisc changes on a TX queue? If no one, then I'm not sure how we can
>     reliably check for the state of the qdisc if we aren't notified
>     about changes to it.

The ipv6 check is just a hint. The real clue was the second
NETDEV_CHANGE called by linkwatch_do_dev in link_watch.c
That is the one that calls the CHANGE event before the ready stuff.

I had problem tracking this as the change logic is "emit CHANGE when flags
change" but netdev_state_change is also called for other reason and one
example is dev_activate/dev_deactivate from linkwatch_do_dev.
It seems a bit confusing that a generic state change is called even when
flags are not changed and because of this is a bit problematic track why
the CHANGE event was called.

Wonder if linkwatch_do_dev should be changed and introduce a flag? But
that seems problematic if for whatever reason a driver use the CHANGE
event to track exactly dev_activate/deactivate.

-- 
	Ansuel

^ permalink raw reply

* pull-request: bpf-next 2021-12-10
From: Andrii Nakryiko @ 2021-12-10 19:50 UTC (permalink / raw)
  To: davem; +Cc: kuba, daniel, ast, netdev, bpf, andrii, kernel-team

Hi David, hi Jakub,

The following pull-request contains BPF updates for your *net-next* tree.

There are three merge conflicts between bpf and bpf-next:

1. Documentation/bpf/index.rst. Please Just drop the libbpf and BTF sections,
   so that the resulting content is like this:

  [...]

  This kernel side documentation is still work in progress.
  The Cilium project also maintains a `BPF and XDP Reference Guide`_
  that goes into great technical depth about the BPF Architecture.

  .. toctree::
     :maxdepth: 1
  
     instruction-set
     verifier
  [...]

2. kernel/bpf/btf.c. There was a big chunk of code added at the end, but git
   is confused about #endif. Please keep the original #endif (corresponding to
   #ifdef CONFIG_DEBUG_INTO_BTF_MODULES) and all the newly added code goes to
   the end of the file:

  --- a/kernel/bpf/btf.c
  +++ b/kernel/bpf/btf.c
  @@@ -6418,384 -6390,4 +6409,386 @@@ bool bpf_check_mod_kfunc_call(struct kf
    DEFINE_KFUNC_BTF_ID_LIST(bpf_tcp_ca_kfunc_list);
    DEFINE_KFUNC_BTF_ID_LIST(prog_test_kfunc_list);

  + #endif
  ++
   +int bpf_core_types_are_compat(const struct btf *local_btf, __u32 local_id,
   +                            const struct btf *targ_btf, __u32 targ_id)
   +{
   +      return -EOPNOTSUPP;
   +}
  [...]

3. tools/lib/bpf/libbpf.c, attr->log_level should be replaced with
   extra_log_level, but otherwise 4-parameter invocation of btf_gen__init()
   wins:

  --- a/tools/lib/bpf/libbpf.c
  +++ b/tools/lib/bpf/libbpf.c
  @@@ -7477,7 -7258,7 +7477,7 @@@ static int bpf_object_load(struct bpf_o
          }

          if (obj->gen_loader)
  -               bpf_gen__init(obj->gen_loader, extra_log_level);
   -              bpf_gen__init(obj->gen_loader, attr->log_level, obj->nr_programs, obj->nr_maps);
  ++              bpf_gen__init(obj->gen_loader, extra_log_level, obj->nr_programs, obj->nr_maps);

          err = bpf_object__probe_loading(obj);
          err = err ? : bpf_object__load_vmlinux_btf(obj, false);

We've added 116 non-merge commits during the last 26 day(s) which contain
a total of 182 files changed, 5748 insertions(+), 2568 deletions(-).

The main changes are:

1) Various samples fixes, from Alexander Lobakin.

2) BPF CO-RE support in kernel and light skeleton, from Alexei Starovoitov.

3) A batch of new unified APIs for libbpf, logging improvements, version
   querying, etc. Also a batch of old deprecations for old APIs and various
   bug fixes, in preparation for libbpf 1.0, from Andrii Nakryiko.

4) BPF documentation reorganization and improvements, from Christoph Hellwig
   and Dave Tucker.

5) Support for declarative initialization of BPF_MAP_TYPE_PROG_ARRAY in
   libbpf, from Hengqi Chen.

6) Verifier log fixes, from Hou Tao.

7) Runtime-bounded loops support with bpf_loop() helper, from Joanne Koong.

8) Extend branch record capturing to all platforms that support it,
   from Kajol Jain.

9) Light skeleton codegen improvements, from Kumar Kartikeya Dwivedi.

10) bpftool doc-generating script improvements, from Quentin Monnet.

11) Two libbpf v0.6 bug fixes, from Shuyi Cheng and Vincent Minet.

12) Deprecation warning fix for perf/bpf_counter, from Song Liu.

13) MAX_TAIL_CALL_CNT unification and MIPS build fix for libbpf,
    from Tiezhu Yang.

14) BTF_KING_TYPE_TAG follow-up fixes, from Yonghong Song.

15) Selftests fixes and improvements, from Ilya Leoshkevich, Jean-Philippe
    Brucker, Jiri Olsa, Maxim Mikityanskiy, Tirthendu Sarkar, Yucong Sun,
    and others.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git

Thanks a lot!

Also thanks to reporters, reviewers and testers of commits in this pull-request:

Andrii Nakryiko, Björn Töpel, Evgeny Vereshchagin, Gustavo A. R. Silva, 
Ilya Leoshkevich, Jiri Olsa, Johan Almbladh, John Fastabend, KP Singh, 
Kumar Kartikeya Dwivedi, Maciej Fijalkowski, Martin KaFai Lau, Quentin 
Monnet, Song Liu, Toke Høiland-Jørgensen, Yonghong Song, Zeal Robot

----------------------------------------------------------------

The following changes since commit a5bdc36354cbf1a1a91396f4da548ff484686305:

  Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (2021-11-15 08:49:23 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git 

for you to fetch changes up to 733719ec4e0b7b4e94e053979f0bbfc489b76f0b:

  libbpf: Add "bool skipped" to struct bpf_map (2021-12-10 10:17:59 -0800)

----------------------------------------------------------------
Alan Maguire (1):
      libbpf: Silence uninitialized warning/error in btf_dump_dump_type_data

Alexander Lobakin (3):
      samples: bpf: Fix conflicting types in fds_example
      samples: bpf: Fix xdp_sample_user.o linking with Clang
      samples: bpf: Fix 'unknown warning group' build warning on Clang

Alexei Starovoitov (22):
      Merge branch 'Add bpf_loop helper'
      libbpf: Replace btf__type_by_id() with btf_type_by_id().
      bpf: Rename btf_member accessors.
      bpf: Prepare relo_core.c for kernel duty.
      bpf: Define enum bpf_core_relo_kind as uapi.
      bpf: Pass a set of bpf_core_relo-s to prog_load command.
      bpf: Adjust BTF log size limit.
      bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().
      libbpf: Use CO-RE in the kernel in light skeleton.
      libbpf: Support init of inner maps in light skeleton.
      libbpf: Clean gen_loader's attach kind.
      selftests/bpf: Add lskel version of kfunc test.
      selftests/bpf: Improve inner_map test coverage.
      selftests/bpf: Convert map_ptr_kern test to use light skeleton.
      selftests/bpf: Additional test for CO-RE in the kernel.
      selftests/bpf: Revert CO-RE removal in test_ksyms_weak.
      selftests/bpf: Add CO-RE relocations to verifier scale test.
      Merge branch 'Deprecate bpf_prog_load_xattr() API'
      libbpf: Reduce bpf_core_apply_relo_insn() stack usage.
      bpftool: Add debug mode for gen_loader.
      bpf: Silence purge_cand_cache build warning.
      Merge branch 'Enhance and rework logging controls in libbpf'

Andrii Nakryiko (48):
      selftests/bpf: Add uprobe triggering overhead benchmarks
      libbpf: Add runtime APIs to query libbpf version
      libbpf: Accommodate DWARF/compiler bug with duplicated structs
      libbpf: Load global data maps lazily on legacy kernels
      selftests/bpf: Mix legacy (maps) and modern (vars) BPF in one test
      libbpf: Unify low-level map creation APIs w/ new bpf_map_create()
      libbpf: Use bpf_map_create() consistently internally
      libbpf: Prevent deprecation warnings in xsk.c
      selftests/bpf: Migrate selftests to bpf_map_create()
      tools/resolve_btf_ids: Close ELF file on error
      libbpf: Fix potential misaligned memory access in btf_ext__new()
      libbpf: Don't call libc APIs with NULL pointers
      libbpf: Fix glob_syms memory leak in bpf_linker
      libbpf: Fix using invalidated memory in bpf_linker
      selftests/bpf: Fix UBSan complaint about signed __int128 overflow
      selftests/bpf: Fix possible NULL passed to memcpy() with zero size
      selftests/bpf: Prevent misaligned memory access in get_stack_raw_tp test
      selftests/bpf: Fix misaligned memory access in queue_stack_map test
      selftests/bpf: Prevent out-of-bounds stack access in test_bpffs
      selftests/bpf: Fix misaligned memory accesses in xdp_bonding test
      selftests/bpf: Fix misaligned accesses in xdp and xdp_bpf2bpf tests
      Merge branch 'Support static initialization of BPF_MAP_TYPE_PROG_ARRAY'
      Merge branch 'Apply suggestions for typeless/weak ksym series'
      libbpf: Cleanup struct bpf_core_cand.
      Merge branch 'bpf: CO-RE support in the kernel'
      libbpf: Use __u32 fields in bpf_map_create_opts
      libbpf: Add API to get/set log_level at per-program level
      bpftool: Migrate off of deprecated bpf_create_map_xattr() API
      selftests/bpf: Remove recently reintroduced legacy btf__dedup() use
      selftests/bpf: Mute xdpxceiver.c's deprecation warnings
      selftests/bpf: Remove all the uses of deprecated bpf_prog_load_xattr()
      samples/bpf: Clean up samples/bpf build failes
      samples/bpf: Get rid of deprecated libbpf API uses
      libbpf: Deprecate bpf_prog_load_xattr() API
      perf: Mute libbpf API deprecations temporarily
      Merge branch 'samples: bpf: fix build issues with Clang/LLVM'
      libbpf: Fix bpf_prog_load() log_buf logic for log_level 0
      libbpf: Add OPTS-based bpf_btf_load() API
      libbpf: Allow passing preallocated log_buf when loading BTF into kernel
      libbpf: Allow passing user log setting through bpf_object_open_opts
      libbpf: Improve logging around BPF program loading
      libbpf: Preserve kernel error code and remove kprobe prog type guessing
      libbpf: Add per-program log buffer setter and getter
      libbpf: Deprecate bpf_object__load_xattr()
      selftests/bpf: Replace all uses of bpf_load_btf() with bpf_btf_load()
      selftests/bpf: Add test for libbpf's custom log_buf behavior
      selftests/bpf: Remove the only use of deprecated bpf_object__load_xattr()
      bpftool: Switch bpf_object__load_xattr() to bpf_object__load()

Christoph Hellwig (5):
      x86, bpf: Cleanup the top of file header in bpf_jit_comp.c
      bpf: Remove a redundant comment on bpf_prog_free
      bpf, docs: Prune all references to "internal BPF"
      bpf, docs: Move handling of maps to Documentation/bpf/maps.rst
      bpf, docs: Split general purpose eBPF documentation out of filter.rst

Colin Ian King (1):
      bpf: Remove redundant assignment to pointer t

Dave Tucker (3):
      bpf, docs: Change underline in btf to match style guide
      bpf, docs: Rename bpf_lsm.rst to prog_lsm.rst
      bpf, docs: Fix ordering of bpf documentation

Drew Fustini (1):
      selftests/bpf: Fix trivial typo

Florent Revest (1):
      libbpf: Change bpf_program__set_extra_flags to bpf_program__set_flags

Grant Seltzer (1):
      libbpf: Add doc comments in libbpf.h

Hengqi Chen (2):
      libbpf: Support static initialization of BPF_MAP_TYPE_PROG_ARRAY
      selftests/bpf: Test BPF_MAP_TYPE_PROG_ARRAY static initialization

Hou Tao (2):
      bpf: Clean-up bpf_verifier_vlog() for BPF_LOG_KERNEL log level
      bpf: Disallow BPF_LOG_KERNEL log level for bpf(BPF_BTF_LOAD)

Ilya Leoshkevich (1):
      selfetests/bpf: Adapt vmtest.sh to s390 libbpf CI changes

Jean-Philippe Brucker (1):
      selftests/bpf: Build testing_helpers.o out of tree

Jiri Olsa (1):
      selftests/bpf: Add btf_dedup case with duplicated structs within CU

Joanne Koong (4):
      bpf: Add bpf_loop helper
      selftests/bpf: Add bpf_loop test
      selftests/bpf: Measure bpf_loop verifier performance
      selftest/bpf/benchs: Add bpf_loop benchmark

Kajol Jain (1):
      bpf: Remove config check to enable bpf support for branch records

Kumar Kartikeya Dwivedi (3):
      bpf: Change bpf_kallsyms_lookup_name size type to ARG_CONST_SIZE_OR_ZERO
      libbpf: Avoid double stores for success/failure case of ksym relocations
      libbpf: Avoid reload of imm for weak, unresolved, repeating ksym

Maxim Mikityanskiy (1):
      bpf: Fix the test_task_vma selftest to support output shorter than 1 kB

Mehrdad Arshad Rad (1):
      libbpf: Remove duplicate assignments

Minghao Chi (1):
      samples/bpf: Remove unneeded variable

Paul E. McKenney (1):
      selftests/bpf: Update test names for xchg and cmpxchg

Quentin Monnet (3):
      bpftool: Add SPDX tags to RST documentation files
      bpftool: Update doc (use susbtitutions) and test_bpftool_synctypes.py
      selftests/bpf: Configure dir paths via env in test_bpftool_synctypes.py

Shuyi Cheng (1):
      libbpf: Add "bool skipped" to struct bpf_map

Song Liu (1):
      perf/bpf_counter: Use bpf_map_create instead of bpf_create_map

Stanislav Fomichev (1):
      bpftool: Add current libbpf_strict mode to version output

Tiezhu Yang (2):
      bpf: Change value of MAX_TAIL_CALL_CNT from 32 to 33
      bpf, mips: Fix build errors about __NR_bpf undeclared

Tirthendu Sarkar (1):
      selftests/bpf: Fix xdpxceiver failures for no hugepages

Vincent Minet (1):
      libbpf: Fix typo in btf__dedup@LIBBPF_0.0.2 definition

Yihao Han (1):
      samples/bpf: xdpsock: Fix swap.cocci warning

Yonghong Song (3):
      libbpf: Fix a couple of missed btf_type_tag handling in btf.c
      selftests/bpf: Add a dedup selftest with equivalent structure types
      selftests/bpf: Fix a compilation warning

Yucong Sun (3):
      selftests/bpf: Move summary line after the error logs
      selftests/bpf: Variable naming fix
      selftests/bpf: Mark variable as static

huangxuesen (1):
      libbpf: Fix trivial typo

 Documentation/bpf/btf.rst                          |   44 +-
 Documentation/bpf/faq.rst                          |   11 +
 Documentation/bpf/helpers.rst                      |    7 +
 Documentation/bpf/index.rst                        |  102 +-
 Documentation/bpf/instruction-set.rst              |  467 +++++++++
 Documentation/bpf/libbpf/index.rst                 |    4 +-
 Documentation/bpf/maps.rst                         |   52 +
 Documentation/bpf/other.rst                        |    9 +
 Documentation/bpf/{bpf_lsm.rst => prog_lsm.rst}    |    0
 Documentation/bpf/programs.rst                     |    9 +
 Documentation/bpf/syscall_api.rst                  |   11 +
 Documentation/bpf/test_debug.rst                   |    9 +
 Documentation/bpf/verifier.rst                     |  529 ++++++++++
 Documentation/networking/filter.rst                | 1036 +-------------------
 MAINTAINERS                                        |    2 +-
 arch/arm/net/bpf_jit_32.c                          |    7 +-
 arch/arm64/net/bpf_jit_comp.c                      |    7 +-
 arch/mips/net/bpf_jit_comp32.c                     |    3 +-
 arch/mips/net/bpf_jit_comp64.c                     |    2 +-
 arch/powerpc/net/bpf_jit_comp32.c                  |    4 +-
 arch/powerpc/net/bpf_jit_comp64.c                  |    4 +-
 arch/riscv/net/bpf_jit_comp32.c                    |    6 +-
 arch/riscv/net/bpf_jit_comp64.c                    |    7 +-
 arch/s390/net/bpf_jit_comp.c                       |    6 +-
 arch/sparc/net/bpf_jit_comp_64.c                   |    4 +-
 arch/x86/net/bpf_jit_comp.c                        |   14 +-
 arch/x86/net/bpf_jit_comp32.c                      |    4 +-
 include/linux/bpf.h                                |   11 +-
 include/linux/bpf_verifier.h                       |    7 +
 include/linux/btf.h                                |   89 +-
 include/uapi/linux/bpf.h                           |  105 +-
 kernel/bpf/Makefile                                |    4 +
 kernel/bpf/bpf_iter.c                              |   35 +
 kernel/bpf/bpf_struct_ops.c                        |    6 +-
 kernel/bpf/btf.c                                   |  410 +++++++-
 kernel/bpf/core.c                                  |    6 +-
 kernel/bpf/helpers.c                               |    2 +
 kernel/bpf/syscall.c                               |    4 +-
 kernel/bpf/verifier.c                              |  180 +++-
 kernel/trace/bpf_trace.c                           |    6 +-
 lib/test_bpf.c                                     |    4 +-
 net/core/filter.c                                  |   11 +-
 net/ipv4/bpf_tcp_ca.c                              |    6 +-
 samples/bpf/Makefile                               |   18 +-
 samples/bpf/Makefile.target                        |   11 -
 samples/bpf/cookie_uid_helper_example.c            |   14 +-
 samples/bpf/fds_example.c                          |   29 +-
 samples/bpf/hbm_kern.h                             |    2 -
 samples/bpf/lwt_len_hist_kern.c                    |    7 -
 samples/bpf/map_perf_test_user.c                   |   15 +-
 samples/bpf/sock_example.c                         |   12 +-
 samples/bpf/sockex1_user.c                         |   15 +-
 samples/bpf/sockex2_user.c                         |   14 +-
 samples/bpf/test_cgrp2_array_pin.c                 |    4 +-
 samples/bpf/test_cgrp2_attach.c                    |   13 +-
 samples/bpf/test_cgrp2_sock.c                      |    8 +-
 samples/bpf/test_lru_dist.c                        |   11 +-
 samples/bpf/trace_output_user.c                    |    4 +-
 samples/bpf/xdp_redirect_cpu.bpf.c                 |    4 +-
 samples/bpf/xdp_sample_pkts_user.c                 |   22 +-
 samples/bpf/xdp_sample_user.h                      |    2 +
 samples/bpf/xdpsock_ctrl_proc.c                    |    3 +
 samples/bpf/xdpsock_user.c                         |    3 +
 samples/bpf/xsk_fwd.c                              |    8 +-
 tools/bpf/bpftool/Documentation/Makefile           |    2 +-
 tools/bpf/bpftool/Documentation/bpftool-btf.rst    |    7 +-
 tools/bpf/bpftool/Documentation/bpftool-cgroup.rst |    7 +-
 .../bpf/bpftool/Documentation/bpftool-feature.rst  |    6 +-
 tools/bpf/bpftool/Documentation/bpftool-gen.rst    |    7 +-
 tools/bpf/bpftool/Documentation/bpftool-iter.rst   |    6 +-
 tools/bpf/bpftool/Documentation/bpftool-link.rst   |    7 +-
 tools/bpf/bpftool/Documentation/bpftool-map.rst    |    7 +-
 tools/bpf/bpftool/Documentation/bpftool-net.rst    |    6 +-
 tools/bpf/bpftool/Documentation/bpftool-perf.rst   |    6 +-
 tools/bpf/bpftool/Documentation/bpftool-prog.rst   |    6 +-
 .../bpftool/Documentation/bpftool-struct_ops.rst   |    6 +-
 tools/bpf/bpftool/Documentation/bpftool.rst        |    7 +-
 tools/bpf/bpftool/Documentation/common_options.rst |    2 +
 tools/bpf/bpftool/Documentation/substitutions.rst  |    3 +
 tools/bpf/bpftool/gen.c                            |   11 +-
 tools/bpf/bpftool/main.c                           |   12 +-
 tools/bpf/bpftool/map.c                            |   23 +-
 tools/bpf/bpftool/prog.c                           |   44 +-
 tools/bpf/bpftool/struct_ops.c                     |   15 +-
 tools/bpf/resolve_btfids/main.c                    |    5 +-
 tools/build/feature/test-bpf.c                     |    6 +
 tools/include/uapi/linux/bpf.h                     |  105 +-
 tools/lib/bpf/bpf.c                                |  234 +++--
 tools/lib/bpf/bpf.h                                |   55 +-
 tools/lib/bpf/bpf_gen_internal.h                   |    9 +-
 tools/lib/bpf/btf.c                                |  139 ++-
 tools/lib/bpf/btf.h                                |    2 +-
 tools/lib/bpf/btf_dump.c                           |    2 +-
 tools/lib/bpf/gen_loader.c                         |  160 ++-
 tools/lib/bpf/libbpf.c                             |  649 ++++++++----
 tools/lib/bpf/libbpf.h                             |  115 ++-
 tools/lib/bpf/libbpf.map                           |   15 +-
 tools/lib/bpf/libbpf_common.h                      |    5 +
 tools/lib/bpf/libbpf_internal.h                    |   24 +-
 tools/lib/bpf/libbpf_probes.c                      |   32 +-
 tools/lib/bpf/libbpf_version.h                     |    2 +-
 tools/lib/bpf/linker.c                             |    6 +-
 tools/lib/bpf/relo_core.c                          |  231 +++--
 tools/lib/bpf/relo_core.h                          |  103 +-
 tools/lib/bpf/skel_internal.h                      |   13 +-
 tools/lib/bpf/xsk.c                                |   18 +-
 tools/perf/tests/bpf.c                             |    4 +
 tools/perf/util/bpf-loader.c                       |    3 +
 tools/perf/util/bpf_counter.c                      |   18 +-
 tools/testing/selftests/bpf/Makefile               |   49 +-
 tools/testing/selftests/bpf/bench.c                |   47 +
 tools/testing/selftests/bpf/bench.h                |    2 +
 .../testing/selftests/bpf/benchs/bench_bpf_loop.c  |  105 ++
 tools/testing/selftests/bpf/benchs/bench_trigger.c |  146 +++
 .../selftests/bpf/benchs/run_bench_bpf_loop.sh     |   15 +
 tools/testing/selftests/bpf/benchs/run_common.sh   |   15 +
 .../selftests/bpf/map_tests/array_map_batch_ops.c  |   13 +-
 .../selftests/bpf/map_tests/htab_map_batch_ops.c   |   13 +-
 .../bpf/map_tests/lpm_trie_map_batch_ops.c         |   15 +-
 .../selftests/bpf/map_tests/sk_storage_map.c       |   52 +-
 tools/testing/selftests/bpf/prog_tests/atomics.c   |    4 +-
 .../selftests/bpf/prog_tests/bloom_filter_map.c    |   36 +-
 tools/testing/selftests/bpf/prog_tests/bpf_iter.c  |   13 +-
 tools/testing/selftests/bpf/prog_tests/bpf_loop.c  |  145 +++
 .../testing/selftests/bpf/prog_tests/bpf_tcp_ca.c  |    6 +-
 .../selftests/bpf/prog_tests/bpf_verif_scale.c     |   42 +-
 tools/testing/selftests/bpf/prog_tests/btf.c       |  127 ++-
 .../selftests/bpf/prog_tests/btf_dedup_split.c     |  113 +++
 tools/testing/selftests/bpf/prog_tests/btf_dump.c  |    4 +-
 .../selftests/bpf/prog_tests/cgroup_attach_multi.c |   12 +-
 .../selftests/bpf/prog_tests/connect_force_port.c  |   17 +-
 tools/testing/selftests/bpf/prog_tests/core_kern.c |   14 +
 .../testing/selftests/bpf/prog_tests/core_reloc.c  |    3 +-
 .../selftests/bpf/prog_tests/get_stack_raw_tp.c    |   14 +-
 tools/testing/selftests/bpf/prog_tests/kfree_skb.c |   58 +-
 .../testing/selftests/bpf/prog_tests/kfunc_call.c  |   24 +
 .../selftests/bpf/prog_tests/legacy_printk.c       |   65 ++
 tools/testing/selftests/bpf/prog_tests/log_buf.c   |  276 ++++++
 tools/testing/selftests/bpf/prog_tests/map_ptr.c   |   16 +-
 tools/testing/selftests/bpf/prog_tests/pinning.c   |    4 +-
 .../selftests/bpf/prog_tests/prog_array_init.c     |   32 +
 .../selftests/bpf/prog_tests/queue_stack_map.c     |   12 +-
 .../selftests/bpf/prog_tests/ringbuf_multi.c       |    4 +-
 .../selftests/bpf/prog_tests/select_reuseport.c    |   21 +-
 .../selftests/bpf/prog_tests/sockmap_basic.c       |    4 +-
 .../selftests/bpf/prog_tests/sockmap_ktls.c        |    2 +-
 .../selftests/bpf/prog_tests/sockmap_listen.c      |    4 +-
 .../selftests/bpf/prog_tests/sockopt_inherit.c     |   12 +-
 .../selftests/bpf/prog_tests/sockopt_multi.c       |   12 +-
 tools/testing/selftests/bpf/prog_tests/tcp_rtt.c   |   21 +-
 .../testing/selftests/bpf/prog_tests/test_bpffs.c  |    6 +-
 .../selftests/bpf/prog_tests/test_global_funcs.c   |   28 +-
 tools/testing/selftests/bpf/prog_tests/xdp.c       |   11 +-
 .../testing/selftests/bpf/prog_tests/xdp_bonding.c |   36 +-
 .../testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c |    6 +-
 tools/testing/selftests/bpf/progs/bpf_loop.c       |  112 +++
 tools/testing/selftests/bpf/progs/bpf_loop_bench.c |   26 +
 tools/testing/selftests/bpf/progs/core_kern.c      |  104 ++
 tools/testing/selftests/bpf/progs/map_ptr_kern.c   |   16 +-
 tools/testing/selftests/bpf/progs/pyperf.h         |   71 +-
 .../selftests/bpf/progs/pyperf600_bpf_loop.c       |    6 +
 tools/testing/selftests/bpf/progs/strobemeta.h     |   75 +-
 .../selftests/bpf/progs/strobemeta_bpf_loop.c      |    9 +
 .../testing/selftests/bpf/progs/test_ksyms_weak.c  |    2 +-
 .../selftests/bpf/progs/test_legacy_printk.c       |   73 ++
 tools/testing/selftests/bpf/progs/test_log_buf.c   |   24 +
 .../selftests/bpf/progs/test_prog_array_init.c     |   39 +
 .../selftests/bpf/progs/test_verif_scale2.c        |    4 +-
 tools/testing/selftests/bpf/progs/trigger_bench.c  |    7 +
 .../selftests/bpf/test_bpftool_synctypes.py        |   94 +-
 tools/testing/selftests/bpf/test_cgroup_storage.c  |    8 +-
 tools/testing/selftests/bpf/test_lpm_map.c         |   27 +-
 tools/testing/selftests/bpf/test_lru_map.c         |   16 +-
 tools/testing/selftests/bpf/test_maps.c            |  110 ++-
 tools/testing/selftests/bpf/test_progs.c           |   28 +-
 tools/testing/selftests/bpf/test_sock_addr.c       |   33 +-
 tools/testing/selftests/bpf/test_tag.c             |    5 +-
 tools/testing/selftests/bpf/test_verifier.c        |   54 +-
 tools/testing/selftests/bpf/testing_helpers.c      |   14 +-
 tools/testing/selftests/bpf/vmtest.sh              |   46 +-
 tools/testing/selftests/bpf/xdp_redirect_multi.c   |   15 +-
 tools/testing/selftests/bpf/xdpxceiver.c           |   12 +-
 182 files changed, 5748 insertions(+), 2568 deletions(-)
 create mode 100644 Documentation/bpf/faq.rst
 create mode 100644 Documentation/bpf/helpers.rst
 create mode 100644 Documentation/bpf/instruction-set.rst
 create mode 100644 Documentation/bpf/maps.rst
 create mode 100644 Documentation/bpf/other.rst
 rename Documentation/bpf/{bpf_lsm.rst => prog_lsm.rst} (100%)
 create mode 100644 Documentation/bpf/programs.rst
 create mode 100644 Documentation/bpf/syscall_api.rst
 create mode 100644 Documentation/bpf/test_debug.rst
 create mode 100644 Documentation/bpf/verifier.rst
 create mode 100644 tools/bpf/bpftool/Documentation/substitutions.rst
 create mode 100644 tools/testing/selftests/bpf/benchs/bench_bpf_loop.c
 create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_bpf_loop.sh
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_loop.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_kern.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/legacy_printk.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/log_buf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/prog_array_init.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_loop.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_loop_bench.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/pyperf600_bpf_loop.c
 create mode 100644 tools/testing/selftests/bpf/progs/strobemeta_bpf_loop.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_legacy_printk.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_log_buf.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_prog_array_init.c

^ permalink raw reply

* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Vladimir Oltean @ 2021-12-10 19:54 UTC (permalink / raw)
  To: Ansuel Smith
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <61b3ae6b.1c69fb81.9a57f.8856@mx.google.com>

On Fri, Dec 10, 2021 at 08:45:43PM +0100, Ansuel Smith wrote:
> > Anyway the reason why I didn't say anything about this is because I
> > don't yet understand how it is supposed to work. Specifically:
> > 
> > rtnl_lock
> > 
> > dev_open()
> > -> __dev_open()
> >    -> dev->flags |= IFF_UP;
> >    -> dev_activate()
> >       -> transition_one_qdisc()
> > -> call_netdevice_notifiers(NETDEV_UP, dev);
> > 
> > rtnl_unlock
> > 
> > so the qdisc should have already transitioned by the time NETDEV_UP is
> > emitted.
> > 
> > and since we already require a NETDEV_UP to have occurred, or dev->flags
> > to contain IFF_UP, I simply don't understand the following
> > (a) why would the qdisc be noop when we catch NETDEV_UP
> > (b) who calls netdev_state_change() (or __dev_notify_flags ?!) after the
> >     qdisc changes on a TX queue? If no one, then I'm not sure how we can
> >     reliably check for the state of the qdisc if we aren't notified
> >     about changes to it.
> 
> The ipv6 check is just a hint. The real clue was the second
> NETDEV_CHANGE called by linkwatch_do_dev in link_watch.c
> That is the one that calls the CHANGE event before the ready stuff.
> 
> I had problem tracking this as the change logic is "emit CHANGE when flags
> change" but netdev_state_change is also called for other reason and one
> example is dev_activate/dev_deactivate from linkwatch_do_dev.
> It seems a bit confusing that a generic state change is called even when
> flags are not changed and because of this is a bit problematic track why
> the CHANGE event was called.
> 
> Wonder if linkwatch_do_dev should be changed and introduce a flag? But
> that seems problematic if for whatever reason a driver use the CHANGE
> event to track exactly dev_activate/deactivate.

Yes, I had my own "aha" moment just minutes before you sent this email
about linkwatch_do_dev. So indeed that's the source of both the
dev_activate(), as well as the netdev_state_change() notifier.

As to my previous question (why would the qdisc be noop when we catch
NETDEV_UP): the answer is of course in the code as well:

dev_activate() has:
	if (!netif_carrier_ok(dev))
		/* Delay activation until next carrier-on event */
		return;

which is then actually picked up from linkwatch_do_dev().

Let's not change linkwatch_do_dev(), I just wanted to understand why it
works. Please confirm that it also works for you to make master_admin_up
depend on qdisc_tx_is_noop() instead of the current ingress_queue check,
then add a comment stating the mechanism through which we are tracking
the dev_activate() calls, and then this should be good to go.
I'd like you to pick up the patches and post them together with your
driver changes. I can't post the patches on my own since I don't have
any use for them. I'll leave a few more "review" comments on them in a
minute.

^ permalink raw reply

* Re: [net v5 2/3] net: sched: add check tc_skip_classify in sch egress
From: Tonghao Zhang @ 2021-12-10 19:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Eric Dumazet,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Arnd Bergmann
In-Reply-To: <CAMDZJNXL5qSfFv54A=RrMwHe8DOv48EfrypHb1FFSUFu36-9DQ@mail.gmail.com>

On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > >
> > > xiangxia.m.yue@ wrote:
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > Try to resolve the issues as below:
> > > > * We look up and then check tc_skip_classify flag in net
> > > >   sched layer, even though skb don't want to be classified.
> > > >   That case may consume a lot of cpu cycles. This patch
> > > >   is useful when there are a lot of filters with different
> > > >   prio. There is ~5 prio in in production, ~1% improvement.
> > > >
> > > >   Rules as below:
> > > >   $ for id in $(seq 1 5); do
> > > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > > >   $ done
> > > >
> > > > * bpf_redirect may be invoked in egress path. If we don't
> > > >   check the flags and then return immediately, the packets
> > > >   will loopback.
> > >
> > > This would be the naive case right? Meaning the BPF program is
> > > doing a redirect without any logic or is buggy?
> > >
> > > Can you map out how this happens for me, I'm not fully sure I
> > > understand the exact concern. Is it possible for BPF programs
> > > that used to see packets no longer see the packet as expected?
> > >
> > > Is this the path you are talking about?
> > Hi John
> > Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > netdev, right ?
> > so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >
> > >  rx ethx  ->
> > >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> > >        dev_queue_xmit() ->
> > >           sch_handle_egress() ->
> > >             execute BPF program again
> > >
> > > I can't see why you want to skip that second tc BPF program,
> > > or for that matter any tc filter there. In general how do you
> > > know that is the correct/expected behavior? Before the above
> > > change it would have been called, what if its doing useful
> > > work.
> > bpf_redirect works fine on ingress with ifb
> > __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> > -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> > but
> > __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> > execute the BPF progam again)
> so the packets will NOT execute the BPF progam again)
>
> > > Also its not clear how your ifb setup is built or used. That
> > > might help understand your use case. I would just remove the
> > > IFB altogether and the above discussion is mute.
> > tc filter add dev veth1 egress bpf direct-action obj
> > test_bpf_redirect_ifb.o sec redirect_ifb
> >
> > the test_bpf_redirect_ifb  bpf progam:
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2021 DiDi Global */
> > +
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +SEC("redirect_ifb")
> > +int redirect(struct __sk_buff *skb)
> > +{
> > +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> > +}
> > +
> > +char __license[] SEC("license") = "GPL";
> >
> > The 3/3 is selftest:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >
> > > Thanks,
> > > John
Hi John
Did I answer your question? I hope I explained things clearly.
> >
> >
> > --
> > Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao



-- 
Best regards, Tonghao

^ permalink raw reply

* [PATCH net-next 0/2] bareudp: Remove unused code from header file
From: Guillaume Nault @ 2021-12-10 19:56 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Martin Varghese

Stop exporting unused functions and structures in bareudp.h. The only
piece of bareudp.h that is actually used is netif_is_bareudp(). The
rest can be moved to bareudp.c or even dropped entirely.


Guillaume Nault (2):
  bareudp: Remove bareudp_dev_create()
  bareudp: Move definition of struct bareudp_conf to bareudp.c

 drivers/net/bareudp.c | 41 +++++++----------------------------------
 include/net/bareudp.h | 13 +------------
 2 files changed, 8 insertions(+), 46 deletions(-)

-- 
2.21.3


^ permalink raw reply

* [PATCH net-next 1/2] bareudp: Remove bareudp_dev_create()
From: Guillaume Nault @ 2021-12-10 19:56 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Martin Varghese
In-Reply-To: <cover.1639166064.git.gnault@redhat.com>

There's no user for this function.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 drivers/net/bareudp.c | 34 ----------------------------------
 include/net/bareudp.h |  4 ----
 2 files changed, 38 deletions(-)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index edffc3489a12..fb71a0753385 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -721,40 +721,6 @@ static struct rtnl_link_ops bareudp_link_ops __read_mostly = {
 	.fill_info      = bareudp_fill_info,
 };
 
-struct net_device *bareudp_dev_create(struct net *net, const char *name,
-				      u8 name_assign_type,
-				      struct bareudp_conf *conf)
-{
-	struct nlattr *tb[IFLA_MAX + 1];
-	struct net_device *dev;
-	int err;
-
-	memset(tb, 0, sizeof(tb));
-	dev = rtnl_create_link(net, name, name_assign_type,
-			       &bareudp_link_ops, tb, NULL);
-	if (IS_ERR(dev))
-		return dev;
-
-	err = bareudp_configure(net, dev, conf);
-	if (err) {
-		free_netdev(dev);
-		return ERR_PTR(err);
-	}
-	err = dev_set_mtu(dev, IP_MAX_MTU - BAREUDP_BASE_HLEN);
-	if (err)
-		goto err;
-
-	err = rtnl_configure_link(dev, NULL);
-	if (err < 0)
-		goto err;
-
-	return dev;
-err:
-	bareudp_dellink(dev, NULL);
-	return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(bareudp_dev_create);
-
 static __net_init int bareudp_init_net(struct net *net)
 {
 	struct bareudp_net *bn = net_generic(net, bareudp_net_id);
diff --git a/include/net/bareudp.h b/include/net/bareudp.h
index dc65a0d71d9b..8f07a91e0f25 100644
--- a/include/net/bareudp.h
+++ b/include/net/bareudp.h
@@ -14,10 +14,6 @@ struct bareudp_conf {
 	bool multi_proto_mode;
 };
 
-struct net_device *bareudp_dev_create(struct net *net, const char *name,
-				      u8 name_assign_type,
-				      struct bareudp_conf *info);
-
 static inline bool netif_is_bareudp(const struct net_device *dev)
 {
 	return dev->rtnl_link_ops &&
-- 
2.21.3


^ permalink raw reply related

* [PATCH net-next 2/2] bareudp: Move definition of struct bareudp_conf to bareudp.c
From: Guillaume Nault @ 2021-12-10 19:56 UTC (permalink / raw)
  To: David Miller, Jakub Kicinski; +Cc: netdev, Martin Varghese
In-Reply-To: <cover.1639166064.git.gnault@redhat.com>

This structure is used only in bareudp.c.

While there, adjust include files: we need netdevice.h, not skbuff.h.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
---
 drivers/net/bareudp.c | 7 +++++++
 include/net/bareudp.h | 9 +--------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bareudp.c b/drivers/net/bareudp.c
index fb71a0753385..f80330361399 100644
--- a/drivers/net/bareudp.c
+++ b/drivers/net/bareudp.c
@@ -38,6 +38,13 @@ struct bareudp_net {
 	struct list_head        bareudp_list;
 };
 
+struct bareudp_conf {
+	__be16 ethertype;
+	__be16 port;
+	u16 sport_min;
+	bool multi_proto_mode;
+};
+
 /* Pseudo network device */
 struct bareudp_dev {
 	struct net         *net;        /* netns for packet i/o */
diff --git a/include/net/bareudp.h b/include/net/bareudp.h
index 8f07a91e0f25..17610c8d6361 100644
--- a/include/net/bareudp.h
+++ b/include/net/bareudp.h
@@ -3,17 +3,10 @@
 #ifndef __NET_BAREUDP_H
 #define __NET_BAREUDP_H
 
+#include <linux/netdevice.h>
 #include <linux/types.h>
-#include <linux/skbuff.h>
 #include <net/rtnetlink.h>
 
-struct bareudp_conf {
-	__be16 ethertype;
-	__be16 port;
-	u16 sport_min;
-	bool multi_proto_mode;
-};
-
 static inline bool netif_is_bareudp(const struct net_device *dev)
 {
 	return dev->rtnl_link_ops &&
-- 
2.21.3


^ permalink raw reply related

* [RFC net-next 0/4] net: Improving network scheduling latencies
From: Yannick Vignon @ 2021-12-10 19:35 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue, netdev, Ong Boon Leong,
	David S. Miller, Jakub Kicinski, Jose Abreu, Eric Dumazet,
	Wei Wang, Alexander Lobakin, Vladimir Oltean, Xiaoliang Yang,
	mingkai.hu, Joakim Zhang, sebastien.laveze

I am working on an application to showcase TSN use cases. That
application wakes up periodically, reads packet(s) from the network,
sends packet(s), then goes back to sleep. Endpoints are synchronized
through gPTP, and a 802.1Qbv schedule is in place to ensure packets are
sent at a fixed time. Right now, we achieve an overal period of 2ms,
which results in 500µs between the time the application is supposed to
wake up to the time the last packet is sent. We use an NXP kernel 5.10.x
with PREEMPT_RT patches.
I've been focusing lately on reducing the period, to see how close a
Linux-based system could get to a micro-controller with a "real-time"
OS. I've been able to achieve 500µs overall (125µs for the app itself)
by using AF_XDP sockets, but this also led to identifying several
sources of "scheduling" latencies, which I've tried to resolve with the
patches attached. The main culprit so far has been
local_bh_disable/local_bh_enable sections running in lower prio tasks,
requiring costly context switches along with priority inheritance. I've
removed the offending sections without significant problems so far, but
I'm not entirely clear though on the reason local_disable/enable were
used in those places: is it some simple oversight, an excess of caution,
or am I missing something more fundamental in the way those locks are
used?

Thanks,
Yannick

Yannick Vignon (4)
  net: stmmac: remove unnecessary locking around PTP clock reads
  net: stmmac: do not use __netif_tx_lock_bh when in NAPI threaded mode
  net: stmmac: move to threaded IRQ
  net: napi threaded: remove unnecessary locking

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 44
+++++++++++++++++++++++++-------------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c  |  2 --
 net/core/dev.c                                    |  2 --
 3 files changed, 25 insertions(+), 23 deletions(-)

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox