Netdev List
* [PATCH v20 bpf-next 10/23] net: mvneta: enable jumbo frames if the loaded XDP program support mb
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Allow the interface to receive jumbo frames while running in XDP mode,
provided the loaded program declares that it properly supports XDP
multi-buffer. At the same time, reject an XDP program that does not
support XDP multi-buffer if the driver is running in XDP multi-buffer
mode (that is, if the MTU exceeds MVNETA_MAX_RX_BUF_SIZE).

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
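Reader's note: the check added to mvneta_change_mtu() and
mvneta_xdp_setup() can be sketched in userspace as below. This is only a
model under stated assumptions: struct prog_model, xdp_mtu_compatible()
and MODEL_MAX_RX_BUF_SIZE are hypothetical names, and the limit value is
illustrative rather than the driver's real MVNETA_MAX_RX_BUF_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative limit only; the driver uses MVNETA_MAX_RX_BUF_SIZE. */
#define MODEL_MAX_RX_BUF_SIZE 3776

/* Hypothetical stand-in for the part of struct bpf_prog the check reads. */
struct prog_model {
	bool xdp_mb; /* mirrors prog->aux->xdp_mb */
};

/*
 * Returns true when the MTU and the program can coexist: no program is
 * attached, the program is multi-buffer aware, or the MTU fits in a
 * single RX buffer.
 */
bool xdp_mtu_compatible(const struct prog_model *prog, int mtu)
{
	assert(mtu >= 0);

	if (prog == NULL)
		return true;
	if (prog->xdp_mb)
		return true;
	return mtu <= MODEL_MAX_RX_BUF_SIZE;
}
```

The same predicate covers both call sites: changing the MTU with a
program already attached, and attaching a program with the MTU already
set.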
 drivers/net/ethernet/marvell/mvneta.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 332699960b53..98db3d03116a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3750,6 +3750,7 @@ static void mvneta_percpu_disable(void *arg)
 static int mvneta_change_mtu(struct net_device *dev, int mtu)
 {
 	struct mvneta_port *pp = netdev_priv(dev);
+	struct bpf_prog *prog = pp->xdp_prog;
 	int ret;
 
 	if (!IS_ALIGNED(MVNETA_RX_PKT_SIZE(mtu), 8)) {
@@ -3758,8 +3759,11 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
 	}
 
-	if (pp->xdp_prog && mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		netdev_info(dev, "Illegal MTU value %d for XDP mode\n", mtu);
+	if (prog && !prog->aux->xdp_mb && mtu > MVNETA_MAX_RX_BUF_SIZE) {
+		netdev_info(dev,
+			    "Illegal MTU %d for XDP prog without multi-buf\n",
+			    mtu);
+
 		return -EINVAL;
 	}
 
@@ -4428,8 +4432,9 @@ static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct mvneta_port *pp = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 
-	if (prog && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
+	if (prog && !prog->aux->xdp_mb && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "prog does not support XDP multi-buff");
 		return -EOPNOTSUPP;
 	}
 
-- 
2.33.1



* [PATCH v20 bpf-next 09/23] bpf: introduce BPF_F_XDP_MB flag in prog_flags loading the ebpf program
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce the BPF_F_XDP_MB flag and the related xdp_mb field in
bpf_prog_aux to notify the driver that the loaded program supports XDP
multi-buffer.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/bpf.h            | 1 +
 include/uapi/linux/bpf.h       | 5 +++++
 kernel/bpf/syscall.c           | 4 +++-
 tools/include/uapi/linux/bpf.h | 5 +++++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8bbf08fbab66..e516815e35f9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -875,6 +875,7 @@ struct bpf_prog_aux {
 	bool func_proto_unreliable;
 	bool sleepable;
 	bool tail_call_reachable;
+	bool xdp_mb;
 	struct hlist_node tramp_hlist;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c26871263f1f..d5d3e7d9ec49 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1111,6 +1111,11 @@ enum bpf_link_type {
  */
 #define BPF_F_SLEEPABLE		(1U << 4)
 
+/* If BPF_F_XDP_MB is used in BPF_PROG_LOAD command, the loaded program
+ * fully support xdp multi-buffer
+ */
+#define BPF_F_XDP_MB		(1U << 5)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b3ada4085f85..82626a95be99 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2202,7 +2202,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
 				 BPF_F_SLEEPABLE |
-				 BPF_F_TEST_RND_HI32))
+				 BPF_F_TEST_RND_HI32 |
+				 BPF_F_XDP_MB))
 		return -EINVAL;
 
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
@@ -2288,6 +2289,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
 	prog->aux->dst_prog = dst_prog;
 	prog->aux->offload_requested = !!attr->prog_ifindex;
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
+	prog->aux->xdp_mb = attr->prog_flags & BPF_F_XDP_MB;
 
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c26871263f1f..d5d3e7d9ec49 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1111,6 +1111,11 @@ enum bpf_link_type {
  */
 #define BPF_F_SLEEPABLE		(1U << 4)
 
+/* If BPF_F_XDP_MB is used in BPF_PROG_LOAD command, the loaded program
+ * fully support xdp multi-buffer
+ */
+#define BPF_F_XDP_MB		(1U << 5)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * the following extensions:
  *
-- 
2.33.1



* [PATCH v20 bpf-next 08/23] net: mvneta: add multi buffer support to XDP_TX
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce the capability to map a non-linear XDP buffer when running
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
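Reader's note: the descriptor layout that mvneta_xdp_submit_frame()
builds for a multi-buffer frame can be sketched as follows. One
descriptor is used for the linear head plus one per fragment; the first
carries the "first" flag and the last carries the "last" and "zero pad"
flags. model_fill_tx_cmds() and the MODEL_TXD_* bit values are
hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values; the driver has its own MVNETA_TXD_* macros. */
#define MODEL_TXD_Z_PAD		(1U << 19)
#define MODEL_TXD_L_DESC	(1U << 20)
#define MODEL_TXD_F_DESC	(1U << 21)

/*
 * Fill cmd[] with one command word per descriptor for a frame made of a
 * linear head plus nr_frags fragments. Returns the number of descriptors
 * used, or 0 if the frame fits neither in the queue nor in cmd[].
 */
size_t model_fill_tx_cmds(size_t nr_frags, size_t txq_count, size_t txq_size,
			  uint32_t *cmd, size_t cmd_len)
{
	size_t num_descs = 1 + nr_frags;
	size_t i;

	assert(cmd != NULL);

	/* Same admission test as the patch: count + num_frames >= size. */
	if (txq_count + num_descs >= txq_size || num_descs > cmd_len)
		return 0;

	/* Only the first descriptor is marked as "first". */
	for (i = 0; i < num_descs; i++)
		cmd[i] = (i == 0) ? MODEL_TXD_F_DESC : 0;

	/* The last descriptor closes the frame; it may also be the first. */
	cmd[num_descs - 1] |= MODEL_TXD_L_DESC | MODEL_TXD_Z_PAD;

	return num_descs;
}
```

For a linear frame the single descriptor ends up with all three flags
set, which matches the MVNETA_TXD_FLZ_DESC value the old code wrote.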
 drivers/net/ethernet/marvell/mvneta.c | 112 +++++++++++++++++---------
 1 file changed, 76 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e6977815b805..332699960b53 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1856,8 +1856,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 			bytes_compl += buf->skb->len;
 			pkts_compl++;
 			dev_kfree_skb_any(buf->skb);
-		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
-			   buf->type == MVNETA_TYPE_XDP_NDO) {
+		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
+			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
 			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
 				xdp_return_frame_rx_napi(buf->xdpf);
 			else
@@ -2051,47 +2051,87 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 static int
 mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
-			struct xdp_frame *xdpf, bool dma_map)
+			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
 {
-	struct mvneta_tx_desc *tx_desc;
-	struct mvneta_tx_buf *buf;
-	dma_addr_t dma_addr;
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+	struct device *dev = pp->dev->dev.parent;
+	struct mvneta_tx_desc *tx_desc = NULL;
+	int i, num_frames = 1;
+	struct page *page;
+
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		num_frames += sinfo->nr_frags;
 
-	if (txq->count >= txq->tx_stop_threshold)
+	if (txq->count + num_frames >= txq->size)
 		return MVNETA_XDP_DROPPED;
 
-	tx_desc = mvneta_txq_next_desc_get(txq);
+	for (i = 0; i < num_frames; i++) {
+		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
+		skb_frag_t *frag = NULL;
+		int len = xdpf->len;
+		dma_addr_t dma_addr;
 
-	buf = &txq->buf[txq->txq_put_index];
-	if (dma_map) {
-		/* ndo_xdp_xmit */
-		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
-					  xdpf->len, DMA_TO_DEVICE);
-		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
-			mvneta_txq_desc_put(txq);
-			return MVNETA_XDP_DROPPED;
+		if (unlikely(i)) { /* paged area */
+			frag = &sinfo->frags[i - 1];
+			len = skb_frag_size(frag);
 		}
-		buf->type = MVNETA_TYPE_XDP_NDO;
-	} else {
-		struct page *page = virt_to_page(xdpf->data);
 
-		dma_addr = page_pool_get_dma_addr(page) +
-			   sizeof(*xdpf) + xdpf->headroom;
-		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
-					   xdpf->len, DMA_BIDIRECTIONAL);
-		buf->type = MVNETA_TYPE_XDP_TX;
+		tx_desc = mvneta_txq_next_desc_get(txq);
+		if (dma_map) {
+			/* ndo_xdp_xmit */
+			void *data;
+
+			data = unlikely(frag) ? skb_frag_address(frag)
+					      : xdpf->data;
+			dma_addr = dma_map_single(dev, data, len,
+						  DMA_TO_DEVICE);
+			if (dma_mapping_error(dev, dma_addr)) {
+				mvneta_txq_desc_put(txq);
+				goto unmap;
+			}
+
+			buf->type = MVNETA_TYPE_XDP_NDO;
+		} else {
+			page = unlikely(frag) ? skb_frag_page(frag)
+					      : virt_to_page(xdpf->data);
+			dma_addr = page_pool_get_dma_addr(page);
+			if (unlikely(frag))
+				dma_addr += skb_frag_off(frag);
+			else
+				dma_addr += sizeof(*xdpf) + xdpf->headroom;
+			dma_sync_single_for_device(dev, dma_addr, len,
+						   DMA_BIDIRECTIONAL);
+			buf->type = MVNETA_TYPE_XDP_TX;
+		}
+		buf->xdpf = unlikely(i) ? NULL : xdpf;
+
+		tx_desc->command = unlikely(i) ? 0 : MVNETA_TXD_F_DESC;
+		tx_desc->buf_phys_addr = dma_addr;
+		tx_desc->data_size = len;
+		*nxmit_byte += len;
+
+		mvneta_txq_inc_put(txq);
 	}
-	buf->xdpf = xdpf;
 
-	tx_desc->command = MVNETA_TXD_FLZ_DESC;
-	tx_desc->buf_phys_addr = dma_addr;
-	tx_desc->data_size = xdpf->len;
+	/*last descriptor */
+	if (likely(tx_desc))
+		tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
 
-	mvneta_txq_inc_put(txq);
-	txq->pending++;
-	txq->count++;
+	txq->pending += num_frames;
+	txq->count += num_frames;
 
 	return MVNETA_XDP_TX;
+
+unmap:
+	for (i--; i >= 0; i--) {
+		mvneta_txq_desc_put(txq);
+		tx_desc = txq->descs + txq->next_desc_to_proc;
+		dma_unmap_single(dev, tx_desc->buf_phys_addr,
+				 tx_desc->data_size,
+				 DMA_TO_DEVICE);
+	}
+
+	return MVNETA_XDP_DROPPED;
 }
 
 static int
@@ -2100,8 +2140,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 	struct mvneta_tx_queue *txq;
 	struct netdev_queue *nq;
+	int cpu, nxmit_byte = 0;
 	struct xdp_frame *xdpf;
-	int cpu;
 	u32 ret;
 
 	xdpf = xdp_convert_buff_to_frame(xdp);
@@ -2113,10 +2153,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	nq = netdev_get_tx_queue(pp->dev, txq->id);
 
 	__netif_tx_lock(nq, cpu);
-	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
+	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
 	if (ret == MVNETA_XDP_TX) {
 		u64_stats_update_begin(&stats->syncp);
-		stats->es.ps.tx_bytes += xdpf->len;
+		stats->es.ps.tx_bytes += nxmit_byte;
 		stats->es.ps.tx_packets++;
 		stats->es.ps.xdp_tx++;
 		u64_stats_update_end(&stats->syncp);
@@ -2155,11 +2195,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
 
 	__netif_tx_lock(nq, cpu);
 	for (i = 0; i < num_frame; i++) {
-		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
+		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
+					      true);
 		if (ret != MVNETA_XDP_TX)
 			break;
 
-		nxmit_byte += frames[i]->len;
 		nxmit++;
 	}
 
-- 
2.33.1



* [PATCH v20 bpf-next 07/23] xdp: add multi-buff support to xdp_return_{buff/frame}
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Take into account whether the received xdp_buff/xdp_frame is non-linear
when recycling/returning the frame memory to the allocator or into
xdp_frame_bulk.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
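Reader's note: the pattern shared by the xdp_return_*() helpers in this
patch is "release every fragment page, then the head, and look at
nr_frags only when the mb bit is set". A minimal userspace sketch,
assuming integer ids in place of pages; struct frame_model,
model_return_frame() and MODEL_MAX_FRAGS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 17 /* illustrative stand-in for MAX_SKB_FRAGS */

struct frame_model {
	bool mb;			/* multi-buffer bit */
	size_t nr_frags;		/* only meaningful when mb is set */
	int frag_id[MODEL_MAX_FRAGS];	/* ids standing in for frag pages */
	int head_id;			/* id standing in for the head page */
};

/*
 * Record buffer ids in the order they would be released and return how
 * many were recorded. Fragments go first, the head goes last.
 */
size_t model_return_frame(const struct frame_model *f, int *released,
			  size_t cap)
{
	size_t n = 0, i;

	assert(f != NULL && released != NULL);

	if (f->mb) {
		assert(f->nr_frags <= MODEL_MAX_FRAGS);
		for (i = 0; i < f->nr_frags && n < cap; i++)
			released[n++] = f->frag_id[i];
	}

	if (n < cap)
		released[n++] = f->head_id;

	return n;
}
```

Skipping the fragment walk when mb is clear is what lets the linear
fast path avoid touching the shared info area at all.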
 include/net/xdp.h | 18 ++++++++++++++--
 net/core/xdp.c    | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index e594016eb193..798b84d86d97 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -306,10 +306,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
 static inline void xdp_release_frame(struct xdp_frame *xdpf)
 {
 	struct xdp_mem_info *mem = &xdpf->mem;
+	struct skb_shared_info *sinfo;
+	int i;
 
 	/* Curr only page_pool needs this */
-	if (mem->type == MEM_TYPE_PAGE_POOL)
-		__xdp_release_frame(xdpf->data, mem);
+	if (mem->type != MEM_TYPE_PAGE_POOL)
+		return;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_release_frame(page_address(page), mem);
+	}
+out:
+	__xdp_release_frame(xdpf->data, mem);
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 89183b2e3c07..7cfcc93116d7 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -374,12 +374,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -415,7 +441,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 	struct xdp_mem_allocator *xa;
 
 	if (mem->type != MEM_TYPE_PAGE_POOL) {
-		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+		xdp_return_frame(xdpf);
 		return;
 	}
 
@@ -434,12 +460,38 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 	}
 
+	if (unlikely(xdp_frame_is_mb(xdpf))) {
+		struct skb_shared_info *sinfo;
+		int i;
+
+		sinfo = xdp_get_shared_info_from_frame(xdpf);
+		for (i = 0; i < sinfo->nr_frags; i++) {
+			skb_frag_t *frag = &sinfo->frags[i];
+
+			bq->q[bq->count++] = skb_frag_address(frag);
+			if (bq->count == XDP_BULK_QUEUE_SIZE)
+				xdp_flush_frame_bulk(bq);
+		}
+	}
 	bq->q[bq->count++] = xdpf->data;
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+	}
+out:
 	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
 }
 
-- 
2.33.1



* [PATCH v20 bpf-next 06/23] net: marvell: rely on xdp_update_skb_shared_info utility routine
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Rely on the xdp_update_skb_shared_info routine to avoid resetting the
frags array in the skb_shared_info structure when building the skb in
mvneta_swbm_build_skb(). The frags array is expected to be initialized
by the receiving driver when it builds the xdp_buff, so here we only
need to update the memory metadata.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 567358a021a9..e6977815b805 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2304,8 +2304,12 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
 
-		if (!xdp_buff_is_mb(xdp))
+		if (!xdp_buff_is_mb(xdp)) {
+			sinfo->xdp_frags_size = *size;
 			xdp_buff_set_mb(xdp);
+		}
+		if (page_is_pfmemalloc(page))
+			xdp_buff_set_frag_pfmemalloc(xdp);
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
@@ -2319,7 +2323,6 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	struct sk_buff *skb;
 	u8 num_frags;
-	int i;
 
 	if (unlikely(xdp_buff_is_mb(xdp)))
 		num_frags = sinfo->nr_frags;
@@ -2334,18 +2337,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
 
-	if (likely(!xdp_buff_is_mb(xdp)))
-		goto out;
-
-	for (i = 0; i < num_frags; i++) {
-		skb_frag_t *frag = &sinfo->frags[i];
-
-		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-				skb_frag_page(frag), skb_frag_off(frag),
-				skb_frag_size(frag), PAGE_SIZE);
-	}
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		xdp_update_skb_shared_info(skb, num_frags,
+					   sinfo->xdp_frags_size,
+					   num_frags * xdp->frame_sz,
+					   xdp_buff_is_frag_pfmemalloc(xdp));
 
-out:
 	return skb;
 }
 
-- 
2.33.1



* [PATCH v20 bpf-next 05/23] net: xdp: add xdp_update_skb_shared_info utility routine
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce the xdp_update_skb_shared_info routine to update the frags
array metadata in the skb_shared_info data structure when converting an
xdp_buff or xdp_frame to an skb.
Given the current skb_shared_info layout in xdp_frame/xdp_buff and the
XDP multi-buff support, there is no need to run skb_add_rx_frag() and
reset the frags array when converting the buffer to an skb: the frags
array is in the same position for the xdp_buff/xdp_frame and for the
skb, so we only need to update the memory metadata.
Introduce the XDP_FLAGS_FRAGS_PF_MEMALLOC flag in xdp_buff_flags to mark
the xdp_buff or xdp_frame as under memory pressure if pages of the frags
array are under memory pressure. This avoids looping over all fragments
in the xdp_update_skb_shared_info routine. The driver is expected to set
the flag when constructing the xdp_buff, using the
xdp_buff_set_frag_pfmemalloc utility routine.
Rely on xdp_update_skb_shared_info in the __xdp_build_skb_from_frame
routine when converting a multi-buff xdp_frame to an skb after
performing an XDP_REDIRECT.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
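Reader's note: the accounting done by xdp_update_skb_shared_info() can
be modelled in userspace as below. The arithmetic follows the helper in
this patch; struct skb_model and model_update_skb_shared_info() are
hypothetical stand-ins for struct sk_buff and the real helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sk_buff fields the helper touches. */
struct skb_model {
	unsigned int len;	/* total bytes: linear part plus frags */
	unsigned int data_len;	/* bytes held in frags only */
	unsigned int truesize;	/* memory charged to the skb */
	unsigned char nr_frags;	/* mirrors skb_shinfo(skb)->nr_frags */
	bool pfmemalloc;
};

/*
 * The frags array is already in place, so only the counters are
 * updated: no per-fragment loop is needed.
 */
void model_update_skb_shared_info(struct skb_model *skb,
				  unsigned char nr_frags, unsigned int size,
				  unsigned int truesize, bool pfmemalloc)
{
	assert(skb != NULL);

	skb->nr_frags = nr_frags;
	skb->len += size;
	skb->data_len += size;
	skb->truesize += truesize;
	skb->pfmemalloc = skb->pfmemalloc || pfmemalloc;
}
```

With a 100-byte linear part, two fragments totalling 3000 bytes and a
4096-byte frame size, the skb ends up with len 3100, data_len 3000 and
truesize grown by 8192.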
 include/net/xdp.h | 33 ++++++++++++++++++++++++++++++++-
 net/core/xdp.c    | 12 ++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 4ec7bdf0d937..e594016eb193 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -67,7 +67,10 @@ struct xdp_txq_info {
 };
 
 enum xdp_buff_flags {
-	XDP_FLAGS_MULTI_BUFF	= BIT(0), /* non-linear xdp buff */
+	XDP_FLAGS_MULTI_BUFF		= BIT(0), /* non-linear xdp buff */
+	XDP_FLAGS_FRAGS_PF_MEMALLOC	= BIT(1), /* xdp multi-buff paged memory
+						   * is under pressure
+						   */
 };
 
 struct xdp_buff {
@@ -96,6 +99,16 @@ static __always_inline void xdp_buff_clear_mb(struct xdp_buff *xdp)
 	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
 }
 
+static __always_inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *xdp)
+{
+	return !!(xdp->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
+}
+
+static __always_inline void xdp_buff_set_frag_pfmemalloc(struct xdp_buff *xdp)
+{
+	xdp->flags |= XDP_FLAGS_FRAGS_PF_MEMALLOC;
+}
+
 static __always_inline void
 xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
@@ -151,6 +164,11 @@ static __always_inline bool xdp_frame_is_mb(struct xdp_frame *frame)
 	return !!(frame->flags & XDP_FLAGS_MULTI_BUFF);
 }
 
+static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame *frame)
+{
+	return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
+}
+
 #define XDP_BULK_QUEUE_SIZE	16
 struct xdp_frame_bulk {
 	int count;
@@ -186,6 +204,19 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+static inline void
+xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
+			   unsigned int size, unsigned int truesize,
+			   bool pfmemalloc)
+{
+	skb_shinfo(skb)->nr_frags = nr_frags;
+
+	skb->len += size;
+	skb->data_len += size;
+	skb->truesize += truesize;
+	skb->pfmemalloc |= pfmemalloc;
+}
+
 /* Avoids inlining WARN macro in fast-path */
 void xdp_warn(const char *msg, const char *func, const int line);
 #define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 5ddc29f29bad..89183b2e3c07 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -529,8 +529,14 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
 	unsigned int headroom, frame_size;
 	void *hard_start;
+	u8 nr_frags;
+
+	/* xdp multi-buff frame */
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		nr_frags = sinfo->nr_frags;
 
 	/* Part of headroom was reserved to xdpf */
 	headroom = sizeof(*xdpf) + xdpf->headroom;
@@ -550,6 +556,12 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	if (xdpf->metasize)
 		skb_metadata_set(skb, xdpf->metasize);
 
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		xdp_update_skb_shared_info(skb, nr_frags,
+					   sinfo->xdp_frags_size,
+					   nr_frags * xdpf->frame_sz,
+					   xdp_frame_is_frag_pfmemalloc(xdpf));
+
 	/* Essential SKB info: protocol and skb->dev */
 	skb->protocol = eth_type_trans(skb, dev);
 
-- 
2.33.1



* [PATCH v20 bpf-next 04/23] net: mvneta: simplify mvneta_swbm_add_rx_fragment management
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Relying on the xdp mb bit, remove the skb_shared_info structure
allocated on the stack in the mvneta_rx_swbm routine and simplify
mvneta_swbm_add_rx_fragment by accessing skb_shared_info in the xdp_buff
structure directly. This approach carries no performance penalty, since
mvneta_swbm_add_rx_fragment runs only in the multi-buff use case.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
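Reader's note: the lazy initialisation this patch relies on can be
sketched as below. nr_frags is treated as stale until the mb bit is set,
so the first fragment of a frame resets it. The sketch assumes the mb
handling from patch 03 is already applied; struct rx_buff_model,
model_add_rx_fragment() and MODEL_MAX_FRAGS are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 17 /* illustrative stand-in for MAX_SKB_FRAGS */

struct rx_buff_model {
	bool mb;			/* multi-buffer bit */
	size_t nr_frags;		/* stale unless mb is set */
	int frag_len[MODEL_MAX_FRAGS];
};

/*
 * Append one fragment. Returns true if it was stored, false if the
 * caller would have to recycle the page instead.
 */
bool model_add_rx_fragment(struct rx_buff_model *b, int data_len)
{
	assert(b != NULL);

	/* First fragment of this frame: whatever nr_frags holds is stale. */
	if (!b->mb)
		b->nr_frags = 0;

	if (data_len > 0 && b->nr_frags < MODEL_MAX_FRAGS) {
		b->frag_len[b->nr_frags++] = data_len;
		b->mb = true;
		return true;
	}

	return false;
}
```

Since the reset happens only on the multi-buffer path, linear frames
never write to the shared info area.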
 drivers/net/ethernet/marvell/mvneta.c | 42 ++++++++++-----------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 208dc9116147..567358a021a9 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2032,9 +2032,9 @@ int mvneta_rx_refill_queue(struct mvneta_port *pp, struct mvneta_rx_queue *rxq)
 
 static void
 mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
-		    struct xdp_buff *xdp, struct skb_shared_info *sinfo,
-		    int sync_len)
+		    struct xdp_buff *xdp, int sync_len)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	int i;
 
 	if (likely(!xdp_buff_is_mb(xdp)))
@@ -2182,7 +2182,6 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       u32 frame_sz, struct mvneta_stats *stats)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	unsigned int len, data_len, sync;
 	u32 ret, act;
 
@@ -2203,7 +2202,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 			ret = MVNETA_XDP_DROPPED;
 		} else {
 			ret = MVNETA_XDP_REDIR;
@@ -2214,7 +2213,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2223,7 +2222,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		fallthrough;
 	case XDP_DROP:
-		mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+		mvneta_xdp_put_buff(pp, rxq, xdp, sync);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2275,9 +2274,9 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 			    struct mvneta_rx_desc *rx_desc,
 			    struct mvneta_rx_queue *rxq,
 			    struct xdp_buff *xdp, int *size,
-			    struct skb_shared_info *xdp_sinfo,
 			    struct page *page)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
 	int data_len, len;
@@ -2295,8 +2294,11 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 				len, dma_dir);
 	rx_desc->buf_phys_addr = 0;
 
-	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
-		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
+	if (!xdp_buff_is_mb(xdp))
+		sinfo->nr_frags = 0;
+
+	if (data_len > 0 && sinfo->nr_frags < MAX_SKB_FRAGS) {
+		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags++];
 
 		skb_frag_off_set(frag, pp->rx_offset_correction);
 		skb_frag_size_set(frag, data_len);
@@ -2307,16 +2309,6 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
-
-	/* last fragment */
-	if (len == *size) {
-		struct skb_shared_info *sinfo;
-
-		sinfo = xdp_get_shared_info_from_buff(xdp);
-		sinfo->nr_frags = xdp_sinfo->nr_frags;
-		memcpy(sinfo->frags, xdp_sinfo->frags,
-		       sinfo->nr_frags * sizeof(skb_frag_t));
-	}
 	*size -= len;
 }
 
@@ -2364,7 +2356,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 {
 	int rx_proc = 0, rx_todo, refill, size = 0;
 	struct net_device *dev = pp->dev;
-	struct skb_shared_info sinfo;
 	struct mvneta_stats ps = {};
 	struct bpf_prog *xdp_prog;
 	u32 desc_status, frame_sz;
@@ -2373,8 +2364,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	xdp_init_buff(&xdp_buf, PAGE_SIZE, &rxq->xdp_rxq);
 	xdp_buf.data_hard_start = NULL;
 
-	sinfo.nr_frags = 0;
-
 	/* Get number of received packets */
 	rx_todo = mvneta_rxq_busy_desc_num_get(pp, rxq);
 
@@ -2416,7 +2405,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			}
 
 			mvneta_swbm_add_rx_fragment(pp, rx_desc, rxq, &xdp_buf,
-						    &size, &sinfo, page);
+						    &size, page);
 		} /* Middle or Last descriptor */
 
 		if (!(rx_status & MVNETA_RXD_LAST_DESC))
@@ -2424,7 +2413,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			continue;
 
 		if (size) {
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 			goto next;
 		}
 
@@ -2436,7 +2425,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		if (IS_ERR(skb)) {
 			struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 
 			u64_stats_update_begin(&stats->syncp);
 			stats->es.skb_alloc_error++;
@@ -2453,11 +2442,10 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		napi_gro_receive(napi, skb);
 next:
 		xdp_buf.data_hard_start = NULL;
-		sinfo.nr_frags = 0;
 	}
 
 	if (xdp_buf.data_hard_start)
-		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, -1);
 
 	if (ps.xdp_redirect)
 		xdp_do_flush_map();
-- 
2.33.1



* [PATCH v20 bpf-next 03/23] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Update the multi-buffer bit (mb) in xdp_buff to notify the XDP/eBPF layer
and XDP remote drivers whether this is a "non-linear" XDP buffer. Access
skb_shared_info only if the xdp_buff mb bit is set, in order to avoid
possible cache misses.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 67a644177880..208dc9116147 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2037,9 +2037,14 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 {
 	int i;
 
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
 	for (i = 0; i < sinfo->nr_frags; i++)
 		page_pool_put_full_page(rxq->page_pool,
 					skb_frag_page(&sinfo->frags[i]), true);
+
+out:
 	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
 			   sync_len, true);
 }
@@ -2241,7 +2246,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	int data_len = -MVNETA_MH_SIZE, len;
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
-	struct skb_shared_info *sinfo;
 
 	if (*size > MVNETA_MAX_RX_BUF_SIZE) {
 		len = MVNETA_MAX_RX_BUF_SIZE;
@@ -2261,11 +2265,9 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 
 	/* Prefetch header */
 	prefetch(data);
+	xdp_buff_clear_mb(xdp);
 	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
 			 data_len, false);
-
-	sinfo = xdp_get_shared_info_from_buff(xdp);
-	sinfo->nr_frags = 0;
 }
 
 static void
@@ -2299,6 +2301,9 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		skb_frag_off_set(frag, pp->rx_offset_correction);
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
+
+		if (!xdp_buff_is_mb(xdp))
+			xdp_buff_set_mb(xdp);
 	} else {
 		page_pool_put_full_page(rxq->page_pool, page, true);
 	}
@@ -2320,8 +2325,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = sinfo->nr_frags;
 	struct sk_buff *skb;
+	u8 num_frags;
+	int i;
+
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		num_frags = sinfo->nr_frags;
 
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
@@ -2333,6 +2342,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 	skb_put(skb, xdp->data_end - xdp->data);
 	skb->ip_summed = mvneta_rx_csum(pp, desc_status);
 
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
 	for (i = 0; i < num_frags; i++) {
 		skb_frag_t *frag = &sinfo->frags[i];
 
@@ -2341,6 +2353,7 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct page_pool *pool,
 				skb_frag_size(frag), PAGE_SIZE);
 	}
 
+out:
 	return skb;
 }
 
-- 
2.33.1


* [PATCH v20 bpf-next 02/23] xdp: introduce flags field in xdp_buff/xdp_frame
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce a flags field in the xdp_frame and xdp_buff data structures
to define additional buffer features. At the moment the only
supported buffer feature is the multi-buffer bit (mb). The multi-buffer bit
specifies whether this is a linear buffer (mb = 0) or a multi-buffer
frame (mb = 1). In the latter case the driver is expected to initialize
the skb_shared_info structure at the end of the first buffer to link
together subsequent buffers belonging to the same frame.

Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 447f9b1578f3..4ec7bdf0d937 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -66,6 +66,10 @@ struct xdp_txq_info {
 	struct net_device *dev;
 };
 
+enum xdp_buff_flags {
+	XDP_FLAGS_MULTI_BUFF	= BIT(0), /* non-linear xdp buff */
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -74,13 +78,30 @@ struct xdp_buff {
 	struct xdp_rxq_info *rxq;
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
+	u32 flags; /* supported values defined in xdp_buff_flags */
 };
 
+static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
+{
+	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
+static __always_inline void xdp_buff_set_mb(struct xdp_buff *xdp)
+{
+	xdp->flags |= XDP_FLAGS_MULTI_BUFF;
+}
+
+static __always_inline void xdp_buff_clear_mb(struct xdp_buff *xdp)
+{
+	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
+}
+
 static __always_inline void
 xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
 	xdp->frame_sz = frame_sz;
 	xdp->rxq = rxq;
+	xdp->flags = 0;
 }
 
 static __always_inline void
@@ -122,8 +143,14 @@ struct xdp_frame {
 	 */
 	struct xdp_mem_info mem;
 	struct net_device *dev_rx; /* used by cpumap */
+	u32 flags; /* supported values defined in xdp_buff_flags */
 };
 
+static __always_inline bool xdp_frame_is_mb(struct xdp_frame *frame)
+{
+	return !!(frame->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
 #define XDP_BULK_QUEUE_SIZE	16
 struct xdp_frame_bulk {
 	int count;
@@ -180,6 +207,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
 	xdp->data_end = frame->data + frame->len;
 	xdp->data_meta = frame->data - frame->metasize;
 	xdp->frame_sz = frame->frame_sz;
+	xdp->flags = frame->flags;
 }
 
 static inline
@@ -206,6 +234,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
 	xdp_frame->frame_sz = xdp->frame_sz;
+	xdp_frame->flags = xdp->flags;
 
 	return 0;
 }
-- 
2.33.1


* [PATCH v20 bpf-next 01/23] net: skbuff: add size metadata to skb_shared_info for xdp
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke
In-Reply-To: <cover.1639162845.git.lorenzo@kernel.org>

Introduce the xdp_frags_size field in the skb_shared_info data structure
to store the paged size of an xdp_buff/xdp_frame (xdp_frags_size will
be used in xdp multi-buff support). To avoid increasing the size of
skb_shared_info, the new field fills a hole left by the structure's
alignment.
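
A minimal userspace sketch of the "alignment hole" idea described above.
The two structures are hypothetical, heavily simplified stand-ins and do not
reproduce the real skb_shared_info layout; they only assume a 4-byte counter
followed by a pointer, which on an LP64 target leaves 4 bytes of padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real skb_shared_info layout. */
struct sinfo_before {
	int   dataref;         /* models the 4-byte atomic_t */
	void *destructor_arg;  /* pointer-aligned, padding may precede it */
};

struct sinfo_after {
	int          dataref;
	unsigned int xdp_frags_size; /* placed where the padding used to be */
	void        *destructor_arg;
};

/* Bytes of padding between dataref and destructor_arg before the change. */
static size_t hole_before(void)
{
	size_t end_of_dataref = offsetof(struct sinfo_before, dataref) + sizeof(int);

	assert(offsetof(struct sinfo_before, destructor_arg) >= end_of_dataref);
	return offsetof(struct sinfo_before, destructor_arg) - end_of_dataref;
}

/* Returns 1 when adding the field left the structure size unchanged. */
static int size_unchanged(void)
{
	return sizeof(struct sinfo_before) == sizeof(struct sinfo_after);
}
```

On targets where pointers are 4 bytes there is no hole in this model and the
structure grows, so the result is layout dependent.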

Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/skbuff.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 686a666d073d..2eecb6931975 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -535,6 +535,7 @@ struct skb_shared_info {
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
 	 */
 	atomic_t	dataref;
+	unsigned int	xdp_frags_size;
 
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
-- 
2.33.1


* [PATCH v20 bpf-next 00/23] mvneta: introduce XDP multi-buffer support
From: Lorenzo Bianconi @ 2021-12-10 19:14 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar, toke

This series introduces XDP multi-buffer support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
please focus on how these new types of xdp_{buff,frame} packets
traverse the different layers and the layout design. It is on purpose
that BPF-helpers are kept simple, as we don't want to expose the
internal layout to allow later changes.

The main idea for the new multi-buffer layout is to reuse the same
structure used for non-linear SKB. This relies on the "skb_shared_info"
struct at the end of the first buffer to link together subsequent
buffers. Keeping the layout compatible with SKBs is also done to ease
and speed up creating an SKB from an xdp_{buff,frame}.
Converting an xdp_frame to an SKB and delivering it to the network stack is
shown in patch 05/18 (e.g. cpumaps).

A multi-buffer bit (mb) has been introduced in the flags field of xdp_{buff,frame}
structure to notify the bpf/network layer if this is a xdp multi-buffer frame
(mb = 1) or not (mb = 0).
The mb bit will be set by a xdp multi-buffer capable driver only for
non-linear frames maintaining the capability to receive linear frames
without any extra cost since the skb_shared_info structure at the end
of the first buffer will be initialized only if mb is set.
Moreover the flags field in xdp_{buff,frame} will be reused even for
xdp rx csum offloading in future series.
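
A minimal userspace sketch of the behaviour described above, assuming a
simplified model: the shared info is only initialized and read once the mb
bit is set, so linear frames never touch it. All model_* names and
MODEL_MAX_FRAGS are hypothetical and are not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FLAGS_MULTI_BUFF (1u << 0) /* mirrors the role of XDP_FLAGS_MULTI_BUFF */
#define MODEL_MAX_FRAGS 4                /* hypothetical limit */

struct model_shared_info {
	unsigned int nr_frags;
	unsigned int xdp_frags_size;
	unsigned int frag_len[MODEL_MAX_FRAGS];
};

struct model_buff {
	uint32_t flags;
	unsigned int linear_len;
	/* In the kernel this lives at the end of the first buffer. */
	struct model_shared_info sinfo;
};

/* First descriptor of a frame: clear mb, do not touch the shared info. */
static void model_rx_first(struct model_buff *b, unsigned int len)
{
	assert(b);
	b->flags &= ~MODEL_FLAGS_MULTI_BUFF;
	b->linear_len = len;
}

/* Later descriptors: initialize the shared info lazily, then set mb. */
static int model_add_frag(struct model_buff *b, unsigned int len)
{
	assert(b);
	if (!(b->flags & MODEL_FLAGS_MULTI_BUFF)) {
		b->sinfo.nr_frags = 0;
		b->sinfo.xdp_frags_size = 0;
		b->flags |= MODEL_FLAGS_MULTI_BUFF;
	}
	if (b->sinfo.nr_frags >= MODEL_MAX_FRAGS)
		return -1;
	b->sinfo.frag_len[b->sinfo.nr_frags++] = len;
	b->sinfo.xdp_frags_size += len;
	return 0;
}

/* Total frame length: the paged part is read only when mb is set. */
static unsigned int model_total_len(const struct model_buff *b)
{
	unsigned int len = b->linear_len;

	if (b->flags & MODEL_FLAGS_MULTI_BUFF)
		len += b->sinfo.xdp_frags_size;
	return len;
}
```

In this model a stale shared info left over from a previous frame is harmless
for a linear frame, because nothing reads it while mb is clear.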

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO/GRO for XDP_REDIRECT

The following three eBPF helpers (and related selftests) have been introduced:
- bpf_xdp_load_bytes:
  This helper is provided as an easy way to load data from a xdp buffer. It
  can be used to load len bytes from offset from the frame associated to
  xdp_md, into the buffer pointed by buf.
- bpf_xdp_store_bytes:
  Store len bytes from buffer buf into the frame associated to xdp_md, at
  offset.
- bpf_xdp_get_buff_len:
  Return the total frame size (linear + paged parts)

bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
account xdp multi-buff frames.
Moreover, similar to skb_header_pointer, we introduced the bpf_xdp_pointer
utility routine to return a pointer to a given position in the xdp_buff if the
requested area (offset + len) is contained in a contiguous memory area;
otherwise the data must be copied into a bounce buffer provided by the caller
by running bpf_xdp_copy_buf().
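
The semantics described above can be sketched in plain userspace C. This is
only a model under stated assumptions, not the kernel implementation: the
frame is an array of areas (area[0] for the linear part, the rest for
fragments), and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_AREAS 4 /* hypothetical limit */

/* area[0] models the linear part, area[1..] model the fragments. */
struct model_area {
	unsigned char *data;
	size_t len;
};

struct model_frame {
	struct model_area area[MODEL_MAX_AREAS];
	size_t nr_areas;
};

/* Total frame size, linear plus paged parts. */
static size_t model_buff_len(const struct model_frame *f)
{
	size_t total = 0;

	for (size_t i = 0; i < f->nr_areas; i++)
		total += f->area[i].len;
	return total;
}

/* Direct pointer if [offset, offset + len) sits in one area, else NULL. */
static unsigned char *model_pointer(const struct model_frame *f,
				    size_t offset, size_t len)
{
	size_t start = 0, total = model_buff_len(f);

	if (len == 0 || offset > total || len > total - offset)
		return NULL;
	for (size_t i = 0; i < f->nr_areas; i++) {
		size_t end = start + f->area[i].len;

		if (offset < end)
			return offset + len <= end ?
			       f->area[i].data + (offset - start) : NULL;
		start = end;
	}
	return NULL;
}

/* Copy len bytes starting at offset into buf, crossing area boundaries. */
static int model_load_bytes(const struct model_frame *f, size_t offset,
			    void *buf, size_t len)
{
	unsigned char *dst = buf;
	size_t start = 0, total = model_buff_len(f);

	assert(f && buf);
	if (offset > total || len > total - offset)
		return -1;
	for (size_t i = 0; i < f->nr_areas && len > 0; i++) {
		size_t alen = f->area[i].len;

		if (offset < start + alen) {
			size_t in_off = offset - start;
			size_t n = alen - in_off;

			if (n > len)
				n = len;
			memcpy(dst, f->area[i].data + in_off, n);
			dst += n;
			offset += n;
			len -= n;
		}
		start += alen;
	}
	return 0;
}
```

A caller would try model_pointer() first and fall back to model_load_bytes()
with its own bounce buffer when the requested range spans two areas.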

The BPF_F_XDP_MB flag for bpf_attr has been introduced to notify the kernel
that the eBPF program fully supports xdp multi-buffer.
SEC("xdp_mb/"), SEC_DEF("xdp_devmap_mb/") and SEC_DEF("xdp_cpumap_mb/") have been
introduced to declare xdp multi-buffer support.
The NIC driver is expected to reject an eBPF program if it is running in XDP
multi-buffer mode and the program does not support XDP multi-buffer.
In the same way it is not possible to mix xdp multi-buffer and xdp legacy
programs in a CPUMAP/DEVMAP or tailcall a xdp multi-buffer/legacy program from
a legacy/multi-buff one.
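
The compatibility rules above can be sketched as a few predicates. This is a
userspace model only: model_prog, MODEL_MAX_RX_BUF_SIZE and the function names
are hypothetical, and the -EOPNOTSUPP return on attach is an assumption (the
mvneta MTU path shown in patch 10/23 returns -EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RX_BUF_SIZE 3776 /* hypothetical single-buffer limit */

struct model_prog {
	bool xdp_mb; /* program was loaded declaring multi-buffer support */
};

/* A frame larger than one RX buffer needs a multi-buffer aware program. */
static bool model_needs_mb(int mtu)
{
	return mtu > MODEL_MAX_RX_BUF_SIZE;
}

/* MTU change with a program already attached (prog may be NULL). */
static int model_change_mtu(const struct model_prog *prog, int mtu)
{
	if (prog && !prog->xdp_mb && model_needs_mb(mtu))
		return -EINVAL;
	return 0;
}

/* Program attach on an interface already using a given MTU. */
static int model_attach(const struct model_prog *prog, int mtu)
{
	assert(prog);
	if (!prog->xdp_mb && model_needs_mb(mtu))
		return -EOPNOTSUPP; /* assumed error code */
	return 0;
}

/* CPUMAP/DEVMAP entries and tail calls: both sides must agree on mb. */
static bool model_map_compatible(bool owner_mb, bool prog_mb)
{
	return owner_mb == prog_mb;
}
```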

More info about the main idea behind this approach can be found here [1][2].

Changes since v19:
- do not run deprecated bpf_prog_load()
- rely on skb_frag_size_add/skb_frag_size_sub in
  bpf_xdp_mb_increase_tail/bpf_xdp_mb_shrink_tail
- rely on sinfo->nr_frags in bpf_xdp_mb_shrink_tail to check if the frame has
  been shrunk to a single-buffer one
- allow XDP_REDIRECT of a xdp-mb frame into a CPUMAP

Changes since v18:
- fix bpf_xdp_copy_buf utility routine when we want to load/store data
  contained in frag<n>
- add a selftest for bpf_xdp_load_bytes/bpf_xdp_store_bytes when the caller
  accesses data contained in frag<n> and frag<n+1>

Changes since v17:
- rework bpf_xdp_copy to squash base and frag management
- remove unused variable in bpf_xdp_mb_shrink_tail()
- move bpf_xdp_copy_buf() out of bpf_xdp_pointer()
- add sanity check for len in bpf_xdp_pointer()
- remove EXPORT_SYMBOL for __xdp_return()
- introduce frag_size field in xdp_rxq_info to let the driver specify max value
  for xdp fragments. frag_size set to 0 means the tail increase of the last
  fragment is not supported.

Changes since v16:
- do not allow tailcalling a xdp multi-buffer/legacy program from a
  legacy/multi-buff one.
- do not allow mixing xdp multi-buffer and xdp legacy programs in a
  CPUMAP/DEVMAP
- add selftests for CPUMAP/DEVMAP xdp mb compatibility
- disable XDP_REDIRECT for xdp multi-buff for the moment
- set max offset value to 0xffff in bpf_xdp_pointer
- use ARG_PTR_TO_UNINIT_MEM and ARG_CONST_SIZE for arg3_type and arg4_type
  of bpf_xdp_store_bytes/bpf_xdp_load_bytes

Changes since v15:
- let the verifier check buf is not NULL in
  bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
- return an error if offset + length is over frame boundaries in
  bpf_xdp_pointer routine
- introduce BPF_F_XDP_MB flag for bpf_attr to notify the kernel the eBPF
  program fully supports xdp multi-buffer.
- reject a non XDP multi-buffer program if the driver is running in
  XDP multi-buffer mode.

Changes since v14:
- introduce bpf_xdp_pointer utility routine and
  bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
- drop bpf_xdp_adjust_data helper
- drop xdp_frags_truesize in skb_shared_info
- explode bpf_xdp_mb_adjust_tail in bpf_xdp_mb_increase_tail and
  bpf_xdp_mb_shrink_tail

Changes since v13:
- use u32 for xdp_buff/xdp_frame flags field
- rename xdp_frags_tsize in xdp_frags_truesize
- fixed comments

Changes since v12:
- fix bpf_xdp_adjust_data helper for single-buffer use case
- return -EFAULT in bpf_xdp_adjust_{head,tail} in case the data pointers are not
  properly reset
- collect ACKs from John

Changes since v11:
- add missing static to bpf_xdp_get_buff_len_proto structure
- fix bpf_xdp_adjust_data helper when offset is smaller than linear area length.

Changes since v10:
- move xdp->data to the requested payload offset instead of to the beginning of
  the fragment in bpf_xdp_adjust_data()

Changes since v9:
- introduce bpf_xdp_adjust_data helper and related selftest
- add xdp_frags_size and xdp_frags_tsize fields in skb_shared_info
- introduce xdp_update_skb_shared_info utility routine in order to not reset
  frags array in skb_shared_info converting from a xdp_buff/xdp_frame to a skb
- simplify bpf_xdp_copy routine

Changes since v8:
- add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
- switch back to skb_shared_info implementation from previous xdp_shared_info
  one
- avoid using a bitfield in xdp_buff/xdp_frame since it introduces performance
  regressions. Tested now on 10G NIC (ixgbe) to verify there are no performance
  penalties for regular codebase
- add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
- add data_len field in skb_shared_info struct
- introduce XDP_FLAGS_FRAGS_PF_MEMALLOC flag

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference with respect to previous versions is the new approach
  proposed by Eelco to pass full length of the packet to eBPF layer in XDP context
- reintroduce multi-buff support to eBPF kselftests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase on top of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase on top of bpf-next
- add patch 10/13 to copy back paged data from a xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)

Eelco Chaudron (3):
  bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  bpf: add multi-buffer support to xdp copy helpers
  bpf: selftests: update xdp_adjust_tail selftest to include
    multi-buffer

Lorenzo Bianconi (19):
  net: skbuff: add size metadata to skb_shared_info for xdp
  xdp: introduce flags field in xdp_buff/xdp_frame
  net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  net: mvneta: simplify mvneta_swbm_add_rx_fragment management
  net: xdp: add xdp_update_skb_shared_info utility routine
  net: marvell: rely on xdp_update_skb_shared_info utility routine
  xdp: add multi-buff support to xdp_return_{buff/frame}
  net: mvneta: add multi buffer support to XDP_TX
  bpf: introduce BPF_F_XDP_MB flag in prog_flags loading the ebpf
    program
  net: mvneta: enable jumbo frames if the loaded XDP program support mb
  bpf: introduce bpf_xdp_get_buff_len helper
  bpf: move user_size out of bpf_test_init
  bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature
  libbpf: Add SEC name for xdp_mb programs
  net: xdp: introduce bpf_xdp_pointer utility routine
  bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
  bpf: selftests: add CPUMAP/DEVMAP selftests for xdp multi-buff
  xdp: disable XDP_REDIRECT for xdp multi-buff

Toke Hoiland-Jorgensen (1):
  bpf: generalise tail call map compatibility check

 drivers/net/ethernet/marvell/mvneta.c         | 204 +++++++++------
 include/linux/bpf.h                           |  32 ++-
 include/linux/skbuff.h                        |   1 +
 include/net/xdp.h                             | 108 +++++++-
 include/uapi/linux/bpf.h                      |  30 +++
 kernel/bpf/arraymap.c                         |   4 +-
 kernel/bpf/core.c                             |  28 +-
 kernel/bpf/cpumap.c                           |   8 +-
 kernel/bpf/devmap.c                           |   3 +-
 kernel/bpf/syscall.c                          |  25 +-
 kernel/trace/bpf_trace.c                      |   3 +
 net/bpf/test_run.c                            | 115 +++++++--
 net/core/filter.c                             | 244 +++++++++++++++++-
 net/core/xdp.c                                |  78 +++++-
 tools/include/uapi/linux/bpf.h                |  30 +++
 tools/lib/bpf/libbpf.c                        |   8 +
 .../bpf/prog_tests/xdp_adjust_frags.c         | 103 ++++++++
 .../bpf/prog_tests/xdp_adjust_tail.c          | 131 ++++++++++
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 151 ++++++++---
 .../bpf/prog_tests/xdp_cpumap_attach.c        |  65 ++++-
 .../bpf/prog_tests/xdp_devmap_attach.c        |  56 ++++
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 .../bpf/progs/test_xdp_update_frags.c         |  42 +++
 .../bpf/progs/test_xdp_with_cpumap_helpers.c  |   6 +
 .../progs/test_xdp_with_cpumap_mb_helpers.c   |  27 ++
 .../bpf/progs/test_xdp_with_devmap_helpers.c  |   7 +
 .../progs/test_xdp_with_devmap_mb_helpers.c   |  27 ++
 29 files changed, 1368 insertions(+), 212 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_adjust_frags.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_update_frags.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_with_cpumap_mb_helpers.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_with_devmap_mb_helpers.c

-- 
2.33.1


* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Ansuel Smith @ 2021-12-10 19:10 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <61b396c3.1c69fb81.17062.836a@mx.google.com>

On Fri, Dec 10, 2021 at 07:04:48PM +0100, Ansuel Smith wrote:
> On Fri, Dec 10, 2021 at 06:29:32PM +0100, Ansuel Smith wrote:
> > On Fri, Dec 10, 2021 at 05:15:30PM +0000, Vladimir Oltean wrote:
> > > On Fri, Dec 10, 2021 at 06:10:45PM +0100, Ansuel Smith wrote:
> > > > On Fri, Dec 10, 2021 at 05:02:42PM +0000, Vladimir Oltean wrote:
> > > > > On Fri, Dec 10, 2021 at 04:37:52AM +0100, Ansuel Smith wrote:
> > > > > > On Thu, Dec 09, 2021 at 07:39:23PM +0200, Vladimir Oltean wrote:
> > > > > > > This patch set is provided solely for review purposes (therefore not to
> > > > > > > be applied anywhere) and for Ansuel to test whether they resolve the
> > > > > > > slowdown reported here:
> > > > > > > https://patchwork.kernel.org/project/netdevbpf/cover/20211207145942.7444-1-ansuelsmth@gmail.com/
> > > > > > > 
> > > > > > > The patches posted here are mainly to offer a consistent
> > > > > > > "master_state_change" chain of events to switches, without duplicates,
> > > > > > > and always starting with operational=true and ending with
> > > > > > > operational=false. This way, drivers should know when they can perform
> > > > > > > Ethernet-based register access, and need not care about more than that.
> > > > > > > 
> > > > > > > Changes in v2:
> > > > > > > - dropped some useless patches
> > > > > > > - also check master operstate.
> > > > > > > 
> > > > > > > Vladimir Oltean (4):
> > > > > > >   net: dsa: provide switch operations for tracking the master state
> > > > > > >   net: dsa: stop updating master MTU from master.c
> > > > > > >   net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}
> > > > > > >   net: dsa: replay master state events in
> > > > > > >     dsa_tree_{setup,teardown}_master
> > > > > > > 
> > > > > > >  include/net/dsa.h  | 11 +++++++
> > > > > > >  net/dsa/dsa2.c     | 80 +++++++++++++++++++++++++++++++++++++++++++---
> > > > > > >  net/dsa/dsa_priv.h | 13 ++++++++
> > > > > > >  net/dsa/master.c   | 29 ++---------------
> > > > > > >  net/dsa/slave.c    | 27 ++++++++++++++++
> > > > > > >  net/dsa/switch.c   | 15 +++++++++
> > > > > > >  6 files changed, 145 insertions(+), 30 deletions(-)
> > > > > > > 
> > > > > > > -- 
> > > > > > > 2.25.1
> > > > > > > 
> > > > > > 
> > > > > > Hi, I tested this v2 and I still have 2 ethernet mdio failing on init.
> > > > > > I don't think we have other way to track this. Am I wrong?
> > > > > > 
> > > > > > All works correctly with this and promisc_on_master.
> > > > > > If you have other test, feel free to send me other stuff to test.
> > > > > > 
> > > > > > (I'm starting to think the fail is caused by some delay that the switch
> > > > > > require to actually start accepting packet or from the reinit? But I'm
> > > > > > not sure... don't know if you notice something from the pcap)
> > > > > 
> > > > > I've opened the pcap just now. The Ethernet MDIO packets are
> > > > > non-standard. When the DSA master receives them, it expects the first 6
> > > > > octets to be the MAC DA, because that's the format of an Ethernet frame.
> > > > > But the packets have this other format, according to your own writing:
> > > > > 
> > > > > /* Specific define for in-band MDIO read/write with Ethernet packet */
> > > > > #define QCA_HDR_MDIO_SEQ_LEN           4 /* 4 byte for the seq */
> > > > > #define QCA_HDR_MDIO_COMMAND_LEN       4 /* 4 byte for the command */
> > > > > #define QCA_HDR_MDIO_DATA1_LEN         4 /* First 4 byte for the mdio data */
> > > > > #define QCA_HDR_MDIO_HEADER_LEN        (QCA_HDR_MDIO_SEQ_LEN + \
> > > > >                                        QCA_HDR_MDIO_COMMAND_LEN + \
> > > > >                                        QCA_HDR_MDIO_DATA1_LEN)
> > > > > 
> > > > > #define QCA_HDR_MDIO_DATA2_LEN         12 /* Other 12 byte for the mdio data */
> > > > > #define QCA_HDR_MDIO_PADDING_LEN       34 /* Padding to reach the min Ethernet packet */
> > > > > 
> > > > > The first 6 octets change like crazy in your pcap. Definitely can't add
> > > > > that to the RX filter of the DSA master.
> > > > > 
> > > > > So yes, promisc_on_master is precisely what you need, it exists for
> > > > > situations like this.
> > > > > 
> > > > > Considering this, I guess it works?
> > > > 
> > > > Yes it works! We can totally accept 2 mdio timeout out of a good way to
> > > > track the master port. It's probably related to other stuff like switch
> > > > delay or other.
> > > > 
> > > > Wonder the next step is wait for this to be accepted and then I can
> > > > propose a v3 of my patch? Or net-next is closed now and I should just
> > > > send v3 RFC saying it does depend on this?
> > > 
> > > Wait a minute, I don't think I understood your previous reply.
> > > With promisc_on_master, is there or is there not any timeout?
> > 
> > With promisc_on_master I have only 2 timeout.
> > 
> > > My understanding was this: DSA tells you when the master is up and
> > > operational. That information is correct, except it isn't sufficient and
> > > you don't see the replies back. Later during boot, you have some init
> > > scripts triggered by user space that create a bridge interface and put
> > > the switch ports under the bridge. The bridge puts the switch interfaces
> > > in promiscuous mode, because that's what bridges do. Then DSA propagates
> > > the promiscuous mode from the switch ports to the DSA master, and once
> > > the master is promiscuous, the Ethernet MDIO starts working too.
> > > Now, with promisc_on_master set, the DSA master is already promiscuous
> > > by the time DSA tells you that it's up and running. Hence your message
> > > that "All works correctly with this and promisc_on_master."
> > > What did I misunderstand?
> > 
> > You got all correct. But still I have these 2 timeout at the very start.
> > Let me give you another pastebin to make this more clear. [0]
> > Transaction done is when the Ethernet packet is received and processed.
> > I added some pr with the events received by switch.c
> > 
> > I should check if the tagger receive some packet before the
> > "function timeout". 
> > What I mean with "acceptable state" is that aside from the 2
> > timeout everything else works correctly without any slowdown in the
> > init process.
> > 
> > [0] https://pastebin.com/VfGB5hAQ
> > 
> > -- 
> > 	Ansuel
> 
> Ok I added more tracing and packet are received to the tagger right
> after the log from ipv6 "link becomes ready". That log just check if the
> interface is up and if it does have a valid sched.
> I notice after link becomes ready we have a CHANGE event for eth0. That
> should be the correct way to understand when the cpu port is actually
> usable.
> (just to make it clear before the link becomes ready no packet is
> received to the tagger and the completion timeouts)
> 
> -- 
> 	Ansuel

Sorry for the triple message spam... I have a solution. It seems packets
are processed as soon as dev_activate is called (so a qdisc is assigned).
By adding another bool like master_oper_ready and

void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
                                      struct net_device *master,
                                      bool up);

static void dsa_tree_master_state_change(struct dsa_switch_tree *dst,
                                        struct net_device *master)
{
       struct dsa_notifier_master_state_info info;
       struct dsa_port *cpu_dp = master->dsa_ptr;

       info.master = master;
       info.operational = cpu_dp->master_admin_up && cpu_dp->master_oper_up && cpu_dp->master_oper_ready;

       dsa_tree_notify(dst, DSA_NOTIFIER_MASTER_STATE_CHANGE, &info);
}

void dsa_tree_master_oper_state_ready(struct dsa_switch_tree *dst,
                                      struct net_device *master,
                                      bool up)
{
       struct dsa_port *cpu_dp = master->dsa_ptr;
       bool notify = false;

       if ((cpu_dp->master_oper_ready && cpu_dp->master_oper_ready) !=
           (cpu_dp->master_oper_ready && up))
               notify = true;

       cpu_dp->master_oper_ready = up;

       if (notify)
               dsa_tree_master_state_change(dst, master);
}

and, in slave.c at the NETDEV_CHANGE event, the additional call
dsa_tree_master_oper_state_ready(dst, dev, dev_ingress_queue(dev));
the function timeouts are gone. I just tested this and it works right away.

I think we need this additional check to make sure the tagger can finally
accept packets from the switch.

With this added I think this is ready.
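
A minimal userspace sketch of the three-flag gating discussed above. It is
not the posted code: it assumes the intent is to notify whenever the combined
"operational" value changes, and compares that value before and after the
update. The model_* names and the notification counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_cpu_port {
	bool master_admin_up;
	bool master_oper_up;
	bool master_oper_ready;
	int  notifications;     /* how many state changes were reported */
	bool last_operational;  /* value carried by the last notification */
};

/* The master is usable only when all three conditions hold. */
static bool model_operational(const struct model_cpu_port *p)
{
	return p->master_admin_up && p->master_oper_up && p->master_oper_ready;
}

/* Stand-in for the notifier chain call. */
static void model_notify(struct model_cpu_port *p)
{
	p->notifications++;
	p->last_operational = model_operational(p);
}

/* Update the "ready" flag and notify only on a real transition. */
static void model_oper_state_ready(struct model_cpu_port *p, bool up)
{
	bool before;

	assert(p);
	before = model_operational(p);
	p->master_oper_ready = up;
	if (before != model_operational(p))
		model_notify(p);
}
```

With this shape, repeated events carrying the same value produce no duplicate
notifications, and a "ready" event on a master that is administratively down
is not reported as operational.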

-- 
	Ansuel

* Re: [PATCH] selftests: mptcp: remove duplicate include in mptcp_inq.c
From: patchwork-bot+netdevbpf @ 2021-12-10 19:10 UTC (permalink / raw)
  To: CGEL
  Cc: mathew.j.martineau, matthieu.baerts, davem, kuba, shuah, netdev,
	mptcp, linux-kselftest, linux-kernel, ye.guojin, zealci
In-Reply-To: <20211210071424.425773-1-ye.guojin@zte.com.cn>

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 10 Dec 2021 07:14:24 +0000 you wrote:
> From: Ye Guojin <ye.guojin@zte.com.cn>
> 
> 'sys/ioctl.h' included in 'mptcp_inq.c' is duplicated.
> 
> Reported-by: ZealRobot <zealci@zte.com.cn>
> Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
> 
> [...]

Here is the summary with links:
  - selftests: mptcp: remove duplicate include in mptcp_inq.c
    https://git.kernel.org/netdev/net-next/c/db1041544815

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH v6 00/14] genirq: Cleanup the abuse of irq_set_affinity_hint()
From: Nitesh Lal @ 2021-12-10 18:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Martin K. Petersen, Juri Lelli, Dick Kennedy, Marcelo Tosatti,
	linux-kernel, linux-pci, linux-scsi, netdev, davem, ajit.khaparde,
	sriharsha.basavapatna, somnath.kotur, huangguangbin2, huangdaode,
	Frederic Weisbecker, Alex Belits, Bjorn Helgaas, rostedt,
	Peter Zijlstra, Jesse Brandeburg, Robin Murphy, Ingo Molnar,
	jbrandeb, akpm, sfr, stephen, rppt, chris.friesen, Marc Zyngier,
	Neil Horman, pjwaskiewicz, Stefan Assmann, Tomas Henzl,
	james.smart, Ken Cox, faisal.latif, shiraz.saleem, tariqt,
	Alaa Hleihel, Kamal Heib, borisp, saeedm, Nikolova, Tatyana E,
	Ismail, Mustafa, Al Stone, Leon Romanovsky, Chandrakanth Patil,
	bjorn.andersson, chunkuang.hu, yongqiang.niu, baolin.wang7,
	Petr Oros, Ming Lei, Ewan Milne, jejb, kabel, Viresh Kumar,
	Jakub Kicinski, kashyap.desai, Sumit Saxena,
	shivasharan.srikanteshwara, sathya.prakash, Sreekanth Reddy,
	suganath-prabu.subramani, ley.foon.tan, jbrunet, johannes,
	snelson, lewis.hanly, benve, _govind, jassisinghbrar
In-Reply-To: <87ilvwxpt7.ffs@tglx>

On Fri, Dec 10, 2021 at 1:44 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, Dec 10 2021 at 08:51, Nitesh Lal wrote:
> > On Wed, Nov 24, 2021 at 5:16 PM Nitesh Lal <nilal@redhat.com> wrote:
> >> > The more general question is whether I should queue all the others or
> >> > whether some subsystem would prefer to pull in a tagged commit on top of
> >> > rc1. I'm happy to carry them all of course.
> >> >
> >>
> >> I am fine either way.
> >> In the past, while I was asking for more testing help I was asked if the
> >> SCSI changes are part of Martins's scsi-fixes tree as that's something
> >> Broadcom folks test to check for regression.
> >> So, maybe Martin can pull this up?
> >>
> >
> > Gentle ping.
> > Any thoughts on the above query?
>
> As nobody cares, I'll pick it up.
>

Sounds good to me.
Thank you!

--
Nitesh


* Re: [PATCH net-next v8 6/6] net: dt-bindings: dwmac: add support for mt8195
From: Rob Herring @ 2021-12-10 18:52 UTC (permalink / raw)
  To: Biao Huang
  Cc: davem, Jakub Kicinski, Matthias Brugger, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu, Maxime Coquelin, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, linux-stm32,
	srv_heupstream, macpaul.lin, angelogioacchino.delregno, dkirjanov
In-Reply-To: <20211210013129.811-7-biao.huang@mediatek.com>

On Fri, Dec 10, 2021 at 09:31:29AM +0800, Biao Huang wrote:
> Add binding document for the ethernet on mt8195.
> 
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>
> ---
>  .../bindings/net/mediatek-dwmac.yaml          | 86 +++++++++++++++----
>  1 file changed, 70 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> index 9207266a6e69..fb04166404d8 100644
> --- a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> +++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> @@ -19,11 +19,67 @@ select:
>        contains:
>          enum:
>            - mediatek,mt2712-gmac
> +          - mediatek,mt8195-gmac
>    required:
>      - compatible
>  
>  allOf:
>    - $ref: "snps,dwmac.yaml#"
> +  - if:
> +      properties:
> +        compatible:
> +          contains:
> +            enum:
> +              - mediatek,mt2712-gmac
> +
> +    then:
> +      properties:
> +        clocks:
> +          minItems: 5
> +          items:
> +            - description: AXI clock
> +            - description: APB clock
> +            - description: MAC Main clock
> +            - description: PTP clock
> +            - description: RMII reference clock provided by MAC
> +
> +        clock-names:
> +          minItems: 5
> +          items:
> +            - const: axi
> +            - const: apb
> +            - const: mac_main
> +            - const: ptp_ref
> +            - const: rmii_internal
> +
> +  - if:
> +      properties:
> +        compatible:
> +          contains:
> +            enum:
> +              - mediatek,mt8195-gmac
> +
> +    then:
> +      properties:
> +        clocks:
> +          minItems: 6
> +          items:
> +            - description: AXI clock
> +            - description: APB clock
> +            - description: MAC clock gate

Add new clocks on to the end of existing clocks. That will simplify the 
binding as here you will just need 'minItems: 6'.

> +            - description: MAC Main clock
> +            - description: PTP clock
> +            - description: RMII reference clock provided by MAC
> +
> +        clock-names:
> +          minItems: 6
> +          items:
> +            - const: axi
> +            - const: apb
> +            - const: mac_cg
> +            - const: mac_main
> +            - const: ptp_ref
> +            - const: rmii_internal
>  
>  properties:
>    compatible:
> @@ -32,22 +88,10 @@ properties:
>            - enum:
>                - mediatek,mt2712-gmac
>            - const: snps,dwmac-4.20a
> -
> -  clocks:
> -    items:
> -      - description: AXI clock
> -      - description: APB clock
> -      - description: MAC Main clock
> -      - description: PTP clock
> -      - description: RMII reference clock provided by MAC
> -
> -  clock-names:
> -    items:
> -      - const: axi
> -      - const: apb
> -      - const: mac_main
> -      - const: ptp_ref
> -      - const: rmii_internal
> +      - items:
> +          - enum:
> +              - mediatek,mt8195-gmac
> +          - const: snps,dwmac-5.10a
>  
>    mediatek,pericfg:
>      $ref: /schemas/types.yaml#/definitions/phandle
> @@ -62,6 +106,8 @@ properties:
>        or will round down. Range 0~31*170.
>        For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
>        or will round down. Range 0~31*550.
> +      For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple of 290,
> +      or will round down. Range 0~31*290.
>  
>    mediatek,rx-delay-ps:
>      description:
> @@ -70,6 +116,8 @@ properties:
>        or will round down. Range 0~31*170.
>        For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
>        or will round down. Range 0~31*550.
> +      For MT8195 RGMII/RMII/MII interface, Allowed value need to be a multiple
> +      of 290, or will round down. Range 0~31*290.
>  
>    mediatek,rmii-rxc:
>      type: boolean
> @@ -103,6 +151,12 @@ properties:
>        3. the inside clock, which be sent to MAC, will be inversed in RMII case when
>           the reference clock is from MAC.
>  
> +  mediatek,mac-wol:
> +    type: boolean
> +    description:
> +      If present, indicates that MAC supports WOL(Wake-On-LAN), and MAC WOL will be enabled.
> +      Otherwise, PHY WOL is preferred.
> +
>  required:
>    - compatible
>    - reg
> -- 
> 2.25.1
> 
> 

^ permalink raw reply

* [PATCH net-next] tipc: discard MSG_CRYPTO msgs when key_exchange_enabled is not set
From: Xin Long @ 2021-12-10 18:50 UTC (permalink / raw)
  To: network dev, tipc-discussion
  Cc: Jon Maloy, Ying Xue, Hoang Huu Le, davem, kuba

When key_exchange is disabled, there is no reason to accept MSG_CRYPTO
msgs if it doesn't send MSG_CRYPTO msgs.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jon Maloy <jmaloy@redhat.com>
---
 net/tipc/link.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 09ae8448f394..8d9e09f48f4c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1298,7 +1298,8 @@ static bool tipc_data_input(struct tipc_link *l, struct sk_buff *skb,
 		return false;
 #ifdef CONFIG_TIPC_CRYPTO
 	case MSG_CRYPTO:
-		if (TIPC_SKB_CB(skb)->decrypted) {
+		if (sysctl_tipc_key_exchange_enabled &&
+		    TIPC_SKB_CB(skb)->decrypted) {
 			tipc_crypto_msg_rcv(l->net, skb);
 			return true;
 		}
-- 
2.27.0
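
For readers skimming the diff, the receive-side decision after this change can be modelled in plain userspace C. This is only a sketch: `crypto_msg_action`, `enum crypto_action` and the boolean parameters are illustrative names, not TIPC symbols, and it assumes that a MSG_CRYPTO message which is not accepted ends up discarded, as the subject line says.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of the MSG_CRYPTO case; not a kernel type. */
enum crypto_action {
	CRYPTO_DELIVER,	/* hand the message to the key-exchange handler */
	CRYPTO_DISCARD,	/* treat it like any other unexpected message */
};

/*
 * Model of the patched condition: the message is processed only when
 * key exchange is enabled AND the buffer was successfully decrypted.
 */
static enum crypto_action crypto_msg_action(bool key_exchange_enabled,
					    bool decrypted)
{
	if (key_exchange_enabled && decrypted)
		return CRYPTO_DELIVER;
	return CRYPTO_DISCARD;
}

/* Walks through all four input combinations of the model. */
void crypto_model_check(void)
{
	assert(crypto_msg_action(true, true) == CRYPTO_DELIVER);
	assert(crypto_msg_action(true, false) == CRYPTO_DISCARD);
	assert(crypto_msg_action(false, true) == CRYPTO_DISCARD);
	assert(crypto_msg_action(false, false) == CRYPTO_DISCARD);
}
```

The only row that changes relative to the old condition is the third one: a correctly decrypted message is no longer delivered when key exchange is disabled.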


^ permalink raw reply related

* Re: [PATCH net-next v8 4/6] net: dt-bindings: dwmac: Convert mediatek-dwmac to DT schema
From: Rob Herring @ 2021-12-10 18:49 UTC (permalink / raw)
  To: Biao Huang
  Cc: davem, Jakub Kicinski, Matthias Brugger, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu, Maxime Coquelin, netdev, devicetree,
	linux-kernel, linux-arm-kernel, linux-mediatek, linux-stm32,
	srv_heupstream, macpaul.lin, angelogioacchino.delregno, dkirjanov
In-Reply-To: <20211210013129.811-5-biao.huang@mediatek.com>

On Fri, Dec 10, 2021 at 09:31:27AM +0800, Biao Huang wrote:
> Convert mediatek-dwmac to DT schema, and delete old mediatek-dwmac.txt.
> And there are some changes in .yaml than .txt, others almost keep the same:
>   1. compatible "const: snps,dwmac-4.20".
>   2. delete "snps,reset-active-low;" in example, since driver remove this
>      property long ago.
>   3. add "snps,reset-delay-us = <0 10000 10000>" in example.
>   4. the example is for rgmii interface, keep related properties only.
> 
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>
> ---
>  .../bindings/net/mediatek-dwmac.txt           |  91 ----------
>  .../bindings/net/mediatek-dwmac.yaml          | 156 ++++++++++++++++++
>  2 files changed, 156 insertions(+), 91 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/net/mediatek-dwmac.txt
>  create mode 100644 Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> 
> diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt b/Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> deleted file mode 100644
> index afbcaebf062e..000000000000
> --- a/Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> +++ /dev/null
> @@ -1,91 +0,0 @@
> -MediaTek DWMAC glue layer controller
> -
> -This file documents platform glue layer for stmmac.
> -Please see stmmac.txt for the other unchanged properties.
> -
> -The device node has following properties.
> -
> -Required properties:
> -- compatible:  Should be "mediatek,mt2712-gmac" for MT2712 SoC
> -- reg:  Address and length of the register set for the device
> -- interrupts:  Should contain the MAC interrupts
> -- interrupt-names: Should contain a list of interrupt names corresponding to
> -	the interrupts in the interrupts property, if available.
> -	Should be "macirq" for the main MAC IRQ
> -- clocks: Must contain a phandle for each entry in clock-names.
> -- clock-names: The name of the clock listed in the clocks property. These are
> -	"axi", "apb", "mac_main", "ptp_ref", "rmii_internal" for MT2712 SoC.
> -- mac-address: See ethernet.txt in the same directory
> -- phy-mode: See ethernet.txt in the same directory
> -- mediatek,pericfg: A phandle to the syscon node that control ethernet
> -	interface and timing delay.
> -
> -Optional properties:
> -- mediatek,tx-delay-ps: TX clock delay macro value. Default is 0.
> -	It should be defined for RGMII/MII interface.
> -	It should be defined for RMII interface when the reference clock is from MT2712 SoC.
> -- mediatek,rx-delay-ps: RX clock delay macro value. Default is 0.
> -	It should be defined for RGMII/MII interface.
> -	It should be defined for RMII interface.
> -Both delay properties need to be a multiple of 170 for RGMII interface,
> -or will round down. Range 0~31*170.
> -Both delay properties need to be a multiple of 550 for MII/RMII interface,
> -or will round down. Range 0~31*550.
> -
> -- mediatek,rmii-rxc: boolean property, if present indicates that the RMII
> -	reference clock, which is from external PHYs, is connected to RXC pin
> -	on MT2712 SoC.
> -	Otherwise, is connected to TXC pin.
> -- mediatek,rmii-clk-from-mac: boolean property, if present indicates that
> -	MT2712 SoC provides the RMII reference clock, which outputs to TXC pin only.
> -- mediatek,txc-inverse: boolean property, if present indicates that
> -	1. tx clock will be inversed in MII/RGMII case,
> -	2. tx clock inside MAC will be inversed relative to reference clock
> -	   which is from external PHYs in RMII case, and it rarely happen.
> -	3. the reference clock, which outputs to TXC pin will be inversed in RMII case
> -	   when the reference clock is from MT2712 SoC.
> -- mediatek,rxc-inverse: boolean property, if present indicates that
> -	1. rx clock will be inversed in MII/RGMII case.
> -	2. reference clock will be inversed when arrived at MAC in RMII case, when
> -	   the reference clock is from external PHYs.
> -	3. the inside clock, which be sent to MAC, will be inversed in RMII case when
> -	   the reference clock is from MT2712 SoC.
> -- assigned-clocks: mac_main and ptp_ref clocks
> -- assigned-clock-parents: parent clocks of the assigned clocks
> -
> -Example:
> -	eth: ethernet@1101c000 {
> -		compatible = "mediatek,mt2712-gmac";
> -		reg = <0 0x1101c000 0 0x1300>;
> -		interrupts = <GIC_SPI 237 IRQ_TYPE_LEVEL_LOW>;
> -		interrupt-names = "macirq";
> -		phy-mode ="rgmii-rxid";
> -		mac-address = [00 55 7b b5 7d f7];
> -		clock-names = "axi",
> -			      "apb",
> -			      "mac_main",
> -			      "ptp_ref",
> -			      "rmii_internal";
> -		clocks = <&pericfg CLK_PERI_GMAC>,
> -			 <&pericfg CLK_PERI_GMAC_PCLK>,
> -			 <&topckgen CLK_TOP_ETHER_125M_SEL>,
> -			 <&topckgen CLK_TOP_ETHER_50M_SEL>,
> -			 <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>;
> -		assigned-clocks = <&topckgen CLK_TOP_ETHER_125M_SEL>,
> -				  <&topckgen CLK_TOP_ETHER_50M_SEL>,
> -				  <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>;
> -		assigned-clock-parents = <&topckgen CLK_TOP_ETHERPLL_125M>,
> -					 <&topckgen CLK_TOP_APLL1_D3>,
> -					 <&topckgen CLK_TOP_ETHERPLL_50M>;
> -		power-domains = <&scpsys MT2712_POWER_DOMAIN_AUDIO>;
> -		mediatek,pericfg = <&pericfg>;
> -		mediatek,tx-delay-ps = <1530>;
> -		mediatek,rx-delay-ps = <1530>;
> -		mediatek,rmii-rxc;
> -		mediatek,txc-inverse;
> -		mediatek,rxc-inverse;
> -		snps,txpbl = <1>;
> -		snps,rxpbl = <1>;
> -		snps,reset-gpio = <&pio 87 GPIO_ACTIVE_LOW>;
> -		snps,reset-active-low;
> -	};
> diff --git a/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> new file mode 100644
> index 000000000000..9207266a6e69
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
> @@ -0,0 +1,156 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/mediatek-dwmac.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: MediaTek DWMAC glue layer controller
> +
> +maintainers:
> +  - Biao Huang <biao.huang@mediatek.com>
> +
> +description:
> +  This file documents platform glue layer for stmmac.
> +
> +# We need a select here so we don't match all nodes with 'snps,dwmac'
> +select:
> +  properties:
> +    compatible:
> +      contains:
> +        enum:
> +          - mediatek,mt2712-gmac
> +  required:
> +    - compatible
> +
> +allOf:
> +  - $ref: "snps,dwmac.yaml#"
> +
> +properties:
> +  compatible:
> +    oneOf:
> +      - items:

Don't need oneOf for 1 entry.

> +          - enum:
> +              - mediatek,mt2712-gmac
> +          - const: snps,dwmac-4.20a
> +
> +  clocks:
> +    items:
> +      - description: AXI clock
> +      - description: APB clock
> +      - description: MAC Main clock
> +      - description: PTP clock
> +      - description: RMII reference clock provided by MAC

Seems you need 'minItems: 4' or are the DT files wrong?

> +
> +  clock-names:
> +    items:
> +      - const: axi
> +      - const: apb
> +      - const: mac_main
> +      - const: ptp_ref
> +      - const: rmii_internal
> +
> +  mediatek,pericfg:
> +    $ref: /schemas/types.yaml#/definitions/phandle
> +    description:
> +      The phandle to the syscon node that control ethernet
> +      interface and timing delay.
> +
> +  mediatek,tx-delay-ps:
> +    description:
> +      The internal TX clock delay (provided by this driver) in nanoseconds.
> +      For MT2712 RGMII interface, Allowed value need to be a multiple of 170,
> +      or will round down. Range 0~31*170.
> +      For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
> +      or will round down. Range 0~31*550.
> +
> +  mediatek,rx-delay-ps:
> +    description:
> +      The internal RX clock delay (provided by this driver) in nanoseconds.
> +      For MT2712 RGMII interface, Allowed value need to be a multiple of 170,
> +      or will round down. Range 0~31*170.
> +      For MT2712 RMII/MII interface, Allowed value need to be a multiple of 550,
> +      or will round down. Range 0~31*550.
> +
> +  mediatek,rmii-rxc:
> +    type: boolean
> +    description:
> +      If present, indicates that the RMII reference clock, which is from external
> +      PHYs, is connected to RXC pin. Otherwise, is connected to TXC pin.
> +
> +  mediatek,rmii-clk-from-mac:
> +    type: boolean
> +    description:
> +      If present, indicates that MAC provides the RMII reference clock, which
> +      outputs to TXC pin only.
> +
> +  mediatek,txc-inverse:
> +    type: boolean
> +    description:
> +      If present, indicates that
> +      1. tx clock will be inversed in MII/RGMII case,
> +      2. tx clock inside MAC will be inversed relative to reference clock
> +         which is from external PHYs in RMII case, and it rarely happen.
> +      3. the reference clock, which outputs to TXC pin will be inversed in RMII case
> +         when the reference clock is from MAC.
> +
> +  mediatek,rxc-inverse:
> +    type: boolean
> +    description:
> +      If present, indicates that
> +      1. rx clock will be inversed in MII/RGMII case.
> +      2. reference clock will be inversed when arrived at MAC in RMII case, when
> +         the reference clock is from external PHYs.
> +      3. the inside clock, which be sent to MAC, will be inversed in RMII case when
> +         the reference clock is from MAC.
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +  - interrupt-names
> +  - clocks
> +  - clock-names
> +  - phy-mode
> +  - mediatek,pericfg
> +
> +unevaluatedProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/clock/mt2712-clk.h>
> +    #include <dt-bindings/gpio/gpio.h>
> +    #include <dt-bindings/interrupt-controller/arm-gic.h>
> +    #include <dt-bindings/interrupt-controller/irq.h>
> +    #include <dt-bindings/power/mt2712-power.h>
> +
> +    eth: ethernet@1101c000 {
> +        compatible = "mediatek,mt2712-gmac", "snps,dwmac-4.20a";
> +        reg = <0x1101c000 0x1300>;
> +        interrupts = <GIC_SPI 237 IRQ_TYPE_LEVEL_LOW>;
> +        interrupt-names = "macirq";
> +        phy-mode ="rgmii-rxid";
> +        mac-address = [00 55 7b b5 7d f7];
> +        clock-names = "axi",
> +                      "apb",
> +                      "mac_main",
> +                      "ptp_ref",
> +                      "rmii_internal";
> +        clocks = <&pericfg CLK_PERI_GMAC>,
> +                 <&pericfg CLK_PERI_GMAC_PCLK>,
> +                 <&topckgen CLK_TOP_ETHER_125M_SEL>,
> +                 <&topckgen CLK_TOP_ETHER_50M_SEL>,
> +                 <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>;
> +        assigned-clocks = <&topckgen CLK_TOP_ETHER_125M_SEL>,
> +                          <&topckgen CLK_TOP_ETHER_50M_SEL>,
> +                          <&topckgen CLK_TOP_ETHER_50M_RMII_SEL>;
> +        assigned-clock-parents = <&topckgen CLK_TOP_ETHERPLL_125M>,
> +                                 <&topckgen CLK_TOP_APLL1_D3>,
> +                                 <&topckgen CLK_TOP_ETHERPLL_50M>;
> +        power-domains = <&scpsys MT2712_POWER_DOMAIN_AUDIO>;
> +        mediatek,pericfg = <&pericfg>;
> +        mediatek,tx-delay-ps = <1530>;
> +        snps,txpbl = <1>;
> +        snps,rxpbl = <1>;
> +        snps,reset-gpio = <&pio 87 GPIO_ACTIVE_LOW>;
> +        snps,reset-delays-us = <0 10000 10000>;
> +    };
> -- 
> 2.25.1
> 
> 

^ permalink raw reply

* Re: [PATCH v6 00/14] genirq: Cleanup the abuse of irq_set_affinity_hint()
From: Thomas Gleixner @ 2021-12-10 18:44 UTC (permalink / raw)
  To: Nitesh Lal, Martin K. Petersen
  Cc: Juri Lelli, Dick Kennedy, Marcelo Tosatti, linux-kernel,
	linux-pci, linux-scsi, netdev, davem, ajit.khaparde,
	sriharsha.basavapatna, somnath.kotur, huangguangbin2, huangdaode,
	Frederic Weisbecker, Alex Belits, Bjorn Helgaas, rostedt,
	Peter Zijlstra, Jesse Brandeburg, Robin Murphy, Ingo Molnar,
	jbrandeb, akpm, sfr, stephen, rppt, chris.friesen, Marc Zyngier,
	Neil Horman, pjwaskiewicz, Stefan Assmann, Tomas Henzl,
	james.smart, Ken Cox, faisal.latif, shiraz.saleem, tariqt,
	Alaa Hleihel, Kamal Heib, borisp, saeedm, Nikolova, Tatyana E,
	Ismail, Mustafa, Al Stone, Leon Romanovsky, Chandrakanth Patil,
	bjorn.andersson, chunkuang.hu, yongqiang.niu, baolin.wang7,
	Petr Oros, Ming Lei, Ewan Milne, jejb, kabel, Viresh Kumar,
	Jakub Kicinski, kashyap.desai, Sumit Saxena,
	shivasharan.srikanteshwara, sathya.prakash, Sreekanth Reddy,
	suganath-prabu.subramani, ley.foon.tan, jbrunet, johannes,
	snelson, lewis.hanly, benve, _govind, jassisinghbrar
In-Reply-To: <CAFki+L=5sLN+nU+YpSSrQN0zkAOKrJorevm0nQ+KdwCpnOzf3w@mail.gmail.com>

On Fri, Dec 10 2021 at 08:51, Nitesh Lal wrote:
> On Wed, Nov 24, 2021 at 5:16 PM Nitesh Lal <nilal@redhat.com> wrote:
>> > The more general question is whether I should queue all the others or
>> > whether some subsystem would prefer to pull in a tagged commit on top of
>> > rc1. I'm happy to carry them all of course.
>> >
>>
>> I am fine either way.
>> In the past, while I was asking for more testing help I was asked if the
>> SCSI changes are part of Martin's scsi-fixes tree as that's something
>> Broadcom folks test to check for regression.
>> So, maybe Martin can pull this up?
>>
>
> Gentle ping.
> Any thoughts on the above query?

As nobody cares, I'll pick it up.

Thanks,

        tglx

^ permalink raw reply

* Re: [PATCH net] mptcp: fix NULL ptr dereference in inet_csk_accept()
From: Paolo Abeni @ 2021-12-10 18:32 UTC (permalink / raw)
  To: netdev; +Cc: mptcp, Geliang Tang
In-Reply-To: <299865ffd73315ea549ed4a8026783633203a237.1639155048.git.pabeni@redhat.com>

On Fri, 2021-12-10 at 17:51 +0100, Paolo Abeni wrote:
> Since commit 740d798e8767 ("mptcp: remove id 0 address"), the PM
> can remove the MPTCP first subflow in response to the netlink DEL_ADDR
> command. At subflow removal time, the TCP subflow socket is orphaned.
> 
> If the relevant MPTCP socket is in listening status and such
> operation races with an accept(), the kernel will access a NULL wait
> queue, as reported by syzbot:
> 
> general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
> CPU: 1 PID: 6550 Comm: syz-executor122 Not tainted 5.16.0-rc4-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__lock_acquire+0xd7d/0x54a0 kernel/locking/lockdep.c:4897
> Code: 0f 0e 41 be 01 00 00 00 0f 86 c8 00 00 00 89 05 69 cc 0f 0e e9 bd 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 da 48 c1 ea 03 <80> 3c 02 00 0f 85 f3 2f 00 00 48 81 3b 20 75 17 8f 0f 84 52 f3 ff
> RSP: 0018:ffffc90001f2f818 EFLAGS: 00010016
> RAX: dffffc0000000000 RBX: 0000000000000018 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000001
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000000
> R13: ffff88801b98d700 R14: 0000000000000000 R15: 0000000000000001
> FS:  00007f177cd3d700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f177cd1b268 CR3: 000000001dd55000 CR4: 0000000000350ee0
> Call Trace:
>  <TASK>
>  lock_acquire kernel/locking/lockdep.c:5637 [inline]
>  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>  _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
>  finish_wait+0xc0/0x270 kernel/sched/wait.c:400
>  inet_csk_wait_for_connect net/ipv4/inet_connection_sock.c:464 [inline]
>  inet_csk_accept+0x7de/0x9d0 net/ipv4/inet_connection_sock.c:497
>  mptcp_accept+0xe5/0x500 net/mptcp/protocol.c:2865
>  inet_accept+0xe4/0x7b0 net/ipv4/af_inet.c:739
>  mptcp_stream_accept+0x2e7/0x10e0 net/mptcp/protocol.c:3345
>  do_accept+0x382/0x510 net/socket.c:1773
>  __sys_accept4_file+0x7e/0xe0 net/socket.c:1816
>  __sys_accept4+0xb0/0x100 net/socket.c:1846
>  __do_sys_accept net/socket.c:1864 [inline]
>  __se_sys_accept net/socket.c:1861 [inline]
>  __x64_sys_accept+0x71/0xb0 net/socket.c:1861
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f177cd8b8e9
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f177cd3d308 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
> RAX: ffffffffffffffda RBX: 00007f177ce13408 RCX: 00007f177cd8b8e9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
> RBP: 00007f177ce13400 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f177ce1340c
> R13: 00007f177cde1004 R14: 6d705f706374706d R15: 0000000000022000
>  </TASK>
> 
> Fix the issue explicitly preventing the PM from closing subflows
> of MPTCP socket in listener status.
> 
> Reported-and-tested-by: syzbot+e4d843bb96a9431e6331@syzkaller.appspotmail.com
> Fixes: 740d798e8767 ("mptcp: remove id 0 address")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

I'm dumb. We likely have a better fix for this one. I initially thought
it was not suitable for -net... as I was looking at the wrong patch :(

Self-nack, I'll test and send the other patch.

Sorry for the noise.

Paolo


^ permalink raw reply

* Re: [PATCH 1/8] perf/kprobe: Add support to create multiple probes
From: Andrii Nakryiko @ 2021-12-10 18:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Masami Hiramatsu,
	Steven Rostedt, Networking, bpf, lkml, Ingo Molnar, Mark Rutland,
	Martin KaFai Lau, Alexander Shishkin, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Ravi Bangoria
In-Reply-To: <YbNLPdrA80OMbzdS@krava>

On Fri, Dec 10, 2021 at 4:42 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Dec 08, 2021 at 02:50:09PM +0100, Jiri Olsa wrote:
> > On Mon, Dec 06, 2021 at 07:15:58PM -0800, Andrii Nakryiko wrote:
> > > On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > > > > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > >
> > > > > > Adding support to create multiple probes within single perf event.
> > > > > > This way we can associate single bpf program with multiple kprobes,
> > > > > > because bpf program gets associated with the perf event.
> > > > > >
> > > > > > The perf_event_attr is not extended, current fields for kprobe
> > > > > > attachment are used for multi attachment.
> > > > >
> > > > > I'm a bit concerned with complicating perf_event_attr further to
> > > > > support this multi-attach. For BPF, at least, we now have
> > > > > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > > > > syscall which allows much simpler and cleaner API to do this. Libbpf
> > > > > will actually pick bpf_link-based attachment if kernel supports it. I
> > > > > think we should better do bpf_link-based approach from the get go.
> > > > >
> > > > > Another thing I'd like you to keep in mind and think about is BPF
> > > > > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > > > > arbitrary user-provided u64 value which will be accessible from BPF
> > > > > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > > > > becomes an extremely crucial feature to support, otherwise it's both
> > > > > expensive, inconvenient and complicated to be able to distinguish
> > > > > between different instances of the same multi-attach kprobe
> > > > > invocation. So with that, what would be the interface to specify these
> > > > > BPF cookies for this multi-attach kprobe, if we are going through
> > > > > perf_event_attr. Probably picking yet another unused field and
> > > > > union-izing it with a pointer. It will work, but makes the interface
> > > > > even more overloaded. While for LINK_CREATE we can just add another
> > > > > pointer to a u64[] with the same size as number of kfunc names and
> > > > > offsets.
> > > >
> > > > I'm not sure we could bypass perf event easily.. perhaps introduce
> > > > BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> > > > type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> > > > that might be that way we'd have full control over the API
> > >
> > > Sure, new type works.
> > >
> > > >
> > > > >
> > > > > But other than that, I'm super happy that you are working on these
> > > > > complicated multi-attach capabilities! It would be great to benchmark
> > > > > one-by-one attachment vs multi-attach to the same set of kprobes once
> > > > > you arrive at the final implementation.
> > > >
> > > > I have the change for bpftrace to use this and even though there's
> > > > some speed up, it's not as substantial as for trampolines
> > > >
> > > > looks like we 'only' got rid of the multiple perf syscall overheads,
> > > > compared to rcu syncs timeouts like we eliminated for trampolines
> > >
> > > if it's just eliminating a pretty small overhead of multiple syscalls,
> > > then it would be quite disappointing to add a bunch of complexity just
> > > for that.
> >
> > I meant it's not as huge a saving as for trampolines, but I expect some
> > noticeable speedup, I'll make more benchmarks with current patchset
>
> so with this approach there's noticeable speedup, but it's not the
> 'instant attachment speed' as for trampolines
>
> as a base I used bpftrace with change that allows to reuse bpf program
> for multiple kprobes
>
> bpftrace standard attach of 672 kprobes:
>
>   Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; }  i:ms:10 { printf("KRAVA\n"); exit() }':
>
>       70.548897815 seconds time elapsed
>
>        0.909996000 seconds user
>       50.622834000 seconds sys
>
>
> bpftrace using interface from this patchset attach of 673 kprobes:
>
>   Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; }  i:ms:10 { printf("KRAVA\n"); exit() }':
>
>       36.947586803 seconds time elapsed
>
>        0.272585000 seconds user
>       30.900831000 seconds sys
>
>
> so it's noticeable, but I wonder it's not enough ;-)

A typical retsnoop run for a BPF use case attaches to ~1200 functions.
Ask yourself whether a tool that takes 36 seconds to start up is a
great user experience. ;)
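
To put the quoted measurements in per-probe terms, here is a small back-of-the-envelope helper in userspace C. It is only a sketch: the function names are made up for illustration, and scaling linearly with the number of probes is an assumption, not something measured in this thread.

```c
#include <assert.h>

/* Average wall-clock seconds spent per attached probe. */
static double per_probe_seconds(double elapsed_s, unsigned int nr_probes)
{
	assert(nr_probes > 0);
	return elapsed_s / nr_probes;
}

/* Linear extrapolation (an assumption) to a different number of probes. */
static double projected_seconds(double elapsed_s, unsigned int nr_probes,
				unsigned int target_probes)
{
	return per_probe_seconds(elapsed_s, nr_probes) * target_probes;
}
```

Feeding in the figures quoted above (70.548897815 s for 672 kprobes, 36.947586803 s for 673) gives roughly 105 ms and 55 ms per probe, so under the linear assumption 1200 functions would still take a little over a minute with the patchset.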

>
> jirka
>
> >
> > > Are there any reasons we can't use the same low-level ftrace
> > > batch attach API to speed this up considerably? I assume it's only
> > > possible if kprobe is attached at the beginning of the function (not
> > > sure how kretprobe is treated here), so we can either say that this
> > > new kprobe prog type can only be attached at the beginning of each
> > > function and enforce that (probably would be totally reasonable
> > > assumption as that's what's happening most frequently in practice).
> > > Worst case, should be possible to split all requested attach targets
> > > into two groups, one fast at function entry and all the rest.
> > >
> > > Am I too far off on this one? There might be some more complications
> > > that I don't see.
> >
> > I'd need to check more on kprobes internals, but.. ;-)
> >
> > the new ftrace interface is special for 'direct' trampolines and
> > I think that although kprobes can use ftrace for attaching, they
> > use it in a different way
> >
> > also this current 'multi attach' approach is on top of current kprobe
> > interface, if we wanted to use the new ftrace API we'd need to add new
> > kprobe interface and change the kprobe attaching to use it (for cases
> > it's attached at the function entry)
> >
> > jirka
> >
> > >
> > > >
> > > > I'll make full benchmarks once we have some final solution
> > > >
> > > > jirka
> > > >
> > >
>

^ permalink raw reply

* Re: [PATCH] samples: bpf: fix tracex2 due to empty sys_write count argument
From: Yonghong Song @ 2021-12-10 18:28 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko
  Cc: bpf, netdev
In-Reply-To: <20211210111918.4904-1-danieltimlee@gmail.com>



On 12/10/21 3:19 AM, Daniel T. Lee wrote:
> Currently from syscall entry, argument can't be fetched correctly as a
> result of register cleanup.
> 
>      commit 6b8cf5cc9965 ("x86/entry/64/compat: Clear registers for compat syscalls, to reduce speculation attack surface")
> 
> For example, in the commit above, registers are cleared prior to the syscall.
> To be more specific, sys_write syscall has count size as a third argument.
> But this can't be fetched from __x64_sys_enter/__s390x_sys_enter due to
> register cleanup. (e.g. [x86] xorl %r8d, %r8d / [s390x] xgr %r7, %r7)

is this the real reason? Did you build 32-bit user space application?
Note that the above commit is for compat syscalls.

> 
> This commit fixes this problem by modifying the trace event to ksys_write
> instead of sys_write syscall entry.
> 
>      # Wrong example of 'write()' syscall argument fetching
>      # ./tracex2
>      ...
>      pid 50909 cmd dd uid 0
>             syscall write() stats
>       byte_size       : count     distribution
>         1 -> 1        : 4968837  |************************************* |
> 
>      # Successful example of 'write()' syscall argument fetching
>      # (dd's write bytes at a time defaults to 512)
>      # ./tracex2
>      ...
>      pid 3095 cmd dd uid 0
>             syscall write() stats
>       byte_size       : count     distribution
>      ...
>       256 -> 511      : 0        |                                      |
>       512 -> 1023     : 4968844  |************************************* |
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>   samples/bpf/tracex2_kern.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
> index 5bc696bac27d..96dff3bea227 100644
> --- a/samples/bpf/tracex2_kern.c
> +++ b/samples/bpf/tracex2_kern.c
> @@ -78,7 +78,7 @@ struct {
>   	__uint(max_entries, 1024);
>   } my_hist_map SEC(".maps");
>   
> -SEC("kprobe/" SYSCALL(sys_write))
> +SEC("kprobe/ksys_write")
>   int bpf_prog3(struct pt_regs *ctx)
>   {
>   	long write_size = PT_REGS_PARM3(ctx);

I think the real reason for the failure is SYSCALL_WRAPPER.
Please take a look at test_probe_write_user_kern.c.

The issue with ksys_write() is that it can easily be inlined. For 
example, the source code,

ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
{
	...
}

SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
                 size_t, count)
{
         return ksys_write(fd, buf, count);
}

In my 5.12 kernel, I have
ffffffff81053b00 <__x64_sys_write>:
ffffffff81053b00: 0f 1f 44 00 00        nopl    (%rax,%rax)
ffffffff81053b05: 41 57                 pushq   %r15
ffffffff81053b07: 41 56                 pushq   %r14
ffffffff81053b09: 41 55                 pushq   %r13
ffffffff81053b0b: 41 54                 pushq   %r12
ffffffff81053b0d: 53                    pushq   %rbx
ffffffff81053b0e: 48 83 ec 10           subq    $16, %rsp
ffffffff81053b12: 65 48 8b 04 25 28 00 00 00    movq    %gs:40, %rax
ffffffff81053b1b: 48 89 44 24 08        movq    %rax, 8(%rsp)
ffffffff81053b20: 8b 47 70              movl    112(%rdi), %eax
ffffffff81053b23: 4c 8b 7f 60           movq    96(%rdi), %r15
ffffffff81053b27: 4c 8b 67 68           movq    104(%rdi), %r12
ffffffff81053b2b: 89 c7                 movl    %eax, %edi
ffffffff81053b2d: e8 6e a3 00 00        callq   0xffffffff8105dea0 <__fdget_pos>
...

Here ksys_write() has been inlined into __x64_sys_write.
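
To illustrate why the argument fetch goes wrong with the syscall wrapper, here is a minimal user-space sketch. It assumes the x86-64 struct pt_regs field order, which matches the loads from 112(%rdi), 104(%rdi) and 96(%rdi) in the disassembly above (di, si and dx of the structure passed in %rdi). The names struct regs, parm3_direct() and parm3_wrapped() are hypothetical and exist only for this illustration; this is not the BPF program itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the x86-64 struct pt_regs field order (illustration only). */
struct regs {
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di;
	uint64_t orig_ax, ip, cs, flags, sp, ss;
};

/* These offsets are the ones used by the loads in the disassembly. */
static_assert(offsetof(struct regs, dx) == 96, "dx (count) at offset 96");
static_assert(offsetof(struct regs, si) == 104, "si (buf) at offset 104");
static_assert(offsetof(struct regs, di) == 112, "di (fd) at offset 112");

/* Reading the third argument straight from the probe context: the probed
 * function's own %rdx, which is not a syscall argument when a wrapper is used. */
static uint64_t parm3_direct(const struct regs *ctx)
{
	return ctx->dx;
}

/*
 * With the wrapper, the probed function's first argument (%rdi) points to
 * the saved user registers. The real third syscall argument is the dx
 * field of that inner structure.
 */
static uint64_t parm3_wrapped(const struct regs *ctx)
{
	const struct regs *inner = (const struct regs *)(uintptr_t)ctx->di;

	return inner->dx;
}
```

With an inner register set holding count = 512 and an outer context whose di points at it, parm3_wrapped() returns 512, while parm3_direct() returns whatever happened to be in the outer dx.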


* Re: [PATCH V2] net: bonding: Add support for IPV6 ns/na
From: kernel test robot @ 2021-12-10 18:13 UTC (permalink / raw)
  To: Sun Shouxin, j.vosburgh, vfalico, andy, davem, kuba
  Cc: llvm, kbuild-all, netdev, linux-kernel, huyd12
In-Reply-To: <1639141691-3741-1-git-send-email-sunshouxin@chinatelecom.cn>

Hi Sun,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.16-rc4 next-20211208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211210-210940
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c741e49150dbb0c0aebe234389f4aa8b47958fa8
config: hexagon-randconfig-r006-20211210 (https://download.01.org/0day-ci/archive/20211211/202112110234.hkzxELcK-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 097a1cb1d5ebb3a0ec4bcaed8ba3ff6a8e33c00a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f86d634c3ced7ec9b5af72e4b92bca681be033f7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211210-210940
        git checkout f86d634c3ced7ec9b5af72e4b92bca681be033f7
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash drivers/net/bonding/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/net/bonding/bond_alb.c:1318:26: error: implicit declaration of function 'csum_ipv6_magic' [-Werror,-Wimplicit-function-declaration]
                           icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
                                                 ^
   drivers/net/bonding/bond_alb.c:1318:26: note: did you mean 'csum_tcpudp_magic'?
   arch/hexagon/include/asm/checksum.h:21:9: note: 'csum_tcpudp_magic' declared here
   __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
           ^
   arch/hexagon/include/asm/checksum.h:20:27: note: expanded from macro 'csum_tcpudp_magic'
   #define csum_tcpudp_magic csum_tcpudp_magic
                             ^
   1 error generated.


vim +/csum_ipv6_magic +1318 drivers/net/bonding/bond_alb.c

  1283	
  1284	static void alb_change_nd_option(struct sk_buff *skb, void *data)
  1285	{
  1286		struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
  1287		struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
  1288		struct net_device *dev = skb->dev;
  1289		struct icmp6hdr *icmp6h = icmp6_hdr(skb);
  1290		struct ipv6hdr *ip6hdr = ipv6_hdr(skb);
  1291		u8 *lladdr = NULL;
  1292		u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) +
  1293					offsetof(struct nd_msg, opt));
  1294	
  1295		while (ndoptlen) {
  1296			int l;
  1297	
  1298			switch (nd_opt->nd_opt_type) {
  1299			case ND_OPT_SOURCE_LL_ADDR:
  1300			case ND_OPT_TARGET_LL_ADDR:
  1301			lladdr = ndisc_opt_addr_data(nd_opt, dev);
  1302			break;
  1303	
  1304			default:
  1305			lladdr = NULL;
  1306			break;
  1307			}
  1308	
  1309			l = nd_opt->nd_opt_len << 3;
  1310	
  1311			if (ndoptlen < l || l == 0)
  1312				return;
  1313	
  1314			if (lladdr) {
  1315				memcpy(lladdr, data, dev->addr_len);
  1316				icmp6h->icmp6_cksum = 0;
  1317	
> 1318				icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
  1319								      &ip6hdr->daddr,
  1320							ntohs(ip6hdr->payload_len),
  1321							IPPROTO_ICMPV6,
  1322							csum_partial(icmp6h,
  1323								     ntohs(ip6hdr->payload_len), 0));
  1324			}
  1325			ndoptlen -= l;
  1326			nd_opt = ((void *)nd_opt) + l;
  1327		}
  1328	}
  1329	
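
For context, the error above means no prototype for csum_ipv6_magic() is visible in bond_alb.c with this config; in the kernel tree it is declared in <net/ip6_checksum.h>, so a missing include is the likely cause. What the flagged call computes can be sketched in user space as follows. This is a minimal illustration of the IPv6 pseudo-header checksum, not the kernel implementation; sum16() and icmp6_checksum() are hypothetical names, and the result is returned in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the big-endian 16-bit words of buf to a running 32-bit accumulator. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t acc)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8;	/* zero-pad an odd tail */
	return acc;
}

/*
 * Internet checksum over the IPv6 pseudo-header (source address,
 * destination address, upper-layer length, next header) followed by the
 * upper-layer payload. The caller zeroes the checksum field first.
 */
static uint16_t icmp6_checksum(const uint8_t saddr[16], const uint8_t daddr[16],
			       const uint8_t *payload, uint32_t len,
			       uint8_t next_hdr)
{
	uint32_t acc = 0;

	acc = sum16(saddr, 16, acc);
	acc = sum16(daddr, 16, acc);
	acc += len >> 16;
	acc += len & 0xffff;
	acc += next_hdr;
	acc = sum16(payload, len, acc);

	/* Fold the carries back in (ones' complement addition). */
	while (acc >> 16)
		acc = (acc & 0xffff) + (acc >> 16);
	return (uint16_t)~acc;
}
```

A quick sanity check: after storing the result in the checksum field, recomputing over the same packet yields 0.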

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [RFC PATCH v2 net-next 0/4] DSA master state tracking
From: Ansuel Smith @ 2021-12-10 18:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev@vger.kernel.org, David S. Miller, Jakub Kicinski,
	Andrew Lunn, Vivien Didelot, Florian Fainelli
In-Reply-To: <61b38e7f.1c69fb81.96d1c.7933@mx.google.com>

On Fri, Dec 10, 2021 at 06:29:32PM +0100, Ansuel Smith wrote:
> On Fri, Dec 10, 2021 at 05:15:30PM +0000, Vladimir Oltean wrote:
> > On Fri, Dec 10, 2021 at 06:10:45PM +0100, Ansuel Smith wrote:
> > > On Fri, Dec 10, 2021 at 05:02:42PM +0000, Vladimir Oltean wrote:
> > > > On Fri, Dec 10, 2021 at 04:37:52AM +0100, Ansuel Smith wrote:
> > > > > On Thu, Dec 09, 2021 at 07:39:23PM +0200, Vladimir Oltean wrote:
> > > > > > This patch set is provided solely for review purposes (therefore not to
> > > > > > be applied anywhere) and for Ansuel to test whether they resolve the
> > > > > > slowdown reported here:
> > > > > > https://patchwork.kernel.org/project/netdevbpf/cover/20211207145942.7444-1-ansuelsmth@gmail.com/
> > > > > > 
> > > > > > The patches posted here are mainly to offer a consistent
> > > > > > "master_state_change" chain of events to switches, without duplicates,
> > > > > > and always starting with operational=true and ending with
> > > > > > operational=false. This way, drivers should know when they can perform
> > > > > > Ethernet-based register access, and need not care about more than that.
> > > > > > 
> > > > > > Changes in v2:
> > > > > > - dropped some useless patches
> > > > > > - also check master operstate.
> > > > > > 
> > > > > > Vladimir Oltean (4):
> > > > > >   net: dsa: provide switch operations for tracking the master state
> > > > > >   net: dsa: stop updating master MTU from master.c
> > > > > >   net: dsa: hold rtnl_mutex when calling dsa_master_{setup,teardown}
> > > > > >   net: dsa: replay master state events in
> > > > > >     dsa_tree_{setup,teardown}_master
> > > > > > 
> > > > > >  include/net/dsa.h  | 11 +++++++
> > > > > >  net/dsa/dsa2.c     | 80 +++++++++++++++++++++++++++++++++++++++++++---
> > > > > >  net/dsa/dsa_priv.h | 13 ++++++++
> > > > > >  net/dsa/master.c   | 29 ++---------------
> > > > > >  net/dsa/slave.c    | 27 ++++++++++++++++
> > > > > >  net/dsa/switch.c   | 15 +++++++++
> > > > > >  6 files changed, 145 insertions(+), 30 deletions(-)
> > > > > > 
> > > > > > -- 
> > > > > > 2.25.1
> > > > > > 
> > > > > 
> > > > > Hi, I tested this v2 and I still have 2 ethernet mdio failing on init.
> > > > > I don't think we have other way to track this. Am I wrong?
> > > > > 
> > > > > All works correctly with this and promisc_on_master.
> > > > > If you have other test, feel free to send me other stuff to test.
> > > > > 
> > > > > (I'm starting to think the fail is caused by some delay that the switch
> > > > > require to actually start accepting packet or from the reinit? But I'm
> > > > > not sure... don't know if you notice something from the pcap)
> > > > 
> > > > I've opened the pcap just now. The Ethernet MDIO packets are
> > > > non-standard. When the DSA master receives them, it expects the first 6
> > > > octets to be the MAC DA, because that's the format of an Ethernet frame.
> > > > But the packets have this other format, according to your own writing:
> > > > 
> > > > /* Specific define for in-band MDIO read/write with Ethernet packet */
> > > > #define QCA_HDR_MDIO_SEQ_LEN           4 /* 4 byte for the seq */
> > > > #define QCA_HDR_MDIO_COMMAND_LEN       4 /* 4 byte for the command */
> > > > #define QCA_HDR_MDIO_DATA1_LEN         4 /* First 4 byte for the mdio data */
> > > > #define QCA_HDR_MDIO_HEADER_LEN        (QCA_HDR_MDIO_SEQ_LEN + \
> > > >                                        QCA_HDR_MDIO_COMMAND_LEN + \
> > > >                                        QCA_HDR_MDIO_DATA1_LEN)
> > > > 
> > > > #define QCA_HDR_MDIO_DATA2_LEN         12 /* Other 12 byte for the mdio data */
> > > > #define QCA_HDR_MDIO_PADDING_LEN       34 /* Padding to reach the min Ethernet packet */
> > > > 
> > > > The first 6 octets change like crazy in your pcap. Definitely can't add
> > > > that to the RX filter of the DSA master.
> > > > 
> > > > So yes, promisc_on_master is precisely what you need, it exists for
> > > > situations like this.
> > > > 
> > > > Considering this, I guess it works?
> > > 
> > > Yes it works! We can totally accept 2 mdio timeout out of a good way to
> > > track the master port. It's probably related to other stuff like switch
> > > delay or other.
> > > 
> > > Wonder the next step is wait for this to be accepted and then I can
> > > propose a v3 of my patch? Or net-next is closed now and I should just
> > > send v3 RFC saying it does depend on this?
> > 
> > Wait a minute, I don't think I understood your previous reply.
> > With promisc_on_master, is there or is there not any timeout?
> 
> With promisc_on_master I have only 2 timeout.
> 
> > My understanding was this: DSA tells you when the master is up and
> > operational. That information is correct, except it isn't sufficient and
> > you don't see the replies back. Later during boot, you have some init
> > scripts triggered by user space that create a bridge interface and put
> > the switch ports under the bridge. The bridge puts the switch interfaces
> > in promiscuous mode, because that's what bridges do. Then DSA propagates
> > the promiscuous mode from the switch ports to the DSA master, and once
> > the master is promiscuous, the Ethernet MDIO starts working too.
> > Now, with promisc_on_master set, the DSA master is already promiscuous
> > by the time DSA tells you that it's up and running. Hence your message
> > that "All works correctly with this and promisc_on_master."
> > What did I misunderstand?
> 
> You got all correct. But still I have these 2 timeout at the very start.
> Let me give you another pastebin to make this more clear. [0]
> Transaction done is when the Ethernet packet is received and processed.
> I added some pr with the events received by switch.c
> 
> I should check if the tagger receive some packet before the
> "function timeout". 
> What I mean with "acceptable state" is that aside from the 2
> timeout everything else works correctly without any slowdown in the
> init process.
> 
> [0] https://pastebin.com/VfGB5hAQ
> 
> -- 
> 	Ansuel

OK, I added more tracing: packets reach the tagger right after the
IPv6 "link becomes ready" log. That log only checks that the interface
is up and has a valid qdisc.
I noticed that after "link becomes ready" we get a CHANGE event for
eth0. That should be the correct way to tell when the CPU port is
actually usable.
(Just to be clear: before the link becomes ready, no packet reaches the
tagger and the completion times out.)
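
The idea discussed in this thread (treat the master as usable only when it is both administratively up and operationally up, and notify the switch only on real transitions, without duplicates) can be sketched in user space roughly as below. This is only an illustration under those assumptions; struct master_tracker and master_tracker_update() are hypothetical names and do not come from the patch set.

```c
#include <stdbool.h>

/* Hypothetical tracker for the state reported to the switch driver. */
struct master_tracker {
	bool admin_up;		/* interface administratively up */
	bool oper_up;		/* operstate says the link is usable */
	bool operational;	/* last state reported to the switch */
	int notifications;	/* number of state changes reported */
};

/*
 * Record the new inputs and report only when the combined state actually
 * changes. Returns true if a notification was emitted.
 */
static bool master_tracker_update(struct master_tracker *t,
				  bool admin_up, bool oper_up)
{
	bool now = admin_up && oper_up;

	t->admin_up = admin_up;
	t->oper_up = oper_up;

	if (now == t->operational)
		return false;	/* no transition: suppress the duplicate */

	t->operational = now;
	t->notifications++;
	return true;
}
```

Starting from a zero-initialized tracker, bringing the interface up without a usable link emits nothing; the first notification is always operational=true, and a later down event produces exactly one operational=false.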

-- 
	Ansuel


* Re: [PATCH V2] net: bonding: Add support for IPV6 ns/na
From: kernel test robot @ 2021-12-10 18:03 UTC (permalink / raw)
  To: Sun Shouxin, j.vosburgh, vfalico, andy, davem, kuba
  Cc: kbuild-all, netdev, linux-kernel, huyd12
In-Reply-To: <1639141691-3741-1-git-send-email-sunshouxin@chinatelecom.cn>

Hi Sun,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.16-rc4 next-20211208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211210-210940
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git c741e49150dbb0c0aebe234389f4aa8b47958fa8
config: nios2-randconfig-r026-20211210 (https://download.01.org/0day-ci/archive/20211211/202112110146.ZZvFe0rG-lkp@intel.com/config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/f86d634c3ced7ec9b5af72e4b92bca681be033f7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211210-210940
        git checkout f86d634c3ced7ec9b5af72e4b92bca681be033f7
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=nios2 SHELL=/bin/bash drivers/net/bonding/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/bonding/bond_alb.c: In function 'alb_change_nd_option':
>> drivers/net/bonding/bond_alb.c:1318:47: error: implicit declaration of function 'csum_ipv6_magic'; did you mean 'csum_tcpudp_magic'? [-Werror=implicit-function-declaration]
    1318 |                         icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
         |                                               ^~~~~~~~~~~~~~~
         |                                               csum_tcpudp_magic
   cc1: some warnings being treated as errors


vim +1318 drivers/net/bonding/bond_alb.c

  1283	
  1284	static void alb_change_nd_option(struct sk_buff *skb, void *data)
  1285	{
  1286		struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
  1287		struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
  1288		struct net_device *dev = skb->dev;
  1289		struct icmp6hdr *icmp6h = icmp6_hdr(skb);
  1290		struct ipv6hdr *ip6hdr = ipv6_hdr(skb);
  1291		u8 *lladdr = NULL;
  1292		u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) +
  1293					offsetof(struct nd_msg, opt));
  1294	
  1295		while (ndoptlen) {
  1296			int l;
  1297	
  1298			switch (nd_opt->nd_opt_type) {
  1299			case ND_OPT_SOURCE_LL_ADDR:
  1300			case ND_OPT_TARGET_LL_ADDR:
  1301			lladdr = ndisc_opt_addr_data(nd_opt, dev);
  1302			break;
  1303	
  1304			default:
  1305			lladdr = NULL;
  1306			break;
  1307			}
  1308	
  1309			l = nd_opt->nd_opt_len << 3;
  1310	
  1311			if (ndoptlen < l || l == 0)
  1312				return;
  1313	
  1314			if (lladdr) {
  1315				memcpy(lladdr, data, dev->addr_len);
  1316				icmp6h->icmp6_cksum = 0;
  1317	
> 1318				icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
  1319								      &ip6hdr->daddr,
  1320							ntohs(ip6hdr->payload_len),
  1321							IPPROTO_ICMPV6,
  1322							csum_partial(icmp6h,
  1323								     ntohs(ip6hdr->payload_len), 0));
  1324			}
  1325			ndoptlen -= l;
  1326			nd_opt = ((void *)nd_opt) + l;
  1327		}
  1328	}
  1329	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


* Re: [PATCH] selftests: mptcp: remove duplicate include in mptcp_inq.c
From: Mat Martineau @ 2021-12-10 18:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Matthieu Baerts, cgel.zte, davem, shuah, netdev, mptcp,
	linux-kselftest, linux-kernel, Ye Guojin, ZealRobot
In-Reply-To: <20211210075701.06bfced2@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Fri, 10 Dec 2021, Jakub Kicinski wrote:

> On Fri, 10 Dec 2021 16:36:06 +0100 Matthieu Baerts wrote:
>>> Actually, I take that back, let's hear from Mat, he may want to take
>>> the patch via his tree.
>>
>> We "rebase" our tree on top of net-next every night. I think for such
>> small patches with no behaviour change and sent directly to netdev ML,
>> it is probably best to apply them directly. I can check with Mat if it
>> is an issue if you prefer.
>
> Please do, I'm happy to apply the patch but Mat usually prefers to take
> things thru MPTCP tree.
>

Jakub -

It is ok with me if you apply this now, for the reasons Matthieu cited.

The division of labor between Matthieu and me as MPTCP co-maintainers
usually has me upstreaming the patches to netdev, but I do trust
Matthieu's judgement on sending out Reviewed-by tags and advising direct
application to the netdev trees! Also, much like you & David, having offset
timezones can be helpful.

Also appreciate your awareness of the normal patch flow for MPTCP, and 
that you're checking that we're all on the same page.


>> I would have applied it in our MPTCP tree if we were sending PR, not to
>> bother you for such patches but I guess it is best not to have us
>> sending this patch a second time later :)
>>
>> BTW, if you prefer us sending PR over batches of patches, please tell us!
>
> Small preference for patches. It's good to have the code on the ML for
> everyone to look at and mixed PR + patches are a tiny bit more clicking
> for me.
>

Good to know.


Thanks!

--
Mat Martineau
Intel


