Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next 16/16] bnxt_en: Add PCI IDs for 57500 series NPAR devices.
From: Michael Chan @ 2019-07-29 10:10 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1564395033-19511-1-git-send-email-michael.chan@broadcom.com>

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9fe81fa..7a9b92e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -116,6 +116,9 @@ enum board_idx {
 	BCM57508,
 	BCM57504,
 	BCM57502,
+	BCM57508_NPAR,
+	BCM57504_NPAR,
+	BCM57502_NPAR,
 	BCM58802,
 	BCM58804,
 	BCM58808,
@@ -161,6 +164,9 @@ static const struct {
 	[BCM57508] = { "Broadcom BCM57508 NetXtreme-E 10Gb/25Gb/50Gb/100Gb/200Gb Ethernet" },
 	[BCM57504] = { "Broadcom BCM57504 NetXtreme-E 10Gb/25Gb/50Gb/100Gb/200Gb Ethernet" },
 	[BCM57502] = { "Broadcom BCM57502 NetXtreme-E 10Gb/25Gb/50Gb Ethernet" },
+	[BCM57508_NPAR] = { "Broadcom BCM57508 NetXtreme-E Ethernet Partition" },
+	[BCM57504_NPAR] = { "Broadcom BCM57504 NetXtreme-E Ethernet Partition" },
+	[BCM57502_NPAR] = { "Broadcom BCM57502 NetXtreme-E Ethernet Partition" },
 	[BCM58802] = { "Broadcom BCM58802 NetXtreme-S 10Gb/25Gb/40Gb/50Gb Ethernet" },
 	[BCM58804] = { "Broadcom BCM58804 NetXtreme-S 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet" },
 	[BCM58808] = { "Broadcom BCM58808 NetXtreme-S 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet" },
@@ -209,6 +215,12 @@ static const struct pci_device_id bnxt_pci_tbl[] = {
 	{ PCI_VDEVICE(BROADCOM, 0x1750), .driver_data = BCM57508 },
 	{ PCI_VDEVICE(BROADCOM, 0x1751), .driver_data = BCM57504 },
 	{ PCI_VDEVICE(BROADCOM, 0x1752), .driver_data = BCM57502 },
+	{ PCI_VDEVICE(BROADCOM, 0x1800), .driver_data = BCM57508_NPAR },
+	{ PCI_VDEVICE(BROADCOM, 0x1801), .driver_data = BCM57504_NPAR },
+	{ PCI_VDEVICE(BROADCOM, 0x1802), .driver_data = BCM57502_NPAR },
+	{ PCI_VDEVICE(BROADCOM, 0x1803), .driver_data = BCM57508_NPAR },
+	{ PCI_VDEVICE(BROADCOM, 0x1804), .driver_data = BCM57504_NPAR },
+	{ PCI_VDEVICE(BROADCOM, 0x1805), .driver_data = BCM57502_NPAR },
 	{ PCI_VDEVICE(BROADCOM, 0xd802), .driver_data = BCM58802 },
 	{ PCI_VDEVICE(BROADCOM, 0xd804), .driver_data = BCM58804 },
 #ifdef CONFIG_BNXT_SRIOV
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next 07/16] bnxt_en: Set TPA GRO mode flags on 57500 chips properly.
From: Michael Chan @ 2019-07-29 10:10 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1564395033-19511-1-git-send-email-michael.chan@broadcom.com>

On 57500 chips, hardware GRO mode cannot be determined from the TPA
end, so we need to check bp->flags to determine if we are in hardware
GRO mode or not.  Modify bnxt_set_features so that the TPA flags
in bp->flags don't change until the device is closed.  This will ensure
that the fast path can safely rely on bp->flags to determine the
TPA mode.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4856fd7..b615206 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -9345,7 +9345,8 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
 	if (changes & BNXT_FLAG_TPA) {
 		update_tpa = true;
 		if ((bp->flags & BNXT_FLAG_TPA) == 0 ||
-		    (flags & BNXT_FLAG_TPA) == 0)
+		    (flags & BNXT_FLAG_TPA) == 0 ||
+		    (bp->flags & BNXT_FLAG_CHIP_P5))
 			re_init = true;
 	}
 
@@ -9355,9 +9356,8 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
 	if (flags != bp->flags) {
 		u32 old_flags = bp->flags;
 
-		bp->flags = flags;
-
 		if (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
+			bp->flags = flags;
 			if (update_tpa)
 				bnxt_set_ring_params(bp);
 			return rc;
@@ -9365,12 +9365,14 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
 
 		if (re_init) {
 			bnxt_close_nic(bp, false, false);
+			bp->flags = flags;
 			if (update_tpa)
 				bnxt_set_ring_params(bp);
 
 			return bnxt_open_nic(bp, false, false);
 		}
 		if (update_tpa) {
+			bp->flags = flags;
 			rc = bnxt_set_tpa(bp,
 					  (flags & BNXT_FLAG_TPA) ?
 					  true : false);
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next 05/16] bnxt_en: Handle standalone RX_AGG completions.
From: Michael Chan @ 2019-07-29 10:10 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1564395033-19511-1-git-send-email-michael.chan@broadcom.com>

On the new 57500 chips, these new RX_AGG completions are not coalesced
at the TPA_END completion.  Handle these by storing them in the
array in the bnxt_tpa_info struct, as they are seen when processing
the CMPL ring.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b4b3405..810eea2 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1517,6 +1517,17 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	return skb;
 }
 
+static void bnxt_tpa_agg(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
+			 struct rx_agg_cmp *rx_agg)
+{
+	u16 agg_id = TPA_AGG_AGG_ID(rx_agg);
+	struct bnxt_tpa_info *tpa_info;
+
+	tpa_info = &rxr->rx_tpa[agg_id];
+	BUG_ON(tpa_info->agg_count >= MAX_SKB_FRAGS);
+	tpa_info->agg_arr[tpa_info->agg_count++] = *rx_agg;
+}
+
 static void bnxt_deliver_skb(struct bnxt *bp, struct bnxt_napi *bnapi,
 			     struct sk_buff *skb)
 {
@@ -1558,6 +1569,13 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	rxcmp = (struct rx_cmp *)
 			&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
 
+	cmp_type = RX_CMP_TYPE(rxcmp);
+
+	if (cmp_type == CMP_TYPE_RX_TPA_AGG_CMP) {
+		bnxt_tpa_agg(bp, rxr, (struct rx_agg_cmp *)rxcmp);
+		goto next_rx_no_prod_no_len;
+	}
+
 	tmp_raw_cons = NEXT_RAW_CMP(tmp_raw_cons);
 	cp_cons = RING_CMP(tmp_raw_cons);
 	rxcmp1 = (struct rx_cmp_ext *)
@@ -1566,8 +1584,6 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	if (!RX_CMP_VALID(rxcmp1, tmp_raw_cons))
 		return -EBUSY;
 
-	cmp_type = RX_CMP_TYPE(rxcmp);
-
 	prod = rxr->rx_prod;
 
 	if (cmp_type == CMP_TYPE_RX_L2_TPA_START_CMP) {
-- 
2.5.1


^ permalink raw reply related

* [PATCH net-next 03/16] bnxt_en: Refactor TPA logic.
From: Michael Chan @ 2019-07-29 10:10 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <1564395033-19511-1-git-send-email-michael.chan@broadcom.com>

Refactor the TPA logic slightly, so that the code can be more easily
extended to support TPA on the new 57500 chips.  In particular, the
logic to get the next aggregation completion is refactored into a
new function bnxt_get_agg() so that this operation is made more
generalized.  This operation will be different on the new chip in TPA
mode.  The logic to recycle the aggregation buffers has a new start
index parameter added for the same purpose.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 117 ++++++++++++++++++------------
 1 file changed, 69 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7134d2c..7e3a37a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -828,8 +828,20 @@ static inline int bnxt_alloc_rx_page(struct bnxt *bp,
 	return 0;
 }
 
-static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 cp_cons,
-				   u32 agg_bufs)
+static struct rx_agg_cmp *bnxt_get_agg(struct bnxt *bp,
+				       struct bnxt_cp_ring_info *cpr,
+				       u16 cp_cons, u16 curr)
+{
+	struct rx_agg_cmp *agg;
+
+	cp_cons = RING_CMP(ADV_RAW_CMP(cp_cons, curr));
+	agg = (struct rx_agg_cmp *)
+		&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+	return agg;
+}
+
+static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
+				   u16 start, u32 agg_bufs, bool tpa)
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
 	struct bnxt *bp = bnapi->bp;
@@ -845,8 +857,7 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 cp_cons,
 		struct rx_bd *prod_bd;
 		struct page *page;
 
-		agg = (struct rx_agg_cmp *)
-			&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+		agg = bnxt_get_agg(bp, cpr, idx, start + i);
 		cons = agg->rx_agg_cmp_opaque;
 		__clear_bit(cons, rxr->rx_agg_bmap);
 
@@ -874,7 +885,6 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 cp_cons,
 
 		prod = NEXT_RX_AGG(prod);
 		sw_prod = NEXT_RX_AGG(sw_prod);
-		cp_cons = NEXT_CMP(cp_cons);
 	}
 	rxr->rx_agg_prod = prod;
 	rxr->rx_sw_agg_prod = sw_prod;
@@ -957,8 +967,8 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 
 static struct sk_buff *bnxt_rx_pages(struct bnxt *bp,
 				     struct bnxt_cp_ring_info *cpr,
-				     struct sk_buff *skb, u16 cp_cons,
-				     u32 agg_bufs)
+				     struct sk_buff *skb, u16 idx,
+				     u32 agg_bufs, bool tpa)
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
 	struct pci_dev *pdev = bp->pdev;
@@ -973,8 +983,7 @@ static struct sk_buff *bnxt_rx_pages(struct bnxt *bp,
 		struct page *page;
 		dma_addr_t mapping;
 
-		agg = (struct rx_agg_cmp *)
-			&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
+		agg = bnxt_get_agg(bp, cpr, idx, i);
 		cons = agg->rx_agg_cmp_opaque;
 		frag_len = (le32_to_cpu(agg->rx_agg_cmp_len_flags_type) &
 			    RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
@@ -1008,7 +1017,7 @@ static struct sk_buff *bnxt_rx_pages(struct bnxt *bp,
 			 * allocated already.
 			 */
 			rxr->rx_agg_prod = prod;
-			bnxt_reuse_rx_agg_bufs(cpr, cp_cons, agg_bufs - i);
+			bnxt_reuse_rx_agg_bufs(cpr, idx, i, agg_bufs - i, tpa);
 			return NULL;
 		}
 
@@ -1021,7 +1030,6 @@ static struct sk_buff *bnxt_rx_pages(struct bnxt *bp,
 		skb->truesize += PAGE_SIZE;
 
 		prod = NEXT_RX_AGG(prod);
-		cp_cons = NEXT_CMP(cp_cons);
 	}
 	rxr->rx_agg_prod = prod;
 	return skb;
@@ -1081,9 +1089,7 @@ static int bnxt_discard_rx(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	} else if (cmp_type == CMP_TYPE_RX_L2_TPA_END_CMP) {
 		struct rx_tpa_end_cmp *tpa_end = cmp;
 
-		agg_bufs = (le32_to_cpu(tpa_end->rx_tpa_end_cmp_misc_v1) &
-			    RX_TPA_END_CMP_AGG_BUFS) >>
-			   RX_TPA_END_CMP_AGG_BUFS_SHIFT;
+		agg_bufs = TPA_END_AGG_BUFS(tpa_end);
 	}
 
 	if (agg_bufs) {
@@ -1195,11 +1201,10 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	cons_rx_buf->data = NULL;
 }
 
-static void bnxt_abort_tpa(struct bnxt_cp_ring_info *cpr, u16 cp_cons,
-			   u32 agg_bufs)
+static void bnxt_abort_tpa(struct bnxt_cp_ring_info *cpr, u16 idx, u32 agg_bufs)
 {
 	if (agg_bufs)
-		bnxt_reuse_rx_agg_bufs(cpr, cp_cons, agg_bufs);
+		bnxt_reuse_rx_agg_bufs(cpr, idx, 0, agg_bufs, true);
 }
 
 static struct sk_buff *bnxt_gro_func_5731x(struct bnxt_tpa_info *tpa_info,
@@ -1371,9 +1376,7 @@ static inline struct sk_buff *bnxt_gro_skb(struct bnxt *bp,
 	skb_shinfo(skb)->gso_size =
 		le32_to_cpu(tpa_end1->rx_tpa_end_cmp_seg_len);
 	skb_shinfo(skb)->gso_type = tpa_info->gso_type;
-	payload_off = (le32_to_cpu(tpa_end->rx_tpa_end_cmp_misc_v1) &
-		       RX_TPA_END_CMP_PAYLOAD_OFFSET) >>
-		      RX_TPA_END_CMP_PAYLOAD_OFFSET_SHIFT;
+	payload_off = TPA_END_PAYLOAD_OFF(tpa_end);
 	skb = bp->gro_func(tpa_info, payload_off, TPA_END_GRO_TS(tpa_end), skb);
 	if (likely(skb))
 		tcp_gro_complete(skb);
@@ -1403,11 +1406,11 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
 	u8 agg_id = TPA_END_AGG_ID(tpa_end);
 	u8 *data_ptr, agg_bufs;
-	u16 cp_cons = RING_CMP(*raw_cons);
 	unsigned int len;
 	struct bnxt_tpa_info *tpa_info;
 	dma_addr_t mapping;
 	struct sk_buff *skb;
+	u16 idx = 0;
 	void *data;
 
 	if (unlikely(bnapi->in_reset)) {
@@ -1425,19 +1428,19 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	len = tpa_info->len;
 	mapping = tpa_info->mapping;
 
-	agg_bufs = (le32_to_cpu(tpa_end->rx_tpa_end_cmp_misc_v1) &
-		    RX_TPA_END_CMP_AGG_BUFS) >> RX_TPA_END_CMP_AGG_BUFS_SHIFT;
+	agg_bufs = TPA_END_AGG_BUFS(tpa_end);
 
 	if (agg_bufs) {
+		idx = RING_CMP(*raw_cons);
 		if (!bnxt_agg_bufs_valid(bp, cpr, agg_bufs, raw_cons))
 			return ERR_PTR(-EBUSY);
 
 		*event |= BNXT_AGG_EVENT;
-		cp_cons = NEXT_CMP(cp_cons);
+		idx = NEXT_CMP(idx);
 	}
 
 	if (unlikely(agg_bufs > MAX_SKB_FRAGS || TPA_END_ERRORS(tpa_end1))) {
-		bnxt_abort_tpa(cpr, cp_cons, agg_bufs);
+		bnxt_abort_tpa(cpr, idx, agg_bufs);
 		if (agg_bufs > MAX_SKB_FRAGS)
 			netdev_warn(bp->dev, "TPA frags %d exceeded MAX_SKB_FRAGS %d\n",
 				    agg_bufs, (int)MAX_SKB_FRAGS);
@@ -1447,7 +1450,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	if (len <= bp->rx_copy_thresh) {
 		skb = bnxt_copy_skb(bnapi, data_ptr, len, mapping);
 		if (!skb) {
-			bnxt_abort_tpa(cpr, cp_cons, agg_bufs);
+			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			return NULL;
 		}
 	} else {
@@ -1456,7 +1459,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 
 		new_data = __bnxt_alloc_rx_data(bp, &new_mapping, GFP_ATOMIC);
 		if (!new_data) {
-			bnxt_abort_tpa(cpr, cp_cons, agg_bufs);
+			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			return NULL;
 		}
 
@@ -1471,7 +1474,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 
 		if (!skb) {
 			kfree(data);
-			bnxt_abort_tpa(cpr, cp_cons, agg_bufs);
+			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			return NULL;
 		}
 		skb_reserve(skb, bp->rx_offset);
@@ -1479,7 +1482,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	}
 
 	if (agg_bufs) {
-		skb = bnxt_rx_pages(bp, cpr, skb, cp_cons, agg_bufs);
+		skb = bnxt_rx_pages(bp, cpr, skb, idx, agg_bufs, true);
 		if (!skb) {
 			/* Page reuse already handled by bnxt_rx_pages(). */
 			return NULL;
@@ -1623,7 +1626,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 
 		bnxt_reuse_rx_data(rxr, cons, data);
 		if (agg_bufs)
-			bnxt_reuse_rx_agg_bufs(cpr, cp_cons, agg_bufs);
+			bnxt_reuse_rx_agg_bufs(cpr, cp_cons, 0, agg_bufs,
+					       false);
 
 		rc = -EIO;
 		if (rx_err & RX_CMPL_ERRORS_BUFFER_ERROR_MASK) {
@@ -1646,7 +1650,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		bnxt_reuse_rx_data(rxr, cons, data);
 		if (!skb) {
 			if (agg_bufs)
-				bnxt_reuse_rx_agg_bufs(cpr, cp_cons, agg_bufs);
+				bnxt_reuse_rx_agg_bufs(cpr, cp_cons, 0,
+						       agg_bufs, false);
 			rc = -ENOMEM;
 			goto next_rx;
 		}
@@ -1666,7 +1671,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	}
 
 	if (agg_bufs) {
-		skb = bnxt_rx_pages(bp, cpr, skb, cp_cons, agg_bufs);
+		skb = bnxt_rx_pages(bp, cpr, skb, cp_cons, agg_bufs, false);
 		if (!skb) {
 			rc = -ENOMEM;
 			goto next_rx;
@@ -2483,6 +2488,33 @@ static int bnxt_alloc_ring(struct bnxt *bp, struct bnxt_ring_mem_info *rmem)
 	return 0;
 }
 
+static void bnxt_free_tpa_info(struct bnxt *bp)
+{
+	int i;
+
+	for (i = 0; i < bp->rx_nr_rings; i++) {
+		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
+
+		kfree(rxr->rx_tpa);
+		rxr->rx_tpa = NULL;
+	}
+}
+
+static int bnxt_alloc_tpa_info(struct bnxt *bp)
+{
+	int i;
+
+	for (i = 0; i < bp->rx_nr_rings; i++) {
+		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
+
+		rxr->rx_tpa = kcalloc(MAX_TPA, sizeof(struct bnxt_tpa_info),
+				      GFP_KERNEL);
+		if (!rxr->rx_tpa)
+			return -ENOMEM;
+	}
+	return 0;
+}
+
 static void bnxt_free_rx_rings(struct bnxt *bp)
 {
 	int i;
@@ -2490,6 +2522,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 	if (!bp->rx_ring)
 		return;
 
+	bnxt_free_tpa_info(bp);
 	for (i = 0; i < bp->rx_nr_rings; i++) {
 		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
 		struct bnxt_ring_struct *ring;
@@ -2503,9 +2536,6 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 		page_pool_destroy(rxr->page_pool);
 		rxr->page_pool = NULL;
 
-		kfree(rxr->rx_tpa);
-		rxr->rx_tpa = NULL;
-
 		kfree(rxr->rx_agg_bmap);
 		rxr->rx_agg_bmap = NULL;
 
@@ -2539,7 +2569,7 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 
 static int bnxt_alloc_rx_rings(struct bnxt *bp)
 {
-	int i, rc, agg_rings = 0, tpa_rings = 0;
+	int i, rc = 0, agg_rings = 0;
 
 	if (!bp->rx_ring)
 		return -ENOMEM;
@@ -2547,9 +2577,6 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		agg_rings = 1;
 
-	if (bp->flags & BNXT_FLAG_TPA)
-		tpa_rings = 1;
-
 	for (i = 0; i < bp->rx_nr_rings; i++) {
 		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
 		struct bnxt_ring_struct *ring;
@@ -2591,17 +2618,11 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 			rxr->rx_agg_bmap = kzalloc(mem_size, GFP_KERNEL);
 			if (!rxr->rx_agg_bmap)
 				return -ENOMEM;
-
-			if (tpa_rings) {
-				rxr->rx_tpa = kcalloc(MAX_TPA,
-						sizeof(struct bnxt_tpa_info),
-						GFP_KERNEL);
-				if (!rxr->rx_tpa)
-					return -ENOMEM;
-			}
 		}
 	}
-	return 0;
+	if (bp->flags & BNXT_FLAG_TPA)
+		rc = bnxt_alloc_tpa_info(bp);
+	return rc;
 }
 
 static void bnxt_free_tx_rings(struct bnxt *bp)
-- 
2.5.1


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Mikko Perttunen @ 2019-07-29  9:45 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <f3525260-c43f-c983-0b5b-34a83bd53283@nvidia.com>

My understanding is that Tegra186 does not have DMA coherency, but 
Tegra194 does.

Mikko

On 23.7.2019 16.34, Jon Hunter wrote:
> 
> On 23/07/2019 13:51, Jose Abreu wrote:
>> From: Jon Hunter <jonathanh@nvidia.com>
>> Date: Jul/23/2019, 12:58:55 (UTC+00:00)
>>
>>>
>>> On 23/07/2019 11:49, Jose Abreu wrote:
>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>> Date: Jul/23/2019, 11:38:33 (UTC+00:00)
>>>>
>>>>>
>>>>> On 23/07/2019 11:07, Jose Abreu wrote:
>>>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>>>> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
>>>>>>
>>>>>>> This appears to be a winner and by disabling the SMMU for the ethernet
>>>>>>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>>>>>>> this worked! So yes appears to be related to the SMMU being enabled. We
>>>>>>> had to enable the SMMU for ethernet recently due to commit
>>>>>>> 954a03be033c7cef80ddc232e7cbdb17df735663.
>>>>>>
>>>>>> Finally :)
>>>>>>
>>>>>> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
>>>>>>
>>>>>> +         There are few reasons to allow unmatched stream bypass, and
>>>>>> +         even fewer good ones.  If saying YES here breaks your board
>>>>>> +         you should work on fixing your board.
>>>>>>
>>>>>> So, how can we fix this ? Is your ethernet DT node marked as
>>>>>> "dma-coherent;" ?
>>>>>
>>>>> TBH I have no idea. I can't say I fully understand your change or how it
>>>>> is breaking things for us.
>>>>>
>>>>> Currently, the Tegra DT binding does not have 'dma-coherent' set. I see
>>>>> this is optional, but I am not sure how you determine whether or not
>>>>> this should be set.
>>>>
>>>>  From my understanding it means that your device / IP DMA accesses are coherent regarding the CPU point of view. I think it will be the case if GMAC is not behind any kind of IOMMU in the HW arch.
>>>
>>> I understand what coherency is, I just don't know how you tell if this
>>> implementation of the ethernet controller is coherent or not.
>>
>> Do you have any detailed diagram of your HW ? Such as blocks / IPs
>> connection, address space wiring , ...
> 
> Yes, this can be found in the Tegra X2 Technical Reference Manual [0].
> Unfortunately, you need to create an account to download it.
> 
> Jon
> 
> [0] https://developer.nvidia.com/embedded/dlc/parker-series-trm
> 

^ permalink raw reply

* Re: [PATCH net-next 08/11] net: hns3: add interrupt affinity support for misc interrupt
From: Yunsheng Lin @ 2019-07-29 10:17 UTC (permalink / raw)
  To: Saeed Mahameed, tanhuazhong@huawei.com, davem@davemloft.net
  Cc: lipeng321@huawei.com, yisen.zhuang@huawei.com,
	salil.mehta@huawei.com, linuxarm@huawei.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <fa9b747119c2e7acb1ef5f139c022402cd2c854d.camel@mellanox.com>

On 2019/7/27 7:18, Saeed Mahameed wrote:
> On Thu, 2019-07-25 at 10:05 +0800, Yunsheng Lin wrote:
>> On 2019/7/25 3:24, Saeed Mahameed wrote:
>>> On Wed, 2019-07-24 at 11:18 +0800, Huazhong Tan wrote:
>>>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>>>
> 
> [...]
>>>>
>>>> +static void hclge_irq_affinity_notify(struct irq_affinity_notify
>>>> *notify,
>>>> +				      const cpumask_t *mask)
>>>> +{
>>>> +	struct hclge_dev *hdev = container_of(notify, struct hclge_dev,
>>>> +					      affinity_notify);
>>>> +
>>>> +	cpumask_copy(&hdev->affinity_mask, mask);
>>>> +	del_timer_sync(&hdev->service_timer);
>>>> +	hdev->service_timer.expires = jiffies + HZ;
>>>> +	add_timer_on(&hdev->service_timer, cpumask_first(&hdev-
>>>>> affinity_mask));
>>>> +}
>>>
>>> I don't see any relation between your misc irq vector and &hdev-
>>>> service_timer, to me this looks like an abuse of the irq affinity
>>>> API
>>> to allow the user to move the service timer affinity.
>>
>> Hi, thanks for reviewing.
>>
>> hdev->service_timer is used to schedule the periodic work
>> queue hdev->service_task, we want all the management work
>> queue including hdev->service_task to bind to the same cpu
>> to improve cache and power efficiency, it is better to move
>> service timer affinity according to that.
>>
>> The hdev->service_task is changed to delay work queue in
>> next patch " net: hns3: make hclge_service use delayed workqueue",
>> So the affinity in the timer of the delay work queue is automatically
>> set to the affinity of the delay work queue, we will move the
>> "make hclge_service use delayed workqueue" patch before the
>> "add interrupt affinity support for misc interrupt" patch, so
>> we do not have to set service timer affinity explicitly.
>>
>> Also, There is there work queues(mbx_service_task, service_task,
>> rst_service_task) in the hns3 driver, we plan to combine them
>> to one or two workqueue to improve efficiency and readability.
>>
> 
> So just to make it clear, you have 3 deferred works, 2 are triggered by
> the misc irq and 1 is periodic, you want them all on the same core and
> you want to control their affinity via the irq affinity ? for works #1
> and  #2 you get that for free since the irq is triggering them, but for
> work #3 (the periodic one) you need to manually move it when the irq
> affinity changes.

Yes, You are right.

> 
> I guess i am ok with this, since moving the irq affinity isn't only
> required by the periodic work but also for the other works that the irq
> is actually triggering (so it is not an actual abuse, sort of .. )
> 
> 
> 


^ permalink raw reply

* [PATCH] net: geneve: Fix a possible null-pointer dereference in geneve_link_config()
From: Jia-Ju Bai @ 2019-07-29 10:26 UTC (permalink / raw)
  To: davem, sbrivio, sd, liuhangbin, jbenc, dsahern, natechancellor,
	tglx
  Cc: netdev, linux-kernel, Jia-Ju Bai

In geneve_link_config(), there is an if statement on line 1524 to check
whether rt is NULL:
    if (rt && rt->dst.dev)

When rt is NULL, it is used on line 1526:
    ip6_rt_put(rt)
        dst_release(&rt->dst);

Thus, a possible null-pointer dereference may occur.

To fix this bug, ip6_rt_put(rt) is called when rt is not NULL.

This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/geneve.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index cb2ea8facd8d..a47a1b31b166 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1521,9 +1521,10 @@ static void geneve_link_config(struct net_device *dev,
 		rt = rt6_lookup(geneve->net, &info->key.u.ipv6.dst, NULL, 0,
 				NULL, 0);
 
-		if (rt && rt->dst.dev)
+		if (rt && rt->dst.dev) {
 			ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV6_HLEN;
-		ip6_rt_put(rt);
+			ip6_rt_put(rt);
+		}
 		break;
 	}
 #endif
-- 
2.17.0


^ permalink raw reply related

* Re: [PATCH] net: geneve: Fix a possible null-pointer dereference in geneve_link_config()
From: Jiri Benc @ 2019-07-29 10:30 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: davem, sbrivio, sd, liuhangbin, dsahern, natechancellor, tglx,
	netdev, linux-kernel
In-Reply-To: <20190729102611.2338-1-baijiaju1990@gmail.com>

On Mon, 29 Jul 2019 18:26:11 +0800, Jia-Ju Bai wrote:
> --- a/drivers/net/geneve.c
> +++ b/drivers/net/geneve.c
> @@ -1521,9 +1521,10 @@ static void geneve_link_config(struct net_device *dev,
>  		rt = rt6_lookup(geneve->net, &info->key.u.ipv6.dst, NULL, 0,
>  				NULL, 0);
>  
> -		if (rt && rt->dst.dev)
> +		if (rt && rt->dst.dev) {
>  			ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV6_HLEN;
> -		ip6_rt_put(rt);
> +			ip6_rt_put(rt);
> +		}
>  		break;
>  	}
>  #endif

Are you sure rt6_lookup can never return a non-NULL rt with rt->dst.dev
being NULL? You'd leak the reference in such case.

 Jiri

^ permalink raw reply

* Re: [PATCH net-next 08/11] net: hns3: add interrupt affinity support for misc interrupt
From: Yunsheng Lin @ 2019-07-29 10:37 UTC (permalink / raw)
  To: Salil Mehta, tanhuazhong, davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Zhuangyuzeng (Yisen), Linuxarm, lipeng (Y)
In-Reply-To: <fa3dbcce9e22453fb1ef111fd107d20f@huawei.com>

On 2019/7/29 17:18, Salil Mehta wrote:
>> From: tanhuazhong
>> Sent: Wednesday, July 24, 2019 4:19 AM
>> To: davem@davemloft.net
>> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Salil Mehta
>> <salil.mehta@huawei.com>; Zhuangyuzeng (Yisen) <yisen.zhuang@huawei.com>;
>> Linuxarm <linuxarm@huawei.com>; linyunsheng <linyunsheng@huawei.com>; lipeng
>> (Y) <lipeng321@huawei.com>; tanhuazhong <tanhuazhong@huawei.com>
>> Subject: [PATCH net-next 08/11] net: hns3: add interrupt affinity support for
>> misc interrupt
>>
>> From: Yunsheng Lin <linyunsheng@huawei.com>
>>
>> The misc interrupt is used to schedule the reset and mailbox
>> subtask, and a 1 sec timer is used to schedule the service
>> subtask, which does periodic work.
>>
>> This patch sets the above three subtask's affinity using the
>> misc interrupt' affinity.
>>
>> Also this patch setups a affinity notify for misc interrupt to
>> allow user to change the above three subtask's affinity.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> Signed-off-by: Peng Li <lipeng321@huawei.com>
>> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
>> ---
>>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 59
>> ++++++++++++++++++++--
>>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h    |  4 ++
>>  2 files changed, 59 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> index f345095..fe45986 100644
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> @@ -1270,6 +1270,12 @@ static int hclge_configure(struct hclge_dev *hdev)
>>
>>  	hclge_init_kdump_kernel_config(hdev);
>>
>> +	/* Set the init affinity based on pci func number */
>> +	i = cpumask_weight(cpumask_of_node(dev_to_node(&hdev->pdev->dev)));
>> +	i = i ? PCI_FUNC(hdev->pdev->devfn) % i : 0;
>> +	cpumask_set_cpu(cpumask_local_spread(i, dev_to_node(&hdev->pdev->dev)),
>> +			&hdev->affinity_mask);
>> +
>>  	return ret;
>>  }
>>
>> @@ -2502,14 +2508,16 @@ static void hclge_mbx_task_schedule(struct hclge_dev
>> *hdev)
>>  {
>>  	if (!test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_MBX_SERVICE_SCHED, &hdev->state))
>> -		schedule_work(&hdev->mbx_service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
> 
> 
> Why we have to use system Work Queue here? This could adversely affect
> other work scheduled not related to the HNS3 driver. Mailbox is internal
> to the driver and depending upon utilization of the mbx channel(which could
> be heavy if many VMs are running), this might stress other system tasks
> as well. Have we thought of this?

If I understand the CMWQ correctly, it is better to reuse the system Work
Queue when the work queue needed by HNS3 driver shares the same property of
system Work Queue, because Concurrency Managed Workqueue mechnism has ensured
they have the same worker pool if they share the same property.

Some driver(i40e) allocates a work queue with WQ_MEM_RECLAIM, I am not sure
what is the usecase which requires the WQ_MEM_RECLAIM.

Anyway, If the HNS3 need to allocate a new work queue, we need to
figure out the usecase first.

> 
> 
> 
>> +			      &hdev->mbx_service_task);
>>  }
>>
>>  static void hclge_reset_task_schedule(struct hclge_dev *hdev)
>>  {
>>  	if (!test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_RST_SERVICE_SCHED, &hdev->state))
>> -		schedule_work(&hdev->rst_service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
>> +			      &hdev->rst_service_task);
>>  }
>>
>>  static void hclge_task_schedule(struct hclge_dev *hdev)
>> @@ -2517,7 +2525,8 @@ static void hclge_task_schedule(struct hclge_dev *hdev)
>>  	if (!test_bit(HCLGE_STATE_DOWN, &hdev->state) &&
>>  	    !test_bit(HCLGE_STATE_REMOVING, &hdev->state) &&
>>  	    !test_and_set_bit(HCLGE_STATE_SERVICE_SCHED, &hdev->state))
>> -		(void)schedule_work(&hdev->service_task);
>> +		queue_work_on(cpumask_first(&hdev->affinity_mask), system_wq,
> 
> Same here.
> 
> 
> Salil.
> 
> .
> 


^ permalink raw reply

* DSA Rate Limiting in 88E6390
From: Benjamin Beckmeyer @ 2019-07-29 10:30 UTC (permalink / raw)
  To: netdev

Hi all,
is there a possibility to set the rate limiting for the 6390 with linux tools?
I've seen that the driver will init all to zero, so rate limiting is disabled, 
but there is no solution for it in the ip tool?

The only thing I found for rate limiting is the tc tool, but I guess this is 
only a software solution?

Furthermore, does exist a table or a tutorial which functions DSA supports?
The background is that we are using the DSDT driver (in future maybe the UMSD
driver) but we would like to switch to the in kernel DSA entirely. And our
software is using some of the DSDT functions, so I have to find an 
alternative to these functions.

Thanks in advance.

Best regards,
Benjamin


^ permalink raw reply

* Re: [PATCH] net: geneve: Fix a possible null-pointer dereference in geneve_link_config()
From: Jiri Benc @ 2019-07-29 10:52 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: davem, sbrivio, sd, liuhangbin, dsahern, natechancellor, tglx,
	netdev, linux-kernel
In-Reply-To: <20190729123055.5a7ba416@redhat.com>

On Mon, 29 Jul 2019 12:30:55 +0200, Jiri Benc wrote:
> Are you sure rt6_lookup can never return a non-NULL rt with rt->dst.dev
> being NULL? You'd leak the reference in such case.

In fact, you're introducing a bug, not fixing one. ip6_rt_put does
accept NULL parameter. And it seems you already have been told that?

Nacked-by: Jiri Benc <jbenc@redhat.com>

 Jiri

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-29 10:55 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro, Robin Murphy,
	David S . Miller
In-Reply-To: <MN2PR12MB327907D4A6FB378AC989571AD3DD0@MN2PR12MB3279.namprd12.prod.outlook.com>


On 29/07/2019 09:16, Jose Abreu wrote:
> From: Jose Abreu <joabreu@synopsys.com>
> Date: Jul/27/2019, 16:56:37 (UTC+00:00)
> 
>> From: Jon Hunter <jonathanh@nvidia.com>
>> Date: Jul/26/2019, 15:11:00 (UTC+00:00)
>>
>>>
>>> On 25/07/2019 16:12, Jose Abreu wrote:
>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>> Date: Jul/25/2019, 15:25:59 (UTC+00:00)
>>>>
>>>>>
>>>>> On 25/07/2019 14:26, Jose Abreu wrote:
>>>>>
>>>>> ...
>>>>>
>>>>>> Well, I wasn't expecting that :/
>>>>>>
>>>>>> Per documentation of barriers I think we should set descriptor fields 
>>>>>> and then barrier and finally ownership to HW so that remaining fields 
>>>>>> are coherent before owner is set.
>>>>>>
>>>>>> Anyway, can you also add a dma_rmb() after the call to 
>>>>>> stmmac_rx_status() ?
>>>>>
>>>>> Yes. I removed the debug print added the barrier, but that did not help.
>>>>
>>>> So, I was finally able to setup NFS using your replicated setup and I 
>>>> can't see the issue :(
>>>>
>>>> The only difference I have from yours is that I'm using TCP in NFS 
>>>> whilst you (I believe from the logs), use UDP.
>>>
>>> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and
>>> 'proto=tcp' and this does appear to be more stable, but not 100% stable.
>>> It still appears to fail in the same place about 50% of the time.
>>>
>>>> You do have flow control active right ? And your HW FIFO size is >= 4k ?
>>>
>>> How can I verify if flow control is active?
>>
>> You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30).

Where would be the appropriate place to dump this? After probe? Maybe
best if you can share a code snippet of where to dump this.

>> Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ?

You can find a boot log here:

https://paste.ubuntu.com/p/qtRqtYKHGF/

> And, please try attached debug patch.

With this patch it appears to boot fine. So far no issues seen.

Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH iproute2 0/1] Fix s64 argument parsing
From: Kurt Kanzenbach @ 2019-07-29 11:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20190704122427.22256-1-kurt@linutronix.de>

[-- Attachment #1: Type: text/plain, Size: 466 bytes --]

On Thu, Jul 04, 2019 at 02:24:26PM +0200, Kurt Kanzenbach wrote:
> Hi,
>
> while using the TAPRIO Qdisc on ARM32 I've noticed that the base_time parameter is
> incorrectly configured. The problem is the utility function get_s64() used by
> TAPRIO doesn't parse the value correctly.

polite ping.

>
> Thanks,
> Kurt
>
> Kurt Kanzenbach (1):
>   utils: Fix get_s64() function
>
>  lib/utils.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --
> 2.11.0
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: Jiri Pirko @ 2019-07-29 11:13 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, fw, jakub.kicinski, netfilter-devel, netdev
In-Reply-To: <1564296769-32294-2-git-send-email-wenxu@ucloud.cn>

Sun, Jul 28, 2019 at 08:52:47AM CEST, wenxu@ucloud.cn wrote:
>From: wenxu <wenxu@ucloud.cn>
>
>move tc indirect block to flow_offload and rename
>it to flow indirect block.The nf_tables can use the
>indr block architecture.
>
>Signed-off-by: wenxu <wenxu@ucloud.cn>
>---
>v3: subsys_initcall for init_flow_indr_rhashtable
>v4: no change
>

[...]


>diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
>index 00b9aab..66f89bc 100644
>--- a/include/net/flow_offload.h
>+++ b/include/net/flow_offload.h
>@@ -4,6 +4,7 @@
> #include <linux/kernel.h>
> #include <linux/list.h>
> #include <net/flow_dissector.h>
>+#include <linux/rhashtable.h>
> 
> struct flow_match {
> 	struct flow_dissector	*dissector;
>@@ -366,4 +367,42 @@ static inline void flow_block_init(struct flow_block *flow_block)
> 	INIT_LIST_HEAD(&flow_block->cb_list);
> }
> 
>+typedef int flow_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
>+				      enum tc_setup_type type, void *type_data);
>+
>+struct flow_indr_block_cb {
>+	struct list_head list;
>+	void *cb_priv;
>+	flow_indr_block_bind_cb_t *cb;
>+	void *cb_ident;
>+};

I don't understand why are you pushing this struct out of the c file to
the header. Please don't.


>+
>+typedef void flow_indr_block_ing_cmd_t(struct net_device *dev,
>+				       struct flow_block *flow_block,
>+				       struct flow_indr_block_cb *indr_block_cb,
>+				       enum flow_block_command command);
>+
>+struct flow_indr_block_dev {
>+	struct rhash_head ht_node;
>+	struct net_device *dev;
>+	unsigned int refcnt;
>+	struct list_head cb_list;
>+	flow_indr_block_ing_cmd_t *ing_cmd_cb;
>+	struct flow_block *flow_block;

I don't understand why are you pushing this struct out of the c file to
the header. Please don't.


>+};
>+
>+struct flow_indr_block_dev *flow_indr_block_dev_lookup(struct net_device *dev);
>+
>+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>+				  flow_indr_block_bind_cb_t *cb, void *cb_ident);
>+
>+void __flow_indr_block_cb_unregister(struct net_device *dev,
>+				     flow_indr_block_bind_cb_t *cb, void *cb_ident);
>+
>+int flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>+				flow_indr_block_bind_cb_t *cb, void *cb_ident);
>+
>+void flow_indr_block_cb_unregister(struct net_device *dev,
>+				   flow_indr_block_bind_cb_t *cb, void *cb_ident);
>+
	
[...]

	
>+
>+int __flow_indr_block_cb_register(struct net_device *dev, void *cb_priv,
>+				  flow_indr_block_bind_cb_t *cb,
>+				  void *cb_ident)
>+{
>+	struct flow_indr_block_cb *indr_block_cb;
>+	struct flow_indr_block_dev *indr_dev;
>+	int err;
>+
>+	indr_dev = flow_indr_block_dev_get(dev);
>+	if (!indr_dev)
>+		return -ENOMEM;
>+
>+	indr_block_cb = flow_indr_block_cb_add(indr_dev, cb_priv, cb, cb_ident);
>+	err = PTR_ERR_OR_ZERO(indr_block_cb);
>+	if (err)
>+		goto err_dev_put;
>+
>+	if (indr_dev->ing_cmd_cb)
>+		indr_dev->ing_cmd_cb(indr_dev->dev, indr_dev->flow_block, indr_block_cb,

This line is over 80cols. Please run checkpatch script for your patch
and obey the warnings.


>+				     FLOW_BLOCK_BIND);
>+
>+	return 0;
>+
>+err_dev_put:
>+	flow_indr_block_dev_put(indr_dev);
>+	return err;
>+}
>+EXPORT_SYMBOL_GPL(__flow_indr_block_cb_register);

[...]


> 
>-static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
>-				  struct tc_indr_block_cb *indr_block_cb,
>+static void tc_indr_block_ing_cmd(struct net_device *dev,

I don't understand why you change struct tc_indr_block_dev * to
struct net_device * here. If you want to do that, please do that in a
separate patch, not it this one where only "the move" should happen.


>+				  struct flow_block *flow_block,
>+				  struct flow_indr_block_cb *indr_block_cb,
> 				  enum flow_block_command command)
> {
>+	struct tcf_block *block = flow_block ?
>+				  container_of(flow_block,
>+					       struct tcf_block,
>+					       flow_block) : NULL;
> 	struct flow_block_offload bo = {
> 		.command	= command,
> 		.binder_type	= FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
>-		.net		= dev_net(indr_dev->dev),
>-		.block_shared	= tcf_block_non_null_shared(indr_dev->block),
>+		.net		= dev_net(dev),
>+		.block_shared	= tcf_block_non_null_shared(block),
> 	};
> 	INIT_LIST_HEAD(&bo.cb_list);
> 

[...]

^ permalink raw reply

* Re: [PATCH net-next v4 1/3] flow_offload: move tc indirect block to flow offload
From: Jiri Pirko @ 2019-07-29 11:15 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, fw, jakub.kicinski, netfilter-devel, netdev
In-Reply-To: <1564296769-32294-2-git-send-email-wenxu@ucloud.cn>

Sun, Jul 28, 2019 at 08:52:47AM CEST, wenxu@ucloud.cn wrote:
>From: wenxu <wenxu@ucloud.cn>
>
>move tc indirect block to flow_offload and rename

A sentence should start with capital letter.


>it to flow indirect block.The nf_tables can use the

There should be a space between "." and first letter of the next
sensence.


>indr block architecture.
>

[...]

^ permalink raw reply

* tcan4x5x on a Raspberry Pi
From: FIXED-TERM Buecheler Konstantin (ETAS-SEC/ECT-Mu) @ 2019-07-29 11:19 UTC (permalink / raw)
  To: linux-can@vger.kernel.org, netdev@vger.kernel.org; +Cc: dmurphy@ti.com

Hi all, 

I am currently working on a project where I am trying to use the tcan4550 chip with a Raspberry PI 3B.
I am struggling to create a working device tree overlay file for the Raspberry Pi.
Has anyone here tried this already? I would appreciate any help. 

Thanks,
Konstantin


^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-29 11:29 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas,
	Will Deacon
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro, Robin Murphy,
	David S . Miller
In-Reply-To: <b99b1e49-0cbc-2c66-6325-50fa6f263d91@nvidia.com>

++ Catalin, Will (ARM64 Maintainers)

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/29/2019, 11:55:18 (UTC+00:00)

> 
> On 29/07/2019 09:16, Jose Abreu wrote:
> > From: Jose Abreu <joabreu@synopsys.com>
> > Date: Jul/27/2019, 16:56:37 (UTC+00:00)
> > 
> >> From: Jon Hunter <jonathanh@nvidia.com>
> >> Date: Jul/26/2019, 15:11:00 (UTC+00:00)
> >>
> >>>
> >>> On 25/07/2019 16:12, Jose Abreu wrote:
> >>>> From: Jon Hunter <jonathanh@nvidia.com>
> >>>> Date: Jul/25/2019, 15:25:59 (UTC+00:00)
> >>>>
> >>>>>
> >>>>> On 25/07/2019 14:26, Jose Abreu wrote:
> >>>>>
> >>>>> ...
> >>>>>
> >>>>>> Well, I wasn't expecting that :/
> >>>>>>
> >>>>>> Per documentation of barriers I think we should set descriptor fields 
> >>>>>> and then barrier and finally ownership to HW so that remaining fields 
> >>>>>> are coherent before owner is set.
> >>>>>>
> >>>>>> Anyway, can you also add a dma_rmb() after the call to 
> >>>>>> stmmac_rx_status() ?
> >>>>>
> >>>>> Yes. I removed the debug print added the barrier, but that did not help.
> >>>>
> >>>> So, I was finally able to setup NFS using your replicated setup and I 
> >>>> can't see the issue :(
> >>>>
> >>>> The only difference I have from yours is that I'm using TCP in NFS 
> >>>> whilst you (I believe from the logs), use UDP.
> >>>
> >>> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and
> >>> 'proto=tcp' and this does appear to be more stable, but not 100% stable.
> >>> It still appears to fail in the same place about 50% of the time.
> >>>
> >>>> You do have flow control active right ? And your HW FIFO size is >= 4k ?
> >>>
> >>> How can I verify if flow control is active?
> >>
> >> You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30).
> 
> Where would be the appropriate place to dump this? After probe? Maybe
> best if you can share a code snippet of where to dump this.
> 
> >> Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ?
> 
> You can find a boot log here:
> 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_p_qtRqtYKHGF_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=NrxsR2etpZHGb7HkN4XdgaGmKM1XYyldihNPL6qVSv0&s=CMATEcHVoqZw4sIrNOXc7SFE_kV_5CO5EU21-yJez6c&e= 
> 
> > And, please try attached debug patch.
> 
> With this patch it appears to boot fine. So far no issues seen.

Thank you for testing.

Hi Catalin and Will,

Sorry to add you in such a long thread but we are seeing a DMA issue 
with stmmac driver in an ARM64 platform with IOMMU enabled.

The issue seems to be solved when buffers allocation for DMA based 
transfers are *not* mapped with the DMA_ATTR_SKIP_CPU_SYNC flag *OR* 
when IOMMU is disabled.

Notice that after transfer is done we do use 
dma_sync_single_for_{cpu,device} and then we reuse *the same* page for 
another transfer.

Can you please comment on whether DMA_ATTR_SKIP_CPU_SYNC can not be used 
in ARM64 platforms with IOMMU ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH] arcnet: com90xx: Mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2019-07-29 11:09 UTC (permalink / raw)
  To: Michael Grzeschik, David S. Miller
  Cc: netdev, linux-kernel, Gustavo A. R. Silva, Stephen Rothwell,
	Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warnings (Building: powerpc allyesconfig):

drivers/net/arcnet/com90xx.c: In function 'com90xx_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/com90xx.c:695:3: note: in expansion of macro 'pr_err'
   pr_err("Too many arguments\n");
   ^~~~~~
drivers/net/arcnet/com90xx.c:696:2: note: here
  case 3:  /* Mem address */
  ^~~~
drivers/net/arcnet/com90xx.c:697:9: warning: this statement may fall through [-Wimplicit-fallthrough=]
   shmem = ints[3];
   ~~~~~~^~~~~~~~~
drivers/net/arcnet/com90xx.c:698:2: note: here
  case 2:  /* IRQ */
  ^~~~
drivers/net/arcnet/com90xx.c:699:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
   irq = ints[2];
   ~~~~^~~~~~~~~
drivers/net/arcnet/com90xx.c:700:2: note: here
  case 1:  /* IO address */
  ^~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/arcnet/com90xx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/arcnet/com90xx.c b/drivers/net/arcnet/com90xx.c
index ca4a57c30bf8..bd75d06ad7df 100644
--- a/drivers/net/arcnet/com90xx.c
+++ b/drivers/net/arcnet/com90xx.c
@@ -693,10 +693,13 @@ static int __init com90xx_setup(char *s)
 	switch (ints[0]) {
 	default:		/* ERROR */
 		pr_err("Too many arguments\n");
+		/* Fall through */
 	case 3:		/* Mem address */
 		shmem = ints[3];
+		/* Fall through */
 	case 2:		/* IRQ */
 		irq = ints[2];
+		/* Fall through */
 	case 1:		/* IO address */
 		io = ints[1];
 	}
-- 
2.22.0


^ permalink raw reply related

* [PATCH] arcnet: com90io: Mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2019-07-29 11:13 UTC (permalink / raw)
  To: Michael Grzeschik, David S. Miller
  Cc: netdev, linux-kernel, Gustavo A. R. Silva, Stephen Rothwell,
	Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warnings (Building: powerpc allyesconfig):

drivers/net/arcnet/com90io.c: In function 'com90io_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/com90io.c:365:3: note: in expansion of macro 'pr_err'
   pr_err("Too many arguments\n");
   ^~~~~~
drivers/net/arcnet/com90io.c:366:2: note: here
  case 2:  /* IRQ */
  ^~~~
drivers/net/arcnet/com90io.c:367:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
   irq = ints[2];
   ~~~~^~~~~~~~~
drivers/net/arcnet/com90io.c:368:2: note: here
  case 1:  /* IO address */
  ^~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/arcnet/com90io.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/arcnet/com90io.c b/drivers/net/arcnet/com90io.c
index 2c546013a980..186bbf87bc84 100644
--- a/drivers/net/arcnet/com90io.c
+++ b/drivers/net/arcnet/com90io.c
@@ -363,8 +363,10 @@ static int __init com90io_setup(char *s)
 	switch (ints[0]) {
 	default:		/* ERROR */
 		pr_err("Too many arguments\n");
+		/* Fall through */
 	case 2:		/* IRQ */
 		irq = ints[2];
+		/* Fall through */
 	case 1:		/* IO address */
 		io = ints[1];
 	}
-- 
2.22.0


^ permalink raw reply related

* [PATCH] arcnet: arc-rimi: Mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2019-07-29 11:15 UTC (permalink / raw)
  To: Michael Grzeschik, David S. Miller
  Cc: netdev, linux-kernel, Gustavo A. R. Silva, Stephen Rothwell,
	Kees Cook

Mark switch cases where we are expecting to fall through.

This patch fixes the following warnings (Building: powerpc allyesconfig):

drivers/net/arcnet/arc-rimi.c: In function 'arcrimi_setup':
include/linux/printk.h:304:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/arcnet/arc-rimi.c:365:3: note: in expansion of macro 'pr_err'
   pr_err("Too many arguments\n");
   ^~~~~~
drivers/net/arcnet/arc-rimi.c:366:2: note: here
  case 3:  /* Node ID */
  ^~~~
drivers/net/arcnet/arc-rimi.c:367:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
   node = ints[3];
   ~~~~~^~~~~~~~~
drivers/net/arcnet/arc-rimi.c:368:2: note: here
  case 2:  /* IRQ */
  ^~~~
drivers/net/arcnet/arc-rimi.c:369:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
   irq = ints[2];
   ~~~~^~~~~~~~~
drivers/net/arcnet/arc-rimi.c:370:2: note: here
  case 1:  /* IO address */
  ^~~~

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/net/arcnet/arc-rimi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/arcnet/arc-rimi.c b/drivers/net/arcnet/arc-rimi.c
index 11c5bad95226..14a5fb378145 100644
--- a/drivers/net/arcnet/arc-rimi.c
+++ b/drivers/net/arcnet/arc-rimi.c
@@ -363,10 +363,13 @@ static int __init arcrimi_setup(char *s)
 	switch (ints[0]) {
 	default:		/* ERROR */
 		pr_err("Too many arguments\n");
+		/* Fall through */
 	case 3:		/* Node ID */
 		node = ints[3];
+		/* Fall through */
 	case 2:		/* IRQ */
 		irq = ints[2];
+		/* Fall through */
 	case 1:		/* IO address */
 		io = ints[1];
 	}
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Robin Murphy @ 2019-07-29 11:52 UTC (permalink / raw)
  To: Jose Abreu, Jon Hunter, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Catalin Marinas,
	Will Deacon
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller
In-Reply-To: <MN2PR12MB327997BDF2EA5CEE00F45AC3D3DD0@MN2PR12MB3279.namprd12.prod.outlook.com>

On 29/07/2019 12:29, Jose Abreu wrote:
> ++ Catalin, Will (ARM64 Maintainers)
> 
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/29/2019, 11:55:18 (UTC+00:00)
> 
>>
>> On 29/07/2019 09:16, Jose Abreu wrote:
>>> From: Jose Abreu <joabreu@synopsys.com>
>>> Date: Jul/27/2019, 16:56:37 (UTC+00:00)
>>>
>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>> Date: Jul/26/2019, 15:11:00 (UTC+00:00)
>>>>
>>>>>
>>>>> On 25/07/2019 16:12, Jose Abreu wrote:
>>>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>>>> Date: Jul/25/2019, 15:25:59 (UTC+00:00)
>>>>>>
>>>>>>>
>>>>>>> On 25/07/2019 14:26, Jose Abreu wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> Well, I wasn't expecting that :/
>>>>>>>>
>>>>>>>> Per documentation of barriers I think we should set descriptor fields
>>>>>>>> and then barrier and finally ownership to HW so that remaining fields
>>>>>>>> are coherent before owner is set.
>>>>>>>>
>>>>>>>> Anyway, can you also add a dma_rmb() after the call to
>>>>>>>> stmmac_rx_status() ?
>>>>>>>
>>>>>>> Yes. I removed the debug print added the barrier, but that did not help.
>>>>>>
>>>>>> So, I was finally able to setup NFS using your replicated setup and I
>>>>>> can't see the issue :(
>>>>>>
>>>>>> The only difference I have from yours is that I'm using TCP in NFS
>>>>>> whilst you (I believe from the logs), use UDP.
>>>>>
>>>>> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and
>>>>> 'proto=tcp' and this does appear to be more stable, but not 100% stable.
>>>>> It still appears to fail in the same place about 50% of the time.
>>>>>
>>>>>> You do have flow control active right ? And your HW FIFO size is >= 4k ?
>>>>>
>>>>> How can I verify if flow control is active?
>>>>
>>>> You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30).
>>
>> Where would be the appropriate place to dump this? After probe? Maybe
>> best if you can share a code snippet of where to dump this.
>>
>>>> Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ?
>>
>> You can find a boot log here:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_p_qtRqtYKHGF_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=NrxsR2etpZHGb7HkN4XdgaGmKM1XYyldihNPL6qVSv0&s=CMATEcHVoqZw4sIrNOXc7SFE_kV_5CO5EU21-yJez6c&e=
>>
>>> And, please try attached debug patch.
>>
>> With this patch it appears to boot fine. So far no issues seen.
> 
> Thank you for testing.
> 
> Hi Catalin and Will,
> 
> Sorry to add you in such a long thread but we are seeing a DMA issue
> with stmmac driver in an ARM64 platform with IOMMU enabled.
> 
> The issue seems to be solved when buffers allocation for DMA based
> transfers are *not* mapped with the DMA_ATTR_SKIP_CPU_SYNC flag *OR*
> when IOMMU is disabled.
> 
> Notice that after transfer is done we do use
> dma_sync_single_for_{cpu,device} and then we reuse *the same* page for
> another transfer.
> 
> Can you please comment on whether DMA_ATTR_SKIP_CPU_SYNC can not be used
> in ARM64 platforms with IOMMU ?

In terms of what they do, there should be no difference on arm64 between:

dma_map_page(..., dir);
...
dma_unmap_page(..., dir);

and:

dma_map_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);
dma_sync_single_for_device(..., dir);
...
dma_sync_single_for_cpu(..., dir);
dma_unmap_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);

provided that the first sync covers the whole buffer and any subsequent 
ones cover at least the parts of the buffer which may have changed. Plus 
for coherent hardware it's entirely moot either way.

Given Jon's previous findings, I would lean towards the idea that 
performing the extra (redundant) cache maintenance plus barrier in 
dma_unmap is mostly just perturbing timing in the same way as the debug 
print which also made things seem OK.

Robin.

^ permalink raw reply

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-29 12:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <b755f613-e6d8-a2e6-16cd-6f13ec0a6ddc@cumulusnetworks.com>

Hi Nikolay,

First of all, as mentioned further down in this thread, I realized that our
implementation of the multicast floodmasks does not align with the existing SW
implementation. We will change this, such that all multicast packets goes to the
SW bridge.

This changes things a bit, not that much.

I actually think you summarized the issue we have (after changing to multicast
flood-masks) right here:

The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
> >> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
> >> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
> >> Thus only the flooding may need to be controlled.

This seems to be exactly what we need.

Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
on a network where we want to limit the flooding of frames with dmac
01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.

One way of doing this could potentially be to support the following command:

bridge fdb add    01:21:6C:00:00:01 port eth0
bridge fdb append 01:21:6C:00:00:01 port eth1

On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
> >>>>>>  In general NLM_F_APPEND is only used in vxlan, the bridge does not
> >>>>>>  handle that flag at all.  FDB is only for *unicast*, nothing is joined
> >>>>>>  and no multicast should be used with fdbs. MDB is used for multicast
> >>>>>>  handling, but both of these are used for forwarding.
This is true, and this should have been addressed in the patch, we were too
focused on setting up the offload patch in the driver, and forgot to do the SW
implementation.

Do you see any issues in supporting this flag, and updating the SW
forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
to be a multicast?

/Allan


^ permalink raw reply

* [PATCH net-next] MAINTAINERS: Remove mailing-list entry for XDP (eXpress Data Path)
From: Jesper Dangaard Brouer @ 2019-07-29 12:16 UTC (permalink / raw)
  Cc: Jesper Dangaard Brouer, xdp-newbies, netdev, bpf, ast, daniel,
	davem, jakub.kicinski, john.fastabend

This removes the mailing list xdp-newbies@vger.kernel.org from the XDP
kernel maintainers entry.

Being in the kernel MAINTAINERS file successfully caused the list to
receive kbuild bot warnings, syzbot reports and sometimes developer
patches. The level of details in these messages, doesn't match the
target audience of the XDP-newbies list. This is based on a survey on
the mailing list, where 73% voted for removal from MAINTAINERS file.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 MAINTAINERS |    1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 9cc156c58f0c..45cb4237eddc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17560,7 +17560,6 @@ M:	Jakub Kicinski <jakub.kicinski@netronome.com>
 M:	Jesper Dangaard Brouer <hawk@kernel.org>
 M:	John Fastabend <john.fastabend@gmail.com>
 L:	netdev@vger.kernel.org
-L:	xdp-newbies@vger.kernel.org
 L:	bpf@vger.kernel.org
 S:	Supported
 F:	net/core/xdp.c


^ permalink raw reply related

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Nikolay Aleksandrov @ 2019-07-29 12:22 UTC (permalink / raw)
  To: Allan W. Nielsen
  Cc: Horatiu Vultur, roopa, davem, bridge, netdev, linux-kernel
In-Reply-To: <20190729121409.wa47uelw5f6l4vs4@lx-anielsen.microsemi.net>

Hi Allan,
On 29/07/2019 15:14, Allan W. Nielsen wrote:
> Hi Nikolay,
> 
> First of all, as mentioned further down in this thread, I realized that our
> implementation of the multicast floodmasks does not align with the existing SW
> implementation. We will change this, such that all multicast packets goes to the
> SW bridge.
> 
> This changes things a bit, not that much.
> 
> I actually think you summarized the issue we have (after changing to multicast
> flood-masks) right here:
> 
> The 07/26/2019 12:26, Nikolay Aleksandrov wrote:
>>>> Actually you mentioned non-IP traffic, so the querier stuff is not a problem. This
>>>> traffic will always be flooded by the bridge (and also a copy will be locally sent up).
>>>> Thus only the flooding may need to be controlled.
> 
> This seems to be exactly what we need.
> 
> Assuming we have a SW bridge (br0) with 4 slave interfaces (eth0-3). We use this
> on a network where we want to limit the flooding of frames with dmac
> 01:21:6C:00:00:01 (which is non IP traffic) to eth0 and eth1.
> 
> One way of doing this could potentially be to support the following command:
> 
> bridge fdb add    01:21:6C:00:00:01 port eth0
> bridge fdb append 01:21:6C:00:00:01 port eth1
> 
> On 25/07/2019 16:06, Nikolay Aleksandrov wrote:
>>>>>>>>  In general NLM_F_APPEND is only used in vxlan, the bridge does not
>>>>>>>>  handle that flag at all.  FDB is only for *unicast*, nothing is joined
>>>>>>>>  and no multicast should be used with fdbs. MDB is used for multicast
>>>>>>>>  handling, but both of these are used for forwarding.
> This is true, and this should have been addressed in the patch, we were too
> focused on setting up the offload patch in the driver, and forgot to do the SW
> implementation.
> 
> Do you see any issues in supporting this flag, and updating the SW
> forwarding in br_handle_frame_finish such that it can support/allow a FDB entry
> to be a multicast?
> 

Yes, all of the multicast code is handled differently, it doesn't go through the fdb
lookup or code at all. I don't see how you'll do a lookup in the fdb table with a
multicast mac address, take a look at br_handle_frame_finish() and you'll notice
that when a multicast dmac is detected then we use the bridge mcast code for lookups
and forwarding. If you're trying to achieve Rx only on the bridge of these then
why not just use Ido's tc suggestion or even the ip maddr add offload for each port ?

If you add a multicast mac in the fdb (currently allowed, but has no effect) and you
use dev_mc_add() as suggested that'd just be a hack to pass it down and it is already
possible to achieve via other methods, no need to go through the bridge.

> /Allan
> 


^ permalink raw reply

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-29 12:43 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Horatiu Vultur, Nikolay Aleksandrov, roopa, davem,
	bridge, netdev, linux-kernel
In-Reply-To: <20190729060923.GA16938@splinter>

Hi Ido,

The 07/29/2019 09:09, Ido Schimmel wrote:
> External E-Mail
> 
> 
> On Sun, Jul 28, 2019 at 09:15:59PM +0200, Allan W. Nielsen wrote:
> > If we assume that the SwitchDev driver implemented such that all multicast
> > traffic goes to the CPU, then we should really have a way to install a HW
> > offload path in the silicon, such that these packets does not go to the CPU (as
> > they are known not to be use full, and a frame every 3 us is a significant load
> > on small DMA connections and CPU resources).
> > 
> > If we assume that the SwitchDev driver implemented such that only "needed"
> > multicast packets goes to the CPU, then we need a way to get these packets in
> > case we want to implement the DLR protocol.
> 
> I'm not familiar with the HW you're working with, so the below might not
> be relevant.
> 
> In case you don't want to send all multicast traffic to the CPU (I'll
> refer to it later), you can install an ingress tc filter that traps to
> the CPU the packets you do want to receive. Something like:
> 
> # tc qdisc add dev swp1 clsact
> # tc filter add dev swp1 pref 1 ingress flower skip_sw dst_mac \
> 	01:21:6C:00:00:01 action trap
I have actually been looking at this, and it may an idea to go down this road.
But so far we have chosen not to for the following reasons:
- It is not only about trapping traffic to the CPU, we also needs to capability
  to limit the flooding on the front ports.
- In our case (the silicon), this feature really belongs to the MAC-table, which
  is why we did prefer to do it via the FDB entries.
  - But the HW does have TCAM resources, and we are planning on exposing these
    resources via the tc-flower interface. It is just that we have more MAC
    table resoruces than TCAM resources, which is another argument for using the
    MAC table.

> If your HW supports sharing the same filter among multiple ports, then
> you can install your filter in a tc shared block and bind multiple ports
> to it.
It does, thanks for making us aware of this optimization option.

> Another option is to always send a *copy* of multicast packets to the
> CPU, but make sure the HW uses a policer that prevents the CPU from
> being overwhelmed. To avoid packets being forwarded twice (by HW and
> SW), you will need to mark such packets in your driver with
> 'skb->offload_fwd_mark = 1'.
Understood

> Now, in case user wants to allow the CPU to receive certain packets at a
> higher rate, a tc filter can be used. It will be identical to the filter
> I mentioned earlier, but with a 'police' action chained before 'trap'.
I see.

> I don't think this is currently supported by any driver, but I believe
> it's the right way to go: By default the CPU receives all the traffic it
> should receive and user can fine-tune it using ACLs.
If all the frames goes to the CPU, then how can I fine-tune frames not to go to
the CPU?? I can do a TRAP (to get it to the CPU) a DROP (to drop it before
forwarding), but how can I forward a multicast packet, but prevent it from going
to the CPU?

I have seen that the mirror command can do re-direction, but not to a list of
ports...

All in all, thanks a lot for the suggestions, but to begin with I think we will
explore the MAC table option a bit more. But we will get back to TC to support
the ACL functions.

/Allan



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox