linux-omap.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support
@ 2024-07-03 13:51 Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

Hi,

am65-cpsw can support up to 8 queues at Rx. So far we have
been using only one queue (i.e. default flow) for all RX traffic.

This series adds multi-queue support. The driver starts with
1 RX queue by default. User can increase the RX queues via ethtool,
e.g. 'ethtool -L ethx rx <N>'

The series also adds regmap and regfield support to some of the
ALE registers. It adds Policer/Classifier registers and fields.

Converting the existing ALE control APIs to regfields can be a separate
exercise.

Some helper functions are added to read/write to the Policer/Classifier
registers and a default Classifier setup function is added that
routes packets based on their PCP/DSCP priority to different RX queues.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
---
Changes in v3:
- code style fixes
- squashed patches 5 and 6
- added comment about priority to thread mapping table.
- Added Reviewed-by Simon Horman.
- Link to v2: https://lore.kernel.org/r/20240628-am65-cpsw-multi-rx-v2-0-c399cb77db56@kernel.org

Changes in v2:
- rebase to net/next
- fixed RX stall issue during iperf
- Link to v1: https://lore.kernel.org/r/20240606-am65-cpsw-multi-rx-v1-0-0704b0cb6fdc@kernel.org

---
Roger Quadros (6):
      net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
      net: ethernet: ti: cpsw_ale: use regfields for ALE registers
      net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers
      net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields
      net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults
      net: ethernet: ti: am65-cpsw: setup priority to flow mapping

 drivers/net/ethernet/ti/am65-cpsw-ethtool.c |  62 +++--
 drivers/net/ethernet/ti/am65-cpsw-nuss.c    | 370 ++++++++++++++++------------
 drivers/net/ethernet/ti/am65-cpsw-nuss.h    |  36 +--
 drivers/net/ethernet/ti/cpsw_ale.c          | 287 +++++++++++++++++++--
 drivers/net/ethernet/ti/cpsw_ale.h          |  62 ++++-
 5 files changed, 609 insertions(+), 208 deletions(-)
---
base-commit: 84562f9953ec5f91a4922baa2bd4f2d4f64fac31
change-id: 20240606-am65-cpsw-multi-rx-fb6cf8dea5eb

Best regards,
-- 
Roger Quadros <rogerq@kernel.org>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-06  1:15   ` Jakub Kicinski
                     ` (2 more replies)
  2024-07-03 13:51 ` [PATCH net-next v3 2/6] net: ethernet: ti: cpsw_ale: use regfields for ALE registers Roger Quadros
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

am65-cpsw can support up to 8 queues at Rx.
Use a macro AM65_CPSW_MAX_RX_QUEUES to indicate that.
As there is only one DMA channel for RX traffic, the
8 queues come as 8 flows in that channel.

By default, we will start with 1 flow as defined by the
macro AM65_CPSW_DEFAULT_RX_CHN_FLOWS.

User can change the number of flows by ethtool like so
'ethtool -L ethx rx <N>'

All traffic will still come on flow 0. To get traffic on
different flows the Classifiers will need to be set up.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changelog:
v3:
- style fixes: reverse xmas tree and checkpatch.pl --max-line-length=80
- typo fix: Classifer -> Classifier
- added Reviewed-by Simon Horman
---
 drivers/net/ethernet/ti/am65-cpsw-ethtool.c |  62 +++--
 drivers/net/ethernet/ti/am65-cpsw-nuss.c    | 367 ++++++++++++++++------------
 drivers/net/ethernet/ti/am65-cpsw-nuss.h    |  36 +--
 3 files changed, 284 insertions(+), 181 deletions(-)

diff --git a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
index a1d0935d1ebe..01e3967852e0 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
@@ -429,7 +429,7 @@ static void am65_cpsw_get_channels(struct net_device *ndev,
 
 	ch->max_rx = AM65_CPSW_MAX_RX_QUEUES;
 	ch->max_tx = AM65_CPSW_MAX_TX_QUEUES;
-	ch->rx_count = AM65_CPSW_MAX_RX_QUEUES;
+	ch->rx_count = common->rx_ch_num_flows;
 	ch->tx_count = common->tx_ch_num;
 }
 
@@ -448,8 +448,10 @@ static int am65_cpsw_set_channels(struct net_device *ndev,
 		return -EBUSY;
 
 	am65_cpsw_nuss_remove_tx_chns(common);
+	am65_cpsw_nuss_remove_rx_chns(common);
 
-	return am65_cpsw_nuss_update_tx_chns(common, chs->tx_count);
+	return am65_cpsw_nuss_update_tx_rx_chns(common, chs->tx_count,
+						chs->rx_count);
 }
 
 static void
@@ -920,11 +922,13 @@ static int am65_cpsw_get_coalesce(struct net_device *ndev, struct ethtool_coales
 				  struct netlink_ext_ack *extack)
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
+	struct am65_cpsw_rx_flow *rx_flow;
 	struct am65_cpsw_tx_chn *tx_chn;
 
 	tx_chn = &common->tx_chns[0];
+	rx_flow = &common->rx_chns.flows[0];
 
-	coal->rx_coalesce_usecs = common->rx_pace_timeout / 1000;
+	coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
 	coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
 
 	return 0;
@@ -934,14 +938,26 @@ static int am65_cpsw_get_per_queue_coalesce(struct net_device *ndev, u32 queue,
 					    struct ethtool_coalesce *coal)
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
+	struct am65_cpsw_rx_flow *rx_flow;
 	struct am65_cpsw_tx_chn *tx_chn;
 
-	if (queue >= AM65_CPSW_MAX_TX_QUEUES)
+	if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
+	    queue >= AM65_CPSW_MAX_RX_QUEUES)
 		return -EINVAL;
 
-	tx_chn = &common->tx_chns[queue];
+	if (queue < AM65_CPSW_MAX_TX_QUEUES) {
+		tx_chn = &common->tx_chns[queue];
+		coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
+	} else {
+		coal->tx_coalesce_usecs = ~0;
+	}
 
-	coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
+	if (queue < AM65_CPSW_MAX_RX_QUEUES) {
+		rx_flow = &common->rx_chns.flows[queue];
+		coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
+	} else {
+		coal->rx_coalesce_usecs = ~0;
+	}
 
 	return 0;
 }
@@ -951,9 +967,11 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
 				  struct netlink_ext_ack *extack)
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
+	struct am65_cpsw_rx_flow *rx_flow;
 	struct am65_cpsw_tx_chn *tx_chn;
 
 	tx_chn = &common->tx_chns[0];
+	rx_flow = &common->rx_chns.flows[0];
 
 	if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20)
 		return -EINVAL;
@@ -961,7 +979,7 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
 	if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20)
 		return -EINVAL;
 
-	common->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
+	rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
 	tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
 
 	return 0;
@@ -971,20 +989,36 @@ static int am65_cpsw_set_per_queue_coalesce(struct net_device *ndev, u32 queue,
 					    struct ethtool_coalesce *coal)
 {
 	struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
+	struct am65_cpsw_rx_flow *rx_flow;
 	struct am65_cpsw_tx_chn *tx_chn;
 
-	if (queue >= AM65_CPSW_MAX_TX_QUEUES)
+	if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
+	    queue >= AM65_CPSW_MAX_RX_QUEUES)
 		return -EINVAL;
 
-	tx_chn = &common->tx_chns[queue];
+	if (queue < AM65_CPSW_MAX_TX_QUEUES) {
+		tx_chn = &common->tx_chns[queue];
+
+		if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
+			dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
+				 queue);
+			coal->tx_coalesce_usecs = 20;
+		}
 
-	if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
-		dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
-			 queue);
-		coal->tx_coalesce_usecs = 20;
+		tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
 	}
 
-	tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
+	if (queue < AM65_CPSW_MAX_RX_QUEUES) {
+		rx_flow = &common->rx_chns.flows[queue];
+
+		if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20) {
+			dev_info(common->dev, "defaulting to min value of 20us for rx-usecs for rx-%u\n",
+				 queue);
+			coal->rx_coalesce_usecs = 20;
+		}
+
+		rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
+	}
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 81d9f21086ec..deb8e39e4648 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -138,7 +138,7 @@
 	 AM65_CPSW_PN_TS_CTL_RX_ANX_F_EN)
 
 #define AM65_CPSW_ALE_AGEOUT_DEFAULT	30
-/* Number of TX/RX descriptors */
+/* Number of TX/RX descriptors per channel/flow */
 #define AM65_CPSW_MAX_TX_DESC	500
 #define AM65_CPSW_MAX_RX_DESC	500
 
@@ -150,6 +150,7 @@
 			 NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
 
 #define AM65_CPSW_DEFAULT_TX_CHNS	8
+#define AM65_CPSW_DEFAULT_RX_CHN_FLOWS	1
 
 /* CPPI streaming packet interface */
 #define AM65_CPSW_CPPI_TX_FLOW_ID  0x3FFF
@@ -330,7 +331,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
 }
 
 static int am65_cpsw_nuss_rx_push(struct am65_cpsw_common *common,
-				  struct page *page)
+				  struct page *page, u32 flow_idx)
 {
 	struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
 	struct cppi5_host_desc_t *desc_rx;
@@ -363,7 +364,8 @@ static int am65_cpsw_nuss_rx_push(struct am65_cpsw_common *common,
 	swdata = cppi5_hdesc_get_swdata(desc_rx);
 	*((void **)swdata) = page_address(page);
 
-	return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, 0, desc_rx, desc_dma);
+	return k3_udma_glue_push_rx_chn(rx_chn->rx_chn, flow_idx,
+					desc_rx, desc_dma);
 }
 
 void am65_cpsw_nuss_set_p0_ptype(struct am65_cpsw_common *common)
@@ -398,22 +400,27 @@ static void am65_cpsw_init_port_emac_ale(struct am65_cpsw_port *port);
 static void am65_cpsw_destroy_xdp_rxqs(struct am65_cpsw_common *common)
 {
 	struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
+	struct am65_cpsw_rx_flow *flow;
 	struct xdp_rxq_info *rxq;
-	int i;
+	int id, port;
 
-	for (i = 0; i < common->port_num; i++) {
-		if (!common->ports[i].ndev)
-			continue;
+	for (id = 0; id < common->rx_ch_num_flows; id++) {
+		flow = &rx_chn->flows[id];
 
-		rxq = &common->ports[i].xdp_rxq;
+		for (port = 0; port < common->port_num; port++) {
+			if (!common->ports[port].ndev)
+				continue;
 
-		if (xdp_rxq_info_is_reg(rxq))
-			xdp_rxq_info_unreg(rxq);
-	}
+			rxq = &common->ports[port].xdp_rxq[id];
 
-	if (rx_chn->page_pool) {
-		page_pool_destroy(rx_chn->page_pool);
-		rx_chn->page_pool = NULL;
+			if (xdp_rxq_info_is_reg(rxq))
+				xdp_rxq_info_unreg(rxq);
+		}
+
+		if (flow->page_pool) {
+			page_pool_destroy(flow->page_pool);
+			flow->page_pool = NULL;
+		}
 	}
 }
 
@@ -427,31 +434,44 @@ static int am65_cpsw_create_xdp_rxqs(struct am65_cpsw_common *common)
 		.nid = dev_to_node(common->dev),
 		.dev = common->dev,
 		.dma_dir = DMA_BIDIRECTIONAL,
-		.napi = &common->napi_rx,
+		/* .napi set dynamically */
 	};
+	struct am65_cpsw_rx_flow *flow;
 	struct xdp_rxq_info *rxq;
 	struct page_pool *pool;
-	int i, ret;
-
-	pool = page_pool_create(&pp_params);
-	if (IS_ERR(pool))
-		return PTR_ERR(pool);
+	int id, port, ret;
+
+	for (id = 0; id < common->rx_ch_num_flows; id++) {
+		flow = &rx_chn->flows[id];
+		pp_params.napi = &flow->napi_rx;
+		pool = page_pool_create(&pp_params);
+		if (IS_ERR(pool)) {
+			ret = PTR_ERR(pool);
+			goto err;
+		}
 
-	rx_chn->page_pool = pool;
+		flow->page_pool = pool;
 
-	for (i = 0; i < common->port_num; i++) {
-		if (!common->ports[i].ndev)
-			continue;
+		/* using same page pool is allowed as no running rx handlers
+		 * simultaneously for both ndevs
+		 */
+		for (port = 0; port < common->port_num; port++) {
+			if (!common->ports[port].ndev)
+				continue;
 
-		rxq = &common->ports[i].xdp_rxq;
+			rxq = &common->ports[port].xdp_rxq[id];
 
-		ret = xdp_rxq_info_reg(rxq, common->ports[i].ndev, i, 0);
-		if (ret)
-			goto err;
+			ret = xdp_rxq_info_reg(rxq, common->ports[port].ndev,
+					       id, flow->napi_rx.napi_id);
+			if (ret)
+				goto err;
 
-		ret = xdp_rxq_info_reg_mem_model(rxq, MEM_TYPE_PAGE_POOL, pool);
-		if (ret)
-			goto err;
+			ret = xdp_rxq_info_reg_mem_model(rxq,
+							 MEM_TYPE_PAGE_POOL,
+							 pool);
+			if (ret)
+				goto err;
+		}
 	}
 
 	return 0;
@@ -496,25 +516,27 @@ static enum am65_cpsw_tx_buf_type am65_cpsw_nuss_buf_type(struct am65_cpsw_tx_ch
 								       desc_idx);
 }
 
-static inline void am65_cpsw_put_page(struct am65_cpsw_rx_chn *rx_chn,
+static inline void am65_cpsw_put_page(struct am65_cpsw_rx_flow *flow,
 				      struct page *page,
 				      bool allow_direct,
 				      int desc_idx)
 {
-	page_pool_put_full_page(rx_chn->page_pool, page, allow_direct);
-	rx_chn->pages[desc_idx] = NULL;
+	page_pool_put_full_page(flow->page_pool, page, allow_direct);
+	flow->pages[desc_idx] = NULL;
 }
 
 static void am65_cpsw_nuss_rx_cleanup(void *data, dma_addr_t desc_dma)
 {
-	struct am65_cpsw_rx_chn *rx_chn = data;
+	struct am65_cpsw_rx_flow *flow = data;
 	struct cppi5_host_desc_t *desc_rx;
+	struct am65_cpsw_rx_chn *rx_chn;
 	dma_addr_t buf_dma;
 	u32 buf_dma_len;
 	void *page_addr;
 	void **swdata;
 	int desc_idx;
 
+	rx_chn = &flow->common->rx_chns;
 	desc_rx = k3_cppi_desc_pool_dma2virt(rx_chn->desc_pool, desc_dma);
 	swdata = cppi5_hdesc_get_swdata(desc_rx);
 	page_addr = *swdata;
@@ -525,7 +547,7 @@ static void am65_cpsw_nuss_rx_cleanup(void *data, dma_addr_t desc_dma)
 
 	desc_idx = am65_cpsw_nuss_desc_idx(rx_chn->desc_pool, desc_rx,
 					   rx_chn->dsize_log2);
-	am65_cpsw_put_page(rx_chn, virt_to_page(page_addr), false, desc_idx);
+	am65_cpsw_put_page(flow, virt_to_page(page_addr), false, desc_idx);
 }
 
 static void am65_cpsw_nuss_xmit_free(struct am65_cpsw_tx_chn *tx_chn,
@@ -601,7 +623,8 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
 	struct am65_cpsw_host *host_p = am65_common_get_host(common);
 	struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
 	struct am65_cpsw_tx_chn *tx_chn = common->tx_chns;
-	int port_idx, i, ret, tx;
+	int port_idx, i, ret, tx, flow_idx;
+	struct am65_cpsw_rx_flow *flow;
 	u32 val, port_mask;
 	struct page *page;
 
@@ -669,27 +692,32 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
 		return ret;
 	}
 
-	for (i = 0; i < rx_chn->descs_num; i++) {
-		page = page_pool_dev_alloc_pages(rx_chn->page_pool);
-		if (!page) {
-			ret = -ENOMEM;
-			if (i)
-				goto fail_rx;
+	for (flow_idx = 0; flow_idx < common->rx_ch_num_flows; flow_idx++) {
+		flow = &rx_chn->flows[flow_idx];
+		for (i = 0; i < AM65_CPSW_MAX_RX_DESC; i++) {
+			page = page_pool_dev_alloc_pages(flow->page_pool);
+			if (!page) {
+				dev_err(common->dev, "cannot allocate page in flow %d\n",
+					flow_idx);
+				ret = -ENOMEM;
+				if (i)
+					goto fail_rx;
 
-			return ret;
-		}
-		rx_chn->pages[i] = page;
+				return ret;
+			}
+			flow->pages[i] = page;
 
-		ret = am65_cpsw_nuss_rx_push(common, page);
-		if (ret < 0) {
-			dev_err(common->dev,
-				"cannot submit page to channel rx: %d\n",
-				ret);
-			am65_cpsw_put_page(rx_chn, page, false, i);
-			if (i)
-				goto fail_rx;
+			ret = am65_cpsw_nuss_rx_push(common, page, flow_idx);
+			if (ret < 0) {
+				dev_err(common->dev,
+					"cannot submit page to rx channel flow %d, error %d\n",
+					flow_idx, ret);
+				am65_cpsw_put_page(flow, page, false, i);
+				if (i)
+					goto fail_rx;
 
-			return ret;
+				return ret;
+			}
 		}
 	}
 
@@ -699,6 +727,14 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
 		goto fail_rx;
 	}
 
+	for (i = 0; i < common->rx_ch_num_flows ; i++) {
+		napi_enable(&common->rx_chns.flows[i].napi_rx);
+		if (common->rx_chns.flows[i].irq_disabled) {
+			common->rx_chns.flows[i].irq_disabled = false;
+			enable_irq(common->rx_chns.flows[i].irq);
+		}
+	}
+
 	for (tx = 0; tx < common->tx_ch_num; tx++) {
 		ret = k3_udma_glue_enable_tx_chn(tx_chn[tx].tx_chn);
 		if (ret) {
@@ -710,12 +746,6 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
 		napi_enable(&tx_chn[tx].napi_tx);
 	}
 
-	napi_enable(&common->napi_rx);
-	if (common->rx_irq_disabled) {
-		common->rx_irq_disabled = false;
-		enable_irq(rx_chn->irq);
-	}
-
 	dev_dbg(common->dev, "cpsw_nuss started\n");
 	return 0;
 
@@ -726,11 +756,24 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
 		tx--;
 	}
 
+	for (flow_idx = 0; i < common->rx_ch_num_flows; flow_idx++) {
+		flow = &rx_chn->flows[flow_idx];
+		if (!flow->irq_disabled) {
+			disable_irq(flow->irq);
+			flow->irq_disabled = true;
+		}
+		napi_disable(&flow->napi_rx);
+	}
+
 	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
 
 fail_rx:
-	k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, 0, rx_chn,
-				  am65_cpsw_nuss_rx_cleanup, 0);
+	for (i = 0; i < common->rx_ch_num_flows; i--)
+		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
+					  am65_cpsw_nuss_rx_cleanup, !!i);
+
+	am65_cpsw_destroy_xdp_rxqs(common);
+
 	return ret;
 }
 
@@ -779,12 +822,12 @@ static int am65_cpsw_nuss_common_stop(struct am65_cpsw_common *common)
 			dev_err(common->dev, "rx teardown timeout\n");
 	}
 
-	napi_disable(&common->napi_rx);
-	hrtimer_cancel(&common->rx_hrtimer);
-
-	for (i = 0; i < AM65_CPSW_MAX_RX_FLOWS; i++)
-		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, rx_chn,
+	for (i = 0; i < common->rx_ch_num_flows; i++) {
+		napi_disable(&common->rx_chns.flows[i].napi_rx);
+		hrtimer_cancel(&common->rx_chns.flows[i].rx_hrtimer);
+		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
 					  am65_cpsw_nuss_rx_cleanup, !!i);
+	}
 
 	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
 
@@ -793,10 +836,6 @@ static int am65_cpsw_nuss_common_stop(struct am65_cpsw_common *common)
 	writel(0, common->cpsw_base + AM65_CPSW_REG_CTL);
 	writel(0, common->cpsw_base + AM65_CPSW_REG_STAT_PORT_EN);
 
-	for (i = 0; i < rx_chn->descs_num; i++) {
-		if (rx_chn->pages[i])
-			am65_cpsw_put_page(rx_chn, rx_chn->pages[i], false, i);
-	}
 	am65_cpsw_destroy_xdp_rxqs(common);
 
 	dev_dbg(common->dev, "cpsw_nuss stopped\n");
@@ -867,7 +906,7 @@ static int am65_cpsw_nuss_ndo_slave_open(struct net_device *ndev)
 		goto runtime_put;
 	}
 
-	ret = netif_set_real_num_rx_queues(ndev, AM65_CPSW_MAX_RX_QUEUES);
+	ret = netif_set_real_num_rx_queues(ndev, common->rx_ch_num_flows);
 	if (ret) {
 		dev_err(common->dev, "cannot set real number of rx queues\n");
 		goto runtime_put;
@@ -990,12 +1029,12 @@ static int am65_cpsw_xdp_tx_frame(struct net_device *ndev,
 	return ret;
 }
 
-static int am65_cpsw_run_xdp(struct am65_cpsw_common *common,
+static int am65_cpsw_run_xdp(struct am65_cpsw_rx_flow *flow,
 			     struct am65_cpsw_port *port,
 			     struct xdp_buff *xdp,
 			     int desc_idx, int cpu, int *len)
 {
-	struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
+	struct am65_cpsw_common *common = flow->common;
 	struct net_device *ndev = port->ndev;
 	int ret = AM65_CPSW_XDP_CONSUMED;
 	struct am65_cpsw_tx_chn *tx_chn;
@@ -1055,7 +1094,7 @@ static int am65_cpsw_run_xdp(struct am65_cpsw_common *common,
 	}
 
 	page = virt_to_head_page(xdp->data);
-	am65_cpsw_put_page(rx_chn, page, true, desc_idx);
+	am65_cpsw_put_page(flow, page, true, desc_idx);
 
 out:
 	return ret;
@@ -1094,11 +1133,11 @@ static void am65_cpsw_nuss_rx_csum(struct sk_buff *skb, u32 csum_info)
 	}
 }
 
-static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
-				     u32 flow_idx, int cpu)
+static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_rx_flow *flow, int cpu)
 {
-	struct am65_cpsw_rx_chn *rx_chn = &common->rx_chns;
+	struct am65_cpsw_rx_chn *rx_chn = &flow->common->rx_chns;
 	u32 buf_dma_len, pkt_len, port_id = 0, csum_info;
+	struct am65_cpsw_common *common = flow->common;
 	struct am65_cpsw_ndev_priv *ndev_priv;
 	struct am65_cpsw_ndev_stats *stats;
 	struct cppi5_host_desc_t *desc_rx;
@@ -1108,6 +1147,7 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
 	struct am65_cpsw_port *port;
 	int headroom, desc_idx, ret;
 	struct net_device *ndev;
+	u32 flow_idx = flow->id;
 	struct sk_buff *skb;
 	struct xdp_buff	xdp;
 	void *page_addr;
@@ -1161,12 +1201,13 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
 	}
 
 	if (port->xdp_prog) {
-		xdp_init_buff(&xdp, AM65_CPSW_MAX_PACKET_SIZE, &port->xdp_rxq);
+		xdp_init_buff(&xdp, AM65_CPSW_MAX_PACKET_SIZE,
+			      &port->xdp_rxq[flow->id]);
 
 		xdp_prepare_buff(&xdp, page_addr, skb_headroom(skb),
 				 pkt_len, false);
 
-		ret = am65_cpsw_run_xdp(common, port, &xdp, desc_idx,
+		ret = am65_cpsw_run_xdp(flow, port, &xdp, desc_idx,
 					cpu, &pkt_len);
 		if (ret != AM65_CPSW_XDP_PASS)
 			return ret;
@@ -1184,7 +1225,7 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
 	skb_mark_for_recycle(skb);
 	skb->protocol = eth_type_trans(skb, ndev);
 	am65_cpsw_nuss_rx_csum(skb, csum_info);
-	napi_gro_receive(&common->napi_rx, skb);
+	napi_gro_receive(&flow->napi_rx, skb);
 
 	stats = this_cpu_ptr(ndev_priv->stats);
 
@@ -1193,21 +1234,21 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
 	stats->rx_bytes += pkt_len;
 	u64_stats_update_end(&stats->syncp);
 
-	new_page = page_pool_dev_alloc_pages(rx_chn->page_pool);
+	new_page = page_pool_dev_alloc_pages(flow->page_pool);
 	if (unlikely(!new_page))
 		return -ENOMEM;
-	rx_chn->pages[desc_idx] = new_page;
+	flow->pages[desc_idx] = new_page;
 
 	if (netif_dormant(ndev)) {
-		am65_cpsw_put_page(rx_chn, new_page, true, desc_idx);
+		am65_cpsw_put_page(flow, new_page, true, desc_idx);
 		ndev->stats.rx_dropped++;
 		return 0;
 	}
 
 requeue:
-	ret = am65_cpsw_nuss_rx_push(common, new_page);
+	ret = am65_cpsw_nuss_rx_push(common, new_page, flow_idx);
 	if (WARN_ON(ret < 0)) {
-		am65_cpsw_put_page(rx_chn, new_page, true, desc_idx);
+		am65_cpsw_put_page(flow, new_page, true, desc_idx);
 		ndev->stats.rx_errors++;
 		ndev->stats.rx_dropped++;
 	}
@@ -1217,38 +1258,33 @@ static int am65_cpsw_nuss_rx_packets(struct am65_cpsw_common *common,
 
 static enum hrtimer_restart am65_cpsw_nuss_rx_timer_callback(struct hrtimer *timer)
 {
-	struct am65_cpsw_common *common =
-			container_of(timer, struct am65_cpsw_common, rx_hrtimer);
+	struct am65_cpsw_rx_flow *flow = container_of(timer,
+						      struct am65_cpsw_rx_flow,
+						      rx_hrtimer);
 
-	enable_irq(common->rx_chns.irq);
+	enable_irq(flow->irq);
 	return HRTIMER_NORESTART;
 }
 
 static int am65_cpsw_nuss_rx_poll(struct napi_struct *napi_rx, int budget)
 {
-	struct am65_cpsw_common *common = am65_cpsw_napi_to_common(napi_rx);
-	int flow = AM65_CPSW_MAX_RX_FLOWS;
+	struct am65_cpsw_rx_flow *flow = am65_cpsw_napi_to_rx_flow(napi_rx);
+	struct am65_cpsw_common *common = flow->common;
 	int cpu = smp_processor_id();
 	bool xdp_redirect = false;
 	int cur_budget, ret;
 	int num_rx = 0;
 
-	/* process every flow */
-	while (flow--) {
-		cur_budget = budget - num_rx;
-
-		while (cur_budget--) {
-			ret = am65_cpsw_nuss_rx_packets(common, flow, cpu);
-			if (ret) {
-				if (ret == AM65_CPSW_XDP_REDIRECT)
-					xdp_redirect = true;
-				break;
-			}
-			num_rx++;
-		}
-
-		if (num_rx >= budget)
+	/* process only this flow */
+	cur_budget = budget;
+	while (cur_budget--) {
+		ret = am65_cpsw_nuss_rx_packets(flow, cpu);
+		if (ret) {
+			if (ret == AM65_CPSW_XDP_REDIRECT)
+				xdp_redirect = true;
 			break;
+		}
+		num_rx++;
 	}
 
 	if (xdp_redirect)
@@ -1257,14 +1293,14 @@ static int am65_cpsw_nuss_rx_poll(struct napi_struct *napi_rx, int budget)
 	dev_dbg(common->dev, "%s num_rx:%d %d\n", __func__, num_rx, budget);
 
 	if (num_rx < budget && napi_complete_done(napi_rx, num_rx)) {
-		if (common->rx_irq_disabled) {
-			common->rx_irq_disabled = false;
-			if (unlikely(common->rx_pace_timeout)) {
-				hrtimer_start(&common->rx_hrtimer,
-					      ns_to_ktime(common->rx_pace_timeout),
+		if (flow->irq_disabled) {
+			flow->irq_disabled = false;
+			if (unlikely(flow->rx_pace_timeout)) {
+				hrtimer_start(&flow->rx_hrtimer,
+					      ns_to_ktime(flow->rx_pace_timeout),
 					      HRTIMER_MODE_REL_PINNED);
 			} else {
-				enable_irq(common->rx_chns.irq);
+				enable_irq(flow->irq);
 			}
 		}
 	}
@@ -1512,11 +1548,11 @@ static int am65_cpsw_nuss_tx_poll(struct napi_struct *napi_tx, int budget)
 
 static irqreturn_t am65_cpsw_nuss_rx_irq(int irq, void *dev_id)
 {
-	struct am65_cpsw_common *common = dev_id;
+	struct am65_cpsw_rx_flow *flow = dev_id;
 
-	common->rx_irq_disabled = true;
+	flow->irq_disabled = true;
 	disable_irq_nosync(irq);
-	napi_schedule(&common->napi_rx);
+	napi_schedule(&flow->napi_rx);
 
 	return IRQ_HANDLED;
 }
@@ -2315,19 +2351,23 @@ static void am65_cpsw_nuss_free_rx_chns(void *data)
 		k3_udma_glue_release_rx_chn(rx_chn->rx_chn);
 }
 
-static void am65_cpsw_nuss_remove_rx_chns(void *data)
+void am65_cpsw_nuss_remove_rx_chns(void *data)
 {
 	struct am65_cpsw_common *common = data;
 	struct device *dev = common->dev;
 	struct am65_cpsw_rx_chn *rx_chn;
+	struct am65_cpsw_rx_flow *flows;
+	int i;
 
 	rx_chn = &common->rx_chns;
+	flows = rx_chn->flows;
 	devm_remove_action(dev, am65_cpsw_nuss_free_rx_chns, common);
 
-	if (!(rx_chn->irq < 0))
-		devm_free_irq(dev, rx_chn->irq, common);
-
-	netif_napi_del(&common->napi_rx);
+	for (i = 0; i < common->rx_ch_num_flows; i++) {
+		if (!(flows[i].irq < 0))
+			devm_free_irq(dev, flows[i].irq, &flows[i]);
+		netif_napi_del(&flows[i].napi_rx);
+	}
 
 	am65_cpsw_nuss_free_rx_chns(common);
 
@@ -2340,6 +2380,7 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 	struct k3_udma_glue_rx_channel_cfg rx_cfg = { 0 };
 	u32  max_desc_num = AM65_CPSW_MAX_RX_DESC;
 	struct device *dev = common->dev;
+	struct am65_cpsw_rx_flow *flow;
 	u32 hdesc_size, hdesc_size_out;
 	u32 fdqring_id;
 	int i, ret = 0;
@@ -2348,12 +2389,21 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 					   AM65_CPSW_NAV_SW_DATA_SIZE);
 
 	rx_cfg.swdata_size = AM65_CPSW_NAV_SW_DATA_SIZE;
-	rx_cfg.flow_id_num = AM65_CPSW_MAX_RX_FLOWS;
+	rx_cfg.flow_id_num = common->rx_ch_num_flows;
 	rx_cfg.flow_id_base = common->rx_flow_id_base;
 
 	/* init all flows */
 	rx_chn->dev = dev;
-	rx_chn->descs_num = max_desc_num;
+	rx_chn->descs_num = max_desc_num * rx_cfg.flow_id_num;
+
+	for (i = 0; i < common->rx_ch_num_flows; i++) {
+		flow = &rx_chn->flows[i];
+		flow->page_pool = NULL;
+		flow->pages = devm_kcalloc(dev, AM65_CPSW_MAX_RX_DESC,
+					   sizeof(*flow->pages), GFP_KERNEL);
+		if (!flow->pages)
+			return -ENOMEM;
+	}
 
 	rx_chn->rx_chn = k3_udma_glue_request_rx_chn(dev, "rx", &rx_cfg);
 	if (IS_ERR(rx_chn->rx_chn)) {
@@ -2376,13 +2426,6 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 	rx_chn->dsize_log2 = __fls(hdesc_size_out);
 	WARN_ON(hdesc_size_out != (1 << rx_chn->dsize_log2));
 
-	rx_chn->page_pool = NULL;
-
-	rx_chn->pages = devm_kcalloc(dev, rx_chn->descs_num,
-				     sizeof(*rx_chn->pages), GFP_KERNEL);
-	if (!rx_chn->pages)
-		return -ENOMEM;
-
 	common->rx_flow_id_base =
 			k3_udma_glue_rx_get_flow_id_base(rx_chn->rx_chn);
 	dev_info(dev, "set new flow-id-base %u\n", common->rx_flow_id_base);
@@ -2406,6 +2449,10 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 				K3_UDMA_GLUE_SRC_TAG_LO_USE_REMOTE_SRC_TAG,
 		};
 
+		flow = &rx_chn->flows[i];
+		flow->id = i;
+		flow->common = common;
+
 		rx_flow_cfg.ring_rxfdq0_id = fdqring_id;
 		rx_flow_cfg.rx_cfg.size = max_desc_num;
 		rx_flow_cfg.rxfdq_cfg.size = max_desc_num;
@@ -2422,28 +2469,32 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 				k3_udma_glue_rx_flow_get_fdq_id(rx_chn->rx_chn,
 								i);
 
-		rx_chn->irq = k3_udma_glue_rx_get_irq(rx_chn->rx_chn, i);
-
-		if (rx_chn->irq < 0) {
+		flow->irq = k3_udma_glue_rx_get_irq(rx_chn->rx_chn, i);
+		if (flow->irq <= 0) {
 			dev_err(dev, "Failed to get rx dma irq %d\n",
-				rx_chn->irq);
-			ret = rx_chn->irq;
+				flow->irq);
+			ret = flow->irq;
 			goto err;
 		}
-	}
-
-	netif_napi_add(common->dma_ndev, &common->napi_rx,
-		       am65_cpsw_nuss_rx_poll);
-	hrtimer_init(&common->rx_hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
-	common->rx_hrtimer.function = &am65_cpsw_nuss_rx_timer_callback;
 
-	ret = devm_request_irq(dev, rx_chn->irq,
-			       am65_cpsw_nuss_rx_irq,
-			       IRQF_TRIGGER_HIGH, dev_name(dev), common);
-	if (ret) {
-		dev_err(dev, "failure requesting rx irq %u, %d\n",
-			rx_chn->irq, ret);
-		goto err;
+		snprintf(flow->name,
+			 sizeof(flow->name), "%s-rx%d",
+			 dev_name(dev), i);
+		netif_napi_add(common->dma_ndev, &flow->napi_rx,
+			       am65_cpsw_nuss_rx_poll);
+		hrtimer_init(&flow->rx_hrtimer, CLOCK_MONOTONIC,
+			     HRTIMER_MODE_REL_PINNED);
+		flow->rx_hrtimer.function = &am65_cpsw_nuss_rx_timer_callback;
+
+		ret = devm_request_irq(dev, flow->irq,
+				       am65_cpsw_nuss_rx_irq,
+				       IRQF_TRIGGER_HIGH,
+				       flow->name, flow);
+		if (ret) {
+			dev_err(dev, "failure requesting rx %d irq %u, %d\n",
+				i, flow->irq, ret);
+			goto err;
+		}
 	}
 
 err:
@@ -3287,8 +3338,9 @@ static int am65_cpsw_nuss_register_ndevs(struct am65_cpsw_common *common)
 		k3_udma_glue_disable_tx_chn(tx_chan[i].tx_chn);
 	}
 
-	for (i = 0; i < AM65_CPSW_MAX_RX_FLOWS; i++)
-		k3_udma_glue_reset_rx_chn(rx_chan->rx_chn, i, rx_chan,
+	for (i = 0; i < common->rx_ch_num_flows; i++)
+		k3_udma_glue_reset_rx_chn(rx_chan->rx_chn, i,
+					  &rx_chan->flows[i],
 					  am65_cpsw_nuss_rx_cleanup, !!i);
 
 	k3_udma_glue_disable_rx_chn(rx_chan->rx_chn);
@@ -3330,12 +3382,18 @@ static int am65_cpsw_nuss_register_ndevs(struct am65_cpsw_common *common)
 	return ret;
 }
 
-int am65_cpsw_nuss_update_tx_chns(struct am65_cpsw_common *common, int num_tx)
+int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common,
+				     int num_tx, int num_rx)
 {
 	int ret;
 
 	common->tx_ch_num = num_tx;
+	common->rx_ch_num_flows = num_rx;
 	ret = am65_cpsw_nuss_init_tx_chns(common);
+	if (ret)
+		return ret;
+
+	ret = am65_cpsw_nuss_init_rx_chns(common);
 
 	return ret;
 }
@@ -3465,6 +3523,7 @@ static int am65_cpsw_nuss_probe(struct platform_device *pdev)
 	common->rx_flow_id_base = -1;
 	init_completion(&common->tdown_complete);
 	common->tx_ch_num = AM65_CPSW_DEFAULT_TX_CHNS;
+	common->rx_ch_num_flows = AM65_CPSW_DEFAULT_RX_CHN_FLOWS;
 	common->pf_p0_rx_ptype_rrobin = false;
 	common->default_vlan = 1;
 
@@ -3656,8 +3715,10 @@ static int am65_cpsw_nuss_resume(struct device *dev)
 		return ret;
 
 	/* If RX IRQ was disabled before suspend, keep it disabled */
-	if (common->rx_irq_disabled)
-		disable_irq(common->rx_chns.irq);
+	for (i = 0; i < common->rx_ch_num_flows; i++) {
+		if (common->rx_chns.flows[i].irq_disabled)
+			disable_irq(common->rx_chns.flows[i].irq);
+	}
 
 	am65_cpts_resume(common->cpts);
 
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.h b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
index e2ce2be320bd..7528bf7ab3fb 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.h
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.h
@@ -22,8 +22,7 @@ struct am65_cpts;
 #define HOST_PORT_NUM		0
 
 #define AM65_CPSW_MAX_TX_QUEUES	8
-#define AM65_CPSW_MAX_RX_QUEUES	1
-#define AM65_CPSW_MAX_RX_FLOWS	1
+#define AM65_CPSW_MAX_RX_QUEUES	8
 
 #define AM65_CPSW_PORT_VLAN_REG_OFFSET	0x014
 
@@ -58,7 +57,7 @@ struct am65_cpsw_port {
 	struct am65_cpsw_qos		qos;
 	struct devlink_port		devlink_port;
 	struct bpf_prog			*xdp_prog;
-	struct xdp_rxq_info		xdp_rxq;
+	struct xdp_rxq_info		xdp_rxq[AM65_CPSW_MAX_RX_QUEUES];
 	/* Only for suspend resume context */
 	u32				vid_context;
 };
@@ -94,16 +93,27 @@ struct am65_cpsw_tx_chn {
 	u32 rate_mbps;
 };
 
+struct am65_cpsw_rx_flow {
+	u32 id;
+	struct napi_struct napi_rx;
+	struct am65_cpsw_common	*common;
+	int irq;
+	bool irq_disabled;
+	struct hrtimer rx_hrtimer;
+	unsigned long rx_pace_timeout;
+	struct page_pool *page_pool;
+	struct page **pages;
+	char name[32];
+};
+
 struct am65_cpsw_rx_chn {
 	struct device *dev;
 	struct device *dma_dev;
 	struct k3_cppi_desc_pool *desc_pool;
 	struct k3_udma_glue_rx_channel *rx_chn;
-	struct page_pool *page_pool;
-	struct page **pages;
 	u32 descs_num;
 	unsigned char dsize_log2;
-	int irq;
+	struct am65_cpsw_rx_flow flows[AM65_CPSW_MAX_RX_QUEUES];
 };
 
 #define AM65_CPSW_QUIRK_I2027_NO_TX_CSUM BIT(0)
@@ -149,12 +159,8 @@ struct am65_cpsw_common {
 	struct completion	tdown_complete;
 	atomic_t		tdown_cnt;
 
+	int			rx_ch_num_flows;
 	struct am65_cpsw_rx_chn	rx_chns;
-	struct napi_struct	napi_rx;
-
-	bool			rx_irq_disabled;
-	struct hrtimer		rx_hrtimer;
-	unsigned long		rx_pace_timeout;
 
 	u32			nuss_ver;
 	u32			cpsw_ver;
@@ -203,8 +209,8 @@ struct am65_cpsw_ndev_priv {
 #define am65_common_get_host(common) (&(common)->host)
 #define am65_common_get_port(common, id) (&(common)->ports[(id) - 1])
 
-#define am65_cpsw_napi_to_common(pnapi) \
-	container_of(pnapi, struct am65_cpsw_common, napi_rx)
+#define am65_cpsw_napi_to_rx_flow(pnapi) \
+	container_of(pnapi, struct am65_cpsw_rx_flow, napi_rx)
 #define am65_cpsw_napi_to_tx_chn(pnapi) \
 	container_of(pnapi, struct am65_cpsw_tx_chn, napi_tx)
 
@@ -216,7 +222,9 @@ extern const struct ethtool_ops am65_cpsw_ethtool_ops_slave;
 
 void am65_cpsw_nuss_set_p0_ptype(struct am65_cpsw_common *common);
 void am65_cpsw_nuss_remove_tx_chns(struct am65_cpsw_common *common);
-int am65_cpsw_nuss_update_tx_chns(struct am65_cpsw_common *common, int num_tx);
+void am65_cpsw_nuss_remove_rx_chns(void *data);
+int am65_cpsw_nuss_update_tx_rx_chns(struct am65_cpsw_common *common,
+				     int num_tx, int num_rx);
 
 bool am65_cpsw_port_dev_check(const struct net_device *dev);
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 2/6] net: ethernet: ti: cpsw_ale: use regfields for ALE registers
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 3/6] net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers Roger Quadros
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

Map the entire ALE registerspace using regmap.

Add regfields for Major and Minor Version fields.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changelog:
v3:
- added Reviewed-by Simon Horman
---
 drivers/net/ethernet/ti/cpsw_ale.c | 84 ++++++++++++++++++++++++++++++--------
 drivers/net/ethernet/ti/cpsw_ale.h | 17 ++++++--
 2 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 64bf22cd860c..f638aa63edee 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/regmap.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/err.h>
@@ -76,7 +77,7 @@ enum {
  * @dev_id: ALE version/SoC id
  * @features: features supported by ALE
  * @tbl_entries: number of ALE entries
- * @major_ver_mask: mask of ALE Major Version Value in ALE_IDVER reg.
+ * @reg_fields: pointer to array of register field configuration
  * @nu_switch_ale: NU Switch ALE
  * @vlan_entry_tbl: ALE vlan entry fields description tbl
  */
@@ -84,7 +85,7 @@ struct cpsw_ale_dev_id {
 	const char *dev_id;
 	u32 features;
 	u32 tbl_entries;
-	u32 major_ver_mask;
+	const struct reg_field *reg_fields;
 	bool nu_switch_ale;
 	const struct ale_entry_fld *vlan_entry_tbl;
 };
@@ -1292,25 +1293,37 @@ void cpsw_ale_stop(struct cpsw_ale *ale)
 	cpsw_ale_control_set(ale, 0, ALE_ENABLE, 0);
 }
 
+static const struct reg_field ale_fields_cpsw[] = {
+	/* CPSW_ALE_IDVER_REG */
+	[MINOR_VER]	= REG_FIELD(ALE_IDVER, 0, 7),
+	[MAJOR_VER]	= REG_FIELD(ALE_IDVER, 8, 15),
+};
+
+static const struct reg_field ale_fields_cpsw_nu[] = {
+	/* CPSW_ALE_IDVER_REG */
+	[MINOR_VER]	= REG_FIELD(ALE_IDVER, 0, 7),
+	[MAJOR_VER]	= REG_FIELD(ALE_IDVER, 8, 10),
+};
+
 static const struct cpsw_ale_dev_id cpsw_ale_id_match[] = {
 	{
 		/* am3/4/5, dra7. dm814x, 66ak2hk-gbe */
 		.dev_id = "cpsw",
 		.tbl_entries = 1024,
-		.major_ver_mask = 0xff,
+		.reg_fields = ale_fields_cpsw,
 		.vlan_entry_tbl = vlan_entry_cpsw,
 	},
 	{
 		/* 66ak2h_xgbe */
 		.dev_id = "66ak2h-xgbe",
 		.tbl_entries = 2048,
-		.major_ver_mask = 0xff,
+		.reg_fields = ale_fields_cpsw,
 		.vlan_entry_tbl = vlan_entry_cpsw,
 	},
 	{
 		.dev_id = "66ak2el",
 		.features = CPSW_ALE_F_STATUS_REG,
-		.major_ver_mask = 0x7,
+		.reg_fields = ale_fields_cpsw_nu,
 		.nu_switch_ale = true,
 		.vlan_entry_tbl = vlan_entry_nu,
 	},
@@ -1318,7 +1331,7 @@ static const struct cpsw_ale_dev_id cpsw_ale_id_match[] = {
 		.dev_id = "66ak2g",
 		.features = CPSW_ALE_F_STATUS_REG,
 		.tbl_entries = 64,
-		.major_ver_mask = 0x7,
+		.reg_fields = ale_fields_cpsw_nu,
 		.nu_switch_ale = true,
 		.vlan_entry_tbl = vlan_entry_nu,
 	},
@@ -1326,20 +1339,20 @@ static const struct cpsw_ale_dev_id cpsw_ale_id_match[] = {
 		.dev_id = "am65x-cpsw2g",
 		.features = CPSW_ALE_F_STATUS_REG | CPSW_ALE_F_HW_AUTOAGING,
 		.tbl_entries = 64,
-		.major_ver_mask = 0x7,
+		.reg_fields = ale_fields_cpsw_nu,
 		.nu_switch_ale = true,
 		.vlan_entry_tbl = vlan_entry_nu,
 	},
 	{
 		.dev_id = "j721e-cpswxg",
 		.features = CPSW_ALE_F_STATUS_REG | CPSW_ALE_F_HW_AUTOAGING,
-		.major_ver_mask = 0x7,
+		.reg_fields = ale_fields_cpsw_nu,
 		.vlan_entry_tbl = vlan_entry_k3_cpswxg,
 	},
 	{
 		.dev_id = "am64-cpswxg",
 		.features = CPSW_ALE_F_STATUS_REG | CPSW_ALE_F_HW_AUTOAGING,
-		.major_ver_mask = 0x7,
+		.reg_fields = ale_fields_cpsw_nu,
 		.vlan_entry_tbl = vlan_entry_k3_cpswxg,
 		.tbl_entries = 512,
 	},
@@ -1361,41 +1374,76 @@ cpsw_ale_dev_id *cpsw_ale_match_id(const struct cpsw_ale_dev_id *id,
 	return NULL;
 }
 
+static const struct regmap_config ale_regmap_cfg = {
+	.reg_bits = 32,
+	.val_bits = 32,
+	.reg_stride = 4,
+	.name = "cpsw-ale",
+};
+
+static int cpsw_ale_regfield_init(struct cpsw_ale *ale)
+{
+	struct regmap *regmap = ale->regmap;
+	struct device *dev = ale->params.dev;
+	const struct reg_field *reg_fields = ale->params.reg_fields;
+	int i;
+
+	for (i = 0; i < ALE_FIELDS_MAX; i++) {
+		ale->fields[i] = devm_regmap_field_alloc(dev, regmap,
+							 reg_fields[i]);
+		if (IS_ERR(ale->fields[i])) {
+			dev_err(dev, "Unable to allocate regmap field %d\n", i);
+			return PTR_ERR(ale->fields[i]);
+		}
+	}
+
+	return 0;
+}
+
 struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params)
 {
 	const struct cpsw_ale_dev_id *ale_dev_id;
 	struct cpsw_ale *ale;
-	u32 rev, ale_entries;
+	u32 ale_entries, rev_major, rev_minor;
+	int ret;
 
 	ale_dev_id = cpsw_ale_match_id(cpsw_ale_id_match, params->dev_id);
 	if (!ale_dev_id)
 		return ERR_PTR(-EINVAL);
 
 	params->ale_entries = ale_dev_id->tbl_entries;
-	params->major_ver_mask = ale_dev_id->major_ver_mask;
 	params->nu_switch_ale = ale_dev_id->nu_switch_ale;
+	params->reg_fields = ale_dev_id->reg_fields;
 
 	ale = devm_kzalloc(params->dev, sizeof(*ale), GFP_KERNEL);
 	if (!ale)
 		return ERR_PTR(-ENOMEM);
+	ale->regmap = devm_regmap_init_mmio(params->dev, params->ale_regs,
+					    &ale_regmap_cfg);
+	if (IS_ERR(ale->regmap)) {
+		dev_err(params->dev, "Couldn't create CPSW ALE regmap\n");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ale->params = *params;
+	ret = cpsw_ale_regfield_init(ale);
+	if (ret)
+		return ERR_PTR(ret);
 
 	ale->p0_untag_vid_mask = devm_bitmap_zalloc(params->dev, VLAN_N_VID,
 						    GFP_KERNEL);
 	if (!ale->p0_untag_vid_mask)
 		return ERR_PTR(-ENOMEM);
 
-	ale->params = *params;
 	ale->ageout = ale->params.ale_ageout * HZ;
 	ale->features = ale_dev_id->features;
 	ale->vlan_entry_tbl = ale_dev_id->vlan_entry_tbl;
 
-	rev = readl_relaxed(ale->params.ale_regs + ALE_IDVER);
-	ale->version =
-		(ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask) << 8) |
-		 ALE_VERSION_MINOR(rev);
+	regmap_field_read(ale->fields[MINOR_VER], &rev_minor);
+	regmap_field_read(ale->fields[MAJOR_VER], &rev_major);
+	ale->version = rev_major << 8 | rev_minor;
 	dev_info(ale->params.dev, "initialized cpsw ale version %d.%d\n",
-		 ALE_VERSION_MAJOR(rev, ale->params.major_ver_mask),
-		 ALE_VERSION_MINOR(rev));
+		 rev_major, rev_minor);
 
 	if (ale->features & CPSW_ALE_F_STATUS_REG &&
 	    !ale->params.ale_entries) {
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index 6779ee111d57..58d377dd7496 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -8,6 +8,8 @@
 #ifndef __TI_CPSW_ALE_H__
 #define __TI_CPSW_ALE_H__
 
+struct reg_fields;
+
 struct cpsw_ale_params {
 	struct device		*dev;
 	void __iomem		*ale_regs;
@@ -20,19 +22,26 @@ struct cpsw_ale_params {
 	 * to identify this hardware.
 	 */
 	bool			nu_switch_ale;
-	/* mask bit used in NU Switch ALE is 3 bits instead of 8 bits. So
-	 * pass it from caller.
-	 */
-	u32			major_ver_mask;
+	const struct reg_field *reg_fields;
 	const char		*dev_id;
 	unsigned long		bus_freq;
 };
 
 struct ale_entry_fld;
+struct regmap;
+
+enum ale_fields {
+	MINOR_VER,
+	MAJOR_VER,
+	/* terminator */
+	ALE_FIELDS_MAX,
+};
 
 struct cpsw_ale {
 	struct cpsw_ale_params	params;
 	struct timer_list	timer;
+	struct regmap		*regmap;
+	struct regmap_field	*fields[ALE_FIELDS_MAX];
 	unsigned long		ageout;
 	u32			version;
 	u32			features;

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 3/6] net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 2/6] net: ethernet: ti: cpsw_ale: use regfields for ALE registers Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 4/6] net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields Roger Quadros
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

Use regfields for number of ALE Entries and Policers.

The variants that support Policers/Classifiers have the number
of policers encoded in the ALE_STATUS register.

Use that and show the number of Policers in the ALE info message.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changelog:
v3:
- added Reviewed-by Simon Horman
---
 drivers/net/ethernet/ti/cpsw_ale.c | 25 +++++++++++++++++++------
 drivers/net/ethernet/ti/cpsw_ale.h |  3 +++
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index f638aa63edee..c11326e8fbd4 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -103,7 +103,7 @@ struct cpsw_ale_dev_id {
 #define ALE_UCAST_TOUCHED		3
 
 #define ALE_TABLE_SIZE_MULTIPLIER	1024
-#define ALE_STATUS_SIZE_MASK		0x1f
+#define ALE_POLICER_SIZE_MULTIPLIER	8
 
 static inline int cpsw_ale_get_field(u32 *ale_entry, u32 start, u32 bits)
 {
@@ -1303,6 +1303,9 @@ static const struct reg_field ale_fields_cpsw_nu[] = {
 	/* CPSW_ALE_IDVER_REG */
 	[MINOR_VER]	= REG_FIELD(ALE_IDVER, 0, 7),
 	[MAJOR_VER]	= REG_FIELD(ALE_IDVER, 8, 10),
+	/* CPSW_ALE_STATUS_REG */
+	[ALE_ENTRIES]	= REG_FIELD(ALE_STATUS, 0, 7),
+	[ALE_POLICERS]	= REG_FIELD(ALE_STATUS, 8, 15),
 };
 
 static const struct cpsw_ale_dev_id cpsw_ale_id_match[] = {
@@ -1404,7 +1407,7 @@ struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params)
 {
 	const struct cpsw_ale_dev_id *ale_dev_id;
 	struct cpsw_ale *ale;
-	u32 ale_entries, rev_major, rev_minor;
+	u32 ale_entries, rev_major, rev_minor, policers;
 	int ret;
 
 	ale_dev_id = cpsw_ale_match_id(cpsw_ale_id_match, params->dev_id);
@@ -1447,9 +1450,7 @@ struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params)
 
 	if (ale->features & CPSW_ALE_F_STATUS_REG &&
 	    !ale->params.ale_entries) {
-		ale_entries =
-			readl_relaxed(ale->params.ale_regs + ALE_STATUS) &
-			ALE_STATUS_SIZE_MASK;
+		regmap_field_read(ale->fields[ALE_ENTRIES], &ale_entries);
 		/* ALE available on newer NetCP switches has introduced
 		 * a register, ALE_STATUS, to indicate the size of ALE
 		 * table which shows the size as a multiple of 1024 entries.
@@ -1463,8 +1464,20 @@ struct cpsw_ale *cpsw_ale_create(struct cpsw_ale_params *params)
 		ale_entries *= ALE_TABLE_SIZE_MULTIPLIER;
 		ale->params.ale_entries = ale_entries;
 	}
+
+	if (ale->features & CPSW_ALE_F_STATUS_REG &&
+	    !ale->params.num_policers) {
+		regmap_field_read(ale->fields[ALE_POLICERS], &policers);
+		if (!policers)
+			return ERR_PTR(-EINVAL);
+
+		policers *= ALE_POLICER_SIZE_MULTIPLIER;
+		ale->params.num_policers = policers;
+	}
+
 	dev_info(ale->params.dev,
-		 "ALE Table size %ld\n", ale->params.ale_entries);
+		 "ALE Table size %ld, Policers %ld\n", ale->params.ale_entries,
+		 ale->params.num_policers);
 
 	/* set default bits for existing h/w */
 	ale->port_mask_bits = ale->params.ale_ports;
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index 58d377dd7496..e12bb2caf016 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -15,6 +15,7 @@ struct cpsw_ale_params {
 	void __iomem		*ale_regs;
 	unsigned long		ale_ageout;	/* in secs */
 	unsigned long		ale_entries;
+	unsigned long		num_policers;
 	unsigned long		ale_ports;
 	/* NU Switch has specific handling as number of bits in ALE entries
 	 * are different than other versions of ALE. Also there are specific
@@ -33,6 +34,8 @@ struct regmap;
 enum ale_fields {
 	MINOR_VER,
 	MAJOR_VER,
+	ALE_ENTRIES,
+	ALE_POLICERS,
 	/* terminator */
 	ALE_FIELDS_MAX,
 };

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 4/6] net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
                   ` (2 preceding siblings ...)
  2024-07-03 13:51 ` [PATCH net-next v3 3/6] net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-03 13:51 ` [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults Roger Quadros
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

Adds regfileds for Policer registers and Thread mapping/control registers.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changelog:
v3:
- added Reviewed-by Simon Horman
---
 drivers/net/ethernet/ti/cpsw_ale.c | 86 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_ale.h | 41 ++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index c11326e8fbd4..6b64ec78d1b6 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -46,6 +46,24 @@
 #define ALE_UNKNOWNVLAN_FORCE_UNTAG_EGRESS	0x9C
 #define ALE_VLAN_MASK_MUX(reg)			(0xc0 + (0x4 * (reg)))
 
+#define ALE_POLICER_PORT_OUI		0x100
+#define ALE_POLICER_DA_SA		0x104
+#define ALE_POLICER_VLAN		0x108
+#define ALE_POLICER_ETHERTYPE_IPSA	0x10c
+#define ALE_POLICER_IPDA		0x110
+#define ALE_POLICER_PIR			0x118
+#define ALE_POLICER_CIR			0x11c
+#define ALE_POLICER_TBL_CTL		0x120
+#define ALE_POLICER_CTL			0x124
+#define ALE_POLICER_TEST_CTL		0x128
+#define ALE_POLICER_HIT_STATUS		0x12c
+#define ALE_THREAD_DEF			0x134
+#define ALE_THREAD_CTL			0x138
+#define ALE_THREAD_VAL			0x13c
+
+#define ALE_POLICER_TBL_WRITE_ENABLE	BIT(31)
+#define ALE_POLICER_TBL_INDEX_MASK	GENMASK(4, 0)
+
 #define AM65_CPSW_ALE_THREAD_DEF_REG 0x134
 
 /* ALE_AGING_TIMER */
@@ -1306,6 +1324,74 @@ static const struct reg_field ale_fields_cpsw_nu[] = {
 	/* CPSW_ALE_STATUS_REG */
 	[ALE_ENTRIES]	= REG_FIELD(ALE_STATUS, 0, 7),
 	[ALE_POLICERS]	= REG_FIELD(ALE_STATUS, 8, 15),
+	/* CPSW_ALE_POLICER_PORT_OUI_REG */
+	[POL_PORT_MEN]	= REG_FIELD(ALE_POLICER_PORT_OUI, 31, 31),
+	[POL_TRUNK_ID]	= REG_FIELD(ALE_POLICER_PORT_OUI, 30, 30),
+	[POL_PORT_NUM]	= REG_FIELD(ALE_POLICER_PORT_OUI, 25, 25),
+	[POL_PRI_MEN]	= REG_FIELD(ALE_POLICER_PORT_OUI, 19, 19),
+	[POL_PRI_VAL]	= REG_FIELD(ALE_POLICER_PORT_OUI, 16, 18),
+	[POL_OUI_MEN]	= REG_FIELD(ALE_POLICER_PORT_OUI, 15, 15),
+	[POL_OUI_INDEX]	= REG_FIELD(ALE_POLICER_PORT_OUI, 0, 5),
+
+	/* CPSW_ALE_POLICER_DA_SA_REG */
+	[POL_DST_MEN]	= REG_FIELD(ALE_POLICER_DA_SA, 31, 31),
+	[POL_DST_INDEX]	= REG_FIELD(ALE_POLICER_DA_SA, 16, 21),
+	[POL_SRC_MEN]	= REG_FIELD(ALE_POLICER_DA_SA, 15, 15),
+	[POL_SRC_INDEX]	= REG_FIELD(ALE_POLICER_DA_SA, 0, 5),
+
+	/* CPSW_ALE_POLICER_VLAN_REG */
+	[POL_OVLAN_MEN]		= REG_FIELD(ALE_POLICER_VLAN, 31, 31),
+	[POL_OVLAN_INDEX]	= REG_FIELD(ALE_POLICER_VLAN, 16, 21),
+	[POL_IVLAN_MEN]		= REG_FIELD(ALE_POLICER_VLAN, 15, 15),
+	[POL_IVLAN_INDEX]	= REG_FIELD(ALE_POLICER_VLAN, 0, 5),
+
+	/* CPSW_ALE_POLICER_ETHERTYPE_IPSA_REG */
+	[POL_ETHERTYPE_MEN]	= REG_FIELD(ALE_POLICER_ETHERTYPE_IPSA, 31, 31),
+	[POL_ETHERTYPE_INDEX]	= REG_FIELD(ALE_POLICER_ETHERTYPE_IPSA, 16, 21),
+	[POL_IPSRC_MEN]		= REG_FIELD(ALE_POLICER_ETHERTYPE_IPSA, 15, 15),
+	[POL_IPSRC_INDEX]	= REG_FIELD(ALE_POLICER_ETHERTYPE_IPSA, 0, 5),
+
+	/* CPSW_ALE_POLICER_IPDA_REG */
+	[POL_IPDST_MEN]		= REG_FIELD(ALE_POLICER_IPDA, 31, 31),
+	[POL_IPDST_INDEX]	= REG_FIELD(ALE_POLICER_IPDA, 16, 21),
+
+	/* CPSW_ALE_POLICER_TBL_CTL_REG */
+	/**
+	 * REG_FIELDS not defined for this as fields cannot be correctly
+	 * used independently
+	 */
+
+	/* CPSW_ALE_POLICER_CTL_REG */
+	[POL_EN]		= REG_FIELD(ALE_POLICER_CTL, 31, 31),
+	[POL_RED_DROP_EN]	= REG_FIELD(ALE_POLICER_CTL, 29, 29),
+	[POL_YELLOW_DROP_EN]	= REG_FIELD(ALE_POLICER_CTL, 28, 28),
+	[POL_YELLOW_THRESH]	= REG_FIELD(ALE_POLICER_CTL, 24, 26),
+	[POL_POL_MATCH_MODE]	= REG_FIELD(ALE_POLICER_CTL, 22, 23),
+	[POL_PRIORITY_THREAD_EN] = REG_FIELD(ALE_POLICER_CTL, 21, 21),
+	[POL_MAC_ONLY_DEF_DIS]	= REG_FIELD(ALE_POLICER_CTL, 20, 20),
+
+	/* CPSW_ALE_POLICER_TEST_CTL_REG */
+	[POL_TEST_CLR]		= REG_FIELD(ALE_POLICER_TEST_CTL, 31, 31),
+	[POL_TEST_CLR_RED]	= REG_FIELD(ALE_POLICER_TEST_CTL, 30, 30),
+	[POL_TEST_CLR_YELLOW]	= REG_FIELD(ALE_POLICER_TEST_CTL, 29, 29),
+	[POL_TEST_CLR_SELECTED]	= REG_FIELD(ALE_POLICER_TEST_CTL, 28, 28),
+	[POL_TEST_ENTRY]	= REG_FIELD(ALE_POLICER_TEST_CTL, 0, 4),
+
+	/* CPSW_ALE_POLICER_HIT_STATUS_REG */
+	[POL_STATUS_HIT]	= REG_FIELD(ALE_POLICER_HIT_STATUS, 31, 31),
+	[POL_STATUS_HIT_RED]	= REG_FIELD(ALE_POLICER_HIT_STATUS, 30, 30),
+	[POL_STATUS_HIT_YELLOW]	= REG_FIELD(ALE_POLICER_HIT_STATUS, 29, 29),
+
+	/* CPSW_ALE_THREAD_DEF_REG */
+	[ALE_DEFAULT_THREAD_EN]		= REG_FIELD(ALE_THREAD_DEF, 15, 15),
+	[ALE_DEFAULT_THREAD_VAL]	= REG_FIELD(ALE_THREAD_DEF, 0, 5),
+
+	/* CPSW_ALE_THREAD_CTL_REG */
+	[ALE_THREAD_CLASS_INDEX] = REG_FIELD(ALE_THREAD_CTL, 0, 4),
+
+	/* CPSW_ALE_THREAD_VAL_REG */
+	[ALE_THREAD_ENABLE]	= REG_FIELD(ALE_THREAD_VAL, 15, 15),
+	[ALE_THREAD_VALUE]	= REG_FIELD(ALE_THREAD_VAL, 0, 5),
 };
 
 static const struct cpsw_ale_dev_id cpsw_ale_id_match[] = {
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index e12bb2caf016..2cb76acc6d16 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -36,6 +36,47 @@ enum ale_fields {
 	MAJOR_VER,
 	ALE_ENTRIES,
 	ALE_POLICERS,
+	POL_PORT_MEN,
+	POL_TRUNK_ID,
+	POL_PORT_NUM,
+	POL_PRI_MEN,
+	POL_PRI_VAL,
+	POL_OUI_MEN,
+	POL_OUI_INDEX,
+	POL_DST_MEN,
+	POL_DST_INDEX,
+	POL_SRC_MEN,
+	POL_SRC_INDEX,
+	POL_OVLAN_MEN,
+	POL_OVLAN_INDEX,
+	POL_IVLAN_MEN,
+	POL_IVLAN_INDEX,
+	POL_ETHERTYPE_MEN,
+	POL_ETHERTYPE_INDEX,
+	POL_IPSRC_MEN,
+	POL_IPSRC_INDEX,
+	POL_IPDST_MEN,
+	POL_IPDST_INDEX,
+	POL_EN,
+	POL_RED_DROP_EN,
+	POL_YELLOW_DROP_EN,
+	POL_YELLOW_THRESH,
+	POL_POL_MATCH_MODE,
+	POL_PRIORITY_THREAD_EN,
+	POL_MAC_ONLY_DEF_DIS,
+	POL_TEST_CLR,
+	POL_TEST_CLR_RED,
+	POL_TEST_CLR_YELLOW,
+	POL_TEST_CLR_SELECTED,
+	POL_TEST_ENTRY,
+	POL_STATUS_HIT,
+	POL_STATUS_HIT_RED,
+	POL_STATUS_HIT_YELLOW,
+	ALE_DEFAULT_THREAD_EN,
+	ALE_DEFAULT_THREAD_VAL,
+	ALE_THREAD_CLASS_INDEX,
+	ALE_THREAD_ENABLE,
+	ALE_THREAD_VALUE,
 	/* terminator */
 	ALE_FIELDS_MAX,
 };

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
                   ` (3 preceding siblings ...)
  2024-07-03 13:51 ` [PATCH net-next v3 4/6] net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-04  8:54   ` Simon Horman
  2024-07-03 13:51 ` [PATCH net-next v3 6/6] net: ethernet: ti: am65-cpsw: setup priority to flow mapping Roger Quadros
  2024-07-04  9:59 ` [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support MD Danish Anwar
  6 siblings, 1 reply; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

The Policer registers in the ALE register space are just shadow registers
and use an index field in the policer table control register to read/write
to the actual Polier registers.
Add helper functions to Read and Write to Policer registers.

Also add a helper function to set the thread value to classifier/policer
mapping. Any packet that first matches the classifier will be sent to the
thread (flow) that is set in the classifier to thread mapping table.
If not set then it goes to the default flow.

Default behaviour is to have 8 classifiers to map 8 DSCP/PCP
priorities to N receive threads (flows). N depends on number of
RX channels enabled for the port.
As per the standard [1] User prioritie 1 (Background) and 2 (Spare) have
lower priority than the user priority 0 (default). User priority 1 being
of the lowest priority.

[1] IEEE802.1D-2004, IEEE Standard for Local and metropolitan area networks
Table G-2 - Traffic type acronyms
Table G-3 - Defining traffic types

Signed-off-by: Roger Quadros <rogerq@kernel.org>
---
Changelog:
v3:
- squashed 2 patches into 1
- added comment explaining the default thread to priority mapping table
- typo fix "classifer"->"classifier"
---
 drivers/net/ethernet/ti/cpsw_ale.c | 94 ++++++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/ti/cpsw_ale.h |  1 +
 2 files changed, 95 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 6b64ec78d1b6..cf2958093fc8 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -1627,3 +1627,97 @@ u32 cpsw_ale_get_num_entries(struct cpsw_ale *ale)
 {
 	return ale ? ale->params.ale_entries : 0;
 }
+
+/* Reads the specified policer index into ALE POLICER registers */
+static void cpsw_ale_policer_read_idx(struct cpsw_ale *ale, u32 idx)
+{
+	idx &= ALE_POLICER_TBL_INDEX_MASK;
+	writel_relaxed(idx, ale->params.ale_regs + ALE_POLICER_TBL_CTL);
+}
+
+/* Writes the ALE POLICER registers into the specified policer index */
+static void cpsw_ale_policer_write_idx(struct cpsw_ale *ale, u32 idx)
+{
+	idx &= ALE_POLICER_TBL_INDEX_MASK;
+	idx |= ALE_POLICER_TBL_WRITE_ENABLE;
+	writel_relaxed(idx, ale->params.ale_regs + ALE_POLICER_TBL_CTL);
+}
+
+/* enables/disables the custom thread value for the specified policer index */
+static void cpsw_ale_policer_thread_idx_enable(struct cpsw_ale *ale, u32 idx,
+					       u32 thread_id, bool enable)
+{
+	regmap_field_write(ale->fields[ALE_THREAD_CLASS_INDEX], idx);
+	regmap_field_write(ale->fields[ALE_THREAD_VALUE], thread_id);
+	regmap_field_write(ale->fields[ALE_THREAD_ENABLE], enable ? 1 : 0);
+}
+
+/* Disable all policer entries and thread mappings */
+static void cpsw_ale_policer_reset(struct cpsw_ale *ale)
+{
+	int i;
+
+	for (i = 0; i < ale->params.num_policers ; i++) {
+		cpsw_ale_policer_read_idx(ale, i);
+		regmap_field_write(ale->fields[POL_PORT_MEN], 0);
+		regmap_field_write(ale->fields[POL_PRI_MEN], 0);
+		regmap_field_write(ale->fields[POL_OUI_MEN], 0);
+		regmap_field_write(ale->fields[POL_DST_MEN], 0);
+		regmap_field_write(ale->fields[POL_SRC_MEN], 0);
+		regmap_field_write(ale->fields[POL_OVLAN_MEN], 0);
+		regmap_field_write(ale->fields[POL_IVLAN_MEN], 0);
+		regmap_field_write(ale->fields[POL_ETHERTYPE_MEN], 0);
+		regmap_field_write(ale->fields[POL_IPSRC_MEN], 0);
+		regmap_field_write(ale->fields[POL_IPDST_MEN], 0);
+		regmap_field_write(ale->fields[POL_EN], 0);
+		regmap_field_write(ale->fields[POL_RED_DROP_EN], 0);
+		regmap_field_write(ale->fields[POL_YELLOW_DROP_EN], 0);
+		regmap_field_write(ale->fields[POL_PRIORITY_THREAD_EN], 0);
+
+		cpsw_ale_policer_thread_idx_enable(ale, i, 0, 0);
+	}
+}
+
+/* Default classifier is to map 8 user priorities to N receive channels */
+void cpsw_ale_classifier_setup_default(struct cpsw_ale *ale, int num_rx_ch)
+{
+	int pri, idx;
+	/* IEEE802.1D-2004, Standard for Local and metropolitan area networks
+	 *    Table G-2 - Traffic type acronyms
+	 *    Table G-3 - Defining traffic types
+	 * User priority values 1 and 2 effectively communicate a lower
+	 * priority than 0. In the below table 0 is assigned to higher priority
+	 * thread than 1 and 2 wherever possible.
+	 * The below table maps which thread the user priority needs to be
+	 * sent to for a given number of threads (RX channels). Upper threads
+	 * have higher priority.
+	 * e.g. if number of threads is 8 then user priority 0 will map to
+	 * pri_thread_map[8-1][0] i.e. thread 2
+	 */
+	int pri_thread_map[8][8] = {	{ 0, 0, 0, 0, 0, 0, 0, 0, },
+					{ 0, 0, 0, 0, 1, 1, 1, 1, },
+					{ 0, 0, 0, 0, 1, 1, 2, 2, },
+					{ 1, 0, 0, 1, 2, 2, 3, 3, },
+					{ 1, 0, 0, 1, 2, 3, 4, 4, },
+					{ 1, 0, 0, 2, 3, 4, 5, 5, },
+					{ 1, 0, 0, 2, 3, 4, 5, 6, },
+					{ 2, 0, 1, 3, 4, 5, 6, 7, } };
+
+	cpsw_ale_policer_reset(ale);
+
+	/* use first 8 classifiers to map 8 (DSCP/PCP) priorities to channels */
+	for (pri = 0; pri < 8; pri++) {
+		idx = pri;
+
+		/* Classifier 'idx' match on priority 'pri' */
+		cpsw_ale_policer_read_idx(ale, idx);
+		regmap_field_write(ale->fields[POL_PRI_VAL], pri);
+		regmap_field_write(ale->fields[POL_PRI_MEN], 1);
+		cpsw_ale_policer_write_idx(ale, idx);
+
+		/* Map Classifier 'idx' to thread provided by the map */
+		cpsw_ale_policer_thread_idx_enable(ale, idx,
+						   pri_thread_map[num_rx_ch - 1][pri],
+						   1);
+	}
+}
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index 2cb76acc6d16..1e4e9a3dd234 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -193,5 +193,6 @@ int cpsw_ale_vlan_add_modify(struct cpsw_ale *ale, u16 vid, int port_mask,
 int cpsw_ale_vlan_del_modify(struct cpsw_ale *ale, u16 vid, int port_mask);
 void cpsw_ale_set_unreg_mcast(struct cpsw_ale *ale, int unreg_mcast_mask,
 			      bool add);
+void cpsw_ale_classifier_setup_default(struct cpsw_ale *ale, int num_rx_ch);
 
 #endif

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v3 6/6] net: ethernet: ti: am65-cpsw: setup priority to flow mapping
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
                   ` (4 preceding siblings ...)
  2024-07-03 13:51 ` [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults Roger Quadros
@ 2024-07-03 13:51 ` Roger Quadros
  2024-07-04  9:59 ` [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support MD Danish Anwar
  6 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-03 13:51 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap, Roger Quadros

Now that we support multiple RX queues, enable default priority
to flow mapping so that higher priority packets come on higher
channels (flows).

The Classifier checks for PCP/DSCP priority in the packet and
routes them to the appropriate flow.

Signed-off-by: Roger Quadros <rogerq@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changelog:
v3:
- added Reviewed-by Simon Horman
---
 drivers/net/ethernet/ti/am65-cpsw-nuss.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index deb8e39e4648..4db03010d234 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -2497,6 +2497,9 @@ static int am65_cpsw_nuss_init_rx_chns(struct am65_cpsw_common *common)
 		}
 	}
 
+	/* setup classifier to route priorities to flows */
+	cpsw_ale_classifier_setup_default(common->ale, common->rx_ch_num_flows);
+
 err:
 	i = devm_add_action(dev, am65_cpsw_nuss_free_rx_chns, common);
 	if (i) {

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults
  2024-07-03 13:51 ` [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults Roger Quadros
@ 2024-07-04  8:54   ` Simon Horman
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Horman @ 2024-07-04  8:54 UTC (permalink / raw)
  To: Roger Quadros
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis, Andrew Lunn, srk, vigneshr,
	danishanwar, pekka Varis, netdev, linux-kernel, linux-omap

On Wed, Jul 03, 2024 at 04:51:36PM +0300, Roger Quadros wrote:
> The Policer registers in the ALE register space are just shadow registers
> and use an index field in the policer table control register to read/write
> to the actual Polier registers.
> Add helper functions to Read and Write to Policer registers.
> 
> Also add a helper function to set the thread value to classifier/policer
> mapping. Any packet that first matches the classifier will be sent to the
> thread (flow) that is set in the classifier to thread mapping table.
> If not set then it goes to the default flow.
> 
> Default behaviour is to have 8 classifiers to map 8 DSCP/PCP
> priorities to N receive threads (flows). N depends on number of
> RX channels enabled for the port.
> As per the standard [1] User prioritie 1 (Background) and 2 (Spare) have
> lower priority than the user priority 0 (default). User priority 1 being
> of the lowest priority.
> 
> [1] IEEE802.1D-2004, IEEE Standard for Local and metropolitan area networks
> Table G-2 - Traffic type acronyms
> Table G-3 - Defining traffic types
> 
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> ---
> Changelog:
> v3:
> - squashed 2 patches into 1
> - added comment explaining the default thread to priority mapping table
> - typo fix "classifer"->"classifier"

Thanks for the updates,

Reviewed-by: Simon Horman <horms@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support
  2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
                   ` (5 preceding siblings ...)
  2024-07-03 13:51 ` [PATCH net-next v3 6/6] net: ethernet: ti: am65-cpsw: setup priority to flow mapping Roger Quadros
@ 2024-07-04  9:59 ` MD Danish Anwar
  6 siblings, 0 replies; 16+ messages in thread
From: MD Danish Anwar @ 2024-07-04  9:59 UTC (permalink / raw)
  To: Roger Quadros, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, pekka Varis, netdev,
	linux-kernel, linux-omap



On 03/07/24 7:21 pm, Roger Quadros wrote:
> Hi,
> 
> am65-cpsw can support up to 8 queues at Rx. So far we have
> been using only one queue (i.e. default flow) for all RX traffic.
> 
> This series adds multi-queue support. The driver starts with
> 1 RX queue by default. User can increase the RX queues via ethtool,
> e.g. 'ethtool -L ethx rx <N>'
> 
> The series also adds regmap and regfield support to some of the
> ALE registers. It adds Policer/Classifier registers and fields.
> 
> Converting the existing ALE control APIs to regfields can be a separate
> exercise.
> 
> Some helper functions are added to read/write to the Policer/Classifier
> registers and a default Classifier setup function is added that
> routes packets based on their PCP/DSCP priority to different RX queues.
> 
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> ---
> Changes in v3:
> - code style fixes
> - squashed patches 5 and 6
> - added comment about priority to thread mapping table.
> - Added Reviewed-by Simon Horman.
> - Link to v2: https://lore.kernel.org/r/20240628-am65-cpsw-multi-rx-v2-0-c399cb77db56@kernel.org
> 
> Changes in v2:
> - rebase to net/next
> - fixed RX stall issue during iperf
> - Link to v1: https://lore.kernel.org/r/20240606-am65-cpsw-multi-rx-v1-0-0704b0cb6fdc@kernel.org
> 
> ---
> Roger Quadros (6):
>       net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
>       net: ethernet: ti: cpsw_ale: use regfields for ALE registers
>       net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers
>       net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields
>       net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults
>       net: ethernet: ti: am65-cpsw: setup priority to flow mapping
> 
>  drivers/net/ethernet/ti/am65-cpsw-ethtool.c |  62 +++--
>  drivers/net/ethernet/ti/am65-cpsw-nuss.c    | 370 ++++++++++++++++------------
>  drivers/net/ethernet/ti/am65-cpsw-nuss.h    |  36 +--
>  drivers/net/ethernet/ti/cpsw_ale.c          | 287 +++++++++++++++++++--
>  drivers/net/ethernet/ti/cpsw_ale.h          |  62 ++++-
>  5 files changed, 609 insertions(+), 208 deletions(-)
> ---
> base-commit: 84562f9953ec5f91a4922baa2bd4f2d4f64fac31
> change-id: 20240606-am65-cpsw-multi-rx-fb6cf8dea5eb
> 
> Best regards,

For this series,
Reviewed-by: MD Danish Anwar <danishanwar@ti.com>

-- 
Thanks and Regards,
Danish

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
@ 2024-07-06  1:15   ` Jakub Kicinski
  2024-07-08 19:42     ` Roger Quadros
  2024-07-23 17:11   ` Joe Damato
  2024-07-23 21:10   ` Brett Creeley
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2024-07-06  1:15 UTC (permalink / raw)
  To: Roger Quadros
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Siddharth Vadapalli,
	Julien Panis, Simon Horman, Andrew Lunn, srk, vigneshr,
	danishanwar, pekka Varis, netdev, linux-kernel, linux-omap

On Wed, 03 Jul 2024 16:51:32 +0300 Roger Quadros wrote:
>  
> -	if (queue >= AM65_CPSW_MAX_TX_QUEUES)
> +	if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
> +	    queue >= AM65_CPSW_MAX_RX_QUEUES)
>  		return -EINVAL;

both MAXes are 8, the else conditions below are dead code
Same for set

> -	tx_chn = &common->tx_chns[queue];
> +	if (queue < AM65_CPSW_MAX_TX_QUEUES) {
> +		tx_chn = &common->tx_chns[queue];
> +		coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
> +	} else {
> +		coal->tx_coalesce_usecs = ~0;
> +	}
>  
> -	coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
> +	if (queue < AM65_CPSW_MAX_RX_QUEUES) {
> +		rx_flow = &common->rx_chns.flows[queue];
> +		coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
> +	} else {
> +		coal->rx_coalesce_usecs = ~0;
> +	}

+	for (flow_idx = 0; flow_idx < common->rx_ch_num_flows; flow_idx++) {
+		flow = &rx_chn->flows[flow_idx];
+		for (i = 0; i < AM65_CPSW_MAX_RX_DESC; i++) {
+			page = page_pool_dev_alloc_pages(flow->page_pool);
+			if (!page) {
+				dev_err(common->dev, "cannot allocate page in flow %d\n",
+					flow_idx);
+				ret = -ENOMEM;
+				if (i)
+					goto fail_rx;
 
-			return ret;
-		}
-		rx_chn->pages[i] = page;
+				return ret;

the direct returns now that it's a double-nested loop seem questionable,
don't you have to goto fail_rx?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-06  1:15   ` Jakub Kicinski
@ 2024-07-08 19:42     ` Roger Quadros
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-08 19:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Siddharth Vadapalli,
	Julien Panis, Simon Horman, Andrew Lunn, srk, vigneshr,
	danishanwar, pekka Varis, netdev, linux-kernel, linux-omap



On 06/07/2024 04:15, Jakub Kicinski wrote:
> On Wed, 03 Jul 2024 16:51:32 +0300 Roger Quadros wrote:
>>  
>> -	if (queue >= AM65_CPSW_MAX_TX_QUEUES)
>> +	if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
>> +	    queue >= AM65_CPSW_MAX_RX_QUEUES)
>>  		return -EINVAL;
> 
> both MAXes are 8, the else conditions below are dead code
> Same for set

yes. Maybe I should just use one define for both? e.g. AM65_CPSW_MAX_QUEUES.

> 
>> -	tx_chn = &common->tx_chns[queue];
>> +	if (queue < AM65_CPSW_MAX_TX_QUEUES) {
>> +		tx_chn = &common->tx_chns[queue];
>> +		coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
>> +	} else {
>> +		coal->tx_coalesce_usecs = ~0;
>> +	}
>>  
>> -	coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
>> +	if (queue < AM65_CPSW_MAX_RX_QUEUES) {
>> +		rx_flow = &common->rx_chns.flows[queue];
>> +		coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
>> +	} else {
>> +		coal->rx_coalesce_usecs = ~0;
>> +	}
> 
> +	for (flow_idx = 0; flow_idx < common->rx_ch_num_flows; flow_idx++) {
> +		flow = &rx_chn->flows[flow_idx];
> +		for (i = 0; i < AM65_CPSW_MAX_RX_DESC; i++) {
> +			page = page_pool_dev_alloc_pages(flow->page_pool);
> +			if (!page) {
> +				dev_err(common->dev, "cannot allocate page in flow %d\n",
> +					flow_idx);
> +				ret = -ENOMEM;
> +				if (i)
> +					goto fail_rx;
>  
> -			return ret;
> -		}
> -		rx_chn->pages[i] = page;
> +				return ret;
> 
> the direct returns now that it's a double-nested loop seem questionable,
> don't you have to goto fail_rx?

Good catch. I should just drop the "if (i)" and goto fail_rx regardless.

-- 
cheers,
-roger

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
  2024-07-06  1:15   ` Jakub Kicinski
@ 2024-07-23 17:11   ` Joe Damato
  2024-07-27  6:29     ` Roger Quadros
  2024-07-23 21:10   ` Brett Creeley
  2 siblings, 1 reply; 16+ messages in thread
From: Joe Damato @ 2024-07-23 17:11 UTC (permalink / raw)
  To: Roger Quadros
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Siddharth Vadapalli, Julien Panis, Simon Horman, Andrew Lunn, srk,
	vigneshr, danishanwar, pekka Varis, netdev, linux-kernel,
	linux-omap

On Wed, Jul 03, 2024 at 04:51:32PM +0300, Roger Quadros wrote:

[...]

> @@ -699,6 +727,14 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>  		goto fail_rx;
>  	}
>  
> +	for (i = 0; i < common->rx_ch_num_flows ; i++) {
> +		napi_enable(&common->rx_chns.flows[i].napi_rx);
> +		if (common->rx_chns.flows[i].irq_disabled) {
> +			common->rx_chns.flows[i].irq_disabled = false;

Just a minor nit (not a reason to hold this back): I've been
encouraging folks to use the new netdev-genl APIs in their drivers
to map NAPIs to queues and IRQs if possible because it allows for
more expressive and interesting userland applications.

You may consider in the future using something vaguely like (this is
untested psuedo-code I just typed out):

   netif_napi_set_irq(&common->rx_chns.flows[i].napi_rx,
                      common->rx_chns.flows[i].irq);

and 

   netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX,
                        &common->rx_chns.flows[i].napi_rx);

To link everything together (note that RTNL must be held while doing
this -- I haven't checked your code path to see if that is true here).

For an example, see 64b62146ba9e ("net/mlx4: link NAPI instances to
queues and IRQs). 

Doing this would allow userland to get data via netlink, which you
can examine yourself by using cli.py like this:

python3 tools/net/ynl/cli.py \
  --spec Documentation/netlink/specs/netdev.yaml \
  --dump queue-get

python3 tools/net/ynl/cli.py \
  --spec Documentation/netlink/specs/netdev.yaml \
  --dump napi-get

> +			enable_irq(common->rx_chns.flows[i].irq);
> +		}
> +	}
> +
>  	for (tx = 0; tx < common->tx_ch_num; tx++) {
>  		ret = k3_udma_glue_enable_tx_chn(tx_chn[tx].tx_chn);
>  		if (ret) {
> @@ -710,12 +746,6 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>  		napi_enable(&tx_chn[tx].napi_tx);
>  	}
>  
> -	napi_enable(&common->napi_rx);
> -	if (common->rx_irq_disabled) {
> -		common->rx_irq_disabled = false;
> -		enable_irq(rx_chn->irq);
> -	}
> -
>  	dev_dbg(common->dev, "cpsw_nuss started\n");
>  	return 0;
>  
> @@ -726,11 +756,24 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>  		tx--;
>  	}
>  
> +	for (flow_idx = 0; i < common->rx_ch_num_flows; flow_idx++) {
> +		flow = &rx_chn->flows[flow_idx];
> +		if (!flow->irq_disabled) {
> +			disable_irq(flow->irq);
> +			flow->irq_disabled = true;
> +		}
> +		napi_disable(&flow->napi_rx);
> +	}
> +
>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>  
>  fail_rx:
> -	k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, 0, rx_chn,
> -				  am65_cpsw_nuss_rx_cleanup, 0);
> +	for (i = 0; i < common->rx_ch_num_flows; i--)
> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
> +					  am65_cpsw_nuss_rx_cleanup, !!i);
> +
> +	am65_cpsw_destroy_xdp_rxqs(common);
> +
>  	return ret;
>  }
>  
> @@ -779,12 +822,12 @@ static int am65_cpsw_nuss_common_stop(struct am65_cpsw_common *common)
>  			dev_err(common->dev, "rx teardown timeout\n");
>  	}
>  
> -	napi_disable(&common->napi_rx);
> -	hrtimer_cancel(&common->rx_hrtimer);
> -
> -	for (i = 0; i < AM65_CPSW_MAX_RX_FLOWS; i++)
> -		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, rx_chn,
> +	for (i = 0; i < common->rx_ch_num_flows; i++) {
> +		napi_disable(&common->rx_chns.flows[i].napi_rx);

The inverse of the above is probably true somewhere around here;
again a small piece of psuedo code for illustrative purposes:

   netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX, NULL);

> +		hrtimer_cancel(&common->rx_chns.flows[i].rx_hrtimer);
> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
>  					  am65_cpsw_nuss_rx_cleanup, !!i);
> +	}
>  
>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>  

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
  2024-07-06  1:15   ` Jakub Kicinski
  2024-07-23 17:11   ` Joe Damato
@ 2024-07-23 21:10   ` Brett Creeley
  2024-07-27  6:27     ` Roger Quadros
  2 siblings, 1 reply; 16+ messages in thread
From: Brett Creeley @ 2024-07-23 21:10 UTC (permalink / raw)
  To: Roger Quadros, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap



On 7/3/2024 6:51 AM, Roger Quadros wrote:
> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> 
> 
> am65-cpsw can support up to 8 queues at Rx.
> Use a macro AM65_CPSW_MAX_RX_QUEUES to indicate that.
> As there is only one DMA channel for RX traffic, the
> 8 queues come as 8 flows in that channel.
> 
> By default, we will start with 1 flow as defined by the
> macro AM65_CPSW_DEFAULT_RX_CHN_FLOWS.
> 
> User can change the number of flows by ethtool like so
> 'ethtool -L ethx rx <N>'
> 
> All traffic will still come on flow 0. To get traffic on
> different flows the Classifiers will need to be set up.
> 
> Signed-off-by: Roger Quadros <rogerq@kernel.org>
> Reviewed-by: Simon Horman <horms@kernel.org>
> ---
> Changelog:
> v3:
> - style fixes: reverse xmas tree and checkpatch.pl --max-line-length=80
> - typo fix: Classifer -> Classifier
> - added Reviewed-by Simon Horman
> ---
>   drivers/net/ethernet/ti/am65-cpsw-ethtool.c |  62 +++--
>   drivers/net/ethernet/ti/am65-cpsw-nuss.c    | 367 ++++++++++++++++------------
>   drivers/net/ethernet/ti/am65-cpsw-nuss.h    |  36 +--
>   3 files changed, 284 insertions(+), 181 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
> index a1d0935d1ebe..01e3967852e0 100644
> --- a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
> +++ b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
> @@ -429,7 +429,7 @@ static void am65_cpsw_get_channels(struct net_device *ndev,
> 
>          ch->max_rx = AM65_CPSW_MAX_RX_QUEUES;
>          ch->max_tx = AM65_CPSW_MAX_TX_QUEUES;
> -       ch->rx_count = AM65_CPSW_MAX_RX_QUEUES;
> +       ch->rx_count = common->rx_ch_num_flows;
>          ch->tx_count = common->tx_ch_num;
>   }
> 
> @@ -448,8 +448,10 @@ static int am65_cpsw_set_channels(struct net_device *ndev,
>                  return -EBUSY;
> 
>          am65_cpsw_nuss_remove_tx_chns(common);
> +       am65_cpsw_nuss_remove_rx_chns(common);
> 
> -       return am65_cpsw_nuss_update_tx_chns(common, chs->tx_count);
> +       return am65_cpsw_nuss_update_tx_rx_chns(common, chs->tx_count,
> +                                               chs->rx_count);
>   }
> 
>   static void
> @@ -920,11 +922,13 @@ static int am65_cpsw_get_coalesce(struct net_device *ndev, struct ethtool_coales
>                                    struct netlink_ext_ack *extack)
>   {
>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +       struct am65_cpsw_rx_flow *rx_flow;
>          struct am65_cpsw_tx_chn *tx_chn;
> 
>          tx_chn = &common->tx_chns[0];
> +       rx_flow = &common->rx_chns.flows[0];
> 
> -       coal->rx_coalesce_usecs = common->rx_pace_timeout / 1000;
> +       coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
>          coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
> 
>          return 0;
> @@ -934,14 +938,26 @@ static int am65_cpsw_get_per_queue_coalesce(struct net_device *ndev, u32 queue,
>                                              struct ethtool_coalesce *coal)
>   {
>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +       struct am65_cpsw_rx_flow *rx_flow;
>          struct am65_cpsw_tx_chn *tx_chn;
> 
> -       if (queue >= AM65_CPSW_MAX_TX_QUEUES)
> +       if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
> +           queue >= AM65_CPSW_MAX_RX_QUEUES)
>                  return -EINVAL;
> 
> -       tx_chn = &common->tx_chns[queue];
> +       if (queue < AM65_CPSW_MAX_TX_QUEUES) {
> +               tx_chn = &common->tx_chns[queue];
> +               coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
> +       } else {
> +               coal->tx_coalesce_usecs = ~0;
> +       }
> 
> -       coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
> +       if (queue < AM65_CPSW_MAX_RX_QUEUES) {
> +               rx_flow = &common->rx_chns.flows[queue];
> +               coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
> +       } else {
> +               coal->rx_coalesce_usecs = ~0;
> +       }

Minor nit, but after removing the dead code below the check for queue 
being greater than max values, I think am65_cpsw_get_coalesce() and 
am65_get_per_queue_coalesce() are identical except the "u32 queue" argument.

I think you could do something like the following:

static int am65_cpsw_get_per_queue_coalesce(struct net_device *ndev,
				  struct ethtool_coalesce *coal,
				  struct netlink_ext_ack *extack)
{
	return __am65_cpsw_get_coalesce(ndev, coal, 0);
}

static int am65_cpsw_get_coalesce(struct net_device *ndev, u32 queue,
				  struct ethtool_coalesce *coal,
				  struct netlink_ext_ack *extack,
				  u32 )
{
	return __am65_cpsw_get_coalesce(ndev, coal, queue);
}

> 
>          return 0;
>   }
> @@ -951,9 +967,11 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
>                                    struct netlink_ext_ack *extack)
>   {
>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +       struct am65_cpsw_rx_flow *rx_flow;
>          struct am65_cpsw_tx_chn *tx_chn;
> 
>          tx_chn = &common->tx_chns[0];
> +       rx_flow = &common->rx_chns.flows[0];
> 
>          if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20)
>                  return -EINVAL;
> @@ -961,7 +979,7 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
>          if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20)
>                  return -EINVAL;

Why does this return -EINVAL here, but 
am65_cpsw_set_per_queue_coalesce() prints a dev_info() and then set the 
value to 20?

Would it better to have consistent behavior? Maybe I'm missing some 
context or reasoning here?

> 
> -       common->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
> +       rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
>          tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
> 
>          return 0;
> @@ -971,20 +989,36 @@ static int am65_cpsw_set_per_queue_coalesce(struct net_device *ndev, u32 queue,
>                                              struct ethtool_coalesce *coal)
>   {
>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
> +       struct am65_cpsw_rx_flow *rx_flow;
>          struct am65_cpsw_tx_chn *tx_chn;
> 
> -       if (queue >= AM65_CPSW_MAX_TX_QUEUES)
> +       if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
> +           queue >= AM65_CPSW_MAX_RX_QUEUES)
>                  return -EINVAL;
> 
> -       tx_chn = &common->tx_chns[queue];
> +       if (queue < AM65_CPSW_MAX_TX_QUEUES) {
> +               tx_chn = &common->tx_chns[queue];
> +
> +               if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
> +                       dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
> +                                queue);
> +                       coal->tx_coalesce_usecs = 20;
> +               }
> 
> -       if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
> -               dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
> -                        queue);
> -               coal->tx_coalesce_usecs = 20;
> +               tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
>          }
> 
> -       tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
> +       if (queue < AM65_CPSW_MAX_RX_QUEUES) {
> +               rx_flow = &common->rx_chns.flows[queue];
> +
> +               if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20) {
> +                       dev_info(common->dev, "defaulting to min value of 20us for rx-usecs for rx-%u\n",
> +                                queue);

Would it make more sense to just return -EINVAL here similar to the 
standard "set_coalesce"?

> +                       coal->rx_coalesce_usecs = 20;
> +               }
> +
> +               rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
> +       }
> 
>          return 0;
>   }

I think my comment to the "get" and "get_per_queue" versions of these 
functions also applies here, but only if the behavior of returning 
-EINVAL or setting a value for the user is the same between the "set" 
and "set_per_queue".

Thanks,

Brett

<snip>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-23 21:10   ` Brett Creeley
@ 2024-07-27  6:27     ` Roger Quadros
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-07-27  6:27 UTC (permalink / raw)
  To: Brett Creeley, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Siddharth Vadapalli, Julien Panis
  Cc: Simon Horman, Andrew Lunn, srk, vigneshr, danishanwar,
	pekka Varis, netdev, linux-kernel, linux-omap



On 24/07/2024 00:10, Brett Creeley wrote:
> 
> 
> On 7/3/2024 6:51 AM, Roger Quadros wrote:
>> Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
>>
>>
>> am65-cpsw can support up to 8 queues at Rx.
>> Use a macro AM65_CPSW_MAX_RX_QUEUES to indicate that.
>> As there is only one DMA channel for RX traffic, the
>> 8 queues come as 8 flows in that channel.
>>
>> By default, we will start with 1 flow as defined by the
>> macro AM65_CPSW_DEFAULT_RX_CHN_FLOWS.
>>
>> User can change the number of flows by ethtool like so
>> 'ethtool -L ethx rx <N>'
>>
>> All traffic will still come on flow 0. To get traffic on
>> different flows the Classifiers will need to be set up.
>>
>> Signed-off-by: Roger Quadros <rogerq@kernel.org>
>> Reviewed-by: Simon Horman <horms@kernel.org>
>> ---
>> Changelog:
>> v3:
>> - style fixes: reverse xmas tree and checkpatch.pl --max-line-length=80
>> - typo fix: Classifer -> Classifier
>> - added Reviewed-by Simon Horman
>> ---
>>   drivers/net/ethernet/ti/am65-cpsw-ethtool.c |  62 +++--
>>   drivers/net/ethernet/ti/am65-cpsw-nuss.c    | 367 ++++++++++++++++------------
>>   drivers/net/ethernet/ti/am65-cpsw-nuss.h    |  36 +--
>>   3 files changed, 284 insertions(+), 181 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
>> index a1d0935d1ebe..01e3967852e0 100644
>> --- a/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
>> +++ b/drivers/net/ethernet/ti/am65-cpsw-ethtool.c
>> @@ -429,7 +429,7 @@ static void am65_cpsw_get_channels(struct net_device *ndev,
>>
>>          ch->max_rx = AM65_CPSW_MAX_RX_QUEUES;
>>          ch->max_tx = AM65_CPSW_MAX_TX_QUEUES;
>> -       ch->rx_count = AM65_CPSW_MAX_RX_QUEUES;
>> +       ch->rx_count = common->rx_ch_num_flows;
>>          ch->tx_count = common->tx_ch_num;
>>   }
>>
>> @@ -448,8 +448,10 @@ static int am65_cpsw_set_channels(struct net_device *ndev,
>>                  return -EBUSY;
>>
>>          am65_cpsw_nuss_remove_tx_chns(common);
>> +       am65_cpsw_nuss_remove_rx_chns(common);
>>
>> -       return am65_cpsw_nuss_update_tx_chns(common, chs->tx_count);
>> +       return am65_cpsw_nuss_update_tx_rx_chns(common, chs->tx_count,
>> +                                               chs->rx_count);
>>   }
>>
>>   static void
>> @@ -920,11 +922,13 @@ static int am65_cpsw_get_coalesce(struct net_device *ndev, struct ethtool_coales
>>                                    struct netlink_ext_ack *extack)
>>   {
>>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +       struct am65_cpsw_rx_flow *rx_flow;
>>          struct am65_cpsw_tx_chn *tx_chn;
>>
>>          tx_chn = &common->tx_chns[0];
>> +       rx_flow = &common->rx_chns.flows[0];
>>
>> -       coal->rx_coalesce_usecs = common->rx_pace_timeout / 1000;
>> +       coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
>>          coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
>>
>>          return 0;
>> @@ -934,14 +938,26 @@ static int am65_cpsw_get_per_queue_coalesce(struct net_device *ndev, u32 queue,
>>                                              struct ethtool_coalesce *coal)
>>   {
>>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +       struct am65_cpsw_rx_flow *rx_flow;
>>          struct am65_cpsw_tx_chn *tx_chn;
>>
>> -       if (queue >= AM65_CPSW_MAX_TX_QUEUES)
>> +       if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
>> +           queue >= AM65_CPSW_MAX_RX_QUEUES)
>>                  return -EINVAL;
>>
>> -       tx_chn = &common->tx_chns[queue];
>> +       if (queue < AM65_CPSW_MAX_TX_QUEUES) {
>> +               tx_chn = &common->tx_chns[queue];
>> +               coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
>> +       } else {
>> +               coal->tx_coalesce_usecs = ~0;
>> +       }
>>
>> -       coal->tx_coalesce_usecs = tx_chn->tx_pace_timeout / 1000;
>> +       if (queue < AM65_CPSW_MAX_RX_QUEUES) {
>> +               rx_flow = &common->rx_chns.flows[queue];
>> +               coal->rx_coalesce_usecs = rx_flow->rx_pace_timeout / 1000;
>> +       } else {
>> +               coal->rx_coalesce_usecs = ~0;
>> +       }
> 
> Minor nit, but after removing the dead code below the check for queue being greater than max values, I think am65_cpsw_get_coalesce() and am65_get_per_queue_coalesce() are identical except the "u32 queue" argument.
> 
> I think you could do something like the following:
> 
> static int am65_cpsw_get_per_queue_coalesce(struct net_device *ndev,
>                   struct ethtool_coalesce *coal,
>                   struct netlink_ext_ack *extack)
> {
>     return __am65_cpsw_get_coalesce(ndev, coal, 0);
> }
> 
> static int am65_cpsw_get_coalesce(struct net_device *ndev, u32 queue,
>                   struct ethtool_coalesce *coal,
>                   struct netlink_ext_ack *extack,
>                   u32 )
> {
>     return __am65_cpsw_get_coalesce(ndev, coal, queue);
> }
> 

Sure, this is much nicer.

>>
>>          return 0;
>>   }
>> @@ -951,9 +967,11 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
>>                                    struct netlink_ext_ack *extack)
>>   {
>>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +       struct am65_cpsw_rx_flow *rx_flow;
>>          struct am65_cpsw_tx_chn *tx_chn;
>>
>>          tx_chn = &common->tx_chns[0];
>> +       rx_flow = &common->rx_chns.flows[0];
>>
>>          if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20)
>>                  return -EINVAL;
>> @@ -961,7 +979,7 @@ static int am65_cpsw_set_coalesce(struct net_device *ndev, struct ethtool_coales
>>          if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20)
>>                  return -EINVAL;
> 
> Why does this return -EINVAL here, but am65_cpsw_set_per_queue_coalesce() prints a dev_info() and then set the value to 20?
> 
> Would it better to have consistent behavior? Maybe I'm missing some context or reasoning here?

Consistent behaviour is better.

> 
>>
>> -       common->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
>> +       rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
>>          tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
>>
>>          return 0;
>> @@ -971,20 +989,36 @@ static int am65_cpsw_set_per_queue_coalesce(struct net_device *ndev, u32 queue,
>>                                              struct ethtool_coalesce *coal)
>>   {
>>          struct am65_cpsw_common *common = am65_ndev_to_common(ndev);
>> +       struct am65_cpsw_rx_flow *rx_flow;
>>          struct am65_cpsw_tx_chn *tx_chn;
>>
>> -       if (queue >= AM65_CPSW_MAX_TX_QUEUES)
>> +       if (queue >= AM65_CPSW_MAX_TX_QUEUES &&
>> +           queue >= AM65_CPSW_MAX_RX_QUEUES)
>>                  return -EINVAL;
>>
>> -       tx_chn = &common->tx_chns[queue];
>> +       if (queue < AM65_CPSW_MAX_TX_QUEUES) {
>> +               tx_chn = &common->tx_chns[queue];
>> +
>> +               if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
>> +                       dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
>> +                                queue);
>> +                       coal->tx_coalesce_usecs = 20;
>> +               }
>>
>> -       if (coal->tx_coalesce_usecs && coal->tx_coalesce_usecs < 20) {
>> -               dev_info(common->dev, "defaulting to min value of 20us for tx-usecs for tx-%u\n",
>> -                        queue);
>> -               coal->tx_coalesce_usecs = 20;
>> +               tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
>>          }
>>
>> -       tx_chn->tx_pace_timeout = coal->tx_coalesce_usecs * 1000;
>> +       if (queue < AM65_CPSW_MAX_RX_QUEUES) {
>> +               rx_flow = &common->rx_chns.flows[queue];
>> +
>> +               if (coal->rx_coalesce_usecs && coal->rx_coalesce_usecs < 20) {
>> +                       dev_info(common->dev, "defaulting to min value of 20us for rx-usecs for rx-%u\n",
>> +                                queue);
> 
> Would it make more sense to just return -EINVAL here similar to the standard "set_coalesce"?

Yes, I'll do that in next spin.

> 
>> +                       coal->rx_coalesce_usecs = 20;
>> +               }
>> +
>> +               rx_flow->rx_pace_timeout = coal->rx_coalesce_usecs * 1000;
>> +       }
>>
>>          return 0;
>>   }
> 
> I think my comment to the "get" and "get_per_queue" versions of these functions also applies here, but only if the behavior of returning -EINVAL or setting a value for the user is the same between the "set" and "set_per_queue".

Thanks for the review!

> 
> Thanks,
> 
> Brett
> 
> <snip>
> 

-- 
cheers,
-roger

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-23 17:11   ` Joe Damato
@ 2024-07-27  6:29     ` Roger Quadros
  2024-09-09 14:17       ` Roger Quadros
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Quadros @ 2024-07-27  6:29 UTC (permalink / raw)
  To: Joe Damato, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Siddharth Vadapalli, Julien Panis, Simon Horman,
	Andrew Lunn, srk, vigneshr, danishanwar, pekka Varis, netdev,
	linux-kernel, linux-omap



On 23/07/2024 20:11, Joe Damato wrote:
> On Wed, Jul 03, 2024 at 04:51:32PM +0300, Roger Quadros wrote:
> 
> [...]
> 
>> @@ -699,6 +727,14 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>  		goto fail_rx;
>>  	}
>>  
>> +	for (i = 0; i < common->rx_ch_num_flows ; i++) {
>> +		napi_enable(&common->rx_chns.flows[i].napi_rx);
>> +		if (common->rx_chns.flows[i].irq_disabled) {
>> +			common->rx_chns.flows[i].irq_disabled = false;
> 
> Just a minor nit (not a reason to hold this back): I've been
> encouraging folks to use the new netdev-genl APIs in their drivers
> to map NAPIs to queues and IRQs if possible because it allows for
> more expressive and interesting userland applications.
> 
> You may consider in the future using something vaguely like (this is
> untested psuedo-code I just typed out):
> 
>    netif_napi_set_irq(&common->rx_chns.flows[i].napi_rx,
>                       common->rx_chns.flows[i].irq);
> 
> and 
> 
>    netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX,
>                         &common->rx_chns.flows[i].napi_rx);
> 
> To link everything together (note that RTNL must be held while doing
> this -- I haven't checked your code path to see if that is true here).
> 
> For an example, see 64b62146ba9e ("net/mlx4: link NAPI instances to
> queues and IRQs). 
> 
> Doing this would allow userland to get data via netlink, which you
> can examine yourself by using cli.py like this:
> 
> python3 tools/net/ynl/cli.py \
>   --spec Documentation/netlink/specs/netdev.yaml \
>   --dump queue-get
> 
> python3 tools/net/ynl/cli.py \
>   --spec Documentation/netlink/specs/netdev.yaml \
>   --dump napi-get
> 

Thanks for the pionters. I will check and see if I can incorportate
this in the next spin.

>> +			enable_irq(common->rx_chns.flows[i].irq);
>> +		}
>> +	}
>> +
>>  	for (tx = 0; tx < common->tx_ch_num; tx++) {
>>  		ret = k3_udma_glue_enable_tx_chn(tx_chn[tx].tx_chn);
>>  		if (ret) {
>> @@ -710,12 +746,6 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>  		napi_enable(&tx_chn[tx].napi_tx);
>>  	}
>>  
>> -	napi_enable(&common->napi_rx);
>> -	if (common->rx_irq_disabled) {
>> -		common->rx_irq_disabled = false;
>> -		enable_irq(rx_chn->irq);
>> -	}
>> -
>>  	dev_dbg(common->dev, "cpsw_nuss started\n");
>>  	return 0;
>>  
>> @@ -726,11 +756,24 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>  		tx--;
>>  	}
>>  
>> +	for (flow_idx = 0; i < common->rx_ch_num_flows; flow_idx++) {
>> +		flow = &rx_chn->flows[flow_idx];
>> +		if (!flow->irq_disabled) {
>> +			disable_irq(flow->irq);
>> +			flow->irq_disabled = true;
>> +		}
>> +		napi_disable(&flow->napi_rx);
>> +	}
>> +
>>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>>  
>>  fail_rx:
>> -	k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, 0, rx_chn,
>> -				  am65_cpsw_nuss_rx_cleanup, 0);
>> +	for (i = 0; i < common->rx_ch_num_flows; i--)
>> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
>> +					  am65_cpsw_nuss_rx_cleanup, !!i);
>> +
>> +	am65_cpsw_destroy_xdp_rxqs(common);
>> +
>>  	return ret;
>>  }
>>  
>> @@ -779,12 +822,12 @@ static int am65_cpsw_nuss_common_stop(struct am65_cpsw_common *common)
>>  			dev_err(common->dev, "rx teardown timeout\n");
>>  	}
>>  
>> -	napi_disable(&common->napi_rx);
>> -	hrtimer_cancel(&common->rx_hrtimer);
>> -
>> -	for (i = 0; i < AM65_CPSW_MAX_RX_FLOWS; i++)
>> -		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, rx_chn,
>> +	for (i = 0; i < common->rx_ch_num_flows; i++) {
>> +		napi_disable(&common->rx_chns.flows[i].napi_rx);
> 
> The inverse of the above is probably true somewhere around here;
> again a small piece of psuedo code for illustrative purposes:
> 
>    netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX, NULL);
> 
>> +		hrtimer_cancel(&common->rx_chns.flows[i].rx_hrtimer);
>> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
>>  					  am65_cpsw_nuss_rx_cleanup, !!i);
>> +	}
>>  
>>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>>  

-- 
cheers,
-roger

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx
  2024-07-27  6:29     ` Roger Quadros
@ 2024-09-09 14:17       ` Roger Quadros
  0 siblings, 0 replies; 16+ messages in thread
From: Roger Quadros @ 2024-09-09 14:17 UTC (permalink / raw)
  To: Joe Damato, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Siddharth Vadapalli, Julien Panis, Simon Horman,
	Andrew Lunn, srk, vigneshr, danishanwar, pekka Varis, netdev,
	linux-kernel, linux-omap

Hi Joe,

On 27/07/2024 09:29, Roger Quadros wrote:
> 
> 
> On 23/07/2024 20:11, Joe Damato wrote:
>> On Wed, Jul 03, 2024 at 04:51:32PM +0300, Roger Quadros wrote:
>>
>> [...]
>>
>>> @@ -699,6 +727,14 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>>  		goto fail_rx;
>>>  	}
>>>  
>>> +	for (i = 0; i < common->rx_ch_num_flows ; i++) {
>>> +		napi_enable(&common->rx_chns.flows[i].napi_rx);
>>> +		if (common->rx_chns.flows[i].irq_disabled) {
>>> +			common->rx_chns.flows[i].irq_disabled = false;
>>
>> Just a minor nit (not a reason to hold this back): I've been
>> encouraging folks to use the new netdev-genl APIs in their drivers
>> to map NAPIs to queues and IRQs if possible because it allows for
>> more expressive and interesting userland applications.
>>
>> You may consider in the future using something vaguely like (this is
>> untested psuedo-code I just typed out):
>>
>>    netif_napi_set_irq(&common->rx_chns.flows[i].napi_rx,
>>                       common->rx_chns.flows[i].irq);
>>
>> and 
>>
>>    netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX,
>>                         &common->rx_chns.flows[i].napi_rx);

common->dev is not actually struct net_device.
The hardware has one set of queues for more than one network ports.
Currently the NAPI queues are associated only with the first network device
that is brought up.

In such a case, what must be done with netif_queue_set_napi()?
do we still associate the same napi with all the network devices?

>>
>> To link everything together (note that RTNL must be held while doing
>> this -- I haven't checked your code path to see if that is true here).

This happens as part of ndo_open call. We don't explicitly hold RTNL here.

>>
>> For an example, see 64b62146ba9e ("net/mlx4: link NAPI instances to
>> queues and IRQs). 
>>
>> Doing this would allow userland to get data via netlink, which you
>> can examine yourself by using cli.py like this:
>>
>> python3 tools/net/ynl/cli.py \
>>   --spec Documentation/netlink/specs/netdev.yaml \
>>   --dump queue-get
>>
>> python3 tools/net/ynl/cli.py \
>>   --spec Documentation/netlink/specs/netdev.yaml \
>>   --dump napi-get
>>
> 
> Thanks for the pionters. I will check and see if I can incorportate
> this in the next spin.
> 
>>> +			enable_irq(common->rx_chns.flows[i].irq);
>>> +		}
>>> +	}
>>> +
>>>  	for (tx = 0; tx < common->tx_ch_num; tx++) {
>>>  		ret = k3_udma_glue_enable_tx_chn(tx_chn[tx].tx_chn);
>>>  		if (ret) {
>>> @@ -710,12 +746,6 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>>  		napi_enable(&tx_chn[tx].napi_tx);
>>>  	}
>>>  
>>> -	napi_enable(&common->napi_rx);
>>> -	if (common->rx_irq_disabled) {
>>> -		common->rx_irq_disabled = false;
>>> -		enable_irq(rx_chn->irq);
>>> -	}
>>> -
>>>  	dev_dbg(common->dev, "cpsw_nuss started\n");
>>>  	return 0;
>>>  
>>> @@ -726,11 +756,24 @@ static int am65_cpsw_nuss_common_open(struct am65_cpsw_common *common)
>>>  		tx--;
>>>  	}
>>>  
>>> +	for (flow_idx = 0; i < common->rx_ch_num_flows; flow_idx++) {
>>> +		flow = &rx_chn->flows[flow_idx];
>>> +		if (!flow->irq_disabled) {
>>> +			disable_irq(flow->irq);
>>> +			flow->irq_disabled = true;
>>> +		}
>>> +		napi_disable(&flow->napi_rx);
>>> +	}
>>> +
>>>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>>>  
>>>  fail_rx:
>>> -	k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, 0, rx_chn,
>>> -				  am65_cpsw_nuss_rx_cleanup, 0);
>>> +	for (i = 0; i < common->rx_ch_num_flows; i--)
>>> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
>>> +					  am65_cpsw_nuss_rx_cleanup, !!i);
>>> +
>>> +	am65_cpsw_destroy_xdp_rxqs(common);
>>> +
>>>  	return ret;
>>>  }
>>>  
>>> @@ -779,12 +822,12 @@ static int am65_cpsw_nuss_common_stop(struct am65_cpsw_common *common)
>>>  			dev_err(common->dev, "rx teardown timeout\n");
>>>  	}
>>>  
>>> -	napi_disable(&common->napi_rx);
>>> -	hrtimer_cancel(&common->rx_hrtimer);
>>> -
>>> -	for (i = 0; i < AM65_CPSW_MAX_RX_FLOWS; i++)
>>> -		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, rx_chn,
>>> +	for (i = 0; i < common->rx_ch_num_flows; i++) {
>>> +		napi_disable(&common->rx_chns.flows[i].napi_rx);
>>
>> The inverse of the above is probably true somewhere around here;
>> again a small piece of psuedo code for illustrative purposes:
>>
>>    netif_queue_set_napi(common->dev, i, NETDEV_QUEUE_TYPE_RX, NULL);
>>
>>> +		hrtimer_cancel(&common->rx_chns.flows[i].rx_hrtimer);
>>> +		k3_udma_glue_reset_rx_chn(rx_chn->rx_chn, i, &rx_chn->flows[i],
>>>  					  am65_cpsw_nuss_rx_cleanup, !!i);
>>> +	}
>>>  
>>>  	k3_udma_glue_disable_rx_chn(rx_chn->rx_chn);
>>>  
> 

-- 
cheers,
-roger

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-09-09 14:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-03 13:51 [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support Roger Quadros
2024-07-03 13:51 ` [PATCH net-next v3 1/6] net: ethernet: ti: am65-cpsw: Introduce multi queue Rx Roger Quadros
2024-07-06  1:15   ` Jakub Kicinski
2024-07-08 19:42     ` Roger Quadros
2024-07-23 17:11   ` Joe Damato
2024-07-27  6:29     ` Roger Quadros
2024-09-09 14:17       ` Roger Quadros
2024-07-23 21:10   ` Brett Creeley
2024-07-27  6:27     ` Roger Quadros
2024-07-03 13:51 ` [PATCH net-next v3 2/6] net: ethernet: ti: cpsw_ale: use regfields for ALE registers Roger Quadros
2024-07-03 13:51 ` [PATCH net-next v3 3/6] net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers Roger Quadros
2024-07-03 13:51 ` [PATCH net-next v3 4/6] net: ethernet: ti: cpsw_ale: add Policer and Thread control register fields Roger Quadros
2024-07-03 13:51 ` [PATCH net-next v3 5/6] net: ethernet: ti: cpsw_ale: add policer/classifier helpers and setup defaults Roger Quadros
2024-07-04  8:54   ` Simon Horman
2024-07-03 13:51 ` [PATCH net-next v3 6/6] net: ethernet: ti: am65-cpsw: setup priority to flow mapping Roger Quadros
2024-07-04  9:59 ` [PATCH net-next v3 0/6] net: ethernet: ti: am65-cpsw: Add multi queue RX support MD Danish Anwar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).