* [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
Introduce support for the ETS and TBF Qdisc offload available in the Airoha
EN7581 ethernet controller.
Some DSA hw switches do not support Qdisc offloading, or the mac chip has
more fine-grained QoS capabilities than the hw switch (e.g. the Airoha
EN7581 mac chip has more hw QoS and buffering capabilities than the mt7530
switch).
Introduce the ndo_setup_tc_conduit callback to allow tc to offload Qdisc
policies for the specified DSA user port by configuring the hw switch cpu
port (mac chip).
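For illustration, a minimal sketch of how a conduit driver could wire up
the new callback (the foo_* names are hypothetical; the actual airoha
implementation is in patch 4):

static int foo_setup_tc_conduit(struct net_device *dev, int user_port,
                                enum tc_setup_type type, void *type_data)
{
        switch (type) {
        case TC_SETUP_QDISC_ETS:
                /* program the hw scheduler of the QoS channel backing
                 * the DSA user port @user_port
                 */
                return foo_setup_ets(dev, user_port, type_data);
        default:
                return -EOPNOTSUPP;
        }
}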
Lorenzo Bianconi (5):
net: airoha: Enable Tx drop capability for each Tx DMA ring
net: airoha: Introduce ndo_select_queue callback
net: dsa: Introduce ndo_setup_tc_conduit callback
net: airoha: Add sched ETS offload support
net: airoha: Add sched TBF offload support
drivers/net/ethernet/mediatek/airoha_eth.c | 372 ++++++++++++++++++++-
include/linux/netdevice.h | 12 +
net/dsa/user.c | 47 ++-
3 files changed, 422 insertions(+), 9 deletions(-)
--
2.47.1
* [RFC net-next 1/5] net: airoha: Enable Tx drop capability for each Tx DMA ring
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
This is a preliminary patch to enable hw Qdisc offloading.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/mediatek/airoha_eth.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c
index 6c683a12d5aa..dd8d65a7e255 100644
--- a/drivers/net/ethernet/mediatek/airoha_eth.c
+++ b/drivers/net/ethernet/mediatek/airoha_eth.c
@@ -1789,6 +1789,10 @@ static int airoha_qdma_init_tx_queue(struct airoha_queue *q,
WRITE_ONCE(q->desc[i].ctrl, cpu_to_le32(val));
}
+ /* xmit ring drop default setting */
+ airoha_qdma_set(qdma, REG_TX_RING_BLOCKING(qid),
+ TX_RING_IRQ_BLOCKING_TX_DROP_EN_MASK);
+
airoha_qdma_wr(qdma, REG_TX_RING_BASE(qid), dma_addr);
airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
FIELD_PREP(TX_RING_CPU_IDX_MASK, q->head));
--
2.47.1
* [RFC net-next 2/5] net: airoha: Introduce ndo_select_queue callback
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
The Airoha EN7581 SoC supports 32 Tx DMA rings used to feed packets to 32
QoS channels. Each channel supports 8 QoS queues where the user can
apply QoS scheduling policies. In a similar way, the user can configure
hw rate shaping for each QoS channel.
Introduce the ndo_select_queue callback to select the Tx queue based on
the QoS channel and QoS queue. In particular, for DSA devices select the
QoS channel according to the DSA user port index; rely on the port id
otherwise. Select the QoS queue based on the skb priority.
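For illustration, a standalone userspace sketch of the resulting mapping
(the port index and skb priority values are hypothetical):

#include <stdio.h>

#define AIROHA_NUM_QOS_QUEUES 8

int main(void)
{
        unsigned int port = 2, prio = 3; /* hypothetical user port and skb priority */
        unsigned int q = port * AIROHA_NUM_QOS_QUEUES + /* QoS channel */
                         prio % AIROHA_NUM_QOS_QUEUES;  /* QoS queue */

        /* the xmit path recovers the channel/queue pair from the Tx queue */
        printf("txq=%u -> channel=%u queue=%u\n",
               q, q / AIROHA_NUM_QOS_QUEUES, q % AIROHA_NUM_QOS_QUEUES);
        return 0; /* prints: txq=19 -> channel=2 queue=3 */
}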
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/mediatek/airoha_eth.c | 28 ++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c
index dd8d65a7e255..8b927bf310ed 100644
--- a/drivers/net/ethernet/mediatek/airoha_eth.c
+++ b/drivers/net/ethernet/mediatek/airoha_eth.c
@@ -23,6 +23,7 @@
#define AIROHA_MAX_NUM_XSI_RSTS 5
#define AIROHA_MAX_MTU 2000
#define AIROHA_MAX_PACKET_SIZE 2048
+#define AIROHA_NUM_QOS_QUEUES 8
#define AIROHA_NUM_TX_RING 32
#define AIROHA_NUM_RX_RING 32
#define AIROHA_FE_MC_MAX_VLAN_TABLE 64
@@ -2417,21 +2418,43 @@ static void airoha_dev_get_stats64(struct net_device *dev,
} while (u64_stats_fetch_retry(&port->stats.syncp, start));
}
+static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
+ struct net_device *sb_dev)
+{
+ struct airoha_gdm_port *port = netdev_priv(dev);
+ u16 queue;
+
+ /* For dsa device select QoS channel according to the dsa user port
+ * index, rely on port id otherwise. Select QoS queue based on the
+ * skb priority.
+ */
+ queue = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
+ queue = queue * AIROHA_NUM_QOS_QUEUES + /* QoS channel */
+ skb->priority % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
+
+ return queue < dev->num_tx_queues ? queue : 0;
+}
+
static netdev_tx_t airoha_dev_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct skb_shared_info *sinfo = skb_shinfo(skb);
struct airoha_gdm_port *port = netdev_priv(dev);
- u32 msg0 = 0, msg1, len = skb_headlen(skb);
- int i, qid = skb_get_queue_mapping(skb);
+ u32 msg0, msg1, len = skb_headlen(skb);
struct airoha_qdma *qdma = port->qdma;
u32 nr_frags = 1 + sinfo->nr_frags;
struct netdev_queue *txq;
struct airoha_queue *q;
void *data = skb->data;
+ int i, qid;
u16 index;
u8 fport;
+ qid = skb_get_queue_mapping(skb) % ARRAY_SIZE(qdma->q_tx);
+ msg0 = FIELD_PREP(QDMA_ETH_TXMSG_CHAN_MASK,
+ qid / AIROHA_NUM_QOS_QUEUES) |
+ FIELD_PREP(QDMA_ETH_TXMSG_QUEUE_MASK,
+ qid % AIROHA_NUM_QOS_QUEUES);
if (skb->ip_summed == CHECKSUM_PARTIAL)
msg0 |= FIELD_PREP(QDMA_ETH_TXMSG_TCO_MASK, 1) |
FIELD_PREP(QDMA_ETH_TXMSG_UCO_MASK, 1) |
@@ -2605,6 +2628,7 @@ static const struct net_device_ops airoha_netdev_ops = {
.ndo_init = airoha_dev_init,
.ndo_open = airoha_dev_open,
.ndo_stop = airoha_dev_stop,
+ .ndo_select_queue = airoha_dev_select_queue,
.ndo_start_xmit = airoha_dev_xmit,
.ndo_get_stats64 = airoha_dev_get_stats64,
.ndo_set_mac_address = airoha_dev_set_macaddr,
--
2.47.1
* [RFC net-next 3/5] net: dsa: Introduce ndo_setup_tc_conduit callback
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
Some DSA hw switches do not support Qdisc offloading, or the mac chip has
more fine-grained QoS capabilities than the hw switch (e.g. the Airoha
EN7581 mac chip has more hw QoS and buffering capabilities than the mt7530
switch). On the other hand, configuring the switch cpu port via tc does
not address all possible use-cases (e.g. shaping just TCP traffic with
dst port 80 transmitted on lan0).
Introduce the ndo_setup_tc_conduit callback to allow tc to offload Qdisc
policies for the specified user port by configuring the hw switch cpu
port (mac chip).
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
include/linux/netdevice.h | 12 ++++++++++
net/dsa/user.c | 47 ++++++++++++++++++++++++++++++++++-----
2 files changed, 53 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d917949bba03..78b63dafad16 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1169,6 +1169,14 @@ struct netdev_net_notifier {
* This is always called from the stack with the rtnl lock held and netif
* tx queues stopped. This allows the netdevice to perform queue
* management safely.
+ * int (*ndo_setup_tc_conduit)(struct net_device *dev, int user_port,
+ * enum tc_setup_type type, void *type_data);
+ * Called to setup any 'tc' scheduler, classifier or action on the user
+ * port @user_port via the conduit port @dev. This is useful if the hw
+ * supports improved offloading capability through the conduit port.
+ * This is always called from the stack with the rtnl lock held and netif
+ * tx queues stopped. This allows the netdevice to perform queue
+ * management safely.
*
* Fiber Channel over Ethernet (FCoE) offload functions.
* int (*ndo_fcoe_enable)(struct net_device *dev);
@@ -1475,6 +1483,10 @@ struct net_device_ops {
int (*ndo_setup_tc)(struct net_device *dev,
enum tc_setup_type type,
void *type_data);
+ int (*ndo_setup_tc_conduit)(struct net_device *dev,
+ int user_port,
+ enum tc_setup_type type,
+ void *type_data);
#if IS_ENABLED(CONFIG_FCOE)
int (*ndo_fcoe_enable)(struct net_device *dev);
int (*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/dsa/user.c b/net/dsa/user.c
index c736c019e2af..2d5ed32a1f7c 100644
--- a/net/dsa/user.c
+++ b/net/dsa/user.c
@@ -1725,6 +1725,46 @@ static int dsa_user_setup_ft_block(struct dsa_switch *ds, int port,
return conduit->netdev_ops->ndo_setup_tc(conduit, TC_SETUP_FT, type_data);
}
+static int dsa_user_setup_qdisc(struct net_device *dev,
+ enum tc_setup_type type, void *type_data)
+{
+ struct dsa_port *dp = dsa_user_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+ struct net_device *conduit;
+ int ret = -EOPNOTSUPP;
+
+ conduit = dsa_port_to_conduit(dsa_to_port(ds, dp->index));
+ if (conduit->netdev_ops->ndo_setup_tc_conduit) {
+ ret = conduit->netdev_ops->ndo_setup_tc_conduit(conduit,
+ dp->index,
+ type,
+ type_data);
+ if (ret && ret != -EOPNOTSUPP) {
+ netdev_err(dev,
+ "qdisc offload failed on conduit %s: %d\n",
+ conduit->name, ret);
+ return ret;
+ }
+ }
+
+ /* Try to offload the requested qdisc via user port. This is necessary
+ * if the traffic is forwarded by the hw dsa switch.
+ */
+ if (ds->ops->port_setup_tc) {
+ int err;
+
+ err = ds->ops->port_setup_tc(ds, dp->index, type, type_data);
+ if (err != -EOPNOTSUPP) {
+ if (err)
+ netdev_err(dev, "qdisc offload failed: %d\n",
+ err);
+ ret = err;
+ }
+ }
+
+ return ret;
+}
+
static int dsa_user_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
{
@@ -1737,13 +1777,8 @@ static int dsa_user_setup_tc(struct net_device *dev, enum tc_setup_type type,
case TC_SETUP_FT:
return dsa_user_setup_ft_block(ds, dp->index, type_data);
default:
- break;
+ return dsa_user_setup_qdisc(dev, type, type_data);
}
-
- if (!ds->ops->port_setup_tc)
- return -EOPNOTSUPP;
-
- return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
}
static int dsa_user_get_rxnfc(struct net_device *dev,
--
2.47.1
* [RFC net-next 4/5] net: airoha: Add sched ETS offload support
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
Introduce support for the ETS Qdisc offload available in the Airoha EN7581
ethernet controller. Add the capability to configure a hw ETS Qdisc for
the specified DSA user port via the QDMA block available in the mac chip
(the QDMA block is connected to the DSA switch cpu port).
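For illustration, a standalone userspace sketch of the band-to-mode
selection implemented below, fed with a hypothetical "ets bands 8 strict 2
quanta 1514 1514 1514 3528 1514 1514" configuration:

#include <stdio.h>

#define AIROHA_NUM_QOS_QUEUES 8

enum tx_sched_mode {
        TC_SCH_WRR8, TC_SCH_SP, TC_SCH_WRR7, TC_SCH_WRR6,
        TC_SCH_WRR5, TC_SCH_WRR4, TC_SCH_WRR3, TC_SCH_WRR2,
};

int main(void)
{
        /* the two strict bands carry no quantum */
        unsigned int quanta[AIROHA_NUM_QOS_QUEUES] = {
                0, 0, 1514, 1514, 1514, 3528, 1514, 1514
        };
        enum tx_sched_mode mode = TC_SCH_SP;
        int i, nstrict = 0;

        for (i = 0; i < AIROHA_NUM_QOS_QUEUES; i++)
                if (!quanta[i])
                        nstrict++;

        if (!nstrict)
                mode = TC_SCH_WRR8;
        else if (nstrict < AIROHA_NUM_QOS_QUEUES - 1)
                mode = nstrict + 1;

        /* prints: nstrict=2 -> mode=3 (TC_SCH_WRR6) */
        printf("nstrict=%d -> mode=%d (TC_SCH_WRR6)\n", nstrict, mode);
        return 0;
}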
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/mediatek/airoha_eth.c | 155 ++++++++++++++++++++-
1 file changed, 154 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c
index 8b927bf310ed..23aad8670a17 100644
--- a/drivers/net/ethernet/mediatek/airoha_eth.c
+++ b/drivers/net/ethernet/mediatek/airoha_eth.c
@@ -15,6 +15,7 @@
#include <linux/u64_stats_sync.h>
#include <net/dsa.h>
#include <net/page_pool/helpers.h>
+#include <net/pkt_cls.h>
#include <uapi/linux/ppp_defs.h>
#define AIROHA_MAX_NUM_GDM_PORTS 1
@@ -542,9 +543,24 @@
#define INGRESS_SLOW_TICK_RATIO_MASK GENMASK(29, 16)
#define INGRESS_FAST_TICK_MASK GENMASK(15, 0)
+#define REG_QUEUE_CLOSE_CFG(_n) (0x00a0 + ((_n) & 0xfc))
+#define TXQ_DISABLE_CHAN_QUEUE_MASK(_n, _m) BIT((_m) + (((_n) & 0x3) << 3))
+
#define REG_TXQ_DIS_CFG_BASE(_n) ((_n) ? 0x20a0 : 0x00a0)
#define REG_TXQ_DIS_CFG(_n, _m) (REG_TXQ_DIS_CFG_BASE((_n)) + (_m) << 2)
+#define REG_CNTR_CFG(_n) (0x0400 + ((_n) << 3))
+#define CNTR_EN_MASK BIT(31)
+#define CNTR_ALL_CHAN_EN_MASK BIT(30)
+#define CNTR_ALL_QUEUE_EN_MASK BIT(29)
+#define CNTR_ALL_DSCP_RING_EN_MASK BIT(28)
+#define CNTR_SRC_MASK GENMASK(27, 24)
+#define CNTR_DSCP_RING_MASK GENMASK(20, 16)
+#define CNTR_CHAN_MASK GENMASK(7, 3)
+#define CNTR_QUEUE_MASK GENMASK(2, 0)
+
+#define REG_CNTR_VAL(_n) (0x0404 + ((_n) << 3))
+
#define REG_LMGR_INIT_CFG 0x1000
#define LMGR_INIT_START BIT(31)
#define LMGR_SRAM_MODE_MASK BIT(30)
@@ -570,9 +586,19 @@
#define TWRR_WEIGHT_SCALE_MASK BIT(31)
#define TWRR_WEIGHT_BASE_MASK BIT(3)
+#define REG_TXWRR_WEIGHT_CFG 0x1024
+#define TWRR_RW_CMD_MASK BIT(31)
+#define TWRR_RW_CMD_DONE BIT(30)
+#define TWRR_CHAN_IDX_MASK GENMASK(23, 19)
+#define TWRR_QUEUE_IDX_MASK GENMASK(18, 16)
+#define TWRR_VALUE_MASK GENMASK(15, 0)
+
#define REG_PSE_BUF_USAGE_CFG 0x1028
#define PSE_BUF_ESTIMATE_EN_MASK BIT(29)
+#define REG_CHAN_QOS_MODE(_n) (0x1040 + ((_n) << 2))
+#define CHAN_QOS_MODE_MASK(_n) GENMASK(2 + ((_n) << 2), (_n) << 2)
+
#define REG_GLB_TRTCM_CFG 0x1080
#define GLB_TRTCM_EN_MASK BIT(31)
#define GLB_TRTCM_MODE_MASK BIT(30)
@@ -721,6 +747,17 @@ enum {
FE_PSE_PORT_DROP = 0xf,
};
+enum tx_sched_mode {
+ TC_SCH_WRR8,
+ TC_SCH_SP,
+ TC_SCH_WRR7,
+ TC_SCH_WRR6,
+ TC_SCH_WRR5,
+ TC_SCH_WRR4,
+ TC_SCH_WRR3,
+ TC_SCH_WRR2,
+};
+
struct airoha_queue_entry {
union {
void *buf;
@@ -2624,6 +2661,119 @@ airoha_ethtool_get_rmon_stats(struct net_device *dev,
} while (u64_stats_fetch_retry(&port->stats.syncp, start));
}
+static int airoha_qdma_set_chan_tx_sched(struct airoha_gdm_port *port,
+ int channel, enum tx_sched_mode mode,
+ const u16 *weights, u8 n_weights)
+{
+ int i;
+
+ for (i = 0; i < AIROHA_NUM_TX_RING; i++)
+ airoha_qdma_clear(port->qdma, REG_QUEUE_CLOSE_CFG(channel),
+ TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i));
+
+ for (i = 0; i < n_weights; i++) {
+ u32 status;
+ int err;
+
+ airoha_qdma_wr(port->qdma, REG_TXWRR_WEIGHT_CFG,
+ TWRR_RW_CMD_MASK |
+ FIELD_PREP(TWRR_CHAN_IDX_MASK, channel) |
+ FIELD_PREP(TWRR_QUEUE_IDX_MASK, i) |
+ FIELD_PREP(TWRR_VALUE_MASK, weights[i]));
+ err = read_poll_timeout(airoha_qdma_rr, status,
+ status & TWRR_RW_CMD_DONE,
+ USEC_PER_MSEC, 10 * USEC_PER_MSEC,
+ true, port->qdma,
+ REG_TXWRR_WEIGHT_CFG);
+ if (err)
+ return err;
+ }
+
+ airoha_qdma_rmw(port->qdma, REG_CHAN_QOS_MODE(channel >> 3),
+ CHAN_QOS_MODE_MASK(channel),
+ mode << __ffs(CHAN_QOS_MODE_MASK(channel)));
+
+ return 0;
+}
+
+static int airoha_qdma_set_tx_prio_sched(struct airoha_gdm_port *port,
+ int channel)
+{
+ static const u16 w[AIROHA_NUM_QOS_QUEUES] = {};
+
+ return airoha_qdma_set_chan_tx_sched(port, channel, TC_SCH_SP, w,
+ ARRAY_SIZE(w));
+}
+
+static int airoha_qdma_set_tx_ets_sched(struct airoha_gdm_port *port,
+ int channel,
+ struct tc_ets_qopt_offload *opt)
+{
+ struct tc_ets_qopt_offload_replace_params *p = &opt->replace_params;
+ enum tx_sched_mode mode = TC_SCH_SP;
+ u16 w[AIROHA_NUM_QOS_QUEUES] = {};
+ int i, nstrict = 0;
+
+ if (p->bands != AIROHA_NUM_QOS_QUEUES)
+ return -EINVAL;
+
+ for (i = 0; i < p->bands; i++) {
+ if (!p->quanta[i])
+ nstrict++;
+ }
+
+ /* this configuration is not supported by the hw */
+ if (nstrict == AIROHA_NUM_QOS_QUEUES - 1)
+ return -EINVAL;
+
+ for (i = 0; i < p->bands - nstrict; i++)
+ w[i] = p->weights[nstrict + i];
+
+ if (!nstrict)
+ mode = TC_SCH_WRR8;
+ else if (nstrict < AIROHA_NUM_QOS_QUEUES - 1)
+ mode = nstrict + 1;
+
+ return airoha_qdma_set_chan_tx_sched(port, channel, mode, w,
+ ARRAY_SIZE(w));
+}
+
+static int airoha_tc_setup_qdisc_ets(struct airoha_gdm_port *port, int channel,
+ struct tc_ets_qopt_offload *opt)
+{
+ switch (opt->command) {
+ case TC_ETS_REPLACE:
+ return airoha_qdma_set_tx_ets_sched(port, channel, opt);
+ case TC_ETS_DESTROY:
+ /* PRIO is default qdisc scheduler */
+ return airoha_qdma_set_tx_prio_sched(port, channel);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int airoha_dev_tc_setup_conduit(struct net_device *dev, int channel,
+ enum tc_setup_type type,
+ void *type_data)
+{
+ struct airoha_gdm_port *port = netdev_priv(dev);
+
+ switch (type) {
+ case TC_SETUP_QDISC_ETS:
+ return airoha_tc_setup_qdisc_ets(port, channel, type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static int airoha_dev_tc_setup(struct net_device *dev, enum tc_setup_type type,
+ void *type_data)
+{
+ struct airoha_gdm_port *port = netdev_priv(dev);
+
+ return airoha_dev_tc_setup_conduit(dev, port->id, type, type_data);
+}
+
static const struct net_device_ops airoha_netdev_ops = {
.ndo_init = airoha_dev_init,
.ndo_open = airoha_dev_open,
@@ -2632,6 +2782,8 @@ static const struct net_device_ops airoha_netdev_ops = {
.ndo_start_xmit = airoha_dev_xmit,
.ndo_get_stats64 = airoha_dev_get_stats64,
.ndo_set_mac_address = airoha_dev_set_macaddr,
+ .ndo_setup_tc = airoha_dev_tc_setup,
+ .ndo_setup_tc_conduit = airoha_dev_tc_setup_conduit,
};
static const struct ethtool_ops airoha_ethtool_ops = {
@@ -2681,7 +2833,8 @@ static int airoha_alloc_gdm_port(struct airoha_eth *eth, struct device_node *np)
dev->watchdog_timeo = 5 * HZ;
dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
NETIF_F_TSO6 | NETIF_F_IPV6_CSUM |
- NETIF_F_SG | NETIF_F_TSO;
+ NETIF_F_SG | NETIF_F_TSO |
+ NETIF_F_HW_TC;
dev->features |= dev->hw_features;
dev->dev.of_node = np;
dev->irq = qdma->irq;
--
2.47.1
* [RFC net-next 5/5] net: airoha: Add sched TBF offload support
From: Lorenzo Bianconi @ 2024-12-11 15:31 UTC
To: netdev
Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
Introduce support for the TBF Qdisc offload available in the Airoha EN7581
ethernet controller. Add the capability to configure a hw TBF Qdisc for
the specified DSA user port via the QDMA block available in the mac chip
(the QDMA block is connected to the DSA switch cpu port).
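For illustration, a standalone userspace sketch of the TBF parameter
conversion performed below (the rate and burst values are hypothetical):

#include <stdio.h>
#include <stdint.h>

#define MIN_TOKEN_SIZE          4096
#define MAX_TOKEN_SIZE_OFFSET   17

/* index of the most significant set bit, as the kernel's __fls() */
static unsigned int fls_msb(uint32_t v)
{
        unsigned int r = 0;

        while (v >>= 1)
                r++;
        return r;
}

int main(void)
{
        uint64_t rate_bytes_ps = 125000000; /* 1 Gbit/s, hypothetical */
        uint32_t max_size = 10000;          /* TBF burst in bytes, hypothetical */
        uint32_t rate, bucket, shift;

        rate = (rate_bytes_ps / 1000) << 3; /* kbit/s, as in TC_TBF_REPLACE */
        bucket = max_size < MIN_TOKEN_SIZE ? MIN_TOKEN_SIZE : max_size;
        shift = fls_msb(bucket);
        if (shift > MAX_TOKEN_SIZE_OFFSET)
                shift = MAX_TOKEN_SIZE_OFFSET;

        /* prints: rate=1000000 kbit/s bucket_shift=13 (~8192 bytes) */
        printf("rate=%u kbit/s bucket_shift=%u (~%u bytes)\n",
               rate, shift, 1u << shift);
        return 0;
}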
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
drivers/net/ethernet/mediatek/airoha_eth.c | 185 +++++++++++++++++++++
1 file changed, 185 insertions(+)
diff --git a/drivers/net/ethernet/mediatek/airoha_eth.c b/drivers/net/ethernet/mediatek/airoha_eth.c
index 23aad8670a17..a79c92a816a2 100644
--- a/drivers/net/ethernet/mediatek/airoha_eth.c
+++ b/drivers/net/ethernet/mediatek/airoha_eth.c
@@ -42,6 +42,9 @@
#define PSE_RSV_PAGES 128
#define PSE_QUEUE_RSV_PAGES 64
+#define QDMA_METER_IDX(_n) ((_n) & 0xff)
+#define QDMA_METER_GROUP(_n) (((_n) >> 8) & 0x3)
+
/* FE */
#define PSE_BASE 0x0100
#define CSR_IFC_BASE 0x0200
@@ -582,6 +585,17 @@
#define EGRESS_SLOW_TICK_RATIO_MASK GENMASK(29, 16)
#define EGRESS_FAST_TICK_MASK GENMASK(15, 0)
+#define TRTCM_PARAM_RW_MASK BIT(31)
+#define TRTCM_PARAM_RW_DONE_MASK BIT(30)
+#define TRTCM_PARAM_TYPE_MASK GENMASK(29, 28)
+#define TRTCM_METER_GROUP_MASK GENMASK(27, 26)
+#define TRTCM_PARAM_INDEX_MASK GENMASK(23, 17)
+#define TRTCM_PARAM_RATE_TYPE_MASK BIT(16)
+
+#define REG_TRTCM_CFG_PARAM(_n) ((_n) + 0x4)
+#define REG_TRTCM_DATA_LOW(_n) ((_n) + 0x8)
+#define REG_TRTCM_DATA_HIGH(_n) ((_n) + 0xc)
+
#define REG_TXWRR_MODE_CFG 0x1020
#define TWRR_WEIGHT_SCALE_MASK BIT(31)
#define TWRR_WEIGHT_BASE_MASK BIT(3)
@@ -758,6 +772,29 @@ enum tx_sched_mode {
TC_SCH_WRR2,
};
+enum trtcm_param_type {
+ TRTCM_MISC_MODE, /* meter_en, pps_mode, tick_sel */
+ TRTCM_TOKEN_RATE_MODE,
+ TRTCM_BUCKETSIZE_SHIFT_MODE,
+ TRTCM_BUCKET_COUNTER_MODE,
+};
+
+enum trtcm_mode_type {
+ TRTCM_COMMIT_MODE,
+ TRTCM_PEAK_MODE,
+};
+
+enum trtcm_param {
+ TRTCM_TICK_SEL = BIT(0),
+ TRTCM_PKT_MODE = BIT(1),
+ TRTCM_METER_MODE = BIT(2),
+};
+
+#define MIN_TOKEN_SIZE 4096
+#define MAX_TOKEN_SIZE_OFFSET 17
+#define TRTCM_TOKEN_RATE_MASK GENMASK(23, 6)
+#define TRTCM_TOKEN_RATE_FRACTION_MASK GENMASK(5, 0)
+
struct airoha_queue_entry {
union {
void *buf;
@@ -2752,6 +2789,152 @@ static int airoha_tc_setup_qdisc_ets(struct airoha_gdm_port *port, int channel,
}
}
+static int airoha_qdma_get_trtcm_param(struct airoha_qdma *qdma, int channel,
+ u32 addr, enum trtcm_param_type param,
+ enum trtcm_mode_type mode,
+ u32 *val_low, u32 *val_high)
+{
+ u32 idx = QDMA_METER_IDX(channel), group = QDMA_METER_GROUP(channel);
+ u32 val, config = FIELD_PREP(TRTCM_PARAM_TYPE_MASK, param) |
+ FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
+ FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
+ FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
+
+ airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
+ if (read_poll_timeout(airoha_qdma_rr, val,
+ val & TRTCM_PARAM_RW_DONE_MASK,
+ USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
+ qdma, REG_TRTCM_CFG_PARAM(addr)))
+ return -ETIMEDOUT;
+
+ *val_low = airoha_qdma_rr(qdma, REG_TRTCM_DATA_LOW(addr));
+ if (val_high)
+ *val_high = airoha_qdma_rr(qdma, REG_TRTCM_DATA_HIGH(addr));
+
+ return 0;
+}
+
+static int airoha_qdma_set_trtcm_param(struct airoha_qdma *qdma, int channel,
+ u32 addr, enum trtcm_param_type param,
+ enum trtcm_mode_type mode, u32 val)
+{
+ u32 idx = QDMA_METER_IDX(channel), group = QDMA_METER_GROUP(channel);
+ u32 config = TRTCM_PARAM_RW_MASK |
+ FIELD_PREP(TRTCM_PARAM_TYPE_MASK, param) |
+ FIELD_PREP(TRTCM_METER_GROUP_MASK, group) |
+ FIELD_PREP(TRTCM_PARAM_INDEX_MASK, idx) |
+ FIELD_PREP(TRTCM_PARAM_RATE_TYPE_MASK, mode);
+
+ airoha_qdma_wr(qdma, REG_TRTCM_DATA_LOW(addr), val);
+ airoha_qdma_wr(qdma, REG_TRTCM_CFG_PARAM(addr), config);
+
+ return read_poll_timeout(airoha_qdma_rr, val,
+ val & TRTCM_PARAM_RW_DONE_MASK,
+ USEC_PER_MSEC, 10 * USEC_PER_MSEC, true,
+ qdma, REG_TRTCM_CFG_PARAM(addr));
+}
+
+static int airoha_qdma_set_trtcm_config(struct airoha_qdma *qdma, int channel,
+ u32 addr, enum trtcm_mode_type mode,
+ bool enable, u32 enable_mask)
+{
+ u32 val;
+
+ if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
+ mode, &val, NULL))
+ return -EINVAL;
+
+ val = enable ? val | enable_mask : val & ~enable_mask;
+
+ return airoha_qdma_set_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
+ mode, val);
+}
+
+static int airoha_qdma_set_trtcm_token_bucket(struct airoha_qdma *qdma,
+ int channel, u32 addr,
+ enum trtcm_mode_type mode,
+ u32 rate_val, u32 bucket_size)
+{
+ u32 val, config, tick, unit, rate, rate_frac;
+ int err;
+
+ if (airoha_qdma_get_trtcm_param(qdma, channel, addr, TRTCM_MISC_MODE,
+ mode, &config, NULL))
+ return -EINVAL;
+
+ val = airoha_qdma_rr(qdma, addr);
+ tick = FIELD_GET(INGRESS_FAST_TICK_MASK, val);
+ if (config & TRTCM_TICK_SEL)
+ tick *= FIELD_GET(INGRESS_SLOW_TICK_RATIO_MASK, val);
+ if (!tick)
+ return -EINVAL;
+
+ unit = (config & TRTCM_PKT_MODE) ? 1000000 / tick : 8000 / tick;
+ if (!unit)
+ return -EINVAL;
+
+ rate = rate_val / unit;
+ rate_frac = rate_val % unit;
+ rate_frac = FIELD_PREP(TRTCM_TOKEN_RATE_MASK, rate_frac) / unit;
+ rate = FIELD_PREP(TRTCM_TOKEN_RATE_MASK, rate) |
+ FIELD_PREP(TRTCM_TOKEN_RATE_FRACTION_MASK, rate_frac);
+
+ err = airoha_qdma_set_trtcm_param(qdma, channel, addr,
+ TRTCM_TOKEN_RATE_MODE, mode, rate);
+ if (err)
+ return err;
+
+ val = max_t(u32, bucket_size, MIN_TOKEN_SIZE);
+ val = min_t(u32, __fls(val), MAX_TOKEN_SIZE_OFFSET);
+
+ return airoha_qdma_set_trtcm_param(qdma, channel, addr,
+ TRTCM_BUCKETSIZE_SHIFT_MODE,
+ mode, val);
+}
+
+static int airoha_qdma_set_tx_tbf_sched(struct airoha_gdm_port *port,
+ int channel, u32 rate, u32 bucket_size)
+{
+ int i, err;
+
+ for (i = 0; i <= TRTCM_PEAK_MODE; i++) {
+ err = airoha_qdma_set_trtcm_config(port->qdma, channel,
+ REG_EGRESS_TRTCM_CFG, i,
+ !!rate, TRTCM_METER_MODE);
+ if (err)
+ return err;
+
+ err = airoha_qdma_set_trtcm_token_bucket(port->qdma, channel,
+ REG_EGRESS_TRTCM_CFG,
+ i, rate, bucket_size);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static int airoha_tc_setup_qdisc_tbf(struct airoha_gdm_port *port, int channel,
+ struct tc_tbf_qopt_offload *qopt)
+{
+ struct tc_tbf_qopt_offload_replace_params *p = &qopt->replace_params;
+ u32 rate = 0;
+
+ if (qopt->parent != TC_H_ROOT)
+ return -EINVAL;
+
+ switch (qopt->command) {
+ case TC_TBF_REPLACE:
+ rate = div_u64(p->rate.rate_bytes_ps, 1000) << 3; /* kbps */
+ fallthrough;
+ case TC_TBF_DESTROY:
+ return airoha_qdma_set_tx_tbf_sched(port, channel, rate,
+ p->max_size);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
static int airoha_dev_tc_setup_conduit(struct net_device *dev, int channel,
enum tc_setup_type type,
void *type_data)
@@ -2761,6 +2944,8 @@ static int airoha_dev_tc_setup_conduit(struct net_device *dev, int channel,
switch (type) {
case TC_SETUP_QDISC_ETS:
return airoha_tc_setup_qdisc_ets(port, channel, type_data);
+ case TC_SETUP_QDISC_TBF:
+ return airoha_tc_setup_qdisc_tbf(port, channel, type_data);
default:
return -EOPNOTSUPP;
}
--
2.47.1
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Vladimir Oltean @ 2024-12-11 15:41 UTC
To: Lorenzo Bianconi
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
Hi Lorenzo,
On Wed, Dec 11, 2024 at 04:31:48PM +0100, Lorenzo Bianconi wrote:
> Introduce support for ETS and TBF qdisc offload available in the Airoha
> EN7581 ethernet controller.
> Some DSA hw switches do not support Qdisc offloading or the mac chip
> has more fine grained QoS capabilities with respect to the hw switch
> (e.g. Airoha EN7581 mac chip has more hw QoS and buffering capabilities
> with respect to the mt7530 switch).
> Introduce ndo_setup_tc_conduit callback in order to allow tc to offload
> Qdisc policies for the specified DSA user port configuring the hw switch
> cpu port (mac chip).
Can you please make a detailed diagram explaining how the conduit is
involved in the packet data path for QoS? Offloaded tc on a DSA user
port is supposed to affect autonomously forwarded traffic too (like the
Linux bridge).
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Lorenzo Bianconi @ 2024-12-12 9:19 UTC
To: Vladimir Oltean
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
> Hi Lorenzo,
Hi Vladimir,
>
> On Wed, Dec 11, 2024 at 04:31:48PM +0100, Lorenzo Bianconi wrote:
> > Introduce support for ETS and TBF qdisc offload available in the Airoha
> > EN7581 ethernet controller.
> > Some DSA hw switches do not support Qdisc offloading or the mac chip
> > has more fine grained QoS capabilities with respect to the hw switch
> > (e.g. Airoha EN7581 mac chip has more hw QoS and buffering capabilities
> > with respect to the mt7530 switch).
> > Introduce ndo_setup_tc_conduit callback in order to allow tc to offload
> > Qdisc policies for the specified DSA user port configuring the hw switch
> > cpu port (mac chip).
>
> Can you please make a detailed diagram explaining how is the conduit
> involved in the packet data path for QoS? Offloaded tc on a DSA user
> port is supposed to affect autonomously forwarded traffic too (like the
> Linux bridge).
I guess a typical use case would be the one below, where the traffic from
the WAN port is forwarded to a DSA LAN port (e.g. lan0) via netfilter
flowtable offload.
┌─────────────────────────────────┐
│ BR0 │
└───┬────────┬────────┬────────┬──┘
┌───────────────┼────────┼────────┼────────┼───┐
│DSA │ │ │ │ │
│ │ │ │ │ │
│ ┌───┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ │ ┌───┐
│ │CPU│ │LAN0│ │LAN1│ │LAN2│ │LAN3│ │ │WAN│
│ └───┘ └────┘ └────┘ └────┘ └────┘ │ └───┘
└──────────────────────────────────────────────┘
In this case the mac chip forwards (in hw) the WAN traffic to the DSA switch
via the CPU port. In [0] we have the EN7581 mac chip architecture, where we
can assume GDM1 is the CPU port and GDM2 is the WAN port.
The goal of this RFC series is to offload a Qdisc rule (e.g. ETS) on a given
LAN port using the mac chip QoS capabilities instead of creating the QoS
discipline directly in the DSA hw switch:
$tc qdisc replace dev lan0 root handle 1: ets bands 8 strict 2 quanta 1514 1514 1514 3528 1514 1514
As described above, the reason for this approach is to rely on the more
fine-grained QoS capabilities available on the mac chip compared to the
hw switch, or because the DSA switch does not support QoS offloading.
Regards,
Lorenzo
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
* Re: [RFC net-next 4/5] net: airoha: Add sched ETS offload support
From: Davide Caratti @ 2024-12-12 14:37 UTC
To: Lorenzo Bianconi
Cc: netdev, andrew, olteanv, davem, edumazet, kuba, pabeni, horms,
nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
hi Lorenzo,
On Wed, Dec 11, 2024 at 4:32 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce support for ETS qdisc offload available in the Airoha EN7581
> ethernet controller. Add the capability to configure hw ETS Qdisc for
> the specified DSA user port via the QDMA block available in the mac chip
> (QDMA block is connected to the DSA switch cpu port).
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> drivers/net/ethernet/mediatek/airoha_eth.c | 155 ++++++++++++++++++++-
> 1 file changed, 154 insertions(+), 1 deletion(-)
>
[...]
> +static int airoha_qdma_set_tx_ets_sched(struct airoha_gdm_port *port,
> + int channel,
> + struct tc_ets_qopt_offload *opt)
> +{
> + struct tc_ets_qopt_offload_replace_params *p = &opt->replace_params;
> + enum tx_sched_mode mode = TC_SCH_SP;
> + u16 w[AIROHA_NUM_QOS_QUEUES] = {};
> + int i, nstrict = 0;
> +
> + if (p->bands != AIROHA_NUM_QOS_QUEUES)
> + return -EINVAL;
maybe this condition can be relaxed to '<' if priomap is parsed? (see below)
> +
> + for (i = 0; i < p->bands; i++) {
> + if (!p->quanta[i])
> + nstrict++;
> + }
> +
> + /* this configuration is not supported by the hw */
> + if (nstrict == AIROHA_NUM_QOS_QUEUES - 1)
> + return -EINVAL;
> +
> + for (i = 0; i < p->bands - nstrict; i++)
> + w[i] = p->weights[nstrict + i];
> +
> + if (!nstrict)
> + mode = TC_SCH_WRR8;
> + else if (nstrict < AIROHA_NUM_QOS_QUEUES - 1)
> + mode = nstrict + 1;
> +
> + return airoha_qdma_set_chan_tx_sched(port, channel, mode, w,
> + ARRAY_SIZE(w));
it seems that SP queues have a fixed, non-programmable priority in
hardware (e.g., queue 7 is served prior to queue 6). If this is the
case, you probably have to ensure that 'priomap' correctly maps
skb->priority to one of the SP queues, as done in [1].
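For illustration, a sketch of such a check; it assumes (this is not
confirmed by the series) that strict bands land on the highest-numbered
queues and that skb->priority selects queue (priority % 8) as in patch 2;
the function name is hypothetical:

/* Sketch only: reject priomaps that the fixed hw queue order cannot
 * honor, assuming bands 0..nstrict-1 map to queues 7, 6, ... and the
 * WRR bands map to queues 0..bands-nstrict-1.
 */
static int airoha_ets_check_priomap(struct tc_ets_qopt_offload_replace_params *p,
                                    int nstrict)
{
        int prio;

        for (prio = 0; prio <= TC_PRIO_MAX; prio++) {
                int band = p->priomap[prio];
                int queue = prio % AIROHA_NUM_QOS_QUEUES;
                int want = band < nstrict ?
                           AIROHA_NUM_QOS_QUEUES - 1 - band :
                           band - nstrict;

                if (queue != want)
                        return -EOPNOTSUPP;
        }
        return 0;
}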
thanks,
--
davide
[1] https://elixir.bootlin.com/linux/v6.12.4/source/drivers/net/ethernet/microchip/lan966x/lan966x_ets.c#L41
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Vladimir Oltean @ 2024-12-12 15:06 UTC
To: Lorenzo Bianconi
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Thu, Dec 12, 2024 at 10:19:41AM +0100, Lorenzo Bianconi wrote:
> > Hi Lorenzo,
>
> Hi Vladimir,
>
> >
> > On Wed, Dec 11, 2024 at 04:31:48PM +0100, Lorenzo Bianconi wrote:
> > > Introduce support for ETS and TBF qdisc offload available in the Airoha
> > > EN7581 ethernet controller.
> > > Some DSA hw switches do not support Qdisc offloading or the mac chip
> > > has more fine grained QoS capabilities with respect to the hw switch
> > > (e.g. Airoha EN7581 mac chip has more hw QoS and buffering capabilities
> > > with respect to the mt7530 switch).
> > > Introduce ndo_setup_tc_conduit callback in order to allow tc to offload
> > > Qdisc policies for the specified DSA user port configuring the hw switch
> > > cpu port (mac chip).
> >
> > Can you please make a detailed diagram explaining how is the conduit
> > involved in the packet data path for QoS? Offloaded tc on a DSA user
> > port is supposed to affect autonomously forwarded traffic too (like the
> > Linux bridge).
>
> I guess a typical use case would be the one below where the traffic from the
> WAN port is forwarded to a DSA LAN one (e.g. lan0) via netfilter flowtable
> offload.
>
> ┌─────────────────────────────────┐
> │ BR0 │
> └───┬────────┬────────┬────────┬──┘
> ┌───────────────┼────────┼────────┼────────┼───┐
> │DSA │ │ │ │ │
> │ │ │ │ │ │
> │ ┌───┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ │ ┌───┐
> │ │CPU│ │LAN0│ │LAN1│ │LAN2│ │LAN3│ │ │WAN│
> │ └───┘ └────┘ └────┘ └────┘ └────┘ │ └───┘
> └──────────────────────────────────────────────┘
>
> In this case the mac chip forwards (in hw) the WAN traffic to the DSA switch
> via the CPU port. In [0] we have the EN7581 mac chip architecture where we
> can assume GDM1 is the CPU port and GDM2 is the WAN port.
> The goal of this RFC series is to offload a Qdisc rule (e.g. ETS) on a given
> LAN port using the mac chip QoS capabilities instead of creating the QoS
> discipline directly in the DSA hw switch:
>
> $tc qdisc replace dev lan0 root handle 1: ets bands 8 strict 2 quanta 1514 1514 1514 3528 1514 1514
>
> As described above the reason for this approach would be to rely on the more
> fine grained QoS capabilities available on the mac chip with respect to the
> hw switch or because the DSA switch does not support QoS offloading.
>
> Regards,
> Lorenzo
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
Explain "the mac chip forwards (in hw) the WAN traffic to the DSA switch
via the CPU port". How many packets does airoha_dev_select_queue() see?
All of them, or only the first of a flow? What operations does the
offload consist of?
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Lorenzo Bianconi @ 2024-12-12 17:03 UTC
To: Vladimir Oltean
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Dec 12, Vladimir Oltean wrote:
> On Thu, Dec 12, 2024 at 10:19:41AM +0100, Lorenzo Bianconi wrote:
> > > Hi Lorenzo,
> >
> > Hi Vladimir,
> >
> > >
> > > On Wed, Dec 11, 2024 at 04:31:48PM +0100, Lorenzo Bianconi wrote:
> > > > Introduce support for ETS and TBF qdisc offload available in the Airoha
> > > > EN7581 ethernet controller.
> > > > Some DSA hw switches do not support Qdisc offloading or the mac chip
> > > > has more fine grained QoS capabilities with respect to the hw switch
> > > > (e.g. Airoha EN7581 mac chip has more hw QoS and buffering capabilities
> > > > with respect to the mt7530 switch).
> > > > Introduce ndo_setup_tc_conduit callback in order to allow tc to offload
> > > > Qdisc policies for the specified DSA user port configuring the hw switch
> > > > cpu port (mac chip).
> > >
> > > Can you please make a detailed diagram explaining how is the conduit
> > > involved in the packet data path for QoS? Offloaded tc on a DSA user
> > > port is supposed to affect autonomously forwarded traffic too (like the
> > > Linux bridge).
> >
> > I guess a typical use case would be the one below where the traffic from the
> > WAN port is forwarded to a DSA LAN one (e.g. lan0) via netfilter flowtable
> > offload.
> >
> > ┌─────────────────────────────────┐
> > │ BR0 │
> > └───┬────────┬────────┬────────┬──┘
> > ┌───────────────┼────────┼────────┼────────┼───┐
> > │DSA │ │ │ │ │
> > │ │ │ │ │ │
> > │ ┌───┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ ┌──▼─┐ │ ┌───┐
> > │ │CPU│ │LAN0│ │LAN1│ │LAN2│ │LAN3│ │ │WAN│
> > │ └───┘ └────┘ └────┘ └────┘ └────┘ │ └───┘
> > └──────────────────────────────────────────────┘
> >
> > In this case the mac chip forwards (in hw) the WAN traffic to the DSA switch
> > via the CPU port. In [0] we have the EN7581 mac chip architecture where we
> > can assume GDM1 is the CPU port and GDM2 is the WAN port.
> > The goal of this RFC series is to offload a Qdisc rule (e.g. ETS) on a given
> > LAN port using the mac chip QoS capabilities instead of creating the QoS
> > discipline directly in the DSA hw switch:
> >
> > $tc qdisc replace dev lan0 root handle 1: ets bands 8 strict 2 quanta 1514 1514 1514 3528 1514 1514
> >
> > As described above the reason for this approach would be to rely on the more
> > fine grained QoS capabilities available on the mac chip with respect to the
> > hw switch or because the DSA switch does not support QoS offloading.
> >
> > Regards,
> > Lorenzo
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
>
> Explain "the mac chip forwards (in hw) the WAN traffic to the DSA switch
> via the CPU port". How many packets does airoha_dev_select_queue() see?
> All of them, or only the first of a flow? What operations does the
> offload consist of?
I am referring to the netfilter flowtable offload, where the kernel receives
just the 3-way handshake of a TCP connection and then the traffic is fully
offloaded (the hw receives a flower rule to route the traffic between
interfaces, applying NAT mangling if requested). Re-thinking about it,
I guess it is better to post flowtable support first and then continue
the discussion about QoS offloading; what do you think?
Regards,
Lorenzo
* Re: [RFC net-next 4/5] net: airoha: Add sched ETS offload support
From: Lorenzo Bianconi @ 2024-12-12 17:04 UTC
To: Davide Caratti
Cc: netdev, andrew, olteanv, davem, edumazet, kuba, pabeni, horms,
nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Dec 12, Davide Caratti wrote:
> hi Lorenzo,
>
> On Wed, Dec 11, 2024 at 4:32 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > Introduce support for ETS qdisc offload available in the Airoha EN7581
> > ethernet controller. Add the capability to configure hw ETS Qdisc for
> > the specified DSA user port via the QDMA block available in the mac chip
> > (QDMA block is connected to the DSA switch cpu port).
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > drivers/net/ethernet/mediatek/airoha_eth.c | 155 ++++++++++++++++++++-
> > 1 file changed, 154 insertions(+), 1 deletion(-)
> >
> [...]
>
> > +static int airoha_qdma_set_tx_ets_sched(struct airoha_gdm_port *port,
> > + int channel,
> > + struct tc_ets_qopt_offload *opt)
> > +{
> > + struct tc_ets_qopt_offload_replace_params *p = &opt->replace_params;
> > + enum tx_sched_mode mode = TC_SCH_SP;
> > + u16 w[AIROHA_NUM_QOS_QUEUES] = {};
> > + int i, nstrict = 0;
> > +
> > + if (p->bands != AIROHA_NUM_QOS_QUEUES)
> > + return -EINVAL;
>
> maybe this condition can be relaxed to '<' if priomap is parsed ? (see below)
ack, I guess we can relax this condition a bit.
>
> > +
> > + for (i = 0; i < p->bands; i++) {
> > + if (!p->quanta[i])
> > + nstrict++;
> > + }
> > +
> > + /* this configuration is not supported by the hw */
> > + if (nstrict == AIROHA_NUM_QOS_QUEUES - 1)
> > + return -EINVAL;
> > +
> > + for (i = 0; i < p->bands - nstrict; i++)
> > + w[i] = p->weights[nstrict + i];
> > +
> > + if (!nstrict)
> > + mode = TC_SCH_WRR8;
> > + else if (nstrict < AIROHA_NUM_QOS_QUEUES - 1)
> > + mode = nstrict + 1;
> > +
> > + return airoha_qdma_set_chan_tx_sched(port, channel, mode, w,
> > + ARRAY_SIZE(w));
>
> it seems that SP queues have a fixed, non-programmable priority in
> hardware (e.g., queue 7 is served prior to queue 6) If this is the
> case, you probably have to ensure that 'priomap' maps correctly
> skb->priority to one of the SP queues, like done in [1].
ack, I will take a look.
Regards,
Lorenzo
>
> thanks,
> --
> davide
>
> [1] https://elixir.bootlin.com/linux/v6.12.4/source/drivers/net/ethernet/microchip/lan966x/lan966x_ets.c#L41
>
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Vladimir Oltean @ 2024-12-12 18:46 UTC
To: Lorenzo Bianconi
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Thu, Dec 12, 2024 at 06:03:08PM +0100, Lorenzo Bianconi wrote:
> > Explain "the mac chip forwards (in hw) the WAN traffic to the DSA switch
> > via the CPU port". How many packets does airoha_dev_select_queue() see?
> > All of them, or only the first of a flow? What operations does the
> > offload consist of?
>
> I am referring to the netfilter flowtable offload where the kernel receives
> just the 3-way handshake of a TCP connection and then the traffic is fully
> offloaded (the hw receives a flower rule to route the traffic between
> interfaces applying NAT mangling if requested).
And how do the follow-up packets know to go to the same conduit queue as
the initial packets of the flow?
As mentioned, my trouble with your current proposal is that I don't
think it reacts adequately to the user space request. Given your command,
packets forwarded from lan1 to lan0 should also go through lan0's ETS
scheduler, but my understanding is that they won't, because they bypass
the conduit. I don't encourage adding new net_device_ops infrastructure
to implement unexpected behavior.
I'm trying to look at the big picture and abstract away the flowtable a
bit. I don't think the tc rule should be on the user port. Can the
redirection of packets destined towards a particular switch port be
accomplished with a tc u32 filter on the conduit interface instead?
If the tc primitives for either the filter or the action don't exist,
maybe those could be added instead? Like DSA keys in "flower" which gain
introspection into the encapsulated packet headers?
> Re-thinking about it, I guess it is better to post flowtable support
> first and then continue the discussion about QoS offloading, what do
> you think?
I don't know about Andrew, but I'm really not familiar with the
netfilter flowtable (and there's another series from Eric Woudstra
waiting for me to know everything about it).
Though, I don't think this can continue for long; we need to find a
common starting place for discussions, since the development for chips
with flowtable offload is starting to put pressure on DSA. What should I
read as a starting point for a basic understanding?
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Lorenzo Bianconi @ 2024-12-16 12:09 UTC
To: Vladimir Oltean
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
> On Thu, Dec 12, 2024 at 06:03:08PM +0100, Lorenzo Bianconi wrote:
> > > Explain "the mac chip forwards (in hw) the WAN traffic to the DSA switch
> > > via the CPU port". How many packets does airoha_dev_select_queue() see?
> > > All of them, or only the first of a flow? What operations does the
> > > offload consist of?
> >
> > I am referring to the netfilter flowtable offload where the kernel receives
> > just the 3-way handshake of a TCP connection and then the traffic is fully
> > offloaded (the hw receives a flower rule to route the traffic between
> > interfaces applying NAT mangling if requested).
Hi Vladimir,
Sorry for the late reply.
>
> And how do the follow-up packets know to go to the same conduit queue as
> the initial packets of the flow?
>
> As mentioned, my trouble with your current proposal is that I don't
> think it reacts adequately to the user space request. Given your command,
> packets forwarded from lan1 to lan0 should also go through lan0's ETS
> scheduler, but my understanding is that they won't, because they bypass
> the conduit. I don't encourage adding new net_device_ops infrastructure
> to implement unexpected behavior.
I guess what I did not make clear here is that we are discussing 'routed'
traffic (sorry for that). The traffic is received from the WAN interface
and routed to a DSA port (or the other way around).
In this scenario the 3-way handshake will be received by the CPU via the
WAN port (or the conduit port), while the subsequent packets will be hw
forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
will forward it to the GDM1 port that is connected to the DSA cpu port.
The proposed series is about adding the control path to apply a given Qdisc
(ETS or TBF for EN7581) to the traffic that follows the described path,
without creating it directly on the DSA switch port (for the reasons
described before). E.g. the user may want to apply an ETS Qdisc just for
traffic egressing via lan0.
This series is not strictly related to the airoha_eth flowtable offload
implementation, but the latter is required to have a full picture of the
possible use case (this is why I was saying it is better to post it first).
>
> I'm trying to look at the big picture and abstract away the flowtable a
> bit. I don't think the tc rule should be on the user port. Can the
> redirection of packets destined towards a particular switch port be
> accomplished with a tc u32 filter on the conduit interface instead?
> If the tc primitives for either the filter or the action don't exist,
> maybe those could be added instead? Like DSA keys in "flower" which gain
> introspection into the encapsulated packet headers?
The issue with the current DSA infrastructure is that there is no way to
use the conduit port to offload a Qdisc policy to a given lan port, since
the APIs are missing the information about which user port we are
interested in (this is why I added the new netdev callback).
Please consider that here we are discussing Qdisc policies and not flower
rules to mangle the traffic. The hw needs to be configured in advance to
apply the requested policy (e.g. TBF for traffic shaping).
>
> > Re-thinking about it, I guess it is better to post flowtable support
> > first and then continue the discussion about QoS offloading, what do
> > you think?
>
> I don't know about Andrew, but I'm really not familiar with the
> netfilter flowtable (and there's another series from Eric Woudstra
> waiting for me to know everything about it).
>
> Though, I don't think this can continue for long, we need to find a
> common starting place for discussions, since the development for chips
> with flowtable offload is starting to put pressure on DSA. What to read
> as a starting point for a basic understanding?
I do not think there is much documentation about it (except the source code).
I guess you can take a look at [1], [2].
Regards,
Lorenzo
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
[1] https://docs.kernel.org/networking/nf_flowtable.html
[2] https://thermalcircle.de/doku.php?id=blog:linux:flowtables_1_a_netfilter_nftables_fastpath
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Vladimir Oltean @ 2024-12-16 15:49 UTC
To: Lorenzo Bianconi, Oleksij Rempel
Cc: netdev, andrew, davem, edumazet, kuba, pabeni, horms, nbd,
sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> I guess what I did not make clear here is that we are discussing about
> 'routed' traffic (sorry for that). The traffic is received from the WAN
> interface and routed to a DSA port (or the other way around).
> In this scenario the 3-way handshake will be received by the CPU via the
> WAN port (or the conduit port) while the subsequent packets will be hw
> forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> will forward it to the GDM1 port that is connected to the DSA cpu port.
>
> The proposed series is about adding the control path to apply a given Qdisc
> (ETS or TBF for EN7581) to the traffic that is following the described path
> without creating it directly on the DSA switch port (for the reasons described
> before). E.g. the user would want to apply an ETS Qdisc just for traffic
> egressing via lan0.
>
> This series is not strictly related to the airoha_eth flowtable offload
> implementation but the latter is required to have a full pictures of the
> possible use case (this is why I was saying it is better to post it first).
It's good to know this does not depend on flowtable.
When you add an offloaded Qdisc to the egress of a net device, you don't
affect just the traffic L3-routed to that device, but all traffic (this
also includes packets sent to it using L2 forwarding). As such, I simply
don't believe that the way in which the UAPI is interpreted here (a root
egress Qdisc matching only routed traffic) is proper.
Ack?
> > I'm trying to look at the big picture and abstract away the flowtable a
> > bit. I don't think the tc rule should be on the user port. Can the
> > redirection of packets destined towards a particular switch port be
> > accomplished with a tc u32 filter on the conduit interface instead?
> > If the tc primitives for either the filter or the action don't exist,
> > maybe those could be added instead? Like DSA keys in "flower" which gain
> > introspection into the encapsulated packet headers?
>
> The issue with the current DSA infrastructure is there is no way to use
> the conduit port to offload a Qdisc policy to a given lan port since we
> are missing in the APIs the information about what user port we are
> interested in (this is why I added the new netdev callback).
How does the introduction of ndo_setup_tc_conduit() help, since the problem
is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
It is simply not comparable to the way in which it is offloaded by
drivers/net/dsa/microchip/ksz_common.c, even though the user space
syntax is the same. Unless you're suggesting that for ksz it is not
offloaded correctly?
Oleksij, am I missing something?
> Please consider here we are discussing about Qdisc policies and not flower
> rules to mangle the traffic.
What's a Qdisc policy?
Also, flower is a classifier, not an action. It doesn't mangle packets
by the very definition of what a classifier is.
> The hw needs to be configured in advance to apply the requested policy
> (e.g TBF for traffic shaping).
What are you missing exactly to make DSA packets go to a particular
channel on the conduit?
For Qdisc offloading you want to configure the NIC in advance, of course.
Can't you do something like this to guide packets to the correct channels?
tc qdisc add dev eth0 clsact
tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
flowid 1:1
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
From: Oleksij Rempel @ 2024-12-16 18:14 UTC
From: Oleksij Rempel @ 2024-12-16 18:14 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Lorenzo Bianconi, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Mon, Dec 16, 2024 at 05:49:47PM +0200, Vladimir Oltean wrote:
> On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> > I guess what I did not make clear here is that we are discussing about
> > 'routed' traffic (sorry for that). The traffic is received from the WAN
> > interface and routed to a DSA port (or the other way around).
> > In this scenario the 3-way handshake will be received by the CPU via the
> > WAN port (or the conduit port) while the subsequent packets will be hw
> > forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> > will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> > will forward it to the GDM1 port that is connected to the DSA cpu port.
> >
> > The proposed series is about adding the control path to apply a given Qdisc
> > (ETS or TBF for EN7581) to the traffic that is following the described path
> > without creating it directly on the DSA switch port (for the reasons described
> > before). E.g. the user would want to apply an ETS Qdisc just for traffic
> > egressing via lan0.
> >
> > This series is not strictly related to the airoha_eth flowtable offload
> > implementation but the latter is required to have a full pictures of the
> > possible use case (this is why I was saying it is better to post it first).
>
> It's good to know this does not depend on flowtable.
>
> When you add an offloaded Qdisc to the egress of a net device, you don't
> affect just the traffic L3 routed to that device, but all traffic (also
> includes the packets sent to it using L2 forwarding). As such, I simply
> don't believe that the way in which the UAPI is interpreted here (root
> egress qdisc matches only routed traffic) is proper.
>
> Ack?
>
> > > I'm trying to look at the big picture and abstract away the flowtable a
> > > bit. I don't think the tc rule should be on the user port. Can the
> > > redirection of packets destined towards a particular switch port be
> > > accomplished with a tc u32 filter on the conduit interface instead?
> > > If the tc primitives for either the filter or the action don't exist,
> > > maybe those could be added instead? Like DSA keys in "flower" which gain
> > > introspection into the encapsulated packet headers?
> >
> > The issue with the current DSA infrastructure is there is no way to use
> > the conduit port to offload a Qdisc policy to a given lan port since we
> > are missing in the APIs the information about what user port we are
> > interested in (this is why I added the new netdev callback).
>
> How does the introduction of ndo_setup_tc_conduit() help, since the problem
> is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> It is simply not comparable to the way in which it is offloaded by
> drivers/net/dsa/microchip/ksz_common.c, even though the user space
> syntax is the same. Unless you're suggesting that for ksz it is not
> offloaded correctly?
>
> Oleksij, am I missing something?
Ahm.. let me try to understand the problem. First I would like to
interpret the diagram provided by Lorenzo; since I have no idea about the
EN7581, I will use the KSZ* model and map it onto this one:
CPU
|
+-----------------|---------------------+
EN7581 | GDM1 <-- fabric? ----> GDM2 | like NXP Layerscape?
+-------|--------------------------|----+
| |
+-------|---------+ WAN
MT7530 | CPU_port |
| fabric |
| lan1 lan2 .... |
+--|----|----|||--+
In case of lanN-to-lanM traffic, the switch-internal QoS should be used.
This is where "tc qdisc add lanN root ets" rules apply. At least, this
is how it is implemented for KSZ switches.
In case of traffic flowing from WAN to lanN, we deal with two separate sets
of queues and schedulers: the GDM1 egress queues and scheduler (probably
also ingress queues on the CPU_port), and the lanN queues and scheduler.
Combining both parts into one rule is not good, especially since the rules
of each lanN port may be different.
> > Please consider here we are discussing about Qdisc policies and not flower
> > rules to mangle the traffic.
>
> What's a Qdisc policy?
>
> Also, flower is a classifier, not an action. It doesn't mangle packets
> by the very definition of what a classifier is.
>
> > The hw needs to be configured in advance to apply the requested policy
> > (e.g TBF for traffic shaping).
>
> What are you missing exactly to make DSA packets go to a particular
> channel on the conduit?
>
> For Qdisc offloading you want to configure the NIC in advance, of course.
>
> Can't you do something like this to guide packets to the correct channels?
>
> tc qdisc add dev eth0 clsact
> tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> flowid 1:1
ACK, this would be my expectation as well.
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 15:49 ` Vladimir Oltean
2024-12-16 18:14 ` Oleksij Rempel
@ 2024-12-16 19:01 ` Lorenzo Bianconi
2024-12-16 19:23 ` Oleksij Rempel
` (2 more replies)
1 sibling, 3 replies; 33+ messages in thread
From: Lorenzo Bianconi @ 2024-12-16 19:01 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Oleksij Rempel, netdev, andrew, davem, edumazet, kuba, pabeni,
horms, nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
[-- Attachment #1: Type: text/plain, Size: 5786 bytes --]
> On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> > I guess what I did not make clear here is that we are discussing about
> > 'routed' traffic (sorry for that). The traffic is received from the WAN
> > interface and routed to a DSA port (or the other way around).
> > In this scenario the 3-way handshake will be received by the CPU via the
> > WAN port (or the conduit port) while the subsequent packets will be hw
> > forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> > will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> > will forward it to the GDM1 port that is connected to the DSA cpu port.
> >
> > The proposed series is about adding the control path to apply a given Qdisc
> > (ETS or TBF for EN7581) to the traffic that is following the described path
> > without creating it directly on the DSA switch port (for the reasons described
> > before). E.g. the user would want to apply an ETS Qdisc just for traffic
> > egressing via lan0.
> >
> > This series is not strictly related to the airoha_eth flowtable offload
> > implementation but the latter is required to have a full pictures of the
> > possible use case (this is why I was saying it is better to post it first).
>
> It's good to know this does not depend on flowtable.
>
> When you add an offloaded Qdisc to the egress of a net device, you don't
> affect just the traffic L3 routed to that device, but all traffic (also
> includes the packets sent to it using L2 forwarding). As such, I simply
> don't believe that the way in which the UAPI is interpreted here (root
> egress qdisc matches only routed traffic) is proper.
>
> Ack?
Considering patch [0], we are still offloading the Qdisc on the provided
DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
dsa_user_setup_qdisc(), but we are also introducing the ndo_setup_tc_conduit()
callback in order to use the hw Qdisc capabilities available on the mac chip
(e.g. EN7581) for the traffic routed from WAN to LANx. We will still apply
the Qdisc defined on LANx to L2 traffic from LANy to LANx. Agree?
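For reference, the new callback as used in this series would have roughly
this shape (inferred from the call sites quoted in this thread, not
necessarily the final API):

	/* member added to struct net_device_ops (sketch) */
	int (*ndo_setup_tc_conduit)(struct net_device *dev, int port,
				    enum tc_setup_type type,
				    void *type_data);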
>
> > > I'm trying to look at the big picture and abstract away the flowtable a
> > > bit. I don't think the tc rule should be on the user port. Can the
> > > redirection of packets destined towards a particular switch port be
> > > accomplished with a tc u32 filter on the conduit interface instead?
> > > If the tc primitives for either the filter or the action don't exist,
> > > maybe those could be added instead? Like DSA keys in "flower" which gain
> > > introspection into the encapsulated packet headers?
> >
> > The issue with the current DSA infrastructure is there is no way to use
> > the conduit port to offload a Qdisc policy to a given lan port since we
> > are missing in the APIs the information about what user port we are
> > interested in (this is why I added the new netdev callback).
>
> How does the introduction of ndo_setup_tc_conduit() help, since the problem
> is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> It is simply not comparable to the way in which it is offloaded by
> drivers/net/dsa/microchip/ksz_common.c, even though the user space
> syntax is the same. Unless you're suggesting that for ksz it is not
> offloaded correctly?
nope, I am not saying the current Qdisc DSA infrastructure is wrong; it just
does not allow us to exploit all the hw capabilities available on the EN7581
when traffic is routed from the WAN port to a given DSA switch port. If we do:
$tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
in the current upstream implementation we have:
dsa_user_setup_tc():
...
-> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
(it applies the Qdisc on lan0, configuring the hw switch)
adding the ndo_setup_tc_conduit() callback we have:
dsa_user_setup_qdisc()
...
-> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
(it applies the Qdisc on the EN7581 mac chip for the routed traffic destined to lan0)
-> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
(it applies the Qdisc on lan0, configuring the hw switch)
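As a rough C sketch of that proposed flow (callback and helper names as
quoted in this thread; a sketch, not the exact patch):

static int dsa_user_setup_qdisc(struct net_device *dev,
				enum tc_setup_type type, void *type_data)
{
	struct dsa_port *dp = dsa_user_to_port(dev);
	struct net_device *conduit = dsa_port_to_conduit(dp);
	struct dsa_switch *ds = dp->ds;

	/* configure the mac chip (e.g. EN7581 QDMA) for the traffic
	 * reaching this user port through the conduit
	 */
	if (conduit->netdev_ops->ndo_setup_tc_conduit)
		conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index,
							  type, type_data);

	/* configure the switch port itself for the L2 forwarded path */
	if (!ds->ops->port_setup_tc)
		return -EOPNOTSUPP;

	return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
}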
>
> Oleksij, am I missing something?
>
> > Please consider here we are discussing about Qdisc policies and not flower
> > rules to mangle the traffic.
>
> What's a Qdisc policy?
I mean a queue scheduler algorithm (e.g. TBF, ETS, HTB, ...)
>
> Also, flower is a classifier, not an action. It doesn't mangle packets
> by the very definition of what a classifier is.
yes, but the goal of the series is the queue scheduler offloading, not
classifiers/actions. Agree?
>
> > The hw needs to be configured in advance to apply the requested policy
> > (e.g TBF for traffic shaping).
>
> What are you missing exactly to make DSA packets go to a particular
> channel on the conduit?
>
> For Qdisc offloading you want to configure the NIC in advance, of course.
>
> Can't you do something like this to guide packets to the correct channels?
>
> tc qdisc add dev eth0 clsact
> tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> flowid 1:1
If we apply the Qdisc directly on the conduit port (eth0) we can only apply
the queue scheduler to all the traffic egressing via the DSA switch, while I
would like to apply it on a per-DSA-port basis (but using the mac chip hw
capabilities), got my point?
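For illustration, the desired usage would be plain per-port commands
(values made up), with the configuration landing in the mac chip rather
than in the hw switch:

tc qdisc add dev lan0 root tbf rate 100mbit burst 32kb latency 50ms
tc qdisc add dev lan1 root handle 1: ets strict 8 priomap 7 6 5 4 3 2 1 0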
Regards,
Lorenzo
[0] https://patchwork.kernel.org/project/netdevbpf/patch/8e57ae3c4b064403ca843ffa45a5eb4e4198cf80.1733930558.git.lorenzo@kernel.org/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 19:01 ` Lorenzo Bianconi
@ 2024-12-16 19:23 ` Oleksij Rempel
2024-12-16 21:44 ` Lorenzo Bianconi
2024-12-16 19:46 ` Vladimir Oltean
2024-12-16 23:24 ` Andrew Lunn
2 siblings, 1 reply; 33+ messages in thread
From: Oleksij Rempel @ 2024-12-16 19:23 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Vladimir Oltean, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Mon, Dec 16, 2024 at 08:01:33PM +0100, Lorenzo Bianconi wrote:
> > On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> > > I guess what I did not make clear here is that we are discussing about
> > > 'routed' traffic (sorry for that). The traffic is received from the WAN
> > > interface and routed to a DSA port (or the other way around).
> > > In this scenario the 3-way handshake will be received by the CPU via the
> > > WAN port (or the conduit port) while the subsequent packets will be hw
> > > forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> > > will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> > > will forward it to the GDM1 port that is connected to the DSA cpu port.
> > >
> > > The proposed series is about adding the control path to apply a given Qdisc
> > > (ETS or TBF for EN7581) to the traffic that is following the described path
> > > without creating it directly on the DSA switch port (for the reasons described
> > > before). E.g. the user would want to apply an ETS Qdisc just for traffic
> > > egressing via lan0.
> > >
> > > This series is not strictly related to the airoha_eth flowtable offload
> > > implementation but the latter is required to have a full pictures of the
> > > possible use case (this is why I was saying it is better to post it first).
> >
> > It's good to know this does not depend on flowtable.
> >
> > When you add an offloaded Qdisc to the egress of a net device, you don't
> > affect just the traffic L3 routed to that device, but all traffic (also
> > includes the packets sent to it using L2 forwarding). As such, I simply
> > don't believe that the way in which the UAPI is interpreted here (root
> > egress qdisc matches only routed traffic) is proper.
> >
> > Ack?
>
> Considering patch [0], we are still offloading the Qdisc on the provided
> DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> callback in order to use the hw Qdisc capabilities available on the mac chip
> (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
>
> >
> > > > I'm trying to look at the big picture and abstract away the flowtable a
> > > > bit. I don't think the tc rule should be on the user port. Can the
> > > > redirection of packets destined towards a particular switch port be
> > > > accomplished with a tc u32 filter on the conduit interface instead?
> > > > If the tc primitives for either the filter or the action don't exist,
> > > > maybe those could be added instead? Like DSA keys in "flower" which gain
> > > > introspection into the encapsulated packet headers?
> > >
> > > The issue with the current DSA infrastructure is there is no way to use
> > > the conduit port to offload a Qdisc policy to a given lan port since we
> > > are missing in the APIs the information about what user port we are
> > > interested in (this is why I added the new netdev callback).
> >
> > How does the introduction of ndo_setup_tc_conduit() help, since the problem
> > is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> > It is simply not comparable to the way in which it is offloaded by
> > drivers/net/dsa/microchip/ksz_common.c, even though the user space
> > syntax is the same. Unless you're suggesting that for ksz it is not
> > offloaded correctly?
>
> nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> does not allow to exploit all hw capabilities available on EN7581 when the
> traffic is routed from the WAN port to a given DSA switch port. If we do:
>
> $tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
>
> in the current upstream implementation we do:
> dsa_user_setup_tc():
> ...
> -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> (it applies the Qdisc on lan0 configuring the hw switch)
>
> adding the ndo_setup_tc_conduit() callback we have:
>
> dsa_user_setup_qdisc()
> ...
> -> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
> (it applies the Qdisc on EN7581 mac chip for the routed traffic destinated to lan0)
>
> -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> (it applies the Qdisc on lan0 configuring the hw switch)
>
> >
> > Oleksij, am I missing something?
> >
> > > Please consider here we are discussing about Qdisc policies and not flower
> > > rules to mangle the traffic.
> >
> > What's a Qdisc policy?
>
> I mean a queue scheduler algorithm (e.g. TBF, ETS, HTB, ...)
>
> >
> > Also, flower is a classifier, not an action. It doesn't mangle packets
> > by the very definition of what a classifier is.
>
> yes, but goal of the series is the Queue scheduler offloading, not
> classifier/action. Agree?
>
> >
> > > The hw needs to be configured in advance to apply the requested policy
> > > (e.g TBF for traffic shaping).
> >
> > What are you missing exactly to make DSA packets go to a particular
> > channel on the conduit?
> >
> > For Qdisc offloading you want to configure the NIC in advance, of course.
> >
> > Can't you do something like this to guide packets to the correct channels?
> >
> > tc qdisc add dev eth0 clsact
> > tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> > tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> > flowid 1:1
>
> If we apply the Qdisc directly on the conduit port (eth0) we can just apply the
> queue scheduler on all the traffic egressing via the DSA switch while I would
> like to apply it on per DSA port basis (but using the mac chip hw capabilities),
> got my point?
Hm, I guess I have a similar use case in one of my projects. In my case, the
CPU interface is 1Gbit and the switch ports are 100Mbit each. It is still
possible to keep the CPU interface busy by sending a 1Gbit UDP stream, so
900Mbit is dropped by the switch. I would like to have a traffic limiter
per virtual DSA port on the SoC side to reduce the load on the DSA conduit.
Currently this is not possible.
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 19:01 ` Lorenzo Bianconi
2024-12-16 19:23 ` Oleksij Rempel
@ 2024-12-16 19:46 ` Vladimir Oltean
2024-12-16 22:28 ` Lorenzo Bianconi
2024-12-16 23:24 ` Andrew Lunn
2 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-16 19:46 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Oleksij Rempel, netdev, andrew, davem, edumazet, kuba, pabeni,
horms, nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Mon, Dec 16, 2024 at 08:01:33PM +0100, Lorenzo Bianconi wrote:
> Considering patch [0], we are still offloading the Qdisc on the provided
> DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> callback in order to use the hw Qdisc capabilities available on the mac chip
> (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
Not quite, no.
ndo_setup_tc_conduit() does not have the same instruments to offload
what port_setup_tc() can offload. It is not involved in all data paths
that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
-EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
paths in the switch, such that all packets that egress the DSA user port
are handled by ndo_setup_tc_conduit()'s instruments.
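In pseudo-code (a sketch of the rule, reusing the names from this thread):

	err = ds->ops->port_setup_tc ? ds->ops->port_setup_tc(ds, dp->index,
							      type, type_data) :
				       -EOPNOTSUPP;
	if (err == -EOPNOTSUPP)
		return err;	/* no silent fallback to the conduit-only offload */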
> > How does the introduction of ndo_setup_tc_conduit() help, since the problem
> > is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> > It is simply not comparable to the way in which it is offloaded by
> > drivers/net/dsa/microchip/ksz_common.c, even though the user space
> > syntax is the same. Unless you're suggesting that for ksz it is not
> > offloaded correctly?
>
> nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> does not allow to exploit all hw capabilities available on EN7581 when the
> traffic is routed from the WAN port to a given DSA switch port.
And I don't believe it should, in this way.
> If we do:
>
> $tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
>
> in the current upstream implementation we do:
> dsa_user_setup_tc():
> ...
> -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> (it applies the Qdisc on lan0 configuring the hw switch)
>
> adding the ndo_setup_tc_conduit() callback we have:
>
> dsa_user_setup_qdisc()
> ...
> -> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
> (it applies the Qdisc on EN7581 mac chip for the routed traffic destinated to lan0)
>
> -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> (it applies the Qdisc on lan0 configuring the hw switch)
>
> > Also, flower is a classifier, not an action. It doesn't mangle packets
> > by the very definition of what a classifier is.
>
> yes, but goal of the series is the Queue scheduler offloading, not
> classifier/action. Agree?
Classifiers + flowid are an instrument to direct packets to classes of a
classful egress Qdisc. They seem perfectly relevant to the discussion,
given the information I currently have.
> > Can't you do something like this to guide packets to the correct channels?
> >
> > tc qdisc add dev eth0 clsact
> > tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> > tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> > flowid 1:1
>
> If we apply the Qdisc directly on the conduit port (eth0) we can just apply the
> queue scheduler on all the traffic egressing via the DSA switch while I would
> like to apply it on per DSA port basis (but using the mac chip hw capabilities),
> got my point?
We need something as the root Qdisc of the conduit which exposes its
hardware capabilities. I just assumed that would be a simple (and single)
ETS, you can correct me if I am wrong.
On conduit egress, what is the arbitration scheme between the traffic
destined towards each DSA user port (channel, as the driver calls them)?
How can this be best represented?
IIUC, in your patch set, you expose the conduit hardware QoS capabilities
as if they can be perfectly virtualized among DSA user ports, and as if
each DSA user port can have its own ETS root Qdisc, completely independent
of each other, as if the packets do not serialize on the conduit <-> CPU
port link, and as if that is not a bottleneck. Is that really the case?
If so (but please explain how), maybe you really need your own root Qdisc
driver, with one class per DSA user port, and those classes have ETS
attached to them.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 19:23 ` Oleksij Rempel
@ 2024-12-16 21:44 ` Lorenzo Bianconi
2024-12-17 8:46 ` Oleksij Rempel
0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2024-12-16 21:44 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Vladimir Oltean, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
[-- Attachment #1: Type: text/plain, Size: 7075 bytes --]
> On Mon, Dec 16, 2024 at 08:01:33PM +0100, Lorenzo Bianconi wrote:
> > > On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> > > > I guess what I did not make clear here is that we are discussing about
> > > > 'routed' traffic (sorry for that). The traffic is received from the WAN
> > > > interface and routed to a DSA port (or the other way around).
> > > > In this scenario the 3-way handshake will be received by the CPU via the
> > > > WAN port (or the conduit port) while the subsequent packets will be hw
> > > > forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> > > > will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> > > > will forward it to the GDM1 port that is connected to the DSA cpu port.
> > > >
> > > > The proposed series is about adding the control path to apply a given Qdisc
> > > > (ETS or TBF for EN7581) to the traffic that is following the described path
> > > > without creating it directly on the DSA switch port (for the reasons described
> > > > before). E.g. the user would want to apply an ETS Qdisc just for traffic
> > > > egressing via lan0.
> > > >
> > > > This series is not strictly related to the airoha_eth flowtable offload
> > > > implementation but the latter is required to have a full pictures of the
> > > > possible use case (this is why I was saying it is better to post it first).
> > >
> > > It's good to know this does not depend on flowtable.
> > >
> > > When you add an offloaded Qdisc to the egress of a net device, you don't
> > > affect just the traffic L3 routed to that device, but all traffic (also
> > > includes the packets sent to it using L2 forwarding). As such, I simply
> > > don't believe that the way in which the UAPI is interpreted here (root
> > > egress qdisc matches only routed traffic) is proper.
> > >
> > > Ack?
> >
> > Considering patch [0], we are still offloading the Qdisc on the provided
> > DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> > dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> > callback in order to use the hw Qdisc capabilities available on the mac chip
> > (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> > the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
> >
> > >
> > > > > I'm trying to look at the big picture and abstract away the flowtable a
> > > > > bit. I don't think the tc rule should be on the user port. Can the
> > > > > redirection of packets destined towards a particular switch port be
> > > > > accomplished with a tc u32 filter on the conduit interface instead?
> > > > > If the tc primitives for either the filter or the action don't exist,
> > > > > maybe those could be added instead? Like DSA keys in "flower" which gain
> > > > > introspection into the encapsulated packet headers?
> > > >
> > > > The issue with the current DSA infrastructure is there is no way to use
> > > > the conduit port to offload a Qdisc policy to a given lan port since we
> > > > are missing in the APIs the information about what user port we are
> > > > interested in (this is why I added the new netdev callback).
> > >
> > > How does the introduction of ndo_setup_tc_conduit() help, since the problem
> > > is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> > > It is simply not comparable to the way in which it is offloaded by
> > > drivers/net/dsa/microchip/ksz_common.c, even though the user space
> > > syntax is the same. Unless you're suggesting that for ksz it is not
> > > offloaded correctly?
> >
> > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > does not allow to exploit all hw capabilities available on EN7581 when the
> > traffic is routed from the WAN port to a given DSA switch port. If we do:
> >
> > $tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
> >
> > in the current upstream implementation we do:
> > dsa_user_setup_tc():
> > ...
> > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > (it applies the Qdisc on lan0 configuring the hw switch)
> >
> > adding the ndo_setup_tc_conduit() callback we have:
> >
> > dsa_user_setup_qdisc()
> > ...
> > -> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
> > (it applies the Qdisc on EN7581 mac chip for the routed traffic destinated to lan0)
> >
> > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > (it applies the Qdisc on lan0 configuring the hw switch)
> >
> > >
> > > Oleksij, am I missing something?
> > >
> > > > Please consider here we are discussing about Qdisc policies and not flower
> > > > rules to mangle the traffic.
> > >
> > > What's a Qdisc policy?
> >
> > I mean a queue scheduler algorithm (e.g. TBF, ETS, HTB, ...)
> >
> > >
> > > Also, flower is a classifier, not an action. It doesn't mangle packets
> > > by the very definition of what a classifier is.
> >
> > yes, but goal of the series is the Queue scheduler offloading, not
> > classifier/action. Agree?
> >
> > >
> > > > The hw needs to be configured in advance to apply the requested policy
> > > > (e.g TBF for traffic shaping).
> > >
> > > What are you missing exactly to make DSA packets go to a particular
> > > channel on the conduit?
> > >
> > > For Qdisc offloading you want to configure the NIC in advance, of course.
> > >
> > > Can't you do something like this to guide packets to the correct channels?
> > >
> > > tc qdisc add dev eth0 clsact
> > > tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> > > tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> > > flowid 1:1
> >
> > If we apply the Qdisc directly on the conduit port (eth0) we can just apply the
> > queue scheduler on all the traffic egressing via the DSA switch while I would
> > like to apply it on per DSA port basis (but using the mac chip hw capabilities),
> > got my point?
>
> Hm, I guess I have similar use case in one of my projects. In my case, the CPU
> interface is 1Gbit the switch ports are 100Mbit each. It is still
> possible to keep the CPU interface busy by sending 1Gbit UDP stream, so
> 900Mbit is dropped by the switch. I would like to have traffic limiter
> per virtual DSA port on the SoC site to reduce the load on DSA conduit.
> Currently it was not possible.
Does the mac chip in your setup support TX shaping (e.g. via HTB or TBF)?
If so, I guess you could do it via the ndo_setup_tc_conduit() callback.
Regards,
Lorenzo
>
> --
> Pengutronix e.K. | |
> Steuerwalder Str. 21 | http://www.pengutronix.de/ |
> 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 19:46 ` Vladimir Oltean
@ 2024-12-16 22:28 ` Lorenzo Bianconi
2024-12-16 23:13 ` Vladimir Oltean
0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2024-12-16 22:28 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Oleksij Rempel, netdev, andrew, davem, edumazet, kuba, pabeni,
horms, nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
[-- Attachment #1: Type: text/plain, Size: 6285 bytes --]
> On Mon, Dec 16, 2024 at 08:01:33PM +0100, Lorenzo Bianconi wrote:
> > Considering patch [0], we are still offloading the Qdisc on the provided
> > DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> > dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> > callback in order to use the hw Qdisc capabilities available on the mac chip
> > (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> > the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
>
> Not quite, no.
>
> ndo_setup_tc_conduit() does not have the same instruments to offload
> what port_setup_tc() can offload. It is not involved in all data paths
> that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
->port_setup_tc()) refer to the same DSA user port (please take a look at
the callback signatures).
> returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> paths in the switch, such that all packets that egress the DSA user port
> are handled by ndo_setup_tc_conduit()'s instruments.
Uhm, do you mean we are changing the user-expected result in this way?
It seems to me the only case we are actually changing is when the port_setup_tc()
callback is not supported by the DSA switch driver while the ndo_setup_tc_conduit()
one is supported by the mac chip. In this case the previous implementation
returns -EOPNOTSUPP while the proposed one does not report any error.
Do we really care about this case? If so, I guess we can rework
dsa_user_setup_qdisc().
>
> > > How does the introduction of ndo_setup_tc_conduit() help, since the problem
> > > is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> > > It is simply not comparable to the way in which it is offloaded by
> > > drivers/net/dsa/microchip/ksz_common.c, even though the user space
> > > syntax is the same. Unless you're suggesting that for ksz it is not
> > > offloaded correctly?
> >
> > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > does not allow to exploit all hw capabilities available on EN7581 when the
> > traffic is routed from the WAN port to a given DSA switch port.
>
> And I don't believe it should, in this way.
Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
for this.
>
> > If we do:
> >
> > $tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
> >
> > in the current upstream implementation we do:
> > dsa_user_setup_tc():
> > ...
> > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > (it applies the Qdisc on lan0 configuring the hw switch)
> >
> > adding the ndo_setup_tc_conduit() callback we have:
> >
> > dsa_user_setup_qdisc()
> > ...
> > -> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
> > (it applies the Qdisc on EN7581 mac chip for the routed traffic destinated to lan0)
> >
> > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > (it applies the Qdisc on lan0 configuring the hw switch)
> >
> > > Also, flower is a classifier, not an action. It doesn't mangle packets
> > > by the very definition of what a classifier is.
> >
> > yes, but goal of the series is the Queue scheduler offloading, not
> > classifier/action. Agree?
>
> Classifiers + flowid are an instrument to direct packets to classes of a
> classful egress Qdisc. They seem perfectly relevant to the discussion,
> given the information I currently have.
yep, sure. We will need a tc classifier to set the flow-id (I used flower
during development). What I mean is that this series takes care only of the
Qdisc offloading.
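For example, a filter of that kind on top of the ETS root from the earlier
example could look like this (match values made up):

tc filter add dev lan0 parent 1: protocol ip flower ip_proto udp dst_port 5201 classid 1:1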
>
> > > Can't you do something like this to guide packets to the correct channels?
> > >
> > > tc qdisc add dev eth0 clsact
> > > tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> > > tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> > > flowid 1:1
> >
> > If we apply the Qdisc directly on the conduit port (eth0) we can just apply the
> > queue scheduler on all the traffic egressing via the DSA switch while I would
> > like to apply it on per DSA port basis (but using the mac chip hw capabilities),
> > got my point?
>
> We need something as the root Qdisc of the conduit which exposes its
> hardware capabilities. I just assumed that would be a simple (and single)
> ETS, you can correct me if I am wrong.
>
> On conduit egress, what is the arbitration scheme between the traffic
> destined towards each DSA user port (channel, as the driver calls them)?
> How can this be best represented?
The EN7581 supports up to 32 different 'channels' (each of them supports 8
different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
My idea is to associate a channel with each DSA switch port, so the user can
define independent QoS policies for each DSA port (e.g. shape lan0 at 100Mbps,
apply ETS on lan1, ...) by configuring the mac chip instead of the hw switch.
The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
is offloaded) updates the channel and queue information in the DMA descriptor
(please take a look at [0] for the first case).
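A minimal sketch of that descriptor update (the struct, field and mask
names here are assumptions for illustration, not the actual airoha_eth
code):

static void qdma_desc_set_qos(struct airoha_qdma_desc *desc,
			      u32 channel, struct sk_buff *skb)
{
	/* one channel per DSA user port, 8 prio hw queues per channel */
	u32 queue = skb->priority % 8;

	desc->msg |= FIELD_PREP(QDMA_DESC_CHANNEL_MASK, channel) |
		     FIELD_PREP(QDMA_DESC_QUEUE_MASK, queue);
}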
>
> IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> as if they can be perfectly virtualized among DSA user ports, and as if
> each DSA user port can have its own ETS root Qdisc, completely independent
> of each other, as if the packets do not serialize on the conduit <-> CPU
> port link, and as if that is not a bottleneck. Is that really the case?
correct
> If so (but please explain how), maybe you really need your own root Qdisc
> driver, with one class per DSA user port, and those classes have ETS
> attached to them.
Can you please clarify what you mean by 'root Qdisc driver'?
Regards,
Lorenzo
[0] https://patchwork.kernel.org/project/netdevbpf/patch/a7d8ec3d70d7a0e2208909189e46a63e769f8f9d.1733930558.git.lorenzo@kernel.org/
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 22:28 ` Lorenzo Bianconi
@ 2024-12-16 23:13 ` Vladimir Oltean
2024-12-17 9:11 ` Lorenzo Bianconi
0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-16 23:13 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Oleksij Rempel, netdev, andrew, davem, edumazet, kuba, pabeni,
horms, nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > ndo_setup_tc_conduit() does not have the same instruments to offload
> > what port_setup_tc() can offload. It is not involved in all data paths
> > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
>
> Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> ->port_setup_tc()) refer to the same DSA user port (please take a look the
> callback signature).
I'd be just repeating what I've said a few times before. Your proposed
ndo_setup_tc_conduit() appears to be configuring conduit resources
(QDMA, GDM) for mt7530 user port tc offload, as if it is in complete and
exclusive control of the user port data path. But as long as there are
packets in the user port data path that bypass those conduit QoS resources
(like for example when the mt7530 switch forwards a packet from one port
to another, bypassing the GDM1 in your drawing [1]), it isn't a good model.
Forget about ndo_setup_tc_conduit(); it isn't a good tc command to run in
the first place. The tc command you're trying to make do what you want is
supposed to _also_ include the mt7530 packets forwarded from one port to
another in its QoS mix. It applies at the _egress_ of the mt7530 port.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
Let me try to add some squiggles based on your diagram, to clarify what
is my understanding and complaint.
┌───────┐ ┌───────┐
│ QDMA2 │ │ QDMA1 │
└───┬───┘ └───┬───┘
│ │
┌───────▼───────────────────────────────────────────▼────────┐
│ │
│ P5 P0 │
│ │
│ │
│ │ ┌──────┐
│ P3 ├────► GDM3 │
│ │
┌─────┐ │ │
│ PPE ◄────┤ P4 PSE │
└─────┘ │ │
│ │ ┌──────┐
│ P9 ├────► GDM4 │
│ │ └──────┘
│ │
│ P2 P1 │
└─────────┬─────────────────────────────────────────┬────────┘
│ │
┌───▼──┐ ┌──▼───┐
│ GDM2 │ │ GDM1 │
└──────┘ └──┬───┘
│
┌──────────────▼───────────────┐
│ CPU port │
│ ┌─────────┘ │
│ │ MT7530 │
│ │ │
│ ▼ x │
│ ┌─────┐ ┌─┘ │
│ lan1 lan2 lan3 lan4 │
└───│──────────────────────────┘
▼
When you add an offloaded Qdisc to the egress of lan1, the expectation
is that packets from lan2 obey it too (offloaded tc goes hand in hand
with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
breaking that expectation, because packets from lan2 bridged by MT7530
don't go to GDM1 (the "x").
> > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > paths in the switch, such that all packets that egress the DSA user port
> > are handled by ndo_setup_tc_conduit()'s instruments.
>
> Uhm, do you mean we are changing the user expected result in this way?
> It seems to me the only case we are actually changing is if port_setup_tc()
> callback is not supported by the DSA switch driver while ndo_setup_tc_conduit()
> one is supported by the mac chip. In this case the previous implementation
> returns -EOPNOTSUPP while the proposed one does not report any error.
> Do we really care about this case? If so, I guess we can rework
> dsa_user_setup_qdisc().
See above, there's nothing to rework.
> > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > traffic is routed from the WAN port to a given DSA switch port.
> >
> > And I don't believe it should, in this way.
>
> Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> for this.
See above, I'm also waiting for Oleksij's answer but I don't expect you
2 to be talking about the same thing. If there's some common infrastructure
to be shared, my understanding is it has nothing to do with ndo_setup_tc_conduit().
> > We need something as the root Qdisc of the conduit which exposes its
> > hardware capabilities. I just assumed that would be a simple (and single)
> > ETS, you can correct me if I am wrong.
> >
> > On conduit egress, what is the arbitration scheme between the traffic
> > destined towards each DSA user port (channel, as the driver calls them)?
> > How can this be best represented?
>
> The EN7581 supports up to 32 different 'channels' (each of them support 8
> different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> My idea is to associate a channel to each DSA switch port, so the user can
> define independent QoS policies for each DSA ports (e.g. shape at 100Mbps lan0,
> apply ETS on lan1, ...) configuring the mac chip instead of the hw switch.
> The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> is offloaded) updates the channel and queue information in the DMA descriptor
> (please take a look to [0] for the first case).
But you call it a MAC chip because between the GDM1 and the MT7530 there's
an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?
I'm asking again, are the channels completely independent of one another,
or are they consuming shared bandwidth in a way that with your proposal
is just not visible? If there is a GMII between the GDM1 and the MT7530,
how come the bandwidth between the channels is not shared in any way?
And if there is no GMII or similar MAC interface, we need to take 100
steps back and discuss why the DSA model was chosen for this switch, and
not a freeform switchdev driver where the conduit is not discrete.
I'm not sure what to associate these channels with. Would it be wrong to
think of each channel as a separate DSA conduit? Because, for example, there
is an API to customize the user port <-> conduit assignment.
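(For reference, that assignment is already configurable from user space
with something like the following, exact syntax depending on the iproute2
version:

ip link set dev lan1 type dsa conduit eth1

so modeling each channel as a separate conduit would reuse existing
semantics.)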
> > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > as if they can be perfectly virtualized among DSA user ports, and as if
> > each DSA user port can have its own ETS root Qdisc, completely independent
> > of each other, as if the packets do not serialize on the conduit <-> CPU
> > port link, and as if that is not a bottleneck. Is that really the case?
>
> correct
Very interesting but will need more than one word for an explanation :)
> > If so (but please explain how), maybe you really need your own root Qdisc
> > driver, with one class per DSA user port, and those classes have ETS
> > attached to them.
>
> Can you please clarify what do you mean with 'root Qdisc driver'?
Quite literally, an implementer of struct Qdisc_ops whose parent can
only be TC_H_ROOT. I was implying you'd have to create an abstract
software model of the QoS capabilities of the QDMA and the GDM port such
that we all understand them, a netlink attribute scheme for configuring
those QoS parameters, and then a QoS offload mechanism through which
they are communicated to compatible hardware. But let's leave that aside
until it becomes more clear what you have.
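For concreteness, the skeleton of such a root-only Qdisc could look like
this (all names hypothetical, enqueue/dequeue left as stubs):

static int conduit_qos_init(struct Qdisc *sch, struct nlattr *opt,
			    struct netlink_ext_ack *extack)
{
	/* only valid as the root qdisc, as suggested above */
	if (sch->parent != TC_H_ROOT) {
		NL_SET_ERR_MSG(extack, "conduit_qos is root-only");
		return -EOPNOTSUPP;
	}

	/* here: parse netlink attributes describing one class per DSA
	 * user port (channel) and program the hardware accordingly
	 */
	return 0;
}

static int conduit_qos_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			       struct sk_buff **to_free)
{
	/* stub: a real driver would classify into the per-port class */
	return qdisc_drop(skb, sch, to_free);
}

static struct sk_buff *conduit_qos_dequeue(struct Qdisc *sch)
{
	return NULL;	/* stub */
}

static struct Qdisc_ops conduit_qos_qdisc_ops __read_mostly = {
	.id		= "conduit_qos",
	.init		= conduit_qos_init,
	.enqueue	= conduit_qos_enqueue,
	.dequeue	= conduit_qos_dequeue,
	.owner		= THIS_MODULE,
};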
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 19:01 ` Lorenzo Bianconi
2024-12-16 19:23 ` Oleksij Rempel
2024-12-16 19:46 ` Vladimir Oltean
@ 2024-12-16 23:24 ` Andrew Lunn
2024-12-17 9:38 ` Oleksij Rempel
2 siblings, 1 reply; 33+ messages in thread
From: Andrew Lunn @ 2024-12-16 23:24 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Vladimir Oltean, Oleksij Rempel, netdev, davem, edumazet, kuba,
pabeni, horms, nbd, sean.wang, Mark-MC.Lee, lorenzo.bianconi83
> Considering patch [0], we are still offloading the Qdisc on the provided
> DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> callback in order to use the hw Qdisc capabilities available on the mac chip
> (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
I've not read all the details, so I could be getting something
wrong. But let me point out the basics: offloading is used to
accelerate what Linux already supports in software. So forget about
your hardware. How would I configure a bunch of e1000e cards connected
to a software bridge to do what you want?
There is no conduit interface in this, so I would not expect to
explicitly configure a conduit interface. Maybe the offloading needs
to implicitly configure the conduit, but that should all be hidden
away from the user. But given that the software bridge has no concept of
a conduit, I doubt it.
It could well be that our model does not map to the hardware too well,
leaving some bits unusable, but there is not much you can do about
that; that is the Linux model: accelerate what Linux supports in
software.
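A software-only sketch of that reference model (device names assumed):

ip link add br0 type bridge
ip link set dev eth1 master br0
ip link set dev eth2 master br0
# per-port egress QoS, purely in software, no conduit anywhere:
tc qdisc add dev eth1 root handle 1: ets strict 8 priomap 7 6 5 4 3 2 1 0
tc qdisc add dev eth2 root tbf rate 100mbit burst 32kb latency 50ms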
Andrew
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 21:44 ` Lorenzo Bianconi
@ 2024-12-17 8:46 ` Oleksij Rempel
0 siblings, 0 replies; 33+ messages in thread
From: Oleksij Rempel @ 2024-12-17 8:46 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Vladimir Oltean, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Mon, Dec 16, 2024 at 10:44:54PM +0100, Lorenzo Bianconi wrote:
> > On Mon, Dec 16, 2024 at 08:01:33PM +0100, Lorenzo Bianconi wrote:
> > > > On Mon, Dec 16, 2024 at 01:09:01PM +0100, Lorenzo Bianconi wrote:
> > > > > I guess what I did not make clear here is that we are discussing about
> > > > > 'routed' traffic (sorry for that). The traffic is received from the WAN
> > > > > interface and routed to a DSA port (or the other way around).
> > > > > In this scenario the 3-way handshake will be received by the CPU via the
> > > > > WAN port (or the conduit port) while the subsequent packets will be hw
> > > > > forwarded from WAN to LAN (or LAN to WAN). For EN7581 [0], the traffic
> > > > > will be received by the system from GDM2 (WAN) and the PSE/PPE blocks
> > > > > will forward it to the GDM1 port that is connected to the DSA cpu port.
> > > > >
> > > > > The proposed series is about adding the control path to apply a given Qdisc
> > > > > (ETS or TBF for EN7581) to the traffic that is following the described path
> > > > > without creating it directly on the DSA switch port (for the reasons described
> > > > > before). E.g. the user would want to apply an ETS Qdisc just for traffic
> > > > > egressing via lan0.
> > > > >
> > > > > This series is not strictly related to the airoha_eth flowtable offload
> > > > > implementation but the latter is required to have a full pictures of the
> > > > > possible use case (this is why I was saying it is better to post it first).
> > > >
> > > > It's good to know this does not depend on flowtable.
> > > >
> > > > When you add an offloaded Qdisc to the egress of a net device, you don't
> > > > affect just the traffic L3 routed to that device, but all traffic (also
> > > > includes the packets sent to it using L2 forwarding). As such, I simply
> > > > don't believe that the way in which the UAPI is interpreted here (root
> > > > egress qdisc matches only routed traffic) is proper.
> > > >
> > > > Ack?
> > >
> > > Considering patch [0], we are still offloading the Qdisc on the provided
> > > DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> > > dsa_user_setup_qdisc(), but we are introducing even the ndo_setup_tc_conduit()
> > > callback in order to use the hw Qdisc capabilities available on the mac chip
> > > (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> > > the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
> > >
> > > >
> > > > > > I'm trying to look at the big picture and abstract away the flowtable a
> > > > > > bit. I don't think the tc rule should be on the user port. Can the
> > > > > > redirection of packets destined towards a particular switch port be
> > > > > > accomplished with a tc u32 filter on the conduit interface instead?
> > > > > > If the tc primitives for either the filter or the action don't exist,
> > > > > > maybe those could be added instead? Like DSA keys in "flower" which gain
> > > > > > introspection into the encapsulated packet headers?
> > > > >
> > > > > The issue with the current DSA infrastructure is there is no way to use
> > > > > the conduit port to offload a Qdisc policy to a given lan port since we
> > > > > are missing in the APIs the information about what user port we are
> > > > > interested in (this is why I added the new netdev callback).
> > > >
> > > > How does the introduction of ndo_setup_tc_conduit() help, since the problem
> > > > is elsewhere? You are not making "tc qdisc add lanN root ets" work correctly.
> > > > It is simply not comparable to the way in which it is offloaded by
> > > > drivers/net/dsa/microchip/ksz_common.c, even though the user space
> > > > syntax is the same. Unless you're suggesting that for ksz it is not
> > > > offloaded correctly?
> > >
> > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > traffic is routed from the WAN port to a given DSA switch port. If we do:
> > >
> > > $tc qdisc add dev lan0 root handle 1: ets strict 8 priomap ...
> > >
> > > in the current upstream implementation we do:
> > > dsa_user_setup_tc():
> > > ...
> > > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > > (it applies the Qdisc on lan0 configuring the hw switch)
> > >
> > > adding the ndo_setup_tc_conduit() callback we have:
> > >
> > > dsa_user_setup_qdisc()
> > > ...
> > > -> conduit->netdev_ops->ndo_setup_tc_conduit(conduit, dp->index, type, type_data)
> > > (it applies the Qdisc on EN7581 mac chip for the routed traffic destinated to lan0)
> > >
> > > -> ds->ops->port_setup_tc(ds, dp->index, type, type_data)
> > > (it applies the Qdisc on lan0 configuring the hw switch)
> > >
> > > >
> > > > Oleksij, am I missing something?
> > > >
> > > > > Please consider here we are discussing about Qdisc policies and not flower
> > > > > rules to mangle the traffic.
> > > >
> > > > What's a Qdisc policy?
> > >
> > > I mean a queue scheduler algorithm (e.g. TBF, ETS, HTB, ...)
> > >
> > > >
> > > > Also, flower is a classifier, not an action. It doesn't mangle packets
> > > > by the very definition of what a classifier is.
> > >
> > > yes, but goal of the series is the Queue scheduler offloading, not
> > > classifier/action. Agree?
> > >
> > > >
> > > > > The hw needs to be configured in advance to apply the requested policy
> > > > > (e.g TBF for traffic shaping).
> > > >
> > > > What are you missing exactly to make DSA packets go to a particular
> > > > channel on the conduit?
> > > >
> > > > For Qdisc offloading you want to configure the NIC in advance, of course.
> > > >
> > > > Can't you do something like this to guide packets to the correct channels?
> > > >
> > > > tc qdisc add dev eth0 clsact
> > > > tc qdisc add dev eth0 root handle 1: ets strict 8 priomap ...
> > > > tc filter add dev eth0 egress ${u32 or flower filter to match on DSA tagged packets} \
> > > > flowid 1:1
> > >
> > > If we apply the Qdisc directly on the conduit port (eth0) we can just apply the
> > > queue scheduler on all the traffic egressing via the DSA switch while I would
> > > like to apply it on per DSA port basis (but using the mac chip hw capabilities),
> > > got my point?
> >
> > Hm, I guess I have similar use case in one of my projects. In my case, the CPU
> > interface is 1Gbit the switch ports are 100Mbit each. It is still
> > possible to keep the CPU interface busy by sending 1Gbit UDP stream, so
> > 900Mbit is dropped by the switch. I would like to have traffic limiter
> > per virtual DSA port on the SoC site to reduce the load on DSA conduit.
> > Currently it was not possible.
>
> Does the mac chip in your setup support TX shaping (e.g. via HTB or TBF)?
> If so, I guess you could do it via the ndo_setup_tc_conduit() callback.
HW offloading is an optional optimization. As a first step, it should be
possible to make this work without HW offloading.
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 23:13 ` Vladimir Oltean
@ 2024-12-17 9:11 ` Lorenzo Bianconi
2024-12-17 9:30 ` Vladimir Oltean
0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2024-12-17 9:11 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Lorenzo Bianconi, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
[-- Attachment #1: Type: text/plain, Size: 11264 bytes --]
> On Mon, Dec 16, 2024 at 11:28:18PM +0100, Lorenzo Bianconi wrote:
> > > ndo_setup_tc_conduit() does not have the same instruments to offload
> > > what port_setup_tc() can offload. It is not involved in all data paths
> > > that port_setup_tc() has to handle. Please ack this. So if port_setup_tc()
> >
> > Can you please elaborate on this? Both (->ndo_setup_tc_conduit() and
> > ->port_setup_tc()) refer to the same DSA user port (please take a look the
> > callback signature).
>
> I'd be just repeating what I've said a few times before. Your proposed
> ndo_setup_tc_conduit() appears to be configuring conduit resources
> (QDMA, GDM) for mt7530 user port tc offload, as if it is in complete and
> exclusive control of the user port data path. But as long as there are
> packets in the user port data path that bypass those conduit QoS resources
> (like for example mt7530 switch forwards packet from one port to another,
> bypassing the GDM1 in your drawing[1]), it isn't a good model. Forget
> about ndo_setup_tc_conduit(), it isn't a good tc command to run in the
> first place. The tc command you're trying to make to do what you want is
> supposed to _also_ include the mt7530 packets forwarded from one port to
> another in its QoS mix. It applies at the _egress_ of the mt7530 port.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23020f04932701d5c8363e60756f12b43b8ed752
>
> Let me try to add some squiggles based on your diagram, to clarify what
> is my understanding and complaint.
>
> ┌───────┐ ┌───────┐
> │ QDMA2 │ │ QDMA1 │
> └───┬───┘ └───┬───┘
> │ │
> ┌───────▼───────────────────────────────────────────▼────────┐
> │ │
> │ P5 P0 │
> │ │
> │ │
> │ │ ┌──────┐
> │ P3 ├────► GDM3 │
> │ │
> ┌─────┐ │ │
> │ PPE ◄────┤ P4 PSE │
> └─────┘ │ │
> │ │ ┌──────┐
> │ P9 ├────► GDM4 │
> │ │ └──────┘
> │ │
> │ P2 P1 │
> └─────────┬─────────────────────────────────────────┬────────┘
> │ │
> ┌───▼──┐ ┌──▼───┐
> │ GDM2 │ │ GDM1 │
> └──────┘ └──┬───┘
> │
> ┌──────────────▼───────────────┐
> │ CPU port │
> │ ┌─────────┘ │
> │ │ MT7530 │
> │ │ │
> │ ▼ x │
> │ ┌─────┐ ┌─┘ │
> │ lan1 lan2 lan3 lan4 │
> └───│──────────────────────────┘
> ▼
>
> When you add an offloaded Qdisc to the egress of lan1, the expectation
> is that packets from lan2 obey it too (offloaded tc goes hand in hand
> with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> breaking that expectation, because packets from lan2 bridged by MT7530
> don't go to GDM1 (the "x").
ack, I got your point. I was assuming this case (traffic from lan2 to
lan1) would be covered by keeping the port_setup_tc() callback in
dsa_user_setup_qdisc() (this traffic is not managed by the
ndo_setup_tc_conduit() callback). If this approach is not ok, I guess we
will need to revisit it.
>
> > > returns -EOPNOTSUPP, the entire dsa_user_setup_qdisc() should return
> > > -EOPNOTSUPP, UNLESS you install packet traps on all other offloaded data
> > > paths in the switch, such that all packets that egress the DSA user port
> > > are handled by ndo_setup_tc_conduit()'s instruments.
> >
> > Uhm, do you mean we are changing the user expected result in this way?
> > It seems to me the only case we are actually changing is if port_setup_tc()
> > callback is not supported by the DSA switch driver while ndo_setup_tc_conduit()
> > one is supported by the mac chip. In this case the previous implementation
> > returns -EOPNOTSUPP while the proposed one does not report any error.
> > Do we really care about this case? If so, I guess we can rework
> > dsa_user_setup_qdisc().
>
> See above, there's nothing to rework.
>
> > > > nope, I am not saying the current Qdisc DSA infrastructure is wrong, it just
> > > > does not allow to exploit all hw capabilities available on EN7581 when the
> > > > traffic is routed from the WAN port to a given DSA switch port.
> > >
> > > And I don't believe it should, in this way.
> >
> > Can you please elaborate on this? IIUC it seems even Oleksij has a use-case
> > for this.
>
> See above, I'm also waiting for Oleksij's answer but I don't expect you
> 2 to be talking about the same thing. If there's some common infrastructure
> to be shared, my understanding is it has nothing to do with ndo_setup_tc_conduit().
ack
>
> > > We need something as the root Qdisc of the conduit which exposes its
> > > hardware capabilities. I just assumed that would be a simple (and single)
> > > ETS, you can correct me if I am wrong.
> > >
> > > On conduit egress, what is the arbitration scheme between the traffic
> > > destined towards each DSA user port (a "channel", as the driver calls it)?
> > > How can this be best represented?
> >
> > The EN7581 supports up to 32 different 'channels' (each of them supports 8
> > different hw queues). You can define an ETS and/or TBF Qdisc for each channel.
> > My idea is to associate a channel with each DSA switch port, so the user can
> > define independent QoS policies for each DSA port (e.g. shape lan0 at 100Mbps,
> > apply ETS on lan1, ...) configuring the mac chip instead of the hw switch.
> > The kernel (if the traffic is not offloaded) or the PPE block (if the traffic
> > is offloaded) updates the channel and queue information in the DMA descriptor
> > (please take a look at [0] for the first case).
>
> But you call it a MAC chip because between the GDM1 and the MT7530 there's
> an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?
With "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
(what is managed by airoha_eth driver). There is no other chip in between
of GDM1 and MT7530 switch (sorry for the confusion).
> I'm asking again, are the channels completely independent of one another,
> or are they consuming shared bandwidth in a way that with your proposal
> is just not visible? If there is a GMII between the GDM1 and the MT7530,
> how come the bandwidth between the channels is not shared in any way?
Channels are logically independent.
GDM1 is connected to the MT7530 switch via a fixed-speed link (10Gbps, similar
to what we have for other MediaTek chipsets, like MT7988 [0]). The fixed link speed
is higher than the sum of the DSA port link speeds (on my development boards I have
4 DSA ports @ 1Gbps).
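To make this concrete, the kind of per-port setup this series is meant to
enable would look roughly like the following (just a sketch, interface names
hypothetical):

    # shape lan0 egress at 100Mbit via its QDMA channel
    tc qdisc add dev lan0 root tbf rate 100mbit burst 8kb latency 50ms

    # 8-band ETS on lan1: 2 strict-priority bands + 6 DRR bands
    tc qdisc add dev lan1 root ets bands 8 strict 2 \
            quanta 1600 3200 4800 6400 8000 9600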
> And if there is no GMII or similar MAC interface, we need to take 100
> steps back and discuss why was the DSA model chosen for this switch, and
> not a freeform switchdev driver where the conduit is not discrete?
>
> I'm not sure what to associate these channels with. Would it be wrong to
> think of each channel as a separate DSA conduit? Because for example there
> is API to customize the user port <-> conduit assignment.
>
> > > IIUC, in your patch set, you expose the conduit hardware QoS capabilities
> > > as if they can be perfectly virtualized among DSA user ports, and as if
> > > each DSA user port can have its own ETS root Qdisc, completely independent
> > > of each other, as if the packets do not serialize on the conduit <-> CPU
> > > port link, and as if that is not a bottleneck. Is that really the case?
> >
> > correct
>
> Very interesting but will need more than one word for an explanation :)
I mean, your paragraph above describes the hw architecture well.
>
> > > If so (but please explain how), maybe you really need your own root Qdisc
> > > driver, with one class per DSA user port, and those classes have ETS
> > > attached to them.
> >
> > Can you please clarify what you mean by 'root Qdisc driver'?
>
> Quite literally, an implementer of struct Qdisc_ops whose parent can
> only be TC_H_ROOT. I was implying you'd have to create an abstract
> software model of the QoS capabilities of the QDMA and the GDM port such
> that we all understand them, a netlink attribute scheme for configuring
> those QoS parameters, and then a QoS offload mechanism through which
> they are communicated to compatible hardware. But let's leave that aside
> until it becomes more clear what you have.
>
ack, fine.
Regards,
Lorenzo
[0] https://github.com/openwrt/openwrt/blob/main/target/linux/mediatek/files-6.6/arch/arm64/boot/dts/mediatek/mt7988a.dtsi#L1531
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 9:11 ` Lorenzo Bianconi
@ 2024-12-17 9:30 ` Vladimir Oltean
2024-12-17 10:01 ` Lorenzo Bianconi
0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-17 9:30 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 10:11:46AM +0100, Lorenzo Bianconi wrote:
> > When you add an offloaded Qdisc to the egress of lan1, the expectation
> > is that packets from lan2 obey it too (offloaded tc goes hand in hand
> > with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> > breaking that expectation, because packets from lan2 bridged by MT7530
> > don't go to GDM1 (the "x").
>
> ack, I got your point. I was planning to cover this case (traffic from lan2 to
> lan1) by keeping the port_setup_tc() callback in dsa_user_setup_qdisc() (this
> traffic is not managed by the ndo_setup_tc_conduit() callback). If this approach
> is not ok, I guess we will need to revisit it.
To offload QoS on the egress of a DSA user port:

port_setup_tc() is:
(a) necessary
(b) sufficient

ndo_setup_tc_conduit() is:
(a) unnecessary
(b) insufficient
> > But you call it a MAC chip because between the GDM1 and the MT7530 there's
> > an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?
>
> With "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
> (what is managed by airoha_eth driver). There is no other chip in between
> of GDM1 and MT7530 switch (sorry for the confusion).
The MT7530 is also on the same chip as the GDM1, correct?
> > I'm asking again, are the channels completely independent of one another,
> > or are they consuming shared bandwidth in a way that with your proposal
> > is just not visible? If there is a GMII between the GDM1 and the MT7530,
> > how come the bandwidth between the channels is not shared in any way?
>
> Channels are logically independent.
"Logically independent" does not mean "does not share resources", which
is what I asked.
> GDM1 is connected to the MT7530 switch via a fixed-speed link (10Gbps, similar
> to what we have for other MediaTek chipsets, like MT7988 [0]). The fixed link speed
> is higher than the sum of the DSA port link speeds (on my development boards I have
> 4 DSA ports @ 1Gbps).
And this fixed connection is a pair of internal Ethernet MACs, correct?
I see on MT7988 we do have the "pause" property, which would suggest so,
since flow control is a MAC level feature. I assume 10 Gbps in the
device tree means it is an XGMII really limited at that speed, and that
speed is not just for phylink compliance, right?
What if we push your example to the extreme, and say that the DSA user
ports also have 10 Gbps links? How independent are the QDMA channels in
this case? What arbitration algorithm will be used for QoS among user
ports, when the combined bandwidth exceeds the CPU port capacity?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-16 23:24 ` Andrew Lunn
@ 2024-12-17 9:38 ` Oleksij Rempel
2024-12-17 11:54 ` Vladimir Oltean
0 siblings, 1 reply; 33+ messages in thread
From: Oleksij Rempel @ 2024-12-17 9:38 UTC (permalink / raw)
To: Andrew Lunn
Cc: Lorenzo Bianconi, Vladimir Oltean, Oleksij Rempel, netdev, davem,
edumazet, kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 12:24:13AM +0100, Andrew Lunn wrote:
> > Considering patch [0], we are still offloading the Qdisc on the provided
> > DSA switch port (e.g. LANx) via the port_setup_tc() callback available in
> > dsa_user_setup_qdisc(), but we are also introducing the ndo_setup_tc_conduit()
> > callback in order to use the hw Qdisc capabilities available on the mac chip
> > (e.g. EN7581) for the routed traffic from WAN to LANx. We will still apply
> > the Qdisc defined on LANx for L2 traffic from LANy to LANx. Agree?
>
> I've not read all the details, so i could be getting something
> wrong. But let me point out the basics. Offloading is used to
> accelerate what Linux already supports in software. So forget about
> your hardware. How would i configure a bunch of e1000e cards connected
> to a software bridge to do what you want?
>
> There is no conduit interface in this, so i would not expect to
> explicitly configure a conduit interface. Maybe the offloading needs
> to implicitly configure the conduit, but that should be all hidden
> away from the user. But given the software bridge has no concept of a
> conduit, i doubt it.
>
> It could well be our model does not map to the hardware too well,
> leaving some bits unusable, but there is not much you can do about
> that, that is the Linux model, accelerate what Linux supports in
> software.
Hi,
You are absolutely correct that offloading should accelerate what Linux already
supports in software, and we need to respect this model. However, I’d like to
step back for a moment to clarify the underlying problem before focusing too
much on solutions.
### The Core Problem: Flow Control Limitations
1. **QoS and Flow Control:**
At the heart of proper QoS implementation lies flow control. Flow control
mechanisms exist at various levels:
- MAC-level signaling (e.g., pause frames)
- Queue management (e.g., stopping queues when the hardware is congested)
The typical Linux driver uses flow control signaling from the MAC (e.g.,
stopping queues) to coordinate traffic, and depending on the Qdisc, this
flow control can propagate up to user space applications.
2. **Challenges with DSA:**
In DSA, we lose direct **flow control communication** between:
- The host MAC
- The MAC of a DSA user port.
While internal flow control within the switch may still work, it does not
extend to the host. Specifically:
- Pause frames often affect **all priorities** and are not granular enough
for low-latency applications.
- The signaling from the MAC of the DSA user port to the host is either
**not supported** or is **disabled** (often through device tree
configuration).
### Why This Matters for QoS
For traffic flowing **from the host** to DSA user ports:
- Without proper flow control, congestion cannot be communicated back to the
host, leading to buffer overruns and degraded QoS.
- To address this, we need to compensate for the lack of flow control signaling
by applying traffic limits (or shaping).
### Approach: Applying Limits on the Conduit Interface
One way to solve this is by applying traffic shaping or limits directly on the
**conduit MAC**. However, this approach has significant complexity:
1. **Hardware-Specific Details:**
We would need deep hardware knowledge to set up traffic filters or dissectors
at the conduit level. This includes:
- Parsing **CPU tags** specific to the switch in use.
- Applying port-specific rules, some of which depend on **user port link
speed**.
2. **Admin Burden:**
Forcing network administrators to configure conduit-specific filters
manually increases complexity and goes against the existing DSA abstractions,
which are already well-integrated into the kernel.
### How Things Can Be Implemented
To address QoS for host-to-user port traffic in DSA, I see two possible
approaches:
#### 1. Apply Rules on the Conduit Port (Using `dst_port`)
In this approach, rules are applied to the **conduit interface**, and specific
user ports are matched using **port indices**.
# Conduit interface
tc qdisc add dev conduit0 clsact
# Match traffic for user port 1 (e.g., lan0)
tc filter add dev conduit0 egress flower dst_port 1 \
        action police rate 50mbit burst 5k drop
# Match traffic for user port 2 (e.g., lan1)
tc filter add dev conduit0 egress flower dst_port 2 \
        action police rate 30mbit burst 3k drop
#### 2. Apply Rules Directly on the User Ports (With Conduit Marker)
In this approach, rules are applied **directly to the user-facing DSA ports**
(e.g., `lan0`, `lan1`) with a **conduit-specific marker**. The kernel resolves
the mapping internally.
# Apply rules with conduit marker for user ports
tc qdisc add dev lan0 root tbf rate 50mbit burst 5k conduit-only
tc qdisc add dev lan1 root tbf rate 30mbit burst 3k conduit-only
Here:
- **`conduit-only`**: A marker (flag) indicating that the rule applies
specifically to **host-to-port traffic** and not to L2-forwarded traffic within
the switch.
### Recommendation
The second approach (**user port-based with `conduit-only` marker**) is cleaner
and more intuitive. It avoids exposing hardware details like port indices while
letting the kernel handle conduit-specific behavior transparently.
Best regards,
Oleksij
--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 9:30 ` Vladimir Oltean
@ 2024-12-17 10:01 ` Lorenzo Bianconi
2024-12-17 10:17 ` Vladimir Oltean
0 siblings, 1 reply; 33+ messages in thread
From: Lorenzo Bianconi @ 2024-12-17 10:01 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Lorenzo Bianconi, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
> On Tue, Dec 17, 2024 at 10:11:46AM +0100, Lorenzo Bianconi wrote:
> > > When you add an offloaded Qdisc to the egress of lan1, the expectation
> > > is that packets from lan2 obey it too (offloaded tc goes hand in hand
> > > with offloaded bridge). Whereas, by using GDM1/QDMA resources, you are
> > > breaking that expectation, because packets from lan2 bridged by MT7530
> > > don't go to GDM1 (the "x").
> >
> > ack, I got your point. I was planning to cover this case (traffic from lan2 to
> > lan1) by keeping the port_setup_tc() callback in dsa_user_setup_qdisc() (this
> > traffic is not managed by the ndo_setup_tc_conduit() callback). If this approach
> > is not ok, I guess we will need to revisit it.
>
> To offload QoS on the egress of a DSA user port:
>
> port_setup_tc() is:
> (a) necessary
> (b) sufficient
>
> ndo_setup_tc_conduit() is:
> (a) unnecessary
I agree it is unnecessary, but w/o it we must rely on the limited QoS capabilities
of the hw DSA switch. The goal of the series is just to exploit the enhanced QoS
capabilities available on the EN7581 SoC.
> (b) insufficient
>
> > > But you call it a MAC chip because between the GDM1 and the MT7530 there's
> > > an in-chip Ethernet MAC (GMII netlist), with a fixed packet rate, right?
> >
> > With "mac chip" I mean the set of PSE/PPE and QDMA blocks in the diagram
> > (what is managed by airoha_eth driver). There is no other chip in between
> > of GDM1 and MT7530 switch (sorry for the confusion).
>
> The MT7530 is also on the same chip as the GDM1, correct?
I think so, but I am not sure.
>
> > > I'm asking again, are the channels completely independent of one another,
> > > or are they consuming shared bandwidth in a way that with your proposal
> > > is just not visible? If there is a GMII between the GDM1 and the MT7530,
> > > how come the bandwidth between the channels is not shared in any way?
> >
> > Channels are logically independent.
>
> "Logically independent" does not mean "does not share resources", which
> is what I asked.
>
> > GDM1 is connected to the MT7530 switch via a fixed-speed link (10Gbps, similar
> > to what we have for other MediaTek chipsets, like MT7988 [0]). The fixed link speed
> > is higher than the sum of the DSA port link speeds (on my development boards I have
> > 4 DSA ports @ 1Gbps).
>
> And this fixed connection is a pair of internal Ethernet MACs, correct?
yes
> I see on MT7988 we do have the "pause" property, which would suggest so,
> since flow control is a MAC level feature. I assume 10 Gbps in the
> device tree means it is an XGMII really limited at that speed, and that
> speed is not just for phylink compliance, right?
I think so
>
> What if we push your example to the extreme, and say that the DSA user
> ports also have 10 Gbps links? How independent are the QDMA channels in
> this case? What arbitration algorithm will be used for QoS among user
> ports, when the combined bandwidth exceeds the CPU port capacity?
AFAIR there is even the possibility to configure inter-channel QoS on the chip,
like a Round Robin scheduler or Strict-Priority between channels.
Regards,
Lorenzo
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 10:01 ` Lorenzo Bianconi
@ 2024-12-17 10:17 ` Vladimir Oltean
2024-12-17 10:23 ` Oleksij Rempel
0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-17 10:17 UTC (permalink / raw)
To: Lorenzo Bianconi
Cc: Lorenzo Bianconi, Oleksij Rempel, netdev, andrew, davem, edumazet,
kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 11:01:28AM +0100, Lorenzo Bianconi wrote:
> > I agree it is unnecessary, but w/o it we must rely on the limited QoS capabilities
> > of the hw DSA switch. The goal of the series is just to exploit the enhanced QoS
> > capabilities available on the EN7581 SoC.
(...)
> > AFAIR there is even the possibility to configure inter-channel QoS on the chip,
> like a Round Robin scheduler or Strict-Priority between channels.
So the conclusion of both these statements is that it would be good to understand
the QoS algorithm among channels of the MAC chip, see if there is any classful
root Qdisc which can describe that algorithm, then offload it with ndo_setup_tc(),
then assign traffic to user ports probably using a software classifier + flowid.
The ETS Qdiscs which you are trying to add here should not be attached to the root
egress Qdisc of DSA user ports, but to the classes of this unknown conduit root
Qdisc. Agree?
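Roughly, the direction would look something like this (all names hypothetical;
a sketch under the assumption that DRR happens to describe the inter-channel
arbiter, not a working recipe):

    # classful root Qdisc on the conduit modelling the inter-channel scheduler
    tc qdisc add dev eth0 root handle 1: drr
    # one class per QDMA channel (i.e. per user port)
    tc class add dev eth0 parent 1: classid 1:1 drr quantum 5000
    tc class add dev eth0 parent 1: classid 1:2 drr quantum 5000
    # the per-channel ETS hangs off a class, not off the user port
    tc qdisc add dev eth0 parent 1:1 ets bands 8 strict 2 \
            quanta 1600 3200 4800 6400 8000 9600
    # software classifier steering traffic into the right class; the
    # per-user-port match itself is the open question discussed above
    tc filter add dev eth0 parent 1: matchall flowid 1:1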
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 10:17 ` Vladimir Oltean
@ 2024-12-17 10:23 ` Oleksij Rempel
0 siblings, 0 replies; 33+ messages in thread
From: Oleksij Rempel @ 2024-12-17 10:23 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Lorenzo Bianconi, Lorenzo Bianconi, Oleksij Rempel, netdev,
andrew, davem, edumazet, kuba, pabeni, horms, nbd, sean.wang,
Mark-MC.Lee, lorenzo.bianconi83
On Tue, Dec 17, 2024 at 12:17:24PM +0200, Vladimir Oltean wrote:
> On Tue, Dec 17, 2024 at 11:01:28AM +0100, Lorenzo Bianconi wrote:
> > I agree it is unnecessary, but w/o it we must rely on the limited QoS capabilities
> > of the hw DSA switch. The goal of the series is just to exploit the enhanced QoS
> > capabilities available on the EN7581 SoC.
> (...)
> > AFAIR there is even the possibility to configure inter-channel QoS on the chip,
> > like a Round Robin scheduler or Strict-Priority between channels.
>
> So the conclusion of both these statements is that it would be good to understand
> the QoS algorithm among channels of the MAC chip, see if there is any classful
> root Qdisc which can describe that algorithm, then offload it with ndo_setup_tc(),
> then assign traffic to user ports probably using a software classifier + flowid.
> The ETS Qdiscs which you are trying to add here should not be attached to the root
> egress Qdisc of DSA user ports, but to the classes of this unknown conduit root
> Qdisc. Agree?
I would love to have some kind of easy-to-use DSA user port classifiers.
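(Purely illustrative of the kind of thing that would be nice to have; a flower
key like this does not exist today:)

    tc filter add dev conduit0 egress flower dsa_port lan0 flowid 1:1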
--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 9:38 ` Oleksij Rempel
@ 2024-12-17 11:54 ` Vladimir Oltean
2024-12-17 12:22 ` Oleksij Rempel
0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-17 11:54 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Andrew Lunn, Lorenzo Bianconi, Oleksij Rempel, netdev, davem,
edumazet, kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 10:38:21AM +0100, Oleksij Rempel wrote:
> Hi,
>
> You are absolutely correct that offloading should accelerate what Linux already
> supports in software, and we need to respect this model. However, I’d like to
> step back for a moment to clarify the underlying problem before focusing too
> much on solutions.
>
> ### The Core Problem: Flow Control Limitations
>
> 1. **QoS and Flow Control:**
>
> At the heart of proper QoS implementation lies flow control. Flow control
> mechanisms exist at various levels:
>
> - MAC-level signaling (e.g., pause frames)
>
> - Queue management (e.g., stopping queues when the hardware is congested)
>
> The typical Linux driver uses flow control signaling from the MAC (e.g.,
> stopping queues) to coordinate traffic, and depending on the Qdisc, this
> flow control can propagate up to user space applications.
I read this section as "The Core Problem: Ethernet".
> 2. **Challenges with DSA:**
> In DSA, we lose direct **flow control communication** between:
>
> - The host MAC
>
> - The MAC of a DSA user port.
>
> While internal flow control within the switch may still work, it does not
> extend to the host. Specifically:
>
> - Pause frames often affect **all priorities** and are not granular enough
> for low-latency applications.
>
> - The signaling from the MAC of the DSA user port to the host is either
> **not supported** or is **disabled** (often through device tree
> configuration).
And this as: "Challenges with DSA: uses Ethernet". I think we can all
agree that standard Ethernet, with all the flexibility it gives to pair
any discrete DSA switch to any host NIC, does not give us sufficient
instruments for independent flow control of each user port.
Food for thought: strongly coupled MAC + integrated DSA switch systems,
like for example Broadcom, have custom methods of pacing transmission to
user ports by selectively stopping conduit TX queues associated with
those user ports on congestion:
https://lore.kernel.org/netdev/7510c29a-b60f-e0d7-4129-cb90fe376c74@gmail.com/
> ### Why This Matters for QoS
>
> For traffic flowing **from the host** to DSA user ports:
>
> - Without proper flow control, congestion cannot be communicated back to the
> host, leading to buffer overruns and degraded QoS.
There are multiple, and sometimes conflicting, goals and strategies for QoS
under congestion. Generally speaking, it is good to clarify that deterministic
latency, high throughput and zero loss cannot all be achieved at the same time. It is
also good to highlight the fact that you are focusing on zero loss and that
this is not necessarily the full picture. Some AVB/TSN switches, like SJA1105,
do not support pause frames at all, not even on user ports, because as you say,
it's like the nuclear solution which stops the entire port regardless of
packet priorities. And even if they did support it, for deterministic latency
applications it is best to turn it off. If you make a port enter congestion by
bombarding it with TC0 traffic, you'll incur latency to TC7 traffic until you
exit the congestion condition. These switches just expect to have reservations
very carefully configured by the system administrator. What exceeds reservations
and cannot consume shared resources (because they are temporarily depleted) is dropped.
> - To address this, we need to compensate for the lack of flow control signaling
> by applying traffic limits (or shaping).
A splendid idea in theory. In practice, the traffic rate at the egress
of a user port is the sum of locally injected traffic plus autonomously
forwarded traffic. The port can enter congestion even with shaping of
CPU-injected traffic at a certain rate.
>             Conduit
>                |
>                v
>   +-------------------------+
>   |        CPU port         |
>   |            |            |
>   |   +--------+            |
>   |   |                     |
>   |   +<---+                |
>   |   |    |                |
>   |   v    |                |
>   |  lan0 lan1 lan2 lan3    |
>   +-------------------------+
>       |
>       v Just 1Gbps.
You _could_ apply this technique to achieve a different purpose than
net zero packet loss: selective transmission guarantees for CPU-injected
traffic. But you also need to ensure that injected packets have a higher
strict priority than the rest, and that the switch resources are
configured through devlink-sb to have enough reserved space to keep
these high priority packets on congestion and drop something else instead.
It's a tool to have for sure, but you need to be extremely specific and
realistic about your goals.
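For example, the devlink-sb side of that might look along these lines (device
name, pool and threshold numbers entirely made up):

    # reserve a static egress pool for the high-priority traffic class
    devlink sb pool set pci/0000:01:00.0 sb 0 pool 1 size 65536 thtype static
    # bind TC 7 of port 0 to that pool so its packets survive congestion
    devlink sb tc bind set pci/0000:01:00.0/0 sb 0 tc 7 type egress pool 1 th 65536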
> ### Approach: Applying Limits on the Conduit Interface
>
> One way to solve this is by applying traffic shaping or limits directly on the
> **conduit MAC**. However, this approach has significant complexity:
>
> 1. **Hardware-Specific Details:**
>
> > We would need deep hardware knowledge to set up traffic filters or dissectors
> at the conduit level. This includes:
>
> - Parsing **CPU tags** specific to the switch in use.
>
> - Applying port-specific rules, some of which depend on **user port link
> speed**.
>
> 2. **Admin Burden:**
>
> Forcing network administrators to configure conduit-specific filters
> manually increases complexity and goes against the existing DSA abstractions,
> which are already well-integrated into the kernel.
Agree that there is high complexity. Just need to see a proposal which
acknowledges that it's not for nothing.
> ### How Things Can Be Implemented
>
> To address QoS for host-to-user port traffic in DSA, I see two possible
> approaches:
>
> #### 1. Apply Rules on the Conduit Port (Using `dst_port`)
>
> In this approach, rules are applied to the **conduit interface**, and specific
> user ports are matched using **port indices**.
>
> # Conduit interface
> tc qdisc add dev conduit0 clsact
>
> # Match traffic for user port 1 (e.g., lan0)
> tc filter add dev conduit0 egress flower dst_port 1 \
>         action police rate 50mbit burst 5k drop
>
> # Match traffic for user port 2 (e.g., lan1)
> tc filter add dev conduit0 egress flower dst_port 2 \
>         action police rate 30mbit burst 3k drop
Ok, so you propose an abstract key set for DSA in the flower classifier
with mappings to concrete packet fields happening in the backend,
probably done by the tagging protocol in use. The abstract key set
represents the superset of all known DSA fields, united by a common
interpretation, and each tagger rejects keys it cannot map to the
physical DSA tag.
I can immediately think of a challenge here, namely that we can dynamically
change the tagging protocol while tc rules are present, and this can
affect which flower keys can be mapped and which cannot. For example,
the ocelot tagging protocol could map a virtual DSA key "TX timestamp
type" to the REW_OP field, but the ocelot-8021q tagger cannot. Plus, you
could add tc filters to a block shared by multiple devices. You can't
always infer the physical tagging protocol from the device that the
filters are attached to.
> #### 2. Apply Rules Directly on the User Ports (With Conduit Marker)
>
> In this approach, rules are applied **directly to the user-facing DSA ports**
> (e.g., `lan0`, `lan1`) with a **conduit-specific marker**. The kernel resolves
> the mapping internally.
>
> # Apply rules with conduit marker for user ports
> tc qdisc add dev lan0 root tbf rate 50mbit burst 5k conduit-only
> tc qdisc add dev lan1 root tbf rate 30mbit burst 3k conduit-only
>
> Here:
> - **`conduit-only`**: A marker (flag) indicating that the rule applies
> specifically to **host-to-port traffic** and not to L2-forwarded traffic within
> the switch.
>
> ### Recommendation
>
> The second approach (**user port-based with `conduit-only` marker**) is cleaner
> and more intuitive. It avoids exposing hardware details like port indices while
> letting the kernel handle conduit-specific behavior transparently.
>
> Best regards,
> Oleksij
The second approach that you recommend suffers from the same problem as Lorenzo's
revised proposal, which is that it treats the conduit interface as a collection of
independent pipes of infinite capacity to each user port, with no arbitration concerns
of its own. The model is again great in theory, but maps really poorly onto real life.
Your proposal actively encourages users to look away from the scheduling algorithm
of the conduit, and just look at user ports in isolation from each other. I strongly
disagree with it.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 11:54 ` Vladimir Oltean
@ 2024-12-17 12:22 ` Oleksij Rempel
2024-12-17 13:28 ` Vladimir Oltean
0 siblings, 1 reply; 33+ messages in thread
From: Oleksij Rempel @ 2024-12-17 12:22 UTC (permalink / raw)
To: Vladimir Oltean
Cc: Andrew Lunn, Lorenzo Bianconi, Oleksij Rempel, netdev, davem,
edumazet, kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 01:54:48PM +0200, Vladimir Oltean wrote:
> On Tue, Dec 17, 2024 at 10:38:21AM +0100, Oleksij Rempel wrote:
> > Hi,
> >
> > You are absolutely correct that offloading should accelerate what Linux already
> > supports in software, and we need to respect this model. However, I’d like to
> > step back for a moment to clarify the underlying problem before focusing too
> > much on solutions.
> >
> > ### The Core Problem: Flow Control Limitations
> >
> > 1. **QoS and Flow Control:**
> >
> > At the heart of proper QoS implementation lies flow control. Flow control
> > mechanisms exist at various levels:
> >
> > - MAC-level signaling (e.g., pause frames)
> >
> > - Queue management (e.g., stopping queues when the hardware is congested)
> >
> > The typical Linux driver uses flow control signaling from the MAC (e.g.,
> > stopping queues) to coordinate traffic, and depending on the Qdisc, this
> > flow control can propagate up to user space applications.
>
> I read this section as "The Core Problem: Ethernet".
ack.
> > ### Why This Matters for QoS
> >
> > For traffic flowing **from the host** to DSA user ports:
> >
> > - Without proper flow control, congestion cannot be communicated back to the
> > host, leading to buffer overruns and degraded QoS.
>
> There are multiple, and sometimes conflicting, goals and strategies for QoS
> under congestion. Generally speaking, it is good to clarify that deterministic
> latency, high throughput and zero loss cannot all be achieved at the same time. It is
> also good to highlight the fact that you are focusing on zero loss and that
> this is not necessarily the full picture. Some AVB/TSN switches, like SJA1105,
> do not support pause frames at all, not even on user ports, because as you say,
> it's like the nuclear solution which stops the entire port regardless of
> packet priorities. And even if they did support it, for deterministic latency
> applications it is best to turn it off. If you make a port enter congestion by
> bombarding it with TC0 traffic, you'll incur latency to TC7 traffic until you
> exit the congestion condition. These switches just expect to have reservations
> very carefully configured by the system administrator. What exceeds reservations
> and cannot consume shared resources (because they are temporarily depleted) is dropped.
> > - To address this, we need to compensate for the lack of flow control signaling
> > by applying traffic limits (or shaping).
>
> A splendid idea in theory. In practice, the traffic rate at the egress
> of a user port is the sum of locally injected traffic plus autonomously
> forwarded traffic. The port can enter congestion even with shaping of
> CPU-injected traffic at a certain rate.
>
> Conduit
> |
> v
> +-------------------------+
> | CPU port |
> | | |
> | +--------+ |
> | | |
> | +<---+ |
> | | | |
> | v | |
> | lan0 lan1 lan2 lan3 |
> +-------------------------+
> |
> v Just 1Gbps.
>
> You _could_ apply this technique to achieve a different purpose than
> net zero packet loss: selective transmission guarantees for CPU-injected
> traffic. But you also need to ensure that injected packets have a higher
> strict priority than the rest, and that the switch resources are
> configured through devlink-sb to have enough reserved space to keep
> these high priority packets on congestion and drop something else instead.
>
> It's a tool to have for sure, but you need to be extremely specific and
> realistic about your goals.
Yes, you are right. For my specific use case the switch is used mostly as a port
multiplier.
> > #### 2. Apply Rules Directly on the User Ports (With Conduit Marker)
> >
> > In this approach, rules are applied **directly to the user-facing DSA ports**
> > (e.g., `lan0`, `lan1`) with a **conduit-specific marker**. The kernel resolves
> > the mapping internally.
> >
> > # Apply rules with conduit marker for user ports
> > tc qdisc add dev lan0 root tbf rate 50mbit burst 5k conduit-only
> > tc qdisc add dev lan1 root tbf rate 30mbit burst 3k conduit-only
> >
> > Here:
> > - **`conduit-only`**: A marker (flag) indicating that the rule applies
> > specifically to **host-to-port traffic** and not to L2-forwarded traffic within
> > the switch.
> >
> > ### Recommendation
> >
> > The second approach (**user port-based with `conduit-only` marker**) is cleaner
> > and more intuitive. It avoids exposing hardware details like port indices while
> > letting the kernel handle conduit-specific behavior transparently.
> >
> > Best regards,
> > Oleksij
>
> The second approach that you recommend suffers from the same problem as Lorenzo's
> revised proposal, which is that it treats the conduit interface as a collection of
> independent pipes of infinite capacity to each user port, with no arbitration concerns
> of its own. The model is again great in theory, but maps really poorly onto real life.
> Your proposal actively encourages users to look away from the scheduling algorithm
> of the conduit, and just look at user ports in isolation from each other. I strongly
> disagree with it.
I'm still thinking about the best way to classify DSA user port traffic.
Will it be enough to set a classid on the user port?
tc filter add dev lan0 protocol all flower skip_hw \
        classid 1:1
tc filter add dev lan1 protocol all flower skip_hw \
        classid 1:2
And then process it on the conduit port:
# Add HTB Qdisc on the conduit interface
tc qdisc add dev conduit0 root handle 1: htb default 1
# Define rate-limiting classes
tc class add dev conduit0 parent 1: classid 1:1 htb rate 100mbit burst 5k
tc class add dev conduit0 parent 1: classid 1:2 htb rate 100mbit burst 5k
Or will the classid not be transferred between devices, so I'll need to
use something like skbedit?
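Something like this, perhaps (untested sketch, relying on skb->priority
surviving the user port -> conduit hand-off, and on HTB classifying by
skb->priority when it matches one of its classids):

    tc qdisc add dev lan0 clsact
    tc filter add dev lan0 egress matchall skip_hw \
            action skbedit priority 1:1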
--
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC
2024-12-17 12:22 ` Oleksij Rempel
@ 2024-12-17 13:28 ` Vladimir Oltean
0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Oltean @ 2024-12-17 13:28 UTC (permalink / raw)
To: Oleksij Rempel
Cc: Andrew Lunn, Lorenzo Bianconi, Oleksij Rempel, netdev, davem,
edumazet, kuba, pabeni, horms, nbd, sean.wang, Mark-MC.Lee,
lorenzo.bianconi83
On Tue, Dec 17, 2024 at 01:22:45PM +0100, Oleksij Rempel wrote:
> I'm still thinking about the best way to classify DSA user port traffic.
> Will it be enough to set a classid on the user port?
>
> tc filter add dev lan0 protocol all flower skip_hw \
>         classid 1:1
> tc filter add dev lan1 protocol all flower skip_hw \
>         classid 1:2
>
> And then process it on the conduit port:
> # Add HTB Qdisc on the conduit interface
> tc qdisc add dev conduit0 root handle 1: htb default 1
>
> # Define rate-limiting classes
> tc class add dev conduit0 parent 1: classid 1:1 htb rate 100mbit burst 5k
> tc class add dev conduit0 parent 1: classid 1:2 htb rate 100mbit burst 5k
>
> Or will the classid not be transferred between devices, so I'll need to
> use something like skbedit?
I don't know; if you find out, let us know.
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-12-17 13:28 UTC | newest]

Thread overview: 33+ messages
2024-12-11 15:31 [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 1/5] net: airoha: Enable Tx drop capability for each Tx DMA ring Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 2/5] net: airoha: Introduce ndo_select_queue callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 3/5] net: dsa: Introduce ndo_setup_tc_conduit callback Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 4/5] net: airoha: Add sched ETS offload support Lorenzo Bianconi
2024-12-12 14:37 ` Davide Caratti
2024-12-12 17:04 ` Lorenzo Bianconi
2024-12-11 15:31 ` [RFC net-next 5/5] net: airoha: Add sched TBF " Lorenzo Bianconi
2024-12-11 15:41 ` [RFC net-next 0/5] Add ETS and TBF Qdisc offload for Airoha EN7581 SoC Vladimir Oltean
2024-12-12 9:19 ` Lorenzo Bianconi
2024-12-12 15:06 ` Vladimir Oltean
2024-12-12 17:03 ` Lorenzo Bianconi
2024-12-12 18:46 ` Vladimir Oltean
2024-12-16 12:09 ` Lorenzo Bianconi
2024-12-16 15:49 ` Vladimir Oltean
2024-12-16 18:14 ` Oleksij Rempel
2024-12-16 19:01 ` Lorenzo Bianconi
2024-12-16 19:23 ` Oleksij Rempel
2024-12-16 21:44 ` Lorenzo Bianconi
2024-12-17 8:46 ` Oleksij Rempel
2024-12-16 19:46 ` Vladimir Oltean
2024-12-16 22:28 ` Lorenzo Bianconi
2024-12-16 23:13 ` Vladimir Oltean
2024-12-17 9:11 ` Lorenzo Bianconi
2024-12-17 9:30 ` Vladimir Oltean
2024-12-17 10:01 ` Lorenzo Bianconi
2024-12-17 10:17 ` Vladimir Oltean
2024-12-17 10:23 ` Oleksij Rempel
2024-12-16 23:24 ` Andrew Lunn
2024-12-17 9:38 ` Oleksij Rempel
2024-12-17 11:54 ` Vladimir Oltean
2024-12-17 12:22 ` Oleksij Rempel
2024-12-17 13:28 ` Vladimir Oltean