linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 net-next 0/4] Add more features for ENETC v4 - round 1
@ 2024-12-13  2:17 Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC Wei Fang
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-13  2:17 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

Compared to ENETC v1 (LS1028A), ENETC v4 (i.MX95) adds more features, and
some features are configured completely differently from v1. To support
ENETC v4 more fully, these features will be added through several rounds
of patch sets. This round adds Tx checksum offload, increases the maximum
number of chained Tx BDs, and adds large send offload (LSO) and UDP
segmentation offload.

---
v1 Link: https://lore.kernel.org/imx/20241107033817.1654163-1-wei.fang@nxp.com/
v2 Link: https://lore.kernel.org/imx/20241111015216.1804534-1-wei.fang@nxp.com/
v3 Link: https://lore.kernel.org/imx/20241112091447.1850899-1-wei.fang@nxp.com/
v4 Link: https://lore.kernel.org/imx/20241115024744.1903377-1-wei.fang@nxp.com/
v5 Link: https://lore.kernel.org/imx/20241118060630.1956134-1-wei.fang@nxp.com/
v6 Link: https://lore.kernel.org/imx/20241119082344.2022830-1-wei.fang@nxp.com/
v6 RESEND Link: https://lore.kernel.org/imx/20241204052932.112446-1-wei.fang@nxp.com/
v7 Link: https://lore.kernel.org/imx/20241211063752.744975-1-wei.fang@nxp.com/
---

Wei Fang (4):
  net: enetc: add Tx checksum offload for i.MX95 ENETC
  net: enetc: update max chained Tx BD number for i.MX95 ENETC
  net: enetc: add LSO support for i.MX95 ENETC PF
  net: enetc: add UDP segmentation offload support

 drivers/net/ethernet/freescale/enetc/enetc.c  | 324 +++++++++++++++++-
 drivers/net/ethernet/freescale/enetc/enetc.h  |  30 +-
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  22 ++
 .../net/ethernet/freescale/enetc/enetc_hw.h   |  29 +-
 .../freescale/enetc/enetc_pf_common.c         |  13 +-
 .../net/ethernet/freescale/enetc/enetc_vf.c   |   7 +-
 6 files changed, 395 insertions(+), 30 deletions(-)

-- 
2.34.1



* [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC
  2024-12-13  2:17 [PATCH v8 net-next 0/4] Add more features for ENETC v4 - round 1 Wei Fang
@ 2024-12-13  2:17 ` Wei Fang
  2024-12-17 15:13   ` Alexander Lobakin
  2024-12-13  2:17 ` [PATCH v8 net-next 2/4] net: enetc: update max chained Tx BD number " Wei Fang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Wei Fang @ 2024-12-13  2:17 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

In addition to Rx checksum offload, i.MX95 ENETC also supports Tx
checksum offload, which is implemented through the Tx BD. To use it,
software needs to fill in auxiliary information in the Tx BD, such as the
IP version, the IP header offset and size, and whether the L4 protocol is
UDP or TCP.

As with Rx checksum offload, the Tx checksum offload capability is not
advertised in any register, so a tx_csum bit is added to struct
enetc_drvdata to indicate whether the device supports Tx checksum offload.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
v2: refine enetc_tx_csum_offload_check().
v3:
1. refine enetc_tx_csum_offload_check() and enetc_skb_is_tcp() through
skb->csum_offset instead of touching skb->data.
2. add enetc_skb_is_ipv6() helper function
v4: no changes
v5:
1. remove 'inline' from enetc_skb_is_ipv6() and enetc_skb_is_tcp().
2. temp_bd.ipcs does not need to be set because Linux always calculates
the IPv4 header checksum, so remove it.
3. simplify the setting of temp_bd.l3t.
4. remove the error log from the datapath
v6: no changes
v7:
1. Change the layout of enetc_tx_bd to fix the issue on big-endian
hosts.
2. Rebase the patch because the Rx checksum offload patch from v6 was
removed.
v8: no changes
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 54 ++++++++++++++++---
 drivers/net/ethernet/freescale/enetc/enetc.h  |  2 +
 .../net/ethernet/freescale/enetc/enetc_hw.h   | 15 ++++--
 .../freescale/enetc/enetc_pf_common.c         |  3 ++
 4 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 535969fa0fdb..b8ac680e46bd 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -146,6 +146,27 @@ static int enetc_ptp_parse(struct sk_buff *skb, u8 *udp,
 	return 0;
 }
 
+static bool enetc_tx_csum_offload_check(struct sk_buff *skb)
+{
+	switch (skb->csum_offset) {
+	case offsetof(struct tcphdr, check):
+	case offsetof(struct udphdr, check):
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool enetc_skb_is_ipv6(struct sk_buff *skb)
+{
+	return vlan_get_protocol(skb) == htons(ETH_P_IPV6);
+}
+
+static bool enetc_skb_is_tcp(struct sk_buff *skb)
+{
+	return skb->csum_offset == offsetof(struct tcphdr, check);
+}
+
 static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 {
 	bool do_vlan, do_onestep_tstamp = false, do_twostep_tstamp = false;
@@ -163,6 +184,30 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 	dma_addr_t dma;
 	u8 flags = 0;
 
+	enetc_clear_tx_bd(&temp_bd);
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		/* Can not support TSD and checksum offload at the same time */
+		if (priv->active_offloads & ENETC_F_TXCSUM &&
+		    enetc_tx_csum_offload_check(skb) && !tx_ring->tsd_enable) {
+			temp_bd.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START,
+						     skb_network_offset(skb));
+			temp_bd.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
+						     skb_network_header_len(skb) / 4);
+			temp_bd.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T,
+						      enetc_skb_is_ipv6(skb));
+			if (enetc_skb_is_tcp(skb))
+				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
+							    ENETC_TXBD_L4T_TCP);
+			else
+				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
+							    ENETC_TXBD_L4T_UDP);
+			flags |= ENETC_TXBD_FLAGS_CSUM_LSO | ENETC_TXBD_FLAGS_L4CS;
+		} else {
+			if (skb_checksum_help(skb))
+				return 0;
+		}
+	}
+
 	i = tx_ring->next_to_use;
 	txbd = ENETC_TXBD(*tx_ring, i);
 	prefetchw(txbd);
@@ -173,7 +218,6 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 
 	temp_bd.addr = cpu_to_le64(dma);
 	temp_bd.buf_len = cpu_to_le16(len);
-	temp_bd.lstatus = 0;
 
 	tx_swbd = &tx_ring->tx_swbd[i];
 	tx_swbd->dma = dma;
@@ -594,7 +638,7 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 {
 	struct enetc_ndev_priv *priv = netdev_priv(ndev);
 	struct enetc_bdr *tx_ring;
-	int count, err;
+	int count;
 
 	/* Queue one-step Sync packet if already locked */
 	if (skb->cb[0] & ENETC_F_TX_ONESTEP_SYNC_TSTAMP) {
@@ -627,11 +671,6 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 			return NETDEV_TX_BUSY;
 		}
 
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			err = skb_checksum_help(skb);
-			if (err)
-				goto drop_packet_err;
-		}
 		enetc_lock_mdio();
 		count = enetc_map_tx_buffs(tx_ring, skb);
 		enetc_unlock_mdio();
@@ -3274,6 +3313,7 @@ static const struct enetc_drvdata enetc_pf_data = {
 
 static const struct enetc_drvdata enetc4_pf_data = {
 	.sysclk_freq = ENETC_CLK_333M,
+	.tx_csum = 1,
 	.pmac_offset = ENETC4_PMAC_OFFSET,
 	.eth_ops = &enetc4_pf_ethtool_ops,
 };
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 72fa03dbc2dd..e82eb9a9137c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -234,6 +234,7 @@ enum enetc_errata {
 
 struct enetc_drvdata {
 	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
+	u8 tx_csum:1;
 	u64 sysclk_freq;
 	const struct ethtool_ops *eth_ops;
 };
@@ -341,6 +342,7 @@ enum enetc_active_offloads {
 	ENETC_F_QBV			= BIT(9),
 	ENETC_F_QCI			= BIT(10),
 	ENETC_F_QBU			= BIT(11),
+	ENETC_F_TXCSUM			= BIT(12),
 };
 
 enum enetc_flags_bit {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 55ba949230ff..0e259baf36ee 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -558,7 +558,16 @@ union enetc_tx_bd {
 		__le16 frm_len;
 		union {
 			struct {
-				u8 reserved[3];
+				u8 l3_aux0;
+#define ENETC_TX_BD_L3_START	GENMASK(6, 0)
+#define ENETC_TX_BD_IPCS	BIT(7)
+				u8 l3_aux1;
+#define ENETC_TX_BD_L3_HDR_LEN	GENMASK(6, 0)
+#define ENETC_TX_BD_L3T		BIT(7)
+				u8 l4_aux;
+#define ENETC_TX_BD_L4T		GENMASK(7, 5)
+#define ENETC_TXBD_L4T_UDP	1
+#define ENETC_TXBD_L4T_TCP	2
 				u8 flags;
 			}; /* default layout */
 			__le32 txstart;
@@ -582,10 +591,10 @@ union enetc_tx_bd {
 };
 
 enum enetc_txbd_flags {
-	ENETC_TXBD_FLAGS_RES0 = BIT(0), /* reserved */
+	ENETC_TXBD_FLAGS_L4CS = BIT(0), /* For ENETC 4.1 and later */
 	ENETC_TXBD_FLAGS_TSE = BIT(1),
 	ENETC_TXBD_FLAGS_W = BIT(2),
-	ENETC_TXBD_FLAGS_RES3 = BIT(3), /* reserved */
+	ENETC_TXBD_FLAGS_CSUM_LSO = BIT(3), /* For ENETC 4.1 and later */
 	ENETC_TXBD_FLAGS_TXSTART = BIT(4),
 	ENETC_TXBD_FLAGS_EX = BIT(6),
 	ENETC_TXBD_FLAGS_F = BIT(7)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 0eecfc833164..09f2d7ec44eb 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -119,6 +119,9 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 
 	ndev->priv_flags |= IFF_UNICAST_FLT;
 
+	if (si->drvdata->tx_csum)
+		priv->active_offloads |= ENETC_F_TXCSUM;
+
 	/* TODO: currently, i.MX95 ENETC driver does not support advanced features */
 	if (!is_enetc_rev1(si)) {
 		ndev->hw_features &= ~(NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_LOOPBACK);
-- 
2.34.1



* [PATCH v8 net-next 2/4] net: enetc: update max chained Tx BD number for i.MX95 ENETC
  2024-12-13  2:17 [PATCH v8 net-next 0/4] Add more features for ENETC v4 - round 1 Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC Wei Fang
@ 2024-12-13  2:17 ` Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 4/4] net: enetc: add UDP segmentation offload support Wei Fang
  3 siblings, 0 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-13  2:17 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

The maximum number of chained Tx BDs of the latest ENETC (i.MX95 ENETC,
rev 4.1) has been increased to 63. Since MAX_SKB_FRAGS ranges from 17 to
45, it is better to set ENETC4_MAX_SKB_FRAGS to MAX_SKB_FRAGS for i.MX95
ENETC and later revisions.

In addition, add max_frags to struct enetc_drvdata to indicate the
maximum number of chained BDs supported by the device, because LS1028A
and i.MX95 ENETC support different maximums.

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
---
v2:
1. Refine the commit message
2. Add Reviewed-by tag
v3 ~ v8: no changes
---
 drivers/net/ethernet/freescale/enetc/enetc.c        | 13 +++++++++----
 drivers/net/ethernet/freescale/enetc/enetc.h        | 13 +++++++++++--
 .../net/ethernet/freescale/enetc/enetc_pf_common.c  |  1 +
 drivers/net/ethernet/freescale/enetc/enetc_vf.c     |  1 +
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index b8ac680e46bd..09ca4223ff9d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -535,6 +535,7 @@ static void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso
 
 static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 {
+	struct enetc_ndev_priv *priv = netdev_priv(tx_ring->ndev);
 	int hdr_len, total_len, data_len;
 	struct enetc_tx_swbd *tx_swbd;
 	union enetc_tx_bd *txbd;
@@ -600,7 +601,7 @@ static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb
 			bd_data_num++;
 			tso_build_data(skb, &tso, size);
 
-			if (unlikely(bd_data_num >= ENETC_MAX_SKB_FRAGS && data_len))
+			if (unlikely(bd_data_num >= priv->max_frags && data_len))
 				goto err_chained_bd;
 		}
 
@@ -661,7 +662,7 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 		count = enetc_map_tx_tso_buffs(tx_ring, skb);
 		enetc_unlock_mdio();
 	} else {
-		if (unlikely(skb_shinfo(skb)->nr_frags > ENETC_MAX_SKB_FRAGS))
+		if (unlikely(skb_shinfo(skb)->nr_frags > priv->max_frags))
 			if (unlikely(skb_linearize(skb)))
 				goto drop_packet_err;
 
@@ -679,7 +680,7 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 	if (unlikely(!count))
 		goto drop_packet_err;
 
-	if (enetc_bd_unused(tx_ring) < ENETC_TXBDS_MAX_NEEDED)
+	if (enetc_bd_unused(tx_ring) < ENETC_TXBDS_MAX_NEEDED(priv->max_frags))
 		netif_stop_subqueue(ndev, tx_ring->index);
 
 	return NETDEV_TX_OK;
@@ -947,7 +948,8 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
 	if (unlikely(tx_frm_cnt && netif_carrier_ok(ndev) &&
 		     __netif_subqueue_stopped(ndev, tx_ring->index) &&
 		     !test_bit(ENETC_TX_DOWN, &priv->flags) &&
-		     (enetc_bd_unused(tx_ring) >= ENETC_TXBDS_MAX_NEEDED))) {
+		     (enetc_bd_unused(tx_ring) >=
+		      ENETC_TXBDS_MAX_NEEDED(priv->max_frags)))) {
 		netif_wake_subqueue(ndev, tx_ring->index);
 	}
 
@@ -3308,18 +3310,21 @@ EXPORT_SYMBOL_GPL(enetc_pci_remove);
 static const struct enetc_drvdata enetc_pf_data = {
 	.sysclk_freq = ENETC_CLK_400M,
 	.pmac_offset = ENETC_PMAC_OFFSET,
+	.max_frags = ENETC_MAX_SKB_FRAGS,
 	.eth_ops = &enetc_pf_ethtool_ops,
 };
 
 static const struct enetc_drvdata enetc4_pf_data = {
 	.sysclk_freq = ENETC_CLK_333M,
 	.tx_csum = 1,
+	.max_frags = ENETC4_MAX_SKB_FRAGS,
 	.pmac_offset = ENETC4_PMAC_OFFSET,
 	.eth_ops = &enetc4_pf_ethtool_ops,
 };
 
 static const struct enetc_drvdata enetc_vf_data = {
 	.sysclk_freq = ENETC_CLK_400M,
+	.max_frags = ENETC_MAX_SKB_FRAGS,
 	.eth_ops = &enetc_vf_ethtool_ops,
 };
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index e82eb9a9137c..1e680f0f5123 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -59,9 +59,16 @@ struct enetc_rx_swbd {
 
 /* ENETC overhead: optional extension BD + 1 BD gap */
 #define ENETC_TXBDS_NEEDED(val)	((val) + 2)
-/* max # of chained Tx BDs is 15, including head and extension BD */
+/* For LS1028A, max # of chained Tx BDs is 15, including head and
+ * extension BD.
+ */
 #define ENETC_MAX_SKB_FRAGS	13
-#define ENETC_TXBDS_MAX_NEEDED	ENETC_TXBDS_NEEDED(ENETC_MAX_SKB_FRAGS + 1)
+/* For ENETC v4 and later versions, max # of chained Tx BDs is 63,
+ * including head and extension BD, but the range of MAX_SKB_FRAGS
+ * is 17 ~ 45, so set ENETC4_MAX_SKB_FRAGS to MAX_SKB_FRAGS.
+ */
+#define ENETC4_MAX_SKB_FRAGS		MAX_SKB_FRAGS
+#define ENETC_TXBDS_MAX_NEEDED(x)	ENETC_TXBDS_NEEDED((x) + 1)
 
 struct enetc_ring_stats {
 	unsigned int packets;
@@ -235,6 +242,7 @@ enum enetc_errata {
 struct enetc_drvdata {
 	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
 	u8 tx_csum:1;
+	u8 max_frags;
 	u64 sysclk_freq;
 	const struct ethtool_ops *eth_ops;
 };
@@ -377,6 +385,7 @@ struct enetc_ndev_priv {
 	u16 msg_enable;
 
 	u8 preemptible_tcs;
+	u8 max_frags; /* The maximum number of BDs for fragments */
 
 	enum enetc_active_offloads active_offloads;
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 09f2d7ec44eb..00b73a948746 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -101,6 +101,7 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 
 	priv->msg_enable = (NETIF_MSG_WOL << 1) - 1;
 	priv->sysclk_freq = si->drvdata->sysclk_freq;
+	priv->max_frags = si->drvdata->max_frags;
 	ndev->netdev_ops = ndev_ops;
 	enetc_set_ethtool_ops(ndev);
 	ndev->watchdog_timeo = 5 * HZ;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index a5f8ce576b6e..63d78b2b8670 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -136,6 +136,7 @@ static void enetc_vf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 
 	priv->msg_enable = (NETIF_MSG_IFUP << 1) - 1;
 	priv->sysclk_freq = si->drvdata->sysclk_freq;
+	priv->max_frags = si->drvdata->max_frags;
 	ndev->netdev_ops = ndev_ops;
 	enetc_set_ethtool_ops(ndev);
 	ndev->watchdog_timeo = 5 * HZ;
-- 
2.34.1



* [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-13  2:17 [PATCH v8 net-next 0/4] Add more features for ENETC v4 - round 1 Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC Wei Fang
  2024-12-13  2:17 ` [PATCH v8 net-next 2/4] net: enetc: update max chained Tx BD number " Wei Fang
@ 2024-12-13  2:17 ` Wei Fang
  2024-12-17  9:20   ` Paolo Abeni
  2024-12-17 15:32   ` Alexander Lobakin
  2024-12-13  2:17 ` [PATCH v8 net-next 4/4] net: enetc: add UDP segmentation offload support Wei Fang
  3 siblings, 2 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-13  2:17 UTC
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

ENETC rev 4.1 supports large send offload (LSO), which segments large TCP
and UDP transmit units into multiple Ethernet frames. To support LSO,
software needs to fill in auxiliary information in the Tx BD, such as the
LSO header length, the frame length and the LSO maximum segment size.

TCP segmentation was tested with iperf3 at a 1Gbps link rate, and the CPU
utilization before and after applying the patch was compared with top.
LSO saves a significant number of CPU cycles compared to software TSO:
idle time rises from 85.7% to 94.5%, and softirq time drops from 9.7% to
2.6%.

Before applying the patch:
%Cpu(s):  0.1 us,  4.1 sy,  0.0 ni, 85.7 id,  0.0 wa,  0.5 hi,  9.7 si

After applying the patch:
%Cpu(s):  0.1 us,  2.3 sy,  0.0 ni, 94.5 id,  0.0 wa,  0.4 hi,  2.6 si

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
---
v2: no changes
v3: use enetc_skb_is_ipv6() helper function which is added in patch 2
v4: fix a typo
v5: no changes
v6: remove error logs from the datapath
v7: rebase the patch due to the layout change of enetc_tx_bd
v8: rebase the patch due to merge conflicts
---
 drivers/net/ethernet/freescale/enetc/enetc.c  | 257 +++++++++++++++++-
 drivers/net/ethernet/freescale/enetc/enetc.h  |  15 +
 .../net/ethernet/freescale/enetc/enetc4_hw.h  |  22 ++
 .../net/ethernet/freescale/enetc/enetc_hw.h   |  14 +-
 .../freescale/enetc/enetc_pf_common.c         |   3 +
 5 files changed, 301 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 09ca4223ff9d..41a3798c7564 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -533,6 +533,224 @@ static void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso
 	}
 }
 
+static inline int enetc_lso_count_descs(const struct sk_buff *skb)
+{
+	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
+	 * for linear area data but not include LSO header, namely
+	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
+	 */
+	return skb_shinfo(skb)->nr_frags + 4;
+}
+
+static int enetc_lso_get_hdr_len(const struct sk_buff *skb)
+{
+	int hdr_len, tlen;
+
+	tlen = skb_is_gso_tcp(skb) ? tcp_hdrlen(skb) : sizeof(struct udphdr);
+	hdr_len = skb_transport_offset(skb) + tlen;
+
+	return hdr_len;
+}
+
+static void enetc_lso_start(struct sk_buff *skb, struct enetc_lso_t *lso)
+{
+	lso->lso_seg_size = skb_shinfo(skb)->gso_size;
+	lso->ipv6 = enetc_skb_is_ipv6(skb);
+	lso->tcp = skb_is_gso_tcp(skb);
+	lso->l3_hdr_len = skb_network_header_len(skb);
+	lso->l3_start = skb_network_offset(skb);
+	lso->hdr_len = enetc_lso_get_hdr_len(skb);
+	lso->total_len = skb->len - lso->hdr_len;
+}
+
+static void enetc_lso_map_hdr(struct enetc_bdr *tx_ring, struct sk_buff *skb,
+			      int *i, struct enetc_lso_t *lso)
+{
+	union enetc_tx_bd txbd_tmp, *txbd;
+	struct enetc_tx_swbd *tx_swbd;
+	u16 frm_len, frm_len_ext;
+	u8 flags, e_flags = 0;
+	dma_addr_t addr;
+	char *hdr;
+
+	/* Get the first BD of the LSO BDs chain */
+	txbd = ENETC_TXBD(*tx_ring, *i);
+	tx_swbd = &tx_ring->tx_swbd[*i];
+	prefetchw(txbd);
+
+	/* Prepare LSO header: MAC + IP + TCP/UDP */
+	hdr = tx_ring->tso_headers + *i * TSO_HEADER_SIZE;
+	memcpy(hdr, skb->data, lso->hdr_len);
+	addr = tx_ring->tso_headers_dma + *i * TSO_HEADER_SIZE;
+
+	frm_len = lso->total_len & 0xffff;
+	frm_len_ext = (lso->total_len >> 16) & 0xf;
+
+	/* Set the flags of the first BD */
+	flags = ENETC_TXBD_FLAGS_EX | ENETC_TXBD_FLAGS_CSUM_LSO |
+		ENETC_TXBD_FLAGS_LSO | ENETC_TXBD_FLAGS_L4CS;
+
+	enetc_clear_tx_bd(&txbd_tmp);
+	txbd_tmp.addr = cpu_to_le64(addr);
+	txbd_tmp.hdr_len = cpu_to_le16(lso->hdr_len);
+
+	/* first BD needs frm_len and offload flags set */
+	txbd_tmp.frm_len = cpu_to_le16(frm_len);
+	txbd_tmp.flags = flags;
+
+	txbd_tmp.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START, lso->l3_start);
+	/* l3_hdr_size in 32-bits (4 bytes) */
+	txbd_tmp.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
+				      lso->l3_hdr_len / 4);
+	if (lso->ipv6)
+		txbd_tmp.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T, 1);
+	else
+		txbd_tmp.l3_aux0 |= FIELD_PREP(ENETC_TX_BD_IPCS, 1);
+
+	txbd_tmp.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T, lso->tcp ?
+				     ENETC_TXBD_L4T_TCP : ENETC_TXBD_L4T_UDP);
+
+	/* For the LSO header we do not set the dma address since
+	 * we do not want it unmapped when we do cleanup. We still
+	 * set len so that we count the bytes sent.
+	 */
+	tx_swbd->len = lso->hdr_len;
+	tx_swbd->do_twostep_tstamp = false;
+	tx_swbd->check_wb = false;
+
+	/* Actually write the header in the BD */
+	*txbd = txbd_tmp;
+
+	/* Get the next BD, and the next BD is extended BD */
+	enetc_bdr_idx_inc(tx_ring, i);
+	txbd = ENETC_TXBD(*tx_ring, *i);
+	tx_swbd = &tx_ring->tx_swbd[*i];
+	prefetchw(txbd);
+
+	enetc_clear_tx_bd(&txbd_tmp);
+	if (skb_vlan_tag_present(skb)) {
+		/* Setup the VLAN fields */
+		txbd_tmp.ext.vid = cpu_to_le16(skb_vlan_tag_get(skb));
+		txbd_tmp.ext.tpid = 0; /* < C-TAG */
+		e_flags = ENETC_TXBD_E_FLAGS_VLAN_INS;
+	}
+
+	/* Write the BD */
+	txbd_tmp.ext.e_flags = e_flags;
+	txbd_tmp.ext.lso_sg_size = cpu_to_le16(lso->lso_seg_size);
+	txbd_tmp.ext.frm_len_ext = cpu_to_le16(frm_len_ext);
+	*txbd = txbd_tmp;
+}
+
+static int enetc_lso_map_data(struct enetc_bdr *tx_ring, struct sk_buff *skb,
+			      int *i, struct enetc_lso_t *lso, int *count)
+{
+	union enetc_tx_bd txbd_tmp, *txbd = NULL;
+	struct enetc_tx_swbd *tx_swbd;
+	skb_frag_t *frag;
+	dma_addr_t dma;
+	u8 flags = 0;
+	int len, f;
+
+	len = skb_headlen(skb) - lso->hdr_len;
+	if (len > 0) {
+		dma = dma_map_single(tx_ring->dev, skb->data + lso->hdr_len,
+				     len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
+			return -ENOMEM;
+
+		enetc_bdr_idx_inc(tx_ring, i);
+		txbd = ENETC_TXBD(*tx_ring, *i);
+		tx_swbd = &tx_ring->tx_swbd[*i];
+		prefetchw(txbd);
+		*count += 1;
+
+		enetc_clear_tx_bd(&txbd_tmp);
+		txbd_tmp.addr = cpu_to_le64(dma);
+		txbd_tmp.buf_len = cpu_to_le16(len);
+
+		tx_swbd->dma = dma;
+		tx_swbd->len = len;
+		tx_swbd->is_dma_page = 0;
+		tx_swbd->dir = DMA_TO_DEVICE;
+	}
+
+	frag = &skb_shinfo(skb)->frags[0];
+	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++, frag++) {
+		if (txbd)
+			*txbd = txbd_tmp;
+
+		len = skb_frag_size(frag);
+		dma = skb_frag_dma_map(tx_ring->dev, frag, 0, len,
+				       DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
+			return -ENOMEM;
+
+		/* Get the next BD */
+		enetc_bdr_idx_inc(tx_ring, i);
+		txbd = ENETC_TXBD(*tx_ring, *i);
+		tx_swbd = &tx_ring->tx_swbd[*i];
+		prefetchw(txbd);
+		*count += 1;
+
+		enetc_clear_tx_bd(&txbd_tmp);
+		txbd_tmp.addr = cpu_to_le64(dma);
+		txbd_tmp.buf_len = cpu_to_le16(len);
+
+		tx_swbd->dma = dma;
+		tx_swbd->len = len;
+		tx_swbd->is_dma_page = 1;
+		tx_swbd->dir = DMA_TO_DEVICE;
+	}
+
+	/* Last BD needs 'F' bit set */
+	flags |= ENETC_TXBD_FLAGS_F;
+	txbd_tmp.flags = flags;
+	*txbd = txbd_tmp;
+
+	tx_swbd->is_eof = 1;
+	tx_swbd->skb = skb;
+
+	return 0;
+}
+
+static int enetc_lso_hw_offload(struct enetc_bdr *tx_ring, struct sk_buff *skb)
+{
+	struct enetc_tx_swbd *tx_swbd;
+	struct enetc_lso_t lso = {0};
+	int err, i, count = 0;
+
+	/* Initialize the LSO handler */
+	enetc_lso_start(skb, &lso);
+	i = tx_ring->next_to_use;
+
+	enetc_lso_map_hdr(tx_ring, skb, &i, &lso);
+	/* First BD and an extend BD */
+	count += 2;
+
+	err = enetc_lso_map_data(tx_ring, skb, &i, &lso, &count);
+	if (err)
+		goto dma_err;
+
+	/* Go to the next BD */
+	enetc_bdr_idx_inc(tx_ring, &i);
+	tx_ring->next_to_use = i;
+	enetc_update_tx_ring_tail(tx_ring);
+
+	return count;
+
+dma_err:
+	do {
+		tx_swbd = &tx_ring->tx_swbd[i];
+		enetc_free_tx_frame(tx_ring, tx_swbd);
+		if (i == 0)
+			i = tx_ring->bd_count;
+		i--;
+	} while (count--);
+
+	return 0;
+}
+
 static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
 {
 	struct enetc_ndev_priv *priv = netdev_priv(tx_ring->ndev);
@@ -653,14 +871,26 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
 	tx_ring = priv->tx_ring[skb->queue_mapping];
 
 	if (skb_is_gso(skb)) {
-		if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
-			netif_stop_subqueue(ndev, tx_ring->index);
-			return NETDEV_TX_BUSY;
-		}
+		/* LSO data unit lengths of up to 256KB are supported */
+		if (priv->active_offloads & ENETC_F_LSO &&
+		    (skb->len - enetc_lso_get_hdr_len(skb)) <=
+		    ENETC_LSO_MAX_DATA_LEN) {
+			if (enetc_bd_unused(tx_ring) < enetc_lso_count_descs(skb)) {
+				netif_stop_subqueue(ndev, tx_ring->index);
+				return NETDEV_TX_BUSY;
+			}
 
-		enetc_lock_mdio();
-		count = enetc_map_tx_tso_buffs(tx_ring, skb);
-		enetc_unlock_mdio();
+			count = enetc_lso_hw_offload(tx_ring, skb);
+		} else {
+			if (enetc_bd_unused(tx_ring) < tso_count_descs(skb)) {
+				netif_stop_subqueue(ndev, tx_ring->index);
+				return NETDEV_TX_BUSY;
+			}
+
+			enetc_lock_mdio();
+			count = enetc_map_tx_tso_buffs(tx_ring, skb);
+			enetc_unlock_mdio();
+		}
 	} else {
 		if (unlikely(skb_shinfo(skb)->nr_frags > priv->max_frags))
 			if (unlikely(skb_linearize(skb)))
@@ -1800,6 +2030,9 @@ void enetc_get_si_caps(struct enetc_si *si)
 		rss = enetc_rd(hw, ENETC_SIRSSCAPR);
 		si->num_rss = ENETC_SIRSSCAPR_GET_NUM_RSS(rss);
 	}
+
+	if (val & ENETC_SIPCAPR0_LSO)
+		si->hw_features |= ENETC_SI_F_LSO;
 }
 EXPORT_SYMBOL_GPL(enetc_get_si_caps);
 
@@ -2096,6 +2329,13 @@ static int enetc_setup_default_rss_table(struct enetc_si *si, int num_groups)
 	return 0;
 }
 
+static void enetc_set_lso_flags_mask(struct enetc_hw *hw)
+{
+	enetc_wr(hw, ENETC4_SILSOSFMR0,
+		 SILSOSFMR0_VAL_SET(TCP_NL_SEG_FLAGS_DMASK, TCP_NL_SEG_FLAGS_DMASK));
+	enetc_wr(hw, ENETC4_SILSOSFMR1, 0);
+}
+
 int enetc_configure_si(struct enetc_ndev_priv *priv)
 {
 	struct enetc_si *si = priv->si;
@@ -2109,6 +2349,9 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
 	/* enable SI */
 	enetc_wr(hw, ENETC_SIMR, ENETC_SIMR_EN);
 
+	if (si->hw_features & ENETC_SI_F_LSO)
+		enetc_set_lso_flags_mask(hw);
+
 	/* TODO: RSS support for i.MX95 will be supported later, and the
 	 * is_enetc_rev1() condition will be removed
 	 */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 1e680f0f5123..6db6b3eee45c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -41,6 +41,19 @@ struct enetc_tx_swbd {
 	u8 qbv_en:1;
 };
 
+struct enetc_lso_t {
+	bool	ipv6;
+	bool	tcp;
+	u8	l3_hdr_len;
+	u8	hdr_len; /* LSO header length */
+	u8	l3_start;
+	u16	lso_seg_size;
+	int	total_len; /* total data length, not include LSO header */
+};
+
+#define ENETC_1KB_SIZE			1024
+#define ENETC_LSO_MAX_DATA_LEN		(256 * ENETC_1KB_SIZE)
+
 #define ENETC_RX_MAXFRM_SIZE	ENETC_MAC_MAXFRM_SIZE
 #define ENETC_RXB_TRUESIZE	2048 /* PAGE_SIZE >> 1 */
 #define ENETC_RXB_PAD		NET_SKB_PAD /* add extra space if needed */
@@ -238,6 +251,7 @@ enum enetc_errata {
 #define ENETC_SI_F_PSFP BIT(0)
 #define ENETC_SI_F_QBV  BIT(1)
 #define ENETC_SI_F_QBU  BIT(2)
+#define ENETC_SI_F_LSO	BIT(3)
 
 struct enetc_drvdata {
 	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
@@ -351,6 +365,7 @@ enum enetc_active_offloads {
 	ENETC_F_QCI			= BIT(10),
 	ENETC_F_QBU			= BIT(11),
 	ENETC_F_TXCSUM			= BIT(12),
+	ENETC_F_LSO			= BIT(13),
 };
 
 enum enetc_flags_bit {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
index 26b220677448..cdde8e93a73c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
@@ -12,6 +12,28 @@
 #define NXP_ENETC_VENDOR_ID		0x1131
 #define NXP_ENETC_PF_DEV_ID		0xe101
 
+/**********************Station interface registers************************/
+/* Station interface LSO segmentation flag mask register 0/1 */
+#define ENETC4_SILSOSFMR0		0x1300
+#define  SILSOSFMR0_TCP_MID_SEG		GENMASK(27, 16)
+#define  SILSOSFMR0_TCP_1ST_SEG		GENMASK(11, 0)
+#define  SILSOSFMR0_VAL_SET(first, mid)	((((mid) << 16) & SILSOSFMR0_TCP_MID_SEG) | \
+					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
+
+#define ENETC4_SILSOSFMR1		0x1304
+#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
+#define   TCP_FLAGS_FIN			BIT(0)
+#define   TCP_FLAGS_SYN			BIT(1)
+#define   TCP_FLAGS_RST			BIT(2)
+#define   TCP_FLAGS_PSH			BIT(3)
+#define   TCP_FLAGS_ACK			BIT(4)
+#define   TCP_FLAGS_URG			BIT(5)
+#define   TCP_FLAGS_ECE			BIT(6)
+#define   TCP_FLAGS_CWR			BIT(7)
+#define   TCP_FLAGS_NS			BIT(8)
+/* According to tso_build_hdr(), clear all special flags for not last packet. */
+#define TCP_NL_SEG_FLAGS_DMASK		(TCP_FLAGS_FIN | TCP_FLAGS_RST | TCP_FLAGS_PSH)
+
 /***************************ENETC port registers**************************/
 #define ENETC4_ECAPR0			0x0
 #define  ECAPR0_RFS			BIT(2)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 0e259baf36ee..c3789868e9eb 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -25,6 +25,7 @@
 #define ENETC_SIPCAPR0	0x20
 #define ENETC_SIPCAPR0_RSS	BIT(8)
 #define ENETC_SIPCAPR0_RFS	BIT(2)
+#define ENETC_SIPCAPR0_LSO	BIT(1)
 #define ENETC_SIPCAPR1	0x24
 #define ENETC_SITGTGR	0x30
 #define ENETC_SIRBGCR	0x38
@@ -554,7 +555,10 @@ static inline u64 _enetc_rd_reg64_wa(void __iomem *reg)
 union enetc_tx_bd {
 	struct {
 		__le64 addr;
-		__le16 buf_len;
+		union {
+			__le16 buf_len;
+			__le16 hdr_len;	/* For LSO, ENETC 4.1 and later */
+		};
 		__le16 frm_len;
 		union {
 			struct {
@@ -578,13 +582,16 @@ union enetc_tx_bd {
 		__le32 tstamp;
 		__le16 tpid;
 		__le16 vid;
-		u8 reserved[6];
+		__le16 lso_sg_size; /* For ENETC 4.1 and later */
+		__le16 frm_len_ext; /* For ENETC 4.1 and later */
+		u8 reserved[2];
 		u8 e_flags;
 		u8 flags;
 	} ext; /* Tx BD extension */
 	struct {
 		__le32 tstamp;
-		u8 reserved[10];
+		u8 reserved[8];
+		__le16 lso_err_count; /* For ENETC 4.1 and later */
 		u8 status;
 		u8 flags;
 	} wb; /* writeback descriptor */
@@ -593,6 +600,7 @@ union enetc_tx_bd {
 enum enetc_txbd_flags {
 	ENETC_TXBD_FLAGS_L4CS = BIT(0), /* For ENETC 4.1 and later */
 	ENETC_TXBD_FLAGS_TSE = BIT(1),
+	ENETC_TXBD_FLAGS_LSO = BIT(1), /* For ENETC 4.1 and later */
 	ENETC_TXBD_FLAGS_W = BIT(2),
 	ENETC_TXBD_FLAGS_CSUM_LSO = BIT(3), /* For ENETC 4.1 and later */
 	ENETC_TXBD_FLAGS_TXSTART = BIT(4),
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 00b73a948746..31dedc665a16 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -123,6 +123,9 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 	if (si->drvdata->tx_csum)
 		priv->active_offloads |= ENETC_F_TXCSUM;
 
+	if (si->hw_features & ENETC_SI_F_LSO)
+		priv->active_offloads |= ENETC_F_LSO;
+
 	/* TODO: currently, i.MX95 ENETC driver does not support advanced features */
 	if (!is_enetc_rev1(si)) {
 		ndev->hw_features &= ~(NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_LOOPBACK);
-- 
2.34.1



* [PATCH v8 net-next 4/4] net: enetc: add UDP segmentation offload support
  2024-12-13  2:17 [PATCH v8 net-next 0/4] Add more feautues for ENETC v4 - round 1 Wei Fang
                   ` (2 preceding siblings ...)
  2024-12-13  2:17 ` [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF Wei Fang
@ 2024-12-13  2:17 ` Wei Fang
  3 siblings, 0 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-13  2:17 UTC (permalink / raw)
  To: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

Set the NETIF_F_GSO_UDP_L4 bit in hw_features and features, because both
the i.MX95 ENETC and LS1028A ENETC drivers implement UDP segmentation:

- i.MX95 ENETC supports UDP segmentation via LSO.
- LS1028A ENETC has supported UDP segmentation since commit 3d5b459ba0e3
("net: tso: add UDP segmentation support").

Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
---
v2: rephrase the commit message
v3: no changes
v4: fix typo in commit message
v5 ~ v8: no changes
---
 drivers/net/ethernet/freescale/enetc/enetc_pf_common.c | 6 ++++--
 drivers/net/ethernet/freescale/enetc/enetc_vf.c        | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
index 31dedc665a16..3fd9b0727875 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf_common.c
@@ -110,11 +110,13 @@ void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 	ndev->hw_features = NETIF_F_SG | NETIF_F_RXCSUM |
 			    NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
 			    NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_LOOPBACK |
-			    NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6;
+			    NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6 |
+			    NETIF_F_GSO_UDP_L4;
 	ndev->features = NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_RXCSUM |
 			 NETIF_F_HW_VLAN_CTAG_TX |
 			 NETIF_F_HW_VLAN_CTAG_RX |
-			 NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6;
+			 NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6 |
+			 NETIF_F_GSO_UDP_L4;
 	ndev->vlan_features = NETIF_F_SG | NETIF_F_HW_CSUM |
 			      NETIF_F_TSO | NETIF_F_TSO6;
 
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_vf.c b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
index 63d78b2b8670..3768752b6008 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_vf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_vf.c
@@ -145,11 +145,13 @@ static void enetc_vf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
 	ndev->hw_features = NETIF_F_SG | NETIF_F_RXCSUM |
 			    NETIF_F_HW_VLAN_CTAG_TX |
 			    NETIF_F_HW_VLAN_CTAG_RX |
-			    NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6;
+			    NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6 |
+			    NETIF_F_GSO_UDP_L4;
 	ndev->features = NETIF_F_HIGHDMA | NETIF_F_SG | NETIF_F_RXCSUM |
 			 NETIF_F_HW_VLAN_CTAG_TX |
 			 NETIF_F_HW_VLAN_CTAG_RX |
-			 NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6;
+			 NETIF_F_HW_CSUM | NETIF_F_TSO | NETIF_F_TSO6 |
+			 NETIF_F_GSO_UDP_L4;
 	ndev->vlan_features = NETIF_F_SG | NETIF_F_HW_CSUM |
 			      NETIF_F_TSO | NETIF_F_TSO6;
 
-- 
2.34.1



* Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-13  2:17 ` [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF Wei Fang
@ 2024-12-17  9:20   ` Paolo Abeni
  2024-12-17 12:52     ` Wei Fang
  2024-12-17 15:32   ` Alexander Lobakin
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-12-17  9:20 UTC (permalink / raw)
  To: Wei Fang, claudiu.manoil, vladimir.oltean, xiaoning.wang,
	andrew+netdev, davem, edumazet, kuba, frank.li, horms, idosch
  Cc: netdev, linux-kernel, imx

On 12/13/24 03:17, Wei Fang wrote:
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 09ca4223ff9d..41a3798c7564 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -533,6 +533,224 @@ static void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso
>  	}
>  }
>  
> +static inline int enetc_lso_count_descs(const struct sk_buff *skb)

Please, don't use inline in c files

> +{
> +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
> +	 * for linear area data but not include LSO header, namely
> +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
> +	 */
> +	return skb_shinfo(skb)->nr_frags + 4;
> +}
> +static int enetc_lso_hw_offload(struct enetc_bdr *tx_ring, struct sk_buff *skb)
> +{
> +	struct enetc_tx_swbd *tx_swbd;
> +	struct enetc_lso_t lso = {0};
> +	int err, i, count = 0;
> +
> +	/* Initialize the LSO handler */
> +	enetc_lso_start(skb, &lso);
> +	i = tx_ring->next_to_use;
> +
> +	enetc_lso_map_hdr(tx_ring, skb, &i, &lso);
> +	/* First BD and an extend BD */
> +	count += 2;
> +
> +	err = enetc_lso_map_data(tx_ring, skb, &i, &lso, &count);
> +	if (err)
> +		goto dma_err;
> +
> +	/* Go to the next BD */
> +	enetc_bdr_idx_inc(tx_ring, &i);
> +	tx_ring->next_to_use = i;
> +	enetc_update_tx_ring_tail(tx_ring);
> +
> +	return count;
> +
> +dma_err:
> +	do {
> +		tx_swbd = &tx_ring->tx_swbd[i];
> +		enetc_free_tx_frame(tx_ring, tx_swbd);
> +		if (i == 0)
> +			i = tx_ring->bd_count;
> +		i--;
> +	} while (count--);
> +
> +	return 0;
> +}

I'm sorry for not catching the issue early, but apparently there is an
off-by-one in the above loop: if 'count' bds have been used, it will
attempt to free 'count + 1' of them.

The minimal fix should be using:

	} while (--count);

/P



* RE: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-17  9:20   ` Paolo Abeni
@ 2024-12-17 12:52     ` Wei Fang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-17 12:52 UTC (permalink / raw)
  To: Paolo Abeni, Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, Frank Li, horms@kernel.org, idosch@idosch.org
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	imx@lists.linux.dev

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: December 17, 2024 17:20
> To: Wei Fang <wei.fang@nxp.com>; Claudiu Manoil
> <claudiu.manoil@nxp.com>; Vladimir Oltean <vladimir.oltean@nxp.com>;
> Clark Wang <xiaoning.wang@nxp.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org; Frank Li
> <frank.li@nxp.com>; horms@kernel.org; idosch@idosch.org
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; imx@lists.linux.dev
> Subject: Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95
> ENETC PF
> 
> On 12/13/24 03:17, Wei Fang wrote:
> > diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c
> b/drivers/net/ethernet/freescale/enetc/enetc.c
> > index 09ca4223ff9d..41a3798c7564 100644
> > --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> > +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> > @@ -533,6 +533,224 @@ static void enetc_tso_complete_csum(struct
> enetc_bdr *tx_ring, struct tso_t *tso
> >  	}
> >  }
> >
> > +static inline int enetc_lso_count_descs(const struct sk_buff *skb)
> 
> Please, don't use inline in c files
> 
> > +{
> > +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
> > +	 * for linear area data but not include LSO header, namely
> > +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
> > +	 */
> > +	return skb_shinfo(skb)->nr_frags + 4;
> > +}
> > +static int enetc_lso_hw_offload(struct enetc_bdr *tx_ring, struct sk_buff
> *skb)
> > +{
> > +	struct enetc_tx_swbd *tx_swbd;
> > +	struct enetc_lso_t lso = {0};
> > +	int err, i, count = 0;
> > +
> > +	/* Initialize the LSO handler */
> > +	enetc_lso_start(skb, &lso);
> > +	i = tx_ring->next_to_use;
> > +
> > +	enetc_lso_map_hdr(tx_ring, skb, &i, &lso);
> > +	/* First BD and an extend BD */
> > +	count += 2;
> > +
> > +	err = enetc_lso_map_data(tx_ring, skb, &i, &lso, &count);
> > +	if (err)
> > +		goto dma_err;
> > +
> > +	/* Go to the next BD */
> > +	enetc_bdr_idx_inc(tx_ring, &i);
> > +	tx_ring->next_to_use = i;
> > +	enetc_update_tx_ring_tail(tx_ring);
> > +
> > +	return count;
> > +
> > +dma_err:
> > +	do {
> > +		tx_swbd = &tx_ring->tx_swbd[i];
> > +		enetc_free_tx_frame(tx_ring, tx_swbd);
> > +		if (i == 0)
> > +			i = tx_ring->bd_count;
> > +		i--;
> > +	} while (count--);
> > +
> > +	return 0;
> > +}
> 
> I'm sorry for not catching the issue early, but apparently there is an
> off-by-one in the above loop: if 'count' bds have been used, it will
> attempt to free 'count + 1' of them.
> 
> The minimal fix should be using:
> 
> 	} while (--count);
> 
> /P

Thanks for the reminder, I will fix it. :)


* Re: [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC
  2024-12-13  2:17 ` [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC Wei Fang
@ 2024-12-17 15:13   ` Alexander Lobakin
  2024-12-18  1:53     ` Wei Fang
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2024-12-17 15:13 UTC (permalink / raw)
  To: Wei Fang
  Cc: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch, netdev,
	linux-kernel, imx

From: Wei Fang <wei.fang@nxp.com>
Date: Fri, 13 Dec 2024 10:17:28 +0800

> In addition to supporting Rx checksum offload, i.MX95 ENETC also supports
> Tx checksum offload. The transmit checksum offload is implemented through
> the Tx BD. To support Tx checksum offload, software needs to fill some
> auxiliary information in Tx BD, such as IP version, IP header offset and
> size, whether L4 is UDP or TCP, etc.
> 
> Same as Rx checksum offload, Tx checksum offload capability isn't defined
> in register, so tx_csum bit is added to struct enetc_drvdata to indicate
> whether the device supports Tx checksum offload.

[...]

> @@ -163,6 +184,30 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
>  	dma_addr_t dma;
>  	u8 flags = 0;
>  
> +	enetc_clear_tx_bd(&temp_bd);
> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +		/* Can not support TSD and checksum offload at the same time */
> +		if (priv->active_offloads & ENETC_F_TXCSUM &&
> +		    enetc_tx_csum_offload_check(skb) && !tx_ring->tsd_enable) {
> +			temp_bd.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START,
> +						     skb_network_offset(skb));
> +			temp_bd.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
> +						     skb_network_header_len(skb) / 4);
> +			temp_bd.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T,
> +						      enetc_skb_is_ipv6(skb));
> +			if (enetc_skb_is_tcp(skb))
> +				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
> +							    ENETC_TXBD_L4T_TCP);
> +			else
> +				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
> +							    ENETC_TXBD_L4T_UDP);
> +			flags |= ENETC_TXBD_FLAGS_CSUM_LSO | ENETC_TXBD_FLAGS_L4CS;
> +		} else {
> +			if (skb_checksum_help(skb))

Why not

		} else if (skb_checksum_help(skb)) {

?

> +				return 0;
> +		}
> +	}
> +
>  	i = tx_ring->next_to_use;
>  	txbd = ENETC_TXBD(*tx_ring, i);
>  	prefetchw(txbd);
> @@ -173,7 +218,6 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)
>  
>  	temp_bd.addr = cpu_to_le64(dma);
>  	temp_bd.buf_len = cpu_to_le16(len);
> -	temp_bd.lstatus = 0;

Why is this removed and how is this change related to the checksum offload?

>  
>  	tx_swbd = &tx_ring->tx_swbd[i];
>  	tx_swbd->dma = dma;
> @@ -594,7 +638,7 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
>  {
>  	struct enetc_ndev_priv *priv = netdev_priv(ndev);
>  	struct enetc_bdr *tx_ring;
> -	int count, err;
> +	int count;
>  
>  	/* Queue one-step Sync packet if already locked */
>  	if (skb->cb[0] & ENETC_F_TX_ONESTEP_SYNC_TSTAMP) {
> @@ -627,11 +671,6 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff *skb,
>  			return NETDEV_TX_BUSY;
>  		}
>  
> -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -			err = skb_checksum_help(skb);
> -			if (err)
> -				goto drop_packet_err;
> -		}
>  		enetc_lock_mdio();
>  		count = enetc_map_tx_buffs(tx_ring, skb);
>  		enetc_unlock_mdio();
> @@ -3274,6 +3313,7 @@ static const struct enetc_drvdata enetc_pf_data = {
>  
>  static const struct enetc_drvdata enetc4_pf_data = {
>  	.sysclk_freq = ENETC_CLK_333M,
> +	.tx_csum = 1,

Maybe make it `bool tx_csum:1` instead of u8 and assign `true` here?

>  	.pmac_offset = ENETC4_PMAC_OFFSET,
>  	.eth_ops = &enetc4_pf_ethtool_ops,
>  };
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index 72fa03dbc2dd..e82eb9a9137c 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -234,6 +234,7 @@ enum enetc_errata {
>  
>  struct enetc_drvdata {
>  	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
> +	u8 tx_csum:1;
>  	u64 sysclk_freq;
>  	const struct ethtool_ops *eth_ops;
>  };

Thanks,
Olek


* Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-13  2:17 ` [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF Wei Fang
  2024-12-17  9:20   ` Paolo Abeni
@ 2024-12-17 15:32   ` Alexander Lobakin
  2024-12-18  3:06     ` Wei Fang
  1 sibling, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2024-12-17 15:32 UTC (permalink / raw)
  To: Wei Fang
  Cc: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, frank.li, horms, idosch, netdev,
	linux-kernel, imx

From: Wei Fang <wei.fang@nxp.com>
Date: Fri, 13 Dec 2024 10:17:30 +0800

> ENETC rev 4.1 supports large send offload (LSO), segmenting large TCP
> and UDP transmit units into multiple Ethernet frames. To support LSO,
> software needs to fill some auxiliary information in Tx BD, such as LSO
> header length, frame length, LSO maximum segment size, etc.
> 
> At 1Gbps link rate, TCP segmentation was tested using iperf3, and the
> CPU performance before and after applying the patch was compared through
> the top command. It can be seen that LSO saves a significant amount of
> CPU cycles compared to software TSO.
> 
> Before applying the patch:
> %Cpu(s):  0.1 us,  4.1 sy,  0.0 ni, 85.7 id,  0.0 wa,  0.5 hi,  9.7 si
> 
> After applying the patch:
> %Cpu(s):  0.1 us,  2.3 sy,  0.0 ni, 94.5 id,  0.0 wa,  0.4 hi,  2.6 si
> 
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
> ---
> v2: no changes
> v3: use enetc_skb_is_ipv6() helper function which is added in patch 2
> v4: fix a typo
> v5: no changes
> v6: remove error logs from the datapath
> v7: rebase the patch due to the layout change of enetc_tx_bd
> v8: rebase the patch due to merge conflicts
> ---
>  drivers/net/ethernet/freescale/enetc/enetc.c  | 257 +++++++++++++++++-
>  drivers/net/ethernet/freescale/enetc/enetc.h  |  15 +
>  .../net/ethernet/freescale/enetc/enetc4_hw.h  |  22 ++
>  .../net/ethernet/freescale/enetc/enetc_hw.h   |  14 +-
>  .../freescale/enetc/enetc_pf_common.c         |   3 +
>  5 files changed, 301 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 09ca4223ff9d..41a3798c7564 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -533,6 +533,224 @@ static void enetc_tso_complete_csum(struct enetc_bdr *tx_ring, struct tso_t *tso
>  	}
>  }
>  
> +static inline int enetc_lso_count_descs(const struct sk_buff *skb)
> +{
> +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
> +	 * for linear area data but not include LSO header, namely
> +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.

What if the head contains headers only and
`skb_headlen(skb) - lso_hdr_len` is 0?

> +	 */
> +	return skb_shinfo(skb)->nr_frags + 4;
> +}
> +
> +static int enetc_lso_get_hdr_len(const struct sk_buff *skb)
> +{
> +	int hdr_len, tlen;
> +
> +	tlen = skb_is_gso_tcp(skb) ? tcp_hdrlen(skb) : sizeof(struct udphdr);
> +	hdr_len = skb_transport_offset(skb) + tlen;
> +
> +	return hdr_len;
> +}

Are you sure the kernel doesn't have similar generic helpers?

> +
> +static void enetc_lso_start(struct sk_buff *skb, struct enetc_lso_t *lso)
> +{
> +	lso->lso_seg_size = skb_shinfo(skb)->gso_size;
> +	lso->ipv6 = enetc_skb_is_ipv6(skb);
> +	lso->tcp = skb_is_gso_tcp(skb);
> +	lso->l3_hdr_len = skb_network_header_len(skb);
> +	lso->l3_start = skb_network_offset(skb);
> +	lso->hdr_len = enetc_lso_get_hdr_len(skb);
> +	lso->total_len = skb->len - lso->hdr_len;
> +}
> +
> +static void enetc_lso_map_hdr(struct enetc_bdr *tx_ring, struct sk_buff *skb,
> +			      int *i, struct enetc_lso_t *lso)
> +{
> +	union enetc_tx_bd txbd_tmp, *txbd;
> +	struct enetc_tx_swbd *tx_swbd;
> +	u16 frm_len, frm_len_ext;
> +	u8 flags, e_flags = 0;
> +	dma_addr_t addr;
> +	char *hdr;
> +
> +	/* Get the first BD of the LSO BDs chain */
> +	txbd = ENETC_TXBD(*tx_ring, *i);
> +	tx_swbd = &tx_ring->tx_swbd[*i];
> +	prefetchw(txbd);

Is this prefetchw() proven to give any benefit?

> +
> +	/* Prepare LSO header: MAC + IP + TCP/UDP */
> +	hdr = tx_ring->tso_headers + *i * TSO_HEADER_SIZE;
> +	memcpy(hdr, skb->data, lso->hdr_len);
> +	addr = tx_ring->tso_headers_dma + *i * TSO_HEADER_SIZE;
> +
> +	frm_len = lso->total_len & 0xffff;
> +	frm_len_ext = (lso->total_len >> 16) & 0xf;

Why are these magics just open-coded, even without any comment?
I have no idea what is going on here for example.

Also, `& 0xffff` is lower_16_bits(), while `lso->total_len >> 16` is
upper_16_bits().

> +
> +	/* Set the flags of the first BD */
> +	flags = ENETC_TXBD_FLAGS_EX | ENETC_TXBD_FLAGS_CSUM_LSO |
> +		ENETC_TXBD_FLAGS_LSO | ENETC_TXBD_FLAGS_L4CS;
> +
> +	enetc_clear_tx_bd(&txbd_tmp);
> +	txbd_tmp.addr = cpu_to_le64(addr);
> +	txbd_tmp.hdr_len = cpu_to_le16(lso->hdr_len);
> +
> +	/* first BD needs frm_len and offload flags set */
> +	txbd_tmp.frm_len = cpu_to_le16(frm_len);
> +	txbd_tmp.flags = flags;
> +
> +	txbd_tmp.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START, lso->l3_start);
> +	/* l3_hdr_size in 32-bits (4 bytes) */
> +	txbd_tmp.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
> +				      lso->l3_hdr_len / 4);
> +	if (lso->ipv6)
> +		txbd_tmp.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T, 1);
> +	else
> +		txbd_tmp.l3_aux0 |= FIELD_PREP(ENETC_TX_BD_IPCS, 1);

Both these "fields" are single bits. You don't need FIELD_PREP() for
single-bit fields, just `|= ENETC_TX_BD_L3T` etc.

> +
> +	txbd_tmp.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T, lso->tcp ?
> +				     ENETC_TXBD_L4T_TCP : ENETC_TXBD_L4T_UDP);
> +
> +	/* For the LSO header we do not set the dma address since
> +	 * we do not want it unmapped when we do cleanup. We still
> +	 * set len so that we count the bytes sent.
> +	 */
> +	tx_swbd->len = lso->hdr_len;
> +	tx_swbd->do_twostep_tstamp = false;
> +	tx_swbd->check_wb = false;
> +
> +	/* Actually write the header in the BD */
> +	*txbd = txbd_tmp;
> +
> +	/* Get the next BD, and the next BD is extended BD */
> +	enetc_bdr_idx_inc(tx_ring, i);
> +	txbd = ENETC_TXBD(*tx_ring, *i);
> +	tx_swbd = &tx_ring->tx_swbd[*i];
> +	prefetchw(txbd);

(same question as for the previous prefetchw())

> +
> +	enetc_clear_tx_bd(&txbd_tmp);
> +	if (skb_vlan_tag_present(skb)) {
> +		/* Setup the VLAN fields */
> +		txbd_tmp.ext.vid = cpu_to_le16(skb_vlan_tag_get(skb));
> +		txbd_tmp.ext.tpid = 0; /* < C-TAG */

???

Maybe #define it somewhere, that 0 means CVLAN etc.?

> +		e_flags = ENETC_TXBD_E_FLAGS_VLAN_INS;
> +	}
> +
> +	/* Write the BD */
> +	txbd_tmp.ext.e_flags = e_flags;
> +	txbd_tmp.ext.lso_sg_size = cpu_to_le16(lso->lso_seg_size);
> +	txbd_tmp.ext.frm_len_ext = cpu_to_le16(frm_len_ext);
> +	*txbd = txbd_tmp;
> +}
> +
> +static int enetc_lso_map_data(struct enetc_bdr *tx_ring, struct sk_buff *skb,
> +			      int *i, struct enetc_lso_t *lso, int *count)
> +{
> +	union enetc_tx_bd txbd_tmp, *txbd = NULL;
> +	struct enetc_tx_swbd *tx_swbd;
> +	skb_frag_t *frag;
> +	dma_addr_t dma;
> +	u8 flags = 0;
> +	int len, f;
> +
> +	len = skb_headlen(skb) - lso->hdr_len;
> +	if (len > 0) {
> +		dma = dma_map_single(tx_ring->dev, skb->data + lso->hdr_len,
> +				     len, DMA_TO_DEVICE);
> +		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))

dma_mapping_error() already contains unlikely().

> +			return -ENOMEM;
> +
> +		enetc_bdr_idx_inc(tx_ring, i);
> +		txbd = ENETC_TXBD(*tx_ring, *i);
> +		tx_swbd = &tx_ring->tx_swbd[*i];
> +		prefetchw(txbd);
> +		*count += 1;
> +
> +		enetc_clear_tx_bd(&txbd_tmp);
> +		txbd_tmp.addr = cpu_to_le64(dma);
> +		txbd_tmp.buf_len = cpu_to_le16(len);
> +
> +		tx_swbd->dma = dma;
> +		tx_swbd->len = len;
> +		tx_swbd->is_dma_page = 0;
> +		tx_swbd->dir = DMA_TO_DEVICE;
> +	}
> +
> +	frag = &skb_shinfo(skb)->frags[0];
> +	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++, frag++) {
> +		if (txbd)
> +			*txbd = txbd_tmp;
> +
> +		len = skb_frag_size(frag);
> +		dma = skb_frag_dma_map(tx_ring->dev, frag, 0, len,
> +				       DMA_TO_DEVICE);

You now can use skb_frag_dma_map() with 2-4 arguments, so this can be
replaced to

		dma = skb_frag_dma_map(tx_ring->dev, frag);

> +		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
> +			return -ENOMEM;
> +
> +		/* Get the next BD */
> +		enetc_bdr_idx_inc(tx_ring, i);
> +		txbd = ENETC_TXBD(*tx_ring, *i);
> +		tx_swbd = &tx_ring->tx_swbd[*i];
> +		prefetchw(txbd);
> +		*count += 1;
> +
> +		enetc_clear_tx_bd(&txbd_tmp);
> +		txbd_tmp.addr = cpu_to_le64(dma);
> +		txbd_tmp.buf_len = cpu_to_le16(len);
> +
> +		tx_swbd->dma = dma;
> +		tx_swbd->len = len;
> +		tx_swbd->is_dma_page = 1;
> +		tx_swbd->dir = DMA_TO_DEVICE;
> +	}
> +
> +	/* Last BD needs 'F' bit set */
> +	flags |= ENETC_TXBD_FLAGS_F;
> +	txbd_tmp.flags = flags;
> +	*txbd = txbd_tmp;
> +
> +	tx_swbd->is_eof = 1;
> +	tx_swbd->skb = skb;
> +
> +	return 0;
> +}

[...]

> @@ -2096,6 +2329,13 @@ static int enetc_setup_default_rss_table(struct enetc_si *si, int num_groups)
>  	return 0;
>  }
>  
> +static void enetc_set_lso_flags_mask(struct enetc_hw *hw)
> +{
> +	enetc_wr(hw, ENETC4_SILSOSFMR0,
> +		 SILSOSFMR0_VAL_SET(TCP_NL_SEG_FLAGS_DMASK, TCP_NL_SEG_FLAGS_DMASK));
> +	enetc_wr(hw, ENETC4_SILSOSFMR1, 0);
> +}
> +
>  int enetc_configure_si(struct enetc_ndev_priv *priv)
>  {
>  	struct enetc_si *si = priv->si;
> @@ -2109,6 +2349,9 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
>  	/* enable SI */
>  	enetc_wr(hw, ENETC_SIMR, ENETC_SIMR_EN);
>  
> +	if (si->hw_features & ENETC_SI_F_LSO)
> +		enetc_set_lso_flags_mask(hw);
> +
>  	/* TODO: RSS support for i.MX95 will be supported later, and the
>  	 * is_enetc_rev1() condition will be removed
>  	 */
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index 1e680f0f5123..6db6b3eee45c 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -41,6 +41,19 @@ struct enetc_tx_swbd {
>  	u8 qbv_en:1;
>  };
>  
> +struct enetc_lso_t {
> +	bool	ipv6;
> +	bool	tcp;
> +	u8	l3_hdr_len;
> +	u8	hdr_len; /* LSO header length */
> +	u8	l3_start;
> +	u16	lso_seg_size;
> +	int	total_len; /* total data length, not include LSO header */
> +};
> +
> +#define ENETC_1KB_SIZE			1024

SZ_1K

> +#define ENETC_LSO_MAX_DATA_LEN		(256 * ENETC_1KB_SIZE)

SZ_256K

> +
>  #define ENETC_RX_MAXFRM_SIZE	ENETC_MAC_MAXFRM_SIZE
>  #define ENETC_RXB_TRUESIZE	2048 /* PAGE_SIZE >> 1 */
>  #define ENETC_RXB_PAD		NET_SKB_PAD /* add extra space if needed */
> @@ -238,6 +251,7 @@ enum enetc_errata {
>  #define ENETC_SI_F_PSFP BIT(0)
>  #define ENETC_SI_F_QBV  BIT(1)
>  #define ENETC_SI_F_QBU  BIT(2)
> +#define ENETC_SI_F_LSO	BIT(3)
>  
>  struct enetc_drvdata {
>  	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
> @@ -351,6 +365,7 @@ enum enetc_active_offloads {
>  	ENETC_F_QCI			= BIT(10),
>  	ENETC_F_QBU			= BIT(11),
>  	ENETC_F_TXCSUM			= BIT(12),
> +	ENETC_F_LSO			= BIT(13),
>  };
>  
>  enum enetc_flags_bit {
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> index 26b220677448..cdde8e93a73c 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> @@ -12,6 +12,28 @@
>  #define NXP_ENETC_VENDOR_ID		0x1131
>  #define NXP_ENETC_PF_DEV_ID		0xe101
>  
> +/**********************Station interface registers************************/
> +/* Station interface LSO segmentation flag mask register 0/1 */
> +#define ENETC4_SILSOSFMR0		0x1300
> +#define  SILSOSFMR0_TCP_MID_SEG		GENMASK(27, 16)
> +#define  SILSOSFMR0_TCP_1ST_SEG		GENMASK(11, 0)
> +#define  SILSOSFMR0_VAL_SET(first, mid)	((((mid) << 16) & SILSOSFMR0_TCP_MID_SEG) | \

Why not FIELD_PREP()?

> +					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
> +
> +#define ENETC4_SILSOSFMR1		0x1304
> +#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
> +#define   TCP_FLAGS_FIN			BIT(0)
> +#define   TCP_FLAGS_SYN			BIT(1)
> +#define   TCP_FLAGS_RST			BIT(2)
> +#define   TCP_FLAGS_PSH			BIT(3)
> +#define   TCP_FLAGS_ACK			BIT(4)
> +#define   TCP_FLAGS_URG			BIT(5)
> +#define   TCP_FLAGS_ECE			BIT(6)
> +#define   TCP_FLAGS_CWR			BIT(7)
> +#define   TCP_FLAGS_NS			BIT(8)

Why are you open-coding these if they're present in uapi/linux/tcp.h?

> +/* According to tso_build_hdr(), clear all special flags for not last packet. */

But this mask is used only to do a writel(), I don't see it anywhere
clearing anything...

> +#define TCP_NL_SEG_FLAGS_DMASK		(TCP_FLAGS_FIN | TCP_FLAGS_RST | TCP_FLAGS_PSH)
> +
>  /***************************ENETC port registers**************************/
>  #define ENETC4_ECAPR0			0x0
>  #define  ECAPR0_RFS			BIT(2)
Thanks,
Olek


* RE: [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC
  2024-12-17 15:13   ` Alexander Lobakin
@ 2024-12-18  1:53     ` Wei Fang
  0 siblings, 0 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-18  1:53 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

> 
> From: Wei Fang <wei.fang@nxp.com>
> Date: Fri, 13 Dec 2024 10:17:28 +0800
> 
> > In addition to supporting Rx checksum offload, i.MX95 ENETC also supports
> > Tx checksum offload. The transmit checksum offload is implemented through
> > the Tx BD. To support Tx checksum offload, software needs to fill some
> > auxiliary information in Tx BD, such as IP version, IP header offset and
> > size, whether L4 is UDP or TCP, etc.
> >
> > Same as Rx checksum offload, Tx checksum offload capability isn't defined
> > in register, so tx_csum bit is added to struct enetc_drvdata to indicate
> > whether the device supports Tx checksum offload.
> 
> [...]
> 
> > @@ -163,6 +184,30 @@ static int enetc_map_tx_buffs(struct enetc_bdr
> *tx_ring, struct sk_buff *skb)
> >  	dma_addr_t dma;
> >  	u8 flags = 0;
> >
> > +	enetc_clear_tx_bd(&temp_bd);
> > +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > +		/* Can not support TSD and checksum offload at the same time */
> > +		if (priv->active_offloads & ENETC_F_TXCSUM &&
> > +		    enetc_tx_csum_offload_check(skb) && !tx_ring->tsd_enable) {
> > +			temp_bd.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START,
> > +						     skb_network_offset(skb));
> > +			temp_bd.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
> > +						     skb_network_header_len(skb) / 4);
> > +			temp_bd.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T,
> > +						      enetc_skb_is_ipv6(skb));
> > +			if (enetc_skb_is_tcp(skb))
> > +				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
> > +							    ENETC_TXBD_L4T_TCP);
> > +			else
> > +				temp_bd.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T,
> > +							    ENETC_TXBD_L4T_UDP);
> > +			flags |= ENETC_TXBD_FLAGS_CSUM_LSO |
> ENETC_TXBD_FLAGS_L4CS;
> > +		} else {
> > +			if (skb_checksum_help(skb))
> 
> Why not
> 
> 		} else if (skb_checksum_help(skb)) {
> 
> ?

Okay, accepted.
> 
> > +				return 0;
> > +		}
> > +	}
> > +
> >  	i = tx_ring->next_to_use;
> >  	txbd = ENETC_TXBD(*tx_ring, i);
> >  	prefetchw(txbd);
> > @@ -173,7 +218,6 @@ static int enetc_map_tx_buffs(struct enetc_bdr
> *tx_ring, struct sk_buff *skb)
> >
> >  	temp_bd.addr = cpu_to_le64(dma);
> >  	temp_bd.buf_len = cpu_to_le16(len);
> > -	temp_bd.lstatus = 0;
> 
> Why is this removed and how is this change related to the checksum offload?

temp_bd has already been cleared at the beginning, so there is no need to
clear lstatus again:
+	enetc_clear_tx_bd(&temp_bd);

Moreover, lstatus and the aux* fields are in the same union, so clearing
lstatus would also clear the checksum offload auxiliary information set
earlier.

union {
	struct {
		u8 l3_aux0;
		u8 l3_aux1;
		u8 l4_aux;
		u8 flags;
	}; /* default layout */
	__le32 txstart;
	__le32 lstatus;
};

> 
> >
> >  	tx_swbd = &tx_ring->tx_swbd[i];
> >  	tx_swbd->dma = dma;
> > @@ -594,7 +638,7 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff
> *skb,
> >  {
> >  	struct enetc_ndev_priv *priv = netdev_priv(ndev);
> >  	struct enetc_bdr *tx_ring;
> > -	int count, err;
> > +	int count;
> >
> >  	/* Queue one-step Sync packet if already locked */
> >  	if (skb->cb[0] & ENETC_F_TX_ONESTEP_SYNC_TSTAMP) {
> > @@ -627,11 +671,6 @@ static netdev_tx_t enetc_start_xmit(struct sk_buff
> *skb,
> >  			return NETDEV_TX_BUSY;
> >  		}
> >
> > -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > -			err = skb_checksum_help(skb);
> > -			if (err)
> > -				goto drop_packet_err;
> > -		}
> >  		enetc_lock_mdio();
> >  		count = enetc_map_tx_buffs(tx_ring, skb);
> >  		enetc_unlock_mdio();
> > @@ -3274,6 +3313,7 @@ static const struct enetc_drvdata enetc_pf_data =
> {
> >
> >  static const struct enetc_drvdata enetc4_pf_data = {
> >  	.sysclk_freq = ENETC_CLK_333M,
> > +	.tx_csum = 1,
> 
> Maybe make it `bool tx_csum:1` instead of u8 and assign `true` here?

I think 'u8 tx_csum:1' is fine; we just need to change "1" to "true".
Besides, more 'u8 xxx:n' fields may be added later.

> 
> >  	.pmac_offset = ENETC4_PMAC_OFFSET,
> >  	.eth_ops = &enetc4_pf_ethtool_ops,
> >  };
> > diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> > index 72fa03dbc2dd..e82eb9a9137c 100644
> > --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> > +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> > @@ -234,6 +234,7 @@ enum enetc_errata {
> >
> >  struct enetc_drvdata {
> >  	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
> > +	u8 tx_csum:1;
> >  	u64 sysclk_freq;
> >  	const struct ethtool_ops *eth_ops;
> >  };
> 
> Thanks,
> Olek

* RE: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-17 15:32   ` Alexander Lobakin
@ 2024-12-18  3:06     ` Wei Fang
  2024-12-18  5:45       ` Wei Fang
  2024-12-18 14:30       ` Alexander Lobakin
  0 siblings, 2 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-18  3:06 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

> > +static inline int enetc_lso_count_descs(const struct sk_buff *skb)
> > +{
> > +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
> > +	 * for linear area data but not include LSO header, namely
> > +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
> 
> What if the head contains headers only and
> `skb_headlen(skb) - lso_hdr_len` is 0?
> 

enetc_lso_count_descs() is a simple helper that only calculates the
number of BDs needed in the worst case, so that we can check whether
there are enough free BDs to accommodate the current frame. In the
case you mention, it merely reserves one BD more than is used, which
is harmless.
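To put numbers on this, the following sketch compares the worst-case reservation with the BDs that the quoted enetc_lso_map_hdr()/enetc_lso_map_data() actually fill. `lso_filled_bds()` is a hypothetical helper, not part of the patch. A headers-only head only widens the over-reservation by one BD:

```c
#include <assert.h>

/* Worst-case reservation, mirroring enetc_lso_count_descs(): header BD,
 * extended BD, linear-data BD and gap BD, plus one BD per page fragment. */
static int lso_worst_case_bds(int nr_frags)
{
	return nr_frags + 4;
}

/* BDs written by the quoted mapping code (hypothetical helper): header
 * BD + extended BD, one linear-data BD only when the head carries
 * payload beyond the headers, and one BD per fragment. */
static int lso_filled_bds(int nr_frags, int headlen, int hdr_len)
{
	assert(headlen >= hdr_len);
	return 2 + (headlen > hdr_len ? 1 : 0) + nr_frags;
}
```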

> > +	 */
> > +	return skb_shinfo(skb)->nr_frags + 4;
> > +}
> > +
> > +static int enetc_lso_get_hdr_len(const struct sk_buff *skb)
> > +{
> > +	int hdr_len, tlen;
> > +
> > +	tlen = skb_is_gso_tcp(skb) ? tcp_hdrlen(skb) : sizeof(struct udphdr);
> > +	hdr_len = skb_transport_offset(skb) + tlen;
> > +
> > +	return hdr_len;
> > +}
> 
> Are you sure the kernel doesn't have similar generic helpers?

This function is modeled on tso_start() in net/core/tso.c. I have not
found any other similar helper in the kernel.
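For reference, the computation mirrors tso_start(): the transport-header offset plus either the TCP header length or the fixed 8-byte UDP header. The sketch below does the same on raw frame bytes instead of skb helpers. It assumes an untagged Ethernet frame carrying IPv4, and `lso_hdr_len_from_frame()` and its constants are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HDR_LEN	14	/* untagged Ethernet: two MACs + EtherType */
#define UDP_HDR_LEN	8	/* same as sizeof(struct udphdr) */
#define IPPROTO_TCP_NUM	6
#define IPPROTO_UDP_NUM	17

/* Hypothetical helper: MAC + IPv4 + TCP/UDP header length of a raw,
 * untagged IPv4 frame, or -1 for any other L4 protocol. */
static int lso_hdr_len_from_frame(const uint8_t *frame)
{
	const uint8_t *ip = frame + ETH_HDR_LEN;
	const uint8_t *l4;
	int l4_len;

	assert((ip[0] >> 4) == 4);		/* IPv4 only in this sketch */
	l4 = ip + (ip[0] & 0x0f) * 4;		/* IHL counts 32-bit words */

	if (ip[9] == IPPROTO_TCP_NUM)
		l4_len = (l4[12] >> 4) * 4;	/* TCP data offset, as tcp_hdrlen() */
	else if (ip[9] == IPPROTO_UDP_NUM)
		l4_len = UDP_HDR_LEN;
	else
		return -1;

	/* (l4 - frame) plays the role of skb_transport_offset() */
	return (int)(l4 - frame) + l4_len;
}
```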
> 
> > +
> > +static void enetc_lso_start(struct sk_buff *skb, struct enetc_lso_t *lso)
> > +{
> > +	lso->lso_seg_size = skb_shinfo(skb)->gso_size;
> > +	lso->ipv6 = enetc_skb_is_ipv6(skb);
> > +	lso->tcp = skb_is_gso_tcp(skb);
> > +	lso->l3_hdr_len = skb_network_header_len(skb);
> > +	lso->l3_start = skb_network_offset(skb);
> > +	lso->hdr_len = enetc_lso_get_hdr_len(skb);
> > +	lso->total_len = skb->len - lso->hdr_len;
> > +}
> > +
> > +static void enetc_lso_map_hdr(struct enetc_bdr *tx_ring, struct sk_buff *skb,
> > +			      int *i, struct enetc_lso_t *lso)
> > +{
> > +	union enetc_tx_bd txbd_tmp, *txbd;
> > +	struct enetc_tx_swbd *tx_swbd;
> > +	u16 frm_len, frm_len_ext;
> > +	u8 flags, e_flags = 0;
> > +	dma_addr_t addr;
> > +	char *hdr;
> > +
> > +	/* Get the first BD of the LSO BDs chain */
> > +	txbd = ENETC_TXBD(*tx_ring, *i);
> > +	tx_swbd = &tx_ring->tx_swbd[*i];
> > +	prefetchw(txbd);
> 
> Is this prefetchw() proven to give any benefit?
> 

It is only there to keep the logic consistent with the existing code,
which always calls prefetchw() before setting up a txbd; I see no reason
to behave differently here. I also don't have a quick answer as to what
the benefit is. :(

> > +
> > +	/* Prepare LSO header: MAC + IP + TCP/UDP */
> > +	hdr = tx_ring->tso_headers + *i * TSO_HEADER_SIZE;
> > +	memcpy(hdr, skb->data, lso->hdr_len);
> > +	addr = tx_ring->tso_headers_dma + *i * TSO_HEADER_SIZE;
> > +
> > +	frm_len = lso->total_len & 0xffff;
> > +	frm_len_ext = (lso->total_len >> 16) & 0xf;
> 
> Why are these magics just open-coded, even without any comment?
> I have no idea what is going on here for example.
> 
> Also, `& 0xffff` is lower_16_bits(), while `lso->total_len >> 16` is upper_16_bits().

frm_len holds the lower 16 bits of the frame length and frm_len_ext holds
the upper 4 bits (bits 19:16). I will add some comments or macros.
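Here is a sketch of what such macros might look like, assuming the 20-bit split described above. The macro and function names are hypothetical, not from the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the patch open-codes 0xffff, 16 and 0xf. */
#define LSO_FRM_LEN_MASK	0xffffu	/* bits 15:0  -> first BD frm_len */
#define LSO_FRM_LEN_EXT_SHIFT	16
#define LSO_FRM_LEN_EXT_MASK	0xfu	/* bits 19:16 -> ext BD frm_len_ext */

static void lso_split_frm_len(uint32_t total_len, uint16_t *frm_len,
			      uint16_t *frm_len_ext)
{
	assert(total_len < (1u << 20));	/* only 16 + 4 bits are available */
	*frm_len = (uint16_t)(total_len & LSO_FRM_LEN_MASK);
	*frm_len_ext = (uint16_t)((total_len >> LSO_FRM_LEN_EXT_SHIFT) &
				  LSO_FRM_LEN_EXT_MASK);
}
```

In kernel code, lower_16_bits(), or FIELD_GET() with GENMASK() masks, could replace the open-coded shift and masks, as Olek suggests.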
> 
> > +
> > +	/* Set the flags of the first BD */
> > +	flags = ENETC_TXBD_FLAGS_EX | ENETC_TXBD_FLAGS_CSUM_LSO |
> > +		ENETC_TXBD_FLAGS_LSO | ENETC_TXBD_FLAGS_L4CS;
> > +
> > +	enetc_clear_tx_bd(&txbd_tmp);
> > +	txbd_tmp.addr = cpu_to_le64(addr);
> > +	txbd_tmp.hdr_len = cpu_to_le16(lso->hdr_len);
> > +
> > +	/* first BD needs frm_len and offload flags set */
> > +	txbd_tmp.frm_len = cpu_to_le16(frm_len);
> > +	txbd_tmp.flags = flags;
> > +
> > +	txbd_tmp.l3_aux0 = FIELD_PREP(ENETC_TX_BD_L3_START, lso->l3_start);
> > +	/* l3_hdr_size in 32-bits (4 bytes) */
> > +	txbd_tmp.l3_aux1 = FIELD_PREP(ENETC_TX_BD_L3_HDR_LEN,
> > +				      lso->l3_hdr_len / 4);
> > +	if (lso->ipv6)
> > +		txbd_tmp.l3_aux1 |= FIELD_PREP(ENETC_TX_BD_L3T, 1);
> > +	else
> > +		txbd_tmp.l3_aux0 |= FIELD_PREP(ENETC_TX_BD_IPCS, 1);
> 
> Both these "fields" are single bits. You don't need FIELD_PREP() for single-bit
> fields, just `|= ENETC_TX_BD_L3T` etc.

Okay, thanks.
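To illustrate the point, a simplified userspace FIELD_PREP() shows that for a single-bit mask, `FIELD_PREP(bit, 1)` is just `bit`. Unlike the kernel macro in <linux/bitfield.h>, this version has no compile-time checks. The bit positions here are hypothetical, not the real ENETC layout:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	((~0u >> (31 - (h))) & (~0u << (l)))

/* Simplified FIELD_PREP(): shift val up to the mask's lowest set bit. */
#define FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Hypothetical field positions, for illustration only. */
#define TX_BD_L3_HDR_LEN	GENMASK(6, 0)
#define TX_BD_L3T		BIT(7)

static uint8_t build_l3_aux1(unsigned int l3_hdr_len, int ipv6)
{
	uint32_t aux1 = FIELD_PREP(TX_BD_L3_HDR_LEN, l3_hdr_len / 4);

	/* For a single-bit field, FIELD_PREP(bit, 1) == bit. */
	assert(FIELD_PREP(TX_BD_L3T, 1) == TX_BD_L3T);
	if (ipv6)
		aux1 |= TX_BD_L3T;
	return (uint8_t)aux1;
}
```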
> 
> > +
> > +	txbd_tmp.l4_aux = FIELD_PREP(ENETC_TX_BD_L4T, lso->tcp ?
> > +				     ENETC_TXBD_L4T_TCP : ENETC_TXBD_L4T_UDP);
> > +
> > +	/* For the LSO header we do not set the dma address since
> > +	 * we do not want it unmapped when we do cleanup. We still
> > +	 * set len so that we count the bytes sent.
> > +	 */
> > +	tx_swbd->len = lso->hdr_len;
> > +	tx_swbd->do_twostep_tstamp = false;
> > +	tx_swbd->check_wb = false;
> > +
> > +	/* Actually write the header in the BD */
> > +	*txbd = txbd_tmp;
> > +
> > +	/* Get the next BD, and the next BD is extended BD */
> > +	enetc_bdr_idx_inc(tx_ring, i);
> > +	txbd = ENETC_TXBD(*tx_ring, *i);
> > +	tx_swbd = &tx_ring->tx_swbd[*i];
> > +	prefetchw(txbd);
> 
> (same question as for the previous prefetchw())
> 
> > +
> > +	enetc_clear_tx_bd(&txbd_tmp);
> > +	if (skb_vlan_tag_present(skb)) {
> > +		/* Setup the VLAN fields */
> > +		txbd_tmp.ext.vid = cpu_to_le16(skb_vlan_tag_get(skb));
> > +		txbd_tmp.ext.tpid = 0; /* < C-TAG */
> 
> ???
> 
> Maybe #define it somewhere, that 0 means CVLAN etc.?

Okay, accepted.

> 
> > +		e_flags = ENETC_TXBD_E_FLAGS_VLAN_INS;
> > +	}
> > +
> > +	/* Write the BD */
> > +	txbd_tmp.ext.e_flags = e_flags;
> > +	txbd_tmp.ext.lso_sg_size = cpu_to_le16(lso->lso_seg_size);
> > +	txbd_tmp.ext.frm_len_ext = cpu_to_le16(frm_len_ext);
> > +	*txbd = txbd_tmp;
> > +}
> > +
> > +static int enetc_lso_map_data(struct enetc_bdr *tx_ring, struct sk_buff *skb,
> > +			      int *i, struct enetc_lso_t *lso, int *count)
> > +{
> > +	union enetc_tx_bd txbd_tmp, *txbd = NULL;
> > +	struct enetc_tx_swbd *tx_swbd;
> > +	skb_frag_t *frag;
> > +	dma_addr_t dma;
> > +	u8 flags = 0;
> > +	int len, f;
> > +
> > +	len = skb_headlen(skb) - lso->hdr_len;
> > +	if (len > 0) {
> > +		dma = dma_map_single(tx_ring->dev, skb->data + lso->hdr_len,
> > +				     len, DMA_TO_DEVICE);
> > +		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
> 
> dma_mapping_error() already contains unlikely().

Will remove 'unlikely'. Thanks.

I will also drop the other unlikely() wrappers around dma_mapping_error()
in a future patch.
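The extra hint is redundant because dma_mapping_error() already wraps its comparison against DMA_MAPPING_ERROR in unlikely() and returns -ENOMEM on failure. Below is a simplified userspace model of that shape. The `fake_*` names are stand-ins, not the kernel source:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define unlikely(x)	__builtin_expect(!!(x), 0)

typedef uint64_t fake_dma_addr_t;		/* stand-in for dma_addr_t */
#define FAKE_DMA_MAPPING_ERROR	(~(fake_dma_addr_t)0)

/* Simplified shape of dma_mapping_error(): the branch hint is already
 * inside, so callers gain nothing from wrapping it in unlikely() again. */
static inline int fake_dma_mapping_error(fake_dma_addr_t addr)
{
	if (unlikely(addr == FAKE_DMA_MAPPING_ERROR))
		return -ENOMEM;
	return 0;
}

static int map_one(fake_dma_addr_t addr)
{
	if (fake_dma_mapping_error(addr))	/* no extra unlikely() needed */
		return -ENOMEM;
	return 0;
}
```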

> 
> > +			return -ENOMEM;
> > +
> > +		enetc_bdr_idx_inc(tx_ring, i);
> > +		txbd = ENETC_TXBD(*tx_ring, *i);
> > +		tx_swbd = &tx_ring->tx_swbd[*i];
> > +		prefetchw(txbd);
> > +		*count += 1;
> > +
> > +		enetc_clear_tx_bd(&txbd_tmp);
> > +		txbd_tmp.addr = cpu_to_le64(dma);
> > +		txbd_tmp.buf_len = cpu_to_le16(len);
> > +
> > +		tx_swbd->dma = dma;
> > +		tx_swbd->len = len;
> > +		tx_swbd->is_dma_page = 0;
> > +		tx_swbd->dir = DMA_TO_DEVICE;
> > +	}
> > +
> > +	frag = &skb_shinfo(skb)->frags[0];
> > +	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++, frag++) {
> > +		if (txbd)
> > +			*txbd = txbd_tmp;
> > +
> > +		len = skb_frag_size(frag);
> > +		dma = skb_frag_dma_map(tx_ring->dev, frag, 0, len,
> > +				       DMA_TO_DEVICE);
> 
> You now can use skb_frag_dma_map() with 2-4 arguments, so this can be
> replaced to
> 
> 		dma = skb_frag_dma_map(tx_ring->dev, frag);


But my compiler reports this error:
drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_lso_map_data’:
drivers/net/ethernet/freescale/enetc/enetc.c:684:23: error: too few arguments to function ‘skb_frag_dma_map’
  684 |                 dma = skb_frag_dma_map(tx_ring->dev, frag);
      |                       ^~~~~~~~~~~~~~~~

> 
> > +		if (unlikely(dma_mapping_error(tx_ring->dev, dma)))
> > +			return -ENOMEM;
> > +
> > +		/* Get the next BD */
> > +		enetc_bdr_idx_inc(tx_ring, i);
> > +		txbd = ENETC_TXBD(*tx_ring, *i);
> > +		tx_swbd = &tx_ring->tx_swbd[*i];
> > +		prefetchw(txbd);
> > +		*count += 1;
> > +
> > +		enetc_clear_tx_bd(&txbd_tmp);
> > +		txbd_tmp.addr = cpu_to_le64(dma);
> > +		txbd_tmp.buf_len = cpu_to_le16(len);
> > +
> > +		tx_swbd->dma = dma;
> > +		tx_swbd->len = len;
> > +		tx_swbd->is_dma_page = 1;
> > +		tx_swbd->dir = DMA_TO_DEVICE;
> > +	}
> > +
> > +	/* Last BD needs 'F' bit set */
> > +	flags |= ENETC_TXBD_FLAGS_F;
> > +	txbd_tmp.flags = flags;
> > +	*txbd = txbd_tmp;
> > +
> > +	tx_swbd->is_eof = 1;
> > +	tx_swbd->skb = skb;
> > +
> > +	return 0;
> > +}
> 
> [...]
> 
> > @@ -2096,6 +2329,13 @@ static int enetc_setup_default_rss_table(struct enetc_si *si, int num_groups)
> >  	return 0;
> >  }
> >
> > +static void enetc_set_lso_flags_mask(struct enetc_hw *hw)
> > +{
> > +	enetc_wr(hw, ENETC4_SILSOSFMR0,
> > +		 SILSOSFMR0_VAL_SET(TCP_NL_SEG_FLAGS_DMASK, TCP_NL_SEG_FLAGS_DMASK));
> > +	enetc_wr(hw, ENETC4_SILSOSFMR1, 0);
> > +}
> > +
> >  int enetc_configure_si(struct enetc_ndev_priv *priv)
> >  {
> >  	struct enetc_si *si = priv->si;
> > @@ -2109,6 +2349,9 @@ int enetc_configure_si(struct enetc_ndev_priv *priv)
> >  	/* enable SI */
> >  	enetc_wr(hw, ENETC_SIMR, ENETC_SIMR_EN);
> >
> > +	if (si->hw_features & ENETC_SI_F_LSO)
> > +		enetc_set_lso_flags_mask(hw);
> > +
> >  	/* TODO: RSS support for i.MX95 will be supported later, and the
> >  	 * is_enetc_rev1() condition will be removed
> >  	 */
> > diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> > index 1e680f0f5123..6db6b3eee45c 100644
> > --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> > +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> > @@ -41,6 +41,19 @@ struct enetc_tx_swbd {
> >  	u8 qbv_en:1;
> >  };
> >
> > +struct enetc_lso_t {
> > +	bool	ipv6;
> > +	bool	tcp;
> > +	u8	l3_hdr_len;
> > +	u8	hdr_len; /* LSO header length */
> > +	u8	l3_start;
> > +	u16	lso_seg_size;
> > +	int	total_len; /* total data length, not include LSO header */
> > +};
> > +
> > +#define ENETC_1KB_SIZE			1024
> 
> SZ_1K
> 
> > +#define ENETC_LSO_MAX_DATA_LEN		(256 * ENETC_1KB_SIZE)
> 
> SZ_256K
> 
> > +
> >  #define ENETC_RX_MAXFRM_SIZE	ENETC_MAC_MAXFRM_SIZE
> >  #define ENETC_RXB_TRUESIZE	2048 /* PAGE_SIZE >> 1 */
> >  #define ENETC_RXB_PAD		NET_SKB_PAD /* add extra space if needed */
> > @@ -238,6 +251,7 @@ enum enetc_errata {
> >  #define ENETC_SI_F_PSFP	BIT(0)
> >  #define ENETC_SI_F_QBV	BIT(1)
> >  #define ENETC_SI_F_QBU	BIT(2)
> > +#define ENETC_SI_F_LSO	BIT(3)
> >
> >  struct enetc_drvdata {
> >  	u32 pmac_offset; /* Only valid for PSI which supports 802.1Qbu */
> > @@ -351,6 +365,7 @@ enum enetc_active_offloads {
> >  	ENETC_F_QCI			= BIT(10),
> >  	ENETC_F_QBU			= BIT(11),
> >  	ENETC_F_TXCSUM			= BIT(12),
> > +	ENETC_F_LSO			= BIT(13),
> >  };
> >
> >  enum enetc_flags_bit {
> > diff --git a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> > index 26b220677448..cdde8e93a73c 100644
> > --- a/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> > +++ b/drivers/net/ethernet/freescale/enetc/enetc4_hw.h
> > @@ -12,6 +12,28 @@
> >  #define NXP_ENETC_VENDOR_ID		0x1131
> >  #define NXP_ENETC_PF_DEV_ID		0xe101
> >
> > +/**********************Station interface registers************************/
> > +/* Station interface LSO segmentation flag mask register 0/1 */
> > +#define ENETC4_SILSOSFMR0		0x1300
> > +#define  SILSOSFMR0_TCP_MID_SEG		GENMASK(27, 16)
> > +#define  SILSOSFMR0_TCP_1ST_SEG		GENMASK(11, 0)
> > +#define  SILSOSFMR0_VAL_SET(first, mid)	((((mid) << 16) & SILSOSFMR0_TCP_MID_SEG) | \
> 
> Why not FIELD_PREP()?

Okay, accepted.

> 
> > +					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
> > +
> > +#define ENETC4_SILSOSFMR1		0x1304
> > +#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
> > +#define   TCP_FLAGS_FIN			BIT(0)
> > +#define   TCP_FLAGS_SYN			BIT(1)
> > +#define   TCP_FLAGS_RST			BIT(2)
> > +#define   TCP_FLAGS_PSH			BIT(3)
> > +#define   TCP_FLAGS_ACK			BIT(4)
> > +#define   TCP_FLAGS_URG			BIT(5)
> > +#define   TCP_FLAGS_ECE			BIT(6)
> > +#define   TCP_FLAGS_CWR			BIT(7)
> > +#define   TCP_FLAGS_NS			BIT(8)
> 
> Why are you open-coding these if they're present in uapi/linux/tcp.h?

Okay, I will add an 'ENETC' prefix.
> 
> > +/* According to tso_build_hdr(), clear all special flags for not last packet. */
> 
> But this mask is used only to do a writel(), I don't see it anywhere clearing
> anything...

The hardware masks the TCP header flags by itself; we just need to set
the flag mask registers.
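The following userspace sketch shows how the SILSOSFMR0 value could be built with FIELD_PREP(), as Olek asked, using the field layout and TCP_FLAGS_* bits quoted in the patch. Per the patch comment, the flags selected in the first- and middle-segment fields are the ones the hardware clears in non-last segments. The simplified FIELD_PREP() and `silsosfmr0_value()` are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)		(1u << (n))
#define GENMASK(h, l)	((~0u >> (31 - (h))) & (~0u << (l)))
/* Simplified FIELD_PREP() without the kernel's compile-time checks. */
#define FIELD_PREP(mask, val) \
	(((uint32_t)(val) << __builtin_ctz(mask)) & (mask))

/* Register layout as quoted in the patch. */
#define SILSOSFMR0_TCP_MID_SEG	GENMASK(27, 16)
#define SILSOSFMR0_TCP_1ST_SEG	GENMASK(11, 0)
#define TCP_FLAGS_FIN		BIT(0)
#define TCP_FLAGS_RST		BIT(2)
#define TCP_FLAGS_PSH		BIT(3)
#define TCP_NL_SEG_FLAGS_DMASK	(TCP_FLAGS_FIN | TCP_FLAGS_RST | TCP_FLAGS_PSH)

/* FIELD_PREP() form of SILSOSFMR0_VAL_SET(first, mid). */
#define SILSOSFMR0_VAL_SET(first, mid) \
	(FIELD_PREP(SILSOSFMR0_TCP_MID_SEG, (mid)) | \
	 FIELD_PREP(SILSOSFMR0_TCP_1ST_SEG, (first)))

/* Value enetc_set_lso_flags_mask() would write to ENETC4_SILSOSFMR0. */
static uint32_t silsosfmr0_value(void)
{
	uint32_t val = SILSOSFMR0_VAL_SET(TCP_NL_SEG_FLAGS_DMASK,
					  TCP_NL_SEG_FLAGS_DMASK);

	/* Same result as the open-coded shift-and-mask in the patch. */
	assert(val == ((((uint32_t)TCP_NL_SEG_FLAGS_DMASK << 16) &
			SILSOSFMR0_TCP_MID_SEG) |
		       (TCP_NL_SEG_FLAGS_DMASK & SILSOSFMR0_TCP_1ST_SEG)));
	return val;
}
```

With FIN, RST and PSH selected, both 12-bit fields hold 0xd, so the register value is 0x000d000d.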

> 
> > +#define TCP_NL_SEG_FLAGS_DMASK		(TCP_FLAGS_FIN | TCP_FLAGS_RST | TCP_FLAGS_PSH)
> > +
> >  /***************************ENETC port registers**************************/
> >  #define ENETC4_ECAPR0			0x0
> >  #define  ECAPR0_RFS			BIT(2)
> Thanks,
> Olek

* RE: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-18  3:06     ` Wei Fang
@ 2024-12-18  5:45       ` Wei Fang
  2024-12-18 14:30       ` Alexander Lobakin
  1 sibling, 0 replies; 15+ messages in thread
From: Wei Fang @ 2024-12-18  5:45 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

[...]

> > You now can use skb_frag_dma_map() with 2-4 arguments, so this can be
> > replaced to
> >
> > 		dma = skb_frag_dma_map(tx_ring->dev, frag);
> 
> 
> But my compiler complains the error:
> drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_lso_map_data’:
> drivers/net/ethernet/freescale/enetc/enetc.c:684:23: error: too few arguments to function ‘skb_frag_dma_map’
>   684 |                 dma = skb_frag_dma_map(tx_ring->dev, frag);
>       |                       ^~~~~~~~~~~~~~~~
> 

My bad, I had not updated my code base.
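On a tree without the newer helper, skb_frag_dma_map() takes all five parameters, which explains the "too few arguments" error above. One common C technique for optional trailing arguments is to count the arguments in a macro and dispatch to a suffixed function. The sketch below shows only that technique. It is not claimed to be the kernel's implementation, and all `frag_map*` and `fake_*` names are hypothetical. It supports a 2-argument short form and the full 5-argument form:

```c
#include <assert.h>
#include <stddef.h>

enum fake_dir { FAKE_TO_DEVICE, FAKE_FROM_DEVICE };

struct fake_frag {
	unsigned char *base;
	size_t size;
};

struct fake_mapping {
	const unsigned char *addr;
	size_t len;
	enum fake_dir dir;
};

/* Full form: the caller spells out offset, length and direction. */
static struct fake_mapping frag_map_5(void *dev, const struct fake_frag *frag,
				      size_t off, size_t len, enum fake_dir dir)
{
	struct fake_mapping m = { frag->base + off, len, dir };

	(void)dev;	/* a real driver would pass this to the DMA API */
	assert(off + len <= frag->size);
	return m;
}

/* Short form: map the whole fragment towards the device. */
static struct fake_mapping frag_map_2(void *dev, const struct fake_frag *frag)
{
	return frag_map_5(dev, frag, 0, frag->size, FAKE_TO_DEVICE);
}

/* Count 2..5 arguments and paste the count onto the function name. */
#define NARGS_(a1, a2, a3, a4, a5, n, ...)	n
#define NARGS(...)	NARGS_(__VA_ARGS__, 5, 4, 3, 2, 1)
#define CAT_(a, b)	a##b
#define CAT(a, b)	CAT_(a, b)
#define frag_map(...)	CAT(frag_map_, NARGS(__VA_ARGS__))(__VA_ARGS__)
```

A 3- or 4-argument call would fail to compile here because frag_map_3() and frag_map_4() are not defined.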

* Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-18  3:06     ` Wei Fang
  2024-12-18  5:45       ` Wei Fang
@ 2024-12-18 14:30       ` Alexander Lobakin
  2024-12-19  1:32         ` Wei Fang
  1 sibling, 1 reply; 15+ messages in thread
From: Alexander Lobakin @ 2024-12-18 14:30 UTC (permalink / raw)
  To: Wei Fang
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

From: Wei Fang <wei.fang@nxp.com>
Date: Wed, 18 Dec 2024 03:06:06 +0000

>>> +static inline int enetc_lso_count_descs(const struct sk_buff *skb) {
>>> +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
>>> +	 * for linear area data but not include LSO header, namely
>>> +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.

[...]

>>> +					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
>>> +
>>> +#define ENETC4_SILSOSFMR1		0x1304
>>> +#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
>>> +#define   TCP_FLAGS_FIN			BIT(0)
>>> +#define   TCP_FLAGS_SYN			BIT(1)
>>> +#define   TCP_FLAGS_RST			BIT(2)
>>> +#define   TCP_FLAGS_PSH			BIT(3)
>>> +#define   TCP_FLAGS_ACK			BIT(4)
>>> +#define   TCP_FLAGS_URG			BIT(5)
>>> +#define   TCP_FLAGS_ECE			BIT(6)
>>> +#define   TCP_FLAGS_CWR			BIT(7)
>>> +#define   TCP_FLAGS_NS			BIT(8)
>>
>> Why are you open-coding these if they're present in uapi/linux/tcp.h?
> 
> Okay, I will add 'ENETC' prefix.

You don't need to add a prefix; just use the generic definitions from
the above-mentioned file.

>>
>>> +/* According to tso_build_hdr(), clear all special flags for not last packet. */
>>
>> But this mask is used only to do a writel(), I don't see it anywhere clearing
>> anything...

Thanks,
Olek

* RE: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-18 14:30       ` Alexander Lobakin
@ 2024-12-19  1:32         ` Wei Fang
  2024-12-19 15:16           ` Alexander Lobakin
  0 siblings, 1 reply; 15+ messages in thread
From: Wei Fang @ 2024-12-19  1:32 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

> -----Original Message-----
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Sent: 18 December 2024 22:30
> To: Wei Fang <wei.fang@nxp.com>
> Cc: Claudiu Manoil <claudiu.manoil@nxp.com>; Vladimir Oltean
> <vladimir.oltean@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>;
> andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Frank Li <frank.li@nxp.com>;
> horms@kernel.org; idosch@idosch.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; imx@lists.linux.dev
> Subject: Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
> 
> From: Wei Fang <wei.fang@nxp.com>
> Date: Wed, 18 Dec 2024 03:06:06 +0000
> 
> >>> +static inline int enetc_lso_count_descs(const struct sk_buff *skb) {
> >>> +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
> >>> +	 * for linear area data but not include LSO header, namely
> >>> +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
> 
> [...]
> 
> >>> +					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
> >>> +
> >>> +#define ENETC4_SILSOSFMR1		0x1304
> >>> +#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
> >>> +#define   TCP_FLAGS_FIN			BIT(0)
> >>> +#define   TCP_FLAGS_SYN			BIT(1)
> >>> +#define   TCP_FLAGS_RST			BIT(2)
> >>> +#define   TCP_FLAGS_PSH			BIT(3)
> >>> +#define   TCP_FLAGS_ACK			BIT(4)
> >>> +#define   TCP_FLAGS_URG			BIT(5)
> >>> +#define   TCP_FLAGS_ECE			BIT(6)
> >>> +#define   TCP_FLAGS_CWR			BIT(7)
> >>> +#define   TCP_FLAGS_NS			BIT(8)
> >>
> >> Why are you open-coding these if they're present in uapi/linux/tcp.h?
> >
> > Okay, I will add 'ENETC' prefix.
> 
> You don't need to add a prefix, you need to just use the generic definitions
> from the abovementioned file.

These are register bit definitions, and their values differ from the
generic definitions in tcp.h; the generic names are also 'TCP_FLAG_XXX'
rather than 'TCP_FLAGS_XXX'. Anyway, I think it is better to add the
'ENETC' prefix so that nobody mistakes these for the generic definitions.


* Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF
  2024-12-19  1:32         ` Wei Fang
@ 2024-12-19 15:16           ` Alexander Lobakin
  0 siblings, 0 replies; 15+ messages in thread
From: Alexander Lobakin @ 2024-12-19 15:16 UTC
  To: Wei Fang
  Cc: Claudiu Manoil, Vladimir Oltean, Clark Wang,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Frank Li, horms@kernel.org,
	idosch@idosch.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev

From: Wei Fang <wei.fang@nxp.com>
Date: Thu, 19 Dec 2024 01:32:56 +0000

>> -----Original Message-----
>> From: Alexander Lobakin <aleksander.lobakin@intel.com>
>> Sent: December 18, 2024 22:30
>> To: Wei Fang <wei.fang@nxp.com>
>> Cc: Claudiu Manoil <claudiu.manoil@nxp.com>; Vladimir Oltean
>> <vladimir.oltean@nxp.com>; Clark Wang <xiaoning.wang@nxp.com>;
>> andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
>> kuba@kernel.org; pabeni@redhat.com; Frank Li <frank.li@nxp.com>;
>> horms@kernel.org; idosch@idosch.org; netdev@vger.kernel.org;
>> linux-kernel@vger.kernel.org; imx@lists.linux.dev
>> Subject: Re: [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95
>> ENETC PF
>>
>> From: Wei Fang <wei.fang@nxp.com>
>> Date: Wed, 18 Dec 2024 03:06:06 +0000
>>
>>>>> +static inline int enetc_lso_count_descs(const struct sk_buff *skb) {
>>>>> +	/* 4 BDs: 1 BD for LSO header + 1 BD for extended BD + 1 BD
>>>>> +	 * for linear area data but not include LSO header, namely
>>>>> +	 * skb_headlen(skb) - lso_hdr_len. And 1 BD for gap.
>>
>> [...]
>>
>>>>> +					 ((first) & SILSOSFMR0_TCP_1ST_SEG))
>>>>> +
>>>>> +#define ENETC4_SILSOSFMR1		0x1304
>>>>> +#define  SILSOSFMR1_TCP_LAST_SEG	GENMASK(11, 0)
>>>>> +#define   TCP_FLAGS_FIN			BIT(0)
>>>>> +#define   TCP_FLAGS_SYN			BIT(1)
>>>>> +#define   TCP_FLAGS_RST			BIT(2)
>>>>> +#define   TCP_FLAGS_PSH			BIT(3)
>>>>> +#define   TCP_FLAGS_ACK			BIT(4)
>>>>> +#define   TCP_FLAGS_URG			BIT(5)
>>>>> +#define   TCP_FLAGS_ECE			BIT(6)
>>>>> +#define   TCP_FLAGS_CWR			BIT(7)
>>>>> +#define   TCP_FLAGS_NS			BIT(8)
>>>>
>>>> Why are you open-coding these if they're present in uapi/linux/tcp.h?
>>>
>>> Okay, I will add an 'ENETC' prefix.
>>
>> You don't need to add a prefix; you just need to use the generic definitions
>> from the above-mentioned file.
> 
> These are register bit definitions, and they differ from the generic ones:
> the generic macros in tcp.h are named 'TCP_FLAG_XXX' and are big-endian
> masks over the TCP header's flag word, whereas these are named
> 'TCP_FLAGS_XXX' and are bit positions in the ENETC LSO register fields.
> Still, I think it is better to add the 'ENETC' prefix so that nobody
> mistakes them for generic definitions.

Oh, I'm sorry, I thought those were copies of the generic defs :s

Yes, just add ENETC_ then.

Thanks,
Olek


Thread overview: 15+ messages
2024-12-13  2:17 [PATCH v8 net-next 0/4] Add more features for ENETC v4 - round 1 Wei Fang
2024-12-13  2:17 ` [PATCH v8 net-next 1/4] net: enetc: add Tx checksum offload for i.MX95 ENETC Wei Fang
2024-12-17 15:13   ` Alexander Lobakin
2024-12-18  1:53     ` Wei Fang
2024-12-13  2:17 ` [PATCH v8 net-next 2/4] net: enetc: update max chained Tx BD number " Wei Fang
2024-12-13  2:17 ` [PATCH v8 net-next 3/4] net: enetc: add LSO support for i.MX95 ENETC PF Wei Fang
2024-12-17  9:20   ` Paolo Abeni
2024-12-17 12:52     ` Wei Fang
2024-12-17 15:32   ` Alexander Lobakin
2024-12-18  3:06     ` Wei Fang
2024-12-18  5:45       ` Wei Fang
2024-12-18 14:30       ` Alexander Lobakin
2024-12-19  1:32         ` Wei Fang
2024-12-19 15:16           ` Alexander Lobakin
2024-12-13  2:17 ` [PATCH v8 net-next 4/4] net: enetc: add UDP segmentation offload support Wei Fang
