netdev.vger.kernel.org archive mirror
* [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers.
@ 2024-06-04 13:13 Mateusz Polchlopek
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF Mateusz Polchlopek
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Mateusz Polchlopek

During VF creation, the VF registers a PTP clock in the system and
negotiates its capabilities with the PF. In the meantime, the PF
enables the flexible descriptor format for the VF; only this
descriptor format allows receiving Rx timestamps.

Enabling a virtual clock would be possible, though it would likely
perform poorly due to the lack of direct time access.

Timestamping should be enabled using userspace tools, e.g.:
hwstamp_ctl -i $VF -r 14

In order to report the timestamps to userspace, the VF extends the
timestamp to 40 bits.

To support this feature, flexible descriptor handling and a PTP
subsystem have been introduced in the iavf driver.

---
v7:
- changed .ndo_eth_ioctl to .ndo_hwtstamp_get and .ndo_hwtstamp_set
  (according to Kuba's suggestion) - patch 11

v6:
- reordered tags
- added RB tags where applicable
- removed redundant instructions in ifs - patch 4 and patch 5
- changed teardown to LIFO, adapter->ptp.initialized = false
  moved to the top of function - patch 6
- changed cpu-endianess for testing - patch 9
- aligned to libeth changes - patch 9
https://lore.kernel.org/netdev/20240528112301.5374-1-mateusz.polchlopek@intel.com/

v5:
- fixed all new issues generated by this series in kernel-doc
https://lore.kernel.org/netdev/20240418052500.50678-1-mateusz.polchlopek@intel.com/

v4:
- fixed duplicated argument in iavf_virtchnl.c reported by coccicheck
https://lore.kernel.org/netdev/20240410121706.6223-1-mateusz.polchlopek@intel.com/

v3:
- added RB in commit 6
- removed inline keyword in commit 9
- fixed sparse issues in commit 9 and commit 10
- used GENMASK_ULL when possible in commit 9
https://lore.kernel.org/netdev/20240403131927.87021-1-mateusz.polchlopek@intel.com/

v2:
- fixed warning related to wrong specifier to dev_err_once in
  commit 7
- fixed warnings related to unused variables in commit 9
https://lore.kernel.org/netdev/20240327132543.15923-1-mateusz.polchlopek@intel.com/

v1:
- initial series
https://lore.kernel.org/netdev/20240326115116.10040-1-mateusz.polchlopek@intel.com/
---

Jacob Keller (10):
  virtchnl: add support for enabling PTP on iAVF
  virtchnl: add enumeration for the rxdid format
  iavf: add support for negotiating flexible RXDID format
  iavf: negotiate PTP capabilities
  iavf: add initial framework for registering PTP clock
  iavf: add support for indirect access to PHC time
  iavf: periodically cache PHC time
  iavf: refactor iavf_clean_rx_irq to support legacy and flex
    descriptors
  iavf: handle set and get timestamps ops
  iavf: add support for Rx timestamps to hotpath

Mateusz Polchlopek (1):
  iavf: Implement checking DD desc field

Simei Su (1):
  ice: support Rx timestamp on flex descriptor

 drivers/net/ethernet/intel/iavf/Makefile      |   3 +-
 drivers/net/ethernet/intel/iavf/iavf.h        |  33 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 238 +++++++-
 drivers/net/ethernet/intel/iavf/iavf_ptp.c    | 543 ++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h    |  49 ++
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 425 +++++++++++---
 drivers/net/ethernet/intel/iavf/iavf_txrx.h   |  26 +-
 drivers/net/ethernet/intel/iavf/iavf_type.h   | 148 +++--
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 238 ++++++++
 drivers/net/ethernet/intel/ice/ice_base.c     |   3 -
 drivers/net/ethernet/intel/ice/ice_ptp.c      |   4 +-
 drivers/net/ethernet/intel/ice/ice_ptp.h      |   2 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |   2 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c |  86 ++-
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |   2 +
 .../intel/ice/ice_virtchnl_allowlist.c        |   6 +
 include/linux/avf/virtchnl.h                  | 127 +++-
 17 files changed, 1776 insertions(+), 159 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_ptp.c
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_ptp.h

-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:55   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor Mateusz Polchlopek
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Rahul Rameshbabu,
	Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Add support for allowing a VF to enable the PTP feature, specifically
Rx timestamps.

The new capability is gated by VIRTCHNL_VF_CAP_PTP, which must be
set by the VF to request access to the new operations. In addition, the
VIRTCHNL_OP_1588_PTP_GET_CAPS command is used to determine the specific
capabilities available to the VF.

This support includes the following additional capabilities:

* Rx timestamps enabled in the Rx queues (when using flexible advanced
  descriptors)
* Read access to PHC time over virtchnl using
  VIRTCHNL_OP_1588_PTP_GET_TIME

Extra space is reserved in most structures to allow for future
extension (such as set clock or Tx timestamps). Additional opcode
numbers are reserved, and each structure sets aside reserved fields so
that future extensions do not break compatibility.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 include/linux/avf/virtchnl.h | 66 ++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index f41395264dca..3663ad743de1 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -151,6 +151,9 @@ enum virtchnl_ops {
 	VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2 = 55,
 	VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2 = 56,
 	VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2 = 57,
+	/* opcode 58 and 59 are reserved */
+	VIRTCHNL_OP_1588_PTP_GET_CAPS = 60,
+	VIRTCHNL_OP_1588_PTP_GET_TIME = 61,
 	VIRTCHNL_OP_MAX,
 };
 
@@ -261,6 +264,7 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC	BIT(26)
 #define VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF		BIT(27)
 #define VIRTCHNL_VF_OFFLOAD_FDIR_PF		BIT(28)
+#define VIRTCHNL_VF_CAP_PTP			BIT(31)
 
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
 			       VIRTCHNL_VF_OFFLOAD_VLAN | \
@@ -1416,6 +1420,62 @@ struct virtchnl_fdir_del {
 
 VIRTCHNL_CHECK_STRUCT_LEN(12, virtchnl_fdir_del);
 
+#define VIRTCHNL_1588_PTP_CAP_RX_TSTAMP		BIT(1)
+#define VIRTCHNL_1588_PTP_CAP_READ_PHC		BIT(2)
+
+/**
+ * struct virtchnl_ptp_caps
+ *
+ * Structure that defines the PTP capabilities available to the VF. The VF
+ * sends VIRTCHNL_OP_1588_PTP_GET_CAPS, and must fill in the ptp_caps field
+ * indicating what capabilities it is requesting. The PF will respond with the
+ * same message with the virtchnl_ptp_caps structure indicating what is
+ * enabled for the VF.
+ *
+ * @caps: On send, VF sets what capabilities it requests. On reply, PF
+ *        indicates what has been enabled for this VF. The PF shall not set
+ *        bits which were not requested by the VF.
+ * @rsvd: Reserved bits for future extension.
+ *
+ * PTP capabilities
+ *
+ * VIRTCHNL_1588_PTP_CAP_RX_TSTAMP indicates that the VF receive queues have
+ * receive timestamps enabled in the flexible descriptors. Note that this
+ * requires a VF to also negotiate to enable advanced flexible descriptors in
+ * the receive path instead of the default legacy descriptor format.
+ *
+ * VIRTCHNL_1588_PTP_CAP_READ_PHC indicates that the VF may read the PHC time
+ * via the VIRTCHNL_OP_1588_PTP_GET_TIME command.
+ *
+ * Note that in the future, additional capability flags may be added which
+ * indicate additional extended support. All fields marked as reserved by this
+ * header will be set to zero. VF implementations should verify this to ensure
+ * that future extensions do not break compatibility.
+ */
+struct virtchnl_ptp_caps {
+	u32 caps;
+	u8 rsvd[44];
+};
+VIRTCHNL_CHECK_STRUCT_LEN(48, virtchnl_ptp_caps);
+
+/**
+ * struct virtchnl_phc_time
+ * @time: PHC time in nanoseconds
+ * @rsvd: Reserved for future extension
+ *
+ * Structure received with VIRTCHNL_OP_1588_PTP_GET_TIME. Contains the 64 bits
+ * of PHC clock time in nanoseconds.
+ *
+ * VIRTCHNL_OP_1588_PTP_GET_TIME may be sent to request the current time of
+ * the PHC. This op is available in case direct access via the PHC registers
+ * is not available.
+ */
+struct virtchnl_phc_time {
+	u64 time;
+	u8 rsvd[8];
+};
+VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_phc_time);
+
 #define __vss_byone(p, member, count, old)				      \
 	(struct_size(p, member, count) + (old - 1 - struct_size(p, member, 0)))
 
@@ -1637,6 +1697,12 @@ virtchnl_vc_validate_vf_msg(struct virtchnl_version_info *ver, u32 v_opcode,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		valid_len = sizeof(struct virtchnl_vlan_setting);
 		break;
+	case VIRTCHNL_OP_1588_PTP_GET_CAPS:
+		valid_len = sizeof(struct virtchnl_ptp_caps);
+		break;
+	case VIRTCHNL_OP_1588_PTP_GET_TIME:
+		valid_len = sizeof(struct virtchnl_phc_time);
+		break;
 	/* These are always errors coming from the VF. */
 	case VIRTCHNL_OP_EVENT:
 	case VIRTCHNL_OP_UNKNOWN:
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:56   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format Mateusz Polchlopek
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Simei Su, Wojciech Drewek, Mateusz Polchlopek

From: Simei Su <simei.su@intel.com>

To support Rx timestamp offload, the VF sends VIRTCHNL_OP_1588_PTP_GET_CAPS
to request PTP capabilities, and the PF responds with the capabilities
that are enabled for that VF.

Hardware captures timestamps which contain only 32 bits of nominal
nanoseconds, as opposed to the 64-bit timestamps that the stack expects.
Converting 32 bits to 64 requires the current PHC time:
the VF sends VIRTCHNL_OP_1588_PTP_GET_TIME, and the PF responds with
the current PHC time.

Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c     |  3 -
 drivers/net/ethernet/intel/ice/ice_ptp.c      |  4 +-
 drivers/net/ethernet/intel/ice/ice_ptp.h      |  2 +
 drivers/net/ethernet/intel/ice/ice_vf_lib.h   |  2 +
 drivers/net/ethernet/intel/ice/ice_virtchnl.c | 86 ++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_virtchnl.h |  2 +
 .../intel/ice/ice_virtchnl_allowlist.c        |  6 ++
 include/linux/avf/virtchnl.h                  | 15 +++-
 8 files changed, 111 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index f3dfce136106..81315e58236a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -469,9 +469,6 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring)
 	 */
 	if (vsi->type != ICE_VSI_VF)
 		ice_write_qrxflxp_cntxt(hw, pf_q, rxdid, 0x3, true);
-	else
-		ice_write_qrxflxp_cntxt(hw, pf_q, ICE_RXDID_LEGACY_1, 0x3,
-					false);
 
 	/* Absolute queue number out of 2K needs to be passed */
 	err = ice_write_rxq_ctx(hw, &rlan_ctx, pf_q);
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index b7ab6fdf710d..cee6f0d6181e 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -364,8 +364,8 @@ void ice_ptp_restore_timestamp_mode(struct ice_pf *pf)
  * @sts: Optional parameter for holding a pair of system timestamps from
  *       the system clock. Will be ignored if NULL is given.
  */
-static u64
-ice_ptp_read_src_clk_reg(struct ice_pf *pf, struct ptp_system_timestamp *sts)
+u64 ice_ptp_read_src_clk_reg(struct ice_pf *pf,
+			     struct ptp_system_timestamp *sts)
 {
 	struct ice_hw *hw = &pf->hw;
 	u32 hi, lo, lo2;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h
index e0c23aaedc12..7697f64261c6 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.h
@@ -315,6 +315,8 @@ void ice_ptp_req_tx_single_tstamp(struct ice_ptp_tx *tx, u8 idx);
 void ice_ptp_complete_tx_single_tstamp(struct ice_ptp_tx *tx);
 enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf);
 
+u64 ice_ptp_read_src_clk_reg(struct ice_pf *pf,
+			     struct ptp_system_timestamp *sts);
 u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
 			const struct ice_pkt_ctx *pkt_ctx);
 void ice_ptp_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type);
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index be4266899690..fdc63fae1803 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -136,6 +136,8 @@ struct ice_vf {
 	const struct ice_virtchnl_ops *virtchnl_ops;
 	const struct ice_vf_ops *vf_ops;
 
+	struct virtchnl_ptp_caps ptp_caps;
+
 	/* devlink port data */
 	struct devlink_port devlink_port;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 82ad4c6ff4d7..4f5e36c063e2 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -495,6 +495,9 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
 	if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_USO)
 		vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_USO;
 
+	if (vf->driver_caps & VIRTCHNL_VF_CAP_PTP)
+		vfres->vf_cap_flags |= VIRTCHNL_VF_CAP_PTP;
+
 	vfres->num_vsis = 1;
 	/* Tx and Rx queue are equal for VF */
 	vfres->num_queue_pairs = vsi->num_txq;
@@ -1783,9 +1786,17 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
 				rxdid = ICE_RXDID_LEGACY_1;
 			}
 
-			ice_write_qrxflxp_cntxt(&vsi->back->hw,
-						vsi->rxq_map[q_idx],
-						rxdid, 0x03, false);
+			if (vf->driver_caps &
+			    VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC &&
+			    vf->driver_caps & VIRTCHNL_VF_CAP_PTP &&
+			    qpi->rxq.flags & VIRTCHNL_PTP_RX_TSTAMP)
+				ice_write_qrxflxp_cntxt(&vsi->back->hw,
+							vsi->rxq_map[q_idx],
+							rxdid, 0x03, true);
+			else
+				ice_write_qrxflxp_cntxt(&vsi->back->hw,
+							vsi->rxq_map[q_idx],
+							rxdid, 0x03, false);
 		}
 	}
 
@@ -3788,6 +3799,65 @@ static int ice_vc_dis_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg)
 				     v_ret, NULL, 0);
 }
 
+static int ice_vc_get_ptp_cap(struct ice_vf *vf, u8 *msg)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	u32 msg_caps;
+	int ret;
+
+	/* VF is not in active state */
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	msg_caps = ((struct virtchnl_ptp_caps *)msg)->caps;
+
+	/* Any VF asking for RX timestamping and reading PHC will get that */
+	if (msg_caps & (VIRTCHNL_1588_PTP_CAP_RX_TSTAMP |
+	    VIRTCHNL_1588_PTP_CAP_READ_PHC))
+		vf->ptp_caps.caps = VIRTCHNL_1588_PTP_CAP_RX_TSTAMP |
+				    VIRTCHNL_1588_PTP_CAP_READ_PHC;
+
+err:
+	/* send the response back to the VF */
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_1588_PTP_GET_CAPS, v_ret,
+				    (u8 *)&vf->ptp_caps,
+				    sizeof(struct virtchnl_ptp_caps));
+	return ret;
+}
+
+static int ice_vc_get_phc_time(struct ice_vf *vf)
+{
+	enum virtchnl_status_code v_ret = VIRTCHNL_STATUS_SUCCESS;
+	struct virtchnl_phc_time *phc_time = NULL;
+	struct ice_pf *pf = vf->pf;
+	int len = 0;
+	int ret;
+
+	if (!test_bit(ICE_VF_STATE_ACTIVE, vf->vf_states)) {
+		v_ret = VIRTCHNL_STATUS_ERR_PARAM;
+		goto err;
+	}
+
+	len = sizeof(struct virtchnl_phc_time);
+	phc_time = kzalloc(len, GFP_KERNEL);
+	if (!phc_time) {
+		v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+		len = 0;
+		goto err;
+	}
+
+	phc_time->time = ice_ptp_read_src_clk_reg(pf, NULL);
+
+err:
+	/* send the response back to the VF */
+	ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_1588_PTP_GET_TIME,
+				    v_ret, (u8 *)phc_time, len);
+	kfree(phc_time);
+	return ret;
+}
+
 static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.get_ver_msg = ice_vc_get_ver_msg,
 	.get_vf_res_msg = ice_vc_get_vf_res_msg,
@@ -3821,6 +3891,8 @@ static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_ptp_cap = ice_vc_get_ptp_cap,
+	.get_phc_time = ice_vc_get_phc_time,
 };
 
 /**
@@ -3951,6 +4023,8 @@ static const struct ice_virtchnl_ops ice_virtchnl_repr_ops = {
 	.dis_vlan_stripping_v2_msg = ice_vc_dis_vlan_stripping_v2_msg,
 	.ena_vlan_insertion_v2_msg = ice_vc_ena_vlan_insertion_v2_msg,
 	.dis_vlan_insertion_v2_msg = ice_vc_dis_vlan_insertion_v2_msg,
+	.get_ptp_cap = ice_vc_get_ptp_cap,
+	.get_phc_time = ice_vc_get_phc_time,
 };
 
 /**
@@ -4177,6 +4251,12 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
 	case VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2:
 		err = ops->dis_vlan_insertion_v2_msg(vf, msg);
 		break;
+	case VIRTCHNL_OP_1588_PTP_GET_CAPS:
+		err = ops->get_ptp_cap(vf, msg);
+		break;
+	case VIRTCHNL_OP_1588_PTP_GET_TIME:
+		err = ops->get_phc_time(vf);
+		break;
 	case VIRTCHNL_OP_UNKNOWN:
 	default:
 		dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index 3a4115869153..e1c32f0f2e7a 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -61,6 +61,8 @@ struct ice_virtchnl_ops {
 	int (*dis_vlan_stripping_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*ena_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
 	int (*dis_vlan_insertion_v2_msg)(struct ice_vf *vf, u8 *msg);
+	int (*get_ptp_cap)(struct ice_vf *vf, u8 *msg);
+	int (*get_phc_time)(struct ice_vf *vf);
 };
 
 #ifdef CONFIG_PCI_IOV
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
index d796dbd2a440..7a442a53f4cc 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
@@ -84,6 +84,11 @@ static const u32 fdir_pf_allowlist_opcodes[] = {
 	VIRTCHNL_OP_ADD_FDIR_FILTER, VIRTCHNL_OP_DEL_FDIR_FILTER,
 };
 
+/* VIRTCHNL_VF_CAP_PTP */
+static const u32 ptp_allowlist_opcodes[] = {
+	VIRTCHNL_OP_1588_PTP_GET_CAPS, VIRTCHNL_OP_1588_PTP_GET_TIME,
+};
+
 struct allowlist_opcode_info {
 	const u32 *opcodes;
 	size_t size;
@@ -104,6 +109,7 @@ static const struct allowlist_opcode_info allowlist_opcodes[] = {
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF, adv_rss_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_FDIR_PF, fdir_pf_allowlist_opcodes),
 	ALLOW_ITEM(VIRTCHNL_VF_OFFLOAD_VLAN_V2, vlan_v2_allowlist_opcodes),
+	ALLOW_ITEM(VIRTCHNL_VF_CAP_PTP, ptp_allowlist_opcodes),
 };
 
 /**
diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 3663ad743de1..48bdfa1dc893 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -304,6 +304,18 @@ struct virtchnl_txq_info {
 
 VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_txq_info);
 
+/* virtchnl_rxq_info_flags
+ *
+ * Definition of bits in the flags field of the virtchnl_rxq_info structure.
+ */
+enum virtchnl_rxq_info_flags {
+	/* If the VIRTCHNL_PTP_RX_TSTAMP bit of the flag field is set, this is
+	 * a request to enable Rx timestamp. Other flag bits are currently
+	 * reserved and they may be extended in the future.
+	 */
+	VIRTCHNL_PTP_RX_TSTAMP = BIT(0),
+};
+
 /* VIRTCHNL_OP_CONFIG_RX_QUEUE
  * VF sends this message to set up parameters for one RX queue.
  * External data buffer contains one instance of virtchnl_rxq_info.
@@ -327,7 +339,8 @@ struct virtchnl_rxq_info {
 	u32 max_pkt_size;
 	u8 crc_disable;
 	u8 rxdid;
-	u8 pad1[2];
+	u8 flags; /* see virtchnl_rxq_info_flags */
+	u8 pad1;
 	u64 dma_ring_addr;
 
 	/* see enum virtchnl_rx_hsplit; deprecated with AVF 1.0 */
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF Mateusz Polchlopek
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:57   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format Mateusz Polchlopek
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Rahul Rameshbabu,
	Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Support for allowing VF to negotiate the descriptor format requires that
the VF specify which descriptor format to use when requesting Rx queues.
The VF is supposed to request the set of supported formats via the new
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, and then set one of the supported
formats in the rxdid field of the virtchnl_rxq_info structure.

The virtchnl.h header does not provide an enumeration of the format
values. The existing implementations in the PF directly use the values
from the DDP package.

Make the formats explicit by defining an enumeration of the RXDIDs.
Provide enumerations both for the values and for the corresponding bit
positions reported in the supported_rxdids data returned by
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 include/linux/avf/virtchnl.h | 46 ++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index 48bdfa1dc893..13c5596233a4 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -304,6 +304,46 @@ struct virtchnl_txq_info {
 
 VIRTCHNL_CHECK_STRUCT_LEN(24, virtchnl_txq_info);
 
+/* RX descriptor IDs (range from 0 to 63) */
+enum virtchnl_rx_desc_ids {
+	VIRTCHNL_RXDID_0_16B_BASE		= 0,
+	VIRTCHNL_RXDID_1_32B_BASE		= 1,
+	VIRTCHNL_RXDID_2_FLEX_SQ_NIC		= 2,
+	VIRTCHNL_RXDID_3_FLEX_SQ_SW		= 3,
+	VIRTCHNL_RXDID_4_FLEX_SQ_NIC_VEB	= 4,
+	VIRTCHNL_RXDID_5_FLEX_SQ_NIC_ACL	= 5,
+	VIRTCHNL_RXDID_6_FLEX_SQ_NIC_2		= 6,
+	VIRTCHNL_RXDID_7_HW_RSVD		= 7,
+	/* 8 through 15 are reserved */
+	VIRTCHNL_RXDID_16_COMMS_GENERIC		= 16,
+	VIRTCHNL_RXDID_17_COMMS_AUX_VLAN	= 17,
+	VIRTCHNL_RXDID_18_COMMS_AUX_IPV4	= 18,
+	VIRTCHNL_RXDID_19_COMMS_AUX_IPV6	= 19,
+	VIRTCHNL_RXDID_20_COMMS_AUX_FLOW	= 20,
+	VIRTCHNL_RXDID_21_COMMS_AUX_TCP		= 21,
+	/* 22 through 63 are reserved */
+};
+
+/* RX descriptor ID bitmasks */
+enum virtchnl_rx_desc_id_bitmasks {
+	VIRTCHNL_RXDID_0_16B_BASE_M		= BIT(VIRTCHNL_RXDID_0_16B_BASE),
+	VIRTCHNL_RXDID_1_32B_BASE_M		= BIT(VIRTCHNL_RXDID_1_32B_BASE),
+	VIRTCHNL_RXDID_2_FLEX_SQ_NIC_M		= BIT(VIRTCHNL_RXDID_2_FLEX_SQ_NIC),
+	VIRTCHNL_RXDID_3_FLEX_SQ_SW_M		= BIT(VIRTCHNL_RXDID_3_FLEX_SQ_SW),
+	VIRTCHNL_RXDID_4_FLEX_SQ_NIC_VEB_M	= BIT(VIRTCHNL_RXDID_4_FLEX_SQ_NIC_VEB),
+	VIRTCHNL_RXDID_5_FLEX_SQ_NIC_ACL_M	= BIT(VIRTCHNL_RXDID_5_FLEX_SQ_NIC_ACL),
+	VIRTCHNL_RXDID_6_FLEX_SQ_NIC_2_M	= BIT(VIRTCHNL_RXDID_6_FLEX_SQ_NIC_2),
+	VIRTCHNL_RXDID_7_HW_RSVD_M		= BIT(VIRTCHNL_RXDID_7_HW_RSVD),
+	/* 8 through 15 are reserved */
+	VIRTCHNL_RXDID_16_COMMS_GENERIC_M	= BIT(VIRTCHNL_RXDID_16_COMMS_GENERIC),
+	VIRTCHNL_RXDID_17_COMMS_AUX_VLAN_M	= BIT(VIRTCHNL_RXDID_17_COMMS_AUX_VLAN),
+	VIRTCHNL_RXDID_18_COMMS_AUX_IPV4_M	= BIT(VIRTCHNL_RXDID_18_COMMS_AUX_IPV4),
+	VIRTCHNL_RXDID_19_COMMS_AUX_IPV6_M	= BIT(VIRTCHNL_RXDID_19_COMMS_AUX_IPV6),
+	VIRTCHNL_RXDID_20_COMMS_AUX_FLOW_M	= BIT(VIRTCHNL_RXDID_20_COMMS_AUX_FLOW),
+	VIRTCHNL_RXDID_21_COMMS_AUX_TCP_M	= BIT(VIRTCHNL_RXDID_21_COMMS_AUX_TCP),
+	/* 22 through 63 are reserved */
+};
+
 /* virtchnl_rxq_info_flags
  *
  * Definition of bits in the flags field of the virtchnl_rxq_info structure.
@@ -338,6 +378,11 @@ struct virtchnl_rxq_info {
 	u32 databuffer_size;
 	u32 max_pkt_size;
 	u8 crc_disable;
+	/* see enum virtchnl_rx_desc_ids;
+	 * only used when VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is supported. Note
+	 * that when the offload is not supported, the descriptor format aligns
+	 * with VIRTCHNL_RXDID_1_32B_BASE.
+	 */
 	u8 rxdid;
 	u8 flags; /* see virtchnl_rxq_info_flags */
 	u8 pad1;
@@ -1041,6 +1086,7 @@ struct virtchnl_filter {
 VIRTCHNL_CHECK_STRUCT_LEN(272, virtchnl_filter);
 
 struct virtchnl_supported_rxdids {
+	/* see enum virtchnl_rx_desc_id_bitmasks */
 	u64 supported_rxdids;
 };
 
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (2 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:56   ` Simon Horman
  2024-06-08 12:58   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities Mateusz Polchlopek
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Jacob Keller, Wojciech Drewek, Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, allowing the VF
driver to determine which Rx descriptor formats are available. This
requires sending an additional message during initialization and reset,
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, which requests the supported Rx
descriptor IDs from the PF.

This is treated the same way that VLAN V2 capabilities are handled: add
a new set of extended capability flags, used to track the send and
receipt of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message.

This ensures we finish negotiating the supported descriptor formats
prior to beginning configuration of receive queues.

This change stores the supported format bitmap into the iavf_adapter
structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled
by the PF, we need to make sure that the Rx queue configuration
specifies the format.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        |  20 ++-
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 127 ++++++++++++++++--
 drivers/net/ethernet/intel/iavf/iavf_txrx.h   |   2 +
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |  59 ++++++++
 4 files changed, 199 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index fb6f1b644d3b..bc0201f6453d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -267,6 +267,7 @@ struct iavf_adapter {
 	/* Lock to protect accesses to MAC and VLAN lists */
 	spinlock_t mac_vlan_list_lock;
 	char misc_vector_name[IFNAMSIZ + 9];
+	u8 rxdid;
 	int num_active_queues;
 	int num_req_queues;
 
@@ -336,6 +337,14 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_INSERTION	BIT_ULL(36)
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
+#define IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS		BIT_ULL(39)
+
+	/* AQ messages that must be sent after IAVF_FLAG_AQ_GET_CONFIG, in
+	 * order to negotiate extended capabilities.
+	 */
+#define IAVF_FLAG_AQ_EXTENDED_CAPS			\
+	(IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS |	\
+	 IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -347,10 +356,14 @@ struct iavf_adapter {
 	u64 extended_caps;
 #define IAVF_EXTENDED_CAP_SEND_VLAN_V2			BIT_ULL(0)
 #define IAVF_EXTENDED_CAP_RECV_VLAN_V2			BIT_ULL(1)
+#define IAVF_EXTENDED_CAP_SEND_RXDID			BIT_ULL(2)
+#define IAVF_EXTENDED_CAP_RECV_RXDID			BIT_ULL(3)
 
 #define IAVF_EXTENDED_CAPS				\
 	(IAVF_EXTENDED_CAP_SEND_VLAN_V2 |		\
-	 IAVF_EXTENDED_CAP_RECV_VLAN_V2)
+	 IAVF_EXTENDED_CAP_RECV_VLAN_V2 |		\
+	 IAVF_EXTENDED_CAP_SEND_RXDID |			\
+	 IAVF_EXTENDED_CAP_RECV_RXDID)
 
 	/* Lock to prevent possible clobbering of
 	 * current_netdev_promisc_flags
@@ -408,12 +421,15 @@ struct iavf_adapter {
 			       VIRTCHNL_VF_OFFLOAD_FDIR_PF)
 #define ADV_RSS_SUPPORT(_a) ((_a)->vf_res->vf_cap_flags & \
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
+#define RXDID_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			   VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
 #define PF_IS_V11(_a) (((_a)->pf_version.major == 1) && \
 		       ((_a)->pf_version.minor == 1))
 	struct virtchnl_vlan_caps vlan_v2_caps;
+	struct virtchnl_supported_rxdids supported_rxdids;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
 	struct iavf_vsi vsi;
@@ -551,6 +567,8 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter);
 int iavf_get_vf_config(struct iavf_adapter *adapter);
 int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter);
 int iavf_send_vf_offload_vlan_v2_msg(struct iavf_adapter *adapter);
+int iavf_send_vf_supported_rxdids_msg(struct iavf_adapter *adapter);
+int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter);
 void iavf_set_queue_vlan_tag_loc(struct iavf_adapter *adapter);
 u16 iavf_get_num_vlans_added(struct iavf_adapter *adapter);
 void iavf_irq_enable(struct iavf_adapter *adapter, bool flush);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f46865f2ab56..11f3280793e6 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -710,6 +710,38 @@ static void iavf_configure_tx(struct iavf_adapter *adapter)
 		adapter->tx_rings[i].tail = hw->hw_addr + IAVF_QTX_TAIL1(i);
 }
 
+/**
+ * iavf_select_rx_desc_format - Select Rx descriptor format
+ * @adapter: adapter private structure
+ *
+ * Select what Rx descriptor format based on availability and enabled
+ * features.
+ *
+ * Return: the desired RXDID to select for a given Rx queue, as defined by
+ *         enum virtchnl_rxdid_format.
+ */
+static u8 iavf_select_rx_desc_format(struct iavf_adapter *adapter)
+{
+	u64 supported_rxdids = adapter->supported_rxdids.supported_rxdids;
+
+	/* If we did not negotiate VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, we must
+	 * stick with the default value of the legacy 32 byte format.
+	 */
+	if (!RXDID_ALLOWED(adapter))
+		return VIRTCHNL_RXDID_1_32B_BASE;
+
+	/* Warn if the PF does not list support for the default legacy
+	 * descriptor format. This shouldn't happen, as this is the format
+	 * used if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is not supported. It is
+	 * likely caused by a bug in the PF implementation failing to indicate
+	 * support for the format.
+	 */
+	if (!(supported_rxdids & BIT(VIRTCHNL_RXDID_1_32B_BASE)))
+		dev_warn(&adapter->pdev->dev, "PF does not list support for default Rx descriptor format\n");
+
+	return VIRTCHNL_RXDID_1_32B_BASE;
+}
+
 /**
  * iavf_configure_rx - Configure Receive Unit after Reset
  * @adapter: board private structure
@@ -720,8 +752,12 @@ static void iavf_configure_rx(struct iavf_adapter *adapter)
 {
 	struct iavf_hw *hw = &adapter->hw;
 
-	for (u32 i = 0; i < adapter->num_active_queues; i++)
+	adapter->rxdid = iavf_select_rx_desc_format(adapter);
+
+	for (u32 i = 0; i < adapter->num_active_queues; i++) {
 		adapter->rx_rings[i].tail = hw->hw_addr + IAVF_QRX_TAIL1(i);
+		adapter->rx_rings[i].rxdid = adapter->rxdid;
+	}
 }
 
 /**
@@ -2046,6 +2082,8 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return iavf_send_vf_config_msg(adapter);
 	if (adapter->aq_required & IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS)
 		return iavf_send_vf_offload_vlan_v2_msg(adapter);
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS)
+		return iavf_send_vf_supported_rxdids_msg(adapter);
 	if (adapter->aq_required & IAVF_FLAG_AQ_DISABLE_QUEUES) {
 		iavf_disable_queues(adapter);
 		return 0;
@@ -2559,6 +2597,67 @@ static void iavf_init_recv_offload_vlan_v2_caps(struct iavf_adapter *adapter)
 	iavf_change_state(adapter, __IAVF_INIT_FAILED);
 }
 
+/**
+ * iavf_init_send_supported_rxdids - part of querying for supported RXDID
+ * formats
+ * @adapter: board private structure
+ *
+ * Send the request for supported RXDIDs to the PF. Must clear
+ * IAVF_EXTENDED_CAP_RECV_RXDID if the message is not sent, e.g. due to the
+ * PF not negotiating VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC.
+ */
+static void iavf_init_send_supported_rxdids(struct iavf_adapter *adapter)
+{
+	int ret;
+
+	WARN_ON(!(adapter->extended_caps & IAVF_EXTENDED_CAP_SEND_RXDID));
+
+	ret = iavf_send_vf_supported_rxdids_msg(adapter);
+	if (ret == -EOPNOTSUPP) {
+		/* PF does not support VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC. In this
+		 * case, we did not send the capability exchange message and
+		 * do not expect a response.
+		 */
+		adapter->extended_caps &= ~IAVF_EXTENDED_CAP_RECV_RXDID;
+	}
+
+	/* We sent the message, so move on to the next step */
+	adapter->extended_caps &= ~IAVF_EXTENDED_CAP_SEND_RXDID;
+}
+
+/**
+ * iavf_init_recv_supported_rxdids - part of querying for supported RXDID
+ * formats
+ * @adapter: board private structure
+ *
+ * Function processes receipt of the supported RXDIDs message from the PF.
+ **/
+static void iavf_init_recv_supported_rxdids(struct iavf_adapter *adapter)
+{
+	int ret;
+
+	WARN_ON(!(adapter->extended_caps & IAVF_EXTENDED_CAP_RECV_RXDID));
+
+	memset(&adapter->supported_rxdids, 0,
+	       sizeof(adapter->supported_rxdids));
+
+	ret = iavf_get_vf_supported_rxdids(adapter);
+	if (ret)
+		goto err;
+
+	/* We've processed the PF response to the
+	 * VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message we sent previously.
+	 */
+	adapter->extended_caps &= ~IAVF_EXTENDED_CAP_RECV_RXDID;
+	return;
+err:
+	/* We didn't receive a reply. Make sure we try sending again when
+	 * __IAVF_INIT_FAILED attempts to recover.
+	 */
+	adapter->extended_caps |= IAVF_EXTENDED_CAP_SEND_RXDID;
+	iavf_change_state(adapter, __IAVF_INIT_FAILED);
+}
+
 /**
  * iavf_init_process_extended_caps - Part of driver startup
  * @adapter: board private structure
@@ -2583,6 +2682,15 @@ static void iavf_init_process_extended_caps(struct iavf_adapter *adapter)
 		return;
 	}
 
+	/* Process capability exchange for RXDID formats */
+	if (adapter->extended_caps & IAVF_EXTENDED_CAP_SEND_RXDID) {
+		iavf_init_send_supported_rxdids(adapter);
+		return;
+	} else if (adapter->extended_caps & IAVF_EXTENDED_CAP_RECV_RXDID) {
+		iavf_init_recv_supported_rxdids(adapter);
+		return;
+	}
+
 	/* When we reach here, no further extended capabilities exchanges are
 	 * necessary, so we finally transition into __IAVF_INIT_CONFIG_ADAPTER
 	 */
@@ -3051,15 +3159,18 @@ static void iavf_reset_task(struct work_struct *work)
 	}
 
 	adapter->aq_required |= IAVF_FLAG_AQ_GET_CONFIG;
-	/* always set since VIRTCHNL_OP_GET_VF_RESOURCES has not been
-	 * sent/received yet, so VLAN_V2_ALLOWED() cannot is not reliable here,
-	 * however the VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS won't be sent until
-	 * VIRTCHNL_OP_GET_VF_RESOURCES and VIRTCHNL_VF_OFFLOAD_VLAN_V2 have
-	 * been successfully sent and negotiated
-	 */
-	adapter->aq_required |= IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS;
 	adapter->aq_required |= IAVF_FLAG_AQ_MAP_VECTORS;
 
+	/* Certain capabilities require an extended negotiation process using
+	 * extra messages that must be processed after getting the VF
+	 * configuration. The related checks such as VLAN_V2_ALLOWED() are not
+	 * reliable here, since the configuration has not yet been negotiated.
+	 *
+	 * Always set these flags, since the related VIRTCHNL messages won't
+	 * be sent until after VIRTCHNL_OP_GET_VF_RESOURCES.
+	 */
+	adapter->aq_required |= IAVF_FLAG_AQ_EXTENDED_CAPS;
+
 	spin_lock_bh(&adapter->mac_vlan_list_lock);
 
 	/* Delete filter for the current MAC address, it could have
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index d7b5587aeb8e..17309d8625ac 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -262,6 +262,8 @@ struct iavf_ring {
 	u16 next_to_use;
 	u16 next_to_clean;
 
+	u8 rxdid;		/* Rx descriptor format */
+
 	u16 flags;
 #define IAVF_TXR_FLAGS_WB_ON_ITR		BIT(0)
 #define IAVF_TXR_FLAGS_ARM_WB			BIT(1)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 7e810b65380c..797e6ecbc30b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -144,6 +144,7 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_ENCAP |
 	       VIRTCHNL_VF_OFFLOAD_TC_U32 |
 	       VIRTCHNL_VF_OFFLOAD_VLAN_V2 |
+	       VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC |
 	       VIRTCHNL_VF_OFFLOAD_CRC |
 	       VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM |
 	       VIRTCHNL_VF_OFFLOAD_REQ_QUEUES |
@@ -176,6 +177,19 @@ int iavf_send_vf_offload_vlan_v2_msg(struct iavf_adapter *adapter)
 				NULL, 0);
 }
 
+int iavf_send_vf_supported_rxdids_msg(struct iavf_adapter *adapter)
+{
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS;
+
+	if (!RXDID_ALLOWED(adapter))
+		return -EOPNOTSUPP;
+
+	adapter->current_op = VIRTCHNL_OP_GET_SUPPORTED_RXDIDS;
+
+	return iavf_send_pf_msg(adapter, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
+				NULL, 0);
+}
+
 /**
  * iavf_validate_num_queues
  * @adapter: adapter structure
@@ -262,6 +276,45 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter)
 	return err;
 }
 
+int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter)
+{
+	struct iavf_hw *hw = &adapter->hw;
+	struct iavf_arq_event_info event;
+	enum virtchnl_ops op;
+	enum iavf_status err;
+	u16 len;
+
+	len = sizeof(struct virtchnl_supported_rxdids);
+	event.buf_len = len;
+	event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
+	if (!event.msg_buf) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	while (1) {
+		/* When the AQ is empty, iavf_clean_arq_element will return
+		 * nonzero and this loop will terminate.
+		 */
+		err = iavf_clean_arq_element(hw, &event, NULL);
+		if (err != IAVF_SUCCESS)
+			goto out_alloc;
+		op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
+		if (op == VIRTCHNL_OP_GET_SUPPORTED_RXDIDS)
+			break;
+	}
+
+	err = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
+	if (err)
+		goto out_alloc;
+
+	memcpy(&adapter->supported_rxdids, event.msg_buf, min(event.msg_len, len));
+out_alloc:
+	kfree(event.msg_buf);
+out:
+	return err;
+}
+
 /**
  * iavf_configure_queues
  * @adapter: adapter structure
@@ -308,6 +361,8 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
 		vqpi->rxq.dma_ring_addr = adapter->rx_rings[i].dma;
 		vqpi->rxq.max_pkt_size = max_frame;
 		vqpi->rxq.databuffer_size = adapter->rx_rings[i].rx_buf_len;
+		if (RXDID_ALLOWED(adapter))
+			vqpi->rxq.rxdid = adapter->rxdid;
 		if (CRC_OFFLOAD_ALLOWED(adapter))
 			vqpi->rxq.crc_disable = !!(adapter->netdev->features &
 						   NETIF_F_RXFCS);
@@ -2372,6 +2427,10 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 			aq_required;
 		}
 		break;
+	case VIRTCHNL_OP_GET_SUPPORTED_RXDIDS:
+		memcpy(&adapter->supported_rxdids, msg,
+		       min_t(u16, msglen, sizeof(adapter->supported_rxdids)));
+		break;
 	case VIRTCHNL_OP_ENABLE_QUEUES:
 		/* enable transmits */
 		iavf_irq_enable(adapter, true);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (3 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:58   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock Mateusz Polchlopek
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Jacob Keller, Wojciech Drewek, Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Add a new extended capabilities negotiation to exchange information from
the PF about what PTP capabilities are supported by this VF. This
requires sending a VIRTCHNL_OP_1588_PTP_GET_CAPS message, and waiting
for the response from the PF. Handle this early on during the VF
initialization.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf.h        | 17 +++-
 drivers/net/ethernet/intel/iavf/iavf_main.c   | 69 ++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h    | 12 +++
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   | 79 +++++++++++++++++++
 4 files changed, 175 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_ptp.h

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index bc0201f6453d..8e86b0edb046 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -40,6 +40,7 @@
 #include "iavf_txrx.h"
 #include "iavf_fdir.h"
 #include "iavf_adv_rss.h"
+#include "iavf_ptp.h"
 #include <linux/bitmap.h>
 
 #define DEFAULT_DEBUG_LEVEL_SHIFT 3
@@ -338,13 +339,16 @@ struct iavf_adapter {
 #define IAVF_FLAG_AQ_ENABLE_STAG_VLAN_INSERTION		BIT_ULL(37)
 #define IAVF_FLAG_AQ_DISABLE_STAG_VLAN_INSERTION	BIT_ULL(38)
 #define IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS		BIT_ULL(39)
+#define IAVF_FLAG_AQ_GET_PTP_CAPS			BIT_ULL(40)
+#define IAVF_FLAG_AQ_SEND_PTP_CMD			BIT_ULL(41)
 
 	/* AQ messages that must be sent after IAVF_FLAG_AQ_GET_CONFIG, in
 	 * order to negotiate extended capabilities.
 	 */
 #define IAVF_FLAG_AQ_EXTENDED_CAPS			\
 	(IAVF_FLAG_AQ_GET_OFFLOAD_VLAN_V2_CAPS |	\
-	 IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS)
+	 IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS |		\
+	 IAVF_FLAG_AQ_GET_PTP_CAPS)
 
 	/* flags for processing extended capability messages during
 	 * __IAVF_INIT_EXTENDED_CAPS. Each capability exchange requires
@@ -358,12 +362,16 @@ struct iavf_adapter {
 #define IAVF_EXTENDED_CAP_RECV_VLAN_V2			BIT_ULL(1)
 #define IAVF_EXTENDED_CAP_SEND_RXDID			BIT_ULL(2)
 #define IAVF_EXTENDED_CAP_RECV_RXDID			BIT_ULL(3)
+#define IAVF_EXTENDED_CAP_SEND_PTP			BIT_ULL(4)
+#define IAVF_EXTENDED_CAP_RECV_PTP			BIT_ULL(5)
 
 #define IAVF_EXTENDED_CAPS				\
 	(IAVF_EXTENDED_CAP_SEND_VLAN_V2 |		\
 	 IAVF_EXTENDED_CAP_RECV_VLAN_V2 |		\
 	 IAVF_EXTENDED_CAP_SEND_RXDID |			\
-	 IAVF_EXTENDED_CAP_RECV_RXDID)
+	 IAVF_EXTENDED_CAP_RECV_RXDID |			\
+	 IAVF_EXTENDED_CAP_SEND_PTP |			\
+	 IAVF_EXTENDED_CAP_RECV_PTP)
 
 	/* Lock to prevent possible clobbering of
 	 * current_netdev_promisc_flags
@@ -423,6 +431,8 @@ struct iavf_adapter {
 			     VIRTCHNL_VF_OFFLOAD_ADV_RSS_PF)
 #define RXDID_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
 			   VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC)
+#define PTP_ALLOWED(_a) ((_a)->vf_res->vf_cap_flags & \
+			 VIRTCHNL_VF_CAP_PTP)
 	struct virtchnl_vf_resource *vf_res; /* incl. all VSIs */
 	struct virtchnl_vsi_resource *vsi_res; /* our LAN VSI */
 	struct virtchnl_version_info pf_version;
@@ -430,6 +440,7 @@ struct iavf_adapter {
 		       ((_a)->pf_version.minor == 1))
 	struct virtchnl_vlan_caps vlan_v2_caps;
 	struct virtchnl_supported_rxdids supported_rxdids;
+	struct iavf_ptp ptp;
 	u16 msg_enable;
 	struct iavf_eth_stats current_stats;
 	struct iavf_vsi vsi;
@@ -569,6 +580,8 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter);
 int iavf_send_vf_offload_vlan_v2_msg(struct iavf_adapter *adapter);
 int iavf_send_vf_supported_rxdids_msg(struct iavf_adapter *adapter);
 int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter);
+int iavf_send_vf_ptp_caps_msg(struct iavf_adapter *adapter);
+int iavf_get_vf_ptp_caps(struct iavf_adapter *adapter);
 void iavf_set_queue_vlan_tag_loc(struct iavf_adapter *adapter);
 u16 iavf_get_num_vlans_added(struct iavf_adapter *adapter);
 void iavf_irq_enable(struct iavf_adapter *adapter, bool flush);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 11f3280793e6..7612c2f15845 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2084,6 +2084,8 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		return iavf_send_vf_offload_vlan_v2_msg(adapter);
 	if (adapter->aq_required & IAVF_FLAG_AQ_GET_SUPPORTED_RXDIDS)
 		return iavf_send_vf_supported_rxdids_msg(adapter);
+	if (adapter->aq_required & IAVF_FLAG_AQ_GET_PTP_CAPS)
+		return iavf_send_vf_ptp_caps_msg(adapter);
 	if (adapter->aq_required & IAVF_FLAG_AQ_DISABLE_QUEUES) {
 		iavf_disable_queues(adapter);
 		return 0;
@@ -2658,6 +2660,64 @@ static void iavf_init_recv_supported_rxdids(struct iavf_adapter *adapter)
 	iavf_change_state(adapter, __IAVF_INIT_FAILED);
 }
 
+/**
+ * iavf_init_send_ptp_caps - part of querying for extended PTP capabilities
+ * @adapter: board private structure
+ *
+ * Send the request for 1588 PTP capabilities to the PF. Must clear
+ * IAVF_EXTENDED_CAP_RECV_PTP if the message is not sent, e.g. due to the
+ * PF not negotiating VIRTCHNL_VF_CAP_PTP.
+ */
+static void iavf_init_send_ptp_caps(struct iavf_adapter *adapter)
+{
+	int ret;
+
+	WARN_ON(!(adapter->extended_caps & IAVF_EXTENDED_CAP_SEND_PTP));
+
+	ret = iavf_send_vf_ptp_caps_msg(adapter);
+	if (ret == -EOPNOTSUPP) {
+		/* PF does not support VIRTCHNL_VF_PTP_CAP. In this case, we
+		 * did not send the capability exchange message and do not
+		 * expect a response.
+		 */
+		adapter->extended_caps &= ~IAVF_EXTENDED_CAP_RECV_PTP;
+	}
+
+	/* We sent the message, so move on to the next step */
+	adapter->extended_caps &= ~IAVF_EXTENDED_CAP_SEND_PTP;
+}
+
+/**
+ * iavf_init_recv_ptp_caps - part of querying for supported PTP capabilities
+ * @adapter: board private structure
+ *
+ * Function processes receipt of the PTP capabilities supported on this VF.
+ **/
+static void iavf_init_recv_ptp_caps(struct iavf_adapter *adapter)
+{
+	int ret;
+
+	WARN_ON(!(adapter->extended_caps & IAVF_EXTENDED_CAP_RECV_PTP));
+
+	memset(&adapter->ptp.hw_caps, 0, sizeof(adapter->ptp.hw_caps));
+
+	ret = iavf_get_vf_ptp_caps(adapter);
+	if (ret)
+		goto err;
+
+	/* We've processed the PF response to the VIRTCHNL_OP_1588_PTP_GET_CAPS
+	 * message we sent previously.
+	 */
+	adapter->extended_caps &= ~IAVF_EXTENDED_CAP_RECV_PTP;
+	return;
+err:
+	/* We didn't receive a reply. Make sure we try sending again when
+	 * __IAVF_INIT_FAILED attempts to recover.
+	 */
+	adapter->extended_caps |= IAVF_EXTENDED_CAP_SEND_PTP;
+	iavf_change_state(adapter, __IAVF_INIT_FAILED);
+}
+
 /**
  * iavf_init_process_extended_caps - Part of driver startup
  * @adapter: board private structure
@@ -2691,6 +2751,15 @@ static void iavf_init_process_extended_caps(struct iavf_adapter *adapter)
 		return;
 	}
 
+	/* Process capability exchange for PTP features */
+	if (adapter->extended_caps & IAVF_EXTENDED_CAP_SEND_PTP) {
+		iavf_init_send_ptp_caps(adapter);
+		return;
+	} else if (adapter->extended_caps & IAVF_EXTENDED_CAP_RECV_PTP) {
+		iavf_init_recv_ptp_caps(adapter);
+		return;
+	}
+
 	/* When we reach here, no further extended capabilities exchanges are
 	 * necessary, so we finally transition into __IAVF_INIT_CONFIG_ADAPTER
 	 */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
new file mode 100644
index 000000000000..aee4e2da0b9a
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2024 Intel Corporation. */
+
+#ifndef _IAVF_PTP_H_
+#define _IAVF_PTP_H_
+
+/* fields used for PTP support */
+struct iavf_ptp {
+	struct virtchnl_ptp_caps hw_caps;
+};
+
+#endif /* _IAVF_PTP_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 797e6ecbc30b..096b15375b3d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -148,6 +148,7 @@ int iavf_send_vf_config_msg(struct iavf_adapter *adapter)
 	       VIRTCHNL_VF_OFFLOAD_CRC |
 	       VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM |
 	       VIRTCHNL_VF_OFFLOAD_REQ_QUEUES |
+	       VIRTCHNL_VF_CAP_PTP |
 	       VIRTCHNL_VF_OFFLOAD_ADQ |
 	       VIRTCHNL_VF_OFFLOAD_USO |
 	       VIRTCHNL_VF_OFFLOAD_FDIR_PF |
@@ -190,6 +191,41 @@ int iavf_send_vf_supported_rxdids_msg(struct iavf_adapter *adapter)
 				NULL, 0);
 }
 
+/**
+ * iavf_send_vf_ptp_caps_msg - Send request for PTP capabilities
+ * @adapter: private adapter structure
+ *
+ * Send the VIRTCHNL_OP_1588_PTP_GET_CAPS command to the PF to request the PTP
+ * capabilities available to this device. This includes the following
+ * potential access:
+ *
+ * * READ_PHC - access to read the PTP hardware clock time
+ * * RX_TSTAMP - access to request Rx timestamps on all received packets
+ *
+ * The PF will reply with the same opcode and a filled-out copy of the
+ * virtchnl_ptp_caps structure which defines the specifics of which features
+ * are accessible to this device.
+ *
+ * Return: 0 if success, error code otherwise
+ */
+int iavf_send_vf_ptp_caps_msg(struct iavf_adapter *adapter)
+{
+	struct virtchnl_ptp_caps hw_caps = {};
+
+	adapter->aq_required &= ~IAVF_FLAG_AQ_GET_PTP_CAPS;
+
+	if (!PTP_ALLOWED(adapter))
+		return -EOPNOTSUPP;
+
+	hw_caps.caps = (VIRTCHNL_1588_PTP_CAP_READ_PHC |
+			VIRTCHNL_1588_PTP_CAP_RX_TSTAMP);
+
+	adapter->current_op = VIRTCHNL_OP_1588_PTP_GET_CAPS;
+
+	return iavf_send_pf_msg(adapter, VIRTCHNL_OP_1588_PTP_GET_CAPS,
+				(u8 *)&hw_caps, sizeof(hw_caps));
+}
+
 /**
  * iavf_validate_num_queues
  * @adapter: adapter structure
@@ -315,6 +351,45 @@ int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter)
 	return err;
 }
 
+int iavf_get_vf_ptp_caps(struct iavf_adapter *adapter)
+{
+	struct iavf_hw *hw = &adapter->hw;
+	struct iavf_arq_event_info event;
+	enum virtchnl_ops op;
+	enum iavf_status err;
+	u16 len;
+
+	len = sizeof(struct virtchnl_ptp_caps);
+	event.buf_len = len;
+	event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
+	if (!event.msg_buf) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	while (1) {
+		/* When the AQ is empty, iavf_clean_arq_element will return
+		 * nonzero and this loop will terminate.
+		 */
+		err = iavf_clean_arq_element(hw, &event, NULL);
+		if (err != IAVF_SUCCESS)
+			goto out_alloc;
+		op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
+		if (op == VIRTCHNL_OP_1588_PTP_GET_CAPS)
+			break;
+	}
+
+	err = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
+	if (err)
+		goto out_alloc;
+
+	memcpy(&adapter->ptp.hw_caps, event.msg_buf, min(event.msg_len, len));
+out_alloc:
+	kfree(event.msg_buf);
+out:
+	return err;
+}
+
 /**
  * iavf_configure_queues
  * @adapter: adapter structure
@@ -2431,6 +2506,10 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		memcpy(&adapter->supported_rxdids, msg,
 		       min_t(u16, msglen, sizeof(adapter->supported_rxdids)));
 		break;
+	case VIRTCHNL_OP_1588_PTP_GET_CAPS:
+		memcpy(&adapter->ptp.hw_caps, msg,
+		       min_t(u16, msglen, sizeof(adapter->ptp.hw_caps)));
+		break;
 	case VIRTCHNL_OP_ENABLE_QUEUES:
 		/* enable transmits */
 		iavf_irq_enable(adapter, true);
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (4 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:58   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time Mateusz Polchlopek
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Sai Krishna, Ahmed Zaki,
	Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Add the iavf_ptp.c file and fill it in with a skeleton framework to
allow registering the PTP clock device.
Add implementation of helper functions to check if a PTP capability
is supported and handle change in PTP capabilities.
Enabling virtual clock would be possible, though it would probably
perform poorly due to the lack of direct time access.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/Makefile      |   3 +-
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   5 +
 drivers/net/ethernet/intel/iavf/iavf_ptp.c    | 125 ++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h    |  10 ++
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   2 +
 5 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/iavf/iavf_ptp.c

diff --git a/drivers/net/ethernet/intel/iavf/Makefile b/drivers/net/ethernet/intel/iavf/Makefile
index 356ac9faa5bf..364eafe31483 100644
--- a/drivers/net/ethernet/intel/iavf/Makefile
+++ b/drivers/net/ethernet/intel/iavf/Makefile
@@ -12,4 +12,5 @@ subdir-ccflags-y += -I$(src)
 obj-$(CONFIG_IAVF) += iavf.o
 
 iavf-y := iavf_main.o iavf_ethtool.o iavf_virtchnl.o iavf_fdir.o \
-	  iavf_adv_rss.o iavf_txrx.o iavf_common.o iavf_adminq.o
+	  iavf_adv_rss.o iavf_txrx.o iavf_common.o iavf_adminq.o \
+	  iavf_ptp.o
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 7612c2f15845..cf11a1fd19bd 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2848,6 +2848,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
 		/* request initial VLAN offload settings */
 		iavf_set_vlan_offload_features(adapter, 0, netdev->features);
 
+	/* Setup initial PTP configuration */
+	iavf_ptp_init(adapter);
+
 	iavf_schedule_finish_config(adapter);
 	return;
 
@@ -5475,6 +5478,8 @@ static void iavf_remove(struct pci_dev *pdev)
 		msleep(50);
 	}
 
+	iavf_ptp_release(adapter);
+
 	iavf_misc_irq_disable(adapter);
 	/* Shut down all the garbage mashers on the detention level */
 	cancel_work_sync(&adapter->reset_task);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
new file mode 100644
index 000000000000..84ce98ac9c31
--- /dev/null
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2024 Intel Corporation. */
+
+#include "iavf.h"
+
+/**
+ * iavf_ptp_cap_supported - Check if a PTP capability is supported
+ * @adapter: private adapter structure
+ * @cap: the capability bitmask to check
+ *
+ * Return: true if every capability set in cap is also set in the enabled
+ *         capabilities reported by the PF, false otherwise.
+ */
+bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap)
+{
+	if (!PTP_ALLOWED(adapter))
+		return false;
+
+	/* Only return true if every bit in cap is set in hw_caps.caps */
+	return (adapter->ptp.hw_caps.caps & cap) == cap;
+}
+
+/**
+ * iavf_ptp_register_clock - Register a new PTP for userspace
+ * @adapter: private adapter structure
+ *
+ * Allocate and register a new PTP clock device if necessary.
+ *
+ * Return: 0 if success, error otherwise
+ */
+static int iavf_ptp_register_clock(struct iavf_adapter *adapter)
+{
+	struct ptp_clock_info *ptp_info = &adapter->ptp.info;
+	struct device *dev = &adapter->pdev->dev;
+
+	memset(ptp_info, 0, sizeof(*ptp_info));
+
+	snprintf(ptp_info->name, sizeof(ptp_info->name) - 1, "%s-%s-clk",
+		 dev_driver_string(dev),
+		 dev_name(dev));
+	ptp_info->owner = THIS_MODULE;
+
+	adapter->ptp.clock = ptp_clock_register(ptp_info, dev);
+	if (IS_ERR(adapter->ptp.clock))
+		return PTR_ERR(adapter->ptp.clock);
+
+	dev_info(&adapter->pdev->dev, "PTP clock %s registered\n",
+		 adapter->ptp.info.name);
+	return 0;
+}
+
+/**
+ * iavf_ptp_init - Initialize PTP support if capability was negotiated
+ * @adapter: private adapter structure
+ *
+ * Initialize PTP functionality, based on the capabilities that the PF has
+ * enabled for this VF.
+ */
+void iavf_ptp_init(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	int err;
+
+	if (WARN_ON(adapter->ptp.initialized)) {
+		dev_err(dev, "PTP functionality was already initialized!\n");
+		return;
+	}
+
+	if (!iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_READ_PHC)) {
+		dev_dbg(dev, "Device does not have PTP clock support\n");
+		return;
+	}
+
+	err = iavf_ptp_register_clock(adapter);
+	if (err) {
+		dev_warn(dev, "Failed to register PTP clock device (%d)\n",
+			 err);
+		return;
+	}
+
+	adapter->ptp.initialized = true;
+}
+
+/**
+ * iavf_ptp_release - Disable PTP support
+ * @adapter: private adapter structure
+ *
+ * Release all PTP resources that were previously initialized.
+ */
+void iavf_ptp_release(struct iavf_adapter *adapter)
+{
+	adapter->ptp.initialized = false;
+
+	if (!IS_ERR_OR_NULL(adapter->ptp.clock)) {
+		dev_info(&adapter->pdev->dev, "removing PTP clock %s\n",
+			 adapter->ptp.info.name);
+		ptp_clock_unregister(adapter->ptp.clock);
+		adapter->ptp.clock = NULL;
+	}
+}
+
+/**
+ * iavf_ptp_process_caps - Handle change in PTP capabilities
+ * @adapter: private adapter structure
+ *
+ * Handle any state changes necessary due to change in PTP capabilities, such
+ * as after a device reset or change in configuration from the PF.
+ */
+void iavf_ptp_process_caps(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+
+	dev_dbg(dev, "PTP capabilities changed at runtime\n");
+
+	/* Check if the device gained or lost necessary access to support the
+	 * PTP hardware clock. If so, driver must respond appropriately by
+	 * creating or destroying the PTP clock device.
+	 */
+	if (adapter->ptp.initialized &&
+	    !iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_READ_PHC))
+		iavf_ptp_release(adapter);
+	else if (!adapter->ptp.initialized &&
+		 iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_READ_PHC))
+		iavf_ptp_init(adapter);
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
index aee4e2da0b9a..4939c219bd18 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -4,9 +4,19 @@
 #ifndef _IAVF_PTP_H_
 #define _IAVF_PTP_H_
 
+#include <linux/ptp_clock_kernel.h>
+
 /* fields used for PTP support */
 struct iavf_ptp {
 	struct virtchnl_ptp_caps hw_caps;
+	bool initialized;
+	struct ptp_clock_info info;
+	struct ptp_clock *clock;
 };
 
+void iavf_ptp_init(struct iavf_adapter *adapter);
+void iavf_ptp_release(struct iavf_adapter *adapter);
+void iavf_ptp_process_caps(struct iavf_adapter *adapter);
+bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap);
+
 #endif /* _IAVF_PTP_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 096b15375b3d..ebb764e20ddf 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -2509,6 +2509,8 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 	case VIRTCHNL_OP_1588_PTP_GET_CAPS:
 		memcpy(&adapter->ptp.hw_caps, msg,
 		       min_t(u16, msglen, sizeof(adapter->ptp.hw_caps)));
+		/* process any state change needed due to new capabilities */
+		iavf_ptp_process_caps(adapter);
 		break;
 	case VIRTCHNL_OP_ENABLE_QUEUES:
 		/* enable transmits */
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (5 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:59   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache " Mateusz Polchlopek
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Rahul Rameshbabu,
	Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Implement support for reading the PHC time indirectly via the
VIRTCHNL_OP_1588_PTP_GET_TIME operation.

Based on some simple tests with ftrace, the latency of the indirect
clock access appears to be ~110 microseconds. This is due to the
cost of preparing a message to send over the virtchnl queue.

This is expected, due to the increased jitter caused by sending messages
over virtchnl. It is not easy to control the precise time that the
message is sent by the VF, or the time that the message is responded to
by the PF, or the time that the message sent from the PF is received by
the VF.

For sending the request, note that many PTP related operations will
require sending of VIRTCHNL messages. Instead of adding a separate AQ
flag and storage for each operation, setup a simple queue mechanism for
queuing up virtchnl messages.

Each message will be converted to an iavf_ptp_aq_cmd structure which ends
with a flexible array member. A single AQ flag is added for processing
messages from this queue. In principle this could be extended to handle
arbitrary virtchnl messages. For now it is kept PTP-specific, as the
need is primarily for handling PTP-related commands.

Use this to implement .gettimex64 using the indirect method via the
virtchnl command. The response from the PF is processed and stored into
the cached_phc_time. A wait queue is used to allow the PTP clock gettime
request to sleep until the response is received from the PF.
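The command-queue flow described above can be sketched in plain C. This is
a userspace illustration only: the kernel's list_head/spinlock primitives
and AQ flag handling are replaced with a simple singly-linked FIFO, and the
structure merely mirrors iavf_ptp_aq_cmd from the patch rather than
reproducing driver code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of the PTP command queue pattern. Kernel primitives
 * (list_head, spinlock, IAVF_FLAG_AQ_SEND_PTP_CMD) are intentionally
 * omitted; only the allocate/queue/dequeue flow is shown.
 */
struct ptp_aq_cmd {
	struct ptp_aq_cmd *next;
	int v_opcode;		/* virtchnl opcode to send */
	uint16_t msglen;	/* length of msg[] in bytes */
	uint8_t msg[];		/* flexible array member, as in the patch */
};

static struct ptp_aq_cmd *head, *tail;

/* Mirrors iavf_allocate_ptp_cmd(): allocate and pre-fill one command */
static struct ptp_aq_cmd *alloc_cmd(int v_opcode, uint16_t msglen)
{
	struct ptp_aq_cmd *cmd = calloc(1, sizeof(*cmd) + msglen);

	if (!cmd)
		return NULL;
	cmd->v_opcode = v_opcode;
	cmd->msglen = msglen;
	return cmd;
}

/* Mirrors iavf_queue_ptp_cmd(): append the command to the FIFO */
static void queue_cmd(struct ptp_aq_cmd *cmd)
{
	if (tail)
		tail->next = cmd;
	else
		head = cmd;
	tail = cmd;
}

/* Mirrors the dequeue half of iavf_virtchnl_send_ptp_cmd() */
static struct ptp_aq_cmd *dequeue_cmd(void)
{
	struct ptp_aq_cmd *cmd = head;

	if (cmd) {
		head = cmd->next;
		if (!head)
			tail = NULL;
	}
	return cmd;
}
```

The FIFO ordering matters: commands are sent to the PF one at a time, in
the order they were queued, since only one virtchnl operation may be
pending at once.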

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c   |   9 +-
 drivers/net/ethernet/intel/iavf/iavf_ptp.c    | 161 ++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h    |  16 ++
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |  93 ++++++++++
 4 files changed, 278 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index cf11a1fd19bd..f613bffabf85 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -2235,7 +2235,10 @@ static int iavf_process_aq_command(struct iavf_adapter *adapter)
 		iavf_enable_vlan_insertion_v2(adapter, ETH_P_8021AD);
 		return 0;
 	}
-
+	if (adapter->aq_required & IAVF_FLAG_AQ_SEND_PTP_CMD) {
+		iavf_virtchnl_send_ptp_cmd(adapter);
+		return 0;
+	}
 	if (adapter->aq_required & IAVF_FLAG_AQ_REQUEST_STATS) {
 		iavf_request_stats(adapter);
 		return 0;
@@ -5327,6 +5330,10 @@ static int iavf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Setup the wait queue for indicating virtchannel events */
 	init_waitqueue_head(&adapter->vc_waitqueue);
 
+	INIT_LIST_HEAD(&adapter->ptp.aq_cmds);
+	init_waitqueue_head(&adapter->ptp.phc_time_waitqueue);
+	spin_lock_init(&adapter->ptp.aq_cmd_lock);
+
 	queue_delayed_work(adapter->wq, &adapter->watchdog_task,
 			   msecs_to_jiffies(5 * (pdev->devfn & 0x07)));
 	/* Initialization goes on in the work. Do not add more of it below. */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
index 84ce98ac9c31..d63f018792de 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
@@ -3,6 +3,23 @@
 
 #include "iavf.h"
 
+/**
+ * clock_to_adapter - Convert clock info pointer to adapter pointer
+ * @ptp_info: PTP info structure
+ *
+ * Use container_of in order to extract a pointer to the iAVF adapter private
+ * structure.
+ *
+ * Return: pointer to iavf_adapter structure
+ */
+static struct iavf_adapter *clock_to_adapter(struct ptp_clock_info *ptp_info)
+{
+	struct iavf_ptp *ptp_priv;
+
+	ptp_priv = container_of(ptp_info, struct iavf_ptp, info);
+	return container_of(ptp_priv, struct iavf_adapter, ptp);
+}
+
 /**
  * iavf_ptp_cap_supported - Check if a PTP capability is supported
  * @adapter: private adapter structure
@@ -20,6 +37,138 @@ bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap)
 	return (adapter->ptp.hw_caps.caps & cap) == cap;
 }
 
+/**
+ * iavf_allocate_ptp_cmd - Allocate a PTP command message structure
+ * @v_opcode: the virtchnl opcode
+ * @msglen: length in bytes of the associated virtchnl structure
+ *
+ * Allocates a PTP command message and pre-fills it with the provided message
+ * length and opcode.
+ *
+ * Return: allocated PTP command, or NULL on allocation failure
+ */
+static struct iavf_ptp_aq_cmd *iavf_allocate_ptp_cmd(enum virtchnl_ops v_opcode,
+						     u16 msglen)
+{
+	struct iavf_ptp_aq_cmd *cmd;
+
+	cmd = kzalloc(struct_size(cmd, msg, msglen), GFP_KERNEL);
+	if (!cmd)
+		return NULL;
+
+	cmd->v_opcode = v_opcode;
+	cmd->msglen = msglen;
+
+	return cmd;
+}
+
+/**
+ * iavf_queue_ptp_cmd - Queue PTP command for sending over virtchnl
+ * @adapter: private adapter structure
+ * @cmd: the command structure to send
+ *
+ * Queue the given command structure into the PTP virtchnl command queue to
+ * send to the PF.
+ */
+static void iavf_queue_ptp_cmd(struct iavf_adapter *adapter,
+			       struct iavf_ptp_aq_cmd *cmd)
+{
+	spin_lock(&adapter->ptp.aq_cmd_lock);
+	list_add_tail(&cmd->list, &adapter->ptp.aq_cmds);
+	spin_unlock(&adapter->ptp.aq_cmd_lock);
+
+	adapter->aq_required |= IAVF_FLAG_AQ_SEND_PTP_CMD;
+	mod_delayed_work(adapter->wq, &adapter->watchdog_task, 0);
+}
+
+/**
+ * iavf_send_phc_read - Send request to read PHC time
+ * @adapter: private adapter structure
+ *
+ * Send a request to obtain the PTP hardware clock time. This allocates the
+ * VIRTCHNL_OP_1588_PTP_GET_TIME message and queues it up to send to
+ * indirectly read the PHC time.
+ *
+ * This function does not wait for the reply from the PF.
+ *
+ * Return: 0 on success, error code otherwise
+ */
+static int iavf_send_phc_read(struct iavf_adapter *adapter)
+{
+	struct iavf_ptp_aq_cmd *cmd;
+
+	if (!adapter->ptp.initialized)
+		return -EOPNOTSUPP;
+
+	cmd = iavf_allocate_ptp_cmd(VIRTCHNL_OP_1588_PTP_GET_TIME,
+				    sizeof(struct virtchnl_phc_time));
+	if (!cmd)
+		return -ENOMEM;
+
+	iavf_queue_ptp_cmd(adapter, cmd);
+
+	return 0;
+}
+
+/**
+ * iavf_read_phc_indirect - Indirectly read the PHC time via virtchnl
+ * @adapter: private adapter structure
+ * @ts: storage for the timestamp value
+ * @sts: system timestamp values before and after the read
+ *
+ * Used when the device does not have direct register access to the PHC time.
+ * Indirectly reads the time via the VIRTCHNL_OP_1588_PTP_GET_TIME, and waits
+ * for the reply from the PF.
+ *
+ * Based on some simple measurements using ftrace and phc2sys, this clock
+ * access method has about a ~110 usec latency even when the system is not
+ * under load. In order to achieve acceptable results when using phc2sys with
+ * the indirect clock access method, it is recommended to use more
+ * conservative proportional and integration constants with the P/I servo.
+ *
+ * Return: 0 on success, error code otherwise
+ */
+static int iavf_read_phc_indirect(struct iavf_adapter *adapter,
+				  struct timespec64 *ts,
+				  struct ptp_system_timestamp *sts)
+{
+	long ret;
+	int err;
+
+	adapter->ptp.phc_time_ready = false;
+	ptp_read_system_prets(sts);
+
+	err = iavf_send_phc_read(adapter);
+	if (err)
+		return err;
+
+	ret = wait_event_interruptible_timeout(adapter->ptp.phc_time_waitqueue,
+					       adapter->ptp.phc_time_ready,
+					       HZ);
+	if (ret < 0)
+		return ret;
+	if (!ret)
+		return -EBUSY;
+
+	*ts = ns_to_timespec64(adapter->ptp.cached_phc_time);
+
+	ptp_read_system_postts(sts);
+
+	return 0;
+}
+
+static int iavf_ptp_gettimex64(struct ptp_clock_info *ptp,
+			       struct timespec64 *ts,
+			       struct ptp_system_timestamp *sts)
+{
+	struct iavf_adapter *adapter = clock_to_adapter(ptp);
+
+	if (!adapter->ptp.initialized)
+		return -ENODEV;
+
+	return iavf_read_phc_indirect(adapter, ts, sts);
+}
+
 /**
  * iavf_ptp_register_clock - Register a new PTP for userspace
  * @adapter: private adapter structure
@@ -39,6 +188,7 @@ static int iavf_ptp_register_clock(struct iavf_adapter *adapter)
 		 dev_driver_string(dev),
 		 dev_name(dev));
 	ptp_info->owner = THIS_MODULE;
+	ptp_info->gettimex64 = iavf_ptp_gettimex64;
 
 	adapter->ptp.clock = ptp_clock_register(ptp_info, dev);
 	if (IS_ERR(adapter->ptp.clock))
@@ -89,6 +239,8 @@ void iavf_ptp_init(struct iavf_adapter *adapter)
  */
 void iavf_ptp_release(struct iavf_adapter *adapter)
 {
+	struct iavf_ptp_aq_cmd *cmd, *tmp;
+
 	adapter->ptp.initialized = false;
 
 	if (!IS_ERR_OR_NULL(adapter->ptp.clock)) {
@@ -97,6 +249,15 @@ void iavf_ptp_release(struct iavf_adapter *adapter)
 		ptp_clock_unregister(adapter->ptp.clock);
 		adapter->ptp.clock = NULL;
 	}
+
+	/* Cancel any remaining uncompleted PTP clock commands */
+	spin_lock(&adapter->ptp.aq_cmd_lock);
+	list_for_each_entry_safe(cmd, tmp, &adapter->ptp.aq_cmds, list) {
+		list_del(&cmd->list);
+		kfree(cmd);
+	}
+	adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD;
+	spin_unlock(&adapter->ptp.aq_cmd_lock);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
index 4939c219bd18..4f84416743e1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -6,10 +6,25 @@
 
 #include <linux/ptp_clock_kernel.h>
 
+/* structure used to queue PTP commands for processing */
+struct iavf_ptp_aq_cmd {
+	struct list_head list;
+	enum virtchnl_ops v_opcode;
+	u16 msglen;
+	u8 msg[];
+};
+
 /* fields used for PTP support */
 struct iavf_ptp {
+	wait_queue_head_t phc_time_waitqueue;
 	struct virtchnl_ptp_caps hw_caps;
+	struct list_head aq_cmds;
+	/* Lock protecting access to the AQ command list */
+	spinlock_t aq_cmd_lock;
+	u64 cached_phc_time;
+	unsigned long cached_phc_updated;
 	bool initialized;
+	bool phc_time_ready;
 	struct ptp_clock_info info;
 	struct ptp_clock *clock;
 };
@@ -18,5 +33,6 @@ void iavf_ptp_init(struct iavf_adapter *adapter);
 void iavf_ptp_release(struct iavf_adapter *adapter);
 void iavf_ptp_process_caps(struct iavf_adapter *adapter);
 bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap);
+void iavf_virtchnl_send_ptp_cmd(struct iavf_adapter *adapter);
 
 #endif /* _IAVF_PTP_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index ebb764e20ddf..2693c3ad0830 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -1531,6 +1531,63 @@ void iavf_disable_vlan_insertion_v2(struct iavf_adapter *adapter, u16 tpid)
 				  VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2);
 }
 
+/**
+ * iavf_virtchnl_send_ptp_cmd - Send one queued PTP command
+ * @adapter: adapter private structure
+ *
+ * De-queue one PTP command request and send the command message to the PF.
+ * Clear IAVF_FLAG_AQ_SEND_PTP_CMD if no more messages are left to send.
+ */
+void iavf_virtchnl_send_ptp_cmd(struct iavf_adapter *adapter)
+{
+	struct device *dev = &adapter->pdev->dev;
+	struct iavf_ptp_aq_cmd *cmd;
+	int err;
+
+	if (WARN_ON(!adapter->ptp.initialized)) {
+		/* This shouldn't be possible to hit, since no messages should
+		 * be queued if PTP is not initialized.
+		 */
+		adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD;
+		return;
+	}
+
+	spin_lock(&adapter->ptp.aq_cmd_lock);
+	cmd = list_first_entry_or_null(&adapter->ptp.aq_cmds,
+				       struct iavf_ptp_aq_cmd, list);
+	if (!cmd) {
+		/* no further PTP messages to send */
+		adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD;
+		goto out_unlock;
+	}
+
+	if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
+		/* bail because we already have a command pending */
+		dev_err(dev, "Cannot send PTP command %d, command %d pending\n",
+			cmd->v_opcode, adapter->current_op);
+		goto out_unlock;
+	}
+
+	err = iavf_send_pf_msg(adapter, cmd->v_opcode, cmd->msg, cmd->msglen);
+	if (!err) {
+		/* Command was sent without errors, so we can remove it from
+		 * the list and discard it.
+		 */
+		list_del(&cmd->list);
+		kfree(cmd);
+	} else {
+		/* We failed to send the command, try again next cycle */
+		dev_warn(dev, "Failed to send PTP command %d\n", cmd->v_opcode);
+	}
+
+	if (list_empty(&adapter->ptp.aq_cmds))
+		/* no further PTP messages to send */
+		adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD;
+
+out_unlock:
+	spin_unlock(&adapter->ptp.aq_cmd_lock);
+}
+
 /**
  * iavf_print_link_message - print link up or down
  * @adapter: adapter structure
@@ -2102,6 +2159,39 @@ static void iavf_activate_fdir_filters(struct iavf_adapter *adapter)
 		adapter->aq_required |= IAVF_FLAG_AQ_ADD_FDIR_FILTER;
 }
 
+/**
+ * iavf_virtchnl_ptp_get_time - Respond to VIRTCHNL_OP_1588_PTP_GET_TIME
+ * @adapter: private adapter structure
+ * @data: the message from the PF
+ * @len: length of the message from the PF
+ *
+ * Handle the VIRTCHNL_OP_1588_PTP_GET_TIME message from the PF. This message
+ * is sent by the PF in response to the same op as a request from the VF.
+ * Extract the 64bit nanoseconds time from the message and store it in
+ * cached_phc_time. Then, notify any thread that is waiting for the update via
+ * the wait queue.
+ */
+static void iavf_virtchnl_ptp_get_time(struct iavf_adapter *adapter,
+				       void *data, u16 len)
+{
+	struct virtchnl_phc_time *msg;
+
+	if (len != sizeof(*msg)) {
+		dev_err_once(&adapter->pdev->dev,
+			     "Invalid VIRTCHNL_OP_1588_PTP_GET_TIME from PF. Got size %u, expected %zu\n",
+			     len, sizeof(*msg));
+		return;
+	}
+
+	msg = (struct virtchnl_phc_time *)data;
+
+	adapter->ptp.cached_phc_time = msg->time;
+	adapter->ptp.cached_phc_updated = jiffies;
+	adapter->ptp.phc_time_ready = true;
+
+	wake_up(&adapter->ptp.phc_time_waitqueue);
+}
+
 /**
  * iavf_virtchnl_completion
  * @adapter: adapter structure
@@ -2512,6 +2602,9 @@ void iavf_virtchnl_completion(struct iavf_adapter *adapter,
 		/* process any state change needed due to new capabilities */
 		iavf_ptp_process_caps(adapter);
 		break;
+	case VIRTCHNL_OP_1588_PTP_GET_TIME:
+		iavf_virtchnl_ptp_get_time(adapter, msg, msglen);
+		break;
 	case VIRTCHNL_OP_ENABLE_QUEUES:
 		/* enable transmits */
 		iavf_irq_enable(adapter, true);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache PHC time
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (6 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:59   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors Mateusz Polchlopek
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Jacob Keller, Wojciech Drewek, Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

The Rx timestamps reported by hardware may only have 32 bits of storage
for the nanosecond time. These timestamps cannot be reported directly to
the Linux stack, as it expects 64 bits of time.

To handle this, the timestamps must be extended using an algorithm that
calculates the corrected 64bit timestamp by comparison between the PHC
time and the timestamp. This algorithm requires the PHC time to be
captured within ~2 seconds of when the timestamp was captured.
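The extension step can be sketched as follows. This mirrors the approach
the ice driver takes (its ice_ptp_extend_32b_ts helper); the function name
and exact arithmetic here are an illustration under that assumption, not
the iavf implementation from this series.

```c
#include <stdint.h>

/* Extend a 32-bit nanosecond timestamp to 64 bits using a cached PHC
 * time that is known to be within ~2 seconds of the event. Sketch only;
 * modeled on the ice driver's timestamp-extension algorithm.
 */
static uint64_t extend_32b_ts(uint64_t cached_phc_time, uint32_t in_tstamp)
{
	uint32_t phc_lo = (uint32_t)cached_phc_time;
	uint32_t delta = in_tstamp - phc_lo;

	/* If the apparent forward delta exceeds half the 32-bit range,
	 * the event actually happened shortly *before* the cached time
	 * and the subtraction wrapped; recompute in the other direction.
	 */
	if (delta > (UINT32_MAX / 2))
		return cached_phc_time - (phc_lo - in_tstamp);

	return cached_phc_time + delta;
}
```

This only works while the cached time stays close to the timestamp, which
is exactly why the periodic caching below is required.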

Instead of trying to read the PHC time in the Rx hotpath, the algorithm
relies on a cached value that is periodically updated.

Keep this cached time up to date by using the PTP .do_aux_work kthread
function.

The iavf_ptp_do_aux_work will reschedule itself about twice a second,
and will check whether or not the cached PTP time needs to be updated.
If so, it issues a VIRTCHNL_OP_1588_PTP_GET_TIME to request the time
from the PF. The jitter and latency involved with this command aren't
important, because the cached time just needs to be kept up to date
within ~2 seconds.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_ptp.c | 52 ++++++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h |  1 +
 2 files changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
index d63f018792de..69e4948a9057 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
@@ -169,6 +169,55 @@ static int iavf_ptp_gettimex64(struct ptp_clock_info *ptp,
 	return iavf_read_phc_indirect(adapter, ts, sts);
 }
 
+/**
+ * iavf_ptp_cache_phc_time - Cache PHC time for performing timestamp extension
+ * @adapter: private adapter structure
+ *
+ * Periodically cache the PHC time in order to allow for timestamp extension.
+ * This is required because the Tx and Rx timestamps only contain 32bits of
+ * nanoseconds. Timestamp extension allows calculating the corrected 64bit
+ * timestamp. This algorithm relies on the cached time being within ~1 second
+ * of the timestamp.
+ */
+static void iavf_ptp_cache_phc_time(struct iavf_adapter *adapter)
+{
+	if (time_is_before_jiffies(adapter->ptp.cached_phc_updated + HZ)) {
+		/* The response from virtchnl will store the time into
+		 * cached_phc_time
+		 */
+		iavf_send_phc_read(adapter);
+	}
+}
+
+/**
+ * iavf_ptp_do_aux_work - Perform periodic work required for PTP support
+ * @ptp: PTP clock info structure
+ *
+ * Handler to take care of periodic work required for PTP operation. This
+ * includes the following tasks:
+ *
+ *   1) updating cached_phc_time
+ *
+ *      cached_phc_time is used by the Tx and Rx timestamp flows in order to
+ *      perform timestamp extension, by carefully comparing the timestamp
+ *      32bit nanosecond timestamps and determining the corrected 64bit
+ *      timestamp value to report to userspace. This algorithm only works if
+ *      the cached_phc_time is within ~1 second of the Tx or Rx timestamp
+ *      event. This task periodically reads the PHC time and stores it, to
+ *      ensure that timestamp extension operates correctly.
+ *
+ * Return: time in jiffies until the periodic task should be re-scheduled.
+ */
+long iavf_ptp_do_aux_work(struct ptp_clock_info *ptp)
+{
+	struct iavf_adapter *adapter = clock_to_adapter(ptp);
+
+	iavf_ptp_cache_phc_time(adapter);
+
+	/* Check work about twice a second */
+	return msecs_to_jiffies(500);
+}
+
 /**
  * iavf_ptp_register_clock - Register a new PTP for userspace
  * @adapter: private adapter structure
@@ -189,6 +238,7 @@ static int iavf_ptp_register_clock(struct iavf_adapter *adapter)
 		 dev_name(dev));
 	ptp_info->owner = THIS_MODULE;
 	ptp_info->gettimex64 = iavf_ptp_gettimex64;
+	ptp_info->do_aux_work = iavf_ptp_do_aux_work;
 
 	adapter->ptp.clock = ptp_clock_register(ptp_info, dev);
 	if (IS_ERR(adapter->ptp.clock))
@@ -228,6 +278,8 @@ void iavf_ptp_init(struct iavf_adapter *adapter)
 		return;
 	}
 
+	ptp_schedule_worker(adapter->ptp.clock, 0);
+
 	adapter->ptp.initialized = true;
 }
 
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
index 4f84416743e1..7a25647980f3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -34,5 +34,6 @@ void iavf_ptp_release(struct iavf_adapter *adapter);
 void iavf_ptp_process_caps(struct iavf_adapter *adapter);
 bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap);
 void iavf_virtchnl_send_ptp_cmd(struct iavf_adapter *adapter);
+long iavf_ptp_do_aux_work(struct ptp_clock_info *ptp);
 
 #endif /* _IAVF_PTP_H_ */
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (7 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache " Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:59   ` Simon Horman
  2024-06-11 11:47   ` Alexander Lobakin
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field Mateusz Polchlopek
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, Jacob Keller, Wojciech Drewek, Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
negotiating to enable the advanced flexible descriptor layout. Add the
flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.

Also add bit position definitions for the status and error indications
that are needed.

The iavf_clean_rx_irq function needs to extract a few fields from the Rx
descriptor, including the size, rx_ptype, and vlan_tag.
Move the extraction to a separate function that decodes the fields into
a structure. This will reduce the burden for handling multiple
descriptor types by keeping the relevant extraction logic in one place.

To support handling an additional descriptor format with minimal code
duplication, refactor Rx checksum handling so that the general logic
is separated from the bit calculations. Introduce an iavf_rx_csum_decoded
structure which holds the relevant bits decoded from the Rx descriptor.
This enables implementing flexible descriptor handling without
duplicating the general logic.

Introduce iavf_extract_flex_rx_fields(), iavf_flex_rx_hash() and
iavf_flex_rx_csum(), which operate on the flexible NIC descriptor
format instead of the legacy 32 byte format. Based on the negotiated
RXDID, select the correct function for processing the Rx descriptors.

With this change, the Rx hot path should be functional when using either
the default legacy 32 byte format or the flexible NIC layout.

Modify the Rx hot path to add support for the flexible descriptor
format and add request enabling Rx timestamps for all queues.

As in ice, make sure we bump the checksum level if the hardware detected
a packet type which could have an outer checksum. This is important
because hardware only verifies the inner checksum.
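The decode-then-validate split at the heart of this refactor can be
sketched as follows. The bit positions and field names here are
illustrative (the driver uses FIELD_GET with the IAVF_* masks); the point
is that each descriptor format only fills a small decoded-bits structure,
while one shared routine applies the checksum logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the refactor: descriptor-format-specific extractors fill this
 * structure, and a single shared routine makes the checksum verdict.
 * Bit positions below are illustrative, not the hardware layout.
 */
struct rx_csum_decoded {
	uint8_t l3l4p : 1;	/* HW parsed L3/L4 and ran the checksum */
	uint8_t ipe : 1;	/* inner IP checksum error */
	uint8_t eipe : 1;	/* outer (external) IP checksum error */
	uint8_t l4e : 1;	/* L4 checksum error */
};

/* Shared logic: one place decides, regardless of descriptor layout */
static bool rx_csum_ok(const struct rx_csum_decoded *bits)
{
	if (!bits->l3l4p)
		return false;	/* HW did not verify; leave CHECKSUM_NONE */
	return !(bits->ipe || bits->eipe || bits->l4e);
}

/* Legacy-style extractor: all bits live in one 64-bit qword */
static struct rx_csum_decoded decode_legacy(uint64_t qword)
{
	struct rx_csum_decoded bits = {
		.l3l4p = (qword >> 3) & 1,	/* illustrative positions */
		.ipe   = (qword >> 19) & 1,
		.eipe  = (qword >> 20) & 1,
		.l4e   = (qword >> 22) & 1,
	};
	return bits;
}
```

A second extractor for the flexible format would fill the same structure
from status_error0/status_error1, so the validation logic never has to be
duplicated.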

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 354 +++++++++++++-----
 drivers/net/ethernet/intel/iavf/iavf_txrx.h   |   8 +
 drivers/net/ethernet/intel/iavf/iavf_type.h   | 147 ++++++--
 .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   5 +
 4 files changed, 391 insertions(+), 123 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 26b424fd6718..97da5af52ad7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -895,63 +895,68 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
 	return true;
 }
 
+/* iavf_rx_csum_decoded
+ *
+ * Checksum offload bits decoded from the receive descriptor.
+ */
+struct iavf_rx_csum_decoded {
+	u8 l3l4p : 1;
+	u8 ipe : 1;
+	u8 eipe : 1;
+	u8 eudpe : 1;
+	u8 ipv6exadd : 1;
+	u8 l4e : 1;
+	u8 pprs : 1;
+	u8 nat : 1;
+};
+
 /**
- * iavf_rx_checksum - Indicate in skb if hw indicated a good cksum
+ * iavf_rx_csum - Indicate in skb if hw indicated a good checksum
  * @vsi: the VSI we care about
  * @skb: skb currently being received and modified
- * @rx_desc: the receive descriptor
+ * @ptype: decoded ptype information
+ * @csum_bits: decoded Rx descriptor information
  **/
-static void iavf_rx_checksum(struct iavf_vsi *vsi,
-			     struct sk_buff *skb,
-			     union iavf_rx_desc *rx_desc)
+static void iavf_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
+			 struct libeth_rx_pt *ptype,
+			 struct iavf_rx_csum_decoded *csum_bits)
 {
-	struct libeth_rx_pt decoded;
-	u32 rx_error, rx_status;
 	bool ipv4, ipv6;
-	u8 ptype;
-	u64 qword;
 
 	skb->ip_summed = CHECKSUM_NONE;
 
-	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
-
-	decoded = libie_rx_pt_parse(ptype);
-	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
-		return;
-
-	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
-	rx_status = FIELD_GET(IAVF_RXD_QW1_STATUS_MASK, qword);
-
 	/* did the hardware decode the packet and checksum? */
-	if (!(rx_status & BIT(IAVF_RX_DESC_STATUS_L3L4P_SHIFT)))
+	if (!csum_bits->l3l4p)
 		return;
 
-	ipv4 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV4;
-	ipv6 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV6;
+	ipv4 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV4;
+	ipv6 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV6;
 
-	if (ipv4 &&
-	    (rx_error & (BIT(IAVF_RX_DESC_ERROR_IPE_SHIFT) |
-			 BIT(IAVF_RX_DESC_ERROR_EIPE_SHIFT))))
+	if (ipv4 && (csum_bits->ipe || csum_bits->eipe))
 		goto checksum_fail;
 
 	/* likely incorrect csum if alternate IP extension headers found */
-	if (ipv6 &&
-	    rx_status & BIT(IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT))
-		/* don't increment checksum err here, non-fatal err */
+	if (ipv6 && csum_bits->ipv6exadd)
 		return;
 
 	/* there was some L4 error, count error and punt packet to the stack */
-	if (rx_error & BIT(IAVF_RX_DESC_ERROR_L4E_SHIFT))
+	if (csum_bits->l4e)
 		goto checksum_fail;
 
 	/* handle packets that were not able to be checksummed due
 	 * to arrival speed, in this case the stack can compute
 	 * the csum.
 	 */
-	if (rx_error & BIT(IAVF_RX_DESC_ERROR_PPRS_SHIFT))
+	if (csum_bits->pprs)
 		return;
 
+	/* If there is an outer header present that might contain a checksum
+	 * we need to bump the checksum level by 1 to reflect the fact that
+	 * we are indicating we validated the inner checksum.
+	 */
+	if (ptype->tunnel_type >= LIBETH_RX_PT_TUNNEL_IP_GRENAT)
+		skb->csum_level = 1;
+
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 	return;
 
@@ -960,22 +965,105 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
 }
 
 /**
- * iavf_rx_hash - set the hash value in the skb
+ * iavf_legacy_rx_csum - Indicate in skb if hw indicated a good cksum
+ * @vsi: the VSI we care about
+ * @skb: skb currently being received and modified
+ * @rx_desc: the receive descriptor
+ *
+ * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
+ * descriptor writeback format.
+ **/
+static void iavf_legacy_rx_csum(struct iavf_vsi *vsi,
+				struct sk_buff *skb,
+				union iavf_rx_desc *rx_desc)
+{
+	struct iavf_rx_csum_decoded csum_bits;
+	struct libeth_rx_pt decoded;
+	u32 rx_error, rx_status;
+	u64 qword;
+	u16 ptype;
+
+	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
+	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
+	rx_status = FIELD_GET(IAVF_RXD_QW1_STATUS_MASK, qword);
+	decoded = libie_rx_pt_parse(ptype);
+
+	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
+		return;
+
+	csum_bits.ipe = FIELD_GET(IAVF_RX_DESC_ERROR_IPE_MASK, rx_error);
+	csum_bits.eipe = FIELD_GET(IAVF_RX_DESC_ERROR_EIPE_MASK, rx_error);
+	csum_bits.l4e = FIELD_GET(IAVF_RX_DESC_ERROR_L4E_MASK, rx_error);
+	csum_bits.pprs = FIELD_GET(IAVF_RX_DESC_ERROR_PPRS_MASK, rx_error);
+	csum_bits.l3l4p = FIELD_GET(IAVF_RX_DESC_STATUS_L3L4P_MASK, rx_status);
+	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_DESC_STATUS_IPV6EXADD_MASK,
+					rx_status);
+	csum_bits.nat = 0;
+	csum_bits.eudpe = 0;
+
+	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
+}
+
+/**
+ * iavf_flex_rx_csum - Indicate in skb if hw indicated a good cksum
+ * @vsi: the VSI we care about
+ * @skb: skb currently being received and modified
+ * @rx_desc: the receive descriptor
+ *
+ * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
+ * descriptor writeback format.
+ **/
+static void iavf_flex_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
+			      union iavf_rx_desc *rx_desc)
+{
+	struct iavf_rx_csum_decoded csum_bits;
+	struct libeth_rx_pt decoded;
+	u16 rx_status0, rx_status1, ptype;
+
+	rx_status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
+	rx_status1 = le16_to_cpu(rx_desc->flex_wb.status_error1);
+	ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M,
+			  le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0));
+	decoded = libie_rx_pt_parse(ptype);
+
+	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
+		return;
+
+	csum_bits.ipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M,
+				  rx_status0);
+	csum_bits.eipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M,
+				   rx_status0);
+	csum_bits.l4e = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M,
+				  rx_status0);
+	csum_bits.eudpe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M,
+				    rx_status0);
+	csum_bits.l3l4p = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M,
+				    rx_status0);
+	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M,
+					rx_status0);
+	csum_bits.nat = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS1_NAT_M, rx_status1);
+	csum_bits.pprs = 0;
+
+	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
+}
+
+/**
+ * iavf_legacy_rx_hash - set the hash value in the skb
  * @ring: descriptor ring
  * @rx_desc: specific descriptor
  * @skb: skb currently being received and modified
  * @rx_ptype: Rx packet type
+ *
+ * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
+ * descriptor writeback format.
  **/
-static void iavf_rx_hash(struct iavf_ring *ring,
-			 union iavf_rx_desc *rx_desc,
-			 struct sk_buff *skb,
-			 u8 rx_ptype)
+static void iavf_legacy_rx_hash(struct iavf_ring *ring,
+				union iavf_rx_desc *rx_desc,
+				struct sk_buff *skb, u8 rx_ptype)
 {
+	const __le64 rss_mask = cpu_to_le64(IAVF_RX_DESC_STATUS_FLTSTAT_MASK);
 	struct libeth_rx_pt decoded;
 	u32 hash;
-	const __le64 rss_mask =
-		cpu_to_le64((u64)IAVF_RX_DESC_FLTSTAT_RSS_HASH <<
-			    IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT);
 
 	decoded = libie_rx_pt_parse(rx_ptype);
 	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
@@ -987,6 +1075,38 @@ static void iavf_rx_hash(struct iavf_ring *ring,
 	}
 }
 
+/**
+ * iavf_flex_rx_hash - set the hash value in the skb
+ * @ring: descriptor ring
+ * @rx_desc: specific descriptor
+ * @skb: skb currently being received and modified
+ * @rx_ptype: Rx packet type
+ *
+ * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
+ * descriptor writeback format.
+ **/
+static void iavf_flex_rx_hash(struct iavf_ring *ring,
+			      union iavf_rx_desc *rx_desc,
+			      struct sk_buff *skb, u16 rx_ptype)
+{
+	struct libeth_rx_pt decoded;
+	u16 status0;
+	u32 hash;
+
+	decoded = libie_rx_pt_parse(rx_ptype);
+	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
+		return;
+
+	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
+	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M) {
+		hash = le32_to_cpu(rx_desc->flex_wb.rss_hash);
+		libeth_rx_pt_set_hash(skb, hash, decoded);
+	}
+}
+
 /**
  * iavf_process_skb_fields - Populate skb header fields from Rx descriptor
  * @rx_ring: rx descriptor ring packet is being transacted on
@@ -998,14 +1118,17 @@ static void iavf_rx_hash(struct iavf_ring *ring,
  * order to populate the hash, checksum, VLAN, protocol, and
  * other fields within the skb.
  **/
-static void
-iavf_process_skb_fields(struct iavf_ring *rx_ring,
-			union iavf_rx_desc *rx_desc, struct sk_buff *skb,
-			u8 rx_ptype)
+static void iavf_process_skb_fields(struct iavf_ring *rx_ring,
+				    union iavf_rx_desc *rx_desc,
+				    struct sk_buff *skb, u16 rx_ptype)
 {
-	iavf_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
-
-	iavf_rx_checksum(rx_ring->vsi, skb, rx_desc);
+	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE) {
+		iavf_legacy_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
+		iavf_legacy_rx_csum(rx_ring->vsi, skb, rx_desc);
+	} else {
+		iavf_flex_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
+		iavf_flex_rx_csum(rx_ring->vsi, skb, rx_desc);
+	}
 
 	skb_record_rx_queue(skb, rx_ring->queue_index);
 
@@ -1092,7 +1215,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
 /**
  * iavf_is_non_eop - process handling of non-EOP buffers
  * @rx_ring: Rx ring being processed
- * @rx_desc: Rx descriptor for current buffer
+ * @fields: Rx descriptor extracted fields
  * @skb: Current socket buffer containing buffer in progress
  *
  * This function updates next to clean.  If the buffer is an EOP buffer
@@ -1101,7 +1224,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
  * that this is in fact a non-EOP buffer.
  **/
 static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
-			    union iavf_rx_desc *rx_desc,
+			    struct iavf_rx_extracted *fields,
 			    struct sk_buff *skb)
 {
 	u32 ntc = rx_ring->next_to_clean + 1;
@@ -1113,8 +1236,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
 	prefetch(IAVF_RX_DESC(rx_ring, ntc));
 
 	/* if we are the last buffer then there is nothing else to do */
-#define IAVF_RXD_EOF BIT(IAVF_RX_DESC_STATUS_EOF_SHIFT)
-	if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
+	if (likely(fields->end_of_packet))
 		return false;
 
 	rx_ring->rx_stats.non_eop_descs++;
@@ -1122,6 +1244,91 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
 	return true;
 }
 
+/**
+ * iavf_extract_legacy_rx_fields - Extract fields from the Rx descriptor
+ * @rx_ring: rx descriptor ring
+ * @rx_desc: the descriptor to process
+ * @fields: storage for extracted values
+ *
+ * Decode the Rx descriptor and extract relevant information including the
+ * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
+ *
+ * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32-byte
+ * descriptor writeback format.
+ */
+static void iavf_extract_legacy_rx_fields(struct iavf_ring *rx_ring,
+					  union iavf_rx_desc *rx_desc,
+					  struct iavf_rx_extracted *fields)
+{
+	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+
+	fields->size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
+	fields->rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
+
+	if (qword & IAVF_RX_DESC_STATUS_L2TAG1P_MASK &&
+	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
+		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
+
+	if (rx_desc->wb.qword2.ext_status &
+	    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
+	    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
+		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);
+
+	fields->end_of_packet = FIELD_GET(IAVF_RX_DESC_STATUS_EOF_MASK, qword);
+	fields->rxe = FIELD_GET(BIT(IAVF_RXD_QW1_ERROR_SHIFT), qword);
+}
+
+/**
+ * iavf_extract_flex_rx_fields - Extract fields from the Rx descriptor
+ * @rx_ring: rx descriptor ring
+ * @rx_desc: the descriptor to process
+ * @fields: storage for extracted values
+ *
+ * Decode the Rx descriptor and extract relevant information including the
+ * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
+ *
+ * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
+ * descriptor writeback format.
+ */
+static void iavf_extract_flex_rx_fields(struct iavf_ring *rx_ring,
+					union iavf_rx_desc *rx_desc,
+					struct iavf_rx_extracted *fields)
+{
+	u16 status0, status1, flexi_flags0;
+
+	fields->size = FIELD_GET(IAVF_RX_FLEX_DESC_PKT_LEN_M,
+				 le16_to_cpu(rx_desc->flex_wb.pkt_len));
+
+	flexi_flags0 = le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0);
+
+	fields->rx_ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M, flexi_flags0);
+
+	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
+	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M &&
+	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
+		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag1);
+
+	status1 = le16_to_cpu(rx_desc->flex_wb.status_error1);
+	if (status1 & IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M &&
+	    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
+		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag2_2nd);
+
+	fields->end_of_packet = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT,
+					  status0);
+	fields->rxe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT,
+				status0);
+}
+
+static void iavf_extract_rx_fields(struct iavf_ring *rx_ring,
+				   union iavf_rx_desc *rx_desc,
+				   struct iavf_rx_extracted *fields)
+{
+	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE)
+		iavf_extract_legacy_rx_fields(rx_ring, rx_desc, fields);
+	else
+		iavf_extract_flex_rx_fields(rx_ring, rx_desc, fields);
+}
+
 /**
  * iavf_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: rx descriptor ring to transact packets on
@@ -1142,12 +1349,9 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
 	bool failure = false;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
+		struct iavf_rx_extracted fields = {};
 		struct libeth_fqe *rx_buffer;
 		union iavf_rx_desc *rx_desc;
-		unsigned int size;
-		u16 vlan_tag = 0;
-		u8 rx_ptype;
-		u64 qword;
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IAVF_RX_BUFFER_WRITE) {
@@ -1158,35 +1362,27 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
 
 		rx_desc = IAVF_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
-		/* status_error_len will always be zero for unused descriptors
-		 * because it's cleared in cleanup, and overlaps with hdr_addr
-		 * which is always zero because packet split isn't used, if the
-		 * hardware wrote DD then the length will be non-zero
-		 */
-		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we have
 		 * verified the descriptor has been written back.
 		 */
 		dma_rmb();
-#define IAVF_RXD_DD BIT(IAVF_RX_DESC_STATUS_DD_SHIFT)
-		if (!iavf_test_staterr(rx_desc, IAVF_RXD_DD))
+		if (!iavf_test_staterr(rx_desc, IAVF_RX_DESC_STATUS_DD_MASK))
 			break;
 
-		size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
+		iavf_extract_rx_fields(rx_ring, rx_desc, &fields);
 
 		iavf_trace(clean_rx_irq, rx_ring, rx_desc, skb);
 
 		rx_buffer = &rx_ring->rx_fqes[rx_ring->next_to_clean];
-		if (!libeth_rx_sync_for_cpu(rx_buffer, size))
+		if (!libeth_rx_sync_for_cpu(rx_buffer, fields.size))
 			goto skip_data;
 
 		/* retrieve a buffer from the ring */
 		if (skb)
-			iavf_add_rx_frag(skb, rx_buffer, size);
+			iavf_add_rx_frag(skb, rx_buffer, fields.size);
 		else
-			skb = iavf_build_skb(rx_buffer, size);
+			skb = iavf_build_skb(rx_buffer, fields.size);
 
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {
@@ -1197,15 +1393,14 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
 skip_data:
 		cleaned_count++;
 
-		if (iavf_is_non_eop(rx_ring, rx_desc, skb) || unlikely(!skb))
+		if (iavf_is_non_eop(rx_ring, &fields, skb) || unlikely(!skb))
 			continue;
 
-		/* ERR_MASK will only have valid bits if EOP set, and
-		 * what we are doing here is actually checking
-		 * IAVF_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
-		 * the error field
+		/* The RXE field in the descriptor indicates MAC errors
+		 * (CRC, alignment, oversize, etc.). If it is set, drop
+		 * the packet.
 		 */
-		if (unlikely(iavf_test_staterr(rx_desc, BIT(IAVF_RXD_QW1_ERROR_SHIFT)))) {
+		if (unlikely(fields.rxe)) {
 			dev_kfree_skb_any(skb);
 			skb = NULL;
 			continue;
@@ -1219,22 +1414,11 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
 
-		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-		rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
-
 		/* populate checksum, VLAN, and protocol */
-		iavf_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
-
-		if (qword & BIT(IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT) &&
-		    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
-			vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
-		if (rx_desc->wb.qword2.ext_status &
-		    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
-		    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
-			vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);
+		iavf_process_skb_fields(rx_ring, rx_desc, skb, fields.rx_ptype);
 
 		iavf_trace(clean_rx_irq_rx, rx_ring, rx_desc, skb);
-		iavf_receive_skb(rx_ring, skb, vlan_tag);
+		iavf_receive_skb(rx_ring, skb, fields.vlan_tag);
 		skb = NULL;
 
 		/* update budget accounting */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 17309d8625ac..3661cd57a068 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -99,6 +99,14 @@ static inline bool iavf_test_staterr(union iavf_rx_desc *rx_desc,
 		  cpu_to_le64(stat_err_bits));
 }
 
+struct iavf_rx_extracted {
+	unsigned int size;
+	u16 vlan_tag;
+	u16 rx_ptype;
+	u8 end_of_packet:1;
+	u8 rxe:1;
+};
+
 /* How many Rx Buffers do we bundle into one write to the hardware ? */
 #define IAVF_RX_INCREMENT(r, i) \
 	do {					\
diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index f6b09e57abce..82c16a720807 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -206,6 +206,45 @@ union iavf_16byte_rx_desc {
 	} wb;  /* writeback */
 };
 
+/* Rx Flex Descriptor NIC Profile
+ * RxDID Profile ID 2
+ * Flex-field 0: RSS hash lower 16-bits
+ * Flex-field 1: RSS hash upper 16-bits
+ * Flex-field 2: Flow ID lower 16-bits
+ * Flex-field 3: Flow ID upper 16-bits
+ * Flex-field 4: reserved, VLAN ID taken from L2Tag
+ */
+struct iavf_32byte_rx_flex_wb {
+	/* Qword 0 */
+	u8 rxdid;
+	u8 mir_id_umb_cast;
+	__le16 ptype_flexi_flags0;
+	__le16 pkt_len;
+	__le16 hdr_len_sph_flex_flags1;
+
+	/* Qword 1 */
+	__le16 status_error0;
+	__le16 l2tag1;
+	__le32 rss_hash;
+
+	/* Qword 2 */
+	__le16 status_error1;
+	u8 flexi_flags2;
+	u8 ts_low;
+	__le16 l2tag2_1st;
+	__le16 l2tag2_2nd;
+
+	/* Qword 3 */
+	__le32 flow_id;
+	union {
+		struct {
+			__le16 rsvd;
+			__le16 flow_id_ipv6;
+		} flex;
+		__le32 ts_high;
+	} flex_ts;
+};
+
 union iavf_32byte_rx_desc {
 	struct {
 		__le64  pkt_addr; /* Packet buffer address */
@@ -253,35 +292,34 @@ union iavf_32byte_rx_desc {
 			} hi_dword;
 		} qword3;
 	} wb;  /* writeback */
+	struct iavf_32byte_rx_flex_wb flex_wb;
 };
 
-enum iavf_rx_desc_status_bits {
-	/* Note: These are predefined bit offsets */
-	IAVF_RX_DESC_STATUS_DD_SHIFT		= 0,
-	IAVF_RX_DESC_STATUS_EOF_SHIFT		= 1,
-	IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT	= 2,
-	IAVF_RX_DESC_STATUS_L3L4P_SHIFT		= 3,
-	IAVF_RX_DESC_STATUS_CRCP_SHIFT		= 4,
-	IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT	= 5, /* 2 BITS */
-	IAVF_RX_DESC_STATUS_TSYNVALID_SHIFT	= 7,
-	/* Note: Bit 8 is reserved in X710 and XL710 */
-	IAVF_RX_DESC_STATUS_EXT_UDP_0_SHIFT	= 8,
-	IAVF_RX_DESC_STATUS_UMBCAST_SHIFT	= 9, /* 2 BITS */
-	IAVF_RX_DESC_STATUS_FLM_SHIFT		= 11,
-	IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT	= 12, /* 2 BITS */
-	IAVF_RX_DESC_STATUS_LPBK_SHIFT		= 14,
-	IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT	= 15,
-	IAVF_RX_DESC_STATUS_RESERVED_SHIFT	= 16, /* 2 BITS */
-	/* Note: For non-tunnel packets INT_UDP_0 is the right status for
-	 * UDP header
-	 */
-	IAVF_RX_DESC_STATUS_INT_UDP_0_SHIFT	= 18,
-	IAVF_RX_DESC_STATUS_LAST /* this entry must be last!!! */
-};
+/* Note: These are predefined bit offsets */
+#define IAVF_RX_DESC_STATUS_DD_MASK		BIT(0)
+#define IAVF_RX_DESC_STATUS_EOF_MASK		BIT(1)
+#define IAVF_RX_DESC_STATUS_L2TAG1P_MASK	BIT(2)
+#define IAVF_RX_DESC_STATUS_L3L4P_MASK		BIT(3)
+#define IAVF_RX_DESC_STATUS_CRCP_MASK		BIT(4)
+#define IAVF_RX_DESC_STATUS_TSYNINDX_MASK	GENMASK_ULL(6, 5)
+#define IAVF_RX_DESC_STATUS_TSYNVALID_MASK	BIT(7)
+/* Note: Bit 8 is reserved in X710 and XL710 */
+#define IAVF_RX_DESC_STATUS_EXT_UDP_0_MASK	BIT(8)
+#define IAVF_RX_DESC_STATUS_UMBCAST_MASK	GENMASK_ULL(10, 9)
+#define IAVF_RX_DESC_STATUS_FLM_MASK		BIT(11)
+#define IAVF_RX_DESC_STATUS_FLTSTAT_MASK	GENMASK_ULL(13, 12)
+#define IAVF_RX_DESC_STATUS_LPBK_MASK		BIT(14)
+#define IAVF_RX_DESC_STATUS_IPV6EXADD_MASK	BIT(15)
+#define IAVF_RX_DESC_STATUS_RESERVED_MASK	GENMASK_ULL(17, 16)
+/* Note: For non-tunnel packets INT_UDP_0 is the right status for
+ * UDP header
+ */
+#define IAVF_RX_DESC_STATUS_INT_UDP_0_MASK	BIT(18)
+
+#define IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT	BIT(1)
+#define IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT	BIT(10)
 
-#define IAVF_RXD_QW1_STATUS_SHIFT	0
-#define IAVF_RXD_QW1_STATUS_MASK	((BIT(IAVF_RX_DESC_STATUS_LAST) - 1) \
-					 << IAVF_RXD_QW1_STATUS_SHIFT)
+#define IAVF_RXD_QW1_STATUS_MASK		(BIT(19) - 1)
 
 #define IAVF_RXD_QW1_STATUS_TSYNINDX_SHIFT IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT
 #define IAVF_RXD_QW1_STATUS_TSYNINDX_MASK  (0x3UL << \
@@ -301,18 +339,16 @@ enum iavf_rx_desc_fltstat_values {
 #define IAVF_RXD_QW1_ERROR_SHIFT	19
 #define IAVF_RXD_QW1_ERROR_MASK		(0xFFUL << IAVF_RXD_QW1_ERROR_SHIFT)
 
-enum iavf_rx_desc_error_bits {
-	/* Note: These are predefined bit offsets */
-	IAVF_RX_DESC_ERROR_RXE_SHIFT		= 0,
-	IAVF_RX_DESC_ERROR_RECIPE_SHIFT		= 1,
-	IAVF_RX_DESC_ERROR_HBO_SHIFT		= 2,
-	IAVF_RX_DESC_ERROR_L3L4E_SHIFT		= 3, /* 3 BITS */
-	IAVF_RX_DESC_ERROR_IPE_SHIFT		= 3,
-	IAVF_RX_DESC_ERROR_L4E_SHIFT		= 4,
-	IAVF_RX_DESC_ERROR_EIPE_SHIFT		= 5,
-	IAVF_RX_DESC_ERROR_OVERSIZE_SHIFT	= 6,
-	IAVF_RX_DESC_ERROR_PPRS_SHIFT		= 7
-};
+/* Note: These are predefined bit offsets */
+#define IAVF_RX_DESC_ERROR_RXE_MASK		BIT(0)
+#define IAVF_RX_DESC_ERROR_RECIPE_MASK		BIT(1)
+#define IAVF_RX_DESC_ERROR_HBO_MASK		BIT(2)
+#define IAVF_RX_DESC_ERROR_L3L4E_MASK		GENMASK_ULL(5, 3)
+#define IAVF_RX_DESC_ERROR_IPE_MASK		BIT(3)
+#define IAVF_RX_DESC_ERROR_L4E_MASK		BIT(4)
+#define IAVF_RX_DESC_ERROR_EIPE_MASK		BIT(5)
+#define IAVF_RX_DESC_ERROR_OVERSIZE_MASK	BIT(6)
+#define IAVF_RX_DESC_ERROR_PPRS_MASK		BIT(7)
 
 enum iavf_rx_desc_error_l3l4e_fcoe_masks {
 	IAVF_RX_DESC_ERROR_L3L4E_NONE		= 0,
@@ -325,6 +361,41 @@ enum iavf_rx_desc_error_l3l4e_fcoe_masks {
 #define IAVF_RXD_QW1_PTYPE_SHIFT	30
 #define IAVF_RXD_QW1_PTYPE_MASK		(0xFFULL << IAVF_RXD_QW1_PTYPE_SHIFT)
 
+/* for iavf_32byte_rx_flex_wb.ptype_flexi_flags0 member */
+#define IAVF_RX_FLEX_DESC_PTYPE_M      (0x3FF) /* 10-bits */
+
+/* for iavf_32byte_rx_flex_wb.pkt_length member */
+#define IAVF_RX_FLEX_DESC_PKT_LEN_M    (0x3FFF) /* 14-bits */
+
+/* Note: These are predefined bit offsets */
+#define IAVF_RX_FLEX_DESC_STATUS0_DD_M			BIT(0)
+#define IAVF_RX_FLEX_DESC_STATUS0_EOF_M			BIT(1)
+#define IAVF_RX_FLEX_DESC_STATUS0_HBO_M			BIT(2)
+#define IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M		BIT(3)
+#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M		BIT(4)
+#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M		BIT(5)
+#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M		BIT(6)
+#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M		BIT(7)
+#define IAVF_RX_FLEX_DESC_STATUS0_LPBK_M		BIT(8)
+#define IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M		BIT(9)
+#define IAVF_RX_FLEX_DESC_STATUS0_RXE_M			BIT(10)
+#define IAVF_RX_FLEX_DESC_STATUS0_CRCP_M		BIT(11)
+#define IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M		BIT(12)
+#define IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M		BIT(13)
+#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD0_VALID_M	BIT(14)
+#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD1_VALID_M	BIT(15)
+
+/* Note: These are predefined bit offsets */
+#define IAVF_RX_FLEX_DESC_STATUS1_CPM_M			(0xFULL) /* 4 bits */
+#define IAVF_RX_FLEX_DESC_STATUS1_NAT_M			BIT(4)
+#define IAVF_RX_FLEX_DESC_STATUS1_CRYPTO_M		BIT(5)
+/* [10:6] reserved */
+#define IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M		BIT(11)
+#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD2_VALID_M	BIT(12)
+#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD3_VALID_M	BIT(13)
+#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD4_VALID_M	BIT(14)
+#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD5_VALID_M	BIT(15)
+
 #define IAVF_RXD_QW1_LENGTH_PBUF_SHIFT	38
 #define IAVF_RXD_QW1_LENGTH_PBUF_MASK	(0x3FFFULL << \
 					 IAVF_RXD_QW1_LENGTH_PBUF_SHIFT)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 2693c3ad0830..5cbb375b7063 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -402,6 +402,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
 	int pairs = adapter->num_active_queues;
 	struct virtchnl_queue_pair_info *vqpi;
 	u32 i, max_frame;
+	u8 rx_flags = 0;
 	size_t len;
 
 	max_frame = LIBIE_MAX_RX_FRM_LEN(adapter->rx_rings->pp->p.offset);
@@ -419,6 +420,9 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
 	if (!vqci)
 		return;
 
+	if (iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP))
+		rx_flags |= VIRTCHNL_PTP_RX_TSTAMP;
+
 	vqci->vsi_id = adapter->vsi_res->vsi_id;
 	vqci->num_queue_pairs = pairs;
 	vqpi = vqci->qpair;
@@ -441,6 +445,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
 		if (CRC_OFFLOAD_ALLOWED(adapter))
 			vqpi->rxq.crc_disable = !!(adapter->netdev->features &
 						   NETIF_F_RXFCS);
+		vqpi->rxq.flags = rx_flags;
 		vqpi++;
 	}
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (8 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 12:59   ` Simon Horman
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops Mateusz Polchlopek
  2024-06-04 13:14 ` [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath Mateusz Polchlopek
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Mateusz Polchlopek, Wojciech Drewek, Rahul Rameshbabu

Rx timestamping introduced in the PF driver created the need to refactor
the way the VF driver checks packet fields.

The helper that tested error bits in the descriptor has been removed;
from now on only the previously extracted struct fields are checked. The
DD (descriptor done) field needs to be checked at the very beginning,
before extracting any other fields.
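The check-DD-first rule can be sketched outside the driver as follows; the descriptor layout, enum, and names below are simplified stand-ins for illustration, not the real iavf structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins: the legacy descriptor keeps DD in bit 0 of a
 * 64-bit status/error/len qword, the flex descriptor in bit 0 of a
 * 16-bit status_error0 word. Names here are illustrative.
 */
#define DD_BIT 0x1u

enum rxdid { RXDID_LEGACY, RXDID_FLEX };

struct fake_rx_desc {
	uint64_t legacy_qword1;  /* legacy status_error_len */
	uint16_t flex_status0;   /* flex status_error0 */
};

/* DD must be tested before any other field is trusted: until the NIC
 * sets it, the rest of the descriptor is stale memory.
 */
static bool desc_done(enum rxdid id, const struct fake_rx_desc *d)
{
	if (id == RXDID_LEGACY)
		return d->legacy_qword1 & DD_BIT;
	return d->flex_status0 & DD_BIT;
}
```

A cleaning loop would call desc_done() right after the read barrier and break out when it returns false, which is the role iavf_is_descriptor_done() plays in iavf_clean_rx_irq().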

Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
---
 drivers/net/ethernet/intel/iavf/iavf_txrx.c | 30 ++++++++++++++++++++-
 drivers/net/ethernet/intel/iavf/iavf_txrx.h | 17 ------------
 drivers/net/ethernet/intel/iavf/iavf_type.h |  1 +
 3 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 97da5af52ad7..78da3b2e81a7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -9,6 +9,30 @@
 #include "iavf_trace.h"
 #include "iavf_prototype.h"
 
+/**
+ * iavf_is_descriptor_done - tests DD bit in Rx descriptor
+ * @rx_ring: the ring parameter to distinguish descriptor type (flex/legacy)
+ * @rx_desc: pointer to receive descriptor
+ *
+ * This function tests the descriptor done bit in the specified descriptor.
+ * Because there are two types of descriptors (legacy and flex), the rx_ring
+ * parameter is used to distinguish between them.
+ *
+ * Return: true or false based on the state of DD bit in Rx descriptor
+ */
+static bool iavf_is_descriptor_done(struct iavf_ring *rx_ring,
+				    union iavf_rx_desc *rx_desc)
+{
+	u64 status_error_len = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+
+	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE)
+		return FIELD_GET(IAVF_RX_DESC_STATUS_DD_MASK,
+				 status_error_len);
+
+	return FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_DD_BIT,
+			 le16_to_cpu(rx_desc->flex_wb.status_error0));
+}
+
 static __le64 build_ctob(u32 td_cmd, u32 td_offset, unsigned int size,
 			 u32 td_tag)
 {
@@ -1367,7 +1391,11 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
 		 * verified the descriptor has been written back.
 		 */
 		dma_rmb();
-		if (!iavf_test_staterr(rx_desc, IAVF_RX_DESC_STATUS_DD_MASK))
+
+		/* If DD field (descriptor done) is unset then other fields are
+		 * not valid
+		 */
+		if (!iavf_is_descriptor_done(rx_ring, rx_desc))
 			break;
 
 		iavf_extract_rx_fields(rx_ring, rx_desc, &fields);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 3661cd57a068..3add31924d75 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -82,23 +82,6 @@ enum iavf_dyn_idx_t {
 
 #define iavf_rx_desc iavf_32byte_rx_desc
 
-/**
- * iavf_test_staterr - tests bits in Rx descriptor status and error fields
- * @rx_desc: pointer to receive descriptor (in le64 format)
- * @stat_err_bits: value to mask
- *
- * This function does some fast chicanery in order to return the
- * value of the mask which is really only used for boolean tests.
- * The status_error_len doesn't need to be shifted because it begins
- * at offset zero.
- */
-static inline bool iavf_test_staterr(union iavf_rx_desc *rx_desc,
-				     const u64 stat_err_bits)
-{
-	return !!(rx_desc->wb.qword1.status_error_len &
-		  cpu_to_le64(stat_err_bits));
-}
-
 struct iavf_rx_extracted {
 	unsigned int size;
 	u16 vlan_tag;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 82c16a720807..61012ee5de2e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -316,6 +316,7 @@ union iavf_32byte_rx_desc {
  */
 #define IAVF_RX_DESC_STATUS_INT_UDP_0_MASK	BIT(18)
 
+#define IAVF_RX_FLEX_DESC_STATUS_ERR0_DD_BIT	BIT(0)
 #define IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT	BIT(1)
 #define IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT	BIT(10)
 
-- 
2.38.1



* [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (9 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field Mateusz Polchlopek
@ 2024-06-04 13:13 ` Mateusz Polchlopek
  2024-06-08 13:00   ` Simon Horman
  2024-06-04 13:14 ` [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath Mateusz Polchlopek
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:13 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Rahul Rameshbabu,
	Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Add handlers for the .ndo_hwtstamp_get and .ndo_hwtstamp_set ops, which
allow userspace to request timestamp enablement for the device. This lets
standard Linux applications request the desired timestamping.

As with other devices that support timestamping all packets, the driver
will upgrade any request for timestamping of a specific type of packet
to HWTSTAMP_FILTER_ALL.
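That upgrade rule can be modelled in a few lines; the enum values below are illustrative stand-ins for the kernel's HWTSTAMP_FILTER_* constants:

```c
#include <assert.h>

/* Stand-ins for a few HWTSTAMP_FILTER_* values; only the
 * upgrade-to-ALL behaviour is modelled here.
 */
enum ts_filter {
	TS_FILTER_NONE,
	TS_FILTER_PTP_V2_EVENT,
	TS_FILTER_ALL,
};

/* Hardware that can only timestamp all packets (or none) answers any
 * request for a specific packet class by upgrading it to ALL; NONE
 * stays NONE.
 */
static enum ts_filter upgrade_rx_filter(enum ts_filter requested)
{
	return requested == TS_FILTER_NONE ? TS_FILTER_NONE : TS_FILTER_ALL;
}
```

Userspace sees the upgraded value because the modified config is written back, which is why the patch sets config->rx_filter = HWTSTAMP_FILTER_ALL before saving it.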

The current configuration is stored so that it can be retrieved later by
calling .ndo_hwtstamp_get.

Tx timestamps are not implemented yet, so requesting them through the
set op fails with -EOPNOTSUPP.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c |  19 +++
 drivers/net/ethernet/intel/iavf/iavf_ptp.c  | 136 ++++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h  |   6 +
 drivers/net/ethernet/intel/iavf/iavf_txrx.h |   1 +
 4 files changed, 162 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f613bffabf85..5c4c3032c30a 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -5068,6 +5068,23 @@ static netdev_features_t iavf_fix_features(struct net_device *netdev,
 	return iavf_fix_strip_features(adapter, features);
 }
 
+static int iavf_hwstamp_get(struct net_device *netdev,
+			    struct kernel_hwtstamp_config *config)
+{
+	struct iavf_adapter *adapter = netdev_priv(netdev);
+
+	return iavf_ptp_get_ts_config(adapter, config);
+}
+
+static int iavf_hwstamp_set(struct net_device *netdev,
+			    struct kernel_hwtstamp_config *config,
+			    struct netlink_ext_ack *extack)
+{
+	struct iavf_adapter *adapter = netdev_priv(netdev);
+
+	return iavf_ptp_set_ts_config(adapter, config, extack);
+}
+
 static const struct net_device_ops iavf_netdev_ops = {
 	.ndo_open		= iavf_open,
 	.ndo_stop		= iavf_close,
@@ -5083,6 +5100,8 @@ static const struct net_device_ops iavf_netdev_ops = {
 	.ndo_fix_features	= iavf_fix_features,
 	.ndo_set_features	= iavf_set_features,
 	.ndo_setup_tc		= iavf_setup_tc,
+	.ndo_hwtstamp_get	= iavf_hwstamp_get,
+	.ndo_hwtstamp_set	= iavf_hwstamp_set
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
index 69e4948a9057..1a0a7a038ae1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
@@ -3,6 +3,136 @@
 
 #include "iavf.h"
 
+/**
+ * iavf_ptp_disable_rx_tstamp - Disable timestamping in Rx rings
+ * @adapter: private adapter structure
+ *
+ * Disable timestamp reporting for all Rx rings.
+ */
+static void iavf_ptp_disable_rx_tstamp(struct iavf_adapter *adapter)
+{
+	unsigned int i;
+
+	for (i = 0; i < adapter->num_active_queues; i++)
+		adapter->rx_rings[i].flags &= ~IAVF_TXRX_FLAGS_HW_TSTAMP;
+}
+
+/**
+ * iavf_ptp_enable_rx_tstamp - Enable timestamping in Rx rings
+ * @adapter: private adapter structure
+ *
+ * Enable timestamp reporting for all Rx rings.
+ */
+static void iavf_ptp_enable_rx_tstamp(struct iavf_adapter *adapter)
+{
+	unsigned int i;
+
+	for (i = 0; i < adapter->num_active_queues; i++)
+		adapter->rx_rings[i].flags |= IAVF_TXRX_FLAGS_HW_TSTAMP;
+}
+
+/**
+ * iavf_ptp_set_timestamp_mode - Set device timestamping mode
+ * @adapter: private adapter structure
+ * @config: pointer to kernel_hwtstamp_config
+ *
+ * Set the timestamping mode requested from the userspace.
+ *
+ * Note: this function always translates Rx timestamp requests for any packet
+ * category into HWTSTAMP_FILTER_ALL.
+ *
+ * Return: zero.
+ */
+static int iavf_ptp_set_timestamp_mode(struct iavf_adapter *adapter,
+				       struct kernel_hwtstamp_config *config)
+{
+	/* Reserved for future extensions. */
+	if (config->flags)
+		return -EINVAL;
+
+	switch (config->tx_type) {
+	case HWTSTAMP_TX_OFF:
+		break;
+	case HWTSTAMP_TX_ON:
+		return -EOPNOTSUPP;
+	default:
+		return -ERANGE;
+	}
+
+	switch (config->rx_filter) {
+	case HWTSTAMP_FILTER_NONE:
+		iavf_ptp_disable_rx_tstamp(adapter);
+		break;
+	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_NTP_ALL:
+	case HWTSTAMP_FILTER_ALL:
+		if (!(iavf_ptp_cap_supported(adapter,
+					     VIRTCHNL_1588_PTP_CAP_RX_TSTAMP)))
+			return -EOPNOTSUPP;
+		config->rx_filter = HWTSTAMP_FILTER_ALL;
+		iavf_ptp_enable_rx_tstamp(adapter);
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	return 0;
+}
+
+/**
+ * iavf_ptp_get_ts_config - Get timestamping configuration
+ * @adapter: private adapter structure
+ * @config: pointer to kernel_hwtstamp_config
+ *
+ * Return the current hardware timestamping configuration back to userspace.
+ *
+ * Return: zero.
+ */
+int iavf_ptp_get_ts_config(struct iavf_adapter *adapter,
+			   struct kernel_hwtstamp_config *config)
+{
+	*config = adapter->ptp.hwtstamp_config;
+
+	return 0;
+}
+
+/**
+ * iavf_ptp_set_ts_config - Set timestamping configuration
+ * @adapter: private adapter structure
+ * @config: pointer to kernel_hwtstamp_config structure
+ * @extack: pointer to netlink_ext_ack structure
+ *
+ * Program the requested timestamping configuration to the device.
+ *
+ * Return: zero.
+ */
+int iavf_ptp_set_ts_config(struct iavf_adapter *adapter,
+			   struct kernel_hwtstamp_config *config,
+			   struct netlink_ext_ack *extack)
+{
+	int err;
+
+	err = iavf_ptp_set_timestamp_mode(adapter, config);
+	if (err)
+		return err;
+
+	/* Save successful settings for future reference */
+	adapter->ptp.hwtstamp_config = *config;
+
+	return 0;
+}
+
 /**
  * clock_to_adapter - Convert clock info pointer to adapter pointer
  * @ptp_info: PTP info structure
@@ -335,4 +465,10 @@ void iavf_ptp_process_caps(struct iavf_adapter *adapter)
 	else if (!adapter->ptp.initialized &&
 		 iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_READ_PHC))
 		iavf_ptp_init(adapter);
+
+	/* Check if the device lost access to Rx timestamp incoming packets */
+	if (!iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP)) {
+		adapter->ptp.hwtstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
+		iavf_ptp_disable_rx_tstamp(adapter);
+	}
 }
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
index 7a25647980f3..fd211b1c4025 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -21,6 +21,7 @@ struct iavf_ptp {
 	struct list_head aq_cmds;
 	/* Lock protecting access to the AQ command list */
 	spinlock_t aq_cmd_lock;
+	struct kernel_hwtstamp_config hwtstamp_config;
 	u64 cached_phc_time;
 	unsigned long cached_phc_updated;
 	bool initialized;
@@ -35,5 +36,10 @@ void iavf_ptp_process_caps(struct iavf_adapter *adapter);
 bool iavf_ptp_cap_supported(struct iavf_adapter *adapter, u32 cap);
 void iavf_virtchnl_send_ptp_cmd(struct iavf_adapter *adapter);
 long iavf_ptp_do_aux_work(struct ptp_clock_info *ptp);
+int iavf_ptp_get_ts_config(struct iavf_adapter *adapter,
+			   struct kernel_hwtstamp_config *config);
+int iavf_ptp_set_ts_config(struct iavf_adapter *adapter,
+			   struct kernel_hwtstamp_config *config,
+			   struct netlink_ext_ack *extack);
 
 #endif /* _IAVF_PTP_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 3add31924d75..0379f94acb56 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -262,6 +262,7 @@ struct iavf_ring {
 #define IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1	BIT(3)
 #define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2	BIT(4)
 #define IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2	BIT(5)
+#define IAVF_TXRX_FLAGS_HW_TSTAMP		BIT(6)
 
 	/* stats structs */
 	struct iavf_queue_stats	stats;
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath
  2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
                   ` (10 preceding siblings ...)
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops Mateusz Polchlopek
@ 2024-06-04 13:14 ` Mateusz Polchlopek
  2024-06-08 13:00   ` Simon Horman
  11 siblings, 1 reply; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-04 13:14 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, Jacob Keller, Wojciech Drewek, Rahul Rameshbabu,
	Sunil Goutham, Mateusz Polchlopek

From: Jacob Keller <jacob.e.keller@intel.com>

Add support for receive timestamps to the Rx hotpath. This support only
works when using the flexible descriptor format, so make sure that we
request this format by default if we have receive timestamp support
available in the PTP capabilities.

In order to report the timestamps to userspace, we need to perform
timestamp extension. The Rx descriptor does actually contain the "40
bit" timestamp; however, the upper 32 bits, which contain nanoseconds,
are conveniently stored separately in the descriptor. We could extract
those 32 bits and the lower 8 bits, then shift and OR them together to
recover the 40-bit value, but this makes no sense: the timestamp
extension algorithm would simply discard the lower 8 bits anyway.

Thus, implement timestamp extension as iavf_ptp_extend_32b_timestamp(),
and extract and forward only the 32 bits of nominal nanoseconds.
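
The extension step can be sketched as stand-alone C; this mirrors the
logic of iavf_ptp_extend_32b_timestamp() below, but the helper name and
the values used here are ours for illustration only:

```c
#include <stdint.h>

/* Sketch of the 32b -> 64b timestamp extension: compare the timestamp
 * against the lower 32 bits of a recently cached PHC time, and decide
 * whether it was captured before or after that cached time.
 */
uint64_t extend_32b_timestamp(uint64_t cached_phc_time, uint32_t in_tstamp)
{
	const uint32_t phc_low = (uint32_t)cached_phc_time;
	uint32_t delta = in_tstamp - phc_low;	/* wraps modulo 2^32 */

	/* A delta above 2^31 means in_tstamp predates the cached time */
	if (delta > (UINT32_MAX / 2)) {
		delta = phc_low - in_tstamp;	/* reverse the delta */
		return cached_phc_time - delta;
	}

	return cached_phc_time + delta;
}
```

The unsigned 32-bit subtraction wraps naturally, which is what makes the
rollover case (timestamp and cached time on opposite sides of a 2^32 ns
boundary) work without any special-casing.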

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c |  9 +++
 drivers/net/ethernet/intel/iavf/iavf_ptp.c  | 69 +++++++++++++++++++++
 drivers/net/ethernet/intel/iavf/iavf_ptp.h  |  4 ++
 drivers/net/ethernet/intel/iavf/iavf_txrx.c | 43 +++++++++++++
 4 files changed, 125 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 5c4c3032c30a..7c17d20cc254 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -730,6 +730,15 @@ static u8 iavf_select_rx_desc_format(struct iavf_adapter *adapter)
 	if (!RXDID_ALLOWED(adapter))
 		return VIRTCHNL_RXDID_1_32B_BASE;
 
+	/* Rx timestamping requires the use of flexible NIC descriptors */
+	if (iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP)) {
+		if (supported_rxdids & BIT(VIRTCHNL_RXDID_2_FLEX_SQ_NIC))
+			return VIRTCHNL_RXDID_2_FLEX_SQ_NIC;
+
+		dev_dbg(&adapter->pdev->dev,
+			"Unable to negotiate flexible descriptor format.\n");
+	}
+
 	/* Warn if the PF does not list support for the default legacy
 	 * descriptor format. This shouldn't happen, as this is the format
 	 * used if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is not supported. It is
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.c b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
index 1a0a7a038ae1..70c360f5a7ce 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.c
@@ -440,6 +440,9 @@ void iavf_ptp_release(struct iavf_adapter *adapter)
 	}
 	adapter->aq_required &= ~IAVF_FLAG_AQ_SEND_PTP_CMD;
 	spin_unlock(&adapter->ptp.aq_cmd_lock);
+
+	adapter->ptp.hwtstamp_config.rx_filter = HWTSTAMP_FILTER_NONE;
+	iavf_ptp_disable_rx_tstamp(adapter);
 }
 
 /**
@@ -472,3 +475,69 @@ void iavf_ptp_process_caps(struct iavf_adapter *adapter)
 		iavf_ptp_disable_rx_tstamp(adapter);
 	}
 }
+
+/**
+ * iavf_ptp_extend_32b_timestamp - Convert a 32b nanoseconds timestamp to 64b
+ * nanoseconds
+ * @cached_phc_time: recently cached copy of PHC time
+ * @in_tstamp: Ingress/egress 32b nanoseconds timestamp value
+ *
+ * Hardware captures timestamps which contain only 32 bits of nominal
+ * nanoseconds, as opposed to the 64bit timestamps that the stack expects.
+ *
+ * Extend the 32bit nanosecond timestamp using the following algorithm and
+ * assumptions:
+ *
+ * 1) have a recently cached copy of the PHC time
+ * 2) assume that the in_tstamp was captured no more than 2^31 nanoseconds
+ *    (~2.1 seconds) before or after the PHC time was captured.
+ * 3) calculate the delta between the cached time and the timestamp
+ * 4) if the delta is smaller than 2^31 nanoseconds, then the timestamp was
+ *    captured after the PHC time. In this case, the full timestamp is just
+ *    the cached PHC time plus the delta.
+ * 5) otherwise, if the delta is larger than 2^31 nanoseconds, then the
+ *    timestamp was captured *before* the PHC time, i.e. because the PHC
+ *    cache was updated after the timestamp was captured by hardware. In this
+ *    case, the full timestamp is the cached time minus the inverse delta.
+ *
+ * This algorithm works even if the PHC time was updated after a Tx timestamp
+ * was requested, but before the Tx timestamp event was reported from
+ * hardware.
+ *
+ * This calculation primarily relies on keeping the cached PHC time up to
+ * date. If the timestamp was captured more than 2^31 nanoseconds after the
+ * PHC time, it is possible that the lower 32bits of PHC time have
+ * overflowed more than once, and we might generate an incorrect timestamp.
+ *
+ * This is prevented by (a) periodically updating the cached PHC time once
+ * a second, and (b) discarding any Tx timestamp packet if it has waited for
+ * a timestamp for more than one second.
+ *
+ * Return: the 32-bit timestamp extended to a 64-bit nanosecond value
+ */
+u64 iavf_ptp_extend_32b_timestamp(u64 cached_phc_time, u32 in_tstamp)
+{
+	const u64 mask = GENMASK_ULL(31, 0);
+	u32 delta;
+	u64 ns;
+
+	/* Calculate the delta between the lower 32bits of the cached PHC
+	 * time and the in_tstamp value
+	 */
+	delta = (in_tstamp - (u32)(cached_phc_time & mask));
+
+	/* Do not assume that the in_tstamp is always more recent than the
+	 * cached PHC time. If the delta is large, it indicates that the
+	 * in_tstamp was taken in the past, and should be converted
+	 * forward.
+	 */
+	if (delta > (mask / 2)) {
+		/* reverse the delta calculation here */
+		delta = ((u32)(cached_phc_time & mask) - in_tstamp);
+		ns = cached_phc_time - delta;
+	} else {
+		ns = cached_phc_time + delta;
+	}
+
+	return ns;
+}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ptp.h b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
index fd211b1c4025..be07e543ce48 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ptp.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_ptp.h
@@ -6,6 +6,9 @@
 
 #include <linux/ptp_clock_kernel.h>
 
+/* bit indicating whether a 40bit timestamp is valid */
+#define IAVF_PTP_40B_TSTAMP_VALID	BIT(0)
+
 /* structure used to queue PTP commands for processing */
 struct iavf_ptp_aq_cmd {
 	struct list_head list;
@@ -41,5 +44,6 @@ int iavf_ptp_get_ts_config(struct iavf_adapter *adapter,
 int iavf_ptp_set_ts_config(struct iavf_adapter *adapter,
 			   struct kernel_hwtstamp_config *config,
 			   struct netlink_ext_ack *extack);
+u64 iavf_ptp_extend_32b_timestamp(u64 cached_phc_time, u32 in_tstamp);
 
 #endif /* _IAVF_PTP_H_ */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 78da3b2e81a7..1d20cd559f7d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1131,6 +1131,48 @@ static void iavf_flex_rx_hash(struct iavf_ring *ring,
 	}
 }
 
+/**
+ * iavf_flex_rx_tstamp - Capture Rx timestamp from the descriptor
+ * @rx_ring: descriptor ring
+ * @rx_desc: specific descriptor
+ * @skb: skb currently being received
+ *
+ * Read the Rx timestamp value from the descriptor and pass it to the stack.
+ *
+ * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
+ * descriptor writeback format.
+ */
+static void iavf_flex_rx_tstamp(struct iavf_ring *rx_ring,
+				union iavf_rx_desc *rx_desc,
+				struct sk_buff *skb)
+{
+	struct skb_shared_hwtstamps *skb_tstamps;
+	struct iavf_adapter *adapter;
+	u32 tstamp;
+	u64 ns;
+
+	/* Skip processing if timestamps aren't enabled */
+	if (!(rx_ring->flags & IAVF_TXRX_FLAGS_HW_TSTAMP))
+		return;
+
+	/* Check if this Rx descriptor has a valid timestamp */
+	if (!(rx_desc->flex_wb.ts_low & IAVF_PTP_40B_TSTAMP_VALID))
+		return;
+
+	adapter = netdev_priv(rx_ring->netdev);
+
+	/* the ts_low field only contains the valid bit and sub-nanosecond
+	 * precision, so we don't need to extract it.
+	 */
+	tstamp = le32_to_cpu(rx_desc->flex_wb.flex_ts.ts_high);
+	ns = iavf_ptp_extend_32b_timestamp(adapter->ptp.cached_phc_time,
+					   tstamp);
+
+	skb_tstamps = skb_hwtstamps(skb);
+	memset(skb_tstamps, 0, sizeof(*skb_tstamps));
+	skb_tstamps->hwtstamp = ns_to_ktime(ns);
+}
+
 /**
  * iavf_process_skb_fields - Populate skb header fields from Rx descriptor
  * @rx_ring: rx descriptor ring packet is being transacted on
@@ -1152,6 +1194,7 @@ static void iavf_process_skb_fields(struct iavf_ring *rx_ring,
 	} else {
 		iavf_flex_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 		iavf_flex_rx_csum(rx_ring->vsi, skb, rx_desc);
+		iavf_flex_rx_tstamp(rx_ring, rx_desc, skb);
 	}
 
 	skb_record_rx_queue(skb, rx_ring->queue_index);
-- 
2.38.1



* Re: [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF Mateusz Polchlopek
@ 2024-06-08 12:55   ` Simon Horman
  2024-06-10 10:18     ` Mateusz Polchlopek
  2024-06-10 18:35     ` Jacob Keller
  0 siblings, 2 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:55 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu

On Tue, Jun 04, 2024 at 09:13:49AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Add support for allowing a VF to enable PTP feature - Rx timestamps
> 
> The new capability is gated by VIRTCHNL_VF_CAP_PTP, which must be
> set by the VF to request access to the new operations. In addition, the
> VIRTCHNL_OP_1588_PTP_CAPS command is used to determine the specific
> capabilities available to the VF.
> 
> This support includes the following additional capabilities:
> 
> * Rx timestamps enabled in the Rx queues (when using flexible advanced
>   descriptors)
> * Read access to PHC time over virtchnl using
>   VIRTCHNL_OP_1588_PTP_GET_TIME
> 
> Extra space is reserved in most structures to allow for future
> extension (like set clock, Tx timestamps).  Additional opcode numbers
> are reserved and space in the virtchnl_ptp_caps structure is
> specifically set aside for this.
> Additionally, each structure has some space reserved for future
> extensions to allow some flexibility.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Hi Mateusz, Jacob, all,

If you need to respin this for some reason, please consider updating
the Kernel doc for the following to include a short description.
Else, please consider doing so as a follow-up

* struct virtchnl_ptp_caps
* struct virtchnl_phc_time

Likewise as a follow-up, as it was not introduced by this patch, for:

* virtchnl_vc_validate_vf_msg

Flagged by kernel-doc -none -Wall

The above notwithstanding, this looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>

...


* Re: [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format Mateusz Polchlopek
@ 2024-06-08 12:56   ` Simon Horman
  2024-06-08 12:58   ` Simon Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:56 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:52AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, to enable the VF
> driver the ability to determine what Rx descriptor formats are
> available. This requires sending an additional message during
> initialization and reset, the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. This
> operation requests the supported Rx descriptor IDs available from the
> PF.
> 
> This is treated the same way that VLAN V2 capabilities are handled. Add
> a new set of extended capability flags, used to process send and receipt
> of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message.
> 
> This ensures we finish negotiating for the supported descriptor formats
> prior to beginning configuration of receive queues.
> 
> This change stores the supported format bitmap into the iavf_adapter
> structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled
> by the PF, we need to make sure that the Rx queue configuration
> specifies the format.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Hi Mateusz, Jacob, all,

The nit below notwithstanding, this looks good to me.

Reviewed-by: Simon Horman <horms@kernel.org>

...

> @@ -262,6 +276,45 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter)
>  	return err;
>  }
>  
> +int iavf_get_vf_supported_rxdids(struct iavf_adapter *adapter)
> +{
> +	struct iavf_hw *hw = &adapter->hw;
> +	struct iavf_arq_event_info event;
> +	enum virtchnl_ops op;
> +	enum iavf_status err;
> +	u16 len;
> +
> +	len =  sizeof(struct virtchnl_supported_rxdids);
> +	event.buf_len = len;
> +	event.msg_buf = kzalloc(event.buf_len, GFP_KERNEL);
> +	if (!event.msg_buf) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +
> +	while (1) {
> +		/* When the AQ is empty, iavf_clean_arq_element will return
> +		 * nonzero and this loop will terminate.
> +		 */
> +		err = iavf_clean_arq_element(hw, &event, NULL);
> +		if (err != IAVF_SUCCESS)
> +			goto out_alloc;
> +		op = (enum virtchnl_ops)le32_to_cpu(event.desc.cookie_high);
> +		if (op == VIRTCHNL_OP_GET_SUPPORTED_RXDIDS)
> +			break;
> +	}
> +
> +	err = (enum iavf_status)le32_to_cpu(event.desc.cookie_low);
> +	if (err)
> +		goto out_alloc;
> +
> +	memcpy(&adapter->supported_rxdids, event.msg_buf, min(event.msg_len, len));

If you need to respin for some other reason,
please consider wrapping the above to <= 80 columns wide.

Likewise for the 2nd call to iavf_ptp_cap_supported() in
iavf_ptp_process_caps() in
[PATCH v7 06/12] iavf: add initial framework for registering PTP clock

Flagged by: checkpatch.pl --max-line-length=80

> +out_alloc:
> +	kfree(event.msg_buf);
> +out:
> +	return err;
> +}
> +
>  /**
>   * iavf_configure_queues
>   * @adapter: adapter structure

...


* Re: [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor Mateusz Polchlopek
@ 2024-06-08 12:56   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:56 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Simei Su, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:50AM -0400, Mateusz Polchlopek wrote:
> From: Simei Su <simei.su@intel.com>
> 
> To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by
> the VF to request PTP capability and responded by the PF what capability
> is enabled for that VF.
> 
> Hardware captures timestamps which contain only 32 bits of nominal
> nanoseconds, as opposed to the 64bit timestamps that the stack expects.
> To convert 32b to 64b, we need a current PHC time.
> VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF and responded by the
> PF with the current PHC time.
> 
> Signed-off-by: Simei Su <simei.su@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format Mateusz Polchlopek
@ 2024-06-08 12:57   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:57 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu

On Tue, Jun 04, 2024 at 09:13:51AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Support for allowing VF to negotiate the descriptor format requires that
> the VF specify which descriptor format to use when requesting Rx queues.
> The VF is supposed to request the set of supported formats via the new
> VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, and then set one of the supported
> formats in the rxdid field of the virtchnl_rxq_info structure.
> 
> The virtchnl.h header does not provide an enumeration of the format
> values. The existing implementations in the PF directly use the values
> from the DDP package.
> 
> Make the formats explicit by defining an enumeration of the RXDIDs.
> Provide an enumeration for the values as well as the bit positions as
> returned by the supported_rxdids data from the
> VIRTCHNL_OP_GET_SUPPORTED_RXDIDS.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format Mateusz Polchlopek
  2024-06-08 12:56   ` Simon Horman
@ 2024-06-08 12:58   ` Simon Horman
  1 sibling, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:58 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:52AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Enable support for VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC, to enable the VF
> driver the ability to determine what Rx descriptor formats are
> available. This requires sending an additional message during
> initialization and reset, the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS. This
> operation requests the supported Rx descriptor IDs available from the
> PF.
> 
> This is treated the same way that VLAN V2 capabilities are handled. Add
> a new set of extended capability flags, used to process send and receipt
> of the VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message.
> 
> This ensures we finish negotiating for the supported descriptor formats
> prior to beginning configuration of receive queues.
> 
> This change stores the supported format bitmap into the iavf_adapter
> structure. Additionally, if VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC is enabled
> by the PF, we need to make sure that the Rx queue configuration
> specifies the format.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities Mateusz Polchlopek
@ 2024-06-08 12:58   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:58 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:53AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Add a new extended capabilities negotiation to exchange information from
> the PF about what PTP capabilities are supported by this VF. This
> requires sending a VIRTCHNL_OP_1588_PTP_GET_CAPS message, and waiting
> for the response from the PF. Handle this early on during the VF
> initialization.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock Mateusz Polchlopek
@ 2024-06-08 12:58   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:58 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Sai Krishna, Ahmed Zaki

On Tue, Jun 04, 2024 at 09:13:54AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Add the iavf_ptp.c file and fill it in with a skeleton framework to
> allow registering the PTP clock device.
> Add implementation of helper functions to check if a PTP capability
> is supported and handle change in PTP capabilities.
> Enabling virtual clock would be possible, though it would probably
> perform poorly due to the lack of direct time access.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Sai Krishna <saikrishnag@marvell.com>
> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time Mateusz Polchlopek
@ 2024-06-08 12:59   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:59 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu

On Tue, Jun 04, 2024 at 09:13:55AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Implement support for reading the PHC time indirectly via the
> VIRTCHNL_OP_1588_PTP_GET_TIME operation.
> 
> Based on some simple tests with ftrace, the latency of the indirect
> clock access appears to be about ~110 microseconds. This is due to the
> cost of preparing a message to send over the virtchnl queue.
> 
> This is expected, due to the increased jitter caused by sending messages
> over virtchnl. It is not easy to control the precise time that the
> message is sent by the VF, or the time that the message is responded to
> by the PF, or the time that the message sent from the PF is received by
> the VF.
> 
> For sending the request, note that many PTP related operations will
> require sending of VIRTCHNL messages. Instead of adding a separate AQ
> flag and storage for each operation, setup a simple queue mechanism for
> queuing up virtchnl messages.
> 
> Each message will be converted to a iavf_ptp_aq_cmd structure which ends
> with a flexible array member. A single AQ flag is added for processing
> messages from this queue. In principle this could be extended to handle
> arbitrary virtchnl messages. For now it is kept to PTP-specific as the
> need is primarily for handling PTP-related commands.
> 
> Use this to implement .gettimex64 using the indirect method via the
> virtchnl command. The response from the PF is processed and stored into
> the cached_phc_time. A wait queue is used to allow the PTP clock gettime
> request to sleep until the message is sent from the PF.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache PHC time
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache " Mateusz Polchlopek
@ 2024-06-08 12:59   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:59 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:56AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> The Rx timestamps reported by hardware may only have 32 bits of storage
> for nanosecond time. These timestamps cannot be directly reported to the
> Linux stack, as it expects 64bits of time.
> 
> To handle this, the timestamps must be extended using an algorithm that
> calculates the corrected 64bit timestamp by comparison between the PHC
> time and the timestamp. This algorithm requires the PHC time to be
> captured within ~2 seconds of when the timestamp was captured.
> 
> Instead of trying to read the PHC time in the Rx hotpath, the algorithm
> relies on a cached value that is periodically updated.
> 
> Keep this cached time up to date by using the PTP .do_aux_work kthread
> function.
> 
> The iavf_ptp_do_aux_work will reschedule itself about twice a second,
> and will check whether or not the cached PTP time needs to be updated.
> If so, it issues a VIRTCHNL_OP_1588_PTP_GET_TIME to request the time
> from the PF. The jitter and latency involved with this command aren't
> important, because the cached time just needs to be kept up to date
> within about ~2 seconds.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors Mateusz Polchlopek
@ 2024-06-08 12:59   ` Simon Horman
  2024-06-11 11:47   ` Alexander Lobakin
  1 sibling, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:59 UTC (permalink / raw)
  To: Mateusz Polchlopek; +Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek

On Tue, Jun 04, 2024 at 09:13:57AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
> negotiating to enable the advanced flexible descriptor layout. Add the
> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.
> 
> Also add bit position definitions for the status and error indications
> that are needed.
> 
> The iavf_clean_rx_irq function needs to extract a few fields from the Rx
> descriptor, including the size, rx_ptype, and vlan_tag.
> Move the extraction to a separate function that decodes the fields into
> a structure. This will reduce the burden for handling multiple
> descriptor types by keeping the relevant extraction logic in one place.
> 
> To support handling an additional descriptor format with minimal code
> duplication, refactor Rx checksum handling so that the general logic
> is separated from the bit calculations. Introduce an iavf_rx_desc_decoded
> structure which holds the relevant bits decoded from the Rx descriptor.
> This will enable implementing flexible descriptor handling without
> duplicating the general logic twice.
> 
> Introduce an iavf_extract_flex_rx_fields, iavf_flex_rx_hash, and
> iavf_flex_rx_csum functions which operate on the flexible NIC descriptor
> format instead of the legacy 32 byte format. Based on the negotiated
> RXDID, select the correct function for processing the Rx descriptors.
> 
> With this change, the Rx hot path should be functional when using either
> the default legacy 32byte format or when we switch to the flexible NIC
> layout.
> 
> Modify the Rx hot path to add support for the flexible descriptor
> format and add request enabling Rx timestamps for all queues.
> 
> As in ice, make sure we bump the checksum level if the hardware detected
> a packet type which could have an outer checksum. This is important
> because hardware only verifies the inner checksum.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field Mateusz Polchlopek
@ 2024-06-08 12:59   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 12:59 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Wojciech Drewek, Rahul Rameshbabu

On Tue, Jun 04, 2024 at 09:13:58AM -0400, Mateusz Polchlopek wrote:
> Rx timestamping introduced in the PF driver created the need to
> refactor the VF driver mechanism that checks packet fields.
> 
> The function that checked errors in the descriptor has been removed;
> from now on, only previously extracted struct fields are checked. The
> DD (descriptor done) field needs to be checked at the very beginning,
> before extracting any other fields.
> 
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops Mateusz Polchlopek
@ 2024-06-08 13:00   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 13:00 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu

On Tue, Jun 04, 2024 at 09:13:59AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Add handlers for the .ndo_hwtstamp_get and .ndo_hwtstamp_set ops which
> allow userspace to request timestamp enablement for the device. This
> allows standard Linux applications to request the desired timestamping.
> 
> As with other devices that support timestamping all packets, the driver
> will upgrade any request for timestamping of a specific type of packet
> to HWTSTAMP_FILTER_ALL.
> 
> The current configuration is stored so that it can be retrieved by
> calling .ndo_hwtstamp_get.
> 
> Tx timestamps are not implemented yet, so a set request for the Tx
> path will fail with the EOPNOTSUPP error code.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath
  2024-06-04 13:14 ` [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath Mateusz Polchlopek
@ 2024-06-08 13:00   ` Simon Horman
  0 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2024-06-08 13:00 UTC (permalink / raw)
  To: Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu, Sunil Goutham

On Tue, Jun 04, 2024 at 09:14:00AM -0400, Mateusz Polchlopek wrote:
> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Add support for receive timestamps to the Rx hotpath. This support only
> works when using the flexible descriptor format, so make sure that we
> request this format by default if we have receive timestamp support
> available in the PTP capabilities.
> 
> In order to report the timestamps to userspace, we need to perform
> timestamp extension. The Rx descriptor does actually contain the "40
> bit" timestamp. However, the upper 32 bits, which contain nanoseconds,
> are conveniently stored separately in the descriptor. We could extract
> those 32 bits and the lower 8 bits, then perform a bitwise OR to
> calculate the 40-bit value. This makes no sense, because the timestamp
> extension algorithm would simply discard the lower 8 bits anyway.
> 
> Thus, implement timestamp extension as iavf_ptp_extend_32b_timestamp(),
> and extract and forward only the 32 bits of nominal nanoseconds.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
> Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>
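For readers following the thread, the 32-bit-to-64-bit extension step
described in the commit message can be modelled in plain C. This is a
userspace sketch patterned on the analogous ice_ptp_extend_32b_ts()
logic; the function name and exact wraparound policy of the merged iavf
code may differ:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extend a 32-bit Rx timestamp (nominal nanoseconds) to 64 bits using
 * a cached PHC time. Assumes the packet arrived within ~2^31 ns of the
 * cached PHC read, so at most one wrap of the lower 32 bits needs to
 * be corrected. Sketch only; not the merged driver code.
 */
static uint64_t extend_32b_timestamp(uint64_t cached_phc_ns, uint32_t in_tstamp)
{
	uint32_t phc_lo = (uint32_t)cached_phc_ns;
	uint64_t ns = (cached_phc_ns & ~0xFFFFFFFFULL) | in_tstamp;

	if (in_tstamp > phc_lo && in_tstamp - phc_lo > 0x80000000U)
		ns -= 0x100000000ULL;	/* tstamp taken just before the wrap */
	else if (phc_lo > in_tstamp && phc_lo - in_tstamp > 0x80000000U)
		ns += 0x100000000ULL;	/* lower 32 bits wrapped after tstamp */

	return ns;
}
```

Since 2^31 ns is roughly 2.1 seconds, the sketch only works if the PHC
cache is refreshed at least that often, which is why the VF keeps
reading PHC time over virtchnl.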



* Re: [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF
  2024-06-08 12:55   ` Simon Horman
@ 2024-06-10 10:18     ` Mateusz Polchlopek
  2024-06-10 18:35     ` Jacob Keller
  1 sibling, 0 replies; 34+ messages in thread
From: Mateusz Polchlopek @ 2024-06-10 10:18 UTC (permalink / raw)
  To: Simon Horman
  Cc: intel-wired-lan, netdev, Jacob Keller, Wojciech Drewek,
	Rahul Rameshbabu



On 6/8/2024 2:55 PM, Simon Horman wrote:
> On Tue, Jun 04, 2024 at 09:13:49AM -0400, Mateusz Polchlopek wrote:
>> From: Jacob Keller <jacob.e.keller@intel.com>
>>
>> Add support for allowing a VF to enable the PTP feature - Rx timestamps.
>>
>> The new capability is gated by VIRTCHNL_VF_CAP_PTP, which must be
>> set by the VF to request access to the new operations. In addition, the
>> VIRTCHNL_OP_1588_PTP_CAPS command is used to determine the specific
>> capabilities available to the VF.
>>
>> This support includes the following additional capabilities:
>>
>> * Rx timestamps enabled in the Rx queues (when using flexible advanced
>>    descriptors)
>> * Read access to PHC time over virtchnl using
>>    VIRTCHNL_OP_1588_PTP_GET_TIME
>>
>> Extra space is reserved in most structures to allow for future
>> extension (like set clock, Tx timestamps). Additional opcode numbers
>> are reserved, and space in the virtchnl_ptp_caps structure is
>> specifically set aside for this. Each structure also has some space
>> reserved for future extensions, to allow flexibility.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
>> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> 
> Hi Mateusz, Jacob, all,
> 
> If you need to respin this for some reason, please consider updating
> the Kernel doc for the following to include a short description.
> Else, please consider doing so as a follow-up
> 
> * struct virtchnl_ptp_caps
> * struct virtchnl_phc_time
> 
> Likewise as a follow-up, as it was not introduced by this patch, for:
> 
> * virtchnl_vc_validate_vf_msg
> 
> Flagged by kernel-doc -none -Wall
> 
> The above notwithstanding, this looks good to me.
> 
> Reviewed-by: Simon Horman <horms@kernel.org>
> 
> ...

Hello Simon!

Thanks for your review - I appreciate it.

I thought about a follow-up series after this gets merged, but I
received one warning from the kernel bot about an ARM architecture
issue. That being said, I will post (probably tomorrow) v8 with a fix
for the ARM issue and I will also include the fixes for
virtchnl_ptp_caps and virtchnl_phc_time (and the exceeded-80-chars
issues in patch 6).

As you pointed out, the virtchnl_vc_validate_vf_msg function was not
introduced in this patch, so I do not want to mix that in. I will
create a post-merge follow-up with documentation changes for the
mentioned function (virtchnl_vc_validate_vf_msg) and also for one docs
leftover from my previous series (related to the Tx scheduler).

Mateusz


* Re: [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF
  2024-06-08 12:55   ` Simon Horman
  2024-06-10 10:18     ` Mateusz Polchlopek
@ 2024-06-10 18:35     ` Jacob Keller
  1 sibling, 0 replies; 34+ messages in thread
From: Jacob Keller @ 2024-06-10 18:35 UTC (permalink / raw)
  To: Simon Horman, Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Wojciech Drewek, Rahul Rameshbabu



On 6/8/2024 5:55 AM, Simon Horman wrote:
> If you need to respin this for some reason, please consider updating
> the Kernel doc for the following to include a short description.
> Else, please consider doing so as a follow-up
> 
> * struct virtchnl_ptp_caps
> * struct virtchnl_phc_time
> 
> Likewise as a follow-up, as it was not introduced by this patch, for:
> 
> * virtchnl_vc_validate_vf_msg
> 
> Flagged by kernel-doc -none -Wall
> 
> The above notwithstanding, this looks good to me.
> 
> Reviewed-by: Simon Horman <horms@kernel.org>
> 
> ...

At some point I would like to do the work to cleanup all of the
remaining doc warnings in ice so that we can get a clean slate :D


* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors Mateusz Polchlopek
  2024-06-08 12:59   ` Simon Horman
@ 2024-06-11 11:47   ` Alexander Lobakin
  2024-06-11 20:52     ` Jacob Keller
  1 sibling, 1 reply; 34+ messages in thread
From: Alexander Lobakin @ 2024-06-11 11:47 UTC (permalink / raw)
  To: Mateusz Polchlopek, Jacob Keller; +Cc: intel-wired-lan, netdev, Wojciech Drewek

From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Date: Tue,  4 Jun 2024 09:13:57 -0400

> From: Jacob Keller <jacob.e.keller@intel.com>
> 
> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
> negotiating to enable the advanced flexible descriptor layout. Add the
> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.
> 
> Also add bit position definitions for the status and error indications
> that are needed.
> 
> The iavf_clean_rx_irq function needs to extract a few fields from the Rx
> descriptor, including the size, rx_ptype, and vlan_tag.
> Move the extraction to a separate function that decodes the fields into
> a structure. This will reduce the burden for handling multiple
> descriptor types by keeping the relevant extraction logic in one place.
> 
> To support handling an additional descriptor format with minimal code
> duplication, refactor Rx checksum handling so that the general logic
> is separated from the bit calculations. Introduce an iavf_rx_desc_decoded
> structure which holds the relevant bits decoded from the Rx descriptor.
> This will enable implementing flexible descriptor handling without
> duplicating the general logic twice.
> 
> Introduce an iavf_extract_flex_rx_fields, iavf_flex_rx_hash, and
> iavf_flex_rx_csum functions which operate on the flexible NIC descriptor
> format instead of the legacy 32 byte format. Based on the negotiated
> RXDID, select the correct function for processing the Rx descriptors.
> 
> With this change, the Rx hot path should be functional when using either
> the default legacy 32byte format or when we switch to the flexible NIC
> layout.
> 
> Modify the Rx hot path to add support for the flexible descriptor
> format and add a request to enable Rx timestamps on all queues.
> 
> As in ice, make sure we bump the checksum level if the hardware detected
> a packet type which could have an outer checksum. This is important
> because hardware only verifies the inner checksum.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 354 +++++++++++++-----
>  drivers/net/ethernet/intel/iavf/iavf_txrx.h   |   8 +
>  drivers/net/ethernet/intel/iavf/iavf_type.h   | 147 ++++++--
>  .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   5 +
>  4 files changed, 391 insertions(+), 123 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
> index 26b424fd6718..97da5af52ad7 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
> @@ -895,63 +895,68 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
>  	return true;
>  }
>  
> +/* iavf_rx_csum_decoded
> + *
> + * Checksum offload bits decoded from the receive descriptor.
> + */
> +struct iavf_rx_csum_decoded {
> +	u8 l3l4p : 1;
> +	u8 ipe : 1;
> +	u8 eipe : 1;
> +	u8 eudpe : 1;
> +	u8 ipv6exadd : 1;
> +	u8 l4e : 1;
> +	u8 pprs : 1;
> +	u8 nat : 1;
> +};

I see the same struct in idpf, probably a candidate for libeth.

> +
>  /**
> - * iavf_rx_checksum - Indicate in skb if hw indicated a good cksum
> + * iavf_rx_csum - Indicate in skb if hw indicated a good checksum
>   * @vsi: the VSI we care about
>   * @skb: skb currently being received and modified
> - * @rx_desc: the receive descriptor
> + * @ptype: decoded ptype information
> + * @csum_bits: decoded Rx descriptor information
>   **/
> -static void iavf_rx_checksum(struct iavf_vsi *vsi,
> -			     struct sk_buff *skb,
> -			     union iavf_rx_desc *rx_desc)
> +static void iavf_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,

Can't @vsi be const?

> +			 struct libeth_rx_pt *ptype,

struct libeth_rx_pt is smaller than a pointer to it. Pass it directly

> +			 struct iavf_rx_csum_decoded *csum_bits)

Same for this struct.

>  {
> -	struct libeth_rx_pt decoded;
> -	u32 rx_error, rx_status;
>  	bool ipv4, ipv6;
> -	u8 ptype;
> -	u64 qword;
>  
>  	skb->ip_summed = CHECKSUM_NONE;
>  
> -	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> -	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
> -
> -	decoded = libie_rx_pt_parse(ptype);
> -	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
> -		return;
> -
> -	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
> -	rx_status = FIELD_GET(IAVF_RXD_QW1_STATUS_MASK, qword);
> -
>  	/* did the hardware decode the packet and checksum? */
> -	if (!(rx_status & BIT(IAVF_RX_DESC_STATUS_L3L4P_SHIFT)))
> +	if (!csum_bits->l3l4p)
>  		return;
>  
> -	ipv4 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV4;
> -	ipv6 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV6;
> +	ipv4 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV4;
> +	ipv6 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV6;
>  
> -	if (ipv4 &&
> -	    (rx_error & (BIT(IAVF_RX_DESC_ERROR_IPE_SHIFT) |
> -			 BIT(IAVF_RX_DESC_ERROR_EIPE_SHIFT))))
> +	if (ipv4 && (csum_bits->ipe || csum_bits->eipe))
>  		goto checksum_fail;
>  
>  	/* likely incorrect csum if alternate IP extension headers found */
> -	if (ipv6 &&
> -	    rx_status & BIT(IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT))
> -		/* don't increment checksum err here, non-fatal err */
> +	if (ipv6 && csum_bits->ipv6exadd)
>  		return;
>  
>  	/* there was some L4 error, count error and punt packet to the stack */
> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_L4E_SHIFT))
> +	if (csum_bits->l4e)
>  		goto checksum_fail;
>  
>  	/* handle packets that were not able to be checksummed due
>  	 * to arrival speed, in this case the stack can compute
>  	 * the csum.
>  	 */
> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_PPRS_SHIFT))
> +	if (csum_bits->pprs)
>  		return;
>  
> +	/* If there is an outer header present that might contain a checksum
> +	 * we need to bump the checksum level by 1 to reflect the fact that
> +	 * we are indicating we validated the inner checksum.
> +	 */
> +	if (ptype->tunnel_type >= LIBETH_RX_PT_TUNNEL_IP_GRENAT)
> +		skb->csum_level = 1;
> +
>  	skb->ip_summed = CHECKSUM_UNNECESSARY;
>  	return;

For the whole function: you need to use unlikely() for checksum errors
to not slow down regular frames.
Also, I would even unlikely() packets with not verified checksum as it's
really rare case.
Optimize hotpath for most common workloads.
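A minimal userspace sketch of the suggested hint placement (field names
follow the patch, but the hint positions are the review suggestion, not
the merged code):

```c
#include <assert.h>
#include <stdbool.h>

/* Branch hints as used in the kernel; plain GCC/Clang builtins here. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

enum csum_verdict { CSUM_NONE, CSUM_UNNECESSARY, CSUM_FAIL };

struct csum_bits {
	unsigned int l3l4p : 1;
	unsigned int ipe : 1;
	unsigned int eipe : 1;
	unsigned int ipv6exadd : 1;
	unsigned int l4e : 1;
	unsigned int pprs : 1;
};

static enum csum_verdict rx_csum(bool ipv4, bool ipv6, struct csum_bits b)
{
	/* packets hw could not parse are the rare case on a healthy link */
	if (unlikely(!b.l3l4p))
		return CSUM_NONE;
	if (ipv4 && unlikely(b.ipe || b.eipe))
		return CSUM_FAIL;
	if (ipv6 && unlikely(b.ipv6exadd))
		return CSUM_NONE;
	if (unlikely(b.l4e))
		return CSUM_FAIL;
	if (unlikely(b.pprs))
		return CSUM_NONE;
	/* fast path: checksum verified, no taken branch on the way here */
	return CSUM_UNNECESSARY;
}
```

The point of the hints is code layout: the compiler keeps the
CHECKSUM_UNNECESSARY fall-through straight-line and moves the error
returns out of the hot cache lines.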

>  
> @@ -960,22 +965,105 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
>  }
>  
>  /**
> - * iavf_rx_hash - set the hash value in the skb
> + * iavf_legacy_rx_csum - Indicate in skb if hw indicated a good cksum
> + * @vsi: the VSI we care about
> + * @skb: skb currently being received and modified
> + * @rx_desc: the receive descriptor
> + *
> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
> + * descriptor writeback format.
> + **/
> +static void iavf_legacy_rx_csum(struct iavf_vsi *vsi,
> +				struct sk_buff *skb,
> +				union iavf_rx_desc *rx_desc)

@vsi and @rx_desc can be const.

> +{
> +	struct iavf_rx_csum_decoded csum_bits;
> +	struct libeth_rx_pt decoded;
> +
> +	u32 rx_error;
> +	u64 qword;
> +	u16 ptype;
> +
> +	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
> +	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);

You don't need @rx_error before libeth_rx_pt_has_checksum().

> +	decoded = libie_rx_pt_parse(ptype);
> +
> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
> +		return;
> +
> +	csum_bits.ipe = FIELD_GET(IAVF_RX_DESC_ERROR_IPE_MASK, rx_error);

So, @rx_error is a field of @qword and then there are more subfields?
Why not extract those fields directly from @qword?

> +	csum_bits.eipe = FIELD_GET(IAVF_RX_DESC_ERROR_EIPE_MASK, rx_error);
> +	csum_bits.l4e = FIELD_GET(IAVF_RX_DESC_ERROR_L4E_MASK, rx_error);
> +	csum_bits.pprs = FIELD_GET(IAVF_RX_DESC_ERROR_PPRS_MASK, rx_error);
> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_DESC_STATUS_L3L4P_MASK, rx_error);
> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_DESC_STATUS_IPV6EXADD_MASK,
> +					rx_error);
> +	csum_bits.nat = 0;
> +	csum_bits.eudpe = 0;

Initialize the whole struct with = { } at the declaration site and
remove this.

> +
> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);

In order to avoid having 2 call sites for this, make
iavf_{flex,legacy}_rx_csum() return @csum_bits and call iavf_rx_csum()
outside of them once.
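A userspace sketch of that shape: the format-specific parser returns
the decoded bits by value and the shared checksum logic has a single
call site. Names and bit positions here are illustrative only, not the
real descriptor layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct rx_csum_decoded {
	unsigned int l3l4p : 1;
	unsigned int ipe : 1;
	unsigned int eipe : 1;
	unsigned int l4e : 1;
};

/* one parser per descriptor format; returns the decoded bits by value */
static struct rx_csum_decoded parse_legacy_csum(uint64_t qword)
{
	/* = { } zeroes the fields this descriptor format cannot report */
	struct rx_csum_decoded bits = { };

	bits.l3l4p = !!(qword & (1ULL << 3));	/* illustrative positions */
	bits.ipe = !!(qword & (1ULL << 19));
	return bits;
}

/* the shared logic, called once by the caller for either format */
static bool rx_csum_ok(struct rx_csum_decoded bits)
{
	return bits.l3l4p && !bits.ipe && !bits.eipe && !bits.l4e;
}
```

The caller would then select the parser based on the negotiated RXDID
(e.g. `bits = legacy ? parse_legacy_csum(q) : parse_flex_csum(s);`,
with a hypothetical flex counterpart) and feed the result to the single
rx_csum_ok() call site.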

> +}
> +
> +/**
> + * iavf_flex_rx_csum - Indicate in skb if hw indicated a good cksum
> + * @vsi: the VSI we care about
> + * @skb: skb currently being received and modified
> + * @rx_desc: the receive descriptor
> + *
> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
> + * descriptor writeback format.
> + **/
> +static void iavf_flex_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
> +			      union iavf_rx_desc *rx_desc)

Same for const.

> +{
> +	struct iavf_rx_csum_decoded csum_bits;
> +	struct libeth_rx_pt decoded;
> +	u16 rx_status0, ptype;
> +
> +	rx_status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);

This is not needed before libeth_rx_pt_has_checksum().

> +	ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M,
> +			  le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0));

You also access this field later when extracting base fields. Shouldn't
this be combined somehow?

> +	decoded = libie_rx_pt_parse(ptype);
> +
> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
> +		return;
> +
> +	csum_bits.ipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M,
> +				  rx_status0);
> +	csum_bits.eipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M,
> +				   rx_status0);
> +	csum_bits.l4e = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M,
> +				  rx_status0);
> +	csum_bits.eudpe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M,
> +				    rx_status0);
> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M,
> +				    rx_status0);
> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M,
> +					rx_status0);
> +	csum_bits.nat = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS1_NAT_M, rx_status0);
> +	csum_bits.pprs = 0;

See above for struct initialization.

> +
> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);

See above.

> +}
> +
> +/**
> + * iavf_legacy_rx_hash - set the hash value in the skb
>   * @ring: descriptor ring
>   * @rx_desc: specific descriptor
>   * @skb: skb currently being received and modified
>   * @rx_ptype: Rx packet type
> + *
> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
> + * descriptor writeback format.
>   **/
> -static void iavf_rx_hash(struct iavf_ring *ring,
> -			 union iavf_rx_desc *rx_desc,
> -			 struct sk_buff *skb,
> -			 u8 rx_ptype)
> +static void iavf_legacy_rx_hash(struct iavf_ring *ring,
> +				union iavf_rx_desc *rx_desc,

Const for both.

> +				struct sk_buff *skb, u8 rx_ptype)
>  {
> +	const __le64 rss_mask = cpu_to_le64(IAVF_RX_DESC_STATUS_FLTSTAT_MASK);
>  	struct libeth_rx_pt decoded;
>  	u32 hash;
> -	const __le64 rss_mask =
> -		cpu_to_le64((u64)IAVF_RX_DESC_FLTSTAT_RSS_HASH <<
> -			    IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT);

Looks like unrelated, but nice change anyway.

>  
>  	decoded = libie_rx_pt_parse(rx_ptype);
>  	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
> @@ -987,6 +1075,38 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>  	}
>  }
>  
> +/**
> + * iavf_flex_rx_hash - set the hash value in the skb
> + * @ring: descriptor ring
> + * @rx_desc: specific descriptor
> + * @skb: skb currently being received and modified
> + * @rx_ptype: Rx packet type
> + *
> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
> + * descriptor writeback format.
> + **/
> +static void iavf_flex_rx_hash(struct iavf_ring *ring,
> +			      union iavf_rx_desc *rx_desc,

Const.

> +			      struct sk_buff *skb, u16 rx_ptype)

Why is @rx_ptype u16 here, but u8 above? Just use u32 for both.

> +{
> +	struct libeth_rx_pt decoded;
> +	u16 status0;
> +	u32 hash;
> +
> +	if (!(ring->netdev->features & NETIF_F_RXHASH))

This is checked in libeth_rx_pt_has_hash(), why check 2 times?

> +		return;
> +
> +	decoded = libie_rx_pt_parse(rx_ptype);
> +	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
> +		return;
> +
> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M) {
> +		hash = le32_to_cpu(rx_desc->flex_wb.rss_hash);
> +		libeth_rx_pt_set_hash(skb, hash, decoded);
> +	}
> +}

Also, just parse rx_ptype once in process_skb_fields(), you don't need
to do that in each function.

> +
>  /**
>   * iavf_process_skb_fields - Populate skb header fields from Rx descriptor
>   * @rx_ring: rx descriptor ring packet is being transacted on
> @@ -998,14 +1118,17 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>   * order to populate the hash, checksum, VLAN, protocol, and
>   * other fields within the skb.
>   **/
> -static void
> -iavf_process_skb_fields(struct iavf_ring *rx_ring,
> -			union iavf_rx_desc *rx_desc, struct sk_buff *skb,
> -			u8 rx_ptype)
> +static void iavf_process_skb_fields(struct iavf_ring *rx_ring,
> +				    union iavf_rx_desc *rx_desc,

Const.

> +				    struct sk_buff *skb, u16 rx_ptype)
>  {
> -	iavf_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
> -
> -	iavf_rx_checksum(rx_ring->vsi, skb, rx_desc);
> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE) {

Any likely/unlikely() here? Or it's 50:50?

> +		iavf_legacy_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
> +		iavf_legacy_rx_csum(rx_ring->vsi, skb, rx_desc);
> +	} else {
> +		iavf_flex_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
> +		iavf_flex_rx_csum(rx_ring->vsi, skb, rx_desc);
> +	}
>  
>  	skb_record_rx_queue(skb, rx_ring->queue_index);
>  
> @@ -1092,7 +1215,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>  /**
>   * iavf_is_non_eop - process handling of non-EOP buffers
>   * @rx_ring: Rx ring being processed
> - * @rx_desc: Rx descriptor for current buffer
> + * @fields: Rx descriptor extracted fields
>   * @skb: Current socket buffer containing buffer in progress
>   *
>   * This function updates next to clean.  If the buffer is an EOP buffer
> @@ -1101,7 +1224,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>   * that this is in fact a non-EOP buffer.
>   **/
>  static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
> -			    union iavf_rx_desc *rx_desc,
> +			    struct iavf_rx_extracted *fields,

Pass value instead of pointer.

>  			    struct sk_buff *skb)

Is it still needed?

>  {
>  	u32 ntc = rx_ring->next_to_clean + 1;
> @@ -1113,8 +1236,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>  	prefetch(IAVF_RX_DESC(rx_ring, ntc));
>  
>  	/* if we are the last buffer then there is nothing else to do */
> -#define IAVF_RXD_EOF BIT(IAVF_RX_DESC_STATUS_EOF_SHIFT)
> -	if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
> +	if (likely(fields->end_of_packet))
>  		return false;
>  
>  	rx_ring->rx_stats.non_eop_descs++;
> @@ -1122,6 +1244,91 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>  	return true;
>  }
>  
> +/**
> + * iavf_extract_legacy_rx_fields - Extract fields from the Rx descriptor
> + * @rx_ring: rx descriptor ring
> + * @rx_desc: the descriptor to process
> + * @fields: storage for extracted values
> + *
> + * Decode the Rx descriptor and extract relevant information including the
> + * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
> + *
> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
> + * descriptor writeback format.
> + */
> +static void iavf_extract_legacy_rx_fields(struct iavf_ring *rx_ring,
> +					  union iavf_rx_desc *rx_desc,

Consts.

> +					  struct iavf_rx_extracted *fields)

Return a struct &iavf_rx_extracted instead of passing a pointer to it.

> +{
> +	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +
> +	fields->size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
> +	fields->rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
> +
> +	if (qword & IAVF_RX_DESC_STATUS_L2TAG1P_MASK &&
> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
> +		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
> +
> +	if (rx_desc->wb.qword2.ext_status &
> +	    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&

Bitops must have their own pairs of parentheses.

> +	    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
> +		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);
> +
> +	fields->end_of_packet = FIELD_GET(IAVF_RX_DESC_STATUS_EOF_MASK, qword);
> +	fields->rxe = FIELD_GET(BIT(IAVF_RXD_QW1_ERROR_SHIFT), qword);
> +}
> +
> +/**
> + * iavf_extract_flex_rx_fields - Extract fields from the Rx descriptor
> + * @rx_ring: rx descriptor ring
> + * @rx_desc: the descriptor to process
> + * @fields: storage for extracted values
> + *
> + * Decode the Rx descriptor and extract relevant information including the
> + * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
> + *
> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
> + * descriptor writeback format.
> + */
> +static void iavf_extract_flex_rx_fields(struct iavf_ring *rx_ring,
> +					union iavf_rx_desc *rx_desc,
> +					struct iavf_rx_extracted *fields)

Same for everything.

> +{
> +	u16 status0, status1, flexi_flags0;
> +
> +	fields->size = FIELD_GET(IAVF_RX_FLEX_DESC_PKT_LEN_M,
> +				 le16_to_cpu(rx_desc->flex_wb.pkt_len));

le16_get_bits().

> +
> +	flexi_flags0 = le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0);
> +
> +	fields->rx_ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M, flexi_flags0);

le16_get_bits() instead of these two?
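For reference, a userspace model of what le16_get_bits() folds
together: the byte-order conversion (le16_to_cpu()) plus the field
extraction (FIELD_GET()). This sketch assumes a little-endian host, so
the swap is a no-op; the mask values are illustrative:

```c
#include <assert.h>
#include <stdint.h>

static uint16_t le16_get_bits(uint16_t le_val, uint16_t mask)
{
	uint16_t cpu_val = le_val;	/* le16_to_cpu() on an LE host */

	/* FIELD_GET(): mask the field, shift it down to bit 0 */
	return (uint16_t)((cpu_val & mask) >> __builtin_ctz(mask));
}

#define FLEX_DESC_PTYPE_M	0x03FF	/* bits 9:0, illustrative */
#define FLEX_DESC_HI_FIELD_M	0x7C00	/* bits 14:10, illustrative */
```

In the kernel the masks must be compile-time constants for
le16_get_bits(), which is also why one cached le16_to_cpu() value can
be replaced by several direct le16_get_bits() calls on the raw
descriptor word.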

> +
> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M &&
> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)

Parentheses for bitops.

> +		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag1);
> +
> +	status1 = le16_to_cpu(rx_desc->flex_wb.status_error1);
> +	if (status1 & IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M &&
> +	    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
> +		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag2_2nd);

(same)

> +
> +	fields->end_of_packet = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT,
> +					  status0);
> +	fields->rxe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT,
> +				status0);
> +}
> +
> +static void iavf_extract_rx_fields(struct iavf_ring *rx_ring,
> +				   union iavf_rx_desc *rx_desc,
> +				   struct iavf_rx_extracted *fields)

Consts + return struct @fields directly.

> +{
> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE)

You check this several times, this could be combined and optimized.

> +		iavf_extract_legacy_rx_fields(rx_ring, rx_desc, fields);
> +	else
> +		iavf_extract_flex_rx_fields(rx_ring, rx_desc, fields);
> +}
> +
>  /**
>   * iavf_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
>   * @rx_ring: rx descriptor ring to transact packets on
> @@ -1142,12 +1349,9 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>  	bool failure = false;
>  
>  	while (likely(total_rx_packets < (unsigned int)budget)) {
> +		struct iavf_rx_extracted fields = {};
>  		struct libeth_fqe *rx_buffer;
>  		union iavf_rx_desc *rx_desc;
> -		unsigned int size;
> -		u16 vlan_tag = 0;
> -		u8 rx_ptype;
> -		u64 qword;
>  
>  		/* return some buffers to hardware, one at a time is too slow */
>  		if (cleaned_count >= IAVF_RX_BUFFER_WRITE) {
> @@ -1158,35 +1362,27 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>  
>  		rx_desc = IAVF_RX_DESC(rx_ring, rx_ring->next_to_clean);
>  
> -		/* status_error_len will always be zero for unused descriptors
> -		 * because it's cleared in cleanup, and overlaps with hdr_addr
> -		 * which is always zero because packet split isn't used, if the
> -		 * hardware wrote DD then the length will be non-zero
> -		 */
> -		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> -
>  		/* This memory barrier is needed to keep us from reading
>  		 * any other fields out of the rx_desc until we have
>  		 * verified the descriptor has been written back.
>  		 */
>  		dma_rmb();
> -#define IAVF_RXD_DD BIT(IAVF_RX_DESC_STATUS_DD_SHIFT)
> -		if (!iavf_test_staterr(rx_desc, IAVF_RXD_DD))
> +		if (!iavf_test_staterr(rx_desc, IAVF_RX_DESC_STATUS_DD_MASK))
>  			break;
>  
> -		size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
> +		iavf_extract_rx_fields(rx_ring, rx_desc, &fields);
>  
>  		iavf_trace(clean_rx_irq, rx_ring, rx_desc, skb);
>  
>  		rx_buffer = &rx_ring->rx_fqes[rx_ring->next_to_clean];
> -		if (!libeth_rx_sync_for_cpu(rx_buffer, size))
> +		if (!libeth_rx_sync_for_cpu(rx_buffer, fields.size))
>  			goto skip_data;
>  
>  		/* retrieve a buffer from the ring */
>  		if (skb)
> -			iavf_add_rx_frag(skb, rx_buffer, size);
> +			iavf_add_rx_frag(skb, rx_buffer, fields.size);
>  		else
> -			skb = iavf_build_skb(rx_buffer, size);
> +			skb = iavf_build_skb(rx_buffer, fields.size);
>  
>  		/* exit if we failed to retrieve a buffer */
>  		if (!skb) {
> @@ -1197,15 +1393,14 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>  skip_data:
>  		cleaned_count++;
>  
> -		if (iavf_is_non_eop(rx_ring, rx_desc, skb) || unlikely(!skb))
> +		if (iavf_is_non_eop(rx_ring, &fields, skb) || unlikely(!skb))
>  			continue;
>  
> -		/* ERR_MASK will only have valid bits if EOP set, and
> -		 * what we are doing here is actually checking
> -		 * IAVF_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
> -		 * the error field
> +		/* RXE field in descriptor is an indication of the MAC errors
> +		 * (like CRC, alignment, oversize etc). If it is set then iavf
> +		 * should finish.
>  		 */
> -		if (unlikely(iavf_test_staterr(rx_desc, BIT(IAVF_RXD_QW1_ERROR_SHIFT)))) {
> +		if (unlikely(fields.rxe)) {
>  			dev_kfree_skb_any(skb);
>  			skb = NULL;
>  			continue;
> @@ -1219,22 +1414,11 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>  		/* probably a little skewed due to removing CRC */
>  		total_rx_bytes += skb->len;
>  
> -		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> -		rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
> -
>  		/* populate checksum, VLAN, and protocol */
> -		iavf_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
> -
> -		if (qword & BIT(IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT) &&
> -		    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
> -		if (rx_desc->wb.qword2.ext_status &
> -		    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
> -		    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);

BTW I'm wondering whether filling the whole @fields can be less
optimized than accessing descriptor fields one by one like it was here
before.
I mean, in some cases you won't need all the fields from the extracted
struct, but they will be read and initialized anyway.

> +		iavf_process_skb_fields(rx_ring, rx_desc, skb, fields.rx_ptype);
>  
>  		iavf_trace(clean_rx_irq_rx, rx_ring, rx_desc, skb);
> -		iavf_receive_skb(rx_ring, skb, vlan_tag);
> +		iavf_receive_skb(rx_ring, skb, fields.vlan_tag);
>  		skb = NULL;
>  
>  		/* update budget accounting */
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
> index 17309d8625ac..3661cd57a068 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
> @@ -99,6 +99,14 @@ static inline bool iavf_test_staterr(union iavf_rx_desc *rx_desc,
>  		  cpu_to_le64(stat_err_bits));
>  }
>  
> +struct iavf_rx_extracted {
> +	unsigned int size;
> +	u16 vlan_tag;
> +	u16 rx_ptype;
> +	u8 end_of_packet:1;
> +	u8 rxe:1;
> +};

Also something libethish, but not sure for this one.

> +
>  /* How many Rx Buffers do we bundle into one write to the hardware ? */
>  #define IAVF_RX_INCREMENT(r, i) \
>  	do {					\
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
> index f6b09e57abce..82c16a720807 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_type.h
> +++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
> @@ -206,6 +206,45 @@ union iavf_16byte_rx_desc {
>  	} wb;  /* writeback */
>  };
>  
> +/* Rx Flex Descriptor NIC Profile
> + * RxDID Profile ID 2
> + * Flex-field 0: RSS hash lower 16-bits
> + * Flex-field 1: RSS hash upper 16-bits
> + * Flex-field 2: Flow ID lower 16-bits
> + * Flex-field 3: Flow ID higher 16-bits
> + * Flex-field 4: reserved, VLAN ID taken from L2Tag
> + */
> +struct iavf_32byte_rx_flex_wb {
> +	/* Qword 0 */
> +	u8 rxdid;
> +	u8 mir_id_umb_cast;
> +	__le16 ptype_flexi_flags0;
> +	__le16 pkt_len;
> +	__le16 hdr_len_sph_flex_flags1;
> +
> +	/* Qword 1 */
> +	__le16 status_error0;
> +	__le16 l2tag1;
> +	__le32 rss_hash;
> +
> +	/* Qword 2 */
> +	__le16 status_error1;
> +	u8 flexi_flags2;
> +	u8 ts_low;
> +	__le16 l2tag2_1st;
> +	__le16 l2tag2_2nd;
> +
> +	/* Qword 3 */
> +	__le32 flow_id;
> +	union {
> +		struct {
> +			__le16 rsvd;
> +			__le16 flow_id_ipv6;
> +		} flex;
> +		__le32 ts_high;
> +	} flex_ts;
> +};

I'm wondering whether HW descriptors can be defined as just a bunch of
u64 qwords instead of all those u8/__le16 etc. fields. That would be faster.
In case this would work differently on BE and LE, #ifdefs.

> +
>  union iavf_32byte_rx_desc {
>  	struct {
>  		__le64  pkt_addr; /* Packet buffer address */
> @@ -253,35 +292,34 @@ union iavf_32byte_rx_desc {
>  			} hi_dword;
>  		} qword3;
>  	} wb;  /* writeback */
> +	struct iavf_32byte_rx_flex_wb flex_wb;

So, already existing formats were described here directly, but flex is
declared as a field? Can this be more consistent?

>  };
>  
> -enum iavf_rx_desc_status_bits {
> -	/* Note: These are predefined bit offsets */
> -	IAVF_RX_DESC_STATUS_DD_SHIFT		= 0,
> -	IAVF_RX_DESC_STATUS_EOF_SHIFT		= 1,
> -	IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT	= 2,
> -	IAVF_RX_DESC_STATUS_L3L4P_SHIFT		= 3,
> -	IAVF_RX_DESC_STATUS_CRCP_SHIFT		= 4,
> -	IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT	= 5, /* 2 BITS */
> -	IAVF_RX_DESC_STATUS_TSYNVALID_SHIFT	= 7,
> -	/* Note: Bit 8 is reserved in X710 and XL710 */
> -	IAVF_RX_DESC_STATUS_EXT_UDP_0_SHIFT	= 8,
> -	IAVF_RX_DESC_STATUS_UMBCAST_SHIFT	= 9, /* 2 BITS */
> -	IAVF_RX_DESC_STATUS_FLM_SHIFT		= 11,
> -	IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT	= 12, /* 2 BITS */
> -	IAVF_RX_DESC_STATUS_LPBK_SHIFT		= 14,
> -	IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT	= 15,
> -	IAVF_RX_DESC_STATUS_RESERVED_SHIFT	= 16, /* 2 BITS */
> -	/* Note: For non-tunnel packets INT_UDP_0 is the right status for
> -	 * UDP header
> -	 */
> -	IAVF_RX_DESC_STATUS_INT_UDP_0_SHIFT	= 18,
> -	IAVF_RX_DESC_STATUS_LAST /* this entry must be last!!! */
> -};
> +/* Note: These are predefined bit offsets */
> +#define IAVF_RX_DESC_STATUS_DD_MASK		BIT(0)
> +#define IAVF_RX_DESC_STATUS_EOF_MASK		BIT(1)
> +#define IAVF_RX_DESC_STATUS_L2TAG1P_MASK	BIT(2)
> +#define IAVF_RX_DESC_STATUS_L3L4P_MASK		BIT(3)
> +#define IAVF_RX_DESC_STATUS_CRCP_MASK		BIT(4)
> +#define IAVF_RX_DESC_STATUS_TSYNINDX_MASK	GENMASK_ULL(6, 5)
> +#define IAVF_RX_DESC_STATUS_TSYNVALID_MASK	BIT(7)
> +/* Note: Bit 8 is reserved in X710 and XL710 */
> +#define IAVF_RX_DESC_STATUS_EXT_UDP_0_MASK	BIT(8)
> +#define IAVF_RX_DESC_STATUS_UMBCAST_MASK	GENMASK_ULL(10, 9)
> +#define IAVF_RX_DESC_STATUS_FLM_MASK		BIT(11)
> +#define IAVF_RX_DESC_STATUS_FLTSTAT_MASK	GENMASK_ULL(13, 12)
> +#define IAVF_RX_DESC_STATUS_LPBK_MASK		BIT(14)
> +#define IAVF_RX_DESC_STATUS_IPV6EXADD_MASK	BIT(15)
> +#define IAVF_RX_DESC_STATUS_RESERVED_MASK	GENMASK_ULL(17, 16)
> +/* Note: For non-tunnel packets INT_UDP_0 is the right status for
> + * UDP header
> + */
> +#define IAVF_RX_DESC_STATUS_INT_UDP_0_MASK	BIT(18)
> +
> +#define IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT	BIT(1)
> +#define IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT	BIT(10)
>  
> -#define IAVF_RXD_QW1_STATUS_SHIFT	0
> -#define IAVF_RXD_QW1_STATUS_MASK	((BIT(IAVF_RX_DESC_STATUS_LAST) - 1) \
> -					 << IAVF_RXD_QW1_STATUS_SHIFT)
> +#define IAVF_RXD_QW1_STATUS_MASK		(BIT(19) - 1)

GENMASK().
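For reference, the two spellings produce the same mask. A minimal standalone check, with `BIT()`/`GENMASK()` re-defined locally to mirror `include/linux/bits.h` (this compiles outside the kernel tree; the kernel's own definitions differ slightly in form but not in value):

```c
#include <assert.h>

/* Local stand-ins for the kernel's BIT()/GENMASK() from include/linux/bits.h,
 * re-defined here so the sketch builds outside the kernel tree. */
#define BITS_PER_LONG	(8 * (int)sizeof(unsigned long))
#define BIT(n)		(1UL << (n))
#define GENMASK(h, l) \
	(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* The open-coded mask from the patch and the GENMASK() spelling
 * the review asks for are identical: bits 18:0 set. */
#define IAVF_RXD_QW1_STATUS_MASK_OLD	(BIT(19) - 1)
#define IAVF_RXD_QW1_STATUS_MASK_NEW	GENMASK(18, 0)
```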

>  
>  #define IAVF_RXD_QW1_STATUS_TSYNINDX_SHIFT IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT
>  #define IAVF_RXD_QW1_STATUS_TSYNINDX_MASK  (0x3UL << \
> @@ -301,18 +339,16 @@ enum iavf_rx_desc_fltstat_values {
>  #define IAVF_RXD_QW1_ERROR_SHIFT	19
>  #define IAVF_RXD_QW1_ERROR_MASK		(0xFFUL << IAVF_RXD_QW1_ERROR_SHIFT)
>  
> -enum iavf_rx_desc_error_bits {
> -	/* Note: These are predefined bit offsets */
> -	IAVF_RX_DESC_ERROR_RXE_SHIFT		= 0,
> -	IAVF_RX_DESC_ERROR_RECIPE_SHIFT		= 1,
> -	IAVF_RX_DESC_ERROR_HBO_SHIFT		= 2,
> -	IAVF_RX_DESC_ERROR_L3L4E_SHIFT		= 3, /* 3 BITS */
> -	IAVF_RX_DESC_ERROR_IPE_SHIFT		= 3,
> -	IAVF_RX_DESC_ERROR_L4E_SHIFT		= 4,
> -	IAVF_RX_DESC_ERROR_EIPE_SHIFT		= 5,
> -	IAVF_RX_DESC_ERROR_OVERSIZE_SHIFT	= 6,
> -	IAVF_RX_DESC_ERROR_PPRS_SHIFT		= 7
> -};
> +/* Note: These are predefined bit offsets */
> +#define IAVF_RX_DESC_ERROR_RXE_MASK		BIT(0)
> +#define IAVF_RX_DESC_ERROR_RECIPE_MASK		BIT(1)
> +#define IAVF_RX_DESC_ERROR_HBO_MASK		BIT(2)
> +#define IAVF_RX_DESC_ERROR_L3L4E_MASK		GENMASK_ULL(5, 3)
> +#define IAVF_RX_DESC_ERROR_IPE_MASK		BIT(3)
> +#define IAVF_RX_DESC_ERROR_L4E_MASK		BIT(4)
> +#define IAVF_RX_DESC_ERROR_EIPE_MASK		BIT(5)
> +#define IAVF_RX_DESC_ERROR_OVERSIZE_MASK	BIT(6)
> +#define IAVF_RX_DESC_ERROR_PPRS_MASK		BIT(7)
>  
>  enum iavf_rx_desc_error_l3l4e_fcoe_masks {
>  	IAVF_RX_DESC_ERROR_L3L4E_NONE		= 0,
> @@ -325,6 +361,41 @@ enum iavf_rx_desc_error_l3l4e_fcoe_masks {
>  #define IAVF_RXD_QW1_PTYPE_SHIFT	30
>  #define IAVF_RXD_QW1_PTYPE_MASK		(0xFFULL << IAVF_RXD_QW1_PTYPE_SHIFT)
>  
> +/* for iavf_32byte_rx_flex_wb.ptype_flexi_flags0 member */
> +#define IAVF_RX_FLEX_DESC_PTYPE_M      (0x3FF) /* 10-bits */

Redundant braces + GENMASK()

> +
> +/* for iavf_32byte_rx_flex_wb.pkt_length member */
> +#define IAVF_RX_FLEX_DESC_PKT_LEN_M    (0x3FFF) /* 14-bits */

Same.

> +
> +/* Note: These are predefined bit offsets */
> +#define IAVF_RX_FLEX_DESC_STATUS0_DD_M			BIT(0)
> +#define IAVF_RX_FLEX_DESC_STATUS0_EOF_M			BIT(1)
> +#define IAVF_RX_FLEX_DESC_STATUS0_HBO_M			BIT(2)
> +#define IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M		BIT(3)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M		BIT(4)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M		BIT(5)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M		BIT(6)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M		BIT(7)
> +#define IAVF_RX_FLEX_DESC_STATUS0_LPBK_M		BIT(8)
> +#define IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M		BIT(9)
> +#define IAVF_RX_FLEX_DESC_STATUS0_RXE_M			BIT(10)
> +#define IAVF_RX_FLEX_DESC_STATUS0_CRCP_			BIT(11)
> +#define IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M		BIT(12)
> +#define IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M		BIT(13)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD0_VALID_M	BIT(14)
> +#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD1_VALID_M	BIT(15)
> +
> +/* Note: These are predefined bit offsets */
> +#define IAVF_RX_FLEX_DESC_STATUS1_CPM_M			(0xFULL) /* 4 bits */

Redundant braces.
+ GENMASK_ULL(7, 0)?

> +#define IAVF_RX_FLEX_DESC_STATUS1_NAT_M			BIT(4)
> +#define IAVF_RX_FLEX_DESC_STATUS1_CRYPTO_M		BIT(5)
> +/* [10:6] reserved */
> +#define IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M		BIT(11)
> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD2_VALID_M	BIT(12)
> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD3_VALID_M	BIT(13)
> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD4_VALID_M	BIT(14)
> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD5_VALID_M	BIT(15)
> +
>  #define IAVF_RXD_QW1_LENGTH_PBUF_SHIFT	38
>  #define IAVF_RXD_QW1_LENGTH_PBUF_MASK	(0x3FFFULL << \
>  					 IAVF_RXD_QW1_LENGTH_PBUF_SHIFT)
> diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> index 2693c3ad0830..5cbb375b7063 100644
> --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
> @@ -402,6 +402,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>  	int pairs = adapter->num_active_queues;
>  	struct virtchnl_queue_pair_info *vqpi;
>  	u32 i, max_frame;
> +	u8 rx_flags = 0;
>  	size_t len;
>  
>  	max_frame = LIBIE_MAX_RX_FRM_LEN(adapter->rx_rings->pp->p.offset);
> @@ -419,6 +420,9 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>  	if (!vqci)
>  		return;
>  
> +	if (iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP))
> +		rx_flags |= VIRTCHNL_PTP_RX_TSTAMP;
> +
>  	vqci->vsi_id = adapter->vsi_res->vsi_id;
>  	vqci->num_queue_pairs = pairs;
>  	vqpi = vqci->qpair;
> @@ -441,6 +445,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>  		if (CRC_OFFLOAD_ALLOWED(adapter))
>  			vqpi->rxq.crc_disable = !!(adapter->netdev->features &
>  						   NETIF_F_RXFCS);
> +		vqpi->rxq.flags = rx_flags;
>  		vqpi++;
>  	}

Thanks,
Olek


* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-11 11:47   ` Alexander Lobakin
@ 2024-06-11 20:52     ` Jacob Keller
  2024-06-12 11:51       ` Przemek Kitszel
  2024-06-12 12:33       ` Alexander Lobakin
  0 siblings, 2 replies; 34+ messages in thread
From: Jacob Keller @ 2024-06-11 20:52 UTC (permalink / raw)
  To: Alexander Lobakin, Mateusz Polchlopek
  Cc: intel-wired-lan, netdev, Wojciech Drewek



On 6/11/2024 4:47 AM, Alexander Lobakin wrote:
> From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
> Date: Tue,  4 Jun 2024 09:13:57 -0400
> 
>> From: Jacob Keller <jacob.e.keller@intel.com>
>>
>> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
>> negotiating to enable the advanced flexible descriptor layout. Add the
>> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.
>>
>> Also add bit position definitions for the status and error indications
>> that are needed.
>>
>> The iavf_clean_rx_irq function needs to extract a few fields from the Rx
>> descriptor, including the size, rx_ptype, and vlan_tag.
>> Move the extraction to a separate function that decodes the fields into
>> a structure. This will reduce the burden for handling multiple
>> descriptor types by keeping the relevant extraction logic in one place.
>>
>> To support handling an additional descriptor format with minimal code
>> duplication, refactor Rx checksum handling so that the general logic
>> is separated from the bit calculations. Introduce an iavf_rx_desc_decoded
>> structure which holds the relevant bits decoded from the Rx descriptor.
>> This will enable implementing flexible descriptor handling without
>> duplicating the general logic twice.
>>
>> Introduce an iavf_extract_flex_rx_fields, iavf_flex_rx_hash, and
>> iavf_flex_rx_csum functions which operate on the flexible NIC descriptor
>> format instead of the legacy 32 byte format. Based on the negotiated
>> RXDID, select the correct function for processing the Rx descriptors.
>>
>> With this change, the Rx hot path should be functional when using either
>> the default legacy 32byte format or when we switch to the flexible NIC
>> layout.
>>
>> Modify the Rx hot path to add support for the flexible descriptor
>> format and add request enabling Rx timestamps for all queues.
>>
>> As in ice, make sure we bump the checksum level if the hardware detected
>> a packet type which could have an outer checksum. This is important
>> because hardware only verifies the inner checksum.
>>
>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>> ---
>>  drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 354 +++++++++++++-----
>>  drivers/net/ethernet/intel/iavf/iavf_txrx.h   |   8 +
>>  drivers/net/ethernet/intel/iavf/iavf_type.h   | 147 ++++++--
>>  .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   5 +
>>  4 files changed, 391 insertions(+), 123 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>> index 26b424fd6718..97da5af52ad7 100644
>> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>> @@ -895,63 +895,68 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
>>  	return true;
>>  }
>>  
>> +/* iavf_rx_csum_decoded
>> + *
>> + * Checksum offload bits decoded from the receive descriptor.
>> + */
>> +struct iavf_rx_csum_decoded {
>> +	u8 l3l4p : 1;
>> +	u8 ipe : 1;
>> +	u8 eipe : 1;
>> +	u8 eudpe : 1;
>> +	u8 ipv6exadd : 1;
>> +	u8 l4e : 1;
>> +	u8 pprs : 1;
>> +	u8 nat : 1;
>> +};
> 
> I see the same struct in idpf, probably a candidate for libeth.
> 

Makes sense.

>> +
>>  /**
>> - * iavf_rx_checksum - Indicate in skb if hw indicated a good cksum
>> + * iavf_rx_csum - Indicate in skb if hw indicated a good checksum
>>   * @vsi: the VSI we care about
>>   * @skb: skb currently being received and modified
>> - * @rx_desc: the receive descriptor
>> + * @ptype: decoded ptype information
>> + * @csum_bits: decoded Rx descriptor information
>>   **/
>> -static void iavf_rx_checksum(struct iavf_vsi *vsi,
>> -			     struct sk_buff *skb,
>> -			     union iavf_rx_desc *rx_desc)
>> +static void iavf_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
> 
> Can't @vsi be const?
> 
>> +			 struct libeth_rx_pt *ptype,
> 
> struct libeth_rx_pt is smaller than a pointer to it. Pass it directly
> 
>> +			 struct iavf_rx_csum_decoded *csum_bits)
> 
> Same for this struct.
> 
>>  {
>> -	struct libeth_rx_pt decoded;
>> -	u32 rx_error, rx_status;
>>  	bool ipv4, ipv6;
>> -	u8 ptype;
>> -	u64 qword;
>>  
>>  	skb->ip_summed = CHECKSUM_NONE;
>>  
>> -	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>> -	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>> -
>> -	decoded = libie_rx_pt_parse(ptype);
>> -	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>> -		return;
>> -
>> -	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
>> -	rx_status = FIELD_GET(IAVF_RXD_QW1_STATUS_MASK, qword);
>> -
>>  	/* did the hardware decode the packet and checksum? */
>> -	if (!(rx_status & BIT(IAVF_RX_DESC_STATUS_L3L4P_SHIFT)))
>> +	if (!csum_bits->l3l4p)
>>  		return;
>>  
>> -	ipv4 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV4;
>> -	ipv6 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV6;
>> +	ipv4 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV4;
>> +	ipv6 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV6;
>>  
>> -	if (ipv4 &&
>> -	    (rx_error & (BIT(IAVF_RX_DESC_ERROR_IPE_SHIFT) |
>> -			 BIT(IAVF_RX_DESC_ERROR_EIPE_SHIFT))))
>> +	if (ipv4 && (csum_bits->ipe || csum_bits->eipe))
>>  		goto checksum_fail;
>>  
>>  	/* likely incorrect csum if alternate IP extension headers found */
>> -	if (ipv6 &&
>> -	    rx_status & BIT(IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT))
>> -		/* don't increment checksum err here, non-fatal err */
>> +	if (ipv6 && csum_bits->ipv6exadd)
>>  		return;
>>  
>>  	/* there was some L4 error, count error and punt packet to the stack */
>> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_L4E_SHIFT))
>> +	if (csum_bits->l4e)
>>  		goto checksum_fail;
>>  
>>  	/* handle packets that were not able to be checksummed due
>>  	 * to arrival speed, in this case the stack can compute
>>  	 * the csum.
>>  	 */
>> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_PPRS_SHIFT))
>> +	if (csum_bits->pprs)
>>  		return;
>>  
>> +	/* If there is an outer header present that might contain a checksum
>> +	 * we need to bump the checksum level by 1 to reflect the fact that
>> +	 * we are indicating we validated the inner checksum.
>> +	 */
>> +	if (ptype->tunnel_type >= LIBETH_RX_PT_TUNNEL_IP_GRENAT)
>> +		skb->csum_level = 1;
>> +
>>  	skb->ip_summed = CHECKSUM_UNNECESSARY;
>>  	return;
> 
> For the whole function: you need to use unlikely() for checksum errors
> to not slow down regular frames.
> Also, I would even unlikely() packets with not verified checksum as it's
> really rare case.
> Optimize hotpath for most common workloads.
> 

Makes sense.
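A hedged sketch of the branch-annotation shape the review suggests, with `unlikely()` stood in locally via `__builtin_expect()` (field names follow `iavf_rx_csum_decoded` from the patch; the real function also distinguishes counted failures from silent returns, which this boolean sketch collapses):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's branch hint from <linux/compiler.h>. */
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Annotate the rare error paths so the compiler lays out the
 * no-error fast path first. */
struct csum_bits {
	bool l3l4p, ipe, eipe, l4e, ipv6exadd;
};

static bool rx_csum_ok(const struct csum_bits *b, bool ipv4, bool ipv6)
{
	if (unlikely(!b->l3l4p))
		return false;	/* HW did not validate the packet */
	if (unlikely(ipv4 && (b->ipe || b->eipe)))
		return false;	/* rare: IP checksum error */
	if (unlikely(ipv6 && b->ipv6exadd))
		return false;	/* rare: alternate IPv6 ext headers */
	if (unlikely(b->l4e))
		return false;	/* rare: L4 checksum error */
	return true;
}
```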

>>  
>> @@ -960,22 +965,105 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
>>  }
>>  
>>  /**
>> - * iavf_rx_hash - set the hash value in the skb
>> + * iavf_legacy_rx_csum - Indicate in skb if hw indicated a good cksum
>> + * @vsi: the VSI we care about
>> + * @skb: skb currently being received and modified
>> + * @rx_desc: the receive descriptor
>> + *
>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>> + * descriptor writeback format.
>> + **/
>> +static void iavf_legacy_rx_csum(struct iavf_vsi *vsi,
>> +				struct sk_buff *skb,
>> +				union iavf_rx_desc *rx_desc)
> 
> @vsi and @rx_desc can be const.
> 
>> +{
>> +	struct iavf_rx_csum_decoded csum_bits;
>> +	struct libeth_rx_pt decoded;
>> +
>> +	u32 rx_error;
>> +	u64 qword;
>> +	u16 ptype;
>> +
>> +	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>> +	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>> +	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
> 
> You don't need @rx_error before libeth_rx_pt_has_checksum().
> 
>> +	decoded = libie_rx_pt_parse(ptype);
>> +
>> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>> +		return;
>> +
>> +	csum_bits.ipe = FIELD_GET(IAVF_RX_DESC_ERROR_IPE_MASK, rx_error);
> 
> So, @rx_error is a field of @qword and then there are more subfields?
> Why not extract those fields directly from @qword?
> 

Yeah, that would be better. It's probably just because of the pre-existing
defines. Should be simple to update.

>> +	csum_bits.eipe = FIELD_GET(IAVF_RX_DESC_ERROR_EIPE_MASK, rx_error);
>> +	csum_bits.l4e = FIELD_GET(IAVF_RX_DESC_ERROR_L4E_MASK, rx_error);
>> +	csum_bits.pprs = FIELD_GET(IAVF_RX_DESC_ERROR_PPRS_MASK, rx_error);
>> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_DESC_STATUS_L3L4P_MASK, rx_error);
>> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_DESC_STATUS_IPV6EXADD_MASK,
>> +					rx_error);
>> +	csum_bits.nat = 0;
>> +	csum_bits.eudpe = 0;
> 
> Initialize the whole struct with = { } at the declaration site and
> remove this.
> 
>> +
>> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
> 
> In order to avoid having 2 call sites for this, make
> iavf_{flex,legacy}_rx_csum() return @csum_bits and call iavf_rx_csum()
> outside of them once.
> 

Good idea.
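A sketch of the suggested refactor: the format-specific helper returns the decoded bits by value (zero-initialized with `= { }`, as the review also asks), and the common `iavf_rx_csum()` gets one call site in the caller. Names mirror the patch; the bit positions below come from `IAVF_RXD_QW1_ERROR_SHIFT` (19) plus the per-bit error shifts, and everything else is a placeholder:

```c
#include <assert.h>
#include <stdint.h>

struct iavf_rx_csum_decoded {
	uint8_t l3l4p : 1, ipe : 1, eipe : 1, eudpe : 1;
	uint8_t ipv6exadd : 1, l4e : 1, pprs : 1, nat : 1;
};

/* Decode only; the shared iavf_rx_csum() would be called once by the
 * caller with the returned struct. */
static struct iavf_rx_csum_decoded iavf_legacy_rx_csum_bits(uint64_t qword)
{
	struct iavf_rx_csum_decoded bits = { };	/* zero-init at declaration */

	bits.ipe = !!(qword & (1ULL << (19 + 3)));	/* ERROR_IPE */
	bits.l4e = !!(qword & (1ULL << (19 + 4)));	/* ERROR_L4E */
	return bits;
}
```

The caller then becomes `csum_bits = iavf_legacy_rx_csum_bits(qword);` followed by a single `iavf_rx_csum(...)` call shared with the flex path.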

>> +}
>> +
>> +/**
>> + * iavf_flex_rx_csum - Indicate in skb if hw indicated a good cksum
>> + * @vsi: the VSI we care about
>> + * @skb: skb currently being received and modified
>> + * @rx_desc: the receive descriptor
>> + *
>> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
>> + * descriptor writeback format.
>> + **/
>> +static void iavf_flex_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
>> +			      union iavf_rx_desc *rx_desc)
> 
> Same for const.
> 
>> +{
>> +	struct iavf_rx_csum_decoded csum_bits;
>> +	struct libeth_rx_pt decoded;
>> +	u16 rx_status0, ptype;
>> +
>> +	rx_status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
> 
> This is not needed before libeth_rx_pt_has_checksum().
> 
>> +	ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M,
>> +			  le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0));
> 
> You also access this field later when extracting base fields. Shouldn't
> this be combined somehow?
> 
>> +	decoded = libie_rx_pt_parse(ptype);
>> +
>> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>> +		return;
>> +
>> +	csum_bits.ipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M,
>> +				  rx_status0);
>> +	csum_bits.eipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M,
>> +				   rx_status0);
>> +	csum_bits.l4e = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M,
>> +				  rx_status0);
>> +	csum_bits.eudpe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M,
>> +				    rx_status0);
>> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M,
>> +				    rx_status0);
>> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M,
>> +					rx_status0);
>> +	csum_bits.nat = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS1_NAT_M, rx_status0);
>> +	csum_bits.pprs = 0;
> 
> See above for struct initialization.
> 
>> +
>> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
> 
> See above.
> 
>> +}
>> +
>> +/**
>> + * iavf_legacy_rx_hash - set the hash value in the skb
>>   * @ring: descriptor ring
>>   * @rx_desc: specific descriptor
>>   * @skb: skb currently being received and modified
>>   * @rx_ptype: Rx packet type
>> + *
>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>> + * descriptor writeback format.
>>   **/
>> -static void iavf_rx_hash(struct iavf_ring *ring,
>> -			 union iavf_rx_desc *rx_desc,
>> -			 struct sk_buff *skb,
>> -			 u8 rx_ptype)
>> +static void iavf_legacy_rx_hash(struct iavf_ring *ring,
>> +				union iavf_rx_desc *rx_desc,
> 
> Const for both.
> 
>> +				struct sk_buff *skb, u8 rx_ptype)
>>  {
>> +	const __le64 rss_mask = cpu_to_le64(IAVF_RX_DESC_STATUS_FLTSTAT_MASK);
>>  	struct libeth_rx_pt decoded;
>>  	u32 hash;
>> -	const __le64 rss_mask =
>> -		cpu_to_le64((u64)IAVF_RX_DESC_FLTSTAT_RSS_HASH <<
>> -			    IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT);
> 
> Looks like unrelated, but nice change anyway.
> 
>>  
>>  	decoded = libie_rx_pt_parse(rx_ptype);
>>  	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
>> @@ -987,6 +1075,38 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>>  	}
>>  }
>>  
>> +/**
>> + * iavf_flex_rx_hash - set the hash value in the skb
>> + * @ring: descriptor ring
>> + * @rx_desc: specific descriptor
>> + * @skb: skb currently being received and modified
>> + * @rx_ptype: Rx packet type
>> + *
>> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
>> + * descriptor writeback format.
>> + **/
>> +static void iavf_flex_rx_hash(struct iavf_ring *ring,
>> +			      union iavf_rx_desc *rx_desc,
> 
> Const.
> 
>> +			      struct sk_buff *skb, u16 rx_ptype)
> 
> Why is @rx_ptype u16 here, but u8 above? Just use u32 for both.
> 
>> +{
>> +	struct libeth_rx_pt decoded;
>> +	u16 status0;
>> +	u32 hash;
>> +
>> +	if (!(ring->netdev->features & NETIF_F_RXHASH))
> 
> This is checked in libeth_rx_pt_has_hash(), why check 2 times?
> 

I think libeth_rx_pt_has_hash() was added later, so this patch is
re-introducing the check by accident when porting to upstream.

>> +		return;
>> +
>> +	decoded = libie_rx_pt_parse(rx_ptype);
>> +	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
>> +		return;
>> +
>> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
>> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M) {
>> +		hash = le32_to_cpu(rx_desc->flex_wb.rss_hash);
>> +		libeth_rx_pt_set_hash(skb, hash, decoded);
>> +	}
>> +}
> 
> Also, just parse rx_ptype once in process_skb_fields(), you don't need
> to do that in each function.
> 
>> +
>>  /**
>>   * iavf_process_skb_fields - Populate skb header fields from Rx descriptor
>>   * @rx_ring: rx descriptor ring packet is being transacted on
>> @@ -998,14 +1118,17 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>>   * order to populate the hash, checksum, VLAN, protocol, and
>>   * other fields within the skb.
>>   **/
>> -static void
>> -iavf_process_skb_fields(struct iavf_ring *rx_ring,
>> -			union iavf_rx_desc *rx_desc, struct sk_buff *skb,
>> -			u8 rx_ptype)
>> +static void iavf_process_skb_fields(struct iavf_ring *rx_ring,
>> +				    union iavf_rx_desc *rx_desc,
> 
> Const.
> 
>> +				    struct sk_buff *skb, u16 rx_ptype)
>>  {
>> -	iavf_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>> -
>> -	iavf_rx_checksum(rx_ring->vsi, skb, rx_desc);
>> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE) {
> 
> Any likely/unlikely() here? Or it's 50:50?
> 

Strictly speaking, the likely branch is whichever format the software
configured during the slow init path. That's not a compile-time-known
value, so we can't really use it to optimize this flow.

I don't know which is more common. The pre-existing descriptor format is
likely supported on more PFs currently, but I think over time we may see
more support for the flex descriptors, and that might end up being the default.

>> +		iavf_legacy_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>> +		iavf_legacy_rx_csum(rx_ring->vsi, skb, rx_desc);
>> +	} else {
>> +		iavf_flex_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>> +		iavf_flex_rx_csum(rx_ring->vsi, skb, rx_desc);
>> +	}
>>  
>>  	skb_record_rx_queue(skb, rx_ring->queue_index);
>>  
>> @@ -1092,7 +1215,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>>  /**
>>   * iavf_is_non_eop - process handling of non-EOP buffers
>>   * @rx_ring: Rx ring being processed
>> - * @rx_desc: Rx descriptor for current buffer
>> + * @fields: Rx descriptor extracted fields
>>   * @skb: Current socket buffer containing buffer in progress
>>   *
>>   * This function updates next to clean.  If the buffer is an EOP buffer
>> @@ -1101,7 +1224,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>>   * that this is in fact a non-EOP buffer.
>>   **/
>>  static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>> -			    union iavf_rx_desc *rx_desc,
>> +			    struct iavf_rx_extracted *fields,
> 
> Pass value instead of pointer.
> 
>>  			    struct sk_buff *skb)
> 
> Is it still needed?
> 
>>  {
>>  	u32 ntc = rx_ring->next_to_clean + 1;
>> @@ -1113,8 +1236,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>>  	prefetch(IAVF_RX_DESC(rx_ring, ntc));
>>  
>>  	/* if we are the last buffer then there is nothing else to do */
>> -#define IAVF_RXD_EOF BIT(IAVF_RX_DESC_STATUS_EOF_SHIFT)
>> -	if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
>> +	if (likely(fields->end_of_packet))
>>  		return false;
>>  
>>  	rx_ring->rx_stats.non_eop_descs++;
>> @@ -1122,6 +1244,91 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>>  	return true;
>>  }
>>  
>> +/**
>> + * iavf_extract_legacy_rx_fields - Extract fields from the Rx descriptor
>> + * @rx_ring: rx descriptor ring
>> + * @rx_desc: the descriptor to process
>> + * @fields: storage for extracted values
>> + *
>> + * Decode the Rx descriptor and extract relevant information including the
>> + * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
>> + *
>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>> + * descriptor writeback format.
>> + */
>> +static void iavf_extract_legacy_rx_fields(struct iavf_ring *rx_ring,
>> +					  union iavf_rx_desc *rx_desc,
> 
> Consts.
> 
>> +					  struct iavf_rx_extracted *fields)
> 
> Return a struct &iavf_rx_extracted instead of passing a pointer to it.
> 
>> +{
>> +	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>> +
>> +	fields->size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
>> +	fields->rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>> +
>> +	if (qword & IAVF_RX_DESC_STATUS_L2TAG1P_MASK &&
>> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
>> +		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
>> +
>> +	if (rx_desc->wb.qword2.ext_status &
>> +	    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
> 
> Bitops must have own pairs of braces.

I don't understand what this comment is asking for. Braces like { }? Or
adding parentheses around the bit ops?
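An editorial guess at the reviewer's intent (a hedged reading, not confirmed here): kernel style wants each bitwise sub-expression wrapped in its own parentheses before being combined with `&&`. A minimal illustration with hypothetical mask names:

```c
#include <assert.h>
#include <stdint.h>

#define L2TAG1P_M	(1u << 13)	/* hypothetical status bit */
#define VLAN_LOC_L2TAG1	(1u << 0)	/* hypothetical ring flag */

/* Each bitwise test gets its own pair of parentheses: */
static int has_vlan_in_l2tag1(uint16_t status0, uint32_t ring_flags)
{
	return (status0 & L2TAG1P_M) && (ring_flags & VLAN_LOC_L2TAG1);
}
```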


>> +
>> +	flexi_flags0 = le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0);
>> +
>> +	fields->rx_ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M, flexi_flags0);
> 
> le16_get_bits() instead of these two?

Neat! I wasn't aware of this.
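For readers unfamiliar with it: `le16_get_bits()` from `<linux/bitfield.h>` fuses `le16_to_cpu()` with `FIELD_GET()` in a single call. A local model of its semantics (assuming a little-endian host, where the byte swap is the identity; the kernel version additionally requires the mask to be a compile-time constant):

```c
#include <assert.h>
#include <stdint.h>

/* Local model of le16_get_bits(): byte-swap (identity on LE hosts)
 * plus mask-and-shift in one step. */
static inline uint16_t le16_get_bits(uint16_t le_val, uint16_t field_mask)
{
	unsigned int shift = (unsigned int)__builtin_ctz(field_mask);

	return (uint16_t)((le_val & field_mask) >> shift);
}

#define IAVF_RX_FLEX_DESC_PTYPE_M	0x3FF	/* ptype, bits 9:0 */
```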

>> +
>> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
>> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M &&
>> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
> 
> Braces for bitops.
> 
>> +		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag1);
>> +
>> +	status1 = le16_to_cpu(rx_desc->flex_wb.status_error1);
>> +	if (status1 & IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M &&
>> +	    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
>> +		fields->vlan_tag = le16_to_cpu(rx_desc->flex_wb.l2tag2_2nd);
> 
> (same)
> 
>> +
>> +	fields->end_of_packet = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT,
>> +					  status0);
>> +	fields->rxe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT,
>> +				status0);
>> +}
>> +
>> +static void iavf_extract_rx_fields(struct iavf_ring *rx_ring,
>> +				   union iavf_rx_desc *rx_desc,
>> +				   struct iavf_rx_extracted *fields)
> 
> Consts + return struct @fields directly.
> 
>> +{
>> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE)
> 
> You check this several times, this could be combined and optimized.
> 

Yeah, I wasn't sure of the best way to optimize this while also trying
to avoid duplicating code. Ideally we want to check it once and then go
through the correct sequence (calling the legacy or flex function
versions). But I also didn't want to duplicate all of the common code
between each flex or legacy call site.
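One possible shape for checking rxdid only once, sketched below: resolve the format-specific helpers at ring-configuration time and store callbacks. Entirely hypothetical names and simplified field extraction (legacy ptype at QW1 bits 37:30 per `IAVF_RXD_QW1_PTYPE_MASK`; flex ptype in bits 25:16 of qword 0 per the struct layout); the posted patch branches per packet instead:

```c
#include <assert.h>
#include <stdint.h>

struct rx_fields { uint16_t rx_ptype; };

typedef struct rx_fields (*extract_fn)(uint64_t qword);

static struct rx_fields extract_legacy(uint64_t qword)
{
	return (struct rx_fields){ .rx_ptype = (qword >> 30) & 0xFF };
}

static struct rx_fields extract_flex(uint64_t qword0)
{
	return (struct rx_fields){ .rx_ptype = (qword0 >> 16) & 0x3FF };
}

/* Called once from the slow config path; the hot loop then uses the
 * stored callback without re-testing rxdid per descriptor. */
static extract_fn pick_extract(int rxdid)
{
	return rxdid == 1 ? extract_legacy : extract_flex; /* RXDID 1 = base */
}
```

The tradeoff is an indirect call per packet versus a (well-predicted) branch; either way the common code around the extraction stays single-copy.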

>> @@ -1219,22 +1414,11 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>>  		/* probably a little skewed due to removing CRC */
>>  		total_rx_bytes += skb->len;
>>  
>> -		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>> -		rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>> -
>>  		/* populate checksum, VLAN, and protocol */
>> -		iavf_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
>> -
>> -		if (qword & BIT(IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT) &&
>> -		    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
>> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
>> -		if (rx_desc->wb.qword2.ext_status &
>> -		    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
>> -		    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
>> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);
> 
> BTW I'm wondering whether filling the whole @fields can be less
> optimized than accessing descriptor fields one by one like it was here
> before.
> I mean, in some cases you won't need all the fields from the extracted
> struct, but they will be read and initialized anyway.


Yes. I was more focused on "what makes this readable" because I didn't
want to end up having two near-identical copies of iavf_clean_rx_irq
that just used different bit offsets. But then it became tricky to
figure out how to do it in a good way. :/

> 
>> +		iavf_process_skb_fields(rx_ring, rx_desc, skb, fields.rx_ptype);
>>  
>>  		iavf_trace(clean_rx_irq_rx, rx_ring, rx_desc, skb);
>> -		iavf_receive_skb(rx_ring, skb, vlan_tag);
>> +		iavf_receive_skb(rx_ring, skb, fields.vlan_tag);
>>  		skb = NULL;
>>  
>>  		/* update budget accounting */
>> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
>> index 17309d8625ac..3661cd57a068 100644
>> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
>> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
>> @@ -99,6 +99,14 @@ static inline bool iavf_test_staterr(union iavf_rx_desc *rx_desc,
>>  		  cpu_to_le64(stat_err_bits));
>>  }
>>  
>> +struct iavf_rx_extracted {
>> +	unsigned int size;
>> +	u16 vlan_tag;
>> +	u16 rx_ptype;
>> +	u8 end_of_packet:1;
>> +	u8 rxe:1;
>> +};
> 
> Also something libethish, but not sure for this one.
> 
>> +
>>  /* How many Rx Buffers do we bundle into one write to the hardware ? */
>>  #define IAVF_RX_INCREMENT(r, i) \
>>  	do {					\
>> diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
>> index f6b09e57abce..82c16a720807 100644
>> --- a/drivers/net/ethernet/intel/iavf/iavf_type.h
>> +++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
>> @@ -206,6 +206,45 @@ union iavf_16byte_rx_desc {
>>  	} wb;  /* writeback */
>>  };
>>  
>> +/* Rx Flex Descriptor NIC Profile
>> + * RxDID Profile ID 2
>> + * Flex-field 0: RSS hash lower 16-bits
>> + * Flex-field 1: RSS hash upper 16-bits
>> + * Flex-field 2: Flow ID lower 16-bits
>> + * Flex-field 3: Flow ID higher 16-bits
>> + * Flex-field 4: reserved, VLAN ID taken from L2Tag
>> + */
>> +struct iavf_32byte_rx_flex_wb {
>> +	/* Qword 0 */
>> +	u8 rxdid;
>> +	u8 mir_id_umb_cast;
>> +	__le16 ptype_flexi_flags0;
>> +	__le16 pkt_len;
>> +	__le16 hdr_len_sph_flex_flags1;
>> +
>> +	/* Qword 1 */
>> +	__le16 status_error0;
>> +	__le16 l2tag1;
>> +	__le32 rss_hash;
>> +
>> +	/* Qword 2 */
>> +	__le16 status_error1;
>> +	u8 flexi_flags2;
>> +	u8 ts_low;
>> +	__le16 l2tag2_1st;
>> +	__le16 l2tag2_2nd;
>> +
>> +	/* Qword 3 */
>> +	__le32 flow_id;
>> +	union {
>> +		struct {
>> +			__le16 rsvd;
>> +			__le16 flow_id_ipv6;
>> +		} flex;
>> +		__le32 ts_high;
>> +	} flex_ts;
>> +};
> 
> I'm wondering whether HW descriptors can be defined as just a bunch of
> u64 qwords instead of all those u8/__le16 etc. fields. That would be faster.
> In case this would work differently on BE and LE, #ifdefs.
> 

We could define them as __le64 qwords, for sure.
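A hedged userspace sketch of what the qword-based definition could look like (GENMASK()/FIELD_GET() are mimicked here for contiguous masks; the kernel version would use the real <linux/bits.h> and <linux/bitfield.h> macros on __le64 values with le64_to_cpu()):

```c
#include <stdint.h>

/* Userspace stand-ins for GENMASK_ULL()/FIELD_GET(); valid only for
 * contiguous masks, which is all that descriptor fields need.
 * (m) & ~((m) << 1) isolates the lowest set bit of the mask, so the
 * division performs the right shift. */
#define GENMASK_U64(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define FIELD_GET_U64(m, v)	(((v) & (m)) / ((m) & ~((m) << 1)))

/* The whole 32-byte flex writeback as four qwords */
struct rx_flex_wb {
	uint64_t qw[4];		/* __le64 in the kernel */
};

/* Flex NIC (RXDID 2) fields located inside qword 0 */
#define FLEX_QW0_RXDID_M	GENMASK_U64(7, 0)
#define FLEX_QW0_PTYPE_M	GENMASK_U64(25, 16)
#define FLEX_QW0_PKT_LEN_M	GENMASK_U64(45, 32)
```

With this shape each field read is a single mask-and-shift on an already-loaded 64-bit word, instead of separate u8/__le16 member accesses.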

>> +
>>  union iavf_32byte_rx_desc {
>>  	struct {
>>  		__le64  pkt_addr; /* Packet buffer address */
>> @@ -253,35 +292,34 @@ union iavf_32byte_rx_desc {
>>  			} hi_dword;
>>  		} qword3;
>>  	} wb;  /* writeback */
>> +	struct iavf_32byte_rx_flex_wb flex_wb;
> 
> So, already existing formats were described here directly, but flex is
> declared as a field? Can this be more consistent?
> 
>>  };
>>  
>> -enum iavf_rx_desc_status_bits {
>> -	/* Note: These are predefined bit offsets */
>> -	IAVF_RX_DESC_STATUS_DD_SHIFT		= 0,
>> -	IAVF_RX_DESC_STATUS_EOF_SHIFT		= 1,
>> -	IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT	= 2,
>> -	IAVF_RX_DESC_STATUS_L3L4P_SHIFT		= 3,
>> -	IAVF_RX_DESC_STATUS_CRCP_SHIFT		= 4,
>> -	IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT	= 5, /* 2 BITS */
>> -	IAVF_RX_DESC_STATUS_TSYNVALID_SHIFT	= 7,
>> -	/* Note: Bit 8 is reserved in X710 and XL710 */
>> -	IAVF_RX_DESC_STATUS_EXT_UDP_0_SHIFT	= 8,
>> -	IAVF_RX_DESC_STATUS_UMBCAST_SHIFT	= 9, /* 2 BITS */
>> -	IAVF_RX_DESC_STATUS_FLM_SHIFT		= 11,
>> -	IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT	= 12, /* 2 BITS */
>> -	IAVF_RX_DESC_STATUS_LPBK_SHIFT		= 14,
>> -	IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT	= 15,
>> -	IAVF_RX_DESC_STATUS_RESERVED_SHIFT	= 16, /* 2 BITS */
>> -	/* Note: For non-tunnel packets INT_UDP_0 is the right status for
>> -	 * UDP header
>> -	 */
>> -	IAVF_RX_DESC_STATUS_INT_UDP_0_SHIFT	= 18,
>> -	IAVF_RX_DESC_STATUS_LAST /* this entry must be last!!! */
>> -};
>> +/* Note: These are predefined bit offsets */
>> +#define IAVF_RX_DESC_STATUS_DD_MASK		BIT(0)
>> +#define IAVF_RX_DESC_STATUS_EOF_MASK		BIT(1)
>> +#define IAVF_RX_DESC_STATUS_L2TAG1P_MASK	BIT(2)
>> +#define IAVF_RX_DESC_STATUS_L3L4P_MASK		BIT(3)
>> +#define IAVF_RX_DESC_STATUS_CRCP_MASK		BIT(4)
>> +#define IAVF_RX_DESC_STATUS_TSYNINDX_MASK	GENMASK_ULL(6, 5)
>> +#define IAVF_RX_DESC_STATUS_TSYNVALID_MASK	BIT(7)
>> +/* Note: Bit 8 is reserved in X710 and XL710 */
>> +#define IAVF_RX_DESC_STATUS_EXT_UDP_0_MASK	BIT(8)
>> +#define IAVF_RX_DESC_STATUS_UMBCAST_MASK	GENMASK_ULL(10, 9)
>> +#define IAVF_RX_DESC_STATUS_FLM_MASK		BIT(11)
>> +#define IAVF_RX_DESC_STATUS_FLTSTAT_MASK	GENMASK_ULL(13, 12)
>> +#define IAVF_RX_DESC_STATUS_LPBK_MASK		BIT(14)
>> +#define IAVF_RX_DESC_STATUS_IPV6EXADD_MASK	BIT(15)
>> +#define IAVF_RX_DESC_STATUS_RESERVED_MASK	GENMASK_ULL(17, 16)
>> +/* Note: For non-tunnel packets INT_UDP_0 is the right status for
>> + * UDP header
>> + */
>> +#define IAVF_RX_DESC_STATUS_INT_UDP_0_MASK	BIT(18)
>> +
>> +#define IAVF_RX_FLEX_DESC_STATUS_ERR0_EOP_BIT	BIT(1)
>> +#define IAVF_RX_FLEX_DESC_STATUS_ERR0_RXE_BIT	BIT(10)
>>  
>> -#define IAVF_RXD_QW1_STATUS_SHIFT	0
>> -#define IAVF_RXD_QW1_STATUS_MASK	((BIT(IAVF_RX_DESC_STATUS_LAST) - 1) \
>> -					 << IAVF_RXD_QW1_STATUS_SHIFT)
>> +#define IAVF_RXD_QW1_STATUS_MASK		(BIT(19) - 1)
> 
> GENMASK().
> 
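Concretely, the GENMASK() suggestion would replace the open-coded `(BIT(19) - 1)` with something like this (a userspace GENMASK() stand-in is shown so the sketch is self-contained):

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK() */
#define GENMASK_U32(h, l)	((~0U >> (31 - (h))) & (~0U << (l)))

/* Bits 0..18 are the QW1 status field; GENMASK documents the bit range
 * directly instead of encoding it as BIT(19) - 1. */
#define IAVF_RXD_QW1_STATUS_MASK	GENMASK_U32(18, 0)
```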
>>  
>>  #define IAVF_RXD_QW1_STATUS_TSYNINDX_SHIFT IAVF_RX_DESC_STATUS_TSYNINDX_SHIFT
>>  #define IAVF_RXD_QW1_STATUS_TSYNINDX_MASK  (0x3UL << \
>> @@ -301,18 +339,16 @@ enum iavf_rx_desc_fltstat_values {
>>  #define IAVF_RXD_QW1_ERROR_SHIFT	19
>>  #define IAVF_RXD_QW1_ERROR_MASK		(0xFFUL << IAVF_RXD_QW1_ERROR_SHIFT)
>>  
>> -enum iavf_rx_desc_error_bits {
>> -	/* Note: These are predefined bit offsets */
>> -	IAVF_RX_DESC_ERROR_RXE_SHIFT		= 0,
>> -	IAVF_RX_DESC_ERROR_RECIPE_SHIFT		= 1,
>> -	IAVF_RX_DESC_ERROR_HBO_SHIFT		= 2,
>> -	IAVF_RX_DESC_ERROR_L3L4E_SHIFT		= 3, /* 3 BITS */
>> -	IAVF_RX_DESC_ERROR_IPE_SHIFT		= 3,
>> -	IAVF_RX_DESC_ERROR_L4E_SHIFT		= 4,
>> -	IAVF_RX_DESC_ERROR_EIPE_SHIFT		= 5,
>> -	IAVF_RX_DESC_ERROR_OVERSIZE_SHIFT	= 6,
>> -	IAVF_RX_DESC_ERROR_PPRS_SHIFT		= 7
>> -};
>> +/* Note: These are predefined bit offsets */
>> +#define IAVF_RX_DESC_ERROR_RXE_MASK		BIT(0)
>> +#define IAVF_RX_DESC_ERROR_RECIPE_MASK		BIT(1)
>> +#define IAVF_RX_DESC_ERROR_HBO_MASK		BIT(2)
>> +#define IAVF_RX_DESC_ERROR_L3L4E_MASK		GENMASK_ULL(5, 3)
>> +#define IAVF_RX_DESC_ERROR_IPE_MASK		BIT(3)
>> +#define IAVF_RX_DESC_ERROR_L4E_MASK		BIT(4)
>> +#define IAVF_RX_DESC_ERROR_EIPE_MASK		BIT(5)
>> +#define IAVF_RX_DESC_ERROR_OVERSIZE_MASK	BIT(6)
>> +#define IAVF_RX_DESC_ERROR_PPRS_MASK		BIT(7)
>>  
>>  enum iavf_rx_desc_error_l3l4e_fcoe_masks {
>>  	IAVF_RX_DESC_ERROR_L3L4E_NONE		= 0,
>> @@ -325,6 +361,41 @@ enum iavf_rx_desc_error_l3l4e_fcoe_masks {
>>  #define IAVF_RXD_QW1_PTYPE_SHIFT	30
>>  #define IAVF_RXD_QW1_PTYPE_MASK		(0xFFULL << IAVF_RXD_QW1_PTYPE_SHIFT)
>>  
>> +/* for iavf_32byte_rx_flex_wb.ptype_flexi_flags0 member */
>> +#define IAVF_RX_FLEX_DESC_PTYPE_M      (0x3FF) /* 10-bits */
> 
> Redundant braces + GENMASK()
> 
>> +
>> +/* for iavf_32byte_rx_flex_wb.pkt_length member */
>> +#define IAVF_RX_FLEX_DESC_PKT_LEN_M    (0x3FFF) /* 14-bits */
> 
> Same.
> 
>> +
>> +/* Note: These are predefined bit offsets */
>> +#define IAVF_RX_FLEX_DESC_STATUS0_DD_M			BIT(0)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_EOF_M			BIT(1)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_HBO_M			BIT(2)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M		BIT(3)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M		BIT(4)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M		BIT(5)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M		BIT(6)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M		BIT(7)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_LPBK_M		BIT(8)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M		BIT(9)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_RXE_M			BIT(10)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_CRCP_			BIT(11)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M		BIT(12)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M		BIT(13)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD0_VALID_M	BIT(14)
>> +#define IAVF_RX_FLEX_DESC_STATUS0_XTRMD1_VALID_M	BIT(15)
>> +
>> +/* Note: These are predefined bit offsets */
>> +#define IAVF_RX_FLEX_DESC_STATUS1_CPM_M			(0xFULL) /* 4 bits */
> 
> Redundant braces.
> + GENMASK_ULL(3, 0)?
> 
>> +#define IAVF_RX_FLEX_DESC_STATUS1_NAT_M			BIT(4)
>> +#define IAVF_RX_FLEX_DESC_STATUS1_CRYPTO_M		BIT(5)
>> +/* [10:6] reserved */
>> +#define IAVF_RX_FLEX_DESC_STATUS1_L2TAG2P_M		BIT(11)
>> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD2_VALID_M	BIT(12)
>> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD3_VALID_M	BIT(13)
>> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD4_VALID_M	BIT(14)
>> +#define IAVF_RX_FLEX_DESC_STATUS1_XTRMD5_VALID_M	BIT(15)
>> +
>>  #define IAVF_RXD_QW1_LENGTH_PBUF_SHIFT	38
>>  #define IAVF_RXD_QW1_LENGTH_PBUF_MASK	(0x3FFFULL << \
>>  					 IAVF_RXD_QW1_LENGTH_PBUF_SHIFT)
>> diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
>> index 2693c3ad0830..5cbb375b7063 100644
>> --- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
>> +++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
>> @@ -402,6 +402,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>>  	int pairs = adapter->num_active_queues;
>>  	struct virtchnl_queue_pair_info *vqpi;
>>  	u32 i, max_frame;
>> +	u8 rx_flags = 0;
>>  	size_t len;
>>  
>>  	max_frame = LIBIE_MAX_RX_FRM_LEN(adapter->rx_rings->pp->p.offset);
>> @@ -419,6 +420,9 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>>  	if (!vqci)
>>  		return;
>>  
>> +	if (iavf_ptp_cap_supported(adapter, VIRTCHNL_1588_PTP_CAP_RX_TSTAMP))
>> +		rx_flags |= VIRTCHNL_PTP_RX_TSTAMP;
>> +
>>  	vqci->vsi_id = adapter->vsi_res->vsi_id;
>>  	vqci->num_queue_pairs = pairs;
>>  	vqpi = vqci->qpair;
>> @@ -441,6 +445,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
>>  		if (CRC_OFFLOAD_ALLOWED(adapter))
>>  			vqpi->rxq.crc_disable = !!(adapter->netdev->features &
>>  						   NETIF_F_RXFCS);
>> +		vqpi->rxq.flags = rx_flags;
>>  		vqpi++;
>>  	}
> 
> Thanks,
> Olek

Thanks for the detailed review. This is rather tricky to get right. The
goal is to be able to support both the legacy descriptors for old PFs
and the new flex descriptors to support new features like timestamping,
while avoiding a lot of near-duplicate logic.

I guess you could achieve some of that via macros or some other
construction that expands the code out better for compile-time
optimization?

I don't want to end up just duplicating the entire hot path in code...
but I also don't want to end up in a "to avoid that, we just check the
same values again and again" situation.

The goal is to make sure it's maintainable and to avoid the case where
we introduce or fix bugs in one flow without fixing them in the others.
But the current approach here is obviously not the most optimal way to
achieve these goals.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-11 20:52     ` Jacob Keller
@ 2024-06-12 11:51       ` Przemek Kitszel
  2024-06-12 12:33       ` Alexander Lobakin
  1 sibling, 0 replies; 34+ messages in thread
From: Przemek Kitszel @ 2024-06-12 11:51 UTC (permalink / raw)
  To: Jacob Keller, Alexander Lobakin, Mateusz Polchlopek,
	Nguyen, Anthony L
  Cc: intel-wired-lan, netdev, Wojciech Drewek

On 6/11/24 22:52, Jacob Keller wrote:
> 
> 
> On 6/11/2024 4:47 AM, Alexander Lobakin wrote:
>> From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>> Date: Tue,  4 Jun 2024 09:13:57 -0400
>>
>>> From: Jacob Keller <jacob.e.keller@intel.com>

[..]

>> Thanks,
>> Olek
> 
> Thanks for the detailed review. This is rather tricky to get right. The
> goal is to be able to support both the legacy descriptors for old PFs
> and the new flex descriptors to support new features like timestamping,
> while avoiding a lot of near-duplicate logic.
> 
> I guess you could achieve some of that via macros or some other
> construction that expands the code out better for compile-time
> optimization?
> 
> I don't want to end up just duplicating the entire hot path in code...
> but I also don't want to end up in a "to avoid that, we just check the
> same values again and again" situation.
> 
> The goal is to make sure it's maintainable and to avoid the case where
> we introduce or fix bugs in one flow without fixing them in the others.
> But the current approach here is obviously not the most optimal way to
> achieve these goals.
> 

Thank you, Olek, for providing the feedback, especially such an
insightful review!

@Tony, I would like to have this patch kept in the for-VAL bucket, if
only to double-check that applying the feedback has not accidentally
broken the correctness. Bonus points if the double testing also shows
performance improvements :)


* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-11 20:52     ` Jacob Keller
  2024-06-12 11:51       ` Przemek Kitszel
@ 2024-06-12 12:33       ` Alexander Lobakin
  2024-06-21 14:21         ` Alexander Lobakin
  1 sibling, 1 reply; 34+ messages in thread
From: Alexander Lobakin @ 2024-06-12 12:33 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Mateusz Polchlopek, intel-wired-lan, netdev, Wojciech Drewek

From: Jacob Keller <jacob.e.keller@intel.com>
Date: Tue, 11 Jun 2024 13:52:57 -0700

> 
> 
> On 6/11/2024 4:47 AM, Alexander Lobakin wrote:
>> From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>> Date: Tue,  4 Jun 2024 09:13:57 -0400
>>
>>> From: Jacob Keller <jacob.e.keller@intel.com>
>>>
>>> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
>>> negotiating to enable the advanced flexible descriptor layout. Add the
>>> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.
>>>
>>> Also add bit position definitions for the status and error indications
>>> that are needed.
>>>
>>> The iavf_clean_rx_irq function needs to extract a few fields from the Rx
>>> descriptor, including the size, rx_ptype, and vlan_tag.
>>> Move the extraction to a separate function that decodes the fields into
>>> a structure. This will reduce the burden for handling multiple
>>> descriptor types by keeping the relevant extraction logic in one place.
>>>
>>> To support handling an additional descriptor format with minimal code
>>> duplication, refactor Rx checksum handling so that the general logic
>>> is separated from the bit calculations. Introduce an iavf_rx_desc_decoded
>>> structure which holds the relevant bits decoded from the Rx descriptor.
>>> This will enable implementing flexible descriptor handling without
>>> duplicating the general logic twice.
>>>
>>> Introduce an iavf_extract_flex_rx_fields, iavf_flex_rx_hash, and
>>> iavf_flex_rx_csum functions which operate on the flexible NIC descriptor
>>> format instead of the legacy 32 byte format. Based on the negotiated
>>> RXDID, select the correct function for processing the Rx descriptors.
>>>
>>> With this change, the Rx hot path should be functional when using either
>>> the default legacy 32byte format or when we switch to the flexible NIC
>>> layout.
>>>
>>> Modify the Rx hot path to add support for the flexible descriptor
>>> format and add request enabling Rx timestamps for all queues.
>>>
>>> As in ice, make sure we bump the checksum level if the hardware detected
>>> a packet type which could have an outer checksum. This is important
>>> because hardware only verifies the inner checksum.
>>>
>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
>>> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
>>> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>>> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>>> ---
>>>  drivers/net/ethernet/intel/iavf/iavf_txrx.c   | 354 +++++++++++++-----
>>>  drivers/net/ethernet/intel/iavf/iavf_txrx.h   |   8 +
>>>  drivers/net/ethernet/intel/iavf/iavf_type.h   | 147 ++++++--
>>>  .../net/ethernet/intel/iavf/iavf_virtchnl.c   |   5 +
>>>  4 files changed, 391 insertions(+), 123 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>>> index 26b424fd6718..97da5af52ad7 100644
>>> --- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>>> +++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
>>> @@ -895,63 +895,68 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
>>>  	return true;
>>>  }
>>>  
>>> +/* iavf_rx_csum_decoded
>>> + *
>>> + * Checksum offload bits decoded from the receive descriptor.
>>> + */
>>> +struct iavf_rx_csum_decoded {
>>> +	u8 l3l4p : 1;
>>> +	u8 ipe : 1;
>>> +	u8 eipe : 1;
>>> +	u8 eudpe : 1;
>>> +	u8 ipv6exadd : 1;
>>> +	u8 l4e : 1;
>>> +	u8 pprs : 1;
>>> +	u8 nat : 1;
>>> +};
>>
>> I see the same struct in idpf, probably a candidate for libeth.
>>
> 
> Makes sense.
> 
>>> +
>>>  /**
>>> - * iavf_rx_checksum - Indicate in skb if hw indicated a good cksum
>>> + * iavf_rx_csum - Indicate in skb if hw indicated a good checksum
>>>   * @vsi: the VSI we care about
>>>   * @skb: skb currently being received and modified
>>> - * @rx_desc: the receive descriptor
>>> + * @ptype: decoded ptype information
>>> + * @csum_bits: decoded Rx descriptor information
>>>   **/
>>> -static void iavf_rx_checksum(struct iavf_vsi *vsi,
>>> -			     struct sk_buff *skb,
>>> -			     union iavf_rx_desc *rx_desc)
>>> +static void iavf_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
>>
>> Can't @vsi be const?
>>
>>> +			 struct libeth_rx_pt *ptype,
>>
>> struct libeth_rx_pt is smaller than a pointer to it. Pass it directly
>>
>>> +			 struct iavf_rx_csum_decoded *csum_bits)
>>
>> Same for this struct.
>>
>>>  {
>>> -	struct libeth_rx_pt decoded;
>>> -	u32 rx_error, rx_status;
>>>  	bool ipv4, ipv6;
>>> -	u8 ptype;
>>> -	u64 qword;
>>>  
>>>  	skb->ip_summed = CHECKSUM_NONE;
>>>  
>>> -	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>>> -	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>>> -
>>> -	decoded = libie_rx_pt_parse(ptype);
>>> -	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>>> -		return;
>>> -
>>> -	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
>>> -	rx_status = FIELD_GET(IAVF_RXD_QW1_STATUS_MASK, qword);
>>> -
>>>  	/* did the hardware decode the packet and checksum? */
>>> -	if (!(rx_status & BIT(IAVF_RX_DESC_STATUS_L3L4P_SHIFT)))
>>> +	if (!csum_bits->l3l4p)
>>>  		return;
>>>  
>>> -	ipv4 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV4;
>>> -	ipv6 = libeth_rx_pt_get_ip_ver(decoded) == LIBETH_RX_PT_OUTER_IPV6;
>>> +	ipv4 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV4;
>>> +	ipv6 = libeth_rx_pt_get_ip_ver(*ptype) == LIBETH_RX_PT_OUTER_IPV6;
>>>  
>>> -	if (ipv4 &&
>>> -	    (rx_error & (BIT(IAVF_RX_DESC_ERROR_IPE_SHIFT) |
>>> -			 BIT(IAVF_RX_DESC_ERROR_EIPE_SHIFT))))
>>> +	if (ipv4 && (csum_bits->ipe || csum_bits->eipe))
>>>  		goto checksum_fail;
>>>  
>>>  	/* likely incorrect csum if alternate IP extension headers found */
>>> -	if (ipv6 &&
>>> -	    rx_status & BIT(IAVF_RX_DESC_STATUS_IPV6EXADD_SHIFT))
>>> -		/* don't increment checksum err here, non-fatal err */
>>> +	if (ipv6 && csum_bits->ipv6exadd)
>>>  		return;
>>>  
>>>  	/* there was some L4 error, count error and punt packet to the stack */
>>> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_L4E_SHIFT))
>>> +	if (csum_bits->l4e)
>>>  		goto checksum_fail;
>>>  
>>>  	/* handle packets that were not able to be checksummed due
>>>  	 * to arrival speed, in this case the stack can compute
>>>  	 * the csum.
>>>  	 */
>>> -	if (rx_error & BIT(IAVF_RX_DESC_ERROR_PPRS_SHIFT))
>>> +	if (csum_bits->pprs)
>>>  		return;
>>>  
>>> +	/* If there is an outer header present that might contain a checksum
>>> +	 * we need to bump the checksum level by 1 to reflect the fact that
>>> +	 * we are indicating we validated the inner checksum.
>>> +	 */
>>> +	if (ptype->tunnel_type >= LIBETH_RX_PT_TUNNEL_IP_GRENAT)
>>> +		skb->csum_level = 1;
>>> +
>>>  	skb->ip_summed = CHECKSUM_UNNECESSARY;
>>>  	return;
>>
>> For the whole function: you need to use unlikely() for checksum errors
>> to not slow down regular frames.
>> Also, I would even unlikely() packets with not verified checksum as it's
>> really rare case.
>> Optimize hotpath for most common workloads.
>>
> 
> Makes sense.
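A toy userspace sketch of that unlikely() placement (likely()/unlikely() mimicked via __builtin_expect(); the real function operates on skb and descriptor state rather than this reduced struct, so names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

struct csum_bits {
	uint8_t l3l4p : 1;
	uint8_t ipe : 1;
	uint8_t l4e : 1;
};

enum csum_result { CSUM_NONE, CSUM_OK, CSUM_FAIL };

static enum csum_result rx_csum(struct csum_bits bits, bool ipv4)
{
	/* HW did not decode the packet at all: rare, so unlikely() */
	if (unlikely(!bits.l3l4p))
		return CSUM_NONE;

	/* error paths are cold; keep the no-error fall-through hot */
	if (unlikely(ipv4 && bits.ipe))
		return CSUM_FAIL;
	if (unlikely(bits.l4e))
		return CSUM_FAIL;

	return CSUM_OK;
}
```

The annotations only affect block layout and branch prediction hints; the logic is unchanged, which is what makes this a low-risk optimization for the common good-checksum workload.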
> 
>>>  
>>> @@ -960,22 +965,105 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
>>>  }
>>>  
>>>  /**
>>> - * iavf_rx_hash - set the hash value in the skb
>>> + * iavf_legacy_rx_csum - Indicate in skb if hw indicated a good cksum
>>> + * @vsi: the VSI we care about
>>> + * @skb: skb currently being received and modified
>>> + * @rx_desc: the receive descriptor
>>> + *
>>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>>> + * descriptor writeback format.
>>> + **/
>>> +static void iavf_legacy_rx_csum(struct iavf_vsi *vsi,
>>> +				struct sk_buff *skb,
>>> +				union iavf_rx_desc *rx_desc)
>>
>> @vsi and @rx_desc can be const.
>>
>>> +{
>>> +	struct iavf_rx_csum_decoded csum_bits;
>>> +	struct libeth_rx_pt decoded;
>>> +
>>> +	u32 rx_error;
>>> +	u64 qword;
>>> +	u16 ptype;
>>> +
>>> +	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>>> +	ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>>> +	rx_error = FIELD_GET(IAVF_RXD_QW1_ERROR_MASK, qword);
>>
>> You don't need @rx_error before libeth_rx_pt_has_checksum().
>>
>>> +	decoded = libie_rx_pt_parse(ptype);
>>> +
>>> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>>> +		return;
>>> +
>>> +	csum_bits.ipe = FIELD_GET(IAVF_RX_DESC_ERROR_IPE_MASK, rx_error);
>>
>> So, @rx_error is a field of @qword and then there are more subfields?
>> Why not extract those fields directly from @qword?
>>
> 
> Yea, that would be better. It's probably just because of the
> pre-existing defines. Should be simple to update.
> 
>>> +	csum_bits.eipe = FIELD_GET(IAVF_RX_DESC_ERROR_EIPE_MASK, rx_error);
>>> +	csum_bits.l4e = FIELD_GET(IAVF_RX_DESC_ERROR_L4E_MASK, rx_error);
>>> +	csum_bits.pprs = FIELD_GET(IAVF_RX_DESC_ERROR_PPRS_MASK, rx_error);
>>> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_DESC_STATUS_L3L4P_MASK, rx_error);
>>> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_DESC_STATUS_IPV6EXADD_MASK,
>>> +					rx_error);
>>> +	csum_bits.nat = 0;
>>> +	csum_bits.eudpe = 0;
>>
>> Initialize the whole struct with = { } at the declaration site and
>> remove this.
>>
>>> +
>>> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
>>
>> In order to avoid having 2 call sites for this, make
>> iavf_{flex,legacy}_rx_csum() return @csum_bits and call iavf_rx_csum()
>> outside of them once.
>>
> 
> Good idea.
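In miniature, the suggested refactor could look like this userspace sketch: the format-specific helpers only decode and return the bits, and the shared logic has a single call site (bit positions follow the masks quoted in this thread, but the helper names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

struct csum_bits {
	uint8_t ipe : 1;
	uint8_t l4e : 1;
	uint8_t l3l4p : 1;
};

/* Legacy QW1: L3L4P is status bit 3; IPE/L4E are error bits 3/4,
 * and the error field starts at bit 19, hence bits 22/23. */
static struct csum_bits legacy_csum_bits(uint64_t qw1)
{
	return (struct csum_bits){
		.l3l4p = !!(qw1 & (1ULL << 3)),
		.ipe = !!(qw1 & (1ULL << 22)),
		.l4e = !!(qw1 & (1ULL << 23)),
	};
}

/* Flex status_error0: L3L4P bit 3, XSUM_IPE bit 4, XSUM_L4E bit 5 */
static struct csum_bits flex_csum_bits(uint16_t status0)
{
	return (struct csum_bits){
		.l3l4p = !!(status0 & (1U << 3)),
		.ipe = !!(status0 & (1U << 4)),
		.l4e = !!(status0 & (1U << 5)),
	};
}

/* Single call site for the shared validation logic */
static bool csum_ok(struct csum_bits bits)
{
	return bits.l3l4p && !bits.ipe && !bits.l4e;
}
```

The caller then picks the decoder once (`bits = legacy ? legacy_csum_bits(qw1) : flex_csum_bits(s0);`) and calls the shared logic a single time.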
> 
>>> +}
>>> +
>>> +/**
>>> + * iavf_flex_rx_csum - Indicate in skb if hw indicated a good cksum
>>> + * @vsi: the VSI we care about
>>> + * @skb: skb currently being received and modified
>>> + * @rx_desc: the receive descriptor
>>> + *
>>> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
>>> + * descriptor writeback format.
>>> + **/
>>> +static void iavf_flex_rx_csum(struct iavf_vsi *vsi, struct sk_buff *skb,
>>> +			      union iavf_rx_desc *rx_desc)
>>
>> Same for const.
>>
>>> +{
>>> +	struct iavf_rx_csum_decoded csum_bits;
>>> +	struct libeth_rx_pt decoded;
>>> +	u16 rx_status0, ptype;
>>> +
>>> +	rx_status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
>>
>> This is not needed before libeth_rx_pt_has_checksum().
>>
>>> +	ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M,
>>> +			  le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0));
>>
>> You also access this field later when extracting base fields. Shouldn't
>> this be combined somehow?
>>
>>> +	decoded = libie_rx_pt_parse(ptype);
>>> +
>>> +	if (!libeth_rx_pt_has_checksum(vsi->netdev, decoded))
>>> +		return;
>>> +
>>> +	csum_bits.ipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_IPE_M,
>>> +				  rx_status0);
>>> +	csum_bits.eipe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EIPE_M,
>>> +				   rx_status0);
>>> +	csum_bits.l4e = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_L4E_M,
>>> +				  rx_status0);
>>> +	csum_bits.eudpe = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_XSUM_EUDPE_M,
>>> +				    rx_status0);
>>> +	csum_bits.l3l4p = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_L3L4P_M,
>>> +				    rx_status0);
>>> +	csum_bits.ipv6exadd = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS0_IPV6EXADD_M,
>>> +					rx_status0);
>>> +	csum_bits.nat = FIELD_GET(IAVF_RX_FLEX_DESC_STATUS1_NAT_M, rx_status0);
>>> +	csum_bits.pprs = 0;
>>
>> See above for struct initialization.
>>
>>> +
>>> +	iavf_rx_csum(vsi, skb, &decoded, &csum_bits);
>>
>> See above.
>>
>>> +}
>>> +
>>> +/**
>>> + * iavf_legacy_rx_hash - set the hash value in the skb
>>>   * @ring: descriptor ring
>>>   * @rx_desc: specific descriptor
>>>   * @skb: skb currently being received and modified
>>>   * @rx_ptype: Rx packet type
>>> + *
>>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>>> + * descriptor writeback format.
>>>   **/
>>> -static void iavf_rx_hash(struct iavf_ring *ring,
>>> -			 union iavf_rx_desc *rx_desc,
>>> -			 struct sk_buff *skb,
>>> -			 u8 rx_ptype)
>>> +static void iavf_legacy_rx_hash(struct iavf_ring *ring,
>>> +				union iavf_rx_desc *rx_desc,
>>
>> Const for both.
>>
>>> +				struct sk_buff *skb, u8 rx_ptype)
>>>  {
>>> +	const __le64 rss_mask = cpu_to_le64(IAVF_RX_DESC_STATUS_FLTSTAT_MASK);
>>>  	struct libeth_rx_pt decoded;
>>>  	u32 hash;
>>> -	const __le64 rss_mask =
>>> -		cpu_to_le64((u64)IAVF_RX_DESC_FLTSTAT_RSS_HASH <<
>>> -			    IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT);
>>
>> Looks like unrelated, but nice change anyway.
>>
>>>  
>>>  	decoded = libie_rx_pt_parse(rx_ptype);
>>>  	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
>>> @@ -987,6 +1075,38 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>>>  	}
>>>  }
>>>  
>>> +/**
>>> + * iavf_flex_rx_hash - set the hash value in the skb
>>> + * @ring: descriptor ring
>>> + * @rx_desc: specific descriptor
>>> + * @skb: skb currently being received and modified
>>> + * @rx_ptype: Rx packet type
>>> + *
>>> + * This function only operates on the VIRTCHNL_RXDID_2_FLEX_SQ_NIC flexible
>>> + * descriptor writeback format.
>>> + **/
>>> +static void iavf_flex_rx_hash(struct iavf_ring *ring,
>>> +			      union iavf_rx_desc *rx_desc,
>>
>> Const.
>>
>>> +			      struct sk_buff *skb, u16 rx_ptype)
>>
>> Why is @rx_ptype u16 here, but u8 above? Just use u32 for both.
>>
>>> +{
>>> +	struct libeth_rx_pt decoded;
>>> +	u16 status0;
>>> +	u32 hash;
>>> +
>>> +	if (!(ring->netdev->features & NETIF_F_RXHASH))
>>
>> This is checked in libeth_rx_pt_has_hash(), why check 2 times?
>>
> 
> I think libeth_rx_pt_has_hash() was added later, so this patch
> accidentally re-introduces the check when porting to upstream.
> 
>>> +		return;
>>> +
>>> +	decoded = libie_rx_pt_parse(rx_ptype);
>>> +	if (!libeth_rx_pt_has_hash(ring->netdev, decoded))
>>> +		return;
>>> +
>>> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
>>> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_RSS_VALID_M) {
>>> +		hash = le32_to_cpu(rx_desc->flex_wb.rss_hash);
>>> +		libeth_rx_pt_set_hash(skb, hash, decoded);
>>> +	}
>>> +}
>>
>> Also, just parse rx_ptype once in process_skb_fields(), you don't need
>> to do that in each function.
>>
>>> +
>>>  /**
>>>   * iavf_process_skb_fields - Populate skb header fields from Rx descriptor
>>>   * @rx_ring: rx descriptor ring packet is being transacted on
>>> @@ -998,14 +1118,17 @@ static void iavf_rx_hash(struct iavf_ring *ring,
>>>   * order to populate the hash, checksum, VLAN, protocol, and
>>>   * other fields within the skb.
>>>   **/
>>> -static void
>>> -iavf_process_skb_fields(struct iavf_ring *rx_ring,
>>> -			union iavf_rx_desc *rx_desc, struct sk_buff *skb,
>>> -			u8 rx_ptype)
>>> +static void iavf_process_skb_fields(struct iavf_ring *rx_ring,
>>> +				    union iavf_rx_desc *rx_desc,
>>
>> Const.
>>
>>> +				    struct sk_buff *skb, u16 rx_ptype)
>>>  {
>>> -	iavf_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>>> -
>>> -	iavf_rx_checksum(rx_ring->vsi, skb, rx_desc);
>>> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE) {
>>
>> Any likely/unlikely() here? Or it's 50:50?
>>
> 
> Strictly speaking, the likely path is whichever one the software
> configured during the slow init path. That's not a compile-time-known
> value, so we can't really use it to optimize this flow.
> 
> I don't know which is more common. The pre-existing descriptor format is
> likely supported on more PFs currently, but I think over time we may see
> more support for the flex descriptors, and that might end up being the
> default.
> 
>>> +		iavf_legacy_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>>> +		iavf_legacy_rx_csum(rx_ring->vsi, skb, rx_desc);
>>> +	} else {
>>> +		iavf_flex_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
>>> +		iavf_flex_rx_csum(rx_ring->vsi, skb, rx_desc);
>>> +	}
>>>  
>>>  	skb_record_rx_queue(skb, rx_ring->queue_index);
>>>  
>>> @@ -1092,7 +1215,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>>>  /**
>>>   * iavf_is_non_eop - process handling of non-EOP buffers
>>>   * @rx_ring: Rx ring being processed
>>> - * @rx_desc: Rx descriptor for current buffer
>>> + * @fields: Rx descriptor extracted fields
>>>   * @skb: Current socket buffer containing buffer in progress
>>>   *
>>>   * This function updates next to clean.  If the buffer is an EOP buffer
>>> @@ -1101,7 +1224,7 @@ static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
>>>   * that this is in fact a non-EOP buffer.
>>>   **/
>>>  static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>>> -			    union iavf_rx_desc *rx_desc,
>>> +			    struct iavf_rx_extracted *fields,
>>
>> Pass value instead of pointer.
>>
>>>  			    struct sk_buff *skb)
>>
>> Is it still needed?
>>
>>>  {
>>>  	u32 ntc = rx_ring->next_to_clean + 1;
>>> @@ -1113,8 +1236,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>>>  	prefetch(IAVF_RX_DESC(rx_ring, ntc));
>>>  
>>>  	/* if we are the last buffer then there is nothing else to do */
>>> -#define IAVF_RXD_EOF BIT(IAVF_RX_DESC_STATUS_EOF_SHIFT)
>>> -	if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
>>> +	if (likely(fields->end_of_packet))
>>>  		return false;
>>>  
>>>  	rx_ring->rx_stats.non_eop_descs++;
>>> @@ -1122,6 +1244,91 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
>>>  	return true;
>>>  }
>>>  
>>> +/**
>>> + * iavf_extract_legacy_rx_fields - Extract fields from the Rx descriptor
>>> + * @rx_ring: rx descriptor ring
>>> + * @rx_desc: the descriptor to process
>>> + * @fields: storage for extracted values
>>> + *
>>> + * Decode the Rx descriptor and extract relevant information including the
>>> + * size, VLAN tag, Rx packet type, end of packet field and RXE field value.
>>> + *
>>> + * This function only operates on the VIRTCHNL_RXDID_1_32B_BASE legacy 32byte
>>> + * descriptor writeback format.
>>> + */
>>> +static void iavf_extract_legacy_rx_fields(struct iavf_ring *rx_ring,
>>> +					  union iavf_rx_desc *rx_desc,
>>
>> Consts.
>>
>>> +					  struct iavf_rx_extracted *fields)
>>
>> Return a struct &iavf_rx_extracted instead of passing a pointer to it.
>>
>>> +{
>>> +	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>>> +
>>> +	fields->size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);
>>> +	fields->rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>>> +
>>> +	if (qword & IAVF_RX_DESC_STATUS_L2TAG1P_MASK &&
>>> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
>>> +		fields->vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
>>> +
>>> +	if (rx_desc->wb.qword2.ext_status &
>>> +	    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
>>
>> Bitops must have own pairs of braces.
> 
> I don't understand what this comment is asking for. braces like { }? Or
> adding parenthesis around bit op?

Sorry for my English :D Parentheses.

	if ((status & BIT) && condition2)

> 
> 
>>> +
>>> +	flexi_flags0 = le16_to_cpu(rx_desc->flex_wb.ptype_flexi_flags0);
>>> +
>>> +	fields->rx_ptype = FIELD_GET(IAVF_RX_FLEX_DESC_PTYPE_M, flexi_flags0);
>>
>> le16_get_bits() instead of these two?
> 
> Neat! I wasn't aware of this.

Indexers have a hard time with this, as these inlines are generated by
preprocessor definitions; see the end of <linux/bitfield.h>.
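For readers unfamiliar with it: le16_get_bits(v, mask) is essentially FIELD_GET(mask, le16_to_cpu(v)) in one call. A userspace mimic, assuming a little-endian host so that le16_to_cpu() is the identity:

```c
#include <stdint.h>

/* Contiguous-mask FIELD_GET stand-in: (m) & ~((m) << 1) isolates the
 * lowest set bit, so the division performs the right shift. */
#define FIELD_GET_U16(m, v)	(((v) & (m)) / ((m) & ~((m) << 1)))

/* Low 10 bits of ptype_flexi_flags0 hold the packet type */
#define PTYPE_M	0x03ff

/* On an LE host, le16_to_cpu() is a no-op; the kernel helper also
 * byte-swaps on BE.  So le16_get_bits(v, m) boils down to: */
static uint16_t le16_get_bits_sketch(uint16_t le_val, uint16_t mask)
{
	uint16_t cpu_val = le_val;	/* le16_to_cpu() on LE */

	return FIELD_GET_U16(mask, cpu_val);
}
```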

> 
>>> +
>>> +	status0 = le16_to_cpu(rx_desc->flex_wb.status_error0);
>>> +	if (status0 & IAVF_RX_FLEX_DESC_STATUS0_L2TAG1P_M &&
>>> +	    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)

[...]

>>> +	if (rx_ring->rxdid == VIRTCHNL_RXDID_1_32B_BASE)
>>
>> You check this several times, this could be combined and optimized.
>>
> 
> Yea. I wasn't sure what the best way to optimize this, while also trying
> to avoid duplicating code. Ideally we want to check it once and then go
> through the correct sequence (calling the legacy or flex function
> versions). But I also didn't want to duplicate all of the common code
> between each flex or legacy call site.

This one `if` won't hurt, though, if avoiding it is more expensive like
you're describing. Up to you.

> 
>>> @@ -1219,22 +1414,11 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
>>>  		/* probably a little skewed due to removing CRC */
>>>  		total_rx_bytes += skb->len;
>>>  
>>> -		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
>>> -		rx_ptype = FIELD_GET(IAVF_RXD_QW1_PTYPE_MASK, qword);
>>> -
>>>  		/* populate checksum, VLAN, and protocol */
>>> -		iavf_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
>>> -
>>> -		if (qword & BIT(IAVF_RX_DESC_STATUS_L2TAG1P_SHIFT) &&
>>> -		    rx_ring->flags & IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1)
>>> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);
>>> -		if (rx_desc->wb.qword2.ext_status &
>>> -		    cpu_to_le16(BIT(IAVF_RX_DESC_EXT_STATUS_L2TAG2P_SHIFT)) &&
>>> -		    rx_ring->flags & IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2)
>>> -			vlan_tag = le16_to_cpu(rx_desc->wb.qword2.l2tag2_2);
>>
>> BTW I'm wondering whether filling the whole @fields can be less
>> optimized than accessing descriptor fields one by one like it was here
>> before.
>> I mean, in some cases you won't need all the fields from the extracted
>> struct, but they will be read and initialized anyway.
> 
> 
> Yes. I was more focused on "what makes this readable" because I didn't
> want to end up having two near identical copies of iavf_clean_rx_irq
> which just used different bit offsets. But then it became tricky to
> figure out how to do it in a good way. :/

Nonono, don't copy the whole function.

But perhaps we can fill @fields not all at once, but step by step,
splitting extract_fields() into several small functions? I realize this
will introduce a couple more ifs checking the descriptor format again
and again, but it might be faster than filling the whole struct. Anyway,
this needs practical testing to see which way is better. Filling the
whole struct might indeed be faster, as you'd access the descriptor in
one place (well, not once, but less frequently).

[...]

>>> +		struct {
>>> +			__le16 rsvd;
>>> +			__le16 flow_id_ipv6;
>>> +		} flex;
>>> +		__le32 ts_high;
>>> +	} flex_ts;
>>> +};
>>
>> I'm wondering whether HW descriptors can be defined as just a bunch of
>> u64 qwords instead of all those u8/__le16 etc. fields. That would be faster.
>> In case this would work differently on BE and LE, #ifdefs.
>>
> 
> We could define them as __le64 qwords for sure.

In the idpf XDP code[0], I defined Rx descriptor as a pack of __le64s
and then either access it as u64s if the platform is LE or
field-by-field if it's BE, so that we get perf boost on LE without
breaking anything on BE.
I could play around with it on BE as well by just defining the
bitfields differently there, but it's not a target platform, so I left
it as is; it's not slower than the current Rx hotpath code anyway.

[...]

> Thanks for the detailed review. This is rather tricky to get right. The
> goal is to be able to support both the legacy descriptors for old PFs
> and the new flex descriptors to support new features like timestamping,
> while avoiding having a lot of near-duplicate logic.

Not really tricky. idpf's singleq code also extracts some fields into a
common structure depending on the descriptor format, and there are also
a couple of identical checks. One `if` is rather cheap and clearly
better than copying 50-100 lines of code.

(you can take a look at the tree and/or idpf_txrx_singleq.c in the link
 below, it has patterns similar to this patch and I optimized some stuff
 there, e.g. libeth calls vs open-coded checks etc.)

> 
> I guess you could achieve some of that via macros or some other
> construction that expands the code out better for compile time optimization?
> 
> I don't want to end up just duplicating the entire hot path in code...
> but I also don't want to end up checking the same values again and
> again just to avoid that.
> 
> The goal is to make sure it's maintainable and to avoid the case where
> we introduce or fix bugs in one flow without fixing them in the
> others... But the current approach here is obviously not the most
> optimal way to achieve these goals.

[0]
https://github.com/alobakin/linux/blob/idpf-libie-new/drivers/net/ethernet/intel/idpf/xdp.h#L113

Thanks,
Olek

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-12 12:33       ` Alexander Lobakin
@ 2024-06-21 14:21         ` Alexander Lobakin
  2024-06-21 15:08           ` Tony Nguyen
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Lobakin @ 2024-06-21 14:21 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Jacob Keller, Wojciech Drewek, netdev, Mateusz Polchlopek

From: Alexander Lobakin <aleksander.lobakin@intel.com>
Date: Wed, 12 Jun 2024 14:33:17 +0200

> From: Jacob Keller <jacob.e.keller@intel.com>
> Date: Tue, 11 Jun 2024 13:52:57 -0700
> 
>>
>>
>> On 6/11/2024 4:47 AM, Alexander Lobakin wrote:
>>> From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>>> Date: Tue,  4 Jun 2024 09:13:57 -0400
>>>
>>>> From: Jacob Keller <jacob.e.keller@intel.com>
>>>>
>>>> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
>>>> negotiating to enable the advanced flexible descriptor layout. Add the
>>>> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.

[...]

Why is this taken into the next queue if I asked for changes and there's
v8 in development?

Thanks,
Olek


* Re: [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors
  2024-06-21 14:21         ` Alexander Lobakin
@ 2024-06-21 15:08           ` Tony Nguyen
  0 siblings, 0 replies; 34+ messages in thread
From: Tony Nguyen @ 2024-06-21 15:08 UTC (permalink / raw)
  To: Alexander Lobakin, intel-wired-lan
  Cc: Jacob Keller, netdev, Wojciech Drewek, Mateusz Polchlopek



On 6/21/2024 7:21 AM, Alexander Lobakin wrote:
> From: Alexander Lobakin <aleksander.lobakin@intel.com>
> Date: Wed, 12 Jun 2024 14:33:17 +0200
> 
>> From: Jacob Keller <jacob.e.keller@intel.com>
>> Date: Tue, 11 Jun 2024 13:52:57 -0700
>>
>>>
>>>
>>> On 6/11/2024 4:47 AM, Alexander Lobakin wrote:
>>>> From: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
>>>> Date: Tue,  4 Jun 2024 09:13:57 -0400
>>>>
>>>>> From: Jacob Keller <jacob.e.keller@intel.com>
>>>>>
>>>>> Using VIRTCHNL_VF_OFFLOAD_FLEX_DESC, the iAVF driver is capable of
>>>>> negotiating to enable the advanced flexible descriptor layout. Add the
>>>>> flexible NIC layout (RXDID=2) as a member of the Rx descriptor union.
> 
> [...]
> 
> Why is this taken into the next queue if I asked for changes and there's
> v8 in development?

This was applied before I returned; however, I believe the patches were
applied before your comments were received. Since they were already
applied, I left them on the tree by request [1] (while waiting for v8).
Other issues were reported after this, though, so I recently dropped
the series from the tree.

Thanks,
Tony

[1] 
https://lore.kernel.org/intel-wired-lan/70458c52-75ef-4876-a4a3-c042c52ecdb3@intel.com/


end of thread, other threads:[~2024-06-21 15:13 UTC | newest]

Thread overview: 34+ messages
2024-06-04 13:13 [Intel-wired-lan] [PATCH iwl-next v7 00/12] Add support for Rx timestamping for both ice and iavf drivers Mateusz Polchlopek
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 01/12] virtchnl: add support for enabling PTP on iAVF Mateusz Polchlopek
2024-06-08 12:55   ` Simon Horman
2024-06-10 10:18     ` Mateusz Polchlopek
2024-06-10 18:35     ` Jacob Keller
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 02/12] ice: support Rx timestamp on flex descriptor Mateusz Polchlopek
2024-06-08 12:56   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 03/12] virtchnl: add enumeration for the rxdid format Mateusz Polchlopek
2024-06-08 12:57   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 04/12] iavf: add support for negotiating flexible RXDID format Mateusz Polchlopek
2024-06-08 12:56   ` Simon Horman
2024-06-08 12:58   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 05/12] iavf: negotiate PTP capabilities Mateusz Polchlopek
2024-06-08 12:58   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 06/12] iavf: add initial framework for registering PTP clock Mateusz Polchlopek
2024-06-08 12:58   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 07/12] iavf: add support for indirect access to PHC time Mateusz Polchlopek
2024-06-08 12:59   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 08/12] iavf: periodically cache " Mateusz Polchlopek
2024-06-08 12:59   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 09/12] iavf: refactor iavf_clean_rx_irq to support legacy and flex descriptors Mateusz Polchlopek
2024-06-08 12:59   ` Simon Horman
2024-06-11 11:47   ` Alexander Lobakin
2024-06-11 20:52     ` Jacob Keller
2024-06-12 11:51       ` Przemek Kitszel
2024-06-12 12:33       ` Alexander Lobakin
2024-06-21 14:21         ` Alexander Lobakin
2024-06-21 15:08           ` Tony Nguyen
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 10/12] iavf: Implement checking DD desc field Mateusz Polchlopek
2024-06-08 12:59   ` Simon Horman
2024-06-04 13:13 ` [Intel-wired-lan] [PATCH iwl-next v7 11/12] iavf: handle set and get timestamps ops Mateusz Polchlopek
2024-06-08 13:00   ` Simon Horman
2024-06-04 13:14 ` [Intel-wired-lan] [PATCH iwl-next v7 12/12] iavf: add support for Rx timestamps to hotpath Mateusz Polchlopek
2024-06-08 13:00   ` Simon Horman
