* [PATCH iwl-next v4 00/12] Add E800 live migration driver
@ 2023-11-21 2:50 Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context Yahui Cao
` (13 more replies)
0 siblings, 14 replies; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:50 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
This series adds vfio live migration support for Intel E810 VF devices
based on the v2 migration protocol definition series discussed here[0].
Steps to test:
1. Bind one or more E810 VF devices to the module ice-vfio-pci.ko
2. Assign the VFs to the virtual machine and enable device live migration
3. Run a workload using IAVF inside the VM, for example, iperf.
4. Migrate the VM from the source node to a destination node.
The series is also available for review here[1].
Thanks,
Yahui
[0] https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux.git/log/?h=ice_live_migration
Change log:
v4:
- Remove unnecessary iomap from vfio variant driver
- Change Kconfig to select VFIO_PCI_CORE for ICE_VFIO_PCI module (Alex)
- Replace restore state with load state for naming convention
- Remove RXDID Patch
- Fix missed comments in Patch03
- Remove "so" at the beginning of the sentence and fix other grammar issues
- Remove double init and change return logic for Patch 10
- Change ice_migration_unlog_vf_msg comments for Patch04
- Add r-b from Michal to Patch04 of v4
- Change ice_migration_is_loggable_msg return value type into bool type for Patch05
- Change naming from dirtied to dirty for Patch11
- Use total_length to pass parameter to save/load function instead of macro for Patch12
- Refactor timeout logic for Patch09
- Change migration_enabled from bool into u8:1 type for Patch04
- Fix 80 max line length limit issue and compilation warning
- Add r-b from Igor to all the patches of v4
- Fix incorrect type in assignment of __le16/32 for Patch06
- Change product name from E800 to E810
v3: https://lore.kernel.org/intel-wired-lan/20230918062546.40419-1-yahui.cao@intel.com/
- Add P2P support in vfio driver (Jason)
- Remove source/destination check in vfio driver (Jason)
- Restructure PF exported API with proper types and layering (Jason)
- Change patchset email sender.
- Reword commit message and comments to be more reviewer-friendly (Kevin)
- Add s-o-b for Patch01 (Kevin)
- Merge Patch08 into Patch04 and merge Patch13 into Patch06 (Kevin)
- Remove uninit() in VF destroy stage for Patch 05 (Kevin)
- Change migration_active to migration_enabled (Kevin)
- Add total_size in devstate to greatly simplify the various checks for
Patch07 (Kevin)
- Add magic and version in device state for Patch07 (Kevin)
- Fix rx head init issue in Patch10 (Kevin)
- Remove DMA access to guest memory at device resume stage and drop the
approach of restoring TX head in VF space; instead, restore TX head in
PF space and then switch the context back to VF space, which is
transparent to the guest, for Patch11 (Jason, Kevin)
- Use non-interrupt mode instead of VF MSIX vector to restore TX head for
Patch11 (Kevin)
- Move VF pci mmio save/restore from vfio driver into PF driver
- Add configuration match check at device resume stage (Kevin)
- Remove sleep before stopping queue at device suspend stage (Kevin)
- Let PF respond with failure to VF if virtual channel message logging failed (Kevin)
- Add migration setup and description in cover letter
v2: https://lore.kernel.org/intel-wired-lan/20230621091112.44945-1-lingyu.liu@intel.com/
- Clarified comments and commit message
v1: https://lore.kernel.org/intel-wired-lan/20230620100001.5331-1-lingyu.liu@intel.com/
---
Lingyu Liu (9):
ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
ice: Add fundamental migration init and exit function
ice: Log virtual channel messages in PF
ice: Add device state save/load function for migration
ice: Fix VSI id in virtual channel message for migration
ice: Save and load RX Queue head
ice: Save and load TX Queue head
ice: Add device suspend function for migration
vfio/ice: Implement vfio_pci driver for E800 devices
Yahui Cao (3):
ice: Add function to get RX queue context
ice: Add function to get and set TX queue context
ice: Save and load mmio registers
MAINTAINERS | 7 +
drivers/net/ethernet/intel/ice/Makefile | 1 +
drivers/net/ethernet/intel/ice/ice.h | 3 +
drivers/net/ethernet/intel/ice/ice_common.c | 484 +++++-
drivers/net/ethernet/intel/ice/ice_common.h | 11 +
.../net/ethernet/intel/ice/ice_hw_autogen.h | 23 +
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 3 +
drivers/net/ethernet/intel/ice/ice_main.c | 15 +
.../net/ethernet/intel/ice/ice_migration.c | 1378 +++++++++++++++++
.../intel/ice/ice_migration_private.h | 49 +
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 4 +
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 11 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 256 ++-
drivers/net/ethernet/intel/ice/ice_virtchnl.h | 15 +-
.../ethernet/intel/ice/ice_virtchnl_fdir.c | 28 +-
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/ice/Kconfig | 10 +
drivers/vfio/pci/ice/Makefile | 4 +
drivers/vfio/pci/ice/ice_vfio_pci.c | 707 +++++++++
include/linux/net/intel/ice_migration.h | 48 +
21 files changed, 2962 insertions(+), 99 deletions(-)
create mode 100644 drivers/net/ethernet/intel/ice/ice_migration.c
create mode 100644 drivers/net/ethernet/intel/ice/ice_migration_private.h
create mode 100644 drivers/vfio/pci/ice/Kconfig
create mode 100644 drivers/vfio/pci/ice/Makefile
create mode 100644 drivers/vfio/pci/ice/ice_vfio_pci.c
create mode 100644 include/linux/net/intel/ice_migration.h
--
2.34.1
* [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-08 22:01 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 02/12] ice: Add function to get and set TX " Yahui Cao
` (12 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
Export the RX queue context get function, which is consumed by the Linux
live migration driver to save and load device state.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
---
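For illustration, a minimal sketch of how a migration-path caller might use
the new helper; the caller name is hypothetical and the absolute RX queue
index is assumed to be resolved already (the real callers arrive later in
the series):

static int ice_migration_save_rx_head_sketch(struct ice_hw *hw,
					     u32 rxq_index, u16 *head)
{
	struct ice_rlan_ctx rlan_ctx = {};
	int err;

	err = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index);
	if (err)
		return err;

	/* "head" is one of the fields unpacked via ice_rlan_ctx_info */
	*head = rlan_ctx.head;

	return 0;
}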
drivers/net/ethernet/intel/ice/ice_common.c | 268 ++++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_common.h | 5 +
2 files changed, 273 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 9a6c25f98632..d0a3bed00921 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1540,6 +1540,34 @@ ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index)
return 0;
}
+/**
+ * ice_copy_rxq_ctx_from_hw - Copy rxq context register from HW
+ * @hw: pointer to the hardware structure
+ * @ice_rxq_ctx: pointer to the rxq context
+ * @rxq_index: the index of the Rx queue
+ *
+ * Copy rxq context from HW register space to dense structure
+ */
+static int
+ice_copy_rxq_ctx_from_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index)
+{
+ u8 i;
+
+ if (!ice_rxq_ctx || rxq_index > QRX_CTRL_MAX_INDEX)
+ return -EINVAL;
+
+ /* Copy each dword separately from HW */
+ for (i = 0; i < ICE_RXQ_CTX_SIZE_DWORDS; i++) {
+ u32 *ctx = (u32 *)(ice_rxq_ctx + (i * sizeof(u32)));
+
+ *ctx = rd32(hw, QRX_CONTEXT(i, rxq_index));
+
+ ice_debug(hw, ICE_DBG_QCTX, "qrxdata[%d]: %08X\n", i, *ctx);
+ }
+
+ return 0;
+}
+
/* LAN Rx Queue Context */
static const struct ice_ctx_ele ice_rlan_ctx_info[] = {
/* Field Width LSB */
@@ -1591,6 +1619,32 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
}
+/**
+ * ice_read_rxq_ctx - Read rxq context from HW
+ * @hw: pointer to the hardware structure
+ * @rlan_ctx: pointer to the rxq context
+ * @rxq_index: the index of the Rx queue
+ *
+ * Read rxq context from HW register space and then convert it from dense
+ * structure to sparse
+ */
+int
+ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
+ u32 rxq_index)
+{
+ u8 ctx_buf[ICE_RXQ_CTX_SZ] = { 0 };
+ int status;
+
+ if (!rlan_ctx)
+ return -EINVAL;
+
+ status = ice_copy_rxq_ctx_from_hw(hw, ctx_buf, rxq_index);
+ if (status)
+ return status;
+
+ return ice_get_ctx(ctx_buf, (u8 *)rlan_ctx, ice_rlan_ctx_info);
+}
+
/* LAN Tx Queue Context */
const struct ice_ctx_ele ice_tlan_ctx_info[] = {
/* Field Width LSB */
@@ -4743,6 +4797,220 @@ ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
return 0;
}
+/**
+ * ice_read_byte - read context byte into struct
+ * @src_ctx: the context structure to read from
+ * @dest_ctx: the context to be written to
+ * @ce_info: a description of the struct to be filled
+ */
+static void
+ice_read_byte(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
+{
+ u8 dest_byte, mask;
+ u8 *src, *target;
+ u16 shift_width;
+
+ /* prepare the bits and mask */
+ shift_width = ce_info->lsb % 8;
+ mask = (u8)(BIT(ce_info->width) - 1);
+
+ /* shift to correct alignment */
+ mask <<= shift_width;
+
+ /* get the current bits from the src bit string */
+ src = src_ctx + (ce_info->lsb / 8);
+
+ memcpy(&dest_byte, src, sizeof(dest_byte));
+
+ dest_byte &= mask;
+
+ dest_byte >>= shift_width;
+
+ /* get the address from the struct field */
+ target = dest_ctx + ce_info->offset;
+
+ /* put it back in the struct */
+ memcpy(target, &dest_byte, sizeof(dest_byte));
+}
+
+/**
+ * ice_read_word - read context word into struct
+ * @src_ctx: the context structure to read from
+ * @dest_ctx: the context to be written to
+ * @ce_info: a description of the struct to be filled
+ */
+static void
+ice_read_word(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
+{
+ u16 dest_word, mask;
+ u8 *src, *target;
+ __le16 src_word;
+ u16 shift_width;
+
+ /* prepare the bits and mask */
+ shift_width = ce_info->lsb % 8;
+ mask = BIT(ce_info->width) - 1;
+
+ /* shift to correct alignment */
+ mask <<= shift_width;
+
+ /* get the current bits from the src bit string */
+ src = src_ctx + (ce_info->lsb / 8);
+
+ memcpy(&src_word, src, sizeof(src_word));
+
+ /* the data in the memory is stored as little endian so mask it
+ * correctly
+ */
+ src_word &= cpu_to_le16(mask);
+
+ /* get the data back into host order before shifting */
+ dest_word = le16_to_cpu(src_word);
+
+ dest_word >>= shift_width;
+
+ /* get the address from the struct field */
+ target = dest_ctx + ce_info->offset;
+
+ /* put it back in the struct */
+ memcpy(target, &dest_word, sizeof(dest_word));
+}
+
+/**
+ * ice_read_dword - read context dword into struct
+ * @src_ctx: the context structure to read from
+ * @dest_ctx: the context to be written to
+ * @ce_info: a description of the struct to be filled
+ */
+static void
+ice_read_dword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
+{
+ u32 dest_dword, mask;
+ __le32 src_dword;
+ u8 *src, *target;
+ u16 shift_width;
+
+ /* prepare the bits and mask */
+ shift_width = ce_info->lsb % 8;
+
+ /* if the field width is exactly 32 on an x86 machine, then the shift
+ * operation will not work because the SHL instructions count is masked
+ * to 5 bits so the shift will do nothing
+ */
+ if (ce_info->width < 32)
+ mask = BIT(ce_info->width) - 1;
+ else
+ mask = (u32)~0;
+
+ /* shift to correct alignment */
+ mask <<= shift_width;
+
+ /* get the current bits from the src bit string */
+ src = src_ctx + (ce_info->lsb / 8);
+
+ memcpy(&src_dword, src, sizeof(src_dword));
+
+ /* the data in the memory is stored as little endian so mask it
+ * correctly
+ */
+ src_dword &= cpu_to_le32(mask);
+
+ /* get the data back into host order before shifting */
+ dest_dword = le32_to_cpu(src_dword);
+
+ dest_dword >>= shift_width;
+
+ /* get the address from the struct field */
+ target = dest_ctx + ce_info->offset;
+
+ /* put it back in the struct */
+ memcpy(target, &dest_dword, sizeof(dest_dword));
+}
+
+/**
+ * ice_read_qword - read context qword into struct
+ * @src_ctx: the context structure to read from
+ * @dest_ctx: the context to be written to
+ * @ce_info: a description of the struct to be filled
+ */
+static void
+ice_read_qword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
+{
+ u64 dest_qword, mask;
+ __le64 src_qword;
+ u8 *src, *target;
+ u16 shift_width;
+
+ /* prepare the bits and mask */
+ shift_width = ce_info->lsb % 8;
+
+ /* if the field width is exactly 64 on an x86 machine, then the shift
+ * operation will not work because the SHL instructions count is masked
+ * to 6 bits so the shift will do nothing
+ */
+ if (ce_info->width < 64)
+ mask = BIT_ULL(ce_info->width) - 1;
+ else
+ mask = (u64)~0;
+
+ /* shift to correct alignment */
+ mask <<= shift_width;
+
+ /* get the current bits from the src bit string */
+ src = src_ctx + (ce_info->lsb / 8);
+
+ memcpy(&src_qword, src, sizeof(src_qword));
+
+ /* the data in the memory is stored as little endian so mask it
+ * correctly
+ */
+ src_qword &= cpu_to_le64(mask);
+
+ /* get the data back into host order before shifting */
+ dest_qword = le64_to_cpu(src_qword);
+
+ dest_qword >>= shift_width;
+
+ /* get the address from the struct field */
+ target = dest_ctx + ce_info->offset;
+
+ /* put it back in the struct */
+ memcpy(target, &dest_qword, sizeof(dest_qword));
+}
+
+/**
+ * ice_get_ctx - extract context bits from a packed structure
+ * @src_ctx: pointer to a generic packed context structure
+ * @dest_ctx: pointer to a generic non-packed context structure
+ * @ce_info: a description of the structure to be read from
+ */
+int
+ice_get_ctx(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info)
+{
+ int i;
+
+ for (i = 0; ce_info[i].width; i++) {
+ switch (ce_info[i].size_of) {
+ case 1:
+ ice_read_byte(src_ctx, dest_ctx, &ce_info[i]);
+ break;
+ case 2:
+ ice_read_word(src_ctx, dest_ctx, &ce_info[i]);
+ break;
+ case 4:
+ ice_read_dword(src_ctx, dest_ctx, &ce_info[i]);
+ break;
+ case 8:
+ ice_read_qword(src_ctx, dest_ctx, &ce_info[i]);
+ break;
+ default:
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
/**
* ice_get_lan_q_ctx - get the LAN queue context for the given VSI and TC
* @hw: pointer to the HW struct
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index 31fdcac33986..df9c7f30592a 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -55,6 +55,9 @@ void ice_set_safe_mode_caps(struct ice_hw *hw);
int
ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
u32 rxq_index);
+int
+ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
+ u32 rxq_index);
int
ice_aq_get_rss_lut(struct ice_hw *hw, struct ice_aq_get_set_rss_lut_params *get_params);
@@ -74,6 +77,8 @@ extern const struct ice_ctx_ele ice_tlan_ctx_info[];
int
ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx,
const struct ice_ctx_ele *ce_info);
+int
+ice_get_ctx(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info);
extern struct mutex ice_global_cfg_lock_sw;
--
2.34.1
* [PATCH iwl-next v4 02/12] ice: Add function to get and set TX queue context
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-08 22:14 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration Yahui Cao
` (11 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
Export the TX queue context get and set functions, which are consumed by
the Linux live migration driver to save and load device state.
The TX queue context contains static fields, which do not change during TX
traffic, and dynamic fields, which may change during TX traffic.
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
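For illustration, a minimal save/load sketch built on the two new helpers;
the function names are hypothetical and the queue index is assumed to be
relative to the PF's TX queue allocation, as the helpers expect:

static int ice_migration_save_tx_ctx_sketch(struct ice_hw *hw,
					    struct ice_tlan_ctx *tlan_ctx,
					    u32 txq_index)
{
	/* Reads the full context image, including dynamic fields such as
	 * tail, as described by ice_tlan_ctx_data_info
	 */
	return ice_read_txq_ctx(hw, tlan_ctx, txq_index);
}

static int ice_migration_load_tx_ctx_sketch(struct ice_hw *hw,
					    struct ice_tlan_ctx *tlan_ctx,
					    u32 txq_index)
{
	/* Writes only the static fields (ice_tlan_ctx_info) using the
	 * WRITE_NO_DYN command, leaving the dynamic queue state untouched
	 */
	return ice_write_txq_ctx(hw, tlan_ctx, txq_index);
}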
drivers/net/ethernet/intel/ice/ice_common.c | 216 +++++++++++++++++-
drivers/net/ethernet/intel/ice/ice_common.h | 6 +
.../net/ethernet/intel/ice/ice_hw_autogen.h | 15 ++
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 3 +
4 files changed, 239 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index d0a3bed00921..8577a5ef423e 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1645,7 +1645,10 @@ ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
return ice_get_ctx(ctx_buf, (u8 *)rlan_ctx, ice_rlan_ctx_info);
}
-/* LAN Tx Queue Context */
+/* LAN Tx Queue Context used for setting Tx config by ice_aqc_opc_add_txqs,
+ * Bits [0-175] are valid
+ */
+
const struct ice_ctx_ele ice_tlan_ctx_info[] = {
/* Field Width LSB */
ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0),
@@ -1679,6 +1682,217 @@ const struct ice_ctx_ele ice_tlan_ctx_info[] = {
{ 0 }
};
+/* LAN Tx Queue Context used for getting Tx config from QTXCOMM_CNTX data,
+ * Bits [0-292] are valid, including the internal queue state. Since the
+ * internal queue state is a dynamic field, its value will be cleared once
+ * the queue is disabled
+ */
+static const struct ice_ctx_ele ice_tlan_ctx_data_info[] = {
+ /* Field Width LSB */
+ ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0),
+ ICE_CTX_STORE(ice_tlan_ctx, port_num, 3, 57),
+ ICE_CTX_STORE(ice_tlan_ctx, cgd_num, 5, 60),
+ ICE_CTX_STORE(ice_tlan_ctx, pf_num, 3, 65),
+ ICE_CTX_STORE(ice_tlan_ctx, vmvf_num, 10, 68),
+ ICE_CTX_STORE(ice_tlan_ctx, vmvf_type, 2, 78),
+ ICE_CTX_STORE(ice_tlan_ctx, src_vsi, 10, 80),
+ ICE_CTX_STORE(ice_tlan_ctx, tsyn_ena, 1, 90),
+ ICE_CTX_STORE(ice_tlan_ctx, internal_usage_flag, 1, 91),
+ ICE_CTX_STORE(ice_tlan_ctx, alt_vlan, 1, 92),
+ ICE_CTX_STORE(ice_tlan_ctx, cpuid, 8, 93),
+ ICE_CTX_STORE(ice_tlan_ctx, wb_mode, 1, 101),
+ ICE_CTX_STORE(ice_tlan_ctx, tphrd_desc, 1, 102),
+ ICE_CTX_STORE(ice_tlan_ctx, tphrd, 1, 103),
+ ICE_CTX_STORE(ice_tlan_ctx, tphwr_desc, 1, 104),
+ ICE_CTX_STORE(ice_tlan_ctx, cmpq_id, 9, 105),
+ ICE_CTX_STORE(ice_tlan_ctx, qnum_in_func, 14, 114),
+ ICE_CTX_STORE(ice_tlan_ctx, itr_notification_mode, 1, 128),
+ ICE_CTX_STORE(ice_tlan_ctx, adjust_prof_id, 6, 129),
+ ICE_CTX_STORE(ice_tlan_ctx, qlen, 13, 135),
+ ICE_CTX_STORE(ice_tlan_ctx, quanta_prof_idx, 4, 148),
+ ICE_CTX_STORE(ice_tlan_ctx, tso_ena, 1, 152),
+ ICE_CTX_STORE(ice_tlan_ctx, tso_qnum, 11, 153),
+ ICE_CTX_STORE(ice_tlan_ctx, legacy_int, 1, 164),
+ ICE_CTX_STORE(ice_tlan_ctx, drop_ena, 1, 165),
+ ICE_CTX_STORE(ice_tlan_ctx, cache_prof_idx, 2, 166),
+ ICE_CTX_STORE(ice_tlan_ctx, pkt_shaper_prof_idx, 3, 168),
+ ICE_CTX_STORE(ice_tlan_ctx, tail, 13, 184),
+ { 0 }
+};
+
+/**
+ * ice_copy_txq_ctx_from_hw - Copy txq context register from HW
+ * @hw: pointer to the hardware structure
+ * @ice_txq_ctx: pointer to the txq context
+ *
+ * Copy txq context from HW register space to dense structure
+ */
+static int
+ice_copy_txq_ctx_from_hw(struct ice_hw *hw, u8 *ice_txq_ctx)
+{
+ u8 i;
+
+ if (!ice_txq_ctx)
+ return -EINVAL;
+
+ /* Copy each dword separately from HW */
+ for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) {
+ u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32)));
+
+ *ctx = rd32(hw, GLCOMM_QTX_CNTX_DATA(i));
+
+ ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx);
+ }
+
+ return 0;
+}
+
+/**
+ * ice_copy_txq_ctx_to_hw - Copy txq context register into HW
+ * @hw: pointer to the hardware structure
+ * @ice_txq_ctx: pointer to the txq context
+ *
+ * Copy txq context from dense structure to HW register space
+ */
+static int
+ice_copy_txq_ctx_to_hw(struct ice_hw *hw, u8 *ice_txq_ctx)
+{
+ u8 i;
+
+ if (!ice_txq_ctx)
+ return -EINVAL;
+
+ /* Copy each dword separately to HW */
+ for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) {
+ u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32)));
+
+ wr32(hw, GLCOMM_QTX_CNTX_DATA(i), *ctx);
+
+ ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx);
+ }
+
+ return 0;
+}
+
+/* Configuration access to the Tx ring context (from the PF) is done via an
+ * indirect interface, the GLCOMM_QTX_CNTX_CTL/DATA registers. However, these
+ * registers are shared by all the PFs on a single PCI card, so multiple PFs
+ * may access them simultaneously, causing access conflicts. Card-level
+ * locking would be required to protect these registers from being contended
+ * for by PF devices within the same card, but no such card-level locking is
+ * available. Introduce a coarse-grained global lock which is shared by all
+ * the PF driver instances.
+ *
+ * The overall flow is to acquire the lock, read/write the TXQ context through
+ * the GLCOMM_QTX_CNTX_CTL/DATA indirect interface and release the lock once
+ * the access is completed. In this way, only one PF at a time can safely
+ * access the TXQ context.
+ */
+static DEFINE_MUTEX(ice_global_txq_ctx_lock);
+
+/**
+ * ice_read_txq_ctx - Read txq context from HW
+ * @hw: pointer to the hardware structure
+ * @tlan_ctx: pointer to the txq context
+ * @txq_index: the index of the Tx queue
+ *
+ * Read txq context from HW register space and then convert it from dense
+ * structure to sparse
+ */
+int
+ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
+ u32 txq_index)
+{
+ u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 };
+ int status;
+ u32 txq_base;
+ u32 cmd, reg;
+
+ if (!tlan_ctx)
+ return -EINVAL;
+
+ if (txq_index > QTX_COMM_HEAD_MAX_INDEX)
+ return -EINVAL;
+
+ /* Get TXQ base within card space */
+ txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id));
+ txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >>
+ PFLAN_TX_QALLOC_FIRSTQ_S;
+
+ cmd = (GLCOMM_QTX_CNTX_CTL_CMD_READ
+ << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M;
+ reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M |
+ (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) &
+ GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M);
+
+ mutex_lock(&ice_global_txq_ctx_lock);
+
+ wr32(hw, GLCOMM_QTX_CNTX_CTL, reg);
+ ice_flush(hw);
+
+ status = ice_copy_txq_ctx_from_hw(hw, ctx_buf);
+ if (status) {
+ mutex_unlock(&ice_global_txq_ctx_lock);
+ return status;
+ }
+
+ mutex_unlock(&ice_global_txq_ctx_lock);
+
+ return ice_get_ctx(ctx_buf, (u8 *)tlan_ctx, ice_tlan_ctx_data_info);
+}
+
+/**
+ * ice_write_txq_ctx - Write txq context into HW
+ * @hw: pointer to the hardware structure
+ * @tlan_ctx: pointer to the txq context
+ * @txq_index: the index of the Tx queue
+ *
+ * Convert txq context from sparse to dense structure and then write
+ * it to HW register space
+ */
+int
+ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
+ u32 txq_index)
+{
+ u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 };
+ int status;
+ u32 txq_base;
+ u32 cmd, reg;
+
+ if (!tlan_ctx)
+ return -EINVAL;
+
+ if (txq_index > QTX_COMM_HEAD_MAX_INDEX)
+ return -EINVAL;
+
+ ice_set_ctx(hw, (u8 *)tlan_ctx, ctx_buf, ice_tlan_ctx_info);
+
+ /* Get TXQ base within card space */
+ txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id));
+ txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >>
+ PFLAN_TX_QALLOC_FIRSTQ_S;
+
+ cmd = (GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN
+ << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M;
+ reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M |
+ (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) &
+ GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M);
+
+ mutex_lock(&ice_global_txq_ctx_lock);
+
+ status = ice_copy_txq_ctx_to_hw(hw, ctx_buf);
+ if (status) {
+ mutex_unlock(&ice_global_txq_ctx_lock);
+ return status;
+ }
+
+ wr32(hw, GLCOMM_QTX_CNTX_CTL, reg);
+ ice_flush(hw);
+
+ mutex_unlock(&ice_global_txq_ctx_lock);
+
+ return 0;
+}
/* Sideband Queue command wrappers */
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
index df9c7f30592a..40fbb9088475 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.h
+++ b/drivers/net/ethernet/intel/ice/ice_common.h
@@ -58,6 +58,12 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
int
ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
u32 rxq_index);
+int
+ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
+ u32 txq_index);
+int
+ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
+ u32 txq_index);
int
ice_aq_get_rss_lut(struct ice_hw *hw, struct ice_aq_get_set_rss_lut_params *get_params);
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 86936b758ade..7410da715ad4 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -8,6 +8,7 @@
#define QTX_COMM_DBELL(_DBQM) (0x002C0000 + ((_DBQM) * 4))
#define QTX_COMM_HEAD(_DBQM) (0x000E0000 + ((_DBQM) * 4))
+#define QTX_COMM_HEAD_MAX_INDEX 16383
#define QTX_COMM_HEAD_HEAD_S 0
#define QTX_COMM_HEAD_HEAD_M ICE_M(0x1FFF, 0)
#define PF_FW_ARQBAH 0x00080180
@@ -258,6 +259,9 @@
#define VPINT_ALLOC_PCI_VALID_M BIT(31)
#define VPINT_MBX_CTL(_VSI) (0x0016A000 + ((_VSI) * 4))
#define VPINT_MBX_CTL_CAUSE_ENA_M BIT(30)
+#define PFLAN_TX_QALLOC(_PF) (0x001D2580 + ((_PF) * 4))
+#define PFLAN_TX_QALLOC_FIRSTQ_S 0
+#define PFLAN_TX_QALLOC_FIRSTQ_M ICE_M(0x3FFF, 0)
#define GLLAN_RCTL_0 0x002941F8
#define QRX_CONTEXT(_i, _QRX) (0x00280000 + ((_i) * 8192 + (_QRX) * 4))
#define QRX_CTRL(_QRX) (0x00120000 + ((_QRX) * 4))
@@ -362,6 +366,17 @@
#define GLNVM_ULD_POR_DONE_1_M BIT(8)
#define GLNVM_ULD_PCIER_DONE_2_M BIT(9)
#define GLNVM_ULD_PE_DONE_M BIT(10)
+#define GLCOMM_QTX_CNTX_CTL 0x002D2DC8
+#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S 0
+#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M ICE_M(0x3FFF, 0)
+#define GLCOMM_QTX_CNTX_CTL_CMD_S 16
+#define GLCOMM_QTX_CNTX_CTL_CMD_M ICE_M(0x7, 16)
+#define GLCOMM_QTX_CNTX_CTL_CMD_READ 0
+#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE 1
+#define GLCOMM_QTX_CNTX_CTL_CMD_RESET 3
+#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN 4
+#define GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M BIT(19)
+#define GLCOMM_QTX_CNTX_DATA(_i) (0x002D2D40 + ((_i) * 4))
#define GLPCI_CNF2 0x000BE004
#define GLPCI_CNF2_CACHELINE_SIZE_M BIT(1)
#define PF_FUNC_RID 0x0009E880
diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 89f986a75cc8..79e07c863ae0 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -431,6 +431,8 @@ enum ice_rx_flex_desc_status_error_1_bits {
#define ICE_RXQ_CTX_SIZE_DWORDS 8
#define ICE_RXQ_CTX_SZ (ICE_RXQ_CTX_SIZE_DWORDS * sizeof(u32))
+#define ICE_TXQ_CTX_SIZE_DWORDS 10
+#define ICE_TXQ_CTX_SZ (ICE_TXQ_CTX_SIZE_DWORDS * sizeof(u32))
#define ICE_TX_CMPLTNQ_CTX_SIZE_DWORDS 22
#define ICE_TX_DRBELL_Q_CTX_SIZE_DWORDS 5
#define GLTCLAN_CQ_CNTX(i, CQ) (GLTCLAN_CQ_CNTX0(CQ) + ((i) * 0x0800))
@@ -649,6 +651,7 @@ struct ice_tlan_ctx {
u8 cache_prof_idx;
u8 pkt_shaper_prof_idx;
u8 int_q_state; /* width not needed - internal - DO NOT WRITE!!! */
+ u16 tail;
};
/* The ice_ptype_lkup table is used to convert from the 10-bit ptype in the
--
2.34.1
* [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 02/12] ice: Add function to get and set TX " Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-08 22:28 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 04/12] ice: Add fundamental migration init and exit function Yahui Cao
` (10 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
During the migration device resume stage, part of the device state is
loaded by replaying the logged virtual channel messages. By default, once
a virtual channel message is processed successfully, the PF sends a
response message to the VF. In addition, the PF notifies the VF about the
link state while handling the virtual channel messages GET_VF_RESOURCES
and ENABLE_QUEUES, and the VF driver prints link state change info once it
receives the notification from the PF.
However, the device resume stage does not need the PF to send messages to
the VF in any of the above cases. Stop the PF from sending messages to the
VF while the VF is in the replay state.
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
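For illustration, a sketch of how the replay path added later in the series
is expected to use the new state bit; the wrapper name is hypothetical:

static void ice_migration_replay_msg_sketch(struct ice_vf *vf,
					    struct ice_pf *pf,
					    struct ice_rq_event_info *event,
					    struct ice_mbx_data *mbxdata)
{
	/* While the bit is set, ice_vc_respond_to_vf() swallows successful
	 * responses and ice_vc_notify_vf_link_state() returns early.
	 */
	set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
	ice_vc_process_vf_msg(pf, event, mbxdata);
	clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
}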
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 179 +++++++++++-------
drivers/net/ethernet/intel/ice/ice_virtchnl.h | 8 +-
.../ethernet/intel/ice/ice_virtchnl_fdir.c | 28 +--
4 files changed, 127 insertions(+), 89 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 93c774f2f437..c7e7df7baf38 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -37,6 +37,7 @@ enum ice_vf_states {
ICE_VF_STATE_DIS,
ICE_VF_STATE_MC_PROMISC,
ICE_VF_STATE_UC_PROMISC,
+ ICE_VF_STATE_REPLAYING_VC,
ICE_VF_STATES_NBITS
};
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index cdf17b1e2f25..661ca86c3032 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -233,6 +233,9 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf)
struct virtchnl_pf_event pfe = { 0 };
struct ice_hw *hw = &vf->pf->hw;
+ if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
+ return;
+
pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
pfe.severity = PF_EVENT_SEVERITY_INFO;
@@ -282,7 +285,7 @@ void ice_vc_notify_reset(struct ice_pf *pf)
}
/**
- * ice_vc_send_msg_to_vf - Send message to VF
+ * ice_vc_send_response_to_vf - Send response message to VF
* @vf: pointer to the VF info
* @v_opcode: virtual channel opcode
* @v_retval: virtual channel return value
@@ -291,9 +294,10 @@ void ice_vc_notify_reset(struct ice_pf *pf)
*
* send msg to VF
*/
-int
-ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
- enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
+static int
+ice_vc_send_response_to_vf(struct ice_vf *vf, u32 v_opcode,
+ enum virtchnl_status_code v_retval,
+ u8 *msg, u16 msglen)
{
struct device *dev;
struct ice_pf *pf;
@@ -314,6 +318,39 @@ ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
return 0;
}
+/**
+ * ice_vc_respond_to_vf - Respond to VF
+ * @vf: pointer to the VF info
+ * @v_opcode: virtual channel opcode
+ * @v_retval: virtual channel return value
+ * @msg: pointer to the msg buffer
+ * @msglen: msg length
+ *
+ * Respond to VF. If the VF is replaying virtchnl messages, return directly.
+ *
+ * Return 0 for success, negative for error.
+ */
+int
+ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
+ enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
+{
+ struct device *dev;
+ struct ice_pf *pf = vf->pf;
+
+ dev = ice_pf_to_dev(pf);
+
+ if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) {
+ if (v_retval == VIRTCHNL_STATUS_SUCCESS)
+ return 0;
+
+ dev_dbg(dev, "Unable to replay virt channel command, VF ID %d, virtchnl status code %d. op code %d, len %d.\n",
+ vf->vf_id, v_retval, v_opcode, msglen);
+ return -EIO;
+ }
+
+ return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen);
+}
+
/**
* ice_vc_get_ver_msg
* @vf: pointer to the VF info
@@ -332,9 +369,9 @@ static int ice_vc_get_ver_msg(struct ice_vf *vf, u8 *msg)
if (VF_IS_V10(&vf->vf_ver))
info.minor = VIRTCHNL_VERSION_MINOR_NO_VF_CAPS;
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_VERSION,
- VIRTCHNL_STATUS_SUCCESS, (u8 *)&info,
- sizeof(struct virtchnl_version_info));
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_VERSION,
+ VIRTCHNL_STATUS_SUCCESS, (u8 *)&info,
+ sizeof(struct virtchnl_version_info));
}
/**
@@ -522,8 +559,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
err:
/* send the response back to the VF */
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret,
- (u8 *)vfres, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret,
+ (u8 *)vfres, len);
kfree(vfres);
return ret;
@@ -892,7 +929,7 @@ static int ice_vc_handle_rss_cfg(struct ice_vf *vf, u8 *msg, bool add)
}
error_param:
- return ice_vc_send_msg_to_vf(vf, v_opcode, v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, v_opcode, v_ret, NULL, 0);
}
/**
@@ -938,8 +975,8 @@ static int ice_vc_config_rss_key(struct ice_vf *vf, u8 *msg)
if (ice_set_rss_key(vsi, vrk->key))
v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret,
+ NULL, 0);
}
/**
@@ -984,7 +1021,7 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
if (ice_set_rss_lut(vsi, vrl->lut, ICE_LUT_VSI_SIZE))
v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret,
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret,
NULL, 0);
}
@@ -1124,8 +1161,8 @@ static int ice_vc_cfg_promiscuous_mode_msg(struct ice_vf *vf, u8 *msg)
}
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
+ v_ret, NULL, 0);
}
/**
@@ -1165,8 +1202,8 @@ static int ice_vc_get_stats_msg(struct ice_vf *vf, u8 *msg)
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret,
- (u8 *)&stats, sizeof(stats));
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret,
+ (u8 *)&stats, sizeof(stats));
}
/**
@@ -1315,8 +1352,8 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg)
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret,
+ NULL, 0);
}
/**
@@ -1455,8 +1492,8 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg)
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret,
+ NULL, 0);
}
/**
@@ -1586,8 +1623,8 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret,
+ NULL, 0);
}
/**
@@ -1730,8 +1767,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
}
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
- VIRTCHNL_STATUS_SUCCESS, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
+ VIRTCHNL_STATUS_SUCCESS, NULL, 0);
error_param:
/* disable whatever we can */
for (; i >= 0; i--) {
@@ -1746,8 +1783,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
ice_lag_move_new_vf_nodes(vf);
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
- VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
+ VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
}
/**
@@ -2049,7 +2086,7 @@ ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set)
handle_mac_exit:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, vc_op, v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, vc_op, v_ret, NULL, 0);
}
/**
@@ -2132,8 +2169,8 @@ static int ice_vc_request_qs_msg(struct ice_vf *vf, u8 *msg)
error_param:
/* send the response to the VF */
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES,
- v_ret, (u8 *)vfres, sizeof(*vfres));
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES,
+ v_ret, (u8 *)vfres, sizeof(*vfres));
}
/**
@@ -2398,11 +2435,11 @@ static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v)
error_param:
/* send the response to the VF */
if (add_v)
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret,
+ NULL, 0);
else
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret,
+ NULL, 0);
}
/**
@@ -2477,8 +2514,8 @@ static int ice_vc_ena_vlan_stripping(struct ice_vf *vf)
vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
+ v_ret, NULL, 0);
}
/**
@@ -2514,8 +2551,8 @@ static int ice_vc_dis_vlan_stripping(struct ice_vf *vf)
vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA;
error_param:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
+ v_ret, NULL, 0);
}
/**
@@ -2550,8 +2587,8 @@ static int ice_vc_get_rss_hena(struct ice_vf *vf)
vrh->hena = ICE_DEFAULT_RSS_HENA;
err:
/* send the response back to the VF */
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret,
- (u8 *)vrh, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret,
+ (u8 *)vrh, len);
kfree(vrh);
return ret;
}
@@ -2616,8 +2653,8 @@ static int ice_vc_set_rss_hena(struct ice_vf *vf, u8 *msg)
/* send the response to the VF */
err:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret,
- NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret,
+ NULL, 0);
}
/**
@@ -2672,8 +2709,8 @@ static int ice_vc_query_rxdid(struct ice_vf *vf)
pf->supported_rxdids = rxdid->supported_rxdids;
err:
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
- v_ret, (u8 *)rxdid, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
+ v_ret, (u8 *)rxdid, len);
kfree(rxdid);
return ret;
}
@@ -2909,8 +2946,8 @@ static int ice_vc_get_offload_vlan_v2_caps(struct ice_vf *vf)
memcpy(&vf->vlan_v2_caps, caps, sizeof(*caps));
out:
- err = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS,
- v_ret, (u8 *)caps, len);
+ err = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS,
+ v_ret, (u8 *)caps, len);
kfree(caps);
return err;
}
@@ -3151,8 +3188,8 @@ static int ice_vc_remove_vlan_v2_msg(struct ice_vf *vf, u8 *msg)
v_ret = VIRTCHNL_STATUS_ERR_PARAM;
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2, v_ret, NULL,
- 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2,
+ v_ret, NULL, 0);
}
/**
@@ -3293,8 +3330,8 @@ static int ice_vc_add_vlan_v2_msg(struct ice_vf *vf, u8 *msg)
v_ret = VIRTCHNL_STATUS_ERR_PARAM;
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2, v_ret, NULL,
- 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2,
+ v_ret, NULL, 0);
}
/**
@@ -3525,8 +3562,8 @@ static int ice_vc_ena_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg)
vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA;
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2,
+ v_ret, NULL, 0);
}
/**
@@ -3600,8 +3637,8 @@ static int ice_vc_dis_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg)
vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA;
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2,
+ v_ret, NULL, 0);
}
/**
@@ -3659,8 +3696,8 @@ static int ice_vc_ena_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg)
}
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2,
+ v_ret, NULL, 0);
}
/**
@@ -3714,8 +3751,8 @@ static int ice_vc_dis_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg)
}
out:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2,
+ v_ret, NULL, 0);
}
static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
@@ -3812,8 +3849,8 @@ static int ice_vc_repr_add_mac(struct ice_vf *vf, u8 *msg)
}
handle_mac_exit:
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR,
- v_ret, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR,
+ v_ret, NULL, 0);
}
/**
@@ -3832,8 +3869,8 @@ ice_vc_repr_del_mac(struct ice_vf __always_unused *vf, u8 __always_unused *msg)
ice_update_legacy_cached_mac(vf, &al->list[0]);
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR,
- VIRTCHNL_STATUS_SUCCESS, NULL, 0);
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR,
+ VIRTCHNL_STATUS_SUCCESS, NULL, 0);
}
static int
@@ -3842,8 +3879,8 @@ ice_vc_repr_cfg_promiscuous_mode(struct ice_vf *vf, u8 __always_unused *msg)
dev_dbg(ice_pf_to_dev(vf->pf),
"Can't config promiscuous mode in switchdev mode for VF %d\n",
vf->vf_id);
- return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
- VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
+ return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
+ VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
NULL, 0);
}
@@ -3986,16 +4023,16 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
error_handler:
if (err) {
- ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM,
- NULL, 0);
+ ice_vc_respond_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM,
+ NULL, 0);
dev_err(dev, "Invalid message from VF %d, opcode %d, len %d, error %d\n",
vf_id, v_opcode, msglen, err);
goto finish;
}
if (!ice_vc_is_opcode_allowed(vf, v_opcode)) {
- ice_vc_send_msg_to_vf(vf, v_opcode,
- VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL,
+ ice_vc_respond_to_vf(vf, v_opcode,
+ VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL,
0);
goto finish;
}
@@ -4106,9 +4143,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
default:
dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
vf_id);
- err = ice_vc_send_msg_to_vf(vf, v_opcode,
- VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
- NULL, 0);
+ err = ice_vc_respond_to_vf(vf, v_opcode,
+ VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
+ NULL, 0);
break;
}
if (err) {
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index cd747718de73..a2b6094e2f2f 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -60,8 +60,8 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf);
void ice_vc_notify_link_state(struct ice_pf *pf);
void ice_vc_notify_reset(struct ice_pf *pf);
int
-ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
- enum virtchnl_status_code v_retval, u8 *msg, u16 msglen);
+ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
+ enum virtchnl_status_code v_retval, u8 *msg, u16 msglen);
bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id);
void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
struct ice_mbx_data *mbxdata);
@@ -73,8 +73,8 @@ static inline void ice_vc_notify_link_state(struct ice_pf *pf) { }
static inline void ice_vc_notify_reset(struct ice_pf *pf) { }
static inline int
-ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
- enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
+ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
+ enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
index 24b23b7ef04a..816d8bf8bec4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
@@ -1584,8 +1584,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
resp->flow_id = conf->flow_id;
vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]++;
- ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
- (u8 *)resp, len);
+ ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
+ (u8 *)resp, len);
kfree(resp);
dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n",
@@ -1600,8 +1600,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
ice_vc_fdir_remove_entry(vf, conf, conf->flow_id);
devm_kfree(dev, conf);
- ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
- (u8 *)resp, len);
+ ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
+ (u8 *)resp, len);
kfree(resp);
return ret;
}
@@ -1648,8 +1648,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
ice_vc_fdir_remove_entry(vf, conf, conf->flow_id);
vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]--;
- ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
- (u8 *)resp, len);
+ ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
+ (u8 *)resp, len);
kfree(resp);
dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n",
@@ -1665,8 +1665,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
if (success)
devm_kfree(dev, conf);
- ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
- (u8 *)resp, len);
+ ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
+ (u8 *)resp, len);
kfree(resp);
return ret;
}
@@ -1863,8 +1863,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg)
v_ret = VIRTCHNL_STATUS_SUCCESS;
stat->status = VIRTCHNL_FDIR_SUCCESS;
devm_kfree(dev, conf);
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER,
- v_ret, (u8 *)stat, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER,
+ v_ret, (u8 *)stat, len);
goto exit;
}
@@ -1922,8 +1922,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg)
err_free_conf:
devm_kfree(dev, conf);
err_exit:
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret,
- (u8 *)stat, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret,
+ (u8 *)stat, len);
kfree(stat);
return ret;
}
@@ -2006,8 +2006,8 @@ int ice_vc_del_fdir_fltr(struct ice_vf *vf, u8 *msg)
err_del_tmr:
ice_vc_fdir_clear_irq_ctx(vf);
err_exit:
- ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret,
- (u8 *)stat, len);
+ ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret,
+ (u8 *)stat, len);
kfree(stat);
return ret;
}
--
2.34.1
* [PATCH iwl-next v4 04/12] ice: Add fundamental migration init and exit function
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (2 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
` (9 subsequent siblings)
13 siblings, 0 replies; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
Add the basic entry points for live migration initialization and
uninitialization, and add a helper function for the vfio driver to reach
the PF driver data.
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
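For illustration, a sketch of how the vfio variant driver added at the end
of the series is expected to consume these exports; the function names are
hypothetical:

static int ice_vfio_pci_migration_init_sketch(struct pci_dev *vf_pdev,
					      int vf_id)
{
	struct ice_pf *pf;

	pf = ice_migration_get_pf(vf_pdev);
	if (!pf)
		return -EFAULT;

	return ice_migration_init_dev(pf, vf_id);
}

static void ice_vfio_pci_migration_uninit_sketch(struct ice_pf *pf,
						 int vf_id)
{
	ice_migration_uninit_dev(pf, vf_id);
}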
drivers/net/ethernet/intel/ice/Makefile | 1 +
drivers/net/ethernet/intel/ice/ice.h | 3 +
drivers/net/ethernet/intel/ice/ice_main.c | 15 ++++
.../net/ethernet/intel/ice/ice_migration.c | 82 +++++++++++++++++++
.../intel/ice/ice_migration_private.h | 21 +++++
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 4 +
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 2 +
include/linux/net/intel/ice_migration.h | 27 ++++++
8 files changed, 155 insertions(+)
create mode 100644 drivers/net/ethernet/intel/ice/ice_migration.c
create mode 100644 drivers/net/ethernet/intel/ice/ice_migration_private.h
create mode 100644 include/linux/net/intel/ice_migration.h
diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 0679907980f7..c536a9a896c0 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -49,3 +49,4 @@ ice-$(CONFIG_RFS_ACCEL) += ice_arfs.o
ice-$(CONFIG_XDP_SOCKETS) += ice_xsk.o
ice-$(CONFIG_ICE_SWITCHDEV) += ice_eswitch.o ice_eswitch_br.o
ice-$(CONFIG_GNSS) += ice_gnss.o
+ice-$(CONFIG_ICE_VFIO_PCI) += ice_migration.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 351e0d36df44..13f6ce51985c 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -55,6 +55,7 @@
#include <net/vxlan.h>
#include <net/gtp.h>
#include <linux/ppp_defs.h>
+#include <linux/net/intel/ice_migration.h>
#include "ice_devids.h"
#include "ice_type.h"
#include "ice_txrx.h"
@@ -77,6 +78,7 @@
#include "ice_gnss.h"
#include "ice_irq.h"
#include "ice_dpll.h"
+#include "ice_migration_private.h"
#define ICE_BAR0 0
#define ICE_REQ_DESC_MULTIPLE 32
@@ -963,6 +965,7 @@ void ice_service_task_schedule(struct ice_pf *pf);
int ice_load(struct ice_pf *pf);
void ice_unload(struct ice_pf *pf);
void ice_adv_lnk_speed_maps_init(void);
+struct ice_pf *ice_get_pf_from_vf_pdev(struct pci_dev *pdev);
/**
* ice_set_rdma_cap - enable RDMA support
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6607fa6fe556..2daa4d2b1dd1 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -9313,3 +9313,18 @@ static const struct net_device_ops ice_netdev_ops = {
.ndo_xdp_xmit = ice_xdp_xmit,
.ndo_xsk_wakeup = ice_xsk_wakeup,
};
+
+/**
+ * ice_get_pf_from_vf_pdev - Get PF structure from PCI device
+ * @pdev: pointer to PCI device
+ *
+ * Return pointer to ice PF structure, NULL for failure
+ */
+struct ice_pf *ice_get_pf_from_vf_pdev(struct pci_dev *pdev)
+{
+ struct ice_pf *pf;
+
+ pf = pci_iov_get_pf_drvdata(pdev, &ice_driver);
+
+ return !IS_ERR(pf) ? pf : NULL;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
new file mode 100644
index 000000000000..2b9b5a2ce367
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2018-2023 Intel Corporation */
+
+#include "ice.h"
+
+/**
+ * ice_migration_get_pf - Get ice PF structure pointer by pdev
+ * @pdev: pointer to ice vfio pci VF pdev structure
+ *
+ * Return a pointer to the ice PF structure for success, NULL for failure.
+ */
+struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev)
+{
+ return ice_get_pf_from_vf_pdev(pdev);
+}
+EXPORT_SYMBOL(ice_migration_get_pf);
+
+/**
+ * ice_migration_init_vf - init ice VF device state data
+ * @vf: pointer to VF
+ */
+void ice_migration_init_vf(struct ice_vf *vf)
+{
+ vf->migration_enabled = true;
+}
+
+/**
+ * ice_migration_uninit_vf - uninit VF device state data
+ * @vf: pointer to VF
+ */
+void ice_migration_uninit_vf(struct ice_vf *vf)
+{
+ if (!vf->migration_enabled)
+ return;
+
+ vf->migration_enabled = false;
+}
+
+/**
+ * ice_migration_init_dev - init ice migration device
+ * @pf: pointer to PF of migration device
+ * @vf_id: VF index of migration device
+ *
+ * Return 0 for success, negative for failure
+ */
+int ice_migration_init_dev(struct ice_pf *pf, int vf_id)
+{
+ struct device *dev = ice_pf_to_dev(pf);
+ struct ice_vf *vf;
+
+ vf = ice_get_vf_by_id(pf, vf_id);
+ if (!vf) {
+ dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id);
+ return -EINVAL;
+ }
+
+ ice_migration_init_vf(vf);
+ ice_put_vf(vf);
+ return 0;
+}
+EXPORT_SYMBOL(ice_migration_init_dev);
+
+/**
+ * ice_migration_uninit_dev - uninit ice migration device
+ * @pf: pointer to PF of migration device
+ * @vf_id: VF index of migration device
+ */
+void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id)
+{
+ struct device *dev = ice_pf_to_dev(pf);
+ struct ice_vf *vf;
+
+ vf = ice_get_vf_by_id(pf, vf_id);
+ if (!vf) {
+ dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id);
+ return;
+ }
+
+ ice_migration_uninit_vf(vf);
+ ice_put_vf(vf);
+}
+EXPORT_SYMBOL(ice_migration_uninit_dev);
diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h
new file mode 100644
index 000000000000..2cc2f515fc5e
--- /dev/null
+++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2018-2023 Intel Corporation */
+
+#ifndef _ICE_MIGRATION_PRIVATE_H_
+#define _ICE_MIGRATION_PRIVATE_H_
+
+/* This header file is for exposing functions in ice_migration.c to
+ * files which will be compiled in ice.ko.
+ * Functions which may be used by other files which will be compiled
+ * in ice-vfio-pci.ko should be exposed as part of ice_migration.h.
+ */
+
+#if IS_ENABLED(CONFIG_ICE_VFIO_PCI)
+void ice_migration_init_vf(struct ice_vf *vf);
+void ice_migration_uninit_vf(struct ice_vf *vf);
+#else
+static inline void ice_migration_init_vf(struct ice_vf *vf) { }
+static inline void ice_migration_uninit_vf(struct ice_vf *vf) { }
+#endif /* CONFIG_ICE_VFIO_PCI */
+
+#endif /* _ICE_MIGRATION_PRIVATE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
index aca1f2ea5034..8e571280831e 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c
@@ -243,6 +243,10 @@ static void ice_vf_pre_vsi_rebuild(struct ice_vf *vf)
if (vf->vf_ops->irq_close)
vf->vf_ops->irq_close(vf);
+ if (vf->migration_enabled) {
+ ice_migration_uninit_vf(vf);
+ ice_migration_init_vf(vf);
+ }
ice_vf_clear_counters(vf);
vf->vf_ops->clear_reset_trigger(vf);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index c7e7df7baf38..431fd28787e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -139,6 +139,8 @@ struct ice_vf {
struct devlink_port devlink_port;
u16 num_msix; /* num of MSI-X configured on this VF */
+
+ u8 migration_enabled:1;
};
/* Flags for controlling behavior of ice_reset_vf */
diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h
new file mode 100644
index 000000000000..7ea11a8714d6
--- /dev/null
+++ b/include/linux/net/intel/ice_migration.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2018-2023 Intel Corporation */
+
+#ifndef _ICE_MIGRATION_H_
+#define _ICE_MIGRATION_H_
+
+struct ice_pf;
+
+#if IS_ENABLED(CONFIG_ICE_VFIO_PCI)
+struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev);
+int ice_migration_init_dev(struct ice_pf *pf, int vf_id);
+void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id);
+#else
+static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev)
+{
+ return NULL;
+}
+
+static inline int ice_migration_init_dev(struct ice_pf *pf, int vf_id)
+{
+ return 0;
+}
+
+static inline void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) { }
+#endif /* CONFIG_ICE_VFIO_PCI */
+
+#endif /* _ICE_MIGRATION_H_ */
--
2.34.1
* [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (3 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 04/12] ice: Add fundamental migration init and exit function Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-11-29 17:12 ` Simon Horman
` (2 more replies)
2023-11-21 2:51 ` [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration Yahui Cao
` (8 subsequent siblings)
13 siblings, 3 replies; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
Save the virtual channel messages sent by VF on the source side during
runtime. The logged virtchnl messages will be transferred and loaded
into the device on the destination side during the device resume stage.
Features which cannot be migrated yet must be disabled or blocked so that
they cannot be abused by the VF; otherwise they may introduce functional and
security issues. Mask the unsupported VF capability flags in the VF-PF
negotiation stage.
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 167 ++++++++++++++++++
.../intel/ice/ice_migration_private.h | 17 ++
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 5 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 31 ++++
4 files changed, 220 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 2b9b5a2ce367..18ec4ec7d147 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -3,6 +3,17 @@
#include "ice.h"
+struct ice_migration_virtchnl_msg_slot {
+ u32 opcode;
+ u16 msg_len;
+ char msg_buffer[];
+};
+
+struct ice_migration_virtchnl_msg_listnode {
+ struct list_head node;
+ struct ice_migration_virtchnl_msg_slot msg_slot;
+};
+
/**
* ice_migration_get_pf - Get ice PF structure pointer by pdev
* @pdev: pointer to ice vfio pci VF pdev structure
@@ -22,6 +33,9 @@ EXPORT_SYMBOL(ice_migration_get_pf);
void ice_migration_init_vf(struct ice_vf *vf)
{
vf->migration_enabled = true;
+ INIT_LIST_HEAD(&vf->virtchnl_msg_list);
+ vf->virtchnl_msg_num = 0;
+ vf->virtchnl_msg_size = 0;
}
/**
@@ -30,10 +44,24 @@ void ice_migration_init_vf(struct ice_vf *vf)
*/
void ice_migration_uninit_vf(struct ice_vf *vf)
{
+ struct ice_migration_virtchnl_msg_listnode *msg_listnode;
+ struct ice_migration_virtchnl_msg_listnode *dtmp;
+
if (!vf->migration_enabled)
return;
vf->migration_enabled = false;
+
+ if (list_empty(&vf->virtchnl_msg_list))
+ return;
+ list_for_each_entry_safe(msg_listnode, dtmp,
+ &vf->virtchnl_msg_list,
+ node) {
+ list_del(&msg_listnode->node);
+ kfree(msg_listnode);
+ }
+ vf->virtchnl_msg_num = 0;
+ vf->virtchnl_msg_size = 0;
}
/**
@@ -80,3 +108,142 @@ void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id)
ice_put_vf(vf);
}
EXPORT_SYMBOL(ice_migration_uninit_dev);
+
+/**
+ * ice_migration_is_loggable_msg - is this message loggable or not
+ * @v_opcode: virtchnl message operation code
+ *
+ * Return true if this message logging is supported, otherwise return false
+ */
+static inline bool ice_migration_is_loggable_msg(u32 v_opcode)
+{
+ switch (v_opcode) {
+ case VIRTCHNL_OP_VERSION:
+ case VIRTCHNL_OP_GET_VF_RESOURCES:
+ case VIRTCHNL_OP_CONFIG_VSI_QUEUES:
+ case VIRTCHNL_OP_CONFIG_IRQ_MAP:
+ case VIRTCHNL_OP_ADD_ETH_ADDR:
+ case VIRTCHNL_OP_DEL_ETH_ADDR:
+ case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE:
+ case VIRTCHNL_OP_ENABLE_QUEUES:
+ case VIRTCHNL_OP_DISABLE_QUEUES:
+ case VIRTCHNL_OP_ADD_VLAN:
+ case VIRTCHNL_OP_DEL_VLAN:
+ case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
+ case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
+ case VIRTCHNL_OP_CONFIG_RSS_KEY:
+ case VIRTCHNL_OP_CONFIG_RSS_LUT:
+ case VIRTCHNL_OP_GET_SUPPORTED_RXDIDS:
+ return true;
+ default:
+ return false;
+ }
+}
+
+/**
+ * ice_migration_log_vf_msg - Log request message from VF
+ * @vf: pointer to the VF structure
+ * @event: pointer to the AQ event
+ *
+ * Log VF message for later device state loading during live migration
+ *
+ * Return 0 for success, negative for error
+ */
+int ice_migration_log_vf_msg(struct ice_vf *vf,
+ struct ice_rq_event_info *event)
+{
+ struct ice_migration_virtchnl_msg_listnode *msg_listnode;
+ u32 v_opcode = le32_to_cpu(event->desc.cookie_high);
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ u16 msglen = event->msg_len;
+ u8 *msg = event->msg_buf;
+
+ if (!ice_migration_is_loggable_msg(v_opcode))
+ return 0;
+
+ if (vf->virtchnl_msg_num >= VIRTCHNL_MSG_MAX) {
+ dev_warn(dev, "VF %d has maximum number virtual channel commands\n",
+ vf->vf_id);
+ return -ENOMEM;
+ }
+
+ msg_listnode = (struct ice_migration_virtchnl_msg_listnode *)
+ kzalloc(struct_size(msg_listnode,
+ msg_slot.msg_buffer,
+ msglen),
+ GFP_KERNEL);
+ if (!msg_listnode) {
+ dev_err(dev, "VF %d failed to allocate memory for msg listnode\n",
+ vf->vf_id);
+ return -ENOMEM;
+ }
+ dev_dbg(dev, "VF %d save virtual channel command, op code: %d, len: %d\n",
+ vf->vf_id, v_opcode, msglen);
+ msg_listnode->msg_slot.opcode = v_opcode;
+ msg_listnode->msg_slot.msg_len = msglen;
+ memcpy(msg_listnode->msg_slot.msg_buffer, msg, msglen);
+ list_add_tail(&msg_listnode->node, &vf->virtchnl_msg_list);
+ vf->virtchnl_msg_num++;
+ vf->virtchnl_msg_size += struct_size(&msg_listnode->msg_slot,
+ msg_buffer,
+ msglen);
+ return 0;
+}
+
+/**
+ * ice_migration_unlog_vf_msg - revert logged message
+ * @vf: pointer to the VF structure
+ * @v_opcode: virtchnl message operation code
+ *
+ * Remove the most recently logged virtual channel message.
+ */
+void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode)
+{
+ struct ice_migration_virtchnl_msg_listnode *msg_listnode;
+
+ if (!ice_migration_is_loggable_msg(v_opcode))
+ return;
+
+ if (WARN_ON_ONCE(list_empty(&vf->virtchnl_msg_list)))
+ return;
+
+ msg_listnode =
+ list_last_entry(&vf->virtchnl_msg_list,
+ struct ice_migration_virtchnl_msg_listnode,
+ node);
+ if (WARN_ON_ONCE(msg_listnode->msg_slot.opcode != v_opcode))
+ return;
+
+ vf->virtchnl_msg_num--;
+ vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot,
+ msg_buffer,
+ msg_listnode->msg_slot.msg_len);
+ list_del(&msg_listnode->node);
+ kfree(msg_listnode);
+}
+
+#define VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE \
+ (VIRTCHNL_VF_OFFLOAD_L2 | \
+ VIRTCHNL_VF_OFFLOAD_RSS_PF | \
+ VIRTCHNL_VF_OFFLOAD_RSS_AQ | \
+ VIRTCHNL_VF_OFFLOAD_RSS_REG | \
+ VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2 | \
+ VIRTCHNL_VF_OFFLOAD_ENCAP | \
+ VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM | \
+ VIRTCHNL_VF_OFFLOAD_RX_POLLING | \
+ VIRTCHNL_VF_OFFLOAD_WB_ON_ITR | \
+ VIRTCHNL_VF_CAP_ADV_LINK_SPEED | \
+ VIRTCHNL_VF_OFFLOAD_VLAN | \
+ VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC | \
+ VIRTCHNL_VF_OFFLOAD_USO)
+
+/**
+ * ice_migration_supported_caps - get migration supported VF capabilities
+ *
+ * When migration is activated, some VF capabilities are not supported.
+ * Hence mask out those capability flags when reporting VF resources.
+ */
+u32 ice_migration_supported_caps(void)
+{
+ return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h
index 2cc2f515fc5e..676eb2d6c12e 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration_private.h
+++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h
@@ -13,9 +13,26 @@
#if IS_ENABLED(CONFIG_ICE_VFIO_PCI)
void ice_migration_init_vf(struct ice_vf *vf);
void ice_migration_uninit_vf(struct ice_vf *vf);
+int ice_migration_log_vf_msg(struct ice_vf *vf,
+ struct ice_rq_event_info *event);
+void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode);
+u32 ice_migration_supported_caps(void);
#else
static inline void ice_migration_init_vf(struct ice_vf *vf) { }
static inline void ice_migration_uninit_vf(struct ice_vf *vf) { }
+static inline int ice_migration_log_vf_msg(struct ice_vf *vf,
+ struct ice_rq_event_info *event)
+{
+ return 0;
+}
+
+static inline void
+ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode) { }
+static inline u32
+ice_migration_supported_caps(void)
+{
+ return 0xFFFFFFFF;
+}
#endif /* CONFIG_ICE_VFIO_PCI */
#endif /* _ICE_MIGRATION_PRIVATE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 431fd28787e8..318b6dfc016d 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -77,6 +77,7 @@ struct ice_vfs {
unsigned long last_printed_mdd_jiffies; /* MDD message rate limit */
};
+#define VIRTCHNL_MSG_MAX 1000
/* VF information structure */
struct ice_vf {
struct hlist_node entry;
@@ -141,6 +142,10 @@ struct ice_vf {
u16 num_msix; /* num of MSI-X configured on this VF */
u8 migration_enabled:1;
+ struct list_head virtchnl_msg_list;
+ u64 virtchnl_msg_num;
+ u64 virtchnl_msg_size;
+ u32 virtchnl_retval;
};
/* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 661ca86c3032..730eeaea8c89 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -348,6 +348,12 @@ ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
return -EIO;
}
+ /* v_retval is not returned by this function, so store it in the
+ * per-VF field to be used by the migration logging logic later.
+ */
+ if (vf->migration_enabled)
+ vf->virtchnl_retval = v_retval;
+
return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen);
}
@@ -480,6 +486,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
VIRTCHNL_VF_OFFLOAD_RSS_REG |
VIRTCHNL_VF_OFFLOAD_VLAN;
+ if (vf->migration_enabled)
+ vf->driver_caps &= ice_migration_supported_caps();
vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
vsi = ice_get_vf_vsi(vf);
if (!vsi) {
@@ -4037,6 +4045,17 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
goto finish;
}
+ if (vf->migration_enabled) {
+ if (ice_migration_log_vf_msg(vf, event)) {
+ u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+
+ err = ice_vc_respond_to_vf(vf, v_opcode,
+ status_code,
+ NULL, 0);
+ goto finish;
+ }
+ }
+
switch (v_opcode) {
case VIRTCHNL_OP_VERSION:
err = ops->get_ver_msg(vf, msg);
@@ -4156,6 +4175,18 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
vf_id, v_opcode, err);
}
+ /* All of the loggable virtual channel messages are logged by
+ * ice_migration_log_vf_msg() before they are processed.
+ *
+ * Two kinds of error may happen: either the PF fails to process
+ * the virtual channel message, or the response cannot be sent to
+ * the VF successfully. If an error happened, fall back here by
+ * reverting the logged message.
+ */
+ if (vf->migration_enabled &&
+ (vf->virtchnl_retval != VIRTCHNL_STATUS_SUCCESS || err))
+ ice_migration_unlog_vf_msg(vf, v_opcode);
+
finish:
mutex_unlock(&vf->cfg_lock);
ice_put_vf(vf);
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (4 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-07 7:39 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message " Yahui Cao
` (7 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
Add device state save/load functions used by the vfio migration stack
when the device is in the stop-copy/resume stage.
The device state saving handler is called by the vfio driver in the device
stop-copy stage. It snapshots the device state, translates it into
device-specific data and fills the data into the migration buffer.
The device state loading handler is called by the vfio driver in the device
resume stage. It gets the device-specific data from the migration buffer,
translates it back into device state and restores the device with that state.
Currently only the virtual channel messages are handled.
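As a rough sketch of how the vfio side is expected to drive these two
handlers (the migration-file structure, buffer size and allocation policy
below are assumptions for illustration only):

#include <linux/slab.h>
#include <linux/sizes.h>
#include <linux/net/intel/ice_migration.h>

/* Assumed upper bound for the serialized device state in this sketch */
#define EXAMPLE_ICE_MIG_BUF_SZ	SZ_128K

struct example_ice_migf {
	struct ice_pf *pf;
	int vf_id;
	u64 buf_sz;
	u8 *buf;
};

/* STOP_COPY path: snapshot the device state into the migration buffer */
static int example_ice_mig_save(struct example_ice_migf *migf)
{
	migf->buf_sz = EXAMPLE_ICE_MIG_BUF_SZ;
	migf->buf = kvzalloc(migf->buf_sz, GFP_KERNEL);
	if (!migf->buf)
		return -ENOMEM;

	/* The PF writes a versioned header followed by the logged
	 * virtual channel message stream.
	 */
	return ice_migration_save_devstate(migf->pf, migf->vf_id,
					   migf->buf, migf->buf_sz);
}

/* RESUME path: replay the saved state into the destination VF */
static int example_ice_mig_load(struct example_ice_migf *migf)
{
	return ice_migration_load_devstate(migf->pf, migf->vf_id,
					   migf->buf, migf->buf_sz);
}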
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 226 ++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 27 ++-
drivers/net/ethernet/intel/ice/ice_virtchnl.h | 7 +-
include/linux/net/intel/ice_migration.h | 15 ++
4 files changed, 266 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 18ec4ec7d147..981aa92bbe86 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -3,6 +3,9 @@
#include "ice.h"
+#define ICE_MIG_DEVSTAT_MAGIC 0xE8000001
+#define ICE_MIG_DEVSTAT_VERSION 0x1
+
struct ice_migration_virtchnl_msg_slot {
u32 opcode;
u16 msg_len;
@@ -14,6 +17,17 @@ struct ice_migration_virtchnl_msg_listnode {
struct ice_migration_virtchnl_msg_slot msg_slot;
};
+struct ice_migration_dev_state {
+ u32 magic;
+ u32 version;
+ u64 total_size;
+ u32 vf_caps;
+ u16 num_txq;
+ u16 num_rxq;
+
+ u8 virtchnl_msgs[];
+} __aligned(8);
+
/**
* ice_migration_get_pf - Get ice PF structure pointer by pdev
* @pdev: pointer to ice vfio pci VF pdev structure
@@ -247,3 +261,215 @@ u32 ice_migration_supported_caps(void)
{
return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
}
+
+/**
+ * ice_migration_save_devstate - save device state to migration buffer
+ * @pf: pointer to PF of migration device
+ * @vf_id: VF index of migration device
+ * @buf: pointer to VF msg in migration buffer
+ * @buf_sz: size of migration buffer
+ *
+ * Return 0 for success, negative for error
+ */
+int
+ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
+{
+ struct ice_migration_virtchnl_msg_listnode *msg_listnode;
+ struct ice_migration_virtchnl_msg_slot *dummy_op;
+ struct ice_migration_dev_state *devstate;
+ struct device *dev = ice_pf_to_dev(pf);
+ struct ice_vsi *vsi;
+ struct ice_vf *vf;
+ u64 total_sz;
+ int ret = 0;
+
+ vf = ice_get_vf_by_id(pf, vf_id);
+ if (!vf) {
+ dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id);
+ return -EINVAL;
+ }
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ ret = -EINVAL;
+ goto out_put_vf;
+ }
+
+ /* Reserve space to store device state */
+ total_sz = sizeof(struct ice_migration_dev_state) +
+ vf->virtchnl_msg_size + sizeof(*dummy_op);
+ if (total_sz > buf_sz) {
+ dev_err(dev, "Insufficient buffer to store device state for VF %d\n",
+ vf->vf_id);
+ ret = -ENOBUFS;
+ goto out_put_vf;
+ }
+
+ devstate = (struct ice_migration_dev_state *)buf;
+ devstate->magic = ICE_MIG_DEVSTAT_MAGIC;
+ devstate->version = ICE_MIG_DEVSTAT_VERSION;
+ devstate->total_size = total_sz;
+ devstate->vf_caps = ice_migration_supported_caps();
+ devstate->num_txq = vsi->num_txq;
+ devstate->num_rxq = vsi->num_rxq;
+ buf = devstate->virtchnl_msgs;
+
+ list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) {
+ struct ice_migration_virtchnl_msg_slot *msg_slot;
+ u64 slot_size;
+
+ msg_slot = &msg_listnode->msg_slot;
+ slot_size = struct_size(msg_slot, msg_buffer,
+ msg_slot->msg_len);
+ dev_dbg(dev, "VF %d copy virtchnl message to migration buffer op: %d, len: %d\n",
+ vf->vf_id, msg_slot->opcode, msg_slot->msg_len);
+ memcpy(buf, msg_slot, slot_size);
+ buf += slot_size;
+ }
+
+ /* Use op code unknown to mark end of vc messages */
+ dummy_op = (struct ice_migration_virtchnl_msg_slot *)buf;
+ dummy_op->opcode = VIRTCHNL_OP_UNKNOWN;
+
+out_put_vf:
+ ice_put_vf(vf);
+ return ret;
+}
+EXPORT_SYMBOL(ice_migration_save_devstate);
+
+/**
+ * ice_migration_check_match - check if configuration is matched or not
+ * @vf: pointer to VF
+ * @buf: pointer to device state buffer
+ * @buf_sz: size of buffer
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_check_match(struct ice_vf *vf, const u8 *buf, u64 buf_sz)
+{
+ u32 supported_caps = ice_migration_supported_caps();
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_migration_dev_state *devstate;
+ struct ice_vsi *vsi;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ if (sizeof(struct ice_migration_dev_state) > buf_sz) {
+ dev_err(dev, "VF %d devstate header exceeds buffer size\n",
+ vf->vf_id);
+ return -EINVAL;
+ }
+
+ devstate = (struct ice_migration_dev_state *)buf;
+ if (devstate->magic != ICE_MIG_DEVSTAT_MAGIC) {
+ dev_err(dev, "VF %d devstate has invalid magic 0x%x\n",
+ vf->vf_id, devstate->magic);
+ return -EINVAL;
+ }
+
+ if (devstate->version != ICE_MIG_DEVSTAT_VERSION) {
+ dev_err(dev, "VF %d devstate has invalid version 0x%x\n",
+ vf->vf_id, devstate->version);
+ return -EINVAL;
+ }
+
+ if (devstate->num_txq != vsi->num_txq) {
+ dev_err(dev, "Failed to match VF %d tx queue number, request %d, support %d\n",
+ vf->vf_id, devstate->num_txq, vsi->num_txq);
+ return -EINVAL;
+ }
+
+ if (devstate->num_rxq != vsi->num_rxq) {
+ dev_err(dev, "Failed to match VF %d rx queue number, request %d, support %d\n",
+ vf->vf_id, devstate->num_rxq, vsi->num_rxq);
+ return -EINVAL;
+ }
+
+ if ((devstate->vf_caps & supported_caps) != devstate->vf_caps) {
+ dev_err(dev, "Failed to match VF %d caps, request 0x%x, support 0x%x\n",
+ vf->vf_id, devstate->vf_caps, supported_caps);
+ return -EINVAL;
+ }
+
+ if (devstate->total_size > buf_sz) {
+ dev_err(dev, "VF %d devstate exceeds buffer size\n",
+ vf->vf_id);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/**
+ * ice_migration_load_devstate - load device state at destination
+ * @pf: pointer to PF of migration device
+ * @vf_id: VF index of migration device
+ * @buf: pointer to device state buf in migration buffer
+ * @buf_sz: size of migration buffer
+ *
+ * This function uses the device state saved in migration buffer
+ * to load device state at destination VM
+ *
+ * Return 0 for success, negative for error
+ */
+int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
+ const u8 *buf, u64 buf_sz)
+{
+ struct ice_migration_virtchnl_msg_slot *msg_slot;
+ struct ice_migration_dev_state *devstate;
+ struct device *dev = ice_pf_to_dev(pf);
+ struct ice_vf *vf;
+ int ret = 0;
+
+ if (!buf)
+ return -EINVAL;
+
+ vf = ice_get_vf_by_id(pf, vf_id);
+ if (!vf) {
+ dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id);
+ return -EINVAL;
+ }
+
+ ret = ice_migration_check_match(vf, buf, buf_sz);
+ if (ret)
+ goto out_put_vf;
+
+ devstate = (struct ice_migration_dev_state *)buf;
+ msg_slot = (struct ice_migration_virtchnl_msg_slot *)
+ devstate->virtchnl_msgs;
+ set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
+
+ while (msg_slot->opcode != VIRTCHNL_OP_UNKNOWN) {
+ struct ice_rq_event_info event;
+ u64 slot_sz;
+
+ slot_sz = struct_size(msg_slot, msg_buffer, msg_slot->msg_len);
+ dev_dbg(dev, "VF %d replay virtchnl message op code: %d, msg len: %d\n",
+ vf->vf_id, msg_slot->opcode, msg_slot->msg_len);
+ event.desc.cookie_high = cpu_to_le32(msg_slot->opcode);
+ event.msg_len = msg_slot->msg_len;
+ event.desc.retval = cpu_to_le16(vf->vf_id);
+ event.msg_buf = (unsigned char *)msg_slot->msg_buffer;
+ ret = ice_vc_process_vf_msg(vf->pf, &event, NULL);
+ if (ret) {
+ dev_err(dev, "VF %d failed to replay virtchnl message op code: %d\n",
+ vf->vf_id, msg_slot->opcode);
+ goto out_clear_replay;
+ }
+ event.msg_buf = NULL;
+ msg_slot = (struct ice_migration_virtchnl_msg_slot *)
+ ((char *)msg_slot + slot_sz);
+ }
+out_clear_replay:
+ clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
+out_put_vf:
+ ice_put_vf(vf);
+ return ret;
+}
+EXPORT_SYMBOL(ice_migration_load_devstate);
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 730eeaea8c89..54f441daa87e 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -3982,11 +3982,24 @@ ice_is_malicious_vf(struct ice_vf *vf, struct ice_mbx_data *mbxdata)
* @event: pointer to the AQ event
* @mbxdata: information used to detect VF attempting mailbox overflow
*
- * called from the common asq/arq handler to
- * process request from VF
+ * This function will be called from:
+ * 1. the common asq/arq handler to process request from VF
+ *
+ * The return value is ignored, as the command handler will send the status
+ * of the request as a response to the VF. This flow sets the mbxdata to
+ * a non-NULL value and must call ice_is_malicious_vf to determine if this
+ * VF might be attempting to overflow the PF message queue.
+ *
+ * 2. replay of virtual channel commands during live migration
+ *
+ * The return value is used to indicate failure to replay vc commands and
+ * that the migration failed. This flow sets mbxdata to NULL and skips the
+ * ice_is_malicious_vf checks which are unnecessary during replay.
+ *
+ * Return 0 on success, negative on failure.
*/
-void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
- struct ice_mbx_data *mbxdata)
+int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
+ struct ice_mbx_data *mbxdata)
{
u32 v_opcode = le32_to_cpu(event->desc.cookie_high);
s16 vf_id = le16_to_cpu(event->desc.retval);
@@ -4003,13 +4016,14 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
if (!vf) {
dev_err(dev, "Unable to locate VF for message from VF ID %d, opcode %d, len %d\n",
vf_id, v_opcode, msglen);
- return;
+ return -EINVAL;
}
mutex_lock(&vf->cfg_lock);
/* Check if the VF is trying to overflow the mailbox */
- if (ice_is_malicious_vf(vf, mbxdata))
+ if (!test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states) &&
+ ice_is_malicious_vf(vf, mbxdata))
goto finish;
/* Check if VF is disabled. */
@@ -4190,4 +4204,5 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
finish:
mutex_unlock(&vf->cfg_lock);
ice_put_vf(vf);
+ return err;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
index a2b6094e2f2f..4b151a228c52 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
@@ -63,8 +63,8 @@ int
ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
enum virtchnl_status_code v_retval, u8 *msg, u16 msglen);
bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id);
-void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
- struct ice_mbx_data *mbxdata);
+int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
+ struct ice_mbx_data *mbxdata);
#else /* CONFIG_PCI_IOV */
static inline void ice_virtchnl_set_dflt_ops(struct ice_vf *vf) { }
static inline void ice_virtchnl_set_repr_ops(struct ice_vf *vf) { }
@@ -84,10 +84,11 @@ static inline bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id)
return false;
}
-static inline void
+static inline int
ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
struct ice_mbx_data *mbxdata)
{
+ return -EOPNOTSUPP;
}
#endif /* !CONFIG_PCI_IOV */
diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h
index 7ea11a8714d6..a142b78283a8 100644
--- a/include/linux/net/intel/ice_migration.h
+++ b/include/linux/net/intel/ice_migration.h
@@ -10,6 +10,10 @@ struct ice_pf;
struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev);
int ice_migration_init_dev(struct ice_pf *pf, int vf_id);
void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id);
+int ice_migration_save_devstate(struct ice_pf *pf, int vf_id,
+ u8 *buf, u64 buf_sz);
+int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
+ const u8 *buf, u64 buf_sz);
#else
static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev)
{
@@ -22,6 +26,17 @@ static inline int ice_migration_init_dev(struct ice_pf *pf, int vf_id)
}
static inline void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) { }
+static inline int
+ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
+{
+ return 0;
+}
+
+static inline int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
+ const u8 *buf, u64 buf_sz)
+{
+ return 0;
+}
#endif /* CONFIG_ICE_VFIO_PCI */
#endif /* _ICE_MIGRATION_H_ */
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message for migration
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (5 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-07 7:42 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head Yahui Cao
` (6 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
The VSI id is a per-VF resource id and an absolute hardware id on the
PCI card. It is exposed to the VF driver through virtual channel
messages at the VF-PF negotiation stage and stays constant for the whole
device lifecycle unless the driver is re-initialized.
Almost all virtual channel messages carry the VSI id. When the PF
receives a message, it checks that the VSI id in the message equals the
VF's VSI id, for security and other reasons. If a VM backed by a VF with
VSI A is migrated to a VF with VSI B, all replayed messages will be
rejected by the PF because of the stale VSI id, and the VM will keep
failing at runtime after migration as well.
Close this gap by rewriting the VSI id in the virtual channel messages
both at the device resume stage and at VM runtime. In most cases the
VSI id differs between the migration source and destination, and this
is a slow path anyway.
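The rewrite rule boils down to the reduced sketch below, shown only for
messages whose payload begins with the u16 VSI id; messages such as
VIRTCHNL_OP_CONFIG_VSI_QUEUES and VIRTCHNL_OP_CONFIG_IRQ_MAP carry the id
deeper in the payload and are handled separately in the patch. The helper
name here is illustrative only:

/* Reduced sketch of the VSI id fixup for messages whose payload starts
 * with the u16 VSI id.
 */
static void example_fix_leading_vsi_id(struct ice_vf *vf, u8 *msg)
{
	u16 *vsi_id = (u16 *)msg;

	/* At VM runtime only rewrite the id the VF was told at
	 * negotiation time (vm_vsi_num); during replay all logged
	 * messages are trusted and rewritten unconditionally.
	 */
	if (*vsi_id == vf->vm_vsi_num ||
	    test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
		*vsi_id = vf->lan_vsi_num;
}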
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 95 +++++++++++++++++++
.../intel/ice/ice_migration_private.h | 4 +
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 1 +
4 files changed, 101 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 981aa92bbe86..780d2183011a 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -25,6 +25,7 @@ struct ice_migration_dev_state {
u16 num_txq;
u16 num_rxq;
+ u16 vsi_id;
u8 virtchnl_msgs[];
} __aligned(8);
@@ -50,6 +51,7 @@ void ice_migration_init_vf(struct ice_vf *vf)
INIT_LIST_HEAD(&vf->virtchnl_msg_list);
vf->virtchnl_msg_num = 0;
vf->virtchnl_msg_size = 0;
+ vf->vm_vsi_num = vf->lan_vsi_num;
}
/**
@@ -314,6 +316,7 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
devstate->num_txq = vsi->num_txq;
devstate->num_rxq = vsi->num_rxq;
buf = devstate->virtchnl_msgs;
+ devstate->vsi_id = vf->vm_vsi_num;
list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) {
struct ice_migration_virtchnl_msg_slot *msg_slot;
@@ -441,6 +444,8 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
goto out_put_vf;
devstate = (struct ice_migration_dev_state *)buf;
+ vf->vm_vsi_num = devstate->vsi_id;
+ dev_dbg(dev, "VF %d vm vsi num is:%d\n", vf->vf_id, vf->vm_vsi_num);
msg_slot = (struct ice_migration_virtchnl_msg_slot *)
devstate->virtchnl_msgs;
set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
@@ -473,3 +478,93 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
return ret;
}
EXPORT_SYMBOL(ice_migration_load_devstate);
+
+/**
+ * ice_migration_fix_msg_vsi - change virtual channel msg VSI id
+ *
+ * @vf: pointer to the VF structure
+ * @v_opcode: virtchnl message operation code
+ * @msg: pointer to the virtual channel message
+ *
+ * After migration, the VSI id cached by the VF driver may differ from the
+ * VF's actual VSI id, so some virtual channel commands would fail due to the
+ * mismatched VSI id. Change the VSI id in the message payload to the real one.
+ */
+void ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg)
+{
+ if (!vf->migration_enabled)
+ return;
+
+ switch (v_opcode) {
+ case VIRTCHNL_OP_ADD_ETH_ADDR:
+ case VIRTCHNL_OP_DEL_ETH_ADDR:
+ case VIRTCHNL_OP_ENABLE_QUEUES:
+ case VIRTCHNL_OP_DISABLE_QUEUES:
+ case VIRTCHNL_OP_CONFIG_RSS_KEY:
+ case VIRTCHNL_OP_CONFIG_RSS_LUT:
+ case VIRTCHNL_OP_GET_STATS:
+ case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE:
+ case VIRTCHNL_OP_ADD_FDIR_FILTER:
+ case VIRTCHNL_OP_DEL_FDIR_FILTER:
+ case VIRTCHNL_OP_ADD_VLAN:
+ case VIRTCHNL_OP_DEL_VLAN: {
+ /* Read the beginning two bytes of message for VSI id */
+ u16 *vsi_id = (u16 *)msg;
+
+ /* For VM runtime stage, vsi_id in the virtual channel message
+ * should be equal to the PF logged vsi_id and vsi_id is
+ * replaced by VF's VSI id to guarantee that messages are
+ * processed successfully. If vsi_id is not equal to the PF
+ * logged vsi_id, then this message must have been sent by a
+ * malicious VF and no replacement is needed; just let the
+ * virtual channel handler fail this message.
+ *
+ * For virtual channel replaying stage, all of the PF logged
+ * virtual channel messages are trusted and vsi_id is replaced
+ * anyway to guarantee the messages are processed successfully.
+ */
+ if (*vsi_id == vf->vm_vsi_num ||
+ test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
+ *vsi_id = vf->lan_vsi_num;
+ break;
+ }
+ case VIRTCHNL_OP_CONFIG_IRQ_MAP: {
+ struct virtchnl_irq_map_info *irqmap_info;
+ u16 num_q_vectors_mapped;
+ int i;
+
+ irqmap_info = (struct virtchnl_irq_map_info *)msg;
+ num_q_vectors_mapped = irqmap_info->num_vectors;
+ for (i = 0; i < num_q_vectors_mapped; i++) {
+ struct virtchnl_vector_map *map;
+
+ map = &irqmap_info->vecmap[i];
+ if (map->vsi_id == vf->vm_vsi_num ||
+ test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
+ map->vsi_id = vf->lan_vsi_num;
+ }
+ break;
+ }
+ case VIRTCHNL_OP_CONFIG_VSI_QUEUES: {
+ struct virtchnl_vsi_queue_config_info *qci;
+
+ qci = (struct virtchnl_vsi_queue_config_info *)msg;
+ if (qci->vsi_id == vf->vm_vsi_num ||
+ test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) {
+ int i;
+
+ qci->vsi_id = vf->lan_vsi_num;
+ for (i = 0; i < qci->num_queue_pairs; i++) {
+ struct virtchnl_queue_pair_info *qpi;
+
+ qpi = &qci->qpair[i];
+ qpi->txq.vsi_id = vf->lan_vsi_num;
+ qpi->rxq.vsi_id = vf->lan_vsi_num;
+ }
+ }
+ break;
+ }
+ default:
+ break;
+ }
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h
index 676eb2d6c12e..f72a488d9002 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration_private.h
+++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h
@@ -17,6 +17,7 @@ int ice_migration_log_vf_msg(struct ice_vf *vf,
struct ice_rq_event_info *event);
void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode);
u32 ice_migration_supported_caps(void);
+void ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg);
#else
static inline void ice_migration_init_vf(struct ice_vf *vf) { }
static inline void ice_migration_uninit_vf(struct ice_vf *vf) { }
@@ -33,6 +34,9 @@ ice_migration_supported_caps(void)
{
return 0xFFFFFFFF;
}
+
+static inline void
+ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg) { }
#endif /* CONFIG_ICE_VFIO_PCI */
#endif /* _ICE_MIGRATION_PRIVATE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 318b6dfc016d..49d99694e91f 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -146,6 +146,7 @@ struct ice_vf {
u64 virtchnl_msg_num;
u64 virtchnl_msg_size;
u32 virtchnl_retval;
+ u16 vm_vsi_num;
};
/* Flags for controlling behavior of ice_reset_vf */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 54f441daa87e..8dbe558790af 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -4060,6 +4060,7 @@ int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
}
if (vf->migration_enabled) {
+ ice_migration_fix_msg_vsi(vf, v_opcode, msg);
if (ice_migration_log_vf_msg(vf, event)) {
u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY;
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (6 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message " Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-07 7:55 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 09/12] ice: Save and load TX " Yahui Cao
` (5 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
The RX queue head is a fundamental DMA ring context field which determines
the next RX descriptor to be fetched. However, the RX queue head is not
visible to the VF; it is only visible to the PF. As a result, the PF needs
to save and load the RX queue head explicitly.
Since network packets may come in at any time once the RX queue is enabled,
the RX queue head must be loaded before the queue is enabled.
The RX queue head loading handler is implemented by reading the queue
context and then overwriting it with the saved HEAD value.
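In condensed form, the loading step added below amounts to a read-modify-write
of the RX queue context; this sketch assumes the ice_read_rxq_ctx() helper
added earlier in the series and omits the per-queue bookkeeping:

static int example_load_one_rx_head(struct ice_hw *hw, u32 rxq_index,
				    u16 saved_head)
{
	struct ice_rlan_ctx rlan_ctx = {};
	int err;

	/* Read the current RX queue context */
	err = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index);
	if (err)
		return err;

	/* Patch only the head field and write the context back */
	rlan_ctx.head = saved_head;
	return ice_write_rxq_ctx(hw, &rlan_ctx, rxq_index);
}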
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 125 ++++++++++++++++++
1 file changed, 125 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 780d2183011a..473be6a83cf3 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -2,9 +2,11 @@
/* Copyright (C) 2018-2023 Intel Corporation */
#include "ice.h"
+#include "ice_base.h"
#define ICE_MIG_DEVSTAT_MAGIC 0xE8000001
#define ICE_MIG_DEVSTAT_VERSION 0x1
+#define ICE_MIG_VF_QRX_TAIL_MAX 256
struct ice_migration_virtchnl_msg_slot {
u32 opcode;
@@ -26,6 +28,8 @@ struct ice_migration_dev_state {
u16 num_rxq;
u16 vsi_id;
+ /* next RX desc index to be processed by the device */
+ u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX];
u8 virtchnl_msgs[];
} __aligned(8);
@@ -264,6 +268,54 @@ u32 ice_migration_supported_caps(void)
return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
}
+/**
+ * ice_migration_save_rx_head - save rx head into device state buffer
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration buffer
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_save_rx_head(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_vsi *vsi;
+ int i;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ ice_for_each_rxq(vsi, i) {
+ struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
+ struct ice_rlan_ctx rlan_ctx = {};
+ struct ice_hw *hw = &vf->pf->hw;
+ u16 rxq_index;
+ int status;
+
+ if (WARN_ON_ONCE(!rx_ring))
+ return -EINVAL;
+
+ devstate->rx_head[i] = 0;
+ if (!test_bit(i, vf->rxq_ena))
+ continue;
+
+ rxq_index = rx_ring->reg_idx;
+ status = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index);
+ if (status) {
+ dev_err(dev, "Failed to read RXQ[%d] context, err=%d\n",
+ rx_ring->q_index, status);
+ return -EIO;
+ }
+ devstate->rx_head[i] = rlan_ctx.head;
+ }
+
+ return 0;
+}
+
/**
* ice_migration_save_devstate - save device state to migration buffer
* @pf: pointer to PF of migration device
@@ -318,6 +370,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
buf = devstate->virtchnl_msgs;
devstate->vsi_id = vf->vm_vsi_num;
+ ret = ice_migration_save_rx_head(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to save rxq head\n", vf->vf_id);
+ goto out_put_vf;
+ }
+
list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) {
struct ice_migration_virtchnl_msg_slot *msg_slot;
u64 slot_size;
@@ -409,6 +467,57 @@ ice_migration_check_match(struct ice_vf *vf, const u8 *buf, u64 buf_sz)
return 0;
}
+/**
+ * ice_migration_load_rx_head - load rx head from device state buffer
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_load_rx_head(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_vsi *vsi;
+ int i;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ ice_for_each_rxq(vsi, i) {
+ struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
+ struct ice_rlan_ctx rlan_ctx = {};
+ struct ice_hw *hw = &vf->pf->hw;
+ u16 rxq_index;
+ int status;
+
+ if (WARN_ON_ONCE(!rx_ring))
+ return -EINVAL;
+
+ rxq_index = rx_ring->reg_idx;
+ status = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index);
+ if (status) {
+ dev_err(dev, "Failed to read RXQ[%d] context, err=%d\n",
+ rx_ring->q_index, status);
+ return -EIO;
+ }
+
+ rlan_ctx.head = devstate->rx_head[i];
+ status = ice_write_rxq_ctx(hw, &rlan_ctx, rxq_index);
+ if (status) {
+ dev_err(dev, "Failed to set LAN RXQ[%d] context, err=%d\n",
+ rx_ring->q_index, status);
+ return -EIO;
+ }
+ }
+
+ return 0;
+}
+
/**
* ice_migration_load_devstate - load device state at destination
* @pf: pointer to PF of migration device
@@ -467,6 +576,22 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
vf->vf_id, msg_slot->opcode);
goto out_clear_replay;
}
+
+ /* Once RX Queue is enabled, network traffic may come in at any
+ * time. As a result, RX Queue head needs to be loaded before
+ * RX Queue is enabled.
+ * For simplicity and integration, overwrite RX head just after
+ * RX ring context is configured.
+ */
+ if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES) {
+ ret = ice_migration_load_rx_head(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to load rx head\n",
+ vf->vf_id);
+ goto out_clear_replay;
+ }
+ }
+
event.msg_buf = NULL;
msg_slot = (struct ice_migration_virtchnl_msg_slot *)
((char *)msg_slot + slot_sz);
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 09/12] ice: Save and load TX Queue head
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (7 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-07 8:22 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 10/12] ice: Add device suspend function for migration Yahui Cao
` (4 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
The TX queue head is a fundamental DMA ring context field which determines
the next TX descriptor to be fetched. However, the TX queue head is not
visible to the VF; it is only visible to the PF. As a result, the PF needs
to save and load the TX queue head explicitly.
Unfortunately, due to a HW limitation, the TX queue head can't be recovered
by writing mmio registers.
Since sending one packet advances the TX head by one, the TX queue head can
be advanced by N by sending N packets. Filling the DMA ring with NOP
descriptors and bumping the doorbell can therefore be used to change the TX
queue head indirectly, and this method has no side effects other than
changing the TX head value.
To advance the TX queue head, HW needs to touch memory by DMA. But directly
touching the VM's memory to advance the TX queue head does not follow the
vfio migration protocol design, because vIOMMU state is not defined by the
protocol; it may also introduce functional and security issues with a
hostile guest.
In order not to touch any VF memory or IO page table, TX queue head loading
uses PF-managed memory and the PF isolation domain. This introduces a
dependency: the TX queue head value must not change while the TX queue is
switched between PF space and VF space. HW provides indirect context access
so that the head value is kept across the context switch.
In the virtual channel model, the VF driver only sends the TX queue ring
base and length to the PF, while the rest of the TX queue context is managed
by the PF. The TX queue length must be verified by the PF during virtual
channel message processing. When the PF uses dummy descriptors to advance
the TX head, it configures the TX ring base as a new address managed by the
PF itself. As a result, the whole TX queue context is under PF control and
this method does not open any attack surface.
The overall steps of the TX head loading handler are:
1. Back up the TX queue context, then switch the context to PF space and
the PF DMA ring base with the interrupt disabled.
2. Fill the DMA ring with dummy descriptors and bump the doorbell to
advance the TX head. Once the doorbell is kicked, HW issues DMA and
sends PCI upstream memory transactions tagged with the PF BDF. Since
the ring base is a PF-managed DMA buffer, the DMA succeeds and the TX
head advances as expected.
3. Overwrite the TX context with the backup from step 1. Since the TX
queue head value is not changed by the context switch, the TX queue
head is successfully loaded.
Since everything happens inside the PF context, this is transparent to the
vfio driver and has no effects outside the PF.
Co-developed-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 306 ++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 18 ++
2 files changed, 324 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 473be6a83cf3..082ae2b79f60 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -3,10 +3,14 @@
#include "ice.h"
#include "ice_base.h"
+#include "ice_txrx_lib.h"
#define ICE_MIG_DEVSTAT_MAGIC 0xE8000001
#define ICE_MIG_DEVSTAT_VERSION 0x1
#define ICE_MIG_VF_QRX_TAIL_MAX 256
+#define QTX_HEAD_RESTORE_DELAY_MAX 100
+#define QTX_HEAD_RESTORE_DELAY_SLEEP_US_MIN 10
+#define QTX_HEAD_RESTORE_DELAY_SLEEP_US_MAX 10
struct ice_migration_virtchnl_msg_slot {
u32 opcode;
@@ -30,6 +34,8 @@ struct ice_migration_dev_state {
u16 vsi_id;
/* next RX desc index to be processed by the device */
u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX];
+ /* next TX desc index to be processed by the device */
+ u16 tx_head[ICE_MIG_VF_QRX_TAIL_MAX];
u8 virtchnl_msgs[];
} __aligned(8);
@@ -316,6 +322,62 @@ ice_migration_save_rx_head(struct ice_vf *vf,
return 0;
}
+/**
+ * ice_migration_save_tx_head - save tx head in migration region
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_save_tx_head(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct ice_vsi *vsi = ice_get_vf_vsi(vf);
+ struct ice_pf *pf = vf->pf;
+ struct device *dev;
+ int i = 0;
+
+ dev = ice_pf_to_dev(pf);
+
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ ice_for_each_txq(vsi, i) {
+ u16 tx_head;
+ u32 reg;
+
+ devstate->tx_head[i] = 0;
+ if (!test_bit(i, vf->txq_ena))
+ continue;
+
+ reg = rd32(&pf->hw, QTX_COMM_HEAD(vsi->txq_map[i]));
+ tx_head = (reg & QTX_COMM_HEAD_HEAD_M)
+ >> QTX_COMM_HEAD_HEAD_S;
+
+ /* 1. If TX head is the QTX_COMM_HEAD_HEAD_M marker, it is the
+ * value written by software and no descriptor write-back has
+ * happened, which means no packet has been sent since the
+ * queue was enabled.
+ * 2. If TX head is the ring length minus 1, the head has just
+ * wrapped back to the start of the ring.
+ */
+ if (tx_head == QTX_COMM_HEAD_HEAD_M ||
+ tx_head == (vsi->tx_rings[i]->count - 1))
+ tx_head = 0;
+ else
+ /* Add compensation since value read from TX Head
+ * register is always the real TX head minus 1
+ */
+ tx_head++;
+
+ devstate->tx_head[i] = tx_head;
+ }
+ return 0;
+}
+
/**
* ice_migration_save_devstate - save device state to migration buffer
* @pf: pointer to PF of migration device
@@ -376,6 +438,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
goto out_put_vf;
}
+ ret = ice_migration_save_tx_head(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to save txq head\n", vf->vf_id);
+ goto out_put_vf;
+ }
+
list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) {
struct ice_migration_virtchnl_msg_slot *msg_slot;
u64 slot_size;
@@ -518,6 +586,234 @@ ice_migration_load_rx_head(struct ice_vf *vf,
return 0;
}
+/**
+ * ice_migration_init_dummy_desc - init dma ring by dummy descriptor
+ * @tx_desc: tx ring descriptor array
+ * @len: array length
+ * @tx_pkt_dma: dummy packet dma address
+ */
+static inline void
+ice_migration_init_dummy_desc(struct ice_tx_desc *tx_desc,
+ u16 len,
+ dma_addr_t tx_pkt_dma)
+{
+ int i;
+
+ /* Init ring with dummy descriptors */
+ for (i = 0; i < len; i++) {
+ u32 td_cmd;
+
+ td_cmd = ICE_TXD_LAST_DESC_CMD | ICE_TX_DESC_CMD_DUMMY;
+ tx_desc[i].cmd_type_offset_bsz =
+ ice_build_ctob(td_cmd, 0, SZ_256, 0);
+ tx_desc[i].buf_addr = cpu_to_le64(tx_pkt_dma);
+ }
+}
+
+/**
+ * ice_migration_wait_for_tx_completion - wait for TX transmission completion
+ * @hw: pointer to the device HW structure
+ * @tx_ring: tx ring instance
+ * @head: expected tx head position when transmission completion
+ *
+ * Return 0 for success, negative for error.
+ */
+static int
+ice_migration_wait_for_tx_completion(struct ice_hw *hw,
+ struct ice_tx_ring *tx_ring, u16 head)
+{
+ u32 tx_head;
+ int i;
+
+ tx_head = rd32(hw, QTX_COMM_HEAD(tx_ring->reg_idx));
+ tx_head = (tx_head & QTX_COMM_HEAD_HEAD_M)
+ >> QTX_COMM_HEAD_HEAD_S;
+
+ for (i = 0; i < QTX_HEAD_RESTORE_DELAY_MAX && tx_head != (head - 1);
+ i++) {
+ usleep_range(QTX_HEAD_RESTORE_DELAY_SLEEP_US_MIN,
+ QTX_HEAD_RESTORE_DELAY_SLEEP_US_MAX);
+
+ tx_head = rd32(hw, QTX_COMM_HEAD(tx_ring->reg_idx));
+ tx_head = (tx_head & QTX_COMM_HEAD_HEAD_M)
+ >> QTX_COMM_HEAD_HEAD_S;
+ }
+
+ if (i == QTX_HEAD_RESTORE_DELAY_MAX)
+ return -EBUSY;
+
+ return 0;
+}
+
+/**
+ * ice_migration_inject_dummy_desc - inject dummy descriptors
+ * @vf: pointer to VF structure
+ * @tx_ring: tx ring instance
+ * @head: tx head to be loaded
+ * @tx_desc_dma:tx descriptor ring base dma address
+ *
+ * For each TX queue, load the TX head with the following steps:
+ * 1. Back up the TX queue context, then switch the context to PF space
+ * and the PF DMA ring base with the interrupt disabled.
+ * 2. Fill the DMA ring with dummy descriptors and bump the doorbell to
+ * advance the TX head. Once the doorbell is kicked, HW issues DMA and
+ * sends PCI upstream memory transactions tagged with the PF BDF. Since
+ * the ring base is a PF-managed DMA buffer, the DMA succeeds and the
+ * TX head advances as expected.
+ * 3. Overwrite the TX context with the backup from step 1. Since the TX
+ * queue head value is not changed by the context switch, the TX queue
+ * head is successfully loaded.
+ *
+ * Return 0 for success, negative for error.
+ */
+static int
+ice_migration_inject_dummy_desc(struct ice_vf *vf, struct ice_tx_ring *tx_ring,
+ u16 head, dma_addr_t tx_desc_dma)
+{
+ struct ice_tlan_ctx tlan_ctx, tlan_ctx_orig;
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_hw *hw = &vf->pf->hw;
+ u32 dynctl;
+ u32 tqctl;
+ int status;
+ int ret;
+
+ /* 1.1 Backup TX Queue context */
+ status = ice_read_txq_ctx(hw, &tlan_ctx, tx_ring->reg_idx);
+ if (status) {
+ dev_err(dev, "Failed to read TXQ[%d] context, err=%d\n",
+ tx_ring->q_index, status);
+ return -EIO;
+ }
+ memcpy(&tlan_ctx_orig, &tlan_ctx, sizeof(tlan_ctx));
+ tqctl = rd32(hw, QINT_TQCTL(tx_ring->reg_idx));
+ if (tx_ring->q_vector)
+ dynctl = rd32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx));
+
+ /* 1.2 switch TX queue context as PF space and PF DMA ring base */
+ tlan_ctx.vmvf_type = ICE_TLAN_CTX_VMVF_TYPE_PF;
+ tlan_ctx.vmvf_num = 0;
+ tlan_ctx.base = tx_desc_dma >> ICE_TLAN_CTX_BASE_S;
+ status = ice_write_txq_ctx(hw, &tlan_ctx, tx_ring->reg_idx);
+ if (status) {
+ dev_err(dev, "Failed to write TXQ[%d] context, err=%d\n",
+ tx_ring->q_index, status);
+ return -EIO;
+ }
+
+ /* 1.3 Disable TX queue interrupt */
+ wr32(hw, QINT_TQCTL(tx_ring->reg_idx), QINT_TQCTL_ITR_INDX_M);
+
+ /* To disable tx queue interrupt during run time, software should
+ * write mmio to trigger a MSIX interrupt.
+ */
+ if (tx_ring->q_vector)
+ wr32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx),
+ (ICE_ITR_NONE << GLINT_DYN_CTL_ITR_INDX_S) |
+ GLINT_DYN_CTL_SWINT_TRIG_M |
+ GLINT_DYN_CTL_INTENA_M);
+
+ /* Force memory writes to complete before letting h/w know there
+ * are new descriptors to fetch.
+ */
+ wmb();
+
+ /* 2.1 Bump doorbell to advance TX Queue head */
+ writel(head, tx_ring->tail);
+
+ /* 2.2 Wait until TX Queue head move to expected place */
+ ret = ice_migration_wait_for_tx_completion(hw, tx_ring, head);
+ if (ret) {
+ dev_err(dev, "VF %d txq[%d] head loading timeout\n",
+ vf->vf_id, tx_ring->q_index);
+ return ret;
+ }
+
+ /* 3. Overwrite TX Queue context with backup context */
+ status = ice_write_txq_ctx(hw, &tlan_ctx_orig, tx_ring->reg_idx);
+ if (status) {
+ dev_err(dev, "Failed to write TXQ[%d] context, err=%d\n",
+ tx_ring->q_index, status);
+ return -EIO;
+ }
+ wr32(hw, QINT_TQCTL(tx_ring->reg_idx), tqctl);
+ if (tx_ring->q_vector)
+ wr32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx), dynctl);
+
+ return 0;
+}
+
+/**
+ * ice_migration_load_tx_head - load tx head
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_load_tx_head(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ u16 ring_len = ICE_MAX_NUM_DESC;
+ dma_addr_t tx_desc_dma, tx_pkt_dma;
+ struct ice_tx_desc *tx_desc;
+ struct ice_vsi *vsi;
+ char *tx_pkt;
+ int ret = 0;
+ int i = 0;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ /* Allocate DMA ring and descriptor by PF */
+ tx_desc = dma_alloc_coherent(dev, ring_len * sizeof(struct ice_tx_desc),
+ &tx_desc_dma, GFP_KERNEL | __GFP_ZERO);
+ tx_pkt = dma_alloc_coherent(dev, SZ_4K, &tx_pkt_dma,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!tx_desc || !tx_pkt) {
+ dev_err(dev, "PF failed to allocate memory for VF %d\n",
+ vf->vf_id);
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ ice_for_each_txq(vsi, i) {
+ struct ice_tx_ring *tx_ring = vsi->tx_rings[i];
+ u16 *tx_heads = devstate->tx_head;
+
+ /* 1. Skip if TX Queue is not enabled */
+ if (!test_bit(i, vf->txq_ena) || tx_heads[i] == 0)
+ continue;
+
+ if (tx_heads[i] >= tx_ring->count) {
+ dev_err(dev, "VF %d: invalid tx ring length to load\n",
+ vf->vf_id);
+ ret = -EINVAL;
+ goto err;
+ }
+
+ /* Dummy descriptors must be re-initialized after use, since
+ * they may be written back by HW
+ */
+ ice_migration_init_dummy_desc(tx_desc, ring_len, tx_pkt_dma);
+ ret = ice_migration_inject_dummy_desc(vf, tx_ring, tx_heads[i],
+ tx_desc_dma);
+ if (ret)
+ goto err;
+ }
+
+err:
+ dma_free_coherent(dev, ring_len * sizeof(struct ice_tx_desc),
+ tx_desc, tx_desc_dma);
+ dma_free_coherent(dev, SZ_4K, tx_pkt, tx_pkt_dma);
+
+ return ret;
+}
+
/**
* ice_migration_load_devstate - load device state at destination
* @pf: pointer to PF of migration device
@@ -596,6 +892,16 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
msg_slot = (struct ice_migration_virtchnl_msg_slot *)
((char *)msg_slot + slot_sz);
}
+
+ /* Only load the TX Queue head after rest of device state is loaded
+ * successfully.
+ */
+ ret = ice_migration_load_tx_head(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to load tx head\n", vf->vf_id);
+ goto out_clear_replay;
+ }
+
out_clear_replay:
clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
out_put_vf:
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
index 8dbe558790af..e588712f585e 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
@@ -1351,6 +1351,24 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg)
continue;
ice_vf_ena_txq_interrupt(vsi, vf_q_id);
+
+ /* The TX head register is a shadow copy of the on-die TX head,
+ * which maintains the accurate location. The TX head register is
+ * updated only after a packet is sent. If nothing is sent after
+ * the queue is enabled, then the value is the one updated last
+ * time and therefore out-of-date.
+ *
+ * QTX_COMM_HEAD.HEAD values in the range 0x1fe0 to 0x1fff are
+ * reserved and will never be used by HW. Manually write a
+ * reserved value into TX head and use it as a marker for the
+ * case that no packet has been sent.
+ *
+ * This marker is only used in the live migration use case.
+ */
+ if (vf->migration_enabled)
+ wr32(&vsi->back->hw,
+ QTX_COMM_HEAD(vsi->txq_map[vf_q_id]),
+ QTX_COMM_HEAD_HEAD_M);
set_bit(vf_q_id, vf->txq_ena);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 10/12] ice: Add device suspend function for migration
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (8 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 09/12] ice: Save and load TX " Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 11/12] ice: Save and load mmio registers Yahui Cao
` (3 subsequent siblings)
13 siblings, 0 replies; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
The device suspend handler is called by the vfio driver before saving the
device state. Typical operations include stopping the TX/RX queues.
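A minimal sketch of where this export is expected to sit in the vfio
device-state machine; the surrounding hook is an assumption for
illustration, and only the ice_migration_suspend_dev() call is the
interface added here:

#include <linux/vfio.h>
#include <linux/net/intel/ice_migration.h>

/* Hypothetical hook invoked on the RUNNING -> STOP transition, before
 * STOP_COPY serializes the device state.
 */
static int example_ice_mig_enter_stop(struct ice_pf *pf, int vf_id,
				      enum vfio_device_mig_state new_state)
{
	if (new_state != VFIO_DEVICE_STATE_STOP)
		return 0;

	/* Quiesce the VF: drop filters and stop TX/RX rings so the
	 * device state no longer changes while it is being saved.
	 */
	return ice_migration_suspend_dev(pf, vf_id);
}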
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_migration.c | 69 +++++++++++++++++++
include/linux/net/intel/ice_migration.h | 6 ++
2 files changed, 75 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index 082ae2b79f60..a11cd0d3ad3d 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -2,6 +2,8 @@
/* Copyright (C) 2018-2023 Intel Corporation */
#include "ice.h"
+#include "ice_lib.h"
+#include "ice_fltr.h"
#include "ice_base.h"
#include "ice_txrx_lib.h"
@@ -274,6 +276,73 @@ u32 ice_migration_supported_caps(void)
return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
}
+/**
+ * ice_migration_suspend_dev - suspend device
+ * @pf: pointer to PF of migration device
+ * @vf_id: VF index of migration device
+ *
+ * Return 0 for success, negative for error
+ */
+int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id)
+{
+ struct device *dev = ice_pf_to_dev(pf);
+ struct ice_vsi *vsi;
+ struct ice_vf *vf;
+ int ret;
+
+ vf = ice_get_vf_by_id(pf, vf_id);
+ if (!vf) {
+ dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id);
+ return -EINVAL;
+ }
+
+ if (!test_bit(ICE_VF_STATE_QS_ENA, vf->vf_states)) {
+ ret = 0;
+ goto out_put_vf;
+ }
+
+ if (vf->virtchnl_msg_num > VIRTCHNL_MSG_MAX) {
+ dev_err(dev, "SR-IOV live migration disabled on VF %d. Migration buffer exceeded\n",
+ vf->vf_id);
+ ret = -EIO;
+ goto out_put_vf;
+ }
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ ret = -EINVAL;
+ goto out_put_vf;
+ }
+
+ /* Prevent VSI from queuing incoming packets by removing all filters */
+ ice_fltr_remove_all(vsi);
+
+ /* MAC based filter rule is disabled at this point. Set MAC to zero
+ * to keep consistency with VF mac address info shown by ip link
+ */
+ eth_zero_addr(vf->hw_lan_addr);
+ eth_zero_addr(vf->dev_lan_addr);
+
+ ret = ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id);
+ if (ret) {
+ dev_err(dev, "VF %d failed to stop tx rings\n", vf->vf_id);
+ ret = -EIO;
+ goto out_put_vf;
+ }
+ ret = ice_vsi_stop_all_rx_rings(vsi);
+ if (ret) {
+ dev_err(dev, "VF %d failed to stop rx rings\n", vf->vf_id);
+ ret = -EIO;
+ goto out_put_vf;
+ }
+
+out_put_vf:
+ ice_put_vf(vf);
+ return ret;
+}
+EXPORT_SYMBOL(ice_migration_suspend_dev);
+
/**
* ice_migration_save_rx_head - save rx head into device state buffer
* @vf: pointer to VF structure
diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h
index a142b78283a8..47f46dca07ae 100644
--- a/include/linux/net/intel/ice_migration.h
+++ b/include/linux/net/intel/ice_migration.h
@@ -14,6 +14,7 @@ int ice_migration_save_devstate(struct ice_pf *pf, int vf_id,
u8 *buf, u64 buf_sz);
int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
const u8 *buf, u64 buf_sz);
+int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id);
#else
static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev)
{
@@ -37,6 +38,11 @@ static inline int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
{
return 0;
}
+
+static inline int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id)
+{
+ return 0;
+}
#endif /* CONFIG_ICE_VFIO_PCI */
#endif /* _ICE_MIGRATION_H_ */
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 11/12] ice: Save and load mmio registers
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (9 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 10/12] ice: Add device suspend function for migration Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices Yahui Cao
` (2 subsequent siblings)
13 siblings, 0 replies; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
In the E800 device model, the VF takes direct control over the AdminQ,
irq control, TX tail and RX tail context by accessing VF PCI MMIO. The
rest of the state can only be set up by the PF: the VF sends the
configuration to the PF through virtual channel messages and the PF
programs it on the VF's behalf.
To migrate the AdminQ/irq context successfully, only the AdminQ/irq
registers need to be loaded; the remaining parts, like generic MSI-X,
are handled by the migration stack.
To migrate the RX DMA ring successfully, the RX ring base and length
(set up via virtual channel messages) and the tail register (set up via
VF PCI MMIO) must be loaded before the RX queue is enabled.
To migrate the TX DMA ring successfully, the TX ring base and length
(set up via virtual channel messages) must be loaded before the TX queue
is enabled. The TX tail (set up via VF PCI MMIO) does not need to be
loaded since the TX queue is drained before migration and the TX tail is
stateless.
For simplicity, just load all the VF PCI MMIO before the virtual channel
messages are replayed so that the complete TX/RX ring context is in place
before any queue is enabled.
However, there are two corner cases which need to be taken care of:
- During device suspension, the irq registers may be dirtied when the
queues are stopped. Hence save the irq registers into an internal
pre-saved area before the queues are stopped and fetch the pre-saved
values at the device saving stage.
- When the PF processes the virtual channel message
VIRTCHNL_OP_CONFIG_VSI_QUEUES, the irq registers may be dirtied. Hence
load the affected irq registers again after the virtual channel
messages are replayed.
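As an illustrative sketch of the resulting load ordering on the
destination (replay_logged_virtchnl_msgs() below is only a stand-in for
the replay loop that is inlined in ice_migration_load_devstate()):

	/* Sketch of the ice_migration_load_devstate() ordering */
	ice_migration_load_regs(vf, devstate);       /* AdminQ/irq regs, RX tail */
	replay_logged_virtchnl_msgs(vf, devstate);   /* re-enables queues, may
						      * dirty irq registers
						      */
	ice_migration_load_dirty_regs(vf, devstate); /* reload irq regs dirtied
						      * by the replay
						      */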
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
.../net/ethernet/intel/ice/ice_hw_autogen.h | 8 +
.../net/ethernet/intel/ice/ice_migration.c | 308 ++++++++++++++++++
.../intel/ice/ice_migration_private.h | 7 +
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 2 +
4 files changed, 325 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 7410da715ad4..389bf00411ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -31,8 +31,16 @@
#define PF_FW_ATQLEN_ATQVFE_M BIT(28)
#define PF_FW_ATQLEN_ATQOVFL_M BIT(29)
#define PF_FW_ATQLEN_ATQCRIT_M BIT(30)
+#define VF_MBX_ARQBAH(_VF) (0x0022B800 + ((_VF) * 4))
+#define VF_MBX_ARQBAL(_VF) (0x0022B400 + ((_VF) * 4))
+#define VF_MBX_ARQH(_VF) (0x0022C000 + ((_VF) * 4))
#define VF_MBX_ARQLEN(_VF) (0x0022BC00 + ((_VF) * 4))
+#define VF_MBX_ARQT(_VF) (0x0022C400 + ((_VF) * 4))
+#define VF_MBX_ATQBAH(_VF) (0x0022A400 + ((_VF) * 4))
+#define VF_MBX_ATQBAL(_VF) (0x0022A000 + ((_VF) * 4))
+#define VF_MBX_ATQH(_VF) (0x0022AC00 + ((_VF) * 4))
#define VF_MBX_ATQLEN(_VF) (0x0022A800 + ((_VF) * 4))
+#define VF_MBX_ATQT(_VF) (0x0022B000 + ((_VF) * 4))
#define PF_FW_ATQLEN_ATQENABLE_M BIT(31)
#define PF_FW_ATQT 0x00080400
#define PF_MBX_ARQBAH 0x0022E400
diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
index a11cd0d3ad3d..127d45be6767 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration.c
+++ b/drivers/net/ethernet/intel/ice/ice_migration.c
@@ -25,6 +25,27 @@ struct ice_migration_virtchnl_msg_listnode {
struct ice_migration_virtchnl_msg_slot msg_slot;
};
+struct ice_migration_mmio_regs {
+ /* VF Interrupts */
+ u32 int_dyn_ctl[ICE_MIG_VF_MSIX_MAX];
+ u32 int_intr[ICE_MIG_VF_ITR_NUM][ICE_MIG_VF_MSIX_MAX];
+
+ /* VF Control Queues */
+ u32 asq_bal;
+ u32 asq_bah;
+ u32 asq_len;
+ u32 asq_head;
+ u32 asq_tail;
+ u32 arq_bal;
+ u32 arq_bah;
+ u32 arq_len;
+ u32 arq_head;
+ u32 arq_tail;
+
+ /* VF LAN RX */
+ u32 rx_tail[ICE_MIG_VF_QRX_TAIL_MAX];
+};
+
struct ice_migration_dev_state {
u32 magic;
u32 version;
@@ -33,6 +54,7 @@ struct ice_migration_dev_state {
u16 num_txq;
u16 num_rxq;
+ struct ice_migration_mmio_regs regs;
u16 vsi_id;
/* next RX desc index to be processed by the device */
u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX];
@@ -276,6 +298,57 @@ u32 ice_migration_supported_caps(void)
return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
}
+/**
+ * ice_migration_save_dirty_regs - save registers which may be dirtied
+ * @vf: pointer to VF structure
+ *
+ * Return 0 for success, negative for error
+ */
+static int ice_migration_save_dirty_regs(struct ice_vf *vf)
+{
+ struct ice_migration_dirty_regs *dirty_regs = &vf->dirty_regs;
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_hw *hw = &vf->pf->hw;
+ struct ice_vsi *vsi;
+ int itr, v_id;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF >
+ ICE_MIG_VF_MSIX_MAX))
+ return -EINVAL;
+
+ /* Save Mailbox Q vectors */
+ dirty_regs->int_dyn_ctl[0] =
+ rd32(hw, GLINT_DYN_CTL(vf->first_vector_idx));
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ dirty_regs->int_intr[itr][0] =
+ rd32(hw, GLINT_ITR(itr, vf->first_vector_idx));
+
+ /* Save Data Q vectors */
+ for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) {
+ int irq = v_id + ICE_NONQ_VECS_VF;
+ struct ice_q_vector *q_vector;
+
+ q_vector = vsi->q_vectors[v_id];
+ if (!q_vector) {
+ dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id);
+ return -EINVAL;
+ }
+ dirty_regs->int_dyn_ctl[irq] =
+ rd32(hw, GLINT_DYN_CTL(q_vector->reg_idx));
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ dirty_regs->int_intr[itr][irq] =
+ rd32(hw, GLINT_ITR(itr, q_vector->reg_idx));
+ }
+
+ return 0;
+}
+
/**
* ice_migration_suspend_dev - suspend device
* @pf: pointer to PF of migration device
@@ -324,6 +397,15 @@ int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id)
eth_zero_addr(vf->hw_lan_addr);
eth_zero_addr(vf->dev_lan_addr);
+ /* The irq registers may be dirtied when stopping the queues. Hence
+ * save them into the pre-saved area before the queues are stopped.
+ */
+ ret = ice_migration_save_dirty_regs(vf);
+ if (ret) {
+ dev_err(dev, "VF %d failed to save dirty register copy\n",
+ vf->vf_id);
+ goto out_put_vf;
+ }
ret = ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id);
if (ret) {
dev_err(dev, "VF %d failed to stop tx rings\n", vf->vf_id);
@@ -447,6 +529,84 @@ ice_migration_save_tx_head(struct ice_vf *vf,
return 0;
}
+/**
+ * ice_migration_save_regs - save mmio registers in migration region
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_save_regs(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct ice_migration_dirty_regs *dirty_regs = &vf->dirty_regs;
+ struct ice_migration_mmio_regs *regs = &devstate->regs;
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_hw *hw = &vf->pf->hw;
+ struct ice_vsi *vsi;
+ int i, itr, v_id;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF >
+ ICE_MIG_VF_MSIX_MAX))
+ return -EINVAL;
+
+ /* For the irq registers which may be dirtied when the virtual channel
+ * message VIRTCHNL_OP_CONFIG_VSI_QUEUES is processed, take the values
+ * from the pre-saved area.
+ */
+
+ /* Save Mailbox Q vectors */
+ regs->int_dyn_ctl[0] = dirty_regs->int_dyn_ctl[0];
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ regs->int_intr[itr][0] = dirty_regs->int_intr[itr][0];
+
+ /* Save Data Q vectors */
+ for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) {
+ int irq = v_id + ICE_NONQ_VECS_VF;
+ struct ice_q_vector *q_vector;
+
+ q_vector = vsi->q_vectors[v_id];
+ if (!q_vector) {
+ dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id);
+ return -EINVAL;
+ }
+ regs->int_dyn_ctl[irq] = dirty_regs->int_dyn_ctl[irq];
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ regs->int_intr[itr][irq] =
+ dirty_regs->int_intr[itr][irq];
+ }
+
+ regs->asq_bal = rd32(hw, VF_MBX_ATQBAL(vf->vf_id));
+ regs->asq_bah = rd32(hw, VF_MBX_ATQBAH(vf->vf_id));
+ regs->asq_len = rd32(hw, VF_MBX_ATQLEN(vf->vf_id));
+ regs->asq_head = rd32(hw, VF_MBX_ATQH(vf->vf_id));
+ regs->asq_tail = rd32(hw, VF_MBX_ATQT(vf->vf_id));
+ regs->arq_bal = rd32(hw, VF_MBX_ARQBAL(vf->vf_id));
+ regs->arq_bah = rd32(hw, VF_MBX_ARQBAH(vf->vf_id));
+ regs->arq_len = rd32(hw, VF_MBX_ARQLEN(vf->vf_id));
+ regs->arq_head = rd32(hw, VF_MBX_ARQH(vf->vf_id));
+ regs->arq_tail = rd32(hw, VF_MBX_ARQT(vf->vf_id));
+
+ ice_for_each_rxq(vsi, i) {
+ struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
+
+ regs->rx_tail[i] = 0;
+ if (!test_bit(i, vf->rxq_ena))
+ continue;
+
+ regs->rx_tail[i] = rd32(hw, QRX_TAIL(rx_ring->reg_idx));
+ }
+
+ return 0;
+}
+
/**
* ice_migration_save_devstate - save device state to migration buffer
* @pf: pointer to PF of migration device
@@ -501,6 +661,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz)
buf = devstate->virtchnl_msgs;
devstate->vsi_id = vf->vm_vsi_num;
+ ret = ice_migration_save_regs(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to save mmio register\n", vf->vf_id);
+ goto out_put_vf;
+ }
+
ret = ice_migration_save_rx_head(vf, devstate);
if (ret) {
dev_err(dev, "VF %d failed to save rxq head\n", vf->vf_id);
@@ -883,6 +1049,125 @@ ice_migration_load_tx_head(struct ice_vf *vf,
return ret;
}
+/**
+ * ice_migration_load_regs - load mmio registers from device state buffer
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_load_regs(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct ice_migration_mmio_regs *regs = &devstate->regs;
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_hw *hw = &vf->pf->hw;
+ struct ice_vsi *vsi;
+ int i, itr, v_id;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF >
+ ICE_MIG_VF_MSIX_MAX))
+ return -EINVAL;
+
+ /* Restore Mailbox Q vectors */
+ wr32(hw, GLINT_DYN_CTL(vf->first_vector_idx), regs->int_dyn_ctl[0]);
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ wr32(hw, GLINT_ITR(itr, vf->first_vector_idx),
+ regs->int_intr[itr][0]);
+
+ /* Restore Data Q vectors */
+ for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) {
+ int irq = v_id + ICE_NONQ_VECS_VF;
+ struct ice_q_vector *q_vector;
+
+ q_vector = vsi->q_vectors[v_id];
+ if (!q_vector) {
+ dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id);
+ return -EINVAL;
+ }
+ wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx),
+ regs->int_dyn_ctl[irq]);
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ wr32(hw, GLINT_ITR(itr, q_vector->reg_idx),
+ regs->int_intr[itr][irq]);
+ }
+
+ wr32(hw, VF_MBX_ATQBAL(vf->vf_id), regs->asq_bal);
+ wr32(hw, VF_MBX_ATQBAH(vf->vf_id), regs->asq_bah);
+ wr32(hw, VF_MBX_ATQLEN(vf->vf_id), regs->asq_len);
+ wr32(hw, VF_MBX_ATQH(vf->vf_id), regs->asq_head);
+ /* Since the Mailbox control TX queue tail is bumped by the VF driver
+ * to notify HW to send packets, VF_MBX_ATQT does not need to be loaded here.
+ */
+ wr32(hw, VF_MBX_ARQBAL(vf->vf_id), regs->arq_bal);
+ wr32(hw, VF_MBX_ARQBAH(vf->vf_id), regs->arq_bah);
+ wr32(hw, VF_MBX_ARQLEN(vf->vf_id), regs->arq_len);
+ wr32(hw, VF_MBX_ARQH(vf->vf_id), regs->arq_head);
+ wr32(hw, VF_MBX_ARQT(vf->vf_id), regs->arq_tail);
+
+ ice_for_each_rxq(vsi, i) {
+ struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
+
+ wr32(hw, QRX_TAIL(rx_ring->reg_idx), regs->rx_tail[i]);
+ }
+
+ return 0;
+}
+
+/**
+ * ice_migration_load_dirty_regs - load registers which may be dirtied
+ * @vf: pointer to VF structure
+ * @devstate: pointer to migration device state
+ *
+ * Return 0 for success, negative for error
+ */
+static int
+ice_migration_load_dirty_regs(struct ice_vf *vf,
+ struct ice_migration_dev_state *devstate)
+{
+ struct ice_migration_mmio_regs *regs = &devstate->regs;
+ struct device *dev = ice_pf_to_dev(vf->pf);
+ struct ice_hw *hw = &vf->pf->hw;
+ struct ice_vsi *vsi;
+ int itr, v_id;
+
+ vsi = ice_get_vf_vsi(vf);
+ if (!vsi) {
+ dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
+ return -EINVAL;
+ }
+
+ if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF >
+ ICE_MIG_VF_MSIX_MAX))
+ return -EINVAL;
+
+ /* Restore Data Q vectors */
+ for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) {
+ int irq = v_id + ICE_NONQ_VECS_VF;
+ struct ice_q_vector *q_vector;
+
+ q_vector = vsi->q_vectors[v_id];
+ if (!q_vector) {
+ dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id);
+ return -EINVAL;
+ }
+ wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx),
+ regs->int_dyn_ctl[irq]);
+ for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++)
+ wr32(hw, GLINT_ITR(itr, q_vector->reg_idx),
+ regs->int_intr[itr][irq]);
+ }
+
+ return 0;
+}
+
/**
* ice_migration_load_devstate - load device state at destination
* @pf: pointer to PF of migration device
@@ -920,6 +1205,18 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
devstate = (struct ice_migration_dev_state *)buf;
vf->vm_vsi_num = devstate->vsi_id;
dev_dbg(dev, "VF %d vm vsi num is:%d\n", vf->vf_id, vf->vm_vsi_num);
+
+ /* RX tail register must be loaded before queue is enabled. For
+ * simplicity, just load all the mmio before virtual channel messages
+ * are replayed.
+ */
+ ret = ice_migration_load_regs(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to load mmio registers\n",
+ vf->vf_id);
+ goto out_put_vf;
+ }
+
msg_slot = (struct ice_migration_virtchnl_msg_slot *)
devstate->virtchnl_msgs;
set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
@@ -971,6 +1268,17 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id,
goto out_clear_replay;
}
+ /* When the PF processes virtual channel VIRTCHNL_OP_CONFIG_VSI_QUEUES,
+ * the irq registers may be dirtied. Hence load the affected irq registers
+ * again after the virtual channel messages are replayed.
+ */
+ ret = ice_migration_load_dirty_regs(vf, devstate);
+ if (ret) {
+ dev_err(dev, "VF %d failed to load dirty registers\n",
+ vf->vf_id);
+ goto out_clear_replay;
+ }
+
out_clear_replay:
clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
out_put_vf:
diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h
index f72a488d9002..b76eb05747c8 100644
--- a/drivers/net/ethernet/intel/ice/ice_migration_private.h
+++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h
@@ -10,6 +10,13 @@
* in ice-vfio-pic.ko should be exposed as part of ice_migration.h.
*/
+#define ICE_MIG_VF_MSIX_MAX 65
+#define ICE_MIG_VF_ITR_NUM 4
+struct ice_migration_dirty_regs {
+ u32 int_dyn_ctl[ICE_MIG_VF_MSIX_MAX];
+ u32 int_intr[ICE_MIG_VF_ITR_NUM][ICE_MIG_VF_MSIX_MAX];
+};
+
#if IS_ENABLED(CONFIG_ICE_VFIO_PCI)
void ice_migration_init_vf(struct ice_vf *vf);
void ice_migration_uninit_vf(struct ice_vf *vf);
diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
index 49d99694e91f..c971fb47c2ff 100644
--- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
@@ -14,6 +14,7 @@
#include "ice_type.h"
#include "ice_virtchnl_fdir.h"
#include "ice_vsi_vlan_ops.h"
+#include "ice_migration_private.h"
#define ICE_MAX_SRIOV_VFS 256
@@ -147,6 +148,7 @@ struct ice_vf {
u64 virtchnl_msg_size;
u32 virtchnl_retval;
u16 vm_vsi_num;
+ struct ice_migration_dirty_regs dirty_regs;
};
/* Flags for controlling behavior of ice_reset_vf */
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (10 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 11/12] ice: Save and load mmio registers Yahui Cao
@ 2023-11-21 2:51 ` Yahui Cao
2023-12-07 22:43 ` Alex Williamson
2023-12-04 11:18 ` [PATCH iwl-next v4 00/12] Add E800 live migration driver Cao, Yahui
2024-01-18 22:09 ` Jacob Keller
13 siblings, 1 reply; 33+ messages in thread
From: Yahui Cao @ 2023-11-21 2:51 UTC (permalink / raw)
To: intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
From: Lingyu Liu <lingyu.liu@intel.com>
Add a vendor-specific vfio_pci driver for E800 devices. It uses
vfio_pci_core to register with the VFIO subsystem and implements the
E800-specific logic to support VF live migration, including the device
state transition flow.
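For reference, the device state arcs handled by
ice_vfio_pci_step_device_state_locked() below are roughly:

	RUNNING     -> RUNNING_P2P : suspend the VF (stop TX/RX queues)
	RUNNING_P2P -> STOP        : no-op
	STOP        -> STOP_COPY   : save device state into the saving migration file
	STOP_COPY   -> STOP        : release migration files
	STOP        -> RESUMING    : create the resuming migration file
	RESUMING    -> STOP        : no-op
	STOP        -> RUNNING_P2P : load device state, release migration files
	RUNNING_P2P -> RUNNING     : no-op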
Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
---
MAINTAINERS | 7 +
drivers/vfio/pci/Kconfig | 2 +
drivers/vfio/pci/Makefile | 2 +
drivers/vfio/pci/ice/Kconfig | 10 +
drivers/vfio/pci/ice/Makefile | 4 +
drivers/vfio/pci/ice/ice_vfio_pci.c | 707 ++++++++++++++++++++++++++++
6 files changed, 732 insertions(+)
create mode 100644 drivers/vfio/pci/ice/Kconfig
create mode 100644 drivers/vfio/pci/ice/Makefile
create mode 100644 drivers/vfio/pci/ice/ice_vfio_pci.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 97f51d5ec1cf..c8faf7fe1bd1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22860,6 +22860,13 @@ L: kvm@vger.kernel.org
S: Maintained
F: drivers/vfio/pci/mlx5/
+VFIO ICE PCI DRIVER
+M: Yahui Cao <yahui.cao@intel.com>
+M: Lingyu Liu <lingyu.liu@intel.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: drivers/vfio/pci/ice/
+
VFIO PCI DEVICE SPECIFIC DRIVERS
R: Jason Gunthorpe <jgg@nvidia.com>
R: Yishai Hadas <yishaih@nvidia.com>
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..6618208947af 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
source "drivers/vfio/pci/pds/Kconfig"
+source "drivers/vfio/pci/ice/Kconfig"
+
endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..fc1df82df3ac 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_ICE_VFIO_PCI) += ice/
diff --git a/drivers/vfio/pci/ice/Kconfig b/drivers/vfio/pci/ice/Kconfig
new file mode 100644
index 000000000000..0b8cd1489073
--- /dev/null
+++ b/drivers/vfio/pci/ice/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config ICE_VFIO_PCI
+ tristate "VFIO support for Intel(R) Ethernet Connection E800 Series"
+ depends on ICE
+ select VFIO_PCI_CORE
+ help
+ This provides migration support for Intel(R) Ethernet connection E800
+ series devices using the VFIO framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/ice/Makefile b/drivers/vfio/pci/ice/Makefile
new file mode 100644
index 000000000000..259d4ab89105
--- /dev/null
+++ b/drivers/vfio/pci/ice/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_ICE_VFIO_PCI) += ice-vfio-pci.o
+ice-vfio-pci-y := ice_vfio_pci.o
+
diff --git a/drivers/vfio/pci/ice/ice_vfio_pci.c b/drivers/vfio/pci/ice/ice_vfio_pci.c
new file mode 100644
index 000000000000..28a181aa2f3f
--- /dev/null
+++ b/drivers/vfio/pci/ice/ice_vfio_pci.c
@@ -0,0 +1,707 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2018-2023 Intel Corporation */
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/file.h>
+#include <linux/pci.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/net/intel/ice_migration.h>
+#include <linux/anon_inodes.h>
+
+#define DRIVER_DESC "ICE VFIO PCI - User Level meta-driver for Intel E800 device family"
+
+struct ice_vfio_pci_migration_file {
+ struct file *filp;
+ struct mutex lock; /* protect migration file access */
+ bool disabled;
+
+ u8 mig_data[SZ_128K];
+ size_t total_length;
+};
+
+struct ice_vfio_pci_core_device {
+ struct vfio_pci_core_device core_device;
+ u8 deferred_reset:1;
+ struct mutex state_mutex; /* protect migration state */
+ enum vfio_device_mig_state mig_state;
+ /* protect the reset_done flow */
+ spinlock_t reset_lock;
+ struct ice_vfio_pci_migration_file *resuming_migf;
+ struct ice_vfio_pci_migration_file *saving_migf;
+ struct vfio_device_migration_info mig_info;
+ u8 *mig_data;
+ struct ice_pf *pf;
+ int vf_id;
+};
+
+/**
+ * ice_vfio_pci_load_state - VFIO device state reloading
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ *
+ * Load device state. This function is called when the userspace VFIO uAPI
+ * consumer wants to load the device state info from the VFIO migration
+ * region into the device. This function should make sure all the device
+ * state info is loaded successfully. As a result, the return value must
+ * be checked.
+ *
+ * Return 0 for success, negative value for failure.
+ */
+static int __must_check
+ice_vfio_pci_load_state(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ struct ice_vfio_pci_migration_file *migf = ice_vdev->resuming_migf;
+
+ return ice_migration_load_devstate(ice_vdev->pf,
+ ice_vdev->vf_id,
+ migf->mig_data,
+ migf->total_length);
+}
+
+/**
+ * ice_vfio_pci_save_state - VFIO device state saving
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ * @migf: pointer to migration file
+ *
+ * Snapshot the device state and save it. This function is called when the
+ * VFIO uAPI consumer wants to snapshot the current device state and save
+ * it into the VFIO migration region. This function should make sure all
+ * of the device state info is collected and saved successfully. As a
+ * result, the return value must be checked.
+ *
+ * Return 0 for success, negative value for failure.
+ */
+static int __must_check
+ice_vfio_pci_save_state(struct ice_vfio_pci_core_device *ice_vdev,
+ struct ice_vfio_pci_migration_file *migf)
+{
+ migf->total_length = SZ_128K;
+
+ return ice_migration_save_devstate(ice_vdev->pf,
+ ice_vdev->vf_id,
+ migf->mig_data,
+ migf->total_length);
+}
+
+/**
+ * ice_vfio_migration_init - Initialization for live migration function
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ *
+ * Returns 0 on success, negative value on error
+ */
+static int ice_vfio_migration_init(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ struct pci_dev *pdev = ice_vdev->core_device.pdev;
+
+ ice_vdev->pf = ice_migration_get_pf(pdev);
+ if (!ice_vdev->pf)
+ return -EFAULT;
+
+ ice_vdev->vf_id = pci_iov_vf_id(pdev);
+ if (ice_vdev->vf_id < 0)
+ return -EINVAL;
+
+ return ice_migration_init_dev(ice_vdev->pf, ice_vdev->vf_id);
+}
+
+/**
+ * ice_vfio_migration_uninit - Cleanup for live migration function
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ */
+static void ice_vfio_migration_uninit(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ ice_migration_uninit_dev(ice_vdev->pf, ice_vdev->vf_id);
+}
+
+/**
+ * ice_vfio_pci_disable_fd - Close migration file
+ * @migf: pointer to ice vfio pci migration file
+ */
+static void ice_vfio_pci_disable_fd(struct ice_vfio_pci_migration_file *migf)
+{
+ mutex_lock(&migf->lock);
+ migf->disabled = true;
+ migf->total_length = 0;
+ migf->filp->f_pos = 0;
+ mutex_unlock(&migf->lock);
+}
+
+/**
+ * ice_vfio_pci_disable_fds - Close migration files of ice vfio pci device
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ */
+static void ice_vfio_pci_disable_fds(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ if (ice_vdev->resuming_migf) {
+ ice_vfio_pci_disable_fd(ice_vdev->resuming_migf);
+ fput(ice_vdev->resuming_migf->filp);
+ ice_vdev->resuming_migf = NULL;
+ }
+ if (ice_vdev->saving_migf) {
+ ice_vfio_pci_disable_fd(ice_vdev->saving_migf);
+ fput(ice_vdev->saving_migf->filp);
+ ice_vdev->saving_migf = NULL;
+ }
+}
+
+/*
+ * This function is called in all state_mutex unlock cases to
+ * handle a 'deferred_reset' if it exists.
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ */
+static void
+ice_vfio_pci_state_mutex_unlock(struct ice_vfio_pci_core_device *ice_vdev)
+{
+again:
+ spin_lock(&ice_vdev->reset_lock);
+ if (ice_vdev->deferred_reset) {
+ ice_vdev->deferred_reset = false;
+ spin_unlock(&ice_vdev->reset_lock);
+ ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ ice_vfio_pci_disable_fds(ice_vdev);
+ goto again;
+ }
+ mutex_unlock(&ice_vdev->state_mutex);
+ spin_unlock(&ice_vdev->reset_lock);
+}
+
+static void ice_vfio_pci_reset_done(struct pci_dev *pdev)
+{
+ struct ice_vfio_pci_core_device *ice_vdev =
+ (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev);
+
+ /*
+ * As the higher VFIO layers are holding locks across reset and using
+ * those same locks with the mm_lock we need to prevent ABBA deadlock
+ * with the state_mutex and mm_lock.
+ * In case the state_mutex was taken already we defer the cleanup work
+ * to the unlock flow of the other running context.
+ */
+ spin_lock(&ice_vdev->reset_lock);
+ ice_vdev->deferred_reset = true;
+ if (!mutex_trylock(&ice_vdev->state_mutex)) {
+ spin_unlock(&ice_vdev->reset_lock);
+ return;
+ }
+ spin_unlock(&ice_vdev->reset_lock);
+ ice_vfio_pci_state_mutex_unlock(ice_vdev);
+}
+
+/**
+ * ice_vfio_pci_open_device - Called when a vfio device is probed by VFIO UAPI
+ * @core_vdev: the vfio device to open
+ *
+ * Initialization of the vfio device
+ *
+ * Returns 0 on success, negative value on error
+ */
+static int ice_vfio_pci_open_device(struct vfio_device *core_vdev)
+{
+ struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
+ struct ice_vfio_pci_core_device, core_device.vdev);
+ struct vfio_pci_core_device *vdev = &ice_vdev->core_device;
+ int ret;
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ ret = ice_vfio_migration_init(ice_vdev);
+ if (ret) {
+ vfio_pci_core_disable(vdev);
+ return ret;
+ }
+ ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ vfio_pci_core_finish_enable(vdev);
+
+ return 0;
+}
+
+/**
+ * ice_vfio_pci_close_device - Called when a vfio device fd is closed
+ * @core_vdev: the vfio device to close
+ */
+static void ice_vfio_pci_close_device(struct vfio_device *core_vdev)
+{
+ struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
+ struct ice_vfio_pci_core_device, core_device.vdev);
+
+ ice_vfio_pci_disable_fds(ice_vdev);
+ vfio_pci_core_close_device(core_vdev);
+ ice_vfio_migration_uninit(ice_vdev);
+}
+
+/**
+ * ice_vfio_pci_release_file - release ice vfio pci migration file
+ * @inode: pointer to inode
+ * @filp: pointer to the file to release
+ *
+ * Return 0 for success, negative for error
+ */
+static int ice_vfio_pci_release_file(struct inode *inode, struct file *filp)
+{
+ struct ice_vfio_pci_migration_file *migf = filp->private_data;
+
+ ice_vfio_pci_disable_fd(migf);
+ mutex_destroy(&migf->lock);
+ kfree(migf);
+ return 0;
+}
+
+/**
+ * ice_vfio_pci_save_read - save migration file data to user space
+ * @filp: pointer to migration file
+ * @buf: pointer to user space buffer
+ * @len: data length to be saved
+ * @pos: should be 0
+ *
+ * Return len of saved data, negative for error
+ */
+static ssize_t ice_vfio_pci_save_read(struct file *filp, char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct ice_vfio_pci_migration_file *migf = filp->private_data;
+ loff_t *off = &filp->f_pos;
+ ssize_t done = 0;
+ int ret;
+
+ if (pos)
+ return -ESPIPE;
+
+ mutex_lock(&migf->lock);
+ if (*off > migf->total_length) {
+ done = -EINVAL;
+ goto out_unlock;
+ }
+
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ len = min_t(size_t, migf->total_length - *off, len);
+ if (len) {
+ ret = copy_to_user(buf, migf->mig_data + *off, len);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *off += len;
+ done = len;
+ }
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations ice_vfio_pci_save_fops = {
+ .owner = THIS_MODULE,
+ .read = ice_vfio_pci_save_read,
+ .release = ice_vfio_pci_release_file,
+ .llseek = no_llseek,
+};
+
+/**
+ * ice_vfio_pci_stop_copy - create migration file and save migration state to it
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ *
+ * Return migration file handler for success, error pointer for failure
+ */
+static struct ice_vfio_pci_migration_file *
+ice_vfio_pci_stop_copy(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ struct ice_vfio_pci_migration_file *migf;
+ int ret;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("ice_vfio_pci_mig",
+ &ice_vfio_pci_save_fops, migf,
+ O_RDONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+
+ ret = ice_vfio_pci_save_state(ice_vdev, migf);
+ if (ret) {
+ fput(migf->filp);
+ kfree(migf);
+ return ERR_PTR(ret);
+ }
+
+ return migf;
+}
+
+/**
+ * ice_vfio_pci_resume_write - copy migration file data from user space
+ * @filp: pointer to migration file
+ * @buf: pointer to user space buffer
+ * @len: data length to be copied
+ * @pos: should be 0
+ *
+ * Return len of saved data, negative for error
+ */
+static ssize_t
+ice_vfio_pci_resume_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct ice_vfio_pci_migration_file *migf = filp->private_data;
+ loff_t *off = &filp->f_pos;
+ loff_t requested_length;
+ ssize_t done = 0;
+ int ret;
+
+ if (pos)
+ return -ESPIPE;
+
+ if (*off < 0 ||
+ check_add_overflow((loff_t)len, *off, &requested_length))
+ return -EINVAL;
+
+ if (requested_length > sizeof(migf->mig_data))
+ return -ENOMEM;
+
+ mutex_lock(&migf->lock);
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ ret = copy_from_user(migf->mig_data + *off, buf, len);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *off += len;
+ done = len;
+ migf->total_length += len;
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations ice_vfio_pci_resume_fops = {
+ .owner = THIS_MODULE,
+ .write = ice_vfio_pci_resume_write,
+ .release = ice_vfio_pci_release_file,
+ .llseek = no_llseek,
+};
+
+/**
+ * ice_vfio_pci_resume - create resuming migration file
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ *
+ * Return migration file handler for success, error pointer for failure
+ */
+static struct ice_vfio_pci_migration_file *
+ice_vfio_pci_resume(struct ice_vfio_pci_core_device *ice_vdev)
+{
+ struct ice_vfio_pci_migration_file *migf;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("ice_vfio_pci_mig",
+ &ice_vfio_pci_resume_fops, migf,
+ O_WRONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+ return migf;
+}
+
+/**
+ * ice_vfio_pci_step_device_state_locked - process device state change
+ * @ice_vdev: pointer to ice vfio pci core device structure
+ * @new: new device state
+ * @final: final device state
+ *
+ * Return migration file handler or NULL for success, error pointer for failure
+ */
+static struct file *
+ice_vfio_pci_step_device_state_locked(struct ice_vfio_pci_core_device *ice_vdev,
+ u32 new, u32 final)
+{
+ u32 cur = ice_vdev->mig_state;
+ int ret;
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING &&
+ new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+ ice_migration_suspend_dev(ice_vdev->pf, ice_vdev->vf_id);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
+ new == VFIO_DEVICE_STATE_STOP)
+ return NULL;
+
+ if (cur == VFIO_DEVICE_STATE_STOP &&
+ new == VFIO_DEVICE_STATE_STOP_COPY) {
+ struct ice_vfio_pci_migration_file *migf;
+
+ migf = ice_vfio_pci_stop_copy(ice_vdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ ice_vdev->saving_migf = migf;
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP_COPY &&
+ new == VFIO_DEVICE_STATE_STOP) {
+ ice_vfio_pci_disable_fds(ice_vdev);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP &&
+ new == VFIO_DEVICE_STATE_RESUMING) {
+ struct ice_vfio_pci_migration_file *migf;
+
+ migf = ice_vfio_pci_resume(ice_vdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ ice_vdev->resuming_migf = migf;
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP)
+ return NULL;
+
+ if (cur == VFIO_DEVICE_STATE_STOP &&
+ new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+ ret = ice_vfio_pci_load_state(ice_vdev);
+ if (ret)
+ return ERR_PTR(ret);
+ ice_vfio_pci_disable_fds(ice_vdev);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
+ new == VFIO_DEVICE_STATE_RUNNING)
+ return NULL;
+
+ /*
+ * vfio_mig_get_next_state() does not use arcs other than the above
+ */
+ WARN_ON(true);
+ return ERR_PTR(-EINVAL);
+}
+
+/**
+ * ice_vfio_pci_set_device_state - Config device state
+ * @vdev: pointer to vfio pci device
+ * @new_state: device state
+ *
+ * Return migration file handler or NULL for success, error pointer for failure.
+ */
+static struct file *
+ice_vfio_pci_set_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state new_state)
+{
+ struct ice_vfio_pci_core_device *ice_vdev =
+ container_of(vdev,
+ struct ice_vfio_pci_core_device,
+ core_device.vdev);
+ enum vfio_device_mig_state next_state;
+ struct file *res = NULL;
+ int ret;
+
+ mutex_lock(&ice_vdev->state_mutex);
+ while (new_state != ice_vdev->mig_state) {
+ ret = vfio_mig_get_next_state(vdev, ice_vdev->mig_state,
+ new_state, &next_state);
+ if (ret) {
+ res = ERR_PTR(ret);
+ break;
+ }
+ res = ice_vfio_pci_step_device_state_locked(ice_vdev,
+ next_state,
+ new_state);
+ if (IS_ERR(res))
+ break;
+ ice_vdev->mig_state = next_state;
+ if (WARN_ON(res && new_state != ice_vdev->mig_state)) {
+ fput(res);
+ res = ERR_PTR(-EINVAL);
+ break;
+ }
+ }
+ ice_vfio_pci_state_mutex_unlock(ice_vdev);
+ return res;
+}
+
+/**
+ * ice_vfio_pci_get_device_state - get device state
+ * @vdev: pointer to vfio pci device
+ * @curr_state: device state
+ *
+ * Return 0 for success
+ */
+static int ice_vfio_pci_get_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state *curr_state)
+{
+ struct ice_vfio_pci_core_device *ice_vdev =
+ container_of(vdev,
+ struct ice_vfio_pci_core_device,
+ core_device.vdev);
+ mutex_lock(&ice_vdev->state_mutex);
+ *curr_state = ice_vdev->mig_state;
+ ice_vfio_pci_state_mutex_unlock(ice_vdev);
+ return 0;
+}
+
+/**
+ * ice_vfio_pci_get_data_size - get migration data size
+ * @vdev: pointer to vfio pci device
+ * @stop_copy_length: migration data size
+ *
+ * Return 0 for success
+ */
+static int
+ice_vfio_pci_get_data_size(struct vfio_device *vdev,
+ unsigned long *stop_copy_length)
+{
+ *stop_copy_length = SZ_128K;
+ return 0;
+}
+
+static const struct vfio_migration_ops ice_vfio_pci_migrn_state_ops = {
+ .migration_set_state = ice_vfio_pci_set_device_state,
+ .migration_get_state = ice_vfio_pci_get_device_state,
+ .migration_get_data_size = ice_vfio_pci_get_data_size,
+};
+
+/**
+ * ice_vfio_pci_core_init_dev - initialize vfio device
+ * @core_vdev: pointer to vfio device
+ *
+ * Return 0 for success
+ */
+static int ice_vfio_pci_core_init_dev(struct vfio_device *core_vdev)
+{
+ struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
+ struct ice_vfio_pci_core_device, core_device.vdev);
+
+ mutex_init(&ice_vdev->state_mutex);
+ spin_lock_init(&ice_vdev->reset_lock);
+
+ core_vdev->migration_flags =
+ VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
+ core_vdev->mig_ops = &ice_vfio_pci_migrn_state_ops;
+
+ return vfio_pci_core_init_dev(core_vdev);
+}
+
+static const struct vfio_device_ops ice_vfio_pci_ops = {
+ .name = "ice-vfio-pci",
+ .init = ice_vfio_pci_core_init_dev,
+ .release = vfio_pci_core_release_dev,
+ .open_device = ice_vfio_pci_open_device,
+ .close_device = ice_vfio_pci_close_device,
+ .device_feature = vfio_pci_core_ioctl_feature,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .ioctl = vfio_pci_core_ioctl,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .bind_iommufd = vfio_iommufd_physical_bind,
+ .unbind_iommufd = vfio_iommufd_physical_unbind,
+ .attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
+};
+
+/**
+ * ice_vfio_pci_probe - Device initialization routine
+ * @pdev: PCI device information struct
+ * @id: entry in ice_vfio_pci_table
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int
+ice_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct ice_vfio_pci_core_device *ice_vdev;
+ int ret;
+
+ ice_vdev = vfio_alloc_device(ice_vfio_pci_core_device, core_device.vdev,
+ &pdev->dev, &ice_vfio_pci_ops);
+ if (!ice_vdev)
+ return -ENOMEM;
+
+ dev_set_drvdata(&pdev->dev, &ice_vdev->core_device);
+
+ ret = vfio_pci_core_register_device(&ice_vdev->core_device);
+ if (ret)
+ goto out_free;
+
+ return 0;
+
+out_free:
+ vfio_put_device(&ice_vdev->core_device.vdev);
+ return ret;
+}
+
+/**
+ * ice_vfio_pci_remove - Device removal routine
+ * @pdev: PCI device information struct
+ */
+static void ice_vfio_pci_remove(struct pci_dev *pdev)
+{
+ struct ice_vfio_pci_core_device *ice_vdev =
+ (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev);
+
+ vfio_pci_core_unregister_device(&ice_vdev->core_device);
+ vfio_put_device(&ice_vdev->core_device.vdev);
+}
+
+/* ice_vfio_pci_table - PCI Device ID Table
+ *
+ * Wildcard entries (PCI_ANY_ID) should come last
+ * Last entry must be all 0s
+ *
+ * { Vendor ID, Device ID, SubVendor ID, SubDevice ID,
+ * Class, Class Mask, private data (not used) }
+ */
+static const struct pci_device_id ice_vfio_pci_table[] = {
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x1889) },
+ {}
+};
+MODULE_DEVICE_TABLE(pci, ice_vfio_pci_table);
+
+static const struct pci_error_handlers ice_vfio_pci_core_err_handlers = {
+ .reset_done = ice_vfio_pci_reset_done,
+ .error_detected = vfio_pci_core_aer_err_detected,
+};
+
+static struct pci_driver ice_vfio_pci_driver = {
+ .name = "ice-vfio-pci",
+ .id_table = ice_vfio_pci_table,
+ .probe = ice_vfio_pci_probe,
+ .remove = ice_vfio_pci_remove,
+ .err_handler = &ice_vfio_pci_core_err_handlers,
+ .driver_managed_dma = true,
+};
+
+module_pci_driver(ice_vfio_pci_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
+MODULE_DESCRIPTION(DRIVER_DESC);
--
2.34.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
@ 2023-11-29 17:12 ` Simon Horman
2023-12-01 8:27 ` Cao, Yahui
2023-12-07 7:33 ` Tian, Kevin
2023-12-08 1:53 ` Brett Creeley
2 siblings, 1 reply; 33+ messages in thread
From: Simon Horman @ 2023-11-29 17:12 UTC (permalink / raw)
To: Yahui Cao
Cc: intel-wired-lan, kvm, netdev, lingyu.liu, kevin.tian,
madhu.chittim, sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On Tue, Nov 21, 2023 at 02:51:04AM +0000, Yahui Cao wrote:
> From: Lingyu Liu <lingyu.liu@intel.com>
>
> Save the virtual channel messages sent by VF on the source side during
> runtime. The logged virtchnl messages will be transferred and loaded
> into the device on the destination side during the device resume stage.
>
> For the feature which can not be migrated yet, it must be disabled or
> blocked to prevent from being abused by VF. Otherwise, it may introduce
> functional and security issue. Mask unsupported VF capability flags in
> the VF-PF negotiaion stage.
>
> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Hi Lingyu Liu and Yahui Cao,
some minor feedback from my side.
...
> diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
...
> +/**
> + * ice_migration_log_vf_msg - Log request message from VF
> + * @vf: pointer to the VF structure
> + * @event: pointer to the AQ event
> + *
> + * Log VF message for later device state loading during live migration
> + *
> + * Return 0 for success, negative for error
> + */
> +int ice_migration_log_vf_msg(struct ice_vf *vf,
> + struct ice_rq_event_info *event)
> +{
> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
> + u32 v_opcode = le32_to_cpu(event->desc.cookie_high);
> + struct device *dev = ice_pf_to_dev(vf->pf);
> + u16 msglen = event->msg_len;
> + u8 *msg = event->msg_buf;
> +
> + if (!ice_migration_is_loggable_msg(v_opcode))
> + return 0;
> +
> + if (vf->virtchnl_msg_num >= VIRTCHNL_MSG_MAX) {
> + dev_warn(dev, "VF %d has maximum number virtual channel commands\n",
> + vf->vf_id);
> + return -ENOMEM;
> + }
> +
> + msg_listnode = (struct ice_migration_virtchnl_msg_listnode *)
> + kzalloc(struct_size(msg_listnode,
> + msg_slot.msg_buffer,
> + msglen),
> + GFP_KERNEL);
nit: there is no need to cast the void * pointer returned by kzalloc().
Flagged by Coccinelle.
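i.e. something like this should do (untested):

	msg_listnode = kzalloc(struct_size(msg_listnode,
					   msg_slot.msg_buffer,
					   msglen),
			       GFP_KERNEL);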
> + if (!msg_listnode) {
> + dev_err(dev, "VF %d failed to allocate memory for msg listnode\n",
> + vf->vf_id);
> + return -ENOMEM;
> + }
> + dev_dbg(dev, "VF %d save virtual channel command, op code: %d, len: %d\n",
> + vf->vf_id, v_opcode, msglen);
> + msg_listnode->msg_slot.opcode = v_opcode;
> + msg_listnode->msg_slot.msg_len = msglen;
> + memcpy(msg_listnode->msg_slot.msg_buffer, msg, msglen);
> + list_add_tail(&msg_listnode->node, &vf->virtchnl_msg_list);
> + vf->virtchnl_msg_num++;
> + vf->virtchnl_msg_size += struct_size(&msg_listnode->msg_slot,
> + msg_buffer,
> + msglen);
> + return 0;
> +}
> +
> +/**
> + * ice_migration_unlog_vf_msg - revert logged message
> + * @vf: pointer to the VF structure
> + * @v_opcode: virtchnl message operation code
> + *
> + * Remove the last virtual channel message logged before.
> + */
> +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode)
> +{
> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
> +
> + if (!ice_migration_is_loggable_msg(v_opcode))
> + return;
> +
> + if (WARN_ON_ONCE(list_empty(&vf->virtchnl_msg_list)))
> + return;
> +
> + msg_listnode =
> + list_last_entry(&vf->virtchnl_msg_list,
> + struct ice_migration_virtchnl_msg_listnode,
> + node);
> + if (WARN_ON_ONCE(msg_listnode->msg_slot.opcode != v_opcode))
> + return;
> +
> + list_del(&msg_listnode->node);
> + kfree(msg_listnode);
msg_listnode is freed on the line above,
but dereferenced in the usage of struct_size() below.
As flagged by Smatch and Coccinelle.
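One way to handle that (untested) is to adjust the accounting before
freeing the node, e.g.:

	list_del(&msg_listnode->node);
	vf->virtchnl_msg_num--;
	vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot,
					     msg_buffer,
					     msg_listnode->msg_slot.msg_len);
	kfree(msg_listnode);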
> + vf->virtchnl_msg_num--;
> + vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot,
> + msg_buffer,
> + msg_listnode->msg_slot.msg_len);
> +}
...
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF
2023-11-29 17:12 ` Simon Horman
@ 2023-12-01 8:27 ` Cao, Yahui
0 siblings, 0 replies; 33+ messages in thread
From: Cao, Yahui @ 2023-12-01 8:27 UTC (permalink / raw)
To: Simon Horman
Cc: intel-wired-lan, kvm, netdev, lingyu.liu, kevin.tian,
madhu.chittim, sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/30/2023 1:12 AM, Simon Horman wrote:
> On Tue, Nov 21, 2023 at 02:51:04AM +0000, Yahui Cao wrote:
>> From: Lingyu Liu <lingyu.liu@intel.com>
>>
>> Save the virtual channel messages sent by VF on the source side during
>> runtime. The logged virtchnl messages will be transferred and loaded
>> into the device on the destination side during the device resume stage.
>>
>> For the feature which can not be migrated yet, it must be disabled or
>> blocked to prevent from being abused by VF. Otherwise, it may introduce
>> functional and security issue. Mask unsupported VF capability flags in
>> the VF-PF negotiaion stage.
>>
>> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
>> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
>
> Hi Lingyu Liu and Yahui Cao,
>
> some minor feedback from my side.
>
> ...
>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
>
> ...
>
>> +/**
>> + * ice_migration_log_vf_msg - Log request message from VF
>> + * @vf: pointer to the VF structure
>> + * @event: pointer to the AQ event
>> + *
>> + * Log VF message for later device state loading during live migration
>> + *
>> + * Return 0 for success, negative for error
>> + */
>> +int ice_migration_log_vf_msg(struct ice_vf *vf,
>> + struct ice_rq_event_info *event)
>> +{
>> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
>> + u32 v_opcode = le32_to_cpu(event->desc.cookie_high);
>> + struct device *dev = ice_pf_to_dev(vf->pf);
>> + u16 msglen = event->msg_len;
>> + u8 *msg = event->msg_buf;
>> +
>> + if (!ice_migration_is_loggable_msg(v_opcode))
>> + return 0;
>> +
>> + if (vf->virtchnl_msg_num >= VIRTCHNL_MSG_MAX) {
>> + dev_warn(dev, "VF %d has maximum number virtual channel commands\n",
>> + vf->vf_id);
>> + return -ENOMEM;
>> + }
>> +
>> + msg_listnode = (struct ice_migration_virtchnl_msg_listnode *)
>> + kzalloc(struct_size(msg_listnode,
>> + msg_slot.msg_buffer,
>> + msglen),
>> + GFP_KERNEL);
>
> nit: there is no need to cast the void * pointer returned by kzalloc().
>
> Flagged by Coccinelle.
Sure. Will fix in next version.
>
>> + if (!msg_listnode) {
>> + dev_err(dev, "VF %d failed to allocate memory for msg listnode\n",
>> + vf->vf_id);
>> + return -ENOMEM;
>> + }
>> + dev_dbg(dev, "VF %d save virtual channel command, op code: %d, len: %d\n",
>> + vf->vf_id, v_opcode, msglen);
>> + msg_listnode->msg_slot.opcode = v_opcode;
>> + msg_listnode->msg_slot.msg_len = msglen;
>> + memcpy(msg_listnode->msg_slot.msg_buffer, msg, msglen);
>> + list_add_tail(&msg_listnode->node, &vf->virtchnl_msg_list);
>> + vf->virtchnl_msg_num++;
>> + vf->virtchnl_msg_size += struct_size(&msg_listnode->msg_slot,
>> + msg_buffer,
>> + msglen);
>> + return 0;
>> +}
>> +
>> +/**
>> + * ice_migration_unlog_vf_msg - revert logged message
>> + * @vf: pointer to the VF structure
>> + * @v_opcode: virtchnl message operation code
>> + *
>> + * Remove the last virtual channel message logged before.
>> + */
>> +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode)
>> +{
>> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
>> +
>> + if (!ice_migration_is_loggable_msg(v_opcode))
>> + return;
>> +
>> + if (WARN_ON_ONCE(list_empty(&vf->virtchnl_msg_list)))
>> + return;
>> +
>> + msg_listnode =
>> + list_last_entry(&vf->virtchnl_msg_list,
>> + struct ice_migration_virtchnl_msg_listnode,
>> + node);
>> + if (WARN_ON_ONCE(msg_listnode->msg_slot.opcode != v_opcode))
>> + return;
>> +
>> + list_del(&msg_listnode->node);
>> + kfree(msg_listnode);
>
> msg_listnode is freed on the line above,
> but dereferenced in the usage of struct_size() below.
>
> As flagged by Smatch and Coccinelle.
> >> + vf->virtchnl_msg_num--;
>> + vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot,
>> + msg_buffer,
>> + msg_listnode->msg_slot.msg_len);
>> +}
>
> ...
>
Good catch :) Will fix in next version
Thanks.
Yahui.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH iwl-next v4 00/12] Add E800 live migration driver
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (11 preceding siblings ...)
2023-11-21 2:51 ` [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices Yahui Cao
@ 2023-12-04 11:18 ` Cao, Yahui
2024-01-18 22:09 ` Jacob Keller
13 siblings, 0 replies; 33+ messages in thread
From: Cao, Yahui @ 2023-12-04 11:18 UTC (permalink / raw)
To: jgg, alex.williamson
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, yishaih, shameerali.kolothum.thodi,
brett.creeley, davem, edumazet, kuba, pabeni, intel-wired-lan
On 11/21/2023 10:50 AM, Yahui Cao wrote:
> This series adds vfio live migration support for Intel E810 VF devices
> based on the v2 migration protocol definition series discussed here[0].
>
> Steps to test:
> 1. Bind one or more E810 VF devices to the module ice-vfio-pci.ko
> 2. Assign the VFs to the virtual machine and enable device live migration
> 3. Run a workload using IAVF inside the VM, for example, iperf.
> 4. Migrate the VM from the source node to a destination node.
>
> The series is also available for review here[1].
>
> Thanks,
> Yahui
> [0] https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux.git/log/?h=ice_live_migration
>
> Change log:
>
> v4:
> - Remove unnecessary iomap from vfio variant driver
> - Change Kconfig to select VFIO_PCI_CORE for ICE_VFIO_PCI module (Alex)
> - Replace restore state with load state for naming convention
> - Remove RXDID Patch
> - Fix missed comments in Patch03
> - Remove "so" at the beginning of the sentence and fix other grammar issue.
> - Remove double init and change return logic for Patch 10
> - Change ice_migration_unlog_vf_msg comments for Patch04
> - Add r-b from Michal to Patch04 of v4
> - Change ice_migration_is_loggable_msg return value type into bool type for Patch05
> - Change naming from dirtied to dirty for Patch11
> - Use total_length to pass parameter to save/load function instead of macro for Patch12
> - Refactor timeout logic for Patch09
> - Change migration_enabled from bool into u8:1 type for Patch04
> - Fix 80 max line length limit issue and compilation warning
> - Add r-b from Igor to all the patches of v4
> - Fix incorrect type in assignment of __le16/32 for Patch06
> - Change product name to from E800 to E810
>
> v3: https://lore.kernel.org/intel-wired-lan/20230918062546.40419-1-yahui.cao@intel.com/
> - Add P2P support in vfio driver (Jason)
> - Remove source/destination check in vfio driver (Jason)
> - Restructure PF exported API with proper types and layering (Jason)
> - Change patchset email sender.
> - Reword commit message and comments to be more reviewer-friendly (Kevin)
> - Add s-o-b for Patch01 (Kevin)
> - Merge Patch08 into Patch04 and merge Patch13 into Patch06 (Kevin)
> - Remove uninit() in VF destroy stage for Patch 05 (Kevin)
> - change migration_active to migration_enabled (Kevin)
> - Add total_size in devstate to greatly simplify the various checks for
> Patch07 (Kevin)
> - Add magic and version in device state for Patch07 (Kevin)
> - Fix rx head init issue in Patch10 (Kevin)
> - Remove DMA access for Guest Memory at device resume stage and deprecate
> the approach to restore TX head in VF space, instead restore TX head in
> PF space and then switch context back to VF space which is transparent
> to Guest for Patch11 (Jason, Kevin)
> - Use non-interrupt mode instead of VF MSIX vector to restore TX head for
> Patch11 (Kevin)
> - Move VF pci mmio save/restore from vfio driver into PF driver
> - Add configuration match check at device resume stage (Kevin)
> - Remove sleep before stopping queue at device suspend stage (Kevin)
> - Let PF respond failure to VF if virtual channel messages logging failed (Kevin)
> - Add migration setup and description in cover letter
>
> v2: https://lore.kernel.org/intel-wired-lan/20230621091112.44945-1-lingyu.liu@intel.com/
> - clarified comments and commit message
>
> v1: https://lore.kernel.org/intel-wired-lan/20230620100001.5331-1-lingyu.liu@intel.com/
>
> ---
>
>
> Lingyu Liu (9):
> ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
> ice: Add fundamental migration init and exit function
> ice: Log virtual channel messages in PF
> ice: Add device state save/load function for migration
> ice: Fix VSI id in virtual channel message for migration
> ice: Save and load RX Queue head
> ice: Save and load TX Queue head
> ice: Add device suspend function for migration
> vfio/ice: Implement vfio_pci driver for E800 devices
>
> Yahui Cao (3):
> ice: Add function to get RX queue context
> ice: Add function to get and set TX queue context
> ice: Save and load mmio registers
>
> MAINTAINERS | 7 +
> drivers/net/ethernet/intel/ice/Makefile | 1 +
> drivers/net/ethernet/intel/ice/ice.h | 3 +
> drivers/net/ethernet/intel/ice/ice_common.c | 484 +++++-
> drivers/net/ethernet/intel/ice/ice_common.h | 11 +
> .../net/ethernet/intel/ice/ice_hw_autogen.h | 23 +
> .../net/ethernet/intel/ice/ice_lan_tx_rx.h | 3 +
> drivers/net/ethernet/intel/ice/ice_main.c | 15 +
> .../net/ethernet/intel/ice/ice_migration.c | 1378 +++++++++++++++++
> .../intel/ice/ice_migration_private.h | 49 +
> drivers/net/ethernet/intel/ice/ice_vf_lib.c | 4 +
> drivers/net/ethernet/intel/ice/ice_vf_lib.h | 11 +
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 256 ++-
> drivers/net/ethernet/intel/ice/ice_virtchnl.h | 15 +-
> .../ethernet/intel/ice/ice_virtchnl_fdir.c | 28 +-
> drivers/vfio/pci/Kconfig | 2 +
> drivers/vfio/pci/Makefile | 2 +
> drivers/vfio/pci/ice/Kconfig | 10 +
> drivers/vfio/pci/ice/Makefile | 4 +
> drivers/vfio/pci/ice/ice_vfio_pci.c | 707 +++++++++
> include/linux/net/intel/ice_migration.h | 48 +
> 21 files changed, 2962 insertions(+), 99 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/ice/ice_migration.c
> create mode 100644 drivers/net/ethernet/intel/ice/ice_migration_private.h
> create mode 100644 drivers/vfio/pci/ice/Kconfig
> create mode 100644 drivers/vfio/pci/ice/Makefile
> create mode 100644 drivers/vfio/pci/ice/ice_vfio_pci.c
> create mode 100644 include/linux/net/intel/ice_migration.h
>
Hey Jason & Alex,
Have you had a chance to review this v4 patchset?
The branch is published as
https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux.git/log/?h=ice_live_migration
as requested.
These patches are based on top of commit b85ea95d0864 ("Linux
6.7-rc1") and being sent as a whole for ease of review. A branch/shared
pull request for the networking portion of these patches (1-11) will be
sent when review is complete.
Thanks.
Yahui.
^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
2023-11-29 17:12 ` Simon Horman
@ 2023-12-07 7:33 ` Tian, Kevin
2023-12-08 1:53 ` Brett Creeley
2 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2023-12-07 7:33 UTC (permalink / raw)
To: Cao, Yahui, intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, Liu, Lingyu,
Chittim, Madhu, Samudrala, Sridhar, alex.williamson@redhat.com,
jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Cao, Yahui <yahui.cao@intel.com>
> Sent: Tuesday, November 21, 2023 10:51 AM
> @@ -4037,6 +4045,17 @@ void ice_vc_process_vf_msg(struct ice_pf *pf,
> struct ice_rq_event_info *event,
> goto finish;
> }
>
> + if (vf->migration_enabled) {
> + if (ice_migration_log_vf_msg(vf, event)) {
> + u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +
> + err = ice_vc_respond_to_vf(vf, v_opcode,
> + status_code,
> + NULL, 0);
> + goto finish;
> + }
> + }
> +
I'm not sure it's a good thing to fail the guest just because the message
cannot be logged for migration purposes.
It's more reasonable to block migration in this case while letting the
guest run as normal...
>
> + /* All of the loggable virtual channel messages are logged by
> + * ice_migration_unlog_vf_msg() before they are processed.
> + *
> + * Two kinds of error may happen, virtual channel message's result
> + * is failure after processed by PF or message is not sent to VF
> + * successfully. If error happened, fallback here by reverting logged
> + * messages.
> + */
> + if (vf->migration_enabled &&
> + (vf->virtchnl_retval != VIRTCHNL_STATUS_SUCCESS || err))
> + ice_migration_unlog_vf_msg(vf, v_opcode);
> +
... and then the unlog here is not required. Just do the logging at this point.
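A rough sketch of the flow being suggested (illustrative only; vf->migration_blocked is a hypothetical flag that does not exist in the posted series, used here just to show blocking migration instead of failing the guest):

    /* ...existing v_opcode dispatch in ice_vc_process_vf_msg()... */

    /* Log only after the message has been processed and answered
     * successfully, so no unlog/revert step is needed.
     */
    if (vf->migration_enabled && !err &&
        vf->virtchnl_retval == VIRTCHNL_STATUS_SUCCESS) {
            if (ice_migration_log_vf_msg(vf, event))
                    /* Hypothetical flag: mark migration unusable for
                     * this VF rather than failing the guest request.
                     */
                    vf->migration_blocked = true;
    }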
* RE: [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration
2023-11-21 2:51 ` [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration Yahui Cao
@ 2023-12-07 7:39 ` Tian, Kevin
0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2023-12-07 7:39 UTC (permalink / raw)
To: Cao, Yahui, intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, Liu, Lingyu,
Chittim, Madhu, Samudrala, Sridhar, alex.williamson@redhat.com,
jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Cao, Yahui <yahui.cao@intel.com>
> Sent: Tuesday, November 21, 2023 10:51 AM
> +
> + while (msg_slot->opcode != VIRTCHNL_OP_UNKNOWN) {
> + struct ice_rq_event_info event;
> + u64 slot_sz;
> +
> + slot_sz = struct_size(msg_slot, msg_buffer, msg_slot->msg_len);
> + dev_dbg(dev, "VF %d replay virtchnl message op code: %d, msg len: %d\n",
> + vf->vf_id, msg_slot->opcode, msg_slot->msg_len);
> + event.desc.cookie_high = cpu_to_le32(msg_slot->opcode);
> + event.msg_len = msg_slot->msg_len;
> + event.desc.retval = cpu_to_le16(vf->vf_id);
> + event.msg_buf = (unsigned char *)msg_slot->msg_buffer;
> + ret = ice_vc_process_vf_msg(vf->pf, &event, NULL);
> + if (ret) {
> + dev_err(dev, "VF %d failed to replay virtchnl message op code: %d\n",
> + vf->vf_id, msg_slot->opcode);
> + goto out_clear_replay;
> + }
> + event.msg_buf = NULL;
this line is unnecessary.
* RE: [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message for migration
2023-11-21 2:51 ` [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message " Yahui Cao
@ 2023-12-07 7:42 ` Tian, Kevin
0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2023-12-07 7:42 UTC (permalink / raw)
To: Cao, Yahui, intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, Liu, Lingyu,
Chittim, Madhu, Samudrala, Sridhar, alex.williamson@redhat.com,
jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Cao, Yahui <yahui.cao@intel.com>
> Sent: Tuesday, November 21, 2023 10:51 AM
>
> + /* Read the beginning two bytes of message for VSI id */
> + u16 *vsi_id = (u16 *)msg;
> +
> + /* For VM runtime stage, vsi_id in the virtual channel message
> + * should be equal to the PF logged vsi_id and vsi_id is
> + * replaced by VF's VSI id to guarantee that messages are
> + * processed successfully. If vsi_id is not equal to the PF
> + * logged vsi_id, then this message must be sent by malicious
> + * VF and no replacement is needed. Just let virtual channel
> + * handler to fail this message.
> + *
> + * For virtual channel replaying stage, all of the PF logged
> + * virtual channel messages are trusted and vsi_id is replaced
> + * anyway to guarantee the messages are processed successfully.
> + */
> + if (*vsi_id == vf->vm_vsi_num ||
> + test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
> + *vsi_id = vf->lan_vsi_num;
The second check is redundant. As long as vf->vm_vsi_num is restored
before replaying vc messages, there shouldn't be a mismatch in the replay
phase.
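Assuming vf->vm_vsi_num is indeed loaded from the saved device state before any logged message is replayed, the fixup in the quoted hunk could collapse to a single check (sketch only, not a tested change):

    /* Sketch: vm_vsi_num is restored before replay, so both runtime and
     * replayed messages carry the same guest-visible VSI id.
     */
    if (*vsi_id == vf->vm_vsi_num)
            *vsi_id = vf->lan_vsi_num;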
* RE: [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head
2023-11-21 2:51 ` [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head Yahui Cao
@ 2023-12-07 7:55 ` Tian, Kevin
2023-12-07 14:46 ` Jason Gunthorpe
0 siblings, 1 reply; 33+ messages in thread
From: Tian, Kevin @ 2023-12-07 7:55 UTC (permalink / raw)
To: Cao, Yahui, intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, Liu, Lingyu,
Chittim, Madhu, Samudrala, Sridhar, alex.williamson@redhat.com,
jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Cao, Yahui <yahui.cao@intel.com>
> Sent: Tuesday, November 21, 2023 10:51 AM
>
> +
> + /* Once RX Queue is enabled, network traffic may come in at
> any
> + * time. As a result, RX Queue head needs to be loaded
> before
> + * RX Queue is enabled.
> + * For simplicity and integration, overwrite RX head just after
> + * RX ring context is configured.
> + */
> + if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES)
> {
> + ret = ice_migration_load_rx_head(vf, devstate);
> + if (ret) {
> + dev_err(dev, "VF %d failed to load rx head\n",
> + vf->vf_id);
> + goto out_clear_replay;
> + }
> + }
> +
Don't we have the same problem here as for TX head restore that the
vfio migration protocol doesn't carry a way to tell whether the IOAS
associated with the device has been restored then allowing RX DMA
at this point might cause device error?
@Jason, is it a common gap applying to all devices which include a
receiving path from link? How is it handled in mlx migration
driver?
I may be overlooking an important aspect here, but if not, I wonder whether
the migration driver should keep DMA disabled (at least for RX) even
when the device moves to RUNNING and then introduce an explicit
enable-DMA state which the VMM can request after it restores the
relevant IOAS/HWPT associated with the device...
* RE: [PATCH iwl-next v4 09/12] ice: Save and load TX Queue head
2023-11-21 2:51 ` [PATCH iwl-next v4 09/12] ice: Save and load TX " Yahui Cao
@ 2023-12-07 8:22 ` Tian, Kevin
2023-12-07 14:48 ` Jason Gunthorpe
0 siblings, 1 reply; 33+ messages in thread
From: Tian, Kevin @ 2023-12-07 8:22 UTC (permalink / raw)
To: Cao, Yahui, intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, Liu, Lingyu,
Chittim, Madhu, Samudrala, Sridhar, alex.williamson@redhat.com,
jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Cao, Yahui <yahui.cao@intel.com>
> Sent: Tuesday, November 21, 2023 10:51 AM
>
> To advance TX Head queue, HW needs to touch memory by DMA. But
> directly
> touching VM's memory to advance TX Queue head does not follow vfio
> migration protocol design, because vIOMMU state is not defined by the
> protocol. Even this may introduce functional and security issue under
> hostile guest circumstances.
this limitation is not restricted to vIOMMU. Even when it's absent
there is still no guarantee that the GPA address space has been
re-attached to this device.
>
> In order not to touch any VF memory or IO page table, TX Queue head
> loading is using PF managed memory and PF isolation domain. This will
PF doesn't manage memory. It's probably clearer to say that TX queue
is temporarily moved to PF when the head is being restored.
> also introduce another dependency that while switching TX Queue between
> PF space and VF space, TX Queue head value is not changed. HW provides
> an indirect context access so that head value can be kept while
> switching context.
>
> In virtual channel model, VF driver only send TX queue ring base and
> length info to PF, while rest of the TX queue context are managed by PF.
> TX queue length must be verified by PF during virtual channel message
> processing. When PF uses dummy descriptors to advance TX head, it will
> configure the TX ring base as the new address managed by PF itself. As a
> result, all of the TX queue context is taken control of by PF and this
> method won't generate any attacking vulnerability
So basically the key points are:
1) TX queue head cannot be directly updated via VF mmio interface;
2) Using dummy descriptors to update TX queue head is possible but it
must be done in PF's context;
3) FW provides a way to keep TX queue head intact when moving
the TX queue ownership between VF and PF;
4) the TX queue context affected by the ownership change is largely
initialized by the PF driver already, except ring base/size coming from
virtual channel messages. This implies that a malicious guest VF driver
cannot attack this small window even though the tx head restore is done
after all of the VF state is restored;
5) and a missing point is that the temporary owner change doesn't
expose the TX queue to the software stack on top of the PF driver
otherwise that would be a severe issue.
> +static int
> +ice_migration_save_tx_head(struct ice_vf *vf,
> + struct ice_migration_dev_state *devstate)
> +{
> + struct ice_vsi *vsi = ice_get_vf_vsi(vf);
> + struct ice_pf *pf = vf->pf;
> + struct device *dev;
> + int i = 0;
> +
> + dev = ice_pf_to_dev(pf);
> +
> + if (!vsi) {
> + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id);
> + return -EINVAL;
> + }
> +
> + ice_for_each_txq(vsi, i) {
> + u16 tx_head;
> + u32 reg;
> +
> + devstate->tx_head[i] = 0;
> + if (!test_bit(i, vf->txq_ena))
> + continue;
> +
> + reg = rd32(&pf->hw, QTX_COMM_HEAD(vsi->txq_map[i]));
> + tx_head = (reg & QTX_COMM_HEAD_HEAD_M)
> + >> QTX_COMM_HEAD_HEAD_S;
> +
> + /* 1. If TX head is QTX_COMM_HEAD_HEAD_M marker,
> which means
> + * it is the value written by software and there are no
> + * descriptors write back happened, then there are no
> + * packets sent since queue enabled.
It's unclear why it's not zero when no packet is sent.
> +static int
> +ice_migration_inject_dummy_desc(struct ice_vf *vf, struct ice_tx_ring *tx_ring,
> + u16 head, dma_addr_t tx_desc_dma)
based on the intention, this reads more clearly as:
ice_migration_restore_tx_head()
> +
> + /* 1.3 Disable TX queue interrupt */
> + wr32(hw, QINT_TQCTL(tx_ring->reg_idx), QINT_TQCTL_ITR_INDX_M);
> +
> + /* To disable tx queue interrupt during run time, software should
> + * write mmio to trigger a MSIX interrupt.
> + */
> + if (tx_ring->q_vector)
> + wr32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx),
> + (ICE_ITR_NONE << GLINT_DYN_CTL_ITR_INDX_S) |
> + GLINT_DYN_CTL_SWINT_TRIG_M |
> + GLINT_DYN_CTL_INTENA_M);
this needs more explanation, as it's not intuitive to disable an interrupt by
triggering another interrupt.
> +
> + ice_for_each_txq(vsi, i) {
> + struct ice_tx_ring *tx_ring = vsi->tx_rings[i];
> + u16 *tx_heads = devstate->tx_head;
> +
> + /* 1. Skip if TX Queue is not enabled */
> + if (!test_bit(i, vf->txq_ena) || tx_heads[i] == 0)
> + continue;
> +
> + if (tx_heads[i] >= tx_ring->count) {
> + dev_err(dev, "VF %d: invalid tx ring length to load\n",
> + vf->vf_id);
> + ret = -EINVAL;
> + goto err;
> + }
> +
> + /* Dummy descriptors must be re-initialized after use, since
> + * it may be written back by HW
> + */
> + ice_migration_init_dummy_desc(tx_desc, ring_len, tx_pkt_dma);
> + ret = ice_migration_inject_dummy_desc(vf, tx_ring, tx_heads[i],
> + tx_desc_dma);
> + if (ret)
> + goto err;
> + }
> +
> +err:
> + dma_free_coherent(dev, ring_len * sizeof(struct ice_tx_desc),
> + tx_desc, tx_desc_dma);
> + dma_free_coherent(dev, SZ_4K, tx_pkt, tx_pkt_dma);
> +
> + return ret;
there is no err unwinding for the tx ring context itself.
> +
> + /* Only load the TX Queue head after rest of device state is loaded
> + * successfully.
> + */
"otherwise it might be changed by virtual channel messages e.g. reset"
> @@ -1351,6 +1351,24 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8
> *msg)
> continue;
>
> ice_vf_ena_txq_interrupt(vsi, vf_q_id);
> +
> + /* TX head register is a shadow copy of on-die TX head which
> + * maintains the accurate location. And TX head register is
> + * updated only after a packet is sent. If nothing is sent
> + * after the queue is enabled, then the value is the one
> + * updated last time and out-of-date.
when is "last time"? Is it even not updated upon reset?
or does it talk about a disable-enable sequence in which the real TX head
is left with a stale value from last enable?
> + *
> + * QTX_COMM_HEAD.HEAD range value from 0x1fe0 to 0x1fff is
> + * reserved and will never be used by HW. Manually write a
> + * reserved value into TX head and use this as a marker for
> + * the case that there's no packets sent.
why use a reserved value instead of setting it to 0?
> + *
> + * This marker is only used in live migration use case.
> + */
> + if (vf->migration_enabled)
> + wr32(&vsi->back->hw,
> + QTX_COMM_HEAD(vsi->txq_map[vf_q_id]),
> + QTX_COMM_HEAD_HEAD_M);
* Re: [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head
2023-12-07 7:55 ` Tian, Kevin
@ 2023-12-07 14:46 ` Jason Gunthorpe
2023-12-08 2:53 ` Tian, Kevin
0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2023-12-07 14:46 UTC (permalink / raw)
To: Tian, Kevin
Cc: Cao, Yahui, intel-wired-lan@lists.osuosl.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, Liu, Lingyu, Chittim, Madhu,
Samudrala, Sridhar, alex.williamson@redhat.com,
yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com,
brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
On Thu, Dec 07, 2023 at 07:55:17AM +0000, Tian, Kevin wrote:
> > From: Cao, Yahui <yahui.cao@intel.com>
> > Sent: Tuesday, November 21, 2023 10:51 AM
> >
> > +
> > + /* Once RX Queue is enabled, network traffic may come in at
> > any
> > + * time. As a result, RX Queue head needs to be loaded
> > before
> > + * RX Queue is enabled.
> > + * For simplicity and integration, overwrite RX head just after
> > + * RX ring context is configured.
> > + */
> > + if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES)
> > {
> > + ret = ice_migration_load_rx_head(vf, devstate);
> > + if (ret) {
> > + dev_err(dev, "VF %d failed to load rx head\n",
> > + vf->vf_id);
> > + goto out_clear_replay;
> > + }
> > + }
> > +
>
> Don't we have the same problem here as for TX head restore that the
> vfio migration protocol doesn't carry a way to tell whether the IOAS
> associated with the device has been restored then allowing RX DMA
> at this point might cause device error?
Does this trigger a DMA?
> @Jason, is it a common gap applying to all devices which include a
> receiving path from link? How is it handled in mlx migration
> driver?
There should be no DMA until the device is placed in RUNNING. All
devices may instantly trigger DMA once placed in RUNNING.
The VMM must ensure the entire environment is ready to go before
putting anything in RUNNING, including having setup the IOMMU.
> I may be overlooking an important aspect here, but if not, I wonder whether
> the migration driver should keep DMA disabled (at least for RX) even
> when the device moves to RUNNING and then introduce an explicit
> enable-DMA state which the VMM can request after it restores the
> relevant IOAS/HWPT associated with the device...
Why do we need a state like this?
Jason
* Re: [PATCH iwl-next v4 09/12] ice: Save and load TX Queue head
2023-12-07 8:22 ` Tian, Kevin
@ 2023-12-07 14:48 ` Jason Gunthorpe
0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2023-12-07 14:48 UTC (permalink / raw)
To: Tian, Kevin
Cc: Cao, Yahui, intel-wired-lan@lists.osuosl.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, Liu, Lingyu, Chittim, Madhu,
Samudrala, Sridhar, alex.williamson@redhat.com,
yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com,
brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
On Thu, Dec 07, 2023 at 08:22:53AM +0000, Tian, Kevin wrote:
> > In virtual channel model, VF driver only send TX queue ring base and
> > length info to PF, while rest of the TX queue context are managed by PF.
> > TX queue length must be verified by PF during virtual channel message
> > processing. When PF uses dummy descriptors to advance TX head, it will
> > configure the TX ring base as the new address managed by PF itself. As a
> > result, all of the TX queue context is taken control of by PF and this
> > method won't generate any attacking vulnerability
>
> So basically the key points are:
>
> 1) TX queue head cannot be directly updated via VF mmio interface;
> 2) Using dummy descriptors to update TX queue head is possible but it
> must be done in PF's context;
> 3) FW provides a way to keep TX queue head intact when moving
> the TX queue ownership between VF and PF;
> 4) the TX queue context affected by the ownership change is largely
> initialized by the PF driver already, except ring base/size coming from
> virtual channel messages. This implies that a malicious guest VF driver
> cannot attack this small window even though the tx head restore is done
> after all of the VF state is restored;
> 5) and a missing point is that the temporary owner change doesn't
> expose the TX queue to the software stack on top of the PF driver
> otherwise that would be a severe issue.
This matches my impression of these patches. It is convoluted but the
explanation sounds fine, and if Intel has done an internal security
review then I have no issue.
Jason
* Re: [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices
2023-11-21 2:51 ` [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices Yahui Cao
@ 2023-12-07 22:43 ` Alex Williamson
2023-12-08 3:42 ` Tian, Kevin
0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2023-12-07 22:43 UTC (permalink / raw)
To: Yahui Cao
Cc: intel-wired-lan, kvm, netdev, lingyu.liu, kevin.tian,
madhu.chittim, sridhar.samudrala, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On Tue, 21 Nov 2023 02:51:11 +0000
Yahui Cao <yahui.cao@intel.com> wrote:
> From: Lingyu Liu <lingyu.liu@intel.com>
>
> Add a vendor-specific vfio_pci driver for E800 devices.
>
> It uses vfio_pci_core to register to the VFIO subsystem and then
> implements the E800 specific logic to support VF live migration.
>
> It implements the device state transition flow for live
> migration.
>
> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
> ---
> MAINTAINERS | 7 +
> drivers/vfio/pci/Kconfig | 2 +
> drivers/vfio/pci/Makefile | 2 +
> drivers/vfio/pci/ice/Kconfig | 10 +
> drivers/vfio/pci/ice/Makefile | 4 +
> drivers/vfio/pci/ice/ice_vfio_pci.c | 707 ++++++++++++++++++++++++++++
> 6 files changed, 732 insertions(+)
> create mode 100644 drivers/vfio/pci/ice/Kconfig
> create mode 100644 drivers/vfio/pci/ice/Makefile
> create mode 100644 drivers/vfio/pci/ice/ice_vfio_pci.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 97f51d5ec1cf..c8faf7fe1bd1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -22860,6 +22860,13 @@ L: kvm@vger.kernel.org
> S: Maintained
> F: drivers/vfio/pci/mlx5/
>
> +VFIO ICE PCI DRIVER
> +M: Yahui Cao <yahui.cao@intel.com>
> +M: Lingyu Liu <lingyu.liu@intel.com>
> +L: kvm@vger.kernel.org
> +S: Maintained
> +F: drivers/vfio/pci/ice/
> +
> VFIO PCI DEVICE SPECIFIC DRIVERS
> R: Jason Gunthorpe <jgg@nvidia.com>
> R: Yishai Hadas <yishaih@nvidia.com>
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 8125e5f37832..6618208947af 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
>
> source "drivers/vfio/pci/pds/Kconfig"
>
> +source "drivers/vfio/pci/ice/Kconfig"
> +
> endmenu
> diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
> index 45167be462d8..fc1df82df3ac 100644
> --- a/drivers/vfio/pci/Makefile
> +++ b/drivers/vfio/pci/Makefile
> @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
> obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
>
> obj-$(CONFIG_PDS_VFIO_PCI) += pds/
> +
> +obj-$(CONFIG_ICE_VFIO_PCI) += ice/
> diff --git a/drivers/vfio/pci/ice/Kconfig b/drivers/vfio/pci/ice/Kconfig
> new file mode 100644
> index 000000000000..0b8cd1489073
> --- /dev/null
> +++ b/drivers/vfio/pci/ice/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config ICE_VFIO_PCI
> + tristate "VFIO support for Intel(R) Ethernet Connection E800 Series"
> + depends on ICE
> + select VFIO_PCI_CORE
> + help
> + This provides migration support for Intel(R) Ethernet connection E800
> + series devices using the VFIO framework.
> +
> + If you don't know what to do here, say N.
> diff --git a/drivers/vfio/pci/ice/Makefile b/drivers/vfio/pci/ice/Makefile
> new file mode 100644
> index 000000000000..259d4ab89105
> --- /dev/null
> +++ b/drivers/vfio/pci/ice/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_ICE_VFIO_PCI) += ice-vfio-pci.o
> +ice-vfio-pci-y := ice_vfio_pci.o
> +
> diff --git a/drivers/vfio/pci/ice/ice_vfio_pci.c b/drivers/vfio/pci/ice/ice_vfio_pci.c
> new file mode 100644
> index 000000000000..28a181aa2f3f
> --- /dev/null
> +++ b/drivers/vfio/pci/ice/ice_vfio_pci.c
> @@ -0,0 +1,707 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2018-2023 Intel Corporation */
> +
> +#include <linux/device.h>
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/file.h>
> +#include <linux/pci.h>
> +#include <linux/vfio_pci_core.h>
> +#include <linux/net/intel/ice_migration.h>
> +#include <linux/anon_inodes.h>
> +
> +#define DRIVER_DESC "ICE VFIO PCI - User Level meta-driver for Intel E800 device family"
> +
> +struct ice_vfio_pci_migration_file {
> + struct file *filp;
> + struct mutex lock; /* protect migration file access */
> + bool disabled;
> +
> + u8 mig_data[SZ_128K];
> + size_t total_length;
> +};
> +
> +struct ice_vfio_pci_core_device {
> + struct vfio_pci_core_device core_device;
> + u8 deferred_reset:1;
Move vf_id here to use some of the hole this leaves (see the sketch
after the quoted struct).
> + struct mutex state_mutex; /* protect migration state */
> + enum vfio_device_mig_state mig_state;
> + /* protect the reset_done flow */
> + spinlock_t reset_lock;
> + struct ice_vfio_pci_migration_file *resuming_migf;
> + struct ice_vfio_pci_migration_file *saving_migf;
> + struct vfio_device_migration_info mig_info;
> + u8 *mig_data;
> + struct ice_pf *pf;
> + int vf_id;
> +};
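A possible layout following Alex's note on the padding hole, as a sketch only (exact sizes depend on the configuration; the point is just that vf_id can sit in the padding after the bitfield):

    struct ice_vfio_pci_core_device {
            struct vfio_pci_core_device core_device;
            u8 deferred_reset:1;
            int vf_id;                /* moved up into the hole left by the bitfield */
            struct mutex state_mutex; /* protect migration state */
            /* ...remaining fields as in the quoted patch, minus vf_id... */
    };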
> +
> +/**
> + * ice_vfio_pci_load_state - VFIO device state reloading
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + *
> + * Load device state. This function is called when the userspace VFIO uAPI
> + * consumer wants to load the device state info from VFIO migration region and
> + * load them into the device. This function should make sure all the device
> + * state info is loaded successfully. As a result, return value is mandatory
> + * to be checked.
> + *
> + * Return 0 for success, negative value for failure.
> + */
> +static int __must_check
> +ice_vfio_pci_load_state(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + struct ice_vfio_pci_migration_file *migf = ice_vdev->resuming_migf;
> +
> + return ice_migration_load_devstate(ice_vdev->pf,
> + ice_vdev->vf_id,
> + migf->mig_data,
> + migf->total_length);
> +}
> +
> +/**
> + * ice_vfio_pci_save_state - VFIO device state saving
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + * @migf: pointer to migration file
> + *
> + * Snapshot the device state and save it. This function is called when the
> + * VFIO uAPI consumer wants to snapshot the current device state and saves
> + * it into the VFIO migration region. This function should make sure all
> + * of the device state info is collectted and saved successfully. As a
> + * result, return value is mandatory to be checked.
> + *
> + * Return 0 for success, negative value for failure.
> + */
> +static int __must_check
> +ice_vfio_pci_save_state(struct ice_vfio_pci_core_device *ice_vdev,
> + struct ice_vfio_pci_migration_file *migf)
> +{
> + migf->total_length = SZ_128K;
> +
> + return ice_migration_save_devstate(ice_vdev->pf,
> + ice_vdev->vf_id,
> + migf->mig_data,
> + migf->total_length);
> +}
> +
> +/**
> + * ice_vfio_migration_init - Initialization for live migration function
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + *
> + * Returns 0 on success, negative value on error
> + */
> +static int ice_vfio_migration_init(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + struct pci_dev *pdev = ice_vdev->core_device.pdev;
> +
> + ice_vdev->pf = ice_migration_get_pf(pdev);
> + if (!ice_vdev->pf)
> + return -EFAULT;
> +
> + ice_vdev->vf_id = pci_iov_vf_id(pdev);
> + if (ice_vdev->vf_id < 0)
> + return -EINVAL;
> +
> + return ice_migration_init_dev(ice_vdev->pf, ice_vdev->vf_id);
> +}
> +
> +/**
> + * ice_vfio_migration_uninit - Cleanup for live migration function
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + */
> +static void ice_vfio_migration_uninit(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + ice_migration_uninit_dev(ice_vdev->pf, ice_vdev->vf_id);
> +}
> +
> +/**
> + * ice_vfio_pci_disable_fd - Close migration file
> + * @migf: pointer to ice vfio pci migration file
> + */
> +static void ice_vfio_pci_disable_fd(struct ice_vfio_pci_migration_file *migf)
> +{
> + mutex_lock(&migf->lock);
> + migf->disabled = true;
> + migf->total_length = 0;
> + migf->filp->f_pos = 0;
> + mutex_unlock(&migf->lock);
> +}
> +
> +/**
> + * ice_vfio_pci_disable_fds - Close migration files of ice vfio pci device
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + */
> +static void ice_vfio_pci_disable_fds(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + if (ice_vdev->resuming_migf) {
> + ice_vfio_pci_disable_fd(ice_vdev->resuming_migf);
> + fput(ice_vdev->resuming_migf->filp);
> + ice_vdev->resuming_migf = NULL;
> + }
> + if (ice_vdev->saving_migf) {
> + ice_vfio_pci_disable_fd(ice_vdev->saving_migf);
> + fput(ice_vdev->saving_migf->filp);
> + ice_vdev->saving_migf = NULL;
> + }
> +}
> +
> +/*
> + * This function is called in all state_mutex unlock cases to
> + * handle a 'deferred_reset' if exists.
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + */
> +static void
> +ice_vfio_pci_state_mutex_unlock(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> +again:
> + spin_lock(&ice_vdev->reset_lock);
> + if (ice_vdev->deferred_reset) {
> + ice_vdev->deferred_reset = false;
> + spin_unlock(&ice_vdev->reset_lock);
> + ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> + ice_vfio_pci_disable_fds(ice_vdev);
> + goto again;
> + }
> + mutex_unlock(&ice_vdev->state_mutex);
> + spin_unlock(&ice_vdev->reset_lock);
> +}
> +
> +static void ice_vfio_pci_reset_done(struct pci_dev *pdev)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev =
> + (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev);
> +
> + /*
> + * As the higher VFIO layers are holding locks across reset and using
> + * those same locks with the mm_lock we need to prevent ABBA deadlock
> + * with the state_mutex and mm_lock.
> + * In case the state_mutex was taken already we defer the cleanup work
> + * to the unlock flow of the other running context.
> + */
> + spin_lock(&ice_vdev->reset_lock);
> + ice_vdev->deferred_reset = true;
> + if (!mutex_trylock(&ice_vdev->state_mutex)) {
> + spin_unlock(&ice_vdev->reset_lock);
> + return;
> + }
> + spin_unlock(&ice_vdev->reset_lock);
> + ice_vfio_pci_state_mutex_unlock(ice_vdev);
> +}
> +
> +/**
> + * ice_vfio_pci_open_device - Called when a vfio device is probed by VFIO UAPI
> + * @core_vdev: the vfio device to open
> + *
> + * Initialization of the vfio device
> + *
> + * Returns 0 on success, negative value on error
> + */
> +static int ice_vfio_pci_open_device(struct vfio_device *core_vdev)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
> + struct ice_vfio_pci_core_device, core_device.vdev);
> + struct vfio_pci_core_device *vdev = &ice_vdev->core_device;
> + int ret;
> +
> + ret = vfio_pci_core_enable(vdev);
> + if (ret)
> + return ret;
> +
> + ret = ice_vfio_migration_init(ice_vdev);
> + if (ret) {
> + vfio_pci_core_disable(vdev);
> + return ret;
> + }
> + ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
> + vfio_pci_core_finish_enable(vdev);
> +
> + return 0;
> +}
> +
> +/**
> + * ice_vfio_pci_close_device - Called when a vfio device fd is closed
> + * @core_vdev: the vfio device to close
> + */
> +static void ice_vfio_pci_close_device(struct vfio_device *core_vdev)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
> + struct ice_vfio_pci_core_device, core_device.vdev);
> +
> + ice_vfio_pci_disable_fds(ice_vdev);
> + vfio_pci_core_close_device(core_vdev);
> + ice_vfio_migration_uninit(ice_vdev);
> +}
> +
> +/**
> + * ice_vfio_pci_release_file - release ice vfio pci migration file
> + * @inode: pointer to inode
> + * @filp: pointer to the file to release
> + *
> + * Return 0 for success, negative for error
> + */
> +static int ice_vfio_pci_release_file(struct inode *inode, struct file *filp)
> +{
> + struct ice_vfio_pci_migration_file *migf = filp->private_data;
> +
> + ice_vfio_pci_disable_fd(migf);
> + mutex_destroy(&migf->lock);
> + kfree(migf);
> + return 0;
> +}
> +
> +/**
> + * ice_vfio_pci_save_read - save migration file data to user space
> + * @filp: pointer to migration file
> + * @buf: pointer to user space buffer
> + * @len: data length to be saved
> + * @pos: should be 0
> + *
> + * Return len of saved data, negative for error
> + */
> +static ssize_t ice_vfio_pci_save_read(struct file *filp, char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + struct ice_vfio_pci_migration_file *migf = filp->private_data;
> + loff_t *off = &filp->f_pos;
> + ssize_t done = 0;
> + int ret;
> +
> + if (pos)
> + return -ESPIPE;
> +
> + mutex_lock(&migf->lock);
> + if (*off > migf->total_length) {
> + done = -EINVAL;
> + goto out_unlock;
> + }
> +
> + if (migf->disabled) {
> + done = -ENODEV;
> + goto out_unlock;
> + }
> +
> + len = min_t(size_t, migf->total_length - *off, len);
> + if (len) {
> + ret = copy_to_user(buf, migf->mig_data + *off, len);
> + if (ret) {
> + done = -EFAULT;
> + goto out_unlock;
> + }
> + *off += len;
> + done = len;
> + }
> +out_unlock:
> + mutex_unlock(&migf->lock);
> + return done;
> +}
> +
> +static const struct file_operations ice_vfio_pci_save_fops = {
> + .owner = THIS_MODULE,
> + .read = ice_vfio_pci_save_read,
> + .release = ice_vfio_pci_release_file,
> + .llseek = no_llseek,
> +};
> +
> +/**
> + * ice_vfio_pci_stop_copy - create migration file and save migration state to it
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + *
> + * Return migration file handler
> + */
> +static struct ice_vfio_pci_migration_file *
> +ice_vfio_pci_stop_copy(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + struct ice_vfio_pci_migration_file *migf;
> + int ret;
> +
> + migf = kzalloc(sizeof(*migf), GFP_KERNEL);
> + if (!migf)
> + return ERR_PTR(-ENOMEM);
> +
> + migf->filp = anon_inode_getfile("ice_vfio_pci_mig",
> + &ice_vfio_pci_save_fops, migf,
> + O_RDONLY);
> + if (IS_ERR(migf->filp)) {
> + int err = PTR_ERR(migf->filp);
> +
> + kfree(migf);
> + return ERR_PTR(err);
> + }
> +
> + stream_open(migf->filp->f_inode, migf->filp);
> + mutex_init(&migf->lock);
> +
> + ret = ice_vfio_pci_save_state(ice_vdev, migf);
> + if (ret) {
> + fput(migf->filp);
> + kfree(migf);
> + return ERR_PTR(ret);
> + }
> +
> + return migf;
> +}
> +
> +/**
> + * ice_vfio_pci_resume_write- copy migration file data from user space
> + * @filp: pointer to migration file
> + * @buf: pointer to user space buffer
> + * @len: data length to be copied
> + * @pos: should be 0
> + *
> + * Return len of saved data, negative for error
> + */
> +static ssize_t
> +ice_vfio_pci_resume_write(struct file *filp, const char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + struct ice_vfio_pci_migration_file *migf = filp->private_data;
> + loff_t *off = &filp->f_pos;
> + loff_t requested_length;
> + ssize_t done = 0;
> + int ret;
> +
> + if (pos)
> + return -ESPIPE;
> +
> + if (*off < 0 ||
> + check_add_overflow((loff_t)len, *off, &requested_length))
> + return -EINVAL;
> +
> + if (requested_length > sizeof(migf->mig_data))
> + return -ENOMEM;
> +
> + mutex_lock(&migf->lock);
> + if (migf->disabled) {
> + done = -ENODEV;
> + goto out_unlock;
> + }
> +
> + ret = copy_from_user(migf->mig_data + *off, buf, len);
> + if (ret) {
> + done = -EFAULT;
> + goto out_unlock;
> + }
> + *off += len;
> + done = len;
> + migf->total_length += len;
> +out_unlock:
> + mutex_unlock(&migf->lock);
> + return done;
> +}
> +
> +static const struct file_operations ice_vfio_pci_resume_fops = {
> + .owner = THIS_MODULE,
> + .write = ice_vfio_pci_resume_write,
> + .release = ice_vfio_pci_release_file,
> + .llseek = no_llseek,
> +};
> +
> +/**
> + * ice_vfio_pci_resume - create resuming migration file
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + *
> + * Return migration file handler, negative value for failure
> + */
> +static struct ice_vfio_pci_migration_file *
> +ice_vfio_pci_resume(struct ice_vfio_pci_core_device *ice_vdev)
> +{
> + struct ice_vfio_pci_migration_file *migf;
> +
> + migf = kzalloc(sizeof(*migf), GFP_KERNEL);
> + if (!migf)
> + return ERR_PTR(-ENOMEM);
> +
> + migf->filp = anon_inode_getfile("ice_vfio_pci_mig",
> + &ice_vfio_pci_resume_fops, migf,
> + O_WRONLY);
> + if (IS_ERR(migf->filp)) {
> + int err = PTR_ERR(migf->filp);
> +
> + kfree(migf);
> + return ERR_PTR(err);
> + }
> +
> + stream_open(migf->filp->f_inode, migf->filp);
> + mutex_init(&migf->lock);
> + return migf;
> +}
> +
> +/**
> + * ice_vfio_pci_step_device_state_locked - process device state change
> + * @ice_vdev: pointer to ice vfio pci core device structure
> + * @new: new device state
> + * @final: final device state
> + *
> + * Return migration file handler or NULL for success, negative value for failure
> + */
> +static struct file *
> +ice_vfio_pci_step_device_state_locked(struct ice_vfio_pci_core_device *ice_vdev,
> + u32 new, u32 final)
> +{
> + u32 cur = ice_vdev->mig_state;
> + int ret;
> +
> + if (cur == VFIO_DEVICE_STATE_RUNNING &&
> + new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> + ice_migration_suspend_dev(ice_vdev->pf, ice_vdev->vf_id);
> + return NULL;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
> + new == VFIO_DEVICE_STATE_STOP)
> + return NULL;
This looks suspicious; are we actually able to freeze the internal
device state? It should happen here.
* RUNNING_P2P -> STOP
* STOP_COPY -> STOP
* While in STOP the device must stop the operation of the device. The device
* must not generate interrupts, DMA, or any other change to external state.
* It must not change its internal state. When stopped the device and kernel
* migration driver must accept and respond to interaction to support external
* subsystems in the STOP state, for example PCI MSI-X and PCI config space.
* Failure by the user to restrict device access while in STOP must not result
* in error conditions outside the user context (ex. host system faults).
*
* The STOP_COPY arc will terminate a data transfer session.
> +
> + if (cur == VFIO_DEVICE_STATE_STOP &&
> + new == VFIO_DEVICE_STATE_STOP_COPY) {
> + struct ice_vfio_pci_migration_file *migf;
> +
> + migf = ice_vfio_pci_stop_copy(ice_vdev);
> + if (IS_ERR(migf))
> + return ERR_CAST(migf);
> + get_file(migf->filp);
> + ice_vdev->saving_migf = migf;
> + return migf->filp;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_STOP_COPY &&
> + new == VFIO_DEVICE_STATE_STOP) {
> + ice_vfio_pci_disable_fds(ice_vdev);
> + return NULL;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_STOP &&
> + new == VFIO_DEVICE_STATE_RESUMING) {
> + struct ice_vfio_pci_migration_file *migf;
> +
> + migf = ice_vfio_pci_resume(ice_vdev);
> + if (IS_ERR(migf))
> + return ERR_CAST(migf);
> + get_file(migf->filp);
> + ice_vdev->resuming_migf = migf;
> + return migf->filp;
> + }
> +
> + if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP)
> + return NULL;
* RESUMING -> STOP
* Leaving RESUMING terminates a data transfer session and indicates the
* device should complete processing of the data delivered by write(). The
* kernel migration driver should complete the incorporation of data written
* to the data transfer FD into the device internal state and perform
* final validity and consistency checking of the new device state. If the
* user provided data is found to be incomplete, inconsistent, or otherwise
* invalid, the migration driver must fail the SET_STATE ioctl and
* optionally go to the ERROR state as described below.
> +
> + if (cur == VFIO_DEVICE_STATE_STOP &&
> + new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> + ret = ice_vfio_pci_load_state(ice_vdev);
> + if (ret)
> + return ERR_PTR(ret);
> + ice_vfio_pci_disable_fds(ice_vdev);
STOP is not a state that should have active migration fds; RESUMING ->
STOP above is, which is also where we'd expect to see the state loaded.
This again makes it suspicious whether the device actually supports
stopping and resuming internal state changes.
> + return NULL;
> + }
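A sketch of the arc layout implied by the quoted uAPI text, reusing the helpers from the posted patch (illustrative only, not a tested change):

    if (cur == VFIO_DEVICE_STATE_RESUMING &&
        new == VFIO_DEVICE_STATE_STOP) {
            /* Complete incorporation of the written data here, then drop
             * the fds, so STOP never has an active data transfer session.
             */
            ret = ice_vfio_pci_load_state(ice_vdev);
            if (ret)
                    return ERR_PTR(ret);
            ice_vfio_pci_disable_fds(ice_vdev);
            return NULL;
    }

    if (cur == VFIO_DEVICE_STATE_STOP &&
        new == VFIO_DEVICE_STATE_RUNNING_P2P)
            return NULL;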
> +
> + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
> + new == VFIO_DEVICE_STATE_RUNNING)
> + return NULL;
> +
> + /*
> + * vfio_mig_get_next_state() does not use arcs other than the above
> + */
> + WARN_ON(true);
> + return ERR_PTR(-EINVAL);
> +}
> +
> +/**
> + * ice_vfio_pci_set_device_state - Config device state
> + * @vdev: pointer to vfio pci device
> + * @new_state: device state
> + *
> + * Return 0 for success, negative value for failure.
Inaccurate description of return value.
> + */
> +static struct file *
> +ice_vfio_pci_set_device_state(struct vfio_device *vdev,
> + enum vfio_device_mig_state new_state)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev =
> + container_of(vdev,
> + struct ice_vfio_pci_core_device,
> + core_device.vdev);
> + enum vfio_device_mig_state next_state;
> + struct file *res = NULL;
> + int ret;
> +
> + mutex_lock(&ice_vdev->state_mutex);
> + while (new_state != ice_vdev->mig_state) {
> + ret = vfio_mig_get_next_state(vdev, ice_vdev->mig_state,
> + new_state, &next_state);
> + if (ret) {
> + res = ERR_PTR(ret);
> + break;
> + }
> + res = ice_vfio_pci_step_device_state_locked(ice_vdev,
> + next_state,
> + new_state);
> + if (IS_ERR(res))
> + break;
> + ice_vdev->mig_state = next_state;
> + if (WARN_ON(res && new_state != ice_vdev->mig_state)) {
> + fput(res);
> + res = ERR_PTR(-EINVAL);
> + break;
> + }
> + }
> + ice_vfio_pci_state_mutex_unlock(ice_vdev);
> + return res;
> +}
> +
> +/**
> + * ice_vfio_pci_get_device_state - get device state
> + * @vdev: pointer to vfio pci device
> + * @curr_state: device state
> + *
> + * Return 0 for success
> + */
> +static int ice_vfio_pci_get_device_state(struct vfio_device *vdev,
> + enum vfio_device_mig_state *curr_state)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev =
> + container_of(vdev,
> + struct ice_vfio_pci_core_device,
> + core_device.vdev);
Blank line after variable declaration.
> + mutex_lock(&ice_vdev->state_mutex);
> + *curr_state = ice_vdev->mig_state;
> + ice_vfio_pci_state_mutex_unlock(ice_vdev);
> + return 0;
> +}
> +
> +/**
> + * ice_vfio_pci_get_data_size - get migration data size
> + * @vdev: pointer to vfio pci device
> + * @stop_copy_length: migration data size
> + *
> + * Return 0 for success
> + */
> +static int
> +ice_vfio_pci_get_data_size(struct vfio_device *vdev,
> + unsigned long *stop_copy_length)
> +{
> + *stop_copy_length = SZ_128K;
> + return 0;
> +}
> +
> +static const struct vfio_migration_ops ice_vfio_pci_migrn_state_ops = {
> + .migration_set_state = ice_vfio_pci_set_device_state,
> + .migration_get_state = ice_vfio_pci_get_device_state,
> + .migration_get_data_size = ice_vfio_pci_get_data_size,
> +};
> +
> +/**
> + * ice_vfio_pci_core_init_dev - initialize vfio device
> + * @core_vdev: pointer to vfio device
> + *
> + * Return 0 for success
> + */
> +static int ice_vfio_pci_core_init_dev(struct vfio_device *core_vdev)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
> + struct ice_vfio_pci_core_device, core_device.vdev);
> +
> + mutex_init(&ice_vdev->state_mutex);
> + spin_lock_init(&ice_vdev->reset_lock);
> +
> + core_vdev->migration_flags =
> + VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;
> + core_vdev->mig_ops = &ice_vfio_pci_migrn_state_ops;
> +
> + return vfio_pci_core_init_dev(core_vdev);
> +}
> +
> +static const struct vfio_device_ops ice_vfio_pci_ops = {
> + .name = "ice-vfio-pci",
> + .init = ice_vfio_pci_core_init_dev,
> + .release = vfio_pci_core_release_dev,
Looks like the release callback should at least clean up the locks for
lockdep rather than use the core function directly; see the sketch after
the quoted ops table.
> + .open_device = ice_vfio_pci_open_device,
> + .close_device = ice_vfio_pci_close_device,
> + .device_feature = vfio_pci_core_ioctl_feature,
> + .read = vfio_pci_core_read,
> + .write = vfio_pci_core_write,
> + .ioctl = vfio_pci_core_ioctl,
> + .mmap = vfio_pci_core_mmap,
> + .request = vfio_pci_core_request,
> + .match = vfio_pci_core_match,
> + .bind_iommufd = vfio_iommufd_physical_bind,
> + .unbind_iommufd = vfio_iommufd_physical_unbind,
> + .attach_ioas = vfio_iommufd_physical_attach_ioas,
> + .detach_ioas = vfio_iommufd_physical_detach_ioas,
> +};
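One way to handle the release-callback point above, as a sketch (the wrapper name is made up; it only tears down the mutex initialized in init before handing off to the core helper):

    static void ice_vfio_pci_release_dev(struct vfio_device *core_vdev)
    {
            struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev,
                            struct ice_vfio_pci_core_device, core_device.vdev);

            mutex_destroy(&ice_vdev->state_mutex);
            vfio_pci_core_release_dev(core_vdev);
    }

with .release = ice_vfio_pci_release_dev in ice_vfio_pci_ops.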
> +
> +/**
> + * ice_vfio_pci_probe - Device initialization routine
> + * @pdev: PCI device information struct
> + * @id: entry in ice_vfio_pci_table
> + *
> + * Returns 0 on success, negative on failure
> + */
> +static int
> +ice_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev;
> + int ret;
> +
> + ice_vdev = vfio_alloc_device(ice_vfio_pci_core_device, core_device.vdev,
> + &pdev->dev, &ice_vfio_pci_ops);
> + if (!ice_vdev)
Needs to test IS_ERR(ice_vdev); see the sketch after the quoted probe
function. Thanks,
Alex
> + return -ENOMEM;
> +
> + dev_set_drvdata(&pdev->dev, &ice_vdev->core_device);
> +
> + ret = vfio_pci_core_register_device(&ice_vdev->core_device);
> + if (ret)
> + goto out_free;
> +
> + return 0;
> +
> +out_free:
> + vfio_put_device(&ice_vdev->core_device.vdev);
> + return ret;
> +}
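The allocation check asked for above would look roughly like this (sketch of only the changed lines; vfio_alloc_device() returns an ERR_PTR on failure):

    ice_vdev = vfio_alloc_device(ice_vfio_pci_core_device, core_device.vdev,
                                 &pdev->dev, &ice_vfio_pci_ops);
    if (IS_ERR(ice_vdev))
            return PTR_ERR(ice_vdev);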
> +
> +/**
> + * ice_vfio_pci_remove - Device removal routine
> + * @pdev: PCI device information struct
> + */
> +static void ice_vfio_pci_remove(struct pci_dev *pdev)
> +{
> + struct ice_vfio_pci_core_device *ice_vdev =
> + (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev);
> +
> + vfio_pci_core_unregister_device(&ice_vdev->core_device);
> + vfio_put_device(&ice_vdev->core_device.vdev);
> +}
> +
> +/* ice_pci_tbl - PCI Device ID Table
> + *
> + * Wildcard entries (PCI_ANY_ID) should come last
> + * Last entry must be all 0s
> + *
> + * { Vendor ID, Device ID, SubVendor ID, SubDevice ID,
> + * Class, Class Mask, private data (not used) }
> + */
> +static const struct pci_device_id ice_vfio_pci_table[] = {
> + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x1889) },
> + {}
> +};
> +MODULE_DEVICE_TABLE(pci, ice_vfio_pci_table);
> +
> +static const struct pci_error_handlers ice_vfio_pci_core_err_handlers = {
> + .reset_done = ice_vfio_pci_reset_done,
> + .error_detected = vfio_pci_core_aer_err_detected,
> +};
> +
> +static struct pci_driver ice_vfio_pci_driver = {
> + .name = "ice-vfio-pci",
> + .id_table = ice_vfio_pci_table,
> + .probe = ice_vfio_pci_probe,
> + .remove = ice_vfio_pci_remove,
> + .err_handler = &ice_vfio_pci_core_err_handlers,
> + .driver_managed_dma = true,
> +};
> +
> +module_pci_driver(ice_vfio_pci_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Intel Corporation, <linux.nics@intel.com>");
> +MODULE_DESCRIPTION(DRIVER_DESC);
* Re: [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
2023-11-29 17:12 ` Simon Horman
2023-12-07 7:33 ` Tian, Kevin
@ 2023-12-08 1:53 ` Brett Creeley
2 siblings, 0 replies; 33+ messages in thread
From: Brett Creeley @ 2023-12-08 1:53 UTC (permalink / raw)
To: Yahui Cao, intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/20/2023 6:51 PM, Yahui Cao wrote:
>
> From: Lingyu Liu <lingyu.liu@intel.com>
>
> Save the virtual channel messages sent by VF on the source side during
> runtime. The logged virtchnl messages will be transferred and loaded
> into the device on the destination side during the device resume stage.
>
> For the feature which can not be migrated yet, it must be disabled or
> blocked to prevent from being abused by VF. Otherwise, it may introduce
> functional and security issue. Mask unsupported VF capability flags in
> the VF-PF negotiaion stage.
s/negotiaion/negotiation/
>
> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
> ---
> .../net/ethernet/intel/ice/ice_migration.c | 167 ++++++++++++++++++
> .../intel/ice/ice_migration_private.h | 17 ++
> drivers/net/ethernet/intel/ice/ice_vf_lib.h | 5 +
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 31 ++++
> 4 files changed, 220 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c
> index 2b9b5a2ce367..18ec4ec7d147 100644
> --- a/drivers/net/ethernet/intel/ice/ice_migration.c
> +++ b/drivers/net/ethernet/intel/ice/ice_migration.c
> @@ -3,6 +3,17 @@
>
> #include "ice.h"
>
> +struct ice_migration_virtchnl_msg_slot {
> + u32 opcode;
> + u16 msg_len;
> + char msg_buffer[];
> +};
> +
> +struct ice_migration_virtchnl_msg_listnode {
> + struct list_head node;
> + struct ice_migration_virtchnl_msg_slot msg_slot;
> +};
> +
> /**
> * ice_migration_get_pf - Get ice PF structure pointer by pdev
> * @pdev: pointer to ice vfio pci VF pdev structure
> @@ -22,6 +33,9 @@ EXPORT_SYMBOL(ice_migration_get_pf);
> void ice_migration_init_vf(struct ice_vf *vf)
> {
> vf->migration_enabled = true;
> + INIT_LIST_HEAD(&vf->virtchnl_msg_list);
> + vf->virtchnl_msg_num = 0;
> + vf->virtchnl_msg_size = 0;
> }
>
> /**
> @@ -30,10 +44,24 @@ void ice_migration_init_vf(struct ice_vf *vf)
> */
> void ice_migration_uninit_vf(struct ice_vf *vf)
> {
> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
> + struct ice_migration_virtchnl_msg_listnode *dtmp;
> +
> if (!vf->migration_enabled)
> return;
>
> vf->migration_enabled = false;
> +
> + if (list_empty(&vf->virtchnl_msg_list))
> + return;
> + list_for_each_entry_safe(msg_listnode, dtmp,
> + &vf->virtchnl_msg_list,
> + node) {
> + list_del(&msg_listnode->node);
> + kfree(msg_listnode);
> + }
> + vf->virtchnl_msg_num = 0;
> + vf->virtchnl_msg_size = 0;
> }
>
> /**
> @@ -80,3 +108,142 @@ void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id)
> ice_put_vf(vf);
> }
> EXPORT_SYMBOL(ice_migration_uninit_dev);
> +
> +/**
> + * ice_migration_is_loggable_msg - is this message loggable or not
> + * @v_opcode: virtchnl message operation code
> + *
> + * Return true if this message logging is supported, otherwise return false
> + */
> +static inline bool ice_migration_is_loggable_msg(u32 v_opcode)
> +{
> + switch (v_opcode) {
> + case VIRTCHNL_OP_VERSION:
> + case VIRTCHNL_OP_GET_VF_RESOURCES:
> + case VIRTCHNL_OP_CONFIG_VSI_QUEUES:
> + case VIRTCHNL_OP_CONFIG_IRQ_MAP:
> + case VIRTCHNL_OP_ADD_ETH_ADDR:
> + case VIRTCHNL_OP_DEL_ETH_ADDR:
> + case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE:
> + case VIRTCHNL_OP_ENABLE_QUEUES:
> + case VIRTCHNL_OP_DISABLE_QUEUES:
> + case VIRTCHNL_OP_ADD_VLAN:
> + case VIRTCHNL_OP_DEL_VLAN:
> + case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING:
> + case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING:
> + case VIRTCHNL_OP_CONFIG_RSS_KEY:
> + case VIRTCHNL_OP_CONFIG_RSS_LUT:
> + case VIRTCHNL_OP_GET_SUPPORTED_RXDIDS:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> +/**
> + * ice_migration_log_vf_msg - Log request message from VF
> + * @vf: pointer to the VF structure
> + * @event: pointer to the AQ event
> + *
> + * Log VF message for later device state loading during live migration
> + *
> + * Return 0 for success, negative for error
> + */
> +int ice_migration_log_vf_msg(struct ice_vf *vf,
> + struct ice_rq_event_info *event)
> +{
> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
> + u32 v_opcode = le32_to_cpu(event->desc.cookie_high);
> + struct device *dev = ice_pf_to_dev(vf->pf);
> + u16 msglen = event->msg_len;
> + u8 *msg = event->msg_buf;
> +
> + if (!ice_migration_is_loggable_msg(v_opcode))
> + return 0;
> +
> + if (vf->virtchnl_msg_num >= VIRTCHNL_MSG_MAX) {
> + dev_warn(dev, "VF %d has maximum number virtual channel commands\n",
> + vf->vf_id);
> + return -ENOMEM;
> + }
> +
> + msg_listnode = (struct ice_migration_virtchnl_msg_listnode *)
> + kzalloc(struct_size(msg_listnode,
> + msg_slot.msg_buffer,
> + msglen),
> + GFP_KERNEL);
> + if (!msg_listnode) {
> + dev_err(dev, "VF %d failed to allocate memory for msg listnode\n",
> + vf->vf_id);
> + return -ENOMEM;
> + }
> + dev_dbg(dev, "VF %d save virtual channel command, op code: %d, len: %d\n",
> + vf->vf_id, v_opcode, msglen);
> + msg_listnode->msg_slot.opcode = v_opcode;
> + msg_listnode->msg_slot.msg_len = msglen;
> + memcpy(msg_listnode->msg_slot.msg_buffer, msg, msglen);
It seems like this can still be abused. What if the VM/VF user sends
hundreds of thousands of ADD_ADDR/DEL_ADDR, ADD_VLAN/DEL_VLAN,
PROMISCUOUS, ENABLE_VLAN_STRIPPING/DISABLE_VLAN_STRIPPING, RSS_LUT,
RSS_KEY, etc.?
Shouldn't you only maintain one copy for each key/value when it makes
sense? For example, you don't need multiple RSS_LUT and RSS_KEY messages
logged as just the most recent one is needed.
What if multiple promiscuous messages are sent? Do you need to save them
all or just the most recent?
What if you have an ADD_ADDR/DEL_ADDR for the same address? Do you need
to save both of those messages? Seems like when you get a DEL_ADDR you
should search for the associated ADD_ADDR and just remove it. Same
comment applies for ADD_VLAN/DEL_VLAN.
> + list_add_tail(&msg_listnode->node, &vf->virtchnl_msg_list);
> + vf->virtchnl_msg_num++;
> + vf->virtchnl_msg_size += struct_size(&msg_listnode->msg_slot,
> + msg_buffer,
> + msglen);
> + return 0;
> +}
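A sketch of the kind of coalescing Brett suggests above, for opcodes where only the latest message matters (illustrative only; old and tmp are hypothetical locals of type struct ice_migration_virtchnl_msg_listnode, and matching ADD/DEL pairs on their payload would need more than this):

    /* Before appending: for "latest wins" opcodes such as
     * VIRTCHNL_OP_CONFIG_RSS_KEY/_LUT, drop any previously logged entry
     * for the same opcode so repeated messages cannot grow the list.
     */
    list_for_each_entry_safe(old, tmp, &vf->virtchnl_msg_list, node) {
            if (old->msg_slot.opcode != v_opcode)
                    continue;
            list_del(&old->node);
            vf->virtchnl_msg_num--;
            vf->virtchnl_msg_size -= struct_size(&old->msg_slot, msg_buffer,
                                                 old->msg_slot.msg_len);
            kfree(old);
    }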
> +
> +/**
> + * ice_migration_unlog_vf_msg - revert logged message
> + * @vf: pointer to the VF structure
> + * @v_opcode: virtchnl message operation code
> + *
> + * Remove the last virtual channel message logged before.
> + */
> +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode)
> +{
> + struct ice_migration_virtchnl_msg_listnode *msg_listnode;
> +
> + if (!ice_migration_is_loggable_msg(v_opcode))
> + return;
> +
> + if (WARN_ON_ONCE(list_empty(&vf->virtchnl_msg_list)))
> + return;
> +
> + msg_listnode =
> + list_last_entry(&vf->virtchnl_msg_list,
> + struct ice_migration_virtchnl_msg_listnode,
> + node);
> + if (WARN_ON_ONCE(msg_listnode->msg_slot.opcode != v_opcode))
> + return;
> +
> + list_del(&msg_listnode->node);
> + kfree(msg_listnode);
> + vf->virtchnl_msg_num--;
> + vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot,
> + msg_buffer,
> + msg_listnode->msg_slot.msg_len);
> +}
> +
> +#define VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE \
> + (VIRTCHNL_VF_OFFLOAD_L2 | \
> + VIRTCHNL_VF_OFFLOAD_RSS_PF | \
> + VIRTCHNL_VF_OFFLOAD_RSS_AQ | \
> + VIRTCHNL_VF_OFFLOAD_RSS_REG | \
> + VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2 | \
> + VIRTCHNL_VF_OFFLOAD_ENCAP | \
> + VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM | \
> + VIRTCHNL_VF_OFFLOAD_RX_POLLING | \
> + VIRTCHNL_VF_OFFLOAD_WB_ON_ITR | \
> + VIRTCHNL_VF_CAP_ADV_LINK_SPEED | \
> + VIRTCHNL_VF_OFFLOAD_VLAN | \
> + VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC | \
> + VIRTCHNL_VF_OFFLOAD_USO)
> +
> +/**
> + * ice_migration_supported_caps - get migration supported VF capabilities
> + *
> + * When migration is activated, some VF capabilities are not supported.
> + * Hence unmask those capability flags for VF resources.
> + */
> +u32 ice_migration_supported_caps(void)
> +{
> + return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE;
> +}
> diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h
> index 2cc2f515fc5e..676eb2d6c12e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_migration_private.h
> +++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h
> @@ -13,9 +13,26 @@
> #if IS_ENABLED(CONFIG_ICE_VFIO_PCI)
> void ice_migration_init_vf(struct ice_vf *vf);
> void ice_migration_uninit_vf(struct ice_vf *vf);
> +int ice_migration_log_vf_msg(struct ice_vf *vf,
> + struct ice_rq_event_info *event);
> +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode);
> +u32 ice_migration_supported_caps(void);
> #else
> static inline void ice_migration_init_vf(struct ice_vf *vf) { }
> static inline void ice_migration_uninit_vf(struct ice_vf *vf) { }
> +static inline int ice_migration_log_vf_msg(struct ice_vf *vf,
> + struct ice_rq_event_info *event)
> +{
> + return 0;
> +}
> +
> +static inline void
> +ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode) { }
> +static inline u32
> +ice_migration_supported_caps(void)
> +{
> + return 0xFFFFFFFF;
> +}
> #endif /* CONFIG_ICE_VFIO_PCI */
>
> #endif /* _ICE_MIGRATION_PRIVATE_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> index 431fd28787e8..318b6dfc016d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> @@ -77,6 +77,7 @@ struct ice_vfs {
> unsigned long last_printed_mdd_jiffies; /* MDD message rate limit */
> };
>
> +#define VIRTCHNL_MSG_MAX 1000
This seems fairly arbitrary. How did you come up with this value? It
seems like you can figure out the max number of messages needed for a
single VF and it wouldn't be too unreasonable. What if it's a trusted VF
that supports 4K VLANs?
Also, should this be named more appropriately since it's specific to the
ice driver, i.e.:
ICE_VF_VIRTCHNL_LOGGABLE_MSG_MAX
> /* VF information structure */
> struct ice_vf {
> struct hlist_node entry;
> @@ -141,6 +142,10 @@ struct ice_vf {
> u16 num_msix; /* num of MSI-X configured on this VF */
>
> u8 migration_enabled:1;
> + struct list_head virtchnl_msg_list;
> + u64 virtchnl_msg_num;
> + u64 virtchnl_msg_size;
> + u32 virtchnl_retval;
> };
>
> /* Flags for controlling behavior of ice_reset_vf */
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index 661ca86c3032..730eeaea8c89 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -348,6 +348,12 @@ ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
> return -EIO;
> }
>
> + /* v_retval will not be returned in this function, store it in the
> + * per VF field to be used by migration logging logic later.
> + */
> + if (vf->migration_enabled)
> + vf->virtchnl_retval = v_retval;
> +
> return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen);
> }
>
> @@ -480,6 +486,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
> VIRTCHNL_VF_OFFLOAD_RSS_REG |
> VIRTCHNL_VF_OFFLOAD_VLAN;
>
> + if (vf->migration_enabled)
> + vf->driver_caps &= ice_migration_supported_caps();
> vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
> vsi = ice_get_vf_vsi(vf);
> if (!vsi) {
> @@ -4037,6 +4045,17 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
> goto finish;
> }
>
> + if (vf->migration_enabled) {
> + if (ice_migration_log_vf_msg(vf, event)) {
> + u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY;
> +
> + err = ice_vc_respond_to_vf(vf, v_opcode,
> + status_code,
> + NULL, 0);
> + goto finish;
> + }
> + }
> +
> switch (v_opcode) {
> case VIRTCHNL_OP_VERSION:
> err = ops->get_ver_msg(vf, msg);
> @@ -4156,6 +4175,18 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
> vf_id, v_opcode, err);
> }
>
> + /* All of the loggable virtual channel messages are logged by
> + * ice_migration_unlog_vf_msg() before they are processed.
Should this be ice_migration_log_vf_msg() in the comment instead?
> + *
> + * Two kinds of errors may happen: either the virtual channel message's
> + * result is a failure after it is processed by the PF, or the message
> + * is not sent to the VF successfully. If either error happens, fall
> + * back here by reverting the logged messages.
> + */
> + if (vf->migration_enabled &&
> + (vf->virtchnl_retval != VIRTCHNL_STATUS_SUCCESS || err))
> + ice_migration_unlog_vf_msg(vf, v_opcode);
> +
> finish:
> mutex_unlock(&vf->cfg_lock);
> ice_put_vf(vf);
> --
> 2.34.1
>
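For reference, the log/revert flow discussed in the comments above boils down to the sketch below; it condenses the hunks in this patch, and ice_vc_handle_opcode() is a hypothetical stand-in for the big opcode switch:
	if (vf->migration_enabled && ice_migration_log_vf_msg(vf, event)) {
		/* no room to log: fail the request back to the VF */
		err = ice_vc_respond_to_vf(vf, v_opcode,
					   VIRTCHNL_STATUS_ERR_NO_MEMORY,
					   NULL, 0);
		goto finish;
	}
	err = ice_vc_handle_opcode(vf, v_opcode, msg);	/* hypothetical helper */
	/* Revert the logged message if PF-side handling failed or the
	 * response could not be sent to the VF.
	 */
	if (vf->migration_enabled &&
	    (vf->virtchnl_retval != VIRTCHNL_STATUS_SUCCESS || err))
		ice_migration_unlog_vf_msg(vf, v_opcode);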
* RE: [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head
2023-12-07 14:46 ` Jason Gunthorpe
@ 2023-12-08 2:53 ` Tian, Kevin
0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2023-12-08 2:53 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Cao, Yahui, intel-wired-lan@lists.osuosl.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, Liu, Lingyu, Chittim, Madhu,
Samudrala, Sridhar, alex.williamson@redhat.com,
yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com,
brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, December 7, 2023 10:46 PM
>
> On Thu, Dec 07, 2023 at 07:55:17AM +0000, Tian, Kevin wrote:
> > > From: Cao, Yahui <yahui.cao@intel.com>
> > > Sent: Tuesday, November 21, 2023 10:51 AM
> > >
> > > +
> > > + /* Once RX Queue is enabled, network traffic may come in at any
> > > + * time. As a result, RX Queue head needs to be loaded before
> > > + * RX Queue is enabled.
> > > + * For simplicity and integration, overwrite RX head just after
> > > + * RX ring context is configured.
> > > + */
> > > + if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES) {
> > > + ret = ice_migration_load_rx_head(vf, devstate);
> > > + if (ret) {
> > > + dev_err(dev, "VF %d failed to load rx head\n",
> > > + vf->vf_id);
> > > + goto out_clear_replay;
> > > + }
> > > + }
> > > +
> >
> > Don't we have the same problem here as for TX head restore, i.e. the
> > vfio migration protocol doesn't carry a way to tell whether the IOAS
> > associated with the device has been restored, so allowing RX DMA at
> > this point might cause device errors?
>
> Does this trigger a DMA?
Looks like yes, judging from the comment.
>
> > @Jason, is this a common gap that applies to all devices which have a
> > receive path from the link? How is it handled in the mlx migration
> > driver?
>
> There should be no DMA until the device is placed in RUNNING. All
> devices may instantly trigger DMA once placed in RUNNING.
>
> The VMM must ensure the entire environment is ready to go before
> putting anything in RUNNING, including having set up the IOMMU.
>
Ah yes, that is the right behavior.
If there is no other way to block DMA before RUNNING is reached, then
the RX queue should be left disabled until the device transitions to
RUNNING.
Yahui, can you double check?
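To make that concrete, a purely illustrative shape for the fix could be the following; the arc names are from the v2 uAPI, while the helpers and exact placement are assumptions rather than code from this series:
	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
		/* Load the queue context and overwrite the RX head here, but
		 * keep the RX queues disabled: the IOAS may not be restored
		 * yet, so the device must not receive into guest memory.
		 */
	}
	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
	    new == VFIO_DEVICE_STATE_RUNNING) {
		/* The VMM guarantees the environment (including the IOMMU)
		 * is ready before requesting RUNNING, so the RX queues can
		 * be enabled here.
		 */
	}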
* RE: [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices
2023-12-07 22:43 ` Alex Williamson
@ 2023-12-08 3:42 ` Tian, Kevin
2023-12-08 3:42 ` Tian, Kevin
0 siblings, 1 reply; 33+ messages in thread
From: Tian, Kevin @ 2023-12-08 3:42 UTC (permalink / raw)
To: Alex Williamson, Cao, Yahui
Cc: intel-wired-lan@lists.osuosl.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, Liu, Lingyu, Chittim, Madhu,
Samudrala, Sridhar, jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Friday, December 8, 2023 6:43 AM
> > +
> > + if (cur == VFIO_DEVICE_STATE_RUNNING &&
> > + new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> > + ice_migration_suspend_dev(ice_vdev->pf, ice_vdev->vf_id);
> > + return NULL;
> > + }
> > +
> > + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
> > + new == VFIO_DEVICE_STATE_STOP)
> > + return NULL;
>
> This looks suspicious, are we actually able to freeze the internal
> device state? It should happen here.
>
> * RUNNING_P2P -> STOP
> * STOP_COPY -> STOP
> * While in STOP the device must stop the operation of the device. The device
> * must not generate interrupts, DMA, or any other change to external state.
> * It must not change its internal state. When stopped the device and kernel
> * migration driver must accept and respond to interaction to support external
> * subsystems in the STOP state, for example PCI MSI-X and PCI config space.
> * Failure by the user to restrict device access while in STOP must not result
> * in error conditions outside the user context (ex. host system faults).
> *
> * The STOP_COPY arc will terminate a data transfer session.
>
It was discussed in v3 [1].
This device only provides a way to drain/stop outgoing traffic (for
RUNNING->RUNNING_P2P). There is no interface for stopping incoming
requests.
Jason explained that the RUNNING_P2P->STOP transition can be a 'nop' as
long as there is a guarantee that the device state is frozen at this point.
By definition the user should request this transition only after all devices
are put in RUNNING_P2P. At that point no one is sending P2P requests to
further affect the internal state of this device. So an explicit "stop
responder" action is not strictly required and a 'nop' can still meet the
above definition.
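Summarizing that reasoning next to the arc in question (the comments below paraphrase the explanation above, they are not from the patch):
	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
	    new == VFIO_DEVICE_STATE_STOP) {
		/* Outgoing traffic was already drained on the
		 * RUNNING -> RUNNING_P2P arc.  The user only requests this
		 * arc once every device in the VM is in RUNNING_P2P, so no
		 * peer can issue P2P requests that would change this
		 * device's internal state.  With the state effectively
		 * frozen, a no-op is enough to satisfy the STOP definition.
		 */
		return NULL;
	}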
* RE: [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices
2023-12-08 3:42 ` Tian, Kevin
@ 2023-12-08 3:42 ` Tian, Kevin
0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2023-12-08 3:42 UTC (permalink / raw)
To: Alex Williamson, Cao, Yahui
Cc: intel-wired-lan@lists.osuosl.org, kvm@vger.kernel.org,
netdev@vger.kernel.org, Liu, Lingyu, Chittim, Madhu,
Samudrala, Sridhar, jgg@nvidia.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com
> From: Tian, Kevin
> Sent: Friday, December 8, 2023 11:42 AM
>
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Friday, December 8, 2023 6:43 AM
> > > +
> > > + if (cur == VFIO_DEVICE_STATE_RUNNING &&
> > > + new == VFIO_DEVICE_STATE_RUNNING_P2P) {
> > > + ice_migration_suspend_dev(ice_vdev->pf, ice_vdev->vf_id);
> > > + return NULL;
> > > + }
> > > +
> > > + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
> > > + new == VFIO_DEVICE_STATE_STOP)
> > > + return NULL;
> >
> > This looks suspicious, are we actually able to freeze the internal
> > device state? It should happen here.
> >
> > * RUNNING_P2P -> STOP
> > * STOP_COPY -> STOP
> > * While in STOP the device must stop the operation of the device. The device
> > * must not generate interrupts, DMA, or any other change to external state.
> > * It must not change its internal state. When stopped the device and kernel
> > * migration driver must accept and respond to interaction to support external
> > * subsystems in the STOP state, for example PCI MSI-X and PCI config space.
> > * Failure by the user to restrict device access while in STOP must not result
> > * in error conditions outside the user context (ex. host system faults).
> > *
> > * The STOP_COPY arc will terminate a data transfer session.
> >
>
> It was discussed in v3 [1].
>
> This device only provides a way to drain/stop outgoing traffic (for
> RUNNING->RUNNING_P2P). There is no interface for stopping incoming
> requests.
>
> Jason explained that the RUNNING_P2P->STOP transition can be a 'nop' as
> long as there is a guarantee that the device state is frozen at this point.
>
> By definition the user should request this transition only after all devices
> are put in RUNNING_P2P. At that point no one is sending P2P requests to
> further affect the internal state of this device. So an explicit "stop
> responder" action is not strictly required and a 'nop' can still meet the
> above definition.
[1] https://lore.kernel.org/intel-wired-lan/20231013140744.GT3952@nvidia.com/
* Re: [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context
2023-11-21 2:51 ` [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context Yahui Cao
@ 2023-12-08 22:01 ` Brett Creeley
0 siblings, 0 replies; 33+ messages in thread
From: Brett Creeley @ 2023-12-08 22:01 UTC (permalink / raw)
To: Yahui Cao, intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/20/2023 6:51 PM, Yahui Cao wrote:
>
> Export RX queue context get function which is consumed by linux live
> migration driver to save and load device state.
Nit, but I don't think "linux" needs to be mentioned here.
>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_common.c | 268 ++++++++++++++++++++
> drivers/net/ethernet/intel/ice/ice_common.h | 5 +
> 2 files changed, 273 insertions(+)
>
[...]
* Re: [PATCH iwl-next v4 02/12] ice: Add function to get and set TX queue context
2023-11-21 2:51 ` [PATCH iwl-next v4 02/12] ice: Add function to get and set TX " Yahui Cao
@ 2023-12-08 22:14 ` Brett Creeley
0 siblings, 0 replies; 33+ messages in thread
From: Brett Creeley @ 2023-12-08 22:14 UTC (permalink / raw)
To: Yahui Cao, intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/20/2023 6:51 PM, Yahui Cao wrote:
>
> Export TX queue context get and set function which is consumed by linux
> live migration driver to save and load device state.
Nit, but I don't think "linux" needs to be mentioned here.
>
> TX queue context contains static fields which do not change during TX
> traffic and dynamic fields which may change during TX traffic.
>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_common.c | 216 +++++++++++++++++-
> drivers/net/ethernet/intel/ice/ice_common.h | 6 +
> .../net/ethernet/intel/ice/ice_hw_autogen.h | 15 ++
> .../net/ethernet/intel/ice/ice_lan_tx_rx.h | 3 +
> 4 files changed, 239 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index d0a3bed00921..8577a5ef423e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -1645,7 +1645,10 @@ ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
> return ice_get_ctx(ctx_buf, (u8 *)rlan_ctx, ice_rlan_ctx_info);
> }
>
> -/* LAN Tx Queue Context */
> +/* LAN Tx Queue Context used for set Tx config by ice_aqc_opc_add_txqs,
> + * Bit[0-175] is valid
> + */
> +
> const struct ice_ctx_ele ice_tlan_ctx_info[] = {
> /* Field Width LSB */
> ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0),
> @@ -1679,6 +1682,217 @@ const struct ice_ctx_ele ice_tlan_ctx_info[] = {
> { 0 }
> };
>
> +/* LAN Tx Queue Context used for get Tx config from QTXCOMM_CNTX data,
> + * Bit[0-292] is valid, including internal queue state. Since internal
> + * queue state is dynamic field, its value will be cleared once queue
> + * is disabled
> + */
> +static const struct ice_ctx_ele ice_tlan_ctx_data_info[] = {
> + /* Field Width LSB */
> + ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0),
> + ICE_CTX_STORE(ice_tlan_ctx, port_num, 3, 57),
> + ICE_CTX_STORE(ice_tlan_ctx, cgd_num, 5, 60),
> + ICE_CTX_STORE(ice_tlan_ctx, pf_num, 3, 65),
> + ICE_CTX_STORE(ice_tlan_ctx, vmvf_num, 10, 68),
> + ICE_CTX_STORE(ice_tlan_ctx, vmvf_type, 2, 78),
> + ICE_CTX_STORE(ice_tlan_ctx, src_vsi, 10, 80),
> + ICE_CTX_STORE(ice_tlan_ctx, tsyn_ena, 1, 90),
> + ICE_CTX_STORE(ice_tlan_ctx, internal_usage_flag, 1, 91),
> + ICE_CTX_STORE(ice_tlan_ctx, alt_vlan, 1, 92),
> + ICE_CTX_STORE(ice_tlan_ctx, cpuid, 8, 93),
> + ICE_CTX_STORE(ice_tlan_ctx, wb_mode, 1, 101),
> + ICE_CTX_STORE(ice_tlan_ctx, tphrd_desc, 1, 102),
> + ICE_CTX_STORE(ice_tlan_ctx, tphrd, 1, 103),
> + ICE_CTX_STORE(ice_tlan_ctx, tphwr_desc, 1, 104),
> + ICE_CTX_STORE(ice_tlan_ctx, cmpq_id, 9, 105),
> + ICE_CTX_STORE(ice_tlan_ctx, qnum_in_func, 14, 114),
> + ICE_CTX_STORE(ice_tlan_ctx, itr_notification_mode, 1, 128),
> + ICE_CTX_STORE(ice_tlan_ctx, adjust_prof_id, 6, 129),
> + ICE_CTX_STORE(ice_tlan_ctx, qlen, 13, 135),
> + ICE_CTX_STORE(ice_tlan_ctx, quanta_prof_idx, 4, 148),
> + ICE_CTX_STORE(ice_tlan_ctx, tso_ena, 1, 152),
> + ICE_CTX_STORE(ice_tlan_ctx, tso_qnum, 11, 153),
> + ICE_CTX_STORE(ice_tlan_ctx, legacy_int, 1, 164),
> + ICE_CTX_STORE(ice_tlan_ctx, drop_ena, 1, 165),
> + ICE_CTX_STORE(ice_tlan_ctx, cache_prof_idx, 2, 166),
> + ICE_CTX_STORE(ice_tlan_ctx, pkt_shaper_prof_idx, 3, 168),
> + ICE_CTX_STORE(ice_tlan_ctx, tail, 13, 184),
> + { 0 }
> +};
> +
> +/**
> + * ice_copy_txq_ctx_from_hw - Copy txq context register from HW
> + * @hw: pointer to the hardware structure
> + * @ice_txq_ctx: pointer to the txq context
> + *
> + * Copy txq context from HW register space to dense structure
> + */
> +static int
> +ice_copy_txq_ctx_from_hw(struct ice_hw *hw, u8 *ice_txq_ctx)
> +{
> + u8 i;
> +
> + if (!ice_txq_ctx)
> + return -EINVAL;
> +
> + /* Copy each dword separately from HW */
> + for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) {
> + u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32)));
> +
> + *ctx = rd32(hw, GLCOMM_QTX_CNTX_DATA(i));
> +
> + ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx);
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * ice_copy_txq_ctx_to_hw - Copy txq context register into HW
> + * @hw: pointer to the hardware structure
> + * @ice_txq_ctx: pointer to the txq context
> + *
> + * Copy txq context from dense structure to HW register space
> + */
> +static int
> +ice_copy_txq_ctx_to_hw(struct ice_hw *hw, u8 *ice_txq_ctx)
> +{
> + u8 i;
> +
> + if (!ice_txq_ctx)
> + return -EINVAL;
> +
> + /* Copy each dword separately to HW */
> + for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) {
> + u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32)));
> +
> + wr32(hw, GLCOMM_QTX_CNTX_DATA(i), *ctx);
> +
> + ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx);
> + }
> +
> + return 0;
> +}
> +
> +/* Configuration access to tx ring context(from PF) is done via indirect
> + * interface, GLCOMM_QTX_CNTX_CTL/DATA registers. However, there registers
s/there/these
> + * are shared by all the PFs with single PCI card. Hence multiplied PF may
> + * access there registers simultaneously, causing access conflicts. Then
s/there/these
> + * card-level grained locking is required to protect these registers from
> + * being competed by PF devices within the same card. However, there is no
> + * such kind of card-level locking supported. Introduce a coarse grained
> + * global lock which is shared by all the PF driver.
Not sure if this has any unexpected consequences, but note that the lock
will also be shared between PFs of separate cards on the same system.
> + *
> + * The overall flow is to acquire the lock, read/write TXQ context through
> + * GLCOMM_QTX_CNTX_CTL/DATA indirect interface and release the lock once
> + * access is completed. In this way, only one PF can have access to TXQ
> + * context safely.
> + */
> +static DEFINE_MUTEX(ice_global_txq_ctx_lock);
> +
> +/**
> + * ice_read_txq_ctx - Read txq context from HW
> + * @hw: pointer to the hardware structure
> + * @tlan_ctx: pointer to the txq context
> + * @txq_index: the index of the Tx queue
> + *
> + * Read txq context from HW register space and then convert it from dense
> + * structure to sparse
> + */
> +int
> +ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
> + u32 txq_index)
> +{
> + u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 };
> + int status;
> + u32 txq_base;
> + u32 cmd, reg;
> +
> + if (!tlan_ctx)
> + return -EINVAL;
> +
> + if (txq_index > QTX_COMM_HEAD_MAX_INDEX)
> + return -EINVAL;
> +
> + /* Get TXQ base within card space */
> + txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id));
> + txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >>
> + PFLAN_TX_QALLOC_FIRSTQ_S;
> +
> + cmd = (GLCOMM_QTX_CNTX_CTL_CMD_READ
> + << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M;
> + reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M |
> + (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) &
> + GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M);
> +
> + mutex_lock(&ice_global_txq_ctx_lock);
> +
> + wr32(hw, GLCOMM_QTX_CNTX_CTL, reg);
> + ice_flush(hw);
> +
> + status = ice_copy_txq_ctx_from_hw(hw, ctx_buf);
> + if (status) {
> + mutex_unlock(&ice_global_txq_ctx_lock);
> + return status;
> + }
> +
> + mutex_unlock(&ice_global_txq_ctx_lock);
> +
> + return ice_get_ctx(ctx_buf, (u8 *)tlan_ctx, ice_tlan_ctx_data_info);
> +}
> +
> +/**
> + * ice_write_txq_ctx - Write txq context from HW
> + * @hw: pointer to the hardware structure
> + * @tlan_ctx: pointer to the txq context
> + * @txq_index: the index of the Tx queue
> + *
> + * Convert txq context from sparse to dense structure and then write
> + * it to HW register space
> + */
> +int
> +ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
> + u32 txq_index)
> +{
> + u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 };
> + int status;
> + u32 txq_base;
> + u32 cmd, reg;
> +
> + if (!tlan_ctx)
> + return -EINVAL;
> +
> + if (txq_index > QTX_COMM_HEAD_MAX_INDEX)
> + return -EINVAL;
> +
> + ice_set_ctx(hw, (u8 *)tlan_ctx, ctx_buf, ice_tlan_ctx_info);
> +
> + /* Get TXQ base within card space */
> + txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id));
> + txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >>
> + PFLAN_TX_QALLOC_FIRSTQ_S;
> +
> + cmd = (GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN
> + << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M;
> + reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M |
> + (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) &
> + GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M);
> +
> + mutex_lock(&ice_global_txq_ctx_lock);
> +
> + status = ice_copy_txq_ctx_to_hw(hw, ctx_buf);
> + if (status) {
> + mutex_unlock(&ice_global_txq_ctx_lock);
> + return status;
> + }
> +
> + wr32(hw, GLCOMM_QTX_CNTX_CTL, reg);
> + ice_flush(hw);
> +
> + mutex_unlock(&ice_global_txq_ctx_lock);
> +
> + return 0;
> +}
> /* Sideband Queue command wrappers */
>
> /**
> diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h
> index df9c7f30592a..40fbb9088475 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.h
> +++ b/drivers/net/ethernet/intel/ice/ice_common.h
> @@ -58,6 +58,12 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
> int
> ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
> u32 rxq_index);
> +int
> +ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
> + u32 txq_index);
> +int
> +ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx,
> + u32 txq_index);
>
> int
> ice_aq_get_rss_lut(struct ice_hw *hw, struct ice_aq_get_set_rss_lut_params *get_params);
> diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> index 86936b758ade..7410da715ad4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
> @@ -8,6 +8,7 @@
>
> #define QTX_COMM_DBELL(_DBQM) (0x002C0000 + ((_DBQM) * 4))
> #define QTX_COMM_HEAD(_DBQM) (0x000E0000 + ((_DBQM) * 4))
> +#define QTX_COMM_HEAD_MAX_INDEX 16383
> #define QTX_COMM_HEAD_HEAD_S 0
> #define QTX_COMM_HEAD_HEAD_M ICE_M(0x1FFF, 0)
> #define PF_FW_ARQBAH 0x00080180
> @@ -258,6 +259,9 @@
> #define VPINT_ALLOC_PCI_VALID_M BIT(31)
> #define VPINT_MBX_CTL(_VSI) (0x0016A000 + ((_VSI) * 4))
> #define VPINT_MBX_CTL_CAUSE_ENA_M BIT(30)
> +#define PFLAN_TX_QALLOC(_PF) (0x001D2580 + ((_PF) * 4))
> +#define PFLAN_TX_QALLOC_FIRSTQ_S 0
> +#define PFLAN_TX_QALLOC_FIRSTQ_M ICE_M(0x3FFF, 0)
> #define GLLAN_RCTL_0 0x002941F8
> #define QRX_CONTEXT(_i, _QRX) (0x00280000 + ((_i) * 8192 + (_QRX) * 4))
> #define QRX_CTRL(_QRX) (0x00120000 + ((_QRX) * 4))
> @@ -362,6 +366,17 @@
> #define GLNVM_ULD_POR_DONE_1_M BIT(8)
> #define GLNVM_ULD_PCIER_DONE_2_M BIT(9)
> #define GLNVM_ULD_PE_DONE_M BIT(10)
> +#define GLCOMM_QTX_CNTX_CTL 0x002D2DC8
> +#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S 0
> +#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M ICE_M(0x3FFF, 0)
> +#define GLCOMM_QTX_CNTX_CTL_CMD_S 16
> +#define GLCOMM_QTX_CNTX_CTL_CMD_M ICE_M(0x7, 16)
> +#define GLCOMM_QTX_CNTX_CTL_CMD_READ 0
> +#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE 1
> +#define GLCOMM_QTX_CNTX_CTL_CMD_RESET 3
> +#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN 4
> +#define GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M BIT(19)
> +#define GLCOMM_QTX_CNTX_DATA(_i) (0x002D2D40 + ((_i) * 4))
> #define GLPCI_CNF2 0x000BE004
> #define GLPCI_CNF2_CACHELINE_SIZE_M BIT(1)
> #define PF_FUNC_RID 0x0009E880
> diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> index 89f986a75cc8..79e07c863ae0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
> @@ -431,6 +431,8 @@ enum ice_rx_flex_desc_status_error_1_bits {
>
> #define ICE_RXQ_CTX_SIZE_DWORDS 8
> #define ICE_RXQ_CTX_SZ (ICE_RXQ_CTX_SIZE_DWORDS * sizeof(u32))
> +#define ICE_TXQ_CTX_SIZE_DWORDS 10
> +#define ICE_TXQ_CTX_SZ (ICE_TXQ_CTX_SIZE_DWORDS * sizeof(u32))
> #define ICE_TX_CMPLTNQ_CTX_SIZE_DWORDS 22
> #define ICE_TX_DRBELL_Q_CTX_SIZE_DWORDS 5
> #define GLTCLAN_CQ_CNTX(i, CQ) (GLTCLAN_CQ_CNTX0(CQ) + ((i) * 0x0800))
> @@ -649,6 +651,7 @@ struct ice_tlan_ctx {
> u8 cache_prof_idx;
> u8 pkt_shaper_prof_idx;
> u8 int_q_state; /* width not needed - internal - DO NOT WRITE!!! */
> + u16 tail;
> };
>
> /* The ice_ptype_lkup table is used to convert from the 10-bit ptype in the
> --
> 2.34.1
>
* Re: [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
2023-11-21 2:51 ` [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration Yahui Cao
@ 2023-12-08 22:28 ` Brett Creeley
2024-02-12 23:07 ` [Intel-wired-lan] " Jacob Keller
0 siblings, 1 reply; 33+ messages in thread
From: Brett Creeley @ 2023-12-08 22:28 UTC (permalink / raw)
To: Yahui Cao, intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/20/2023 6:51 PM, Yahui Cao wrote:
>
> From: Lingyu Liu <lingyu.liu@intel.com>
>
> During migration device resume stage, part of device state is loaded by
> replaying logged virtual channel message. By default, once virtual
> channel message is processed successfully, PF will send message to VF.
>
> In addition, PF will notify VF about link state while handling virtual
> channel message GET_VF_RESOURCE and ENABLE_QUEUES. And VF driver will
> print link state change info once receiving notification from PF.
>
> However, device resume stage does not need PF to send messages to VF
> for the above cases. Stop PF from sending messages to VF while VF is
> in replay state.
>
> Signed-off-by: Lingyu Liu <lingyu.liu@intel.com>
> Signed-off-by: Yahui Cao <yahui.cao@intel.com>
> ---
> drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 +
> drivers/net/ethernet/intel/ice/ice_virtchnl.c | 179 +++++++++++-------
> drivers/net/ethernet/intel/ice/ice_virtchnl.h | 8 +-
> .../ethernet/intel/ice/ice_virtchnl_fdir.c | 28 +--
> 4 files changed, 127 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> index 93c774f2f437..c7e7df7baf38 100644
> --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h
> @@ -37,6 +37,7 @@ enum ice_vf_states {
> ICE_VF_STATE_DIS,
> ICE_VF_STATE_MC_PROMISC,
> ICE_VF_STATE_UC_PROMISC,
> + ICE_VF_STATE_REPLAYING_VC,
Should this enum have "MIGRATION" in it to make it clear that this flag
is specifically for replaying VF state for migration purposes?
> ICE_VF_STATES_NBITS
> };
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index cdf17b1e2f25..661ca86c3032 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -233,6 +233,9 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf)
> struct virtchnl_pf_event pfe = { 0 };
> struct ice_hw *hw = &vf->pf->hw;
>
> + if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states))
> + return;
> +
> pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
> pfe.severity = PF_EVENT_SEVERITY_INFO;
>
> @@ -282,7 +285,7 @@ void ice_vc_notify_reset(struct ice_pf *pf)
> }
>
> /**
> - * ice_vc_send_msg_to_vf - Send message to VF
> + * ice_vc_send_response_to_vf - Send response message to VF
> * @vf: pointer to the VF info
> * @v_opcode: virtual channel opcode
> * @v_retval: virtual channel return value
> @@ -291,9 +294,10 @@ void ice_vc_notify_reset(struct ice_pf *pf)
> *
> * send msg to VF
> */
> -int
> -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
> - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
> +static int
> +ice_vc_send_response_to_vf(struct ice_vf *vf, u32 v_opcode,
> + enum virtchnl_status_code v_retval,
> + u8 *msg, u16 msglen)
Is all of this rework needed? It seems like it's just a name change with
additional logic to check the REPLAYING state. IMHO the naming isn't
really any cleaner.
Would it make more sense to just modify the current
ice_vc_send_msg_to_vf() to handle the REPLAYING state? It seems like
that would simplify this patch quite a bit.
Is there a reason for these changes in follow up patches that I missed?
Thanks,
Brett
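For illustration, the simpler variant might look roughly like this fragment at the top of the existing function (sketch only, the existing send path is elided):
	/* Handle the REPLAYING state in ice_vc_send_msg_to_vf() itself
	 * instead of adding a renamed wrapper.
	 */
	if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) {
		if (v_retval == VIRTCHNL_STATUS_SUCCESS)
			return 0;
		dev_dbg(ice_pf_to_dev(vf->pf),
			"Unable to replay opcode %u for VF %d, status %d\n",
			v_opcode, vf->vf_id, v_retval);
		return -EIO;
	}
	/* fall through to the existing mailbox send path */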
> {
> struct device *dev;
> struct ice_pf *pf;
> @@ -314,6 +318,39 @@ ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
> return 0;
> }
>
> +/**
> + * ice_vc_respond_to_vf - Respond to VF
> + * @vf: pointer to the VF info
> + * @v_opcode: virtual channel opcode
> + * @v_retval: virtual channel return value
> + * @msg: pointer to the msg buffer
> + * @msglen: msg length
> + *
> + * Respond to VF. If it is replaying, return directly.
> + *
> + * Return 0 for success, negative for error.
> + */
> +int
> +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
> + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
> +{
> + struct device *dev;
> + struct ice_pf *pf = vf->pf;
> +
> + dev = ice_pf_to_dev(pf);
> +
> + if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) {
> + if (v_retval == VIRTCHNL_STATUS_SUCCESS)
> + return 0;
> +
> + dev_dbg(dev, "Unable to replay virt channel command, VF ID %d, virtchnl status code %d. op code %d, len %d.\n",
> + vf->vf_id, v_retval, v_opcode, msglen);
> + return -EIO;
> + }
> +
> + return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen);
> +}
> +
> /**
> * ice_vc_get_ver_msg
> * @vf: pointer to the VF info
> @@ -332,9 +369,9 @@ static int ice_vc_get_ver_msg(struct ice_vf *vf, u8 *msg)
> if (VF_IS_V10(&vf->vf_ver))
> info.minor = VIRTCHNL_VERSION_MINOR_NO_VF_CAPS;
>
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_VERSION,
> - VIRTCHNL_STATUS_SUCCESS, (u8 *)&info,
> - sizeof(struct virtchnl_version_info));
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_VERSION,
> + VIRTCHNL_STATUS_SUCCESS, (u8 *)&info,
> + sizeof(struct virtchnl_version_info));
> }
>
> /**
> @@ -522,8 +559,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
>
> err:
> /* send the response back to the VF */
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret,
> - (u8 *)vfres, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret,
> + (u8 *)vfres, len);
>
> kfree(vfres);
> return ret;
> @@ -892,7 +929,7 @@ static int ice_vc_handle_rss_cfg(struct ice_vf *vf, u8 *msg, bool add)
> }
>
> error_param:
> - return ice_vc_send_msg_to_vf(vf, v_opcode, v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, v_opcode, v_ret, NULL, 0);
> }
>
> /**
> @@ -938,8 +975,8 @@ static int ice_vc_config_rss_key(struct ice_vf *vf, u8 *msg)
> if (ice_set_rss_key(vsi, vrk->key))
> v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
> error_param:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -984,7 +1021,7 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg)
> if (ice_set_rss_lut(vsi, vrl->lut, ICE_LUT_VSI_SIZE))
> v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR;
> error_param:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret,
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret,
> NULL, 0);
> }
>
> @@ -1124,8 +1161,8 @@ static int ice_vc_cfg_promiscuous_mode_msg(struct ice_vf *vf, u8 *msg)
> }
>
> error_param:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -1165,8 +1202,8 @@ static int ice_vc_get_stats_msg(struct ice_vf *vf, u8 *msg)
>
> error_param:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret,
> - (u8 *)&stats, sizeof(stats));
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret,
> + (u8 *)&stats, sizeof(stats));
> }
>
> /**
> @@ -1315,8 +1352,8 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg)
>
> error_param:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -1455,8 +1492,8 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg)
>
> error_param:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -1586,8 +1623,8 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg)
>
> error_param:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -1730,8 +1767,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
> }
>
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> - VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> + VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> error_param:
> /* disable whatever we can */
> for (; i >= 0; i--) {
> @@ -1746,8 +1783,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
> ice_lag_move_new_vf_nodes(vf);
>
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> - VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES,
> + VIRTCHNL_STATUS_ERR_PARAM, NULL, 0);
> }
>
> /**
> @@ -2049,7 +2086,7 @@ ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set)
>
> handle_mac_exit:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, vc_op, v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, vc_op, v_ret, NULL, 0);
> }
>
> /**
> @@ -2132,8 +2169,8 @@ static int ice_vc_request_qs_msg(struct ice_vf *vf, u8 *msg)
>
> error_param:
> /* send the response to the VF */
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES,
> - v_ret, (u8 *)vfres, sizeof(*vfres));
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES,
> + v_ret, (u8 *)vfres, sizeof(*vfres));
> }
>
> /**
> @@ -2398,11 +2435,11 @@ static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v)
> error_param:
> /* send the response to the VF */
> if (add_v)
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret,
> + NULL, 0);
> else
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -2477,8 +2514,8 @@ static int ice_vc_ena_vlan_stripping(struct ice_vf *vf)
> vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA;
>
> error_param:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -2514,8 +2551,8 @@ static int ice_vc_dis_vlan_stripping(struct ice_vf *vf)
> vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA;
>
> error_param:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -2550,8 +2587,8 @@ static int ice_vc_get_rss_hena(struct ice_vf *vf)
> vrh->hena = ICE_DEFAULT_RSS_HENA;
> err:
> /* send the response back to the VF */
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret,
> - (u8 *)vrh, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret,
> + (u8 *)vrh, len);
> kfree(vrh);
> return ret;
> }
> @@ -2616,8 +2653,8 @@ static int ice_vc_set_rss_hena(struct ice_vf *vf, u8 *msg)
>
> /* send the response to the VF */
> err:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret,
> - NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret,
> + NULL, 0);
> }
>
> /**
> @@ -2672,8 +2709,8 @@ static int ice_vc_query_rxdid(struct ice_vf *vf)
> pf->supported_rxdids = rxdid->supported_rxdids;
>
> err:
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
> - v_ret, (u8 *)rxdid, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS,
> + v_ret, (u8 *)rxdid, len);
> kfree(rxdid);
> return ret;
> }
> @@ -2909,8 +2946,8 @@ static int ice_vc_get_offload_vlan_v2_caps(struct ice_vf *vf)
> memcpy(&vf->vlan_v2_caps, caps, sizeof(*caps));
>
> out:
> - err = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS,
> - v_ret, (u8 *)caps, len);
> + err = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS,
> + v_ret, (u8 *)caps, len);
> kfree(caps);
> return err;
> }
> @@ -3151,8 +3188,8 @@ static int ice_vc_remove_vlan_v2_msg(struct ice_vf *vf, u8 *msg)
> v_ret = VIRTCHNL_STATUS_ERR_PARAM;
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2, v_ret, NULL,
> - 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3293,8 +3330,8 @@ static int ice_vc_add_vlan_v2_msg(struct ice_vf *vf, u8 *msg)
> v_ret = VIRTCHNL_STATUS_ERR_PARAM;
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2, v_ret, NULL,
> - 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3525,8 +3562,8 @@ static int ice_vc_ena_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg)
> vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA;
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3600,8 +3637,8 @@ static int ice_vc_dis_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg)
> vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA;
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3659,8 +3696,8 @@ static int ice_vc_ena_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg)
> }
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3714,8 +3751,8 @@ static int ice_vc_dis_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg)
> }
>
> out:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2,
> + v_ret, NULL, 0);
> }
>
> static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
> @@ -3812,8 +3849,8 @@ static int ice_vc_repr_add_mac(struct ice_vf *vf, u8 *msg)
> }
>
> handle_mac_exit:
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR,
> - v_ret, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR,
> + v_ret, NULL, 0);
> }
>
> /**
> @@ -3832,8 +3869,8 @@ ice_vc_repr_del_mac(struct ice_vf __always_unused *vf, u8 __always_unused *msg)
>
> ice_update_legacy_cached_mac(vf, &al->list[0]);
>
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR,
> - VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR,
> + VIRTCHNL_STATUS_SUCCESS, NULL, 0);
> }
>
> static int
> @@ -3842,8 +3879,8 @@ ice_vc_repr_cfg_promiscuous_mode(struct ice_vf *vf, u8 __always_unused *msg)
> dev_dbg(ice_pf_to_dev(vf->pf),
> "Can't config promiscuous mode in switchdev mode for VF %d\n",
> vf->vf_id);
> - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
> - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
> + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE,
> + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
> NULL, 0);
> }
>
> @@ -3986,16 +4023,16 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
>
> error_handler:
> if (err) {
> - ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM,
> - NULL, 0);
> + ice_vc_respond_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM,
> + NULL, 0);
> dev_err(dev, "Invalid message from VF %d, opcode %d, len %d, error %d\n",
> vf_id, v_opcode, msglen, err);
> goto finish;
> }
>
> if (!ice_vc_is_opcode_allowed(vf, v_opcode)) {
> - ice_vc_send_msg_to_vf(vf, v_opcode,
> - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL,
> + ice_vc_respond_to_vf(vf, v_opcode,
> + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL,
> 0);
> goto finish;
> }
> @@ -4106,9 +4143,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
> default:
> dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode,
> vf_id);
> - err = ice_vc_send_msg_to_vf(vf, v_opcode,
> - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
> - NULL, 0);
> + err = ice_vc_respond_to_vf(vf, v_opcode,
> + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
> + NULL, 0);
> break;
> }
> if (err) {
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> index cd747718de73..a2b6094e2f2f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h
> @@ -60,8 +60,8 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf);
> void ice_vc_notify_link_state(struct ice_pf *pf);
> void ice_vc_notify_reset(struct ice_pf *pf);
> int
> -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
> - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen);
> +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
> + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen);
> bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id);
> void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event,
> struct ice_mbx_data *mbxdata);
> @@ -73,8 +73,8 @@ static inline void ice_vc_notify_link_state(struct ice_pf *pf) { }
> static inline void ice_vc_notify_reset(struct ice_pf *pf) { }
>
> static inline int
> -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
> - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
> +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode,
> + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
> {
> return -EOPNOTSUPP;
> }
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
> index 24b23b7ef04a..816d8bf8bec4 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c
> @@ -1584,8 +1584,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
> resp->flow_id = conf->flow_id;
> vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]++;
>
> - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
> - (u8 *)resp, len);
> + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
> + (u8 *)resp, len);
> kfree(resp);
>
> dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n",
> @@ -1600,8 +1600,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
> ice_vc_fdir_remove_entry(vf, conf, conf->flow_id);
> devm_kfree(dev, conf);
>
> - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
> - (u8 *)resp, len);
> + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
> + (u8 *)resp, len);
> kfree(resp);
> return ret;
> }
> @@ -1648,8 +1648,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
> ice_vc_fdir_remove_entry(vf, conf, conf->flow_id);
> vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]--;
>
> - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
> - (u8 *)resp, len);
> + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
> + (u8 *)resp, len);
> kfree(resp);
>
> dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n",
> @@ -1665,8 +1665,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx,
> if (success)
> devm_kfree(dev, conf);
>
> - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret,
> - (u8 *)resp, len);
> + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret,
> + (u8 *)resp, len);
> kfree(resp);
> return ret;
> }
> @@ -1863,8 +1863,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg)
> v_ret = VIRTCHNL_STATUS_SUCCESS;
> stat->status = VIRTCHNL_FDIR_SUCCESS;
> devm_kfree(dev, conf);
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER,
> - v_ret, (u8 *)stat, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER,
> + v_ret, (u8 *)stat, len);
> goto exit;
> }
>
> @@ -1922,8 +1922,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg)
> err_free_conf:
> devm_kfree(dev, conf);
> err_exit:
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret,
> - (u8 *)stat, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret,
> + (u8 *)stat, len);
> kfree(stat);
> return ret;
> }
> @@ -2006,8 +2006,8 @@ int ice_vc_del_fdir_fltr(struct ice_vf *vf, u8 *msg)
> err_del_tmr:
> ice_vc_fdir_clear_irq_ctx(vf);
> err_exit:
> - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret,
> - (u8 *)stat, len);
> + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret,
> + (u8 *)stat, len);
> kfree(stat);
> return ret;
> }
> --
> 2.34.1
>
* Re: [PATCH iwl-next v4 00/12] Add E800 live migration driver
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
` (12 preceding siblings ...)
2023-12-04 11:18 ` [PATCH iwl-next v4 00/12] Add E800 live migration driver Cao, Yahui
@ 2024-01-18 22:09 ` Jacob Keller
13 siblings, 0 replies; 33+ messages in thread
From: Jacob Keller @ 2024-01-18 22:09 UTC (permalink / raw)
To: Yahui Cao, intel-wired-lan
Cc: kvm, netdev, lingyu.liu, kevin.tian, madhu.chittim,
sridhar.samudrala, alex.williamson, jgg, yishaih,
shameerali.kolothum.thodi, brett.creeley, davem, edumazet, kuba,
pabeni
On 11/20/2023 6:50 PM, Yahui Cao wrote:
> This series adds vfio live migration support for Intel E810 VF devices
> based on the v2 migration protocol definition series discussed here[0].
>
> Steps to test:
> 1. Bind one or more E810 VF devices to the module ice-vfio-pci.ko
> 2. Assign the VFs to the virtual machine and enable device live migration
> 3. Run a workload using IAVF inside the VM, for example, iperf.
> 4. Migrate the VM from the source node to a destination node.
>
> The series is also available for review here[1].
>
> Thanks,
> Yahui
> [0] https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@nvidia.com/
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/linux.git/log/?h=ice_live_migration
>
> Change log:
>
Hi,
As a heads up to the reviewers of the previous versions, starting with
v5 and going forward, I'm taking over this series from Yahui and Lingyu
Liu. I'm currently catching up on the code and going over the v4 review
comments before I begin working on v5.
v5 may be delayed while I take some time to get familiar with the code
and the feature.
Thanks,
Jake
* Re: [Intel-wired-lan] [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
2023-12-08 22:28 ` Brett Creeley
@ 2024-02-12 23:07 ` Jacob Keller
0 siblings, 0 replies; 33+ messages in thread
From: Jacob Keller @ 2024-02-12 23:07 UTC (permalink / raw)
To: Brett Creeley, Yahui Cao, intel-wired-lan
Cc: kevin.tian, yishaih, brett.creeley, kvm, sridhar.samudrala,
edumazet, shameerali.kolothum.thodi, alex.williamson,
madhu.chittim, jgg, netdev, kuba, pabeni, davem
On 12/8/2023 2:28 PM, Brett Creeley wrote:
>> -int
>> -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
>> - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen)
>> +static int
>> +ice_vc_send_response_to_vf(struct ice_vf *vf, u32 v_opcode,
>> + enum virtchnl_status_code v_retval,
>> + u8 *msg, u16 msglen)
>
> Is all of this rework needed? It seems like it's just a name change with
> additional logic to check the REPLAYING state. IMHO the naming isn't
> really any cleaner.
>
> Would it make more sense to just modify the current
> ice_vc_send_msg_to_vf() to handle the REPLAYING state? It seems like
> that would simplify this patch quite a bit.
>
> Is there a reason for these changes in follow up patches that I missed?
>
> Thanks,
>
> Brett
I remember making the suggestion to switch from "ice_vc_send_msg_to_vf"
to "ice_vc_send_response_to_vf" irrespective of the live migration.
I guess I could see it as just thrash, but it reads more clearly to me
that the action is about sending a response to the VF vs the generic
"send_msg_to_vf", which could be about any type of message whether it is
a response or not. But to some extent that's just bike shedding.
I'll drop this change in the next version regardless, because I'm going
to move away from virtchnl as the serialization format for the live
migration data.
Thanks,
Jake
Thread overview: 33+ messages
2023-11-21 2:50 [PATCH iwl-next v4 00/12] Add E800 live migration driver Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context Yahui Cao
2023-12-08 22:01 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 02/12] ice: Add function to get and set TX " Yahui Cao
2023-12-08 22:14 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration Yahui Cao
2023-12-08 22:28 ` Brett Creeley
2024-02-12 23:07 ` [Intel-wired-lan] " Jacob Keller
2023-11-21 2:51 ` [PATCH iwl-next v4 04/12] ice: Add fundamental migration init and exit function Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Yahui Cao
2023-11-29 17:12 ` Simon Horman
2023-12-01 8:27 ` Cao, Yahui
2023-12-07 7:33 ` Tian, Kevin
2023-12-08 1:53 ` Brett Creeley
2023-11-21 2:51 ` [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration Yahui Cao
2023-12-07 7:39 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message " Yahui Cao
2023-12-07 7:42 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head Yahui Cao
2023-12-07 7:55 ` Tian, Kevin
2023-12-07 14:46 ` Jason Gunthorpe
2023-12-08 2:53 ` Tian, Kevin
2023-11-21 2:51 ` [PATCH iwl-next v4 09/12] ice: Save and load TX " Yahui Cao
2023-12-07 8:22 ` Tian, Kevin
2023-12-07 14:48 ` Jason Gunthorpe
2023-11-21 2:51 ` [PATCH iwl-next v4 10/12] ice: Add device suspend function for migration Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 11/12] ice: Save and load mmio registers Yahui Cao
2023-11-21 2:51 ` [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices Yahui Cao
2023-12-07 22:43 ` Alex Williamson
2023-12-08 3:42 ` Tian, Kevin
2023-12-08 3:42 ` Tian, Kevin
2023-12-04 11:18 ` [PATCH iwl-next v4 00/12] Add E800 live migration driver Cao, Yahui
2024-01-18 22:09 ` Jacob Keller