* [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver
@ 2025-02-03 21:09 Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 1/9] ice: count combined queues using Rx/Tx count Tony Nguyen
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Tony Nguyen, michal.swiatkowski, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

Michal Swiatkowski says:

This is another attempt to let the user manage the number of MSI-X
vectors used by each feature in ice. The first attempt used the devlink
resources API and wasn't accepted upstream. Static MSI-X allocation via
devlink resources also isn't really user friendly.

This attempt takes a more dynamic approach: "dynamic" across the whole
kernel when the platform supports it, and "dynamic" within the driver
when it does not.

To achieve that, reuse the generic devlink parameters pf_msix_max and
pf_msix_min (registered as msix_vec_per_pf_max and msix_vec_per_pf_min
in patch 2). They fit how ice hardware counts MSI-X: the number of MSI-X
vectors reported on PCI is the total for the whole card (including MSI-X
for VFs). pf_msix_max lets the user statically set how many MSI-X
vectors the PF may use; the rest can be used for VFs.

pf_msix_min sets the minimum number of MSI-X vectors with which the ice
driver should still probe successfully.

How these parameters are used with dynamic vs static allocation:
- on a system with dynamic MSI-X allocation support
 * allocate pf_msix_min statically, the rest is allocated dynamically
- on a system without dynamic MSI-X allocation support
 * try to allocate pf_msix_max statically, the minimum acceptable
   result is pf_msix_min
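
The two cases above can be sketched in C as follows. This is only an
illustration of the rule, not the driver code: struct msix_range,
static_request(), probe_ok() and the can_alloc_dyn flag are hypothetical
names, with the flag standing in for whatever the platform reports about
dynamic MSI-X support.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the two devlink values. */
struct msix_range {
	unsigned int min;	/* pf_msix_min */
	unsigned int max;	/* pf_msix_max */
};

/*
 * Number of vectors to request statically at probe time.
 * With dynamic allocation only the minimum is taken up front;
 * without it the maximum is requested right away.
 */
static unsigned int static_request(const struct msix_range *r,
				   bool can_alloc_dyn)
{
	assert(r->min <= r->max);
	return can_alloc_dyn ? r->min : r->max;
}

/* Probe is acceptable only if at least the minimum was granted. */
static bool probe_ok(const struct msix_range *r, unsigned int granted)
{
	return granted >= r->min;
}
```

With min = 2 and max = 128, a platform with dynamic allocation starts
with a static request of 2 vectors, while one without it requests 128
and still probes as long as at least 2 are granted.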

As Jesse and Piotr suggested, pf_msix_max and pf_msix_min can (and
probably should) be stored in NVM. This patchset doesn't implement
that.

With dynamic allocation (kernel or driver), splitting MSI-X between
RDMA and eth when there is an MSI-X shortage is no longer correct. It
can work when allocation is dynamic only on the driver side, but not
when it is dynamic on the kernel side.

Let's remove this code and move to MSI-X allocation feature by feature.
If there is no more MSI-X for a feature, the feature works with fewer
MSI-X vectors or is turned off.

There is a regression here. With MSI-X splitting, the user could run
RDMA and eth even on a system without enough MSI-X. Now only eth will
work. RDMA can be turned back on by lowering the number of PF queues and
reprobing the RDMA driver.

Example:
72 CPUs; eth, RDMA and flow director (1 MSI-X); 1 MSI-X for OICR on the
PF and 1 more for RDMA. The card would use 1 + 72 + 1 + 72 + 1 = 147.

We set pf_msix_min = 2, pf_msix_max = 128

OICR: 1
eth: 72
flow director: 1
RDMA: 128 - 74 = 54

We can change the number of queues on the PF to 36 and do a devlink
reinit:

OICR: 1
eth: 36
RDMA: 73
flow director: 1

We can also turn RDMA off (implemented in "ice: enable_rdma devlink
param"):

OICR: 1
eth: 72
RDMA: 0 (turned off)
flow director: 1
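
The numbers in the example above follow from a simple budget: OICR, eth
and flow director take their vectors first, and RDMA gets what it asks
for, limited by what is left under pf_msix_max. A minimal C sketch of
that arithmetic, assuming RDMA asks for 72 + 1 = 73 vectors as in the
example (msix_left(), rdma_msix() and check_examples() are hypothetical
names, not driver functions):

```c
#include <assert.h>

/* Vectors left under the PF cap once earlier features took theirs. */
static unsigned int msix_left(unsigned int pf_max, unsigned int used)
{
	return used < pf_max ? pf_max - used : 0;
}

/* RDMA gets what it wants, limited by what is left. */
static unsigned int rdma_msix(unsigned int pf_max, unsigned int oicr,
			      unsigned int eth, unsigned int fdir,
			      unsigned int rdma_wanted)
{
	unsigned int left = msix_left(pf_max, oicr + eth + fdir);

	return rdma_wanted < left ? rdma_wanted : left;
}

/* The three cases from the example, with pf_msix_max = 128. */
static void check_examples(void)
{
	assert(rdma_msix(128, 1, 72, 1, 73) == 54);	/* 128 - 74 */
	assert(rdma_msix(128, 1, 36, 1, 73) == 73);	/* fully served */
	assert(rdma_msix(128, 1, 72, 1, 0) == 0);	/* RDMA turned off */
}
```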

After these changes we have a static base vector for SRIOV (and probably
SIOV in the future). The "ice: simplify VF MSI-X managing" patch in this
series simplifies the VF MSI-X management code based on the static
vector.

Changing the number of queues using ethtool now also changes the number
of MSI-X vectors. If there are enough MSI-X vectors the mapping is
always one to one. When there are not enough, there will be more queues
than MSI-X vectors. There is currently no way to set how many queues
should be used per MSI-X. Maybe we should introduce another ethtool
param for it? Something like queues_per_vector?
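
To make the "more queues than MSI-X" case concrete, here is one possible
way to spread queues over vectors as evenly as possible. This is a
hypothetical sketch only; it does not describe what the driver or
ethtool does, and queues_on_vector() is an invented name.

```c
#include <assert.h>

/*
 * Queues handled by vector v_idx when num_q queues are spread over
 * num_vec vectors as evenly as possible. The first (num_q % num_vec)
 * vectors take one extra queue.
 */
static unsigned int queues_on_vector(unsigned int num_q,
				     unsigned int num_vec,
				     unsigned int v_idx)
{
	assert(num_vec > 0 && v_idx < num_vec);
	return num_q / num_vec + (v_idx < num_q % num_vec ? 1 : 0);
}
```

With 72 queues and 72 vectors every vector serves one queue. With 72
queues and 54 vectors the first 18 vectors serve two queues each and the
remaining 36 serve one.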
---
IWL [9]:

v8 --> v9: [8]
 * add tested-by tags
 * v8 was sent incorrectly, fix it here

v7 --> v8: [7]
 * fix unrolling in devlink parameters register function (patch 2)

v6 --> v7: [6]
 * use vu32 for devlink MSI-X parameters instead of u16 (patch 2)
 * < instead of <= for MSI-X min parameter validation (patch 2)
 * use u32 for MSI-X values (patch 2, 8)

v5 --> v6: [5]
 * set default MSI-X max value based on needs instead of const define
   (patch 3)

v4 --> v5: [4]
 * count combined queues in ethtool for case the vectors aren't mapped
   1:1 to queues (patch 1)
 * change min_t to min where the casting isn't needed (and can hide
   problems) (patch 4)
 * load msix_max and msix_min value after devlink reload; it accidentally
   wasn't added after removing loading in probe path to mitigate error
   from devl_param_driverinit...() (patch 2)
 * add documentation in devlink/ice for new parameters (patch 2)

v3 --> v4: [3]
 * drop unnecessary text in devlink validation comments
 * assume that devl_param_driverinit...() shouldn't return error in
   normal execution path

v2 --> v3: [2]
 * move flow director init before RDMA init
 * fix unrolling RDMA MSI-X allocation
 * add comment in commit message about lowering control RDMA MSI-X
   amount

v1 --> v2: [1]
 * change permanent MSI-X cmode parameters to driverinit
 * remove locking during devlink parameter registration (it is now
   locked for whole init/deinit part)

[9] https://lore.kernel.org/netdev/20241203065817.13475-1-michal.swiatkowski@linux.intel.com/
[8] https://lore.kernel.org/netdev/20241114122009.97416-1-michal.swiatkowski@linux.intel.com/
[7] https://lore.kernel.org/netdev/20241104121337.129287-1-michal.swiatkowski@linux.intel.com/
[6] https://lore.kernel.org/netdev/20241028100341.16631-1-michal.swiatkowski@linux.intel.com/
[5] https://lore.kernel.org/netdev/20241024121230.5861-1-michal.swiatkowski@linux.intel.com/
[4] https://lore.kernel.org/netdev/20240930120402.3468-1-michal.swiatkowski@linux.intel.com/
[3] https://lore.kernel.org/netdev/20240808072016.10321-1-michal.swiatkowski@linux.intel.com/
[2] https://lore.kernel.org/netdev/20240801093115.8553-1-michal.swiatkowski@linux.intel.com/
[1] https://lore.kernel.org/netdev/20240213073509.77622-1-michal.swiatkowski@linux.intel.com/

The following are changes since commit c2933b2befe25309f4c5cfbea0ca80909735fd76:
  Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE

Michal Swiatkowski (9):
  ice: count combined queues using Rx/Tx count
  ice: devlink PF MSI-X max and min parameter
  ice: remove splitting MSI-X between features
  ice: get rid of num_lan_msix field
  ice, irdma: move interrupts code to irdma
  ice: treat dyn_allowed only as suggestion
  ice: enable_rdma devlink param
  ice: simplify VF MSI-X managing
  ice: init flow director before RDMA

 Documentation/networking/devlink/ice.rst      |  11 +
 drivers/infiniband/hw/irdma/hw.c              |   2 -
 drivers/infiniband/hw/irdma/main.c            |  46 ++-
 drivers/infiniband/hw/irdma/main.h            |   3 +
 .../net/ethernet/intel/ice/devlink/devlink.c  | 109 +++++++
 drivers/net/ethernet/intel/ice/ice.h          |  21 +-
 drivers/net/ethernet/intel/ice/ice_base.c     |  10 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |   9 +-
 drivers/net/ethernet/intel/ice/ice_idc.c      |  64 +---
 drivers/net/ethernet/intel/ice/ice_irq.c      | 275 ++++++------------
 drivers/net/ethernet/intel/ice/ice_irq.h      |  13 +-
 drivers/net/ethernet/intel/ice/ice_lib.c      |  35 ++-
 drivers/net/ethernet/intel/ice/ice_main.c     |   6 +-
 drivers/net/ethernet/intel/ice/ice_sriov.c    | 154 +---------
 include/linux/net/intel/iidc.h                |   2 +
 15 files changed, 336 insertions(+), 424 deletions(-)

-- 
2.47.1



* [PATCH net-next 1/9] ice: count combined queues using Rx/Tx count
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter Tony Nguyen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

The previous implementation assumed a 1:1 mapping between vectors and
queues, which isn't always true.

Use the minimum of the Rx and Tx queue counts on each vector to
determine the number of combined queues.
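
The difference between the two counting rules can be sketched in
standalone C. struct vec_rings, combined_old() and combined_new() are
hypothetical stand-ins for illustration; only the field names
num_ring_tx and num_ring_rx are taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vector ring counts. */
struct vec_rings {
	unsigned int num_ring_tx;
	unsigned int num_ring_rx;
};

/* Old rule: a vector counts once if it has both a Tx and an Rx ring. */
static unsigned int combined_old(const struct vec_rings *v, size_t n)
{
	unsigned int combined = 0;

	assert(v || n == 0);
	for (size_t i = 0; i < n; i++)
		if (v[i].num_ring_tx && v[i].num_ring_rx)
			combined++;
	return combined;
}

/* New rule: each vector contributes min(Tx rings, Rx rings). */
static unsigned int combined_new(const struct vec_rings *v, size_t n)
{
	unsigned int combined = 0;

	assert(v || n == 0);
	for (size_t i = 0; i < n; i++)
		combined += v[i].num_ring_tx < v[i].num_ring_rx ?
			    v[i].num_ring_tx : v[i].num_ring_rx;
	return combined;
}
```

With two vectors that each serve two Tx and two Rx rings, the old rule
reports 2 combined queues while the new rule reports 4. With a 1:1
mapping both rules give the same result.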

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index f241493a6ac8..6bbb304ad9ab 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3817,8 +3817,7 @@ static u32 ice_get_combined_cnt(struct ice_vsi *vsi)
 	ice_for_each_q_vector(vsi, q_idx) {
 		struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
 
-		if (q_vector->rx.rx_ring && q_vector->tx.tx_ring)
-			combined++;
+		combined += min(q_vector->num_ring_tx, q_vector->num_ring_rx);
 	}
 
 	return combined;
-- 
2.47.1



* [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 1/9] ice: count combined queues using Rx/Tx count Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:48   ` David Laight
  2025-02-04 22:35   ` Jakub Kicinski
  2025-02-03 21:09 ` [PATCH net-next 3/9] ice: remove splitting MSI-X between features Tony Nguyen
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Use the generic devlink PF MSI-X parameters to allow the user to change
the MSI-X range.

Add notes about these parameters to the ice devlink documentation.

Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 Documentation/networking/devlink/ice.rst      | 11 +++
 .../net/ethernet/intel/ice/devlink/devlink.c  | 88 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice.h          |  7 ++
 drivers/net/ethernet/intel/ice/ice_irq.c      |  7 ++
 4 files changed, 113 insertions(+)

diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index e3972d03cea0..792e9f8c846a 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -69,6 +69,17 @@ Parameters
 
        To verify that value has been set:
        $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
+   * - ``msix_vec_per_pf_max``
+     - driverinit
+     - Set the max MSI-X that can be used by the PF, rest can be utilized for
+       SRIOV. The range is from min value set in msix_vec_per_pf_min to
+       2k/number of ports.
+   * - ``msix_vec_per_pf_min``
+     - driverinit
+     - Set the min MSI-X that will be used by the PF. This value inform how many
+       MSI-X will be allocated statically. The range is from 2 to value set
+       in msix_vec_per_pf_max.
+
 .. list-table:: Driver specific parameters implemented
     :widths: 5 5 90
 
diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
index d116e2b10bce..c53baecf8a90 100644
--- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
+++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
@@ -1202,6 +1202,25 @@ static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
 	return status;
 }
 
+static void ice_set_min_max_msix(struct ice_pf *pf)
+{
+	struct devlink *devlink = priv_to_devlink(pf);
+	union devlink_param_value val;
+	int err;
+
+	err = devl_param_driverinit_value_get(devlink,
+					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+					      &val);
+	if (!err)
+		pf->msix.min = val.vu32;
+
+	err = devl_param_driverinit_value_get(devlink,
+					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+					      &val);
+	if (!err)
+		pf->msix.max = val.vu32;
+}
+
 /**
  * ice_devlink_reinit_up - do reinit of the given PF
  * @pf: pointer to the PF struct
@@ -1217,6 +1236,9 @@ static int ice_devlink_reinit_up(struct ice_pf *pf)
 		return err;
 	}
 
+	/* load MSI-X values */
+	ice_set_min_max_msix(pf);
+
 	err = ice_init_dev(pf);
 	if (err)
 		goto unroll_hw_init;
@@ -1530,6 +1552,37 @@ static int ice_devlink_local_fwd_validate(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int
+ice_devlink_msix_max_pf_validate(struct devlink *devlink, u32 id,
+				 union devlink_param_value val,
+				 struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
+	    val.vu32 < pf->msix.min) {
+		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
+				 union devlink_param_value val,
+				 struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+
+	if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
+		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 enum ice_param_id {
 	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
@@ -1547,6 +1600,15 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
 			      ice_devlink_enable_iw_validate),
 };
 
+static const struct devlink_param ice_dvl_msix_params[] = {
+	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MAX,
+			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      NULL, NULL, ice_devlink_msix_max_pf_validate),
+	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MIN,
+			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      NULL, NULL, ice_devlink_msix_min_pf_validate),
+};
+
 static const struct devlink_param ice_dvl_sched_params[] = {
 	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
 			     "tx_scheduling_layers",
@@ -1648,6 +1710,7 @@ void ice_devlink_unregister(struct ice_pf *pf)
 int ice_devlink_register_params(struct ice_pf *pf)
 {
 	struct devlink *devlink = priv_to_devlink(pf);
+	union devlink_param_value value;
 	struct ice_hw *hw = &pf->hw;
 	int status;
 
@@ -1656,10 +1719,33 @@ int ice_devlink_register_params(struct ice_pf *pf)
 	if (status)
 		return status;
 
+	status = devl_params_register(devlink, ice_dvl_msix_params,
+				      ARRAY_SIZE(ice_dvl_msix_params));
+	if (status)
+		goto unregister_rdma_params;
+
 	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
 		status = devl_params_register(devlink, ice_dvl_sched_params,
 					      ARRAY_SIZE(ice_dvl_sched_params));
+	if (status)
+		goto unregister_msix_params;
+
+	value.vu32 = pf->msix.max;
+	devl_param_driverinit_value_set(devlink,
+					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+					value);
+	value.vu32 = pf->msix.min;
+	devl_param_driverinit_value_set(devlink,
+					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+					value);
+	return 0;
 
+unregister_msix_params:
+	devl_params_unregister(devlink, ice_dvl_msix_params,
+			       ARRAY_SIZE(ice_dvl_msix_params));
+unregister_rdma_params:
+	devl_params_unregister(devlink, ice_dvl_rdma_params,
+			       ARRAY_SIZE(ice_dvl_rdma_params));
 	return status;
 }
 
@@ -1670,6 +1756,8 @@ void ice_devlink_unregister_params(struct ice_pf *pf)
 
 	devl_params_unregister(devlink, ice_dvl_rdma_params,
 			       ARRAY_SIZE(ice_dvl_rdma_params));
+	devl_params_unregister(devlink, ice_dvl_msix_params,
+			       ARRAY_SIZE(ice_dvl_msix_params));
 
 	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
 		devl_params_unregister(devlink, ice_dvl_sched_params,
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 71e05d30f0fd..d041b04ff324 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -542,6 +542,12 @@ struct ice_agg_node {
 	u8 valid;
 };
 
+struct ice_pf_msix {
+	u32 cur;
+	u32 min;
+	u32 max;
+};
+
 struct ice_pf {
 	struct pci_dev *pdev;
 	struct ice_adapter *adapter;
@@ -612,6 +618,7 @@ struct ice_pf {
 	struct msi_map ll_ts_irq;	/* LL_TS interrupt MSIX vector */
 	u16 max_pf_txqs;	/* Total Tx queues PF wide */
 	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
+	struct ice_pf_msix msix;
 	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
 	u16 num_lan_tx;		/* num LAN Tx queues setup */
 	u16 num_lan_rx;		/* num LAN Rx queues setup */
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index ad82ff7d1995..0659b96b9b8c 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -254,6 +254,13 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 	int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
 	int vectors, max_vectors;
 
+	/* load default PF MSI-X range */
+	if (!pf->msix.min)
+		pf->msix.min = ICE_MIN_MSIX;
+
+	if (!pf->msix.max)
+		pf->msix.max = total_vectors / 2;
+
 	vectors = ice_ena_msix_range(pf);
 
 	if (vectors < 0)
-- 
2.47.1



* [PATCH net-next 3/9] ice: remove splitting MSI-X between features
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 1/9] ice: count combined queues using Rx/Tx count Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 4/9] ice: get rid of num_lan_msix field Tony Nguyen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

With the dynamic approach to MSI-X allocation, it makes no sense to
statically split MSI-X between PF features.

The splitting code also calculated the number of MSI-X needed. Move that
part to a separate function and use its result as the max value.

Remove ICE_ESWITCH_MSIX, as there is no need for additional MSI-X for
switchdev.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h     |   2 -
 drivers/net/ethernet/intel/ice/ice_irq.c | 172 +++--------------------
 2 files changed, 16 insertions(+), 158 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index d041b04ff324..c78bd45cf016 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -98,8 +98,6 @@
 #define ICE_MIN_MSIX		(ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
 #define ICE_FDIR_MSIX		2
 #define ICE_RDMA_NUM_AEQ_MSIX	4
-#define ICE_MIN_RDMA_MSIX	2
-#define ICE_ESWITCH_MSIX	1
 #define ICE_NO_VSI		0xffff
 #define ICE_VSI_MAP_CONTIG	0
 #define ICE_VSI_MAP_SCATTER	1
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 0659b96b9b8c..4a50a6dc817e 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -84,155 +84,11 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
 	return entry;
 }
 
-/**
- * ice_reduce_msix_usage - Reduce usage of MSI-X vectors
- * @pf: board private structure
- * @v_remain: number of remaining MSI-X vectors to be distributed
- *
- * Reduce the usage of MSI-X vectors when entire request cannot be fulfilled.
- * pf->num_lan_msix and pf->num_rdma_msix values are set based on number of
- * remaining vectors.
- */
-static void ice_reduce_msix_usage(struct ice_pf *pf, int v_remain)
+static int ice_get_default_msix_amount(struct ice_pf *pf)
 {
-	int v_rdma;
-
-	if (!ice_is_rdma_ena(pf)) {
-		pf->num_lan_msix = v_remain;
-		return;
-	}
-
-	/* RDMA needs at least 1 interrupt in addition to AEQ MSIX */
-	v_rdma = ICE_RDMA_NUM_AEQ_MSIX + 1;
-
-	if (v_remain < ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_RDMA_MSIX) {
-		dev_warn(ice_pf_to_dev(pf), "Not enough MSI-X vectors to support RDMA.\n");
-		clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
-
-		pf->num_rdma_msix = 0;
-		pf->num_lan_msix = ICE_MIN_LAN_TXRX_MSIX;
-	} else if ((v_remain < ICE_MIN_LAN_TXRX_MSIX + v_rdma) ||
-		   (v_remain - v_rdma < v_rdma)) {
-		/* Support minimum RDMA and give remaining vectors to LAN MSIX
-		 */
-		pf->num_rdma_msix = ICE_MIN_RDMA_MSIX;
-		pf->num_lan_msix = v_remain - ICE_MIN_RDMA_MSIX;
-	} else {
-		/* Split remaining MSIX with RDMA after accounting for AEQ MSIX
-		 */
-		pf->num_rdma_msix = (v_remain - ICE_RDMA_NUM_AEQ_MSIX) / 2 +
-				    ICE_RDMA_NUM_AEQ_MSIX;
-		pf->num_lan_msix = v_remain - pf->num_rdma_msix;
-	}
-}
-
-/**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
- * @pf: board private structure
- *
- * Compute the number of MSIX vectors wanted and request from the OS. Adjust
- * device usage if there are not enough vectors. Return the number of vectors
- * reserved or negative on failure.
- */
-static int ice_ena_msix_range(struct ice_pf *pf)
-{
-	int num_cpus, hw_num_msix, v_other, v_wanted, v_actual;
-	struct device *dev = ice_pf_to_dev(pf);
-	int err;
-
-	hw_num_msix = pf->hw.func_caps.common_cap.num_msix_vectors;
-	num_cpus = num_online_cpus();
-
-	/* LAN miscellaneous handler */
-	v_other = ICE_MIN_LAN_OICR_MSIX;
-
-	/* Flow Director */
-	if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
-		v_other += ICE_FDIR_MSIX;
-
-	/* switchdev */
-	v_other += ICE_ESWITCH_MSIX;
-
-	v_wanted = v_other;
-
-	/* LAN traffic */
-	pf->num_lan_msix = num_cpus;
-	v_wanted += pf->num_lan_msix;
-
-	/* RDMA auxiliary driver */
-	if (ice_is_rdma_ena(pf)) {
-		pf->num_rdma_msix = num_cpus + ICE_RDMA_NUM_AEQ_MSIX;
-		v_wanted += pf->num_rdma_msix;
-	}
-
-	if (v_wanted > hw_num_msix) {
-		int v_remain;
-
-		dev_warn(dev, "not enough device MSI-X vectors. wanted = %d, available = %d\n",
-			 v_wanted, hw_num_msix);
-
-		if (hw_num_msix < ICE_MIN_MSIX) {
-			err = -ERANGE;
-			goto exit_err;
-		}
-
-		v_remain = hw_num_msix - v_other;
-		if (v_remain < ICE_MIN_LAN_TXRX_MSIX) {
-			v_other = ICE_MIN_MSIX - ICE_MIN_LAN_TXRX_MSIX;
-			v_remain = ICE_MIN_LAN_TXRX_MSIX;
-		}
-
-		ice_reduce_msix_usage(pf, v_remain);
-		v_wanted = pf->num_lan_msix + pf->num_rdma_msix + v_other;
-
-		dev_notice(dev, "Reducing request to %d MSI-X vectors for LAN traffic.\n",
-			   pf->num_lan_msix);
-		if (ice_is_rdma_ena(pf))
-			dev_notice(dev, "Reducing request to %d MSI-X vectors for RDMA.\n",
-				   pf->num_rdma_msix);
-	}
-
-	/* actually reserve the vectors */
-	v_actual = pci_alloc_irq_vectors(pf->pdev, ICE_MIN_MSIX, v_wanted,
-					 PCI_IRQ_MSIX);
-	if (v_actual < 0) {
-		dev_err(dev, "unable to reserve MSI-X vectors\n");
-		err = v_actual;
-		goto exit_err;
-	}
-
-	if (v_actual < v_wanted) {
-		dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
-			 v_wanted, v_actual);
-
-		if (v_actual < ICE_MIN_MSIX) {
-			/* error if we can't get minimum vectors */
-			pci_free_irq_vectors(pf->pdev);
-			err = -ERANGE;
-			goto exit_err;
-		} else {
-			int v_remain = v_actual - v_other;
-
-			if (v_remain < ICE_MIN_LAN_TXRX_MSIX)
-				v_remain = ICE_MIN_LAN_TXRX_MSIX;
-
-			ice_reduce_msix_usage(pf, v_remain);
-
-			dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
-				   pf->num_lan_msix);
-
-			if (ice_is_rdma_ena(pf))
-				dev_notice(dev, "Enabled %d MSI-X vectors for RDMA.\n",
-					   pf->num_rdma_msix);
-		}
-	}
-
-	return v_actual;
-
-exit_err:
-	pf->num_rdma_msix = 0;
-	pf->num_lan_msix = 0;
-	return err;
+	return ICE_MIN_LAN_OICR_MSIX + num_online_cpus() +
+	       (test_bit(ICE_FLAG_FD_ENA, pf->flags) ? ICE_FDIR_MSIX : 0) +
+	       (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_NUM_AEQ_MSIX : 0);
 }
 
 /**
@@ -259,17 +115,21 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 		pf->msix.min = ICE_MIN_MSIX;
 
 	if (!pf->msix.max)
-		pf->msix.max = total_vectors / 2;
-
-	vectors = ice_ena_msix_range(pf);
+		pf->msix.max = min(total_vectors,
+				   ice_get_default_msix_amount(pf));
 
-	if (vectors < 0)
-		return -ENOMEM;
-
-	if (pci_msix_can_alloc_dyn(pf->pdev))
+	if (pci_msix_can_alloc_dyn(pf->pdev)) {
+		vectors = pf->msix.min;
 		max_vectors = total_vectors;
-	else
+	} else {
+		vectors = pf->msix.max;
 		max_vectors = vectors;
+	}
+
+	vectors = pci_alloc_irq_vectors(pf->pdev, pf->msix.min, vectors,
+					PCI_IRQ_MSIX);
+	if (vectors < pf->msix.min)
+		return -ENOMEM;
 
 	ice_init_irq_tracker(pf, max_vectors, vectors);
 
-- 
2.47.1



* [PATCH net-next 4/9] ice: get rid of num_lan_msix field
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (2 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 3/9] ice: remove splitting MSI-X between features Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 5/9] ice, irdma: move interrupts code to irdma Tony Nguyen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Remove the field to allow having more queues than MSI-X on a VSI. By
default the numbers will be the same, but if no more MSI-X are available
the VSI can run with as few as one MSI-X.
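
The fallback described above can be sketched as a standalone C function.
This is a simplified model, not the driver code: alloc_q_vectors() and
its avail argument are hypothetical, with avail standing in for however
many vectors can actually be allocated.

```c
#include <assert.h>

/*
 * Try to set up `wanted` vectors when only `avail` can be allocated.
 * Returns 0 and stores the number actually obtained in *got, or -1
 * when not even one vector is available.
 */
static int alloc_q_vectors(unsigned int wanted, unsigned int avail,
			   unsigned int *got)
{
	unsigned int v_idx;

	assert(got && wanted > 0);
	for (v_idx = 0; v_idx < wanted; v_idx++)
		if (v_idx >= avail)
			break;	/* allocating this vector failed */

	/* Keep what was obtained instead of freeing everything. */
	*got = v_idx;
	return v_idx ? 0 : -1;
}
```

Asking for 72 vectors when 54 are available succeeds with 54, so the VSI
runs with more queues than vectors. The call fails only when no vector
at all can be obtained.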

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |  1 -
 drivers/net/ethernet/intel/ice/ice_base.c    | 10 ++++----
 drivers/net/ethernet/intel/ice/ice_ethtool.c |  6 ++---
 drivers/net/ethernet/intel/ice/ice_irq.c     | 11 ++++-----
 drivers/net/ethernet/intel/ice/ice_lib.c     | 25 +++++++++++---------
 5 files changed, 24 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index c78bd45cf016..ad61dd688871 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -617,7 +617,6 @@ struct ice_pf {
 	u16 max_pf_txqs;	/* Total Tx queues PF wide */
 	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
 	struct ice_pf_msix msix;
-	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
 	u16 num_lan_tx;		/* num LAN Tx queues setup */
 	u16 num_lan_rx;		/* num LAN Rx queues setup */
 	u16 next_vsi;		/* Next free slot in pf->vsi[] - 0-based! */
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index b2af8e3586f7..0e862f20427a 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -801,13 +801,11 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 	return 0;
 
 err_out:
-	while (v_idx--)
-		ice_free_q_vector(vsi, v_idx);
 
-	dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
-		vsi->num_q_vectors, vsi->vsi_num, err);
-	vsi->num_q_vectors = 0;
-	return err;
+	dev_info(dev, "Failed to allocate %d q_vectors for VSI %d, new value %d",
+		 vsi->num_q_vectors, vsi->vsi_num, v_idx);
+	vsi->num_q_vectors = v_idx;
+	return v_idx ? 0 : err;
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 6bbb304ad9ab..b0805704834d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3788,8 +3788,7 @@ ice_get_ts_info(struct net_device *dev, struct kernel_ethtool_ts_info *info)
  */
 static int ice_get_max_txq(struct ice_pf *pf)
 {
-	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
-		    (u16)pf->hw.func_caps.common_cap.num_txq);
+	return min(num_online_cpus(), pf->hw.func_caps.common_cap.num_txq);
 }
 
 /**
@@ -3798,8 +3797,7 @@ static int ice_get_max_txq(struct ice_pf *pf)
  */
 static int ice_get_max_rxq(struct ice_pf *pf)
 {
-	return min3(pf->num_lan_msix, (u16)num_online_cpus(),
-		    (u16)pf->hw.func_caps.common_cap.num_rxq);
+	return min(num_online_cpus(), pf->hw.func_caps.common_cap.num_rxq);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 4a50a6dc817e..1a7d446ab5f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -108,7 +108,7 @@ void ice_clear_interrupt_scheme(struct ice_pf *pf)
 int ice_init_interrupt_scheme(struct ice_pf *pf)
 {
 	int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
-	int vectors, max_vectors;
+	int vectors;
 
 	/* load default PF MSI-X range */
 	if (!pf->msix.min)
@@ -118,20 +118,17 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 		pf->msix.max = min(total_vectors,
 				   ice_get_default_msix_amount(pf));
 
-	if (pci_msix_can_alloc_dyn(pf->pdev)) {
+	if (pci_msix_can_alloc_dyn(pf->pdev))
 		vectors = pf->msix.min;
-		max_vectors = total_vectors;
-	} else {
+	else
 		vectors = pf->msix.max;
-		max_vectors = vectors;
-	}
 
 	vectors = pci_alloc_irq_vectors(pf->pdev, pf->msix.min, vectors,
 					PCI_IRQ_MSIX);
 	if (vectors < pf->msix.min)
 		return -ENOMEM;
 
-	ice_init_irq_tracker(pf, max_vectors, vectors);
+	ice_init_irq_tracker(pf, pf->msix.max, vectors);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 38a1c8372180..4b8d7aa7b1bb 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -157,6 +157,16 @@ static void ice_vsi_set_num_desc(struct ice_vsi *vsi)
 	}
 }
 
+static u16 ice_get_rxq_count(struct ice_pf *pf)
+{
+	return min(ice_get_avail_rxq_count(pf), num_online_cpus());
+}
+
+static u16 ice_get_txq_count(struct ice_pf *pf)
+{
+	return min(ice_get_avail_txq_count(pf), num_online_cpus());
+}
+
 /**
  * ice_vsi_set_num_qs - Set number of queues, descriptors and vectors for a VSI
  * @vsi: the VSI being configured
@@ -178,9 +188,7 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
 			vsi->alloc_txq = vsi->req_txq;
 			vsi->num_txq = vsi->req_txq;
 		} else {
-			vsi->alloc_txq = min3(pf->num_lan_msix,
-					      ice_get_avail_txq_count(pf),
-					      (u16)num_online_cpus());
+			vsi->alloc_txq = ice_get_txq_count(pf);
 		}
 
 		pf->num_lan_tx = vsi->alloc_txq;
@@ -193,17 +201,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
 				vsi->alloc_rxq = vsi->req_rxq;
 				vsi->num_rxq = vsi->req_rxq;
 			} else {
-				vsi->alloc_rxq = min3(pf->num_lan_msix,
-						      ice_get_avail_rxq_count(pf),
-						      (u16)num_online_cpus());
+				vsi->alloc_rxq = ice_get_rxq_count(pf);
 			}
 		}
 
 		pf->num_lan_rx = vsi->alloc_rxq;
 
-		vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
-					   max_t(int, vsi->alloc_rxq,
-						 vsi->alloc_txq));
+		vsi->num_q_vectors = max(vsi->alloc_rxq, vsi->alloc_txq);
 		break;
 	case ICE_VSI_SF:
 		vsi->alloc_txq = 1;
@@ -1173,12 +1177,11 @@ static void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
 static void
 ice_chnl_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
 {
-	struct ice_pf *pf = vsi->back;
 	u16 qcount, qmap;
 	u8 offset = 0;
 	int pow;
 
-	qcount = min_t(int, vsi->num_rxq, pf->num_lan_msix);
+	qcount = vsi->num_rxq;
 
 	pow = order_base_2(qcount);
 	qmap = FIELD_PREP(ICE_AQ_VSI_TC_Q_OFFSET_M, offset);
-- 
2.47.1



* [PATCH net-next 5/9] ice, irdma: move interrupts code to irdma
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (3 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 4/9] ice: get rid of num_lan_msix field Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 6/9] ice: treat dyn_allowed only as suggestion Tony Nguyen
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Move responsibility for requesting MSI-X vectors for the RDMA feature from
the ice driver to the irdma driver. This allows a simple fallback when not
enough MSI-X vectors are available.

Change the number of MSI-X vectors used for control from 4 to 1, as more
than one is not needed for this purpose.
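
The fallback described here can be illustrated with a minimal userspace
sketch. It is not the driver code: struct vec_pool, pool_get(), pool_put()
and request_vectors() are hypothetical names that only model taking vectors
one at a time, accepting a partial result above a minimum, and rolling back
otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pool of vectors the PF driver can hand out. */
struct vec_pool {
	int free;	/* vectors still available */
};

/* Hypothetical single-vector allocator: true on success, false when empty. */
static bool pool_get(struct vec_pool *pool)
{
	if (pool->free == 0)
		return false;
	pool->free--;
	return true;
}

static void pool_put(struct vec_pool *pool)
{
	pool->free++;
}

/*
 * Try to take 'want' vectors one at a time. Accept a partial result as long
 * as it reaches 'min'; otherwise give back everything taken so far.
 * Returns the number of vectors held, or -1 on failure.
 */
static int request_vectors(struct vec_pool *pool, int want, int min)
{
	int got = 0;

	assert(min > 0 && min <= want);

	while (got < want && pool_get(pool))
		got++;

	if (got < min) {
		/* roll back exactly the vectors that were acquired */
		while (got > 0) {
			pool_put(pool);
			got--;
		}
		return -1;
	}

	return got;
}
```

With a pool of 5 free vectors, asking for 9 with a minimum of 2 yields 5;
with a pool of 1 the request fails and the pool is left unchanged.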

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/infiniband/hw/irdma/hw.c         |  2 -
 drivers/infiniband/hw/irdma/main.c       | 46 ++++++++++++++++-
 drivers/infiniband/hw/irdma/main.h       |  3 ++
 drivers/net/ethernet/intel/ice/ice.h     |  1 -
 drivers/net/ethernet/intel/ice/ice_idc.c | 64 ++++++------------------
 drivers/net/ethernet/intel/ice/ice_irq.c |  3 +-
 include/linux/net/intel/iidc.h           |  2 +
 7 files changed, 65 insertions(+), 56 deletions(-)

diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index ad50b77282f8..69ce1862eabe 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -498,8 +498,6 @@ static int irdma_save_msix_info(struct irdma_pci_f *rf)
 	iw_qvlist->num_vectors = rf->msix_count;
 	if (rf->msix_count <= num_online_cpus())
 		rf->msix_shared = true;
-	else if (rf->msix_count > num_online_cpus() + 1)
-		rf->msix_count = num_online_cpus() + 1;
 
 	pmsix = rf->msix_entries;
 	for (i = 0, ceq_idx = 0; i < rf->msix_count; i++, iw_qvinfo++) {
diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
index 3f13200ff71b..1ee8969595d3 100644
--- a/drivers/infiniband/hw/irdma/main.c
+++ b/drivers/infiniband/hw/irdma/main.c
@@ -206,6 +206,43 @@ static void irdma_lan_unregister_qset(struct irdma_sc_vsi *vsi,
 		ibdev_dbg(&iwdev->ibdev, "WS: LAN free_res for rdma qset failed.\n");
 }
 
+static int irdma_init_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
+{
+	int i;
+
+	rf->msix_count = num_online_cpus() + IRDMA_NUM_AEQ_MSIX;
+	rf->msix_entries = kcalloc(rf->msix_count, sizeof(*rf->msix_entries),
+				   GFP_KERNEL);
+	if (!rf->msix_entries)
+		return -ENOMEM;
+
+	for (i = 0; i < rf->msix_count; i++)
+		if (ice_alloc_rdma_qvector(pf, &rf->msix_entries[i]))
+			break;
+
+	if (i < IRDMA_MIN_MSIX) {
+		while (--i >= 0)
+			ice_free_rdma_qvector(pf, &rf->msix_entries[i]);
+
+		kfree(rf->msix_entries);
+		return -ENOMEM;
+	}
+
+	rf->msix_count = i;
+
+	return 0;
+}
+
+static void irdma_deinit_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
+{
+	int i;
+
+	for (i = 0; i < rf->msix_count; i++)
+		ice_free_rdma_qvector(pf, &rf->msix_entries[i]);
+
+	kfree(rf->msix_entries);
+}
+
 static void irdma_remove(struct auxiliary_device *aux_dev)
 {
 	struct iidc_auxiliary_dev *iidc_adev = container_of(aux_dev,
@@ -216,6 +253,7 @@ static void irdma_remove(struct auxiliary_device *aux_dev)
 
 	irdma_ib_unregister_device(iwdev);
 	ice_rdma_update_vsi_filter(pf, iwdev->vsi_num, false);
+	irdma_deinit_interrupts(iwdev->rf, pf);
 
 	pr_debug("INIT: Gen2 PF[%d] device remove success\n", PCI_FUNC(pf->pdev->devfn));
 }
@@ -230,9 +268,7 @@ static void irdma_fill_device_info(struct irdma_device *iwdev, struct ice_pf *pf
 	rf->gen_ops.unregister_qset = irdma_lan_unregister_qset;
 	rf->hw.hw_addr = pf->hw.hw_addr;
 	rf->pcidev = pf->pdev;
-	rf->msix_count =  pf->num_rdma_msix;
 	rf->pf_id = pf->hw.pf_id;
-	rf->msix_entries = &pf->msix_entries[pf->rdma_base_vector];
 	rf->default_vsi.vsi_idx = vsi->vsi_num;
 	rf->protocol_used = pf->rdma_mode & IIDC_RDMA_PROTOCOL_ROCEV2 ?
 			    IRDMA_ROCE_PROTOCOL_ONLY : IRDMA_IWARP_PROTOCOL_ONLY;
@@ -281,6 +317,10 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
 	irdma_fill_device_info(iwdev, pf, vsi);
 	rf = iwdev->rf;
 
+	err = irdma_init_interrupts(rf, pf);
+	if (err)
+		goto err_init_interrupts;
+
 	err = irdma_ctrl_init_hw(rf);
 	if (err)
 		goto err_ctrl_init;
@@ -311,6 +351,8 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
 err_rt_init:
 	irdma_ctrl_deinit_hw(rf);
 err_ctrl_init:
+	irdma_deinit_interrupts(rf, pf);
+err_init_interrupts:
 	kfree(iwdev->rf);
 	ib_dealloc_device(&iwdev->ibdev);
 
diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
index 9f0ed6e84471..ef9a9b79d711 100644
--- a/drivers/infiniband/hw/irdma/main.h
+++ b/drivers/infiniband/hw/irdma/main.h
@@ -117,6 +117,9 @@ extern struct auxiliary_driver i40iw_auxiliary_drv;
 
 #define IRDMA_IRQ_NAME_STR_LEN (64)
 
+#define IRDMA_NUM_AEQ_MSIX	1
+#define IRDMA_MIN_MSIX		2
+
 enum init_completion_state {
 	INVALID_STATE = 0,
 	INITIAL_STATE,
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index ad61dd688871..0dbc98ba69f4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -97,7 +97,6 @@
 #define ICE_MIN_LAN_OICR_MSIX	1
 #define ICE_MIN_MSIX		(ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
 #define ICE_FDIR_MSIX		2
-#define ICE_RDMA_NUM_AEQ_MSIX	4
 #define ICE_NO_VSI		0xffff
 #define ICE_VSI_MAP_CONTIG	0
 #define ICE_VSI_MAP_SCATTER	1
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 145b27f2a4ce..bab3e81cad5d 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -228,61 +228,34 @@ void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos)
 }
 EXPORT_SYMBOL_GPL(ice_get_qos_params);
 
-/**
- * ice_alloc_rdma_qvectors - Allocate vector resources for RDMA driver
- * @pf: board private structure to initialize
- */
-static int ice_alloc_rdma_qvectors(struct ice_pf *pf)
+int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
 {
-	if (ice_is_rdma_ena(pf)) {
-		int i;
-
-		pf->msix_entries = kcalloc(pf->num_rdma_msix,
-					   sizeof(*pf->msix_entries),
-						  GFP_KERNEL);
-		if (!pf->msix_entries)
-			return -ENOMEM;
+	struct msi_map map = ice_alloc_irq(pf, true);
 
-		/* RDMA is the only user of pf->msix_entries array */
-		pf->rdma_base_vector = 0;
-
-		for (i = 0; i < pf->num_rdma_msix; i++) {
-			struct msix_entry *entry = &pf->msix_entries[i];
-			struct msi_map map;
+	if (map.index < 0)
+		return -ENOMEM;
 
-			map = ice_alloc_irq(pf, false);
-			if (map.index < 0)
-				break;
+	entry->entry = map.index;
+	entry->vector = map.virq;
 
-			entry->entry = map.index;
-			entry->vector = map.virq;
-		}
-	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(ice_alloc_rdma_qvector);
 
 /**
  * ice_free_rdma_qvector - free vector resources reserved for RDMA driver
  * @pf: board private structure to initialize
+ * @entry: MSI-X entry to be removed
  */
-static void ice_free_rdma_qvector(struct ice_pf *pf)
+void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
 {
-	int i;
-
-	if (!pf->msix_entries)
-		return;
-
-	for (i = 0; i < pf->num_rdma_msix; i++) {
-		struct msi_map map;
+	struct msi_map map;
 
-		map.index = pf->msix_entries[i].entry;
-		map.virq = pf->msix_entries[i].vector;
-		ice_free_irq(pf, map);
-	}
-
-	kfree(pf->msix_entries);
-	pf->msix_entries = NULL;
+	map.index = entry->entry;
+	map.virq = entry->vector;
+	ice_free_irq(pf, map);
 }
+EXPORT_SYMBOL_GPL(ice_free_rdma_qvector);
 
 /**
  * ice_adev_release - function to be mapped to AUX dev's release op
@@ -382,12 +355,6 @@ int ice_init_rdma(struct ice_pf *pf)
 		return -ENOMEM;
 	}
 
-	/* Reserve vector resources */
-	ret = ice_alloc_rdma_qvectors(pf);
-	if (ret < 0) {
-		dev_err(dev, "failed to reserve vectors for RDMA\n");
-		goto err_reserve_rdma_qvector;
-	}
 	pf->rdma_mode |= IIDC_RDMA_PROTOCOL_ROCEV2;
 	ret = ice_plug_aux_dev(pf);
 	if (ret)
@@ -395,8 +362,6 @@ int ice_init_rdma(struct ice_pf *pf)
 	return 0;
 
 err_plug_aux_dev:
-	ice_free_rdma_qvector(pf);
-err_reserve_rdma_qvector:
 	pf->adev = NULL;
 	xa_erase(&ice_aux_id, pf->aux_idx);
 	return ret;
@@ -412,6 +377,5 @@ void ice_deinit_rdma(struct ice_pf *pf)
 		return;
 
 	ice_unplug_aux_dev(pf);
-	ice_free_rdma_qvector(pf);
 	xa_erase(&ice_aux_id, pf->aux_idx);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 1a7d446ab5f1..80c9ee2e64c1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -84,11 +84,12 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
 	return entry;
 }
 
+#define ICE_RDMA_AEQ_MSIX 1
 static int ice_get_default_msix_amount(struct ice_pf *pf)
 {
 	return ICE_MIN_LAN_OICR_MSIX + num_online_cpus() +
 	       (test_bit(ICE_FLAG_FD_ENA, pf->flags) ? ICE_FDIR_MSIX : 0) +
-	       (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_NUM_AEQ_MSIX : 0);
+	       (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_AEQ_MSIX : 0);
 }
 
 /**
diff --git a/include/linux/net/intel/iidc.h b/include/linux/net/intel/iidc.h
index 1c1332e4df26..13274c3def66 100644
--- a/include/linux/net/intel/iidc.h
+++ b/include/linux/net/intel/iidc.h
@@ -78,6 +78,8 @@ int ice_del_rdma_qset(struct ice_pf *pf, struct iidc_rdma_qset_params *qset);
 int ice_rdma_request_reset(struct ice_pf *pf, enum iidc_reset_type reset_type);
 int ice_rdma_update_vsi_filter(struct ice_pf *pf, u16 vsi_id, bool enable);
 void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos);
+int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
+void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
 
 /* Structure representing auxiliary driver tailored information about the core
  * PCI dev, each auxiliary driver using the IIDC interface will have an
-- 
2.47.1



* [PATCH net-next 6/9] ice: treat dyn_allowed only as suggestion
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (4 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 5/9] ice, irdma: move interrupts code to irdma Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 7/9] ice: enable_rdma devlink param Tony Nguyen
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Some MSI-X vectors may need to be allocated as static and the rest as
dynamic, for example on the PF VSI. It should always have at least one
MSI-X vector, so that one is allocated statically; the rest can be dynamic
if that is supported.

Change ice_get_irq_res() to allow using static entries when they are free,
even if the caller asks for a dynamic one.

Adjust the limit values to the new approach. Min and max in the limit are
inclusive (both are valid indices), so decrease max and num_static by one.

Set vsi::irq_dyn_alloc if dynamic allocation is supported.
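
The inclusive-limit behaviour can be sketched in plain C as follows. This is
only an illustration under stated assumptions: struct tracker, NUM_ENTRIES
and tracker_get() are hypothetical, and a bool array stands in for the
xarray used by the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_ENTRIES 8	/* hypothetical tracker size */

struct tracker {
	bool used[NUM_ENTRIES];
	unsigned int num_static;	/* entries [0, num_static) are preallocated */
};

/*
 * Return the lowest free index in the inclusive range [0, max], or -1.
 * With dyn_allowed == false the search stops at the last static entry;
 * with dyn_allowed == true a free static entry is still preferred because
 * the search always starts at index 0.
 */
static int tracker_get(struct tracker *t, bool dyn_allowed, bool *dynamic)
{
	unsigned int last_static, max, i;

	assert(t->num_static > 0 && t->num_static <= NUM_ENTRIES);

	last_static = t->num_static - 1;
	max = dyn_allowed ? NUM_ENTRIES - 1 : last_static;

	for (i = 0; i <= max; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			/* an entry is dynamic only past the last static index */
			*dynamic = i > last_static;
			return (int)i;
		}
	}

	return -1;
}
```

With two static entries, the first two requests return indices 0 and 1 as
static regardless of the flag; after that only a caller passing
dyn_allowed == true gets an entry (index 2, marked dynamic).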

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_irq.c | 25 ++++++++++++------------
 drivers/net/ethernet/intel/ice/ice_lib.c |  2 ++
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 80c9ee2e64c1..d466d29b2ef1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -45,7 +45,7 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
 /**
  * ice_get_irq_res - get an interrupt resource
  * @pf: board private structure
- * @dyn_only: force entry to be dynamically allocated
+ * @dyn_allowed: allow entry to be dynamically allocated
  *
  * Allocate new irq entry in the free slot of the tracker. Since xarray
  * is used, always allocate new entry at the lowest possible index. Set
@@ -53,11 +53,12 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
  *
  * Returns allocated irq entry or NULL on failure.
  */
-static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
+static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf,
+					     bool dyn_allowed)
 {
-	struct xa_limit limit = { .max = pf->irq_tracker.num_entries,
+	struct xa_limit limit = { .max = pf->irq_tracker.num_entries - 1,
 				  .min = 0 };
-	unsigned int num_static = pf->irq_tracker.num_static;
+	unsigned int num_static = pf->irq_tracker.num_static - 1;
 	struct ice_irq_entry *entry;
 	unsigned int index;
 	int ret;
@@ -66,9 +67,9 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
 	if (!entry)
 		return NULL;
 
-	/* skip preallocated entries if the caller says so */
-	if (dyn_only)
-		limit.min = num_static;
+	/* use only preallocated entries if the caller says so */
+	if (!dyn_allowed)
+		limit.max = num_static;
 
 	ret = xa_alloc(&pf->irq_tracker.entries, &index, entry, limit,
 		       GFP_KERNEL);
@@ -78,7 +79,7 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
 		entry = NULL;
 	} else {
 		entry->index = index;
-		entry->dynamic = index >= num_static;
+		entry->dynamic = index > num_static;
 	}
 
 	return entry;
@@ -137,7 +138,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 /**
  * ice_alloc_irq - Allocate new interrupt vector
  * @pf: board private structure
- * @dyn_only: force dynamic allocation of the interrupt
+ * @dyn_allowed: allow dynamic allocation of the interrupt
  *
  * Allocate new interrupt vector for a given owner id.
  * return struct msi_map with interrupt details and track
@@ -150,20 +151,20 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
  * interrupt will be allocated with pci_msix_alloc_irq_at.
  *
  * Some callers may only support dynamically allocated interrupts.
- * This is indicated with dyn_only flag.
+ * This is indicated with dyn_allowed flag.
  *
  * On failure, return map with negative .index. The caller
  * is expected to check returned map index.
  *
  */
-struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only)
+struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
 {
 	int sriov_base_vector = pf->sriov_base_vector;
 	struct msi_map map = { .index = -ENOENT };
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_irq_entry *entry;
 
-	entry = ice_get_irq_res(pf, dyn_only);
+	entry = ice_get_irq_res(pf, dyn_allowed);
 	if (!entry)
 		return map;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 4b8d7aa7b1bb..1827f1f20ce8 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -571,6 +571,8 @@ ice_vsi_alloc_def(struct ice_vsi *vsi, struct ice_channel *ch)
 			return -ENOMEM;
 	}
 
+	vsi->irq_dyn_alloc = pci_msix_can_alloc_dyn(vsi->back->pdev);
+
 	switch (vsi->type) {
 	case ICE_VSI_PF:
 	case ICE_VSI_SF:
-- 
2.47.1



* [PATCH net-next 7/9] ice: enable_rdma devlink param
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (5 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 6/9] ice: treat dyn_allowed only as suggestion Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 8/9] ice: simplify VF MSI-X managing Tony Nguyen
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, Jan Sokolowski

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Implement the enable_rdma devlink parameter to allow the user to turn the
RDMA feature on and off.

It is useful when there are not enough interrupts and the user doesn't need
the RDMA feature.
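
The intended decision logic can be modelled with a small sketch. The names
below (struct rdma_cfg, rdma_validate(), rdma_enabled()) are hypothetical
and do not use the devlink API; they only show that the user value, once
available, overrides the capability flag, and that enabling is refused on
hardware without RDMA support.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state consulted by the driver. */
struct rdma_cfg {
	bool hw_capable;	/* models ICE_FLAG_RDMA_ENA */
	const bool *user_value;	/* NULL until a parameter value is available */
};

/* Reject turning RDMA on when the device cannot do RDMA at all. */
static int rdma_validate(const struct rdma_cfg *cfg, bool new_state)
{
	assert(cfg);

	if (new_state && !cfg->hw_capable)
		return -1;

	return 0;
}

/* Prefer the user's value; fall back to the capability flag without one. */
static bool rdma_enabled(const struct rdma_cfg *cfg)
{
	assert(cfg);

	return cfg->user_value ? *cfg->user_value : cfg->hw_capable;
}
```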

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 .../net/ethernet/intel/ice/devlink/devlink.c  | 21 +++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_lib.c      |  8 ++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
index c53baecf8a90..725136c975e1 100644
--- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
+++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
@@ -1583,6 +1583,19 @@ ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int ice_devlink_enable_rdma_validate(struct devlink *devlink, u32 id,
+					    union devlink_param_value val,
+					    struct netlink_ext_ack *extack)
+{
+	struct ice_pf *pf = devlink_priv(devlink);
+	bool new_state = val.vbool;
+
+	if (new_state && !test_bit(ICE_FLAG_RDMA_ENA, pf->flags))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 enum ice_param_id {
 	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
 	ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
@@ -1598,6 +1611,8 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
 			      ice_devlink_enable_iw_get,
 			      ice_devlink_enable_iw_set,
 			      ice_devlink_enable_iw_validate),
+	DEVLINK_PARAM_GENERIC(ENABLE_RDMA, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+			      NULL, NULL, ice_devlink_enable_rdma_validate),
 };
 
 static const struct devlink_param ice_dvl_msix_params[] = {
@@ -1738,6 +1753,12 @@ int ice_devlink_register_params(struct ice_pf *pf)
 	devl_param_driverinit_value_set(devlink,
 					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
 					value);
+
+	value.vbool = test_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+	devl_param_driverinit_value_set(devlink,
+					DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
+					value);
+
 	return 0;
 
 unregister_msix_params:
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 1827f1f20ce8..16c419809849 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -833,7 +833,13 @@ bool ice_is_safe_mode(struct ice_pf *pf)
  */
 bool ice_is_rdma_ena(struct ice_pf *pf)
 {
-	return test_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+	union devlink_param_value value;
+	int err;
+
+	err = devl_param_driverinit_value_get(priv_to_devlink(pf),
+					      DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
+					      &value);
+	return err ? test_bit(ICE_FLAG_RDMA_ENA, pf->flags) : value.vbool;
 }
 
 /**
-- 
2.47.1



* [PATCH net-next 8/9] ice: simplify VF MSI-X managing
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (6 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 7/9] ice: enable_rdma devlink param Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-03 21:09 ` [PATCH net-next 9/9] ice: init flow director before RDMA Tony Nguyen
  2025-02-04 22:42 ` [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Jakub Kicinski
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, Rafal Romanowski

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

With the pf->msix.max field in place, the base vector for other use cases
(like VFs) can be fixed. This simplifies the code for changing the MSI-X
amount on a particular VF, because there is no need to move the base vector.

A fixed base vector allows reserving vectors from the beginning instead of
from the end, which is also simpler in code.

Store the total and rest values in the same struct as max and min for the
PF. Move vector tracking from ice_sriov.c to ice_irq.c, as it can also be
used for other non-PF use cases (SIOV).
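
A rough userspace model of reserving contiguous vectors from the beginning
of a fixed range is shown below. It is a sketch under assumptions, not the
driver implementation: struct virt_tracker, VIRT_ENTRIES, virt_get() and
virt_free() are hypothetical, and a bool array replaces the kernel bitmap
helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define VIRT_ENTRIES 16	/* hypothetical: vectors left over for VFs */

struct virt_tracker {
	bool used[VIRT_ENTRIES];	/* stand-in for a kernel bitmap */
	unsigned int base;		/* first global index after the PF range */
};

/*
 * Reserve 'needed' contiguous vectors, searching from the start of the
 * range. Returns the first global index, or -1 if no such area is free.
 */
static int virt_get(struct virt_tracker *t, unsigned int needed)
{
	unsigned int start, i;

	assert(needed > 0);

	for (start = 0; start + needed <= VIRT_ENTRIES; start++) {
		for (i = 0; i < needed && !t->used[start + i]; i++)
			;
		if (i == needed) {
			for (i = 0; i < needed; i++)
				t->used[start + i] = true;
			/* convert tracker-relative index to global index */
			return (int)(start + t->base);
		}
	}

	return -1;
}

/* Release 'count' vectors starting at global index 'index'. */
static void virt_free(struct virt_tracker *t, unsigned int index,
		      unsigned int count)
{
	unsigned int i;

	assert(index >= t->base && index - t->base + count <= VIRT_ENTRIES);

	for (i = 0; i < count; i++)
		t->used[index - t->base + i] = false;
}
```

Because the base never moves, resizing one VF only means freeing its old
range and searching for a new one; other VFs keep their indices.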

Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h       |  10 +-
 drivers/net/ethernet/intel/ice/ice_irq.c   |  75 +++++++---
 drivers/net/ethernet/intel/ice/ice_irq.h   |  13 +-
 drivers/net/ethernet/intel/ice/ice_sriov.c | 154 ++-------------------
 4 files changed, 79 insertions(+), 173 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 0dbc98ba69f4..2a6de2115193 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -543,6 +543,8 @@ struct ice_pf_msix {
 	u32 cur;
 	u32 min;
 	u32 max;
+	u32 total;
+	u32 rest;
 };
 
 struct ice_pf {
@@ -559,13 +561,7 @@ struct ice_pf {
 	/* OS reserved IRQ details */
 	struct msix_entry *msix_entries;
 	struct ice_irq_tracker irq_tracker;
-	/* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
-	 * number of MSIX vectors needed for all SR-IOV VFs from the number of
-	 * MSIX vectors allowed on this PF.
-	 */
-	u16 sriov_base_vector;
-	unsigned long *sriov_irq_bm;	/* bitmap to track irq usage */
-	u16 sriov_irq_size;		/* size of the irq_bm bitmap */
+	struct ice_virt_irq_tracker virt_irq_tracker;
 
 	u16 ctrl_vsi_idx;		/* control VSI index in pf->vsi array */
 
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index d466d29b2ef1..cbae3d81f0f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -20,6 +20,19 @@ ice_init_irq_tracker(struct ice_pf *pf, unsigned int max_vectors,
 	xa_init_flags(&pf->irq_tracker.entries, XA_FLAGS_ALLOC);
 }
 
+static int
+ice_init_virt_irq_tracker(struct ice_pf *pf, u32 base, u32 num_entries)
+{
+	pf->virt_irq_tracker.bm = bitmap_zalloc(num_entries, GFP_KERNEL);
+	if (!pf->virt_irq_tracker.bm)
+		return -ENOMEM;
+
+	pf->virt_irq_tracker.num_entries = num_entries;
+	pf->virt_irq_tracker.base = base;
+
+	return 0;
+}
+
 /**
  * ice_deinit_irq_tracker - free xarray tracker
  * @pf: board private structure
@@ -29,6 +42,11 @@ static void ice_deinit_irq_tracker(struct ice_pf *pf)
 	xa_destroy(&pf->irq_tracker.entries);
 }
 
+static void ice_deinit_virt_irq_tracker(struct ice_pf *pf)
+{
+	bitmap_free(pf->virt_irq_tracker.bm);
+}
+
 /**
  * ice_free_irq_res - free a block of resources
  * @pf: board private structure
@@ -101,6 +119,7 @@ void ice_clear_interrupt_scheme(struct ice_pf *pf)
 {
 	pci_free_irq_vectors(pf->pdev);
 	ice_deinit_irq_tracker(pf);
+	ice_deinit_virt_irq_tracker(pf);
 }
 
 /**
@@ -120,6 +139,9 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 		pf->msix.max = min(total_vectors,
 				   ice_get_default_msix_amount(pf));
 
+	pf->msix.total = total_vectors;
+	pf->msix.rest = total_vectors - pf->msix.max;
+
 	if (pci_msix_can_alloc_dyn(pf->pdev))
 		vectors = pf->msix.min;
 	else
@@ -132,7 +154,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
 
 	ice_init_irq_tracker(pf, pf->msix.max, vectors);
 
-	return 0;
+	return ice_init_virt_irq_tracker(pf, pf->msix.max, pf->msix.rest);
 }
 
 /**
@@ -159,7 +181,6 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
  */
 struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
 {
-	int sriov_base_vector = pf->sriov_base_vector;
 	struct msi_map map = { .index = -ENOENT };
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_irq_entry *entry;
@@ -168,10 +189,6 @@ struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
 	if (!entry)
 		return map;
 
-	/* fail if we're about to violate SRIOV vectors space */
-	if (sriov_base_vector && entry->index >= sriov_base_vector)
-		goto exit_free_res;
-
 	if (pci_msix_can_alloc_dyn(pf->pdev) && entry->dynamic) {
 		map = pci_msix_alloc_irq_at(pf->pdev, entry->index, NULL);
 		if (map.index < 0)
@@ -219,26 +236,40 @@ void ice_free_irq(struct ice_pf *pf, struct msi_map map)
 }
 
 /**
- * ice_get_max_used_msix_vector - Get the max used interrupt vector
- * @pf: board private structure
+ * ice_virt_get_irqs - get irqs for SR-IOV use case
+ * @pf: pointer to PF structure
+ * @needed: number of irqs to get
  *
- * Return index of maximum used interrupt vectors with respect to the
- * beginning of the MSIX table. Take into account that some interrupts
- * may have been dynamically allocated after MSIX was initially enabled.
+ * This returns the first MSI-X vector index in PF space that is used by this
+ * VF. This index is used when accessing PF relative registers such as
+ * GLINT_VECT2FUNC and GLINT_DYN_CTL.
+ * This will always be the OICR index in the AVF driver so any functionality
+ * using vf->first_vector_idx for queue configuration will have to increment by
+ * 1 to avoid meddling with the OICR index.
  */
-int ice_get_max_used_msix_vector(struct ice_pf *pf)
+int ice_virt_get_irqs(struct ice_pf *pf, u32 needed)
 {
-	unsigned long start, index, max_idx;
-	void *entry;
+	int res = bitmap_find_next_zero_area(pf->virt_irq_tracker.bm,
+					     pf->virt_irq_tracker.num_entries,
+					     0, needed, 0);
 
-	/* Treat all preallocated interrupts as used */
-	start = pf->irq_tracker.num_static;
-	max_idx = start - 1;
+	if (res >= pf->virt_irq_tracker.num_entries)
+		return -ENOENT;
 
-	xa_for_each_start(&pf->irq_tracker.entries, index, entry, start) {
-		if (index > max_idx)
-			max_idx = index;
-	}
+	bitmap_set(pf->virt_irq_tracker.bm, res, needed);
 
-	return max_idx;
+	/* conversion from number in bitmap to global irq index */
+	return res + pf->virt_irq_tracker.base;
+}
+
+/**
+ * ice_virt_free_irqs - free irqs used by the VF
+ * @pf: pointer to PF structure
+ * @index: first index to be freed
+ * @irqs: number of irqs to free
+ */
+void ice_virt_free_irqs(struct ice_pf *pf, u32 index, u32 irqs)
+{
+	bitmap_clear(pf->virt_irq_tracker.bm, index - pf->virt_irq_tracker.base,
+		     irqs);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.h b/drivers/net/ethernet/intel/ice/ice_irq.h
index f35efc08575e..b2f9dbafd57e 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.h
+++ b/drivers/net/ethernet/intel/ice/ice_irq.h
@@ -15,11 +15,22 @@ struct ice_irq_tracker {
 	u16 num_static;	/* preallocated entries */
 };
 
+struct ice_virt_irq_tracker {
+	unsigned long *bm;	/* bitmap to track irq usage */
+	u32 num_entries;
+	/* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
+	 * number of MSIX vectors needed for all SR-IOV VFs from the number of
+	 * MSIX vectors allowed on this PF.
+	 */
+	u32 base;
+};
+
 int ice_init_interrupt_scheme(struct ice_pf *pf);
 void ice_clear_interrupt_scheme(struct ice_pf *pf);
 
 struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only);
 void ice_free_irq(struct ice_pf *pf, struct msi_map map);
-int ice_get_max_used_msix_vector(struct ice_pf *pf);
 
+int ice_virt_get_irqs(struct ice_pf *pf, u32 needed);
+void ice_virt_free_irqs(struct ice_pf *pf, u32 index, u32 irqs);
 #endif
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index b83f99c01d91..33eac29b6a50 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -122,27 +122,6 @@ static void ice_dis_vf_mappings(struct ice_vf *vf)
 		dev_err(dev, "Scattered mode for VF Rx queues is not yet implemented\n");
 }
 
-/**
- * ice_sriov_free_msix_res - Reset/free any used MSIX resources
- * @pf: pointer to the PF structure
- *
- * Since no MSIX entries are taken from the pf->irq_tracker then just clear
- * the pf->sriov_base_vector.
- *
- * Returns 0 on success, and -EINVAL on error.
- */
-static int ice_sriov_free_msix_res(struct ice_pf *pf)
-{
-	if (!pf)
-		return -EINVAL;
-
-	bitmap_free(pf->sriov_irq_bm);
-	pf->sriov_irq_size = 0;
-	pf->sriov_base_vector = 0;
-
-	return 0;
-}
-
 /**
  * ice_free_vfs - Free all VFs
  * @pf: pointer to the PF structure
@@ -177,6 +156,7 @@ void ice_free_vfs(struct ice_pf *pf)
 
 		ice_eswitch_detach_vf(pf, vf);
 		ice_dis_vf_qs(vf);
+		ice_virt_free_irqs(pf, vf->first_vector_idx, vf->num_msix);
 
 		if (test_bit(ICE_VF_STATE_INIT, vf->vf_states)) {
 			/* disable VF qp mappings and set VF disable state */
@@ -200,9 +180,6 @@ void ice_free_vfs(struct ice_pf *pf)
 		mutex_unlock(&vf->cfg_lock);
 	}
 
-	if (ice_sriov_free_msix_res(pf))
-		dev_err(dev, "Failed to free MSIX resources used by SR-IOV\n");
-
 	vfs->num_qps_per = 0;
 	ice_free_vf_entries(pf);
 
@@ -371,40 +348,6 @@ void ice_calc_vf_reg_idx(struct ice_vf *vf, struct ice_q_vector *q_vector)
 	q_vector->reg_idx = vf->first_vector_idx + q_vector->vf_reg_idx;
 }
 
-/**
- * ice_sriov_set_msix_res - Set any used MSIX resources
- * @pf: pointer to PF structure
- * @num_msix_needed: number of MSIX vectors needed for all SR-IOV VFs
- *
- * This function allows SR-IOV resources to be taken from the end of the PF's
- * allowed HW MSIX vectors so that the irq_tracker will not be affected. We
- * just set the pf->sriov_base_vector and return success.
- *
- * If there are not enough resources available, return an error. This should
- * always be caught by ice_set_per_vf_res().
- *
- * Return 0 on success, and -EINVAL when there are not enough MSIX vectors
- * in the PF's space available for SR-IOV.
- */
-static int ice_sriov_set_msix_res(struct ice_pf *pf, u16 num_msix_needed)
-{
-	u16 total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
-	int vectors_used = ice_get_max_used_msix_vector(pf);
-	int sriov_base_vector;
-
-	sriov_base_vector = total_vectors - num_msix_needed;
-
-	/* make sure we only grab irq_tracker entries from the list end and
-	 * that we have enough available MSIX vectors
-	 */
-	if (sriov_base_vector < vectors_used)
-		return -EINVAL;
-
-	pf->sriov_base_vector = sriov_base_vector;
-
-	return 0;
-}
-
 /**
  * ice_set_per_vf_res - check if vectors and queues are available
  * @pf: pointer to the PF structure
@@ -429,11 +372,9 @@ static int ice_sriov_set_msix_res(struct ice_pf *pf, u16 num_msix_needed)
  */
 static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
 {
-	int vectors_used = ice_get_max_used_msix_vector(pf);
 	u16 num_msix_per_vf, num_txq, num_rxq, avail_qs;
 	int msix_avail_per_vf, msix_avail_for_sriov;
 	struct device *dev = ice_pf_to_dev(pf);
-	int err;
 
 	lockdep_assert_held(&pf->vfs.table_lock);
 
@@ -441,8 +382,7 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
 		return -EINVAL;
 
 	/* determine MSI-X resources per VF */
-	msix_avail_for_sriov = pf->hw.func_caps.common_cap.num_msix_vectors -
-		vectors_used;
+	msix_avail_for_sriov = pf->virt_irq_tracker.num_entries;
 	msix_avail_per_vf = msix_avail_for_sriov / num_vfs;
 	if (msix_avail_per_vf >= ICE_NUM_VF_MSIX_MED) {
 		num_msix_per_vf = ICE_NUM_VF_MSIX_MED;
@@ -481,13 +421,6 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
 		return -ENOSPC;
 	}
 
-	err = ice_sriov_set_msix_res(pf, num_msix_per_vf * num_vfs);
-	if (err) {
-		dev_err(dev, "Unable to set MSI-X resources for %d VFs, err %d\n",
-			num_vfs, err);
-		return err;
-	}
-
 	/* only allow equal Tx/Rx queue count (i.e. queue pairs) */
 	pf->vfs.num_qps_per = min_t(int, num_txq, num_rxq);
 	pf->vfs.num_msix_per = num_msix_per_vf;
@@ -497,52 +430,6 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
 	return 0;
 }
 
-/**
- * ice_sriov_get_irqs - get irqs for SR-IOV usacase
- * @pf: pointer to PF structure
- * @needed: number of irqs to get
- *
- * This returns the first MSI-X vector index in PF space that is used by this
- * VF. This index is used when accessing PF relative registers such as
- * GLINT_VECT2FUNC and GLINT_DYN_CTL.
- * This will always be the OICR index in the AVF driver so any functionality
- * using vf->first_vector_idx for queue configuration_id: id of VF which will
- * use this irqs
- *
- * Only SRIOV specific vectors are tracked in sriov_irq_bm. SRIOV vectors are
- * allocated from the end of global irq index. First bit in sriov_irq_bm means
- * last irq index etc. It simplifies extension of SRIOV vectors.
- * They will be always located from sriov_base_vector to the last irq
- * index. While increasing/decreasing sriov_base_vector can be moved.
- */
-static int ice_sriov_get_irqs(struct ice_pf *pf, u16 needed)
-{
-	int res = bitmap_find_next_zero_area(pf->sriov_irq_bm,
-					     pf->sriov_irq_size, 0, needed, 0);
-	/* conversion from number in bitmap to global irq index */
-	int index = pf->sriov_irq_size - res - needed;
-
-	if (res >= pf->sriov_irq_size || index < pf->sriov_base_vector)
-		return -ENOENT;
-
-	bitmap_set(pf->sriov_irq_bm, res, needed);
-	return index;
-}
-
-/**
- * ice_sriov_free_irqs - free irqs used by the VF
- * @pf: pointer to PF structure
- * @vf: pointer to VF structure
- */
-static void ice_sriov_free_irqs(struct ice_pf *pf, struct ice_vf *vf)
-{
-	/* Move back from first vector index to first index in bitmap */
-	int bm_i = pf->sriov_irq_size - vf->first_vector_idx - vf->num_msix;
-
-	bitmap_clear(pf->sriov_irq_bm, bm_i, vf->num_msix);
-	vf->first_vector_idx = 0;
-}
-
 /**
  * ice_init_vf_vsi_res - initialize/setup VF VSI resources
  * @vf: VF to initialize/setup the VSI for
@@ -556,7 +443,7 @@ static int ice_init_vf_vsi_res(struct ice_vf *vf)
 	struct ice_vsi *vsi;
 	int err;
 
-	vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+	vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
 	if (vf->first_vector_idx < 0)
 		return -ENOMEM;
 
@@ -856,16 +743,10 @@ static int ice_create_vf_entries(struct ice_pf *pf, u16 num_vfs)
  */
 static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
 {
-	int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_hw *hw = &pf->hw;
 	int ret;
 
-	pf->sriov_irq_bm = bitmap_zalloc(total_vectors, GFP_KERNEL);
-	if (!pf->sriov_irq_bm)
-		return -ENOMEM;
-	pf->sriov_irq_size = total_vectors;
-
 	/* Disable global interrupt 0 so we don't try to handle the VFLR. */
 	wr32(hw, GLINT_DYN_CTL(pf->oicr_irq.index),
 	     ICE_ITR_NONE << GLINT_DYN_CTL_ITR_INDX_S);
@@ -918,7 +799,6 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
 	/* rearm interrupts here */
 	ice_irq_dynamic_ena(hw, NULL, NULL);
 	clear_bit(ICE_OICR_INTR_DIS, pf->state);
-	bitmap_free(pf->sriov_irq_bm);
 	return ret;
 }
 
@@ -992,16 +872,7 @@ u32 ice_sriov_get_vf_total_msix(struct pci_dev *pdev)
 {
 	struct ice_pf *pf = pci_get_drvdata(pdev);
 
-	return pf->sriov_irq_size - ice_get_max_used_msix_vector(pf);
-}
-
-static int ice_sriov_move_base_vector(struct ice_pf *pf, int move)
-{
-	if (pf->sriov_base_vector - move < ice_get_max_used_msix_vector(pf))
-		return -ENOMEM;
-
-	pf->sriov_base_vector -= move;
-	return 0;
+	return pf->virt_irq_tracker.num_entries;
 }
 
 static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
@@ -1020,7 +891,8 @@ static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
 			continue;
 
 		ice_dis_vf_mappings(tmp_vf);
-		ice_sriov_free_irqs(pf, tmp_vf);
+		ice_virt_free_irqs(pf, tmp_vf->first_vector_idx,
+				   tmp_vf->num_msix);
 
 		vf_ids[to_remap] = tmp_vf->vf_id;
 		to_remap += 1;
@@ -1032,7 +904,7 @@ static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
 			continue;
 
 		tmp_vf->first_vector_idx =
-			ice_sriov_get_irqs(pf, tmp_vf->num_msix);
+			ice_virt_get_irqs(pf, tmp_vf->num_msix);
 		/* there is no need to rebuild VSI as we are only changing the
 		 * vector indexes not amount of MSI-X or queues
 		 */
@@ -1105,20 +977,15 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
 	prev_msix = vf->num_msix;
 	prev_queues = vf->num_vf_qs;
 
-	if (ice_sriov_move_base_vector(pf, msix_vec_count - prev_msix)) {
-		ice_put_vf(vf);
-		return -ENOSPC;
-	}
-
 	ice_dis_vf_mappings(vf);
-	ice_sriov_free_irqs(pf, vf);
+	ice_virt_free_irqs(pf, vf->first_vector_idx, vf->num_msix);
 
 	/* Remap all VFs beside the one is now configured */
 	ice_sriov_remap_vectors(pf, vf->vf_id);
 
 	vf->num_msix = msix_vec_count;
 	vf->num_vf_qs = queues;
-	vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+	vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
 	if (vf->first_vector_idx < 0)
 		goto unroll;
 
@@ -1147,7 +1014,8 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
 
 	vf->num_msix = prev_msix;
 	vf->num_vf_qs = prev_queues;
-	vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+
+	vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
 	if (vf->first_vector_idx < 0) {
 		ice_put_vf(vf);
 		return -EINVAL;
-- 
2.47.1



* [PATCH net-next 9/9] ice: init flow director before RDMA
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (7 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 8/9] ice: simplify VF MSI-X managing Tony Nguyen
@ 2025-02-03 21:09 ` Tony Nguyen
  2025-02-04 22:42 ` [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Jakub Kicinski
  9 siblings, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-03 21:09 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: Michal Swiatkowski, anthony.l.nguyen, sridhar.samudrala,
	jacob.e.keller, pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

Flow director needs only one MSI-X vector. Initialize it before RDMA so
that a vector is still available for it.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c3a0fb97c5ee..d7037de29545 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5186,11 +5186,12 @@ int ice_load(struct ice_pf *pf)
 
 	ice_napi_add(vsi);
 
+	ice_init_features(pf);
+
 	err = ice_init_rdma(pf);
 	if (err)
 		goto err_init_rdma;
 
-	ice_init_features(pf);
 	ice_service_task_restart(pf);
 
 	clear_bit(ICE_DOWN, pf->state);
@@ -5198,6 +5199,7 @@ int ice_load(struct ice_pf *pf)
 	return 0;
 
 err_init_rdma:
+	ice_deinit_features(pf);
 	ice_tc_indir_block_unregister(vsi);
 err_tc_indir_block_register:
 	ice_unregister_netdev(vsi);
@@ -5221,8 +5223,8 @@ void ice_unload(struct ice_pf *pf)
 
 	devl_assert_locked(priv_to_devlink(pf));
 
-	ice_deinit_features(pf);
 	ice_deinit_rdma(pf);
+	ice_deinit_features(pf);
 	ice_tc_indir_block_unregister(vsi);
 	ice_unregister_netdev(vsi);
 	ice_devlink_destroy_pf_port(pf);
-- 
2.47.1



* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-03 21:09 ` [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter Tony Nguyen
@ 2025-02-03 21:48   ` David Laight
  2025-02-04  6:06     ` Michal Swiatkowski
  2025-02-04 22:35   ` Jakub Kicinski
  1 sibling, 1 reply; 19+ messages in thread
From: David Laight @ 2025-02-03 21:48 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Michal Swiatkowski, sridhar.samudrala, jacob.e.keller,
	pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Mon,  3 Feb 2025 13:09:31 -0800
Tony Nguyen <anthony.l.nguyen@intel.com> wrote:

> From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> 
> Use generic devlink PF MSI-X parameter to allow user to change MSI-X
> range.
> 
> Add notes about these parameters to the ice devlink documentation.
> 
> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> ---
>  Documentation/networking/devlink/ice.rst      | 11 +++
>  .../net/ethernet/intel/ice/devlink/devlink.c  | 88 +++++++++++++++++++
>  drivers/net/ethernet/intel/ice/ice.h          |  7 ++
>  drivers/net/ethernet/intel/ice/ice_irq.c      |  7 ++
>  4 files changed, 113 insertions(+)
> 
> diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
> index e3972d03cea0..792e9f8c846a 100644
> --- a/Documentation/networking/devlink/ice.rst
> +++ b/Documentation/networking/devlink/ice.rst
> @@ -69,6 +69,17 @@ Parameters
>  
>         To verify that value has been set:
>         $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
> +   * - ``msix_vec_per_pf_max``
> +     - driverinit
> +     - Set the max MSI-X that can be used by the PF, rest can be utilized for
> +       SRIOV. The range is from min value set in msix_vec_per_pf_min to
> +       2k/number of ports.
> +   * - ``msix_vec_per_pf_min``
> +     - driverinit
> +     - Set the min MSI-X that will be used by the PF. This value inform how many
> +       MSI-X will be allocated statically. The range is from 2 to value set
> +       in msix_vec_per_pf_max.
> +
>  .. list-table:: Driver specific parameters implemented
>      :widths: 5 5 90
>  
> diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> index d116e2b10bce..c53baecf8a90 100644
> --- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
> +++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> @@ -1202,6 +1202,25 @@ static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
>  	return status;
>  }
>  
> +static void ice_set_min_max_msix(struct ice_pf *pf)
> +{
> +	struct devlink *devlink = priv_to_devlink(pf);
> +	union devlink_param_value val;
> +	int err;
> +
> +	err = devl_param_driverinit_value_get(devlink,
> +					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
> +					      &val);
> +	if (!err)
> +		pf->msix.min = val.vu32;
> +
> +	err = devl_param_driverinit_value_get(devlink,
> +					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
> +					      &val);
> +	if (!err)
> +		pf->msix.max = val.vu32;
> +}
> +
>  /**
>   * ice_devlink_reinit_up - do reinit of the given PF
>   * @pf: pointer to the PF struct
> @@ -1217,6 +1236,9 @@ static int ice_devlink_reinit_up(struct ice_pf *pf)
>  		return err;
>  	}
>  
> +	/* load MSI-X values */
> +	ice_set_min_max_msix(pf);
> +
>  	err = ice_init_dev(pf);
>  	if (err)
>  		goto unroll_hw_init;
> @@ -1530,6 +1552,37 @@ static int ice_devlink_local_fwd_validate(struct devlink *devlink, u32 id,
>  	return 0;
>  }
>  
> +static int
> +ice_devlink_msix_max_pf_validate(struct devlink *devlink, u32 id,
> +				 union devlink_param_value val,
> +				 struct netlink_ext_ack *extack)
> +{
> +	struct ice_pf *pf = devlink_priv(devlink);
> +
> +	if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
> +	    val.vu32 < pf->msix.min) {
> +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
> +				 union devlink_param_value val,
> +				 struct netlink_ext_ack *extack)
> +{
> +	struct ice_pf *pf = devlink_priv(devlink);
> +
> +	if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
> +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}

Don't those checks make it difficult to set the min and max together?
I think you need to create the new min/max pair and check they are
valid together.
Which probably requires one parameter with two values.

	David
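
To make the ordering concern concrete, here is a minimal userspace sketch,
not driver code. It assumes each validator compares the new value only
against the limits that are currently loaded, as the quoted patch does with
pf->msix.min and pf->msix.max, and that the stored driverinit values become
live only on reload. The struct, the function names and the values 2 and
1024 are hypothetical stand-ins for ICE_MIN_MSIX and num_msix_vectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ICE_MIN_MSIX and num_msix_vectors. */
#define SKETCH_MIN_MSIX	2u
#define SKETCH_HW_MSIX	1024u

struct msix_limits {
	uint32_t min;
	uint32_t max;
};

/* Mirrors the max validator: compared against the loaded min only. */
static bool validate_max(const struct msix_limits *loaded, uint32_t val)
{
	return val <= SKETCH_HW_MSIX && val >= loaded->min;
}

/* Mirrors the min validator: compared against the loaded max only. */
static bool validate_min(const struct msix_limits *loaded, uint32_t val)
{
	return val >= SKETCH_MIN_MSIX && val <= loaded->max;
}

/* Models a reload: the stored values become the loaded limits unchecked. */
static void reload(struct msix_limits *loaded, uint32_t new_min,
		   uint32_t new_max)
{
	assert(loaded != NULL);
	loaded->min = new_min;
	loaded->max = new_max;
}
```

With loaded limits {2, 32}, a request for min = 40 is rejected until a
larger max has been loaded, while max = 10 and min = 20 each pass on their
own and leave min > max after the reload.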

> +
>  enum ice_param_id {
>  	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
>  	ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
> @@ -1547,6 +1600,15 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
>  			      ice_devlink_enable_iw_validate),
>  };
>  
> +static const struct devlink_param ice_dvl_msix_params[] = {
> +	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MAX,
> +			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
> +			      NULL, NULL, ice_devlink_msix_max_pf_validate),
> +	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MIN,
> +			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
> +			      NULL, NULL, ice_devlink_msix_min_pf_validate),
> +};
> +
>  static const struct devlink_param ice_dvl_sched_params[] = {
>  	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
>  			     "tx_scheduling_layers",
> @@ -1648,6 +1710,7 @@ void ice_devlink_unregister(struct ice_pf *pf)
>  int ice_devlink_register_params(struct ice_pf *pf)
>  {
>  	struct devlink *devlink = priv_to_devlink(pf);
> +	union devlink_param_value value;
>  	struct ice_hw *hw = &pf->hw;
>  	int status;
>  
> @@ -1656,10 +1719,33 @@ int ice_devlink_register_params(struct ice_pf *pf)
>  	if (status)
>  		return status;
>  
> +	status = devl_params_register(devlink, ice_dvl_msix_params,
> +				      ARRAY_SIZE(ice_dvl_msix_params));
> +	if (status)
> +		goto unregister_rdma_params;
> +
>  	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
>  		status = devl_params_register(devlink, ice_dvl_sched_params,
>  					      ARRAY_SIZE(ice_dvl_sched_params));
> +	if (status)
> +		goto unregister_msix_params;
> +
> +	value.vu32 = pf->msix.max;
> +	devl_param_driverinit_value_set(devlink,
> +					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
> +					value);
> +	value.vu32 = pf->msix.min;
> +	devl_param_driverinit_value_set(devlink,
> +					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
> +					value);
> +	return 0;
>  
> +unregister_msix_params:
> +	devl_params_unregister(devlink, ice_dvl_msix_params,
> +			       ARRAY_SIZE(ice_dvl_msix_params));
> +unregister_rdma_params:
> +	devl_params_unregister(devlink, ice_dvl_rdma_params,
> +			       ARRAY_SIZE(ice_dvl_rdma_params));
>  	return status;
>  }
>  
> @@ -1670,6 +1756,8 @@ void ice_devlink_unregister_params(struct ice_pf *pf)
>  
>  	devl_params_unregister(devlink, ice_dvl_rdma_params,
>  			       ARRAY_SIZE(ice_dvl_rdma_params));
> +	devl_params_unregister(devlink, ice_dvl_msix_params,
> +			       ARRAY_SIZE(ice_dvl_msix_params));
>  
>  	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
>  		devl_params_unregister(devlink, ice_dvl_sched_params,
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index 71e05d30f0fd..d041b04ff324 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -542,6 +542,12 @@ struct ice_agg_node {
>  	u8 valid;
>  };
>  
> +struct ice_pf_msix {
> +	u32 cur;
> +	u32 min;
> +	u32 max;
> +};
> +
>  struct ice_pf {
>  	struct pci_dev *pdev;
>  	struct ice_adapter *adapter;
> @@ -612,6 +618,7 @@ struct ice_pf {
>  	struct msi_map ll_ts_irq;	/* LL_TS interrupt MSIX vector */
>  	u16 max_pf_txqs;	/* Total Tx queues PF wide */
>  	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
> +	struct ice_pf_msix msix;
>  	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
>  	u16 num_lan_tx;		/* num LAN Tx queues setup */
>  	u16 num_lan_rx;		/* num LAN Rx queues setup */
> diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
> index ad82ff7d1995..0659b96b9b8c 100644
> --- a/drivers/net/ethernet/intel/ice/ice_irq.c
> +++ b/drivers/net/ethernet/intel/ice/ice_irq.c
> @@ -254,6 +254,13 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
>  	int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
>  	int vectors, max_vectors;
>  
> +	/* load default PF MSI-X range */
> +	if (!pf->msix.min)
> +		pf->msix.min = ICE_MIN_MSIX;
> +
> +	if (!pf->msix.max)
> +		pf->msix.max = total_vectors / 2;
> +
>  	vectors = ice_ena_msix_range(pf);
>  
>  	if (vectors < 0)



* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-03 21:48   ` David Laight
@ 2025-02-04  6:06     ` Michal Swiatkowski
  2025-02-04 18:41       ` David Laight
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Swiatkowski @ 2025-02-04  6:06 UTC (permalink / raw)
  To: David Laight
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	sridhar.samudrala, jacob.e.keller, pio.raczynski, konrad.knitter,
	marcin.szycik, nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri,
	horms, David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Mon, Feb 03, 2025 at 09:48:08PM +0000, David Laight wrote:
> On Mon,  3 Feb 2025 13:09:31 -0800
> Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
> 
> > From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > 
> > Use generic devlink PF MSI-X parameter to allow user to change MSI-X
> > range.
> > 
> > Add notes about these parameters to the ice devlink documentation.
> > 
> > Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
> > Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > ---
> >  Documentation/networking/devlink/ice.rst      | 11 +++
> >  .../net/ethernet/intel/ice/devlink/devlink.c  | 88 +++++++++++++++++++
> >  drivers/net/ethernet/intel/ice/ice.h          |  7 ++
> >  drivers/net/ethernet/intel/ice/ice_irq.c      |  7 ++
> >  4 files changed, 113 insertions(+)
> > 
> > diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
> > index e3972d03cea0..792e9f8c846a 100644
> > --- a/Documentation/networking/devlink/ice.rst
> > +++ b/Documentation/networking/devlink/ice.rst
> > @@ -69,6 +69,17 @@ Parameters
> >  
> >         To verify that value has been set:
> >         $ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
> > +   * - ``msix_vec_per_pf_max``
> > +     - driverinit
> > +     - Set the max MSI-X that can be used by the PF, rest can be utilized for
> > +       SRIOV. The range is from min value set in msix_vec_per_pf_min to
> > +       2k/number of ports.
> > +   * - ``msix_vec_per_pf_min``
> > +     - driverinit
> > +     - Set the min MSI-X that will be used by the PF. This value inform how many
> > +       MSI-X will be allocated statically. The range is from 2 to value set
> > +       in msix_vec_per_pf_max.
> > +
> >  .. list-table:: Driver specific parameters implemented
> >      :widths: 5 5 90
> >  
> > diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> > index d116e2b10bce..c53baecf8a90 100644
> > --- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
> > +++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
> > @@ -1202,6 +1202,25 @@ static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
> >  	return status;
> >  }
> >  
> > +static void ice_set_min_max_msix(struct ice_pf *pf)
> > +{
> > +	struct devlink *devlink = priv_to_devlink(pf);
> > +	union devlink_param_value val;
> > +	int err;
> > +
> > +	err = devl_param_driverinit_value_get(devlink,
> > +					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
> > +					      &val);
> > +	if (!err)
> > +		pf->msix.min = val.vu32;
> > +
> > +	err = devl_param_driverinit_value_get(devlink,
> > +					      DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
> > +					      &val);
> > +	if (!err)
> > +		pf->msix.max = val.vu32;
> > +}
> > +
> >  /**
> >   * ice_devlink_reinit_up - do reinit of the given PF
> >   * @pf: pointer to the PF struct
> > @@ -1217,6 +1236,9 @@ static int ice_devlink_reinit_up(struct ice_pf *pf)
> >  		return err;
> >  	}
> >  
> > +	/* load MSI-X values */
> > +	ice_set_min_max_msix(pf);
> > +
> >  	err = ice_init_dev(pf);
> >  	if (err)
> >  		goto unroll_hw_init;
> > @@ -1530,6 +1552,37 @@ static int ice_devlink_local_fwd_validate(struct devlink *devlink, u32 id,
> >  	return 0;
> >  }
> >  
> > +static int
> > +ice_devlink_msix_max_pf_validate(struct devlink *devlink, u32 id,
> > +				 union devlink_param_value val,
> > +				 struct netlink_ext_ack *extack)
> > +{
> > +	struct ice_pf *pf = devlink_priv(devlink);
> > +
> > +	if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
> > +	    val.vu32 < pf->msix.min) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int
> > +ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
> > +				 union devlink_param_value val,
> > +				 struct netlink_ext_ack *extack)
> > +{
> > +	struct ice_pf *pf = devlink_priv(devlink);
> > +
> > +	if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> > +		return -EINVAL;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> Don't those checks make it difficult to set the min and max together?
> I think you need to create the new min/max pair and check they are
> valid together.
> Which probably requires one parameter with two values.
> 

I wanted to reuse the existing parameter. The other user of it is the bnxt
driver. In it there is a separate check for min "max" and max "max".
It is also problematic, because min can be set to a value greater than
max (here it can happen when setting both together to specific values).
I can do a follow-up to this series and change this parameter as you
suggested. What do you think?

Thanks,
Michal

> 	David
> 
> > +
> >  enum ice_param_id {
> >  	ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
> >  	ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
> > @@ -1547,6 +1600,15 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
> >  			      ice_devlink_enable_iw_validate),
> >  };
> >  
> > +static const struct devlink_param ice_dvl_msix_params[] = {
> > +	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MAX,
> > +			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
> > +			      NULL, NULL, ice_devlink_msix_max_pf_validate),
> > +	DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MIN,
> > +			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
> > +			      NULL, NULL, ice_devlink_msix_min_pf_validate),
> > +};
> > +
> >  static const struct devlink_param ice_dvl_sched_params[] = {
> >  	DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
> >  			     "tx_scheduling_layers",
> > @@ -1648,6 +1710,7 @@ void ice_devlink_unregister(struct ice_pf *pf)
> >  int ice_devlink_register_params(struct ice_pf *pf)
> >  {
> >  	struct devlink *devlink = priv_to_devlink(pf);
> > +	union devlink_param_value value;
> >  	struct ice_hw *hw = &pf->hw;
> >  	int status;
> >  
> > @@ -1656,10 +1719,33 @@ int ice_devlink_register_params(struct ice_pf *pf)
> >  	if (status)
> >  		return status;
> >  
> > +	status = devl_params_register(devlink, ice_dvl_msix_params,
> > +				      ARRAY_SIZE(ice_dvl_msix_params));
> > +	if (status)
> > +		goto unregister_rdma_params;
> > +
> >  	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
> >  		status = devl_params_register(devlink, ice_dvl_sched_params,
> >  					      ARRAY_SIZE(ice_dvl_sched_params));
> > +	if (status)
> > +		goto unregister_msix_params;
> > +
> > +	value.vu32 = pf->msix.max;
> > +	devl_param_driverinit_value_set(devlink,
> > +					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
> > +					value);
> > +	value.vu32 = pf->msix.min;
> > +	devl_param_driverinit_value_set(devlink,
> > +					DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
> > +					value);
> > +	return 0;
> >  
> > +unregister_msix_params:
> > +	devl_params_unregister(devlink, ice_dvl_msix_params,
> > +			       ARRAY_SIZE(ice_dvl_msix_params));
> > +unregister_rdma_params:
> > +	devl_params_unregister(devlink, ice_dvl_rdma_params,
> > +			       ARRAY_SIZE(ice_dvl_rdma_params));
> >  	return status;
> >  }
> >  
> > @@ -1670,6 +1756,8 @@ void ice_devlink_unregister_params(struct ice_pf *pf)
> >  
> >  	devl_params_unregister(devlink, ice_dvl_rdma_params,
> >  			       ARRAY_SIZE(ice_dvl_rdma_params));
> > +	devl_params_unregister(devlink, ice_dvl_msix_params,
> > +			       ARRAY_SIZE(ice_dvl_msix_params));
> >  
> >  	if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
> >  		devl_params_unregister(devlink, ice_dvl_sched_params,
> > diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> > index 71e05d30f0fd..d041b04ff324 100644
> > --- a/drivers/net/ethernet/intel/ice/ice.h
> > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > @@ -542,6 +542,12 @@ struct ice_agg_node {
> >  	u8 valid;
> >  };
> >  
> > +struct ice_pf_msix {
> > +	u32 cur;
> > +	u32 min;
> > +	u32 max;
> > +};
> > +
> >  struct ice_pf {
> >  	struct pci_dev *pdev;
> >  	struct ice_adapter *adapter;
> > @@ -612,6 +618,7 @@ struct ice_pf {
> >  	struct msi_map ll_ts_irq;	/* LL_TS interrupt MSIX vector */
> >  	u16 max_pf_txqs;	/* Total Tx queues PF wide */
> >  	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
> > +	struct ice_pf_msix msix;
> >  	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
> >  	u16 num_lan_tx;		/* num LAN Tx queues setup */
> >  	u16 num_lan_rx;		/* num LAN Rx queues setup */
> > diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
> > index ad82ff7d1995..0659b96b9b8c 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_irq.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_irq.c
> > @@ -254,6 +254,13 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
> >  	int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
> >  	int vectors, max_vectors;
> >  
> > +	/* load default PF MSI-X range */
> > +	if (!pf->msix.min)
> > +		pf->msix.min = ICE_MIN_MSIX;
> > +
> > +	if (!pf->msix.max)
> > +		pf->msix.max = total_vectors / 2;
> > +
> >  	vectors = ice_ena_msix_range(pf);
> >  
> >  	if (vectors < 0)
> 


* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-04  6:06     ` Michal Swiatkowski
@ 2025-02-04 18:41       ` David Laight
  2025-02-05  7:40         ` Michal Swiatkowski
  0 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2025-02-04 18:41 UTC (permalink / raw)
  To: Michal Swiatkowski
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	sridhar.samudrala, jacob.e.keller, pio.raczynski, konrad.knitter,
	marcin.szycik, nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri,
	horms, David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Tue, 4 Feb 2025 07:06:00 +0100
Michal Swiatkowski <michal.swiatkowski@linux.intel.com> wrote:

> On Mon, Feb 03, 2025 at 09:48:08PM +0000, David Laight wrote:
> > On Mon,  3 Feb 2025 13:09:31 -0800
> > Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
> >   
> > > From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > > 
> > > Use generic devlink PF MSI-X parameter to allow user to change MSI-X
> > > range.
> > > 
> > > Add notes about these parameters to the ice devlink documentation.
....
> > Don't those checks make it difficult to set the min and max together?
> > I think you need to create the new min/max pair and check they are
> > valid together.
> > Which probably requires one parameter with two values.
> >   
> 
> I wanted to reuse the existing parameter. The other user of it is the bnxt
> driver. In it there is a separate check for min "max" and max "max".
> It is also problematic, because min can be set to a value greater than
> max (here it can happen when setting both together to specific values).
> I can do a follow-up to this series and change this parameter as you
> suggested. What do you think?

Changing the way a parameter is used will break API compatibility.
Perhaps you can get the generic parameter validation function to
update a 'pending' copy, and then do the final min < max check after
all the parameters have been processed before actually updating
the live limits.
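
A minimal userspace sketch of the 'pending copy' idea, under stated
assumptions: the per-parameter step only records the request, and a single
commit step checks the pair before touching the live limits. All names
here (msix_cfg, stage_min, stage_max, commit_pending) are hypothetical and
are not part of devlink or ice. The sketch accepts min == max, which may
or may not be what the driver wants.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct msix_range {
	uint32_t min;
	uint32_t max;
};

struct msix_cfg {
	struct msix_range live;		/* limits currently in use */
	struct msix_range pending;	/* values staged by the setters */
};

/* Per-parameter step: record the request, no cross-check yet. */
static void stage_min(struct msix_cfg *cfg, uint32_t val)
{
	cfg->pending.min = val;
}

static void stage_max(struct msix_cfg *cfg, uint32_t val)
{
	cfg->pending.max = val;
}

/* Final step, run once after all parameters have been processed. */
static int commit_pending(struct msix_cfg *cfg, uint32_t floor,
			  uint32_t hw_total)
{
	assert(floor <= hw_total);

	if (cfg->pending.min < floor || cfg->pending.max > hw_total ||
	    cfg->pending.min > cfg->pending.max) {
		/* Drop the rejected request, keep the live limits. */
		cfg->pending = cfg->live;
		return -EINVAL;
	}

	cfg->live = cfg->pending;
	return 0;
}
```

Staging min = 40 and max = 64 in either order and then committing succeeds,
while a later max = 10 alone is rejected at commit time and leaves the live
limits unchanged.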

The other option is just not to check whether min < max and just
document which takes precedence (and not use clamp()).

It may even be worth saving the 'live limits' as 'hi << 16 | lo' so
that they can be accessed atomically (with READ/WRITE_ONCE) to avoid
anything looking at the limits getting confused.
(Although maybe that doesn't matter here?)
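
The packed form could look roughly like the userspace sketch below. It
uses C11 relaxed atomics as a stand-in for READ_ONCE/WRITE_ONCE and assumes
both limits fit in 16 bits; the variable and function names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Live limits stored as 'hi << 16 | lo' in a single 32-bit word. */
static _Atomic uint32_t live_limits;

static uint32_t pack_limits(uint32_t lo, uint32_t hi)
{
	/* Assumption: each limit fits in 16 bits. */
	assert(lo <= UINT16_MAX && hi <= UINT16_MAX);
	return (hi << 16) | lo;
}

/* Writer: both limits change in one store. */
static void publish_limits(uint32_t lo, uint32_t hi)
{
	atomic_store_explicit(&live_limits, pack_limits(lo, hi),
			      memory_order_relaxed);
}

/* Reader: one load yields a pair that was published together. */
static void read_limits(uint32_t *lo, uint32_t *hi)
{
	uint32_t v = atomic_load_explicit(&live_limits,
					  memory_order_relaxed);

	*lo = v & 0xffff;
	*hi = v >> 16;
}
```

Because both halves travel in one word, a reader cannot pair a new min
with an old max.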

	David

> 
> Thanks,
> Michal


* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-03 21:09 ` [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter Tony Nguyen
  2025-02-03 21:48   ` David Laight
@ 2025-02-04 22:35   ` Jakub Kicinski
  2025-02-05  5:46     ` Michal Swiatkowski
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-04 22:35 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, pabeni, edumazet, andrew+netdev, netdev,
	Michal Swiatkowski, sridhar.samudrala, jacob.e.keller,
	pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Mon,  3 Feb 2025 13:09:31 -0800 Tony Nguyen wrote:
> +	if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
> +	    val.vu32 < pf->msix.min) {
> +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> +		return -EINVAL;

> +	if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
> +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> +		return -EINVAL;

Please follow up and either remove these extack messages, or make them
more meaningful. The "value is invalid" is already expressed by EINVAL
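
As an illustration of what "more meaningful" could look like, here is a
small user-space sketch that reports the violated range. The helper is
hypothetical and not part of ice; in the driver the text would go through
an extack macro rather than snprintf().

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: check a requested value against its bounds and,
 * on failure, say which bounds were violated instead of a bare
 * "Value is invalid".
 */
static int msix_check_range(const char *name, unsigned int val,
			    unsigned int lo, unsigned int hi,
			    char *msg, size_t msg_len)
{
	if (val < lo || val > hi) {
		snprintf(msg, msg_len, "%s %u is outside [%u, %u]",
			 name, val, lo, hi);
		return -EINVAL;
	}
	return 0;
}
```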

The suggestion to set the values at once or as "pending" is a
distraction IMO.


* Re: [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver
  2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
                   ` (8 preceding siblings ...)
  2025-02-03 21:09 ` [PATCH net-next 9/9] ice: init flow director before RDMA Tony Nguyen
@ 2025-02-04 22:42 ` Jakub Kicinski
  2025-02-04 23:07   ` Tony Nguyen
  2025-02-05  5:46   ` Michal Swiatkowski
  9 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-02-04 22:42 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, pabeni, edumazet, andrew+netdev, netdev,
	michal.swiatkowski, sridhar.samudrala, jacob.e.keller,
	pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

On Mon,  3 Feb 2025 13:09:29 -0800 Tony Nguyen wrote:
> Now changing queues using ethtool is also changing MSI-X. If there is
> enough MSI-X it is always one to one. When there is not enough there
> will be more queues than MSI-X. There is a lack of ability to set how
> many queues should be used per MSI-X. Maybe we should introduce another
> ethtool param for it? Sth like queues_per_vector?

It's probably okay for today. AFAIU ethtool channels basically
correspond to IRQs. As the queue API matures we'll have
the ability to allocate more queues for "channel" == IRQ / event
queue.

> The following are changes since commit c2933b2befe25309f4c5cfbea0ca80909735fd76:
>   Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE

Tony, the patches in your tree are missing your SoB, and I suspect 
you may need the same PR to get pulled into RDMA, so I'm not applying
from the list... Please respin.
-- 
pw-bot: cr


* Re: [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver
  2025-02-04 22:42 ` [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Jakub Kicinski
@ 2025-02-04 23:07   ` Tony Nguyen
  2025-02-05  5:46   ` Michal Swiatkowski
  1 sibling, 0 replies; 19+ messages in thread
From: Tony Nguyen @ 2025-02-04 23:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, netdev,
	michal.swiatkowski, sridhar.samudrala, jacob.e.keller,
	pio.raczynski, konrad.knitter, marcin.szycik,
	nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
	David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma



On 2/4/2025 2:42 PM, Jakub Kicinski wrote:
> Tony, the patches in your tree are missing your SoB, and I suspect
> you may need the same PR to get pulled into RDMA, so I'm not applying
> from the list... Please respin.

Oops. I had to do this a little differently than normal and, obviously, 
missed that step :( Will add it in and work with Michal to make the 
other changes.

Thanks,
Tony



* Re: [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver
  2025-02-04 22:42 ` [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Jakub Kicinski
  2025-02-04 23:07   ` Tony Nguyen
@ 2025-02-05  5:46   ` Michal Swiatkowski
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Swiatkowski @ 2025-02-05  5:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
	sridhar.samudrala, jacob.e.keller, pio.raczynski, konrad.knitter,
	marcin.szycik, nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri,
	horms, David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma

On Tue, Feb 04, 2025 at 02:42:52PM -0800, Jakub Kicinski wrote:
> On Mon,  3 Feb 2025 13:09:29 -0800 Tony Nguyen wrote:
> > Now changing queues using ethtool is also changing MSI-X. If there is
> > enough MSI-X it is always one to one. When there is not enough there
> > will be more queues than MSI-X. There is a lack of ability to set how
> > many queues should be used per MSI-X. Maybe we should introduce another
> > ethtool param for it? Sth like queues_per_vector?
> 
> It's probably okay for today. AFAIU ethtool channels basically
> correspond to IRQs. As the queue API matures we'll have
> the ability to allocate more queues for "channel" == IRQ / event
> queue.
> 

OK, thanks for the explanation.

> > The following are changes since commit c2933b2befe25309f4c5cfbea0ca80909735fd76:
> >   Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> > and are available in the git repository at:
> >   git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue 100GbE
> 
> Tony, the patches in your tree are missing your SoB, and I suspect 
> you may need the same PR to get pulled into RDMA, so I'm not applying
> from the list... Please respin.
> -- 
> pw-bot: cr


* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-04 22:35   ` Jakub Kicinski
@ 2025-02-05  5:46     ` Michal Swiatkowski
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Swiatkowski @ 2025-02-05  5:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Tony Nguyen, davem, pabeni, edumazet, andrew+netdev, netdev,
	sridhar.samudrala, jacob.e.keller, pio.raczynski, konrad.knitter,
	marcin.szycik, nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri,
	horms, David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Tue, Feb 04, 2025 at 02:35:17PM -0800, Jakub Kicinski wrote:
> On Mon,  3 Feb 2025 13:09:31 -0800 Tony Nguyen wrote:
> > +	if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
> > +	    val.vu32 < pf->msix.min) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> > +		return -EINVAL;
> 
> > +	if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
> > +		NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
> > +		return -EINVAL;
> 
> Please follow up and either remove these extack messages, or make them
> more meaningful. The "value is invalid" is already expressed by EINVAL

Will be removed.

> 
> The suggestion to set the values at once or as "pending" is a
> distraction IMO.


* Re: [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter
  2025-02-04 18:41       ` David Laight
@ 2025-02-05  7:40         ` Michal Swiatkowski
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Swiatkowski @ 2025-02-05  7:40 UTC (permalink / raw)
  To: David Laight
  Cc: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	sridhar.samudrala, jacob.e.keller, pio.raczynski, konrad.knitter,
	marcin.szycik, nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri,
	horms, David.Laight, pmenzel, mschmidt, tatyana.e.nikolova,
	Jason Gunthorpe, Leon Romanovsky, linux-rdma, corbet, linux-doc

On Tue, Feb 04, 2025 at 06:41:21PM +0000, David Laight wrote:
> On Tue, 4 Feb 2025 07:06:00 +0100
> Michal Swiatkowski <michal.swiatkowski@linux.intel.com> wrote:
> 
> > On Mon, Feb 03, 2025 at 09:48:08PM +0000, David Laight wrote:
> > > On Mon,  3 Feb 2025 13:09:31 -0800
> > > Tony Nguyen <anthony.l.nguyen@intel.com> wrote:
> > >   
> > > > From: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> > > > 
> > > > Use generic devlink PF MSI-X parameter to allow user to change MSI-X
> > > > range.
> > > > 
> > > > Add notes about these parameters to the ice devlink documentation.
> ....
> > > Don't those checks make it difficult to set the min and max together?
> > > I think you need to create the new min/max pair and check they are
> > > valid together.
> > > Which probably requires one parameter with two values.
> > >   
> > 
> > I wanted to reuse the existing parameter. The other user of it is the bnxt
> > driver. In it there is a separate check for min "max" and max "max".
> > It is also problematic, because min can be set to a value greater than
> > max (here it can happen when setting both together to specific values).
> > I can do a follow-up to this series and change this parameter as you
> > suggested. What do you think?
> 
> Changing the way a parameter is used will break API compatibility.
> Perhaps you can get the generic parameter validation function to
> update a 'pending' copy, and then do the final min < max check after
> all the parameters have been processed before actually updating
> the live limits.
> 
> The other option is just not to check whether min < max and just
> document which takes precedence (and not use clamp()).
> 
> It may even be worth saving the 'live limits' as 'hi << 16 | lo' so
> that they can be accessed atomically (with READ/WRITE_ONCE) to avoid
> anything looking at the limits getting confused.
> (Although maybe that doesn't matter here?)
> 
> 	David

Right, I thought it was better to have additional validation for the
min > max case, but it looks like it causes more problems. I can drop it
to align with the bnxt solution.

Thanks,
Michal

> 
> > 
> > Thanks,
> > Michal


Thread overview: 19+ messages
2025-02-03 21:09 [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 1/9] ice: count combined queues using Rx/Tx count Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 2/9] ice: devlink PF MSI-X max and min parameter Tony Nguyen
2025-02-03 21:48   ` David Laight
2025-02-04  6:06     ` Michal Swiatkowski
2025-02-04 18:41       ` David Laight
2025-02-05  7:40         ` Michal Swiatkowski
2025-02-04 22:35   ` Jakub Kicinski
2025-02-05  5:46     ` Michal Swiatkowski
2025-02-03 21:09 ` [PATCH net-next 3/9] ice: remove splitting MSI-X between features Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 4/9] ice: get rid of num_lan_msix field Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 5/9] ice, irdma: move interrupts code to irdma Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 6/9] ice: treat dyn_allowed only as suggestion Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 7/9] ice: enable_rdma devlink param Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 8/9] ice: simplify VF MSI-X managing Tony Nguyen
2025-02-03 21:09 ` [PATCH net-next 9/9] ice: init flow director before RDMA Tony Nguyen
2025-02-04 22:42 ` [PATCH net-next 0/9][pull request] ice: managing MSI-X in driver Jakub Kicinski
2025-02-04 23:07   ` Tony Nguyen
2025-02-05  5:46   ` Michal Swiatkowski
