* [iwl-next v9 0/9] ice: managing MSI-X in driver
@ 2024-12-03 6:58 Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 1/9] ice: count combined queues using Rx/Tx count Michal Swiatkowski
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Hi,
This is another attempt to let the user manage the number of MSI-X vectors
used by each feature in ice. The first attempt went through the devlink
resources API, but it wasn't accepted upstream; static MSI-X allocation via
devlink resources also isn't really user friendly.
This attempt takes a more dynamic approach: "dynamic" across the whole
kernel when the platform supports it, and "dynamic" within the driver when
it doesn't.
To achieve that, reuse the generic devlink parameters pf_msix_max and
pf_msix_min. This fits how ice hardware counts MSI-X: the amount of MSI-X
reported on PCI is the whole MSI-X budget for the card (including MSI-X for
VFs). pf_msix_max lets the user statically set how many MSI-X vectors are
wanted on the PF and, implicitly, how many should be reserved for VFs.
pf_msix_min sets the minimum number of MSI-X vectors with which the ice
driver can still probe correctly.
Meaning of these fields for dynamic vs static allocation (see the example
commands below):
- on a system with dynamic MSI-X allocation support
  * allocate pf_msix_min as static, the rest will be allocated dynamically
- on a system without dynamic MSI-X allocation support
  * try to allocate pf_msix_max as static; the minimum acceptable result
    is pf_msix_min
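
For example, using the parameter names registered by the driver
(msix_vec_per_pf_min/max) and an illustrative PCI address, the range could
be set like this:

  $ devlink dev param set pci/0000:16:00.0 name msix_vec_per_pf_min \
        value 2 cmode driverinit
  $ devlink dev param set pci/0000:16:00.0 name msix_vec_per_pf_max \
        value 128 cmode driverinit
  $ devlink dev reload pci/0000:16:00.0 action driver_reinit

Both parameters are driverinit, so they take effect on devlink reload.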
As Jesse and Piotr suggested, pf_msix_max and pf_msix_min can (and probably
should) be stored in NVM. This patchset doesn't implement that.
The dynamic (kernel or driver) approach means that splitting MSI-X between
RDMA and eth on MSI-X shortage is no longer correct. It can work when the
dynamic part is only on the driver side, but not when it is on the kernel
side.
Remove this code and move to allocating MSI-X feature by feature. If there
are no more MSI-X vectors left for a feature, the feature runs with fewer
MSI-X or is turned off.
There is a regression here. With MSI-X splitting, the user could run RDMA
and eth even on a system without enough MSI-X. Now only eth will work.
RDMA can still be enabled by lowering the number of PF queues and
reprobing the RDMA driver.
Example: 72 CPUs; eth, RDMA and flow director (1 MSI-X) in use; 1 MSI-X for
the OICR on the PF, and 1 more for RDMA. The card wants
1 + 72 + 1 + 72 + 1 = 147 vectors. We set pf_msix_min = 2,
pf_msix_max = 128:
OICR: 1
eth: 72
flow director: 1
RDMA: 128 - 74 = 54
We can change the number of queues on the PF to 36 and do a devlink reinit
(example commands follow below):
OICR: 1
eth: 36
RDMA: 73
flow director: 1
We can also turn RDMA off (implemented in "ice: enable_rdma devlink
param"):
OICR: 1
eth: 72
RDMA: 0 (turned off)
flow director: 1
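
The queue-lowering step above maps to commands along these lines (the
interface and PCI names are illustrative):

  $ ethtool -L eth0 combined 36
  $ devlink dev reload pci/0000:16:00.0 action driver_reinit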
After these changes we have a static base vector for SR-IOV (and probably
SIOV in the future). The last patch of this series simplifies the VF MSI-X
management code based on that static vector.
Changing the number of queues via ethtool now also changes the number of
MSI-X vectors. If there are enough MSI-X, the mapping stays one to one;
when there aren't, there will be more queues than MSI-X. There is currently
no way to set how many queues should be used per MSI-X vector. Maybe we
should introduce another ethtool parameter for it, something like
queues_per_vector?
v8 --> v9: [8]
* add tested-by tags
* v8 was sent incorrectly, fix it here
v7 --> v8: [7]
* fix unrolling in devlink parameters register function (patch 2)
v6 --> v7: [6]
* use vu32 for devlink MSI-X parameters instead of u16 (patch 2)
* < instead of <= for MSI-X min parameter validation (patch 2)
* use u32 for MSI-X values (patch 2, 8)
v5 --> v6: [5]
* set default MSI-X max value based on needs instead of const define
(patch 3)
v4 --> v5: [4]
* count combined queues in ethtool for the case where vectors aren't
mapped 1:1 to queues (patch 1)
* change min_t to min where the casting isn't needed (and can hide
problems) (patch 4)
* load msix_max and msix_min values after devlink reload; this was
accidentally dropped when loading in the probe path was removed to
mitigate an error from devl_param_driverinit...() (patch 2)
* add documentation in devlink/ice for the new parameters (patch 2)
v3 --> v4: [3]
* drop unnecessary text in devlink validation comments
* assume that devl_param_driverinit...() shouldn't return an error in the
normal execution path
v2 --> v3: [2]
* move flow director init before RDMA init
* fix unrolling RDMA MSI-X allocation
* add comment in commit message about lowering control RDMA MSI-X
amount
v1 --> v2: [1]
* change permanent MSI-X cmode parameters to driverinit
* remove locking during devlink parameter registration (it is now
locked for whole init/deinit part)
[8] https://lore.kernel.org/netdev/20241114122009.97416-1-michal.swiatkowski@linux.intel.com/
[7] https://lore.kernel.org/netdev/20241104121337.129287-1-michal.swiatkowski@linux.intel.com/
[6] https://lore.kernel.org/netdev/20241028100341.16631-1-michal.swiatkowski@linux.intel.com/
[5] https://lore.kernel.org/netdev/20241024121230.5861-1-michal.swiatkowski@linux.intel.com/T/#t
[4] https://lore.kernel.org/netdev/20240930120402.3468-1-michal.swiatkowski@linux.intel.com/
[3] https://lore.kernel.org/netdev/20240808072016.10321-1-michal.swiatkowski@linux.intel.com/
[2] https://lore.kernel.org/netdev/20240801093115.8553-1-michal.swiatkowski@linux.intel.com/
[1] https://lore.kernel.org/netdev/20240213073509.77622-1-michal.swiatkowski@linux.intel.com/
Michal Swiatkowski (9):
ice: count combined queues using Rx/Tx count
ice: devlink PF MSI-X max and min parameter
ice: remove splitting MSI-X between features
ice: get rid of num_lan_msix field
ice, irdma: move interrupts code to irdma
ice: treat dyn_allowed only as suggestion
ice: enable_rdma devlink param
ice: simplify VF MSI-X managing
ice: init flow director before RDMA
Documentation/networking/devlink/ice.rst | 11 +
drivers/infiniband/hw/irdma/main.h | 3 +
drivers/net/ethernet/intel/ice/ice.h | 21 +-
drivers/net/ethernet/intel/ice/ice_irq.h | 13 +-
include/linux/net/intel/iidc.h | 2 +
drivers/infiniband/hw/irdma/hw.c | 2 -
drivers/infiniband/hw/irdma/main.c | 46 ++-
.../net/ethernet/intel/ice/devlink/devlink.c | 109 +++++++
drivers/net/ethernet/intel/ice/ice_base.c | 10 +-
drivers/net/ethernet/intel/ice/ice_ethtool.c | 9 +-
drivers/net/ethernet/intel/ice/ice_idc.c | 64 +---
drivers/net/ethernet/intel/ice/ice_irq.c | 275 ++++++------------
drivers/net/ethernet/intel/ice/ice_lib.c | 35 ++-
drivers/net/ethernet/intel/ice/ice_main.c | 6 +-
drivers/net/ethernet/intel/ice/ice_sriov.c | 154 +---------
15 files changed, 336 insertions(+), 424 deletions(-)
--
2.42.0
* [iwl-next v9 1/9] ice: count combined queues using Rx/Tx count
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 2/9] ice: devlink PF MSI-X max and min parameter Michal Swiatkowski
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
The previous implementation assumed a 1:1 mapping between vectors and
queues. That isn't always true.
Take the minimum of the Rx and Tx ring counts on each vector to determine
the number of combined queues; e.g. a q_vector with two Tx rings and one
Rx ring contributes one combined channel.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice_ethtool.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index b552439fc1f9..c2f53946f1c3 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3818,8 +3818,7 @@ static u32 ice_get_combined_cnt(struct ice_vsi *vsi)
ice_for_each_q_vector(vsi, q_idx) {
struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
- if (q_vector->rx.rx_ring && q_vector->tx.tx_ring)
- combined++;
+ combined += min(q_vector->num_ring_tx, q_vector->num_ring_rx);
}
return combined;
--
2.42.0
* [iwl-next v9 2/9] ice: devlink PF MSI-X max and min parameter
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 1/9] ice: count combined queues using Rx/Tx count Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 3/9] ice: remove splitting MSI-X between features Michal Swiatkowski
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Use the generic devlink PF MSI-X parameters to allow the user to change
the MSI-X range.
Add notes about these parameters to the ice devlink documentation.
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
Documentation/networking/devlink/ice.rst | 11 +++
drivers/net/ethernet/intel/ice/ice.h | 7 ++
.../net/ethernet/intel/ice/devlink/devlink.c | 88 +++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_irq.c | 7 ++
4 files changed, 113 insertions(+)
diff --git a/Documentation/networking/devlink/ice.rst b/Documentation/networking/devlink/ice.rst
index e3972d03cea0..792e9f8c846a 100644
--- a/Documentation/networking/devlink/ice.rst
+++ b/Documentation/networking/devlink/ice.rst
@@ -69,6 +69,17 @@ Parameters
To verify that value has been set:
$ devlink dev param show pci/0000:16:00.0 name tx_scheduling_layers
+ * - ``msix_vec_per_pf_max``
+ - driverinit
+     - Set the max MSI-X that can be used by the PF; the rest can be
+       utilized for SR-IOV. The range is from the min value set in
+       msix_vec_per_pf_min to 2k / number of ports.
+ * - ``msix_vec_per_pf_min``
+ - driverinit
+     - Set the min MSI-X that will be used by the PF. This value informs
+       how many MSI-X vectors will be allocated statically. The range is
+       from 2 to the value set in msix_vec_per_pf_max.
+
.. list-table:: Driver specific parameters implemented
:widths: 5 5 90
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 7997745686b3..5baa36a5a500 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -543,6 +543,12 @@ struct ice_agg_node {
u8 valid;
};
+struct ice_pf_msix {
+ u32 cur;
+ u32 min;
+ u32 max;
+};
+
struct ice_pf {
struct pci_dev *pdev;
struct ice_adapter *adapter;
@@ -613,6 +619,7 @@ struct ice_pf {
struct msi_map ll_ts_irq; /* LL_TS interrupt MSIX vector */
u16 max_pf_txqs; /* Total Tx queues PF wide */
u16 max_pf_rxqs; /* Total Rx queues PF wide */
+ struct ice_pf_msix msix;
u16 num_lan_msix; /* Total MSIX vectors for base driver */
u16 num_lan_tx; /* num LAN Tx queues setup */
u16 num_lan_rx; /* num LAN Rx queues setup */
diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
index d116e2b10bce..c53baecf8a90 100644
--- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
+++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
@@ -1202,6 +1202,25 @@ static int ice_devlink_set_parent(struct devlink_rate *devlink_rate,
return status;
}
+static void ice_set_min_max_msix(struct ice_pf *pf)
+{
+ struct devlink *devlink = priv_to_devlink(pf);
+ union devlink_param_value val;
+ int err;
+
+ err = devl_param_driverinit_value_get(devlink,
+ DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+ &val);
+ if (!err)
+ pf->msix.min = val.vu32;
+
+ err = devl_param_driverinit_value_get(devlink,
+ DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+ &val);
+ if (!err)
+ pf->msix.max = val.vu32;
+}
+
/**
* ice_devlink_reinit_up - do reinit of the given PF
* @pf: pointer to the PF struct
@@ -1217,6 +1236,9 @@ static int ice_devlink_reinit_up(struct ice_pf *pf)
return err;
}
+ /* load MSI-X values */
+ ice_set_min_max_msix(pf);
+
err = ice_init_dev(pf);
if (err)
goto unroll_hw_init;
@@ -1530,6 +1552,37 @@ static int ice_devlink_local_fwd_validate(struct devlink *devlink, u32 id,
return 0;
}
+static int
+ice_devlink_msix_max_pf_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ struct ice_pf *pf = devlink_priv(devlink);
+
+ if (val.vu32 > pf->hw.func_caps.common_cap.num_msix_vectors ||
+ val.vu32 < pf->msix.min) {
+ NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int
+ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ struct ice_pf *pf = devlink_priv(devlink);
+
+ if (val.vu32 < ICE_MIN_MSIX || val.vu32 > pf->msix.max) {
+ NL_SET_ERR_MSG_MOD(extack, "Value is invalid");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
enum ice_param_id {
ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
@@ -1547,6 +1600,15 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
ice_devlink_enable_iw_validate),
};
+static const struct devlink_param ice_dvl_msix_params[] = {
+ DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MAX,
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ NULL, NULL, ice_devlink_msix_max_pf_validate),
+ DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MIN,
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ NULL, NULL, ice_devlink_msix_min_pf_validate),
+};
+
static const struct devlink_param ice_dvl_sched_params[] = {
DEVLINK_PARAM_DRIVER(ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
"tx_scheduling_layers",
@@ -1648,6 +1710,7 @@ void ice_devlink_unregister(struct ice_pf *pf)
int ice_devlink_register_params(struct ice_pf *pf)
{
struct devlink *devlink = priv_to_devlink(pf);
+ union devlink_param_value value;
struct ice_hw *hw = &pf->hw;
int status;
@@ -1656,10 +1719,33 @@ int ice_devlink_register_params(struct ice_pf *pf)
if (status)
return status;
+ status = devl_params_register(devlink, ice_dvl_msix_params,
+ ARRAY_SIZE(ice_dvl_msix_params));
+ if (status)
+ goto unregister_rdma_params;
+
if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
status = devl_params_register(devlink, ice_dvl_sched_params,
ARRAY_SIZE(ice_dvl_sched_params));
+ if (status)
+ goto unregister_msix_params;
+
+ value.vu32 = pf->msix.max;
+ devl_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+ value);
+ value.vu32 = pf->msix.min;
+ devl_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+ value);
+ return 0;
+unregister_msix_params:
+ devl_params_unregister(devlink, ice_dvl_msix_params,
+ ARRAY_SIZE(ice_dvl_msix_params));
+unregister_rdma_params:
+ devl_params_unregister(devlink, ice_dvl_rdma_params,
+ ARRAY_SIZE(ice_dvl_rdma_params));
return status;
}
@@ -1670,6 +1756,8 @@ void ice_devlink_unregister_params(struct ice_pf *pf)
devl_params_unregister(devlink, ice_dvl_rdma_params,
ARRAY_SIZE(ice_dvl_rdma_params));
+ devl_params_unregister(devlink, ice_dvl_msix_params,
+ ARRAY_SIZE(ice_dvl_msix_params));
if (hw->func_caps.common_cap.tx_sched_topo_comp_mode_en)
devl_params_unregister(devlink, ice_dvl_sched_params,
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index ad82ff7d1995..0659b96b9b8c 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -254,6 +254,13 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
int vectors, max_vectors;
+ /* load default PF MSI-X range */
+ if (!pf->msix.min)
+ pf->msix.min = ICE_MIN_MSIX;
+
+ if (!pf->msix.max)
+ pf->msix.max = total_vectors / 2;
+
vectors = ice_ena_msix_range(pf);
if (vectors < 0)
--
2.42.0
* [iwl-next v9 3/9] ice: remove splitting MSI-X between features
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 1/9] ice: count combined queues using Rx/Tx count Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 2/9] ice: devlink PF MSI-X max and min parameter Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 4/9] ice: get rid of num_lan_msix field Michal Swiatkowski
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
With the dynamic approach to MSI-X allocation there is no sense in
statically splitting MSI-X between PF features.
The splitting code was also calculating the total number of needed MSI-X.
Move this part to a separate function and use the result as the max value.
Remove ICE_ESWITCH_MSIX, as there is no need for an additional MSI-X for
switchdev.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 2 -
drivers/net/ethernet/intel/ice/ice_irq.c | 172 +++--------------------
2 files changed, 16 insertions(+), 158 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 5baa36a5a500..bf07d64a58a7 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -98,8 +98,6 @@
#define ICE_MIN_MSIX (ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
#define ICE_FDIR_MSIX 2
#define ICE_RDMA_NUM_AEQ_MSIX 4
-#define ICE_MIN_RDMA_MSIX 2
-#define ICE_ESWITCH_MSIX 1
#define ICE_NO_VSI 0xffff
#define ICE_VSI_MAP_CONTIG 0
#define ICE_VSI_MAP_SCATTER 1
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 0659b96b9b8c..4a50a6dc817e 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -84,155 +84,11 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
return entry;
}
-/**
- * ice_reduce_msix_usage - Reduce usage of MSI-X vectors
- * @pf: board private structure
- * @v_remain: number of remaining MSI-X vectors to be distributed
- *
- * Reduce the usage of MSI-X vectors when entire request cannot be fulfilled.
- * pf->num_lan_msix and pf->num_rdma_msix values are set based on number of
- * remaining vectors.
- */
-static void ice_reduce_msix_usage(struct ice_pf *pf, int v_remain)
+static int ice_get_default_msix_amount(struct ice_pf *pf)
{
- int v_rdma;
-
- if (!ice_is_rdma_ena(pf)) {
- pf->num_lan_msix = v_remain;
- return;
- }
-
- /* RDMA needs at least 1 interrupt in addition to AEQ MSIX */
- v_rdma = ICE_RDMA_NUM_AEQ_MSIX + 1;
-
- if (v_remain < ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_RDMA_MSIX) {
- dev_warn(ice_pf_to_dev(pf), "Not enough MSI-X vectors to support RDMA.\n");
- clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
-
- pf->num_rdma_msix = 0;
- pf->num_lan_msix = ICE_MIN_LAN_TXRX_MSIX;
- } else if ((v_remain < ICE_MIN_LAN_TXRX_MSIX + v_rdma) ||
- (v_remain - v_rdma < v_rdma)) {
- /* Support minimum RDMA and give remaining vectors to LAN MSIX
- */
- pf->num_rdma_msix = ICE_MIN_RDMA_MSIX;
- pf->num_lan_msix = v_remain - ICE_MIN_RDMA_MSIX;
- } else {
- /* Split remaining MSIX with RDMA after accounting for AEQ MSIX
- */
- pf->num_rdma_msix = (v_remain - ICE_RDMA_NUM_AEQ_MSIX) / 2 +
- ICE_RDMA_NUM_AEQ_MSIX;
- pf->num_lan_msix = v_remain - pf->num_rdma_msix;
- }
-}
-
-/**
- * ice_ena_msix_range - Request a range of MSIX vectors from the OS
- * @pf: board private structure
- *
- * Compute the number of MSIX vectors wanted and request from the OS. Adjust
- * device usage if there are not enough vectors. Return the number of vectors
- * reserved or negative on failure.
- */
-static int ice_ena_msix_range(struct ice_pf *pf)
-{
- int num_cpus, hw_num_msix, v_other, v_wanted, v_actual;
- struct device *dev = ice_pf_to_dev(pf);
- int err;
-
- hw_num_msix = pf->hw.func_caps.common_cap.num_msix_vectors;
- num_cpus = num_online_cpus();
-
- /* LAN miscellaneous handler */
- v_other = ICE_MIN_LAN_OICR_MSIX;
-
- /* Flow Director */
- if (test_bit(ICE_FLAG_FD_ENA, pf->flags))
- v_other += ICE_FDIR_MSIX;
-
- /* switchdev */
- v_other += ICE_ESWITCH_MSIX;
-
- v_wanted = v_other;
-
- /* LAN traffic */
- pf->num_lan_msix = num_cpus;
- v_wanted += pf->num_lan_msix;
-
- /* RDMA auxiliary driver */
- if (ice_is_rdma_ena(pf)) {
- pf->num_rdma_msix = num_cpus + ICE_RDMA_NUM_AEQ_MSIX;
- v_wanted += pf->num_rdma_msix;
- }
-
- if (v_wanted > hw_num_msix) {
- int v_remain;
-
- dev_warn(dev, "not enough device MSI-X vectors. wanted = %d, available = %d\n",
- v_wanted, hw_num_msix);
-
- if (hw_num_msix < ICE_MIN_MSIX) {
- err = -ERANGE;
- goto exit_err;
- }
-
- v_remain = hw_num_msix - v_other;
- if (v_remain < ICE_MIN_LAN_TXRX_MSIX) {
- v_other = ICE_MIN_MSIX - ICE_MIN_LAN_TXRX_MSIX;
- v_remain = ICE_MIN_LAN_TXRX_MSIX;
- }
-
- ice_reduce_msix_usage(pf, v_remain);
- v_wanted = pf->num_lan_msix + pf->num_rdma_msix + v_other;
-
- dev_notice(dev, "Reducing request to %d MSI-X vectors for LAN traffic.\n",
- pf->num_lan_msix);
- if (ice_is_rdma_ena(pf))
- dev_notice(dev, "Reducing request to %d MSI-X vectors for RDMA.\n",
- pf->num_rdma_msix);
- }
-
- /* actually reserve the vectors */
- v_actual = pci_alloc_irq_vectors(pf->pdev, ICE_MIN_MSIX, v_wanted,
- PCI_IRQ_MSIX);
- if (v_actual < 0) {
- dev_err(dev, "unable to reserve MSI-X vectors\n");
- err = v_actual;
- goto exit_err;
- }
-
- if (v_actual < v_wanted) {
- dev_warn(dev, "not enough OS MSI-X vectors. requested = %d, obtained = %d\n",
- v_wanted, v_actual);
-
- if (v_actual < ICE_MIN_MSIX) {
- /* error if we can't get minimum vectors */
- pci_free_irq_vectors(pf->pdev);
- err = -ERANGE;
- goto exit_err;
- } else {
- int v_remain = v_actual - v_other;
-
- if (v_remain < ICE_MIN_LAN_TXRX_MSIX)
- v_remain = ICE_MIN_LAN_TXRX_MSIX;
-
- ice_reduce_msix_usage(pf, v_remain);
-
- dev_notice(dev, "Enabled %d MSI-X vectors for LAN traffic.\n",
- pf->num_lan_msix);
-
- if (ice_is_rdma_ena(pf))
- dev_notice(dev, "Enabled %d MSI-X vectors for RDMA.\n",
- pf->num_rdma_msix);
- }
- }
-
- return v_actual;
-
-exit_err:
- pf->num_rdma_msix = 0;
- pf->num_lan_msix = 0;
- return err;
+ return ICE_MIN_LAN_OICR_MSIX + num_online_cpus() +
+ (test_bit(ICE_FLAG_FD_ENA, pf->flags) ? ICE_FDIR_MSIX : 0) +
+ (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_NUM_AEQ_MSIX : 0);
}
/**
@@ -259,17 +115,21 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
pf->msix.min = ICE_MIN_MSIX;
if (!pf->msix.max)
- pf->msix.max = total_vectors / 2;
-
- vectors = ice_ena_msix_range(pf);
+ pf->msix.max = min(total_vectors,
+ ice_get_default_msix_amount(pf));
- if (vectors < 0)
- return -ENOMEM;
-
- if (pci_msix_can_alloc_dyn(pf->pdev))
+ if (pci_msix_can_alloc_dyn(pf->pdev)) {
+ vectors = pf->msix.min;
max_vectors = total_vectors;
- else
+ } else {
+ vectors = pf->msix.max;
max_vectors = vectors;
+ }
+
+ vectors = pci_alloc_irq_vectors(pf->pdev, pf->msix.min, vectors,
+ PCI_IRQ_MSIX);
+ if (vectors < pf->msix.min)
+ return -ENOMEM;
ice_init_irq_tracker(pf, max_vectors, vectors);
--
2.42.0
* [iwl-next v9 4/9] ice: get rid of num_lan_msix field
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (2 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 3/9] ice: remove splitting MSI-X between features Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma Michal Swiatkowski
` (4 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Remove the field to allow having more queues than MSI-X on a VSI. By
default the numbers will be the same, but if there aren't enough MSI-X
vectors available, the VSI can run with at least one MSI-X.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 1 -
drivers/net/ethernet/intel/ice/ice_base.c | 10 ++++----
drivers/net/ethernet/intel/ice/ice_ethtool.c | 6 ++---
drivers/net/ethernet/intel/ice/ice_irq.c | 11 ++++-----
drivers/net/ethernet/intel/ice/ice_lib.c | 25 +++++++++++---------
5 files changed, 24 insertions(+), 29 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index bf07d64a58a7..f497f7d6eb71 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -618,7 +618,6 @@ struct ice_pf {
u16 max_pf_txqs; /* Total Tx queues PF wide */
u16 max_pf_rxqs; /* Total Rx queues PF wide */
struct ice_pf_msix msix;
- u16 num_lan_msix; /* Total MSIX vectors for base driver */
u16 num_lan_tx; /* num LAN Tx queues setup */
u16 num_lan_rx; /* num LAN Rx queues setup */
u16 next_vsi; /* Next free slot in pf->vsi[] - 0-based! */
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 64b4243ace03..0b3cf9baef04 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -795,13 +795,11 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
return 0;
err_out:
- while (v_idx--)
- ice_free_q_vector(vsi, v_idx);
- dev_err(dev, "Failed to allocate %d q_vector for VSI %d, ret=%d\n",
- vsi->num_q_vectors, vsi->vsi_num, err);
- vsi->num_q_vectors = 0;
- return err;
+ dev_info(dev, "Failed to allocate %d q_vectors for VSI %d, new value %d",
+ vsi->num_q_vectors, vsi->vsi_num, v_idx);
+ vsi->num_q_vectors = v_idx;
+ return v_idx ? 0 : err;
}
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index c2f53946f1c3..5d001fe95753 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3789,8 +3789,7 @@ ice_get_ts_info(struct net_device *dev, struct kernel_ethtool_ts_info *info)
*/
static int ice_get_max_txq(struct ice_pf *pf)
{
- return min3(pf->num_lan_msix, (u16)num_online_cpus(),
- (u16)pf->hw.func_caps.common_cap.num_txq);
+ return min(num_online_cpus(), pf->hw.func_caps.common_cap.num_txq);
}
/**
@@ -3799,8 +3798,7 @@ static int ice_get_max_txq(struct ice_pf *pf)
*/
static int ice_get_max_rxq(struct ice_pf *pf)
{
- return min3(pf->num_lan_msix, (u16)num_online_cpus(),
- (u16)pf->hw.func_caps.common_cap.num_rxq);
+ return min(num_online_cpus(), pf->hw.func_caps.common_cap.num_rxq);
}
/**
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 4a50a6dc817e..1a7d446ab5f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -108,7 +108,7 @@ void ice_clear_interrupt_scheme(struct ice_pf *pf)
int ice_init_interrupt_scheme(struct ice_pf *pf)
{
int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
- int vectors, max_vectors;
+ int vectors;
/* load default PF MSI-X range */
if (!pf->msix.min)
@@ -118,20 +118,17 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
pf->msix.max = min(total_vectors,
ice_get_default_msix_amount(pf));
- if (pci_msix_can_alloc_dyn(pf->pdev)) {
+ if (pci_msix_can_alloc_dyn(pf->pdev))
vectors = pf->msix.min;
- max_vectors = total_vectors;
- } else {
+ else
vectors = pf->msix.max;
- max_vectors = vectors;
- }
vectors = pci_alloc_irq_vectors(pf->pdev, pf->msix.min, vectors,
PCI_IRQ_MSIX);
if (vectors < pf->msix.min)
return -ENOMEM;
- ice_init_irq_tracker(pf, max_vectors, vectors);
+ ice_init_irq_tracker(pf, pf->msix.max, vectors);
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 7da2f3336c59..ecc850a43643 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -157,6 +157,16 @@ static void ice_vsi_set_num_desc(struct ice_vsi *vsi)
}
}
+static u16 ice_get_rxq_count(struct ice_pf *pf)
+{
+ return min(ice_get_avail_rxq_count(pf), num_online_cpus());
+}
+
+static u16 ice_get_txq_count(struct ice_pf *pf)
+{
+ return min(ice_get_avail_txq_count(pf), num_online_cpus());
+}
+
/**
* ice_vsi_set_num_qs - Set number of queues, descriptors and vectors for a VSI
* @vsi: the VSI being configured
@@ -178,9 +188,7 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
vsi->alloc_txq = vsi->req_txq;
vsi->num_txq = vsi->req_txq;
} else {
- vsi->alloc_txq = min3(pf->num_lan_msix,
- ice_get_avail_txq_count(pf),
- (u16)num_online_cpus());
+ vsi->alloc_txq = ice_get_txq_count(pf);
}
pf->num_lan_tx = vsi->alloc_txq;
@@ -193,17 +201,13 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
vsi->alloc_rxq = vsi->req_rxq;
vsi->num_rxq = vsi->req_rxq;
} else {
- vsi->alloc_rxq = min3(pf->num_lan_msix,
- ice_get_avail_rxq_count(pf),
- (u16)num_online_cpus());
+ vsi->alloc_rxq = ice_get_rxq_count(pf);
}
}
pf->num_lan_rx = vsi->alloc_rxq;
- vsi->num_q_vectors = min_t(int, pf->num_lan_msix,
- max_t(int, vsi->alloc_rxq,
- vsi->alloc_txq));
+ vsi->num_q_vectors = max(vsi->alloc_rxq, vsi->alloc_txq);
break;
case ICE_VSI_SF:
vsi->alloc_txq = 1;
@@ -1173,12 +1177,11 @@ static void ice_set_rss_vsi_ctx(struct ice_vsi_ctx *ctxt, struct ice_vsi *vsi)
static void
ice_chnl_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
{
- struct ice_pf *pf = vsi->back;
u16 qcount, qmap;
u8 offset = 0;
int pow;
- qcount = min_t(int, vsi->num_rxq, pf->num_lan_msix);
+ qcount = vsi->num_rxq;
pow = order_base_2(qcount);
qmap = FIELD_PREP(ICE_AQ_VSI_TC_Q_OFFSET_M, offset);
--
2.42.0
* [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (3 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 4/9] ice: get rid of num_lan_msix field Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2025-02-13 19:20 ` [Intel-wired-lan] " Marcin Szycik
2024-12-03 6:58 ` [iwl-next v9 6/9] ice: treat dyn_allowed only as suggestion Michal Swiatkowski
` (3 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Move the responsibility of requesting MSI-X for the RDMA feature from the
ice driver to the irdma driver. This allows a simple fallback when there
aren't enough MSI-X vectors available.
Change the number of MSI-X vectors used for control from 4 to 1, as more
than one MSI-X isn't needed for this purpose.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/infiniband/hw/irdma/main.h | 3 ++
drivers/net/ethernet/intel/ice/ice.h | 1 -
include/linux/net/intel/iidc.h | 2 +
drivers/infiniband/hw/irdma/hw.c | 2 -
drivers/infiniband/hw/irdma/main.c | 46 ++++++++++++++++-
drivers/net/ethernet/intel/ice/ice_idc.c | 64 ++++++------------------
drivers/net/ethernet/intel/ice/ice_irq.c | 3 +-
7 files changed, 65 insertions(+), 56 deletions(-)
diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
index 9f0ed6e84471..ef9a9b79d711 100644
--- a/drivers/infiniband/hw/irdma/main.h
+++ b/drivers/infiniband/hw/irdma/main.h
@@ -117,6 +117,9 @@ extern struct auxiliary_driver i40iw_auxiliary_drv;
#define IRDMA_IRQ_NAME_STR_LEN (64)
+#define IRDMA_NUM_AEQ_MSIX 1
+#define IRDMA_MIN_MSIX 2
+
enum init_completion_state {
INVALID_STATE = 0,
INITIAL_STATE,
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index f497f7d6eb71..14a90c916d43 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -97,7 +97,6 @@
#define ICE_MIN_LAN_OICR_MSIX 1
#define ICE_MIN_MSIX (ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
#define ICE_FDIR_MSIX 2
-#define ICE_RDMA_NUM_AEQ_MSIX 4
#define ICE_NO_VSI 0xffff
#define ICE_VSI_MAP_CONTIG 0
#define ICE_VSI_MAP_SCATTER 1
diff --git a/include/linux/net/intel/iidc.h b/include/linux/net/intel/iidc.h
index 1c1332e4df26..13274c3def66 100644
--- a/include/linux/net/intel/iidc.h
+++ b/include/linux/net/intel/iidc.h
@@ -78,6 +78,8 @@ int ice_del_rdma_qset(struct ice_pf *pf, struct iidc_rdma_qset_params *qset);
int ice_rdma_request_reset(struct ice_pf *pf, enum iidc_reset_type reset_type);
int ice_rdma_update_vsi_filter(struct ice_pf *pf, u16 vsi_id, bool enable);
void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos);
+int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
+void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
/* Structure representing auxiliary driver tailored information about the core
* PCI dev, each auxiliary driver using the IIDC interface will have an
diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
index ad50b77282f8..69ce1862eabe 100644
--- a/drivers/infiniband/hw/irdma/hw.c
+++ b/drivers/infiniband/hw/irdma/hw.c
@@ -498,8 +498,6 @@ static int irdma_save_msix_info(struct irdma_pci_f *rf)
iw_qvlist->num_vectors = rf->msix_count;
if (rf->msix_count <= num_online_cpus())
rf->msix_shared = true;
- else if (rf->msix_count > num_online_cpus() + 1)
- rf->msix_count = num_online_cpus() + 1;
pmsix = rf->msix_entries;
for (i = 0, ceq_idx = 0; i < rf->msix_count; i++, iw_qvinfo++) {
diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
index 3f13200ff71b..1ee8969595d3 100644
--- a/drivers/infiniband/hw/irdma/main.c
+++ b/drivers/infiniband/hw/irdma/main.c
@@ -206,6 +206,43 @@ static void irdma_lan_unregister_qset(struct irdma_sc_vsi *vsi,
ibdev_dbg(&iwdev->ibdev, "WS: LAN free_res for rdma qset failed.\n");
}
+static int irdma_init_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
+{
+ int i;
+
+ rf->msix_count = num_online_cpus() + IRDMA_NUM_AEQ_MSIX;
+ rf->msix_entries = kcalloc(rf->msix_count, sizeof(*rf->msix_entries),
+ GFP_KERNEL);
+ if (!rf->msix_entries)
+ return -ENOMEM;
+
+ for (i = 0; i < rf->msix_count; i++)
+ if (ice_alloc_rdma_qvector(pf, &rf->msix_entries[i]))
+ break;
+
+ if (i < IRDMA_MIN_MSIX) {
+ for (; i > 0; i--)
+			ice_free_rdma_qvector(pf, &rf->msix_entries[i - 1]);
+
+ kfree(rf->msix_entries);
+ return -ENOMEM;
+ }
+
+ rf->msix_count = i;
+
+ return 0;
+}
+
+static void irdma_deinit_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
+{
+ int i;
+
+ for (i = 0; i < rf->msix_count; i++)
+ ice_free_rdma_qvector(pf, &rf->msix_entries[i]);
+
+ kfree(rf->msix_entries);
+}
+
static void irdma_remove(struct auxiliary_device *aux_dev)
{
struct iidc_auxiliary_dev *iidc_adev = container_of(aux_dev,
@@ -216,6 +253,7 @@ static void irdma_remove(struct auxiliary_device *aux_dev)
irdma_ib_unregister_device(iwdev);
ice_rdma_update_vsi_filter(pf, iwdev->vsi_num, false);
+ irdma_deinit_interrupts(iwdev->rf, pf);
pr_debug("INIT: Gen2 PF[%d] device remove success\n", PCI_FUNC(pf->pdev->devfn));
}
@@ -230,9 +268,7 @@ static void irdma_fill_device_info(struct irdma_device *iwdev, struct ice_pf *pf
rf->gen_ops.unregister_qset = irdma_lan_unregister_qset;
rf->hw.hw_addr = pf->hw.hw_addr;
rf->pcidev = pf->pdev;
- rf->msix_count = pf->num_rdma_msix;
rf->pf_id = pf->hw.pf_id;
- rf->msix_entries = &pf->msix_entries[pf->rdma_base_vector];
rf->default_vsi.vsi_idx = vsi->vsi_num;
rf->protocol_used = pf->rdma_mode & IIDC_RDMA_PROTOCOL_ROCEV2 ?
IRDMA_ROCE_PROTOCOL_ONLY : IRDMA_IWARP_PROTOCOL_ONLY;
@@ -281,6 +317,10 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
irdma_fill_device_info(iwdev, pf, vsi);
rf = iwdev->rf;
+ err = irdma_init_interrupts(rf, pf);
+ if (err)
+ goto err_init_interrupts;
+
err = irdma_ctrl_init_hw(rf);
if (err)
goto err_ctrl_init;
@@ -311,6 +351,8 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
err_rt_init:
irdma_ctrl_deinit_hw(rf);
err_ctrl_init:
+ irdma_deinit_interrupts(rf, pf);
+err_init_interrupts:
kfree(iwdev->rf);
ib_dealloc_device(&iwdev->ibdev);
diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
index 145b27f2a4ce..bab3e81cad5d 100644
--- a/drivers/net/ethernet/intel/ice/ice_idc.c
+++ b/drivers/net/ethernet/intel/ice/ice_idc.c
@@ -228,61 +228,34 @@ void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos)
}
EXPORT_SYMBOL_GPL(ice_get_qos_params);
-/**
- * ice_alloc_rdma_qvectors - Allocate vector resources for RDMA driver
- * @pf: board private structure to initialize
- */
-static int ice_alloc_rdma_qvectors(struct ice_pf *pf)
+int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
{
- if (ice_is_rdma_ena(pf)) {
- int i;
-
- pf->msix_entries = kcalloc(pf->num_rdma_msix,
- sizeof(*pf->msix_entries),
- GFP_KERNEL);
- if (!pf->msix_entries)
- return -ENOMEM;
+ struct msi_map map = ice_alloc_irq(pf, true);
- /* RDMA is the only user of pf->msix_entries array */
- pf->rdma_base_vector = 0;
-
- for (i = 0; i < pf->num_rdma_msix; i++) {
- struct msix_entry *entry = &pf->msix_entries[i];
- struct msi_map map;
+ if (map.index < 0)
+ return -ENOMEM;
- map = ice_alloc_irq(pf, false);
- if (map.index < 0)
- break;
+ entry->entry = map.index;
+ entry->vector = map.virq;
- entry->entry = map.index;
- entry->vector = map.virq;
- }
- }
return 0;
}
+EXPORT_SYMBOL_GPL(ice_alloc_rdma_qvector);
/**
* ice_free_rdma_qvector - free vector resources reserved for RDMA driver
* @pf: board private structure to initialize
+ * @entry: MSI-X entry to be removed
*/
-static void ice_free_rdma_qvector(struct ice_pf *pf)
+void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
{
- int i;
-
- if (!pf->msix_entries)
- return;
-
- for (i = 0; i < pf->num_rdma_msix; i++) {
- struct msi_map map;
+ struct msi_map map;
- map.index = pf->msix_entries[i].entry;
- map.virq = pf->msix_entries[i].vector;
- ice_free_irq(pf, map);
- }
-
- kfree(pf->msix_entries);
- pf->msix_entries = NULL;
+ map.index = entry->entry;
+ map.virq = entry->vector;
+ ice_free_irq(pf, map);
}
+EXPORT_SYMBOL_GPL(ice_free_rdma_qvector);
/**
* ice_adev_release - function to be mapped to AUX dev's release op
@@ -382,12 +355,6 @@ int ice_init_rdma(struct ice_pf *pf)
return -ENOMEM;
}
- /* Reserve vector resources */
- ret = ice_alloc_rdma_qvectors(pf);
- if (ret < 0) {
- dev_err(dev, "failed to reserve vectors for RDMA\n");
- goto err_reserve_rdma_qvector;
- }
pf->rdma_mode |= IIDC_RDMA_PROTOCOL_ROCEV2;
ret = ice_plug_aux_dev(pf);
if (ret)
@@ -395,8 +362,6 @@ int ice_init_rdma(struct ice_pf *pf)
return 0;
err_plug_aux_dev:
- ice_free_rdma_qvector(pf);
-err_reserve_rdma_qvector:
pf->adev = NULL;
xa_erase(&ice_aux_id, pf->aux_idx);
return ret;
@@ -412,6 +377,5 @@ void ice_deinit_rdma(struct ice_pf *pf)
return;
ice_unplug_aux_dev(pf);
- ice_free_rdma_qvector(pf);
xa_erase(&ice_aux_id, pf->aux_idx);
}
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 1a7d446ab5f1..80c9ee2e64c1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -84,11 +84,12 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
return entry;
}
+#define ICE_RDMA_AEQ_MSIX 1
static int ice_get_default_msix_amount(struct ice_pf *pf)
{
return ICE_MIN_LAN_OICR_MSIX + num_online_cpus() +
(test_bit(ICE_FLAG_FD_ENA, pf->flags) ? ICE_FDIR_MSIX : 0) +
- (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_NUM_AEQ_MSIX : 0);
+ (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_AEQ_MSIX : 0);
}
/**
--
2.42.0
* [iwl-next v9 6/9] ice: treat dyn_allowed only as suggestion
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (4 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 7/9] ice: enable_rdma devlink param Michal Swiatkowski
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
It can be necessary to have some MSI-X vectors allocated statically and
the rest dynamically, for example on the PF VSI. We always want at least
one MSI-X on it, so that one is allocated statically; the rest can be
dynamic if supported.
Change ice_get_irq_res() to allow using static entries, if they are free,
even if the caller asks for a dynamic one.
Adjust the limit values to the new approach. Min and max in the limit are
inclusive valid values, so decrease max and num_static by one.
Set vsi::irq_dyn_alloc if dynamic allocation is supported.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice_irq.c | 25 ++++++++++++------------
drivers/net/ethernet/intel/ice/ice_lib.c | 2 ++
2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index 80c9ee2e64c1..d466d29b2ef1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -45,7 +45,7 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
/**
* ice_get_irq_res - get an interrupt resource
* @pf: board private structure
- * @dyn_only: force entry to be dynamically allocated
+ * @dyn_allowed: allow entry to be dynamically allocated
*
* Allocate new irq entry in the free slot of the tracker. Since xarray
* is used, always allocate new entry at the lowest possible index. Set
@@ -53,11 +53,12 @@ static void ice_free_irq_res(struct ice_pf *pf, u16 index)
*
* Returns allocated irq entry or NULL on failure.
*/
-static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
+static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf,
+ bool dyn_allowed)
{
- struct xa_limit limit = { .max = pf->irq_tracker.num_entries,
+ struct xa_limit limit = { .max = pf->irq_tracker.num_entries - 1,
.min = 0 };
- unsigned int num_static = pf->irq_tracker.num_static;
+ unsigned int num_static = pf->irq_tracker.num_static - 1;
struct ice_irq_entry *entry;
unsigned int index;
int ret;
@@ -66,9 +67,9 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
if (!entry)
return NULL;
- /* skip preallocated entries if the caller says so */
- if (dyn_only)
- limit.min = num_static;
+ /* only already allocated if the caller says so */
+ if (!dyn_allowed)
+ limit.max = num_static;
ret = xa_alloc(&pf->irq_tracker.entries, &index, entry, limit,
GFP_KERNEL);
@@ -78,7 +79,7 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
entry = NULL;
} else {
entry->index = index;
- entry->dynamic = index >= num_static;
+ entry->dynamic = index > num_static;
}
return entry;
@@ -137,7 +138,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
/**
* ice_alloc_irq - Allocate new interrupt vector
* @pf: board private structure
- * @dyn_only: force dynamic allocation of the interrupt
+ * @dyn_allowed: allow dynamic allocation of the interrupt
*
* Allocate new interrupt vector for a given owner id.
* return struct msi_map with interrupt details and track
@@ -150,20 +151,20 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
* interrupt will be allocated with pci_msix_alloc_irq_at.
*
* Some callers may only support dynamically allocated interrupts.
- * This is indicated with dyn_only flag.
+ * This is indicated with dyn_allowed flag.
*
* On failure, return map with negative .index. The caller
* is expected to check returned map index.
*
*/
-struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only)
+struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
{
int sriov_base_vector = pf->sriov_base_vector;
struct msi_map map = { .index = -ENOENT };
struct device *dev = ice_pf_to_dev(pf);
struct ice_irq_entry *entry;
- entry = ice_get_irq_res(pf, dyn_only);
+ entry = ice_get_irq_res(pf, dyn_allowed);
if (!entry)
return map;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index ecc850a43643..01afa5f84af9 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -571,6 +571,8 @@ ice_vsi_alloc_def(struct ice_vsi *vsi, struct ice_channel *ch)
return -ENOMEM;
}
+ vsi->irq_dyn_alloc = pci_msix_can_alloc_dyn(vsi->back->pdev);
+
switch (vsi->type) {
case ICE_VSI_PF:
case ICE_VSI_SF:
--
2.42.0
* [iwl-next v9 7/9] ice: enable_rdma devlink param
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (5 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 6/9] ice: treat dyn_allowed only as suggestion Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 8/9] ice: simplify VF MSI-X managing Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 9/9] ice: init flow director before RDMA Michal Swiatkowski
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Implement the enable_rdma devlink parameter to allow the user to turn the
RDMA feature on and off.
It is useful when there aren't enough interrupts and the user doesn't need
the RDMA feature.
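
For example, RDMA could be disabled like this (the PCI address is
illustrative):

  $ devlink dev param set pci/0000:16:00.0 name enable_rdma \
        value false cmode driverinit
  $ devlink dev reload pci/0000:16:00.0 action driver_reinit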
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
.../net/ethernet/intel/ice/devlink/devlink.c | 21 +++++++++++++++++++
drivers/net/ethernet/intel/ice/ice_lib.c | 8 ++++++-
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink.c b/drivers/net/ethernet/intel/ice/devlink/devlink.c
index c53baecf8a90..725136c975e1 100644
--- a/drivers/net/ethernet/intel/ice/devlink/devlink.c
+++ b/drivers/net/ethernet/intel/ice/devlink/devlink.c
@@ -1583,6 +1583,19 @@ ice_devlink_msix_min_pf_validate(struct devlink *devlink, u32 id,
return 0;
}
+static int ice_devlink_enable_rdma_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ struct ice_pf *pf = devlink_priv(devlink);
+ bool new_state = val.vbool;
+
+ if (new_state && !test_bit(ICE_FLAG_RDMA_ENA, pf->flags))
+ return -EOPNOTSUPP;
+
+ return 0;
+}
+
enum ice_param_id {
ICE_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
ICE_DEVLINK_PARAM_ID_TX_SCHED_LAYERS,
@@ -1598,6 +1611,8 @@ static const struct devlink_param ice_dvl_rdma_params[] = {
ice_devlink_enable_iw_get,
ice_devlink_enable_iw_set,
ice_devlink_enable_iw_validate),
+ DEVLINK_PARAM_GENERIC(ENABLE_RDMA, BIT(DEVLINK_PARAM_CMODE_DRIVERINIT),
+ NULL, NULL, ice_devlink_enable_rdma_validate),
};
static const struct devlink_param ice_dvl_msix_params[] = {
@@ -1738,6 +1753,12 @@ int ice_devlink_register_params(struct ice_pf *pf)
devl_param_driverinit_value_set(devlink,
DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
value);
+
+ value.vbool = test_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+ devl_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
+ value);
+
return 0;
unregister_msix_params:
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 01afa5f84af9..29a3e055b7c6 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -833,7 +833,13 @@ bool ice_is_safe_mode(struct ice_pf *pf)
*/
bool ice_is_rdma_ena(struct ice_pf *pf)
{
- return test_bit(ICE_FLAG_RDMA_ENA, pf->flags);
+ union devlink_param_value value;
+ int err;
+
+ err = devl_param_driverinit_value_get(priv_to_devlink(pf),
+ DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
+ &value);
+ return err ? test_bit(ICE_FLAG_RDMA_ENA, pf->flags) : value.vbool;
}
/**
--
2.42.0
* [iwl-next v9 8/9] ice: simplify VF MSI-X managing
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (6 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 7/9] ice: enable_rdma devlink param Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 9/9] ice: init flow director before RDMA Michal Swiatkowski
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
After introducing the pf->msix.max field, the base vector for other use
cases (like VFs) can be fixed. This simplifies the code for changing the
MSI-X amount on a particular VF, because there is no need to move the base
vector.
A fixed base vector allows reserving vectors from the beginning instead of
from the end, which is also simpler in code.
Store the total and remaining vector counts in the same struct as max and
min for the PF.
Move vector tracking from ice_sriov.c to ice_irq.c, as it can also be used
for other non-PF use cases (SIOV).
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 10 +-
drivers/net/ethernet/intel/ice/ice_irq.h | 13 +-
drivers/net/ethernet/intel/ice/ice_irq.c | 75 +++++++---
drivers/net/ethernet/intel/ice/ice_sriov.c | 154 ++-------------------
4 files changed, 79 insertions(+), 173 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 14a90c916d43..7200d6042590 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -544,6 +544,8 @@ struct ice_pf_msix {
u32 cur;
u32 min;
u32 max;
+ u32 total;
+ u32 rest;
};
struct ice_pf {
@@ -560,13 +562,7 @@ struct ice_pf {
/* OS reserved IRQ details */
struct msix_entry *msix_entries;
struct ice_irq_tracker irq_tracker;
- /* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
- * number of MSIX vectors needed for all SR-IOV VFs from the number of
- * MSIX vectors allowed on this PF.
- */
- u16 sriov_base_vector;
- unsigned long *sriov_irq_bm; /* bitmap to track irq usage */
- u16 sriov_irq_size; /* size of the irq_bm bitmap */
+ struct ice_virt_irq_tracker virt_irq_tracker;
u16 ctrl_vsi_idx; /* control VSI index in pf->vsi array */
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.h b/drivers/net/ethernet/intel/ice/ice_irq.h
index f35efc08575e..b2f9dbafd57e 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.h
+++ b/drivers/net/ethernet/intel/ice/ice_irq.h
@@ -15,11 +15,22 @@ struct ice_irq_tracker {
u16 num_static; /* preallocated entries */
};
+struct ice_virt_irq_tracker {
+ unsigned long *bm; /* bitmap to track irq usage */
+ u32 num_entries;
+ /* First MSIX vector used by SR-IOV VFs. Calculated by subtracting the
+ * number of MSIX vectors needed for all SR-IOV VFs from the number of
+ * MSIX vectors allowed on this PF.
+ */
+ u32 base;
+};
+
int ice_init_interrupt_scheme(struct ice_pf *pf);
void ice_clear_interrupt_scheme(struct ice_pf *pf);
struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_only);
void ice_free_irq(struct ice_pf *pf, struct msi_map map);
-int ice_get_max_used_msix_vector(struct ice_pf *pf);
+int ice_virt_get_irqs(struct ice_pf *pf, u32 needed);
+void ice_virt_free_irqs(struct ice_pf *pf, u32 index, u32 irqs);
#endif
diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
index d466d29b2ef1..cbae3d81f0f1 100644
--- a/drivers/net/ethernet/intel/ice/ice_irq.c
+++ b/drivers/net/ethernet/intel/ice/ice_irq.c
@@ -20,6 +20,19 @@ ice_init_irq_tracker(struct ice_pf *pf, unsigned int max_vectors,
xa_init_flags(&pf->irq_tracker.entries, XA_FLAGS_ALLOC);
}
+static int
+ice_init_virt_irq_tracker(struct ice_pf *pf, u32 base, u32 num_entries)
+{
+ pf->virt_irq_tracker.bm = bitmap_zalloc(num_entries, GFP_KERNEL);
+ if (!pf->virt_irq_tracker.bm)
+ return -ENOMEM;
+
+ pf->virt_irq_tracker.num_entries = num_entries;
+ pf->virt_irq_tracker.base = base;
+
+ return 0;
+}
+
/**
* ice_deinit_irq_tracker - free xarray tracker
* @pf: board private structure
@@ -29,6 +42,11 @@ static void ice_deinit_irq_tracker(struct ice_pf *pf)
xa_destroy(&pf->irq_tracker.entries);
}
+static void ice_deinit_virt_irq_tracker(struct ice_pf *pf)
+{
+ bitmap_free(pf->virt_irq_tracker.bm);
+}
+
/**
* ice_free_irq_res - free a block of resources
* @pf: board private structure
@@ -101,6 +119,7 @@ void ice_clear_interrupt_scheme(struct ice_pf *pf)
{
pci_free_irq_vectors(pf->pdev);
ice_deinit_irq_tracker(pf);
+ ice_deinit_virt_irq_tracker(pf);
}
/**
@@ -120,6 +139,9 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
pf->msix.max = min(total_vectors,
ice_get_default_msix_amount(pf));
+ pf->msix.total = total_vectors;
+ pf->msix.rest = total_vectors - pf->msix.max;
+
if (pci_msix_can_alloc_dyn(pf->pdev))
vectors = pf->msix.min;
else
@@ -132,7 +154,7 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
ice_init_irq_tracker(pf, pf->msix.max, vectors);
- return 0;
+ return ice_init_virt_irq_tracker(pf, pf->msix.max, pf->msix.rest);
}
/**
@@ -159,7 +181,6 @@ int ice_init_interrupt_scheme(struct ice_pf *pf)
*/
struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
{
- int sriov_base_vector = pf->sriov_base_vector;
struct msi_map map = { .index = -ENOENT };
struct device *dev = ice_pf_to_dev(pf);
struct ice_irq_entry *entry;
@@ -168,10 +189,6 @@ struct msi_map ice_alloc_irq(struct ice_pf *pf, bool dyn_allowed)
if (!entry)
return map;
- /* fail if we're about to violate SRIOV vectors space */
- if (sriov_base_vector && entry->index >= sriov_base_vector)
- goto exit_free_res;
-
if (pci_msix_can_alloc_dyn(pf->pdev) && entry->dynamic) {
map = pci_msix_alloc_irq_at(pf->pdev, entry->index, NULL);
if (map.index < 0)
@@ -219,26 +236,40 @@ void ice_free_irq(struct ice_pf *pf, struct msi_map map)
}
/**
- * ice_get_max_used_msix_vector - Get the max used interrupt vector
- * @pf: board private structure
+ * ice_virt_get_irqs - get irqs for SR-IOV use case
+ * @pf: pointer to PF structure
+ * @needed: number of irqs to get
*
- * Return index of maximum used interrupt vectors with respect to the
- * beginning of the MSIX table. Take into account that some interrupts
- * may have been dynamically allocated after MSIX was initially enabled.
+ * This returns the first MSI-X vector index in PF space that is used by this
+ * VF. This index is used when accessing PF relative registers such as
+ * GLINT_VECT2FUNC and GLINT_DYN_CTL.
+ * This will always be the OICR index in the AVF driver, so any functionality
+ * using vf->first_vector_idx for queue configuration will be based on this
+ * index as well.
*/
-int ice_get_max_used_msix_vector(struct ice_pf *pf)
+int ice_virt_get_irqs(struct ice_pf *pf, u32 needed)
{
- unsigned long start, index, max_idx;
- void *entry;
+ int res = bitmap_find_next_zero_area(pf->virt_irq_tracker.bm,
+ pf->virt_irq_tracker.num_entries,
+ 0, needed, 0);
- /* Treat all preallocated interrupts as used */
- start = pf->irq_tracker.num_static;
- max_idx = start - 1;
+ if (res >= pf->virt_irq_tracker.num_entries)
+ return -ENOENT;
- xa_for_each_start(&pf->irq_tracker.entries, index, entry, start) {
- if (index > max_idx)
- max_idx = index;
- }
+ bitmap_set(pf->virt_irq_tracker.bm, res, needed);
- return max_idx;
+ /* conversion from number in bitmap to global irq index */
+ return res + pf->virt_irq_tracker.base;
+}
+
+/**
+ * ice_virt_free_irqs - free irqs used by the VF
+ * @pf: pointer to PF structure
+ * @index: first index to be freed
+ * @irqs: number of irqs to free
+ */
+void ice_virt_free_irqs(struct ice_pf *pf, u32 index, u32 irqs)
+{
+ bitmap_clear(pf->virt_irq_tracker.bm, index - pf->virt_irq_tracker.base,
+ irqs);
}
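
The tracker above boils down to a first-fit contiguous bitmap allocator over
the pf->msix.rest vectors sitting past pf->msix.max. A minimal self-contained
sketch of the same pattern (only the <linux/bitmap.h> helpers already used by
the patch; the struct and function names here are illustrative, not the
driver's):

	struct virt_tracker {
		unsigned long *bm;	/* one bit per MSI-X vector */
		u32 num_entries;
		u32 base;		/* first global index of the pool */
	};

	static int virt_get_range(struct virt_tracker *t, u32 needed)
	{
		/* first fit: 'needed' consecutive clear bits, unaligned */
		unsigned long res = bitmap_find_next_zero_area(t->bm,
						t->num_entries, 0, needed, 0);

		if (res >= t->num_entries)
			return -ENOENT;

		bitmap_set(t->bm, res, needed);
		return res + t->base;	/* bitmap offset -> global index */
	}

	static void virt_put_range(struct virt_tracker *t, u32 index, u32 n)
	{
		bitmap_clear(t->bm, index - t->base, n);
	}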
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index b83f99c01d91..33eac29b6a50 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -122,27 +122,6 @@ static void ice_dis_vf_mappings(struct ice_vf *vf)
dev_err(dev, "Scattered mode for VF Rx queues is not yet implemented\n");
}
-/**
- * ice_sriov_free_msix_res - Reset/free any used MSIX resources
- * @pf: pointer to the PF structure
- *
- * Since no MSIX entries are taken from the pf->irq_tracker then just clear
- * the pf->sriov_base_vector.
- *
- * Returns 0 on success, and -EINVAL on error.
- */
-static int ice_sriov_free_msix_res(struct ice_pf *pf)
-{
- if (!pf)
- return -EINVAL;
-
- bitmap_free(pf->sriov_irq_bm);
- pf->sriov_irq_size = 0;
- pf->sriov_base_vector = 0;
-
- return 0;
-}
-
/**
* ice_free_vfs - Free all VFs
* @pf: pointer to the PF structure
@@ -177,6 +156,7 @@ void ice_free_vfs(struct ice_pf *pf)
ice_eswitch_detach_vf(pf, vf);
ice_dis_vf_qs(vf);
+ ice_virt_free_irqs(pf, vf->first_vector_idx, vf->num_msix);
if (test_bit(ICE_VF_STATE_INIT, vf->vf_states)) {
/* disable VF qp mappings and set VF disable state */
@@ -200,9 +180,6 @@ void ice_free_vfs(struct ice_pf *pf)
mutex_unlock(&vf->cfg_lock);
}
- if (ice_sriov_free_msix_res(pf))
- dev_err(dev, "Failed to free MSIX resources used by SR-IOV\n");
-
vfs->num_qps_per = 0;
ice_free_vf_entries(pf);
@@ -371,40 +348,6 @@ void ice_calc_vf_reg_idx(struct ice_vf *vf, struct ice_q_vector *q_vector)
q_vector->reg_idx = vf->first_vector_idx + q_vector->vf_reg_idx;
}
-/**
- * ice_sriov_set_msix_res - Set any used MSIX resources
- * @pf: pointer to PF structure
- * @num_msix_needed: number of MSIX vectors needed for all SR-IOV VFs
- *
- * This function allows SR-IOV resources to be taken from the end of the PF's
- * allowed HW MSIX vectors so that the irq_tracker will not be affected. We
- * just set the pf->sriov_base_vector and return success.
- *
- * If there are not enough resources available, return an error. This should
- * always be caught by ice_set_per_vf_res().
- *
- * Return 0 on success, and -EINVAL when there are not enough MSIX vectors
- * in the PF's space available for SR-IOV.
- */
-static int ice_sriov_set_msix_res(struct ice_pf *pf, u16 num_msix_needed)
-{
- u16 total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
- int vectors_used = ice_get_max_used_msix_vector(pf);
- int sriov_base_vector;
-
- sriov_base_vector = total_vectors - num_msix_needed;
-
- /* make sure we only grab irq_tracker entries from the list end and
- * that we have enough available MSIX vectors
- */
- if (sriov_base_vector < vectors_used)
- return -EINVAL;
-
- pf->sriov_base_vector = sriov_base_vector;
-
- return 0;
-}
-
/**
* ice_set_per_vf_res - check if vectors and queues are available
* @pf: pointer to the PF structure
@@ -429,11 +372,9 @@ static int ice_sriov_set_msix_res(struct ice_pf *pf, u16 num_msix_needed)
*/
static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
{
- int vectors_used = ice_get_max_used_msix_vector(pf);
u16 num_msix_per_vf, num_txq, num_rxq, avail_qs;
int msix_avail_per_vf, msix_avail_for_sriov;
struct device *dev = ice_pf_to_dev(pf);
- int err;
lockdep_assert_held(&pf->vfs.table_lock);
@@ -441,8 +382,7 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
return -EINVAL;
/* determine MSI-X resources per VF */
- msix_avail_for_sriov = pf->hw.func_caps.common_cap.num_msix_vectors -
- vectors_used;
+ msix_avail_for_sriov = pf->virt_irq_tracker.num_entries;
msix_avail_per_vf = msix_avail_for_sriov / num_vfs;
if (msix_avail_per_vf >= ICE_NUM_VF_MSIX_MED) {
num_msix_per_vf = ICE_NUM_VF_MSIX_MED;
@@ -481,13 +421,6 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
return -ENOSPC;
}
- err = ice_sriov_set_msix_res(pf, num_msix_per_vf * num_vfs);
- if (err) {
- dev_err(dev, "Unable to set MSI-X resources for %d VFs, err %d\n",
- num_vfs, err);
- return err;
- }
-
/* only allow equal Tx/Rx queue count (i.e. queue pairs) */
pf->vfs.num_qps_per = min_t(int, num_txq, num_rxq);
pf->vfs.num_msix_per = num_msix_per_vf;
@@ -497,52 +430,6 @@ static int ice_set_per_vf_res(struct ice_pf *pf, u16 num_vfs)
return 0;
}
-/**
- * ice_sriov_get_irqs - get irqs for SR-IOV usacase
- * @pf: pointer to PF structure
- * @needed: number of irqs to get
- *
- * This returns the first MSI-X vector index in PF space that is used by this
- * VF. This index is used when accessing PF relative registers such as
- * GLINT_VECT2FUNC and GLINT_DYN_CTL.
- * This will always be the OICR index in the AVF driver so any functionality
- * using vf->first_vector_idx for queue configuration_id: id of VF which will
- * use this irqs
- *
- * Only SRIOV specific vectors are tracked in sriov_irq_bm. SRIOV vectors are
- * allocated from the end of global irq index. First bit in sriov_irq_bm means
- * last irq index etc. It simplifies extension of SRIOV vectors.
- * They will be always located from sriov_base_vector to the last irq
- * index. While increasing/decreasing sriov_base_vector can be moved.
- */
-static int ice_sriov_get_irqs(struct ice_pf *pf, u16 needed)
-{
- int res = bitmap_find_next_zero_area(pf->sriov_irq_bm,
- pf->sriov_irq_size, 0, needed, 0);
- /* conversion from number in bitmap to global irq index */
- int index = pf->sriov_irq_size - res - needed;
-
- if (res >= pf->sriov_irq_size || index < pf->sriov_base_vector)
- return -ENOENT;
-
- bitmap_set(pf->sriov_irq_bm, res, needed);
- return index;
-}
-
-/**
- * ice_sriov_free_irqs - free irqs used by the VF
- * @pf: pointer to PF structure
- * @vf: pointer to VF structure
- */
-static void ice_sriov_free_irqs(struct ice_pf *pf, struct ice_vf *vf)
-{
- /* Move back from first vector index to first index in bitmap */
- int bm_i = pf->sriov_irq_size - vf->first_vector_idx - vf->num_msix;
-
- bitmap_clear(pf->sriov_irq_bm, bm_i, vf->num_msix);
- vf->first_vector_idx = 0;
-}
-
/**
* ice_init_vf_vsi_res - initialize/setup VF VSI resources
* @vf: VF to initialize/setup the VSI for
@@ -556,7 +443,7 @@ static int ice_init_vf_vsi_res(struct ice_vf *vf)
struct ice_vsi *vsi;
int err;
- vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+ vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
if (vf->first_vector_idx < 0)
return -ENOMEM;
@@ -856,16 +743,10 @@ static int ice_create_vf_entries(struct ice_pf *pf, u16 num_vfs)
*/
static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
{
- int total_vectors = pf->hw.func_caps.common_cap.num_msix_vectors;
struct device *dev = ice_pf_to_dev(pf);
struct ice_hw *hw = &pf->hw;
int ret;
- pf->sriov_irq_bm = bitmap_zalloc(total_vectors, GFP_KERNEL);
- if (!pf->sriov_irq_bm)
- return -ENOMEM;
- pf->sriov_irq_size = total_vectors;
-
/* Disable global interrupt 0 so we don't try to handle the VFLR. */
wr32(hw, GLINT_DYN_CTL(pf->oicr_irq.index),
ICE_ITR_NONE << GLINT_DYN_CTL_ITR_INDX_S);
@@ -918,7 +799,6 @@ static int ice_ena_vfs(struct ice_pf *pf, u16 num_vfs)
/* rearm interrupts here */
ice_irq_dynamic_ena(hw, NULL, NULL);
clear_bit(ICE_OICR_INTR_DIS, pf->state);
- bitmap_free(pf->sriov_irq_bm);
return ret;
}
@@ -992,16 +872,7 @@ u32 ice_sriov_get_vf_total_msix(struct pci_dev *pdev)
{
struct ice_pf *pf = pci_get_drvdata(pdev);
- return pf->sriov_irq_size - ice_get_max_used_msix_vector(pf);
-}
-
-static int ice_sriov_move_base_vector(struct ice_pf *pf, int move)
-{
- if (pf->sriov_base_vector - move < ice_get_max_used_msix_vector(pf))
- return -ENOMEM;
-
- pf->sriov_base_vector -= move;
- return 0;
+ return pf->virt_irq_tracker.num_entries;
}
static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
@@ -1020,7 +891,8 @@ static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
continue;
ice_dis_vf_mappings(tmp_vf);
- ice_sriov_free_irqs(pf, tmp_vf);
+ ice_virt_free_irqs(pf, tmp_vf->first_vector_idx,
+ tmp_vf->num_msix);
vf_ids[to_remap] = tmp_vf->vf_id;
to_remap += 1;
@@ -1032,7 +904,7 @@ static void ice_sriov_remap_vectors(struct ice_pf *pf, u16 restricted_id)
continue;
tmp_vf->first_vector_idx =
- ice_sriov_get_irqs(pf, tmp_vf->num_msix);
+ ice_virt_get_irqs(pf, tmp_vf->num_msix);
/* there is no need to rebuild VSI as we are only changing the
* vector indexes not amount of MSI-X or queues
*/
@@ -1105,20 +977,15 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
prev_msix = vf->num_msix;
prev_queues = vf->num_vf_qs;
- if (ice_sriov_move_base_vector(pf, msix_vec_count - prev_msix)) {
- ice_put_vf(vf);
- return -ENOSPC;
- }
-
ice_dis_vf_mappings(vf);
- ice_sriov_free_irqs(pf, vf);
+ ice_virt_free_irqs(pf, vf->first_vector_idx, vf->num_msix);
/* Remap all VFs beside the one is now configured */
ice_sriov_remap_vectors(pf, vf->vf_id);
vf->num_msix = msix_vec_count;
vf->num_vf_qs = queues;
- vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+ vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
if (vf->first_vector_idx < 0)
goto unroll;
@@ -1147,7 +1014,8 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
vf->num_msix = prev_msix;
vf->num_vf_qs = prev_queues;
- vf->first_vector_idx = ice_sriov_get_irqs(pf, vf->num_msix);
+
+ vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
if (vf->first_vector_idx < 0) {
ice_put_vf(vf);
return -EINVAL;
--
2.42.0
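
Resizing one VF's vector count with this scheme is a free/re-grab cycle with a
rollback, since ranges must stay contiguous. The shape of
ice_sriov_set_msix_vec_count() after this patch, stripped to its skeleton (a
condensation of the hunks above, not a complete function):

	u16 prev_msix = vf->num_msix;

	ice_dis_vf_mappings(vf);
	ice_virt_free_irqs(pf, vf->first_vector_idx, vf->num_msix);

	/* repack the other VFs, then claim a range of the new size */
	ice_sriov_remap_vectors(pf, vf->vf_id);

	vf->num_msix = msix_vec_count;
	vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
	if (vf->first_vector_idx < 0) {
		/* roll back: a range of the old size fit before the remap,
		 * so it is expected to fit again */
		vf->num_msix = prev_msix;
		vf->first_vector_idx = ice_virt_get_irqs(pf, vf->num_msix);
	}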
* [iwl-next v9 9/9] ice: init flow director before RDMA
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
` (7 preceding siblings ...)
2024-12-03 6:58 ` [iwl-next v9 8/9] ice: simplify VF MSI-X managing Michal Swiatkowski
@ 2024-12-03 6:58 ` Michal Swiatkowski
8 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2024-12-03 6:58 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, pawel.chmielewski, sridhar.samudrala, jacob.e.keller,
pio.raczynski, konrad.knitter, marcin.szycik, wojciech.drewek,
nex.sw.ncis.nat.hpm.dev, przemyslaw.kitszel, jiri, horms,
David.Laight, pmenzel, mschmidt, himasekharx.reddy.pucha,
rafal.romanowski
Flow director needs only one MSI-X vector. Initialize it before RDMA so that
this one vector is reserved for flow director even when MSI-X is scarce.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
---
drivers/net/ethernet/intel/ice/ice_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 7b9be612cf33..576bff42f440 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5180,11 +5180,12 @@ int ice_load(struct ice_pf *pf)
ice_napi_add(vsi);
+ ice_init_features(pf);
+
err = ice_init_rdma(pf);
if (err)
goto err_init_rdma;
- ice_init_features(pf);
ice_service_task_restart(pf);
clear_bit(ICE_DOWN, pf->state);
@@ -5192,6 +5193,7 @@ int ice_load(struct ice_pf *pf)
return 0;
err_init_rdma:
+ ice_deinit_features(pf);
ice_tc_indir_block_unregister(vsi);
err_tc_indir_block_register:
ice_unregister_netdev(vsi);
@@ -5215,8 +5217,8 @@ void ice_unload(struct ice_pf *pf)
devl_assert_locked(priv_to_devlink(pf));
- ice_deinit_features(pf);
ice_deinit_rdma(pf);
+ ice_deinit_features(pf);
ice_tc_indir_block_unregister(vsi);
ice_unregister_netdev(vsi);
ice_devlink_destroy_pf_port(pf);
--
2.42.0
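
The ordering matters because vectors are now handed out first come, first
served: whichever feature probes first gets its request, and later features
shrink or switch off. A toy model of the effect (hypothetical vector counts,
plain C, not driver code):

	#include <stdio.h>

	static int pool = 4;	/* vectors left after OICR + eth, say */

	static int grab(const char *who, int want, int min)
	{
		int got = want <= pool ? want : pool;

		if (got < min)
			got = 0;	/* below minimum: feature stays off */
		pool -= got;
		printf("%s: wanted %d, got %d\n", who, want, got);
		return got;
	}

	int main(void)
	{
		/* this patch's order: FD takes its one vector first, RDMA
		 * falls back to what is left (3 here, above its minimum) */
		grab("flow director", 1, 1);
		grab("rdma", 8, 2);

		/* in the reverse order RDMA would drain the pool and flow
		 * director would come up empty -- hence the reordering */
		return 0;
	}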
* Re: [Intel-wired-lan] [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma
2024-12-03 6:58 ` [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma Michal Swiatkowski
@ 2025-02-13 19:20 ` Marcin Szycik
2025-02-14 5:41 ` Michal Swiatkowski
0 siblings, 1 reply; 12+ messages in thread
From: Marcin Szycik @ 2025-02-13 19:20 UTC (permalink / raw)
To: Michal Swiatkowski, intel-wired-lan
Cc: himasekharx.reddy.pucha, pmenzel, marcin.szycik, netdev,
rafal.romanowski, konrad.knitter, pawel.chmielewski, horms,
David.Laight, nex.sw.ncis.nat.hpm.dev, pio.raczynski,
sridhar.samudrala, jacob.e.keller, jiri, przemyslaw.kitszel,
Tony Nguyen
On 03.12.2024 07:58, Michal Swiatkowski wrote:
> Move the responsibility of requesting MSI-X for the RDMA feature from the
> ice driver to the irdma driver. This allows a simple fallback when there
> are not enough MSI-X vectors available.
>
> Change the number of MSI-X vectors used for control from 4 to 1, as more
> than one is not needed for this purpose.
Hi, I'm observing KASAN reports or a kernel panic when attempting to remove
irdma with this patchset. This patch is most probably the culprit, since it
touches the functions seen in the splats.
Reproducer:
sudo rmmod irdma
Minified splat(s):
BUG: KASAN: use-after-free in irdma_remove+0x257/0x2d0 [irdma]
Call Trace:
<TASK>
? __pfx__raw_spin_lock_irqsave+0x10/0x10
? kfree+0x253/0x450
? irdma_remove+0x257/0x2d0 [irdma]
kasan_report+0xed/0x120
? irdma_remove+0x257/0x2d0 [irdma]
irdma_remove+0x257/0x2d0 [irdma]
auxiliary_bus_remove+0x56/0x80
device_release_driver_internal+0x371/0x530
? kernfs_put.part.0+0x147/0x310
driver_detach+0xbf/0x180
bus_remove_driver+0x11b/0x2a0
auxiliary_driver_unregister+0x1a/0x50
irdma_exit_module+0x40/0x4c [irdma]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:ice_free_rdma_qvector+0x2a/0xa0 [ice]
Call Trace:
? ice_free_rdma_qvector+0x2a/0xa0 [ice]
irdma_remove+0x179/0x2d0 [irdma]
auxiliary_bus_remove+0x56/0x80
device_release_driver_internal+0x371/0x530
? kobject_put+0x61/0x4b0
driver_detach+0xbf/0x180
bus_remove_driver+0x11b/0x2a0
auxiliary_driver_unregister+0x1a/0x50
irdma_exit_module+0x40/0x4c [irdma]
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
> ---
> drivers/infiniband/hw/irdma/main.h | 3 ++
> drivers/net/ethernet/intel/ice/ice.h | 1 -
> include/linux/net/intel/iidc.h | 2 +
> drivers/infiniband/hw/irdma/hw.c | 2 -
> drivers/infiniband/hw/irdma/main.c | 46 ++++++++++++++++-
> drivers/net/ethernet/intel/ice/ice_idc.c | 64 ++++++------------------
> drivers/net/ethernet/intel/ice/ice_irq.c | 3 +-
> 7 files changed, 65 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/infiniband/hw/irdma/main.h b/drivers/infiniband/hw/irdma/main.h
> index 9f0ed6e84471..ef9a9b79d711 100644
> --- a/drivers/infiniband/hw/irdma/main.h
> +++ b/drivers/infiniband/hw/irdma/main.h
> @@ -117,6 +117,9 @@ extern struct auxiliary_driver i40iw_auxiliary_drv;
>
> #define IRDMA_IRQ_NAME_STR_LEN (64)
>
> +#define IRDMA_NUM_AEQ_MSIX 1
> +#define IRDMA_MIN_MSIX 2
> +
> enum init_completion_state {
> INVALID_STATE = 0,
> INITIAL_STATE,
> diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
> index f497f7d6eb71..14a90c916d43 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -97,7 +97,6 @@
> #define ICE_MIN_LAN_OICR_MSIX 1
> #define ICE_MIN_MSIX (ICE_MIN_LAN_TXRX_MSIX + ICE_MIN_LAN_OICR_MSIX)
> #define ICE_FDIR_MSIX 2
> -#define ICE_RDMA_NUM_AEQ_MSIX 4
> #define ICE_NO_VSI 0xffff
> #define ICE_VSI_MAP_CONTIG 0
> #define ICE_VSI_MAP_SCATTER 1
> diff --git a/include/linux/net/intel/iidc.h b/include/linux/net/intel/iidc.h
> index 1c1332e4df26..13274c3def66 100644
> --- a/include/linux/net/intel/iidc.h
> +++ b/include/linux/net/intel/iidc.h
> @@ -78,6 +78,8 @@ int ice_del_rdma_qset(struct ice_pf *pf, struct iidc_rdma_qset_params *qset);
> int ice_rdma_request_reset(struct ice_pf *pf, enum iidc_reset_type reset_type);
> int ice_rdma_update_vsi_filter(struct ice_pf *pf, u16 vsi_id, bool enable);
> void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos);
> +int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
> +void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry);
>
> /* Structure representing auxiliary driver tailored information about the core
> * PCI dev, each auxiliary driver using the IIDC interface will have an
> diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c
> index ad50b77282f8..69ce1862eabe 100644
> --- a/drivers/infiniband/hw/irdma/hw.c
> +++ b/drivers/infiniband/hw/irdma/hw.c
> @@ -498,8 +498,6 @@ static int irdma_save_msix_info(struct irdma_pci_f *rf)
> iw_qvlist->num_vectors = rf->msix_count;
> if (rf->msix_count <= num_online_cpus())
> rf->msix_shared = true;
> - else if (rf->msix_count > num_online_cpus() + 1)
> - rf->msix_count = num_online_cpus() + 1;
>
> pmsix = rf->msix_entries;
> for (i = 0, ceq_idx = 0; i < rf->msix_count; i++, iw_qvinfo++) {
> diff --git a/drivers/infiniband/hw/irdma/main.c b/drivers/infiniband/hw/irdma/main.c
> index 3f13200ff71b..1ee8969595d3 100644
> --- a/drivers/infiniband/hw/irdma/main.c
> +++ b/drivers/infiniband/hw/irdma/main.c
> @@ -206,6 +206,43 @@ static void irdma_lan_unregister_qset(struct irdma_sc_vsi *vsi,
> ibdev_dbg(&iwdev->ibdev, "WS: LAN free_res for rdma qset failed.\n");
> }
>
> +static int irdma_init_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
> +{
> + int i;
> +
> + rf->msix_count = num_online_cpus() + IRDMA_NUM_AEQ_MSIX;
> + rf->msix_entries = kcalloc(rf->msix_count, sizeof(*rf->msix_entries),
> + GFP_KERNEL);
> + if (!rf->msix_entries)
> + return -ENOMEM;
> +
> + for (i = 0; i < rf->msix_count; i++)
> + if (ice_alloc_rdma_qvector(pf, &rf->msix_entries[i]))
> + break;
> +
> + if (i < IRDMA_MIN_MSIX) {
> + for (; i > 0; i--)
> + ice_free_rdma_qvector(pf, &rf->msix_entries[i]);
> +
> + kfree(rf->msix_entries);
> + return -ENOMEM;
> + }
> +
> + rf->msix_count = i;
> +
> + return 0;
> +}
> +
> +static void irdma_deinit_interrupts(struct irdma_pci_f *rf, struct ice_pf *pf)
> +{
> + int i;
> +
> + for (i = 0; i < rf->msix_count; i++)
> + ice_free_rdma_qvector(pf, &rf->msix_entries[i]);
> +
> + kfree(rf->msix_entries);
> +}
> +
> static void irdma_remove(struct auxiliary_device *aux_dev)
> {
> struct iidc_auxiliary_dev *iidc_adev = container_of(aux_dev,
> @@ -216,6 +253,7 @@ static void irdma_remove(struct auxiliary_device *aux_dev)
>
> irdma_ib_unregister_device(iwdev);
> ice_rdma_update_vsi_filter(pf, iwdev->vsi_num, false);
> + irdma_deinit_interrupts(iwdev->rf, pf);
>
> pr_debug("INIT: Gen2 PF[%d] device remove success\n", PCI_FUNC(pf->pdev->devfn));
> }
> @@ -230,9 +268,7 @@ static void irdma_fill_device_info(struct irdma_device *iwdev, struct ice_pf *pf
> rf->gen_ops.unregister_qset = irdma_lan_unregister_qset;
> rf->hw.hw_addr = pf->hw.hw_addr;
> rf->pcidev = pf->pdev;
> - rf->msix_count = pf->num_rdma_msix;
> rf->pf_id = pf->hw.pf_id;
> - rf->msix_entries = &pf->msix_entries[pf->rdma_base_vector];
> rf->default_vsi.vsi_idx = vsi->vsi_num;
> rf->protocol_used = pf->rdma_mode & IIDC_RDMA_PROTOCOL_ROCEV2 ?
> IRDMA_ROCE_PROTOCOL_ONLY : IRDMA_IWARP_PROTOCOL_ONLY;
> @@ -281,6 +317,10 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
> irdma_fill_device_info(iwdev, pf, vsi);
> rf = iwdev->rf;
>
> + err = irdma_init_interrupts(rf, pf);
> + if (err)
> + goto err_init_interrupts;
> +
> err = irdma_ctrl_init_hw(rf);
> if (err)
> goto err_ctrl_init;
> @@ -311,6 +351,8 @@ static int irdma_probe(struct auxiliary_device *aux_dev, const struct auxiliary_
> err_rt_init:
> irdma_ctrl_deinit_hw(rf);
> err_ctrl_init:
> + irdma_deinit_interrupts(rf, pf);
> +err_init_interrupts:
> kfree(iwdev->rf);
> ib_dealloc_device(&iwdev->ibdev);
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_idc.c b/drivers/net/ethernet/intel/ice/ice_idc.c
> index 145b27f2a4ce..bab3e81cad5d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_idc.c
> +++ b/drivers/net/ethernet/intel/ice/ice_idc.c
> @@ -228,61 +228,34 @@ void ice_get_qos_params(struct ice_pf *pf, struct iidc_qos_params *qos)
> }
> EXPORT_SYMBOL_GPL(ice_get_qos_params);
>
> -/**
> - * ice_alloc_rdma_qvectors - Allocate vector resources for RDMA driver
> - * @pf: board private structure to initialize
> - */
> -static int ice_alloc_rdma_qvectors(struct ice_pf *pf)
> +int ice_alloc_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
> {
> - if (ice_is_rdma_ena(pf)) {
> - int i;
> -
> - pf->msix_entries = kcalloc(pf->num_rdma_msix,
> - sizeof(*pf->msix_entries),
> - GFP_KERNEL);
> - if (!pf->msix_entries)
> - return -ENOMEM;
> + struct msi_map map = ice_alloc_irq(pf, true);
>
> - /* RDMA is the only user of pf->msix_entries array */
> - pf->rdma_base_vector = 0;
> -
> - for (i = 0; i < pf->num_rdma_msix; i++) {
> - struct msix_entry *entry = &pf->msix_entries[i];
> - struct msi_map map;
> + if (map.index < 0)
> + return -ENOMEM;
>
> - map = ice_alloc_irq(pf, false);
> - if (map.index < 0)
> - break;
> + entry->entry = map.index;
> + entry->vector = map.virq;
>
> - entry->entry = map.index;
> - entry->vector = map.virq;
> - }
> - }
> return 0;
> }
> +EXPORT_SYMBOL_GPL(ice_alloc_rdma_qvector);
>
> /**
> * ice_free_rdma_qvector - free vector resources reserved for RDMA driver
> * @pf: board private structure to initialize
> + * @entry: MSI-X entry to be removed
> */
> -static void ice_free_rdma_qvector(struct ice_pf *pf)
> +void ice_free_rdma_qvector(struct ice_pf *pf, struct msix_entry *entry)
> {
> - int i;
> -
> - if (!pf->msix_entries)
> - return;
> -
> - for (i = 0; i < pf->num_rdma_msix; i++) {
> - struct msi_map map;
> + struct msi_map map;
>
> - map.index = pf->msix_entries[i].entry;
> - map.virq = pf->msix_entries[i].vector;
> - ice_free_irq(pf, map);
> - }
> -
> - kfree(pf->msix_entries);
> - pf->msix_entries = NULL;
> + map.index = entry->entry;
> + map.virq = entry->vector;
> + ice_free_irq(pf, map);
> }
> +EXPORT_SYMBOL_GPL(ice_free_rdma_qvector);
>
> /**
> * ice_adev_release - function to be mapped to AUX dev's release op
> @@ -382,12 +355,6 @@ int ice_init_rdma(struct ice_pf *pf)
> return -ENOMEM;
> }
>
> - /* Reserve vector resources */
> - ret = ice_alloc_rdma_qvectors(pf);
> - if (ret < 0) {
> - dev_err(dev, "failed to reserve vectors for RDMA\n");
> - goto err_reserve_rdma_qvector;
> - }
> pf->rdma_mode |= IIDC_RDMA_PROTOCOL_ROCEV2;
> ret = ice_plug_aux_dev(pf);
> if (ret)
> @@ -395,8 +362,6 @@ int ice_init_rdma(struct ice_pf *pf)
> return 0;
>
> err_plug_aux_dev:
> - ice_free_rdma_qvector(pf);
> -err_reserve_rdma_qvector:
> pf->adev = NULL;
> xa_erase(&ice_aux_id, pf->aux_idx);
> return ret;
> @@ -412,6 +377,5 @@ void ice_deinit_rdma(struct ice_pf *pf)
> return;
>
> ice_unplug_aux_dev(pf);
> - ice_free_rdma_qvector(pf);
> xa_erase(&ice_aux_id, pf->aux_idx);
> }
> diff --git a/drivers/net/ethernet/intel/ice/ice_irq.c b/drivers/net/ethernet/intel/ice/ice_irq.c
> index 1a7d446ab5f1..80c9ee2e64c1 100644
> --- a/drivers/net/ethernet/intel/ice/ice_irq.c
> +++ b/drivers/net/ethernet/intel/ice/ice_irq.c
> @@ -84,11 +84,12 @@ static struct ice_irq_entry *ice_get_irq_res(struct ice_pf *pf, bool dyn_only)
> return entry;
> }
>
> +#define ICE_RDMA_AEQ_MSIX 1
> static int ice_get_default_msix_amount(struct ice_pf *pf)
> {
> return ICE_MIN_LAN_OICR_MSIX + num_online_cpus() +
> (test_bit(ICE_FLAG_FD_ENA, pf->flags) ? ICE_FDIR_MSIX : 0) +
> - (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_NUM_AEQ_MSIX : 0);
> + (ice_is_rdma_ena(pf) ? num_online_cpus() + ICE_RDMA_AEQ_MSIX : 0);
> }
>
> /**
* Re: [Intel-wired-lan] [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma
2025-02-13 19:20 ` [Intel-wired-lan] " Marcin Szycik
@ 2025-02-14 5:41 ` Michal Swiatkowski
0 siblings, 0 replies; 12+ messages in thread
From: Michal Swiatkowski @ 2025-02-14 5:41 UTC (permalink / raw)
To: Marcin Szycik
Cc: Michal Swiatkowski, intel-wired-lan, himasekharx.reddy.pucha,
pmenzel, marcin.szycik, netdev, rafal.romanowski, konrad.knitter,
pawel.chmielewski, horms, David.Laight, nex.sw.ncis.nat.hpm.dev,
pio.raczynski, sridhar.samudrala, jacob.e.keller, jiri,
przemyslaw.kitszel, Tony Nguyen
On Thu, Feb 13, 2025 at 08:20:31PM +0100, Marcin Szycik wrote:
>
>
> On 03.12.2024 07:58, Michal Swiatkowski wrote:
> > Move the responsibility of requesting MSI-X for the RDMA feature from the
> > ice driver to the irdma driver. This allows a simple fallback when there
> > are not enough MSI-X vectors available.
> >
> > Change the number of MSI-X vectors used for control from 4 to 1, as more
> > than one is not needed for this purpose.
>
> Hi, I'm observing KASAN reports or a kernel panic when attempting to remove
> irdma with this patchset. This patch is most probably the culprit, since it
> touches the functions seen in the splats.
>
> Reproducer:
> sudo rmmod irdma
>
> Minified splat(s):
> BUG: KASAN: use-after-free in irdma_remove+0x257/0x2d0 [irdma]
> Call Trace:
> <TASK>
> ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> ? kfree+0x253/0x450
> ? irdma_remove+0x257/0x2d0 [irdma]
> kasan_report+0xed/0x120
> ? irdma_remove+0x257/0x2d0 [irdma]
> irdma_remove+0x257/0x2d0 [irdma]
> auxiliary_bus_remove+0x56/0x80
> device_release_driver_internal+0x371/0x530
> ? kernfs_put.part.0+0x147/0x310
> driver_detach+0xbf/0x180
> bus_remove_driver+0x11b/0x2a0
> auxiliary_driver_unregister+0x1a/0x50
> irdma_exit_module+0x40/0x4c [irdma]
>
> Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
> KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> RIP: 0010:ice_free_rdma_qvector+0x2a/0xa0 [ice]
> Call Trace:
> ? ice_free_rdma_qvector+0x2a/0xa0 [ice]
> irdma_remove+0x179/0x2d0 [irdma]
> auxiliary_bus_remove+0x56/0x80
> device_release_driver_internal+0x371/0x530
> ? kobject_put+0x61/0x4b0
> driver_detach+0xbf/0x180
> bus_remove_driver+0x11b/0x2a0
> auxiliary_driver_unregister+0x1a/0x50
> irdma_exit_module+0x40/0x4c [irdma]
Thanks, I will work on it.
>
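
For what it is worth, one spot in the quoted hunk that fits an error-path bug,
though the splats point at irdma_remove() and the real cause may be elsewhere:
irdma_init_interrupts() allocates entries 0..i-1 before breaking out at index
i, but the unwind loop "for (; i > 0; i--)" frees msix_entries[i] (never
allocated) and never frees msix_entries[0]. A bounds-correct unwind would look
like this (a sketch against the quoted code, not a tested fix):

	if (i < IRDMA_MIN_MSIX) {
		/* entries 0..i-1 were allocated; free exactly those */
		while (i--)
			ice_free_rdma_qvector(pf, &rf->msix_entries[i]);

		kfree(rf->msix_entries);
		return -ENOMEM;
	}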
Thread overview: 12+ messages
2024-12-03 6:58 [iwl-next v9 0/9] ice: managing MSI-X in driver Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 1/9] ice: count combined queues using Rx/Tx count Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 2/9] ice: devlink PF MSI-X max and min parameter Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 3/9] ice: remove splitting MSI-X between features Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 4/9] ice: get rid of num_lan_msix field Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 5/9] ice, irdma: move interrupts code to irdma Michal Swiatkowski
2025-02-13 19:20 ` [Intel-wired-lan] " Marcin Szycik
2025-02-14 5:41 ` Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 6/9] ice: treat dyn_allowed only as suggestion Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 7/9] ice: enable_rdma devlink param Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 8/9] ice: simplify VF MSI-X managing Michal Swiatkowski
2024-12-03 6:58 ` [iwl-next v9 9/9] ice: init flow director before RDMA Michal Swiatkowski