* Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
2026-03-04 16:17 [PATCH] net/mlx5: poll mlx5 eq during irq migration Praveen Kumar Kannoju
@ 2026-03-04 20:11 ` Jason Gunthorpe
[not found] ` <CH3PR10MB7704DD1E6B9A671796FC6B528C7DA@CH3PR10MB7704.namprd10.prod.outlook.com>
2026-03-05 4:17 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2026-03-04 20:11 UTC (permalink / raw)
To: Praveen Kumar Kannoju
Cc: saeedm, leon, tariqt, mbloch, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, linux-rdma, linux-kernel, rama.nichanamatlu,
manjunath.b.patil, anand.a.khoje
On Wed, Mar 04, 2026 at 04:17:04PM +0000, Praveen Kumar Kannoju wrote:
> Lost interrupts have been observed in multiple issues during IRQ
> migration caused by CPU scaling activity. These led to unhandled
> EQEs, causing the corresponding Mellanox transmit queues to fill up
> and time out. This patch addresses the situation by polling the EQ
> associated with the migrating IRQ, to recover any unhandled EQEs and
> keep transmission on the corresponding queue uninterrupted.
What? This does not seem like the right way to do this.
IRQ migration is not supposed to lose interrupts; this seems like an
IRQ layer bug to me. If it is buggy and losing interrupts, it should
probably inject a spurious interrupt around these events so all
devices can enjoy the bug fix.
Basically, you need to explain in a lot more detail why the IRQ was
lost, not just some hand-wavy "migration something something".
BTW, there are known bugs in things like QEMU that can lose interrupts
around changes to the MSI (and worse than that too), but I thought
they were all fixed now?
Jason
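The recovery mechanism described in the quoted patch can be illustrated with a small user-space model. This is a simplified sketch under stated assumptions, not mlx5 driver code: `struct model_eq`, `struct model_eqe`, `model_hw_post()`, `model_next_eqe()` and `model_eq_poll()` are hypothetical, and the ownership check assumes the common scheme in which the software-valid value of an entry's owner bit flips on each pass around the ring. It shows why a poll recovers entries whose interrupt was never delivered, and why an extra, spurious poll of an empty queue does no harm, which is the property Jason's suggestion of injecting a spurious interrupt relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of an event queue; not mlx5 code. */
#define LOG_NENT 2
#define NENT (1u << LOG_NENT)	/* 4-entry ring */

struct model_eqe {
	uint8_t owner;		/* written as the parity of the device's pass */
	uint32_t data;
};

struct model_eq {
	struct model_eqe ring[NENT];
	uint32_t prod_index;	/* advanced by the simulated device */
	uint32_t cons_index;	/* advanced by the driver-side poller */
};

static void model_eq_init(struct model_eq *eq)
{
	/* Owner bit 1 is invalid for software's first pass (parity 0). */
	for (uint32_t i = 0; i < NENT; i++)
		eq->ring[i].owner = 1;
	eq->prod_index = 0;
	eq->cons_index = 0;
}

/* Simulated device writes an entry; interrupt delivery is not modeled. */
static void model_hw_post(struct model_eq *eq, uint32_t data)
{
	struct model_eqe *e = &eq->ring[eq->prod_index & (NENT - 1)];

	/* The model assumes the ring is never overrun. */
	assert(eq->prod_index - eq->cons_index < NENT);
	e->data = data;
	e->owner = (eq->prod_index >> LOG_NENT) & 1;
	eq->prod_index++;
}

/* Next entry owned by software, or NULL if the queue is empty. */
static struct model_eqe *model_next_eqe(struct model_eq *eq)
{
	struct model_eqe *e = &eq->ring[eq->cons_index & (NENT - 1)];

	if ((e->owner ^ (eq->cons_index >> LOG_NENT)) & 1)
		return NULL;
	return e;
}

/* Drain all pending entries and return how many were consumed.
 * Calling it when nothing is pending is harmless and returns 0. */
static uint32_t model_eq_poll(struct model_eq *eq)
{
	uint32_t n = 0;

	while (model_next_eqe(eq)) {
		eq->cons_index++;
		n++;
	}
	return n;
}
```

For example, posting three entries with no interrupt delivered and then calling `model_eq_poll()` returns 3, and an immediate second call returns 0. Producer overrun is deliberately outside the scope of this model.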
* Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: kernel test robot @ 2026-03-05 4:17 UTC (permalink / raw)
To: Praveen Kumar Kannoju, saeedm, leon, tariqt, mbloch,
andrew+netdev, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
linux-kernel
Cc: oe-kbuild-all, rama.nichanamatlu, manjunath.b.patil,
anand.a.khoje, Praveen Kumar Kannoju
Hi Praveen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master v6.16-rc1 next-20260304]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-Kumar-Kannoju/net-mlx5-poll-mlx5-eq-during-irq-migration/20260305-003505
base: net-next/main
patch link: https://lore.kernel.org/r/20260304161704.910564-1-praveen.kannoju%40oracle.com
patch subject: [PATCH] net/mlx5: poll mlx5 eq during irq migration
config: x86_64-rhel-9.4-ltp (https://download.01.org/0day-ci/archive/20260305/202603050528.5JWnahEr-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260305/202603050528.5JWnahEr-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603050528.5JWnahEr-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:958:6: warning: no previous prototype for 'mlx5_eq_reap_irq_notify' [-Wmissing-prototypes]
958 | void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
| ^~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:978:6: warning: no previous prototype for 'mlx5_eq_reap_irq_release' [-Wmissing-prototypes]
978 | void mlx5_eq_reap_irq_release(struct kref *ref) {}
| ^~~~~~~~~~~~~~~~~~~~~~~~
vim +/mlx5_eq_reap_irq_notify +958 drivers/net/ethernet/mellanox/mlx5/core/eq.c
957
> 958 void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
959 {
960 u32 eqe_count;
961 struct mlx5_eq_comp *eq = container_of(notify, struct mlx5_eq_comp, notify);
962
963 if (mlx5_reap_eq_irq_aff_change) {
964 mlx5_core_warn(eq->core.dev, "irqn = 0x%x migration notified, EQ 0x%x: Cons = 0x%x\n",
965 eq->core.irqn, eq->core.eqn, eq->core.cons_index);
966
967 while (!rtnl_trylock())
968 msleep(20);
969
970 eqe_count = mlx5_eq_poll_irq_disabled(eq);
971 if (eqe_count)
972 mlx5_core_warn(eq->core.dev, "Recovered %d eqes on EQ 0x%x\n",
973 eqe_count, eq->core.eqn);
974 rtnl_unlock();
975 }
976 }
977
> 978 void mlx5_eq_reap_irq_release(struct kref *ref) {}
979
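The gcc and clang reports both point at the same fix: callbacks that are only referenced through a function pointer inside eq.c can be given internal linkage with `static`, so no prototype is needed. The user-space sketch below illustrates that pattern under stated assumptions: `struct affinity_notify_model` and `struct eq_comp_model` are hypothetical stand-ins for `struct irq_affinity_notify` and `struct mlx5_eq_comp`, and `container_of` is defined locally with `offsetof` instead of coming from kernel headers. It also prints the `u32` count with an unsigned conversion specifier; the excerpt passes `eqe_count` (a `u32`) to `%d`, which is a signedness mismatch even if default warnings do not flag it.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for the kernel's container_of(); user-space only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct irq_affinity_notify. */
struct affinity_notify_model {
	void (*notify)(struct affinity_notify_model *n, unsigned int new_cpu);
	unsigned int calls;
};

/* Hypothetical stand-in for struct mlx5_eq_comp embedding the notifier. */
struct eq_comp_model {
	uint32_t eqn;
	uint32_t pending;	/* EQEs left behind, e.g. after a lost IRQ */
	struct affinity_notify_model notify;
};

/* static: file-local helper, so -Wmissing-prototypes does not apply. */
static uint32_t eq_poll_model(struct eq_comp_model *eq)
{
	uint32_t n = eq->pending;

	eq->pending = 0;
	return n;
}

/* static: only reached through the notify function pointer. */
static void eq_reap_notify_model(struct affinity_notify_model *n,
				 unsigned int new_cpu)
{
	struct eq_comp_model *eq =
		container_of(n, struct eq_comp_model, notify);
	uint32_t eqe_count;

	n->calls++;
	eqe_count = eq_poll_model(eq);
	if (eqe_count)
		printf("Recovered %" PRIu32 " eqes on EQ 0x%" PRIx32
		       " (cpu %u)\n", eqe_count, eq->eqn, new_cpu);
}

static void eq_comp_model_init(struct eq_comp_model *eq, uint32_t eqn)
{
	eq->eqn = eqn;
	eq->pending = 0;
	eq->notify.notify = eq_reap_notify_model;
	eq->notify.calls = 0;
}
```

Calling `eq.notify.notify(&eq.notify, 3)` on an EQ initialized with `eqn` 7 and two pending entries prints a single "Recovered 2 eqes on EQ 0x7 (cpu 3)" line and clears the pending count; a second call finds nothing and prints nothing.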
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: kernel test robot @ 2026-03-05 8:45 UTC (permalink / raw)
To: Praveen Kumar Kannoju, saeedm, leon, tariqt, mbloch,
andrew+netdev, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
linux-kernel
Cc: llvm, oe-kbuild-all, rama.nichanamatlu, manjunath.b.patil,
anand.a.khoje, Praveen Kumar Kannoju
Hi Praveen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master v7.0-rc2 next-20260304]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-Kumar-Kannoju/net-mlx5-poll-mlx5-eq-during-irq-migration/20260305-003505
base: net-next/main
patch link: https://lore.kernel.org/r/20260304161704.910564-1-praveen.kannoju%40oracle.com
patch subject: [PATCH] net/mlx5: poll mlx5 eq during irq migration
config: x86_64-buildonly-randconfig-002-20260305 (https://download.01.org/0day-ci/archive/20260305/202603051647.fykhqQ3H-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260305/202603051647.fykhqQ3H-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603051647.fykhqQ3H-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:958:6: warning: no previous prototype for function 'mlx5_eq_reap_irq_notify' [-Wmissing-prototypes]
958 | void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
| ^
drivers/net/ethernet/mellanox/mlx5/core/eq.c:958:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
958 | void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
| ^
| static
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:978:6: warning: no previous prototype for function 'mlx5_eq_reap_irq_release' [-Wmissing-prototypes]
978 | void mlx5_eq_reap_irq_release(struct kref *ref) {}
| ^
drivers/net/ethernet/mellanox/mlx5/core/eq.c:978:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
978 | void mlx5_eq_reap_irq_release(struct kref *ref) {}
| ^
| static
2 warnings generated.
vim +/mlx5_eq_reap_irq_notify +958 drivers/net/ethernet/mellanox/mlx5/core/eq.c
957
> 958 void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
959 {
960 u32 eqe_count;
961 struct mlx5_eq_comp *eq = container_of(notify, struct mlx5_eq_comp, notify);
962
963 if (mlx5_reap_eq_irq_aff_change) {
964 mlx5_core_warn(eq->core.dev, "irqn = 0x%x migration notified, EQ 0x%x: Cons = 0x%x\n",
965 eq->core.irqn, eq->core.eqn, eq->core.cons_index);
966
967 while (!rtnl_trylock())
968 msleep(20);
969
970 eqe_count = mlx5_eq_poll_irq_disabled(eq);
971 if (eqe_count)
972 mlx5_core_warn(eq->core.dev, "Recovered %d eqes on EQ 0x%x\n",
973 eqe_count, eq->core.eqn);
974 rtnl_unlock();
975 }
976 }
977
> 978 void mlx5_eq_reap_irq_release(struct kref *ref) {}
979
* Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: kernel test robot @ 2026-03-05 9:29 UTC (permalink / raw)
To: Praveen Kumar Kannoju, saeedm, leon, tariqt, mbloch,
andrew+netdev, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
linux-kernel
Cc: oe-kbuild-all, rama.nichanamatlu, manjunath.b.patil,
anand.a.khoje, Praveen Kumar Kannoju
Hi Praveen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master v7.0-rc2 next-20260304]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-Kumar-Kannoju/net-mlx5-poll-mlx5-eq-during-irq-migration/20260305-003505
base: net-next/main
patch link: https://lore.kernel.org/r/20260304161704.910564-1-praveen.kannoju%40oracle.com
patch subject: [PATCH] net/mlx5: poll mlx5 eq during irq migration
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20260305/202603051743.ceus9qzu-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260305/202603051743.ceus9qzu-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603051743.ceus9qzu-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:958:6: warning: no previous prototype for 'mlx5_eq_reap_irq_notify' [-Wmissing-prototypes]
958 | void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
| ^~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:978:6: warning: no previous prototype for 'mlx5_eq_reap_irq_release' [-Wmissing-prototypes]
978 | void mlx5_eq_reap_irq_release(struct kref *ref) {}
| ^~~~~~~~~~~~~~~~~~~~~~~~
vim +/mlx5_eq_reap_irq_notify +958 drivers/net/ethernet/mellanox/mlx5/core/eq.c
957
> 958 void mlx5_eq_reap_irq_notify(struct irq_affinity_notify *notify, const cpumask_t *mask)
959 {
960 u32 eqe_count;
961 struct mlx5_eq_comp *eq = container_of(notify, struct mlx5_eq_comp, notify);
962
963 if (mlx5_reap_eq_irq_aff_change) {
964 mlx5_core_warn(eq->core.dev, "irqn = 0x%x migration notified, EQ 0x%x: Cons = 0x%x\n",
965 eq->core.irqn, eq->core.eqn, eq->core.cons_index);
966
967 while (!rtnl_trylock())
968 msleep(20);
969
970 eqe_count = mlx5_eq_poll_irq_disabled(eq);
971 if (eqe_count)
972 mlx5_core_warn(eq->core.dev, "Recovered %d eqes on EQ 0x%x\n",
973 eqe_count, eq->core.eqn);
974 rtnl_unlock();
975 }
976 }
977
> 978 void mlx5_eq_reap_irq_release(struct kref *ref) {}
979
* Re: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: kernel test robot @ 2026-03-05 11:16 UTC (permalink / raw)
To: Praveen Kumar Kannoju, saeedm, leon, tariqt, mbloch,
andrew+netdev, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
linux-kernel
Cc: oe-kbuild-all, rama.nichanamatlu, manjunath.b.patil,
anand.a.khoje, Praveen Kumar Kannoju
Hi Praveen,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master v7.0-rc2 next-20260304]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Praveen-Kumar-Kannoju/net-mlx5-poll-mlx5-eq-during-irq-migration/20260305-003505
base: net-next/main
patch link: https://lore.kernel.org/r/20260304161704.910564-1-praveen.kannoju%40oracle.com
patch subject: [PATCH] net/mlx5: poll mlx5 eq during irq migration
config: loongarch-randconfig-r121-20260305 (https://download.01.org/0day-ci/archive/20260305/202603051910.7oo8wCfc-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 15.2.0
sparse: v0.6.5-rc1
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260305/202603051910.7oo8wCfc-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202603051910.7oo8wCfc-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:25:14: sparse: sparse: symbol 'mlx5_reap_eq_irq_aff_change' was not declared. Should it be static?
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:958:6: sparse: sparse: symbol 'mlx5_eq_reap_irq_notify' was not declared. Should it be static?
>> drivers/net/ethernet/mellanox/mlx5/core/eq.c:978:6: sparse: sparse: symbol 'mlx5_eq_reap_irq_release' was not declared. Should it be static?
vim +/mlx5_reap_eq_irq_aff_change +25 drivers/net/ethernet/mellanox/mlx5/core/eq.c
24
> 25 unsigned int mlx5_reap_eq_irq_aff_change;
26 module_param(mlx5_reap_eq_irq_aff_change, int, 0644);
27 MODULE_PARM_DESC(mlx5_reap_eq_irq_aff_change, "mlx5_reap_eq_irq_aff_change: 0 = Disable MLX5 EQ Reap upon IRQ affinity change, \
28 1 = Enable MLX5 EQ Reap upon IRQ affinity change. Default=0");
29 enum {
30 MLX5_EQE_OWNER_INIT_VAL = 0x1,
31 };
32
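The sparse warning on line 25 can be addressed the same way, by declaring `mlx5_reap_eq_irq_aff_change` as `static`. Separately, the excerpt declares the variable `unsigned int` but passes `int` as the `module_param()` type, which also accepts `uint`. The `MODULE_PARM_DESC()` string is continued with a backslash-newline inside the literal, so any leading whitespace on the continuation line becomes part of the description. The user-space sketch below, with hypothetical `spliced` and `concatenated` names and shortened text, contrasts that with adjacent string literals, which the compiler joins with nothing in between.

```c
#include <stdio.h>
#include <string.h>

/* Backslash-newline splices the lines inside the literal, so the
 * indentation of the continuation line ends up in the string. */
static const char spliced[] = "0 = Disable reap, \
        1 = Enable reap. Default=0";

/* Adjacent literals are concatenated with no extra characters. */
static const char concatenated[] = "0 = Disable reap, "
				   "1 = Enable reap. Default=0";

static void show_descriptions(void)
{
	printf("spliced:      [%s] (%zu chars)\n", spliced, strlen(spliced));
	printf("concatenated: [%s] (%zu chars)\n",
	       concatenated, strlen(concatenated));
}
```

Adjacent-literal concatenation is the usual way to wrap long parameter descriptions, because it keeps the source indented without altering the text seen through `modinfo`.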
* RE: [PATCH] net/mlx5: poll mlx5 eq during irq migration
From: Praveen Kannoju @ 2026-03-05 13:15 UTC (permalink / raw)
To: kernel test robot, saeedm@nvidia.com, leon@kernel.org,
tariqt@nvidia.com, mbloch@nvidia.com, andrew+netdev@lunn.ch,
davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: oe-kbuild-all@lists.linux.dev, Rama Nichanamatlu, Manjunath Patil,
Anand Khoje
The patch was applied to the wrong git tree. I will resubmit.