Linux-NVME Archive on lore.kernel.org
* [PATCH 0/1] Fix a kernel panic in nvme-fc
@ 2026-05-28  9:27 Maurizio Lombardi
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
  2026-06-10  9:23 ` [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
  0 siblings, 2 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-05-28  9:27 UTC (permalink / raw)
  To: kbusch; +Cc: mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner, emilne

We received a RHEL bug report regarding a kernel panic that occurs
when loading/unloading the lpfc module:

Call Trace:
blk_mq_tagset_busy_iter+0x210/0x440
__nvme_fc_abort_outstanding_ios+0x1d8/0x260 [nvme_fc]
nvme_fc_ctrl_ioerr_work+0x4c/0x90 [nvme_fc]
process_one_work+0x1f4/0x500
worker_thread+0x33c/0x510
kthread+0x154/0x170
start_kernel_thread+0x14/0x18

The following patch, which is part of the Rapid Path Failure patchset,
fixes the crash.

https://lore.kernel.org/linux-nvme/20260328004518.1729186-16-mkhalfella@purestorage.com/

Since this patch is independent of the Rapid Path Failure feature itself,
I propose merging it separately so we don't have to wait for the full
feature to be approved.

Mohamed Khalfella (1):
  nvme-fc: Do not cancel requests in io target before it is initialized

 drivers/nvme/host/fc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.54.0




* [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-05-28  9:27 [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
@ 2026-05-28  9:27 ` Maurizio Lombardi
  2026-06-01  7:12   ` Christoph Hellwig
                     ` (3 more replies)
  2026-06-10  9:23 ` [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
  1 sibling, 4 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-05-28  9:27 UTC (permalink / raw)
  To: kbusch; +Cc: mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner, emilne

From: Mohamed Khalfella <mkhalfella@purestorage.com>

A new nvme-fc controller in CONNECTING state that sees an admin request
timeout schedules ctrl->ioerr_work to abort inflight requests. This ends
up calling __nvme_fc_abort_outstanding_ios(), which aborts requests in
both the admin and io tagsets. If fc_ctrl->tag_set has not been
initialized yet, we see the warning below. This happens because
ctrl.queue_count is initialized early in nvme_fc_alloc_ctrl(), so the
queue_count > 1 check passes before the io tagset exists.

nvme nvme0: NVME-FC{0}: starting error recovery Connectivity Loss
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
lpfc 0000:ab:00.0: queue 0 connect admin queue failed (-6).
you didn't initialize this object before use?
turning off the locking correctness validator.
Workqueue: nvme-reset-wq nvme_fc_ctrl_ioerr_work [nvme_fc]
Call Trace:
 <TASK>
 dump_stack_lvl+0x57/0x80
 register_lock_class+0x567/0x580
 __lock_acquire+0x330/0xb90
 lock_acquire.part.0+0xad/0x210
 blk_mq_tagset_busy_iter+0xf9/0xc00
 __nvme_fc_abort_outstanding_ios+0x23f/0x320 [nvme_fc]
 nvme_fc_ctrl_ioerr_work+0x172/0x210 [nvme_fc]
 process_one_work+0x82c/0x1450
 worker_thread+0x5ee/0xfd0
 kthread+0x3a0/0x750
 ret_from_fork+0x439/0x670
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Update the check in __nvme_fc_abort_outstanding_ios() to confirm that the
io tagset was created before iterating over busy requests. Also make sure
to cancel ctrl->ioerr_work before removing the io tagset.
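The guard described above can be illustrated with a small userspace model. This is a hypothetical sketch, not nvme-fc code: the model_* names are illustrative stand-ins, and the busy counter replaces the real blk_mq_tagset_busy_iter() walk over &ctrl->tag_set. It shows why queue_count > 1 alone is not enough. queue_count is set at allocation time, while the io tagset pointer stays NULL until the io queues are created, so the patched condition also requires a non-NULL tagset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an io tagset. */
struct model_tagset {
	int busy;                     /* number of in-flight io requests */
};

/* Hypothetical, simplified stand-in for struct nvme_fc_ctrl. */
struct model_ctrl {
	unsigned int queue_count;     /* set early, as in nvme_fc_alloc_ctrl() */
	struct model_tagset *tagset;  /* NULL until the io tagset is created */
};

/*
 * Models the patched check in __nvme_fc_abort_outstanding_ios():
 * io requests are only touched once the io tagset exists.
 * Returns the number of io requests that would be cancelled.
 */
static int model_abort_io(struct model_ctrl *ctrl)
{
	int cancelled = 0;

	if (ctrl->queue_count > 1 && ctrl->tagset) {
		cancelled = ctrl->tagset->busy;
		ctrl->tagset->busy = 0;   /* all busy requests cancelled */
	}
	return cancelled;
}
```

With the old condition (queue_count > 1 only), a controller still in CONNECTING with queue_count already set and tagset == NULL would dereference the missing tagset; the model simply skips the io side in that case. The second half of the fix, cancel_work_sync() before nvme_remove_io_tag_set(), addresses the reverse race (the tagset disappearing while the work runs) and is not modeled here.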

Reviewed-by: Randy Jennings <randyj@purestorage.com>
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: James Smart <jsmart833426@gmail.com>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
---
 drivers/nvme/host/fc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e4f4528fe2a2..5c6a81333174 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2461,7 +2461,7 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
 	 * io requests back to the block layer as part of normal completions
 	 * (but with error status).
 	 */
-	if (ctrl->ctrl.queue_count > 1) {
+	if (ctrl->ctrl.queue_count > 1 && ctrl->ctrl.tagset) {
 		nvme_quiesce_io_queues(&ctrl->ctrl);
 		nvme_sync_io_queues(&ctrl->ctrl);
 		blk_mq_tagset_busy_iter(&ctrl->tag_set,
@@ -2900,6 +2900,11 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 out_delete_hw_queues:
 	nvme_fc_delete_hw_io_queues(ctrl);
 out_cleanup_tagset:
+	/*
+	 * In CONNECTING state ctrl->ioerr_work will abort both admin
+	 * and io tagsets. Cancel it first before removing io tagset.
+	 */
+	cancel_work_sync(&ctrl->ioerr_work);
 	nvme_remove_io_tag_set(&ctrl->ctrl);
 	nvme_fc_free_io_queues(ctrl);
 
-- 
2.54.0




* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
@ 2026-06-01  7:12   ` Christoph Hellwig
  2026-06-01  9:47   ` Daniel Wagner
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2026-06-01  7:12 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner,
	emilne

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>




* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
  2026-06-01  7:12   ` Christoph Hellwig
@ 2026-06-01  9:47   ` Daniel Wagner
  2026-06-01 10:28     ` Maurizio Lombardi
  2026-06-10 11:35   ` Hannes Reinecke
  2026-06-10 14:35   ` Keith Busch
  3 siblings, 1 reply; 9+ messages in thread
From: Daniel Wagner @ 2026-06-01  9:47 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: kbusch, mlombard, mkhalfella, randyj, hch, linux-nvme, emilne

On Thu, May 28, 2026 at 11:27:34AM +0200, Maurizio Lombardi wrote:
> -	if (ctrl->ctrl.queue_count > 1) {
> +	if (ctrl->ctrl.queue_count > 1 && ctrl->ctrl.tagset) {
>  		nvme_quiesce_io_queues(&ctrl->ctrl);
>  		nvme_sync_io_queues(&ctrl->ctrl);
>  		blk_mq_tagset_busy_iter(&ctrl->tag_set,

Yes, that makes sense.

> @@ -2900,6 +2900,11 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
>  out_delete_hw_queues:
>  	nvme_fc_delete_hw_io_queues(ctrl);
>  out_cleanup_tagset:
> +	/*
> +	 * In CONNECTING state ctrl->ioerr_work will abort both admin
> +	 * and io tagsets. Cancel it first before removing io tagset.
> +	 */
> +	cancel_work_sync(&ctrl->ioerr_work);
>  	nvme_remove_io_tag_set(&ctrl->ctrl);
>  	nvme_fc_free_io_queues(ctrl);

Again, this does make sense, but which tree is this based on? I don't see
this hunk in master or in nvme-7.2.



* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-06-01  9:47   ` Daniel Wagner
@ 2026-06-01 10:28     ` Maurizio Lombardi
  2026-06-01 14:16       ` Daniel Wagner
  0 siblings, 1 reply; 9+ messages in thread
From: Maurizio Lombardi @ 2026-06-01 10:28 UTC (permalink / raw)
  To: Daniel Wagner, Maurizio Lombardi
  Cc: kbusch, mlombard, mkhalfella, randyj, hch, linux-nvme, emilne

On Mon Jun 1, 2026 at 11:47 AM CEST, Daniel Wagner wrote:
>> @@ -2900,6 +2900,11 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
>>  out_delete_hw_queues:
>>  	nvme_fc_delete_hw_io_queues(ctrl);
>>  out_cleanup_tagset:
>> +	/*
>> +	 * In CONNECTING state ctrl->ioerr_work will abort both admin
>> +	 * and io tagsets. Cancel it first before removing io tagset.
>> +	 */
>> +	cancel_work_sync(&ctrl->ioerr_work);
>>  	nvme_remove_io_tag_set(&ctrl->ctrl);
>>  	nvme_fc_free_io_queues(ctrl);
>
> Again, this does make sense, but which tree is this based on? I don't see
> this hunk in master or in nvme-7.2.


I am not sure what you mean; this applies and compiles cleanly
against master.

Maybe you are referring to the comment?

Maurizio



* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-06-01 10:28     ` Maurizio Lombardi
@ 2026-06-01 14:16       ` Daniel Wagner
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel Wagner @ 2026-06-01 14:16 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: Maurizio Lombardi, kbusch, mkhalfella, randyj, hch, linux-nvme,
	emilne

On Mon, Jun 01, 2026 at 12:28:14PM +0200, Maurizio Lombardi wrote:
> > Again, this does make sense, but which tree is this based on? I don't see
> > this hunk in master or in nvme-7.2.
> 
> 
> I am not sure what you mean; this applies and compiles cleanly
> against master.
> 
> Maybe you are referring to the comment?

I was looking at the wrong function. All good.

Reviewed-by: Daniel Wagner <dwagner@suse.de>



* Re: [PATCH 0/1] Fix a kernel panic in nvme-fc
  2026-05-28  9:27 [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
@ 2026-06-10  9:23 ` Maurizio Lombardi
  1 sibling, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2026-06-10  9:23 UTC (permalink / raw)
  To: Maurizio Lombardi, kbusch
  Cc: mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner, emilne

On Thu May 28, 2026 at 11:27 AM CEST, Maurizio Lombardi wrote:
> We received a RHEL bug report regarding a kernel panic that occurs
> when loading/unloading the lpfc module:
>
> Call Trace:
> blk_mq_tagset_busy_iter+0x210/0x440
> __nvme_fc_abort_outstanding_ios+0x1d8/0x260 [nvme_fc]
> nvme_fc_ctrl_ioerr_work+0x4c/0x90 [nvme_fc]
> process_one_work+0x1f4/0x500
> worker_thread+0x33c/0x510
> kthread+0x154/0x170
> start_kernel_thread+0x14/0x18
>
> The following patch, which is part of the Rapid Path Failure patchset,
> fixes the crash.
>
> https://lore.kernel.org/linux-nvme/20260328004518.1729186-16-mkhalfella@purestorage.com/
>
> Since this patch is independent of the Rapid Path Failure feature itself,
> I propose merging it separately so we don't have to wait for the full
> feature to be approved.
>
> Mohamed Khalfella (1):
>   nvme-fc: Do not cancel requests in io target before it is initialized
>
>  drivers/nvme/host/fc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Keith? Can you merge this?

Thanks,
Maurizio



* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
  2026-06-01  7:12   ` Christoph Hellwig
  2026-06-01  9:47   ` Daniel Wagner
@ 2026-06-10 11:35   ` Hannes Reinecke
  2026-06-10 14:35   ` Keith Busch
  3 siblings, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2026-06-10 11:35 UTC (permalink / raw)
  To: Maurizio Lombardi, kbusch
  Cc: mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner, emilne

On 5/28/26 11:27, Maurizio Lombardi wrote:
> From: Mohamed Khalfella <mkhalfella@purestorage.com>
> 
> A new nvme-fc controller in CONNECTING state that sees an admin request
> timeout schedules ctrl->ioerr_work to abort inflight requests. This ends
> up calling __nvme_fc_abort_outstanding_ios(), which aborts requests in
> both the admin and io tagsets. If fc_ctrl->tag_set has not been
> initialized yet, we see the warning below. This happens because
> ctrl.queue_count is initialized early in nvme_fc_alloc_ctrl().
> 
> nvme nvme0: NVME-FC{0}: starting error recovery Connectivity Loss
> INFO: trying to register non-static key.
> The code is fine but needs lockdep annotation, or maybe
> lpfc 0000:ab:00.0: queue 0 connect admin queue failed (-6).
> you didn't initialize this object before use?
> turning off the locking correctness validator.
> Workqueue: nvme-reset-wq nvme_fc_ctrl_ioerr_work [nvme_fc]
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x57/0x80
>   register_lock_class+0x567/0x580
>   __lock_acquire+0x330/0xb90
>   lock_acquire.part.0+0xad/0x210
>   blk_mq_tagset_busy_iter+0xf9/0xc00
>   __nvme_fc_abort_outstanding_ios+0x23f/0x320 [nvme_fc]
>   nvme_fc_ctrl_ioerr_work+0x172/0x210 [nvme_fc]
>   process_one_work+0x82c/0x1450
>   worker_thread+0x5ee/0xfd0
>   kthread+0x3a0/0x750
>   ret_from_fork+0x439/0x670
>   ret_from_fork_asm+0x1a/0x30
>   </TASK>
> 
> Update the check in __nvme_fc_abort_outstanding_ios() to confirm that the
> io tagset was created before iterating over busy requests. Also make sure
> to cancel ctrl->ioerr_work before removing the io tagset.
> 
> Reviewed-by: Randy Jennings <randyj@purestorage.com>
> Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
> Signed-off-by: James Smart <jsmart833426@gmail.com>
> Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
> ---
>   drivers/nvme/host/fc.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 

Would be nice to have a 'Fixes' tag.
Otherwise:

Reviewed-by: Hannes Reinecke <hare@kernel.org>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



* Re: [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized
  2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
                     ` (2 preceding siblings ...)
  2026-06-10 11:35   ` Hannes Reinecke
@ 2026-06-10 14:35   ` Keith Busch
  3 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2026-06-10 14:35 UTC (permalink / raw)
  To: Maurizio Lombardi
  Cc: mlombard, mkhalfella, randyj, hch, linux-nvme, dwagner, emilne

On Thu, May 28, 2026 at 11:27:34AM +0200, Maurizio Lombardi wrote:
> From: Mohamed Khalfella <mkhalfella@purestorage.com>
> 
> A new nvme-fc controller in CONNECTING state that sees an admin request
> timeout schedules ctrl->ioerr_work to abort inflight requests. This ends
> up calling __nvme_fc_abort_outstanding_ios(), which aborts requests in
> both the admin and io tagsets. If fc_ctrl->tag_set has not been
> initialized yet, we see the warning below. This happens because
> ctrl.queue_count is initialized early in nvme_fc_alloc_ctrl().

Thanks, applied to nvme-7.2.



Thread overview: 9+ messages
2026-05-28  9:27 [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
2026-05-28  9:27 ` [PATCH 1/1] nvme-fc: Do not cancel requests in io target before it is initialized Maurizio Lombardi
2026-06-01  7:12   ` Christoph Hellwig
2026-06-01  9:47   ` Daniel Wagner
2026-06-01 10:28     ` Maurizio Lombardi
2026-06-01 14:16       ` Daniel Wagner
2026-06-10 11:35   ` Hannes Reinecke
2026-06-10 14:35   ` Keith Busch
2026-06-10  9:23 ` [PATCH 0/1] Fix a kernel panic in nvme-fc Maurizio Lombardi
