Linux-NVME Archive on lore.kernel.org
From: Sagi Grimberg <sagi@grimberg.me>
To: Jonathan Nicklin <jnicklin@blockbridge.com>
Cc: Christoph Hellwig <hch@lst.de>, Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>,
	linux-nvme@lists.infradead.org
Subject: Re: [PATCH 1/2] nvme-tcp: fix possible hang caused during ctrl deletion
Date: Thu, 29 Sep 2022 21:43:38 +0300
Message-ID: <603d624d-de25-e856-58ea-7a76e3029dbb@grimberg.me>
In-Reply-To: <20220928062326.45119-1-sagi@grimberg.me>

On 9/28/22 09:23, Sagi Grimberg wrote:
> When we delete a controller, we execute the following:
> 1. nvme_stop_ctrl() - stop work elements that may be
> 	inflight or scheduled (including .stop_ctrl(), which
> 	cancels the ctrl error recovery work)
> 2. nvme_remove_namespaces() - which first flushes scan_work
> 	to avoid competing ns addition/removal
> 3. continue to tear down the controller
> 
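For reference, a minimal sketch of the delete path those three steps
describe (simplified from nvme_do_delete_ctrl() and nvme_stop_ctrl() in
drivers/nvme/host/core.c; the exact bodies vary across kernel versions):

  static void nvme_do_delete_ctrl(struct nvme_ctrl *ctrl)
  {
          ...
          nvme_stop_ctrl(ctrl);           /* (1) stop ctrl work elements */
          nvme_remove_namespaces(ctrl);   /* (2) flushes scan_work first */
          ctrl->ops->delete_ctrl(ctrl);   /* (3) transport teardown */
          ...
  }

  void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
  {
          ...
          if (ctrl->ops->stop_ctrl)
                  ctrl->ops->stop_ctrl(ctrl); /* nvme-tcp cancels err_work here */
  }
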
> However, if err_work was scheduled before (1), the situation changes:
> err_work is designed to cancel any inflight I/O, in particular I/O
> originating from the ns scan_work in (2); but because .stop_ctrl()
> cancels it before it ever runs, we can block forward progress of (2),
> as ns scanning is stuck waiting on I/O that will never be cancelled.
> 
> The race is:
> 1. transport layer error observed -> err_work is scheduled
> 2. scan_work executes, discovers a ns, generates I/O to it
> 3. nvme_stop_ctrl() -> .stop_ctrl() -> cancel_work_sync(err_work)
>     - err_work never executed
> 4. nvme_remove_namespaces() -> flush_work(scan_work)
> --> deadlock, because scan_work is blocked on I/O that was supposed
> to be cancelled by err_work, but err_work was cancelled before it
> could execute (see stack trace [1]).
> 
> Fix this by flushing err_work instead of cancelling it, to force it
> to execute and cancel all inflight I/O.
> 
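The change itself is small; a sketch of what the commit message
describes, assuming the err_work member and the to_tcp_ctrl() helper
used by drivers/nvme/host/tcp.c:

  static void nvme_tcp_stop_ctrl(struct nvme_ctrl *ctrl)
  {
          /*
           * flush_work() waits for a queued err_work to execute to
           * completion, so inflight I/O (including I/O issued by
           * scan_work) gets cancelled before teardown proceeds.
           * cancel_work_sync() could instead discard a pending
           * err_work without ever running it.
           */
          flush_work(&to_tcp_ctrl(ctrl)->err_work);
          cancel_delayed_work_sync(&to_tcp_ctrl(ctrl)->connect_work);
  }
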
> [1]:
> --
> Call Trace:
>   <TASK>
>   __schedule+0x390/0x910
>   ? scan_shadow_nodes+0x40/0x40
>   schedule+0x55/0xe0
>   io_schedule+0x16/0x40
>   do_read_cache_page+0x55d/0x850
>   ? __page_cache_alloc+0x90/0x90
>   read_cache_page+0x12/0x20
>   read_part_sector+0x3f/0x110
>   amiga_partition+0x3d/0x3e0
>   ? osf_partition+0x33/0x220
>   ? put_partition+0x90/0x90
>   bdev_disk_changed+0x1fe/0x4d0
>   blkdev_get_whole+0x7b/0x90
>   blkdev_get_by_dev+0xda/0x2d0
>   device_add_disk+0x356/0x3b0
>   nvme_mpath_set_live+0x13c/0x1a0 [nvme_core]
>   ? nvme_parse_ana_log+0xae/0x1a0 [nvme_core]
>   nvme_update_ns_ana_state+0x3a/0x40 [nvme_core]
>   nvme_mpath_add_disk+0x120/0x160 [nvme_core]
>   nvme_alloc_ns+0x594/0xa00 [nvme_core]
>   nvme_validate_or_alloc_ns+0xb9/0x1a0 [nvme_core]
>   ? __nvme_submit_sync_cmd+0x1d2/0x210 [nvme_core]
>   nvme_scan_work+0x281/0x410 [nvme_core]
>   process_one_work+0x1be/0x380
>   worker_thread+0x37/0x3b0
>   ? process_one_work+0x380/0x380
>   kthread+0x12d/0x150
>   ? set_kthread_struct+0x50/0x50
>   ret_from_fork+0x1f/0x30
>   </TASK>
> INFO: task nvme:6725 blocked for more than 491 seconds.
>        Not tainted 5.15.65-f0.el7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:nvme            state:D
>   stack:    0 pid: 6725 ppid:  1761 flags:0x00004000
> Call Trace:
>   <TASK>
>   __schedule+0x390/0x910
>   ? sched_clock+0x9/0x10
>   schedule+0x55/0xe0
>   schedule_timeout+0x24b/0x2e0
>   ? try_to_wake_up+0x358/0x510
>   ? finish_task_switch+0x88/0x2c0
>   wait_for_completion+0xa5/0x110
>   __flush_work+0x144/0x210
>   ? worker_attach_to_pool+0xc0/0xc0
>   flush_work+0x10/0x20
>   nvme_remove_namespaces+0x41/0xf0 [nvme_core]
>   nvme_do_delete_ctrl+0x47/0x66 [nvme_core]
>   nvme_sysfs_delete.cold.96+0x8/0xd [nvme_core]
>   dev_attr_store+0x14/0x30
>   sysfs_kf_write+0x38/0x50
>   kernfs_fop_write_iter+0x146/0x1d0
>   new_sync_write+0x114/0x1b0
>   ? intel_pmu_handle_irq+0xe0/0x420
>   vfs_write+0x18d/0x270
>   ksys_write+0x61/0xe0
>   __x64_sys_write+0x1a/0x20
>   do_syscall_64+0x37/0x90
>   entry_SYSCALL_64_after_hwframe+0x61/0xcb
> --
> 
> Fixes: 3f2304f8c6d6 ("nvme-tcp: add NVMe over TCP host driver")
> Reported-by: Jonathan Nicklin <jnicklin@blockbridge.com>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Jonathan, can I get your Tested-by tag on this?



Thread overview: 5+ messages
2022-09-28  6:23 [PATCH 1/2] nvme-tcp: fix possible hang caused during ctrl deletion Sagi Grimberg
2022-09-28  6:23 ` [PATCH 2/2] nvme-rdma: " Sagi Grimberg
2022-09-29 18:43 ` Sagi Grimberg [this message]
2022-09-29 18:45 ` [PATCH 1/2] nvme-tcp: " Jonathan Nicklin
2022-10-10  8:29 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email.

Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):

  git send-email \
    --in-reply-to=603d624d-de25-e856-58ea-7a76e3029dbb@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=Chaitanya.Kulkarni@wdc.com \
    --cc=hch@lst.de \
    --cc=jnicklin@blockbridge.com \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.