From: Ming Lei <ming.lei@redhat.com>
To: Yoav Cohen <yoav@nvidia.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Jared Holzman <jholzman@nvidia.com>,
	Guy Eisenberg <geisenberg@nvidia.com>,
	Jens Axboe <axboe@kernel.dk>, Ofer Oshri <ofer@nvidia.com>
Subject: Re: ublk: partition scan during START_DEV can block userspace
Date: Thu, 18 Dec 2025 19:20:25 +0800
Message-ID: <aUPjeWYBGLb-GzzI@fedora>
In-Reply-To: <DM4PR12MB63280C5637917C071C2F0D65A9A8A@DM4PR12MB6328.namprd12.prod.outlook.com>

On Thu, Dec 18, 2025 at 10:11:18AM +0000, Yoav Cohen wrote:
> Hi,
> 
> Background:
> We expose a network-managed block device using ublk.
> 
> When issuing START_DEV, the kernel automatically attempts to scan the disk
> partitions. This results in synchronous reads from the device, as shown in the
> stack trace below:
> 
> [<0>] folio_wait_bit_common+0x138/0x310
> [<0>] filemap_read_folio+0x94/0xe0
> [<0>] do_read_cache_folio+0x80/0x1c0
> [<0>] read_cache_folio+0x12/0x30
> [<0>] read_part_sector+0x39/0xe0
> [<0>] read_lba+0x91/0x110
> [<0>] find_valid_gpt.constprop.0+0xe5/0x5d0
> [<0>] efi_partition+0x5b/0x360
> [<0>] check_partition+0x166/0x3c0
> [<0>] blk_add_partitions+0x3e/0x280
> [<0>] bdev_disk_changed+0x149/0x1c0
> [<0>] blkdev_get_whole+0x8c/0xb0
> [<0>] bdev_open+0x2ea/0x3c0
> [<0>] bdev_file_open_by_dev+0xde/0x140
> [<0>] disk_scan_partitions+0x68/0x130
> [<0>] add_disk_fwnode+0x46c/0x490
> [<0>] device_add_disk+0x10/0x20
> [<0>] ublk_ctrl_start_dev.isra.0+0x29d/0x3a0 [ublk_drv]
> [<0>] ublk_ctrl_uring_cmd+0x407/0x600 [ublk_drv]
> [<0>] io_uring_cmd+0xa4/0x150
> 
> Problems observed
> 
> Userspace crash can leave the process stuck
> 
> If the ublk userspace server crashes while this partition-scan I/O is in
> progress, the process may fail to terminate cleanly. For example:
> 
> yoav@nvme195:~$ sudo cat /proc/3083/stack
> [<0>] do_exit+0xd7/0xa50
> [<0>] do_group_exit+0x34/0x90
> [<0>] get_signal+0x928/0x950
> [<0>] arch_do_signal_or_restart+0x41/0x260
> [<0>] irqentry_exit_to_user_mode+0x13b/0x1d0
> [<0>] irqentry_exit+0x43/0x50
> [<0>] sysvec_reschedule_ipi+0x65/0x110
> [<0>] asm_sysvec_reschedule_ipi+0x1b/0x20
> 
> At this point, the server is no longer able to serve I/O, yet the kernel is
> still waiting for completion of the partition-scan reads, preventing proper
> shutdown and recovery.
> 
> Restart requires serving partition scan I/O
> 
> Even without a crash, restarting the userspace application requires handling
> these implicit partition-scan requests. This is undesirable for our use case,
> as the device contents are managed remotely and partition probing is not always
> meaningful or wanted.
> 
> No way to suppress partition scanning in ublk
> 
> We considered introducing an option to set GD_SUPPRESS_PART_SCAN at device
> startup, and possibly triggering partition scanning later from userspace when
> appropriate. However, it is not clear whether this is the correct approach, nor
> from which context such a rescan should safely be initiated.
> 
> Questions / discussion points
> 
> Is there an existing, recommended way for ublk devices to suppress automatic
> partition scanning at START_DEV time?

No, there isn't.

> 
> Would it make sense to add a ublk-specific option to control
> GD_SUPPRESS_PART_SCAN, similar to how some other drivers handle this?

Yes, I think it is reasonable to add this feature flag via `ublksrv_ctrl_dev_info`.
It would be useful and flexible to let users scan partitions themselves.
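
For illustration, a minimal sketch of how userspace could request a partition
scan by itself once the device is up, using the generic BLKRRPART block ioctl.
The helper name rescan_partitions() and the device path in the usage note are
hypothetical. Whether the kernel accepts BLKRRPART on a ublk device started
with the proposed flag is an assumption: that depends on how the feature ends
up being implemented, and the ioctl may be rejected while partition scanning
is still suppressed on the disk. The caller needs CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>	/* BLKRRPART */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the kernel to re-read the partition table of a whole-disk block
 * device. Returns 0 on success or a negative errno value on failure.
 * Hypothetical helper, not part of ublk or libublksrv.
 */
int rescan_partitions(const char *path)
{
	int ret = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -errno;

	/* BLKRRPART takes no argument; the scan itself issues reads to the device. */
	if (ioctl(fd, BLKRRPART) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A server would call this, for example as rescan_partitions("/dev/ublkb0"),
only after it is ready to serve I/O, since the scan reads from the device.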

> 
> Are there alternative approaches to avoid blocking behavior during device
> startup without requiring kernel changes?

This issue should be triggered only when UBLK_F_USER_RECOVERY is set.

add_disk() runs with ub->mutex held, so a subsequent UBLK_U_CMD_DEL_DEV or
STOP_DEV command hangs forever waiting on ub->mutex.

I will think about how to fix this issue. Probably the lock needs to be
released when calling add_disk().


Thanks,
Ming

