Re: [PATCH] ublk: fix deadlock when reading partition table

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH] ublk: fix deadlock when reading partition table
Date: Fri, 12 Dec 2025 16:22:57 +0800	[thread overview]
Message-ID: <aTvQ4UokhqHtKGll@fedora> (raw)
In-Reply-To: <CADUfDZqh-jdJzK4E9bz0V12W=d4vGZOJMgC0F-EzXvd8dT_+4Q@mail.gmail.com>

On Thu, Dec 11, 2025 at 08:56:46PM -0800, Caleb Sander Mateos wrote:
> On Thu, Dec 11, 2025 at 2:27 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 12:30:35PM -0800, Caleb Sander Mateos wrote:
> > > On Thu, Dec 11, 2025 at 12:38 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > When one process(such as udev) opens ublk block device (e.g., to read
> > > > the partition table via bdev_open()), a deadlock[1] can occur:
> > > >
> > > > 1. bdev_open() grabs disk->open_mutex
> > > > 2. The process issues read I/O to ublk backend to read partition table
> > >
> > > I'm not sure I understand how a process could be issuing read I/O to
> > > the block device before bdev_open() has returned? Or do you mean that
> > > bdev_open() is issuing read I/O for the partition table via
> > > blkdev_get_whole() -> bdev_disk_changed() -> blk_add_partitions() ->
> > > check_partition()?
> >
> > Yes, disk->open_mutex is grabbed and waiting for reading partition table.
> >
> > >
> > > > 3. In __ublk_complete_rq(), blk_update_request() or blk_mq_end_request()
> > > >    runs bio->bi_end_io() callbacks
> > > > 4. If this triggers fput() on file descriptor of ublk block device, the
> > > >    work may be deferred to current task's task work (see fput() implementation)
> > >
> > > What is the bi_end_io implementation that results in an fput() call?
> >
> > libaio calls fput() via ->ki_complete() via bio->bi_end_io(), io-uring may
> > call it from io_free_rsrc_node().
> >
> > https://github.com/ublk-org/ublksrv/issues/170#issuecomment-3635162644
> >
> > >
> > > > 5. This eventually calls blkdev_release() from the same context
> > > > 6. blkdev_release() tries to grab disk->open_mutex again
> > > > 7. Deadlock: same task waiting for a mutex it already holds
> > > >
> > > > The fix is to run blk_update_request() and blk_mq_end_request() with bottom
> > > > halves disabled. This forces blkdev_release() to run in kernel work-queue
> > > > context instead of current task work context, and allows ublk server to make
> > > > forward progress, and avoids the deadlock.
> > >
> > > The idea here seems reasonable, but I can't say I understand all the
> > > pieces resulting in the deadlock.
> >
> > Please see the following scenarios:
> >
> > 1) task A: fio is running IO over /dev/ublkb0 for 5secs
> >
> > 2) task B: just when fio is exiting, another task is calling into ioctl(RRPART) on
> > /dev/ublkb0, waiting for reading partition with disk->open_mutex held.
> >
> > 3) in ublk server task, for some reason, fput() drops the `struct
> > file`'s last reference from task A, so bdev_release() is called from
> > task_work_run() in ublk server context. However, task B is holding
> > disk->open_mutex, so bdev_release() hangs forever, because this ublk server
> > can't handle IO for task B any more.
> 
> Ah okay this makes more sense, I thought the fput() was for the
> completion of the partition table read. But I see now it can be for
> any other process that happens to have opened the same ublk disk. The
> "userspace thread and not in interrupt/softirq" logic in
> __fput_deferred() to decide to close the file in task work seems
> pretty suspect. The assumption seems to be that blocking a userspace
> thread is fine since it won't be blocking any kernel threads, but
> that's clearly wrong for a userspace block device. I wonder if any
> similar deadlocks are possible with fuse? Pretending the ublk server

It should be one block only issue, cause it is related disk->open_disk.

> thread is in softirq context when completing the seems a bit hacky,
> but I can see it may be the simplest fix.
> 
> Do the blk_mq_end_request() calls in__ublk_do_auto_buf_reg() and
> __ublk_abort_rq() not also need this protection?

I think all needs this protection, will do it in v2.


Thanks,
Ming

     prev parent reply	other threads:[~2025-12-12  8:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-11  8:38 [PATCH] ublk: fix deadlock when reading partition table Ming Lei
2025-12-11 20:30 ` Caleb Sander Mateos
2025-12-11 22:26   ` Ming Lei
2025-12-12  4:56     ` Caleb Sander Mateos
2025-12-12  8:22       ` Ming Lei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTvQ4UokhqHtKGll@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox