Re: [PATCH] ublk: fix deadlock when reading partition table

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Uday Shankar <ushankar@purestorage.com>
Subject: Re: [PATCH] ublk: fix deadlock when reading partition table
Date: Fri, 12 Dec 2025 16:22:57 +0800	[thread overview]
Message-ID: <aTvQ4UokhqHtKGll@fedora> (raw)
In-Reply-To: <CADUfDZqh-jdJzK4E9bz0V12W=d4vGZOJMgC0F-EzXvd8dT_+4Q@mail.gmail.com>

On Thu, Dec 11, 2025 at 08:56:46PM -0800, Caleb Sander Mateos wrote:
> On Thu, Dec 11, 2025 at 2:27 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Thu, Dec 11, 2025 at 12:30:35PM -0800, Caleb Sander Mateos wrote:
> > > On Thu, Dec 11, 2025 at 12:38 AM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > When one process(such as udev) opens ublk block device (e.g., to read
> > > > the partition table via bdev_open()), a deadlock[1] can occur:
> > > >
> > > > 1. bdev_open() grabs disk->open_mutex
> > > > 2. The process issues read I/O to ublk backend to read partition table
> > >
> > > I'm not sure I understand how a process could be issuing read I/O to
> > > the block device before bdev_open() has returned? Or do you mean that
> > > bdev_open() is issuing read I/O for the partition table via
> > > blkdev_get_whole() -> bdev_disk_changed() -> blk_add_partitions() ->
> > > check_partition()?
> >
> > Yes, disk->open_mutex is grabbed and waiting for reading partition table.
> >
> > >
> > > > 3. In __ublk_complete_rq(), blk_update_request() or blk_mq_end_request()
> > > >    runs bio->bi_end_io() callbacks
> > > > 4. If this triggers fput() on file descriptor of ublk block device, the
> > > >    work may be deferred to current task's task work (see fput() implementation)
> > >
> > > What is the bi_end_io implementation that results in an fput() call?
> >
> > libaio calls fput() via ->ki_complete() via bio->bi_end_io(), io-uring may
> > call it from io_free_rsrc_node().
> >
> > https://github.com/ublk-org/ublksrv/issues/170#issuecomment-3635162644
> >
> > >
> > > > 5. This eventually calls blkdev_release() from the same context
> > > > 6. blkdev_release() tries to grab disk->open_mutex again
> > > > 7. Deadlock: same task waiting for a mutex it already holds
> > > >
> > > > The fix is to run blk_update_request() and blk_mq_end_request() with bottom
> > > > halves disabled. This forces blkdev_release() to run in kernel work-queue
> > > > context instead of current task work context, and allows ublk server to make
> > > > forward progress, and avoids the deadlock.
> > >
> > > The idea here seems reasonable, but I can't say I understand all the
> > > pieces resulting in the deadlock.
> >
> > Please see the following scenarios:
> >
> > 1) task A: fio is running IO over /dev/ublkb0 for 5secs
> >
> > 2) task B: just when fio is exiting, another task is calling into ioctl(RRPART) on
> > /dev/ublkb0, waiting for reading partition with disk->open_mutex held.
> >
> > 3) in ublk server task, for some reason, fput() drops the `struct
> > file`'s last reference from task A, so bdev_release() is called from
> > task_work_run() in ublk server context. However, task B is holding
> > disk->open_mutex, so bdev_release() hangs forever, because this ublk server
> > can't handle IO for task B any more.
> 
> Ah okay this makes more sense, I thought the fput() was for the
> completion of the partition table read. But I see now it can be for
> any other process that happens to have opened the same ublk disk. The
> "userspace thread and not in interrupt/softirq" logic in
> __fput_deferred() to decide to close the file in task work seems
> pretty suspect. The assumption seems to be that blocking a userspace
> thread is fine since it won't be blocking any kernel threads, but
> that's clearly wrong for a userspace block device. I wonder if any
> similar deadlocks are possible with fuse? Pretending the ublk server

It should be one block only issue, cause it is related disk->open_disk.

> thread is in softirq context when completing the seems a bit hacky,
> but I can see it may be the simplest fix.
> 
> Do the blk_mq_end_request() calls in__ublk_do_auto_buf_reg() and
> __ublk_abort_rq() not also need this protection?

I think all needs this protection, will do it in v2.


Thanks,
Ming

     prev parent reply	other threads:[~2025-12-12  8:23 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-11  8:38 [PATCH] ublk: fix deadlock when reading partition table Ming Lei
2025-12-11 20:30 ` Caleb Sander Mateos
2025-12-11 22:26   ` Ming Lei
2025-12-12  4:56     ` Caleb Sander Mateos
2025-12-12  8:22       ` Ming Lei [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aTvQ4UokhqHtKGll@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.