Re: [PATCH 6/9] ublk: improve detection and handling of ublk server exit

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: Uday Shankar <ushankar@purestorage.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org,
	Caleb Sander Mateos <csander@purestorage.com>
Subject: Re: [PATCH 6/9] ublk: improve detection and handling of ublk server exit
Date: Tue, 15 Apr 2025 09:54:13 +0800	[thread overview]
Message-ID: <Z_28RbNHq1Knm6Ef@fedora> (raw)
In-Reply-To: <Z/1x19wmebu/RnPK@dev-ushankar.dev.purestorage.com>

On Mon, Apr 14, 2025 at 02:36:39PM -0600, Uday Shankar wrote:
> On Mon, Apr 14, 2025 at 07:25:47PM +0800, Ming Lei wrote:
> > From: Uday Shankar <ushankar@purestorage.com>
> > 
> > There are currently two ways in which ublk server exit is detected by
> > ublk_drv:
> > 
> > 1. uring_cmd cancellation. If there are any outstanding uring_cmds which
> >    have not been completed to the ublk server when it exits, io_uring
> >    calls the uring_cmd callback with a special cancellation flag as the
> >    issuing task is exiting.
> > 2. I/O timeout. This is needed in addition to the above to handle the
> >    "saturated queue" case, when all I/Os for a given queue are in the
> >    ublk server, and therefore there are no outstanding uring_cmds to
> >    cancel when the ublk server exits.
> > 
> > There are a couple of issues with this approach:
> > 
> > - It is complex and inelegant to have two methods to detect the same
> >   condition
> > - The second method detects ublk server exit only after a long delay
> >   (~30s, the default timeout assigned by the block layer). This delays
> >   the nosrv behavior from kicking in and potential subsequent recovery
> >   of the device.
> > 
> > The second issue is brought to light with the new test_generic_06 which
> > will be added in following patch. It fails before this fix:
> > 
> > selftests: ublk: test_generic_06.sh
> > dev id is 0
> > dd: error writing '/dev/ublkb0': Input/output error
> > 1+0 records in
> > 0+0 records out
> > 0 bytes copied, 30.0611 s, 0.0 kB/s
> > DEAD
> > dd took 31 seconds to exit (>= 5s tolerance)!
> > generic_06 : [FAIL]
> > 
> > Fix this by instead detecting and handling ublk server exit in the
> > character file release callback. This has several advantages:
> > 
> > - This one place can handle both saturated and unsaturated queues. Thus,
> >   it replaces both preexisting methods of detecting ublk server exit.
> > - It runs quickly on ublk server exit - there is no 30s delay.
> > - It starts the process of removing task references in ublk_drv. This is
> >   needed if we want to relax restrictions in the driver like letting
> >   only one thread serve each queue
> > 
> > There is also the disadvantage that the character file release callback
> > can also be triggered by intentional close of the file, which is a
> > significant behavior change. Preexisting ublk servers (libublksrv) are
> > dependent on the ability to open/close the file multiple times. To
> > address this, only transition to a nosrv state if the file is released
> > while the ublk device is live. This allows for programs to open/close
> > the file multiple times during setup. It is still a behavior change if a
> > ublk server decides to close/reopen the file while the device is LIVE
> > (i.e. while it is responsible for serving I/O), but that would be highly
> > unusual. This behavior is in line with what is done by FUSE, which is
> > very similar to ublk in that a userspace daemon is providing services
> > traditionally provided by the kernel.
> > 
> > With this change in, the new test (and all other selftests, and all
> > ublksrv tests) pass:
> > 
> > selftests: ublk: test_generic_06.sh
> > dev id is 0
> > dd: error writing '/dev/ublkb0': Input/output error
> > 1+0 records in
> > 0+0 records out
> > 0 bytes copied, 0.0376731 s, 0.0 kB/s
> > DEAD
> > generic_04 : [PASS]
> > 
> > Signed-off-by: Uday Shankar <ushankar@purestorage.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/block/ublk_drv.c | 231 ++++++++++++++++++++++-----------------
> >  1 file changed, 131 insertions(+), 100 deletions(-)
> > 
> > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> > index b68bd4172fa8..e02f205f6da4 100644
> > --- a/drivers/block/ublk_drv.c
> > +++ b/drivers/block/ublk_drv.c
> > @@ -199,8 +199,6 @@ struct ublk_device {
> >  	struct completion	completion;
> >  	unsigned int		nr_queues_ready;
> >  	unsigned int		nr_privileged_daemon;
> > -
> > -	struct work_struct	nosrv_work;
> >  };
> >  
> >  /* header of ublk_params */
> > @@ -209,8 +207,9 @@ struct ublk_params_header {
> >  	__u32	types;
> >  };
> >  
> > -static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
> > -
> > +static void ublk_stop_dev_unlocked(struct ublk_device *ub);
> > +static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq);
> > +static void __ublk_quiesce_dev(struct ublk_device *ub);
> >  static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
> >  		struct ublk_queue *ubq, int tag, size_t offset);
> >  static inline unsigned int ublk_req_build_flags(struct request *req);
> > @@ -1336,8 +1335,6 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
> >  static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  {
> >  	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> > -	unsigned int nr_inflight = 0;
> > -	int i;
> >  
> >  	if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
> >  		if (!ubq->timeout) {
> > @@ -1348,26 +1345,6 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
> >  		return BLK_EH_DONE;
> >  	}
> >  
> > -	if (!ubq_daemon_is_dying(ubq))
> > -		return BLK_EH_RESET_TIMER;
> > -
> > -	for (i = 0; i < ubq->q_depth; i++) {
> > -		struct ublk_io *io = &ubq->ios[i];
> > -
> > -		if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
> > -			nr_inflight++;
> > -	}
> > -
> > -	/* cancelable uring_cmd can't help us if all commands are in-flight */
> > -	if (nr_inflight == ubq->q_depth) {
> > -		struct ublk_device *ub = ubq->dev;
> > -
> > -		if (ublk_abort_requests(ub, ubq)) {
> > -			schedule_work(&ub->nosrv_work);
> > -		}
> > -		return BLK_EH_DONE;
> > -	}
> > -
> >  	return BLK_EH_RESET_TIMER;
> >  }
> >  
> > @@ -1525,13 +1502,112 @@ static void ublk_reset_ch_dev(struct ublk_device *ub)
> >  	ub->nr_privileged_daemon = 0;
> >  }
> >  
> > +static struct gendisk *ublk_get_disk(struct ublk_device *ub)
> > +{
> > +	struct gendisk *disk;
> > +
> > +	spin_lock(&ub->lock);
> > +	disk = ub->ub_disk;
> > +	if (disk)
> > +		get_device(disk_to_dev(disk));
> > +	spin_unlock(&ub->lock);
> > +
> > +	return disk;
> > +}
> > +
> > +static void ublk_put_disk(struct gendisk *disk)
> > +{
> > +	if (disk)
> > +		put_device(disk_to_dev(disk));
> > +}
> > +
> >  static int ublk_ch_release(struct inode *inode, struct file *filp)
> >  {
> >  	struct ublk_device *ub = filp->private_data;
> > +	struct gendisk *disk;
> > +	int i;
> > +
> > +	/*
> > +	 * disk isn't attached yet, either device isn't live, or it has
> > +	 * been removed already, so we needn't to do anything
> > +	 */
> > +	disk = ublk_get_disk(ub);
> > +	if (!disk)
> > +		goto out;
> > +
> > +	/*
> > +	 * All uring_cmd are done now, so abort any request outstanding to
> > +	 * the ublk server
> > +	 *
> > +	 * This can be done in lockless way because ublk server has been
> > +	 * gone
> > +	 *
> > +	 * More importantly, we have to provide forward progress guarantee
> > +	 * without holding ub->mutex, otherwise control task grabbing
> > +	 * ub->mutex triggers deadlock
> > +	 *
> > +	 * All requests may be inflight, so ->canceling may not be set, set
> > +	 * it now.
> > +	 */
> > +	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
> > +		struct ublk_queue *ubq = ublk_get_queue(ub, i);
> > +
> > +		ubq->canceling = true;
> > +		ublk_abort_queue(ub, ubq);
> > +	}
> > +	blk_mq_kick_requeue_list(disk->queue);
> > +
> > +	/*
> > +	 * All infligh requests have been completed or requeued and any new
> > +	 * request will be failed or requeued via `->canceling` now, so it is
> > +	 * fine to grab ub->mutex now.
> > +	 */
> > +	mutex_lock(&ub->mutex);
> > +
> > +	/* double check after grabbing lock */
> > +	if (!ub->ub_disk)
> > +		goto unlock;
> > +
> > +	/*
> > +	 * Transition the device to the nosrv state. What exactly this
> > +	 * means depends on the recovery flags
> > +	 */
> > +	blk_mq_quiesce_queue(disk->queue);
> > +	if (ublk_nosrv_should_stop_dev(ub)) {
> > +		/*
> > +		 * Allow any pending/future I/O to pass through quickly
> > +		 * with an error. This is needed because del_gendisk
> > +		 * waits for all pending I/O to complete
> > +		 */
> > +		for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
> > +			ublk_get_queue(ub, i)->force_abort = true;
> > +		blk_mq_unquiesce_queue(disk->queue);
> > +
> > +		ublk_stop_dev_unlocked(ub);
> > +	} else {
> > +		if (ublk_nosrv_dev_should_queue_io(ub)) {
> > +			/*
> > +			 * keep request queue as quiesced for queuing new IO
> > +			 * during QUIESCED state
> > +			 *
> > +			 * request queue will be unquiesced in ending
> > +			 * recovery, see ublk_ctrl_end_recovery
> > +			 */
> 
> Does this comment need an update after
> 
> [PATCH 4/9] ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io
> 
> If I read that one right, actually the request queue is not quiesced
> anymore during the UBLK_S_DEV_QUIESCED state

The comment is removed actually in the following patch, but it shouldn't have
been added here, will do it in V2.


Thanks, 
Ming

next prev parent reply	other threads:[~2025-04-15  1:54 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-14 11:25 [PATCH 0/9] ublk: simplify & improve IO canceling Ming Lei
2025-04-14 11:25 ` [PATCH 1/9] ublk: don't try to stop disk if ->ub_disk is NULL Ming Lei
2025-04-14 19:44   ` Uday Shankar
2025-04-15  1:32     ` Ming Lei
2025-04-14 11:25 ` [PATCH 2/9] ublk: properly serialize all FETCH_REQs Ming Lei
2025-04-14 19:58   ` Uday Shankar
2025-04-14 20:39     ` Jens Axboe
2025-04-14 20:52       ` Uday Shankar
2025-04-14 21:00         ` Jens Axboe
2025-04-15  1:40         ` Ming Lei
2025-04-16  1:13       ` Ming Lei
2025-04-16  1:17         ` Jens Axboe
2025-04-16  2:04           ` Ming Lei
2025-04-16  1:04     ` Uday Shankar
2025-04-14 11:25 ` [PATCH 3/9] ublk: add ublk_force_abort_dev() Ming Lei
2025-04-14 20:06   ` Uday Shankar
2025-04-14 11:25 ` [PATCH 4/9] ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io Ming Lei
2025-04-14 20:15   ` Uday Shankar
2025-04-15  1:48     ` Ming Lei
2025-04-14 11:25 ` [PATCH 5/9] ublk: move device reset into ublk_ch_release() Ming Lei
2025-04-14 20:29   ` Uday Shankar
2025-04-15  1:50     ` Ming Lei
2025-04-14 11:25 ` [PATCH 6/9] ublk: improve detection and handling of ublk server exit Ming Lei
2025-04-14 20:36   ` Uday Shankar
2025-04-15  1:54     ` Ming Lei [this message]
2025-04-14 11:25 ` [PATCH 7/9] ublk: remove __ublk_quiesce_dev() Ming Lei
2025-04-14 20:37   ` Uday Shankar
2025-04-14 11:25 ` [PATCH 8/9] ublk: simplify aborting ublk request Ming Lei
2025-04-14 20:42   ` Uday Shankar
2025-04-14 11:25 ` [PATCH 9/9] selftests: ublk: add generic_06 for covering fault inject Ming Lei
2025-04-14 20:44   ` Uday Shankar
2025-04-15  1:57     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z_28RbNHq1Knm6Ef@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ushankar@purestorage.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).