From: Ming Lei <ming.lei@redhat.com>
To: Ziyang Zhang <ZiyangZhang@linux.alibaba.com>
Cc: axboe@kernel.dk, xiaoguang.wang@linux.alibaba.com,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	joseph.qi@linux.alibaba.com
Subject: Re: [PATCH V3 4/7] ublk_drv: requeue rqs with recovery feature enabled
Date: Tue, 20 Sep 2022 12:41:48 +0800
Message-ID: <YylEjEply6y+bs0B@T590>
In-Reply-To: <0642eab9-6124-ba42-1585-82eab1ff9e87@linux.alibaba.com>

On Tue, Sep 20, 2022 at 11:34:32AM +0800, Ziyang Zhang wrote:
> On 2022/9/20 11:18, Ming Lei wrote:
> > On Tue, Sep 20, 2022 at 11:04:30AM +0800, Ziyang Zhang wrote:
> >> On 2022/9/20 10:39, Ming Lei wrote:
> >>> On Tue, Sep 20, 2022 at 09:31:54AM +0800, Ziyang Zhang wrote:
> >>>> On 2022/9/19 20:39, Ming Lei wrote:
> >>>>> On Mon, Sep 19, 2022 at 05:12:21PM +0800, Ziyang Zhang wrote:
> >>>>>> On 2022/9/19 11:55, Ming Lei wrote:
> >>>>>>> On Tue, Sep 13, 2022 at 12:17:04PM +0800, ZiyangZhang wrote:
> >>>>>>>> With recovery feature enabled, in ublk_queue_rq or task work
> >>>>>>>> (in exit_task_work or fallback wq), we requeue rqs instead of
> >>>>>>>> ending (aborting) them. Besides, whether the recovery feature is enabled
> >>>>>>>> or disabled, we schedule monitor_work immediately.
> >>>>>>>>
> >>>>>>>> Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
> >>>>>>>> ---
> >>>>>>>>  drivers/block/ublk_drv.c | 34 ++++++++++++++++++++++++++++++++--
> >>>>>>>>  1 file changed, 32 insertions(+), 2 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> >>>>>>>> index 23337bd7c105..b067f33a1913 100644
> >>>>>>>> --- a/drivers/block/ublk_drv.c
> >>>>>>>> +++ b/drivers/block/ublk_drv.c
> >>>>>>>> @@ -682,6 +682,21 @@ static void ubq_complete_io_cmd(struct ublk_io *io, int res)
> >>>>>>>>  
> >>>>>>>>  #define UBLK_REQUEUE_DELAY_MS	3
> >>>>>>>>  
> >>>>>>>> +static inline void __ublk_abort_rq_in_task_work(struct ublk_queue *ubq,
> >>>>>>>> +		struct request *rq)
> >>>>>>>> +{
> >>>>>>>> +	pr_devel("%s: %s q_id %d tag %d io_flags %x.\n", __func__,
> >>>>>>>> +			(ublk_queue_can_use_recovery(ubq)) ? "requeue" : "abort",
> >>>>>>>> +			ubq->q_id, rq->tag, ubq->ios[rq->tag].flags);
> >>>>>>>> +	/* We cannot process this rq so just requeue it. */
> >>>>>>>> +	if (ublk_queue_can_use_recovery(ubq)) {
> >>>>>>>> +		blk_mq_requeue_request(rq, false);
> >>>>>>>> +		blk_mq_delay_kick_requeue_list(rq->q, UBLK_REQUEUE_DELAY_MS);
> >>>>>>>
> >>>>>>> Here you needn't kick the requeue list since we know it can't make
> >>>>>>> progress. And you can do that once before deleting gendisk
> >>>>>>> or the queue is recovered.
> >>>>>>
> >>>>>> No, kicking rq here is necessary.
> >>>>>>
> >>>>>> Consider USER_RECOVERY is enabled and everything goes well.
> >>>>>> User sends STOP_DEV, and we have kicked requeue list in
> >>>>>> ublk_stop_dev() and are going to call del_gendisk().
> >>>>>> However, a crash happens now. Then rqs may be still requeued
> >>>>>> by ublk_queue_rq() because ublk_queue_rq() sees a dying
> >>>>> ubq_daemon. So del_gendisk() will hang because there are
> >>>>> rqs left in the requeue list and no one kicks them.
> >>>>>
> >>>> Why can't you kick the requeue list before calling del_gendisk()?
> >>>>
> >>>> Yes, we can kick requeue list once before calling del_gendisk().
> >>>> But a crash may happen just after kicking but before del_gendisk().
> >>>> So some rqs may be requeued at this moment. But we have already
> >>>> kicked the requeue list! Then del_gendisk() will hang, right?
> >>>
> >>> ->force_abort is set before kicking in ublk_unquiesce_dev(), so
> >>> all new requests are failed immediately instead of being requeued,
> >>> right?
> >>>
> >>
> >> ->force_abort is not helpful here because there may be a fallback wq running
> >> which can requeue rqs after kicking the requeue list.
> > 
> > After ublk_wait_tagset_rqs_idle() returns, there can't be any
> > pending requests in fallback wq or task work, can there?
> 
> Please consider this case: a crash happens while ublk_stop_dev() is
> running. In that case I cannot schedule quiesce_work or call
> ublk_wait_tagset_rqs_idle(). This is because quiesce_work has to
> acquire ub_mutex to quiesce the request queue.

The issue can be addressed more reliably, cleanly, and consistently in
the following way, so you needn't switch between the two modes.

ublk_stop_dev()

	if (ublk_can_use_recovery(ub)) {
		if (ub->dev_info.state == UBLK_S_DEV_LIVE)
			__ublk_quiesce_dev(ub);	/* lockless version */
		ublk_unquiesce_dev(ub);
	}

Meanwhile, it is no longer necessary to disable the recovery feature in
ublk_unquiesce_dev().
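
The ordering above can be sketched as a small userspace model. This is
only an illustration under stated assumptions, not driver code: every
model_* name, the fields of struct model_dev, and the counters are
hypothetical, and the effect of force_abort is taken from the discussion
earlier in this thread (once it is set, requests are failed instead of
being requeued).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in ublk_drv.c. */
enum model_state { MODEL_LIVE, MODEL_QUIESCED, MODEL_DEAD };

struct model_dev {
	enum model_state state;
	bool recovery;		/* stands in for ublk_can_use_recovery() */
	bool force_abort;	/* stands in for ->force_abort */
	bool daemon_dying;	/* the ubq_daemon has crashed */
	int requeued;		/* requests parked on the requeue list */
	int failed;		/* requests ended with an error */
};

/* A request arrives while the daemon may be dying. */
static void model_queue_rq(struct model_dev *d)
{
	if (!d->daemon_dying)
		return;		/* normal dispatch is not modelled */
	if (d->recovery && !d->force_abort)
		d->requeued++;	/* park it and wait for recovery */
	else
		d->failed++;	/* fail it immediately */
}

static void model_quiesce_dev(struct model_dev *d)
{
	d->state = MODEL_QUIESCED;
}

/* Set force_abort first, then kick: every parked request is re-dispatched. */
static void model_unquiesce_dev(struct model_dev *d)
{
	int n = d->requeued;

	d->force_abort = true;
	d->requeued = 0;
	while (n-- > 0)
		model_queue_rq(d);
}

/* Same shape as the pseudocode above. */
static void model_stop_dev(struct model_dev *d)
{
	if (d->recovery) {
		if (d->state == MODEL_LIVE)
			model_quiesce_dev(d);
		model_unquiesce_dev(d);
	}
	d->state = MODEL_DEAD;	/* stands in for del_gendisk() and teardown */
}

/* Crash first, then stop: nothing may remain on the requeue list. */
int model_demo(void)
{
	struct model_dev d = { .state = MODEL_LIVE, .recovery = true,
			       .daemon_dying = true };

	model_queue_rq(&d);
	model_queue_rq(&d);
	assert(d.requeued == 2 && d.failed == 0);

	model_stop_dev(&d);
	assert(d.requeued == 0 && d.failed == 2);

	model_queue_rq(&d);	/* a late request is failed, not requeued */
	assert(d.requeued == 0);
	return d.failed;
}
```

Calling model_demo() from a main() exercises the crash-then-stop case.
In this model, because force_abort is set before the kick, a request
that shows up after the kick can only be failed, so the requeue list
stays empty by the time teardown starts.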




thanks,
Ming

