From: Ming Lei <ming.lei@redhat.com>
To: Paolo Valente <paolo.valente@linaro.org>
Cc: Alban Browaeys <alban.browaeys@gmail.com>,
	Jens Axboe <axboe@fb.com>,
	linux-block <linux-block@vger.kernel.org>,
	SERENA ZIVIANI <169364@studenti.unimore.it>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Linus Walleij <linus.walleij@linaro.org>
Subject: Re: blk-mq + bfq:  udevd hang on usb2 storages
Date: Fri, 8 Dec 2017 09:28:04 +0800
Message-ID: <20171208012803.GC21488@ming.t460p>
In-Reply-To: <D03704C9-BF8E-49E1-B552-90393BC9DE4A@linaro.org>

Hi Paolo,

On Thu, Dec 07, 2017 at 07:04:54PM +0100, Paolo Valente wrote:
> 
> > On 4 Dec 2017, at 11:57, Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > On Fri, Dec 01, 2017 at 06:04:29PM +0100, Alban Browaeys wrote:
> >> I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .
> >> 
> >> I have now bisected this issue to commit a6a252e6491443c1c1
> >> "blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ".
> >> 
> >> This is with the USB stick (Sandisk Cruzer, USB Version:  2.10) on
> >> which I hit the regression.
> >> systemctl restart systemd-udevd restores sanity.
> >> 
> >> PS: With a USB3 Lexar (USB Version:  3.00) I get a more severe issue
> >> (not bisected) where I find no way out other than a reboot. My report
> >> to bugzilla has logs from when I was swapping between these keys. The
> >> logs attached there mix what looks like two different behaviors.
> > 
> > Hi Paolo,
> > 
> > From both Alban's trace and my trace, it looks like this issue is in BFQ,
> > since the request can't be retrieved via e->type->ops.mq.dispatch_request()
> > in blk_mq_do_dispatch_sched() after it is inserted into BFQ's queue.
> > 
> >        https://bugzilla.kernel.org/show_bug.cgi?id=198023#c4
> >        https://marc.info/?l=linux-block&m=151214241518562&w=2
> > 
> > BTW, I have tried to reproduce the issue with scsi_debug, but did not succeed,
> > and it can't be reproduced with other schedulers (mq-deadline, none) either.
> > 
> > So could you take a look?
> > 
> 
> Hi Ming, all,
> sorry for the delay, but we preferred to reply directly after finding
> the cause of the problem.  And the cause is that gdisk makes an I/O

Not a problem, :-)

In the previous mail, I just wanted to share our findings with you.

> request that is dispatched to the drive, but apparently never
> completed (as Serena, in CC, discovered).  Or, at least, the execution
> of completed_request in bfq is never triggered.

I think I partly understand this case, and the following info may be
helpful for you:

1) USB's queue depth is one

2) the only pending request completes, and scsi_finish_command() is called

3) inside scsi_finish_command(), scsi_device_unbusy() is called at the
beginning; once it is done, blk_mq_get_dispatch_budget() in blk_mq_do_dispatch_sched()
returns true, so we can start trying to dispatch a request

4) e->type->ops.mq.dispatch_request() is called, but the request in 2)
isn't fully completed yet: completed_request in bfq hasn't run yet, because
it is called later from scsi_end_request()(<-scsi_io_completion()<-scsi_finish_command())

Then no request can be dispatched any more and the hang happens, although
completed_request should still run eventually.
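
To make the ordering in steps 1) to 4) easier to follow, here is a toy
user-space model in Python. It is only a sketch under assumptions: ToyDevice,
ToyScheduler, do_dispatch() and simulate() are hypothetical names, and the
scheduler rule "dispatch nothing while an earlier request is still outstanding"
is a simplification, not BFQ's real logic.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical stand-in for an idling scheduler; not BFQ's real code."""

    def __init__(self):
        self.queue = deque()
        self.outstanding = 0  # dispatched requests whose completion hook has not run

    def insert_request(self, rq):
        self.queue.append(rq)

    def dispatch_request(self):
        # Hand out nothing while an earlier request still looks outstanding.
        if self.outstanding or not self.queue:
            return None
        self.outstanding += 1
        return self.queue.popleft()

    def completed_request(self):
        self.outstanding -= 1


class ToyDevice:
    """Queue depth of one, as in step 1)."""

    def __init__(self):
        self.busy = False

    def get_budget(self):
        if self.busy:
            return False
        self.busy = True
        return True

    def unbusy(self):
        self.busy = False


def do_dispatch(dev, sched):
    """Take the budget, then ask the scheduler for a request."""
    if not dev.get_budget():
        return None
    rq = sched.dispatch_request()
    if rq is None:
        dev.unbusy()  # nothing to send: give the budget back
    return rq


def simulate():
    dev, sched = ToyDevice(), ToyScheduler()
    sched.insert_request("rq1")
    sched.insert_request("rq2")
    events = [("first dispatch", do_dispatch(dev, sched))]
    dev.unbusy()  # step 3): the device is unbusied first
    events.append(("before completion hook", do_dispatch(dev, sched)))  # step 4)
    sched.completed_request()  # the scheduler's completion hook runs late
    events.append(("after completion hook", do_dispatch(dev, sched)))
    return events
```

In this model the second dispatch attempt finds nothing even though "rq2" is
queued. The third attempt succeeds only because simulate() explicitly retries
after the completion hook; whether anything triggers such a retry in the real
code path is exactly the open question.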

> 
> In more detail: gdisk is a process for which bfq performs device idling
> (for good reasons), and, for one such process, bfq does not switch to
> serving another process until the last pending request of the process
> is completed, after which device idling is started, to wait for the
> next request of the process.  So, if such a last request is never
> completed, bfq remains forever waiting for such an event, and then
> refuses forever to deliver requests of other queues.
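
The behavior described above can be illustrated with a small Python sketch.
It is a toy model under assumptions, not BFQ's implementation:
ToyIdlingScheduler and its methods are hypothetical, and real idling involves
timers, budgets and weights that are left out here.

```python
class ToyIdlingScheduler:
    """Serves one process at a time and does not switch while that
    process still has a dispatched request that has not completed."""

    def __init__(self):
        self.queues = {}        # process name -> list of pending requests
        self.in_service = None  # process currently being served
        self.outstanding = 0    # dispatched but not yet completed requests

    def add(self, proc, rq):
        self.queues.setdefault(proc, []).append(rq)

    def dispatch(self):
        if self.in_service is None:
            # Pick the first process that has something queued.
            for proc, pending in self.queues.items():
                if pending:
                    self.in_service = proc
                    break
            else:
                return None
        pending = self.queues[self.in_service]
        if pending:
            self.outstanding += 1
            return (self.in_service, pending.pop(0))
        # Nothing queued for the in-service process: keep waiting for it
        # instead of switching to another process.
        return None

    def complete(self):
        self.outstanding -= 1

    def idle_timeout(self):
        # Idling only starts, and can only expire, once the last
        # dispatched request of the in-service process has completed.
        if self.outstanding == 0:
            self.in_service = None
```

With one request from "gdisk" and one from another process queued, dispatch()
returns the gdisk request first. If complete() is never called, every later
dispatch() returns None and the other process starves. Once complete() and
then idle_timeout() run, the other process's request is dispatched.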
> 
> As for why bfq_completed_request is not executed for the above,

It should be run.

> dispatched request, the reason is either that the bfq_finish_request
> hook is not invoked at all, or that it is invoked, but the request
> does not have the RQF_STARTED flag set.  Discovering which event

RQF_STARTED is set only when __bfq_dispatch_request() finds a request,
which can never happen in this case: we observed that
__bfq_dispatch_request() finds no request even though one has already
been inserted into the BFQ queue.
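
As a rough illustration of how the flag gates the completion accounting, here
is a minimal Python sketch. The names ToyRequest and ToyFlagScheduler and the
bit value chosen for RQF_STARTED are assumptions made for the example; only
the relationship "flag set at dispatch, checked at finish" comes from the
discussion above.

```python
RQF_STARTED = 1 << 0  # hypothetical bit value, for illustration only


class ToyRequest:
    def __init__(self, name):
        self.name = name
        self.rq_flags = 0


class ToyFlagScheduler:
    def __init__(self):
        self.pending = []
        self.completions = 0  # stands in for the completed_request accounting

    def insert(self, rq):
        self.pending.append(rq)

    def dispatch(self):
        # The flag is set only on a request actually found here.
        if not self.pending:
            return None
        rq = self.pending.pop(0)
        rq.rq_flags |= RQF_STARTED
        return rq

    def finish_request(self, rq):
        # A request that never went through dispatch() is not counted.
        if rq.rq_flags & RQF_STARTED:
            self.completions += 1
```

A request returned by dispatch() and then passed to finish_request() bumps the
counter; a request finished without ever being dispatched by the scheduler
does not.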

> occurs is our next step.
> 
> We'll let you know.

Thanks,
Ming
