From: Alan Stern <stern@rowland.harvard.edu>
To: Martin Kepplinger <martin.kepplinger@puri.sm>
Cc: Bart Van Assche <bvanassche@acm.org>,
Can Guo <cang@codeaurora.org>,
linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
kernel@puri.sm
Subject: Re: [PATCH] block: Fix bug in runtime-resume handling
Date: Sat, 29 Aug 2020 14:56:53 -0400
Message-ID: <20200829185653.GB501978@rowland.harvard.edu>
In-Reply-To: <6d22ec22-a0c7-6a9d-439e-38ef87b0207c@puri.sm>

On Sat, Aug 29, 2020 at 06:33:26PM +0200, Martin Kepplinger wrote:
> On 29.08.20 17:26, Alan Stern wrote:
> > Hmmm. I'm wondering about something you wrote back in June
> > (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> >
> > blk_queue_enter() always - especially when sd is runtime
> > suspended and I try to mount as above - sets success to be true
> > for me, so never continues down to blk_pm_request_resume(). All
> > I see is "PM: Removing info for No Bus:sda1".
> >
> > blk_queue_enter() would always set success to be true because pm
> > (derived from the BLK_MQ_REQ_PREEMPT flag) is true. But why was the
> > BLK_MQ_REQ_PREEMPT flag set? In other words, where was
> > blk_queue_enter() called from?
> >
> > Can you get a stack trace (i.e., call dump_stack()) at exactly this
> > point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED? Or
> > do you already know the answer?
> >
> >
>
> I reverted any scsi/block out-of-tree fixes for this.
>
> when I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMPT set) and that's the
> first stack trace I get in this condition, inside of blk_queue_enter():
>
> There is more, but I don't know if that's interesting.
>
> [ 38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
> [ 38.642207] Hardware name: Purism Librem 5r3 (DT)
> [ 38.642213] Call trace:
> [ 38.642233] dump_backtrace+0x0/0x210
> [ 38.642242] show_stack+0x20/0x30
> [ 38.642252] dump_stack+0xc8/0x128
> [ 38.642262] blk_queue_enter+0x1b8/0x2d8
> [ 38.642271] blk_mq_alloc_request+0x54/0xb0
> [ 38.642277] blk_get_request+0x34/0x78
> [ 38.642286] __scsi_execute+0x60/0x1c8
> [ 38.642291] scsi_test_unit_ready+0x88/0x118
> [ 38.642298] sd_check_events+0x110/0x158
> [ 38.642306] disk_check_events+0x68/0x188
> [ 38.642312] disk_clear_events+0x84/0x198
> [ 38.642320] check_disk_change+0x38/0x90
> [ 38.642325] sd_open+0x60/0x148
> [ 38.642330] __blkdev_get+0xcc/0x4c8
> [ 38.642335] __blkdev_get+0x278/0x4c8
> [ 38.642339] blkdev_get+0x128/0x1a8
> [ 38.642345] blkdev_open+0x98/0xb0
> [ 38.642354] do_dentry_open+0x130/0x3c8
> [ 38.642359] vfs_open+0x34/0x40
> [ 38.642366] path_openat+0xa30/0xe40
> [ 38.642372] do_filp_open+0x84/0x100
> [ 38.642377] do_sys_openat2+0x1f4/0x2b0
> [ 38.642382] do_sys_open+0x60/0xa8
> (...)
>
> and of course it doesn't work and /dev/sda1 disappears, see the initial
> discussion that led to your fix.

Great! That's exactly what I was looking for, thank you.

Bart, this is a perfect example of the potential race I've been talking
about in the other email thread. Suppose thread 0 is carrying out a
runtime suspend of a SCSI disk and at the same time, thread 1 is opening
the disk's block device (as we see in the stack trace here). Then we
could have the following:

Thread 0                                Thread 1
--------                                --------
Start runtime suspend

blk_pre_runtime_suspend calls
blk_set_pm_only and sets
q->rpm_status to RPM_SUSPENDING

                                        Call sd_open -> ... -> scsi_test_unit_ready
                                          -> __scsi_execute -> ...
                                          -> blk_queue_enter

                                        Sees BLK_MQ_REQ_PREEMPT set and
                                        RPM_SUSPENDING queue status, so does
                                        not postpone the request

blk_post_runtime_suspend sets
q->rpm_status to RPM_SUSPENDED

The drive goes into runtime suspend

                                        Issues the TEST UNIT READY request

                                        Request fails because the drive is suspended

One way to avoid this race is mutual exclusion: We could make sd_open
prevent the drive from being runtime suspended until it returns.
However I don't like this approach; it would mean tracking down every
possible pathway to __scsi_execute and making sure that runtime suspend
is blocked.

A more fine-grained approach would be to have __scsi_execute itself call
scsi_autopm_get/put_device whenever the rq_flags argument doesn't
contain RQF_PM. This way we wouldn't have to worry about missing any
possible pathways. But it relies on an implicit assumption that
__scsi_execute is the only place where the PREEMPT flag gets set.
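
To make this second option concrete, here is a rough user-space model in
C. It is not kernel code: every model_* name and MODEL_RQF_PM are
hypothetical stand-ins, and only the idea of bracketing non-PM commands
with scsi_autopm_get/put_device comes from the paragraph above. The model
is single-threaded, so the racing suspend attempt is simulated by a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RQF_PM request flag. */
#define MODEL_RQF_PM (1u << 0)

/* Hypothetical device model: a runtime-PM usage count and a power state. */
struct model_sdev {
	int pm_usage;		/* > 0 means runtime suspend is refused */
	bool suspended;
};

/* Models scsi_autopm_get_device(): wake the device and block suspend. */
static void model_autopm_get(struct model_sdev *sdev)
{
	sdev->pm_usage++;
	sdev->suspended = false;
}

/* Models scsi_autopm_put_device(): drop the reference taken above. */
static void model_autopm_put(struct model_sdev *sdev)
{
	assert(sdev->pm_usage > 0);
	sdev->pm_usage--;
}

/* Models a runtime-suspend attempt; refused while a reference is held. */
static bool model_try_suspend(struct model_sdev *sdev)
{
	if (sdev->pm_usage > 0)
		return false;
	sdev->suspended = true;
	return true;
}

/* Models sending a command: it fails if the drive is suspended. */
static int model_issue(const struct model_sdev *sdev)
{
	return sdev->suspended ? -1 : 0;
}

/*
 * Models the proposed __scsi_execute behaviour: hold a runtime-PM
 * reference around every command that is not itself a PM request.
 */
static int model_scsi_execute(struct model_sdev *sdev, unsigned int rq_flags,
			      bool racing_suspend)
{
	bool hold_ref = !(rq_flags & MODEL_RQF_PM);
	int ret;

	if (hold_ref)
		model_autopm_get(sdev);

	/* Simulated suspend attempt arriving while the command is pending. */
	if (racing_suspend)
		(void)model_try_suspend(sdev);

	ret = model_issue(sdev);

	if (hold_ref)
		model_autopm_put(sdev);
	return ret;
}
```

In this model a non-PM command still succeeds when a suspend attempt
arrives in the middle, because the suspend is refused until the reference
is dropped.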

A third possibility is the approach I outlined before, adding a
BLK_MQ_REQ_PM flag. But to avoid the deadlock you pointed out, I would
make blk_queue_enter smarter about whether to postpone a request. The
logic would go like this:

	If !blk_queue_pm_only:
		Allow
	If !BLK_MQ_REQ_PREEMPT:
		Postpone
	If q->rpm_status is RPM_ACTIVE:
		Allow
	If !BLK_MQ_REQ_PM:
		Postpone
	If q->rpm_status is RPM_SUSPENDED:
		Postpone
	Else:
		Allow

The assumption here is that the PREEMPT flag is set whenever the PM flag
is.
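
As a minimal sketch, the decision logic above could be written as a pure
function in C. This is a user-space illustration, not a patch: the
model_* and MODEL_* names are hypothetical, MODEL_REQ_PM stands for the
proposed BLK_MQ_REQ_PM flag, and the assert encodes the assumption that
PREEMPT is set whenever PM is.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the queue's runtime-PM status values. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED,
	MODEL_RPM_SUSPENDING,
};

/* Hypothetical stand-ins for BLK_MQ_REQ_PREEMPT and the proposed PM flag. */
#define MODEL_REQ_PREEMPT (1u << 0)
#define MODEL_REQ_PM      (1u << 1)

enum model_decision { MODEL_ALLOW, MODEL_POSTPONE };

/*
 * Decide whether a request may enter the queue now or must wait.
 * pm_only models the result of blk_queue_pm_only(); status models
 * q->rpm_status.
 */
static enum model_decision
model_enter_decision(bool pm_only, enum model_rpm_status status,
		     unsigned int flags)
{
	/* Stated assumption: PM requests always carry PREEMPT too. */
	assert(!(flags & MODEL_REQ_PM) || (flags & MODEL_REQ_PREEMPT));

	if (!pm_only)
		return MODEL_ALLOW;
	if (!(flags & MODEL_REQ_PREEMPT))
		return MODEL_POSTPONE;
	if (status == MODEL_RPM_ACTIVE)
		return MODEL_ALLOW;
	if (!(flags & MODEL_REQ_PM))
		return MODEL_POSTPONE;
	if (status == MODEL_RPM_SUSPENDED)
		return MODEL_POSTPONE;
	return MODEL_ALLOW;
}
```

With this logic, the TEST UNIT READY request from the race described
earlier (PREEMPT set, no PM flag, queue status RPM_SUSPENDING) would be
postponed rather than sent to a drive that is about to suspend.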

I believe either the second or third possibility would work. The second
looks to be the simplest.

What do you think?

Alan Stern