Re: [PATCH] block: Fix bug in runtime-resume handling

public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed

From: Alan Stern <stern@rowland.harvard.edu>
To: Martin Kepplinger <martin.kepplinger@puri.sm>
Cc: Bart Van Assche <bvanassche@acm.org>,
	Can Guo <cang@codeaurora.org>,
	linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
	kernel@puri.sm
Subject: Re: [PATCH] block: Fix bug in runtime-resume handling
Date: Sat, 29 Aug 2020 14:56:53 -0400	[thread overview]
Message-ID: <20200829185653.GB501978@rowland.harvard.edu> (raw)
In-Reply-To: <6d22ec22-a0c7-6a9d-439e-38ef87b0207c@puri.sm>

On Sat, Aug 29, 2020 at 06:33:26PM +0200, Martin Kepplinger wrote:
> On 29.08.20 17:26, Alan Stern wrote:
> > Hmmm.  I'm wondering about something you wrote back in June 
> > (https://marc.info/?l=linux-scsi&m=159345778431615&w=2):
> > 
> > 	blk_queue_enter() always - especially when sd is runtime 
> > 	suspended and I try to mount as above - sets success to be true 
> > 	for me, so never continues down to bkl_pm_request_resume(). All 
> > 	I see is "PM: Removing info for No Bus:sda1".
> > 
> > blk_queue_enter() would always set success to be true because pm 
> > (derived from the BLK_MQ_REQ_PREEMPT flag) is true.  But why was the 
> > BLK_MQ_REQ_PREEMPT flag set?  In other words, where was 
> > blk_queue_enter() called from?
> > 
> > Can you get a stack trace (i.e., call dump_stack()) at exactly this 
> > point, that is, when pm is true and q->rpm_status is RPM_SUSPENDED?  Or 
> > do you already know the answer?
> > 
> >
> 
> I reverted any scsi/block out-of-tree fixes for this.
> 
> when I try to mount, pm is TRUE (BLK_MQ_REQ_PREEMT set) and that's the
> first stack trace I get in this condition, inside of blk_queue_enter():
> 
> There is more, but I don't know if that's interesting.
> 
> [   38.642202] CPU: 2 PID: 1522 Comm: mount Not tainted 5.8.0-1-librem5 #487
> [   38.642207] Hardware name: Purism Librem 5r3 (DT)
> [   38.642213] Call trace:
> [   38.642233]  dump_backtrace+0x0/0x210
> [   38.642242]  show_stack+0x20/0x30
> [   38.642252]  dump_stack+0xc8/0x128
> [   38.642262]  blk_queue_enter+0x1b8/0x2d8
> [   38.642271]  blk_mq_alloc_request+0x54/0xb0
> [   38.642277]  blk_get_request+0x34/0x78
> [   38.642286]  __scsi_execute+0x60/0x1c8
> [   38.642291]  scsi_test_unit_ready+0x88/0x118
> [   38.642298]  sd_check_events+0x110/0x158
> [   38.642306]  disk_check_events+0x68/0x188
> [   38.642312]  disk_clear_events+0x84/0x198
> [   38.642320]  check_disk_change+0x38/0x90
> [   38.642325]  sd_open+0x60/0x148
> [   38.642330]  __blkdev_get+0xcc/0x4c8
> [   38.642335]  __blkdev_get+0x278/0x4c8
> [   38.642339]  blkdev_get+0x128/0x1a8
> [   38.642345]  blkdev_open+0x98/0xb0
> [   38.642354]  do_dentry_open+0x130/0x3c8
> [   38.642359]  vfs_open+0x34/0x40
> [   38.642366]  path_openat+0xa30/0xe40
> [   38.642372]  do_filp_open+0x84/0x100
> [   38.642377]  do_sys_openat2+0x1f4/0x2b0
> [   38.642382]  do_sys_open+0x60/0xa8
> (...)
> 
> and of course it doesn't work and /dev/sda1 disappears, see the initial
> discussion that led to your fix.

Great!  That's exactly what I was looking for, thank you.

Bart, this is a perfect example of the potential race I've been talking 
about in the other email thread.  Suppose thread 0 is carrying out a 
runtime suspend of a SCSI disk and at the same time, thread 1 is opening 
the disk's block device (as we see in the stack trace here).  Then we 
could have the following:

	Thread 0		Thread 1
	--------		--------
	Start runtime suspend
	blk_pre_runtime_suspend calls
	  blk_set_pm_only and sets
	  q->rpm_status to RPM_SUSPENDING

				Call sd_open -> ... -> scsi_test_unit_ready
				  -> __scsi_execute -> ...
				  -> blk_queue_enter
				Sees BLK_MQ_REQ_PREEMPT set and
				  RPM_SUSPENDING queue status, so does 
				  not postpone the request

	blk_post_runtime_suspend sets
	  q->rpm_status to RPM_SUSPENDED
	The drive goes into runtime suspend

				Issues the TEST UNIT READY request
				Request fails because the drive is suspended

One way to avoid this race is mutual exclusion: We could make sd_open 
prevent the drive from being runtime suspended until it returns.  
However I don't like this approach; it would mean tracking down every 
possible pathway to __scsi_execute and making sure that runtime suspend 
is blocked.

A more fine-grained approach would be to have __scsi_execute itself call 
scsi_autopm_get/put_device whenever the rq_flags argument doesn't 
contain RQF_PM.  This way we wouldn't have to worry about missing any 
possiible pathways.  But it relies on an implicit assumption that 
__scsi_execute is the only place where the PREEMPT flag gets set.

A third possibility is the approach I outlined before, adding a 
BLK_MQ_REQ_PM flag.  But to avoid the deadlock you pointed out, I would 
make blk_queue_enter smarter about whether to postpone a request.  The 
logic would go like this:

	If !blk_queue_pm_only:
		Allow
	If !BLK_MQ_REQ_PREEMPT:
		Postpone
	If q->rpm_status is RPM_ACTIVE:
		Allow
	If !BLK_MQ_REQ_PM:
		Postpone
	If q->rpm_status is RPM_SUSPENDED:
		Postpone
	Else:
		Allow

The assumption here is that the PREEMPT flag is set whenever the PM flag 
is.

I believe either the second or third possibility would work.  The second 
looks to be the simplest

What do you think?

Alan Stern

next prev parent reply	other threads:[~2020-08-29 18:56 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <9b80ca7c-39f8-e52d-2535-8b0baf93c7d1@puri.sm>
     [not found] ` <425990b3-4b0b-4dcf-24dc-4e7e60d5869d@puri.sm>
     [not found]   ` <20200807143002.GE226516@rowland.harvard.edu>
     [not found]     ` <b0abab28-880e-4b88-eb3c-9ffd927d1ed9@puri.sm>
     [not found]       ` <20200808150542.GB256751@rowland.harvard.edu>
     [not found]         ` <d3b6f7b8-5345-1ae1-4f79-5dde226e74f1@puri.sm>
     [not found]           ` <20200809152643.GA277165@rowland.harvard.edu>
     [not found]             ` <60150284-be13-d373-5448-651b72a7c4c9@puri.sm>
     [not found]               ` <20200810141343.GA299045@rowland.harvard.edu>
     [not found]                 ` <6f0c530f-4309-ab1e-393b-83bf8367f59e@puri.sm>
2020-08-23 14:57                   ` [PATCH] block: Fix bug in runtime-resume handling Alan Stern
2020-08-24 17:48                     ` Bart Van Assche
2020-08-24 20:13                       ` Alan Stern
2020-08-26  7:48                         ` Martin Kepplinger
2020-08-27 17:42                           ` Martin Kepplinger
2020-08-27 20:29                             ` Alan Stern
2020-08-29  7:24                               ` Martin Kepplinger
2020-08-29 15:26                                 ` Alan Stern
2020-08-29 16:33                                   ` Martin Kepplinger
2020-08-29 18:56                                     ` Alan Stern [this message]
2020-08-30  0:38                                       ` Bart Van Assche
2020-08-30  1:06                                         ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200829185653.GB501978@rowland.harvard.edu \
    --to=stern@rowland.harvard.edu \
    --cc=bvanassche@acm.org \
    --cc=cang@codeaurora.org \
    --cc=kernel@puri.sm \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.kepplinger@puri.sm \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox