public inbox for linux-ide@vger.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] ata fixes for 6.5-rc5
@ 2023-08-06  1:29 Damien Le Moal
  2023-08-06  2:26 ` Linus Torvalds
  2023-08-06  2:31 ` pr-tracker-bot
  0 siblings, 2 replies; 7+ messages in thread
From: Damien Le Moal @ 2023-08-06  1:29 UTC (permalink / raw)
  To: Linus Torvalds, linux-ide

Linus,

The following changes since commit 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4:

  Linux 6.5-rc4 (2023-07-30 13:23:47 -0700)

are available in the Git repository at:

  ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata tags/ata-6.5-rc5

for you to fetch changes up to 0a8589055936d8feb56477123a8373ac634018fa:

  ata,scsi: do not issue START STOP UNIT on resume (2023-08-02 17:01:12 +0900)

----------------------------------------------------------------
ata fixes for 6.5-rc5

 - Prevent the scsi disk driver from issuing a START STOP UNIT command
   for ATA devices during system resume as this causes various issues
   reported by multiple users.

----------------------------------------------------------------
Damien Le Moal (1):
      ata,scsi: do not issue START STOP UNIT on resume

 drivers/ata/libata-scsi.c  | 7 +++++++
 drivers/scsi/sd.c          | 9 ++++++---
 include/scsi/scsi_device.h | 1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  1:29 [GIT PULL] ata fixes for 6.5-rc5 Damien Le Moal
@ 2023-08-06  2:26 ` Linus Torvalds
  2023-08-06  2:55   ` Linus Torvalds
  2023-08-06  9:48   ` Damien Le Moal
  2023-08-06  2:31 ` pr-tracker-bot
  1 sibling, 2 replies; 7+ messages in thread
From: Linus Torvalds @ 2023-08-06  2:26 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide

On Sat, 5 Aug 2023 at 18:29, Damien Le Moal <dlemoal@kernel.org> wrote:
>
>  - Prevent the scsi disk driver from issuing a START STOP UNIT command
>    for ATA devices during system resume as this causes various issues
>    reported by multiple users.

Honestly, this seems pretty broken.

I mean, there are literally just three things that manage_start_stop
does, and you just disabled one of those cases.

And it's quite illogical, since one of the two remaining cases is for
the suspend side (the final one is shutdown).

The deeper issue seems to be that ATA is just doing something that
other SCSI devices aren't doing (the whole set
'sdev->manage_start_stop' thing), and while there is one other user in
the sbp2 firewire code, clearly the SCSI people don't really care for
this bit, and the SCSI changes then cause problems.

I agree that the new "don't ask sd.c to send a start command" is the
right thing to do, but I feel it was done in a particularly ugly and
illogical manner.

I think it would have been better if maybe the approach would have been to

 (a) split manage_start_stop into three bits for the three actual
cases it deals with: (stop_on_suspend, stop_on_shutdown,
start_on_resume).

 (b) then not set the "start_on_resume" bit

instead of the current crazy model of first saying "please manage
start/stop for me" and then following up with "oh, but don't do it for
this case".

See what I'm saying? Don't mix "please do X, but don't do subset Y"
bits. It's a completely messy thing and makes it really hard to figure
out what you actually want for no good reason.

I've pulled this, but it really smells like a maintenance disaster to
me. Particularly since the SCSI people really don't care about ATA
anyway.

               Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  1:29 [GIT PULL] ata fixes for 6.5-rc5 Damien Le Moal
  2023-08-06  2:26 ` Linus Torvalds
@ 2023-08-06  2:31 ` pr-tracker-bot
  1 sibling, 0 replies; 7+ messages in thread
From: pr-tracker-bot @ 2023-08-06  2:31 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Linus Torvalds, linux-ide

The pull request you sent on Sun,  6 Aug 2023 10:29:01 +0900:

> ssh://git@gitolite.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata tags/ata-6.5-rc5

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fb0d91991cedb51bc604c6b3915df75d8a59a4a3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  2:26 ` Linus Torvalds
@ 2023-08-06  2:55   ` Linus Torvalds
  2023-08-06  9:51     ` Damien Le Moal
  2023-08-06  9:48   ` Damien Le Moal
  1 sibling, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2023-08-06  2:55 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide

On Sat, 5 Aug 2023 at 19:26, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>  (a) split manage_start_stop into three bits for the three actual
> cases it deals with: (stop_on_suspend, stop_on_shutdown,
> start_on_resume).
>
>  (b) then not set the "start_on_resume" bit

Side note: the ATA layer already has

 - ATA_FLAG_NO_POWEROFF_SPINDOWN

which seems to basically be very close to "don't send stop on shutdown"

And in fact, it looks like

 - ATA_FLAG_NO_HIBERNATE_SPINDOWN

is in turn very close to "don't send stop on suspend". So if that
manage_start_stop had been just split into those three actions, it
sounds like those ATA_FLAG flags we have would basically have
translated to setting those bits too.

And the whole "start" seems to be translated to "ATA_CMD_VERIFY for
the first sector", which would seem to be literally just a random
command intentionally chosen to not return any actual data.

The *only* effect of doing that 'start' is to cause extra disk IO
early that is then ignored. Logically it doesn't actually do anything
useful, and it would seem like it might actually be an actively bad
thing (ie spinning up a disk on a laptop for potentially no actual
good reason, and waiting for this all to happen).

End result: it really smells like ATA fundamentally doesn't want the
whole 'manage_start_stop' noise AT ALL.

You just removed the nasty early start with that 'no_start_on_resume'
bit, and the spindown seems to be questionable on at least some
machines too.

So I wonder: did somebody test just removing the setting of that flag entirely?

I guess ATA is legacy enough these days that people want to make
minimal changes, although that 'no_start_on_resume' really doesn't
smell all that much more minimal to me than not sending spindown
commands.

        Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  2:26 ` Linus Torvalds
  2023-08-06  2:55   ` Linus Torvalds
@ 2023-08-06  9:48   ` Damien Le Moal
  1 sibling, 0 replies; 7+ messages in thread
From: Damien Le Moal @ 2023-08-06  9:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-ide

On 8/6/23 11:26, Linus Torvalds wrote:
> On Sat, 5 Aug 2023 at 18:29, Damien Le Moal <dlemoal@kernel.org> wrote:
>>
>>  - Prevent the scsi disk driver from issuing a START STOP UNIT command
>>    for ATA devices during system resume as this causes various issues
>>    reported by multiple users.
> 
> Honestly, this seems pretty broken.
> 
> I mean, there are literally just three things that manage_start_stop
> does, and you just disabled one of those cases.
> 
> And it's quite illogical, since one of the two remaining cases is for
> the suspend side (the final one is shutdown).
> 
> The deeper issue seems to be that ATA is just doing something that
> other SCSI devices aren't doing (the whole set
> 'sdev->manage_start_stop' thing), and while there is one other user in
> the sbp2 firewire code, clearly the SCSI people don't really care for
> this bit, and the SCSI changes then cause problems.
> 
> I agree that the new "don't ask sd.c to send a start command" is the
> right thing to do, but I feel it was done in a particularly ugly and
> illogical manner.
> 
> I think it would have been better if maybe the approach would have been to
> 
>  (a) split manage_start_stop into three bits for the three actual
> cases it deals with: (stop_on_suspend, stop_on_shutdown,
> start_on_resume).
> 
>  (b) then not set the "start_on_resume" bit
> 
> instead of the current crazy model of first saying "please manage
> start/stop for me" and then following up with "oh, but don't do it for
> this case".
> 
> See what I'm saying? Don't mix "please do X, but don't do subset Y"
> bits. It's a completely messy thing and makes it really hard to figure
> out what you actually want for no good reason.
> 
> I've pulled this, but it really smells like a maintenance disaster to
> me. Particularly since the SCSI people really don't care about ATA
> anyway.

I fully agree with all you said. ATA suspend/resume was badly impacted by the
switch to asynchronous PM operations and the scsi layer using the regular PM ops
instead of its own. The problems come from the fact that there is no direct link
between the scsi device and ata device (and ata port/host device). As a result,
the asynchronous PM ops ordering (parent then child for suspend and reverse for
resume) does not happen and mess things up. This rather ugly "fix" is the
smallest I could think of to quickly address the multiple regressions that were
signaled by users.

This definitely needs a proper cleanup on top. Your suggestion above makes sense
and is definitely needed as the scsi START STOP UNIT command on resume is really
useless. There are also some strange things happening with stopping on shutdown
that I have noticed as well (e.g. cache flush failing), which I now suspect
happen for the same reason as this resume issue: PM operation ordering. So I
believe that a proper fix must also include creating correct device links
between ATA and scsi devices so that PM ordering works again.

This is at the top of my to-do list. Working on it.

> 
>                Linus

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  2:55   ` Linus Torvalds
@ 2023-08-06  9:51     ` Damien Le Moal
  2023-08-06 17:37       ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Damien Le Moal @ 2023-08-06  9:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-ide

On 8/6/23 11:55, Linus Torvalds wrote:
> On Sat, 5 Aug 2023 at 19:26, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>>  (a) split manage_start_stop into three bits for the three actual
>> cases it deals with: (stop_on_suspend, stop_on_shutdown,
>> start_on_resume).
>>
>>  (b) then not set the "start_on_resume" bit
> 
> Side note: the ATA layer already has
> 
>  - ATA_FLAG_NO_POWEROFF_SPINDOWN
> 
> which seems to basically be very close to "don't send stop on shutdown"
> 
> And in fact, it looks like
> 
>  - ATA_FLAG_NO_HIBERNATE_SPINDOWN
> 
> is in turn very close to "don't send stop on suspend". So if that
> manage_start_stop had been just split into those three actions, it
> sounds like those ATA_FLAG flags we have would basically have
> translated to setting those bits too.

Indeed.

> And the whole "start" seems to be translated to "ATA_CMD_VERIFY for
> the first sector", which would seem to be literally just a random
> command intentionally chosen to not return any actual data.

This likely was added to force the drive to spinup, to avoid subsequent commands
to fail. The start process doing a port reset and then revalidating the drive,
which starts by issuing an IDENTIFY command, definitely will wake up the drive.
There is no need for relying on scsi to do it.

However, I am still not convinced that the revalidate should be done at all,
unless the scsi layer does do it too. As you said, this may be causing
unwarranted spinup for cases when the drive is not used after resuming. But
again here, scsi and ATA revalidate are not linked either. One does not trigger
the other.

> The *only* effect of doing that 'start' is to cause extra disk IO
> early that is then ignored. Logically it doesn't actually do anything
> useful, and it would seem like it might actually be an actively bad
> thing (ie spinning up a disk on a laptop for potentially no actual
> good reason, and waiting for this all to happen).
> 
> End result: it really smells like ATA fundamentally doesn't want the
> whole 'manage_start_stop' noise AT ALL.

Hannes suggested doing all ATA start/stop processing solely relying on libata,
that is, not using manage_start_stop at all. This is another option I am
considering.

> 
> You just removed the nasty early start with that 'no_start_on_resume'
> bit, and the spindown seems to be questionable on at least some
> machines too.

Yes it is. I have seen issues with the shutdown side, most of the time showing
as a synchronize cache failure. But until now, I had no clear understanding
how/why that happened. I understand it now and will be fixing that.

> So I wonder: did somebody test just removing the setting of that flag entirely?
> 
> I guess ATA is legacy enough these days that people want to make
> minimal changes, although that 'no_start_on_resume' really doesn't
> smell all that much more minimal to me than not sending spindown
> commands.

I will revisit everything around PM operations. ATA is indeed legacy-ish for
most desktop/laptop users as NVMe has taken over most systems. But there are
still a lot of enterprise deployments using ATA drives, and a lot of users
running old hardware. The former is rather easy to deal with as the hardware is
most of the time very recent, and so better than old devices. The former is a
constant source of headaches due to the sheer amount of buggy hardware out
there. The potential for regressions when changing something is high, so I try
to be prudent with changes.

Thank you for your comments.

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GIT PULL] ata fixes for 6.5-rc5
  2023-08-06  9:51     ` Damien Le Moal
@ 2023-08-06 17:37       ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2023-08-06 17:37 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide

On Sun, 6 Aug 2023 at 02:51, Damien Le Moal <dlemoal@kernel.org> wrote:
>
> Hannes suggested doing all ATA start/stop processing solely relying on libata,
> that is, not using manage_start_stop at all. This is another option I am
> considering.

That definitely sounds like the best option to me, so if it is doable
that sounds worth pursuing.

            Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-08-06 17:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-06  1:29 [GIT PULL] ata fixes for 6.5-rc5 Damien Le Moal
2023-08-06  2:26 ` Linus Torvalds
2023-08-06  2:55   ` Linus Torvalds
2023-08-06  9:51     ` Damien Le Moal
2023-08-06 17:37       ` Linus Torvalds
2023-08-06  9:48   ` Damien Le Moal
2023-08-06  2:31 ` pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox