* Block device's sysfs setting getting lost after suspend-resume cycle
From: Holger Hoffstätte @ 2025-04-19 10:16 UTC
To: linux-block
Hi!
I just noticed that sysfs settings now seem to get lost after
a suspend/resume cycle. In my case it's queue/read_ahead_kb,
which I configure with a udev rule. This has been working fine
for years.
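
For illustration only, the rule is along these lines (the match keys, value
and file name here are made up, not my exact rule):

# /etc/udev/rules.d/60-readahead.rules (hypothetical path)
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTR{queue/read_ahead_kb}="256"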
We start out with:
$ cat /sys/block/nvme0n1/queue/read_ahead_kb
128
Set a different value:
$ echo 256 > /sys/block/nvme0n1/queue/read_ahead_kb
$ cat /sys/block/nvme0n1/queue/read_ahead_kb
256
<suspend & resume>
Check again:
$ cat /sys/block/nvme0n1/queue/read_ahead_kb
128
I'm reasonably sure it used to retain the configured value.
The same also happens with sd (SATA) devices on a different machine,
so it seems to be a generic problem with either block or sysfs.
This is with 6.14.3-rc2. Unfortunately I have no idea when this
started to happen - I only noticed it now. I will trawl through
the git history, but wanted to see if this rings a bell with someone.
thanks,
Holger
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Christoph Hellwig @ 2025-04-21 11:49 UTC
To: Holger Hoffstätte; +Cc: linux-block
On Sat, Apr 19, 2025 at 12:16:35PM +0200, Holger Hoffstätte wrote:
> I'm reasonably sure it used to retain the configured value.
> The same also happens with sd (SATA) devices on a different machine,
> so it seems to be a generic problem with either block or sysfs.
>
> This is with 6.14.3-rc2. Unfortunately I have no idea when this
> started to happen - I only noticed it now. I will trawl through
> the git history, but wanted to see if this rings a bell with someone.
I'm pretty sure this is the atomic queue limits series, and I'm
actually surprised the value persisted before, as we don't have separate
hardware vs. user limits. I'll dig into it and get back to you ASAP
once I've caught up after a few days off.
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Christoph Hellwig @ 2025-04-23 6:35 UTC
To: Holger Hoffstätte; +Cc: linux-block
Hi Holger,
can you try the patch below? It fixes losing the read-ahead value in
my little test.
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6b2dbe645d23..4817e7ca03f8 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -61,8 +61,14 @@ void blk_apply_bdi_limits(struct backing_dev_info *bdi,
         /*
          * For read-ahead of large files to be effective, we need to read ahead
          * at least twice the optimal I/O size.
+         *
+         * There is no hardware limitation for the read-ahead size and the user
+         * might have increased the read-ahead size through sysfs, so don't ever
+         * decrease it.
          */
-        bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
+        bdi->ra_pages = max3(bdi->ra_pages,
+                        lim->io_opt * 2 / PAGE_SIZE,
+                        VM_READAHEAD_PAGES);
         bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
 }
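
For anyone reading along, the numbers line up with the report (my reading,
assuming 4 KiB pages; on Holger's device the recomputed value came out at
exactly the 128 KiB default, so the io_opt term is not the larger one there):

    VM_READAHEAD_PAGES = SZ_128K / PAGE_SIZE = 32 pages (128 KiB)
    echo 256 > read_ahead_kb  =>  bdi->ra_pages = 64 pages

    old: ra_pages = max(io_opt * 2 / PAGE_SIZE, 32) = 32
         -> recomputed on every limits update (e.g. after resume),
            clobbering the user's 64 pages
    new: ra_pages = max3(64, io_opt * 2 / PAGE_SIZE, 32) = 64
         -> the current, possibly user-raised, value acts as a floor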
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Holger Hoffstätte @ 2025-04-23 10:11 UTC
To: Christoph Hellwig; +Cc: linux-block
On 2025-04-23 08:35, Christoph Hellwig wrote:
> Hi Holger,
>
> can you try the patch below? It fixes losing the read-ahead value in
> my little test.
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 6b2dbe645d23..4817e7ca03f8 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -61,8 +61,14 @@ void blk_apply_bdi_limits(struct backing_dev_info *bdi,
>          /*
>           * For read-ahead of large files to be effective, we need to read ahead
>           * at least twice the optimal I/O size.
> +         *
> +         * There is no hardware limitation for the read-ahead size and the user
> +         * might have increased the read-ahead size through sysfs, so don't ever
> +         * decrease it.
>           */
> -        bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
> +        bdi->ra_pages = max3(bdi->ra_pages,
> +                        lim->io_opt * 2 / PAGE_SIZE,
> +                        VM_READAHEAD_PAGES);
>          bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
>  }
>
>
I tried several different readahead values across multiple
suspend/resume cycles and they were all retained properly again.
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Thank you!
cheers
Holger
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Christoph Hellwig @ 2025-04-23 16:19 UTC
To: Holger Hoffstätte; +Cc: linux-block, martin.petersen
So I tried to figure out how this happened, but AFAIK even
the pre-atomic-limits code (blk_queue_io_opt) always overrode
ra_pages. So for nvme in particular this was introduced either by

81adb863349157c67ccec871e5ae5574600c50be (HEAD)
Author: Bart Van Assche <bvanassche@acm.org>
Date: Fri Jun 28 09:53:31 2019 -0700

    nvme: set physical block size and optimal I/O size

which is so old that my current compiler refuses to build that
kernel to verify it, or by the fact that you upgraded either your SSD
or its firmware to one that sets the relevant limit, which was added to
nvme only a little before that.

No good Fixes: tag I guess, but I'll formally send out the patch anyway.
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Holger Hoffstätte @ 2025-04-23 17:05 UTC
To: Christoph Hellwig; +Cc: linux-block, martin.petersen
On 2025-04-23 18:19, Christoph Hellwig wrote:
> So I tried to figure out how this happened, but AFAIK even
> the pre-atomic-limits code (blk_queue_io_opt) always overrode
> ra_pages. So for nvme in particular this was introduced either by
>
> 81adb863349157c67ccec871e5ae5574600c50be (HEAD)
> Author: Bart Van Assche <bvanassche@acm.org>
> Date: Fri Jun 28 09:53:31 2019 -0700
>
>     nvme: set physical block size and optimal I/O size
>
> which is so old that my current compiler refuses to build that
> kernel to verify it, or by the fact that you upgraded either your SSD
> or its firmware to one that sets the relevant limit, which was added to
> nvme only a little before that.
>
> No good Fixes: tag I guess, but I'll formally send out the patch anyway.
There may have been a misunderstanding. I first noticed this on an old
machine with SATA SSDs where I *do* have a udev rule for readahead.
I only used my laptop with an NVMe drive (from ~2021) to reproduce the
problem and send the email. On that machine I do not have any udev rule
to set readahead since it's plenty fast.
Not sure if that matters, as it was a valid bug after all and now
it's fixed, so thanks again!
cheers
Holger
* Re: Block device's sysfs setting getting lost after suspend-resume cycle
From: Christoph Hellwig @ 2025-04-24 8:17 UTC
To: Holger Hoffstätte; +Cc: Christoph Hellwig, linux-block, martin.petersen
On Wed, Apr 23, 2025 at 07:05:26PM +0200, Holger Hoffstätte wrote:
> There may have been a misunderstanding. I first noticed this on an old
> machine with SATA SSDs where I *do* have a udev rule for readahead.
> I only used my laptop with an NVMe drive (from ~2021) to reproduce the
> problem and send the email. On that machine I do not have any udev rule
> to set readahead since it's plenty fast.
Ah.
> Not sure if that matters, as it was a valid bug after all and now
> it's fixed, so thanks again!
I usually try to understand what happened so I can document it properly
and create test cases if needed.

With your information above I dug a bit deeper and found the likely
culprit: before SCSI was converted to the atomic queue limits API, it
did not use the proper blk_queue_io_opt API, so it never updated
ra_pages based on the optimal I/O size. That means the user-set value
did stick around for SCSI, but not for the other drivers.
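
Roughly the difference, sketched from memory rather than lifted from the
actual old sd code:

    /* helper used elsewhere: overrode ra_pages as a side effect */
    blk_queue_io_opt(q, io_opt);

    /* what the old SCSI path effectively did: set the limit directly,
     * leaving bdi->ra_pages (and thus a user-set read_ahead_kb) alone
     */
    q->limits.io_opt = io_opt;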