Linux SCSI subsystem development
* Sd card race on resume with filesystem errors (possible data loss?)
@ 2026-01-02  9:15 Sergio Callegari
  2026-01-09 20:57 ` Bart Van Assche
  0 siblings, 1 reply; 4+ messages in thread
From: Sergio Callegari @ 2026-01-02  9:15 UTC (permalink / raw)
  To: linux-scsi

Hi and happy new year!

I would like to report a problem that I am encountering with the sdcard 
storage.

On the linux-stable list I was advised to write to this dedicated
mailing list instead.

In my setup /home is on an sd-card (because the system is a
laptop/convertible whose internal disk is too small). The card is
LUKS encrypted and has a BTRFS filesystem on it.

When the laptop sleeps and then resumes, there is a race: the sdcard
is accessed for read/write before it is ready, so there are I/O
errors. BTRFS does not tolerate them and tends to remount RO.

This issue is well known to purism developers (e.g. see 
https://source.puri.sm/Librem5/linux/-/issues/484 and 
https://forums.puri.sm/t/sdcard-becomes-read-only-after-waking-up-from-suspend/20767/2).

My kernel logs are identical to those in 
https://source.puri.sm/Librem5/linux/-/issues/484 (first comment), apart 
from the fact that I get the errors from BTRFS, while the reporter there 
gets the errors from EXT4. This indicates that the race is not specific 
to BTRFS.

The errors in the kernel logs come right after the PM: suspend exit message.
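
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that prints the lines following each "PM: suspend exit" message. The function name lines_after_resume and the default window of 40 lines are arbitrary choices made for this illustration; the input is assumed to be a kernel log with one message per line.

```shell
#!/bin/sh
# Print each "PM: suspend exit" line and the N lines that follow it,
# reading a kernel log from stdin. N defaults to 40 (arbitrary).
lines_after_resume() {
    awk -v n="${1:-40}" '
        /PM: suspend exit/ { left = n + 1 }   # restart the window at each resume
        left > 0           { print; left-- }
    '
}

# Typical use on a system with a systemd journal:
#   journalctl -k -b | lines_after_resume 60
```

If storage errors show up only inside these windows and never elsewhere in the log, that supports the link to resume.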

From what I understand:

1. The error is more frequent with the SD-LUKS-FILESYSTEM
layering, but not specific to it

2. A phone/tablet setup such as those that purism developers address
will generally use an sdcard for storage and require suspend, which makes
it a good trigger for the problem. However, the problem is by no means
specific to phones, ARM devices, etc. I am getting it on an X86-64 laptop.

3. It is unclear to me whether there is a real risk of data loss. With
BTRFS, whose data management is more complex, this may be the case.

4. Even if data loss can be excluded, the issue requires a reboot to get
the filesystem back to RW, so it is annoying.

5. Purism developers have a kernel patch for it at
https://source.puri.sm/Librem5/linux/-/merge_requests/788. From my
understanding, this is not in Linux mainline or stable. Would it make
sense to consider that patch?

6. For stable kernels, there is a mitigation consisting of a systemd
sleep-resume hook such as

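#!/bin/sh
# systemd system-sleep hook: $1 is "pre" or "post", $2 is the sleep action
/usr/bin/systemd-cat -p5 /usr/bin/echo "${1}" "${2}"

case "${1}" in
    post)
        # crude workaround: hold the hook for a fixed time after resume
        sleep 1.5
        systemd-cat -p4 /usr/bin/echo "hack, wait for sdcard"
        ;;
esac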

see https://source.puri.sm/Librem5/linux/-/issues/484#note_277648

This appears to reduce the occurrence of the problem, but not to 
eliminate it completely.
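
A possible variant of the hook polls for readiness instead of sleeping for a fixed time. This is only a sketch under stated assumptions: the card is assumed to be sda, the SCSI device state file /sys/block/sda/device/state is used as the readiness signal, and the 5 s limit and the name wait_until_running are arbitrary. The state file may well read "running" while the sd driver still rejects I/O, and a user-space hook cannot hold back writes that the kernel has already queued, so whether this narrows the race any better than the fixed sleep is untested.

```shell
#!/bin/sh
# Sketch of a systemd system-sleep hook that polls instead of sleeping
# for a fixed time. STATE_FILE and the 5 s limit are assumptions.
STATE_FILE="${STATE_FILE:-/sys/block/sda/device/state}"

# Wait until the file at $1 contains "running", polling every 0.1 s
# for at most $2 polls. Returns 0 on success, 1 on timeout.
wait_until_running() {
    tries=0
    while [ "$tries" -lt "$2" ]; do
        [ "$(cat "$1" 2>/dev/null)" = "running" ] && return 0
        sleep 0.1
        tries=$((tries + 1))
    done
    return 1
}

case "${1:-}" in
    post)
        if wait_until_running "$STATE_FILE" 50; then
            echo "sd card reported running" | systemd-cat -p5
        else
            echo "sd card not running after 5 s" | systemd-cat -p4
        fi
        ;;
esac
```

Like the original hook, it would be installed as an executable file in the systemd system-sleep directory and is called with "pre" or "post" as its first argument.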

Thanks for the attention

Sergio



* Re: Sd card race on resume with filesystem errors (possible data loss?)
  2026-01-02  9:15 Sd card race on resume with filesystem errors (possible data loss?) Sergio Callegari
@ 2026-01-09 20:57 ` Bart Van Assche
  2026-01-20 12:19   ` Sergio Callegari
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Van Assche @ 2026-01-09 20:57 UTC (permalink / raw)
  To: Sergio Callegari, linux-scsi

On 1/2/26 2:15 AM, Sergio Callegari wrote:
> 5. Purism developers have a kernel patch for it at
> https://source.puri.sm/Librem5/linux/-/merge_requests/788. From my
> understanding, this is not in Linux mainline or stable. Would it make
> sense to consider that patch?

Please post the patch on the linux-scsi mailing list if you want it
included in the upstream kernel.

Thanks,

Bart.


* Re: Sd card race on resume with filesystem errors (possible data loss?)
  2026-01-09 20:57 ` Bart Van Assche
@ 2026-01-20 12:19   ` Sergio Callegari
  2026-01-24 20:38     ` Sergio Callegari
  0 siblings, 1 reply; 4+ messages in thread
From: Sergio Callegari @ 2026-01-20 12:19 UTC (permalink / raw)
  To: Bart Van Assche, linux-scsi



On 09/01/2026 21:57, Bart Van Assche wrote:
> 
> Please post the patch on the linux-scsi mailing list if you want it
> included in the upstream kernel.
> 

Hi and thanks.
Before posting the patch, I would like to provide more information about
the situation. This is what I see in the system logs after resume from
sleep.

These are the lines that follow "PM: suspend exit":

Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2470304 op 
0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 1, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 2, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 337600 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 3, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 4, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 5, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 346752 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 6, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 7, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 8, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 373120 op 
0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 9, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 378528 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev 
/dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 10, rd 0, 
flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 378848 op 
0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 389600 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 390016 op 
0x1:(WRITE) flags 0x1800 phys_seg 6 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2434752 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2443904 op 
0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline 
or changed
Jan 19 11:56:43 coccobill kernel: BTRFS: error (device dm-0) in 
btrfs_commit_transaction:2535: errno=-5 IO failure (Error while writing 
out transaction)
Jan 19 11:56:43 coccobill kernel: BTRFS info (device dm-0 state E): 
forced readonly

In this case the errors are serious enough to force BTRFS into RO mode.
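
To compare occurrences, a log like the one above can be condensed with a small filter. The following is a minimal sketch, assuming the journal is read with one message per line (not wrapped as in this mail) and that the message formats match those shown above; the function name summarize_io_errors and the output keys are made up for this example.

```shell
#!/bin/sh
# Summarise block-layer I/O errors from kernel log lines on stdin.
# Assumes one message per line, in the format shown in the log above.
summarize_io_errors() {
    awk '
        /I\/O error, dev / {
            count++
            # The sector number is the field right after the word "sector".
            for (i = 1; i < NF; i++)
                if ($i == "sector") sectors = sectors " " $(i + 1)
        }
        /forced readonly/ { ro = 1 }
        END {
            printf "io_errors=%d\n", count
            printf "sectors=%s\n", substr(sectors, 2)
            printf "forced_readonly=%s\n", (ro ? "yes" : "no")
        }
    '
}

# Typical use on a system with a systemd journal:
#   journalctl -k -b | summarize_io_errors
```

The summary reports how many requests failed, which sectors they targeted, and whether the filesystem ended up read-only.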

As a reminder, sda is my sd-card, which is driven by the usb-storage
module, and I have layered BTRFS over LUKS encryption on it.

The errors are related to the resume from sleep, as they appear *only*
after "PM: suspend exit". There are no errors for the device in normal
operation.

Before getting to the patch, I also ran some more experiments:

- Modifying the usb-storage delay_use parameter has no effect on the
  issue. This looks strange to me, since this parameter should
  specifically control how long the kernel waits before using the
  sd-card on my system.

- Modifying /sys/block/sda/events_poll_msecs also makes no difference
  at all (normally it is -1; I tried values from 100 ms to 2 s).
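
For reference, the two settings mentioned above can be inspected as in the following sketch. The sysfs paths are the usual locations for the usb_storage module parameter and the per-disk attribute; the function name show_tunables and the SYSFS_ROOT override (there only so the function can be run against a fake tree) are assumptions of this example.

```shell
#!/bin/sh
# Print the two tunables discussed above for a given disk name.
# SYSFS_ROOT is overridable so the function can be tried on a fake tree.
show_tunables() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root/module/usb_storage/parameters/delay_use" \
             "$root/block/$dev/events_poll_msecs"; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "${f#"$root"}" "$(cat "$f")"
        else
            printf '%s = (not available)\n' "${f#"$root"}"
        fi
    done
}

# Typical use:
#   show_tunables sda
```

As far as I can tell, delay_use delays the initial scan after the device is probed rather than the first access after a resume, which might explain why changing it has no visible effect here; this should be treated as an assumption, not a confirmed diagnosis.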

Thanks for the attention,

Best
Sergio



* Re: Sd card race on resume with filesystem errors (possible data loss?)
  2026-01-20 12:19   ` Sergio Callegari
@ 2026-01-24 20:38     ` Sergio Callegari
  0 siblings, 0 replies; 4+ messages in thread
From: Sergio Callegari @ 2026-01-24 20:38 UTC (permalink / raw)
  To: linux-scsi

I ran some tests with the patch from the purism developers, which is the
following one:

Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
---
  drivers/scsi/sd.c | 39 +++++++++++++++++----------------------
  1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2c627deedc1f..c2353a260123 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3968,11 +3968,28 @@ static int sd_resume(struct device *dev)
  static int sd_resume_common(struct device *dev, bool runtime)
  {
      struct scsi_disk *sdkp = dev_get_drvdata(dev);
+    struct scsi_device *sdp;
      int ret;

      if (!sdkp)    /* E.g.: runtime resume at the start of sd_probe() */
          return 0;

+    sdp = sdkp->device;
+
+    if (sdp->ignore_media_change) {
+        /* clear the device's sense data */
+        static const u8 cmd[10] = { REQUEST_SENSE };
+        const struct scsi_exec_args exec_args = {
+            .req_flags = BLK_MQ_REQ_PM,
+        };
+
+        if (scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0,
+                     sdp->request_queue->rq_timeout, 1,
+                     &exec_args))
+            sd_printk(KERN_NOTICE, sdkp,
+                  "Failed to clear sense data\n");
+    }
+
      if (!sd_do_start_stop(sdkp->device, runtime)) {
          sdkp->suspended = false;
          return 0;
@@ -4005,28 +4022,6 @@ static int sd_resume_system(struct device *dev)

  static int sd_resume_runtime(struct device *dev)
  {
-    struct scsi_disk *sdkp = dev_get_drvdata(dev);
-    struct scsi_device *sdp;
-
-    if (!sdkp)    /* E.g.: runtime resume at the start of sd_probe() */
-        return 0;
-
-    sdp = sdkp->device;
-
-    if (sdp->ignore_media_change) {
-        /* clear the device's sense data */
-        static const u8 cmd[10] = { REQUEST_SENSE };
-        const struct scsi_exec_args exec_args = {
-            .req_flags = BLK_MQ_REQ_PM,
-        };
-
-        if (scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0,
-                     sdp->request_queue->rq_timeout, 1,
-                     &exec_args))
-            sd_printk(KERN_NOTICE, sdkp,
-                  "Failed to clear sense data\n");
-    }
-
      return sd_resume_common(dev, true);
  }

--

Unfortunately, this patch does not seem to fix my issue.

Something seems to be wrong with the usb-persist mechanism: I have found
no way to delay the first access to the usb disk until it can be assumed
to be ready.

Incidentally, my sd card reader is a Genesys Logic device (USB ID
05e3:0751), in case this information is useful.
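
To see how usb-persist is configured for a particular reader, the USB device can be located in sysfs by its vendor:product ID and its power/persist attribute read. This is a minimal sketch: the function name usb_persist_for_id and the SYSFS_ROOT override (there only for trying it on a fake tree) are assumptions of this example, and it only reports the setting without changing it.

```shell
#!/bin/sh
# List USB devices matching a vendor:product ID and show their
# power/persist setting. SYSFS_ROOT is overridable for testing.
usb_persist_for_id() {
    vid="${1%%:*}"
    pid="${1##*:}"
    root="${SYSFS_ROOT:-/sys}"
    for d in "$root"/bus/usb/devices/*; do
        # Skip entries (e.g. interfaces) that carry no ID attributes.
        [ -r "$d/idVendor" ] && [ -r "$d/idProduct" ] || continue
        [ "$(cat "$d/idVendor")" = "$vid" ] || continue
        [ "$(cat "$d/idProduct")" = "$pid" ] || continue
        if [ -r "$d/power/persist" ]; then
            printf '%s persist=%s\n' "${d##*/}" "$(cat "$d/power/persist")"
        else
            printf '%s persist=unknown\n' "${d##*/}"
        fi
    done
}

# Typical use for the reader mentioned above:
#   usb_persist_for_id 05e3:0751
```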

Best,

Sergio


