* Sd card race on resume with filesystem errors (possible data loss?)
@ 2026-01-02 9:15 Sergio Callegari
2026-01-09 20:57 ` Bart Van Assche
0 siblings, 1 reply; 4+ messages in thread
From: Sergio Callegari @ 2026-01-02 9:15 UTC (permalink / raw)
To: linux-scsi
Hi and happy new year!
I would like to report a problem that I am encountering with SD card
storage.
On the linux-stable list I was advised to write to this dedicated
mailing list.
In my setup, /home is on an SD card (because the system is a
laptop/convertible whose internal disk is too small). The card is
LUKS encrypted and has a btrfs filesystem on it.
When the laptop sleeps and then resumes, there is a race. The sdcard
gets accessed for read/write but is not yet ready, so there are I/O
errors. BTRFS is not happy with them and tends to remount RO.
This issue is well known to purism developers (e.g. see
https://source.puri.sm/Librem5/linux/-/issues/484 and
https://forums.puri.sm/t/sdcard-becomes-read-only-after-waking-up-from-suspend/20767/2).
My kernel logs are identical to those in
https://source.puri.sm/Librem5/linux/-/issues/484 (first comment), apart
from the fact that I get the errors from BTRFS, while the reporter there
gets the errors from EXT4. This indicates that the race is not specific
to BTRFS.
The errors in the kernel logs come right after the PM: suspend exit message.
From what I understand:
1. The error is more frequent with the SD card / LUKS / filesystem
layering, but is not specific to it.
2. A phone or tablet setup, such as those that Purism developers target,
will generally use an SD card for storage and rely on suspend, which
makes it a good trigger for the problem. However, the problem is by no
means specific to phones, ARM devices, etc. I am getting it on an x86-64
laptop.
3. It is unclear to me whether there is a real risk of data loss. With
BTRFS, which has more complex data management, this may be the case.
4. Even if data loss can be excluded, the issue requires a reboot to get
the filesystem back to RW, so it is annoying.
5. Purism developers have a kernel patch for it at
https://source.puri.sm/Librem5/linux/-/merge_requests/788. As far as I
understand, it is not in Linux mainline or stable. Would it make
sense to consider that patch?
6. For stable kernels, there is a mitigation consisting of a systemd
sleep/resume hook such as the following:
#!/bin/sh
/usr/bin/systemd-cat -p5 /usr/bin/echo ${1} ${2}
case "${1}" in
post)
sleep 1.5
systemd-cat -p4 /usr/bin/echo "hack, wait for sdcard"
;;
esac
see https://source.puri.sm/Librem5/linux/-/issues/484#note_277648
This appears to reduce the occurrence of the problem, but not to
eliminate it completely.
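As a sketch only, the fixed "sleep 1.5" in the hook could be replaced by a
bounded poll that returns as soon as the device answers a read. Everything
named here is an assumption for illustration: the wait_readable helper, the
SD_DEV variable and the /dev/sda default are hypothetical, and a read may be
served from the page cache without touching the card. Like the fixed sleep,
this cannot stop the kernel from issuing writes that were already pending at
resume, so it may reduce the problem rather than remove it.

```shell
#!/bin/sh
# Hypothetical variant of the sleep hook: poll instead of sleeping a fixed time.

# wait_readable PATH TRIES INTERVAL
# Returns 0 as soon as one 512-byte read from PATH succeeds,
# or 1 after TRIES failed attempts spaced INTERVAL seconds apart.
wait_readable() {
    path=$1
    tries=$2
    interval=$3
    while [ "$tries" -gt 0 ]; do
        if dd if="$path" of=/dev/null bs=512 count=1 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}

case "${1:-}" in
    post)
        # SD_DEV is an assumed override; /dev/sda matches the logs in this thread.
        if wait_readable "${SD_DEV:-/dev/sda}" 30 0.1; then
            systemd-cat -p5 /usr/bin/echo "sdcard readable after resume"
        else
            systemd-cat -p4 /usr/bin/echo "sdcard still not readable, giving up"
        fi
        ;;
esac
```

With 30 tries and a 0.1 s interval the hook waits about 3 s at most, so a
card that never comes back does not block the resume path indefinitely.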
Thanks for the attention
Sergio
* Re: Sd card race on resume with filesystem errors (possible data loss?)
From: Bart Van Assche @ 2026-01-09 20:57 UTC (permalink / raw)
To: Sergio Callegari, linux-scsi
On 1/2/26 2:15 AM, Sergio Callegari wrote:
> 5. Purism developers have a kernel patch for it at
> https://source.puri.sm/Librem5/linux/-/merge_requests/788. From my
> understanding, this is not in linux mainline or stable. Would it make
> sense to consider that patch?
Please post the patch on the linux-scsi mailing list if you want it
included in the upstream kernel.
Thanks,
Bart.
* Re: Sd card race on resume with filesystem errors (possible data loss?)
From: Sergio Callegari @ 2026-01-20 12:19 UTC (permalink / raw)
To: Bart Van Assche, linux-scsi
On 09/01/2026 21:57, Bart Van Assche wrote:
>
> Please post the patch on the linux-scsi mailing list if you want it
> included in the upstream kernel.
>
Hi and thanks.
Before posting the patch, I would like to provide more information about
the situation. This is what I see in the system logs after resume from
sleep; these are the lines that follow "PM: suspend exit":
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2470304 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 337600 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 346752 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 373120 op 0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 378528 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: BTRFS error (device dm-0): bdev /dev/mapper/luks-7223e129-f73a-4877-98fc-bc00384ce937 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 378848 op 0x1:(WRITE) flags 0x1800 phys_seg 4 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 389600 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 390016 op 0x1:(WRITE) flags 0x1800 phys_seg 6 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2434752 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: I/O error, dev sda, sector 2443904 op 0x1:(WRITE) flags 0x1800 phys_seg 3 prio class 1
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: sd 2:0:0:0: [sda] tag#0 device offline or changed
Jan 19 11:56:43 coccobill kernel: BTRFS: error (device dm-0) in btrfs_commit_transaction:2535: errno=-5 IO failure (Error while writing out transaction)
Jan 19 11:56:43 coccobill kernel: BTRFS info (device dm-0 state E): forced readonly
In this case the errors are serious enough to cause btrfs to get to RO mode.
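To compare resumes, a log excerpt like this can be condensed into two
numbers. The following is a minimal sketch; the function name
summarize_resume_errors is made up for illustration. It reads kernel log
lines on standard input and, for the lines after the most recent "PM:
suspend exit", counts the "I/O error, dev" lines and reports whether
"forced readonly" appeared.

```shell
#!/bin/sh
# summarize_resume_errors: hypothetical helper, reads kernel log lines on stdin.
# The counters restart at every "PM: suspend exit", so the result describes
# only the most recent resume found in the input.
summarize_resume_errors() {
    awk '
        /PM: suspend exit/         { seen = 1; io = 0; ro = 0; next }
        seen && /I\/O error, dev / { io++ }
        seen && /forced readonly/  { ro = 1 }
        END { printf "io_errors=%d forced_readonly=%d\n", io, ro }
    '
}
```

It can be fed from the journal, for example with
"journalctl -k -b | summarize_resume_errors".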
To recap, sda is my SD card, which is driven by the usb-storage
module, and I have layered btrfs over LUKS encryption on it.
The errors are related to the resume from sleep, as they appear *only*
after "PM: suspend exit". There are no errors for the device in normal
operation.
Before getting to the patch, I also ran some more experiments.
Changing the usb-storage delay_use parameter has no effect on
the issue.
This looks strange to me, since this parameter should specifically
control how long the kernel waits before using the SD card on my
system.
Trying to modify the /sys/block/sda/events_poll_msecs also makes no
difference at all (normally it is -1, I have tried to change that from
100 ms to 2 s).
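For reference, the two settings mentioned above can be read back from
sysfs. This is a minimal sketch: the helper name show_tunables is
hypothetical, and the optional first argument (an alternative sysfs root)
exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# show_tunables [SYSFS_ROOT] [DISK]
# Prints the current usb-storage delay_use value and the disk's
# events_poll_msecs value, or a note when an attribute cannot be read.
show_tunables() {
    root=${1:-/sys}
    disk=${2:-sda}
    for f in "$root/module/usb_storage/parameters/delay_use" \
             "$root/block/$disk/events_poll_msecs"; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$f")"
        else
            printf '%s = (not available)\n' "$f"
        fi
    done
}
```

Running "show_tunables" with no arguments inspects /sys and sda. A new
value is written to the same file as root, for example
"echo 2000 > /sys/block/sda/events_poll_msecs".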
Thanks for the attention,
Best
Sergio
* Re: Sd card race on resume with filesystem errors (possible data loss?)
From: Sergio Callegari @ 2026-01-24 20:38 UTC (permalink / raw)
To: linux-scsi
I ran some tests with the patch from the Purism developers, which is the
following:
Signed-off-by: Sebastian Krzyszkowiak <sebastian.krzyszkowiak@puri.sm>
---
drivers/scsi/sd.c | 39 +++++++++++++++++----------------------
1 file changed, 17 insertions(+), 22 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2c627deedc1f..c2353a260123 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3968,11 +3968,28 @@ static int sd_resume(struct device *dev)
 static int sd_resume_common(struct device *dev, bool runtime)
 {
 	struct scsi_disk *sdkp = dev_get_drvdata(dev);
+	struct scsi_device *sdp;
 	int ret;
 	if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
 		return 0;
+	sdp = sdkp->device;
+
+	if (sdp->ignore_media_change) {
+		/* clear the device's sense data */
+		static const u8 cmd[10] = { REQUEST_SENSE };
+		const struct scsi_exec_args exec_args = {
+			.req_flags = BLK_MQ_REQ_PM,
+		};
+
+		if (scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0,
+				     sdp->request_queue->rq_timeout, 1,
+				     &exec_args))
+			sd_printk(KERN_NOTICE, sdkp,
+				  "Failed to clear sense data\n");
+	}
+
 	if (!sd_do_start_stop(sdkp->device, runtime)) {
 		sdkp->suspended = false;
 		return 0;
@@ -4005,28 +4022,6 @@ static int sd_resume_system(struct device *dev)
 static int sd_resume_runtime(struct device *dev)
 {
-	struct scsi_disk *sdkp = dev_get_drvdata(dev);
-	struct scsi_device *sdp;
-
-	if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
-		return 0;
-
-	sdp = sdkp->device;
-
-	if (sdp->ignore_media_change) {
-		/* clear the device's sense data */
-		static const u8 cmd[10] = { REQUEST_SENSE };
-		const struct scsi_exec_args exec_args = {
-			.req_flags = BLK_MQ_REQ_PM,
-		};
-
-		if (scsi_execute_cmd(sdp, cmd, REQ_OP_DRV_IN, NULL, 0,
-				     sdp->request_queue->rq_timeout, 1,
-				     &exec_args))
-			sd_printk(KERN_NOTICE, sdkp,
-				  "Failed to clear sense data\n");
-	}
-
 	return sd_resume_common(dev, true);
 }
--
Unfortunately, this patch does not seem to fix my issue.
Something seems to be wrong with the usb-persist mechanism, since
there appears to be no way to delay the first access to the USB disk
until it can be assumed to be ready.
Incidentally, my sd reader is by Genesys Logic 05e3:0751, in case this
information is useful.
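For anyone wanting to check the same thing on their reader, the sketch
below looks up a USB device by vendor and product ID in sysfs and prints
its power/persist attribute, which, to my knowledge, is where the kernel
exposes the USB-persist setting for a device. The helper name usb_persist
and the optional sysfs root argument are assumptions for illustration.

```shell
#!/bin/sh
# usb_persist VENDOR PRODUCT [SYSFS_ROOT]
# For every USB device whose idVendor/idProduct match, print the sysfs
# device name and the content of its power/persist attribute.
usb_persist() {
    vendor=$1
    product=$2
    root=${3:-/sys}
    for d in "$root"/bus/usb/devices/*; do
        # Interface entries and other nodes lack the ID files; skip them.
        [ -r "$d/idVendor" ] && [ -r "$d/idProduct" ] || continue
        [ "$(cat "$d/idVendor")" = "$vendor" ] || continue
        [ "$(cat "$d/idProduct")" = "$product" ] || continue
        if [ -r "$d/power/persist" ]; then
            printf '%s persist=%s\n' "${d##*/}" "$(cat "$d/power/persist")"
        else
            printf '%s persist=unknown\n' "${d##*/}"
        fi
    done
}
```

For the reader mentioned above the call would be "usb_persist 05e3 0751".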
Best,
Sergio