linux-usb.vger.kernel.org archive mirror
* I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-11 15:42 Martin Steigerwald
From: Martin Steigerwald @ 2024-02-11 15:42 UTC
  To: stable, regressions, linux-usb

Hi!

This is not exactly a regression, as I am not aware of a prior working
state, but the kernel documentation advises me to CC the regressions list anyway¹.

I am trying to put data on an external Kingston XS-2000 4 TB SSD using
self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not think
BCacheFS has any part in the errors I see, but if you disagree feel free
to CC the BCacheFS mailing list as you reply.

I am using a ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U and 32
GiB of RAM.

I connected the SSD directly to a USB-C port on the ThinkPad. lsusb
lists it as:

Bus 007 Device 004: ID 0951:176b Kingston Technology XS2000

The SSD is detected as follows:

[20303.913644] usb 7-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[20303.926616] usb 7-1: New USB device found, idVendor=0951, idProduct=176b, bcdDevice= 1.00
[20303.926633] usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[20303.926641] usb 7-1: Product: XS2000
[20303.926647] usb 7-1: Manufacturer: Kingston
[20303.926652] usb 7-1: SerialNumber: […]
[20303.929078] scsi host0: uas
[20303.983859] scsi 0:0:0:0: Direct-Access     Kingston XS2000           1000 PQ: 0 ANSI: 6
[20303.984426] sd 0:0:0:0: Attached scsi generic sg0 type 0
[20303.985197] sd 0:0:0:0: [sda] 8001573552 512-byte logical blocks: (4.10 TB/3.73 TiB)
[20303.985331] sd 0:0:0:0: [sda] Write Protect is off
[20303.985341] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[20303.985579] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[20303.989516]  sda: sda1
[20303.989611] sd 0:0:0:0: [sda] Attached SCSI disk
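The two capacity figures in the sd line follow directly from the block count: 8001573552 blocks × 512 bytes = 4,096,805,658,624 bytes, about 4.10 TB in decimal units and 3.73 TiB in binary units. A minimal Python sketch of that arithmetic (the helper name `capacity` is illustrative, not a kernel function):

```python
def capacity(blocks: int, block_size: int = 512) -> tuple[float, float]:
    """Return the size in decimal TB and binary TiB for a logical block count."""
    size_bytes = blocks * block_size
    return size_bytes / 10**12, size_bytes / 2**40


tb, tib = capacity(8001573552)
print(f"{tb:.2f} TB/{tib:.2f} TiB")  # prints "4.10 TB/3.73 TiB"
```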

BCacheFS is mounted as follows – but I suspect BCacheFS is not involved in
those errors anyway:

[20310.437864] bcachefs (sda1): mounting version 1.3: rebalance_work opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
[20310.437895] bcachefs (sda1): recovering from clean shutdown, journal seq 5094
[20310.450813] bcachefs (sda1): alloc_read... done
[20310.450851] bcachefs (sda1): stripes_read... done
[20310.450855] bcachefs (sda1): snapshots_read... done
[20310.470815] bcachefs (sda1): journal_replay... done
[20310.470824] bcachefs (sda1): resume_logged_ops... done
[20310.470835] bcachefs (sda1): going read-write


While rsyncing about 1.4 TB of data, after about an hour I got
errors like this:

[33963.462694] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 1 inflight: CMD 
[33963.462708] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
[33963.462718] sd 0:0:0:0: [sda] tag#11 uas_zap_pending 0 uas-tag 2 inflight: CMD 
[33963.462725] sd 0:0:0:0: [sda] tag#11 CDB: Write(16) 8a 00 00 00 00 00 82 c1 c8 00 00 00 04 00 00 00
[33963.462733] sd 0:0:0:0: [sda] tag#15 uas_zap_pending 0 uas-tag 3 inflight: CMD 
[33963.462740] sd 0:0:0:0: [sda] tag#15 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d2 4c 00 00 01 2f 00 00
[33963.462748] sd 0:0:0:0: [sda] tag#12 uas_zap_pending 0 uas-tag 4 inflight: CMD 
[33963.462754] sd 0:0:0:0: [sda] tag#12 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d0 00 00 00 02 4c 00 00
[33963.462762] sd 0:0:0:0: [sda] tag#13 uas_zap_pending 0 uas-tag 5 inflight: CMD 
[33963.462769] sd 0:0:0:0: [sda] tag#13 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d4 00 00 00 00 ff 00 00
[33963.462777] sd 0:0:0:0: [sda] tag#14 uas_zap_pending 0 uas-tag 6 inflight: CMD 
[33963.462783] sd 0:0:0:0: [sda] tag#14 CDB: Write(16) 8a 00 00 00 00 00 82 c1 ce 00 00 00 00 cc 00 00
[33963.576991] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[33963.590793] scsi host0: uas_eh_device_reset_handler success
[33963.592857] sd 0:0:0:0: [sda] tag#10 timing out command, waited 180s
[33963.592872] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=182s
[33963.592881] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
[33963.592886] I/O error, dev sda, sector 2193734656 op 0x1:(WRITE) flags 0x104000 phys_seg 773 prio class 2
[33963.592898] bcachefs (sda1 inum 1073761281 offset 265216): data write error: I/O
[33963.592925] bcachefs (sda1 inum 1073761281 offset 467456): data write error: I/O
[33963.592933] bcachefs (sda1 inum 1073761281 offset 470016): data write error: I/O
[33963.592939] bcachefs (sda1 inum 1073761281 offset 471552): data write error: I/O
[33963.592949] bcachefs (sda1 inum 1073761281 offset 514560): data write error: I/O
[33963.592956] bcachefs (sda1 inum 1073761281 offset 517120): data write error: I/O
[33963.592963] bcachefs (sda1 inum 1073761281 offset 519168): data write error: I/O
[33963.592969] bcachefs (sda1 inum 1073761281 offset 521728): data write error: I/O
[33963.592976] bcachefs (sda1 inum 1073761281 offset 523776): data write error: I/O
[33963.592983] bcachefs (sda1 inum 1073761281 offset 526336): data write error: I/O
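The CDB bytes in these lines can be decoded to tie a timed-out command to the I/O error that follows it. In a WRITE(16) CDB (opcode 0x8a), bytes 2 to 9 hold the starting logical block address and bytes 10 to 13 the transfer length in logical blocks, both big-endian. A small Python sketch (the function name `decode_write16` is made up for illustration):

```python
def decode_write16(cdb_text: str) -> tuple[int, int]:
    """Decode a WRITE(16) CDB as printed by the kernel, e.g. "8a 00 ... 00".

    Returns (starting LBA, transfer length in logical blocks).
    """
    cdb = bytes.fromhex(cdb_text)  # fromhex skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: number of blocks
    return lba, length


lba, length = decode_write16("8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00")
print(lba, length, length * 512 // 1024, "KiB")  # 2193734656 1024 512 KiB
```

For tag#10 this yields LBA 2193734656 and 1024 blocks (512 KiB), the same value as "sector 2193734656" in the I/O error line; with 512-byte logical blocks, LBA and kernel sector numbers coincide. The CDB in the non-UAS log below decodes to LBA 41033728 and 2048 blocks, likewise matching its error line.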

The rsync completed, but I did not trust the result, even though
"bcachefs fsck" told me the filesystem structure is okay.

Thus I reran rsync with the "-c" option for checksumming. After a long
stretch of matching data, it started to transfer a file again, which should
not happen if the data had been identical. As it ran into I/O errors
again, I stopped the rsync process.

I searched for that UAS error message and, following the article² I
found, disabled UAS as follows:

% cat /etc/modprobe.d/disable-uas.conf
# Does not work with external SSD Transcend XS2000 4TB
options usb-storage quirks=0951:176b:u

The quirk was applied when I reconnected the device after unloading
both the usb-storage and uas modules:

[   55.871301] usb 7-1: UAS is ignored for this device, using usb-storage instead
[   55.871310] usb-storage 7-1:1.0: USB Mass Storage device detected
[   55.871559] usb-storage 7-1:1.0: Quirks match for vid 0951 pid 176b: 800000
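The quirks value follows a VID:PID:flags pattern, and each flag letter sets one bit in the driver's quirk mask. As a sketch of that mapping, assuming only the 'u' letter used above (its 0x800000 value is taken from the "Quirks match" log line; the kernel calls this flag US_FL_IGNORE_UAS), the following Python reproduces the logged mask. The full letter list is in the usb-storage.quirks entry of Documentation/admin-guide/kernel-parameters.txt; `parse_quirks` and `FLAG_BITS` are hypothetical names.

```python
# Only the letter used in this thread is mapped. Its value is taken from the
# "Quirks match for vid 0951 pid 176b: 800000" log line (US_FL_IGNORE_UAS).
FLAG_BITS = {"u": 0x00800000}


def parse_quirks(value: str) -> list[tuple[int, int, int]]:
    """Parse a usb-storage quirks string such as "0951:176b:u" (comma-separated)."""
    entries = []
    for item in value.split(","):
        vid, pid, letters = item.strip().split(":")
        mask = 0
        for letter in letters:
            if letter not in FLAG_BITS:
                raise ValueError(f"flag {letter!r} is not mapped in this sketch")
            mask |= FLAG_BITS[letter]
        entries.append((int(vid, 16), int(pid, 16), mask))
    return entries


for vid, pid, mask in parse_quirks("0951:176b:u"):
    print(f"vid {vid:04x} pid {pid:04x}: {mask:x}")  # vid 0951 pid 176b: 800000
```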

I recreated the BCacheFS filesystem and tried again. This time it did
not take more than 10 minutes for the first I/O error to appear. Unlike
with UAS, it made rsync stop with an I/O error immediately. Before that
there were several USB resets. Here is the excerpt from dmesg:

[  795.768306] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[  932.976677] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[  963.189438] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1000.057333] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1036.917137] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1073.782876] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1110.647786] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1117.163693] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=214s
[ 1117.163718] sd 0:0:0:0: [sda] tag#0 CDB: Write(16) 8a 00 00 00 00 00 02 72 20 00 00 00 08 00 00 00
[ 1117.163725] I/O error, dev sda, sector 41033728 op 0x1:(WRITE) flags 0x104000 phys_seg 1551 prio class 2
[ 1117.163739] bcachefs (sda1 inum 1879048481 offset 2572800): data write error: I/O
[ 1117.163763] bcachefs (sda1 inum 1879048481 offset 2576384): data write error: I/O
[ 1117.163771] bcachefs (sda1 inum 1879048481 offset 2578432): data write error: I/O
[ 1117.163779] bcachefs (sda1 inum 1879048481 offset 2580480): data write error: I/O
[ 1117.163786] bcachefs (sda1 inum 1879048481 offset 2582528): data write error: I/O
[ 1117.163794] bcachefs (sda1 inum 1879048481 offset 2584576): data write error: I/O
[ 1117.163803] bcachefs (sda1 inum 1879048481 offset 2586624): data write error: I/O
[ 1117.163811] bcachefs (sda1 inum 1879048481 offset 2588672): data write error: I/O
[ 1117.163818] bcachefs (sda1 inum 1879048481 offset 2590720): data write error: I/O
[ 1117.163824] bcachefs (sda1 inum 1879048481 offset 2592768): data write error: I/O
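The timestamps in this excerpt show a regular pattern: after the first two, the resets recur roughly every 37 seconds. A Python sketch that extracts those intervals from a dmesg excerpt; the regex only assumes the "[seconds] usb X-Y: reset" layout seen here, and `reset_intervals` is a hypothetical helper name.

```python
import re

# Matches lines like "[  795.768306] usb 7-1: reset SuperSpeed ..."
RESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+usb \S+: reset ")


def reset_intervals(dmesg_lines: list[str]) -> list[float]:
    """Return the seconds between consecutive USB reset messages."""
    stamps = [float(m.group(1)) for line in dmesg_lines
              if (m := RESET_RE.match(line))]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


LOG = """\
[  795.768306] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[  932.976677] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[  963.189438] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1000.057333] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1036.917137] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1073.782876] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1110.647786] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1117.163693] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=214s
"""

print([round(d, 1) for d in reset_intervals(LOG.splitlines())])
# prints [137.2, 30.2, 36.9, 36.9, 36.9, 36.9]
```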

So even without UAS the device does not seem to like writing data under
Linux.

Next steps may involve looking for a firmware update for the external SSD
as well as trying to obtain its SMART status. So far I have not succeeded in
finding the right options for smartctl. If there is enough evidence
that the device is defective I'd try to RMA it.
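For the SMART question, a common starting point with USB-attached drives is smartctl's "-d sat" device type, which wraps ATA commands in SCSI ATA PASS-THROUGH. Whether this particular device answers them is an assumption; the "hdparm -I" output reported later in this thread suggests ATA pass-through does work. A Python sketch that builds the command and runs it only on request (`smartctl_argv` and `read_smart` are hypothetical helper names):

```python
import subprocess


def smartctl_argv(device: str, dev_type: str = "sat") -> list[str]:
    """Build a smartctl command line; "-d sat" = SCSI/ATA Translation pass-through."""
    # "-a" prints all SMART information smartctl can retrieve.
    return ["smartctl", "-d", dev_type, "-a", device]


def read_smart(device: str) -> str:
    """Run smartctl (usually needs root) and return its text output.

    smartctl's exit status is a bit mask, so it is not treated as an error here.
    """
    result = subprocess.run(smartctl_argv(device), capture_output=True, text=True)
    return result.stdout


print(" ".join(smartctl_argv("/dev/sda")))  # smartctl -d sat -a /dev/sda
```

If "sat" is rejected, "-d auto" lets smartctl choose the device type itself.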

I will keep a copy of the kernel log and can do some further tests as time
permits. So let me know whether you need anything else; for now the mail
is long enough as it is.


[1] https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

[2] How to disable USB Attached Storage (UAS)
Last edited on 4 December 2022, at 14:00

https://leo.leung.xyz/wiki/How_to_disable_USB_Attached_Storage_(UAS)

Ciao,
-- 
Martin



* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-11 16:02 Holger Hoffstätte
From: Holger Hoffstätte @ 2024-02-11 16:02 UTC
  To: Martin Steigerwald, stable, regressions, linux-usb

On 2024-02-11 16:42, Martin Steigerwald wrote:
> Hi!
> I am trying to put data on an external Kingston XS-2000 4 TB SSD using
> self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not think
> BCacheFS has any part in the errors I see, but if you disagree feel free
> to CC the BCacheFS mailing list as you reply.

This is indeed a known bug with bcachefs on USB-connected devices.
Apply the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05

This and some other commits are already scheduled for -stable.

Holger

* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-11 17:06 Martin Steigerwald
From: Martin Steigerwald @ 2024-02-11 17:06 UTC
  To: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs

Hi Holger!

CC'ing BCacheFS mailing list.

My original mail is here:

https://lore.kernel.org/linux-usb/5264d425-fc13-6a77-2dbf-6853479051a0@applied-asynchrony.com/T/#m5ec9ecad1240edfbf41ad63c7aeeb6aa6ea38a5e

Holger Hoffstätte - 11.02.24, 17:02:29 CET:
> On 2024-02-11 16:42, Martin Steigerwald wrote:
> > Hi!
> > I am trying to put data on an external Kingston XS-2000 4 TB SSD using
> > self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not
> > think BCacheFS has any part in the errors I see, but if you disagree 
> > feel free to CC the BCacheFS mailing list as you reply.
> 
> This is indeed a known bug with bcachefs on USB-connected devices.
> Apply the following commit:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05
> 
> This and some other commits are already scheduled for -stable.

Thanks!

Oh my. I was aware of some bug fixes coming for stable. I briefly looked
through them, but did not make the connection.

I will wait for 6.7.5 and retry then I bet.

Best,
-- 
Martin



* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-11 18:51 Kent Overstreet
From: Kent Overstreet @ 2024-02-11 18:51 UTC
  To: Martin Steigerwald
  Cc: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs

On Sun, Feb 11, 2024 at 06:06:27PM +0100, Martin Steigerwald wrote:
> Hi Holger!
> 
> CC'ing BCacheFS mailing list.
> 
> My original mail is here:
> 
> https://lore.kernel.org/linux-usb/5264d425-fc13-6a77-2dbf-6853479051a0@applied-asynchrony.com/T/#m5ec9ecad1240edfbf41ad63c7aeeb6aa6ea38a5e
> 
> Holger Hoffstätte - 11.02.24, 17:02:29 CET:
> > On 2024-02-11 16:42, Martin Steigerwald wrote:
> > > Hi!
> > > I am trying to put data on an external Kingston XS-2000 4 TB SSD using
> > > self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not
> > > think BCacheFS has any part in the errors I see, but if you disagree 
> > > feel free to CC the BCacheFS mailing list as you reply.
> > 
> > This is indeed a known bug with bcachefs on USB-connected devices.
> > Apply the following commit:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05
> > 
> > This and some other commits are already scheduled for -stable.
> 
> Thanks!
> 
> Oh my. I was aware of some bug fixes coming for stable. I briefly looked
> through them, but did not make the connection.
> 
> I will wait for 6.7.5 and retry then I bet.

That doesn't look related - the device claims to not support flush or
fua, and the bug resulted in us not sending flush/fua devices; the main
thing people would see without that patch, on 6.8, would be an immediate
-EOPNOTSUPP on the first flush journal write.

He only got errors after an hour or so, or 10 minutes with UAS disabled;
we send flushes once a second. Sounds like a screwy device.

* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-12 15:52 Martin Steigerwald
From: Martin Steigerwald @ 2024-02-12 15:52 UTC
  To: Kent Overstreet
  Cc: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs

Kent Overstreet - 11.02.24, 19:51:32 CET:
> On Sun, Feb 11, 2024 at 06:06:27PM +0100, Martin Steigerwald wrote:
[…]
> > CC'ing BCacheFS mailing list.
> > 
> > My original mail is here:
> > 
> > https://lore.kernel.org/linux-usb/5264d425-fc13-6a77-2dbf-6853479051a0@applied-asynchrony.com/T/#m5ec9ecad1240edfbf41ad63c7aeeb6aa6ea38a5e
> > 
> > Holger Hoffstätte - 11.02.24, 17:02:29 CET:
> > > On 2024-02-11 16:42, Martin Steigerwald wrote:
> > > > Hi!
> > > > I am trying to put data on an external Kingston XS-2000 4 TB SSD using
> > > > self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not
> > > > think BCacheFS has any part in the errors I see, but if you disagree
> > > > feel free to CC the BCacheFS mailing list as you reply.
> > > 
> > > This is indeed a known bug with bcachefs on USB-connected devices.
> > > Apply the following commit:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05
> > > 
> > > This and some other commits are already scheduled for -stable.
> > 
> > Thanks!
> > 
> > Oh my. I was aware of some bug fixes coming for stable. I briefly
> > looked through them, but did not make the connection.
> > 
> > I will wait for 6.7.5 and retry then I bet.
> 
> That doesn't look related - the device claims to not support flush or
> fua, and the bug resulted in us not sending flush/fua devices; the main
> thing people would see without that patch, on 6.8, would be an immediate
> -EOPNOTSUPP on the first flush journal write.
> 
> He only got errors after an hour or so, or 10 minutes with UAS disabled;
> we send flushes once a second. Sounds like a screwy device.

Thanks for that explanation, Kent.

I am the one with that external Transcend XS 2000 4 TB SSD, and I
specifically did not CC the bcachefs mailing list at the beginning, as after
seeing things like

[33963.462694] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 1 inflight: CMD 
[33963.462708] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
[…]
[33963.592872] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=182s

I thought some quirk of the device was at fault.

However, while the Sandisk Extreme Pro 2 TB claims to support DPO and FUA, I see

Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

also with other devices like external Toshiba Canvio 4 TB hard disks. Using
LUKS-encrypted BTRFS on those, I never saw any timeout while writing out
data with any of those hard disks. Also, with the write cache disabled,
shouldn't any cache flush / FUA request be a no-op anyway? These hard disks
have been doing a ton of backup workloads without any issues, but so far
only with BTRFS.

I may test the Transcend XS2000 with BTRFS to see whether it makes a
difference; however, I would really like to use it with BCacheFS, and I do
not really like to use LUKS for external devices. Judging by the kernel
log, I still don't think those errors at the block layer were filesystem
specific, but what do I know?

With UAS enabled for Transcend XS2000 I see:

Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

This sounds about right: without cache flush / FUA support, the write
cache should be disabled.

With UAS disabled, using only usb-storage, however I see:

Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

Which appears to be broken to me: If it cannot do cache flush / FUA it
should have write cache disabled.
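The sd line is printed once at attach time; the resulting state is also visible at runtime in sysfs, which makes the UAS and usb-storage cases easier to compare. A Python sketch, assuming the disk is sda: /sys/block/sda/queue/write_cache reads "write back" or "write through", and /sys/block/sda/queue/fua reads 1 or 0. The `root` parameter exists only so the function can be pointed at a test directory; `cache_settings` is a hypothetical name.

```python
from pathlib import Path


def cache_settings(disk: str, root: str = "/sys/block") -> dict[str, str]:
    """Return the block layer's write-cache mode and FUA flag for a disk.

    queue/write_cache holds "write back" or "write through";
    queue/fua holds "1" if FUA writes are passed down, otherwise "0".
    """
    queue = Path(root) / disk / "queue"
    return {
        "write_cache": (queue / "write_cache").read_text().strip(),
        "fua": (queue / "fua").read_text().strip(),
    }
```

On the affected machine, `cache_settings("sda")` would be called once with UAS and once with the usb-storage quirk to compare both configurations.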

Thus I removed the quirk to disable UAS again. It did not help anyway.

However, when I look at the output of "hdparm -I" for that Transcend XS2000,
none of this makes sense, because it blatantly advertises support for

[…]
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
[…]
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
[…]
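To compare what the device's ATA identify data advertises with what the SCSI layer reports, the relevant "hdparm -I" lines can be checked mechanically. This sketch assumes the layout shown above, where a leading '*' marks an entry as enabled; the feature names are copied from that excerpt and `advertised` is a hypothetical helper.

```python
# Feature names copied from the hdparm -I excerpt above.
WANTED = ("FLUSH_CACHE", "FLUSH_CACHE_EXT", "WRITE_{DMA|MULTIPLE}_FUA_EXT")


def advertised(hdparm_output: str, wanted=WANTED) -> dict[str, bool]:
    """Report which of `wanted` carry the leading '*' marker in hdparm -I output."""
    marked = set()
    for line in hdparm_output.splitlines():
        text = line.strip()
        if text.startswith("*"):
            # Drop the marker and the optional "Mandatory" prefix.
            name = text[1:].strip().removeprefix("Mandatory ")
            marked.add(name)
    return {name: name in marked for name in wanted}


excerpt = """\
           *    Mandatory FLUSH_CACHE
           *    FLUSH_CACHE_EXT
           *    WRITE_{DMA|MULTIPLE}_FUA_EXT
"""
print(advertised(excerpt))
```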

It has firmware revision S9K00107. I will see whether I can get it updated,
in case any update is available. That is not obvious to me, as Kingston
only offers a Windows application for download to update the firmware.

I asked them how to do an update on Linux, but I am also prepared to go to
a friend with a Windows system to do the update.

There is no urgency in this, so let's see whether a firmware update
fixes anything. In case someone has any additional insight, feel free to add
it. Otherwise I consider the case closed unless I retest with either Linux
kernel 6.7.5 or 6.8-rc4 and/or after having made a firmware update,
if available.

Maybe also some other quirks would need to be enabled for that
device? I tested it with:

% cat /etc/modprobe.d/disable-uas.conf
# Does not work with external SSD Transcend XS2000 4TB
options usb-storage quirks=0951:176b:u

but, as explained, that did not help, and thus I removed the UAS-disabling
quirk again.

Best,
-- 
Martin



* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-12 20:42 Kent Overstreet
From: Kent Overstreet @ 2024-02-12 20:42 UTC
  To: Martin Steigerwald
  Cc: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs, linux-block

On Mon, Feb 12, 2024 at 04:52:09PM +0100, Martin Steigerwald wrote:
> Kent Overstreet - 11.02.24, 19:51:32 CET:
> > On Sun, Feb 11, 2024 at 06:06:27PM +0100, Martin Steigerwald wrote:
> […]
> > > CC'ing BCacheFS mailing list.
> > > 
> > > My original mail is here:
> > > 
> > > https://lore.kernel.org/linux-usb/5264d425-fc13-6a77-2dbf-6853479051a0@applied-asynchrony.com/T/#m5ec9ecad1240edfbf41ad63c7aeeb6aa6ea38a5e
> > > 
> > > Holger Hoffstätte - 11.02.24, 17:02:29 CET:
> > > > On 2024-02-11 16:42, Martin Steigerwald wrote:
> > > > > Hi!
> > > > > I am trying to put data on an external Kingston XS-2000 4 TB SSD using
> > > > > self-compiled Linux 6.7.4 kernel and encrypted BCacheFS. I do not
> > > > > think BCacheFS has any part in the errors I see, but if you disagree
> > > > > feel free to CC the BCacheFS mailing list as you reply.
> > > > 
> > > > This is indeed a known bug with bcachefs on USB-connected devices.
> > > > Apply the following commit:
> > > > 
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/bcachefs?id=3e44f325f6f75078cdcd44cd337f517ba3650d05
> > > > 
> > > > This and some other commits are already scheduled for -stable.
> > > 
> > > Thanks!
> > > 
> > > Oh my. I was aware of some bug fixes coming for stable. I briefly
> > > looked through them, but did not make the connection.
> > > 
> > > I will wait for 6.7.5 and retry then I bet.
> > 
> > That doesn't look related - the device claims to not support flush or
> > fua, and the bug resulted in us not sending flush/fua devices; the main
> > thing people would see without that patch, on 6.8, would be an immediate
> > -EOPNOTSUPP on the first flush journal write.
> > 
> > He only got errors after an hour or so, or 10 minutes with UAS disabled;
> > we send flushes once a second. Sounds like a screwy device.
> 
> Thanks for that explanation, Kent.
> 
> I am the one with that external Transcend XS 2000 4 TB SSD, and I
> specifically did not CC the bcachefs mailing list at the beginning, as after
> seeing things like
> 
> [33963.462694] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 1 inflight: CMD 
> [33963.462708] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
> […]
> [33963.592872] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=182s
> 
> I thought some quirk of the device was at fault.
> 
> However, while the Sandisk Extreme Pro 2 TB claims to support DPO and FUA, I see
> 
> Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> 
> also with other devices like external Toshiba Canvio 4 TB hard disks. Using
> LUKS-encrypted BTRFS on those, I never saw any timeout while writing out
> data with any of those hard disks. Also, with the write cache disabled,
> shouldn't any cache flush / FUA request be a no-op anyway? These hard disks
> have been doing a ton of backup workloads without any issues, but so far
> only with BTRFS.
> 
> I may test the Transcend XS2000 with BTRFS to see whether it makes a
> difference; however, I would really like to use it with BCacheFS, and I do
> not really like to use LUKS for external devices. Judging by the kernel
> log, I still don't think those errors at the block layer were filesystem
> specific, but what do I know?

It's definitely not unheard of for one specific filesystem to be
tickling driver/device bugs and not others.

I wonder what it would take to dump the outstanding requests on device
timeout.

* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-15 11:09 Martin Steigerwald
From: Martin Steigerwald @ 2024-02-15 11:09 UTC
  To: Kent Overstreet
  Cc: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs, linux-block

Kent Overstreet - 12.02.24, 21:42:26 CET:

[thoughts about whether a cache flush / FUA request with write caches 
disabled would be a no-op anyway]

> > I may test the Transcend XS2000 with BTRFS to see whether it makes a
> > difference; however, I would really like to use it with BCacheFS, and I do
> > not really like to use LUKS for external devices. Judging by the kernel
> > log, I still don't think those errors at the block layer were filesystem
> > specific, but what do I know?
> 
> It's definitely not unheard of for one specific filesystem to be
> tickling driver/device bugs and not others.
> 
> I wonder what it would take to dump the outstanding requests on device
> timeout.

I got a reply back from Transcend support.

They brought up two possible issues:

1) Copied too many files at once. I am not going to accept that one. An
external 4 TB SSD should handle writing 1.4 TB in about 215000 files,
coming from a slower Toshiba Canvio Basics external HD, just fine. About
90000 files were larger files like sound and video files or installation
archives. The rest is from a Linux system backup, so smaller files. I will
likely move those elsewhere before I try again, as I do not need them on
flash anyway. However, if the number of files or the amount of data
mattered, I could never know how much data I could safely write in one go.
That is not acceptable to me.

2) Power management related to the USB port, because I am using a laptop.
It may have been that the Linux kernel decided to put the USB port the SSD
was connected to into some kind of sleep state. However, it was a constant
rsync-based copy workload. Yes, the kernel buffers data, and the reads from
the Toshiba HD should be quite a bit slower than the writes the Transcend
SSD could handle. I saw no more than 80-90 MiB/s coming from the hard
disk. However, I doubt this led to pauses in write activity of more than
30 seconds. Still, it could be a thing.

Regarding further testing, I am unsure whether to first test with BTRFS on
top of LUKS – I do not like to store clear text data on the SSD – or with
BCacheFS plus the fixes in 6.7.5 or 6.8-rc4, just in case the flush
handling fixes still have an influence on the issue at hand.

First I will have a look at how to see which USB power management options
may be in place and how to tell Linux to keep the USB port the SSD is
connected to active at all times.

Let's see how this story unfolds. At least I am in no hurry about it.
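For the USB power management check, runtime power management of a USB device is controlled through its sysfs power directory. A Python sketch, assuming the device path 7-1 from the logs above: power/control reads "auto" when autosuspend is allowed and "on" when it is not, and writing "on" (as root) keeps the device from being runtime-suspended. The helper names are hypothetical, and whether autosuspend plays any role here is exactly the open question.

```python
from pathlib import Path

USB_ROOT = "/sys/bus/usb/devices"


def usb_power_state(dev: str, root: str = USB_ROOT) -> dict[str, str]:
    """Read the runtime-PM settings of a USB device such as "7-1"."""
    power = Path(root) / dev / "power"
    return {name: (power / name).read_text().strip()
            for name in ("control", "autosuspend_delay_ms")}


def disable_autosuspend(dev: str, root: str = USB_ROOT) -> None:
    """Write "on" to power/control so the device is not runtime-suspended.

    Needs root on a real system and does not persist across reconnects.
    """
    (Path(root) / dev / "power" / "control").write_text("on")
```

For a test that does not depend on per-device settings, the kernel parameter usbcore.autosuspend=-1 disables USB autosuspend globally.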

Best,
-- 
Martin



* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-15 15:19 Alan Stern
From: Alan Stern @ 2024-02-15 15:19 UTC
  To: Martin Steigerwald
  Cc: Kent Overstreet, stable, regressions, linux-usb,
	Holger Hoffstätte, linux-bcachefs, linux-block

On Thu, Feb 15, 2024 at 12:09:20PM +0100, Martin Steigerwald wrote:
> Kent Overstreet - 12.02.24, 21:42:26 CET:
> 
> [thoughts about whether a cache flush / FUA request with write caches 
> disabled would be a no-op anyway]
> 
> > > I may test the Transcend XS2000 with BTRFS to see whether it makes a
> > > difference; however, I would really like to use it with BCacheFS, and I do
> > > not really like to use LUKS for external devices. Judging by the kernel
> > > log, I still don't think those errors at the block layer were filesystem
> > > specific, but what do I know?
> > 
> > It's definitely not unheard of for one specific filesystem to be
> > tickling driver/device bugs and not others.
> > 
> > I wonder what it would take to dump the outstanding requests on device
> > timeout.
> 
> I got a reply back from Transcend support.
> 
> They brought up two possible issues:
> 
> 1) Copied too many files at once. I am not going to accept that one. An
> external 4 TB SSD should handle writing 1.4 TB in about 215000 files,
> coming from a slower Toshiba Canvio Basics external HD, just fine. About
> 90000 files were larger files like sound and video files or installation
> archives. The rest is from a Linux system backup, so smaller files. I will
> likely move those elsewhere before I try again, as I do not need them on
> flash anyway. However, if the number of files or the amount of data
> mattered, I could never know how much data I could safely write in one go.
> That is not acceptable to me.
> 
> 2) Power management related to the USB port, because I am using a laptop.
> It may have been that the Linux kernel decided to put the USB port the SSD
> was connected to into some kind of sleep state. However, it was a constant
> rsync-based copy workload. Yes, the kernel buffers data, and the reads from
> the Toshiba HD should be quite a bit slower than the writes the Transcend
> SSD could handle. I saw no more than 80-90 MiB/s coming from the hard
> disk. However, I doubt this led to pauses in write activity of more than
> 30 seconds. Still, it could be a thing.
> 
> Regarding further testing, I am unsure whether to first test with BTRFS on
> top of LUKS – I do not like to store clear text data on the SSD – or with
> BCacheFS plus the fixes in 6.7.5 or 6.8-rc4, just in case the flush
> handling fixes still have an influence on the issue at hand.
> 
> First I will have a look at how to see which USB power management options
> may be in place and how to tell Linux to keep the USB port the SSD is
> connected to active at all times.
> 
> Let's see how this story unfolds. At least I am in no hurry about it.

This may not be an issue of power management but rather one of 
insufficient power.  A laptop may not provide enough power through its 
USB ports for the Transcend SSD to work properly under load.

You can test this by connecting a powered USB-3 hub between the laptop
and the drive.

Alan Stern

* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-02-15 15:36 Martin Steigerwald
From: Martin Steigerwald @ 2024-02-15 15:36 UTC
  To: Alan Stern
  Cc: Kent Overstreet, stable, regressions, linux-usb,
	Holger Hoffstätte, linux-bcachefs, linux-block

Alan Stern - 15.02.24, 16:19:54 CET:
> > First I will have a look at how to see which USB power management
> > options may be in place and how to tell Linux to keep the USB port
> > the SSD is connected to active at all times.
> > 
> > Let's see how this story unfolds. At least I am in no hurry about it.
> 
> This may not be an issue of power management but rather one of
> insufficient power.  A laptop may not provide enough power through its
> USB ports for the Transcend SSD to work properly under load.
> 
> You can test this by connecting a powered USB-3 hub between the laptop
> and the drive.

Interesting idea. Maybe the Transcend XS-2000 4TB needs more power than 
the Sandisk Extreme Pro 2TB.

Not sure whether I have one with USB-C at hand here, because my regular USB
hub only has USB-A connectors. I need to look for one with enough USB-A and
USB-C connectors, as I use a USB hub as a replacement for a docking station.
But I do have at least an optionally powered hub with USB-C at another
place. It does not have many ports, but for the task ahead one USB-C port
is sufficient.

I will try this as well. Thanks.

Best,
-- 
Martin



* Re: I/O errors while writing to external Transcend XS-2000 4TB SSD
@ 2024-03-15  9:08 Martin Steigerwald
From: Martin Steigerwald @ 2024-03-15  9:08 UTC
  To: Kent Overstreet
  Cc: stable, regressions, linux-usb, Holger Hoffstätte,
	linux-bcachefs

Hi!

Kent Overstreet - 11.02.24, 19:51:32 CET:
> He only got errors after an hour or so, or 10 minutes with UAS disabled;
> we send flushes once a second. Sounds like a screwy device.

Kingston support intends to replace the XS-2000 4 TB SSD through an RMA
with a variant that has a newer firmware version, in case they have one
available, while they work on a newer firmware version for the device
variant the error happened on.

So it appears the device has a bug. I will keep you posted once I receive
either that other variant or a firmware upgrade for the existing one.

I am happy with Kingston support so far. It takes quite a while, but they
are taking the issue seriously instead of writing "use Windows instead of
Linux" or something like that :) as I have read on other occasions with
hardware from other suppliers. Thanks!

Best,
-- 
Martin


