linux-block.vger.kernel.org archive mirror
* SCSI error indicating misalignment on part of Linux scsi or block layer?
@ 2024-07-16 19:55 David Howells
  2024-07-16 23:07 ` Damien Le Moal
  2024-07-17  0:01 ` David Howells
  0 siblings, 2 replies; 4+ messages in thread
From: David Howells @ 2024-07-16 19:55 UTC
  To: James E.J. Bottomley; +Cc: dhowells, linux-scsi, linux-block

Hi James,

I'm wondering if I'm seeing a problem with DIO writes through Ext4 or XFS
manifesting as SCSI misalignment errors.  This has occurred with two different
drives.  I saw it first with v6.10-rc6, I think, but I haven't tried
cachefiles for a while.  It does happen with v6.10.

ata1.00: exception Emask 0x60 SAct 0x1 SErr 0x800 action 0x6 frozen
ata1.00: irq_stat 0x20000000, host bus error
ata1: SError: { HostInt }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/68:00:b0:93:34/00:00:02:00:00/40 tag 0 ncq dma 53248 out
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x60 (host bus error)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current] 
sd 0:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
sd 0:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 02 34 93 b0 00 00 68 00
I/O error, dev sda, sector 37000112 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
ata1: EH complete
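
As a cross-check on alignment, the failed CDB can be decoded by hand.  The
following is only a rough Python sketch: decode_write10() is a made-up helper
name, and it assumes 512-byte logical blocks and a 4096-byte PAGE_SIZE.

```python
# Sketch: decode the WRITE(10) CDB from the log above and check its alignment.
# Assumptions: 512-byte logical blocks and 4096-byte pages.

def decode_write10(cdb_hex, block_size=512, page_size=4096):
    """Return LBA, length and page-alignment info for a WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    nblocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    offset = lba * block_size
    length = nblocks * block_size
    return {
        "lba": lba,
        "blocks": nblocks,
        "bytes": length,
        "page_aligned": offset % page_size == 0 and length % page_size == 0,
    }


if __name__ == "__main__":
    print(decode_write10("2a 00 02 34 93 b0 00 00 68 00"))
```

This gives LBA 37000112, 104 blocks and 53248 bytes, matching "sector
37000112" and "dma 53248" in the log, and both the device offset and the
length come out as multiples of 4096.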

For reference, I made it dump the result of the READ CAPACITY 16 command:

sd 0:0:0:0: [sda] RC16 000000003a38602f000002000000000000000000000000000000000000000000

The drive says it has 512-byte logical and physical block sizes.
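
For anyone wanting to check that reading of the dump, here is a small Python
sketch.  decode_rc16() is a hypothetical helper, and the field offsets are my
understanding of the READ CAPACITY (16) parameter data layout (last LBA in
bytes 0-7, block length in bytes 8-11, logical-blocks-per-physical-block
exponent in the low nibble of byte 13).

```python
# Sketch: pull the block-size fields out of a READ CAPACITY (16) dump.
# The field offsets are assumptions based on the SBC parameter data layout.

def decode_rc16(hex_dump):
    """Decode the last LBA and the logical/physical block sizes."""
    # Only the first 16 bytes (32 hex digits) carry the fields used here.
    data = bytes.fromhex(hex_dump.strip()[:32])
    if len(data) < 16:
        raise ValueError("need at least 16 bytes of parameter data")
    last_lba = int.from_bytes(data[0:8], "big")    # bytes 0-7
    logical = int.from_bytes(data[8:12], "big")    # bytes 8-11
    exponent = data[13] & 0x0F                     # low nibble of byte 13
    return {
        "last_lba": last_lba,
        "logical_block_size": logical,
        "physical_block_size": logical << exponent,
        "capacity_bytes": (last_lba + 1) * logical,
    }


if __name__ == "__main__":
    dump = "000000003a38602f000002000000000000000000000000000000000000000000"
    print(decode_rc16(dump))
```

With the dump above, the exponent is 0, so the physical block size equals the
512-byte logical block size, and the last LBA is 976773167.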

The DIO writes are being generated by cachefiles and are all
PAGE_SIZE-aligned in terms of file offset and request length.

I also saw this:

	CacheFiles: I/O Error: Trunc-to-dio-size failed -95 [o=000001cb]

which indicates that ext4/xfs returned EOPNOTSUPP to vfs_truncate() and thence
to cachefiles.  I'm not sure why it would do that.

Any idea what might cause this or how to investigate it further?  Is it
possible it's some sort of hardware error in the I/O bridge or IOMMU?

Thanks,
David



* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
  2024-07-16 19:55 SCSI error indicating misalignment on part of Linux scsi or block layer? David Howells
@ 2024-07-16 23:07 ` Damien Le Moal
  2024-07-17  0:01 ` David Howells
  1 sibling, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2024-07-16 23:07 UTC
  To: David Howells, James E.J. Bottomley; +Cc: linux-scsi, linux-block

On 7/17/24 04:55, David Howells wrote:
> Hi James,
> 
> I'm wondering if I'm seeing a problem with DIO writes through Ext4 or XFS
> manifesting as SCSI misalignment errors.  This has occurred with two different
> drives.  I saw it first with v6.10-rc6, I think, but I haven't tried
> cachefiles for a while.  It does happen with v6.10.
> 
> ata1.00: exception Emask 0x60 SAct 0x1 SErr 0x800 action 0x6 frozen
> ata1.00: irq_stat 0x20000000, host bus error

Bus error is a serious error...

> ata1: SError: { HostInt }
> ata1.00: failed command: WRITE FPDMA QUEUED
> ata1.00: cmd 61/68:00:b0:93:34/00:00:02:00:00/40 tag 0 ncq dma 53248 out
>          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x60 (host bus error)
> ata1.00: status: { DRDY }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

That is very low... Old hardware?

> ata1.00: configured for UDMA/133
> sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
> sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current] 
> sd 0:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command

That is likely the result of the automatic generation of sense data for failed
commands, which is based on the ATA status and error fields and defaults to
this when nothing else matches. (Yeah, I know, that is not pretty. But the SAT
specs in that area are a nightmare, and following them actually ends up with
this asc/ascq. I will try to do something about it.)

The host bus error is the issue. I am not sure what triggers it, though.
What is the adapter model you are using?

> sd 0:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 02 34 93 b0 00 00 68 00
> I/O error, dev sda, sector 37000112 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
> ata1: EH complete
> 
> For reference, I made it dump the result of the READ CAPACITY 16 command:
> 
> sd 0:0:0:0: [sda] RC16 000000003a38602f000002000000000000000000000000000000000000000000
> 
> The drive says it has 512-byte logical and physical block sizes.
> 
> The DIO writes are being generated by cachefiles and are all
> PAGE_SIZE-aligned in terms of file offset and request length.
> 
> I also saw this:
> 
> 	CacheFiles: I/O Error: Trunc-to-dio-size failed -95 [o=000001cb]
> 
> which indicates that ext4/xfs returned EOPNOTSUPP to vfs_truncate() and thence
> to cachefiles.  I'm not sure why it would do that.
> 
> Any idea what might cause this or how to investigate it further?  Is it
> possible it's some sort of hardware error in the I/O bridge or IOMMU?
> 
> Thanks,
> David
> 
> 

-- 
Damien Le Moal
Western Digital Research



* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
  2024-07-16 19:55 SCSI error indicating misalignment on part of Linux scsi or block layer? David Howells
  2024-07-16 23:07 ` Damien Le Moal
@ 2024-07-17  0:01 ` David Howells
  2024-07-17  0:31   ` Damien Le Moal
  1 sibling, 1 reply; 4+ messages in thread
From: David Howells @ 2024-07-17  0:01 UTC
  To: Damien Le Moal; +Cc: dhowells, James E.J. Bottomley, linux-scsi, linux-block

Damien Le Moal <dlemoal@kernel.org> wrote:

> That is very low... Old hardware?

I got the CPU and motherboard in 2016, I think:

	model name      : Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

	Base Board Information
		Manufacturer: ASUSTeK COMPUTER INC.
		Product Name: H97-PLUS

> What is the adapter model you are using?

This:

00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Device 8534
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 30
        I/O ports at f0b0 [size=8]
        I/O ports at f0a0 [size=4]
        I/O ports at f090 [size=8]
        I/O ports at f080 [size=4]
        I/O ports at f060 [size=32]
        Memory at f7d19000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [70] Power Management version 3
        Capabilities: [a8] SATA HBA v1.0
        Kernel driver in use: ahci

It's whatever is on the motherboard.

David



* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
  2024-07-17  0:01 ` David Howells
@ 2024-07-17  0:31   ` Damien Le Moal
  0 siblings, 0 replies; 4+ messages in thread
From: Damien Le Moal @ 2024-07-17  0:31 UTC
  To: David Howells; +Cc: James E.J. Bottomley, linux-scsi, linux-block

On 7/17/24 09:01, David Howells wrote:
> Damien Le Moal <dlemoal@kernel.org> wrote:
> 
>> That is very low... Old hardware?
> 
> I got the CPU and motherboard in 2016, I think:
> 
> 	model name      : Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
> 
> 	Base Board Information
> 		Manufacturer: ASUSTeK COMPUTER INC.
> 		Product Name: H97-PLUS

The CPU does not really matter much. I was talking about the disk connected to
your AHCI adapter. It links up at SATA-1 speed, which is uncommon for recent
drives. So I suspect your drive is old-ish, and old drives tend to be buggy
and to need quirks...
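
For reference, the negotiated speed can be read from the SStatus value in the
log. A minimal Python sketch, assuming the usual SATA SStatus layout (DET in
bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that the "SStatus 113" in the
log is printed in hex; decode_sstatus() and the tables are illustrative names
only:

```python
# Sketch: split a SATA SStatus register value into its DET/SPD/IPM fields.
# The field layout and value meanings are assumptions stated in the lead-in.

DET = {
    0x0: "no device detected",
    0x1: "device present, no phy communication",
    0x3: "device present, phy communication established",
    0x4: "phy offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
    0x3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0x0: "device not present",
    0x1: "active",
    0x2: "partial",
    0x6: "slumber",
}


def decode_sstatus(value):
    """Return the raw field values and a readable description of each."""
    det = value & 0xF
    spd = (value >> 4) & 0xF
    ipm = (value >> 8) & 0xF
    return {
        "det": (det, DET.get(det, "reserved")),
        "spd": (spd, SPD.get(spd, "reserved")),
        "ipm": (ipm, IPM.get(ipm, "reserved")),
    }


if __name__ == "__main__":
    for field, (raw, text) in decode_sstatus(0x113).items():
        print(f"{field}: {raw:#x} {text}")
```

For 0x113 this reports SPD = 1, that is a Gen1 link, which agrees with the
"SATA link up 1.5 Gbps" message.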

What does "hdparm -I" say for this drive?

> 
>> What is the adapter model you are using?
> 
> This:
> 
> 00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] (prog-if 01 [AHCI 1.0])
>         Subsystem: ASUSTeK Computer Inc. Device 8534
>         Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 30
>         I/O ports at f0b0 [size=8]
>         I/O ports at f0a0 [size=4]
>         I/O ports at f090 [size=8]
>         I/O ports at f080 [size=4]
>         I/O ports at f060 [size=32]
>         Memory at f7d19000 (32-bit, non-prefetchable) [size=2K]
>         Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
>         Capabilities: [70] Power Management version 3
>         Capabilities: [a8] SATA HBA v1.0
>         Kernel driver in use: ahci
> 
> It's whatever is on the motherboard.
> 
> David
> 
> 

-- 
Damien Le Moal
Western Digital Research

