* SCSI error indicating misalignment on part of Linux scsi or block layer?
@ 2024-07-16 19:55 David Howells
2024-07-16 23:07 ` Damien Le Moal
2024-07-17 0:01 ` David Howells
0 siblings, 2 replies; 4+ messages in thread
From: David Howells @ 2024-07-16 19:55 UTC (permalink / raw)
To: James E.J. Bottomley; +Cc: dhowells, linux-scsi, linux-block
Hi James,
I'm wondering if I'm seeing a problem with DIO writes through Ext4 or XFS
manifesting as SCSI misalignment errors. This has occurred with two different
drives. I saw it first with v6.10-rc6, I think, but I haven't tried
cachefiles for a while. It does happen with v6.10.
ata1.00: exception Emask 0x60 SAct 0x1 SErr 0x800 action 0x6 frozen
ata1.00: irq_stat 0x20000000, host bus error
ata1: SError: { HostInt }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/68:00:b0:93:34/00:00:02:00:00/40 tag 0 ncq dma 53248 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x60 (host bus error)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
sd 0:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
sd 0:0:0:0: [sda] tag#0 CDB: Write(10) 2a 00 02 34 93 b0 00 00 68 00
I/O error, dev sda, sector 37000112 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
ata1: EH complete
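As a cross-check on the log above, the Write(10) CDB and the ATA taskfile can be decoded by hand. The sketch below assumes the standard Write(10) layout (big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and that the libata "cmd" line prints command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, with the NCQ sector count carried in the feature fields. The helper names are made up for illustration.

```python
SECTOR_SIZE = 512  # logical block size reported by the drive


def decode_write10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, blocks) from a Write(10) CDB given as spaced hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    assert cdb[0] == 0x2A, "not a Write(10) opcode"
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


def decode_ncq_taskfile(tf: str) -> tuple[int, int]:
    """Return (lba, sectors) from a libata 'cmd' taskfile string (NCQ command)."""
    _command, low, hob, _device = tf.split("/")
    feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob.split(":")
    )
    # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # For NCQ commands the sector count lives in the feature registers.
    sectors = hob_feature << 8 | feature
    return lba, sectors


if __name__ == "__main__":
    lba, blocks = decode_write10("2a 00 02 34 93 b0 00 00 68 00")
    print(lba, blocks, blocks * SECTOR_SIZE)   # 37000112 104 53248
    print(decode_ncq_taskfile("61/68:00:b0:93:34/00:00:02:00:00/40"))  # (37000112, 104)
```

Both decodings agree with "sector 37000112" in the block-layer message and with "dma 53248" in the libata line, so the SCSI and ATA layers were describing the same request.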
For reference, I made it dump the result of the READ CAPACITY 16 command:
sd 0:0:0:0: [sda] RC16 000000003a38602f000002000000000000000000000000000000000000000000
The drive says it has 512-byte logical and physical block sizes.
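For anyone who wants to check that reading of the dump, here is a minimal sketch of the decoding. It assumes the usual READ CAPACITY (16) parameter layout: an 8-byte last LBA, a 4-byte logical block length, and the "logical blocks per physical block" exponent in the low nibble of byte 13. Only the first 16 bytes of the dump are used (the rest are zero), and decode_rc16 is a made-up helper name.

```python
import struct

# First 16 bytes of the RC16 dump quoted above.
RC16_PREFIX = bytes.fromhex("000000003a38602f0000020000000000")


def decode_rc16(data: bytes) -> dict:
    """Decode the leading fields of a READ CAPACITY (16) response."""
    last_lba, block_len = struct.unpack_from(">QI", data, 0)
    exponent = data[13] & 0x0F  # log2(logical blocks per physical block)
    blocks = last_lba + 1
    return {
        "logical_blocks": blocks,
        "logical_block_size": block_len,
        "physical_block_size": block_len << exponent,
        "capacity_bytes": blocks * block_len,
    }


if __name__ == "__main__":
    for name, value in decode_rc16(RC16_PREFIX).items():
        print(f"{name}: {value}")
    # logical_blocks: 976773168
    # logical_block_size: 512
    # physical_block_size: 512
    # capacity_bytes: 500107862016
```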
The DIO writes are being generated by cachefiles and are all
PAGE_SIZE-aligned in terms of file offset and request length.
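The failed request itself can be checked for alignment on the device too. This is a small sketch using the sector number and byte count from the log, assuming 512-byte sectors and a 4096-byte PAGE_SIZE (the usual x86-64 value, not stated in the log).

```python
SECTOR_SIZE = 512   # from the READ CAPACITY (16) data
PAGE_SIZE = 4096    # assumption: typical x86-64 page size


def is_page_aligned(sector: int, length_bytes: int) -> bool:
    """True if both the device byte offset and the length are PAGE_SIZE multiples."""
    offset_bytes = sector * SECTOR_SIZE
    return offset_bytes % PAGE_SIZE == 0 and length_bytes % PAGE_SIZE == 0


if __name__ == "__main__":
    # Values from "I/O error, dev sda, sector 37000112" and "dma 53248".
    print(is_page_aligned(37000112, 53248))   # True
    print(53248 // PAGE_SIZE)                 # 13
```

Under those assumptions the write starts on a 4 KiB boundary on the disk and covers 13 whole pages, so nothing about the request geometry looks unaligned.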
I also saw this:
CacheFiles: I/O Error: Trunc-to-dio-size failed -95 [o=000001cb]
which indicates that ext4/xfs returned EOPNOTSUPP to vfs_truncate() and thence
to cachefiles. I'm not sure why it would do that.
Any idea what might cause this or how to investigate it further? Is it
possible it's some sort of hardware error in the I/O bridge or IOMMU?
Thanks,
David
* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
From: Damien Le Moal @ 2024-07-16 23:07 UTC (permalink / raw)
To: David Howells, James E.J. Bottomley; +Cc: linux-scsi, linux-block
On 7/17/24 04:55, David Howells wrote:
> Hi James,
>
> I'm wondering if I'm seeing a problem with DIO writes through Ext4 or XFS
> manifesting as SCSI misalignment errors. This has occurred with two different
> drives. I saw it first with v6.10-rc6, I think, but I haven't tried
> cachefiles for a while. It does happen with v6.10.
>
> ata1.00: exception Emask 0x60 SAct 0x1 SErr 0x800 action 0x6 frozen
> ata1.00: irq_stat 0x20000000, host bus error
Bus error is a serious error...
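For reference, a rough decoding of the three masks in that exception report. The bit names are assumptions based on my reading of the AHCI port interrupt status register, the SATA SError register, and libata's AC_ERR_* flags; they should be checked against the spec and headers before being relied on.

```python
# Assumed bit assignments (a partial list, for illustration only).
IRQ_STAT_BITS = {
    30: "TFES (task file error)",
    29: "HBFS (host bus fatal error)",
    28: "HBDS (host bus data error)",
    27: "IFS (interface fatal error)",
}
SERR_BITS = {11: "internal error (HostInt)"}
EMASK_BITS = {
    0: "AC_ERR_DEV", 1: "AC_ERR_HSM", 2: "AC_ERR_TIMEOUT", 3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS", 5: "AC_ERR_HOST_BUS", 6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
}


def set_bits(value: int, names: dict) -> list:
    """Name every set bit in a 32-bit value, lowest bit first."""
    return [names.get(b, f"bit{b}") for b in range(32) if value >> b & 1]


if __name__ == "__main__":
    print(set_bits(0x20000000, IRQ_STAT_BITS))  # ['HBFS (host bus fatal error)']
    print(set_bits(0x800, SERR_BITS))           # ['internal error (HostInt)']
    print(set_bits(0x60, EMASK_BITS))           # ['AC_ERR_HOST_BUS', 'AC_ERR_SYSTEM']
```

All three point at the host side of the link rather than at the device.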
> ata1: SError: { HostInt }
> ata1.00: failed command: WRITE FPDMA QUEUED
> ata1.00: cmd 61/68:00:b0:93:34/00:00:02:00:00/40 tag 0 ncq dma 53248 out
> res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x60 (host bus error)
> ata1.00: status: { DRDY }
> ata1: hard resetting link
> ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
That is very low... Old hardware ?
> ata1.00: configured for UDMA/133
> sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s
> sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
> sd 0:0:0:0: [sda] tag#0 Add. Sense: Unaligned write command
That is likely the result of the automatic generation of sense data for failed
commands, which is based on the ATA status and error fields and defaults to this
when nothing else matches (yeah, I know, that is not pretty, but the SAT specs in
that area are a nightmare, and following them actually ends up with this
asc/ascq; I will try to do something about it).
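To illustrate the fallback, here is a simplified model of a first-match lookup on the ATA status byte. The table entries are assumptions chosen to mirror the behaviour described here and are not copied from the kernel; the real translation lives in drivers/ata/libata-scsi.c and also considers the error register first.

```python
# (status bit, sense key, asc, ascq); the first matching entry wins.
# Entries are illustrative assumptions, not the kernel's actual table.
STATUS_TABLE = [
    (0x80, "ABORTED COMMAND", 0x47, 0x00),   # BSY
    (0x40, "ILLEGAL REQUEST", 0x21, 0x04),   # DRDY: "Unaligned write command"
    (0x20, "HARDWARE ERROR", 0x44, 0x00),    # DF (device fault)
]


def fallback_sense(status: int) -> tuple:
    """Pick sense data from the status byte when the error byte matched nothing."""
    for bit, key, asc, ascq in STATUS_TABLE:
        if status & bit:
            return key, asc, ascq
    return "ABORTED COMMAND", 0x00, 0x00


if __name__ == "__main__":
    # "res 40/00:..." in the log: status 0x40 (DRDY only), error 0x00.
    key, asc, ascq = fallback_sense(0x40)
    print(f"{key} asc/ascq {asc:02x}h/{ascq:02x}h")   # ILLEGAL REQUEST asc/ascq 21h/04h
```

With only DRDY set and an empty error register, a lookup of this shape lands on 21h/04h, which is what gets printed as "Unaligned write command" even though the request was not misaligned.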
The host bus error is the issue. Not sure what triggers it though.
What is the adapter model you are using ?
--
Damien Le Moal
Western Digital Research
* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
From: David Howells @ 2024-07-17 0:01 UTC (permalink / raw)
To: Damien Le Moal; +Cc: dhowells, James E.J. Bottomley, linux-scsi, linux-block
Damien Le Moal <dlemoal@kernel.org> wrote:
> That is very low... Old hardware ?
I got the CPU and motherboard in 2016, I think:
model name : Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
Base Board Information
Manufacturer: ASUSTeK COMPUTER INC.
Product Name: H97-PLUS
> What is the adapter model you are using ?
This:
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode] (prog-if 01 [AHCI 1.0])
Subsystem: ASUSTeK Computer Inc. Device 8534
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 30
I/O ports at f0b0 [size=8]
I/O ports at f0a0 [size=4]
I/O ports at f090 [size=8]
I/O ports at f080 [size=4]
I/O ports at f060 [size=32]
Memory at f7d19000 (32-bit, non-prefetchable) [size=2K]
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
Capabilities: [70] Power Management version 3
Capabilities: [a8] SATA HBA v1.0
Kernel driver in use: ahci
It's whatever is on the motherboard.
David
* Re: SCSI error indicating misalignment on part of Linux scsi or block layer?
From: Damien Le Moal @ 2024-07-17 0:31 UTC (permalink / raw)
To: David Howells; +Cc: James E.J. Bottomley, linux-scsi, linux-block
On 7/17/24 09:01, David Howells wrote:
> Damien Le Moal <dlemoal@kernel.org> wrote:
>
>> That is very low... Old hardware ?
>
> I got the cpu and motherboard in 2016, I think:
>
> model name : Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
>
> Base Board Information
> Manufacturer: ASUSTeK COMPUTER INC.
> Product Name: H97-PLUS
The CPU does not really matter much. I was talking about the disk connected to
your AHCI adapter. It links up at SATA-1 speed (1.5 Gbps), which is uncommon for
recent drives. So I suspect your drive is old-ish, and old drives tend to be
buggy and to need quirks...
What does "hdparm -I" say for this drive?
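The link speed can be read straight from the "SStatus 113 SControl 300" values in the log. A minimal sketch, assuming the standard SATA register layout with DET in bits 3:0, SPD in bits 7:4 and IPM in bits 11:8 (decode_sreg is a hypothetical helper name):

```python
SPEEDS = {0: "none/unrestricted", 1: "Gen1 (1.5 Gbps)",
          2: "Gen2 (3.0 Gbps)", 3: "Gen3 (6.0 Gbps)"}


def decode_sreg(value: int) -> dict:
    """Split an SStatus or SControl value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,        # device detection
        "spd": value >> 4 & 0xF,   # negotiated speed (SStatus) or speed limit (SControl)
        "ipm": value >> 8 & 0xF,   # interface power management
    }


if __name__ == "__main__":
    sstatus = decode_sreg(0x113)
    scontrol = decode_sreg(0x300)
    print(sstatus, SPEEDS[sstatus["spd"]])    # {'det': 3, 'spd': 1, 'ipm': 1} Gen1 (1.5 Gbps)
    print(scontrol, SPEEDS[scontrol["spd"]])  # {'det': 0, 'spd': 0, 'ipm': 3} none/unrestricted
```

SControl carries no speed limit here, so under these assumptions the 1.5 Gbps rate was what the link negotiated rather than a cap imposed by the host.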
--
Damien Le Moal
Western Digital Research