* [Bug 219467] New: Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5
@ 2024-11-04 22:17 bugzilla-daemon
2024-11-04 22:32 ` [Bug 219467] " bugzilla-daemon
2024-11-04 22:47 ` bugzilla-daemon
0 siblings, 2 replies; 3+ messages in thread
From: bugzilla-daemon @ 2024-11-04 22:17 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=219467
Bug ID: 219467
Summary: Adaptec 71605 hangs with aacraid: Host adapter abort
request after update to linux 6.11.5
Product: SCSI Drivers
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: AACRAID
Assignee: scsi_drivers-aacraid@kernel-bugs.osdl.org
Reporter: kernel-bugzilla@cygnusx-1.org
Regression: No
On October 31st I upgraded a system from Fedora 40 to Fedora 41. This upgraded
the kernel from 6.10.6-200.fc40.x86_64 to 6.11.5-300.fc41.x86_64. One of the
system's primary uses is as a NAS using an Adaptec 71605 and zfs-2.2.6. The
system does zfs scrubs on the two zfs filesystems on Mondays, like Oct 28th and
Nov 4th. On Oct 28th it was still on the 6.10.6 kernel, and today it was on the
6.11.5 kernel.
The errors repeated until I woke up and found that the scrubs had stopped
because of zfs errors caused by the controller errors. After a bit I rebooted
the system, and then had to stop the scrubs again because they had restarted
automatically. I then installed 6.10.14-200.fc40.x86_64, and restarted the scrubs.
The scrub processes started at nearly 4am. You can see from the timing of the
logs below that the errors didn't start for over two hours into the scrub. The
house thermostat is set to 73F/76F, and the outside temperature at 6am was 45F.
So the room shouldn't have been unusually hot.
I saw zfs read and write errors on all the drives on the 71605.
I restarted the scrubs after downgrading to 6.10.14. It has been about three
hours since then, which means it has already lasted longer than it did on
6.11.5. I will update with a new comment when it either throws an error or
completes.
I built the system in May of 2021, and it hasn't given me any issues like
this before. It started with a 5.11.12-300.fc34 kernel.
I did look for a newer version of the disk controller's bios, but found it is
already the latest, 32118.
System hardware:
AMD Ryzen 9 5950X, processor
Kingston 128GB (4x32GB) DDR4 ECC, memory
ASUS Pro WS X570-ACE, motherboard
Adaptec 71605, disk controller
6 WD 18TB SATA, drives (one on the 71605, rest on other controllers)
9 WD 8TB SATA, drives (all on the 71605)
BIOS/Firmware versions:
BIOS : 7.5-0 (32118)
Firmware : 7.5-0 (32118)
An older, but very similar bug:
https://bugzilla.kernel.org/show_bug.cgi?id=217599
Timing of scrubs and errors:
Nov 04 03:46:01 storage zed[2545101]: eid=11 class=scrub_start pool='data18'
Nov 04 03:46:11 storage zed[2545231]: eid=13 class=scrub_start pool='data8'
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.
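The gap between the first scrub start and the first abort can be checked with a few lines of Python. This is only a sketch: the two timestamps are copied from the journal lines above, the year 2024 is an assumption (the short journal format omits it), and `%b` parsing assumes an English or C locale.

```python
from datetime import datetime

# journald's short timestamp format has no year, so one is supplied (assumed 2024).
FMT = "%Y %b %d %H:%M:%S"

def parse(ts: str, year: int = 2024) -> datetime:
    """Parse a 'Nov 04 03:46:01' style timestamp into a datetime."""
    return datetime.strptime(f"{year} {ts}", FMT)

scrub_start = parse("Nov 04 03:46:01")   # first scrub_start event (pool data18)
first_abort = parse("Nov 04 06:08:38")   # first "Host adapter abort request"

elapsed = first_abort - scrub_start
print(elapsed)  # 2:22:37
```

That is roughly two hours and twenty-three minutes of scrubbing before the first controller error.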
Errors:
Nov 04 06:08:38 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host adapter abort request.
aacraid: Outstanding commands on (2,1,12,0):
Nov 04 06:09:08 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
midlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
lowlevel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error
handler-8
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
firmware-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Controller reset type is
3
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:11:19 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
midlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
lowlevel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error
handler-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
firmware-124
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Controller reset type is
3
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:11:19 storage kernel: rfkill wmi_bmof snd_timer drm_ttm_helper
pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss
nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi
nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco
nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:11:19 storage kernel: src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:11:19 storage kernel: aac_src_restart_adapter.part.0+0x112/0x2b6
[aacraid]
Nov 04 06:11:19 storage kernel: aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:11:19 storage kernel: aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
Nov 04 06:12:34 storage kernel: mxm_wmi nvme sha512_ssse3 aacraid
Nov 04 06:13:04 storage kernel: aacraid: Host bus reset request. SCSI hang ?
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: Adapter health - -3
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
midlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
lowlevel-0
Nov 04 06:13:04 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: error
handler-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd:
firmware-1
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: outstanding cmd: kernel-0
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Controller reset type is
3
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:13:05 storage kernel: rfkill wmi_bmof snd_timer drm_ttm_helper
pcspkr ttm k10temp i2c_piix4 snd i2c_smbus video soundcore igc nfsd auth_rpcgss
nfs_acl lockd grace sunrpc loop nfnetlink crct10dif_pclmul crc32_pclmul
crc32c_intel polyval_clmulni polyval_generic raid1 ghash_clmulni_intel mxm_wmi
nvme sha512_ssse3 aacraid sha256_ssse3 sha1_ssse3 nvme_core sp5100_tco
nvme_auth wmi ip6_tables ip_tables fuse
Nov 04 06:13:05 storage kernel: src_sync_cmd+0x108/0x2e0 [aacraid]
Nov 04 06:13:05 storage kernel: aac_src_restart_adapter.part.0+0x112/0x2b6
[aacraid]
Nov 04 06:13:05 storage kernel: aac_reset_adapter+0xeb/0x650 [aacraid]
Nov 04 06:13:05 storage kernel: aac_eh_host_reset+0x62/0xe0 [aacraid]
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: ARC Reset attempt failed
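To see how long each reset attempt took before it was reported as failed, the "Issuing IOP reset" and "IOP reset failed" lines can be paired up. The following is a rough sketch, not a general journal parser: the `LOG` sample is copied from the messages above, the helper name `reset_durations` is made up for illustration, and the year is again assumed to be 2024.

```python
import re
from datetime import datetime

# Sample lines copied from the log above (only the reset start/fail messages).
LOG = """\
Nov 04 06:09:08 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:10:19 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:11:19 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:12:34 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
Nov 04 06:13:05 storage kernel: aacraid 0000:0a:00.0: Issuing IOP reset
Nov 04 06:14:20 storage kernel: aacraid 0000:0a:00.0: IOP reset failed
"""

# timestamp, hostname, "kernel:", "aacraid", PCI address with trailing colon, message
LINE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: aacraid \S+ (.*)$")

def reset_durations(log: str, year: int = 2024) -> list[int]:
    """Return seconds between each 'Issuing IOP reset' and the next 'IOP reset failed'."""
    durations = []
    issued = None
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if msg == "Issuing IOP reset":
            issued = when
        elif msg == "IOP reset failed" and issued is not None:
            durations.append(int((when - issued).total_seconds()))
            issued = None
    return durations

print(reset_durations(LOG))  # [71, 75, 75]
```

Each of the three attempts ran for a bit over a minute before the driver logged the failure.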
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
* [Bug 219467] Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5
2024-11-04 22:17 [Bug 219467] New: Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5 bugzilla-daemon
@ 2024-11-04 22:32 ` bugzilla-daemon
2024-11-04 22:47 ` bugzilla-daemon
1 sibling, 0 replies; 3+ messages in thread
From: bugzilla-daemon @ 2024-11-04 22:32 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=219467
--- Comment #1 from Nathan Grennan (kernel-bugzilla@cygnusx-1.org) ---
boot drives:
2x Samsung SSD 980 PRO 500GB drives in mdadm raid1
lspci, short, disk controllers:
07:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
[AHCI mode] (rev 51)
08:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
[AHCI mode] (rev 51)
0a:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
lspci, long, everything:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root
Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP
Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP
Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP
Bridge
00:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP
Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:05.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse
Internal PCIe GPP Bridge 0 to bus[E:B]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe
Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Starship/Matisse
Internal PCIe GPP Bridge 0 to bus[E:B]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Matisse/Vermeer Data
Fabric: Device 18h; Function 7
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
Controller PM9A1/PM9A3/980PRO
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream
03:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:02.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller I225-V (rev
03)
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD
Controller PM9A1/PM9A3/980PRO
06:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc.
[AMD] Starship/Matisse Reserved SPP
06:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host
Controller
06:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host
Controller
07:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
[AHCI mode] (rev 51)
08:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller
[AHCI mode] (rev 51)
09:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060
3GB] (rev a1)
09:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller
(rev a1)
0a:00.0 RAID bus controller: Adaptec Series 7 6G SAS/PCIe 3 (rev 01)
0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc.
[AMD] Starship/Matisse PCIe Dummy Function
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc.
[AMD] Starship/Matisse Reserved SPP
0c:00.1 Encryption controller: Advanced Micro Devices, Inc. [AMD]
Starship/Matisse Cryptographic Coprocessor PSPCPP
0c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host
Controller
0c:00.4 Audio device: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD
Audio Controller
* [Bug 219467] Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5
2024-11-04 22:17 [Bug 219467] New: Adaptec 71605 hangs with aacraid: Host adapter abort request after update to linux 6.11.5 bugzilla-daemon
2024-11-04 22:32 ` [Bug 219467] " bugzilla-daemon
@ 2024-11-04 22:47 ` bugzilla-daemon
1 sibling, 0 replies; 3+ messages in thread
From: bugzilla-daemon @ 2024-11-04 22:47 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=219467
--- Comment #2 from Nathan Grennan (kernel-bugzilla@cygnusx-1.org) ---
Normal boot kernel messages about the disk controller for both versions of the
kernel:
Nov 04 11:31:53 storage kernel: Linux version 6.11.5-300.fc41.x86_64
(mockbuild@a0564de4e00d4277aa3a51770ad85255) (gcc (GCC) 14.2.1 20240912 (Red
Hat 14.2.1-3), GNU ld version 2.43.1-2.fc41) #1 SMP PREEMPT_DYNAMIC Tue Oct 22
20:11:15 UTC 2024
Nov 04 11:31:53 storage kernel: Adaptec aacraid driver 1.2.1[50983]-custom
Nov 04 11:31:53 storage kernel: aacraid: Comm Interface type2 enabled
Nov 04 11:31:53 storage kernel: scsi host2: aacraid
Nov 04 12:09:05 storage kernel: Linux version 6.10.14-200.fc40.x86_64
(mockbuild@2cac3d8aa36b4f0888a34a961cba75ab) (gcc (GCC) 14.2.1 20240912 (Red
Hat 14.2.1-3), GNU ld version 2.41-37.fc40) #1 SMP PREEMPT_DYNAMIC Thu Oct 10
18:49:57 UTC 2024
Nov 04 12:09:06 storage kernel: Adaptec aacraid driver 1.2.1[50983]-custom
Nov 04 12:09:06 storage kernel: aacraid: Comm Interface type2 enabled
Nov 04 12:09:06 storage kernel: scsi host2: aacraid