linux-scsi.vger.kernel.org archive mirror
From: bugzilla-daemon@kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 217599] Adaptec 71605z hangs with aacraid: Host adapter abort request after update to linux 6.4.0
Date: Thu, 16 Nov 2023 08:45:41 +0000
Message-ID: <bug-217599-11613-b32MjWIJo0@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217599-11613@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=217599

Joop Boonen (joop.boonen@netapp.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joop.boonen@netapp.com

--- Comment #26 from Joop Boonen (joop.boonen@netapp.com) ---
We have noticed that on our server, which uses an Adaptec ASR8805 RAID
controller and runs Debian 12 (Bookworm) with kernel 6.1.55, CPUs get stuck at
100% I/O wait (wa), which causes the system to hang:
top - 12:57:32 up 7 min,  2 users,  load average: 5.02, 1.71, 0.65
Tasks: 451 total,   2 running, 449 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni, 81.8 id, 18.2 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 257590.5 total, 242751.4 free,  10355.7 used,   6092.0 buff/cache    
MiB Swap:      0.0 total,      0.0 free,      0.0 used. 247234.8 avail Mem

When running a kernel older than 6.1.53, we never see 100% I/O wait, and
certainly not sustained for a long time.
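A small Python sketch for summarizing per-CPU top output like the one above (captured with top's per-CPU view, toggled by pressing 1), flagging CPUs pinned at a given state. The helper names are hypothetical; the sketch only parses text and does not query the system.

```python
import re

# Matches per-CPU summary lines such as "%Cpu25 :  0.0 us, ... ,100.0 wa, ...".
CPU_LINE = re.compile(r"^%Cpu(\d+)\s*:(.*)$")
# Matches "<value> <field>" pairs; also works when top omits the space after a comma.
FIELD = re.compile(r"([\d.]+)\s+([a-z]+)")


def parse_top_cpus(text):
    """Return {cpu_number: {field: percent}} for every %CpuN line in text."""
    cpus = {}
    for line in text.splitlines():
        m = CPU_LINE.match(line.strip())
        if not m:
            continue  # skip header, memory and swap lines
        cpus[int(m.group(1))] = {name: float(val) for val, name in FIELD.findall(m.group(2))}
    return cpus


def saturated(cpus, field, threshold=99.0):
    """Return the sorted CPU numbers whose given field is at or above threshold."""
    return sorted(n for n, f in cpus.items() if f.get(field, 0.0) >= threshold)


if __name__ == "__main__":
    sample = """\
%Cpu16 :  0.0 us,100.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  0.0 us,  0.0 sy,  0.0 ni,  0.0 id,100.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
"""
    cpus = parse_top_cpus(sample)
    print("iowait:", saturated(cpus, "wa"))
    print("system:", saturated(cpus, "sy"))
```

Fed the full top output above, it would flag CPUs 25, 29 and 32 for wa and CPU 16 for sy.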

We also repeatedly saw the following in the kernel log:
[ 1376.837737] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.841731] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.842412] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843004] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.843587] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844169] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.844747] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845322] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.845906] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.846484] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847055] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.847628] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.848767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849336] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.849995] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1376.850560] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.789765] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.889767] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.890899] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.892002] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.893103] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.897790] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.898918] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.900009] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.901094] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.902199] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.903287] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.904384] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.905472] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.906585] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.907678] aacraid: Host adapter abort request.
               aacraid: Outstanding commands on (0,0,0,0):
[ 1378.945954] aacraid: Host bus reset request. SCSI hang ?
[ 1378.946602] aacraid 0000:af:00.0: outstanding cmd: midlevel-0
[ 1378.946607] aacraid 0000:af:00.0: outstanding cmd: lowlevel-0
[ 1378.946610] aacraid 0000:af:00.0: outstanding cmd: error handler-0
[ 1378.946613] aacraid 0000:af:00.0: outstanding cmd: firmware-32
[ 1378.946616] aacraid 0000:af:00.0: outstanding cmd: kernel-0
[ 1378.961850] aacraid 0000:af:00.0: Controller reset type is 3
[ 1378.962435] aacraid 0000:af:00.0: Issuing IOP reset
[ 1412.498211] aacraid 0000:af:00.0: IOP reset succeeded
[ 1412.523256] aacraid: Comm Interface type2 enabled
[ 1424.734176] aacraid 0000:af:00.0: Scheduling bus rescan
[ 1434.755589] aacraid 0000:af:00.0: DDR cache data recovered successfully

On another server with an Adaptec ASR8405 RAID controller, running exactly the
same distribution and kernel, we don't see this issue at all.

The only major difference is that the affected system has two sockets (i.e.
two physical CPUs).
It also has SSD drives, but I doubt that could be the cause.

We have found that this issue has existed since kernel 6.1.53, which
incorporated this patch:
scsi: aacraid: Reply queue mapping to CPUs based on IRQ affinity

https://www.spinics.net/lists/stable-commits/msg313381.html

I think this issue is related to this ticket:
https://bugzilla.kernel.org/show_bug.cgi?id=217599

and to this email:
https://lore.kernel.org/regressions/4a639fff-445e-455b-9a31-57368d6b7021@leemhuis.info/

We have tested kernel 6.1.55 (the version shipped in Debian Bookworm) with the
above-mentioned patch reverted, and it worked flawlessly.

Might it be related to having multiple CPU sockets? We don't have the issue on
a single-socket system.

Both systems use Intel Xeon CPUs.
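To check the multi-socket theory, one could look at which CPUs, and therefore which NUMA nodes, the aacraid interrupt vectors are affined to. A minimal Python sketch, assuming the controller's vectors appear in /proc/interrupts with "aacraid" in their name; the helper names are hypothetical, and it reads the standard /proc/irq/N/smp_affinity_list and /sys/devices/system/node/nodeN/cpulist files.

```python
import glob
import os
import re


def parse_cpulist(s):
    """Expand a kernel cpulist string such as "0-3,8,10-11" into a set of ints."""
    cpus = set()
    for part in s.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqs_for(interrupts_text, name="aacraid"):
    """Return IRQ numbers whose /proc/interrupts line mentions `name` (assumption)."""
    irqs = []
    for line in interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)
        if m and name in line:
            irqs.append(int(m.group(1)))
    return irqs


def cpu_to_node():
    """Map each CPU to its NUMA node using the per-node cpulist files in sysfs."""
    mapping = {}
    for path in glob.glob("/sys/devices/system/node/node*/cpulist"):
        node = int(re.search(r"node(\d+)", path).group(1))
        with open(path) as f:
            for cpu in parse_cpulist(f.read()):
                mapping[cpu] = node
    return mapping


def main():
    if not os.path.exists("/proc/interrupts"):
        print("no /proc/interrupts; not a Linux system?")
        return
    with open("/proc/interrupts") as f:
        irqs = irqs_for(f.read())
    nodes = cpu_to_node()
    for irq in irqs:
        path = f"/proc/irq/{irq}/smp_affinity_list"
        if not os.path.exists(path):
            continue
        with open(path) as f:
            cpus = sorted(parse_cpulist(f.read()))
        # Print IRQ, its allowed CPUs, and the NUMA node(s) those CPUs belong to.
        print(irq, cpus, sorted({nodes.get(c, -1) for c in cpus}))


if __name__ == "__main__":
    main()
```

Vectors whose affinity lands on CPUs of the socket the controller is not attached to would be worth comparing against the CPUs stuck in I/O wait.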

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

