From: Vitaly Chikunov <vt@altlinux.org>
To: megaraidlinux.pdl@broadcom.com, linux-scsi@vger.kernel.org,
Kashyap Desai <kashyap.desai@broadcom.com>,
Sumit Saxena <sumit.saxena@broadcom.com>,
Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
Chandrakanth patil <chandrakanth.patil@broadcom.com>
Subject: megaraid_sas: multiple FALLOC_FL_ZERO_RANGE causes timeouts and resets on MegaRAID 9560-8i 4GB since 5.19
Date: Sat, 10 Feb 2024 04:18:31 +0300
Message-ID: <20240210011831.47f55oe67utq2yr7@altlinux.org>
Hi,
Since 5.19.5 we have been getting timeouts and controller resets when
several FALLOC_FL_ZERO_RANGE requests are issued to the device back to
back, with no delay between them (3-5 requests are enough to trigger the
condition). Vanilla v5.19 was not tested; the tests below were run on
6.6.15. As a result, mkfs.ext4, for example, becomes extremely slow when
initializing a filesystem. This happens on an aarch64 (Kunpeng-920)
server.
Reproducer:
# for ((i=0;i<5;i++)); do echo $i; fallocate -z -l 2097152 /dev/sdc; done
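A timed variant of the reproducer makes it easier to see which of the back-to-back requests stalls. This is only a minimal sketch: the function name zero_range_timed is hypothetical, and it assumes bash, GNU date (for %N) and util-linux fallocate. Like the one-liner above, it zeroes the start of the target, so it should only be pointed at a scratch device.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the report): issue COUNT zero-range requests
# back to back and print the exit status and duration of each one.
# WARNING: this zeroes the first LEN bytes of TARGET.
zero_range_timed() {
    local target=$1 count=${2:-5} len=${3:-2097152}
    local i start end rc
    for ((i = 0; i < count; i++)); do
        start=$(date +%s%N)              # nanoseconds since the epoch (GNU date)
        rc=0
        fallocate -z -l "$len" "$target" || rc=$?
        end=$(date +%s%N)
        printf 'iter=%d rc=%d ms=%d\n' "$i" "$rc" $(( (end - start) / 1000000 ))
    done
}
```

Usage would be something like `zero_range_timed /dev/sdc 5`; an iteration that runs into the SCSI command timeout should stand out by its duration.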
Example dmesg messages after the problematic calls:
Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 Abort request is for SMID: 4753
Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d51beacc) tm_dev_handle 0x4
Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 task abort FAILED!! scmd(0x00000000d51beacc)
Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 CDB: Write(10) 2a 00 00 00 00 00 00 00 08 00
Feb 06 19:45:04 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 Abort request is for SMID: 8293
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d9406c9c) tm_dev_handle 0x4
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 task abort SUCCESS!! scmd(0x00000000d9406c9c)
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 CDB: Write Same(10) 41 00 03 4c 00 10 00 10 00 00
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting target reset! scmd(0x00000000d51beacc) tm_dev_handle: 0x4
Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 target reset SUCCESS!!
Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: Power-on or device reset occurred
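The CDBs in the log can be decoded by hand to see what each command was asking for. The sketch below assumes the standard 10-byte SBC layout used by Write(10) and Write Same(10) (opcode in byte 0, big-endian LBA in bytes 2-5, block count in bytes 7-8); the function name decode_cdb10 is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode LBA and block count from a 10-byte SCSI CDB
# given as ten hex bytes, assuming the SBC 10-byte layout:
#   byte 0    opcode
#   bytes 2-5 logical block address (big-endian)
#   bytes 7-8 transfer length / number of logical blocks (big-endian)
decode_cdb10() {
    local -a b=("$@")
    (( ${#b[@]} == 10 )) || { echo "expected 10 hex bytes" >&2; return 1; }
    local lba=$(( 16#${b[2]}${b[3]}${b[4]}${b[5]} ))
    local blocks=$(( 16#${b[7]}${b[8]} ))
    printf 'opcode=0x%s lba=%d blocks=%d\n' "${b[0]}" "$lba" "$blocks"
}

decode_cdb10 2a 00 00 00 00 00 00 00 08 00   # Write(10) from tag#4752
decode_cdb10 41 00 03 4c 00 10 00 10 00 00   # Write Same(10) from tag#8292
```

If the logical block size is 512 bytes (an assumption, it is not stated in the log), the Write Same(10) covers 4096 * 512 = 2097152 bytes, the same length as one request in the reproducer, and the Write(10) covers 8 * 512 = 0x1000 bytes, which agrees with the "data len requested" value logged for tag#4752.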
Excerpt from the controller event log (from storcli):
Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004 reset (Type 03)
Event Description: Drive PD 05(e0xfb/s4) link speed changed
Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004 reset (Type 03)
Event Description: Drive PD 05(e0xfb/s4) link speed changed
Event Description: Unexpected sense: PD 05(e0xfb/s4) Path 5e8b4700e35e2004, CDB: 41 00 00 00 00 00 00 10 00 00, Sense: 6/29/00
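The "Sense: key/ASC/ASCQ" triples in these events can be mapped to their SCSI names. The sketch below is a deliberately partial lookup covering only the two triples seen above, with names taken from the standard SPC sense key and additional sense code assignments as I understand them; the function name decode_sense is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a "key/asc/ascq" triple (hex, as printed in
# the controller event log) into SCSI names. Only the values that occur in
# the excerpt above are covered; anything else is reported as unknown.
decode_sense() {
    local key asc ascq key_name code_name
    IFS=/ read -r key asc ascq <<< "$1"
    case "${key,,}" in
        6) key_name="UNIT ATTENTION" ;;
        b) key_name="ABORTED COMMAND" ;;
        *) key_name="unknown sense key" ;;
    esac
    case "${asc,,}/${ascq,,}" in
        29/00) code_name="POWER ON, RESET, OR BUS DEVICE RESET OCCURRED" ;;
        4b/05) code_name="DATA OFFSET ERROR" ;;
        *)     code_name="unknown additional sense code" ;;
    esac
    printf '%s: %s\n' "$key_name" "$code_name"
}

decode_sense 6/29/00
decode_sense b/4b/05
```

The 6/29/00 case corresponds to the "Power-on or device reset occurred" line in the dmesg output, reported after the Write Same(10) to LBA 0.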
Tests were run on the latest firmware available at the time:
Product Name = MegaRAID 9560-8i 4GB
Serial Number = SKC4006982
Firmware Package Build = 52.28.0-5305
Firmware Version = 5.280.02-3972
PSOC FW Version = 0x001A
PSOC Hardware Version = 0x000A
PSOC Part Number = 29211-260-4GB
NVDATA Version = 5.2800.00-0752
CBB Version = 28.250.04.00
Bios Version = 7.28.00.0_0x071C0000
HII Version = 07.28.04.00
HIIA Version = 07.28.04.00
Driver Name = megaraid_sas
Driver Version = 07.725.01.00-rc1
I also tried the latest available megaraid_sas driver (07.728.04.00), which
is not yet merged into mainline, but it does not resolve the problem.
Thanks,
Thread overview: 8+ messages
2024-02-10 1:18 Vitaly Chikunov [this message]
2024-02-15 15:18 ` megaraid_sas: multiple FALLOC_FL_ZERO_RANGE causes timeouts and resets on MegaRAID 9560-8i 4GB since 5.19 Vitaly Chikunov
2024-02-15 18:42 ` Martin K. Petersen
2024-02-16 10:08 ` Vitaly Chikunov
2025-03-09 13:55 ` Samy Lahfa
2025-03-11 2:24 ` Martin K. Petersen
2025-03-15 17:12 ` Ryan Lahfa
2025-03-18 1:38 ` Martin K. Petersen