public inbox for linux-scsi@vger.kernel.org
From: Vitaly Chikunov <vt@altlinux.org>
To: megaraidlinux.pdl@broadcom.com, linux-scsi@vger.kernel.org,
	Kashyap Desai <kashyap.desai@broadcom.com>,
	Sumit Saxena <sumit.saxena@broadcom.com>,
	Shivasharan S <shivasharan.srikanteshwara@broadcom.com>,
	Chandrakanth patil <chandrakanth.patil@broadcom.com>
Subject: megaraid_sas: multiple FALLOC_FL_ZERO_RANGE causes timeouts and resets on MegaRAID 9560-8i 4GB since 5.19
Date: Sat, 10 Feb 2024 04:18:31 +0300
Message-ID: <20240210011831.47f55oe67utq2yr7@altlinux.org>

Hi,

We started to get timeouts and controller resets since 5.19.5 (vanilla
v5.19 was not tested; the tests below are on 6.6.15) when several
FALLOC_FL_ZERO_RANGE requests are issued to the device consecutively,
with no delay between them (3-5 are enough to trigger the condition).
Because of this, mkfs.ext4, for example, slows down drastically when
initializing a filesystem. This happens on an aarch64 (Kunpeng-920) server.

Reproducer:

  # for ((i=0;i<5;i++)); do echo $i; fallocate -z -l 2097152 /dev/sdc; done

Example dmesg messages after the problematic calls:

  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 Abort request is for SMID: 4753
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d51beacc) tm_dev_handle 0x4
  Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
  Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 task abort FAILED!! scmd(0x00000000d51beacc)
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 CDB: Write(10) 2a 00 00 00 00 00 00 00 08 00
  Feb 06 19:45:04 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 Abort request is for SMID: 8293
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d9406c9c) tm_dev_handle 0x4
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 task abort SUCCESS!! scmd(0x00000000d9406c9c)
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 CDB: Write Same(10) 41 00 03 4c 00 10 00 10 00 00
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting target reset! scmd(0x00000000d51beacc) tm_dev_handle: 0x4
  Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
  Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 target reset SUCCESS!!
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: Power-on or device reset occurred

Excerpt from the controller event log (from storcli):

  Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004  reset (Type 03)
  Event Description: Drive PD 05(e0xfb/s4) link speed changed
  Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
  Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
  Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004  reset (Type 03)
  Event Description: Drive PD 05(e0xfb/s4) link speed changed
  Event Description: Unexpected sense: PD 05(e0xfb/s4) Path 5e8b4700e35e2004, CDB: 41 00 00 00 00 00 00 10 00 00, Sense: 6/29/00
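
For reference, the 10-byte Write(10) and Write Same(10) CDBs quoted in the
logs above can be decoded with a short script. This is only a minimal sketch:
it assumes 512-byte logical blocks (the block size is not stated in this
report), it handles only opcodes 0x2a and 0x41, and the helper name
decode_cdb10 is made up for illustration.

```python
# Sketch: decode the 10-byte WRITE(10) / WRITE SAME(10) CDBs from the logs.
# Assumption: 512-byte logical blocks (not stated in the report).

OPCODES = {0x2A: "WRITE(10)", 0x41: "WRITE SAME(10)"}


def decode_cdb10(hex_bytes, block_size=512):
    """Return opcode name, LBA, block count and byte length of a 10-byte CDB."""
    cdb = bytes.fromhex(hex_bytes)
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    if cdb[0] not in OPCODES:
        raise ValueError(f"opcode {cdb[0]:#04x} is not handled by this sketch")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return {
        "op": OPCODES[cdb[0]],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


if __name__ == "__main__":
    for cdb in (
        "2a 00 00 00 00 00 00 00 08 00",  # dmesg, tag#4752
        "41 00 03 4c 00 10 00 10 00 00",  # dmesg, tag#8292
        "41 00 00 00 00 00 00 10 00 00",  # controller event log
    ):
        print(cdb, "->", decode_cdb10(cdb))
```

Under the 512-byte assumption, each Write Same(10) covers 0x1000 blocks, that
is 2097152 bytes, which matches the -l 2097152 used in the reproducer. The
Write(10) covers 8 blocks, that is 0x1000 bytes, which matches the requested
data length in the "BRCM Debug mfi stat" line.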

Tests were run on the latest firmware available at the moment:

  Product Name = MegaRAID 9560-8i 4GB
  Serial Number = SKC4006982
  Firmware Package Build = 52.28.0-5305
  Firmware Version = 5.280.02-3972
  PSOC FW Version = 0x001A
  PSOC Hardware Version = 0x000A
  PSOC Part Number = 29211-260-4GB
  NVDATA Version = 5.2800.00-0752
  CBB Version = 28.250.04.00
  Bios Version = 7.28.00.0_0x071C0000
  HII Version = 07.28.04.00
  HIIA Version = 07.28.04.00
  Driver Name = megaraid_sas
  Driver Version = 07.725.01.00-rc1

I also tried the latest available megaraid_sas driver (07.728.04.00), which is
not yet merged into mainline, but the problem is not resolved with it.

Thanks,

