public inbox for linux-scsi@vger.kernel.org
* megaraid_sas: multiple FALLOC_FL_ZERO_RANGE causes timeouts and resets on MegaRAID 9560-8i 4GB since 5.19
From: Vitaly Chikunov @ 2024-02-10  1:18 UTC
  To: megaraidlinux.pdl, linux-scsi, Kashyap Desai, Sumit Saxena,
	Shivasharan S, Chandrakanth patil

Hi,

We started getting timeouts and controller resets starting with 5.19.5
(vanilla v5.19 was not tested; the tests below are on 6.6.15) when
several fallocate(2) calls with FALLOC_FL_ZERO_RANGE are issued to the
device back to back, with no delay between them (3-5 calls are enough
to trigger the condition). As a result, mkfs.ext4, for example, slows
down dramatically when initializing a filesystem. This happens on an
aarch64 (Kunpeng-920) server.

Reproducer:

  # for ((i=0;i<5;i++)); do echo $i; fallocate -z -l 2097152 /dev/sdc; done
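
To make the per-call stall visible, the loop above could be extended to
time each call and, optionally, pause between calls. This is a minimal
POSIX sh sketch, not part of the original reproducer: zero_range_timing
is a hypothetical helper name, and it assumes GNU date (for %N) and the
util-linux fallocate(1) used above. WARNING: like the original loop, it
zeroes the first LEN bytes of TARGET, so only point it at a scratch
device or file.

```shell
# Hypothetical helper: run COUNT zero-range fallocate calls against TARGET
# and print the exit status and wall-clock latency of each one.
# WARNING: destroys the first LEN bytes of TARGET.
zero_range_timing() {
  local target="$1" count="${2:-5}" len="${3:-2097152}" delay="${4:-0}"
  local i=0 rc start end
  while [ "$i" -lt "$count" ]; do
    start=$(date +%s%N)              # GNU date: nanoseconds since the epoch
    fallocate -z -l "$len" "$target" # FALLOC_FL_ZERO_RANGE over [0, len)
    rc=$?
    end=$(date +%s%N)
    # Convert the elapsed nanoseconds to milliseconds for readability.
    echo "call $i: rc=$rc $(( (end - start) / 1000000 )) ms"
    # A delay of 0 reproduces the back-to-back pattern from the report.
    sleep "$delay"
    i=$((i + 1))
  done
}
```

For example, `zero_range_timing /dev/sdc 5` mirrors the loop above,
while a non-zero fourth argument (e.g. `zero_range_timing /dev/sdc 5
2097152 1`) adds a one-second pause between calls, which may help check
whether back-to-back submission is what triggers the timeouts.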

Example dmesg messages after the problematic fallocate calls:

  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 Abort request is for SMID: 4753
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d51beacc) tm_dev_handle 0x4
  Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
  Feb 06 19:44:07 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 task abort FAILED!! scmd(0x00000000d51beacc)
  Feb 06 19:44:07 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 CDB: Write(10) 2a 00 00 00 00 00 00 00 08 00
  Feb 06 19:45:04 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 Abort request is for SMID: 8293
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting task abort! scmd(0x00000000d9406c9c) tm_dev_handle 0x4
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 task abort SUCCESS!! scmd(0x00000000d9406c9c)
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#8292 CDB: Write Same(10) 41 00 03 4c 00 10 00 10 00 00
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: attempting target reset! scmd(0x00000000d51beacc) tm_dev_handle: 0x4
  Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
  Feb 06 19:45:06 host-226 kernel: megaraid_sas 0000:01:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: [sdc] tag#4752 target reset SUCCESS!!
  Feb 06 19:45:06 host-226 kernel: sd 0:2:4:0: Power-on or device reset occurred

Excerpt from the controller event log (from storcli):

  Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004  reset (Type 03)
  Event Description: Drive PD 05(e0xfb/s4) link speed changed
  Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
  Event Description: Unexpected sense: Encl PD fb Path 5e8b4700e35e201e, CDB: 3c 01 05 00 00 00 00 00 10 00, Sense: b/4b/05
  Event Description: PD 05(e0xfb/s4) Path 5e8b4700e35e2004  reset (Type 03)
  Event Description: Drive PD 05(e0xfb/s4) link speed changed
  Event Description: Unexpected sense: PD 05(e0xfb/s4) Path 5e8b4700e35e2004, CDB: 41 00 00 00 00 00 00 10 00 00, Sense: 6/29/00
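
The CDBs in both logs can be decoded to tie them back to the reproducer.
A minimal POSIX sh sketch, using the SBC field layout of the 10-byte
WRITE(10) (opcode 0x2a) and WRITE SAME(10) (opcode 0x41) commands:
opcode in byte 0, LBA in bytes 2-5 and block count in bytes 7-8, both
big-endian. decode_cdb is a hypothetical helper, and the 512-byte
logical block size is an assumption inferred from the Write(10) of 8
blocks being reported as a 0x1000-byte request.

```shell
# Hypothetical helper: decode a 10-byte WRITE(10) / WRITE SAME(10) CDB
# given as space-separated hex bytes, e.g. "41 00 00 00 00 00 00 10 00 00".
# Optional second argument: logical block size in bytes (assumed 512).
decode_cdb() {
  local bs="${2:-512}" lba nblk name
  set -- $1   # split the hex string into $1..${10}, one byte each
  # LBA: bytes 2-5, big-endian.
  lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
  # Number of logical blocks: bytes 7-8, big-endian.
  nblk=$(( (0x$8 << 8) | 0x$9 ))
  case $1 in
    2a|2A) name="WRITE(10)" ;;
    41)    name="WRITE SAME(10)" ;;
    *)     echo "opcode 0x$1: not decoded"; return 0 ;;
  esac
  echo "$name lba=$lba blocks=$nblk bytes=$((nblk * bs))"
}
```

Decoding the controller-log CDB `41 00 00 00 00 00 00 10 00 00` gives a
WRITE SAME(10) of 4096 blocks at LBA 0, i.e. 2097152 bytes under the
512-byte assumption, which matches `fallocate -z -l 2097152` in the
reproducer.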

Tests were run on the latest firmware available at the time:

  Product Name = MegaRAID 9560-8i 4GB
  Serial Number = SKC4006982
  Firmware Package Build = 52.28.0-5305
  Firmware Version = 5.280.02-3972
  PSOC FW Version = 0x001A
  PSOC Hardware Version = 0x000A
  PSOC Part Number = 29211-260-4GB
  NVDATA Version = 5.2800.00-0752
  CBB Version = 28.250.04.00
  Bios Version = 7.28.00.0_0x071C0000
  HII Version = 07.28.04.00
  HIIA Version = 07.28.04.00
  Driver Name = megaraid_sas
  Driver Version = 07.725.01.00-rc1

I also tried the latest available megaraid_sas driver (07.728.04.00),
which is not yet merged into mainline, but it does not resolve the problem.

Thanks,


Thread overview: 8+ messages
2024-02-10  1:18 megaraid_sas: multiple FALLOC_FL_ZERO_RANGE causes timeouts and resets on MegaRAID 9560-8i 4GB since 5.19 Vitaly Chikunov
2024-02-15 15:18 ` Vitaly Chikunov
2024-02-15 18:42   ` Martin K. Petersen
2024-02-16 10:08     ` Vitaly Chikunov
2025-03-09 13:55       ` Samy Lahfa
2025-03-11  2:24         ` Martin K. Petersen
2025-03-15 17:12           ` Ryan Lahfa
2025-03-18  1:38             ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox