From: bugzilla-daemon@bugzilla.kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 206253] New: mpt3sas driver crash under heavy load
Date: Sun, 19 Jan 2020 00:46:48 +0000
Message-ID: <bug-206253-11613@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=206253
Bug ID: 206253
Summary: mpt3sas driver crash under heavy load
Product: IO/Storage
Version: 2.5
Kernel Version: 3.10.0-957.27.2.el7.x86_64
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: SCSI
Assignee: linux-scsi@vger.kernel.org
Reporter: itoufiqu@uci.edu
Regression: No
Created attachment 286877
--> https://bugzilla.kernel.org/attachment.cgi?id=286877&action=edit
console screenshot
Hi guys,
I am new to this site, so if this problem is a repeat, my apologies in advance.
I have a Dell R7425 server with dual AMD EPYC-7301 CPUs, 256GB of RAM, and
dual SAS 3008 cards. The OS is CentOS 7.6.
I have a 60-bay JBOD connected to these 2 HBA cards; all bays are full with HGST
SAS 10TB drives. We run BeeGFS (with ZFS at the backend) on this system. This
is a brand new setup, and we noticed that under heavy load, after a while, the
system completely freezes up. It then needs a hard reboot. I ran stress-ng
for 2 days on the CPUs and RAM, and the system was stable. I torture tested the
boot drives as well for 2 days, and nothing came up. Everything seems normal.
The problem seems to come in when we start stressing the drives in the JBOD
connected to the HBA cards. I have the system configured with dual-connected
multipath (round-robin) between both of the HBA cards and the primary and
secondary SAS expanders of the JBOD.
Since this is a Dell server, I went to Dell's support website, found a
newer mpt3sas driver, and installed it. The driver version installed with the
OS was 16.100.01.00, and the updated version is now 27.00.01.00.
This morning, the system hung again, and I was able to capture something from
the console. I have attached the screenshot.
The error was, mpt3sas_cm0 fault_state(0x5854!)
Below is the modinfo output (in brief) for the current mpt3sas
driver:
filename:
/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias: mpt2sas
version: 27.00.01.00
license: GPL
description: LSI MPT Fusion SAS 3.0 & SAS 3.5 Device Driver
author: Broadcom Inc. <MPT-FusionLinux.pdl@broadcom.com>
retpoline: Y
rhelversion: 7.6
srcversion: 26E62E1FFC69FC8709F8CD7
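
For anyone comparing driver builds across several hosts, the fields above can be checked programmatically. The following is only a minimal sketch, assuming modinfo-style "key: value" text as input; the helper names parse_modinfo and version_tuple are hypothetical and not part of any existing tool. It also assumes the version string is purely dot-separated integers, as it is for the two versions mentioned in this report.

```python
import re

def parse_modinfo(text):
    """Parse modinfo-style 'key: value' lines into a dict.

    A line that does not look like 'key: value' is treated as the
    continuation of the previous key's value (e.g. a wrapped path).
    Repeated keys (modinfo can print several 'alias' lines) keep only
    the last value in this simplified sketch.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"^([A-Za-z_]+):\s*(.*)$", line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2)
        elif last_key is not None:
            # Continuation of a value that wrapped onto the next line.
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields

def version_tuple(version):
    """Turn a dotted version such as '27.00.01.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

# Sample input copied from the report above (abbreviated).
SAMPLE = """\
filename:
/lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias: mpt2sas
version: 27.00.01.00
rhelversion: 7.6
"""

if __name__ == "__main__":
    info = parse_modinfo(SAMPLE)
    # Compare against the version that originally shipped with the OS.
    newer = version_tuple(info["version"]) > version_tuple("16.100.01.00")
    print(info["version"], "is newer than 16.100.01.00:", newer)
```

Comparing integer tuples rather than raw strings avoids mis-ordering components such as "100" and "27".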
I have no idea what to do here. What can I do to fix this issue? Do I need a
special configuration? A new driver? Our file servers are usually under
moderate to high load. We have a 1.6PB system here (BeeGFS with ZFS at the
backend) running CentOS 6.9. That system runs pretty solidly without many
hiccups, and it also has an HBA 3008 card.
Thanks.
--
You are receiving this mail because:
You are the assignee for the bug.