From: bugzilla-daemon@bugzilla.kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 206253] New: mpt3sas driver crash under heavy load
Date: Sun, 19 Jan 2020 00:46:48 +0000
Message-ID: <bug-206253-11613@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=206253
Bug ID: 206253
Summary: mpt3sas driver crash under heavy load
Product: IO/Storage
Version: 2.5
Kernel Version: 3.10.0-957.27.2.el7.x86_64
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: high
Priority: P1
Component: SCSI
Assignee: linux-scsi@vger.kernel.org
Reporter: itoufiqu@uci.edu
Regression: No
Created attachment 286877
--> https://bugzilla.kernel.org/attachment.cgi?id=286877&action=edit
console screenshot
Hi guys,
I am new to this site, so if this problem is a repeat, my apologies in advance.
I have a Dell R7425 server with dual AMD EPYC-7301 CPUs, 256GB of RAM, and
dual SAS 3008 cards. The OS is CentOS 7.6.
I have a 60-bay JBOD connected to these 2 HBA cards; all bays are filled with
HGST SAS 10TB drives. We run BeeGFS (with ZFS at the backend) on this system.
This is a brand new setup, and we noticed that after a while under heavy load
the system completely freezes up. It then needs a hard reboot. I ran stress-ng
for 2 days on the CPUs and RAM, and the system was stable. I torture tested
the boot drives as well for 2 days, and nothing came up. Everything seems
normal. The problem seems to come in when we start stressing the drives in the
JBOD, connected to the HBA cards. I have the system configured with
dual-connected multipath (round-robin) between both HBA cards and the primary
and secondary SAS expanders of the JBOD.
Since this is a Dell server, I went to Dell's support website, found a
newer mpt3sas driver, and installed it. The driver version installed with the
OS was 16.100.01.00, and the updated version is now 27.00.01.00.
This morning, the system hung again, and I was able to capture something from
the console. I have attached the screenshot.
The error was: mpt3sas_cm0 fault_state(0x5854!)
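For anyone trying to collect these events from saved console or kernel logs, here is a minimal sketch in Python that pulls the controller name and fault code out of lines shaped like the one quoted above. The function name find_faults and the sample list are made up for illustration. The pattern assumes the message layout shown in this report, and also accepts the "!" outside the parentheses. It does not interpret the fault code, whose meaning is firmware-specific.

```python
import re

# Matches the fault line as quoted in this report, "fault_state(0x5854!)",
# and also a variant with the "!" placed after the closing parenthesis.
FAULT_RE = re.compile(r"(mpt3sas_cm\d+)\W*fault_state\(0x([0-9a-fA-F]+)!?\)")


def find_faults(lines):
    """Return (controller, fault_code) pairs found in the given log lines."""
    hits = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            # Group 1 is the controller name, group 2 the hex fault code.
            hits.append((match.group(1), int(match.group(2), 16)))
    return hits


if __name__ == "__main__":
    # Hypothetical input: in practice this would be lines read from a log file.
    sample = ["mpt3sas_cm0 fault_state(0x5854!)"]
    for controller, code in find_faults(sample):
        print(f"{controller}: fault code 0x{code:04x}")
```

Counting how often each controller (cm0, cm1) reports a fault may help show whether the hang follows one HBA or both.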
Below is the modinfo output (in brief) for the current mpt3sas driver:
filename: /lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias: mpt2sas
version: 27.00.01.00
license: GPL
description: LSI MPT Fusion SAS 3.0 & SAS 3.5 Device Driver
author: Broadcom Inc. <MPT-FusionLinux.pdl@broadcom.com>
retpoline: Y
rhelversion: 7.6
srcversion: 26E62E1FFC69FC8709F8CD7
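To confirm on each host which driver version is actually installed, a short Python sketch like the following could parse modinfo-style "key: value" text and compare dotted version strings numerically. The helper names parse_modinfo and version_tuple are hypothetical, and the sample text is copied from the output above rather than read from a live system.

```python
def parse_modinfo(text):
    """Parse 'key: value' lines (as printed by modinfo) into a dict.

    Repeated keys such as 'alias' keep only the first value seen.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info.setdefault(key.strip(), value.strip())
    return info


def version_tuple(version):
    """Turn a dotted driver version like '27.00.01.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    # Sample text taken from the modinfo output quoted in this report.
    sample = """\
alias: mpt2sas
version: 27.00.01.00
license: GPL
rhelversion: 7.6
"""
    info = parse_modinfo(sample)
    os_version = "16.100.01.00"  # version that shipped with the OS, per this report
    print(version_tuple(info["version"]) > version_tuple(os_version))
```

Comparing integer tuples rather than raw strings avoids misordering components of different widths, such as "100" against "00".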
I have no idea what to do here. What can I do to fix this issue? Do I need a
special configuration? A new driver? Our file servers are usually under
moderate to high load. We have a 1.6PB system here (BeeGFS + ZFS at the
backend) with CentOS 6.9. That system runs pretty solidly without much hiccup.
It also has an HBA 3008 card.
Thanks.
--
You are receiving this mail because:
You are the assignee for the bug.