From: bugzilla-daemon@bugzilla.kernel.org
To: linux-scsi@vger.kernel.org
Subject: [Bug 206253] New: mpt3sas driver crash under heavy load
Date: Sun, 19 Jan 2020 00:46:48 +0000
Message-ID: <bug-206253-11613@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=206253

            Bug ID: 206253
           Summary: mpt3sas driver crash under heavy load
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 3.10.0-957.27.2.el7.x86_64
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: itoufiqu@uci.edu
        Regression: No

Created attachment 286877
  --> https://bugzilla.kernel.org/attachment.cgi?id=286877&action=edit
console screenshot

Hi guys, 

I am new to this site, so if this problem is a duplicate, my apologies in
advance.

I have a Dell R7425 server with dual AMD EPYC-7301 CPUs, 256GB of RAM, and
dual SAS 3008 cards. The OS is CentOS 7.6.

I have a 60-bay JBOD connected to these 2 HBA cards; all bays are full with
HGST SAS 10TB drives.  We run BeeGFS (with ZFS at the backend) on this system.
This is a brand new setup, and we noticed that under heavy load, after a while,
the system completely freezes up.  It then needs a hard reboot.  I ran
stress-ng for 2 days on the CPUs and RAM, and the system was stable.  I torture
tested the boot drives as well for 2 days, and nothing came up.  Everything
seems normal.  The problem seems to come in when we start stressing the drives
in the JBOD connected to the HBA cards.  I have the system configured with
dual-connected multipath (round-robin) between both of the HBA cards and the
primary and secondary SAS expanders of the JBOD.

Since this is a Dell server, I went to Dell's support website, found a newer
mpt3sas driver, and installed it.  The driver version installed with the OS was
16.100.01.00, and the updated version is now 27.00.01.00.

This morning, the system hung again, and I was able to capture something from
the console.  I have attached the screenshot.  

The error was, mpt3sas_cm0 fault_state(0x5854!)
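
For anyone who wants to pull such codes out of a saved console or dmesg
capture, here is a minimal Python sketch.  The regular expression is an
assumption based only on the message quoted above, not on a documented log
format, and find_fault_states is a hypothetical helper name.

```python
import re

# Assumed pattern: a controller name such as "mpt3sas_cm0" followed on the
# same line by "fault_state(0x<hex>", as quoted in this report.
FAULT_RE = re.compile(r"(mpt3sas_cm\d+)\b.*?fault_state\(0x([0-9a-fA-F]+)")


def find_fault_states(log_text):
    """Return (controller, fault_code) pairs found in log_text."""
    return [(m.group(1), int(m.group(2), 16))
            for m in FAULT_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = "mpt3sas_cm0 fault_state(0x5854!)"
    for ioc, code in find_fault_states(sample):
        print(f"{ioc}: fault code 0x{code:04x}")
```

This only extracts the code from text; it does not interpret what the fault
code means.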

Below is the modinfo output (in brief) for the current mpt3sas driver:

filename:       /lib/modules/3.10.0-957.27.2.el7.x86_64/kernel/drivers/scsi/mpt3sas/mpt3sas.ko.xz
alias:          mpt2sas
version:        27.00.01.00
license:        GPL
description:    LSI MPT Fusion SAS 3.0 & SAS 3.5 Device Driver
author:         Broadcom Inc. <MPT-FusionLinux.pdl@broadcom.com>
retpoline:      Y
rhelversion:    7.6
srcversion:     26E62E1FFC69FC8709F8CD7



I have no idea what to do here.  What can I do to fix this issue? Do I need a
special configuration?  A new driver? Our file servers are usually under
moderate to high load.  We have another 1.6PB system here (BeeGFS with ZFS at
the backend) running CentOS 6.9.  That system runs pretty solidly without much
of a hiccup, and it also has an HBA 3008 card.

Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.
