linux-scsi.vger.kernel.org archive mirror
* [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
@ 2010-08-31  8:06 bugzilla-daemon
  2010-08-31  8:08 ` [Bug 17551] " bugzilla-daemon
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:06 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551

           Summary: mpt2sas -- spurious hotplug event causes drive drop to
                    drop out of JBOD array
           Product: SCSI Drivers
           Version: 2.5
    Kernel Version: 2.6.26 - 2.6.32rc4-scsi-misc
          Platform: All
        OS/Version: Linux
              Tree: Fedora
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
        AssignedTo: scsi_drivers-other@kernel-bugs.osdl.org
        ReportedBy: starlight@binnacle.cx
                CC: kashyap.desai@lsi.com
        Regression: No


At random intervals of between 10 and 40 days, a Seagate Momentus drive drops out
of an eight-drive JBOD array attached to an LSI 2008 SAS controller.

LSI 2008
eight Seagate Momentus ST9500420AS SATA drives, JBOD
LVM2 8x striped/RAID0 LV

CentOS 5.5 kernel 2.6.18-194.8.1.el5
MPT2BIOS 7.05.01.00 (2010.09.09)
SAS2008-IT 5.00.00.00
LSI mpt2sas 05.00.00.00

also

CentOS 5.4 kernel 2.6.18-164.10.1.el5
MPT2BIOS 7.03.00.00 (2009-10-12)
SAS2008-IR 4.00.00.00
distro mpt2sas version 01.101.00.00

-----

Striped LV is for logging and receives moderate write activity for 6.5 hours
each day.  Additionally a 'pbzip2' job runs nightly to compress each day's log.
Uncompressed logs run between 250 and 500 GB each.  Ext4 filesystem.
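
For a sense of scale, the figures above translate into a rough average write
rate.  This is only a back-of-the-envelope sketch: it assumes decimal
gigabytes, writes spread evenly over the 6.5-hour window, and an even split
across the eight stripes, none of which the report states.

```python
# Rough average write rate implied by 250-500 GB of logs per 6.5-hour day.
# Assumptions (not from the report): decimal GB, evenly spread writes,
# evenly balanced 8-way stripe.
ACTIVE_SECONDS = 6.5 * 3600   # daily logging window in seconds
STRIPES = 8                   # drives in the striped LV


def avg_write_rate_mb_s(log_size_gb, seconds=ACTIVE_SECONDS, stripes=STRIPES):
    """Return (total MB/s, per-drive MB/s) for one day's uncompressed log."""
    total = log_size_gb * 1000 / seconds
    return total, total / stripes


for size_gb in (250, 500):
    total, per_drive = avg_write_rate_mb_s(size_gb)
    print(f"{size_gb} GB/day: {total:.1f} MB/s total, "
          f"{per_drive:.2f} MB/s per drive")
```

Under those assumptions the sustained load per drive is only a few MB/s,
which fits the "moderate" description.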

-----

Originally reported under bug 14831 before exact nature of problem was
identified.  See bottom of that report for initial analysis by kdesai.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
@ 2010-08-31  8:08 ` bugzilla-daemon
  2010-08-31  8:08 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:08 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #1 from starlight@binnacle.cx  2010-08-31 08:08:04 ---
Created an attachment (id=28611)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28611)
kernel messages from failure with logging_level=0x1F8
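
The logging_level value is a bit mask.  The sketch below only lists which bit
positions are set in 0x1F8; it deliberately does not name the debug categories,
since the mapping of bits to categories depends on the driver version and is
not given in this report.

```python
# Decode which bits are set in the mpt2sas logging_level mask used above.
# The meaning of each bit is driver-specific and is NOT asserted here.
LOGGING_LEVEL = 0x1F8


def set_bits(mask):
    """Return the positions (0 = least significant) of the bits set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


for bit in set_bits(LOGGING_LEVEL):
    print(f"bit {bit}: 0x{1 << bit:03X}")
```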


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
  2010-08-31  8:08 ` [Bug 17551] " bugzilla-daemon
@ 2010-08-31  8:08 ` bugzilla-daemon
  2010-08-31  8:09 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:08 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #2 from starlight@binnacle.cx  2010-08-31 08:08:39 ---
Created an attachment (id=28621)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28621)
boot-time messages with logging_level=0x1F8


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
  2010-08-31  8:08 ` [Bug 17551] " bugzilla-daemon
  2010-08-31  8:08 ` bugzilla-daemon
@ 2010-08-31  8:09 ` bugzilla-daemon
  2010-08-31  8:10 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:09 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #3 from starlight@binnacle.cx  2010-08-31 08:09:16 ---
Created an attachment (id=28631)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28631)
firmware events from boot and failure


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (2 preceding siblings ...)
  2010-08-31  8:09 ` bugzilla-daemon
@ 2010-08-31  8:10 ` bugzilla-daemon
  2010-08-31  8:12 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:10 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #4 from starlight@binnacle.cx  2010-08-31 08:10:31 ---
Created an attachment (id=28641)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28641)
boot-time information from 'lsiutil'


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (3 preceding siblings ...)
  2010-08-31  8:10 ` bugzilla-daemon
@ 2010-08-31  8:12 ` bugzilla-daemon
  2010-08-31  8:23 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:12 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #5 from starlight@binnacle.cx  2010-08-31 08:12:16 ---
Created an attachment (id=28651)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28651)
miscellaneous 'lsiutil' information collected after failure


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (4 preceding siblings ...)
  2010-08-31  8:12 ` bugzilla-daemon
@ 2010-08-31  8:23 ` bugzilla-daemon
  2010-08-31 12:36 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31  8:23 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #6 from starlight@binnacle.cx  2010-08-31 08:23:09 ---
Hardware details:

Supermicro 1026T-URF
two Intel / Xeon X5560 2.8GHz / 1333MHz / 8MB L3 / D0
Hynix / ECC UDIMM HMT351U7AFR8C-H9 / 1333MHz / 4GB x 6 = 24GB
Supermicro AOC-USAS2-L8i SAS controller


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (5 preceding siblings ...)
  2010-08-31  8:23 ` bugzilla-daemon
@ 2010-08-31 12:36 ` bugzilla-daemon
  2010-09-03 18:34 ` [Bug 17551] mpt2sas -- spurious hotplug event causes drive " bugzilla-daemon
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-08-31 12:36 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #7 from starlight@binnacle.cx  2010-08-31 09:32:44 ---
Left out one bit of hardware, and remembering it led to an idea.  A SuperMicro
SAS-113TQ SAS/SATA backplane (
http://www.supermicro.com/manuals/other/BPN-SAS-113TQ.pdf ) is also in the mix
and could be a cause of the random hotplug events.  Distinctly recall
puzzling over two tiny ribbon cables that run between the controller card and
the backplane.  Turned out that the extra connections allow the controller and
backplane to communicate via the obscure "SGPIO" protocol (
http://en.wikipedia.org/wiki/SGPIO ).  Seems to be for flashing LEDs but who
knows?  Maybe the backplane can trigger hotplug events.

Another detail is that it's always the last drive in each of the two SAS IPASS
cable groups that drops:  either physical slot 3 or physical slot 7 (where the
ranges are 0-3 and 4-7).  A suspicious coincidence.
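
The slot pattern can be made explicit with a small sketch.  It assumes four
consecutive slots per cable group, as the ranges above suggest; the function
names are illustrative only.

```python
# Map a physical slot to its cable group and position within that group.
# Assumption: two groups of four consecutive slots (0-3 and 4-7).
SLOTS_PER_GROUP = 4
TOTAL_SLOTS = 8


def cable_position(slot):
    """Return (group, position) for a physical slot number 0-7."""
    if not 0 <= slot < TOTAL_SLOTS:
        raise ValueError(f"slot {slot} is outside 0-{TOTAL_SLOTS - 1}")
    return divmod(slot, SLOTS_PER_GROUP)


def is_last_in_group(slot):
    """True when the slot is the final one on its cable group."""
    return cable_position(slot)[1] == SLOTS_PER_GROUP - 1


for slot in range(TOTAL_SLOTS):
    group, pos = cable_position(slot)
    marker = "  <- last in group" if is_last_in_group(slot) else ""
    print(f"slot {slot}: group {group}, position {pos}{marker}")
```

Only slots 3 and 7 are flagged, which are exactly the two slots where drives
have dropped.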


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (6 preceding siblings ...)
  2010-08-31 12:36 ` bugzilla-daemon
@ 2010-09-03 18:34 ` bugzilla-daemon
  2010-09-04 20:14 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-09-03 18:34 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551


starlight@binnacle.cx changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mpt2sas -- spurious hotplug |mpt2sas -- spurious hotplug
                   |event causes drive drop to  |event causes drive to drop
                   |drop out of JBOD array      |out of JBOD array





* [Bug 17551] mpt2sas -- spurious hotplug event causes drive to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (7 preceding siblings ...)
  2010-09-03 18:34 ` [Bug 17551] mpt2sas -- spurious hotplug event causes drive " bugzilla-daemon
@ 2010-09-04 20:14 ` bugzilla-daemon
  2010-09-05  6:41 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-09-04 20:14 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #8 from starlight@binnacle.cx  2010-09-04 20:14:36 ---
Created an attachment (id=28962)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=28962)
kernel log from yet another failure

yet another controller failure

different profile:  infinite hot-plug event loop this time


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (8 preceding siblings ...)
  2010-09-04 20:14 ` bugzilla-daemon
@ 2010-09-05  6:41 ` bugzilla-daemon
  2010-09-05 16:55 ` bugzilla-daemon
  2012-08-13 16:07 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-09-05  6:41 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #9 from starlight@binnacle.cx  2010-09-05 06:40:44 ---
Possibly have figured this out.  Since the problem often has occurred when the
logging application becomes idle, it seems possible that power management in
the drives is a cause.  The Seagate Momentus ST9500420AS drives are known for
parking their heads aggressively (and driving laptop users nuts).  For some
reason 'hdparm' does not work with LSI attached drives under CentOS 5.5, but it
does work under Fedora 12.  Have a F12 OS image available on the server and
used it to run 'hdparm -B 255 /dev/sdX' on all of the drives, then rebooted
back to CentOS after verifying that the value sticks.  Time will tell if
disabling APM on the drives works around the issue.
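
A minimal sketch of applying the setting to every drive follows.  It only
builds and prints the 'hdparm' command lines and executes nothing; the device
names sda through sdh are hypothetical, since the report only says /dev/sdX.

```python
# Build the hdparm command lines that disable APM on each array member.
# Nothing is executed; the commands are printed for review.
APM_DISABLE = 255  # the -B value used in this comment


def apm_disable_commands(devices, level=APM_DISABLE):
    """Return one hdparm argv list per device."""
    if not 1 <= level <= 255:
        raise ValueError("hdparm -B expects a value from 1 to 255")
    return [["hdparm", "-B", str(level), dev] for dev in devices]


# Hypothetical device names for the eight drives.
DEVICES = [f"/dev/sd{letter}" for letter in "abcdefgh"]

for argv in apm_disable_commands(DEVICES):
    print(" ".join(argv))
```

Running 'hdparm -B /dev/sdX' with no value afterwards reports the current
setting, which is one way to check that it sticks across a reboot.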

If this is the cause, it implies that possibly the LSI firmware is mistaking
APM event notifications from the drives for hot-plug events.  Seems to me that
would be a bug.  However it's strange that this only happens after an extended
period of time, so it may be a more complex variation of that basic theory. 
Perhaps the drives have a quirk where they drop into the spin-down power state
only after a certain amount of uptime.


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (9 preceding siblings ...)
  2010-09-05  6:41 ` bugzilla-daemon
@ 2010-09-05 16:55 ` bugzilla-daemon
  2012-08-13 16:07 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2010-09-05 16:55 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551





--- Comment #10 from starlight@binnacle.cx  2010-09-05 16:55:19 ---
Arrgh!  It appears that this drive has bad firmware that hangs and freezes
along with excessively parking the heads.  Even better, Seagate has not
released a fix.  Second server with bad Seagate firmware--definitely sticking
with Western Digital going forward.

http://forums.seagate.com/t5/Momentus-XT-Momentus-and/Momentus-ST9500420AS-Firmware-Update/td-p/33862

Hopefully the disabling of APM will avoid the firmware bug.


* [Bug 17551] mpt2sas -- spurious hotplug event causes drive to drop out of JBOD array
  2010-08-31  8:06 [Bug 17551] New: mpt2sas -- spurious hotplug event causes drive drop to drop out of JBOD array bugzilla-daemon
                   ` (10 preceding siblings ...)
  2010-09-05 16:55 ` bugzilla-daemon
@ 2012-08-13 16:07 ` bugzilla-daemon
  11 siblings, 0 replies; 13+ messages in thread
From: bugzilla-daemon @ 2012-08-13 16:07 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=17551


Alan <alan@lxorguk.ukuu.org.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |alan@lxorguk.ukuu.org.uk
         Resolution|                            |DOCUMENTED




