* 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31)
@ 2009-12-13 15:08 Jörn Nettingsmeier
  2009-12-13 20:38 ` adam radford
  0 siblings, 1 reply; 4+ messages in thread
From: Jörn Nettingsmeier @ 2009-12-13 15:08 UTC (permalink / raw)
  To: linux-scsi; +Cc: Jörn Nettingsmeier

hi everyone!


i have had a weird issue with a 3ware 9690SA controller in an Intel
Nehalem system. five 2 TB drives are attached to the controller in a
RAID 6 configuration, which is exported to the operating system as a
single volume.

an installation of openSUSE 11.2 (kernel 2.6.31) fails reproducibly
after 10 to 20 minutes of heavy disk activity. the failure shows up as
the first two drives dropping out of the array, after which the
controller switches to read-only mode and all subsequent writes fail.

both the CLI and the controller BIOS report drives p0 and p1 as
disconnected.

at the same time, the controller BIOS fails to come up after a warm
reboot 3 times out of 4; only a cold restart fixes this.

the nature of the failure and the reboot problems made me suspect a
hardware fault (an opinion shared by my vendor's support technician).

so the box was sent back for testing. it turns out they can reproduce
the error 100% of the time with openSUSE 11.2, but not with an older
Debian or SLES 10 system. they also swapped practically every hardware
component in the lab (i had already seen the issue with two different
controllers of the same model).

so it might be a software issue after all.
since the vendor does not support openSUSE 11.2, they closed the case
and sent the machine back.
still, this is nagging me, because if it happened on a production
machine, it would mean massive data loss...

are you aware of any regressions in the 3ware driver or userspace
utilities since 2.6.27 (the latest kernel that passed the tests) that
could be causing this issue?

here are some screenshots of the dmesg buffer and the controller BIOS
during the error condition:

http://stackingdwarves.net/download/3ware-9690SE-crash/

i'd be happy to run further tests on that machine on Monday. any advice
on how to proceed would be most welcome.


best regards,


jörn



ps: i'd appreciate being cc:ed on this issue, since i'm not subscribed.



-- 
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio), Elektrofachkraft
Audio and event engineer - Ambisonic surround recordings

http://stackingdwarves.net




Thread overview: 4+ messages
2009-12-13 15:08 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31) Jörn Nettingsmeier
2009-12-13 20:38 ` adam radford
2009-12-13 22:24   ` Jörn Nettingsmeier
2009-12-14  0:45     ` adam radford
