* 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31)
@ 2009-12-13 15:08 Jörn Nettingsmeier
  2009-12-13 20:38 ` adam radford
  0 siblings, 1 reply; 4+ messages in thread
From: Jörn Nettingsmeier @ 2009-12-13 15:08 UTC
  To: linux-scsi; +Cc: Jörn Nettingsmeier

hi everyone !


i have had a weird issue with the 3ware 9690SA controller on an intel
nehalem system. there are five 2 TB drives attached to that controller in
a raid6 configuration, which is exported to the operating system as a
single volume.

an installation of opensuse 11.2 (using a 2.6.31 kernel) fails
reproducibly after 10-20 minutes of heavy disk activity. the error shows
up as the first two drives dropping out of the array, whereupon the
controller switches to read-only mode and any subsequent writes fail.

checking with the CLI or the controller bios shows that drives p0 and p1
are disconnected.

at the same time, the controller bios fails to come up after a warm
reboot 3 out of 4 times, which is fixed only by a cold restart.

the nature of the failure and the reboot problems made me suspect a
hardware failure (an opinion shared by my vendor's support technician).

so the box was sent back for testing. turns out they can reproduce the
error 100% with opensuse 11.2, but not with an older debian or SLES 10
system. and they pretty much swapped all hardware components in the lab
(i had already seen the issue with two different controllers of the same
model).

so it might be a software issue after all.
since the vendor does not support opensuse 11.2, they closed the issue
and sent the machine back.
still, this is nagging me, simply because if it happened on a production
machine, it would be an issue of massive data loss...

do you know of any known regressions in the 3ware driver or userspace
utilities since 2.6.27 (because that was the latest kernel that passed
the tests) that could be causing this issue?

here are some screenshots of the dmesg buffer and the controller bios
during the error condition:

http://stackingdwarves.net/download/3ware-9690SE-crash/

i'd be happy to run further tests on that machine on monday - any advice
on how to proceed would be most welcome.


best regards,


jörn



ps: i'd appreciate being cc:ed on this issue, since i'm not subscribed.



-- 
Jörn Nettingsmeier
Lortzingstr. 11, 45128 Essen, Tel. +49 177 7937487

Meister für Veranstaltungstechnik (Bühne/Studio), Elektrofachkraft
Audio and event engineer - Ambisonic surround recordings

http://stackingdwarves.net

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31)
  2009-12-13 15:08 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31) Jörn Nettingsmeier
@ 2009-12-13 20:38 ` adam radford
  2009-12-13 22:24   ` Jörn Nettingsmeier
  0 siblings, 1 reply; 4+ messages in thread
From: adam radford @ 2009-12-13 20:38 UTC
  To: Jörn Nettingsmeier; +Cc: linux-scsi

2009/12/13 Jörn Nettingsmeier <nettings@stackingdwarves.net>:
> do you know of any known regressions in the 3ware driver or userspace
> utilities since 2.6.27 (because that was the latest kernel that passed
> the tests) that could be causing this issue?
>

The 3ware driver and userspace utilities do make the decision of whether
disks get kicked out of the array, the 3ware firmware does.  Make sure
you are running the 9.5.3 (latest) firmware.

> here are some screenshots of the dmesg buffer and the controller bios
> during the error condition:
>
> http://stackingdwarves.net/download/3ware-9690SE-crash/
>

I believe you have bad hardware, or a drive cabling issue.  I will
forward this email to 3ware support.  They should be able to tell you
if you have a known flakey drive firmware rev, enclosure, etc.

-Adam

* Re: 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31)
  2009-12-13 20:38 ` adam radford
@ 2009-12-13 22:24   ` Jörn Nettingsmeier
  2009-12-14  0:45     ` adam radford
  0 siblings, 1 reply; 4+ messages in thread
From: Jörn Nettingsmeier @ 2009-12-13 22:24 UTC
  To: adam radford; +Cc: linux-scsi

hi adam!


thanks for your reply!

adam radford wrote:
> 2009/12/13 Jörn Nettingsmeier <nettings@stackingdwarves.net>:
>> do you know of any known regressions in the 3ware driver or userspace
>> utilities since 2.6.27 (because that was the latest kernel that passed
>> the tests) that could be causing this issue?
>>
> 
> The 3ware driver and userspace utilities do make the decision of whether

i guess that should read "do *not*" ?

> disks get kicked out of the array, the 3ware firmware does.  Make sure
> you are running the 9.5.3 (latest) firmware.

i will check this first thing tomorrow and report back.
(the vendor assured me that the firmware was at the latest version, but
i didn't have a chance to check it myself.)

>> here are some screenshots of the dmesg buffer and the controller bios
>> during the error condition:
>>
>> http://stackingdwarves.net/download/3ware-9690SE-crash/
>>
> 
> I believe you have bad hardware, or a drive cabling issue. 

as i said before, this issue has occurred with two identical controller
cards, and the vendor claims to also have swapped backplane and cabling.
i'll try and get their technician on the phone tomorrow to confirm that.

> I will
> forward this email
> to 3ware support.  They should be able to tell you if you have a known flakey
> drive firmware rev, enclosure, etc.

great, i appreciate your interest and support in this issue!

i'll try and fend off my client for another few days (who is kind of
eager to get the system up and running), but i do want to get this thing
sorted out - i have always used 3ware gear on linux with great
success...
after spending more than 18 work hours just waiting for the array to
initialize, fall apart, and start over, this is getting personal :-D




* Re: 3ware 9690SA: raid6 array blows up under opensuse 11.2 (2.6.31)
  2009-12-13 22:24   ` Jörn Nettingsmeier
@ 2009-12-14  0:45     ` adam radford
  0 siblings, 0 replies; 4+ messages in thread
From: adam radford @ 2009-12-14  0:45 UTC
  To: Jörn Nettingsmeier; +Cc: linux-scsi

2009/12/13 Jörn Nettingsmeier <nettings@stackingdwarves.net>:
> adam radford wrote:
>> 2009/12/13 Jörn Nettingsmeier <nettings@stackingdwarves.net>:
>>> do you know of any known regressions in the 3ware driver or userspace
>>> utilities since 2.6.27 (because that was the latest kernel that passed
>>> the tests) that could be causing this issue?
>>>
>>
>> The 3ware driver and userspace utilities do make the decision of whether
>
> i guess that should read "do *not*" ?

Yes, that should read "do not".

-Adam