linux-raid.vger.kernel.org archive mirror
* mdadm freezes the system
@ 2010-06-08  8:59 Roman Mamedov
From: Roman Mamedov @ 2010-06-08  8:59 UTC (permalink / raw)
  To: linux-raid

Hello.

I am having a strange issue with md RAID on the 2.6.34 kernel. To be
specific, it sometimes locks up the system completely, with the following
symptoms:
- any attempt to read from an array seems to never return
- no errors at all on the server console
- in one lock-up episode I had "top" running, which displayed zero CPU
  load (no mdX_raidX in sight on top of the CPU-load sorted list)
- Alt-SysRq-B works, and lets me reboot the system

Now, regarding when this happens. I had two such lock-ups shortly after moving
my root FS to RAID5; after the first one I changed the FS from XFS to Ext4
(this did not help), after the second one I disabled NCQ on all drives and the
write intent bitmap on the array. After that, it worked for maybe a week of
intense reads/writes onto the arrays with no more hangs.

Today, I have decided to convert a three-member RAID5 into a four-member
RAID6. mdadm segfaulted(!) right after the --grow command, and dmesg had
an error about md being unable to overwrite the /sys/.....stripe_cache_size
file. (As I understand it, this is already fixed in the latest kernel.)

The array then started rebuilding as 4-member RAID6 seemingly fine, but
shortly after, the system locked up in the same manner as described above.

Several attempts to do the rebuild after reboots consistently caused the same
lock-ups early in the rebuild (at less than 1% done). So for now, I decided to
give up and returned the array to its previous RAID5 three-member
configuration, which went fine.
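
To pin down how far a rebuild gets before a hang, the progress line in /proc/mdstat could be polled and printed somewhere that survives the lock-up (a serial or remote console, for example). Below is a minimal sketch of the parsing part only. The SAMPLE text is a made-up illustration of the usual mdstat layout, not output from this machine, and the name parse_progress is my own.

```python
import re

# Progress line that md prints in /proc/mdstat while an operation is running,
# e.g. "[>....]  reshape =  0.4% (7774208/1943359488) finish=... speed=..."
PROGRESS_RE = re.compile(
    r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%\s*\((\d+)/(\d+)\)"
)


def parse_progress(mdstat_text):
    """Return {array_name: (operation, percent_done)} for busy arrays."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid6 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        found = PROGRESS_RE.search(line)
        if found and current is not None:
            progress[current] = (found.group(1), float(found.group(2)))
    return progress


# Hypothetical sample in the usual /proc/mdstat layout (illustration only;
# device names and block counts are invented).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      3886718976 blocks level 6, 64k chunk, algorithm 18 [4/3] [UUU_]
      [>....................]  reshape =  0.4% (7774208/1943359488) finish=1520.3min speed=21218K/sec

md1 : active raid5 sdc2[2] sdb2[1] sda2[0]
      19534848 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(parse_progress(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat, read in a loop with a short sleep between reads.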

The configuration:
md0 is 3* 1990GB RAID5
md1 is 3* 10GB RAID5 (root FS)
The three drives are 2* WD20EADS and 1* Hitachi 2TB drive. The fourth array
member I was trying to add to md0 is a RAID0 of two 1TB drives (Seagate and
Hitachi). The SATA controllers are the nForce4 chipset and a PCI-E JMicron
JMB363. I am using mdadm 3.1.2 now, and am going to try the 2.6.35-rc2 kernel.
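
For reference, the capacity arithmetic behind this conversion can be sketched as below. It assumes equal-sized members and the standard parity overhead (one member for RAID5, two for RAID6), and it ignores metadata overhead. The function name usable_gb is my own.

```python
def usable_gb(level, members, member_gb):
    """Usable capacity of a RAID5/RAID6 array with equal-sized members."""
    # RAID5 spends one member's worth of space on parity, RAID6 spends two.
    parity = {5: 1, 6: 2}[level]
    if members <= parity:
        raise ValueError("not enough members for this RAID level")
    return (members - parity) * member_gb


if __name__ == "__main__":
    print(usable_gb(5, 3, 1990))  # md0 as it is now: three-member RAID5
    print(usable_gb(6, 4, 1990))  # md0 after the intended RAID6 conversion
    print(usable_gb(5, 3, 10))    # md1 (root FS)
```

Under those assumptions the four-member RAID6 has the same usable size as the three-member RAID5, so the conversion adds a second parity block per stripe without changing the capacity.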

So, my question is: does anyone have an idea of what could cause this, and what
would be the best way to diagnose/fix the lockup problem?  Thanks in advance.

-- 
With respect,
Roman

