segfault in mdadm v2.6.4


* segfault in mdadm v2.6.4
@ 2008-04-04  5:03 Brett Dikeman
  2008-04-04 10:05 ` Christian Pernegger
  0 siblings, 1 reply; 5+ messages in thread
From: Brett Dikeman @ 2008-04-04  5:03 UTC (permalink / raw)
  To: linux-raid

On a Debian/testing system, under both 2.6.22 and 2.6.24, I've been
trying to set up a 4-drive RAID6 array.  I see the following segfault
listed in /var/log/messages, and it has appeared each time I've
assembled the array.  Two drives are on the onboard SATA; two are on
external USB-SATA bridges (this is not permanent- just what I had
available for migrating off another array.)

Also, the array's initial "resync" hung after a few hours, much to my
great annoyance (it's a 10+ hour process; I was 3 hours in.)  It took
the entire device with it- I couldn't unmount the filesystem.  I
eventually tracked it to one of the four drives, on the USB<->SATA  
bridge; it wasn't responding, and the other 3 drives seemed fine.   It  
then took the entire system with it a few minutes later; all the  
running daemons stopped responding and I couldn't get a shell.  After  
waiting half an hour for a device timeout, etc- I unplugged the hung  
drive since I didn't have anything on the array.  No change, no kernel  
messages logged before or after, not even when the USB device  
'disappeared.'  I gave up and power-cycled the box.

I'd appreciate being cc'd on followups- though I will be checking the  
archives.  I'm happy to provide additional info and run tests.

Thanks!
Brett

Apr  4 00:36:46 frank kernel: md: md1 stopped.
Apr  4 00:36:46 frank kernel: md: unbind<sdc2>
Apr  4 00:36:46 frank kernel: md: export_rdev(sdc2)
Apr  4 00:36:46 frank kernel: md: unbind<sdd2>
Apr  4 00:36:46 frank kernel: md: export_rdev(sdd2)
Apr  4 00:36:46 frank kernel: md: bind<sdd2>
Apr  4 00:36:46 frank kernel: md: bind<sde2>
Apr  4 00:36:46 frank kernel: md: bind<sdf2>
Apr  4 00:36:46 frank kernel: md: bind<sdc2>
Apr  4 00:36:46 frank kernel: xor: automatically using best  
checksumming function: generic_sse
Apr  4 00:36:46 frank kernel:    generic_sse:  3086.000 MB/sec
Apr  4 00:36:46 frank kernel: xor: using function: generic_sse  
(3086.000 MB/sec)
Apr  4 00:36:46 frank kernel: async_tx: api initialized (sync-only)
Apr  4 00:36:46 frank kernel: raid6: int64x1    693 MB/s
Apr  4 00:36:46 frank kernel: raid6: int64x2    922 MB/s
Apr  4 00:36:46 frank kernel: raid6: int64x4   1083 MB/s
Apr  4 00:36:46 frank kernel: raid6: int64x8    794 MB/s
Apr  4 00:36:46 frank kernel: raid6: sse2x1    1268 MB/s
Apr  4 00:36:46 frank kernel: raid6: sse2x2    1828 MB/s
Apr  4 00:36:46 frank kernel: raid6: sse2x4    1929 MB/s
Apr  4 00:36:46 frank kernel: raid6: using algorithm sse2x4 (1929 MB/s)
Apr  4 00:36:46 frank kernel: md: raid6 personality registered for  
level 6
Apr  4 00:36:46 frank kernel: md: raid5 personality registered for  
level 5
Apr  4 00:36:46 frank kernel: md: raid4 personality registered for  
level 4
Apr  4 00:36:46 frank kernel: raid5: device sdc2 operational as raid  
disk 0
Apr  4 00:36:46 frank kernel: raid5: device sdf2 operational as raid  
disk 3
Apr  4 00:36:46 frank kernel: raid5: device sde2 operational as raid  
disk 2
Apr  4 00:36:46 frank kernel: raid5: device sdd2 operational as raid  
disk 1
Apr  4 00:36:46 frank kernel: raid5: allocated 4274kB for md1
Apr  4 00:36:46 frank kernel: raid5: raid level 6 set md1 active with  
4 out of 4 devices, algorithm 2
Apr  4 00:36:46 frank kernel: RAID5 conf printout:
Apr  4 00:36:46 frank kernel:  --- rd:4 wd:4
Apr  4 00:36:46 frank kernel:  disk 0, o:1, dev:sdc2
Apr  4 00:36:46 frank kernel:  disk 1, o:1, dev:sdd2
Apr  4 00:36:46 frank kernel:  disk 2, o:1, dev:sde2
Apr  4 00:36:46 frank kernel:  disk 3, o:1, dev:sdf2
Apr  4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c  
rsp 7fffcdd18fa0 error 4
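
For readers decoding the last log line: "segfault at 0" is the faulting
address (a NULL pointer), and on x86 the low three bits of the number
after "error" are the page-fault error code flags.  The sketch below
decodes those bits; the struct and function names are hypothetical and
come from neither the kernel nor mdadm.

```c
#include <stdbool.h>

/* Low three bits of the x86 page-fault error code (the value after "error"). */
struct fault_info {
    bool protection_violation; /* bit 0: 0 = page not present, 1 = protection fault */
    bool write_access;         /* bit 1: 0 = read access, 1 = write access */
    bool user_mode;            /* bit 2: 0 = fault in kernel mode, 1 = in user mode */
};

/* Hypothetical helper: split an error code into its three low flags. */
struct fault_info decode_fault_error(unsigned long error_code)
{
    struct fault_info info;
    info.protection_violation = (error_code & 0x1) != 0;
    info.write_access = (error_code & 0x2) != 0;
    info.user_mode = (error_code & 0x4) != 0;
    return info;
}
```

Calling decode_fault_error(4) sets only user_mode, so "error 4" at
address 0 reads as a user-space read of an unmapped page, the usual
signature of a NULL pointer dereference.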


* Re: segfault in mdadm v2.6.4
  2008-04-04  5:03 segfault in mdadm v2.6.4 Brett Dikeman
@ 2008-04-04 10:05 ` Christian Pernegger
  2008-04-04 16:01   ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Pernegger @ 2008-04-04 10:05 UTC (permalink / raw)
  To: brett; +Cc: linux-raid

>  Apr  4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c rsp
> 7fffcdd18fa0 error 4

The segfault after assemble is more or less "normal" with the Debian
version right now. Of course I haven't been able to reproduce it with
any consistency since I reported the bug :(

You might want to post exact array config + mdadm commandline used.

C.


* Re: segfault in mdadm v2.6.4
  2008-04-04 10:05 ` Christian Pernegger
@ 2008-04-04 16:01   ` Dan Williams
  2008-04-04 16:14     ` Brett Dikeman
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Williams @ 2008-04-04 16:01 UTC (permalink / raw)
  To: brett; +Cc: Christian Pernegger, linux-raid

[-- Attachment #1: Type: text/plain, Size: 596 bytes --]

On Fri, Apr 4, 2008 at 3:05 AM, Christian Pernegger <pernegger@gmail.com> wrote:
> >  Apr  4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c rsp
>  > 7fffcdd18fa0 error 4
>
>  The segfault after assemble is more or less "normal" with the Debian
>  version right now. Of course I haven't been able to reproduce it with
>  any consistency since I reported the bug :(
>

There is a known issue with arrays in the "read-auto" state and "mdadm
--monitor".  The attached patch addresses this.

What is more concerning is the usb-storage hangs.  Do you have logs
from when it hung?

--
Dan

[-- Attachment #2: mdstat-fix-level-parsing.patch --]
[-- Type: text/x-patch; name=mdstat-fix-level-parsing.patch, Size: 960 bytes --]

mdadm: fix segfault, /proc/mdstat parsing of 'level'

From: Dan Williams <dan.j.williams@intel.com>

If the array is in 'read-auto' mode /proc/mdstat will have a string
like:

	"active(auto-read-only)"

The parsing code does not recognize this as "active" so it does not set
->level.  This leads to a segfault in --monitor mode (Monitor.c:405).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 mdstat.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


diff --git a/mdstat.c b/mdstat.c
index 335e1e5..1dcd709 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -165,7 +165,8 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 		for (w=dl_next(line); w!= line ; w=dl_next(w)) {
 			int l = strlen(w);
 			char *eq;
-			if (strcmp(w, "active")==0)
+			if (strncmp(w, "active", strlen("active"))==0)
+			/* strncmp to catch the "active(auto-read-only)" case */
 				ent->active = 1;
 			else if (strcmp(w, "inactive")==0)
 				ent->active = 0;
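
The effect of the one-line change can be seen in isolation.  The sketch
below is not the mdadm source; the two function names are hypothetical
and only contrast the exact comparison the patch removes with the prefix
comparison it adds, using the status word quoted in the patch
description.

```c
#include <string.h>

/* Exact match, as in the line the patch removes: only "active" passes. */
int is_active_exact(const char *w)
{
    return strcmp(w, "active") == 0;
}

/* Prefix match, as in the line the patch adds:
 * "active" and "active(auto-read-only)" both pass. */
int is_active_prefix(const char *w)
{
    return strncmp(w, "active", strlen("active")) == 0;
}
```

With w set to "active(auto-read-only)", is_active_exact() returns 0 and
is_active_prefix() returns 1.  "inactive" fails both, since it differs
from "active" at the first character.  The prefix test would also accept
any other word that merely starts with "active", which is the trade-off
of this approach.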


* Re: segfault in mdadm v2.6.4
  2008-04-04 16:01   ` Dan Williams
@ 2008-04-04 16:14     ` Brett Dikeman
  2008-04-04 16:43       ` Dan Williams
  0 siblings, 1 reply; 5+ messages in thread
From: Brett Dikeman @ 2008-04-04 16:14 UTC (permalink / raw)
  To: Dan Williams; +Cc: brett, Christian Pernegger, linux-raid

> There is a known issue with arrays in the "read-auto" state and "mdadm
> --monitor".  The attached patch addresses this.

Ah- I forgot that, yes, mdadm monitor was running.  I was *wondering* what
mdadm process was hanging around to segfault- duh! :-)

> What is more concerning is the usb-storage hangs.  Do you have logs
> from when it hung?

Nope- absolutely nothing useful from dmesg or /var/log/messages.  The
array simply grinds to a halt.  I think it might be the USB bridge, since
this morning I woke up to find the same drive as previously mentioned,
with its access light on continuously (didn't happen the first time.)  It
did get further along in the sync- 66.7%, very roughly twice as far as the
first time.

/proc/mdstat keeps getting updated during all this; the (obviously
averaged) rebuild rate average drops steadily.

  I don't have any info handy on the USB device, but it's an older
SATA<->USB/eSATA AMS Venus (expensive but otherwise nice case.  Has a
silent 80mm fan in it that doesn't move much hair, but does keep drives
cooler.)  It dates back to when manufacturers were still making "L"
"eSATA" ports (grr.)  The other USB bridge is a Vantec multi-interface
bare adapter.  That one seemed fine.

Ask away with things you'd like me to try- I'll get more info this
evening, and I might try picking up a second one of the Vantec interfaces
if I can (they're cheap- and have proven endlessly useful.) 
Unfortunately, my intended destination for this array uses Adaptec 1205SA
PCI cards, which idiotically don't support drives over 500GB and hang
during their POST :(

Sometimes I wish for the days of ISA and IRQs...grrrr!

Brett


* Re: segfault in mdadm v2.6.4
  2008-04-04 16:14     ` Brett Dikeman
@ 2008-04-04 16:43       ` Dan Williams
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Williams @ 2008-04-04 16:43 UTC (permalink / raw)
  To: brett; +Cc: Christian Pernegger, linux-raid, linux-usb

On Fri, Apr 4, 2008 at 9:14 AM, Brett Dikeman <brett@cloud9.net> wrote:
> > There is a known issue with arrays in the "read-auto" state and "mdadm
>  > --monitor".  The attached patch addresses this.
>
>  Ah- I forgot that, yes, mdadm monitor was running.  I was *wondering* what
>  mdadm process was hanging around to segfault- duh! :-)
>
>
>
>  > What is more concerning is the usb-storage hangs.  Do you have logs
>  > from when it hung?
>
>  Nope- absolutely nothing useful from dmesg or /var/log/messages.  The
>  array simply grinds to a halt.  I think it might be the USB bridge, since
>  this morning I woke up to find the same drive as previously mentioned,
>  with its access light on continuously (didn't happen the first time.)  It
>  did get further along in the sync- 66.7%, very roughly twice as far as the
>  first time.
>
>  /proc/mdstat keeps getting updated during all this; the (obviously
>  averaged) rebuild rate average drops steadily.
>
>   I don't have any info handy on the USB device, but it's an older
>  SATA<->USB/eSATA AMS Venus (expensive but otherwise nice case.  Has a
>  silent 80mm fan in it that doesn't move much hair, but does keep drives
>  cooler.)  It dates back to when manufacturers were still making "L"
>  "eSATA" ports (grr.)  The other USB bridge is a Vantec multi-interface
>  bare adapter.  That one seemed fine.
>
>  Ask away with things you'd like me to try- I'll get more info this
>  evening, and I might try picking up a second one of the Vantec interfaces
>  if I can (they're cheap- and have proven endlessly useful.)
>  Unfortunately, my intended destination for this array uses Adaptec 1205SA
>  PCI cards, which idiotically don't support drives over 500GB and hang
>  during their POST :(
>

The output from sysrq-w after the hang might shed some light.  I have
copied linux-usb [1] in case they recognize the devices you mentioned.
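
For reference, sysrq-w asks the kernel to dump tasks that are in
uninterruptible (blocked) state to the kernel log.  It can be sent from
the keyboard as Alt+SysRq+W, or, if a root shell still responds, by
writing the character 'w' to /proc/sysrq-trigger.  A minimal sketch of
the latter follows; send_sysrq is a hypothetical helper name, and the
path is a parameter so the function can be tried against an ordinary
file first.

```c
#include <stdio.h>

/*
 * Write one SysRq command character to the given trigger file.
 * On a live system the path would be "/proc/sysrq-trigger" (root only).
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
int send_sysrq(const char *trigger_path, char command)
{
    FILE *f = fopen(trigger_path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputc(command, f) != EOF);

    /* fclose flushes the buffer; a failure here means the write was lost. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A call such as send_sysrq("/proc/sysrq-trigger", 'w') would then be
followed by reading dmesg for the blocked-task traces.  If no shell is
available during the hang, the keyboard combination is the only option.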

--
Dan

[1]: original report http://marc.info/?l=linux-raid&m=120728545823965&w=2

