Linux RAID subsystem development
From: NeilBrown <neilb@suse.de>
To: Keith Keller <kkeller@wombat.san-francisco.ca.us>
Cc: linux-raid@vger.kernel.org
Subject: Re: meta: should i chase this down?
Date: Wed, 7 Dec 2011 11:47:26 +1100
Message-ID: <20111207114726.2f3f3543@notabene.brown>
In-Reply-To: <42n2r8xe3k.ln2@goaway.wombat.san-francisco.ca.us>

On Tue, 06 Dec 2011 16:02:44 -0800 Keith Keller
<kkeller@wombat.san-francisco.ca.us> wrote:

> Hi all,
> 
> A little while back, I had a strange issue, where reshaping a RAID6 to
> add a disk, then performing significant write activity (in this case, an
> rsnapshot), would cause a kernel crash.  I only attempted this twice,
> and neglected to write down the kernel oops errors, but I saw a few
> calls that seemed to imply that the md driver might be involved.  (Doing
> the same write activity during a rebuild is fine, which is another
> reason I suspected the reshape code in the md driver.  If it's of
> interest, I'm using kernel 2.6.39-4.el5.elrepo from ELRepo on a CentOS
> 5.7 box.)  It's certainly possible that I have a hardware issue, but not
> being able to reliably replicate the issue outside a reshape complicates
> debugging.
> 
> My question is, should I try to hunt down the actual source of this
> crash, and if so, what would be the best way to go about that?  I am
> decidedly not a kernel developer, and am not familiar with how to obtain
> debugging information in that environment.  I'm happy enough for this
> machine to suffer crashes, but I prefer not to work with the existing
> RAID6 if possible, and would want a more reliable way of collecting the
> kernel's debug output beyond writing it down on paper.  :)
> 

I'm always happy to receive detailed crash reports.  However, I cannot measure
how much your time is worth, nor can I guarantee that what you find won't
already have been fixed (though 2.6.39 is quite recent and I don't recall any
recent kernel-crash-during-reshape bugs, nor can I find any in a quick scan
through the logs).
So I cannot advise you on whether it is "worth the effort".  I would
appreciate it though.

The best way I have found to catch kernel messages is using netconsole.
See Documentation/networking/netconsole.txt

You need a wired network port and another machine on the same network that
can capture the messages.
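
As a rough sketch of the capturing side (not the only way to do it; netcat or
a syslog daemon work too), the script below listens for UDP datagrams and
appends them to a file.  It assumes netconsole's default target port of 6666;
the function name, its parameters and the log file name are illustrative
only.

```python
import socket


def capture_netconsole(log_path, host="0.0.0.0", port=6666,
                       max_packets=None, sock=None):
    """Append every UDP datagram received to log_path.

    port=6666 is assumed to match netconsole's default target port.
    max_packets and sock are hypothetical conveniences: the first
    bounds the loop, the second lets a caller pass a pre-bound socket.
    Returns the number of datagrams written.
    """
    own_socket = sock is None
    if own_socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((host, port))
    received = 0
    try:
        # Unbuffered binary append, so each message reaches the file
        # as soon as it arrives.
        with open(log_path, "ab", buffering=0) as log:
            while max_packets is None or received < max_packets:
                data, _sender = sock.recvfrom(65535)
                log.write(data)
                received += 1
    finally:
        if own_socket:
            sock.close()
    return received
```

Calling capture_netconsole("netconsole.log") on the capturing machine runs
until interrupted.  On the machine under test, netconsole then has to be
pointed at the capturing machine's address and port; the exact parameter
syntax is described in the netconsole.txt file mentioned above.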

You almost certainly need some disks to make the RAID6 out of.  You could try
loop-back devices over files but the timing is likely to be very different
and so the chance of reproducing the bug correspondingly small.

But if you do manage to get a crash message I would be very happy to
interpret it and work to fix the bug that causes it.

Thanks,
NeilBrown
