linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Brad Campbell <lists2009@fnarfbargle.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: What the heck happened to my array?
Date: Tue, 5 Apr 2011 21:31:11 +1000
Message-ID: <20110405213111.3dceae00@notabene.brown>
In-Reply-To: <4D9ADAB3.7040205@fnarfbargle.com>

On Tue, 05 Apr 2011 17:02:43 +0800 Brad Campbell <lists2009@fnarfbargle.com>
wrote:

> Well, luckily I preserved the entire build tree then. I was planning on 
> running nm over the binary and have a two thumbs type of look into it 
> with gdb, but seeing as you probably have a much better idea what you 
> are looking for I'll just send you the binary!

Thanks.  It took me a little while, but I've found the problem.

The code was failing at 
			wd0 = sources[z][d];
in qsyndrome in restripe.c.
It is looking up 'd' in 'sources[z]' and having problems.
The error address (from dmesg) is 0x7f2000 so it isn't a
NULL pointer, but rather it is falling off the end of an allocation.

When doing qsyndrome calculations we often need a block full of zeros, so
restripe.c allocates one and stores it in a global pointer.

You were restriping from 512K to 64K chunk size.

The first thing restripe.c was called on to do was to restore data from the
backup file into the array.  This uses the new chunk size - 64K.  So the
'zero' buffer was allocated at 64K and cleared.


The next thing it does is read the next section of the array and write it to
the backup.  As the array was missing 2 devices it needed to do a qsyndrome
calculation to get the missing data block(s).  This was a calculation done on
old-style chunks so it needed a 512K zero block.
However as a zero block had already been allocated it didn't bother to
allocate another one.  It just used what it had, which was too small.
So it fell off the end and got the result we saw.
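
The failure mode described above can be sketched in a few lines of C.  This
is only an illustration under assumptions, not the actual restripe.c code:
the names zero_buf, zero_size and get_zero_block are hypothetical.  The
sketch caches one zero block, remembers how big it is, and allocates a
larger one when a bigger chunk size is requested, which is the check that
was effectively missing when the 64K buffer was reused for 512K chunks.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from restripe.c. */
static unsigned char *zero_buf;	/* cached block full of zeros */
static size_t zero_size;	/* number of bytes zero_buf holds */

/*
 * Return a zero-filled block of at least chunk_size bytes, or NULL if
 * allocation fails.  A version that only tests "zero_buf == NULL" would
 * hand back a 64K block when 512K is needed, and the caller would then
 * read past the end of the allocation.
 */
static unsigned char *get_zero_block(size_t chunk_size)
{
	if (zero_buf == NULL || chunk_size > zero_size) {
		/* calloc() returns memory that is already zeroed. */
		unsigned char *nb = calloc(1, chunk_size);

		if (nb == NULL)
			return NULL;
		free(zero_buf);
		zero_buf = nb;
		zero_size = chunk_size;
	}
	return zero_buf;
}
```

With this shape, a first call for 64K followed by a call for 512K yields a
buffer that really is 512K long, and a later 64K request can safely reuse
the larger block.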

I don't know why this works in 3.2.1 where it didn't work in 3.1.4.

However when it successfully recovers from the backup it should update the
metadata so that it knows it has successfully recovered and doesn't need to
recover any more.  So maybe the time it worked, it found there wasn't any
recovery needed and so didn't allocate a 'zero' buffer until it was working
with the old, bigger, chunk size.

Anyway, this is easy to fix, which I will do.

It only affects restarting a reshape of a double-degraded RAID6 which reduced
the chunksize.

Thanks,
NeilBrown

