Re: assemble vs create an array.......

From: Michael Tokarev <mjt@tls.msk.ru>
To: Dragos <dragos@mpigani.org>
Cc: David Greaves <david@dgreaves.com>,
	linux-raid@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: assemble vs create an array.......
Date: Thu, 06 Dec 2007 19:39:28 +0300
Message-ID: <475825C0.4070605@msgid.tls.msk.ru>
In-Reply-To: <4758129D.40600@mpigani.org>

[Cc'd to xfs list as it contains something related]

Dragos wrote:
> Thank you.
> I want to make sure I understand.

[Some background for the XFS list.  This is about a broken Linux software
RAID array (the reason for the breakage is no longer relevant).  The OP
seems to have lost the order of the drives in his array, and is now trying
to create a new array on top, trying different combinations of drives.
The filesystem there WAS XFS.  One point is that Linux refuses to mount
it, saying "structure needs cleaning".  This is all mostly md-related,
but there are several XFS-related questions and concerns too.]

> 
> 1- Does it matter which permutation of drives I use for xfs_repair (as
> long as it tells me that the Structure needs cleaning)? When it comes to
> linux I consider myself at intermediate level, but I am a beginner when
> it comes to raid and filesystem issues.

The permutation DOES MATTER - for all the devices.
Linux, when mounting an fs, only looks at the superblock of the filesystem,
which is usually located at the beginning of the device.

So in every case where Linux actually recognizes the filesystem (instead
of seeing complete garbage), the same device is in the first position -
i.e., this way you have found your first device.  The rest may still be
out of order.

RAID5 data is laid out like this (with 3 drives for simplicity; it's
similar with more drives):

       DiskA       DiskB       DiskC
Blk0   Data0       Data1       P0
Blk1   P1          Data2       Data3
Blk2   Data4       P2          Data5
Blk3   Data6       Data7       P3
... and so on .......................

where your actual data blocks are Data0, Data1, ... DataN,
and PX are parity blocks.

As long as DiskA remains in this position, the beginning of
the array is the Data0 block - hence Linux sees the beginning
of the filesystem and recognizes it.  But DiskB and DiskC may
still be swapped, in which case the rest of the data is
complete garbage and only the data blocks on DiskA are in
place.
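
To make the effect of a wrong order concrete, here is a minimal Python
sketch that rebuilds the simplified table above and then reads it back
with DiskB and DiskC swapped.  It follows the parity rotation exactly as
drawn in the table, which is only an illustration; the layout md actually
uses (left-symmetric by default) places blocks differently.  The function
names are made up for this example.

```python
def layout_row(stripe, n_disks):
    """Labels stored on each physical disk for one stripe, as in the table."""
    # In the table, P0 sits on the last disk and parity then rotates.
    parity_disk = (stripe + n_disks - 1) % n_disks
    next_data = stripe * (n_disks - 1)
    row = []
    for disk in range(n_disks):
        if disk == parity_disk:
            row.append(f"P{stripe}")
        else:
            row.append(f"Data{next_data}")
            next_data += 1
    return row


def read_data(order, n_stripes):
    """Blocks the array presents as data when physical disk order[slot]
    is plugged into each slot (0 = DiskA, 1 = DiskB, 2 = DiskC)."""
    n = len(order)
    blocks = []
    for stripe in range(n_stripes):
        on_disk = layout_row(stripe, n)
        parity_slot = (stripe + n - 1) % n
        for slot in range(n):
            if slot != parity_slot:
                blocks.append(on_disk[order[slot]])
    return blocks


correct = read_data([0, 1, 2], 4)   # DiskA, DiskB, DiskC
swapped = read_data([0, 2, 1], 4)   # DiskA, DiskC, DiskB
in_place = [b for i, b in enumerate(swapped) if b == correct[i]]

print(correct)
print(swapped)
print(in_place)
```

With the swapped order only Data0, Data4 and Data6 come back in the
right place, and all three of them live on DiskA.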

So you still need to find the order of the other drives
(you have found your first drive, DiskA, already).

Note also that if the Data1 block is all zeros (which is unlikely
for a non-empty filesystem), P0 (the first parity block) will be
exactly the same as Data0, because XORing anything with zeros gives
the same "anything" again (XOR is the operation used to calculate
parity blocks in RAID5).  In that case DiskC, placed first, would
also appear to start with the filesystem, so there's still a remote
chance you have TWO "first" disks...

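The XOR property can be checked with a few lines of Python.  This is
just a sketch: the 16-byte blocks are made up for illustration (real
chunks are much larger), and xor_blocks is a name invented here.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-sized blocks (RAID5 parity for 2 data disks)."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical block contents, 16 bytes each.
data0 = b"XFSB" + bytes(12)      # starts like an XFS superblock (magic "XFSB")
data1_zeros = bytes(16)          # an all-zeros Data1 block
data1_other = b"some other data!"

# With an all-zeros Data1, the parity block is a copy of Data0.
p0_zeros = xor_blocks(data0, data1_zeros)
print(p0_zeros == data0)

# With any other Data1, parity differs from Data0 ...
p0_other = xor_blocks(data0, data1_other)
print(p0_other == data0)

# ... but Data0 can still be rebuilt from parity and Data1, which is
# why an array can run with one drive "missing".
print(xor_blocks(p0_other, data1_other) == data0)
```
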
What to do is to give xfs_repair a try for each permutation,
but again without letting it actually fix anything.
Just run it in no-modify mode (xfs_repair -n) and see which
combination of drives gives the fewest errors, or no fatal errors
(there may be several similar combinations, with the same order
of drives but with a different drive "missing").
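
The combinations to try can be listed mechanically.  The Python sketch
below only prints commands, it does not run anything.  The device names
and /dev/md0 are placeholders, and it assumes a 3-drive RAID5; the chunk
size, metadata version and layout must also match the original array,
which is not shown here.

```python
from itertools import permutations


def candidate_orders(first, others):
    """Yield device lists built from `first` in slot 0 followed by the
    other drives in every order, with one slot at a time replaced by
    the word "missing"."""
    for rest in permutations(others):
        full = [first, *rest]
        for slot in range(len(full)):
            candidate = list(full)
            candidate[slot] = "missing"
            yield candidate


# Hypothetical device names; substitute the real ones.
FIRST = "/dev/sdX1"
OTHERS = ["/dev/sdY1", "/dev/sdZ1"]
MD = "/dev/md0"

candidates = list(candidate_orders(FIRST, OTHERS))
for devices in candidates:
    # Print the commands for one attempt; nothing is executed here.
    print(f"mdadm --create {MD} --assume-clean --level=5 "
          f"--raid-devices={len(devices)} {' '.join(devices)}")
    print(f"xfs_repair -n {MD}")
    print(f"mdadm --stop {MD}")
```

Keeping one slot "missing" in every attempt means md has nothing to
resync, so no drive gets overwritten with recomputed parity while
you are still guessing.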

It's sad that xfs refuses to mount when "structure needs
cleaning" - the best way here is to actually mount it
and see what it looks like, instead of trying repair
tools.  Is there some option to force-mount it anyway
(in read-only mode, knowing it may oops the kernel etc.)?

I'm not very familiar with xfs yet - it seems to be
much faster than ext3 for our workload (mostly databases),
and I'm experimenting with it slowly.  But this very
thread prompted me to think.  If I can't force-mount it
(or browse it some other way), as I can almost always
do with a (somewhat?) broken ext[23] just to examine things,
maybe I'm trying it before it's mature enough? ;)  Note
the smile, but note there's a bit of joke in every joke... :)

> 2- After I do it, assuming that it worked, how do I reintegrate the
> 'missing' drive while keeping my data?

Just add it back -- mdadm --add /dev/mdX /dev/sdYZ.
But don't do that till you actually see your data.

/mjt
