From: Simon McNair <simonmcnair@gmail.com>
To: Phil Turmel <philip@turmel.org>
Cc: NeilBrown <neilb@suse.de>, linux-raid@vger.kernel.org
Subject: Re: Linux software RAID assistance
Date: Tue, 15 Feb 2011 19:04:43 +0000
Message-ID: <4D5ACE4B.9040309@gmail.com>
In-Reply-To: <4D5A92F3.1090004@turmel.org>
Phil,
Thanks for filling in the gaps. I had forgotten quite how much help and
assistance you had provided up to now. You're a real godsend.
To fill in some of the other pieces of info:
The original machine was called proxmox (KVM virtualisation machine) and
I used an Ubuntu live cd to see if it was the software which was
preventing progress or not. It does seem like there is data corruption
in the machine name for some reason.
The original company I ordered the controller through sent me an email
at 4pm saying that they could not fulfill the order; 4pm was too late
for them to pick & pack, so there was another 24hr turnaround delay.
The supermicro card should arrive here tomorrow and hopefully I'll be
able to get a dd of each of the drives prior to Phil coming online.
For some reason blkid doesn't exist on my machine even though I have
e2fs-utils installed. V weird.
Simon
On 15/02/2011 14:51, Phil Turmel wrote:
> Hi Neil,
>
> Since Simon has responded, let me summarize the assistance I provided per his off-list request:
>
> On 02/14/2011 11:53 PM, NeilBrown wrote:
>> On Thu, 10 Feb 2011 16:16:44 +0000 Simon McNair <simonmcnair@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> I use a 3ware 9500-12 port SATA card (JBOD) which will not work without a
>>> 128MB SODIMM. The SODIMM socket is flakey and the result is that the
>>> machine occasionally crashes. Yesterday I finally gave in and put
>>> together another machine so that I can rsync between them. When I turned
>>> the machine on today to set up rsync, the RAID array was not gone, but
>>> corrupted. Typical...
>> Presumably the old machine was called 'ubuntu' and the new machine 'proølox'
>>
>>
>>> I built the array in Aug 2010 using the following command:
>>>
>>> mdadm --create --verbose /dev/md0 --metadata=1.1 --level=5
>>> --raid-devices=10 /dev/sd{b,c,d,e,f,g,h,i,j,k}1 --chunk=64
>>>
>>> Using LVM, I did the following:
>>> pvscan
>>> pvcreate -M2 /dev/md0
>>> vgcreate lvm-raid /dev/md0
>>> vgdisplay lvm-raid
>>> vgscan
>>> lvscan
>>> lvcreate -v -l 100%VG -n RAID lvm-raid
>>> lvdisplay /dev/lvm-raid/lvm0
>>>
>>> I then formatted using:
>>> mkfs -t ext4 -v -m .1 -b 4096 -E stride=16,stripe-width=144
>>> /dev/lvm-raid/RAID
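
The stride and stripe-width values in the mkfs command above follow from the 64K chunk size, the 4096-byte block size, and the nine data disks of a ten-device RAID5. A minimal sketch of that arithmetic, assuming a single parity device; the function name ext4_raid_hints is made up for illustration and is not part of any tool:

```python
def ext4_raid_hints(chunk_kib, block_bytes, raid_devices, parity_devices=1):
    """Return (stride, stripe_width) in filesystem blocks for mkfs -E.

    Hypothetical helper for illustration only.
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_bytes:
        raise ValueError("chunk size must be a multiple of the block size")
    # stride: filesystem blocks per RAID chunk
    stride = chunk_bytes // block_bytes
    # stripe-width: filesystem blocks per full stripe of data (parity excluded)
    data_devices = raid_devices - parity_devices
    return stride, stride * data_devices


stride, stripe_width = ext4_raid_hints(chunk_kib=64, block_bytes=4096, raid_devices=10)
print(f"stride={stride},stripe-width={stripe_width}")
# prints: stride=16,stripe-width=144
```
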
>>>
>>> This has worked perfectly since I created the array. Now mdadm is
>>> coming up with
>>>
>>> proxmox:/dev/md# mdadm --assemble --scan --verbose
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/md/ubuntu:0
>> And it seems that ubuntu:0 has been successfully assembled.
>> It is missing one device for some reason (sdd1) but RAID can cope with that.
> The 3ware card is compromised, with a loose buffer memory DIMM. Some of its ECC errors were caught and reported in dmesg. It's likely, based on the loose memory socket, that many multiple-bit errors got through.
>
> [trim /]
>
>>> mdadm: no uptodate device for slot 8 of /dev/md/proølox:0
>>> mdadm: no uptodate device for slot 9 of /dev/md/proølox:0
>>> mdadm: failed to add /dev/sdd1 to /dev/md/proølox:0: Invalid argument
>>> mdadm: /dev/md/proølox:0 assembled from 0 drives - not enough to start
>>> the array.
>> This looks like it is *after* trying the --create command you give
>> below. It is best to report things in the order they happen, else you can
>> confuse people (or get caught out!).
> Yes, this was after.
>
>>> mdadm: looking for devices for further assembly
>>> mdadm: no recogniseable superblock on /dev/sdd
>>> mdadm: No arrays found in config file or automatically
>>>
>>> pvscan and vgscan show nothing.
>>>
>>> So I tried running mdadm --create --verbose /dev/md0 --metadata=1.1
>>> --level=5 --raid-devices=10 missing /dev/sde1 /dev/sdf1 /dev/sdg1
>>> /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 --chunk=64
>>>
>>> as it seemed that /dev/sdd1 failed to be added to the array. This did
>>> nothing.
>> It did not do nothing. It wrote a superblock to /dev/sdd1 and complained
>> that it couldn't write to all the others, didn't it?
> There were multiple attempts to create. One wrote to just sdd1, another succeeded with all but sdd1.
>
>>> dmesg contains:
>>>
>>> md: invalid superblock checksum on sdd1
>> I guess that is why sdd1 was missing from 'ubuntu:0'. Though as I cannot
>> tell if this happened before or after any of the various things reported
>> above, it is hard to be sure.
>>
>>
>> The real mystery is why 'pvscan' reports nothing.
> The original array was created with mdadm v2.6.7, and had a data offset of 264 sectors. After Simon's various attempts to --create, he ended up with data offset of 2048, using mdadm v3.1.4. The mdadm -E reports he posted to the list showed the 264 offset. We didn't realize the offset had been updated until somewhat later in our troubleshooting efforts.
>
> In any case, pvscan couldn't see the LVM signature because it wasn't there (at offset 2048).
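
To make the offset mismatch concrete: the first chunk of array data sits on the first member device at that member's data offset, so an LVM label written under the old 264-sector offset sits at a different byte position from where a 2048-sector offset makes the array start. A rough sketch of the arithmetic, assuming 512-byte sectors and that the label is in sector 1 of the physical volume (pvcreate's default); the names below are illustrative only:

```python
SECTOR_BYTES = 512
LVM_LABEL_SECTOR = 1  # assumption: pvcreate's default label sector


def label_position(data_offset_sectors):
    """Byte position, on the first member device, of the PV label sector
    for an array whose data starts at data_offset_sectors."""
    return (data_offset_sectors + LVM_LABEL_SECTOR) * SECTOR_BYTES


old = label_position(264)    # where the label was actually written
new = label_position(2048)   # where the recreated array makes pvscan look
print(old, new, new - old)
# prints: 135680 1049088 913408
```

With the recreated array, pvscan reads 1784 sectors (913408 bytes) past the real label, which matches it reporting nothing.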
>
>> What about
>> pvscan --verbose
>>
>> or
>>
>> blkid -p /dev/md/ubuntu:0
>>
>> or even
>>
>> dd if=/dev/md/ubuntu:0 count=8 | od -c
> Fortunately, Simon did have a copy of his LVM configuration. With the help of dd, strings, and grep, we did locate his LVM sig at the correct location on sdd1 (for data offset 264). After a number of attempts to bypass LVM and access his single LV with dmsetup (based on his backed up configuration, on the assembled new array less sdd1), I realized that the data offset was wrong on the recreated array, and went looking for the cause. I found your git commit that changed that logic last spring, and recommended that Simon revert to the default package for his ubuntu install, which is v2.6.7.
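
The search for the LVM signature described above was done with dd, strings, and grep. The same idea can be sketched in Python: read a device or image one sector at a time and report the sectors that begin with the LVM2 label magic "LABELONE". This is an illustrative sketch, not the commands actually used; the function name and the in-memory demo image are assumptions.

```python
import io

LABEL = b"LABELONE"   # LVM2 label magic at the start of the label sector
SECTOR_BYTES = 512


def find_lvm_labels(stream, max_sectors=4096):
    """Yield sector numbers whose first bytes are the LVM2 label magic.

    stream is any binary file-like object opened for reading.
    """
    for sector in range(max_sectors):
        block = stream.read(SECTOR_BYTES)
        if len(block) < SECTOR_BYTES:
            break  # end of device or image
        if block.startswith(LABEL):
            yield sector


# Demo on a fake 300-sector image with a label planted at sector 265,
# i.e. data offset 264 plus label sector 1.
image = bytearray(SECTOR_BYTES * 300)
image[265 * SECTOR_BYTES:265 * SECTOR_BYTES + len(LABEL)] = LABEL
print(list(find_lvm_labels(io.BytesIO(bytes(image)))))
# prints: [265]
```

Against real hardware the stream would be a member partition opened read-only, for example open("/dev/sdX1", "rb") with a placeholder device name; nothing in the sketch writes to the device.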
>
> Simon has now attempted to recreate the array with v2.6.7, but the controller is throwing too many errors to succeed, and I suggested it was too flakey to trust any further. Based on the existence of the LVM sig on sdd1, I believe Simon's data is (mostly) intact, and only needs a successful create operation with a properly functioning controller. (He might also need to perform an lvm vgcfgrestore, but he has the necessary backup file.)
>
> A new controller is on order.
>
> Phil