* raid5 reshape/resync
From: Nagilum @ 2007-11-24 11:02 UTC (permalink / raw)
To: linux-raid
Hi,
I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
During that reshape (at around 4%) /dev/sdd reported read errors and
went offline.
I replaced /dev/sdd with a new drive and tried to reassemble the array
(/dev/sdd was shown as removed and the new drive as spare).
Assembly worked, but the array would not run unless I used --force.
Since I'm always reluctant to use --force, I put the bad disk back in,
this time as /dev/sdg. I re-added the drive and could run the array.
The array started to resync (since the disk can be read up to the 4% mark) and
then I marked the disk as failed. Now the array is "active, degraded,
recovering":
nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
Version : 00.91.03
Creation Time : Sat Sep 15 21:11:41 2007
Raid Level : raid5
Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
Raid Devices : 6
Total Devices : 7
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Nov 24 10:10:46 2007
State : active, degraded, recovering
Active Devices : 5
Working Devices : 6
Failed Devices : 1
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 16K
Reshape Status : 19% complete
Delta Devices : 1, (5->6)
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
Events : 0.726347
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
6 8 96 3 faulty spare rebuilding /dev/sdg
4 8 64 4 active sync /dev/sde
5 8 80 5 active sync /dev/sdf
7 8 48 - spare /dev/sdd
iostat:
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 129.48 1498.01 1201.59 7520 6032
sdb 134.86 1498.01 1201.59 7520 6032
sdc 127.69 1498.01 1201.59 7520 6032
sdd 0.40 0.00 3.19 0 16
sde 111.55 1498.01 1201.59 7520 6032
sdf 117.73 0.00 1201.59 0 6032
sdg 0.00 0.00 0.00 0 0
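For reference, roughly the command sequence behind the description above
(reconstructed from memory, so the exact invocations may have differed slightly):
nas:~# mdadm /dev/md0 --add /dev/sdf           # add the new disk as a spare
nas:~# mdadm --grow /dev/md0 --raid-devices=6  # start the 5->6 reshape
  ... /dev/sdd failed at ~4%, hard reboot, drive swap (new sdd, old disk as sdg) ...
nas:~# mdadm /dev/md0 --re-add /dev/sdg        # put the old (partially readable) disk back
nas:~# mdadm --run /dev/md0                    # array now runs without --force
nas:~# mdadm /dev/md0 --fail /dev/sdg          # mark the bad disk failed again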
What I find somewhat confusing/disturbing is that md does not appear to
utilize /dev/sdd. What I see here could be explained by md doing a
RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f], but I would
have expected it to use the new spare sdd for that. Also, the speed is
unusually low, which seems to indicate a lot of seeking, as if two
operations are happening at the same time.
Also, when I look at the data rates it looks more like the reshape is
continuing even though one drive is missing (possible but risky).
Can someone relieve my doubts as to whether md is doing the right thing here?
Thanks,
========================================================================
# _ __ _ __ http://www.nagilum.org/ \n icq://69646724 #
# / |/ /__ ____ _(_) /_ ____ _ nagilum@nagilum.org \n +491776461165 #
# / / _ `/ _ `/ / / // / ' \ Amiga (68k/PPC): AOS/NetBSD/Linux #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/ Mac (PPC): MacOS-X / NetBSD /Linux #
# /___/ x86: FreeBSD/Linux/Solaris/Win2k ARM9: EPOC EV6 #
========================================================================
----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..
* Re: raid5 reshape/resync
From: Nagilum @ 2007-11-25 19:04 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown
----- Message from nagilum@nagilum.org ---------
Date: Sat, 24 Nov 2007 12:02:09 +0100
From: Nagilum <nagilum@nagilum.org>
Reply-To: Nagilum <nagilum@nagilum.org>
Subject: raid5 reshape/resync
To: linux-raid@vger.kernel.org
> [original message quoted in full; see above]
----- End message from nagilum@nagilum.org -----
Ok, so the reshape tried to continue without the failed drive and
after that resynced to the new spare.
Unfortunately the result is a mess. On top of the RAID5 I have
dm-crypt and LVM.
Although dm-crypt and LVM don't appear to have a problem, the filesystems
on top are a mess now.
I still have the failed drive; I can read the superblock from that
drive and up to 4% from the beginning, and probably backwards from the
end towards that point.
So in theory it could be possible to reorder the stripe blocks, which
appear to have been messed up(?).
Unfortunately I'm not sure what exactly went wrong or what I did
wrong. Can someone please give me a hint?
Thanks,
Alex.
* Re: raid5 reshape/resync
From: Neil Brown @ 2007-11-29 5:48 UTC (permalink / raw)
To: Nagilum; +Cc: linux-raid
On Sunday November 25, nagilum@nagilum.org wrote:
> ----- Message from nagilum@nagilum.org ---------
> Date: Sat, 24 Nov 2007 12:02:09 +0100
> From: Nagilum <nagilum@nagilum.org>
> Reply-To: Nagilum <nagilum@nagilum.org>
> Subject: raid5 reshape/resync
> To: linux-raid@vger.kernel.org
>
> > Hi,
> > I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
> > I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0)
> > During that reshape (at around 4%) /dev/sdd reported read errors and
> > went offline.
Sad.
> > I replaced /dev/sdd with a new drive and tried to reassemble the array
> > (/dev/sdd was shown as removed and now as spare).
There must be a step missing here.
Just because one drive goes offline, that doesn't mean that you need
to reassemble the array. It should just continue with the reshape
until that is finished. Did you shut the machine down, or did it crash,
or what?
> > Assembly worked but it would not run unless I used --force.
That suggests an unclean shutdown. Maybe it did crash?
> > Since I'm always reluctant to use force I put the bad disk back in,
> > this time as /dev/sdg . I re-added the drive and could run the array.
> > The array started to resync (since the disk can be read until 4%) and
> > then I marked the disk as failed. Now the array is "active, degraded,
> > recovering":
It should have restarted the reshape from wherever it was up to, so
it should have hit the read error almost immediately. Do you remember
where it started the reshape from? If it restarted from the beginning
that would be bad.
Did you just "--assemble" all the drives or did you do something else?
> >
> > What I find somewhat confusing/disturbing is that md does not appear to
> > utilize /dev/sdd. What I see here could be explained by md doing a
> > RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f] but I would
> > have expected it to use the new spare sdd for that. Also the speed is
md cannot recover to a spare while a reshape is happening. It
completes the reshape, then does the recovery (as you discovered).
> > unusually low which seems to indicate a lot of seeking as if two
> > operations are happening at the same time.
Well, reshape is always slow, as it has to read from one part of the
drive and write to another part of the drive.
> > Also when I look at the data rates it looks more like the reshape is
> > continuing even though one drive is missing (possible but risky).
Yes, that is happening.
> > Can someone relieve my doubts as to whether md is doing the right thing here?
> > Thanks,
I believe it is doing "the right thing".
> >
> ----- End message from nagilum@nagilum.org -----
>
> Ok, so the reshape tried to continue without the failed drive and
> after that resynced to the new spare.
As I would expect.
> Unfortunately the result is a mess. On top of the Raid5 I have
Hmm. This I would not expect.
> dm-crypt and LVM.
> Although dm-crypt and LVM don't appear to have a problem, the filesystems
> on top are a mess now.
Can you be more specific about what sort of "mess" they are in?
NeilBrown
> I still have the failed drive, I can read the superblock from that
> drive and up to 4% from the beginning and probably backwards from the
> end towards that point.
> So in theory it could be possible to reorder the stripe blocks, which
> appear to have been messed up(?).
> Unfortunately I'm not sure what exactly went wrong or what I did
> wrong. Can someone please give me a hint?
> Thanks,
> Alex.
>
* Re: raid5 reshape/resync
From: Nagilum @ 2007-12-01 14:48 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
----- Message from neilb@suse.de ---------
Date: Thu, 29 Nov 2007 16:48:47 +1100
From: Neil Brown <neilb@suse.de>
Reply-To: Neil Brown <neilb@suse.de>
Subject: Re: raid5 reshape/resync
To: Nagilum <nagilum@nagilum.org>
Cc: linux-raid@vger.kernel.org
>> > Hi,
>> > I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
>> > I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0)
>> > During that reshape (at around 4%) /dev/sdd reported read errors and
>> > went offline.
>
> Sad.
>
>> > I replaced /dev/sdd with a new drive and tried to reassemble the array
>> > (/dev/sdd was shown as removed and now as spare).
>
> There must be a step missing here.
> Just because one drive goes offline, that doesn't mean that you need
> to reassemble the array. It should just continue with the reshape
> until that is finished. Did you shut the machine down, or did it crash,
> or what?
>> > Assembly worked but it would not run unless I used --force.
>
> That suggests an unclean shutdown. Maybe it did crash?
I started the reshape and went out. When I came back the controller
was beeping (indicating the faulty disk). I tried to log on but I
could not get in. The machine was responding to pings, but that was
about it (no ssh or xdm login worked). So I hard-rebooted.
I booted into a rescue root; /etc/mdadm/mdadm.conf didn't yet
include the new disk, so the raid was missing one disk and was not started.
Since I didn't know exactly what was going on, I --re-added sdf
(the new disk) and tried to resume reshaping. A second into that, the
read failure on /dev/sdd was reported. So I stopped md0 and shut down
to verify the read error with another controller.
After I had verified that, I replaced /dev/sdd with a new drive and put
in the broken drive as /dev/sdg, just in case.
>> > Since I'm always reluctant to use force I put the bad disk back in,
>> > this time as /dev/sdg . I re-added the drive and could run the array.
>> > The array started to resync (since the disk can be read until 4%) and
>> > then I marked the disk as failed. Now the array is "active, degraded,
>> > recovering":
>
> It should have restarted the reshape from wherever it was up to, so
> it should have hit the read error almost immediately. Do you remember
> where it started the reshape from? If it restarted from the beginning
> that would be bad.
It must have continued where it left off since the reshape position in
all superblocks was at about 4%.
> Did you just "--assemble" all the drives or did you do something else?
Sorry for being a bit imprecise here: I didn't actually have to use
--assemble; when booting into the rescue root the raid came up with
/dev/sdd and /dev/sdf removed. I just had to --re-add /dev/sdf.
>> > unusually low which seems to indicate a lot of seeking as if two
>> > operations are happening at the same time.
>
> Well reshape is always slow as it has to read from one part of the
> drive and write to another part of the drive.
Actually it was resyncing at the minimum speed; I managed to crank
up the speed to >20 MB/s by adjusting /sys/block/md0/md/sync_speed_min.
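Concretely, something along these lines (the exact value I used may have
differed; the values are in KB/s):
nas:~# cat /proc/sys/dev/raid/speed_limit_min         # global floor for all arrays
nas:~# echo 25000 > /sys/block/md0/md/sync_speed_min  # per-array minimum for md0
nas:~# cat /proc/mdstat                               # watch the reshape speed pick up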
>> > Can someone relieve my doubts as to whether md is doing the right thing here?
>> > Thanks,
>
> I believe it is doing "the right thing".
>
>> >
>> ----- End message from nagilum@nagilum.org -----
>>
>> Ok, so the reshape tried to continue without the failed drive and
>> after that resynced to the new spare.
>
> As I would expect.
>
>> Unfortunately the result is a mess. On top of the Raid5 I have
>
> Hmm. This I would not expect.
>
>> dm-crypt and LVM.
>> Although dm-crypt and LVM don't appear to have a problem, the filesystems
>> on top are a mess now.
>
> Can you be more specific about what sort of "mess" they are in?
Sure.
So here is the vg-layout:
nas:~# lvdisplay vg01
--- Logical volume ---
LV Name /dev/vg01/lv1
VG Name vg01
LV UUID 4HmzU2-VQpO-vy5R-Wdys-PmwH-AuUg-W02CKS
LV Write Access read/write
LV Status available
# open 0
LV Size 512.00 MB
Current LE 128
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:1
--- Logical volume ---
LV Name /dev/vg01/lv2
VG Name vg01
LV UUID 4e2ZB9-29Rb-dy4M-EzEY-cEIG-Nm1I-CPI0kk
LV Write Access read/write
LV Status available
# open 0
LV Size 7.81 GB
Current LE 2000
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:2
--- Logical volume ---
LV Name /dev/vg01/lv3
VG Name vg01
LV UUID YQRd0X-5hF8-2dd3-GG4v-wQLH-WGH0-ntGgug
LV Write Access read/write
LV Status available
# open 0
LV Size 1.81 TB
Current LE 474735
Segments 1
Allocation inherit
Read ahead sectors 0
Block device 253:3
The layout was created like that, and except for increasing the size of
lv3 I never changed anything. Therefore I think it's safe to assume
the LVs are located in order and without gaps. The first LV is swap, so
not much to lose here; the second LV is "/" (reiserfs) and is fine too.
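(If one wanted to double-check that assumption, lvdisplay's --maps option
lists the extents backing each LV; in-order, gap-free segments would confirm it:)
nas:~# lvdisplay -m /dev/vg01/lv3   # lists each segment's physical volume and extent range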
The third LV, however, looks pretty bad.
I uploaded the "xfs_repair -n /dev/mapper/vg01-lv3" output to
http://www.nagilum.org/md/xfs_repair.txt.
I can mount the filesystem, but the directories all look like this:
drwxr-xr-x 16 nagilum nagilum 155 2007-09-18 18:20 .
drwxr-xr-x 5 nagilum nagilum 89 2007-09-22 17:56 ..
drwxr-xr-x 12 nagilum nagilum 121 2007-09-18 18:19 biz
?--------- ? ? ? ? ? comm
?--------- ? ? ? ? ? dev
drwxr-xr-x 8 nagilum nagilum 76 2007-09-18 18:19 disk
drwxr-xr-x 7 nagilum nagilum 64 2007-09-18 18:19 docs
?--------- ? ? ? ? ? game
?--------- ? ? ? ? ? gfx
drwxr-xr-x 5 nagilum nagilum 40 2007-09-18 18:20 hard
drwxr-xr-x 8 nagilum nagilum 69 2007-09-18 18:20 misc
drwxr-xr-x 4 nagilum nagilum 27 2007-09-18 18:20 mods
drwxr-xr-x 5 nagilum nagilum 39 2007-09-18 18:20 mus
?--------- ? ? ? ? ? pix
drwxr-xr-x 6 nagilum nagilum 51 2007-09-18 18:20 text
drwxr-xr-x 22 nagilum nagilum 4096 2007-09-18 18:21 util
Also, the files that are readable are corrupt.
It looks to me as if md mixed up the chunk order in the stripes past
the 4% mark.
I looked at a larger text file to see what kind of damage was done and saw
that it starts out OK, but at 0xd000 the data becomes random until
0x11000.
Maybe a table to simplify things:
Ok 0x0 - 0xd000
Random 0xd000 - 0x11000
Ok 0x11000 - 0x21000
Random 0x21000 - 0x25000
Ok 0x25000 - 0x35000
Random 0x35000 - 0x39000
And so on. 0x4000 equals my chunk size (16 KiB).
Since LUKS uses the sector number for whitening, the "random data"
must be wrongly decrypted text.
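To illustrate the pattern, a rough way to map the damaged regions at chunk
granularity - purely hypothetical, since it assumes a known-good copy of the
file (good.bin) sitting next to the damaged one (bad.bin):
nas:~# cmp -l good.bin bad.bin | \
       awk -v c=16384 'BEGIN { last = -1 }
            { chunk = int(($1 - 1) / c)
              if (chunk != last) { printf "bad chunk at 0x%x\n", chunk * c; last = chunk } }'
cmp -l prints the (1-based) offset of every differing byte, and the awk
groups those offsets into 16 KiB chunks, i.e. one line per damaged chunk.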
I'm not sure how to reorder things so it will be OK again; I'll ponder
about that while I try to recreate the situation using files and
losetup.
And finally the information from the failed drive:
nas:~# mdadm -E /dev/sdg
/dev/sdg:
Magic : a92b4efc
Version : 00.91.00
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
Creation Time : Sat Sep 15 21:11:41 2007
Raid Level : raid5
Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
Raid Devices : 6
Total Devices : 7
Preferred Minor : 0
Reshape pos'n : 118360960 (112.88 GiB 121.20 GB)
Delta Devices : 1 (5->6)
Update Time : Fri Nov 23 20:05:50 2007
State : active
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Checksum : 9a8358c4 - correct
Events : 0.677965
Layout : left-symmetric
Chunk Size : 16K
Number Major Minor RaidDevice State
this 3 8 96 3 active sync /dev/sdg
0 0 8 0 0 active sync /dev/sda
1 1 8 16 1 active sync /dev/sdb
2 2 8 32 2 active sync /dev/sdc
3 3 8 96 3 active sync /dev/sdg
4 4 8 64 4 active sync /dev/sde
5 5 8 80 5 active sync /dev/sdf
6 6 8 48 6 spare /dev/sdd
From md's point of view the array is "fine" now, of course:
nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Sat Sep 15 21:11:41 2007
Raid Level : raid5
Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
Raid Devices : 6
Total Devices : 6
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Dec 1 15:25:59 2007
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
Events : 0.986918
Number Major Minor RaidDevice State
0 8 0 0 active sync /dev/sda
1 8 16 1 active sync /dev/sdb
2 8 32 2 active sync /dev/sdc
3 8 48 3 active sync /dev/sdd
4 8 64 4 active sync /dev/sde
5 8 80 5 active sync /dev/sdf
Ok, enough for now, any useful ideas are greatly appreciated!
Alex.
* Re: raid5 reshape/resync
From: Nagilum @ 2007-12-11 21:56 UTC (permalink / raw)
To: Neil Brown, linux-raid
----- Message from nagilum@nagilum.org ---------
Date: Sat, 01 Dec 2007 15:48:17 +0100
From: Nagilum <nagilum@nagilum.org>
Reply-To: Nagilum <nagilum@nagilum.org>
Subject: Re: raid5 reshape/resync
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
> I'm not sure how to reorder things so it will be ok again, I'll ponder
> about that while I try to recreate the situation using files and
> losetup.
----- End message from nagilum@nagilum.org -----
Ok, I've recreated the problem in the form of a semi-automatic testcase.
All necessary files (plus the old xfs_repair output) are at:
http://www.nagilum.de/md/
I also added a readme: http://www.nagilum.de/md/readme.txt
After running test.sh, the created xfs filesystem on the raid
device is broken and (at least in my case) cannot be mounted anymore.
I hope this will help in finding the problem.
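For anyone who doesn't want to download it, the test boils down to something
like the following sketch (condensed and simplified, not the literal test.sh;
sizes and device names are illustrative only):
nas:~# for i in 0 1 2 3 4 5 6; do dd if=/dev/zero of=/tmp/d$i.img bs=1M count=64; done
nas:~# for i in 0 1 2 3 4 5 6; do losetup /dev/loop$i /tmp/d$i.img; done
nas:~# mdadm --create /dev/md1 --level=5 --chunk=16 --raid-devices=5 /dev/loop[0-4]
nas:~# mkfs.xfs /dev/md1 && mount /dev/md1 /mnt && seq 1 2000000 > /mnt/data && umount /mnt
nas:~# mdadm /dev/md1 --add /dev/loop5         # disk to grow onto
nas:~# mdadm --grow /dev/md1 --raid-devices=6  # start the reshape
nas:~# mdadm /dev/md1 --fail /dev/loop3        # must land while the reshape is still running
nas:~# mdadm /dev/md1 --add /dev/loop6         # fresh spare, recovered onto after the reshape
  ... wait for the reshape and recovery to finish, then try to mount /dev/md1 again ...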
Kind regards,
Alex.
* Re: raid5 reshape/resync
From: Janek Kozicki @ 2007-12-16 13:16 UTC (permalink / raw)
To: Nagilum; +Cc: linux-raid
Nagilum said: (by the date of Tue, 11 Dec 2007 22:56:13 +0100)
> Ok, I've recreated the problem in form of a semiautomatic testcase.
> All necessary files (plus the old xfs_repair output) are at:
> http://www.nagilum.de/md/
> After running the test.sh the created xfs filesystem on the raid
> device is broken and (at least in my case) cannot be mounted anymore.
I think that you should file a bug report and include the
explanations you have put together here. An automated test case that leads
to xfs corruption is a neat snack for bug squashers ;-)
I wonder, however, where to report this - xfs or raid? Perhaps
cross-report to both places and note in the bug report that you are
not sure on which side the bug is.
best regards
--
Janek Kozicki |
* Re: raid5 reshape/resync
From: Nagilum @ 2007-12-18 10:09 UTC (permalink / raw)
To: Janek Kozicki; +Cc: linux-raid, Neil Brown
----- Message from janek_listy@wp.pl ---------
Date: Sun, 16 Dec 2007 14:16:45 +0100
From: Janek Kozicki <janek_listy@wp.pl>
Reply-To: Janek Kozicki <janek_listy@wp.pl>
Subject: Re: raid5 reshape/resync
To: Nagilum <nagilum@nagilum.org>
Cc: linux-raid@vger.kernel.org
> Nagilum said: (by the date of Tue, 11 Dec 2007 22:56:13 +0100)
>
>> Ok, I've recreated the problem in form of a semiautomatic testcase.
>> All necessary files (plus the old xfs_repair output) are at:
>> http://www.nagilum.de/md/
>
>> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at least in my case) cannot be mounted anymore.
>
> I think that you should file a bug report and include the
> explanations you have put together here. An automated test case that leads
> to xfs corruption is a neat snack for bug squashers ;-)
>
> I wonder, however, where to report this - xfs or raid? Perhaps
> cross-report to both places and note in the bug report that you are
> not sure on which side the bug is.
>
----- End message from janek_listy@wp.pl -----
This is an md/mdadm problem; xfs is merely used as a vehicle to show
the problem, further amplified by LUKS.
Where would I file this bug report? I thought this was the place?
I could also really use a way to fix that corruption. :(
Thanks,
Alex.
PS: yesterday I verified this bug on 2.6.23.9, will do 2.6.23.11 today.
* Re: raid5 reshape/resync - BUGREPORT
From: Janek Kozicki @ 2007-12-19 4:50 UTC (permalink / raw)
Cc: linux-raid, Neil Brown
> ----- Message from janek_listy@wp.pl ---------
Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
> >> Ok, I've recreated the problem in form of a semiautomatic testcase.
> >> All necessary files (plus the old xfs_repair output) are at:
> >>
> >> http://www.nagilum.de/md/
> >
> >> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at least in my case) cannot be mounted anymore.
> >
> > I think that you should file a bugreport
> ----- End message from janek_listy@wp.pl -----
>
> Where would I file this bug report? I thought this is the place?
> I could also really use a way to fix that corruption. :(
Ouch. To be honest, I subscribed here just a month ago, so I'm not
sure. But I haven't seen other bug reports here so far.
I was expecting that there is some bugzilla?
--
Janek Kozicki |
* Re: raid5 reshape/resync - BUGREPORT/PROBLEM
From: Nagilum @ 2007-12-19 21:04 UTC (permalink / raw)
To: Janek Kozicki; +Cc: linux-raid, Neil Brown, hpa
----- Message from janek_listy@wp.pl ---------
>> ----- Message from janek_listy@wp.pl ---------
> Nagilum said: (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
>
>> >> Ok, I've recreated the problem in form of a semiautomatic testcase.
>> >> All necessary files (plus the old xfs_repair output) are at:
>> >>
>> >> http://www.nagilum.de/md/
>> >
>> >> After running the test.sh the created xfs filesystem on the raid
>> >> device is broken and (at least in my case) cannot be mounted anymore.
>> >
>> > I think that you should file a bugreport
>
>> ----- End message from janek_listy@wp.pl -----
>>
>> Where would I file this bug report? I thought this is the place?
>> I could also really use a way to fix that corruption. :(
>
> ouch. To be honest I subscribed here just a month ago, so I'm not
> sure. But I haven't seen other bugreports here so far.
>
> I was expecting that there is some bugzilla?
Not really, I'm afraid. At least I'm not aware of anything like that for vanilla.
Anyway, I have now also verified the bug on 2.6.23.11 and 2.6.24-rc5-git4.
I originally came across the bug on amd64, while I'm now using a PPC750
machine to verify it, so it is an architecture-independent bug
(but that was to be expected).
I also prepared a different version of the testcase, "v2_start.sh" and
"v2_test.sh". This will print out all the wrong bytes (longs, to be
exact) plus their locations.
It shows the data is there, but scattered. :(
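To illustrate what "scattered" means, a hypothetical one-liner that maps a
byte offset within the array's data to its stripe number and data-chunk
index, assuming the final geometry (16 KiB chunks, 6 devices, i.e. 5 data
chunks per stripe):
nas:~# off=0xd000; chunk=16384; ndata=5
nas:~# echo "offset $off -> stripe $(( off / (chunk * ndata) )), data chunk $(( (off / chunk) % ndata ))"
offset 0xd000 -> stripe 0, data chunk 3
Comparing where a given chunk should sit before and after the reshape is the
kind of bookkeeping that would be needed to put the pieces back in order.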
Kind regards,
Alex.
----- End message from janek_listy@wp.pl -----