linux-raid.vger.kernel.org archive mirror
* raid5 reshape/resync
@ 2007-11-24 11:02 Nagilum
  2007-11-25 19:04 ` Nagilum
  0 siblings, 1 reply; 9+ messages in thread
From: Nagilum @ 2007-11-24 11:02 UTC (permalink / raw)
  To: linux-raid


Hi,
I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0).
During that reshape (at around 4%) /dev/sdd reported read errors and
went offline.
I replaced /dev/sdd with a new drive and tried to reassemble the array
(/dev/sdd was shown as removed and the new drive as spare).
Assembly worked, but it would not run unless I used --force.
Since I'm always reluctant to use force I put the bad disk back in,
this time as /dev/sdg. I re-added that drive and could run the array.
The array started to resync (since the bad disk can still be read up to
the 4% mark) and then I marked it as failed. Now the array is "active,
degraded, recovering":

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
         Version : 00.91.03
   Creation Time : Sat Sep 15 21:11:41 2007
      Raid Level : raid5
      Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
   Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
    Raid Devices : 6
   Total Devices : 7
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Sat Nov 24 10:10:46 2007
           State : active, degraded, recovering
  Active Devices : 5
Working Devices : 6
  Failed Devices : 1
   Spare Devices : 1

          Layout : left-symmetric
      Chunk Size : 16K

  Reshape Status : 19% complete
   Delta Devices : 1, (5->6)

            UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
          Events : 0.726347

     Number   Major   Minor   RaidDevice State
        0       8        0        0      active sync   /dev/sda
        1       8       16        1      active sync   /dev/sdb
        2       8       32        2      active sync   /dev/sdc
        6       8       96        3      faulty spare rebuilding   /dev/sdg
        4       8       64        4      active sync   /dev/sde
        5       8       80        5      active sync   /dev/sdf

        7       8       48        -      spare   /dev/sdd

iostat:
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             129.48      1498.01      1201.59       7520       6032
sdb             134.86      1498.01      1201.59       7520       6032
sdc             127.69      1498.01      1201.59       7520       6032
sdd               0.40         0.00         3.19          0         16
sde             111.55      1498.01      1201.59       7520       6032
sdf             117.73         0.00      1201.59          0       6032
sdg               0.00         0.00         0.00          0          0

What I find somewhat confusing/disturbing is that md does not appear to
utilize /dev/sdd. What I see here could be explained by md doing a
RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f], but I would
have expected it to use the new spare sdd for that. Also the speed is
unusually low, which seems to indicate a lot of seeking, as if two
operations are happening at the same time.
Also, when I look at the data rates it looks more like the reshape is
continuing even though one drive is missing (possible but risky).
Can someone relieve my doubts as to whether md is doing the right thing here?
Thanks,
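
PS: For reference, this is roughly the sequence of commands involved
(reconstructed from memory, so the exact invocations may differ slightly):

   mdadm /dev/md0 --add /dev/sdf              # add the new disk
   mdadm --grow /dev/md0 --raid-devices=6     # start the 5->6 reshape
   # ... sdd reported read errors; after the reboot the replacement drive
   # is /dev/sdd and the failing drive went back in as /dev/sdg ...
   mdadm /dev/md0 --add /dev/sdd              # replacement drive becomes a spare
   mdadm /dev/md0 --re-add /dev/sdg           # old drive back into its slot
   mdadm /dev/md0 --fail /dev/sdg             # marked failed once the resync had started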

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



* Re: raid5 reshape/resync
  2007-11-24 11:02 raid5 reshape/resync Nagilum
@ 2007-11-25 19:04 ` Nagilum
  2007-11-29  5:48   ` Neil Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Nagilum @ 2007-11-25 19:04 UTC (permalink / raw)
  To: linux-raid; +Cc: Neil Brown


----- Message from nagilum@nagilum.org ---------
     Date: Sat, 24 Nov 2007 12:02:09 +0100
     From: Nagilum <nagilum@nagilum.org>
Reply-To: Nagilum <nagilum@nagilum.org>
  Subject: raid5 reshape/resync
       To: linux-raid@vger.kernel.org

> Hi,
> I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
> I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0)
> During that reshape (at around 4%) /dev/sdd reported read errors and
> went offline.
> I replaced /dev/sdd with a new drive and tried to reassemble the array
> (/dev/sdd was shown as removed and now as spare).
> Assembly worked but it would not run unless I use --force.
> Since I'm always reluctant to use force I put the bad disk back in,
> this time as /dev/sdg . I re-added the drive and could run the array.
> The array started to resync (since the disk can be read until 4%) and
> then I marked the disk as failed. Now the array is "active, degraded,
> recovering":
>
> nas:~# mdadm -Q --detail /dev/md0
> /dev/md0:
>         Version : 00.91.03
>   Creation Time : Sat Sep 15 21:11:41 2007
>      Raid Level : raid5
>      Array Size : 1953234688 (1862.75 GiB 2000.11 GB)
>   Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
>    Raid Devices : 6
>   Total Devices : 7
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Sat Nov 24 10:10:46 2007
>           State : active, degraded, recovering
>  Active Devices : 5
> Working Devices : 6
>  Failed Devices : 1
>   Spare Devices : 1
>
>          Layout : left-symmetric
>      Chunk Size : 16K
>
>  Reshape Status : 19% complete
>   Delta Devices : 1, (5->6)
>
>            UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
>          Events : 0.726347
>
>     Number   Major   Minor   RaidDevice State
>        0       8        0        0      active sync   /dev/sda
>        1       8       16        1      active sync   /dev/sdb
>        2       8       32        2      active sync   /dev/sdc
>        6       8       96        3      faulty spare rebuilding   /dev/sdg
>        4       8       64        4      active sync   /dev/sde
>        5       8       80        5      active sync   /dev/sdf
>
>        7       8       48        -      spare   /dev/sdd
>
> iostat:
> Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
> sda             129.48      1498.01      1201.59       7520       6032
> sdb             134.86      1498.01      1201.59       7520       6032
> sdc             127.69      1498.01      1201.59       7520       6032
> sdd               0.40         0.00         3.19          0         16
> sde             111.55      1498.01      1201.59       7520       6032
> sdf             117.73         0.00      1201.59          0       6032
> sdg               0.00         0.00         0.00          0          0
>
> What I find somewhat confusing/disturbing is that md does not appear to
> utilize /dev/sdd. What I see here could be explained by md doing a
> RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f] but I would
> have expected it to use the new spare sdd for that. Also the speed is
> unusually low which seems to indicate a lot of seeking as if two
> operations are happening at the same time.
> Also when I look at the data rates it looks more like the reshape is
> continuing even though one drive is missing (possible but risky).
> Can someone relief my doubts as to whether md does the right thing here?
> Thanks,
>
----- End message from nagilum@nagilum.org -----

Ok, so the reshape tried to continue without the failed drive and  
after that resynced to the new spare.
Unfortunately the result is a mess. On top of the RAID5 I have
dm-crypt and LVM.
Although dm-crypt and LVM don't appear to have a problem, the
filesystems on top are now a mess.
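
(For reference, the stack on top of md0 is opened roughly like this - the
dm-crypt mapping name is from memory and only illustrative:)

   cryptsetup luksOpen /dev/md0 md0_crypt    # LUKS/dm-crypt directly on the raid device
   vgchange -ay vg01                         # LVM volume group on top of the crypt device
   xfs_repair -n /dev/vg01/lv3               # read-only check of the big filesystem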
I still have the failed drive; I can read the superblock from that
drive, the first 4% from the beginning, and probably everything from the
end backwards towards that point.
So in theory it could be possible to reorder the stripe blocks that
appear to have been messed up(?).
Unfortunately I'm not sure what exactly went wrong or what I did
wrong. Can someone please give me a hint?
Thanks,
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



* Re: raid5 reshape/resync
  2007-11-25 19:04 ` Nagilum
@ 2007-11-29  5:48   ` Neil Brown
  2007-12-01 14:48     ` Nagilum
  0 siblings, 1 reply; 9+ messages in thread
From: Neil Brown @ 2007-11-29  5:48 UTC (permalink / raw)
  To: Nagilum; +Cc: linux-raid

On Sunday November 25, nagilum@nagilum.org wrote:
> ----- Message from nagilum@nagilum.org ---------
>      Date: Sat, 24 Nov 2007 12:02:09 +0100
>      From: Nagilum <nagilum@nagilum.org>
> Reply-To: Nagilum <nagilum@nagilum.org>
>   Subject: raid5 reshape/resync
>        To: linux-raid@vger.kernel.org
> 
> > Hi,
> > I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
> > I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0)
> > During that reshape (at around 4%) /dev/sdd reported read errors and
> > went offline.

Sad.

> > I replaced /dev/sdd with a new drive and tried to reassemble the array
> > (/dev/sdd was shown as removed and now as spare).

There must be a step missing here.
Just because one drive goes offline, that doesn't mean that you need
to reassemble the array.  It should just continue with the reshape
until that is finished.  Did you shut the machine down, or did it crash,
or what?

> > Assembly worked but it would not run unless I use --force.

That suggests an unclean shutdown.  Maybe it did crash?


> > Since I'm always reluctant to use force I put the bad disk back in,
> > this time as /dev/sdg . I re-added the drive and could run the array.
> > The array started to resync (since the disk can be read until 4%) and
> > then I marked the disk as failed. Now the array is "active, degraded,
> > recovering":

It should have restarted the reshape from wherever it was up to, so
it should have hit the read error almost immediately.  Do you remember
where it started the reshape from?  If it restarted from the beginning
that would be bad.

Did you just "--assemble" all the drives or did you do something else?

> >
> > What I find somewhat confusing/disturbing is that md does not appear to
> > utilize /dev/sdd. What I see here could be explained by md doing a
> > RAID5 resync from the 4 drives sd[a-c,e] to sd[a-c,e,f] but I would
> > have expected it to use the new spare sdd for that. Also the speed is

md cannot recover to a spare while a reshape is happening.  It
completes the reshape, then does the recovery (as you discovered).
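
You can see which of the two is currently running via /proc/mdstat or
sysfs, e.g. something like:

   cat /proc/mdstat                    # shows "reshape = ...%" vs "recovery = ...%"
   cat /sys/block/md0/md/sync_action   # "reshape" while reshaping, "recover" afterwards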

> > unusually low which seems to indicate a lot of seeking as if two
> > operations are happening at the same time.

Well reshape is always slow as it has to read from one part of the
drive and write to another part of the drive.

> > Also when I look at the data rates it looks more like the reshape is
> > continuing even though one drive is missing (possible but risky).

Yes, that is happening.

> > Can someone relief my doubts as to whether md does the right thing here?
> > Thanks,

I believe it is doing "the right thing".

> >
> ----- End message from nagilum@nagilum.org -----
> 
> Ok, so the reshape tried to continue without the failed drive and  
> after that resynced to the new spare.

As I would expect.

> Unfortunately the result is a mess. On top of the Raid5 I have  

Hmm.  This I would not expect.

> dm-crypt and LVM.
> Although dmcrypt and LVM dont appear to have a problem the filesystems  
> on top are a mess now.

Can you be more specific about what sort of "mess" they are in?

NeilBrown


> I still have the failed drive, I can read the superblock from that  
> drive and up to 4% from the beginning and probably backwards from the  
> end towards that point.
> So in theory it could be possible to reorder the stripe blocks which  
> appears to have been messed up.(?)
> Unfortunately I'm not sure what exactly went wrong or what I did  
> wrong. Can someone please give me hint?
> Thanks,
> Alex.
> 
> ========================================================================
> #    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
> #   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
> #  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
> # /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
> #           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
> ========================================================================
> 
> 
> ----------------------------------------------------------------
> cakebox.homeunix.net - all the machine one needs..
> 


* Re: raid5 reshape/resync
  2007-11-29  5:48   ` Neil Brown
@ 2007-12-01 14:48     ` Nagilum
  2007-12-11 21:56       ` Nagilum
  0 siblings, 1 reply; 9+ messages in thread
From: Nagilum @ 2007-12-01 14:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid


----- Message from neilb@suse.de ---------
     Date: Thu, 29 Nov 2007 16:48:47 +1100
     From: Neil Brown <neilb@suse.de>
Reply-To: Neil Brown <neilb@suse.de>
  Subject: Re: raid5 reshape/resync
       To: Nagilum <nagilum@nagilum.org>
       Cc: linux-raid@vger.kernel.org

>> > Hi,
>> > I'm running 2.6.23.8 x86_64 using mdadm v2.6.4.
>> > I was adding a disk (/dev/sdf) to an existing raid5 (/dev/sd[a-e] -> md0)
>> > During that reshape (at around 4%) /dev/sdd reported read errors and
>> > went offline.
>
> Sad.
>
>> > I replaced /dev/sdd with a new drive and tried to reassemble the array
>> > (/dev/sdd was shown as removed and now as spare).
>
> There must be a step missing here.
> Just because one drive goes offline, that  doesn't mean that you need
> to reassemble the array.  It should just continue with the reshape
> until that is finished.  Did you shut the machine down or did it crash
> or what
>> > Assembly worked but it would not run unless I use --force.
>
> That suggests an unclean shutdown.  Maybe it did crash?

I started the reshape and went out. When I came back the controller
was beeping (indicating the failing disk). I tried to log on but I
could not get in. The machine was responding to pings but that was
about it (no ssh or xdm login worked). So I hard rebooted.
I booted into a rescue root; the /etc/mdadm/mdadm.conf didn't yet
include the new disk, so the raid was missing one disk and was not started.
Since I didn't know exactly what was going on I --re-added sdf
(the new disk) and tried to resume reshaping. A second into that the
read failure on /dev/sdd was reported. So I stopped md0 and shut down
to verify the read error with another controller.
After I had verified that, I replaced /dev/sdd with a new drive and put
in the broken drive as /dev/sdg, just in case.
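
In terms of commands that was roughly (from memory, so the exact sequence
may have differed slightly):

   mdadm /dev/md0 --re-add /dev/sdf   # put the new 6th disk back in
   mdadm --run /dev/md0               # let the reshape resume; sdd errored within a second
   mdadm --stop /dev/md0              # stopped md0 to check sdd on another controller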

>> > Since I'm always reluctant to use force I put the bad disk back in,
>> > this time as /dev/sdg . I re-added the drive and could run the array.
>> > The array started to resync (since the disk can be read until 4%) and
>> > then I marked the disk as failed. Now the array is "active, degraded,
>> > recovering":
>
> It should have restarted the reshape from whereever it was up to, so
> it should have hit the read error almost immediately.  Do you remember
> where it started the reshape from?  If it restarted from the beginning
> that would be bad.

It must have continued where it left off, since the reshape position in
all superblocks was at about 4%.
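
(Checked with something along these lines, looking at the "Reshape pos'n"
line of each superblock:)

   for d in /dev/sd[a-g]; do mdadm -E $d | grep -i reshape; done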

> Did you just "--assemble" all the drives or did you do something else?

Sorry for being a bit imprecise here: I didn't actually have to use
--assemble; when booting into the rescue root the raid came up with
/dev/sdd and /dev/sdf removed. I just had to --re-add /dev/sdf.

>> > unusually low which seems to indicate a lot of seeking as if two
>> > operations are happening at the same time.
>
> Well reshape is always slow as it has to read from one part of the
> drive and write to another part of the drive.

Actually it was resyncing at the minimum speed; I managed to crank it
up to >20MB/s by adjusting /sys/block/md0/md/sync_speed_min.
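
That is, something like:

   echo 20000 > /sys/block/md0/md/sync_speed_min   # value is in KB/s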

>> > Can someone relief my doubts as to whether md does the right thing here?
>> > Thanks,
>
> I believe it is do "the right thing".
>
>> >
>> ----- End message from nagilum@nagilum.org -----
>>
>> Ok, so the reshape tried to continue without the failed drive and
>> after that resynced to the new spare.
>
> As I would expect.
>
>> Unfortunately the result is a mess. On top of the Raid5 I have
>
> Hmm.  This I would not expect.
>
>> dm-crypt and LVM.
>> Although dmcrypt and LVM dont appear to have a problem the filesystems
>> on top are a mess now.
>
> Can you be more specific about what sort of "mess" they are in?

Sure.
So here is the vg-layout:
nas:~# lvdisplay vg01
   --- Logical volume ---
   LV Name                /dev/vg01/lv1
   VG Name                vg01
   LV UUID                4HmzU2-VQpO-vy5R-Wdys-PmwH-AuUg-W02CKS
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                512.00 MB
   Current LE             128
   Segments               1
   Allocation             inherit
   Read ahead sectors     0
   Block device           253:1

   --- Logical volume ---
   LV Name                /dev/vg01/lv2
   VG Name                vg01
   LV UUID                4e2ZB9-29Rb-dy4M-EzEY-cEIG-Nm1I-CPI0kk
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                7.81 GB
   Current LE             2000
   Segments               1
   Allocation             inherit
   Read ahead sectors     0
   Block device           253:2

   --- Logical volume ---
   LV Name                /dev/vg01/lv3
   VG Name                vg01
   LV UUID                YQRd0X-5hF8-2dd3-GG4v-wQLH-WGH0-ntGgug
   LV Write Access        read/write
   LV Status              available
   # open                 0
   LV Size                1.81 TB
   Current LE             474735
   Segments               1
   Allocation             inherit
   Read ahead sectors     0
   Block device           253:3

The layout was created like that, and except for increasing the size of
lv3 I never changed anything. Therefore I think it's safe to assume
the LVs are located in order and without gaps. The first LV is swap, so
not much to lose there; the second LV is "/" (reiserfs) and is fine too.
The third LV, however, looks pretty bad.
I uploaded the "xfs_repair -n /dev/mapper/vg01-lv3" output to
http://www.nagilum.org/md/xfs_repair.txt.
I can mount the filesystem but the directories all look like this:

drwxr-xr-x 16 nagilum nagilum  155 2007-09-18 18:20 .
drwxr-xr-x  5 nagilum nagilum   89 2007-09-22 17:56 ..
drwxr-xr-x 12 nagilum nagilum  121 2007-09-18 18:19 biz
?---------  ? ?       ?          ?                ? comm
?---------  ? ?       ?          ?                ? dev
drwxr-xr-x  8 nagilum nagilum   76 2007-09-18 18:19 disk
drwxr-xr-x  7 nagilum nagilum   64 2007-09-18 18:19 docs
?---------  ? ?       ?          ?                ? game
?---------  ? ?       ?          ?                ? gfx
drwxr-xr-x  5 nagilum nagilum   40 2007-09-18 18:20 hard
drwxr-xr-x  8 nagilum nagilum   69 2007-09-18 18:20 misc
drwxr-xr-x  4 nagilum nagilum   27 2007-09-18 18:20 mods
drwxr-xr-x  5 nagilum nagilum   39 2007-09-18 18:20 mus
?---------  ? ?       ?          ?                ? pix
drwxr-xr-x  6 nagilum nagilum   51 2007-09-18 18:20 text
drwxr-xr-x 22 nagilum nagilum 4096 2007-09-18 18:21 util

Also, the files which are readable are corrupt.
It looks to me as if md mixed up the chunk order in the stripes past
the 4% mark.
I looked at a larger text file to see what kind of damage was done and
saw that it starts out OK, but at 0xd000 the data becomes random until
0x11000.
Maybe a table to simplify things:
Ok     0x0     - 0xd000
Random 0xd000  - 0x11000
Ok     0x11000 - 0x21000
Random 0x21000 - 0x25000
Ok     0x25000 - 0x35000
Random 0x35000 - 0x39000

And so on. 0x4000 is exactly my chunk size, and the corrupt regions are
one chunk long and repeat every 0x14000 (five chunks, i.e. one full data
stripe of the six-disk array).
Since LUKS uses the sector number for whitening, the "random data" must
be wrongly decrypted text.
I'm not sure how to reorder things so it will be OK again; I'll ponder
that while I try to recreate the situation using files and losetup.
And finally, the information from the failed drive:

nas:~# mdadm -E /dev/sdg
/dev/sdg:
           Magic : a92b4efc
         Version : 00.91.00
            UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
   Creation Time : Sat Sep 15 21:11:41 2007
      Raid Level : raid5
   Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
      Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
    Raid Devices : 6
   Total Devices : 7
Preferred Minor : 0

   Reshape pos'n : 118360960 (112.88 GiB 121.20 GB)
   Delta Devices : 1 (5->6)

     Update Time : Fri Nov 23 20:05:50 2007
           State : active
  Active Devices : 6
Working Devices : 7
  Failed Devices : 0
   Spare Devices : 1
        Checksum : 9a8358c4 - correct
          Events : 0.677965

          Layout : left-symmetric
      Chunk Size : 16K

       Number   Major   Minor   RaidDevice State
this     3       8       96        3      active sync   /dev/sdg

    0     0       8        0        0      active sync   /dev/sda
    1     1       8       16        1      active sync   /dev/sdb
    2     2       8       32        2      active sync   /dev/sdc
    3     3       8       96        3      active sync   /dev/sdg
    4     4       8       64        4      active sync   /dev/sde
    5     5       8       80        5      active sync   /dev/sdf
    6     6       8       48        6      spare   /dev/sdd

From md's point of view the array is "fine" now, of course:

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
         Version : 00.90.03
   Creation Time : Sat Sep 15 21:11:41 2007
      Raid Level : raid5
      Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
   Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
    Raid Devices : 6
   Total Devices : 6
Preferred Minor : 0
     Persistence : Superblock is persistent

     Update Time : Sat Dec  1 15:25:59 2007
           State : clean
  Active Devices : 6
Working Devices : 6
  Failed Devices : 0
   Spare Devices : 0

          Layout : left-symmetric
      Chunk Size : 16K

            UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
          Events : 0.986918

     Number   Major   Minor   RaidDevice State
        0       8        0        0      active sync   /dev/sda
        1       8       16        1      active sync   /dev/sdb
        2       8       32        2      active sync   /dev/sdc
        3       8       48        3      active sync   /dev/sdd
        4       8       64        4      active sync   /dev/sde
        5       8       80        5      active sync   /dev/sdf

Ok, enough for now, any useful ideas are greatly appreciated!
Alex.



========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



* Re: raid5 reshape/resync
  2007-12-01 14:48     ` Nagilum
@ 2007-12-11 21:56       ` Nagilum
  2007-12-16 13:16         ` Janek Kozicki
  0 siblings, 1 reply; 9+ messages in thread
From: Nagilum @ 2007-12-11 21:56 UTC (permalink / raw)
  To: Neil Brown, linux-raid


----- Message from nagilum@nagilum.org ---------
     Date: Sat, 01 Dec 2007 15:48:17 +0100
     From: Nagilum <nagilum@nagilum.org>
Reply-To: Nagilum <nagilum@nagilum.org>
  Subject: Re: raid5 reshape/resync
       To: Neil Brown <neilb@suse.de>
       Cc: linux-raid@vger.kernel.org


> I'm not sure how to reorder things so it will be ok again, I'll ponder
> about that while I try to recreate the situation using files and
> losetup.

----- End message from nagilum@nagilum.org -----

Ok, I've recreated the problem in the form of a semi-automatic test case.
All necessary files (plus the old xfs_repair output) are at:
  http://www.nagilum.de/md/

I also added a readme: http://www.nagilum.de/md/readme.txt

After running test.sh, the xfs filesystem created on the raid device is
broken and (at least in my case) cannot be mounted anymore.
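
In outline the test does something like the following (paraphrased - see
test.sh and readme.txt for the exact steps):

   # six small files as loop devices: five array members plus one disk to add
   for i in 0 1 2 3 4 5; do
       dd if=/dev/zero of=disk$i bs=1M count=128
       losetup /dev/loop$i disk$i
   done
   mdadm --create /dev/md1 --level=5 --chunk=16 --raid-devices=5 /dev/loop[0-4]
   mkfs.xfs /dev/md1 && mount /dev/md1 /mnt    # then fill /mnt with known test data
   mdadm /dev/md1 --add /dev/loop5
   mdadm --grow /dev/md1 --raid-devices=6      # start the 5->6 reshape
   mdadm /dev/md1 --fail /dev/loop3            # fail a member while the reshape runs
   # let the reshape finish degraded, add a fresh device, let the recovery
   # finish, then try to mount / xfs_repair -n the filesystem again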

I hope this will help find the problem.

Kind regards,
Alex.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



* Re: raid5 reshape/resync
  2007-12-11 21:56       ` Nagilum
@ 2007-12-16 13:16         ` Janek Kozicki
  2007-12-18 10:09           ` Nagilum
  0 siblings, 1 reply; 9+ messages in thread
From: Janek Kozicki @ 2007-12-16 13:16 UTC (permalink / raw)
  To: Nagilum; +Cc: linux-raid

Nagilum said:     (by the date of Tue, 11 Dec 2007 22:56:13 +0100)

> Ok, I've recreated the problem in form of a semiautomatic testcase.
> All necessary files (plus the old xfs_repair output) are at:
>   http://www.nagilum.de/md/

> After running the test.sh the created xfs filesystem on the raid  
> device is broken and (at least in my case) cannot be mounted anymore.

I think that you should file a bug report and include the explanations
you have given here. An automated test case that leads to xfs corruption
is a neat snack for bug squashers ;-)

I wonder, however, where to report this - xfs or raid? Perhaps
cross-report to both places and note in the bug report that you are
not sure on which side the bug is.

best regards
-- 
Janek Kozicki                                                         |


* Re: raid5 reshape/resync
  2007-12-16 13:16         ` Janek Kozicki
@ 2007-12-18 10:09           ` Nagilum
  2007-12-19  4:50             ` raid5 reshape/resync - BUGREPORT Janek Kozicki
  0 siblings, 1 reply; 9+ messages in thread
From: Nagilum @ 2007-12-18 10:09 UTC (permalink / raw)
  To: Janek Kozicki; +Cc: linux-raid, Neil Brown


----- Message from janek_listy@wp.pl ---------
     Date: Sun, 16 Dec 2007 14:16:45 +0100
     From: Janek Kozicki <janek_listy@wp.pl>
Reply-To: Janek Kozicki <janek_listy@wp.pl>
  Subject: Re: raid5 reshape/resync
       To: Nagilum <nagilum@nagilum.org>
       Cc: linux-raid@vger.kernel.org


> Nagilum said:     (by the date of Tue, 11 Dec 2007 22:56:13 +0100)
>
>> Ok, I've recreated the problem in form of a semiautomatic testcase.
>> All necessary files (plus the old xfs_repair output) are at:
>>   http://www.nagilum.de/md/
>
>> After running the test.sh the created xfs filesystem on the raid
>> device is broken and (at last in my case) cannot be mounted anymore.
>
> I think that you should file a bugreport, and provide there the
> explanations you have put in there. An automated test case that leads
> to xfs corruption is a neat snack for bug squashers ;-)
>
> I wonder however where to report this - the xfs or raid ? Eventually
> cross report to both places and write in the bugreport that you are
> not sure on which side there is a bug.
>
----- End message from janek_listy@wp.pl -----

This is an md/mdadm problem; xfs is merely used as a vehicle to show
the problem, further amplified by LUKS.
Where would I file this bug report? I thought this was the place?
I could also really use a way to fix that corruption. :(
Thanks,
Alex.

PS: yesterday I verified this bug on 2.6.23.9; I will do 2.6.23.11 today.

========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



* Re: raid5 reshape/resync - BUGREPORT
  2007-12-18 10:09           ` Nagilum
@ 2007-12-19  4:50             ` Janek Kozicki
  2007-12-19 21:04               ` raid5 reshape/resync - BUGREPORT/PROBLEM Nagilum
  0 siblings, 1 reply; 9+ messages in thread
From: Janek Kozicki @ 2007-12-19  4:50 UTC (permalink / raw)
  Cc: linux-raid, Neil Brown

> ----- Message from janek_listy@wp.pl ---------
Nagilum said:     (by the date of Tue, 18 Dec 2007 11:09:38 +0100)

> >> Ok, I've recreated the problem in form of a semiautomatic testcase.
> >> All necessary files (plus the old xfs_repair output) are at:
> >>
> >>   http://www.nagilum.de/md/
> >
> >> After running the test.sh the created xfs filesystem on the raid
> >> device is broken and (at last in my case) cannot be mounted anymore.
> >
> > I think that you should file a bugreport

> ----- End message from janek_listy@wp.pl -----
> 
> Where would I file this bug report? I thought this is the place?
> I could also really use a way to fix that corruption. :(

Ouch. To be honest I subscribed here just a month ago, so I'm not
sure. But I haven't seen other bug reports here so far.

I was expecting there to be some bugzilla?

-- 
Janek Kozicki                                                         |


* Re: raid5 reshape/resync - BUGREPORT/PROBLEM
  2007-12-19  4:50             ` raid5 reshape/resync - BUGREPORT Janek Kozicki
@ 2007-12-19 21:04               ` Nagilum
  0 siblings, 0 replies; 9+ messages in thread
From: Nagilum @ 2007-12-19 21:04 UTC (permalink / raw)
  To: Janek Kozicki; +Cc: linux-raid, Neil Brown, hpa


----- Message from janek_listy@wp.pl ---------
>> ----- Message from janek_listy@wp.pl ---------
> Nagilum said:     (by the date of Tue, 18 Dec 2007 11:09:38 +0100)
>
>> >> Ok, I've recreated the problem in form of a semiautomatic testcase.
>> >> All necessary files (plus the old xfs_repair output) are at:
>> >>
>> >>   http://www.nagilum.de/md/
>> >
>> >> After running the test.sh the created xfs filesystem on the raid
>> >> device is broken and (at last in my case) cannot be mounted anymore.
>> >
>> > I think that you should file a bugreport
>
>> ----- End message from janek_listy@wp.pl -----
>>
>> Where would I file this bug report? I thought this is the place?
>> I could also really use a way to fix that corruption. :(
>
> ouch. To be honest I subscribed here just a month ago, so I'm not
> sure. But I haven't seen other bugreports here so far.
>
> I was expecting that there is some bugzilla?

Not really, I'm afraid. At least I'm not aware of anything like that for the vanilla kernel.

Anyway, I just verified the bug on 2.6.23.11 and 2.6.24-rc5-git4.
Also, I originally came across the bug on amd64 and am now using a
PPC750 machine to verify it, so it's an architecture-independent bug
(but that was to be expected).
I also prepared a different version of the test case, "v2_start.sh" and
"v2_test.sh", which prints out all the wrong bytes (longs to be exact)
plus their locations.
It shows the data is there, but scattered. :(
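
(The check is essentially a comparison of the array contents against a
reference copy of the test data; with plain cmp that would look roughly
like this:)

   cmp -l reference.img /dev/md1 | head    # lists offset and differing byte values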
Kind regards,
Alex.

----- End message from janek_listy@wp.pl -----



========================================================================
#    _  __          _ __     http://www.nagilum.org/ \n icq://69646724 #
#   / |/ /__ ____ _(_) /_ ____ _  nagilum@nagilum.org \n +491776461165 #
#  /    / _ `/ _ `/ / / // /  ' \  Amiga (68k/PPC): AOS/NetBSD/Linux   #
# /_/|_/\_,_/\_, /_/_/\_,_/_/_/_/   Mac (PPC): MacOS-X / NetBSD /Linux #
#           /___/     x86: FreeBSD/Linux/Solaris/Win2k  ARM9: EPOC EV6 #
========================================================================


----------------------------------------------------------------
cakebox.homeunix.net - all the machine one needs..



