From: Eli Stair <estair@ilm.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.
Date: Wed, 18 Oct 2006 15:18:48 -0700
Message-ID: <4536A848.8020908@ilm.com>
In-Reply-To: <17706.65210.999441.373846@cse.unsw.edu.au>


FYI, I'm testing 2.6.18.1 and the RAID10 member mis-numbering issue is
still present.  Even with this fix applied to raid10.c, I am still
seeing repeatable cases where devices come back with a "Number" greater
than the one they had when removed from a running array.
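
For reference, the sequence I use to trigger this is essentially the
following; treat it as a sketch (md0 and the dm-* names are as in the
output below):

[root@gtmp02 ~]# mdadm /dev/md0 --fail /dev/dm-10
[root@gtmp02 ~]# mdadm /dev/md0 --remove /dev/dm-10
[root@gtmp02 ~]# mdadm /dev/md0 --add /dev/dm-10
[root@gtmp02 ~]# mdadm -D /dev/md0    # "Number" should match the old slot, but doesn't
[root@gtmp02 ~]# cat /proc/mdstat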

Issue 1)

I'm seeing inconsistencies in the way a drive is marked (and its 
behaviour) during rebuild after it is removed and added.  In this 
instance, the re-added drive is picked up and marked as "spare 
rebuilding".

  Rebuild Status : 20% complete

            Name : 0
            UUID : ab764369:7cf80f2b:cf61b6df:0b13cd3a
          Events : 1

     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
        2     253       10        2      active sync   /dev/dm-10
        3     253       11        3      active sync   /dev/dm-11
        4     253       12        4      active sync   /dev/dm-12
        5     253       13        5      active sync   /dev/dm-13
        6     253        2        6      active sync   /dev/dm-2
        7     253        3        7      active sync   /dev/dm-3
        8     253        4        8      active sync   /dev/dm-4
        9     253        5        9      active sync   /dev/dm-5
       10     253        6       10      active sync   /dev/dm-6
       11     253        7       11      active sync   /dev/dm-7
       12     253        8       12      active sync   /dev/dm-8
       13     253        9       13      active sync   /dev/dm-9
[root@gtmp02 ~]# cat /proc/mdstat
Personalities : [raid10]
md0 : active raid10 dm-9[13] dm-8[12] dm-7[11] dm-6[10] dm-5[9] dm-4[8] dm-3[7] dm-2[6] dm-13[5] dm-12[4] dm-11[3] dm-10[2] dm-1[1] dm-0[0]
       1003620352 blocks super 1.2 512K chunks 2 offset-copies [14/14] [UUUUUUUUUUUUUU]
       [====>................]  resync = 21.7% (218664064/1003620352) finish=114.1min speed=114596K/sec
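
For reference, the geometry above corresponds roughly to a create
command like this (reconstructed from the mdstat line rather than my
exact invocation, with the devices in the slot order shown above, so
treat it as a sketch):

mdadm --create /dev/md0 --metadata=1.2 --level=raid10 \
      --layout=o2 --chunk=512 --raid-devices=14 \
      /dev/dm-0 /dev/dm-1 /dev/dm-10 /dev/dm-11 /dev/dm-12 \
      /dev/dm-13 /dev/dm-2 /dev/dm-3 /dev/dm-4 /dev/dm-5 \
      /dev/dm-6 /dev/dm-7 /dev/dm-8 /dev/dm-9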

However, on the same configuration, the drive is occasionally pulled
right back in with a state of "active sync", with no indication that it
is dirty.

Issue 2)

When a device is removed and subsequently added again (after being set
failed and removed from the array), it SHOULD be set back to the
"Number" it originally had in the array, correct?  In the cases where
the drive is NOT automatically marked "active sync" and all members
show up fine, it is instead picked up as a spare and a rebuild is
started, during which time it is marked down ("_") in the /proc/mdstat
data and shown as "spare rebuilding" in the mdadm -D output below.
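
One more check worth sketching here: examining the removed device's
superblock before adding it back should confirm whether its old slot is
still recorded, and --re-add (rather than --add) is meant to put a
device back into its previous slot when the superblock allows it:

[root@gtmp02 ~]# mdadm --examine /dev/dm-10    # superblock should still show the old slot
[root@gtmp02 ~]# mdadm /dev/md0 --re-add /dev/dm-10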



The dumps below show this sequence for device /dev/dm-10 (originally
Number 2), which comes back as Number 14 after the re-add:


// STATE WHEN CLEAN:
            UUID : 6ccd7974:1b23f5b2:047d1560:b5922692

     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
        2     253       10        2      active sync   /dev/dm-10
        3     253       11        3      active sync   /dev/dm-11
        4     253       12        4      active sync   /dev/dm-12
        5     253       13        5      active sync   /dev/dm-13
        6     253        2        6      active sync   /dev/dm-2
        7     253        3        7      active sync   /dev/dm-3
        8     253        4        8      active sync   /dev/dm-4
        9     253        5        9      active sync   /dev/dm-5
       10     253        6       10      active sync   /dev/dm-6
       11     253        7       11      active sync   /dev/dm-7
       12     253        8       12      active sync   /dev/dm-8
       13     253        9       13      active sync   /dev/dm-9


// STATE AFTER FAILURE:
     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
        2       0        0        2      removed
        3     253       11        3      active sync   /dev/dm-11
        4     253       12        4      active sync   /dev/dm-12
        5     253       13        5      active sync   /dev/dm-13
        6     253        2        6      active sync   /dev/dm-2
        7     253        3        7      active sync   /dev/dm-3
        8     253        4        8      active sync   /dev/dm-4
        9     253        5        9      active sync   /dev/dm-5
       10     253        6       10      active sync   /dev/dm-6
       11     253        7       11      active sync   /dev/dm-7
       12     253        8       12      active sync   /dev/dm-8
       13     253        9       13      active sync   /dev/dm-9

        2     253       10        -      faulty spare   /dev/dm-10

// STATE AFTER REMOVAL:
     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
        2       0        0        2      removed
        3     253       11        3      active sync   /dev/dm-11
        4     253       12        4      active sync   /dev/dm-12
        5     253       13        5      active sync   /dev/dm-13
        6     253        2        6      active sync   /dev/dm-2
        7     253        3        7      active sync   /dev/dm-3
        8     253        4        8      active sync   /dev/dm-4
        9     253        5        9      active sync   /dev/dm-5
       10     253        6       10      active sync   /dev/dm-6
       11     253        7       11      active sync   /dev/dm-7
       12     253        8       12      active sync   /dev/dm-8
       13     253        9       13      active sync   /dev/dm-9

// STATE AFTER RE-ADD:
     Number   Major   Minor   RaidDevice State
        0     253        0        0      active sync   /dev/dm-0
        1     253        1        1      active sync   /dev/dm-1
       14     253       10        2      spare rebuilding   /dev/dm-10
        3     253       11        3      active sync   /dev/dm-11
        4     253       12        4      active sync   /dev/dm-12
        5     253       13        5      active sync   /dev/dm-13
        6     253        2        6      active sync   /dev/dm-2
        7     253        3        7      active sync   /dev/dm-3
        8     253        4        8      active sync   /dev/dm-4
        9     253        5        9      active sync   /dev/dm-5
       10     253        6       10      active sync   /dev/dm-6
       11     253        7       11      active sync   /dev/dm-7
       12     253        8       12      active sync   /dev/dm-8
       13     253        9       13      active sync   /dev/dm-9

/eli


// raid10.c:

         for (i = 0; i < conf->raid_disks; i++) {

                 disk = conf->mirrors + i;

                 /* count members that are missing or not yet in sync
                  * as degraded; Neil's fix below tests the per-disk
                  * rdev rather than a stale local 'rdev' */
                 if (!disk->rdev ||
                     !test_bit(In_sync, &disk->rdev->flags)) {
                         disk->head_position = 0;
                         mddev->degraded++;
                 }
         }

// END raid10.c




Neil Brown wrote:
> On Friday October 6, estair@ilm.com wrote:
>  >
>  > This patch has resolved the immediate issue I was having on 2.6.18 with
>  > RAID10.  Previous to this change, after removing a device from the array
>  > (with mdadm --remove), physically pulling the device and
>  > changing/re-inserting, the "Number" of the new device would be
>  > incremented on top of the highest-present device in the array.  Now, it
>  > resumes its previous place.
>  >
>  > Does this look to be 'correct' output for a 14-drive array, which dev 8
>  > was failed/removed from then "add"'ed?  I'm trying to determine why the
>  > device doesn't get pulled back into the active configuration and
>  > re-synced.  Any comments?
> 
> Does this patch help?
> 
> 
> 
> Fix count of degraded drives in raid10.
> 
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./drivers/md/raid10.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
> --- .prev/drivers/md/raid10.c   2006-10-09 14:18:00.000000000 +1000
> +++ ./drivers/md/raid10.c       2006-10-05 20:10:07.000000000 +1000
> @@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
>                 disk = conf->mirrors + i;
> 
>                 if (!disk->rdev ||
> -                   !test_bit(In_sync, &rdev->flags)) {
> +                   !test_bit(In_sync, &disk->rdev->flags)) {
>                         disk->head_position = 0;
>                         mddev->degraded++;
>                 }
> 
> 
> NeilBrown
> 


