Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.

From: Eli Stair <estair@ilm.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.
Date: Tue, 10 Oct 2006 13:42:23 -0700
Message-ID: <452C05AF.2070807@ilm.com>
In-Reply-To: <17706.65210.999441.373846@cse.unsw.edu.au>

[-- Attachment #1: Type: text/plain, Size: 4550 bytes --]


Thanks Neil,

I just gave this patched module a shot on four systems.  So far, I 
haven't seen the device number inappropriately increment, though as per 
a mail I sent a short while ago, that already seemed to be remedied by 
using the 1.2 superblock, for some reason.  However, the patch appears 
to have introduced a new issue, and it leaves another unresolved:



// BUG 1
The single-command syntax to fail and remove a drive is still failing.  
I do not know whether this is somehow contributing to the further (new) 
issue below:

   [root@gtmp06 tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
   mdadm: set /dev/dm-0 faulty in /dev/md0
   mdadm: hot remove failed for /dev/dm-0: Device or resource busy

   [root@gtmp06 tmp]# mdadm /dev/md0 --remove /dev/dm-0
   mdadm: hot removed /dev/dm-0
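
A minimal sketch of a workaround for the sequence above, assuming the 
"Device or resource busy" result is transient (in the session above the 
separate --remove did succeed).  It only uses the two mdadm invocations 
already shown; the helper name fail_and_remove and its run, retries and 
delay parameters are hypothetical, chosen for illustration.

```python
import subprocess
import time


def fail_and_remove(array, device, run=subprocess.run, retries=5, delay=1.0):
    """Mark `device` faulty in `array`, then retry the hot-remove.

    Returns the number of --remove attempts that were needed.
    `run` is injectable so the logic can be exercised without mdadm.
    """
    # Step 1: mark the device faulty, as `mdadm ARRAY --fail DEV` does.
    run(["mdadm", array, "--fail", device], check=True)

    # Step 2: issue the remove as a separate command and retry while it
    # fails (assumption: the busy state clears shortly after the fail).
    for attempt in range(1, retries + 1):
        result = run(["mdadm", array, "--remove", device], check=False)
        if result.returncode == 0:
            return attempt
        time.sleep(delay)

    raise RuntimeError(
        "could not remove %s from %s after %d attempts" % (device, array, retries)
    )
```

Called as fail_and_remove("/dev/md0", "/dev/dm-0"), this issues the same 
two commands as the transcript, just with the second one repeated until 
it goes through or the retry budget runs out.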


// BUG 2
Now, a drive that has been "fail...remove"'d is not used for resync when 
it is added or re-added.  I had noticed previously that added drives 
weren't re-synced until the existing array build was done, at which 
point they were picked up.  This, however, is a clean/active array that 
is rejecting the drive.

I've performed this identically on both a clean & active array, as well 
as a newly-created (resync'ing) array, to the same effect.  Even after 
rebuild or reboot, the removed drive isn't taken back and remains listed 
as a "faulty spare", with dmesg indicating that it is "non-fresh".




// DMESG:

md: kicking non-fresh dm-0 from array!


// ARRAY status 'mdadm -D /dev/md0'

           State : active, degraded
  Active Devices : 13
 Working Devices : 13
  Failed Devices : 1
   Spare Devices : 0

          Layout : near=1, offset=2
      Chunk Size : 512K

            Name : 0
            UUID : 05c2faf4:facfcad3:ba33b140:100f428a
          Events : 22

     Number   Major   Minor   RaidDevice State
        0     253        1        0      active sync   /dev/dm-1
        1     253        2        1      active sync   /dev/dm-2
        2     253        5        2      active sync   /dev/dm-5
        3     253        4        3      active sync   /dev/dm-4
        4     253        6        4      active sync   /dev/dm-6
        5     253        3        5      active sync   /dev/dm-3
        6     253       13        6      active sync   /dev/dm-13
        7       0        0        7      removed
        8     253        7        8      active sync   /dev/dm-7
        9     253        8        9      active sync   /dev/dm-8
       10     253        9       10      active sync   /dev/dm-9
       11     253       11       11      active sync   /dev/dm-11
       12     253       10       12      active sync   /dev/dm-10
       13     253       12       13      active sync   /dev/dm-12

        7     253        0        -      faulty spare   /dev/dm-0
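
To check several systems for the same symptom, the device table above 
can be scanned for empty slots and faulty members.  This is a rough 
sketch that assumes the column layout shown in this mail (Number, Major, 
Minor, RaidDevice, State); the function name find_problem_devices is 
hypothetical and not part of mdadm.

```python
def find_problem_devices(detail_output):
    """Return (missing_slots, faulty_devices) from `mdadm -D` text.

    missing_slots: RaidDevice numbers whose row reads "removed".
    faulty_devices: device paths whose state includes "faulty".
    """
    missing_slots = []
    faulty_devices = []
    in_table = False

    for line in detail_output.splitlines():
        fields = line.split()
        # The device table starts after its header row.
        if fields[:4] == ["Number", "Major", "Minor", "RaidDevice"]:
            in_table = True
            continue
        # Skip everything before the table, and blank or short lines.
        if not in_table or len(fields) < 5:
            continue

        raid_device, state = fields[3], fields[4:]
        if state == ["removed"]:
            missing_slots.append(int(raid_device))
        elif "faulty" in state:
            # The device path is the last column of the row.
            faulty_devices.append(state[-1])

    return missing_slots, faulty_devices
```

Fed the output above, it returns ([7], ['/dev/dm-0']): slot 7 is empty 
and /dev/dm-0 is sitting outside the array as a faulty spare.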




Let me know what more I can do to help track this down.  I'm reverting 
this patch, since the array behaves worse with it than before.  I will 
be happy to try others.

Attached are typescripts of the drive remove/add sessions, with all output.


/eli


Neil Brown wrote:
> On Friday October 6, estair@ilm.com wrote:
>  >
>  > This patch has resolved the immediate issue I was having on 2.6.18 with
>  > RAID10.  Previous to this change, after removing a device from the array
>  > (with mdadm --remove), physically pulling the device and
>  > changing/re-inserting, the "Number" of the new device would be
>  > incremented on top of the highest-present device in the array.  Now, it
>  > resumes its previous place.
>  >
>  > Does this look to be 'correct' output for a 14-drive array, which dev 8
>  > was failed/removed from then "add"'ed?  I'm trying to determine why the
>  > device doesn't get pulled back into the active configuration and
>  > re-synced.  Any comments?
> 
> Does this patch help?
> 
> 
> 
> Fix count of degraded drives in raid10.
> 
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./drivers/md/raid10.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
> --- .prev/drivers/md/raid10.c   2006-10-09 14:18:00.000000000 +1000
> +++ ./drivers/md/raid10.c       2006-10-05 20:10:07.000000000 +1000
> @@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
>                 disk = conf->mirrors + i;
> 
>                 if (!disk->rdev ||
> -                   !test_bit(In_sync, &rdev->flags)) {
> +                   !test_bit(In_sync, &disk->rdev->flags)) {
>                         disk->head_position = 0;
>                         mddev->degraded++;
>                 }
> 
> 
> NeilBrown
> 
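
For reference, a small user-space model of what the quoted one-line 
change does to the degraded count.  It is only a sketch: the Rdev class 
and both function names are hypothetical, and stale_rdev stands in for 
whatever the bare rdev variable pointed at in the unpatched code, which 
the patch itself does not say.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rdev:
    in_sync: bool  # models test_bit(In_sync, &rdev->flags)


def count_degraded(mirrors: List[Optional[Rdev]]) -> int:
    """Patched logic: a slot is degraded if it is empty or if its own
    device is not in sync (disk->rdev in the diff)."""
    degraded = 0
    for rdev in mirrors:
        if rdev is None or not rdev.in_sync:
            degraded += 1
    return degraded


def count_degraded_unpatched(mirrors: List[Optional[Rdev]], stale_rdev: Rdev) -> int:
    """Unpatched logic: the in-sync test looks at one unrelated device
    (stale_rdev) for every slot instead of the slot's own device."""
    degraded = 0
    for rdev in mirrors:
        if rdev is None or not stale_rdev.in_sync:
            degraded += 1
    return degraded
```

With 14 slots of which one holds a device that is present but not in 
sync, count_degraded gives 1, while count_degraded_unpatched gives 0 if 
the unrelated device happens to be in sync and 14 if it does not.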


[-- Attachment #2: gtmp-mdadm-add-drive-after-boot-to-degraded-array-fails-to-resync.log.gz --]
[-- Type: application/x-gzip, Size: 1349 bytes --]

[-- Attachment #3: gtmp-mdadm-remove-drive-from-activeclean-array-add-fails.log.gz --]
[-- Type: application/x-gzip, Size: 11331 bytes --]

