Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.


From: Eli Stair <estair@ilm.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.
Date: Tue, 10 Oct 2006 13:42:23 -0700
Message-ID: <452C05AF.2070807@ilm.com>
In-Reply-To: <17706.65210.999441.373846@cse.unsw.edu.au>

[-- Attachment #1: Type: text/plain, Size: 4550 bytes --]


Thanks Neil,

I just gave this patched module a shot on four systems.  So far, I
haven't seen the device number inappropriately increment, though as per
a mail I sent a short while ago, that already seemed to be remedied by
using the 1.2 superblock, for some reason.  However, the patch appears
to have introduced a new issue, and leaves another unresolved:



// BUG 1
The single-command syntax to fail and remove a drive is still failing.
I do not know whether this is somehow contributing to the further (new)
issues below:

   [root@gtmp06 tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
   mdadm: set /dev/dm-0 faulty in /dev/md0
   mdadm: hot remove failed for /dev/dm-0: Device or resource busy

   [root@gtmp06 tmp]# mdadm /dev/md0 --remove /dev/dm-0
   mdadm: hot removed /dev/dm-0
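
A possible workaround while the single-command form misbehaves, sketched
in Python under stated assumptions: it issues --fail and --remove as two
separate mdadm invocations (the same two flags used in the transcript
above) and retries the removal a few times.  The function name, the
retry count, the delay, and the assumption that mdadm exits non-zero when
it prints "Device or resource busy" are all illustrative, not taken from
mdadm documentation.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def fail_and_remove(array: str, member: str, run: Runner = _run,
                    retries: int = 5, delay: float = 1.0) -> bool:
    """Mark `member` faulty, then retry the hot remove until it succeeds.

    Hypothetical helper. Assumes a non-zero exit status means the
    remove was refused (for example with "Device or resource busy").
    """
    if run(["mdadm", array, "--fail", member]) != 0:
        return False
    for _ in range(retries):
        if run(["mdadm", array, "--remove", member]) == 0:
            return True
        time.sleep(delay)  # give md a moment to release the device
    return False
```

Called as fail_and_remove("/dev/md0", "/dev/dm-0"), this mirrors the
manual sequence above, where the second, separate --remove succeeded.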


// BUG 2
Now, upon adding or re-adding a "fail...remove"'d drive, it is not used
for resync.  I had noticed previously that added drives weren't
re-synced until the existing array build was done, and only then were
they picked up.  This, however, is a clean/active array that is
rejecting the drive.

I've performed this identically on both a clean & active array, as well 
as a newly-created (resync'ing) array, to the same effect.  Even after 
rebuild or reboot, the removed drive isn't taken back and remains listed 
as a "faulty spare", with dmesg indicating that it is "non-fresh".




// DMESG:

md: kicking non-fresh dm-0 from array!


// ARRAY status 'mdadm -D /dev/md0'

           State : active, degraded
  Active Devices : 13
 Working Devices : 13
  Failed Devices : 1
   Spare Devices : 0

          Layout : near=1, offset=2
      Chunk Size : 512K

            Name : 0
            UUID : 05c2faf4:facfcad3:ba33b140:100f428a
          Events : 22

     Number   Major   Minor   RaidDevice State
        0     253        1        0      active sync   /dev/dm-1
        1     253        2        1      active sync   /dev/dm-2
        2     253        5        2      active sync   /dev/dm-5
        3     253        4        3      active sync   /dev/dm-4
        4     253        6        4      active sync   /dev/dm-6
        5     253        3        5      active sync   /dev/dm-3
        6     253       13        6      active sync   /dev/dm-13
        7       0        0        7      removed
        8     253        7        8      active sync   /dev/dm-7
        9     253        8        9      active sync   /dev/dm-8
       10     253        9       10      active sync   /dev/dm-9
       11     253       11       11      active sync   /dev/dm-11
       12     253       10       12      active sync   /dev/dm-10
       13     253       12       13      active sync   /dev/dm-12

        7     253        0        -      faulty spare   /dev/dm-0
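
For anyone who wants to detect this state from a script, here is a
minimal Python sketch that parses the device table printed by
'mdadm -D' and reports empty slots and faulty members.  The regular
expression and the names Member, parse_members, unfilled_slots and
faulty_devices are assumptions made for illustration; the column layout
is inferred only from the output above and may differ between mdadm
versions.

```python
import re
from typing import List, NamedTuple, Optional


class Member(NamedTuple):
    number: int
    major: int
    minor: int
    raid_device: Optional[int]  # None when mdadm prints "-"
    state: str
    device: Optional[str]       # None for a "removed" slot


# Number, Major, Minor, RaidDevice (or "-"), State words, optional /dev path.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+|-)\s+(.*?)(?:\s+(/dev/\S+))?\s*$"
)


def parse_members(detail_output: str) -> List[Member]:
    """Return one Member per row of the device table; skip other lines."""
    members = []
    for line in detail_output.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        number, major, minor, slot, state, device = m.groups()
        members.append(Member(int(number), int(major), int(minor),
                              None if slot == "-" else int(slot),
                              state.strip(), device))
    return members


def unfilled_slots(members: List[Member]) -> List[Optional[int]]:
    """RaidDevice numbers whose slot is listed as 'removed'."""
    return [m.raid_device for m in members if m.state == "removed"]


def faulty_devices(members: List[Member]) -> List[Optional[str]]:
    """Device paths whose state includes the word 'faulty'."""
    return [m.device for m in members if "faulty" in m.state.split()]


# A few rows copied from the output above.
SAMPLE = """
     Number   Major   Minor   RaidDevice State
        6     253       13        6      active sync   /dev/dm-13
        7       0        0        7      removed
        8     253        7        8      active sync   /dev/dm-7

        7     253        0        -      faulty spare   /dev/dm-0
"""

members = parse_members(SAMPLE)
print(unfilled_slots(members), faulty_devices(members))
```

On the rows above this reports slot 7 as empty and /dev/dm-0 as faulty,
which is the combination described under BUG 2.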




Let me know what more I can do to help track this down.  I'm reverting
this patch, since it is behaving worse than before.  Will be happy
to try others.

Attached are typescripts of the drive remove/add sessions and all output.


/eli


Neil Brown wrote:
> On Friday October 6, estair@ilm.com wrote:
>  >
>  > This patch has resolved the immediate issue I was having on 2.6.18 with
>  > RAID10.  Previous to this change, after removing a device from the array
>  > (with mdadm --remove), physically pulling the device and
>  > changing/re-inserting, the "Number" of the new device would be
>  > incremented on top of the highest-present device in the array.  Now, it
>  > resumes its previous place.
>  >
>  > Does this look to be 'correct' output for a 14-drive array, which dev 8
>  > was failed/removed from then "add"'ed?  I'm trying to determine why the
>  > device doesn't get pulled back into the active configuration and
>  > re-synced.  Any comments?
> 
> Does this patch help?
> 
> 
> 
> Fix count of degraded drives in raid10.
> 
> 
> Signed-off-by: Neil Brown <neilb@suse.de>
> 
> ### Diffstat output
>  ./drivers/md/raid10.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
> --- .prev/drivers/md/raid10.c   2006-10-09 14:18:00.000000000 +1000
> +++ ./drivers/md/raid10.c       2006-10-05 20:10:07.000000000 +1000
> @@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
>                 disk = conf->mirrors + i;
> 
>                 if (!disk->rdev ||
> -                   !test_bit(In_sync, &rdev->flags)) {
> +                   !test_bit(In_sync, &disk->rdev->flags)) {
>                         disk->head_position = 0;
>                         mddev->degraded++;
>                 }
> 
> 
> NeilBrown
> 
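
To illustrate what the one-line change in the quoted diff appears to
fix, here is a toy model in Python.  It is not kernel code.  It assumes
that the `rdev` tested in the removed line is a variable left over from
earlier in run(), rather than the device in the slot being examined;
that reading is an inference from the diff, not something stated in the
patch description.  The Rdev class and both function names are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rdev:
    name: str
    in_sync: bool


def count_degraded_buggy(mirrors: List[Optional[Rdev]],
                         stale_rdev: Rdev) -> int:
    """Models the removed line: every slot tests the same stale rdev."""
    degraded = 0
    for slot in mirrors:
        if slot is None or not stale_rdev.in_sync:
            degraded += 1
    return degraded


def count_degraded_fixed(mirrors: List[Optional[Rdev]]) -> int:
    """Models the added line: each slot tests its own rdev."""
    degraded = 0
    for slot in mirrors:
        if slot is None or not slot.in_sync:
            degraded += 1
    return degraded


# One member is present but not yet in sync (a freshly added drive).
mirrors = [Rdev("dm-1", True), Rdev("dm-0", False), Rdev("dm-2", True)]
stale = mirrors[-1]  # the stale variable happens to point at an in-sync device

print(count_degraded_buggy(mirrors, stale))  # misses the out-of-sync member
print(count_degraded_fixed(mirrors))         # counts it
```

In this model the buggy count reports the array as not degraded even
though one member still needs a resync, which matches the patch title
"Fix count of degraded drives in raid10".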


[-- Attachment #2: gtmp-mdadm-add-drive-after-boot-to-degraded-array-fails-to-resync.log.gz --]
[-- Type: application/x-gzip, Size: 1349 bytes --]

[-- Attachment #3: gtmp-mdadm-remove-drive-from-activeclean-array-add-fails.log.gz --]
[-- Type: application/x-gzip, Size: 11331 bytes --]
