From: Eli Stair <estair@ilm.com>
To: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: Fix bug where new drives added to an md array sometimes don't sync properly.
Date: Tue, 10 Oct 2006 13:42:23 -0700
Message-ID: <452C05AF.2070807@ilm.com>
In-Reply-To: <17706.65210.999441.373846@cse.unsw.edu.au>
[-- Attachment #1: Type: text/plain, Size: 4550 bytes --]
Thanks Neil,
I just tried this patched module on four systems. So far I haven't
seen the device number increment inappropriately, though as I noted in a
mail a short while ago, using the 1.2 superblock also seemed to remedy
that, for some reason. However, the patch appears to have introduced a
new issue, and it leaves another unresolved:
// BUG 1
The single-command syntax to fail and remove a drive still fails. I
do not know whether this is contributing to the further (new) issues
below:
[root@gtmp06 tmp]# mdadm /dev/md0 --fail /dev/dm-0 --remove /dev/dm-0
mdadm: set /dev/dm-0 faulty in /dev/md0
mdadm: hot remove failed for /dev/dm-0: Device or resource busy
[root@gtmp06 tmp]# mdadm /dev/md0 --remove /dev/dm-0
mdadm: hot removed /dev/dm-0
// BUG 2
Now, when a "fail...remove"'d drive is added or re-added, it is not used
for resync. I had noticed previously that added drives weren't re-synced
until the existing array build was done, at which point they were picked
up. This, however, is a clean/active array that is rejecting the drive.
I've performed this identically on both a clean & active array, as well
as a newly-created (resync'ing) array, to the same effect. Even after
rebuild or reboot, the removed drive isn't taken back and remains listed
as a "faulty spare", with dmesg indicating that it is "non-fresh".
// DMESG:
md: kicking non-fresh dm-0 from array!
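
For readers unfamiliar with the "non-fresh" message, here is a minimal
sketch of the idea, not the kernel's actual code. It assumes md decides
freshness by comparing the event counter stored in each member's
superblock against the newest counter in the array (the "Events" field
that mdadm -D reports). The function name is_fresh and the slack
parameter are hypothetical, and the exact tolerance md applies is an
assumption here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: a member counts as "fresh" if its
 * superblock event counter is at most `slack` events behind the newest
 * counter seen among the array members.
 */
static int is_fresh(uint64_t dev_events, uint64_t newest_events,
                    uint64_t slack)
{
    /* A member cannot be ahead of the newest counter by definition. */
    assert(dev_events <= newest_events);
    return dev_events + slack >= newest_events;
}
```

Under this model, a drive that was failed and removed stops receiving
event updates, so its counter falls behind while the array keeps
running. With a slack of 1 and an array at 22 events, a member at 21
would still be accepted, while one at 10 would be kicked.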
// ARRAY status 'mdadm -D /dev/md0'
State : active, degraded
Active Devices : 13
Working Devices : 13
Failed Devices : 1
Spare Devices : 0
Layout : near=1, offset=2
Chunk Size : 512K
Name : 0
UUID : 05c2faf4:facfcad3:ba33b140:100f428a
Events : 22
Number   Major   Minor   RaidDevice   State
   0      253       1         0       active sync    /dev/dm-1
   1      253       2         1       active sync    /dev/dm-2
   2      253       5         2       active sync    /dev/dm-5
   3      253       4         3       active sync    /dev/dm-4
   4      253       6         4       active sync    /dev/dm-6
   5      253       3         5       active sync    /dev/dm-3
   6      253      13         6       active sync    /dev/dm-13
   7        0       0         7       removed
   8      253       7         8       active sync    /dev/dm-7
   9      253       8         9       active sync    /dev/dm-8
  10      253       9        10       active sync    /dev/dm-9
  11      253      11        11       active sync    /dev/dm-11
  12      253      10        12       active sync    /dev/dm-10
  13      253      12        13       active sync    /dev/dm-12

   7      253       0         -       faulty spare   /dev/dm-0
Let me know what more I can do to help track this down. I'm reverting
this patch, since the array behaves worse with it than before. I'll be
happy to try others.
Attached are typescripts of the drive remove/add sessions, including all
output.
/eli
Neil Brown wrote:
> On Friday October 6, estair@ilm.com wrote:
> >
> > This patch has resolved the immediate issue I was having on 2.6.18 with
> > RAID10. Previous to this change, after removing a device from the array
> > (with mdadm --remove), physically pulling the device and
> > changing/re-inserting, the "Number" of the new device would be
> > incremented on top of the highest-present device in the array. Now, it
> > resumes its previous place.
> >
> > Does this look to be 'correct' output for a 14-drive array, which dev 8
> > was failed/removed from then "add"'ed? I'm trying to determine why the
> > device doesn't get pulled back into the active configuration and
> > re-synced. Any comments?
>
> Does this patch help?
>
>
>
> Fix count of degraded drives in raid10.
>
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./drivers/md/raid10.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
> --- .prev/drivers/md/raid10.c 2006-10-09 14:18:00.000000000 +1000
> +++ ./drivers/md/raid10.c 2006-10-05 20:10:07.000000000 +1000
> @@ -2079,7 +2079,7 @@ static int run(mddev_t *mddev)
> disk = conf->mirrors + i;
>
> if (!disk->rdev ||
> - !test_bit(In_sync, &rdev->flags)) {
> + !test_bit(In_sync, &disk->rdev->flags)) {
> disk->head_position = 0;
> mddev->degraded++;
> }
>
>
> NeilBrown
>
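
To illustrate what the one-line change in the quoted patch does, here is
a userspace sketch of the degraded count. It is a model, not the kernel
code: the struct and function names are hypothetical, in_sync stands in
for test_bit(In_sync, ...), and the "old" variant assumes that the bare
rdev in the original test was not guaranteed to be the device in the
slot being examined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's rdev and per-slot mirror info. */
struct rdev_model {
    int in_sync;                 /* models test_bit(In_sync, &rdev->flags) */
};

struct mirror_model {
    struct rdev_model *rdev;     /* NULL when the slot has no device */
};

/*
 * Old test (assumed behaviour): every slot consults the same leftover
 * `stale` pointer instead of the device attached to that slot.
 */
static int count_degraded_old(const struct mirror_model *mirrors, int n,
                              const struct rdev_model *stale)
{
    int degraded = 0;

    assert(stale != NULL);
    for (int i = 0; i < n; i++) {
        const struct mirror_model *disk = mirrors + i;

        if (!disk->rdev || !stale->in_sync)
            degraded++;
    }
    return degraded;
}

/* Patched test: each slot checks the device actually attached to it. */
static int count_degraded_fixed(const struct mirror_model *mirrors, int n)
{
    int degraded = 0;

    for (int i = 0; i < n; i++) {
        const struct mirror_model *disk = mirrors + i;

        if (!disk->rdev || !disk->rdev->in_sync)
            degraded++;
    }
    return degraded;
}
```

In this model, the old variant's result depends on whichever device the
stale pointer happens to reference: it either ignores out-of-sync
members entirely or counts every slot as degraded. The patched variant
counts exactly the slots that are empty or not in sync.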
[-- Attachment #2: gtmp-mdadm-add-drive-after-boot-to-degraded-array-fails-to-resync.log.gz --]
[-- Type: application/x-gzip, Size: 1349 bytes --]
[-- Attachment #3: gtmp-mdadm-remove-drive-from-activeclean-array-add-fails.log.gz --]
[-- Type: application/x-gzip, Size: 11331 bytes --]