From: Neil Brown <neilb@suse.de>
To: Mike Snitzer <snitzer@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
paul.clements@steeleye.com
Subject: Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition
Date: Fri, 16 May 2008 21:54:31 +1000 [thread overview]
Message-ID: <18477.30199.992302.271647@notabene.brown> (raw)
In-Reply-To: message from Mike Snitzer on Friday May 9
On Friday May 9, snitzer@gmail.com wrote:
> On Fri, May 9, 2008 at 2:01 AM, Neil Brown <neilb@suse.de> wrote:
> >
> > On Friday May 9, snitzer@gmail.com wrote:
>
> > > Unfortunately my testing with this patch results in a full resync.
> > >
> > > Here is the state of the array after shutdown:
> > > # mdadm -X /dev/nbd0 /dev/sdq
> > > Filename : /dev/nbd0
> > > Magic : 6d746962
> > > Version : 4
> > > UUID : 7140cc3c:8681416c:12c5668a:984ca55d
> > > Events : 896
> > > Events Cleared : 897
> >
> > Events Cleared is *larger* than Events!!! Is that repeatable? I can
> > only see it happening if a very small race were lost. You don't have
> > any other patches in there do you?
>
> Yes, it is repeatable with your previous patch. But with your most
> recent patch I had the following after shutdown:
>
> # mdadm -X /dev/nbd0 /dev/sdq
> Filename : /dev/nbd0
> Events : 1732
> Events Cleared : 1732
> Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
>
> Filename : /dev/sdq
> Events : 1736
> Events Cleared : 1736
> Bitmap : 409600 bits (chunks), 1 dirty (0.0%)
>
> Unfortunately sdq's events_cleared appears to have been updated
> _after_ the array became degraded.
> As such a full resync occurred because 1732 < 1736.
>
> > This patch should close the race, though I still find it hard to
> > believe that you lost the race.
>
> Comments inlined below.
>
> > Signed-off-by: Neil Brown <neilb@suse.de>
> >
> > ### Diffstat output
> > ./drivers/md/bitmap.c | 20 +++++++++++++++-----
> > 1 file changed, 15 insertions(+), 5 deletions(-)
> >
> >
> > diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
> > --- .prev/drivers/md/bitmap.c 2008-05-09 11:02:13.000000000 +1000
> > +++ ./drivers/md/bitmap.c 2008-05-09 16:00:07.000000000 +1000
> >
> > @@ -465,8 +465,6 @@ void bitmap_update_sb(struct bitmap *bit
> > spin_unlock_irqrestore(&bitmap->lock, flags);
> > sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
> > sb->events = cpu_to_le64(bitmap->mddev->events);
> > - if (!bitmap->mddev->degraded)
> > - sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
>
> Before, events_cleared was _not_ updated if the array was degraded.
> Your patch doesn't appear to maintain that design.
It does, but it is well hidden.
Bits in the bitmap are only cleared when the array is not degraded.
The new code for updating events_cleared is only triggered when a bit
is about to be cleared.
>
> I needed "bitmap->mddev->sb_wait" and "bitmap->mddev->flags" to get
> the code to compile.
Sorry about that...
I decided to bite the bullet and create a setup where I could test
this myself. Using the faulty personality of md make it fairly
straight forward. This script:
------------------------------------------------------------
mdadm -Ss
mdadm -B /dev/md9 -l faulty -n 1 /dev/sdc
mdadm -CR /dev/md0 -l1 -n2 -d 1 --bitmap internal --assume-clean /dev/sdb /dev/md9
mkfs /dev/md0 3000000
mount /dev/md0 /mnt
echo hello > /mnt/afile
sync
sleep 4
echo before grow
mdadm -E /dev/md9 | grep -E '(State|Event).*:'
mdadm -X /dev/md9 | grep Bitmap
mdadm -G /dev/md9 -l faulty -p wa
umount /mnt
echo after umount
mdadm -X /dev/sdb | grep Event
mdadm -S /dev/md0
echo sdb
mdadm -E /dev/sdb | grep Event
mdadm -E /dev/sdb | grep State' :'
mdadm -X /dev/sdb | grep Event
echo mdp
mdadm -E /dev/md9 | grep Event
mdadm -E /dev/md9 | grep State' :'
mdadm -X /dev/md9 | grep Event
mdadm -G /dev/md9 -l faulty -p none
mdadm -A /dev/md0 /dev/sdb
mdadm /dev/md0 -a /dev/md9
sleep 1
cat /proc/mdstat
------------------------------------------------------------
reproduces exactly your problem (I think).
This helped me discover what was wrong with my patch. It has to do
with the event counter going backwards sometimes.
This patch makes the above test work as expected, and should provide
happiness for you too.
NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2008-05-16 20:27:49.000000000 +1000
+++ ./drivers/md/bitmap.c 2008-05-16 21:49:20.000000000 +1000
@@ -454,8 +454,11 @@ void bitmap_update_sb(struct bitmap *bit
spin_unlock_irqrestore(&bitmap->lock, flags);
sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
sb->events = cpu_to_le64(bitmap->mddev->events);
- if (!bitmap->mddev->degraded)
- sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+ if (bitmap->mddev->events < bitmap->events_cleared){
+ /* rocking back to read-only */
+ bitmap->events_cleared = bitmap->mddev->events;
+ sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+ }
kunmap_atomic(sb, KM_USER0);
write_page(bitmap, bitmap->sb_page, 1);
}
@@ -1085,9 +1088,22 @@ void bitmap_daemon_work(struct bitmap *b
} else
spin_unlock_irqrestore(&bitmap->lock, flags);
lastpage = page;
-/*
- printk("bitmap clean at page %lu\n", j);
-*/
+
+ /* We are possibly going to clear some bits, so make
+ * sure that events_cleared is up-to-date.
+ */
+ if (bitmap->events_cleared < bitmap->mddev->events) {
+ bitmap_super_t *sb;
+ bitmap->events_cleared = bitmap->mddev->events;
+ wait_event(bitmap->mddev->sb_wait,
+ !test_bit(MD_CHANGE_CLEAN,
+ &bitmap->mddev->flags));
+ sb = kmap_atomic(bitmap->sb_page, KM_USER0);
+ sb->events_cleared =
+ cpu_to_le64(bitmap->events_cleared);
+ kunmap_atomic(sb, KM_USER0);
+ write_page(bitmap, bitmap->sb_page, 1);
+ }
spin_lock_irqsave(&bitmap->lock, flags);
clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
}
next prev parent reply other threads:[~2008-05-16 11:54 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-02 22:09 [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition Mike Snitzer
2008-05-06 6:53 ` Neil Brown
2008-05-06 11:58 ` Mike Snitzer
2008-05-08 6:13 ` Neil Brown
2008-05-08 20:11 ` Mike Snitzer
2008-05-09 1:40 ` Neil Brown
2008-05-09 4:42 ` Mike Snitzer
2008-05-09 5:08 ` Mike Snitzer
2008-05-09 5:26 ` Mike Snitzer
2008-05-09 6:01 ` Neil Brown
2008-05-09 15:00 ` Mike Snitzer
2008-05-16 11:54 ` Neil Brown [this message]
2008-05-19 4:33 ` Mike Snitzer
2008-05-19 5:27 ` Neil Brown
2008-05-20 15:30 ` Mike Snitzer
2008-05-20 15:33 ` Mike Snitzer
2008-05-27 6:56 ` Neil Brown
2008-05-27 14:33 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18477.30199.992302.271647@notabene.brown \
--to=neilb@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=paul.clements@steeleye.com \
--cc=snitzer@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).