From: Neil Brown <neilb@suse.de>
To: Mike Snitzer <snitzer@gmail.com>
Cc: linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
paul.clements@steeleye.com
Subject: Re: [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition
Date: Tue, 27 May 2008 16:56:03 +1000 [thread overview]
Message-ID: <18491.45187.557090.225552@notabene.brown> (raw)
In-Reply-To: message from Mike Snitzer on Tuesday May 20
On Tuesday May 20, snitzer@gmail.com wrote:
>
> Hi Neil,
>
> We're much closer. The events_cleared is symmetric on both the failed
> and active member of the raid1. But there have been some instances
> where the md thread hits a deadlock during my testing. What follows
> is the backtrace and live crash info:
...
>
> So running with your latest patches seems to introduce a race in
> bitmap_daemon_work's if (unlikely((*bmc & COUNTER_MAX) ==
> COUNTER_MAX)) { } block.
As you not, that block is in the wrong place.
It is actually locking up in
wait_event(bitmap->mddev->sb_wait,
!test_bit(MD_CHANGE_CLEAN,
&bitmap->mddev->flags));
which the patch adds. However with my last update that wait_event
isn't needed any more. I was using it to ensure mddev->events matched
what was on disk. But we now read mddev->events much earlier and it
will definitely be on disc by this time.
So: this combined patch should do it.
Thanks for all your testing.
NeilBrown
---------------------------
Improve setting of "events_cleared" for write-intent bitmaps.
When an array is degraded, bits in the write-intent bitmap are not
cleared, so that if the missing device is re-added, it can be synced
by only updated those parts of the device that have changed since
it was removed.
The enable this a 'events_cleared' value is stored. It is the event
counter for the array the last time that any bits were cleared.
Sometimes - if a device disappears from an array while it is 'clean' -
the events_cleared value gets updated incorrectly (there are subtle
ordering issues between updateing events in the main metadata and the
bitmap metadata) resulting in the missing device appearing to require
a full resync when it is re-added.
With this patch, we update events_cleared precisely when we are about
to clear a bit in the bitmap. We record events_cleared when we clear
the bit internally, and copy that to the superblock which is written
out before the bit on storage. This makes it more "obviously correct".
We also need to update events_cleared when the event_count is going
backwards (as happens on a dirty->clean transition of a non-degraded
array).
Thanks to Mike Snitzer for identifying this problem and testing early
"fixes".
Cc: "Mike Snitzer" <snitzer@gmail.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 29 ++++++++++++++++++++++++-----
./include/linux/raid/bitmap.h | 1 +
2 files changed, 25 insertions(+), 5 deletions(-)
diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2008-05-27 16:50:04.000000000 +1000
+++ ./drivers/md/bitmap.c 2008-05-27 16:50:53.000000000 +1000
@@ -454,8 +454,11 @@ void bitmap_update_sb(struct bitmap *bit
spin_unlock_irqrestore(&bitmap->lock, flags);
sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
sb->events = cpu_to_le64(bitmap->mddev->events);
- if (!bitmap->mddev->degraded)
- sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
+ if (bitmap->mddev->events < bitmap->events_cleared) {
+ /* rocking back to read-only */
+ bitmap->events_cleared = bitmap->mddev->events;
+ sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+ }
kunmap_atomic(sb, KM_USER0);
write_page(bitmap, bitmap->sb_page, 1);
}
@@ -1085,9 +1088,19 @@ void bitmap_daemon_work(struct bitmap *b
} else
spin_unlock_irqrestore(&bitmap->lock, flags);
lastpage = page;
-/*
- printk("bitmap clean at page %lu\n", j);
-*/
+
+ /* We are possibly going to clear some bits, so make
+ * sure that events_cleared is up-to-date.
+ */
+ if (bitmap->need_sync) {
+ bitmap_super_t *sb;
+ bitmap->need_sync = 0;
+ sb = kmap_atomic(bitmap->sb_page, KM_USER0);
+ sb->events_cleared =
+ cpu_to_le64(bitmap->events_cleared);
+ kunmap_atomic(sb, KM_USER0);
+ write_page(bitmap, bitmap->sb_page, 1);
+ }
spin_lock_irqsave(&bitmap->lock, flags);
clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
}
@@ -1257,6 +1270,12 @@ void bitmap_endwrite(struct bitmap *bitm
return;
}
+ if (success &&
+ bitmap->events_cleared < bitmap->mddev->events) {
+ bitmap->events_cleared = bitmap->mddev->events;
+ bitmap->need_sync = 1;
+ }
+
if (!success && ! (*bmc & NEEDED_MASK))
*bmc |= NEEDED_MASK;
diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h 2008-05-26 09:46:04.000000000 +1000
+++ ./include/linux/raid/bitmap.h 2008-05-27 16:50:19.000000000 +1000
@@ -221,6 +221,7 @@ struct bitmap {
unsigned long syncchunk;
__u64 events_cleared;
+ int need_sync;
/* bitmap spinlock */
spinlock_t lock;
next prev parent reply other threads:[~2008-05-27 6:56 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-04-02 22:09 [RFC][PATCH] md: avoid fullsync if a faulty member missed a dirty transition Mike Snitzer
2008-05-06 6:53 ` Neil Brown
2008-05-06 11:58 ` Mike Snitzer
2008-05-08 6:13 ` Neil Brown
2008-05-08 20:11 ` Mike Snitzer
2008-05-09 1:40 ` Neil Brown
2008-05-09 4:42 ` Mike Snitzer
2008-05-09 5:08 ` Mike Snitzer
2008-05-09 5:26 ` Mike Snitzer
2008-05-09 6:01 ` Neil Brown
2008-05-09 15:00 ` Mike Snitzer
2008-05-16 11:54 ` Neil Brown
2008-05-19 4:33 ` Mike Snitzer
2008-05-19 5:27 ` Neil Brown
2008-05-20 15:30 ` Mike Snitzer
2008-05-20 15:33 ` Mike Snitzer
2008-05-27 6:56 ` Neil Brown [this message]
2008-05-27 14:33 ` Mike Snitzer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18491.45187.557090.225552@notabene.brown \
--to=neilb@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-raid@vger.kernel.org \
--cc=paul.clements@steeleye.com \
--cc=snitzer@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).