* [PATCH] md: new bitmap sysfs interface
@ 2006-07-25 6:30 Paul Clements
2006-07-25 17:41 ` dave rientjes
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-25 6:30 UTC
To: linux-raid, neilb
[-- Attachment #1: Type: text/plain, Size: 796 bytes --]
This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
that allows the bitmap of an array to be dirtied. The interface is
write-only, and is used as follows:
echo "1000" > /sys/block/md2/md/bitmap
(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
bitmaps of array md2)
echo "1000-2000" > /sys/block/md1/md/bitmap
(dirty the bits for chunks 1000-2000 in md1's bitmap)
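A minimal sketch of driving this interface from a script rather than from echo. The helper names (format_ranges, dirty_chunks) and the sysfs_root parameter are hypothetical, and the attribute path is the one proposed in this patch. Writing to the real file would need root and a kernel carrying the patch. sysfs writes are generally limited to one page, so a very long list may need to be split over several writes.

```python
import os


def format_ranges(chunks):
    """Collapse chunk numbers into '<chunk>' and '<start>-<end>' tokens."""
    tokens = []
    start = prev = None
    for c in sorted(set(chunks)):
        if prev is not None and c == prev + 1:
            prev = c  # extend the current run
            continue
        if start is not None:
            tokens.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = c
    if start is not None:
        tokens.append(str(start) if start == prev else f"{start}-{prev}")
    return " ".join(tokens)


def dirty_chunks(md_name, chunks, sysfs_root="/sys/block"):
    """Write the ranges to <sysfs_root>/<md_name>/md/bitmap (path as proposed here)."""
    path = os.path.join(sysfs_root, md_name, "md", "bitmap")
    with open(path, "w") as f:
        f.write(format_ranges(chunks) + "\n")
```

For example, dirty_chunks("md1", range(1000, 2001)) would write the same "1000-2000" string as the second echo above.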
This is useful, for example, in cluster environments where you may need
to combine two disjoint bitmaps into one (following a server failure,
after a secondary server has taken over the array). By combining the
bitmaps on the two servers, a full resync can be avoided (This was
discussed on the list back on March 18, 2005, "[PATCH 1/2] md bitmap bug
fixes" thread).
Thanks,
Paul
[-- Attachment #2: md_bitmap_sysfs-3.diff --]
[-- Type: text/plain, Size: 2756 bytes --]
diff -pur linux-2.6.18-rc1-mm1/drivers/md/bitmap.c linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/bitmap.c
--- linux-2.6.18-rc1-mm1/drivers/md/bitmap.c 2006-07-06 00:09:49.000000000 -0400
+++ linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/bitmap.c 2006-07-24 16:35:18.000000000 -0400
@@ -613,6 +613,7 @@ static inline unsigned long file_page_of
 static inline struct page *filemap_get_page(struct bitmap *bitmap,
 					unsigned long chunk)
 {
+	if (file_page_index(chunk) >= bitmap->file_pages) return NULL;
 	return bitmap->filemap[file_page_index(chunk) - file_page_index(0)];
 }
@@ -739,6 +740,7 @@ static void bitmap_file_set_bit(struct b
 	}
 	page = filemap_get_page(bitmap, chunk);
+	if (!page) return;
 	bit = file_page_offset(chunk);
 	/* set the bit */
@@ -1322,6 +1324,18 @@ static void bitmap_set_memory_bits(struc
 }
+/* dirty the memory and file bits for bitmap chunks "s" to "e" */
+void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e)
+{
+	unsigned long chunk;
+
+	for (chunk = s; chunk <= e; chunk++) {
+		sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
+		bitmap_set_memory_bits(bitmap, sec, 1);
+		bitmap_file_set_bit(bitmap, sec);
+	}
+}
+
 /*
  * flush out any pending updates
  */
diff -pur linux-2.6.18-rc1-mm1/drivers/md/md.c linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/md.c
--- linux-2.6.18-rc1-mm1/drivers/md/md.c 2006-07-14 16:10:41.000000000 -0400
+++ linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/md.c 2006-07-18 11:52:11.000000000 -0400
@@ -2507,6 +2507,36 @@ static struct md_sysfs_entry md_new_devi
 __ATTR(new_dev, S_IWUSR, null_show, new_dev_store);
 static ssize_t
+bitmap_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	char *end;
+	unsigned long chunk, end_chunk;
+
+	if (!mddev->bitmap)
+		goto out;
+	/* buf should be <chunk> <chunk> ... or <chunk>-<chunk> ... (range) */
+	while (*buf) {
+		chunk = end_chunk = simple_strtoul(buf, &end, 0);
+		if (buf == end) break;
+		if (*end == '-') { /* range */
+			buf = end + 1;
+			end_chunk = simple_strtoul(buf, &end, 0);
+			if (buf == end) break;
+		}
+		if (*end && !isspace(*end)) break;
+		bitmap_dirty_bits(mddev->bitmap, chunk, end_chunk);
+		buf = end;
+		while (isspace(*buf)) buf++;
+	}
+	bitmap_unplug(mddev->bitmap); /* flush the bits to disk */
+out:
+	return len;
+}
+
+static struct md_sysfs_entry md_bitmap =
+__ATTR(bitmap, S_IWUSR, null_show, bitmap_store);
+
+static ssize_t
 size_show(mddev_t *mddev, char *page)
 {
 	return sprintf(page, "%llu\n", (unsigned long long)mddev->size);
@@ -2826,6 +2856,7 @@ static struct attribute *md_redundancy_a
 	&md_sync_completed.attr,
 	&md_suspend_lo.attr,
 	&md_suspend_hi.attr,
+	&md_bitmap.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {
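The input format accepted by bitmap_store can be modeled in user space, which may help when checking a string before writing it. This is only a rough sketch: _strtoul and parse_chunks are hypothetical names, unsigned long overflow is not modeled, and Python's isspace() is slightly broader than the kernel's. It mirrors two things visible in the patch: parsing stops silently at the first malformed token, and the store still returns len, so the writer gets no error.

```python
import re

# Base-0 number syntax: 0x.. is hex, a leading 0 is octal, otherwise decimal.
_NUM = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def _strtoul(buf, pos):
    """Rough model of simple_strtoul(buf, &end, 0): return (value, end_pos)."""
    m = _NUM.match(buf, pos)
    if not m:
        return 0, pos  # nothing consumed, like end == buf in C
    text = m.group()
    if text[:2].lower() == "0x":
        return int(text, 16), m.end()
    if text[0] == "0":
        return int(text, 8), m.end()
    return int(text, 10), m.end()


def parse_chunks(buf):
    """Return the (start, end) chunk pairs the store loop would act on."""
    ranges = []
    pos = 0
    while pos < len(buf):
        chunk, end = _strtoul(buf, pos)
        if end == pos:
            break
        end_chunk = chunk
        if end < len(buf) and buf[end] == "-":  # range
            pos = end + 1
            end_chunk, end = _strtoul(buf, pos)
            if end == pos:
                break
        if end < len(buf) and not buf[end].isspace():
            break
        ranges.append((chunk, end_chunk))
        pos = end
        while pos < len(buf) and buf[pos].isspace():
            pos += 1
    return ranges
```

Note that a reversed pair such as "9-5" is returned as (9, 5) here; in the patch the loop in bitmap_dirty_bits would simply not run for it, so no bits would be set.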
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-25 6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
@ 2006-07-25 17:41 ` dave rientjes
2006-07-26 21:30 ` Mike Snitzer
2006-08-03 1:42 ` Neil Brown
2 siblings, 0 replies; 12+ messages in thread
From: dave rientjes @ 2006-07-25 17:41 UTC
To: Paul Clements; +Cc: linux-raid, neilb
On Tue, 25 Jul 2006, Paul Clements wrote:
> @@ -1322,6 +1324,18 @@ static void bitmap_set_memory_bits(struc
>
> }
>
> +/* dirty the memory and file bits for bitmap chunks "s" to "e" */
> +void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e)
> +{
> +	unsigned long chunk;
> +
> +	for (chunk = s; chunk <= e; chunk++) {
> +		sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
> +		bitmap_set_memory_bits(bitmap, sec, 1);
> +		bitmap_file_set_bit(bitmap, sec);
> +	}
> +}
> +
Why not
{
	for (; s <= e; s++) {
		sector_t sec = s << CHUNK_BLOCK_SHIFT(bitmap);
		...
	}
}
David
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-25 6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
2006-07-25 17:41 ` dave rientjes
@ 2006-07-26 21:30 ` Mike Snitzer
2006-07-27 2:27 ` Paul Clements
2006-08-03 1:42 ` Neil Brown
2 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-26 21:30 UTC
To: Paul Clements; +Cc: linux-raid, neilb
On 7/25/06, Paul Clements <paul.clements@steeleye.com> wrote:
> This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
> that allows the bitmap of an array to be dirtied. The interface is
> write-only, and is used as follows:
>
> echo "1000" > /sys/block/md2/md/bitmap
>
> (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
> bitmaps of array md2)
>
> echo "1000-2000" > /sys/block/md1/md/bitmap
>
> (dirty the bits for chunks 1000-2000 in md1's bitmap)
>
> This is useful, for example, in cluster environments where you may need
> to combine two disjoint bitmaps into one (following a server failure,
> after a secondary server has taken over the array). By combining the
> bitmaps on the two servers, a full resync can be avoided (This was
> discussed on the list back on March 18, 2005, "[PATCH 1/2] md bitmap bug
> fixes" thread).
Hi Paul,
I tracked down the thread you referenced and these posts (by you)
seem to summarize things well:
http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
But for clarity's sake, could you elaborate on the negative
implications of not merging the bitmaps on the secondary server? Will
the previous primary's dirty blocks get dropped on the floor because
the secondary (now the primary) doesn't have awareness of the previous
primary's dirty blocks once it activates the raid1?
Also, what is the interface one should use to collect dirty bits from
the primary's bitmap?
This bitmap merge can't happen until the primary's dirty bits can be
collected right? Waiting for the failed server to come back to
harvest the dirty bits it has seems wrong (why failover at all?); so I
must be missing something.
please advise, thanks.
Mike
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-26 21:30 ` Mike Snitzer
@ 2006-07-27 2:27 ` Paul Clements
2006-07-27 3:36 ` Mike Snitzer
2006-07-27 14:28 ` Mike Snitzer
0 siblings, 2 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27 2:27 UTC
To: Mike Snitzer; +Cc: linux-raid, neilb
Mike Snitzer wrote:
> I tracked down the thread you referenced and these posts (by you)
> seem to summarize things well:
> http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
>
> But for clarity's sake, could you elaborate on the negative
> implications of not merging the bitmaps on the secondary server? Will
> the previous primary's dirty blocks get dropped on the floor because
> the secondary (now the primary) doesn't have awareness of the previous
> primary's dirty blocks once it activates the raid1?
Right. At the time of the failover, there were (probably) blocks that
were out of sync between the primary and secondary. Now, after you've
failed over to the secondary, you've got to overwrite those blocks with
data from the secondary in order to make the primary disk consistent
again. This requires that either you do a full resync from secondary to
primary (if you don't know what differs), or you merge the two bitmaps
and resync just that data.
> Also, what is the interface one should use to collect dirty bits from
> the primary's bitmap?
Whatever you'd like. scp the bitmap file over or collect the ranges into
a file and scp that over, or something similar.
> This bitmap merge can't happen until the primary's dirty bits can be
> collected right? Waiting for the failed server to come back to
Right. So, when the primary fails, you start the array on the secondary
with a _clean_ bitmap, and just its local disk component. Now, whatever
gets written while the primary is down gets put into the bitmap on the
secondary. When the primary comes back up, you take the dirty bits from
it and add them into the secondary's bitmap. Then, you insert the
primary's disk (via nbd or similar) back into the array, and begin a
resync.
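The merge step described above amounts to a set union of dirty chunk numbers. A minimal sketch, assuming both bitmaps use the same chunk size so that chunk numbers are directly comparable; the function name merged_resync_plan is hypothetical:

```python
def merged_resync_plan(primary_dirty, secondary_dirty):
    """Union the dirty chunks recorded on both servers.

    primary_dirty:   chunks that were dirty on the old primary when it failed
    secondary_dirty: chunks written on the secondary while the primary was down
    Returns the sorted chunk list to resync and how many chunks that is.
    """
    merged = set(primary_dirty) | set(secondary_dirty)
    return sorted(merged), len(merged)
```

In the workflow above the union itself happens in the kernel: the secondary already holds its own dirty bits, so only the old primary's chunk numbers need to be written into the sysfs file before the primary's disk is re-added.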
That's the whole reason for this interface. We have to modify the bitmap
while the array is active (modifying the bitmap while the array is down
is trivial, and certainly doesn't require sysfs :).
> harvest the dirty bits it has seems wrong (why failover at all?); so I
> must be missing something.
We fail over immediately. We wait until later to combine the bitmaps and
resync the data.
Hope that helps.
--
Paul
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-27 2:27 ` Paul Clements
@ 2006-07-27 3:36 ` Mike Snitzer
2006-07-27 14:07 ` Paul Clements
2006-07-27 14:28 ` Mike Snitzer
1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-27 3:36 UTC
To: Paul Clements; +Cc: linux-raid, neilb
On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
> Mike Snitzer wrote:
>
> > I tracked down the thread you referenced and these posts (by you)
> > seem to summarize things well:
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
> >
> > But for clarity's sake, could you elaborate on the negative
> > implications of not merging the bitmaps on the secondary server? Will
> > the previous primary's dirty blocks get dropped on the floor because
> > the secondary (now the primary) doesn't have awareness of the previous
> > primary's dirty blocks once it activates the raid1?
>
> Right. At the time of the failover, there were (probably) blocks that
> were out of sync between the primary and secondary. Now, after you've
> failed over to the secondary, you've got to overwrite those blocks with
> data from the secondary in order to make the primary disk consistent
> again. This requires that either you do a full resync from secondary to
> primary (if you don't know what differs), or you merge the two bitmaps
> and resync just that data.
I took more time to read the later posts in the original thread; that
coupled with your detailed response has helped a lot. thanks.
> > Also, what is the interface one should use to collect dirty bits from
> > the primary's bitmap?
>
> Whatever you'd like. scp the bitmap file over or collect the ranges into
> a file and scp that over, or something similar.
OK, so regardless of whether you are using an external or internal
bitmap; how does one collect the ranges from an array's bitmap?
Generally speaking I think others would have the same (naive) question
given that we need to know what to use as input for the sysfs
interface you've kindly provided. If it is left as an exercise to
the user that is fine; I'd imagine neilb will get our backs with a
nifty new mdadm flag if need be.
thanks again,
Mike
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-27 3:36 ` Mike Snitzer
@ 2006-07-27 14:07 ` Paul Clements
0 siblings, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27 14:07 UTC
To: Mike Snitzer; +Cc: linux-raid, neilb
Mike Snitzer wrote:
> On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
>> Mike Snitzer wrote:
>> > Also, what is the interface one should use to collect dirty bits from
>> > the primary's bitmap?
>>
>> Whatever you'd like. scp the bitmap file over or collect the ranges into
>> a file and scp that over, or something similar.
>
> OK, so regardless of whether you are using an external or internal
> bitmap; how does one collect the ranges from an array's bitmap?
Well, with an internal bitmap you don't need this interface. The bitmap
is located on all the component disks. The reason we don't use internal
bitmaps in a configuration where one of the disks is located remotely
(over a LAN or SAN, or possibly a WAN) is that the bitmap updates (which
are synchronous and occur fairly often) would be too costly.
So reading the bits out of a file is fairly simple. The bitmap file is
laid out one bit per chunk, with a 256 byte superblock at the front. You
just need a perl script (for example) that reads the file and keeps
track of which bits are dirty, and then prints those numbers out.
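A minimal sketch of such a reader, in Python rather than perl. It relies only on the layout stated above (a 256-byte superblock followed by one bit per chunk) and assumes bits are numbered least-significant-bit first within each byte; a bitmap stored in a different bit order would need the shift adjusted. The function names and the nchunks parameter are hypothetical.

```python
SUPERBLOCK_BYTES = 256  # size of the header, per the description above


def dirty_chunks_from_bytes(data, nchunks=None):
    """Return the chunk numbers whose bit is set in a bitmap file image."""
    bits = data[SUPERBLOCK_BYTES:]
    limit = len(bits) * 8
    if nchunks is not None:
        limit = min(limit, nchunks)  # ignore padding bits past the last chunk
    dirty = []
    for chunk in range(limit):
        # Assumption: bit (chunk % 8) of byte (chunk // 8), LSB first.
        if (bits[chunk // 8] >> (chunk % 8)) & 1:
            dirty.append(chunk)
    return dirty


def dirty_chunks_from_file(path, nchunks=None):
    """Read a bitmap file from disk and list its dirty chunks."""
    with open(path, "rb") as f:
        return dirty_chunks_from_bytes(f.read(), nchunks)
```

The resulting numbers can then be copied to the other server and written to the sysfs file, either one per token or collapsed into ranges.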
--
Paul
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-27 2:27 ` Paul Clements
2006-07-27 3:36 ` Mike Snitzer
@ 2006-07-27 14:28 ` Mike Snitzer
2006-07-27 14:55 ` Paul Clements
1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-27 14:28 UTC
To: Paul Clements; +Cc: linux-raid, neilb
On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
> Mike Snitzer wrote:
>
> > I tracked down the thread you referenced and these posts (by you)
> > seem to summarize things well:
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
> >
> > But for clarity's sake, could you elaborate on the negative
> > implications of not merging the bitmaps on the secondary server? Will
> > the previous primary's dirty blocks get dropped on the floor because
> > the secondary (now the primary) doesn't have awareness of the previous
> > primary's dirty blocks once it activates the raid1?
>
> Right. At the time of the failover, there were (probably) blocks that
> were out of sync between the primary and secondary.
OK, so now that I understand the need to merge the bitmaps... the
various scenarios that create this (potential) inconsistency are still
unclear to me when you consider the different flavors of raid1. Is
this inconsistency only possible if using async (aka write-behind)
raid1?
If not, how would this difference in committed blocks occur with
normal (sync) raid1 given MD's endio acknowledges writes after they
are submitted to all raid members? Is it merely that the bitmap is
left with dangling bits set that don't reflect reality (blocks weren't
actually changed anywhere) when a crash occurs? Is there real
potential for inconsistent data on disk(s) when using sync raid1 (does
having an nbd member increase the likelihood)?
regards,
Mike
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-27 14:28 ` Mike Snitzer
@ 2006-07-27 14:55 ` Paul Clements
0 siblings, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27 14:55 UTC
To: Mike Snitzer; +Cc: linux-raid, neilb
Mike Snitzer wrote:
> On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
>> Right. At the time of the failover, there were (probably) blocks that
>> were out of sync between the primary and secondary.
>
> OK, so now that I understand the need to merge the bitmaps... the
> various scenarios that create this (potential) inconsistency are still
> unclear to me when you consider the different flavors of raid1. Is
> this inconsistency only possible if using async (aka write-behind)
> raid1?
No. Even with a synchronous (normal) raid1, you will probably have
blocks that are out of sync when one disk (or server) fails. This is
true even of raid1's using internal disks. That's why you resync the
array after a failure (of the system or of one of the disks). That's
exactly what the bitmap is for -- to optimize that resync.
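A toy model may make this concrete. It is a deliberately simplified sketch, not md's actual implementation: the class ToyMirror and its methods are hypothetical, there are only two "disks", and bits are cleared immediately instead of lazily. The point it illustrates is that the dirty bit is recorded before the write goes out, so a failure between the two copies leaves exactly the affected chunk marked for resync.

```python
class ToyMirror:
    """Two in-memory 'disks' plus a write-intent set of dirty chunks."""

    def __init__(self, nchunks):
        self.disks = [[0] * nchunks, [0] * nchunks]
        self.dirty = set()

    def write(self, chunk, value, fail_before_disk=None):
        """Write to both disks; optionally simulate a failure before one copy."""
        self.dirty.add(chunk)  # intent recorded before any data is written
        for i, disk in enumerate(self.disks):
            if i == fail_before_disk:
                return False  # failure: the bit stays set
            disk[chunk] = value
        self.dirty.discard(chunk)  # both copies match again
        return True

    def resync(self, source=0):
        """Copy only the dirty chunks from the source disk to the other one."""
        copied = sorted(self.dirty)
        for chunk in copied:
            self.disks[1 - source][chunk] = self.disks[source][chunk]
        self.dirty.clear()
        return copied
```

After a simulated failure on one write, resync() touches only that chunk instead of walking the whole array, which is the optimization the bitmap provides.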
--
Paul
* Re: [PATCH] md: new bitmap sysfs interface
2006-07-25 6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
2006-07-25 17:41 ` dave rientjes
2006-07-26 21:30 ` Mike Snitzer
@ 2006-08-03 1:42 ` Neil Brown
2006-08-03 1:53 ` Paul Clements
2006-08-03 7:24 ` David Greaves
2 siblings, 2 replies; 12+ messages in thread
From: Neil Brown @ 2006-08-03 1:42 UTC
To: Paul Clements; +Cc: linux-raid
On Tuesday July 25, paul.clements@steeleye.com wrote:
> This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
> that allows the bitmap of an array to be dirtied. The interface is
> write-only, and is used as follows:
>
> echo "1000" > /sys/block/md2/md/bitmap
>
> (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
> bitmaps of array md2)
>
> echo "1000-2000" > /sys/block/md1/md/bitmap
>
> (dirty the bits for chunks 1000-2000 in md1's bitmap)
>
Thanks. Only one remaining issue (which I didn't think of last time :-)
Is 'bitmap' the best name for the sysfs file?
It seems a bit generic to me.
write-bits-here-to-dirty-them-in-the-bitmap
is probably (no, definitely) too verbose.
dirty-in-bitmap
maybe?
bitmap-set-bits
None of these seem completely satisfying.
I like to start with the most important word - which is 'bitmap' in
this case, but that isn't completely essential.
Any better suggestions?
Thanks,
NeilBrown
* Re: [PATCH] md: new bitmap sysfs interface
2006-08-03 1:42 ` Neil Brown
@ 2006-08-03 1:53 ` Paul Clements
2006-08-03 7:24 ` David Greaves
1 sibling, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-08-03 1:53 UTC
To: Neil Brown; +Cc: linux-raid
Neil Brown wrote:
> Is 'bitmap' the best name for the sysfs file?
> It seems a bit generic to me.
>
> write-bits-here-to-dirty-them-in-the-bitmap
>
> is probably (no, definitely) too verbose.
>
> dirty-in-bitmap
> maybe?
> bitmap-set-bits
> Any better suggestions?
I like "bitmap-set-bits" or "bitmap-dirty" maybe. I had thought about
this, but never changed the name. I agree, something more than just
"bitmap" is probably appropriate, so that if we ever add a generic
bitmap output or input (to clear bits, maybe) then we still have
"bitmap" available, and also because (as you said) "bitmap" is a little
too generic for what this interface really does.
Thanks,
Paul
* Re: [PATCH] md: new bitmap sysfs interface
2006-08-03 1:42 ` Neil Brown
2006-08-03 1:53 ` Paul Clements
@ 2006-08-03 7:24 ` David Greaves
2006-08-03 15:39 ` Mr. James W. Laferriere
1 sibling, 1 reply; 12+ messages in thread
From: David Greaves @ 2006-08-03 7:24 UTC
To: Neil Brown; +Cc: Paul Clements, linux-raid
Neil Brown wrote:
> write-bits-here-to-dirty-them-in-the-bitmap
>
> is probably (no, definitely) too verbose.
> Any better suggestions?
It's not actually a bitmap is it?
It takes a number or range and *operates* on a bitmap.
so:
dirty-chunk-in-bitmap
or maybe:
dirty-bitmap-chunk
David
* Re: [PATCH] md: new bitmap sysfs interface
2006-08-03 7:24 ` David Greaves
@ 2006-08-03 15:39 ` Mr. James W. Laferriere
0 siblings, 0 replies; 12+ messages in thread
From: Mr. James W. Laferriere @ 2006-08-03 15:39 UTC
To: David Greaves; +Cc: Neil Brown, Paul Clements, linux-raid
Hello All ,
On Thu, 3 Aug 2006, David Greaves wrote:
> Neil Brown wrote:
>> write-bits-here-to-dirty-them-in-the-bitmap
>>
>> is probably (no, definitely) too verbose.
>> Any better suggestions?
>
> It's not actually a bitmap is it?
> It takes a number or range and *operates* on a bitmap.
>
> so:
> dirty-chunk-in-bitmap
>
> or maybe:
> dirty-bitmap-chunk
>
> David
A thought , maybe bitmap-dirty-this-chunk ?
--
+----------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | 3600 14th Ave SE #20-103 | Give me Linux |
| babydr@baby-dragons.com | Olympia , WA. 98501 | only on AXP |
+----------------------------------------------------------------------+