linux-raid.vger.kernel.org archive mirror
* [PATCH] md: new bitmap sysfs interface
@ 2006-07-25  6:30 Paul Clements
  2006-07-25 17:41 ` dave rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-25  6:30 UTC
  To: linux-raid, neilb

This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface 
that allows the bitmap of an array to be dirtied. The interface is 
write-only, and is used as follows:

echo "1000" > /sys/block/md2/md/bitmap

(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk 
bitmaps of array md2)

echo "1000-2000" > /sys/block/md1/md/bitmap

(dirty the bits for chunks 1000-2000 in md1's bitmap)

This is useful, for example, in cluster environments where you may need 
to combine two disjoint bitmaps into one (following a server failure, 
after a secondary server has taken over the array). By combining the 
bitmaps on the two servers, a full resync can be avoided (this was
discussed on the list on March 18, 2005, in the "[PATCH 1/2] md bitmap bug
fixes" thread).

Thanks,
Paul

[-- Attachment #2: md_bitmap_sysfs-3.diff --]
[-- Type: text/plain, Size: 2756 bytes --]

diff -pur linux-2.6.18-rc1-mm1/drivers/md/bitmap.c linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/bitmap.c
--- linux-2.6.18-rc1-mm1/drivers/md/bitmap.c	2006-07-06 00:09:49.000000000 -0400
+++ linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/bitmap.c	2006-07-24 16:35:18.000000000 -0400
@@ -613,6 +613,7 @@ static inline unsigned long file_page_of
 static inline struct page *filemap_get_page(struct bitmap *bitmap,
 					unsigned long chunk)
 {
+	if (file_page_index(chunk) >= bitmap->file_pages) return NULL;
 	return bitmap->filemap[file_page_index(chunk) - file_page_index(0)];
 }
 
@@ -739,6 +740,7 @@ static void bitmap_file_set_bit(struct b
 	}
 
 	page = filemap_get_page(bitmap, chunk);
+	if (!page) return;
 	bit = file_page_offset(chunk);
 
  	/* set the bit */
@@ -1322,6 +1324,18 @@ static void bitmap_set_memory_bits(struc
 
 }
 
+/* dirty the memory and file bits for bitmap chunks "s" to "e" */
+void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e)
+{
+	unsigned long chunk;
+
+	for (chunk = s; chunk <= e; chunk++) {
+		sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
+		bitmap_set_memory_bits(bitmap, sec, 1);
+		bitmap_file_set_bit(bitmap, sec);
+	}
+}
+
 /*
  * flush out any pending updates
  */
diff -pur linux-2.6.18-rc1-mm1/drivers/md/md.c linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/md.c
--- linux-2.6.18-rc1-mm1/drivers/md/md.c	2006-07-14 16:10:41.000000000 -0400
+++ linux-2.6.18-rc1-mm1-bitmap-sysfs/drivers/md/md.c	2006-07-18 11:52:11.000000000 -0400
@@ -2507,6 +2507,36 @@ static struct md_sysfs_entry md_new_devi
 __ATTR(new_dev, S_IWUSR, null_show, new_dev_store);
 
 static ssize_t
+bitmap_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	char *end;
+	unsigned long chunk, end_chunk;
+
+	if (!mddev->bitmap)
+		goto out;
+	/* buf should be <chunk> <chunk> ... or <chunk>-<chunk> ... (range) */
+	while (*buf) {
+		chunk = end_chunk = simple_strtoul(buf, &end, 0);
+		if (buf == end) break;
+		if (*end == '-') { /* range */
+			buf = end + 1;
+			end_chunk = simple_strtoul(buf, &end, 0);
+			if (buf == end) break;
+		}
+		if (*end && !isspace(*end)) break;
+		bitmap_dirty_bits(mddev->bitmap, chunk, end_chunk);
+		buf = end;
+		while (isspace(*buf)) buf++;
+	}
+	bitmap_unplug(mddev->bitmap); /* flush the bits to disk */
+out:
+	return len;
+}
+
+static struct md_sysfs_entry md_bitmap =
+__ATTR(bitmap, S_IWUSR, null_show, bitmap_store);
+
+static ssize_t
 size_show(mddev_t *mddev, char *page)
 {
 	return sprintf(page, "%llu\n", (unsigned long long)mddev->size);
@@ -2826,6 +2856,7 @@ static struct attribute *md_redundancy_a
 	&md_sync_completed.attr,
 	&md_suspend_lo.attr,
 	&md_suspend_hi.attr,
+	&md_bitmap.attr,
 	NULL,
 };
 static struct attribute_group md_redundancy_group = {
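
The input grammar accepted by bitmap_store() above can be checked outside the kernel. The following is a minimal user-space sketch in Python, not part of the patch: it mirrors the parsing loop (numbers read the way simple_strtoul() with base 0 reads them, an optional "-" range, whitespace between entries) and returns the (start, end) pairs that would be handed to bitmap_dirty_bits(). The names parse_chunk_ranges and _read_number are hypothetical, and the number matching is only an approximation of the kernel helper.

```python
import re

# Approximation of simple_strtoul(buf, &end, 0): 0x.. is hex, a leading 0 is
# octal, anything else is decimal. Leading whitespace is NOT skipped.
_NUM = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def _read_number(buf, pos):
    """Return (value, new_pos), or (None, pos) if no number starts at pos."""
    m = _NUM.match(buf, pos)
    if m is None:
        return None, pos
    text = m.group(0)
    if text[:2].lower() == "0x":
        value = int(text, 16)
    elif len(text) > 1 and text[0] == "0":
        value = int(text, 8)
    else:
        value = int(text, 10)
    return value, m.end()


def parse_chunk_ranges(buf):
    """Model of the bitmap_store() loop: list of (chunk, end_chunk) pairs."""
    ranges = []
    pos = 0
    while pos < len(buf):
        chunk, end = _read_number(buf, pos)
        if chunk is None:
            break                       # if (buf == end) break;
        end_chunk = chunk
        if end < len(buf) and buf[end] == "-":
            end_chunk, end = _read_number(buf, end + 1)
            if end_chunk is None:
                break                   # nothing after the '-'
        if end < len(buf) and not buf[end].isspace():
            break                       # trailing garbage ends parsing
        ranges.append((chunk, end_chunk))
        pos = end
        while pos < len(buf) and buf[pos].isspace():
            pos += 1
    return ranges
```

As in the C loop, parsing stops silently at the first malformed entry, so earlier entries still take effect. A reversed range such as "9-3" is parsed but dirties nothing, because the loop in bitmap_dirty_bits() runs only while chunk <= e.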

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-25  6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
@ 2006-07-25 17:41 ` dave rientjes
  2006-07-26 21:30 ` Mike Snitzer
  2006-08-03  1:42 ` Neil Brown
  2 siblings, 0 replies; 12+ messages in thread
From: dave rientjes @ 2006-07-25 17:41 UTC
  To: Paul Clements; +Cc: linux-raid, neilb

On Tue, 25 Jul 2006, Paul Clements wrote:

> @@ -1322,6 +1324,18 @@ static void bitmap_set_memory_bits(struc
>
>  }
> 
> +/* dirty the memory and file bits for bitmap chunks "s" to "e" */
> +void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e)
> +{
> +       unsigned long chunk;
> +
> +       for (chunk = s; chunk <= e; chunk++) {
> +               sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
> +               bitmap_set_memory_bits(bitmap, sec, 1);
> +               bitmap_file_set_bit(bitmap, sec);
> +       }
> +}
> +

Why not

{
 	for (; s <= e; s++) {
 		sector_t sec = s << CHUNK_BLOCK_SHIFT(bitmap);
 		...
 	}
}

 		David

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-25  6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
  2006-07-25 17:41 ` dave rientjes
@ 2006-07-26 21:30 ` Mike Snitzer
  2006-07-27  2:27   ` Paul Clements
  2006-08-03  1:42 ` Neil Brown
  2 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-26 21:30 UTC
  To: Paul Clements; +Cc: linux-raid, neilb

On 7/25/06, Paul Clements <paul.clements@steeleye.com> wrote:
> This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
> that allows the bitmap of an array to be dirtied. The interface is
> write-only, and is used as follows:
>
> echo "1000" > /sys/block/md2/md/bitmap
>
> (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
> bitmaps of array md2)
>
> echo "1000-2000" > /sys/block/md1/md/bitmap
>
> (dirty the bits for chunks 1000-2000 in md1's bitmap)
>
> This is useful, for example, in cluster environments where you may need
> to combine two disjoint bitmaps into one (following a server failure,
> after a secondary server has taken over the array). By combining the
> bitmaps on the two servers, a full resync can be avoided (This was
> discussed on the list back on March 18, 2005, "[PATCH 1/2] md bitmap bug
> fixes" thread).

Hi Paul,

I tracked down the thread you referenced and these posts (by you)
seem to summarize things well:
http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2

But for clarity's sake, could you elaborate on the negative
implications of not merging the bitmaps on the secondary server?  Will
the previous primary's dirty blocks get dropped on the floor because
the secondary (now the primary) doesn't have awareness of the previous
primary's dirty blocks once it activates the raid1?

Also, what is the interface one should use to collect dirty bits from
the primary's bitmap?

This bitmap merge can't happen until the primary's dirty bits can be
collected, right?  Waiting for the failed server to come back to
harvest the dirty bits it has seems wrong (why fail over at all?); so I
must be missing something.

Please advise, thanks.
Mike

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-26 21:30 ` Mike Snitzer
@ 2006-07-27  2:27   ` Paul Clements
  2006-07-27  3:36     ` Mike Snitzer
  2006-07-27 14:28     ` Mike Snitzer
  0 siblings, 2 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27  2:27 UTC
  To: Mike Snitzer; +Cc: linux-raid, neilb

Mike Snitzer wrote:

> I tracked down the thread you referenced and these posts (by you)
> seem to summarize things well:
> http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
> 
> But for clarity's sake, could you elaborate on the negative
> implications of not merging the bitmaps on the secondary server?  Will
> the previous primary's dirty blocks get dropped on the floor because
> the secondary (now the primary) doesn't have awareness of the previous
> primary's dirty blocks once it activates the raid1?

Right. At the time of the failover, there were (probably) blocks that 
were out of sync between the primary and secondary. Now, after you've 
failed over to the secondary, you've got to overwrite those blocks with 
data from the secondary in order to make the primary disk consistent 
again. This requires that either you do a full resync from secondary to 
primary (if you don't know what differs), or you merge the two bitmaps 
and resync just that data.
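
Merging here amounts to taking the union of the two sets of dirty bits. As a rough illustration only (a Python sketch under the assumption that both bitmaps cover the same array with the same chunk size, and that only the bit data is passed in, without any superblock), the combined bitmap is the bytewise OR of the two. merge_bitmaps is a hypothetical helper name:

```python
def merge_bitmaps(primary_bits: bytes, secondary_bits: bytes) -> bytes:
    """Return the union of two dirty bitmaps (bit data only, no superblock).

    A chunk is dirty in the result if it is dirty in either input, so a
    resync driven by the result covers everything either side may have
    changed.
    """
    if len(primary_bits) != len(secondary_bits):
        raise ValueError("bitmaps must describe the same number of chunks")
    return bytes(a | b for a, b in zip(primary_bits, secondary_bits))
```

A chunk that is dirty on only one side still ends up dirty in the merged bitmap, which is what keeps the partial resync safe.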

> Also, what is the interface one should use to collect dirty bits from
> the primary's bitmap?

Whatever you'd like. scp the bitmap file over or collect the ranges into 
a file and scp that over, or something similar.

> This bitmap merge can't happen until the primary's dirty bits can be
> collected right?  Waiting for the failed server to come back to

Right. So, when the primary fails, you start the array on the secondary 
with a _clean_ bitmap, and just its local disk component. Now, whatever 
gets written while the primary is down gets put into the bitmap on the 
secondary. When the primary comes back up, you take the dirty bits from 
it and add them into the secondary's bitmap. Then, you insert the 
primary's disk (via nbd or similar) back into the array, and begin a 
resync.
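
The "add them into the secondary's bitmap" step could be scripted roughly as follows. This is a sketch, assuming the dirty chunks from the primary are already available as (start, end) pairs and that the sysfs file is the one added by the patch (for example /sys/block/md2/md/bitmap). The helper names and the 4000-byte batch limit are assumptions; the limit is chosen to stay under the one-page buffer that a single sysfs write normally receives.

```python
def batch_ranges(ranges, max_bytes=4000):
    """Group (start, end) pairs into newline-terminated strings.

    Each string holds space-separated "N" or "N-M" tokens and is at most
    max_bytes long (unless a single token alone exceeds the limit).
    """
    batches = []
    current = ""
    for start, end in ranges:
        token = str(start) if start == end else f"{start}-{end}"
        candidate = token if not current else current + " " + token
        if current and len(candidate) + 1 > max_bytes:
            batches.append(current + "\n")
            current = token
        else:
            current = candidate
    if current:
        batches.append(current + "\n")
    return batches


def dirty_remote_chunks(sysfs_path, ranges):
    """Write each batch to the sysfs file with its own open/write/close."""
    for batch in batch_ranges(ranges):
        with open(sysfs_path, "w") as f:
            f.write(batch)
```

For example, dirty_remote_chunks("/sys/block/md2/md/bitmap", [(1000, 1000), (1500, 2000)]) would write the single line "1000 1500-2000" to the array's bitmap file.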

That's the whole reason for this interface. We have to modify the bitmap 
while the array is active (modifying the bitmap while the array is down 
is trivial, and certainly doesn't require sysfs :).

> harvest the dirty bits it has seems wrong (why failover at all?); so I
> must be missing something.

We fail over immediately. We wait until later to combine the bitmaps and 
resync the data.

Hope that helps.

--
Paul

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-27  2:27   ` Paul Clements
@ 2006-07-27  3:36     ` Mike Snitzer
  2006-07-27 14:07       ` Paul Clements
  2006-07-27 14:28     ` Mike Snitzer
  1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-27  3:36 UTC
  To: Paul Clements; +Cc: linux-raid, neilb

On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
> Mike Snitzer wrote:
>
> > I tracked down the thread you referenced and these posts (by you)
> > seem to summarize things well:
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
> >
> > But for clarity's sake, could you elaborate on the negative
> > implications of not merging the bitmaps on the secondary server?  Will
> > the previous primary's dirty blocks get dropped on the floor because
> > the secondary (now the primary) doesn't have awareness of the previous
> > primary's dirty blocks once it activates the raid1?
>
> Right. At the time of the failover, there were (probably) blocks that
> were out of sync between the primary and secondary. Now, after you've
> failed over to the secondary, you've got to overwrite those blocks with
> data from the secondary in order to make the primary disk consistent
> again. This requires that either you do a full resync from secondary to
> primary (if you don't know what differs), or you merge the two bitmaps
> and resync just that data.

I took more time to read the later posts in the original thread; that
coupled with your detailed response has helped a lot. thanks.

> > Also, what is the interface one should use to collect dirty bits from
> > the primary's bitmap?
>
> Whatever you'd like. scp the bitmap file over or collect the ranges into
> a file and scp that over, or something similar.

OK, so regardless of whether you are using an external or internal
bitmap; how does one collect the ranges from an array's bitmap?

Generally speaking I think others would have the same (naive) question
given that we need to know what to use as input for the sysfs
interface you've kindly provided.   If it is left as an exercise to
the user that is fine; I'd imagine neilb will get our backs with a
nifty new mdadm flag if need be.

thanks again,
Mike

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-27  3:36     ` Mike Snitzer
@ 2006-07-27 14:07       ` Paul Clements
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27 14:07 UTC
  To: Mike Snitzer; +Cc: linux-raid, neilb

Mike Snitzer wrote:
> On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
>> Mike Snitzer wrote:

>> > Also, what is the interface one should use to collect dirty bits from
>> > the primary's bitmap?
>>
>> Whatever you'd like. scp the bitmap file over or collect the ranges into
>> a file and scp that over, or something similar.
> 
> OK, so regardless of whether you are using an external or internal
> bitmap; how does one collect the ranges from an array's bitmap?

Well, with an internal bitmap you don't need this interface. The bitmap 
is located on all the component disks. The reason we don't use internal 
bitmaps in a configuration where one of the disks is located remotely 
(over a LAN or SAN, or possibly a WAN) is that the bitmap updates (which 
are synchronous and occur fairly often) would be too costly.

So reading the bits out of a file is fairly simple. The bitmap file is 
laid out one bit per chunk, with a 256 byte superblock at the front. You 
just need a perl script (for example) that reads the file and keeps 
track of which bits are dirty, and then prints those numbers out.
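
A Python equivalent of such a script might look like the sketch below. It assumes exactly the layout described above (a 256-byte superblock followed by one bit per chunk) and, as a further assumption that should be checked against the kernel in use, that bits are numbered from the least significant bit of each byte. The function names and the optional nchunks limit are hypothetical.

```python
SUPERBLOCK_BYTES = 256  # size of the superblock at the front of the file


def dirty_chunks(data, nchunks=None):
    """Yield the index of every set bit after the superblock.

    data is the raw content of the bitmap file. Bits are taken LSB-first
    within each byte (an assumption). nchunks, if given, limits the scan
    so that padding bits at the end of the file are ignored.
    """
    bits = data[SUPERBLOCK_BYTES:]
    limit = len(bits) * 8
    if nchunks is not None:
        limit = min(limit, nchunks)
    for chunk in range(limit):
        if (bits[chunk // 8] >> (chunk % 8)) & 1:
            yield chunk


def to_ranges(chunks):
    """Collapse ascending chunk numbers into (start, end) pairs."""
    ranges = []
    for c in chunks:
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1][1] = c
        else:
            ranges.append([c, c])
    return [(s, e) for s, e in ranges]


def format_ranges(ranges):
    """Render pairs as "N" or "N-M" tokens separated by spaces."""
    return " ".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)
```

Reading the file with open(path, "rb").read() and printing format_ranges(to_ranges(dirty_chunks(data))) yields a line of chunk numbers and ranges that can be copied to the other server.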

--
Paul

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-27  2:27   ` Paul Clements
  2006-07-27  3:36     ` Mike Snitzer
@ 2006-07-27 14:28     ` Mike Snitzer
  2006-07-27 14:55       ` Paul Clements
  1 sibling, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2006-07-27 14:28 UTC
  To: Paul Clements; +Cc: linux-raid, neilb

On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:
> Mike Snitzer wrote:
>
> > I tracked down the thread you referenced and these posts (by you)
> > seem to summarize things well:
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111116563016418&w=2
> > http://marc.theaimsgroup.com/?l=linux-raid&m=111117515400864&w=2
> >
> > But for clarity's sake, could you elaborate on the negative
> > implications of not merging the bitmaps on the secondary server?  Will
> > the previous primary's dirty blocks get dropped on the floor because
> > the secondary (now the primary) doesn't have awareness of the previous
> > primary's dirty blocks once it activates the raid1?
>
> Right. At the time of the failover, there were (probably) blocks that
> were out of sync between the primary and secondary.

OK, so now that I understand the need to merge the bitmaps... the
various scenarios that create this (potential) inconsistency are still
unclear to me when you consider the different flavors of raid1.  Is
this inconsistency only possible if using async (aka write-behind)
raid1?

If not, how would this difference in committed blocks occur with
normal (sync) raid1 given MD's endio acknowledges writes after they
are submitted to all raid members?  Is it merely that the bitmap is
left with dangling bits set that don't reflect reality (blocks weren't
actually changed anywhere) when a crash occurs?  Is there real
potential for inconsistent data on disk(s) when using sync raid1 (does
having an nbd member increase the likelihood)?

regards,
Mike

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-27 14:28     ` Mike Snitzer
@ 2006-07-27 14:55       ` Paul Clements
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-07-27 14:55 UTC
  To: Mike Snitzer; +Cc: linux-raid, neilb

Mike Snitzer wrote:
> On 7/26/06, Paul Clements <paul.clements@steeleye.com> wrote:

>> Right. At the time of the failover, there were (probably) blocks that
>> were out of sync between the primary and secondary.
> 
> OK, so now that I understand the need to merge the bitmaps... the
> various scenarios that create this (potential) inconsistency are still
> unclear to me when you consider the different flavors of raid1.  Is
> this inconsistency only possible if using async (aka write-behind)
> raid1?

No. Even with a synchronous (normal) raid1, you will probably have 
blocks that are out of sync when one disk (or server) fails. This is 
true even of raid1's using internal disks. That's why you resync the 
array after a failure (of the system or of one of the disks). That's 
exactly what the bitmap is for -- to optimize that resync.

--
Paul

* Re: [PATCH] md: new bitmap sysfs interface
  2006-07-25  6:30 [PATCH] md: new bitmap sysfs interface Paul Clements
  2006-07-25 17:41 ` dave rientjes
  2006-07-26 21:30 ` Mike Snitzer
@ 2006-08-03  1:42 ` Neil Brown
  2006-08-03  1:53   ` Paul Clements
  2006-08-03  7:24   ` David Greaves
  2 siblings, 2 replies; 12+ messages in thread
From: Neil Brown @ 2006-08-03  1:42 UTC
  To: Paul Clements; +Cc: linux-raid

On Tuesday July 25, paul.clements@steeleye.com wrote:
> This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface 
> that allows the bitmap of an array to be dirtied. The interface is 
> write-only, and is used as follows:
> 
> echo "1000" > /sys/block/md2/md/bitmap
> 
> (dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk 
> bitmaps of array md2)
> 
> echo "1000-2000" > /sys/block/md1/md/bitmap
> 
> (dirty the bits for chunks 1000-2000 in md1's bitmap)
> 

Thanks.  Only one remaining issue (which I didn't think of last time :-)

Is 'bitmap' the best name for the sysfs file?
It seems a bit generic to me.

  write-bits-here-to-dirty-them-in-the-bitmap

is probably (no, definitely) too verbose.

 dirty-in-bitmap
maybe?
 bitmap-set-bits

None of these seem completely satisfying.
I like to start with the most important word - which is 'bitmap' in
this case, but that isn't completely essential.

Any better suggestions?

Thanks,
NeilBrown

* Re: [PATCH] md: new bitmap sysfs interface
  2006-08-03  1:42 ` Neil Brown
@ 2006-08-03  1:53   ` Paul Clements
  2006-08-03  7:24   ` David Greaves
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Clements @ 2006-08-03  1:53 UTC
  To: Neil Brown; +Cc: linux-raid

Neil Brown wrote:

> Is 'bitmap' the best name for the sysfs file?
> It seems a bit generic to me.
> 
>   write-bits-here-to-dirty-them-in-the-bitmap
> 
> is probably (no, definitely) too verbose.
> 
>  dirty-in-bitmap
> maybe?
>  bitmap-set-bits

> Any better suggestions?

I like "bitmap-set-bits" or "bitmap-dirty" maybe. I had thought about 
this, but never changed the name. I agree, something more than just 
"bitmap" is probably appropriate, so that if we ever add a generic 
bitmap output or input (to clear bits, maybe) then we still have 
"bitmap" available, and also because (as you said) "bitmap" is a little 
too generic for what this interface really does.

Thanks,
Paul

* Re: [PATCH] md: new bitmap sysfs interface
  2006-08-03  1:42 ` Neil Brown
  2006-08-03  1:53   ` Paul Clements
@ 2006-08-03  7:24   ` David Greaves
  2006-08-03 15:39     ` Mr. James W. Laferriere
  1 sibling, 1 reply; 12+ messages in thread
From: David Greaves @ 2006-08-03  7:24 UTC
  To: Neil Brown; +Cc: Paul Clements, linux-raid

Neil Brown wrote:
>   write-bits-here-to-dirty-them-in-the-bitmap
> 
> is probably (no, definitely) too verbose.
> Any better suggestions?

It's not actually a bitmap is it?
It takes a number or range and *operates* on a bitmap.

so:
 dirty-chunk-in-bitmap

or maybe:
 dirty-bitmap-chunk

David



* Re: [PATCH] md: new bitmap sysfs interface
  2006-08-03  7:24   ` David Greaves
@ 2006-08-03 15:39     ` Mr. James W. Laferriere
  0 siblings, 0 replies; 12+ messages in thread
From: Mr. James W. Laferriere @ 2006-08-03 15:39 UTC
  To: David Greaves; +Cc: Neil Brown, Paul Clements, linux-raid

Hello All,

On Thu, 3 Aug 2006, David Greaves wrote:
> Neil Brown wrote:
>>   write-bits-here-to-dirty-them-in-the-bitmap
>>
>> is probably (no, definitely) too verbose.
>> Any better suggestions?
>
> It's not actually a bitmap is it?
> It takes a number or range and *operates* on a bitmap.
>
> so:
> dirty-chunk-in-bitmap
>
> or maybe:
> dirty-bitmap-chunk
>
> David

A thought: maybe bitmap-dirty-this-chunk?

-- 
+----------------------------------------------------------------------+
| James   W.   Laferriere |   System    Techniques   | Give me VMS     |
| Network        Engineer | 3600 14th Ave SE #20-103 |  Give me Linux  |
| babydr@baby-dragons.com |  Olympia ,  WA.   98501  |   only  on  AXP |
+----------------------------------------------------------------------+
