* [PATCH 000 of 4] md: Introduction
@ 2006-08-24 7:40 NeilBrown
2006-08-24 7:41 ` [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking NeilBrown
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: NeilBrown @ 2006-08-24 7:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Following are 4 patches against 2.6.18-rc4-mm2
The first 2 are bug fixes which should go in 2.6.18, and apply
equally well to that tree as to -mm.
The latter two should stay in -mm until after 2.6.18.
The second patch is maybe bigger than it absolutely needs to be as a bugfix.
If you like I can strip out all the rcu-extra-carefulness as a separate
patch and just leave the important bit which involves moving the
atomic_add down twenty-something lines.
Thanks,
NeilBrown
[PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking
[PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1.
[PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap
[PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx().
* [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking
2006-08-24 7:40 [PATCH 000 of 4] md: Introduction NeilBrown
@ 2006-08-24 7:41 ` NeilBrown
2006-08-24 7:41 ` [PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1 NeilBrown
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2006-08-24 7:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
A recent patch broke the ability to do a
user-requested check of a raid1 array.
This patch fixes the breakage and also moves a comment that
was dislocated by the same patch.
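For readers following along, the early-exit condition touched by this patch can be sketched in userspace as below. This is only an illustration under assumptions: struct sync_state, SKETCH_MAX_SECTOR and can_skip_whole_array() are made-up names standing in for the mddev/conf fields tested in sync_request(), not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MaxSector; not the kernel definition. */
#define SKETCH_MAX_SECTOR UINT64_MAX

/* Hypothetical mirror of the fields that sync_request() tests. */
struct sync_state {
	bool has_bitmap;       /* stands in for mddev->bitmap != NULL */
	uint64_t recovery_cp;  /* SKETCH_MAX_SECTOR means the array is clean */
	bool check_requested;  /* stands in for MD_RECOVERY_REQUESTED */
	bool fullsync;         /* stands in for conf->fullsync != 0 */
};

/* True when the rest of the array may be skipped without issuing any I/O. */
static bool can_skip_whole_array(const struct sync_state *s)
{
	return !s->has_bitmap &&
	       s->recovery_cp == SKETCH_MAX_SECTOR &&
	       !s->check_requested &&  /* the test this patch adds */
	       !s->fullsync;
}
```

Without the check_requested test, a clean array with no bitmap would always be skipped, so a check asked for by the user would finish without reading anything.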
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c 2006-08-24 17:09:42.000000000 +1000
+++ ./drivers/md/raid1.c 2006-08-24 17:21:35.000000000 +1000
@@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
return 0;
}
- /* before building a request, check if we can skip these blocks..
- * This call the bitmap_start_sync doesn't actually record anything
- */
if (mddev->bitmap == NULL &&
mddev->recovery_cp == MaxSector &&
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
conf->fullsync == 0) {
*skipped = 1;
return max_sector - sector_nr;
}
+ /* before building a request, check if we can skip these blocks..
+ * This call the bitmap_start_sync doesn't actually record anything
+ */
if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
!conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* We can skip this block, and probably several more */
* [PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1.
2006-08-24 7:40 [PATCH 000 of 4] md: Introduction NeilBrown
2006-08-24 7:41 ` [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking NeilBrown
@ 2006-08-24 7:41 ` NeilBrown
2006-08-24 7:41 ` [PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap NeilBrown
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2006-08-24 7:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
We need to be careful when referencing mirrors[i].rdev,
as it can disappear under us at various times.
So:
fix a couple of problem places.
comment a couple of non-problem places
move an 'atomic_add' which dereferences rdev down a little
way to somewhere where it is sure to not be NULL.
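The atomic_add move amounts to "only touch the counter once the pointer is known to be non-NULL". A minimal userspace sketch of that ordering follows, using C11 atomics in place of the kernel's atomic_t; struct sketch_rdev and account_corrected() are hypothetical names for illustration, not md code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of mdk_rdev_t used here. */
struct sketch_rdev {
	atomic_int corrected_errors;
	bool in_sync;
};

/*
 * Count 'sectors' as corrected only after the device pointer has been
 * checked and the rewrite succeeded. Returns true if the count was bumped.
 */
static bool account_corrected(struct sketch_rdev *rdev, int sectors,
			      bool rewrite_ok)
{
	if (rdev == NULL || !rdev->in_sync)
		return false;  /* slot is empty or not usable: nothing to count */
	if (!rewrite_ok)
		return false;  /* the kernel code calls md_error() in this case */
	atomic_fetch_add(&rdev->corrected_errors, sectors);
	return true;
}
```

The version before this patch did the equivalent of the atomic_fetch_add() ahead of the NULL test, which is the dereference the patch moves.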
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 55 +++++++++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 21 deletions(-)
diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c 2006-08-24 17:21:35.000000000 +1000
+++ ./drivers/md/raid1.c 2006-08-24 17:23:58.000000000 +1000
@@ -930,10 +930,13 @@ static void status(struct seq_file *seq,
seq_printf(seq, " [%d/%d] [", conf->raid_disks,
conf->raid_disks - mddev->degraded);
- for (i = 0; i < conf->raid_disks; i++)
+ rcu_read_lock();
+ for (i = 0; i < conf->raid_disks; i++) {
+ mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev);
seq_printf(seq, "%s",
- conf->mirrors[i].rdev &&
- test_bit(In_sync, &conf->mirrors[i].rdev->flags) ? "U" : "_");
+ rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
+ }
+ rcu_read_unlock();
seq_printf(seq, "]");
}
@@ -976,7 +979,6 @@ static void error(mddev_t *mddev, mdk_rd
static void print_conf(conf_t *conf)
{
int i;
- mirror_info_t *tmp;
printk("RAID1 conf printout:\n");
if (!conf) {
@@ -986,14 +988,17 @@ static void print_conf(conf_t *conf)
printk(" --- wd:%d rd:%d\n", conf->raid_disks - conf->mddev->degraded,
conf->raid_disks);
+ rcu_read_lock();
for (i = 0; i < conf->raid_disks; i++) {
char b[BDEVNAME_SIZE];
- tmp = conf->mirrors + i;
- if (tmp->rdev)
+ mdk_rdev_t *rdev = rcu_dereference(conf->mirrors[i].rdev);
+ if (rdev)
printk(" disk %d, wo:%d, o:%d, dev:%s\n",
- i, !test_bit(In_sync, &tmp->rdev->flags), !test_bit(Faulty, &tmp->rdev->flags),
- bdevname(tmp->rdev->bdev,b));
+ i, !test_bit(In_sync, &rdev->flags),
+ !test_bit(Faulty, &rdev->flags),
+ bdevname(rdev->bdev,b));
}
+ rcu_read_unlock();
}
static void close_sync(conf_t *conf)
@@ -1009,17 +1014,17 @@ static int raid1_spare_active(mddev_t *m
{
int i;
conf_t *conf = mddev->private;
- mirror_info_t *tmp;
/*
* Find all failed disks within the RAID1 configuration
- * and mark them readable
+ * and mark them readable.
+ * Called under mddev lock, so rcu protection not needed.
*/
for (i = 0; i < conf->raid_disks; i++) {
- tmp = conf->mirrors + i;
- if (tmp->rdev
- && !test_bit(Faulty, &tmp->rdev->flags)
- && !test_and_set_bit(In_sync, &tmp->rdev->flags)) {
+ mdk_rdev_t *rdev = conf->mirrors[i].rdev;
+ if (rdev
+ && !test_bit(Faulty, &rdev->flags)
+ && !test_and_set_bit(In_sync, &rdev->flags)) {
unsigned long flags;
spin_lock_irqsave(&conf->device_lock, flags);
mddev->degraded--;
@@ -1239,7 +1244,7 @@ static void sync_request_write(mddev_t *
/* ouch - failed to read all of that.
* Try some synchronous reads of other devices to get
* good data, much like with normal read errors. Only
- * read into the pages we already have so they we don't
+ * read into the pages we already have so we don't
* need to re-issue the read request.
* We don't need to freeze the array, because being in an
* active sync request, there is no normal IO, and
@@ -1259,6 +1264,10 @@ static void sync_request_write(mddev_t *
s = PAGE_SIZE >> 9;
do {
if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
+ /* No rcu protection needed here devices
+ * can only be removed when no resync is
+ * active, and resync is currently active
+ */
rdev = conf->mirrors[d].rdev;
if (sync_page_io(rdev->bdev,
sect + rdev->data_offset,
@@ -1376,6 +1385,11 @@ static void fix_read_error(conf_t *conf,
s = PAGE_SIZE >> 9;
do {
+ /* Note: no rcu protection needed here
+ * as this is synchronous in the raid1d thread
+ * which is the thread that might remove
+ * a device. If raid1d ever becomes multi-threaded....
+ */
rdev = conf->mirrors[d].rdev;
if (rdev &&
test_bit(In_sync, &rdev->flags) &&
@@ -1403,7 +1417,6 @@ static void fix_read_error(conf_t *conf,
d = conf->raid_disks;
d--;
rdev = conf->mirrors[d].rdev;
- atomic_add(s, &rdev->corrected_errors);
if (rdev &&
test_bit(In_sync, &rdev->flags)) {
if (sync_page_io(rdev->bdev,
@@ -1429,7 +1442,8 @@ static void fix_read_error(conf_t *conf,
== 0)
/* Well, this device is dead */
md_error(mddev, rdev);
- else
+ else {
+ atomic_add(s, &rdev->corrected_errors);
printk(KERN_INFO
"raid1:%s: read error corrected "
"(%d sectors at %llu on %s)\n",
@@ -1437,6 +1451,7 @@ static void fix_read_error(conf_t *conf,
(unsigned long long)sect +
rdev->data_offset,
bdevname(rdev->bdev, b));
+ }
}
}
sectors -= s;
@@ -1806,19 +1821,17 @@ static sector_t sync_request(mddev_t *md
for (i=0; i<conf->raid_disks; i++) {
bio = r1_bio->bios[i];
if (bio->bi_end_io == end_sync_read) {
- md_sync_acct(conf->mirrors[i].rdev->bdev, nr_sectors);
+ md_sync_acct(bio->bi_bdev, nr_sectors);
generic_make_request(bio);
}
}
} else {
atomic_set(&r1_bio->remaining, 1);
bio = r1_bio->bios[r1_bio->read_disk];
- md_sync_acct(conf->mirrors[r1_bio->read_disk].rdev->bdev,
- nr_sectors);
+ md_sync_acct(bio->bi_bdev, nr_sectors);
generic_make_request(bio);
}
-
return nr_sectors;
}
* [PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap
2006-08-24 7:40 [PATCH 000 of 4] md: Introduction NeilBrown
2006-08-24 7:41 ` [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking NeilBrown
2006-08-24 7:41 ` [PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1 NeilBrown
@ 2006-08-24 7:41 ` NeilBrown
2006-08-24 7:41 ` [PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx() NeilBrown
2006-08-24 22:09 ` [PATCH 000 of 4] md: Introduction Andrew Morton
4 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2006-08-24 7:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
From: Paul Clements <paul.clements@steeleye.com>
This patch (tested against 2.6.18-rc1-mm1) adds a new sysfs interface
that allows the bitmap of an array to be dirtied. The interface is
write-only, and is used as follows:
echo "1000" > /sys/block/md2/md/bitmap_set_bits
(dirty the bit for chunk 1000 [offset 0] in the in-memory and on-disk
bitmaps of array md2)
echo "1000-2000" > /sys/block/md1/md/bitmap_set_bits
(dirty the bits for chunks 1000-2000 in md1's bitmap)
This is useful, for example, in cluster environments where you may need
to combine two disjoint bitmaps into one (following a server failure,
after a secondary server has taken over the array). By combining the
bitmaps on the two servers, a full resync can be avoided (This was
discussed on the list back on March 18, 2005, "[PATCH 1/2] md bitmap bug
fixes" thread).
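The input format accepted by the store function (single chunk numbers or start-end ranges, separated by whitespace) can be tried out in userspace with a sketch like the one below. It mirrors the parsing loop in the patch but uses strtoul() in place of simple_strtoul() and records the ranges instead of dirtying a bitmap; struct chunk_range and parse_chunk_ranges() are hypothetical names, not part of the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* One parsed entry; a single chunk is stored as start == end. */
struct chunk_range {
	unsigned long start;
	unsigned long end;  /* inclusive */
};

/*
 * Parse "<chunk> <chunk>-<chunk> ..." into at most 'max' ranges.
 * Parsing stops at the first malformed token, as the kernel loop does.
 * Returns the number of ranges stored in 'out'.
 */
static size_t parse_chunk_ranges(const char *buf, struct chunk_range *out,
				 size_t max)
{
	size_t n = 0;

	while (*buf && n < max) {
		char *end;
		unsigned long first, last;

		first = last = strtoul(buf, &end, 0);
		if (end == buf)
			break;                  /* no digits found */
		if (*end == '-') {              /* range: parse the end chunk */
			buf = end + 1;
			last = strtoul(buf, &end, 0);
			if (end == buf)
				break;
		}
		if (*end && !isspace((unsigned char)*end))
			break;                  /* trailing junk after the number */
		out[n].start = first;
		out[n].end = last;
		n++;
		buf = end;
		while (isspace((unsigned char)*buf))
			buf++;
	}
	return n;
}
```

As in the patch, a malformed token ends parsing quietly and anything parsed before it is still acted on.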
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 9 +++++++++
./drivers/md/bitmap.c | 14 ++++++++++++++
./drivers/md/md.c | 31 +++++++++++++++++++++++++++++++
./include/linux/raid/bitmap.h | 2 ++
4 files changed, 56 insertions(+)
diff .prev/Documentation/md.txt ./Documentation/md.txt
--- .prev/Documentation/md.txt 2006-08-24 17:23:45.000000000 +1000
+++ ./Documentation/md.txt 2006-08-24 17:24:05.000000000 +1000
@@ -410,6 +410,15 @@ also have
than sectors, this my be larger than the number of actual errors
by a factor of the number of sectors in a page.
+ bitmap_set_bits
+ If the array has a write-intent bitmap, then writing to this
+ attribute can set bits in the bitmap, indicating that a resync
+ would need to check the corresponding blocks. Either individual
+ numbers or start-end pairs can be written. Multiple numbers
+ can be separated by a space.
+ Note that the numbers are 'bit' numbers, not 'block' numbers.
+ They should be scaled by the bitmap_chunksize.
+
Each active md device may also have attributes specific to the
personality module that manages it.
These are specific to the implementation of the module and could
diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c
--- .prev/drivers/md/bitmap.c 2006-08-24 17:23:45.000000000 +1000
+++ ./drivers/md/bitmap.c 2006-08-24 17:24:05.000000000 +1000
@@ -613,6 +613,7 @@ static inline unsigned long file_page_of
static inline struct page *filemap_get_page(struct bitmap *bitmap,
unsigned long chunk)
{
+ if (file_page_index(chunk) >= bitmap->file_pages) return NULL;
return bitmap->filemap[file_page_index(chunk) - file_page_index(0)];
}
@@ -739,6 +740,7 @@ static void bitmap_file_set_bit(struct b
}
page = filemap_get_page(bitmap, chunk);
+ if (!page) return;
bit = file_page_offset(chunk);
/* set the bit */
@@ -1322,6 +1324,18 @@ static void bitmap_set_memory_bits(struc
}
+/* dirty the memory and file bits for bitmap chunks "s" to "e" */
+void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e)
+{
+ unsigned long chunk;
+
+ for (chunk = s; chunk <= e; chunk++) {
+ sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
+ bitmap_set_memory_bits(bitmap, sec, 1);
+ bitmap_file_set_bit(bitmap, sec);
+ }
+}
+
/*
* flush out any pending updates
*/
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2006-08-24 17:23:45.000000000 +1000
+++ ./drivers/md/md.c 2006-08-24 17:24:05.000000000 +1000
@@ -2524,6 +2524,36 @@ static struct md_sysfs_entry md_new_devi
__ATTR(new_dev, S_IWUSR, null_show, new_dev_store);
static ssize_t
+bitmap_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ char *end;
+ unsigned long chunk, end_chunk;
+
+ if (!mddev->bitmap)
+ goto out;
+ /* buf should be <chunk> <chunk> ... or <chunk>-<chunk> ... (range) */
+ while (*buf) {
+ chunk = end_chunk = simple_strtoul(buf, &end, 0);
+ if (buf == end) break;
+ if (*end == '-') { /* range */
+ buf = end + 1;
+ end_chunk = simple_strtoul(buf, &end, 0);
+ if (buf == end) break;
+ }
+ if (*end && !isspace(*end)) break;
+ bitmap_dirty_bits(mddev->bitmap, chunk, end_chunk);
+ buf = end;
+ while (isspace(*buf)) buf++;
+ }
+ bitmap_unplug(mddev->bitmap); /* flush the bits to disk */
+out:
+ return len;
+}
+
+static struct md_sysfs_entry md_bitmap =
+__ATTR(bitmap_set_bits, S_IWUSR, null_show, bitmap_store);
+
+static ssize_t
size_show(mddev_t *mddev, char *page)
{
return sprintf(page, "%llu\n", (unsigned long long)mddev->size);
@@ -2843,6 +2873,7 @@ static struct attribute *md_redundancy_a
&md_sync_completed.attr,
&md_suspend_lo.attr,
&md_suspend_hi.attr,
+ &md_bitmap.attr,
NULL,
};
static struct attribute_group md_redundancy_group = {
diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h
--- .prev/include/linux/raid/bitmap.h 2006-08-24 17:23:45.000000000 +1000
+++ ./include/linux/raid/bitmap.h 2006-08-24 17:24:05.000000000 +1000
@@ -265,6 +265,8 @@ int bitmap_update_sb(struct bitmap *bitm
int bitmap_setallbits(struct bitmap *bitmap);
void bitmap_write_all(struct bitmap *bitmap);
+void bitmap_dirty_bits(struct bitmap *bitmap, unsigned long s, unsigned long e);
+
/* these are exported */
int bitmap_startwrite(struct bitmap *bitmap, sector_t offset,
unsigned long sectors, int behind);
* [PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx().
2006-08-24 7:40 [PATCH 000 of 4] md: Introduction NeilBrown
` (2 preceding siblings ...)
2006-08-24 7:41 ` [PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap NeilBrown
@ 2006-08-24 7:41 ` NeilBrown
2006-08-24 22:09 ` [PATCH 000 of 4] md: Introduction Andrew Morton
4 siblings, 0 replies; 7+ messages in thread
From: NeilBrown @ 2006-08-24 7:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
From: Coywolf Qi Hunt <qiyong@freeforge.net>
Signed-off-by: Coywolf Qi Hunt <qiyong@freeforge.net>
Signed-off-by: Neil Brown <neilb@suse.de>
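For context on why the temporary can go: sector_div() divides its first argument in place and returns the remainder, and 'stripe' is already a by-value parameter, so it can be passed directly. A userspace sketch of that behaviour is below; sector_div_sketch() and split_stripe() are hypothetical stand-ins written for illustration, not the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in: divide *n by base in place and return the remainder. */
static uint32_t sector_div_sketch(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);
	rem = (uint32_t)(*n % base);
	*n /= base;
	return rem;
}

/*
 * Split a sector number into a chunk number and an offset within the chunk.
 * 'stripe' is a local copy, so dividing it in place needs no extra variable.
 */
static void split_stripe(uint64_t stripe, uint32_t sectors_per_chunk,
			 uint64_t *chunk_number, uint32_t *chunk_offset)
{
	*chunk_offset = sector_div_sketch(&stripe, sectors_per_chunk);
	*chunk_number = stripe;
}
```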
### Diffstat output
./drivers/md/raid5.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2006-08-24 17:09:42.000000000 +1000
+++ ./drivers/md/raid5.c 2006-08-24 17:24:17.000000000 +1000
@@ -1350,10 +1350,9 @@ static int page_is_zero(struct page *p)
static int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks)
{
int sectors_per_chunk = conf->chunk_size >> 9;
- sector_t x = stripe;
int pd_idx, dd_idx;
- int chunk_offset = sector_div(x, sectors_per_chunk);
- stripe = x;
+ int chunk_offset = sector_div(stripe, sectors_per_chunk);
+
raid5_compute_sector(stripe*(disks-1)*sectors_per_chunk
+ chunk_offset, disks, disks-1, &dd_idx, &pd_idx, conf);
return pd_idx;
* Re: [PATCH 000 of 4] md: Introduction
2006-08-24 7:40 [PATCH 000 of 4] md: Introduction NeilBrown
` (3 preceding siblings ...)
2006-08-24 7:41 ` [PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx() NeilBrown
@ 2006-08-24 22:09 ` Andrew Morton
2006-08-25 8:06 ` Neil Brown
4 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2006-08-24 22:09 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid, linux-kernel
On Thu, 24 Aug 2006 17:40:56 +1000
NeilBrown <neilb@suse.de> wrote:
>
> Following are 4 patches against 2.6.18-rc4-mm2
>
> The first 2 are bug fixes which should go in 2.6.18, and apply
> equally well to that tree as to -mm.
>
> The latter two should stay in -mm until after 2.6.18.
>
> The second patch is maybe bigger than it absolutely needs to be as a bugfix.
> If you like I can stripe out all the rcu-extra-carefulness as a separate
> patch and just leave the important bit which involves moving the
> atomic_add down twenty-something lines.
>
> Thanks,
> NeilBrown
>
> [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking
> [PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1.
> [PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap
> [PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx().
The second patch is against -mm and doesn't come within a mile of applying
to mainline.
* Re: [PATCH 000 of 4] md: Introduction
2006-08-24 22:09 ` [PATCH 000 of 4] md: Introduction Andrew Morton
@ 2006-08-25 8:06 ` Neil Brown
0 siblings, 0 replies; 7+ messages in thread
From: Neil Brown @ 2006-08-25 8:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
On Thursday August 24, akpm@osdl.org wrote:
> On Thu, 24 Aug 2006 17:40:56 +1000
> NeilBrown <neilb@suse.de> wrote:
> >
> > [PATCH 001 of 4] md: Fix recent breakage of md/raid1 array checking
> > [PATCH 002 of 4] md: Fix issues with referencing rdev in md/raid1.
> > [PATCH 003 of 4] md: new sysfs interface for setting bits in the write-intent-bitmap
> > [PATCH 004 of 4] md: Remove unnecessary variable x in stripe_to_pdidx().
>
> The second patch is against -mm and doesn't come within a mile of applying
> to mainline.
Bother ...
I'll get you a really-truly good patch after the weekend :-(
NeilBrown