* [PATCH md 000 of 14] Introduction
@ 2005-12-01 3:22 NeilBrown
2005-12-01 3:22 ` [PATCH md 001 of 14] Support check-without-repair of raid10 arrays NeilBrown
` (13 more replies)
0 siblings, 14 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Following are 14 more patches for md in 2.6.15-rc2-mm1.
None are appropriate for 2.6.15 - all should wait for 2.6.16 to open.
With these, the handling of read errors by overwriting with correct
data is supported for all relevant raid levels. User-requested
checking of redundant information is also supported for all levels,
and the number of affected blocks is reported in sysfs (though this
number is only approximate).
/proc/mdstat becomes pollable so that mdadm can get timely
notification of changes without polling at a high frequency.
There are also a substantial number of code cleanups.
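As context for the user-requested checking mentioned above, it is driven through the per-array md directory in sysfs. A minimal sketch of how an administrator would trigger a check and read the mismatch count; the array name md0 is an assumption, substitute your own device:

```shell
# Hypothetical array name; adjust for your machine.
dev=md0
sysdir=/sys/block/$dev/md
if [ -w "$sysdir/sync_action" ]; then
    # 'check' compares redundant copies without repairing anything;
    # 'repair' would also rewrite inconsistent blocks.
    echo check > "$sysdir/sync_action"
    # Approximate count of sectors found inconsistent.
    cat "$sysdir/mismatch_cnt"
else
    echo "no writable $sysdir on this machine"
fi
```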
[PATCH md 001 of 14] Support check-without-repair of raid10 arrays
[PATCH md 002 of 14] Allow raid1 to check consistency
[PATCH md 003 of 14] Make sure read error on last working drive of raid1 actually returns failure.
[PATCH md 004 of 14] auto-correct correctable read errors in raid10
[PATCH md 005 of 14] raid10 read-error handling - resync and read-only
[PATCH md 006 of 14] Make /proc/mdstat pollable.
[PATCH md 007 of 14] Clean up 'page' related names in md
[PATCH md 008 of 14] Convert md to use kzalloc throughout
[PATCH md 009 of 14] Tidy up raid5/6 hash table code.
[PATCH md 010 of 14] Convert various kmap calls to kmap_atomic
[PATCH md 011 of 14] Convert recently exported symbol to GPL
[PATCH md 012 of 14] Break out of a loop that doesn't need to run to completion.
[PATCH md 013 of 14] Remove personality numbering from md.
[PATCH md 014 of 14] Fix possible problem in raid1/raid10 error overwriting.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH md 001 of 14] Support check-without-repair of raid10 arrays
From: NeilBrown @ 2005-12-01 3:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Also keep a count of the number of errors found.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 4 ++++
1 file changed, 4 insertions(+)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-28 10:12:52.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-28 14:01:41.000000000 +1100
@@ -1206,6 +1206,10 @@ static void sync_request_write(mddev_t *
break;
if (j == vcnt)
continue;
+ mddev->resync_mismatches += r10_bio->sectors;
+ if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
+ /* Don't fix anything. */
+ continue;
/* Ok, we need to write this bio
* First we need to fixup bv_offset, bv_len and
* bi_vecs, as the read request might have corrupted these
* [PATCH md 002 of 14] Allow raid1 to check consistency
From: NeilBrown @ 2005-12-01 3:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
When performing a user-requested 'check' or 'repair',
we read all readable devices and compare their contents.
We only write to blocks which had read errors, or
whose content differs from that of the first good device
found.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 158 +++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 129 insertions(+), 29 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 15:08:48.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 17:20:37.000000000 +1100
@@ -106,15 +106,30 @@ static void * r1buf_pool_alloc(gfp_t gfp
}
/*
* Allocate RESYNC_PAGES data pages and attach them to
- * the first bio;
- */
- bio = r1_bio->bios[0];
- for (i = 0; i < RESYNC_PAGES; i++) {
- page = alloc_page(gfp_flags);
- if (unlikely(!page))
- goto out_free_pages;
-
- bio->bi_io_vec[i].bv_page = page;
+ * the first bio.
+ * If this is a user-requested check/repair, allocate
+ * RESYNC_PAGES for each bio.
+ */
+ if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
+ j = pi->raid_disks;
+ else
+ j = 1;
+ while(j--) {
+ bio = r1_bio->bios[j];
+ for (i = 0; i < RESYNC_PAGES; i++) {
+ page = alloc_page(gfp_flags);
+ if (unlikely(!page))
+ goto out_free_pages;
+
+ bio->bi_io_vec[i].bv_page = page;
+ }
+ }
+ /* If not user-requests, copy the page pointers to all bios */
+ if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
+ for (i=0; i<RESYNC_PAGES ; i++)
+ for (j=1; j<pi->raid_disks; j++)
+ r1_bio->bios[j]->bi_io_vec[i].bv_page =
+ r1_bio->bios[0]->bi_io_vec[i].bv_page;
}
r1_bio->master_bio = NULL;
@@ -122,8 +137,10 @@ static void * r1buf_pool_alloc(gfp_t gfp
return r1_bio;
out_free_pages:
- for ( ; i > 0 ; i--)
- __free_page(bio->bi_io_vec[i-1].bv_page);
+ for (i=0; i < RESYNC_PAGES ; i++)
+ for (j=0 ; j < pi->raid_disks; j++)
+ __free_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
+ j = -1;
out_free_bio:
while ( ++j < pi->raid_disks )
bio_put(r1_bio->bios[j]);
@@ -134,14 +151,16 @@ out_free_bio:
static void r1buf_pool_free(void *__r1_bio, void *data)
{
struct pool_info *pi = data;
- int i;
+ int i,j;
r1bio_t *r1bio = __r1_bio;
- struct bio *bio = r1bio->bios[0];
- for (i = 0; i < RESYNC_PAGES; i++) {
- __free_page(bio->bi_io_vec[i].bv_page);
- bio->bi_io_vec[i].bv_page = NULL;
- }
+ for (i = 0; i < RESYNC_PAGES; i++)
+ for (j = pi->raid_disks; j-- ;) {
+ if (j == 0 ||
+ r1bio->bios[j]->bi_io_vec[i].bv_page !=
+ r1bio->bios[0]->bi_io_vec[i].bv_page)
+ __free_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
+ }
for (i=0 ; i < pi->raid_disks; i++)
bio_put(r1bio->bios[i]);
@@ -1076,13 +1095,16 @@ abort:
static int end_sync_read(struct bio *bio, unsigned int bytes_done, int error)
{
r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
+ int i;
if (bio->bi_size)
return 1;
- if (r1_bio->bios[r1_bio->read_disk] != bio)
- BUG();
- update_head_pos(r1_bio->read_disk, r1_bio);
+ for (i=r1_bio->mddev->raid_disks; i--; )
+ if (r1_bio->bios[i] == bio)
+ break;
+ BUG_ON(i < 0);
+ update_head_pos(i, r1_bio);
/*
* we have read a block, now it needs to be re-written,
* or re-read if the read failed.
@@ -1090,7 +1112,9 @@ static int end_sync_read(struct bio *bio
*/
if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R1BIO_Uptodate, &r1_bio->state);
- reschedule_retry(r1_bio);
+
+ if (atomic_dec_and_test(&r1_bio->remaining))
+ reschedule_retry(r1_bio);
return 0;
}
@@ -1133,9 +1157,65 @@ static void sync_request_write(mddev_t *
bio = r1_bio->bios[r1_bio->read_disk];
- /*
- * schedule writes
- */
+ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ /* We have read all readable devices. If we haven't
+ * got the block, then there is no hope left.
+ * If we have, then we want to do a comparison
+ * and skip the write if everything is the same.
+ * If any blocks failed to read, then we need to
+ * attempt an over-write
+ */
+ int primary;
+ if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
+ for (i=0; i<mddev->raid_disks; i++)
+ if (r1_bio->bios[i]->bi_end_io == end_sync_read)
+ md_error(mddev, conf->mirrors[i].rdev);
+
+ md_done_sync(mddev, r1_bio->sectors, 1);
+ put_buf(r1_bio);
+ return;
+ }
+ for (primary=0; primary<mddev->raid_disks; primary++)
+ if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
+ test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
+ r1_bio->bios[primary]->bi_end_io = NULL;
+ break;
+ }
+ r1_bio->read_disk = primary;
+ for (i=0; i<mddev->raid_disks; i++)
+ if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
+ test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+ int j;
+ int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
+ struct bio *pbio = r1_bio->bios[primary];
+ struct bio *sbio = r1_bio->bios[i];
+ for (j = vcnt; j-- ; )
+ if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
+ page_address(sbio->bi_io_vec[j].bv_page),
+ PAGE_SIZE))
+ break;
+ if (j >= 0)
+ mddev->resync_mismatches += r1_bio->sectors;
+ if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
+ sbio->bi_end_io = NULL;
+ else {
+ /* fixup the bio for reuse */
+ sbio->bi_vcnt = vcnt;
+ sbio->bi_size = r1_bio->sectors << 9;
+ sbio->bi_idx = 0;
+ sbio->bi_phys_segments = 0;
+ sbio->bi_hw_segments = 0;
+ sbio->bi_hw_front_size = 0;
+ sbio->bi_hw_back_size = 0;
+ sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
+ sbio->bi_flags |= 1 << BIO_UPTODATE;
+ sbio->bi_next = NULL;
+ sbio->bi_sector = r1_bio->sector +
+ conf->mirrors[i].rdev->data_offset;
+ sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+ }
+ }
+ }
if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
/* ouch - failed to read all of that.
* Try some synchronous reads of other devices to get
@@ -1215,6 +1295,10 @@ static void sync_request_write(mddev_t *
idx ++;
}
}
+
+ /*
+ * schedule writes
+ */
atomic_set(&r1_bio->remaining, 1);
for (i = 0; i < disks ; i++) {
wbio = r1_bio->bios[i];
@@ -1617,10 +1701,10 @@ static sector_t sync_request(mddev_t *md
for (i=0 ; i < conf->raid_disks; i++) {
bio = r1_bio->bios[i];
if (bio->bi_end_io) {
- page = r1_bio->bios[0]->bi_io_vec[bio->bi_vcnt].bv_page;
+ page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
if (bio_add_page(bio, page, len, 0) == 0) {
/* stop here */
- r1_bio->bios[0]->bi_io_vec[bio->bi_vcnt].bv_page = page;
+ bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
while (i > 0) {
i--;
bio = r1_bio->bios[i];
@@ -1640,12 +1724,28 @@ static sector_t sync_request(mddev_t *md
sync_blocks -= (len>>9);
} while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES);
bio_full:
- bio = r1_bio->bios[r1_bio->read_disk];
r1_bio->sectors = nr_sectors;
- md_sync_acct(conf->mirrors[r1_bio->read_disk].rdev->bdev, nr_sectors);
+ /* For a user-requested sync, we read all readable devices and do a
+ * compare
+ */
+ if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ atomic_set(&r1_bio->remaining, read_targets);
+ for (i=0; i<conf->raid_disks; i++) {
+ bio = r1_bio->bios[i];
+ if (bio->bi_end_io == end_sync_read) {
+ md_sync_acct(conf->mirrors[i].rdev->bdev, nr_sectors);
+ generic_make_request(bio);
+ }
+ }
+ } else {
+ atomic_set(&r1_bio->remaining, 1);
+ bio = r1_bio->bios[r1_bio->read_disk];
+ md_sync_acct(conf->mirrors[r1_bio->read_disk].rdev->bdev,
+ nr_sectors);
+ generic_make_request(bio);
- generic_make_request(bio);
+ }
return nr_sectors;
}
* [PATCH md 003 of 14] Make sure read error on last working drive of raid1 actually returns failure.
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
We are inadvertently setting the R1BIO_Uptodate bit on read errors
when we decide not to try correcting (because there are no other
working devices). This means that the read error is reported to
the client as success.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 17:20:37.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-29 10:28:21.000000000 +1100
@@ -284,7 +284,8 @@ static int raid1_end_read_request(struct
* user-side. So if something waits for IO, then it will
* wait for the 'master' bio.
*/
- set_bit(R1BIO_Uptodate, &r1_bio->state);
+ if (uptodate)
+ set_bit(R1BIO_Uptodate, &r1_bio->state);
raid_end_bio_io(r1_bio);
} else {
* [PATCH md 004 of 14] auto-correct correctable read errors in raid10
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Largely just a cross-port from raid1.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 127 +++++++++++++++++++++++++++++++++++++-----
./include/linux/raid/raid10.h | 2
2 files changed, 114 insertions(+), 15 deletions(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-29 10:27:40.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-29 11:04:22.000000000 +1100
@@ -209,6 +209,7 @@ static void reschedule_retry(r10bio_t *r
spin_lock_irqsave(&conf->device_lock, flags);
list_add(&r10_bio->retry_list, &conf->retry_list);
+ conf->nr_queued ++;
spin_unlock_irqrestore(&conf->device_lock, flags);
md_wakeup_thread(mddev->thread);
@@ -254,9 +255,9 @@ static int raid10_end_read_request(struc
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
- if (!uptodate)
- md_error(r10_bio->mddev, conf->mirrors[dev].rdev);
- else
+ update_head_pos(slot, r10_bio);
+
+ if (uptodate) {
/*
* Set R10BIO_Uptodate in our master bio, so that
* we will return a good error code to the higher
@@ -267,15 +268,8 @@ static int raid10_end_read_request(struc
* wait for the 'master' bio.
*/
set_bit(R10BIO_Uptodate, &r10_bio->state);
-
- update_head_pos(slot, r10_bio);
-
- /*
- * we have only one bio on the read side
- */
- if (uptodate)
raid_end_bio_io(r10_bio);
- else {
+ } else {
/*
* oops, read error:
*/
@@ -714,6 +708,33 @@ static void allow_barrier(conf_t *conf)
wake_up(&conf->wait_barrier);
}
+static void freeze_array(conf_t *conf)
+{
+ /* stop syncio and normal IO and wait for everything to
+ * go quite.
+ * We increment barrier and nr_waiting, and then
+ * wait until barrier+nr_pending match nr_queued+2
+ */
+ spin_lock_irq(&conf->resync_lock);
+ conf->barrier++;
+ conf->nr_waiting++;
+ wait_event_lock_irq(conf->wait_barrier,
+ conf->barrier+conf->nr_pending == conf->nr_queued+2,
+ conf->resync_lock,
+ raid10_unplug(conf->mddev->queue));
+ spin_unlock_irq(&conf->resync_lock);
+}
+
+static void unfreeze_array(conf_t *conf)
+{
+ /* reverse the effect of the freeze */
+ spin_lock_irq(&conf->resync_lock);
+ conf->barrier--;
+ conf->nr_waiting--;
+ wake_up(&conf->wait_barrier);
+ spin_unlock_irq(&conf->resync_lock);
+}
+
static int make_request(request_queue_t *q, struct bio * bio)
{
mddev_t *mddev = q->queuedata;
@@ -1338,6 +1359,7 @@ static void raid10d(mddev_t *mddev)
break;
r10_bio = list_entry(head->prev, r10bio_t, retry_list);
list_del(head->prev);
+ conf->nr_queued--;
spin_unlock_irqrestore(&conf->device_lock, flags);
mddev = r10_bio->mddev;
@@ -1350,6 +1372,78 @@ static void raid10d(mddev_t *mddev)
unplug = 1;
} else {
int mirror;
+ /* we got a read error. Maybe the drive is bad. Maybe just
+ * the block and we can fix it.
+ * We freeze all other IO, and try reading the block from
+ * other devices. When we find one, we re-write
+ * and check it that fixes the read error.
+ * This is all done synchronously while the array is
+ * frozen.
+ */
+ int sect = 0; /* Offset from r10_bio->sector */
+ int sectors = r10_bio->sectors;
+ freeze_array(conf);
+ if (mddev->ro == 0) while(sectors) {
+ int s = sectors;
+ int sl = r10_bio->read_slot;
+ int success = 0;
+
+ if (s > (PAGE_SIZE>>9))
+ s = PAGE_SIZE >> 9;
+
+ do {
+ int d = r10_bio->devs[sl].devnum;
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags) &&
+ sync_page_io(rdev->bdev,
+ r10_bio->devs[sl].addr +
+ sect + rdev->data_offset,
+ s<<9,
+ conf->tmppage, READ))
+ success = 1;
+ else {
+ sl++;
+ if (sl == conf->copies)
+ sl = 0;
+ }
+ } while (!success && sl != r10_bio->read_slot);
+
+ if (success) {
+ /* write it back and re-read */
+ while (sl != r10_bio->read_slot) {
+ int d;
+ if (sl==0)
+ sl = conf->copies;
+ sl--;
+ d = r10_bio->devs[sl].devnum;
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags)) {
+ if (sync_page_io(rdev->bdev,
+ r10_bio->devs[sl].addr +
+ sect + rdev->data_offset,
+ s<<9, conf->tmppage, WRITE) == 0 ||
+ sync_page_io(rdev->bdev,
+ r10_bio->devs[sl].addr +
+ sect + rdev->data_offset,
+ s<<9, conf->tmppage, READ) == 0) {
+ /* Well, this device is dead */
+ md_error(mddev, rdev);
+ }
+ }
+ }
+ } else {
+ /* Cannot read from anywhere -- bye bye array */
+ md_error(mddev, conf->mirrors[r10_bio->devs[r10_bio->read_slot].devnum].rdev);
+ break;
+ }
+ sectors -= s;
+ sect += s;
+ }
+
+ unfreeze_array(conf);
+
bio = r10_bio->devs[r10_bio->read_slot].bio;
r10_bio->devs[r10_bio->read_slot].bio = NULL;
bio_put(bio);
@@ -1793,22 +1887,24 @@ static int run(mddev_t *mddev)
* bookkeeping area. [whatever we allocate in run(),
* should be freed in stop()]
*/
- conf = kmalloc(sizeof(conf_t), GFP_KERNEL);
+ conf = kzalloc(sizeof(conf_t), GFP_KERNEL);
mddev->private = conf;
if (!conf) {
printk(KERN_ERR "raid10: couldn't allocate memory for %s\n",
mdname(mddev));
goto out;
}
- memset(conf, 0, sizeof(*conf));
- conf->mirrors = kmalloc(sizeof(struct mirror_info)*mddev->raid_disks,
+ conf->mirrors = kzalloc(sizeof(struct mirror_info)*mddev->raid_disks,
GFP_KERNEL);
if (!conf->mirrors) {
printk(KERN_ERR "raid10: couldn't allocate memory for %s\n",
mdname(mddev));
goto out_free_conf;
}
- memset(conf->mirrors, 0, sizeof(struct mirror_info)*mddev->raid_disks);
+
+ conf->tmppage = alloc_page(GFP_KERNEL);
+ if (!conf->tmppage)
+ goto out_free_conf;
conf->near_copies = nc;
conf->far_copies = fc;
@@ -1918,6 +2014,7 @@ static int run(mddev_t *mddev)
out_free_conf:
if (conf->r10bio_pool)
mempool_destroy(conf->r10bio_pool);
+ put_page(conf->tmppage);
kfree(conf->mirrors);
kfree(conf);
mddev->private = NULL;
diff ./include/linux/raid/raid10.h~current~ ./include/linux/raid/raid10.h
--- ./include/linux/raid/raid10.h~current~ 2005-11-29 10:27:40.000000000 +1100
+++ ./include/linux/raid/raid10.h 2005-11-29 11:04:56.000000000 +1100
@@ -42,6 +42,7 @@ struct r10_private_data_s {
spinlock_t resync_lock;
int nr_pending;
int nr_waiting;
+ int nr_queued;
int barrier;
sector_t next_resync;
int fullsync; /* set to 1 if a full sync is needed,
@@ -53,6 +54,7 @@ struct r10_private_data_s {
mempool_t *r10bio_pool;
mempool_t *r10buf_pool;
+ struct page *tmppage;
};
typedef struct r10_private_data_s conf_t;
* [PATCH md 005 of 14] raid10 read-error handling - resync and read-only
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Add in correct read-error handling for resync and read-only
situations.
When read-only, we don't over-write, so we need to mark the failed
drive in the r10_bio so we don't re-try it.
During resync, we always read all blocks, so if there is a read error,
we simply over-write it with the good block that we found (assuming
we found one).
Note that the recovery case still isn't handled in an interesting way.
There is nothing useful to do for the 2-copies case. If there are 3
or more copies, then we could try reading from one of the non-missing
copies, but this is a bit complicated and very rarely would be used,
so I'm leaving it for now.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 56 ++++++++++++++++++++++++++----------------
./include/linux/raid/raid10.h | 7 +++++
2 files changed, 42 insertions(+), 21 deletions(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-29 14:07:05.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-29 14:11:01.000000000 +1100
@@ -172,7 +172,7 @@ static void put_all_bios(conf_t *conf, r
for (i = 0; i < conf->copies; i++) {
struct bio **bio = & r10_bio->devs[i].bio;
- if (*bio)
+ if (*bio && *bio != IO_BLOCKED)
bio_put(*bio);
*bio = NULL;
}
@@ -500,6 +500,7 @@ static int read_balance(conf_t *conf, r1
disk = r10_bio->devs[slot].devnum;
while ((rdev = rcu_dereference(conf->mirrors[disk].rdev)) == NULL ||
+ r10_bio->devs[slot].bio == IO_BLOCKED ||
!test_bit(In_sync, &rdev->flags)) {
slot++;
if (slot == conf->copies) {
@@ -517,6 +518,7 @@ static int read_balance(conf_t *conf, r1
slot = 0;
disk = r10_bio->devs[slot].devnum;
while ((rdev=rcu_dereference(conf->mirrors[disk].rdev)) == NULL ||
+ r10_bio->devs[slot].bio == IO_BLOCKED ||
!test_bit(In_sync, &rdev->flags)) {
slot ++;
if (slot == conf->copies) {
@@ -537,6 +539,7 @@ static int read_balance(conf_t *conf, r1
if ((rdev=rcu_dereference(conf->mirrors[ndisk].rdev)) == NULL ||
+ r10_bio->devs[nslot].bio == IO_BLOCKED ||
!test_bit(In_sync, &rdev->flags))
continue;
@@ -1104,7 +1107,6 @@ abort:
static int end_sync_read(struct bio *bio, unsigned int bytes_done, int error)
{
- int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
r10bio_t * r10_bio = (r10bio_t *)(bio->bi_private);
conf_t *conf = mddev_to_conf(r10_bio->mddev);
int i,d;
@@ -1119,7 +1121,10 @@ static int end_sync_read(struct bio *bio
BUG();
update_head_pos(i, r10_bio);
d = r10_bio->devs[i].devnum;
- if (!uptodate)
+
+ if (test_bit(BIO_UPTODATE, &bio->bi_flags))
+ set_bit(R10BIO_Uptodate, &r10_bio->state);
+ else if (!test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery))
md_error(r10_bio->mddev,
conf->mirrors[d].rdev);
@@ -1209,25 +1214,30 @@ static void sync_request_write(mddev_t *
fbio = r10_bio->devs[i].bio;
/* now find blocks with errors */
- for (i=first+1 ; i < conf->copies ; i++) {
- int vcnt, j, d;
+ for (i=0 ; i < conf->copies ; i++) {
+ int j, d;
+ int vcnt = r10_bio->sectors >> (PAGE_SHIFT-9);
- if (!test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags))
- continue;
- /* We know that the bi_io_vec layout is the same for
- * both 'first' and 'i', so we just compare them.
- * All vec entries are PAGE_SIZE;
- */
tbio = r10_bio->devs[i].bio;
- vcnt = r10_bio->sectors >> (PAGE_SHIFT-9);
- for (j = 0; j < vcnt; j++)
- if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
- page_address(tbio->bi_io_vec[j].bv_page),
- PAGE_SIZE))
- break;
- if (j == vcnt)
+
+ if (tbio->bi_end_io != end_sync_read)
+ continue;
+ if (i == first)
continue;
- mddev->resync_mismatches += r10_bio->sectors;
+ if (test_bit(BIO_UPTODATE, &r10_bio->devs[i].bio->bi_flags)) {
+ /* We know that the bi_io_vec layout is the same for
+ * both 'first' and 'i', so we just compare them.
+ * All vec entries are PAGE_SIZE;
+ */
+ for (j = 0; j < vcnt; j++)
+ if (memcmp(page_address(fbio->bi_io_vec[j].bv_page),
+ page_address(tbio->bi_io_vec[j].bv_page),
+ PAGE_SIZE))
+ break;
+ if (j == vcnt)
+ continue;
+ mddev->resync_mismatches += r10_bio->sectors;
+ }
if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
/* Don't fix anything. */
continue;
@@ -1308,7 +1318,10 @@ static void recovery_request_write(mddev
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
md_sync_acct(conf->mirrors[d].rdev->bdev, wbio->bi_size >> 9);
- generic_make_request(wbio);
+ if (test_bit(R10BIO_Uptodate, &r10_bio->state))
+ generic_make_request(wbio);
+ else
+ bio_endio(wbio, wbio->bi_size, -EIO);
}
@@ -1445,7 +1458,8 @@ static void raid10d(mddev_t *mddev)
unfreeze_array(conf);
bio = r10_bio->devs[r10_bio->read_slot].bio;
- r10_bio->devs[r10_bio->read_slot].bio = NULL;
+ r10_bio->devs[r10_bio->read_slot].bio =
+ mddev->ro ? IO_BLOCKED : NULL;
bio_put(bio);
mirror = read_balance(conf, r10_bio);
if (mirror == -1) {
diff ./include/linux/raid/raid10.h~current~ ./include/linux/raid/raid10.h
--- ./include/linux/raid/raid10.h~current~ 2005-11-29 14:07:05.000000000 +1100
+++ ./include/linux/raid/raid10.h 2005-11-29 12:09:11.000000000 +1100
@@ -104,6 +104,13 @@ struct r10bio_s {
} devs[0];
};
+/* when we get a read error on a read-only array, we redirect to another
+ * device without failing the first device, or trying to over-write to
+ * correct the read error. To keep track of bad blocks on a per-bio
+ * level, we store IO_BLOCKED in the appropriate 'bios' pointer
+ */
+#define IO_BLOCKED ((struct bio*)1)
+
/* bits for r10bio.state */
#define R10BIO_Uptodate 0
#define R10BIO_IsSync 1
* [PATCH md 006 of 14] Make /proc/mdstat pollable.
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
With this patch it is possible to poll /proc/mdstat to detect
arrays appearing or disappearing, to detect failures,
recovery starting, recovery completing, and devices being
added and removed.
It is similar to the poll-ability of /proc/mounts, though different in that:
We always report that the file is readable (because face it, it is, even
if only for EOF).
We report POLLPRI when there is a change so that select() can detect
it as an exceptional event. Not only are these exceptional events, but
that is the mechanism that the current 'mdadm' uses to watch for events
(It also polls after a timeout).
(We also report POLLERR like /proc/mounts).
Finally, we only reset the per-file event counter when the start of the
file is read, rather than when poll() returns an event. This is more
robust as it means that an fd will continue to report activity to poll/select
until the program clearly responds to that activity.
md_new_event takes an 'mddev' which isn't currently used, but it will
be soon.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 76 insertions(+), 5 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-01 13:56:49.000000000 +1100
+++ ./drivers/md/md.c 2005-12-01 13:57:01.000000000 +1100
@@ -42,6 +42,7 @@
#include <linux/devfs_fs_kernel.h>
#include <linux/buffer_head.h> /* for invalidate_bdev */
#include <linux/suspend.h>
+#include <linux/poll.h>
#include <linux/init.h>
@@ -134,6 +135,24 @@ static struct block_device_operations md
static int start_readonly;
/*
+ * We have a system wide 'event count' that is incremented
+ * on any 'interesting' event, and readers of /proc/mdstat
+ * can use 'poll' or 'select' to find out when the event
+ * count increases.
+ *
+ * Events are:
+ * start array, stop array, error, add device, remove device,
+ * start build, activate spare
+ */
+DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
+static atomic_t md_event_count;
+void md_new_event(mddev_t *mddev)
+{
+ atomic_inc(&md_event_count);
+ wake_up(&md_event_waiters);
+}
+
+/*
* Enables to iterate over all existing md arrays
* all_mddevs_lock protects this list.
*/
@@ -2111,6 +2130,7 @@ static int do_md_run(mddev_t * mddev)
mddev->queue->make_request_fn = mddev->pers->make_request;
mddev->changed = 1;
+ md_new_event(mddev);
return 0;
}
@@ -2238,6 +2258,7 @@ static int do_md_stop(mddev_t * mddev, i
printk(KERN_INFO "md: %s switched to read-only mode.\n",
mdname(mddev));
err = 0;
+ md_new_event(mddev);
out:
return err;
}
@@ -2712,6 +2733,7 @@ static int hot_remove_disk(mddev_t * mdd
kick_rdev_from_array(rdev);
md_update_sb(mddev);
+ md_new_event(mddev);
return 0;
busy:
@@ -2802,7 +2824,7 @@ static int hot_add_disk(mddev_t * mddev,
*/
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
-
+ md_new_event(mddev);
return 0;
abort_unbind_export:
@@ -3523,6 +3545,7 @@ void md_error(mddev_t *mddev, mdk_rdev_t
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
+ md_new_event(mddev);
}
/* seq_file implementation /proc/mdstat */
@@ -3663,12 +3686,17 @@ static void md_seq_stop(struct seq_file
mddev_put(mddev);
}
+struct mdstat_info {
+ int event;
+};
+
static int md_seq_show(struct seq_file *seq, void *v)
{
mddev_t *mddev = v;
sector_t size;
struct list_head *tmp2;
mdk_rdev_t *rdev;
+ struct mdstat_info *mi = seq->private;
int i;
struct bitmap *bitmap;
@@ -3681,6 +3709,7 @@ static int md_seq_show(struct seq_file *
spin_unlock(&pers_lock);
seq_printf(seq, "\n");
+ mi->event = atomic_read(&md_event_count);
return 0;
}
if (v == (void*)2) {
@@ -3789,16 +3818,52 @@ static struct seq_operations md_seq_ops
static int md_seq_open(struct inode *inode, struct file *file)
{
int error;
+ struct mdstat_info *mi = kmalloc(sizeof(*mi), GFP_KERNEL);
+ if (mi == NULL)
+ return -ENOMEM;
error = seq_open(file, &md_seq_ops);
+ if (error)
+ kfree(mi);
+ else {
+ struct seq_file *p = file->private_data;
+ p->private = mi;
+ mi->event = atomic_read(&md_event_count);
+ }
return error;
}
+static int md_seq_release(struct inode *inode, struct file *file)
+{
+ struct seq_file *m = file->private_data;
+ struct mdstat_info *mi = m->private;
+ m->private = NULL;
+ kfree(mi);
+ return seq_release(inode, file);
+}
+
+static unsigned int mdstat_poll(struct file *filp, poll_table *wait)
+{
+ struct seq_file *m = filp->private_data;
+ struct mdstat_info *mi = m->private;
+ int mask;
+
+ poll_wait(filp, &md_event_waiters, wait);
+
+ /* always allow read */
+ mask = POLLIN | POLLRDNORM;
+
+ if (mi->event != atomic_read(&md_event_count))
+ mask |= POLLERR | POLLPRI;
+ return mask;
+}
+
static struct file_operations md_seq_fops = {
.open = md_seq_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release,
+ .release = md_seq_release,
+ .poll = mdstat_poll,
};
int register_md_personality(int pnum, mdk_personality_t *p)
@@ -4068,7 +4133,11 @@ static void md_do_sync(mddev_t *mddev)
j += sectors;
if (j>1) mddev->curr_resync = j;
-
+ if (last_check == 0)
+ /* this is the earliers that rebuilt will be
+ * visible in /proc/mdstat
+ */
+ md_new_event(mddev);
if (last_check + window > io_sectors || j == max_sectors)
continue;
@@ -4254,6 +4323,7 @@ void md_check_recovery(mddev_t *mddev)
mddev->recovery = 0;
/* flag recovery needed just to double check */
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_new_event(mddev);
goto unlock;
}
/* Clear some bits that don't mean anything, but
@@ -4291,6 +4361,7 @@ void md_check_recovery(mddev_t *mddev)
sprintf(nm, "rd%d", rdev->raid_disk);
sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
spares++;
+ md_new_event(mddev);
} else
break;
}
@@ -4323,9 +4394,9 @@ void md_check_recovery(mddev_t *mddev)
mdname(mddev));
/* leave the spares where they are, it shouldn't hurt */
mddev->recovery = 0;
- } else {
+ } else
md_wakeup_thread(mddev->sync_thread);
- }
+ md_new_event(mddev);
}
unlock:
mddev_unlock(mddev);
* [PATCH md 007 of 14] Clean up 'page' related names in md
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Substitute:
page_cache_get -> get_page
page_cache_release -> put_page
PAGE_CACHE_SHIFT -> PAGE_SHIFT
PAGE_CACHE_SIZE -> PAGE_SIZE
PAGE_CACHE_MASK -> PAGE_MASK
__free_page -> put_page
because we aren't using the page cache; we are just using pages.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 44 ++++++++++++++++++++++----------------------
./drivers/md/md.c | 2 +-
./drivers/md/raid0.c | 2 +-
./drivers/md/raid1.c | 10 +++++-----
./drivers/md/raid10.c | 8 ++++----
./drivers/md/raid5.c | 4 ++--
./drivers/md/raid6main.c | 6 +++---
7 files changed, 38 insertions(+), 38 deletions(-)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-12-01 13:56:49.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-12-01 13:57:54.000000000 +1100
@@ -341,7 +341,7 @@ static int write_page(struct bitmap *bit
/* add to list to be waited for by daemon */
struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
item->page = page;
- page_cache_get(page);
+ get_page(page);
spin_lock(&bitmap->write_lock);
list_add(&item->list, &bitmap->complete_pages);
spin_unlock(&bitmap->write_lock);
@@ -357,10 +357,10 @@ static struct page *read_page(struct fil
struct inode *inode = file->f_mapping->host;
struct page *page = NULL;
loff_t isize = i_size_read(inode);
- unsigned long end_index = isize >> PAGE_CACHE_SHIFT;
+ unsigned long end_index = isize >> PAGE_SHIFT;
- PRINTK("read bitmap file (%dB @ %Lu)\n", (int)PAGE_CACHE_SIZE,
- (unsigned long long)index << PAGE_CACHE_SHIFT);
+ PRINTK("read bitmap file (%dB @ %Lu)\n", (int)PAGE_SIZE,
+ (unsigned long long)index << PAGE_SHIFT);
page = read_cache_page(inode->i_mapping, index,
(filler_t *)inode->i_mapping->a_ops->readpage, file);
@@ -368,7 +368,7 @@ static struct page *read_page(struct fil
goto out;
wait_on_page_locked(page);
if (!PageUptodate(page) || PageError(page)) {
- page_cache_release(page);
+ put_page(page);
page = ERR_PTR(-EIO);
goto out;
}
@@ -376,14 +376,14 @@ static struct page *read_page(struct fil
if (index > end_index) /* we have read beyond EOF */
*bytes_read = 0;
else if (index == end_index) /* possible short read */
- *bytes_read = isize & ~PAGE_CACHE_MASK;
+ *bytes_read = isize & ~PAGE_MASK;
else
- *bytes_read = PAGE_CACHE_SIZE; /* got a full page */
+ *bytes_read = PAGE_SIZE; /* got a full page */
out:
if (IS_ERR(page))
printk(KERN_ALERT "md: bitmap read error: (%dB @ %Lu): %ld\n",
- (int)PAGE_CACHE_SIZE,
- (unsigned long long)index << PAGE_CACHE_SHIFT,
+ (int)PAGE_SIZE,
+ (unsigned long long)index << PAGE_SHIFT,
PTR_ERR(page));
return page;
}
@@ -558,7 +558,7 @@ static void bitmap_mask_state(struct bit
spin_unlock_irqrestore(&bitmap->lock, flags);
return;
}
- page_cache_get(bitmap->sb_page);
+ get_page(bitmap->sb_page);
spin_unlock_irqrestore(&bitmap->lock, flags);
sb = (bitmap_super_t *)kmap(bitmap->sb_page);
switch (op) {
@@ -569,7 +569,7 @@ static void bitmap_mask_state(struct bit
default: BUG();
}
kunmap(bitmap->sb_page);
- page_cache_release(bitmap->sb_page);
+ put_page(bitmap->sb_page);
}
/*
@@ -622,12 +622,12 @@ static void bitmap_file_unmap(struct bit
while (pages--)
if (map[pages]->index != 0) /* 0 is sb_page, release it below */
- page_cache_release(map[pages]);
+ put_page(map[pages]);
kfree(map);
kfree(attr);
if (sb_page)
- page_cache_release(sb_page);
+ put_page(sb_page);
}
static void bitmap_stop_daemon(struct bitmap *bitmap);
@@ -654,7 +654,7 @@ static void drain_write_queues(struct bi
while ((item = dequeue_page(bitmap))) {
/* don't bother to wait */
- page_cache_release(item->page);
+ put_page(item->page);
mempool_free(item, bitmap->write_pool);
}
@@ -763,7 +763,7 @@ static void bitmap_file_set_bit(struct b
/* make sure the page stays cached until it gets written out */
if (! (get_page_attr(bitmap, page) & BITMAP_PAGE_DIRTY))
- page_cache_get(page);
+ get_page(page);
/* set the bit */
kaddr = kmap_atomic(page, KM_USER0);
@@ -938,7 +938,7 @@ static int bitmap_init_from_disk(struct
if (ret) {
kunmap(page);
/* release, page not in filemap yet */
- page_cache_release(page);
+ put_page(page);
goto out;
}
}
@@ -1043,7 +1043,7 @@ int bitmap_daemon_work(struct bitmap *bi
/* skip this page unless it's marked as needing cleaning */
if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
if (attr & BITMAP_PAGE_NEEDWRITE) {
- page_cache_get(page);
+ get_page(page);
clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
}
spin_unlock_irqrestore(&bitmap->lock, flags);
@@ -1057,13 +1057,13 @@ int bitmap_daemon_work(struct bitmap *bi
default:
bitmap_file_kick(bitmap);
}
- page_cache_release(page);
+ put_page(page);
}
continue;
}
/* grab the new page, sync and release the old */
- page_cache_get(page);
+ get_page(page);
if (lastpage != NULL) {
if (get_page_attr(bitmap, lastpage) & BITMAP_PAGE_NEEDWRITE) {
clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
@@ -1078,7 +1078,7 @@ int bitmap_daemon_work(struct bitmap *bi
spin_unlock_irqrestore(&bitmap->lock, flags);
}
kunmap(lastpage);
- page_cache_release(lastpage);
+ put_page(lastpage);
if (err)
bitmap_file_kick(bitmap);
} else
@@ -1133,7 +1133,7 @@ int bitmap_daemon_work(struct bitmap *bi
spin_unlock_irqrestore(&bitmap->lock, flags);
}
- page_cache_release(lastpage);
+ put_page(lastpage);
}
return err;
@@ -1184,7 +1184,7 @@ static void bitmap_writeback_daemon(mdde
PRINTK("finished page writeback: %p\n", page);
err = PageError(page);
- page_cache_release(page);
+ put_page(page);
if (err) {
printk(KERN_WARNING "%s: bitmap file writeback "
"failed (page %lu): %d\n",
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-01 13:57:01.000000000 +1100
+++ ./drivers/md/md.c 2005-12-01 13:57:54.000000000 +1100
@@ -339,7 +339,7 @@ static int alloc_disk_sb(mdk_rdev_t * rd
static void free_disk_sb(mdk_rdev_t * rdev)
{
if (rdev->sb_page) {
- page_cache_release(rdev->sb_page);
+ put_page(rdev->sb_page);
rdev->sb_loaded = 0;
rdev->sb_page = NULL;
rdev->sb_offset = 0;
diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~ 2005-12-01 13:56:49.000000000 +1100
+++ ./drivers/md/raid0.c 2005-12-01 13:57:54.000000000 +1100
@@ -361,7 +361,7 @@ static int raid0_run (mddev_t *mddev)
* chunksize should be used in that case.
*/
{
- int stripe = mddev->raid_disks * mddev->chunk_size / PAGE_CACHE_SIZE;
+ int stripe = mddev->raid_disks * mddev->chunk_size / PAGE_SIZE;
if (mddev->queue->backing_dev_info.ra_pages < 2* stripe)
mddev->queue->backing_dev_info.ra_pages = 2* stripe;
}
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-01 13:56:50.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-01 13:57:54.000000000 +1100
@@ -139,7 +139,7 @@ static void * r1buf_pool_alloc(gfp_t gfp
out_free_pages:
for (i=0; i < RESYNC_PAGES ; i++)
for (j=0 ; j < pi->raid_disks; j++)
- __free_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
+ put_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
j = -1;
out_free_bio:
while ( ++j < pi->raid_disks )
@@ -159,7 +159,7 @@ static void r1buf_pool_free(void *__r1_b
if (j == 0 ||
r1bio->bios[j]->bi_io_vec[i].bv_page !=
r1bio->bios[0]->bi_io_vec[i].bv_page)
- __free_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
+ put_page(r1bio->bios[j]->bi_io_vec[i].bv_page);
}
for (i=0 ; i < pi->raid_disks; i++)
bio_put(r1bio->bios[i]);
@@ -386,7 +386,7 @@ static int raid1_end_write_request(struc
/* FIXME bio has been freed!!! */
int i = bio->bi_vcnt;
while (i--)
- __free_page(bio->bi_io_vec[i].bv_page);
+ put_page(bio->bi_io_vec[i].bv_page);
}
/* clear the bitmap if all writes complete successfully */
bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
@@ -732,7 +732,7 @@ static struct page **alloc_behind_pages(
do_sync_io:
if (pages)
for (i = 0; i < bio->bi_vcnt && pages[i]; i++)
- __free_page(pages[i]);
+ put_page(pages[i]);
kfree(pages);
PRINTK("%dB behind alloc failed, doing sync I/O\n", bio->bi_size);
return NULL;
@@ -1892,7 +1892,7 @@ out_free_conf:
if (conf->r1bio_pool)
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
- __free_page(conf->tmppage);
+ put_page(conf->tmppage);
kfree(conf->poolinfo);
kfree(conf);
mddev->private = NULL;
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-01 13:56:50.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-01 13:57:55.000000000 +1100
@@ -134,10 +134,10 @@ static void * r10buf_pool_alloc(gfp_t gf
out_free_pages:
for ( ; i > 0 ; i--)
- __free_page(bio->bi_io_vec[i-1].bv_page);
+ put_page(bio->bi_io_vec[i-1].bv_page);
while (j--)
for (i = 0; i < RESYNC_PAGES ; i++)
- __free_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
+ put_page(r10_bio->devs[j].bio->bi_io_vec[i].bv_page);
j = -1;
out_free_bio:
while ( ++j < nalloc )
@@ -157,7 +157,7 @@ static void r10buf_pool_free(void *__r10
struct bio *bio = r10bio->devs[j].bio;
if (bio) {
for (i = 0; i < RESYNC_PAGES; i++) {
- __free_page(bio->bi_io_vec[i].bv_page);
+ put_page(bio->bi_io_vec[i].bv_page);
bio->bi_io_vec[i].bv_page = NULL;
}
bio_put(bio);
@@ -2015,7 +2015,7 @@ static int run(mddev_t *mddev)
* maybe...
*/
{
- int stripe = conf->raid_disks * mddev->chunk_size / PAGE_CACHE_SIZE;
+ int stripe = conf->raid_disks * mddev->chunk_size / PAGE_SIZE;
stripe /= conf->near_copies;
if (mddev->queue->backing_dev_info.ra_pages < 2* stripe)
mddev->queue->backing_dev_info.ra_pages = 2* stripe;
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-01 13:56:50.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-01 13:57:55.000000000 +1100
@@ -167,7 +167,7 @@ static void shrink_buffers(struct stripe
if (!p)
continue;
sh->dev[i].page = NULL;
- page_cache_release(p);
+ put_page(p);
}
}
@@ -1955,7 +1955,7 @@ memory = conf->max_nr_stripes * (sizeof(
*/
{
int stripe = (mddev->raid_disks-1) * mddev->chunk_size
- / PAGE_CACHE_SIZE;
+ / PAGE_SIZE;
if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
}
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-01 13:56:50.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-01 13:57:55.000000000 +1100
@@ -186,7 +186,7 @@ static void shrink_buffers(struct stripe
if (!p)
continue;
sh->dev[i].page = NULL;
- page_cache_release(p);
+ put_page(p);
}
}
@@ -2069,7 +2069,7 @@ static int run(mddev_t *mddev)
*/
{
int stripe = (mddev->raid_disks-2) * mddev->chunk_size
- / PAGE_CACHE_SIZE;
+ / PAGE_SIZE;
if (mddev->queue->backing_dev_info.ra_pages < 2 * stripe)
mddev->queue->backing_dev_info.ra_pages = 2 * stripe;
}
@@ -2084,7 +2084,7 @@ abort:
if (conf) {
print_raid6_conf(conf);
if (conf->spare_page)
- page_cache_release(conf->spare_page);
+ put_page(conf->spare_page);
if (conf->stripe_hashtbl)
free_pages((unsigned long) conf->stripe_hashtbl,
HASH_PAGES_ORDER);
* [PATCH md 008 of 14] Convert md to use kzalloc throughout
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (6 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 007 of 14] Clean up 'page' related names in md NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 22:42 ` Andrew Morton
2005-12-01 3:23 ` [PATCH md 009 of 14] Tidy up raid5/6 hash table code NeilBrown
` (5 subsequent siblings)
13 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Replace multiple kmalloc/memset pairs with kzalloc calls.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 11 +++--------
./drivers/md/linear.c | 3 +--
./drivers/md/md.c | 10 +++-------
./drivers/md/multipath.c | 10 +++-------
./drivers/md/raid0.c | 9 ++-------
./drivers/md/raid1.c | 20 ++++++--------------
./drivers/md/raid10.c | 6 ++----
./drivers/md/raid5.c | 8 ++++----
8 files changed, 24 insertions(+), 53 deletions(-)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-12-01 13:59:53.000000000 +1100
@@ -887,12 +887,10 @@ static int bitmap_init_from_disk(struct
if (!bitmap->filemap)
goto out;
- bitmap->filemap_attr = kmalloc(sizeof(long) * num_pages, GFP_KERNEL);
+ bitmap->filemap_attr = kzalloc(sizeof(long) * num_pages, GFP_KERNEL);
if (!bitmap->filemap_attr)
goto out;
- memset(bitmap->filemap_attr, 0, sizeof(long) * num_pages);
-
oldindex = ~0L;
for (i = 0; i < chunks; i++) {
@@ -1557,12 +1555,10 @@ int bitmap_create(mddev_t *mddev)
BUG_ON(file && mddev->bitmap_offset);
- bitmap = kmalloc(sizeof(*bitmap), GFP_KERNEL);
+ bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL);
if (!bitmap)
return -ENOMEM;
- memset(bitmap, 0, sizeof(*bitmap));
-
spin_lock_init(&bitmap->lock);
bitmap->mddev = mddev;
@@ -1603,12 +1599,11 @@ int bitmap_create(mddev_t *mddev)
#ifdef INJECT_FATAL_FAULT_1
bitmap->bp = NULL;
#else
- bitmap->bp = kmalloc(pages * sizeof(*bitmap->bp), GFP_KERNEL);
+ bitmap->bp = kzalloc(pages * sizeof(*bitmap->bp), GFP_KERNEL);
#endif
err = -ENOMEM;
if (!bitmap->bp)
goto error;
- memset(bitmap->bp, 0, pages * sizeof(*bitmap->bp));
bitmap->flags |= BITMAP_ACTIVE;
diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/linear.c 2005-12-01 13:59:53.000000000 +1100
@@ -121,11 +121,10 @@ static int linear_run (mddev_t *mddev)
sector_t curr_offset;
struct list_head *tmp;
- conf = kmalloc (sizeof (*conf) + mddev->raid_disks*sizeof(dev_info_t),
+ conf = kzalloc (sizeof (*conf) + mddev->raid_disks*sizeof(dev_info_t),
GFP_KERNEL);
if (!conf)
goto out;
- memset(conf, 0, sizeof(*conf) + mddev->raid_disks*sizeof(dev_info_t));
mddev->private = conf;
cnt = 0;
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/md.c 2005-12-01 13:59:53.000000000 +1100
@@ -228,12 +228,10 @@ static mddev_t * mddev_find(dev_t unit)
}
spin_unlock(&all_mddevs_lock);
- new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL);
+ new = (mddev_t *) kzalloc(sizeof(*new), GFP_KERNEL);
if (!new)
return NULL;
- memset(new, 0, sizeof(*new));
-
new->unit = unit;
if (MAJOR(unit) == MD_MAJOR)
new->md_minor = MINOR(unit);
@@ -1620,12 +1618,11 @@ static mdk_rdev_t *md_import_device(dev_
mdk_rdev_t *rdev;
sector_t size;
- rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL);
+ rdev = (mdk_rdev_t *) kzalloc(sizeof(*rdev), GFP_KERNEL);
if (!rdev) {
printk(KERN_ERR "md: could not alloc mem for new device!\n");
return ERR_PTR(-ENOMEM);
}
- memset(rdev, 0, sizeof(*rdev));
if ((err = alloc_disk_sb(rdev)))
goto abort_free;
@@ -3497,11 +3494,10 @@ mdk_thread_t *md_register_thread(void (*
{
mdk_thread_t *thread;
- thread = kmalloc(sizeof(mdk_thread_t), GFP_KERNEL);
+ thread = kzalloc(sizeof(mdk_thread_t), GFP_KERNEL);
if (!thread)
return NULL;
- memset(thread, 0, sizeof(mdk_thread_t));
init_waitqueue_head(&thread->wqueue);
thread->run = run;
diff ./drivers/md/multipath.c~current~ ./drivers/md/multipath.c
--- ./drivers/md/multipath.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/multipath.c 2005-12-01 13:59:53.000000000 +1100
@@ -41,9 +41,7 @@ static mdk_personality_t multipath_perso
static void *mp_pool_alloc(gfp_t gfp_flags, void *data)
{
struct multipath_bh *mpb;
- mpb = kmalloc(sizeof(*mpb), gfp_flags);
- if (mpb)
- memset(mpb, 0, sizeof(*mpb));
+ mpb = kzalloc(sizeof(*mpb), gfp_flags);
return mpb;
}
@@ -444,7 +442,7 @@ static int multipath_run (mddev_t *mddev
* should be freed in multipath_stop()]
*/
- conf = kmalloc(sizeof(multipath_conf_t), GFP_KERNEL);
+ conf = kzalloc(sizeof(multipath_conf_t), GFP_KERNEL);
mddev->private = conf;
if (!conf) {
printk(KERN_ERR
@@ -452,9 +450,8 @@ static int multipath_run (mddev_t *mddev
mdname(mddev));
goto out;
}
- memset(conf, 0, sizeof(*conf));
- conf->multipaths = kmalloc(sizeof(struct multipath_info)*mddev->raid_disks,
+ conf->multipaths = kzalloc(sizeof(struct multipath_info)*mddev->raid_disks,
GFP_KERNEL);
if (!conf->multipaths) {
printk(KERN_ERR
@@ -462,7 +459,6 @@ static int multipath_run (mddev_t *mddev
mdname(mddev));
goto out_free_conf;
}
- memset(conf->multipaths, 0, sizeof(struct multipath_info)*mddev->raid_disks);
conf->working_disks = 0;
ITERATE_RDEV(mddev,rdev,tmp) {
diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/raid0.c 2005-12-01 13:59:53.000000000 +1100
@@ -113,21 +113,16 @@ static int create_strip_zones (mddev_t *
}
printk("raid0: FINAL %d zones\n", conf->nr_strip_zones);
- conf->strip_zone = kmalloc(sizeof(struct strip_zone)*
+ conf->strip_zone = kzalloc(sizeof(struct strip_zone)*
conf->nr_strip_zones, GFP_KERNEL);
if (!conf->strip_zone)
return 1;
- conf->devlist = kmalloc(sizeof(mdk_rdev_t*)*
+ conf->devlist = kzalloc(sizeof(mdk_rdev_t*)*
conf->nr_strip_zones*mddev->raid_disks,
GFP_KERNEL);
if (!conf->devlist)
return 1;
- memset(conf->strip_zone, 0,sizeof(struct strip_zone)*
- conf->nr_strip_zones);
- memset(conf->devlist, 0,
- sizeof(mdk_rdev_t*) * conf->nr_strip_zones * mddev->raid_disks);
-
/* The first zone must contain all devices, so here we check that
* there is a proper alignment of slots to devices and find them all
*/
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-01 13:59:53.000000000 +1100
@@ -61,10 +61,8 @@ static void * r1bio_pool_alloc(gfp_t gfp
int size = offsetof(r1bio_t, bios[pi->raid_disks]);
/* allocate a r1bio with room for raid_disks entries in the bios array */
- r1_bio = kmalloc(size, gfp_flags);
- if (r1_bio)
- memset(r1_bio, 0, size);
- else
+ r1_bio = kzalloc(size, gfp_flags);
+ if (!r1_bio)
unplug_slaves(pi->mddev);
return r1_bio;
@@ -710,13 +708,11 @@ static struct page **alloc_behind_pages(
{
int i;
struct bio_vec *bvec;
- struct page **pages = kmalloc(bio->bi_vcnt * sizeof(struct page *),
+ struct page **pages = kzalloc(bio->bi_vcnt * sizeof(struct page *),
GFP_NOIO);
if (unlikely(!pages))
goto do_sync_io;
- memset(pages, 0, bio->bi_vcnt * sizeof(struct page *));
-
bio_for_each_segment(bvec, bio, i) {
pages[i] = alloc_page(GFP_NOIO);
if (unlikely(!pages[i]))
@@ -1769,19 +1765,16 @@ static int run(mddev_t *mddev)
* bookkeeping area. [whatever we allocate in run(),
* should be freed in stop()]
*/
- conf = kmalloc(sizeof(conf_t), GFP_KERNEL);
+ conf = kzalloc(sizeof(conf_t), GFP_KERNEL);
mddev->private = conf;
if (!conf)
goto out_no_mem;
- memset(conf, 0, sizeof(*conf));
- conf->mirrors = kmalloc(sizeof(struct mirror_info)*mddev->raid_disks,
+ conf->mirrors = kzalloc(sizeof(struct mirror_info)*mddev->raid_disks,
GFP_KERNEL);
if (!conf->mirrors)
goto out_no_mem;
- memset(conf->mirrors, 0, sizeof(struct mirror_info)*mddev->raid_disks);
-
conf->tmppage = alloc_page(GFP_KERNEL);
if (!conf->tmppage)
goto out_no_mem;
@@ -1991,13 +1984,12 @@ static int raid1_reshape(mddev_t *mddev,
kfree(newpoolinfo);
return -ENOMEM;
}
- newmirrors = kmalloc(sizeof(struct mirror_info) * raid_disks, GFP_KERNEL);
+ newmirrors = kzalloc(sizeof(struct mirror_info) * raid_disks, GFP_KERNEL);
if (!newmirrors) {
kfree(newpoolinfo);
mempool_destroy(newpool);
return -ENOMEM;
}
- memset(newmirrors, 0, sizeof(struct mirror_info)*raid_disks);
raise_barrier(conf);
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-01 13:59:53.000000000 +1100
@@ -59,10 +59,8 @@ static void * r10bio_pool_alloc(gfp_t gf
int size = offsetof(struct r10bio_s, devs[conf->copies]);
/* allocate a r10bio with room for raid_disks entries in the bios array */
- r10_bio = kmalloc(size, gfp_flags);
- if (r10_bio)
- memset(r10_bio, 0, size);
- else
+ r10_bio = kzalloc(size, gfp_flags);
+ if (!r10_bio)
unplug_slaves(conf->mddev);
return r10_bio;
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-01 14:01:30.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-01 14:01:46.000000000 +1100
@@ -1826,12 +1826,12 @@ static int run(mddev_t *mddev)
return -EIO;
}
- mddev->private = kmalloc (sizeof (raid5_conf_t)
- + mddev->raid_disks * sizeof(struct disk_info),
- GFP_KERNEL);
+ mddev->private = kzalloc(sizeof (raid5_conf_t)
+ + mddev->raid_disks * sizeof(struct disk_info),
+ GFP_KERNEL);
if ((conf = mddev->private) == NULL)
goto abort;
- memset (conf, 0, sizeof (*conf) + mddev->raid_disks * sizeof(struct disk_info) );
+
conf->mddev = mddev;
if ((conf->stripe_hashtbl = (struct stripe_head **) __get_free_pages(GFP_ATOMIC, HASH_PAGES_ORDER)) == NULL)
* [PATCH md 009 of 14] Tidy up raid5/6 hash table code.
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (7 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 008 of 14] Convert md to use kzalloc throughout NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 3:23 ` [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic NeilBrown
` (4 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
- replace open-coded hash chain with hlist macros
- Fix the hash-table size at one page - one page is already quite
generous, so multiple pages will never be needed and
__get_free_pages is no longer required
No functional change.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 40 +++++++++++++------------------------
./drivers/md/raid6main.c | 46 +++++++++++++++----------------------------
./include/linux/raid/raid5.h | 4 +--
3 files changed, 33 insertions(+), 57 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-01 14:01:46.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-01 14:01:59.000000000 +1100
@@ -35,12 +35,10 @@
#define STRIPE_SHIFT (PAGE_SHIFT - 9)
#define STRIPE_SECTORS (STRIPE_SIZE>>9)
#define IO_THRESHOLD 1
-#define HASH_PAGES 1
-#define HASH_PAGES_ORDER 0
-#define NR_HASH (HASH_PAGES * PAGE_SIZE / sizeof(struct stripe_head *))
+#define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
#define HASH_MASK (NR_HASH - 1)
-#define stripe_hash(conf, sect) ((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK])
+#define stripe_hash(conf, sect) (&((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK]))
/* bio's attached to a stripe+device for I/O are linked together in bi_sector
* order without overlap. There may be several bio's per stripe+device, and
@@ -113,29 +111,21 @@ static void release_stripe(struct stripe
spin_unlock_irqrestore(&conf->device_lock, flags);
}
-static void remove_hash(struct stripe_head *sh)
+static inline void remove_hash(struct stripe_head *sh)
{
PRINTK("remove_hash(), stripe %llu\n", (unsigned long long)sh->sector);
- if (sh->hash_pprev) {
- if (sh->hash_next)
- sh->hash_next->hash_pprev = sh->hash_pprev;
- *sh->hash_pprev = sh->hash_next;
- sh->hash_pprev = NULL;
- }
+ hlist_del_init(&sh->hash);
}
-static __inline__ void insert_hash(raid5_conf_t *conf, struct stripe_head *sh)
+static inline void insert_hash(raid5_conf_t *conf, struct stripe_head *sh)
{
- struct stripe_head **shp = &stripe_hash(conf, sh->sector);
+ struct hlist_head *hp = stripe_hash(conf, sh->sector);
PRINTK("insert_hash(), stripe %llu\n", (unsigned long long)sh->sector);
CHECK_DEVLOCK();
- if ((sh->hash_next = *shp) != NULL)
- (*shp)->hash_pprev = &sh->hash_next;
- *shp = sh;
- sh->hash_pprev = shp;
+ hlist_add_head(&sh->hash, hp);
}
@@ -228,10 +218,11 @@ static inline void init_stripe(struct st
static struct stripe_head *__find_stripe(raid5_conf_t *conf, sector_t sector)
{
struct stripe_head *sh;
+ struct hlist_node *hn;
CHECK_DEVLOCK();
PRINTK("__find_stripe, sector %llu\n", (unsigned long long)sector);
- for (sh = stripe_hash(conf, sector); sh; sh = sh->hash_next)
+ hlist_for_each_entry(sh, hn, stripe_hash(conf, sector), hash)
if (sh->sector == sector)
return sh;
PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector);
@@ -1834,9 +1825,8 @@ static int run(mddev_t *mddev)
conf->mddev = mddev;
- if ((conf->stripe_hashtbl = (struct stripe_head **) __get_free_pages(GFP_ATOMIC, HASH_PAGES_ORDER)) == NULL)
+ if ((conf->stripe_hashtbl = kzalloc(PAGE_SIZE, GFP_KERNEL)) == NULL)
goto abort;
- memset(conf->stripe_hashtbl, 0, HASH_PAGES * PAGE_SIZE);
spin_lock_init(&conf->device_lock);
init_waitqueue_head(&conf->wait_for_stripe);
@@ -1971,9 +1961,7 @@ memory = conf->max_nr_stripes * (sizeof(
abort:
if (conf) {
print_raid5_conf(conf);
- if (conf->stripe_hashtbl)
- free_pages((unsigned long) conf->stripe_hashtbl,
- HASH_PAGES_ORDER);
+ kfree(conf->stripe_hashtbl);
kfree(conf);
}
mddev->private = NULL;
@@ -1990,7 +1978,7 @@ static int stop(mddev_t *mddev)
md_unregister_thread(mddev->thread);
mddev->thread = NULL;
shrink_stripes(conf);
- free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
+ kfree(conf->stripe_hashtbl);
blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
sysfs_remove_group(&mddev->kobj, &raid5_attrs_group);
kfree(conf);
@@ -2018,12 +2006,12 @@ static void print_sh (struct stripe_head
static void printall (raid5_conf_t *conf)
{
struct stripe_head *sh;
+ struct hlist_node *hn;
int i;
spin_lock_irq(&conf->device_lock);
for (i = 0; i < NR_HASH; i++) {
- sh = conf->stripe_hashtbl[i];
- for (; sh; sh = sh->hash_next) {
+ hlist_for_each_entry(sh, hn, &conf->stripe_hashtbl[i], hash) {
if (sh->raid_conf != conf)
continue;
print_sh(sh);
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-01 14:01:58.000000000 +1100
@@ -40,12 +40,10 @@
#define STRIPE_SHIFT (PAGE_SHIFT - 9)
#define STRIPE_SECTORS (STRIPE_SIZE>>9)
#define IO_THRESHOLD 1
-#define HASH_PAGES 1
-#define HASH_PAGES_ORDER 0
-#define NR_HASH (HASH_PAGES * PAGE_SIZE / sizeof(struct stripe_head *))
+#define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
#define HASH_MASK (NR_HASH - 1)
-#define stripe_hash(conf, sect) ((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK])
+#define stripe_hash(conf, sect) (&((conf)->stripe_hashtbl[((sect) >> STRIPE_SHIFT) & HASH_MASK]))
/* bio's attached to a stripe+device for I/O are linked together in bi_sector
* order without overlap. There may be several bio's per stripe+device, and
@@ -132,29 +130,21 @@ static void release_stripe(struct stripe
spin_unlock_irqrestore(&conf->device_lock, flags);
}
-static void remove_hash(struct stripe_head *sh)
+static inline void remove_hash(struct stripe_head *sh)
{
PRINTK("remove_hash(), stripe %llu\n", (unsigned long long)sh->sector);
- if (sh->hash_pprev) {
- if (sh->hash_next)
- sh->hash_next->hash_pprev = sh->hash_pprev;
- *sh->hash_pprev = sh->hash_next;
- sh->hash_pprev = NULL;
- }
+ hlist_del_init(&sh->hash);
}
-static __inline__ void insert_hash(raid6_conf_t *conf, struct stripe_head *sh)
+static inline void insert_hash(raid6_conf_t *conf, struct stripe_head *sh)
{
- struct stripe_head **shp = &stripe_hash(conf, sh->sector);
+ struct hlist_head *hp = stripe_hash(conf, sh->sector);
PRINTK("insert_hash(), stripe %llu\n", (unsigned long long)sh->sector);
CHECK_DEVLOCK();
- if ((sh->hash_next = *shp) != NULL)
- (*shp)->hash_pprev = &sh->hash_next;
- *shp = sh;
- sh->hash_pprev = shp;
+ hlist_add_head(&sh->hash, hp);
}
@@ -247,10 +237,11 @@ static inline void init_stripe(struct st
static struct stripe_head *__find_stripe(raid6_conf_t *conf, sector_t sector)
{
struct stripe_head *sh;
+ struct hlist_node *hn;
CHECK_DEVLOCK();
PRINTK("__find_stripe, sector %llu\n", (unsigned long long)sector);
- for (sh = stripe_hash(conf, sector); sh; sh = sh->hash_next)
+ hlist_for_each_entry (sh, hn, stripe_hash(conf, sector), hash)
if (sh->sector == sector)
return sh;
PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector);
@@ -1931,17 +1922,15 @@ static int run(mddev_t *mddev)
return -EIO;
}
- mddev->private = kmalloc (sizeof (raid6_conf_t)
- + mddev->raid_disks * sizeof(struct disk_info),
- GFP_KERNEL);
+ mddev->private = kzalloc(sizeof (raid6_conf_t)
+ + mddev->raid_disks * sizeof(struct disk_info),
+ GFP_KERNEL);
if ((conf = mddev->private) == NULL)
goto abort;
- memset (conf, 0, sizeof (*conf) + mddev->raid_disks * sizeof(struct disk_info) );
conf->mddev = mddev;
- if ((conf->stripe_hashtbl = (struct stripe_head **) __get_free_pages(GFP_ATOMIC, HASH_PAGES_ORDER)) == NULL)
+ if ((conf->stripe_hashtbl = kzalloc(PAGE_SIZE, GFP_KERNEL)) == NULL)
goto abort;
- memset(conf->stripe_hashtbl, 0, HASH_PAGES * PAGE_SIZE);
conf->spare_page = alloc_page(GFP_KERNEL);
if (!conf->spare_page)
@@ -2085,9 +2074,7 @@ abort:
print_raid6_conf(conf);
if (conf->spare_page)
put_page(conf->spare_page);
- if (conf->stripe_hashtbl)
- free_pages((unsigned long) conf->stripe_hashtbl,
- HASH_PAGES_ORDER);
+ kfree(conf->stripe_hashtbl);
kfree(conf);
}
mddev->private = NULL;
@@ -2104,7 +2091,7 @@ static int stop (mddev_t *mddev)
md_unregister_thread(mddev->thread);
mddev->thread = NULL;
shrink_stripes(conf);
- free_pages((unsigned long) conf->stripe_hashtbl, HASH_PAGES_ORDER);
+ kfree(conf->stripe_hashtbl);
blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
kfree(conf);
mddev->private = NULL;
@@ -2131,12 +2118,13 @@ static void print_sh (struct seq_file *s
static void printall (struct seq_file *seq, raid6_conf_t *conf)
{
struct stripe_head *sh;
+ struct hlist_node *hn;
int i;
spin_lock_irq(&conf->device_lock);
for (i = 0; i < NR_HASH; i++) {
- sh = conf->stripe_hashtbl[i];
- for (; sh; sh = sh->hash_next) {
+ hlist_for_each_entry(sh, hn, &conf->stripe_hashtbl[i], hash) {
if (sh->raid_conf != conf)
continue;
print_sh(seq, sh);
diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./include/linux/raid/raid5.h 2005-12-01 14:01:58.000000000 +1100
@@ -126,7 +126,7 @@
*/
struct stripe_head {
- struct stripe_head *hash_next, **hash_pprev; /* hash pointers */
+ struct hlist_node hash;
struct list_head lru; /* inactive_list or handle_list */
struct raid5_private_data *raid_conf;
sector_t sector; /* sector of this row */
@@ -204,7 +204,7 @@ struct disk_info {
};
struct raid5_private_data {
- struct stripe_head **stripe_hashtbl;
+ struct hlist_head *stripe_hashtbl;
mddev_t *mddev;
struct disk_info *spare;
int chunk_size, level, algorithm;
* [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (8 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 009 of 14] Tidy up raid5/6 hash table code NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 22:46 ` Andrew Morton
2005-12-01 3:23 ` [PATCH md 011 of 14] Convert recently exported symbol to GPL NeilBrown
` (3 subsequent siblings)
13 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 44 +++++++++++++++++++++-----------------------
1 file changed, 21 insertions(+), 23 deletions(-)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-12-01 14:02:54.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-12-01 14:02:50.000000000 +1100
@@ -406,11 +406,11 @@ int bitmap_update_sb(struct bitmap *bitm
return 0;
}
spin_unlock_irqrestore(&bitmap->lock, flags);
- sb = (bitmap_super_t *)kmap(bitmap->sb_page);
+ sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
sb->events = cpu_to_le64(bitmap->mddev->events);
if (!bitmap->mddev->degraded)
sb->events_cleared = cpu_to_le64(bitmap->mddev->events);
- kunmap(bitmap->sb_page);
+ kunmap_atomic(sb, KM_USER0);
return write_page(bitmap, bitmap->sb_page, 1);
}
@@ -421,7 +421,7 @@ void bitmap_print_sb(struct bitmap *bitm
if (!bitmap || !bitmap->sb_page)
return;
- sb = (bitmap_super_t *)kmap(bitmap->sb_page);
+ sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
printk(KERN_DEBUG "%s: bitmap file superblock:\n", bmname(bitmap));
printk(KERN_DEBUG " magic: %08x\n", le32_to_cpu(sb->magic));
printk(KERN_DEBUG " version: %d\n", le32_to_cpu(sb->version));
@@ -440,7 +440,7 @@ void bitmap_print_sb(struct bitmap *bitm
printk(KERN_DEBUG " sync size: %llu KB\n",
(unsigned long long)le64_to_cpu(sb->sync_size)/2);
printk(KERN_DEBUG "max write behind: %d\n", le32_to_cpu(sb->write_behind));
- kunmap(bitmap->sb_page);
+ kunmap_atomic(sb, KM_USER0);
}
/* read the superblock from the bitmap file and initialize some bitmap fields */
@@ -466,7 +466,7 @@ static int bitmap_read_sb(struct bitmap
return err;
}
- sb = (bitmap_super_t *)kmap(bitmap->sb_page);
+ sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
if (bytes_read < sizeof(*sb)) { /* short read */
printk(KERN_INFO "%s: bitmap file superblock truncated\n",
@@ -535,7 +535,7 @@ success:
bitmap->events_cleared = bitmap->mddev->events;
err = 0;
out:
- kunmap(bitmap->sb_page);
+ kunmap_atomic(sb, KM_USER0);
if (err)
bitmap_print_sb(bitmap);
return err;
@@ -560,7 +560,7 @@ static void bitmap_mask_state(struct bit
}
get_page(bitmap->sb_page);
spin_unlock_irqrestore(&bitmap->lock, flags);
- sb = (bitmap_super_t *)kmap(bitmap->sb_page);
+ sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
switch (op) {
case MASK_SET: sb->state |= bits;
break;
@@ -568,7 +568,7 @@ static void bitmap_mask_state(struct bit
break;
default: BUG();
}
- kunmap(bitmap->sb_page);
+ kunmap_atomic(sb, KM_USER0);
put_page(bitmap->sb_page);
}
@@ -854,6 +854,7 @@ static int bitmap_init_from_disk(struct
unsigned long bytes, offset, dummy;
int outofdate;
int ret = -ENOSPC;
+ void *paddr;
chunks = bitmap->chunks;
file = bitmap->file;
@@ -899,8 +900,6 @@ static int bitmap_init_from_disk(struct
bit = file_page_offset(i);
if (index != oldindex) { /* this is a new page, read it in */
/* unmap the old page, we're done with it */
- if (oldpage != NULL)
- kunmap(oldpage);
if (index == 0) {
/*
* if we're here then the superblock page
@@ -923,18 +922,18 @@ static int bitmap_init_from_disk(struct
oldindex = index;
oldpage = page;
- kmap(page);
if (outofdate) {
/*
* if bitmap is out of date, dirty the
* whole page and write it out
*/
- memset(page_address(page) + offset, 0xff,
+ paddr = kmap_atomic(page, KM_USER0);
+ memset(paddr + offset, 0xff,
PAGE_SIZE - offset);
+ kunmap_atomic(paddr, KM_USER0);
ret = write_page(bitmap, page, 1);
if (ret) {
- kunmap(page);
/* release, page not in filemap yet */
put_page(page);
goto out;
@@ -943,10 +942,12 @@ static int bitmap_init_from_disk(struct
bitmap->filemap[bitmap->file_pages++] = page;
}
+ paddr = kmap_atomic(page, KM_USER0);
if (bitmap->flags & BITMAP_HOSTENDIAN)
- b = test_bit(bit, page_address(page));
+ b = test_bit(bit, paddr);
else
- b = ext2_test_bit(bit, page_address(page));
+ b = ext2_test_bit(bit, paddr);
+ kunmap_atomic(paddr, KM_USER0);
if (b) {
/* if the disk bit is set, set the memory bit */
bitmap_set_memory_bits(bitmap, i << CHUNK_BLOCK_SHIFT(bitmap),
@@ -961,9 +962,6 @@ static int bitmap_init_from_disk(struct
ret = 0;
bitmap_mask_state(bitmap, BITMAP_STALE, MASK_UNSET);
- if (page) /* unmap the last page */
- kunmap(page);
-
if (bit_cnt) { /* Kick recovery if any bits were set */
set_bit(MD_RECOVERY_NEEDED, &bitmap->mddev->recovery);
md_wakeup_thread(bitmap->mddev->thread);
@@ -1019,6 +1017,7 @@ int bitmap_daemon_work(struct bitmap *bi
int err = 0;
int blocks;
int attr;
+ void *paddr;
if (bitmap == NULL)
return 0;
@@ -1075,14 +1074,12 @@ int bitmap_daemon_work(struct bitmap *bi
set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
spin_unlock_irqrestore(&bitmap->lock, flags);
}
- kunmap(lastpage);
put_page(lastpage);
if (err)
bitmap_file_kick(bitmap);
} else
spin_unlock_irqrestore(&bitmap->lock, flags);
lastpage = page;
- kmap(page);
/*
printk("bitmap clean at page %lu\n", j);
*/
@@ -1105,10 +1102,12 @@ int bitmap_daemon_work(struct bitmap *bi
-1);
/* clear the bit */
+ paddr = kmap_atomic(page, KM_USER0);
if (bitmap->flags & BITMAP_HOSTENDIAN)
- clear_bit(file_page_offset(j), page_address(page));
+ clear_bit(file_page_offset(j), paddr);
else
- ext2_clear_bit(file_page_offset(j), page_address(page));
+ ext2_clear_bit(file_page_offset(j), paddr);
+ kunmap_atomic(paddr, KM_USER0);
}
}
spin_unlock_irqrestore(&bitmap->lock, flags);
@@ -1116,7 +1115,6 @@ int bitmap_daemon_work(struct bitmap *bi
/* now sync the final page */
if (lastpage != NULL) {
- kunmap(lastpage);
spin_lock_irqsave(&bitmap->lock, flags);
if (get_page_attr(bitmap, lastpage) &BITMAP_PAGE_NEEDWRITE) {
clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
* [PATCH md 011 of 14] Convert recently exported symbol to GPL
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (9 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 3:23 ` [PATCH md 012 of 14] Break out of a loop that doesn't need to run to completion NeilBrown
` (2 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
...because that seems to be the preferred practice these days.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/md.c 2005-12-01 14:03:07.000000000 +1100
@@ -478,7 +478,7 @@ int sync_page_io(struct block_device *bd
bio_put(bio);
return ret;
}
-EXPORT_SYMBOL(sync_page_io);
+EXPORT_SYMBOL_GPL(sync_page_io);
static int read_disk_sb(mdk_rdev_t * rdev, int size)
{
* [PATCH md 012 of 14] Break out of a loop that doesn't need to run to completion.
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (10 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 011 of 14] Convert recently exported symbol to GPL NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 3:23 ` [PATCH md 013 of 14] Remove personality numbering from md NeilBrown
2005-12-01 3:24 ` [PATCH md 014 of 14] Fix possible problem in raid1/raid10 error overwriting NeilBrown
13 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-01 14:03:16.000000000 +1100
@@ -1672,8 +1672,10 @@ static sector_t sync_request(mddev_t *md
for (j=0; j<conf->copies;j++) {
int d = r10_bio->devs[j].devnum;
if (conf->mirrors[d].rdev == NULL ||
- test_bit(Faulty, &conf->mirrors[d].rdev->flags))
+ test_bit(Faulty, &conf->mirrors[d].rdev->flags)) {
still_degraded = 1;
+ break;
+ }
}
must_sync = bitmap_start_sync(mddev->bitmap, sect,
&sync_blocks, still_degraded);
* [PATCH md 013 of 14] Remove personality numbering from md.
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (11 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 012 of 14] Break out of a loop that doesn't need to run to completion NeilBrown
@ 2005-12-01 3:23 ` NeilBrown
2005-12-01 3:24 ` [PATCH md 014 of 14] Fix possible problem in raid1/raid10 error overwriting NeilBrown
13 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:23 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
md supports multiple different RAID levels, each implemented
by a 'personality' (which is often in a separate
module).
These personalities have fairly artificial 'numbers'. The numbers
are used to:
1- provide an index into an array where the various personalities
are recorded
2- identify the module (via an alias) which implements a particular
personality.
Neither of these uses really justifies the existence of personality numbers.
The array can be replaced by a linked list which is searched (array lookup
only happens very rarely). Module identification can be done using an
alias based on level rather than 'personality' number.
The current 'raid5' module supports two levels (4 and 5) but only one
personality. This slight awkwardness (which was handled in the mapping
from level to personality) can be better handled by allowing raid5
to register two personalities.
With this change in place, the core md module does not need to have an
exhaustive list of all possible personalities, so other personalities
can be added independently.
This patch also moves the check for chunksize being non-zero
into the ->run routines for the personalities that need it,
rather than having it in core-md. This has a side effect of allowing
'faulty' and 'linear' not to have a chunk-size set.
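The registry change described above can be sketched in a minimal
userspace analogue (not the kernel code itself; `struct personality`,
`register_personality` and the stub list here are hypothetical
stand-ins for the kernel's `struct mdk_personality`, `pers_list` and
`find_pers`, without the locking and module refcounting the real code
needs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each personality carries its own level; lookup walks a linked list
 * instead of indexing a fixed-size array of magic personality numbers,
 * so new levels can register without core md knowing about them. */
struct personality {
    const char *name;
    int level;                  /* e.g. 0, 1, 5, or a negative pseudo-level */
    struct personality *next;
};

static struct personality *pers_list;

static void register_personality(struct personality *p)
{
    p->next = pers_list;
    pers_list = p;
}

static struct personality *find_pers(int level)
{
    struct personality *p;

    for (p = pers_list; p; p = p->next)
        if (p->level == level)
            return p;
    return NULL;    /* kernel would then try request_module("md-level-%d") */
}
```

Note how one module can now serve two levels simply by registering two
entries, which is how the patch lets raid5.c provide both "raid4" and
"raid5".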
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/faulty.c | 8 ++--
./drivers/md/linear.c | 10 +++--
./drivers/md/md.c | 79 ++++++++++++++++----------------------------
./drivers/md/multipath.c | 11 ++----
./drivers/md/raid0.c | 14 +++++--
./drivers/md/raid1.c | 9 ++---
./drivers/md/raid10.c | 16 +++++---
./drivers/md/raid5.c | 34 ++++++++++++++++--
./drivers/md/raid6main.c | 10 +++--
./include/linux/raid/md.h | 4 +-
./include/linux/raid/md_k.h | 63 +++++------------------------------
./init/do_mounts_md.c | 22 +++++-------
12 files changed, 125 insertions(+), 155 deletions(-)
diff ./drivers/md/faulty.c~current~ ./drivers/md/faulty.c
--- ./drivers/md/faulty.c~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./drivers/md/faulty.c 2005-12-01 14:03:25.000000000 +1100
@@ -316,9 +316,10 @@ static int stop(mddev_t *mddev)
return 0;
}
-static mdk_personality_t faulty_personality =
+static struct mdk_personality faulty_personality =
{
.name = "faulty",
+ .level = LEVEL_FAULTY,
.owner = THIS_MODULE,
.make_request = make_request,
.run = run,
@@ -329,15 +330,16 @@ static mdk_personality_t faulty_personal
static int __init raid_init(void)
{
- return register_md_personality(FAULTY, &faulty_personality);
+ return register_md_personality(&faulty_personality);
}
static void raid_exit(void)
{
- unregister_md_personality(FAULTY);
+ unregister_md_personality(&faulty_personality);
}
module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-10"); /* faulty */
+MODULE_ALIAS("md-level--5");
diff ./drivers/md/linear.c~current~ ./drivers/md/linear.c
--- ./drivers/md/linear.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/linear.c 2005-12-01 14:03:25.000000000 +1100
@@ -351,9 +351,10 @@ static void linear_status (struct seq_fi
}
-static mdk_personality_t linear_personality=
+static struct mdk_personality linear_personality =
{
.name = "linear",
+ .level = LEVEL_LINEAR,
.owner = THIS_MODULE,
.make_request = linear_make_request,
.run = linear_run,
@@ -363,16 +364,17 @@ static mdk_personality_t linear_personal
static int __init linear_init (void)
{
- return register_md_personality (LINEAR, &linear_personality);
+ return register_md_personality (&linear_personality);
}
static void linear_exit (void)
{
- unregister_md_personality (LINEAR);
+ unregister_md_personality (&linear_personality);
}
module_init(linear_init);
module_exit(linear_exit);
MODULE_LICENSE("GPL");
-MODULE_ALIAS("md-personality-1"); /* LINEAR */
+MODULE_ALIAS("md-personality-1"); /* LINEAR - deprecated */
+MODULE_ALIAS("md-level--1");
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-12-01 14:03:07.000000000 +1100
+++ ./drivers/md/md.c 2005-12-01 14:03:25.000000000 +1100
@@ -68,7 +68,7 @@
static void autostart_arrays (int part);
#endif
-static mdk_personality_t *pers[MAX_PERSONALITY];
+static LIST_HEAD(pers_list);
static DEFINE_SPINLOCK(pers_lock);
/*
@@ -303,6 +303,15 @@ static mdk_rdev_t * find_rdev(mddev_t *
return NULL;
}
+static struct mdk_personality *find_pers(int level)
+{
+ struct mdk_personality *pers;
+ list_for_each_entry(pers, &pers_list, list)
+ if (pers->level == level)
+ return pers;
+ return NULL;
+}
+
static inline sector_t calc_dev_sboffset(struct block_device *bdev)
{
sector_t size = bdev->bd_inode->i_size >> BLOCK_SIZE_BITS;
@@ -1744,7 +1753,7 @@ static void analyze_sbs(mddev_t * mddev)
static ssize_t
level_show(mddev_t *mddev, char *page)
{
- mdk_personality_t *p = mddev->pers;
+ struct mdk_personality *p = mddev->pers;
if (p == NULL && mddev->raid_disks == 0)
return 0;
if (mddev->level >= 0)
@@ -1960,11 +1969,12 @@ static int start_dirty_degraded;
static int do_md_run(mddev_t * mddev)
{
- int pnum, err;
+ int err;
int chunk_size;
struct list_head *tmp;
mdk_rdev_t *rdev;
struct gendisk *disk;
+ struct mdk_personality *pers;
char b[BDEVNAME_SIZE];
if (list_empty(&mddev->disks))
@@ -1981,20 +1991,8 @@ static int do_md_run(mddev_t * mddev)
analyze_sbs(mddev);
chunk_size = mddev->chunk_size;
- pnum = level_to_pers(mddev->level);
- if ((pnum != MULTIPATH) && (pnum != RAID1)) {
- if (!chunk_size) {
- /*
- * 'default chunksize' in the old md code used to
- * be PAGE_SIZE, baaad.
- * we abort here to be on the safe side. We don't
- * want to continue the bad practice.
- */
- printk(KERN_ERR
- "no chunksize specified, see 'man raidtab'\n");
- return -EINVAL;
- }
+ if (chunk_size) {
if (chunk_size > MAX_CHUNK_SIZE) {
printk(KERN_ERR "too big chunk_size: %d > %d\n",
chunk_size, MAX_CHUNK_SIZE);
@@ -2030,10 +2028,7 @@ static int do_md_run(mddev_t * mddev)
}
#ifdef CONFIG_KMOD
- if (!pers[pnum])
- {
- request_module("md-personality-%d", pnum);
- }
+ request_module("md-level-%d", mddev->level);
#endif
/*
@@ -2055,14 +2050,14 @@ static int do_md_run(mddev_t * mddev)
return -ENOMEM;
spin_lock(&pers_lock);
- if (!pers[pnum] || !try_module_get(pers[pnum]->owner)) {
+ pers = find_pers(mddev->level);
+ if (!pers || !try_module_get(pers->owner)) {
spin_unlock(&pers_lock);
- printk(KERN_WARNING "md: personality %d is not loaded!\n",
- pnum);
+ printk(KERN_WARNING "md: personality for level %d is not loaded!\n",
+ mddev->level);
return -EINVAL;
}
-
- mddev->pers = pers[pnum];
+ mddev->pers = pers;
spin_unlock(&pers_lock);
mddev->recovery = 0;
@@ -3693,15 +3688,14 @@ static int md_seq_show(struct seq_file *
struct list_head *tmp2;
mdk_rdev_t *rdev;
struct mdstat_info *mi = seq->private;
- int i;
struct bitmap *bitmap;
if (v == (void*)1) {
+ struct mdk_personality *pers;
seq_printf(seq, "Personalities : ");
spin_lock(&pers_lock);
- for (i = 0; i < MAX_PERSONALITY; i++)
- if (pers[i])
- seq_printf(seq, "[%s] ", pers[i]->name);
+ list_for_each_entry(pers, &pers_list, list)
+ seq_printf(seq, "[%s] ", pers->name);
spin_unlock(&pers_lock);
seq_printf(seq, "\n");
@@ -3862,35 +3856,20 @@ static struct file_operations md_seq_fop
.poll = mdstat_poll,
};
-int register_md_personality(int pnum, mdk_personality_t *p)
+int register_md_personality(struct mdk_personality *p)
{
- if (pnum >= MAX_PERSONALITY) {
- printk(KERN_ERR
- "md: tried to install personality %s as nr %d, but max is %lu\n",
- p->name, pnum, MAX_PERSONALITY-1);
- return -EINVAL;
- }
-
spin_lock(&pers_lock);
- if (pers[pnum]) {
- spin_unlock(&pers_lock);
- return -EBUSY;
- }
-
- pers[pnum] = p;
- printk(KERN_INFO "md: %s personality registered as nr %d\n", p->name, pnum);
+ list_add_tail(&p->list, &pers_list);
+ printk(KERN_INFO "md: %s personality registered for level %d\n", p->name, p->level);
spin_unlock(&pers_lock);
return 0;
}
-int unregister_md_personality(int pnum)
+int unregister_md_personality(struct mdk_personality *p)
{
- if (pnum >= MAX_PERSONALITY)
- return -EINVAL;
-
- printk(KERN_INFO "md: %s personality unregistered\n", pers[pnum]->name);
+ printk(KERN_INFO "md: %s personality unregistered\n", p->name);
spin_lock(&pers_lock);
- pers[pnum] = NULL;
+ list_del_init(&p->list);
spin_unlock(&pers_lock);
return 0;
}
diff ./drivers/md/multipath.c~current~ ./drivers/md/multipath.c
--- ./drivers/md/multipath.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/multipath.c 2005-12-01 14:03:25.000000000 +1100
@@ -35,9 +35,6 @@
#define NR_RESERVED_BUFS 32
-static mdk_personality_t multipath_personality;
-
-
static void *mp_pool_alloc(gfp_t gfp_flags, void *data)
{
struct multipath_bh *mpb;
@@ -553,9 +550,10 @@ static int multipath_stop (mddev_t *mdde
return 0;
}
-static mdk_personality_t multipath_personality=
+static struct mdk_personality multipath_personality =
{
.name = "multipath",
+ .level = LEVEL_MULTIPATH,
.owner = THIS_MODULE,
.make_request = multipath_make_request,
.run = multipath_run,
@@ -568,15 +566,16 @@ static mdk_personality_t multipath_perso
static int __init multipath_init (void)
{
- return register_md_personality (MULTIPATH, &multipath_personality);
+ return register_md_personality (&multipath_personality);
}
static void __exit multipath_exit (void)
{
- unregister_md_personality (MULTIPATH);
+ unregister_md_personality (&multipath_personality);
}
module_init(multipath_init);
module_exit(multipath_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-7"); /* MULTIPATH */
+MODULE_ALIAS("md-level--4");
diff ./drivers/md/raid0.c~current~ ./drivers/md/raid0.c
--- ./drivers/md/raid0.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/raid0.c 2005-12-01 14:03:25.000000000 +1100
@@ -275,7 +275,11 @@ static int raid0_run (mddev_t *mddev)
mdk_rdev_t *rdev;
struct list_head *tmp;
- printk("%s: setting max_sectors to %d, segment boundary to %d\n",
+ if (mddev->chunk_size == 0) {
+ printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
+ return -EINVAL;
+ }
+ printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
mdname(mddev),
mddev->chunk_size >> 9,
(mddev->chunk_size>>1)-1);
@@ -507,9 +511,10 @@ static void raid0_status (struct seq_fil
return;
}
-static mdk_personality_t raid0_personality=
+static struct mdk_personality raid0_personality=
{
.name = "raid0",
+ .level = 0,
.owner = THIS_MODULE,
.make_request = raid0_make_request,
.run = raid0_run,
@@ -519,15 +524,16 @@ static mdk_personality_t raid0_personali
static int __init raid0_init (void)
{
- return register_md_personality (RAID0, &raid0_personality);
+ return register_md_personality (&raid0_personality);
}
static void raid0_exit (void)
{
- unregister_md_personality (RAID0);
+ unregister_md_personality (&raid0_personality);
}
module_init(raid0_init);
module_exit(raid0_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-2"); /* RAID0 */
+MODULE_ALIAS("md-level-0");
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-01 13:59:53.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-01 14:03:25.000000000 +1100
@@ -47,7 +47,6 @@
*/
#define NR_RAID1_BIOS 256
-static mdk_personality_t raid1_personality;
static void unplug_slaves(mddev_t *mddev);
@@ -2035,9 +2034,10 @@ static void raid1_quiesce(mddev_t *mddev
}
-static mdk_personality_t raid1_personality =
+static struct mdk_personality raid1_personality =
{
.name = "raid1",
+ .level = 1,
.owner = THIS_MODULE,
.make_request = make_request,
.run = run,
@@ -2055,15 +2055,16 @@ static mdk_personality_t raid1_personali
static int __init raid_init(void)
{
- return register_md_personality(RAID1, &raid1_personality);
+ return register_md_personality(&raid1_personality);
}
static void raid_exit(void)
{
- unregister_md_personality(RAID1);
+ unregister_md_personality(&raid1_personality);
}
module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-3"); /* RAID1 */
+MODULE_ALIAS("md-level-1");
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-01 14:03:16.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-01 14:03:25.000000000 +1100
@@ -1883,11 +1883,11 @@ static int run(mddev_t *mddev)
int nc, fc;
sector_t stride, size;
- if (mddev->level != 10) {
- printk(KERN_ERR "raid10: %s: raid level not set correctly... (%d)\n",
- mdname(mddev), mddev->level);
- goto out;
+ if (mddev->chunk_size == 0) {
+ printk(KERN_ERR "md/raid10: non-zero chunk size required.\n");
+ return -EINVAL;
}
+
nc = mddev->layout & 255;
fc = (mddev->layout >> 8) & 255;
if ((nc*fc) <2 || (nc*fc) > mddev->raid_disks ||
@@ -2072,9 +2072,10 @@ static void raid10_quiesce(mddev_t *mdde
}
}
-static mdk_personality_t raid10_personality =
+static struct mdk_personality raid10_personality =
{
.name = "raid10",
+ .level = 10,
.owner = THIS_MODULE,
.make_request = make_request,
.run = run,
@@ -2090,15 +2091,16 @@ static mdk_personality_t raid10_personal
static int __init raid_init(void)
{
- return register_md_personality(RAID10, &raid10_personality);
+ return register_md_personality(&raid10_personality);
}
static void raid_exit(void)
{
- unregister_md_personality(RAID10);
+ unregister_md_personality(&raid10_personality);
}
module_init(raid_init);
module_exit(raid_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-9"); /* RAID10 */
+MODULE_ALIAS("md-level-10");
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-12-01 14:01:59.000000000 +1100
+++ ./drivers/md/raid5.c 2005-12-01 14:03:25.000000000 +1100
@@ -2186,9 +2186,10 @@ static void raid5_quiesce(mddev_t *mddev
}
}
-static mdk_personality_t raid5_personality=
+static struct mdk_personality raid5_personality =
{
.name = "raid5",
+ .level = 5,
.owner = THIS_MODULE,
.make_request = make_request,
.run = run,
@@ -2203,17 +2204,40 @@ static mdk_personality_t raid5_personali
.quiesce = raid5_quiesce,
};
-static int __init raid5_init (void)
+static struct mdk_personality raid4_personality =
{
- return register_md_personality (RAID5, &raid5_personality);
+ .name = "raid4",
+ .level = 4,
+ .owner = THIS_MODULE,
+ .make_request = make_request,
+ .run = run,
+ .stop = stop,
+ .status = status,
+ .error_handler = error,
+ .hot_add_disk = raid5_add_disk,
+ .hot_remove_disk= raid5_remove_disk,
+ .spare_active = raid5_spare_active,
+ .sync_request = sync_request,
+ .resize = raid5_resize,
+ .quiesce = raid5_quiesce,
+};
+
+static int __init raid5_init(void)
+{
+ register_md_personality(&raid5_personality);
+ register_md_personality(&raid4_personality);
+ return 0;
}
-static void raid5_exit (void)
+static void raid5_exit(void)
{
- unregister_md_personality (RAID5);
+ unregister_md_personality(&raid5_personality);
+ unregister_md_personality(&raid4_personality);
}
module_init(raid5_init);
module_exit(raid5_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-4"); /* RAID5 */
+MODULE_ALIAS("md-level-5");
+MODULE_ALIAS("md-level-4");
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-12-01 14:01:58.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-12-01 14:03:25.000000000 +1100
@@ -2304,9 +2304,10 @@ static void raid6_quiesce(mddev_t *mddev
}
}
-static mdk_personality_t raid6_personality=
+static struct mdk_personality raid6_personality =
{
.name = "raid6",
+ .level = 6,
.owner = THIS_MODULE,
.make_request = make_request,
.run = run,
@@ -2321,7 +2322,7 @@ static mdk_personality_t raid6_personali
.quiesce = raid6_quiesce,
};
-static int __init raid6_init (void)
+static int __init raid6_init(void)
{
int e;
@@ -2329,15 +2330,16 @@ static int __init raid6_init (void)
if ( e )
return e;
- return register_md_personality (RAID6, &raid6_personality);
+ return register_md_personality(&raid6_personality);
}
static void raid6_exit (void)
{
- unregister_md_personality (RAID6);
+ unregister_md_personality(&raid6_personality);
}
module_init(raid6_init);
module_exit(raid6_exit);
MODULE_LICENSE("GPL");
MODULE_ALIAS("md-personality-8"); /* RAID6 */
+MODULE_ALIAS("md-level-6");
diff ./include/linux/raid/md.h~current~ ./include/linux/raid/md.h
--- ./include/linux/raid/md.h~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./include/linux/raid/md.h 2005-12-01 14:03:25.000000000 +1100
@@ -71,8 +71,8 @@
*/
#define MD_PATCHLEVEL_VERSION 3
-extern int register_md_personality (int p_num, mdk_personality_t *p);
-extern int unregister_md_personality (int p_num);
+extern int register_md_personality (struct mdk_personality *p);
+extern int unregister_md_personality (struct mdk_personality *p);
extern mdk_thread_t * md_register_thread (void (*run) (mddev_t *mddev),
mddev_t *mddev, const char *name);
extern void md_unregister_thread (mdk_thread_t *thread);
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./include/linux/raid/md_k.h 2005-12-01 14:03:25.000000000 +1100
@@ -18,62 +18,19 @@
/* and dm-bio-list.h is not under include/linux because.... ??? */
#include "../../../drivers/md/dm-bio-list.h"
-#define MD_RESERVED 0UL
-#define LINEAR 1UL
-#define RAID0 2UL
-#define RAID1 3UL
-#define RAID5 4UL
-#define TRANSLUCENT 5UL
-#define HSM 6UL
-#define MULTIPATH 7UL
-#define RAID6 8UL
-#define RAID10 9UL
-#define FAULTY 10UL
-#define MAX_PERSONALITY 11UL
-
#define LEVEL_MULTIPATH (-4)
#define LEVEL_LINEAR (-1)
#define LEVEL_FAULTY (-5)
+/* we need a value for 'no level specified' and 0
+ * means 'raid0', so we need something else. This is
+ * for internal use only
+ */
+#define LEVEL_NONE (-1000000)
+
#define MaxSector (~(sector_t)0)
#define MD_THREAD_NAME_MAX 14
-static inline int pers_to_level (int pers)
-{
- switch (pers) {
- case FAULTY: return LEVEL_FAULTY;
- case MULTIPATH: return LEVEL_MULTIPATH;
- case HSM: return -3;
- case TRANSLUCENT: return -2;
- case LINEAR: return LEVEL_LINEAR;
- case RAID0: return 0;
- case RAID1: return 1;
- case RAID5: return 5;
- case RAID6: return 6;
- case RAID10: return 10;
- }
- BUG();
- return MD_RESERVED;
-}
-
-static inline int level_to_pers (int level)
-{
- switch (level) {
- case LEVEL_FAULTY: return FAULTY;
- case LEVEL_MULTIPATH: return MULTIPATH;
- case -3: return HSM;
- case -2: return TRANSLUCENT;
- case LEVEL_LINEAR: return LINEAR;
- case 0: return RAID0;
- case 1: return RAID1;
- case 4:
- case 5: return RAID5;
- case 6: return RAID6;
- case 10: return RAID10;
- }
- return MD_RESERVED;
-}
-
typedef struct mddev_s mddev_t;
typedef struct mdk_rdev_s mdk_rdev_t;
@@ -140,12 +97,10 @@ struct mdk_rdev_s
*/
};
-typedef struct mdk_personality_s mdk_personality_t;
-
struct mddev_s
{
void *private;
- mdk_personality_t *pers;
+ struct mdk_personality *pers;
dev_t unit;
int md_minor;
struct list_head disks;
@@ -266,9 +221,11 @@ static inline void md_sync_acct(struct b
atomic_add(nr_sectors, &bdev->bd_contains->bd_disk->sync_io);
}
-struct mdk_personality_s
+struct mdk_personality
{
char *name;
+ int level;
+ struct list_head list;
struct module *owner;
int (*make_request)(request_queue_t *q, struct bio *bio);
int (*run)(mddev_t *mddev);
diff ./init/do_mounts_md.c~current~ ./init/do_mounts_md.c
--- ./init/do_mounts_md.c~current~ 2005-12-01 13:59:45.000000000 +1100
+++ ./init/do_mounts_md.c 2005-12-01 14:03:25.000000000 +1100
@@ -17,7 +17,7 @@ static int __initdata raid_noautodetect,
static struct {
int minor;
int partitioned;
- int pers;
+ int level;
int chunk;
char *device_names;
} md_setup_args[MAX_MD_DEVS] __initdata;
@@ -47,7 +47,7 @@ extern int mdp_major;
*/
static int __init md_setup(char *str)
{
- int minor, level, factor, fault, pers, partitioned = 0;
+ int minor, level, factor, fault, partitioned = 0;
char *pername = "";
char *str1;
int ent;
@@ -78,7 +78,7 @@ static int __init md_setup(char *str)
}
if (ent >= md_setup_ents)
md_setup_ents++;
- switch (get_option(&str, &level)) { /* RAID Personality */
+ switch (get_option(&str, &level)) { /* RAID level */
case 2: /* could be 0 or -1.. */
if (level == 0 || level == LEVEL_LINEAR) {
if (get_option(&str, &factor) != 2 || /* Chunk Size */
@@ -86,16 +86,12 @@ static int __init md_setup(char *str)
printk(KERN_WARNING "md: Too few arguments supplied to md=.\n");
return 0;
}
- md_setup_args[ent].pers = level;
+ md_setup_args[ent].level = level;
md_setup_args[ent].chunk = 1 << (factor+12);
- if (level == LEVEL_LINEAR) {
- pers = LINEAR;
+ if (level == LEVEL_LINEAR)
pername = "linear";
- } else {
- pers = RAID0;
+ else
pername = "raid0";
- }
- md_setup_args[ent].pers = pers;
break;
}
/* FALL THROUGH */
@@ -103,7 +99,7 @@ static int __init md_setup(char *str)
str = str1;
/* FALL THROUGH */
case 0:
- md_setup_args[ent].pers = 0;
+ md_setup_args[ent].level = LEVEL_NONE;
pername="super-block";
}
@@ -190,10 +186,10 @@ static void __init md_setup_drive(void)
continue;
}
- if (md_setup_args[ent].pers) {
+ if (md_setup_args[ent].level != LEVEL_NONE) {
/* non-persistent */
mdu_array_info_t ainfo;
- ainfo.level = pers_to_level(md_setup_args[ent].pers);
+ ainfo.level = md_setup_args[ent].level;
ainfo.size = 0;
ainfo.nr_disks =0;
ainfo.raid_disks =0;
* [PATCH md 014 of 14] Fix possible problem in raid1/raid10 error overwriting.
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
` (12 preceding siblings ...)
2005-12-01 3:23 ` [PATCH md 013 of 14] Remove personality numbering from md NeilBrown
@ 2005-12-01 3:24 ` NeilBrown
13 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2005-12-01 3:24 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
The code that overwrites and then re-reads blocks to address read
errors in raid1/raid10 currently assumes that the re-read will
not alter the buffer, which may still be needed for writing to
the next device. This is not a safe assumption to make.
So we split the loop into an overwrite loop and a separate re-read
loop, so that all writing is complete before any reading is attempted.
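The two-pass structure can be sketched as a small userspace model (an
assumed simplification, not the kernel code: `write_dev` and
`reread_dev` are hypothetical stand-ins for the `sync_page_io(...,
WRITE)` and `sync_page_io(..., READ)` calls, and the per-device
skip/error checks are omitted):

```c
#include <assert.h>

#define NDISKS 4

/* Record operations so the ordering is observable: 100+d marks a
 * write to disk d, 200+d marks a re-read of disk d. */
static int order[2 * NDISKS];
static int nops;

static void write_dev(int d)  { order[nops++] = 100 + d; }
static void reread_dev(int d) { order[nops++] = 200 + d; }

/* Walk backwards around the ring of disks from the one that read
 * successfully (good_disk) to the one that failed (read_disk),
 * twice: first overwrite every disk on the way with the good data,
 * then, only after all writes, re-read each one to verify it. */
static void fix_read_error(int good_disk, int read_disk, int raid_disks)
{
    int d = good_disk;

    while (d != read_disk) {        /* pass 1: overwrite */
        if (d == 0)
            d = raid_disks;
        d--;
        write_dev(d);
    }
    d = good_disk;
    while (d != read_disk) {        /* pass 2: re-read, writes all done */
        if (d == 0)
            d = raid_disks;
        d--;
        reread_dev(d);
    }
}
```

The single merged loop the patch removes interleaved each write with an
immediate re-read of the same buffer, which is exactly what becomes
unsafe when the re-read can clobber data still pending for the next
write.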
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 38 ++++++++++++++++++++++++++++++--------
./drivers/md/raid10.c | 22 ++++++++++++++++++----
2 files changed, 48 insertions(+), 12 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-12-01 14:03:25.000000000 +1100
+++ ./drivers/md/raid1.c 2005-12-01 14:03:40.000000000 +1100
@@ -1252,6 +1252,7 @@ static void sync_request_write(mddev_t *
} while (!success && d != r1_bio->read_disk);
if (success) {
+ int start = d;
/* write it back and re-read */
set_bit(R1BIO_Uptodate, &r1_bio->state);
while (d != r1_bio->read_disk) {
@@ -1265,14 +1266,23 @@ static void sync_request_write(mddev_t *
sect + rdev->data_offset,
s<<9,
bio->bi_io_vec[idx].bv_page,
- WRITE) == 0 ||
- sync_page_io(rdev->bdev,
+ WRITE) == 0)
+ md_error(mddev, rdev);
+ }
+ d = start;
+ while (d != r1_bio->read_disk) {
+ if (d == 0)
+ d = conf->raid_disks;
+ d--;
+ if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+ continue;
+ rdev = conf->mirrors[d].rdev;
+ if (sync_page_io(rdev->bdev,
sect + rdev->data_offset,
s<<9,
bio->bi_io_vec[idx].bv_page,
- READ) == 0) {
+ READ) == 0)
md_error(mddev, rdev);
- }
}
} else {
char b[BDEVNAME_SIZE];
@@ -1444,6 +1454,7 @@ static void raid1d(mddev_t *mddev)
if (success) {
/* write it back and re-read */
+ int start = d;
while (d != r1_bio->read_disk) {
if (d==0)
d = conf->raid_disks;
@@ -1453,13 +1464,24 @@ static void raid1d(mddev_t *mddev)
test_bit(In_sync, &rdev->flags)) {
if (sync_page_io(rdev->bdev,
sect + rdev->data_offset,
- s<<9, conf->tmppage, WRITE) == 0 ||
- sync_page_io(rdev->bdev,
+ s<<9, conf->tmppage, WRITE) == 0)
+ /* Well, this device is dead */
+ md_error(mddev, rdev);
+ }
+ }
+ d = start;
+ while (d != r1_bio->read_disk) {
+ if (d==0)
+ d = conf->raid_disks;
+ d--;
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags)) {
+ if (sync_page_io(rdev->bdev,
sect + rdev->data_offset,
- s<<9, conf->tmppage, READ) == 0) {
+ s<<9, conf->tmppage, READ) == 0)
/* Well, this device is dead */
md_error(mddev, rdev);
- }
}
}
} else {
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-12-01 14:03:25.000000000 +1100
+++ ./drivers/md/raid10.c 2005-12-01 14:03:41.000000000 +1100
@@ -1421,6 +1421,7 @@ static void raid10d(mddev_t *mddev)
} while (!success && sl != r10_bio->read_slot);
if (success) {
+ int start = sl;
/* write it back and re-read */
while (sl != r10_bio->read_slot) {
int d;
@@ -1434,14 +1435,27 @@ static void raid10d(mddev_t *mddev)
if (sync_page_io(rdev->bdev,
r10_bio->devs[sl].addr +
sect + rdev->data_offset,
- s<<9, conf->tmppage, WRITE) == 0 ||
- sync_page_io(rdev->bdev,
+ s<<9, conf->tmppage, WRITE) == 0)
+ /* Well, this device is dead */
+ md_error(mddev, rdev);
+ }
+ }
+ sl = start;
+ while (sl != r10_bio->read_slot) {
+ int d;
+ if (sl==0)
+ sl = conf->copies;
+ sl--;
+ d = r10_bio->devs[sl].devnum;
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags)) {
+ if (sync_page_io(rdev->bdev,
r10_bio->devs[sl].addr +
sect + rdev->data_offset,
- s<<9, conf->tmppage, READ) == 0) {
+ s<<9, conf->tmppage, READ) == 0)
/* Well, this device is dead */
md_error(mddev, rdev);
- }
}
}
} else {
* Re: [PATCH md 002 of 14] Allow raid1 to check consistency
2005-12-01 3:22 ` [PATCH md 002 of 14] Allow raid1 to check consistency NeilBrown
@ 2005-12-01 22:34 ` Andrew Morton
2005-12-05 23:30 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2005-12-01 22:34 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@suse.de> wrote:
>
> + if (test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery))
> + j = pi->raid_disks;
> + else
> + j = 1;
> + while(j--) {
> + bio = r1_bio->bios[j];
> + for (i = 0; i < RESYNC_PAGES; i++) {
> + page = alloc_page(gfp_flags);
> + if (unlikely(!page))
> + goto out_free_pages;
> +
> + bio->bi_io_vec[i].bv_page = page;
> + }
> + }
> + /* If not user-requested, copy the page pointers to all bios */
> + if (!test_bit(MD_RECOVERY_REQUESTED, &pi->mddev->recovery)) {
> + for (i=0; i<RESYNC_PAGES ; i++)
> + for (j=1; j<pi->raid_disks; j++)
> + r1_bio->bios[j]->bi_io_vec[i].bv_page =
> + r1_bio->bios[0]->bi_io_vec[i].bv_page;
> }
>
> r1_bio->master_bio = NULL;
> @@ -122,8 +137,10 @@ static void * r1buf_pool_alloc(gfp_t gfp
> return r1_bio;
>
> out_free_pages:
> - for ( ; i > 0 ; i--)
> - __free_page(bio->bi_io_vec[i-1].bv_page);
> + for (i=0; i < RESYNC_PAGES ; i++)
> + for (j=0 ; j < pi->raid_disks; j++)
> + __free_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
> + j = -1;
> out_free_bio:
Are you sure the error handling here is correct?
a) we loop up to RESYNC_PAGES, but the allocation loop may not have got
that far
b) we loop in the ascending-index direction, but the allocating loop
loops in the descending-index direction.
c) we loop up to pi->raid_disks, but the allocating loop may have
done `j = 1;'.
d) there was a d), but I forgot what it was.
* Re: [PATCH md 006 of 14] Make /proc/mdstat pollable.
2005-12-01 3:23 ` [PATCH md 006 of 14] Make /proc/mdstat pollable NeilBrown
@ 2005-12-01 22:39 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2005-12-01 22:39 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@suse.de> wrote:
>
> +DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
>
static scope?
* Re: [PATCH md 008 of 14] Convert md to use kzalloc throughout
2005-12-01 3:23 ` [PATCH md 008 of 14] Convert md to use kzalloc throughout NeilBrown
@ 2005-12-01 22:42 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2005-12-01 22:42 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@suse.de> wrote:
>
> - conf = kmalloc (sizeof (*conf) + mddev->raid_disks*sizeof(dev_info_t),
> + conf = kzalloc (sizeof (*conf) + mddev->raid_disks*sizeof(dev_info_t),
> - new = (mddev_t *) kmalloc(sizeof(*new), GFP_KERNEL);
> + new = (mddev_t *) kzalloc(sizeof(*new), GFP_KERNEL);
> - rdev = (mdk_rdev_t *) kmalloc(sizeof(*rdev), GFP_KERNEL);
> + rdev = (mdk_rdev_t *) kzalloc(sizeof(*rdev), GFP_KERNEL);
It'd be nice to nuke the unneeded cast while we're there.
<edits the diff>
OK, I did that.
* Re: [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic
2005-12-01 3:23 ` [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic NeilBrown
@ 2005-12-01 22:46 ` Andrew Morton
2005-12-05 23:43 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2005-12-01 22:46 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
NeilBrown <neilb@suse.de> wrote:
>
> + paddr = kmap_atomic(page, KM_USER0);
> + memset(paddr + offset, 0xff,
> PAGE_SIZE - offset);
This page which is being altered is a user-visible one, no? A pagecache
page?
We must always run flush_dcache_page() against a page when the kernel
modifies a user-visible page's contents. Otherwise, on some architectures,
the modification won't be visible at the different virtual address.
* Re: [PATCH md 002 of 14] Allow raid1 to check consistency
2005-12-01 22:34 ` Andrew Morton
@ 2005-12-05 23:30 ` Neil Brown
2005-12-06 3:50 ` Andrew Morton
0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2005-12-05 23:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
On Thursday December 1, akpm@osdl.org wrote:
> NeilBrown <neilb@suse.de> wrote:
> > out_free_pages:
> > - for ( ; i > 0 ; i--)
> > - __free_page(bio->bi_io_vec[i-1].bv_page);
> > + for (i=0; i < RESYNC_PAGES ; i++)
> > + for (j=0 ; j < pi->raid_disks; j++)
> > + __free_page(r1_bio->bios[j]->bi_io_vec[i].bv_page);
> > + j = -1;
> > out_free_bio:
>
> Are you sure the error handling here is correct?
Uhmm.. maybe?
>
> a) we loop up to RESYNC_PAGES, but the allocation loop may not have got
> that far
>
> b) we loop in the ascending-index direction, but the allocating loop
> loops in the descending-index direction.
>
> c) we loop up to pi->raid_disks, but the allocating loop may have
> done `j = 1;'.
As it is a well-known fact that all deallocation routines in Linux
accept a NULL argument, and as error handling is not a critical path,
and as the structures are zeroed when allocated, I chose simply to
free every possibly allocated page rather than keep track of exactly
where we were up to.
Unfortunately not all well-known facts are true :-(
>
> d) there was a d), but I forgot what it was.
Maybe 'd' was
__free_page does not accept 'NULL' as an argument, though
free_page does (but it wants an address I think...).
But I have since changed this code to use "put_page", and put_page
doesn't like NULL either..
Would you accept:
------------------
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./mm/swap.c | 2 ++
1 file changed, 2 insertions(+)
diff ./mm/swap.c~current~ ./mm/swap.c
--- ./mm/swap.c~current~ 2005-12-06 10:29:16.000000000 +1100
+++ ./mm/swap.c 2005-12-06 10:29:25.000000000 +1100
@@ -36,6 +36,8 @@ int page_cluster;
void put_page(struct page *page)
{
+ if (unlikely(page==NULL))
+ return;
if (unlikely(PageCompound(page))) {
page = (struct page *)page_private(page);
if (put_page_testzero(page)) {
--------------------
Or should I open code this in md ?
NeilBrown
* Re: [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic
2005-12-01 22:46 ` Andrew Morton
@ 2005-12-05 23:43 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2005-12-05 23:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
On Thursday December 1, akpm@osdl.org wrote:
> NeilBrown <neilb@suse.de> wrote:
> >
> > + paddr = kmap_atomic(page, KM_USER0);
> > + memset(paddr + offset, 0xff,
> > PAGE_SIZE - offset);
>
> This page which is being altered is a user-visible one, no? A pagecache
> page?
It can be, but also may not be..
>
> We must always run flush_dcache_page() against a page when the kernel
> modifies a user-visible page's contents. Otherwise, on some architectures,
> the modification won't be visible at the different virtual address.
Ok, I'll add that at an appropriate place, thanks.
NeilBrown
* Re: [PATCH md 002 of 14] Allow raid1 to check consistency
2005-12-05 23:30 ` Neil Brown
@ 2005-12-06 3:50 ` Andrew Morton
0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2005-12-06 3:50 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Neil Brown <neilb@suse.de> wrote:
>
> Would you accept:
> ------------------
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./mm/swap.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff ./mm/swap.c~current~ ./mm/swap.c
> --- ./mm/swap.c~current~ 2005-12-06 10:29:16.000000000 +1100
> +++ ./mm/swap.c 2005-12-06 10:29:25.000000000 +1100
> @@ -36,6 +36,8 @@ int page_cluster;
>
> void put_page(struct page *page)
> {
> + if (unlikely(page==NULL))
> + return;
> if (unlikely(PageCompound(page))) {
> page = (struct page *)page_private(page);
> if (put_page_testzero(page)) {
>
eek. That's an all-over-the-place-fast-path!
>
> Or should I open code this in md ?
Yes please.
2005-12-01 3:22 [PATCH md 000 of 14] Introduction NeilBrown
2005-12-01 3:22 ` [PATCH md 001 of 14] Support check-without-repair of raid10 arrays NeilBrown
2005-12-01 3:22 ` [PATCH md 002 of 14] Allow raid1 to check consistency NeilBrown
2005-12-01 22:34 ` Andrew Morton
2005-12-05 23:30 ` Neil Brown
2005-12-06 3:50 ` Andrew Morton
2005-12-01 3:23 ` [PATCH md 003 of 14] Make sure read error on last working drive of raid1 actually returns failure NeilBrown
2005-12-01 3:23 ` [PATCH md 004 of 14] auto-correct correctable read errors in raid10 NeilBrown
2005-12-01 3:23 ` [PATCH md 005 of 14] raid10 read-error handling - resync and read-only NeilBrown
2005-12-01 3:23 ` [PATCH md 006 of 14] Make /proc/mdstat pollable NeilBrown
2005-12-01 22:39 ` Andrew Morton
2005-12-01 3:23 ` [PATCH md 007 of 14] Clean up 'page' related names in md NeilBrown
2005-12-01 3:23 ` [PATCH md 008 of 14] Convert md to use kzalloc throughout NeilBrown
2005-12-01 22:42 ` Andrew Morton
2005-12-01 3:23 ` [PATCH md 009 of 14] Tidy up raid5/6 hash table code NeilBrown
2005-12-01 3:23 ` [PATCH md 010 of 14] Convert various kmap calls to kmap_atomic NeilBrown
2005-12-01 22:46 ` Andrew Morton
2005-12-05 23:43 ` Neil Brown
2005-12-01 3:23 ` [PATCH md 011 of 14] Convert recently exported symbol to GPL NeilBrown
2005-12-01 3:23 ` [PATCH md 012 of 14] Break out of a loop that doesn't need to run to completion NeilBrown
2005-12-01 3:23 ` [PATCH md 013 of 14] Remove personality numbering from md NeilBrown
2005-12-01 3:24 ` [PATCH md 014 of 14] Fix possible problem in raid1/raid10 error overwriting NeilBrown