* [PATCH md 000 of 18] Introduction
From: NeilBrown @ 2005-11-27 23:39 UTC
To: Andrew Morton; +Cc: linux-raid
Following are 18 patches against md/raid in 2.6.15-rc2-mm1
The first 5 are bug fixes against recent additions to 2.6.15-rc.
They should preferably go to Linus for inclusion in 2.6.15.
The remaining 13 are code cleanups and new functionality. These
should sit in -mm until 2.6.16-rc opens up.
The main elements of functionality I have been working on are
- bitmap based write intent logging
This is now available in all raid levels (where useful).
- better handling of read errors
This is now available for raid5, raid6, and raid1;
raid10 is still outstanding.
- Support a 'check' action which checks redundancy but doesn't repair
(a usage sketch follows below).
This is now available in raid5 and raid6;
raid10 and raid1 are still outstanding.
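For context, the 'check' action is driven through the md sysfs
interface, the same "echo check > sync_action" shown with patch 012.
A minimal user-space sketch (the /sys path assumes an array named
md0; error handling trimmed):

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/sys/block/md0/md/sync_action", "w");
            if (!f)
                    return 1;
            fputs("check", f);  /* verify redundancy, repair nothing */
            return fclose(f) ? 1 : 0;
    }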
Future work includes
- enhancements to the sysfs interface
- improve raid5/6 read speed by bypassing cache
Thanks,
NeilBrown
2.6.15 material
[PATCH md 001 of 18] Improve read speed to raid10 arrays using 'far copies'.
[PATCH md 002 of 18] Fix locking problem in r5/r6
[PATCH md 003 of 18] Fix problem with raid6 intent bitmap
[PATCH md 004 of 18] Set default_bitmap_offset properly in set_array_info
[PATCH md 005 of 18] Fix --re-add for raid1 and raid6
2.6.16 material
[PATCH md 006 of 18] Improve raid1 "IO Barrier" concept.
[PATCH md 007 of 18] Improve raid10 "IO Barrier" concept.
[PATCH md 008 of 18] Small cleanups for raid5
[PATCH md 009 of 18] Allow dirty raid[456] arrays to be started at boot.
[PATCH md 010 of 18] Move bitmap_create to after md array has been initialised.
[PATCH md 011 of 18] Write intent bitmap support for raid10
[PATCH md 012 of 18] Fix raid6 resync check/repair code.
[PATCH md 013 of 18] Improve handling of read errors with raid6
[PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.
[PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors.
[PATCH md 016 of 18] Better handling for read error in raid1 during resync
[PATCH md 017 of 18] Handle errors when read-only
[PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6
* [PATCH md 001 of 18] Improve read speed to raid10 arrays using 'far copies'.
From: NeilBrown @ 2005-11-27 23:39 UTC
To: Andrew Morton; +Cc: linux-raid
raid10 has two different layouts. One uses near-copies (so multiple
copies of a block are at the same or similar offsets on different
devices) and the other uses far-copies (so multiple copies of a block
are stored at greatly different offsets on different devices). The
point of far-copies is that it allows the first section (normally the
first half) to be laid out in normal raid0 style, and thus provide
raid0 sequential read performance.
Unfortunately, the read balancing in raid10 makes some poor decisions
for far-copies arrays and you don't get the desired performance. So
turn off that bad bit of read_balance for far-copies arrays.
With this patch, the read speed of an 'f2' array is comparable with a
raid0 with the same number of devices, though write speed is of course
still very slow.
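For readers unfamiliar with the layout, here is a minimal sketch of
where the two copies of a logical chunk land in an n-device 'f2'
array. This is an illustration, not the kernel's raid10 code; the
arithmetic is simplified and the names are made up:

    static void f2_chunk_copies(long chunk, int ndev, long half_len,
                                int dev[2], long off[2])
    {
            /* First copy: plain raid0 striping across the first half
             * of every device -- this is what gives raid0 read speed.
             */
            dev[0] = chunk % ndev;
            off[0] = chunk / ndev;
            /* Second copy: the same stripe in the second half of the
             * devices, rotated by one so that no single device holds
             * both copies of a chunk.
             */
            dev[1] = (chunk % ndev + 1) % ndev;
            off[1] = half_len + chunk / ndev;
    }

With that picture, the problem is easy to see: preferring any idle
disk (the test removed below) can keep bouncing sequential reads over
to the far copy, so the raid0-style access pattern never forms.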
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-28 10:08:26.000000000 +1100
@@ -552,7 +552,11 @@ static int read_balance(conf_t *conf, r1
!test_bit(In_sync, &rdev->flags))
continue;
- if (!atomic_read(&rdev->nr_pending)) {
+ /* This optimisation is debatable, and completely destroys
+ * sequential read speed for 'far copies' arrays. So only
+ * keep it for 'near' arrays, and review those later.
+ */
+ if (conf->near_copies > 1 && !atomic_read(&rdev->nr_pending)) {
disk = ndisk;
slot = nslot;
break;
* [PATCH md 002 of 18] Fix locking problem in r5/r6
From: NeilBrown @ 2005-11-27 23:39 UTC
To: Andrew Morton; +Cc: linux-raid
bitmap_unplug actually writes data (bits) to storage, so we
shouldn't be holding a spinlock...
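In compressed form, the fix is the usual pattern for calling a
function that may sleep from inside a lock-holding region (this is
just the post-patch code with extra commentary; note that seq is
sampled under the lock and only committed after it is re-taken):

    if (conf->seq_flush - conf->seq_write > 0) {
            int seq = conf->seq_flush;      /* sample under the lock */
            spin_unlock_irq(&conf->device_lock);
            bitmap_unplug(mddev->bitmap);   /* may block on disk I/O */
            spin_lock_irq(&conf->device_lock);
            conf->seq_write = seq;          /* commit what was flushed */
            activate_bit_delay(conf);
    }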
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 2 ++
./drivers/md/raid6main.c | 2 ++
2 files changed, 4 insertions(+)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/raid5.c 2005-11-28 10:08:33.000000000 +1100
@@ -1704,7 +1704,9 @@ static void raid5d (mddev_t *mddev)
if (conf->seq_flush - conf->seq_write > 0) {
int seq = conf->seq_flush;
+ spin_unlock_irq(&conf->device_lock);
bitmap_unplug(mddev->bitmap);
+ spin_lock_irq(&conf->device_lock);
conf->seq_write = seq;
activate_bit_delay(conf);
}
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:08:33.000000000 +1100
@@ -1784,7 +1784,9 @@ static void raid6d (mddev_t *mddev)
if (conf->seq_flush - conf->seq_write > 0) {
int seq = conf->seq_flush;
+ spin_unlock_irq(&conf->device_lock);
bitmap_unplug(mddev->bitmap);
+ spin_lock_irq(&conf->device_lock);
conf->seq_write = seq;
activate_bit_delay(conf);
}
* [PATCH md 003 of 18] Fix problem with raid6 intent bitmap
From: NeilBrown @ 2005-11-27 23:39 UTC
To: Andrew Morton; +Cc: linux-raid
When doing a recovery, we need to know whether the array will
still be degraded after the recovery has finished, so we can
know whether bits can be cleared yet or not. This patch
performs the required check.
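Condensed from the diff below: if any member other than the one being
recovered is missing, writes still cannot reach every copy, so
bitmap_start_sync() is told the array is still degraded and the bits
stay set:

    int still_degraded = 0, i;

    for (i = 0; i < mddev->raid_disks; i++)
            if (conf->disks[i].rdev == NULL)        /* missing member */
                    still_degraded = 1;

    bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks,
                      still_degraded);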
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid6main.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:08:33.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:09:27.000000000 +1100
@@ -1702,6 +1702,8 @@ static sector_t sync_request(mddev_t *md
int data_disks = raid_disks - 2;
sector_t max_sector = mddev->size << 1;
int sync_blocks;
+ int still_degraded = 0;
+ int i;
if (sector_nr >= max_sector) {
/* just being told to finish up .. nothing much to do */
@@ -1710,7 +1712,7 @@ static sector_t sync_request(mddev_t *md
if (mddev->curr_resync < max_sector) /* aborted */
bitmap_end_sync(mddev->bitmap, mddev->curr_resync,
&sync_blocks, 1);
- else /* compelted sync */
+ else /* completed sync */
conf->fullsync = 0;
bitmap_close_sync(mddev->bitmap);
@@ -1748,7 +1750,16 @@ static sector_t sync_request(mddev_t *md
*/
schedule_timeout_uninterruptible(1);
}
- bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 0);
+ /* Need to check if array will still be degraded after recovery/resync
+ * We don't need to check the 'failed' flag as when that gets set,
+ * recovery aborts.
+ */
+ for (i=0; i<mddev->raid_disks; i++)
+ if (conf->disks[i].rdev == NULL)
+ still_degraded = 1;
+
+ bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, still_degraded);
+
spin_lock(&sh->lock);
set_bit(STRIPE_SYNCING, &sh->state);
clear_bit(STRIPE_INSYNC, &sh->state);
* [PATCH md 004 of 18] Set default_bitmap_offset properly in set_array_info
From: NeilBrown @ 2005-11-27 23:39 UTC
To: Andrew Morton; +Cc: linux-raid
If an array is created using set_array_info, default_bitmap_offset
isn't set properly, meaning that an internal bitmap cannot be hot-added until
the array is stopped and re-assembled.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/md.c 2005-11-28 10:10:54.000000000 +1100
@@ -1028,7 +1028,6 @@ static int super_1_validate(mddev_t *mdd
mddev->size = le64_to_cpu(sb->size)/2;
mddev->events = le64_to_cpu(sb->events);
mddev->bitmap_offset = 0;
- mddev->default_bitmap_offset = 0;
mddev->default_bitmap_offset = 1024;
mddev->recovery_cp = le64_to_cpu(sb->resync_offset);
@@ -2932,6 +2931,9 @@ static int set_array_info(mddev_t * mdde
mddev->sb_dirty = 1;
+ mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
+ mddev->bitmap_offset = 0;
+
/*
* Generate a 128 bit UUID
*/
* [PATCH md 005 of 18] Fix --re-add for raid1 and raid6
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
If you have an array with a write-intent-bitmap, and you remove a device,
then re-add it, a full recovery isn't needed. We detect a re-add by looking
at saved_raid_disk. For raid1, it doesn't matter which disk it was, only
whether or not it was an active device.
The old code being removed set a value of 'mirror' which was then ignored,
so it can go. The changed code performs the correct check.
For raid6, if there are two missing devices, make sure we choose the right
slot on --re-add rather than always the first slot.
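Side by side, the two policies look like this (a condensation of the
hunks below, not new code):

    /* raid1: every mirror holds the same data, so any free slot will
     * do; a full resync is only needed if the device was never an
     * active member of the array.
     */
    if (rdev->saved_raid_disk < 0)
            conf->fullsync = 1;

    /* raid6: slots are not interchangeable, so start the search for
     * a free slot at the slot the device used to occupy, provided
     * that slot is still empty.
     */
    if (rdev->saved_raid_disk >= 0 &&
        conf->disks[rdev->saved_raid_disk].rdev == NULL)
            disk = rdev->saved_raid_disk;
    else
            disk = 0;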
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 8 ++++----
./drivers/md/raid6main.c | 10 ++++++++--
2 files changed, 12 insertions(+), 6 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:11:52.000000000 +1100
@@ -953,9 +953,6 @@ static int raid1_add_disk(mddev_t *mddev
int mirror = 0;
mirror_info_t *p;
- if (rdev->saved_raid_disk >= 0 &&
- conf->mirrors[rdev->saved_raid_disk].rdev == NULL)
- mirror = rdev->saved_raid_disk;
for (mirror=0; mirror < mddev->raid_disks; mirror++)
if ( !(p=conf->mirrors+mirror)->rdev) {
@@ -972,7 +969,10 @@ static int raid1_add_disk(mddev_t *mddev
p->head_position = 0;
rdev->raid_disk = mirror;
found = 1;
- if (rdev->saved_raid_disk != mirror)
+ /* As all devices are equivalent, we don't need a full recovery
+ * if this was recently any drive of the array
+ */
+ if (rdev->saved_raid_disk < 0)
conf->fullsync = 1;
rcu_assign_pointer(p->rdev, rdev);
break;
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:09:27.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:11:52.000000000 +1100
@@ -2158,9 +2158,15 @@ static int raid6_add_disk(mddev_t *mddev
/* no point adding a device */
return 0;
/*
- * find the disk ...
+ * find the disk ... but prefer rdev->saved_raid_disk
+ * if possible.
*/
- for (disk=0; disk < mddev->raid_disks; disk++)
+ if (rdev->saved_raid_disk >= 0 &&
+ conf->disks[rdev->saved_raid_disk].rdev == NULL)
+ disk = rdev->saved_raid_disk;
+ else
+ disk = 0;
+ for ( ; disk < mddev->raid_disks; disk++)
if ((p=conf->disks + disk)->rdev == NULL) {
clear_bit(In_sync, &rdev->flags);
rdev->raid_disk = disk;
* [PATCH md 006 of 18] Improve raid1 "IO Barrier" concept.
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
raid1 needs to put up a barrier to new requests while it
does resync or other background recovery.
The code for this is currently open-coded, slightly obscured
by its use of two waitqueues, and not documented.
This patch gathers all the related code into 4 functions,
and includes a comment which (hopefully) explains what is happening.
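The resulting calling convention, as a sketch (the real bodies are in
the diff below); each pair is entered and exited from different
contexts, which is why these are four separate functions:

    /* normal IO path */
    wait_barrier(conf);     /* blocks while resync holds the barrier */
    /* ... submit the read or write ... */
    allow_barrier(conf);    /* from free_r1bio, when the IO completes */

    /* background resync path */
    raise_barrier(conf);    /* waits for pending normal IO to drain */
    /* ... issue one resync request ... */
    lower_barrier(conf);    /* from put_buf, when the resync IO is done */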
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 169 ++++++++++++++++++++++---------------------
./include/linux/raid/raid1.h | 4 -
2 files changed, 92 insertions(+), 81 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:11:52.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:12:17.000000000 +1100
@@ -51,6 +51,8 @@ static mdk_personality_t raid1_personali
static void unplug_slaves(mddev_t *mddev);
+static void allow_barrier(conf_t *conf);
+static void lower_barrier(conf_t *conf);
static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
{
@@ -160,20 +162,13 @@ static void put_all_bios(conf_t *conf, r
static inline void free_r1bio(r1bio_t *r1_bio)
{
- unsigned long flags;
-
conf_t *conf = mddev_to_conf(r1_bio->mddev);
/*
* Wake up any possible resync thread that waits for the device
* to go idle.
*/
- spin_lock_irqsave(&conf->resync_lock, flags);
- if (!--conf->nr_pending) {
- wake_up(&conf->wait_idle);
- wake_up(&conf->wait_resume);
- }
- spin_unlock_irqrestore(&conf->resync_lock, flags);
+ allow_barrier(conf);
put_all_bios(conf, r1_bio);
mempool_free(r1_bio, conf->r1bio_pool);
@@ -182,22 +177,10 @@ static inline void free_r1bio(r1bio_t *r
static inline void put_buf(r1bio_t *r1_bio)
{
conf_t *conf = mddev_to_conf(r1_bio->mddev);
- unsigned long flags;
mempool_free(r1_bio, conf->r1buf_pool);
- spin_lock_irqsave(&conf->resync_lock, flags);
- if (!conf->barrier)
- BUG();
- --conf->barrier;
- wake_up(&conf->wait_resume);
- wake_up(&conf->wait_idle);
-
- if (!--conf->nr_pending) {
- wake_up(&conf->wait_idle);
- wake_up(&conf->wait_resume);
- }
- spin_unlock_irqrestore(&conf->resync_lock, flags);
+ lower_barrier(conf);
}
static void reschedule_retry(r1bio_t *r1_bio)
@@ -210,6 +193,7 @@ static void reschedule_retry(r1bio_t *r1
list_add(&r1_bio->retry_list, &conf->retry_list);
spin_unlock_irqrestore(&conf->device_lock, flags);
+ wake_up(&conf->wait_barrier);
md_wakeup_thread(mddev->thread);
}
@@ -592,30 +576,83 @@ static int raid1_issue_flush(request_que
return ret;
}
-/*
- * Throttle resync depth, so that we can both get proper overlapping of
- * requests, but are still able to handle normal requests quickly.
+/* Barriers....
+ * Sometimes we need to suspend IO while we do something else,
+ * either some resync/recovery, or reconfigure the array.
+ * To do this we raise a 'barrier'.
+ * The 'barrier' is a counter that can be raised multiple times
+ * to count how many activities are happening which preclude
+ * normal IO.
+ * We can only raise the barrier if there is no pending IO.
+ * i.e. if nr_pending == 0.
+ * We choose only to raise the barrier if no-one is waiting for the
+ * barrier to go down. This means that as soon as an IO request
+ * is ready, no other operations which require a barrier will start
+ * until the IO request has had a chance.
+ *
+ * So: regular IO calls 'wait_barrier'. When that returns there
+ * is no background IO happening. It must arrange to call
+ * allow_barrier when it has finished its IO.
+ * background IO calls must call raise_barrier. Once that returns
+ * there is no normal IO happening. It must arrange to call
+ * lower_barrier when the particular background IO completes.
*/
#define RESYNC_DEPTH 32
-static void device_barrier(conf_t *conf, sector_t sect)
+static void raise_barrier(conf_t *conf)
{
spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_idle, !waitqueue_active(&conf->wait_resume),
- conf->resync_lock, raid1_unplug(conf->mddev->queue));
-
- if (!conf->barrier++) {
- wait_event_lock_irq(conf->wait_idle, !conf->nr_pending,
- conf->resync_lock, raid1_unplug(conf->mddev->queue));
- if (conf->nr_pending)
- BUG();
- }
- wait_event_lock_irq(conf->wait_resume, conf->barrier < RESYNC_DEPTH,
- conf->resync_lock, raid1_unplug(conf->mddev->queue));
- conf->next_resync = sect;
+
+ /* Wait until no block IO is waiting */
+ wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
+ conf->resync_lock,
+ raid1_unplug(conf->mddev->queue));
+
+ /* block any new IO from starting */
+ conf->barrier++;
+
+ /* Now wait for all pending IO to complete */
+ wait_event_lock_irq(conf->wait_barrier,
+ !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+ conf->resync_lock,
+ raid1_unplug(conf->mddev->queue));
+
+ spin_unlock_irq(&conf->resync_lock);
+}
+
+static void lower_barrier(conf_t *conf)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&conf->resync_lock, flags);
+ conf->barrier--;
+ spin_unlock_irqrestore(&conf->resync_lock, flags);
+ wake_up(&conf->wait_barrier);
+}
+
+static void wait_barrier(conf_t *conf)
+{
+ spin_lock_irq(&conf->resync_lock);
+ if (conf->barrier) {
+ conf->nr_waiting++;
+ wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
+ conf->resync_lock,
+ raid1_unplug(conf->mddev->queue));
+ conf->nr_waiting--;
+ }
+ conf->nr_pending++;
spin_unlock_irq(&conf->resync_lock);
}
+static void allow_barrier(conf_t *conf)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&conf->resync_lock, flags);
+ conf->nr_pending--;
+ spin_unlock_irqrestore(&conf->resync_lock, flags);
+ wake_up(&conf->wait_barrier);
+}
+
+
/* duplicate the data pages for behind I/O */
static struct page **alloc_behind_pages(struct bio *bio)
{
@@ -677,10 +714,7 @@ static int make_request(request_queue_t
*/
md_write_start(mddev, bio); /* wait on superblock update early */
- spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_resume, !conf->barrier, conf->resync_lock, );
- conf->nr_pending++;
- spin_unlock_irq(&conf->resync_lock);
+ wait_barrier(conf);
disk_stat_inc(mddev->gendisk, ios[rw]);
disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
@@ -908,13 +942,8 @@ static void print_conf(conf_t *conf)
static void close_sync(conf_t *conf)
{
- spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_resume, !conf->barrier,
- conf->resync_lock, raid1_unplug(conf->mddev->queue));
- spin_unlock_irq(&conf->resync_lock);
-
- if (conf->barrier) BUG();
- if (waitqueue_active(&conf->wait_idle)) BUG();
+ wait_barrier(conf);
+ allow_barrier(conf);
mempool_destroy(conf->r1buf_pool);
conf->r1buf_pool = NULL;
@@ -1316,12 +1345,16 @@ static sector_t sync_request(mddev_t *md
return sync_blocks;
}
/*
- * If there is non-resync activity waiting for us then
- * put in a delay to throttle resync.
+ * If there is non-resync activity waiting for a turn,
+ * and resync is going fast enough,
+ * then let it through before starting on this new sync request.
*/
- if (!go_faster && waitqueue_active(&conf->wait_resume))
+ if (!go_faster && conf->nr_waiting)
msleep_interruptible(1000);
- device_barrier(conf, sector_nr + RESYNC_SECTORS);
+
+ raise_barrier(conf);
+
+ conf->next_resync = sector_nr;
/*
* If reconstructing, and >1 working disc,
@@ -1354,10 +1387,6 @@ static sector_t sync_request(mddev_t *md
r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
- spin_lock_irq(&conf->resync_lock);
- conf->nr_pending++;
- spin_unlock_irq(&conf->resync_lock);
-
r1_bio->mddev = mddev;
r1_bio->sector = sector_nr;
r1_bio->state = 0;
@@ -1541,8 +1570,7 @@ static int run(mddev_t *mddev)
mddev->recovery_cp = MaxSector;
spin_lock_init(&conf->resync_lock);
- init_waitqueue_head(&conf->wait_idle);
- init_waitqueue_head(&conf->wait_resume);
+ init_waitqueue_head(&conf->wait_barrier);
bio_list_init(&conf->pending_bio_list);
bio_list_init(&conf->flushing_bio_list);
@@ -1713,11 +1741,7 @@ static int raid1_reshape(mddev_t *mddev,
}
memset(newmirrors, 0, sizeof(struct mirror_info)*raid_disks);
- spin_lock_irq(&conf->resync_lock);
- conf->barrier++;
- wait_event_lock_irq(conf->wait_idle, !conf->nr_pending,
- conf->resync_lock, raid1_unplug(mddev->queue));
- spin_unlock_irq(&conf->resync_lock);
+ raise_barrier(conf);
/* ok, everything is stopped */
oldpool = conf->r1bio_pool;
@@ -1737,12 +1761,7 @@ static int raid1_reshape(mddev_t *mddev,
conf->raid_disks = mddev->raid_disks = raid_disks;
conf->last_used = 0; /* just make sure it is in-range */
- spin_lock_irq(&conf->resync_lock);
- conf->barrier--;
- spin_unlock_irq(&conf->resync_lock);
- wake_up(&conf->wait_resume);
- wake_up(&conf->wait_idle);
-
+ lower_barrier(conf);
set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
md_wakeup_thread(mddev->thread);
@@ -1757,18 +1776,10 @@ static void raid1_quiesce(mddev_t *mddev
switch(state) {
case 1:
- spin_lock_irq(&conf->resync_lock);
- conf->barrier++;
- wait_event_lock_irq(conf->wait_idle, !conf->nr_pending,
- conf->resync_lock, raid1_unplug(mddev->queue));
- spin_unlock_irq(&conf->resync_lock);
+ raise_barrier(conf);
break;
case 0:
- spin_lock_irq(&conf->resync_lock);
- conf->barrier--;
- spin_unlock_irq(&conf->resync_lock);
- wake_up(&conf->wait_resume);
- wake_up(&conf->wait_idle);
+ lower_barrier(conf);
break;
}
if (mddev->thread) {
diff ./include/linux/raid/raid1.h~current~ ./include/linux/raid/raid1.h
--- ./include/linux/raid/raid1.h~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./include/linux/raid/raid1.h 2005-11-28 10:12:17.000000000 +1100
@@ -45,6 +45,7 @@ struct r1_private_data_s {
spinlock_t resync_lock;
int nr_pending;
+ int nr_waiting;
int barrier;
sector_t next_resync;
int fullsync; /* set to 1 if a full sync is needed,
@@ -52,8 +53,7 @@ struct r1_private_data_s {
* Cleared when a sync completes.
*/
- wait_queue_head_t wait_idle;
- wait_queue_head_t wait_resume;
+ wait_queue_head_t wait_barrier;
struct pool_info *poolinfo;
* [PATCH md 007 of 18] Improve raid10 "IO Barrier" concept.
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
raid10 needs to put up a barrier to new requests while it
does resync or other background recovery.
The code for this is currently open-coded, slightly obscured
by its use of two waitqueues, and not documented.
This patch gathers all the related code into 4 functions,
and includes a comment which (hopefully) explains what is happening.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid10.c | 135 ++++++++++++++++++++++++------------------
./include/linux/raid/raid10.h | 4 -
2 files changed, 81 insertions(+), 58 deletions(-)
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-28 10:08:26.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-28 10:12:24.000000000 +1100
@@ -47,6 +47,9 @@
static void unplug_slaves(mddev_t *mddev);
+static void allow_barrier(conf_t *conf);
+static void lower_barrier(conf_t *conf);
+
static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
{
conf_t *conf = data;
@@ -175,20 +178,13 @@ static void put_all_bios(conf_t *conf, r
static inline void free_r10bio(r10bio_t *r10_bio)
{
- unsigned long flags;
-
conf_t *conf = mddev_to_conf(r10_bio->mddev);
/*
* Wake up any possible resync thread that waits for the device
* to go idle.
*/
- spin_lock_irqsave(&conf->resync_lock, flags);
- if (!--conf->nr_pending) {
- wake_up(&conf->wait_idle);
- wake_up(&conf->wait_resume);
- }
- spin_unlock_irqrestore(&conf->resync_lock, flags);
+ allow_barrier(conf);
put_all_bios(conf, r10_bio);
mempool_free(r10_bio, conf->r10bio_pool);
@@ -197,22 +193,10 @@ static inline void free_r10bio(r10bio_t
static inline void put_buf(r10bio_t *r10_bio)
{
conf_t *conf = mddev_to_conf(r10_bio->mddev);
- unsigned long flags;
mempool_free(r10_bio, conf->r10buf_pool);
- spin_lock_irqsave(&conf->resync_lock, flags);
- if (!conf->barrier)
- BUG();
- --conf->barrier;
- wake_up(&conf->wait_resume);
- wake_up(&conf->wait_idle);
-
- if (!--conf->nr_pending) {
- wake_up(&conf->wait_idle);
- wake_up(&conf->wait_resume);
- }
- spin_unlock_irqrestore(&conf->resync_lock, flags);
+ lower_barrier(conf);
}
static void reschedule_retry(r10bio_t *r10_bio)
@@ -640,30 +624,82 @@ static int raid10_issue_flush(request_qu
return ret;
}
-/*
- * Throttle resync depth, so that we can both get proper overlapping of
- * requests, but are still able to handle normal requests quickly.
+/* Barriers....
+ * Sometimes we need to suspend IO while we do something else,
+ * either some resync/recovery, or reconfigure the array.
+ * To do this we raise a 'barrier'.
+ * The 'barrier' is a counter that can be raised multiple times
+ * to count how many activities are happening which preclude
+ * normal IO.
+ * We can only raise the barrier if there is no pending IO.
+ * i.e. if nr_pending == 0.
+ * We choose only to raise the barrier if no-one is waiting for the
+ * barrier to go down. This means that as soon as an IO request
+ * is ready, no other operations which require a barrier will start
+ * until the IO request has had a chance.
+ *
+ * So: regular IO calls 'wait_barrier'. When that returns there
+ * is no background IO happening. It must arrange to call
+ * allow_barrier when it has finished its IO.
+ * background IO calls must call raise_barrier. Once that returns
+ * there is no normal IO happening. It must arrange to call
+ * lower_barrier when the particular background IO completes.
*/
#define RESYNC_DEPTH 32
-static void device_barrier(conf_t *conf, sector_t sect)
+static void raise_barrier(conf_t *conf)
{
spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_idle, !waitqueue_active(&conf->wait_resume),
- conf->resync_lock, unplug_slaves(conf->mddev));
- if (!conf->barrier++) {
- wait_event_lock_irq(conf->wait_idle, !conf->nr_pending,
- conf->resync_lock, unplug_slaves(conf->mddev));
- if (conf->nr_pending)
- BUG();
- }
- wait_event_lock_irq(conf->wait_resume, conf->barrier < RESYNC_DEPTH,
- conf->resync_lock, unplug_slaves(conf->mddev));
- conf->next_resync = sect;
+ /* Wait until no block IO is waiting */
+ wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
+ conf->resync_lock,
+ raid10_unplug(conf->mddev->queue));
+
+ /* block any new IO from starting */
+ conf->barrier++;
+
+ /* Now wait for all pending IO to complete */
+ wait_event_lock_irq(conf->wait_barrier,
+ !conf->nr_pending && conf->barrier < RESYNC_DEPTH,
+ conf->resync_lock,
+ raid10_unplug(conf->mddev->queue));
+
+ spin_unlock_irq(&conf->resync_lock);
+}
+
+static void lower_barrier(conf_t *conf)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&conf->resync_lock, flags);
+ conf->barrier--;
+ spin_unlock_irqrestore(&conf->resync_lock, flags);
+ wake_up(&conf->wait_barrier);
+}
+
+static void wait_barrier(conf_t *conf)
+{
+ spin_lock_irq(&conf->resync_lock);
+ if (conf->barrier) {
+ conf->nr_waiting++;
+ wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
+ conf->resync_lock,
+ raid10_unplug(conf->mddev->queue));
+ conf->nr_waiting--;
+ }
+ conf->nr_pending++;
spin_unlock_irq(&conf->resync_lock);
}
+static void allow_barrier(conf_t *conf)
+{
+ unsigned long flags;
+ spin_lock_irqsave(&conf->resync_lock, flags);
+ conf->nr_pending--;
+ spin_unlock_irqrestore(&conf->resync_lock, flags);
+ wake_up(&conf->wait_barrier);
+}
+
static int make_request(request_queue_t *q, struct bio * bio)
{
mddev_t *mddev = q->queuedata;
@@ -719,10 +755,7 @@ static int make_request(request_queue_t
* thread has put up a bar for new requests.
* Continue immediately if no resync is active currently.
*/
- spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_resume, !conf->barrier, conf->resync_lock, );
- conf->nr_pending++;
- spin_unlock_irq(&conf->resync_lock);
+ wait_barrier(conf);
disk_stat_inc(mddev->gendisk, ios[rw]);
disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bio));
@@ -897,13 +930,8 @@ static void print_conf(conf_t *conf)
static void close_sync(conf_t *conf)
{
- spin_lock_irq(&conf->resync_lock);
- wait_event_lock_irq(conf->wait_resume, !conf->barrier,
- conf->resync_lock, unplug_slaves(conf->mddev));
- spin_unlock_irq(&conf->resync_lock);
-
- if (conf->barrier) BUG();
- if (waitqueue_active(&conf->wait_idle)) BUG();
+ wait_barrier(conf);
+ allow_barrier(conf);
mempool_destroy(conf->r10buf_pool);
conf->r10buf_pool = NULL;
@@ -1395,9 +1423,10 @@ static sector_t sync_request(mddev_t *md
* If there is non-resync activity waiting for us then
* put in a delay to throttle resync.
*/
- if (!go_faster && waitqueue_active(&conf->wait_resume))
+ if (!go_faster && conf->nr_waiting)
msleep_interruptible(1000);
- device_barrier(conf, sector_nr + RESYNC_SECTORS);
+ raise_barrier(conf);
+ conf->next_resync = sector_nr;
/* Again, very different code for resync and recovery.
* Both must result in an r10bio with a list of bios that
@@ -1427,7 +1456,6 @@ static sector_t sync_request(mddev_t *md
r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
spin_lock_irq(&conf->resync_lock);
- conf->nr_pending++;
if (rb2) conf->barrier++;
spin_unlock_irq(&conf->resync_lock);
atomic_set(&r10_bio->remaining, 0);
@@ -1500,10 +1528,6 @@ static sector_t sync_request(mddev_t *md
int count = 0;
r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
- spin_lock_irq(&conf->resync_lock);
- conf->nr_pending++;
- spin_unlock_irq(&conf->resync_lock);
-
r10_bio->mddev = mddev;
atomic_set(&r10_bio->remaining, 0);
@@ -1713,8 +1737,7 @@ static int run(mddev_t *mddev)
INIT_LIST_HEAD(&conf->retry_list);
spin_lock_init(&conf->resync_lock);
- init_waitqueue_head(&conf->wait_idle);
- init_waitqueue_head(&conf->wait_resume);
+ init_waitqueue_head(&conf->wait_barrier);
/* need to check that every block has at least one working mirror */
if (!enough(conf)) {
diff ./include/linux/raid/raid10.h~current~ ./include/linux/raid/raid10.h
--- ./include/linux/raid/raid10.h~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./include/linux/raid/raid10.h 2005-11-28 10:12:24.000000000 +1100
@@ -39,11 +39,11 @@ struct r10_private_data_s {
spinlock_t resync_lock;
int nr_pending;
+ int nr_waiting;
int barrier;
sector_t next_resync;
- wait_queue_head_t wait_idle;
- wait_queue_head_t wait_resume;
+ wait_queue_head_t wait_barrier;
mempool_t *r10bio_pool;
mempool_t *r10buf_pool;
* [PATCH md 008 of 18] Small cleanups for raid5
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
Resync code:
A test that isn't needed,
a 'compute_block' that makes more sense
elsewhere (and then doesn't need a test),
and a couple of BUG_ONs to confirm the change makes sense.
Printks:
A few were missing KERN_*.
Also fix a typo in a comment.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 41 +++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-11-28 10:08:33.000000000 +1100
+++ ./drivers/md/raid5.c 2005-11-28 10:12:30.000000000 +1100
@@ -416,7 +416,7 @@ static int raid5_end_read_request(struct
set_bit(R5_UPTODATE, &sh->dev[i].flags);
#endif
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
- printk("R5: read error corrected!!\n");
+ printk(KERN_INFO "raid5: read error corrected!!\n");
clear_bit(R5_ReadError, &sh->dev[i].flags);
clear_bit(R5_ReWrite, &sh->dev[i].flags);
}
@@ -427,13 +427,14 @@ static int raid5_end_read_request(struct
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
atomic_inc(&conf->disks[i].rdev->read_errors);
if (conf->mddev->degraded)
- printk("R5: read error not correctable.\n");
+ printk(KERN_WARNING "raid5: read error not correctable.\n");
else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
/* Oh, no!!! */
- printk("R5: read error NOT corrected!!\n");
+ printk(KERN_WARNING "raid5: read error NOT corrected!!\n");
else if (atomic_read(&conf->disks[i].rdev->read_errors)
> conf->max_nr_stripes)
- printk("raid5: Too many read errors, failing device.\n");
+ printk(KERN_WARNING
+ "raid5: Too many read errors, failing device.\n");
else
retry = 1;
if (retry)
@@ -603,7 +604,7 @@ static sector_t raid5_compute_sector(sec
*dd_idx = (*pd_idx + 1 + *dd_idx) % raid_disks;
break;
default:
- printk("raid5: unsupported algorithm %d\n",
+ printk(KERN_ERR "raid5: unsupported algorithm %d\n",
conf->algorithm);
}
@@ -644,7 +645,7 @@ static sector_t compute_blocknr(struct s
i -= (sh->pd_idx + 1);
break;
default:
- printk("raid5: unsupported algorithm %d\n",
+ printk(KERN_ERR "raid5: unsupported algorithm %d\n",
conf->algorithm);
}
@@ -653,7 +654,7 @@ static sector_t compute_blocknr(struct s
check = raid5_compute_sector (r_sector, raid_disks, data_disks, &dummy1, &dummy2, conf);
if (check != sh->sector || dummy1 != dd_idx || dummy2 != sh->pd_idx) {
- printk("compute_blocknr: map not correct\n");
+ printk(KERN_ERR "compute_blocknr: map not correct\n");
return 0;
}
return r_sector;
@@ -736,7 +737,7 @@ static void compute_block(struct stripe_
if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
ptr[count++] = p;
else
- printk("compute_block() %d, stripe %llu, %d"
+ printk(KERN_ERR "compute_block() %d, stripe %llu, %d"
" not present\n", dd_idx,
(unsigned long long)sh->sector, i);
@@ -1004,7 +1005,7 @@ static void handle_stripe(struct stripe_
if (dev->written) written++;
rdev = conf->disks[i].rdev; /* FIXME, should I be looking rdev */
if (!rdev || !test_bit(In_sync, &rdev->flags)) {
- /* The ReadError flag wil just be confusing now */
+ /* The ReadError flag will just be confusing now */
clear_bit(R5_ReadError, &dev->flags);
clear_bit(R5_ReWrite, &dev->flags);
}
@@ -1287,7 +1288,7 @@ static void handle_stripe(struct stripe_
* is available
*/
if (syncing && locked == 0 &&
- !test_bit(STRIPE_INSYNC, &sh->state) && failed <= 1) {
+ !test_bit(STRIPE_INSYNC, &sh->state)) {
set_bit(STRIPE_HANDLE, &sh->state);
if (failed == 0) {
char *pagea;
@@ -1305,21 +1306,20 @@ static void handle_stripe(struct stripe_
if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
/* don't try to repair!! */
set_bit(STRIPE_INSYNC, &sh->state);
+ else {
+ compute_block(sh, sh->pd_idx);
+ uptodate++;
+ }
}
}
if (!test_bit(STRIPE_INSYNC, &sh->state)) {
+ /* either failed parity check, or recovery is happening */
if (failed==0)
failed_num = sh->pd_idx;
- /* should be able to compute the missing block and write it to spare */
- if (!test_bit(R5_UPTODATE, &sh->dev[failed_num].flags)) {
- if (uptodate+1 != disks)
- BUG();
- compute_block(sh, failed_num);
- uptodate++;
- }
- if (uptodate != disks)
- BUG();
dev = &sh->dev[failed_num];
+ BUG_ON(!test_bit(R5_UPTODATE, &dev->flags));
+ BUG_ON(uptodate != disks);
+
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
clear_bit(STRIPE_DEGRADED, &sh->state);
@@ -1821,7 +1821,8 @@ static int run(mddev_t *mddev)
struct list_head *tmp;
if (mddev->level != 5 && mddev->level != 4) {
- printk("raid5: %s: raid level not set to 4/5 (%d)\n", mdname(mddev), mddev->level);
+ printk(KERN_ERR "raid5: %s: raid level not set to 4/5 (%d)\n",
+ mdname(mddev), mddev->level);
return -EIO;
}
* [PATCH md 010 of 18] Move bitmap_create to after md array has been initialised.
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
This is important because bitmap_create uses
mddev->resync_max_sectors
and that doesn't have a valid value until after the array
has been initialised (with pers->run()).
[It doesn't make a difference for current personalities that
support bitmaps, but will make a difference for raid10]
This has the added advantage that we can move the
thread->timeout manipulation inside the bitmap.c code instead
of sprinkling identical code throughout all personalities.
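The reordering in do_md_run, boiled down (see the md.c hunk below for
the real code):

    err = mddev->pers->run(mddev);      /* resync_max_sectors valid now */
    if (!err && mddev->pers->sync_request) {
            err = bitmap_create(mddev); /* safe to size the bitmap */
            if (err)                    /* undo the run() on failure */
                    mddev->pers->stop(mddev);
    }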
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/bitmap.c | 4 ++++
./drivers/md/md.c | 16 +++++++++-------
./drivers/md/raid1.c | 8 --------
./drivers/md/raid5.c | 11 +----------
./drivers/md/raid6main.c | 11 +----------
5 files changed, 15 insertions(+), 35 deletions(-)
diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-11-28 10:12:40.000000000 +1100
@@ -1530,6 +1530,8 @@ void bitmap_destroy(mddev_t *mddev)
return;
mddev->bitmap = NULL; /* disconnect from the md device */
+ if (mddev->thread)
+ mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
bitmap_free(bitmap);
}
@@ -1636,6 +1638,8 @@ int bitmap_create(mddev_t *mddev)
if (IS_ERR(bitmap->writeback_daemon))
return PTR_ERR(bitmap->writeback_daemon);
+ mddev->thread->timeout = bitmap->daemon_sleep * HZ;
+
return bitmap_update_sb(bitmap);
error:
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-11-28 10:12:33.000000000 +1100
+++ ./drivers/md/md.c 2005-11-28 10:12:40.000000000 +1100
@@ -2054,13 +2054,15 @@ static int do_md_run(mddev_t * mddev)
if (start_readonly)
mddev->ro = 2; /* read-only, but switch on first write */
- /* before we start the array running, initialise the bitmap */
- err = bitmap_create(mddev);
- if (err)
- printk(KERN_ERR "%s: failed to create bitmap (%d)\n",
- mdname(mddev), err);
- else
- err = mddev->pers->run(mddev);
+ err = mddev->pers->run(mddev);
+ if (!err && mddev->pers->sync_request) {
+ err = bitmap_create(mddev);
+ if (err) {
+ printk(KERN_ERR "%s: failed to create bitmap (%d)\n",
+ mdname(mddev), err);
+ mddev->pers->stop(mddev);
+ }
+ }
if (err) {
printk(KERN_ERR "md: pers->run() failed ...\n");
module_put(mddev->pers->owner);
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:12:17.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:12:40.000000000 +1100
@@ -1610,7 +1610,6 @@ static int run(mddev_t *mddev)
mdname(mddev));
goto out_free_conf;
}
- if (mddev->bitmap) mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
printk(KERN_INFO
"raid1: raid set %s active with %d out of %d mirrors\n",
@@ -1782,13 +1781,6 @@ static void raid1_quiesce(mddev_t *mddev
lower_barrier(conf);
break;
}
- if (mddev->thread) {
- if (mddev->bitmap)
- mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
- else
- mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
- md_wakeup_thread(mddev->thread);
- }
}
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-11-28 10:12:33.000000000 +1100
+++ ./drivers/md/raid5.c 2005-11-28 10:12:40.000000000 +1100
@@ -1963,9 +1963,6 @@ memory = conf->max_nr_stripes * (sizeof(
/* Ok, everything is just fine now */
sysfs_create_group(&mddev->kobj, &raid5_attrs_group);
- if (mddev->bitmap)
- mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
-
mddev->queue->unplug_fn = raid5_unplug_device;
mddev->queue->issue_flush_fn = raid5_issue_flush;
@@ -2199,14 +2196,8 @@ static void raid5_quiesce(mddev_t *mddev
spin_unlock_irq(&conf->device_lock);
break;
}
- if (mddev->thread) {
- if (mddev->bitmap)
- mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
- else
- mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
- md_wakeup_thread(mddev->thread);
- }
}
+
static mdk_personality_t raid5_personality=
{
.name = "raid5",
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:12:33.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:12:40.000000000 +1100
@@ -1990,9 +1990,6 @@ static int run(mddev_t *mddev)
/* Ok, everything is just fine now */
mddev->array_size = mddev->size * (mddev->raid_disks - 2);
- if (mddev->bitmap)
- mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
-
mddev->queue->unplug_fn = raid6_unplug_device;
mddev->queue->issue_flush_fn = raid6_issue_flush;
return 0;
@@ -2228,14 +2225,8 @@ static void raid6_quiesce(mddev_t *mddev
spin_unlock_irq(&conf->device_lock);
break;
}
- if (mddev->thread) {
- if (mddev->bitmap)
- mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
- else
- mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
- md_wakeup_thread(mddev->thread);
- }
}
+
static mdk_personality_t raid6_personality=
{
.name = "raid6",
* [PATCH md 011 of 18] Write intent bitmap support for raid10
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 10 +-
./drivers/md/raid10.c | 178 +++++++++++++++++++++++++++++++++++++-----
./include/linux/raid/raid10.h | 9 +-
3 files changed, 171 insertions(+), 26 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-11-28 10:12:40.000000000 +1100
+++ ./drivers/md/md.c 2005-11-28 10:12:52.000000000 +1100
@@ -714,9 +714,10 @@ static int super_90_validate(mddev_t *md
if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
mddev->bitmap_file == NULL) {
- if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6) {
+ if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
+ && mddev->level != 10) {
/* FIXME use a better test */
- printk(KERN_WARNING "md: bitmaps only support for raid1\n");
+ printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
return -EINVAL;
}
mddev->bitmap_offset = mddev->default_bitmap_offset;
@@ -1037,8 +1038,9 @@ static int super_1_validate(mddev_t *mdd
if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
mddev->bitmap_file == NULL ) {
- if (mddev->level != 1) {
- printk(KERN_WARNING "md: bitmaps only supported for raid1\n");
+ if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
+ && mddev->level != 10) {
+ printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
return -EINVAL;
}
mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~ 2005-11-28 10:12:24.000000000 +1100
+++ ./drivers/md/raid10.c 2005-11-28 10:12:52.000000000 +1100
@@ -18,7 +18,9 @@
* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
+#include "dm-bio-list.h"
#include <linux/raid/raid10.h>
+#include <linux/raid/bitmap.h>
/*
* RAID10 provides a combination of RAID0 and RAID1 functionality.
@@ -306,9 +308,11 @@ static int raid10_end_write_request(stru
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
- if (!uptodate)
+ if (!uptodate) {
md_error(r10_bio->mddev, conf->mirrors[dev].rdev);
- else
+ /* an I/O failed, we can't clear the bitmap */
+ set_bit(R10BIO_Degraded, &r10_bio->state);
+ } else
/*
* Set R10BIO_Uptodate in our master bio, so that
* we will return a good error code for to the higher
@@ -328,6 +332,11 @@ static int raid10_end_write_request(stru
* already.
*/
if (atomic_dec_and_test(&r10_bio->remaining)) {
+ /* clear the bitmap if all writes complete successfully */
+ bitmap_endwrite(r10_bio->mddev->bitmap, r10_bio->sector,
+ r10_bio->sectors,
+ !test_bit(R10BIO_Degraded, &r10_bio->state),
+ 0);
md_write_end(r10_bio->mddev);
raid_end_bio_io(r10_bio);
}
@@ -486,8 +495,9 @@ static int read_balance(conf_t *conf, r1
rcu_read_lock();
/*
* Check if we can balance. We can balance on the whole
- * device if no resync is going on, or below the resync window.
- * We take the first readable disk when above the resync window.
+ * device if no resync is going on (recovery is ok), or below
+ * the resync window. We take the first readable disk when
+ * above the resync window.
*/
if (conf->mddev->recovery_cp < MaxSector
&& (this_sector + sectors >= conf->next_resync)) {
@@ -591,7 +601,10 @@ static void unplug_slaves(mddev_t *mddev
static void raid10_unplug(request_queue_t *q)
{
+ mddev_t *mddev = q->queuedata;
+
unplug_slaves(q->queuedata);
+ md_wakeup_thread(mddev->thread);
}
static int raid10_issue_flush(request_queue_t *q, struct gendisk *disk,
@@ -647,12 +660,13 @@ static int raid10_issue_flush(request_qu
*/
#define RESYNC_DEPTH 32
-static void raise_barrier(conf_t *conf)
+static void raise_barrier(conf_t *conf, int force)
{
+ BUG_ON(force && !conf->barrier);
spin_lock_irq(&conf->resync_lock);
- /* Wait until no block IO is waiting */
- wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
+ /* Wait until no block IO is waiting (unless 'force') */
+ wait_event_lock_irq(conf->wait_barrier, force || !conf->nr_waiting,
conf->resync_lock,
raid10_unplug(conf->mddev->queue));
@@ -710,6 +724,8 @@ static int make_request(request_queue_t
int i;
int chunk_sects = conf->chunk_mask + 1;
const int rw = bio_data_dir(bio);
+ struct bio_list bl;
+ unsigned long flags;
if (unlikely(bio_barrier(bio))) {
bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
@@ -767,6 +783,7 @@ static int make_request(request_queue_t
r10_bio->mddev = mddev;
r10_bio->sector = bio->bi_sector;
+ r10_bio->state = 0;
if (rw == READ) {
/*
@@ -811,13 +828,16 @@ static int make_request(request_queue_t
!test_bit(Faulty, &rdev->flags)) {
atomic_inc(&rdev->nr_pending);
r10_bio->devs[i].bio = bio;
- } else
+ } else {
r10_bio->devs[i].bio = NULL;
+ set_bit(R10BIO_Degraded, &r10_bio->state);
+ }
}
rcu_read_unlock();
- atomic_set(&r10_bio->remaining, 1);
+ atomic_set(&r10_bio->remaining, 0);
+ bio_list_init(&bl);
for (i = 0; i < conf->copies; i++) {
struct bio *mbio;
int d = r10_bio->devs[i].devnum;
@@ -835,13 +855,14 @@ static int make_request(request_queue_t
mbio->bi_private = r10_bio;
atomic_inc(&r10_bio->remaining);
- generic_make_request(mbio);
+ bio_list_add(&bl, mbio);
}
- if (atomic_dec_and_test(&r10_bio->remaining)) {
- md_write_end(mddev);
- raid_end_bio_io(r10_bio);
- }
+ bitmap_startwrite(mddev->bitmap, bio->bi_sector, r10_bio->sectors, 0);
+ spin_lock_irqsave(&conf->device_lock, flags);
+ bio_list_merge(&conf->pending_bio_list, &bl);
+ blk_plug_device(mddev->queue);
+ spin_unlock_irqrestore(&conf->device_lock, flags);
return 0;
}
@@ -999,7 +1020,12 @@ static int raid10_add_disk(mddev_t *mdde
if (!enough(conf))
return 0;
- for (mirror=0; mirror < mddev->raid_disks; mirror++)
+ if (rdev->saved_raid_disk >= 0 &&
+ conf->mirrors[rdev->saved_raid_disk].rdev == NULL)
+ mirror = rdev->saved_raid_disk;
+ else
+ mirror = 0;
+ for ( ; mirror < mddev->raid_disks; mirror++)
if ( !(p=conf->mirrors+mirror)->rdev) {
blk_queue_stack_limits(mddev->queue,
@@ -1015,6 +1041,8 @@ static int raid10_add_disk(mddev_t *mdde
p->head_position = 0;
rdev->raid_disk = mirror;
found = 1;
+ if (rdev->saved_raid_disk != mirror)
+ conf->fullsync = 1;
rcu_assign_pointer(p->rdev, rdev);
break;
}
@@ -1282,6 +1310,26 @@ static void raid10d(mddev_t *mddev)
for (;;) {
char b[BDEVNAME_SIZE];
spin_lock_irqsave(&conf->device_lock, flags);
+
+ if (conf->pending_bio_list.head) {
+ bio = bio_list_get(&conf->pending_bio_list);
+ blk_remove_plug(mddev->queue);
+ spin_unlock_irqrestore(&conf->device_lock, flags);
+ /* flush any pending bitmap writes to disk before proceeding w/ I/O */
+ if (bitmap_unplug(mddev->bitmap) != 0)
+ printk("%s: bitmap file write failed!\n", mdname(mddev));
+
+ while (bio) { /* submit pending writes */
+ struct bio *next = bio->bi_next;
+ bio->bi_next = NULL;
+ generic_make_request(bio);
+ bio = next;
+ }
+ unplug = 1;
+
+ continue;
+ }
+
if (list_empty(head))
break;
r10_bio = list_entry(head->prev, r10bio_t, retry_list);
@@ -1388,6 +1436,8 @@ static sector_t sync_request(mddev_t *md
sector_t max_sector, nr_sectors;
int disk;
int i;
+ int max_sync;
+ int sync_blocks;
sector_t sectors_skipped = 0;
int chunks_skipped = 0;
@@ -1401,6 +1451,29 @@ static sector_t sync_request(mddev_t *md
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
max_sector = mddev->resync_max_sectors;
if (sector_nr >= max_sector) {
+ /* If we aborted, we need to abort the
+ * sync on the 'current' bitmap chunks (there can
+ * be several when recovering multiple devices).
+ * as we may have started syncing it but not finished.
+ * We can find the current address in
+ * mddev->curr_resync, but for recovery,
+ * we need to convert that to several
+ * virtual addresses.
+ */
+ if (mddev->curr_resync < max_sector) { /* aborted */
+ if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
+ bitmap_end_sync(mddev->bitmap, mddev->curr_resync,
+ &sync_blocks, 1);
+ else for (i=0; i<conf->raid_disks; i++) {
+ sector_t sect =
+ raid10_find_virt(conf, mddev->curr_resync, i);
+ bitmap_end_sync(mddev->bitmap, sect,
+ &sync_blocks, 1);
+ }
+ } else /* completed sync */
+ conf->fullsync = 0;
+
+ bitmap_close_sync(mddev->bitmap);
close_sync(conf);
*skipped = 1;
return sectors_skipped;
@@ -1425,8 +1498,6 @@ static sector_t sync_request(mddev_t *md
*/
if (!go_faster && conf->nr_waiting)
msleep_interruptible(1000);
- raise_barrier(conf);
- conf->next_resync = sector_nr;
/* Again, very different code for resync and recovery.
* Both must result in an r10bio with a list of bios that
@@ -1443,6 +1514,7 @@ static sector_t sync_request(mddev_t *md
* end_sync_write if we will want to write.
*/
+ max_sync = RESYNC_PAGES << (PAGE_SHIFT-9);
if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
/* recovery... the complicated one */
int i, j, k;
@@ -1451,13 +1523,29 @@ static sector_t sync_request(mddev_t *md
for (i=0 ; i<conf->raid_disks; i++)
if (conf->mirrors[i].rdev &&
!test_bit(In_sync, &conf->mirrors[i].rdev->flags)) {
+ int still_degraded = 0;
/* want to reconstruct this device */
r10bio_t *rb2 = r10_bio;
+ sector_t sect = raid10_find_virt(conf, sector_nr, i);
+ int must_sync;
+ /* Unless we are doing a full sync, we only need
+ * to recover the block if it is set in the bitmap
+ */
+ must_sync = bitmap_start_sync(mddev->bitmap, sect,
+ &sync_blocks, 1);
+ if (sync_blocks < max_sync)
+ max_sync = sync_blocks;
+ if (!must_sync &&
+ !conf->fullsync) {
+ /* yep, skip the sync_blocks here, but don't assume
+ * that there will never be anything to do here
+ */
+ chunks_skipped = -1;
+ continue;
+ }
r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
- spin_lock_irq(&conf->resync_lock);
- if (rb2) conf->barrier++;
- spin_unlock_irq(&conf->resync_lock);
+ raise_barrier(conf, rb2 != NULL);
atomic_set(&r10_bio->remaining, 0);
r10_bio->master_bio = (struct bio*)rb2;
@@ -1465,8 +1553,21 @@ static sector_t sync_request(mddev_t *md
atomic_inc(&rb2->remaining);
r10_bio->mddev = mddev;
set_bit(R10BIO_IsRecover, &r10_bio->state);
- r10_bio->sector = raid10_find_virt(conf, sector_nr, i);
+ r10_bio->sector = sect;
+
raid10_find_phys(conf, r10_bio);
+ /* Need to check if this section will still be
+ * degraded
+ */
+ for (j=0; j<conf->copies;j++) {
+ int d = r10_bio->devs[j].devnum;
+ if (conf->mirrors[d].rdev == NULL ||
+ test_bit(Faulty, &conf->mirrors[d].rdev->flags))
+ still_degraded = 1;
+ }
+ must_sync = bitmap_start_sync(mddev->bitmap, sect,
+ &sync_blocks, still_degraded);
+
for (j=0; j<conf->copies;j++) {
int d = r10_bio->devs[j].devnum;
if (conf->mirrors[d].rdev &&
@@ -1526,10 +1627,22 @@ static sector_t sync_request(mddev_t *md
} else {
/* resync. Schedule a read for every block at this virt offset */
int count = 0;
+
+ if (!bitmap_start_sync(mddev->bitmap, sector_nr,
+ &sync_blocks, mddev->degraded) &&
+ !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ /* We can skip this block */
+ *skipped = 1;
+ return sync_blocks + sectors_skipped;
+ }
+ if (sync_blocks < max_sync)
+ max_sync = sync_blocks;
r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
r10_bio->mddev = mddev;
atomic_set(&r10_bio->remaining, 0);
+ raise_barrier(conf, 0);
+ conf->next_resync = sector_nr;
r10_bio->master_bio = NULL;
r10_bio->sector = sector_nr;
@@ -1582,6 +1695,8 @@ static sector_t sync_request(mddev_t *md
}
nr_sectors = 0;
+ if (sector_nr + max_sync < max_sector)
+ max_sector = sector_nr + max_sync;
do {
struct page *page;
int len = PAGE_SIZE;
@@ -1821,6 +1936,26 @@ static int stop(mddev_t *mddev)
return 0;
}
+static void raid10_quiesce(mddev_t *mddev, int state)
+{
+ conf_t *conf = mddev_to_conf(mddev);
+
+ switch(state) {
+ case 1:
+ raise_barrier(conf, 0);
+ break;
+ case 0:
+ lower_barrier(conf);
+ break;
+ }
+ if (mddev->thread) {
+ if (mddev->bitmap)
+ mddev->thread->timeout = mddev->bitmap->daemon_sleep * HZ;
+ else
+ mddev->thread->timeout = MAX_SCHEDULE_TIMEOUT;
+ md_wakeup_thread(mddev->thread);
+ }
+}
static mdk_personality_t raid10_personality =
{
@@ -1835,6 +1970,7 @@ static mdk_personality_t raid10_personal
.hot_remove_disk= raid10_remove_disk,
.spare_active = raid10_spare_active,
.sync_request = sync_request,
+ .quiesce = raid10_quiesce,
};
static int __init raid_init(void)
diff ./include/linux/raid/raid10.h~current~ ./include/linux/raid/raid10.h
--- ./include/linux/raid/raid10.h~current~ 2005-11-28 10:12:24.000000000 +1100
+++ ./include/linux/raid/raid10.h 2005-11-28 10:12:52.000000000 +1100
@@ -35,13 +35,19 @@ struct r10_private_data_s {
sector_t chunk_mask;
struct list_head retry_list;
- /* for use when syncing mirrors: */
+ /* queue pending writes and submit them on unplug */
+ struct bio_list pending_bio_list;
+
spinlock_t resync_lock;
int nr_pending;
int nr_waiting;
int barrier;
sector_t next_resync;
+ int fullsync; /* set to 1 if a full sync is needed,
+ * (fresh device added).
+ * Cleared when a sync completes.
+ */
wait_queue_head_t wait_barrier;
@@ -100,4 +106,5 @@ struct r10bio_s {
#define R10BIO_Uptodate 0
#define R10BIO_IsSync 1
#define R10BIO_IsRecover 2
+#define R10BIO_Degraded 3
#endif
* [PATCH md 012 of 18] Fix raid6 resync check/repair code.
From: NeilBrown @ 2005-11-27 23:40 UTC
To: Andrew Morton; +Cc: linux-raid
raid6 currently does not check the P/Q syndromes when doing a resync;
it just calculates the correct values and writes them.
Doing the check can reduce writes (often to 0) for a resync,
and it is needed to properly implement the
echo check > sync_action
operation.
This patch implements the appropriate checks and tidies up some related
code.
It also allows raid6 user-requested resync to bypass the intent bitmap.
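For illustration, the shape of the P check is: recompute the parity with
the stored P block folded in as well; if the stored P was correct, the
result is all zeroes and nothing needs to be written. A minimal userspace
sketch of that idea (stand-in sizes and buffers, not the kernel code; the
zero test mirrors the overlapping-memcmp trick of the page_is_zero()
helper added below):

#include <stdio.h>
#include <string.h>

#define STRIPE_SIZE 64          /* tiny stand-in for the real stripe */

/* byte 0 is zero and the buffer equals itself shifted by one byte,
 * so one memcmp proves every byte is zero */
static int block_is_zero(const unsigned char *b)
{
        return b[0] == 0 && memcmp(b, b + 1, STRIPE_SIZE - 1) == 0;
}

int main(void)
{
        unsigned char d0[STRIPE_SIZE], d1[STRIPE_SIZE];
        unsigned char p[STRIPE_SIZE], check[STRIPE_SIZE];
        int i;

        memset(d0, 0xaa, sizeof(d0));
        memset(d1, 0x55, sizeof(d1));
        for (i = 0; i < STRIPE_SIZE; i++)       /* parity as stored on disk */
                p[i] = d0[i] ^ d1[i];

        /* the check: XOR the data blocks *and* the stored P together */
        for (i = 0; i < STRIPE_SIZE; i++)
                check[i] = d0[i] ^ d1[i] ^ p[i];

        printf("P is %s\n", block_is_zero(check) ? "correct" : "wrong");
        return 0;
}

Q cannot use the fold-to-zero shortcut, so the patch instead saves a copy
of Q in a spare page, recomputes it, and memcmp()s the two.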
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid6main.c | 184 +++++++++++++++++++++++++------------------
./include/linux/raid/raid5.h | 2
2 files changed, 109 insertions(+), 77 deletions(-)
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:12:40.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:12:56.000000000 +1100
@@ -805,7 +805,7 @@ static void compute_parity(struct stripe
}
/* Compute one missing block */
-static void compute_block_1(struct stripe_head *sh, int dd_idx)
+static void compute_block_1(struct stripe_head *sh, int dd_idx, int nozero)
{
raid6_conf_t *conf = sh->raid_conf;
int i, count, disks = conf->raid_disks;
@@ -821,7 +821,7 @@ static void compute_block_1(struct strip
compute_parity(sh, UPDATE_PARITY);
} else {
ptr[0] = page_address(sh->dev[dd_idx].page);
- memset(ptr[0], 0, STRIPE_SIZE);
+ if (!nozero) memset(ptr[0], 0, STRIPE_SIZE);
count = 1;
for (i = disks ; i--; ) {
if (i == dd_idx || i == qd_idx)
@@ -838,7 +838,8 @@ static void compute_block_1(struct strip
}
if (count != 1)
xor_block(count, STRIPE_SIZE, ptr);
- set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
+ if (!nozero) set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
+ else clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
}
}
@@ -871,7 +872,7 @@ static void compute_block_2(struct strip
return;
} else {
/* We're missing D+Q; recompute D from P */
- compute_block_1(sh, (dd_idx1 == qd_idx) ? dd_idx2 : dd_idx1);
+ compute_block_1(sh, (dd_idx1 == qd_idx) ? dd_idx2 : dd_idx1, 0);
compute_parity(sh, UPDATE_PARITY); /* Is this necessary? */
return;
}
@@ -982,6 +983,12 @@ static int add_stripe_bio(struct stripe_
}
+static int page_is_zero(struct page *p)
+{
+ char *a = page_address(p);
+ return ((*(u32*)a) == 0 &&
+ memcmp(a, a+4, STRIPE_SIZE-4)==0);
+}
/*
* handle_stripe - do things to a stripe.
*
@@ -1000,7 +1007,7 @@ static int add_stripe_bio(struct stripe_
*
*/
-static void handle_stripe(struct stripe_head *sh)
+static void handle_stripe(struct stripe_head *sh, struct page *tmp_page)
{
raid6_conf_t *conf = sh->raid_conf;
int disks = conf->raid_disks;
@@ -1228,7 +1235,7 @@ static void handle_stripe(struct stripe_
if (uptodate == disks-1) {
PRINTK("Computing stripe %llu block %d\n",
(unsigned long long)sh->sector, i);
- compute_block_1(sh, i);
+ compute_block_1(sh, i, 0);
uptodate++;
} else if ( uptodate == disks-2 && failed >= 2 ) {
/* Computing 2-failure is *very* expensive; only do it if failed >= 2 */
@@ -1323,7 +1330,7 @@ static void handle_stripe(struct stripe_
/* We have failed blocks and need to compute them */
switch ( failed ) {
case 0: BUG();
- case 1: compute_block_1(sh, failed_num[0]); break;
+ case 1: compute_block_1(sh, failed_num[0], 0); break;
case 2: compute_block_2(sh, failed_num[0], failed_num[1]); break;
default: BUG(); /* This request should have been failed? */
}
@@ -1338,12 +1345,10 @@ static void handle_stripe(struct stripe_
(unsigned long long)sh->sector, i);
locked++;
set_bit(R5_Wantwrite, &sh->dev[i].flags);
-#if 0 /**** FIX: I don't understand the logic here... ****/
- if (!test_bit(R5_Insync, &sh->dev[i].flags)
- || ((i==pd_idx || i==qd_idx) && failed == 0)) /* FIX? */
- set_bit(STRIPE_INSYNC, &sh->state);
-#endif
}
+ /* after a RECONSTRUCT_WRITE, the stripe MUST be in-sync */
+ set_bit(STRIPE_INSYNC, &sh->state);
+
if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
atomic_dec(&conf->preread_active_stripes);
if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
@@ -1356,79 +1361,97 @@ static void handle_stripe(struct stripe_
* Any reads will already have been scheduled, so we just see if enough data
* is available
*/
- if (syncing && locked == 0 &&
- !test_bit(STRIPE_INSYNC, &sh->state) && failed <= 2) {
- set_bit(STRIPE_HANDLE, &sh->state);
-#if 0 /* RAID-6: Don't support CHECK PARITY yet */
- if (failed == 0) {
- char *pagea;
- if (uptodate != disks)
- BUG();
- compute_parity(sh, CHECK_PARITY);
- uptodate--;
- pagea = page_address(sh->dev[pd_idx].page);
- if ((*(u32*)pagea) == 0 &&
- !memcmp(pagea, pagea+4, STRIPE_SIZE-4)) {
- /* parity is correct (on disc, not in buffer any more) */
- set_bit(STRIPE_INSYNC, &sh->state);
- }
- }
-#endif
- if (!test_bit(STRIPE_INSYNC, &sh->state)) {
- int failed_needupdate[2];
- struct r5dev *adev, *bdev;
-
- if ( failed < 1 )
- failed_num[0] = pd_idx;
- if ( failed < 2 )
- failed_num[1] = (failed_num[0] == qd_idx) ? pd_idx : qd_idx;
+ if (syncing && locked == 0 && !test_bit(STRIPE_INSYNC, &sh->state)) {
+ int update_p = 0, update_q = 0;
+ struct r5dev *dev;
- failed_needupdate[0] = !test_bit(R5_UPTODATE, &sh->dev[failed_num[0]].flags);
- failed_needupdate[1] = !test_bit(R5_UPTODATE, &sh->dev[failed_num[1]].flags);
+ set_bit(STRIPE_HANDLE, &sh->state);
- PRINTK("sync: failed=%d num=%d,%d fnu=%u%u\n",
- failed, failed_num[0], failed_num[1], failed_needupdate[0], failed_needupdate[1]);
+ BUG_ON(failed>2);
+ BUG_ON(uptodate < disks);
+ /* Want to check and possibly repair P and Q.
+ * However there could be one 'failed' device, in which
+ * case we can only check one of them, possibly using the
+ * other to generate missing data
+ */
-#if 0 /* RAID-6: This code seems to require that CHECK_PARITY destroys the uptodateness of the parity */
- /* should be able to compute the missing block(s) and write to spare */
- if ( failed_needupdate[0] ^ failed_needupdate[1] ) {
- if (uptodate+1 != disks)
- BUG();
- compute_block_1(sh, failed_needupdate[0] ? failed_num[0] : failed_num[1]);
- uptodate++;
- } else if ( failed_needupdate[0] & failed_needupdate[1] ) {
- if (uptodate+2 != disks)
- BUG();
- compute_block_2(sh, failed_num[0], failed_num[1]);
- uptodate += 2;
+ /* If !tmp_page, we cannot do the calculations,
+ * but as we have set STRIPE_HANDLE, we will soon be called
+ * by stripe_handle with a tmp_page - just wait until then.
+ */
+ if (tmp_page) {
+ if (failed == q_failed) {
+ /* The only possible failed device holds 'Q', so it makes
+ * sense to check P (If anything else were failed, we would
+ * have used P to recreate it).
+ */
+ compute_block_1(sh, pd_idx, 1);
+ if (!page_is_zero(sh->dev[pd_idx].page)) {
+ compute_block_1(sh,pd_idx,0);
+ update_p = 1;
+ }
+ }
+ if (!q_failed && failed < 2) {
+ /* q is not failed, and we didn't use it to generate
+ * anything, so it makes sense to check it
+ */
+ memcpy(page_address(tmp_page),
+ page_address(sh->dev[qd_idx].page),
+ STRIPE_SIZE);
+ compute_parity(sh, UPDATE_PARITY);
+ if (memcmp(page_address(tmp_page),
+ page_address(sh->dev[qd_idx].page),
+ STRIPE_SIZE)!= 0) {
+ clear_bit(STRIPE_INSYNC, &sh->state);
+ update_q = 1;
+ }
+ }
+ if (update_p || update_q) {
+ conf->mddev->resync_mismatches += STRIPE_SECTORS;
+ if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+ /* don't try to repair!! */
+ update_p = update_q = 0;
}
-#else
- compute_block_2(sh, failed_num[0], failed_num[1]);
- uptodate += failed_needupdate[0] + failed_needupdate[1];
-#endif
-
- if (uptodate != disks)
- BUG();
- PRINTK("Marking for sync stripe %llu blocks %d,%d\n",
- (unsigned long long)sh->sector, failed_num[0], failed_num[1]);
+ /* now write out any block on a failed drive,
+ * or P or Q if they need it
+ */
+
+ if (failed == 2) {
+ dev = &sh->dev[failed_num[1]];
+ locked++;
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantwrite, &dev->flags);
+ set_bit(R5_Syncio, &dev->flags);
+ }
+ if (failed >= 1) {
+ dev = &sh->dev[failed_num[0]];
+ locked++;
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantwrite, &dev->flags);
+ set_bit(R5_Syncio, &dev->flags);
+ }
- /**** FIX: Should we really do both of these unconditionally? ****/
- adev = &sh->dev[failed_num[0]];
- locked += !test_bit(R5_LOCKED, &adev->flags);
- set_bit(R5_LOCKED, &adev->flags);
- set_bit(R5_Wantwrite, &adev->flags);
- bdev = &sh->dev[failed_num[1]];
- locked += !test_bit(R5_LOCKED, &bdev->flags);
- set_bit(R5_LOCKED, &bdev->flags);
+ if (update_p) {
+ dev = &sh->dev[pd_idx];
+ locked ++;
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantwrite, &dev->flags);
+ set_bit(R5_Syncio, &dev->flags);
+ }
+ if (update_q) {
+ dev = &sh->dev[qd_idx];
+ locked++;
+ set_bit(R5_LOCKED, &dev->flags);
+ set_bit(R5_Wantwrite, &dev->flags);
+ set_bit(R5_Syncio, &dev->flags);
+ }
clear_bit(STRIPE_DEGRADED, &sh->state);
- set_bit(R5_Wantwrite, &bdev->flags);
set_bit(STRIPE_INSYNC, &sh->state);
- set_bit(R5_Syncio, &adev->flags);
- set_bit(R5_Syncio, &bdev->flags);
}
}
+
if (syncing && locked == 0 && test_bit(STRIPE_INSYNC, &sh->state)) {
md_done_sync(conf->mddev, STRIPE_SECTORS,1);
clear_bit(STRIPE_SYNCING, &sh->state);
@@ -1664,7 +1687,7 @@ static int make_request (request_queue_t
}
finish_wait(&conf->wait_for_overlap, &w);
raid6_plug_device(conf);
- handle_stripe(sh);
+ handle_stripe(sh, NULL);
release_stripe(sh);
} else {
/* cannot get stripe for read-ahead, just give-up */
@@ -1728,6 +1751,7 @@ static sector_t sync_request(mddev_t *md
return rv;
}
if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
!conf->fullsync && sync_blocks >= STRIPE_SECTORS) {
/* we can skip this block, and probably more */
sync_blocks /= STRIPE_SECTORS;
@@ -1765,7 +1789,7 @@ static sector_t sync_request(mddev_t *md
clear_bit(STRIPE_INSYNC, &sh->state);
spin_unlock(&sh->lock);
- handle_stripe(sh);
+ handle_stripe(sh, NULL);
release_stripe(sh);
return STRIPE_SECTORS;
@@ -1821,7 +1845,7 @@ static void raid6d (mddev_t *mddev)
spin_unlock_irq(&conf->device_lock);
handled++;
- handle_stripe(sh);
+ handle_stripe(sh, conf->spare_page);
release_stripe(sh);
spin_lock_irq(&conf->device_lock);
@@ -1860,6 +1884,10 @@ static int run(mddev_t *mddev)
goto abort;
memset(conf->stripe_hashtbl, 0, HASH_PAGES * PAGE_SIZE);
+ conf->spare_page = alloc_page(GFP_KERNEL);
+ if (!conf->spare_page)
+ goto abort;
+
spin_lock_init(&conf->device_lock);
init_waitqueue_head(&conf->wait_for_stripe);
init_waitqueue_head(&conf->wait_for_overlap);
@@ -1996,6 +2024,8 @@ static int run(mddev_t *mddev)
abort:
if (conf) {
print_raid6_conf(conf);
+ if (conf->spare_page)
+ page_cache_release(conf->spare_page);
if (conf->stripe_hashtbl)
free_pages((unsigned long) conf->stripe_hashtbl,
HASH_PAGES_ORDER);
diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~ 2005-11-28 10:08:19.000000000 +1100
+++ ./include/linux/raid/raid5.h 2005-11-28 10:12:56.000000000 +1100
@@ -228,6 +228,8 @@ struct raid5_private_data {
* Cleared when a sync completes.
*/
+ struct page *spare_page; /* Used when checking P/Q in raid6 */
+
/*
* Free stripes pool
*/
* [PATCH md 013 of 18] Improve handling of read errors with raid6
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (10 preceding siblings ...)
2005-11-27 23:40 ` [PATCH md 012 of 18] Fix raid6 resync check/repair code NeilBrown
@ 2005-11-27 23:40 ` NeilBrown
2005-11-30 22:33 ` Carlos Carvalho
2005-11-27 23:40 ` [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1 NeilBrown
` (4 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
This is a simple port of matching functionality across from raid5.
If we get a read error, we don't kick the drive straight away, but
try to over-write with good data first.
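The decision made at the end of each read can be summarised like this
(a userspace sketch with hypothetical names, mirroring the tests the hunk
below adds to raid6_end_read_request):

#include <stdio.h>

enum action { RETRY_READ, FAIL_DEVICE };

static enum action on_read_error(int degraded, int already_rewritten,
                                 int read_errors, int max_nr_stripes)
{
        if (degraded)
                return FAIL_DEVICE;     /* no redundancy left to correct from */
        if (already_rewritten)
                return FAIL_DEVICE;     /* we over-wrote once, it still fails */
        if (read_errors > max_nr_stripes)
                return FAIL_DEVICE;     /* too many soft errors on this device */
        return RETRY_READ;              /* set R5_ReadError and try to fix it */
}

int main(void)
{
        printf("%s\n", on_read_error(0, 0, 3, 256) == RETRY_READ
               ? "retry and over-write" : "kick the device");
        return 0;
}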
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid6main.c | 70 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 66 insertions(+), 4 deletions(-)
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:12:56.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:12:59.000000000 +1100
@@ -367,8 +367,8 @@ static void shrink_stripes(raid6_conf_t
conf->slab_cache = NULL;
}
-static int raid6_end_read_request (struct bio * bi, unsigned int bytes_done,
- int error)
+static int raid6_end_read_request(struct bio * bi, unsigned int bytes_done,
+ int error)
{
struct stripe_head *sh = bi->bi_private;
raid6_conf_t *conf = sh->raid_conf;
@@ -420,9 +420,35 @@ static int raid6_end_read_request (struc
#else
set_bit(R5_UPTODATE, &sh->dev[i].flags);
#endif
+ if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
+ printk(KERN_INFO "raid6: read error corrected!!\n");
+ clear_bit(R5_ReadError, &sh->dev[i].flags);
+ clear_bit(R5_ReWrite, &sh->dev[i].flags);
+ }
+ if (atomic_read(&conf->disks[i].rdev->read_errors))
+ atomic_set(&conf->disks[i].rdev->read_errors, 0);
} else {
- md_error(conf->mddev, conf->disks[i].rdev);
+ int retry = 0;
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
+ atomic_inc(&conf->disks[i].rdev->read_errors);
+ if (conf->mddev->degraded)
+ printk(KERN_WARNING "raid6: read error not correctable.\n");
+ else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
+ /* Oh, no!!! */
+ printk(KERN_WARNING "raid6: read error NOT corrected!!\n");
+ else if (atomic_read(&conf->disks[i].rdev->read_errors)
+ > conf->max_nr_stripes)
+ printk(KERN_WARNING
+ "raid6: Too many read errors, failing device.\n");
+ else
+ retry = 1;
+ if (retry)
+ set_bit(R5_ReadError, &sh->dev[i].flags);
+ else {
+ clear_bit(R5_ReadError, &sh->dev[i].flags);
+ clear_bit(R5_ReWrite, &sh->dev[i].flags);
+ md_error(conf->mddev, conf->disks[i].rdev);
+ }
}
rdev_dec_pending(conf->disks[i].rdev, conf->mddev);
#if 0
@@ -1079,6 +1105,12 @@ static void handle_stripe(struct stripe_
if (dev->written) written++;
rdev = conf->disks[i].rdev; /* FIXME, should I be looking rdev */
if (!rdev || !test_bit(In_sync, &rdev->flags)) {
+ /* The ReadError flag will just be confusing now */
+ clear_bit(R5_ReadError, &dev->flags);
+ clear_bit(R5_ReWrite, &dev->flags);
+ }
+ if (!rdev || !test_bit(In_sync, &rdev->flags)
+ || test_bit(R5_ReadError, &dev->flags)) {
if ( failed < 2 )
failed_num[failed] = i;
failed++;
@@ -1095,6 +1127,14 @@ static void handle_stripe(struct stripe_
if (failed > 2 && to_read+to_write+written) {
for (i=disks; i--; ) {
int bitmap_end = 0;
+
+ if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
+ mdk_rdev_t *rdev = conf->disks[i].rdev;
+ if (rdev && test_bit(In_sync, &rdev->flags))
+ /* multiple read failures in one stripe */
+ md_error(conf->mddev, rdev);
+ }
+
spin_lock_irq(&conf->device_lock);
/* fail all writes first */
bi = sh->dev[i].towrite;
@@ -1130,7 +1170,8 @@ static void handle_stripe(struct stripe_
}
/* fail any reads if this device is non-operational */
- if (!test_bit(R5_Insync, &sh->dev[i].flags)) {
+ if (!test_bit(R5_Insync, &sh->dev[i].flags) ||
+ test_bit(R5_ReadError, &sh->dev[i].flags)) {
bi = sh->dev[i].toread;
sh->dev[i].toread = NULL;
if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
@@ -1457,6 +1498,27 @@ static void handle_stripe(struct stripe_
clear_bit(STRIPE_SYNCING, &sh->state);
}
+ /* If the failed drives are just a ReadError, then we might need
+ * to progress the repair/check process
+ */
+ if (failed <= 2 && ! conf->mddev->ro)
+ for (i=0; i<failed;i++) {
+ dev = &sh->dev[failed_num[i]];
+ if (test_bit(R5_ReadError, &dev->flags)
+ && !test_bit(R5_LOCKED, &dev->flags)
+ && test_bit(R5_UPTODATE, &dev->flags)
+ ) {
+ if (!test_bit(R5_ReWrite, &dev->flags)) {
+ set_bit(R5_Wantwrite, &dev->flags);
+ set_bit(R5_ReWrite, &dev->flags);
+ set_bit(R5_LOCKED, &dev->flags);
+ } else {
+ /* let's read it back */
+ set_bit(R5_Wantread, &dev->flags);
+ set_bit(R5_LOCKED, &dev->flags);
+ }
+ }
+ }
spin_unlock(&sh->lock);
while ((bi=return_bi)) {
* [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (11 preceding siblings ...)
2005-11-27 23:40 ` [PATCH md 013 of 18] Improve handling of read errors with raid6 NeilBrown
@ 2005-11-27 23:40 ` NeilBrown
2005-11-29 16:38 ` Paul Clements
2005-11-27 23:40 ` [PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors NeilBrown
` (3 subsequent siblings)
16 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
On a read-error we suspend the array, then synchronously read the
block from the other mirrors until we find one where we can read it. Then
we try writing the good data back everywhere and make sure it works.
If any write or subsequent read fails, only then do we fail the device
out of the array.
To be able to suspend the array, we need to also keep track of how
many requests are queued for handling by raid1d.
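The recovery loop in raid1d then has roughly this shape (a minimal
userspace sketch; dev_read/dev_write are hypothetical stand-ins for
sync_page_io, and the freeze/unfreeze and rdev validity checks are
omitted):

#include <stdio.h>

#define PAGE_SECTORS 8          /* one 4K page = 8 x 512-byte sectors */

/* stand-ins for real device IO; return 1 on success.  Disk 0 always
 * fails its reads here, just to exercise both paths. */
static int dev_read(int d, long sect, int s)  { return d != 0; }
static int dev_write(int d, long sect, int s) { return 1; }

static void fix_read_error(int ndisks, int bad_disk, long sect, int sectors)
{
        while (sectors) {
                int s = sectors > PAGE_SECTORS ? PAGE_SECTORS : sectors;
                int d = bad_disk, ok = 0;

                do {                    /* find any mirror that reads OK */
                        if (dev_read(d, sect, s)) {
                                ok = 1;
                                break;
                        }
                        d = (d + 1) % ndisks;
                } while (d != bad_disk);

                if (!ok) {              /* unreadable everywhere - give up */
                        printf("sector %ld lost, failing the array\n", sect);
                        return;
                }
                /* walk back towards the failing disk, over-writing with
                 * the good data and re-reading to verify it stuck */
                while (d != bad_disk) {
                        d = (d + ndisks - 1) % ndisks;
                        if (!dev_write(d, sect, s) || !dev_read(d, sect, s))
                                printf("disk %d is dead\n", d);
                }
                sectors -= s;
                sect += s;
        }
}

int main(void)
{
        fix_read_error(2, 0, 1000, 24); /* 24 sectors starting at 1000 */
        return 0;
}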
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 1
./drivers/md/raid1.c | 115 +++++++++++++++++++++++++++++++++++++++----
./include/linux/raid/raid1.h | 3 +
3 files changed, 109 insertions(+), 10 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-11-28 10:12:52.000000000 +1100
+++ ./drivers/md/md.c 2005-11-28 10:13:11.000000000 +1100
@@ -461,6 +461,7 @@ int sync_page_io(struct block_device *bd
bio_put(bio);
return ret;
}
+EXPORT_SYMBOL(sync_page_io);
static int read_disk_sb(mdk_rdev_t * rdev, int size)
{
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:12:40.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:13:11.000000000 +1100
@@ -191,6 +191,7 @@ static void reschedule_retry(r1bio_t *r1
spin_lock_irqsave(&conf->device_lock, flags);
list_add(&r1_bio->retry_list, &conf->retry_list);
+ conf->nr_queued ++;
spin_unlock_irqrestore(&conf->device_lock, flags);
wake_up(&conf->wait_barrier);
@@ -245,9 +246,9 @@ static int raid1_end_read_request(struct
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
- if (!uptodate)
- md_error(r1_bio->mddev, conf->mirrors[mirror].rdev);
- else
+ update_head_pos(mirror, r1_bio);
+
+ if (uptodate || conf->working_disks <= 1) {
/*
* Set R1BIO_Uptodate in our master bio, so that
* we will return a good error code for to the higher
@@ -259,14 +260,8 @@ static int raid1_end_read_request(struct
*/
set_bit(R1BIO_Uptodate, &r1_bio->state);
- update_head_pos(mirror, r1_bio);
-
- /*
- * we have only one bio on the read side
- */
- if (uptodate)
raid_end_bio_io(r1_bio);
- else {
+ } else {
/*
* oops, read error:
*/
@@ -652,6 +647,32 @@ static void allow_barrier(conf_t *conf)
wake_up(&conf->wait_barrier);
}
+static void freeze_array(conf_t *conf)
+{
+ /* stop syncio and normal IO and wait for everything to
+ * go quiet.
+ * We increment barrier and nr_waiting, and then
+ * wait until barrier+nr_pending match nr_queued+2
+ */
+ spin_lock_irq(&conf->resync_lock);
+ conf->barrier++;
+ conf->nr_waiting++;
+ wait_event_lock_irq(conf->wait_barrier,
+ conf->barrier+conf->nr_pending == conf->nr_queued+2,
+ conf->resync_lock,
+ raid1_unplug(conf->mddev->queue));
+ spin_unlock_irq(&conf->resync_lock);
+}
+static void unfreeze_array(conf_t *conf)
+{
+ /* reverse the effect of the freeze */
+ spin_lock_irq(&conf->resync_lock);
+ conf->barrier--;
+ conf->nr_waiting--;
+ wake_up(&conf->wait_barrier);
+ spin_unlock_irq(&conf->resync_lock);
+}
+
/* duplicate the data pages for behind I/O */
static struct page **alloc_behind_pages(struct bio *bio)
@@ -1195,6 +1216,7 @@ static void raid1d(mddev_t *mddev)
break;
r1_bio = list_entry(head->prev, r1bio_t, retry_list);
list_del(head->prev);
+ conf->nr_queued--;
spin_unlock_irqrestore(&conf->device_lock, flags);
mddev = r1_bio->mddev;
@@ -1234,6 +1256,74 @@ static void raid1d(mddev_t *mddev)
}
} else {
int disk;
+
+ /* we got a read error. Maybe the drive is bad. Maybe just
+ * the block and we can fix it.
+ * We freeze all other IO, and try reading the block from
+ * other devices. When we find one, we re-write
+ * and check that this fixes the read error.
+ * This is all done synchronously while the array is
+ * frozen
+ */
+ sector_t sect = r1_bio->sector;
+ int sectors = r1_bio->sectors;
+ freeze_array(conf);
+ while(sectors) {
+ int s = sectors;
+ int d = r1_bio->read_disk;
+ int success = 0;
+
+ if (s > (PAGE_SIZE>>9))
+ s = PAGE_SIZE >> 9;
+
+ do {
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags) &&
+ sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9,
+ conf->tmppage, READ))
+ success = 1;
+ else {
+ d++;
+ if (d == conf->raid_disks)
+ d = 0;
+ }
+ } while (!success && d != r1_bio->read_disk);
+
+ if (success) {
+ /* write it back and re-read */
+ while (d != r1_bio->read_disk) {
+ if (d==0)
+ d = conf->raid_disks;
+ d--;
+ rdev = conf->mirrors[d].rdev;
+ if (rdev &&
+ test_bit(In_sync, &rdev->flags)) {
+ if (sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9, conf->tmppage, WRITE) == 0 ||
+ sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9, conf->tmppage, READ) == 0) {
+ /* Well, this device is dead */
+ md_error(mddev, rdev);
+ }
+ }
+ }
+ } else {
+ /* Cannot read from anywhere -- bye bye array */
+ md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
+ break;
+ }
+ sectors -= s;
+ sect += s;
+ }
+
+
+ unfreeze_array(conf);
+
bio = r1_bio->bios[r1_bio->read_disk];
if ((disk=read_balance(conf, r1_bio)) == -1) {
printk(KERN_ALERT "raid1: %s: unrecoverable I/O"
@@ -1528,6 +1618,10 @@ static int run(mddev_t *mddev)
memset(conf->mirrors, 0, sizeof(struct mirror_info)*mddev->raid_disks);
+ conf->tmppage = alloc_page(GFP_KERNEL);
+ if (!conf->tmppage)
+ goto out_no_mem;
+
conf->poolinfo = kmalloc(sizeof(*conf->poolinfo), GFP_KERNEL);
if (!conf->poolinfo)
goto out_no_mem;
@@ -1634,6 +1728,7 @@ out_free_conf:
if (conf->r1bio_pool)
mempool_destroy(conf->r1bio_pool);
kfree(conf->mirrors);
+ __free_page(conf->tmppage);
kfree(conf->poolinfo);
kfree(conf);
mddev->private = NULL;
diff ./include/linux/raid/raid1.h~current~ ./include/linux/raid/raid1.h
--- ./include/linux/raid/raid1.h~current~ 2005-11-28 10:12:17.000000000 +1100
+++ ./include/linux/raid/raid1.h 2005-11-28 10:13:11.000000000 +1100
@@ -46,6 +46,7 @@ struct r1_private_data_s {
spinlock_t resync_lock;
int nr_pending;
int nr_waiting;
+ int nr_queued;
int barrier;
sector_t next_resync;
int fullsync; /* set to 1 if a full sync is needed,
@@ -57,6 +58,8 @@ struct r1_private_data_s {
struct pool_info *poolinfo;
+ struct page *tmppage;
+
mempool_t *r1bio_pool;
mempool_t *r1buf_pool;
};
* [PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors.
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (12 preceding siblings ...)
2005-11-27 23:40 ` [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1 NeilBrown
@ 2005-11-27 23:40 ` NeilBrown
2005-11-27 23:40 ` [PATCH md 016 of 18] Better handling for read error in raid1 during resync NeilBrown
` (2 subsequent siblings)
16 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
We are dereferencing ->rdev without an rcu lock!
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 112 +++++++++++++++++++++++++--------------------------
1 file changed, 57 insertions(+), 55 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:13:11.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:13:14.000000000 +1100
@@ -177,6 +177,13 @@ static inline void free_r1bio(r1bio_t *r
static inline void put_buf(r1bio_t *r1_bio)
{
conf_t *conf = mddev_to_conf(r1_bio->mddev);
+ int i;
+
+ for (i=0; i<conf->raid_disks; i++) {
+ struct bio *bio = r1_bio->bios[i];
+ if (bio->bi_end_io)
+ rdev_dec_pending(conf->mirrors[i].rdev, r1_bio->mddev);
+ }
mempool_free(r1_bio, conf->r1buf_pool);
@@ -1084,7 +1091,6 @@ static int end_sync_read(struct bio *bio
conf->mirrors[r1_bio->read_disk].rdev);
} else
set_bit(R1BIO_Uptodate, &r1_bio->state);
- rdev_dec_pending(conf->mirrors[r1_bio->read_disk].rdev, conf->mddev);
reschedule_retry(r1_bio);
return 0;
}
@@ -1115,7 +1121,6 @@ static int end_sync_write(struct bio *bi
md_done_sync(mddev, r1_bio->sectors, uptodate);
put_buf(r1_bio);
}
- rdev_dec_pending(conf->mirrors[mirror].rdev, mddev);
return 0;
}
@@ -1152,10 +1157,14 @@ static void sync_request_write(mddev_t *
atomic_set(&r1_bio->remaining, 1);
for (i = 0; i < disks ; i++) {
wbio = r1_bio->bios[i];
- if (wbio->bi_end_io != end_sync_write)
+ if (wbio->bi_end_io == NULL ||
+ (wbio->bi_end_io == end_sync_read &&
+ (i == r1_bio->read_disk ||
+ !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
continue;
- atomic_inc(&conf->mirrors[i].rdev->nr_pending);
+ wbio->bi_rw = WRITE;
+ wbio->bi_end_io = end_sync_write;
atomic_inc(&r1_bio->remaining);
md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);
@@ -1387,14 +1396,13 @@ static int init_resync(conf_t *conf)
static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, int go_faster)
{
conf_t *conf = mddev_to_conf(mddev);
- mirror_info_t *mirror;
r1bio_t *r1_bio;
struct bio *bio;
sector_t max_sector, nr_sectors;
- int disk;
+ int disk = -1;
int i;
- int wonly;
- int write_targets = 0;
+ int wonly = -1;
+ int write_targets = 0, read_targets = 0;
int sync_blocks;
int still_degraded = 0;
@@ -1446,44 +1454,24 @@ static sector_t sync_request(mddev_t *md
conf->next_resync = sector_nr;
- /*
- * If reconstructing, and >1 working disc,
- * could dedicate one to rebuild and others to
- * service read requests ..
- */
- disk = conf->last_used;
- /* make sure disk is operational */
- wonly = disk;
- while (conf->mirrors[disk].rdev == NULL ||
- !test_bit(In_sync, &conf->mirrors[disk].rdev->flags) ||
- test_bit(WriteMostly, &conf->mirrors[disk].rdev->flags)
- ) {
- if (conf->mirrors[disk].rdev &&
- test_bit(In_sync, &conf->mirrors[disk].rdev->flags))
- wonly = disk;
- if (disk <= 0)
- disk = conf->raid_disks;
- disk--;
- if (disk == conf->last_used) {
- disk = wonly;
- break;
- }
- }
- conf->last_used = disk;
- atomic_inc(&conf->mirrors[disk].rdev->nr_pending);
-
-
- mirror = conf->mirrors + disk;
-
r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
+ rcu_read_lock();
+ /*
+ * If we get a correctable read error during resync or recovery,
+ * we might want to read from a different device. So we
+ * flag all drives that could conceivably be read from for READ,
+ * and any others (which will be non-In_sync devices) for WRITE.
+ * If a read fails, we try reading from something else for which READ
+ * is OK.
+ */
r1_bio->mddev = mddev;
r1_bio->sector = sector_nr;
r1_bio->state = 0;
set_bit(R1BIO_IsSync, &r1_bio->state);
- r1_bio->read_disk = disk;
for (i=0; i < conf->raid_disks; i++) {
+ mdk_rdev_t *rdev;
bio = r1_bio->bios[i];
/* take from bio_init */
@@ -1498,35 +1486,49 @@ static sector_t sync_request(mddev_t *md
bio->bi_end_io = NULL;
bio->bi_private = NULL;
- if (i == disk) {
- bio->bi_rw = READ;
- bio->bi_end_io = end_sync_read;
- } else if (conf->mirrors[i].rdev == NULL ||
- test_bit(Faulty, &conf->mirrors[i].rdev->flags)) {
+ rdev = rcu_dereference(conf->mirrors[i].rdev);
+ if (rdev == NULL ||
+ test_bit(Faulty, &rdev->flags)) {
still_degraded = 1;
continue;
- } else if (!test_bit(In_sync, &conf->mirrors[i].rdev->flags) ||
- sector_nr + RESYNC_SECTORS > mddev->recovery_cp ||
- test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ } else if (!test_bit(In_sync, &rdev->flags)) {
bio->bi_rw = WRITE;
bio->bi_end_io = end_sync_write;
write_targets ++;
- } else
- /* no need to read or write here */
- continue;
- bio->bi_sector = sector_nr + conf->mirrors[i].rdev->data_offset;
- bio->bi_bdev = conf->mirrors[i].rdev->bdev;
+ } else {
+ /* may need to read from here */
+ bio->bi_rw = READ;
+ bio->bi_end_io = end_sync_read;
+ if (test_bit(WriteMostly, &rdev->flags)) {
+ if (wonly < 0)
+ wonly = i;
+ } else {
+ if (disk < 0)
+ disk = i;
+ }
+ read_targets++;
+ }
+ atomic_inc(&rdev->nr_pending);
+ bio->bi_sector = sector_nr + rdev->data_offset;
+ bio->bi_bdev = rdev->bdev;
bio->bi_private = r1_bio;
}
+ rcu_read_unlock();
+ if (disk < 0)
+ disk = wonly;
+ r1_bio->read_disk = disk;
- if (write_targets == 0) {
+ if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
+ /* extra read targets are also write targets */
+ write_targets += read_targets-1;
+
+ if (write_targets == 0 || read_targets == 0) {
/* There is nowhere to write, so all non-sync
* drives must be failed - so we are finished
*/
sector_t rv = max_sector - sector_nr;
*skipped = 1;
put_buf(r1_bio);
- rdev_dec_pending(conf->mirrors[disk].rdev, mddev);
return rv;
}
@@ -1577,10 +1579,10 @@ static sector_t sync_request(mddev_t *md
sync_blocks -= (len>>9);
} while (r1_bio->bios[disk]->bi_vcnt < RESYNC_PAGES);
bio_full:
- bio = r1_bio->bios[disk];
+ bio = r1_bio->bios[r1_bio->read_disk];
r1_bio->sectors = nr_sectors;
- md_sync_acct(mirror->rdev->bdev, nr_sectors);
+ md_sync_acct(conf->mirrors[r1_bio->read_disk].rdev->bdev, nr_sectors);
generic_make_request(bio);
* [PATCH md 016 of 18] Better handling for read error in raid1 during resync
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (13 preceding siblings ...)
2005-11-27 23:40 ` [PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors NeilBrown
@ 2005-11-27 23:40 ` NeilBrown
2005-11-27 23:41 ` [PATCH md 017 of 18] Handle errors when read-only NeilBrown
2005-11-27 23:41 ` [PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6 NeilBrown
16 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:40 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
... incomplete
Signed-off-by: Neil Brown <neilb@suse.de>
Fix md raid1 fix-read-error-during-resync
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 99 ++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 78 insertions(+), 21 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:13:14.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:13:27.000000000 +1100
@@ -1071,9 +1071,7 @@ abort:
static int end_sync_read(struct bio *bio, unsigned int bytes_done, int error)
{
- int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
r1bio_t * r1_bio = (r1bio_t *)(bio->bi_private);
- conf_t *conf = mddev_to_conf(r1_bio->mddev);
if (bio->bi_size)
return 1;
@@ -1086,10 +1084,7 @@ static int end_sync_read(struct bio *bio
* or re-read if the read failed.
* We don't do much here, just schedule handling by raid1d
*/
- if (!uptodate) {
- md_error(r1_bio->mddev,
- conf->mirrors[r1_bio->read_disk].rdev);
- } else
+ if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R1BIO_Uptodate, &r1_bio->state);
reschedule_retry(r1_bio);
return 0;
@@ -1133,27 +1128,89 @@ static void sync_request_write(mddev_t *
bio = r1_bio->bios[r1_bio->read_disk];
-/*
- if (r1_bio->sector == 0) printk("First sync write startss\n");
-*/
+
/*
* schedule writes
*/
if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
- /*
- * There is no point trying a read-for-reconstruct as
- * reconstruct is about to be aborted
+ /* ouch - failed to read all of that.
+ * Try some synchronous reads of other devices to get
+ * good data, much like with normal read errors. Only
+ * read into the pages we already have so that we don't
+ * need to re-issue the read request.
+ * We don't need to freeze the array, because being in an
+ * active sync request, there is no normal IO, and
+ * no overlapping syncs.
*/
- char b[BDEVNAME_SIZE];
- printk(KERN_ALERT "raid1: %s: unrecoverable I/O read error"
- " for block %llu\n",
- bdevname(bio->bi_bdev,b),
- (unsigned long long)r1_bio->sector);
- md_done_sync(mddev, r1_bio->sectors, 0);
- put_buf(r1_bio);
- return;
+ sector_t sect = r1_bio->sector;
+ int sectors = r1_bio->sectors;
+ int idx = 0;
+
+ while(sectors) {
+ int s = sectors;
+ int d = r1_bio->read_disk;
+ int success = 0;
+ mdk_rdev_t *rdev;
+
+ if (s > (PAGE_SIZE>>9))
+ s = PAGE_SIZE >> 9;
+ do {
+ if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
+ rdev = conf->mirrors[d].rdev;
+ if (sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9,
+ bio->bi_io_vec[idx].bv_page,
+ READ)) {
+ success = 1;
+ break;
+ }
+ }
+ d++;
+ if (d == conf->raid_disks)
+ d = 0;
+ } while (!success && d != r1_bio->read_disk);
+
+ if (success) {
+ /* write it back and re-read */
+ set_bit(R1BIO_Uptodate, &r1_bio->state);
+ while (d != r1_bio->read_disk) {
+ if (d == 0)
+ d = conf->raid_disks;
+ d--;
+ if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+ continue;
+ rdev = conf->mirrors[d].rdev;
+ if (sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9,
+ bio->bi_io_vec[idx].bv_page,
+ WRITE) == 0 ||
+ sync_page_io(rdev->bdev,
+ sect + rdev->data_offset,
+ s<<9,
+ bio->bi_io_vec[idx].bv_page,
+ READ) == 0) {
+ md_error(mddev, rdev);
+ }
+ }
+ } else {
+ char b[BDEVNAME_SIZE];
+ /* Cannot read from anywhere, array is toast */
+ md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
+ printk(KERN_ALERT "raid1: %s: unrecoverable I/O read error"
+ " for block %llu\n",
+ bdevname(bio->bi_bdev,b),
+ (unsigned long long)r1_bio->sector);
+ md_done_sync(mddev, r1_bio->sectors, 0);
+ put_buf(r1_bio);
+ return;
+ }
+ sectors -= s;
+ sect += s;
+ idx ++;
+ }
}
-
atomic_set(&r1_bio->remaining, 1);
for (i = 0; i < disks ; i++) {
wbio = r1_bio->bios[i];
* [PATCH md 017 of 18] Handle errors when read-only
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (14 preceding siblings ...)
2005-11-27 23:40 ` [PATCH md 016 of 18] Better handling for read error in raid1 during resync NeilBrown
@ 2005-11-27 23:41 ` NeilBrown
2005-12-10 6:41 ` Yanggun
2005-11-27 23:41 ` [PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6 NeilBrown
16 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid1.c | 18 +++++++++++-------
./include/linux/raid/raid1.h | 7 +++++++
2 files changed, 18 insertions(+), 7 deletions(-)
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2005-11-28 10:13:27.000000000 +1100
+++ ./drivers/md/raid1.c 2005-11-28 10:13:32.000000000 +1100
@@ -154,7 +154,7 @@ static void put_all_bios(conf_t *conf, r
for (i = 0; i < conf->raid_disks; i++) {
struct bio **bio = r1_bio->bios + i;
- if (*bio)
+ if (*bio && *bio != IO_BLOCKED)
bio_put(*bio);
*bio = NULL;
}
@@ -418,11 +418,13 @@ static int read_balance(conf_t *conf, r1
new_disk = 0;
for (rdev = rcu_dereference(conf->mirrors[new_disk].rdev);
+ r1_bio->bios[new_disk] == IO_BLOCKED ||
!rdev || !test_bit(In_sync, &rdev->flags)
|| test_bit(WriteMostly, &rdev->flags);
rdev = rcu_dereference(conf->mirrors[++new_disk].rdev)) {
- if (rdev && test_bit(In_sync, &rdev->flags))
+ if (rdev && test_bit(In_sync, &rdev->flags) &&
+ r1_bio->bios[new_disk] != IO_BLOCKED)
wonly_disk = new_disk;
if (new_disk == conf->raid_disks - 1) {
@@ -436,11 +438,13 @@ static int read_balance(conf_t *conf, r1
/* make sure the disk is operational */
for (rdev = rcu_dereference(conf->mirrors[new_disk].rdev);
+ r1_bio->bios[new_disk] == IO_BLOCKED ||
!rdev || !test_bit(In_sync, &rdev->flags) ||
test_bit(WriteMostly, &rdev->flags);
rdev = rcu_dereference(conf->mirrors[new_disk].rdev)) {
- if (rdev && test_bit(In_sync, &rdev->flags))
+ if (rdev && test_bit(In_sync, &rdev->flags) &&
+ r1_bio->bios[new_disk] != IO_BLOCKED)
wonly_disk = new_disk;
if (new_disk <= 0)
@@ -477,7 +481,7 @@ static int read_balance(conf_t *conf, r1
rdev = rcu_dereference(conf->mirrors[disk].rdev);
- if (!rdev ||
+ if (!rdev || r1_bio->bios[disk] == IO_BLOCKED ||
!test_bit(In_sync, &rdev->flags) ||
test_bit(WriteMostly, &rdev->flags))
continue;
@@ -1334,7 +1338,7 @@ static void raid1d(mddev_t *mddev)
sector_t sect = r1_bio->sector;
int sectors = r1_bio->sectors;
freeze_array(conf);
- while(sectors) {
+ if (mddev->ro == 0) while(sectors) {
int s = sectors;
int d = r1_bio->read_disk;
int success = 0;
@@ -1387,7 +1391,6 @@ static void raid1d(mddev_t *mddev)
sect += s;
}
-
unfreeze_array(conf);
bio = r1_bio->bios[r1_bio->read_disk];
@@ -1398,7 +1401,8 @@ static void raid1d(mddev_t *mddev)
(unsigned long long)r1_bio->sector);
raid_end_bio_io(r1_bio);
} else {
- r1_bio->bios[r1_bio->read_disk] = NULL;
+ r1_bio->bios[r1_bio->read_disk] =
+ mddev->ro ? IO_BLOCKED : NULL;
r1_bio->read_disk = disk;
bio_put(bio);
bio = bio_clone(r1_bio->master_bio, GFP_NOIO);
diff ./include/linux/raid/raid1.h~current~ ./include/linux/raid/raid1.h
--- ./include/linux/raid/raid1.h~current~ 2005-11-28 10:13:11.000000000 +1100
+++ ./include/linux/raid/raid1.h 2005-11-28 10:13:32.000000000 +1100
@@ -109,6 +109,13 @@ struct r1bio_s {
/* DO NOT PUT ANY NEW FIELDS HERE - bios array is contiguously alloced*/
};
+/* when we get a read error on a read-only array, we redirect to another
+ * device without failing the first device, or trying to over-write to
+ * correct the read error. To keep track of bad blocks on a per-bio
+ * level, we store IO_BLOCKED in the appropriate 'bios' pointer
+ */
+#define IO_BLOCKED ((struct bio*)1)
+
/* bits for r1bio.state */
#define R1BIO_Uptodate 0
#define R1BIO_IsSync 1
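For illustration, the sentinel trick amounts to this (userspace sketch,
not kernel code): a small non-NULL pointer value that can never collide
with a real allocation, so one pointer slot distinguishes three states.

#include <stdio.h>
#include <stdlib.h>

#define IO_BLOCKED ((void *)1)

int main(void)
{
        void *slots[3] = { NULL, IO_BLOCKED, malloc(16) };
        int i;

        for (i = 0; i < 3; i++) {
                if (slots[i] == NULL)
                        printf("slot %d: free\n", i);
                else if (slots[i] == IO_BLOCKED)
                        printf("slot %d: blocked, skip this device\n", i);
                else
                        printf("slot %d: real bio, may be used\n", i);
        }
        free(slots[2]);
        return 0;
}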
* [PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
` (15 preceding siblings ...)
2005-11-27 23:41 ` [PATCH md 017 of 18] Handle errors when read-only NeilBrown
@ 2005-11-27 23:41 ` NeilBrown
16 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2005-11-27 23:41 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid
There is this "FIXME" comment with a typo in it!! that has been annoying
me for days, so I just had to remove it.
conf->disks[i].rdev should only be accessed if
- we know we hold a reference or
- the mddev->reconfig_sem is down or
- we have an rcu_readlock
handle_stripe was referencing rdev in three places without any of these.
For the first two, get an rcu_readlock.
For the last, the same access (md_sync_acct call) is made a little later
after the rdev has been claimed under an rcu_readlock, if R5_Syncio is set.
So just use that access... However R5_Syncio isn't really needed as the
'syncing' variable contains the same information. So use that instead.
Issues, comment, and fix are identical in raid5 and raid6.
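The safe access pattern looks like this as a shape (userspace sketch
with the RCU primitives mocked out as no-ops so it compiles standalone;
the point is the structure, not the synchronisation):

#include <stdio.h>

#define rcu_read_lock()         do { } while (0)
#define rcu_read_unlock()       do { } while (0)
#define rcu_dereference(p)      (p)

struct rdev { int in_sync; int nr_pending; };

static struct rdev disk0 = { 1, 0 };
static struct rdev *slot = &disk0;      /* the published pointer */

int main(void)
{
        struct rdev *rdev;

        rcu_read_lock();
        rdev = rcu_dereference(slot);
        if (rdev && rdev->in_sync)
                rdev->nr_pending++;     /* claim a reference under the lock */
        else
                rdev = NULL;
        rcu_read_unlock();

        if (rdev) {
                printf("safe to issue IO, nr_pending=%d\n", rdev->nr_pending);
                rdev->nr_pending--;     /* i.e. rdev_dec_pending */
        }
        return 0;
}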
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 16 ++++++++--------
./drivers/md/raid6main.c | 19 ++++++++-----------
./include/linux/raid/raid5.h | 1 -
3 files changed, 16 insertions(+), 20 deletions(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2005-11-28 10:12:40.000000000 +1100
+++ ./drivers/md/raid5.c 2005-11-28 10:13:37.000000000 +1100
@@ -960,11 +960,11 @@ static void handle_stripe(struct stripe_
syncing = test_bit(STRIPE_SYNCING, &sh->state);
/* Now to look around and see what can be done */
+ rcu_read_lock();
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
dev = &sh->dev[i];
clear_bit(R5_Insync, &dev->flags);
- clear_bit(R5_Syncio, &dev->flags);
PRINTK("check %d: state 0x%lx read %p write %p written %p\n",
i, dev->flags, dev->toread, dev->towrite, dev->written);
@@ -1003,7 +1003,7 @@ static void handle_stripe(struct stripe_
non_overwrite++;
}
if (dev->written) written++;
- rdev = conf->disks[i].rdev; /* FIXME, should I be looking rdev */
+ rdev = rcu_dereference(conf->disks[i].rdev);
if (!rdev || !test_bit(In_sync, &rdev->flags)) {
/* The ReadError flag will just be confusing now */
clear_bit(R5_ReadError, &dev->flags);
@@ -1016,6 +1016,7 @@ static void handle_stripe(struct stripe_
} else
set_bit(R5_Insync, &dev->flags);
}
+ rcu_read_unlock();
PRINTK("locked=%d uptodate=%d to_read=%d"
" to_write=%d failed=%d failed_num=%d\n",
locked, uptodate, to_read, to_write, failed, failed_num);
@@ -1027,10 +1028,13 @@ static void handle_stripe(struct stripe_
int bitmap_end = 0;
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
- mdk_rdev_t *rdev = conf->disks[i].rdev;
+ mdk_rdev_t *rdev;
+ rcu_read_lock();
+ rdev = rcu_dereference(conf->disks[i].rdev);
if (rdev && test_bit(In_sync, &rdev->flags))
/* multiple read failures in one stripe */
md_error(conf->mddev, rdev);
+ rcu_read_unlock();
}
spin_lock_irq(&conf->device_lock);
@@ -1179,9 +1183,6 @@ static void handle_stripe(struct stripe_
locked++;
PRINTK("Reading block %d (sync=%d)\n",
i, syncing);
- if (syncing)
- md_sync_acct(conf->disks[i].rdev->bdev,
- STRIPE_SECTORS);
}
}
}
@@ -1325,7 +1326,6 @@ static void handle_stripe(struct stripe_
clear_bit(STRIPE_DEGRADED, &sh->state);
locked++;
set_bit(STRIPE_INSYNC, &sh->state);
- set_bit(R5_Syncio, &dev->flags);
}
}
if (syncing && locked == 0 && test_bit(STRIPE_INSYNC, &sh->state)) {
@@ -1391,7 +1391,7 @@ static void handle_stripe(struct stripe_
rcu_read_unlock();
if (rdev) {
- if (test_bit(R5_Syncio, &sh->dev[i].flags))
+ if (syncing)
md_sync_acct(rdev->bdev, STRIPE_SECTORS);
bi->bi_bdev = rdev->bdev;
diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~ 2005-11-28 10:12:59.000000000 +1100
+++ ./drivers/md/raid6main.c 2005-11-28 10:13:37.000000000 +1100
@@ -1060,11 +1060,11 @@ static void handle_stripe(struct stripe_
syncing = test_bit(STRIPE_SYNCING, &sh->state);
/* Now to look around and see what can be done */
+ rcu_read_lock();
for (i=disks; i--; ) {
mdk_rdev_t *rdev;
dev = &sh->dev[i];
clear_bit(R5_Insync, &dev->flags);
- clear_bit(R5_Syncio, &dev->flags);
PRINTK("check %d: state 0x%lx read %p write %p written %p\n",
i, dev->flags, dev->toread, dev->towrite, dev->written);
@@ -1103,7 +1103,7 @@ static void handle_stripe(struct stripe_
non_overwrite++;
}
if (dev->written) written++;
- rdev = conf->disks[i].rdev; /* FIXME, should I be looking rdev */
+ rdev = rcu_dereference(conf->disks[i].rdev);
if (!rdev || !test_bit(In_sync, &rdev->flags)) {
/* The ReadError flag will just be confusing now */
clear_bit(R5_ReadError, &dev->flags);
@@ -1117,6 +1117,7 @@ static void handle_stripe(struct stripe_
} else
set_bit(R5_Insync, &dev->flags);
}
+ rcu_read_unlock();
PRINTK("locked=%d uptodate=%d to_read=%d"
" to_write=%d failed=%d failed_num=%d,%d\n",
locked, uptodate, to_read, to_write, failed,
@@ -1129,10 +1130,13 @@ static void handle_stripe(struct stripe_
int bitmap_end = 0;
if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
- mdk_rdev_t *rdev = conf->disks[i].rdev;
+ mdk_rdev_t *rdev;
+ rcu_read_lock();
+ rdev = rcu_dereference(conf->disks[i].rdev);
if (rdev && test_bit(In_sync, &rdev->flags))
/* multiple read failures in one stripe */
md_error(conf->mddev, rdev);
+ rcu_read_unlock();
}
spin_lock_irq(&conf->device_lock);
@@ -1307,9 +1311,6 @@ static void handle_stripe(struct stripe_
locked++;
PRINTK("Reading block %d (sync=%d)\n",
i, syncing);
- if (syncing)
- md_sync_acct(conf->disks[i].rdev->bdev,
- STRIPE_SECTORS);
}
}
}
@@ -1463,14 +1464,12 @@ static void handle_stripe(struct stripe_
locked++;
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
- set_bit(R5_Syncio, &dev->flags);
}
if (failed >= 1) {
dev = &sh->dev[failed_num[0]];
locked++;
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
- set_bit(R5_Syncio, &dev->flags);
}
if (update_p) {
@@ -1478,14 +1477,12 @@ static void handle_stripe(struct stripe_
locked ++;
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
- set_bit(R5_Syncio, &dev->flags);
}
if (update_q) {
dev = &sh->dev[qd_idx];
locked++;
set_bit(R5_LOCKED, &dev->flags);
set_bit(R5_Wantwrite, &dev->flags);
- set_bit(R5_Syncio, &dev->flags);
}
clear_bit(STRIPE_DEGRADED, &sh->state);
@@ -1557,7 +1554,7 @@ static void handle_stripe(struct stripe_
rcu_read_unlock();
if (rdev) {
- if (test_bit(R5_Syncio, &sh->dev[i].flags))
+ if (syncing)
md_sync_acct(rdev->bdev, STRIPE_SECTORS);
bi->bi_bdev = rdev->bdev;
diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~ 2005-11-28 10:12:56.000000000 +1100
+++ ./include/linux/raid/raid5.h 2005-11-28 10:13:37.000000000 +1100
@@ -152,7 +152,6 @@ struct stripe_head {
#define R5_Insync 3 /* rdev && rdev->in_sync at start */
#define R5_Wantread 4 /* want to schedule a read */
#define R5_Wantwrite 5
-#define R5_Syncio 6 /* this io need to be accounted as resync io */
#define R5_Overlap 7 /* There is a pending overlapping request on this block */
#define R5_ReadError 8 /* seen a read error here recently */
#define R5_ReWrite 9 /* have tried to over-write the readerror */
* Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.
2005-11-27 23:40 ` [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1 NeilBrown
@ 2005-11-29 16:38 ` Paul Clements
2005-11-29 23:21 ` Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Paul Clements @ 2005-11-29 16:38 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, linux-raid
Hi Neil,
Glad to see this patch is making its way to mainline. I have a couple of
questions on the patch, though...
NeilBrown wrote:
> + if (uptodate || conf->working_disks <= 1) {
Is it valid to mask a read error just because we have only 1 working disk?
> + do {
> + rdev = conf->mirrors[d].rdev;
> + if (rdev &&
> + test_bit(In_sync, &rdev->flags) &&
> + sync_page_io(rdev->bdev,
> + sect + rdev->data_offset,
> + s<<9,
> + conf->tmppage, READ))
> + success = 1;
> + else {
> + d++;
> + if (d == conf->raid_disks)
> + d = 0;
> + }
> + } while (!success && d != r1_bio->read_disk);
> +
> + if (success) {
> + /* write it back and re-read */
> + while (d != r1_bio->read_disk) {
Here, it looks like if we retry the read on the same disk that just gave
the read error, then we will not do any re-writes? I assume that is
intentional? I guess it's a judgment call whether the sector is really
bad at that point.
> + if (d==0)
> + d = conf->raid_disks;
> + d--;
> + rdev = conf->mirrors[d].rdev;
> + if (rdev &&
> + test_bit(In_sync, &rdev->flags)) {
> + if (sync_page_io(rdev->bdev,
> + sect + rdev->data_offset,
> + s<<9, conf->tmppage, WRITE) == 0 ||
> + sync_page_io(rdev->bdev,
> + sect + rdev->data_offset,
> + s<<9, conf->tmppage, READ) == 0) {
> + /* Well, this device is dead */
> + md_error(mddev, rdev);
Here, we might have gotten garbage back from the sync_page_io(...,
READ), if it failed. So don't we have to quit the re-write loop at this
point? Otherwise, aren't we potentially writing bad data over other
disks? Granted, this particular case might never actually happen in the
real world.
Thanks,
Paul
* Re: [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1.
2005-11-29 16:38 ` Paul Clements
@ 2005-11-29 23:21 ` Neil Brown
0 siblings, 0 replies; 33+ messages in thread
From: Neil Brown @ 2005-11-29 23:21 UTC (permalink / raw)
To: Paul Clements; +Cc: Andrew Morton, linux-raid
On Tuesday November 29, paul.clements@steeleye.com wrote:
> Hi Neil,
>
> Glad to see this patch is making its way to mainline. I have a couple of
> questions on the patch, though...
Thanks for reviewing the code - I really value that!
>
> NeilBrown wrote:
>
> > + if (uptodate || conf->working_disks <= 1) {
>
> Is it valid to mask a read error just because we have only 1 working disk?
>
The purpose of this was that if there is only one working disk, there
is nothing we can do in the face of a read error, except return it
upstream. However I did get the logic immediately after that wrong as
I discovered when I was porting it across to raid10. Patch will be
out shortly.
>
>
> > + do {
> > + rdev = conf->mirrors[d].rdev;
> > + if (rdev &&
> > + test_bit(In_sync, &rdev->flags) &&
> > + sync_page_io(rdev->bdev,
> > + sect + rdev->data_offset,
> > + s<<9,
> > + conf->tmppage, READ))
> > + success = 1;
> > + else {
> > + d++;
> > + if (d == conf->raid_disks)
> > + d = 0;
> > + }
> > + } while (!success && d != r1_bio->read_disk);
> > +
> > + if (success) {
> > + /* write it back and re-read */
> > + while (d != r1_bio->read_disk) {
>
> Here, it looks like if we retry the read on the same disk that just gave
> the read error, then we will not do any re-writes? I assume that is
> intentional? I guess it's a judgment call whether the sector is really
> bad at that point.
The read that failed was quite possibly a very large multipage read -
64K maybe (depends a lot on the filesystem).
What I do is walk through that one page at a time and retry the read.
If the re-read succeeds, then I assume the failure that stopped the
original (possibly larger) request was somewhere else, and I move on.
So yes, if a block sometimes fails, and then succeeds again on the
next read, we might not try to 'fix' it. Is that likely to happen I
wonder??
>
> > + if (d==0)
> > + d = conf->raid_disks;
> > + d--;
> > + rdev = conf->mirrors[d].rdev;
> > + if (rdev &&
> > + test_bit(In_sync, &rdev->flags)) {
> > + if (sync_page_io(rdev->bdev,
> > + sect + rdev->data_offset,
> > + s<<9, conf->tmppage, WRITE) == 0 ||
> > + sync_page_io(rdev->bdev,
> > + sect + rdev->data_offset,
> > + s<<9, conf->tmppage, READ) == 0) {
> > + /* Well, this device is dead */
> > + md_error(mddev, rdev);
>
> Here, we might have gotten garbage back from the sync_page_io(...,
> READ), if it failed. So don't we have to quit the re-write loop at this
> point? Otherwise, aren't we potentially writing bad data over other
> disks? Granted, this particular case might never actually happen in the
> real world.
Yes, you are right. I guess I really should be reading back into a
different buffer just in case something goes screwy...
Or maybe I could do all the writes, and then do all the reads in a
separate loop - that would be just as safe.
I'll see which one looks neatest.
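Something like this, with fake mirrors just to show the two-pass shape
(a sketch, not the code that will be merged):

#include <stdio.h>
#include <string.h>

#define NDISKS 3
#define PAGE 4096

static unsigned char media[NDISKS][PAGE];       /* fake mirror contents */

static int dev_write(int d, const unsigned char *p)
{ memcpy(media[d], p, PAGE); return 1; }

static int dev_read(int d, unsigned char *p)
{ memcpy(p, media[d], PAGE); return 1; }

int main(void)
{
        unsigned char good[PAGE], scratch[PAGE];
        int d;

        memset(good, 0x5a, PAGE);       /* data recovered from a good mirror */

        for (d = 0; d < NDISKS; d++)    /* pass 1: write everywhere */
                if (!dev_write(d, good))
                        printf("disk %d failed the write\n", d);

        for (d = 0; d < NDISKS; d++)    /* pass 2: verify into scratch, so a
                                         * bad read cannot poison 'good' */
                if (!dev_read(d, scratch) || memcmp(scratch, good, PAGE))
                        printf("disk %d failed the verify\n", d);
                else
                        printf("disk %d verified\n", d);
        return 0;
}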
Thanks again,
NeilBrown
* Re: [PATCH md 013 of 18] Improve handling of read errors with raid6
2005-11-27 23:40 ` [PATCH md 013 of 18] Improve handling of read errors with raid6 NeilBrown
@ 2005-11-30 22:33 ` Carlos Carvalho
2005-12-01 2:54 ` Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Carlos Carvalho @ 2005-11-30 22:33 UTC (permalink / raw)
To: linux-raid
NeilBrown (neilb@suse.de) wrote on 28 November 2005 10:40:
>This is a simple port of matching functionality across from raid5.
>If we get a read error, we don't kick the drive straight away, but
>try to over-write with good data first.
Does it really mean that this functionality is already available for
raid5??? That would be great news.
* Re: [PATCH md 013 of 18] Improve handling of read errors with raid6
2005-11-30 22:33 ` Carlos Carvalho
@ 2005-12-01 2:54 ` Neil Brown
0 siblings, 0 replies; 33+ messages in thread
From: Neil Brown @ 2005-12-01 2:54 UTC (permalink / raw)
To: Carlos Carvalho; +Cc: linux-raid
On Wednesday November 30, carlos@fisica.ufpr.br wrote:
> NeilBrown (neilb@suse.de) wrote on 28 November 2005 10:40:
> >This is a simple port of matching functionality across from raid5.
> >If we get a read error, we don't kick the drive straight away, but
> >try to over-write with good data first.
>
> Does it really mean that this funcionality is already available for
> raid5??? That would be great news.
Yes. It has been in -mm for a while, and will be in 2.6.15. The
raid1/raid6/raid10 versions will have to wait for 2.6.16.
I'll probably do a little 'md Release notes' announcement when 2.6.15
comes out if I remember.
NeilBrown
* Re: [PATCH md 017 of 18] Handle errors when read-only
2005-11-27 23:41 ` [PATCH md 017 of 18] Handle errors when read-only NeilBrown
@ 2005-12-10 6:41 ` Yanggun
2005-12-10 6:59 ` raid1 mysteriously switching to read-only Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Yanggun @ 2005-12-10 6:41 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, linux-raid
Hi Neil,
I have a raid1 array called md0. Basically it runs fine, but
something is switching md0 to read-only during writes to disk (cp, mv).
In what cases does the RAID become read-only? Could you let me know
and describe them?
I am very anxious about this. Does this patch solve it? Is there any
way to keep the array from becoming read-only?
Setup:
HW:
2 S-ATA-Disks (240GB each) -> /dev/md0 RAID-1
Promise S150 TX2Plus - Controller
Intel Pentium4 - 2.8GHz
SW:
Debian Linux Testing
Kernel 2.6.13.2
Software Raid-1
* Re: raid1 mysteriously switching to read-only
2005-12-10 6:41 ` Yanggun
@ 2005-12-10 6:59 ` Neil Brown
2005-12-10 7:50 ` Yanggun
0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-10 6:59 UTC (permalink / raw)
To: Yanggun; +Cc: linux-raid
On Saturday December 10, yang.geum.seok@gmail.com wrote:
> Hi Neil,
>
> I have a RAID-1 array called md0. Basically it runs fine, but
> something keeps switching md0 to read-only during writes to disk (cp, mv).
>
> In what cases does the RAID become read-only? Could you describe
> those cases for me?
>
> I am very anxious about this. Does this patch solve it? Is there any
> way to keep the array from going read-only?
You will need to give more details about what is happening. Lots more.
What makes you say the array is 'read-only' - what are the messages
you get - exactly?
What filesystem are you using?
What messages are there in the kernel log ('dmesg' might show these)?
What does 'cat /proc/mdstat' show?
NeilBrown
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-10 6:59 ` raid1 mysteriously switching to read-only Neil Brown
@ 2005-12-10 7:50 ` Yanggun
2005-12-10 8:02 ` Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Yanggun @ 2005-12-10 7:50 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
I am sorry for not giving enough information.
I am currently using the ext3 filesystem.
I set up the RAID-1 disk array (/dev/md0) with the following commands:
mkfs.ext3 -j /dev/sda1
mkfs.ext3 -j /dev/sda1
mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
[root@sentry24 root]# cat /proc/mdstat ; cat /proc/mounts
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
244195904 blocks [2/2] [UU]
unused devices: <none>
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
tmpfs /tmp tmpfs rw 0 0
tmpfs /var tmpfs rw 0 0
none /proc/bus/usb usbfs rw 0 0
/dev/md0 /data/disk1 ext3 rw,noatime 0 0
After composing the RAID-1 disk array, I generated heavy disk I/O with
the following command:
scp -r root@192.168.0.24:/data/disk3/*.avi /data/disk1/
The filesystem was remounted read-only after about 30 minutes of copying:
[root@root root]# cat /proc/mdstat ; cat /proc/mounts
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
244195904 blocks [2/2] [UU]
unused devices: <none>
rootfs / rootfs rw 0 0
/dev/root / ext3 rw,noatime 0 0
proc /proc proc rw,nodiratime 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw 0 0
tmpfs /dev/shm tmpfs rw 0 0
tmpfs /tmp tmpfs rw 0 0
tmpfs /var tmpfs rw 0 0
none /proc/bus/usb usbfs rw 0 0
/dev/md0 /data/disk3 ext3 ro,noatime 0 0 <------------- changed to read-only
[root@rootroot]# ls -l /data/disk3/dvr
total 0
[root@rootroot]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
5 0 128 6592 299528 32676 0 0 386 2 1328 1268 35 64 1 1
2 0 128 6468 299528 32676 0 0 0 0 1346 6578 52 48 0 0
2 0 128 6468 299528 32676 0 0 0 0 1344 7606 28 72 0 0
[root@sentry24 root]# tail -f /var/log/kern.log
Dec 10 01:36:32 kernel: [ 48.647619] EXT3 FS on hda1, internal journal
Dec 10 01:36:32 kernel: [ 52.516666] SCSI subsystem initialized
Dec 10 01:36:32 kernel: [ 52.551594] PROMISE SATA-II 150/300
Series Linux Driver v1.01.0.20
Dec 10 01:36:32 kernel: [ 52.553736] ACPI: PCI Interrupt
0000:01:05.0[A] -> GSI 16 (level, low) -> IRQ 177
Dec 10 01:36:32 kernel: [ 52.693131] ulsata2:[info] Drive
1/0: WDC WD2500JS-22MHB0 488397167s 250059MB UDMA6
Dec 10 01:36:32 kernel: [ 52.807338] ulsata2:[info] Drive
3/0: WDC WD2500JS-22MHB0 488397167s 250059MB UDMA6
Dec 10 01:36:32 kernel: [ 52.809447] scsi0 : ulsata2
Dec 10 01:36:32 kernel: [ 52.823914] Vendor:
Model: WDC WD2500JS-22M Rev:
Dec 10 01:36:32 kernel: [ 52.826003] Type: Direct-Access
ANSI SCSI revision: 02
Dec 10 01:36:32 kernel: [ 52.852253] Vendor:
Model: WDC WD2500JS-22M Rev:
Dec 10 01:36:32 kernel: [ 52.854580] Type: Direct-Access
ANSI SCSI revision: 02
Dec 10 01:36:32 kernel: [ 52.942756] SCSI device sda:
488397168 512-byte hdwr sectors (250059 MB)
Dec 10 01:36:32 kernel: [ 52.944896] sda: got wrong page
Dec 10 01:36:32 kernel: [ 52.946855] sda: assuming drive
cache: write through
Dec 10 01:36:32 kernel: [ 52.961355] SCSI device sda:
488397168 512-byte hdwr sectors (250059 MB)
Dec 10 01:36:32 kernel: [ 52.963487] sda: got wrong page
Dec 10 01:36:32 kernel: [ 52.965451] sda: assuming drive
cache: write through
Dec 10 01:36:32 kernel: [ 52.967431] sda: sda1
Dec 10 01:36:32 kernel: [ 52.989793] Attached scsi disk sda
at scsi0, channel 0, id 0, lun 0
Dec 10 01:36:32 kernel: [ 52.991983] SCSI device sdb:
488397168 512-byte hdwr sectors (250059 MB)
Dec 10 01:36:32 kernel: [ 52.994016] sdb: got wrong page
Dec 10 01:36:32 kernel: [ 52.996018] sdb: assuming drive
cache: write through
Dec 10 01:36:32 kernel: [ 53.004583] SCSI device sdb:
488397168 512-byte hdwr sectors (250059 MB)
Dec 10 01:36:32 kernel: [ 53.006722] sdb: got wrong page
Dec 10 01:36:32 kernel: [ 53.008714] sdb: assuming drive
cache: write through
Dec 10 01:36:32 kernel: [ 53.010803] sdb: sdb1
Dec 10 01:36:32 kernel: [ 53.038174] Attached scsi disk sdb
at scsi0, channel 0, id 2, lun 0
Dec 10 01:36:32 kernel: [ 53.195957] Intel(R) PRO/1000
Network Driver - version 6.0.60-k2
Dec 10 01:36:32 kernel: [ 53.198108] Copyright (c) 1999-2005
Intel Corporation.
Dec 10 01:36:32 kernel: [ 53.212695] ACPI: PCI Interrupt
0000:01:03.0[A] -> GSI 21 (level, low) -> IRQ 185
Dec 10 01:36:32 kernel: [ 53.682839] e1000: eth0:
e1000_probe: Intel(R) PRO/1000 Network Connection
Dec 10 01:36:32 kernel: [ 53.999922] usbcore: registered new
driver usbfs
Dec 10 01:36:32 kernel: [ 54.014605] usbcore: registered new driver hub
Dec 10 01:36:32 kernel: [ 54.059477] ACPI: PCI Interrupt
0000:00:1d.7[D] -> GSI 23 (level, low) -> IRQ 193
Dec 10 01:36:32 kernel: [ 54.061754] PCI: Setting latency
timer of device 0000:00:1d.7 to 64
Dec 10 01:36:32 kernel: [ 54.061761] ehci_hcd 0000:00:1d.7:
Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller
Dec 10 01:36:32 kernel: [ 54.078687] ehci_hcd 0000:00:1d.7:
new USB bus registered, assigned bus number 1
Dec 10 01:36:32 kernel: [ 54.081030] ehci_hcd 0000:00:1d.7:
irq 193, io mem 0xe8180000
Dec 10 01:36:32 kernel: [ 54.087181] PCI: cache line size of
128 is not supported by device 0000:00:1d.7
Dec 10 01:36:32 kernel: [ 54.087189] ehci_hcd 0000:00:1d.7:
USB 2.0 initialized, EHCI 1.00, driver 10 Dec 2004
Dec 10 01:36:32 kernel: [ 54.116718] hub 1-0:1.0: USB hub found
Dec 10 01:36:32 kernel: [ 54.119049] hub 1-0:1.0: 6 ports detected
Dec 10 01:36:32 kernel: [ 54.675098] md: md driver 0.90.2
MAX_MD_DEVS=256, MD_SB_DISKS=27
Dec 10 01:36:32 kernel: [ 54.677398] md: bitmap version 3.38
Dec 10 01:36:32 kernel: [ 54.694205] md: raid1 personality
registered as nr 3
Dec 10 01:36:32 kernel: [ 54.868492] USB Universal Host
Controller Interface driver v2.3
Dec 10 01:36:32 kernel: [ 54.883020] ACPI: PCI Interrupt
0000:00:1d.0[A] -> GSI 16 (level, low) -> IRQ 177
Dec 10 01:36:32 kernel: [ 54.885145] PCI: Setting latency
timer of device 0000:00:1d.0 to 64
Dec 10 01:36:32 kernel: [ 54.885151] uhci_hcd 0000:00:1d.0:
Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI
Controller #1
Dec 10 01:36:32 kernel: [ 54.901825] uhci_hcd 0000:00:1d.0:
new USB bus registered, assigned bus number 2
Dec 10 01:36:32 kernel: [ 54.904054] uhci_hcd 0000:00:1d.0:
irq 177, io base 0x0000e200
Dec 10 01:36:32 kernel: [ 55.034027] hub 2-0:1.0: USB hub found
Dec 10 01:36:32 kernel: [ 55.036202] hub 2-0:1.0: 2 ports detected
Dec 10 01:36:32 kernel: [ 55.362260] ACPI: PCI Interrupt
0000:00:1d.1[B] -> GSI 19 (level, low) -> IRQ 201
Dec 10 01:36:32 kernel: [ 55.364448] PCI: Setting latency
timer of device 0000:00:1d.1 to 64
Dec 10 01:36:32 kernel: [ 55.364454] uhci_hcd 0000:00:1d.1:
Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI
Controller #2
Dec 10 01:36:32 kernel: [ 55.381310] uhci_hcd 0000:00:1d.1:
new USB bus registered, assigned bus number 3
Dec 10 01:36:32 kernel: [ 55.383568] uhci_hcd 0000:00:1d.1:
irq 201, io base 0x0000e000
Dec 10 01:36:32 kernel: [ 55.434004] hub 3-0:1.0: USB hub found
Dec 10 01:36:32 kernel: [ 55.436192] hub 3-0:1.0: 2 ports detected
Dec 10 01:36:32 kernel: [ 55.457925] ACPI: PCI Interrupt
0000:00:1d.2[C] -> GSI 18 (level, low) -> IRQ 169
Dec 10 01:36:32 kernel: [ 55.460171] PCI: Setting latency
timer of device 0000:00:1d.2 to 64
Dec 10 01:36:32 kernel: [ 55.460177] uhci_hcd 0000:00:1d.2:
Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI
Controller #3
Dec 10 01:36:32 kernel: [ 55.477009] uhci_hcd 0000:00:1d.2:
new USB bus registered, assigned bus number 4
Dec 10 01:36:32 kernel: [ 55.479289] uhci_hcd 0000:00:1d.2:
irq 169, io base 0x0000e100
Dec 10 01:36:32 kernel: [ 55.503270] hub 4-0:1.0: USB hub found
Dec 10 01:36:32 kernel: [ 55.505478] hub 4-0:1.0: 2 ports detected
Dec 10 01:36:32 kernel: [ 56.429056] Initializing USB Mass
Storage driver...
Dec 10 01:36:32 kernel: [ 56.597725] usb 4-2: new full speed
USB device using uhci_hcd and address 2
Dec 10 01:36:32 kernel: [ 58.327192] usbcore: registered new
driver usb-storage
Dec 10 01:36:32 kernel: [ 58.329386] USB Mass Storage support
registered.
Dec 10 01:36:32 kernel: [ 58.382477] usbcore: registered new
driver usbserial
Dec 10 01:36:32 kernel: [ 58.397053]
drivers/usb/serial/usb-serial.c: USB Serial support registered for
Generic
Dec 10 01:36:32 kernel: [ 58.411747] usbcore: registered new
driver usbserial_generic
Dec 10 01:36:32 kernel: [ 58.633823]
drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0
Dec 10 01:36:32 kernel: [ 58.760438] usbcore: registered new
driver hiddev
Dec 10 01:36:32 kernel: [ 58.778876] input: USB HID v1.00
Device [Burr-Brown from TI USB Audio CODEC ] on
usb-0000:00:1d.2-2
Dec 10 01:36:32 kernel: [ 58.783803] usbcore: registered new
driver usbhid
Dec 10 01:36:32 kernel: [ 58.786300]
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
Dec 10 01:36:32 kernel: [ 58.894333]
drivers/usb/serial/usb-serial.c: USB Serial support registered for
PL-2303
Dec 10 01:36:32 kernel: [ 58.909269] usbcore: registered new
driver pl2303
Dec 10 01:36:32 kernel: [ 58.911796]
drivers/usb/serial/pl2303.c: Prolific PL2303 USB to serial adaptor
driver v0.12
Dec 10 01:36:32 kernel: [ 59.014389] ieee1394: Initialized
config rom entry `ip1394'
Dec 10 01:36:32 kernel: [ 59.078076] usbcore: registered new
driver snd-usb-audio
Dec 10 01:36:32 kernel: [ 59.199443] sbp2: $Rev: 1306 $ Ben
Collins <bcollins@debian.org>
Dec 10 01:36:32 kernel: [ 59.232766] ACPI: Power Button (FF) [PWRF]
Dec 10 01:36:32 kernel: [ 59.235234] ACPI: Power Button (CM) [PWRB]
Dec 10 01:36:32 kernel: [ 59.526876] odcap_driver.c:910:
odcap: driver registered for major 97.
Dec 10 01:36:32 kernel: [ 59.541867] odcap_driver.c:278:
odcap0: device found.
Dec 10 01:36:32 kernel: [ 59.544359] ACPI: PCI Interrupt
0000:01:06.0[A] -> GSI 17 (level, low) -> IRQ 209
Dec 10 01:36:32 kernel: [ 59.546836] PCI: Setting latency
timer of device 0000:01:06.0 to 64
Dec 10 01:36:32 kernel: [ 59.546847] kernel_api.c:251:
odcap0: (240b rev 0) at 0000:01:06.0, irq:209, latency:0
Dec 10 01:36:32 kernel: [ 59.549297] kernel_api.c:267:
odcap0: port_addr:0x0000d300 port_len:0x00000004
Dec 10 01:36:32 kernel: [ 59.551750] kernel_api.c:292:
odcap0: io_mem:0xe002a000 (mmio:0xe8080000, len:0x1000)
Dec 10 01:36:32 kernel: [ 61.224923] ACPI: PCI Interrupt
0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 177
Dec 10 01:36:32 kernel: [ 61.226432] mtrr:
0xe0000000,0x8000000 overlaps existing 0xe0000000,0x400000
Dec 10 01:36:32 kernel: [ 61.240403] [drm] Initialized i830
1.3.2 20021108 on minor 0: Intel Corporation 82845G/GL[Brookdale-G]/GE
Chipset Integrated Graphics Device
Dec 10 01:36:32 kernel: [ 61.420124] Attached scsi generic
sg0 at scsi0, channel 0, id 0, lun 0, type 0
Dec 10 01:36:32 kernel: [ 61.435944] Attached scsi generic
sg1 at scsi0, channel 0, id 2, lun 0, type 0
Dec 10 01:36:32 kernel: [ 65.656940] e1000: eth0:
e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
Dec 10 01:36:32 kernel: [ 66.997072] lp: driver loaded but no
devices found
Dec 10 01:36:35 kernel: [ 69.496288] apm: BIOS version 1.2
Flags 0x07 (Driver version 1.16ac)
Dec 10 01:36:35 kernel: [ 69.496295] apm: overridden by ACPI.
Dec 10 01:36:35 kernel: [ 69.937106] mtrr:
0xe0000000,0x8000000 overlaps existing 0xe0000000,0x400000
Dec 10 01:36:41 kernel: [ 75.979540] e1000: eth0:
e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex
Dec 10 01:36:45 kernel: Kernel logging (proc) stopped.
Dec 10 01:36:45 kernel: Kernel log daemon terminating.
Dec 10 01:36:46 kernel: klogd 1.4.1#10, log source = /proc/kmsg started.
Dec 10 01:36:46 kernel: Inspecting /boot/System.map-2.6.13.2
Dec 10 01:36:46 kernel: Loaded 28981 symbols from
/boot/System.map-2.6.13.2.
Dec 10 01:36:46 kernel: Symbols match kernel version 2.6.13.
Dec 10 01:36:46 kernel: No module symbols loaded - kernel
modules not enabled.
Dec 10 01:36:50 kernel: [ 84.661743] md: bind<sda1>
Dec 10 01:36:50 kernel: [ 84.661911] md: bind<sdb1>
Dec 10 01:36:50 kernel: [ 84.662081] raid1: raid set md0
active with 2 out of 2 mirrors
Dec 10 01:36:50 kernel: [ 84.663584] md: syncing RAID array md0
Dec 10 01:36:50 kernel: [ 84.663789] md: minimum _guaranteed_
reconstruction speed: 1000 KB/sec/disc.
Dec 10 01:36:50 kernel: [ 84.663923] md: using maximum
available idle IO bandwith (but not more than 200000 KB/sec) for
reconstruction.
Dec 10 01:36:50 kernel: [ 84.664062] md: using 128k window,
over a total of 244195904 blocks.
Dec 10 01:36:50 kernel: [ 84.733296] kjournald starting.
Commit interval 5 seconds
Dec 10 01:36:50 kernel: [ 84.740243] EXT3 FS on md0, internal journal
Nov 17 01:36:50 kernel: [ 84.740347] EXT3-fs: mounted
filesystem with ordered data mode.
Yanggun
2005/12/10, Neil Brown <neilb@suse.de>:
> On Saturday December 10, yang.geum.seok@gmail.com wrote:
> > Hi Neil,
> >
> > I have a RAID-1 array called md0. Basically it runs fine, but
> > something keeps switching md0 to read-only during writes to disk (cp, mv).
> >
> > In what cases does the RAID become read-only? Could you describe
> > those cases for me?
> >
> > I am very anxious about this. Does this patch solve it? Is there any
> > way to keep the array from going read-only?
>
> You will need to give more details about what is happening. Lots more.
>
> What makes you say the array is 'read-only' - what are the messages
> you get - exactly?
> What filesystem are you using?
> What messages are there in the kernel log ('dmesg' might show these)?
> What does 'cat /proc/mdstat' show?
>
> NeilBrown
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-10 7:50 ` Yanggun
@ 2005-12-10 8:02 ` Neil Brown
2005-12-10 8:10 ` Yanggun
0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-10 8:02 UTC (permalink / raw)
To: Yanggun; +Cc: linux-raid
On Saturday December 10, yang.geum.seok@gmail.com wrote:
> I am sorry for not giving enough information.
>
> I am currently using the ext3 filesystem.
>
> I set up the RAID-1 disk array (/dev/md0) with the following commands:
>
> mkfs.ext3 -j /dev/sda1
> mkfs.ext3 -j /dev/sda1
> mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
This is not the way raid1 works.
The raid1 array /dev/md0 will be slightly smaller than either sda1 or
sdb1. So the filesystem will 'think' it is the size of sda1, will
eventually discover it is smaller, and will fail.
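To see concretely where the space goes, here is a hedged C sketch
assuming the 0.90 superblock format, where md reserves a 64KiB
superblock at a 64KiB-aligned offset near the end of each component
device; the rounding is modelled on MD_NEW_SIZE_SECTORS() and
MD_RESERVED_SECTORS from md_p.h, but treat the exact figures as
illustrative (the partition size below is made up):

#include <stdio.h>

#define MD_RESERVED_SECTORS 128ULL  /* 64 KiB in 512-byte sectors (0.90 format) */

/* Usable size of one component device: round down to a 64KiB boundary,
 * then subtract the 64KiB superblock itself. */
static unsigned long long md_component_sectors(unsigned long long dev)
{
	return (dev & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS;
}

int main(void)
{
	unsigned long long sda1 = 488392065ULL;  /* hypothetical partition size */
	printf("sda1: %llu sectors, md0: %llu sectors\n",
	       sda1, md_component_sectors(sda1));
	return 0;
}

So a filesystem made directly on sda1 believes it owns sectors that md0
cannot provide, and fails once it touches them.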
You should:
mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mkfs.ext3 -j /dev/md0
mount /dev/md0 /data/disk1
and THEN use the filesystem.
NeilBrown
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-10 8:02 ` Neil Brown
@ 2005-12-10 8:10 ` Yanggun
2005-12-10 12:10 ` Neil Brown
0 siblings, 1 reply; 33+ messages in thread
From: Yanggun @ 2005-12-10 8:10 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Oh, sorry, that was a typo.
What I actually did was:
mkfs.ext3 -j /dev/sda1
mkfs.ext3 -j /dev/sdb1
mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mount /dev/md0 /data/disk1
scp ... [snip]
Should I run mkfs after creating the RAID-1 device (/dev/md0)?
If I format the disks (/dev/sda1, /dev/sdb1) first and then compose
them into a RAID-1, can that cause a problem?
> You should:
> mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
> mkfs.ext3 -j /dev/md0
> mount /dev/md0 /data/disk1
2005/12/10, Neil Brown <neilb@suse.de>:
> On Saturday December 10, yang.geum.seok@gmail.com wrote:
> > I am sorry for not giving enough information.
> >
> > I am currently using the ext3 filesystem.
> >
> > I set up the RAID-1 disk array (/dev/md0) with the following commands:
> >
> > mkfs.ext3 -j /dev/sda1
> > mkfs.ext3 -j /dev/sda1
> > mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
>
> This is not the way raid1 works.
> The raid1 array /dev/md0 will be slightly smaller than either sda1 or
> sdb1. So the filesystem will 'think' it is the size of sda1, will
> eventually discover it is smaller, and will fail.
> You should:
> mdadm -Cv /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
> mkfs.ext3 -j /dev/md0
> mount /dev/md0 /data/disk1
>
> and THEN use the filesystem.
> NeilBrown
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-10 8:10 ` Yanggun
@ 2005-12-10 12:10 ` Neil Brown
2005-12-11 13:04 ` Yanggun
0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2005-12-10 12:10 UTC (permalink / raw)
To: Yanggun; +Cc: linux-raid
On Saturday December 10, yang.geum.seok@gmail.com wrote:
>
> Should I run mkfs after creating the RAID-1 device (/dev/md0)?
Yes. format (mkfs) must come AFTER make RAID1 (mdadm -C).
>
> If I format the disks (/dev/sda1, /dev/sdb1) first and then compose
> them into a RAID-1, can that cause a problem?
Don't format sda1 or sdb1. Compose the RAID-1 first, and then format
md0.
NeilBrown
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-10 12:10 ` Neil Brown
@ 2005-12-11 13:04 ` Yanggun
2005-12-11 14:14 ` Patrik Jonsson
0 siblings, 1 reply; 33+ messages in thread
From: Yanggun @ 2005-12-11 13:04 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
Thank you very much.
Can you explain what the technical difference is?
Yanggun
2005/12/10, Neil Brown <neilb@suse.de>:
> On Saturday December 10, yang.geum.seok@gmail.com wrote:
> >
> > Should I run mkfs after creating the RAID-1 device (/dev/md0)?
>
> Yes. format (mkfs) must come AFTER make RAID1 (mdadm -C).
> >
> > If I format the disks (/dev/sda1, /dev/sdb1) first and then compose
> > them into a RAID-1, can that cause a problem?
>
> Don't format sda1 or sdb1. Compose the RAID-1 first, and then format
> md0.
>
> NeilBrown
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-11 13:04 ` Yanggun
@ 2005-12-11 14:14 ` Patrik Jonsson
2005-12-11 14:29 ` Yanggun
0 siblings, 1 reply; 33+ messages in thread
From: Patrik Jonsson @ 2005-12-11 14:14 UTC (permalink / raw)
To: Yanggun; +Cc: Neil Brown, linux-raid
hello,
Perhaps this can answer some questions:
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
cheers,
/Patrik
Yanggun wrote:
> Thank you very much.
>
> Can you explain what the technical difference is?
>
> Yanggun
>
> 2005/12/10, Neil Brown <neilb@suse.de>:
>
>>On Saturday December 10, yang.geum.seok@gmail.com wrote:
>>
>>>Should I run mkfs after creating the RAID-1 device (/dev/md0)?
>>
>>Yes. format (mkfs) must come AFTER make RAID1 (mdadm -C).
>>
>>>If I format the disks (/dev/sda1, /dev/sdb1) first and then compose
>>>them into a RAID-1, can that cause a problem?
>>
>>Don't format sda1 or sdb1. Compose the RAID-1 first, and then format
>>md0.
>>
>>NeilBrown
>>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-11 14:14 ` Patrik Jonsson
@ 2005-12-11 14:29 ` Yanggun
2005-12-11 17:13 ` Ross Vandegrift
0 siblings, 1 reply; 33+ messages in thread
From: Yanggun @ 2005-12-11 14:29 UTC (permalink / raw)
To: Patrik Jonsson; +Cc: Neil Brown, linux-raid
I could not find the part of that document that covers this.
I am sorry, could you tell me which section it is?
2005/12/11, Patrik Jonsson <patrik@ucolick.org>:
> hello,
>
> Perhaps this can answer some questions:
>
> http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
>
> cheers,
>
> /Patrik
>
> Yanggun wrote:
> > Thank you very much.
> >
> > Can you explain what the technical difference is?
> >
> > Yanggun
> >
> > 2005/12/10, Neil Brown <neilb@suse.de>:
> >
> >>On Saturday December 10, yang.geum.seok@gmail.com wrote:
> >>
> >>>Should I run mkfs after creating the RAID-1 device (/dev/md0)?
> >>
> >>Yes. format (mkfs) must come AFTER make RAID1 (mdadm -C).
> >>
> >>>If I format the disks (/dev/sda1, /dev/sdb1) first and then compose
> >>>them into a RAID-1, can that cause a problem?
> >>
> >>Don't format sda1 or sdb1. Compose the RAID-1 first, and then format
> >>md0.
> >>
> >>NeilBrown
> >>
> >
>
>
>
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-11 14:29 ` Yanggun
@ 2005-12-11 17:13 ` Ross Vandegrift
2005-12-11 23:28 ` Yanggun
0 siblings, 1 reply; 33+ messages in thread
From: Ross Vandegrift @ 2005-12-11 17:13 UTC (permalink / raw)
To: Yanggun; +Cc: Patrik Jonsson, Neil Brown, linux-raid
On Sun, Dec 11, 2005 at 11:29:23PM +0900, Yanggun wrote:
> I could not find the part of that document that covers this.
>
> I am sorry, could you tell me which section it is?
Basically, the md driver acts like another hard disk. So a regular
setup looks like this:
---------------------------------
| |
| Applications (ls, cat) |
| |
---------------------------------
|
v
---------------------------------
| |
| Filesystem (ext3) |
| |
---------------------------------
|
v
---------------------------------
| |
| Block layer (sda1/sdb1) |
| |
---------------------------------
and a software RAID setup looks like this:
---------------------------------
| |
| Applications (ls, cat) |
| |
---------------------------------
|
v
---------------------------------
| |
| Filesystem (ext3) |
| |
---------------------------------
|
v
---------------------------------
| |
| Block layer (md0) |
| |
---------------------------------
|
v
---------------------------------
| |
| Block Layer (sda1/sdb1) |
| |
---------------------------------
The format of sda1/sdb1 is the RAID data that the md driver writes to
them. It forms a complete layer of abstraction between your
applications and the actual devices you are storing data on.
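As a toy illustration of that layering (a hedged sketch; these
structures and names are invented for the example and bear no relation
to the real md implementation), a mirror device simply fans each write
out to both members while the layer above sees one disk:

#include <stdio.h>
#include <string.h>

#define NSECT 8
#define SECT 16

struct blockdev {
	char sectors[NSECT][SECT];	/* toy in-memory "disk" */
};

/* The filesystem layer only ever calls this, exactly as it would write
 * to a single disk; the duplication below it is invisible from above. */
static void md0_write(struct blockdev *sda1, struct blockdev *sdb1,
		      int sector, const char *data)
{
	strncpy(sda1->sectors[sector], data, SECT - 1);
	strncpy(sdb1->sectors[sector], data, SECT - 1);
}

int main(void)
{
	struct blockdev sda1 = {0}, sdb1 = {0};

	md0_write(&sda1, &sdb1, 0, "hello raid1");
	printf("sda1: %s / sdb1: %s\n", sda1.sectors[0], sdb1.sectors[0]);
	return 0;
}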
--
Ross Vandegrift
ross@lug.udel.edu
"The good Christian should beware of mathematicians, and all those who
make empty prophecies. The danger already exists that the mathematicians
have made a covenant with the devil to darken the spirit and to confine
man in the bonds of Hell."
--St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: raid1 mysteriously switching to read-only
2005-12-11 17:13 ` Ross Vandegrift
@ 2005-12-11 23:28 ` Yanggun
0 siblings, 0 replies; 33+ messages in thread
From: Yanggun @ 2005-12-11 23:28 UTC (permalink / raw)
To: Ross Vandegrift; +Cc: Patrik Jonsson, Neil Brown, linux-raid
Thanks.
I really appreciate the explanation and the diagrams.
Yanggun.
2005/12/12, Ross Vandegrift <ross@jose.lug.udel.edu>:
> On Sun, Dec 11, 2005 at 11:29:23PM +0900, Yanggun wrote:
> > I could not find the part of that document that covers this.
> >
> > I am sorry, could you tell me which section it is?
>
> Basically, the md driver acts like another hard disk. So a regular
> setup looks like this:
>
> ---------------------------------
> | |
> | Applications (ls, cat) |
> | |
> ---------------------------------
> |
> v
> ---------------------------------
> | |
> | Filesystem (ext3) |
> | |
> ---------------------------------
> |
> v
> ---------------------------------
> | |
> | Block layer (sda1/sdb1) |
> | |
> ---------------------------------
>
> and a software RAID setup looks like this:
>
> ---------------------------------
> | |
> | Applications (ls, cat) |
> | |
> ---------------------------------
> |
> v
> ---------------------------------
> | |
> | Filesystem (ext3) |
> | |
> ---------------------------------
> |
> v
> ---------------------------------
> | |
> | Block layer (md0) |
> | |
> ---------------------------------
> |
> v
> ---------------------------------
> | |
> | Block Layer (sda1/sdb1) |
> | |
> ---------------------------------
>
> The format of sda1/sdb1 is the RAID data that the md driver writes to
> them. It forms a complete layer of abstraction between your
> applications and the actual devices you are storing data on.
>
> --
> Ross Vandegrift
> ross@lug.udel.edu
>
> "The good Christian should beware of mathematicians, and all those who
> make empty prophecies. The danger already exists that the mathematicians
> have made a covenant with the devil to darken the spirit and to confine
> man in the bonds of Hell."
> --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37
>
^ permalink raw reply [flat|nested] 33+ messages in thread
Thread overview: 33+ messages
2005-11-27 23:39 [PATCH md 000 of 18] Introduction NeilBrown
2005-11-27 23:39 ` [PATCH md 001 of 18] Improve read speed to raid10 arrays using 'far copies' NeilBrown
2005-11-27 23:39 ` [PATCH md 002 of 18] Fix locking problem in r5/r6 NeilBrown
2005-11-27 23:39 ` [PATCH md 003 of 18] Fix problem with raid6 intent bitmap NeilBrown
2005-11-27 23:39 ` [PATCH md 004 of 18] Set default_bitmap_offset properly in set_array_info NeilBrown
2005-11-27 23:40 ` [PATCH md 005 of 18] Fix --re-add for raid1 and raid6 NeilBrown
2005-11-27 23:40 ` [PATCH md 006 of 18] Improve raid1 "IO Barrier" concept NeilBrown
2005-11-27 23:40 ` [PATCH md 007 of 18] Improve raid10 " NeilBrown
2005-11-27 23:40 ` [PATCH md 008 of 18] Small cleanups for raid5 NeilBrown
2005-11-27 23:40 ` [PATCH md 010 of 18] Move bitmap_create to after md array has been initialised NeilBrown
2005-11-27 23:40 ` [PATCH md 011 of 18] Write intent bitmap support for raid10 NeilBrown
2005-11-27 23:40 ` [PATCH md 012 of 18] Fix raid6 resync check/repair code NeilBrown
2005-11-27 23:40 ` [PATCH md 013 of 18] Improve handing of read errors with raid6 NeilBrown
2005-11-30 22:33 ` Carlos Carvalho
2005-12-01 2:54 ` Neil Brown
2005-11-27 23:40 ` [PATCH md 014 of 18] Attempt to auto-correct read errors in raid1 NeilBrown
2005-11-29 16:38 ` Paul Clements
2005-11-29 23:21 ` Neil Brown
2005-11-27 23:40 ` [PATCH md 015 of 18] Tidyup some issues with raid1 resync and prepare for catching read errors NeilBrown
2005-11-27 23:40 ` [PATCH md 016 of 18] Better handling for read error in raid1 during resync NeilBrown
2005-11-27 23:41 ` [PATCH md 017 of 18] Handle errors when read-only NeilBrown
2005-12-10 6:41 ` Yanggun
2005-12-10 6:59 ` raid1 mysteriously switching to read-only Neil Brown
2005-12-10 7:50 ` Yanggun
2005-12-10 8:02 ` Neil Brown
2005-12-10 8:10 ` Yanggun
2005-12-10 12:10 ` Neil Brown
2005-12-11 13:04 ` Yanggun
2005-12-11 14:14 ` Patrik Jonsson
2005-12-11 14:29 ` Yanggun
2005-12-11 17:13 ` Ross Vandegrift
2005-12-11 23:28 ` Yanggun
2005-11-27 23:41 ` [PATCH md 018 of 18] Fix up some rdev rcu locking in raid5/6 NeilBrown