* [md PATCH 00/16] md patches for 2.6.40
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

I'm slowly working through getting bad-block-logs working, but it
looks like it won't be ready for 2.6.40 (so many distractions.... and
the code is rather complex and hard to test).

However while doing that coding I found several opportunities to clean
up existing code, and various bug reports have come in that need
addressing.

So here are the patches that I currently plan to submit for 2.6.40.
They are all in my for-next branch so should appear in -next shortly.

As always, any review is most welcome.

Thanks,
NeilBrown


---

NeilBrown (16):
      md: allow resync_start to be set while an array is active.
      md/raid10:  reformat some loops with less indenting.
      md/raid10: remove unused variable.
      md/raid10: make more use of 'slot' in raid10d.
      md/raid10: some tidying up in fix_read_error
      md/raid1: improve handling of pages allocated for write-behind.
      md/raid1: try fix_sync_read_error before process_checks.
      md/raid1: tidy up new functions: process_checks and fix_sync_read_error.
      md/raid1: split out two sub-functions from sync_request_write
      md: make error_handler functions more uniform and correct.
      md/multipath: discard ->working_disks in favour of ->degraded
      md/raid1: clean up read_balance.
      md: simplify raid10 read_balance
      md/bitmap: fix saving of events_cleared and other state.
      md: reject a re-add request that cannot be honoured.
      md: Fix race when creating a new md device.


 drivers/md/bitmap.c    |   10 +
 drivers/md/md.c        |   23 ++
 drivers/md/multipath.c |   60 +++---
 drivers/md/multipath.h |    1 
 drivers/md/raid1.c     |  506 ++++++++++++++++++++++++------------------------
 drivers/md/raid1.h     |    4 
 drivers/md/raid10.c    |  424 +++++++++++++++++++---------------------
 drivers/md/raid5.c     |   41 ++--
 8 files changed, 535 insertions(+), 534 deletions(-)

-- 
Signature


* [md PATCH 04/16] md: simplify raid10 read_balance
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

raid10 read balance has two different loops for looking through
possible devices to choose the best.
Collapse those into one loop and generally make the code more
readable.
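
The essential shape of the combined loop, as a stand-alone sketch
(illustrative only: 'struct cand' and its fields are invented for
the example and are not names from the driver):

#include <limits.h>
#include <stdlib.h>

struct cand {
	int ok;		/* device present and not failed */
	int pending;	/* outstanding requests */
	long head;	/* last known head position */
};

/* One pass does everything: skip unusable entries, stop early on a
 * clearly-good candidate (an idle disk), and otherwise remember the
 * entry with the closest head position as the fallback choice.
 */
static int pick(const struct cand *c, int n, long pos)
{
	long best_dist = LONG_MAX;
	int best = -1;
	int i;

	for (i = 0; i < n; i++) {
		long dist;

		if (!c[i].ok)
			continue;
		if (c[i].pending == 0)
			return i;	/* idle: good enough, stop here */
		dist = labs(pos - c[i].head);
		if (dist < best_dist) {
			best_dist = dist;
			best = i;
		}
	}
	return best;	/* -1 if nothing was usable */
}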

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid10.c |  110 ++++++++++++++++++++-------------------------------
 1 files changed, 44 insertions(+), 66 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8e94626..8ea0aca 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -488,13 +488,19 @@ static int raid10_mergeable_bvec(struct request_queue *q,
 static int read_balance(conf_t *conf, r10bio_t *r10_bio)
 {
 	const sector_t this_sector = r10_bio->sector;
-	int disk, slot, nslot;
+	int disk, slot;
 	const int sectors = r10_bio->sectors;
-	sector_t new_distance, current_distance;
+	sector_t new_distance, best_dist;
 	mdk_rdev_t *rdev;
+	int do_balance;
+	int best_slot;
 
 	raid10_find_phys(conf, r10_bio);
 	rcu_read_lock();
+retry:
+	best_slot = -1;
+	best_dist = MaxSector;
+	do_balance = 1;
 	/*
 	 * Check if we can balance. We can balance on the whole
 	 * device if no resync is going on (recovery is ok), or below
@@ -502,86 +508,58 @@ static int read_balance(conf_t *conf, r10bio_t *r10_bio)
 	 * above the resync window.
 	 */
 	if (conf->mddev->recovery_cp < MaxSector
-	    && (this_sector + sectors >= conf->next_resync)) {
-		/* make sure that disk is operational */
-		slot = 0;
-		disk = r10_bio->devs[slot].devnum;
-
-		while ((rdev = rcu_dereference(conf->mirrors[disk].rdev)) == NULL ||
-		       r10_bio->devs[slot].bio == IO_BLOCKED ||
-		       !test_bit(In_sync, &rdev->flags)) {
-			slot++;
-			if (slot == conf->copies) {
-				slot = 0;
-				disk = -1;
-				break;
-			}
-			disk = r10_bio->devs[slot].devnum;
-		}
-		goto rb_out;
-	}
-
+	    && (this_sector + sectors >= conf->next_resync))
+		do_balance = 0;
 
-	/* make sure the disk is operational */
-	slot = 0;
-	disk = r10_bio->devs[slot].devnum;
-	while ((rdev=rcu_dereference(conf->mirrors[disk].rdev)) == NULL ||
-	       r10_bio->devs[slot].bio == IO_BLOCKED ||
-	       !test_bit(In_sync, &rdev->flags)) {
-		slot ++;
-		if (slot == conf->copies) {
-			disk = -1;
-			goto rb_out;
-		}
+	for (slot = 0; slot < conf->copies ; slot++) {
+		if (r10_bio->devs[slot].bio == IO_BLOCKED)
+			continue;
 		disk = r10_bio->devs[slot].devnum;
-	}
-
-
-	current_distance = abs(r10_bio->devs[slot].addr -
-			       conf->mirrors[disk].head_position);
-
-	/* Find the disk whose head is closest,
-	 * or - for far > 1 - find the closest to partition beginning */
-
-	for (nslot = slot; nslot < conf->copies; nslot++) {
-		int ndisk = r10_bio->devs[nslot].devnum;
-
-
-		if ((rdev=rcu_dereference(conf->mirrors[ndisk].rdev)) == NULL ||
-		    r10_bio->devs[nslot].bio == IO_BLOCKED ||
-		    !test_bit(In_sync, &rdev->flags))
+		rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		if (rdev == NULL)
+			continue;
+		if (!test_bit(In_sync, &rdev->flags))
 			continue;
 
+		if (!do_balance)
+			break;
+
 		/* This optimisation is debatable, and completely destroys
 		 * sequential read speed for 'far copies' arrays.  So only
 		 * keep it for 'near' arrays, and review those later.
 		 */
-		if (conf->near_copies > 1 && !atomic_read(&rdev->nr_pending)) {
-			disk = ndisk;
-			slot = nslot;
+		if (conf->near_copies > 1 && !atomic_read(&rdev->nr_pending))
 			break;
-		}
 
 		/* for far > 1 always use the lowest address */
 		if (conf->far_copies > 1)
-			new_distance = r10_bio->devs[nslot].addr;
+			new_distance = r10_bio->devs[slot].addr;
 		else
-			new_distance = abs(r10_bio->devs[nslot].addr -
-					   conf->mirrors[ndisk].head_position);
-		if (new_distance < current_distance) {
-			current_distance = new_distance;
-			disk = ndisk;
-			slot = nslot;
+			new_distance = abs(r10_bio->devs[slot].addr -
+					   conf->mirrors[disk].head_position);
+		if (new_distance < best_dist) {
+			best_dist = new_distance;
+			best_slot = slot;
 		}
 	}
+	if (slot == conf->copies)
+		slot = best_slot;
 
-rb_out:
-	r10_bio->read_slot = slot;
-/*	conf->next_seq_sect = this_sector + sectors;*/
-
-	if (disk >= 0 && (rdev=rcu_dereference(conf->mirrors[disk].rdev))!= NULL)
-		atomic_inc(&conf->mirrors[disk].rdev->nr_pending);
-	else
+	if (slot >= 0) {
+		disk = r10_bio->devs[slot].devnum;
+		rdev = rcu_dereference(conf->mirrors[disk].rdev);
+		if (!rdev)
+			goto retry;
+		atomic_inc(&rdev->nr_pending);
+		if (test_bit(Faulty, &rdev->flags)) {
+			/* Cannot risk returning a device that failed
+			 * before we inc'ed nr_pending
+			 */
+			rdev_dec_pending(rdev, conf->mddev);
+			goto retry;
+		}
+		r10_bio->read_slot = slot;
+	} else
 		disk = -1;
 	rcu_read_unlock();
 



* [md PATCH 03/16] md/bitmap: fix saving of events_cleared and other state.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

If a bitmap is found to be 'stale' the events_cleared value
is set to match 'events'.
However if the array is degraded this does not get stored on disk.
This can subsequently lead to incorrect behaviour.

So change bitmap_update_sb to always update events_cleared in the
superblock from the known events_cleared.
For neatness also set ->state from ->flags.
This requires updating ->state whenever we update ->flags, which makes
sense anyway.

This is suitable for any active -stable release.

cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/bitmap.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 5c93627..70bd738 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -493,11 +493,11 @@ void bitmap_update_sb(struct bitmap *bitmap)
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	sb = kmap_atomic(bitmap->sb_page, KM_USER0);
 	sb->events = cpu_to_le64(bitmap->mddev->events);
-	if (bitmap->mddev->events < bitmap->events_cleared) {
+	if (bitmap->mddev->events < bitmap->events_cleared)
 		/* rocking back to read-only */
 		bitmap->events_cleared = bitmap->mddev->events;
-		sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
-	}
+	sb->events_cleared = cpu_to_le64(bitmap->events_cleared);
+	sb->state = cpu_to_le32(bitmap->flags);
 	/* Just in case these have been changed via sysfs: */
 	sb->daemon_sleep = cpu_to_le32(bitmap->mddev->bitmap_info.daemon_sleep/HZ);
 	sb->write_behind = cpu_to_le32(bitmap->mddev->bitmap_info.max_write_behind);
@@ -618,7 +618,7 @@ success:
 	if (le32_to_cpu(sb->version) == BITMAP_MAJOR_HOSTENDIAN)
 		bitmap->flags |= BITMAP_HOSTENDIAN;
 	bitmap->events_cleared = le64_to_cpu(sb->events_cleared);
-	if (sb->state & cpu_to_le32(BITMAP_STALE))
+	if (bitmap->flags & BITMAP_STALE)
 		bitmap->events_cleared = bitmap->mddev->events;
 	err = 0;
 out:
@@ -652,9 +652,11 @@ static int bitmap_mask_state(struct bitmap *bitmap, enum bitmap_state bits,
 	switch (op) {
 	case MASK_SET:
 		sb->state |= cpu_to_le32(bits);
+		bitmap->flags |= bits;
 		break;
 	case MASK_UNSET:
 		sb->state &= cpu_to_le32(~bits);
+		bitmap->flags &= ~bits;
 		break;
 	default:
 		BUG();



* [md PATCH 01/16] md: Fix race when creating a new md device.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

There is a race when creating an md device by opening /dev/mdXX.

If two processes do this at much the same time they will follow the
call path
  __blkdev_get -> get_gendisk -> kobj_lookup

The first will call
  -> md_probe -> md_alloc -> add_disk -> blk_register_region

and the race happens when the second gets to kobj_lookup after
add_disk has called blk_register_region but before it returns to
md_alloc.

In that case the second will not call md_probe (as the probe is
already done) but will get a handle on the gendisk and return to
__blkdev_get, which will then call md_open (via the ->open pointer).

As mddev->gendisk hasn't been set yet, md_open will think something is
wrong and return with -ERESTARTSYS.

This can loop endlessly while the first thread makes no progress
through add_disk.  Nothing is blocking it, but due to scheduler
behaviour it doesn't get a turn.
So this is essentially a live-lock.

We fix this by simply moving the assignment to mddev->gendisk before
the call to add_disk() so md_open doesn't get confused.
Also move blk_queue_flush earlier because add_disk should be as late
as possible.

To make sure that md_open doesn't complete until md_alloc has done all
that is needed, we take mddev->open_mutex during the last part of
md_alloc.  md_open will wait for this.
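
To illustrate the ordering, here is a paraphrased sketch of the
md_open side (not the exact source): the gendisk test catches the
transient window, and open_mutex makes the open wait for md_alloc.

static int md_open(struct block_device *bdev, fmode_t mode)
{
	mddev_t *mddev = mddev_find(bdev->bd_dev);
	int err;

	if (mddev->gendisk != bdev->bd_disk) {
		/* gendisk not published yet, or being discarded;
		 * ask the VFS to retry the whole open.
		 */
		mddev_put(mddev);
		return -ERESTARTSYS;
	}
	/* Blocks here until md_alloc releases open_mutex */
	err = mutex_lock_interruptible(&mddev->open_mutex);
	if (err)
		return err;
	atomic_inc(&mddev->openers);
	mutex_unlock(&mddev->open_mutex);
	return 0;
}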

This can cause a lock-up on boot so Cc:ing for stable.
For 2.6.36 and earlier a different patch will be needed as the
'blk_queue_flush' call isn't there.

Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: stable@kernel.org
---

 drivers/md/md.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7d6f7f1..4a4c0f8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4347,13 +4347,19 @@ static int md_alloc(dev_t dev, char *name)
 	disk->fops = &md_fops;
 	disk->private_data = mddev;
 	disk->queue = mddev->queue;
+	blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);
 	/* Allow extended partitions.  This makes the
 	 * 'mdp' device redundant, but we can't really
 	 * remove it now.
 	 */
 	disk->flags |= GENHD_FL_EXT_DEVT;
-	add_disk(disk);
 	mddev->gendisk = disk;
+	/* As soon as we call add_disk(), another thread could get
+	 * through to md_open, so make sure it doesn't get too far
+	 */
+	mutex_lock(&mddev->open_mutex);
+	add_disk(disk);
+
 	error = kobject_init_and_add(&mddev->kobj, &md_ktype,
 				     &disk_to_dev(disk)->kobj, "%s", "md");
 	if (error) {
@@ -4367,8 +4373,7 @@ static int md_alloc(dev_t dev, char *name)
 	if (mddev->kobj.sd &&
 	    sysfs_create_group(&mddev->kobj, &md_bitmap_group))
 		printk(KERN_DEBUG "pointless warning\n");
-
-	blk_queue_flush(mddev->queue, REQ_FLUSH | REQ_FUA);
+	mutex_unlock(&mddev->open_mutex);
  abort:
 	mutex_unlock(&disks_mutex);
 	if (!error && mddev->kobj.sd) {



* [md PATCH 02/16] md: reject a re-add request that cannot be honoured.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

The 'add_new_disk' ioctl can be used to add a device either as a
spare, or as an active disk that just needs to be resynced based on
write-intent-bitmap information (re-add).

Currently, if a re-add is requested but fails, we add the device as a
spare instead.  This makes it impossible for user-space to check for
failure.

So change this to require that a re-add attempt either succeeds or
fails completely.  User-space can then decide what to do next.
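
For illustration, a user-space caller might now look roughly like
this (a hedged sketch: the ioctl and flag names come from
linux/raid/md_u.h and md_p.h, everything else is simplified):

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <linux/raid/md_u.h>
#include <linux/raid/md_p.h>

/* Ask the kernel to re-add 'dev' as an in-sync member of the array
 * open on md_fd.  With this patch the request fails with EINVAL
 * instead of silently falling back to adding the device as a spare.
 */
static int try_re_add(int md_fd, dev_t dev, int raid_disk)
{
	mdu_disk_info_t info;

	memset(&info, 0, sizeof(info));
	info.major = major(dev);
	info.minor = minor(dev);
	info.raid_disk = raid_disk;
	info.state = (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC);

	if (ioctl(md_fd, ADD_NEW_DISK, &info) < 0) {
		perror("re-add rejected");
		return -1;	/* caller may fall back to a plain add */
	}
	return 0;
}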

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/md.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4a4c0f8..5469ae3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5216,6 +5216,16 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
 		} else
 			super_types[mddev->major_version].
 				validate_super(mddev, rdev);
+		if ((info->state & (1<<MD_DISK_SYNC)) &&
+		    (!test_bit(In_sync, &rdev->flags) ||
+		     rdev->raid_disk != info->raid_disk)) {
+			/* This was a hot-add request, but events doesn't
+			 * match, so reject it.
+			 */
+			export_rdev(rdev);
+			return -EINVAL;
+		}
+
 		if (test_bit(In_sync, &rdev->flags))
 			rdev->saved_raid_disk = rdev->raid_disk;
 		else



* [md PATCH 16/16] md: allow resync_start to be set while an array is active.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

The sysfs attribute 'resync_start' (known internally as recovery_cp)
records where a resync is up to.  A value of 0 means the array is
not known to be in-sync at all.  A value of MaxSector means the array
is believed to be fully in-sync.

When the size of member devices of an array (RAID1, RAID4/5/6) is
increased, the array can be increased to match.  This process sets
resync_start to the old end-of-device offset so that the new part of
the array gets resynced.

However with RAID1 (and RAID6) a resync is not technically necessary
and may be undesirable.  So it would be good if the implied resync
after the array is resized could be avoided.

So: change 'resync_start' so the value can be changed while the array
is active, and as a precaution only allow it to be changed while
resync/recovery is 'frozen'.  Changing it once resync has started is
not going to be useful anyway.

This allows the array to be resized without a resync by:
  write 'frozen' to 'sync_action'
  write new size to 'component_size' (this will set resync_start)
  write 'none' to 'resync_start'
  write 'idle' to 'sync_action'.
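
A rough user-space sketch of that sequence (the device name and the
size value are examples only; real code would check each write and
read the attributes back):

#include <stdio.h>

static int md_set(const char *attr, const char *val)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/md0/md/%s", attr);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	if (md_set("sync_action", "frozen"))
		return 1;
	/* new per-device size in KiB; example value */
	if (md_set("component_size", "976762368"))
		return 1;
	/* 'none' means MaxSector: skip the implied resync */
	if (md_set("resync_start", "none"))
		return 1;
	return md_set("sync_action", "idle") ? 1 : 0;
}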

Also slightly improve some tests on recovery_cp when resizing
raid1/raid5.  Now that an arbitrary value could be set we should be
more careful in our tests.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/md.c    |    2 +-
 drivers/md/raid1.c |    2 +-
 drivers/md/raid5.c |    3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5469ae3..aa640a8 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3324,7 +3324,7 @@ resync_start_store(mddev_t *mddev, const char *buf, size_t len)
 	char *e;
 	unsigned long long n = simple_strtoull(buf, &e, 10);
 
-	if (mddev->pers)
+	if (mddev->pers && !test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
 		return -EBUSY;
 	if (cmd_match(buf, "none"))
 		n = MaxSector;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 779abbd..5d09609 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2061,7 +2061,7 @@ static int raid1_resize(mddev_t *mddev, sector_t sectors)
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	revalidate_disk(mddev->gendisk);
 	if (sectors > mddev->dev_sectors &&
-	    mddev->recovery_cp == MaxSector) {
+	    mddev->recovery_cp > mddev->dev_sectors) {
 		mddev->recovery_cp = mddev->dev_sectors;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 51af7f3..34dd545 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5389,7 +5389,8 @@ static int raid5_resize(mddev_t *mddev, sector_t sectors)
 		return -EINVAL;
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	revalidate_disk(mddev->gendisk);
-	if (sectors > mddev->dev_sectors && mddev->recovery_cp == MaxSector) {
+	if (sectors > mddev->dev_sectors &&
+	    mddev->recovery_cp > mddev->dev_sectors) {
 		mddev->recovery_cp = mddev->dev_sectors;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	}



* [md PATCH 08/16] md/raid1: split out two sub-functions from sync_request_write
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

sync_request_write is too big and too deeply nested.
So split out two self-contained bits of functionality into separate
functions.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c |  365 +++++++++++++++++++++++++++-------------------------
 1 files changed, 192 insertions(+), 173 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f0b0c79..7fd7a4d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1181,194 +1181,213 @@ static void end_sync_write(struct bio *bio, int error)
 	}
 }
 
-static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
+static int fix_sync_read_error(r1bio_t *r1_bio)
 {
+	/* Try some synchronous reads of other devices to get
+	 * good data, much like with normal read errors.  Only
+	 * read into the pages we already have so we don't
+	 * need to re-issue the read request.
+	 * We don't need to freeze the array, because being in an
+	 * active sync request, there is no normal IO, and
+	 * no overlapping syncs.
+	 */
+	mddev_t *mddev = r1_bio->mddev;
 	conf_t *conf = mddev->private;
-	int i;
-	int disks = conf->raid_disks;
-	struct bio *bio, *wbio;
+	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
+	sector_t sect = r1_bio->sector;
+	int sectors = r1_bio->sectors;
+	int idx = 0;
 
-	bio = r1_bio->bios[r1_bio->read_disk];
-
-
-	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
-		/* We have read all readable devices.  If we haven't
-		 * got the block, then there is no hope left.
-		 * If we have, then we want to do a comparison
-		 * and skip the write if everything is the same.
-		 * If any blocks failed to read, then we need to
-		 * attempt an over-write
-		 */
-		int primary;
-		if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
-			for (i=0; i<mddev->raid_disks; i++)
-				if (r1_bio->bios[i]->bi_end_io == end_sync_read)
-					md_error(mddev, conf->mirrors[i].rdev);
+	while(sectors) {
+		int s = sectors;
+		int d = r1_bio->read_disk;
+		int success = 0;
+		mdk_rdev_t *rdev;
 
-			md_done_sync(mddev, r1_bio->sectors, 1);
+		if (s > (PAGE_SIZE>>9))
+			s = PAGE_SIZE >> 9;
+		do {
+			if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
+				/* No rcu protection needed here devices
+				 * can only be removed when no resync is
+				 * active, and resync is currently active
+				 */
+				rdev = conf->mirrors[d].rdev;
+				if (sync_page_io(rdev,
+						 sect,
+						 s<<9,
+						 bio->bi_io_vec[idx].bv_page,
+						 READ, false)) {
+					success = 1;
+					break;
+				}
+			}
+			d++;
+			if (d == conf->raid_disks)
+				d = 0;
+		} while (!success && d != r1_bio->read_disk);
+
+		if (success) {
+			int start = d;
+			/* write it back and re-read */
+			set_bit(R1BIO_Uptodate, &r1_bio->state);
+			while (d != r1_bio->read_disk) {
+				if (d == 0)
+					d = conf->raid_disks;
+				d--;
+				if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+					continue;
+				rdev = conf->mirrors[d].rdev;
+				atomic_add(s, &rdev->corrected_errors);
+				if (sync_page_io(rdev,
+						 sect,
+						 s<<9,
+						 bio->bi_io_vec[idx].bv_page,
+						 WRITE, false) == 0)
+					md_error(mddev, rdev);
+			}
+			d = start;
+			while (d != r1_bio->read_disk) {
+				if (d == 0)
+					d = conf->raid_disks;
+				d--;
+				if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+					continue;
+				rdev = conf->mirrors[d].rdev;
+				if (sync_page_io(rdev,
+						 sect,
+						 s<<9,
+						 bio->bi_io_vec[idx].bv_page,
+						 READ, false) == 0)
+					md_error(mddev, rdev);
+			}
+		} else {
+			char b[BDEVNAME_SIZE];
+			/* Cannot read from anywhere, array is toast */
+			md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
+			printk(KERN_ALERT "md/raid1:%s: %s: unrecoverable I/O read error"
+			       " for block %llu\n",
+			       mdname(mddev),
+			       bdevname(bio->bi_bdev, b),
+			       (unsigned long long)r1_bio->sector);
+			md_done_sync(mddev, r1_bio->sectors, 0);
 			put_buf(r1_bio);
-			return;
+			return 0;
 		}
-		for (primary=0; primary<mddev->raid_disks; primary++)
-			if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
-				r1_bio->bios[primary]->bi_end_io = NULL;
-				rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
-				break;
-			}
-		r1_bio->read_disk = primary;
+		sectors -= s;
+		sect += s;
+		idx ++;
+	}
+	return 1;
+}
+
+static int process_checks(r1bio_t *r1_bio)
+{
+	/* We have read all readable devices.  If we haven't
+	 * got the block, then there is no hope left.
+	 * If we have, then we want to do a comparison
+	 * and skip the write if everything is the same.
+	 * If any blocks failed to read, then we need to
+	 * attempt an over-write
+	 */
+	mddev_t *mddev = r1_bio->mddev;
+	conf_t *conf = mddev->private;
+	int primary;
+	int i;
+
+	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
-				int j;
-				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
-				struct bio *pbio = r1_bio->bios[primary];
-				struct bio *sbio = r1_bio->bios[i];
-
-				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
-					for (j = vcnt; j-- ; ) {
-						struct page *p, *s;
-						p = pbio->bi_io_vec[j].bv_page;
-						s = sbio->bi_io_vec[j].bv_page;
-						if (memcmp(page_address(p),
-							   page_address(s),
-							   PAGE_SIZE))
-							break;
-					}
-				} else
-					j = 0;
-				if (j >= 0)
-					mddev->resync_mismatches += r1_bio->sectors;
-				if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-					      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
-					sbio->bi_end_io = NULL;
-					rdev_dec_pending(conf->mirrors[i].rdev, mddev);
-				} else {
-					/* fixup the bio for reuse */
-					int size;
-					sbio->bi_vcnt = vcnt;
-					sbio->bi_size = r1_bio->sectors << 9;
-					sbio->bi_idx = 0;
-					sbio->bi_phys_segments = 0;
-					sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-					sbio->bi_flags |= 1 << BIO_UPTODATE;
-					sbio->bi_next = NULL;
-					sbio->bi_sector = r1_bio->sector +
-						conf->mirrors[i].rdev->data_offset;
-					sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-					size = sbio->bi_size;
-					for (j = 0; j < vcnt ; j++) {
-						struct bio_vec *bi;
-						bi = &sbio->bi_io_vec[j];
-						bi->bv_offset = 0;
-						if (size > PAGE_SIZE)
-							bi->bv_len = PAGE_SIZE;
-						else
-							bi->bv_len = size;
-						size -= PAGE_SIZE;
-						memcpy(page_address(bi->bv_page),
-						       page_address(pbio->bi_io_vec[j].bv_page),
-						       PAGE_SIZE);
-					}
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read)
+				md_error(mddev, conf->mirrors[i].rdev);
 
-				}
-			}
+		md_done_sync(mddev, r1_bio->sectors, 1);
+		put_buf(r1_bio);
+		return -1;
 	}
-	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
-		/* ouch - failed to read all of that.
-		 * Try some synchronous reads of other devices to get
-		 * good data, much like with normal read errors.  Only
-		 * read into the pages we already have so we don't
-		 * need to re-issue the read request.
-		 * We don't need to freeze the array, because being in an
-		 * active sync request, there is no normal IO, and
-		 * no overlapping syncs.
-		 */
-		sector_t sect = r1_bio->sector;
-		int sectors = r1_bio->sectors;
-		int idx = 0;
-
-		while(sectors) {
-			int s = sectors;
-			int d = r1_bio->read_disk;
-			int success = 0;
-			mdk_rdev_t *rdev;
-
-			if (s > (PAGE_SIZE>>9))
-				s = PAGE_SIZE >> 9;
-			do {
-				if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
-					/* No rcu protection needed here devices
-					 * can only be removed when no resync is
-					 * active, and resync is currently active
-					 */
-					rdev = conf->mirrors[d].rdev;
-					if (sync_page_io(rdev,
-							 sect,
-							 s<<9,
-							 bio->bi_io_vec[idx].bv_page,
-							 READ, false)) {
-						success = 1;
+	for (primary=0; primary<mddev->raid_disks; primary++)
+		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
+		    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
+			r1_bio->bios[primary]->bi_end_io = NULL;
+			rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
+			break;
+		}
+	r1_bio->read_disk = primary;
+	for (i=0; i<mddev->raid_disks; i++)
+		if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
+			int j;
+			int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
+			struct bio *pbio = r1_bio->bios[primary];
+			struct bio *sbio = r1_bio->bios[i];
+
+			if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+				for (j = vcnt; j-- ; ) {
+					struct page *p, *s;
+					p = pbio->bi_io_vec[j].bv_page;
+					s = sbio->bi_io_vec[j].bv_page;
+					if (memcmp(page_address(p),
+						   page_address(s),
+						   PAGE_SIZE))
 						break;
-					}
-				}
-				d++;
-				if (d == conf->raid_disks)
-					d = 0;
-			} while (!success && d != r1_bio->read_disk);
-
-			if (success) {
-				int start = d;
-				/* write it back and re-read */
-				set_bit(R1BIO_Uptodate, &r1_bio->state);
-				while (d != r1_bio->read_disk) {
-					if (d == 0)
-						d = conf->raid_disks;
-					d--;
-					if (r1_bio->bios[d]->bi_end_io != end_sync_read)
-						continue;
-					rdev = conf->mirrors[d].rdev;
-					atomic_add(s, &rdev->corrected_errors);
-					if (sync_page_io(rdev,
-							 sect,
-							 s<<9,
-							 bio->bi_io_vec[idx].bv_page,
-							 WRITE, false) == 0)
-						md_error(mddev, rdev);
-				}
-				d = start;
-				while (d != r1_bio->read_disk) {
-					if (d == 0)
-						d = conf->raid_disks;
-					d--;
-					if (r1_bio->bios[d]->bi_end_io != end_sync_read)
-						continue;
-					rdev = conf->mirrors[d].rdev;
-					if (sync_page_io(rdev,
-							 sect,
-							 s<<9,
-							 bio->bi_io_vec[idx].bv_page,
-							 READ, false) == 0)
-						md_error(mddev, rdev);
 				}
+			} else
+				j = 0;
+			if (j >= 0)
+				mddev->resync_mismatches += r1_bio->sectors;
+			if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
+				      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
+				sbio->bi_end_io = NULL;
+				rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 			} else {
-				char b[BDEVNAME_SIZE];
-				/* Cannot read from anywhere, array is toast */
-				md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
-				printk(KERN_ALERT "md/raid1:%s: %s: unrecoverable I/O read error"
-				       " for block %llu\n",
-				       mdname(mddev),
-				       bdevname(bio->bi_bdev, b),
-				       (unsigned long long)r1_bio->sector);
-				md_done_sync(mddev, r1_bio->sectors, 0);
-				put_buf(r1_bio);
-				return;
+				/* fixup the bio for reuse */
+				int size;
+				sbio->bi_vcnt = vcnt;
+				sbio->bi_size = r1_bio->sectors << 9;
+				sbio->bi_idx = 0;
+				sbio->bi_phys_segments = 0;
+				sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
+				sbio->bi_flags |= 1 << BIO_UPTODATE;
+				sbio->bi_next = NULL;
+				sbio->bi_sector = r1_bio->sector +
+					conf->mirrors[i].rdev->data_offset;
+				sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+				size = sbio->bi_size;
+				for (j = 0; j < vcnt ; j++) {
+					struct bio_vec *bi;
+					bi = &sbio->bi_io_vec[j];
+					bi->bv_offset = 0;
+					if (size > PAGE_SIZE)
+						bi->bv_len = PAGE_SIZE;
+					else
+						bi->bv_len = size;
+					size -= PAGE_SIZE;
+					memcpy(page_address(bi->bv_page),
+					       page_address(pbio->bi_io_vec[j].bv_page),
+					       PAGE_SIZE);
+				}
+
 			}
-			sectors -= s;
-			sect += s;
-			idx ++;
 		}
-	}
+	return 0;
+}
+
+static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
+{
+	conf_t *conf = mddev->private;
+	int i;
+	int disks = conf->raid_disks;
+	struct bio *bio, *wbio;
+
+	bio = r1_bio->bios[r1_bio->read_disk];
+
 
+	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+		if (process_checks(r1_bio) < 0)
+			return;
+
+	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
+		/* ouch - failed to read all of that. */
+		if (!fix_sync_read_error(r1_bio))
+			return;
 	/*
 	 * schedule writes
 	 */



* [md PATCH 14/16] md/raid10: remove unused variable.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

This variable 'disk' is never used - how odd.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid10.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 80cc7e6..f44a3da 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1742,7 +1742,6 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	r10bio_t *r10_bio;
 	struct bio *biolist = NULL, *bio;
 	sector_t max_sector, nr_sectors;
-	int disk;
 	int i;
 	int max_sync;
 	sector_t sync_blocks;
@@ -2020,7 +2019,6 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	do {
 		struct page *page;
 		int len = PAGE_SIZE;
-		disk = 0;
 		if (sector_nr + (len>>9) > max_sector)
 			len = (max_sector - sector_nr) << 9;
 		if (len == 0)
@@ -2039,7 +2037,6 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 				}
 				goto bio_full;
 			}
-			disk = i;
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;



* [md PATCH 10/16] md/raid1: try fix_sync_read_error before process_checks.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

If we get a read error during resync/recovery we currently repeat
with single-page reads to find out just where the error is, and
possibly read each page from a different device.

With check/repair we don't currently do that; we just fail.
However it is possible that while all devices fail on the large 64K
read, we might be able to satisfy each 4K from one device or another.

So call fix_sync_read_error before process_checks to maximise the
chance of finding good data and writing it out to the devices with
read errors.

For this to work, we need to set the 'uptodate' flags properly after
fix_sync_read_error has succeeded.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2b9e86c..b9d6da1 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1282,6 +1282,7 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 		idx ++;
 	}
 	set_bit(R1BIO_Uptodate, &r1_bio->state);
+	set_bit(BIO_UPTODATE, &bio->bi_flags);
 	return 1;
 }
 
@@ -1299,15 +1300,6 @@ static int process_checks(r1bio_t *r1_bio)
 	int primary;
 	int i;
 
-	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
-		for (i=0; i < conf->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read)
-				md_error(mddev, conf->mirrors[i].rdev);
-
-		md_done_sync(mddev, r1_bio->sectors, 1);
-		put_buf(r1_bio);
-		return -1;
-	}
 	for (primary = 0; primary < conf->raid_disks; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
 		    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
@@ -1385,15 +1377,14 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 
 	bio = r1_bio->bios[r1_bio->read_disk];
 
-
-	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
-		if (process_checks(r1_bio) < 0)
-			return;
-
 	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
 		/* ouch - failed to read all of that. */
 		if (!fix_sync_read_error(r1_bio))
 			return;
+
+	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+		if (process_checks(r1_bio) < 0)
+			return;
 	/*
 	 * schedule writes
 	 */



* [md PATCH 09/16] md/raid1: tidy up new functions: process_checks and fix_sync_read_error.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

These changes are mostly cosmetic:

1/ change mddev->raid_disks to conf->raid_disks because the latter is
   technically safer, though in current practice it doesn't matter in
   this particular context.
2/ Rearrange two for / if loops to have an early 'continue' so the
   body of the 'if' doesn't need to be indented so much.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c |  184 +++++++++++++++++++++++++++-------------------------
 1 files changed, 95 insertions(+), 89 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7fd7a4d..2b9e86c 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1203,6 +1203,7 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 		int d = r1_bio->read_disk;
 		int success = 0;
 		mdk_rdev_t *rdev;
+		int start;
 
 		if (s > (PAGE_SIZE>>9))
 			s = PAGE_SIZE >> 9;
@@ -1227,41 +1228,7 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 				d = 0;
 		} while (!success && d != r1_bio->read_disk);
 
-		if (success) {
-			int start = d;
-			/* write it back and re-read */
-			set_bit(R1BIO_Uptodate, &r1_bio->state);
-			while (d != r1_bio->read_disk) {
-				if (d == 0)
-					d = conf->raid_disks;
-				d--;
-				if (r1_bio->bios[d]->bi_end_io != end_sync_read)
-					continue;
-				rdev = conf->mirrors[d].rdev;
-				atomic_add(s, &rdev->corrected_errors);
-				if (sync_page_io(rdev,
-						 sect,
-						 s<<9,
-						 bio->bi_io_vec[idx].bv_page,
-						 WRITE, false) == 0)
-					md_error(mddev, rdev);
-			}
-			d = start;
-			while (d != r1_bio->read_disk) {
-				if (d == 0)
-					d = conf->raid_disks;
-				d--;
-				if (r1_bio->bios[d]->bi_end_io != end_sync_read)
-					continue;
-				rdev = conf->mirrors[d].rdev;
-				if (sync_page_io(rdev,
-						 sect,
-						 s<<9,
-						 bio->bi_io_vec[idx].bv_page,
-						 READ, false) == 0)
-					md_error(mddev, rdev);
-			}
-		} else {
+		if (!success) {
 			char b[BDEVNAME_SIZE];
 			/* Cannot read from anywhere, array is toast */
 			md_error(mddev, conf->mirrors[r1_bio->read_disk].rdev);
@@ -1274,10 +1241,47 @@ static int fix_sync_read_error(r1bio_t *r1_bio)
 			put_buf(r1_bio);
 			return 0;
 		}
+
+		start = d;
+		/* write it back and re-read */
+		while (d != r1_bio->read_disk) {
+			if (d == 0)
+				d = conf->raid_disks;
+			d--;
+			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+				continue;
+			rdev = conf->mirrors[d].rdev;
+			if (sync_page_io(rdev,
+					 sect,
+					 s<<9,
+					 bio->bi_io_vec[idx].bv_page,
+					 WRITE, false) == 0) {
+				r1_bio->bios[d]->bi_end_io = NULL;
+				rdev_dec_pending(rdev, mddev);
+				md_error(mddev, rdev);
+			} else
+				atomic_add(s, &rdev->corrected_errors);
+		}
+		d = start;
+		while (d != r1_bio->read_disk) {
+			if (d == 0)
+				d = conf->raid_disks;
+			d--;
+			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+				continue;
+			rdev = conf->mirrors[d].rdev;
+			if (sync_page_io(rdev,
+					 sect,
+					 s<<9,
+					 bio->bi_io_vec[idx].bv_page,
+					 READ, false) == 0)
+				md_error(mddev, rdev);
+		}
 		sectors -= s;
 		sect += s;
 		idx ++;
 	}
+	set_bit(R1BIO_Uptodate, &r1_bio->state);
 	return 1;
 }
 
@@ -1296,7 +1300,7 @@ static int process_checks(r1bio_t *r1_bio)
 	int i;
 
 	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
-		for (i=0; i<mddev->raid_disks; i++)
+		for (i=0; i < conf->raid_disks; i++)
 			if (r1_bio->bios[i]->bi_end_io == end_sync_read)
 				md_error(mddev, conf->mirrors[i].rdev);
 
@@ -1304,7 +1308,7 @@ static int process_checks(r1bio_t *r1_bio)
 		put_buf(r1_bio);
 		return -1;
 	}
-	for (primary=0; primary<mddev->raid_disks; primary++)
+	for (primary = 0; primary < conf->raid_disks; primary++)
 		if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
 		    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
 			r1_bio->bios[primary]->bi_end_io = NULL;
@@ -1312,61 +1316,63 @@ static int process_checks(r1bio_t *r1_bio)
 			break;
 		}
 	r1_bio->read_disk = primary;
-	for (i=0; i<mddev->raid_disks; i++)
-		if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
-			int j;
-			int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
-			struct bio *pbio = r1_bio->bios[primary];
-			struct bio *sbio = r1_bio->bios[i];
-
-			if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
-				for (j = vcnt; j-- ; ) {
-					struct page *p, *s;
-					p = pbio->bi_io_vec[j].bv_page;
-					s = sbio->bi_io_vec[j].bv_page;
-					if (memcmp(page_address(p),
-						   page_address(s),
-						   PAGE_SIZE))
-						break;
-				}
-			} else
-				j = 0;
-			if (j >= 0)
-				mddev->resync_mismatches += r1_bio->sectors;
-			if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-				      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
-				sbio->bi_end_io = NULL;
-				rdev_dec_pending(conf->mirrors[i].rdev, mddev);
-			} else {
-				/* fixup the bio for reuse */
-				int size;
-				sbio->bi_vcnt = vcnt;
-				sbio->bi_size = r1_bio->sectors << 9;
-				sbio->bi_idx = 0;
-				sbio->bi_phys_segments = 0;
-				sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
-				sbio->bi_flags |= 1 << BIO_UPTODATE;
-				sbio->bi_next = NULL;
-				sbio->bi_sector = r1_bio->sector +
-					conf->mirrors[i].rdev->data_offset;
-				sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-				size = sbio->bi_size;
-				for (j = 0; j < vcnt ; j++) {
-					struct bio_vec *bi;
-					bi = &sbio->bi_io_vec[j];
-					bi->bv_offset = 0;
-					if (size > PAGE_SIZE)
-						bi->bv_len = PAGE_SIZE;
-					else
-						bi->bv_len = size;
-					size -= PAGE_SIZE;
-					memcpy(page_address(bi->bv_page),
-					       page_address(pbio->bi_io_vec[j].bv_page),
-					       PAGE_SIZE);
-				}
+	for (i = 0; i < conf->raid_disks; i++) {
+		int j;
+		int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
+		struct bio *pbio = r1_bio->bios[primary];
+		struct bio *sbio = r1_bio->bios[i];
+		int size;
+
+		if (r1_bio->bios[i]->bi_end_io != end_sync_read)
+			continue;
 
+		if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+			for (j = vcnt; j-- ; ) {
+				struct page *p, *s;
+				p = pbio->bi_io_vec[j].bv_page;
+				s = sbio->bi_io_vec[j].bv_page;
+				if (memcmp(page_address(p),
+					   page_address(s),
+					   PAGE_SIZE))
+					break;
 			}
+		} else
+			j = 0;
+		if (j >= 0)
+			mddev->resync_mismatches += r1_bio->sectors;
+		if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
+			      && test_bit(BIO_UPTODATE, &sbio->bi_flags))) {
+			/* No need to write to this device. */
+			sbio->bi_end_io = NULL;
+			rdev_dec_pending(conf->mirrors[i].rdev, mddev);
+			continue;
 		}
+		/* fixup the bio for reuse */
+		sbio->bi_vcnt = vcnt;
+		sbio->bi_size = r1_bio->sectors << 9;
+		sbio->bi_idx = 0;
+		sbio->bi_phys_segments = 0;
+		sbio->bi_flags &= ~(BIO_POOL_MASK - 1);
+		sbio->bi_flags |= 1 << BIO_UPTODATE;
+		sbio->bi_next = NULL;
+		sbio->bi_sector = r1_bio->sector +
+			conf->mirrors[i].rdev->data_offset;
+		sbio->bi_bdev = conf->mirrors[i].rdev->bdev;
+		size = sbio->bi_size;
+		for (j = 0; j < vcnt ; j++) {
+			struct bio_vec *bi;
+			bi = &sbio->bi_io_vec[j];
+			bi->bv_offset = 0;
+			if (size > PAGE_SIZE)
+				bi->bv_len = PAGE_SIZE;
+			else
+				bi->bv_len = size;
+			size -= PAGE_SIZE;
+			memcpy(page_address(bi->bv_page),
+			       page_address(pbio->bi_io_vec[j].bv_page),
+			       PAGE_SIZE);
+		}
+	}
 	return 0;
 }
 



* [md PATCH 15/16] md/raid10: reformat some loops with less indenting.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

When a loop ends with an 'if' that has a large body, it is neater
to make the 'if' continue on the inverse condition, so the body
is indented one level less.

Apply this pattern 3 times, and wrap some other long lines.
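
The pattern in miniature (a generic illustration, not code from this
driver; 'usable' is a placeholder predicate):

#include <stdio.h>

static int usable(int v)
{
	return v > 0;
}

/* Before: the whole body hangs off the 'if', one level deeper */
static long sum_before(const int *v, int n)
{
	long sum = 0;
	int i;

	for (i = 0; i < n; i++)
		if (usable(v[i])) {
			sum += v[i];
		}
	return sum;
}

/* After: 'continue' on the inverse condition; same behaviour,
 * but the body no longer needs the extra indent.
 */
static long sum_after(const int *v, int n)
{
	long sum = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!usable(v[i]))
			continue;
		sum += v[i];
	}
	return sum;
}

int main(void)
{
	int v[] = { 3, -1, 4 };
	printf("%ld %ld\n", sum_before(v, 3), sum_after(v, 3));
	return 0;
}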

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid10.c |  228 +++++++++++++++++++++++++++------------------------
 1 files changed, 120 insertions(+), 108 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f44a3da..6e84668 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1736,7 +1736,8 @@ static int init_resync(conf_t *conf)
  *
  */
 
-static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, int go_faster)
+static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
+			     int *skipped, int go_faster)
 {
 	conf_t *conf = mddev->private;
 	r10bio_t *r10_bio;
@@ -1830,108 +1831,114 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		int j, k;
 		r10_bio = NULL;
 
-		for (i=0 ; i<conf->raid_disks; i++)
-			if (conf->mirrors[i].rdev &&
-			    !test_bit(In_sync, &conf->mirrors[i].rdev->flags)) {
-				int still_degraded = 0;
-				/* want to reconstruct this device */
-				r10bio_t *rb2 = r10_bio;
-				sector_t sect = raid10_find_virt(conf, sector_nr, i);
-				int must_sync;
-				/* Unless we are doing a full sync, we only need
-				 * to recover the block if it is set in the bitmap
-				 */
-				must_sync = bitmap_start_sync(mddev->bitmap, sect,
-							      &sync_blocks, 1);
-				if (sync_blocks < max_sync)
-					max_sync = sync_blocks;
-				if (!must_sync &&
-				    !conf->fullsync) {
-					/* yep, skip the sync_blocks here, but don't assume
-					 * that there will never be anything to do here
-					 */
-					chunks_skipped = -1;
-					continue;
-				}
+		for (i=0 ; i<conf->raid_disks; i++) {
+			int still_degraded;
+			r10bio_t *rb2;
+			sector_t sect;
+			int must_sync;
 
-				r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
-				raise_barrier(conf, rb2 != NULL);
-				atomic_set(&r10_bio->remaining, 0);
+			if (conf->mirrors[i].rdev == NULL ||
+			    test_bit(In_sync, &conf->mirrors[i].rdev->flags)) 
+				continue;
 
-				r10_bio->master_bio = (struct bio*)rb2;
-				if (rb2)
-					atomic_inc(&rb2->remaining);
-				r10_bio->mddev = mddev;
-				set_bit(R10BIO_IsRecover, &r10_bio->state);
-				r10_bio->sector = sect;
+			still_degraded = 0;
+			/* want to reconstruct this device */
+			rb2 = r10_bio;
+			sect = raid10_find_virt(conf, sector_nr, i);
+			/* Unless we are doing a full sync, we only need
+			 * to recover the block if it is set in the bitmap
+			 */
+			must_sync = bitmap_start_sync(mddev->bitmap, sect,
+						      &sync_blocks, 1);
+			if (sync_blocks < max_sync)
+				max_sync = sync_blocks;
+			if (!must_sync &&
+			    !conf->fullsync) {
+				/* yep, skip the sync_blocks here, but don't assume
+				 * that there will never be anything to do here
+				 */
+				chunks_skipped = -1;
+				continue;
+			}
 
-				raid10_find_phys(conf, r10_bio);
+			r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
+			raise_barrier(conf, rb2 != NULL);
+			atomic_set(&r10_bio->remaining, 0);
 
-				/* Need to check if the array will still be
-				 * degraded
-				 */
-				for (j=0; j<conf->raid_disks; j++)
-					if (conf->mirrors[j].rdev == NULL ||
-					    test_bit(Faulty, &conf->mirrors[j].rdev->flags)) {
-						still_degraded = 1;
-						break;
-					}
-
-				must_sync = bitmap_start_sync(mddev->bitmap, sect,
-							      &sync_blocks, still_degraded);
-
-				for (j=0; j<conf->copies;j++) {
-					int d = r10_bio->devs[j].devnum;
-					if (conf->mirrors[d].rdev &&
-					    test_bit(In_sync, &conf->mirrors[d].rdev->flags)) {
-						/* This is where we read from */
-						bio = r10_bio->devs[0].bio;
-						bio->bi_next = biolist;
-						biolist = bio;
-						bio->bi_private = r10_bio;
-						bio->bi_end_io = end_sync_read;
-						bio->bi_rw = READ;
-						bio->bi_sector = r10_bio->devs[j].addr +
-							conf->mirrors[d].rdev->data_offset;
-						bio->bi_bdev = conf->mirrors[d].rdev->bdev;
-						atomic_inc(&conf->mirrors[d].rdev->nr_pending);
-						atomic_inc(&r10_bio->remaining);
-						/* and we write to 'i' */
-
-						for (k=0; k<conf->copies; k++)
-							if (r10_bio->devs[k].devnum == i)
-								break;
-						BUG_ON(k == conf->copies);
-						bio = r10_bio->devs[1].bio;
-						bio->bi_next = biolist;
-						biolist = bio;
-						bio->bi_private = r10_bio;
-						bio->bi_end_io = end_sync_write;
-						bio->bi_rw = WRITE;
-						bio->bi_sector = r10_bio->devs[k].addr +
-							conf->mirrors[i].rdev->data_offset;
-						bio->bi_bdev = conf->mirrors[i].rdev->bdev;
-
-						r10_bio->devs[0].devnum = d;
-						r10_bio->devs[1].devnum = i;
+			r10_bio->master_bio = (struct bio*)rb2;
+			if (rb2)
+				atomic_inc(&rb2->remaining);
+			r10_bio->mddev = mddev;
+			set_bit(R10BIO_IsRecover, &r10_bio->state);
+			r10_bio->sector = sect;
 
-						break;
-					}
-				}
-				if (j == conf->copies) {
-					/* Cannot recover, so abort the recovery */
-					put_buf(r10_bio);
-					if (rb2)
-						atomic_dec(&rb2->remaining);
-					r10_bio = rb2;
-					if (!test_and_set_bit(MD_RECOVERY_INTR,
-							      &mddev->recovery))
-						printk(KERN_INFO "md/raid10:%s: insufficient "
-						       "working devices for recovery.\n",
-						       mdname(mddev));
+			raid10_find_phys(conf, r10_bio);
+
+			/* Need to check if the array will still be
+			 * degraded
+			 */
+			for (j=0; j<conf->raid_disks; j++)
+				if (conf->mirrors[j].rdev == NULL ||
+				    test_bit(Faulty, &conf->mirrors[j].rdev->flags)) {
+					still_degraded = 1;
 					break;
 				}
+
+			must_sync = bitmap_start_sync(mddev->bitmap, sect,
+						      &sync_blocks, still_degraded);
+
+			for (j=0; j<conf->copies;j++) {
+				int d = r10_bio->devs[j].devnum;
+				if (!conf->mirrors[d].rdev ||
+				    !test_bit(In_sync, &conf->mirrors[d].rdev->flags))
+					continue;
+				/* This is where we read from */
+				bio = r10_bio->devs[0].bio;
+				bio->bi_next = biolist;
+				biolist = bio;
+				bio->bi_private = r10_bio;
+				bio->bi_end_io = end_sync_read;
+				bio->bi_rw = READ;
+				bio->bi_sector = r10_bio->devs[j].addr +
+					conf->mirrors[d].rdev->data_offset;
+				bio->bi_bdev = conf->mirrors[d].rdev->bdev;
+				atomic_inc(&conf->mirrors[d].rdev->nr_pending);
+				atomic_inc(&r10_bio->remaining);
+				/* and we write to 'i' */
+
+				for (k=0; k<conf->copies; k++)
+					if (r10_bio->devs[k].devnum == i)
+						break;
+				BUG_ON(k == conf->copies);
+				bio = r10_bio->devs[1].bio;
+				bio->bi_next = biolist;
+				biolist = bio;
+				bio->bi_private = r10_bio;
+				bio->bi_end_io = end_sync_write;
+				bio->bi_rw = WRITE;
+				bio->bi_sector = r10_bio->devs[k].addr +
+					conf->mirrors[i].rdev->data_offset;
+				bio->bi_bdev = conf->mirrors[i].rdev->bdev;
+
+				r10_bio->devs[0].devnum = d;
+				r10_bio->devs[1].devnum = i;
+
+				break;
+			}
+			if (j == conf->copies) {
+				/* Cannot recover, so abort the recovery */
+				put_buf(r10_bio);
+				if (rb2)
+					atomic_dec(&rb2->remaining);
+				r10_bio = rb2;
+				if (!test_and_set_bit(MD_RECOVERY_INTR,
+						      &mddev->recovery))
+					printk(KERN_INFO "md/raid10:%s: insufficient "
+					       "working devices for recovery.\n",
+					       mdname(mddev));
+				break;
 			}
+		}
 		if (biolist == NULL) {
 			while (r10_bio) {
 				r10bio_t *rb2 = r10_bio;
@@ -1949,7 +1956,8 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 
 		if (!bitmap_start_sync(mddev->bitmap, sector_nr,
 				       &sync_blocks, mddev->degraded) &&
-		    !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+		    !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED,
+						 &mddev->recovery)) {
 			/* We can skip this block */
 			*skipped = 1;
 			return sync_blocks + sectors_skipped;
@@ -1994,7 +2002,8 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 			for (i=0; i<conf->copies; i++) {
 				int d = r10_bio->devs[i].devnum;
 				if (r10_bio->devs[i].bio->bi_end_io)
-					rdev_dec_pending(conf->mirrors[d].rdev, mddev);
+					rdev_dec_pending(conf->mirrors[d].rdev,
+							 mddev);
 			}
 			put_buf(r10_bio);
 			biolist = NULL;
@@ -2024,19 +2033,22 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		if (len == 0)
 			break;
 		for (bio= biolist ; bio ; bio=bio->bi_next) {
+			struct bio *bio2;
 			page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
-			if (bio_add_page(bio, page, len, 0) == 0) {
-				/* stop here */
-				struct bio *bio2;
-				bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
-				for (bio2 = biolist; bio2 && bio2 != bio; bio2 = bio2->bi_next) {
-					/* remove last page from this bio */
-					bio2->bi_vcnt--;
-					bio2->bi_size -= len;
-					bio2->bi_flags &= ~(1<< BIO_SEG_VALID);
-				}
-				goto bio_full;
+			if (bio_add_page(bio, page, len, 0))
+				continue;
+
+			/* stop here */
+			bio->bi_io_vec[bio->bi_vcnt].bv_page = page;
+			for (bio2 = biolist;
+			     bio2 && bio2 != bio;
+			     bio2 = bio2->bi_next) {
+				/* remove last page from this bio */
+				bio2->bi_vcnt--;
+				bio2->bi_size -= len;
+				bio2->bi_flags &= ~(1<< BIO_SEG_VALID);
 			}
+			goto bio_full;
 		}
 		nr_sectors += len>>9;
 		sector_nr += len>>9;



* [md PATCH 05/16] md/raid1: clean up read_balance.
From: NeilBrown @ 2011-05-11  6:30 UTC
  To: linux-raid

read_balance has two loops which both look for a 'best'
device based on slightly different criteria.
This is clumsy and makes it hard to add extra criteria.

So replace it all with a single loop that combines everything.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c |   83 +++++++++++++++++++++-------------------------------
 1 files changed, 34 insertions(+), 49 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 2b7a7ff..f0b0c79 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -411,10 +411,10 @@ static int read_balance(conf_t *conf, r1bio_t *r1_bio)
 {
 	const sector_t this_sector = r1_bio->sector;
 	const int sectors = r1_bio->sectors;
-	int new_disk = -1;
 	int start_disk;
+	int best_disk;
 	int i;
-	sector_t new_distance, current_distance;
+	sector_t best_dist;
 	mdk_rdev_t *rdev;
 	int choose_first;
 
@@ -425,6 +425,8 @@ static int read_balance(conf_t *conf, r1bio_t *r1_bio)
 	 * We take the first readable disk when above the resync window.
 	 */
  retry:
+	best_disk = -1;
+	best_dist = MaxSector;
 	if (conf->mddev->recovery_cp < MaxSector &&
 	    (this_sector + sectors >= conf->next_resync)) {
 		choose_first = 1;
@@ -434,8 +436,8 @@ static int read_balance(conf_t *conf, r1bio_t *r1_bio)
 		start_disk = conf->last_used;
 	}
 
-	/* make sure the disk is operational */
 	for (i = 0 ; i < conf->raid_disks ; i++) {
+		sector_t dist;
 		int disk = start_disk + i;
 		if (disk >= conf->raid_disks)
 			disk -= conf->raid_disks;
@@ -443,60 +445,43 @@ static int read_balance(conf_t *conf, r1bio_t *r1_bio)
 		rdev = rcu_dereference(conf->mirrors[disk].rdev);
 		if (r1_bio->bios[disk] == IO_BLOCKED
 		    || rdev == NULL
-		    || !test_bit(In_sync, &rdev->flags))
+		    || test_bit(Faulty, &rdev->flags))
 			continue;
-
-		new_disk = disk;
-		if (!test_bit(WriteMostly, &rdev->flags))
-			break;
-	}
-
-	if (new_disk < 0 || choose_first)
-		goto rb_out;
-
-	/*
-	 * Don't change to another disk for sequential reads:
-	 */
-	if (conf->next_seq_sect == this_sector)
-		goto rb_out;
-	if (this_sector == conf->mirrors[new_disk].head_position)
-		goto rb_out;
-
-	current_distance = abs(this_sector 
-			       - conf->mirrors[new_disk].head_position);
-
-	/* look for a better disk - i.e. head is closer */
-	start_disk = new_disk;
-	for (i = 1; i < conf->raid_disks; i++) {
-		int disk = start_disk + 1;
-		if (disk >= conf->raid_disks)
-			disk -= conf->raid_disks;
-
-		rdev = rcu_dereference(conf->mirrors[disk].rdev);
-		if (r1_bio->bios[disk] == IO_BLOCKED
-		    || rdev == NULL
-		    || !test_bit(In_sync, &rdev->flags)
-		    || test_bit(WriteMostly, &rdev->flags))
+		if (!test_bit(In_sync, &rdev->flags) &&
+		    rdev->recovery_offset < this_sector + sectors)
 			continue;
-
-		if (!atomic_read(&rdev->nr_pending)) {
-			new_disk = disk;
+		if (test_bit(WriteMostly, &rdev->flags)) {
+			/* Don't balance among write-mostly, just
+			 * use the first as a last resort */
+			if (best_disk < 0)
+				best_disk = disk;
+			continue;
+		}
+		/* This is a reasonable device to use.  It might
+		 * even be best.
+		 */
+		dist = abs(this_sector - conf->mirrors[disk].head_position);
+		if (choose_first
+		    /* Don't change to another disk for sequential reads */
+		    || conf->next_seq_sect == this_sector
+		    || dist == 0
+		    /* If device is idle, use it */
+		    || atomic_read(&rdev->nr_pending) == 0) {
+			best_disk = disk;
 			break;
 		}
-		new_distance = abs(this_sector - conf->mirrors[disk].head_position);
-		if (new_distance < current_distance) {
-			current_distance = new_distance;
-			new_disk = disk;
+		if (dist < best_dist) {
+			best_dist = dist;
+			best_disk = disk;
 		}
 	}
 
- rb_out:
-	if (new_disk >= 0) {
-		rdev = rcu_dereference(conf->mirrors[new_disk].rdev);
+	if (best_disk >= 0) {
+		rdev = rcu_dereference(conf->mirrors[best_disk].rdev);
 		if (!rdev)
 			goto retry;
 		atomic_inc(&rdev->nr_pending);
-		if (!test_bit(In_sync, &rdev->flags)) {
+		if (test_bit(Faulty, &rdev->flags)) {
 			/* cannot risk returning a device that failed
 			 * before we inc'ed nr_pending
 			 */
@@ -504,11 +489,11 @@ static int read_balance(conf_t *conf, r1bio_t *r1_bio)
 			goto retry;
 		}
 		conf->next_seq_sect = this_sector + sectors;
-		conf->last_used = new_disk;
+		conf->last_used = best_disk;
 	}
 	rcu_read_unlock();
 
-	return new_disk;
+	return best_disk;
 }
 
 static int raid1_congested(void *data, int bits)




* [md PATCH 12/16] md/raid10: some tidying up in fix_read_error
@ 2011-05-11  6:30 ` NeilBrown
From: NeilBrown @ 2011-05-11  6:30 UTC (permalink / raw)
  To: linux-raid

Currently the rdev on which a read error happened could be removed
before we perform the fix_read_error handling.  This requires
extra tests for NULL.

So delay the rdev_dec_pending call until after the call to
fix_read_error so that we can be sure that the rdev still exists.

This allows an 'if' clause to be removed, so the body can be
re-indented back one level.
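
The ordering argument generalises: drop the reference before error
handling and every user must re-check that the device still
exists; hold it across the handler and those checks disappear.  A
single-threaded C sketch of the pattern, with a toy 'struct dev'
in place of mdk_rdev_t and a plain int in place of the atomic
nr_pending count:

#include <stdio.h>

struct dev {
	int refs;
};

static void get_dev(struct dev *d)
{
	d->refs++;
}

static void put_dev(struct dev *d)
{
	if (--d->refs == 0)
		printf("refcount hit zero: device may be torn down\n");
}

/* The caller still holds a reference, so the device cannot
 * disappear under us and no NULL re-checks are needed here.
 */
static void fix_read_error(struct dev *d)
{
	printf("retrying read; device pinned (refs=%d)\n", d->refs);
}

int main(void)
{
	struct dev d = { .refs = 1 };	/* held by the array */

	get_dev(&d);		/* taken when the read was issued */
	fix_read_error(&d);	/* reference held across the handler */
	put_dev(&d);		/* only now may teardown proceed */

	put_dev(&d);		/* the array's own reference */
	return 0;
}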

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid10.c |   74 +++++++++++++++++++++++----------------------------
 1 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8ea0aca..8e4f469 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -271,9 +271,10 @@ static void raid10_end_read_request(struct bio *bio, int error)
 		 */
 		set_bit(R10BIO_Uptodate, &r10_bio->state);
 		raid_end_bio_io(r10_bio);
+		rdev_dec_pending(conf->mirrors[dev].rdev, conf->mddev);
 	} else {
 		/*
-		 * oops, read error:
+		 * oops, read error - keep the refcount on the rdev
 		 */
 		char b[BDEVNAME_SIZE];
 		if (printk_ratelimit())
@@ -282,8 +283,6 @@ static void raid10_end_read_request(struct bio *bio, int error)
 			       bdevname(conf->mirrors[dev].rdev->bdev,b), (unsigned long long)r10_bio->sector);
 		reschedule_retry(r10_bio);
 	}
-
-	rdev_dec_pending(conf->mirrors[dev].rdev, conf->mddev);
 }
 
 static void raid10_end_write_request(struct bio *bio, int error)
@@ -1438,40 +1437,33 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 	int max_read_errors = atomic_read(&mddev->max_corr_read_errors);
 	int d = r10_bio->devs[r10_bio->read_slot].devnum;
 
-	rcu_read_lock();
-	rdev = rcu_dereference(conf->mirrors[d].rdev);
-	if (rdev) { /* If rdev is not NULL */
-		char b[BDEVNAME_SIZE];
-		int cur_read_error_count = 0;
+	/* still own a reference to this rdev, so it cannot
+	 * have been cleared recently.
+	 */
+	rdev = conf->mirrors[d].rdev;
 
-		bdevname(rdev->bdev, b);
+	if (test_bit(Faulty, &rdev->flags))
+		/* drive has already been failed, just ignore any
+		   more fix_read_error() attempts */
+		return;
 
-		if (test_bit(Faulty, &rdev->flags)) {
-			rcu_read_unlock();
-			/* drive has already been failed, just ignore any
-			   more fix_read_error() attempts */
-			return;
-		}
+	check_decay_read_errors(mddev, rdev);
+	atomic_inc(&rdev->read_errors);
+	if (atomic_read(&rdev->read_errors) > max_read_errors) {
+		char b[BDEVNAME_SIZE];
+		bdevname(rdev->bdev, b);
 
-		check_decay_read_errors(mddev, rdev);
-		atomic_inc(&rdev->read_errors);
-		cur_read_error_count = atomic_read(&rdev->read_errors);
-		if (cur_read_error_count > max_read_errors) {
-			rcu_read_unlock();
-			printk(KERN_NOTICE
-			       "md/raid10:%s: %s: Raid device exceeded "
-			       "read_error threshold "
-			       "[cur %d:max %d]\n",
-			       mdname(mddev),
-			       b, cur_read_error_count, max_read_errors);
-			printk(KERN_NOTICE
-			       "md/raid10:%s: %s: Failing raid "
-			       "device\n", mdname(mddev), b);
-			md_error(mddev, conf->mirrors[d].rdev);
-			return;
-		}
+		printk(KERN_NOTICE
+		       "md/raid10:%s: %s: Raid device exceeded "
+		       "read_error threshold [cur %d:max %d]\n",
+		       mdname(mddev), b,
+		       atomic_read(&rdev->read_errors), max_read_errors);
+		printk(KERN_NOTICE
+		       "md/raid10:%s: %s: Failing raid device\n",
+		       mdname(mddev), b);
+		md_error(mddev, conf->mirrors[d].rdev);
+		return;
 	}
-	rcu_read_unlock();
 
 	while(sectors) {
 		int s = sectors;
@@ -1540,8 +1532,8 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 					       "write failed"
 					       " (%d sectors at %llu on %s)\n",
 					       mdname(mddev), s,
-					       (unsigned long long)(sect+
-					       rdev->data_offset),
+					       (unsigned long long)(
+						       sect + rdev->data_offset),
 					       bdevname(rdev->bdev, b));
 					printk(KERN_NOTICE "md/raid10:%s: %s: failing "
 					       "drive\n",
@@ -1577,8 +1569,8 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 					       "corrected sectors"
 					       " (%d sectors at %llu on %s)\n",
 					       mdname(mddev), s,
-					       (unsigned long long)(sect+
-						    rdev->data_offset),
+					       (unsigned long long)(
+						       sect + rdev->data_offset),
 					       bdevname(rdev->bdev, b));
 					printk(KERN_NOTICE "md/raid10:%s: %s: failing drive\n",
 					       mdname(mddev),
@@ -1590,8 +1582,8 @@ static void fix_read_error(conf_t *conf, mddev_t *mddev, r10bio_t *r10_bio)
 					       "md/raid10:%s: read error corrected"
 					       " (%d sectors at %llu on %s)\n",
 					       mdname(mddev), s,
-					       (unsigned long long)(sect+
-					            rdev->data_offset),
+					       (unsigned long long)(
+						       sect + rdev->data_offset),
 					       bdevname(rdev->bdev, b));
 				}
 
@@ -1641,7 +1633,8 @@ static void raid10d(mddev_t *mddev)
 		else if (test_bit(R10BIO_IsRecover, &r10_bio->state))
 			recovery_request_write(mddev, r10_bio);
 		else {
-			int mirror;
+			int slot = r10_bio->read_slot;
+			int mirror = r10_bio->devs[slot].devnum;
 			/* we got a read error. Maybe the drive is bad.  Maybe just
 			 * the block and we can fix it.
 			 * We freeze all other IO, and try reading the block from
@@ -1655,6 +1648,7 @@ static void raid10d(mddev_t *mddev)
 				fix_read_error(conf, mddev, r10_bio);
 				unfreeze_array(conf);
 			}
+			rdev_dec_pending(conf->mirrors[mirror].rdev, mddev);
 
 			bio = r10_bio->devs[r10_bio->read_slot].bio;
 			r10_bio->devs[r10_bio->read_slot].bio =




* [md PATCH 11/16] md/raid1: improve handling of pages allocated for write-behind.
@ 2011-05-11  6:30 ` NeilBrown
From: NeilBrown @ 2011-05-11  6:30 UTC (permalink / raw)
  To: linux-raid

The current handling and freeing of these pages is a bit fragile.
We only keep the list of allocated pages in each bio, so we still
need a valid bio when freeing the pages, which is clumsy.

So simply store the allocated page list in the r1_bio so it can easily
be found and freed when we are finished with the r1_bio.
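
The ownership pattern, reduced to a self-contained C sketch: the
duplicate buffers hang off the request object itself, so teardown
never depends on any particular bio staying alive.  'struct
request' is a toy stand-in for r1bio_t, and malloc'd buffers stand
in for struct page:

#include <stdio.h>
#include <stdlib.h>

struct request {
	void **behind_pages;	/* owned by the request itself */
	int behind_page_count;
};

/* Allocate the duplicate pages and attach them to the request. */
static int alloc_behind_pages(struct request *req, int count,
			      size_t sz)
{
	void **pages = calloc(count, sizeof(*pages));
	if (!pages)
		return -1;
	for (int i = 0; i < count; i++) {
		pages[i] = malloc(sz);
		if (!pages[i])
			goto fail;
	}
	req->behind_pages = pages;
	req->behind_page_count = count;
	return 0;
fail:
	for (int i = 0; i < count; i++)
		free(pages[i]);	/* free(NULL) is harmless */
	free(pages);
	return -1;
}

/* Teardown only needs the request, not a bio. */
static void request_done(struct request *req)
{
	for (int i = 0; i < req->behind_page_count; i++)
		free(req->behind_pages[i]);
	free(req->behind_pages);
	req->behind_pages = NULL;
	req->behind_page_count = 0;
}

int main(void)
{
	struct request req = { 0 };

	if (alloc_behind_pages(&req, 4, 4096) == 0)
		printf("attached %d pages to the request\n",
		       req.behind_page_count);
	request_done(&req);
	return 0;
}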

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c |   55 +++++++++++++++++++++++++---------------------------
 drivers/md/raid1.h |    4 +++-
 2 files changed, 29 insertions(+), 30 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b9d6da1..779abbd 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -297,23 +297,24 @@ static void raid1_end_read_request(struct bio *bio, int error)
 	rdev_dec_pending(conf->mirrors[mirror].rdev, conf->mddev);
 }
 
-static void r1_bio_write_done(r1bio_t *r1_bio, int vcnt, struct bio_vec *bv,
-			      int behind)
+static void r1_bio_write_done(r1bio_t *r1_bio)
 {
 	if (atomic_dec_and_test(&r1_bio->remaining))
 	{
 		/* it really is the end of this request */
 		if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
 			/* free extra copy of the data pages */
-			int i = vcnt;
+			int i = r1_bio->behind_page_count;
 			while (i--)
-				safe_put_page(bv[i].bv_page);
+				safe_put_page(r1_bio->behind_pages[i]);
+			kfree(r1_bio->behind_pages);
+			r1_bio->behind_pages = NULL;
 		}
 		/* clear the bitmap if all writes complete successfully */
 		bitmap_endwrite(r1_bio->mddev->bitmap, r1_bio->sector,
 				r1_bio->sectors,
 				!test_bit(R1BIO_Degraded, &r1_bio->state),
-				behind);
+				test_bit(R1BIO_BehindIO, &r1_bio->state));
 		md_write_end(r1_bio->mddev);
 		raid_end_bio_io(r1_bio);
 	}
@@ -386,7 +387,7 @@ static void raid1_end_write_request(struct bio *bio, int error)
 	 * Let's see if all mirrored write operations have finished
 	 * already.
 	 */
-	r1_bio_write_done(r1_bio, bio->bi_vcnt, bio->bi_io_vec, behind);
+	r1_bio_write_done(r1_bio);
 
 	if (to_put)
 		bio_put(to_put);
@@ -660,37 +661,36 @@ static void unfreeze_array(conf_t *conf)
 
 
 /* duplicate the data pages for behind I/O 
- * We return a list of bio_vec rather than just page pointers
- * as it makes freeing easier
  */
-static struct bio_vec *alloc_behind_pages(struct bio *bio)
+static void alloc_behind_pages(struct bio *bio, r1bio_t *r1_bio)
 {
 	int i;
 	struct bio_vec *bvec;
-	struct bio_vec *pages = kzalloc(bio->bi_vcnt * sizeof(struct bio_vec),
+	struct page **pages = kzalloc(bio->bi_vcnt * sizeof(struct page*),
 					GFP_NOIO);
 	if (unlikely(!pages))
-		goto do_sync_io;
+		return;
 
 	bio_for_each_segment(bvec, bio, i) {
-		pages[i].bv_page = alloc_page(GFP_NOIO);
-		if (unlikely(!pages[i].bv_page))
+		pages[i] = alloc_page(GFP_NOIO);
+		if (unlikely(!pages[i]))
 			goto do_sync_io;
-		memcpy(kmap(pages[i].bv_page) + bvec->bv_offset,
+		memcpy(kmap(pages[i]) + bvec->bv_offset,
 			kmap(bvec->bv_page) + bvec->bv_offset, bvec->bv_len);
-		kunmap(pages[i].bv_page);
+		kunmap(pages[i]);
 		kunmap(bvec->bv_page);
 	}
-
-	return pages;
+	r1_bio->behind_pages = pages;
+	r1_bio->behind_page_count = bio->bi_vcnt;
+	set_bit(R1BIO_BehindIO, &r1_bio->state);
+	return;
 
 do_sync_io:
-	if (pages)
-		for (i = 0; i < bio->bi_vcnt && pages[i].bv_page; i++)
-			put_page(pages[i].bv_page);
+	for (i = 0; i < bio->bi_vcnt; i++)
+		if (pages[i])
+			put_page(pages[i]);
 	kfree(pages);
 	PRINTK("%dB behind alloc failed, doing sync I/O\n", bio->bi_size);
-	return NULL;
 }
 
 static int make_request(mddev_t *mddev, struct bio * bio)
@@ -702,7 +702,6 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 	int i, targets = 0, disks;
 	struct bitmap *bitmap;
 	unsigned long flags;
-	struct bio_vec *behind_pages = NULL;
 	const int rw = bio_data_dir(bio);
 	const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
 	const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
@@ -855,9 +854,8 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 	if (bitmap &&
 	    (atomic_read(&bitmap->behind_writes)
 	     < mddev->bitmap_info.max_write_behind) &&
-	    !waitqueue_active(&bitmap->behind_wait) &&
-	    (behind_pages = alloc_behind_pages(bio)) != NULL)
-		set_bit(R1BIO_BehindIO, &r1_bio->state);
+	    !waitqueue_active(&bitmap->behind_wait))
+		alloc_behind_pages(bio, r1_bio);
 
 	atomic_set(&r1_bio->remaining, 1);
 	atomic_set(&r1_bio->behind_remaining, 0);
@@ -878,7 +876,7 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		mbio->bi_rw = WRITE | do_flush_fua | do_sync;
 		mbio->bi_private = r1_bio;
 
-		if (behind_pages) {
+		if (r1_bio->behind_pages) {
 			struct bio_vec *bvec;
 			int j;
 
@@ -890,7 +888,7 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 			 * them all
 			 */
 			__bio_for_each_segment(bvec, mbio, j, 0)
-				bvec->bv_page = behind_pages[j].bv_page;
+				bvec->bv_page = r1_bio->behind_pages[j];
 			if (test_bit(WriteMostly, &conf->mirrors[i].rdev->flags))
 				atomic_inc(&r1_bio->behind_remaining);
 		}
@@ -900,8 +898,7 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		bio_list_add(&conf->pending_bio_list, mbio);
 		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
-	r1_bio_write_done(r1_bio, bio->bi_vcnt, behind_pages, behind_pages != NULL);
-	kfree(behind_pages); /* the behind pages are attached to the bios now */
+	r1_bio_write_done(r1_bio);
 
 	/* In case raid1d snuck in to freeze_array */
 	wake_up(&conf->wait_barrier);
diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
index cbfdf1a..5fc4ca1 100644
--- a/drivers/md/raid1.h
+++ b/drivers/md/raid1.h
@@ -94,7 +94,9 @@ struct r1bio_s {
 	int			read_disk;
 
 	struct list_head	retry_list;
-	struct bitmap_update	*bitmap_update;
+	/* Next two are only valid when R1BIO_BehindIO is set */
+	struct page		**behind_pages;
+	int			behind_page_count;
 	/*
 	 * if the IO is in WRITE direction, then multiple bios are used.
 	 * We choose the number when they are allocated.




* [md PATCH 07/16] md: make error_handler functions more uniform and correct.
@ 2011-05-11  6:30 ` NeilBrown
From: NeilBrown @ 2011-05-11  6:30 UTC (permalink / raw)
  To: linux-raid

- there is no need to test_bit Faulty, as that was already done in
  md_error, which is the only caller of these functions.
- MD_CHANGE_DEVS should be set *after* Faulty is set, to ensure
  metadata is updated correctly.
- the spinlock should be held while updating ->degraded (see the
  sketch below for the resulting ordering).
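
A compile-able toy version of that ordering, with a pthread mutex
in place of conf->device_lock and plain ints in place of the
atomic flag bits (so the test_and_clear_bit atomicity of the real
code is not reproduced):

#include <pthread.h>
#include <stdio.h>

struct array {
	pthread_mutex_t device_lock;
	int degraded;
	int change_devs;	/* like MD_CHANGE_DEVS */
};

struct disk {
	int in_sync;
	int faulty;
};

static void error_handler(struct array *a, struct disk *d)
{
	/* no Faulty test here: md_error, the only caller,
	 * already performed it
	 */
	if (d->in_sync) {
		d->in_sync = 0;
		pthread_mutex_lock(&a->device_lock);
		a->degraded++;	/* ->degraded only changes under lock */
		pthread_mutex_unlock(&a->device_lock);
	}
	d->faulty = 1;		/* fail the device first ...            */
	a->change_devs = 1;	/* ... then request the metadata update */
}

int main(void)
{
	struct array a = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };
	struct disk d = { .in_sync = 1 };

	error_handler(&a, &d);
	printf("degraded=%d faulty=%d change_devs=%d\n",
	       a.degraded, d.faulty, a.change_devs);
	return 0;
}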

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/multipath.c |   40 ++++++++++++++++++++++------------------
 drivers/md/raid5.c     |   38 ++++++++++++++++++--------------------
 2 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 0254712..3535c23 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -186,6 +186,7 @@ static int multipath_congested(void *data, int bits)
 static void multipath_error (mddev_t *mddev, mdk_rdev_t *rdev)
 {
 	multipath_conf_t *conf = mddev->private;
+	char b[BDEVNAME_SIZE];
 
 	if (conf->raid_disks - mddev->degraded <= 1) {
 		/*
@@ -194,26 +195,27 @@ static void multipath_error (mddev_t *mddev, mdk_rdev_t *rdev)
 		 * which has just failed.
 		 */
 		printk(KERN_ALERT 
-			"multipath: only one IO path left and IO error.\n");
+		       "multipath: only one IO path left and IO error.\n");
 		/* leave it active... it's all we have */
-	} else {
-		/*
-		 * Mark disk as unusable
-		 */
-		if (!test_bit(Faulty, &rdev->flags)) {
-			char b[BDEVNAME_SIZE];
-			clear_bit(In_sync, &rdev->flags);
-			set_bit(Faulty, &rdev->flags);
-			set_bit(MD_CHANGE_DEVS, &mddev->flags);
-			mddev->degraded++;
-			printk(KERN_ALERT "multipath: IO failure on %s,"
-				" disabling IO path.\n"
-				"multipath: Operation continuing"
-				" on %d IO paths.\n",
-				bdevname (rdev->bdev,b),
-				conf->raid_disks - mddev->degraded);
-		}
+		return;
+	}
+	/*
+	 * Mark disk as unusable
+	 */
+	if (test_and_clear_bit(In_sync, &rdev->flags)) {
+		unsigned long flags;
+		spin_lock_irqsave(&conf->device_lock, flags);
+		mddev->degraded++;
+		spin_unlock_irqrestore(&conf->device_lock, flags);
 	}
+	set_bit(Faulty, &rdev->flags);
+	set_bit(MD_CHANGE_DEVS, &mddev->flags);
+	printk(KERN_ALERT "multipath: IO failure on %s,"
+	       " disabling IO path.\n"
+	       "multipath: Operation continuing"
+	       " on %d IO paths.\n",
+	       bdevname(rdev->bdev, b),
+	       conf->raid_disks - mddev->degraded);
 }
 
 static void print_multipath_conf (multipath_conf_t *conf)
@@ -273,9 +275,11 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 							   PAGE_CACHE_SIZE - 1);
 			}
 
+			spin_lock_irq(&conf->device_lock);
 			mddev->degraded--;
 			rdev->raid_disk = path;
 			set_bit(In_sync, &rdev->flags);
+			spin_unlock_irq(&conf->device_lock);
 			rcu_assign_pointer(p->rdev, rdev);
 			err = 0;
 			md_integrity_add_rdev(rdev, mddev);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 49bf5f8..51af7f3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1700,27 +1700,25 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 	raid5_conf_t *conf = mddev->private;
 	pr_debug("raid456: error called\n");
 
-	if (!test_bit(Faulty, &rdev->flags)) {
-		set_bit(MD_CHANGE_DEVS, &mddev->flags);
-		if (test_and_clear_bit(In_sync, &rdev->flags)) {
-			unsigned long flags;
-			spin_lock_irqsave(&conf->device_lock, flags);
-			mddev->degraded++;
-			spin_unlock_irqrestore(&conf->device_lock, flags);
-			/*
-			 * if recovery was running, make sure it aborts.
-			 */
-			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-		}
-		set_bit(Faulty, &rdev->flags);
-		printk(KERN_ALERT
-		       "md/raid:%s: Disk failure on %s, disabling device.\n"
-		       "md/raid:%s: Operation continuing on %d devices.\n",
-		       mdname(mddev),
-		       bdevname(rdev->bdev, b),
-		       mdname(mddev),
-		       conf->raid_disks - mddev->degraded);
+	if (test_and_clear_bit(In_sync, &rdev->flags)) {
+		unsigned long flags;
+		spin_lock_irqsave(&conf->device_lock, flags);
+		mddev->degraded++;
+		spin_unlock_irqrestore(&conf->device_lock, flags);
+		/*
+		 * if recovery was running, make sure it aborts.
+		 */
+		set_bit(MD_RECOVERY_INTR, &mddev->recovery);
 	}
+	set_bit(Faulty, &rdev->flags);
+	set_bit(MD_CHANGE_DEVS, &mddev->flags);
+	printk(KERN_ALERT
+	       "md/raid:%s: Disk failure on %s, disabling device.\n"
+	       "md/raid:%s: Operation continuing on %d devices.\n",
+	       mdname(mddev),
+	       bdevname(rdev->bdev, b),
+	       mdname(mddev),
+	       conf->raid_disks - mddev->degraded);
 }
 
 /*




* [md PATCH 13/16] md/raid10: make more use of 'slot' in raid10d.
@ 2011-05-11  6:30 ` NeilBrown
From: NeilBrown @ 2011-05-11  6:30 UTC (permalink / raw)
  To: linux-raid

Now that we have a 'slot' variable, make better use of it to simplify
some code a little.
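
The pattern is just hoisting a repeated index into a local and
refreshing it where it can change.  A trivial stand-alone C sketch
('struct r10bio' here is a toy, not the real type):

#include <stdio.h>

struct r10bio {
	int read_slot;
};

/* stands in for read_balance() picking a new device */
static void rebalance(struct r10bio *rb)
{
	rb->read_slot = 1;
}

int main(void)
{
	struct r10bio rb = { .read_slot = 0 };

	int slot = rb.read_slot;	/* read once up front */
	printf("failed read was on slot %d\n", slot);

	rebalance(&rb);			/* read_slot may change here */
	slot = rb.read_slot;		/* refresh the cached copy */
	printf("retrying on slot %d\n", slot);
	return 0;
}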

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid10.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 8e4f469..80cc7e6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1650,8 +1650,8 @@ static void raid10d(mddev_t *mddev)
 			}
 			rdev_dec_pending(conf->mirrors[mirror].rdev, mddev);
 
-			bio = r10_bio->devs[r10_bio->read_slot].bio;
-			r10_bio->devs[r10_bio->read_slot].bio =
+			bio = r10_bio->devs[slot].bio;
+			r10_bio->devs[slot].bio =
 				mddev->ro ? IO_BLOCKED : NULL;
 			mirror = read_balance(conf, r10_bio);
 			if (mirror == -1) {
@@ -1665,6 +1665,7 @@ static void raid10d(mddev_t *mddev)
 			} else {
 				const unsigned long do_sync = (r10_bio->master_bio->bi_rw & REQ_SYNC);
 				bio_put(bio);
+				slot = r10_bio->read_slot;
 				rdev = conf->mirrors[mirror].rdev;
 				if (printk_ratelimit())
 					printk(KERN_ERR "md/raid10:%s: %s: redirecting sector %llu to"
@@ -1674,8 +1675,8 @@ static void raid10d(mddev_t *mddev)
 					       (unsigned long long)r10_bio->sector);
 				bio = bio_clone_mddev(r10_bio->master_bio,
 						      GFP_NOIO, mddev);
-				r10_bio->devs[r10_bio->read_slot].bio = bio;
-				bio->bi_sector = r10_bio->devs[r10_bio->read_slot].addr
+				r10_bio->devs[slot].bio = bio;
+				bio->bi_sector = r10_bio->devs[slot].addr
 					+ rdev->data_offset;
 				bio->bi_bdev = rdev->bdev;
 				bio->bi_rw = READ | do_sync;




* [md PATCH 06/16] md/multipath: discard ->working_disks in favour of ->degraded
@ 2011-05-11  6:30 ` NeilBrown
From: NeilBrown @ 2011-05-11  6:30 UTC (permalink / raw)
  To: linux-raid

conf->working_disks duplicates information already available
in mddev->degraded.
So remove working_disks.
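
In miniature, the change replaces a counter that has to be kept in
lock-step with another field by a value derived on demand.  A toy
C sketch (field names follow the md ones, but this is illustrative
only):

#include <stdio.h>

struct array {
	int raid_disks;
	int degraded;
};

/* derive the count instead of maintaining a duplicate */
static int working_disks(const struct array *a)
{
	return a->raid_disks - a->degraded;
}

int main(void)
{
	struct array a = { .raid_disks = 4, .degraded = 1 };

	printf("[%d/%d] paths operational\n",
	       a.raid_disks, working_disks(&a));

	a.degraded++;	/* one update, nothing to keep in sync */
	printf("[%d/%d] paths operational\n",
	       a.raid_disks, working_disks(&a));
	return 0;
}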

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/multipath.c |   22 +++++++++++-----------
 drivers/md/multipath.h |    1 -
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index c358909..0254712 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -146,7 +146,7 @@ static void multipath_status (struct seq_file *seq, mddev_t *mddev)
 	int i;
 	
 	seq_printf (seq, " [%d/%d] [", conf->raid_disks,
-						 conf->working_disks);
+		    conf->raid_disks - mddev->degraded);
 	for (i = 0; i < conf->raid_disks; i++)
 		seq_printf (seq, "%s",
 			       conf->multipaths[i].rdev && 
@@ -187,7 +187,7 @@ static void multipath_error (mddev_t *mddev, mdk_rdev_t *rdev)
 {
 	multipath_conf_t *conf = mddev->private;
 
-	if (conf->working_disks <= 1) {
+	if (conf->raid_disks - mddev->degraded <= 1) {
 		/*
 		 * Uh oh, we can do nothing if this is our last path, but
 		 * first check if this is a queued request for a device
@@ -205,14 +205,13 @@ static void multipath_error (mddev_t *mddev, mdk_rdev_t *rdev)
 			clear_bit(In_sync, &rdev->flags);
 			set_bit(Faulty, &rdev->flags);
 			set_bit(MD_CHANGE_DEVS, &mddev->flags);
-			conf->working_disks--;
 			mddev->degraded++;
 			printk(KERN_ALERT "multipath: IO failure on %s,"
 				" disabling IO path.\n"
 				"multipath: Operation continuing"
 				" on %d IO paths.\n",
 				bdevname (rdev->bdev,b),
-				conf->working_disks);
+				conf->raid_disks - mddev->degraded);
 		}
 	}
 }
@@ -227,7 +226,7 @@ static void print_multipath_conf (multipath_conf_t *conf)
 		printk("(conf==NULL)\n");
 		return;
 	}
-	printk(" --- wd:%d rd:%d\n", conf->working_disks,
+	printk(" --- wd:%d rd:%d\n", conf->raid_disks - conf->mddev->degraded,
 			 conf->raid_disks);
 
 	for (i = 0; i < conf->raid_disks; i++) {
@@ -274,7 +273,6 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 							   PAGE_CACHE_SIZE - 1);
 			}
 
-			conf->working_disks++;
 			mddev->degraded--;
 			rdev->raid_disk = path;
 			set_bit(In_sync, &rdev->flags);
@@ -391,6 +389,7 @@ static int multipath_run (mddev_t *mddev)
 	int disk_idx;
 	struct multipath_info *disk;
 	mdk_rdev_t *rdev;
+	int working_disks;
 
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
@@ -424,7 +423,7 @@ static int multipath_run (mddev_t *mddev)
 		goto out_free_conf;
 	}
 
-	conf->working_disks = 0;
+	working_disks = 0;
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx < 0 ||
@@ -446,7 +445,7 @@ static int multipath_run (mddev_t *mddev)
 		}
 
 		if (!test_bit(Faulty, &rdev->flags))
-			conf->working_disks++;
+			working_disks++;
 	}
 
 	conf->raid_disks = mddev->raid_disks;
@@ -454,12 +453,12 @@ static int multipath_run (mddev_t *mddev)
 	spin_lock_init(&conf->device_lock);
 	INIT_LIST_HEAD(&conf->retry_list);
 
-	if (!conf->working_disks) {
+	if (!working_disks) {
 		printk(KERN_ERR "multipath: no operational IO paths for %s\n",
 			mdname(mddev));
 		goto out_free_conf;
 	}
-	mddev->degraded = conf->raid_disks - conf->working_disks;
+	mddev->degraded = conf->raid_disks - working_disks;
 
 	conf->pool = mempool_create_kmalloc_pool(NR_RESERVED_BUFS,
 						 sizeof(struct multipath_bh));
@@ -481,7 +480,8 @@ static int multipath_run (mddev_t *mddev)
 
 	printk(KERN_INFO 
 		"multipath: array %s active with %d out of %d IO paths\n",
-		mdname(mddev), conf->working_disks, mddev->raid_disks);
+		mdname(mddev), conf->raid_disks - mddev->degraded,
+	       mddev->raid_disks);
 	/*
 	 * Ok, everything is just fine now
 	 */
diff --git a/drivers/md/multipath.h b/drivers/md/multipath.h
index d1c2a8d..3c5a45e 100644
--- a/drivers/md/multipath.h
+++ b/drivers/md/multipath.h
@@ -9,7 +9,6 @@ struct multipath_private_data {
 	mddev_t			*mddev;
 	struct multipath_info	*multipaths;
 	int			raid_disks;
-	int			working_disks;
 	spinlock_t		device_lock;
 	struct list_head	retry_list;
 



