linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] md: Remove the hash tables from raid0.
@ 2009-05-14 10:43 Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

As mentioned by Neil, the raid0 hash table code does probably not
add any value. Moreover, it contains some rather strange sector_t
manipulations which are needed to setup and maintain the table.

This patch series against Neil's for-next tree as of yesterday removes
the hash table from the raid0 code.

Patch #1 replaces the hash table lookup by a simple function that
loops over all strip zones to find the zone that holds a given sector.
This change allows to get rid of the hash table itself (patch #2)
and of related fields of struct raid0_private_data (patch #3).

Patch #4 makes raid0 return a proper error code rather than -ENOMEM
in case the array could not be started for reasons other than memory
shortage.

The remaining two patches are simple cleanups that further simplify
the raid0 code a bit.

The patched kernel has been tested with a smallish raid0 array
consisting of five devices of varying sizes (created and filled with
contents by an unpatched kernel) and seems to work just fine. It
passes the raid0 tests of the mdadm test suite.

Please consider for inclusion.

 drivers/md/raid0.c |  150 ++++++++++++++--------------------------------------
 drivers/md/raid0.h |    4 --
 2 files changed, 39 insertions(+), 115 deletions(-)

Thanks
Andre

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 11:15   ` SandeepKsinha
                     ` (3 more replies)
  2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash table Andre Noll
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

The number of strip_zones of a raid0 array is bounded by the number of
drives in the array and is in fact much smaller for typical setups. For
example, any raid0 array containing identical disks will have only
a single strip_zone.

Therefore, the hash tables which are used for quickly finding the
strip_zone that holds a particular sector are of questionable value
and add quite a bit of unnecessary complexity.

This patch replaces the hash table lookup by equivalent code which
simply loops over all strip zones to find the zone that holds the
given sector.

Subsequent cleanup patches will remove the hash table structure.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   32 +++++++++++++++++++-------------
 1 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c08d755..9fd3c3c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
 	return 0;
 }
 
+/* Find the zone which holds a particular offset */
+static struct strip_zone *find_zone(struct raid0_private_data *conf,
+		sector_t sector)
+{
+	int i;
+
+	for (i = 0; i < conf->nr_strip_zones; i++) {
+		struct strip_zone *z = conf->strip_zone + i;
+
+		if (sector < z->zone_start + z->sectors)
+			return z;
+	}
+	BUG();
+	return NULL;
+}
+
 static int raid0_make_request (struct request_queue *q, struct bio *bio)
 {
 	mddev_t *mddev = q->queuedata;
@@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
 		bio_pair_release(bp);
 		return 0;
 	}
- 
-
-	{
-		sector_t x = sector >> conf->sector_shift;
-		sector_div(x, (u32)conf->spacing);
-		zone = conf->hash_table[x];
-	}
-
-	while (sector >= zone->zone_start + zone->sectors)
-		zone++;
-
+	zone = find_zone(conf, sector);
+	if (!zone)
+		return 1;
 	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
-
-
 	{
 		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
 
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Remove hash table.
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash spacing and sector shift Andre Noll
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

The raid0 hash table has become unused due to the changes in the
previous patch. This patch removes the hash table allocation and
setup code and kills the hash_table field of struct raid0_private_data.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   12 ------------
 drivers/md/raid0.h |    1 -
 2 files changed, 0 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 9fd3c3c..6bae90c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -330,22 +330,14 @@ static int raid0_run (mddev_t *mddev)
 		nb_zone = s + round;
 	}
 	printk(KERN_INFO "raid0 : nb_zone is %d.\n", nb_zone);
-
-	printk(KERN_INFO "raid0 : Allocating %zu bytes for hash.\n",
-				nb_zone*sizeof(struct strip_zone*));
-	conf->hash_table = kmalloc (sizeof (struct strip_zone *)*nb_zone, GFP_KERNEL);
-	if (!conf->hash_table)
-		goto out_free_conf;
 	sectors = conf->strip_zone[cur].sectors;
 
-	conf->hash_table[0] = conf->strip_zone + cur;
 	for (i=1; i< nb_zone; i++) {
 		while (sectors <= conf->spacing) {
 			cur++;
 			sectors += conf->strip_zone[cur].sectors;
 		}
 		sectors -= conf->spacing;
-		conf->hash_table[i] = conf->strip_zone + cur;
 	}
 	if (conf->sector_shift) {
 		conf->spacing >>= conf->sector_shift;
@@ -388,8 +380,6 @@ static int raid0_stop (mddev_t *mddev)
 	raid0_conf_t *conf = mddev_to_conf(mddev);
 
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
-	kfree(conf->hash_table);
-	conf->hash_table = NULL;
 	kfree(conf->strip_zone);
 	conf->strip_zone = NULL;
 	kfree(conf);
@@ -502,8 +492,6 @@ static void raid0_status (struct seq_file *seq, mddev_t *mddev)
 	h = 0;
 	for (j = 0; j < conf->nr_strip_zones; j++) {
 		seq_printf(seq, "      z%d", j);
-		if (conf->hash_table[h] == conf->strip_zone+j)
-			seq_printf(seq, "(h%d)", h++);
 		seq_printf(seq, "=[");
 		for (k = 0; k < conf->strip_zone[j].nb_dev; k++)
 			seq_printf(seq, "%s/", bdevname(
diff --git a/drivers/md/raid0.h b/drivers/md/raid0.h
index 824b12e..aac3613 100644
--- a/drivers/md/raid0.h
+++ b/drivers/md/raid0.h
@@ -12,7 +12,6 @@ struct strip_zone
 
 struct raid0_private_data
 {
-	struct strip_zone **hash_table; /* Table of indexes into strip_zone */
 	struct strip_zone *strip_zone;
 	mdk_rdev_t **devlist; /* lists of rdevs, pointed to by strip_zone->dev */
 	int nr_strip_zones;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Remove hash spacing and sector shift.
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash table Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Make raid0_run() return a proper error code Andre Noll
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

The "sector_shift" and "spacing" fields of struct raid0_private_data
were only used for the hash table lookups. So the removal of the
hash table allows get rid of these fields as well which simplifies
create_strip_zones() and raid0_run() quite a bit.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   63 +---------------------------------------------------
 drivers/md/raid0.h |    3 --
 2 files changed, 1 insertions(+), 65 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 6bae90c..2401e94 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -57,7 +57,6 @@ static int create_strip_zones (mddev_t *mddev)
 {
 	int i, c, j;
 	sector_t current_start, curr_zone_start;
-	sector_t min_spacing;
 	raid0_conf_t *conf = mddev_to_conf(mddev);
 	mdk_rdev_t *smallest, *rdev1, *rdev2, *rdev;
 	struct strip_zone *zone;
@@ -206,28 +205,7 @@ static int create_strip_zones (mddev_t *mddev)
 			(unsigned long long)current_start);
 	}
 
-	/* Now find appropriate hash spacing.
-	 * We want a number which causes most hash entries to cover
-	 * at most two strips, but the hash table must be at most
-	 * 1 PAGE.  We choose the smallest strip, or contiguous collection
-	 * of strips, that has big enough size.  We never consider the last
-	 * strip though as it's size has no bearing on the efficacy of the hash
-	 * table.
-	 */
-	conf->spacing = curr_zone_start;
-	min_spacing = curr_zone_start;
-	sector_div(min_spacing, PAGE_SIZE/sizeof(struct strip_zone*));
-	for (i=0; i < conf->nr_strip_zones-1; i++) {
-		sector_t s = 0;
-		for (j = i; j < conf->nr_strip_zones - 1 &&
-				s < min_spacing; j++)
-			s += conf->strip_zone[j].sectors;
-		if (s >= min_spacing && s < conf->spacing)
-			conf->spacing = s;
-	}
-
 	mddev->queue->unplug_fn = raid0_unplug;
-
 	mddev->queue->backing_dev_info.congested_fn = raid0_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
 
@@ -277,10 +255,8 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 	return array_sectors;
 }
 
-static int raid0_run (mddev_t *mddev)
+static int raid0_run(mddev_t *mddev)
 {
-	unsigned  cur=0, i=0, nb_zone;
-	s64 sectors;
 	raid0_conf_t *conf;
 
 	if (mddev->chunk_size == 0) {
@@ -310,43 +286,6 @@ static int raid0_run (mddev_t *mddev)
 
 	printk(KERN_INFO "raid0 : md_size is %llu sectors.\n",
 		(unsigned long long)mddev->array_sectors);
-	printk(KERN_INFO "raid0 : conf->spacing is %llu sectors.\n",
-		(unsigned long long)conf->spacing);
-	{
-		sector_t s = raid0_size(mddev, 0, 0);
-		sector_t space = conf->spacing;
-		int round;
-		conf->sector_shift = 0;
-		if (sizeof(sector_t) > sizeof(u32)) {
-			/*shift down space and s so that sector_div will work */
-			while (space > (sector_t) (~(u32)0)) {
-				s >>= 1;
-				space >>= 1;
-				s += 1; /* force round-up */
-				conf->sector_shift++;
-			}
-		}
-		round = sector_div(s, (u32)space) ? 1 : 0;
-		nb_zone = s + round;
-	}
-	printk(KERN_INFO "raid0 : nb_zone is %d.\n", nb_zone);
-	sectors = conf->strip_zone[cur].sectors;
-
-	for (i=1; i< nb_zone; i++) {
-		while (sectors <= conf->spacing) {
-			cur++;
-			sectors += conf->strip_zone[cur].sectors;
-		}
-		sectors -= conf->spacing;
-	}
-	if (conf->sector_shift) {
-		conf->spacing >>= conf->sector_shift;
-		/* round spacing up so when we divide by it, we
-		 * err on the side of too-low, which is safest
-		 */
-		conf->spacing++;
-	}
-
 	/* calculate the max read-ahead size.
 	 * For read-ahead of large files to be effective, we need to
 	 * readahead at least twice a whole stripe. i.e. number of devices
diff --git a/drivers/md/raid0.h b/drivers/md/raid0.h
index aac3613..218d7f3 100644
--- a/drivers/md/raid0.h
+++ b/drivers/md/raid0.h
@@ -15,9 +15,6 @@ struct raid0_private_data
 	struct strip_zone *strip_zone;
 	mdk_rdev_t **devlist; /* lists of rdevs, pointed to by strip_zone->dev */
 	int nr_strip_zones;
-
-	sector_t spacing;
-	int sector_shift; /* shift this before divide by spacing */
 };
 
 typedef struct raid0_private_data raid0_conf_t;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Make raid0_run() return a proper error code.
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
                   ` (2 preceding siblings ...)
  2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash spacing and sector shift Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 11:21   ` NeilBrown
  2009-05-14 10:43 ` [PATCH] md: raid0: Kfree() strip_zone and devlist in create_strip_zones() Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Simplify raid0_run() Andre Noll
  5 siblings, 1 reply; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

Currently raid0_run() always returns -ENOMEM on errors. This is
incorrect as running the array might fail for other reasons, for
example because not all component devices were available.

This patch changes create_strip_zones() so that it returns a proper
error code (either -ENOMEM or -EINVAL) rather than 1 on errors and
makes raid0_run(), its single caller, return that value instead
of -ENOMEM.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 2401e94..0f4330f 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -106,12 +106,12 @@ static int create_strip_zones (mddev_t *mddev)
 	conf->strip_zone = kzalloc(sizeof(struct strip_zone)*
 				conf->nr_strip_zones, GFP_KERNEL);
 	if (!conf->strip_zone)
-		return 1;
+		return -ENOMEM;
 	conf->devlist = kzalloc(sizeof(mdk_rdev_t*)*
 				conf->nr_strip_zones*mddev->raid_disks,
 				GFP_KERNEL);
 	if (!conf->devlist)
-		return 1;
+		return -ENOMEM;
 
 	/* The first zone must contain all devices, so here we check that
 	 * there is a proper alignment of slots to devices and find them all
@@ -211,8 +211,8 @@ static int create_strip_zones (mddev_t *mddev)
 
 	printk(KERN_INFO "raid0: done.\n");
 	return 0;
- abort:
-	return 1;
+abort:
+	return -EINVAL;
 }
 
 /**
@@ -258,6 +258,7 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 static int raid0_run(mddev_t *mddev)
 {
 	raid0_conf_t *conf;
+	int ret;
 
 	if (mddev->chunk_size == 0) {
 		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
@@ -273,12 +274,13 @@ static int raid0_run(mddev_t *mddev)
 
 	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
 	if (!conf)
-		goto out;
+		return -ENOMEM;
 	mddev->private = (void *)conf;
  
 	conf->strip_zone = NULL;
 	conf->devlist = NULL;
-	if (create_strip_zones (mddev)) 
+	ret = create_strip_zones(mddev);
+	if (ret < 0)
 		goto out_free_conf;
 
 	/* calculate array device size */
@@ -310,8 +312,7 @@ out_free_conf:
 	kfree(conf->devlist);
 	kfree(conf);
 	mddev->private = NULL;
-out:
-	return -ENOMEM;
+	return ret;
 }
 
 static int raid0_stop (mddev_t *mddev)
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Kfree() strip_zone and devlist in create_strip_zones().
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
                   ` (3 preceding siblings ...)
  2009-05-14 10:43 ` [PATCH] md: raid0: Make raid0_run() return a proper error code Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 10:43 ` [PATCH] md: raid0: Simplify raid0_run() Andre Noll
  5 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

It's cleaner to kfree() these in the same function where they are
allocated. This allows to simplify raid0_run() a bit which will be
done in the next patch.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 0f4330f..5ff6290 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -110,8 +110,10 @@ static int create_strip_zones (mddev_t *mddev)
 	conf->devlist = kzalloc(sizeof(mdk_rdev_t*)*
 				conf->nr_strip_zones*mddev->raid_disks,
 				GFP_KERNEL);
-	if (!conf->devlist)
+	if (!conf->devlist) {
+		kfree(conf->strip_zone);
 		return -ENOMEM;
+	}
 
 	/* The first zone must contain all devices, so here we check that
 	 * there is a proper alignment of slots to devices and find them all
@@ -212,6 +214,8 @@ static int create_strip_zones (mddev_t *mddev)
 	printk(KERN_INFO "raid0: done.\n");
 	return 0;
 abort:
+	kfree(conf->strip_zone);
+	kfree(conf->devlist);
 	return -EINVAL;
 }
 
@@ -276,9 +280,7 @@ static int raid0_run(mddev_t *mddev)
 	if (!conf)
 		return -ENOMEM;
 	mddev->private = (void *)conf;
- 
-	conf->strip_zone = NULL;
-	conf->devlist = NULL;
+
 	ret = create_strip_zones(mddev);
 	if (ret < 0)
 		goto out_free_conf;
@@ -308,8 +310,6 @@ static int raid0_run(mddev_t *mddev)
 	return 0;
 
 out_free_conf:
-	kfree(conf->strip_zone);
-	kfree(conf->devlist);
 	kfree(conf);
 	mddev->private = NULL;
 	return ret;
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH] md: raid0: Simplify raid0_run().
  2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
                   ` (4 preceding siblings ...)
  2009-05-14 10:43 ` [PATCH] md: raid0: Kfree() strip_zone and devlist in create_strip_zones() Andre Noll
@ 2009-05-14 10:43 ` Andre Noll
  2009-05-14 11:43   ` SandeepKsinha
  2009-05-14 14:03   ` raz ben yehuda
  5 siblings, 2 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 10:43 UTC (permalink / raw)
  To: neilb; +Cc: raziebe, linux-raid, Andre Noll

Get rid of the local variable "conf" and of an unnecessary cast.

Signed-off-by: Andre Noll <maan@systemlinux.org>
---
 drivers/md/raid0.c |   20 +++++++-------------
 1 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 5ff6290..36b747a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -261,7 +261,6 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 
 static int raid0_run(mddev_t *mddev)
 {
-	raid0_conf_t *conf;
 	int ret;
 
 	if (mddev->chunk_size == 0) {
@@ -276,14 +275,15 @@ static int raid0_run(mddev_t *mddev)
 	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
 	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
-	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
-	if (!conf)
+	mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
+	if (!mddev->private)
 		return -ENOMEM;
-	mddev->private = (void *)conf;
-
 	ret = create_strip_zones(mddev);
-	if (ret < 0)
-		goto out_free_conf;
+	if (ret < 0) {
+		kfree(mddev->private);
+		mddev->private = NULL;
+		return ret;
+	}
 
 	/* calculate array device size */
 	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
@@ -305,14 +305,8 @@ static int raid0_run(mddev_t *mddev)
 			mddev->queue->backing_dev_info.ra_pages = 2* stripe;
 	}
 
-
 	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
 	return 0;
-
-out_free_conf:
-	kfree(conf);
-	mddev->private = NULL;
-	return ret;
 }
 
 static int raid0_stop (mddev_t *mddev)
-- 
1.5.4.3


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
@ 2009-05-14 11:15   ` SandeepKsinha
  2009-05-14 11:15   ` NeilBrown
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: SandeepKsinha @ 2009-05-14 11:15 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, raziebe, linux-raid

Hi Andre,

On Thu, May 14, 2009 at 4:13 PM, Andre Noll <maan@systemlinux.org> wrote:
> The number of strip_zones of a raid0 array is bounded by the number of
> drives in the array and is in fact much smaller for typical setups. For
> example, any raid0 array containing identical disks will have only
> a single strip_zone.
>
> Therefore, the hash tables which are used for quickly finding the
> strip_zone that holds a particular sector are of questionable value
> and add quite a bit of unnecessary complexity.
>
> This patch replaces the hash table lookup by equivalent code which
> simply loops over all strip zones to find the zone that holds the
> given sector.
>
> Subsequent cleanup patches will remove the hash table structure.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   32 +++++++++++++++++++-------------
>  1 files changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c08d755..9fd3c3c 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
>        return 0;
>  }
>
> +/* Find the zone which holds a particular offset */
> +static struct strip_zone *find_zone(struct raid0_private_data *conf,
> +               sector_t sector)
> +{
> +       int i;
> +
> +       for (i = 0; i < conf->nr_strip_zones; i++) {
> +               struct strip_zone *z = conf->strip_zone + i;
> +
> +               if (sector < z->zone_start + z->sectors)
> +                       return z;
> +       }
> +       BUG();
> +       return NULL;
> +}
> +
>  static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  {
>        mddev_t *mddev = q->queuedata;
> @@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>                bio_pair_release(bp);
>                return 0;
>        }
> -
> -
> -       {
> -               sector_t x = sector >> conf->sector_shift;
> -               sector_div(x, (u32)conf->spacing);
> -               zone = conf->hash_table[x];
> -       }
> -
> -       while (sector >= zone->zone_start + zone->sectors)
> -               zone++;
> -
> +       zone = find_zone(conf, sector);
> +       if (!zone)
> +               return 1;

Firstly, will find_zone ever return a NULL. You code already asserts
on that situation in find_zone.

But in my opinion,

You should allow NULL to be returned from find_zone( ),
handle NULL in the caller function more gracefully than panicking (BUG).

Also, the above lines mean that if zone==NULL, you return 1, which
will eventually submit your request through generic_make_request.

I might be wrong as I am quite new to this code.
Kindly forgive.
>        sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> -
> -
>        {
>                sector_t x = (sector - zone->zone_start) >> chunksect_bits;
>
> --
> 1.5.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
  2009-05-14 11:15   ` SandeepKsinha
@ 2009-05-14 11:15   ` NeilBrown
  2009-05-14 12:10     ` Andre Noll
  2009-05-14 12:22     ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Neil Brown
  2009-05-14 12:01   ` SandeepKsinha
  2009-05-14 14:13   ` raz ben yehuda
  3 siblings, 2 replies; 39+ messages in thread
From: NeilBrown @ 2009-05-14 11:15 UTC (permalink / raw)
  Cc: raziebe, linux-raid, Andre Noll

On Thu, May 14, 2009 8:43 pm, Andre Noll wrote:
> The number of strip_zones of a raid0 array is bounded by the number of
> drives in the array and is in fact much smaller for typical setups. For
> example, any raid0 array containing identical disks will have only
> a single strip_zone.
>
> Therefore, the hash tables which are used for quickly finding the
> strip_zone that holds a particular sector are of questionable value
> and add quite a bit of unnecessary complexity.
>
> This patch replaces the hash table lookup by equivalent code which
> simply loops over all strip zones to find the zone that holds the
> given sector.
>
> Subsequent cleanup patches will remove the hash table structure.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   32 +++++++++++++++++++-------------
>  1 files changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c08d755..9fd3c3c 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
>  	return 0;
>  }
>
> +/* Find the zone which holds a particular offset */
> +static struct strip_zone *find_zone(struct raid0_private_data *conf,
> +		sector_t sector)
> +{
> +	int i;
> +
> +	for (i = 0; i < conf->nr_strip_zones; i++) {
> +		struct strip_zone *z = conf->strip_zone + i;
> +
> +		if (sector < z->zone_start + z->sectors)
> +			return z;
> +	}

Maybe I'm being petty, but I really don't like this...

I really like
> -        while (sector >= zone->zone_start + zone->sectors)
> -                zone++;

It seems to capture what is really happening.
But I can see that your code has a defensive aspect to it
which is hard to argue against.

It's probably the fact that there are two loop variables (i and z)
that bothers me....
Here is a thought.  How about we extend the stripe_zone array
by one element and put a sentinal at the end, with ->zone_start being
the size of the drive ... or maybe even "max sector".

Then the loop could be
   for (i = 0; i < conf->nr_strip_zones; i++)
         if (sector < conf->strip_zone[i+1])
               return conf->strip_zone[i]

Save our selves an 'add' that way :-)


> +	BUG();
> +	return NULL;
> +}
> +
>  static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  {
>  	mddev_t *mddev = q->queuedata;
> @@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue
> *q, struct bio *bio)
>  		bio_pair_release(bp);
>  		return 0;
>  	}
> -
> -
> -	{
> -		sector_t x = sector >> conf->sector_shift;
> -		sector_div(x, (u32)conf->spacing);
> -		zone = conf->hash_table[x];
> -	}
> -
> -	while (sector >= zone->zone_start + zone->sectors)
> -		zone++;
> -
> +	zone = find_zone(conf, sector);
> +	if (!zone)
> +		return 1;

I don't much like those last two lines either.  zone will
never be NULL because if find_zone doesn't find anything, it
calls BUG.  And returning from make_request without doing any IO
is just wrong.  So those two lines can go.

Thanks,
NeilBrown


>  	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> -
> -
>  	{
>  		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
>
> --
> 1.5.4.3
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Make raid0_run() return a proper error code.
  2009-05-14 10:43 ` [PATCH] md: raid0: Make raid0_run() return a proper error code Andre Noll
@ 2009-05-14 11:21   ` NeilBrown
  2009-05-14 11:42     ` Andre Noll
  0 siblings, 1 reply; 39+ messages in thread
From: NeilBrown @ 2009-05-14 11:21 UTC (permalink / raw)
  Cc: raziebe, linux-raid, Andre Noll

On Thu, May 14, 2009 8:43 pm, Andre Noll wrote:
> Currently raid0_run() always returns -ENOMEM on errors. This is
> incorrect as running the array might fail for other reasons, for
> example because not all component devices were available.
>
> This patch changes create_strip_zones() so that it returns a proper
> error code (either -ENOMEM or -EINVAL) rather than 1 on errors and
> makes raid0_run(), its single caller, return that value instead
> of -ENOMEM.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   17 +++++++++--------
>  1 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 2401e94..0f4330f 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -106,12 +106,12 @@ static int create_strip_zones (mddev_t *mddev)
>  	conf->strip_zone = kzalloc(sizeof(struct strip_zone)*
>  				conf->nr_strip_zones, GFP_KERNEL);
>  	if (!conf->strip_zone)
> -		return 1;
> +		return -ENOMEM;
>  	conf->devlist = kzalloc(sizeof(mdk_rdev_t*)*
>  				conf->nr_strip_zones*mddev->raid_disks,
>  				GFP_KERNEL);
>  	if (!conf->devlist)
> -		return 1;
> +		return -ENOMEM;
>
>  	/* The first zone must contain all devices, so here we check that
>  	 * there is a proper alignment of slots to devices and find them all
> @@ -211,8 +211,8 @@ static int create_strip_zones (mddev_t *mddev)
>
>  	printk(KERN_INFO "raid0: done.\n");
>  	return 0;
> - abort:
> -	return 1;
> +abort:
> +	return -EINVAL;
>  }
>
>  /**
> @@ -258,6 +258,7 @@ static sector_t raid0_size(mddev_t *mddev, sector_t
> sectors, int raid_disks)
>  static int raid0_run(mddev_t *mddev)
>  {
>  	raid0_conf_t *conf;
> +	int ret;
>
>  	if (mddev->chunk_size == 0) {
>  		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
> @@ -273,12 +274,13 @@ static int raid0_run(mddev_t *mddev)
>
>  	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
>  	if (!conf)
> -		goto out;
> +		return -ENOMEM;
>  	mddev->private = (void *)conf;
>
>  	conf->strip_zone = NULL;
>  	conf->devlist = NULL;
> -	if (create_strip_zones (mddev))
> +	ret = create_strip_zones(mddev);
> +	if (ret < 0)
>  		goto out_free_conf;

Can we go one set further here and move the allocation of ->private
into create_stripe_zones.  It seems funny allocating it here and
never actually using it directly.

??

Everything else looks good.
Thanks for doing this.

NeilBrown





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Make raid0_run() return a proper error code.
  2009-05-14 11:21   ` NeilBrown
@ 2009-05-14 11:42     ` Andre Noll
  0 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 11:42 UTC (permalink / raw)
  To: NeilBrown; +Cc: raziebe, linux-raid

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

On 21:21, NeilBrown wrote:
> >  	conf->strip_zone = NULL;
> >  	conf->devlist = NULL;
> > -	if (create_strip_zones (mddev))
> > +	ret = create_strip_zones(mddev);
> > +	if (ret < 0)
> >  		goto out_free_conf;
> 
> Can we go one set further here and move the allocation of ->private
> into create_stripe_zones.  It seems funny allocating it here and
> never actually using it directly.

Yes, I thought about that but wanted the patches to be as small as
possible. I'll follow up with another patch that moves the allocation
to create_stripe_zones().

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Simplify raid0_run().
  2009-05-14 10:43 ` [PATCH] md: raid0: Simplify raid0_run() Andre Noll
@ 2009-05-14 11:43   ` SandeepKsinha
  2009-05-14 12:06     ` NeilBrown
  2009-05-14 14:03   ` raz ben yehuda
  1 sibling, 1 reply; 39+ messages in thread
From: SandeepKsinha @ 2009-05-14 11:43 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, raziebe, linux-raid

Hi Andre,

On Thu, May 14, 2009 at 4:13 PM, Andre Noll <maan@systemlinux.org> wrote:
> Get rid of the local variable "conf" and of an unnecessary cast.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   20 +++++++-------------
>  1 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 5ff6290..36b747a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -261,7 +261,6 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>
>  static int raid0_run(mddev_t *mddev)
>  {
> -       raid0_conf_t *conf;
>        int ret;
>
>        if (mddev->chunk_size == 0) {
> @@ -276,14 +275,15 @@ static int raid0_run(mddev_t *mddev)
>        blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
>        mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>
> -       conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
> -       if (!conf)
> +       mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
> +       if (!mddev->private)
>                return -ENOMEM;
> -       mddev->private = (void *)conf;
> -
>        ret = create_strip_zones(mddev);
> -       if (ret < 0)
> -               goto out_free_conf;
> +       if (ret < 0) {
> +               kfree(mddev->private);
> +               mddev->private = NULL;
> +               return ret;
> +       }
>


I believe the use of goto statements keep the code more structured,
especially for code cleanup in case of failures.
Its better that you revert back to the goto's so that later if a new
piece of code is added and there would be cases of failure to handle,
you will have a common cleanup routine.

I think Neil can comment better on this one.


>        /* calculate array device size */
>        md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
> @@ -305,14 +305,8 @@ static int raid0_run(mddev_t *mddev)
>                        mddev->queue->backing_dev_info.ra_pages = 2* stripe;
>        }
>
> -
>        blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
>        return 0;
> -
> -out_free_conf:
> -       kfree(conf);
> -       mddev->private = NULL;
> -       return ret;
>  }
>
>  static int raid0_stop (mddev_t *mddev)
> --
> 1.5.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
  2009-05-14 11:15   ` SandeepKsinha
  2009-05-14 11:15   ` NeilBrown
@ 2009-05-14 12:01   ` SandeepKsinha
  2009-05-14 12:15     ` SandeepKsinha
  2009-05-14 14:13   ` raz ben yehuda
  3 siblings, 1 reply; 39+ messages in thread
From: SandeepKsinha @ 2009-05-14 12:01 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, raziebe, linux-raid

On Thu, May 14, 2009 at 4:13 PM, Andre Noll <maan@systemlinux.org> wrote:
> The number of strip_zones of a raid0 array is bounded by the number of
> drives in the array and is in fact much smaller for typical setups. For
> example, any raid0 array containing identical disks will have only
> a single strip_zone.
>
> Therefore, the hash tables which are used for quickly finding the
> strip_zone that holds a particular sector are of questionable value
> and add quite a bit of unnecessary complexity.
>
> This patch replaces the hash table lookup by equivalent code which
> simply loops over all strip zones to find the zone that holds the
> given sector.
>
> Subsequent cleanup patches will remove the hash table structure.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   32 +++++++++++++++++++-------------
>  1 files changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c08d755..9fd3c3c 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
>        return 0;
>  }
>
> +/* Find the zone which holds a particular offset */
> +static struct strip_zone *find_zone(struct raid0_private_data *conf,
> +               sector_t sector)
> +{
> +       int i;
> +
> +       for (i = 0; i < conf->nr_strip_zones; i++) {
> +               struct strip_zone *z = conf->strip_zone + i;
> +
> +               if (sector < z->zone_start + z->sectors)
> +                       return z;
> +       }
> +       BUG();
> +       return NULL;
> +}
> +

The point here is that rather than figuring the issue after traversing
through the list of all the zones,
you can make a check and smartly get away with the situation without
wasting extra effort.

      int i;

      BUG_ON(sector > conf->stripe_zone[0].sectors);

       for (i = 0; i < conf->nr_strip_zones; i++) {
               struct strip_zone *z = conf->strip_zone + i;

               if (sector < z->zone_start + z->sectors)
                       return z;
       }

       return NULL;



>  static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  {
>        mddev_t *mddev = q->queuedata;
> @@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>                bio_pair_release(bp);
>                return 0;
>        }
> -
> -
> -       {
> -               sector_t x = sector >> conf->sector_shift;
> -               sector_div(x, (u32)conf->spacing);
> -               zone = conf->hash_table[x];
> -       }
> -
> -       while (sector >= zone->zone_start + zone->sectors)
> -               zone++;
> -
> +       zone = find_zone(conf, sector);
> +       if (!zone)
> +               return 1;
>        sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> -
> -
>        {
>                sector_t x = (sector - zone->zone_start) >> chunksect_bits;
>
> --
> 1.5.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Simplify raid0_run().
  2009-05-14 11:43   ` SandeepKsinha
@ 2009-05-14 12:06     ` NeilBrown
  0 siblings, 0 replies; 39+ messages in thread
From: NeilBrown @ 2009-05-14 12:06 UTC (permalink / raw)
  To: SandeepKsinha; +Cc: Andre Noll, raziebe, linux-raid

On Thu, May 14, 2009 9:43 pm, SandeepKsinha wrote:
> Hi Andre,
>
> On Thu, May 14, 2009 at 4:13 PM, Andre Noll <maan@systemlinux.org> wrote:
>> Get rid of the local variable "conf" and of an unnecessary cast.
>>
>> Signed-off-by: Andre Noll <maan@systemlinux.org>
>> ---
>>  drivers/md/raid0.c |   20 +++++++-------------
>>  1 files changed, 7 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index 5ff6290..36b747a 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -261,7 +261,6 @@ static sector_t raid0_size(mddev_t *mddev, sector_t
>> sectors, int raid_disks)
>>
>>  static int raid0_run(mddev_t *mddev)
>>  {
>> -       raid0_conf_t *conf;
>>        int ret;
>>
>>        if (mddev->chunk_size == 0) {
>> @@ -276,14 +275,15 @@ static int raid0_run(mddev_t *mddev)
>>        blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) -
>> 1);
>>        mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>>
>> -       conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
>> -       if (!conf)
>> +       mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
>> +       if (!mddev->private)
>>                return -ENOMEM;
>> -       mddev->private = (void *)conf;
>> -
>>        ret = create_strip_zones(mddev);
>> -       if (ret < 0)
>> -               goto out_free_conf;
>> +       if (ret < 0) {
>> +               kfree(mddev->private);
>> +               mddev->private = NULL;
>> +               return ret;
>> +       }
>>
>
>
> I believe the use of goto statements keep the code more structured,
> especially for code cleanup in case of failures.
> Its better that you revert back to the goto's so that later if a new
> piece of code is added and there would be cases of failure to handle,
> you will have a common cleanup routine.
>
> I think Neil can comment better on this one.

I'm in two minds about this.
In general I prefer
   if (something went wrong)
         return error;
to
   if (something went wrong)_
         goto handle_error;
   ....

  handle_error:
    return error;

as it is easier to read.

However the point about maintainability is valid.  Having to put all
those gotos back in if you find that you need to free something for
unlock something in error prone (you might miss a return)
So having exactly one return for each function is a valuable
goal.

So in the general case there are good arguments in both directions.
Which means you have to make a decision based on the specifics of a
given case.
In this case, I think raid0_run is sufficiently small (now that all
the hash related code is gone) that the benefits of having gotos
are minimal.  So I would be inclined to leave it as it is.

Thanks for the suggestion.

NeilBrown




--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 11:15   ` NeilBrown
@ 2009-05-14 12:10     ` Andre Noll
  2009-05-14 12:25       ` NeilBrown
  2009-05-14 12:22     ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Neil Brown
  1 sibling, 1 reply; 39+ messages in thread
From: Andre Noll @ 2009-05-14 12:10 UTC (permalink / raw)
  To: NeilBrown; +Cc: raziebe, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2399 bytes --]

On 21:15, NeilBrown wrote:
> > +/* Find the zone which holds a particular offset */
> > +static struct strip_zone *find_zone(struct raid0_private_data *conf,
> > +		sector_t sector)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < conf->nr_strip_zones; i++) {
> > +		struct strip_zone *z = conf->strip_zone + i;
> > +
> > +		if (sector < z->zone_start + z->sectors)
> > +			return z;
> > +	}
> 
> Maybe I'm being petty, but I really don't like this...
> 
> I really like
> > -        while (sector >= zone->zone_start + zone->sectors)
> > -                zone++;
> 
> It seems to capture what is really happening.
> But I can see that your code has a defensive aspect to it
> which is hard to argue against.
> 
> It's probably the fact that there are two loop variables (i and z)
> that bothers me....
> Here is a thought.  How about we extend the stripe_zone array
> by one element and put a sentinal at the end, with ->zone_start being
> the size of the drive ... or maybe even "max sector".
> 
> Then the loop could be
>    for (i = 0; i < conf->nr_strip_zones; i++)
>          if (sector < conf->strip_zone[i+1])
>                return conf->strip_zone[i]
> 
> Save our selves an 'add' that way :-)

Saving an addition is a good thing as find_zone() is certainly
performance-critical. After all, that's why the original author used
hash tables.

The drawback is of course that we will happily return the last zone
for (invalid) requests beyond the array size. If that happens, data
corruption on the drive(s) of the last zone will likely be the result,
so I feel a bit uncomfortable in doing this.

How about extending stripe_zone as you propose, storing the array
size in ->zone_start of the last element and still call BUG() after
the loop?  That still saves the additional add and avoids the mentioned
data corruption.

> > +	zone = find_zone(conf, sector);
> > +	if (!zone)
> > +		return 1;
> 
> I don't much like those last two lines either.  zone will
> never be NULL because if find_zone doesn't find anything, it
> calls BUG.  And returning from make_request without doing any IO
> is just wrong.  So those two lines can go.

Yup. Do you want me to rework the patches, or should I follow up with
patches on top of the series?

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 12:01   ` SandeepKsinha
@ 2009-05-14 12:15     ` SandeepKsinha
  0 siblings, 0 replies; 39+ messages in thread
From: SandeepKsinha @ 2009-05-14 12:15 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, raziebe, linux-raid

On Thu, May 14, 2009 at 5:31 PM, SandeepKsinha <sandeepksinha@gmail.com> wrote:
> On Thu, May 14, 2009 at 4:13 PM, Andre Noll <maan@systemlinux.org> wrote:
>> The number of strip_zones of a raid0 array is bounded by the number of
>> drives in the array and is in fact much smaller for typical setups. For
>> example, any raid0 array containing identical disks will have only
>> a single strip_zone.
>>
>> Therefore, the hash tables which are used for quickly finding the
>> strip_zone that holds a particular sector are of questionable value
>> and add quite a bit of unnecessary complexity.
>>
>> This patch replaces the hash table lookup by equivalent code which
>> simply loops over all strip zones to find the zone that holds the
>> given sector.
>>
>> Subsequent cleanup patches will remove the hash table structure.
>>
>> Signed-off-by: Andre Noll <maan@systemlinux.org>
>> ---
>>  drivers/md/raid0.c |   32 +++++++++++++++++++-------------
>>  1 files changed, 19 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index c08d755..9fd3c3c 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
>>        return 0;
>>  }
>>
>> +/* Find the zone which holds a particular offset */
>> +static struct strip_zone *find_zone(struct raid0_private_data *conf,
>> +               sector_t sector)
>> +{
>> +       int i;
>> +
>> +       for (i = 0; i < conf->nr_strip_zones; i++) {
>> +               struct strip_zone *z = conf->strip_zone + i;
>> +
>> +               if (sector < z->zone_start + z->sectors)
>> +                       return z;
>> +       }
>> +       BUG();
>> +       return NULL;
>> +}
>> +
>
> The point here is that rather than figuring the issue after traversing
> through the list of all the zones,
> you can make a check and smartly get away with the situation without
> wasting extra effort.
>
>      int i;
>
>      BUG_ON(sector > conf->stripe_zone[0].sectors);
>
>       for (i = 0; i < conf->nr_strip_zones; i++) {
>               struct strip_zone *z = conf->strip_zone + i;
>
>               if (sector < z->zone_start + z->sectors)
>                       return z;
>       }
>
>       return NULL;
>
Sorry taken back. This won't exactly help. But thinking in similar
lines can be helpful.

>
>
>>  static int raid0_make_request (struct request_queue *q, struct bio *bio)
>>  {
>>        mddev_t *mddev = q->queuedata;
>> @@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>>                bio_pair_release(bp);
>>                return 0;
>>        }
>> -
>> -
>> -       {
>> -               sector_t x = sector >> conf->sector_shift;
>> -               sector_div(x, (u32)conf->spacing);
>> -               zone = conf->hash_table[x];
>> -       }
>> -
>> -       while (sector >= zone->zone_start + zone->sectors)
>> -               zone++;
>> -
>> +       zone = find_zone(conf, sector);
>> +       if (!zone)
>> +               return 1;
>>        sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
>> -
>> -
>>        {
>>                sector_t x = (sector - zone->zone_start) >> chunksect_bits;
>>
>> --
>> 1.5.4.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Regards,
> Sandeep.
>
>
>
>
>
>
> “To learn is to change. Education is a process that changes the learner.”
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 11:15   ` NeilBrown
  2009-05-14 12:10     ` Andre Noll
@ 2009-05-14 12:22     ` Neil Brown
  2009-05-14 15:51       ` raz ben yehuda
  1 sibling, 1 reply; 39+ messages in thread
From: Neil Brown @ 2009-05-14 12:22 UTC (permalink / raw)
  To: Andre Noll; +Cc: raziebe, linux-raid

On Thursday May 14, neilb@suse.de wrote:
> 
> Then the loop could be
>    for (i = 0; i < conf->nr_strip_zones; i++)
>          if (sector < conf->strip_zone[i+1])
>                return conf->strip_zone[i]
> 
> Save our selves an 'add' that way :-)

That should have been

> 
> Then the loop could be
>    for (i = 0; i < conf->nr_strip_zones; i++)
>          if (sector < conf->strip_zone[i+1]->zone_start)
>                return conf->strip_zone[i]
> 
> Save ourselves an 'add' that way :-)

and as we aren't using ->sectors in this loop any more, I think it
can be discarded completely, just making struct strip_zone 20% smaller
and improving cache performance.

What do you think?

NeilBrown

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 12:10     ` Andre Noll
@ 2009-05-14 12:25       ` NeilBrown
  2009-05-14 12:54         ` Sujit Karataparambil
                           ` (3 more replies)
  0 siblings, 4 replies; 39+ messages in thread
From: NeilBrown @ 2009-05-14 12:25 UTC (permalink / raw)
  To: Andre Noll; +Cc: raziebe, linux-raid

On Thu, May 14, 2009 10:10 pm, Andre Noll wrote:
> Yup. Do you want me to rework the patches, or should I follow up with
> patches on top of the series?

A fresh new set please.  And I agree with everything else you said in
the email.

NeilBrown


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 12:25       ` NeilBrown
@ 2009-05-14 12:54         ` Sujit Karataparambil
  2009-05-14 15:00           ` SandeepKsinha
  2009-05-14 15:58         ` PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity raz ben yehuda
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Sujit Karataparambil @ 2009-05-14 12:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andre Noll, raziebe, linux-raid

Hi neil,

could you please tell? out of curiosity what happens when a null is returned by
find_zone.

On Thu, May 14, 2009 at 5:55 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, May 14, 2009 10:10 pm, Andre Noll wrote:
>> Yup. Do you want me to rework the patches, or should I follow up with
>> patches on top of the series?
>
> A fresh new set please.  And I agree with everything else you said in
> the email.
>
> NeilBrown
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Simplify raid0_run().
  2009-05-14 10:43 ` [PATCH] md: raid0: Simplify raid0_run() Andre Noll
  2009-05-14 11:43   ` SandeepKsinha
@ 2009-05-14 14:03   ` raz ben yehuda
  1 sibling, 0 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 14:03 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, linux-raid

andre
i applied your patches and tested it a bit. seems to be working.
so thank you. Neil,  I will continue on top of andre of patches, if this
is ok with you.I promise to dissect better now.
On Thu, 2009-05-14 at 12:43 +0200, Andre Noll wrote:
> Get rid of the local variable "conf" and of an unnecessary cast.
> 
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   20 +++++++-------------
>  1 files changed, 7 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 5ff6290..36b747a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -261,7 +261,6 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>  
>  static int raid0_run(mddev_t *mddev)
>  {
> -	raid0_conf_t *conf;
>  	int ret;
>  
>  	if (mddev->chunk_size == 0) {
> @@ -276,14 +275,15 @@ static int raid0_run(mddev_t *mddev)
>  	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
>  	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  
> -	conf = kmalloc(sizeof (raid0_conf_t), GFP_KERNEL);
> -	if (!conf)
> +	mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
> +	if (!mddev->private)
>  		return -ENOMEM;
> -	mddev->private = (void *)conf;
> -
>  	ret = create_strip_zones(mddev);
> -	if (ret < 0)
> -		goto out_free_conf;
> +	if (ret < 0) {
> +		kfree(mddev->private);
> +		mddev->private = NULL;
> +		return ret;
> +	}
>  
>  	/* calculate array device size */
>  	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
> @@ -305,14 +305,8 @@ static int raid0_run(mddev_t *mddev)
>  			mddev->queue->backing_dev_info.ra_pages = 2* stripe;
>  	}
>  
> -
>  	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
>  	return 0;
> -
> -out_free_conf:
> -	kfree(conf);
> -	mddev->private = NULL;
> -	return ret;
>  }
>  
>  static int raid0_stop (mddev_t *mddev)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity
  2009-05-14 15:58         ` PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity raz ben yehuda
@ 2009-05-14 14:07           ` Andre Noll
  2009-05-14 22:35           ` Neil Brown
  1 sibling, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-14 14:07 UTC (permalink / raw)
  To: raz ben yehuda; +Cc: NeilBrown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3382 bytes --]

On 18:58, raz ben yehuda wrote:

>  Andre, Patch is applied on top of your last post. now it is your turn to merge :)

I'll have to rework my patch series and will hopefully send the new
series tomorrow.

If it's clear that moving raid0 to 4K granularity is good thing to do,
I'll add your patches on top of the new series.

>  static sector_t calc_num_sectors(mdk_rdev_t *rdev, unsigned chunk_size)
>  {
>  	sector_t num_sectors = rdev->sb_start;
> -
> -	if (chunk_size)
> -		num_sectors &= ~((sector_t)chunk_size/512 - 1);
> +	if (chunk_size) {
> +		int chunk_sects = chunk_size>>9;

As chunk_size is unsigned, chunk_sects should probably also be unsigned.

> +		num_sectors = (num_sectors/chunk_sects)*chunk_sects;

A ROUND_DOWN macro perhaps? There are only four of them in the kernel
tree :-/

> @@ -3512,7 +3514,7 @@ min_sync_store(mddev_t *mddev, const char *buf, size_t len)
>  
>  	/* Must be a multiple of chunk_size */
>  	if (mddev->chunk_size) {
> -		if (min & (sector_t)((mddev->chunk_size>>9)-1))
> +		if (min % (sector_t)(mddev->chunk_size>>9))
>  			return -EINVAL;
>  	}
>  	mddev->resync_min = min;
> @@ -3549,7 +3551,7 @@ max_sync_store(mddev_t *mddev, const char *buf, size_t len)
>  
>  		/* Must be a multiple of chunk_size */
>  		if (mddev->chunk_size) {
> -			if (max & (sector_t)((mddev->chunk_size>>9)-1))
> +			if (max % (sector_t)((mddev->chunk_size>>9)))

If we decide to clean these up, we might as well do it right and
use ffz()/ffs().

> +		/*
> +		* raid0 chunk size has to divide by a page
> +		*/

Nitpick: I'd prefer to say "must be a multiple of", and the indentation
does not match coding style standards.

> +static int raid0_is_power2_chunk(mddev_t *mddev)
> +{
> +	if ((1 << ffz(~mddev->chunk_size)) == mddev->chunk_size)
> +		return 1;
> +	return 0;
> +}

This could be go to md.h as an inline function or at least be used
everywhere in raid0.c.

> -/* Find the zone which holds a particular offset */
> -static struct strip_zone *find_zone(struct raid0_private_data *conf,
> -		sector_t sector)
> +static int raid0_position_bio(mddev_t *mddev, struct bio *bio, sector_t sector)
>  {
> -	int i;
> -
> -	for (i = 0; i < conf->nr_strip_zones; i++) {
> -		struct strip_zone *z = conf->strip_zone + i;
> -
> -		if (sector < z->zone_start + z->sectors)
> -			return z;
> -	}
> -	BUG();
> -	return NULL;
> +	sector_t sect_in_chunk;
> +	mdk_rdev_t *tmp_dev;
> +	sector_t chunk_in_dev;
> +	sector_t rsect;
> +	sector_t x;
> +	raid0_conf_t *conf = mddev_to_conf(mddev);
> +	sector_t chunk_sects = mddev->chunk_size >> 9;
> +	struct strip_zone *zone = &conf->strip_zone[0];
> +
> +	while (sector >= zone->zone_start + zone->sectors)
> +		zone++;
> +	sect_in_chunk = sector % chunk_sects;
> +	x = (sector - zone->zone_start) / chunk_sects;
> +	sector_div(x, zone->nb_dev);
> +	chunk_in_dev = x;
> +	x = sector / chunk_sects;
> +	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> +	rsect = (chunk_in_dev * chunk_sects) + zone->dev_start + sect_in_chunk;
> +	bio->bi_bdev = tmp_dev->bdev;
> +	bio->bi_sector = rsect + tmp_dev->data_offset;
> +	return 0;
>  }

This clashes of course with the planned changes that extend stripe_zone.

I'll see what I can do.
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
                     ` (2 preceding siblings ...)
  2009-05-14 12:01   ` SandeepKsinha
@ 2009-05-14 14:13   ` raz ben yehuda
  3 siblings, 0 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 14:13 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, linux-raid


On Thu, 2009-05-14 at 12:43 +0200, Andre Noll wrote:
> The number of strip_zones of a raid0 array is bounded by the number of
> drives in the array and is in fact much smaller for typical setups. For
> example, any raid0 array containing identical disks will have only
> a single strip_zone.
> 
> Therefore, the hash tables which are used for quickly finding the
> strip_zone that holds a particular sector are of questionable value
> and add quite a bit of unnecessary complexity.
> 
> This patch replaces the hash table lookup by equivalent code which
> simply loops over all strip zones to find the zone that holds the
> given sector.
> 
> Subsequent cleanup patches will remove the hash table structure.
> 
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> ---
>  drivers/md/raid0.c |   32 +++++++++++++++++++-------------
>  1 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index c08d755..9fd3c3c 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -398,6 +398,22 @@ static int raid0_stop (mddev_t *mddev)
>  	return 0;
>  }
>  
> +/* Find the zone which holds a particular offset */
> +static struct strip_zone *find_zone(struct raid0_private_data *conf,
> +		sector_t sector)
> +{
> +	int i;
> +
> +	for (i = 0; i < conf->nr_strip_zones; i++) {
> +		struct strip_zone *z = conf->strip_zone + i;
> +
> +		if (sector < z->zone_start + z->sectors)
> +			return z;
> +	}
> +	BUG();
> +	return NULL;
> +}
> +
>  static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  {
>  	mddev_t *mddev = q->queuedata;
> @@ -443,20 +459,10 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  		bio_pair_release(bp);
>  		return 0;
>  	}
> - 
> -
> -	{
> -		sector_t x = sector >> conf->sector_shift;
> -		sector_div(x, (u32)conf->spacing);
> -		zone = conf->hash_table[x];
> -	}
> -
> -	while (sector >= zone->zone_start + zone->sectors)
> -		zone++;
> -
> +	zone = find_zone(conf, sector);
> +	if (!zone)
> +		return 1;
? if you cannot serve the io you shouldn't you return an error ?
>  	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> -
> -
>  	{
>  		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
>  


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 12:54         ` Sujit Karataparambil
@ 2009-05-14 15:00           ` SandeepKsinha
  0 siblings, 0 replies; 39+ messages in thread
From: SandeepKsinha @ 2009-05-14 15:00 UTC (permalink / raw)
  To: Sujit Karataparambil; +Cc: NeilBrown, Andre Noll, raziebe, linux-raid

Hi Sujit,


On Thu, May 14, 2009 at 6:24 PM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> Hi neil,
>
> could you please tell? out of curiosity what happens when a null is returned by
> find_zone.
>

find_zone will never return a NULL as we have already asserted that condition.
If we are unable to find a zone, we inoke BUG();

> On Thu, May 14, 2009 at 5:55 PM, NeilBrown <neilb@suse.de> wrote:
>> On Thu, May 14, 2009 10:10 pm, Andre Noll wrote:
>>> Yup. Do you want me to rework the patches, or should I follow up with
>>> patches on top of the series?
>>
>> A fresh new set please.  And I agree with everything else you said in
>> the email.
>>
>> NeilBrown
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> -- Sujit K M
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Regards,
Sandeep.





 	
“To learn is to change. Education is a process that changes the learner.”
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 12:22     ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Neil Brown
@ 2009-05-14 15:51       ` raz ben yehuda
  2009-05-14 20:38         ` NeilBrown
  2009-05-15 17:30         ` Andre Noll
  0 siblings, 2 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 15:51 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andre Noll, linux-raid


On Thu, 2009-05-14 at 22:22 +1000, Neil Brown wrote:
> On Thursday May 14, neilb@suse.de wrote:
> > 
> > Then the loop could be
> >    for (i = 0; i < conf->nr_strip_zones; i++)
> >          if (sector < conf->strip_zone[i+1])
> >                return conf->strip_zone[i]
> > 
> > Save our selves an 'add' that way :-)
> 
> That should have been
> 
> > 
> > Then the loop could be
> >    for (i = 0; i < conf->nr_strip_zones; i++)
> >          if (sector < conf->strip_zone[i+1]->zone_start)
> >                return conf->strip_zone[i]
> > 
> > Save ourselves an 'add' that way :-)
> 
> and as we aren't using ->sectors in this loop any more, I think it
> can be discarded completely, just making struct strip_zone 20% smaller
> and improving cache performance.
> 
> What do you think?
hamm.. please do not do it now. I think i need it for the reshape
because I have to traverse each zone up to its end. 
> NeilBrown




^ permalink raw reply	[flat|nested] 39+ messages in thread

* PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity
  2009-05-14 12:25       ` NeilBrown
  2009-05-14 12:54         ` Sujit Karataparambil
@ 2009-05-14 15:58         ` raz ben yehuda
  2009-05-14 14:07           ` Andre Noll
  2009-05-14 22:35           ` Neil Brown
  2009-05-14 16:00         ` Subject: PATCH[002:002] md: raid0: dump raid configuration raz ben yehuda
  2009-05-14 17:12         ` Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0 raz ben yehuda
  3 siblings, 2 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 15:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andre Noll, linux-raid

 move raid0 chunk size to 4K*n granularity. 
 motivation for this patch is to have a better access to raid550. if a raid 5 is 3M
 stripe (4-1),and you have two of these raids 5's, and on top of you have a raid0, 
 it is better to access raid550 with a 3MB buffers and not 1M ( no raid5 write penalty).
 Andre, Patch is applied on top of your last post. now it is your turn to merge :)
 md.c    |   24 ++++++++++-----
 raid0.c |  102 ++++++++++++++++++++++++++++++----------------------------------
 
 2 files changed, 65 insertions(+), 61 deletions(-) 
Signed-Off-by:raziebe@gmail.com

diff --git a/drivers/md/md.c b/drivers/md/md.c
index ed5727c..5eab782 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -440,12 +440,14 @@ static inline sector_t calc_dev_sboffset(struct block_device *bdev)
 	return MD_NEW_SIZE_SECTORS(num_sectors);
 }
 
+
 static sector_t calc_num_sectors(mdk_rdev_t *rdev, unsigned chunk_size)
 {
 	sector_t num_sectors = rdev->sb_start;
-
-	if (chunk_size)
-		num_sectors &= ~((sector_t)chunk_size/512 - 1);
+	if (chunk_size) {
+		int chunk_sects = chunk_size>>9;
+		num_sectors = (num_sectors/chunk_sects)*chunk_sects;
+	}
 	return num_sectors;
 }
 
@@ -3512,7 +3514,7 @@ min_sync_store(mddev_t *mddev, const char *buf, size_t len)
 
 	/* Must be a multiple of chunk_size */
 	if (mddev->chunk_size) {
-		if (min & (sector_t)((mddev->chunk_size>>9)-1))
+		if (min % (sector_t)(mddev->chunk_size>>9))
 			return -EINVAL;
 	}
 	mddev->resync_min = min;
@@ -3549,7 +3551,7 @@ max_sync_store(mddev_t *mddev, const char *buf, size_t len)
 
 		/* Must be a multiple of chunk_size */
 		if (mddev->chunk_size) {
-			if (max & (sector_t)((mddev->chunk_size>>9)-1))
+			if (max % (sector_t)((mddev->chunk_size>>9)))
 				return -EINVAL;
 		}
 		mddev->resync_max = max;
@@ -3993,11 +3995,19 @@ static int do_md_run(mddev_t * mddev)
 		/*
 		 * chunk-size has to be a power of 2
 		 */
-		if ( (1 << ffz(~chunk_size)) != chunk_size) {
+		if ((1 << ffz(~chunk_size)) != chunk_size &&
+			 mddev->level != 0) {
 			printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size);
 			return -EINVAL;
 		}
-
+		/*
+		* raid0 chunk size has to divide by a page
+		*/
+		if (mddev->level == 0 && (chunk_size % 4096)) {
+			printk(KERN_ERR "chunk_size of %d not valid\n",
+				chunk_size);
+			return -EINVAL;
+		}
 		/* devices must have minimum size of one chunk */
 		list_for_each_entry(rdev, &mddev->disks, same_set) {
 			if (test_bit(Faulty, &rdev->flags))
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 36b747a..9865316 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -53,7 +53,7 @@ static int raid0_congested(void *data, int bits)
 }
 
 
-static int create_strip_zones (mddev_t *mddev)
+static int raid0_create_strip_zones(mddev_t *mddev)
 {
 	int i, c, j;
 	sector_t current_start, curr_zone_start;
@@ -237,7 +237,7 @@ static int raid0_mergeable_bvec(struct request_queue *q,
 	unsigned int chunk_sectors = mddev->chunk_size >> 9;
 	unsigned int bio_sectors = bvm->bi_size >> 9;
 
-	max =  (chunk_sectors - ((sector & (chunk_sectors - 1)) + bio_sectors)) << 9;
+	max =  (chunk_sectors - ((sector % chunk_sectors) + bio_sectors)) << 9;
 	if (max < 0) max = 0; /* bio_add cannot handle a negative return */
 	if (max <= biovec->bv_len && bio_sectors == 0)
 		return biovec->bv_len;
@@ -259,26 +259,37 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 	return array_sectors;
 }
 
+static int raid0_is_power2_chunk(mddev_t *mddev)
+{
+	if ((1 << ffz(~mddev->chunk_size)) == mddev->chunk_size)
+		return 1;
+	return 0;
+}
+
+
 static int raid0_run(mddev_t *mddev)
 {
 	int ret;
+	int segment_boundary =  (mddev->chunk_size>>1)-1;
 
 	if (mddev->chunk_size == 0) {
 		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
 		return -EINVAL;
 	}
-	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
-	       mdname(mddev),
-	       mddev->chunk_size >> 9,
-	       (mddev->chunk_size>>1)-1);
 	blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9);
-	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
+	if (!raid0_is_power2_chunk(mddev))
+		segment_boundary = ~(ffz(~mddev->chunk_size))>>1;
+	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
+		mdname(mddev),
+		mddev->chunk_size >> 9,
+		segment_boundary);
+	blk_queue_segment_boundary(mddev->queue, segment_boundary);
 	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
 
 	mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
 	if (!mddev->private)
 		return -ENOMEM;
-	ret = create_strip_zones(mddev);
+	ret = raid0_create_strip_zones(mddev);
 	if (ret < 0) {
 		kfree(mddev->private);
 		mddev->private = NULL;
@@ -322,31 +333,35 @@ static int raid0_stop (mddev_t *mddev)
 	return 0;
 }
 
-/* Find the zone which holds a particular offset */
-static struct strip_zone *find_zone(struct raid0_private_data *conf,
-		sector_t sector)
+static int raid0_position_bio(mddev_t *mddev, struct bio *bio, sector_t sector)
 {
-	int i;
-
-	for (i = 0; i < conf->nr_strip_zones; i++) {
-		struct strip_zone *z = conf->strip_zone + i;
-
-		if (sector < z->zone_start + z->sectors)
-			return z;
-	}
-	BUG();
-	return NULL;
+	sector_t sect_in_chunk;
+	mdk_rdev_t *tmp_dev;
+	sector_t chunk_in_dev;
+	sector_t rsect;
+	sector_t x;
+	raid0_conf_t *conf = mddev_to_conf(mddev);
+	sector_t chunk_sects = mddev->chunk_size >> 9;
+	struct strip_zone *zone = &conf->strip_zone[0];
+
+	while (sector >= zone->zone_start + zone->sectors)
+		zone++;
+	sect_in_chunk = sector % chunk_sects;
+	x = (sector - zone->zone_start) / chunk_sects;
+	sector_div(x, zone->nb_dev);
+	chunk_in_dev = x;
+	x = sector / chunk_sects;
+	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
+	rsect = (chunk_in_dev * chunk_sects) + zone->dev_start + sect_in_chunk;
+	bio->bi_bdev = tmp_dev->bdev;
+	bio->bi_sector = rsect + tmp_dev->data_offset;
+	return 0;
 }
 
-static int raid0_make_request (struct request_queue *q, struct bio *bio)
+static int raid0_make_request(struct request_queue *q, struct bio *bio)
 {
 	mddev_t *mddev = q->queuedata;
-	unsigned int sect_in_chunk, chunksect_bits, chunk_sects;
-	raid0_conf_t *conf = mddev_to_conf(mddev);
-	struct strip_zone *zone;
-	mdk_rdev_t *tmp_dev;
-	sector_t chunk;
-	sector_t sector, rsect;
+	unsigned int chunk_sects;
 	const int rw = bio_data_dir(bio);
 	int cpu;
 
@@ -362,10 +377,9 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
 	part_stat_unlock();
 
 	chunk_sects = mddev->chunk_size >> 9;
-	chunksect_bits = ffz(~chunk_sects);
-	sector = bio->bi_sector;
 
-	if (unlikely(chunk_sects < (bio->bi_sector & (chunk_sects - 1)) + (bio->bi_size >> 9))) {
+	if (unlikely(chunk_sects < ((bio->bi_sector % chunk_sects)
+			+ (bio->bi_size >> 9)))) {
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
 		if (bio->bi_vcnt != 1 ||
@@ -374,7 +388,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
 		/* This is a one page bio that upper layers
 		 * refuse to split for us, so we need to split it.
 		 */
-		bp = bio_split(bio, chunk_sects - (bio->bi_sector & (chunk_sects - 1)));
+		bp = bio_split(bio, chunk_sects -
+					(bio->bi_sector % chunk_sects));
 		if (raid0_make_request(q, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (raid0_make_request(q, &bp->bio2))
@@ -383,29 +398,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
 		bio_pair_release(bp);
 		return 0;
 	}
-	zone = find_zone(conf, sector);
-	if (!zone)
+	if (!raid0_position_bio(mddev, bio, bio->bi_sector))
 		return 1;
-	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
-	{
-		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
-
-		sector_div(x, zone->nb_dev);
-		chunk = x;
-
-		x = sector >> chunksect_bits;
-		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
-	}
-	rsect = (chunk << chunksect_bits) + zone->dev_start + sect_in_chunk;
- 
-	bio->bi_bdev = tmp_dev->bdev;
-	bio->bi_sector = rsect + tmp_dev->data_offset;
-
-	/*
-	 * Let the main block layer submit the IO and resolve recursion:
-	 */
-	return 1;
-
 bad_map:
 	printk("raid0_make_request bug: can't convert block across chunks"
 		" or bigger than %dk %llu %d\n", chunk_sects / 2,



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Subject: PATCH[002:002] md: raid0: dump raid configuration
  2009-05-14 12:25       ` NeilBrown
  2009-05-14 12:54         ` Sujit Karataparambil
  2009-05-14 15:58         ` PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity raz ben yehuda
@ 2009-05-14 16:00         ` raz ben yehuda
  2009-05-14 17:12         ` Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0 raz ben yehuda
  3 siblings, 0 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 16:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Andre Noll, linux-raid

Tell the user how raid0 looks like. 
 raid0.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
Signed-Off-by:raziebe@gmail.com
---
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 9865316..2ce81d0 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -267,6 +267,29 @@ static int raid0_is_power2_chunk(mddev_t *mddev)
 }
 
 
+static void raid0_dump_zones(mddev_t *mddev)
+{
+	int j, k, h;
+	char b[BDEVNAME_SIZE];
+	raid0_conf_t *conf = mddev_to_conf(mddev);
+	printk(KERN_INFO "***** %s configuration ******",
+		mdname(mddev));
+	h = 0;
+	for (j = 0; j < conf->nr_strip_zones; j++) {
+		printk("\nzone%d", j);
+		printk("=[");
+		for (k = 0; k < conf->strip_zone[j].nb_dev; k++)
+			printk("%s/", bdevname(
+				conf->strip_zone[j].dev[k]->bdev, b));
+		printk("]\n\t zone offset=%llu device offset=%llu size=%llukb\n",
+			(unsigned long long)conf->strip_zone[j].zone_start,
+			(unsigned long long)conf->strip_zone[j].dev_start,
+			(unsigned long long)conf->strip_zone[j].sectors>>1);
+	}
+	printk(KERN_INFO "**********************************\n\n");
+}
+
+
 static int raid0_run(mddev_t *mddev)
 {
 	int ret;
@@ -317,6 +340,7 @@ static int raid0_run(mddev_t *mddev)
 	}
 
 	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
+	raid0_dump_zones(mddev);
 	return 0;
 }
 



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-14 12:25       ` NeilBrown
                           ` (2 preceding siblings ...)
  2009-05-14 16:00         ` Subject: PATCH[002:002] md: raid0: dump raid configuration raz ben yehuda
@ 2009-05-14 17:12         ` raz ben yehuda
  2009-05-15  3:59           ` Sujit Karataparambil
  3 siblings, 1 reply; 39+ messages in thread
From: raz ben yehuda @ 2009-05-14 17:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Neil.
I do not know what is policy for mdadm patches. I would thankful for some instructions.
The bellow is the patch I applied to support raid0 chunk sizes 4K*n.
---
--- mdadm.c	(revision 16152)
+++ mdadm.c	(working copy)
@@ -314,7 +314,8 @@
 				exit(2);
 			}
 			chunk = strtol(optarg, &c, 10);
-			if (!optarg[0] || *c || chunk<4 || (chunk%4)) {
+			if (!optarg[0] || *c || chunk<4 ||  (((chunk-1)&chunk) && level !=0)
+					||  (((chunk%4) && level ==0) )) {
 				fprintf(stderr, Name ": invalid chunk/rounding value: %s\n",
 					optarg);
 				exit(2);



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over   all strip_zones.
  2009-05-14 15:51       ` raz ben yehuda
@ 2009-05-14 20:38         ` NeilBrown
  2009-05-15 13:18           ` Andre Noll
  2009-05-15 17:30         ` Andre Noll
  1 sibling, 1 reply; 39+ messages in thread
From: NeilBrown @ 2009-05-14 20:38 UTC (permalink / raw)
  To: raz ben yehuda; +Cc: Andre Noll, linux-raid

On Fri, May 15, 2009 1:51 am, raz ben yehuda wrote:
>
> On Thu, 2009-05-14 at 22:22 +1000, Neil Brown wrote:
>> On Thursday May 14, neilb@suse.de wrote:
>> >
>> > Then the loop could be
>> >    for (i = 0; i < conf->nr_strip_zones; i++)
>> >          if (sector < conf->strip_zone[i+1])
>> >                return conf->strip_zone[i]
>> >
>> > Save our selves an 'add' that way :-)
>>
>> That should have been
>>
>> >
>> > Then the loop could be
>> >    for (i = 0; i < conf->nr_strip_zones; i++)
>> >          if (sector < conf->strip_zone[i+1]->zone_start)
>> >                return conf->strip_zone[i]
>> >
>> > Save ourselves an 'add' that way :-)
>>
>> and as we aren't using ->sectors in this loop any more, I think it
>> can be discarded completely, just making struct strip_zone 20% smaller
>> and improving cache performance.
>>
>> What do you think?
> hamm.. please do not do it now. I think i need it for the reshape
> because I have to traverse each zone up to its end.

You don't lose that ability.  Then end of one zone matches the
beginning of the next zone.  So run from
   zone[0].zone_start
to
   zone[1].zone_start

I was thinking further about this and realised that in the first zone,
zone_start is always 0.  And we added a sentinal zone for which only
the zone_start value is set.  This seems silly.
Instead we can just store 'zone_end' in each zone (or possibly
start_of_next_zone) and then we don't need the sentinal and don't waste
the zone_start at the beginning.

NeilBrown


>> NeilBrown
>
>
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity
  2009-05-14 15:58         ` PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity raz ben yehuda
  2009-05-14 14:07           ` Andre Noll
@ 2009-05-14 22:35           ` Neil Brown
  2009-05-18 22:58             ` raz ben yehuda
  1 sibling, 1 reply; 39+ messages in thread
From: Neil Brown @ 2009-05-14 22:35 UTC (permalink / raw)
  To: raz ben yehuda; +Cc: Andre Noll, linux-raid

On Thursday May 14, raziebe@gmail.com wrote:
>  move raid0 chunk size to 4K*n granularity. 
>  motivation for this patch is to have a better access to raid550. if a raid 5 is 3M
>  stripe (4-1),and you have two of these raids 5's, and on top of you have a raid0, 
>  it is better to access raid550 with a 3MB buffers and not 1M ( no raid5 write penalty).

I don't think you can justify "no raid5 write penalty" but it could
possibly reduce the raid5 write penalty.

>  Andre, Patch is applied on top of your last post. now it is your turn to merge :)
>  md.c    |   24 ++++++++++-----
>  raid0.c |  102 ++++++++++++++++++++++++++++++----------------------------------
>  
>  2 files changed, 65 insertions(+), 61 deletions(-) 
> Signed-Off-by:raziebe@gmail.com
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index ed5727c..5eab782 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -440,12 +440,14 @@ static inline sector_t calc_dev_sboffset(struct block_device *bdev)
>  	return MD_NEW_SIZE_SECTORS(num_sectors);
>  }
>  
> +

Just a single blank line between functions please.

>  static sector_t calc_num_sectors(mdk_rdev_t *rdev, unsigned chunk_size)
>  {
>  	sector_t num_sectors = rdev->sb_start;
> -
> -	if (chunk_size)
> -		num_sectors &= ~((sector_t)chunk_size/512 - 1);
> +	if (chunk_size) {
> +		int chunk_sects = chunk_size>>9;
> +		num_sectors = (num_sectors/chunk_sects)*chunk_sects;
> +	}

You need to use "sector_div" here.  On a 32 bit host with 64bit
sector_t you cannot use normal division.
sector_div takes a sector_t and an "unsigned long", divides the one by
the other, stores the quotient in the sector_t and returns the
remainder.

>  	return num_sectors;
>  }
>  
> @@ -3512,7 +3514,7 @@ min_sync_store(mddev_t *mddev, const char *buf, size_t len)
>  
>  	/* Must be a multiple of chunk_size */
>  	if (mddev->chunk_size) {
> -		if (min & (sector_t)((mddev->chunk_size>>9)-1))
> +		if (min % (sector_t)(mddev->chunk_size>>9))

sector_div again.

>  			return -EINVAL;
>  	}
>  	mddev->resync_min = min;
> @@ -3549,7 +3551,7 @@ max_sync_store(mddev_t *mddev, const char *buf, size_t len)
>  
>  		/* Must be a multiple of chunk_size */
>  		if (mddev->chunk_size) {
> -			if (max & (sector_t)((mddev->chunk_size>>9)-1))
> +			if (max % (sector_t)((mddev->chunk_size>>9)))

and again.

>  				return -EINVAL;
>  		}
>  		mddev->resync_max = max;
> @@ -3993,11 +3995,19 @@ static int do_md_run(mddev_t * mddev)
>  		/*
>  		 * chunk-size has to be a power of 2
>  		 */
> -		if ( (1 << ffz(~chunk_size)) != chunk_size) {
> +		if ((1 << ffz(~chunk_size)) != chunk_size &&
> +			 mddev->level != 0) {
>  			printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size);
>  			return -EINVAL;
>  		}
> -
> +		/*
> +		* raid0 chunk size has to divide by a page
> +		*/
> +		if (mddev->level == 0 && (chunk_size % 4096)) {
> +			printk(KERN_ERR "chunk_size of %d not valid\n",
> +				chunk_size);
> +			return -EINVAL;
> +		}

Why does the chunk size have to be a multiple of page size?
If there is a reason, include it in the comment.
And the size of a page is PAGE_SIZE, not 4096.  On some systems
PAGE_SIZE is 65536.


>  		/* devices must have minimum size of one chunk */
>  		list_for_each_entry(rdev, &mddev->disks, same_set) {
>  			if (test_bit(Faulty, &rdev->flags))
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 36b747a..9865316 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -53,7 +53,7 @@ static int raid0_congested(void *data, int bits)
>  }
>  
>  
> -static int create_strip_zones (mddev_t *mddev)
> +static int raid0_create_strip_zones(mddev_t *mddev)

Why did you add the "raid0_".  There is no value in a prefix like that
on static functions.  Removing the trailing space is good though!

>  {
>  	int i, c, j;
>  	sector_t current_start, curr_zone_start;
> @@ -237,7 +237,7 @@ static int raid0_mergeable_bvec(struct request_queue *q,
>  	unsigned int chunk_sectors = mddev->chunk_size >> 9;
>  	unsigned int bio_sectors = bvm->bi_size >> 9;
>  
> -	max =  (chunk_sectors - ((sector & (chunk_sectors - 1)) + bio_sectors)) << 9;
> +	max =  (chunk_sectors - ((sector % chunk_sectors) + bio_sectors)) << 9;

sector_div...  but you get the point, I'll stop mentioning it.

>  	if (max < 0) max = 0; /* bio_add cannot handle a negative return */
>  	if (max <= biovec->bv_len && bio_sectors == 0)
>  		return biovec->bv_len;
> @@ -259,26 +259,37 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>  	return array_sectors;
>  }
>  
> +static int raid0_is_power2_chunk(mddev_t *mddev)
> +{
> +	if ((1 << ffz(~mddev->chunk_size)) == mddev->chunk_size)
> +		return 1;
> +	return 0;
> +}

include/linux/log2.h has is_power_of_2.  Use that.
And just say
  is_power_of_2(mddev->chunk_size)
where it is needed.  No need for a separate function.


> +
> +
>  static int raid0_run(mddev_t *mddev)
>  {
>  	int ret;
> +	int segment_boundary =  (mddev->chunk_size>>1)-1;
>  
>  	if (mddev->chunk_size == 0) {
>  		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
>  		return -EINVAL;
>  	}
> -	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
> -	       mdname(mddev),
> -	       mddev->chunk_size >> 9,
> -	       (mddev->chunk_size>>1)-1);
>  	blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9);
> -	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
> +	if (!raid0_is_power2_chunk(mddev))
> +		segment_boundary = ~(ffz(~mddev->chunk_size))>>1;
> +	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
> +		mdname(mddev),
> +		mddev->chunk_size >> 9,
> +		segment_boundary);
> +	blk_queue_segment_boundary(mddev->queue, segment_boundary);

Setting blk_queue_segment_boundary here seems wrong.  I know you
didn't add that code, you just modified.  But I think it might be
best to remove it first - in a separate patch.
I've just asked Peter Chubb who added that code in the first place if
he has any idea why it was there.

>  	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
>  
>  	mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
>  	if (!mddev->private)
>  		return -ENOMEM;
> -	ret = create_strip_zones(mddev);
> +	ret = raid0_create_strip_zones(mddev);
>  	if (ret < 0) {
>  		kfree(mddev->private);
>  		mddev->private = NULL;
> @@ -322,31 +333,35 @@ static int raid0_stop (mddev_t *mddev)
>  	return 0;
>  }
>  
> -/* Find the zone which holds a particular offset */
> -static struct strip_zone *find_zone(struct raid0_private_data *conf,
> -		sector_t sector)
> +static int raid0_position_bio(mddev_t *mddev, struct bio *bio, sector_t sector)

You've removed a comment, changed the name of the function that it
described, but have not provided a new comment to describe the new
function..

And no "raid0_" prefix please.  I know a lot of function do still have
that prefix but I would like the number to decrease, not increase.  It
is redundant.


>  {
> -	int i;
> -
> -	for (i = 0; i < conf->nr_strip_zones; i++) {
> -		struct strip_zone *z = conf->strip_zone + i;
> -
> -		if (sector < z->zone_start + z->sectors)
> -			return z;
> -	}
> -	BUG();
> -	return NULL;
> +	sector_t sect_in_chunk;
> +	mdk_rdev_t *tmp_dev;
> +	sector_t chunk_in_dev;
> +	sector_t rsect;
> +	sector_t x;
> +	raid0_conf_t *conf = mddev_to_conf(mddev);
> +	sector_t chunk_sects = mddev->chunk_size >> 9;
> +	struct strip_zone *zone = &conf->strip_zone[0];
> +
> +	while (sector >= zone->zone_start + zone->sectors)
> +		zone++;
> +	sect_in_chunk = sector % chunk_sects;
> +	x = (sector - zone->zone_start) / chunk_sects;
> +	sector_div(x, zone->nb_dev);
> +	chunk_in_dev = x;
> +	x = sector / chunk_sects;
> +	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> +	rsect = (chunk_in_dev * chunk_sects) + zone->dev_start + sect_in_chunk;
> +	bio->bi_bdev = tmp_dev->bdev;
> +	bio->bi_sector = rsect + tmp_dev->data_offset;
> +	return 0;
>  }

I would really like it if we only did division if division were need,
and just use shift when that is possible.
e..g
  if (is_power_of_two())
      do the shift version
  else
      do the divide version.


>  
> -static int raid0_make_request (struct request_queue *q, struct bio *bio)
> +static int raid0_make_request(struct request_queue *q, struct bio *bio)
>  {
>  	mddev_t *mddev = q->queuedata;
> -	unsigned int sect_in_chunk, chunksect_bits, chunk_sects;
> -	raid0_conf_t *conf = mddev_to_conf(mddev);
> -	struct strip_zone *zone;
> -	mdk_rdev_t *tmp_dev;
> -	sector_t chunk;
> -	sector_t sector, rsect;
> +	unsigned int chunk_sects;
>  	const int rw = bio_data_dir(bio);
>  	int cpu;
>  
> @@ -362,10 +377,9 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  	part_stat_unlock();
>  
>  	chunk_sects = mddev->chunk_size >> 9;
> -	chunksect_bits = ffz(~chunk_sects);
> -	sector = bio->bi_sector;
>  
> -	if (unlikely(chunk_sects < (bio->bi_sector & (chunk_sects - 1)) + (bio->bi_size >> 9))) {
> +	if (unlikely(chunk_sects < ((bio->bi_sector % chunk_sects)
> +			+ (bio->bi_size >> 9)))) {

Please keep things lines up properly.  The contents of a bracketed
expression should never be to the left of the opening bracket, unless
the opening bracket is the last symbol on the line.
So this should be:
> +	if (unlikely(chunk_sects < ((bio->bi_sector % chunk_sects)
> +			            + (bio->bi_size >> 9)))) {

>  		struct bio_pair *bp;
>  		/* Sanity check -- queue functions should prevent this happening */
>  		if (bio->bi_vcnt != 1 ||
> @@ -374,7 +388,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  		/* This is a one page bio that upper layers
>  		 * refuse to split for us, so we need to split it.
>  		 */
> -		bp = bio_split(bio, chunk_sects - (bio->bi_sector & (chunk_sects - 1)));
> +		bp = bio_split(bio, chunk_sects -
> +					(bio->bi_sector % chunk_sects));

and this should be
> +		bp = bio_split(bio, chunk_sects -
> +				    (bio->bi_sector % chunk_sects));


>  		if (raid0_make_request(q, &bp->bio1))
>  			generic_make_request(&bp->bio1);
>  		if (raid0_make_request(q, &bp->bio2))
> @@ -383,29 +398,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
>  		bio_pair_release(bp);
>  		return 0;
>  	}
> -	zone = find_zone(conf, sector);
> -	if (!zone)
> +	if (!raid0_position_bio(mddev, bio, bio->bi_sector))
>  		return 1;
> -	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> -	{
> -		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
> -
> -		sector_div(x, zone->nb_dev);
> -		chunk = x;
> -
> -		x = sector >> chunksect_bits;
> -		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> -	}
> -	rsect = (chunk << chunksect_bits) + zone->dev_start + sect_in_chunk;
> - 
> -	bio->bi_bdev = tmp_dev->bdev;
> -	bio->bi_sector = rsect + tmp_dev->data_offset;
> -
> -	/*
> -	 * Let the main block layer submit the IO and resolve recursion:
> -	 */
> -	return 1;
> -
>  bad_map:
>  	printk("raid0_make_request bug: can't convert block across chunks"
>  		" or bigger than %dk %llu %d\n", chunk_sects / 2,
> 


Thanks.
I haven't thoroughly read through 'position_bio' so I reserve the
right to comment further on that :-)
But apart from the various observations above, it looks like it is
heading in the right direction.

NeilBrown

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-14 17:12         ` Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0 raz ben yehuda
@ 2009-05-15  3:59           ` Sujit Karataparambil
  2009-05-15  6:01             ` Raz
  0 siblings, 1 reply; 39+ messages in thread
From: Sujit Karataparambil @ 2009-05-15  3:59 UTC (permalink / raw)
  To: raz ben yehuda; +Cc: NeilBrown, linux-raid

Neil,

I donot understand why this needs to be done in raid. Why should be the raid
taking care of 4k*n data sets.

What I see of this patch is that it specifically checks for some sort
of 4K Boundaries.

Thanks,
Sujit

On Thu, May 14, 2009 at 10:42 PM, raz ben yehuda <raziebe@gmail.com> wrote:
> Neil.
> I do not know what is policy for mdadm patches. I would thankful for some instructions.
> The bellow is the patch I applied to support raid0 chunk sizes 4K*n.
> ---
> --- mdadm.c     (revision 16152)
> +++ mdadm.c     (working copy)
> @@ -314,7 +314,8 @@
>                                exit(2);
>                        }
>                        chunk = strtol(optarg, &c, 10);
> -                       if (!optarg[0] || *c || chunk<4 || (chunk%4)) {
> +                       if (!optarg[0] || *c || chunk<4 ||  (((chunk-1)&chunk) && level !=0)
> +                                       ||  (((chunk%4) && level ==0) )) {
>                                fprintf(stderr, Name ": invalid chunk/rounding value: %s\n",
>                                        optarg);
>                                exit(2);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
-- Sujit K M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-15  3:59           ` Sujit Karataparambil
@ 2009-05-15  6:01             ` Raz
  2009-05-15  6:45               ` Sujit Karataparambil
  0 siblings, 1 reply; 39+ messages in thread
From: Raz @ 2009-05-15  6:01 UTC (permalink / raw)
  To: Sujit Karataparambil; +Cc: linux-raid

On Fri, May 15, 2009 at 6:59 AM, Sujit Karataparambil <sjt.kar@gmail.com> wrote:
> Neil,
>
> I donot understand why this needs to be done in raid. Why should be the raid
where else ?
> taking care of 4k*n data sets.
> What I see of this patch is that it specifically checks for some sort
> of 4K Boundaries.
instead of 4K*2^n where n>=0
correct and recompure IO posistion.
> Thanks,
> Sujit
>
> On Thu, May 14, 2009 at 10:42 PM, raz ben yehuda <raziebe@gmail.com> wrote:
>> Neil.
>> I do not know what is policy for mdadm patches. I would thankful for some instructions.
>> The bellow is the patch I applied to support raid0 chunk sizes 4K*n.
>> ---
>> --- mdadm.c     (revision 16152)
>> +++ mdadm.c     (working copy)
>> @@ -314,7 +314,8 @@
>>                                exit(2);
>>                        }
>>                        chunk = strtol(optarg, &c, 10);
>> -                       if (!optarg[0] || *c || chunk<4 || (chunk%4)) {
>> +                       if (!optarg[0] || *c || chunk<4 ||  (((chunk-1)&chunk) && level !=0)
>> +                                       ||  (((chunk%4) && level ==0) )) {
>>                                fprintf(stderr, Name ": invalid chunk/rounding value: %s\n",
>>                                        optarg);
>>                                exit(2);
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> -- Sujit K M
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-15  6:01             ` Raz
@ 2009-05-15  6:45               ` Sujit Karataparambil
  2009-05-15  8:39                 ` NeilBrown
  0 siblings, 1 reply; 39+ messages in thread
From: Sujit Karataparambil @ 2009-05-15  6:45 UTC (permalink / raw)
  To: Raz; +Cc: linux-raid, neilb

On Fri, May 15, 2009 at 11:31 AM, Raz <raziebe@gmail.com> wrote:
> where else ?

See you were the one who has sent out the patch. Ok? Then you should
know the reason for patch? What I had asked from Neil is an RFC Cross
Checking the Patch. So I donot expect an Reply From You. Ok?
>> taking care of 4k*n data sets.
>> What I see of this patch is that it specifically checks for some sort
>> of 4K Boundaries.


> instead of 4K*2^n where n>=0
> correct and recompure IO posistion.
>>>                                fprintf(stderr, Name ": invalid chunk/rounding value: %s\n",

Could you tell me how 4K*2^n is  used to do the patch what ever it was
sent for? Or
is it that the patch is incomplete.

Kindly Maintain Some sort of mailing list courtesy.Ok? Reply only when
you are asked to.
Ok?

-- 
-- Sujit K M

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-15  6:45               ` Sujit Karataparambil
@ 2009-05-15  8:39                 ` NeilBrown
  2009-05-15 15:45                   ` Raz
  0 siblings, 1 reply; 39+ messages in thread
From: NeilBrown @ 2009-05-15  8:39 UTC (permalink / raw)
  To: Sujit Karataparambil; +Cc: Raz, linux-raid

On Fri, May 15, 2009 4:45 pm, Sujit Karataparambil wrote:
> On Fri, May 15, 2009 at 11:31 AM, Raz <raziebe@gmail.com> wrote:
>> where else ?
>
> See you were the one who has sent out the patch. Ok? Then you should
> know the reason for patch? What I had asked from Neil is an RFC Cross
> Checking the Patch. So I donot expect an Reply From You. Ok?
>>> taking care of 4k*n data sets.
>>> What I see of this patch is that it specifically checks for some sort
>>> of 4K Boundaries.
>
>
>> instead of 4K*2^n where n>=0
>> correct and recompure IO posistion.
>>>>                                fprintf(stderr, Name ": invalid
>>>> chunk/rounding value: %s\n",
>
> Could you tell me how 4K*2^n is  used to do the patch what ever it was
> sent for? Or
> is it that the patch is incomplete.
>
> Kindly Maintain Some sort of mailing list courtesy.Ok? Reply only when
> you are asked to.
> Ok?

That really isn't an appropriate thing to say.  If people want to
reply (politely) this questions directed at other people on a public
list, I think that is quite appropriate.  In fact I value it as it
often means I don't need to reply because someone else has done
it for me.

Now I agree that "where else" is not a very helpful reply, but I'm
not sure that the original question was crystal clear either.  Maybe
there is a bit of a language barrier - I'm sure not everyone on this
list has English as their first language, yet English is the language
we use (for which I'm very thankful as I don't speak anything else).
So a little bit of patience all around would be good.

Also it would help if explanations were a little more detailed, and
even if questions contained more detail too.

In the case of the original patch, I'm not really sure of the point
of the patch myself.

The text
  "I do not know what is policy for mdadm patches. I would thankful for some
instructions."

is clear enough.  The answer is "send them to me" in much the same format
as the kernel patches you have seen sent by e.g. Andre Noll.

The text
   "The bellow is the patch I applied to support raid0 chunk sizes 4K*n."

confuses me as it doesn't seem add anything, it only removes.
i.e. it removes the possibility of passing a non-power-of-2
chunk size to raid levels other than 0.

That might be an appropriate thing to do, but the commentary that
came with the patch doesn't say it does that, and says something
quite different, so I'm not really sure what to make of it.  Which
is one of the reasons I didn't reply straight away - I wanted to give
myself time to read it again and make sure I wasn't missing anything.


NeilBrown

>
> --
> -- Sujit K M
>


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 20:38         ` NeilBrown
@ 2009-05-15 13:18           ` Andre Noll
  0 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-15 13:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: raz ben yehuda, linux-raid

[-- Attachment #1: Type: text/plain, Size: 596 bytes --]

On 06:38, NeilBrown wrote:

> I was thinking further about this and realised that in the first zone,
> zone_start is always 0.  And we added a sentinal zone for which only
> the zone_start value is set.  This seems silly.

Yes, it is.

> Instead we can just store 'zone_end' in each zone (or possibly
> start_of_next_zone) and then we don't need the sentinal and don't waste
> the zone_start at the beginning.

That's what I did in the new version of the patch set. I'll send it out
shortly.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0
  2009-05-15  8:39                 ` NeilBrown
@ 2009-05-15 15:45                   ` Raz
  0 siblings, 0 replies; 39+ messages in thread
From: Raz @ 2009-05-15 15:45 UTC (permalink / raw)
  To: NeilBrown, sjt.kar; +Cc: Linux RAID Mailing List

Well, first, anyone that  somehow in someway got confused or insulted
please accept my applogies.
This all thing is just a big missunderstanding, and from my part.
please see inline.
I hope we are ok now Sujit.

On 5/15/09, NeilBrown <neilb@suse.de> wrote:
> On Fri, May 15, 2009 4:45 pm, Sujit Karataparambil wrote:
> > On Fri, May 15, 2009 at 11:31 AM, Raz <raziebe@gmail.com> wrote:
> >> where else ?
> >
> > See you were the one who has sent out the patch. Ok? Then you should
> > know the reason for patch? What I had asked from Neil is an RFC Cross
> > Checking the Patch. So I donot expect an Reply From You. Ok?
> >>> taking care of 4k*n data sets.
> >>> What I see of this patch is that it specifically checks for some sort
> >>> of 4K Boundaries.
> >
> >
> >> instead of 4K*2^n where n>=0
> >> correct and recompure IO posistion.
> >>>>                                fprintf(stderr, Name ": invalid
> >>>> chunk/rounding value: %s\n",
> >
> > Could you tell me how 4K*2^n is  used to do the patch what ever it was
> > sent for? Or
> > is it that the patch is incomplete.
> >
> > Kindly Maintain Some sort of mailing list courtesy.Ok? Reply only when
> > you are asked to.
> > Ok?
>
> That really isn't an appropriate thing to say.  If people want to
> reply (politely) this questions directed at other people on a public
> list, I think that is quite appropriate.  In fact I value it as it
> often means I don't need to reply because someone else has done
> it for me.
>
> Now I agree that "where else" is not a very helpful reply, but I'm
ok. sorry about that. i just do not understand what the problem.
> not sure that the original question was crystal clear either.  Maybe
> there is a bit of a language barrier - I'm sure not everyone on this
> list has English as their first language, yet English is the language
> we use (for which I'm very thankful as I don't speak anything else).
> So a little bit of patience all around would be good.
>
> Also it would help if explanations were a little more detailed, and
> even if questions contained more detail too.
>
> In the case of the original patch, I'm not really sure of the point
> of the patch myself.
>
> The text
>  "I do not know what is policy for mdadm patches. I would thankful for some
> instructions."
>
> is clear enough.  The answer is "send them to me" in much the same format
> as the kernel patches you have seen sent by e.g. Andre Noll.
> The text
>   "The bellow is the patch I applied to support raid0 chunk sizes 4K*n."
>
> confuses me as it doesn't seem add anything, it only removes.
> i.e. it removes the possibility of passing a non-power-of-2
> chunk size to raid levels other than 0.
lol.... patch is my mistake. patch wat not against your original code .
Original mdadm.c code is:
if (!optarg[0] || *c || chunk<4 || ((chunk-1)&chunk))  ...

All I want is to enable the chunk fix only to raid0. This is all my
mistake because i am not working under your mdadm git repository but
under a private svn repository. Would you be so kind to direct me to
your git's url ( there are several mdadm gits i found in google ), I
will repost this fix.

Thank you.
raz
> That might be an appropriate thing to do, but the commentary that
> came with the patch doesn't say it does that, and says something
> quite different, so I'm not really sure what to make of it.  Which
> is one of the reasons I didn't reply straight away - I wanted to give
> myself time to read it again and make sure I wasn't missing anything.
>
>
> NeilBrown
>
> >
> > --
> > -- Sujit K M
> >
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-14 15:51       ` raz ben yehuda
  2009-05-14 20:38         ` NeilBrown
@ 2009-05-15 17:30         ` Andre Noll
  2009-05-15 21:19           ` Raz
  1 sibling, 1 reply; 39+ messages in thread
From: Andre Noll @ 2009-05-15 17:30 UTC (permalink / raw)
  To: raz ben yehuda; +Cc: Neil Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On 18:51, raz ben yehuda wrote:

> > and as we aren't using ->sectors in this loop any more, I think it
> > can be discarded completely, just making struct strip_zone 20% smaller
> > and improving cache performance.
> > 
> > What do you think?
> hamm.. please do not do it now. I think i need it for the reshape
> because I have to traverse each zone up to its end. 

Well, you can always get the zone size of zone #i by

	z[i].zone_end - z[i - 1].zone_end

(with z[-1].zone_end := 0). Since reshape is a slow operation anyway,
the additional subtraction won't hurt, and we'd have the smaller
cache footprint during normal operation. So I'd be inclined to kill
->sectors as well.

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-15 17:30         ` Andre Noll
@ 2009-05-15 21:19           ` Raz
  2009-05-18  8:21             ` Andre Noll
  0 siblings, 1 reply; 39+ messages in thread
From: Raz @ 2009-05-15 21:19 UTC (permalink / raw)
  To: Andre Noll; +Cc: Neil Brown, linux-raid

I have not patched yet my code with this new serie, but where is patch
1/6 ? Does it exist or simply wrong numbering ?


On Fri, May 15, 2009 at 8:30 PM, Andre Noll <maan@systemlinux.org> wrote:
> On 18:51, raz ben yehuda wrote:
>
>> > and as we aren't using ->sectors in this loop any more, I think it
>> > can be discarded completely, just making struct strip_zone 20% smaller
>> > and improving cache performance.
>> >
>> > What do you think?
>> hamm.. please do not do it now. I think i need it for the reshape
>> because I have to traverse each zone up to its end.
>
> Well, you can always get the zone size of zone #i by
>
>        z[i].zone_end - z[i - 1].zone_end
>
> (with z[-1].zone_end := 0). Since reshape is a slow operation anyway,
> the additional subtraction won't hurt, and we'd have the smaller
> cache footprint during normal operation. So I'd be inclined to kill
> ->sectors as well.
>
> Andre
> --
> The only person who always got his work done by Friday was Robinson Crusoe
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iD8DBQFKDabAWto1QDEAkw8RAoEmAKCqYyRinCYIuXWi4fhJSLKfj//2XgCfYTfp
> z0lX/gg5fAfGb8IM0XKOAl4=
> =ShCN
> -----END PGP SIGNATURE-----
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones.
  2009-05-15 21:19           ` Raz
@ 2009-05-18  8:21             ` Andre Noll
  0 siblings, 0 replies; 39+ messages in thread
From: Andre Noll @ 2009-05-18  8:21 UTC (permalink / raw)
  To: Raz; +Cc: Neil Brown, linux-raid

[-- Attachment #1: Type: text/plain, Size: 299 bytes --]

On 00:19, Raz wrote:

> I have not patched yet my code with this new serie, but where is patch
> 1/6 ? Does it exist or simply wrong numbering ?

It's all in Neil's for-next tree. Sorry for the confusion.
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity
  2009-05-14 22:35           ` Neil Brown
@ 2009-05-18 22:58             ` raz ben yehuda
  0 siblings, 0 replies; 39+ messages in thread
From: raz ben yehuda @ 2009-05-18 22:58 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Neil Hello
following this email are 6 patches. I fixed the errors as required.

patch 1:
Because of the removal the device list from
the strips raid0 did not compile with MD_DEBUG flag on.

patch 2:
informative : print to the user raid0 zones

patch 3:
Add support to chunk size of 4K*n instead of 4K*2^n

patch 4:
md to support page size chunks in the case of raid 0

patch 5:
md to support 1K*n chunks only in the case of raid 0

patch 6:
fix mdadm to path 1K*n chunks only in the case of raid0.
Commit is stacked on top mdadm.c


On Fri, 2009-05-15 at 08:35 +1000, Neil Brown wrote:
> On Thursday May 14, raziebe@gmail.com wrote:
> >  move raid0 chunk size to 4K*n granularity. 
> >  motivation for this patch is to have a better access to raid550. if a raid 5 is 3M
> >  stripe (4-1),and you have two of these raids 5's, and on top of you have a raid0, 
> >  it is better to access raid550 with a 3MB buffers and not 1M ( no raid5 write penalty).
> 
> I don't think you can justify "no raid5 write penalty" but it could
> possibly reduce the raid5 write penalty.
> 
> >  Andre, Patch is applied on top of your last post. now it is your turn to merge :)
> >  md.c    |   24 ++++++++++-----
> >  raid0.c |  102 ++++++++++++++++++++++++++++++----------------------------------
> >  
> >  2 files changed, 65 insertions(+), 61 deletions(-) 
> > Signed-Off-by:raziebe@gmail.com
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index ed5727c..5eab782 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -440,12 +440,14 @@ static inline sector_t calc_dev_sboffset(struct block_device *bdev)
> >  	return MD_NEW_SIZE_SECTORS(num_sectors);
> >  }
> >  
> > +
> 
> Just a single blank line between functions please.
> 
> >  static sector_t calc_num_sectors(mdk_rdev_t *rdev, unsigned chunk_size)
> >  {
> >  	sector_t num_sectors = rdev->sb_start;
> > -
> > -	if (chunk_size)
> > -		num_sectors &= ~((sector_t)chunk_size/512 - 1);
> > +	if (chunk_size) {
> > +		int chunk_sects = chunk_size>>9;
> > +		num_sectors = (num_sectors/chunk_sects)*chunk_sects;
> > +	}
> 
> You need to use "sector_div" here.  On a 32 bit host with 64bit
> sector_t you cannot use normal division.
> sector_div takes a sector_t and an "unsigned long", divides the one by
> the other, stores the quotient in the sector_t and returns the
> remainder.
> 
> >  	return num_sectors;
> >  }
> >  
> > @@ -3512,7 +3514,7 @@ min_sync_store(mddev_t *mddev, const char *buf, size_t len)
> >  
> >  	/* Must be a multiple of chunk_size */
> >  	if (mddev->chunk_size) {
> > -		if (min & (sector_t)((mddev->chunk_size>>9)-1))
> > +		if (min % (sector_t)(mddev->chunk_size>>9))
> 
> sector_div again.
> 
> >  			return -EINVAL;
> >  	}
> >  	mddev->resync_min = min;
> > @@ -3549,7 +3551,7 @@ max_sync_store(mddev_t *mddev, const char *buf, size_t len)
> >  
> >  		/* Must be a multiple of chunk_size */
> >  		if (mddev->chunk_size) {
> > -			if (max & (sector_t)((mddev->chunk_size>>9)-1))
> > +			if (max % (sector_t)((mddev->chunk_size>>9)))
> 
> and again.
> 
> >  				return -EINVAL;
> >  		}
> >  		mddev->resync_max = max;
> > @@ -3993,11 +3995,19 @@ static int do_md_run(mddev_t * mddev)
> >  		/*
> >  		 * chunk-size has to be a power of 2
> >  		 */
> > -		if ( (1 << ffz(~chunk_size)) != chunk_size) {
> > +		if ((1 << ffz(~chunk_size)) != chunk_size &&
> > +			 mddev->level != 0) {
> >  			printk(KERN_ERR "chunk_size of %d not valid\n", chunk_size);
> >  			return -EINVAL;
> >  		}
> > -
> > +		/*
> > +		* raid0 chunk size has to divide by a page
> > +		*/
> > +		if (mddev->level == 0 && (chunk_size % 4096)) {
> > +			printk(KERN_ERR "chunk_size of %d not valid\n",
> > +				chunk_size);
> > +			return -EINVAL;
> > +		}
> 
> Why does the chunk size have to be a multiple of page size?
> If there is a reason, include it in the comment.
> And the size of a page is PAGE_SIZE, not 4096.  On some systems
> PAGE_SIZE is 65536.
> 
> 
> >  		/* devices must have minimum size of one chunk */
> >  		list_for_each_entry(rdev, &mddev->disks, same_set) {
> >  			if (test_bit(Faulty, &rdev->flags))
> > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > index 36b747a..9865316 100644
> > --- a/drivers/md/raid0.c
> > +++ b/drivers/md/raid0.c
> > @@ -53,7 +53,7 @@ static int raid0_congested(void *data, int bits)
> >  }
> >  
> >  
> > -static int create_strip_zones (mddev_t *mddev)
> > +static int raid0_create_strip_zones(mddev_t *mddev)
> 
> Why did you add the "raid0_".  There is no value in a prefix like that
> on static functions.  Removing the trailing space is good though!
> 
> >  {
> >  	int i, c, j;
> >  	sector_t current_start, curr_zone_start;
> > @@ -237,7 +237,7 @@ static int raid0_mergeable_bvec(struct request_queue *q,
> >  	unsigned int chunk_sectors = mddev->chunk_size >> 9;
> >  	unsigned int bio_sectors = bvm->bi_size >> 9;
> >  
> > -	max =  (chunk_sectors - ((sector & (chunk_sectors - 1)) + bio_sectors)) << 9;
> > +	max =  (chunk_sectors - ((sector % chunk_sectors) + bio_sectors)) << 9;
> 
> sector_div...  but you get the point, I'll stop mentioning it.
> 
> >  	if (max < 0) max = 0; /* bio_add cannot handle a negative return */
> >  	if (max <= biovec->bv_len && bio_sectors == 0)
> >  		return biovec->bv_len;
> > @@ -259,26 +259,37 @@ static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
> >  	return array_sectors;
> >  }
> >  
> > +static int raid0_is_power2_chunk(mddev_t *mddev)
> > +{
> > +	if ((1 << ffz(~mddev->chunk_size)) == mddev->chunk_size)
> > +		return 1;
> > +	return 0;
> > +}
> 
> include/linux/log2.h has is_power_of_2.  Use that.
> And just say
>   is_power_of_2(mddev->chunk_size)
> where it is needed.  No need for a separate function.
> 
> 
> > +
> > +
> >  static int raid0_run(mddev_t *mddev)
> >  {
> >  	int ret;
> > +	int segment_boundary =  (mddev->chunk_size>>1)-1;
> >  
> >  	if (mddev->chunk_size == 0) {
> >  		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
> >  		return -EINVAL;
> >  	}
> > -	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
> > -	       mdname(mddev),
> > -	       mddev->chunk_size >> 9,
> > -	       (mddev->chunk_size>>1)-1);
> >  	blk_queue_max_sectors(mddev->queue, mddev->chunk_size >> 9);
> > -	blk_queue_segment_boundary(mddev->queue, (mddev->chunk_size>>1) - 1);
> > +	if (!raid0_is_power2_chunk(mddev))
> > +		segment_boundary = ~(ffz(~mddev->chunk_size))>>1;
> > +	printk(KERN_INFO "%s: setting max_sectors to %d, segment boundary to %d\n",
> > +		mdname(mddev),
> > +		mddev->chunk_size >> 9,
> > +		segment_boundary);
> > +	blk_queue_segment_boundary(mddev->queue, segment_boundary);
> 
> Setting blk_queue_segment_boundary here seems wrong.  I know you
> didn't add that code, you just modified.  But I think it might be
> best to remove it first - in a separate patch.
> I've just asked Peter Chubb who added that code in the first place if
> he has any idea why it was there.
> 
> >  	mddev->queue->queue_lock = &mddev->queue->__queue_lock;
> >  
> >  	mddev->private = kmalloc(sizeof(raid0_conf_t), GFP_KERNEL);
> >  	if (!mddev->private)
> >  		return -ENOMEM;
> > -	ret = create_strip_zones(mddev);
> > +	ret = raid0_create_strip_zones(mddev);
> >  	if (ret < 0) {
> >  		kfree(mddev->private);
> >  		mddev->private = NULL;
> > @@ -322,31 +333,35 @@ static int raid0_stop (mddev_t *mddev)
> >  	return 0;
> >  }
> >  
> > -/* Find the zone which holds a particular offset */
> > -static struct strip_zone *find_zone(struct raid0_private_data *conf,
> > -		sector_t sector)
> > +static int raid0_position_bio(mddev_t *mddev, struct bio *bio, sector_t sector)
> 
> You've removed a comment, changed the name of the function that it
> described, but have not provided a new comment to describe the new
> function..
> 
> And no "raid0_" prefix please.  I know a lot of function do still have
> that prefix but I would like the number to decrease, not increase.  It
> is redundant.
> 
> 
> >  {
> > -	int i;
> > -
> > -	for (i = 0; i < conf->nr_strip_zones; i++) {
> > -		struct strip_zone *z = conf->strip_zone + i;
> > -
> > -		if (sector < z->zone_start + z->sectors)
> > -			return z;
> > -	}
> > -	BUG();
> > -	return NULL;
> > +	sector_t sect_in_chunk;
> > +	mdk_rdev_t *tmp_dev;
> > +	sector_t chunk_in_dev;
> > +	sector_t rsect;
> > +	sector_t x;
> > +	raid0_conf_t *conf = mddev_to_conf(mddev);
> > +	sector_t chunk_sects = mddev->chunk_size >> 9;
> > +	struct strip_zone *zone = &conf->strip_zone[0];
> > +
> > +	while (sector >= zone->zone_start + zone->sectors)
> > +		zone++;
> > +	sect_in_chunk = sector % chunk_sects;
> > +	x = (sector - zone->zone_start) / chunk_sects;
> > +	sector_div(x, zone->nb_dev);
> > +	chunk_in_dev = x;
> > +	x = sector / chunk_sects;
> > +	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> > +	rsect = (chunk_in_dev * chunk_sects) + zone->dev_start + sect_in_chunk;
> > +	bio->bi_bdev = tmp_dev->bdev;
> > +	bio->bi_sector = rsect + tmp_dev->data_offset;
> > +	return 0;
> >  }
> 
> I would really like it if we only did division if division were need,
> and just use shift when that is possible.
> e..g
>   if (is_power_of_two())
>       do the shift version
>   else
>       do the divide version.
> 
> 
> >  
> > -static int raid0_make_request (struct request_queue *q, struct bio *bio)
> > +static int raid0_make_request(struct request_queue *q, struct bio *bio)
> >  {
> >  	mddev_t *mddev = q->queuedata;
> > -	unsigned int sect_in_chunk, chunksect_bits, chunk_sects;
> > -	raid0_conf_t *conf = mddev_to_conf(mddev);
> > -	struct strip_zone *zone;
> > -	mdk_rdev_t *tmp_dev;
> > -	sector_t chunk;
> > -	sector_t sector, rsect;
> > +	unsigned int chunk_sects;
> >  	const int rw = bio_data_dir(bio);
> >  	int cpu;
> >  
> > @@ -362,10 +377,9 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
> >  	part_stat_unlock();
> >  
> >  	chunk_sects = mddev->chunk_size >> 9;
> > -	chunksect_bits = ffz(~chunk_sects);
> > -	sector = bio->bi_sector;
> >  
> > -	if (unlikely(chunk_sects < (bio->bi_sector & (chunk_sects - 1)) + (bio->bi_size >> 9))) {
> > +	if (unlikely(chunk_sects < ((bio->bi_sector % chunk_sects)
> > +			+ (bio->bi_size >> 9)))) {
> 
> Please keep things lines up properly.  The contents of a bracketed
> expression should never be to the left of the opening bracket, unless
> the opening bracket is the last symbol on the line.
> So this should be:
> > +	if (unlikely(chunk_sects < ((bio->bi_sector % chunk_sects)
> > +			            + (bio->bi_size >> 9)))) {
> 
> >  		struct bio_pair *bp;
> >  		/* Sanity check -- queue functions should prevent this happening */
> >  		if (bio->bi_vcnt != 1 ||
> > @@ -374,7 +388,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
> >  		/* This is a one page bio that upper layers
> >  		 * refuse to split for us, so we need to split it.
> >  		 */
> > -		bp = bio_split(bio, chunk_sects - (bio->bi_sector & (chunk_sects - 1)));
> > +		bp = bio_split(bio, chunk_sects -
> > +					(bio->bi_sector % chunk_sects));
> 
> and this should be
> > +		bp = bio_split(bio, chunk_sects -
> > +				    (bio->bi_sector % chunk_sects));
> 
> 
> >  		if (raid0_make_request(q, &bp->bio1))
> >  			generic_make_request(&bp->bio1);
> >  		if (raid0_make_request(q, &bp->bio2))
> > @@ -383,29 +398,8 @@ static int raid0_make_request (struct request_queue *q, struct bio *bio)
> >  		bio_pair_release(bp);
> >  		return 0;
> >  	}
> > -	zone = find_zone(conf, sector);
> > -	if (!zone)
> > +	if (!raid0_position_bio(mddev, bio, bio->bi_sector))
> >  		return 1;
> > -	sect_in_chunk = bio->bi_sector & (chunk_sects - 1);
> > -	{
> > -		sector_t x = (sector - zone->zone_start) >> chunksect_bits;
> > -
> > -		sector_div(x, zone->nb_dev);
> > -		chunk = x;
> > -
> > -		x = sector >> chunksect_bits;
> > -		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> > -	}
> > -	rsect = (chunk << chunksect_bits) + zone->dev_start + sect_in_chunk;
> > - 
> > -	bio->bi_bdev = tmp_dev->bdev;
> > -	bio->bi_sector = rsect + tmp_dev->data_offset;
> > -
> > -	/*
> > -	 * Let the main block layer submit the IO and resolve recursion:
> > -	 */
> > -	return 1;
> > -
> >  bad_map:
> >  	printk("raid0_make_request bug: can't convert block across chunks"
> >  		" or bigger than %dk %llu %d\n", chunk_sects / 2,
> > 
> 
> 
> Thanks.
> I haven't thoroughly read through 'position_bio' so I reserve the
> right to comment further on that :-)
> But apart from the various observations above, it looks like it is
> heading in the right direction.
> 
> NeilBrown


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-05-18 22:58 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-14 10:43 [PATCH 0/6] md: Remove the hash tables from raid0 Andre Noll
2009-05-14 10:43 ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Andre Noll
2009-05-14 11:15   ` SandeepKsinha
2009-05-14 11:15   ` NeilBrown
2009-05-14 12:10     ` Andre Noll
2009-05-14 12:25       ` NeilBrown
2009-05-14 12:54         ` Sujit Karataparambil
2009-05-14 15:00           ` SandeepKsinha
2009-05-14 15:58         ` PATCH md [001:002]: raid0: fix chunk size to 4K*n granularity raz ben yehuda
2009-05-14 14:07           ` Andre Noll
2009-05-14 22:35           ` Neil Brown
2009-05-18 22:58             ` raz ben yehuda
2009-05-14 16:00         ` Subject: PATCH[002:002] md: raid0: dump raid configuration raz ben yehuda
2009-05-14 17:12         ` Subject: [PATCH] mdadm: raid0: support chunks of 4K*n for raid0 raz ben yehuda
2009-05-15  3:59           ` Sujit Karataparambil
2009-05-15  6:01             ` Raz
2009-05-15  6:45               ` Sujit Karataparambil
2009-05-15  8:39                 ` NeilBrown
2009-05-15 15:45                   ` Raz
2009-05-14 12:22     ` [PATCH] md: raid0: Replace hash table lookup by looping over all strip_zones Neil Brown
2009-05-14 15:51       ` raz ben yehuda
2009-05-14 20:38         ` NeilBrown
2009-05-15 13:18           ` Andre Noll
2009-05-15 17:30         ` Andre Noll
2009-05-15 21:19           ` Raz
2009-05-18  8:21             ` Andre Noll
2009-05-14 12:01   ` SandeepKsinha
2009-05-14 12:15     ` SandeepKsinha
2009-05-14 14:13   ` raz ben yehuda
2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash table Andre Noll
2009-05-14 10:43 ` [PATCH] md: raid0: Remove hash spacing and sector shift Andre Noll
2009-05-14 10:43 ` [PATCH] md: raid0: Make raid0_run() return a proper error code Andre Noll
2009-05-14 11:21   ` NeilBrown
2009-05-14 11:42     ` Andre Noll
2009-05-14 10:43 ` [PATCH] md: raid0: Kfree() strip_zone and devlist in create_strip_zones() Andre Noll
2009-05-14 10:43 ` [PATCH] md: raid0: Simplify raid0_run() Andre Noll
2009-05-14 11:43   ` SandeepKsinha
2009-05-14 12:06     ` NeilBrown
2009-05-14 14:03   ` raz ben yehuda

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).