linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2
@ 2006-01-24  0:40 NeilBrown
  2006-01-24  0:40 ` [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow NeilBrown
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:40 UTC (permalink / raw)
  To: linux-raid, linux-kernel

Here is second release of my patches to support online reshaping
of a raid5 array, i.e. adding 1 or more devices and restriping the 
whole thing.

This release fixes an assortment of bugs and adds checkpoint/restart
to the process (the last two patches).
This means that if your machine crashes, or if you have to stop an
array before the reshape is complete, md will notice and will restart
the reshape at an appropriate place.

There is still a small window ( < 1 second) at the start of the reshape
during which a crash will cause unrecoverable corruption.  My plan is
to resolve this in mdadm rather than md. The critical data will be copied
into the new drive(s) prior to commencing the reshape.  If there is a crash
the kernel will refuse the reassmble the array.  mdadm will be able to 
re-assemble it by first restoring the critical data and then letting
the remainder of the reshape run it's course.

I will be changing the interface for starting a reshape slightly before
this patch become final.  This will mean that current 'mdadm' will not
be able to start a raid5 reshape.
This is partly to save people from risking the above mentioned tiny hole,
but also to prepare for reshaping which changes other aspects of the
shape, e.g. layout, chunksize, level.

I am expecting that I will ultimately support online conversion of
raid5 to raid6 with only one extra device.  This process is not
(efficiently) checkpointable and so will be at-your-risk.
Checkpointing such a process with anything like reasonable efficiency
requires a largish (multi-megabytes) temporary store, and doing so
will at-best halve the speed.  I will make sure the posibility of
add this later will be left open.

My thanks to those who have tested the first release, who have
provided feedback, who will test this release, and who contribute to
the discussion in any way.

NeilBrown



 [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow.
 [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array.
 [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding.
 [PATCH 004 of 7] md: Core of raid5 resize process
 [PATCH 005 of 7] md: Final stages of raid5 expand code.
 [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape
 [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow.
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
@ 2006-01-24  0:40 ` NeilBrown
  2006-01-24  0:40 ` [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array NeilBrown
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:40 UTC (permalink / raw)
  To: linux-raid, linux-kernel


Previously the array of disk information was included in the
raid5 'conf' structure which was allocated to an appropriate size.
This makes it awkward to change the size of that array.
So we split it off into a separate kmalloced array which will
require a little extra indexing, but is much easier to grow.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c         |   10 +++++++---
 ./drivers/md/raid6main.c     |   11 ++++++++---
 ./include/linux/raid/raid5.h |    2 +-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 10:18:18.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 10:18:31.000000000 +1100
@@ -1821,11 +1821,13 @@ static int run(mddev_t *mddev)
 		return -EIO;
 	}
 
-	mddev->private = kzalloc(sizeof (raid5_conf_t)
-				 + mddev->raid_disks * sizeof(struct disk_info),
-				 GFP_KERNEL);
+	mddev->private = kzalloc(sizeof (raid5_conf_t), GFP_KERNEL);
 	if ((conf = mddev->private) == NULL)
 		goto abort;
+	conf->disks = kzalloc(mddev->raid_disks * sizeof(struct disk_info),
+			      GFP_KERNEL);
+	if (!conf->disks)
+		goto abort;
 
 	conf->mddev = mddev;
 
@@ -1965,6 +1967,7 @@ static int run(mddev_t *mddev)
 abort:
 	if (conf) {
 		print_raid5_conf(conf);
+		kfree(conf->disks);
 		kfree(conf->stripe_hashtbl);
 		kfree(conf);
 	}
@@ -1985,6 +1988,7 @@ static int stop(mddev_t *mddev)
 	kfree(conf->stripe_hashtbl);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	sysfs_remove_group(&mddev->kobj, &raid5_attrs_group);
+	kfree(conf->disks);
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2006-01-24 10:18:18.000000000 +1100
+++ ./drivers/md/raid6main.c	2006-01-24 10:18:31.000000000 +1100
@@ -1925,11 +1925,14 @@ static int run(mddev_t *mddev)
 		return -EIO;
 	}
 
-	mddev->private = kzalloc(sizeof (raid6_conf_t)
-				 + mddev->raid_disks * sizeof(struct disk_info),
-				 GFP_KERNEL);
+	mddev->private = kzalloc(sizeof (raid6_conf_t), GFP_KERNEL);
 	if ((conf = mddev->private) == NULL)
 		goto abort;
+	conf->disks = kzalloc(mddev->raid_disks * sizeof(struct disk_info),
+				 GFP_KERNEL);
+	if (!conf->disks)
+		goto abort;
+
 	conf->mddev = mddev;
 
 	if ((conf->stripe_hashtbl = kzalloc(PAGE_SIZE, GFP_KERNEL)) == NULL)
@@ -2077,6 +2080,7 @@ abort:
 		print_raid6_conf(conf);
 		safe_put_page(conf->spare_page);
 		kfree(conf->stripe_hashtbl);
+		kfree(conf->disks);
 		kfree(conf);
 	}
 	mddev->private = NULL;
@@ -2095,6 +2099,7 @@ static int stop (mddev_t *mddev)
 	shrink_stripes(conf);
 	kfree(conf->stripe_hashtbl);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
+	kfree(conf->disks);
 	kfree(conf);
 	mddev->private = NULL;
 	return 0;

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 10:18:18.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 10:18:31.000000000 +1100
@@ -240,7 +240,7 @@ struct raid5_private_data {
 							 * waiting for 25% to be free
 							 */        
 	spinlock_t		device_lock;
-	struct disk_info	disks[0];
+	struct disk_info	*disks;
 };
 
 typedef struct raid5_private_data raid5_conf_t;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array.
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
  2006-01-24  0:40 ` [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow NeilBrown
@ 2006-01-24  0:40 ` NeilBrown
  2006-01-24  0:41 ` [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding NeilBrown
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:40 UTC (permalink / raw)
  To: linux-raid, linux-kernel


Before a RAID-5 can be expanded, we need to be able to expand the
stripe-cache data structure.  
This requires allocating new stripes in a new kmem_cache.
If this succeeds, we copy cache pages over and release the old
stripes and kmem_cache.
We then allocate new pages.  If that fails, we leave the stripe
cache at it's new size.  It isn't worth the effort to shink 
it back again.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c         |  118 +++++++++++++++++++++++++++++++++++++++++--
 ./drivers/md/raid6main.c     |    4 -
 ./include/linux/raid/raid5.h |    9 ++-
 3 files changed, 123 insertions(+), 8 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 10:18:31.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 10:20:33.000000000 +1100
@@ -313,20 +313,130 @@ static int grow_stripes(raid5_conf_t *co
 	kmem_cache_t *sc;
 	int devs = conf->raid_disks;
 
-	sprintf(conf->cache_name, "raid5/%s", mdname(conf->mddev));
-
-	sc = kmem_cache_create(conf->cache_name, 
+	sprintf(conf->cache_name[0], "raid5/%s", mdname(conf->mddev));
+	sprintf(conf->cache_name[1], "raid5/%s-alt", mdname(conf->mddev));
+	conf->active_name = 0;
+	sc = kmem_cache_create(conf->cache_name[conf->active_name],
 			       sizeof(struct stripe_head)+(devs-1)*sizeof(struct r5dev),
 			       0, 0, NULL, NULL);
 	if (!sc)
 		return 1;
 	conf->slab_cache = sc;
+	conf->pool_size = devs;
 	while (num--) {
 		if (!grow_one_stripe(conf))
 			return 1;
 	}
 	return 0;
 }
+static int resize_stripes(raid5_conf_t *conf, int newsize)
+{
+	/* make all the stripes able to hold 'newsize' devices.
+	 * New slots in each stripe get 'page' set to a new page.
+	 * We allocate all the new stripes first, then if that succeeds,
+	 * copy everything across.
+	 * Finally we add new pages.  This could fail, but we leave
+	 * the stripe cache at it's new size, just with some pages empty.
+	 *
+	 * We use GFP_NOIO allocations as IO to the raid5 is blocked
+	 * at some points in this operation.
+	 */
+	struct stripe_head *osh, *nsh;
+	struct list_head newstripes, oldstripes;
+	struct disk_info *ndisks;
+	int err = 0;
+	kmem_cache_t *sc;
+	int i;
+
+	if (newsize <= conf->pool_size)
+		return 0; /* never bother to shrink */
+
+	sc = kmem_cache_create(conf->cache_name[1-conf->active_name],
+			       sizeof(struct stripe_head)+(newsize-1)*sizeof(struct r5dev),
+			       0, 0, NULL, NULL);
+	if (!sc)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&newstripes);
+	for (i = conf->max_nr_stripes; i; i--) {
+		nsh = kmem_cache_alloc(sc, GFP_NOIO);
+		if (!nsh)
+			break;
+
+		memset(nsh, 0, sizeof(*nsh) + (newsize-1)*sizeof(struct r5dev));
+
+		nsh->raid_conf = conf;
+		spin_lock_init(&nsh->lock);
+
+		list_add(&nsh->lru, &newstripes);
+	}
+	if (i) {
+		/* didn't get enough, give up */
+		while (!list_empty(&newstripes)) {
+			nsh = list_entry(newstripes.next, struct stripe_head, lru);
+			list_del(&nsh->lru);
+			kmem_cache_free(sc, nsh);
+		}
+		kmem_cache_destroy(sc);
+		return -ENOMEM;
+	}
+	/* OK, we have enough stripes, start collecting inactive
+	 * stripes and copying them over
+	 */
+	INIT_LIST_HEAD(&oldstripes);
+	list_for_each_entry(nsh, &newstripes, lru) {
+		spin_lock_irq(&conf->device_lock);
+		wait_event_lock_irq(conf->wait_for_stripe,
+				    !list_empty(&conf->inactive_list),
+				    conf->device_lock,
+				    unplug_slaves(conf->mddev);
+			);
+		osh = get_free_stripe(conf);
+		spin_unlock_irq(&conf->device_lock);
+		atomic_set(&nsh->count, 1);
+		for(i=0; i<conf->pool_size; i++)
+			nsh->dev[i].page = osh->dev[i].page;
+		for( ; i<newsize; i++)
+			nsh->dev[i].page = NULL;
+		list_add(&osh->lru, &oldstripes);
+	}
+	/* Got them all.
+	 * Return the new ones and free the old ones.
+	 * At this point, we are holding all the stripes so the array
+	 * is completely stalled, so now is a good time to resize
+	 * conf->disks.
+	 */
+	ndisks = kzalloc(newsize * sizeof(struct disk_info), GFP_NOIO);
+	if (ndisks) {
+		for (i=0; i<conf->raid_disks; i++)
+			ndisks[i] = conf->disks[i];
+		kfree(conf->disks);
+		conf->disks = ndisks;
+	} else
+		err = -ENOMEM;
+	while(!list_empty(&newstripes)) {
+		nsh = list_entry(newstripes.next, struct stripe_head, lru);
+		list_del_init(&nsh->lru);
+		for (i=conf->raid_disks; i < newsize; i++)
+			if (nsh->dev[i].page == NULL) {
+				struct page *p = alloc_page(GFP_NOIO);
+				nsh->dev[i].page = p;
+				if (!p)
+					err = -ENOMEM;
+			}
+		release_stripe(nsh);
+	}
+	while(!list_empty(&oldstripes)) {
+		osh = list_entry(oldstripes.next, struct stripe_head, lru);
+		list_del(&osh->lru);
+		kmem_cache_free(conf->slab_cache, osh);
+	}
+	kmem_cache_destroy(conf->slab_cache);
+	conf->slab_cache = sc;
+	conf->active_name = 1-conf->active_name;
+	conf->pool_size = newsize;
+	return err;
+}
+
 
 static int drop_one_stripe(raid5_conf_t *conf)
 {
@@ -339,7 +449,7 @@ static int drop_one_stripe(raid5_conf_t 
 		return 0;
 	if (atomic_read(&sh->count))
 		BUG();
-	shrink_buffers(sh, conf->raid_disks);
+	shrink_buffers(sh, conf->pool_size);
 	kmem_cache_free(conf->slab_cache, sh);
 	atomic_dec(&conf->active_stripes);
 	return 1;

diff ./drivers/md/raid6main.c~current~ ./drivers/md/raid6main.c
--- ./drivers/md/raid6main.c~current~	2006-01-24 10:18:31.000000000 +1100
+++ ./drivers/md/raid6main.c	2006-01-24 10:18:50.000000000 +1100
@@ -308,9 +308,9 @@ static int grow_stripes(raid6_conf_t *co
 	kmem_cache_t *sc;
 	int devs = conf->raid_disks;
 
-	sprintf(conf->cache_name, "raid6/%s", mdname(conf->mddev));
+	sprintf(conf->cache_name[0], "raid6/%s", mdname(conf->mddev));
 
-	sc = kmem_cache_create(conf->cache_name,
+	sc = kmem_cache_create(conf->cache_name[0],
 			       sizeof(struct stripe_head)+(devs-1)*sizeof(struct r5dev),
 			       0, 0, NULL, NULL);
 	if (!sc)

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 10:18:31.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 10:18:50.000000000 +1100
@@ -216,7 +216,11 @@ struct raid5_private_data {
 	struct list_head	bitmap_list; /* stripes delaying awaiting bitmap update */
 	atomic_t		preread_active_stripes; /* stripes with scheduled io */
 
-	char			cache_name[20];
+	/* unfortunately we need two cache names as we temporarily have
+	 * two caches.
+	 */
+	int			active_name;
+	char			cache_name[2][20];
 	kmem_cache_t		*slab_cache; /* for allocating stripes */
 
 	int			seq_flush, seq_write;
@@ -238,7 +242,8 @@ struct raid5_private_data {
 	wait_queue_head_t	wait_for_overlap;
 	int			inactive_blocked;	/* release of inactive stripes blocked,
 							 * waiting for 25% to be free
-							 */        
+							 */
+	int			pool_size; /* number of disks in stripeheads in pool */
 	spinlock_t		device_lock;
 	struct disk_info	*disks;
 };

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding.
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
  2006-01-24  0:40 ` [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow NeilBrown
  2006-01-24  0:40 ` [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array NeilBrown
@ 2006-01-24  0:41 ` NeilBrown
  2006-01-24  0:41 ` [PATCH 004 of 7] md: Core of raid5 resize process NeilBrown
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:41 UTC (permalink / raw)
  To: linux-raid, linux-kernel


We need to allow that different stripes are of different effective sizes,
and use the appropriate size.
Also, when a stripe is being expanded, we must block any IO attempts
until the stripe is stable again.

Key elements in this change are:
 - each stripe_head gets a 'disk' field which is part of the key,
   thus there can sometimes be two stripe heads of the same area of
   the array, but covering different numbers of devices.  One of these
   will be marked STRIPE_EXPANDING and so won't accept new requests.
 - conf->expand_progress tracks how the expansion is progressing and
   is used to determine whether the target part of the array has been
   expanded yet or not.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c         |   88 ++++++++++++++++++++++++++++---------------
 ./include/linux/raid/raid5.h |    6 ++
 2 files changed, 64 insertions(+), 30 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 10:20:33.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 10:23:40.000000000 +1100
@@ -178,10 +178,10 @@ static int grow_buffers(struct stripe_he
 
 static void raid5_build_block (struct stripe_head *sh, int i);
 
-static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx)
+static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int disks)
 {
 	raid5_conf_t *conf = sh->raid_conf;
-	int disks = conf->raid_disks, i;
+	int i;
 
 	if (atomic_read(&sh->count) != 0)
 		BUG();
@@ -198,7 +198,9 @@ static void init_stripe(struct stripe_he
 	sh->pd_idx = pd_idx;
 	sh->state = 0;
 
-	for (i=disks; i--; ) {
+	sh->disks = disks;
+
+	for (i = sh->disks; i--; ) {
 		struct r5dev *dev = &sh->dev[i];
 
 		if (dev->toread || dev->towrite || dev->written ||
@@ -215,7 +217,7 @@ static void init_stripe(struct stripe_he
 	insert_hash(conf, sh);
 }
 
-static struct stripe_head *__find_stripe(raid5_conf_t *conf, sector_t sector)
+static struct stripe_head *__find_stripe(raid5_conf_t *conf, sector_t sector, int disks)
 {
 	struct stripe_head *sh;
 	struct hlist_node *hn;
@@ -223,7 +225,7 @@ static struct stripe_head *__find_stripe
 	CHECK_DEVLOCK();
 	PRINTK("__find_stripe, sector %llu\n", (unsigned long long)sector);
 	hlist_for_each_entry(sh, hn, stripe_hash(conf, sector), hash)
-		if (sh->sector == sector)
+		if (sh->sector == sector && sh->disks == disks)
 			return sh;
 	PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector);
 	return NULL;
@@ -232,8 +234,8 @@ static struct stripe_head *__find_stripe
 static void unplug_slaves(mddev_t *mddev);
 static void raid5_unplug_device(request_queue_t *q);
 
-static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector,
-					     int pd_idx, int noblock) 
+static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector, int disks,
+					     int pd_idx, int noblock)
 {
 	struct stripe_head *sh;
 
@@ -245,7 +247,7 @@ static struct stripe_head *get_active_st
 		wait_event_lock_irq(conf->wait_for_stripe,
 				    conf->quiesce == 0,
 				    conf->device_lock, /* nothing */);
-		sh = __find_stripe(conf, sector);
+		sh = __find_stripe(conf, sector, disks);
 		if (!sh) {
 			if (!conf->inactive_blocked)
 				sh = get_free_stripe(conf);
@@ -263,7 +265,7 @@ static struct stripe_head *get_active_st
 					);
 				conf->inactive_blocked = 0;
 			} else
-				init_stripe(sh, sector, pd_idx);
+				init_stripe(sh, sector, pd_idx, disks);
 		} else {
 			if (atomic_read(&sh->count)) {
 				if (!list_empty(&sh->lru))
@@ -300,6 +302,7 @@ static int grow_one_stripe(raid5_conf_t 
 		kmem_cache_free(conf->slab_cache, sh);
 		return 0;
 	}
+	sh->disks = conf->raid_disks;
 	/* we just created an active stripe so... */
 	atomic_set(&sh->count, 1);
 	atomic_inc(&conf->active_stripes);
@@ -469,7 +472,7 @@ static int raid5_end_read_request(struct
 {
  	struct stripe_head *sh = bi->bi_private;
 	raid5_conf_t *conf = sh->raid_conf;
-	int disks = conf->raid_disks, i;
+	int disks = sh->disks, i;
 	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 
 	if (bi->bi_size)
@@ -567,7 +570,7 @@ static int raid5_end_write_request (stru
 {
  	struct stripe_head *sh = bi->bi_private;
 	raid5_conf_t *conf = sh->raid_conf;
-	int disks = conf->raid_disks, i;
+	int disks = sh->disks, i;
 	unsigned long flags;
 	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
 
@@ -721,7 +724,7 @@ static sector_t raid5_compute_sector(sec
 static sector_t compute_blocknr(struct stripe_head *sh, int i)
 {
 	raid5_conf_t *conf = sh->raid_conf;
-	int raid_disks = conf->raid_disks, data_disks = raid_disks - 1;
+	int raid_disks = sh->disks, data_disks = raid_disks - 1;
 	sector_t new_sector = sh->sector, check;
 	int sectors_per_chunk = conf->chunk_size >> 9;
 	sector_t stripe;
@@ -822,8 +825,7 @@ static void copy_data(int frombio, struc
 
 static void compute_block(struct stripe_head *sh, int dd_idx)
 {
-	raid5_conf_t *conf = sh->raid_conf;
-	int i, count, disks = conf->raid_disks;
+	int i, count, disks = sh->disks;
 	void *ptr[MAX_XOR_BLOCKS], *p;
 
 	PRINTK("compute_block, stripe %llu, idx %d\n", 
@@ -853,7 +855,7 @@ static void compute_block(struct stripe_
 static void compute_parity(struct stripe_head *sh, int method)
 {
 	raid5_conf_t *conf = sh->raid_conf;
-	int i, pd_idx = sh->pd_idx, disks = conf->raid_disks, count;
+	int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
 	void *ptr[MAX_XOR_BLOCKS];
 	struct bio *chosen;
 
@@ -1041,7 +1043,7 @@ static int add_stripe_bio(struct stripe_
 static void handle_stripe(struct stripe_head *sh)
 {
 	raid5_conf_t *conf = sh->raid_conf;
-	int disks = conf->raid_disks;
+	int disks = sh->disks;
 	struct bio *return_bi= NULL;
 	struct bio *bi;
 	int i;
@@ -1635,12 +1637,10 @@ static inline void raid5_plug_device(rai
 	spin_unlock_irq(&conf->device_lock);
 }
 
-static int make_request (request_queue_t *q, struct bio * bi)
+static int make_request(request_queue_t *q, struct bio * bi)
 {
 	mddev_t *mddev = q->queuedata;
 	raid5_conf_t *conf = mddev_to_conf(mddev);
-	const unsigned int raid_disks = conf->raid_disks;
-	const unsigned int data_disks = raid_disks - 1;
 	unsigned int dd_idx, pd_idx;
 	sector_t new_sector;
 	sector_t logical_sector, last_sector;
@@ -1664,20 +1664,48 @@ static int make_request (request_queue_t
 
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
 		DEFINE_WAIT(w);
+		int disks;
 		
-		new_sector = raid5_compute_sector(logical_sector,
-						  raid_disks, data_disks, &dd_idx, &pd_idx, conf);
-
+	retry:
+		if (likely(conf->expand_progress == MaxSector))
+			disks = conf->raid_disks;
+		else {
+			spin_lock_irq(&conf->device_lock);
+			disks = conf->raid_disks;
+			if (logical_sector >= conf->expand_progress)
+				disks = conf->previous_raid_disks;
+			spin_unlock_irq(&conf->device_lock);
+		}
+ 		new_sector = raid5_compute_sector(logical_sector, disks, disks - 1,
+						  &dd_idx, &pd_idx, conf);
 		PRINTK("raid5: make_request, sector %llu logical %llu\n",
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
-	retry:
 		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
-		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
+		sh = get_active_stripe(conf, new_sector, disks, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
-			if (!add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
-				/* Add failed due to overlap.  Flush everything
+			if (unlikely(conf->expand_progress != MaxSector)) {
+				/* expansion might have moved on while waiting for a
+				 * stripe, so we much do the range check again.
+				 */
+				int must_retry = 0;
+				spin_lock_irq(&conf->device_lock);
+				if (logical_sector <  conf->expand_progress &&
+				    disks == conf->previous_raid_disks)
+					/* mismatch, need to try again */
+					must_retry = 1;
+				spin_unlock_irq(&conf->device_lock);
+				if (must_retry) {
+					release_stripe(sh);
+					goto retry;
+				}
+			}
+
+			if (test_bit(STRIPE_EXPANDING, &sh->state) ||
+			    !add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK))) {
+				/* Stripe is busy expanding or
+				 * add failed due to overlap.  Flush everything
 				 * and wait a while
 				 */
 				raid5_unplug_device(mddev->queue);
@@ -1689,7 +1717,6 @@ static int make_request (request_queue_t
 			raid5_plug_device(conf);
 			handle_stripe(sh);
 			release_stripe(sh);
-
 		} else {
 			/* cannot get stripe for read-ahead, just give-up */
 			clear_bit(BIO_UPTODATE, &bi->bi_flags);
@@ -1765,9 +1792,9 @@ static sector_t sync_request(mddev_t *md
 
 	first_sector = raid5_compute_sector((sector_t)stripe*data_disks*sectors_per_chunk
 		+ chunk_offset, raid_disks, data_disks, &dd_idx, &pd_idx, conf);
-	sh = get_active_stripe(conf, sector_nr, pd_idx, 1);
+	sh = get_active_stripe(conf, sector_nr, raid_disks, pd_idx, 1);
 	if (sh == NULL) {
-		sh = get_active_stripe(conf, sector_nr, pd_idx, 0);
+		sh = get_active_stripe(conf, sector_nr, raid_disks, pd_idx, 0);
 		/* make sure we don't swamp the stripe cache if someone else
 		 * is trying to get access 
 		 */
@@ -1984,6 +2011,7 @@ static int run(mddev_t *mddev)
 	conf->level = mddev->level;
 	conf->algorithm = mddev->layout;
 	conf->max_nr_stripes = NR_STRIPES;
+	conf->expand_progress = MaxSector;
 
 	/* device size must be a multiple of chunk size */
 	mddev->size &= ~(mddev->chunk_size/1024 -1);
@@ -2114,7 +2142,7 @@ static void print_sh (struct stripe_head
 	printk("sh %llu,  count %d.\n",
 		(unsigned long long)sh->sector, atomic_read(&sh->count));
 	printk("sh %llu, ", (unsigned long long)sh->sector);
-	for (i = 0; i < sh->raid_conf->raid_disks; i++) {
+	for (i = 0; i < sh->disks; i++) {
 		printk("(cache%d: %p %ld) ", 
 			i, sh->dev[i].page, sh->dev[i].flags);
 	}

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 10:18:50.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 10:21:00.000000000 +1100
@@ -135,6 +135,7 @@ struct stripe_head {
 	atomic_t		count;			/* nr of active thread/requests */
 	spinlock_t		lock;
 	int			bm_seq;	/* sequence number for bitmap flushes */
+	int			disks;			/* disks in stripe */
 	struct r5dev {
 		struct bio	req;
 		struct bio_vec	vec;
@@ -174,6 +175,7 @@ struct stripe_head {
 #define	STRIPE_DELAYED		6
 #define	STRIPE_DEGRADED		7
 #define	STRIPE_BIT_DELAY	8
+#define	STRIPE_EXPANDING	9
 
 /*
  * Plugging:
@@ -211,6 +213,10 @@ struct raid5_private_data {
 	int			raid_disks, working_disks, failed_disks;
 	int			max_nr_stripes;
 
+	/* used during an expand */
+	sector_t		expand_progress;	/* MaxSector when no expand happening */
+	int			previous_raid_disks;
+
 	struct list_head	handle_list; /* stripes needing handling */
 	struct list_head	delayed_list; /* stripes that have plugged requests */
 	struct list_head	bitmap_list; /* stripes delaying awaiting bitmap update */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 004 of 7] md: Core of raid5 resize process
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (2 preceding siblings ...)
  2006-01-24  0:41 ` [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding NeilBrown
@ 2006-01-24  0:41 ` NeilBrown
  2006-01-24  0:41 ` [PATCH 005 of 7] md: Final stages of raid5 expand code NeilBrown
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:41 UTC (permalink / raw)
  To: linux-raid, linux-kernel


This patch provides the core of the resize/expand process.

sync_request notices if a 'reshape' is happening and acts accordingly.

It allocated new stripe_heads for the next chunk-wide-stripe in the
target geometry, marking them STRIPE_EXPANDING.
Then it finds which stripe heads in the old geometry can provide data
needed by these and marks them STRIPE_EXPAND_SOURCE.  This causes
stripe_handle to read all blocks on those stripes.
Once all blocks on a STRIPE_EXPAND_SOURCE stripe_head are read, and that
are needed are copied into the corresponding STRIPE_EXPANDING stripe_head.
Once a STRIPE_EXPANDING stripe_head is full, it is marks STRIPE_EXPAND_READY
and then is  written out and released.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c            |   10 +-
 ./drivers/md/raid5.c         |  189 +++++++++++++++++++++++++++++++++++++------
 ./include/linux/raid/md_k.h  |    4 
 ./include/linux/raid/raid5.h |    4 
 4 files changed, 181 insertions(+), 26 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-01-24 10:39:55.000000000 +1100
+++ ./drivers/md/md.c	2006-01-24 10:39:52.000000000 +1100
@@ -2151,7 +2151,9 @@ action_show(mddev_t *mddev, char *page)
 	char *type = "idle";
 	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
 	    test_bit(MD_RECOVERY_NEEDED, &mddev->recovery)) {
-		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+		if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery))
+			type = "reshape";
+		else if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
 			if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
 				type = "resync";
 			else if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery))
@@ -4067,8 +4069,10 @@ static void status_resync(struct seq_fil
 		seq_printf(seq, "] ");
 	}
 	seq_printf(seq, " %s =%3lu.%lu%% (%lu/%lu)",
+		   (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)?
+		    "reshape" :
 		      (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ?
-		       "resync" : "recovery"),
+		       "resync" : "recovery")),
 		      res/10, res % 10, resync, max_blocks);
 
 	/*
@@ -4656,6 +4660,8 @@ static void md_do_sync(mddev_t *mddev)
 	mddev->pers->sync_request(mddev, max_sectors, &skipped, 1);
 
 	if (!test_bit(MD_RECOVERY_ERR, &mddev->recovery) &&
+	    test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
+	    !test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
 	    mddev->curr_resync > 2 &&
 	    mddev->curr_resync >= mddev->recovery_cp) {
 		if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 10:39:55.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 10:40:14.000000000 +1100
@@ -93,11 +93,13 @@ static void __release_stripe(raid5_conf_
 				if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
 					md_wakeup_thread(conf->mddev->thread);
 			}
-			list_add_tail(&sh->lru, &conf->inactive_list);
 			atomic_dec(&conf->active_stripes);
-			if (!conf->inactive_blocked ||
-			    atomic_read(&conf->active_stripes) < (conf->max_nr_stripes*3/4))
-				wake_up(&conf->wait_for_stripe);
+			if (!test_bit(STRIPE_EXPANDING, &sh->state)) {
+				list_add_tail(&sh->lru, &conf->inactive_list);
+				if (!conf->inactive_blocked ||
+				    atomic_read(&conf->active_stripes) < (conf->max_nr_stripes*3/4))
+					wake_up(&conf->wait_for_stripe);
+			}
 		}
 	}
 }
@@ -273,9 +275,8 @@ static struct stripe_head *get_active_st
 			} else {
 				if (!test_bit(STRIPE_HANDLE, &sh->state))
 					atomic_inc(&conf->active_stripes);
-				if (list_empty(&sh->lru))
-					BUG();
-				list_del_init(&sh->lru);
+				if (!list_empty(&sh->lru))
+					list_del_init(&sh->lru);
 			}
 		}
 	} while (sh == NULL);
@@ -1021,6 +1022,18 @@ static int add_stripe_bio(struct stripe_
 	return 0;
 }
 
+int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks)
+{
+	int sectors_per_chunk = conf->chunk_size >> 9;
+	sector_t x = stripe;
+	int pd_idx, dd_idx;
+	int chunk_offset = sector_div(x, sectors_per_chunk);
+	stripe = x;
+	raid5_compute_sector(stripe*(disks-1)*sectors_per_chunk
+			     + chunk_offset, disks, disks-1, &dd_idx, &pd_idx, conf);
+	return pd_idx;
+}
+
 
 /*
  * handle_stripe - do things to a stripe.
@@ -1047,7 +1060,7 @@ static void handle_stripe(struct stripe_
 	struct bio *return_bi= NULL;
 	struct bio *bi;
 	int i;
-	int syncing;
+	int syncing, expanding, expanded;
 	int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
 	int non_overwrite = 0;
 	int failed_num=0;
@@ -1062,6 +1075,8 @@ static void handle_stripe(struct stripe_
 	clear_bit(STRIPE_DELAYED, &sh->state);
 
 	syncing = test_bit(STRIPE_SYNCING, &sh->state);
+	expanding = test_bit(STRIPE_EXPAND_SOURCE, &sh->state);
+	expanded = test_bit(STRIPE_EXPAND_READY, &sh->state);
 	/* Now to look around and see what can be done */
 
 	rcu_read_lock();
@@ -1254,13 +1269,14 @@ static void handle_stripe(struct stripe_
 	 * parity, or to satisfy requests
 	 * or to load a block that is being partially written.
 	 */
-	if (to_read || non_overwrite || (syncing && (uptodate < disks))) {
+	if (to_read || non_overwrite || (syncing && (uptodate < disks)) || expanding) {
 		for (i=disks; i--;) {
 			dev = &sh->dev[i];
 			if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
 			    (dev->toread ||
 			     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
 			     syncing ||
+			     expanding ||
 			     (failed && (sh->dev[failed_num].toread ||
 					 (sh->dev[failed_num].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags))))
 				    )
@@ -1450,13 +1466,76 @@ static void handle_stripe(struct stripe_
 			set_bit(R5_Wantwrite, &dev->flags);
 			set_bit(R5_ReWrite, &dev->flags);
 			set_bit(R5_LOCKED, &dev->flags);
+			locked++;
 		} else {
 			/* let's read it back */
 			set_bit(R5_Wantread, &dev->flags);
 			set_bit(R5_LOCKED, &dev->flags);
+			locked++;
 		}
 	}
 
+	if (expanded && test_bit(STRIPE_EXPANDING, &sh->state)) {
+		/* Need to write out all blocks after computing parity */
+		sh->disks = conf->raid_disks;
+		sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
+		compute_parity(sh, RECONSTRUCT_WRITE);
+		for (i= conf->raid_disks; i--;) {
+			set_bit(R5_LOCKED, &sh->dev[i].flags);
+			locked++;
+			set_bit(R5_Wantwrite, &sh->dev[i].flags);
+		}
+		clear_bit(STRIPE_EXPANDING, &sh->state);
+	} else if (expanded) {
+		clear_bit(STRIPE_EXPAND_READY, &sh->state);
+		wake_up(&conf->wait_for_overlap);
+		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
+	}
+
+	if (expanding && locked == 0) {
+		/* We have read all the blocks in this stripe and now we need to
+		 * copy some of them into a target stripe for expand.
+		 */
+		clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
+		for (i=0; i< sh->disks; i++)
+			if (i != sh->pd_idx) {
+				int dd_idx, pd_idx, j;
+				struct stripe_head *sh2;
+
+				sector_t bn = compute_blocknr(sh, i);
+				sector_t s = raid5_compute_sector(bn, conf->raid_disks,
+								  conf->raid_disks-1,
+								  &dd_idx, &pd_idx, conf);
+				sh2 = get_active_stripe(conf, s, conf->raid_disks, pd_idx, 1);
+				if (sh2 == NULL)
+					/* so far only the early blocks of this stripe
+					 * have been requested.  When later blocks
+					 * get requested, we will try again
+					 */
+					continue;
+				if(!test_bit(STRIPE_EXPANDING, &sh2->state) ||
+				   test_bit(R5_Expanded, &sh2->dev[dd_idx].flags)) {
+					/* must have already done this block */
+					release_stripe(sh2);
+					continue;
+				}
+				memcpy(page_address(sh2->dev[dd_idx].page),
+				       page_address(sh->dev[i].page),
+				       STRIPE_SIZE);
+				set_bit(R5_Expanded, &sh2->dev[dd_idx].flags);
+				set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
+				for (j=0; j<conf->raid_disks; j++)
+					if (j != sh2->pd_idx &&
+					    !test_bit(R5_Expanded, &sh2->dev[j].flags))
+						break;
+				if (j == conf->raid_disks) {
+					set_bit(STRIPE_EXPAND_READY, &sh2->state);
+					set_bit(STRIPE_HANDLE, &sh2->state);
+				}
+				release_stripe(sh2);
+			}
+	}
+
 	spin_unlock(&sh->lock);
 
 	while ((bi=return_bi)) {
@@ -1495,7 +1574,7 @@ static void handle_stripe(struct stripe_
 		rcu_read_unlock();
  
 		if (rdev) {
-			if (syncing)
+			if (syncing || expanding || expanded)
 				md_sync_acct(rdev->bdev, STRIPE_SECTORS);
 
 			bi->bi_bdev = rdev->bdev;
@@ -1743,12 +1822,8 @@ static sector_t sync_request(mddev_t *md
 {
 	raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
 	struct stripe_head *sh;
-	int sectors_per_chunk = conf->chunk_size >> 9;
-	sector_t x;
-	unsigned long stripe;
-	int chunk_offset;
-	int dd_idx, pd_idx;
-	sector_t first_sector;
+	int pd_idx;
+	sector_t first_sector, last_sector;
 	int raid_disks = conf->raid_disks;
 	int data_disks = raid_disks-1;
 	sector_t max_sector = mddev->size << 1;
@@ -1767,6 +1842,80 @@ static sector_t sync_request(mddev_t *md
 
 		return 0;
 	}
+
+	if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) {
+		/* reshaping is quite different to recovery/resync so it is
+		 * handled quite separately ... here.
+		 *
+		 * On each call to sync_request, we gather one chunk worth of
+		 * destination stripes and flag them as expanding.
+		 * Then we find all the source stripes and request reads.
+		 * As the reads complete, handle_stripe will copy the data
+		 * into the destination stripe and release that stripe.
+		 */
+		int i;
+		int dd_idx;
+		for (i=0; i < conf->chunk_size/512; i+= STRIPE_SECTORS) {
+			int j;
+			int skipped = 0;
+			pd_idx = stripe_to_pdidx(sector_nr+i, conf, conf->raid_disks);
+			sh = get_active_stripe(conf, sector_nr+i,
+					       conf->raid_disks, pd_idx, 0);
+			set_bit(STRIPE_EXPANDING, &sh->state);
+			/* If any of this stripe is beyond the end of the old
+			 * array, then we need to zero those blocks
+			 */
+			for (j=sh->disks; j--;) {
+				sector_t s;
+				if (j == sh->pd_idx)
+					continue;
+				s = compute_blocknr(sh, j);
+				if (s < (mddev->array_size<<1)) {
+					skipped = 1;
+					continue;
+				}
+				memset(page_address(sh->dev[j].page), 0, STRIPE_SIZE);
+				set_bit(R5_Expanded, &sh->dev[j].flags);
+				set_bit(R5_UPTODATE, &sh->dev[j].flags);
+			}
+			if (!skipped) {
+				set_bit(STRIPE_EXPAND_READY, &sh->state);
+				set_bit(STRIPE_HANDLE, &sh->state);
+			}
+			release_stripe(sh);
+		}
+		spin_lock_irq(&conf->device_lock);
+		conf->expand_progress = (sector_nr + i)*(conf->raid_disks-1);
+		spin_unlock_irq(&conf->device_lock);
+		/* Ok, those stripe are ready. We can start scheduling
+		 * reads on the source stripes.
+		 * The source stripes are determined by mapping the first and last
+		 * block on the destination stripes.
+		 */
+		raid_disks = conf->previous_raid_disks;
+		data_disks = raid_disks - 1;
+		first_sector =
+			raid5_compute_sector(sector_nr*(conf->raid_disks-1),
+					     raid_disks, data_disks,
+					     &dd_idx, &pd_idx, conf);
+		last_sector =
+			raid5_compute_sector((sector_nr+conf->chunk_size/512)
+					       *(conf->raid_disks-1) -1,
+					     raid_disks, data_disks,
+					     &dd_idx, &pd_idx, conf);
+		if (last_sector >= (mddev->size<<1))
+			last_sector = (mddev->size<<1)-1;
+		while (first_sector <= last_sector) {
+			pd_idx = stripe_to_pdidx(first_sector, conf, conf->previous_raid_disks);
+			sh = get_active_stripe(conf, first_sector,
+					       conf->previous_raid_disks, pd_idx, 0);
+			set_bit(STRIPE_EXPAND_SOURCE, &sh->state);
+			set_bit(STRIPE_HANDLE, &sh->state);
+			release_stripe(sh);
+			first_sector += STRIPE_SECTORS;
+		}
+		return conf->chunk_size>>9;
+	}
 	/* if there is 1 or more failed drives and we are trying
 	 * to resync, then assert that we are finished, because there is
 	 * nothing we can do.
@@ -1785,13 +1934,7 @@ static sector_t sync_request(mddev_t *md
 		return sync_blocks * STRIPE_SECTORS; /* keep things rounded to whole stripes */
 	}
 
-	x = sector_nr;
-	chunk_offset = sector_div(x, sectors_per_chunk);
-	stripe = x;
-	BUG_ON(x != stripe);
-
-	first_sector = raid5_compute_sector((sector_t)stripe*data_disks*sectors_per_chunk
-		+ chunk_offset, raid_disks, data_disks, &dd_idx, &pd_idx, conf);
+	pd_idx = stripe_to_pdidx(sector_nr, conf, raid_disks);
 	sh = get_active_stripe(conf, sector_nr, raid_disks, pd_idx, 1);
 	if (sh == NULL) {
 		sh = get_active_stripe(conf, sector_nr, raid_disks, pd_idx, 0);

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2006-01-24 10:39:55.000000000 +1100
+++ ./include/linux/raid/md_k.h	2006-01-24 10:39:49.000000000 +1100
@@ -157,6 +157,9 @@ struct mddev_s
 	 * DONE:     thread is done and is waiting to be reaped
 	 * REQUEST:  user-space has requested a sync (used with SYNC)
 	 * CHECK:    user-space request for for check-only, no repair
+	 * RESHAPE:  A reshape is happening
+	 *
+	 * If neither SYNC or RESHAPE are set, then it is a recovery.
 	 */
 #define	MD_RECOVERY_RUNNING	0
 #define	MD_RECOVERY_SYNC	1
@@ -166,6 +169,7 @@ struct mddev_s
 #define	MD_RECOVERY_NEEDED	5
 #define	MD_RECOVERY_REQUESTED	6
 #define	MD_RECOVERY_CHECK	7
+#define MD_RECOVERY_RESHAPE	8
 	unsigned long			recovery;
 
 	int				in_sync;	/* know to not need resync */

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 10:39:55.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 10:39:49.000000000 +1100
@@ -157,6 +157,7 @@ struct stripe_head {
 #define	R5_ReadError	8	/* seen a read error here recently */
 #define	R5_ReWrite	9	/* have tried to over-write the readerror */
 
+#define	R5_Expanded	10	/* This block now has post-expand data */
 /*
  * Write method
  */
@@ -176,7 +177,8 @@ struct stripe_head {
 #define	STRIPE_DEGRADED		7
 #define	STRIPE_BIT_DELAY	8
 #define	STRIPE_EXPANDING	9
-
+#define	STRIPE_EXPAND_SOURCE	10
+#define	STRIPE_EXPAND_READY	11
 /*
  * Plugging:
  *

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 005 of 7] md: Final stages of raid5 expand code.
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (3 preceding siblings ...)
  2006-01-24  0:41 ` [PATCH 004 of 7] md: Core of raid5 resize process NeilBrown
@ 2006-01-24  0:41 ` NeilBrown
  2006-01-24  0:41 ` [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape NeilBrown
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:41 UTC (permalink / raw)
  To: linux-raid, linux-kernel


This patch adds raid5_reshape and end_reshape which will
start and finish the reshape processes.

raid5_reshape is only enabled in CONFIG_MD_RAID5_RESHAPE is set,
to discourage accidental use.

Don't use this on valuable data.

Read the 'help' for the CONFIG_MD_RAID5_RESHAPE entry.

and Make sure to avoid use on valuable data.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/Kconfig      |   24 +++++++++
 ./drivers/md/md.c         |    7 +-
 ./drivers/md/raid5.c      |  117 ++++++++++++++++++++++++++++++++++++++++++++++
 ./include/linux/raid/md.h |    3 -
 4 files changed, 147 insertions(+), 4 deletions(-)

diff ./drivers/md/Kconfig~current~ ./drivers/md/Kconfig
--- ./drivers/md/Kconfig~current~	2006-01-24 10:39:52.000000000 +1100
+++ ./drivers/md/Kconfig	2006-01-24 10:40:21.000000000 +1100
@@ -127,6 +127,30 @@ config MD_RAID5
 
 	  If unsure, say Y.
 
+config MD_RAID5_RESHAPE
+	bool "Support adding drives to a raid-5 array (highly experimental)"
+	depends on MD_RAID5 && EXPERIMENTAL
+	---help---
+	  A RAID-5 set can be expanded by adding extra drives. This
+	  requires "restriping" the array which means (almost) every
+	  block must be written to a different place.
+
+          This option allows this restriping to be done while the array
+	  is online.  However it is VERY EARLY EXPERIMENTAL code.
+	  In particular, if anything goes wrong while the restriping
+	  is happening, such as a power failure or a crash, all the
+	  data on the array will be LOST beyond any reasonable hope
+	  of recovery.
+
+	  This option is provided for experimentation and testing.
+	  Please do NOT use it on valuable data without good, tested, backups.
+
+	  Any reasonable current version of 'mdadm' can start an expansion
+	  with e.g.  mdadm --grow /dev/md0 --raid-disks=6
+	  Note: The array can only be expanded, not contracted.
+	  There should be enough spares already present to make the new
+	  array workable.
+
 config MD_RAID6
 	tristate "RAID-6 mode"
 	depends on BLK_DEV_MD

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-01-24 10:39:52.000000000 +1100
+++ ./drivers/md/md.c	2006-01-24 10:40:21.000000000 +1100
@@ -158,12 +158,12 @@ static int start_readonly;
  */
 static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
 static atomic_t md_event_count;
-static void md_new_event(mddev_t *mddev)
+void md_new_event(mddev_t *mddev)
 {
 	atomic_inc(&md_event_count);
 	wake_up(&md_event_waiters);
 }
-
+EXPORT_SYMBOL_GPL(md_new_event);
 /*
  * Enables to iterate over all existing md arrays
  * all_mddevs_lock protects this list.
@@ -4444,7 +4444,7 @@ static DECLARE_WAIT_QUEUE_HEAD(resync_wa
 
 #define SYNC_MARKS	10
 #define	SYNC_MARK_STEP	(3*HZ)
-static void md_do_sync(mddev_t *mddev)
+void md_do_sync(mddev_t *mddev)
 {
 	mddev_t *mddev2;
 	unsigned int currspeed = 0,
@@ -4679,6 +4679,7 @@ static void md_do_sync(mddev_t *mddev)
 	set_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
 }
+EXPORT_SYMBOL_GPL(md_do_sync);
 
 
 /*

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 10:40:14.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 10:40:21.000000000 +1100
@@ -1022,6 +1022,8 @@ static int add_stripe_bio(struct stripe_
 	return 0;
 }
 
+static void end_reshape(raid5_conf_t *conf);
+
 int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks)
 {
 	int sectors_per_chunk = conf->chunk_size >> 9;
@@ -1832,6 +1834,10 @@ static sector_t sync_request(mddev_t *md
 	if (sector_nr >= max_sector) {
 		/* just being told to finish up .. nothing much to do */
 		unplug_slaves(mddev);
+		if (test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery)) {
+			end_reshape(conf);
+			return 0;
+		}
 
 		if (mddev->curr_resync < max_sector) /* aborted */
 			bitmap_end_sync(mddev->bitmap, mddev->curr_resync,
@@ -2452,6 +2458,114 @@ static int raid5_resize(mddev_t *mddev, 
 	return 0;
 }
 
+static int raid5_reshape(mddev_t *mddev, int raid_disks)
+{
+	raid5_conf_t *conf = mddev_to_conf(mddev);
+	int err;
+	mdk_rdev_t *rdev;
+	struct list_head *rtmp;
+	int spares = 0;
+	int added_devices = 0;
+
+	if (mddev->degraded ||
+	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+		return -EBUSY;
+	if (conf->raid_disks > raid_disks)
+		return -EINVAL; /* Cannot shrink array yet */
+	if (conf->raid_disks == raid_disks)
+		return 0; /* nothing to do */
+
+	/* Can only proceed if there are plenty of stripe_heads.
+	 * We need a minimum of one full stripe,, and for sensible progress
+	 * it is best to have about 4 times that.
+	 * If we require 4 times, then the default 256 4K stripe_heads will
+	 * allow for chunk sizes up to 256K, which is probably OK.
+	 * If the chunk size is greater, user-space should request more
+	 * stripe_heads first.
+	 */
+	if ((mddev->chunk_size / STRIPE_SIZE) * 4 > conf->max_nr_stripes) {
+		printk(KERN_WARNING "raid5: reshape: not enough stripes.  Needed %lu\n",
+		       (mddev->chunk_size / STRIPE_SIZE)*4);
+		return -ENOSPC;
+	}
+
+	ITERATE_RDEV(mddev, rdev, rtmp)
+		if (rdev->raid_disk < 0 &&
+		    !test_bit(Faulty, &rdev->flags))
+			spares++;
+	if (conf->raid_disks + spares < raid_disks-1)
+		/* Not enough devices even to make a degraded array
+		 * of that size
+		 */
+		return -EINVAL;
+
+	err = resize_stripes(conf, raid_disks);
+	if (err)
+		return err;
+
+	spin_lock_irq(&conf->device_lock);
+	conf->previous_raid_disks = conf->raid_disks;
+	mddev->raid_disks = conf->raid_disks = raid_disks;
+	conf->expand_progress = 0;
+	spin_unlock_irq(&conf->device_lock);
+
+	/* Add some new drives, as many as will fit.
+	 * We know there are enough to make the newly sized array work.
+	 */
+	ITERATE_RDEV(mddev, rdev, rtmp)
+		if (rdev->raid_disk < 0 &&
+		    !test_bit(Faulty, &rdev->flags)) {
+			if (raid5_add_disk(mddev, rdev)) {
+				char nm[20];
+				set_bit(In_sync, &rdev->flags);
+				conf->working_disks++;
+				added_devices++;
+				sprintf(nm, "rd%d", rdev->raid_disk);
+				sysfs_create_link(&mddev->kobj, &rdev->kobj, nm);
+			} else
+				break;
+		}
+
+	mddev->degraded = (raid_disks - conf->previous_raid_disks) - added_devices;
+	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+	set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+	mddev->sync_thread = md_register_thread(md_do_sync, mddev,
+						"%s_reshape");
+	if (!mddev->sync_thread) {
+		mddev->recovery = 0;
+		spin_lock_irq(&conf->device_lock);
+		mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
+		conf->expand_progress = MaxSector;
+		spin_unlock_irq(&conf->device_lock);
+		return -EAGAIN;
+	}
+	md_wakeup_thread(mddev->sync_thread);
+	md_new_event(mddev);
+	return 0;
+}
+
+static void end_reshape(raid5_conf_t *conf)
+{
+	struct block_device *bdev;
+
+	conf->mddev->array_size = conf->mddev->size * (conf->mddev->raid_disks-1);
+	set_capacity(conf->mddev->gendisk, conf->mddev->array_size << 1);
+	conf->mddev->changed = 1;
+
+	bdev = bdget_disk(conf->mddev->gendisk, 0);
+	if (bdev) {
+		mutex_lock(&bdev->bd_inode->i_mutex);
+		i_size_write(bdev->bd_inode, conf->mddev->array_size << 10);
+		mutex_unlock(&bdev->bd_inode->i_mutex);
+		bdput(bdev);
+	}
+	spin_lock_irq(&conf->device_lock);
+	conf->expand_progress = MaxSector;
+	spin_unlock_irq(&conf->device_lock);
+}
+
 static void raid5_quiesce(mddev_t *mddev, int state)
 {
 	raid5_conf_t *conf = mddev_to_conf(mddev);
@@ -2490,6 +2604,9 @@ static struct mdk_personality raid5_pers
 	.spare_active	= raid5_spare_active,
 	.sync_request	= sync_request,
 	.resize		= raid5_resize,
+#if CONFIG_MD_RAID5_RESHAPE
+	.reshape	= raid5_reshape,
+#endif
 	.quiesce	= raid5_quiesce,
 };
 

diff ./include/linux/raid/md.h~current~ ./include/linux/raid/md.h
--- ./include/linux/raid/md.h~current~	2006-01-24 10:39:52.000000000 +1100
+++ ./include/linux/raid/md.h	2006-01-24 10:40:21.000000000 +1100
@@ -92,7 +92,8 @@ extern void md_super_write(mddev_t *mdde
 extern void md_super_wait(mddev_t *mddev);
 extern int sync_page_io(struct block_device *bdev, sector_t sector, int size,
 			struct page *page, int rw);
-
+extern void md_do_sync(mddev_t *mddev);
+extern void md_new_event(mddev_t *mddev);
 
 #define MD_BUG(x...) { printk("md: bug in file %s, line %d\n", __FILE__, __LINE__); md_print_devices(); }
 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (4 preceding siblings ...)
  2006-01-24  0:41 ` [PATCH 005 of 7] md: Final stages of raid5 expand code NeilBrown
@ 2006-01-24  0:41 ` NeilBrown
  2006-01-27 12:37   ` Molle Bestefich
  2006-01-24  0:41 ` [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally NeilBrown
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:41 UTC (permalink / raw)
  To: linux-raid, linux-kernel


We allow the superblock to record an 'old' and a 'new'
geometry, and a position where any conversion is up to.
The geometry allows for changing chunksize, layout and
level as well as number of devices.

When using verion-0.90 superblock, we convert the version
to 0.91 while the conversion is happening so that an old
kernel will refuse the assemble the array.
For version-1, we use a feature bit for the same effect.

When starting an array we check for an incomplete reshape
and restart the reshape process if needed.
If the reshape stopped at an awkward time (like when updating
the first stripe) we refuse to assemble the array, and
let user-space worry about it.




Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c            |   54 +++++++++++++++-
 ./drivers/md/raid1.c         |    5 +
 ./drivers/md/raid5.c         |  144 +++++++++++++++++++++++++++++++++++++------
 ./include/linux/raid/md.h    |    2 
 ./include/linux/raid/md_k.h  |    8 ++
 ./include/linux/raid/md_p.h  |   32 ++++++++-
 ./include/linux/raid/raid5.h |    1 
 7 files changed, 220 insertions(+), 26 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./drivers/md/md.c	2006-01-24 11:24:34.000000000 +1100
@@ -657,7 +657,8 @@ static int super_90_load(mdk_rdev_t *rde
 	}
 
 	if (sb->major_version != 0 ||
-	    sb->minor_version != 90) {
+	    sb->minor_version < 90 ||
+	    sb->minor_version > 91) {
 		printk(KERN_WARNING "Bad version number %d.%d on %s\n",
 			sb->major_version, sb->minor_version,
 			b);
@@ -742,6 +743,15 @@ static int super_90_validate(mddev_t *md
 		mddev->bitmap_offset = 0;
 		mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
 
+		if (mddev->minor_version >= 91) {
+			mddev->reshape_position = sb->reshape_position;
+			mddev->delta_disks = sb->delta_disks;
+			mddev->new_level = sb->new_level;
+			mddev->new_layout = sb->new_layout;
+			mddev->new_chunk = sb->new_chunk;
+		} else
+			mddev->reshape_position = MaxSector;
+
 		if (sb->state & (1<<MD_SB_CLEAN))
 			mddev->recovery_cp = MaxSector;
 		else {
@@ -835,7 +845,6 @@ static void super_90_sync(mddev_t *mddev
 
 	sb->md_magic = MD_SB_MAGIC;
 	sb->major_version = mddev->major_version;
-	sb->minor_version = mddev->minor_version;
 	sb->patch_version = mddev->patch_version;
 	sb->gvalid_words  = 0; /* ignored */
 	memcpy(&sb->set_uuid0, mddev->uuid+0, 4);
@@ -854,6 +863,17 @@ static void super_90_sync(mddev_t *mddev
 	sb->events_hi = (mddev->events>>32);
 	sb->events_lo = (u32)mddev->events;
 
+	if (mddev->reshape_position == MaxSector)
+		sb->minor_version = 90;
+	else {
+		sb->minor_version = 91;
+		sb->reshape_position = mddev->reshape_position;
+		sb->new_level = mddev->new_level;
+		sb->delta_disks = mddev->delta_disks;
+		sb->new_layout = mddev->new_layout;
+		sb->new_chunk = mddev->new_chunk;
+	}
+	mddev->minor_version = sb->minor_version;
 	if (mddev->in_sync)
 	{
 		sb->recovery_cp = mddev->recovery_cp;
@@ -1097,6 +1117,15 @@ static int super_1_validate(mddev_t *mdd
 			}
 			mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
 		}
+		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)) {
+			mddev->reshape_position = le64_to_cpu(sb->reshape_position);
+			mddev->delta_disks = le32_to_cpu(sb->delta_disks);
+			mddev->new_level = le32_to_cpu(sb->new_level);
+			mddev->new_layout = le32_to_cpu(sb->new_layout);
+			mddev->new_chunk = le32_to_cpu(sb->new_chunk);
+		} else
+			mddev->reshape_position = MaxSector;
+
 	} else if (mddev->pers == NULL) {
 		/* Insist of good event counter while assembling */
 		__u64 ev1 = le64_to_cpu(sb->events);
@@ -1165,6 +1194,14 @@ static void super_1_sync(mddev_t *mddev,
 		sb->bitmap_offset = cpu_to_le32((__u32)mddev->bitmap_offset);
 		sb->feature_map = cpu_to_le32(MD_FEATURE_BITMAP_OFFSET);
 	}
+	if (mddev->reshape_position != MaxSector) {
+		sb->feature_map |= cpu_to_le32(MD_FEATURE_RESHAPE_ACTIVE);
+		sb->reshape_position = le64_to_cpu(mddev->reshape_position);
+		sb->new_layout = le32_to_cpu(mddev->new_level);
+		sb->delta_disks = le32_to_cpu(mddev->delta_disks);
+		sb->new_level = le32_to_cpu(mddev->new_layout);
+		sb->new_chunk = le32_to_cpu(mddev->new_chunk);
+	}
 
 	max_dev = 0;
 	ITERATE_RDEV(mddev,rdev2,tmp)
@@ -1482,7 +1519,7 @@ static void sync_sbs(mddev_t * mddev)
 	}
 }
 
-static void md_update_sb(mddev_t * mddev)
+void md_update_sb(mddev_t * mddev)
 {
 	int err;
 	struct list_head *tmp;
@@ -1559,6 +1596,7 @@ repeat:
 	wake_up(&mddev->sb_wait);
 
 }
+EXPORT_SYMBOL_GPL(md_update_sb);
 
 /* words written to sysfs files may, or my not, be \n terminated.
  * We want to accept with case. For this we use cmd_match.
@@ -2530,6 +2568,14 @@ static int do_md_run(mddev_t * mddev)
 	mddev->level = pers->level;
 	strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
 
+	if (mddev->reshape_position != MaxSector &&
+	    pers->reshape == NULL) {
+		/* This personality cannot handle reshaping... */
+		mddev->pers = NULL;
+		module_put(pers->owner);
+		return -EINVAL;
+	}
+
 	mddev->recovery = 0;
 	mddev->resync_max_sectors = mddev->size << 1; /* may be over-ridden by personality */
 	mddev->barriers_work = 1;
@@ -3416,6 +3462,8 @@ static int set_array_info(mddev_t * mdde
 	mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
 	mddev->bitmap_offset = 0;
 
+	mddev->reshape_position = MaxSector;
+
 	/*
 	 * Generate a 128 bit UUID
 	 */

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./drivers/md/raid1.c	2006-01-24 11:14:40.000000000 +1100
@@ -1786,6 +1786,11 @@ static int run(mddev_t *mddev)
 		       mdname(mddev), mddev->level);
 		goto out;
 	}
+	if (mddev->reshape_position != MaxSector) {
+		printk("raid1: %s: reshape_position set but not supported\n",
+		       mdname(mddev));
+		goto out;
+	}
 	/*
 	 * copy the already verified devices into our private RAID1
 	 * bookkeeping area. [whatever we allocate in run(),

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 11:19:21.000000000 +1100
@@ -22,6 +22,7 @@
 #include <linux/raid/raid5.h>
 #include <linux/highmem.h>
 #include <linux/bitops.h>
+#include <linux/kthread.h>
 #include <asm/atomic.h>
 
 #include <linux/raid/bitmap.h>
@@ -1490,6 +1491,7 @@ static void handle_stripe(struct stripe_
 		clear_bit(STRIPE_EXPANDING, &sh->state);
 	} else if (expanded) {
 		clear_bit(STRIPE_EXPAND_READY, &sh->state);
+		atomic_dec(&conf->reshape_stripes);
 		wake_up(&conf->wait_for_overlap);
 		md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
@@ -1861,6 +1863,26 @@ static sector_t sync_request(mddev_t *md
 		 */
 		int i;
 		int dd_idx;
+
+		if (sector_nr == 0 &&
+		    conf->expand_progress != 0) {
+			/* restarting in the middle, skip the initial sectors */
+			sector_nr = conf->expand_progress;
+			sector_div(sector_nr, conf->raid_disks-1);
+			*skipped = 1;
+			return sector_nr;
+		}
+
+		/* Cannot proceed until we've updated the superblock... */
+		wait_event(conf->wait_for_overlap,
+			   atomic_read(&conf->reshape_stripes)==0);
+		mddev->reshape_position = conf->expand_progress;
+
+		mddev->sb_dirty = 1;
+		md_wakeup_thread(mddev->thread);
+		wait_event(mddev->sb_wait, mddev->sb_dirty == 0 ||
+			kthread_should_stop());
+
 		for (i=0; i < conf->chunk_size/512; i+= STRIPE_SECTORS) {
 			int j;
 			int skipped = 0;
@@ -1868,6 +1890,7 @@ static sector_t sync_request(mddev_t *md
 			sh = get_active_stripe(conf, sector_nr+i,
 					       conf->raid_disks, pd_idx, 0);
 			set_bit(STRIPE_EXPANDING, &sh->state);
+			atomic_inc(&conf->reshape_stripes);
 			/* If any of this stripe is beyond the end of the old
 			 * array, then we need to zero those blocks
 			 */
@@ -2107,10 +2130,65 @@ static int run(mddev_t *mddev)
 		return -EIO;
 	}
 
+	if (mddev->reshape_position != MaxSector) {
+		/* Check that we can continue the reshape.
+		 * Currently only disks can change, it must
+		 * increase, and we must be passed the point where
+		 * a stripe over-writes itself
+		 */
+		sector_t here_new, here_old, there_new;
+		int old_disks;
+
+		if (mddev->new_level != mddev->level ||
+		    mddev->new_layout != mddev->layout ||
+		    mddev->new_chunk != mddev->chunk_size) {
+			printk(KERN_ERR "raid5: %s: unsupported reshape required - aborting.\n",
+			       mdname(mddev));
+			return -EINVAL;
+		}
+		if (mddev->delta_disks <= 0) {
+			printk(KERN_ERR "raid5: %s: unsupported reshape (reduce disks) required - aborting.\n",
+			       mdname(mddev));
+			return -EINVAL;
+		}
+		old_disks = mddev->raid_disks - mddev->delta_disks;
+		/* reshape_position must be on a new-stripe boundary, and one
+		 * further up in new geometry must map before here in old geometry.
+		 */
+		here_new = mddev->reshape_position;
+		if (sector_div(here_new, (mddev->chunk_size>>9)*(mddev->raid_disks-1))) {
+			printk(KERN_ERR "raid5: reshape_position not on a stripe boundary\n");
+			return -EINVAL;
+		}
+		here_old = mddev->reshape_position;
+		sector_div(here_old, (mddev->chunk_size>>9)*(old_disks-1));
+		/* here_old is the first sector that we might need to read from
+		 * for the next movement
+		 */
+		there_new = here_new + (mddev->chunk_size>>9);
+		/* there_new is the last sector that the next movement will be
+		 * written to.
+		 */
+		if (there_new >= here_old) {
+			printk(KERN_ERR "raid5: reshape_position too early for auto-recovery - aborting.\n");
+			return -EINVAL;
+		}
+		printk("raid5: reshape will continue\n");
+		/* OK, we should be able to continue; */
+	}
+
+
 	mddev->private = kzalloc(sizeof (raid5_conf_t), GFP_KERNEL);
 	if ((conf = mddev->private) == NULL)
 		goto abort;
-	conf->disks = kzalloc(mddev->raid_disks * sizeof(struct disk_info),
+	if (mddev->reshape_position == MaxSector) {
+		conf->previous_raid_disks = conf->raid_disks = mddev->raid_disks;
+	} else {
+		conf->raid_disks = mddev->raid_disks;
+		conf->previous_raid_disks = mddev->raid_disks - mddev->delta_disks;
+	}
+
+	conf->disks = kzalloc(conf->raid_disks * sizeof(struct disk_info),
 			      GFP_KERNEL);
 	if (!conf->disks)
 		goto abort;
@@ -2134,7 +2212,7 @@ static int run(mddev_t *mddev)
 
 	ITERATE_RDEV(mddev,rdev,tmp) {
 		raid_disk = rdev->raid_disk;
-		if (raid_disk >= mddev->raid_disks
+		if (raid_disk >= conf->raid_disks
 		    || raid_disk < 0)
 			continue;
 		disk = conf->disks + raid_disk;
@@ -2150,7 +2228,6 @@ static int run(mddev_t *mddev)
 		}
 	}
 
-	conf->raid_disks = mddev->raid_disks;
 	/*
 	 * 0 for a fully functional array, 1 for a degraded array.
 	 */
@@ -2160,7 +2237,7 @@ static int run(mddev_t *mddev)
 	conf->level = mddev->level;
 	conf->algorithm = mddev->layout;
 	conf->max_nr_stripes = NR_STRIPES;
-	conf->expand_progress = MaxSector;
+	conf->expand_progress = mddev->reshape_position;
 
 	/* device size must be a multiple of chunk size */
 	mddev->size &= ~(mddev->chunk_size/1024 -1);
@@ -2233,6 +2310,20 @@ static int run(mddev_t *mddev)
 
 	print_raid5_conf(conf);
 
+	if (conf->expand_progress != MaxSector) {
+		printk("...ok start reshape thread\n");
+		atomic_set(&conf->reshape_stripes, 0);
+		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+		set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
+		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
+		mddev->sync_thread = md_register_thread(md_do_sync, mddev,
+							"%s_reshape");
+		/* FIXME if md_register_thread fails?? */
+		md_wakeup_thread(mddev->sync_thread);
+
+	}
+
 	/* read-ahead size must cover two whole stripes, which is
 	 * 2 * (n-1) * chunksize where 'n' is the number of raid devices
 	 */
@@ -2248,8 +2339,8 @@ static int run(mddev_t *mddev)
 
 	mddev->queue->unplug_fn = raid5_unplug_device;
 	mddev->queue->issue_flush_fn = raid5_issue_flush;
+	mddev->array_size =  mddev->size * (conf->previous_raid_disks - 1);
 
-	mddev->array_size =  mddev->size * (mddev->raid_disks - 1);
 	return 0;
 abort:
 	if (conf) {
@@ -2422,7 +2513,7 @@ static int raid5_add_disk(mddev_t *mddev
 	/*
 	 * find the disk ...
 	 */
-	for (disk=0; disk < mddev->raid_disks; disk++)
+	for (disk=0; disk < conf->raid_disks; disk++)
 		if ((p=conf->disks + disk)->rdev == NULL) {
 			clear_bit(In_sync, &rdev->flags);
 			rdev->raid_disk = disk;
@@ -2503,9 +2594,10 @@ static int raid5_reshape(mddev_t *mddev,
 	if (err)
 		return err;
 
+	atomic_set(&conf->reshape_stripes, 0);
 	spin_lock_irq(&conf->device_lock);
 	conf->previous_raid_disks = conf->raid_disks;
-	mddev->raid_disks = conf->raid_disks = raid_disks;
+	conf->raid_disks = raid_disks;
 	conf->expand_progress = 0;
 	spin_unlock_irq(&conf->device_lock);
 
@@ -2527,6 +2619,14 @@ static int raid5_reshape(mddev_t *mddev,
 		}
 
 	mddev->degraded = (raid_disks - conf->previous_raid_disks) - added_devices;
+	mddev->new_chunk = mddev->chunk_size;
+	mddev->new_layout = mddev->layout;
+	mddev->new_level = mddev->level;
+	mddev->raid_disks = raid_disks;
+	mddev->delta_disks = raid_disks - conf->previous_raid_disks;
+	mddev->reshape_position = 0;
+	mddev->sb_dirty = 1;
+
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 	clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
 	set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
@@ -2537,6 +2637,7 @@ static int raid5_reshape(mddev_t *mddev,
 		mddev->recovery = 0;
 		spin_lock_irq(&conf->device_lock);
 		mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
+		mddev->delta_disks = 0;
 		conf->expand_progress = MaxSector;
 		spin_unlock_irq(&conf->device_lock);
 		return -EAGAIN;
@@ -2550,20 +2651,23 @@ static void end_reshape(raid5_conf_t *co
 {
 	struct block_device *bdev;
 
-	conf->mddev->array_size = conf->mddev->size * (conf->mddev->raid_disks-1);
-	set_capacity(conf->mddev->gendisk, conf->mddev->array_size << 1);
-	conf->mddev->changed = 1;
-
-	bdev = bdget_disk(conf->mddev->gendisk, 0);
-	if (bdev) {
-		mutex_lock(&bdev->bd_inode->i_mutex);
-		i_size_write(bdev->bd_inode, conf->mddev->array_size << 10);
-		mutex_unlock(&bdev->bd_inode->i_mutex);
-		bdput(bdev);
+	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
+		conf->mddev->array_size = conf->mddev->size * (conf->raid_disks-1);
+		set_capacity(conf->mddev->gendisk, conf->mddev->array_size << 1);
+		conf->mddev->changed = 1;
+
+		bdev = bdget_disk(conf->mddev->gendisk, 0);
+		if (bdev) {
+			mutex_lock(&bdev->bd_inode->i_mutex);
+			i_size_write(bdev->bd_inode, conf->mddev->array_size << 10);
+			mutex_unlock(&bdev->bd_inode->i_mutex);
+			bdput(bdev);
+		}
+		spin_lock_irq(&conf->device_lock);
+		conf->expand_progress = MaxSector;
+		spin_unlock_irq(&conf->device_lock);
+		conf->mddev->reshape_position = MaxSector;
 	}
-	spin_lock_irq(&conf->device_lock);
-	conf->expand_progress = MaxSector;
-	spin_unlock_irq(&conf->device_lock);
 }
 
 static void raid5_quiesce(mddev_t *mddev, int state)

diff ./include/linux/raid/md.h~current~ ./include/linux/raid/md.h
--- ./include/linux/raid/md.h~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./include/linux/raid/md.h	2006-01-24 11:14:40.000000000 +1100
@@ -95,6 +95,8 @@ extern int sync_page_io(struct block_dev
 extern void md_do_sync(mddev_t *mddev);
 extern void md_new_event(mddev_t *mddev);
 
+extern void md_update_sb(mddev_t * mddev);
+
 #define MD_BUG(x...) { printk("md: bug in file %s, line %d\n", __FILE__, __LINE__); md_print_devices(); }
 
 #endif 

diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./include/linux/raid/md_k.h	2006-01-24 11:14:40.000000000 +1100
@@ -132,6 +132,14 @@ struct mddev_s
 
 	char				uuid[16];
 
+	/* If the array is being reshaped, we need to record the
+	 * new shape and an indication of where we are up to.
+	 * This is written to the superblock.
+	 * If reshape_position is MaxSector, then no reshape is happening (yet).
+	 */
+	sector_t			reshape_position;
+	int				delta_disks, new_level, new_layout, new_chunk;
+
 	struct mdk_thread_s		*thread;	/* management thread */
 	struct mdk_thread_s		*sync_thread;	/* doing resync or reconstruct */
 	sector_t			curr_resync;	/* blocks scheduled */

diff ./include/linux/raid/md_p.h~current~ ./include/linux/raid/md_p.h
--- ./include/linux/raid/md_p.h~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./include/linux/raid/md_p.h	2006-01-24 11:14:40.000000000 +1100
@@ -102,6 +102,18 @@ typedef struct mdp_device_descriptor_s {
 #define MD_SB_ERRORS		1
 
 #define	MD_SB_BITMAP_PRESENT	8 /* bitmap may be present nearby */
+
+/*
+ * Notes:
+ * - if an array is being reshaped (restriped) in order to change the
+ *   the number of active devices in the array, 'raid_disks' will be
+ *   the larger of the old and new numbers.  'delta_disks' will
+ *   be the "new - old".  So if +ve, raid_disks is the new value, and
+ *   "raid_disks-delta_disks" is the old.  If -ve, raid_disks is the
+ *   old value and "raid_disks+delta_disks" is the new (smaller) value.
+ */
+
+
 typedef struct mdp_superblock_s {
 	/*
 	 * Constant generic information
@@ -146,7 +158,13 @@ typedef struct mdp_superblock_s {
 	__u32 cp_events_hi;	/* 10 high-order of checkpoint update count   */
 #endif
 	__u32 recovery_cp;	/* 11 recovery checkpoint sector count	      */
-	__u32 gstate_sreserved[MD_SB_GENERIC_STATE_WORDS - 12];
+	/* There are only valid for minor_version > 90 */
+	__u64 reshape_position;	/* 12,13 next address in array-space for reshape */
+	__u32 new_level;	/* 14 new level we are reshaping to	      */
+	__u32 delta_disks;	/* 15 change in number of raid_disks	      */
+	__u32 new_layout;	/* 16 new layout			      */
+	__u32 new_chunk;	/* 17 new chunk size (bytes)		      */
+	__u32 gstate_sreserved[MD_SB_GENERIC_STATE_WORDS - 18];
 
 	/*
 	 * Personality information
@@ -207,7 +225,14 @@ struct mdp_superblock_1 {
 				 * NOTE: signed, so bitmap can be before superblock
 				 * only meaningful of feature_map[0] is set.
 				 */
-	__u8	pad1[128-100];	/* set to 0 when written */
+
+	/* These are only valid with feature bit '4' */
+	__u64	reshape_position;	/* next address in array-space for reshape */
+	__u32	new_level;	/* new level we are reshaping to		*/
+	__u32	delta_disks;	/* change in number of raid_disks		*/
+	__u32	new_layout;	/* new layout					*/
+	__u32	new_chunk;	/* new chunk size (bytes)			*/
+	__u8	pad1[128-124];	/* set to 0 when written */
 
 	/* constant this-device information - 64 bytes */
 	__u64	data_offset;	/* sector start of data, often 0 */
@@ -240,8 +265,9 @@ struct mdp_superblock_1 {
 
 /* feature_map bits */
 #define MD_FEATURE_BITMAP_OFFSET	1
+#define	MD_FEATURE_RESHAPE_ACTIVE	4
 
-#define	MD_FEATURE_ALL			1
+#define	MD_FEATURE_ALL			5
 
 #endif 
 

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 11:19:25.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 11:19:21.000000000 +1100
@@ -224,6 +224,7 @@ struct raid5_private_data {
 	struct list_head	bitmap_list; /* stripes delaying awaiting bitmap update */
 	atomic_t		preread_active_stripes; /* stripes with scheduled io */
 
+	atomic_t		reshape_stripes; /* stripes with pending writes for reshape */
 	/* unfortunately we need two cache names as we temporarily have
 	 * two caches.
 	 */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally.
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (5 preceding siblings ...)
  2006-01-24  0:41 ` [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape NeilBrown
@ 2006-01-24  0:41 ` NeilBrown
  2006-01-24  9:23 ` [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 Lars Marowsky-Bree
  2006-02-07 17:13 ` Henrik Holst
  8 siblings, 0 replies; 15+ messages in thread
From: NeilBrown @ 2006-01-24  0:41 UTC (permalink / raw)
  To: linux-raid, linux-kernel


Instead of checkpointing at each stripe, only checkpoint
when a new write would overwrite uncheckpointed data.
Block any write to the uncheckpointed area.
Arbitrarily checkpoint every 3Meg.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c         |   53 ++++++++++++++++++++++++++++++++++---------
 ./include/linux/raid/raid5.h |    3 ++
 2 files changed, 45 insertions(+), 11 deletions(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2006-01-24 11:19:21.000000000 +1100
+++ ./drivers/md/raid5.c	2006-01-24 11:26:16.000000000 +1100
@@ -1748,8 +1748,9 @@ static int make_request(request_queue_t 
 	for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
 		DEFINE_WAIT(w);
 		int disks;
-		
+
 	retry:
+		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 		if (likely(conf->expand_progress == MaxSector))
 			disks = conf->raid_disks;
 		else {
@@ -1757,6 +1758,13 @@ static int make_request(request_queue_t 
 			disks = conf->raid_disks;
 			if (logical_sector >= conf->expand_progress)
 				disks = conf->previous_raid_disks;
+			else {
+				if (logical_sector >= conf->expand_lo) {
+					spin_unlock_irq(&conf->device_lock);
+					schedule();
+					goto retry;
+				}
+			}
 			spin_unlock_irq(&conf->device_lock);
 		}
  		new_sector = raid5_compute_sector(logical_sector, disks, disks - 1,
@@ -1765,7 +1773,6 @@ static int make_request(request_queue_t 
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
-		prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
 		sh = get_active_stripe(conf, new_sector, disks, pd_idx, (bi->bi_rw&RWA_MASK));
 		if (sh) {
 			if (unlikely(conf->expand_progress != MaxSector)) {
@@ -1863,6 +1870,7 @@ static sector_t sync_request(mddev_t *md
 		 */
 		int i;
 		int dd_idx;
+		sector_t writepos, safepos, gap;
 
 		if (sector_nr == 0 &&
 		    conf->expand_progress != 0) {
@@ -1873,15 +1881,36 @@ static sector_t sync_request(mddev_t *md
 			return sector_nr;
 		}
 
-		/* Cannot proceed until we've updated the superblock... */
-		wait_event(conf->wait_for_overlap,
-			   atomic_read(&conf->reshape_stripes)==0);
-		mddev->reshape_position = conf->expand_progress;
-
-		mddev->sb_dirty = 1;
-		md_wakeup_thread(mddev->thread);
-		wait_event(mddev->sb_wait, mddev->sb_dirty == 0 ||
-			kthread_should_stop());
+		/* we update the metadata when there is more than 3Meg
+		 * in the block range (that is rather arbitrary, should
+		 * probably be time based) or when the data about to be
+		 * copied would over-write the source of the data at
+		 * the front of the range.
+		 * i.e. one new_stripe forward from expand_progress new_maps
+		 * to after where expand_lo old_maps to
+		 */
+		writepos = conf->expand_progress +
+			conf->chunk_size/512*(conf->raid_disks-1);
+		sector_div(writepos, conf->raid_disks-1);
+		safepos = conf->expand_lo;
+		sector_div(safepos, conf->previous_raid_disks-1);
+		gap = conf->expand_progress - conf->expand_lo;
+
+		if (writepos >= safepos ||
+		    gap > (conf->raid_disks-1)*3000*2 /*3Meg*/) {
+			/* Cannot proceed until we've updated the superblock... */
+			wait_event(conf->wait_for_overlap,
+				   atomic_read(&conf->reshape_stripes)==0);
+			mddev->reshape_position = conf->expand_progress;
+			mddev->sb_dirty = 1;
+			md_wakeup_thread(mddev->thread);
+			wait_event(mddev->sb_wait, mddev->sb_dirty == 0 ||
+				   kthread_should_stop());
+			spin_lock_irq(&conf->device_lock);
+			conf->expand_lo = mddev->reshape_position;
+			spin_unlock_irq(&conf->device_lock);
+			wake_up(&conf->wait_for_overlap);
+		}
 
 		for (i=0; i < conf->chunk_size/512; i+= STRIPE_SECTORS) {
 			int j;
@@ -2312,6 +2341,7 @@ static int run(mddev_t *mddev)
 
 	if (conf->expand_progress != MaxSector) {
 		printk("...ok start reshape thread\n");
+		conf->expand_lo = conf->expand_progress;
 		atomic_set(&conf->reshape_stripes, 0);
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
@@ -2599,6 +2629,7 @@ static int raid5_reshape(mddev_t *mddev,
 	conf->previous_raid_disks = conf->raid_disks;
 	conf->raid_disks = raid_disks;
 	conf->expand_progress = 0;
+	conf->expand_lo = 0;
 	spin_unlock_irq(&conf->device_lock);
 
 	/* Add some new drives, as many as will fit.

diff ./include/linux/raid/raid5.h~current~ ./include/linux/raid/raid5.h
--- ./include/linux/raid/raid5.h~current~	2006-01-24 11:19:21.000000000 +1100
+++ ./include/linux/raid/raid5.h	2006-01-24 11:26:16.000000000 +1100
@@ -217,6 +217,9 @@ struct raid5_private_data {
 
 	/* used during an expand */
 	sector_t		expand_progress;	/* MaxSector when no expand happening */
+	sector_t		expand_lo; /* from here up to expand_progress it out-of-bounds
+					    * as we haven't flushed the metadata yet
+					    */
 	int			previous_raid_disks;
 
 	struct list_head	handle_list; /* stripes needing handling */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (6 preceding siblings ...)
  2006-01-24  0:41 ` [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally NeilBrown
@ 2006-01-24  9:23 ` Lars Marowsky-Bree
  2006-01-24  9:32   ` Neil Brown
  2006-02-07 17:13 ` Henrik Holst
  8 siblings, 1 reply; 15+ messages in thread
From: Lars Marowsky-Bree @ 2006-01-24  9:23 UTC (permalink / raw)
  To: NeilBrown, linux-raid, linux-kernel

On 2006-01-24T11:40:47, NeilBrown <neilb@suse.de> wrote:

> I am expecting that I will ultimately support online conversion of
> raid5 to raid6 with only one extra device.  This process is not
> (efficiently) checkpointable and so will be at-your-risk.

So the best way to go about that, if one wants to keep that option open
w/o that risk, would be to not create a raid5 in the first place, but a
raid6 with one disk missing?

Maybe even have mdadm default to that - as long as just one parity disk
is missing, no slowdown should happen, right?


Sincerely,
    Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business	 -- Charles Darwin
"Ignorance more frequently begets confidence than does knowledge"

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2
  2006-01-24  9:23 ` [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 Lars Marowsky-Bree
@ 2006-01-24  9:32   ` Neil Brown
  0 siblings, 0 replies; 15+ messages in thread
From: Neil Brown @ 2006-01-24  9:32 UTC (permalink / raw)
  To: Lars Marowsky-Bree; +Cc: linux-raid, linux-kernel

On Tuesday January 24, lmb@suse.de wrote:
> On 2006-01-24T11:40:47, NeilBrown <neilb@suse.de> wrote:
> 
> > I am expecting that I will ultimately support online conversion of
> > raid5 to raid6 with only one extra device.  This process is not
> > (efficiently) checkpointable and so will be at-your-risk.
> 
> So the best way to go about that, if one wants to keep that option open
> w/o that risk, would be to not create a raid5 in the first place, but a
> raid6 with one disk missing?
> 
> Maybe even have mdadm default to that - as long as just one parity disk
> is missing, no slowdown should happen, right?

Not exactly....

raid6 has rotating parity drives, for both P and Q (the two different
'parity' blocks).
With one missing device, some Ps, some Qs, and some data would be
missing, and you would definitely get a slowdown trying to generate
some of it.

We could define a raid6 layout that didn't rotate Q.  Then you would
be able to do what you suggest.
However it would then be no different from creating a normal raid5 and
supporting online conversion from raid5 to raid6-with-non-rotating-Q.
This conversion doesn't need an reshaping pass, just a recovery of the
now-missing device.

raid6-with-non-rotating-Q would have similar issues to raid4 - one
drive becomes a hot-spot for writes.  I don't know how much of an
issue this really is though.

NeilBrown

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape
  2006-01-24  0:41 ` [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape NeilBrown
@ 2006-01-27 12:37   ` Molle Bestefich
  0 siblings, 0 replies; 15+ messages in thread
From: Molle Bestefich @ 2006-01-27 12:37 UTC (permalink / raw)
  To: linux-raid

NeilBrown wrote:
> We allow the superblock to record an 'old' and a 'new'
> geometry, and a position where any conversion is up to.

> When starting an array we check for an incomplete reshape
> and restart the reshape process if needed.

*Super* cool!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2
  2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
                   ` (7 preceding siblings ...)
  2006-01-24  9:23 ` [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 Lars Marowsky-Bree
@ 2006-02-07 17:13 ` Henrik Holst
  2006-02-09  3:32   ` Neil Brown
  8 siblings, 1 reply; 15+ messages in thread
From: Henrik Holst @ 2006-02-07 17:13 UTC (permalink / raw)
  To: linux-raid

Hello linux world!

Excuse me for being so ignorant but /exactly how/ do I go about to find
out which files to download from kernel.org that will approve these
patches?

[snip from N. Browns initial post]

>
> [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow.
> [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array.
> [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding.
> [PATCH 004 of 7] md: Core of raid5 resize process
> [PATCH 005 of 7] md: Final stages of raid5 expand code.
> [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape
> [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally.
>

I only get lot's of "chunk failed" when running patch command on my
src.tar.gz kernels. :-(

Thanks for advice,

Henrik Holst. Certified kernel patch noob.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2
  2006-02-07 17:13 ` Henrik Holst
@ 2006-02-09  3:32   ` Neil Brown
  2006-02-09  6:35     ` Kernels and MD versions (was: md: Introduction - raid5 reshape mark-2) Patrik Jonsson
  0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2006-02-09  3:32 UTC (permalink / raw)
  To: Henrik Holst; +Cc: linux-raid

On Tuesday February 7, henrik.holst@idgmail.se wrote:
> Hello linux world!
> 
> Excuse me for being so ignorant but /exactly how/ do I go about to find
> out which files to download from kernel.org that will approve these
> patches?

I always make them against the latest -mm kernel, so that would be a
good place to start.  However things change quickly and I can't
promise it will apply against whatever is the 'latest' today.

If you would like to nominate a particular recent kernel, I'll create
a patch set that is guaranteed to apply against that. (Testing is
always appreciated, and well worth that small effort on my part).

NeilBrown

> 
> [snip from N. Browns initial post]
> 
> >
> > [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow.
> > [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array.
> > [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding.
> > [PATCH 004 of 7] md: Core of raid5 resize process
> > [PATCH 005 of 7] md: Final stages of raid5 expand code.
> > [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape
> > [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally.
> >
> 
> I only get lot's of "chunk failed" when running patch command on my
> src.tar.gz kernels. :-(
> 
> Thanks for advice,
> 
> Henrik Holst. Certified kernel patch noob.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Kernels and MD versions (was: md: Introduction - raid5 reshape mark-2)
  2006-02-09  3:32   ` Neil Brown
@ 2006-02-09  6:35     ` Patrik Jonsson
  2006-02-09 18:07       ` Mr. James W. Laferriere
  0 siblings, 1 reply; 15+ messages in thread
From: Patrik Jonsson @ 2006-02-09  6:35 UTC (permalink / raw)
  Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

Neil Brown wrote:
> I always make them against the latest -mm kernel, so that would be a
> good place to start.  However things change quickly and I can't
> promise it will apply against whatever is the 'latest' today.
> 
> If you would like to nominate a particular recent kernel, I'll create
> a patch set that is guaranteed to apply against that. (Testing is
> always appreciated, and well worth that small effort on my part).

I find this is a major problem for me, too. Even though I try to stay up
to date with the md developments, I have a hard time piecing together
which past patches went with which version, so if I get a recent version
I don't know which patches need to be applied and which are already in
there.

My suggestion is that Neil, should he be willing, keep a log somewhere
which detail kernel version and what major updates in md functionality
go along with it. Something like
2.6.14         raid5 read error correction
2.6.15         md /sys interface
2.6.16.rc1     raid5 reshape
2.6.16.rc2-mm4 something else cool

it would include the released kernels and the previews of the current
kernel. That way, say I see that the latest FC4 kernel is 2.6.14, I
could look and see that since the raid5 read error correction was
included, I don't have to go looking for the patches.

Maybe this is too much hassle (or maybe it's already out there
somewhere) but I'm thinking simple, and I think it would give a
high-level overview of the development for us who are not intimately
involved in every kernel version.

Regards,

/Patrik

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Kernels and MD versions (was: md: Introduction - raid5 reshape mark-2)
  2006-02-09  6:35     ` Kernels and MD versions (was: md: Introduction - raid5 reshape mark-2) Patrik Jonsson
@ 2006-02-09 18:07       ` Mr. James W. Laferriere
  0 siblings, 0 replies; 15+ messages in thread
From: Mr. James W. Laferriere @ 2006-02-09 18:07 UTC (permalink / raw)
  To: Patrik Jonsson; +Cc: linux-raid

 	Hello Patrik ,

On Wed, 8 Feb 2006, Patrik Jonsson wrote:
> Neil Brown wrote:
>> I always make them against the latest -mm kernel, so that would be a
>> good place to start.  However things change quickly and I can't
>> promise it will apply against whatever is the 'latest' today.
>>
>> If you would like to nominate a particular recent kernel, I'll create
>> a patch set that is guaranteed to apply against that. (Testing is
>> always appreciated, and well worth that small effort on my part).
>
> I find this is a major problem for me, too. Even though I try to stay up
> to date with the md developments, I have a hard time piecing together
> which past patches went with which version, so if I get a recent version
> I don't know which patches need to be applied and which are already in
> there.
>
> My suggestion is that Neil, should he be willing, keep a log somewhere
> which detail kernel version and what major updates in md functionality
> go along with it. Something like
> 2.6.14         raid5 read error correction
> 2.6.15         md /sys interface
> 2.6.16.rc1     raid5 reshape
> 2.6.16.rc2-mm4 something else cool
>
> it would include the released kernels and the previews of the current
> kernel. That way, say I see that the latest FC4 kernel is 2.6.14, I
> could look and see that since the raid5 read error correction was
> included, I don't have to go looking for the patches.
>
> Maybe this is too much hassle (or maybe it's already out there
> somewhere) but I'm thinking simple, and I think it would give a
> high-level overview of the development for us who are not intimately
> involved in every kernel version.
 	Iirc ,  The git/snv/cvs repositories 'can' contain this
 	information .  Neil (as you say , if willing) can pull the
 	kernel versions & Notes for his submissions from the
 	repository .   Hth ,  JimL
-- 
+------------------------------------------------------------------+
| James   W.   Laferriere | System    Techniques | Give me VMS     |
| Network        Engineer | 3542 Broken Yoke Dr. |  Give me Linux  |
| babydr@baby-dragons.com | Billings , MT. 59105 |   only  on  AXP |
|  http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr   |
+------------------------------------------------------------------+

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2006-02-09 18:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-01-24  0:40 [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 NeilBrown
2006-01-24  0:40 ` [PATCH 001 of 7] md: Split disks array out of raid5 conf structure so it is easier to grow NeilBrown
2006-01-24  0:40 ` [PATCH 002 of 7] md: Allow stripes to be expanded in preparation for expanding an array NeilBrown
2006-01-24  0:41 ` [PATCH 003 of 7] md: Infrastructure to allow normal IO to continue while array is expanding NeilBrown
2006-01-24  0:41 ` [PATCH 004 of 7] md: Core of raid5 resize process NeilBrown
2006-01-24  0:41 ` [PATCH 005 of 7] md: Final stages of raid5 expand code NeilBrown
2006-01-24  0:41 ` [PATCH 006 of 7] md: Checkpoint and allow restart of raid5 reshape NeilBrown
2006-01-27 12:37   ` Molle Bestefich
2006-01-24  0:41 ` [PATCH 007 of 7] md: Only checkpoint expansion progress occasionally NeilBrown
2006-01-24  9:23 ` [PATCH 000 of 7] md: Introduction - raid5 reshape mark-2 Lars Marowsky-Bree
2006-01-24  9:32   ` Neil Brown
2006-02-07 17:13 ` Henrik Holst
2006-02-09  3:32   ` Neil Brown
2006-02-09  6:35     ` Kernels and MD versions (was: md: Introduction - raid5 reshape mark-2) Patrik Jonsson
2006-02-09 18:07       ` Mr. James W. Laferriere

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).