* [PATCH 000 of 5] md: assorted bug fixes and minor features
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

Following are 5 patches for md suitable for 2.6.22.
None are needed for -stable.

Thanks
NeilBrown

 [PATCH 001 of 5] md: Move test for whether level supports bitmap to correct place.
 [PATCH 002 of 5] md: Stop using csum_partial for checksum calculation in md.
 [PATCH 003 of 5] md: Remove the slash from the name of a kmem_cache used by raid5.
 [PATCH 004 of 5] md: Allow reshape_position for md arrays to be set via sysfs.
 [PATCH 005 of 5] md: Improve partition detection in md array.
* [PATCH 001 of 5] md: Move test for whether level supports bitmap to correct place.
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

We need to check the internal consistency of the superblock in
load_super.  validate_super is for inter-device consistency.

With the test in the wrong place, a badly created array will confuse
md rather than produce sensible errors.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |   42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 14:33:31.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 14:33:31.000000000 +1000
@@ -695,6 +695,17 @@ static int super_90_load(mdk_rdev_t *rde
 	rdev->data_offset = 0;
 	rdev->sb_size = MD_SB_BYTES;
 
+	if (sb->state & (1<<MD_SB_BITMAP_PRESENT)) {
+		if (sb->level != 1 && sb->level != 4
+		    && sb->level != 5 && sb->level != 6
+		    && sb->level != 10) {
+			/* FIXME use a better test */
+			printk(KERN_WARNING
+			       "md: bitmaps not supported for this level.\n");
+			goto abort;
+		}
+	}
+
 	if (sb->level == LEVEL_MULTIPATH)
 		rdev->desc_nr = -1;
 	else
@@ -793,16 +804,8 @@ static int super_90_validate(mddev_t *md
 		mddev->max_disks = MD_SB_DISKS;
 
 		if (sb->state & (1<<MD_SB_BITMAP_PRESENT) &&
-		    mddev->bitmap_file == NULL) {
-			if (mddev->level != 1 && mddev->level != 4
-			    && mddev->level != 5 && mddev->level != 6
-			    && mddev->level != 10) {
-				/* FIXME use a better test */
-				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-				return -EINVAL;
-			}
+		    mddev->bitmap_file == NULL)
 			mddev->bitmap_offset = mddev->default_bitmap_offset;
-		}
 
 	} else if (mddev->pers == NULL) {
 		/* Insist on good event counter while assembling */
@@ -1059,6 +1062,18 @@ static int super_1_load(mdk_rdev_t *rdev
 			bdevname(rdev->bdev,b));
 		return -EINVAL;
 	}
+	if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET)) {
+		if (sb->level != cpu_to_le32(1) &&
+		    sb->level != cpu_to_le32(4) &&
+		    sb->level != cpu_to_le32(5) &&
+		    sb->level != cpu_to_le32(6) &&
+		    sb->level != cpu_to_le32(10)) {
+			printk(KERN_WARNING
+			       "md: bitmaps not supported for this level.\n");
+			return -EINVAL;
+		}
+	}
+
 	rdev->preferred_minor = 0xffff;
 	rdev->data_offset = le64_to_cpu(sb->data_offset);
 	atomic_set(&rdev->corrected_errors, le32_to_cpu(sb->cnt_corrected_read));
@@ -1142,14 +1157,9 @@ static int super_1_validate(mddev_t *mdd
 		mddev->max_disks = (4096-256)/2;
 
 		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_BITMAP_OFFSET) &&
-		    mddev->bitmap_file == NULL ) {
-			if (mddev->level != 1 && mddev->level != 5 && mddev->level != 6
-			    && mddev->level != 10) {
-				printk(KERN_WARNING "md: bitmaps not supported for this level.\n");
-				return -EINVAL;
-			}
+		    mddev->bitmap_file == NULL )
 			mddev->bitmap_offset = (__s32)le32_to_cpu(sb->bitmap_offset);
-		}
+
 		if ((le32_to_cpu(sb->feature_map) & MD_FEATURE_RESHAPE_ACTIVE)) {
 			mddev->reshape_position = le64_to_cpu(sb->reshape_position);
 			mddev->delta_disks = le32_to_cpu(sb->delta_disks);
* [PATCH 002 of 5] md: Stop using csum_partial for checksum calculation in md.
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

If CONFIG_NET is not selected, csum_partial is not exported, so md.ko
cannot use it.  We shouldn't really be using csum_partial anyway as it
is an internal-to-networking interface.

So replace it with C code to do the same thing.
Speed is not crucial here, so something simple and correct is best.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |   31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 14:33:31.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 14:57:41.000000000 +1000
@@ -590,14 +590,41 @@ abort:
 	return ret;
 }
 
+
+static u32 md_csum_fold(u32 csum)
+{
+	csum = (csum & 0xffff) + (csum >> 16);
+	return (csum & 0xffff) + (csum >> 16);
+}
+
 static unsigned int calc_sb_csum(mdp_super_t * sb)
 {
+	u64 newcsum = 0;
+	u32 *sb32 = (u32*)sb;
+	int i;
 	unsigned int disk_csum, csum;
 
 	disk_csum = sb->sb_csum;
 	sb->sb_csum = 0;
-	csum = csum_partial((void *)sb, MD_SB_BYTES, 0);
+
+	for (i = 0; i < MD_SB_BYTES/4 ; i++)
+		newcsum += sb32[i];
+	csum = (newcsum & 0xffffffff) + (newcsum>>32);
+
+
+#ifdef CONFIG_ALPHA
+	/* This used to use csum_partial, which was wrong for several
+	 * reasons including that different results are returned on
+	 * different architectures.  It isn't critical that we get exactly
+	 * the same return value as before (we always csum_fold before
+	 * testing, and that removes any differences).  However as we
+	 * know that csum_partial always returned a 16bit value on
+	 * alphas, do a fold to maximise conformity to previous behaviour.
+	 */
+	sb->sb_csum = md_csum_fold(disk_csum);
+#else
 	sb->sb_csum = disk_csum;
+#endif
 	return csum;
 }
 
@@ -685,7 +712,7 @@ static int super_90_load(mdk_rdev_t *rde
 	if (sb->raid_disks <= 0)
 		goto abort;
 
-	if (csum_fold(calc_sb_csum(sb)) != csum_fold(sb->sb_csum)) {
+	if (md_csum_fold(calc_sb_csum(sb)) != md_csum_fold(sb->sb_csum)) {
 		printk(KERN_WARNING "md: invalid superblock checksum on %s\n",
 			b);
 		goto abort;
* [PATCH 003 of 5] md: Remove the slash from the name of a kmem_cache used by raid5.
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

SLUB doesn't like slashes as it wants to use the cache name as the
name of a directory (or symlink) in sysfs.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-05-07 14:36:01.000000000 +1000
+++ ./drivers/md/raid5.c	2007-05-07 15:08:45.000000000 +1000
@@ -931,8 +931,8 @@ static int grow_stripes(raid5_conf_t *co
 	struct kmem_cache *sc;
 	int devs = conf->raid_disks;
 
-	sprintf(conf->cache_name[0], "raid5/%s", mdname(conf->mddev));
-	sprintf(conf->cache_name[1], "raid5/%s-alt", mdname(conf->mddev));
+	sprintf(conf->cache_name[0], "raid5-%s", mdname(conf->mddev));
+	sprintf(conf->cache_name[1], "raid5-%s-alt", mdname(conf->mddev));
 	conf->active_name = 0;
 	sc = kmem_cache_create(conf->cache_name[conf->active_name],
 			       sizeof(struct stripe_head)+(devs-1)*sizeof(struct r5dev),
* [PATCH 004 of 5] md: Allow reshape_position for md arrays to be set via sysfs.
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

"reshape_position" records how much progress has been made on a
"reshape" (adding drives, changing layout or chunksize).

When it is set, the number of drives, layout and chunksize can have
two possible values, an old and a new.

So allow these different values to be visible, and allow both old
and new to be set: Set the old ones first, then the
reshape_position, then the new values.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./Documentation/md.txt |   72 +++++++++++++++++++++++++++++--------------------
 ./drivers/md/md.c      |   70 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 107 insertions(+), 35 deletions(-)

diff .prev/Documentation/md.txt ./Documentation/md.txt
--- .prev/Documentation/md.txt	2007-05-07 15:38:13.000000000 +1000
+++ ./Documentation/md.txt	2007-05-07 15:49:01.000000000 +1000
@@ -178,6 +178,21 @@ All md devices contain:
      The size should be at least PAGE_SIZE (4k) and should be a power
      of 2.  This can only be set while assembling an array
 
+  layout
+     The "layout" for the array for the particular level.  This is
+     simply a number that is interpretted differently by different
+     levels.  It can be written while assembling an array.
+
+  reshape_position
+     This is either "none" or a sector number within the devices of
+     the array where "reshape" is up to.  If this is set, the three
+     attributes mentioned above (raid_disks, chunk_size, layout) can
+     potentially have 2 values, an old and a new value.  If these
+     values differ, reading the attribute returns
+        new (old)
+     and writing will effect the 'new' value, leaving the 'old'
+     unchanged.
+
   component_size
      For arrays with data redundancy (i.e. not raid0, linear, faulty,
      multipath), all components must be the same size - or at least
@@ -193,11 +208,6 @@ All md devices contain:
      1.2 (newer format in varying locations) or "none" indicating that
      the kernel isn't managing metadata at all.
 
-  layout
-     The "layout" for the array for the particular level.  This is
-     simply a number that is interpretted differently by different
-     levels.  It can be written while assembling an array.
-
   resync_start
      The point at which resync should start.  If no resync is needed,
      this will be a very large number.  At array creation it will
@@ -259,29 +269,6 @@ All md devices contain:
          like active, but no writes have been seen for a while
          (safe_mode_delay).
 
-  sync_speed_min
-  sync_speed_max
-     This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
-     however they only apply to the particular array.
-     If no value has been written to these, of if the word 'system'
-     is written, then the system-wide value is used.  If a value,
-     in kibibytes-per-second is written, then it is used.
-     When the files are read, they show the currently active value
-     followed by "(local)" or "(system)" depending on whether it is
-     a locally set or system-wide value.
-
-  sync_completed
-     This shows the number of sectors that have been completed of
-     whatever the current sync_action is, followed by the number of
-     sectors in total that could need to be processed.  The two
-     numbers are separated by a '/'  thus effectively showing one
-     value, a fraction of the process that is complete.
-
-  sync_speed
-     This shows the current actual speed, in K/sec, of the current
-     sync_action.  It is averaged over the last 30 seconds.
-
-
 As component devices are added to an md array, they appear in the 'md'
 directory as new directories named
       dev-XXX
@@ -412,6 +399,35 @@ also have
       Note that the numbers are 'bit' numbers, not 'block' numbers.
       They should be scaled by the bitmap_chunksize.
 
+  sync_speed_min
+  sync_speed_max
+     This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
+     however they only apply to the particular array.
+     If no value has been written to these, of if the word 'system'
+     is written, then the system-wide value is used.  If a value,
+     in kibibytes-per-second is written, then it is used.
+     When the files are read, they show the currently active value
+     followed by "(local)" or "(system)" depending on whether it is
+     a locally set or system-wide value.
+
+  sync_completed
+     This shows the number of sectors that have been completed of
+     whatever the current sync_action is, followed by the number of
+     sectors in total that could need to be processed.  The two
+     numbers are separated by a '/'  thus effectively showing one
+     value, a fraction of the process that is complete.
+
+  sync_speed
+     This shows the current actual speed, in K/sec, of the current
+     sync_action.  It is averaged over the last 30 seconds.
+
+  suspend_lo
+  suspend_hi
+     The two values, given as numbers of sectors, indicate a range
+     within the array where IO will be blocked.  This is currently
+     only supported for raid4/5/6.
+
+
 Each active md device may also have attributes specific to the
 personality module that manages it.
 These are specific to the implementation of the module and could
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 14:57:41.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 15:18:16.000000000 +1000
@@ -274,6 +274,7 @@ static mddev_t * mddev_find(dev_t unit)
 	atomic_set(&new->active, 1);
 	spin_lock_init(&new->write_lock);
 	init_waitqueue_head(&new->sb_wait);
+	new->reshape_position = MaxSector;
 
 	new->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!new->queue) {
@@ -2242,6 +2243,10 @@ static ssize_t
 layout_show(mddev_t *mddev, char *page)
 {
 	/* just a number, not meaningful for all levels */
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->layout != mddev->new_layout)
+		return sprintf(page, "%d (%d)\n",
+			       mddev->new_layout, mddev->layout);
 	return sprintf(page, "%d\n", mddev->layout);
 }
 
@@ -2250,13 +2255,16 @@ layout_store(mddev_t *mddev, const char
 {
 	char *e;
 	unsigned long n = simple_strtoul(buf, &e, 10);
-	if (mddev->pers)
-		return -EBUSY;
 
 	if (!*buf || (*e && *e != '\n'))
 		return -EINVAL;
 
-	mddev->layout = n;
+	if (mddev->pers)
+		return -EBUSY;
+	if (mddev->reshape_position != MaxSector)
+		mddev->new_layout = n;
+	else
+		mddev->layout = n;
 	return len;
 }
 static struct md_sysfs_entry md_layout =
@@ -2268,6 +2276,10 @@ raid_disks_show(mddev_t *mddev, char *pa
 {
 	if (mddev->raid_disks == 0)
 		return 0;
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->delta_disks != 0)
+		return sprintf(page, "%d (%d)\n", mddev->raid_disks,
+			       mddev->raid_disks - mddev->delta_disks);
 	return sprintf(page, "%d\n", mddev->raid_disks);
 }
 
@@ -2285,7 +2297,11 @@ raid_disks_store(mddev_t *mddev, const c
 
 	if (mddev->pers)
 		rv = update_raid_disks(mddev, n);
-	else
+	else if (mddev->reshape_position != MaxSector) {
+		int olddisks = mddev->raid_disks - mddev->delta_disks;
+		mddev->delta_disks = n - olddisks;
+		mddev->raid_disks = n;
+	} else
 		mddev->raid_disks = n;
 	return rv ? rv : len;
 }
@@ -2295,6 +2311,10 @@ __ATTR(raid_disks, S_IRUGO|S_IWUSR, raid
 static ssize_t
 chunk_size_show(mddev_t *mddev, char *page)
 {
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->chunk_size != mddev->new_chunk)
+		return sprintf(page, "%d (%d)\n", mddev->new_chunk,
+			       mddev->chunk_size);
 	return sprintf(page, "%d\n", mddev->chunk_size);
 }
 
@@ -2305,12 +2325,15 @@ chunk_size_store(mddev_t *mddev, const c
 	char *e;
 	unsigned long n = simple_strtoul(buf, &e, 10);
 
-	if (mddev->pers)
-		return -EBUSY;
 	if (!*buf || (*e && *e != '\n'))
 		return -EINVAL;
 
-	mddev->chunk_size = n;
+	if (mddev->pers)
+		return -EBUSY;
+	else if (mddev->reshape_position != MaxSector)
+		mddev->new_chunk = n;
+	else
+		mddev->chunk_size = n;
 	return len;
 }
 static struct md_sysfs_entry md_chunk_size =
@@ -2896,6 +2919,37 @@ suspend_hi_store(mddev_t *mddev, const c
 static struct md_sysfs_entry md_suspend_hi =
 __ATTR(suspend_hi, S_IRUGO|S_IWUSR, suspend_hi_show, suspend_hi_store);
 
+static ssize_t
+reshape_position_show(mddev_t *mddev, char *page)
+{
+	if (mddev->reshape_position != MaxSector)
+		return sprintf(page, "%llu\n",
+			       (unsigned long long)mddev->reshape_position);
+	strcpy(page, "none\n");
+	return 5;
+}
+
+static ssize_t
+reshape_position_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long long new = simple_strtoull(buf, &e, 10);
+	if (mddev->pers)
+		return -EBUSY;
+	if (buf == e || (*e && *e != '\n'))
+		return -EINVAL;
+	mddev->reshape_position = new;
+	mddev->delta_disks = 0;
+	mddev->new_level = mddev->level;
+	mddev->new_layout = mddev->layout;
+	mddev->new_chunk = mddev->chunk_size;
+	return len;
+}
+
+static struct md_sysfs_entry md_reshape_position =
+__ATTR(reshape_position, S_IRUGO|S_IWUSR, reshape_position_show,
+       reshape_position_store);
+
 
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
@@ -2908,6 +2962,7 @@ static struct attribute *md_default_attr
 	&md_new_device.attr,
 	&md_safe_delay.attr,
 	&md_array_state.attr,
+	&md_reshape_position.attr,
 	NULL,
 };
 
@@ -3446,6 +3501,7 @@ static int do_md_stop(mddev_t * mddev, i
 		mddev->size = 0;
 		mddev->raid_disks = 0;
 		mddev->recovery_cp = 0;
+		mddev->reshape_position = MaxSector;
 
 	} else if (mddev->pers)
 		printk(KERN_INFO "md: %s switched to read-only mode.\n",
* [PATCH 004 of 5] md: Allow reshape_position for md arrays to be set via sysfs. @ 2007-05-08 4:29 ` NeilBrown 0 siblings, 0 replies; 12+ messages in thread From: NeilBrown @ 2007-05-08 4:29 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown "reshape_position" records how much progress has been made on a "reshape" (adding drives, changing layout or chunksize). When it is set, the number of drives, layout and chunksize can have two possible values, an old an a new. So allow these different values to be visible, and allow both old and new to be set: Set the old ones first, then the reshape_position, then the new values. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./Documentation/md.txt | 72 +++++++++++++++++++++++++++++-------------------- ./drivers/md/md.c | 70 ++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 107 insertions(+), 35 deletions(-) diff .prev/Documentation/md.txt ./Documentation/md.txt --- .prev/Documentation/md.txt 2007-05-07 15:38:13.000000000 +1000 +++ ./Documentation/md.txt 2007-05-07 15:49:01.000000000 +1000 @@ -178,6 +178,21 @@ All md devices contain: The size should be at least PAGE_SIZE (4k) and should be a power of 2. This can only be set while assembling an array + layout + The "layout" for the array for the particular level. This is + simply a number that is interpretted differently by different + levels. It can be written while assembling an array. + + reshape_position + This is either "none" or a sector number within the devices of + the array where "reshape" is up to. If this is set, the three + attributes mentioned above (raid_disks, chunk_size, layout) can + potentially have 2 values, an old and a new value. If these + values differ, reading the attribute returns + new (old) + and writing will effect the 'new' value, leaving the 'old' + unchanged. + component_size For arrays with data redundancy (i.e. 
not raid0, linear, faulty, multipath), all components must be the same size - or at least @@ -193,11 +208,6 @@ All md devices contain: 1.2 (newer format in varying locations) or "none" indicating that the kernel isn't managing metadata at all. - layout - The "layout" for the array for the particular level. This is - simply a number that is interpretted differently by different - levels. It can be written while assembling an array. - resync_start The point at which resync should start. If no resync is needed, this will be a very large number. At array creation it will @@ -259,29 +269,6 @@ All md devices contain: like active, but no writes have been seen for a while (safe_mode_delay). - sync_speed_min - sync_speed_max - This are similar to /proc/sys/dev/raid/speed_limit_{min,max} - however they only apply to the particular array. - If no value has been written to these, of if the word 'system' - is written, then the system-wide value is used. If a value, - in kibibytes-per-second is written, then it is used. - When the files are read, they show the currently active value - followed by "(local)" or "(system)" depending on whether it is - a locally set or system-wide value. - - sync_completed - This shows the number of sectors that have been completed of - whatever the current sync_action is, followed by the number of - sectors in total that could need to be processed. The two - numbers are separated by a '/' thus effectively showing one - value, a fraction of the process that is complete. - - sync_speed - This shows the current actual speed, in K/sec, of the current - sync_action. It is averaged over the last 30 seconds. - - As component devices are added to an md array, they appear in the 'md' directory as new directories named dev-XXX @@ -412,6 +399,35 @@ also have Note that the numbers are 'bit' numbers, not 'block' numbers. They should be scaled by the bitmap_chunksize. 
+ sync_speed_min + sync_speed_max + This are similar to /proc/sys/dev/raid/speed_limit_{min,max} + however they only apply to the particular array. + If no value has been written to these, of if the word 'system' + is written, then the system-wide value is used. If a value, + in kibibytes-per-second is written, then it is used. + When the files are read, they show the currently active value + followed by "(local)" or "(system)" depending on whether it is + a locally set or system-wide value. + + sync_completed + This shows the number of sectors that have been completed of + whatever the current sync_action is, followed by the number of + sectors in total that could need to be processed. The two + numbers are separated by a '/' thus effectively showing one + value, a fraction of the process that is complete. + + sync_speed + This shows the current actual speed, in K/sec, of the current + sync_action. It is averaged over the last 30 seconds. + + suspend_lo + suspend_hi + The two values, given as numbers of sectors, indicate a range + within the array where IO will be blocked. This is currently + only supported for raid4/5/6. + + Each active md device may also have attributes specific to the personality module that manages it. 
These are specific to the implementation of the module and could

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-07 14:57:41.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 15:18:16.000000000 +1000
@@ -274,6 +274,7 @@ static mddev_t * mddev_find(dev_t unit)
 	atomic_set(&new->active, 1);
 	spin_lock_init(&new->write_lock);
 	init_waitqueue_head(&new->sb_wait);
+	new->reshape_position = MaxSector;
 
 	new->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!new->queue) {
@@ -2242,6 +2243,10 @@ static ssize_t
 layout_show(mddev_t *mddev, char *page)
 {
 	/* just a number, not meaningful for all levels */
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->layout != mddev->new_layout)
+		return sprintf(page, "%d (%d)\n",
+			       mddev->new_layout, mddev->layout);
 	return sprintf(page, "%d\n", mddev->layout);
 }
 
@@ -2250,13 +2255,16 @@ layout_store(mddev_t *mddev, const char
 {
 	char *e;
 	unsigned long n = simple_strtoul(buf, &e, 10);
-	if (mddev->pers)
-		return -EBUSY;
 
 	if (!*buf || (*e && *e != '\n'))
 		return -EINVAL;
 
-	mddev->layout = n;
+	if (mddev->pers)
+		return -EBUSY;
+	if (mddev->reshape_position != MaxSector)
+		mddev->new_layout = n;
+	else
+		mddev->layout = n;
 	return len;
 }
 static struct md_sysfs_entry md_layout =
@@ -2268,6 +2276,10 @@ raid_disks_show(mddev_t *mddev, char *pa
 {
 	if (mddev->raid_disks == 0)
 		return 0;
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->delta_disks != 0)
+		return sprintf(page, "%d (%d)\n", mddev->raid_disks,
+			       mddev->raid_disks - mddev->delta_disks);
 	return sprintf(page, "%d\n", mddev->raid_disks);
 }
 
@@ -2285,7 +2297,11 @@ raid_disks_store(mddev_t *mddev, const c
 
 	if (mddev->pers)
 		rv = update_raid_disks(mddev, n);
-	else
+	else if (mddev->reshape_position != MaxSector) {
+		int olddisks = mddev->raid_disks - mddev->delta_disks;
+		mddev->delta_disks = n - olddisks;
+		mddev->raid_disks = n;
+	} else
 		mddev->raid_disks = n;
 	return rv ? rv : len;
 }
@@ -2295,6 +2311,10 @@ __ATTR(raid_disks, S_IRUGO|S_IWUSR, raid
 
 static ssize_t
 chunk_size_show(mddev_t *mddev, char *page)
 {
+	if (mddev->reshape_position != MaxSector &&
+	    mddev->chunk_size != mddev->new_chunk)
+		return sprintf(page, "%d (%d)\n", mddev->new_chunk,
+			       mddev->chunk_size);
 	return sprintf(page, "%d\n", mddev->chunk_size);
 }
@@ -2305,12 +2325,15 @@ chunk_size_store(mddev_t *mddev, const c
 	char *e;
 	unsigned long n = simple_strtoul(buf, &e, 10);
 
-	if (mddev->pers)
-		return -EBUSY;
 	if (!*buf || (*e && *e != '\n'))
 		return -EINVAL;
 
-	mddev->chunk_size = n;
+	if (mddev->pers)
+		return -EBUSY;
+	else if (mddev->reshape_position != MaxSector)
+		mddev->new_chunk = n;
+	else
+		mddev->chunk_size = n;
 	return len;
 }
 static struct md_sysfs_entry md_chunk_size =
@@ -2896,6 +2919,37 @@ suspend_hi_store(mddev_t *mddev, const c
 static struct md_sysfs_entry md_suspend_hi =
 __ATTR(suspend_hi, S_IRUGO|S_IWUSR, suspend_hi_show, suspend_hi_store);
 
+static ssize_t
+reshape_position_show(mddev_t *mddev, char *page)
+{
+	if (mddev->reshape_position != MaxSector)
+		return sprintf(page, "%llu\n",
+			       (unsigned long long)mddev->reshape_position);
+	strcpy(page, "none\n");
+	return 5;
+}
+
+static ssize_t
+reshape_position_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	char *e;
+	unsigned long long new = simple_strtoull(buf, &e, 10);
+	if (mddev->pers)
+		return -EBUSY;
+	if (buf == e || (*e && *e != '\n'))
+		return -EINVAL;
+	mddev->reshape_position = new;
+	mddev->delta_disks = 0;
+	mddev->new_level = mddev->level;
+	mddev->new_layout = mddev->layout;
+	mddev->new_chunk = mddev->chunk_size;
+	return len;
+}
+
+static struct md_sysfs_entry md_reshape_position =
+__ATTR(reshape_position, S_IRUGO|S_IWUSR, reshape_position_show,
+       reshape_position_store);
+
 
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
@@ -2908,6 +2962,7 @@ static struct attribute *md_default_attr
 	&md_new_device.attr,
 	&md_safe_delay.attr,
 	&md_array_state.attr,
+	&md_reshape_position.attr,
 	NULL,
 };
 
@@ -3446,6 +3501,7 @@ static int do_md_stop(mddev_t * mddev, i
 		mddev->size = 0;
 		mddev->raid_disks = 0;
 		mddev->recovery_cp = 0;
+		mddev->reshape_position = MaxSector;
 
 	} else if (mddev->pers)
 		printk(KERN_INFO "md: %s switched to read-only mode.\n",
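The sysfs semantics introduced by this patch can be modelled outside the kernel. The following is a minimal userspace sketch, not md code: `struct md_model`, the `model_*` functions and `MODEL_MAX_SECTOR` are hypothetical stand-ins for `mddev_t`, the sysfs handlers and `MaxSector`, and `strtoull` replaces the kernel's `simple_strtoull` (the two may differ on edge cases such as leading whitespace or a sign). It shows how writing `reshape_position` resets the `new_*` fields to the current geometry, and how `raid_disks` then reads back as "new (old)" while a reshape is pending.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's MaxSector: "no reshape pending". */
#define MODEL_MAX_SECTOR (~0ULL)

/* Hypothetical subset of mddev_t holding only the fields touched here. */
struct md_model {
	int pers_active;	/* non-zero once a personality is running */
	unsigned long long reshape_position;
	int level, new_level;
	int layout, new_layout;
	int chunk_size, new_chunk;
	int raid_disks, delta_disks;
};

/* Like mddev_find(): a fresh array has no reshape pending. */
static void model_init(struct md_model *m)
{
	memset(m, 0, sizeof(*m));
	m->reshape_position = MODEL_MAX_SECTOR;
}

/* Like reshape_position_show(): a sector number, or "none". */
static int model_reshape_position_show(const struct md_model *m,
				       char *page, size_t size)
{
	if (m->reshape_position != MODEL_MAX_SECTOR)
		return snprintf(page, size, "%llu\n", m->reshape_position);
	return snprintf(page, size, "none\n");
}

/* Like reshape_position_store(): refuse while running, validate the
 * number, then make the "new" geometry equal to the current one. */
static long model_reshape_position_store(struct md_model *m,
					 const char *buf, size_t len)
{
	char *e;
	unsigned long long pos = strtoull(buf, &e, 10);

	if (m->pers_active)
		return -EBUSY;
	if (buf == e || (*e && *e != '\n'))
		return -EINVAL;
	m->reshape_position = pos;
	m->delta_disks = 0;
	m->new_level = m->level;
	m->new_layout = m->layout;
	m->new_chunk = m->chunk_size;
	return (long)len;
}

/* Like the inactive-array branch of raid_disks_store(); the running-array
 * path (update_raid_disks) is not modelled, hence the assertion. */
static void model_raid_disks_store(struct md_model *m, int n)
{
	assert(!m->pers_active);
	if (m->reshape_position != MODEL_MAX_SECTOR) {
		int olddisks = m->raid_disks - m->delta_disks;
		m->delta_disks = n - olddisks;
	}
	m->raid_disks = n;
}

/* Like raid_disks_show(): "new (old)" while a reshape changes the count. */
static int model_raid_disks_show(const struct md_model *m,
				 char *page, size_t size)
{
	if (m->raid_disks == 0)
		return 0;
	if (m->reshape_position != MODEL_MAX_SECTOR && m->delta_disks != 0)
		return snprintf(page, size, "%d (%d)\n", m->raid_disks,
				m->raid_disks - m->delta_disks);
	return snprintf(page, size, "%d\n", m->raid_disks);
}
```

For example, starting from 4 disks, writing "1024" to reshape_position and then 5 to raid_disks leaves delta_disks at 1, and raid_disks reads back as "5 (4)". Because olddisks is recomputed from raid_disks - delta_disks on every write, a second write of 6 yields delta_disks 2 rather than accumulating from the intermediate value.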
* [PATCH 005 of 5] md: Improve partition detection in md array.
From: NeilBrown @ 2007-05-08 4:29 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown

md currently uses ->media_changed to make sure rescan_partitions
is called on md arrays after they are assembled. However, that
doesn't happen until the array is opened, which is later than some
people would like.

So use blkdev_ioctl to do the rescan as soon as the array has been
assembled.

This means we can remove all the ->changed infrastructure, as it was
only used to trigger a partition rescan.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c           |   26 ++++++++------------------
 ./drivers/md/raid1.c        |    1 -
 ./drivers/md/raid5.c        |    2 --
 ./include/linux/raid/md_k.h |    1 -
 4 files changed, 8 insertions(+), 22 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2007-05-08 14:24:00.000000000 +1000
+++ ./drivers/md/md.c	2007-05-07 17:47:15.000000000 +1000
@@ -3104,6 +3104,7 @@ static int do_md_run(mddev_t * mddev)
 	struct gendisk *disk;
 	struct mdk_personality *pers;
 	char b[BDEVNAME_SIZE];
+	struct block_device *bdev;
 
 	if (list_empty(&mddev->disks))
 		/* cannot run an array with no devices.. */
@@ -3331,7 +3332,13 @@ static int do_md_run(mddev_t * mddev)
 	md_wakeup_thread(mddev->thread);
 	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
 
-	mddev->changed = 1;
+	bdev = bdget_disk(mddev->gendisk, 0);
+	if (bdev) {
+		bd_set_size(bdev, mddev->array_size << 1);
+		blkdev_ioctl(bdev->bd_inode, NULL, BLKRRPART, 0);
+		bdput(bdev);
+	}
+
 	md_new_event(mddev);
 	kobject_uevent(&mddev->gendisk->kobj, KOBJ_CHANGE);
 	return 0;
@@ -3453,7 +3460,6 @@ static int do_md_stop(mddev_t * mddev, i
 		mddev->pers = NULL;
 
 		set_capacity(disk, 0);
-		mddev->changed = 1;
 
 		if (mddev->ro)
 			mddev->ro = 0;
@@ -4593,20 +4599,6 @@ static int md_release(struct inode *inod
 	return 0;
 }
 
-static int md_media_changed(struct gendisk *disk)
-{
-	mddev_t *mddev = disk->private_data;
-
-	return mddev->changed;
-}
-
-static int md_revalidate(struct gendisk *disk)
-{
-	mddev_t *mddev = disk->private_data;
-
-	mddev->changed = 0;
-	return 0;
-}
 static struct block_device_operations md_fops =
 {
 	.owner		= THIS_MODULE,
@@ -4614,8 +4606,6 @@ static struct block_device_operations md
 	.release	= md_release,
 	.ioctl		= md_ioctl,
 	.getgeo		= md_getgeo,
-	.media_changed	= md_media_changed,
-	.revalidate_disk= md_revalidate,
 };
 
 static int md_thread(void * arg)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2007-05-08 14:24:00.000000000 +1000
+++ ./drivers/md/raid1.c	2007-05-07 17:02:27.000000000 +1000
@@ -2063,7 +2063,6 @@ static int raid1_resize(mddev_t *mddev,
 	 */
 	mddev->array_size = sectors>>1;
 	set_capacity(mddev->gendisk, mddev->array_size << 1);
-	mddev->changed = 1;
 	if (mddev->array_size > mddev->size && mddev->recovery_cp == MaxSector) {
 		mddev->recovery_cp = mddev->size << 1;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2007-05-08 14:24:00.000000000 +1000
+++ ./drivers/md/raid5.c	2007-05-07 17:03:05.000000000 +1000
@@ -4514,7 +4514,6 @@ static int raid5_resize(mddev_t *mddev,
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
 	mddev->array_size = (sectors * (mddev->raid_disks-conf->max_degraded))>>1;
 	set_capacity(mddev->gendisk, mddev->array_size << 1);
-	mddev->changed = 1;
 	if (sectors/2 > mddev->size && mddev->recovery_cp == MaxSector) {
 		mddev->recovery_cp = mddev->size << 1;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
@@ -4649,7 +4648,6 @@ static void end_reshape(raid5_conf_t *co
 		conf->mddev->array_size = conf->mddev->size *
 			(conf->raid_disks - conf->max_degraded);
 		set_capacity(conf->mddev->gendisk, conf->mddev->array_size << 1);
-		conf->mddev->changed = 1;
 
 		bdev = bdget_disk(conf->mddev->gendisk, 0);
 		if (bdev) {

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h	2007-05-08 14:24:00.000000000 +1000
+++ ./include/linux/raid/md_k.h	2007-05-08 14:24:36.000000000 +1000
@@ -201,7 +201,6 @@ struct mddev_s
 	struct mutex			reconfig_mutex;
 	atomic_t			active;
 
-	int				changed;	/* true if we might need to reread partition info */
 	int				degraded;	/* whether md should consider
 							 * adding a spare
 							 */
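The patch issues BLKRRPART from inside the kernel through blkdev_ioctl. From userspace, the same re-read can be requested with the BLKRRPART ioctl from <linux/fs.h>, which is what `blockdev --rereadpt` does. The following is a minimal sketch under stated assumptions: `reread_partitions` and `kib_to_sectors` are hypothetical helper names, and the device path (for example a partitionable array) is supplied by the caller. The conversion helper reflects the `array_size << 1` in the diff: set_capacity() counts 512-byte sectors, while md's array_size is kept in 1 KiB units.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKRRPART */

/* md's array_size is in 1 KiB units; set_capacity() expects 512-byte
 * sectors, hence the "<< 1" used throughout the diff. */
static uint64_t kib_to_sectors(uint64_t kib)
{
	return kib << 1;
}

/*
 * Hypothetical helper: ask the kernel to re-read the partition table of
 * the block device at `path`. Returns 0 on success, or -1 with errno set
 * (for example EBUSY while a partition is in use, or EACCES without
 * CAP_SYS_ADMIN).
 */
static int reread_partitions(const char *path)
{
	int fd, rc, saved_errno;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	rc = ioctl(fd, BLKRRPART, NULL);
	saved_errno = errno;	/* keep the ioctl error across close() */
	close(fd);
	errno = saved_errno;
	return rc < 0 ? -1 : 0;
}
```

Note that the kernel patch ignores the return value of blkdev_ioctl, so a failed rescan does not prevent do_md_run from completing; a userspace caller like the sketch above can instead inspect errno.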
Thread overview: 12+ messages
2007-05-08 4:29 [PATCH 000 of 5] md: assorted bug fixes and minor features NeilBrown
  2007-05-08 4:29 [PATCH 001 of 5] md: Move test for whether level supports bitmap to correct place NeilBrown
  2007-05-08 4:29 [PATCH 002 of 5] md: Stop using csum_partial for checksum calculation in md NeilBrown
  2007-05-08 4:29 [PATCH 003 of 5] md: Remove the slash from the name of a kmem_cache used by raid5 NeilBrown
  2007-05-08 4:29 [PATCH 004 of 5] md: Allow reshape_position for md arrays to be set via sysfs NeilBrown
  2007-05-08 4:29 [PATCH 005 of 5] md: Improve partition detection in md array NeilBrown