* [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE
@ 2007-05-21 1:32 NeilBrown
2007-05-21 1:33 ` [PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components NeilBrown
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: NeilBrown @ 2007-05-21 1:32 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Jeff Zheng, Neil Brown, stable
Following are 7 patches for md in current main-line.
The first two fix bugs that can cause data corruption, and so are suitable for -stable.
The next fixes some problems with hot-adding a device to a linear array. As has not
been tested by my test-suite until now, it hasn't worked properly until now :-(
The remainder are mainly cleaning up code and comments. They could
wait for 2.6.23, but are probably safe to go into 2.6.22 (maybe sit a week in -mm??)
Thanks,
NeilBrown
[PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components.
[PATCH 002 of 7] md: Don't write more than is required of the last page of a bitmap
[PATCH 003 of 7] md: Fix bug with linear hot-add and elsewhere.
[PATCH 004 of 7] md: Improve message about invalid superblock during autodetect.
[PATCH 005 of 7] md: Improve the is_mddev_idle test fix
[PATCH 006 of 7] md: Check that internal bitmap does not overlap other data.
[PATCH 007 of 7] md: Change bitmap_unplug and others to void functions.
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components. 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 002 of 7] md: Don't write more than is required of the last page of a bitmap NeilBrown ` (5 subsequent siblings) 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Jeff Zheng, Neil Brown, stable If a raid0 has a component device larger than 4TB, and is accessed on a 32bit machines, then as 'chunk' is unsigned lock, chunk << chunksize_bits can overflow (this can be as high as the size of the device in KB). chunk itself will not overflow (without triggering a BUG). So change 'chunk' to be 'sector_t, and get rid of the 'BUG' as it becomes impossible to hit. Cc: "Jeff Zheng" <Jeff.Zheng@endace.com> Signed-off-by: Neil Brown <neilb@suse.de> Cc: stable@kernel.org ### Diffstat output ./drivers/md/raid0.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff .prev/drivers/md/raid0.c ./drivers/md/raid0.c --- .prev/drivers/md/raid0.c 2007-05-18 11:48:57.000000000 +1000 +++ ./drivers/md/raid0.c 2007-05-18 11:48:57.000000000 +1000 @@ -415,7 +415,7 @@ static int raid0_make_request (request_q raid0_conf_t *conf = mddev_to_conf(mddev); struct strip_zone *zone; mdk_rdev_t *tmp_dev; - unsigned long chunk; + sector_t chunk; sector_t block, rsect; const int rw = bio_data_dir(bio); @@ -470,7 +470,6 @@ static int raid0_make_request (request_q sector_div(x, zone->nb_dev); chunk = x; - BUG_ON(x != (sector_t)chunk); x = block >> chunksize_bits; tmp_dev = zone->dev[sector_div(x, zone->nb_dev)]; ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 002 of 7] md: Don't write more than is required of the last page of a bitmap 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown 2007-05-21 1:33 ` [PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 003 of 7] md: Fix bug with linear hot-add and elsewhere NeilBrown ` (4 subsequent siblings) 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown, stable It is possible that real data or metadata follows the bitmap without full page alignment. So limit the last write to be only the required number of bytes, rounded up to the hard sector size of the device. Signed-off-by: Neil Brown <neilb@suse.de> Cc: stable@kernel.org ### Diffstat output ./drivers/md/bitmap.c | 17 ++++++++++++----- ./include/linux/raid/bitmap.h | 1 + 2 files changed, 13 insertions(+), 5 deletions(-) diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c --- .prev/drivers/md/bitmap.c 2007-05-18 11:49:18.000000000 +1000 +++ ./drivers/md/bitmap.c 2007-05-18 11:49:18.000000000 +1000 @@ -255,19 +255,25 @@ static struct page *read_sb_page(mddev_t } -static int write_sb_page(mddev_t *mddev, long offset, struct page *page, int wait) +static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait) { mdk_rdev_t *rdev; struct list_head *tmp; + mddev_t *mddev = bitmap->mddev; ITERATE_RDEV(mddev, rdev, tmp) if (test_bit(In_sync, &rdev->flags) - && !test_bit(Faulty, &rdev->flags)) + && !test_bit(Faulty, &rdev->flags)) { + int size = PAGE_SIZE; + if (page->index == bitmap->file_pages-1) + size = roundup(bitmap->last_page_size, + bdev_hardsect_size(rdev->bdev)); md_super_write(mddev, rdev, - (rdev->sb_offset<<1) + offset + (rdev->sb_offset<<1) + bitmap->offset + page->index * (PAGE_SIZE/512), - PAGE_SIZE, + size, page); + } if (wait) md_super_wait(mddev); @@ -282,7 +288,7 @@ static int write_page(struct bitmap *bit struct buffer_head *bh; if (bitmap->file == NULL) - return write_sb_page(bitmap->mddev, bitmap->offset, page, wait); + return write_sb_page(bitmap, page, wait); bh = page_buffers(page); @@ -923,6 +929,7 @@ static int bitmap_init_from_disk(struct } bitmap->filemap[bitmap->file_pages++] = page; + bitmap->last_page_size = count; } paddr = kmap_atomic(page, KM_USER0); if (bitmap->flags & BITMAP_HOSTENDIAN) diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h --- .prev/include/linux/raid/bitmap.h 2007-05-18 11:49:18.000000000 +1000 +++ ./include/linux/raid/bitmap.h 2007-05-18 11:49:18.000000000 +1000 @@ -232,6 +232,7 @@ struct bitmap { struct page **filemap; /* list of cache pages for the file */ unsigned long *filemap_attr; /* attributes associated w/ filemap pages */ unsigned long file_pages; /* number of pages in the file */ + int last_page_size; /* bytes in the last page */ unsigned long flags; ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 003 of 7] md: Fix bug with linear hot-add and elsewhere. 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown 2007-05-21 1:33 ` [PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components NeilBrown 2007-05-21 1:33 ` [PATCH 002 of 7] md: Don't write more than is required of the last page of a bitmap NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 004 of 7] md: Improve message about invalid superblock during autodetect NeilBrown ` (3 subsequent siblings) 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown Adding a drive to a linear array seems to have stopped working, due to changes elsewhere in md, and insufficient ongoing testing... So the patch to make linear hot-add work in the first place introduced a subtle bug elsewhere that interracts poorly with older version of mdadm. This fixes it all up. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/linear.c | 10 +++++----- ./drivers/md/md.c | 20 ++++++++++++++------ 2 files changed, 19 insertions(+), 11 deletions(-) diff .prev/drivers/md/linear.c ./drivers/md/linear.c --- .prev/drivers/md/linear.c 2007-05-21 11:13:25.000000000 +1000 +++ ./drivers/md/linear.c 2007-05-21 11:13:39.000000000 +1000 @@ -139,8 +139,6 @@ static linear_conf_t *linear_conf(mddev_ if (!conf) return NULL; - mddev->private = conf; - cnt = 0; conf->array_size = 0; @@ -232,7 +230,7 @@ static linear_conf_t *linear_conf(mddev_ * First calculate the device offsets. */ conf->disks[0].offset = 0; - for (i=1; i<mddev->raid_disks; i++) + for (i=1; i < raid_disks; i++) conf->disks[i].offset = conf->disks[i-1].offset + conf->disks[i-1].size; @@ -244,7 +242,7 @@ static linear_conf_t *linear_conf(mddev_ curr_offset < conf->array_size; curr_offset += conf->hash_spacing) { - while (i < mddev->raid_disks-1 && + while (i < raid_disks-1 && curr_offset >= conf->disks[i+1].offset) i++; @@ -299,9 +297,11 @@ static int linear_add(mddev_t *mddev, md */ linear_conf_t *newconf; - if (rdev->raid_disk != mddev->raid_disks) + if (rdev->saved_raid_disk != mddev->raid_disks) return -EINVAL; + rdev->raid_disk = rdev->saved_raid_disk; + newconf = linear_conf(mddev,mddev->raid_disks+1); if (!newconf) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2007-05-21 11:13:25.000000000 +1000 +++ ./drivers/md/md.c 2007-05-21 11:14:54.000000000 +1000 @@ -1298,8 +1298,9 @@ static void super_1_sync(mddev_t *mddev, ITERATE_RDEV(mddev,rdev2,tmp) if (rdev2->desc_nr+1 > max_dev) max_dev = rdev2->desc_nr+1; - - sb->max_dev = cpu_to_le32(max_dev); + + if (max_dev > le32_to_cpu(sb->max_dev)) + sb->max_dev = cpu_to_le32(max_dev); for (i=0; i<max_dev;i++) sb->dev_roles[i] = cpu_to_le16(0xfffe); @@ -1365,10 +1366,14 @@ static int bind_rdev_to_array(mdk_rdev_t } /* make sure rdev->size exceeds mddev->size */ if (rdev->size && (mddev->size == 0 || rdev->size < mddev->size)) { - if (mddev->pers) - /* Cannot change size, so fail */ - return -ENOSPC; - else + if (mddev->pers) { + /* Cannot change size, so fail + * If mddev->level <= 0, then we don't care + * about aligning sizes (e.g. linear) + */ + if (mddev->level > 0) + return -ENOSPC; + } else mddev->size = rdev->size; } @@ -2142,6 +2147,9 @@ static void analyze_sbs(mddev_t * mddev) rdev->desc_nr = i++; rdev->raid_disk = rdev->desc_nr; set_bit(In_sync, &rdev->flags); + } else if (rdev->raid_disk >= mddev->raid_disks) { + rdev->raid_disk = -1; + clear_bit(In_sync, &rdev->flags); } } ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 004 of 7] md: Improve message about invalid superblock during autodetect. 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown ` (2 preceding siblings ...) 2007-05-21 1:33 ` [PATCH 003 of 7] md: Fix bug with linear hot-add and elsewhere NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 005 of 7] md: Improve the is_mddev_idle test fix NeilBrown ` (2 subsequent siblings) 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown People try to use raid auto-detect with version-1 superblocks (which is not supported) and get confused when they are told they have an invalid superblock. So be more explicit, and say it it is not a valid v0.90 superblock. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/md.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2007-05-21 11:14:54.000000000 +1000 +++ ./drivers/md/md.c 2007-05-21 11:16:16.000000000 +1000 @@ -2073,9 +2073,11 @@ static mdk_rdev_t *md_import_device(dev_ err = super_types[super_format]. load_super(rdev, NULL, super_minor); if (err == -EINVAL) { - printk(KERN_WARNING - "md: %s has invalid sb, not importing!\n", - bdevname(rdev->bdev,b)); + printk(KERN_WARNING + "md: %s does not have a valid v%d.%d " + "superblock, not importing!\n", + bdevname(rdev->bdev,b), + super_format, super_minor); goto abort_free; } if (err < 0) { @@ -5772,7 +5774,7 @@ static void autostart_arrays(int part) for (i = 0; i < dev_cnt; i++) { dev_t dev = detected_devices[i]; - rdev = md_import_device(dev,0, 0); + rdev = md_import_device(dev,0, 90); if (IS_ERR(rdev)) continue; ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 005 of 7] md: Improve the is_mddev_idle test fix 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown ` (3 preceding siblings ...) 2007-05-21 1:33 ` [PATCH 004 of 7] md: Improve message about invalid superblock during autodetect NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 006 of 7] md: Check that internal bitmap does not overlap other data NeilBrown 2007-05-21 1:33 ` [PATCH 007 of 7] md: Change bitmap_unplug and others to void functions NeilBrown 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown Don't use 'unsigned' variable to track sync vs non-sync IO, as the only thing we want to do with them is a signed comparison, and fix up the comment which had become quite wrong. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/md.c | 35 ++++++++++++++++++++++------------- ./include/linux/raid/md_k.h | 2 +- 2 files changed, 23 insertions(+), 14 deletions(-) diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2007-05-21 11:17:57.000000000 +1000 +++ ./drivers/md/md.c 2007-05-21 11:18:00.000000000 +1000 @@ -5092,7 +5092,7 @@ static int is_mddev_idle(mddev_t *mddev) mdk_rdev_t * rdev; struct list_head *tmp; int idle; - unsigned long curr_events; + long curr_events; idle = 1; ITERATE_RDEV(mddev,rdev,tmp) { @@ -5100,20 +5100,29 @@ static int is_mddev_idle(mddev_t *mddev) curr_events = disk_stat_read(disk, sectors[0]) + disk_stat_read(disk, sectors[1]) - atomic_read(&disk->sync_io); - /* The difference between curr_events and last_events - * will be affected by any new non-sync IO (making - * curr_events bigger) and any difference in the amount of - * in-flight syncio (making current_events bigger or smaller) - * The amount in-flight is currently limited to - * 32*64K in raid1/10 and 256*PAGE_SIZE in raid5/6 - * which is at most 4096 sectors. - * These numbers are fairly fragile and should be made - * more robust, probably by enforcing the - * 'window size' that md_do_sync sort-of uses. + /* sync IO will cause sync_io to increase before the disk_stats + * as sync_io is counted when a request starts, and + * disk_stats is counted when it completes. + * So resync activity will cause curr_events to be smaller than + * when there was no such activity. + * non-sync IO will cause disk_stat to increase without + * increasing sync_io so curr_events will (eventually) + * be larger than it was before. Once it becomes + * substantially larger, the test below will cause + * the array to appear non-idle, and resync will slow + * down. + * If there is a lot of outstanding resync activity when + * we set last_event to curr_events, then all that activity + * completing might cause the array to appear non-idle + * and resync will be slowed down even though there might + * not have been non-resync activity. This will only + * happen once though. 'last_events' will soon reflect + * the state where there is little or no outstanding + * resync requests, and further resync activity will + * always make curr_events less than last_events. * - * Note: the following is an unsigned comparison. */ - if ((long)curr_events - (long)rdev->last_events > 4096) { + if (curr_events - rdev->last_events > 4096) { rdev->last_events = curr_events; idle = 0; } diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h --- .prev/include/linux/raid/md_k.h 2007-05-21 11:17:57.000000000 +1000 +++ ./include/linux/raid/md_k.h 2007-05-21 11:18:00.000000000 +1000 @@ -51,7 +51,7 @@ struct mdk_rdev_s sector_t size; /* Device size (in blocks) */ mddev_t *mddev; /* RAID array if running */ - unsigned long last_events; /* IO event timestamp */ + long last_events; /* IO event timestamp */ struct block_device *bdev; /* block device handle */ ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 006 of 7] md: Check that internal bitmap does not overlap other data. 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown ` (4 preceding siblings ...) 2007-05-21 1:33 ` [PATCH 005 of 7] md: Improve the is_mddev_idle test fix NeilBrown @ 2007-05-21 1:33 ` NeilBrown 2007-05-21 1:33 ` [PATCH 007 of 7] md: Change bitmap_unplug and others to void functions NeilBrown 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown We current completely trust user-space to set up metadata describing an consistant array. In particlar, that the metadata, data, and bitmap do not overlap. But userspace can be buggy, and it is better to report an error than corrupt data. So put in some appropriate checks. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/bitmap.c | 35 +++++++++++++++++++++++++++++++++-- ./drivers/md/md.c | 22 +++++++++++++++++++++- 2 files changed, 54 insertions(+), 3 deletions(-) diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c --- .prev/drivers/md/bitmap.c 2007-05-21 11:17:57.000000000 +1000 +++ ./drivers/md/bitmap.c 2007-05-21 11:18:22.000000000 +1000 @@ -268,6 +268,31 @@ static int write_sb_page(struct bitmap * if (page->index == bitmap->file_pages-1) size = roundup(bitmap->last_page_size, bdev_hardsect_size(rdev->bdev)); + /* Just make sure we aren't corrupting data or + * metadata + */ + if (bitmap->offset < 0) { + /* DATA BITMAP METADATA */ + if (bitmap->offset + + page->index * (PAGE_SIZE/512) + + size/512 > 0) + /* bitmap runs in to metadata */ + return -EINVAL; + if (rdev->data_offset + mddev->size*2 + > rdev->sb_offset*2 + bitmap->offset) + /* data runs in to bitmap */ + return -EINVAL; + } else if (rdev->sb_offset*2 < rdev->data_offset) { + /* METADATA BITMAP DATA */ + if (rdev->sb_offset*2 + + bitmap->offset + + page->index*(PAGE_SIZE/512) + size/512 + > rdev->data_offset) + /* bitmap runs in to data */ + return -EINVAL; + } else { + /* DATA METADATA BITMAP - no problems */ + } md_super_write(mddev, rdev, (rdev->sb_offset<<1) + bitmap->offset + page->index * (PAGE_SIZE/512), @@ -287,8 +312,14 @@ static int write_page(struct bitmap *bit { struct buffer_head *bh; - if (bitmap->file == NULL) - return write_sb_page(bitmap, page, wait); + if (bitmap->file == NULL) { + switch (write_sb_page(bitmap, page, wait)) { + case -EINVAL: + bitmap->flags |= BITMAP_WRITE_ERROR; + return -EIO; + } + return 0; + } bh = page_buffers(page); diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2007-05-21 11:18:00.000000000 +1000 +++ ./drivers/md/md.c 2007-05-21 11:18:22.000000000 +1000 @@ -3176,13 +3176,33 @@ static int do_md_run(mddev_t * mddev) * Drop all container device buffers, from now on * the only valid external interface is through the md * device. - * Also find largest hardsector size */ ITERATE_RDEV(mddev,rdev,tmp) { if (test_bit(Faulty, &rdev->flags)) continue; sync_blockdev(rdev->bdev); invalidate_bdev(rdev->bdev); + + /* perform some consistency tests on the device. + * We don't want the data to overlap the metadata, + * Internal Bitmap issues has handled elsewhere. + */ + if (rdev->data_offset < rdev->sb_offset) { + if (mddev->size && + rdev->data_offset + mddev->size*2 + > rdev->sb_offset*2) { + printk("md: %s: data overlaps metadata\n", + mdname(mddev)); + return -EINVAL; + } + } else { + if (rdev->sb_offset*2 + rdev->sb_size/512 + > rdev->data_offset) { + printk("md: %s: metadata overlaps data\n", + mdname(mddev)); + return -EINVAL; + } + } } md_probe(mddev->unit, NULL, NULL); ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 007 of 7] md: Change bitmap_unplug and others to void functions. 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown ` (5 preceding siblings ...) 2007-05-21 1:33 ` [PATCH 006 of 7] md: Check that internal bitmap does not overlap other data NeilBrown @ 2007-05-21 1:33 ` NeilBrown 6 siblings, 0 replies; 8+ messages in thread From: NeilBrown @ 2007-05-21 1:33 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-raid, linux-kernel, Neil Brown bitmap_unplug only ever returns 0, so it may as well be void. Two callers try to print a message if it returns non-zero, but that message is already printed by bitmap_file_kick. write_page returns an error which is not consistently checked. It always causes BITMAP_WRITE_ERROR to be set on an error, and that can more conveniently be checked. When the return of write_page is checked, an error causes bitmap_file_kick to be called - so move that call into write_page - and protect against recursive calls into bitmap_file_kick. bitmap_update_sb returns an error that is never checked. So make these 'void' and be consistent about checking the bit. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./drivers/md/bitmap.c | 144 +++++++++++++++++++++--------------------- ./drivers/md/md.c | 3 ./drivers/md/raid1.c | 3 ./drivers/md/raid10.c | 3 ./include/linux/raid/bitmap.h | 6 - 5 files changed, 80 insertions(+), 79 deletions(-) diff .prev/drivers/md/bitmap.c ./drivers/md/bitmap.c --- .prev/drivers/md/bitmap.c 2007-05-21 11:18:22.000000000 +1000 +++ ./drivers/md/bitmap.c 2007-05-21 11:18:23.000000000 +1000 @@ -305,10 +305,11 @@ static int write_sb_page(struct bitmap * return 0; } +static void bitmap_file_kick(struct bitmap *bitmap); /* * write out a page to a file */ -static int write_page(struct bitmap *bitmap, struct page *page, int wait) +static void write_page(struct bitmap *bitmap, struct page *page, int wait) { struct buffer_head *bh; @@ -316,27 +317,26 @@ static int write_page(struct bitmap *bit switch (write_sb_page(bitmap, page, wait)) { case -EINVAL: bitmap->flags |= BITMAP_WRITE_ERROR; - return -EIO; } - return 0; - } + } else { - bh = page_buffers(page); + bh = page_buffers(page); - while (bh && bh->b_blocknr) { - atomic_inc(&bitmap->pending_writes); - set_buffer_locked(bh); - set_buffer_mapped(bh); - submit_bh(WRITE, bh); - bh = bh->b_this_page; - } + while (bh && bh->b_blocknr) { + atomic_inc(&bitmap->pending_writes); + set_buffer_locked(bh); + set_buffer_mapped(bh); + submit_bh(WRITE, bh); + bh = bh->b_this_page; + } - if (wait) { - wait_event(bitmap->write_wait, - atomic_read(&bitmap->pending_writes)==0); - return (bitmap->flags & BITMAP_WRITE_ERROR) ? -EIO : 0; + if (wait) { + wait_event(bitmap->write_wait, + atomic_read(&bitmap->pending_writes)==0); + } } - return 0; + if (bitmap->flags & BITMAP_WRITE_ERROR) + bitmap_file_kick(bitmap); } static void end_bitmap_write(struct buffer_head *bh, int uptodate) @@ -456,17 +456,17 @@ out: */ /* update the event counter and sync the superblock to disk */ -int bitmap_update_sb(struct bitmap *bitmap) +void bitmap_update_sb(struct bitmap *bitmap) { bitmap_super_t *sb; unsigned long flags; if (!bitmap || !bitmap->mddev) /* no bitmap for this array */ - return 0; + return; spin_lock_irqsave(&bitmap->lock, flags); if (!bitmap->sb_page) { /* no superblock */ spin_unlock_irqrestore(&bitmap->lock, flags); - return 0; + return; } spin_unlock_irqrestore(&bitmap->lock, flags); sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); @@ -474,7 +474,7 @@ int bitmap_update_sb(struct bitmap *bitm if (!bitmap->mddev->degraded) sb->events_cleared = cpu_to_le64(bitmap->mddev->events); kunmap_atomic(sb, KM_USER0); - return write_page(bitmap, bitmap->sb_page, 1); + write_page(bitmap, bitmap->sb_page, 1); } /* print out the bitmap file superblock */ @@ -603,20 +603,22 @@ enum bitmap_mask_op { MASK_UNSET }; -/* record the state of the bitmap in the superblock */ -static void bitmap_mask_state(struct bitmap *bitmap, enum bitmap_state bits, - enum bitmap_mask_op op) +/* record the state of the bitmap in the superblock. Return the old value */ +static int bitmap_mask_state(struct bitmap *bitmap, enum bitmap_state bits, + enum bitmap_mask_op op) { bitmap_super_t *sb; unsigned long flags; + int old; spin_lock_irqsave(&bitmap->lock, flags); if (!bitmap->sb_page) { /* can't set the state */ spin_unlock_irqrestore(&bitmap->lock, flags); - return; + return 0; } spin_unlock_irqrestore(&bitmap->lock, flags); sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0); + old = le32_to_cpu(sb->state) & bits; switch (op) { case MASK_SET: sb->state |= cpu_to_le32(bits); break; @@ -625,6 +627,7 @@ static void bitmap_mask_state(struct bit default: BUG(); } kunmap_atomic(sb, KM_USER0); + return old; } /* @@ -718,18 +721,23 @@ static void bitmap_file_kick(struct bitm { char *path, *ptr = NULL; - bitmap_mask_state(bitmap, BITMAP_STALE, MASK_SET); - bitmap_update_sb(bitmap); - - if (bitmap->file) { - path = kmalloc(PAGE_SIZE, GFP_KERNEL); - if (path) - ptr = file_path(bitmap->file, path, PAGE_SIZE); - - printk(KERN_ALERT "%s: kicking failed bitmap file %s from array!\n", - bmname(bitmap), ptr ? ptr : ""); + if (bitmap_mask_state(bitmap, BITMAP_STALE, MASK_SET) == 0) { + bitmap_update_sb(bitmap); - kfree(path); + if (bitmap->file) { + path = kmalloc(PAGE_SIZE, GFP_KERNEL); + if (path) + ptr = file_path(bitmap->file, path, PAGE_SIZE); + + printk(KERN_ALERT + "%s: kicking failed bitmap file %s from array!\n", + bmname(bitmap), ptr ? ptr : ""); + + kfree(path); + } else + printk(KERN_ALERT + "%s: disabling internal bitmap due to errors\n", + bmname(bitmap)); } bitmap_file_put(bitmap); @@ -800,16 +808,15 @@ static void bitmap_file_set_bit(struct b /* this gets called when the md device is ready to unplug its underlying * (slave) device queues -- before we let any writes go down, we need to * sync the dirty pages of the bitmap file to disk */ -int bitmap_unplug(struct bitmap *bitmap) +void bitmap_unplug(struct bitmap *bitmap) { unsigned long i, flags; int dirty, need_write; struct page *page; int wait = 0; - int err; if (!bitmap) - return 0; + return; /* look at each page to see if there are any set bits that need to be * flushed out to disk */ @@ -817,7 +824,7 @@ int bitmap_unplug(struct bitmap *bitmap) spin_lock_irqsave(&bitmap->lock, flags); if (!bitmap->filemap) { spin_unlock_irqrestore(&bitmap->lock, flags); - return 0; + return; } page = bitmap->filemap[i]; dirty = test_page_attr(bitmap, page, BITMAP_PAGE_DIRTY); @@ -829,7 +836,7 @@ int bitmap_unplug(struct bitmap *bitmap) spin_unlock_irqrestore(&bitmap->lock, flags); if (dirty | need_write) - err = write_page(bitmap, page, 0); + write_page(bitmap, page, 0); } if (wait) { /* if any writes were performed, we need to wait on them */ if (bitmap->file) @@ -840,7 +847,6 @@ int bitmap_unplug(struct bitmap *bitmap) } if (bitmap->flags & BITMAP_WRITE_ERROR) bitmap_file_kick(bitmap); - return 0; } static void bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed); @@ -889,21 +895,21 @@ static int bitmap_init_from_disk(struct bmname(bitmap), (unsigned long) i_size_read(file->f_mapping->host), bytes + sizeof(bitmap_super_t)); - goto out; + goto err; } ret = -ENOMEM; bitmap->filemap = kmalloc(sizeof(struct page *) * num_pages, GFP_KERNEL); if (!bitmap->filemap) - goto out; + goto err; /* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */ bitmap->filemap_attr = kzalloc( roundup( DIV_ROUND_UP(num_pages*4, 8), sizeof(unsigned long)), GFP_KERNEL); if (!bitmap->filemap_attr) - goto out; + goto err; oldindex = ~0L; @@ -936,7 +942,7 @@ static int bitmap_init_from_disk(struct } if (IS_ERR(page)) { /* read error */ ret = PTR_ERR(page); - goto out; + goto err; } oldindex = index; @@ -951,11 +957,13 @@ static int bitmap_init_from_disk(struct memset(paddr + offset, 0xff, PAGE_SIZE - offset); kunmap_atomic(paddr, KM_USER0); - ret = write_page(bitmap, page, 1); - if (ret) { + write_page(bitmap, page, 1); + + ret = -EIO; + if (bitmap->flags & BITMAP_WRITE_ERROR) { /* release, page not in filemap yet */ put_page(page); - goto out; + goto err; } } @@ -987,11 +995,15 @@ static int bitmap_init_from_disk(struct md_wakeup_thread(bitmap->mddev->thread); } -out: printk(KERN_INFO "%s: bitmap initialized from disk: " - "read %lu/%lu pages, set %lu bits, status: %d\n", - bmname(bitmap), bitmap->file_pages, num_pages, bit_cnt, ret); + "read %lu/%lu pages, set %lu bits\n", + bmname(bitmap), bitmap->file_pages, num_pages, bit_cnt); + + return 0; + err: + printk(KERN_INFO "%s: bitmap initialisation failed: %d\n", + bmname(bitmap), ret); return ret; } @@ -1028,19 +1040,18 @@ static bitmap_counter_t *bitmap_get_coun * out to disk */ -int bitmap_daemon_work(struct bitmap *bitmap) +void bitmap_daemon_work(struct bitmap *bitmap) { unsigned long j; unsigned long flags; struct page *page = NULL, *lastpage = NULL; - int err = 0; int blocks; void *paddr; if (bitmap == NULL) - return 0; + return; if (time_before(jiffies, bitmap->daemon_lastrun + bitmap->daemon_sleep*HZ)) - return 0; + return; bitmap->daemon_lastrun = jiffies; for (j = 0; j < bitmap->chunks; j++) { @@ -1063,14 +1074,8 @@ int bitmap_daemon_work(struct bitmap *bi clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); - if (need_write) { - switch (write_page(bitmap, page, 0)) { - case 0: - break; - default: - bitmap_file_kick(bitmap); - } - } + if (need_write) + write_page(bitmap, page, 0); continue; } @@ -1079,13 +1084,11 @@ int bitmap_daemon_work(struct bitmap *bi if (test_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE)) { clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); - err = write_page(bitmap, lastpage, 0); + write_page(bitmap, lastpage, 0); } else { set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); } - if (err) - bitmap_file_kick(bitmap); } else spin_unlock_irqrestore(&bitmap->lock, flags); lastpage = page; @@ -1128,14 +1131,13 @@ int bitmap_daemon_work(struct bitmap *bi if (test_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE)) { clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); - err = write_page(bitmap, lastpage, 0); + write_page(bitmap, lastpage, 0); } else { set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE); spin_unlock_irqrestore(&bitmap->lock, flags); } } - return err; } static bitmap_counter_t *bitmap_get_counter(struct bitmap *bitmap, @@ -1548,7 +1550,9 @@ int bitmap_create(mddev_t *mddev) mddev->thread->timeout = bitmap->daemon_sleep * HZ; - return bitmap_update_sb(bitmap); + bitmap_update_sb(bitmap); + + return (bitmap->flags & BITMAP_WRITE_ERROR) ? -EIO : 0; error: bitmap_free(bitmap); diff .prev/drivers/md/md.c ./drivers/md/md.c --- .prev/drivers/md/md.c 2007-05-21 11:18:22.000000000 +1000 +++ ./drivers/md/md.c 2007-05-21 11:18:23.000000000 +1000 @@ -1640,7 +1640,6 @@ static void sync_sbs(mddev_t * mddev, in static void md_update_sb(mddev_t * mddev, int force_change) { - int err; struct list_head *tmp; mdk_rdev_t *rdev; int sync_req; @@ -1727,7 +1726,7 @@ repeat: "md: updating %s RAID superblock on device (in sync %d)\n", mdname(mddev),mddev->in_sync); - err = bitmap_update_sb(mddev->bitmap); + bitmap_update_sb(mddev->bitmap); ITERATE_RDEV(mddev,rdev,tmp) { char b[BDEVNAME_SIZE]; dprintk(KERN_INFO "md: "); diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c --- .prev/drivers/md/raid10.c 2007-05-21 11:17:57.000000000 +1000 +++ ./drivers/md/raid10.c 2007-05-21 11:18:23.000000000 +1000 @@ -1510,8 +1510,7 @@ static void raid10d(mddev_t *mddev) blk_remove_plug(mddev->queue); spin_unlock_irqrestore(&conf->device_lock, flags); /* flush any pending bitmap writes to disk before proceeding w/ I/O */ - if (bitmap_unplug(mddev->bitmap) != 0) - printk("%s: bitmap file write failed!\n", mdname(mddev)); + bitmap_unplug(mddev->bitmap); while (bio) { /* submit pending writes */ struct bio *next = bio->bi_next; diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c --- .prev/drivers/md/raid1.c 2007-05-21 11:17:57.000000000 +1000 +++ ./drivers/md/raid1.c 2007-05-21 11:18:23.000000000 +1000 @@ -1519,8 +1519,7 @@ static void raid1d(mddev_t *mddev) blk_remove_plug(mddev->queue); spin_unlock_irqrestore(&conf->device_lock, flags); /* flush any pending bitmap writes to disk before proceeding w/ I/O */ - if (bitmap_unplug(mddev->bitmap) != 0) - printk("%s: bitmap file write failed!\n", mdname(mddev)); + bitmap_unplug(mddev->bitmap); while (bio) { /* submit pending writes */ struct bio *next = bio->bi_next; diff .prev/include/linux/raid/bitmap.h ./include/linux/raid/bitmap.h --- .prev/include/linux/raid/bitmap.h 2007-05-21 11:17:57.000000000 +1000 +++ ./include/linux/raid/bitmap.h 2007-05-21 11:18:23.000000000 +1000 @@ -262,7 +262,7 @@ int bitmap_active(struct bitmap *bitmap char *file_path(struct file *file, char *buf, int count); void bitmap_print_sb(struct bitmap *bitmap); -int bitmap_update_sb(struct bitmap *bitmap); +void bitmap_update_sb(struct bitmap *bitmap); int bitmap_setallbits(struct bitmap *bitmap); void bitmap_write_all(struct bitmap *bitmap); @@ -278,8 +278,8 @@ int bitmap_start_sync(struct bitmap *bit void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, int *blocks, int aborted); void bitmap_close_sync(struct bitmap *bitmap); -int bitmap_unplug(struct bitmap *bitmap); -int bitmap_daemon_work(struct bitmap *bitmap); +void bitmap_unplug(struct bitmap *bitmap); +void bitmap_daemon_work(struct bitmap *bitmap); #endif #endif ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-05-21 1:33 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-05-21 1:32 [PATCH 000 of 7] md: Introduction EXPLAIN PATCH SET HERE NeilBrown 2007-05-21 1:33 ` [PATCH 001 of 7] md: Avoid overflow in raid0 calculation with large components NeilBrown 2007-05-21 1:33 ` [PATCH 002 of 7] md: Don't write more than is required of the last page of a bitmap NeilBrown 2007-05-21 1:33 ` [PATCH 003 of 7] md: Fix bug with linear hot-add and elsewhere NeilBrown 2007-05-21 1:33 ` [PATCH 004 of 7] md: Improve message about invalid superblock during autodetect NeilBrown 2007-05-21 1:33 ` [PATCH 005 of 7] md: Improve the is_mddev_idle test fix NeilBrown 2007-05-21 1:33 ` [PATCH 006 of 7] md: Check that internal bitmap does not overlap other data NeilBrown 2007-05-21 1:33 ` [PATCH 007 of 7] md: Change bitmap_unplug and others to void functions NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).