* [PATCH 000 of 4] md: assorted md patched - please read carefully.
@ 2008-01-18 11:02 NeilBrown
2008-01-18 11:02 ` [PATCH 001 of 4] md: Set and test the ->persistent flag for md devices more consistently NeilBrown
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: NeilBrown @ 2008-01-18 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Following are 4 patches for md.
The first two replace
md-allow-devices-to-be-shared-between-md-arrays.patch
which was recently removed. They should go at the same place in the
series, between
md-allow-a-maximum-extent-to-be-set-for-resyncing.patch
and
md-lock-address-when-changing-attributes-of-component-devices.patch
The third is a replacement for
md-change-iterate_rdev_generic-to-rdev_for_each_list-and-remove-iterate_rdev_pending.patch
which conflicts with the above change.
The last is a fix for
md-fix-an-occasional-deadlock-in-raid5.patch
which makes me a lot happier about that patch. It introduced a
performance regression and I now understand why. I'm now happy for
that patch, with this fix, to go into 2.6.24 if that is convenient (if
not, 2.6.24.1 will do).
Thanks,
NeilBrown
[PATCH 001 of 4] md: Set and test the ->persistent flag for md devices more consistently.
[PATCH 002 of 4] md: Allow devices to be shared between md arrays.
[PATCH 003 of 4] md: Change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.
[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 - FIX
* [PATCH 001 of 4] md: Set and test the ->persistent flag for md devices more consistently.
From: NeilBrown @ 2008-01-18 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
If you try to start an array for which the number of raid disks is
listed as zero, md will currently try to read metadata off any devices
that have been given. This was done because the value of raid_disks
is used to signal whether array details have been provided by
userspace (raid_disks > 0) or must be read from the devices
(raid_disks == 0).
However for an array without persistent metadata (or with externally
managed metadata) this is the wrong thing to do. So we add a test in
do_md_run to give an error if raid_disks is zero for non-persistent
arrays.
This requires that mddev->persistent is set correctly at this point,
which it currently isn't for in-kernel autodetected arrays.
So set ->persistent for autodetected arrays, and remove the setting in
super_*_validate which is now redundant.
Also clear ->persistent when stopping an array so it is consistently
zero when starting an array.
Signed-off-by: Neil Brown <neilb@suse.de>
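For illustration only (not part of the patch): the start-up rule described
above can be modelled in a few lines of userspace C. This is a minimal
sketch under stated assumptions; struct sketch_mddev, enum sketch_action
and sketch_run_check() are hypothetical names, and only the two fields
the check looks at are represented.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, pared-down stand-in for mddev_t. */
struct sketch_mddev {
	int raid_disks;	/* 0: userspace supplied no array details */
	int persistent;	/* non-zero: md-managed metadata lives on the devices */
};

enum sketch_action {
	SKETCH_USE_GIVEN_DETAILS,	/* raid_disks > 0 */
	SKETCH_READ_SUPERBLOCKS,	/* raid_disks == 0 and persistent */
};

/*
 * Mirrors the decision added to do_md_run(): with raid_disks == 0 the
 * details must come from the devices, which only makes sense when the
 * array has persistent metadata. Otherwise the request is rejected.
 */
static int sketch_run_check(const struct sketch_mddev *mddev,
			    enum sketch_action *action)
{
	if (mddev->raid_disks == 0) {
		if (!mddev->persistent)
			return -EINVAL;
		*action = SKETCH_READ_SUPERBLOCKS;
		return 0;
	}
	*action = SKETCH_USE_GIVEN_DETAILS;
	return 0;
}
```

The sketch also shows why ->persistent has to be reliable before this
point: an autodetected array that left it at zero would be rejected here.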
### Diffstat output
./drivers/md/md.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2008-01-18 10:46:49.000000000 +1100
+++ ./drivers/md/md.c 2008-01-18 11:03:15.000000000 +1100
@@ -779,7 +779,6 @@ static int super_90_validate(mddev_t *md
mddev->major_version = 0;
mddev->minor_version = sb->minor_version;
mddev->patch_version = sb->patch_version;
- mddev->persistent = 1;
mddev->external = 0;
mddev->chunk_size = sb->chunk_size;
mddev->ctime = sb->ctime;
@@ -1159,7 +1158,6 @@ static int super_1_validate(mddev_t *mdd
if (mddev->raid_disks == 0) {
mddev->major_version = 1;
mddev->patch_version = 0;
- mddev->persistent = 1;
mddev->external = 0;
mddev->chunk_size = le32_to_cpu(sb->chunksize) << 9;
mddev->ctime = le64_to_cpu(sb->ctime) & ((1ULL << 32)-1);
@@ -3219,8 +3217,11 @@ static int do_md_run(mddev_t * mddev)
/*
* Analyze all RAID superblock(s)
*/
- if (!mddev->raid_disks)
+ if (!mddev->raid_disks) {
+ if (!mddev->persistent)
+ return -EINVAL;
analyze_sbs(mddev);
+ }
chunk_size = mddev->chunk_size;
@@ -3627,6 +3628,7 @@ static int do_md_stop(mddev_t * mddev, i
mddev->resync_max = MaxSector;
mddev->reshape_position = MaxSector;
mddev->external = 0;
+ mddev->persistent = 0;
} else if (mddev->pers)
printk(KERN_INFO "md: %s switched to read-only mode.\n",
@@ -3735,6 +3737,7 @@ static void autorun_devices(int part)
mddev_unlock(mddev);
} else {
printk(KERN_INFO "md: created %s\n", mdname(mddev));
+ mddev->persistent = 1;
ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
list_del_init(&rdev->same_set);
if (bind_rdev_to_array(rdev, mddev))
* [PATCH 002 of 4] md: Allow devices to be shared between md arrays.
From: NeilBrown @ 2008-01-18 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Currently, a given device is "claimed" by a particular array so
that it cannot be used by other arrays.
This is not ideal for DDF and other metadata schemes which have
their own partitioning concept.
So for externally managed metadata, just claim the device for
md in general, require that "offset" and "size" are set
properly for each device, and make sure that if a device is
included in different arrays then the active sections do
not overlap.
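
As a rough illustration of the overlap rule, the check can be tried out in
userspace. This is a sketch, assuming each active section is given as a
start/length pair in the same unit; sketch_sector_t and sketch_overlaps()
are hypothetical stand-ins for sector_t and the overlaps() helper in the
diff.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_sector_t;	/* stand-in for the kernel's sector_t */

/*
 * Two half-open ranges [s1, s1+l1) and [s2, s2+l2) are disjoint when one
 * ends at or before the start of the other; anything else is an overlap.
 */
static int sketch_overlaps(sketch_sector_t s1, sketch_sector_t l1,
			   sketch_sector_t s2, sketch_sector_t l2)
{
	if (s1 + l1 <= s2)
		return 0;
	if (s2 + l2 <= s1)
		return 0;
	return 1;
}
```

With this rule, sections that merely touch (one ends exactly where the
next begins) are allowed, while a section contained inside another is
rejected.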
This involves adding another flag to the rdev which makes it awkward
to set "->flags = 0" to clear certain flags. So now clear flags
explicitly by name when we want to clear things.
Signed-off-by: Neil Brown <neilb@suse.de>
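For illustration only (not part of the patch): the reason "->flags = 0"
becomes awkward can be seen in a small userspace model. This is a sketch
with hypothetical sk_* helpers that are plain, non-atomic stand-ins for
the kernel bit operations. The bit numbers follow md_k.h as shown in the
diff, except SK_FAULTY, whose value is an assumption because it does not
appear there.

```c
#include <assert.h>

enum {
	SK_FAULTY = 1,			/* assumed value, not shown in the diff */
	SK_IN_SYNC = 2,
	SK_WRITE_MOSTLY = 4,
	SK_BARRIERS_NOTSUPP = 5,
	SK_ALL_RESERVED = 6,		/* the new flag: must survive a state reset */
};

/* Non-atomic stand-ins for set_bit/clear_bit/test_bit. */
static void sk_set_bit(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static void sk_clear_bit(int nr, unsigned long *flags)
{
	*flags &= ~(1UL << nr);
}

static int sk_test_bit(int nr, const unsigned long *flags)
{
	return (int)((*flags >> nr) & 1UL);
}

/*
 * Reset the per-array state bits by name, as the validate functions now
 * do. Assigning 0 instead would also wipe SK_ALL_RESERVED, which records
 * how the device was claimed rather than its state within an array.
 */
static void sk_reset_state_flags(unsigned long *flags)
{
	sk_clear_bit(SK_FAULTY, flags);
	sk_clear_bit(SK_IN_SYNC, flags);
	sk_clear_bit(SK_WRITE_MOSTLY, flags);
	sk_clear_bit(SK_BARRIERS_NOTSUPP, flags);
}
```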
### Diffstat output
./drivers/md/md.c | 88 +++++++++++++++++++++++++++++++++++++++-----
./include/linux/raid/md_k.h | 2 +
2 files changed, 80 insertions(+), 10 deletions(-)
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2008-01-18 11:03:15.000000000 +1100
+++ ./drivers/md/md.c 2008-01-18 11:18:04.000000000 +1100
@@ -774,7 +774,11 @@ static int super_90_validate(mddev_t *md
__u64 ev1 = md_event(sb);
rdev->raid_disk = -1;
- rdev->flags = 0;
+ clear_bit(Faulty, &rdev->flags);
+ clear_bit(In_sync, &rdev->flags);
+ clear_bit(WriteMostly, &rdev->flags);
+ clear_bit(BarriersNotsupp, &rdev->flags);
+
if (mddev->raid_disks == 0) {
mddev->major_version = 0;
mddev->minor_version = sb->minor_version;
@@ -1154,7 +1158,11 @@ static int super_1_validate(mddev_t *mdd
__u64 ev1 = le64_to_cpu(sb->events);
rdev->raid_disk = -1;
- rdev->flags = 0;
+ clear_bit(Faulty, &rdev->flags);
+ clear_bit(In_sync, &rdev->flags);
+ clear_bit(WriteMostly, &rdev->flags);
+ clear_bit(BarriersNotsupp, &rdev->flags);
+
if (mddev->raid_disks == 0) {
mddev->major_version = 1;
mddev->patch_version = 0;
@@ -1402,7 +1410,7 @@ static int bind_rdev_to_array(mdk_rdev_t
goto fail;
}
list_add(&rdev->same_set, &mddev->disks);
- bd_claim_by_disk(rdev->bdev, rdev, mddev->gendisk);
+ bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, mddev->gendisk);
return 0;
fail:
@@ -1442,7 +1450,7 @@ static void unbind_rdev_from_array(mdk_r
* otherwise reused by a RAID array (or any other kernel
* subsystem), by bd_claiming the device.
*/
-static int lock_rdev(mdk_rdev_t *rdev, dev_t dev)
+static int lock_rdev(mdk_rdev_t *rdev, dev_t dev, int shared)
{
int err = 0;
struct block_device *bdev;
@@ -1454,13 +1462,15 @@ static int lock_rdev(mdk_rdev_t *rdev, d
__bdevname(dev, b));
return PTR_ERR(bdev);
}
- err = bd_claim(bdev, rdev);
+ err = bd_claim(bdev, shared ? (mdk_rdev_t *)lock_rdev : rdev);
if (err) {
printk(KERN_ERR "md: could not bd_claim %s.\n",
bdevname(bdev, b));
blkdev_put(bdev);
return err;
}
+ if (!shared)
+ set_bit(AllReserved, &rdev->flags);
rdev->bdev = bdev;
return err;
}
@@ -1925,7 +1935,8 @@ slot_store(mdk_rdev_t *rdev, const char
return -ENOSPC;
rdev->raid_disk = slot;
/* assume it is working */
- rdev->flags = 0;
+ clear_bit(Faulty, &rdev->flags);
+ clear_bit(WriteMostly, &rdev->flags);
set_bit(In_sync, &rdev->flags);
}
return len;
@@ -1950,6 +1961,10 @@ offset_store(mdk_rdev_t *rdev, const cha
return -EINVAL;
if (rdev->mddev->pers)
return -EBUSY;
+ if (rdev->size && rdev->mddev->external)
+ /* Must set offset before size, so overlap checks
+ * can be sane */
+ return -EBUSY;
rdev->data_offset = offset;
return len;
}
@@ -1963,16 +1978,69 @@ rdev_size_show(mdk_rdev_t *rdev, char *p
return sprintf(page, "%llu\n", (unsigned long long)rdev->size);
}
+static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
+{
+ /* check if two start/length pairs overlap */
+ if (s1+l1 <= s2)
+ return 0;
+ if (s2+l2 <= s1)
+ return 0;
+ return 1;
+}
+
static ssize_t
rdev_size_store(mdk_rdev_t *rdev, const char *buf, size_t len)
{
char *e;
unsigned long long size = simple_strtoull(buf, &e, 10);
+ unsigned long long oldsize = rdev->size;
if (e==buf || (*e && *e != '\n'))
return -EINVAL;
if (rdev->mddev->pers)
return -EBUSY;
rdev->size = size;
+ if (size > oldsize && rdev->mddev->external) {
+ /* need to check that all other rdevs with the same ->bdev
+ * do not overlap. We need to unlock the mddev to avoid
+ * a deadlock. We have already changed rdev->size, and if
+ * we have to change it back, we will have the lock again.
+ */
+ mddev_t *mddev;
+ int overlap = 0;
+ struct list_head *tmp, *tmp2;
+
+ mddev_unlock(rdev->mddev);
+ ITERATE_MDDEV(mddev, tmp) {
+ mdk_rdev_t *rdev2;
+
+ mddev_lock(mddev);
+ ITERATE_RDEV(mddev, rdev2, tmp2)
+ if (test_bit(AllReserved, &rdev2->flags) ||
+ (rdev->bdev == rdev2->bdev &&
+ rdev != rdev2 &&
+ overlaps(rdev->data_offset, rdev->size,
+ rdev2->data_offset, rdev2->size))) {
+ overlap = 1;
+ break;
+ }
+ mddev_unlock(mddev);
+ if (overlap) {
+ mddev_put(mddev);
+ break;
+ }
+ }
+ mddev_lock(rdev->mddev);
+ if (overlap) {
+ /* Someone else could have slipped in a size
+ * change here, but doing so is just silly.
+ * We put oldsize back because we *know* it is
+ * safe, and trust userspace not to race with
+ * itself
+ */
+ rdev->size = oldsize;
+ return -EBUSY;
+ }
+ }
if (size < rdev->mddev->size || rdev->mddev->size == 0)
rdev->mddev->size = size;
return len;
@@ -2062,7 +2130,7 @@ static mdk_rdev_t *md_import_device(dev_
if ((err = alloc_disk_sb(rdev)))
goto abort_free;
- err = lock_rdev(rdev, newdev);
+ err = lock_rdev(rdev, newdev, super_format == -2);
if (err)
goto abort_free;
@@ -2615,7 +2683,9 @@ new_dev_store(mddev_t *mddev, const char
if (err < 0)
goto out;
}
- } else
+ } else if (mddev->external)
+ rdev = md_import_device(dev, -2, -1);
+ else
rdev = md_import_device(dev, -1, -1);
if (IS_ERR(rdev))
@@ -4025,8 +4095,6 @@ static int add_new_disk(mddev_t * mddev,
else
rdev->raid_disk = -1;
- rdev->flags = 0;
-
if (rdev->raid_disk < mddev->raid_disks)
if (info->state & (1<<MD_DISK_SYNC))
set_bit(In_sync, &rdev->flags);
diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h 2008-01-18 10:46:49.000000000 +1100
+++ ./include/linux/raid/md_k.h 2008-01-18 11:18:04.000000000 +1100
@@ -81,6 +81,8 @@ struct mdk_rdev_s
#define In_sync 2 /* device is in_sync with rest of array */
#define WriteMostly 4 /* Avoid reading if at all possible */
#define BarriersNotsupp 5 /* BIO_RW_BARRIER is not supported */
+#define AllReserved 6 /* If whole device is reserved for
+ * one array */
int desc_nr; /* descriptor index in the superblock */
int raid_disk; /* role of device in array */
* [PATCH 003 of 4] md: Change ITERATE_RDEV_GENERIC to rdev_for_each_list, and remove ITERATE_RDEV_PENDING.
From: NeilBrown @ 2008-01-18 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Finish ITERATE_ to for_each conversion.
Signed-off-by: Neil Brown <neilb@suse.de>
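For illustration only (not part of the patch): the iteration pattern behind
rdev_for_each_list can be exercised in userspace. This is a sketch using a
hypothetical minimal list implementation (sk_list_head and friends) in
place of the kernel's list.h; the macro body follows the one in the diff.
It fetches the next pointer before the loop body runs, which is why the
body may unlink the current entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly-linked list. */
struct sk_list_head {
	struct sk_list_head *next, *prev;
};

static void sk_list_init(struct sk_list_head *h)
{
	h->next = h->prev = h;
}

static void sk_list_add_tail(struct sk_list_head *n, struct sk_list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void sk_list_del_init(struct sk_list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	sk_list_init(e);
}

struct sk_rdev {
	int id;
	struct sk_list_head same_set;
};

#define sk_list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Same shape as rdev_for_each_list: 'tmp' is advanced past the current
 * entry before the body runs, and the loop stops once the current entry
 * is the list head itself (its 'rdev' is computed but never used).
 */
#define sk_rdev_for_each_list(rdev, tmp, list)				\
	for ((tmp) = (list).next;					\
	     (rdev) = sk_list_entry((tmp), struct sk_rdev, same_set),	\
	     (tmp) = (tmp)->next, (tmp)->prev != &(list);		\
	     )

/* Unlink every entry while iterating and return the sum of their ids. */
static int sk_drain(struct sk_list_head *list)
{
	struct sk_rdev *rdev;
	struct sk_list_head *tmp;
	int sum = 0;

	sk_rdev_for_each_list(rdev, tmp, *list) {
		sum += rdev->id;
		sk_list_del_init(&rdev->same_set);
	}
	return sum;
}
```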
### Diffstat output
./drivers/md/md.c | 8 ++++----
./include/linux/raid/md_k.h | 14 ++++----------
2 files changed, 8 insertions(+), 14 deletions(-)
diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c 2008-01-18 11:19:09.000000000 +1100
+++ ./drivers/md/md.c 2008-01-18 11:19:24.000000000 +1100
@@ -3766,7 +3766,7 @@ static void autorun_devices(int part)
printk(KERN_INFO "md: considering %s ...\n",
bdevname(rdev0->bdev,b));
INIT_LIST_HEAD(&candidates);
- ITERATE_RDEV_PENDING(rdev,tmp)
+ rdev_for_each_list(rdev, tmp, pending_raid_disks)
if (super_90_load(rdev, rdev0, 0) >= 0) {
printk(KERN_INFO "md: adding %s ...\n",
bdevname(rdev->bdev,b));
@@ -3810,7 +3810,7 @@ static void autorun_devices(int part)
} else {
printk(KERN_INFO "md: created %s\n", mdname(mddev));
mddev->persistent = 1;
- ITERATE_RDEV_GENERIC(candidates,rdev,tmp) {
+ rdev_for_each_list(rdev, tmp, candidates) {
list_del_init(&rdev->same_set);
if (bind_rdev_to_array(rdev, mddev))
export_rdev(rdev);
@@ -3821,7 +3821,7 @@ static void autorun_devices(int part)
/* on success, candidates will be empty, on error
* it won't...
*/
- ITERATE_RDEV_GENERIC(candidates,rdev,tmp)
+ rdev_for_each_list(rdev, tmp, candidates)
export_rdev(rdev);
mddev_put(mddev);
}
@@ -4936,7 +4936,7 @@ static void status_unused(struct seq_fil
seq_printf(seq, "unused devices: ");
- ITERATE_RDEV_PENDING(rdev,tmp) {
+ rdev_for_each_list(rdev, tmp, pending_raid_disks) {
char b[BDEVNAME_SIZE];
i++;
seq_printf(seq, "%s ",
diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h 2008-01-18 11:19:09.000000000 +1100
+++ ./include/linux/raid/md_k.h 2008-01-18 11:19:24.000000000 +1100
@@ -313,23 +313,17 @@ static inline char * mdname (mddev_t * m
* iterates through some rdev ringlist. It's safe to remove the
* current 'rdev'. Dont touch 'tmp' though.
*/
-#define ITERATE_RDEV_GENERIC(head,rdev,tmp) \
+#define rdev_for_each_list(rdev, tmp, list) \
\
- for ((tmp) = (head).next; \
+ for ((tmp) = (list).next; \
(rdev) = (list_entry((tmp), mdk_rdev_t, same_set)), \
- (tmp) = (tmp)->next, (tmp)->prev != &(head) \
+ (tmp) = (tmp)->next, (tmp)->prev != &(list) \
; )
/*
* iterates through the 'same array disks' ringlist
*/
#define rdev_for_each(rdev, tmp, mddev) \
- ITERATE_RDEV_GENERIC((mddev)->disks,rdev,tmp)
-
-/*
- * Iterates through 'pending RAID disks'
- */
-#define ITERATE_RDEV_PENDING(rdev,tmp) \
- ITERATE_RDEV_GENERIC(pending_raid_disks,rdev,tmp)
+ rdev_for_each_list(rdev, tmp, (mddev)->disks)
typedef struct mdk_thread_s {
void (*run) (mddev_t *mddev);
* [PATCH 004 of 4] md: Fix an occasional deadlock in raid5 - FIX
From: NeilBrown @ 2008-01-18 11:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
(This should be merged with fix-occasional-deadlock-in-raid5.patch)
As we don't call stripe_handle in make_request any more, we need to
clear STRIPE_DELAYED too (previously done by stripe_handle) to ensure
that we test if the stripe still needs to be delayed or not.
Signed-off-by: Neil Brown <neilb@suse.de>
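For illustration only (not part of the patch): a heavily simplified model
of why a stale STRIPE_DELAYED bit matters. This is a sketch that assumes a
released stripe is routed purely by these two state bits; the real release
path has more cases. The sk_* names and the bit numbers are hypothetical.

```c
#include <assert.h>

enum {
	SK_STRIPE_HANDLE = 2,	/* illustrative bit numbers */
	SK_STRIPE_DELAYED = 6,
};

enum sk_queue {
	SK_QUEUE_NONE,		/* nothing to do for this stripe */
	SK_QUEUE_HANDLE,	/* handled promptly */
	SK_QUEUE_DELAYED,	/* parked until delayed stripes are activated */
};

/* Assumed routing rule: a stripe still marked delayed stays parked. */
static enum sk_queue sk_release_target(unsigned long state)
{
	if (!(state & (1UL << SK_STRIPE_HANDLE)))
		return SK_QUEUE_NONE;
	if (state & (1UL << SK_STRIPE_DELAYED))
		return SK_QUEUE_DELAYED;
	return SK_QUEUE_HANDLE;
}

/*
 * What the tail of make_request does to the state after this fix: mark
 * the stripe for handling and drop any stale delayed mark, so that the
 * decision to delay is made afresh when the stripe is next handled.
 */
static unsigned long sk_make_request_tail(unsigned long state)
{
	state |= 1UL << SK_STRIPE_HANDLE;
	state &= ~(1UL << SK_STRIPE_DELAYED);
	return state;
}
```

In this model, a stripe that enters make_request with the delayed bit
already set would otherwise go straight back to the delayed queue, which
is consistent with the performance regression mentioned in the cover
letter.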
### Diffstat output
./drivers/md/raid5.c | 1 +
1 file changed, 1 insertion(+)
diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c 2008-01-18 14:58:55.000000000 +1100
+++ ./drivers/md/raid5.c 2008-01-18 14:59:53.000000000 +1100
@@ -3549,6 +3549,7 @@ static int make_request(struct request_q
}
finish_wait(&conf->wait_for_overlap, &w);
set_bit(STRIPE_HANDLE, &sh->state);
+ clear_bit(STRIPE_DELAYED, &sh->state);
release_stripe(sh);
} else {
/* cannot get stripe for read-ahead, just give-up */