From: Neil Brown <neilb@suse.de>
To: Andre Noll <maan@systemlinux.org>
Cc: linux-raid <linux-raid@vger.kernel.org>,
"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: Re: [PATCH/Resend] md: Push down data integrity code to personalities.
Date: Tue, 7 Jul 2009 13:42:56 +1000
Message-ID: <19026.50240.504728.428875@notabene.brown>
In-Reply-To: message from Andre Noll on Wednesday July 1
On Wednesday July 1, maan@systemlinux.org wrote:
> Hi Neil,
>
> here's again the patch that reduces the knowledge about specific
> raid levels from md.c by moving the data integrity code to the
> personalities. The patch was tested and acked by Martin.
>
> Please review.
Apologies for the delay. I've been fighting a flu :-(
This patch seems to treat spares inconsistently.
md_integrity_register ignores spares.
However, bind_rdev_to_array - which is used for adding a spare - calls
md_integrity_add_rdev to check that the integrity profile of the new
device matches.
We need to be consistent.
Either all devices that are bound to the array - whether active,
spare, or failed - are considered, or only the active devices are
considered.
In the former case we want to take action in bind_rdev_to_array
and possibly in unbind_rdev_from_array.
In the latter we need to take action either in remove_and_add_spares,
or in the per-personality ->hot_add_disk and ->hot_remove_disk
methods.
I think I lean towards the latter, and would put the code in the
->hot_*_disk methods, but it isn't a strong leaning.
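Something like this, purely to illustrate the placement (untested
sketch; the do_existing_*() helpers just stand in for the current
hot-add/remove logic, and md_integrity_add_rdev() would need to stop
being static in md.c for a personality to call it):

static int raid1_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
{
	/* do_existing_add_work() is a placeholder for the current
	 * raid1 hot-add logic that makes the rdev active. */
	int err = do_existing_add_work(mddev, rdev);

	if (!err)
		/* new active member: drop integrity if profiles
		 * no longer match */
		md_integrity_add_rdev(rdev, mddev);
	return err;
}

static int raid1_remove_disk(mddev_t *mddev, int number)
{
	/* do_existing_remove_work() is a placeholder for the current
	 * raid1 hot-remove logic. */
	int err = do_existing_remove_work(mddev, number);

	if (!err)
		/* one fewer active member: integrity may now be
		 * possible to (re)enable */
		md_integrity_register(mddev);
	return err;
}

That would keep all the integrity bookkeeping tied to changes in the
set of active devices.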
Thanks,
NeilBrown
>
> Thanks
> Andre
>
> commit 51295532895ffe532a5d8401fc32073100268b29
> Author: Andre Noll <maan@systemlinux.org>
> Date: Fri Jun 19 14:40:46 2009 +0200
>
> [PATCH/RFC] md: Push down data integrity code to personalities.
>
> This patch replaces md_integrity_check() with two new functions:
> md_integrity_register() and md_integrity_add_rdev(), both of which
> are personality-independent.
>
> md_integrity_register() is a public function which is called from
> the ->run method of all personalities that support data integrity.
> The function iterates over the component devices of the array and
> determines if all active devices are integrity capable and if their
> profiles match. If this is the case, the common profile is registered
> for the mddev via blk_integrity_register().
>
> The second new function, md_integrity_add_rdev(), is internal to
> md.c and is called by bind_rdev_to_array(), i.e. whenever a new
> device is about to be added to a raid array. If the new device does
> not support data integrity or has a profile different from the one
> already registered, data integrity for the mddev is disabled.
>
> Conversely, removing a device from a (raid1) array might make the mddev
> integrity-capable. The patch adds a call to md_integrity_register()
> to the error path of raid1.c in order to activate data integrity in
> this case.
>
> Signed-off-by: Andre Noll <maan@systemlinux.org>
> Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
>
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index dda2f1b..15aa325 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -201,6 +201,7 @@ static int linear_run (mddev_t *mddev)
> mddev->queue->unplug_fn = linear_unplug;
> mddev->queue->backing_dev_info.congested_fn = linear_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> + md_integrity_register(mddev);
> return 0;
> }
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 0f11fd1..54436cb 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -1491,36 +1491,71 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
>
> static LIST_HEAD(pending_raid_disks);
>
> -static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
> +/*
> + * Try to register data integrity profile for an mddev
> + *
> + * This only succeeds if all working and active component devices are integrity
> + * capable with matching profiles.
> + */
> +int md_integrity_register(mddev_t *mddev)
> {
> - struct mdk_personality *pers = mddev->pers;
> - struct gendisk *disk = mddev->gendisk;
> + mdk_rdev_t *rdev, *reference = NULL;
> +
> + if (list_empty(&mddev->disks))
> + return 0; /* nothing to do */
> + if (blk_get_integrity(mddev->gendisk))
> + return 0; /* already registered */
> + list_for_each_entry(rdev, &mddev->disks, same_set) {
> + /* skip spares and non-functional disks */
> + if (test_bit(Faulty, &rdev->flags))
> + continue;
> + if (rdev->raid_disk < 0)
> + continue;
> + /*
> + * If at least one rdev is not integrity capable, we can not
> + * enable data integrity for the md device.
> + */
> + if (!bdev_get_integrity(rdev->bdev))
> + return -EINVAL;
> + if (!reference) {
> + /* Use the first rdev as the reference */
> + reference = rdev;
> + continue;
> + }
> + /* does this rdev's profile match the reference profile? */
> + if (blk_integrity_compare(reference->bdev->bd_disk,
> + rdev->bdev->bd_disk) < 0)
> + return -EINVAL;
> + }
> + /*
> + * All component devices are integrity capable and have matching
> + * profiles, register the common profile for the md device.
> + */
> + if (blk_integrity_register(mddev->gendisk,
> + bdev_get_integrity(reference->bdev)) != 0) {
> + printk(KERN_ERR "md: failed to register integrity for %s\n",
> + mdname(mddev));
> + return -EINVAL;
> + }
> + printk(KERN_NOTICE "md: data integrity on %s enabled\n",
> + mdname(mddev));
> + return 0;
> +}
> +EXPORT_SYMBOL(md_integrity_register);
> +
> +/* Disable data integrity if non-capable/non-matching disk is being added */
> +static void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
> +{
> + struct gendisk *gd = mddev->gendisk;
> struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
> - struct blk_integrity *bi_mddev = blk_get_integrity(disk);
> + struct blk_integrity *bi_mddev = blk_get_integrity(gd);
>
> - /* Data integrity passthrough not supported on RAID 4, 5 and 6 */
> - if (pers && pers->level >= 4 && pers->level <= 6)
> + if (!bi_mddev) /* nothing to do */
> return;
> -
> - /* If rdev is integrity capable, register profile for mddev */
> - if (!bi_mddev && bi_rdev) {
> - if (blk_integrity_register(disk, bi_rdev))
> - printk(KERN_ERR "%s: %s Could not register integrity!\n",
> - __func__, disk->disk_name);
> - else
> - printk(KERN_NOTICE "Enabling data integrity on %s\n",
> - disk->disk_name);
> + if (bi_rdev && blk_integrity_compare(gd, rdev->bdev->bd_disk) >= 0)
> return;
> - }
> -
> - /* Check that mddev and rdev have matching profiles */
> - if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
> - printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
> - disk->disk_name, rdev->bdev->bd_disk->disk_name);
> - printk(KERN_NOTICE "Disabling data integrity on %s\n",
> - disk->disk_name);
> - blk_integrity_unregister(disk);
> - }
> + printk(KERN_NOTICE "disabling data integrity on %s\n", mdname(mddev));
> + blk_integrity_unregister(gd);
> }
>
> static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
> @@ -1595,7 +1630,7 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
> /* May as well allow recovery to be retried once */
> mddev->recovery_disabled = 0;
>
> - md_integrity_check(rdev, mddev);
> + md_integrity_add_rdev(rdev, mddev);
> return 0;
>
> fail:
> @@ -4048,10 +4083,6 @@ static int do_md_run(mddev_t * mddev)
> }
> strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
>
> - if (pers->level >= 4 && pers->level <= 6)
> - /* Cannot support integrity (yet) */
> - blk_integrity_unregister(mddev->gendisk);
> -
> if (mddev->reshape_position != MaxSector &&
> pers->start_reshape == NULL) {
> /* This personality cannot handle reshaping... */
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ea2c441..9433a5d 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -430,5 +430,6 @@ extern void md_new_event(mddev_t *mddev);
> extern int md_allow_write(mddev_t *mddev);
> extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
> extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
> +extern int md_integrity_register(mddev_t *mddev);
>
> #endif /* _MD_MD_H */
> diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
> index c1ca63f..3d3a308 100644
> --- a/drivers/md/multipath.c
> +++ b/drivers/md/multipath.c
> @@ -515,7 +515,7 @@ static int multipath_run (mddev_t *mddev)
> mddev->queue->unplug_fn = multipath_unplug;
> mddev->queue->backing_dev_info.congested_fn = multipath_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> -
> + md_integrity_register(mddev);
> return 0;
>
> out_free_conf:
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 851e631..902de77 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -346,6 +346,7 @@ static int raid0_run(mddev_t *mddev)
>
> blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
> dump_zones(mddev);
> + md_integrity_register(mddev);
> return 0;
> }
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 89939a7..44fbeda 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1045,6 +1045,11 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
> printk(KERN_ALERT "raid1: Disk failure on %s, disabling device.\n"
> "raid1: Operation continuing on %d devices.\n",
> bdevname(rdev->bdev,b), conf->raid_disks - mddev->degraded);
> + /*
> + * The good news is that kicking a disk might allow to enable data
> + * integrity on the mddev.
> + */
> + md_integrity_register(mddev);
> }
>
> static void print_conf(conf_t *conf)
> @@ -1178,7 +1183,9 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
> /* lost the race, try later */
> err = -EBUSY;
> p->rdev = rdev;
> + goto abort;
> }
> + md_integrity_register(mddev);
> }
> abort:
>
> @@ -2068,7 +2075,7 @@ static int run(mddev_t *mddev)
> mddev->queue->unplug_fn = raid1_unplug;
> mddev->queue->backing_dev_info.congested_fn = raid1_congested;
> mddev->queue->backing_dev_info.congested_data = mddev;
> -
> + md_integrity_register(mddev);
> return 0;
>
> out_no_mem:
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index ae12cea..3e553e3 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1203,7 +1203,9 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
> /* lost the race, try later */
> err = -EBUSY;
> p->rdev = rdev;
> + goto abort;
> }
> + md_integrity_register(mddev);
> }
> abort:
>
> @@ -2218,6 +2220,7 @@ static int run(mddev_t *mddev)
>
> if (conf->near_copies < mddev->raid_disks)
> blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
> + md_integrity_register(mddev);
> return 0;
>
> out_free_conf:
> --
> The only person who always got his work done by Friday was Robinson Crusoe