linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: "Guilherme G. Piccoli" <gpiccoli@canonical.com>,
	linux-raid@vger.kernel.org
Cc: Neil F Brown <nfbrown@suse.com>, Song Liu <songliubraving@fb.com>,
	liu.song.a23@gmail.com, linux-block@vger.kernel.org,
	dm-devel@redhat.com, jay.vosburgh@canonical.com
Subject: Re: [PATCH v3 1/2] md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone
Date: Fri, 30 Aug 2019 18:08:10 +1000	[thread overview]
Message-ID: <87d0gnf53p.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190822161318.26236-1-gpiccoli@canonical.com>


[-- Attachment #1.1: Type: text/plain, Size: 8530 bytes --]

On Thu, Aug 22 2019,  Guilherme G. Piccoli  wrote:

> Currently md raid0/linear are not provided with any mechanism to validate
> if an array member got removed or failed. The driver keeps sending BIOs
> regardless of the state of array members, and kernel shows state 'clean'
> in the 'array_state' sysfs attribute. This leads to the following
> situation: if a raid0/linear array member is removed and the array is
> mounted, some user writing to this array won't realize that errors are
> happening unless they check dmesg or perform one fsync per written file.
> Despite udev signaling the member device is gone, 'mdadm' cannot issue the
> STOP_ARRAY ioctl successfully, given the array is mounted.
>
> In other words, no -EIO is returned and writes (except direct ones) appear
> normal. Meaning the user might think the wrote data is correctly stored in
> the array, but instead garbage was written given that raid0 does stripping
> (and so, it requires all its members to be working in order to not corrupt
> data). For md/linear, writes to the available members will work fine, but
> if the writes go to the missing member(s), it'll cause a file corruption
> situation, whereas the portion of the writes to the missing devices aren't
> written effectively.
>
> This patch changes this behavior: we check if the block device's gendisk
> is UP when submitting the BIO to the array member, and if it isn't, we flag
> the md device as MD_BROKEN and fail subsequent I/Os to that device; a read
> request to the array requiring data from a valid member is still completed.
> While flagging the device as MD_BROKEN, we also show a rate-limited warning
> in the kernel log.
>
> A new array state 'broken' was added too: it mimics the state 'clean' in
> every aspect, being useful only to distinguish if the array has some member
> missing. We rely on the MD_BROKEN flag to put the array in the 'broken'
> state. This state cannot be written in 'array_state' as it just shows
> one or more members of the array are missing but acts like 'clean', it
> wouldn't make sense to write it.
>
> With this patch, the filesystem reacts much faster to the event of missing
> array member: after some I/O errors, ext4 for instance aborts the journal
> and prevents corruption. Without this change, we're able to keep writing
> in the disk and after a machine reboot, e2fsck shows some severe fs errors
> that demand fixing. This patch was tested in ext4 and xfs filesystems, and
> requires a 'mdadm' counterpart to handle the 'broken' state.
>
> Cc: NeilBrown <neilb@suse.com>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>
> v2 -> v3:
> * Rebased against md-next.
>
> * Merged both patches in a single one (thanks Song for the
> suggestion); now we fail BIOs and mark array as MD_BROKEN
> when a member is missing. We rely in the MD_BROKEN flag
> to set array to 'broken' state.
>
> * Function is_missing_dev() was removed due to the above.
>
> v1 -> v2:
> * Added handling for md/linear 'broken' state;
> * Check for is_missing_dev() instead of personality (thanks Neil for
> the suggestion);
> * Changed is_missing_dev() handlers to static;
> * Print rate-limited warning in case of more members go away, not only
> the first.
>
> Cover-letter from v1:
> lore.kernel.org/linux-block/20190729203135.12934-1-gpiccoli@canonical.com
>
>
>  drivers/md/md-linear.c |  9 +++++++++
>  drivers/md/md.c        | 22 ++++++++++++++++++----
>  drivers/md/md.h        |  3 +++
>  drivers/md/raid0.c     | 10 ++++++++++
>  4 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> index 7354466ddc90..0479ccdbdeeb 100644
> --- a/drivers/md/md-linear.c
> +++ b/drivers/md/md-linear.c
> @@ -258,6 +258,15 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
>  		     bio_sector < start_sector))
>  		goto out_of_bounds;
>  
> +	if (unlikely(!(tmp_dev->rdev->bdev->bd_disk->flags & GENHD_FL_UP))) {
> +		if (!test_bit(MD_BROKEN, &mddev->flags))
> +			pr_warn("md: %s: linear array has a missing/failed member\n",
> +				mdname(mddev));
> +		set_bit(MD_BROKEN, &mddev->flags);

It is a minor point, but I think this would look nicer as
   if (!test_and_set_bit(MD_BROKEN, ....) { .. . }


> +		bio_io_error(bio);
> +		return true;
> +	}
> +
>  	if (unlikely(bio_end_sector(bio) > end_sector)) {
>  		/* This bio crosses a device boundary, so we have to split it */
>  		struct bio *split = bio_split(bio, end_sector - bio_sector,
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b46bb143e3c5..e7612033005f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -376,6 +376,11 @@ static blk_qc_t md_make_request(struct request_queue *q, struct bio *bio)
>  	struct mddev *mddev = q->queuedata;
>  	unsigned int sectors;
>  
> +	if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
> +		bio_io_error(bio);
> +		return BLK_QC_T_NONE;
> +	}
> +
>  	blk_queue_split(q, &bio);
>  
>  	if (mddev == NULL || mddev->pers == NULL) {
> @@ -4158,12 +4163,17 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
>   * active-idle
>   *     like active, but no writes have been seen for a while (100msec).
>   *
> + * broken
> + *     RAID0/LINEAR-only: same as clean, but array is missing a member.
> + *     It's useful because RAID0/LINEAR mounted-arrays aren't stopped
> + *     when a member is gone, so this state will at least alert the
> + *     user that something is wrong.
>   */
>  enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
> -		   write_pending, active_idle, bad_word};
> +		   write_pending, active_idle, broken, bad_word};
>  static char *array_states[] = {
>  	"clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
> -	"write-pending", "active-idle", NULL };
> +	"write-pending", "active-idle", "broken", NULL };
>  
>  static int match_word(const char *word, char **list)
>  {
> @@ -4179,7 +4189,7 @@ array_state_show(struct mddev *mddev, char *page)
>  {
>  	enum array_state st = inactive;
>  
> -	if (mddev->pers && !test_bit(MD_NOT_READY, &mddev->flags))
> +	if (mddev->pers && !test_bit(MD_NOT_READY, &mddev->flags)) {
>  		switch(mddev->ro) {
>  		case 1:
>  			st = readonly;
> @@ -4199,7 +4209,10 @@ array_state_show(struct mddev *mddev, char *page)
>  				st = active;
>  			spin_unlock(&mddev->lock);
>  		}
> -	else {
> +
> +		if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && st == clean)

I prefer to keep "unlikely" for performance-sensitive code.  This is not
performance sensitive.

Even without those changes:
  Reviewed-by: NeilBrown <neilb@suse.de>


Thanks,
NeilBrown

> +			st = broken;
> +	} else {
>  		if (list_empty(&mddev->disks) &&
>  		    mddev->raid_disks == 0 &&
>  		    mddev->dev_sectors == 0)
> @@ -4313,6 +4326,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>  		break;
>  	case write_pending:
>  	case active_idle:
> +	case broken:
>  		/* these cannot be set */
>  		break;
>  	}
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1edcd967eb8e..240de65e15e8 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -251,6 +251,9 @@ enum mddev_flags {
>  	MD_NOT_READY,		/* do_md_run() is active, so 'array_state'
>  				 * must not report that array is ready yet
>  				 */
> +	MD_BROKEN,              /* This is used in RAID-0/LINEAR only, to stop
> +				 * I/O in case an array member is gone/failed.
> +				 */
>  };
>  
>  enum mddev_sb_flags {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index bf5cf184a260..7772f5350bf2 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -586,6 +586,16 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
>  
>  	zone = find_zone(mddev->private, &sector);
>  	tmp_dev = map_sector(mddev, zone, sector, &sector);
> +
> +	if (unlikely(!(tmp_dev->bdev->bd_disk->flags & GENHD_FL_UP))) {
> +		if (!test_bit(MD_BROKEN, &mddev->flags))
> +			pr_warn("md: %s: raid0 array has a missing/failed member\n",
> +				mdname(mddev));
> +		set_bit(MD_BROKEN, &mddev->flags);
> +		bio_io_error(bio);
> +		return true;
> +	}
> +
>  	bio_set_dev(bio, tmp_dev->bdev);
>  	bio->bi_iter.bi_sector = sector + zone->dev_start +
>  		tmp_dev->data_offset;
> -- 
> 2.22.0

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



      parent reply	other threads:[~2019-08-30  8:08 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-22 16:13 [PATCH v3 1/2] md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone Guilherme G. Piccoli
2019-08-22 16:13 ` [PATCH v3 2/2] mdadm: Introduce new array state 'broken' for raid0/linear Guilherme G. Piccoli
2019-08-22 21:56   ` Song Liu
2019-08-29 12:51   ` Guilherme G. Piccoli
2019-08-30  8:17   ` NeilBrown
2019-08-30 12:48     ` Guilherme G. Piccoli
2019-08-22 21:55 ` [PATCH v3 1/2] md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone Song Liu
2019-08-23 17:48   ` Guilherme G. Piccoli
2019-08-23 17:51     ` Song Liu
2019-08-30 10:47       ` Wols Lists
2019-08-30 11:25         ` Guilherme Piccoli
2019-09-03 19:53           ` Guilherme G. Piccoli
2019-08-30  8:08 ` NeilBrown [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d0gnf53p.fsf@notabene.neil.brown.name \
    --to=neilb@suse.de \
    --cc=dm-devel@redhat.com \
    --cc=gpiccoli@canonical.com \
    --cc=jay.vosburgh@canonical.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=liu.song.a23@gmail.com \
    --cc=nfbrown@suse.com \
    --cc=songliubraving@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).