Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: NeilBrown <neilb@suse.de>
To: Jianpeng Ma <majianpeng@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time
Date: Thu, 20 Sep 2012 12:51:44 +1000	[thread overview]
Message-ID: <20120920125144.0ed69ec3@notabene.brown> (raw)
In-Reply-To: <2012091510203206229010@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3705 bytes --]

On Sat, 15 Sep 2012 10:20:35 +0800 "Jianpeng Ma" <majianpeng@gmail.com> wrote:

> In func 'ops_run_bio' if you read the dev which the last reading
> of this dev didn't return,it will destrory the  req/rreq'source of rdev.
> It may call hung-task.
> For example, for badsector or other reasons, read-operation only used
> stripe instead of chunk_aligned_read.
> First:stripe 0;second: stripe 8;third:stripe 16.At the block-layer,three
> bios merged.
> Because media error of sector from 0 to 7, the request retried.
> At this time, raid5d readed stripe0 again.But it will set 'bio->next =
> NULL'.So the stripe 8 and 16 didn't return.
> 
> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>

Hi,
 I'm really trying, but I cannot understand what you are saying.

I think the situation that you are describing involves a 24 sector request.
This is attached to 3 stripe_heads - 8 sectors each - at address 0, 8, 16.

So 'toread' on the first device of each stripe points to this bio, and
bi_next is NULL.

The "req" bio for each device is filled out to read one page and these three
'req' bios are submitted.  The block layer merges these into a single request.

This request reports an error because there is a read error somewhere in the
first  8 sectors.

So one, or maybe all, of the 'req' bios return with an error?
Then RAID5 marks that device as being bad and reads from the other devices to
recover the data.

Where exactly does something go wrong?  which bio gets bi_next set to NULL
when it shouldn't?

If you could explain with lots of detail it would help a lot.

Thanks,
NeilBrown


> ---
>  drivers/md/raid5.c |   13 ++++++++++---
>  drivers/md/raid5.h |    1 +
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index adda94d..e52ba1b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -547,9 +547,14 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>  				rw = WRITE_FUA;
>  			else
>  				rw = WRITE;
> -		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
> -			rw = READ;
> -		else if (test_and_clear_bit(R5_WantReplace,
> +		} else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags)) {
> +			/*The last read did not return,so skip this read*/
> +			if (test_and_set_bit(R5_Reading, &sh->dev[i].flags)) {
> +				clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +				continue;
> +			} else
> +				rw = READ;
> +		} else if (test_and_clear_bit(R5_WantReplace,
>  					    &sh->dev[i].flags)) {
>  			rw = WRITE;
>  			replace_only = 1;
> @@ -700,6 +705,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
>  			pr_debug("skip op %ld on disc %d for sector %llu\n",
>  				bi->bi_rw, i, (unsigned long long)sh->sector);
>  			clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +			clear_bit(R5_Reading, &sh->dev[i].flags);
>  			set_bit(STRIPE_HANDLE, &sh->state);
>  		}
>  	}
> @@ -1818,6 +1824,7 @@ static void raid5_end_read_request(struct bio * bi, int error)
>  	}
>  	rdev_dec_pending(rdev, conf->mddev);
>  	clear_bit(R5_LOCKED, &sh->dev[i].flags);
> +	clear_bit(R5_Reading, &sh->dev[i].flags);
>  	set_bit(STRIPE_HANDLE, &sh->state);
>  	release_stripe(sh);
>  }
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index a9fc249..a596df8 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -298,6 +298,7 @@ enum r5dev_flags {
>  	R5_WantReplace, /* We need to update the replacement, we have read
>  			 * data in, and now is a good time to write it out.
>  			 */
> +	R5_Reading,	/* this dev on reading on lld*/
>  };
>  
>  /*


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

next prev parent reply	other threads:[~2012-09-20  2:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-15  2:20 [PATCH] raid5: Avoid doing more read on dev of a stripe at the same time Jianpeng Ma
2012-09-20  2:51 ` NeilBrown [this message]
2012-09-20  3:04   ` Jianpeng Ma
2012-09-20  3:24     ` NeilBrown
2012-09-20  6:04       ` Jianpeng Ma
2012-09-21  2:24       ` Jianpeng Ma
2012-09-25  7:29         ` NeilBrown
2012-09-26  2:43           ` NeilBrown
2012-09-26  4:09             ` Jianpeng Ma
2012-09-26  9:14             ` Jianpeng Ma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120920125144.0ed69ec3@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=majianpeng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).