From: NeilBrown <neilb@suse.de>
To: majianpeng <majianpeng@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: [RFC PATCH V1] raid1: rewrite the iobarrier
Date: Mon, 4 Mar 2013 14:18:07 +1100
Message-ID: <20130304141807.083e33a2@notabene.brown>
In-Reply-To: <2013022016540861404722@gmail.com>
On Wed, 20 Feb 2013 16:54:12 +0800 majianpeng <majianpeng@gmail.com> wrote:
> >On Thu, 24 Jan 2013 14:02:52 +0800 majianpeng <majianpeng@gmail.com> wrote:
> >
> >> There is an iobarrier in raid1 for two reasons: resync/recovery, and reconfiguring the array.
> >> At present it suspends all normal IO during resync/recovery.
> >> But if the normal IO is outside the resync/recovery window, it does not need the iobarrier.
> >> So I rewrote the iobarrier.
> >> Because there are two reasons for the barrier, I use two different methods.
> >> First, for resync/recovery, there is a resync window. Its end position is 'next_resync'. Because the resync depth is RESYNC_DEPTH (32),
> >> the start is 'next_resync - RESYNC_SECTORS * RESYNC_DEPTH'.
> >> Normal IO is divided into three categories by location:
> >> a: before the start of the resync window
> >> b: within the resync window
> >> c: after the end of the resync window
> >> For a, no barrier is needed.
> >> For b, a barrier is needed, using the original method.
> >> For c, no barrier is needed, but the minimum position must be recorded. If the next resync position would pass this minimum, the resync suspends; otherwise it proceeds.
> >> I used an rbtree to order those IOs.
> >>
> >> For reconfiguring the array, I propose the concept of a "force_barrier". When there is a force_barrier, all normal IO must be suspended.
> >>
> >> NOTE:
> >> This problem also exists for raid10, but I have only done it for raid1. It is posted mainly to make sure it is
> >> going in the correct direction, and in the hope of getting some helpful comments from other guys.
> >> If the method is accepted, I will send the patch for raid10.
> >>
> >> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
> >
> >Hi,
> > thanks for this and sorry for the delay in replying.
> >
> Hi, sorry for the delay in replying. Thanks very much for your suggestions.
> >The patch is reasonably good, but there is room for improvement.
> >I would break it up into several patches which are easier to review.
> >
> >- Firstly, don't worry about the barrier for 'read' requests - it really
> > isn't relevant for them (your patch didn't do this).
> >
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index bd6a188..2e5bf75 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -235,6 +235,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
> struct bio *bio = r1_bio->master_bio;
> int done;
> struct r1conf *conf = r1_bio->mddev->private;
> + int rw = bio_data_dir(bio);
>
> if (bio->bi_phys_segments) {
> unsigned long flags;
> @@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
> * Wake up any possible resync thread that waits for the device
> * to go idle.
> */
> - allow_barrier(conf);
> + if (rw == WRITE)
> + allow_barrier(conf);
> }
> }
>
> @@ -1035,7 +1037,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
> finish_wait(&conf->wait_barrier, &w);
> }
>
> - wait_barrier(conf);
> + if (rw == WRITE)
> + wait_barrier(conf);
>
> bitmap = mddev->bitmap;
>
> The above code is what you suggested. But when it hits a read error, raid1d will be blocked forever.
> The reason is this condition in freeze_array:
> > wait_event_lock_irq_cmd(conf->wait_barrier,
> > conf->nr_pending == conf->nr_queued+1,
> For read operations nr_pending is no longer incremented, so that condition can never become true.
> Do you have a good method to handle this?
Only update nr_queued for Write requests, not for read requests?
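i.e. something like this untested sketch of the body of reschedule_retry()
(the matching "+1" in freeze_array() would need the same audit, since it
assumes the failed request itself is counted in nr_pending):

	spin_lock_irqsave(&conf->device_lock, flags);
	list_add(&r1_bio->retry_list, &conf->retry_list);
	/* reads no longer hold the barrier, so don't count them here */
	if (bio_data_dir(r1_bio->master_bio) == WRITE)
		conf->nr_queued++;
	spin_unlock_irqrestore(&conf->device_lock, flags);

	wake_up(&conf->wait_barrier);
	md_wakeup_thread(mddev->thread);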
>
> >- Secondly, find those places where raise_barrier() is used to reconfigure
> > the array and replace those with "freeze_array()". I think that is safe,
> > and it means that we don't need to pass a 'force' argument to
> > 'raise_barrier()' and 'lower_barrier()'.
> But it will be blocked forever.
> The comment on freeze_array says:
> > * This is called in the context of one normal IO request
> > * that has failed.
> In freeze_array, the condition is
> >wait_event_lock_irq_cmd(conf->wait_barrier,
> > conf->nr_pending == conf->nr_queued+1,
> Because the places that call this function are in IO context, there is always a pending request counted in nr_pending.
> But the places that reconfigure the array are not in IO context, so the condition
> "conf->nr_pending == conf->nr_queued+1" is never true.
> If we add a parameter 'int iocontext' to freeze_array, that is
> >static void freeze_array(struct r1conf *conf, int iocontext)
> >{
> >	if (iocontext)
> >		wait_event_lock_irq_cmd(conf->wait_barrier,
> >			conf->nr_pending == conf->nr_queued+1, ...);
> >	else
> >		wait_event_lock_irq_cmd(conf->wait_barrier,
> >			conf->nr_pending == conf->nr_queued, ...);
> >}
>
> How about this method?
Probably makes sense - hard to tell without the full context of a complete
patch.
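As a rough untested sketch of that (keeping your 'iocontext' name, though
a more general form might take a count of the caller's own in-flight
requests):

	static void freeze_array(struct r1conf *conf, int iocontext)
	{
		spin_lock_irq(&conf->resync_lock);
		conf->barrier++;
		conf->nr_waiting++;
		wait_event_lock_irq_cmd(conf->wait_barrier,
				conf->nr_pending == conf->nr_queued +
					(iocontext ? 1 : 0),
				conf->resync_lock,
				flush_pending_writes(conf));
		spin_unlock_irq(&conf->resync_lock);
	}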
> >
> >- The rest might have to be all one patch, though if it could be done in a
> > couple of distinct stages that would be good.
> > For the different positions (before, during, after), which you currently
> > call 0, 1, and 2, you should use an enum so they all have names.
> >
> Ok, thanks!
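For example (the names are only illustrative):

	enum r1bio_window_pos {
		POS_BEFORE_WINDOW,	/* a: no barrier needed */
		POS_IN_WINDOW,		/* b: must wait on the barrier */
		POS_AFTER_WINDOW,	/* c: no barrier, record minimum */
	};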
If you could put blank lines around the text which is your reply, it would
make it a lot easier to find and to read.
> > I don't really like the rbtree. It adds an extra place where you take the
> > spinlock and causes more work to be done inside a spinlock. And it isn't
> > really needed. Instead, you can have two R1BIO flags for "AFTER_RESYNC".
> > One for requests that are more than 2 windows after, one for requests that
> > are less than 2 windows after (or maybe 3 windows or maybe 8..). Each of
> > these has a corresponding count (as you already have: nr_after).
> > Then when resync gets to the end of a window you wait until the count of
> For the resync operation, there is no resync window. How to do this?
You aren't explaining yourself very well (again!).
> > "less than 2 windows after" reaches zero, then atomically swap the meaning
> > of the two bits (toggle a bit somewhere).
> We don't know the nearest position among the requests which are more than 2 windows after.
> For example, suppose there are three requests beyond 2 windows:
> A is after three windows; B is after six windows; C is after 11 windows.
> When the count of "less than 2 windows after" reaches zero, what should be done?
> > This should give you nearly the same effect with constant (not log-n)
> > effort.
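To sketch what I mean (untested, with made-up names): when the "near"
count drains we never need to know the exact positions of the "far"
requests - they are simply relabelled "near" as a group:

	/* in r1conf (made-up names): */
	int	nr_after[2];	/* counts for the two AFTER_RESYNC flags */
	int	near;		/* index that currently means "< 2 windows" */

	/* when a write lands beyond the resync window: */
	int idx = within_two_windows(conf, bio) ? conf->near
						: 1 - conf->near;
	conf->nr_after[idx]++;
	set_bit(idx ? R1BIO_AfterB : R1BIO_AfterA, &r1_bio->state);
	/* completion decrements nr_after[] for whichever bit is set */

	/* when resync reaches the end of a window: */
	wait_event_lock_irq(conf->wait_barrier,
			    conf->nr_after[conf->near] == 0,
			    conf->resync_lock);
	conf->near = 1 - conf->near;	/* "far" now counts as "near" */

So in your A/B/C example all three just sit in the "far" count; once the
old "near" count reaches zero they become the new "near" count, and resync
advances one window at a time, waiting on that count as needed.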
> >
>
> >Finally, have you done any testing, either to ensure there is no slow-down,
> >or to demonstrate some improvement?
> >
> I only used dd to test. There was no apparent performance degradation.
There is no point reporting any results of a performance test without
actually giving numbers.
NeilBrown
>
> Thanks!
> Jianpeng Ma