Re: Re: [RFC PATCH V1] raid1: rewrite the iobarrier

All of lore.kernel.org
 help / color / mirror / Atom feed

From: majianpeng <majianpeng@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Re: [RFC PATCH V1] raid1: rewrite the iobarrier
Date: Wed, 20 Feb 2013 16:54:12 +0800	[thread overview]
Message-ID: <2013022016540861404722@gmail.com> (raw)
In-Reply-To: 20130206131653.27d025a6@notabene.brown

>On Thu, 24 Jan 2013 14:02:52 +0800 majianpeng <majianpeng@gmail.com> wrote:
>
>> There is iobarrier in raid1 because two reason resync/recovery  or reconfigure the array.
>> At present,it suspend all nornal IO when reysync/recovey.
>> But if nornal IO is outrange the resync/recovery windwos,it  don't need to iobarrier.
>> So I rewrite the iobarrier.
>> Because the reasons of barrier are two,so i use two different methods.
>> First for resync/recovery, there is a reysnc window.The end position is 'next_resync'.Because the resync depth is  RESYNC_DEPTH(32),
>> so the start is 'next_resync - RESYNC_SECTOR * RESYNC_DEPTH'
>> The nornal IO Will be divided into three categories by the location.
>> a: before the start of resync window
>> b: between the resync window
>> c: after the end of resync window
>> For a, it don't barrier.
>> For b, it need barrier and used the original method
>> For c, it don't barrier but it need record the minimum position.If next resync is larger this, resync action will suspend.Otherwise versa.
>> I used rbtree to order those io.
>> 
>> For the reason of reconfigure of the arrary,I proposed a concept "force_barrier".When there is force_barrier, all Nornam IO must be suspended.
>> 
>> NOTE:
>> Because this problem is also for raid10, but i only do it for raid1. It is post out mainly to make sure it is 
>> going in the correct direction and hope to get some helpful comments from other guys.
>> If the methods is accepted,i will send the patch for raid10.
>> 
>> Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
>
>Hi,
> thanks for this and sorry for the delay in replying.
>
Hi, sorroy for delay in replying.Thanks very much for your suggestion.
>The patch is reasonably good, but there is room for improvement.
>I would break it up into several patches which are easier to review.
>
>- Firstly, don't worry about the barrier for 'read' requests - it really
>  isn't relevant for them (your patch didn't do this).
>
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index bd6a188..2e5bf75 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -235,6 +235,7 @@ static void call_bio_endio(struct r1bio *r1_bio)
        struct bio *bio = r1_bio->master_bio;
        int done;
        struct r1conf *conf = r1_bio->mddev->private;
+       int rw = bio_data_dir(bio);
 
        if (bio->bi_phys_segments) {
                unsigned long flags;
@@ -253,7 +254,8 @@ static void call_bio_endio(struct r1bio *r1_bio)
                 * Wake up any possible resync thread that waits for the device
                 * to go idle.
                 */
-               allow_barrier(conf);
+               if (rw == WRITE)
+                       allow_barrier(conf);
        }
 }
 
@@ -1035,7 +1037,8 @@ static void make_request(struct mddev *mddev, struct bio * bio)
                finish_wait(&conf->wait_barrier, &w);
        }
 
-       wait_barrier(conf);
+       if (rw == WRITE)
+               wait_barrier(conf);
 
        bitmap = mddev->bitmap;\x02

Above code is what's you said.But it met read-error,raid1d will blocked for ever.
The reason is freeze_array:
>       wait_event_lock_irq_cmd(conf->wait_barrier,
>                                conf->nr_pending == conf->nr_queued+1,
For read operation, it can't add nr_pending.
Are you have good method to do this?

>- Secondly, find those places where raise_barrier() is used to reconfigure
>  the array and replace those with "freeze_array()".  I think that is safe,
>  and it means that we don't need to pass a 'force' argument to
>  'raise_barrier()' and 'lower_barrier()'.
But it will be blocked for ever.
The comment of freeze_array,
> * This is called in the context of one normal IO request
>	 * that has failed.
In freeze_array,the judgement is
>wait_event_lock_irq_cmd(conf->wait_barrier,
>				conf->nr_pending == conf->nr_queued+1,
Because the place where call this func in io context,so there must be nr_pending.
But for the flace where called reconfigure array don't  in io context.So the condition
"conf->nr_pending == conf->nr_queued+1" is never true.
If we add a parameter 'int iocontext' to freeze_array, that is
>static void freeze_array(struct r1conf *conf, int iocontext)
>if (iocontext)
>wait_event_lock_irq_cmd(conf->wait_barrier,
>				conf->nr_pending == conf->nr_queued+1,
>  else
> wait_event_lock_irq_cmd(conf->wait_barrier,
>				conf->nr_pending == conf->nr_queued,

How about this method?
>
>- The rest might have to be all one patch, though if it could be done in a
>  couple of distinct stages that would be good.
>  For the different positions (before, during, after), which you currently
>  call 0, 1, and 2 you should use an enum so they all have names.
> 
Ok,thanks!
>  I don't really like the rbtree.  It adds an extra place where you take the
>  spinlock and causes more work to be done inside a spinlock.  And it isn't
>  really needed.  Instead, you can have two R1BIO flags for "AFTER_RESYNC".
>  One for requests that are more than 2 windows after, one for request that
>  are less then 2 windows after (or maybe 3 windows or maybe 8..).  Each of
>  these has a corresponding count (as you already have: nr_after).
>  Then when resync gets to the end of a window you wait until the count of 
For resync operation,there is no resync window.How to do this?
>  "less than 2 windows after" reaches zero, then atomically swap the meaning
>  of the two bits (toggle a bit somewhere).
We don't know the nearset position from request which more than 2 windows after.
For example, there are three request after 2 windows.
A is after three windows;B is after six windows; C is after 11 windows.
When the count of "less than 2 windows after" reached zero, how to do?
>  This should give you nearly the same effect with constant (not log-n)
>  effort.
>

>Finally, have you done any testing, either to ensure there is no slow-down,
>or to demonstrate some improvement?
>
I only used dd to test.There was no apparent performance degradation 

Thanks!
Jianpeng Ma

next prev parent reply	other threads:[~2013-02-20  8:54 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-24  6:02 [RFC PATCH V1] raid1: rewrite the iobarrier majianpeng
2013-02-06  2:16 ` NeilBrown
2013-02-20  8:54   ` majianpeng [this message]
2013-03-04  3:18     ` NeilBrown

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:bd6a188 dfblob:2e5bf75 )
 OR (
bs:"Re: Re: [RFC PATCH V1] raid1: rewrite the iobarrier" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2013022016540861404722@gmail.com \
    --to=majianpeng@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.