linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Shaohua Li <shli@kernel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: [RFC 1/2]raid1: only write mismatch sectors in sync
Date: Tue, 11 Sep 2012 10:59:08 +1000	[thread overview]
Message-ID: <20120911105908.51681433@notabene.brown> (raw)
In-Reply-To: <CANejiEWZk6JcErw9Y6cqpotYMypQA_jvv8N0612FBq81Lo8jVQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4865 bytes --]

On Tue, 31 Jul 2012 16:12:04 +0800 Shaohua Li <shli@kernel.org> wrote:

> 2012/7/31 NeilBrown <neilb@suse.de>:
> > On Thu, 26 Jul 2012 16:01:50 +0800 Shaohua Li <shli@kernel.org> wrote:
> >
> >> Write has some impacts to SSD:
> >> 1. wear out flash. Frequent write can speed out the progress.
> >> 2. increase the burden of garbage collection of SSD firmware. If no space
> >> left for write, SSD firmware garbage collection will try to free some space.
> >> 3. slow down subsequent write. After write SSD to some extents (for example,
> >> write the whole disk), subsequent write will slow down significantly (because
> >> almost every write invokes garbage collection in such case).
> >>
> >> We want to avoid unnecessary write as more as possible. raid sync generally
> >> involves a lot of unnecessary write. For example, even two disks don't have
> >> any data, we write the second disk for the whole disk size.
> >>
> >> To reduce write, we always compare raid disk data and only write mismatch part.
> >> This means sync will have extra IO read and memory compare. So this scheme is
> >> very bad for hard disk raid and sometimes SSD raid too if mismatch part is
> >> majority. But sometimes this can be very helpful to reduce write, in that case,
> >> since sync is rare operation, the extra IO/CPU usage is worthy paying. People
> >> who want to use the feature should understand the risk first. So this ability
> >> is off by default, a sysfs entry can be used to enable it.
> >>
> >> Signed-off-by: Shaohua Li <shli@fusionio.com>
> >> ---
> >>  drivers/md/md.c    |   41 +++++++++++++++++++++++++++++++
> >>  drivers/md/md.h    |    3 ++
> >>  drivers/md/raid1.c |   70 +++++++++++++++++++++++++++++++++++++++++++++++++----
> >>  3 files changed, 110 insertions(+), 4 deletions(-)
> >>
> >> Index: linux/drivers/md/md.h
> >> ===================================================================
> >> --- linux.orig/drivers/md/md.h        2012-07-25 13:51:00.353775521 +0800
> >> +++ linux/drivers/md/md.h     2012-07-26 10:36:38.500740552 +0800
> >> @@ -325,6 +325,9 @@ struct mddev {
> >>  #define      MD_RECOVERY_FROZEN      9
> >>
> >>       unsigned long                   recovery;
> >> +#define MD_RECOVERY_MODE_REPAIR              0
> >> +#define MD_RECOVERY_MODE_DISCARD     1
> >> +     unsigned long                   recovery_mode;
> >
> > You have not documented the meaning of these two flags at all, and I don't
> > feel up to guessing.
> >
> > The patch looks more complex that it should be.  The behaviour you are
> > suggesting is exactly the behaviour you get when MD_RECOVERY_REQUESTED is
> > set, so at most I expect to see a few places where that flag is tested
> > changed to test something else as well.
> >
> > How would this be used?  It affects resync and resync happens as soon as the
> > array is assembled.  So when and how would you set the flag which says
> > "prefer reads to writes"?  It seems like it needs to be set in the metadata.
> >
> > BTW RAID10 already does this - it reads and compares for a normal sync.  So
> > maybe just tell people to use raid10 if they want this behaviour?
> 
> It's true, the behavior likes MD_RECOVERY_REQUESTED, ie, read all disks,
> only do write if two disk data not match. But I hope it works as soon as the
> array is assembled. Surely we can set in the metadata, but I didn't want to
> enable it by default, since even with SSD, the read-compare-write isn't optimal
> some times, because it involves extra IO read and memory compare.
> 
> It appears MD_RECOVERY_REQUESTED assumes disks are insync.
> It doesn't read disk !insync, so it doesn't work for assemble.
> 
> In my mind, user frozen the sync first (with sync_action setting), and then
> enable read-compare-write, and finally continue the sync. I can't stop
> a recovery and set MD_RECOVERY_REQUESTED, so I added a flag.
> Anyway, please suggest what the preferred way for this.

I guess I just don't find this functionality at all convincing.  It isn't
clear when anyone would use it, or how they would use it.  It seems best not
to provide it.

> 
> The second patch is for a special case. two disks data don't match, but
> disk1 data are all 0. I'd like to do trim for disk2, instead of write
> 0 to disk2,
> this can be helpful for SSD.

If this is a useful optimisation, then I expect it should be implemented
somewhat lower in the stack, so that every attempt to write a zero block
becomes a 'trim'.  Ideally the SSD itself should  handle this optimisation.

I really don't think it is appropriate for md to be detecting zeros and
writing them as 'trim'.

NeilBrown



> 
> Good to know raid10 already does read-compare-write. From the comments,
> looks it only supports resync not recovery.
> 
> Thanks,
> Shaohua


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2012-09-11  0:59 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-26  8:01 [RFC 1/2]raid1: only write mismatch sectors in sync Shaohua Li
2012-07-27 16:01 ` Jan Ceuleers
2012-07-30  0:39   ` Shaohua Li
2012-07-30  1:07     ` Roberto Spadim
2012-07-31  5:53 ` NeilBrown
2012-07-31  8:12   ` Shaohua Li
2012-09-11  0:59     ` NeilBrown [this message]
2012-09-12  5:29       ` Shaohua Li
2012-09-18  4:57         ` NeilBrown
2012-09-19  5:51           ` Shaohua Li
2012-09-19  7:16             ` NeilBrown
2012-09-20  1:56               ` Shaohua Li
2012-10-17  5:11                 ` Shaohua Li
2012-10-17 22:56                   ` NeilBrown
2012-10-18  1:17                     ` Shaohua Li
2012-10-18  1:29                       ` NeilBrown
2012-10-18  2:01                         ` Shaohua Li
2012-10-18  2:36                           ` NeilBrown
2012-10-21 17:14                             ` Michael Tokarev
2012-10-31  3:25                             ` Shaohua Li
2012-10-31  5:43                               ` NeilBrown
2012-10-31  6:05                                 ` Shaohua Li
2012-10-18  1:30                       ` kedacomkernel
2012-11-20 17:00                     ` Joseph Glanville

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120911105908.51681433@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).