linux-raid.vger.kernel.org archive mirror
From: Shaohua Li <shli@kernel.org>
To: NeilBrown <neilb@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>,
	linux-raid@vger.kernel.org, axboe@kernel.dk,
	Shaohua Li <shli@fusionio.com>
Subject: Re: [RFC 1/2] MD: raid5 trim support
Date: Tue, 8 May 2012 18:16:53 +0800
Message-ID: <20120508101653.GA9757@kernel.org>
In-Reply-To: <20120425034307.GA454@kernel.org>

On Wed, Apr 25, 2012 at 11:43:07AM +0800, Shaohua Li wrote:
> On Wed, Apr 18, 2012 at 02:34:04PM +0800, Shaohua Li wrote:
> > On 4/18/12 1:57 PM, NeilBrown wrote:
> > >On Wed, 18 Apr 2012 13:30:45 +0800 Shaohua Li<shli@kernel.org>  wrote:
> > >
> > >>On 4/18/12 12:48 PM, NeilBrown wrote:
> > >>>On Wed, 18 Apr 2012 08:58:14 +0800 Shaohua Li<shli@kernel.org>   wrote:
> > >>>
> > >>>>On 4/18/12 4:26 AM, NeilBrown wrote:
> > >>>>>On Tue, 17 Apr 2012 07:46:03 -0700 Dan Williams<dan.j.williams@intel.com>
> > >>>>>wrote:
> > >>>>>
> > >>>>>>On Tue, Apr 17, 2012 at 1:35 AM, Shaohua Li<shli@kernel.org>    wrote:
> > >>>>>>>Discard for raid4/5/6 has a limitation. If the discard request is small,
> > >>>>>>>we discard on one disk, but we still need to calculate parity and write
> > >>>>>>>the parity disk. To calculate parity correctly, zero_after_discard must
> > >>>>>>>be guaranteed.
> > >>>>>>
> > >>>>>>I'm wondering if we could use the new bad blocks facility to mark
> > >>>>>>discarded ranges so we don't necessarily need determinate data after
> > >>>>>>discard.
> > >>>>>>
> > >>>>>>...but I have not looked into it beyond that.
> > >>>>>>
> > >>>>>>--
> > >>>>>>Dan
> > >>>>>
> > >>>>>No.
> > >>>>>
> > >>>>>The bad blocks framework can only store a limited number of bad ranges - 512
> > >>>>>in the current implementation.
> > >>>>>That would not be an acceptable restriction for discarded ranges.
> > >>>>>
> > >>>>>You would need a bitmap of some sort if you wanted to record discarded
> > >>>>>regions.
> > >>>>>
> > >>>>>http://neil.brown.name/blog/20110216044002#5
> > >>>>
> > >>>>This appears to remove the unnecessary resync of a discarded range after
> > >>>>a crash or discard error, i.e. it is an enhancement. From my understanding,
> > >>>>it can't remove the limitation I mentioned in the patch. For raid5, we
> > >>>>still need to discard a whole stripe (discarding one disk but writing the
> > >>>>parity disk isn't good).
> > >>>
> > >>>It is certainly not ideal, but is it worse than not discarding at all?
> > >>>And would updating some sort of bitmap be just as bad as updating the parity
> > >>>block?
> > >>>
> > >>>How about treating a DISCARD request as a request to write a block full of
> > >>>zeros, then at the lower level treat any request to write a block full of
> > >>>zeros as a DISCARD request.  So when the parity becomes zero, it gets
> > >>>discarded.
> > >>>
> > >>>Certainly it is best if the filesystem would discard whole stripes at a time,
> > >>>and we should be sure to optimise that.  But maybe there is still room to do
> > >>>something useful with small discards?
> > >>
> > >>Sure, it would be great if we could do small discards. But I don't see how
> > >>to do it with the bitmap approach. Take an example: data disk1, data disk2,
> > >>parity disk3. Say we discard some sectors of disk1. The suggested approach
> > >>is to mark the range bad. Then how do we deal with parity disk3? As I said,
> > >>writing parity disk3 isn't good. So do we mark the corresponding range of
> > >>parity disk3 bad too? If we did, and disk2 then broke, how could we restore it?
> > >
> > >Why, exactly, is writing the parity disk not good?
> > >Not discarding blocks that we possibly could discard is also not good.
> > >Which is worse?
> > 
> > Writing the parity disk is worse. Discard exists to improve garbage
> > collection in the SSD firmware, and so improve later write performance. A
> > write is bad for an SSD: extra writes wear it out, and writes also increase
> > garbage collection overhead. So the result of a small discard is that garbage
> > collection on the data disk improves, but the parity disk gets worse and
> > reaches the end of its life faster, which doesn't make sense. This is even
> > worse when the parity is distributed.
> Neil,
> Any comments about the patches?
ping!

Thanks,
Shaohua
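
[Illustration, not part of the original message or the patch.] The parity
argument in the quoted thread can be sketched with a toy two-data-disk stripe.
This is a minimal model, assuming plain XOR parity and a read-modify-write
parity update; the names CHUNK, xor, parity_of and small_discard are
hypothetical and do not correspond to anything in the md code. It shows why a
small discard is only safe if the discarded chunk reads back as zeros: the
parity is updated as if zeros had been written, so any other content makes a
later rebuild of the surviving disk wrong.

```python
from functools import reduce

CHUNK = 8  # bytes per chunk; tiny on purpose, for illustration only


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(chunks):
    """RAID5-style parity: XOR of all data chunks in the stripe."""
    return reduce(xor, chunks)


def small_discard(stripe, parity, idx, after_discard):
    """Discard one data chunk and update parity by read-modify-write.

    The parity update assumes the discarded chunk will read as zeros.
    `after_discard` models what the device actually returns afterwards.
    """
    zeros = bytes(CHUNK)
    new_parity = xor(xor(parity, stripe[idx]), zeros)
    new_stripe = list(stripe)
    new_stripe[idx] = after_discard
    return new_stripe, new_parity


d1 = bytes(range(1, 9))    # data disk1
d2 = bytes(range(11, 19))  # data disk2
p = parity_of([d1, d2])    # parity disk3

# Case 1: the device returns zeros after discard, so parity stays valid.
s_ok, p_ok = small_discard([d1, d2], p, 0, bytes(CHUNK))
rebuilt_d2_ok = xor(s_ok[0], p_ok)

# Case 2: the device returns arbitrary data after discard. The parity no
# longer matches, so rebuilding disk2 from disk1 and parity gives garbage.
s_bad, p_bad = small_discard([d1, d2], p, 0, b"\xff" * CHUNK)
rebuilt_d2_bad = xor(s_bad[0], p_bad)

# A fully zeroed stripe has all-zero parity, which is the case where the
# parity chunk itself could be discarded as well.
zero_stripe_parity = parity_of([bytes(CHUNK), bytes(CHUNK)])
```

In this model the small discard still costs one parity write, which is the
overhead objected to above, whereas discarding a whole stripe leaves an
all-zero parity chunk that needs no write at all.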

Thread overview: 17+ messages
2012-04-17  8:35 [RFC 0/2] raid5 trim support Shaohua Li
2012-04-17  8:35 ` [RFC 1/2] MD: " Shaohua Li
2012-04-17 14:46   ` Dan Williams
2012-04-17 15:07     ` Shaohua Li
2012-04-17 18:16       ` Dan Williams
2012-04-17 20:26     ` NeilBrown
2012-04-18  0:58       ` Shaohua Li
2012-04-18  4:48         ` NeilBrown
2012-04-18  5:30           ` Shaohua Li
2012-04-18  5:57             ` NeilBrown
2012-04-18  6:34               ` Shaohua Li
2012-04-25  3:43                 ` Shaohua Li
2012-05-08 10:16                   ` Shaohua Li [this message]
2012-05-08 15:52                     ` Dan Williams
2012-05-09  3:12                       ` Shaohua Li
2012-05-08 20:17                     ` NeilBrown
2012-04-17  8:35 ` [RFC 2/2] MD: raid5 avoid unnecessary zero page for trim Shaohua Li
