From: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
To: Adam Borowski <kilobyte@angband.pl>
Cc: Qu Wenruo <quwenruo@cn.fujitsu.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID system with adaption to changed number of disks
Date: Wed, 12 Oct 2016 17:10:18 -0400
Message-ID: <20161012211017.GJ26140@hungrycats.org>
In-Reply-To: <20161012195528.GB4800@angband.pl>

On Wed, Oct 12, 2016 at 09:55:28PM +0200, Adam Borowski wrote:
> On Wed, Oct 12, 2016 at 01:19:37PM -0400, Zygo Blaxell wrote:
> > On Wed, Oct 12, 2016 at 01:48:58PM +0800, Qu Wenruo wrote:
> > > In fact, the _concept_ to solve such RMW behavior is quite simple:
> > > 
> > > Make sector size equal to stripe length. (Or vice versa if you like)
> > > 
> > > Although the implementation will be more complex, people like Chandan are
> > > already working on sub page size sector size support.
> > 
> > So...metadata blocks would be 256K in the 5-disk RAID5 example above,
> > and any file smaller than 256K would be stored inline?  Ouch.  That would
> > also imply the compressed extent size limit (currently 128K) has to become
> > much larger.
> > 
> > I had been thinking that we could inject "plug" extents to fill up
> > RAID5 stripes.  This lets us keep the 4K block size for allocations,
> > but at commit (or delalloc) time we would fill up any gaps in new RAID
> > stripes to prevent them from being modified.  As the real data is deleted
> > from the RAID stripes, it would be replaced by "plug" extents to keep any
> > new data from being allocated in the stripe.  When the stripe consists
> > entirely of "plug" extents, the plug extent would be deleted, allowing
> > the stripe to be allocated again.  The "plug" data would be zero for
> > the purposes of parity reconstruction, regardless of what's on the disk.
> > Balance would just throw the plug extents away (no need to relocate them).
> 
> Your idea sounds good, but there's one problem: most real users don't
> balance.  Ever.  Contrary to the tribal wisdom here, this actually works
> fine, unless you have a pathological load skewed toward either data or
> metadata on the first write, then fill the disk to near-capacity with a
> load skewed the other way.
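
To make the plug-extent idea quoted above concrete, here is a minimal Python model of one RAID5 full stripe. It assumes 4 KiB blocks and a 64 KiB per-device stripe element, so the 5-disk example holds 4 x 64 KiB = 256 KiB of data per stripe, matching the 256K figure quoted above. The names `Stripe`, `Slot`, `write`, `commit` and `delete` are hypothetical and do not correspond to btrfs code; this is a sketch of the described lifecycle, not an implementation.

```python
from enum import Enum

BLOCK_SIZE = 4 * 1024        # assumed 4 KiB allocation unit
STRIPE_ELEMENT = 64 * 1024   # assumed 64 KiB per-device stripe element


def full_stripe_data_bytes(num_disks: int, parity_disks: int = 1) -> int:
    """Data capacity of one full stripe, parity excluded (RAID5: 1 parity disk)."""
    return (num_disks - parity_disks) * STRIPE_ELEMENT


class Slot(Enum):
    FREE = 0   # not written since the stripe was (re)released
    DATA = 1   # live file or metadata block
    PLUG = 2   # placeholder; treated as zero for parity reconstruction


class Stripe:
    """Hypothetical model of one full stripe split into BLOCK_SIZE slots."""

    def __init__(self, num_disks: int):
        nslots = full_stripe_data_bytes(num_disks) // BLOCK_SIZE
        self.slots = [Slot.FREE] * nslots

    def write(self, nblocks: int) -> list:
        """Place new data in FREE slots only; return the slot indices used."""
        free = [i for i, s in enumerate(self.slots) if s is Slot.FREE]
        if len(free) < nblocks:
            raise ValueError("stripe has no room; allocate a new stripe instead")
        for i in free[:nblocks]:
            self.slots[i] = Slot.DATA
        return free[:nblocks]

    def commit(self) -> None:
        """At commit, plug every remaining gap so the stripe is never rewritten."""
        self.slots = [Slot.PLUG if s is Slot.FREE else s for s in self.slots]

    def delete(self, index: int) -> None:
        """Deleted data becomes a plug; an all-plug stripe is released for reuse."""
        if self.slots[index] is not Slot.DATA:
            raise ValueError("slot does not hold live data")
        self.slots[index] = Slot.PLUG
        if all(s is Slot.PLUG for s in self.slots):
            self.slots = [Slot.FREE] * len(self.slots)
```

For the 5-disk case, `Stripe(5)` has 64 slots; writing 3 blocks and committing leaves 61 plugs, and the stripe only becomes allocatable again once all three data blocks have been deleted.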

> Most usage patterns produce a mix of transient and persistent data (and at
> write time you don't know which file is which), meaning that over time
> every stripe will end up holding a smidge of cold data, with the rest
> filled by plug extents.

Yes, it'll certainly reduce storage efficiency.  I think all the
RMW-avoidance strategies have this problem.  The alternative is to risk
losing data or the entire filesystem on disk failure, so any of the
RMW-avoidance strategies is probably a worthwhile tradeoff.  Big RAID5/6
arrays tend to be used mostly for storing large, sequentially accessed
files, which are less susceptible to this kind of problem.
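
The efficiency cost can be put in numbers with a small helper. This is a sketch that assumes the 64-slot stripe (256 KiB of 4 KiB blocks) from the 5-disk example and treats any stripe holding at least one live block as pinned; `plug_efficiency` is a hypothetical name.

```python
def plug_efficiency(live_blocks_per_stripe, blocks_per_stripe=64):
    """Fraction of pinned stripe capacity that holds live data.

    Stripes with zero live blocks are assumed to have been released
    (all their plugs deleted), so they do not count against efficiency.
    """
    pinned = [n for n in live_blocks_per_stripe if n > 0]
    if not pinned:
        return 1.0  # nothing pinned, nothing wasted
    return sum(pinned) / (len(pinned) * blocks_per_stripe)
```

In the pattern quoted above, where every stripe keeps a single cold 4 KiB block, `plug_efficiency([1] * 10)` returns 1/64, so only about 1.6% of the pinned space holds data.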

If the workload is lots of small random writes, then performance on RAID5
will be terrible anyway (though it might actually improve with plug
extents, since RMW stripe updates would be replaced with pure CoW).
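
Why pure CoW can beat RMW comes down to RAID5 parity arithmetic. The sketch below uses plain XOR parity (it is not btrfs's raid56 code) to show that a partial-stripe update has to read the old data block and the old parity before it can write, while a full-stripe write computes parity from the new data alone. Plug slots are modeled as zero blocks, which contribute nothing to the XOR. The function names are illustrative.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5 parity)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def full_stripe_parity(data_blocks) -> bytes:
    """Full-stripe (CoW) write: parity comes from the new data only, no reads."""
    return xor_blocks(*data_blocks)


def rmw_parity(old_parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Partial-stripe update: old_block and old_parity must be read from disk first."""
    return xor_blocks(old_parity, old_block, new_block)
```

Both paths end with the same parity; for example, updating one block of a stripe `[d0, d1, plug]` with `rmw_parity(p, d1, new)` gives the same bytes as `full_stripe_parity([d0, new, plug])`. The difference is that the RMW path needed `d1` and `p` from disk first.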

> Thus, while the plug extents idea doesn't suffer from the problems of big
> sectors you just mentioned, we'd need some kind of auto-balance.

Another way to approach the problem is to relocate the blocks in
partially filled RMW stripes so that those stripes effectively become CoW
stripes; however, the requirement to relocate entire extents leads to
some nasty write amplification and performance ramifications.  Balance is
a hugely heavy I/O load, and there are good reasons not to incur it at
unexpected times.
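
The write-amplification point can be illustrated by counting what it costs to empty one stripe when only whole extents can be moved. This sketch assumes extents are given as `(start, length)` byte ranges in the stripe's address space and that every extent overlapping the stripe must be relocated in full; `relocation_cost` is a hypothetical helper, not part of btrfs or btrfs-progs.

```python
KiB = 1024


def relocation_cost(extents, stripe_start, stripe_len):
    """Return (live_bytes_in_stripe, bytes_rewritten) for emptying one stripe.

    Assumes each extent overlapping the stripe must be relocated whole,
    even if only a few blocks of it actually sit inside the stripe.
    """
    stripe_end = stripe_start + stripe_len
    live = rewritten = 0
    for start, length in extents:
        overlap = min(start + length, stripe_end) - max(start, stripe_start)
        if overlap > 0:
            live += overlap       # bytes we actually need out of this stripe
            rewritten += length   # bytes we must copy to get them out
    return live, rewritten
```

With an 8 KiB extent at the start of a 256 KiB stripe and a 1 MiB extent whose first 4 KiB falls in the stripe's last block, `relocation_cost([(0, 8 * KiB), (252 * KiB, 1024 * KiB)], 0, 256 * KiB)` returns `(12 * KiB, 1032 * KiB)`: freeing 12 KiB of stripe space costs 1032 KiB of writes, an 86x amplification.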


> 
> -- 
> A MAP07 (Dead Simple) raspberry tincture recipe: 0.5l 95% alcohol, 1kg
> raspberries, 0.4kg sugar; put into a big jar for 1 month.  Filter out and
> throw away the fruits (can dump them into a cake, etc), let the drink age
> at least 3-6 months.
> 

