Re: [PATCH 0/5] [RFC] RAID-level terminology change

Linux Btrfs filesystem development

From: Roger Binns <rogerb@rogerbinns.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 0/5] [RFC] RAID-level terminology change
Date: Sun, 10 Mar 2013 17:21:28 -0700
Message-ID: <khj826$386$1@ger.gmane.org>
In-Reply-To: <20130310220422.GF30771@carfax.org.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/03/13 15:04, Hugo Mills wrote:
> On Sat, Mar 09, 2013 at 09:41:50PM -0800, Roger Binns wrote:
>> The only constraints that matter are surviving N device failures, and
>> data not lost if at least N devices are still present.  Under the
>> hood the best way of meeting those can be heuristically determined,
>> and I'd expect things like overhead to dynamically adjust as storage
>> fills up or empties.
> 
> That's really not going to work happily -- you'd have to run the 
> restriper in the background automatically as the device fills up.

Which is the better approach: the administrator sitting there adjusting
various parameters after doing some difficult calculations, and redoing
them as data and devices increase or decrease, or a computer with billions
of bytes of memory and billions of CPU cycles per second just figuring it
out based on experience? :-)

> Given that this is going to end up rewriting *all* of the data on the 
> FS,

Why does all data have to be rewritten?  Why does every piece of data have
to have exactly the same storage parameters in terms of
non-redundancy/performance/striping options?

I can easily imagine the final implementation being informed by hot data
tracking.  There is absolutely no need for data that is rarely read to use
the maximum striping/performance/overhead options.

There is no need to rewrite everything anyway: if a filesystem with 1GB
of data is heading towards 2GB of data, then only enough readjustments
need to be made to release that additional 1GB of overhead.
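To illustrate why only part of the data needs restriping, here is a
hypothetical back-of-the-envelope calculation in Python.  The cost model
is an assumption for illustration only: a mirrored byte costs `copies` raw
bytes, a byte in a single-parity layout over `parity_devices` devices
costs parity_devices / (parity_devices - 1) raw bytes, and metadata, chunk
granularity and unequal device sizes are ignored.  The function name
`min_to_convert` is made up.

```python
from fractions import Fraction


def min_to_convert(data, raw_capacity, copies=2, parity_devices=3):
    """Smallest amount of mirrored data that must be restriped to parity so
    that all of `data` fits in `raw_capacity` (same units for both).

    Hypothetical model: a mirrored byte costs `copies` raw bytes; a parity
    byte striped over `parity_devices` devices with one parity stripe costs
    parity_devices / (parity_devices - 1) raw bytes.  Metadata, chunk
    granularity and unequal device sizes are ignored.
    """
    mirror_cost = Fraction(copies)
    parity_cost = Fraction(parity_devices, parity_devices - 1)
    if mirror_cost <= parity_cost:
        raise ValueError("parity layout would not save any space")
    shortfall = mirror_cost * data - raw_capacity
    if shortfall <= 0:
        return Fraction(0)  # everything still fits fully mirrored
    # Each converted byte frees (mirror_cost - parity_cost) raw bytes.
    needed = shortfall / (mirror_cost - parity_cost)
    if needed > data:
        raise ValueError("does not fit even with all data in parity")
    return needed
```

With three 1 GiB devices (3072 MiB raw), 1024 MiB of data needs nothing
converted, while 1800 MiB needs only 1056 MiB moved to parity; the other
744 MiB can stay mirrored.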

I also assume that the probability of all devices being exactly the same
size and exactly the same performance characteristics is going to
decrease.  Many will expect that they can add an SSD to the soup, and over
time add or update devices.  I.e. the homogeneous case that regular RAID
implicitly assumes will become increasingly rare.

> If you want maximum storage (with some given redundancy), regardless of
> performance, then you might as well start with the parity-based levels
> and just leave it at that.

In the short term it would certainly make sense to have an online
calculator or mkfs helper where you specify the device sizes and
redundancy requirements together with how much data you have, and it then
spits out the string of numbers and letters to use for mkfs/balance.
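A minimal sketch of what such a helper might compute, in Python.  Every
name here is hypothetical, and the model is deliberately crude: each
device is treated as if it were the size of the smallest one (the classic
RAID assumption), and mirroring is preferred whenever the data still
fits.  The three fields presumably map onto the n/m/p of the nCmSpP
notation named in the patch series, but the exact string syntax is not
reproduced here, so the helper returns a plain tuple.

```python
from typing import List, NamedTuple, Optional


class Layout(NamedTuple):
    copies: int        # full copies of every block
    data_stripes: int  # devices the data of one copy is spread over
    parity: int        # parity stripes protecting each stripe set
    usable: int        # usable capacity under this simplified model


def suggest_layout(device_sizes: List[int], failures: int,
                   data: int) -> Optional[Layout]:
    """Return a layout that survives `failures` lost devices and holds `data`.

    Hypothetical helper: every device is treated as if it were the size of
    the smallest one, and mirroring is preferred over parity whenever the
    data still fits.
    """
    k = len(device_sizes)
    smallest = min(device_sizes)
    candidates = []
    copies = failures + 1
    if k >= copies:
        # Mirroring: each block stored `copies` times on distinct devices.
        candidates.append(Layout(copies, 1, 0, k * smallest // copies))
    if k >= failures + 2:
        # Parity: `failures` parity stripes plus at least two data stripes.
        candidates.append(Layout(1, k - failures, failures,
                                 (k - failures) * smallest))
    for layout in candidates:
        if layout.usable >= data:
            return layout
    return None  # nothing fits with the requested redundancy
```

For three 1000-unit devices tolerating one failure, 1200 units of data
gets two-copy mirroring (1500 usable), while 1800 units tips it over to
two data stripes plus one parity stripe (2000 usable).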

> Thinking about it, specifying a (redundancy, acceptable_wastage) pair
> is fairly pointless in controlling the performance levels,

I don't think there is merit in specifying acceptable wastage: the answer
is obvious in that any unused space is acceptable for use.  That also
means it changes over time as storage is used/freed.

> There's not much else a heuristic can do, without effectively exposing
> all the config options to the admin, in some obfuscated form.

There is a lot heuristics can do.  At the simplest level btrfs can monitor
device performance characteristics and use that as a first pass.  One
database that I use has an interesting approach for queries: rather than
trying to work out the single perfect execution strategy (e.g. which
indices in which order), it actually tries them all out concurrently and
picks the quickest.  That strategy is then used for future similar
queries, with its performance being monitored.  Once response times no
longer match the strategy, it tries them all again to pick a new winner.
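The race/reuse/re-race loop described above can be sketched roughly as
follows.  This is a hypothetical Python illustration, not that database's
code: here the strategies run one after another rather than concurrently,
and the `tolerance` factor and injectable `clock` are assumptions.

```python
import time
from typing import Callable, Dict, Optional


class StrategyPicker:
    """Race every strategy, reuse the fastest, and re-race when it slows down.

    Hypothetical sketch: the race runs the strategies sequentially, and
    `tolerance` is an assumed tuning knob.
    """

    def __init__(self, strategies: Dict[str, Callable[[], object]],
                 tolerance: float = 2.0,
                 clock: Callable[[], float] = time.perf_counter):
        self.strategies = strategies
        self.tolerance = tolerance
        self.clock = clock
        self.winner: Optional[str] = None
        self.baseline = 0.0

    def _timed(self, name: str):
        start = self.clock()
        result = self.strategies[name]()
        return result, self.clock() - start

    def run(self):
        if self.winner is None:
            # No trusted winner: try every strategy and keep the quickest.
            results, timings = {}, {}
            for name in self.strategies:
                results[name], timings[name] = self._timed(name)
            self.winner = min(timings, key=timings.get)
            self.baseline = timings[self.winner]
            return results[self.winner]
        result, elapsed = self._timed(self.winner)
        if elapsed > self.tolerance * self.baseline:
            # Response time no longer matches: force a new race next call.
            self.winner = None
        return result
```

Injecting a fake clock makes the drift behaviour easy to exercise without
real timing noise.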

There is no reason btrfs can't try a similar approach.  When presented
with a pile of heterogeneous storage with different sizes and performance
characteristics, use all reasonable approaches and monitor the resulting
read/write performance.  Then start biasing towards what works best.  Use
hot data tracking to determine which data would benefit most from having
its approach changed to better values.
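A sketch of the biasing idea, again in Python and entirely hypothetical:
`LayoutBias`, the layout names, the throughput figures and the `Extent`
records are illustrative stand-ins, not btrfs structures.  Observed
throughput per layout is smoothed with an exponential moving average, new
allocations are drawn with probability proportional to that score, and
the hottest extents still sitting in a worse layout are ranked first for
migration.

```python
import random
from typing import Dict, List, NamedTuple


class Extent(NamedTuple):
    extent_id: int
    layout: str  # layout the extent currently uses
    reads: int   # access count from (hypothetical) hot data tracking


class LayoutBias:
    """Bias allocations towards layouts that perform well, and pick the
    hottest data still in a worse layout as migration candidates."""

    def __init__(self, layouts: List[str], alpha: float = 0.2):
        self.alpha = alpha  # smoothing factor for the moving average
        self.score: Dict[str, float] = {name: 1.0 for name in layouts}

    def record(self, layout: str, throughput: float) -> None:
        # Exponential moving average of observed throughput (e.g. MB/s).
        old = self.score[layout]
        self.score[layout] = (1 - self.alpha) * old + self.alpha * throughput

    def choose(self, rng: random.Random) -> str:
        # Weighted draw: better layouts are chosen more often, but the
        # others still get some traffic so their scores stay current.
        names = list(self.score)
        return rng.choices(names, weights=[self.score[n] for n in names])[0]

    def best(self) -> str:
        return max(self.score, key=self.score.get)

    def migration_candidates(self, extents: List[Extent],
                             budget: int) -> List[Extent]:
        # Hottest extents not already in the best layout come first.
        target = self.best()
        movable = [e for e in extents if e.layout != target]
        movable.sort(key=lambda e: e.reads, reverse=True)
        return movable[:budget]
```

Keeping a little traffic on the weaker layouts is what lets the scores
notice when conditions change, in the same spirit as re-racing the query
strategies.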

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAlE9I4UACgkQmOOfHg372QTNZgCeJe7H9FDiwMq1CWWZTWE89/4O
fDsAn1s6/J1am4mxHhOYUnz/3JUZ6VJx
=/XF8
-----END PGP SIGNATURE-----
