From: Hugo Mills <hugo@carfax.org.uk>
To: Gabriel <g2p.code@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH][BTRFS-PROGS] Enhance btrfs fi df
Date: Fri, 2 Nov 2012 23:44:19 +0000
Message-ID: <20121102234419.GD28864@carfax.org.uk>
In-Reply-To: <k71kl2$7go$3@ger.gmane.org>
On Fri, Nov 02, 2012 at 11:23:14PM +0000, Gabriel wrote:
> On Fri, 02 Nov 2012 22:06:04 +0000, Hugo Mills wrote:
>
> > On Fri, Nov 02, 2012 at 07:05:37PM +0000, Gabriel wrote:
> >> On Fri, 02 Nov 2012 13:02:32 +0100, Goffredo Baroncelli wrote:
> >> > On 2012-11-02 12:18, Martin Steigerwald wrote:
> >> >> Metadata, DUP is displayed as 3,50GB on the device level and as 1,75GB
> >> >> in total. I understand the logic behind this, but this could be a bit
> >> >> confusing.
> >> >>
> >> >> But it makes sense: showing the real allocation at the device level
> >> >> makes sense, because that's what is really allocated on disk. Total
> >> >> makes some sense, because that's what BTRFS is using from the tree.
> >> >
> >> > Yes, me too. At first I was confused when you noticed this
> >> > discrepancy, so I have to admit that it is not so obvious to understand.
> >> > However, we didn't find any way to make it clearer...
> >> >
> >> >> It still looks confusing at first…
> >> > We could use "Chunk(s) capacity" instead of total/size? I would like an
> >> > opinion from a native English speaker's point of view.
> >>
> >> This is easy to fix; here's a mockup:
> >>
> >> Metadata,DUP: Size: 1.75GB ×2, Used: 627.84MB ×2
> >> /dev/dm-0 3.50GB
> >
> > I've not considered the full semantics of all this yet -- I'll try
> > to do that tomorrow. However, I note that the "×2" here could become
> > non-integer with the RAID-5/6 code (which is due Real Soon Now). In
> > the first RAID-5/6 code drop, it won't even be simple to calculate
> > where there are different-sized devices in the filesystem. Putting an
> > exact figure on that number is potentially going to be awkward. I
> > think we're going to need kernel help for working out what that number
> > should be, in the general case.
>
> DUP can be nested below a device because it represents same-device
> redundancy (purpose: survive smudges but not device failure).
>
> On the other hand raid levels should occupy the same space on all
> linked devices (a necessary consequence of the guarantee that RAID5
> can survive the loss of any device and RAID6 any two devices).

No, the multiplier here is variable. Consider:

   1 MiB stored in RAID-5 across 3 devices takes up 1.5 MiB -- multiplier ×1.5
   (1 MiB over 2 devices is 512 KiB each, plus an additional 512 KiB for parity)

   1 MiB stored in RAID-5 across 6 devices takes up 1.2 MiB -- multiplier ×1.2
   (1 MiB over 5 devices is 204.8 KiB each, plus an additional 204.8 KiB for parity)
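
Both examples follow one rule: a RAID-5 stripe over n devices carries n-1 units of data and one unit of parity, so each logical byte costs n/(n-1) raw bytes. A minimal Python sketch of that arithmetic (the function name is illustrative, not part of btrfs-progs):

```python
from fractions import Fraction

def raid5_multiplier(stripe_width: int) -> Fraction:
    """Raw bytes consumed per logical byte for one RAID-5 stripe.

    A stripe across `stripe_width` devices holds stripe_width - 1 data
    blocks and one parity block.
    """
    if stripe_width < 2:
        raise ValueError("RAID-5 needs at least two devices in a stripe")
    return Fraction(stripe_width, stripe_width - 1)

for n in (3, 6):
    m = raid5_multiplier(n)
    print(f"{n} devices: 1 MiB takes {float(m):.2f} MiB (x{float(m):.2f})")
```

Using `Fraction` keeps the ratio exact, which matters once these per-stripe figures are summed over many chunks.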

With the (initial) proposed implementation of RAID-5, the
stripe width (i.e. the number of devices used for any given chunk
allocation) will be *as many as can be allocated*. Chris confirmed
this today on IRC. So if I have a disk array of 2T, 2T, 2T, 1T, 1T,
1T, then the first 1T allocated on each device will stripe across all
6 devices, giving me 5 data+1 parity, or a multiplier of ×1.2. As soon
as the smaller devices are full, the stripe width will drop to 3
devices, and we'll be using 2 data+1 parity allocation, or a multiplier
of ×1.5 for any subsequent chunks. So, as more data beyond the first 5T
is stored, the overall multiplier steadily increases until we fill the
FS, at which point 7T of data occupies 9T of disk: a multiplier of
about ×1.29 overall. This gets more complicated if you have devices of
many different sizes. (Imagine 6 disks with sizes 500G, 1T, 1.5T, 2T,
3T and 3T.)
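
To see how the overall figure moves with mixed device sizes, here is an idealised Python model. It assumes, purely for illustration, that space is allocated continuously rather than in fixed-size chunks, that every stripe spans every device that still has free space, and that allocation stops once fewer than 3 devices have room (`min_width` is a hypothetical parameter, not a btrfs setting). The overall multiplier is then the capacity-weighted ratio raw/logical, not an average of the per-phase multipliers:

```python
from fractions import Fraction

def fill_raid5(sizes, min_width=3):
    """Idealised 'stripe as wide as possible' RAID-5 fill; returns (logical, raw).

    Hypothetical model, not the btrfs allocator: allocation is continuous,
    each stripe uses every device that still has free space, and it stops
    once fewer than `min_width` devices have free space.
    """
    free = [Fraction(s) for s in sizes]
    logical = raw = Fraction(0)
    while True:
        active = [i for i, f in enumerate(free) if f > 0]
        width = len(active)
        if width < min_width:
            break
        # Stripe across all active devices until the smallest one is full.
        step = min(free[i] for i in active)
        for i in active:
            free[i] -= step
        raw += step * width
        logical += step * (width - 1)  # one device's share of each stripe is parity
    return logical, raw

for sizes in ([2, 2, 2, 1, 1, 1], ["0.5", 1, "1.5", 2, 3, 3]):  # sizes in T
    logical, raw = fill_raid5(sizes)
    print(f"{sizes}: {logical}T data in {raw}T raw, x{float(raw / logical):.2f}")
```

In this model the 2T, 2T, 2T, 1T, 1T, 1T array ends with 7T of data in 9T raw (×9/7, about ×1.29). The 500G, 1T, 1.5T, 2T, 3T, 3T array passes through stripe widths of 6, 5, 4 and 3 and, with `min_width=3`, strands 1T on each 3T disk, which illustrates why no single closed-form multiplier describes it.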

We can probably work out the current RAID overhead and feed it back
sensibly, but it's (a) not constant as more chunks are allocated, and
(b) not trivial to compute.

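Reporting the *current* overhead, rather than predicting it, could be done by summing over the chunks already allocated. A hedged Python sketch: `Raid5Chunk` is a hypothetical record (logical size plus the number of devices the chunk is striped across, parity included), not a btrfs-progs structure. It shows the ratio shifting as narrower chunks are added:

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Raid5Chunk:
    """Hypothetical record of one allocated RAID-5 chunk."""
    logical: int      # bytes of usable (logical) space in the chunk
    num_stripes: int  # devices the chunk is striped over, parity included

    @property
    def raw(self) -> Fraction:
        # Each device holds logical / (num_stripes - 1) bytes.
        return Fraction(self.logical * self.num_stripes, self.num_stripes - 1)

def current_multiplier(chunks):
    """Raw/logical ratio over the chunks allocated so far (needs >= 1 chunk)."""
    logical = sum(c.logical for c in chunks)
    raw = sum(c.raw for c in chunks)
    return raw / logical

GiB = 1 << 30
chunks = [Raid5Chunk(5 * GiB, 6)]         # wide stripe while all 6 devices have space
print(float(current_multiplier(chunks)))  # 1.2
chunks.append(Raid5Chunk(2 * GiB, 3))     # narrower stripe once small devices fill
print(float(current_multiplier(chunks)))
```

The figure is only meaningful as a snapshot: every new chunk with a different stripe width changes it.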
> The two probably won't need to be represented at the same time
> except during a reshape, because I imagine DUP gets converted to
> RAID (1 or 5) as soon as the second device is added.
>
> A 1→2 reshape would look a bit like this (doing only the data column
> and skipping totals):
>
> InitialDevice
> Reserved 1.21TB
> Used 1.21TB
> RAID1(InitialDevice, SecondDevice)
> Reserved 1.31TB + 100GB
> Used 2× 100GB
>
> RAID5, RAID6: same with fractions, (n+1)/n and (n+2)/n.

Except that n isn't guaranteed to be constant. That was pretty much
my only point. Don't assume that it will be (or at the very least, be
aware that you are assuming it is, and be prepared for inconsistencies).
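
Taking the quoted fractions at face value, with n as the number of data stripes in a chunk, a tool could show a single "×k" label only when every chunk of a profile agrees on n. A minimal Python sketch of that check; the profile names and the per-chunk list of n values are illustrative inputs, not an existing btrfs-progs interface:

```python
from fractions import Fraction

def multiplier(profile: str, n: int = 1) -> Fraction:
    """Raw/logical multiplier; n is the number of data stripes in a chunk."""
    if profile in ("single", "raid0"):
        return Fraction(1)
    if profile in ("dup", "raid1", "raid10"):
        return Fraction(2)  # two copies of everything
    if profile == "raid5":
        return Fraction(n + 1, n)
    if profile == "raid6":
        return Fraction(n + 2, n)
    raise ValueError(f"unknown profile {profile!r}")

def display_multiplier(profile, data_stripes_per_chunk):
    """Return one multiplier if every chunk agrees, else None.

    `data_stripes_per_chunk` lists n for each allocated chunk (hypothetical
    input); None means a single "xk" label would be misleading.
    """
    values = {multiplier(profile, n) for n in data_stripes_per_chunk}
    return values.pop() if len(values) == 1 else None

print(display_multiplier("raid5", [5, 5, 5]))  # 6/5
print(display_multiplier("raid5", [5, 5, 2]))  # None
```

Returning None instead of guessing is one way to "be prepared for inconsistencies": the caller can fall back to printing raw and logical totals separately.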
Hugo.
--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Well, sir, the floor is yours. But remember, the ---
roof is ours!