From: Robert White <rwhite@pobox.com>
To: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>,
	Grzegorz Kowal <custos.mentis@gmail.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH v2 1/3] Btrfs: get more accurate output in df command.
Date: Sun, 14 Dec 2014 22:06:50 -0800
Message-ID: <548E7A7A.90505@pobox.com>
In-Reply-To: <548E377D.6030804@cn.fujitsu.com>

On 12/14/2014 05:21 PM, Dongsheng Yang wrote:
>
> Does it make sense to you?

I understood what you were saying but it didn't make sense to me...

> As there are 2 complaints for the same change of @size in df, I have to
> say it maybe not so easy to understand.

> Anyone have some suggestion about it?

ABSTRACT :: Stop being clever, just give the raw values. That's what you
should be doing anyway. There are no other correct values to give that
don't blow someone's paradigm somewhere.

ITEM #1 :: In my humble opinion (ha ha) the size column should never 
change unless you add or remove actual storage. It should approximate 
the raw block size of the device on initial creation, and it should 
adjust to the size changes that happen when you semantically resize the
filesystem with e.g. btrfs filesystem resize.

RATIONALE ::

(1a) The actual definition of size for df is not well defined, so the 
best definition of size is the one that produces the "most useful 
information". IMHO the number I've always wanted to see from df is the 
value SIZE I would supply in order to safely dd the entire filesystem 
from one place to another. That number would be, on a single drive 
filesystem, "total_bytes" from the superblock as scaled by the necessary 
block size etc.
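As a rough sketch of the scaling in (1a), assuming a single-device
filesystem whose superblock total_bytes is already in hand: the helper
name dd_count and its parameters are made up for illustration and are
not btrfs code. It rounds up so that a trailing partial block is still
copied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the count= value to give dd, for a chosen bs=,
 * so that total_bytes of filesystem are fully covered. */
uint64_t dd_count(uint64_t total_bytes, uint64_t bs)
{
    assert(bs > 0);
    /* Round up: a partial trailing block still has to be copied. */
    return total_bytes / bs + (total_bytes % bs != 0 ? 1 : 0);
}
```

With a 1GiB total_bytes and bs=4096 this gives a count of 262144; one
extra byte pushes it to 262145.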

ITEM #2 :: The idea of "blocks used" is iffy as well. In particular I 
don't care how or why those blocks have been used. And almost all 
filesystems have this same issue. If I write a 1GiB file to ext2 my
blocks used doesn't go up by exactly 1GiB, it goes up by 1GiB plus
all the indirect indexing blocks needed to reference that 1GiB.
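To put a number on that ext2 example, here is a small sketch that counts
the indexing blocks. It assumes the classic ext2 block map (12 direct
pointers in the inode, 4-byte block pointers, then single, double and
triple indirect blocks) and does not check the triple-indirect size
limit. The function name is illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Indirect (pointer) blocks needed to map data_blocks data blocks.
 * Assumes 12 direct pointers and 4-byte block pointers. */
uint64_t ext2_indirect_blocks(uint64_t data_blocks, uint32_t block_size)
{
    assert(block_size >= 4);
    const uint64_t ptrs = block_size / 4;  /* pointers per indirect block */
    uint64_t overhead = 0;

    if (data_blocks <= 12)
        return 0;                          /* direct pointers suffice */
    uint64_t left = data_blocks - 12;

    overhead += 1;                         /* the single indirect block */
    if (left <= ptrs)
        return overhead;
    left -= ptrs;

    /* Double indirect: one top block plus one leaf per ptrs data blocks. */
    uint64_t covered = left < ptrs * ptrs ? left : ptrs * ptrs;
    overhead += 1 + (covered + ptrs - 1) / ptrs;
    left -= covered;
    if (left == 0)
        return overhead;

    /* Triple indirect: top block, mid-level blocks, leaf blocks. */
    uint64_t leaves = (left + ptrs - 1) / ptrs;
    uint64_t mids = (leaves + ptrs - 1) / ptrs;
    return overhead + 1 + mids + leaves;
}
```

For a 1GiB file with 4KiB blocks (262144 data blocks) this comes to 257
extra blocks, a little over 1MiB that "blocks used" picks up on top of
the data.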

RATIONALE ::

(2a) "Blocks Used" is not, and wasn't particularly meant to be "Blocks 
Used By Data Alone".

(2b) Many filesystems have, historically, pre-subtracted the fixed
overhead of their layout, such as removing inode table regions. But
then that became "stupid" and "anti-helpful", yet remained un-redressed, when
advancements were made that let data be stored directly in the inodes 
for small files. So now you can technically fit more data in an EXT* 
filesystem than you could fit in SIZE*BLOCKSIZE bytes. Even before 
compression.

(2c) The "fixed" blocks-used size of BTRFS is technically 
sizeof(SuperBlock)*num_supers. Everything else is up for grabs. Some is, 
indeed, pre-grabbed but so what?

ITEM #3 :: The idea of Blocks Available should be Blocks - BlocksUsed 
and _nothing_ _more_.

RATIONALE ::

(3a) Just like Blocks Used isn't just about blocks used for data, Blocks 
Available isn't about how much more user data can be stuffed into the 
filesystem.

(3b) Any attempt to treat Blocks Available as some sort of guarantee 
will be meaningless for some significant number of people and usages.

ITEM #4 :: Blocks available to unprivileged users is pretty "iffy" since 
unprivileged users cannot write to the filesystem. This datum doesn't 
have a "plain reading". I'd start with filesystem total blocks, then 
subtract the total blocks used by all tree nodes in all trees (e.g.
Nodes * 0x1000 or whatever the node size is), then shave off the N
superblocks, then subtract the number of blocks already allocated in 
data extents. And you're done.
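The subtraction described in ITEM #4 can be sketched as follows. The
struct and its field names are hypothetical stand-ins for the raw
counters, not actual btrfs structures, and everything is expressed in
filesystem blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical raw counters, all in units of filesystem blocks. */
struct raw_counts {
    uint64_t total_blocks;      /* filesystem total blocks */
    uint64_t tree_nodes;        /* nodes in all trees */
    uint64_t blocks_per_node;   /* node size / block size */
    uint64_t superblock_blocks; /* blocks held by the N superblocks */
    uint64_t data_blocks_used;  /* blocks already allocated in data extents */
};

/* Total minus tree nodes, minus superblocks, minus allocated data. */
uint64_t unprivileged_avail(const struct raw_counts *c)
{
    uint64_t used = c->tree_nodes * c->blocks_per_node
                  + c->superblock_blocks
                  + c->data_blocks_used;

    assert(used <= c->total_blocks);
    return c->total_blocks - used;
}
```

No redundancy factor appears anywhere in it; that is left to the reader
of the number.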

ITEM #5 :: A plain reading of the comments in the code cries out "stop
trying to help me make predictions". Just serve up the nice raw numbers.

RATIONALE ::

(5a) We have _all_ suffered under the merciless tyranny of some system 
or another that was being "too helpful to be useful". Once a block of 
code tries to "help you" and enforces that help, then you are doomed to 
suffer under that help. See "Clippy".

(5b) The code has a plain reading. It doesn't say anything about how 
things will be used. "Available" is _available_. If you have chosen to 
use it at a 2x rate (e.g. 200%, e.g. RAID1) or a 1.25x rate (five media
in RAID5) or an (N+2)/N rate (e.g. RAID6), or a 4x rate (RAID10)... well
that was your choice.

(5c) If your metadata rate is different than your data rate, then there
is _absolutely_ no way to _programmatically_ predict how the data _might_
be used, and this is the _default_ usage model. Literally the hardest
model is the normal model. There is actually no predictive solution. So
why are we putting in predictions at all when they _must_ be wrong?
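For the reader who does want to apply a usage rate from (5b) to the raw
number, the arithmetic is a one-liner. This is only a sketch: the
function name is invented, and the rate is passed as a fraction
(raw_units of raw space consumed per data_units of data stored), so
RAID1 is 2/1, five media in RAID5 is 5/4, and RAID6 with N data stripes
is (N+2)/N.

```c
#include <assert.h>
#include <stdint.h>

/* Data bytes that fit in raw_bytes at a cost of raw_units raw bytes
 * per data_units data bytes. Integer division rounds down. */
uint64_t usable_bytes(uint64_t raw_bytes, uint64_t raw_units,
                      uint64_t data_units)
{
    assert(data_units > 0 && raw_units >= data_units);
    return raw_bytes / raw_units * data_units;
}
```

It only works when one rate applies to everything. With separate data
and metadata rates there is no single fraction to pass in, which is the
point of (5c).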

            struct statfs {
                __SWORD_TYPE f_type;    /* type of filesystem (see below) */
                __SWORD_TYPE f_bsize;   /* optimal transfer block size */
                fsblkcnt_t   f_blocks;  /* total data blocks in filesystem */
                fsblkcnt_t   f_bfree;   /* free blocks in fs */
                fsblkcnt_t   f_bavail;  /* free blocks available to
                                           unprivileged user */
                fsfilcnt_t   f_files;   /* total file nodes in filesystem */
                fsfilcnt_t   f_ffree;   /* free file nodes in fs */
                fsid_t       f_fsid;    /* filesystem id */
                __SWORD_TYPE f_namelen; /* maximum length of filenames */
                __SWORD_TYPE f_frsize;  /* fragment size (since Linux 2.6) */
                __SWORD_TYPE f_spare[5];
            };

The datum provided is _supposed_ to be simple. "total blocks in file 
system" "free blocks in file system".

"Blocks available to unprivileged users" is the only tricky one. I'd
limit that to all unallocated blocks inside data extents and all blocks 
not part of any extent. "Unprivileged users" cannot, after all, actually 
allocate blocks in the various trees even if the system ends up doing it 
for them.

Fortunately (or hopefully) that's not the datum /bin/df usually returns.
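For reference, reading those raw fields back out is about this much
code. A minimal sketch using Linux statfs(2); the raw_df struct and
read_raw_df name are invented for the example, and it assumes the block
counts are in f_frsize units when that field is set, falling back to
f_bsize otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/vfs.h>   /* statfs(2), Linux-specific */

/* Hypothetical container for the three raw counters, in bytes. */
struct raw_df {
    uint64_t total_bytes;   /* f_blocks scaled to bytes */
    uint64_t free_bytes;    /* f_bfree scaled to bytes */
    uint64_t avail_bytes;   /* f_bavail scaled to bytes */
};

/* Returns 0 on success, -1 if statfs() fails (errno is left set). */
int read_raw_df(const char *path, struct raw_df *out)
{
    struct statfs s;

    assert(path != NULL && out != NULL);
    if (statfs(path, &s) != 0)
        return -1;

    /* Assumption: counts are in f_frsize units if set, else f_bsize. */
    uint64_t unit = (uint64_t)(s.f_frsize ? s.f_frsize : s.f_bsize);

    out->total_bytes = (uint64_t)s.f_blocks * unit;
    out->free_bytes  = (uint64_t)s.f_bfree  * unit;
    out->avail_bytes = (uint64_t)s.f_bavail * unit;
    return 0;
}
```

Nothing in it interprets the numbers; it only scales block counts to
bytes and hands them back.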


SUMMARY ::

No fudge factor or backwards-reasoning is going to satisfy more than 
half the people.

Trying to guesstimate the user's intentions is impossible. Like all
filesystems except the most simplistic ones (vfat etc) or read-only ones 
(squashfs), getting any answer "near perfect" is not likely, nor 
particularly helpful.

It's really _not_ the implementor's job to guess at how the user is going
to use the system.

Just as EXT before us didn't bother trying to put in a fudge factor that 
guessed what percentage of files would end up needing indirect blocks, 
we shouldn't be in the business of trying to back-figure cost-of-storage.

The raw numbers are _more_ useful in many circumstances. The raw blocks
used, for example, will tell me what I need to know for thin
provisioning on other media. Literally nothing else exposes
that sort of information.

Just put a prominent notice that the user needs to remember to factor 
their choice of redundancy et al into the numbers.

Noticing that a RAID1 costs two 1K blocks to store 1K of data is
_their_ _job_ when it comes down to it. That's because we are giving
"them" insight into the filesystem _and_ the storage management.

Same for the benefits of compression etc.

We can recognize that this is "harder" than some other filesystems 
because, frankly, it is... Once we decided to get into the business of 
fusing the file system with the storage management system we _accepted_ 
that burden of difficulty. Users who never go beyond core usage (single 
data plus "overhead" from DUP metadata) will still get the same numbers 
for their simple case. People who start doing RAID5+1 or whatever 
(assuming our implementation gets that far) across 22 media are just 
going to have to remember to do the math to figure their 10% overhead 
cost when looking at "blocks available" just like I had to do my 
S=N*log(N) estimates while laying out Oracle table spaces on my Sun
stations back in the eighties.

Any "clever" answer to any one model will be wrong for _every_ _other_ 
model.

IN MY HUMBLE OPINION, of course... 8-)


