* UI issues around RAID1
From: Roland Dreier @ 2009-11-16 18:45 UTC
To: linux-btrfs
I've just started playing around with btrfs RAID1, and I've noticed a
couple of what seem to be UI issues. Suppose I do something like
"mkfs.btrfs -d raid1 -m raid1 dev1 dev2". I see the following minor
usability problems:
- Unless I'm missing something, there doesn't seem to be any way later
on to see that I set the data policy to raid1, except using
btrfs-dump-tree and checking the flags bits for the appropriate
group, which can make things confusing if I have a bunch of btrfs
filesystems around.
- The free space reporting doesn't seem to take into account the fact
that everything is going to be mirrored; so "df" et al report the
size of the filesystem and free space on the new filesystem as
size(dev1) + size(dev2) -- if dev1 and dev2 are the same size then I
would assume it should really be just size(dev1) for a fully-RAID1
filesystem. (Not sure in general what we should say for a
metadata-only mirrored filesystem, since we don't really know in
advance how much space we have exactly)
I'm happy to help fix these issues up; just want to make sure I'm not
missing something or doing it wrong.
Thanks,
Roland
* Re: UI issues around RAID1
From: Josef Bacik @ 2009-11-16 20:20 UTC
To: Roland Dreier; +Cc: linux-btrfs
On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
> I've just started playing around with btrfs RAID1, and I've noticed a
> couple of what seem to be UI issues. Suppose I do something like
> "mkfs.btrfs -d raid1 -m raid1 dev1 dev2". I see the following minor
> usability problems:
>
> - Unless I'm missing something, there doesn't seem to be any way later
> on to see that I set the data policy to raid1, except using
> btrfs-dump-tree and checking the flags bits for the appropriate
> group, which can make things confusing if I have a bunch of btrfs
> filesystems around.
>
You aren't missing anything, there's just nothing that spits that information out
yet. btrfs-show would probably be a good place to do this.
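E.g. something along these lines (the BTRFS_BLOCK_GROUP_* values are the
real on-disk flag bits from ctree.h, but the helper and the output format
are just a sketch of what btrfs-show could print):

#include <stdio.h>

#define BTRFS_BLOCK_GROUP_DATA     (1ULL << 0)
#define BTRFS_BLOCK_GROUP_SYSTEM   (1ULL << 1)
#define BTRFS_BLOCK_GROUP_METADATA (1ULL << 2)
#define BTRFS_BLOCK_GROUP_RAID0    (1ULL << 3)
#define BTRFS_BLOCK_GROUP_RAID1    (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP      (1ULL << 5)
#define BTRFS_BLOCK_GROUP_RAID10   (1ULL << 6)

/* map a block group's flags to a human-readable raid profile */
static const char *bg_profile(unsigned long long flags)
{
	if (flags & BTRFS_BLOCK_GROUP_RAID10)
		return "raid10";
	if (flags & BTRFS_BLOCK_GROUP_RAID1)
		return "raid1";
	if (flags & BTRFS_BLOCK_GROUP_RAID0)
		return "raid0";
	if (flags & BTRFS_BLOCK_GROUP_DUP)
		return "dup";
	return "single";
}

int main(void)
{
	/* flags as mkfs.btrfs -d raid1 -m raid1 would leave them */
	unsigned long long data = BTRFS_BLOCK_GROUP_DATA |
				  BTRFS_BLOCK_GROUP_RAID1;
	unsigned long long meta = BTRFS_BLOCK_GROUP_METADATA |
				  BTRFS_BLOCK_GROUP_RAID1;

	printf("data: %s\nmetadata: %s\n",
	       bg_profile(data), bg_profile(meta));
	return 0;
}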
> - The free space reporting doesn't seem to take into account the fact
> that everything is going to be mirrored; so "df" et al report the
> size of the filesystem and free space on the new filesystem as
> size(dev1) + size(dev2) -- if dev1 and dev2 are the same size then I
> would assume it should really be just size(dev1) for a fully-RAID1
> filesystem. (Not sure in general what we should say for a
> metadata-only mirrored filesystem, since we don't really know in
> advance how much space we have exactly)
>
Yeah, df is just a fun ball of wax in many respects. We don't take into account
RAID, and we don't subtract space that's strictly for metadata, so there are
several things that need to be fixed for df. Thanks,
Josef
* Re: UI issues around RAID1
From: jim owens @ 2009-11-16 21:48 UTC
To: Josef Bacik, Roland Dreier; +Cc: linux-btrfs
Josef Bacik wrote:
> On Mon, Nov 16, 2009 at 10:45:28AM -0800, Roland Dreier wrote:
>> - The free space reporting doesn't seem to take into account the fact
>> that everything is going to be mirrored; so "df" et al report the
>> size of the filesystem and free space on the new filesystem as
>> size(dev1) + size(dev2) -- if dev1 and dev2 are the same size then I
>> would assume it should really be just size(dev1) for a fully-RAID1
>> filesystem. (Not sure in general what we should say for a
>> metadata-only mirrored filesystem, since we don't really know in
>> advance how much space we have exactly)
>>
>
> Yeah, df is just a fun ball of wax in many respects. We don't take into account
> RAID, and we don't subtract space that's strictly for metadata, so there are
> several things that need to be fixed for df. Thanks,
But as we have said many times... if we have different
raid types active on different files, any attempt to make
df report "raid adjusted numbers" instead of the current raw
total storage numbers is going to sometimes give wrong answers.
So I think it is dangerous to try. The current output
may be ugly, but it is always consistent and explainable.
jim
* Re: UI issues around RAID1
From: jim owens @ 2009-11-17 15:25 UTC
To: Andrey Kuzmin; +Cc: linux-btrfs
Andrey Kuzmin wrote:
> On Tue, Nov 17, 2009 at 12:48 AM, jim owens <jowens@hp.com> wrote:
>> But as we have said many times... if we have different
>> raid types active on different files, any attempt to make
>
> Late question, but could you please explain this a bit further (or
> point me to the relevant archive discussion)? Did I get it right that
> btrfs supports per-file raid topology? Or is it per-(sub)volume?
The design of btrfs actually allows each extent inside a file
to have a different raid type. This probably will never happen
unless a file is written, we add disks and mount with a new
raid type, and we then modify part of the file. (This may not
behave how I think, but I plan to test it someday soon.)
There is a flag on the file to allow per-file raid setting
via ioctl/fcntl. The typical use for this would be to
make a file DUPlicate type on a simple disk. DUPlicate acts
like a raid 1 mirror on a single drive and is the default raid
type for metadata extents.
[disclaimer] btrfs is still in development and Chris might
say it does not (or will not in the future) work like I think.
>> df report "raid adjusted numbers" instead of the current raw
>> total storage numbers is going to sometimes give wrong answers.
>
> I have always thought that the space (both physical and logical) used
> by a file-system could be accounted for correctly whatever topology,
> or mixture thereof, is in effect; the only point worth discussing is
> the accounting overhead. Free space under variable topology can, of
> course, only be reliably reported as raw (or as an 'if you use this
> topology, then you have this logical capacity left' list).
So we know the "raw free blocks", but cannot guarantee
"how many raw blocks per new user write-block" will be
consumed because we do not know what topology will be
in effect for a new write.
We could cheat and use "worst-case topology" numbers, assuming
all writes use the current default raid type. Of course this
ignores DUP unless it is set on the whole filesystem.
And we also have the problem of metadata, which is dynamic,
allocated in large chunks, and has a DUP type; how do we
account for that in worst-case calculations?
The worst case is probably wrong, but it may be more useful for
people to know when they will run out of space. Or at least
it might make some of our ENOSPC complaints go away :)
Only "raw" and "worst-case" can be explained to users and
which we report is up to Chris. Today we report "raw".
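For the record, "worst-case" would amount to no more than something
like this (a sketch, not btrfs code; meta_reserve is exactly the
number we cannot know in advance):

/* Back-of-the-envelope "worst-case" free space; not btrfs code.
 * data_factor is the replication of the default data profile
 * (2 for raid1/DUP, 1 for single/raid0); meta_reserve stands in
 * for the dynamically allocated DUP metadata chunks, whose final
 * size is precisely what we cannot know in advance. */
static unsigned long long worst_case_avail(unsigned long long raw_free,
					   unsigned long long meta_reserve,
					   unsigned int data_factor)
{
	if (raw_free <= meta_reserve)
		return 0;
	return (raw_free - meta_reserve) / data_factor;
}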
After spending 10 years on a multi-volume filesystem that
had (unsolvable) confusing df output, I'm just of the
opinion that nothing we do will make everyone happy.
But feel free to run a patch proposal by Chris.
jim
* Re: UI issues around RAID1
From: Andrey Kuzmin @ 2009-11-17 20:23 UTC
To: jim owens; +Cc: linux-btrfs
On Tue, Nov 17, 2009 at 6:25 PM, jim owens <jowens@hp.com> wrote:
> <snip>
> So we know the "raw free blocks", but cannot guarantee
> "how many raw blocks per new user write-block" will be
> consumed because we do not know what topology will be
> in effect for a new write.
>
> We could cheat and use "worst-case topology" numbers, assuming
> all writes use the current default raid type. Of course this
> ignores DUP unless it is set on the whole filesystem.
>
> And we also have the problem of metadata, which is dynamic,
> allocated in large chunks, and has a DUP type; how do we
> account for that in worst-case calculations?
>
> The worst case is probably wrong, but it may be more useful for
> people to know when they will run out of space. Or at least
> it might make some of our ENOSPC complaints go away :)
>
> Only "raw" and "worst-case" can be explained to users, and
> which we report is up to Chris. Today we report "raw".
>
> After spending 10 years on a multi-volume filesystem that
> had (unsolvable) confusing df output, I'm just of the
> opinion that nothing we do will make everyone happy.
df is user-centric, and is therefore naturally expected to return
used/available _logical_ capacity (how this translates to used
physical space is for file-system-specific tools to find out and
report). Returning raw numbers is counter-intuitive and causes
surprises like the one Roland ran into.
With topology configurable down to the per-file level, the only
option I see for df to return available logical capacity is to
compute it from the file-system object for which df is invoked. For
instance, 'df /path/to/some/file' could return the logical capacity
of the mountpoint where that file resides, computed from the
underlying physical capacity available _and_ the topology of the
file itself. 'df /mount-point' would, under this scheme, return
available logical capacity assuming the default topology of the
referenced file-system.
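A rough userspace sketch of the computation I have in mind
(statvfs() is real; raid_factor_for_path() is hypothetical, standing
for "ask the file-system which topology governs new writes at this
path"):

#include <sys/statvfs.h>

/* hypothetical: 2 for raid1/DUP at this path, 1 for single/raid0 */
extern unsigned int raid_factor_for_path(const char *path);

unsigned long long logical_avail(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0)
		return 0;
	/* raw free bytes scaled down by the replication factor */
	return (unsigned long long)st.f_bavail * st.f_frsize /
	       raid_factor_for_path(path);
}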
As for accounting of used logical space, that is file-system-specific,
and I'm not yet familiar enough with the btrfs code-base to argue
for any particular approach.
Regards,
Andrey
>
> But feel free to run a patch proposal by Chris.
>
> jim
>
* Re: UI issues around RAID1
From: Roland Dreier @ 2009-11-18 17:54 UTC
To: Josef Bacik; +Cc: linux-btrfs
> > - Unless I'm missing something, there doesn't seem to be any way later
> > on to see that I set the data policy to raid1, except using
> > btrfs-dump-tree and checking the flags bits for the appropriate
> > group, which can make things confusing if I have a bunch of btrfs
> > filesystems around.
> You aren't missing anything, there's just nothing that spits that information out
> yet. btrfs-show would probably be a good place to do this.
Thanks. I'll look at adding more info about the RAID policy to the
btrfs-show output.
- R.
* Re: UI issues around RAID1
From: Roland Dreier @ 2009-11-18 17:59 UTC
To: jim owens; +Cc: Josef Bacik, linux-btrfs
> > Yeah, df is just a fun ball of wax in many respects. We don't take into account
> > RAID, and we don't subtract space that's strictly for metadata, so there are
> > several things that need to be fixed for df. Thanks,
> But as we have said many times... if we have different
> raid types active on different files, any attempt to make
> df report "raid adjusted numbers" instead of the current raw
> total storage numbers is going to sometimes give wrong answers.
>
> So I think it is dangerous to try. The current output
> may be ugly, but it is always consistent and explainable.
It does seem like a big problem, especially as we add other RAID
levels etc. However, on the flip side, the accounting of the "used"
space does seem off and maybe fixable?
In other words, if I create a btrfs filesystem out of two 1GB devices
with RAID1 for data and metadata, then df shows a total size of 2GB for
the filesystem. But if I then create a .5 GB file on that filesystem,
the used space is shown as only .5 GB -- i.e. the accounting of total
size is at the device/block level, but the accounting of used space is
at the logical/filesystem level, which leads to very confusing df output.
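To put numbers on it, the two self-consistent ways to report the case
above would look something like this (illustrative figures only):

	                 Size   Used  Avail
	raw (device)     2.0G   1.0G   1.0G
	logical (RAID1)  1.0G   0.5G   0.5G

Today we effectively mix the two rows: raw Size with logical Used.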
I wonder if it's possible to come up with a way to make things
consistent at least, or figure out a way to define more useful
information about space left on the filesystem.
- Roland
* Re: UI issues around RAID1
From: Chris Mason @ 2009-11-19 14:56 UTC
To: Roland Dreier; +Cc: jim owens, Josef Bacik, linux-btrfs
On Wed, Nov 18, 2009 at 09:59:24AM -0800, Roland Dreier wrote:
>
> > > Yeah, df is just a fun ball of wax in many respects. We don't take into account
> > > RAID, and we don't subtract space that's strictly for metadata, so there are
> > > several things that need to be fixed for df. Thanks,
>
> > But as we have said many times... if we have different
> > raid types active on different files, any attempt to make
> > df report "raid adjusted numbers" instead of the current raw
> > total storage numbers is going to sometimes give wrong answers.
> >
> > So I think it is dangerous to try. The current output
> > may be ugly, but it is always consistent and explainable.
>
> It does seem like a big problem, especially as we add other RAID
> levels etc. However, on the flip side, the accounting of the "used"
> space does seem off and maybe fixable?
>
> In other words, if I create a btrfs filesystem out of two 1GB devices
> with RAID1 for data and metadata, then df shows a total size of 2GB for
> the filesystem. But if I then create a .5 GB file on that filesystem,
> the used space is shown as only .5 GB -- i.e. the accounting of total
> size is at the device/block level, but the accounting of used space is
> at the logical/filesystem level, which leads to very confusing df output.
>
> I wonder if it's possible to come up with a way to make things
> consistent at least, or figure out a way to define more useful
> information about space left on the filesystem.
That part we can at least do. Since we know the amount of space used in
each block group and the raid level of each block group, we can figure
it out. It won't be cheap overall but it is at least possible.
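In userspace-flavored terms, the arithmetic is just the following
(the struct is a stand-in for the real block group records, not the
kernel's, with both counters treated as raw bytes on disk):

#include <stdio.h>

#define BTRFS_BLOCK_GROUP_RAID1 (1ULL << 4)
#define BTRFS_BLOCK_GROUP_DUP   (1ULL << 5)

/* stand-in for the real block group record */
struct block_group {
	unsigned long long flags;       /* BTRFS_BLOCK_GROUP_* bits */
	unsigned long long total_bytes; /* raw bytes on disk */
	unsigned long long used_bytes;  /* raw bytes on disk */
};

/* copies stored per logical byte for this block group's profile */
static unsigned int bg_factor(unsigned long long flags)
{
	if (flags & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
		return 2;
	return 1;
}

int main(void)
{
	/* one 1GB raid1 data block group, half full of raw data */
	struct block_group bgs[] = {
		{ BTRFS_BLOCK_GROUP_RAID1, 1ULL << 30, 1ULL << 29 },
	};
	unsigned long long total = 0, used = 0;
	unsigned int i;

	for (i = 0; i < sizeof(bgs) / sizeof(bgs[0]); i++) {
		unsigned int f = bg_factor(bgs[i].flags);

		total += bgs[i].total_bytes / f;
		used += bgs[i].used_bytes / f;
	}
	printf("total %llu used %llu avail %llu\n",
	       total, used, total - used);
	return 0;
}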
-chris