linux-xfs.vger.kernel.org archive mirror
* Re: Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ????
       [not found]   ` <59FEC912.4000005@tlinx.org>
@ 2017-11-05  8:55     ` Amir Goldstein
  2017-11-05 22:34       ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2017-11-05  8:55 UTC
  To: L A Walsh; +Cc: overlayfs, linux-xfs, Dave Chinner, Darrick J. Wong

[adding cc: linux-xfs]

On Sun, Nov 5, 2017 at 10:17 AM, L A Walsh <lkml@tlinx.org> wrote:
> Amir Goldstein wrote:
>>
>>
>>
>>>
>>> I then created a new xfs file system and mounted it on '/edge';
>>>
>>>    Ishtar:/edge> xfs_info .
>>>    meta-data=/dev/Data/Edge     isize=256    agcount=32, agsize=16777200 blks
>>>             =                   sectsz=4096  attr=2
>>>    data     =                   bsize=4096   blocks=536870400, imaxpct=5
>>>             =                   sunit=16     swidth=64 blks
>>>    naming   =version 2          bsize=4096   ascii-ci=0
>>>    log      =internal           bsize=4096   blocks=262143, version=2
>>>             =                   sectsz=4096  sunit=1 blks, lazy-count=1
>>>    realtime =none               extsz=4096   blocks=0, rtextents=0
>>>
>>>
>>
>>
>> Your problem is that you do not have "ftype" feature in directory
>> name format, like this:
>>
>> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
>>
>> Perhaps you have an old version of mkfs.xfs, not sure when
>> ftype=1 became the default format, but you can try to
>>   mkfs.xfs -n ftype=1
>>
>
> ----  Ah... no .. last I was told, if you turned on ftype=1,
> you had to also pull in crc'ing of all the meta-info.
> That has problems -- causes errors where there would be no
> problem, and was never tested on mature file systems that were
> already fragmented.
>
>
> Do you know if it was separated from crc32 -- for some inexplicable reason,
> if you wanted ftype, then the crc option would be forced on for you.
>

I don't know if there was a specific reason, but that's the way it is.

> I didn't want it as I didn't want it to flag errors in metadata that
> wasn't crucial and didn't want the speed slowdown.  Sigh.
>
> The problem on crc'ing the meta data, is that there is ALOT more meta
> data where detecting it will do more harm than good (like what nanosecond
> the file was last changed, for example).  I first ran into it
> taking the disk offline when I changed the guid on a newly formatted disk.
> That was fixed, but that was a warning shot...   How annoying.
>

I have never heard about those issues that you raise.
It sounds like a myth about XFS metadata CRC that should be debunked
so forwarding your message on to XFS list.
See also https://www.spinics.net/lists/xfs/msg19079.html

>
> From what you say, though only the upper layer needs to have the ftype=1.
> That's a new filesystem, so shouldn't make that much difference, but the
> lower fs's I'd want to use overlays with are older file systems.  But
> it sounds like those can remain as they are?
>
> (assuming they don't become upper layers in some multi-layer
> scenario)...
>

That is correct.

Amir.
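
The ftype check discussed in this message can be sketched in a few lines of Python. This is only a sketch: the function name `ftype_enabled` is hypothetical, and it assumes (as the xfs_info listing quoted above suggests) that the "naming" line carries no ftype field at all when the feature is absent.

```python
import re


def ftype_enabled(xfs_info_output: str) -> bool:
    """Return True if the 'naming' line of xfs_info output reports ftype=1."""
    for line in xfs_info_output.splitlines():
        if line.lstrip().startswith("naming"):
            match = re.search(r"\bftype=(\d+)", line)
            # Assumption: a missing ftype field means the feature is absent.
            return bool(match) and match.group(1) == "1"
    # No 'naming' line found at all: report the feature as not enabled.
    return False
```

In practice the text would come from running xfs_info on the mount point that is meant to serve as the overlay upper layer, since that is the only layer that needs ftype=1 according to the exchange above.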


* Re: Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ????
  2017-11-05  8:55     ` Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ???? Amir Goldstein
@ 2017-11-05 22:34       ` Dave Chinner
  2017-11-08 21:21         ` L A Walsh
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2017-11-05 22:34 UTC
  To: Amir Goldstein; +Cc: L A Walsh, overlayfs, linux-xfs, Darrick J. Wong

On Sun, Nov 05, 2017 at 10:55:40AM +0200, Amir Goldstein wrote:
> [adding cc: linux-xfs]
> 
> On Sun, Nov 5, 2017 at 10:17 AM, L A Walsh <lkml@tlinx.org> wrote:
> > Amir Goldstein wrote:
> >>
> >>
> >>
> >>>
> >>> I then created a new xfs file system and mounted it on '/edge';
> >>>
> >>>    Ishtar:/edge> xfs_info .
> >>>    meta-data=/dev/Data/Edge     isize=256    agcount=32, agsize=16777200 blks
> >>>             =                   sectsz=4096  attr=2
> >>>    data     =                   bsize=4096   blocks=536870400, imaxpct=5
> >>>             =                   sunit=16     swidth=64 blks
> >>>    naming   =version 2          bsize=4096   ascii-ci=0
> >>>    log      =internal           bsize=4096   blocks=262143, version=2
> >>>             =                   sectsz=4096  sunit=1 blks, lazy-count=1
> >>>    realtime =none               extsz=4096   blocks=0, rtextents=0
> >>>
> >>>
> >>
> >>
> >> Your problem is that you do not have "ftype" feature in directory
> >> name format, like this:
> >>
> >> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> >>
> >> Perhaps you have an old version of mkfs.xfs, not sure when
> >> ftype=1 became the default format, but you can try to
> >>   mkfs.xfs -n ftype=1
> >>
> >
> > ----  Ah... no .. last I was told, if you turned on ftype=1,
> > you had to also pull in crc'ing of all the meta-info.
> > That has problems -- causes errors where there would be no
> > problem, and was never tested on mature file systems that were
> > already fragmented.
> >
> >
> > Do you know if it was separated from crc32 -- for some inexplicable reason,
> > if you wanted ftype, then the crc option would be forced on for you.

Are you still getting all worked up about how metadata CRCs and
the v5 on-disk format is going to make the sky fall, Linda? It's
time to give in and come join us on the dark side...

> I don't know if there was a specific reason, but that's the way it is.

ftype was implemented as part of the format changes for the v5
format so it's always enabled for v5 filesystems.  It was introduced
as a mkfs option for the v4 format in early 2014, and since mid-2015
it's been the default for non-crc filesystems:

# mkfs.xfs -f -m crc=0 /dev/vdb
.....
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
.....

Users should try to keep their userspace tools up to date with the
kernel being run.... :)

> > I didn't want it as I didn't want it to flag errors in metadata that
> > wasn't crucial and didn't want the speed slowdown.  Sigh.
> >
> > The problem on crc'ing the meta data, is that there is ALOT more meta
> > data where detecting it will do more harm than good (like what nanosecond
> > the file was last changed, for example).  I first ran into it
> > taking the disk offline when I changed the guid on a newly formatted disk.
> > That was fixed, but that was a warning shot...   How annoying.
> >
> 
> I have never heard about those issues that you raise.
> It sounds like a myth about XFS metadata CRC that should be debunked
> so forwarding your message on to XFS list.

FYI, Amir.

Keep in mind that a lot of people didn't like the concept of
metadata CRCs in XFS because .... reasons.  There has been a history
of people jumping on bugs and/or not-yet-implemented features as
justification for their opposition to the change. Call it the nature
of the vocal minority - most users haven't noticed and don't care
that their new install of their distro of choice is now using CRC
enabled filesystems by default....

As to the issue that Linda raised, yes, it *did* exist.  We baked
the UUID into the metadata format so we knew what filesystem owns a
specific metadata block. Handy for detecting stale metadata on a
reused device as well as misdirected writes.  We knew about it from
the start (all the tools had to be modified to disallow changing
UUIDs on v5 filesystems!) but it just wasn't an important enough
requirement to have this functionality up front for CRC enabled
filesystems.

However, it wasn't clear what the solution was to the "change UUID"
problem when CRCs were ready, and we also needed to understand the
behaviour of cloned v5 filesystems on COW based snapshots before we
made any sort of change that could require rewriting all the
metadata in the filesystem. So it took some time for the issue to
come to the top of the "remaining problems to solve" and when it did
we had already built up enough knowledge about v5 filesystem
behaviour to determine the best way to solve the problem.

IOWs, it was always the plan to support it so that tools like
xfs_copy worked properly with v5 filesystems, but it wasn't a
primary concern compared to making CRCs robust. It was fixed
a couple of years ago:

commit 9c4e12fb60c15dc9c5e54041c9679454b42cb23e
Author: Eric Sandeen <sandeen@sandeen.net>
Date:   Mon Aug 3 10:45:00 2015 +1000

    xfsprogs: Add new sb_meta_uuid field, update userspace tools to manipulate it
    
    This adds a new superblock field, sb_meta_uuid.  This allows us to
    change the user-visible UUID on crc-enabled filesystems from userspace
    if desired, by copying the existing UUID to the new location for
    metadata comparisons.  If this is done, an incompat flag must be
    set to prevent older filesystems from mounting the filesystem, but
    the original UUID can be restored, and the incompat flag removed,
    with a new xfs_db / xfs_admin UUID command, "restore."
    
    Much of this patch mirrors the kernel patch in simply renaming
    the field used for metadata uuid comparison; other bits:
    
    * Teach xfs_db to print the new meta_uuid field
    * Allow xfs_db to generate a new UUID for CRC-enabled filesystems
    * Allow xfs_db to revert to the original UUID and clear the flag
    * Fix up xfs_copy to work with CRC-enabled filesystems
    * Update the xfs_admin manpage to show the UUID "restore" command
    
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    Reviewed-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
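
The mechanism the commit message describes can be modelled roughly as follows. This is a toy sketch, not the real on-disk layout or the xfs_db code: the class, the field names other than the idea of a separate metadata UUID, and the function names are all hypothetical, and the boolean merely stands in for the incompat feature flag.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID


@dataclass
class Superblock:
    """Toy model of the two UUID fields; not the real on-disk layout."""
    uuid: UUID                        # user-visible UUID
    meta_uuid: Optional[UUID] = None  # UUID that metadata blocks were stamped with
    incompat_meta_uuid: bool = False  # stands in for the incompat feature flag


def metadata_uuid(sb: Superblock) -> UUID:
    """UUID that metadata blocks are compared against."""
    return sb.meta_uuid if sb.incompat_meta_uuid else sb.uuid


def change_uuid(sb: Superblock, new_uuid: UUID) -> None:
    """Change the user-visible UUID without touching any metadata block."""
    if not sb.incompat_meta_uuid:
        # First change: keep the original UUID for metadata comparisons.
        sb.meta_uuid = sb.uuid
        sb.incompat_meta_uuid = True
    sb.uuid = new_uuid


def restore_uuid(sb: Superblock) -> None:
    """Revert to the original UUID and clear the incompat flag."""
    if sb.incompat_meta_uuid:
        sb.uuid = sb.meta_uuid
        sb.meta_uuid = None
        sb.incompat_meta_uuid = False
```

The point of the design is that changing the visible UUID never requires rewriting every metadata block; only the superblock changes, at the cost of an incompat flag while the two UUIDs differ.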

> See also https://www.spinics.net/lists/xfs/msg19079.html

Yeah, that was in reaction to the loud claims that "CRCs are going
to slow everything down". Late last year we significantly reduced
the CPU overhead of CRC calculation on the write side, so it drops
off the CPU profiles in the workloads described in that link above
almost entirely. This was the commit:

commit cae028df53449905c944603df624ac94bc619661
Author: Dave Chinner <dchinner@redhat.com>
Date:   Mon Dec 5 14:40:32 2016 +1100

    xfs: optimise CRC updates
    
    Nick Piggin reported that the CRC overhead in an fsync heavy
    workload was higher than expected on a Power8 machine. Part of this
    was to do with the fact that the power8 CRC implementation is not
    efficient for CRC lengths of less than 512 bytes, and so the way we
    split the CRCs over the CRC field means a lot of the CRCs are
    reduced to being less than optimal size.
    
    To optimise this, change the CRC update mechanism to zero the CRC
    field first, and then compute the CRC in one pass over the buffer
    and write the result back into the buffer. We can do this safely
    because anything writing a CRC has exclusive access to the buffer
    the CRC is being calculated over.
    
    We leave the CRC verify code the same - it still splits the CRC
    calculation - because we do not want read-only operations modifying
    the underlying buffer. This is because read-only operations may not
    have an exclusive access to the buffer guaranteed, and so temporary
    modifications could leak out to other processes accessing the
    buffer concurrently.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Dave Chinner <david@fromorbit.com>
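
The two strategies in that commit message can be illustrated with a short Python sketch. It is not the kernel code: zlib.crc32 is used as a stand-in for the CRC32c that XFS uses, and the function names, the 4-byte little-endian field and the field offset are assumptions made for the example.

```python
import zlib

CRC_LEN = 4  # size of the checksum field in bytes (illustrative)


def update_crc_single_pass(buf: bytearray, crc_off: int) -> int:
    """Write path: zero the CRC field, checksum the whole buffer in one pass.

    This modifies the buffer, so the caller needs exclusive access to it.
    """
    buf[crc_off:crc_off + CRC_LEN] = bytes(CRC_LEN)
    crc = zlib.crc32(bytes(buf))
    buf[crc_off:crc_off + CRC_LEN] = crc.to_bytes(CRC_LEN, "little")
    return crc


def verify_crc_split(buf: bytes, crc_off: int) -> bool:
    """Read path: checksum around the CRC field without modifying the buffer."""
    crc = zlib.crc32(buf[:crc_off])               # bytes before the field
    crc = zlib.crc32(bytes(CRC_LEN), crc)         # zeros in place of the field
    crc = zlib.crc32(buf[crc_off + CRC_LEN:], crc)  # bytes after the field
    stored = int.from_bytes(buf[crc_off:crc_off + CRC_LEN], "little")
    return crc == stored
```

Both paths checksum the same byte stream (the buffer with zeros where the CRC lives), so a buffer stamped by the single-pass update still verifies with the split calculation. The split form just costs more short CRC calls, which is what the commit avoids on the write side.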

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ????
  2017-11-05 22:34       ` Dave Chinner
@ 2017-11-08 21:21         ` L A Walsh
  2017-11-09  1:47           ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: L A Walsh @ 2017-11-08 21:21 UTC
  To: Dave Chinner; +Cc: Amir Goldstein, overlayfs, linux-xfs, Darrick J. Wong

Dave Chinner wrote:
> Are you still getting all worked up about how metadata CRCs and
> the v5 on-disk format is going to make the sky fall, Linda? It's
> time to give in and come join us on the dark side...
>   
---
    I don't believe I've heard that the sky would fall.  I only had
2 issues -- 1 that metadata that I didn't care about or that I
wanted to change would be crc'd and prevent changing meta data I wanted
to change or would flag errors in meta data I didn't care about
(file last access time being a nanosecond or a day off due to bit rot
and crc flagging it as an error).

    Maybe you might remember, I first ran into this when,  as part of
my mkfs procedure, I assigned my own value to my disk's UUID, and at the
time, the crc-feature claimed the disk had a fault in it. 

    My second issue was it being tied to the finobt feature in a way that
precluded benchmarking changes on our own filesystems and workload.


>   
>> I don't know if there was a specific reason, but that's the way it is.
>>     
>
> ftype was implemented as part of the format changes for the v5
> format so it's always enabled for v5 filesystems.  It was introduced
> as a mkfs option for the v4 format in early 2014, and since mid-2015
> it's been the default for non-crc filesystems:
>
> # mkfs.xfs -f -m crc=0 /dev/vdb
> .....
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> .....
>
> Users should try to keep their userspace tools up to date with the
> kernel being run.... :)
>   
----
    And userspace tool writers should remember that those who run some
distro may have tools that are 2+ years old and have even been told that
we are running unsupported configurations if we update system tools
(not that this always stops some people).


   I forget -- what switch do I pass to xfs utils to have them tell
me what features are supported (v4 or v5, for example)?

    I do see ftype=0|1 under naming and that it has nothing to do
with crc of data as well as crc and finobt under metadata.

    The problem I had was following the kernel docs for the overlayfs
and not seeing where ftype=1 was required when making an xfs file system.

    It seems like my mkfs supports ftype, but it isn't the default and
I didn't know I was supposed to turn it on.

>   
>> I have never heard about those issues that you raise.
>> It sounds like a myth about XFS metadata CRC that should be debunked
>> so forwarding your message on to XFS list.
>>     
>
> FYI, Amir.
>
> Keep in mind that a lot of people didn't like the concept of
> metadata CRCs in XFS because .... reasons.  
---
    See above for my reasons.


> As to the issue that Linda raised, yes, it *did* exist.  We baked
> the UUID into the metadata format so we knew what filesystem owns a
> specific metadata block. Handy for detecting stale metadata on a
> reused device as well as misdirected writes.  We knew about it from
> the start (all the tools had to be modified to disallow changing
> UUIDs on v5 filesystems!) but it just wasn't an important enough
> requirement to have this functionality up front for CRC enabled
> filesystems.
>   
====
    And you have confirmed 1 of my 2 reasons for disliking the crc
feature -- it sounds like you can no longer set the UUID field on a
new file system.

    Please don't tell people that the sky is falling when you have broken
the ability to change UUID's as was present in the past.  That was a valid
feature -- that I was told would be excluded from crc'ing, but now find
that it can't be done without damaging ability for old systems to read
such file systems.
   
>
> Yeah, that was in reaction to the loud claims that "CRCs are going
> to slow everything down". Late last year we significantly reduced
> the CPU overhead of CRC calculation on the write side, so it drops
> off the CPU profiles in the workloads described in that link above
> almost entirely. This was the commit:
>   
----
    That article had nothing to do w/my concern and predated my involvement.
My concern was tying the finobt feature to the crc feature so they could
not be tested in isolation to allow seeing what the impact of crc's might be,
but more importantly, seeing if finobt had any positive impact on
more mature file systems without including the crc feature.

     Your stance seems to be that the crc feature combined with the
finobt feature doesn't show a measurable slowdown on newly created file systems.

    I would expect that, especially since finobt would benefit more mature
file systems more than newer ones.  While on newer file systems, finobt+crc
comes out to about the same performance.  

    My issue was the inability to bench or use them separately.

    No sky falling, just standard benchmark methodology to test changes
on your own workload.

    But as to the ftype flag -- that was me using v4 tools and seeing
no information that I needed to explicitly specify it to make the overlay
file system work with xfs, which I don't think has
anything to do with crc's.  Right?

-linda






* Re: Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ????
  2017-11-08 21:21         ` L A Walsh
@ 2017-11-09  1:47           ` Dave Chinner
  2017-11-09  7:51             ` L A Walsh
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2017-11-09  1:47 UTC
  To: L A Walsh; +Cc: Amir Goldstein, overlayfs, linux-xfs, Darrick J. Wong

On Wed, Nov 08, 2017 at 01:21:18PM -0800, L A Walsh wrote:
> Dave Chinner wrote:
> >Are you still getting all worked up about how metadata CRCs and
> >the v5 on-disk format is going to make the sky fall, Linda? It's
> >time to give in and come join us on the dark side...
> ---
>    I don't believe I've heard that the sky would fall.  I only had
> 2 issues -- 1 that metadata that I didn't care about or that I
> wanted to change would be crc'd and prevent changing meta data I wanted
> to change or would flag errors in meta data I didn't care about
> (file last access time being a nanosecond or a day off due to bit rot
> and crc flagging it as an error).
> 
>    Maybe you might remember, I first ran into this when,  as part of
> my mkfs procedure, I assigned my own value to my disk's UUID, and at the
> time, the crc-feature claimed the disk had a fault in it.

Yes, but changing the UUID was documented as "not currently
supported" on v5 filesystems *when it was originally released*.
IOWs, it was documented as "will be supported in future", but it
wasn't a critical feature for the initial release of CRC enabled
filesystems.

If someone manually changed the UUID (which was the only way to do
it because the xfs_db commands would refuse to do it) then *it broke
the filesystem* and so it was correct behaviour to report
corruption.

Changing the UUID on v5 filesystems is now implemented and
supported:

$ sudo mkfs.xfs -f /dev/pmem0
Default configuration sourced from package build definitions
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=524288 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=2097152, imaxpct=25, thinblocks=0
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$
$ sudo blkid -c /dev/null /dev/pmem0
/dev/pmem0: UUID="7073fe11-4b44-4160-a8a0-dec492f61a14" TYPE="xfs"
$
$ sudo xfs_admin -U generate /dev/pmem0
Clearing log and setting UUID
writing all SBs
new UUID = c3a4f999-b76a-4597-bb62-df11c5e3fc04
$
$ sudo blkid -c /dev/null /dev/pmem0
/dev/pmem0: UUID="c3a4f999-b76a-4597-bb62-df11c5e3fc04" TYPE="xfs"
$

IOWs, this problem is ancient history. Move on, nothing to see here.

>    My second issue was it being tied to the finobt feature in a way that
> precluded benchmarking changes on our own filesystems and workload.

[....]

>    I would expect that, especially since finobt would benefit more mature
> file systems more than newer ones.  While on newer file systems, finobt+crc
> comes out to about the same performance.
> 
>    My issue was the inability to bench or use them separately.

<sigh>

Not an XFS problem:

$ mkfs.xfs -f -m finobt=0 /dev/pmem0
....
         =                       crc=1        finobt=0, sparse=0, rmapbt=0, reflink=0
.....

Yup, crc's enabled, finobt is not. As documented in the mkfs.xfs man
page.

IOWs, we can directly measure the impact of the finobt on
workloads/benchmarks. And if we want to compare the impact of CRCs,
then 'mkfs.xfs -f -i size=512 -m crc=0 <dev>' will be directly
comparable to the above non-finobt filesystem. This is how we
benchmarked the changes in the first place....
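
The comparison described here boils down to three mkfs.xfs invocations, which can be written down as a small Python sketch. The function name and the dictionary labels are hypothetical; the command lines are the ones shown in this message. The sketch only builds argument lists and does not run anything, since mkfs.xfs -f destroys whatever is on the device.

```python
from typing import Dict, List


def benchmark_mkfs_commands(device: str) -> Dict[str, List[str]]:
    """Return mkfs.xfs argument lists for the three configurations to compare."""
    return {
        # Defaults, which the mkfs output above reports as crc=1, finobt=1.
        "crc+finobt": ["mkfs.xfs", "-f", device],
        # CRCs enabled, free inode btree disabled.
        "crc-only": ["mkfs.xfs", "-f", "-m", "finobt=0", device],
        # No CRCs; isize=512 matches the inode size of the other two.
        "no-crc": ["mkfs.xfs", "-f", "-i", "size=512", "-m", "crc=0", device],
    }
```

Comparing "crc+finobt" against "crc-only" isolates finobt, and comparing "crc-only" against "no-crc" isolates CRCs, with the same inode size in every case.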

Cheers,

Dave.


-- 
Dave Chinner
david@fromorbit.com


* Re: Bug? or normal behavior? if bug, then where? overlay, vfs, xfs, or ????
  2017-11-09  1:47           ` Dave Chinner
@ 2017-11-09  7:51             ` L A Walsh
  0 siblings, 0 replies; 5+ messages in thread
From: L A Walsh @ 2017-11-09  7:51 UTC
  To: Dave Chinner; +Cc: Amir Goldstein, overlayfs, linux-xfs, Darrick J. Wong

Dave Chinner wrote:
>
>
> Changing the UUID on v5 filesystems is now implemented and
> supported:
>   
---
    And it won't have a problem if seen by a previous gen
tool -- like xfsrestore on an older "emergency boot disk"?

    If not, then I misunderstood what you wrote earlier.
>>    My issue was the inability to bench or use them separately.
>>     
>
> <sigh>
>
> Not an XFS problem:
>
> $ mkfs.xfs -f -m finobt=0 /dev/pmem0
> ....
>          =                       crc=1        finobt=0, sparse=0, rmapbt=0, reflink=0
> .....
>
> Yup, crc's enabled, finobt is not. As documented in the mkfs.xfs manpage.
>   

So you are saying I can set finobt to 0 or 1 with crc=0?

Because testing with crc=1 and finobt=0|1 isn't
the same as testing crc=0 and finobt=0|1.

I'm more interested in testing finobt's effect by itself and have
a secondary interest in the effect of the crc option because I would
like to use the finobt option (thus desire to test it first), but
do not currently want to use the crc option, thus it being of secondary
interest.

Again, if I can test finobt 0 or 1 with no requirements I turn on
crc, then I was mistaken in my earlier understanding.

So I still have 2 issues: UUID labels that don't preclude using older
emergency boot disks, to restore a file system, and the ability to
test finobt apart from other features.




>
> IOWs, We can directly measure the impact of the finobt on
> workloads/benchmarks. And if we want to compare the impact of CRCs,
> then 'mkfs.xfs -f -i size=512 -m crc=0 <dev>' will be directly
> comparable to the above non-finobt filesystem. This is how we
> benchmarked the changes in the first place....
>   
----
  

    That methodology is flawed.

    If the crc option is on during finobt being tested as 0 or 1,
the crc option on means different disk caching -- if the crc
option pulls in some or all of the finobt info, then you can't measure
the finobt option with the crc option turned on. Even if you turn the crc
feature off, if the disk has crc features, those also change how
information is read in.

    Only if you create a disk with no crc info on it, and then can test
finobt=0|1, can you see the relative performance of the finobt cases. 

    If it is the case that finobt can be toggled on a disk with no
crc data or option in use, then I've misunderstood previously read
constraints (or they've changed).

-l

