* Can't cp --reflink files on a Ext4-converted FS w/o checksums
@ 2014-11-26 19:55 Roman Mamedov
2014-11-26 23:18 ` Robert White
2014-11-27 3:31 ` Liu Bo
0 siblings, 2 replies; 13+ messages in thread
From: Roman Mamedov @ 2014-11-26 19:55 UTC (permalink / raw)
To: linux-btrfs
Hello,
I used btrfs-convert to switch my FS from Ext4 to Btrfs. As it was a rather
large 10 TB filesystem, to save on the conversion time, I used the "-d,
disable data checksum" option of btrfs-convert.
Turns out now I can't "cp --reflink" any files that were already on the FS
prior to conversion. The error message from cp is "failed to clone [...]
Invalid argument".
I assume this is because of the lack of checksums; the only way to make old
files cloneable is to plain copy them to a different place and then delete the
originals, but that's what I was trying to avoid in the first place.
Also I thought maybe defragmenting will help, but nope, doesn't seem to be the
case, even ordering it to recompress data to a different method doesn't fix
the problem. (Even if it did, it's still a lot of unnecessary rewriting).
Is there really a good reason to stop these files without checksums from being
cloneable? It's not like they have the noCoW attribute, so I'd assume any new
write to these files would cause a CoW and proper checksums for all new blocks
anyways.
--
With respect,
Roman
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 19:55 Can't cp --reflink files on a Ext4-converted FS w/o checksums Roman Mamedov
@ 2014-11-26 23:18 ` Robert White
2014-11-26 23:33 ` Roman Mamedov
2014-11-27 9:27 ` Duncan
2014-11-27 3:31 ` Liu Bo
1 sibling, 2 replies; 13+ messages in thread
From: Robert White @ 2014-11-26 23:18 UTC (permalink / raw)
To: Roman Mamedov, linux-btrfs
On 11/26/2014 11:55 AM, Roman Mamedov wrote:
> Is there really a good reason to stop these files without checksums from being
> cloneable? It's not like they have the noCoW attribute, so I'd assume any new
> write to these files would cause a CoW and proper checksums for all new blocks
> anyways.
The problem seems to be that you are trying to clone a NODATASUM file to
a file that would have data checksums. The resulting file _might_ then
end up with some extents with, and others without, checksums if that
target file were modified.
So you _could_ reflink the file but you'd have to do it to another file
with no data checksums -- which basically means a NOCOW file, or
mounting with nodatasum while you do the reflink, but now you have more
problem files.
linux/fs/btrfs/ioctl.c @ ~ line 2930
	/* don't make the dst file partly checksummed */
	if ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
	    (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM)) {
		ret = -EINVAL;
		goto out_unlock;
	}
Basically if the system allowed you to reflink from a no-data-sum to a
data-sum file the results would be instantly unreadable for failing
every single data checksum test. That or the entire checksum system
would have to be advisory instead of functional.
So yes, there is a good reason.
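The comparison in that kernel snippet can be modelled with a few lines of shell arithmetic. This is only a toy illustration, not kernel code: the helper name clone_allowed is made up here, and the bit values for BTRFS_INODE_NODATASUM and BTRFS_INODE_NODATACOW are assumptions based on the btrfs headers of this era.

```shell
#!/bin/sh
# Toy model of the flag comparison in the clone ioctl; not kernel code.
# Assumed bit values for the btrfs inode flags:
NODATASUM=1   # BTRFS_INODE_NODATASUM, assumed to be (1 << 0)
NODATACOW=2   # BTRFS_INODE_NODATACOW, assumed to be (1 << 1)

# clone_allowed SRC_FLAGS DST_FLAGS (hypothetical helper)
# Succeeds only when both inodes agree on the NODATASUM bit.
clone_allowed() {
    [ $(( $1 & NODATASUM )) -eq $(( $2 & NODATASUM )) ]
}

# Source presumably has NODATASUM (converted with -d); a fresh file in a
# normal directory has no flags, so the bits differ and the clone is refused.
clone_allowed "$NODATASUM" 0 || echo "EINVAL: src unsummed, dst summed"

# A destination created under +C carries NODATACOW and NODATASUM,
# so the NODATASUM bits match and the clone goes through.
clone_allowed "$NODATASUM" $(( NODATACOW | NODATASUM )) && echo "clone ok"
```

Only the NODATASUM bit is compared, which is why a NOCOW destination (which is also unsummed) is accepted.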
As the old proverb goes, "act in haste, repent at leisure".
You kind-of shot yourself in the foot while attempting to save time. You
promised yourself that you didn't need the checksums.
Now at this point I'm going to make a few guesses...
I don't see _anywhere_ in the kernel source or btrfs-progs where
BTRFS_INODE_NODATASUM is explicitly cleared from any inode ever. It
might be implicitly cleared somewhere but I couldn't find it. So all
those unsummed files are probably unsummed for life.
I also don't see anything in the code that says "this ioctl will create
the checksums for the selected file" so you may have to do the copy you
tried to avoid.
Sorry man...
If you haven't done much with the file system yet, you might want to
reverse the conversion and do it again.
Otherwise, you will want to copy the files long-hand, possibly in
batches if you are more than 50% full on disk space.
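A minimal sketch of such a batched copy, assuming the destination is a normal directory (no +C, filesystem not mounted with nodatasum) that lies outside the source tree, and assuming cp performs a real data copy as it did by default at the time of this thread. The helper name recopy_in_batches is made up for this example.

```shell
#!/bin/sh
# recopy_in_batches SRC DST (hypothetical helper)
# Copies each top-level entry of SRC into DST with a normal data copy and
# deletes the original only after that entry was copied successfully, so
# the extra space needed is roughly one entry at a time.
# DST must not be inside SRC.
recopy_in_batches() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for entry in "$src"/* "$src"/.[!.]* "$src"/..?*; do
        # Skip glob patterns that matched nothing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        cp -a "$entry" "$dst"/ || return 1
        rm -rf "$entry" || return 1
    done
}

# Usage (paths are placeholders):
#   recopy_in_batches /mnt/data/Music.orig /mnt/data/Music
```

Because each original is removed right after its copy succeeds, an interrupted run can simply be started again; entries already moved are no longer in SRC.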
On the bright side...
This would be the perfect moment to think about your backup/snapshot
scheme. I always have a /whatever (e.g. /__System for a root partition)
as my default subvolume that I normally mount. When I do my backups I
mount -o subvol=/ /dev/whathaveyou /mnt/maintenance and then do my
snapshots in /mnt/maintenance. That way every file in my N snapshots
doesn't show up in the output of locate N+1 times. (e.g. this lets me
"hide" my local snapshots/backups from normal operations)
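The scheme described above could look roughly like the following. To stay harmless, the sketch only prints the commands instead of running them. The helper name print_snapshot_plan and the snapshots/ directory under the maintenance mount are assumptions for illustration; the device and mount point are the placeholders used above.

```shell
#!/bin/sh
# print_snapshot_plan DEVICE MAINT_MOUNT SUBVOL STAMP (hypothetical helper)
# Prints the commands rather than executing them, so nothing is modified.
# Assumes a directory named snapshots/ exists at the top level of the
# filesystem; that layout is an assumption, not something from the thread.
print_snapshot_plan() {
    dev=$1
    mnt=$2
    subvol=$3
    stamp=$4
    echo "mount -o subvol=/ $dev $mnt"
    echo "btrfs subvolume snapshot -r $mnt/$subvol $mnt/snapshots/$subvol.$stamp"
    echo "umount $mnt"
}

print_snapshot_plan /dev/whathaveyou /mnt/maintenance __System "$(date +%Y-%m-%d)"
```

Since the snapshots live outside the default subvolume, they are invisible once the maintenance mount is gone.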
Also, now would be the perfect time to add compression to your default
regime. Compressing the files only happens on write and so using
compression involves a copy anyway.
-- Rob.
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 23:18 ` Robert White
@ 2014-11-26 23:33 ` Roman Mamedov
2014-11-27 0:00 ` Robert White
2014-11-27 0:20 ` Robert White
2014-11-27 9:27 ` Duncan
1 sibling, 2 replies; 13+ messages in thread
From: Roman Mamedov @ 2014-11-26 23:33 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs
On Wed, 26 Nov 2014 15:18:26 -0800
Robert White <rwhite@pobox.com> wrote:
> So you _could_ reflink the file but you'd have to do it to another file
> with no data checksums -- which basically means a NOCOW file, or
> mounting with nodatasum while you do the reflink, but now you have more
> problem files.
Bingo!!! A cp --reflink to a destination that's been made chattr +C prior to
that, works perfectly.
My goal was to convert regular top-level directories into subvolumes (for
further snapshotting). With that trick, I've been able to do that now w/o
issues.
$ mv Music Music.orig
$ sudo btrfs sub create Music
Create subvolume './Music'
$ sudo chattr +C Music
$ sudo cp -a --reflink Music.orig/* Music/
$
Finished with no rewriting necessary. After that I recursively-removed the +C
attribute from all newly reflinked files, and cp --reflink as well as
snapshotting of those works fine.
Thanks for the idea. :)
--
With respect,
Roman
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 23:33 ` Roman Mamedov
@ 2014-11-27 0:00 ` Robert White
2014-11-27 0:20 ` Roman Mamedov
2014-11-27 0:20 ` Robert White
1 sibling, 1 reply; 13+ messages in thread
From: Robert White @ 2014-11-27 0:00 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 11/26/2014 03:33 PM, Roman Mamedov wrote:
> On Wed, 26 Nov 2014 15:18:26 -0800
> Robert White <rwhite@pobox.com> wrote:
>
>> So you _could_ reflink the file but you'd have to do it to another file
>> with no data checksums -- which basically means a NOCOW file, or
>> mounting with nodatasum while you do the reflink, but now you have more
>> problem files.
>
> Bingo!!! A cp --reflink to a destination that's been made chattr +C prior to
> that, works perfectly.
>
> My goal was to convert regular top-level directories into subvolumes (for
> further snapshotting). With that trick, I've been able to do that now w/o
> issues.
>
> $ mv Music Music.orig
> $ sudo btrfs sub create Music
> Create subvolume './Music'
> $ sudo chattr +C Music
> $ sudo cp -a --reflink Music.orig/* Music/
> $
>
> Finished with no rewriting necessary. After that I recursively-removed the +C
> attribute from all newly reflinked files, and cp --reflink as well as
> snapshotting of those works fine.
>
> Thanks for the idea. :)
>
Uh... you may _still_ have no checksums on any of those data extents.
They are not going to come back until you write them to a normal file
with a normal copy. So you may be lacking most of the data validation
features of this filesystem. For instance you can, and always could,
snapshot a NODATACOW/NODATASUM file just fine (I call it 1COW mode).
Setting NODATACOW sets NODATASUM...
Clearing NODATACOW does _not_ clear NODATASUM (at least not on a
non-empty file) as near as I can tell, so that directory hierarchy and
its subsequent snapshots is likely "less safe" than you think.
You might want to go experiment. Make another new subvol (or at least a
directory in a directory/root/subvol that never had the +C attribute
set) and see if you can cp --reflink any of these files into that
subdirectory without repeating the +C trick.
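The experiment can be wrapped in a small probe. This is a sketch only: the helper name can_reflink and the probe file name are made up, and it assumes a cp that understands --reflink=always (as used elsewhere in this thread).

```shell
#!/bin/sh
# can_reflink FILE DIR (hypothetical helper)
# Tries to clone FILE into DIR under a throwaway name and removes the
# probe again. Returns 0 if the clone worked, non-zero otherwise.
can_reflink() {
    probe="$2/.reflink-probe.$$"
    cp --reflink=always "$1" "$probe" 2>/dev/null
    rc=$?
    # Clean up whether or not cp left something behind.
    rm -f "$probe"
    return "$rc"
}

# Usage (paths are placeholders): pick one of the reflinked files and a
# directory that never had +C set.
#   if can_reflink Music/some-file /mnt/plain-dir; then
#       echo "cloneable into a normal directory"
#   else
#       echo "still refused"
#   fi
```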
Basically scrubbing and mirroring and "sending" your Music folder is an
unprotected and unverified operation the way you've done this (if my
reading of the code is correct).
You _really_ might be better off spending the time and doing the copy to
a normal directory.
For instance without checksums if you "btrfs scrub" your volume it will
read the file but it can't know if the file got corrupted, it can only
tell you if the disk read completed without error. (there's a
whole degenerate/simplified path in the code for scrubbing un-summed files).
So seriously, you might be "Saving yourself time" right into a future
data loss.
Take this as a "you've been warned".
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 0:00 ` Robert White
@ 2014-11-27 0:20 ` Roman Mamedov
2014-11-27 0:31 ` Robert White
0 siblings, 1 reply; 13+ messages in thread
From: Roman Mamedov @ 2014-11-27 0:20 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs
On Wed, 26 Nov 2014 16:00:23 -0800
Robert White <rwhite@pobox.com> wrote:
> Uh... you may _still_ have no checksums on any of those data extents.
> They are not going to come back until you write them to a normal file
> with a normal copy. So you may be lacking most of the data validation
> features of this filesystem.
Well, this FS is coming from being Ext4 for years, so it's not worse off now
than it was before. And anyways the main feature that I wanted were snapshots.
> You might want to go experiment. Make another new subvol (or at least a
> directory in a directory/root/subvol that never had the +C attribute
> set) and see if you can cp --reflink any of these files into that
> subdirectory without repeating the +C trick.
Ha, indeed I can't. Maybe there should be a way to generate checksums without
rewriting files, just via reading them, then calculating and writing checksum
to metadata.
> Clearing NODATACOW does _not_ clear NODATASUM (at least not on a
> non-empty file) as near as I can tell, so that directory hierarchy and
> its subsequent snapshots is likely "less safe" than you think.
The nodatasum flag also isn't accessible via chattr, is it?
--
With respect,
Roman
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 23:33 ` Roman Mamedov
2014-11-27 0:00 ` Robert White
@ 2014-11-27 0:20 ` Robert White
2014-11-27 0:28 ` Roman Mamedov
1 sibling, 1 reply; 13+ messages in thread
From: Robert White @ 2014-11-27 0:20 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 11/26/2014 03:33 PM, Roman Mamedov wrote:
> Finished with no rewriting necessary. After that I recursively-removed the +C
> attribute from all newly reflinked files, and cp --reflink as well as
> snapshotting of those works fine.
I did some double checking and I think you'll find that if you lsattr
those files they still have the C (NoCOW) attribute, which also means
they are still unsummed.
Which also means that when you cp --reflink them the target files you
create are also NoCOW.
So you harmonized the lack of checksums with the linux-level C
attribute. This has hidden your problem but not fixed it.
(Trying to clear the NOCOW attribute on a file in BTRFS is _silently_
ignored as invalid. That recursive removal only changed the directories.)
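A quick way to see which files still carry the attribute is to look at the flags column of lsattr. The sketch below assumes the usual "flags name" output format of lsattr; both helper names are made up for this example.

```shell
#!/bin/sh
# has_nocow_flag LINE (hypothetical helper)
# LINE is one line of lsattr output, e.g. "---------------C ./file".
# The first space-separated field holds the flags; succeed if it has a C.
has_nocow_flag() {
    flags=${1%% *}
    case $flags in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}

# list_nocow_files DIR (hypothetical helper)
# Prints the regular files below DIR that still have the C attribute.
list_nocow_files() {
    find "$1" -type f -exec lsattr {} + 2>/dev/null |
    while IFS= read -r line; do
        if has_nocow_flag "$line"; then
            printf '%s\n' "${line#* }"
        fi
    done
}

# Usage (path is a placeholder):
#   list_nocow_files Music
```

Only the flags field is inspected, so a capital C in a file name does not cause a false match.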
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 0:20 ` Robert White
@ 2014-11-27 0:28 ` Roman Mamedov
2014-11-27 0:45 ` Robert White
0 siblings, 1 reply; 13+ messages in thread
From: Roman Mamedov @ 2014-11-27 0:28 UTC (permalink / raw)
To: Robert White; +Cc: linux-btrfs
On Wed, 26 Nov 2014 16:20:44 -0800
Robert White <rwhite@pobox.com> wrote:
> I did some double checking and I think you'll find that if you lsattr
> those files they still have the C (NoCOW) attribute, which also means
> they are still unsummed.
Indeed, I looked at the top level only, which had just directories.
> (Trying to clear the NOCOW attribute on a file in BTRFS is _silently_
> ignored as invalid. That recursive removal only changed the directories.)
And the chattr command even completes with a zero exit code, this is rather
unexpected.
$ lsattr 0000\ MP3.m3u; chattr -C 0000\ MP3.m3u && echo Yay; lsattr 0000\ MP3.m3u
---------------C 0000 MP3.m3u
Yay
---------------C 0000 MP3.m3u
--
With respect,
Roman
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 0:20 ` Roman Mamedov
@ 2014-11-27 0:31 ` Robert White
2014-11-27 0:57 ` Robert White
0 siblings, 1 reply; 13+ messages in thread
From: Robert White @ 2014-11-27 0:31 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 11/26/2014 04:20 PM, Roman Mamedov wrote:
> On Wed, 26 Nov 2014 16:00:23 -0800
> Robert White <rwhite@pobox.com> wrote:
>
>> Uh... you may _still_ have no checksums on any of those data extents.
>> They are not going to come back until you write them to a normal file
>> with a normal copy. So you may be lacking most of the data validation
>> features of this filesystem.
>
> Well, this FS is coming from being Ext4 for years, so it's not worse off now
> than it was before. And anyways the main feature that I wanted were snapshots.
Given that you won't be able to scrub the data and BTRFS is usable but
still a little brittle, it might be a little worse off than you think.
Plus if you ever find yourself in need of balancing you've tossed out
one level of data integrity right here at the start.
>> You might want to go experiment. Make another new subvol (or at least a
>> directory in a directory/root/subvol that never had the +C attribute
>> set) and see if you can cp --reflink any of these files into that
>> subdirectory without repeating the +C trick.
>
> Ha, indeed I can't. Maybe there should be a way to generate checksums without
> rewriting files, just via reading them, then calculating and writing checksum
> to metadata.
That problem would be "computationally hard" because you'd have to
verify that no other file was using that extent before you put that
extent under control of the csum machinery, otherwise you might
later break the COW promise when the file that knows those blocks by
their checksum changes the contents out from underneath the other
references.
>> Clearing NODATACOW does _not_ clear NODATASUM (at least not on a
>> non-empty file) as near as I can tell, so that directory hierarchy and
>> its subsequent snapshots is likely "less safe" than you think.
>
> The nodatasum flag also isn't accessible via chattr, is it?
It is not. NODATASUM and NODATACOW are private features. The linux
kernel has no general concept of data checksums. The BTRFS guys mapped
the NODATACOW attribute onto the existing lsattr/chattr bit because it
was defined for another file system long ago.
You'll also find that you can not set or clear the C attr on a file in a
BTRFS unless its size is zero. So all your files now have the C
attribute more-or-less forever. Only a normal copy operation (e.g.
allocating new extents and writing the data into them with the checksum
feature in force) will change that.
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 0:28 ` Roman Mamedov
@ 2014-11-27 0:45 ` Robert White
0 siblings, 0 replies; 13+ messages in thread
From: Robert White @ 2014-11-27 0:45 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 11/26/2014 04:28 PM, Roman Mamedov wrote:
> On Wed, 26 Nov 2014 16:20:44 -0800
> Robert White <rwhite@pobox.com> wrote:
>> (Trying to clear the NOCOW attribute on a file in BTRFS is _silently_
>> ignored as invalid. That recursive removal only changed the directories.)
>
> And the chattr command even completes with a zero exit code, this is rather
> unexpected.
That's what "silently" means in this context. I didn't pick the result,
and it's not what I would have done. I've got no idea if this was ever
discussed at any length for pros-and-cons. I could make an argument for
the silent result, or against it. Since the attribute is immutable there
really isn't a "nope, that's just not possible dave" errno value to
return that isn't as confusing as just skipping it.
The closest result code would be EOPNOTSUPP (operation not supported) but
changing attributes _is_ supported, just not that particular attribute
in that particular circumstance.
Also, the "set attributes" call sets all the attributes at once so there
is no way to say which attribute was rejected. As such, a "do what you
can and let the people check the result" behavior is not at all
unreasonable.
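In practice "check the result" means re-reading the flags after the call rather than trusting the exit code. A minimal sketch, assuming the usual "flags name" output format of lsattr; the helper name clear_nocow_checked and its return codes are made up for this example.

```shell
#!/bin/sh
# clear_nocow_checked FILE (hypothetical helper)
# Runs chattr -C, then re-reads the flags with lsattr instead of trusting
# the exit code. Returns 0 only if the C flag is really gone, 2 if chattr
# reported success but the flag is still there, 1 on any other failure.
clear_nocow_checked() {
    chattr -C "$1" || return 1
    flags=$(lsattr -d "$1") || return 1
    flags=${flags%% *}
    case $flags in
        *C*)
            echo "chattr reported success, but C is still set on $1" >&2
            return 2
            ;;
    esac
    return 0
}

# Usage (file name is a placeholder):
#   clear_nocow_checked "0000 MP3.m3u" || echo "attribute not cleared"
```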
Life is full of flaws. 8-)
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 0:31 ` Robert White
@ 2014-11-27 0:57 ` Robert White
0 siblings, 0 replies; 13+ messages in thread
From: Robert White @ 2014-11-27 0:57 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On 11/26/2014 04:31 PM, Robert White wrote:
> On 11/26/2014 04:20 PM, Roman Mamedov wrote:
>> On Wed, 26 Nov 2014 16:00:23 -0800
>> Robert White <rwhite@pobox.com> wrote:
>>> You might want to go experiment. Make another new subvol (or at least a
>>> directory in a directory/root/subvol that never had the +C attribute
>>> set) and see if you can cp --reflink any of these files into that
>>> subdirectory without repeating the +C trick.
>>
>> Ha, indeed I can't. Maybe there should be a way to generate checksums
>> without
>> rewriting files, just via reading them, then calculating and writing
>> checksum
>> to metadata.
>
> That problem would be "computationally hard" because you'd have to
> verify that no other file was using that extent before you put that
> extent under control of the csum machinery, otherwise you might
> later break the COW promise when the file that knows those blocks by
> their checksum changes the contents out from underneath the other
> references.
I explained this poorly/incorrectly...
So some guy like yourself converts a file system, or mounts an existing
file system with nodatasum and creates some file. As a result there is a
file called /One that has no checksums on its extents.
Then the guy creates a directory and sets +C on it, and copies the file
into that directory with "cp --reflink /One /d/Two". File /d/Two is a
no-cow file.
Now the guy comes back and decides to put the data checksums onto the
extents of /One.
At this moment everything _looks_ fine.
Then the guy alters the first byte of /d/Two, which modifies the no-cow
file in place.
Now the guy tries to read /One ... what happens? The checksum doesn't
match and the data has changed because of the NOCOW.
(So I think, in practice, the 1COW mechanism prevents this just like it
works for snapshots)
But that's a _lot_ of corner cases that can be solved by telling someone
to copy the file if they want the checksums to be recreated.
e.g. the set of all files and all possibilities gets well into the
"computationally hard" end of the swimming pool.
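The /One and /d/Two scenario can be imitated in user space with an ordinary file and cksum. This is only an analogy under stated assumptions: one temporary file stands in for the shared extent, cksum stands in for the btrfs checksum, and the helper name simulate_inplace_write is made up. It does not touch btrfs at all.

```shell
#!/bin/sh
# Toy user-space model of the /One and /d/Two scenario; it only imitates
# the effect with an ordinary file and cksum.
simulate_inplace_write() {
    extent=$(mktemp) || return 1
    # The extent shared by /One and /d/Two.
    printf 'original data' > "$extent"
    # Checksum added later on behalf of /One.
    stored=$(cksum < "$extent")
    # /d/Two is NOCOW, so its first byte is overwritten in place.
    printf 'X' | dd of="$extent" bs=1 count=1 conv=notrunc 2>/dev/null
    # What a read of /One would now compute.
    current=$(cksum < "$extent")
    rm -f "$extent"
    if [ "$stored" = "$current" ]; then
        echo match
    else
        echo mismatch
    fi
}

simulate_inplace_write   # prints "mismatch"
```

The stored checksum was correct when it was taken; it is the later in-place write through the other reference that invalidates it.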
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 19:55 Can't cp --reflink files on a Ext4-converted FS w/o checksums Roman Mamedov
2014-11-26 23:18 ` Robert White
@ 2014-11-27 3:31 ` Liu Bo
1 sibling, 0 replies; 13+ messages in thread
From: Liu Bo @ 2014-11-27 3:31 UTC (permalink / raw)
To: Roman Mamedov; +Cc: linux-btrfs
On Thu, Nov 27, 2014 at 12:55:27AM +0500, Roman Mamedov wrote:
> Hello,
>
> I used btrfs-convert to switch my FS from Ext4 to Btrfs. As it was a rather
> large 10 TB filesystem, to save on the conversion time, I used the "-d,
> disable data checksum" option of btrfs-convert.
>
> Turns out now I can't "cp --reflink" any files that were already on the FS
> prior to conversion. The error message from cp is "failed to clone [...]
> Invalid argument".
>
> I assume this is because of the lack of checksums; the only way to make old
> files cloneable is to plain copy them to a different place and then delete the
> originals, but that's what I was trying to avoid in the first place.
>
> Also I thought maybe defragmenting will help, but nope, doesn't seem to be the
> case, even ordering it to recompress data to a different method doesn't fix
> the problem. (Even if it did, it's still a lot of unnecessary rewriting).
>
> Is there really a good reason to stop these files without checksums from being
> cloneable? It's not like they have the noCoW attribute, so I'd assume any new
> write to these files would cause a CoW and proper checksums for all new blocks
> anyways.
Just FYI, I recently sent a patch[1] to fix btrfs-convert's checksum
problem: without the fix it produces checksums for empty extents, which
makes the already slow btrfs-convert even slower.
With this patch, you may try to convert your ext4 without disabling
checksums and see if the time improves.
[1]: Btrfs-progs: fix a bug of converting sparse ext2/3/4
https://patchwork.kernel.org/patch/5374741/
thanks,
-liubo
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-26 23:18 ` Robert White
2014-11-26 23:33 ` Roman Mamedov
@ 2014-11-27 9:27 ` Duncan
2014-11-28 7:12 ` Robert White
1 sibling, 1 reply; 13+ messages in thread
From: Duncan @ 2014-11-27 9:27 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Wed, 26 Nov 2014 15:18:26 -0800 as excerpted:
> I also don't see anything in the code that says "this ioctl will create
> the checksums for the selected file" so you may have to do the copy you
> tried to avoid.
Note that btrfs check has an --init-csum-tree switch. In a new enough
btrfs-progs, I think it even works as one might expect! (In older
versions it would init the tree... by zeroing it for everything!)
FWIW I'm running btrfs-progs v3.17.1 here, but I've not updated in a few
days and think I might have seen someone mention v3.17.2.
But I've not tested it. The normal btrfs check caution applies: Before
using btrfs check in anything other than read-only mode (without any of
the repair/init options), have a backup if you care about anything on the
filesystem, as there's a chance it might eat it instead of fixing it.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Can't cp --reflink files on a Ext4-converted FS w/o checksums
2014-11-27 9:27 ` Duncan
@ 2014-11-28 7:12 ` Robert White
0 siblings, 0 replies; 13+ messages in thread
From: Robert White @ 2014-11-28 7:12 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 11/27/2014 01:27 AM, Duncan wrote:
> Robert White posted on Wed, 26 Nov 2014 15:18:26 -0800 as excerpted:
>
>> I also don't see anything in the code that says "this ioctl will create
>> the checksums for the selected file" so you may have to do the copy you
>> tried to avoid.
> Note that btrfs check has an --init-csum-tree switch.
I thought about that, but I doubt it's going to go through all the
inodes and clear the NODATASUM bit from the inode flags where it's been
set by something such as the conversion using -d or setting the
NODATACOW flag (e.g. the +C attribute).
So while that will, hopefully, recalculate the checksums on the regular
files I don't think it would have fixed his problem since those files
weren't "regular" at that point.