fsck: to repair or not to repair

linux-btrfs.vger.kernel.org archive mirror

* fsck: to repair or not to repair
@ 2016-05-11 21:10 Nikolaus Rath
  2016-05-12 17:02 ` Henk Slager
  2016-06-10  3:40 ` Nikolaus Rath
  0 siblings, 2 replies; 20+ messages in thread
From: Nikolaus Rath @ 2016-05-11 21:10 UTC (permalink / raw)
  To: linux-btrfs

Hello,

I recently ran btrfsck on one of my file systems, and got the following
messages:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149867 errors 400, nbytes wrong
root 5 inode 3150237 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
root 5 inode 3150242 errors 400, nbytes wrong
root 5 inode 3150260 errors 400, nbytes wrong
[ lots of similar message with different inode numbers ]
root 5 inode 15595011 errors 400, nbytes wrong
root 5 inode 15595016 errors 400, nbytes wrong
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
found 263648960636 bytes used err is 1
total csum bytes: 395314372
total tree bytes: 908644352
total fs tree bytes: 352735232
total extent tree bytes: 95039488
btree space waste bytes: 156301160
file data blocks allocated: 675209801728
 referenced 410351722496
Btrfs v3.17



Can someone explain to me the risk that I run by attempting a repair,
and (conversely) what I put at stake when continuing to use this file
system as-is?
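
For anyone wanting to work with the list of affected inodes, a minimal
sketch that pulls them out of a saved check log. It assumes the log has
the same line format as the output above; the function name and the log
file name are made up for the example.

```shell
# Print the unique inode numbers that a saved check log flags with
# "errors 400" in the given root (default: root 5, as in the output above).
nbytes_inodes() {
    log=$1
    root=${2:-5}
    awk -v root="$root" '
        $1 == "root" && $2 == root && $3 == "inode" &&
        $5 == "errors" && $6 == "400," { print $4 }
    ' "$log" | sort -n -u
}

# Hypothetical usage (file names are placeholders):
#   btrfsck /dev/mapper/vg0-nikratio_crypt > check.log 2>&1
#   nbytes_inodes check.log > affected-inodes.txt
```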


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: fsck: to repair or not to repair
  2016-05-11 21:10 fsck: to repair or not to repair Nikolaus Rath
@ 2016-05-12 17:02 ` Henk Slager
  2016-05-12 17:35   ` Nikolaus Rath
  2016-05-13  6:36   ` Duncan
  2016-06-10  3:40 ` Nikolaus Rath
  1 sibling, 2 replies; 20+ messages in thread
From: Henk Slager @ 2016-05-12 17:02 UTC (permalink / raw)
  To: Nikolaus Rath; +Cc: linux-btrfs

On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hello,
>
> I recently ran btrfsck on one of my file systems, and got the following
> messages:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [ lots of similar message with different inode numbers ]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
>  referenced 410351722496
> Btrfs v3.17
>
>
>
> Can someone explain to me the risk that I run by attempting a repair,
> and (conversely) what I put at stake when continuing to use this file
> system as-is?

It was once mentioned on this mailing list that if 'errors 400,
nbytes wrong' is the only error on an fs, btrfs check --repair can fix
it (around the time of the tools 4.4 release, by Qu AFAIK).
I had (have?) about 7 of those errors in small files on an fs that is
2.5 years old and has quite a few older ro snapshots. I once tried to
fix them with the 4.5.0 tools plus some patches, but they did not
actually get fixed. With the 4.5.2 or 4.5.3 tools, at least, it should
be possible to fix them in your case. You may first want to test it on
an overlay of the device or on a copy of the whole fs made with dd. It
depends on how much time you can allow the fs to be offline, etc.; it
is up to you.
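
A rough sketch of the dd route, assuming there is enough scratch space
for a full image and that the filesystem is unmounted while it is
copied. The function name and the paths are placeholders. The btrfs
tools can operate on an image file directly, so the copy never has to be
attached as a second block device.

```shell
# Copy SRC (a device node or a file) into the regular file IMG and
# verify that the copy is byte-identical.
clone_to_image() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=1M conv=fsync 2>/dev/null || return 1
    cmp -s "$src" "$img"
}

# Hypothetical usage, with the filesystem unmounted:
#   clone_to_image /dev/mapper/vg0-nikratio_crypt /mnt/scratch/fs.img
#   btrfs check /mnt/scratch/fs.img            # read-only by default
#   btrfs check --repair /mnt/scratch/fs.img   # trial repair, copy only
```

If the trial repair on the image goes wrong, the original is untouched
and the image can simply be deleted.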

In my case, I recreated the files in the working subvol, but I assume
that as long as I don't remove the older snapshots, the errors 400 will
still be there. At least I don't experience any negative impact from
them, so I am keeping it as it is until at some point the older
snapshots get removed or I am somehow forced to clone the data back
into a fresh fs. I am mostly running the latest stable kernel, or
sometimes mainline.


* Re: fsck: to repair or not to repair
  2016-05-12 17:02 ` Henk Slager
@ 2016-05-12 17:35   ` Nikolaus Rath
  2016-05-12 17:55     ` Ashish Samant
  2016-05-13  6:36   ` Duncan
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolaus Rath @ 2016-05-12 17:35 UTC (permalink / raw)
  To: linux-btrfs

On May 12 2016, Henk Slager <eye1tm@gmail.com> wrote:
> On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> Hello,
>>
>> I recently ran btrfsck on one of my file systems, and got the following
>> messages:
>>
>> checking extents
>> checking free space cache
>> checking fs roots
>> root 5 inode 3149867 errors 400, nbytes wrong
>> root 5 inode 3150237 errors 400, nbytes wrong
>> root 5 inode 3150238 errors 400, nbytes wrong
>> root 5 inode 3150242 errors 400, nbytes wrong
>> root 5 inode 3150260 errors 400, nbytes wrong
>> [ lots of similar message with different inode numbers ]
>> root 5 inode 15595011 errors 400, nbytes wrong
>> root 5 inode 15595016 errors 400, nbytes wrong
>> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
>> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>> found 263648960636 bytes used err is 1
>> total csum bytes: 395314372
>> total tree bytes: 908644352
>> total fs tree bytes: 352735232
>> total extent tree bytes: 95039488
>> btree space waste bytes: 156301160
>> file data blocks allocated: 675209801728
>>  referenced 410351722496
>> Btrfs v3.17
>>
>>
>>
>> Can someone explain to me the risk that I run by attempting a repair,
>> and (conversely) what I put at stake when continuing to use this file
>> system as-is?
>
> It has once been mentioned in this mail-list, that if the 'errors 400,
> nbytes wrong' is the only error on an fs, btrfs check --repair can fix
> them ( was around time of tools release 4.4 , by Qu AFAIK).
> I had /(have?) about 7 of those errors in small files on an fs that is
> 2.5 years old and has quite some older ro snapshots. I once tried to
> fix them with 4.5.0 + some patches tools, but actually they did not
> get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to
> fix them in your case. Maybe you first want to test it on an overlay
> of the device or copy the whole fs with dd. It depends on how much
> time you can allow the fs to be offline etc, it is up to you.
>
> In my case, I recreated the files in the working subvol, but as long
[...]

How did you determine which files were affected? Is there a way to map
inodes to paths?


Thanks!
-Nikolaus



* Re: fsck: to repair or not to repair
  2016-05-12 17:35   ` Nikolaus Rath
@ 2016-05-12 17:55     ` Ashish Samant
  0 siblings, 0 replies; 20+ messages in thread
From: Ashish Samant @ 2016-05-12 17:55 UTC (permalink / raw)
  To: Nikolaus Rath, linux-btrfs



On 05/12/2016 10:35 AM, Nikolaus Rath wrote:
> On May 12 2016, Henk Slager <eye1tm@gmail.com> wrote:
>> On Wed, May 11, 2016 at 11:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>> Hello,
>>>
>>> I recently ran btrfsck on one of my file systems, and got the following
>>> messages:
>>>
>>> checking extents
>>> checking free space cache
>>> checking fs roots
>>> root 5 inode 3149867 errors 400, nbytes wrong
>>> root 5 inode 3150237 errors 400, nbytes wrong
>>> root 5 inode 3150238 errors 400, nbytes wrong
>>> root 5 inode 3150242 errors 400, nbytes wrong
>>> root 5 inode 3150260 errors 400, nbytes wrong
>>> [ lots of similar message with different inode numbers ]
>>> root 5 inode 15595011 errors 400, nbytes wrong
>>> root 5 inode 15595016 errors 400, nbytes wrong
>>> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
>>> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>>> found 263648960636 bytes used err is 1
>>> total csum bytes: 395314372
>>> total tree bytes: 908644352
>>> total fs tree bytes: 352735232
>>> total extent tree bytes: 95039488
>>> btree space waste bytes: 156301160
>>> file data blocks allocated: 675209801728
>>>   referenced 410351722496
>>> Btrfs v3.17
>>>
>>>
>>>
>>> Can someone explain to me the risk that I run by attempting a repair,
>>> and (conversely) what I put at stake when continuing to use this file
>>> system as-is?
>> It has once been mentioned in this mail-list, that if the 'errors 400,
>> nbytes wrong' is the only error on an fs, btrfs check --repair can fix
>> them ( was around time of tools release 4.4 , by Qu AFAIK).
>> I had /(have?) about 7 of those errors in small files on an fs that is
>> 2.5 years old and has quite some older ro snapshots. I once tried to
>> fix them with 4.5.0 + some patches tools, but actually they did not
>> get fixed. At least with 4.5.2 or 4.5.3 tools it should be possible to
>> fix them in your case. Maybe you first want to test it on an overlay
>> of the device or copy the whole fs with dd. It depends on how much
>> time you can allow the fs to be offline etc, it is up to you.
>>
>> In my case, I recreated the files in the working subvol, but as long
> [...]
>
> How did you determine which files were affected? Is there a way to map
> inodes to paths?
  btrfs inspect-internal inode-resolve <inode> <path>

This resolves <inode> in the subvolume at <path> to its filesystem
path(s).
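
To resolve a whole list of inodes in one go, something like the
following should work. It is only a sketch: the function name, the list
file and the mount point are placeholders, and it has to run as root
against the mounted subvolume.

```shell
# Read inode numbers (one per line) from LIST and print the path(s) each
# one resolves to inside the subvolume mounted at MNT.
resolve_inodes() {
    list=$1
    mnt=$2
    while IFS= read -r ino; do
        [ -n "$ino" ] || continue      # skip blank lines
        btrfs inspect-internal inode-resolve "$ino" "$mnt" < /dev/null ||
            echo "inode $ino: could not be resolved" >&2
    done < "$list"
}

# Hypothetical usage:
#   resolve_inodes affected-inodes.txt /mnt/data > affected-files.txt
```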

Thanks,
Ashish
>
>
> Thanks!
> -Nikolaus
>



* Re: fsck: to repair or not to repair
  2016-05-12 17:02 ` Henk Slager
  2016-05-12 17:35   ` Nikolaus Rath
@ 2016-05-13  6:36   ` Duncan
  2016-05-13 15:28     ` Nikolaus Rath
  1 sibling, 1 reply; 20+ messages in thread
From: Duncan @ 2016-05-13  6:36 UTC (permalink / raw)
  To: linux-btrfs

Henk Slager posted on Thu, 12 May 2016 19:02:56 +0200 as excerpted:

>  Maybe you first want to test it on an overlay
> of the device or copy the whole fs with dd.

WARNING!

Because btrfs can be multi-device, it needs some way to track which 
devices belong to each filesystem, and it uses filesystem UUID for this 
purpose.

If you clone a filesystem (for instance using dd or lvm snapshotting, 
doesn't matter how) and then trigger a btrfs device scan, say by plugging 
in some other device with btrfs on it so udev triggers a scan, and the 
kernel sees multiple devices with the same filesystem UUID as a result, 
and one of those happens to be mounted, you can corrupt both copies as 
the kernel btrfs won't be able to tell them apart and may write updates 
to the wrong one.

Prevention:

Don't let btrfs see both copies at the same time.  If you need to clone 
the filesystem, ideally make sure it's unmounted at the time, and detach 
either the original device or the clone immediately upon finishing the 
clone operation (before doing anything that might trigger a btrfs device 
scan, including device plugging that would trigger udev to trigger such a 
scan).  Then simply keep them separate, only attaching one at a time and 
DEFINITELY never mounting the filesystem with both the clone and the 
original devices attached, so the kernel can't get confused and write to 
the wrong one because the other one is never there at the same time to 
provide the opportunity.
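
One way to check for this before mounting, sketched under the assumption
that blkid reports a per-device UUID_SUB for btrfs members: devices of a
genuine multi-device filesystem share the filesystem UUID but should each
have their own UUID_SUB, while a dd or LVM clone carries the same
UUID_SUB as its original. The function name is made up for the example.

```shell
# Read blkid-style lines (DEVICE: ... UUID_SUB="...") on stdin and print
# every UUID_SUB that appears on more than one device, followed by the
# devices that carry it.
find_cloned_members() {
    awk '
        match($0, /UUID_SUB="[^"]*"/) {
            id = substr($0, RSTART + 10, RLENGTH - 11)
            dev = $1
            sub(/:$/, "", dev)
            count[id]++
            devs[id] = devs[id] " " dev
        }
        END {
            for (id in count)
                if (count[id] > 1)
                    print id ":" devs[id]
        }
    '
}

# Hypothetical usage (as root):
#   blkid -t TYPE=btrfs | find_cloned_members
```

No output means no duplicates were seen. Any line printed names devices
that should not be attached at the same time.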

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


* Re: fsck: to repair or not to repair
  2016-05-13  6:36   ` Duncan
@ 2016-05-13 15:28     ` Nikolaus Rath
  2016-05-13 21:35       ` Chris Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Nikolaus Rath @ 2016-05-13 15:28 UTC (permalink / raw)
  To: linux-btrfs

On May 13 2016, Duncan <1i5t5.duncan@cox.net> wrote:
> Because btrfs can be multi-device, it needs some way to track which 
> devices belong to each filesystem, and it uses filesystem UUID for this 
> purpose.
>
> If you clone a filesystem (for instance using dd or lvm snapshotting, 
> doesn't matter how) and then trigger a btrfs device scan, say by plugging 
> in some other device with btrfs on it so udev triggers a scan, and the 
> kernel sees multiple devices with the same filesystem UUID as a result, 
> and one of those happens to be mounted, you can corrupt both copies as 
> the kernel btrfs won't be able to tell them apart and may write updates 
> to the wrong one.

That seems like a rather odd design. Why isn't btrfs refusing to mount
in this situation? In the face of ambiguity, guessing is generally a bad
idea (at least for a computer program).


Best,
-Nikolaus


* Re: fsck: to repair or not to repair
  2016-05-13 15:28     ` Nikolaus Rath
@ 2016-05-13 21:35       ` Chris Murphy
  2016-05-16 11:17         ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Murphy @ 2016-05-13 21:35 UTC (permalink / raw)
  To: Nikolaus Rath; +Cc: Btrfs BTRFS

On Fri, May 13, 2016 at 9:28 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> On May 13 2016, Duncan <1i5t5.duncan@cox.net> wrote:
>> Because btrfs can be multi-device, it needs some way to track which
>> devices belong to each filesystem, and it uses filesystem UUID for this
>> purpose.
>>
>> If you clone a filesystem (for instance using dd or lvm snapshotting,
>> doesn't matter how) and then trigger a btrfs device scan, say by plugging
>> in some other device with btrfs on it so udev triggers a scan, and the
>> kernel sees multiple devices with the same filesystem UUID as a result,
>> and one of those happens to be mounted, you can corrupt both copies as
>> the kernel btrfs won't be able to tell them apart and may write updates
>> to the wrong one.
>
> That seems like a rather odd design. Why isn't btrfs refusing to mount
> in this situation? In the face of ambiguity, guessing is generally bad
> idea (at least for a computer program).

The logic you describe requires code. It's the absence of code rather
than an intentional design that's the cause of the current behavior.
And yes, it'd be nice if Btrfs weren't stepping on its own tail in
this situation. It could be as simple as refusing to mount anytime
there's an ambiguity, but that's sorta user hostile if there isn't a
message that goes along with it to help the user figure out a way to
resolve the problem. And that too could be fraught with peril if the
user makes a mistake. So working out the right way to do this is
part of the problem, but I agree it's better to be hostile and refuse
to mount a given volume UUID at all when too many devices are found
than to corrupt the file system.

-- 
Chris Murphy


* Re: fsck: to repair or not to repair
  2016-05-13 21:35       ` Chris Murphy
@ 2016-05-16 11:17         ` Austin S. Hemmelgarn
  2016-05-16 11:34           ` Andrei Borzenkov
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-16 11:17 UTC (permalink / raw)
  To: Chris Murphy, Nikolaus Rath; +Cc: Btrfs BTRFS

On 2016-05-13 17:35, Chris Murphy wrote:
> On Fri, May 13, 2016 at 9:28 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> On May 13 2016, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Because btrfs can be multi-device, it needs some way to track which
>>> devices belong to each filesystem, and it uses filesystem UUID for this
>>> purpose.
>>>
>>> If you clone a filesystem (for instance using dd or lvm snapshotting,
>>> doesn't matter how) and then trigger a btrfs device scan, say by plugging
>>> in some other device with btrfs on it so udev triggers a scan, and the
>>> kernel sees multiple devices with the same filesystem UUID as a result,
>>> and one of those happens to be mounted, you can corrupt both copies as
>>> the kernel btrfs won't be able to tell them apart and may write updates
>>> to the wrong one.
>>
>> That seems like a rather odd design. Why isn't btrfs refusing to mount
>> in this situation? In the face of ambiguity, guessing is generally bad
>> idea (at least for a computer program).
>
> The logic  you describe requires code. It's the absence of code rather
> than an intentional design that's the cause of the current behavior.
> And yes, it'd be nice if Btrfs weren't stepping on its own tail in
> this situation. It could be as simple as refusing to mount anytime
> there's an ambiguity, but that's sorta user hostile if there isn't a
> message that goes along with it to help the user figure out a way to
> resolve the problem. And that too could be fraught with peril if the
> user makes a mistake. So, really what's the right way to do this is
> part of the problem but I agree it's better to be hostile and refuse
> to mount a given volume UUID at all when too many devices are found,
> than corrupt the file system.
>
FWIW, the behavior I'd expect from a sysadmin perspective would be:
1. If and only if a correct number of device= options have been passed 
to mount, use those devices (and only those devices), and log a warning 
if extra devices are detected.
2. Otherwise, refuse to mount and log a warning.
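
The two rules can be written down as a small decision function. This is
only an illustration of the proposal, not of anything btrfs or mount
does today. The function name, its arguments and the returned strings
are made up for the example, and the last branch (no ambiguity, no
matching device= options) is an assumption about the intent.

```shell
# Decide what a mount attempt should do under the proposed policy.
#   $1 = number of devices the filesystem says it has
#   $2 = number of device= options passed to mount
#   $3 = number of devices detected with the filesystem's UUID
mount_policy() {
    expected=$1
    passed=$2
    detected=$3
    if [ "$passed" -eq "$expected" ]; then
        # Rule 1: use exactly the listed devices, warn about any others.
        if [ "$detected" -gt "$passed" ]; then
            echo "mount, warn: extra devices detected"
        else
            echo "mount"
        fi
    elif [ "$detected" -gt "$expected" ]; then
        # Rule 2: ambiguous and not resolved by device= options.
        echo "refuse, warn: ambiguous devices"
    else
        # Assumed: without ambiguity, behave as today.
        echo "mount"
    fi
}
```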


* Re: fsck: to repair or not to repair
  2016-05-16 11:17         ` Austin S. Hemmelgarn
@ 2016-05-16 11:34           ` Andrei Borzenkov
  2016-05-16 11:48             ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Andrei Borzenkov @ 2016-05-16 11:34 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Chris Murphy, Nikolaus Rath; +Cc: Btrfs BTRFS

16.05.2016 14:17, Austin S. Hemmelgarn wrote:
> On 2016-05-13 17:35, Chris Murphy wrote:
>> On Fri, May 13, 2016 at 9:28 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>> On May 13 2016, Duncan <1i5t5.duncan@cox.net> wrote:
>>>> Because btrfs can be multi-device, it needs some way to track which
>>>> devices belong to each filesystem, and it uses filesystem UUID for this
>>>> purpose.
>>>>
>>>> If you clone a filesystem (for instance using dd or lvm snapshotting,
>>>> doesn't matter how) and then trigger a btrfs device scan, say by
>>>> plugging
>>>> in some other device with btrfs on it so udev triggers a scan, and the
>>>> kernel sees multiple devices with the same filesystem UUID as a result,
>>>> and one of those happens to be mounted, you can corrupt both copies as
>>>> the kernel btrfs won't be able to tell them apart and may write updates
>>>> to the wrong one.
>>>
>>> That seems like a rather odd design. Why isn't btrfs refusing to mount
>>> in this situation? In the face of ambiguity, guessing is generally bad
>>> idea (at least for a computer program).
>>
>> The logic  you describe requires code. It's the absence of code rather
>> than an intentional design that's the cause of the current behavior.
>> And yes, it'd be nice if Btrfs weren't stepping on its own tail in
>> this situation. It could be as simple as refusing to mount anytime
>> there's an ambiguity, but that's sorta user hostile if there isn't a
>> message that goes along with it to help the user figure out a way to
>> resolve the problem. And that too could be fraught with peril if the
>> user makes a mistake. So, really what's the right way to do this is
>> part of the problem but I agree it's better to be hostile and refuse
>> to mount a given volume UUID at all when too many devices are found,
>> than corrupt the file system.
>>
> FWIW, the behavior I'd expect from a sysadmin perspective would be:
> 1. If and only if a correct number of device= options have been passed
> to mount, use those devices (and only those devices), and log a warning
> if extra devices are detected.

First, how do you know that the devices passed as device= options are
correct? Is it possible to detect a stale copy?

Second, today udev rules will run the equivalent of "btrfs device ready"
for each device that is part of btrfs. So you still need to handle the
situation where device(s) appear and disappear after the initial mount,
and have some way to distinguish between two copies.

Third, what exactly does "extra devices detected" mean? Who is
responsible for detection? Where is this information kept? How can mount
query this information?

> 2. Otherwise, refuse to mount and log a warning.

So there would be no way to mount a degraded redundant filesystem?


* Re: fsck: to repair or not to repair
  2016-05-16 11:34           ` Andrei Borzenkov
@ 2016-05-16 11:48             ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-05-16 11:48 UTC (permalink / raw)
  To: Andrei Borzenkov, Chris Murphy, Nikolaus Rath; +Cc: Btrfs BTRFS

On 2016-05-16 07:34, Andrei Borzenkov wrote:
> 16.05.2016 14:17, Austin S. Hemmelgarn wrote:
>> On 2016-05-13 17:35, Chris Murphy wrote:
>>> On Fri, May 13, 2016 at 9:28 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>>> On May 13 2016, Duncan <1i5t5.duncan@cox.net> wrote:
>>>>> Because btrfs can be multi-device, it needs some way to track which
>>>>> devices belong to each filesystem, and it uses filesystem UUID for this
>>>>> purpose.
>>>>>
>>>>> If you clone a filesystem (for instance using dd or lvm snapshotting,
>>>>> doesn't matter how) and then trigger a btrfs device scan, say by
>>>>> plugging
>>>>> in some other device with btrfs on it so udev triggers a scan, and the
>>>>> kernel sees multiple devices with the same filesystem UUID as a result,
>>>>> and one of those happens to be mounted, you can corrupt both copies as
>>>>> the kernel btrfs won't be able to tell them apart and may write updates
>>>>> to the wrong one.
>>>>
>>>> That seems like a rather odd design. Why isn't btrfs refusing to mount
>>>> in this situation? In the face of ambiguity, guessing is generally bad
>>>> idea (at least for a computer program).
>>>
>>> The logic  you describe requires code. It's the absence of code rather
>>> than an intentional design that's the cause of the current behavior.
>>> And yes, it'd be nice if Btrfs weren't stepping on its own tail in
>>> this situation. It could be as simple as refusing to mount anytime
>>> there's an ambiguity, but that's sorta user hostile if there isn't a
>>> message that goes along with it to help the user figure out a way to
>>> resolve the problem. And that too could be fraught with peril if the
>>> user makes a mistake. So, really what's the right way to do this is
>>> part of the problem but I agree it's better to be hostile and refuse
>>> to mount a given volume UUID at all when too many devices are found,
>>> than corrupt the file system.
>>>
>> FWIW, the behavior I'd expect from a sysadmin perspective would be:
>> 1. If and only if a correct number of device= options have been passed
>> to mount, use those devices (and only those devices), and log a warning
>> if extra devices are detected.
>
> First, how do you know that devices, passed as device= options, are
> correct? Is it possible to detect stale copy?
You don't.  As much as it pains me to say it, there's no way to protect 
against this reliably.  The intent is that if you have specified the 
correct number of devices according to the number the filesystem says 
should be there (and that number is the same on all devices specified), 
it's assumed you know what you're doing.
>
> Second, today udev rules will run equivalent of "btrfs device ready" for
> each device that is part of btrfs.
That's part of the rules shipped by systemd, and is not by any means on 
every system in existence.  That is an inherent design flaw in systemd 
resulting from them thinking they're smarter than the kernel, and it has 
on multiple occasions bitten people.
> So you still need to handle the
> situation when device(s) appear and disappear after initial mount and
> have some way to distinguish between two copies.
Yes, you need to account for devices appearing and disappearing, but at 
least until we add proper support for off-line devices, that's easy.
>
> Third, what exactly "extra devices detected" means? Who is responsible
> for detection? Where this information is kept? How can mount query this
> information?
If there are more devices with the filesystem's UUID than are passed in 
via device= options, and the above stated condition regarding device= 
options is met, then those are extra devices.
>
>> 2. Otherwise, refuse to mount and log a warning.
>
> So no way to mount degraded redundant filesystem?
I know a large number of people who routinely use degraded as part of 
their fstab options.  Degraded is supposed to mean reduced data safety, 
not 'may cause random corruption just by being used'.  I have no issue 
with a mount option to force mounting it anyway, but I absolutely do not 
want that to be part of the degraded mount option.



* Re: fsck: to repair or not to repair
  2016-05-11 21:10 fsck: to repair or not to repair Nikolaus Rath
  2016-05-12 17:02 ` Henk Slager
@ 2016-06-10  3:40 ` Nikolaus Rath
  2016-06-10 11:05   ` Austin S. Hemmelgarn
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolaus Rath @ 2016-06-10  3:40 UTC (permalink / raw)
  To: linux-btrfs

On May 11 2016, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hello,
>
> I recently ran btrfsck on one of my file systems, and got the following
> messages:
>
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 3149867 errors 400, nbytes wrong
> root 5 inode 3150237 errors 400, nbytes wrong
> root 5 inode 3150238 errors 400, nbytes wrong
> root 5 inode 3150242 errors 400, nbytes wrong
> root 5 inode 3150260 errors 400, nbytes wrong
> [ lots of similar message with different inode numbers ]
> root 5 inode 15595011 errors 400, nbytes wrong
> root 5 inode 15595016 errors 400, nbytes wrong
> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
> found 263648960636 bytes used err is 1
> total csum bytes: 395314372
> total tree bytes: 908644352
> total fs tree bytes: 352735232
> total extent tree bytes: 95039488
> btree space waste bytes: 156301160
> file data blocks allocated: 675209801728
>  referenced 410351722496
> Btrfs v3.17
>
>
> Can someone explain to me the risk that I run by attempting a repair,
> and (conversely) what I put at stake when continuing to use this file
> system as-is?
 
To follow up on this: after finding out which files were affected (using
btrfs inspect-internal), I was able to fix the problem without using
btrfsck by simply copying the data, deleting the file, and restoring it:

cat affected-files.txt | while read -r name; do
rsync -a "${name}" "/backup/location/${name}"
rm -f "${name}"
cp -a "/backup/location/${name}" "${name}"
done

(I used rsync to avoid cp making use of reflinks). After this procedure,
btrfsck reported no more problems.
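
A slightly more defensive variant of the same loop, as a sketch only: it
creates the directories under the backup location first and refuses to
delete a file whose backup does not compare equal. The function name and
the paths are placeholders, not what was actually run.

```shell
# For every path listed in LIST: copy it to BACKUP with rsync, verify
# the copy, delete the original, and copy the data back as a new file.
rewrite_files() {
    list=$1
    backup=$2
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        mkdir -p "$backup/$(dirname "$name")" || return 1
        rsync -a "$name" "$backup/$name" || return 1
        # Never delete the original unless the backup is identical.
        cmp -s "$name" "$backup/$name" || {
            echo "backup of $name differs, skipping" >&2
            continue
        }
        rm -f -- "$name"
        cp -a "$backup/$name" "$name" || return 1
    done < "$list"
}

# Hypothetical usage:
#   rewrite_files affected-files.txt /backup/location
```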


Best,
-Nikolaus



* Re: fsck: to repair or not to repair
  2016-06-10  3:40 ` Nikolaus Rath
@ 2016-06-10 11:05   ` Austin S. Hemmelgarn
  2016-06-10 15:54     ` Nikolaus Rath
  2016-06-10 15:55     ` Nikolaus Rath
  0 siblings, 2 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-10 11:05 UTC (permalink / raw)
  To: Nikolaus Rath, linux-btrfs

On 2016-06-09 23:40, Nikolaus Rath wrote:
> On May 11 2016, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> Hello,
>>
>> I recently ran btrfsck on one of my file systems, and got the following
>> messages:
>>
>> checking extents
>> checking free space cache
>> checking fs roots
>> root 5 inode 3149867 errors 400, nbytes wrong
>> root 5 inode 3150237 errors 400, nbytes wrong
>> root 5 inode 3150238 errors 400, nbytes wrong
>> root 5 inode 3150242 errors 400, nbytes wrong
>> root 5 inode 3150260 errors 400, nbytes wrong
>> [ lots of similar message with different inode numbers ]
>> root 5 inode 15595011 errors 400, nbytes wrong
>> root 5 inode 15595016 errors 400, nbytes wrong
>> Checking filesystem on /dev/mapper/vg0-nikratio_crypt
>> UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
>> found 263648960636 bytes used err is 1
>> total csum bytes: 395314372
>> total tree bytes: 908644352
>> total fs tree bytes: 352735232
>> total extent tree bytes: 95039488
>> btree space waste bytes: 156301160
>> file data blocks allocated: 675209801728
>>  referenced 410351722496
>> Btrfs v3.17
>>
>>
>> Can someone explain to me the risk that I run by attempting a repair,
>> and (conversely) what I put at stake when continuing to use this file
>> system as-is?
>
> To follow-up on this: after finding out which files were affected (using
> btrfs inspect-internal), I was able to fix the problem without using
> btrfsck by simply copying the data, deleting the file, and restoring it:
>
> cat affected-files.txt | while read -r name; do
> rsync -a "${name}" "/backup/location/${name}"
> rm -f "${name}"
> cp -a "/backup/location/${name}" "${name}"
> done
>
> (I used rsync to avoid cp making use of reflinks). After this procedure,
> btrfschk reported no more problems.
JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid it 
making reflinks.



* Re: fsck: to repair or not to repair
  2016-06-10 11:05   ` Austin S. Hemmelgarn
@ 2016-06-10 15:54     ` Nikolaus Rath
  2016-06-10 16:50       ` Adam Borowski
  2016-06-10 15:55     ` Nikolaus Rath
  1 sibling, 1 reply; 20+ messages in thread
From: Nikolaus Rath @ 2016-06-10 15:54 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On Jun 10 2016, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> JFYI, if you've using GNU cp, you can pass '--reflink=never' to avoid
> it making reflinks.

I would have expected so, but at least in coreutils 8.23 the only valid
options are "never" and "auto" (at least according to cp --help and the
manpage).

Best,
-Nikolaus



* Re: fsck: to repair or not to repair
  2016-06-10 15:54     ` Nikolaus Rath
@ 2016-06-10 16:50       ` Adam Borowski
  2016-06-10 16:55         ` Nikolaus Rath
  2016-06-10 17:12         ` Austin S. Hemmelgarn
  0 siblings, 2 replies; 20+ messages in thread
From: Adam Borowski @ 2016-06-10 16:50 UTC (permalink / raw)
  To: Nikolaus Rath; +Cc: Austin S. Hemmelgarn, linux-btrfs

On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:
> On Jun 10 2016, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> > JFYI, if you've using GNU cp, you can pass '--reflink=never' to avoid
> > it making reflinks.
> 
> I would have expected so, but at least in coreutils 8.23 the only valid
> options are "never" and "auto" (at least according to cp --help and the
> manpage).

Where do you get "never" from?

.--====
cp: invalid argument ‘never’ for ‘--reflink’
Valid arguments are:
  - ‘auto’
  - ‘always’
Try 'cp --help' for more information.
`----

And, as of coreutils 8.25, the default is no reflink, with "never" not being
recognized even as a way to avoid an alias.  As far as I remember, this
applies to every past version with support for reflinks too.

-- 
An imaginary friend squared is a real enemy.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 16:50       ` Adam Borowski
@ 2016-06-10 16:55         ` Nikolaus Rath
  2016-06-10 17:12         ` Austin S. Hemmelgarn
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolaus Rath @ 2016-06-10 16:55 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Austin S. Hemmelgarn, linux-btrfs

On Jun 10 2016, Adam Borowski <kilobyte@angband.pl> wrote:
> On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:
>> On Jun 10 2016, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>> > JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
>> > it making reflinks.
>> 
>> I would have expected so, but at least in coreutils 8.23 the only valid
>> options are "never" and "auto" (at least according to cp --help and the
>> manpage).
>
> Where do you get "never" from?

I meant to write "always" (as in my second mail, I thought I hit "cancel"
quickly enough).


Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 16:50       ` Adam Borowski
  2016-06-10 16:55         ` Nikolaus Rath
@ 2016-06-10 17:12         ` Austin S. Hemmelgarn
  2016-06-10 17:22           ` Adam Borowski
  1 sibling, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-10 17:12 UTC (permalink / raw)
  To: Adam Borowski, Nikolaus Rath; +Cc: linux-btrfs

On 2016-06-10 12:50, Adam Borowski wrote:
> On Fri, Jun 10, 2016 at 08:54:36AM -0700, Nikolaus Rath wrote:
>> On Jun 10 2016, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>>> JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
>>> it making reflinks.
>>
>> I would have expected so, but at least in coreutils 8.23 the only valid
>> options are "never" and "auto" (at least according to cp --help and the
>> manpage).
>
> Where do you get "never" from?
>
> .--====
> cp: invalid argument ‘never’ for ‘--reflink’
> Valid arguments are:
>   - ‘auto’
>   - ‘always’
> Try 'cp --help' for more information.
> `----
>
> And, as of coreutils 8.25, the default is no reflink, with "never" not being
> recognized even as a way to avoid an alias.  As far as I remember, this
> applies to every past version with support for reflinks too.
>
Odd, I could have sworn that was an option...

And I do know there was talk at least at one point of adding it and 
switching to reflink=auto by default.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 17:12         ` Austin S. Hemmelgarn
@ 2016-06-10 17:22           ` Adam Borowski
  2016-06-10 17:39             ` Austin S. Hemmelgarn
  2016-06-10 17:40             ` Henk Slager
  0 siblings, 2 replies; 20+ messages in thread
From: Adam Borowski @ 2016-06-10 17:22 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: Nikolaus Rath, linux-btrfs

On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:
> On 2016-06-10 12:50, Adam Borowski wrote:
> >And, as of coreutils 8.25, the default is no reflink, with "never" not being
> >recognized even as a way to avoid an alias.  As far as I remember, this
> >applies to every past version with support for reflinks too.
> >
> Odd, I could have sworn that was an option...
> 
> And I do know there was talk at least at one point of adding it and
> switching to reflink=auto by default.

Yes please!

It's hard to come up with a good reason for not reflinking when it's possible
-- the only one I see is if you have a nocow VM and want to slightly improve
speed at a cost of lots of disk space.  And even then, there's cat a >b for
that.

And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
that's utterly negligible compared to actually copying the data.

-- 
An imaginary friend squared is a real enemy.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 17:22           ` Adam Borowski
@ 2016-06-10 17:39             ` Austin S. Hemmelgarn
  2016-06-10 17:40             ` Henk Slager
  1 sibling, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2016-06-10 17:39 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Nikolaus Rath, linux-btrfs

On 2016-06-10 13:22, Adam Borowski wrote:
> On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:
>> On 2016-06-10 12:50, Adam Borowski wrote:
>>> And, as of coreutils 8.25, the default is no reflink, with "never" not being
>>> recognized even as a way to avoid an alias.  As far as I remember, this
>>> applies to every past version with support for reflinks too.
>>>
>> Odd, I could have sworn that was an option...
>>
>> And I do know there was talk at least at one point of adding it and
>> switching to reflink=auto by default.
>
> Yes please!
>
> It's hard to come up with a good reason for not reflinking when it's possible
> -- the only one I see is if you have a nocow VM and want to slightly improve
> speed at a cost of lots of disk space.  And even then, there's cat a >b for
> that.
There are other arguments, the most common one being not changing user 
visible behavior.  There are (misguided) people who expect copying a 
file to mean you have two distinct copies of that file.

OTOH, it's not too hard to set up a system to do this, you just put:
alias cp='cp --reflink=auto'
into your bashrc (or something similar into whatever other shell you 
use).  I've been doing this since cp added support for it.
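
A minimal sketch of that setup, assuming GNU cp with --reflink support.
The function name cpr and the demo_dir variable are hypothetical and only
there for illustration.  With "auto", cp tries a reflink first and falls
back to an ordinary copy when the filesystem cannot share extents, so the
same command works on any filesystem:

```shell
# Hypothetical helper with the same effect as the alias; unlike an alias it
# is also usable in scripts, where aliases are not expanded by default.
cpr() {
    cp --reflink=auto "$@"
}

# Usage: copy a file in a scratch directory and confirm the contents match.
# On btrfs the copy shares extents; elsewhere it is a plain copy.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/a"
cpr "$demo_dir/a" "$demo_dir/b"
cmp "$demo_dir/a" "$demo_dir/b" && echo "contents match"
```

With the alias in place, '\cp' or 'command cp' in an interactive shell
runs cp without the alias for a single invocation.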
>
> And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
> that's utterly negligible compared to actually copying the data.
Actually, IIRC, it's an ioctl, not a syscall, which can be kind of 
expensive (I don't know how much more expensive, but ioctls are usually 
more expensive than syscalls).

Other things to keep in mind though that may impact this (either way):
1. There are other filesystems that support reflinks (OCFS2 and ZFS come 
immediately to mind).
2. Most of the filesystems that support reflinks are used more in 
enterprise situations, where the bit about not changing user visible 
behavior is a much stronger argument.
3. Even in enterprise situations, reflink capable filesystems are still 
unusual outside of petabyte scale data storage.
4. Last I checked, the most widely used filesystem that supports 
reflinks (ZFS) uses a different ioctl interface for them than most other 
Linux filesystems, which means more checking is needed than just calling 
one ioctl.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 17:22           ` Adam Borowski
  2016-06-10 17:39             ` Austin S. Hemmelgarn
@ 2016-06-10 17:40             ` Henk Slager
  1 sibling, 0 replies; 20+ messages in thread
From: Henk Slager @ 2016-06-10 17:40 UTC (permalink / raw)
  To: Adam Borowski; +Cc: Austin S. Hemmelgarn, Nikolaus Rath, linux-btrfs

On Fri, Jun 10, 2016 at 7:22 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Fri, Jun 10, 2016 at 01:12:42PM -0400, Austin S. Hemmelgarn wrote:
>> On 2016-06-10 12:50, Adam Borowski wrote:
>> >And, as of coreutils 8.25, the default is no reflink, with "never" not being
>> >recognized even as a way to avoid an alias.  As far as I remember, this
>> >applies to every past version with support for reflinks too.
>> >
>> Odd, I could have sworn that was an option...
>>
>> And I do know there was talk at least at one point of adding it and
>> switching to reflink=auto by default.
>
> Yes please!
>
> It's hard to come up with a good reason for not reflinking when it's possible
> -- the only one I see is if you have a nocow VM and want to slightly improve
> speed at a cost of lots of disk space.  And even then, there's cat a >b for
> that.

For a nocow VM imagefile, reflink does not work anyhow, so cp
--reflink=auto would then just duplicate the whole thing, in effect doing a
'cp --reflink=never' ('never' is accepted by --sparse), either silently or
with a warning/note.

For a cow VM imagefile, the only thing I do and want w.r.t. cp is
reflink=always, so I also vote for auto on by default.

If you want to 'defrag' a VM imagefile, using cat or dd and enough RAM
does a better and faster job than cp or btrfs manual defrag.
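
A sketch of the cat approach, assuming the image is not in use while it is
being rewritten.  The function name rewrite_file and the ".new" suffix are
hypothetical.  It writes a full fresh copy that shares no extents with the
source and only then replaces the original; whether the result is actually
less fragmented depends on the free-space layout:

```shell
# Hypothetical helper: rewrite a file through cat so it gets newly
# allocated extents, then swap it into place.
rewrite_file() {
    src=$1
    new="$src.new"
    # cat always writes new data, so nothing is shared with the source.
    cat "$src" > "$new" || { rm -f "$new"; return 1; }
    # Replace the original only after the full copy succeeded.
    mv "$new" "$src"
}
```

Note that this does not carry over ownership, permissions or the nocow
attribute; for a nocow image, the new file would need 'chattr +C' applied
while it is still empty.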

> And the cost on non-btrfs non-unmerged-xfs is a single syscall per file,
> that's utterly negligible compared to actually copying the data.
>
> --
> An imaginary friend squared is a real enemy.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: fsck: to repair or not to repair
  2016-06-10 11:05   ` Austin S. Hemmelgarn
  2016-06-10 15:54     ` Nikolaus Rath
@ 2016-06-10 15:55     ` Nikolaus Rath
  1 sibling, 0 replies; 20+ messages in thread
From: Nikolaus Rath @ 2016-06-10 15:55 UTC (permalink / raw)
  To: Austin S. Hemmelgarn; +Cc: linux-btrfs

On Jun 10 2016, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
> JFYI, if you're using GNU cp, you can pass '--reflink=never' to avoid
> it making reflinks.

I would have expected so, but at least in coreutils 8.23 the only valid
options are "always" and "auto" (at least according to cp --help and the
manpage).

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-06-10 17:40 UTC | newest]

Thread overview: 20+ messages
2016-05-11 21:10 fsck: to repair or not to repair Nikolaus Rath
2016-05-12 17:02 ` Henk Slager
2016-05-12 17:35   ` Nikolaus Rath
2016-05-12 17:55     ` Ashish Samant
2016-05-13  6:36   ` Duncan
2016-05-13 15:28     ` Nikolaus Rath
2016-05-13 21:35       ` Chris Murphy
2016-05-16 11:17         ` Austin S. Hemmelgarn
2016-05-16 11:34           ` Andrei Borzenkov
2016-05-16 11:48             ` Austin S. Hemmelgarn
2016-06-10  3:40 ` Nikolaus Rath
2016-06-10 11:05   ` Austin S. Hemmelgarn
2016-06-10 15:54     ` Nikolaus Rath
2016-06-10 16:50       ` Adam Borowski
2016-06-10 16:55         ` Nikolaus Rath
2016-06-10 17:12         ` Austin S. Hemmelgarn
2016-06-10 17:22           ` Adam Borowski
2016-06-10 17:39             ` Austin S. Hemmelgarn
2016-06-10 17:40             ` Henk Slager
2016-06-10 15:55     ` Nikolaus Rath
