linux-btrfs.vger.kernel.org archive mirror
* Deleted files cause btrfs-send to fail
@ 2015-08-12 22:34 Marc Joliet
  2015-08-13  7:05 ` Marc Joliet
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Joliet @ 2015-08-12 22:34 UTC
  To: linux-btrfs

Hi all,

Starting today I have an interesting problem: I deleted some files (old
fcrontabs), which now persistently causes btrfs-send to fail.  The error
message I get is:

Aug 12 23:32:24 thetick make_backups.sh[1059]: ERROR: send ioctl failed with -2: No such file or directory
Aug 12 23:32:25 thetick make_backups.sh[1059]: ERROR: unexpected EOF in stream.

There is nothing in the dmesg output.

Since this is the root file system, I haven't gotten a copy of the actual output
of "btrfs check", though I have run it from an initramfs rescue shell.  The
output I saw there was much like the following (taken from an Email by Roman
Mamedov from 2014-12-28):

root 22730 inode 6236418 errors 2000, link count wrong unresolved ref dir 105512 index 586340 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index

Only in my case, it's "root 5" and "root 4" (I think), and the file names
(and other file system specifics) are of course different.  I definitely saw
"errors 2000" (I take it that's supposed to be an error code?).

Is this something that "btrfs check --repair" (or something else) can safely
fix?

# uname -a
Linux thetick 4.1.4-gentoo #1 SMP PREEMPT Tue Aug 4 21:58:41 CEST 2015 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ AuthenticAMD GNU/Linux
# btrfs --version
btrfs-progs v4.1.2
# btrfs filesystem show
Label: 'MARCEC_ROOT'  uuid: 0267d8b3-a074-460a-832d-5d5fd36bae64
        Total devices 1 FS bytes used 59.30GiB
        devid    1 size 107.79GiB used 74.03GiB path /dev/sda1

Label: 'MARCEC_STORAGE'  uuid: 472c9290-3ff2-4096-9c47-0612d3a52cef
        Total devices 2 FS bytes used 597.75GiB
        devid    1 size 931.51GiB used 600.03GiB path /dev/sdc
        devid    2 size 931.51GiB used 600.03GiB path /dev/sdb

Label: 'MARCEC_BACKUP'  uuid: f97b3cda-15e8-418b-bb9b-235391ef2a38
        Total devices 1 FS bytes used 810.35GiB
        devid    1 size 976.56GiB used 837.06GiB path /dev/sdd2

btrfs-progs v4.1.2
# btrfs filesystem df /
Data, single: total=70.00GiB, used=57.53GiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=4.00GiB, used=1.77GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: Deleted files cause btrfs-send to fail
  2015-08-12 22:34 Deleted files cause btrfs-send to fail Marc Joliet
@ 2015-08-13  7:05 ` Marc Joliet
  2015-08-13  8:29   ` Duncan
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Joliet @ 2015-08-13  7:05 UTC
  To: linux-btrfs

On Thu, 13 Aug 2015 00:34:19 +0200
Marc Joliet <marcec@gmx.de> wrote:

[...]
> Since this is the root file system, I haven't gotten a copy of the actual output
> of "btrfs check", though I have run it from an initramfs rescue shell.  The
> output I saw there was much like the following (taken from an Email by Roman
> Mamedov from 2014-12-28):
> 
> root 22730 inode 6236418 errors 2000, link count wrong unresolved ref dir 105512 index 586340 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
> 
> Only in my case, it's "root 5" and "root 4" (I think), and the file names
> (and other file system specifics) are of course different.  I definitely saw
> "errors 2000" (I take it that's supposed to be an error code?).
[...]

Here's the actual output now, obtained via btrfs-progs 4.0.1 from an initramfs
emergency shell:

checking extents
checking free space cache
checking fs roots
root 5 inode 8338813 errors 2000, link count wrong
        unresolved ref dir 26699 index 50500 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
root 5 inode 8338814 errors 2000, link count wrong
        unresolved ref dir 26699 index 50502 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
root 5 inode 8338815 errors 2000, link count wrong
        unresolved ref dir 26699 index 50504 namelen 6 name systab filetype 0 errors 3, no dir item, no dir index
root 5 inode 8710030 errors 2000, link count wrong
        unresolved ref dir 26699 index 59588 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
root 5 inode 8710031 errors 2000, link count wrong
        unresolved ref dir 26699 index 59590 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
Checking filesystem on /dev/sda1
UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64
found 63467610172 bytes used err is 1
total csum bytes: 59475016
total tree bytes: 1903411200
total fs tree bytes: 1691504640
total extent tree bytes: 130322432
btree space waste bytes: 442495212
file data blocks allocated: 555097092096
 referenced 72887840768
btrfs-progs v4.0.1

Again: is this fixable?

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: Deleted files cause btrfs-send to fail
  2015-08-13  7:05 ` Marc Joliet
@ 2015-08-13  8:29   ` Duncan
  2015-08-13  8:54     ` Marc Joliet
  0 siblings, 1 reply; 8+ messages in thread
From: Duncan @ 2015-08-13  8:29 UTC
  To: linux-btrfs

Marc Joliet posted on Thu, 13 Aug 2015 09:05:41 +0200 as excerpted:

> Here's the actual output now, obtained via btrfs-progs 4.0.1 from an
> initramfs emergency shell:
> 
> checking extents
> checking free space cache
> checking fs roots
> root 5 inode 8338813 errors 2000, link count wrong
>         unresolved ref dir 26699 index 50500 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> root 5 inode 8338814 errors 2000, link count wrong
>         unresolved ref dir 26699 index 50502 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> root 5 inode 8338815 errors 2000, link count wrong
>         unresolved ref dir 26699 index 50504 namelen 6 name systab filetype 0 errors 3, no dir item, no dir index
> root 5 inode 8710030 errors 2000, link count wrong
>         unresolved ref dir 26699 index 59588 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> root 5 inode 8710031 errors 2000, link count wrong
>         unresolved ref dir 26699 index 59590 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> Checking filesystem on /dev/sda1
> UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64
> found 63467610172 bytes used err is 1
> total csum bytes: 59475016
> total tree bytes: 1903411200
> total fs tree bytes: 1691504640
> total extent tree bytes: 130322432
> btree space waste bytes: 442495212
> file data blocks allocated: 555097092096
>  referenced 72887840768
> btrfs-progs v4.0.1
> 
> Again: is this fixable?

FWIW, root 5 (which you asked about upthread) is the main filesystem 
root.  So all these appear to be on the main filesystem, not on snapshots/
subvolumes.
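
On the "errors 2000" question from upthread: the check output prints each
number right next to its description, so it reads more like a set of flag
bits than a single error code.  A minimal sketch of that reading follows.
It assumes the numbers are printed in hexadecimal and that each bit maps to
one message.  The two tables contain only what is visible in the output
quoted above; the table names and the decode() helper are hypothetical and
not taken from btrfs-progs.

```python
# Hypothetical decoder: the bit values are inferred from the check output
# quoted in this thread, not copied from the btrfs-progs source.
INODE_FLAGS = {0x2000: "link count wrong"}
REF_FLAGS = {0x1: "no dir item", 0x2: "no dir index"}


def decode(value_text, table):
    """Read the printed number as hexadecimal and list the flags that are set."""
    value = int(value_text, 16)
    names = [name for bit, name in sorted(table.items()) if value & bit]
    # Report any bits this small table does not know about instead of hiding them.
    unknown = value & ~sum(table)
    if unknown:
        names.append("unknown bits 0x%x" % unknown)
    return names


print(decode("2000", INODE_FLAGS))
print(decode("3", REF_FLAGS))
```

Read this way, "errors 3" on the unresolved ref lines is simply both
messages that follow it ("no dir item" and "no dir index") combined.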

As for the problem itself, noting that I'm not a dev, just a user/admin 
following the list, I believe...

There was a recent bug (early 4.0 or 4.1, IDR which) that (as I recall 
understanding it) would fail to decrement link count and would thus leave 
unnamed inodes hanging around in directories with no way to delete them.  
That looks very much like what you're seeing.  The bug has indeed been 
fixed in current, and a current btrfs check should fix it, but I don't 
believe that v4.0.1 userspace from the initramfs is new enough to have 
that fix.  The 4.1.2 userspace on your main system (from the first post) 
is current and should fix it, I believe, however.

But if it's critical, you may wish to wait and have someone else confirm 
that before acting on it, just in case I have it wrong.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Deleted files cause btrfs-send to fail
  2015-08-13  8:29   ` Duncan
@ 2015-08-13  8:54     ` Marc Joliet
  2015-08-14 21:37       ` Marc Joliet
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Joliet @ 2015-08-13  8:54 UTC
  To: linux-btrfs

On Thu, 13 Aug 2015 08:29:19 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> Marc Joliet posted on Thu, 13 Aug 2015 09:05:41 +0200 as excerpted:
> 
> > Here's the actual output now, obtained via btrfs-progs 4.0.1 from an
> > initramfs emergency shell:
> > 
> > checking extents
> > checking free space cache
> > checking fs roots
> > root 5 inode 8338813 errors 2000, link count wrong
> >         unresolved ref dir 26699 index 50500 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> > root 5 inode 8338814 errors 2000, link count wrong
> >         unresolved ref dir 26699 index 50502 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> > root 5 inode 8338815 errors 2000, link count wrong
> >         unresolved ref dir 26699 index 50504 namelen 6 name systab filetype 0 errors 3, no dir item, no dir index
> > root 5 inode 8710030 errors 2000, link count wrong
> >         unresolved ref dir 26699 index 59588 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> > root 5 inode 8710031 errors 2000, link count wrong
> >         unresolved ref dir 26699 index 59590 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> > Checking filesystem on /dev/sda1
> > UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64
> > found 63467610172 bytes used err is 1
> > total csum bytes: 59475016
> > total tree bytes: 1903411200
> > total fs tree bytes: 1691504640
> > total extent tree bytes: 130322432
> > btree space waste bytes: 442495212
> > file data blocks allocated: 555097092096
> >  referenced 72887840768
> > btrfs-progs v4.0.1
> > 
> > Again: is this fixable?
> 
> FWIW, root 5 (which you asked about upthread) is the main filesystem 
> root.  So all these appear to be on the main filesystem, not on snapshots/
> subvolumes.

OK

> As for the problem itself, noting that I'm not a dev, just a user/admin 
> following the list, I believe...
> 
> There was a recent bug (early 4.0 or 4.1, IDR which) that (as I recall 
> understanding it) would fail to decrement link count and would thus leave 
> unnamed inodes hanging around in directories with no way to delete them.  
> That looks very much like what you're seeing.

Now that you mention it, I think I remember seeing that patch (series?).

> The bug has indeed been 
> fixed in current, and a current btrfs check should fix it, but I don't 
> believe that v4.0.1 userspace from the initramfs is new enough to have 
> that fix.  The 4.1.2 userspace on your main system (from the first post) 
> is current and should fix it, I believe, however.

I have updated the initramfs in the meantime.  (Funny: I *just* started using
one, mainly to be able to use btrfstune on /, but now I have a genuine
necessity for it.)

> But if it's critical, you may wish to wait and have someone else confirm 
> that before acting on it, just in case I have it wrong.

I can wait until tonight, at least.  The FS still mounts, and it's just the root
subvolume that's affected; running btrfs-send on the /home subvolume still
works.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: Deleted files cause btrfs-send to fail
  2015-08-13  8:54     ` Marc Joliet
@ 2015-08-14 21:37       ` Marc Joliet
  2015-08-15  5:10         ` Duncan
  2015-08-23 13:22         ` [SOLVED] " Marc Joliet
  0 siblings, 2 replies; 8+ messages in thread
From: Marc Joliet @ 2015-08-14 21:37 UTC
  To: linux-btrfs

On Thu, 13 Aug 2015 10:54:58 +0200
Marc Joliet <marcec@gmx.de> wrote:

> On Thu, 13 Aug 2015 08:29:19 +0000 (UTC)
> Duncan <1i5t5.duncan@cox.net> wrote:
> 
> > Marc Joliet posted on Thu, 13 Aug 2015 09:05:41 +0200 as excerpted:
> > 
> > > Here's the actual output now, obtained via btrfs-progs 4.0.1 from an
> > > initramfs emergency shell:
> > > 
> > > checking extents
> > > checking free space cache
> > > checking fs roots
> > > root 5 inode 8338813 errors 2000, link count wrong
> > >         unresolved ref dir 26699 index 50500 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> > > root 5 inode 8338814 errors 2000, link count wrong
> > >         unresolved ref dir 26699 index 50502 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> > > root 5 inode 8338815 errors 2000, link count wrong
> > >         unresolved ref dir 26699 index 50504 namelen 6 name systab filetype 0 errors 3, no dir item, no dir index
> > > root 5 inode 8710030 errors 2000, link count wrong
> > >         unresolved ref dir 26699 index 59588 namelen 6 name marcec filetype 0 errors 3, no dir item, no dir index
> > > root 5 inode 8710031 errors 2000, link count wrong
> > >         unresolved ref dir 26699 index 59590 namelen 4 name root filetype 0 errors 3, no dir item, no dir index
> > > Checking filesystem on /dev/sda1
> > > UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64
> > > found 63467610172 bytes used err is 1
> > > total csum bytes: 59475016
> > > total tree bytes: 1903411200
> > > total fs tree bytes: 1691504640
> > > total extent tree bytes: 130322432
> > > btree space waste bytes: 442495212
> > > file data blocks allocated: 555097092096
> > >  referenced 72887840768
> > > btrfs-progs v4.0.1
> > > 
> > > Again: is this fixable?
> > 
> > FWIW, root 5 (which you asked about upthread) is the main filesystem 
> > root.  So all these appear to be on the main filesystem, not on snapshots/
> > subvolumes.
> 
[...]
> > But if it's critical, you may wish to wait and have someone else confirm 
> > that before acting on it, just in case I have it wrong.
> 
> I can wait until tonight, at least.  The FS still mounts, and it's just the root
> subvolume that's affected; running btrfs-send on the /home subvolume still
> works.

Well, I got impatient, and just went ahead and did it (I have backups, after
all).  It looks like it worked: the affected files were moved to /lost+found/,
where I deleted them again, and btrfs-send works again.  The output of "btrfs
check" after --repair:

checking extents
checking free space cache
checking fs roots
checking csums
There are no extents for csum range 0-69632
Csum exists for 0-69632 but there is no extent record
Checking filesystem on /dev/sda1
UUID: 0267d8b3-a074-460a-832d-5d5fd36bae64
block group 274307481600 has wrong amount of free spacefailed to load free space cache for block group 274307481600
found 60980420666 bytes used err is 1
total csum bytes: 57521732
total tree bytes: 1996800000
total fs tree bytes: 1791721472
total extent tree bytes: 127942656
btree space waste bytes: 460072661
file data blocks allocated: 478650343424
 referenced 73326161920
btrfs-progs v4.1.2

If I notice anything amiss, I'll report back.

(One other thing I found interesting was that "btrfs scrub" didn't care about
the link count errors.)

Greetings.
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* Re: Deleted files cause btrfs-send to fail
  2015-08-14 21:37       ` Marc Joliet
@ 2015-08-15  5:10         ` Duncan
  2015-08-15  9:19           ` Marc Joliet
  2015-08-23 13:22         ` [SOLVED] " Marc Joliet
  1 sibling, 1 reply; 8+ messages in thread
From: Duncan @ 2015-08-15  5:10 UTC
  To: linux-btrfs

Marc Joliet posted on Fri, 14 Aug 2015 23:37:37 +0200 as excerpted:

> (One other thing I found interesting was that "btrfs scrub" didn't care
> about the link count errors.)

A lot of people are confused about exactly what btrfs scrub does, and 
expect it to detect and possibly fix stuff it has nothing to do with.  
It's *not* an fsck.

Scrub does one very useful, but limited, thing.  It systematically 
verifies that the computed checksums for all data and metadata covered by 
checksums match the corresponding recorded checksums.  For dup/raid1/
raid10 modes, if there's a match failure, it will look up the other copy 
and see if it matches, replacing the invalid block with a new copy of the 
other one, assuming it's valid.  For raid56 modes, it attempts to compute 
the valid copy from parity and, again assuming a match after doing so, 
does the replace.  If a valid copy cannot be found or computed, either 
because it's damaged too or because there's no second copy or parity to 
fall back on (single and raid0 modes), then scrub will detect but cannot 
correct the error.

In routine usage, btrfs automatically does the same thing if it happens 
to come across checksum errors in its normal IO stream, but it has to 
come across them first.  Scrub's benefit is that it systematically 
verifies (and corrects errors where it can) checksums on the entire 
filesystem, not just the parts that happen to appear in the normal IO 
stream.
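
The verify-and-repair logic described above can be sketched in a few lines.
This is only an illustration under stated assumptions: zlib.crc32 stands in
for the filesystem's real checksum, the csum() and scrub_block() names are
made up for the example, and only the mirrored case (dup/raid1/raid10) is
modeled, not parity reconstruction for raid56.

```python
import zlib


def csum(block):
    # Stand-in checksum, used only to keep the sketch stdlib-only.
    return zlib.crc32(block)


def scrub_block(copies, recorded):
    """Verify every copy of one block against its recorded checksum.

    copies   -- list of bytes objects (one entry for single, two for dup/raid1)
    recorded -- checksum stored when the block was written
    Returns (status, copies); bad copies are replaced by a good one if any exists.
    """
    good = [c for c in copies if csum(c) == recorded]
    if len(good) == len(copies):
        return "ok", copies
    if not good:
        # Detected, but there is nothing valid to rewrite from.
        return "uncorrectable", copies
    return "repaired", [good[0]] * len(copies)


data = b"metadata block"
recorded = csum(data)
print(scrub_block([data, b"metadata blocK"], recorded)[0])  # mirrored copy available
print(scrub_block([b"metadata blocK"], recorded)[0])        # single copy only
```

Note that nothing in scrub_block() looks inside the block.  It only compares
checksums, which is the whole point of the distinction from an fsck.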

Such checksum errors can be for a few reasons...

I have one ssd that's gradually failing and returns checksum errors 
fairly regularly.  Were I using a normal filesystem I'd have had to 
replace it some time ago.  But with btrfs in raid1 mode and regular 
scrubs (and backups, should they be needed; sometimes I let them get a 
bit stale, but I do have them and am prepared to live with the stale 
restored data if I have to), I've been able to keep using the failing 
device.  When the scrubs hit errors and btrfs does the rewrite from the 
good copy, a block relocation on the failing device is triggered as well, 
with the bad block taken out of service and a new one from the set of 
spares all modern devices have taking its place.  Currently, smartctl -A 
reports 904 reallocated sectors raw value, with a standardized value of 
92.  Before the first reallocated sector, the standardized value was 253, 
perfect.  With the first reallocated sector, it immediately dropped to 
100, apparently the rounded percentage of spare sectors left.  It has 
gradually dropped since then to its current 92, with a threshold value of 
36.  So while it's gradually failing, there's still plenty of spare 
sectors left.  Normally I would have replaced the device even so, but 
I've never had the opportunity to actually watch a slow failure 
continue to get worse over time, and now that I do I'm a bit curious how 
things will go, so I'm just letting it happen, tho I do have a 
replacement device already purchased and ready, when the time comes. 

So real media failure, bitrot, is one reason for bad checksums.  The data 
read back from the device simply isn't the same data that was stored to 
it, and the checksum fails as a result.

Of course bad connector cables or storage chipset firmware or hardware are 
another "hardware" cause.

Sudden reboot or power loss, with data being actively written and one 
copy either already updated or not yet touched, while the other is 
actually being written at the time of the crash so the write isn't 
completed, is yet another reason for checksum failure.  This one is 
actually why a scrub can appear to do so much more than it does, because 
where there's a second copy (or parity) of the data available, scrub can 
use it to recover the partially written copy (which being partially 
written fails its checksum verification) to either the completed write 
state, if the other copy was already written, or the pre-write state, if 
the other copy hadn't been written at all, yet.  In this way the result 
is often the same one an fsck would normally produce, detecting and 
fixing the error, but the mechanism is entirely different -- it only 
detected and fixed the error because the checksum was bad and it had a 
good copy it could replace it with, not because it had any smarts about 
how the filesystem actually worked and could tell what the error was 
and repair the logic itself.


Meanwhile, in your case the problem was an actual btrfs logic bug -- it 
didn't track the inode ref-counts correctly, and didn't remove the inode 
when the last reference to it was deleted, because it still thought there 
were more references.  So the metadata actually written to storage was 
incorrect due to the logic flaw, but the checksum covering it was indeed 
the correct checksum for that metadata, as wrong as the metadata actually 
happened to be.  So scrub couldn't detect the error, because it was an 
error not in checksum, which was computed correctly over the metadata, 
but in the logic of the metadata itself as it was written.  Scrub 
therefore had nothing to do with that error and was in fact totally 
oblivious to the fact that the valid checksum covered flawed data in the 
first place.  Only a tool that could follow the actual logic, send in 
this case, since it has to follow the logic in order to properly send 
it, could detect the error, and only btrfs check knew enough about the 
logic to both detect the problem and correct it -- tho even then, it 
couldn't totally fix it, as part of the metadata was irretrievably 
missing, so it simply dropped what it could retrieve in lost+found.


That should make the answer to the question of why scrub couldn't detect 
and fix the problem clearer -- scrub only detects and possibly fixes a 
very specific problem, checksum verification failure, and that's not the 
problem you had.  As far as scrub was concerned, the checksums were fine, 
and that's all it knows about, so to it, the data and metadata were fine.
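
To make the distinction concrete, here is a toy model (not btrfs code) of
metadata that is logically wrong but correctly checksummed.  Everything in
it is an assumption made for illustration: the JSON layout, zlib.crc32 as
the checksum, and the write_block(), scrub_ok() and check_link_counts()
helpers.

```python
import json
import zlib


def write_block(meta):
    """Serialize metadata and record a checksum over the bytes as written."""
    raw = json.dumps(meta, sort_keys=True).encode()
    return raw, zlib.crc32(raw)


def scrub_ok(raw, recorded):
    # Scrub-style test: do the stored bytes still match the stored checksum?
    return zlib.crc32(raw) == recorded


def check_link_counts(raw):
    """Check-style test: does each inode's nlink match the names pointing at it?"""
    meta = json.loads(raw)
    problems = []
    for ino, nlink in meta["inodes"].items():
        refs = sum(1 for target in meta["dir_entries"].values() if target == ino)
        if refs != nlink:
            problems.append(ino)
    return problems


# Simulated bug: the directory entry is gone but nlink was never decremented.
raw, recorded = write_block({"inodes": {"100": 1}, "dir_entries": {}})
print(scrub_ok(raw, recorded))   # checksum covers the flawed metadata just fine
print(check_link_counts(raw))    # only the logic-aware check notices inode 100
```

The checksum was computed over the already-wrong metadata, so the
checksum-only test passes while the structural test flags the inode.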

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: Deleted files cause btrfs-send to fail
  2015-08-15  5:10         ` Duncan
@ 2015-08-15  9:19           ` Marc Joliet
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Joliet @ 2015-08-15  9:19 UTC
  To: linux-btrfs

On Sat, 15 Aug 2015 05:10:57 +0000 (UTC)
Duncan <1i5t5.duncan@cox.net> wrote:

> Marc Joliet posted on Fri, 14 Aug 2015 23:37:37 +0200 as excerpted:
> 
> > (One other thing I found interesting was that "btrfs scrub" didn't care
> > about the link count errors.)
> 
> A lot of people are confused about exactly what btrfs scrub does, and 
> expect it to detect and possibly fix stuff it has nothing to do with.  
> It's *not* an fsck.
> 
> Scrub does one very useful, but limited, thing.  It systematically 
> verifies that the computed checksums for all data and metadata covered by 
> checksums match the corresponding recorded checksums.  For dup/raid1/
> raid10 modes, if there's a match failure, it will look up the other copy 
> and see if it matches, replacing the invalid block with a new copy of the 
> other one, assuming it's valid.  For raid56 modes, it attempts to compute 
> the valid copy from parity and, again assuming a match after doing so, 
> does the replace.  If a valid copy cannot be found or computed, either 
> because it's damaged too or because there's no second copy or parity to 
> fall back on (single and raid0 modes), then scrub will detect but cannot 
> correct the error.
> 
> In routine usage, btrfs automatically does the same thing if it happens 
> to come across checksum errors in its normal IO stream, but it has to 
> come across them first.  Scrub's benefit is that it systematically 
> verifies (and corrects errors where it can) checksums on the entire 
> filesystem, not just the parts that happen to appear in the normal IO 
> stream.

I know all that, I just thought it was interesting and wanted to remark on
it. After thinking about it a bit, of course, it makes perfect sense and is
not very interesting at all:  scrub will just verify that the checksums match,
no matter whether the underlying (meta)data is valid or not.

> Such checksum errors can be for a few reasons...
> 
> I have one ssd that's gradually failing and returns checksum errors 
> fairly regularly.  Were I using a normal filesystem I'd have had to 
> replace it some time ago.  But with btrfs in raid1 mode and regular 
> scrubs (and backups, should they be needed; sometimes I let them get a 
> bit stale, but I do have them and am prepared to live with the stale 
> restored data if I have to), I've been able to keep using the failing 
> device.  When the scrubs hit errors and btrfs does the rewrite from the 
> good copy, a block relocation on the failing device is triggered as well, 
> with the bad block taken out of service and a new one from the set of 
> spares all modern devices have taking its place.  Currently, smartctl -A 
> reports 904 reallocated sectors raw value, with a standardized value of 
> 92.  Before the first reallocated sector, the standardized value was 253, 
> perfect.  With the first reallocated sector, it immediately dropped to 
> 100, apparently the rounded percentage of spare sectors left.  It has 
> gradually dropped since then to its current 92, with a threshold value of 
> 36.  So while it's gradually failing, there's still plenty of spare 
> sectors left.  Normally I would have replaced the device even so, but 
> I've never had the opportunity to actually watch a slow failure 
> continue to get worse over time, and now that I do I'm a bit curious how 
> things will go, so I'm just letting it happen, tho I do have a 
> replacement device already purchased and ready, when the time comes. 

I'm curious how that will pan out.  My experience with HDDs is that at some
point the sector reallocations start picking up at a somewhat constant (maybe
even accelerating) rate.  I wonder how SSDs behave in this regard.

> So real media failure, bitrot, is one reason for bad checksums.  The data 
> read back from the device simply isn't the same data that was stored to 
> it, and the checksum fails as a result.
> 
> Of course bad connector cables or storage chipset firmware or hardware are 
> another "hardware" cause.
> 
> Sudden reboot or power loss, with data being actively written and one 
> copy either already updated or not yet touched, while the other is 
> actually being written at the time of the crash so the write isn't 
> completed, is yet another reason for checksum failure.  This one is 
> actually why a scrub can appear to do so much more than it does, because 
> where there's a second copy (or parity) of the data available, scrub can 
> use it to recover the partially written copy (which being partially 
> written fails its checksum verification) to either the completed write 
> state, if the other copy was already written, or the pre-write state, if 
> the other copy hadn't been written at all, yet.  In this way the result 
> is often the same one an fsck would normally produce, detecting and 
> fixing the error, but the mechanism is entirely different -- it only 
> detected and fixed the error because the checksum was bad and it had a 
> good copy it could replace it with, not because it had any smarts about 
> how the filesystem actually worked and could tell what the error was 
> and repair the logic itself.
> 
> 
> Meanwhile, in your case the problem was an actual btrfs logic bug -- it 
> didn't track the inode ref-counts correctly, and didn't remove the inode 
> when the last reference to it was deleted, because it still thought there 
> were more references.  So the metadata actually written to storage was 
> incorrect due to the logic flaw, but the checksum covering it was indeed 
> the correct checksum for that metadata, as wrong as the metadata actually 
> happened to be.  So scrub couldn't detect the error, because it was an 
> error not in checksum, which was computed correctly over the metadata, 
> but in the logic of the metadata itself as it was written.  Scrub 
> therefore had nothing to do with that error and was in fact totally 
> oblivious to the fact that the valid checksum covered flawed data in the 
> first place.  Only a tool that could follow the actual logic, send in 
> this case, since it has to follow the logic in order to properly send 
> it, could detect the error, and only btrfs check knew enough about the 
> logic to both detect the problem and correct it -- tho even then, it 
> couldn't totally fix it, as part of the metadata was irretrievably 
> missing, so it simply dropped what it could retrieve in lost+found.
> 
> 
> That should make the answer to the question of why scrub couldn't detect 
> and fix the problem clearer -- scrub only detects and possibly fixes a 
> very specific problem, checksum verification failure, and that's not the 
> problem you had.  As far as scrub was concerned, the checksums were fine, 
> and that's all it knows about, so to it, the data and metadata were fine.

Yeah, that's a more verbose way to put it :) .  Thanks anyway.

Greetings
-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


* [SOLVED] Re: Deleted files cause btrfs-send to fail
  2015-08-14 21:37       ` Marc Joliet
  2015-08-15  5:10         ` Duncan
@ 2015-08-23 13:22         ` Marc Joliet
  1 sibling, 0 replies; 8+ messages in thread
From: Marc Joliet @ 2015-08-23 13:22 UTC
  To: linux-btrfs

On Fri, 14 Aug 2015 23:37:37 +0200
Marc Joliet <marcec@gmx.de> wrote:

> If I notice anything amiss, I'll report back.

I haven't noticed anything amiss, so I'm marking this thread as SOLVED.

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup

