public inbox for linux-btrfs@vger.kernel.org
* What does scrub do?
@ 2014-04-11 11:23 Alex
  2014-04-11 18:36 ` Duncan
  2014-04-15 16:26 ` David Sterba
  0 siblings, 2 replies; 4+ messages in thread
From: Alex @ 2014-04-11 11:23 UTC
  To: linux-btrfs

Hi all,

Debian testing/Jessie-to-be, except that the kernel and btrfs-tools are from
unstable, so usually a couple of weeks behind what you/Linus publish.
Linux XX 3.13-1-amd64 #1 SMP Debian 3.13.7-1 (2014-03-25) x86_64
Btrfs-tools v3.12 Debian standard (not particularly messed with looks like) 

I've never had scrub report anything other than 0 (zero) errors. Ever.
Yet I've had more than one ( ;-) ) problem which required btrfs-zero-log
and/or btrfs check --repair. These are usually my fault - fixed it 'til it broke.

root@XX ~ # btrfs scrub status /
scrub status for f8152a67-3c2e-4da1-812e-9a6ab2ad1102
scrub started at Fri Apr 11 09:55:36 2014 and finished after 44 seconds
total bytes scrubbed: 1.40GiB with 0 errors

[    7.502338] btrfs: device label china devid 1 transid 938773 /dev/vda1
[    7.514213] btrfs: device label china devid 1 transid 938773 /dev/vda1
[    7.530893] btrfs: disk space caching is enabled
[    7.530897] btrfs: has skinny extents
[    7.720288] btrfs: bdev /dev/vda1 errs: wr 0, rd 0, flush 0, corrupt 66, gen 2
[   18.967319] btrfs: device label china devid 1 transid 938773 /dev/vda1
[   19.360767] btrfs: device label china devid 1 transid 938773 /dev/vda1

This scrub and dmesg were taken within minutes of each other. So what is the
utility of running scrub? Or have I got the wrong idea of what scrub
should report? This VM guest doesn't get messed with often, and is kept 

Very small KVM virtual machine - easy to send you a btrfs dump. Almost
vanilla set-up too. Just say the word.

Have been running btrfs here for quite some while (years, since Linux 3.1, I
think) on a server. Very, very stable (lzo compression is sometimes not quite
as stable as zlib, and I only run it on the desktop machine).

People: for auto snapshots use Snapper (a la SUSE) which is now in Debian et
al. Only peculiarity is that clear-down of daily snapshots only happens in
the night so you don't need to put many/any hourly snapshots in.

Thank you. And well done/thank you to the contributors.

Al.

PS: please get the 3.14 tools release out - perhaps the fixes have 
already gone through the tree and I am just shouting at the wind.



* Re: What does scrub do?
  2014-04-11 11:23 What does scrub do? Alex
@ 2014-04-11 18:36 ` Duncan
  2014-04-15 17:13   ` Alex
  2014-04-15 16:26 ` David Sterba
  1 sibling, 1 reply; 4+ messages in thread
From: Duncan @ 2014-04-11 18:36 UTC
  To: linux-btrfs

Alex posted on Fri, 11 Apr 2014 11:23:31 +0000 as excerpted:

> I've never had scrub report anything other than 0 (zero) errors. Ever.
> Yet I've had more than one ( ;-) ) problem which required btrfs-zero-log
> and/or btrfs check --repair. These are usually my fault - fixed it 'til it
> broke.
> 
> root@XX ~ # btrfs scrub status /
> scrub status for f8152a67-3c2e-4da1-812e-9a6ab2ad1102
> scrub started at Fri Apr 11 09:55:36 2014 and finished after 44 seconds
> total bytes scrubbed: 1.40GiB with 0 errors

[snip]

> [    7.720288] btrfs: bdev /dev/vda1 errs: wr 0, rd 0, flush 0,
> corrupt 66, gen 2

[snip]

> This scrub and dmesg were taken within minutes of each other. So what is
> the utility of running scrub? Or have I got the wrong idea of what
> scrub should report?

Probably the latter (wrong idea...), although you might have the wrong idea 
of what the mount is reporting rather than the wrong idea about scrub, 
or, more likely, a bit of both.

Scrub is designed to fix one specific kind of error, and then in only one 
specific (but somewhat common) case.  Btrfs data and metadata are both 
checksummed.  Scrub goes over each individual checksummed object and 
calculates its checksum, verifying it against the checksum stored for 
it.  If the checksums don't match, it records an error.

Additionally, for errors, *IF* there's a second copy of the object and 
that copy DOES pass checksum validation, scrub will rewrite the bad copy 
using the good copy, "scrubbing" the data and fixing the errors it found.
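
As a rough illustration of the verify-then-repair behaviour described above, here is a toy Python model. It is a sketch under stated assumptions, not btrfs code: the block layout, the counter names, and the use of zlib's CRC-32 as the checksum are all made up for the example.

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in checksum for this toy model only (assumption: zlib CRC-32).
    return zlib.crc32(data)


def scrub(blocks):
    """Verify every copy of every block; rewrite bad copies from a good one.

    `blocks` is a list of dicts shaped like
    {"csum": int, "copies": [bytearray, ...]} (a hypothetical layout).
    Returns counters with hypothetical names.
    """
    stats = {"csum_errors": 0, "corrected": 0, "uncorrectable": 0}
    for block in blocks:
        # Look for any copy that still matches the stored checksum.
        good = next(
            (c for c in block["copies"] if checksum(bytes(c)) == block["csum"]),
            None,
        )
        for i, copy in enumerate(block["copies"]):
            if checksum(bytes(copy)) == block["csum"]:
                continue  # this copy is fine
            stats["csum_errors"] += 1
            if good is not None:
                # A valid second copy exists: overwrite the bad one with it.
                block["copies"][i] = bytearray(good)
                stats["corrected"] += 1
            else:
                # No valid copy anywhere: the error is detected but not fixed.
                stats["uncorrectable"] += 1
    return stats
```

A block with two copies and one corrupted copy comes back as corrected; a block with a single corrupted copy is counted as an error but left as it was.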

Here's the critical bit.  By default, btrfs keeps two copies of metadata, 
but *NOT* data.  On a single device filesystem, this is dup mode metadata 
(except on ssd, where it's single mode), single mode data.  On a multi-
device filesystem, metadata will default to raid1 mode instead of dup 
mode (a copy on each device instead of two copies on one device), while 
data still defaults to single mode -- just one copy.  There is one 
further exception: for filesystems under 1 GiB in size, btrfs defaults to 
mixed-mode, with data and metadata sharing the same mixed chunks.
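
The defaults just listed can be summarised as a small decision function. This is only a sketch of the rules as stated in this message (2014-era btrfs); the function name and return shape are hypothetical, later releases may choose differently, and it does not model which profile mixed chunks receive.

```python
GIB = 1024 ** 3


def default_profiles(num_devices: int, is_ssd: bool, size_bytes: int) -> dict:
    """Return the default profiles as described in this message.

    Hypothetical helper for illustration; not derived from mkfs.btrfs source.
    """
    if num_devices > 1:
        metadata = "raid1"   # one copy on each of two devices
    elif is_ssd:
        metadata = "single"  # single device, SSD: one copy only
    else:
        metadata = "dup"     # single device: two copies on the same device
    return {
        "data": "single",            # data defaults to one copy in all cases
        "metadata": metadata,
        "mixed": size_bytes < GIB,   # small filesystems use mixed chunks
    }
```

For example, `default_profiles(1, False, 20 * GIB)` gives dup metadata and single data, which is the case where scrub can repair metadata but only report data errors.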

Of course if you created the filesystem with specific modes (say -draid1, 
for raid1 mode data, or -msingle, for single mode metadata) or if you did 
a balance-convert to change the mode or switched between multi-device and 
single-device filesystem, the defaults won't apply -- you'll have what 
you set (or the default for the originally created filesystem).

While scrub can detect checksum errors in single (and raid0) mode, there 
won't be a second hopefully valid copy to replace bad copies with, so it 
will detect checksum errors but won't be able to fix them.  Only if 
there's a second, valid copy, can it fix the errors it detects.

Which is one reason I run most of my btrfs filesystems with two devices 
configured as raid1 for both data and metadata.  (I do have a couple very 
small filesystems, /boot and its backup on the other device, that are 
mixed-mode dup-mode, on a single device, but of course dup-mode has a 
second copy too.)

Anyway, if you have never seen scrub errors, that's because scrub has 
never come across such checksum validation errors on your system.

Meanwhile, the corrupt errors you see in the above mount are likely 
historical.  The errors reported by mount above, and by btrfs device stats, 
are the number of errors since the filesystem was created or since the 
last reset (btrfs device stats -z prints AND RESETS the stats).  As you've 
never had scrub report an error, the corruptions likely got fixed some 
other way, possibly by deleting the affected files.  But the count has 
never been reset, so you're still seeing those historical errors.
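
To make the distinction concrete, the counters in that mount message can be pulled out with a few lines of Python. This is a minimal sketch: the regular expression is written against the exact log line quoted in this thread, and other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the log line quoted in this thread (an assumption
# about the format, not a guarantee for other kernel versions).
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_errs(line: str):
    """Return (device, counters) from a 'btrfs: bdev ... errs:' line, or None."""
    # Collapse whitespace so a line wrapped by a mail client still matches.
    match = ERRS_RE.search(" ".join(line.split()))
    if match is None:
        return None
    counters = {k: int(v) for k, v in match.groupdict().items() if k != "dev"}
    return match.group("dev"), counters


line = ("[    7.720288] btrfs: bdev /dev/vda1 errs: wr 0, rd 0, flush 0, "
        "corrupt 66,\ngen 2")
device, counters = parse_errs(line)
print(device, counters)
```

These values are running totals kept per device, so a non-zero `corrupt` here says nothing about what the most recent scrub found.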

> PS: please get the 3.14 tools release out - perhaps the fixes have
> already gone through the tree and I am just shouting at the wind.

FWIW, btrfs-progs v3.14 is tagged in git, and I'm running it here.  I 
build from git, so I don't know the tarball release status, but the tag 
is definitely there and available, so it's definitely out.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: What does scrub do?
  2014-04-11 11:23 What does scrub do? Alex
  2014-04-11 18:36 ` Duncan
@ 2014-04-15 16:26 ` David Sterba
  1 sibling, 0 replies; 4+ messages in thread
From: David Sterba @ 2014-04-15 16:26 UTC
  To: Alex; +Cc: linux-btrfs

On Fri, Apr 11, 2014 at 11:23:31AM +0000, Alex wrote:
> People: for auto snapshots use Snapper (a la SUSE) which is now in Debian et
> al. Only peculiarity is that clear-down of daily snapshots only happens in
> the night so you don't need to put many/any hourly snapshots in.

Can be fixed by copying

/etc/cron.daily/suse.de-snapper -> ../cron.hourly

though this would increase the device load, as there will always be a
snapshot to clean.

Or you can run the script any time you want to start the cleanup.


* Re: What does scrub do?
  2014-04-11 18:36 ` Duncan
@ 2014-04-15 17:13   ` Alex
  0 siblings, 0 replies; 4+ messages in thread
From: Alex @ 2014-04-15 17:13 UTC
  To: linux-btrfs

Duncan <1i5t5.duncan <at> XXX> writes:

Wow Duncan! Thank you so much for your extensive post. Well written and very
well received.

I do think your 'critical bit' comments are worth reiterating! I've
bookmarked your email!

Qu: By-the-by, does anyone know how to re-lay the CRCs down again?

I don't seem to be able to 'coax' it myself (but the corruption error is
still there). I created backup problems (on the VMs) for myself by using a
naive approach at the time. So it is a good chance to put it right now.

Thank you (!) for the 3.14 tools poke; I deserved it! Just bad timing on my
end, it seems .. tl;d(on't)r!

Qu: is anyone actively using seed devices? I saw one post relatively
recently. I can see "Ebonacco" and possibly "Killermist" are.

Kind regards

Alex.


