Balance & scrub & defrag

* Balance & scrub & defrag
From: sys.syphus @ 2014-12-10 22:15 UTC (permalink / raw)
  To: linux-btrfs

I am working on a script that i can run daily that will do maintenance
on my btrfs mountpoints. is there any reason not to concurrently do
all of the above? possibly including discards as well.

also, is there anything existing currently that will do maintenance on
btrfs so i don't have to reinvent the wheel?

#!/bin/bash
btrfs filesystem defragment -r -v /media/btrfs/  &
btrfs scrub start /media/btrfs/ &
btrfs balance start /media/btrfs/ &

watch -d -n 30 "btrfs balance status /media/btrfs/; btrfs scrub status /media/btrfs/"

* Re: Balance & scrub & defrag
From: Robert White @ 2014-12-11  1:17 UTC (permalink / raw)
  To: sys.syphus, linux-btrfs

On 12/10/2014 02:15 PM, sys.syphus wrote:
> I am working on a script that i can run daily that will do maintenance
> on my btrfs mountpoints. is there any reason not to concurrently do
> all of the above? possibly including discards as well.
>
>
> also, is there anything existing currently that will do maintenance on
> btrfs so i don't have to reinvent the wheel?
>
> #!/bin/bash
> btrfs filesystem defragment -r -v /media/btrfs/  &
> btrfs scrub start /media/btrfs/ &
> btrfs balance start /media/btrfs/ &
>
>
> watch -d -n 30 "btrfs balance status /media/btrfs/; btrfs scrub status
> /media/btrfs/"

I'd recommend doing "none of the above" on a daily basis. One of the 
goals of the filesystem design is to remove the need for any of these 
operations on any regular basis. You are just going to bog down your 
system and increase your heat and wear profiles for no good reason.

Those tools should be used if you notice something fishy like recent 
decreases in efficiency or errors in your log files.

A _monthly_ scrub is maybe worth scheduling if you have a lot of churn 
in your disk contents.

Defragging should be done after significant content additions/changes 
(like replacing a lot of files via package management) and limited to 
the directories most likely changed.

Balancing is almost never necessary and can be anti-helpful if a file 
experiences random updates in batches (because the nicely packed file 
may end up far, far away from the active data extent where its COW 
events are taking place).

Resist the urge to tinker with production systems. The exposure 
(rewriting stable data is just the chance to destabilize your data, 
balancing your drive can take two files that always change together and 
put them far away from one another, etc) is not worth the nearly 
non-existent chance of benefit. Once the system is "good" just leave it 
that way until you notice something "not good" coming on the horizon.

If you feel you _must_ do these tasks then doing them all at once, where 
possible, will just make each task take longer. If you are transcribing 
a file over on one side of the disk to defrag it, and you are 
transcribing an extent on the other side of the disk to balance it, you 
are just bouncing your disk heads back-and-forth and wasting wall-clock time.

So yeah, it's not Windows, it doesn't need the defrag hammer.

Trying to over-manage the system will prevent it from seeking its 
dynamic (and so predictable) equilibrium.

* Re: Balance & scrub & defrag
From: Duncan @ 2014-12-11  8:33 UTC (permalink / raw)
  To: linux-btrfs

sys.syphus posted on Wed, 10 Dec 2014 16:15:17 -0600 as excerpted:

> I am working on a script that i can run daily that will do maintenance
> on my btrfs mountpoints. is there any reason not to concurrently do all
> of the above? possibly including discards as well.
> 
> 
> also, is there anything existing currently that will do maintenance on
> btrfs so i don't have to reinvent the wheel?
> 
> #!/bin/bash
> btrfs filesystem defragment -r -v /media/btrfs/  &
> btrfs scrub start /media/btrfs/ &
> btrfs balance start /media/btrfs/ &

Btrfs has had concurrency issues in the past, tho there has been a recent 
patch series aimed at fixing many of them.  Still, running more than one 
of defrag/scrub/balance at once, particularly on spinning rust (as 
opposed to SSD which is faster and doesn't have I/O bottlenecks to the 
same degree) does put a lot of stress on the system and is thus more 
likely to trigger bugs than running them one at a time.  If your goal is 
to stress-test and find and report bugs, that's a reasonable start, 
otherwise consider doing one at a time.

There's also the memory issue.  These utilities can take quite a bit of 
memory at times, particularly if you're running with lots of snapshots.

Meanwhile, as others have said, doing these daily is overkill.  If you're 
running multi-TB filesystems on spinning rust, it'll take several hours 
for one of these anyway.  Maybe once a week for scrub, which won't 
rewrite anything unless it finds errors.  Balance you don't need to run 
routinely, only when adding/deleting devices or if your data/metadata 
chunk balance (see btrfs fi df) gets out of balance.

And for defrag, take a look at the autodefrag mount option.  Tho be aware 
that it can interact badly with large (say half a gig or larger) files 
that are actively rewritten internally (random writes inside the file), 
such as VM images and databases.  With autodefrag on, you shouldn't have 
to worry about fragmentation at all, unless of course you're using big 
VMs or the like, but there are solutions available for that as well.  
See the wiki and many previous threads here for more.
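
For reference, a minimal sketch of switching autodefrag on (or back off) for an already-mounted filesystem via remount. It assumes the /media/btrfs/ mountpoint from the original script. The run and set_autodefrag helpers and the DRY_RUN variable are names made up for this illustration, and the sketch assumes a kernel that accepts the noautodefrag option. With the default DRY_RUN=1 the command is only printed, not executed.

```shell
#!/bin/bash
# Sketch only: run, set_autodefrag and DRY_RUN are illustrative names,
# not part of btrfs-progs or util-linux.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Toggle autodefrag on a mounted btrfs filesystem.
# $1 = mountpoint, $2 = "on" (default) or "off"
set_autodefrag() {
    local mnt="$1"
    local state="${2:-on}"
    local opt
    if [ "$state" = "on" ]; then
        opt="autodefrag"
    else
        opt="noautodefrag"   # assumption: kernel new enough to know this option
    fi
    run mount -o "remount,${opt}" "$mnt"
}

set_autodefrag /media/btrfs/ on
```

Run it with DRY_RUN=0 (as root) to actually remount. To make the setting permanent, the same option would go into the options column of the filesystem's /etc/fstab entry.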

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Balance & scrub & defrag
From: Russell Coker @ 2014-12-12  1:00 UTC (permalink / raw)
  To: Robert White; +Cc: sys.syphus, linux-btrfs

On Wed, 10 Dec 2014 17:17:28 Robert White wrote:
> A _monthly_ scrub is maybe worth scheduling if you have a lot of churn 
> in your disk contents.

I do weekly scrubs.  I recently had 2 disks in a RAID-1 array develop read 
errors within a month of each other.  The first scrub after replacing sdb 
revealed an error on sdc!

> Defragging should be done after significant content additions/changes 
> (like replacing a lot of files via package management) and limited to 
> the directories most likely changed.

I have never run defrag.  Currently all my BTRFS filesystems that have any 
performance requirements are on SSD and I don't think that defragmenting a SSD 
does much good.

> Balancing is almost never necessary and can be anti-helpful if a file 
> experiences random updates in batches (because the nicely packed file 
> may end up far, far away from the active data extent where its COW 
> events are taking place).

Running out of metadata space is a problem that calls for an occasional 
data balance.  If you set it to only balance chunks that are less than 
10% used then it doesn't take much time.
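
A minimal sketch of such a filtered balance, assuming the /media/btrfs/ mountpoint from the original script. -dusage=N is the balance filter that limits the run to data chunks below roughly N% usage. The balance_low_usage wrapper and the DRY_RUN variable are illustrative names only, and with the default DRY_RUN=1 the command is printed rather than run.

```shell
#!/bin/bash
# Sketch only: balance_low_usage and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

# Balance only data chunks whose usage is below the given percentage.
# $1 = mountpoint, $2 = usage threshold in percent (default 10)
balance_low_usage() {
    local mnt="$1"
    local pct="${2:-10}"
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "btrfs balance start -dusage=${pct} ${mnt}"
    else
        btrfs balance start "-dusage=${pct}" "$mnt"
    fi
}

balance_low_usage /media/btrfs/ 10
```

Mostly-empty chunks hold little data to relocate, which is why a low threshold keeps the run short. Set DRY_RUN=0 to execute for real.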

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

* Re: Balance & scrub & defrag
From: Robert White @ 2014-12-12  1:31 UTC (permalink / raw)
  To: russell; +Cc: sys.syphus, linux-btrfs

On 12/11/2014 05:00 PM, Russell Coker wrote:
> On Wed, 10 Dec 2014 17:17:28 Robert White wrote:
>> A _monthly_ scrub is maybe worth scheduling if you have a lot of churn
>> in your disk contents.
>
> I do weekly scrubs.  I recently had 2 disks in a RAID-1 array develop read
> errors within a month of each other.  The first scrub after replacing sdb
> revealed an error on sdc!

You need to buy better disks. 8-)

I use SMART (smartmontools etc) and its tests to keep track of and warn 
me of such issues. It's way more likely to catch incipient media 
failures long before scrub would. It's also more likely to correct 
situations before they become visible to userspace. It's also a way 
better full-platter scan that involves less real time delay and won't 
bog down a running system.

I reserve scrub for after maintenance and the occasional look-see.

But whatever works for you.

>
> The problem with running out of metadata space requires a need for an
> occasional data balance.  If you set it to only balance chunks that are less
> than 10% used then it doesn't take much time.

In very recent kernels the empty extent remover will take up most of 
this burden.

A shallow balance is fast, but you are missing most of its potential 
benefits at that point. I wash my clothes instead of just taking a lint 
brush to them. Half measures, repeated, lead to more and more fractional 
results.

Every time you sweep a 10% full extent into another extent far, far 
away you are perturbing your locality and probably shaving a little off 
of probable peak performance. It's the equivalent of organizing your 
sock drawer by just taking all the socks out of the dryer in a lump and 
cramming them into the back of the drawer. That is you are moving the 
most-changed items back to pack them against the least-changed ones. The 
natural lay of the filesystem is to spread out and churn. Repeatedly 
smashing it down is just going to wrinkle your data.

If you are getting anywhere near running out of metadata extents on any 
kind of regular basis then you need to reexamine your entire deal. Make 
sure you are running a recent kernel with the reclaim update. Do a full 
balance _once_ and then leave it alone. Maybe consider autodefrag if 
your file load is compatible (not a lot of VMs and RDBMS extents).

Of course if this is your pirate warez machine and you are regularly 
passing torrents through it, then you just need more space and better 
delete discipline.

* Re: Balance & scrub & defrag
From: Zygo Blaxell @ 2014-12-12  4:32 UTC (permalink / raw)
  To: sys.syphus; +Cc: linux-btrfs

On Wed, Dec 10, 2014 at 04:15:17PM -0600, sys.syphus wrote:
> I am working on a script that i can run daily that will do maintenance
> on my btrfs mountpoints. is there any reason not to concurrently do
> all of the above? possibly including discards as well.
> 
> 
> also, is there anything existing currently that will do maintenance on
> btrfs so i don't have to reinvent the wheel?

There's not a lot of wheel to reinvent.  Just a one-liner in a crontab
is sufficient.

> #!/bin/bash
> btrfs filesystem defragment -r -v /media/btrfs/  &
> btrfs scrub start /media/btrfs/ &
> btrfs balance start /media/btrfs/ &

They should be run sequentially for simple performance reasons.  They all
attempt to occupy all the available disk bandwidth, so running them all
at the same time just increases access latency and usually makes them
much slower than if they were run sequentially.  There is no cooperative
scheduling of these operations in btrfs, even though they theoretically
could be combined into a single pass.
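
A minimal sketch of the sequential variant of the script quoted above, using the same three commands and mountpoint. The only real change is dropping the trailing "&" and adding -B to scrub, which keeps it in the foreground so the next step waits for it. The run and maintain helpers and the DRY_RUN variable are made-up names for illustration. With the default DRY_RUN=1 the commands are only printed. Whether to run all three at all is a separate question, covered below.

```shell
#!/bin/bash
# Sketch only: run, maintain and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the three operations one after another; stop at the first failure.
# $1 = mountpoint
maintain() {
    local mnt="$1"
    run btrfs filesystem defragment -r -v "$mnt" || return 1
    # -B: do not background the scrub, wait until it has finished
    run btrfs scrub start -B "$mnt" || return 1
    # balance start blocks by itself until done or cancelled
    run btrfs balance start "$mnt" || return 1
}

maintain /media/btrfs/
```

Each step only starts once the previous one has returned, so the operations never compete for disk bandwidth. Set DRY_RUN=0 to execute the commands instead of printing them.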

Run scrub once a week on low-end consumer drives, once a month on drives
designed for NAS applications.  Scrub is a fast and (assuming no errors
are detected) read-only scan of allocated data areas that is well worth
its relatively low cost.  There's no need to run it daily--but there's no
reason _not_ to run it daily either, if your disks' speed-to-size ratio
is big enough.

Don't run defragment at all, unless you have a database or VM image,
and if you do, run defrag only on that.  It's necessary for databases
because each fragment ends up being the size of a database page, and
the extent records for large badly-fragmented files consume almost
as much RAM as the file pages themselves.  defrag on arbitrary large
files is a fairly good way to lock yourself out of your system:  defrag
will eventually finish, but in pathological cases it can take hours and
prevent you from using the filesystem while it runs.  You can try using
the autodefrag mount option instead, but be prepared to turn it off if
autodefrag is not right for your workload.
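
A minimal sketch of restricting defrag to specific files rather than the whole tree. The file path in the example call is hypothetical, and the run and defrag_files helpers plus the DRY_RUN variable are illustrative names. With the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/bash
# Sketch only: run, defrag_files and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Defragment only the files given as arguments, one at a time.
defrag_files() {
    local f
    for f in "$@"; do
        run btrfs filesystem defragment -v "$f" || return 1
    done
}

# Hypothetical database file; substitute the real path.
defrag_files /media/btrfs/db/example.sqlite
```

Without -r, only the named files are touched, so the rest of the filesystem stays as it is.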

Balance is something to use only when there is a configuration change
(e.g. you added a new disk or replaced one with a larger one) or you've
drastically changed the average size of files in a nearly-full filesystem.
It will make the filesystem painfully slow the whole time it runs, and it
can run for weeks on a filesystem smaller than 1TB.  There's a _reason_
why balance requests persist across reboots.  Speaking of reboots: if
a balance is interrupted by a reboot, it can delay the next mount for
minutes or hours (the mount command seems to hang until it has processed
the interrupted block group) depending on filesystem size.

> watch -d -n 30 "btrfs balance status /media/btrfs/; btrfs scrub status
> /media/btrfs/"

That part is fine.  I throw in 'btrfs fi df' into the watches too.

* Re: Balance & scrub & defrag
From: Erkki Seppala @ 2014-12-12  9:17 UTC (permalink / raw)
  To: linux-btrfs

Robert White <rwhite@pobox.com> writes:

> You need to buy better disks. 8-)

Where can one buy these better disks with reasonable prices?-) Disks are
best thought of as consumables.

> I use SMART (smartmontools etc) and its tests to keep track of and
> warn me of such issues. It's way more likely to catch incipient media
> failures long before scrub would.

That may be sort of true, but I think even SMART is helped by the fact
that the media is read through from the beginning to the end*, so it can
detect even the errors that don't bubble through the IO layer. And BTRFS
can indeed note errors that the media doesn't - two checksums is better
than one checksum, assuming they aren't exactly the same algorithm ;).

Do you alternatively execute SMART self tests?

* scrub doesn't do this, it reads only through used data

-- 
  _____________________________________________________________________
     / __// /__ ____  __               http://www.modeemi.fi/~flux/\   \
    / /_ / // // /\ \/ /                                            \  /
   /_/  /_/ \___/ /_/\_\@modeemi.fi                                  \/

* Re: Balance & scrub & defrag
From: Tomasz Chmielewski @ 2014-12-12  9:49 UTC (permalink / raw)
  To: Btrfs BTRFS

> I use SMART (smartmontools etc) and its tests to keep track of and warn
> me of such issues. It's way more likely to catch incipient media
> failures long before scrub would. It's also more likely to correct
> situations before they become visible to userspace. It's also a way
> better full-platter scan that involves less real time delay and won't
> bog down a running system.

Don't put too much trust in SMART - sectors can rot unexpectedly even if 
SMART is thinking everything is fine with the drive.

I had exactly this issue recently:

1) one of the drives in the server failed and was replaced

2) "btrfs device delete missing" (which basically moves data from the 
remaining drive to the new one) was failing with IO error

3) according to SMART, the drive with IO error was fine (no reallocated 
sectors, no warnings etc.)


So, scrub to the rescue - it printed the "broken" files, and after removing 
them manually it was possible to finish "btrfs device delete missing".

Probably it makes sense to run scrub occasionally (just like mdraid is 
doing on most distributions).


-- 
Tomasz Chmielewski
http://www.sslrack.com


* Re: Balance & scrub & defrag
From: Robert White @ 2014-12-12 13:32 UTC (permalink / raw)
  To: Erkki Seppala, linux-btrfs

On 12/12/2014 01:17 AM, Erkki Seppala wrote:
> Robert White <rwhite@pobox.com> writes:
>
>> You need to buy better disks. 8-)
>
> Where can one buy these better disks with reasonable prices?-) Disks are
> best thought of as consumables.

A good disk is only about 9% more expensive. So like the WD "green" 
disks were all cheap because they were (essentially) the disks that 
didn't pass the full quality suite for the higher WD lines like "caviar".

"Inexpensive" and "Cheap" are not the same thing.

Disks are not best thought of as consumables unless the data you store 
on them is discardable.

> Do you alternatively execute SMART self tests?

Indeed. If you install and activate SMART but you never run the tests 
you've done another one of those half-measures I was talking about.

The "long offline" test reads 100% of the disk surface (well, up until 
it hits an error anyway). But since none of that data has to leave the 
disk controller and go out through the interface etc it doesn't bog the 
rest of the system.

All but the oldest or cheapest drives have controllers that will "resume 
the offline test after any command" so you do

smartctl --test=long /dev/sda # or whatever

every few days and you'll know when things start to get dicey.

The one thing you do have to be watchful of is that the tests _stop_ 
when they hit the first read error, so you do have to keep up with things.

For instance I just had a pair of uncorrectable read errors. When I used 
hdparm to write the sectors, however, the disk didn't need to relocate 
the block(s) as bad. So it was some funky event on the disk itself.

Of course it's a very old disk (1525 days of power-on runtime) so two 
correctable-with-overwrite read errors isn't bad.
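
A minimal sketch of the routine described above: look at the self-test log from the previous run first (since a test stops at its first read error), then start the next long test. /dev/sda is just the example device from above. The run and smart_cycle helpers and the DRY_RUN variable are illustrative names, and with the default DRY_RUN=1 the commands are only printed.

```shell
#!/bin/bash
# Sketch only: run, smart_cycle and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1 = block device, e.g. /dev/sda
smart_cycle() {
    local dev="$1"
    # Show results of earlier self-tests, including any first failing LBA.
    run smartctl -l selftest "$dev"
    # Start a new long (full-surface) self-test inside the drive.
    run smartctl --test=long "$dev"
}

smart_cycle /dev/sda
```

The long test runs inside the drive, so the second command returns right away. The result shows up in the self-test log on the next invocation.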

* Re: Balance & scrub & defrag
From: Zygo Blaxell @ 2014-12-13  5:15 UTC (permalink / raw)
  To: Erkki Seppala; +Cc: linux-btrfs

On Fri, Dec 12, 2014 at 11:17:58AM +0200, Erkki Seppala wrote:
> That may be sort of true, but I think even SMART is helped by the fact
> that the media is read through from the beginning to the end*, so it can
> detect even the errors that don't bubble through the IO layer. And BTRFS
> can indeed note errors that the media doesn't - two checksums is better
> than one checksum, assuming they aren't exactly the same algorithm ;).
> 
> Do you alternatively execute SMART self tests?
> 
> * scrub doesn't do this, it reads only through used data

I do both.  They operate at different layers of the storage stack, and have
access to different information.  They also have different (and hopefully
non-overlapping) bugs.

scrub pros:

	+ can compare data with the other copies in RAID1 or DUP mode

	+ can fix bad data when good copies available

	+ slows down when other processes want to use the disk

	+ can be suspended and resumed at will by software

	+ error data is impervious to drive firmware bugs

	+ straightforward error reports

	+ only scans allocated data

scrub cons:

	- only scans allocated data

	- btrfs filesystems only

	- CPU and I/O burden

	- error sources are not localized:  scrub errors could be software
	bugs, bad RAM, bad CPU cooling, bad cabling, bad power supply,
	or bad hard drive

smart pros:

	+ runs in the background

	+ no CPU or I/O required, just read results from previous run
	and launch new test daily

	+ access to electrical and mechanical data from the drive
	that are otherwise unavailable to the host

	+ 100% surface scan (including bad sector count)

	+ logs host I/O errors that OS might miss
	(e.g. because they occur during BIOS booting)

	+ works with any filesystems, partitions, swap, etc.

	+ error sources are localized to the drive in test

smart cons:

	- buggy firmware does not detect or report error events when
	significant failures occur

	- buggy firmware does detect and report error events when
	significant failures do not occur

	- buggy firmware will make host accesses painfully slow during
	scan (WD Green is very bad for this)

	- firmware does not implement useful subset of SMART command set

	- SMART command set can be inaccessible through some SATA bridge
	chips (especially USB)

	- cannot fix anything, only report quantities of data already lost

	- cannot reliably detect RAM or CPU failure (on host or drive)

	- requires the drive to spin for 1-2 continuous hours during test

	- interpreting the raw data is a black art
