* Huge load on btrfs subvolume delete
@ 2016-08-15 10:39 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 10:39 UTC (permalink / raw)
To: linux-btrfs
Hi,
I'm a newbie with btrfs, and I have a problem with high load after each btrfs subvolume delete.
I use snapshots on LXC hosts under Debian Jessie with
- kernel 4.6.0-0.bpo.1-amd64
- btrfs-progs 4.6.1-1~bpo8
For backups, I run the following each day, for each subvolume:
btrfs subvolume snapshot -r $subvol $snap
# then later
ionice -c3 btrfs subvolume delete $snap
but ionice doesn't seem to have any effect here, and after a few minutes the load grows
quite high (30-40). I don't know how to make this deletion gentler on I/O.

Is there a better way to do this?

Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which seems to be the one
doing a lot of I/O?

Currently, the I/O priorities of my btrfs processes are:
ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
[btrfs-worker] none: prio 4
[btrfs-worker-hi] none: prio 4
[btrfs-delalloc] none: prio 4
[btrfs-flush_del] none: prio 4
[btrfs-cache] none: prio 4
[btrfs-submit] none: prio 4
[btrfs-fixup] none: prio 4
[btrfs-endio] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-rai] none: prio 4
[btrfs-endio-rep] none: prio 4
[btrfs-rmw] none: prio 4
[btrfs-endio-wri] none: prio 4
[btrfs-freespace] none: prio 4
[btrfs-delayed-m] none: prio 4
[btrfs-readahead] none: prio 4
[btrfs-qgroup-re] none: prio 4
[btrfs-extent-re] none: prio 4
[btrfs-cleaner] none: prio 0
[btrfs-transacti] none: prio 0
Thanks
--
Daniel
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 12:32 Austin S. Hemmelgarn
From: Austin S. Hemmelgarn @ 2016-08-15 12:32 UTC (permalink / raw)
To: Daniel Caillibaud, linux-btrfs

On 2016-08-15 06:39, Daniel Caillibaud wrote:
> Hi,
>
> I'm a newbie with btrfs, and I have a problem with high load after each
> btrfs subvolume delete.
>
> I use snapshots on LXC hosts under Debian Jessie with
> - kernel 4.6.0-0.bpo.1-amd64
> - btrfs-progs 4.6.1-1~bpo8
>
> For backups, I run the following each day, for each subvolume:
>
> btrfs subvolume snapshot -r $subvol $snap
> # then later
> ionice -c3 btrfs subvolume delete $snap
>
> but ionice doesn't seem to have any effect here, and after a few minutes
> the load grows quite high (30-40). I don't know how to make this deletion
> gentler on I/O.
Before I start explaining possible solutions, it helps to explain what's
actually happening here.

When you create a snapshot, BTRFS just scans down the tree for the
subvolume in question and creates new references to everything in that
subvolume in a separate tree. This is usually insanely fast, because all
that needs to be done is update metadata. When you delete a snapshot,
however, it has to remove any remaining references within the snapshot to
the parent subvolume, and it also has to process any changed data that now
differs from the parent subvolume for deletion, just like it would when
deleting a file. As a result, the work to create a snapshot depends only
on the complexity of the directory structure within the subvolume, while
the work to delete it depends on both that and how much the snapshot has
diverged from the parent subvolume.

The spike in load you're seeing is the filesystem handling all that
internal accounting in the background, and I'd be willing to bet that it
varies based on how fast things are changing in the parent subvolume.
Setting an idle I/O scheduling priority on the command that deletes the
snapshot does nothing, because all that command does is tell the kernel to
delete the snapshot; the actual deletion is handled in the filesystem
driver.

While it won't help with the spike in load, you probably want to add
`--commit-after` to that subvolume deletion command. That will cause the
spike to happen almost immediately, and the command won't return until the
filesystem is finished with the accounting, so the load should be back to
normal by the time it returns.
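For example, a minimal sketch of the adjusted deletion step, reusing the
$snap placeholder from the original script (ionice is dropped, since it
has no effect here):

# block until the transaction that removes the snapshot is committed
btrfs subvolume delete --commit-after $snap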
> Is there a better way to do this?

While there isn't any way I know of to make the deletion itself cheaper,
there are ways you can reduce the impact by reducing how much you're
backing up:

1. You almost certainly don't need to back up the logs, and if you do,
they should probably be backed up independently from the rest of the
system image. Logs mostly just add extra size to a backup and have little
value when you restore one. The simplest way to exclude them in your case
is to make /var/log in the LXC containers a separate subvolume (see the
sketch after this message). That will exclude it from the snapshot for the
backup, which will both speed up the backup and reduce the amount of
change from the parent that occurs while the backup is being created.

2. Assuming you're using a distribution compliant with the Filesystem
Hierarchy Standard, there are a couple of directories you can safely
exclude from all backups, simply because portable programs are designed to
handle losing data from these directories gracefully. Such directories
include /tmp, /var/tmp, and /var/cache, and they can be excluded the same
way as /var/log.

3. Similar arguments apply to $HOME/.cache, which is essentially a
per-user /var/cache. This is less likely to matter if you don't have
individual users doing things on these systems.

4. Look for other similar areas you may be able to safely exclude. For
example, I use Gentoo, and I build all my packages with external debugging
symbols, which get stored in /usr/lib/debug. I only have this set up for
convenience, so there's no point in backing it up: I can just rebuild the
package to regenerate the debugging symbols if I ever need them after
restoring from a backup. Similarly, I exclude any VCS repositories that I
have copies of elsewhere, because I can just clone that copy if I need it.

> Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which
> seems to be the one doing a lot of I/O?

Yes. It's always a bad idea to mess with any scheduling properties other
than CPU affinity for kernel threads (and even messing with CPU affinity
is usually a bad idea too). The btrfs-transaction kthread (the name gets
cut off by the kernel's limit on thread names) is a particularly bad one
to mess with, because it handles committing updates to the filesystem.
Setting an idle scheduling priority on it would probably put you at severe
risk of data loss, or cause your system to lock up.

> Currently, the I/O priorities of my btrfs processes are:
>
> ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
> [btrfs-worker] none: prio 4
> [btrfs-worker-hi] none: prio 4
> [btrfs-delalloc] none: prio 4
> [btrfs-flush_del] none: prio 4
> [btrfs-cache] none: prio 4
> [btrfs-submit] none: prio 4
> [btrfs-fixup] none: prio 4
> [btrfs-endio] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-rai] none: prio 4
> [btrfs-endio-rep] none: prio 4
> [btrfs-rmw] none: prio 4
> [btrfs-endio-wri] none: prio 4
> [btrfs-freespace] none: prio 4
> [btrfs-delayed-m] none: prio 4
> [btrfs-readahead] none: prio 4
> [btrfs-qgroup-re] none: prio 4
> [btrfs-extent-re] none: prio 4
> [btrfs-cleaner] none: prio 0
> [btrfs-transacti] none: prio 0

Altogether, this is exactly what they should be on a normal kernel. Also,
neat trick with awk to get that info; I'll have to remember that.
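As a sketch of the subvolume split suggested in point 1 above (the
container path is hypothetical, and moving the existing data assumes the
container is stopped while you do it):

# replace the plain /var/log directory with a nested subvolume, so that
# snapshots of the container's root subvolume no longer include it
cd /var/lib/lxc/mycontainer/rootfs/var
mv log log.old
btrfs subvolume create log
cp -a log.old/. log/
rm -rf log.old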
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 14:06 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 14:06 UTC (permalink / raw)
To: btrfs ml

On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
ASH> > I'm a newbie with btrfs, and I have a problem with high load after
ASH> > each btrfs subvolume delete.
[…]

ASH> Before I start explaining possible solutions, it helps to explain
ASH> what's actually happening here.
[…]

Thanks a lot for these clear and detailed explanations.

ASH> > Is there a better way to do this?

ASH> While there isn't any way I know of to make the deletion itself
ASH> cheaper, there are ways you can reduce the impact by reducing how
ASH> much you're backing up:

Thanks for these clues too!

I'll use --commit-after, in order to wait for the deletion to complete
before starting to rsync the next snapshot, and I'll keep in mind the
benefit of putting /var/log outside the main subvolume of the VM (but I
guess my main problem is the databases, because their datadirs are the
ones with the most writes).

--
Daniel
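A sketch of the daily cycle this amounts to, with a hypothetical backup
destination and the same $subvol/$snap placeholders as before:

# snapshot, copy the snapshot off-host, then delete it and wait for the
# deletion to commit before the next day's run
btrfs subvolume snapshot -r $subvol $snap
rsync -a "$snap/" backuphost:/backups/$(basename "$subvol")/
btrfs subvolume delete --commit-after $snap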
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 14:16 Austin S. Hemmelgarn
From: Austin S. Hemmelgarn @ 2016-08-15 14:16 UTC (permalink / raw)
To: Daniel Caillibaud, btrfs ml

On 2016-08-15 10:06, Daniel Caillibaud wrote:
> On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>
> ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
> ASH> > I'm a newbie with btrfs, and I have a problem with high load after
> ASH> > each btrfs subvolume delete.
> […]
>
> ASH> Before I start explaining possible solutions, it helps to explain
> ASH> what's actually happening here.
> […]
>
> Thanks a lot for these clear and detailed explanations.
Glad I could help.
>
> ASH> > Is there a better way to do this?
>
> ASH> While there isn't any way I know of to make the deletion itself
> ASH> cheaper, there are ways you can reduce the impact by reducing how
> ASH> much you're backing up:
>
> Thanks for these clues too!
>
> I'll use --commit-after, in order to wait for the deletion to complete
> before starting to rsync the next snapshot, and I'll keep in mind the
> benefit of putting /var/log outside the main subvolume of the VM (but I
> guess my main problem is the databases, because their datadirs are the
> ones with the most writes).
>
With respect to databases, you might consider backing them up separately
too. In many cases, for something like an SQL database, it's a lot more
flexible to have a dump of the database as a backup than to have the
database files themselves, because it decouples the backup from the
filesystem-level layout. Most good databases can give you a stable dump
(assuming, of course, that the application using the database is sanely
written) a whole lot faster than you could back up the files themselves.
For the couple of databases we use internally where I work, we back them
up separately not only to retain this flexibility, but also because
they're on a separate backup schedule from the rest of the systems, since
they change far more frequently than anything else.
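As an illustration of the dump approach (a sketch; the database name and
destination are placeholders, and --single-transaction only gives a
consistent, lock-free dump for transactional engines such as InnoDB):

# dump inside a single transaction and compress on the fly
mysqldump --single-transaction mydb | gzip > /backups/mydb.sql.gz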
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 19:56 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 19:56 UTC (permalink / raw)
To: btrfs ml

On 15/08/16 at 10:16, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

ASH> With respect to databases, you might consider backing them up
ASH> separately too. In many cases, for something like an SQL database,
ASH> it's a lot more flexible to have a dump of the database as a backup
ASH> than to have the database files themselves, because it decouples the
ASH> backup from the filesystem-level layout.

With MySQL/MariaDB, getting a consistent dump requires locking the tables
for the duration of the dump, which isn't acceptable on production
servers. Even with specialised hot-dump tools, dumping on the production
servers is too heavy on I/O (I have huge databases, and writing the dump
is expensive and slow).

I used to have a slave just for the dump (easy to stop the slave, dump,
and restart it), but after a while it couldn't keep up with the writes
(prod was on SSD and the slave wasn't; its dump disk was 100% busy all day
long), so for me it's really easier to rsync the raw files once a day to a
cheap host and do the dump there.

(Of course, I need to flush and lock the tables while taking the snapshot,
before the rsync, but that only takes one or two seconds, which is still
acceptable.)

--
Daniel
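A sketch of the flush-lock-snapshot sequence Daniel describes (paths are
hypothetical, the MySQL datadir is assumed to sit on its own subvolume,
and the mysql client's `system` command is assumed to be usable in batch
mode, so the read lock is held while the snapshot is taken):

mysql <<'EOF'
FLUSH TABLES WITH READ LOCK;
system btrfs subvolume snapshot -r /srv/mysql /srv/mysql-snap
UNLOCK TABLES;
EOF
# copy the raw files to the cheap host that runs the actual dump
rsync -a /srv/mysql-snap/ backuphost:/srv/mysql-raw/
btrfs subvolume delete --commit-after /srv/mysql-snap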
Thread overview: 5 messages (newest: 2016-08-15 19:56 UTC)
2016-08-15 10:39 Huge load on btrfs subvolume delete Daniel Caillibaud
2016-08-15 12:32 ` Austin S. Hemmelgarn
2016-08-15 14:06 ` Daniel Caillibaud
2016-08-15 14:16 ` Austin S. Hemmelgarn
2016-08-15 19:56 ` Daniel Caillibaud