* Huge load on btrfs subvolume delete
@ 2016-08-15 10:39 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 10:39 UTC (permalink / raw)
To: linux-btrfs
Hi,
I'm a newbie with btrfs, and I have a problem with high load after each btrfs subvolume delete.
I use snapshots on LXC hosts under Debian Jessie with
- kernel 4.6.0-0.bpo.1-amd64
- btrfs-progs 4.6.1-1~bpo8
For backups, I run the following each day, for each subvolume:
btrfs subvolume snapshot -r $subvol $snap
# then later
ionice -c3 btrfs subvolume delete $snap
but ionice doesn't seem to have any effect here, and after a few minutes the load grows
quite high (30-40). I don't know how to make this deletion gentler on I/O.

Is there a better way to do this?

Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which seems to be the one
doing a lot of I/O?

Currently, the I/O priorities of my btrfs processes are:
ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
[btrfs-worker] none: prio 4
[btrfs-worker-hi] none: prio 4
[btrfs-delalloc] none: prio 4
[btrfs-flush_del] none: prio 4
[btrfs-cache] none: prio 4
[btrfs-submit] none: prio 4
[btrfs-fixup] none: prio 4
[btrfs-endio] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-met] none: prio 4
[btrfs-endio-rai] none: prio 4
[btrfs-endio-rep] none: prio 4
[btrfs-rmw] none: prio 4
[btrfs-endio-wri] none: prio 4
[btrfs-freespace] none: prio 4
[btrfs-delayed-m] none: prio 4
[btrfs-readahead] none: prio 4
[btrfs-qgroup-re] none: prio 4
[btrfs-extent-re] none: prio 4
[btrfs-cleaner] none: prio 0
[btrfs-transacti] none: prio 0
Thanks
--
Daniel
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 12:32 Austin S. Hemmelgarn
From: Austin S. Hemmelgarn @ 2016-08-15 12:32 UTC (permalink / raw)
To: Daniel Caillibaud, linux-btrfs

On 2016-08-15 06:39, Daniel Caillibaud wrote:
> Hi,
>
> I'm a newbie with btrfs, and I have a problem with high load after each
> btrfs subvolume delete.
>
> I use snapshots on LXC hosts under Debian Jessie with
> - kernel 4.6.0-0.bpo.1-amd64
> - btrfs-progs 4.6.1-1~bpo8
>
> For backups, I run the following each day, for each subvolume:
>
> btrfs subvolume snapshot -r $subvol $snap
> # then later
> ionice -c3 btrfs subvolume delete $snap
>
> but ionice doesn't seem to have any effect here, and after a few minutes
> the load grows quite high (30-40). I don't know how to make this deletion
> gentler on I/O.
Before I start explaining possible solutions, it helps to explain what's
actually happening here.

When you create a snapshot, BTRFS just scans down the tree for the
subvolume in question and creates new references to everything in that
subvolume in a separate tree. This is usually insanely fast, because all
that needs to be done is update metadata. When you delete a snapshot,
however, it has to remove any remaining references within the snapshot to
the parent subvolume, and it also has to process any changed data that now
differs from the parent subvolume for deletion, just like it would when
deleting a file. As a result, the work to create a snapshot depends only
on the complexity of the directory structure within the subvolume, while
the work to delete it depends on both that and how much the snapshot has
diverged from the parent subvolume.

The spike in load you're seeing is the filesystem handling all that
internal accounting in the background, and I'd be willing to bet that it
varies based on how fast things are changing in the parent subvolume.
Setting an idle I/O scheduling priority on the command that deletes the
snapshot does nothing, because all that command does is tell the kernel to
delete the snapshot; the actual deletion is handled in the filesystem
driver.

While it won't help with the spike in load, you probably want to add
`--commit-after` to that subvolume deletion command. That will cause the
spike to happen almost immediately, and the command won't return until the
filesystem is finished with the accounting, so the load should be back to
normal by the time it returns.
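For example, a minimal sketch of the adjusted deletion step, reusing the
$snap placeholder from the original script (ionice is dropped, since it
has no effect here):

# block until the transaction that removes the snapshot is committed
btrfs subvolume delete --commit-after $snap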
> Is there a better way to do this?

While there isn't any way I know of to make the deletion itself cheaper,
there are ways you can reduce the impact by reducing how much you're
backing up:

1. You almost certainly don't need to back up the logs, and if you do,
they should probably be backed up independently from the rest of the
system image. Logs mostly just add extra size to a backup and have little
value when you restore one. The simplest way to exclude them in your case
is to make /var/log in the LXC containers a separate subvolume (see the
sketch after this message). That will exclude it from the snapshot for the
backup, which will both speed up the backup and reduce the amount of
change from the parent that occurs while the backup is being created.

2. Assuming you're using a distribution compliant with the Filesystem
Hierarchy Standard, there are a couple of directories you can safely
exclude from all backups, simply because portable programs are designed to
handle losing data from these directories gracefully. Such directories
include /tmp, /var/tmp, and /var/cache, and they can be excluded the same
way as /var/log.

3. Similar arguments apply to $HOME/.cache, which is essentially a
per-user /var/cache. This is less likely to matter if you don't have
individual users doing things on these systems.

4. Look for other similar areas you may be able to safely exclude. For
example, I use Gentoo, and I build all my packages with external debugging
symbols, which get stored in /usr/lib/debug. I only have this set up for
convenience, so there's no point in backing it up: I can just rebuild the
package to regenerate the debugging symbols if I ever need them after
restoring from a backup. Similarly, I exclude any VCS repositories that I
have copies of elsewhere, because I can just clone that copy if I need it.

> Is it a bad idea to set ionice -c3 on the btrfs-transacti process, which
> seems to be the one doing a lot of I/O?

Yes. It's always a bad idea to mess with any scheduling properties other
than CPU affinity for kernel threads (and even messing with CPU affinity
is usually a bad idea too). The btrfs-transaction kthread (the name gets
cut off by the kernel's limit on thread names) is a particularly bad one
to mess with, because it handles committing updates to the filesystem.
Setting an idle scheduling priority on it would probably put you at severe
risk of data loss, or cause your system to lock up.

> Currently, the I/O priorities of my btrfs processes are:
>
> ps x|awk '/[b]trfs/ {printf("%20s ", $NF); system("ionice -p" $1)}'
> [btrfs-worker] none: prio 4
> [btrfs-worker-hi] none: prio 4
> [btrfs-delalloc] none: prio 4
> [btrfs-flush_del] none: prio 4
> [btrfs-cache] none: prio 4
> [btrfs-submit] none: prio 4
> [btrfs-fixup] none: prio 4
> [btrfs-endio] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-met] none: prio 4
> [btrfs-endio-rai] none: prio 4
> [btrfs-endio-rep] none: prio 4
> [btrfs-rmw] none: prio 4
> [btrfs-endio-wri] none: prio 4
> [btrfs-freespace] none: prio 4
> [btrfs-delayed-m] none: prio 4
> [btrfs-readahead] none: prio 4
> [btrfs-qgroup-re] none: prio 4
> [btrfs-extent-re] none: prio 4
> [btrfs-cleaner] none: prio 0
> [btrfs-transacti] none: prio 0

Altogether, this is exactly what they should be on a normal kernel. Also,
neat trick with awk to get that info; I'll have to remember that.
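As a sketch of the subvolume split suggested in point 1 above (the
container path is hypothetical, and moving the existing data assumes the
container is stopped while you do it):

# replace the plain /var/log directory with a nested subvolume, so that
# snapshots of the container's root subvolume no longer include it
cd /var/lib/lxc/mycontainer/rootfs/var
mv log log.old
btrfs subvolume create log
cp -a log.old/. log/
rm -rf log.old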
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 14:06 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 14:06 UTC (permalink / raw)
To: btrfs ml

On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
ASH> > I'm a newbie with btrfs, and I have a problem with high load after
ASH> > each btrfs subvolume delete.
[…]

ASH> Before I start explaining possible solutions, it helps to explain
ASH> what's actually happening here.
[…]

Thanks a lot for these clear and detailed explanations.

ASH> > Is there a better way to do this?

ASH> While there isn't any way I know of to make the deletion itself
ASH> cheaper, there are ways you can reduce the impact by reducing how
ASH> much you're backing up:

Thanks for these clues too!

I'll use --commit-after, in order to wait for the deletion to complete
before starting to rsync the next snapshot, and I'll keep in mind the
benefit of putting /var/log outside the main subvolume of the VM (but I
guess my main problem is the databases, because their datadirs are the
ones with the most writes).

--
Daniel
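A sketch of the daily cycle this amounts to, with a hypothetical backup
destination and the same $subvol/$snap placeholders as before:

# snapshot, copy the snapshot off-host, then delete it and wait for the
# deletion to commit before the next day's run
btrfs subvolume snapshot -r $subvol $snap
rsync -a "$snap/" backuphost:/backups/$(basename "$subvol")/
btrfs subvolume delete --commit-after $snap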
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 14:16 Austin S. Hemmelgarn
From: Austin S. Hemmelgarn @ 2016-08-15 14:16 UTC (permalink / raw)
To: Daniel Caillibaud, btrfs ml

On 2016-08-15 10:06, Daniel Caillibaud wrote:
> On 15/08/16 at 08:32, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:
>
> ASH> On 2016-08-15 06:39, Daniel Caillibaud wrote:
> ASH> > I'm a newbie with btrfs, and I have a problem with high load after
> ASH> > each btrfs subvolume delete.
> […]
>
> ASH> Before I start explaining possible solutions, it helps to explain
> ASH> what's actually happening here.
> […]
>
> Thanks a lot for these clear and detailed explanations.
Glad I could help.
>
> ASH> > Is there a better way to do this?
>
> ASH> While there isn't any way I know of to make the deletion itself
> ASH> cheaper, there are ways you can reduce the impact by reducing how
> ASH> much you're backing up:
>
> Thanks for these clues too!
>
> I'll use --commit-after, in order to wait for the deletion to complete
> before starting to rsync the next snapshot, and I'll keep in mind the
> benefit of putting /var/log outside the main subvolume of the VM (but I
> guess my main problem is the databases, because their datadirs are the
> ones with the most writes).
>
With respect to databases, you might consider backing them up separately
too. In many cases, for something like an SQL database, it's a lot more
flexible to have a dump of the database as a backup than to have the
database files themselves, because it decouples the backup from the
filesystem-level layout. Most good databases can give you a stable dump
(assuming, of course, that the application using the database is sanely
written) a whole lot faster than you could back up the files themselves.
For the couple of databases we use internally where I work, we back them
up separately not only to retain this flexibility, but also because
they're on a separate backup schedule from the rest of the systems, since
they change far more frequently than anything else.
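As an illustration of the dump approach (a sketch; the database name and
destination are placeholders, and --single-transaction only gives a
consistent, lock-free dump for transactional engines such as InnoDB):

# dump inside a single transaction and compress on the fly
mysqldump --single-transaction mydb | gzip > /backups/mydb.sql.gz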
* Re: Huge load on btrfs subvolume delete
@ 2016-08-15 19:56 Daniel Caillibaud
From: Daniel Caillibaud @ 2016-08-15 19:56 UTC (permalink / raw)
To: btrfs ml

On 15/08/16 at 10:16, "Austin S. Hemmelgarn" <ahferroin7@gmail.com> wrote:

ASH> With respect to databases, you might consider backing them up
ASH> separately too. In many cases, for something like an SQL database,
ASH> it's a lot more flexible to have a dump of the database as a backup
ASH> than to have the database files themselves, because it decouples the
ASH> backup from the filesystem-level layout.

With MySQL/MariaDB, getting a consistent dump requires locking the tables
for the duration of the dump, which isn't acceptable on production
servers. Even with specialised hot-dump tools, dumping on the production
servers is too heavy on I/O (I have huge databases, and writing the dump
is expensive and slow).

I used to have a slave just for the dump (easy to stop the slave, dump,
and restart it), but after a while it couldn't keep up with the writes
(prod was on SSD and the slave wasn't; its dump disk was 100% busy all day
long), so for me it's really easier to rsync the raw files once a day to a
cheap host and do the dump there.

(Of course, I need to flush and lock the tables while taking the snapshot,
before the rsync, but that only takes one or two seconds, which is still
acceptable.)

--
Daniel
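A sketch of the flush-lock-snapshot sequence Daniel describes (paths are
hypothetical, the MySQL datadir is assumed to sit on its own subvolume,
and the mysql client's `system` command is assumed to be usable in batch
mode, so the read lock is held while the snapshot is taken):

mysql <<'EOF'
FLUSH TABLES WITH READ LOCK;
system btrfs subvolume snapshot -r /srv/mysql /srv/mysql-snap
UNLOCK TABLES;
EOF
# copy the raw files to the cheap host that runs the actual dump
rsync -a /srv/mysql-snap/ backuphost:/srv/mysql-raw/
btrfs subvolume delete --commit-after /srv/mysql-snap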
Thread overview: 5 messages (newest: 2016-08-15 19:56 UTC)
2016-08-15 10:39 Huge load on btrfs subvolume delete Daniel Caillibaud
2016-08-15 12:32 ` Austin S. Hemmelgarn
2016-08-15 14:06 ` Daniel Caillibaud
2016-08-15 14:16 ` Austin S. Hemmelgarn
2016-08-15 19:56 ` Daniel Caillibaud