* system stuck with flush-btrfs-4 at 100% after filesystem resize
From: John Navitsky @ 2014-02-08 18:36 UTC
To: linux-btrfs
Hello,
I have a large file system that has been growing. We've resized it a
couple of times with the following approach:
lvextend -L +800G /dev/raid/virtual_machines
btrfs filesystem resize +800G /vms
I think the FS started out at 200GB; we increased it by 200GB once or
twice, then by 800GB, and everything worked fine.
The filesystem hosts a number of virtual machines, so it is in use,
although the individual VMs tend not to be overly active.
VMs tend to be in subvolumes, and some of those subvolumes have snapshots.
This time, I increased it by another 800GB, and it has hung for many
hours (overnight) with flush-btrfs-4 near 100% CPU the whole time.
I'm not clear at this point whether it will finish, or where to go from here.
Any pointers would be much appreciated.
Thanks,
-john (newbie to BTRFS)
-------- procedure log ----------
romulus:/home/users/johnn # lvextend -L +800G /dev/raid/virtual_machines
romulus:/home/users/johnn # btrfs filesystem resize +800G /vms
Resize '/vms' of '+800G'
[hangs]
top - 12:21:53 up 136 days, 2:45, 13 users, load average: 30.39,
30.37, 30.37
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 2.3 sy, 0.0 ni, 95.1 id, 0.1 wa, 0.0 hi, 0.0 si,
0.0 st
MiB Mem: 129147 total, 127427 used, 1720 free, 264 buffers
MiB Swap: 262143 total, 661 used, 261482 free, 93666 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
48809 root 20 0 0 0 0 R 99.3 0.0 1449:14
flush-btrfs-4
------- misc info -----------
romulus:/home/users/johnn # cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth
romulus:/home/users/johnn # uname -a
Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May
31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
romulus:/home/users/johnn #
romulus:/home/users/johnn # vgdisplay
--- Volume group ---
VG Name raid
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 19
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 7
Open LV 7
Max PV 0
Cur PV 1
Act PV 1
VG Size 10.91 TiB
PE Size 4.00 MiB
Total PE 2859333
Alloc PE / Size 1371136 / 5.23 TiB
Free PE / Size 1488197 / 5.68 TiB
VG UUID npyvGj-7vxF-IoI8-Z4tF-ygpP-Q2Ja-vV8sLA
[...]
romulus:/home/users/johnn # lvdisplay
[...]
--- Logical volume ---
LV Path /dev/raid/virtual_machines
LV Name virtual_machines
VG Name raid
LV UUID qtzNBG-vuLV-EsgO-FDIf-sO7A-GKmd-EVjGjp
LV Write Access read/write
LV Creation host, time romulus.redacted.com, 2013-09-25 11:05:54 -0500
LV Status available
# open 1
LV Size 2.54 TiB
Current LE 665600
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:4
[...]
johnn@romulus:~> df -h /vms
Filesystem Size Used Avail Use% Mounted on
/dev/dm-4 1.8T 1.8T 6.0G 100% /vms
johnn@romulus:~>
romulus:/home/users/johnn # btrfs filesystem show
[...]
Label: none uuid: f08c5602-f53a-43c9-b498-fa788b01e679
Total devices 1 FS bytes used 1.74TB
devid 1 size 1.76TB used 1.76TB path /dev/dm-4
[...]
Btrfs v0.19+
romulus:/home/users/johnn #
romulus:/home/users/johnn # btrfs subvolume list /vms
ID 324 top level 5 path johnn-centos64
ID 325 top level 5 path johnn-ubuntu1304
ID 326 top level 5 path johnn-opensuse1203
ID 327 top level 5 path johnn-sles11sp3
ID 328 top level 5 path johnn-sles11sp2
ID 329 top level 5 path johnn-fedora19
ID 330 top level 5 path johnn-sles11sp1
ID 394 top level 5 path redacted-glance
ID 396 top level 5 path redacted_test
ID 397 top level 5 path glance
ID 403 top level 5 path test_redacted
ID 414 top level 5 path johnn-disktest
ID 460 top level 5 path redacted-opensuse-01
ID 472 top level 5 path redacted
ID 473 top level 5 path redacted2
ID 496 top level 5 path redacted_test
ID 524 top level 5 path redacted-moab
ID 525 top level 5 path redacted_redacted-1
ID 531 top level 5 path
.snapshots/johnn-sles11sp2/2013.10.11-14:25.18/johnn-sles11sp2
ID 533 top level 5 path
.snapshots/johnn-centos64/2013.10.11-15:32.16/johnn-centos64
ID 534 top level 5 path
.snapshots/johnn-ubuntu1304/2013.10.11-15:33.20/johnn-ubuntu1304
ID 535 top level 5 path
.snapshots/johnn-opensuse1203/2013.10.11-15:36.19/johnn-opensuse1203
ID 536 top level 5 path
.snapshots/johnn-sles11sp3/2013.10.11-15:39.51/johnn-sles11sp3
ID 537 top level 5 path
.snapshots/johnn-fedora19/2013.10.11-15:41.08/johnn-fedora19
ID 538 top level 5 path
.snapshots/johnn-sles11sp2/2013.10.11-16:48.02/johnn-sles11sp2
ID 539 top level 5 path
.snapshots/johnn-sles11sp1/2013.10.11-17:17.17/johnn-sles11sp1
ID 540 top level 5 path redacted-master
ID 547 top level 5 path redacted-client
ID 583 top level 5 path redacted-sles11sp3
ID 584 top level 5 path .snapshots/2013.11.07-12:52.01/vms
ID 586 top level 5 path rur_vm
ID 599 top level 5 path redacted-redactedsp3
ID 727 top level 5 path redacted-redactedsp3-test2
ID 771 top level 5 path redacted-sp3-standalone
ID 787 top level 5 path .trash/redacted-redactedsp3-Dec10
ID 806 top level 5 path
.snapshots/redacted-redactedsp3-Dec10/2013.12.10-12:24.17/redacted-redactedsp3-Dec10
ID 826 top level 5 path redacted-sp3-standalone2
ID 894 top level 5 path redacted-redactedsp3-update01
ID 941 top level 5 path redacted-redacted-testvm
ID 1194 top level 5 path redacted-redactedsp3-jan3
ID 1210 top level 5 path redacted-sle11sp3-01
ID 1298 top level 5 path
redacted_redacted_standalone_SLES11SP3-redacted_20140117+0928
ID 1324 top level 5 path redacted_redacted_SLES11SP3-redacted_20140117+0928
ID 1356 top level 5 path redacted_redacted_SP3
ID 1383 top level 5 path redacted_redacted-redacted-redacted11SP3-20140204
ID 1964 top level 5 path redacted-redacted-base
ID 1971 top level 5 path .trash/redacted-redacted-redacted
ID 1972 top level 5 path .trash/redacted-redacted-redacted2
ID 1988 top level 5 path .trash/redacted-redacted-redacted3
ID 1989 top level 5 path .trash/redacted-redacted-redacted4
ID 2002 top level 5 path redacted-feature-branch_71
ID 2003 top level 5 path
.snapshots/johnn-sles11sp3/2014.02.07-10:51.57/johnn-sles11sp3
romulus:/home/users/johnn #
romulus:/home/users/johnn # lsof | grep vms | wc -l
1127
romulus:/home/users/johnn #
romulus:/home/users/johnn # ps -ef | grep VBoxHeadless | wc -l
18
romulus:/home/users/johnn #
* Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
From: John Navitsky @ 2014-02-10 15:35 UTC
To: linux-btrfs
As a follow-up, at some point over the weekend things did finish on
their own:
romulus:/vms/johnn-sles11sp3 # df -h /vms
Filesystem Size Used Avail Use% Mounted on
/dev/dm-4 2.6T 1.6T 1.1T 60% /vms
romulus:/vms/johnn-sles11sp3 #
I'd still be interested in any comments about what was going on or
suggestions.
Thanks,
-john
On 2/8/2014 10:36 AM, John Navitsky wrote:
> Hello,
>
> I have a large file system that has been growing. We've resized it a
> couple of times with the following approach:
>
> lvextend -L +800G /dev/raid/virtual_machines
> btrfs filesystem resize +800G /vms
>
> I think the FS started out at 200G, we increased it by 200GB a time or
> two, then by 800GB and everything worked fine.
>
> The filesystem hosts a number of virtual machines so the file system is
> in use, although the VMs individually tend not to be overly active.
>
> VMs tend to be in subvolumes, and some of those subvolumes have snapshots.
>
> This time, I increased it by another 800GB, and it has hung for many
> hours (over night) with flush-btrfs-4 near 100% cpu all that time.
>
> I'm not clear at this point that it will finish or where to go from here.
>
> Any pointers would be much appreciated.
>
> Thanks,
>
> -john (newbie to BTRFS)
* Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
From: Josef Bacik @ 2014-02-10 16:43 UTC
To: John Navitsky, linux-btrfs
On 02/08/2014 01:36 PM, John Navitsky wrote:
> Hello,
>
> I have a large file system that has been growing. We've resized it a
> couple of times with the following approach:
>
> lvextend -L +800G /dev/raid/virtual_machines
> btrfs filesystem resize +800G /vms
>
> I think the FS started out at 200G, we increased it by 200GB a time or
> two, then by 800GB and everything worked fine.
>
> The filesystem hosts a number of virtual machines so the file system is
> in use, although the VMs individually tend not to be overly active.
>
> VMs tend to be in subvolumes, and some of those subvolumes have snapshots.
>
> This time, I increased it by another 800GB, and it has hung for many
> hours (over night) with flush-btrfs-4 near 100% cpu all that time.
>
> I'm not clear at this point that it will finish or where to go from here.
>
> Any pointers would be much appreciated.
>
> Thanks,
>
> -john (newbie to BTRFS)
>
>
> -------- procedure log ----------
>
> romulus:/home/users/johnn # lvextend -L +800G /dev/raid/virtual_machines
> romulus:/home/users/johnn # btrfs filesystem resize +800G /vms
> Resize '/vms' of '+800G'
> [hangs]
>
>
> top - 12:21:53 up 136 days, 2:45, 13 users, load average: 30.39,
> 30.37, 30.37
> Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 2.4 us, 2.3 sy, 0.0 ni, 95.1 id, 0.1 wa, 0.0 hi, 0.0 si,
> 0.0 st
> MiB Mem: 129147 total, 127427 used, 1720 free, 264 buffers
> MiB Swap: 262143 total, 661 used, 261482 free, 93666 cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 48809 root 20 0 0 0 0 R 99.3 0.0 1449:14
> flush-btrfs-4
>
> ------- misc info -----------
>
> romulus:/home/users/johnn # cat /etc/SuSE-release
> openSUSE 12.3 (x86_64)
> VERSION = 12.3
> CODENAME = Dartmouth
> romulus:/home/users/johnn # uname -a
> Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May
> 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
> romulus:/home/users/johnn #
Found your problem! Basically, if you are going to run btrfs, you should
at the very least keep up with the stable kernels. 3.11.whatever is
fine, 3.12.whatever is better. Thanks,
Josef
* Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
From: John Navitsky @ 2014-02-10 16:52 UTC
To: Josef Bacik, linux-btrfs
On 2/10/2014 8:43 AM, Josef Bacik wrote:
> On 02/08/2014 01:36 PM, John Navitsky wrote:
>> romulus:/home/users/johnn # cat /etc/SuSE-release
>> openSUSE 12.3 (x86_64)
>> VERSION = 12.3
>> CODENAME = Dartmouth
>> romulus:/home/users/johnn # uname -a
>> Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May
>> 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
>> romulus:/home/users/johnn #
>
> Found your problem! Basically if you are going to run btrfs you should
> at the very least keep up with the stable kernels. 3.11.whatever is
> fine, 3.12.whatever is better. Thanks,
>
> Josef
Thanks for the feedback.
-john
* Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
From: Duncan @ 2014-02-11 5:23 UTC
To: linux-btrfs
John Navitsky posted on Mon, 10 Feb 2014 07:35:32 -0800 as excerpted:
[I rearranged your upside-down posting so the reply comes in context
after the quote.]
> On 2/8/2014 10:36 AM, John Navitsky wrote:
>> I have a large file system that has been growing. We've resized it a
>> couple of times with the following approach:
>>
>> lvextend -L +800G /dev/raid/virtual_machines
>> btrfs filesystem resize +800G /vms
>>
>> I think the FS started out at 200G, we increased it by 200GB a time or
>> two, then by 800GB and everything worked fine.
>>
>> The filesystem hosts a number of virtual machines so the file system is
>> in use, although the VMs individually tend not to be overly active.
>>
>> VMs tend to be in subvolumes, and some of those subvolumes have
>> snapshots.
>>
>> This time, I increased it by another 800GB, and it has hung for many
>> hours (over night) with flush-btrfs-4 near 100% cpu all that time.
>>
>> I'm not clear at this point that it will finish or where to go from
>> here.
>>
>> Any pointers would be much appreciated.
> As a follow-up, at some point over the weekend things did finish on
> their own:
>
> romulus:/vms/johnn-sles11sp3 # df -h /vms
> Filesystem Size Used Avail Use% Mounted on
> /dev/dm-4 2.6T 1.6T 1.1T 60% /vms
> romulus:/vms/johnn-sles11sp3 #
>
> I'd still be interested in any comments about what was going on or
> suggestions.
I'm guessing you don't have the VM images set NOCOW (no-copy-on-write),
which means over time they'll **HEAVILY** fragment since every time
something changes in the image and is written back to the file, that
block is written somewhere else due to COW. We've had some reports of
hundreds of thousands of extents in VM files of a few gigs!
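If you want to see how bad it has gotten, filefrag will give you a rough
extent count (its numbers can be a bit off on btrfs for compressed files,
but they're close enough for big uncompressed VM images); the image path
below is just an example, substitute one of yours:
  filefrag /vms/johnn-sles11sp3/disk0.img
  filefrag -v /vms/johnn-sles11sp3/disk0.img | tail    # per-extent detail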
It's also worth noting that while NOCOW does normally mean in-place
writes, a change after a snapshot means unsharing the data, since the
snapshotted data has now diverged; that forces a one-shot COW in order
to keep the new change from overwriting the old snapshot version. That
of course triggers fragmentation too, since everything that changes in
the image between snapshots must be written elsewhere, although the
fragmentation won't build up nearly as fast as it does in the default
COW mode.
So what was very likely taking the time was tracking down all those
potentially hundreds of thousands of fragments/extents in order to
rewrite the files, as triggered by the size increase and presumably the
change in physical location on-device.
I'd strongly suggest that you set all VM images NOCOW (chattr +C). However,
there's a wrinkle. In order to be effective on btrfs, NOCOW must be
set on a file while it is still zero-size, before it has data written to
it. The easiest way to do that is to set NOCOW on the directory, which
doesn't really affect the directory itself, but DOES cause all new files
(and subdirs, so it nests) created in that directory to inherit the NOCOW
attribute. Then copy the file in, preferably either catting it in with
redirection to create/write the file, or copying it from another
filesystem, so that you know it's actually copying the data and not
simply hard-linking it, thus ensuring that the new copy is actually a new
copy and the NOCOW will actually take effect.
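Roughly like this (an untested sketch; the directory and file names are
just placeholders):
  mkdir /vms/images
  chattr +C /vms/images        # new files created in here inherit NOCOW
  lsattr -d /vms/images        # should now show the 'C' attribute
  # write the image in so the data is actually copied, not reflinked:
  cat /vms/old-location/vm.img > /vms/images/vm.img
  # or copy it in from a different filesystem:
  cp /mnt/backup/vm.img /vms/images/
  lsattr /vms/images/vm.img    # the new file should show 'C' as well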
By organizing your VM images into dirs, all with NOCOW set, so the images
inherit it at creation, you'll save yourself the fragmentation of the
repeated COW writes. However, as I mentioned, the first time a block is
written after a snapshot it's still a COW write, unavoidably so. Thus,
I'd suggest keeping btrfs snapshots of your VMs to a minimum (preferably
0), using ordinary full-copy backups to other media, instead, thus
avoiding that first COW-after-snapshot effect, too.
Meanwhile, it's worth noting that if a file is written sequentially
(append only) and not written "into", as will typically be the case with
the VM backups, there's nothing to trigger fragmentation. So the backups
don't have to be NOCOW, since they'll be written once and left alone.
But the actively in-use, and thus frequently written-to, operational VM
images should be NOCOW, and preferably not snapshotted, to keep
fragmentation to a minimum.
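A plain full copy to a separate backup filesystem is enough for that;
something like this (paths made up, and --sparse just keeps sparse VM
images sparse on the destination):
  rsync -a --sparse /vms/images/ /mnt/backup/vms-$(date +%Y%m%d)/
Since rsync writes each destination file out sequentially, the backups
won't fragment the way the live images do, even without NOCOW.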
Finally, of course you can use btrfs defrag to deal with the problem
manually. However, do note that the snapshot-aware defrag introduced with
kernel 3.9 simply does NOT scale well once the number of snapshots
approaches 1000, and the snapshot-awareness has just been disabled
again (in kernel 3.14-rc) until the code can be reworked to scale
better. So if you /are/ using snapshots and trying to work with defrag,
you'll want a very new 3.14-rc kernel in order to avoid that scaling
problem, but avoiding it does come at the cost of losing space
efficiency when defragging a snapshotted btrfs, as the non-snapshot-aware
version will tend to create separate copies of the data on each snapshot
it is run on, thus decreasing shared data blocks and increasing space
usage, perhaps dramatically.
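If you do go the defrag route, it would look something like this (a
sketch only, using the same placeholder directory as above; check the
btrfs-progs manpage for your version, and keep the snapshot caveat in
mind):
  # newer btrfs-progs accept -r for a recursive defrag:
  btrfs filesystem defragment -r /vms/images
  # with older tools, run it per file instead:
  btrfs filesystem defragment /vms/images/vm.img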
So again, at least for now, and at least for large (half-gig or larger) VM
images and other "internal write" files such as databases, I'd
suggest NOCOW, and don't snapshot; back up to a separate filesystem
instead.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman