* Heavy nocow'd VM image fragmentation
@ 2014-10-23 23:04 Larkin Lowrey
2014-10-24 11:49 ` Marc MERLIN
0 siblings, 1 reply; 7+ messages in thread
From: Larkin Lowrey @ 2014-10-23 23:04 UTC (permalink / raw)
To: linux-btrfs
I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
(filefrag). The file was created in a dir that was chattr +C'd, the file
was created via fallocate and the contents of the original image were
copied into the file via dd. I verified that the image was +C.
After initial creation there were about 2800 fragments, according to
filefrag. That doesn't surprise me because this image took up about 60%
of the free space. After an hour of light use the filefrag count was the
same. But, after a day of heavy use, the count is now well over 600,000.
There were no snapshots during the period of use. The fs does not have
compression enabled. These usual suspects don't apply in my case.
The process I used to copy the image to a noCOW image was:
fallocate -n -l $(stat --format %s old.vdi) new.vdi
dd if=old.vdi of=new.vdi conv=notrunc oflag=append bs=1M
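A minimal shell sketch of the copy procedure above, wrapped in a hypothetical `nocow_copy` helper. It assumes GNU coreutils, util-linux `fallocate` and e2fsprogs `chattr`. Instead of relying on inheritance from a +C directory, it sets +C on the still-empty target. It uses the `oflag=append` operand that GNU dd documents (paired with `conv=notrunc`) and finishes with `cmp` to confirm the copy is byte-identical. It only reproduces the copy and does not explain the later fragmentation.

```sh
#!/bin/sh
# Hypothetical helper: copy SRC into a new NOCOW file DST the way the
# procedure above does, then verify the result.
# Assumes GNU coreutils (stat, dd, cmp), util-linux fallocate, e2fsprogs chattr.
nocow_copy() {
  src=$1
  dst=$2
  # Create (or empty) DST first: btrfs only applies +C to an empty file.
  : > "$dst" || return 1
  if ! chattr +C "$dst" 2>/dev/null; then
    echo "warning: could not set +C on $dst (not btrfs?)" >&2
  fi
  size=$(stat --format %s "$src") || return 1
  # Preallocate SRC's size without changing DST's apparent size (-n).
  if [ "$size" -gt 0 ]; then
    fallocate -n -l "$size" "$dst" || return 1
  fi
  # oflag=append writes from the current EOF (offset 0 here) into the
  # preallocated range; conv=notrunc stops dd from truncating DST first.
  dd if="$src" of="$dst" conv=notrunc oflag=append bs=1M 2>/dev/null || return 1
  # Byte-for-byte comparison of the finished copy.
  cmp -s "$src" "$dst"
}
```

Usage: `nocow_copy old.vdi new.vdi && lsattr new.vdi`. On btrfs, `lsattr` should then show the `C` flag, as in the listing further down the thread.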
Performance does seem much worse in the VM but could it be that the
image isn't actually severely fragmented and I'm just misunderstanding
the output from filefrag?
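One way to judge whether the count reflects a real layout problem is to look at extent sizes rather than the total alone. For example, 600,000 extents over a 240 GB file work out to roughly 400 KB per extent. The sketch below summarizes `filefrag -v` output, reporting the number of extent rows, how many physically discontiguous runs they form, and the average extent length. It assumes the e2fsprogs 1.42.9+ verbose layout with `start..end` ranges in filesystem blocks (older versions print a different table). The `extent_summary` name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: summarize `filefrag -v` output read on stdin.
# Assumes extent rows of the form
#   "  N:  lstart..  lend:  pstart..  pend:  length: [expected:] [flags]"
# with offsets and lengths in filesystem blocks.
extent_summary() {
  awk '
    $1 ~ /^[0-9]+:$/ {            # only extent rows start with "N:"
      row = $0
      gsub(/\.\./, " ", row)      # "0..1023" -> "0 1023"
      gsub(/:/, " ", row)         # drop the column colons
      split(row, f, " ")
      # f[2],f[3] = logical range; f[4],f[5] = physical range; f[6] = length
      if (extents == 0 || f[4] + 0 != prev_end + 1)
        runs++                    # extent does not continue the previous one on disk
      prev_end = f[5] + 0
      total += f[6]
      extents++
    }
    END {
      printf "extents=%d runs=%d avg_blocks=%d\n", extents, runs, (extents ? total / extents : 0)
    }
  '
}
```

Usage: `filefrag -v new.vdi | extent_summary`. Multiply `avg_blocks` by the block size that filefrag reports to get an average extent size in bytes.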
Is there a problem with how I copied over the old image file?
--Larkin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-23 23:04 Heavy nocow'd VM image fragmentation Larkin Lowrey
@ 2014-10-24 11:49 ` Marc MERLIN
2014-10-25 2:41 ` Robert White
0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2014-10-24 11:49 UTC (permalink / raw)
To: Larkin Lowrey; +Cc: linux-btrfs
On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
> (filefrag). The file was created in a dir that was chattr +C'd, the file
> was created via fallocate and the contents of the original image were
> copied into the file via dd. I verified that the image was +C.
>
> After initial creation there were about 2800 fragments, according to
> filefrag. That doesn't surprise me because this image took up about 60%
> of the free space. After an hour of light use the filefrag count was the
> same. But, after a day of heavy use, the count is now well over 600,000.
>
> There were no snapshots during the period of use. The fs does not have
> compression enabled. These usual suspects don't apply in my case.
>
To be honest, I have the same problem, and it's vexing:
legolas:/var/local/nobck/VirtualBox VMs/Win7# lsattr *
---------------C Logs/VBox.log.3
---------------C Logs/VBox.log.2
---------------C Logs/VBox.log.1
---------------C Logs/VBox.log
---------------C Snapshots/2014-10-24T04-37-46-247921000Z.sav
---------------C Win7.png
---------------C Win7.vbox
---------------C Win7.vbox-prev
---------------C Win7.vdi
legolas:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 169130 extents found
Note that I already copied this file recently to lay it out properly
but apparently it fragments again despite NOCOW.
On the plus side, at least btrfs send works again: before my last copy
of it, which set it to NOCOW, btrfs send of that filesystem was taking
12+ hours instead of minutes.
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-24 11:49 ` Marc MERLIN
@ 2014-10-25 2:41 ` Robert White
2014-10-25 3:28 ` Duncan
0 siblings, 1 reply; 7+ messages in thread
From: Robert White @ 2014-10-25 2:41 UTC (permalink / raw)
To: Marc MERLIN, Larkin Lowrey; +Cc: linux-btrfs
On 10/24/2014 04:49 AM, Marc MERLIN wrote:
> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
>> (filefrag). The file was created in a dir that was chattr +C'd, the file
>> was created via fallocate and the contents of the original image were
>> copied into the file via dd. I verified that the image was +C.
> To be honest, I have the same problem, and it's vexing:
If I understand correctly, when you take a snapshot the file goes into
what I call "1COW" mode. The snapshot is pinning the old image into
place, and the new data has to go somewhere. One explainer said it was
"on its head", but I suspect the fragmentation still ends up in the
written segment.
Options may include transferring the VMs to a subvolume that you don't
snapshot very often, or doing backups by taking the snapshot,
transferring it to backup, and then dropping it before you restart the
virtual machine.
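The backup sequence described above (snapshot, transfer, drop before restarting the VM) could look roughly like the following. This is a sketch under stated assumptions. The VM is already powered off, the images live in their own subvolume, and the backup target is a btrfs filesystem that `btrfs receive` can write to. The helper names and paths are hypothetical, and `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Hypothetical helpers sketching the snapshot -> send -> delete sequence.
# Assumes the VM is powered off for the whole run, SRC is the subvolume
# holding the images, SNAP is a new path on the same filesystem, and DEST
# is a directory on a btrfs backup filesystem.

# Run a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

snapshot_backup() {
  src=$1
  snap=$2
  dest=$3
  # btrfs send requires a read-only snapshot, hence -r.
  run btrfs subvolume snapshot -r "$src" "$snap" || return 1
  if [ -n "$DRY_RUN" ]; then
    echo "btrfs send $snap | btrfs receive $dest"
  else
    # Without pipefail, only the receive side's exit status is checked.
    btrfs send "$snap" | btrfs receive "$dest" || return 1
  fi
  # Delete the snapshot before the VM runs again, so it no longer pins
  # the old copy of the NOCOW image while new writes arrive.
  run btrfs subvolume delete "$snap"
}
```

To preview, run `DRY_RUN=1 snapshot_backup /vm /vm/.backup-snap /mnt/backup` with your own paths. Then run it without `DRY_RUN` before starting the VM again.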
Honestly, while autodefrag would be your enemy, I don't see a lot of
harm/degradation in leaving the VM images on my drive all normal COW and
doing the occasional defrag.
I could, of course, be wrong.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-25 2:41 ` Robert White
@ 2014-10-25 3:28 ` Duncan
2014-10-26 17:20 ` Larkin Lowrey
0 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2014-10-25 3:28 UTC (permalink / raw)
To: linux-btrfs
Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>> fragmentation (filefrag). The file was created in a dir that was
>>> chattr +C'd, the file was created via fallocate and the contents of
>>> the original image were copied into the file via dd. I verified that
>>> the image was +C.
>> To be honest, I have the same problem, and it's vexing:
>
> If I understand correctly, when you take a snapshot the file goes into
> what I call "1COW" mode.
Yes, but the OP said he hadn't snapshotted since creating the file, and
MM's a regular that actually wrote much of the wiki documentation on
raid56 modes, so he better know about the snapshotting problem too.
So that can't be it. There's apparently a bug in some recent code, and
it's not honoring the NOCOW even in normal operation, when it should be.
(FWIW I'm not running any VMs or large DBs here, so don't have nocow set
on anything and can and do use autodefrag on all my btrfs. So I can't
say one way or the other, personally.)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-25 3:28 ` Duncan
@ 2014-10-26 17:20 ` Larkin Lowrey
2014-10-27 6:44 ` Duncan
2014-10-27 12:04 ` Austin S Hemmelgarn
0 siblings, 2 replies; 7+ messages in thread
From: Larkin Lowrey @ 2014-10-26 17:20 UTC (permalink / raw)
To: Duncan, linux-btrfs
On 10/24/2014 10:28 PM, Duncan wrote:
> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>
>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>> fragmentation (filefrag). The file was created in a dir that was
>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>> the original image were copied into the file via dd. I verified that
>>>> the image was +C.
>>> To be honest, I have the same problem, and it's vexing:
>> If I understand correctly, when you take a snapshot the file goes into
>> what I call "1COW" mode.
> Yes, but the OP said he hadn't snapshotted since creating the file, and
> MM's a regular that actually wrote much of the wiki documentation on
> raid56 modes, so he better know about the snapshotting problem too.
>
> So that can't be it. There's apparently a bug in some recent code, and
> it's not honoring the NOCOW even in normal operation, when it should be.
>
> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set
> on anything and can and do use autodefrag on all my btrfs. So I can't
> say one way or the other, personally.)
>
Correct, there were no snapshots during VM usage when the fragmentation
occurred.
One unusual property of my setup is I have my fs on top of bcache. More
specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
fs mounts it has mount option 'ssd' due to the fact that bcache sets
/sys/block/bcache0/queue/rotational to 0.
Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?
--Larkin
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-26 17:20 ` Larkin Lowrey
@ 2014-10-27 6:44 ` Duncan
2014-10-27 12:04 ` Austin S Hemmelgarn
1 sibling, 0 replies; 7+ messages in thread
From: Duncan @ 2014-10-27 6:44 UTC (permalink / raw)
To: linux-btrfs
Larkin Lowrey posted on Sun, 26 Oct 2014 12:20:45 -0500 as excerpted:
> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
>
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?
Bcache... Some kernel cycles ago btrfs on bcache had known issues, but I
don't remember the details. I /think/ that was fixed, but if you don't
know what I'm referring to, I'd suggest looking back in the btrfs list
archives (and, assuming there's a bcache list, there too) to see what it
was, whether it was fixed, and (presumably on the bcache list) its
current status.
... Actually just did a bcache keyword search in my archive and see you
on a thread, saying it was working fine for you, so never mind, looks
like you are aware of that thread, and actually know more about the
status than I do...
I don't believe the ssd mount option /should/ be triggering
fragmentation; I use it here on real ssd, but as I said, I don't have
that sort of large-internal-write-pattern file to worry about and have
autodefrag set too, plus compress=lzo so filefrag's reports aren't
trustworthy here anyway.
But what I DO know is that there's a nossd mount option available if the
detection's going wacky and it's adding the ssd mount option
inappropriately. That has been there for a couple of kernel cycles now.
See the btrfs(5) manpage for the mount options.
So you could try the nossd mount option and see if it makes a difference.
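Before and after trying nossd, it helps to confirm which ssd-related option btrfs actually applied. The following is a small sketch. It assumes the options string comes from something like `findmnt -n -o OPTIONS` on the mount point. The `ssd_mode` helper is hypothetical and only inspects the string; it changes nothing.

```sh
#!/bin/sh
# Hypothetical helper: print the ssd-related option found in a btrfs
# mount-option string (e.g. "rw,relatime,ssd,space_cache"), or "none".
ssd_mode() {
  mode=none
  old_ifs=$IFS
  IFS=,
  set -f                      # do not glob-expand the option words
  for opt in $1; do
    case $opt in
      ssd|ssd_spread|nossd) mode=$opt ;;   # report the last one listed
    esac
  done
  set +f
  IFS=$old_ifs
  echo "$mode"
}
```

For example, run `ssd_mode "$(findmnt -n -o OPTIONS /mountpoint)"` after a fresh mount, alongside `cat /sys/block/bcache0/queue/rotational` to see the flag that triggers the ssd detection.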
Meanwhile, that's quite a stack you have there. Before I switched to
btrfs and btrfs raid, I was running mdraid here, and for a period ran lvm
on top of mdraid. But as an admin I decided that was simply too complex
a setup for me to be confident in my own ability to properly handle
disaster recovery. And because I could feed the appropriate root on
mdraid parameters directly to the kernel and didn't need an initr* for
it, while I did for lvm, I kept mdraid, and actually had a few chances to
practice disaster recovery on mdraid over time, becoming quite
comfortable with it.
But not only do you have that, you have bcache thrown in too, and in
place of the traditional reiserfs I was using (and still use on my second
backups and media partitions on spinning rust as I've had very good
results with reiserfs since data=ordered became the default, even thru
various hardware issues... I'll avoid the stories), you're using btrfs,
which has its own raid modes, altho I suppose you're not using them.
So that is indeed quite a stack. If you're comfortable with your ability
to properly handle disaster recovery at all those levels, wow, you
definitely have my respect. Or do you just have it all backed up and
figure if it blows up and disaster recovery isn't going to be trivial you
simply rebuild and restore from backup? I guess with btrfs not yet fully
stable and mature that's the best idea at its level anyway, and if you
have it backed up for that, then you have it backed up for the others
and /can/ simply rebuild your stack and restore from backup, should you
need to.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Heavy nocow'd VM image fragmentation
2014-10-26 17:20 ` Larkin Lowrey
2014-10-27 6:44 ` Duncan
@ 2014-10-27 12:04 ` Austin S Hemmelgarn
1 sibling, 0 replies; 7+ messages in thread
From: Austin S Hemmelgarn @ 2014-10-27 12:04 UTC (permalink / raw)
To: Larkin Lowrey, Duncan, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3151 bytes --]
On 2014-10-26 13:20, Larkin Lowrey wrote:
> On 10/24/2014 10:28 PM, Duncan wrote:
>> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>>
>>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>>> fragmentation (filefrag). The file was created in a dir that was
>>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>>> the original image were copied into the file via dd. I verified that
>>>>> the image was +C.
>>>> To be honest, I have the same problem, and it's vexing:
>>> If I understand correctly, when you take a snapshot the file goes into
>>> what I call "1COW" mode.
>> Yes, but the OP said he hadn't snapshotted since creating the file, and
>> MM's a regular that actually wrote much of the wiki documentation on
>> raid56 modes, so he better know about the snapshotting problem too.
>>
>> So that can't be it. There's apparently a bug in some recent code, and
>> it's not honoring the NOCOW even in normal operation, when it should be.
>>
>> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set
>> on anything and can and do use autodefrag on all my btrfs. So I can't
>> say one way or the other, personally.)
>>
>
> Correct, there were no snapshots during VM usage when the fragmentation
> occurred.
>
> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
>
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?
>
Two things:
First, regarding your question, the ssd mount option "shouldn't" be
responsible for this, because it is supposed to spread out allocation
only at the chunk level, not the block level, but some recent commit may
have changed that. Are you using any kind of compression in btrfs? If
so, then filefrag won't report the number of fragments correctly (it
currently reports the number of compressed blocks in the file instead),
and in fact, if you are using compression in btrfs, I would expect the
number of compressed blocks to go up as you use more space in the VM
image: long runs of zero bytes compress well, but other stuff (especially
on-disk structures from encapsulated filesystems) doesn't. You might
consider putting the VM images directly on the LVM layer instead; in my
experience that tends to get much better performance than storing them
on a filesystem.
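The compression point can be put in numbers. This assumes btrfs caps each compressed extent at 128 KiB of uncompressed data (an assumption, not something stated in this thread) and that filefrag lists each compressed extent separately, as described above. Under those assumptions, a fully compressed 240 GB image would show about 1.8 million extents however well it is laid out. A hypothetical helper for the arithmetic:

```sh
#!/bin/sh
# Hypothetical helper: minimum number of compressed extents for BYTES of
# data, assuming each compressed extent covers at most 128 KiB of it.
min_compressed_extents() {
  bytes=$1
  cap=131072                            # 128 KiB, the assumed per-extent cap
  echo $(( (bytes + cap - 1) / cap ))   # round up: a partial extent still counts
}
```

For example, `min_compressed_extents 240000000000` prints 1831055.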
Secondly, I'd recommend switching from using bcache under LVM to using
dm-cache on top of LVM, as it makes it much easier to recover from the
various failure modes, and also to deal with a corrupted cache, due to
the fact that dm-cache doesn't put any metadata on the backing device.
It takes longer to shut down when in write-back mode, and isn't
SSD-optimized, but has also been much more reliable in my experience.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2455 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
Thread overview: 7+ messages
2014-10-23 23:04 Heavy nocow'd VM image fragmentation Larkin Lowrey
2014-10-24 11:49 ` Marc MERLIN
2014-10-25 2:41 ` Robert White
2014-10-25 3:28 ` Duncan
2014-10-26 17:20 ` Larkin Lowrey
2014-10-27 6:44 ` Duncan
2014-10-27 12:04 ` Austin S Hemmelgarn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox