Heavy nocow'd VM image fragmentation

Linux Btrfs filesystem development

* Heavy nocow'd VM image fragmentation
@ 2014-10-23 23:04 Larkin Lowrey
  2014-10-24 11:49 ` Marc MERLIN
  0 siblings, 1 reply; 7+ messages in thread
From: Larkin Lowrey @ 2014-10-23 23:04 UTC (permalink / raw)
  To: linux-btrfs

I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
(filefrag). The file was created in a dir that was chattr +C'd: it was
preallocated via fallocate and the contents of the original image were
copied into it via dd. I verified that the image was +C.

After initial creation there were about 2800 fragments, according to
filefrag. That doesn't surprise me because this image took up about 60%
of the free space. After an hour of light use the filefrag count was the
same. But, after a day of heavy use, the count is now well over 600,000.
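
For a sense of scale, the extent counts above can be turned into an
average extent size. This is only a sketch: it assumes "240GB" means
240 * 10^9 bytes, and avg_extent_bytes is a made-up helper name, not part
of any tool.

```shell
# Average extent size in bytes, given a file size in bytes and an extent
# count. Integer division; only a rough gauge of fragmentation.
avg_extent_bytes() {
    size_bytes=$1
    extents=$2
    if [ "$extents" -le 0 ]; then
        echo "extent count must be positive" >&2
        return 1
    fi
    echo $(( size_bytes / extents ))
}

# Figures from this report (240GB read as 240 * 10^9 bytes, an assumption):
avg_extent_bytes 240000000000 2800      # after the initial copy: prints 85714285
avg_extent_bytes 240000000000 600000    # after a day of heavy use: prints 400000
```

Under that assumption the average extent shrinks from roughly 86 MB after
the initial copy to under 400 KB after a day of heavy use.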

There were no snapshots during the period of use. The fs does not have
compression enabled. These usual suspects don't apply in my case.

The process I used to copy the image to a noCOW image was:

fallocate -n -l $(stat --format %s old.vdi) new.vdi
dd if=old.vdi of=new.vdi conv=notrunc oflag=append bs=1M
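
The copy procedure above can be sketched as a small shell function. This
is an illustration under assumptions, not a tested recipe: nocow_copy,
run and DRY_RUN are hypothetical names, the explicit touch/chattr +C step
stands in for inheriting +C from the directory, and dd's flag is spelled
oflag.

```shell
# Echo the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy image $1 into a new, preallocated NOCOW file $2.
nocow_copy() {
    src=$1
    dst=$2
    # Size of the source in bytes (stat -c is the short form of --format).
    size=$(stat -c %s "$src") || return 1
    # +C is only reliable on a new, empty file, so set it before any data.
    run touch "$dst" &&
    run chattr +C "$dst" &&
    # -n reserves the space but leaves the apparent file size at 0.
    run fallocate -n -l "$size" "$dst" &&
    # notrunc keeps the preallocated extents; append writes from size 0 up.
    run dd if="$src" of="$dst" conv=notrunc oflag=append bs=1M &&
    # Show the attributes so the C flag can be confirmed.
    run lsattr "$dst"
}
```

Running it as (DRY_RUN=1; nocow_copy old.vdi new.vdi) only prints the
commands, which makes it easy to review the sequence before touching a
real image.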

Performance does seem much worse in the VM but could it be that the
image isn't actually severely fragmented and I'm just misunderstanding
the output from filefrag?

Is there a problem with how I copied over the old image file?

--Larkin

* Re: Heavy nocow'd VM image fragmentation
  2014-10-23 23:04 Heavy nocow'd VM image fragmentation Larkin Lowrey
@ 2014-10-24 11:49 ` Marc MERLIN
  2014-10-25  2:41   ` Robert White
  0 siblings, 1 reply; 7+ messages in thread
From: Marc MERLIN @ 2014-10-24 11:49 UTC (permalink / raw)
  To: Larkin Lowrey; +Cc: linux-btrfs

On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
> (filefrag). The file was created in a dir that was chattr +C'd, the file
> was created via fallocate and the contents of the orignal image were
> copied into the file via dd. I verified that the image was +C.
> 
> After initial creation there were about 2800 fragments, according to
> filefrag. That doesn't surprise me because this image took up about 60%
> of the free space. After an hour of light use the filefrag count was the
> same. But, after a day of heavy use, the count is now well over 600,000.
> 
> There were no snapshots during the period of use. The fs does not have
> compression enabled. These usual suspects don't apply in my case.
> 

To be honest, I have the same problem, and it's vexing:
legolas:/var/local/nobck/VirtualBox VMs/Win7# lsattr *
---------------C Logs/VBox.log.3
---------------C Logs/VBox.log.2
---------------C Logs/VBox.log.1
---------------C Logs/VBox.log
---------------C Snapshots/2014-10-24T04-37-46-247921000Z.sav
---------------C Win7.png
---------------C Win7.vbox
---------------C Win7.vbox-prev
---------------C Win7.vdi
legolas:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 169130 extents found
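
When tracking this over time it can help to pull the numbers out of the
lsattr and filefrag output shown above. A minimal sketch, assuming the
output formats seen in this message; extent_count and has_nocow_flag are
hypothetical helper names.

```shell
# Read filefrag's summary line on stdin and print the extent count,
# e.g. "Win7.vdi: 169130 extents found" -> 169130.
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found$/\1/p'
}

# Succeed if an lsattr output line ($1) has the C (No_COW) flag set.
# The flags are the first space-delimited field of the line.
has_nocow_flag() {
    flags=${1%% *}
    case $flags in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, filefrag Win7.vdi | extent_count could be logged once an hour
to see when the count starts climbing.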

Note that I already copied this file recently to lay it out properly,
but apparently it fragments again despite NOCOW.
On the plus side, at least btrfs send works again: before my last copy,
which set the file to NOCOW, btrfs send of that filesystem was taking 12H+
instead of minutes.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

* Re: Heavy nocow'd VM image fragmentation
  2014-10-24 11:49 ` Marc MERLIN
@ 2014-10-25  2:41   ` Robert White
  2014-10-25  3:28     ` Duncan
  0 siblings, 1 reply; 7+ messages in thread
From: Robert White @ 2014-10-25  2:41 UTC (permalink / raw)
  To: Marc MERLIN, Larkin Lowrey; +Cc: linux-btrfs

On 10/24/2014 04:49 AM, Marc MERLIN wrote:
> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
>> (filefrag). The file was created in a dir that was chattr +C'd, the file
>> was created via fallocate and the contents of the orignal image were
>> copied into the file via dd. I verified that the image was +C.
> To be honest, I have the same problem, and it's vexing:

If I understand correctly, when you take a snapshot the file goes into 
what I call "1COW" mode. The snapshot is pinning the old image into 
place and the new data has to go somewhere. One explainer said it was 
"on its head" but I suspect the fragmentation still ends up in the 
written segment.

Options may include transferring the VMs to a subvolume that you don't 
snapshot very often.

Another is doing backups by taking the snapshot, transferring it to
backup, and then dropping it before you restart the virtual machine.

Honestly while autodefrag would be your enemy, I don't see a lot of 
harm/degradation in leaving the VM images on my drive all normal COW and 
doing the occasional defrag.

I could, of course, be wrong.

* Re: Heavy nocow'd VM image fragmentation
  2014-10-25  2:41   ` Robert White
@ 2014-10-25  3:28     ` Duncan
  2014-10-26 17:20       ` Larkin Lowrey
  0 siblings, 1 reply; 7+ messages in thread
From: Duncan @ 2014-10-25  3:28 UTC (permalink / raw)
  To: linux-btrfs

Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:

> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>> fragmentation (filefrag). The file was created in a dir that was
>>> chattr +C'd, the file was created via fallocate and the contents of
>>> the orignal image were copied into the file via dd. I verified that
>>> the image was +C.
>> To be honest, I have the same problem, and it's vexing:
> 
> If I understand correctly, when you take a snapshot the file goes into
> what I call "1COW" mode.

Yes, but the OP said he hadn't snapshotted since creating the file, and 
MM's a regular that actually wrote much of the wiki documentation on 
raid56 modes, so he better know about the snapshotting problem too.

So that can't be it.  There's apparently a bug in some recent code, and 
it's not honoring the NOCOW even in normal operation, when it should be.

(FWIW I'm not running any VMs or large DBs here, so don't have nocow set 
on anything and can and do use autodefrag on all my btrfs.  So I can't 
say one way or the other, personally.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Heavy nocow'd VM image fragmentation
  2014-10-25  3:28     ` Duncan
@ 2014-10-26 17:20       ` Larkin Lowrey
  2014-10-27  6:44         ` Duncan
  2014-10-27 12:04         ` Austin S Hemmelgarn
  0 siblings, 2 replies; 7+ messages in thread
From: Larkin Lowrey @ 2014-10-26 17:20 UTC (permalink / raw)
  To: Duncan, linux-btrfs

On 10/24/2014 10:28 PM, Duncan wrote:
> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>
>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>> fragmentation (filefrag). The file was created in a dir that was
>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>> the orignal image were copied into the file via dd. I verified that
>>>> the image was +C.
>>> To be honest, I have the same problem, and it's vexing:
>> If I understand correctly, when you take a snapshot the file goes into
>> what I call "1COW" mode.
> Yes, but the OP said he hadn't snapshotted since creating the file, and 
> MM's a regular that actually wrote much of the wiki documentation on 
> raid56 modes, so he better know about the snapshotting problem too.
>
> So that can't be it.  There's apparently a bug in some recent code, and 
> it's not honoring the NOCOW even in normal operation, when it should be.
>
> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set 
> on anything and can and do use autodefrag on all my btrfs.  So I can't 
> say one way or the other, personally.)
>

Correct, there were no snapshots during VM usage when the fragmentation
occurred.

One unusual property of my setup is I have my fs on top of bcache. More
specifically, the stack is md raid6  -> bcache -> lvm -> btrfs. When the
fs mounts it has mount option 'ssd' due to the fact that bcache sets
/sys/block/bcache0/queue/rotational to 0.

Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?
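
Whether the 'ssd' option is really active can be checked from the mount
table. A rough sketch, assuming the usual /proc/mounts layout (device,
mount point, type, options); has_mount_opt and mount_opts_for are
hypothetical helpers and /mnt/data is a placeholder mount point.

```shell
# Succeed if the comma-separated option list $1 contains $2 as a whole
# token, so "nossd" or "ssd_spread" do not count as "ssd".
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the mount options for mount point $1, read from a mounts table
# (default /proc/mounts; the optional second argument eases testing).
mount_opts_for() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}
```

A check could then look like
has_mount_opt "$(mount_opts_for /mnt/data)" ssd && echo "ssd is active",
alongside cat /sys/block/bcache0/queue/rotational to see what the block
layer reports.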

--Larkin

* Re: Heavy nocow'd VM image fragmentation
  2014-10-26 17:20       ` Larkin Lowrey
@ 2014-10-27  6:44         ` Duncan
  2014-10-27 12:04         ` Austin S Hemmelgarn
  1 sibling, 0 replies; 7+ messages in thread
From: Duncan @ 2014-10-27  6:44 UTC (permalink / raw)
  To: linux-btrfs

Larkin Lowrey posted on Sun, 26 Oct 2014 12:20:45 -0500 as excerpted:

> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6  -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
> 
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?

Bcache... Some kernel cycles ago btrfs on bcache had known issues but I
don't recall the details.  I /think/ that was fixed, but if you don't know
what I'm referring to, I'd suggest looking back in the btrfs list archives
(and, assuming there's a bcache list, theirs too), to see what it was,
whether it was fixed, and (presumably on the bcache list) current status.

... Actually just did a bcache keyword search in my archive and see you 
on a thread, saying it was working fine for you, so never mind, looks 
like you are aware of that thread, and actually know more about the 
status than I do...

I don't believe the ssd mount option /should/ be triggering 
fragmentation; I use it here on real ssd, but as I said, I don't have 
that sort of large-internal-write-pattern file to worry about and have 
autodefrag set too, plus compress=lzo so filefrag's reports aren't 
trustworthy here anyway.

But what I DO know is that there's a nossd mount option available if the
detection's going wacky and it's adding the ssd mount option
inappropriately.  That has been there for a couple kernel cycles now.
See the btrfs(5) manpage for the mount options.

So you could try the nossd mount option and see if it makes a difference.

Meanwhile, that's quite a stack you have there.  Before I switched to 
btrfs and btrfs raid, I was running mdraid here, and for a period ran lvm 
on top of mdraid.  But as an admin I decided that was simply too complex 
a setup for me to be confident in my own ability to properly handle 
disaster recovery.  And because I could feed the appropriate root on 
mdraid parameters directly to the kernel and didn't need an initr* for 
it, while I did for lvm, I kept mdraid, and actually had a few chances to 
practice disaster recovery on mdraid over time, becoming quite 
comfortable with it.

But not only do you have that, you have bcache thrown in too, and in 
place of the traditional reiserfs I was using (and still use on my second 
backups and media partitions on spinning rust as I've had very good 
results with reiserfs since data=ordered became the default, even thru 
various hardware issues... I'll avoid the stories), you're using btrfs, 
which has its own raid modes, altho I suppose you're not using them.

So that is indeed quite a stack.  If you're comfortable with your ability 
to properly handle disaster recovery at all those levels, wow, you 
definitely have my respect.  Or do you just have it all backed up and 
figure if it blows up and disaster recovery isn't going to be trivial you 
simply rebuild and restore from backup?  I guess with btrfs not yet fully 
stable and mature that's the best idea at its level anyway, and if you 
have it backed up for that, then you have it backed up for the others 
and /can/ simply rebuild your stack and restore from backup, should you 
need to.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

* Re: Heavy nocow'd VM image fragmentation
  2014-10-26 17:20       ` Larkin Lowrey
  2014-10-27  6:44         ` Duncan
@ 2014-10-27 12:04         ` Austin S Hemmelgarn
  1 sibling, 0 replies; 7+ messages in thread
From: Austin S Hemmelgarn @ 2014-10-27 12:04 UTC (permalink / raw)
  To: Larkin Lowrey, Duncan, linux-btrfs

On 2014-10-26 13:20, Larkin Lowrey wrote:
> On 10/24/2014 10:28 PM, Duncan wrote:
>> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>>
>>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>>> fragmentation (filefrag). The file was created in a dir that was
>>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>>> the orignal image were copied into the file via dd. I verified that
>>>>> the image was +C.
>>>> To be honest, I have the same problem, and it's vexing:
>>> If I understand correctly, when you take a snapshot the file goes into
>>> what I call "1COW" mode.
>> Yes, but the OP said he hadn't snapshotted since creating the file, and
>> MM's a regular that actually wrote much of the wiki documentation on
>> raid56 modes, so he better know about the snapshotting problem too.
>>
>> So that can't be it.  There's apparently a bug in some recent code, and
>> it's not honoring the NOCOW even in normal operation, when it should be.
>>
>> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set
>> on anything and can and do use autodefrag on all my btrfs.  So I can't
>> say one way or the other, personally.)
>>
>
> Correct, there were no snapshots during VM usage when the fragmentation
> occurred.
>
> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6  -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
>
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?
>

Two things:
First, regarding your question, the ssd mount option "shouldn't" be 
responsible for this, because it is supposed to spread out allocation 
only at the chunk level, not the block level, but some recent commit may 
have changed that.  Are you using any kind of compression in btrfs?  If 
so, then filefrag won't report the number of fragments correctly (it 
currently reports the number of compressed blocks in the file instead), 
and in fact, if you are using compression in btrfs, I would expect the
number of compressed blocks to go up as you use more space in the VM
image: long runs of zero bytes compress well, but other stuff (especially
on-disk structures from encapsulated filesystems) doesn't.  You might
consider putting the VM images directly on the LVM layer instead; that
tends to get much better performance in my experience than storing them
on a filesystem.
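
One way to tell whether compression is skewing the filefrag count is to
look at the per-extent flags in filefrag -v. The sketch below assumes that
compressed btrfs extents carry an "encoded" flag in that listing; treat
that as an assumption to verify on your own system. encoded_extents is a
hypothetical helper name.

```shell
# Count extent rows in `filefrag -v` output (on stdin) that carry the
# "encoded" flag. Extent rows start with an index number and a colon;
# header and summary lines are ignored.
encoded_extents() {
    awk '/^ *[0-9]+:/ && /encoded/ { n++ } END { print n + 0 }'
}
```

Used as filefrag -v Win7.vdi | encoded_extents, a result of 0 would
suggest that the extent count is not being inflated by compression.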

Secondly, I'd recommend switching from using bcache under LVM to using 
dm-cache on top of LVM, as it makes it much easier to recover from the 
various failure modes, and also to deal with a corrupted cache, due to 
the fact that dm-cache doesn't put any metadata on the backing device. 
It takes longer to shut down when in write-back mode, and isn't SSD
optimized, but has also been much more reliable in my experience.
