* Heavy nocow'd VM image fragmentation
From: Larkin Lowrey @ 2014-10-23 23:04 UTC
To: linux-btrfs

I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
(filefrag). The file was created in a dir that was chattr +C'd, the file
was created via fallocate and the contents of the original image were
copied into the file via dd. I verified that the image was +C.

After initial creation there were about 2800 fragments, according to
filefrag. That doesn't surprise me because this image took up about 60%
of the free space. After an hour of light use the filefrag count was the
same. But, after a day of heavy use, the count is now well over 600,000.

There were no snapshots during the period of use. The fs does not have
compression enabled. These usual suspects don't apply in my case.

The process I used to copy the image to a noCOW image was:

fallocate -n -l $(stat --format %s old.vdi) new.vdi
dd if=old.vdi of=new.vdi conv=notrunc oflag=append bs=1M

Performance does seem much worse in the VM but could it be that the
image isn't actually severely fragmented and I'm just misunderstanding
the output from filefrag?

Is there a problem with how I copied over the old image file?

--Larkin
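A minimal sketch of the copy procedure described in the message above,
wrapped in a function so the steps can be checked one at a time. The name
copy_prealloc is hypothetical, the target directory is assumed to have
been chattr +C'd beforehand, and GNU stat/dd plus util-linux fallocate are
assumed. Note that GNU dd spells the flag oflag=append (singular).

```shell
#!/bin/sh
# copy_prealloc SRC DST  (hypothetical helper, not part of any tool)
# Copies SRC into a freshly preallocated DST. Assumes the directory that
# will hold DST already carries the +C (NOCOW) attribute, so that the new
# file inherits it at creation time.
copy_prealloc() {
    src=$1
    dst=$2

    # Appending to an existing file would corrupt the copy, so refuse.
    if [ -e "$dst" ]; then
        echo "error: $dst already exists" >&2
        return 1
    fi

    size=$(stat --format %s "$src") || return 1

    # Warn, but continue, if the target directory does not report 'C'.
    # lsattr can fail on filesystems without attribute support.
    dir=$(dirname "$dst")
    if ! lsattr -d "$dir" 2>/dev/null | cut -d' ' -f1 | grep -q C; then
        echo "warning: $dir does not report the C attribute" >&2
    fi

    # Preallocate the full length without changing the file size (-n).
    # Treated as best effort here: a failure only produces a warning.
    fallocate -n -l "$size" "$dst" 2>/dev/null ||
        echo "warning: preallocation failed, copying anyway" >&2

    # conv=notrunc keeps the preallocated space; oflag=append writes from
    # the current end of file, which is offset 0 for the new file.
    dd if="$src" of="$dst" conv=notrunc oflag=append bs=1M status=none
}
```

Typical use would be copy_prealloc old.vdi new.vdi, followed by
lsattr new.vdi and filefrag new.vdi to confirm the attribute and the
initial extent count.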
* Re: Heavy nocow'd VM image fragmentation
From: Marc MERLIN @ 2014-10-24 11:49 UTC
To: Larkin Lowrey; +Cc: linux-btrfs

On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
> (filefrag). The file was created in a dir that was chattr +C'd, the file
> was created via fallocate and the contents of the original image were
> copied into the file via dd. I verified that the image was +C.
>
> After initial creation there were about 2800 fragments, according to
> filefrag. That doesn't surprise me because this image took up about 60%
> of the free space. After an hour of light use the filefrag count was the
> same. But, after a day of heavy use, the count is now well over 600,000.
>
> There were no snapshots during the period of use. The fs does not have
> compression enabled. These usual suspects don't apply in my case.
>

To be honest, I have the same problem, and it's vexing:

legolas:/var/local/nobck/VirtualBox VMs/Win7# lsattr *
---------------C Logs/VBox.log.3
---------------C Logs/VBox.log.2
---------------C Logs/VBox.log.1
---------------C Logs/VBox.log
---------------C Snapshots/2014-10-24T04-37-46-247921000Z.sav
---------------C Win7.png
---------------C Win7.vbox
---------------C Win7.vbox-prev
---------------C Win7.vdi
legolas:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 169130 extents found

Note that I already copied this file recently to lay it out properly but
apparently it fragments again despite NOCOW.

On the plus side, at least btrfs send works again: before my last copy of
the file, which set it to NOCOW, btrfs send of that filesystem was taking
12H+ instead of minutes.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
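Since both reports in this thread rely on the summary line that filefrag
prints, a small filter makes it easier to log the extent count over time.
This is only a sketch: extent_count is a hypothetical helper name, and it
assumes the "NAME: N extents found" shape shown in the output above.

```shell
#!/bin/sh
# extent_count (hypothetical helper): reads filefrag summary lines on
# stdin and prints only the number of extents, one per line.
# Handles both "N extents found" and the singular "1 extent found".
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found$/\1/p'
}
```

For example, filefrag Win7.vdi | extent_count could be run before and
after a period of VM use to see whether the count actually grows.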
* Re: Heavy nocow'd VM image fragmentation
From: Robert White @ 2014-10-25 2:41 UTC
To: Marc MERLIN, Larkin Lowrey; +Cc: linux-btrfs

On 10/24/2014 04:49 AM, Marc MERLIN wrote:
> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>> I have a 240GB VirtualBox vdi image that is showing heavy fragmentation
>> (filefrag). The file was created in a dir that was chattr +C'd, the file
>> was created via fallocate and the contents of the original image were
>> copied into the file via dd. I verified that the image was +C.
> To be honest, I have the same problem, and it's vexing:

If I understand correctly, when you take a snapshot the file goes into
what I call "1COW" mode. The snapshot is pinning the old image into place
and the new data has to go somewhere.

One explainer said it was "on its head" but I suspect the fragmentation
still ends up in the written segment.

Options may include moving the VMs to a subvolume that you don't snapshot
very often, or doing backups by taking the snapshot, transferring it to
backup, and then dropping it before you restart the virtual machine.

Honestly, while autodefrag would be your enemy, I don't see a lot of
harm/degradation in leaving the VM images on my drive all normal COW and
doing the occasional defrag.

I could, of course, be wrong.
* Re: Heavy nocow'd VM image fragmentation
From: Duncan @ 2014-10-25 3:28 UTC
To: linux-btrfs

Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:

> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>> fragmentation (filefrag). The file was created in a dir that was
>>> chattr +C'd, the file was created via fallocate and the contents of
>>> the original image were copied into the file via dd. I verified that
>>> the image was +C.
>> To be honest, I have the same problem, and it's vexing:
>
> If I understand correctly, when you take a snapshot the file goes into
> what I call "1COW" mode.

Yes, but the OP said he hadn't snapshotted since creating the file, and
MM's a regular that actually wrote much of the wiki documentation on
raid56 modes, so he better know about the snapshotting problem too.

So that can't be it. There's apparently a bug in some recent code, and
it's not honoring the NOCOW even in normal operation, when it should be.

(FWIW I'm not running any VMs or large DBs here, so don't have nocow set
on anything and can and do use autodefrag on all my btrfs. So I can't
say one way or the other, personally.)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Heavy nocow'd VM image fragmentation
From: Larkin Lowrey @ 2014-10-26 17:20 UTC
To: Duncan, linux-btrfs

On 10/24/2014 10:28 PM, Duncan wrote:
> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>
>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>> fragmentation (filefrag). The file was created in a dir that was
>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>> the original image were copied into the file via dd. I verified that
>>>> the image was +C.
>>> To be honest, I have the same problem, and it's vexing:
>> If I understand correctly, when you take a snapshot the file goes into
>> what I call "1COW" mode.
> Yes, but the OP said he hadn't snapshotted since creating the file, and
> MM's a regular that actually wrote much of the wiki documentation on
> raid56 modes, so he better know about the snapshotting problem too.
>
> So that can't be it. There's apparently a bug in some recent code, and
> it's not honoring the NOCOW even in normal operation, when it should be.
>
> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set
> on anything and can and do use autodefrag on all my btrfs. So I can't
> say one way or the other, personally.)
>

Correct, there were no snapshots during VM usage when the fragmentation
occurred.

One unusual property of my setup is I have my fs on top of bcache. More
specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
fs mounts it has mount option 'ssd' due to the fact that bcache sets
/sys/block/bcache0/queue/rotational to 0.

Is there any reason why either the 'ssd' mount option or being backed by
bcache could be responsible?

--Larkin
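The two facts mentioned above (the rotational flag reported for the bcache
device and the resulting 'ssd' mount option) can be checked from a shell.
A minimal sketch follows; has_mount_opt is a hypothetical helper, and the
mount point /mnt/vm used in the usage note is only a placeholder.

```shell
#!/bin/sh
# has_mount_opt MOUNTS_FILE MOUNTPOINT OPTION  (hypothetical helper)
# Succeeds if MOUNTPOINT is listed in a /proc/mounts-style file with
# OPTION among its comma-separated mount options. The comparison is
# exact, so "ssd" does not match "nossd".
has_mount_opt() {
    awk -v mp="$2" -v opt="$3" '
        $2 == mp {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++)
                if (o[i] == opt)
                    found = 1
        }
        END { exit !found }
    ' "$1"
}
```

For example, has_mount_opt /proc/mounts /mnt/vm ssd && echo "ssd active"
reports whether the option is in effect, and
cat /sys/block/bcache0/queue/rotational prints 0 when the device is
reported as non-rotational.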
* Re: Heavy nocow'd VM image fragmentation
From: Duncan @ 2014-10-27 6:44 UTC
To: linux-btrfs

Larkin Lowrey posted on Sun, 26 Oct 2014 12:20:45 -0500 as excerpted:

> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
>
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?

Bcache... Some kernel cycles ago btrfs on bcache had known issues but IDR
the details. I /think/ that was fixed, but if you don't know what I'm
referring to, I'd suggest looking back in the btrfs list archives (and,
assuming there's a bcache list, there too), to see what it was, whether
it was fixed, and (presumably on the bcache list) current status.

... Actually just did a bcache keyword search in my archive and see you
on a thread, saying it was working fine for you, so never mind, looks
like you are aware of that thread, and actually know more about the
status than I do...

I don't believe the ssd mount option /should/ be triggering
fragmentation; I use it here on real ssd, but as I said, I don't have
that sort of large-internal-write-pattern file to worry about and have
autodefrag set too, plus compress=lzo so filefrag's reports aren't
trustworthy here anyway.

But what I DO know is that there's a nossd mount option available if the
detection's going whacky and it's adding the ssd mount option
inappropriately. That has been there for a couple kernel cycles now. See
the btrfs(5) manpage for the mount options.

So you could try the nossd mount option and see if it makes a difference.

Meanwhile, that's quite a stack you have there. Before I switched to
btrfs and btrfs raid, I was running mdraid here, and for a period ran lvm
on top of mdraid. But as an admin I decided that was simply too complex a
setup for me to be confident in my own ability to properly handle
disaster recovery. And because I could feed the appropriate root on
mdraid parameters directly to the kernel and didn't need an initr* for
it, while I did for lvm, I kept mdraid, and actually had a few chances to
practice disaster recovery on mdraid over time, becoming quite
comfortable with it.

But not only do you have that, you have bcache thrown in too, and in
place of the traditional reiserfs I was using (and still use on my second
backups and media partitions on spinning rust as I've had very good
results with reiserfs since data=ordered became the default, even thru
various hardware issues... I'll avoid the stories), you're using btrfs,
which has its own raid modes, altho I suppose you're not using them.

So that is indeed quite a stack. If you're comfortable with your ability
to properly handle disaster recovery at all those levels, wow, you
definitely have my respect. Or do you just have it all backed up and
figure if it blows up and disaster recovery isn't going to be trivial you
simply rebuild and restore from backup? I guess with btrfs not yet fully
stable and mature that's the best idea at its level anyway, and if you
have it backed up for that, then you have it backed up for the others and
/can/ simply rebuild your stack and restore from backup, should you need
to.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Heavy nocow'd VM image fragmentation
From: Austin S Hemmelgarn @ 2014-10-27 12:04 UTC
To: Larkin Lowrey, Duncan, linux-btrfs

On 2014-10-26 13:20, Larkin Lowrey wrote:
> On 10/24/2014 10:28 PM, Duncan wrote:
>> Robert White posted on Fri, 24 Oct 2014 19:41:32 -0700 as excerpted:
>>
>>> On 10/24/2014 04:49 AM, Marc MERLIN wrote:
>>>> On Thu, Oct 23, 2014 at 06:04:43PM -0500, Larkin Lowrey wrote:
>>>>> I have a 240GB VirtualBox vdi image that is showing heavy
>>>>> fragmentation (filefrag). The file was created in a dir that was
>>>>> chattr +C'd, the file was created via fallocate and the contents of
>>>>> the original image were copied into the file via dd. I verified that
>>>>> the image was +C.
>>>> To be honest, I have the same problem, and it's vexing:
>>> If I understand correctly, when you take a snapshot the file goes into
>>> what I call "1COW" mode.
>> Yes, but the OP said he hadn't snapshotted since creating the file, and
>> MM's a regular that actually wrote much of the wiki documentation on
>> raid56 modes, so he better know about the snapshotting problem too.
>>
>> So that can't be it. There's apparently a bug in some recent code, and
>> it's not honoring the NOCOW even in normal operation, when it should be.
>>
>> (FWIW I'm not running any VMs or large DBs here, so don't have nocow set
>> on anything and can and do use autodefrag on all my btrfs. So I can't
>> say one way or the other, personally.)
>>
>
> Correct, there were no snapshots during VM usage when the fragmentation
> occurred.
>
> One unusual property of my setup is I have my fs on top of bcache. More
> specifically, the stack is md raid6 -> bcache -> lvm -> btrfs. When the
> fs mounts it has mount option 'ssd' due to the fact that bcache sets
> /sys/block/bcache0/queue/rotational to 0.
>
> Is there any reason why either the 'ssd' mount option or being backed by
> bcache could be responsible?
>

Two things:

First, regarding your question, the ssd mount option "shouldn't" be
responsible for this, because it is supposed to spread out allocation
only at the chunk level, not the block level, but some recent commit may
have changed that. Are you using any kind of compression in btrfs? If
so, then filefrag won't report the number of fragments correctly (it
currently reports the number of compressed blocks in the file instead).
In fact, if you are using compression in btrfs, I would expect the number
of compressed blocks to go up as you use more space in the VM image: long
runs of zero bytes compress well, while other stuff (especially on-disk
structures from encapsulated filesystems) doesn't. You might consider
putting the VM images directly on the LVM layer instead; that tends to
get much better performance in my experience than storing them on a
filesystem.

Secondly, I'd recommend switching from using bcache under LVM to using
dm-cache on top of LVM, as it makes it much easier to recover from the
various failure modes, and also to deal with a corrupted cache, due to
the fact that dm-cache doesn't put any metadata on the backing device.
It takes longer to shut down when in write-back mode, and isn't SSD
optimized, but has also been much more reliable in my experience.
Thread overview: 7+ messages

2014-10-23 23:04 Heavy nocow'd VM image fragmentation Larkin Lowrey
2014-10-24 11:49 ` Marc MERLIN
2014-10-25  2:41   ` Robert White
2014-10-25  3:28     ` Duncan
2014-10-26 17:20       ` Larkin Lowrey
2014-10-27  6:44         ` Duncan
2014-10-27 12:04         ` Austin S Hemmelgarn