From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from magic.merlins.org ([209.81.13.136]:59187 "EHLO
    mail1.merlins.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1750816AbaAEIfb (ORCPT );
    Sun, 5 Jan 2014 03:35:31 -0500
Date: Sat, 4 Jan 2014 22:39:57 -0800
From: Marc MERLIN
To: Duncan <1i5t5.duncan@cox.net>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: btrfs-transaction blocked for more than 120 seconds
Message-ID: <20140105063957.GF11749@merlins.org>
References: <52C2AE7C.5020601@gmx.at> <20140103172506.GA10021@merlins.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To:
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Fri, Jan 03, 2014 at 09:34:10PM +0000, Duncan wrote:
> > Thank you for that tip, I had been unaware of it 'till now.
> > This will make my virtualbox image directory much happier :)
>
> I think I said it, but it bears repeating. Once you set that attribute
> on the dir, you may want to move the files out of the dir (to another
> partition would make sure the data is actually moved) and back in, so
> they're effectively new files in the dir. Or use something like
> cat oldfile > newfile, so you know it's actually creating the new file,
> not reflinking. That'll ensure the NOCOW takes effect.

Yes, I got that. That's why I ran btrfs defrag on the files after that
(as I explained, a copy would waste lots of snapshot space by needlessly
replacing all the blocks).

> > Unfortunately, on a 83GB vdi (virtualbox) file, with 3.12.5, it did a
> > lot of writing and chewed up my 4 CPUs. Then, it started to be hard to
> > move my mouse cursor and my procmeter graph was barely updating seconds.
> > Next, nothing updated on my X server anymore, not even seconds in time
> > widgets.
> >
> > But, I could still sometimes move my mouse cursor, and I could sometimes
> > see the HD light flicker a bit before going dead again.
> > In other words, the system wasn't fully deadlocked, but btrfs sure got
> > into a state where it was unable to finish the job, and took the kernel
> > down with it (64bit, 8GB of RAM).
> >
> > I waited 2H and it never came out of it, I had to power down the system
> > in the end. Note that this was on a top of the line 500MB/s write
> > Samsung Evo 840 SSD, not a slow HD.
>
> That was defrag (the command) or autodefrag (the mount option)? I'd
> guess defrag (the command).

defrag, the btrfs subcommand.

> That's fragmentation for you! What did/does filefrag have to say about
> that file? Were you the one that posted the 6-digit extents?

Nope, I never posted anything until now. Hopefully you agree that it's not
ok for btrfs/kernel to just kill my system for over 2H, until I power it
off, because of defragging one file. I did hit a severe performance bug, if
it's not a never-ending loop.

gandalfthegreat:/var/local/nobck/VirtualBox VMs/Win7# filefrag Win7.vdi
Win7.vdi: 156222 extents found

Considering how virtualbox works, that's hardly surprising.

> For something that bad, it might be faster to copy/move it off-device
> (expect it to take awhile) then move it back. That way you're only
> trying to read OR write on the device, not both, and the move elsewhere
> should defrag it quite a bit, effectively sequential write, then read and
> write on the move back.

Yes, I know how I can work around the problem (although I'll likely have to
delete all my historical snapshots to free the old blocks, which I don't
love to do).
But doesn't it make sense to first see why the kernel is near deadlocking
on a single file defrag?

> But even that might be prohibitive. At some point, you may need to
> either simply give up on it (if you're lazy), or get down and dirty with
> the tracing/profiling, working with a dev to figure out where it's
> spending its time and hopefully get btrfs recoded to work a bit faster
> for that sort of thing.

I'm on my way to a linux conf where I'm speaking, so I have limited time
and can't crash my laptop, but I'm happy to type some commands and give
output.

> As I suggested above, you might try the old school method of defrag, move
> the file to a different device, then move it back. And if possible do it
> when nothing else is using the system. But it may simply be practically
> inaccessible with a current kernel, in which case you'd either have to
> work with the devs to optimize, or give it up as a lost cause. =:(

I can fix my problem. Actually, virtualbox works fine with the fragmented
file, without even feeling slow, so I don't really need to fix it urgently;
I was just trying it out after your post.

> Then if the process completed successfully, you could cat the parts back
> together again... and the written parts would be basically sequential, so
> that should go MUCH faster! =:^)

All that noted, but I'm not desperate, just trying commands I hadn't tried
yet :)

Thanks for your replies,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901