From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from plane.gmane.org ([80.91.229.3]:59751 "EHLO plane.gmane.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751295AbaIBFUy
	(ORCPT ); Tue, 2 Sep 2014 01:20:54 -0400
Received: from list by plane.gmane.org with local (Exim 4.69)
	(envelope-from ) id 1XOgWN-0002Jc-GD
	for linux-btrfs@vger.kernel.org; Tue, 02 Sep 2014 07:20:47 +0200
Received: from ip68-231-22-224.ph.ph.cox.net ([68.231.22.224])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for ; Tue, 02 Sep 2014 07:20:47 +0200
Received: from 1i5t5.duncan by ip68-231-22-224.ph.ph.cox.net with local
	(Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00
	for ; Tue, 02 Sep 2014 07:20:47 +0200
To: linux-btrfs@vger.kernel.org
From: Duncan <1i5t5.duncan@cox.net>
Subject: Re: kernel 3.17-rc3: task rsync:2524 blocked for more than 120 seconds
Date: Tue, 2 Sep 2014 05:20:29 +0000 (UTC)
Message-ID: 
References: <540498AF.6030109@fb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

john terragon posted on Mon, 01 Sep 2014 18:36:49 +0200 as excerpted:

> I was trying it again and it seems to have completed, albeit very slowly
> (even for an usb flash drive). Was the 3.14 series the last immune one
> from this problem? Should I try the latest 3.14.x?

The 3.14 series was before the switch to generic kworker threads, while
btrfs still had its own custom work-queue threads.  A very specific
problem with the kworker threads was known, but that should be fixed in
3.17-rc3.  So it may well be a problem with btrfs in general, both as it
exists today and historically, in which case 3.14.x won't help you much
if at all.  But I'd definitely recommend trying it.

If 3.14 is significantly and repeatedly faster, then there's obviously
some other regression since then, either in the kworker threads or in
something else.  If not, then at least we know for sure the kworker
threads aren't a factor, since 3.14 predates them entering the picture.

The other possibility I'm aware of would be erase-block related.  I see
you're using autodefrag, so it shouldn't be direct file fragmentation,
but particularly if the filesystem has been used for some time, it might
be the firmware trying to shuffle things around and having trouble:
having already used up all the known-free erase blocks, it has to stop
and free one, by shifting things around, every time it needs another
one, and that's what's taking the time.

What does btrfs fi show say about free space?  (The interesting bit is
size vs. used on the device line, or lines for a multi-device btrfs, not
the top line.)  What does btrfs fi df say for data and metadata (total
vs. used)?

For btrfs fi df, ideally the spread between used and total for data and
for metadata shouldn't be too large (a few gig for data and a gig or so
for metadata isn't too bad, assuming a large enough device, of course).
If it is, a balance may be in order, perhaps using the -dusage=20 and/or
-musage=20 style options to keep it from rebalancing everything.  Read
up on the wiki and choose your own number: 5 might be good if there's
plenty of room, you might need 50 or higher if you're close to full, and
above about 80 you might as well just use -d or -m and forget the usage
bit.  Similarly, for btrfs fi show, you want as much space as possible
left unallocated, several gig at least, if your device isn't too small
for that to be practical.
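Just to make that concrete, the commands I'm talking about would look
something like this, assuming the stick's filesystem is mounted at
/mnt/stick (substitute your actual mountpoint, and pick your own usage=
number as above):

  # per-device allocation (used) vs. total device size
  btrfs filesystem show

  # data/metadata chunk totals (allocated) vs. what's actually used
  btrfs filesystem df /mnt/stick

  # rewrite only chunks that are at most 20% used, rather than
  # rebalancing everything
  btrfs balance start -dusage=20 -musage=20 /mnt/stick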
Again, if btrfs fi df is out of balance it'll use more space in show as
well, and a balance should recover some of it.

Once you have some space to work with, try running fstrim on the device
(example invocations at the bottom of this message).  Run it before the
balance as well if you suspect the firmware is SERIOUSLY out of space
and shuffling, as that'll slow the balance down too, and again after.
It may or may not work on that device, but if it does and the firmware
/was/ out of space and having to shuffle hard, it could improve
performance *DRAMATICALLY*.  The reason is that on devices where it
works, fstrim tells the firmware which blocks are free, allowing it more
flexibility in erase-block shuffling.

If that makes a big difference, you can /try/ the discard mount option,
tho doing the trim/discard as part of normal operations can slow them
down some too.  The alternative would be to simply run fstrim
periodically, perhaps every Nth rsync or some such.  Note that, as the
fstrim manpage says, the output of fstrim run repeatedly will be the
same, since it only knows which areas are candidates to trim, not which
ones are already trimmed.  But it shouldn't hurt the device any to
fstrim it repeatedly, and if you do it every N rsyncs, it should keep
things from getting too bad again.

The other thing to consider, if you haven't already, is the ssd_spread
mount option.  The documentation suggests it can be helpful on lower
quality SSDs and USB sticks, which fits your use-case, so I'd try it.
Tho it probably won't work at its ideal unless you do a fresh mkfs (or a
near-full balance with it enabled).  But it's something to at least
consider and possibly try if you haven't.  Depending on the firmware and
erase-block layout, it could help.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
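As promised, the trim and mount-option invocations would look something
like the following.  /mnt/stick and /dev/sdX1 are again just
placeholders for your actual mountpoint and device, and for permanent
mount options fstab is the better place:

  # one-shot trim of free space; -v (verbose) so you can see whether
  # the device actually accepted it
  fstrim -v /mnt/stick

  # or trim as part of normal operation (can slow normal ops some)
  mount -o discard /dev/sdX1 /mnt/stick

  # ssd_spread mount; ssd listed explicitly too, since a USB stick may
  # not be auto-detected as non-rotational
  mount -o ssd,ssd_spread,autodefrag /dev/sdX1 /mnt/stick

  # or trim on a schedule instead, say weekly from root's crontab
  15 3 * * 0  /sbin/fstrim /mnt/stick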