* Struggling with file system slowness
@ 2017-05-04 13:15 Matt McKinnon
  2017-05-04 14:22 ` Peter Grandi
  2017-05-04 14:24 ` Duncan
  0 siblings, 2 replies; 6+ messages in thread
From: Matt McKinnon @ 2017-05-04 13:15 UTC
To: linux-btrfs

Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged at 100% CPU for most of the time.

I thought this might have to do with fragmentation as mentioned in the Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as mentioned in the wiki), but after running a full defrag of the file system, and also enabling the 'autodefrag' mount option, the problem still persists.

What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt
* Re: Struggling with file system slowness
From: Peter Grandi @ 2017-05-04 14:22 UTC
To: Linux fs Btrfs

> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?

Typical causes of high CPU usage are extents (your defragging may not necessarily have worked) and 'qgroups', especially with many subvolumes. It could be the free space cache in some rare cases.

https://www.google.ca/search?num=100&safe=images&as_q=cxpu&as_epq=btrfs-transaction

Something like this also happens often without being Btrfs-related, triggered for example by near-memory exhaustion in the kernel memory manager.
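The near-memory-exhaustion possibility mentioned above can be checked roughly from /proc/meminfo. What follows is a minimal Python sketch, assuming a Linux kernel recent enough to report MemAvailable; the function names and the 10% threshold are illustrative assumptions and do not come from this thread.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few counters (e.g. HugePages_*)
    are plain counts and carry no unit.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memory_is_tight(fields: Dict[str, int], min_free_ratio: float = 0.10) -> bool:
    """Return True if MemAvailable is below min_free_ratio of MemTotal.

    The 10% default is an arbitrary, assumed threshold for illustration.
    """
    return fields["MemAvailable"] < min_free_ratio * fields["MemTotal"]


# Usage on a live system (Linux only):
#   with open("/proc/meminfo") as f:
#       print(memory_is_tight(parse_meminfo(f.read())))
```

A result of True only suggests that memory pressure is worth ruling out; it says nothing about what is consuming the memory.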
* Re: Struggling with file system slowness
From: Matt McKinnon @ 2017-05-05 13:24 UTC
To: pg, linux-btrfs

> Too little information. Is IO happening at the same time? Is
> compression on? Deduplicated? Lots of subvolumes? SSD? What
> kind of workload and file size/distribution profile?

Only write IO during the load spikes. No compression, no deduplication. 12 volumes (including snapshots). Spinning disks. Medium workload; file sizes are all over the map since this holds about 30 user home directories.

Interestingly enough, the problems which had persisted for many weeks went away when all snapshots were removed. btrfs-transaction spikes disappeared. Memory usage went from 30G to under 2G.

-Matt
* Re: Struggling with file system slowness
From: Liu Bo @ 2017-05-09 19:14 UTC
To: Matt McKinnon; +Cc: pg, linux-btrfs

On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > Too little information. Is IO happening at the same time? Is
> > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > kind of workload and file size/distribution profile?
>
> Only write IO during the load spikes. No compression, no deduplication. 12
> volumes (including snapshots). Spinning disks. Medium workload; file sizes
> are all over the map since this holds about 30 user home directories.
>
> Interestingly enough, the problems which had persisted for many weeks went
> away when all snapshots were removed. btrfs-transaction spikes disappeared.
> Memory usage went from 30G to under 2G.
>

Were those snapshots serving as backups?

Could you please elaborate on how you create snapshots? We could probably hammer out a testcase to improve the situation.

Thanks,

-liubo
* Re: Struggling with file system slowness
From: Matt McKinnon @ 2017-05-09 19:25 UTC
To: bo.li.liu; +Cc: pg, linux-btrfs

Those snapshots were created using Marc Merlin's script (thanks, Marc). They don't do anything except sit around on the file system for a week or so and then are removed.

I'm now doing quarter-hourly snaps instead of nightly since I have nightly backups of the filesystem going off-site. So far the btrfs-transaction and memory spikes have not returned.

-Matt

On 05/09/2017 03:14 PM, Liu Bo wrote:
> Were those snapshots served as backup?
>
> Could you please elaborate how you create snapshots? We could
> probably hammer out a testcase to improve the situation.
>
> Thanks,
>
> -liubo
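The "sit around for a week or so and then are removed" policy described above can be sketched as a small retention calculation. This is a minimal Python illustration only: it is not Marc Merlin's script, the function names are hypothetical, and the seven-day window is an assumption standing in for "a week or so".

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def expired_snapshots(created: Iterable[datetime],
                      now: datetime,
                      keep: timedelta = timedelta(days=7)) -> List[datetime]:
    """Return creation times of snapshots older than the retention window.

    The 7-day default is an assumed stand-in for "a week or so".
    """
    cutoff = now - keep
    return sorted(t for t in created if t < cutoff)


def steady_state_count(interval: timedelta, keep: timedelta) -> int:
    """Number of snapshots present at once if one is taken every
    `interval` and each is deleted after `keep`."""
    return keep // interval


# Nightly snapshots kept for a week.
nightly = steady_state_count(timedelta(days=1), timedelta(days=7))
# Quarter-hourly snapshots, if the same week-long window were kept.
quarter_hourly = steady_state_count(timedelta(minutes=15), timedelta(days=7))
```

Under these assumptions `nightly` is 7 and `quarter_hourly` is 672 per snapshotted subvolume. The message does not say what retention is used with the quarter-hourly schedule, so the second figure is only what an unchanged one-week window would imply.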
* Re: Struggling with file system slowness
From: Duncan @ 2017-05-04 14:24 UTC
To: linux-btrfs

Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
>
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
>
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
>
> What's the best way to figure out what btrfs is chugging away at here?
>
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work so briefer than usual...

Three questions:

Number of snapshots per subvolume?

Quotas enabled?

Do you do dedupe or otherwise have lots of reflinks?

These dramatically affect scaling. Keeping the number of snapshots per subvolume under 300, under 100 if possible, should help a lot. Quotas dramatically worsen the problem, so keeping them disabled unless your use-case calls for them should help (and if your use-case calls for them, consider a filesystem where the quota feature is more mature). And reflinks are the mechanism behind snapshots, so too many of them for other reasons (such as dedupe) create problems too, tho a snapshot basically reflinks /everything/, so it takes quite a few reflinks to trigger the scaling issues of a single snapshot, meaning they aren't normally a problem unless dedupe is done on a /massive/ scale.

Of course defrag interacts with snapshots too, tho it shouldn't affect /this/ problem; it can, however, eat up more space than expected as it breaks the reflinks.

Beyond that, have you tried a (readonly) btrfs check and/or a scrub or balance recently? Perhaps there's something wrong that's snagging things, and you simply haven't otherwise detected it yet?

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
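The snapshot-count guideline in the reply above (under 300 per subvolume, under 100 if possible) can be expressed as a quick check. This is a minimal Python sketch: the function name and the "ok"/"high"/"too many" labels are hypothetical, and gathering the per-subvolume counts (for instance by counting snapshots listed for each subvolume) is left to the reader.

```python
from typing import Dict

# Thresholds taken from the advice above: under 100 is preferred,
# under 300 is the suggested upper bound per subvolume.
PREFERRED_MAX = 100
UPPER_BOUND = 300


def classify_snapshot_counts(counts: Dict[str, int]) -> Dict[str, str]:
    """Label each subvolume by how its snapshot count compares to the
    suggested limits. `counts` maps subvolume name -> snapshot count."""
    labels: Dict[str, str] = {}
    for subvol, n in counts.items():
        if n < PREFERRED_MAX:
            labels[subvol] = "ok"
        elif n < UPPER_BOUND:
            labels[subvol] = "high"
        else:
            labels[subvol] = "too many"
    return labels
```

For example, `classify_snapshot_counts({"home": 12, "srv": 150, "scratch": 400})` labels the three (made-up) subvolumes "ok", "high" and "too many" respectively. The limits are rules of thumb from the message, not hard boundaries enforced by btrfs.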