* Struggling with file system slowness
@ 2017-05-04 13:15 Matt McKinnon
2017-05-04 14:22 ` Peter Grandi
2017-05-04 14:24 ` Duncan
0 siblings, 2 replies; 6+ messages in thread
From: Matt McKinnon @ 2017-05-04 13:15 UTC
To: linux-btrfs
Hi All,
Trying to peg down why I have one server that has btrfs-transacti pegged
at 100% CPU for most of the time.
I thought this might have to do with fragmentation as mentioned in the
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
mentioned in the wiki), but after running a full defrag of the file
system, and also enabling the 'autodefrag' mount option, the problem
still persists.
What's the best way to figure out what btrfs is chugging away at here?
Kernel: 4.10.13-custom
btrfs-progs: v4.10.2
-Matt
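One generic way to see what a busy kernel thread is doing is to sample its kernel stack a few times. If the same functions keep appearing, they hint at where the transaction thread spends its time. The sketch below assumes a Linux /proc filesystem. It finds tasks whose command name starts with "btrfs-transacti" (kernel command names are truncated to 15 characters), reports their accumulated CPU time, and prints /proc/<pid>/stack. Reading that file needs root and a kernel built with CONFIG_STACKTRACE. The helper names are illustrative. A profiler such as perf gives a more systematic view.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple

PROC = Path("/proc")


def parse_stat(stat_line: str) -> Tuple[str, int, int]:
    """Return (comm, utime, stime) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the last ')' instead of on whitespace.
    """
    comm = stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]
    rest = stat_line[stat_line.rindex(")") + 2 :].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15,
    # i.e. rest[11] and rest[12], both counted in clock ticks.
    return comm, int(rest[11]), int(rest[12])


def find_threads(prefix: str, proc: Path = PROC) -> Iterator[Tuple[int, str, int]]:
    """Yield (pid, comm, utime + stime) for tasks whose comm starts with prefix."""
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm, utime, stime = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # task exited or its stat file was unreadable
        if comm.startswith(prefix):
            yield int(entry.name), comm, utime + stime


def kernel_stack(pid: int, proc: Path = PROC) -> str:
    """Read the task's current kernel stack (needs root and CONFIG_STACKTRACE)."""
    try:
        return (proc / str(pid) / "stack").read_text()
    except OSError as exc:
        return f"<stack unavailable: {exc}>\n"


if __name__ == "__main__":
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    for pid, comm, cpu_ticks in find_threads("btrfs-transacti"):
        print(f"{pid} {comm}: {cpu_ticks / ticks_per_sec:.1f}s CPU so far")
        print(kernel_stack(pid))
```

Running it several times during a spike, rather than once, is what makes the stack samples meaningful.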
* Re: Struggling with file system slowness
2017-05-04 13:15 Struggling with file system slowness Matt McKinnon
@ 2017-05-04 14:22 ` Peter Grandi
2017-05-05 13:24 ` Matt McKinnon
2017-05-04 14:24 ` Duncan
1 sibling, 1 reply; 6+ messages in thread
From: Peter Grandi @ 2017-05-04 14:22 UTC
To: Linux fs Btrfs
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.
Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What kind
of workload and file size/distribution profile?
Typical causes of high CPU are extents (your defragging did not
necessarily work) and 'qgroups', especially with many subvolumes.
In some rare cases it could be the free space cache.
https://www.google.ca/search?num=100&safe=images&as_q=cxpu&as_epq=btrfs-transaction
Also, something like this happens often without being
Btrfs-related at all. It can be triggered, for example, by
near-exhaustion of memory in the kernel memory manager.
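To rule out the memory-pressure explanation, one can watch how much memory the kernel reports as available while the CPU spikes happen. This minimal sketch parses /proc/meminfo and flags when MemAvailable falls below a fraction of MemTotal. The 5% threshold and the function names are hypothetical choices for illustration, not a kernel-defined limit.

```python
from pathlib import Path
from typing import Dict

# Hypothetical threshold: flag when less than 5% of RAM is available.
LOW_MEMORY_FRACTION = 0.05


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; most values are in kB."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        fields = value.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def near_exhaustion(info: Dict[str, int], fraction: float = LOW_MEMORY_FRACTION) -> bool:
    """True if available memory is below `fraction` of MemTotal."""
    # MemAvailable exists since Linux 3.14; fall back to MemFree otherwise.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return available < fraction * info["MemTotal"]


if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    print("MemTotal kB:", info["MemTotal"], "MemAvailable kB:", info.get("MemAvailable"))
    print("near exhaustion:", near_exhaustion(info))
```

Sampling this alongside the CPU usage of btrfs-transacti shows whether the spikes line up with memory pressure.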
* Re: Struggling with file system slowness
2017-05-04 13:15 Struggling with file system slowness Matt McKinnon
2017-05-04 14:22 ` Peter Grandi
@ 2017-05-04 14:24 ` Duncan
1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2017-05-04 14:24 UTC
To: linux-btrfs
Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:
> Hi All,
>
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
>
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
>
> What's the best way to figure out what btrfs is chugging away at here?
>
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2
Headed for work so briefer than usual...
Three questions:
Number of snapshots per subvolume?
Quotas enabled?
Do you do dedupe or otherwise have lots of reflinks?
These dramatically affect scaling. Keeping the number of snapshots per
subvolume under 300, under 100 if possible, should help a lot. Quotas
dramatically worsen the problem, so keeping them disabled unless your use-
case calls for them should help (and if your use-case calls for them,
consider a filesystem where the quota feature is more mature). And
reflinks are the mechanism behind snapshots, so too many of them for
other reasons (such as dedupe) create problems too, tho a snapshot
basically reflinks /everything/, so it takes quite a few reflinks to
trigger the scaling issues of a single snapshot, meaning they aren't
normally a problem unless dedupe is done on a /massive/ scale.
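The snapshot guideline above amounts to a per-subvolume count checked against two thresholds. The sketch below counts snapshots per source subvolume from (snapshot, source) pairs and classifies each count against the figures in this message: under 300, ideally under 100. How the pairs are gathered is left open. They could be derived from `btrfs subvolume list` output, but that output is not parsed here, and the inventory in the example is made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Rule of thumb from this thread: under 300 per subvolume, ideally under 100.
PREFERRED_MAX = 100
HARD_MAX = 300


def snapshots_per_subvolume(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count snapshots per source subvolume from (snapshot, source) pairs."""
    return dict(Counter(source for _snapshot, source in pairs))


def classify(count: int) -> str:
    """Map a per-subvolume snapshot count onto the rule of thumb."""
    if count < PREFERRED_MAX:
        return "ok"
    if count < HARD_MAX:
        return "reduce if possible"
    return "too many"


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    pairs = [(f"snap/home.{i}", "home") for i in range(672)]
    pairs += [(f"snap/srv.{i}", "srv") for i in range(7)]
    for source, count in sorted(snapshots_per_subvolume(pairs).items()):
        print(f"{source}: {count} snapshots -> {classify(count)}")
```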
Of course defrag interacts with snapshots too, tho it shouldn't affect
/this/ problem; it can, however, eat up more space than expected as it
breaks the reflinks.
Beyond that, have you tried a (readonly) btrfs check and/or a scrub or
balance recently? Perhaps there's something wrong that's snagging
things, and you simply haven't otherwise detected it yet?
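The three checks mentioned above map onto standard btrfs-progs subcommands. This sketch only builds and prints the command lines; it does not run them. A `btrfs check` must be run against the device while the filesystem is unmounted, whereas scrub and balance operate on the mounted filesystem, so they cannot simply be chained. The device and mount point are hypothetical placeholders. The `-dusage=50` filter is an illustrative choice for a limited balance, not a recommendation from this thread.

```python
import shlex
from typing import List


def health_check_commands(device: str, mountpoint: str) -> List[List[str]]:
    """Build (but do not run) the check/scrub/balance command lines."""
    return [
        # Read-only consistency check; run against the unmounted device.
        ["btrfs", "check", "--readonly", device],
        # Scrub in the foreground (-B) and print statistics at the end.
        ["btrfs", "scrub", "start", "-B", mountpoint],
        # Balance only data chunks at most 50% full (illustrative cutoff).
        ["btrfs", "balance", "start", "-dusage=50", mountpoint],
    ]


if __name__ == "__main__":
    # Hypothetical device and mount point.
    for cmd in health_check_commands("/dev/sdX", "/mnt/data"):
        print(shlex.join(cmd))
```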
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Struggling with file system slowness
2017-05-04 14:22 ` Peter Grandi
@ 2017-05-05 13:24 ` Matt McKinnon
2017-05-09 19:14 ` Liu Bo
0 siblings, 1 reply; 6+ messages in thread
From: Matt McKinnon @ 2017-05-05 13:24 UTC
To: pg, linux-btrfs
> Too little information. Is IO happening at the same time? Is
> compression on? Deduplicated? Lots of subvolumes? SSD? What
> kind of workload and file size/distribution profile?
Only write IO during the load spikes. No compression, no deduplication.
12 volumes (including snapshots). Spinning disks. Medium workload;
file sizes are all over the map since this holds about 30 user home
directories.
Interestingly enough, the problems which had persisted for many weeks
went away when all snapshots were removed. btrfs-transaction spikes
disappeared. Memory usage went from 30G to under 2G.
-Matt
* Re: Struggling with file system slowness
2017-05-05 13:24 ` Matt McKinnon
@ 2017-05-09 19:14 ` Liu Bo
2017-05-09 19:25 ` Matt McKinnon
0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2017-05-09 19:14 UTC
To: Matt McKinnon; +Cc: pg, linux-btrfs
On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > Too little information. Is IO happening at the same time? Is
> > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > kind of workload and file size/distribution profile?
>
> Only write IO during the load spikes. No compression, no deduplication. 12
> volumes (including snapshots). Spinning disks. Medium workload; file sizes
> are all over the map since this holds about 30 user home directories.
>
> Interestingly enough, the problems which had persisted for many weeks went
> away when all snapshots were removed. btrfs-transaction spikes disappeared.
> Memory usage went from 30G to under 2G.
>
Did those snapshots serve as backups?
Could you please elaborate on how you create snapshots? We could
probably hammer out a testcase to improve the situation.
Thanks,
-liubo
* Re: Struggling with file system slowness
2017-05-09 19:14 ` Liu Bo
@ 2017-05-09 19:25 ` Matt McKinnon
0 siblings, 0 replies; 6+ messages in thread
From: Matt McKinnon @ 2017-05-09 19:25 UTC
To: bo.li.liu; +Cc: pg, linux-btrfs
Those snapshots were created using Marc Merlin's script (thanks, Marc).
They don't do anything except sit around on the file system for a week
or so and then are removed.
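The rotation described above (keep each snapshot for about a week, then remove it) can be sketched as an age filter over snapshot names. This is not Marc Merlin's script. The "<subvolume>@YYYY-MM-DD_HHMM" naming scheme and the function names are hypothetical. The sketch only reports which snapshots have expired; actual removal would go through `btrfs subvolume delete`, which is deliberately not called here.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical naming scheme "<subvolume>@YYYY-MM-DD_HHMM".
STAMP_FORMAT = "%Y-%m-%d_%H%M"


def snapshot_time(name: str) -> Optional[datetime]:
    """Parse the timestamp suffix of a snapshot name, or None if absent."""
    _base, sep, stamp = name.rpartition("@")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp, STAMP_FORMAT)
    except ValueError:
        return None


def expired(names: List[str], now: datetime, keep: timedelta = timedelta(days=7)) -> List[str]:
    """Return snapshot names older than `keep`; unparseable names are skipped."""
    out = []
    for name in names:
        taken = snapshot_time(name)
        if taken is not None and now - taken > keep:
            out.append(name)
    return out


if __name__ == "__main__":
    now = datetime(2017, 5, 9, 19, 25)
    names = ["home@2017-05-01_0000", "home@2017-05-08_2345", "home-unrelated"]
    for name in expired(names, now):
        print("would delete", name)  # printed only, nothing is deleted
```

Skipping names that do not parse keeps the filter from touching subvolumes that are not part of the rotation.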
I'm now doing quarter-hourly snaps instead of nightly since I have
nightly backups of the filesystem going off-site. So far the
btrfs-transaction and memory spikes have not returned.
-Matt
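How many snapshots pile up depends on both the interval and how long each one is kept. At steady state, roughly retention / interval snapshots exist at once (the boundary can add one). The message does not say how long the quarter-hourly snapshots are kept. The sketch below therefore assumes, purely for illustration, the same one-week window as before and compares the result with the rule of thumb earlier in the thread (under 300 per subvolume, ideally under 100). The function names are hypothetical.

```python
from datetime import timedelta


def retained_snapshots(interval: timedelta, retention: timedelta) -> int:
    """Approximate steady-state snapshot count: one every `interval`,
    each kept for `retention` (rounded down)."""
    return retention // interval


def window_for_cap(interval: timedelta, cap: int) -> timedelta:
    """Retention window at which the steady-state count reaches `cap`."""
    return interval * cap


if __name__ == "__main__":
    quarter_hour, nightly, week = timedelta(minutes=15), timedelta(days=1), timedelta(weeks=1)
    print(retained_snapshots(nightly, week))       # 7
    print(retained_snapshots(quarter_hour, week))  # 672
    print(window_for_cap(quarter_hour, 100))       # 1 day, 1:00:00
```

Under that assumption, quarter-hourly snapshots kept for a week would be well past 300 per subvolume. Staying under 100 at a 15-minute interval means keeping each snapshot for less than about 25 hours.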
On 05/09/2017 03:14 PM, Liu Bo wrote:
> On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
>>> Too little information. Is IO happening at the same time? Is
>>> compression on? Deduplicated? Lots of subvolumes? SSD? What
>>> kind of workload and file size/distribution profile?
>>
>> Only write IO during the load spikes. No compression, no deduplication. 12
>> volumes (including snapshots). Spinning disks. Medium workload; file sizes
>> are all over the map since this holds about 30 user home directories.
>>
>> Interestingly enough, the problems which had persisted for many weeks went
>> away when all snapshots were removed. btrfs-transaction spikes disappeared.
>> Memory usage went from 30G to under 2G.
>>
>
> Did those snapshots serve as backups?
>
> Could you please elaborate on how you create snapshots? We could
> probably hammer out a testcase to improve the situation.
>
> Thanks,
>
> -liubo
Thread overview: 6+ messages
2017-05-04 13:15 Struggling with file system slowness Matt McKinnon
2017-05-04 14:22 ` Peter Grandi
2017-05-05 13:24 ` Matt McKinnon
2017-05-09 19:14 ` Liu Bo
2017-05-09 19:25 ` Matt McKinnon
2017-05-04 14:24 ` Duncan