Struggling with file system slowness

linux-btrfs.vger.kernel.org archive mirror

* Struggling with file system slowness
@ 2017-05-04 13:15 Matt McKinnon
  2017-05-04 14:22 ` Peter Grandi
  2017-05-04 14:24 ` Duncan
  0 siblings, 2 replies; 6+ messages in thread
From: Matt McKinnon @ 2017-05-04 13:15 UTC
  To: linux-btrfs

Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged 
at 100% CPU for most of the time.

I thought this might have to do with fragmentation as mentioned in the 
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as 
mentioned in the wiki), but after running a full defrag of the file 
system, and also enabling the 'autodefrag' mount option, the problem 
still persists.

What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt


* Re: Struggling with file system slowness
  2017-05-04 13:15 Struggling with file system slowness Matt McKinnon
@ 2017-05-04 14:22 ` Peter Grandi
  2017-05-05 13:24   ` Matt McKinnon
  2017-05-04 14:24 ` Duncan
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Grandi @ 2017-05-04 14:22 UTC
  To: Linux fs Btrfs

> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What kind
of workload and file size/distribution profile?

Typical causes of high CPU are extents (your defragging has not
necessarily worked) and 'qgroups', especially with many
subvolumes. In some rare cases it could be the free space cache.

  https://www.google.ca/search?num=100&safe=images&as_q=cxpu&as_epq=btrfs-transaction

Also, something like this often happens for reasons that are not
Btrfs-related, triggered for example by memory being close to
exhaustion in the kernel memory manager.
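One way to tell which of these candidates (extents, qgroups, the free space cache) btrfs-transacti is busy with is to sample its kernel stack repeatedly and count which functions keep showing up. The Python sketch below is a hypothetical helper, not something proposed in this thread. It assumes root access and a kernel built with CONFIG_STACKTRACE so that /proc/<pid>/stack is readable, and it relies on the thread name being truncated to 15 characters, as it appears in the original report.

```python
# Hypothetical diagnostic helper (not from this thread): a "poor man's
# profiler" that samples the kernel stack of the btrfs transaction thread.
# Assumes root and a kernel with CONFIG_STACKTRACE (/proc/<pid>/stack).
import collections
import time
from pathlib import Path

# Thread names (comm) are truncated to 15 characters, which is why the
# transaction thread shows up as "btrfs-transacti" in top.
TARGET_COMM = "btrfs-transacti"


def find_pids(comm, proc_root="/proc"):
    """Return the PIDs whose /proc/<pid>/comm matches `comm` exactly."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == comm:
                pids.append(int(entry.name))
        except OSError:
            continue  # process exited, or comm is unreadable
    return sorted(pids)


def parse_stack(text):
    """Turn /proc/<pid>/stack text into function names, innermost frame first."""
    frames = []
    for line in text.splitlines():
        # Typical line: "[<0>] btrfs_commit_transaction+0x5a0/0xa30 [btrfs]"
        _, _, rest = line.partition("] ")
        name = rest.split("+", 1)[0].strip()
        if name and not name.startswith("0x"):  # skip bare-address end markers
            frames.append(name)
    return frames


def sample(pid, count=200, interval=0.05, proc_root="/proc"):
    """Count in how many samples each function appears on the kernel stack."""
    hits = collections.Counter()
    stack_file = Path(proc_root) / str(pid) / "stack"
    for _ in range(count):
        try:
            frames = parse_stack(stack_file.read_text())
        except OSError:
            break  # thread gone, or permission denied (needs root)
        hits.update(set(frames))  # count each function once per sample
        time.sleep(interval)
    return hits


if __name__ == "__main__":
    # One transaction thread exists per mounted btrfs filesystem.
    for pid in find_pids(TARGET_COMM):
        print(f"pid {pid}:")
        for func, n in sample(pid).most_common(15):
            print(f"{n:5d}  {func}")
```

Functions that appear in nearly every sample hint at where the time goes, for example names containing qgroup, free_space or extent. Exact symbol names differ between kernel versions, so treat the output as a pointer for further digging rather than a diagnosis.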


* Re: Struggling with file system slowness
  2017-05-04 14:22 ` Peter Grandi
@ 2017-05-05 13:24   ` Matt McKinnon
  2017-05-09 19:14     ` Liu Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Matt McKinnon @ 2017-05-05 13:24 UTC
  To: pg, linux-btrfs

 > Too little information. Is IO happening at the same time? Is
 > compression on? Deduplicated? Lots of subvolumes? SSD? What
 > kind of workload and file size/distribution profile?

Only write IO during the load spikes.  No compression, no deduplication.
12 volumes (including snapshots).  Spinning disks.  Medium workload;
file sizes are all over the map since this holds about 30 user home
directories.

Interestingly enough, the problems which had persisted for many weeks 
went away when all snapshots were removed.  btrfs-transaction spikes 
disappeared.  Memory usage went from 30G to under 2G.

-Matt


* Re: Struggling with file system slowness
  2017-05-05 13:24   ` Matt McKinnon
@ 2017-05-09 19:14     ` Liu Bo
  2017-05-09 19:25       ` Matt McKinnon
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2017-05-09 19:14 UTC
  To: Matt McKinnon; +Cc: pg, linux-btrfs

On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > Too little information. Is IO happening at the same time? Is
> > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > kind of workload and file size/distribution profile?
> 
> Only write IO during the load spikes.  No compression, no deduplication.  12
> volumes (including snapshots).  Spinning disks.  Medium workload; file sizes
> are all over the map since this holds about 30 user home directories.
> 
> Interestingly enough, the problems which had persisted for many weeks went
> away when all snapshots were removed.  btrfs-transaction spikes disappeared.
> Memory usage went from 30G to under 2G.
>

Did those snapshots serve as backups?

Could you please elaborate on how you create snapshots?  We could
probably hammer out a testcase to improve the situation.

Thanks,

-liubo


* Re: Struggling with file system slowness
  2017-05-09 19:14     ` Liu Bo
@ 2017-05-09 19:25       ` Matt McKinnon
  0 siblings, 0 replies; 6+ messages in thread
From: Matt McKinnon @ 2017-05-09 19:25 UTC
  To: bo.li.liu; +Cc: pg, linux-btrfs

Those snapshots were created using Marc Merlin's script (thanks, Marc). 
They don't do anything except sit around on the file system for a week 
or so and then are removed.

I'm now doing quarter-hourly snaps instead of nightly since I have
nightly backups of the filesystem going off-site.  So far the
btrfs-transaction and memory spikes have not returned.

-Matt


* Re: Struggling with file system slowness
  2017-05-04 13:15 Struggling with file system slowness Matt McKinnon
  2017-05-04 14:22 ` Peter Grandi
@ 2017-05-04 14:24 ` Duncan
  1 sibling, 0 replies; 6+ messages in thread
From: Duncan @ 2017-05-04 14:24 UTC
  To: linux-btrfs

Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
> 
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
> 
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
> 
> What's the best way to figure out what btrfs is chugging away at here?
> 
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work so briefer than usual...

Three questions:

Number of snapshots per subvolume?

Quotas enabled?

Do you do dedupe or otherwise have lots of reflinks?

These dramatically affect scaling.  Keeping the number of snapshots per
subvolume under 300, under 100 if possible, should help a lot.  Quotas
dramatically worsen the problem, so keeping them disabled unless your
use-case calls for them should help (and if your use-case calls for them,
consider a filesystem where the quota feature is more mature).  Reflinks
are also the mechanism behind snapshots, so having too many of them for
other reasons (such as dedupe) creates problems too.  But a snapshot
basically reflinks /everything/, so it takes quite a few reflinks to
match the scaling cost of a single snapshot; they aren't normally a
problem unless dedupe is done on a /massive/ scale.
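To see how a snapshot schedule translates into snapshots-per-subvolume numbers like these, here is a minimal Python sketch of the arithmetic. The 100 and 300 limits are the rough guidance from this message, not limits enforced by btrfs. It assumes snapshots are taken at a fixed interval and each one is deleted once it reaches the retention age, so the count is roughly retention divided by interval. The schedules in the usage part are hypothetical examples, not the ones used on the server in this thread.

```python
# Hypothetical helper: how many snapshots of one subvolume exist at once
# under a fixed schedule, checked against the rough guidance in this message.
from datetime import timedelta

PREFERRED_MAX = 100  # "under 100 if possible"
HARD_MAX = 300       # "under 300"


def retained_snapshots(interval: timedelta, retention: timedelta) -> int:
    """Approximate number of live snapshots: retention / interval (floored)."""
    return retention // interval  # timedelta // timedelta gives an int


def assess(count: int) -> str:
    """Classify a per-subvolume snapshot count against the guidance."""
    if count < PREFERRED_MAX:
        return "ok"
    if count < HARD_MAX:
        return "acceptable, but consider thinning"
    return "too many: expect scaling problems"


if __name__ == "__main__":
    # Example schedules only (hypothetical).
    schedules = [
        ("nightly, kept 1 week", timedelta(days=1), timedelta(weeks=1)),
        ("quarter-hourly, kept 1 day", timedelta(minutes=15), timedelta(days=1)),
        ("quarter-hourly, kept 1 week", timedelta(minutes=15), timedelta(weeks=1)),
    ]
    for label, interval, retention in schedules:
        n = retained_snapshots(interval, retention)
        print(f"{label}: {n} snapshots -> {assess(n)}")
```

Under these assumptions a nightly schedule kept for a week stays at 7 snapshots and a quarter-hourly one kept for a day at 96, while a quarter-hourly schedule kept for a week reaches 672, well past the 300 mark. Frequency and retention have to be balanced against each other.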

Of course defrag interacts with snapshots too.  It shouldn't affect
/this/ problem, but it can eat up more space than expected, since it
breaks the reflinks.
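The interaction between snapshots, reflinks and defrag can be illustrated with a toy model. This is a deliberately simplified Python sketch, not btrfs's real on-disk design: real snapshots share metadata tree blocks and do not touch every extent when created, and real defrag only rewrites fragmented ranges and may merge them into fewer, larger extents. Here every extent is assumed to be the same size, so the number of allocated extents stands in for space used.

```python
# Toy model only (not btrfs internals): subvolumes are lists of extent IDs,
# and an extent stays allocated while any subvolume references it.
from collections import Counter


class ToyFS:
    def __init__(self):
        self.subvols = {}      # subvolume name -> list of extent IDs
        self.next_extent = 0

    def _new_extent(self):
        self.next_extent += 1
        return self.next_extent

    def create(self, name, n_extents):
        # Writing fresh data allocates new, unshared extents.
        self.subvols[name] = [self._new_extent() for _ in range(n_extents)]

    def snapshot(self, src, dst):
        # In the toy, a snapshot references ("reflinks") every source extent.
        self.subvols[dst] = list(self.subvols[src])

    def defrag(self, name):
        # Toy defrag rewrites every extent of this subvolume into new ones,
        # breaking the sharing with snapshots that still hold the old ones.
        self.subvols[name] = [self._new_extent() for _ in self.subvols[name]]

    def refcounts(self):
        # How many subvolumes reference each allocated extent.
        return Counter(e for extents in self.subvols.values() for e in extents)

    def allocated_extents(self):
        # Extents referenced by at least one subvolume (equal sizes assumed).
        return len(self.refcounts())


if __name__ == "__main__":
    fs = ToyFS()
    fs.create("home", 1000)
    for i in range(5):
        fs.snapshot("home", f"snap{i}")
    print(fs.allocated_extents(), max(fs.refcounts().values()))
    fs.defrag("home")
    print(fs.allocated_extents())
```

In this model, five snapshots of a 1000-extent subvolume leave 1000 extents allocated, each referenced six times. Defragging the live subvolume then allocates 1000 new extents while the snapshots keep the old ones alive, doubling the space used.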

Beyond that, have you tried a (readonly) btrfs check and/or a scrub or 
balance recently?  Perhaps there's something wrong that's snagging 
things, and you simply haven't otherwise detected it yet?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

