From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clm@fb.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH RFC] xfs: drop SYNC_WAIT from xfs_reclaim_inodes_ag during slab reclaim
Date: Thu, 17 Nov 2016 12:00:08 +1100
Message-ID: <20161117010008.GC19783@dastard>
In-Reply-To: <20161117002726.GA4811@clm-mbp.masoncoding.com>

On Wed, Nov 16, 2016 at 07:27:27PM -0500, Chris Mason wrote:
> On Thu, Nov 17, 2016 at 10:31:36AM +1100, Dave Chinner wrote:
> >>>So I'm running on 16GB RAM and have 100-150MB of XFS slab.
> >>>Percentage-wise, the inode cache is a larger portion of memory than
> >>>in your machines. I can increase the number of files to increase it
> >>>further, but I don't think that will change anything.
> >>
> >>I think the way to see what I'm seeing would be to drop the number
> >>of IO threads (-T) and bump both -m and -M.  Basically less inode
> >>working set and more memory working set.
> >
> >If I increase m/M by any non-trivial amount, the test OOMs within a
> >couple of minutes of starting even after cutting the number of IO
> >threads in half. I've managed to increase -m by 10% without OOM -
> >I'll keep trying to increase this part of the load as much as I
> >can as I refine the patchset I have.
> 
> Gotcha.  -m is long-lasting: it's allocated once at the start of the
> run and stays around forever, basically soaking up RAM.  -M is
> allocated once per work loop, and it should be where the stalls
> really hit.  I'll peel off a flash machine tomorrow and find a
> command line that matches my results so far.
> 
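The -m/-M split, as I understand it, boils down to this allocation
pattern - a minimal userspace sketch with hypothetical names and
sizes, not simoop's actual code:

/*
 * -m memory lives for the whole run; -M memory is allocated and
 * freed once per work loop, which is where reclaim stalls bite.
 */
#include <stdlib.h>
#include <string.h>

static void *long_lived;		/* -m: pinned for the whole run */

static void work_loop(size_t m_per_loop, int iterations)
{
	for (int i = 0; i < iterations; i++) {
		/* -M: fresh per-iteration working set */
		void *per_loop = malloc(m_per_loop);
		if (!per_loop)
			break;
		memset(per_loop, 0, m_per_loop);  /* touch every page */
		/* ... simulated work would go here ... */
		free(per_loop);
	}
}

int main(void)
{
	size_t m_total = 1UL << 30;	/* -m: e.g. 1GB */
	long_lived = malloc(m_total);
	if (long_lived)
		memset(long_lived, 0, m_total);
	work_loop(256UL << 20, 100);	/* -M: e.g. 256MB per loop */
	free(long_lived);
	return 0;
}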
> What kind of flash are you using?  I can choose between modern nvme
> or something more crusty.

Crusty old stuff - a pair of EVO 840s in HW-RAID0 behind 512MB of
BBWC. Read rates peak at ~150MB/s, write rates sustain at about
75MB/s.

I'm testing on a 200GB filesystem, configured as:

mkfs.xfs -f -d agcount=8,size=200g /dev/vdc

The backing file for /dev/vdc is fully preallocated and linear,
accessed via virtio/direct IO, so it's no different to accessing the
real block device....
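A backing file like that can be preallocated up front along these
lines (illustrative path; a linear layout is likely on a fresh
filesystem, though posix_fallocate() doesn't strictly guarantee it):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/images/vdc-backing.img", O_CREAT | O_WRONLY, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* allocate all 200GB now so nothing is allocated at run time;
	 * posix_fallocate() returns an errno value directly */
	int err = posix_fallocate(fd, 0, 200LL << 30);
	if (err) {
		fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
		close(fd);
		return 1;
	}
	close(fd);
	return 0;
}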

> >>>That's why removing the blocking from the shrinker causes the
> >>>overall work rate to go down - it results in the cache not
> >>>maintaining a working set of inodes, which increases the IO load
> >>>and that then slows everything down.
> >>
> >>At least on my machines, it made the overall work rate go up.  Both
> >>simoop and prod are 10-15% faster.
> >
> >Ok, I'll see if I can tune the workload here to behave more like
> >this....
> 
> What direction do you have in mind for your current patches?  Many
> tiers have shadows where we can put experimental code without
> feeling bad if machines crash or data is lost.  I'm happy to line up
> tests if you want data from specific workloads.

Right now I have kswapd as fully non-blocking - even more so than
your one-line patch, because reclaim can (and does) still block on
inode locks with SYNC_TRYLOCK set. I don't see any problems with
doing this.
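
For reference, the one-liner under discussion is in the shrinker
entry point; a simplified sketch of its 4.8-era shape:

/*
 * Chris's RFC drops SYNC_WAIT here.  With only SYNC_TRYLOCK,
 * reclaim can still block on inode locks - the remaining blocking
 * mentioned above.
 */
long
xfs_reclaim_inodes_nr(
	struct xfs_mount	*mp,
	int			nr_to_scan)
{
	/* kick background reclaim and the AIL to get IO underway */
	xfs_reclaim_work_queue(mp);
	xfs_ail_push_all(mp->m_ail);

	/* mainline passes SYNC_TRYLOCK | SYNC_WAIT; the RFC drops
	 * SYNC_WAIT */
	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK, &nr_to_scan);
}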

I'm still trying to work out what to do with direct reclaim - it's
clearly the source of the worst allocation latency problems, and it
also seems to be the contributing factor to the obnoxious kswapd FFE
behaviour.  There are a couple more variations I want to try to see
if I can make it block less severely, but what I do here in the short
term largely depends on the effect on other benchmarks and loads....
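
One hypothetical way to express that kswapd/direct reclaim split - a
sketch only, not the patches described above - is to branch on
current_is_kswapd() when choosing the sync mode:

/* hypothetical sketch, not the actual patchset */
static int xfs_reclaim_sync_mode(void)
{
	/* kswapd must never block: trylock only, skip busy inodes */
	if (current_is_kswapd())
		return SYNC_TRYLOCK;

	/* how much direct reclaim should block is the open question */
	return SYNC_TRYLOCK | SYNC_WAIT;
}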

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 24+ messages
2016-10-14 12:27 [PATCH RFC] xfs: drop SYNC_WAIT from xfs_reclaim_inodes_ag during slab reclaim Chris Mason
2016-10-15 22:34 ` Dave Chinner
2016-10-17  0:24   ` Chris Mason
2016-10-17  1:52     ` Dave Chinner
2016-10-17 13:30       ` Chris Mason
2016-10-17 22:30         ` Dave Chinner
2016-10-17 23:20           ` Chris Mason
2016-10-18  2:03             ` Dave Chinner
2016-11-14  1:00               ` Chris Mason
2016-11-14  7:27                 ` Dave Chinner
2016-11-14 20:56                   ` Chris Mason
2016-11-14 23:58                     ` Dave Chinner
2016-11-15  3:09                       ` Chris Mason
2016-11-15  5:54                       ` Dave Chinner
2016-11-15 19:00                         ` Chris Mason
2016-11-16  1:30                           ` Dave Chinner
2016-11-16  3:03                             ` Chris Mason
2016-11-16 23:31                               ` Dave Chinner
2016-11-17  0:27                                 ` Chris Mason
2016-11-17  1:00                                   ` Dave Chinner [this message]
2016-11-17  0:47                               ` Dave Chinner
2016-11-17  1:07                                 ` Chris Mason
2016-11-17  3:39                                   ` Dave Chinner
2019-06-14 12:58 ` Amir Goldstein
