linux-xfs.vger.kernel.org archive mirror
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 17/30] xfs: make inode reclaim almost non-blocking
Date: Thu, 4 Jun 2020 14:06:44 -0400
Message-ID: <20200604180644.GF17815@bfoster>
In-Reply-To: <20200604074606.266213-18-david@fromorbit.com>

On Thu, Jun 04, 2020 at 05:45:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that dirty inode writeback doesn't cause read-modify-write
> cycles on the inode cluster buffer under memory pressure, the need
> to throttle memory reclaim to the rate at which we can clean dirty
> inodes goes away, because we no longer thrash inode cluster buffers
> under memory pressure to clean dirty inodes.
> 
> This means inode writeback no longer stalls on memory allocation
> or read IO, and hence can be done asynchronously without generating
> memory pressure. As a result, blocking inode writeback in reclaim is
> no longer necessary to prevent reclaim priority windup, because
> cleaning dirty inodes no longer depends on having memory reserves
> available for the filesystem to make progress reclaiming inodes.
> 
> Hence we can convert inode reclaim to be non-blocking for shrinker
> callouts, both for direct reclaim and kswapd.
> 
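
The difference between blocking and non-blocking reclaim can be illustrated with a minimal userspace sketch. This is not XFS code. It is a reclaim pass that only ever trylocks each cached object and counts busy objects as skipped instead of sleeping on them. `struct cached_obj` and `reclaim_nonblocking()` are hypothetical names, and the real shrinker path is considerably more involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a reclaimable cached inode; not an XFS type. */
struct cached_obj {
	pthread_mutex_t lock;	/* held by whoever is currently using the object */
	int reclaimed;		/* set once a reclaim pass has freed it */
};

/*
 * One non-blocking reclaim pass: try each object's lock exactly once.
 * Objects whose lock is held elsewhere are counted in *skipped and left
 * for a later pass; the caller never sleeps waiting for them.
 * Returns the number of objects reclaimed by this pass.
 */
static size_t reclaim_nonblocking(struct cached_obj *objs, size_t n,
				  size_t *skipped)
{
	size_t freed = 0;

	assert(objs != NULL && skipped != NULL);
	*skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (objs[i].reclaimed)
			continue;		/* already freed by an earlier pass */
		if (pthread_mutex_trylock(&objs[i].lock) != 0) {
			(*skipped)++;		/* busy: skip rather than block */
			continue;
		}
		objs[i].reclaimed = 1;		/* "free" the object */
		freed++;
		pthread_mutex_unlock(&objs[i].lock);
	}
	return freed;
}
```

A blocking variant would call pthread_mutex_lock() instead. It would then stall the allocating task for as long as the lock holder needs. The price of not blocking is that skipped objects must be picked up by a later pass.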
> On a vanilla kernel, running a 16-way fsmark create workload on a
> 4 node/16p/16GB RAM machine, I can reliably pin 14.75GB of RAM via
> userspace mlock(). The OOM killer gets invoked at 15GB of
> pinned RAM.
> 
> With this patch alone, pinning memory triggers premature OOM
> killer invocation, sometimes with as much as 45% of RAM being free.
> It's trivially easy to trigger the OOM killer when reclaim does not
> block.
> 

Nit, but I had to reread this a couple times to grok what "With this
patch alone ..." meant. I think it means "With non-blocking inode
reclaim without cluster buffer pinning ...," based on the next
paragraph. If so, I think the explicit wording is a little more clear.

> With pinning inode clusters in RAM and then adding this patch, I can
> reliably pin 14.5GB of RAM and still have the fsmark workload run to
> completion. The OOM killer gets invoked at 14.75GB of pinned RAM, which
> is only 0.25GB less than the vanilla kernel. It is much more reliable
> than async reclaim alone.
> 
> simoops shows that allocation stalls go away when async reclaim is
> used. Vanilla kernel:
> 
> Run time: 1924 seconds
> Read latency (p50: 3,305,472) (p95: 3,723,264) (p99: 4,001,792)
> Write latency (p50: 184,064) (p95: 553,984) (p99: 807,936)
> Allocation latency (p50: 2,641,920) (p95: 3,911,680) (p99: 4,464,640)
> work rate = 13.45/sec (avg 13.44/sec) (p50: 13.46) (p95: 13.58) (p99: 13.70)
> alloc stall rate = 3.80/sec (avg: 2.59) (p50: 2.54) (p95: 2.96) (p99: 3.02)
> 
> With inode cluster pinning and async reclaim:
> 
> Run time: 1924 seconds
> Read latency (p50: 3,305,472) (p95: 3,715,072) (p99: 3,977,216)
> Write latency (p50: 187,648) (p95: 553,984) (p99: 789,504)
> Allocation latency (p50: 2,748,416) (p95: 3,919,872) (p99: 4,448,256)
> work rate = 13.28/sec (avg 13.32/sec) (p50: 13.26) (p95: 13.34) (p99: 13.34)
> alloc stall rate = 0.02/sec (avg: 0.02) (p50: 0.01) (p95: 0.03) (p99: 0.03)
> 
> Latencies don't really change much, nor does the work rate. However,
> allocation almost never stalls with these changes, whilst the
> vanilla kernel is sometimes reporting 20 stalls/s over a 60s sample
> period. This difference is due to inode reclaim being largely
> non-blocking now.
> 
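
The simoops averages above can be put in proportion with a small helper that computes relative change. The helper is hypothetical and not part of simoops. Applied to the average stall rate (2.59 to 0.02/sec), it gives a drop of roughly 99.2%. The average work rate (13.44 to 13.32/sec) moves by under 1%, and p99 allocation latency by under 0.4%.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent (negative = drop). */
static double percent_change(double before, double after)
{
	assert(before != 0.0);	/* undefined for a zero baseline */
	return (after - before) / before * 100.0;
}
```

For example, percent_change(2.59, 0.02) is about -99.23 and percent_change(13.44, 13.32) is about -0.89. This matches the claim that stalls almost disappear while throughput is essentially unchanged.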
> IOWs, once we have pinned inode cluster buffers, we can make inode
> reclaim non-blocking without a major risk of premature and/or
> spurious OOM killer invocation, and without any changes to memory
> reclaim infrastructure.
> 

Very interesting result:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Amir Goldstein <amir73il@gmail.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_icache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index dbba4c1946386..a6780942034fc 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1402,7 +1402,7 @@ xfs_reclaim_inodes_nr(
>  	xfs_reclaim_work_queue(mp);
>  	xfs_ail_push_all(mp->m_ail);
>  
> -	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT, &nr_to_scan);
> +	return xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK, &nr_to_scan);
>  }
>  
>  /*
> -- 
> 2.26.2.761.g0e0b3e54be
> 
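
The one-line change drops SYNC_WAIT from the flags the shrinker passes to xfs_reclaim_inodes_ag(), leaving only SYNC_TRYLOCK. As we read the XFS code of this era, SYNC_WAIT is what allows the reclaim path to sleep until an inode flush or IO completes. SYNC_TRYLOCK makes it skip contended locks instead.

The toy model below illustrates the effect on an inode whose flush is still outstanding. The flag values are assumed for illustration only. `reclaim_one()` and `enum reclaim_result` are hypothetical.

```c
#include <stdbool.h>

/* Values assumed for illustration; the real definitions live in fs/xfs/xfs_icache.h. */
#define SYNC_WAIT	0x0001	/* may wait for inode flushing/IO */
#define SYNC_TRYLOCK	0x0002	/* only trylock, skip contended objects */

/* Hypothetical outcomes of trying to reclaim a single inode. */
enum reclaim_result {
	RECLAIMED,		/* clean inode, freed immediately */
	SKIPPED,		/* flush in progress, left for a later pass */
	WAITED_THEN_RECLAIMED,	/* caller slept until the flush completed */
};

/*
 * Toy model: 'flush_in_progress' stands in for the inode's writeback
 * still being outstanding. Only SYNC_WAIT permits sleeping on it.
 */
static enum reclaim_result reclaim_one(bool flush_in_progress, int flags)
{
	if (!flush_in_progress)
		return RECLAIMED;
	if (flags & SYNC_WAIT)
		return WAITED_THEN_RECLAIMED;	/* old shrinker behaviour */
	return SKIPPED;				/* behaviour after this patch */
}
```

With the old SYNC_TRYLOCK | SYNC_WAIT mask, the model sleeps on an in-flight flush. With SYNC_TRYLOCK alone, it skips the inode. The "almost" in the subject reflects that other locks in the real path can still block.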



Thread overview: 81+ messages
2020-06-04  7:45 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-04  7:45 ` [PATCH 01/30] xfs: Don't allow logging of XFS_ISTALE inodes Dave Chinner
2020-06-04  7:45 ` [PATCH 02/30] xfs: remove logged flag from inode log item Dave Chinner
2020-06-04  7:45 ` [PATCH 03/30] xfs: add an inode item lock Dave Chinner
2020-06-09 13:13   ` Brian Foster
2020-06-04  7:45 ` [PATCH 04/30] xfs: mark inode buffers in cache Dave Chinner
2020-06-04 14:04   ` Brian Foster
2020-06-04  7:45 ` [PATCH 05/30] xfs: mark dquot " Dave Chinner
2020-06-04  7:45 ` [PATCH 06/30] xfs: mark log recovery buffers for completion Dave Chinner
2020-06-04  7:45 ` [PATCH 07/30] xfs: call xfs_buf_iodone directly Dave Chinner
2020-06-04  7:45 ` [PATCH 08/30] xfs: clean up whacky buffer log item list reinit Dave Chinner
2020-06-04  7:45 ` [PATCH 09/30] xfs: make inode IO completion buffer centric Dave Chinner
2020-06-04  7:45 ` [PATCH 10/30] xfs: use direct calls for dquot IO completion Dave Chinner
2020-06-04  7:45 ` [PATCH 11/30] xfs: clean up the buffer iodone callback functions Dave Chinner
2020-06-04  7:45 ` [PATCH 12/30] xfs: get rid of log item callbacks Dave Chinner
2020-06-04  7:45 ` [PATCH 13/30] xfs: handle buffer log item IO errors directly Dave Chinner
2020-06-04 14:05   ` Brian Foster
2020-06-05  0:59     ` Dave Chinner
2020-06-05  1:32   ` [PATCH 13/30 V2] " Dave Chinner
2020-06-05 16:24     ` Brian Foster
2020-06-04  7:45 ` [PATCH 14/30] xfs: unwind log item error flagging Dave Chinner
2020-06-04  7:45 ` [PATCH 15/30] xfs: move xfs_clear_li_failed out of xfs_ail_delete_one() Dave Chinner
2020-06-04  7:45 ` [PATCH 16/30] xfs: pin inode backing buffer to the inode log item Dave Chinner
2020-06-04 14:05   ` Brian Foster
2020-06-04  7:45 ` [PATCH 17/30] xfs: make inode reclaim almost non-blocking Dave Chinner
2020-06-04 18:06   ` Brian Foster [this message]
2020-06-04  7:45 ` [PATCH 18/30] xfs: remove IO submission from xfs_reclaim_inode() Dave Chinner
2020-06-04 18:08   ` Brian Foster
2020-06-04 22:53     ` Dave Chinner
2020-06-05 16:25       ` Brian Foster
2020-06-04  7:45 ` [PATCH 19/30] xfs: allow multiple reclaimers per AG Dave Chinner
2020-06-05 16:26   ` Brian Foster
2020-06-05 21:07     ` Dave Chinner
2020-06-08 16:44       ` Brian Foster
2020-06-04  7:45 ` [PATCH 20/30] xfs: don't block inode reclaim on the ILOCK Dave Chinner
2020-06-05 16:26   ` Brian Foster
2020-06-04  7:45 ` [PATCH 21/30] xfs: remove SYNC_TRYLOCK from inode reclaim Dave Chinner
2020-06-05 16:26   ` Brian Foster
2020-06-04  7:45 ` [PATCH 22/30] xfs: remove SYNC_WAIT from xfs_reclaim_inodes() Dave Chinner
2020-06-05 16:26   ` Brian Foster
2020-06-05 21:09     ` Dave Chinner
2020-06-04  7:45 ` [PATCH 23/30] xfs: clean up inode reclaim comments Dave Chinner
2020-06-05 16:26   ` Brian Foster
2020-06-04  7:46 ` [PATCH 24/30] xfs: rework stale inodes in xfs_ifree_cluster Dave Chinner
2020-06-05 18:27   ` Brian Foster
2020-06-05 21:32     ` Dave Chinner
2020-06-08 16:44       ` Brian Foster
2020-06-04  7:46 ` [PATCH 25/30] xfs: attach inodes to the cluster buffer when dirtied Dave Chinner
2020-06-08 16:45   ` Brian Foster
2020-06-08 21:05     ` Dave Chinner
2020-06-04  7:46 ` [PATCH 26/30] xfs: xfs_iflush() is no longer necessary Dave Chinner
2020-06-08 16:45   ` Brian Foster
2020-06-08 21:37     ` Dave Chinner
2020-06-08 22:26   ` [PATCH 26/30 V2] " Dave Chinner
2020-06-09 13:11     ` Brian Foster
2020-06-04  7:46 ` [PATCH 27/30] xfs: rename xfs_iflush_int() Dave Chinner
2020-06-08 17:37   ` Brian Foster
2020-06-04  7:46 ` [PATCH 28/30] xfs: rework xfs_iflush_cluster() dirty inode iteration Dave Chinner
2020-06-09 13:11   ` Brian Foster
2020-06-09 22:01     ` Dave Chinner
2020-06-10 13:06       ` Brian Foster
2020-06-10 23:40         ` Dave Chinner
2020-06-11 13:56           ` Brian Foster
2020-06-15  1:01             ` Dave Chinner
2020-06-15 14:21               ` Brian Foster
2020-06-16 14:41                 ` Brian Foster
2020-06-11  1:56   ` [PATCH 28/30 V2] " Dave Chinner
2020-06-04  7:46 ` [PATCH 29/30] xfs: factor xfs_iflush_done Dave Chinner
2020-06-09 13:12   ` Brian Foster
2020-06-09 22:14     ` Dave Chinner
2020-06-10 13:08       ` Brian Foster
2020-06-11  0:16         ` Dave Chinner
2020-06-11 14:07           ` Brian Foster
2020-06-15  1:49             ` Dave Chinner
2020-06-15  5:20               ` Amir Goldstein
2020-06-15 14:31               ` Brian Foster
2020-06-11  1:58   ` [PATCH 29/30 V2] " Dave Chinner
2020-06-04  7:46 ` [PATCH 30/30] xfs: remove xfs_inobp_check() Dave Chinner
2020-06-09 13:12   ` Brian Foster
  -- strict thread matches above, loose matches on Subject: below --
2020-06-22  8:15 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-22  8:15 ` [PATCH 17/30] xfs: make inode reclaim almost non-blocking Dave Chinner
2020-06-01 21:42 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-01 21:42 ` [PATCH 17/30] xfs: make inode reclaim almost non-blocking Dave Chinner
