linux-btrfs.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Josef Bacik <josef@toxicpanda.com>
Cc: hannes@cmpxchg.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, jack@suse.cz,
	linux-fsdevel@vger.kernel.org, kernel-team@fb.com,
	linux-btrfs@vger.kernel.org, Josef Bacik <jbacik@fb.com>
Subject: Re: [PATCH 06/10] writeback: add counters for metadata usage
Date: Wed, 22 Nov 2017 11:21:59 +0100
Message-ID: <20171122102159.GC11233@quack2.suse.cz>
In-Reply-To: <1510696616-8489-6-git-send-email-josef@toxicpanda.com>

On Tue 14-11-17 16:56:52, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> Btrfs has no bounds except memory on the amount of dirty memory that we have in
> use for metadata.  Historically we have used a special inode so we could take
> advantage of the balance_dirty_pages throttling that comes with using pagecache.
> However as we'd like to support different blocksizes it would be nice to not
> have to rely on pagecache, but still get the balance_dirty_pages throttling
> without having to do it ourselves.
> 
> So introduce *METADATA_DIRTY_BYTES and *METADATA_WRITEBACK_BYTES.  These are
> zone and bdi_writeback counters to keep track of how many bytes we have in
> flight for METADATA.  We need to count in bytes as blocksizes could be
> percentages of pagesize.  We simply convert the bytes to number of pages where
> it is needed for the throttling.
> 
> Also introduce NR_METADATA_BYTES so we can keep track of the total amount of
> pages used for metadata on the system.  This is also needed so things like dirty
> throttling know that this is dirtyable memory as well and easily reclaimed.

NR_METADATA_BYTES never gets set in the patch set. Either remove it or
implement it properly. Also, for memory reclaim purposes we already have
NR_SLAB_RECLAIMABLE, so you should make sure your metadata buffers are not
double accounted.

Another catch is that node and zone counters are kept in longs, so on
32-bit archs the counters will overflow if the amount of metadata (or dirty
metadata) ever exceeds 2GB. That should be rare, but it is still possible.
I'm not sure what the right answer to this is... Account in 512-byte units?

> @@ -1549,12 +1579,17 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
>  	 * deltas.
>  	 */
>  	if (dtc->wb_thresh < 2 * wb_stat_error(wb)) {
> -		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
> -		dtc->wb_dirty = wb_reclaimable + wb_stat_sum(wb, WB_WRITEBACK);
> +		wb_reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE) +
> +			(wb_stat_sum(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
> +		wb_writeback = wb_stat_sum(wb, WB_WRITEBACK) +
> +			(wb_stat_sum(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
>  	} else {
> -		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE);
> -		dtc->wb_dirty = wb_reclaimable + wb_stat(wb, WB_WRITEBACK);
> +		wb_reclaimable = wb_stat(wb, WB_RECLAIMABLE) +
> +			(wb_stat(wb, WB_METADATA_DIRTY_BYTES) >> PAGE_SHIFT);
> +		wb_writeback = wb_stat(wb, WB_WRITEBACK) +
> +			(wb_stat(wb, WB_METADATA_WRITEBACK_BYTES) >> PAGE_SHIFT);
>  	}
> +	dtc->wb_dirty = wb_reclaimable + wb_writeback;
>  }

Use BtoP here as well? You have it defined anyway...

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 13d711dd8776..415b003e475c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -225,7 +225,8 @@ unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat)
>  
>  	nr = node_page_state_snapshot(pgdat, NR_ACTIVE_FILE) +
>  	     node_page_state_snapshot(pgdat, NR_INACTIVE_FILE) +
> -	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE);
> +	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE) +
> +	     (node_page_state_snapshot(pgdat, NR_METADATA_BYTES) >> PAGE_SHIFT);
>  
>  	if (get_nr_swap_pages() > 0)
>  		nr += node_page_state_snapshot(pgdat, NR_ACTIVE_ANON) +

This function never gets called in current kernel. I'll send a patch to
remove it.

> @@ -3812,6 +3813,7 @@ static inline unsigned long node_unmapped_file_pages(struct pglist_data *pgdat)
>  static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>  {
>  	unsigned long nr_pagecache_reclaimable;
> +	unsigned long nr_metadata_reclaimable;
>  	unsigned long delta = 0;
>  
>  	/*
> @@ -3833,7 +3835,20 @@ static unsigned long node_pagecache_reclaimable(struct pglist_data *pgdat)
>  	if (unlikely(delta > nr_pagecache_reclaimable))
>  		delta = nr_pagecache_reclaimable;
>  
> -	return nr_pagecache_reclaimable - delta;
> +	nr_metadata_reclaimable =
> +		node_page_state(pgdat, NR_METADATA_BYTES) >> PAGE_SHIFT;
> +	/*
> +	 * We don't do writeout through the shrinkers so subtract any
> +	 * dirty/writeback metadata bytes from the reclaimable count.
> +	 */
> +	if (nr_metadata_reclaimable) {
> +		unsigned long unreclaimable =
> +			node_page_state(pgdat, NR_METADATA_DIRTY_BYTES) +
> +			node_page_state(pgdat, NR_METADATA_WRITEBACK_BYTES);
> +		unreclaimable >>= PAGE_SHIFT;
> +		nr_metadata_reclaimable -= unreclaimable;
> +	}
> +	return nr_metadata_reclaimable + nr_pagecache_reclaimable - delta;
>  }

So I've checked both places that use this function and I think they are fine
with the change. However, it would still be good to have someone more
knowledgeable about the reclaim paths look at this patch.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


Thread overview: 19+ messages
2017-11-14 21:56 [PATCH 01/10] remove mapping from balance_dirty_pages*() Josef Bacik
2017-11-14 21:56 ` [PATCH 02/10] writeback: convert WB_WRITTEN/WB_DIRITED counters to bytes Josef Bacik
2017-11-16 23:45   ` Liu Bo
2017-11-14 21:56 ` [PATCH 03/10] lib: add a batch size to fprop_global Josef Bacik
2017-11-22  8:47   ` Jan Kara
2017-11-22  8:54     ` Jan Kara
2017-11-14 21:56 ` [PATCH 04/10] lib: add a __fprop_add_percpu_max Josef Bacik
2017-11-14 21:56 ` [PATCH 05/10] writeback: convert the flexible prop stuff to bytes Josef Bacik
2017-11-14 21:56 ` [PATCH 06/10] writeback: add counters for metadata usage Josef Bacik
2017-11-22 10:21   ` Jan Kara [this message]
2017-11-14 21:56 ` [PATCH 07/10] writeback: introduce super_operations->write_metadata Josef Bacik
2017-11-14 21:56 ` [PATCH 08/10] export radix_tree_iter_tag_set Josef Bacik
2017-11-14 21:56 ` [PATCH 09/10] Btrfs: kill the btree_inode Josef Bacik
2017-11-17  1:03   ` Liu Bo
2017-11-17  1:13     ` Josef Bacik
2017-11-14 21:56 ` [PATCH 10/10] btrfs: rework end io for extent buffer reads Josef Bacik
2017-11-17  1:24   ` Liu Bo
2017-11-16 23:36 ` [PATCH 01/10] remove mapping from balance_dirty_pages*() Liu Bo
2017-11-21 22:45 ` Andrew Morton
