All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Nhat Pham <nphamcs@gmail.com>,
	 Chengming Zhou <zhouchengming@bytedance.com>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: zswap: optimize zswap pool size tracking
Date: Mon, 11 Mar 2024 22:09:35 +0000	[thread overview]
Message-ID: <Ze-BH5HDNUG5ohJS@google.com> (raw)
In-Reply-To: <20240311161214.1145168-1-hannes@cmpxchg.org>

On Mon, Mar 11, 2024 at 12:12:13PM -0400, Johannes Weiner wrote:
> Profiling the munmap() of a zswapped memory region shows 50%(!) of the
> total cycles currently going into updating the zswap_pool_total_size.

Yikes. I have always hated that size update scheme FWIW.

I have also wondered whether it makes sense to just maintain the number
of pages in zswap as an atomic, like zswap_stored_pages. I guess your
proposed scheme is even cheaper for the load/invalidate paths because we
do nothing at all.  It could be an option if the aggregation in other
paths ever becomes a problem, but we would need to make sure it
doesn't regress the load/invalidate paths. Just sharing some thoughts.

> 
> There are three consumers of this counter:
> - store, to enforce the globally configured pool limit
> - meminfo & debugfs, to report the size to the user
> - shrink, to determine the batch size for each cycle
> 
> Instead of aggregating everytime an entry enters or exits the zswap
> pool, aggregate the value from the zpools on-demand:
> 
> - Stores aggregate the counter anyway upon success. Aggregating to
>   check the limit instead is the same amount of work.
> 
> - Meminfo & debugfs might benefit somewhat from a pre-aggregated
>   counter, but aren't exactly hotpaths.
> 
> - Shrinking can aggregate once for every cycle instead of doing it for
>   every freed entry. As the shrinker might work on tens or hundreds of
>   objects per scan cycle, this is a large reduction in aggregations.
> 
> The paths that benefit dramatically are swapin, swapoff, and
> unmaps. There could be millions of pages being processed until
> somebody asks for the pool size again. This eliminates the pool size
> updates from those paths entirely.

This looks like a big win, thanks! I wonder if you have any numbers of
perf profiles to share. That would be nice to have, but I think the
benefit is clear regardless.

I also like the implicit cleanup when we switch to maintaining the
number of pages rather than bytes. The code looks much better with all
the shifts and divisions gone :)

I have a couple of comments below. With them addressed, feel free to
add:
Acked-by: Yosry Ahmed <yosryahmed@google.com>

[..]  
> @@ -1385,6 +1365,10 @@ static void shrink_worker(struct work_struct *w)
>  {
>  	struct mem_cgroup *memcg;
>  	int ret, failures = 0;
> +	unsigned long thr;
> +
> +	/* Reclaim down to the accept threshold */
> +	thr = zswap_max_pages() * zswap_accept_thr_percent / 100;

This calculation is repeated twice, so I'd rather keep a helper for it
as an alternative to zswap_can_accept(). Perhaps zswap_threshold_page()
or zswap_acceptance_pages()?

>  
>  	/* global reclaim will select cgroup in a round-robin fashion. */
>  	do {
> @@ -1432,10 +1416,9 @@ static void shrink_worker(struct work_struct *w)
>  			break;
>  		if (ret && ++failures == MAX_RECLAIM_RETRIES)
>  			break;
> -
>  resched:
>  		cond_resched();
> -	} while (!zswap_can_accept());
> +	} while (zswap_total_pages() > thr);
>  }
[..]
> @@ -1711,6 +1700,13 @@ void zswap_swapoff(int type)
>  
>  static struct dentry *zswap_debugfs_root;
>  
> +static int debugfs_get_total_size(void *data, u64 *val)
> +{
> +	*val = zswap_total_pages() * PAGE_SIZE;
> +	return 0;
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(total_size_fops, debugfs_get_total_size, NULL, "%llu");

I think we are missing a newline here to maintain the current format
(i.e "%llu\n").

> +
>  static int zswap_debugfs_init(void)
>  {
>  	if (!debugfs_initialized())
> @@ -1732,8 +1728,8 @@ static int zswap_debugfs_init(void)
>  			   zswap_debugfs_root, &zswap_reject_compress_poor);
>  	debugfs_create_u64("written_back_pages", 0444,
>  			   zswap_debugfs_root, &zswap_written_back_pages);
> -	debugfs_create_u64("pool_total_size", 0444,
> -			   zswap_debugfs_root, &zswap_pool_total_size);
> +	debugfs_create_file("pool_total_size", 0444,
> +			    zswap_debugfs_root, NULL, &total_size_fops);
>  	debugfs_create_atomic_t("stored_pages", 0444,
>  				zswap_debugfs_root, &zswap_stored_pages);
>  	debugfs_create_atomic_t("same_filled_pages", 0444,
> -- 
> 2.44.0
> 


  parent reply	other threads:[~2024-03-11 22:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-11 16:12 [PATCH 1/2] mm: zswap: optimize zswap pool size tracking Johannes Weiner
2024-03-11 16:12 ` [PATCH 2/2] mm: zpool: return pool size in pages Johannes Weiner
2024-03-11 22:12   ` Yosry Ahmed
2024-03-12  2:36     ` Johannes Weiner
2024-03-12  4:07       ` Yosry Ahmed
2024-03-12  4:56   ` Chengming Zhou
2024-03-12  9:15   ` Nhat Pham
2024-03-11 22:09 ` Yosry Ahmed [this message]
2024-03-12  2:34   ` [PATCH 1/2] mm: zswap: optimize zswap pool size tracking Johannes Weiner
2024-03-12  4:04     ` Yosry Ahmed
2024-03-12  4:55 ` Chengming Zhou
2024-03-12  9:12 ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ze-BH5HDNUG5ohJS@google.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nphamcs@gmail.com \
    --cc=zhouchengming@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.