Re: [PATCH v7] zswap: replace RB tree with xarray

From: Yosry Ahmed <yosryahmed@google.com>
To: Chris Li <chrisl@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,  linux-mm@kvack.org,
	Nhat Pham <nphamcs@gmail.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	 Chengming Zhou <zhouchengming@bytedance.com>,
	Barry Song <v-songbaohua@oppo.com>
Subject: Re: [PATCH v7] zswap: replace RB tree with xarray
Date: Wed, 20 Mar 2024 06:13:29 +0000
Message-ID: <Zfp-iWaDfqeCOElt@google.com>
In-Reply-To: <20240319-zswap-xarray-v7-1-e9a03a049e86@kernel.org>

On Tue, Mar 19, 2024 at 10:52:26PM -0700, Chris Li wrote:
> Very deep RB tree requires rebalance at times. That
> contributes to the zswap fault latencies. Xarray does not
> need to perform tree rebalance. Replacing RB tree to xarray
> can have some small performance gain.
> 
> One small difference is that xarray insert might fail with
> ENOMEM, while RB tree insert does not allocate additional
> memory.
> 
> The zswap_entry size will reduce a bit due to removing the
> RB node, which has two pointers and a color field. Xarray
> store the pointer in the xarray tree rather than the
> zswap_entry. Every entry has one pointer from the xarray
> tree. Overall, switching to xarray should save some memory,
> if the swap entries are densely packed.
> 
> Notice the zswap_rb_search and zswap_rb_insert always
> followed by zswap_rb_erase. Use xa_erase and xa_store
> directly. That saves one tree lookup as well.
> 
> Remove zswap_invalidate_entry due to no need to call
> zswap_rb_erase any more. Use zswap_free_entry instead.
> 
> The "struct zswap_tree" has been replaced by "struct xarray".
> The tree spin lock has transferred to the xarray lock.
> 
> Run the kernel build testing 10 times for each version, averages:
> (memory.max=2GB, zswap shrinker and writeback enabled,
> one 50GB swapfile, 24 HT core, 32 jobs)
> 
>            mm-unstable-a824831a082f     xarray v7
> user       3547.264                     3541.509
> sys        531.176                      526.111
> real       200.752                      201.334
> 
> ---

I believe there shouldn't be a "---" separator before the Reviewed-by and
Signed-off-by tags below; anything after it is dropped when the patch is
applied.

> Reviewed-by: Nhat Pham <nphamcs@gmail.com>
> 
> Signed-off-by: Chris Li <chrisl@kernel.org>

I have some comments below. With them addressed:

Acked-by: Yosry Ahmed <yosryahmed@google.com>

[..]
> @@ -1556,28 +1474,43 @@ bool zswap_store(struct folio *folio)
>  insert_entry:
>  	entry->swpentry = swp;
>  	entry->objcg = objcg;
> +
> +	old = xa_store(tree, offset, entry, GFP_KERNEL);
> +	if (xa_is_err(old)) {
> +		int err = xa_err(old);

There should be a blank line after the declaration.

> +		WARN_ONCE(err != -ENOMEM, "unexpected xarray error: %d\n", err);
> +		zswap_reject_alloc_fail++;
> +		goto store_failed;
> +	}
> +
> +        /*
> +         * We may have had an existing entry that became stale when
> +         * the folio was redirtied and now the new version is being
> +         * swapped out. Get rid of the old.
> +         */

This comment is mis-indented (spaces instead of tabs).

checkpatch.pl would have caught both of these, btw.
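
For illustration, here is a minimal user-space sketch of how that hunk
could look with both style points fixed: a blank line after the `err`
declaration, and the comment indented with tabs to match the code around
it. `struct toy_tree`, `toy_store()`, `toy_is_err()`, `toy_err()` and
`toy_insert()` are hypothetical stand-ins I made up so the snippet
compiles on its own. They are not kernel APIs, and they only mimic the
"store returns the old entry or an error" pattern used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical one-slot stand-in for the per-swapfile tree. */
struct toy_tree {
	void *slot;
	int fail_next;	/* when set, the next store fails with -ENOMEM */
};

/* Encode a negative errno as a pointer value (illustrative only). */
static void *toy_err_ptr(int err)
{
	return (void *)(intptr_t)err;
}

static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-4095;
}

static int toy_err(const void *p)
{
	return toy_is_err(p) ? (int)(intptr_t)p : 0;
}

/* Store @entry and return the previous occupant, or an error pointer. */
static void *toy_store(struct toy_tree *tree, void *entry)
{
	void *old;

	if (tree->fail_next) {
		tree->fail_next = 0;
		return toy_err_ptr(-ENOMEM);
	}

	old = tree->slot;
	tree->slot = entry;
	return old;
}

/* Returns 0 on success, or a negative errno if the store failed. */
static int toy_insert(struct toy_tree *tree, void *entry)
{
	void *old;

	assert(tree && entry);

	old = toy_store(tree, entry);
	if (toy_is_err(old)) {
		int err = toy_err(old);

		return err;
	}

	/*
	 * We may have had an existing entry that became stale.
	 * Get rid of the old.
	 */
	if (old)
		free(old);

	return 0;
}
```

The single store both publishes the new entry and hands back the stale
one, which is why no separate lookup or erase is needed in this sketch.
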

> +	if (old)
> +		zswap_entry_free(old);
> +
>  	if (objcg) {
>  		obj_cgroup_charge_zswap(objcg, entry->length);
> -		/* Account before objcg ref is moved to tree */
>  		count_objcg_event(objcg, ZSWPOUT);
>  	}
>  
> -	/* map */
> -	spin_lock(&tree->lock);
>  	/*
> -	 * The folio may have been dirtied again, invalidate the
> -	 * possibly stale entry before inserting the new entry.
> +	 * We finish initializing the entry while it's already in xarray.
> +	 * This is safe because:
> +	 *
> +	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> +	 *
> +	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> +	 *    The publishing order matters to prevent writeback from seeing
> +	 *    an incoherent entry.

As I mentioned before, writeback is also protected by the folio lock.
Concurrent writeback will find the folio in the swapcache and abort. The
fact that the entry is not on the LRU yet is just additional protection,
so I don't think the publishing order actually matters here. Right?
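
To make the argument concrete, here is a rough user-space model of it,
not the actual zswap code. `struct toy_slot` and the `toy_*()` helpers
are hypothetical, and the error codes are picked only for illustration.
The model assumes, as described above, that writeback backs off whenever
it finds the folio in the swapcache. `toy_writeback()` deliberately
ignores the LRU flag to show that this check alone keeps writeback away
from an entry that is published but not yet fully initialized.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one swap slot; not a kernel structure. */
struct toy_slot {
	bool in_swapcache;	/* folio for this slot is in the swapcache */
	bool entry_in_tree;	/* entry has been published in the tree */
	bool entry_on_lru;	/* entry is reachable through the LRU */
	bool entry_ready;	/* entry is fully initialized */
};

/* Store path, step 1: publish the entry before it is fully set up. */
static void toy_store_begin(struct toy_slot *s)
{
	/* The folio being stored is locked and in the swapcache. */
	assert(s->in_swapcache);
	s->entry_in_tree = true;
}

/* Store path, step 2: finish initialization, then add to the LRU. */
static void toy_store_finish(struct toy_slot *s)
{
	s->entry_ready = true;
	s->entry_on_lru = true;
}

/*
 * Writeback attempt. Returns 0 if writeback would proceed, -ENOENT if
 * there is no entry, and -EEXIST if it aborts because the folio was
 * found in the swapcache. The LRU flag is intentionally not consulted.
 */
static int toy_writeback(const struct toy_slot *s)
{
	if (!s->entry_in_tree)
		return -ENOENT;
	if (s->in_swapcache)
		return -EEXIST;
	return 0;
}
```

In this model, a writeback attempt between `toy_store_begin()` and
`toy_store_finish()` always returns -EEXIST, whatever the order in which
the tree insertion and the LRU addition are done.
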
