Re: [PATCH v9 6/7] mm: zswap: Support large folios in zswap_store().

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Johannes Weiner <hannes@cmpxchg.org>
To: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	yosryahmed@google.com, nphamcs@gmail.com,
	chengming.zhou@linux.dev, usamaarif642@gmail.com,
	shakeel.butt@linux.dev, ryan.roberts@arm.com,
	ying.huang@intel.com, 21cnbao@gmail.com,
	akpm@linux-foundation.org, willy@infradead.org,
	nanhai.zou@intel.com, wajdi.k.feghali@intel.com,
	vinodh.gopal@intel.com
Subject: Re: [PATCH v9 6/7] mm: zswap: Support large folios in zswap_store().
Date: Mon, 30 Sep 2024 21:14:24 -0400	[thread overview]
Message-ID: <20241001011424.GB1349@cmpxchg.org> (raw)
In-Reply-To: <20240930221221.6981-7-kanchana.p.sridhar@intel.com>

On Mon, Sep 30, 2024 at 03:12:20PM -0700, Kanchana P Sridhar wrote:
>  /*********************************
>  * main API
>  **********************************/
> -bool zswap_store(struct folio *folio)
> +
> +/*
> + * Stores the page at specified "index" in a folio.

There is no more index and no folio in this function.

> + *
> + * @page:  The page to store in zswap.
> + * @objcg: The folio's objcg. Caller has a reference.
> + * @pool:  The zswap_pool to store the compressed data for the page.
> + *         The caller should have obtained a reference to a valid
> + *         zswap_pool by calling zswap_pool_tryget(), to pass as this
> + *         argument.
> + * @tree:  The xarray for the @page's folio's swap.

This doesn't look safe.

If the entries were to span a SWAP_ADDRESS_SPACE_SHIFT boundary, the
subpage entries would need to be spread out to different trees also.
Otherwise, it would break loading and writeback down the line.

I *think* it works right now, but at best it's not very future proof.
Please look up the tree inside the function for the specific
swp_entry_t that is being stored.

Same for the unwind/check_old: section.

> + * @compressed_bytes: The compressed entry->length value is added
> + *                    to this, so that the caller can get the total
> + *                    compressed lengths of all sub-pages in a folio.
> + */

With just one caller, IMO the function comment can be dropped...

>  	/* allocate entry */
> -	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
> +	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(page_folio(page)));

page_to_nid() is safe to use here.

> +bool zswap_store(struct folio *folio)
> +{
> +	long nr_pages = folio_nr_pages(folio);
> +	swp_entry_t swp = folio->swap;
> +	struct xarray *tree = swap_zswap_tree(swp);
> +	struct obj_cgroup *objcg = NULL;
> +	struct mem_cgroup *memcg = NULL;
> +	struct zswap_pool *pool;
> +	size_t compressed_bytes = 0;
> +	bool ret = false;
> +	long index;
> +
> +	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> +	VM_WARN_ON_ONCE(!folio_test_swapcache(folio));
> +
> +	if (!zswap_enabled)
> +		goto check_old;
> +
> +	/*
> +	 * Check cgroup zswap limits:
> +	 *
> +	 * The cgroup zswap limit check is done once at the beginning of
> +	 * zswap_store(). The cgroup charging is done once, at the end
> +	 * of a successful folio store. What this means is, if the cgroup
> +	 * was within the zswap_max limit at the beginning of a large folio
> +	 * store, it could go over the limit by at most (HPAGE_PMD_NR - 1)
> +	 * pages due to this store.
> +	 */
> +	objcg = get_obj_cgroup_from_folio(folio);
> +	if (objcg && !obj_cgroup_may_zswap(objcg)) {
> +		memcg = get_mem_cgroup_from_objcg(objcg);
> +		if (shrink_memcg(memcg)) {
> +			mem_cgroup_put(memcg);
> +			goto put_objcg;
> +		}
> +		mem_cgroup_put(memcg);
> +	}
> +
> +	/*
> +	 * Check zpool utilization against zswap limits:
> +	 *
> +	 * The zswap zpool utilization is also checked against the limits
> +	 * just once, at the start of zswap_store(). If the check passes,
> +	 * any breaches of the limits set by zswap_max_pages() or
> +	 * zswap_accept_thr_pages() that may happen while storing this
> +	 * folio, will only be detected during the next call to
> +	 * zswap_store() by any process.
> +	 */
> +	if (zswap_check_limits())
> +		goto put_objcg;

There has been some back and forth on those comments. Both checks are
non-atomic and subject to races, so mentioning the HPAGE_PMD_NR - 1
overrun is somewhat misleading - it's much higher in the worst case.

Honestly, I would just get rid of the comments. You're not changing
anything fundamental in this regard, so I don't think there is a
burden to add new comments either.

> +
> +	pool = zswap_pool_current_get();
> +	if (!pool)
> +		goto put_objcg;
> +
> +	if (objcg) {
> +		memcg = get_mem_cgroup_from_objcg(objcg);
> +		if (memcg_list_lru_alloc(memcg, &zswap_list_lru, GFP_KERNEL)) {
> +			mem_cgroup_put(memcg);
> +			goto put_pool;
> +		}
> +		mem_cgroup_put(memcg);
> +	}
> +
> +	/*
> +	 * Store each page of the folio as a separate entry. If we fail to
> +	 * store a page, unwind by deleting all the pages for this folio
> +	 * currently in zswap.
> +	 */

The first sentence explains something that is internal to
zswap_store_page(). The second sentence IMO is obvious from the code
itself. I think you can delete this comment.

> +	for (index = 0; index < nr_pages; ++index) {
> +		if (!zswap_store_page(folio_page(folio, index), objcg, pool, tree, &compressed_bytes))
> +			goto put_pool;

Hah, I'm not a stickler for the 80 column line limit, but this is
pushing it ;)

Please grab the page up front.

Yosry had also suggested replacing the compressed_bytes return
parameter with an actual return value. Basically, return compressed
bytes on success, -errno on error. I think this comment was missed
among the page_swap_entry() discussion.

	for (index = 0; index < nr_pages; index++) {
		struct page *page = folio_page(folio, index);
		int bytes;

		bytes = zswap_store_page(page, object, pool, tree);
		if (bytes < 0)
			goto put_pool;
		total_bytes += bytes;
	}

next prev parent reply	other threads:[~2024-10-01  1:14 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-30 22:12 [PATCH v9 0/7] mm: zswap swap-out of large folios Kanchana P Sridhar
2024-09-30 22:12 ` [PATCH v9 1/7] mm: Define obj_cgroup_get() if CONFIG_MEMCG is not defined Kanchana P Sridhar
2024-09-30 22:12 ` [PATCH v9 2/7] mm: zswap: Modify zswap_compress() to accept a page instead of a folio Kanchana P Sridhar
2024-09-30 22:16   ` Yosry Ahmed
2024-09-30 22:12 ` [PATCH v9 3/7] mm: zswap: Rename zswap_pool_get() to zswap_pool_tryget() Kanchana P Sridhar
2024-09-30 22:12 ` [PATCH v9 4/7] mm: Change count_objcg_event() to count_objcg_events() for batch event updates Kanchana P Sridhar
2024-09-30 22:18   ` Yosry Ahmed
2024-09-30 22:59   ` Nhat Pham
2024-10-01  0:22   ` Johannes Weiner
2024-09-30 22:12 ` [PATCH v9 5/7] mm: zswap: Modify zswap_stored_pages to be atomic_long_t Kanchana P Sridhar
2024-09-30 22:12 ` [PATCH v9 6/7] mm: zswap: Support large folios in zswap_store() Kanchana P Sridhar
2024-09-30 23:09   ` Yosry Ahmed
2024-10-01  0:35     ` Sridhar, Kanchana P
2024-10-01  6:00       ` Yosry Ahmed
2024-10-01 16:58         ` Sridhar, Kanchana P
2024-10-01 17:01           ` Yosry Ahmed
2024-10-01 17:09             ` Sridhar, Kanchana P
2024-10-01 21:53               ` Sridhar, Kanchana P
2024-10-01 23:38                 ` Yosry Ahmed
2024-10-01 23:44                   ` Sridhar, Kanchana P
2024-09-30 23:11   ` Nhat Pham
2024-09-30 23:19     ` Yosry Ahmed
2024-09-30 23:29       ` Nhat Pham
2024-09-30 23:33         ` Yosry Ahmed
2024-09-30 23:43           ` Nhat Pham
2024-10-01  0:47             ` Sridhar, Kanchana P
2024-10-01  1:14   ` Johannes Weiner [this message]
2024-10-01  1:53     ` Sridhar, Kanchana P
2024-09-30 22:12 ` [PATCH v9 7/7] mm: swap: Count successful large folio zswap stores in hugepage zswpout stats Kanchana P Sridhar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241001011424.GB1349@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=kanchana.p.sridhar@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nanhai.zou@intel.com \
    --cc=nphamcs@gmail.com \
    --cc=ryan.roberts@arm.com \
    --cc=shakeel.butt@linux.dev \
    --cc=usamaarif642@gmail.com \
    --cc=vinodh.gopal@intel.com \
    --cc=wajdi.k.feghali@intel.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.