From: SeongJae Park <sj@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: SeongJae Park <sj@kernel.org>,
	akpm@linux-foundation.org, david@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com,
	rppt@kernel.org, mhocko@suse.com, corbet@lwn.net,
	axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org,
	hch@infradead.org, jack@suse.cz, willy@infradead.org,
	m.szyprowski@samsung.com, robin.murphy@arm.com,
	hannes@cmpxchg.org, zhengqi.arch@bytedance.com,
	shakeel.butt@linux.dev, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com, minchan@kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev,
	Minchan Kim <minchan@google.com>
Subject: Re: [PATCH 7/8] mm: introduce GCMA
Date: Fri, 10 Oct 2025 14:11:01 -0700
Message-ID: <20251010211101.59275-1-sj@kernel.org>
In-Reply-To: <20251010011951.2136980-8-surenb@google.com>

Hello Suren,

On Thu,  9 Oct 2025 18:19:50 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> From: Minchan Kim <minchan@google.com>
> 
> This patch introduces the GCMA (Guaranteed Contiguous Memory Allocator)
> cleancache backend, which reserves some amount of memory at boot and
> then donates it to the cleancache to store clean file-backed pages.
> GCMA aims to guarantee contiguous memory allocation success as well as
> low and deterministic allocation latency.
> 
> Notes:
> Originally, the idea was posted by SeongJae Park and Minchan Kim [1].
> Later Minchan reworked it for use in Android, as a reference for
> Android vendors [2].
> 
> [1] https://lwn.net/Articles/619865/
> [2] https://android-review.googlesource.com/q/topic:%22gcma_6.12%22
> 
> Signed-off-by: Minchan Kim <minchan@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  MAINTAINERS          |   2 +
>  include/linux/gcma.h |  36 +++++++
>  mm/Kconfig           |  15 +++
>  mm/Makefile          |   1 +
>  mm/gcma.c            | 231 +++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 285 insertions(+)
>  create mode 100644 include/linux/gcma.h
>  create mode 100644 mm/gcma.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 441e68c94177..95b5ad26ec11 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16361,6 +16361,7 @@ F:	Documentation/admin-guide/mm/
>  F:	Documentation/mm/
>  F:	include/linux/cma.h
>  F:	include/linux/dmapool.h
> +F:	include/linux/gcma.h
>  F:	include/linux/ioremap.h
>  F:	include/linux/memory-tiers.h
>  F:	include/linux/page_idle.h
> @@ -16372,6 +16373,7 @@ F:	mm/dmapool.c
>  F:	mm/dmapool_test.c
>  F:	mm/early_ioremap.c
>  F:	mm/fadvise.c
> +F:	mm/gcma.c
>  F:	mm/ioremap.c
>  F:	mm/mapping_dirty_helpers.c
>  F:	mm/memory-tiers.c
> diff --git a/include/linux/gcma.h b/include/linux/gcma.h
> new file mode 100644
> index 000000000000..20b2c85de87b
> --- /dev/null
> +++ b/include/linux/gcma.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __GCMA_H__
> +#define __GCMA_H__
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_GCMA
> +
> +int gcma_register_area(const char *name,
> +		       unsigned long start_pfn, unsigned long count);
> +
> +/*
> + * NOTE: allocated pages are still marked reserved and when freeing them
> + * the caller should ensure they are isolated and not referenced by anyone
> + * other than the caller.
> + */
> +int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp);
> +int gcma_free_range(unsigned long start_pfn, unsigned long count);
> +
> +#else /* CONFIG_GCMA */
> +
> +static inline int gcma_register_area(const char *name,
> +				     unsigned long start_pfn,
> +				     unsigned long count)
> +		{ return -EOPNOTSUPP; }
> +static inline int gcma_alloc_range(unsigned long start_pfn,
> +				   unsigned long count, gfp_t gfp)
> +		{ return -EOPNOTSUPP; }
> +
> +static inline int gcma_free_range(unsigned long start_pfn,
> +				   unsigned long count)
> +		{ return -EOPNOTSUPP; }
> +
> +#endif /* CONFIG_GCMA */
> +
> +#endif /* __GCMA_H__ */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 9f4da8a848f4..41ce5ef8db55 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1013,6 +1013,21 @@ config CMA_AREAS
>  
>  	  If unsure, leave the default value "8" in UMA and "20" in NUMA.
>  
> +config GCMA
> +       bool "GCMA (Guaranteed Contiguous Memory Allocator)"
> +       depends on CLEANCACHE
> +	help
> +	  This enables the Guaranteed Contiguous Memory Allocator to allow
> +	  low latency guaranteed contiguous memory allocations. Memory
> +	  reserved by GCMA is donated to cleancache to be used as pagecache
> +	  extension. Once GCMA allocation is requested, necessary pages are
> +	  taken back from the cleancache and used to satisfy the request.
> +	  Cleancache guarantees low latency successful allocation as long
> +	  as the total size of GCMA allocations does not exceed the size of
> +	  the memory donated to the cleancache.
> +
> +	  If unsure, say "N".
> +
>  #
>  # Select this config option from the architecture Kconfig, if available, to set
>  # the max page order for physically contiguous allocations.
> diff --git a/mm/Makefile b/mm/Makefile
> index 845841a140e3..05aee66a8b07 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -149,3 +149,4 @@ obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
>  obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
>  obj-$(CONFIG_CLEANCACHE) += cleancache.o
>  obj-$(CONFIG_CLEANCACHE_SYSFS)	+= cleancache_sysfs.o
> +obj-$(CONFIG_GCMA)	+= gcma.o
> diff --git a/mm/gcma.c b/mm/gcma.c
> new file mode 100644
> index 000000000000..3ee0e1340db3
> --- /dev/null
> +++ b/mm/gcma.c
> @@ -0,0 +1,231 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * GCMA (Guaranteed Contiguous Memory Allocator)
> + *
> + */
> +
> +#define pr_fmt(fmt) "gcma: " fmt
> +
> +#include <linux/cleancache.h>
> +#include <linux/gcma.h>
> +#include <linux/hashtable.h>
> +#include <linux/highmem.h>
> +#include <linux/idr.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include "internal.h"
> +
> +#define MAX_GCMA_AREAS		64
> +#define GCMA_AREA_NAME_MAX_LEN	32
> +
> +struct gcma_area {
> +	int pool_id;
> +	unsigned long start_pfn;
> +	unsigned long end_pfn;
> +	char name[GCMA_AREA_NAME_MAX_LEN];
> +};
> +
> +static struct gcma_area areas[MAX_GCMA_AREAS];
> +static atomic_t nr_gcma_area = ATOMIC_INIT(0);
> +static DEFINE_SPINLOCK(gcma_area_lock);
> +
> +static int free_folio_range(struct gcma_area *area,
> +			     unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long scanned = 0;
> +	struct folio *folio;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		int err;
> +
> +		if (!(++scanned % XA_CHECK_SCHED))
> +			cond_resched();
> +
> +		folio = pfn_folio(pfn);
> +		err = cleancache_backend_put_folio(area->pool_id, folio);

Why don't you pass pfn_folio(pfn) to cleancache_backend_put_folio()
directly, like alloc_folio_range() does, instead of going through the
local 'folio' variable?

> +		if (WARN(err, "PFN %lu: folio is still in use\n", pfn))
> +			return -EINVAL;

Why don't you return err, like alloc_folio_range() does?
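
A minimal, untested sketch combining both suggestions:

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		int err;

		if (!(++scanned % XA_CHECK_SCHED))
			cond_resched();

		/* pass pfn_folio(pfn) directly, and propagate err */
		err = cleancache_backend_put_folio(area->pool_id,
						   pfn_folio(pfn));
		if (WARN(err, "PFN %lu: folio is still in use\n", pfn))
			return err;
	}

This would also drop the need for the local 'folio' variable.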

> +	}
> +
> +	return 0;
> +}
> +
> +static int alloc_folio_range(struct gcma_area *area,
> +			      unsigned long start_pfn, unsigned long end_pfn,
> +			      gfp_t gfp)
> +{
> +	unsigned long scanned = 0;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		int err;
> +
> +		if (!(++scanned % XA_CHECK_SCHED))
> +			cond_resched();
> +
> +		err = cleancache_backend_get_folio(area->pool_id, pfn_folio(pfn));
> +		if (err) {
> +			free_folio_range(area, start_pfn, pfn);
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct gcma_area *find_area(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	int nr_area = atomic_read_acquire(&nr_gcma_area);
> +	int i;
> +
> +	for (i = 0; i < nr_area; i++) {
> +		struct gcma_area *area = &areas[i];
> +
> +		if (area->end_pfn <= start_pfn)
> +			continue;
> +
> +		if (area->start_pfn > end_pfn)
> +			continue;
> +
> +		/* The entire range should belong to a single area */
> +		if (start_pfn < area->start_pfn || end_pfn > area->end_pfn)
> +			break;
> +
> +		/* Found the area containing the entire range */
> +		return area;
> +	}
> +
> +	return NULL;
> +}
> +
> +int gcma_register_area(const char *name,
> +		       unsigned long start_pfn, unsigned long count)
> +{
> +	LIST_HEAD(folios);
> +	int i, pool_id;
> +	int nr_area;
> +	int ret = 0;
> +
> +	pool_id = cleancache_backend_register_pool(name);
> +	if (pool_id < 0)
> +		return pool_id;
> +
> +	for (i = 0; i < count; i++) {
> +		struct folio *folio;
> +
> +		folio = pfn_folio(start_pfn + i);
> +		folio_clear_reserved(folio);
> +		folio_set_count(folio, 0);
> +		list_add(&folio->lru, &folios);
> +	}
> +
> +	cleancache_backend_put_folios(pool_id, &folios);
> +
> +	spin_lock(&gcma_area_lock);
> +
> +	nr_area = atomic_read(&nr_gcma_area);
> +	if (nr_area < MAX_GCMA_AREAS) {
> +		struct gcma_area *area = &areas[nr_area];
> +
> +		area->pool_id = pool_id;
> +		area->start_pfn = start_pfn;
> +		area->end_pfn = start_pfn + count;
> +		strscpy(area->name, name);
> +		/* Ensure above stores complete before we increase the count */
> +		atomic_set_release(&nr_gcma_area, nr_area + 1);
> +	} else {
> +		ret = -ENOMEM;
> +	}
> +
> +	spin_unlock(&gcma_area_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(gcma_register_area);
> +
> +int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp)
> +{
> +	unsigned long end_pfn = start_pfn + count;
> +	struct gcma_area *area;
> +	struct folio *folio;
> +	int err, order = 0;
> +
> +	gfp = current_gfp_context(gfp);
> +	if (gfp & __GFP_COMP) {
> +		if (!is_power_of_2(count))
> +			return -EINVAL;
> +
> +		order = ilog2(count);
> +		if (order >= MAX_PAGE_ORDER)
> +			return -EINVAL;
> +	}
> +
> +	area = find_area(start_pfn, end_pfn);
> +	if (!area)
> +		return -EINVAL;
> +
> +	err = alloc_folio_range(area, start_pfn, end_pfn, gfp);
> +	if (err)
> +		return err;
> +
> +	/*
> +	 * GCMA returns pages with refcount 1 and expects them to have
> +	 * the same refcount 1 when they are freed.
> +	 */
> +	if (order) {
> +		folio = pfn_folio(start_pfn);
> +		set_page_count(&folio->page, 1);
> +		prep_compound_page(&folio->page, order);
> +	} else {
> +		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			folio = pfn_folio(pfn);
> +			set_page_count(&folio->page, 1);
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gcma_alloc_range);

I'm wondering whether the rule of exporting symbols only when in-tree
modules use them should be applied here, and if not, why.

> +
> +int gcma_free_range(unsigned long start_pfn, unsigned long count)
> +{
> +	unsigned long end_pfn = start_pfn + count;
> +	struct gcma_area *area;
> +	struct folio *folio;
> +
> +	area = find_area(start_pfn, end_pfn);
> +	if (!area)
> +		return -EINVAL;
> +
> +	folio = pfn_folio(start_pfn);
> +	if (folio_test_large(folio)) {
> +		int expected = folio_nr_pages(folio);

folio_nr_pages() returns 'unsigned long'.  Would it be better to match the type?
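
For example, a rough sketch, with the WARN() format specifier updated to
match the type:

	unsigned long expected = folio_nr_pages(folio);

	if (WARN(count != expected, "PFN %lu: count %lu != expected %lu\n",
		 start_pfn, count, expected))
		return -EINVAL;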

> +
> +		if (WARN(count != expected, "PFN %lu: count %lu != expected %d\n",
> +			  start_pfn, count, expected))
> +			return -EINVAL;
> +
> +		if (WARN(!folio_ref_dec_and_test(folio),
> +			 "PFN %lu: invalid folio refcount when freeing\n", start_pfn))
> +			return -EINVAL;
> +
> +		free_pages_prepare(&folio->page, folio_order(folio));
> +	} else {
> +		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			folio = pfn_folio(pfn);
> +			if (folio_nr_pages(folio) == 1)
> +				count--;
> +
> +			if (WARN(!folio_ref_dec_and_test(folio),
> +				 "PFN %lu: invalid folio refcount when freeing\n", pfn))
> +				return -EINVAL;

Don't we need to increase the previously decreased folio refcounts?
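
A rough, untested sketch of the idea (note this ignores that
free_pages_prepare() has already been called for the earlier folios,
which might need undoing as well):

	if (WARN(!folio_ref_dec_and_test(folio),
		 "PFN %lu: invalid folio refcount when freeing\n", pfn)) {
		/* undo the refcount drops done so far in this loop */
		folio_ref_inc(folio);
		while (pfn-- > start_pfn)
			folio_ref_inc(pfn_folio(pfn));
		return -EINVAL;
	}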

> +
> +			free_pages_prepare(&folio->page, 0);
> +		}
> +		WARN(count != 0, "%lu pages are still in use!\n", count);

Is it ok to only WARN() here without returning an error?

Also, why not warn earlier above, when 'folio_nr_pages(folio) != 1'?
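
That is, something like the below rough sketch, which would also make
the final 'count != 0' WARN() unnecessary:

	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
		folio = pfn_folio(pfn);
		if (WARN(folio_nr_pages(folio) != 1,
			 "PFN %lu: unexpected large folio\n", pfn))
			return -EINVAL;

		if (WARN(!folio_ref_dec_and_test(folio),
			 "PFN %lu: invalid folio refcount when freeing\n", pfn))
			return -EINVAL;

		free_pages_prepare(&folio->page, 0);
	}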

> +	}
> +
> +	return free_folio_range(area, start_pfn, end_pfn);
> +}
> +EXPORT_SYMBOL_GPL(gcma_free_range);

As in the gcma_alloc_range() case, I'm curious whether this symbol
export is intentional and, if so, whether the intention is explained
somewhere.

> -- 
> 2.51.0.740.g6adb054d12-goog


Thanks,
SJ
