All of lore.kernel.org
 help / color / mirror / Atom feed
From: SeongJae Park <sj@kernel.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: SeongJae Park <sj@kernel.org>,
	akpm@linux-foundation.org, david@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, alexandru.elisei@arm.com, peterx@redhat.com,
	rppt@kernel.org, mhocko@suse.com, corbet@lwn.net,
	axboe@kernel.dk, viro@zeniv.linux.org.uk, brauner@kernel.org,
	hch@infradead.org, jack@suse.cz, willy@infradead.org,
	m.szyprowski@samsung.com, robin.murphy@arm.com,
	hannes@cmpxchg.org, zhengqi.arch@bytedance.com,
	shakeel.butt@linux.dev, axelrasmussen@google.com,
	yuanchu@google.com, weixugc@google.com, minchan@kernel.org,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, iommu@lists.linux.dev,
	Minchan Kim <minchan@google.com>
Subject: Re: [PATCH 7/8] mm: introduce GCMA
Date: Fri, 10 Oct 2025 14:11:01 -0700	[thread overview]
Message-ID: <20251010211101.59275-1-sj@kernel.org> (raw)
In-Reply-To: <20251010011951.2136980-8-surenb@google.com>

Hello Suren,

On Thu,  9 Oct 2025 18:19:50 -0700 Suren Baghdasaryan <surenb@google.com> wrote:

> From: Minchan Kim <minchan@google.com>
> 
> This patch introduces GCMA (Guaranteed Contiguous Memory Allocator)
> cleacache backend which reserves some amount of memory at the boot
> and then donates it to store clean file-backed pages in the cleancache.
> GCMA aims to guarantee contiguous memory allocation success as well as
> low and deterministic allocation latency.
> 
> Notes:
> Originally, the idea was posted by SeongJae Park and Minchan Kim [1].
> Later Minchan reworked it to be used in Android as a reference for
> Android vendors to use [2].
> 
> [1] https://lwn.net/Articles/619865/
> [2] https://android-review.googlesource.com/q/topic:%22gcma_6.12%22
> 
> Signed-off-by: Minchan Kim <minchan@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  MAINTAINERS          |   2 +
>  include/linux/gcma.h |  36 +++++++
>  mm/Kconfig           |  15 +++
>  mm/Makefile          |   1 +
>  mm/gcma.c            | 231 +++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 285 insertions(+)
>  create mode 100644 include/linux/gcma.h
>  create mode 100644 mm/gcma.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 441e68c94177..95b5ad26ec11 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16361,6 +16361,7 @@ F:	Documentation/admin-guide/mm/
>  F:	Documentation/mm/
>  F:	include/linux/cma.h
>  F:	include/linux/dmapool.h
> +F:	include/linux/gcma.h
>  F:	include/linux/ioremap.h
>  F:	include/linux/memory-tiers.h
>  F:	include/linux/page_idle.h
> @@ -16372,6 +16373,7 @@ F:	mm/dmapool.c
>  F:	mm/dmapool_test.c
>  F:	mm/early_ioremap.c
>  F:	mm/fadvise.c
> +F:	mm/gcma.c
>  F:	mm/ioremap.c
>  F:	mm/mapping_dirty_helpers.c
>  F:	mm/memory-tiers.c
> diff --git a/include/linux/gcma.h b/include/linux/gcma.h
> new file mode 100644
> index 000000000000..20b2c85de87b
> --- /dev/null
> +++ b/include/linux/gcma.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __GCMA_H__
> +#define __GCMA_H__
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_GCMA
> +
> +int gcma_register_area(const char *name,
> +		       unsigned long start_pfn, unsigned long count);
> +
> +/*
> + * NOTE: allocated pages are still marked reserved and when freeing them
> + * the caller should ensure they are isolated and not referenced by anyone
> + * other than the caller.
> + */
> +int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp);
> +int gcma_free_range(unsigned long start_pfn, unsigned long count);
> +
> +#else /* CONFIG_GCMA */
> +
> +static inline int gcma_register_area(const char *name,
> +				     unsigned long start_pfn,
> +				     unsigned long count)
> +		{ return -EOPNOTSUPP; }
> +static inline int gcma_alloc_range(unsigned long start_pfn,
> +				   unsigned long count, gfp_t gfp)
> +		{ return -EOPNOTSUPP; }
> +
> +static inline int gcma_free_range(unsigned long start_pfn,
> +				   unsigned long count)
> +		{ return -EOPNOTSUPP; }
> +
> +#endif /* CONFIG_GCMA */
> +
> +#endif /* __GCMA_H__ */
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 9f4da8a848f4..41ce5ef8db55 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1013,6 +1013,21 @@ config CMA_AREAS
>  
>  	  If unsure, leave the default value "8" in UMA and "20" in NUMA.
>  
> +config GCMA
> +       bool "GCMA (Guaranteed Contiguous Memory Allocator)"
> +       depends on CLEANCACHE
> +	help
> +	  This enables the Guaranteed Contiguous Memory Allocator to allow
> +	  low latency guaranteed contiguous memory allocations. Memory
> +	  reserved by GCMA is donated to cleancache to be used as pagecache
> +	  extension. Once GCMA allocation is requested, necessary pages are
> +	  taken back from the cleancache and used to satisfy the request.
> +	  Cleancache guarantees low latency successful allocation as long
> +	  as the total size of GCMA allocations does not exceed the size of
> +	  the memory donated to the cleancache.
> +
> +	  If unsure, say "N".
> +
>  #
>  # Select this config option from the architecture Kconfig, if available, to set
>  # the max page order for physically contiguous allocations.
> diff --git a/mm/Makefile b/mm/Makefile
> index 845841a140e3..05aee66a8b07 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -149,3 +149,4 @@ obj-$(CONFIG_TMPFS_QUOTA) += shmem_quota.o
>  obj-$(CONFIG_PT_RECLAIM) += pt_reclaim.o
>  obj-$(CONFIG_CLEANCACHE) += cleancache.o
>  obj-$(CONFIG_CLEANCACHE_SYSFS)	+= cleancache_sysfs.o
> +obj-$(CONFIG_GCMA)	+= gcma.o
> diff --git a/mm/gcma.c b/mm/gcma.c
> new file mode 100644
> index 000000000000..3ee0e1340db3
> --- /dev/null
> +++ b/mm/gcma.c
> @@ -0,0 +1,231 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * GCMA (Guaranteed Contiguous Memory Allocator)
> + *
> + */
> +
> +#define pr_fmt(fmt) "gcma: " fmt
> +
> +#include <linux/cleancache.h>
> +#include <linux/gcma.h>
> +#include <linux/hashtable.h>
> +#include <linux/highmem.h>
> +#include <linux/idr.h>
> +#include <linux/slab.h>
> +#include <linux/xarray.h>
> +#include "internal.h"
> +
> +#define MAX_GCMA_AREAS		64
> +#define GCMA_AREA_NAME_MAX_LEN	32
> +
> +struct gcma_area {
> +	int pool_id;
> +	unsigned long start_pfn;
> +	unsigned long end_pfn;
> +	char name[GCMA_AREA_NAME_MAX_LEN];
> +};
> +
> +static struct gcma_area areas[MAX_GCMA_AREAS];
> +static atomic_t nr_gcma_area = ATOMIC_INIT(0);
> +static DEFINE_SPINLOCK(gcma_area_lock);
> +
> +static int free_folio_range(struct gcma_area *area,
> +			     unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	unsigned long scanned = 0;
> +	struct folio *folio;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		int err;
> +
> +		if (!(++scanned % XA_CHECK_SCHED))
> +			cond_resched();
> +
> +		folio = pfn_folio(pfn);
> +		err = cleancache_backend_put_folio(area->pool_id, folio);

Why don't you use pfn_folio() directly, like alloc_folio_range() does?

> +		if (WARN(err, "PFN %lu: folio is still in use\n", pfn))
> +			return -EINVAL;

Why don't you return err, like alloc_folio_range() does?

> +	}
> +
> +	return 0;
> +}
> +
> +static int alloc_folio_range(struct gcma_area *area,
> +			      unsigned long start_pfn, unsigned long end_pfn,
> +			      gfp_t gfp)
> +{
> +	unsigned long scanned = 0;
> +	unsigned long pfn;
> +
> +	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +		int err;
> +
> +		if (!(++scanned % XA_CHECK_SCHED))
> +			cond_resched();
> +
> +		err = cleancache_backend_get_folio(area->pool_id, pfn_folio(pfn));
> +		if (err) {
> +			free_folio_range(area, start_pfn, pfn);
> +			return err;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct gcma_area *find_area(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +	int nr_area = atomic_read_acquire(&nr_gcma_area);
> +	int i;
> +
> +	for (i = 0; i < nr_area; i++) {
> +		struct gcma_area *area = &areas[i];
> +
> +		if (area->end_pfn <= start_pfn)
> +			continue;
> +
> +		if (area->start_pfn > end_pfn)
> +			continue;
> +
> +		/* The entire range should belong to a single area */
> +		if (start_pfn < area->start_pfn || end_pfn > area->end_pfn)
> +			break;
> +
> +		/* Found the area containing the entire range */
> +		return area;
> +	}
> +
> +	return NULL;
> +}
> +
> +int gcma_register_area(const char *name,
> +		       unsigned long start_pfn, unsigned long count)
> +{
> +	LIST_HEAD(folios);
> +	int i, pool_id;
> +	int nr_area;
> +	int ret = 0;
> +
> +	pool_id = cleancache_backend_register_pool(name);
> +	if (pool_id < 0)
> +		return pool_id;
> +
> +	for (i = 0; i < count; i++) {
> +		struct folio *folio;
> +
> +		folio = pfn_folio(start_pfn + i);
> +		folio_clear_reserved(folio);
> +		folio_set_count(folio, 0);
> +		list_add(&folio->lru, &folios);
> +	}
> +
> +	cleancache_backend_put_folios(pool_id, &folios);
> +
> +	spin_lock(&gcma_area_lock);
> +
> +	nr_area = atomic_read(&nr_gcma_area);
> +	if (nr_area < MAX_GCMA_AREAS) {
> +		struct gcma_area *area = &areas[nr_area];
> +
> +		area->pool_id = pool_id;
> +		area->start_pfn = start_pfn;
> +		area->end_pfn = start_pfn + count;
> +		strscpy(area->name, name);
> +		/* Ensure above stores complete before we increase the count */
> +		atomic_set_release(&nr_gcma_area, nr_area + 1);
> +	} else {
> +		ret = -ENOMEM;
> +	}
> +
> +	spin_unlock(&gcma_area_lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(gcma_register_area);
> +
> +int gcma_alloc_range(unsigned long start_pfn, unsigned long count, gfp_t gfp)
> +{
> +	unsigned long end_pfn = start_pfn + count;
> +	struct gcma_area *area;
> +	struct folio *folio;
> +	int err, order = 0;
> +
> +	gfp = current_gfp_context(gfp);
> +	if (gfp & __GFP_COMP) {
> +		if (!is_power_of_2(count))
> +			return -EINVAL;
> +
> +		order = ilog2(count);
> +		if (order >= MAX_PAGE_ORDER)
> +			return -EINVAL;
> +	}
> +
> +	area = find_area(start_pfn, end_pfn);
> +	if (!area)
> +		return -EINVAL;
> +
> +	err = alloc_folio_range(area, start_pfn, end_pfn, gfp);
> +	if (err)
> +		return err;
> +
> +	/*
> +	 * GCMA returns pages with refcount 1 and expects them to have
> +	 * the same refcount 1 when they are freed.
> +	 */
> +	if (order) {
> +		folio = pfn_folio(start_pfn);
> +		set_page_count(&folio->page, 1);
> +		prep_compound_page(&folio->page, order);
> +	} else {
> +		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			folio = pfn_folio(pfn);
> +			set_page_count(&folio->page, 1);
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(gcma_alloc_range);

I'm wondering if the rule of exporting symbols only for in-tree modules that
use the symbols should be applied here or not, and why.

> +
> +int gcma_free_range(unsigned long start_pfn, unsigned long count)
> +{
> +	unsigned long end_pfn = start_pfn + count;
> +	struct gcma_area *area;
> +	struct folio *folio;
> +
> +	area = find_area(start_pfn, end_pfn);
> +	if (!area)
> +		return -EINVAL;
> +
> +	folio = pfn_folio(start_pfn);
> +	if (folio_test_large(folio)) {
> +		int expected = folio_nr_pages(folio);

folio_nr_pages() return 'unsigned long'.  Would it be better to match the type?

> +
> +		if (WARN(count != expected, "PFN %lu: count %lu != expected %d\n",
> +			  start_pfn, count, expected))
> +			return -EINVAL;
> +
> +		if (WARN(!folio_ref_dec_and_test(folio),
> +			 "PFN %lu: invalid folio refcount when freeing\n", start_pfn))
> +			return -EINVAL;
> +
> +		free_pages_prepare(&folio->page, folio_order(folio));
> +	} else {
> +		for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
> +			folio = pfn_folio(pfn);
> +			if (folio_nr_pages(folio) == 1)
> +				count--;
> +
> +			if (WARN(!folio_ref_dec_and_test(folio),
> +				 "PFN %lu: invalid folio refcount when freeing\n", pfn))
> +				return -EINVAL;

Don't we need to increase the previously decreased folio refcounts?

> +
> +			free_pages_prepare(&folio->page, 0);
> +		}
> +		WARN(count != 0, "%lu pages are still in use!\n", count);

Is WARN() but not returning error here ok?

Also, why don't you warn earlier above if 'folio_nr_pages(folio) != 1' ?

> +	}
> +
> +	return free_folio_range(area, start_pfn, end_pfn);
> +}
> +EXPORT_SYMBOL_GPL(gcma_free_range);

Like the gcma_alloc_range() case, I'm curious if this symbol exporting is
somewhat intended and the intention is explained.

> -- 
> 2.51.0.740.g6adb054d12-goog


Thanks,
SJ

  reply	other threads:[~2025-10-10 21:11 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-10  1:19 [PATCH 0/8] Guaranteed CMA Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 1/8] mm: implement cleancache Suren Baghdasaryan
2025-10-10  1:31   ` Andrew Morton
2025-10-10  1:42     ` Suren Baghdasaryan
2025-10-10  2:39   ` Matthew Wilcox
2025-10-10 14:53     ` Suren Baghdasaryan
2025-10-10 21:17   ` kernel test robot
2025-10-10 21:42     ` Suren Baghdasaryan
2025-10-13  6:44   ` Christoph Hellwig
2025-10-13 15:43     ` Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 2/8] mm/cleancache: add cleancache LRU for folio aging Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 3/8] mm/cleancache: readahead support Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 4/8] mm/cleancache: add sysfs interface Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 5/8] mm/tests: add cleancache kunit test Suren Baghdasaryan
2025-10-11  2:57   ` kernel test robot
2025-10-11 21:47     ` Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 6/8] add cleancache documentation Suren Baghdasaryan
2025-10-10 20:20   ` SeongJae Park
2025-10-10 22:09     ` Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 7/8] mm: introduce GCMA Suren Baghdasaryan
2025-10-10 21:11   ` SeongJae Park [this message]
2025-10-10 22:05     ` Suren Baghdasaryan
2025-10-10  1:19 ` [PATCH 8/8] mm: integrate GCMA with CMA using dt-bindings Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251010211101.59275-1-sj@kernel.org \
    --to=sj@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexandru.elisei@arm.com \
    --cc=axboe@kernel.dk \
    --cc=axelrasmussen@google.com \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux.dev \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=m.szyprowski@samsung.com \
    --cc=mhocko@suse.com \
    --cc=minchan@google.com \
    --cc=minchan@kernel.org \
    --cc=peterx@redhat.com \
    --cc=robin.murphy@arm.com \
    --cc=rppt@kernel.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=viro@zeniv.linux.org.uk \
    --cc=weixugc@google.com \
    --cc=willy@infradead.org \
    --cc=yuanchu@google.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.