From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>,
Derek Kiernan <derek.kiernan@amd.com>,
Dragan Cvetic <dragan.cvetic@amd.com>,
Arnd Bergmann <arnd@arndb.de>, Wei Liu <wei.liu@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Madhavan Venkataraman <madvenka@linux.microsoft.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
James Gowans <jgowans@amazon.com>,
Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
Jinank Jain <jinankjain@linux.microsoft.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool
Date: Fri, 25 Aug 2023 10:05:08 +0200 [thread overview]
Message-ID: <2023082506-enchanted-tripping-d1d5@gregkh> (raw)
In-Reply-To: <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> This patch addresses the need for a memory allocator dedicated to
> persistent memory within the kernel. This allocator will preserve
> kernel-specific states like DMA passthrough device states, IOMMU state, and
> more across kexec.
And CMA doesn't do this for you today?
> The proposed solution offers a foundational implementation for potential
> custom solutions that might follow. Though the implementation is
> intentionally kept concise and straightforward to foster discussion and
> feedback, it's fully functional in its current state.
>
> The persistent memory pool consists of a simple page allocator that
> operates on reserved physical memory regions. It employs bitmaps to
> allocate and free pages, with the following characteristics:
>
> 1. Memory pool regions are specified using the command line option:
>
> pmpool=<base>,<size>
>
> Where <base> represents the physical memory base address and <size>
> indicates the memory size.
>
> 2. While the pages allocation emulates the buddy allocator interface, it
> isn't confined to the MAX_ORDER.
>
> 3. The memory pool initializes during a cold boot and is retained across
> kexec.
>
> Potential applications include:
>
> 1. Allowing various in-kernel entities to allocate persistent pages from
> a singular memory pool, eliminating the need for multiple region
> reservations.
>
> 2. For in-kernel components that require the allocation address to be
> available on kernel kexec, this address can be exposed to user space and
> then passed via the command line.
>
> 3. Separate subsystems or drivers can reserve their region, sharing a
> portion of it for their persistent memory pool. This might be a file
> system, a key-value store, or other applications.
>
> Potential Enhancements for the Proposed Memory Pool:
>
> 1. Multiple Memory Regions Support: enhance the pool to accommodate and
> manage multiple memory regions simultaneously.
>
> 2. In-Kernel Memory Allocations for Storage Memory Regions:
> * Allow in-kernel memory allocations to serve as storage memory regions.
> * Implement explicit reservation of these designated regions during kexec.
As you have no in-kernel users of this, it's not something we can even
consider at the moment for obvious reasons (neither would you want us
to.)
Can you make this part of a patch series that actually adds a user,
probably more than one, so that we can see if any of this even makes
sense?
> drivers/misc/Kconfig | 7 +
> drivers/misc/Makefile | 1
> drivers/misc/pmpool.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pmpool.h | 20 ++++
> 4 files changed, 298 insertions(+)
> create mode 100644 drivers/misc/pmpool.c
> create mode 100644 include/linux/pmpool.h
misc is not for memory pools, as this is not a driver. please put this
in the properly location instead of trying to hide it from the mm
maintainers and subsystem :)
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index cadd4a820c03..c8ef5b37ee98 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -562,6 +562,13 @@ config TPS6594_PFSM
> This driver can also be built as a module. If so, the module
> will be called tps6594-pfsm.
>
> +config PMPOOL
> + bool "Persistent memory pool support"
> + help
> + This option adds support for a persistent memory pool feature, which
> + provides pages allocation and freeing from a set of persistent memory ranges,
> + deposited to the memory pool.
Why would this even be a user selectable option? Either the kernel
needs this or it doesn't.
> +
> source "drivers/misc/c2port/Kconfig"
> source "drivers/misc/eeprom/Kconfig"
> source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index f2a4d1ff65d4..31dd6553057d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER) += xilinx_tmr_manager.o
> obj-$(CONFIG_TMR_INJECT) += xilinx_tmr_inject.o
> obj-$(CONFIG_TPS6594_ESM) += tps6594-esm.o
> obj-$(CONFIG_TPS6594_PFSM) += tps6594-pfsm.o
> +obj-$(CONFIG_PMPOOL) += pmpool.o
> diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> new file mode 100644
> index 000000000000..e2c923b31b36
> --- /dev/null
> +++ b/drivers/misc/pmpool.c
> @@ -0,0 +1,270 @@
> +#include <linux/io.h>
You forgot basic copyright/license stuff, did you use checkpatch.pl on
your file?
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include <linux/pmpool.h>
> +
> +#define VERSION 1
In kernel code does not need versions.
> +#define MAX_REGIONS 14
Why 14? Why not 24? Or something else?
> +
> +#define for_each_region(pmpool, r) \
> + for (r = pmpool->hdr->regions; \
> + r - pmpool->hdr->regions < pmpool->hdr->nr_regions; \
> + r++)
> +
> +#define for_each_used_region(pmpool, r) \
> + for_each_region((pmpool), (r)) \
> + if (!(r)->size_in_pfns) { continue; } else
> +
> +struct pmpool_region {
> + uint32_t base_pfn; /* 32 bits * 4k = up it 15TB of physical address space */
Please use proper kernel types when writing kernel code (i.e. u32, u8,
and so on.) uint*_t is for userspace code only.
> --- /dev/null
> +++ b/include/linux/pmpool.h
> @@ -0,0 +1,20 @@
> +#ifndef _PMPOOL_H
> +#define _PMPOOL_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +void *alloc_pm_pages(unsigned int order);
> +void free_pm_pages(void *addr, unsigned int order);
> +
> +struct pmpool {
> + spinlock_t lock;
> + struct pmpool_header *hdr;
> +};
> +
> +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> +
> +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);
Please use "noun_verb_*" for new apis so that we have a chance at
understanding where the calls are living.
good luck!
greg k-h
next parent reply other threads:[~2023-08-25 8:27 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-25 8:05 ` Greg Kroah-Hartman [this message]
2023-08-23 1:36 ` [RFC PATCH] Introduce persistent memory pool Stanislav Kinsburskii
2023-08-25 13:32 ` Gowans, James
2023-08-23 2:45 ` Stanislav Kinsburskii
2023-08-28 20:50 ` Alexander Graf
2023-08-28 20:50 ` Alexander Graf
2023-08-29 22:07 ` Stanislav Kinsburskii
2023-08-29 22:07 ` Stanislav Kinsburskii
2023-08-30 7:20 ` Alexander Graf
2023-08-30 7:20 ` Alexander Graf
2023-08-30 23:39 ` Arnd Bergmann
2023-08-30 23:39 ` Arnd Bergmann
2023-08-31 2:24 ` Stanislav Kinsburskii
2023-08-31 2:24 ` Stanislav Kinsburskii
[not found] ` <64e8f6dd.050a0220.edb3c.c045SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 7:45 ` Greg Kroah-Hartman
2023-08-23 6:15 ` Stanislav Kinsburskii
[not found] ` <64ea25cd.650a0220.642cc.50e6SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 17:02 ` Greg Kroah-Hartman
2023-08-23 6:21 ` Stanislav Kinsburskii
[not found] ` <64ea3699.170a0220.13ee0.5c3aSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 20:04 ` Greg Kroah-Hartman
2023-08-31 14:18 ` Paolo Bonzini
2023-08-31 2:37 ` Stanislav Kinsburskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2023082506-enchanted-tripping-d1d5@gregkh \
--to=gregkh@linuxfoundation.org \
--cc=anrayabh@linux.microsoft.com \
--cc=anthony.yznaga@oracle.com \
--cc=arnd@arndb.de \
--cc=derek.kiernan@amd.com \
--cc=dragan.cvetic@amd.com \
--cc=jgowans@amazon.com \
--cc=jinankjain@linux.microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=madvenka@linux.microsoft.com \
--cc=rppt@kernel.org \
--cc=skinsburskii@linux.microsoft.com \
--cc=stanislav.kinsburskii@gmail.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.