From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>,
Derek Kiernan <derek.kiernan@amd.com>,
Dragan Cvetic <dragan.cvetic@amd.com>,
Arnd Bergmann <arnd@arndb.de>, Wei Liu <wei.liu@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
Madhavan Venkataraman <madvenka@linux.microsoft.com>,
Anthony Yznaga <anthony.yznaga@oracle.com>,
"Mike Rapoport (IBM)" <rppt@kernel.org>,
James Gowans <jgowans@amazon.com>,
Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
Jinank Jain <jinankjain@linux.microsoft.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool
Date: Fri, 25 Aug 2023 10:05:08 +0200 [thread overview]
Message-ID: <2023082506-enchanted-tripping-d1d5@gregkh> (raw)
In-Reply-To: <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> This patch addresses the need for a memory allocator dedicated to
> persistent memory within the kernel. This allocator will preserve
> kernel-specific states like DMA passthrough device states, IOMMU state, and
> more across kexec.
And CMA doesn't do this for you today?
> The proposed solution offers a foundational implementation for potential
> custom solutions that might follow. Though the implementation is
> intentionally kept concise and straightforward to foster discussion and
> feedback, it's fully functional in its current state.
>
> The persistent memory pool consists of a simple page allocator that
> operates on reserved physical memory regions. It employs bitmaps to
> allocate and free pages, with the following characteristics:
>
> 1. Memory pool regions are specified using the command line option:
>
> pmpool=<base>,<size>
>
> Where <base> represents the physical memory base address and <size>
> indicates the memory size.
>
> 2. While the pages allocation emulates the buddy allocator interface, it
> isn't confined to the MAX_ORDER.
>
> 3. The memory pool initializes during a cold boot and is retained across
> kexec.
>
> Potential applications include:
>
> 1. Allowing various in-kernel entities to allocate persistent pages from
> a singular memory pool, eliminating the need for multiple region
> reservations.
>
> 2. For in-kernel components that require the allocation address to be
> available on kernel kexec, this address can be exposed to user space and
> then passed via the command line.
>
> 3. Separate subsystems or drivers can reserve their region, sharing a
> portion of it for their persistent memory pool. This might be a file
> system, a key-value store, or other applications.
>
> Potential Enhancements for the Proposed Memory Pool:
>
> 1. Multiple Memory Regions Support: enhance the pool to accommodate and
> manage multiple memory regions simultaneously.
>
> 2. In-Kernel Memory Allocations for Storage Memory Regions:
> * Allow in-kernel memory allocations to serve as storage memory regions.
> * Implement explicit reservation of these designated regions during kexec.
As you have no in-kernel users of this, it's not something we can even
consider at the moment for obvious reasons (neither would you want us
to.)
Can you make this part of a patch series that actually adds a user,
probably more than one, so that we can see if any of this even makes
sense?
> drivers/misc/Kconfig | 7 +
> drivers/misc/Makefile | 1
> drivers/misc/pmpool.c | 270 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pmpool.h | 20 ++++
> 4 files changed, 298 insertions(+)
> create mode 100644 drivers/misc/pmpool.c
> create mode 100644 include/linux/pmpool.h
misc is not for memory pools, as this is not a driver. please put this
in the properly location instead of trying to hide it from the mm
maintainers and subsystem :)
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index cadd4a820c03..c8ef5b37ee98 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -562,6 +562,13 @@ config TPS6594_PFSM
> This driver can also be built as a module. If so, the module
> will be called tps6594-pfsm.
>
> +config PMPOOL
> + bool "Persistent memory pool support"
> + help
> + This option adds support for a persistent memory pool feature, which
> + provides pages allocation and freeing from a set of persistent memory ranges,
> + deposited to the memory pool.
Why would this even be a user selectable option? Either the kernel
needs this or it doesn't.
> +
> source "drivers/misc/c2port/Kconfig"
> source "drivers/misc/eeprom/Kconfig"
> source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index f2a4d1ff65d4..31dd6553057d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER) += xilinx_tmr_manager.o
> obj-$(CONFIG_TMR_INJECT) += xilinx_tmr_inject.o
> obj-$(CONFIG_TPS6594_ESM) += tps6594-esm.o
> obj-$(CONFIG_TPS6594_PFSM) += tps6594-pfsm.o
> +obj-$(CONFIG_PMPOOL) += pmpool.o
> diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> new file mode 100644
> index 000000000000..e2c923b31b36
> --- /dev/null
> +++ b/drivers/misc/pmpool.c
> @@ -0,0 +1,270 @@
> +#include <linux/io.h>
You forgot basic copyright/license stuff, did you use checkpatch.pl on
your file?
> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include <linux/pmpool.h>
> +
> +#define VERSION 1
In kernel code does not need versions.
> +#define MAX_REGIONS 14
Why 14? Why not 24? Or something else?
> +
> +#define for_each_region(pmpool, r) \
> + for (r = pmpool->hdr->regions; \
> + r - pmpool->hdr->regions < pmpool->hdr->nr_regions; \
> + r++)
> +
> +#define for_each_used_region(pmpool, r) \
> + for_each_region((pmpool), (r)) \
> + if (!(r)->size_in_pfns) { continue; } else
> +
> +struct pmpool_region {
> + uint32_t base_pfn; /* 32 bits * 4k = up it 15TB of physical address space */
Please use proper kernel types when writing kernel code (i.e. u32, u8,
and so on.) uint*_t is for userspace code only.
> --- /dev/null
> +++ b/include/linux/pmpool.h
> @@ -0,0 +1,20 @@
> +#ifndef _PMPOOL_H
> +#define _PMPOOL_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +void *alloc_pm_pages(unsigned int order);
> +void free_pm_pages(void *addr, unsigned int order);
> +
> +struct pmpool {
> + spinlock_t lock;
> + struct pmpool_header *hdr;
> +};
> +
> +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> +
> +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);
Please use "noun_verb_*" for new apis so that we have a chance at
understanding where the calls are living.
good luck!
greg k-h
next parent reply other threads:[~2023-08-25 8:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-25 8:05 ` Greg Kroah-Hartman [this message]
2023-08-25 13:32 ` [RFC PATCH] Introduce persistent memory pool Gowans, James
[not found] ` <20230823024500.GA25462@skinsburskii.>
2023-08-28 20:50 ` Alexander Graf
[not found] ` <20230829220740.GA26605@skinsburskii.>
2023-08-30 7:20 ` Alexander Graf
2023-08-30 23:39 ` Arnd Bergmann
[not found] ` <64e8f6dd.050a0220.edb3c.c045SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 7:45 ` Greg Kroah-Hartman
[not found] ` <64ea25cd.650a0220.642cc.50e6SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 17:02 ` Greg Kroah-Hartman
[not found] ` <64ea3699.170a0220.13ee0.5c3aSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 20:04 ` Greg Kroah-Hartman
2023-08-31 14:18 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2023082506-enchanted-tripping-d1d5@gregkh \
--to=gregkh@linuxfoundation.org \
--cc=anrayabh@linux.microsoft.com \
--cc=anthony.yznaga@oracle.com \
--cc=arnd@arndb.de \
--cc=derek.kiernan@amd.com \
--cc=dragan.cvetic@amd.com \
--cc=jgowans@amazon.com \
--cc=jinankjain@linux.microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=madvenka@linux.microsoft.com \
--cc=rppt@kernel.org \
--cc=skinsburskii@linux.microsoft.com \
--cc=stanislav.kinsburskii@gmail.com \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox