public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com>,
	Derek Kiernan <derek.kiernan@amd.com>,
	Dragan Cvetic <dragan.cvetic@amd.com>,
	Arnd Bergmann <arnd@arndb.de>, Wei Liu <wei.liu@kernel.org>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Madhavan Venkataraman <madvenka@linux.microsoft.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	"Mike Rapoport (IBM)" <rppt@kernel.org>,
	James Gowans <jgowans@amazon.com>,
	Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
	Jinank Jain <jinankjain@linux.microsoft.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] Introduce persistent memory pool
Date: Fri, 25 Aug 2023 10:05:08 +0200	[thread overview]
Message-ID: <2023082506-enchanted-tripping-d1d5@gregkh> (raw)
In-Reply-To: <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Aug 22, 2023 at 11:34:34AM -0700, Stanislav Kinsburskii wrote:
> This patch addresses the need for a memory allocator dedicated to
> persistent memory within the kernel. This allocator will preserve
> kernel-specific states like DMA passthrough device states, IOMMU state, and
> more across kexec.

And CMA doesn't do this for you today?

> The proposed solution offers a foundational implementation for potential
> custom solutions that might follow. Though the implementation is
> intentionally kept concise and straightforward to foster discussion and
> feedback, it's fully functional in its current state.
> 
> The persistent memory pool consists of a simple page allocator that
> operates on reserved physical memory regions. It employs bitmaps to
> allocate and free pages, with the following characteristics:
> 
>   1. Memory pool regions are specified using the command line option:
> 
>        pmpool=<base>,<size>
> 
>      Where <base> represents the physical memory base address and <size>
>      indicates the memory size.
> 
>   2. While the pages allocation emulates the buddy allocator interface, it
>      isn't confined to the MAX_ORDER.
> 
>   3. The memory pool initializes during a cold boot and is retained across
>      kexec.
> 
> Potential applications include:
> 
>   1. Allowing various in-kernel entities to allocate persistent pages from
>      a singular memory pool, eliminating the need for multiple region
>      reservations.
> 
>   2. For in-kernel components that require the allocation address to be
>      available on kernel kexec, this address can be exposed to user space and
>      then passed via the command line.
> 
>   3. Separate subsystems or drivers can reserve their region, sharing a
>      portion of it for their persistent memory pool. This might be a file
>      system, a key-value store, or other applications.
> 
> Potential Enhancements for the Proposed Memory Pool:
> 
>   1. Multiple Memory Regions Support: enhance the pool to accommodate and
>      manage multiple memory regions simultaneously.
> 
>   2. In-Kernel Memory Allocations for Storage Memory Regions:
>    * Allow in-kernel memory allocations to serve as storage memory regions.
>    * Implement explicit reservation of these designated regions during kexec.

As you have no in-kernel users of this, it's not something we can even
consider at the moment for obvious reasons (neither would you want us
to.)

Can you make this part of a patch series that actually adds a user,
probably more than one, so that we can see if any of this even makes
sense?


>  drivers/misc/Kconfig   |    7 +
>  drivers/misc/Makefile  |    1 
>  drivers/misc/pmpool.c  |  270 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pmpool.h |   20 ++++
>  4 files changed, 298 insertions(+)
>  create mode 100644 drivers/misc/pmpool.c
>  create mode 100644 include/linux/pmpool.h

misc is not for memory pools, as this is not a driver.  please put this
in the properly location instead of trying to hide it from the mm
maintainers and subsystem :)

> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index cadd4a820c03..c8ef5b37ee98 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -562,6 +562,13 @@ config TPS6594_PFSM
>  	  This driver can also be built as a module.  If so, the module
>  	  will be called tps6594-pfsm.
>  
> +config PMPOOL
> +	bool "Persistent memory pool support"
> +	help
> +	  This option adds support for a persistent memory pool feature, which
> +	  provides pages allocation and freeing from a set of persistent memory ranges,
> +	  deposited to the memory pool.

Why would this even be a user selectable option?  Either the kernel
needs this or it doesn't.


> +
>  source "drivers/misc/c2port/Kconfig"
>  source "drivers/misc/eeprom/Kconfig"
>  source "drivers/misc/cb710/Kconfig"
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index f2a4d1ff65d4..31dd6553057d 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -67,3 +67,4 @@ obj-$(CONFIG_TMR_MANAGER)      += xilinx_tmr_manager.o
>  obj-$(CONFIG_TMR_INJECT)	+= xilinx_tmr_inject.o
>  obj-$(CONFIG_TPS6594_ESM)	+= tps6594-esm.o
>  obj-$(CONFIG_TPS6594_PFSM)	+= tps6594-pfsm.o
> +obj-$(CONFIG_PMPOOL)		+= pmpool.o
> diff --git a/drivers/misc/pmpool.c b/drivers/misc/pmpool.c
> new file mode 100644
> index 000000000000..e2c923b31b36
> --- /dev/null
> +++ b/drivers/misc/pmpool.c
> @@ -0,0 +1,270 @@
> +#include <linux/io.h>

You forgot basic copyright/license stuff, did you use checkpatch.pl on
your file?

> +#include <linux/bitmap.h>
> +#include <linux/memblock.h>
> +#include <linux/spinlock.h>
> +#include <linux/types.h>
> +
> +#include <linux/pmpool.h>
> +
> +#define VERSION			1

In kernel code does not need versions.

> +#define MAX_REGIONS		14

Why 14?  Why not 24?  Or something else?

> +
> +#define for_each_region(pmpool, r)					\
> +	for (r = pmpool->hdr->regions;					\
> +	     r - pmpool->hdr->regions < pmpool->hdr->nr_regions;	\
> +	     r++)
> +
> +#define for_each_used_region(pmpool, r)					\
> +	for_each_region((pmpool), (r))					\
> +		if (!(r)->size_in_pfns) { continue; } else
> +
> +struct pmpool_region {
> +	uint32_t		base_pfn;		/* 32 bits * 4k = up it 15TB of physical address space */

Please use proper kernel types when writing kernel code (i.e. u32, u8,
and so on.)  uint*_t is for userspace code only.

> --- /dev/null
> +++ b/include/linux/pmpool.h
> @@ -0,0 +1,20 @@
> +#ifndef _PMPOOL_H
> +#define _PMPOOL_H
> +
> +#include <linux/types.h>
> +#include <linux/spinlock.h>
> +
> +void *alloc_pm_pages(unsigned int order);
> +void free_pm_pages(void *addr, unsigned int order);
> +
> +struct pmpool {
> +	spinlock_t		lock;
> +	struct pmpool_header	*hdr;
> +};
> +
> +int pmpool_init(struct pmpool *pmpool, phys_addr_t base, phys_addr_t size);
> +
> +void *alloc_pmpool_pages(struct pmpool *pmpool, unsigned int order);
> +void free_pmpool_pages(struct pmpool *pmpool, void *addr, unsigned int order);

Please use "noun_verb_*" for new apis so that we have a chance at
understanding where the calls are living.

good luck!

greg k-h

       reply	other threads:[~2023-08-25  8:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <64e7cbf7.050a0220.114c7.b70dSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-25  8:05 ` Greg Kroah-Hartman [this message]
2023-08-25 13:32   ` [RFC PATCH] Introduce persistent memory pool Gowans, James
     [not found]     ` <20230823024500.GA25462@skinsburskii.>
2023-08-28 20:50       ` Alexander Graf
     [not found]         ` <20230829220740.GA26605@skinsburskii.>
2023-08-30  7:20           ` Alexander Graf
2023-08-30 23:39             ` Arnd Bergmann
     [not found]   ` <64e8f6dd.050a0220.edb3c.c045SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26  7:45     ` Greg Kroah-Hartman
     [not found]       ` <64ea25cd.650a0220.642cc.50e6SMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 17:02         ` Greg Kroah-Hartman
     [not found]           ` <64ea3699.170a0220.13ee0.5c3aSMTPIN_ADDED_BROKEN@mx.google.com>
2023-08-26 20:04             ` Greg Kroah-Hartman
2023-08-31 14:18               ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2023082506-enchanted-tripping-d1d5@gregkh \
    --to=gregkh@linuxfoundation.org \
    --cc=anrayabh@linux.microsoft.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=arnd@arndb.de \
    --cc=derek.kiernan@amd.com \
    --cc=dragan.cvetic@amd.com \
    --cc=jgowans@amazon.com \
    --cc=jinankjain@linux.microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=madvenka@linux.microsoft.com \
    --cc=rppt@kernel.org \
    --cc=skinsburskii@linux.microsoft.com \
    --cc=stanislav.kinsburskii@gmail.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox