* [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-09 9:50 ` David Hildenbrand (Arm)
2026-06-05 21:19 ` [PATCH v4 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
` (7 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Memory hotplug operations require ranges aligned to memory block
boundaries. This is a generic operation for hotplug.
Add memory_block_aligned_range() as a common helper in <linux/memory.h>
that aligns the start address up and end address down to memory block
boundaries.
Update dax/kmem to use this helper.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 4 +---
include/linux/memory.h | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a18e2b968e4d..592171ec10f4 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -33,9 +33,7 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
struct dev_dax_range *dax_range = &dev_dax->ranges[i];
struct range *range = &dax_range->range;
- /* memory-block align the hotplug range */
- r->start = ALIGN(range->start, memory_block_size_bytes());
- r->end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+ *r = memory_block_aligned_range(range);
if (r->start >= r->end) {
r->start = range->start;
r->end = range->end;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 463dc02f6cff..9f5ef0309f77 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -20,6 +20,7 @@
#include <linux/compiler.h>
#include <linux/mutex.h>
#include <linux/memory_hotplug.h>
+#include <linux/range.h>
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
@@ -100,6 +101,27 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
unsigned long memory_block_size_bytes(void);
int set_memory_block_size_order(unsigned int order);
+/**
+ * memory_block_aligned_range - align a physical address range to memory blocks
+ * @range: the input range to align
+ *
+ * Aligns the start address up and the end address down to memory block
+ * boundaries. This is required for memory hotplug operations which must
+ * operate on memory-block aligned ranges.
+ *
+ * Returns the aligned range. Callers should check that the returned
+ * range is valid (aligned.start < aligned.end) before using it.
+ */
+static inline struct range memory_block_aligned_range(const struct range *range)
+{
+ struct range aligned;
+
+ aligned.start = ALIGN(range->start, memory_block_size_bytes());
+ aligned.end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+
+ return aligned;
+}
+
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper
2026-06-05 21:19 ` [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
@ 2026-06-09 9:50 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 9:50 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, osalvador, shuah, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
On 6/5/26 23:19, Gregory Price wrote:
> Memory hotplug operations require ranges aligned to memory block
> boundaries. This is a generic operation for hotplug.
>
> Add memory_block_aligned_range() as a common helper in <linux/memory.h>
> that aligns the start address up and end address down to memory block
> boundaries.
>
> Update dax/kmem to use this helper.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
2026-06-05 21:19 ` [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:19 ` [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
` (6 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Modify online_memory_block() to accept the online type through its arg
parameter rather than calling mhp_get_default_online_type() internally.
This prepares for allowing callers to specify explicit online types.
Update the caller in add_memory_resource() to pass the default online
type via a local variable.
No functional change.
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
mm/memory_hotplug.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7ac19fab2263..6833208cc17c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
static int online_memory_block(struct memory_block *mem, void *arg)
{
- mem->online_type = mhp_get_default_online_type();
+ enum mmop *online_type = arg;
+
+ mem->online_type = *online_type;
return device_online(&mem->dev);
}
@@ -1494,6 +1496,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+ enum mmop online_type = mhp_get_default_online_type();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
struct memory_group *group = NULL;
u64 start, size;
@@ -1582,7 +1585,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* online pages if requested */
if (mhp_get_default_online_type() != MMOP_OFFLINE)
- walk_memory_blocks(start, size, NULL, online_memory_block);
+ walk_memory_blocks(start, size, &online_type,
+ online_memory_block);
return ret;
error:
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
2026-06-05 21:19 ` [PATCH v4 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
2026-06-05 21:19 ` [PATCH v4 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:29 ` sashiko-bot
2026-06-09 9:52 ` David Hildenbrand (Arm)
2026-06-05 21:19 ` [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
` (5 subsequent siblings)
8 siblings, 2 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
and the mhp_get_default_online_type function for hotplug use cases.
Some drivers (cxl) co-mingle their hotplug and devdax use-cases into
the same driver code, and chose the dax_kmem path as the default driver
path - making it difficult to require hotplug as a predicate to building
the overall driver (it may break other non-hotplug use-cases).
Export mhp_get_default_online_type function to allow these drivers to
build when hotplug is disabled and still use the DAX use case.
In the built-out case we simply return MMOP_OFFLINE as it's
non-destructive. The internal function can never return -1 either,
so we choose this to allow for defining the function with 'enum mmop'.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7c9d66729c60..f059025f8f8b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -316,6 +316,8 @@ extern struct zone *zone_for_pfn_range(enum mmop online_type,
extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
struct mhp_params *params);
void arch_remove_linear_mapping(u64 start, u64 size);
+#else
+static inline enum mmop mhp_get_default_online_type(void) { return MMOP_OFFLINE; }
#endif /* CONFIG_MEMORY_HOTPLUG */
#endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6833208cc17c..494257054095 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -239,6 +239,7 @@ enum mmop mhp_get_default_online_type(void)
return mhp_default_online_type;
}
+EXPORT_SYMBOL_GPL(mhp_get_default_online_type);
void mhp_set_default_online_type(enum mmop online_type)
{
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-05 21:19 ` [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
@ 2026-06-05 21:29 ` sashiko-bot
2026-06-05 21:43 ` Gregory Price
2026-06-09 9:52 ` David Hildenbrand (Arm)
1 sibling, 1 reply; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:29 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
Pre-existing issues:
- [High] Lazy initialization of `mhp_default_online_type` can race with sysfs updates, silently clobbering the user-specified hotplug policy.
--
commit 7d00f30f087affb00391db7c7b27504af6cf3ac3
Author: Gregory Price <gourry@gourry.net>
mm/memory_hotplug: export mhp_get_default_online_type
Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
and the mhp_get_default_online_type function for hotplug use cases.
This commit exports the mhp_get_default_online_type function to allow
these drivers to build and function when hotplug is disabled.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6833208cc17cb..4942570540956 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
[ ... ]
> @@ -239,6 +239,7 @@ enum mmop mhp_get_default_online_type(void)
>
> return mhp_default_online_type;
> }
> +EXPORT_SYMBOL_GPL(mhp_get_default_online_type);
[Severity: High]
This isn't a bug introduced by this patch, but exporting this function for
asynchronous driver module initialization paths significantly increases the
probability of a data race with sysfs updates.
Can this lazy initialization clobber a user-specified hotplug policy?
If a driver calls mhp_get_default_online_type() and reads the uninitialized
state just as userspace writes to /sys/devices/system/memory/auto_online_blocks,
could the following sequence occur?
CPU1 (Driver init calling mhp_get_default_online_type):
if (mhp_default_online_type >= 0) // reads -1
CPU2 (Sysfs write calling auto_online_blocks_store in drivers/base/memory.c):
mhp_set_default_online_type(online_type); // sets user policy
CPU1 (Continuing in mhp_get_default_online_type):
if (IS_ENABLED(CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE))
mhp_default_online_type = MMOP_OFFLINE; // clobbers user policy
Does this unprotected check-and-set sequence on a global configuration
variable need serialization to prevent the user-specified policy from being
silently lost?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=3
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-05 21:29 ` sashiko-bot
@ 2026-06-05 21:43 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:43 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-cxl
On Fri, Jun 05, 2026 at 09:29:16PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
>
> Pre-existing issues:
> - [High] Lazy initialization of `mhp_default_online_type` can race with sysfs updates, silently clobbering the user-specified hotplug policy.
> --
>
... snip ...
>
> [Severity: High]
Classifying this as "High Severity" is almost comical.
If we want to address this, then we can just slap a lock on the global
and call it a day I suppose.
This seems incredibly unlikely, and a good example why a single global
auto-online policy is bad anyway.
> This isn't a bug introduced by this patch, but exporting this function for
> asynchronous driver module initialization paths significantly increases the
> probability of a data race with sysfs updates.
>
> Can this lazy initialization clobber a user-specified hotplug policy?
>
> If a driver calls mhp_get_default_online_type() and reads the uninitialized
> state just as userspace writes to /sys/devices/system/memory/auto_online_blocks,
> could the following sequence occur?
>
> CPU1 (Driver init calling mhp_get_default_online_type):
> if (mhp_default_online_type >= 0) // reads -1
>
> CPU2 (Sysfs write calling auto_online_blocks_store in drivers/base/memory.c):
> mhp_set_default_online_type(online_type); // sets user policy
>
> CPU1 (Continuing in mhp_get_default_online_type):
> if (IS_ENABLED(CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE))
> mhp_default_online_type = MMOP_OFFLINE; // clobbers user policy
>
> Does this unprotected check-and-set sequence on a global configuration
> variable need serialization to prevent the user-specified policy from being
> silently lost?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=3
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-05 21:19 ` [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
2026-06-05 21:29 ` sashiko-bot
@ 2026-06-09 9:52 ` David Hildenbrand (Arm)
2026-06-09 15:11 ` Gregory Price
1 sibling, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 9:52 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, osalvador, shuah, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
On 6/5/26 23:19, Gregory Price wrote:
> Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
> and the mhp_get_default_online_type function for hotplug use cases.
>
> Some drivers (cxl) co-mingle their hotplug and devdax use-cases into
> the same driver code, and chose the dax_kmem path as the default driver
> path - making it difficult to require hotplug as a predicate to building
> the overall driver (it may break other non-hotplug use-cases).
>
> Export mhp_get_default_online_type function to allow these drivers to
> build when hotplug is disabled and still use the DAX use case.
>
> In the built-out case we simply return MMOP_OFFLINE as it's
Ah, you mean without CONFIG_MEMORY_HOTPLUG
> non-destructive. The internal function can never return -1 either,
> so we choose this to allow for defining the function with 'enum mmop'.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-09 9:52 ` David Hildenbrand (Arm)
@ 2026-06-09 15:11 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-09 15:11 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple
On Tue, Jun 09, 2026 at 11:52:21AM +0200, David Hildenbrand (Arm) wrote:
> On 6/5/26 23:19, Gregory Price wrote:
> > Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
> > and the mhp_get_default_online_type function for hotplug use cases.
> >
> > Some drivers (cxl) co-mingle their hotplug and devdax use-cases into
> > the same driver code, and chose the dax_kmem path as the default driver
> > path - making it difficult to require hotplug as a predicate to building
> > the overall driver (it may break other non-hotplug use-cases).
> >
> > Export mhp_get_default_online_type function to allow these drivers to
> > build when hotplug is disabled and still use the DAX use case.
> >
> > In the built-out case we simply return MMOP_OFFLINE as it's
>
> Ah, you mean without CONFIG_MEMORY_HOTPLUG
>
Yeah i'll update the commit message, thanks!
> > non-destructive. The internal function can never return -1 either,
> > so we choose this to allow for defining the function with 'enum mmop'.
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>
> --
> Cheers,
>
> David
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (2 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-09 9:55 ` David Hildenbrand (Arm)
2026-06-05 21:19 ` [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug Gregory Price
` (4 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Existing callers of add_memory_driver_managed cannot select the
preferred online type (ZONE_NORMAL vs ZONE_MOVABLE), requiring it to
hot-add memory as offline blocks, and then follow up by onlining each
memory block individually.
Most drivers prefer the system default, but the CXL driver wants to
plumb a preferred policy through the dax kmem driver.
Refactor APIs to add a new interface which allows the dax kmem module
to select a preferred policy.
Overriding the configured auto-online policy is only safe for known
in-tree modules, where we know the override reflects a different,
user-requested policy. We do not want arbitrary out-of-tree drivers
silently overriding the system-wide onlining policy, so restrict the
new interface to the kmem module using EXPORT_SYMBOL_FOR_MODULES()
rather than a plain EXPORT_SYMBOL_GPL(). Other in-tree modules (e.g.
cxl_core) can be added to the allowed list as the need arises.
Refactor add_memory_driver_managed, extract __add_memory_driver_managed
- Add proper kernel-doc for add_memory_driver_managed while refactoring
- New helper accepts an explicit online_type.
- New helper validates online_type is between OFFLINE and ONLINE_MOVABLE
Refactor: add_memory_resource, extract __add_memory_resource
- new helper accepts an explicit online_type
Original APIs now explicitly pass the system-default to new helpers.
No functional change for existing users.
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 3 ++
mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++-----
2 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f059025f8f8b..d3edeb80aadb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory_resource(int nid, struct resource *resource,
mhp_t mhp_flags);
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags,
+ enum mmop online_type);
extern int add_memory_driver_managed(int nid, u64 start, u64 size,
const char *resource_name,
mhp_t mhp_flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 494257054095..7d145217adc6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1494,10 +1494,10 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
*
* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
*/
-int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+static int __add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags,
+ enum mmop online_type)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
- enum mmop online_type = mhp_get_default_online_type();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
struct memory_group *group = NULL;
u64 start, size;
@@ -1585,7 +1585,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
merge_system_ram_resource(res);
/* online pages if requested */
- if (mhp_get_default_online_type() != MMOP_OFFLINE)
+ if (online_type != MMOP_OFFLINE)
walk_memory_blocks(start, size, &online_type,
online_memory_block);
@@ -1603,7 +1603,13 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
return ret;
}
-/* requires device_hotplug_lock, see add_memory_resource() */
+int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+{
+ return __add_memory_resource(nid, res, mhp_flags,
+ mhp_get_default_online_type());
+}
+
+/* requires device_hotplug_lock, see __add_memory_resource() */
int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
{
struct resource *res;
@@ -1631,7 +1637,15 @@ int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
}
EXPORT_SYMBOL_GPL(add_memory);
-/*
+/**
+ * __add_memory_driver_managed - add driver-managed memory with explicit online_type
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ * @online_type: Auto-Online behavior (offline, online, kernel, movable)
+ *
* Add special, driver-managed memory to the system as system RAM. Such
* memory is not exposed via the raw firmware-provided memmap as system
* RAM, instead, it is detected and added by a driver - during cold boot,
@@ -1639,6 +1653,7 @@ EXPORT_SYMBOL_GPL(add_memory);
*
* Reasons why this memory should not be used for the initial memmap of a
* kexec kernel or for placing kexec images:
+ *
* - The booting kernel is in charge of determining how this memory will be
* used (e.g., use persistent memory as system RAM)
* - Coordination with a hypervisor is required before this memory
@@ -1651,9 +1666,12 @@ EXPORT_SYMBOL_GPL(add_memory);
*
* The resource_name (visible via /proc/iomem) has to have the format
* "System RAM ($DRIVER)".
+ *
+ * Return: 0 on success, negative error code on failure.
*/
-int add_memory_driver_managed(int nid, u64 start, u64 size,
- const char *resource_name, mhp_t mhp_flags)
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags,
+ enum mmop online_type)
{
struct resource *res;
int rc;
@@ -1663,6 +1681,9 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
resource_name[strlen(resource_name) - 1] != ')')
return -EINVAL;
+ if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
lock_device_hotplug();
res = register_memory_resource(start, size, resource_name);
@@ -1671,7 +1692,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
goto out_unlock;
}
- rc = add_memory_resource(nid, res, mhp_flags);
+ rc = __add_memory_resource(nid, res, mhp_flags, online_type);
if (rc < 0)
release_memory_resource(res);
@@ -1679,6 +1700,30 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
unlock_device_hotplug();
return rc;
}
+EXPORT_SYMBOL_FOR_MODULES(__add_memory_driver_managed, "kmem");
+
+/**
+ * add_memory_driver_managed - add driver-managed memory
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ *
+ * Add driver-managed memory with the system default online type set by
+ * build config or kernel boot parameter.
+ *
+ * See __add_memory_driver_managed for more details.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags)
+{
+ return __add_memory_driver_managed(nid, start, size, resource_name,
+ mhp_flags,
+ mhp_get_default_online_type());
+}
EXPORT_SYMBOL_GPL(add_memory_driver_managed);
/*
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
2026-06-05 21:19 ` [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
@ 2026-06-09 9:55 ` David Hildenbrand (Arm)
2026-06-09 15:12 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 9:55 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, osalvador, shuah, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
On 6/5/26 23:19, Gregory Price wrote:
> Existing callers of add_memory_driver_managed cannot select the
> preferred online type (ZONE_NORMAL vs ZONE_MOVABLE), requiring it to
> hot-add memory as offline blocks, and then follow up by onlining each
> memory block individually.
>
> Most drivers prefer the system default, but the CXL driver wants to
> plumb a preferred policy through the dax kmem driver.
>
> Refactor APIs to add a new interface which allows the dax kmem module
> to select a preferred policy.
>
> Overriding the configured auto-online policy is only safe for known
> in-tree modules, where we know the override reflects a different,
> user-requested policy. We do not want arbitrary out-of-tree drivers
> silently overriding the system-wide onlining policy, so restrict the
> new interface to the kmem module using EXPORT_SYMBOL_FOR_MODULES()
> rather than a plain EXPORT_SYMBOL_GPL(). Other in-tree modules (e.g.
> cxl_core) can be added to the allowed list as the need arises.
>
> Refactor add_memory_driver_managed, extract __add_memory_driver_managed
> - Add proper kernel-doc for add_memory_driver_managed while refactoring
> - New helper accepts an explicit online_type.
> - New helper validates online_type is between OFFLINE and ONLINE_MOVABLE
>
> Refactor: add_memory_resource, extract __add_memory_resource
> - new helper accepts an explicit online_type
>
> Original APIs now explicitly pass the system-default to new helpers.
>
> No functional change for existing users.
>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> include/linux/memory_hotplug.h | 3 ++
> mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++-----
> 2 files changed, 56 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f059025f8f8b..d3edeb80aadb 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> extern int add_memory_resource(int nid, struct resource *resource,
> mhp_t mhp_flags);
> +int __add_memory_driver_managed(int nid, u64 start, u64 size,
> + const char *resource_name, mhp_t mhp_flags,
> + enum mmop online_type);
We prefer two-tab indent on second parameter line while touching code / adding
new code.
Same applies to the other instances below.
Apart from that (still) LGTM.
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
2026-06-09 9:55 ` David Hildenbrand (Arm)
@ 2026-06-09 15:12 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-09 15:12 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple
On Tue, Jun 09, 2026 at 11:55:24AM +0200, David Hildenbrand (Arm) wrote:
> On 6/5/26 23:19, Gregory Price wrote:
> >
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index f059025f8f8b..d3edeb80aadb 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> > extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> > extern int add_memory_resource(int nid, struct resource *resource,
> > mhp_t mhp_flags);
> > +int __add_memory_driver_managed(int nid, u64 start, u64 size,
> > + const char *resource_name, mhp_t mhp_flags,
> > + enum mmop online_type);
>
> We prefer two-tab indent on second parameter line while touching code / adding
> new code.
>
> Same applies to the other instances below.
>
Will fix on next spin, thanks!
>
> Apart from that (still) LGTM.
>
> --
> Cheers,
>
> David
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (3 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:30 ` sashiko-bot
2026-06-09 10:06 ` David Hildenbrand (Arm)
2026-06-05 21:19 ` [PATCH v4 6/9] dax: plumb hotplug online_type through dax Gregory Price
` (3 subsequent siblings)
8 siblings, 2 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
offline_and_remove_memory() handles a single contiguous range.
Callers that manage a device composed of several ranges (e.g. dax/kmem)
currently have to call it in a loop, which gives up atomicity.
This creates a race condition where another daemon can online a block
that was just offlined while other blocks are being offlined, causing
the eventual (original) unplug operation to fail.
Add offline_and_remove_memory_ranges(), which takes an array of ranges
and processes them as one operation under a single lock_device_hotplug():
- Phase 1 offlines every block of every range, remembering each block's
previous online type.
- Phase 2 removes the ranges only once all of them are offline.
- If any offline fails, the offlining done so far is reverted and
nothing is removed.
This gives callers all-or-nothing semantics for the offline step, so a
failed or interrupted unplug leaves every range online as before rather
than in an inconsistent partially-removed state.
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 7 +++
mm/memory_hotplug.c | 95 ++++++++++++++++++++++++++++++++++
2 files changed, 102 insertions(+)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d3edeb80aadb..7f1da7c428dc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -267,6 +267,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
extern int offline_and_remove_memory(u64 start, u64 size);
+int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges);
#else
static inline void try_offline_node(int nid) {}
@@ -283,6 +284,12 @@ static inline int remove_memory(u64 start, u64 size)
}
static inline void __remove_memory(u64 start, u64 size) {}
+
+static inline int offline_and_remove_memory_ranges(const struct range *ranges,
+ int nr_ranges)
+{
+ return -EBUSY;
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7d145217adc6..e486d35c22b2 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2483,4 +2483,99 @@ int offline_and_remove_memory(u64 start, u64 size)
return rc;
}
EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+/**
+ * offline_and_remove_memory_ranges - offline and remove multiple memory ranges
+ * @ranges: array of physical address ranges to offline and remove
+ * @nr_ranges: number of entries in @ranges
+ *
+ * Offline and remove several memory ranges as one operation, serialized
+ * against other hotplug operations by a single lock_device_hotplug().
+ *
+ * Unlike calling offline_and_remove_memory() in a loop, this offlines *all*
+ * ranges before removing any of them. If offlining any range fails, the
+ * offlining of the ranges processed so far is reverted and nothing is
+ * removed, leaving every range online as it was before the call. This gives
+ * callers all-or-nothing semantics for the offline step, so a failed unplug
+ * does not leave a device split between online and removed ranges.
+ *
+ * Each range must be memory-block aligned in start and size.
+ *
+ * Return: 0 on success, negative errno otherwise. On failure no range has
+ * been removed.
+ */
+int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges)
+{
+ unsigned long mb_total = 0;
+ uint8_t *online_types, *tmp;
+ int i, rc = 0;
+
+ if (!ranges || nr_ranges <= 0)
+ return -EINVAL;
+
+ for (i = 0; i < nr_ranges; i++) {
+ u64 start = ranges[i].start;
+ u64 size = range_len(&ranges[i]);
+
+ if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
+ !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+ return -EINVAL;
+ mb_total += size / memory_block_size_bytes();
+ }
+
+ /*
+ * Remember the old online type of every memory block across all ranges,
+ * so we can revert if offlining a later block fails. All entries start
+ * as MMOP_OFFLINE so blocks we never touched are skipped on rollback.
+ */
+ online_types = kmalloc_array(mb_total, sizeof(*online_types),
+ GFP_KERNEL);
+ if (!online_types)
+ return -ENOMEM;
+ memset(online_types, MMOP_OFFLINE, mb_total);
+
+ lock_device_hotplug();
+
+ /* Phase 1: offline every block in every range. */
+ tmp = online_types;
+ for (i = 0; i < nr_ranges; i++) {
+ rc = walk_memory_blocks(ranges[i].start, range_len(&ranges[i]),
+ &tmp, try_offline_memory_block);
+ if (rc)
+ break;
+ }
+
+ /*
+ * Phase 2: only once everything is offline, remove it. This cannot
+ * fail as the memory can no longer be onlined in the meantime.
+ */
+ if (!rc) {
+ for (i = 0; i < nr_ranges; i++) {
+ rc = try_remove_memory(ranges[i].start,
+ range_len(&ranges[i]));
+ if (rc) {
+ pr_err("%s: Failed to remove memory: %d",
+ __func__, rc);
+ break;
+ }
+ }
+ }
+
+ /*
+ * Roll back the offlining if anything failed. Blocks we never offlined
+ * are marked MMOP_OFFLINE and skipped by try_reonline_memory_block().
+ */
+ if (rc) {
+ tmp = online_types;
+ for (i = 0; i < nr_ranges; i++)
+ walk_memory_blocks(ranges[i].start,
+ range_len(&ranges[i]), &tmp,
+ try_reonline_memory_block);
+ }
+ unlock_device_hotplug();
+
+ kfree(online_types);
+ return rc;
+}
+EXPORT_SYMBOL_GPL(offline_and_remove_memory_ranges);
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug
2026-06-05 21:19 ` [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug Gregory Price
@ 2026-06-05 21:30 ` sashiko-bot
2026-06-09 10:06 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:30 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] If `try_remove_memory()` fails during Phase 2, the rollback phase irrevocably corrupts the state array and breaks the 'all-or-nothing' guarantee.
--
commit 8e30dd8a7c852dc51179b28a0c0a3841cc1be525
Author: Gregory Price <gourry@gourry.net>
mm/memory_hotplug: add multi-range hotunplug
This commit adds offline_and_remove_memory_ranges() to offline and remove
multiple memory ranges atomically. It offlines all ranges in phase 1, and
removes them in phase 2, falling back to restoring online types if failures
occur.
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7d145217adc6b..e486d35c22b2c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
[ ... ]
> + /*
> + * Phase 2: only once everything is offline, remove it. This cannot
> + * fail as the memory can no longer be onlined in the meantime.
> + */
> + if (!rc) {
> + for (i = 0; i < nr_ranges; i++) {
> + rc = try_remove_memory(ranges[i].start,
> + range_len(&ranges[i]));
> + if (rc) {
> + pr_err("%s: Failed to remove memory: %d",
> + __func__, rc);
> + break;
> + }
> + }
> + }
[Severity: High]
The comment states that this cannot fail, but try_remove_memory() can return
errors, for example if memory_blocks_have_altmaps() returns -EINVAL.
If try_remove_memory() succeeds for the first range but fails for a subsequent
range, the blocks for the first range are irrevocably removed. Does this leave
the system in a partially removed state, breaking the all-or-nothing guarantee?
> +
> + /*
> + * Roll back the offlining if anything failed. Blocks we never offlined
> + * are marked MMOP_OFFLINE and skipped by try_reonline_memory_block().
> + */
> + if (rc) {
> + tmp = online_types;
> + for (i = 0; i < nr_ranges; i++)
> + walk_memory_blocks(ranges[i].start,
> + range_len(&ranges[i]), &tmp,
> + try_reonline_memory_block);
> + }
[Severity: High]
If we enter this rollback path after a partial failure in phase 2, will the
state array tracking become corrupted?
Since the first range was physically removed, memory_block_get() inside
walk_memory_blocks() will return NULL and skip try_reonline_memory_block().
Because the callback is skipped, the tmp pointer will not be advanced for
those blocks.
When the rollback loop moves to the next range, it will read from the beginning
of tmp, incorrectly applying the saved states of the first range to the blocks
of the second range. Can this desynchronization be avoided?
> + unlock_device_hotplug();
> +
> + kfree(online_types);
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(offline_and_remove_memory_ranges);
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=5
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug
2026-06-05 21:19 ` [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug Gregory Price
2026-06-05 21:30 ` sashiko-bot
@ 2026-06-09 10:06 ` David Hildenbrand (Arm)
2026-06-09 15:15 ` Gregory Price
1 sibling, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 10:06 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, osalvador, shuah, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
> EXPORT_SYMBOL_GPL(offline_and_remove_memory);
> +
> +/**
> + * offline_and_remove_memory_ranges - offline and remove multiple memory ranges
> + * @ranges: array of physical address ranges to offline and remove
> + * @nr_ranges: number of entries in @ranges
> + *
> + * Offline and remove several memory ranges as one operation, serialized
> + * against other hotplug operations by a single lock_device_hotplug().
> + *
> + * Unlike calling offline_and_remove_memory() in a loop, this offlines *all*
> + * ranges before removing any of them. If offlining any range fails, the
> + * offlining of the ranges processed so far is reverted and nothing is
> + * removed, leaving every range online as it was before the call. This gives
> + * callers all-or-nothing semantics for the offline step, so a failed unplug
> + * does not leave a device split between online and removed ranges.
> + *
> + * Each range must be memory-block aligned in start and size.
> + *
> + * Return: 0 on success, negative errno otherwise. On failure no range has
> + * been removed.
> + */
> +int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges)
> +{
Is there a way to just generalize the logic in offline_and_remove_memory() to
multiple ranges, making offline_and_remove_memory() then a simple wrapper around
the new offline_and_remove_memory_ranges(), providing only a single range?
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug
2026-06-09 10:06 ` David Hildenbrand (Arm)
@ 2026-06-09 15:15 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-09 15:15 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple
On Tue, Jun 09, 2026 at 12:06:38PM +0200, David Hildenbrand (Arm) wrote:
>
> > EXPORT_SYMBOL_GPL(offline_and_remove_memory);
> > +
> > +/**
> > + * offline_and_remove_memory_ranges - offline and remove multiple memory ranges
> > + * @ranges: array of physical address ranges to offline and remove
> > + * @nr_ranges: number of entries in @ranges
> > + *
> > + * Offline and remove several memory ranges as one operation, serialized
> > + * against other hotplug operations by a single lock_device_hotplug().
> > + *
> > + * Unlike calling offline_and_remove_memory() in a loop, this offlines *all*
> > + * ranges before removing any of them. If offlining any range fails, the
> > + * offlining of the ranges processed so far is reverted and nothing is
> > + * removed, leaving every range online as it was before the call. This gives
> > + * callers all-or-nothing semantics for the offline step, so a failed unplug
> > + * does not leave a device split between online and removed ranges.
> > + *
> > + * Each range must be memory-block aligned in start and size.
> > + *
> > + * Return: 0 on success, negative errno otherwise. On failure no range has
> > + * been removed.
> > + */
> > +int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges)
> > +{
>
> Is there a way to just generalize the logic in offline_and_remove_memory() to
> multiple ranges, making offline_and_remove_memory() then a simple wrapper around
> the new offline_and_remove_memory_ranges(), providing only a single range?
>
Yeah that's reasonable, I'll look at what can be done.
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 6/9] dax: plumb hotplug online_type through dax
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (4 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 5/9] mm/memory_hotplug: add multi-range hotunplug Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:31 ` sashiko-bot
2026-06-05 21:19 ` [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
` (2 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
There is no way for drivers leveraging dax_kmem to plumb through a
preferred auto-online policy - the system default policy is forced.
Add 'enum mmop' field to DAX device creation path to allow drivers
to specify an auto-online policy when using the kmem driver.
Current callers initialize online_type to mhp_get_default_online_type()
to retain backward compatibility and to make explicit to the drivers
what is actually happening underneath.
No functional changes to existing callers.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/bus.c | 3 +++
drivers/dax/bus.h | 2 ++
drivers/dax/cxl.c | 1 +
drivers/dax/dax-private.h | 3 +++
drivers/dax/hmem/hmem.c | 1 +
drivers/dax/kmem.c | 5 +++--
drivers/dax/pmem.c | 1 +
7 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 492573b47f66..6611fe399f59 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2017-2018 Intel Corporation. All rights reserved. */
#include <linux/memremap.h>
+#include <linux/memory_hotplug.h>
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/list.h>
@@ -394,6 +395,7 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
.size = 0,
.id = -1,
.memmap_on_memory = false,
+ .online_type = mhp_get_default_online_type(),
};
struct dev_dax *dev_dax = __devm_create_dev_dax(&data);
@@ -1527,6 +1529,7 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
ida_init(&dev_dax->ida);
dev_dax->memmap_on_memory = data->memmap_on_memory;
+ dev_dax->online_type = data->online_type;
inode = dax_inode(dax_dev);
dev->devt = inode->i_rdev;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 5909171a4428..c53a9427f8e4 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -3,6 +3,7 @@
#ifndef __DAX_BUS_H__
#define __DAX_BUS_H__
#include <linux/device.h>
+#include <linux/memory_hotplug.h>
#include <linux/platform_device.h>
#include <linux/range.h>
#include <linux/workqueue.h>
@@ -26,6 +27,7 @@ struct dev_dax_data {
resource_size_t size;
int id;
bool memmap_on_memory;
+ enum mmop online_type;
};
struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 3ab39b77843d..0eaef700bb2a 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -27,6 +27,7 @@ static int cxl_dax_region_probe(struct device *dev)
.id = -1,
.size = range_len(&cxlr_dax->hpa_range),
.memmap_on_memory = true,
+ .online_type = mhp_get_default_online_type(),
};
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 81e4af49e39c..0787325bc8dd 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -8,6 +8,7 @@
#include <linux/device.h>
#include <linux/cdev.h>
#include <linux/idr.h>
+#include <linux/memory_hotplug.h>
/* private routines between core files */
struct dax_device;
@@ -79,6 +80,7 @@ struct dev_dax_range {
* @dev: device core
* @pgmap: pgmap for memmap setup / lifetime (driver owned)
* @memmap_on_memory: allow kmem to put the memmap in the memory
+ * @online_type: MMOP_* online type for memory hotplug
* @nr_range: size of @ranges
* @ranges: range tuples of memory used
*/
@@ -95,6 +97,7 @@ struct dev_dax {
struct device dev;
struct dev_pagemap *pgmap;
bool memmap_on_memory;
+ enum mmop online_type;
int nr_range;
struct dev_dax_range *ranges;
};
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index af21f66bf872..0ef6e9ae660d 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -37,6 +37,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
.id = -1,
.size = region_idle ? 0 : range_len(&mri->range),
.memmap_on_memory = false,
+ .online_type = mhp_get_default_online_type(),
};
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 592171ec10f4..41ccb618a146 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -172,8 +172,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
* Ensure that future kexec'd kernels will not treat
* this as RAM automatically.
*/
- rc = add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags);
+ rc = __add_memory_driver_managed(data->mgid, range.start,
+ range_len(&range), kmem_name, mhp_flags,
+ dev_dax->online_type);
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index bee93066a849..a5f987814da5 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -63,6 +63,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
.pgmap = &pgmap,
.size = range_len(&range),
.memmap_on_memory = false,
+ .online_type = mhp_get_default_online_type(),
};
return devm_create_dev_dax(&data);
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 6/9] dax: plumb hotplug online_type through dax
2026-06-05 21:19 ` [PATCH v4 6/9] dax: plumb hotplug online_type through dax Gregory Price
@ 2026-06-05 21:31 ` sashiko-bot
2026-06-05 21:54 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:31 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] The default auto-online policy is captured at DAX device creation time rather than at driver bind time, breaking dynamic sysfs policy changes.
--
commit d528bb5263ccfd880f3665a7e5e92516c3faabd8
Author: Gregory Price <gourry@gourry.net>
dax: plumb hotplug online_type through dax
This commit adds an online_type field to the DAX device creation path to
allow drivers to specify an auto-online policy when using the kmem driver.
Existing callers are updated to initialize this field to the system default
policy.
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 492573b47f662..6611fe399f59e 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -394,6 +395,7 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
> .size = 0,
> .id = -1,
> .memmap_on_memory = false,
> + .online_type = mhp_get_default_online_type(),
> };
> struct dev_dax *dev_dax = __devm_create_dev_dax(&data);
[ ... ]
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 592171ec10f49..41ccb618a1464 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -172,8 +172,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> * Ensure that future kexec'd kernels will not treat
> * this as RAM automatically.
> */
> - rc = add_memory_driver_managed(data->mgid, range.start,
> - range_len(&range), kmem_name, mhp_flags);
> + rc = __add_memory_driver_managed(data->mgid, range.start,
> + range_len(&range), kmem_name, mhp_flags,
> + dev_dax->online_type);
[Severity: High]
Does capturing the auto-online policy at device creation time break dynamic
sysfs policy changes?
DAX devices are typically created early during boot, when the global
auto-online policy is still the kernel default. Previously,
add_memory_driver_managed() dynamically evaluated the policy when the dax_kmem
driver was bound. This allowed userspace to configure the auto-online policy
dynamically before binding the driver.
By caching the policy during device creation in create_store() (and other
creation paths) and consuming it later in dev_dax_kmem_probe(), the kmem driver
might use a stale policy and ignore runtime changes to the global policy.
Will this cause memory blocks to unintentionally remain offline if the policy
was changed after device creation?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=6
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 6/9] dax: plumb hotplug online_type through dax
2026-06-05 21:31 ` sashiko-bot
@ 2026-06-05 21:54 ` Gregory Price
2026-06-09 10:21 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:54 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-mm, david
On Fri, Jun 05, 2026 at 09:31:43PM +0000, sashiko-bot@kernel.org wrote:
>
> > diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> > index 592171ec10f49..41ccb618a1464 100644
> > --- a/drivers/dax/kmem.c
> > +++ b/drivers/dax/kmem.c
> > @@ -172,8 +172,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> > * Ensure that future kexec'd kernels will not treat
> > * this as RAM automatically.
> > */
> > - rc = add_memory_driver_managed(data->mgid, range.start,
> > - range_len(&range), kmem_name, mhp_flags);
> > + rc = __add_memory_driver_managed(data->mgid, range.start,
> > + range_len(&range), kmem_name, mhp_flags,
> > + dev_dax->online_type);
>
> [Severity: High]
> Does capturing the auto-online policy at device creation time break dynamic
> sysfs policy changes?
>
@David - we talked about this ~6+ mo ago at this point, i originally
proposed something like MMOP_SYSTEM_DEFAULT
https://lore.kernel.org/linux-mm/20260114085201.3222597-3-gourry@gourry.net/
https://lore.kernel.org/linux-mm/c4ed9675-269c-4764-86d5-87f4f83fc74d@kernel.org/
'''
I don't like having fake options as part of this interface.
Why can't we let selected users use mhp_get_default_online_type()
instead? Like add_memory_resource(). We can export that function.
'''
I guess the argument here is that the value can't be cached?
But we can't really enforce that.
This is a behavioral change we have to agree on (either we change it or
carry something like MMOP_SYSTEM_DEFAULT for this).
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v4 6/9] dax: plumb hotplug online_type through dax
2026-06-05 21:54 ` Gregory Price
@ 2026-06-09 10:21 ` David Hildenbrand (Arm)
2026-06-09 15:33 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 10:21 UTC (permalink / raw)
To: Gregory Price, sashiko-reviews; +Cc: linux-mm
On 6/5/26 23:54, Gregory Price wrote:
> On Fri, Jun 05, 2026 at 09:31:43PM +0000, sashiko-bot@kernel.org wrote:
>>
>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>> index 592171ec10f49..41ccb618a1464 100644
>>> --- a/drivers/dax/kmem.c
>>> +++ b/drivers/dax/kmem.c
>>> @@ -172,8 +172,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>> * Ensure that future kexec'd kernels will not treat
>>> * this as RAM automatically.
>>> */
>>> - rc = add_memory_driver_managed(data->mgid, range.start,
>>> - range_len(&range), kmem_name, mhp_flags);
>>> + rc = __add_memory_driver_managed(data->mgid, range.start,
>>> + range_len(&range), kmem_name, mhp_flags,
>>> + dev_dax->online_type);
>>
>> [Severity: High]
>> Does capturing the auto-online policy at device creation time break dynamic
>> sysfs policy changes?
>>
>
> @David - we talked about this ~6+ mo ago at this point, i originally
> proposed something like MMOP_SYSTEM_DEFAULT
>
> https://lore.kernel.org/linux-mm/20260114085201.3222597-3-gourry@gourry.net/
>
> https://lore.kernel.org/linux-mm/c4ed9675-269c-4764-86d5-87f4f83fc74d@kernel.org/
>
> '''
> I don't like having fake options as part of this interface.
>
> Why can't we let selected users use mhp_get_default_online_type()
> instead? Like add_memory_resource(). We can export that function.
> '''
>
> I guess the argument here is that the value can't be cached?
>
> But we can't really enforce that.
>
> This is a behavioral change we have to agree on (either we change it or
> carry something like MMOP_SYSTEM_DEFAULT for this).
I don't think we'd need a generic MMOP_* for that purpose (the core cannot
really do a lot with that AFAIKS, and the scenario "unspecified" doesn't really
occur in memory hotplug core).
I guess you could have some indication whether the user already configured it
manually (bool online_type_specified), and handle that in the handful of places.
if (dev_dax->online_type_specified)
online_type = dev_dax->online_type;
else
online_type = mhp_get_default_online_type();
Of course, you could also just call the old interface when adding without a
preference.
A negative online_type could also be used as "unspecified" in your code (MMOP_
starts at 0).
But the question really is which semantics we would want? is that exposed to
user-space somehow, such that the user could query it ("default")? or how does
this get set, and how does it later get used while it could have changed
system-wide?
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 6/9] dax: plumb hotplug online_type through dax
2026-06-09 10:21 ` David Hildenbrand (Arm)
@ 2026-06-09 15:33 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-09 15:33 UTC (permalink / raw)
To: David Hildenbrand (Arm); +Cc: sashiko-reviews, linux-mm
On Tue, Jun 09, 2026 at 12:21:34PM +0200, David Hildenbrand (Arm) wrote:
> On 6/5/26 23:54, Gregory Price wrote:
> >>
> >> [Severity: High]
> >> Does capturing the auto-online policy at device creation time break dynamic
> >> sysfs policy changes?
> >>
> >
> > @David - we talked about this ~6+ mo ago at this point, i originally
> > proposed something like MMOP_SYSTEM_DEFAULT
> >
> > https://lore.kernel.org/linux-mm/20260114085201.3222597-3-gourry@gourry.net/
> >
> > https://lore.kernel.org/linux-mm/c4ed9675-269c-4764-86d5-87f4f83fc74d@kernel.org/
> >
> > '''
> > I don't like having fake options as part of this interface.
> >
> > Why can't we let selected users use mhp_get_default_online_type()
> > instead? Like add_memory_resource(). We can export that function.
> > '''
> >
> > I guess the argument here is that the value can't be cached?
> >
> > But we can't really enforce that.
> >
> > This is a behavioral change we have to agree on (either we change it or
> > carry something like MMOP_SYSTEM_DEFAULT for this).
>
> I don't think we'd need a generic MMOP_* for that purpose (the core cannot
> really do a lot with that AFAIKS, and the scenario "unspecified" doesn't really
> occur in memory hotplug core).
>
> I guess you could have some indication whether the user already configured it
> manually (bool online_type_specified), and handle that in the handful of places.
>
> if (dev_dax->online_type_specified)
> online_type = dev_dax->online_type;
> else
> online_type = mhp_get_default_online_type();
>
Hm, I suppose this would do it.
I'm using -1 as "DAX_UNPLUGGED" at the moment, but I could actually just
add MMOP_UNPLUGGED as an explicit mode which only has relevance to callers
of hotplug (the state of the hotplug region they currently control is
MMOP_UNPLUGGED) - but not to hotplug itself (memory_hotplug.c doesn't
use it / rejects it).
Then i could use (dev_dax->online_type == -1) to mean "use system
default"
I'll bring this up in the mmop/str patch whether we should add
MMOP_UNPLUGGED.
> Of course, you could also just call the old interface when adding without a
> preference.
>
This is essentially the same thing, but we'll eventually want to pass an
online_type through the interface, so may as well just do as above.
> A negative online_type could also be used as "unspecified" in your code (MMOP_
> starts at 0).
>
> But the question really is which semantics we would want? is that exposed to
> user-space somehow, such that the user could query it ("default")? or how does
> this get set, and how does it later get used while it could have changed
> system-wide?
>
I think this is the core question.
What selection of these values do we want?
echo default > .../dax0.0/hotplug # -1
echo unplugged > .../dax0.0/hotplug # MMOP_UNPLUGGED
echo offline > .../dax0.0/hotplug # MMOP_OFFLINE
echo online > .../dax0.0/hotplug # MMOP_ONLINE
echo online_kernel > .../dax0.0/hotplug # MMOP_ONLINE_KERNEL
echo online_movable > .../dax0.0/hotplug # MMOP_ONLINE_MOVABLE
echo default > .../memoryN/state # -EINVAL - nonsensical
echo unplugged > .../memoryN/state # -EINVAL - nonsensical
echo offline > .../memoryN/state # MMOP_OFFLINE
echo online > .../memoryN/state # MMOP_ONLINE
echo online_kernel > .../memoryN/state # MMOP_ONLINE_KERNEL
echo online_movable > .../memoryN/state # MMOP_ONLINE_MOVABLE
And lets say the use sets a value, should they be able to return it to
the system default? I don't think that makes much sense, considering
the default is really only used on probe.
i.e.:
cat .../dax0.0/hotplug # default
echo online_movable > .../dax0.0/hotplug # online_movable
echo default > .../dax0.0/hotplug # -EINVAL?
The point being that .../dax0.0/hotplug doesn't exist unless the dax
device is already bound to kmem - and so by the time you can set it, the
point at which (hotplug == default) being useful is already passed.
Does this seem reasonable? The tl;dr is
+ MMOP_UNPLUGGED
+ #define DAX_HOTPLUG_DEFAULT (-1)
+ dev_dax->online_type = DAX_HOTPLUG_DEFAULT;
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (5 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 6/9] dax: plumb hotplug online_type through dax Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:36 ` sashiko-bot
2026-06-05 21:19 ` [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug Gregory Price
2026-06-05 21:19 ` [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
8 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Refactor kmem _probe() _remove() by extracting init, cleanup, hotplug,
and hot-remove logic into separate helper functions:
- dax_kmem_init_resources: inits IO_RESOURCE w/ request_mem_region
- dax_kmem_cleanup_resources: cleans up initialized IO_RESOURCE
- dax_kmem_do_hotplug: handles memory region reservation and adding
- dax_kmem_do_hotremove: handles memory removal and resource cleanup
This is a pure refactoring with no functional change. The helpers will
enable future extensions to support more granular control over memory
hotplug operations.
We need to split hotplug/remove and init/cleanup in order to have the
resources available for hot-add. Otherwise, when probe occurs, the dax
devices are never added to sysfs because the resources are never
registered.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 315 +++++++++++++++++++++++++++++++--------------
1 file changed, 215 insertions(+), 100 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 41ccb618a146..5bf36ab73f86 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -63,14 +63,195 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/**
+ * dax_kmem_do_hotplug - hotplug memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Hotplugs all ranges in the dev_dax region as system memory.
+ *
+ * Returns the number of successfully mapped ranges, or negative error.
+ */
+static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data,
+ int online_type)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, rc, onlined = 0;
+ mhp_t mhp_flags;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ mhp_flags = MHP_NID_IS_MGID;
+ if (dev_dax->memmap_on_memory)
+ mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+
+ /*
+ * Ensure that future kexec'd kernels will not treat
+ * this as RAM automatically.
+ */
+ rc = __add_memory_driver_managed(data->mgid, range.start,
+ range_len(&range), kmem_name, mhp_flags,
+ online_type);
+
+ if (rc) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
+ i, range.start, range.end);
+ /*
+ * Release the reservation for the range that failed to
+ * add so a later hotremove does not try to remove memory
+ * that was never added.
+ */
+ if (data->res[i]) {
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+ if (onlined)
+ continue;
+ return rc;
+ }
+ onlined++;
+ }
+
+ return onlined;
+}
+
+/**
+ * dax_kmem_init_resources - create memory regions for dax kmem
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Initializes all the resources for the DAX
+ *
+ * Returns the number of successfully mapped ranges, or negative error.
+ */
+static int dax_kmem_init_resources(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, rc, mapped = 0;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct resource *res;
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* Skip ranges already added */
+ if (data->res[i])
+ continue;
+
+ /* Region is permanently reserved if hotremove fails. */
+ res = request_mem_region(range.start, range_len(&range),
+ data->res_name);
+ if (!res) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
+ i, range.start, range.end);
+ /*
+ * Once some memory has been onlined we can't
+ * assume that it can be un-onlined safely.
+ */
+ if (mapped)
+ continue;
+ return -EBUSY;
+ }
+ data->res[i] = res;
+ /*
+ * Set flags appropriate for System RAM. Leave ..._BUSY clear
+ * so that add_memory() can add a child resource. Do not
+ * inherit flags from the parent since it may set new flags
+ * unknown to us that will break add_memory() below.
+ */
+ res->flags = IORESOURCE_SYSTEM_RAM;
+ mapped++;
+ }
+ return mapped;
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/**
+ * dax_kmem_do_hotremove - hot-remove memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Removes all ranges in the dev_dax region.
+ *
+ * Returns the number of successfully removed ranges.
+ */
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, success = 0;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+ int rc;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* range was never added during probe, count as removed */
+ if (!data->res[i]) {
+ success++;
+ continue;
+ }
+
+ rc = remove_memory(range.start, range_len(&range));
+ if (rc == 0) {
+ /* Release the resource for the successfully removed range */
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ success++;
+ continue;
+ }
+ any_hotremove_failed = true;
+ dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n",
+ i, range.start, range.end);
+ }
+
+ return success;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
+/**
+ * dax_kmem_cleanup_resources - remove the dax memory resources
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Removes all resources in the dev_dax region.
+ */
+static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ int i;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ if (!data->res[i])
+ continue;
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+}
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
unsigned long total_len = 0, orig_len = 0;
struct dax_kmem_data *data;
struct memory_dev_type *mtype;
- int i, rc, mapped = 0;
- mhp_t mhp_flags;
+ int i, rc;
int numa_node;
int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
@@ -132,68 +313,25 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct resource *res;
- struct range range;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- /* Region is permanently reserved if hotremove fails. */
- res = request_mem_region(range.start, range_len(&range), data->res_name);
- if (!res) {
- dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
- i, range.start, range.end);
- /*
- * Once some memory has been onlined we can't
- * assume that it can be un-onlined safely.
- */
- if (mapped)
- continue;
- rc = -EBUSY;
- goto err_request_mem;
- }
- data->res[i] = res;
-
- /*
- * Set flags appropriate for System RAM. Leave ..._BUSY clear
- * so that add_memory() can add a child resource. Do not
- * inherit flags from the parent since it may set new flags
- * unknown to us that will break add_memory() below.
- */
- res->flags = IORESOURCE_SYSTEM_RAM;
-
- mhp_flags = MHP_NID_IS_MGID;
- if (dev_dax->memmap_on_memory)
- mhp_flags |= MHP_MEMMAP_ON_MEMORY;
-
- /*
- * Ensure that future kexec'd kernels will not treat
- * this as RAM automatically.
- */
- rc = __add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags,
- dev_dax->online_type);
+ dev_set_drvdata(dev, data);
- if (rc) {
- dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
- i, range.start, range.end);
- remove_resource(res);
- kfree(res);
- data->res[i] = NULL;
- if (mapped)
- continue;
- goto err_request_mem;
- }
- mapped++;
- }
+ rc = dax_kmem_init_resources(dev_dax, data);
+ if (rc < 0)
+ goto err_resources;
- dev_set_drvdata(dev, data);
+ /*
+ * Hotplug using the configured online type for this device.
+ */
+ rc = dax_kmem_do_hotplug(dev_dax, data, dev_dax->online_type);
+ if (rc < 0)
+ goto err_hotplug;
return 0;
-err_request_mem:
+err_hotplug:
+ dax_kmem_cleanup_resources(dev_dax, data);
+err_resources:
+ dev_set_drvdata(dev, NULL);
memory_group_unregister(data->mgid);
err_reg_mgid:
kfree(data->res_name);
@@ -207,7 +345,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
#ifdef CONFIG_MEMORY_HOTREMOVE
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
- int i, success = 0;
+ int success;
int node = dev_dax->target_node;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
@@ -218,48 +356,25 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* there is no way to hotremove this memory until reboot because device
* unbind will succeed even if we return failure.
*/
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct range range;
- int rc;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- /* range was never added during probe */
- if (!data->res[i]) {
- success++;
- continue;
- }
-
- rc = remove_memory(range.start, range_len(&range));
- if (rc == 0) {
- remove_resource(data->res[i]);
- kfree(data->res[i]);
- data->res[i] = NULL;
- success++;
- continue;
- }
- any_hotremove_failed = true;
- dev_err(dev,
- "mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
- i, range.start, range.end);
+ success = dax_kmem_do_hotremove(dev_dax, data);
+ if (success < dev_dax->nr_range) {
+ dev_err(dev, "Hotplug regions stuck online until reboot\n");
+ return;
}
- if (success >= dev_dax->nr_range) {
- memory_group_unregister(data->mgid);
- kfree(data->res_name);
- kfree(data);
- dev_set_drvdata(dev, NULL);
- /*
- * Clear the memtype association on successful unplug.
- * If not, we have memory blocks left which can be
- * offlined/onlined later. We need to keep memory_dev_type
- * for that. This implies this reference will be around
- * till next reboot.
- */
- clear_node_memory_type(node, NULL);
- }
+ dax_kmem_cleanup_resources(dev_dax, data);
+ memory_group_unregister(data->mgid);
+ kfree(data->res_name);
+ kfree(data);
+ dev_set_drvdata(dev, NULL);
+ /*
+ * Clear the memtype association on successful unplug.
+ * If not, we have memory blocks left which can be
+ * offlined/onlined later. We need to keep memory_dev_type
+ * for that. This implies this reference will be around
+ * till next reboot.
+ */
+ clear_node_memory_type(node, NULL);
}
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions
2026-06-05 21:19 ` [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
@ 2026-06-05 21:36 ` sashiko-bot
2026-06-05 22:03 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:36 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] dax_kmem_do_hotplug() unconditionally attempts to add memory for all ranges, missing a check for successfully reserved I/O resources, leading to unsafe hotplug and a permanent memory leak on removal.
--
commit 9aae29a4ee0b2d84bab7e3e062cb31c70f75bf67
Author: Gregory Price <gourry@gourry.net>
dax/kmem: extract hotplug/hotremove helper functions
Refactor kmem _probe() _remove() by extracting init, cleanup, hotplug,
and hot-remove logic into separate helper functions.
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 41ccb618a1464..5bf36ab73f867 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
[ ... ]
> @@ -63,14 +63,195 @@ static void kmem_put_memory_types(void)
> mt_put_memory_types(&kmem_memory_types);
> }
>
> +/**
> + * dax_kmem_do_hotplug - hotplug memory for dax kmem device
[ ... ]
> +static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
> + struct dax_kmem_data *data,
> + int online_type)
> +{
> + struct device *dev = &dev_dax->dev;
> + int i, rc, onlined = 0;
> + mhp_t mhp_flags;
> +
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range range;
> +
> + rc = dax_kmem_range(dev_dax, i, &range);
> + if (rc)
> + continue;
> +
> + mhp_flags = MHP_NID_IS_MGID;
> + if (dev_dax->memmap_on_memory)
> + mhp_flags |= MHP_MEMMAP_ON_MEMORY;
> +
> + /*
> + * Ensure that future kexec'd kernels will not treat
> + * this as RAM automatically.
> + */
> + rc = __add_memory_driver_managed(data->mgid, range.start,
> + range_len(&range), kmem_name, mhp_flags,
> + online_type);
[Severity: High]
Should this loop verify if the resource was actually reserved before
adding the memory?
In dax_kmem_init_resources(), if request_mem_region() fails for a range
but a previous range succeeded (mapped > 0), the failure is bypassed and
data->res[i] is left as NULL.
If we don't check for !data->res[i] here, does this unconditionally call
__add_memory_driver_managed() and hot-add memory for physical ranges that
failed reservation?
> +
> + if (rc) {
> + dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
[ ... ]
> +static int dax_kmem_init_resources(struct dev_dax *dev_dax,
> + struct dax_kmem_data *data)
> +{
[ ... ]
> + /* Region is permanently reserved if hotremove fails. */
> + res = request_mem_region(range.start, range_len(&range),
> + data->res_name);
> + if (!res) {
> + dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
> + i, range.start, range.end);
> + /*
> + * Once some memory has been onlined we can't
> + * assume that it can be un-onlined safely.
> + */
> + if (mapped)
> + continue;
> + return -EBUSY;
> + }
> + data->res[i] = res;
[ ... ]
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +/**
> + * dax_kmem_do_hotremove - hot-remove memory for dax kmem device
[ ... ]
> +static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
> + struct dax_kmem_data *data)
> +{
> + struct device *dev = &dev_dax->dev;
> + int i, success = 0;
> +
> + for (i = 0; i < dev_dax->nr_range; i++) {
> + struct range range;
> + int rc;
> +
> + rc = dax_kmem_range(dev_dax, i, &range);
> + if (rc)
> + continue;
> +
> + /* range was never added during probe, count as removed */
> + if (!data->res[i]) {
> + success++;
> + continue;
> + }
[Severity: High]
If dax_kmem_do_hotplug() erroneously added memory for a range with a
NULL data->res[i], will this check cause us to skip remove_memory()
for that range?
Does this strand the memory block online permanently?
> +
> + rc = remove_memory(range.start, range_len(&range));
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=7
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions
2026-06-05 21:36 ` sashiko-bot
@ 2026-06-05 22:03 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 22:03 UTC (permalink / raw)
To: sashiko-reviews; +Cc: linux-cxl
On Fri, Jun 05, 2026 at 09:36:01PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] dax_kmem_do_hotplug() unconditionally attempts to add memory for all ranges, missing a check for successfully reserved I/O resources, leading to unsafe hotplug and a permanent memory leak on removal.
> --
>
> > +
> > + for (i = 0; i < dev_dax->nr_range; i++) {
> > + struct range range;
> > +
> > + rc = dax_kmem_range(dev_dax, i, &range);
> > + if (rc)
> > + continue;
> > +
> > + mhp_flags = MHP_NID_IS_MGID;
> > + if (dev_dax->memmap_on_memory)
> > + mhp_flags |= MHP_MEMMAP_ON_MEMORY;
> > +
> > + /*
> > + * Ensure that future kexec'd kernels will not treat
> > + * this as RAM automatically.
> > + */
> > + rc = __add_memory_driver_managed(data->mgid, range.start,
> > + range_len(&range), kmem_name, mhp_flags,
> > + online_type);
>
> [Severity: High]
> Should this loop verify if the resource was actually reserved before
> adding the memory?
>
> In dax_kmem_init_resources(), if request_mem_region() fails for a range
> but a previous range succeeded (mapped > 0), the failure is bypassed and
> data->res[i] is left as NULL.
>
> If we don't check for !data->res[i] here, does this unconditionally call
> __add_memory_driver_managed() and hot-add memory for physical ranges that
> failed reservation?
>
This feedback is actually subtly wrong, the old code would do
for region:
get_resource()
hotplug()
The new code does
for region:
get_resource()
for resource:
hotplug()
I see now I forgot to error out if nr_success != nr_requested. With
that, this condition goes away entirely and we can cleanup without a
partial hotplug event from having occurred.
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (6 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:37 ` sashiko-bot
2026-06-09 10:26 ` David Hildenbrand (Arm)
2026-06-05 21:19 ` [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
8 siblings, 2 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple,
Hannes Reinecke
The dax kmem driver currently onlines memory automatically during
probe using the system's default online policy but provides no way
to control or query the entire region state at runtime.
Additionally, there is no atomic mechanism to offline and remove
the entire set of memory blocks together. Instead, this is presently
done in two steps: (offline all, remove all). This creates a race
condition where external entities can operate directly on the blocks
and cause hot-unplug to fail.
Add a new 'hotplug' sysfs attribute that allows userspace to control
and query the entire memory region state. The writable states mirror
the per-block /sys/devices/system/memory/memoryX/state ABI:
- "unplugged": memory blocks are not present
- "online": memory is online, zone chosen by the kernel
- "online_kernel": memory is online in ZONE_NORMAL
- "online_movable": memory is online in ZONE_MOVABLE
The "unplugged" state is new and only applies to kmem/hotplug.
Valid transitions:
- unplugged -> online[_kernel|_movable]
- online | online_kernel | online_movable -> unplugged
- offline -> unplugged
A device can only be onlined from "unplugged", so it must be returned
there before being onlined into a different state.
For backwards compatibility the memory blocks are always created at
probe: existing tools expect them to be present once the kmem driver
binds. When the configured policy (mhp_get_default_online_type())
selects an online state the blocks are onlined into that policy's zone;
when the policy is offline the blocks are created but left offline and
the device reports the state "offline".
"offline" is therefore a reportable state but is not writable: it only
arises from the legacy auto_online_blocks=offline policy. Onlining such
a device through this attribute requires unplugging it first.
The "offline" state may be deprecated later if the memory block ABI
changes and userland migrates to using the region-wide hotplug.
Unplug is atomic across the whole device: dax_kmem_do_hotremove()
collects every added range and offlines/removes them in one operation
via offline_and_remove_memory_ranges(). Either all ranges are removed
and the device becomes "unplugged", or offlining is rolled back and the
device is left fully online, so the reported 'hotplug' state always
matches reality.
Unbind Note:
We used to call remove_memory() during unbind, which would fire a
BUG() if any of the memory blocks were online at that time. We lift
this into a WARN in the cleanup routine and don't attempt hotremove
if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE. Memory that is
merely offline (the legacy "offline" state) is removed on unbind as
before; only online memory is left pinned.
The resources are still leaked but this prevents deadlock on unbind
if a memory region happens to be impossible to hotremove.
Inconsistency Note:
Since memory blocks can still be modified individually, the hotplug
attribute can become out of sync with the state of the system if
userland software mixes and matches the use of memory_block ABI and
kmem/hotplug ABI. It's suggests to use one or the other.
Suggested-by: Hannes Reinecke <hare@suse.de>
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
Documentation/ABI/testing/sysfs-bus-dax | 25 +++
drivers/dax/kmem.c | 254 ++++++++++++++++++++----
2 files changed, 238 insertions(+), 41 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index b34266bfae49..931eb4e20358 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -151,3 +151,28 @@ Description:
memmap_on_memory parameter for memory_hotplug. This is
typically set on the kernel command line -
memory_hotplug.memmap_on_memory set to 'true' or 'force'."
+
+What: /sys/bus/dax/devices/daxX.Y/hotplug
+Date: January, 2026
+KernelVersion: v6.21
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Controls the hotplug state of the memory region.
+ Applies to all memory blocks associated with the device.
+ Only applies to dax_kmem devices.
+
+ Reading returns the current state; the writable states mirror
+ the per-block /sys/devices/system/memory/memoryX/state ABI:
+ "unplugged": memory blocks are not present
+ "online": memory is online, zone chosen by the kernel
+ "online_kernel": memory is online in ZONE_NORMAL
+ "online_movable": memory is online in ZONE_MOVABLE
+
+ "offline" (memory blocks are present but offline) may also be
+ reported - this happens when the device is bound while the
+ auto_online_blocks policy is offline. It cannot be written and
+ is deprecated; it may be removed in the future.
+
+ A device can only be onlined from the "unplugged" state, so a
+ device must be returned to "unplugged" before it can be onlined
+ into a different state.
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 5bf36ab73f86..46ee06d9f56b 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -42,9 +42,15 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
return 0;
}
+#define DAX_KMEM_UNPLUGGED (-1)
+
struct dax_kmem_data {
const char *res_name;
int mgid;
+ int numa_node;
+ struct dev_dax *dev_dax;
+ int state;
+ struct mutex lock; /* protects hotplug state transitions */
struct resource *res[];
};
@@ -63,23 +69,41 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/* True for the online states a kmem dax device can hold. */
+static bool dax_kmem_state_is_online(int state)
+{
+ return state == MMOP_ONLINE ||
+ state == MMOP_ONLINE_KERNEL ||
+ state == MMOP_ONLINE_MOVABLE;
+}
+
/**
- * dax_kmem_do_hotplug - hotplug memory for dax kmem device
+ * dax_kmem_do_hotplug - add the dev_dax memory ranges as system memory
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
+ * @online_type: MMOP_OFFLINE to add the blocks offline, otherwise the online
+ * state (MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE)
+ * to bring them online in.
*
- * Hotplugs all ranges in the dev_dax region as system memory.
+ * Adds all ranges in the dev_dax region as system memory, onlining them in
+ * the requested zone unless @online_type is MMOP_OFFLINE.
*
- * Returns the number of successfully mapped ranges, or negative error.
+ * Returns the number of successfully added ranges, or negative error.
*/
static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
struct dax_kmem_data *data,
int online_type)
{
struct device *dev = &dev_dax->dev;
- int i, rc, onlined = 0;
+ int i, rc, added = 0;
mhp_t mhp_flags;
+ if (dax_kmem_state_is_online(data->state))
+ return -EINVAL;
+
+ if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
@@ -112,14 +136,14 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
kfree(data->res[i]);
data->res[i] = NULL;
}
- if (onlined)
+ if (added)
continue;
return rc;
}
- onlined++;
+ added++;
}
- return onlined;
+ return added;
}
/**
@@ -182,45 +206,65 @@ static int dax_kmem_init_resources(struct dev_dax *dev_dax,
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
*
- * Removes all ranges in the dev_dax region.
+ * Offlines and removes every currently-added range in the dev_dax region
+ * atomically: either all ranges are offlined and removed, or none are and
+ * the device is left fully online (see offline_and_remove_memory_ranges()).
*
- * Returns the number of successfully removed ranges.
+ * Returns 0 on success, or a negative errno if the device could not be
+ * fully unplugged (in which case nothing was removed).
*/
static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
struct dax_kmem_data *data)
{
struct device *dev = &dev_dax->dev;
- int i, success = 0;
+ struct range *ranges;
+ int i, nr_ranges = 0, rc;
+ ranges = kmalloc_array(dev_dax->nr_range, sizeof(*ranges), GFP_KERNEL);
+ if (!ranges)
+ return -ENOMEM;
+
+ /* Collect the ranges that were actually added during probe. */
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
- int rc;
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
+ if (!data->res[i])
continue;
-
- /* range was never added during probe, count as removed */
- if (!data->res[i]) {
- success++;
+ if (dax_kmem_range(dev_dax, i, &range))
continue;
- }
+ ranges[nr_ranges++] = range;
+ }
- rc = remove_memory(range.start, range_len(&range));
- if (rc == 0) {
- /* Release the resource for the successfully removed range */
- remove_resource(data->res[i]);
- kfree(data->res[i]);
- data->res[i] = NULL;
- success++;
- continue;
- }
+ /* Nothing added means nothing to remove. */
+ if (!nr_ranges) {
+ kfree(ranges);
+ return 0;
+ }
+
+ rc = offline_and_remove_memory_ranges(ranges, nr_ranges);
+ kfree(ranges);
+ if (rc) {
any_hotremove_failed = true;
- dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n",
- i, range.start, range.end);
+ dev_err(dev, "hotremove failed, device left online: %d\n", rc);
+ return rc;
}
- return success;
+ /* All ranges removed; release the reserved resources. */
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ if (!data->res[i])
+ continue;
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+
+ return 0;
+}
+#else
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ return -EBUSY;
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
@@ -236,6 +280,20 @@ static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
{
int i;
+ /*
+ * If the device unbind occurs before memory is hotremoved, we can never
+ * remove the memory (requires reboot). Attempting an offline operation
+ * here may cause deadlock and a failure to finish the unbind.
+ *
+ * This WARN used to be a BUG called by remove_memory().
+ *
+ * Note: This leaks the resources.
+ */
+ if (WARN(((data->state != DAX_KMEM_UNPLUGGED) &&
+ (data->state != MMOP_OFFLINE)),
+ "Hotplug memory regions stuck online until reboot"))
+ return;
+
for (i = 0; i < dev_dax->nr_range; i++) {
if (!data->res[i])
continue;
@@ -245,6 +303,107 @@ static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
}
}
+static int dax_kmem_parse_state(const char *buf)
+{
+ if (sysfs_streq(buf, "unplugged"))
+ return DAX_KMEM_UNPLUGGED;
+ if (sysfs_streq(buf, "online"))
+ return MMOP_ONLINE;
+ if (sysfs_streq(buf, "online_kernel"))
+ return MMOP_ONLINE_KERNEL;
+ if (sysfs_streq(buf, "online_movable"))
+ return MMOP_ONLINE_MOVABLE;
+ return -EINVAL;
+}
+
+static ssize_t hotplug_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ const char *state_str;
+
+ if (!data)
+ return -ENXIO;
+
+ switch (data->state) {
+ case DAX_KMEM_UNPLUGGED:
+ state_str = "unplugged";
+ break;
+ case MMOP_OFFLINE:
+ state_str = "offline";
+ break;
+ case MMOP_ONLINE:
+ state_str = "online";
+ break;
+ case MMOP_ONLINE_KERNEL:
+ state_str = "online_kernel";
+ break;
+ case MMOP_ONLINE_MOVABLE:
+ state_str = "online_movable";
+ break;
+ default:
+ state_str = "unknown";
+ break;
+ }
+
+ return sysfs_emit(buf, "%s\n", state_str);
+}
+
+static ssize_t hotplug_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ int online_type;
+ int rc;
+
+ if (!data)
+ return -ENXIO;
+
+ online_type = dax_kmem_parse_state(buf);
+ if (online_type < DAX_KMEM_UNPLUGGED)
+ return online_type;
+
+ guard(mutex)(&data->lock);
+
+ /* Already in requested state */
+ if (data->state == online_type)
+ return len;
+
+ if (online_type == DAX_KMEM_UNPLUGGED) {
+ rc = dax_kmem_do_hotremove(dev_dax, data);
+ if (rc)
+ return rc;
+ data->state = DAX_KMEM_UNPLUGGED;
+ return len;
+ }
+
+ /*
+ * Onlining is only allowed from the unplugged state. An already-online
+ * device (or one left in the legacy offline state) must be unplugged
+ * first.
+ */
+ if (data->state != DAX_KMEM_UNPLUGGED)
+ return -EBUSY;
+
+ /*
+ * A previous unplug releases the per-range resources, so re-acquire
+ * them here (mirroring probe). This is a no-op for ranges that are
+ * still reserved (e.g. transitioning from the offline state).
+ */
+ rc = dax_kmem_init_resources(dev_dax, data);
+ if (rc < 0)
+ return rc;
+
+ rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ if (rc < 0)
+ return rc;
+
+ data->state = online_type;
+ return len;
+}
+static DEVICE_ATTR_RW(hotplug);
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
@@ -312,6 +471,10 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc < 0)
goto err_reg_mgid;
data->mgid = rc;
+ data->numa_node = numa_node;
+ data->dev_dax = dev_dax;
+ data->state = DAX_KMEM_UNPLUGGED;
+ mutex_init(&data->lock);
dev_set_drvdata(dev, data);
@@ -320,11 +483,19 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_resources;
/*
- * Hotplug using the configured online type for this device.
+ * Always create the memory blocks for backwards compatibility: existing
+ * tools expect them to be present after the kmem driver binds. Under
+ * the offline policy they are added but left offline (state
+ * MMOP_OFFLINE); otherwise they are onlined per the configured policy.
*/
rc = dax_kmem_do_hotplug(dev_dax, data, dev_dax->online_type);
if (rc < 0)
goto err_hotplug;
+ data->state = dev_dax->online_type;
+
+ rc = device_create_file(dev, &dev_attr_hotplug);
+ if (rc)
+ dev_warn(dev, "failed to create hotplug sysfs entry\n");
return 0;
@@ -345,23 +516,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
#ifdef CONFIG_MEMORY_HOTREMOVE
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
- int success;
int node = dev_dax->target_node;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
+ device_remove_file(dev, &dev_attr_hotplug);
/*
- * We have one shot for removing memory, if some memory blocks were not
- * offline prior to calling this function remove_memory() will fail, and
- * there is no way to hotremove this memory until reboot because device
- * unbind will succeed even if we return failure.
+ * Blocks added under the legacy offline policy are present but offline;
+ * remove them on unbind as the driver always has. If removal fails,
+ * leak the resources rather than freeing state that still backs present
+ * memory. Online memory is left alone (dax_kmem_cleanup_resources()
+ * warns and leaks it) since offlining it here could deadlock the unbind.
*/
- success = dax_kmem_do_hotremove(dev_dax, data);
- if (success < dev_dax->nr_range) {
- dev_err(dev, "Hotplug regions stuck online until reboot\n");
+ if (data->state == MMOP_OFFLINE && dax_kmem_do_hotremove(dev_dax, data))
return;
- }
-
dax_kmem_cleanup_resources(dev_dax, data);
memory_group_unregister(data->mgid);
kfree(data->res_name);
@@ -379,6 +547,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
+ struct device *dev = &dev_dax->dev;
+
+ device_remove_file(dev, &dev_attr_hotplug);
+
/*
* Without hotremove purposely leak the request_mem_region() for the
* device-dax range and return '0' to ->remove() attempts. The removal
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-05 21:19 ` [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug Gregory Price
@ 2026-06-05 21:37 ` sashiko-bot
2026-06-09 10:26 ` David Hildenbrand (Arm)
1 sibling, 0 replies; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:37 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Unbinding the device while memory is online causes a use-after-free of `data->res_name` and incorrectly clears the node memory type.
- [High] `dax_kmem_do_hotplug()` unconditionally adds memory regions to the system even if their physical address reservations failed.
- [Low] Lockless access to `data->state` in `hotplug_show` can race with concurrent updates.
--
commit 3599d063a2c3d29c40b1495050182e1083f34ccd
Author: Gregory Price <gourry@gourry.net>
dax/kmem: add sysfs interface for atomic hotplug
The dax kmem driver currently onlines memory automatically during
probe using the system's default online policy but provides no way
to control or query the entire region state at runtime.
This commit adds a 'hotplug' sysfs attribute allowing userspace to
control and query the entire memory region state. Unplug operations are
now atomic across the whole device.
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 5bf36ab73f867..46ee06d9f56b3 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
[ ... ]
> @@ -63,23 +69,41 @@ static void kmem_put_memory_types(void)
> mt_put_memory_types(&kmem_memory_types);
> }
>
> +/* True for the online states a kmem dax device can hold. */
> +static bool dax_kmem_state_is_online(int state)
> +{
> + return state == MMOP_ONLINE ||
> + state == MMOP_ONLINE_KERNEL ||
> + state == MMOP_ONLINE_MOVABLE;
> +}
> +
> /**
> - * dax_kmem_do_hotplug - hotplug memory for dax kmem device
> + * dax_kmem_do_hotplug - add the dev_dax memory ranges as system memory
> * @dev_dax: the dev_dax instance
> * @data: the dax_kmem_data structure with resource tracking
> + * @online_type: MMOP_OFFLINE to add the blocks offline, otherwise the online
> + * state (MMOP_ONLINE, MMOP_ONLINE_KERNEL, MMOP_ONLINE_MOVABLE)
> + * to bring them online in.
> *
> - * Hotplugs all ranges in the dev_dax region as system memory.
> + * Adds all ranges in the dev_dax region as system memory, onlining them in
> + * the requested zone unless @online_type is MMOP_OFFLINE.
> *
> - * Returns the number of successfully mapped ranges, or negative error.
> */
> static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
> struct dax_kmem_data *data,
> int online_type)
> {
> struct device *dev = &dev_dax->dev;
> - int i, rc, onlined = 0;
> + int i, rc, added = 0;
> mhp_t mhp_flags;
>
> + if (dax_kmem_state_is_online(data->state))
> + return -EINVAL;
> +
> + if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
> + return -EINVAL;
> +
> for (i = 0; i < dev_dax->nr_range; i++) {
> struct range range;
>
> @@ -88,10 +112,10 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
> if (rc)
> continue;
>
> mhp_flags = MHP_NID_IS_MGID;
> if (dev_dax->memmap_on_memory)
> mhp_flags |= MHP_MEMMAP_ON_MEMORY;
>
> /*
> * Ensure that future kexec'd kernels will not treat
> * this as RAM automatically.
> */
> rc = __add_memory_driver_managed(data->mgid, range.start,
> range_len(&range), kmem_name, mhp_flags,
> online_type);
[Severity: High]
Does this code unconditionally add memory regions to the system even if their
physical address reservations failed?
If `request_mem_region()` fails in `dax_kmem_init_resources()`, it skips
reserving the region (leaving `data->res[i] = NULL`), but continues the loop
if previous ranges succeeded.
However, `dax_kmem_do_hotplug()` iterates over all ranges and blindly calls
`__add_memory_driver_managed()` without checking if `data->res[i]` is valid.
This attempts to forcibly online unreserved physical addresses as System RAM.
If these address ranges overlap with active MMIO or firmware regions, could
this lead to severe system stability issues and memory corruption?
[ ... ]
> +static ssize_t hotplug_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dax_kmem_data *data = dev_get_drvdata(dev);
> + const char *state_str;
> +
> + if (!data)
> + return -ENXIO;
> +
> + switch (data->state) {
[Severity: Low]
Does this lockless read of `data->state` race with concurrent updates?
`hotplug_show()` reads `data->state` locklessly to return the current state
string to userspace. Concurrently, `hotplug_store()` modifies `data->state`
while holding `data->lock`.
Should this be protected by the lock, or properly annotated with
`READ_ONCE()`/`WRITE_ONCE()` to prevent compiler tearing?
> + case DAX_KMEM_UNPLUGGED:
> + state_str = "unplugged";
> + break;
[ ... ]
> @@ -345,23 +516,20 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
> #ifdef CONFIG_MEMORY_HOTREMOVE
> static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
> {
> - int success;
> int node = dev_dax->target_node;
> struct device *dev = &dev_dax->dev;
> struct dax_kmem_data *data = dev_get_drvdata(dev);
>
> + device_remove_file(dev, &dev_attr_hotplug);
> /*
> - * We have one shot for removing memory, if some memory blocks were not
> - * offline prior to calling this function remove_memory() will fail, and
> - * there is no way to hotremove this memory until reboot because device
> - * unbind will succeed even if we return failure.
> + * Blocks added under the legacy offline policy are present but offline;
> + * remove them on unbind as the driver always has. If removal fails,
> + * leak the resources rather than freeing state that still backs present
> + * memory. Online memory is left alone (dax_kmem_cleanup_resources()
> + * warns and leaks it) since offlining it here could deadlock the unbind.
> */
> - success = dax_kmem_do_hotremove(dev_dax, data);
> - if (success < dev_dax->nr_range) {
> - dev_err(dev, "Hotplug regions stuck online until reboot\n");
> + if (data->state == MMOP_OFFLINE && dax_kmem_do_hotremove(dev_dax, data))
> return;
> - }
> -
> dax_kmem_cleanup_resources(dev_dax, data);
> memory_group_unregister(data->mgid);
> kfree(data->res_name);
[Severity: High]
Can unbinding the device while memory is online cause a use-after-free
of `data->res_name` and incorrectly clear the node memory type?
When `data->state != MMOP_OFFLINE`, `dev_dax_kmem_remove()` skips
`dax_kmem_do_hotremove()` and calls `dax_kmem_cleanup_resources()`.
`dax_kmem_cleanup_resources()` intentionally warns and returns early to leave
the `struct resource` objects active to prevent unbind deadlocks.
However, `dev_dax_kmem_remove()` unconditionally proceeds to call
`kfree(data->res_name)` and `kfree(data)`. Since the leaked `struct resource`
objects' name fields still point to `data->res_name`, will any subsequent
read of `/proc/iomem` trigger a KASAN Use-After-Free?
Additionally, does the unconditional continuation improperly call
`clear_node_memory_type()`, corrupting the memtype association while the
memory blocks are still active?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=8
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-05 21:19 ` [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug Gregory Price
2026-06-05 21:37 ` sashiko-bot
@ 2026-06-09 10:26 ` David Hildenbrand (Arm)
2026-06-09 15:35 ` Gregory Price
1 sibling, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 10:26 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, osalvador, shuah, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple,
Hannes Reinecke
On 6/5/26 23:19, Gregory Price wrote:
> The dax kmem driver currently onlines memory automatically during
> probe using the system's default online policy but provides no way
> to control or query the entire region state at runtime.
>
> Additionally, there is no atomic mechanism to offline and remove
> the entire set of memory blocks together. Instead, this is presently
> done in two steps: (offline all, remove all). This creates a race
> condition where external entities can operate directly on the blocks
> and cause hot-unplug to fail.
>
> Add a new 'hotplug' sysfs attribute that allows userspace to control
> and query the entire memory region state. The writable states mirror
> the per-block /sys/devices/system/memory/memoryX/state ABI:
> - "unplugged": memory blocks are not present
> - "online": memory is online, zone chosen by the kernel
> - "online_kernel": memory is online in ZONE_NORMAL
> - "online_movable": memory is online in ZONE_MOVABLE
>
> The "unplugged" state is new and only applies to kmem/hotplug.
>
> Valid transitions:
> - unplugged -> online[_kernel|_movable]
> - online | online_kernel | online_movable -> unplugged
> - offline -> unplugged
>
> A device can only be onlined from "unplugged", so it must be returned
> there before being onlined into a different state.
>
> For backwards compatibility the memory blocks are always created at
> probe: existing tools expect them to be present once the kmem driver
> binds. When the configured policy (mhp_get_default_online_type())
> selects an online state the blocks are onlined into that policy's zone;
> when the policy is offline the blocks are created but left offline and
> the device reports the state "offline".
>
> "offline" is therefore a reportable state but is not writable: it only
> arises from the legacy auto_online_blocks=offline policy. Onlining such
> a device through this attribute requires unplugging it first.
>
> The "offline" state may be deprecated later if the memory block ABI
> changes and userland migrates to using the region-wide hotplug.
>
> Unplug is atomic across the whole device: dax_kmem_do_hotremove()
> collects every added range and offlines/removes them in one operation
> via offline_and_remove_memory_ranges(). Either all ranges are removed
> and the device becomes "unplugged", or offlining is rolled back and the
> device is left fully online, so the reported 'hotplug' state always
> matches reality.
>
> Unbind Note:
> We used to call remove_memory() during unbind, which would fire a
> BUG() if any of the memory blocks were online at that time. We lift
> this into a WARN in the cleanup routine and don't attempt hotremove
> if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE. Memory that is
> merely offline (the legacy "offline" state) is removed on unbind as
> before; only online memory is left pinned.
>
> The resources are still leaked but this prevents deadlock on unbind
> if a memory region happens to be impossible to hotremove.
>
> Inconsistency Note:
>
> Since memory blocks can still be modified individually, the hotplug
> attribute can become out of sync with the state of the system if
> userland software mixes and matches the use of memory_block ABI and
> kmem/hotplug ABI. It's suggests to use one or the other.
>
> Suggested-by: Hannes Reinecke <hare@suse.de>
> Suggested-by: David Hildenbrand <david@kernel.org>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
[...]
>
> +static int dax_kmem_parse_state(const char *buf)
> +{
> + if (sysfs_streq(buf, "unplugged"))
> + return DAX_KMEM_UNPLUGGED;
> + if (sysfs_streq(buf, "online"))
> + return MMOP_ONLINE;
> + if (sysfs_streq(buf, "online_kernel"))
> + return MMOP_ONLINE_KERNEL;
> + if (sysfs_streq(buf, "online_movable"))
> + return MMOP_ONLINE_MOVABLE;
> + return -EINVAL;
Should we try making use of mhp_online_type_from_str()/online_type_to_str()
[possibly a nicer exported function for the latter] to avoid duplicating this ...
> +}
> +
> +static ssize_t hotplug_show(struct device *dev,
> + struct device_attribute *attr, char *buf)
> +{
> + struct dax_kmem_data *data = dev_get_drvdata(dev);
> + const char *state_str;
> +
> + if (!data)
> + return -ENXIO;
> +
> + switch (data->state) {
> + case DAX_KMEM_UNPLUGGED:
> + state_str = "unplugged";
> + break;
> + case MMOP_OFFLINE:
> + state_str = "offline";
> + break;
> + case MMOP_ONLINE:
> + state_str = "online";
> + break;
> + case MMOP_ONLINE_KERNEL:
> + state_str = "online_kernel";
> + break;
> + case MMOP_ONLINE_MOVABLE:
> + state_str = "online_movable";
> + break;
...
and this?
[sorry if we already discussed this]
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-09 10:26 ` David Hildenbrand (Arm)
@ 2026-06-09 15:35 ` Gregory Price
2026-06-09 18:11 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-09 15:35 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple, Hannes Reinecke
On Tue, Jun 09, 2026 at 12:26:07PM +0200, David Hildenbrand (Arm) wrote:
> On 6/5/26 23:19, Gregory Price wrote:
>
> >
> > +static int dax_kmem_parse_state(const char *buf)
> > +{
> > + if (sysfs_streq(buf, "unplugged"))
> > + return DAX_KMEM_UNPLUGGED;
> > + if (sysfs_streq(buf, "online"))
> > + return MMOP_ONLINE;
> > + if (sysfs_streq(buf, "online_kernel"))
> > + return MMOP_ONLINE_KERNEL;
> > + if (sysfs_streq(buf, "online_movable"))
> > + return MMOP_ONLINE_MOVABLE;
> > + return -EINVAL;
>
> Should we try making use of mhp_online_type_from_str()/online_type_to_str()
> [possibly a nicer exported function for the latter] to avoid duplicating this ...
>
In patch 6 response i point out adding MMOP_UNPLUGGED
If we add MMOP_UNPLUGGED as a state that is only use by callers of
memory hotplug to represent the current state - but not as a valid input
to memory_hotplug.c - then we can simply this as you request.
Although we'll need to add a couple lines to memoryN/state parsing code
to disallow MMOP_UNPLUGGED as a valid input (otherwise you could
permanently unplug memory without the ability to get it back... unless
you want that? Seems useless to me.)
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-09 15:35 ` Gregory Price
@ 2026-06-09 18:11 ` David Hildenbrand (Arm)
2026-06-09 18:19 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 18:11 UTC (permalink / raw)
To: Gregory Price
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple, Hannes Reinecke
On 6/9/26 17:35, Gregory Price wrote:
> On Tue, Jun 09, 2026 at 12:26:07PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/5/26 23:19, Gregory Price wrote:
>>
>>>
>>> +static int dax_kmem_parse_state(const char *buf)
>>> +{
>>> + if (sysfs_streq(buf, "unplugged"))
>>> + return DAX_KMEM_UNPLUGGED;
>>> + if (sysfs_streq(buf, "online"))
>>> + return MMOP_ONLINE;
>>> + if (sysfs_streq(buf, "online_kernel"))
>>> + return MMOP_ONLINE_KERNEL;
>>> + if (sysfs_streq(buf, "online_movable"))
>>> + return MMOP_ONLINE_MOVABLE;
>>> + return -EINVAL;
>>
>> Should we try making use of mhp_online_type_from_str()/online_type_to_str()
>> [possibly a nicer exported function for the latter] to avoid duplicating this ...
>>
>
> In patch 6 response i point out adding MMOP_UNPLUGGED
>
> If we add MMOP_UNPLUGGED as a state that is only use by callers of
> memory hotplug to represent the current state - but not as a valid input
> to memory_hotplug.c - then we can simply this as you request.
>
> Although we'll need to add a couple lines to memoryN/state parsing code
> to disallow MMOP_UNPLUGGED as a valid input (otherwise you could
> permanently unplug memory without the ability to get it back... unless
> you want that? Seems useless to me.)
I would just special-case your KMEM-special value, and then branch off to the
core helpers?
static int dax_kmem_parse_state(const char *buf)
{
if (sysfs_streq(buf, "unplugged"))
return DAX_KMEM_UNPLUGGED;
return mhp_online_type_from_str(...);
}
So there is no need to add a value to MM core?
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-09 18:11 ` David Hildenbrand (Arm)
@ 2026-06-09 18:19 ` Gregory Price
2026-06-09 18:22 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-09 18:19 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple, Hannes Reinecke
On Tue, Jun 09, 2026 at 08:11:42PM +0200, David Hildenbrand (Arm) wrote:
> On 6/9/26 17:35, Gregory Price wrote:
> > On Tue, Jun 09, 2026 at 12:26:07PM +0200, David Hildenbrand (Arm) wrote:
> >> On 6/5/26 23:19, Gregory Price wrote:
> >>
> >>>
> >>> +static int dax_kmem_parse_state(const char *buf)
> >>> +{
> >>> + if (sysfs_streq(buf, "unplugged"))
> >>> + return DAX_KMEM_UNPLUGGED;
> >>> + if (sysfs_streq(buf, "online"))
> >>> + return MMOP_ONLINE;
> >>> + if (sysfs_streq(buf, "online_kernel"))
> >>> + return MMOP_ONLINE_KERNEL;
> >>> + if (sysfs_streq(buf, "online_movable"))
> >>> + return MMOP_ONLINE_MOVABLE;
> >>> + return -EINVAL;
> >>
> >> Should we try making use of mhp_online_type_from_str()/online_type_to_str()
> >> [possibly a nicer exported function for the latter] to avoid duplicating this ...
> >>
> >
> > In patch 6 response i point out adding MMOP_UNPLUGGED
> >
> > If we add MMOP_UNPLUGGED as a state that is only use by callers of
> > memory hotplug to represent the current state - but not as a valid input
> > to memory_hotplug.c - then we can simply this as you request.
> >
> > Although we'll need to add a couple lines to memoryN/state parsing code
> > to disallow MMOP_UNPLUGGED as a valid input (otherwise you could
> > permanently unplug memory without the ability to get it back... unless
> > you want that? Seems useless to me.)
>
> I would just special-case your KMEM-special value, and then branch off to the
> core helpers?
>
> static int dax_kmem_parse_state(const char *buf)
> {
> if (sysfs_streq(buf, "unplugged"))
> return DAX_KMEM_UNPLUGGED;
> return mhp_online_type_from_str(...);
> }
>
> So there is no need to add a value to MM core?
>
Was going to propose this, but then I thought... well if CXL detactes
itself from dax and does this same pattern, maybe MMOP_UNPLUGGED makes
sense.
But i'll leave that for another day and do as you suggest.
Thanks!
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-09 18:19 ` Gregory Price
@ 2026-06-09 18:22 ` David Hildenbrand (Arm)
2026-06-09 18:33 ` Gregory Price
0 siblings, 1 reply; 34+ messages in thread
From: David Hildenbrand (Arm) @ 2026-06-09 18:22 UTC (permalink / raw)
To: Gregory Price
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple, Hannes Reinecke
On 6/9/26 20:19, Gregory Price wrote:
> On Tue, Jun 09, 2026 at 08:11:42PM +0200, David Hildenbrand (Arm) wrote:
>> On 6/9/26 17:35, Gregory Price wrote:
>>>
>>> In patch 6 response i point out adding MMOP_UNPLUGGED
>>>
>>> If we add MMOP_UNPLUGGED as a state that is only use by callers of
>>> memory hotplug to represent the current state - but not as a valid input
>>> to memory_hotplug.c - then we can simply this as you request.
>>>
>>> Although we'll need to add a couple lines to memoryN/state parsing code
>>> to disallow MMOP_UNPLUGGED as a valid input (otherwise you could
>>> permanently unplug memory without the ability to get it back... unless
>>> you want that? Seems useless to me.)
>>
>> I would just special-case your KMEM-special value, and then branch off to the
>> core helpers?
>>
>> static int dax_kmem_parse_state(const char *buf)
>> {
>> if (sysfs_streq(buf, "unplugged"))
>> return DAX_KMEM_UNPLUGGED;
>> return mhp_online_type_from_str(...);
>> }
>>
>> So there is no need to add a value to MM core?
>>
>
> Was going to propose this, but then I thought... well if CXL detactes
> itself from dax and does this same pattern, maybe MMOP_UNPLUGGED makes
> sense.
>
> But i'll leave that for another day and do as you suggest.
Let me sleep over it :)
--
Cheers,
David
^ permalink raw reply [flat|nested] 34+ messages in thread* Re: [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug
2026-06-09 18:22 ` David Hildenbrand (Arm)
@ 2026-06-09 18:33 ` Gregory Price
0 siblings, 0 replies; 34+ messages in thread
From: Gregory Price @ 2026-06-09 18:33 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: linux-mm, nvdimm, linux-kernel, kernel-team, linux-cxl,
linux-kselftest, djbw, vishal.l.verma, dave.jiang, akpm, ljs,
liam, vbabka, rppt, surenb, mhocko, osalvador, shuah,
alison.schofield, Smita.KoralahalliChannabasappa, ira.weiny,
apopple, Hannes Reinecke
On Tue, Jun 09, 2026 at 08:22:20PM +0200, David Hildenbrand (Arm) wrote:
> On 6/9/26 20:19, Gregory Price wrote:
> > On Tue, Jun 09, 2026 at 08:11:42PM +0200, David Hildenbrand (Arm) wrote:
> >> On 6/9/26 17:35, Gregory Price wrote:
> >
> > Was going to propose this, but then I thought... well if CXL detactes
> > itself from dax and does this same pattern, maybe MMOP_UNPLUGGED makes
> > sense.
> >
> > But i'll leave that for another day and do as you suggest.
>
> Let me sleep over it :)
>
Have some other sets i'm trying to break out of private-nodes anyway,
i'll let this bake for the week.
~Gregory
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test
2026-06-05 21:19 [PATCH v4 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (7 preceding siblings ...)
2026-06-05 21:19 ` [PATCH v4 8/9] dax/kmem: add sysfs interface for atomic hotplug Gregory Price
@ 2026-06-05 21:19 ` Gregory Price
2026-06-05 21:34 ` sashiko-bot
8 siblings, 1 reply; 34+ messages in thread
From: Gregory Price @ 2026-06-05 21:19 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, kernel-team, linux-cxl, linux-kselftest, djbw,
vishal.l.verma, dave.jiang, akpm, david, ljs, liam, vbabka, rppt,
surenb, mhocko, osalvador, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Add a kselftest for the dax/kmem whole-device "hotplug" sysfs attribute
(/sys/bus/dax/devices/daxX.Y/hotplug), which transitions a kmem-backed
dax device between "unplugged", "online" and "online_movable".
Provisioning a devdax device and binding it to kmem needs daxctl/ndctl
(or the tools/testing/nvdimm emulation) and is out of scope for an
in-tree selftest, so the test discovers an already kmem-bound dax device
and SKIPs (KSFT_SKIP) when none is present or when the memory cannot be
freed to reach a known baseline.
When a device is available it validates the interface contract:
- online / online_movable actually add memory (MemTotal grows),
- online is idempotent,
- switching between online types without an intervening unplug is
rejected,
- unplug removes the memory and the reported state matches reality,
- invalid input is rejected.
In particular it covers the online -> unplug -> online_movable -> unplug
cycle: a re-online must re-reserve the per-range resources so that a
subsequent unplug actually offlines and removes the memory instead of
silently reporting success while the memory stays online.
Signed-off-by: Gregory Price <gourry@gourry.net>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/dax/Makefile | 6 +
tools/testing/selftests/dax/config | 4 +
.../testing/selftests/dax/dax-kmem-hotplug.sh | 145 ++++++++++++++++++
tools/testing/selftests/dax/settings | 1 +
5 files changed, 157 insertions(+)
create mode 100644 tools/testing/selftests/dax/Makefile
create mode 100644 tools/testing/selftests/dax/config
create mode 100755 tools/testing/selftests/dax/dax-kmem-hotplug.sh
create mode 100644 tools/testing/selftests/dax/settings
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6e59b8f63e41..8c2b4f97619c 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += core
TARGETS += cpufreq
TARGETS += cpu-hotplug
TARGETS += damon
+TARGETS += dax
TARGETS += devices/error_logs
TARGETS += devices/probe
TARGETS += dmabuf-heaps
diff --git a/tools/testing/selftests/dax/Makefile b/tools/testing/selftests/dax/Makefile
new file mode 100644
index 000000000000..25a4f3d73a5b
--- /dev/null
+++ b/tools/testing/selftests/dax/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := dax-kmem-hotplug.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dax/config b/tools/testing/selftests/dax/config
new file mode 100644
index 000000000000..4c9aaeb6ceb4
--- /dev/null
+++ b/tools/testing/selftests/dax/config
@@ -0,0 +1,4 @@
+CONFIG_DEV_DAX=m
+CONFIG_DEV_DAX_KMEM=m
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
diff --git a/tools/testing/selftests/dax/dax-kmem-hotplug.sh b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
new file mode 100755
index 000000000000..705a34cc3c6d
--- /dev/null
+++ b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
@@ -0,0 +1,145 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Exercise the dax/kmem whole-device "hotplug" sysfs attribute:
+# /sys/bus/dax/devices/daxX.Y/hotplug -> unplugged | online | online_movable
+#
+# The test needs a dax device already bound to the kmem driver (so the
+# 'hotplug' attribute exists). Provisioning a devdax device and binding it to
+# kmem requires daxctl/ndctl (or the tools/testing/nvdimm emulation) and is out
+# of scope here; if no suitable device is found the test SKIPs.
+#
+# To actually run it, provision a kmem-backed dax device first. For example,
+# carve a chunk of RAM into an emulated pmem region via the kernel command line
+# (the region must be at least one memory block, e.g. 128MiB on x86):
+#
+# memmap=2G!4G
+#
+# then, in the booted system:
+#
+# ndctl create-namespace -m devdax -e namespace0.0 -f
+# daxctl reconfigure-device -N -m system-ram dax0.0 # binds the kmem driver
+# ./dax-kmem-hotplug.sh
+
+DIR="$(dirname "$(readlink -f "$0")")"
+. "$DIR"/../kselftest/ktap_helpers.sh
+
+DAX_BASE=/sys/bus/dax/devices
+
+memtotal_kb() { awk '/^MemTotal:/ {print $2}' /proc/meminfo; }
+get_state() { cat "$HP" 2>/dev/null; }
+# set_state STATE -- write a state to the hotplug attribute; returns the
+# write's exit status (0 = accepted by the kernel)
+set_state() { echo "$1" > "$HP" 2>/dev/null; }
+
+find_kmem_dax() {
+ local d drv
+ for d in "$DAX_BASE"/dax*; do
+ [ -e "$d/hotplug" ] || continue
+ drv=$(readlink "$d/driver" 2>/dev/null)
+ [ "$(basename "${drv:-}")" = kmem ] || continue
+ basename "$d"
+ return 0
+ done
+ return 1
+}
+
+ktap_print_header
+
+if [ "$UID" != 0 ]; then
+ ktap_skip_all "must be run as root"
+ exit "$KSFT_SKIP"
+fi
+
+DAX=$(find_kmem_dax)
+if [ -z "$DAX" ]; then
+ ktap_skip_all "no kmem-bound dax device with a hotplug attribute"
+ exit "$KSFT_SKIP"
+fi
+HP=$DAX_BASE/$DAX/hotplug
+ORIG=$(get_state)
+
+# A failure to reach the baseline is environmental (memory in use), not an
+# interface failure, so skip rather than fail.
+set_state unplugged; rc=$?
+if [ "$rc" != 0 ] || [ "$(get_state)" != unplugged ]; then
+ ktap_skip_all "$DAX: cannot reach 'unplugged' baseline (memory in use?)"
+ [ -n "$ORIG" ] && set_state "$ORIG"
+ exit "$KSFT_SKIP"
+fi
+mt_unplugged=$(memtotal_kb)
+
+ktap_print_msg "using $DAX (initial state was: $ORIG)"
+ktap_set_plan 8
+
+set_state online; rc=$?
+mt_online=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online ] && [ "$mt_online" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online: state=online, MemTotal $mt_unplugged -> $mt_online kB"
+else
+ ktap_test_fail "online: rc=$rc state=$(get_state) MemTotal $mt_unplugged -> $mt_online"
+fi
+
+set_state online; rc=$?
+if [ "$rc" = 0 ] && [ "$(get_state)" = online ]; then
+ ktap_test_pass "online idempotent"
+else
+ ktap_test_fail "online idempotent: rc=$rc state=$(get_state)"
+fi
+
+set_state online_movable; rc=$?
+if [ "$rc" != 0 ] && [ "$(get_state)" = online ]; then
+ ktap_test_pass "reject online_movable without intervening unplug"
+else
+ ktap_test_fail "online->online_movable not rejected: rc=$rc state=$(get_state)"
+fi
+
+set_state unplugged; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = unplugged ] && [ "$mt" -lt "$mt_online" ]; then
+ ktap_test_pass "unplug from online: MemTotal $mt_online -> $mt kB"
+else
+ ktap_test_fail "unplug from online: rc=$rc state=$(get_state) MemTotal $mt_online -> $mt"
+fi
+
+set_state online_movable; rc=$?
+mt_mov=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online_movable ] && [ "$mt_mov" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online_movable after unplug: MemTotal $mt_unplugged -> $mt_mov kB"
+else
+ ktap_test_fail "online_movable after unplug: rc=$rc state=$(get_state) MemTotal=$mt_mov"
+fi
+
+# The online -> unplug -> online_movable -> unplug cycle once regressed: a
+# re-online failed to re-reserve the per-range resources, so this final unplug
+# reported success while leaving the memory online. Assert it is really freed.
+set_state unplugged; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" != 0 ]; then
+ ktap_test_skip "unplug from movable not accepted (memory in use?) rc=$rc"
+elif [ "$(get_state)" = unplugged ] && [ "$mt" -lt "$mt_mov" ]; then
+ ktap_test_pass "unplug from online_movable removed memory: $mt_mov -> $mt kB"
+else
+ ktap_test_fail "unplug success but memory remained: $(get_state) $mt_mov -> $mt kB"
+fi
+
+set_state online_kernel; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online_kernel ] && [ "$mt" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online_kernel: MemTotal $mt_unplugged -> $mt kB"
+else
+ ktap_test_fail "online_kernel: rc=$rc state=$(get_state) MemTotal=$mt"
+fi
+set_state unplugged
+
+before=$(get_state)
+set_state bogus_state; rc=$?
+if [ "$rc" != 0 ] && [ "$(get_state)" = "$before" ]; then
+ ktap_test_pass "reject invalid state string"
+else
+ ktap_test_fail "invalid state not rejected: rc=$rc state=$(get_state)"
+fi
+
+[ -n "$ORIG" ] && set_state "$ORIG"
+
+ktap_finished
diff --git a/tools/testing/selftests/dax/settings b/tools/testing/selftests/dax/settings
new file mode 100644
index 000000000000..ba4d85f74cd6
--- /dev/null
+++ b/tools/testing/selftests/dax/settings
@@ -0,0 +1 @@
+timeout=90
--
2.54.0
^ permalink raw reply related [flat|nested] 34+ messages in thread* Re: [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test
2026-06-05 21:19 ` [PATCH v4 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
@ 2026-06-05 21:34 ` sashiko-bot
0 siblings, 0 replies; 34+ messages in thread
From: sashiko-bot @ 2026-06-05 21:34 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [Medium] Skipped unplug test causes a cascading spurious failure in the subsequent `online_kernel` test.
- [Medium] The test script modifies system-wide DAX device hotplug state but lacks a `trap` to ensure restoration on unexpected termination.
--
commit eaf4726251fa25a05d2dfac099a7956bef58a48a
Author: Gregory Price <gourry@gourry.net>
selftests/dax: add dax/kmem hotplug sysfs regression test
This commit adds a kselftest to validate the whole-device hotplug sysfs
attribute for dax/kmem devices. It verifies transitions between unplugged,
online, and online_movable states, ensuring memory is properly added and
removed.
> diff --git a/tools/testing/selftests/dax/dax-kmem-hotplug.sh b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
> new file mode 100755
> index 0000000000000..e43cda8ba1274
> --- /dev/null
> +++ b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
[ ... ]
> +DAX=$(find_kmem_dax)
> +if [ -z "$DAX" ]; then
> + ktap_skip_all "no kmem-bound dax device with a hotplug attribute"
> + exit "$KSFT_SKIP"
> +fi
> +HP=$DAX_BASE/$DAX/hotplug
> +ORIG=$(get_state)
[Severity: Medium]
Does this script ensure the original state is restored if execution is
aborted early?
Since the original state is recorded here but no shell trap is established,
an unexpected exit or user interrupt could bypass the manual restoration
at the end of the script. Would adding an exit trap handler right after
recording the state ensure reliable cleanup?
[ ... ]
> +# The online -> unplug -> online_movable -> unplug cycle once regressed: a
> +# re-online failed to re-reserve the per-range resources, so this final unplug
> +# reported success while leaving the memory online. Assert it is really freed.
> +set_state unplugged; rc=$?
> +mt=$(memtotal_kb)
> +if [ "$rc" != 0 ]; then
> + ktap_test_skip "unplug from movable not accepted (memory in use?) rc=$rc"
> +elif [ "$(get_state)" = unplugged ] && [ "$mt" -lt "$mt_mov" ]; then
> + ktap_test_pass "unplug from online_movable removed memory: $mt_mov -> $mt kB"
> +else
> + ktap_test_fail "unplug success but memory remained: $(get_state) $mt_mov -> $mt kB"
> +fi
> +
> +set_state online_kernel; rc=$?
[Severity: Medium]
If the preceding unplug transition fails and execution enters the skip
branch, will this subsequent state transition fail spuriously?
Because the driver rejects direct transitions between different online states
without an intervening unplug, falling through to the online_kernel test
while the device is still in the online_movable state seems like it will
unconditionally fail instead of safely skipping the rest of the checks.
> +mt=$(memtotal_kb)
> +if [ "$rc" = 0 ] && [ "$(get_state)" = online_kernel ] && [ "$mt" -gt "$mt_unplugged" ]; then
> + ktap_test_pass "online_kernel: MemTotal $mt_unplugged -> $mt kB"
> +else
> + ktap_test_fail "online_kernel: rc=$rc state=$(get_state) MemTotal=$mt"
> +fi
> +set_state unplugged
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605211911.2160954-1-gourry@gourry.net?part=9
^ permalink raw reply [flat|nested] 34+ messages in thread