* [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs
@ 2026-06-24 14:57 Gregory Price
2026-06-24 14:57 ` [PATCH v5 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
` (9 more replies)
0 siblings, 10 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
The dax kmem driver onlines memory during probe using the system
default policy. At runtime there is no atomic control over the state of
an entire region; the only option is toggling individual memory blocks.
Offlining and removing a whole region therefore races with other
userland controllers that interfere between the offline and remove
steps. This was discussed in the LPC2025 device memory sessions [1].
This series adds a sysfs "state" attribute for atomic whole-device
hotplug control, plus the mm and dax plumbing to support it.
Transitions are atomic across every range of the device. The state
names mirror the per-block memoryX/state ABI with one modification:
- "unplugged": memory blocks are not present
- "online": online as system RAM, zone chosen by the kernel
- "online_kernel": online in ZONE_NORMAL
- "online_movable": online in ZONE_MOVABLE
"offline" (blocks present but offline) is reportable for backward
compatibility but is not writable, because it invites the race condition
we are trying to solve (offlining all the memory blocks in one atomic
step and unplugging them in another).
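As a rough illustration of the state rules above (this is not code from
the series), the following userspace sketch maps state names to values
and rejects writes of "offline" while still being able to report it.
The enum, the table and the state_show()/state_store() names are
hypothetical; only the state strings themselves come from this cover
letter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical state values; the enum used by the series is not shown here. */
enum dev_state {
	STATE_INVALID = -1,
	STATE_UNPLUGGED,
	STATE_OFFLINE,		/* reportable, but never accepted on write */
	STATE_ONLINE,
	STATE_ONLINE_KERNEL,
	STATE_ONLINE_MOVABLE,
};

static const char *const state_names[] = {
	[STATE_UNPLUGGED]	= "unplugged",
	[STATE_OFFLINE]		= "offline",
	[STATE_ONLINE]		= "online",
	[STATE_ONLINE_KERNEL]	= "online_kernel",
	[STATE_ONLINE_MOVABLE]	= "online_movable",
};

/* Name reported when the attribute is read. */
static const char *state_show(enum dev_state s)
{
	assert(s >= STATE_UNPLUGGED && s <= STATE_ONLINE_MOVABLE);
	return state_names[s];
}

/* Parse a written value; "offline" is deliberately rejected. */
static enum dev_state state_store(const char *buf)
{
	/* Writes from a shell usually carry a trailing newline. */
	size_t len = strcspn(buf, "\n");
	int i;

	for (i = STATE_UNPLUGGED; i <= STATE_ONLINE_MOVABLE; i++) {
		if (i == STATE_OFFLINE)
			continue;
		if (strlen(state_names[i]) == len &&
		    !strncmp(buf, state_names[i], len))
			return (enum dev_state)i;
	}
	return STATE_INVALID;
}
```

With this shape, a read can still return "offline" when per-block
toggling left the device in that state, while a write of "offline"
fails in the same way as any unknown string.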
mm preparation:
1. mm/memory: add memory_block_aligned_range() helper.
2. mm/memory_hotplug: pass online_type to online_memory_block().
3. mm/memory_hotplug: export mhp_get_default_online_type().
4. mm/memory_hotplug: add __add_memory_driver_managed() so a driver can
select the online policy. The override is restricted to in-tree
modules via EXPORT_SYMBOL_FOR_MODULES().
5. mm/memory_hotplug: add offline_and_remove_memory_ranges() for atomic,
all-or-nothing offline+remove of several ranges under a single
lock_device_hotplug().
dax/kmem feature:
6. Plumb online_type through the dax device creation path.
7. Extract hotplug/hotremove into helper functions.
8. Add the "state" sysfs attribute.
9. selftests/dax: regression test for the attribute.
DAX kmem probe still creates the memory blocks by default, even when
the default policy is "offline", to preserve backward compatibility.
Unplug (atomic offline+remove of the whole device) is the new
capability provided by the attribute.
I downgraded a BUG() to a WARN() when unbind is called while the device
is not unplugged. The old per-block toggling pattern is still used by
userland tools and disconnects the 'state' value from the real region
state; until per-block control is deprecated or restricted in some way,
the WARN() flags that tools should move to the new atomic pattern.
Changes since v4:
- renamed 'dax/hotplug' -> 'dax/state'
- refactored the work into a shared offline_and_remove_memory_ranges
- reworked MMOP_ helpers to re-use code
- fixed cached system default online_type regression
- nits
Gregory Price (9):
mm/memory: add memory_block_aligned_range() helper
mm/memory_hotplug: pass online_type to online_memory_block() via arg
mm/memory_hotplug: export mhp_get_default_online_type
mm/memory_hotplug: add __add_memory_driver_managed() with online_type
arg
mm/memory_hotplug: add offline_and_remove_memory_ranges()
dax: plumb hotplug online_type through dax
dax/kmem: extract hotplug/hotremove helper functions
dax/kmem: add sysfs interface for atomic whole-device hotplug
selftests/dax: add dax/kmem hotplug sysfs regression test
Documentation/ABI/testing/sysfs-bus-dax | 26 +
drivers/base/memory.c | 9 +
drivers/dax/bus.c | 3 +
drivers/dax/bus.h | 9 +
drivers/dax/cxl.c | 1 +
drivers/dax/dax-private.h | 4 +
drivers/dax/hmem/hmem.c | 1 +
drivers/dax/kmem.c | 475 ++++++++++++++----
drivers/dax/pmem.c | 1 +
include/linux/memory.h | 22 +
include/linux/memory_hotplug.h | 13 +
mm/memory_hotplug.c | 162 ++++--
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/dax/Makefile | 6 +
tools/testing/selftests/dax/config | 4 +
.../testing/selftests/dax/dax-kmem-hotplug.sh | 207 ++++++++
tools/testing/selftests/dax/settings | 1 +
17 files changed, 806 insertions(+), 139 deletions(-)
create mode 100644 tools/testing/selftests/dax/Makefile
create mode 100644 tools/testing/selftests/dax/config
create mode 100755 tools/testing/selftests/dax/dax-kmem-hotplug.sh
create mode 100644 tools/testing/selftests/dax/settings
--
2.54.0
* [PATCH v5 1/9] mm/memory: add memory_block_aligned_range() helper
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
` (8 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Memory hotplug operations require ranges aligned to memory block
boundaries. Computing such a range is generic and not driver-specific.
Add memory_block_aligned_range() as a common helper in <linux/memory.h>
that aligns the start address up and end address down to memory block
boundaries.
Update dax/kmem to use this helper.
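For readers who want to check the arithmetic, here is a small userspace
sketch of the same start-up/end-down alignment. It is not the kernel
helper: BLOCK_SIZE is an assumed 128 MiB (the kernel queries
memory_block_size_bytes() instead), and struct range, align_up(),
align_down(), block_aligned_range() and range_usable() are local
stand-ins defined only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block size for illustration only; must be a power of two. */
#define BLOCK_SIZE (128ULL << 20)

/* Local stand-in for the kernel's struct range; end is inclusive. */
struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* Shrink the range inward so both edges sit on block boundaries. */
static struct range block_aligned_range(const struct range *r)
{
	struct range aligned;

	assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0);
	aligned.start = align_up(r->start, BLOCK_SIZE);
	/* end is inclusive: align the exclusive end, then step back one. */
	aligned.end = align_down(r->end + 1, BLOCK_SIZE) - 1;
	return aligned;
}

/* The check callers are expected to make before using the result. */
static int range_usable(const struct range *r)
{
	return r->start < r->end;
}
```

With a 128 MiB block size, the input [0x08001000, 0x20000fff] shrinks
to [0x10000000, 0x1fffffff], which is two whole blocks. An input that
contains no complete block yields start > end, which is why the caller
must check the result before using it.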
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
drivers/dax/kmem.c | 4 +---
include/linux/memory.h | 22 ++++++++++++++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a18e2b968e4d..592171ec10f4 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -33,9 +33,7 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
struct dev_dax_range *dax_range = &dev_dax->ranges[i];
struct range *range = &dax_range->range;
- /* memory-block align the hotplug range */
- r->start = ALIGN(range->start, memory_block_size_bytes());
- r->end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+ *r = memory_block_aligned_range(range);
if (r->start >= r->end) {
r->start = range->start;
r->end = range->end;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 463dc02f6cff..9f5ef0309f77 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -20,6 +20,7 @@
#include <linux/compiler.h>
#include <linux/mutex.h>
#include <linux/memory_hotplug.h>
+#include <linux/range.h>
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
@@ -100,6 +101,27 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
unsigned long memory_block_size_bytes(void);
int set_memory_block_size_order(unsigned int order);
+/**
+ * memory_block_aligned_range - align a physical address range to memory blocks
+ * @range: the input range to align
+ *
+ * Aligns the start address up and the end address down to memory block
+ * boundaries. This is required for memory hotplug operations which must
+ * operate on memory-block aligned ranges.
+ *
+ * Returns the aligned range. Callers should check that the returned
+ * range is valid (aligned.start < aligned.end) before using it.
+ */
+static inline struct range memory_block_aligned_range(const struct range *range)
+{
+ struct range aligned;
+
+ aligned.start = ALIGN(range->start, memory_block_size_bytes());
+ aligned.end = ALIGN_DOWN(range->end + 1, memory_block_size_bytes()) - 1;
+
+ return aligned;
+}
+
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
--
2.54.0
* [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
2026-06-24 14:57 ` [PATCH v5 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 16:28 ` Gupta, Pankaj
2026-06-24 14:57 ` [PATCH v5 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
` (7 subsequent siblings)
9 siblings, 1 reply; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Modify online_memory_block() to accept the online type through its arg
parameter rather than calling mhp_get_default_online_type() internally.
This prepares for allowing callers to specify explicit online types.
Update the caller in add_memory_resource() to pass the default online
type via a local variable.
No functional change.
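The pattern is the usual "walker plus opaque argument" callback. A
minimal userspace sketch of it follows, assuming simplified stand-ins:
struct block, walk_blocks() and online_block() are hypothetical and
only mirror the shape of walk_memory_blocks() and
online_memory_block(), not their real signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum mmop and struct memory_block. */
enum online_type {
	OT_OFFLINE,
	OT_ONLINE,
	OT_ONLINE_KERNEL,
	OT_ONLINE_MOVABLE,
};

struct block {
	enum online_type online_type;
	int online;
};

typedef int (*block_fn)(struct block *blk, void *arg);

/* Call fn on every block, stopping at the first non-zero return. */
static int walk_blocks(struct block *blocks, size_t n, void *arg, block_fn fn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = fn(&blocks[i], arg);

		if (rc)
			return rc;
	}
	return 0;
}

/* The callback takes the policy from arg rather than a global default. */
static int online_block(struct block *blk, void *arg)
{
	const enum online_type *type = arg;

	assert(type != NULL);
	blk->online_type = *type;
	blk->online = 1;
	return 0;
}
```

Because the callback no longer consults a global, the caller decides
the policy once and every block in the walk sees the same value, even
if the system default changes while the walk is in progress.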
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
mm/memory_hotplug.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7ac19fab2263..6833208cc17c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
static int online_memory_block(struct memory_block *mem, void *arg)
{
- mem->online_type = mhp_get_default_online_type();
+ enum mmop *online_type = arg;
+
+ mem->online_type = *online_type;
return device_online(&mem->dev);
}
@@ -1494,6 +1496,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+ enum mmop online_type = mhp_get_default_online_type();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
struct memory_group *group = NULL;
u64 start, size;
@@ -1582,7 +1585,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
/* online pages if requested */
if (mhp_get_default_online_type() != MMOP_OFFLINE)
- walk_memory_blocks(start, size, NULL, online_memory_block);
+ walk_memory_blocks(start, size, &online_type,
+ online_memory_block);
return ret;
error:
--
2.54.0
* [PATCH v5 3/9] mm/memory_hotplug: export mhp_get_default_online_type
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
2026-06-24 14:57 ` [PATCH v5 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
2026-06-24 14:57 ` [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
` (6 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Drivers which may pass hotplug policy down to DAX need MMOP_ symbols
and the mhp_get_default_online_type function for hotplug use cases.
Some drivers (cxl) co-mingle their hotplug and devdax use-cases into
the same driver code, and chose the dax_kmem path as the default driver
path - making it difficult to require hotplug as a predicate to building
the overall driver (it may break other non-hotplug use-cases).
Export mhp_get_default_online_type function to allow these drivers to
build when hotplug is disabled and still use the DAX use case.
In the compiled-out case we simply return MMOP_OFFLINE, as it is
non-destructive. The real function never returns -1 either, so returning
MMOP_OFFLINE rather than an error code lets the stub keep the
'enum mmop' return type.
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
---
include/linux/memory_hotplug.h | 2 ++
mm/memory_hotplug.c | 1 +
2 files changed, 3 insertions(+)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7c9d66729c60..f059025f8f8b 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -316,6 +316,8 @@ extern struct zone *zone_for_pfn_range(enum mmop online_type,
extern int arch_create_linear_mapping(int nid, u64 start, u64 size,
struct mhp_params *params);
void arch_remove_linear_mapping(u64 start, u64 size);
+#else
+static inline enum mmop mhp_get_default_online_type(void) { return MMOP_OFFLINE; }
#endif /* CONFIG_MEMORY_HOTPLUG */
#endif /* __LINUX_MEMORY_HOTPLUG_H */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6833208cc17c..494257054095 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -239,6 +239,7 @@ enum mmop mhp_get_default_online_type(void)
return mhp_default_online_type;
}
+EXPORT_SYMBOL_GPL(mhp_get_default_online_type);
void mhp_set_default_online_type(enum mmop online_type)
{
--
2.54.0
* [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (2 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 16:41 ` Gupta, Pankaj
2026-06-24 14:57 ` [PATCH v5 5/9] mm/memory_hotplug: offline_and_remove_memory_ranges() Gregory Price
` (5 subsequent siblings)
9 siblings, 1 reply; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Existing callers of add_memory_driver_managed() cannot select the
preferred online type (ZONE_NORMAL vs ZONE_MOVABLE), so they must
hot-add memory as offline blocks and then online each memory block
individually.
Most drivers prefer the system default, but the CXL driver wants to
plumb a preferred policy through the dax kmem driver.
Refactor APIs to add a new interface which allows the dax kmem module
to select a preferred policy.
Overriding the configured auto-online policy is only safe for known
in-tree modules, where we know the override reflects a different,
user-requested policy. We do not want arbitrary out-of-tree drivers
silently overriding the system-wide onlining policy, so restrict the
new interface to the kmem module using EXPORT_SYMBOL_FOR_MODULES()
rather than a plain EXPORT_SYMBOL_GPL(). Other in-tree modules (e.g.
cxl_core) can be added to the allowed list as the need arises.
Refactor add_memory_driver_managed(), extract
__add_memory_driver_managed():
- Add proper kernel-doc for add_memory_driver_managed() while refactoring.
- New helper accepts an explicit online_type.
- New helper validates that online_type is between MMOP_OFFLINE and
  MMOP_ONLINE_MOVABLE.
Refactor add_memory_resource(), extract __add_memory_resource():
- New helper accepts an explicit online_type.
Original APIs now explicitly pass the system-default to new helpers.
No functional change for existing users.
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 3 ++
mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++-----
2 files changed, 56 insertions(+), 8 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index f059025f8f8b..d3edeb80aadb 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
extern int add_memory_resource(int nid, struct resource *resource,
mhp_t mhp_flags);
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags,
+ enum mmop online_type);
extern int add_memory_driver_managed(int nid, u64 start, u64 size,
const char *resource_name,
mhp_t mhp_flags);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 494257054095..a66346def504 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1494,10 +1494,10 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
*
* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
*/
-int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+static int __add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags,
+ enum mmop online_type)
{
struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
- enum mmop online_type = mhp_get_default_online_type();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
struct memory_group *group = NULL;
u64 start, size;
@@ -1585,7 +1585,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
merge_system_ram_resource(res);
/* online pages if requested */
- if (mhp_get_default_online_type() != MMOP_OFFLINE)
+ if (online_type != MMOP_OFFLINE)
walk_memory_blocks(start, size, &online_type,
online_memory_block);
@@ -1603,7 +1603,13 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
return ret;
}
-/* requires device_hotplug_lock, see add_memory_resource() */
+int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
+{
+ return __add_memory_resource(nid, res, mhp_flags,
+ mhp_get_default_online_type());
+}
+
+/* requires device_hotplug_lock, see __add_memory_resource() */
int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
{
struct resource *res;
@@ -1631,7 +1637,15 @@ int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
}
EXPORT_SYMBOL_GPL(add_memory);
-/*
+/**
+ * __add_memory_driver_managed - add driver-managed memory with explicit online_type
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ * @online_type: Auto-Online behavior (offline, online, kernel, movable)
+ *
* Add special, driver-managed memory to the system as system RAM. Such
* memory is not exposed via the raw firmware-provided memmap as system
* RAM, instead, it is detected and added by a driver - during cold boot,
@@ -1639,6 +1653,7 @@ EXPORT_SYMBOL_GPL(add_memory);
*
* Reasons why this memory should not be used for the initial memmap of a
* kexec kernel or for placing kexec images:
+ *
* - The booting kernel is in charge of determining how this memory will be
* used (e.g., use persistent memory as system RAM)
* - Coordination with a hypervisor is required before this memory
@@ -1651,9 +1666,12 @@ EXPORT_SYMBOL_GPL(add_memory);
*
* The resource_name (visible via /proc/iomem) has to have the format
* "System RAM ($DRIVER)".
+ *
+ * Return: 0 on success, negative error code on failure.
*/
-int add_memory_driver_managed(int nid, u64 start, u64 size,
- const char *resource_name, mhp_t mhp_flags)
+int __add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags,
+ enum mmop online_type)
{
struct resource *res;
int rc;
@@ -1663,6 +1681,9 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
resource_name[strlen(resource_name) - 1] != ')')
return -EINVAL;
+ if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
lock_device_hotplug();
res = register_memory_resource(start, size, resource_name);
@@ -1671,7 +1692,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
goto out_unlock;
}
- rc = add_memory_resource(nid, res, mhp_flags);
+ rc = __add_memory_resource(nid, res, mhp_flags, online_type);
if (rc < 0)
release_memory_resource(res);
@@ -1679,6 +1700,30 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
unlock_device_hotplug();
return rc;
}
+EXPORT_SYMBOL_FOR_MODULES(__add_memory_driver_managed, "kmem");
+
+/**
+ * add_memory_driver_managed - add driver-managed memory
+ * @nid: NUMA node ID where the memory will be added
+ * @start: Start physical address of the memory range
+ * @size: Size of the memory range in bytes
+ * @resource_name: Resource name in format "System RAM ($DRIVER)"
+ * @mhp_flags: Memory hotplug flags
+ *
+ * Add driver-managed memory with the system default online type set by
+ * build config or kernel boot parameter.
+ *
+ * See __add_memory_driver_managed for more details.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int add_memory_driver_managed(int nid, u64 start, u64 size,
+ const char *resource_name, mhp_t mhp_flags)
+{
+ return __add_memory_driver_managed(nid, start, size, resource_name,
+ mhp_flags,
+ mhp_get_default_online_type());
+}
EXPORT_SYMBOL_GPL(add_memory_driver_managed);
/*
--
2.54.0
* [PATCH v5 5/9] mm/memory_hotplug: offline_and_remove_memory_ranges()
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (3 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 6/9] dax: plumb hotplug online_type through dax Gregory Price
` (4 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
offline_and_remove_memory() handles a single contiguous range.
Callers that manage a device composed of several ranges (dax/kmem)
currently have to call it in a loop, which gives up atomicity.
In addition to pushing rollback logic into the driver, the lack
of atomicity creates a race condition between system daemons trying
to manage the same resource:
- Manager 1: Offlines memory blocks. [window] Removes device.
- Manager 2: In that window, detects the offline memory blocks and
  re-onlines them.
Add offline_and_remove_memory_ranges(), which takes an array of ranges
and processes them as one operation under a single lock_device_hotplug():
- Phase 1 offlines every block of every range.
- Phase 2 removes the ranges only if all ranges are offline.
- If any offline fails, the whole operation is reverted.
This gives callers all-or-nothing semantics for the offline step, so a
failed or interrupted unplug leaves the device in a consistent state.
This also resolves the battling managers race - the second manager's
operation simply fails when the block is destroyed / cannot be onlined.
offline_and_remove_memory() becomes a thin wrapper that passes its single
range to the new helper, so the offline/rollback logic lives in one place.
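The phases described above can be modelled in userspace. The sketch
below is not the kernel implementation: struct block, struct blk_range,
the busy flag (a stand-in for "offlining this block fails") and all
function names are hypothetical, and the hotplug lock is omitted
because the model is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model of a memory block; online_type 0 means offline. */
struct block {
	int present;
	unsigned char online_type;
	int busy;		/* stand-in for "offlining fails" */
};

/* A range here is a run of block indices, not physical addresses. */
struct blk_range {
	size_t first;
	size_t count;
};

static int try_offline(struct block *b, unsigned char *saved)
{
	if (!b->present || !b->online_type)
		return 0;	/* already offline; *saved stays 0 */
	if (b->busy)
		return -EBUSY;
	*saved = b->online_type;
	b->online_type = 0;
	return 0;
}

static int offline_and_remove_ranges(struct block *blocks,
				     const struct blk_range *ranges, size_t nr)
{
	unsigned char *saved;
	size_t total = 0, i, j, k;
	int rc = 0;

	for (i = 0; i < nr; i++)
		total += ranges[i].count;
	if (!total)
		return -EINVAL;

	/* Zero-filled, so blocks we never touched are skipped on rollback. */
	saved = calloc(total, sizeof(*saved));
	if (!saved)
		return -ENOMEM;

	/* Phase 1: offline every block of every range. */
	k = 0;
	for (i = 0; i < nr && !rc; i++)
		for (j = 0; j < ranges[i].count && !rc; j++)
			rc = try_offline(&blocks[ranges[i].first + j],
					 &saved[k++]);

	/* Phase 2: remove, or restore the saved online types. */
	k = 0;
	for (i = 0; i < nr; i++) {
		for (j = 0; j < ranges[i].count; j++, k++) {
			struct block *b = &blocks[ranges[i].first + j];

			if (!rc)
				b->present = 0;
			else if (saved[k])
				b->online_type = saved[k];
		}
	}

	free(saved);
	return rc;
}
```

In this model a single busy block in the second range leaves the first
range exactly as it was before the call, which is the all-or-nothing
property the commit message describes for the offline step.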
Suggested-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
include/linux/memory_hotplug.h | 7 +++
mm/memory_hotplug.c | 94 ++++++++++++++++++++++++----------
2 files changed, 74 insertions(+), 27 deletions(-)
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d3edeb80aadb..7f1da7c428dc 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -267,6 +267,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
extern int remove_memory(u64 start, u64 size);
extern void __remove_memory(u64 start, u64 size);
extern int offline_and_remove_memory(u64 start, u64 size);
+int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges);
#else
static inline void try_offline_node(int nid) {}
@@ -283,6 +284,12 @@ static inline int remove_memory(u64 start, u64 size)
}
static inline void __remove_memory(u64 start, u64 size) {}
+
+static inline int offline_and_remove_memory_ranges(const struct range *ranges,
+ int nr_ranges)
+{
+ return -EBUSY;
+}
#endif /* CONFIG_MEMORY_HOTREMOVE */
#ifdef CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a66346def504..7d56e0c6ede0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -2429,58 +2429,98 @@ static int try_reonline_memory_block(struct memory_block *mem, void *arg)
*/
int offline_and_remove_memory(u64 start, u64 size)
{
- const unsigned long mb_count = size / memory_block_size_bytes();
+ struct range range = { .start = start, .end = start + size - 1 };
+
+ return offline_and_remove_memory_ranges(&range, 1);
+}
+EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+/**
+ * offline_and_remove_memory_ranges - offline and remove multiple memory ranges
+ * @ranges: array of physical address ranges to offline and remove
+ * @nr_ranges: number of entries in @ranges
+ *
+ * Offline and remove several memory ranges as one operation, serialized
+ * against other hotplug operations by a single lock_device_hotplug().
+ *
+ * This offlines all ranges before removing any of them. If offlining any
+ * range fails, the entire process is reverted and nothing is removed.
+ * This provides a fully atomic semantic for unplugging an entire device.
+ *
+ * Each range must be memory-block aligned in start and size.
+ *
+ * Return: 0 on success, negative errno otherwise. On failure no range has
+ * been removed.
+ */
+int offline_and_remove_memory_ranges(const struct range *ranges, int nr_ranges)
+{
+ unsigned long mb_total = 0;
uint8_t *online_types, *tmp;
- int rc;
+ int i, rc = 0;
- if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
- !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+ if (!ranges || nr_ranges <= 0)
return -EINVAL;
+ for (i = 0; i < nr_ranges; i++) {
+ u64 start = ranges[i].start;
+ u64 size = range_len(&ranges[i]);
+
+ if (!IS_ALIGNED(start, memory_block_size_bytes()) ||
+ !IS_ALIGNED(size, memory_block_size_bytes()) || !size)
+ return -EINVAL;
+ mb_total += size / memory_block_size_bytes();
+ }
+
/*
- * We'll remember the old online type of each memory block, so we can
- * try to revert whatever we did when offlining one memory block fails
- * after offlining some others succeeded.
+ * Remember the old online type of every memory block across all ranges,
+ * so we can revert if offlining a later block fails. All entries start
+ * as MMOP_OFFLINE so blocks we never touched are skipped on rollback.
*/
- online_types = kmalloc_array(mb_count, sizeof(*online_types),
+ online_types = kmalloc_array(mb_total, sizeof(*online_types),
GFP_KERNEL);
if (!online_types)
return -ENOMEM;
- /*
- * Initialize all states to MMOP_OFFLINE, so when we abort processing in
- * try_offline_memory_block(), we'll skip all unprocessed blocks in
- * try_reonline_memory_block().
- */
- memset(online_types, MMOP_OFFLINE, mb_count);
+ memset(online_types, MMOP_OFFLINE, mb_total);
lock_device_hotplug();
+ /* Phase 1: offline every block in every range. */
tmp = online_types;
- rc = walk_memory_blocks(start, size, &tmp, try_offline_memory_block);
+ for (i = 0; i < nr_ranges; i++) {
+ rc = walk_memory_blocks(ranges[i].start, range_len(&ranges[i]),
+ &tmp, try_offline_memory_block);
+ if (rc)
+ break;
+ }
/*
- * In case we succeeded to offline all memory, remove it.
- * This cannot fail as it cannot get onlined in the meantime.
+ * Phase 2: Remove each range. This essentially cannot fail as we hold
+ * the hotplug lock. WARN if that assumption is ever broken.
*/
if (!rc) {
- rc = try_remove_memory(start, size);
- if (rc)
- pr_err("%s: Failed to remove memory: %d", __func__, rc);
+ for (i = 0; i < nr_ranges; i++) {
+ rc = try_remove_memory(ranges[i].start,
+ range_len(&ranges[i]));
+ if (WARN_ON_ONCE(rc)) {
+ pr_err("%s: Failed to remove memory: %d",
+ __func__, rc);
+ break;
+ }
+ }
}
- /*
- * Rollback what we did. While memory onlining might theoretically fail
- * (nacked by a notifier), it barely ever happens.
- */
+ /* On fail: roll back. Blocks that were already offline are skipped */
if (rc) {
tmp = online_types;
- walk_memory_blocks(start, size, &tmp,
- try_reonline_memory_block);
+ for (i = 0; i < nr_ranges; i++)
+ walk_memory_blocks(ranges[i].start,
+ range_len(&ranges[i]), &tmp,
+ try_reonline_memory_block);
}
unlock_device_hotplug();
kfree(online_types);
return rc;
}
-EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+EXPORT_SYMBOL_GPL(offline_and_remove_memory_ranges);
#endif /* CONFIG_MEMORY_HOTREMOVE */
--
2.54.0
* [PATCH v5 6/9] dax: plumb hotplug online_type through dax
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (4 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 5/9] mm/memory_hotplug: offline_and_remove_memory_ranges() Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
` (3 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
There is no way for drivers leveraging dax_kmem to plumb through a
preferred auto-online policy - the system default policy is forced.
Add an 'enum mmop' field to the DAX device creation path to allow
drivers to specify an auto-online policy when using the kmem driver.
Capturing the system default at device creation time would break the
ABI: the system default can change at runtime, but the device would
keep whatever value was assigned when it was created.
To resolve this we add DAX_ONLINE_DEFAULT, which defaults devices to
the current behavior, while providing a clean way to override it.
No behavioural change for existing callers (still the system default).
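To make the late-resolution idea concrete, here is a small userspace
sketch. It is an assumption about the general shape only, since the
kmem.c side of the change is not part of this description: struct dev,
system_default and resolve_online_type() are hypothetical names, and
ONLINE_DEFAULT stands in for DAX_ONLINE_DEFAULT.

```c
#include <assert.h>

/* Hypothetical stand-ins for enum mmop values. */
enum {
	SK_OFFLINE,
	SK_ONLINE,
	SK_ONLINE_KERNEL,
	SK_ONLINE_MOVABLE,
};

/* Sentinel: "no explicit policy was given at creation time". */
#define ONLINE_DEFAULT (-1)

/* Stand-in for the runtime-tunable system default policy. */
static int system_default = SK_OFFLINE;

struct dev {
	int online_type;	/* a policy value or ONLINE_DEFAULT */
};

/*
 * Resolve the effective policy at bind time. A device created with the
 * sentinel follows whatever the system default is at that moment.
 */
static int resolve_online_type(const struct dev *d)
{
	int t = d->online_type == ONLINE_DEFAULT ? system_default
						 : d->online_type;

	assert(t >= SK_OFFLINE && t <= SK_ONLINE_MOVABLE);
	return t;
}
```

A device created with the sentinel picks up a later change to the
system default, while a device created with an explicit policy keeps
it. Storing the resolved value at creation time would lose the first
behaviour.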
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/bus.c | 3 +++
drivers/dax/bus.h | 9 +++++++++
drivers/dax/cxl.c | 1 +
drivers/dax/dax-private.h | 4 ++++
drivers/dax/hmem/hmem.c | 1 +
drivers/dax/kmem.c | 11 +++++++++--
drivers/dax/pmem.c | 1 +
7 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 492573b47f66..4a03b323b003 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2017-2018 Intel Corporation. All rights reserved. */
#include <linux/memremap.h>
+#include <linux/memory_hotplug.h>
#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/list.h>
@@ -394,6 +395,7 @@ static ssize_t create_store(struct device *dev, struct device_attribute *attr,
.size = 0,
.id = -1,
.memmap_on_memory = false,
+ .online_type = DAX_ONLINE_DEFAULT,
};
struct dev_dax *dev_dax = __devm_create_dev_dax(&data);
@@ -1527,6 +1529,7 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
ida_init(&dev_dax->ida);
dev_dax->memmap_on_memory = data->memmap_on_memory;
+ dev_dax->online_type = data->online_type;
inode = dax_inode(dax_dev);
dev->devt = inode->i_rdev;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 5909171a4428..f3c9dae5de6b 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -3,6 +3,7 @@
#ifndef __DAX_BUS_H__
#define __DAX_BUS_H__
#include <linux/device.h>
+#include <linux/memory_hotplug.h>
#include <linux/platform_device.h>
#include <linux/range.h>
#include <linux/workqueue.h>
@@ -16,6 +17,13 @@ struct dax_region;
#define IORESOURCE_DAX_STATIC BIT(0)
#define IORESOURCE_DAX_KMEM BIT(1)
+/*
+ * online_type sentinel: the device was created without an explicit online
+ * policy, so the system default is resolved when the kmem driver binds
+ * (not at device-creation time, which would freeze a stale policy).
+ */
+#define DAX_ONLINE_DEFAULT (-1)
+
struct dax_region *alloc_dax_region(struct device *parent, int region_id,
struct range *range, int target_node, unsigned int align,
unsigned long flags);
@@ -26,6 +34,7 @@ struct dev_dax_data {
resource_size_t size;
int id;
bool memmap_on_memory;
+ enum mmop online_type;
};
struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data);
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 3ab39b77843d..1a7ec6212213 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -27,6 +27,7 @@ static int cxl_dax_region_probe(struct device *dev)
.id = -1,
.size = range_len(&cxlr_dax->hpa_range),
.memmap_on_memory = true,
+ .online_type = DAX_ONLINE_DEFAULT,
};
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 81e4af49e39c..ccd77965fe3e 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -8,6 +8,7 @@
#include <linux/device.h>
#include <linux/cdev.h>
#include <linux/idr.h>
+#include <linux/memory_hotplug.h>
/* private routines between core files */
struct dax_device;
@@ -79,6 +80,8 @@ struct dev_dax_range {
* @dev: device core
* @pgmap: pgmap for memmap setup / lifetime (driver owned)
* @memmap_on_memory: allow kmem to put the memmap in the memory
+ * @online_type: MMOP_* online type for memory hotplug, or DAX_ONLINE_DEFAULT
+ * to resolve the system default policy when kmem binds
* @nr_range: size of @ranges
* @ranges: range tuples of memory used
*/
@@ -95,6 +98,7 @@ struct dev_dax {
struct device dev;
struct dev_pagemap *pgmap;
bool memmap_on_memory;
+ enum mmop online_type;
int nr_range;
struct dev_dax_range *ranges;
};
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index af21f66bf872..2de3bc925172 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -37,6 +37,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
.id = -1,
.size = region_idle ? 0 : range_len(&mri->range),
.memmap_on_memory = false,
+ .online_type = DAX_ONLINE_DEFAULT,
};
return PTR_ERR_OR_ZERO(devm_create_dev_dax(&data));
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 592171ec10f4..0a184c0878dd 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -72,6 +72,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
int i, rc, mapped = 0;
mhp_t mhp_flags;
int numa_node;
+ int online_type;
int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
/*
@@ -132,6 +133,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
+ /* Resolve system default at bind time in case it changed */
+ online_type = dev_dax->online_type;
+ if (online_type == DAX_ONLINE_DEFAULT)
+ online_type = mhp_get_default_online_type();
+
for (i = 0; i < dev_dax->nr_range; i++) {
struct resource *res;
struct range range;
@@ -172,8 +178,9 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
* Ensure that future kexec'd kernels will not treat
* this as RAM automatically.
*/
- rc = add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags);
+ rc = __add_memory_driver_managed(data->mgid, range.start,
+ range_len(&range), kmem_name, mhp_flags,
+ online_type);
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index bee93066a849..e7adace69195 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -63,6 +63,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
.pgmap = &pgmap,
.size = range_len(&range),
.memmap_on_memory = false,
+ .online_type = DAX_ONLINE_DEFAULT,
};
return devm_create_dev_dax(&data);
--
2.54.0
* [PATCH v5 7/9] dax/kmem: extract hotplug/hotremove helper functions
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (5 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 6/9] dax: plumb hotplug online_type through dax Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 8/9] dax/kmem: add sysfs interface for atomic whole-device hotplug Gregory Price
` (2 subsequent siblings)
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Refactor kmem _probe() and _remove() by extracting init, cleanup, hotplug,
and hot-remove logic into separate helper functions:
- dax_kmem_init_resources: inits IO_RESOURCE w/ request_mem_region
- dax_kmem_cleanup_resources: cleans up initialized IO_RESOURCE
- dax_kmem_do_hotplug: handles memory region reservation and adding
- dax_kmem_do_hotremove: handles memory removal and resource cleanup
This is a pure refactoring with no functional change. The helpers will
enable future extensions to support more granular control over memory
hotplug operations.
We need to split hotplug/hotunplug and init/cleanup in order to have the
resources available for hot-add. Otherwise, when probe occurs, the dax
devices are never added to sysfs because the resources are never
registered.
Detaching hotunplug/cleanup allows us to re-use the hotunplug code
without destroying the underlying resources.
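The four-way split can be pictured with a small userspace model. This is
only a sketch under simplifying assumptions: the model_* names are
hypothetical, a bool per range stands in for data->res[i], and the model
only adds ranges that hold a reservation. It is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_RANGES 3	/* arbitrary size for this sketch */

/* Hypothetical per-device bookkeeping: one slot per range. */
struct model_kmem {
	bool reserved[MODEL_NR_RANGES];	/* stands in for data->res[i] */
	bool added[MODEL_NR_RANGES];	/* range is currently hotplugged */
};

/* init: reserve every range that is not already reserved. */
static int model_init_resources(struct model_kmem *k)
{
	int i, mapped = 0;

	assert(k != NULL);
	for (i = 0; i < MODEL_NR_RANGES; i++) {
		if (k->reserved[i])
			continue;
		k->reserved[i] = true;
		mapped++;
	}
	return mapped;
}

/* hotplug: add memory for reserved ranges that are not yet added. */
static int model_do_hotplug(struct model_kmem *k)
{
	int i, added = 0;

	assert(k != NULL);
	for (i = 0; i < MODEL_NR_RANGES; i++) {
		if (!k->reserved[i] || k->added[i])
			continue;
		k->added[i] = true;
		added++;
	}
	return added;
}

/* hotremove: remove added ranges and release their reservation. */
static int model_do_hotremove(struct model_kmem *k)
{
	int i, removed = 0;

	assert(k != NULL);
	for (i = 0; i < MODEL_NR_RANGES; i++) {
		if (!k->added[i])
			continue;
		k->added[i] = false;
		k->reserved[i] = false;
		removed++;
	}
	return removed;
}

/* cleanup: drop whatever reservations are still held. */
static void model_cleanup_resources(struct model_kmem *k)
{
	int i;

	assert(k != NULL);
	for (i = 0; i < MODEL_NR_RANGES; i++)
		k->reserved[i] = false;
}
```

In this model, probe corresponds to init followed by hotplug, and a
failed hotplug only needs cleanup to unwind the reservations.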
Signed-off-by: Gregory Price <gourry@gourry.net>
---
drivers/dax/kmem.c | 316 ++++++++++++++++++++++++++++++---------------
1 file changed, 214 insertions(+), 102 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 0a184c0878dd..a45e50def537 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -63,14 +63,195 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/**
+ * dax_kmem_do_hotplug - hotplug memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Hotplugs all ranges in the dev_dax region as system memory.
+ *
+ * Returns the number of successfully mapped ranges, or negative error.
+ */
+static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data,
+ int online_type)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, rc, onlined = 0;
+ mhp_t mhp_flags;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ mhp_flags = MHP_NID_IS_MGID;
+ if (dev_dax->memmap_on_memory)
+ mhp_flags |= MHP_MEMMAP_ON_MEMORY;
+
+ /*
+ * Ensure that future kexec'd kernels will not treat
+ * this as RAM automatically.
+ */
+ rc = __add_memory_driver_managed(data->mgid, range.start,
+ range_len(&range), kmem_name, mhp_flags,
+ online_type);
+
+ if (rc) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
+ i, range.start, range.end);
+ /*
+ * Release the reservation for the range that failed to
+ * add so a later hotremove does not try to remove memory
+ * that was never added.
+ */
+ if (data->res[i]) {
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+ if (onlined)
+ continue;
+ return rc;
+ }
+ onlined++;
+ }
+
+ return onlined;
+}
+
+/**
+ * dax_kmem_init_resources - create memory regions for dax kmem
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Reserves an I/O memory resource for each range of the dax device.
+ *
+ * Returns the number of successfully mapped ranges, or negative error.
+ */
+static int dax_kmem_init_resources(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, rc, mapped = 0;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct resource *res;
+ struct range range;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* Skip ranges already added */
+ if (data->res[i])
+ continue;
+
+ /* Region is permanently reserved if hotremove fails. */
+ res = request_mem_region(range.start, range_len(&range),
+ data->res_name);
+ if (!res) {
+ dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
+ i, range.start, range.end);
+ /*
+ * Once some memory has been onlined we can't
+ * assume that it can be un-onlined safely.
+ */
+ if (mapped)
+ continue;
+ return -EBUSY;
+ }
+ data->res[i] = res;
+ /*
+ * Set flags appropriate for System RAM. Leave ..._BUSY clear
+ * so that add_memory() can add a child resource. Do not
+ * inherit flags from the parent since it may set new flags
+ * unknown to us that will break add_memory() below.
+ */
+ res->flags = IORESOURCE_SYSTEM_RAM;
+ mapped++;
+ }
+ return mapped;
+}
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+/**
+ * dax_kmem_do_hotremove - hot-remove memory for dax kmem device
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Removes all ranges in the dev_dax region.
+ *
+ * Returns the number of successfully removed ranges.
+ */
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ struct device *dev = &dev_dax->dev;
+ int i, success = 0;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ struct range range;
+ int rc;
+
+ rc = dax_kmem_range(dev_dax, i, &range);
+ if (rc)
+ continue;
+
+ /* range was never added during probe, count as removed */
+ if (!data->res[i]) {
+ success++;
+ continue;
+ }
+
+ rc = remove_memory(range.start, range_len(&range));
+ if (rc == 0) {
+ /* Release the resource for the successfully removed range */
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ success++;
+ continue;
+ }
+ any_hotremove_failed = true;
+ dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n",
+ i, range.start, range.end);
+ }
+
+ return success;
+}
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
+/**
+ * dax_kmem_cleanup_resources - remove the dax memory resources
+ * @dev_dax: the dev_dax instance
+ * @data: the dax_kmem_data structure with resource tracking
+ *
+ * Removes all resources in the dev_dax region.
+ */
+static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ int i;
+
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ if (!data->res[i])
+ continue;
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+}
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
unsigned long total_len = 0, orig_len = 0;
struct dax_kmem_data *data;
struct memory_dev_type *mtype;
- int i, rc, mapped = 0;
- mhp_t mhp_flags;
+ int i, rc;
int numa_node;
int online_type;
int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
@@ -133,73 +314,27 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
goto err_reg_mgid;
data->mgid = rc;
+ dev_set_drvdata(dev, data);
+
+ rc = dax_kmem_init_resources(dev_dax, data);
+ if (rc < 0)
+ goto err_resources;
+
/* Resolve system default at bind time in case it changed */
online_type = dev_dax->online_type;
if (online_type == DAX_ONLINE_DEFAULT)
online_type = mhp_get_default_online_type();
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct resource *res;
- struct range range;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- /* Region is permanently reserved if hotremove fails. */
- res = request_mem_region(range.start, range_len(&range), data->res_name);
- if (!res) {
- dev_warn(dev, "mapping%d: %#llx-%#llx could not reserve region\n",
- i, range.start, range.end);
- /*
- * Once some memory has been onlined we can't
- * assume that it can be un-onlined safely.
- */
- if (mapped)
- continue;
- rc = -EBUSY;
- goto err_request_mem;
- }
- data->res[i] = res;
-
- /*
- * Set flags appropriate for System RAM. Leave ..._BUSY clear
- * so that add_memory() can add a child resource. Do not
- * inherit flags from the parent since it may set new flags
- * unknown to us that will break add_memory() below.
- */
- res->flags = IORESOURCE_SYSTEM_RAM;
-
- mhp_flags = MHP_NID_IS_MGID;
- if (dev_dax->memmap_on_memory)
- mhp_flags |= MHP_MEMMAP_ON_MEMORY;
-
- /*
- * Ensure that future kexec'd kernels will not treat
- * this as RAM automatically.
- */
- rc = __add_memory_driver_managed(data->mgid, range.start,
- range_len(&range), kmem_name, mhp_flags,
- online_type);
-
- if (rc) {
- dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
- i, range.start, range.end);
- remove_resource(res);
- kfree(res);
- data->res[i] = NULL;
- if (mapped)
- continue;
- goto err_request_mem;
- }
- mapped++;
- }
-
- dev_set_drvdata(dev, data);
+ rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ if (rc < 0)
+ goto err_hotplug;
return 0;
-err_request_mem:
+err_hotplug:
+ dax_kmem_cleanup_resources(dev_dax, data);
+err_resources:
+ dev_set_drvdata(dev, NULL);
memory_group_unregister(data->mgid);
err_reg_mgid:
kfree(data->res_name);
@@ -213,7 +348,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
#ifdef CONFIG_MEMORY_HOTREMOVE
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
- int i, success = 0;
+ int success;
int node = dev_dax->target_node;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
@@ -224,48 +359,25 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
* there is no way to hotremove this memory until reboot because device
* unbind will succeed even if we return failure.
*/
- for (i = 0; i < dev_dax->nr_range; i++) {
- struct range range;
- int rc;
-
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
- continue;
-
- /* range was never added during probe */
- if (!data->res[i]) {
- success++;
- continue;
- }
-
- rc = remove_memory(range.start, range_len(&range));
- if (rc == 0) {
- remove_resource(data->res[i]);
- kfree(data->res[i]);
- data->res[i] = NULL;
- success++;
- continue;
- }
- any_hotremove_failed = true;
- dev_err(dev,
- "mapping%d: %#llx-%#llx cannot be hotremoved until the next reboot\n",
- i, range.start, range.end);
+ success = dax_kmem_do_hotremove(dev_dax, data);
+ if (success < dev_dax->nr_range) {
+ dev_err(dev, "Hotplug regions stuck online until reboot\n");
+ return;
}
- if (success >= dev_dax->nr_range) {
- memory_group_unregister(data->mgid);
- kfree(data->res_name);
- kfree(data);
- dev_set_drvdata(dev, NULL);
- /*
- * Clear the memtype association on successful unplug.
- * If not, we have memory blocks left which can be
- * offlined/onlined later. We need to keep memory_dev_type
- * for that. This implies this reference will be around
- * till next reboot.
- */
- clear_node_memory_type(node, NULL);
- }
+ dax_kmem_cleanup_resources(dev_dax, data);
+ memory_group_unregister(data->mgid);
+ kfree(data->res_name);
+ kfree(data);
+ dev_set_drvdata(dev, NULL);
+ /*
+ * Clear the memtype association on successful unplug.
+ * If not, we have memory blocks left which can be
+ * offlined/onlined later. We need to keep memory_dev_type
+ * for that. This implies this reference will be around
+ * till next reboot.
+ */
+ clear_node_memory_type(node, NULL);
}
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
--
2.54.0
* [PATCH v5 8/9] dax/kmem: add sysfs interface for atomic whole-device hotplug
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (6 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 14:57 ` [PATCH v5 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
2026-06-24 18:59 ` [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple,
Hannes Reinecke
There is no atomic mechanism to offline and remove an entire
multi-block DAX kmem device. This is presently done in two steps:
1. offline all memory blocks
2. remove all memory blocks
This creates a race condition where another entity operates directly
on the memory blocks between the two steps, which can cause hot-unplug
to fail or unbind to deadlock.
Add a new 'state' sysfs attribute that enables an atomic whole-device
hotplug operation across its entire memory region.
daxX.Y/state mirrors the per-block memoryX/state ABI:
- [offline, online, online_kernel, online_movable]
- "unplugged" is added specifically for daxX.Y/state
The valid writable states include:
- "unplugged": memory blocks are not present
- "online": memory is online, zone chosen by the kernel
- "online_kernel": memory is online in ZONE_NORMAL
- "online_movable": memory is online in ZONE_MOVABLE
Valid transitions:
- unplugged -> online[_kernel|_movable]
- online[_kernel|_movable] -> unplugged
- offline -> unplugged
A device can only be onlined from "unplugged", so it must be returned
there before being onlined into a different state.
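The transition rules above can be written down as a small check. This is
a userspace sketch with hypothetical ST_* and model_* names, intended to
mirror the rules as stated in this message rather than to reproduce
state_store() exactly.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical state encoding for this sketch. */
enum {
	ST_UNPLUGGED = -1,
	ST_OFFLINE = 0,
	ST_ONLINE,
	ST_ONLINE_KERNEL,
	ST_ONLINE_MOVABLE,
};

/*
 * Return 0 if moving from @cur to @req is allowed, -EINVAL if @req is
 * not a writable state, or -EBUSY if the device must be unplugged first.
 */
static int model_check_transition(int cur, int req)
{
	assert(cur >= ST_UNPLUGGED && cur <= ST_ONLINE_MOVABLE);

	/* "offline" is reportable but never writable. */
	if (req == ST_OFFLINE)
		return -EINVAL;
	if (req < ST_UNPLUGGED || req > ST_ONLINE_MOVABLE)
		return -EINVAL;

	/* Writing the current state is a no-op. */
	if (cur == req)
		return 0;

	/* Any present state may be unplugged. */
	if (req == ST_UNPLUGGED)
		return 0;

	/* Onlining is only allowed from "unplugged". */
	if (cur != ST_UNPLUGGED)
		return -EBUSY;
	return 0;
}
```

For example, online -> online_movable yields -EBUSY in this model, and
so does offline -> online, since both must pass through "unplugged".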
For backwards compatibility the memory blocks are always created at
probe - existing tools expect them to be present after kmem binds.
"offline" is therefore a reportable state but is not writable: it only
arises from the legacy auto_online_blocks=offline policy. Onlining
such a device through this attribute requires unplugging it first;
this is meant to encourage drivers that create DAX devices to set an
explicit default.
Unplug is atomic across the whole device: dax_kmem_do_hotremove()
collects every added range and offlines/removes them in one operation.
Either the whole operation succeeds or it is entirely rolled back.
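The all-or-nothing behaviour can be illustrated with a toy model. This
is a sketch only: struct model_block, its "pinned" flag and
model_offline_all() are hypothetical, and the model makes no claim about
how offline_and_remove_memory_ranges() is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical memory block for this sketch. */
struct model_block {
	bool online;	/* current state */
	bool pinned;	/* assumption: offlining this block fails */
	bool saved;	/* state remembered for rollback */
};

/*
 * Offline every block, or none: on the first failure, restore the
 * blocks that were already offlined to their previous state.
 */
static int model_offline_all(struct model_block *blk, size_t n)
{
	size_t i;

	assert(blk != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (blk[i].pinned)
			goto rollback;
		blk[i].saved = blk[i].online;
		blk[i].online = false;
	}
	return 0;

rollback:
	/* Walk back over the blocks touched so far. */
	while (i-- > 0)
		blk[i].online = blk[i].saved;
	return -EBUSY;
}
```

A caller only changes its recorded device state when the function
returns 0; on failure every block is back in the state it started in.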
Unbind Note:
We used to call remove_memory() during unbind, which would fire a
BUG() if any of the memory blocks were online at that time. We lift
this into a WARN in the cleanup routine and don't attempt hotremove
if ->state is not DAX_KMEM_UNPLUGGED or MMOP_OFFLINE.
The memory of an offline dax device is removed on unbind, as before.
If online at unbind, the resources are leaked (as before), but now
we prevent deadlock if a memory region is impossible to hotremove.
Suggested-by: Hannes Reinecke <hare@suse.de>
Suggested-by: David Hildenbrand <david@kernel.org>
Signed-off-by: Gregory Price <gourry@gourry.net>
---
Documentation/ABI/testing/sysfs-bus-dax | 26 +++
drivers/base/memory.c | 9 +
drivers/dax/kmem.c | 224 ++++++++++++++++++++----
include/linux/memory_hotplug.h | 1 +
4 files changed, 224 insertions(+), 36 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-dax b/Documentation/ABI/testing/sysfs-bus-dax
index b34266bfae49..2dcad1e9dad0 100644
--- a/Documentation/ABI/testing/sysfs-bus-dax
+++ b/Documentation/ABI/testing/sysfs-bus-dax
@@ -151,3 +151,29 @@ Description:
memmap_on_memory parameter for memory_hotplug. This is
typically set on the kernel command line -
memory_hotplug.memmap_on_memory set to 'true' or 'force'."
+
+What: /sys/bus/dax/devices/daxX.Y/state
+Date: June, 2026
+KernelVersion: v6.21
+Contact: nvdimm@lists.linux.dev
+Description:
+ (RW) Controls the state of the memory region.
+ Applies to all memory blocks associated with the device.
+ Only applies to dax_kmem devices.
+
+ Reading returns the current state; the writable states mirror
+ the per-block /sys/devices/system/memory/memoryX/state ABI::
+
+ "unplugged": memory blocks are not present
+ "online": memory is online, zone chosen by the kernel
+ "online_kernel": memory is online in ZONE_NORMAL
+ "online_movable": memory is online in ZONE_MOVABLE
+
+ "offline" (memory blocks are present but offline) may also be
+ reported - this happens when the device is bound while the
+ auto_online_blocks policy is "offline". It cannot be written,
+ as it's not useful and creates device destruction races.
+
+ A device can only be onlined from the "unplugged" state, so a
+ device must be returned to "unplugged" before it can be onlined
+ into a different state.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index b318344426fa..3a2f69d3af7b 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -46,6 +46,15 @@ int mhp_online_type_from_str(const char *str)
}
return -EINVAL;
}
+EXPORT_SYMBOL_GPL(mhp_online_type_from_str);
+
+const char *mhp_online_type_to_str(int online_type)
+{
+ if (online_type < 0 || online_type >= (int)ARRAY_SIZE(online_type_to_str))
+ return NULL;
+ return online_type_to_str[online_type];
+}
+EXPORT_SYMBOL_GPL(mhp_online_type_to_str);
#define to_memory_block(dev) container_of(dev, struct memory_block, dev)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a45e50def537..340486586d82 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -42,9 +42,15 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
return 0;
}
+#define DAX_KMEM_UNPLUGGED (-1)
+
struct dax_kmem_data {
const char *res_name;
int mgid;
+ int numa_node;
+ struct dev_dax *dev_dax;
+ int state;
+ struct mutex lock; /* protects hotplug state transitions */
struct resource *res[];
};
@@ -63,12 +69,22 @@ static void kmem_put_memory_types(void)
mt_put_memory_types(&kmem_memory_types);
}
+/* True for the online states a kmem dax device can hold. */
+static bool dax_kmem_state_is_online(int state)
+{
+ return state == MMOP_ONLINE ||
+ state == MMOP_ONLINE_KERNEL ||
+ state == MMOP_ONLINE_MOVABLE;
+}
+
/**
* dax_kmem_do_hotplug - hotplug memory for dax kmem device
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
+ * @online_type: the online policy to use for the memory blocks
*
- * Hotplugs all ranges in the dev_dax region as system memory.
+ * Hotplugs all ranges in the dev_dax region as system memory with the
+ * provided online policy (offline, online, online_movable, online_kernel).
*
* Returns the number of successfully mapped ranges, or negative error.
*/
@@ -77,9 +93,15 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
int online_type)
{
struct device *dev = &dev_dax->dev;
- int i, rc, onlined = 0;
+ int i, rc, added = 0;
mhp_t mhp_flags;
+ if (dax_kmem_state_is_online(data->state))
+ return -EINVAL;
+
+ if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
+ return -EINVAL;
+
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
@@ -112,14 +134,14 @@ static int dax_kmem_do_hotplug(struct dev_dax *dev_dax,
kfree(data->res[i]);
data->res[i] = NULL;
}
- if (onlined)
+ if (added)
continue;
return rc;
}
- onlined++;
+ added++;
}
- return onlined;
+ return added;
}
/**
@@ -182,45 +204,64 @@ static int dax_kmem_init_resources(struct dev_dax *dev_dax,
* @dev_dax: the dev_dax instance
* @data: the dax_kmem_data structure with resource tracking
*
- * Removes all ranges in the dev_dax region.
+ * Offlines and removes every currently-added range in the dev_dax region
+ * atomically: either all ranges are offlined and removed, or none are and
+ * the device is returned to its prior state.
*
- * Returns the number of successfully removed ranges.
+ * Returns 0 on success, or a negative errno on failure.
*/
static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
struct dax_kmem_data *data)
{
struct device *dev = &dev_dax->dev;
- int i, success = 0;
+ struct range *ranges;
+ int i, nr_ranges = 0, rc;
+
+ ranges = kmalloc_array(dev_dax->nr_range, sizeof(*ranges), GFP_KERNEL);
+ if (!ranges)
+ return -ENOMEM;
+ /* Collect the ranges that were actually added during probe. */
for (i = 0; i < dev_dax->nr_range; i++) {
struct range range;
- int rc;
- rc = dax_kmem_range(dev_dax, i, &range);
- if (rc)
+ if (!data->res[i])
continue;
-
- /* range was never added during probe, count as removed */
- if (!data->res[i]) {
- success++;
+ if (dax_kmem_range(dev_dax, i, &range))
continue;
- }
+ ranges[nr_ranges++] = range;
+ }
- rc = remove_memory(range.start, range_len(&range));
- if (rc == 0) {
- /* Release the resource for the successfully removed range */
- remove_resource(data->res[i]);
- kfree(data->res[i]);
- data->res[i] = NULL;
- success++;
- continue;
- }
+ /* Nothing added means nothing to remove. */
+ if (!nr_ranges) {
+ kfree(ranges);
+ return 0;
+ }
+
+ rc = offline_and_remove_memory_ranges(ranges, nr_ranges);
+ kfree(ranges);
+ if (rc) {
any_hotremove_failed = true;
- dev_err(dev, "mapping%d: %#llx-%#llx hotremove failed\n",
- i, range.start, range.end);
+ dev_err(dev, "hotremove failed, device left online: %d\n", rc);
+ return rc;
}
- return success;
+ /* All ranges removed; release the reserved resources. */
+ for (i = 0; i < dev_dax->nr_range; i++) {
+ if (!data->res[i])
+ continue;
+ remove_resource(data->res[i]);
+ kfree(data->res[i]);
+ data->res[i] = NULL;
+ }
+
+ return 0;
+}
+#else
+static int dax_kmem_do_hotremove(struct dev_dax *dev_dax,
+ struct dax_kmem_data *data)
+{
+ return -EBUSY;
}
#endif /* CONFIG_MEMORY_HOTREMOVE */
@@ -236,6 +277,18 @@ static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
{
int i;
+ /*
+ * If the device unbind occurs before memory is hotremoved, we can never
+ * remove the memory (requires reboot). Attempting an offline operation
+ * here may cause deadlock and a failure to finish the unbind.
+ *
+ * Note: This leaks the resources.
+ */
+ if (WARN(((data->state != DAX_KMEM_UNPLUGGED) &&
+ (data->state != MMOP_OFFLINE)),
+ "Hotplug memory regions stuck online until reboot"))
+ return;
+
for (i = 0; i < dev_dax->nr_range; i++) {
if (!data->res[i])
continue;
@@ -245,6 +298,85 @@ static void dax_kmem_cleanup_resources(struct dev_dax *dev_dax,
}
}
+static int dax_kmem_parse_state(const char *buf)
+{
+ int online_type;
+
+ /* "unplugged" is kmem-specific - the rest map to MMOP_ */
+ if (sysfs_streq(buf, "unplugged"))
+ return DAX_KMEM_UNPLUGGED;
+
+ online_type = mhp_online_type_from_str(buf);
+ /* Disallow "offline": it's not useful and creates race conditions */
+ if (online_type == MMOP_OFFLINE)
+ return -EINVAL;
+ return online_type;
+}
+
+static ssize_t state_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ const char *state_str;
+
+ if (!data)
+ return -ENXIO;
+
+ if (data->state == DAX_KMEM_UNPLUGGED)
+ state_str = "unplugged";
+ else
+ state_str = mhp_online_type_to_str(data->state);
+
+ return sysfs_emit(buf, "%s\n", state_str ?: "unknown");
+}
+
+static ssize_t state_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct dev_dax *dev_dax = to_dev_dax(dev);
+ struct dax_kmem_data *data = dev_get_drvdata(dev);
+ int online_type;
+ int rc;
+
+ if (!data)
+ return -ENXIO;
+
+ online_type = dax_kmem_parse_state(buf);
+ if (online_type < DAX_KMEM_UNPLUGGED)
+ return online_type;
+
+ guard(mutex)(&data->lock);
+
+ /* Already in requested state */
+ if (data->state == online_type)
+ return len;
+
+ if (online_type == DAX_KMEM_UNPLUGGED) {
+ rc = dax_kmem_do_hotremove(dev_dax, data);
+ if (rc)
+ return rc;
+ data->state = DAX_KMEM_UNPLUGGED;
+ return len;
+ }
+
+ /* Onlining is only allowed from the unplugged state. */
+ if (data->state != DAX_KMEM_UNPLUGGED)
+ return -EBUSY;
+
+ /* Re-acquire resources if previously unplugged, otherwise no-op */
+ rc = dax_kmem_init_resources(dev_dax, data);
+ if (rc < 0)
+ return rc;
+
+ rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
+ if (rc < 0)
+ return rc;
+
+ data->state = online_type;
+ return len;
+}
+static DEVICE_ATTR_RW(state);
+
static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
{
struct device *dev = &dev_dax->dev;
@@ -313,6 +445,10 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc < 0)
goto err_reg_mgid;
data->mgid = rc;
+ data->numa_node = numa_node;
+ data->dev_dax = dev_dax;
+ data->state = DAX_KMEM_UNPLUGGED;
+ mutex_init(&data->lock);
dev_set_drvdata(dev, data);
@@ -325,9 +461,15 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (online_type == DAX_ONLINE_DEFAULT)
online_type = mhp_get_default_online_type();
+ /* Always create blocks for backward compatibility, even if offline */
rc = dax_kmem_do_hotplug(dev_dax, data, online_type);
if (rc < 0)
goto err_hotplug;
+ data->state = online_type;
+
+ rc = device_create_file(dev, &dev_attr_state);
+ if (rc)
+ dev_warn(dev, "failed to create state sysfs entry\n");
return 0;
@@ -348,20 +490,26 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
#ifdef CONFIG_MEMORY_HOTREMOVE
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
- int success;
int node = dev_dax->target_node;
struct device *dev = &dev_dax->dev;
struct dax_kmem_data *data = dev_get_drvdata(dev);
+ device_remove_file(dev, &dev_attr_state);
/*
- * We have one shot for removing memory, if some memory blocks were not
- * offline prior to calling this function remove_memory() will fail, and
- * there is no way to hotremove this memory until reboot because device
- * unbind will succeed even if we return failure.
+ * Online memory cannot safely be removed (offlining during unbind can
+ * deadlock a task as unbind cannot be interrupted). Unfortunately we
+ * have to leak all of [resources, memory group, @data, memtype], until
+ * the next reboot - and the memory will stay online until then.
+ *
+ * offline blocks are removed on unbind, but may leak on failure.
*/
- success = dax_kmem_do_hotremove(dev_dax, data);
- if (success < dev_dax->nr_range) {
- dev_err(dev, "Hotplug regions stuck online until reboot\n");
+ if (dax_kmem_state_is_online(data->state)) {
+ dev_warn(dev, "Hotplug regions stuck online until reboot\n");
+ any_hotremove_failed = true;
+ return;
+ } else if (data->state == MMOP_OFFLINE &&
+ dax_kmem_do_hotremove(dev_dax, data)) {
+ dev_warn(dev, "Unplug failed, resources leaked until reboot\n");
return;
}
@@ -382,6 +530,10 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
#else
static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
{
+ struct device *dev = &dev_dax->dev;
+
+ device_remove_file(dev, &dev_attr_state);
+
/*
* Without hotremove purposely leak the request_mem_region() for the
* device-dax range and return '0' to ->remove() attempts. The removal
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 7f1da7c428dc..46c796570692 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -127,6 +127,7 @@ extern int arch_add_memory(int nid, u64 start, u64 size,
extern u64 max_mem_size;
extern int mhp_online_type_from_str(const char *str);
+const char *mhp_online_type_to_str(int online_type);
/* If movable_node boot option specified */
extern bool movable_node_enabled;
--
2.54.0
* [PATCH v5 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (7 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 8/9] dax/kmem: add sysfs interface for atomic whole-device hotplug Gregory Price
@ 2026-06-24 14:57 ` Gregory Price
2026-06-24 18:59 ` [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 14:57 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, gourry, alison.schofield,
Smita.KoralahalliChannabasappa, ira.weiny, apopple
Add a kselftest for the dax/kmem whole-device "state" sysfs attribute
(/sys/bus/dax/devices/daxX.Y/state), which transitions a kmem-backed
dax device between "unplugged", "online", "online_kernel" and
"online_movable".
The kselftest also includes a test demonstrating that a forced unbind
while memory is online does not deadlock. This test is destructive: the
memory stays online and the device stays unbound until reboot.
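The script judges the unbind case by scanning the kernel log for splats
after the unbind. A minimal sketch of that check, pulled out as a
standalone helper for illustration (count_splats is a hypothetical name
and reads a saved log file rather than calling dmesg directly; the
pattern is the one used in dax-kmem-hotplug.sh):

```shell
# count_splats LOG -- print how many lines of LOG match the splat pattern.
# LOG is any text file, e.g. saved dmesg output (an assumption made here
# so the helper can be exercised without a live kernel log).
count_splats() {
	grep -ciE "KASAN|BUG:|use-after-free|general protection|Oops|refcount_t" "$1"
}
```

A count of 0 after the unbind, together with MemTotal not shrinking, is
what the test treats as a pass.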
Provisioning a devdax device and binding it to kmem needs daxctl/ndctl,
which is out of scope for an in-tree selftest, so the test discovers an
already kmem-bound dax device and SKIPs when none is present or when the
memory cannot be freed to reach a known baseline.
When a device is available it validates the interface contract:
- online / online_movable actually add memory (MemTotal grows),
- online is idempotent,
- switching between online types without unplug is rejected,
- unplug removes memory and the reported state is "unplugged",
- invalid input is rejected.
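The "memory was actually added" checks compare MemTotal before and after
a transition. Roughly, as a sketch (memory_grew is a hypothetical helper
name, and the optional file argument to memtotal_kb is added here only
so the parser can be pointed at a file other than /proc/meminfo):

```shell
# memtotal_kb [FILE] -- print the MemTotal value (kB) from a meminfo-style
# file; defaults to /proc/meminfo.
memtotal_kb() { awk '/^MemTotal:/ {print $2}' "${1:-/proc/meminfo}"; }

# memory_grew BEFORE AFTER -- succeed only if AFTER is strictly larger.
memory_grew() { [ "$2" -gt "$1" ]; }
```

Usage would look like: before=$(memtotal_kb); write "online" to the
state attribute; memory_grew "$before" "$(memtotal_kb)".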
One specific regression test:
online -> unplug -> online_movable -> unplug
Re-online must re-reserve per-range resources so that a subsequent
unplug actually offlines and removes the memory instead of silently
reporting success while the memory stays online.
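For anyone reproducing this by hand, the sequence can be driven with
something like the sketch below. run_cycle is a hypothetical helper, and
the attribute path is passed in by the caller (e.g.
/sys/bus/dax/devices/dax0.0/state, where dax0.0 is an assumed device
name). Note it only checks the reported state; since the original bug
reported success while the memory stayed online, MemTotal has to be
compared as well, as the selftest does.

```shell
# run_cycle ATTR -- write the regression sequence to ATTR and verify that
# each state reads back. Returns non-zero at the first failing step.
run_cycle() {
	local attr=$1 s
	for s in online unplugged online_movable unplugged; do
		echo "$s" > "$attr" 2>/dev/null || { echo "write '$s' failed"; return 1; }
		[ "$(cat "$attr")" = "$s" ] || { echo "state is not '$s'"; return 1; }
	done
}
```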
Signed-off-by: Gregory Price <gourry@gourry.net>
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/dax/Makefile | 6 +
tools/testing/selftests/dax/config | 4 +
.../testing/selftests/dax/dax-kmem-hotplug.sh | 207 ++++++++++++++++++
tools/testing/selftests/dax/settings | 1 +
5 files changed, 219 insertions(+)
create mode 100644 tools/testing/selftests/dax/Makefile
create mode 100644 tools/testing/selftests/dax/config
create mode 100755 tools/testing/selftests/dax/dax-kmem-hotplug.sh
create mode 100644 tools/testing/selftests/dax/settings
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 6e59b8f63e41..8c2b4f97619c 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += core
TARGETS += cpufreq
TARGETS += cpu-hotplug
TARGETS += damon
+TARGETS += dax
TARGETS += devices/error_logs
TARGETS += devices/probe
TARGETS += dmabuf-heaps
diff --git a/tools/testing/selftests/dax/Makefile b/tools/testing/selftests/dax/Makefile
new file mode 100644
index 000000000000..25a4f3d73a5b
--- /dev/null
+++ b/tools/testing/selftests/dax/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := dax-kmem-hotplug.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/dax/config b/tools/testing/selftests/dax/config
new file mode 100644
index 000000000000..4c9aaeb6ceb4
--- /dev/null
+++ b/tools/testing/selftests/dax/config
@@ -0,0 +1,4 @@
+CONFIG_DEV_DAX=m
+CONFIG_DEV_DAX_KMEM=m
+CONFIG_MEMORY_HOTPLUG=y
+CONFIG_MEMORY_HOTREMOVE=y
diff --git a/tools/testing/selftests/dax/dax-kmem-hotplug.sh b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
new file mode 100755
index 000000000000..803bbd5a6409
--- /dev/null
+++ b/tools/testing/selftests/dax/dax-kmem-hotplug.sh
@@ -0,0 +1,207 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Exercise the dax/kmem "state" sysfs attribute:
+# /sys/bus/dax/devices/daxX.Y/state -> unplugged | online | online_kernel | online_movable
+#
+# The test needs a dax device already bound to the kmem driver.
+# If no suitable device is found the tests SKIP.
+#
+# A dax device can be provisioned with the memmap= boot param, e.g.:
+# memmap=2G!4G
+#
+# then, in the booted system:
+#
+# ndctl create-namespace -m devdax -e namespace0.0 -f
+# daxctl reconfigure-device -N -m system-ram dax0.0 # bind kmem
+# ./dax-kmem-hotplug.sh
+
+# shellcheck disable=SC1091
+DIR="$(dirname "$(readlink -f "$0")")"
+. "$DIR"/../kselftest/ktap_helpers.sh
+
+DAX_BASE=/sys/bus/dax/devices
+
+memtotal_kb() { awk '/^MemTotal:/ {print $2}' /proc/meminfo; }
+get_state() { cat "$HP" 2>/dev/null; }
+# set_state STATE -- write a state to the state attribute; returns the
+# write's exit status (0 = accepted by the kernel)
+set_state() { echo "$1" > "$HP" 2>/dev/null; }
+
+find_kmem_dax() {
+ local d drv
+ for d in "$DAX_BASE"/dax*; do
+ [ -e "$d/state" ] || continue
+ drv=$(readlink "$d/driver" 2>/dev/null)
+ [ "$(basename "${drv:-}")" = kmem ] || continue
+ basename "$d"
+ return 0
+ done
+ return 1
+}
+
+ktap_print_header
+
+if [ "$UID" != 0 ]; then
+ ktap_skip_all "must be run as root"
+ exit "$KSFT_SKIP"
+fi
+
+DAX=$(find_kmem_dax)
+if [ -z "$DAX" ]; then
+ ktap_skip_all "no kmem-bound dax device with a state attribute"
+ exit "$KSFT_SKIP"
+fi
+HP=$DAX_BASE/$DAX/state
+ORIG=$(get_state)
+
+# A failure to reach the baseline is environmental (memory in use), not an
+# interface failure, so skip rather than fail.
+set_state unplugged; rc=$?
+if [ "$rc" != 0 ] || [ "$(get_state)" != unplugged ]; then
+ ktap_skip_all "$DAX: cannot reach 'unplugged' baseline (memory in use?)"
+ [ -n "$ORIG" ] && set_state "$ORIG"
+ exit "$KSFT_SKIP"
+fi
+mt_unplugged=$(memtotal_kb)
+
+DRV=/sys/bus/dax/drivers/kmem
+AOB=/sys/devices/system/memory/auto_online_blocks
+
+ktap_print_msg "using $DAX (initial state was: $ORIG)"
+ktap_set_plan 11
+
+set_state online; rc=$?
+mt_online=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online ] && [ "$mt_online" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online: state=online, MemTotal $mt_unplugged -> $mt_online kB"
+else
+ ktap_test_fail "online: rc=$rc state=$(get_state) MemTotal $mt_unplugged -> $mt_online"
+fi
+
+set_state online; rc=$?
+if [ "$rc" = 0 ] && [ "$(get_state)" = online ]; then
+ ktap_test_pass "online idempotent"
+else
+ ktap_test_fail "online idempotent: rc=$rc state=$(get_state)"
+fi
+
+set_state online_movable; rc=$?
+if [ "$rc" != 0 ] && [ "$(get_state)" = online ]; then
+ ktap_test_pass "reject online_movable without intervening unplug"
+else
+ ktap_test_fail "online->online_movable not rejected: rc=$rc state=$(get_state)"
+fi
+
+set_state unplugged; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = unplugged ] && [ "$mt" -lt "$mt_online" ]; then
+ ktap_test_pass "unplug from online: MemTotal $mt_online -> $mt kB"
+else
+ ktap_test_fail "unplug from online: rc=$rc state=$(get_state) MemTotal $mt_online -> $mt"
+fi
+
+set_state online_movable; rc=$?
+mt_movable=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online_movable ] && [ "$mt_movable" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online_movable after unplug: MemTotal $mt_unplugged -> $mt_movable kB"
+else
+ ktap_test_fail "online_movable after unplug: rc=$rc state=$(get_state) MemTotal=$mt_movable"
+fi
+
+# The online -> unplug -> online_movable -> unplug cycle once regressed:
+# a re-online failed to re-reserve the per-range resources, so the final unplug
+# reported success while leaving the memory online. Assert it is really freed.
+set_state unplugged; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" != 0 ]; then
+ ktap_test_skip "unplug from movable not accepted (memory in use?) rc=$rc"
+elif [ "$(get_state)" = unplugged ] && [ "$mt" -lt "$mt_movable" ]; then
+ ktap_test_pass "unplug from online_movable removed memory: $mt_movable -> $mt kB"
+else
+ ktap_test_fail "unplug from movable reported success but memory remained: state=$(get_state) MemTotal $mt_movable -> $mt"
+fi
+
+set_state online_kernel; rc=$?
+mt=$(memtotal_kb)
+if [ "$rc" = 0 ] && [ "$(get_state)" = online_kernel ] && [ "$mt" -gt "$mt_unplugged" ]; then
+ ktap_test_pass "online_kernel: MemTotal $mt_unplugged -> $mt kB"
+else
+ ktap_test_fail "online_kernel: rc=$rc state=$(get_state) MemTotal=$mt"
+fi
+set_state unplugged
+
+before=$(get_state)
+set_state bogus_state; rc=$?
+if [ "$rc" != 0 ] && [ "$(get_state)" = "$before" ]; then
+ ktap_test_pass "reject invalid state string"
+else
+ ktap_test_fail "invalid state not rejected: rc=$rc state=$(get_state)"
+fi
+
+# Run several online/unplug cycles and require that each one adds/removes memory
+set_state unplugged
+cycle_ok=1; fail_i=0
+for i in 1 2 3; do
+ if ! set_state online; then cycle_ok=0; fail_i=$i; break; fi
+ on=$(memtotal_kb)
+ if ! set_state unplugged; then cycle_ok=0; fail_i=$i; break; fi
+ off=$(memtotal_kb)
+ if [ "$on" -le "$mt_unplugged" ] || [ "$off" -ge "$on" ]; then
+ cycle_ok=0; fail_i=$i; break
+ fi
+done
+if [ "$cycle_ok" = 1 ]; then
+ ktap_test_pass "online/unplug cycle re-acquires resources (3x: memory added and freed each time)"
+else
+ ktap_test_fail "online/unplug cycle regressed at iteration $fail_i (on=$on off=$off baseline=$mt_unplugged)"
+fi
+
+# Change the system default online policy while the device is unbound, and
+# show that the new default is applied when the device is rebound.
+set_state unplugged
+if [ -w "$AOB" ] && [ -w "$DRV/unbind" ] && [ -w "$DRV/bind" ]; then
+ orig_aob=$(cat "$AOB")
+ echo "$DAX" > "$DRV/unbind" 2>/dev/null
+ echo offline > "$AOB" 2>/dev/null
+ echo "$DAX" > "$DRV/bind" 2>/dev/null
+ sleep 1
+ st=$(get_state)
+ echo "$orig_aob" > "$AOB" 2>/dev/null # restore system policy
+ if [ "$st" = offline ]; then
+ ktap_test_pass "online policy resolved at bind: auto_online_blocks=offline -> state=offline"
+ else
+ ktap_test_fail "bind-time policy not honored: state=$st (expected offline)"
+ fi
+ set_state unplugged 2>/dev/null
+else
+ ktap_test_skip "auto_online_blocks or driver bind/unbind not writable"
+fi
+
+[ -n "$ORIG" ] && set_state "$ORIG"
+
+# DESTRUCTIVE: unbinding the driver while memory is online causes the resources
+# to leak - but the unbind should not deadlock. Instead the driver leaks them
+# with a single "stuck online" warning. This leaves the memory online and the
+# device unbound until reboot, so it runs last.
+set_state unplugged; set_state online
+if [ "$(get_state)" = online ] && [ -w "$DRV/unbind" ]; then
+ mt_on=$(memtotal_kb)
+ dmesg -C 2>/dev/null
+ echo "$DAX" > "$DRV/unbind" 2>/dev/null
+ mt_after=$(memtotal_kb)
+ # The leaked "System RAM (kmem)" regions stay in the iomem tree; reading
+ # their names dereferences res_name, which a buggy unbind already freed.
+ # Walk /proc/iomem to provoke that use-after-free (caught by KASAN).
+ cat /proc/iomem > /dev/null 2>&1
+ splat=$(dmesg 2>/dev/null | grep -ciE "KASAN|BUG:|use-after-free|general protection|Oops|refcount_t")
+ if [ "$splat" = 0 ] && [ "$mt_after" -ge "$mt_on" ]; then
+ ktap_test_pass "unbind while online: memory left online, no UAF/oops (MemTotal $mt_on -> $mt_after kB)"
+ else
+ ktap_test_fail "unbind while online regressed: splat=$splat MemTotal $mt_on -> $mt_after kB"
+ fi
+else
+ ktap_test_skip "could not online device for unbind-while-online test"
+fi
+
+ktap_finished
diff --git a/tools/testing/selftests/dax/settings b/tools/testing/selftests/dax/settings
new file mode 100644
index 000000000000..ba4d85f74cd6
--- /dev/null
+++ b/tools/testing/selftests/dax/settings
@@ -0,0 +1 @@
+timeout=90
--
2.54.0
* Re: [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg
2026-06-24 14:57 ` [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
@ 2026-06-24 16:28 ` Gupta, Pankaj
0 siblings, 0 replies; 13+ messages in thread
From: Gupta, Pankaj @ 2026-06-24 16:28 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, alison.schofield, Smita.KoralahalliChannabasappa,
ira.weiny, apopple
> Modify online_memory_block() to accept the online type through its arg
> parameter rather than calling mhp_get_default_online_type() internally.
>
> This prepares for allowing callers to specify explicit online types.
>
> Update the caller in add_memory_resource() to pass the default online
> type via a local variable.
>
> No functional change.
>
> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> mm/memory_hotplug.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 7ac19fab2263..6833208cc17c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1337,7 +1337,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
>
> static int online_memory_block(struct memory_block *mem, void *arg)
> {
> - mem->online_type = mhp_get_default_online_type();
> + enum mmop *online_type = arg;
> +
> + mem->online_type = *online_type;
> return device_online(&mem->dev);
> }
>
> @@ -1494,6 +1496,7 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
> int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> {
> struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> + enum mmop online_type = mhp_get_default_online_type();
> enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> struct memory_group *group = NULL;
> u64 start, size;
> @@ -1582,7 +1585,8 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>
> /* online pages if requested */
> if (mhp_get_default_online_type() != MMOP_OFFLINE)
> - walk_memory_blocks(start, size, NULL, online_memory_block);
> + walk_memory_blocks(start, size, &online_type,
> + online_memory_block);
>
> return ret;
> error:
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
* Re: [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg
2026-06-24 14:57 ` [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
@ 2026-06-24 16:41 ` Gupta, Pankaj
0 siblings, 0 replies; 13+ messages in thread
From: Gupta, Pankaj @ 2026-06-24 16:41 UTC (permalink / raw)
To: Gregory Price, linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, alison.schofield, Smita.KoralahalliChannabasappa,
ira.weiny, apopple
> Existing callers of add_memory_driver_managed cannot select the
> preferred online type (ZONE_NORMAL vs ZONE_MOVABLE), requiring it to
> hot-add memory as offline blocks, and then follow up by onlining each
> memory block individually.
>
> Most drivers prefer the system default, but the CXL driver wants to
> plumb a preferred policy through the dax kmem driver.
>
> Refactor APIs to add a new interface which allows the dax kmem module
> to select a preferred policy.
>
> Overriding the configured auto-online policy is only safe for known
> in-tree modules, where we know the override reflects a different,
> user-requested policy. We do not want arbitrary out-of-tree drivers
> silently overriding the system-wide onlining policy, so restrict the
> new interface to the kmem module using EXPORT_SYMBOL_FOR_MODULES()
> rather than a plain EXPORT_SYMBOL_GPL(). Other in-tree modules (e.g.
> cxl_core) can be added to the allowed list as the need arises.
>
> Refactor add_memory_driver_managed, extract __add_memory_driver_managed
> - Add proper kernel-doc for add_memory_driver_managed while refactoring
> - New helper accepts an explicit online_type.
> - New helper validates online_type is between OFFLINE and ONLINE_MOVABLE
>
> Refactor: add_memory_resource, extract __add_memory_resource
> - new helper accepts an explicit online_type
>
> Original APIs now explicitly pass the system-default to new helpers.
>
> No functional change for existing users.
>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
> include/linux/memory_hotplug.h | 3 ++
> mm/memory_hotplug.c | 61 +++++++++++++++++++++++++++++-----
> 2 files changed, 56 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index f059025f8f8b..d3edeb80aadb 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -294,6 +294,9 @@ extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> extern int add_memory_resource(int nid, struct resource *resource,
> mhp_t mhp_flags);
> +int __add_memory_driver_managed(int nid, u64 start, u64 size,
> + const char *resource_name, mhp_t mhp_flags,
> + enum mmop online_type);
> extern int add_memory_driver_managed(int nid, u64 start, u64 size,
> const char *resource_name,
> mhp_t mhp_flags);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 494257054095..a66346def504 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1494,10 +1494,10 @@ static int create_altmaps_and_memory_blocks(int nid, struct memory_group *group,
> *
> * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
> */
> -int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> +static int __add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags,
> + enum mmop online_type)
> {
> struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
> - enum mmop online_type = mhp_get_default_online_type();
> enum memblock_flags memblock_flags = MEMBLOCK_NONE;
> struct memory_group *group = NULL;
> u64 start, size;
> @@ -1585,7 +1585,7 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> merge_system_ram_resource(res);
>
> /* online pages if requested */
> - if (mhp_get_default_online_type() != MMOP_OFFLINE)
> + if (online_type != MMOP_OFFLINE)
> walk_memory_blocks(start, size, &online_type,
> online_memory_block);
>
> @@ -1603,7 +1603,13 @@ int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> return ret;
> }
>
> -/* requires device_hotplug_lock, see add_memory_resource() */
> +int add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
> +{
> + return __add_memory_resource(nid, res, mhp_flags,
> + mhp_get_default_online_type());
> +}
> +
> +/* requires device_hotplug_lock, see __add_memory_resource() */
> int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
> {
> struct resource *res;
> @@ -1631,7 +1637,15 @@ int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
> }
> EXPORT_SYMBOL_GPL(add_memory);
>
> -/*
> +/**
> + * __add_memory_driver_managed - add driver-managed memory with explicit online_type
> + * @nid: NUMA node ID where the memory will be added
> + * @start: Start physical address of the memory range
> + * @size: Size of the memory range in bytes
> + * @resource_name: Resource name in format "System RAM ($DRIVER)"
> + * @mhp_flags: Memory hotplug flags
> + * @online_type: Auto-Online behavior (offline, online, kernel, movable)
> + *
> * Add special, driver-managed memory to the system as system RAM. Such
> * memory is not exposed via the raw firmware-provided memmap as system
> * RAM, instead, it is detected and added by a driver - during cold boot,
> @@ -1639,6 +1653,7 @@ EXPORT_SYMBOL_GPL(add_memory);
> *
> * Reasons why this memory should not be used for the initial memmap of a
> * kexec kernel or for placing kexec images:
> + *
> * - The booting kernel is in charge of determining how this memory will be
> * used (e.g., use persistent memory as system RAM)
> * - Coordination with a hypervisor is required before this memory
> @@ -1651,9 +1666,12 @@ EXPORT_SYMBOL_GPL(add_memory);
> *
> * The resource_name (visible via /proc/iomem) has to have the format
> * "System RAM ($DRIVER)".
> + *
> + * Return: 0 on success, negative error code on failure.
> */
> -int add_memory_driver_managed(int nid, u64 start, u64 size,
> - const char *resource_name, mhp_t mhp_flags)
> +int __add_memory_driver_managed(int nid, u64 start, u64 size,
> + const char *resource_name, mhp_t mhp_flags,
> + enum mmop online_type)
> {
> struct resource *res;
> int rc;
> @@ -1663,6 +1681,9 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
> resource_name[strlen(resource_name) - 1] != ')')
> return -EINVAL;
>
> + if (online_type < MMOP_OFFLINE || online_type > MMOP_ONLINE_MOVABLE)
> + return -EINVAL;
> +
> lock_device_hotplug();
>
> res = register_memory_resource(start, size, resource_name);
> @@ -1671,7 +1692,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
> goto out_unlock;
> }
>
> - rc = add_memory_resource(nid, res, mhp_flags);
> + rc = __add_memory_resource(nid, res, mhp_flags, online_type);
> if (rc < 0)
> release_memory_resource(res);
>
> @@ -1679,6 +1700,30 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
> unlock_device_hotplug();
> return rc;
> }
> +EXPORT_SYMBOL_FOR_MODULES(__add_memory_driver_managed, "kmem");
> +
> +/**
> + * add_memory_driver_managed - add driver-managed memory
> + * @nid: NUMA node ID where the memory will be added
> + * @start: Start physical address of the memory range
> + * @size: Size of the memory range in bytes
> + * @resource_name: Resource name in format "System RAM ($DRIVER)"
> + * @mhp_flags: Memory hotplug flags
> + *
> + * Add driver-managed memory with the system default online type set by
> + * build config or kernel boot parameter.
> + *
> + * See __add_memory_driver_managed for more details.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int add_memory_driver_managed(int nid, u64 start, u64 size,
> + const char *resource_name, mhp_t mhp_flags)
> +{
> + return __add_memory_driver_managed(nid, start, size, resource_name,
> + mhp_flags,
> + mhp_get_default_online_type());
> +}
> EXPORT_SYMBOL_GPL(add_memory_driver_managed);
>
> /*
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
* Re: [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
` (8 preceding siblings ...)
2026-06-24 14:57 ` [PATCH v5 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
@ 2026-06-24 18:59 ` Gregory Price
9 siblings, 0 replies; 13+ messages in thread
From: Gregory Price @ 2026-06-24 18:59 UTC (permalink / raw)
To: linux-mm, nvdimm
Cc: linux-kernel, linux-cxl, driver-core, linux-kselftest,
kernel-team, david, osalvador, gregkh, rafael, dakr, djbw,
vishal.l.verma, dave.jiang, akpm, ljs, liam, vbabka, rppt, surenb,
mhocko, shuah, alison.schofield, Smita.KoralahalliChannabasappa,
ira.weiny, apopple
On Wed, Jun 24, 2026 at 10:57:35AM -0400, Gregory Price wrote:
>... snip ...
Disregard, there are a few unaddressed Sashiko comments, I'm just going
to respin this. Will wait until after the merge window closes for v6.
The rough shape of things should still hold w/ prior feedback.
~Gregory
end of thread, other threads:[~2026-06-24 18:59 UTC | newest]
Thread overview: 13+ messages
2026-06-24 14:57 [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price
2026-06-24 14:57 ` [PATCH v5 1/9] mm/memory: add memory_block_aligned_range() helper Gregory Price
2026-06-24 14:57 ` [PATCH v5 2/9] mm/memory_hotplug: pass online_type to online_memory_block() via arg Gregory Price
2026-06-24 16:28 ` Gupta, Pankaj
2026-06-24 14:57 ` [PATCH v5 3/9] mm/memory_hotplug: export mhp_get_default_online_type Gregory Price
2026-06-24 14:57 ` [PATCH v5 4/9] mm/memory_hotplug: add __add_memory_driver_managed() with online_type arg Gregory Price
2026-06-24 16:41 ` Gupta, Pankaj
2026-06-24 14:57 ` [PATCH v5 5/9] mm/memory_hotplug: offline_and_remove_memory_ranges() Gregory Price
2026-06-24 14:57 ` [PATCH v5 6/9] dax: plumb hotplug online_type through dax Gregory Price
2026-06-24 14:57 ` [PATCH v5 7/9] dax/kmem: extract hotplug/hotremove helper functions Gregory Price
2026-06-24 14:57 ` [PATCH v5 8/9] dax/kmem: add sysfs interface for atomic whole-device hotplug Gregory Price
2026-06-24 14:57 ` [PATCH v5 9/9] selftests/dax: add dax/kmem hotplug sysfs regression test Gregory Price
2026-06-24 18:59 ` [PATCH v5 0/9] dax/kmem: atomic whole-device hotplug via sysfs Gregory Price