* [LSF/MM/BPF TOPIC] Discussion: Targeted memory allocation via debugfs
@ 2026-02-27 2:42 Juan Yescas
2026-03-16 15:52 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 4+ messages in thread
From: Juan Yescas @ 2026-02-27 2:42 UTC
To: Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
android-mm, Linux Memory Management List, Matthew Wilcox,
Vlastimil Babka, David Hildenbrand (Red Hat), Lorenzo Stoakes,
lsf-pc
Hi LSF MM organizers
I would like to propose a discussion on improving our ability to
reproduce complex memory allocation and reclaim scenarios, and solicit
feedback on a debugfs-based testing interface to help trigger these
edge cases.
== The Problem ==
We frequently encounter complex memory management issues in the wild, including:
- CMA allocation failures due to pinned MIGRATE_MOVABLE pages.
- Page migration and compaction failing during reclaim.
- Excessive reclaim loops triggered by specific workloads.
- OOM kills.
Reproducing these specific memory states for debugging is currently
cumbersome. For instance, consuming most of the available
MIGRATE_MOVABLE memory, or forcing MIGRATE_UNMOVABLE allocations
specifically from Node 1 and Zone DMA directly from userspace,
requires writing custom kernel modules or relying on unreliable
userspace memory pressure tactics.
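For reference, the free-list state that such reproducers try to shape can already be observed from userspace: /proc/buddyinfo lists free blocks per node, zone and order, and /proc/pagetypeinfo breaks them down further by migrate type. The sketch below is only a reading aid; the function name free_blocks is made up for illustration.

```shell
# free_blocks NODE ZONE ORDER [FILE]
# Print the number of free blocks of the given order for one node/zone,
# parsed from /proc/buddyinfo (or from FILE, for offline experiments).
# Line format: "Node 0, zone   Normal   <order-0> <order-1> ..."
free_blocks() {
    awk -v node="$1," -v zone="$2" -v order="$3" \
        '$1 == "Node" && $2 == node && $4 == zone { print $(5 + order) }' \
        "${4:-/proc/buddyinfo}"
}

# Example (output depends on the machine):
#   free_blocks 0 Normal 10
```

Comparing the value before and after a reproducer runs gives a quick check that the intended node, zone and order were actually affected.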
== Proposed Approach ==
To simplify reproducer setups, we are exploring a debugfs driver that
allows us to perform highly targeted allocations using a
straightforward path-based API. The interface exposes the node, zone,
order, and migrate type.
Example 1: Allocating 2^11 pages from Node 1, Zone Normal, MIGRATE_MOVABLE
$ echo 1 > /sys/kernel/debug/mm/node-1/zone-Normal/order-11/migrate-Movable/alloc
This generates a unique handle (file) for the allocation:
$ ls /sys/kernel/debug/mm/node-1/zone-Normal/order-11/migrate-Movable/
8343cb1e-cc57-4753-a060-e152e0584e36
Example 2: Allocating 2^3 pages from Node 0, Zone DMA, MIGRATE_UNMOVABLE
$ echo 1 > /sys/kernel/debug/mm/node-0/zone-DMA/order-3/migrate-Unmovable/alloc
This likewise generates a handle:
$ ls /sys/kernel/debug/mm/node-0/zone-DMA/order-3/migrate-Unmovable/
b5f607ec-eae3-4aca-b8ab-4335a4338a1f
To release the memory, userspace simply writes 0 to the generated handle:
$ echo 0 > /sys/kernel/debug/mm/node-0/zone-DMA/order-3/migrate-Unmovable/b5f607ec-eae3-4aca-b8ab-4335a4338a1f
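The path scheme above could be wrapped in a few shell helpers. This is a minimal sketch that assumes the proposed layout exactly as described (an "alloc" file plus one handle file per allocation); the interface is a proposal and not part of any upstream kernel, and the names TMA_ROOT, tma_dir, tma_alloc and tma_free_all are hypothetical.

```shell
# Hypothetical helpers around the proposed (not upstream) debugfs layout.
# TMA_ROOT is an assumed name; point it at a scratch directory to experiment.
TMA_ROOT="${TMA_ROOT:-/sys/kernel/debug/mm}"

# tma_dir NODE ZONE ORDER MIGRATETYPE -> directory for that allocation class
tma_dir() {
    printf '%s/node-%s/zone-%s/order-%s/migrate-%s\n' \
        "$TMA_ROOT" "$1" "$2" "$3" "$4"
}

# tma_alloc NODE ZONE ORDER MIGRATETYPE: request one block of 2^ORDER pages
tma_alloc() {
    echo 1 > "$(tma_dir "$@")/alloc"
}

# tma_free_all NODE ZONE ORDER MIGRATETYPE: write 0 to every handle,
# skipping the "alloc" control file itself
tma_free_all() {
    dir="$(tma_dir "$@")"
    for handle in "$dir"/*; do
        [ -e "$handle" ] || continue
        [ "$(basename "$handle")" = "alloc" ] && continue
        echo 0 > "$handle"
    done
}
```

With these, Example 2 becomes `tma_alloc 0 DMA 3 Unmovable`, and `tma_free_all 0 DMA 3 Unmovable` releases every outstanding handle of that class without copying UUIDs by hand.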
== Discussion Points ==
Rather than presenting this as a finalized driver, I would like to use
this session to discuss the design with the mm community:
- API Semantics: Does this path-based structure make sense for
targeted allocations? How should we handle metadata (e.g., running cat
on the generated file to show allocation details/status)?
- Extensibility: What other memory shaping or fault-injection
functionality would be valuable to add to this driver for the broader
community?
- Alternative Approaches: Are there better existing mechanisms to
achieve this level of deterministic, user-controlled page allocation
for testing?
Thanks
Juan
* Re: [LSF/MM/BPF TOPIC] Discussion: Targeted memory allocation via debugfs
2026-02-27 2:42 [LSF/MM/BPF TOPIC] Discussion: Targeted memory allocation via debugfs Juan Yescas
@ 2026-03-16 15:52 ` David Hildenbrand (Arm)
2026-03-19 0:56 ` Juan Yescas
0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-16 15:52 UTC
To: Juan Yescas, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, android-mm, Linux Memory Management List,
Matthew Wilcox, Vlastimil Babka, Lorenzo Stoakes, lsf-pc
On 2/27/26 03:42, Juan Yescas wrote:
> Hi LSF MM organizers
Hi,
I'm late ...
>
> I would like to propose a discussion on improving our ability to
> reproduce complex memory allocation and reclaim scenarios, and solicit
> feedback on a debugfs-based testing interface to help trigger these
> edge cases.
>
> == The Problem ==
>
> We frequently encounter complex memory management issues in the wild, including:
>
> - CMA allocation failures due to pinned MIGRATE_MOVABLE pages.
> - Page migration and compaction failing during reclaim.
> - Excessive reclaim loops triggered by specific workloads.
> - OOM kills.
>
> Reproducing these specific memory states for debugging is currently
> cumbersome. For instance, consuming most of the available
> MIGRATE_MOVABLE memory, or forcing MIGRATE_UNMOVABLE allocations
> specifically from Node 1 and Zone DMA directly from userspace,
> requires writing custom kernel modules or relying on unreliable
> userspace memory pressure tactics.
I'm wondering whether an OOT module for this purpose would be sufficient?
IOW, do we really have to have this in the upstream kernel, or could we
have a public OOT module to perform these allocations?
Then, there are no worries about API/Extensibility etc.
Or would you want to fire up this debugging on a production kernel? I
would assume not.
--
Cheers,
David
* Re: [LSF/MM/BPF TOPIC] Discussion: Targeted memory allocation via debugfs
2026-03-16 15:52 ` David Hildenbrand (Arm)
@ 2026-03-19 0:56 ` Juan Yescas
2026-03-23 9:14 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 4+ messages in thread
From: Juan Yescas @ 2026-03-19 0:56 UTC
To: David Hildenbrand (Arm)
Cc: Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
android-mm, Linux Memory Management List, Matthew Wilcox,
Vlastimil Babka, Lorenzo Stoakes, lsf-pc
Thanks, David, for your comments.
On Mon, Mar 16, 2026 at 8:52 AM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/27/26 03:42, Juan Yescas wrote:
> > Hi LSF MM organizers
>
> Hi,
>
> I'm late ...
>
> >
> > I would like to propose a discussion on improving our ability to
> > reproduce complex memory allocation and reclaim scenarios, and solicit
> > feedback on a debugfs-based testing interface to help trigger these
> > edge cases.
> >
> > == The Problem ==
> >
> > We frequently encounter complex memory management issues in the wild, including:
> >
> > - CMA allocation failures due to pinned MIGRATE_MOVABLE pages.
> > - Page migration and compaction failing during reclaim.
> > - Excessive reclaim loops triggered by specific workloads.
> > - OOM kills.
> >
> > Reproducing these specific memory states for debugging is currently
> > cumbersome. For instance, consuming most of the available
> > MIGRATE_MOVABLE memory, or forcing MIGRATE_UNMOVABLE allocations
> > specifically from Node 1 and Zone DMA directly from userspace,
> > requires writing custom kernel modules or relying on unreliable
> > userspace memory pressure tactics.
>
> I'm wondering whether an OOT module for this purpose would be sufficient?
>
> IOW, do we really have to have this in the upstream kernel, or could we
> have a public OOT module to perform these allocations?
>
> Then, there are no worries about API/Extensibility etc.
>
You’re right that going OOT would bypass the strict API stability and
extensibility requirements that come with being in-tree.
However, the module would depend on some symbols that are not currently
exported, so those would need to be exported for it to build.
> Or would you want to fire up this debugging on a production kernel? I
> would assume not.
>
Yes, that is actually one of our goals. We often encounter
"heisenbugs" that only manifest under specific workloads, and we want
the ability to stress the memory subsystem.
For example, to increase the unmovable allocations by 16 MiB on a
kernel with 4 KiB pages (each order-10 block is 4 MiB), we can do
$ for i in {1..4}; do
    echo 1 > /sys/kernel/debug/mm/node-1/zone-Normal/order-10/migrate-Unmovable/alloc
  done
And this is way more convenient than writing a test driver to make
only unmovable allocations.
The same applies to the other migrate types.
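The number of writes needed for a given target follows from the block size, 2^order pages times the page size. A small sketch of that arithmetic, with a made-up helper name (blocks_needed) and the page size passed in explicitly rather than detected:

```shell
# blocks_needed TARGET_MIB ORDER PAGE_KIB
# Number of order-ORDER blocks required to cover TARGET_MIB, rounded up.
blocks_needed() {
    target_kib=$(( $1 * 1024 ))
    block_kib=$(( (1 << $2) * $3 ))
    echo $(( (target_kib + block_kib - 1) / block_kib ))
}

# 16 MiB out of order-10 blocks with 4 KiB pages (4 MiB per block)
blocks_needed 16 10 4    # prints 4
```

On a kernel with 16 KiB pages the same order-10 block is 16 MiB, so `blocks_needed 16 10 16` prints 1 and the loop count would have to change accordingly.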
Having this driver upstream allows us to trigger these allocations
on-demand without the friction (or security risk) of
loading unsigned OOT modules into a locked-down device.
Thanks
Juan
> --
> Cheers,
>
> David
* Re: [LSF/MM/BPF TOPIC] Discussion: Targeted memory allocation via debugfs
2026-03-19 0:56 ` Juan Yescas
@ 2026-03-23 9:14 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-03-23 9:14 UTC
To: Juan Yescas
Cc: Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
android-mm, Linux Memory Management List, Matthew Wilcox,
Vlastimil Babka, Lorenzo Stoakes, lsf-pc
On 3/19/26 01:56, Juan Yescas wrote:
> Thanks, David, for your comments.
>
>
> On Mon, Mar 16, 2026 at 8:52 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/27/26 03:42, Juan Yescas wrote:
>>> Hi LSF MM organizers
>>
>> Hi,
>>
>> I'm late ...
>>
>>>
>>> I would like to propose a discussion on improving our ability to
>>> reproduce complex memory allocation and reclaim scenarios, and solicit
>>> feedback on a debugfs-based testing interface to help trigger these
>>> edge cases.
>>>
>>> == The Problem ==
>>>
>>> We frequently encounter complex memory management issues in the wild, including:
>>>
>>> - CMA allocation failures due to pinned MIGRATE_MOVABLE pages.
>>> - Page migration and compaction failing during reclaim.
>>> - Excessive reclaim loops triggered by specific workloads.
>>> - OOM kills.
>>>
>>> Reproducing these specific memory states for debugging is currently
>>> cumbersome. For instance, consuming most of the available
>>> MIGRATE_MOVABLE memory, or forcing MIGRATE_UNMOVABLE allocations
>>> specifically from Node 1 and Zone DMA directly from userspace,
>>> requires writing custom kernel modules or relying on unreliable
>>> userspace memory pressure tactics.
>>
>> I'm wondering whether an OOT module for this purpose would be sufficient?
>>
>> IOW, do we really have to have this in the upstream kernel, or could we
>> have a public OOT module to perform these allocations?
>>
>> Then, there are no worries about API/Extensibility etc.
>>
> You’re right that going OOT would bypass the strict API stability and
> extensibility requirements that come with being in-tree.
>
> However, there are some symbols that we would need to be exported in
> order for the module to compile.
The reason I am asking is that we had similar discussions around memory
hot(un)plug in the past, where we decided that an OOT kernel module to
simulate add/remove was a better choice than exposing weird APIs to user
space.
Which symbols would you need? I guess we'd want to call into the buddy
allocator by specifying node+zone+order.
Is specifying the migratetype really relevant?
>
>> Or would you want to fire up this debugging on a production kernel? I
>> would assume not.
>>
>
> Yes, that is actually one of our goals. We often encounter
> "heisenbugs" that only manifest
> under specific workloads and we want the ability to stress the memory subsystem.
>
> For example, if we want to increase the unmovable allocations by 16 MiB
> on a kernel with 4 KiB pages, we can do
>
> $ for i in {1..4} \
> do \
> echo 1 > /sys/kernel/debug/mm/node-1/zone-Normal/order-10/migrate-Unmovable/alloc
How will we handle unmovable allocations that would end up on movable
memory (e.g., when allocating from ZONE_MOVABLE)?
Also, is there any reason why we can't do it similarly to hugetlb and use
a simple "nr_pages" variable that can be set and read?
Why did you decide to use the "handle" approach?
> \
> done
>
> And this is way more convenient than writing a test driver to make
> only unmovable allocations.
> The same applies to the other migrate types.
Right, but the interface you provide looks like it would allow
allocating from movable areas etc., and I am not sure whether that is
generally helpful or whether it just adds more complexity to handle.
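For comparison, the hugetlb-style counter mentioned above works as follows today: each node exposes an nr_hugepages attribute per huge page size under /sys/devices/system/node, which can be written to resize the pool and read back to see what was actually obtained. The sketch below only illustrates that existing pattern; the helper names and the SYSFS_NODE_ROOT override are assumptions for illustration, and an "nr_pages" file for targeted allocations does not exist.

```shell
# SYSFS_NODE_ROOT is an assumed override so the helpers can be tried
# against a scratch directory instead of the live sysfs tree.
SYSFS_NODE_ROOT="${SYSFS_NODE_ROOT:-/sys/devices/system/node}"

# hugetlb_pool_file NODE SIZE_KB -> per-node nr_hugepages attribute
hugetlb_pool_file() {
    printf '%s/node%s/hugepages/hugepages-%skB/nr_hugepages\n' \
        "$SYSFS_NODE_ROOT" "$1" "$2"
}

# hugetlb_pool_get NODE SIZE_KB: read the current pool size
hugetlb_pool_get() {
    cat "$(hugetlb_pool_file "$@")"
}

# hugetlb_pool_set NODE SIZE_KB COUNT: request a new pool size (needs root)
hugetlb_pool_set() {
    echo "$3" > "$(hugetlb_pool_file "$1" "$2")"
}
```

Reading the value back after a write matters because the kernel may satisfy only part of the request; a counter-based allocation interface would presumably report partial success the same way, without per-allocation handles.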
--
Cheers,
David