public inbox for linux-mm@kvack.org
 help / color / mirror / Atom feed
* [PATCH v1 0/1] Fix race condition in the memory management system
@ 2026-03-12 13:14 Hubert Mazur
  2026-03-12 13:14 ` [PATCH v1 1/1] mm: fix race condition in the memory management Hubert Mazur
  0 siblings, 1 reply; 4+ messages in thread
From: Hubert Mazur @ 2026-03-12 13:14 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport
  Cc: Greg Kroah-Hartman, Stanislaw Kardach, Michal Krawczyk,
	Slawomir Rosek, Ryan Neph, linux-mm, linux-kernel, Hubert Mazur

When 'ARCH_HAS_EXECMEM_ROX' is enabled, the memory management system
uses a cache to optimize allocations. The logic tries to find an
appropriate memory block based on the requested size. This can fail if
no current allocation is sufficient, in which case the kernel allocates
a new block large enough for the request. After the allocation is done,
the new block is added to the free_areas tree, and the tree is then
traversed again in the hope of finding a matching piece of memory. The
operations of allocating new memory and traversing the tree are not
protected by the mutex, so there is a chance that some other process
will "steal" this shiny new block. It's a classic race over a shared
resource. Fix it by adding the new block to the busy fragments instead
of the free ones and returning the pointer to the memory. This
simplifies the allocation logic, since we no longer extend the free
areas only to claim the memory back a moment later. When a new
allocation is required, perform it and return the result to the caller.

Hubert Mazur (1):
  mm: fix race condition in the memory management

 mm/execmem.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

-- 
2.53.0.851.ga537e3e6e9-goog



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v1 1/1] mm: fix race condition in the memory management
  2026-03-12 13:14 [PATCH v1 0/1] Fix race condition in the memory management system Hubert Mazur
@ 2026-03-12 13:14 ` Hubert Mazur
  2026-03-12 13:42   ` Mike Rapoport
  0 siblings, 1 reply; 4+ messages in thread
From: Hubert Mazur @ 2026-03-12 13:14 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport
  Cc: Greg Kroah-Hartman, Stanislaw Kardach, Michal Krawczyk,
	Slawomir Rosek, Ryan Neph, linux-mm, linux-kernel, Hubert Mazur

When 'ARCH_HAS_EXECMEM_ROX' is enabled, the memory management system
uses a cache to optimize allocations. The logic tries to find an
appropriate memory block based on the requested size. This can fail if
no current allocation is sufficient, in which case the kernel allocates
a new block large enough for the request. After the allocation is done,
the new block is added to the free_areas tree, and the tree is then
traversed again in the hope of finding a matching piece of memory. The
operations of allocating new memory and traversing the tree are not
protected by the mutex, so there is a chance that some other process
will "steal" this shiny new block. It's a classic race over a shared
resource. Fix it by adding the new block to the busy fragments instead
of the free ones and returning the pointer to the memory. This
simplifies the allocation logic, since we no longer extend the free
areas only to claim the memory back a moment later. When a new
allocation is required, perform it and return the result to the caller.

Signed-off-by: Hubert Mazur <hmazur@google.com>
---
 mm/execmem.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/mm/execmem.c b/mm/execmem.c
index 810a4ba9c924..8aa44d19ec73 100644
--- a/mm/execmem.c
+++ b/mm/execmem.c
@@ -203,13 +203,6 @@ static int execmem_cache_add_locked(void *ptr, size_t size, gfp_t gfp_mask)
 	return mas_store_gfp(&mas, (void *)lower, gfp_mask);
 }
 
-static int execmem_cache_add(void *ptr, size_t size, gfp_t gfp_mask)
-{
-	guard(mutex)(&execmem_cache.mutex);
-
-	return execmem_cache_add_locked(ptr, size, gfp_mask);
-}
-
 static bool within_range(struct execmem_range *range, struct ma_state *mas,
 			 size_t size)
 {
@@ -225,7 +218,7 @@ static bool within_range(struct execmem_range *range, struct ma_state *mas,
 	return false;
 }
 
-static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
+static void *__execmem_cache_lookup(struct execmem_range *range, size_t size)
 {
 	struct maple_tree *free_areas = &execmem_cache.free_areas;
 	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
@@ -278,10 +271,12 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
 	return ptr;
 }
 
-static int execmem_cache_populate(struct execmem_range *range, size_t size)
+static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
 {
+	struct maple_tree *busy_areas = &execmem_cache.busy_areas;
 	unsigned long vm_flags = VM_ALLOW_HUGE_VMAP;
 	struct vm_struct *vm;
+	unsigned long addr;
 	size_t alloc_size;
 	int err = -ENOMEM;
 	void *p;
@@ -294,7 +289,7 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	}
 
 	if (!p)
-		return err;
+		return NULL;
 
 	vm = find_vm_area(p);
 	if (!vm)
@@ -307,32 +302,35 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
 	if (err)
 		goto err_free_mem;
 
-	err = execmem_cache_add(p, alloc_size, GFP_KERNEL);
+	/* Set new allocation as an already busy fragment */
+	addr = (unsigned long)p;
+	MA_STATE(mas, busy_areas, addr - 1, addr + 1);
+	mas_set_range(&mas, addr, addr+size - 1);
+
+	mutex_lock(&execmem_cache.mutex);
+	err = mas_store_gfp(&mas, (void *)addr, GFP_KERNEL);
+	mutex_unlock(&execmem_cache.mutex);
+
 	if (err)
 		goto err_reset_direct_map;
 
-	return 0;
+	return p;
 
 err_reset_direct_map:
 	execmem_set_direct_map_valid(vm, true);
 err_free_mem:
 	vfree(p);
-	return err;
+	return NULL;
 }
 
 static void *execmem_cache_alloc(struct execmem_range *range, size_t size)
 {
 	void *p;
-	int err;
 
-	p = __execmem_cache_alloc(range, size);
+	p = __execmem_cache_lookup(range, size);
 	if (p)
 		return p;
 
-	err = execmem_cache_populate(range, size);
-	if (err)
-		return NULL;
-
 	return __execmem_cache_alloc(range, size);
 }
 
-- 
2.53.0.851.ga537e3e6e9-goog



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v1 1/1] mm: fix race condition in the memory management
  2026-03-12 13:14 ` [PATCH v1 1/1] mm: fix race condition in the memory management Hubert Mazur
@ 2026-03-12 13:42   ` Mike Rapoport
  2026-03-12 15:42     ` Hubert Mazur
  0 siblings, 1 reply; 4+ messages in thread
From: Mike Rapoport @ 2026-03-12 13:42 UTC (permalink / raw)
  To: Hubert Mazur
  Cc: Andrew Morton, Greg Kroah-Hartman, Stanislaw Kardach,
	Michal Krawczyk, Slawomir Rosek, Ryan Neph, linux-mm,
	linux-kernel

On Thu, Mar 12, 2026 at 01:14:38PM +0000, Hubert Mazur wrote:
> Subject: mm: fix race condition in the memory management

The prefix should be mm/execmem:
And this is not exactly a fix.

> When 'ARCH_HAS_EXECMEM_ROX' is enabled, the memory management system
> uses a cache to optimize allocations. The logic tries to find an
> appropriate memory block based on the requested size. This can fail if
> no current allocation is sufficient, in which case the kernel allocates
> a new block large enough for the request. After the allocation is done,
> the new block is added to the free_areas tree, and the tree is then
> traversed again in the hope of finding a matching piece of memory. The
> operations of allocating new memory and traversing the tree are not
> protected by the mutex, so there is a chance that some other process
> will "steal" this shiny new block. It's a

Does it actually happen in some environment, or is it a theoretical issue?

> classic race over a shared resource. Fix it by adding the new block
> to the busy fragments instead of the free ones and returning the
> pointer to the memory. This simplifies the allocation logic, since we
> no longer extend the free areas only to claim the memory back a moment
> later. When a new allocation is required, perform it and return the
> result to the caller.

It's hard to parse a single huge paragraph.
 
> Signed-off-by: Hubert Mazur <hmazur@google.com>
> ---
>  mm/execmem.c | 36 +++++++++++++++++-------------------
>  1 file changed, 17 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/execmem.c b/mm/execmem.c
> index 810a4ba9c924..8aa44d19ec73 100644
> --- a/mm/execmem.c
> +++ b/mm/execmem.c
> @@ -307,32 +302,35 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
>  	if (err)
>  		goto err_free_mem;
>  
> -	err = execmem_cache_add(p, alloc_size, GFP_KERNEL);
> +	/* Set new allocation as an already busy fragment */
> +	addr = (unsigned long)p;
> +	MA_STATE(mas, busy_areas, addr - 1, addr + 1);
> +	mas_set_range(&mas, addr, addr+size - 1);
> +
> +	mutex_lock(&execmem_cache.mutex);
> +	err = mas_store_gfp(&mas, (void *)addr, GFP_KERNEL);
> +	mutex_unlock(&execmem_cache.mutex);
> +
>  	if (err)
>  		goto err_reset_direct_map;
>  
> -	return 0;
> +	return p;

This is wrong. The caller asked for 'size' and got ALIGN(size, PMD_SIZE)
instead.

>  
>  err_reset_direct_map:
>  	execmem_set_direct_map_valid(vm, true);
>  err_free_mem:
>  	vfree(p);
> -	return err;
> +	return NULL;
>  }

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v1 1/1] mm: fix race condition in the memory management
  2026-03-12 13:42   ` Mike Rapoport
@ 2026-03-12 15:42     ` Hubert Mazur
  0 siblings, 0 replies; 4+ messages in thread
From: Hubert Mazur @ 2026-03-12 15:42 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, Greg Kroah-Hartman, Stanislaw Kardach,
	Michal Krawczyk, Slawomir Rosek, Ryan Neph, linux-mm,
	linux-kernel

> The prefix should be mm/execmem:
ACK, I'll change it in the next patch set.

> Does it actually happen in some environment or it's a theoretical issue?
Yes - this is reproducible on Android devices running the 6.18 kernel
during the early boot phase, when modules are loaded. The "Out of
memory" error usually hits modules that request a lot of contiguous
memory. It can be hard to reproduce (roughly 5 times out of 20
attempts), but it does occur.

> It's hard to parse a single huge paragraph.
ACK - I'll reformat the commit message a bit and add information about
the reproduction environment.

> This is wrong. The caller asked for 'size' and got ALIGN(size, PMD_SIZE)
> instead.
Yeah, you're right: this is inefficient, since the remaining memory
from the allocated block (i.e. alloc_size - size) won't be used. I'll
implement the logic to return this chunk to the free_areas.

Regards


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-12 15:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-12 13:14 [PATCH v1 0/1] Fix race condition in the memory management system Hubert Mazur
2026-03-12 13:14 ` [PATCH v1 1/1] mm: fix race condition in the memory management Hubert Mazur
2026-03-12 13:42   ` Mike Rapoport
2026-03-12 15:42     ` Hubert Mazur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox