linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/rmap: Add anon_vma lifetime debug check
@ 2025-07-24 19:13 Jann Horn
  2025-07-24 21:52 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Jann Horn @ 2025-07-24 19:13 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Rik van Riel,
	Liam R. Howlett, Vlastimil Babka, Harry Yoo
  Cc: linux-mm, linux-kernel, Jann Horn

If an anon page is mapped into userspace, its anon_vma must be alive,
otherwise rmap walks can hit UAF.

There have been syzkaller reports a few months ago[1][2] of UAF in rmap
walks that seems to indicate that there can be pages with elevated mapcount
whose anon_vma has already been freed, but I think we never figured out
what the cause is; and syzkaller only hit these UAFs when memory pressure
randomly caused reclaim to rmap-walk the affected pages, so it of course
didn't manage to create a reproducer.

Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
hopefully catch such issues more reliably.

Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
unlike the checks above, this one would otherwise be hard to write such
that it completely compiles away in non-debug builds by itself, without
looking extremely ugly.

[1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
[2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com

Signed-off-by: Jann Horn <jannh@google.com>
---
 include/linux/rmap.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c4f4903b1088..ba694c436f59 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
 	default:
 		VM_WARN_ON_ONCE(true);
 	}
+
+	/*
+	 * Anon folios must have an associated live anon_vma as long as they're
+	 * mapped into userspace.
+	 * Part of the purpose of the atomic_read() is to make KASAN check that
+	 * the anon_vma is still alive.
+	 */
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
+		unsigned long mapping = (unsigned long)folio->mapping;
+		struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
+
+		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
+	}
 }
 
 /*

---
base-commit: 01a412d06bc5786eb4e44a6c8f0f4659bd4c9864
change-id: 20250724-anonvma-uaf-debug-a9db0eb4177b

-- 
Jann Horn <jannh@google.com>



^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-24 19:13 [PATCH] mm/rmap: Add anon_vma lifetime debug check Jann Horn
@ 2025-07-24 21:52 ` Andrew Morton
  2025-07-25 10:59   ` Jann Horn
  2025-07-24 21:56 ` David Hildenbrand
  2025-07-25 11:32 ` Lorenzo Stoakes
  2 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2025-07-24 21:52 UTC (permalink / raw)
  To: Jann Horn
  Cc: David Hildenbrand, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Thu, 24 Jul 2025 21:13:50 +0200 Jann Horn <jannh@google.com> wrote:

> If an anon page is mapped into userspace, its anon_vma must be alive,
> otherwise rmap walks can hit UAF.
> 
> There have been syzkaller reports a few months ago[1][2] of UAF in rmap
> walks that seems to indicate that there can be pages with elevated mapcount
> whose anon_vma has already been freed, but I think we never figured out
> what the cause is; and syzkaller only hit these UAFs when memory pressure
> randomly caused reclaim to rmap-walk the affected pages, so it of course
> didn't manage to create a reproducer.
> 
> Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> hopefully catch such issues more reliably.
> 
> Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> unlike the checks above, this one would otherwise be hard to write such
> that it completely compiles away in non-debug builds by itself, without
> looking extremely ugly.
> 
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
>  	default:
>  		VM_WARN_ON_ONCE(true);
>  	}
> +
> +	/*
> +	 * Anon folios must have an associated live anon_vma as long as they're
> +	 * mapped into userspace.
> +	 * Part of the purpose of the atomic_read() is to make KASAN check that
> +	 * the anon_vma is still alive.
> +	 */
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> +		unsigned long mapping = (unsigned long)folio->mapping;
> +		struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> +
> +		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> +	}
>  }

PAGE_MAPPING_ANON is now FOLIO_MAPPING_ANON.

The subtraction to clear a bitflag works, but my brain would prefer &=
FOLIO_MAPPING_ANON.  Oh well.

Plus gratuitous 80-col fix:

--- a/include/linux/rmap.h~mm-rmap-add-anon_vma-lifetime-debug-check-fix
+++ a/include/linux/rmap.h
@@ -458,8 +458,9 @@ static inline void __folio_rmap_sanity_c
 	 */
 	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
 		unsigned long mapping = (unsigned long)folio->mapping;
-		struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
+		struct anon_vma *anon_vma;
 
+		anon_vma = (void *)(mapping - FOLIO_MAPPING_ANON);
 		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
 	}
 }
_



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-24 19:13 [PATCH] mm/rmap: Add anon_vma lifetime debug check Jann Horn
  2025-07-24 21:52 ` Andrew Morton
@ 2025-07-24 21:56 ` David Hildenbrand
  2025-07-25 11:08   ` Jann Horn
  2025-07-25 11:32 ` Lorenzo Stoakes
  2 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2025-07-24 21:56 UTC (permalink / raw)
  To: Jann Horn, Andrew Morton, Lorenzo Stoakes, Rik van Riel,
	Liam R. Howlett, Vlastimil Babka, Harry Yoo
  Cc: linux-mm, linux-kernel

On 24.07.25 21:13, Jann Horn wrote:
> If an anon page is mapped into userspace, its anon_vma must be alive,
> otherwise rmap walks can hit UAF.
> 
> There have been syzkaller reports a few months ago[1][2] of UAF in rmap
> walks that seems to indicate that there can be pages with elevated mapcount
> whose anon_vma has already been freed, but I think we never figured out
> what the cause is; and syzkaller only hit these UAFs when memory pressure
> randomly caused reclaim to rmap-walk the affected pages, so it of course
> didn't manage to create a reproducer.
> 
> Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> hopefully catch such issues more reliably.
> 
> Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> unlike the checks above, this one would otherwise be hard to write such
> that it completely compiles away in non-debug builds by itself, without
> looking extremely ugly.
> 
> [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
> [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
> 
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>   include/linux/rmap.h | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index c4f4903b1088..ba694c436f59 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
>   	default:
>   		VM_WARN_ON_ONCE(true);
>   	}
> +
> +	/*
> +	 * Anon folios must have an associated live anon_vma as long as they're
> +	 * mapped into userspace.
> +	 * Part of the purpose of the atomic_read() is to make KASAN check that
> +	 * the anon_vma is still alive.
> +	 */
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {

1) You probably don't need the CONFIG_DEBUG_VM check: the 
VM_WARN_ON_FOLIO should make everything get optimized out ... right?

2) We have a folio here, so ... better

if (folio_test_anon(folio) && !folio_test_ksm(folio)) {
	...
}

> +		unsigned long mapping = (unsigned long)folio->mapping;
> +		struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> +
> +		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> +	}

In general,

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-24 21:52 ` Andrew Morton
@ 2025-07-25 10:59   ` Jann Horn
  0 siblings, 0 replies; 20+ messages in thread
From: Jann Horn @ 2025-07-25 10:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Thu, Jul 24, 2025 at 11:52 PM Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 24 Jul 2025 21:13:50 +0200 Jann Horn <jannh@google.com> wrote:
>
> > If an anon page is mapped into userspace, its anon_vma must be alive,
> > otherwise rmap walks can hit UAF.
> >
> > There have been syzkaller reports a few months ago[1][2] of UAF in rmap
> > walks that seems to indicate that there can be pages with elevated mapcount
> > whose anon_vma has already been freed, but I think we never figured out
> > what the cause is; and syzkaller only hit these UAFs when memory pressure
> > randomly caused reclaim to rmap-walk the affected pages, so it of course
> > didn't manage to create a reproducer.
> >
> > Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> > hopefully catch such issues more reliably.
> >
> > Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> > unlike the checks above, this one would otherwise be hard to write such
> > that it completely compiles away in non-debug builds by itself, without
> > looking extremely ugly.
> >
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> >       default:
> >               VM_WARN_ON_ONCE(true);
> >       }
> > +
> > +     /*
> > +      * Anon folios must have an associated live anon_vma as long as they're
> > +      * mapped into userspace.
> > +      * Part of the purpose of the atomic_read() is to make KASAN check that
> > +      * the anon_vma is still alive.
> > +      */
> > +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> > +             unsigned long mapping = (unsigned long)folio->mapping;
> > +             struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > +
> > +             VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> > +     }
> >  }
>
> PAGE_MAPPING_ANON is now FOLIO_MAPPING_ANON.

Bleh, sorry about that, I keep forgetting to write MM patches against
the MM tree...

> The subtraction to clear a bitflag works, but my brain would prefer &=
> FOLIO_MAPPING_ANON.  Oh well.

(I'd prefer bitmasking too but the existing code does subtraction, so
I figured I should mirror that.)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-24 21:56 ` David Hildenbrand
@ 2025-07-25 11:08   ` Jann Horn
  2025-07-25 11:12     ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Jann Horn @ 2025-07-25 11:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Thu, Jul 24, 2025 at 11:56 PM David Hildenbrand <david@redhat.com> wrote:
> On 24.07.25 21:13, Jann Horn wrote:
> > If an anon page is mapped into userspace, its anon_vma must be alive,
> > otherwise rmap walks can hit UAF.
> >
> > There have been syzkaller reports a few months ago[1][2] of UAF in rmap
> > walks that seems to indicate that there can be pages with elevated mapcount
> > whose anon_vma has already been freed, but I think we never figured out
> > what the cause is; and syzkaller only hit these UAFs when memory pressure
> > randomly caused reclaim to rmap-walk the affected pages, so it of course
> > didn't manage to create a reproducer.
> >
> > Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> > hopefully catch such issues more reliably.
> >
> > Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> > unlike the checks above, this one would otherwise be hard to write such
> > that it completely compiles away in non-debug builds by itself, without
> > looking extremely ugly.
> >
> > [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
> > [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> >   include/linux/rmap.h | 13 +++++++++++++
> >   1 file changed, 13 insertions(+)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index c4f4903b1088..ba694c436f59 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> >       default:
> >               VM_WARN_ON_ONCE(true);
> >       }
> > +
> > +     /*
> > +      * Anon folios must have an associated live anon_vma as long as they're
> > +      * mapped into userspace.
> > +      * Part of the purpose of the atomic_read() is to make KASAN check that
> > +      * the anon_vma is still alive.
> > +      */
> > +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
>
> 1) You probably don't need the CONFIG_DEBUG_VM check: the
> VM_WARN_ON_FOLIO should make everything get optimized out ... right?

The PageAnonNotKsm() check is outside the VM_WARN_ON_FOLIO(), and it
uses page_folio(), which uses _compound_head(), which does
READ_ONCE(page->compound_head); and READ_ONCE() unfortunately doesn't
just mean "I want a read without tearing", it also (intentionally)
prevents the compiler from removing the read when it sees that it's
not being used for anything.

> 2) We have a folio here, so ... better
>
> if (folio_test_anon(folio) && !folio_test_ksm(folio)) {
>         ...
> }

Hrm, okay. It kind of irks me to write it as two checks when really I
want to ask "is it this one specific type", but yeah, will change it.

These helpers don't use READ_ONCE(), so the compiler should then also
be able to remove the check...

> > +             unsigned long mapping = (unsigned long)folio->mapping;
> > +             struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > +
> > +             VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> > +     }
>
> In general,
>
> Acked-by: David Hildenbrand <david@redhat.com>

Thanks!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 11:08   ` Jann Horn
@ 2025-07-25 11:12     ` David Hildenbrand
  2025-07-25 11:24       ` Jann Horn
  0 siblings, 1 reply; 20+ messages in thread
From: David Hildenbrand @ 2025-07-25 11:12 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On 25.07.25 13:08, Jann Horn wrote:
> On Thu, Jul 24, 2025 at 11:56 PM David Hildenbrand <david@redhat.com> wrote:
>> On 24.07.25 21:13, Jann Horn wrote:
>>> If an anon page is mapped into userspace, its anon_vma must be alive,
>>> otherwise rmap walks can hit UAF.
>>>
>>> There have been syzkaller reports a few months ago[1][2] of UAF in rmap
>>> walks that seems to indicate that there can be pages with elevated mapcount
>>> whose anon_vma has already been freed, but I think we never figured out
>>> what the cause is; and syzkaller only hit these UAFs when memory pressure
>>> randomly caused reclaim to rmap-walk the affected pages, so it of course
>>> didn't manage to create a reproducer.
>>>
>>> Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
>>> hopefully catch such issues more reliably.
>>>
>>> Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
>>> unlike the checks above, this one would otherwise be hard to write such
>>> that it completely compiles away in non-debug builds by itself, without
>>> looking extremely ugly.
>>>
>>> [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
>>> [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
>>>
>>> Signed-off-by: Jann Horn <jannh@google.com>
>>> ---
>>>    include/linux/rmap.h | 13 +++++++++++++
>>>    1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index c4f4903b1088..ba694c436f59 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
>>>        default:
>>>                VM_WARN_ON_ONCE(true);
>>>        }
>>> +
>>> +     /*
>>> +      * Anon folios must have an associated live anon_vma as long as they're
>>> +      * mapped into userspace.
>>> +      * Part of the purpose of the atomic_read() is to make KASAN check that
>>> +      * the anon_vma is still alive.
>>> +      */
>>> +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
>>
>> 1) You probably don't need the CONFIG_DEBUG_VM check: the
>> VM_WARN_ON_FOLIO should make everything get optimized out ... right?
> 
> The PageAnonNotKsm() check is outside the VM_WARN_ON_FOLIO(), and it
> uses page_folio(), which uses _compound_head(), which does
> READ_ONCE(page->compound_head); and READ_ONCE() unfortunately doesn't
> just mean "I want a read without tearing", it also (intentionally)
> prevents the compiler from removing the read when it sees that it's
> not being used for anything.

That makes sense, the volatile read will not get optimized out.

> 
>> 2) We have a folio here, so ... better
>>
>> if (folio_test_anon(folio) && !folio_test_ksm(folio)) {
>>          ...
>> }
> 
> Hrm, okay. It kind of irks me to write it as two checks when really I
> want to ask "is it this one specific type", but yeah, will change it.

Well, ksm is a subtype of anon.

folio_test_anon_not_ksm()

let's rather ... not :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 11:12     ` David Hildenbrand
@ 2025-07-25 11:24       ` Jann Horn
  2025-07-25 11:31         ` David Hildenbrand
  0 siblings, 1 reply; 20+ messages in thread
From: Jann Horn @ 2025-07-25 11:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 1:12 PM David Hildenbrand <david@redhat.com> wrote:
> On 25.07.25 13:08, Jann Horn wrote:
> > On Thu, Jul 24, 2025 at 11:56 PM David Hildenbrand <david@redhat.com> wrote:
> >> 2) We have a folio here, so ... better
> >>
> >> if (folio_test_anon(folio) && !folio_test_ksm(folio)) {
> >>          ...
> >> }
> >
> > Hrm, okay. It kind of irks me to write it as two checks when really I
> > want to ask "is it this one specific type", but yeah, will change it.
>
> Well, ksm is a subtype of anon.

I mean... not really? At least ksm folios are not a subtype of normal
anon folios. Normal anon folios point to an anon_vma, a ksm folio
points to a ksm_stable_node instead, and you can't treat a
ksm_stable_node as an anon_vma.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 11:24       ` Jann Horn
@ 2025-07-25 11:31         ` David Hildenbrand
  0 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2025-07-25 11:31 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, Lorenzo Stoakes, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On 25.07.25 13:24, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 1:12 PM David Hildenbrand <david@redhat.com> wrote:
>> On 25.07.25 13:08, Jann Horn wrote:
>>> On Thu, Jul 24, 2025 at 11:56 PM David Hildenbrand <david@redhat.com> wrote:
>>>> 2) We have a folio here, so ... better
>>>>
>>>> if (folio_test_anon(folio) && !folio_test_ksm(folio)) {
>>>>           ...
>>>> }
>>>
>>> Hrm, okay. It kind of irks me to write it as two checks when really I
>>> want to ask "is it this one specific type", but yeah, will change it.
>>
>> Well, ksm is a subtype of anon.
> 
> I mean... not really? At least ksm folios are not a subtype of normal
> anon folios. 

Well, what I mean is:

A KSM folio is considered an anon folio.

An anon folio is not necessarily a KSM folio.

Of course, there are implementation differences when it comes to 
folio->mapping etc, just the way the rmap works.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-24 19:13 [PATCH] mm/rmap: Add anon_vma lifetime debug check Jann Horn
  2025-07-24 21:52 ` Andrew Morton
  2025-07-24 21:56 ` David Hildenbrand
@ 2025-07-25 11:32 ` Lorenzo Stoakes
  2025-07-25 12:00   ` Jann Horn
  2025-07-25 15:05   ` Jann Horn
  2 siblings, 2 replies; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 11:32 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> If an anon page is mapped into userspace, its anon_vma must be alive,
> otherwise rmap walks can hit UAF.

This is extremely weird. I guess it's a UAF of vma->anon_vma

Because we:

put_anon_vma()
->__put_anon_vma()
->anon_vma_free() (also the root anon_vma... interestingly!)

But none of this obviously updates the vma to set vma->anon_vma to NULL.

So yeah god almighty. To get this, we must be dereffing vma->anon_vma
somewhere unexpected.

>
> There have been syzkaller reports a few months ago[1][2] of UAF in rmap

Will try to take a look when I get a chance.

> walks that seems to indicate that there can be pages with elevated mapcount
> whose anon_vma has already been freed, but I think we never figured out
> what the cause is; and syzkaller only hit these UAFs when memory pressure
> randomly caused reclaim to rmap-walk the affected pages, so it of course
> didn't manage to create a reproducer.

Fun.

Please hook me in (I mean you're going to anyway right :P) on this stuff,
as I'm looking to rework the anon_vma stuff so am naturally interested in
any and all rmap anon stuff.

For my sins ;)

Maybe I"ll dig into these syzkallers.

>
> Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> hopefully catch such issues more reliably.

Good idea.

>
> Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> unlike the checks above, this one would otherwise be hard to write such
> that it completely compiles away in non-debug builds by itself, without
> looking extremely ugly.

David already addressed.

>
> [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
> [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
>
> Signed-off-by: Jann Horn <jannh@google.com>

Nit below, and pending David's requests, but LGTM so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> ---
>  include/linux/rmap.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index c4f4903b1088..ba694c436f59 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
>  	default:
>  		VM_WARN_ON_ONCE(true);
>  	}
> +
> +	/*
> +	 * Anon folios must have an associated live anon_vma as long as they're
> +	 * mapped into userspace.
> +	 * Part of the purpose of the atomic_read() is to make KASAN check that
> +	 * the anon_vma is still alive.
> +	 */
> +	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> +		unsigned long mapping = (unsigned long)folio->mapping;
> +		struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> +
> +		VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);

Maybe stupid question, but wouldn't we pick this up with KASAN? Or would we
pick it up far too late I guess?

We're sort of relying on this

a. being a UAF

b. the thing we're UAF-ing not either corrupting this field or (if that
    memory is actually reused as an anon_vma - I'm not familiar with slab
    caches - so maybe it's quite likely - getting its refcount incremented.

Which is fine - because hey this is the only way we can get a hint that
this is happening, but I think we should describe what we're doing.

> +	}
>  }

David has made all the points about the code, yes I agree with him re:
folio helpers etc.

>
>  /*
>
> ---
> base-commit: 01a412d06bc5786eb4e44a6c8f0f4659bd4c9864
> change-id: 20250724-anonvma-uaf-debug-a9db0eb4177b
>
> --
> Jann Horn <jannh@google.com>
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 11:32 ` Lorenzo Stoakes
@ 2025-07-25 12:00   ` Jann Horn
  2025-07-25 13:48     ` Lorenzo Stoakes
  2025-07-25 15:05   ` Jann Horn
  1 sibling, 1 reply; 20+ messages in thread
From: Jann Horn @ 2025-07-25 12:00 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 1:32 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> > If an anon page is mapped into userspace, its anon_vma must be alive,
> > otherwise rmap walks can hit UAF.
>
> This is extremely weird. I guess it's a UAF of vma->anon_vma
>
> Because we:
>
> put_anon_vma()
> ->__put_anon_vma()
> ->anon_vma_free() (also the root anon_vma... interestingly!)

FWIW, the thing we're trying to access can't be the root anon_vma,
since KASAN says the UAF'd object was allocated from anon_vma_fork.

> But none of this obviously updates the vma to set vma->anon_vma to NULL.
>
> So yeah god almighty. To get this, we must be dereffing vma->anon_vma
> somewhere unexpected.

I don't see how vma->anon_vma is directly relevant. I think the
relevant things are:

An anon_vma is only kept alive as long as at least one associated VMA
is still alive.
An anon folio may outlive the VMAs it comes from, so it may also
outlive its associated anon_vma.
When an anon_vma goes away, it does not clear the ->mapping of
associated folios (this is an intentional design choice).
When an rmap traversal wants to go from a folio to the associated
anon_vma, it (under RCU) checks that the mapcount is non-zero, which
implies that there must still be associated VMAs, which implies that
the associated anon_vma must still be alive; and then it dereferences
the ->mapping.

I think the ASAN splat indicates that syzkaller must fairly often get
into situations where the ->mapping pointer is dangling despite the
mapcount being non-zero, but syzkaller likely only actually hits the
UAF when the kernel gets memory pressure by chance and does rmap walks
on the reclaim path.

So there are several invariants we're relying on here, and breaking
any one of these could allow an rmap traversal to cause UAF; a
non-exhaustive list:

1. An anon folio is only mapped into VMAs that are associated with the
folio's anon_vma.
2. Removing an anon folio mapping reduces the anon folio's mapcount
before the VMA can go away.
3. VMA splitting/merging/forking properly preserves the anon_vma association.
4. If the anon-exclusive bit is set, the folio is only mapped in a
single place (otherwise swapout+swapin could erroneously set
RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
wrong anon_vma).
5. When a VMA is associated with an anon_vma, it is always also
associated with the corresponding root anon_vma (necessary because
non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
root anon_vma).
6. If two VMAs in the same process have the same ->anon_vma, their
anonvma chains must be the same (otherwise VMA merging can break
stuff).

> > There have been syzkaller reports a few months ago[1][2] of UAF in rmap
>
> Will try to take a look when I get a chance.
>
> > walks that seems to indicate that there can be pages with elevated mapcount
> > whose anon_vma has already been freed, but I think we never figured out
> > what the cause is; and syzkaller only hit these UAFs when memory pressure
> > randomly caused reclaim to rmap-walk the affected pages, so it of course
> > didn't manage to create a reproducer.
>
> Fun.
>
> Please hook me in (I mean you're going to anyway right :P) on this stuff,
> as I'm looking to rework the anon_vma stuff so am naturally interested in
> any and all rmap anon stuff.
>
> For my sins ;)
>
> Maybe I"ll dig into these syzkallers.
>
> >
> > Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous pages to
> > hopefully catch such issues more reliably.
>
> Good idea.
>
> >
> > Implementation note: I'm checking IS_ENABLED(CONFIG_DEBUG_VM) because,
> > unlike the checks above, this one would otherwise be hard to write such
> > that it completely compiles away in non-debug builds by itself, without
> > looking extremely ugly.
>
> David already addressed.
>
> >
> > [1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
> > [2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Nit below, and pending David's requests, but LGTM so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks!

> > ---
> >  include/linux/rmap.h | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index c4f4903b1088..ba694c436f59 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> >       default:
> >               VM_WARN_ON_ONCE(true);
> >       }
> > +
> > +     /*
> > +      * Anon folios must have an associated live anon_vma as long as they're
> > +      * mapped into userspace.
> > +      * Part of the purpose of the atomic_read() is to make KASAN check that
> > +      * the anon_vma is still alive.
> > +      */
> > +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> > +             unsigned long mapping = (unsigned long)folio->mapping;
> > +             struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > +
> > +             VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
>
> Maybe stupid question, but wouldn't we pick this up with KASAN? Or would we
> pick it up far too late I guess?

I mean, it depends on the exact nature of the issue. If the issue is
that we somehow end up with anon folios mapped into VMAs that are not
associated with the anon folio's anon_vma, then I think the only time
we'd notice would be if we randomly end up doing rmap walks of the
folios, as syzkaller did above.

> We're sort of relying on this
>
> a. being a UAF
>
> b. the thing we're UAF-ing not either corrupting this field or (if that
>     memory is actually reused as an anon_vma - I'm not familiar with slab
>     caches - so maybe it's quite likely - getting its refcount incremented.

KASAN sees the memory read I'm doing with this atomic_read(), so in
KASAN builds, if this is a UAF, it should trigger a KASAN splat
(modulo KASAN limitations around when UAF can be detected). Basically,
in KASAN builds, the actual explicit check I'm doing here is only
relevant if the object has not yet been freed. That's why I wrote the
comment "Part of the purpose of the atomic_read() is to make KASAN
check that the anon_vma is still alive.".

> Which is fine - because hey this is the only way we can get a hint that
> this is happening, but I think we should describe what we're doing.

Sure, I can make the comment more verbose.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 12:00   ` Jann Horn
@ 2025-07-25 13:48     ` Lorenzo Stoakes
  2025-07-25 14:10       ` Lorenzo Stoakes
  2025-07-25 14:48       ` Jann Horn
  0 siblings, 2 replies; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 13:48 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 1:32 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> > > If an anon page is mapped into userspace, its anon_vma must be alive,
> > > otherwise rmap walks can hit UAF.
> >
> > This is extremely weird. I guess it's a UAF of vma->anon_vma
> >
> > Because we:
> >
> > put_anon_vma()
> > ->__put_anon_vma()
> > ->anon_vma_free() (also the root anon_vma... interestingly!)
>
> FWIW, the thing we're trying to access can't be the root anon_vma,
> since KASAN says the UAF'd object was allocated from anon_vma_fork.

OK that's interesting, this is data.

I wonder if related to anon_vma reuse somehow...

>
> > But none of this obviously updates the vma to set vma->anon_vma to NULL.
> >
> > So yeah god almighty. To get this, we must be dereffing vma->anon_vma
> > somewhere unexpected.
>
> I don't see how vma->anon_vma is directly relevant. I think the
> relevant things are:

Yeah sorry, I'm reflexively assuming it's a VMA that's the issue here. I'll
dig into the syzkallers when I can to get more insights if there are any to
be had (maybe this patch is the difference on this).

>
> An anon_vma is only kept alive as long as at least one associated VMA
> is still alive.

Well, including child VMA's yes.

> An anon folio may outlive the VMAs it comes from, so it may also
> outlive its associated anon_vma.

Yes, I will share some diagrams I did a while ago to outline this. They're
ASCII and make you want to cry! :)

Hmm, if non-root, I wonder if we

> When an anon_vma goes away, it does not clear the ->mapping of
> associated folios (this is an intentional design choice).

Well it can't without having to do an expensive page table walk, so
obviously by design yes.

> When an rmap traversal wants to go from a folio to the associated
> anon_vma, it (under RCU) checks that the mapcount is non-zero, which
> implies that there must still be associated VMAs, which implies that
> the associated anon_vma must still be alive; and then it dereferences
> the ->mapping.



>
> I think the ASAN splat indicates that syzkaller must fairly often get
> into situations where the ->mapping pointer is dangling despite the
> mapcount being non-zero, but syzkaller likely only actually hits the
> UAF when the kernel gets memory pressure by chance and does rmap walks
> on the reclaim path.

Yup.

>
> So there are several invariants we're relying on here, and breaking
> any one of these could allow an rmap traversal to cause UAF; a
> non-exhaustive list:

This is good, because we can start to really lay out these things. This is
also vital to my anon_vma work.

>
> 1. An anon folio is only mapped into VMAs that are associated with the
> folio's anon_vma.

Correct.

> 2. Removing an anon folio mapping reduces the anon folio's mapcount
> before the VMA can go away.

the anon folio's mapcount? You mean the VMA's? :P

> 3. VMA splitting/merging/forking properly preserves the anon_vma
  association.

I had slides for this for version 1 of my LSF presentation, not sure if I
kept...

But yes it does fundamentally, in a slightly typically overwrought way.

> 4. If the anon-exclusive bit is set, the folio is only mapped in a
> single place (otherwise swapout+swapin could erroneously set
> RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> wrong anon_vma).

I believe (David?) swapin can cause this not to be the case?

> 5. When a VMA is associated with an anon_vma, it is always also
> associated with the corresponding root anon_vma (necessary because
> non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> root anon_vma).

OK but we know for sure the UAF is not on a root anon_vma so it's not some
weirdness with trying to access anon_vma->root

> 6. If two VMAs in the same process have the same ->anon_vma, their
> anonvma chains must be the same (otherwise VMA merging can break
> stuff).
>

What do you mean the same?

If you mean they both must have AVC's which ponit to the individual VMAs
and each point to the same anon_vma, yes.

Then we see e.g. after split anon_vma->num_active_vmas being incremented.

So in relation to this, as to assignment of vma->anon_vma or not:

1. Where vma->anon_vma == anon_vma A.

2. Where we fork the process and the new vma->anon_vma allocates a new
   anon_vma to vma->anon_vma B, whose parent, root fields reference
   anon_vma A.

3. We can end up with anon_vma's where no vma->anon_vma points to them, and
   folio->mapping (masking lower bits) == anon_vma, but then the refcount
   should be 1. Also I can't see how you can do that without a
   folio->mapping pointing at it:

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|
  |  A  |<------| avc |
  |-----|       |-----|        |--------------------------|
     | anon_vma    ^           |                          |
     |             |--------------------|                 |
     v     rb_root |                    |                 |
  |---------------------|               |                 |
  | refcount = 3        |               |                 |
  | num_children = 2    |<--------------x----|            |
  | num_active_vmas = 1 |               |    |            |
  |---------------------|               |    |            |
                           |------------|    |            |
Process 2                  |                 |            |
                           |                 |            |
     |-------------|       |                 |            |
     | avc         v       v                 |            |
  |-----|   vma |-----| |-----|              |            |
  |  B  |<------| avc |.| avc |              |            |
  |-----|       |-----| |-----|              |            |
     | anon_vma    ^                         |            |
     |             |-------------------------x--------|   |
     v     rb_root |                         |        |   |
  |---------------------|                    |        |   |
  | refcount = 1        |<-------------------x----|   |   |
  | num_children = 1    |--------------------|    |   |   |
  | num_active_vmas = 1 | parent                  |   |   |
  |---------------------|                         |   |   |
                                                  |   |   |
Process 3                                         |   |   |
                           |----------------------x---|   |
     |-------------|       |       |--------------x-------|
     | avc         v       v       v              |
  |-----|   vma |-----| |-----| |-----|           |
  |  C  |<------| avc |.| avc |.| avc |           |
  |-----|       |-----| |-----| |-----|           |
     | anon_vma    ^                              |
     |             |                              |
     v     rb_root |                              |
  |---------------------|                         |
  | refcount = 1        |                         |
  | num_children = 0    |-------------------------|
  | num_active_vmas = 1 | parent
  |---------------------|

UNMAP B

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|
  |  A  |<------| avc |
  |-----|       |-----|        |--------------------------|
     | anon_vma    ^           |                          |
     |             |------------                          |
     v     rb_root |                                      |
  |---------------------|                                 |
  | refcount = 3        |                                 |
  | num_children = 2    |<-------------------|            |
  | num_active_vmas = 1 |                    |            |
  |---------------------|                    |            |
                                             |            |
Process 2                                    |            |
                                             |            |
                   |-------------------------x--------|   |
           rb_root |                         |        |   |
  |---------------------|                    |        |   |
  | refcount = 1        |<-------------------x----|   |   | We keep empty
  | num_children = 1    |--------------------|    |   |   | anon_vma round.
  | num_active_vmas = 0 | parent                  |   |   |
  |---------------------|                         |   |   |
                                                  |   |   |
Process 3                                         |   |   |
                           |----------------------x---|   |
     |-------------|       |       |--------------x-------|
     | avc         v       v       v              |
  |-----|   vma |-----| |-----| |-----|           |
  |  C  |<------| avc |.| avc |.| avc |           |
  |-----|       |-----| |-----| |-----|           |
     | anon_vma    ^                              |
     |             |                              |
     v     rb_root |                              |
  |---------------------|                         |
  | refcount = 1        |                         |
  | num_children = 0    |-------------------------|
  | num_active_vmas = 1 | parent
  |---------------------|

4. But they can get reused through the anon_vma reuse logic:

FORK

Process 1

     |-------------|
     | avc         v
  |-----|   vma |-----|             |-----------------------------|
  |  A  |<------| avc |             |                             |
  |-----|       |-----|        |--------------------------|       |
     | anon_vma    ^           |                          |       |
     |             |-----------|                          |       |
     v     rb_root |                                      |       |
  |---------------------|                                 |       |
  | refcount = 3        | (self-parent)                   |       |
  | num_children = 2    |<-------------------|            |       |
  | num_active_vmas = 1 |                    |            |       |
  |---------------------|                    |            |       |
                                             |            |       |
Process 2               |--------------------x------------x-------x-------|
                        |                    |            |       |       |
                   |-------------------------x--------|   |       |       |
           rb_root |                         |        |   |       |       |
  |---------------------|<-------------------x--------x---x-------x---|   |
  | refcount = 1        |<-------------------x----|   |   |       |   |   |
  | num_children = 1    |--------------------|    |   |   |       |   |   |
  | num_active_vmas = 1 | parent                  |   |   |       |   |   |
  |---------------------|                         |   |   |       |   |   |
                                                  |   |   |       |   |   |
Process 3                                         |   |   |       |   |   |
                           |----------------------x---|   |       |   |   |
     |-------------|       |       |--------------x-------|       |   |   |
     | avc         v       v       v              |               |   |   |
  |-----|   vma |-----| |-----| |-----|           |               |   |   |
  |  C  |<------| avc |.| avc |.| avc |           |               |   |   |
  |-----|       |-----| |-----| |-----|           |               |   |   |
     | anon_vma    ^                              |               |   |   |
     |             |------------------------------x-----------|   |   |   |
     v     rb_root |                              |           |   |   |   |
  |---------------------|                         |           |   |   |   |
  | refcount = 1        |                         |           |   |   |   |
  | num_children = 0    |-------------------------|           |   |   |   |
  | num_active_vmas = 1 | parent                              |   |   |   |
  |---------------------|                                     |   |   |   |
                                                              |   |   |   |
Process 4                                                     |   |   |   |
                           |----------------------------------|   |   |   |
     |-------------|       |       |------------------------------|   |   |
     | avc         v       v       v                                  |   |
  |-----|   vma |-----| |-----| |-----|                               |   |
  |  D  |<------| avc |.| avc |.| avc |                               |   |
  |-----|       |-----| |-----| |-----|                               |   |
     | anon_vma    ^                                                  |   |
     |             |--------------------------------------------------x---|
     |                                                                |
     |----------------------------------------------------------------|

God simple isn't it? ;)

I verified these numbers with drgn, interesting add a new child doesn't
increment refcount...

> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks!

No problem, thanks for doing this!

>
> > > ---
> > >  include/linux/rmap.h | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index c4f4903b1088..ba694c436f59 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -449,6 +449,19 @@ static inline void __folio_rmap_sanity_checks(const struct folio *folio,
> > >       default:
> > >               VM_WARN_ON_ONCE(true);
> > >       }
> > > +
> > > +     /*
> > > +      * Anon folios must have an associated live anon_vma as long as they're
> > > +      * mapped into userspace.
> > > +      * Part of the purpose of the atomic_read() is to make KASAN check that
> > > +      * the anon_vma is still alive.
> > > +      */
> > > +     if (IS_ENABLED(CONFIG_DEBUG_VM) && PageAnonNotKsm(page)) {
> > > +             unsigned long mapping = (unsigned long)folio->mapping;
> > > +             struct anon_vma *anon_vma = (void *)(mapping - PAGE_MAPPING_ANON);
> > > +
> > > +             VM_WARN_ON_FOLIO(atomic_read(&anon_vma->refcount) == 0, folio);
> >
> > Maybe stupid question, but wouldn't we pick this up with KASAN? Or would we
> > pick it up far too late I guess?
>
> I mean, it depends on the exact nature of the issue. If the issue is
> that we somehow end up with anon folios mapped into VMAs that are not
> associated with the anon folio's anon_vma, then I think the only time
> we'd notice would be if we randomly end up doing rmap walks of the
> folios, as syzkaller did above.

Yup. This will trigger on __folio_remove_rmap() and __folio_add_rmap():

do_swap_page() / remove_migration_pte() / remove_migration_pmd()
-> folio_add_anon_rmap_ptes() / folio_add_anon_rmap_pmd()
-> __folio_add_anon_rmap()
-> __folio_add_rmap()


vmf_insert_folio_pud() / set_pte_range() / insert_page_into_pte_locked() / mfill_atomic_install_pte() / vmf_insert_folio_pmd() / remove_migration_pmd() / remove_migration_pte()
-> folio_add_file_rmap_ptes() / folio_add_file_rmap_pmd() / folio_add_file_rmap_pud()
-> __folio_add_file_rmap()
-> __folio_add_rmap()

I guess we aren't interested in folio_add_new_anon_rmap() as we won't have
an existing anon_vma there

>
> > We're sort of relying on this
> >
> > a. being a UAF
> >
> > b. the thing we're UAF-ing not either corrupting this field or (if that
> >     memory is actually reused as an anon_vma - I'm not familiar with slab
> >     caches - so maybe it's quite likely - getting its refcount incremented.
>
> KASAN sees the memory read I'm doing with this atomic_read(), so in
> KASAN builds, if this is a UAF, it should trigger a KASAN splat
> (modulo KASAN limitations around when UAF can be detected). Basically,
> in KASAN builds, the actual explicit check I'm doing here is only
> relevant if the object has not yet been freed. That's why I wrote the
> comment "Part of the purpose of the atomic_read() is to make KASAN
> check that the anon_vma is still alive.".

Hm, I'm confused, how can you detect a UAF if the object cannot yet be
freed? :P

or would that be the case were it not an atomic_read()?

I guess this permits this to be detected in a timely manner.

>
> > Which is fine - because hey this is the only way we can get a hint that
> > this is happening, but I think we should describe what we're doing.
>
> Sure, I can make the comment more verbose.

Thanks!


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 13:48     ` Lorenzo Stoakes
@ 2025-07-25 14:10       ` Lorenzo Stoakes
  2025-07-25 14:50         ` Jann Horn
  2025-07-25 14:48       ` Jann Horn
  1 sibling, 1 reply; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 14:10 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

I went checking for this but was blind obviously since didn't find, Vlasta
asked off-list about why anon_vma's are allocated SLAB_TYPESAFE_BY_RCU.

I notice as well folio_get_anon_vma() is _tricky_:

/*
 * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
 *
 * Since there is no serialization what so ever against folio_remove_rmap_*()
 * the best this function can do is return a refcount increased anon_vma
 * that might have been relevant to this page.
 *
 * The page might have been remapped to a different anon_vma or the anon_vma
 * returned may already be freed (and even reused).
 *
 * In case it was remapped to a different anon_vma, the new anon_vma will be a
 * child of the old anon_vma, and the anon_vma lifetime rules will therefore
 * ensure that any anon_vma obtained from the page will still be valid for as
 * long as we observe page_mapped() [ hence all those page_mapped() tests ].
 *
 * All users of this function must be very careful when walking the anon_vma
 * chain and verify that the page in question is indeed mapped in it
 * [ something equivalent to page_mapped_in_vma() ].
 *
 * Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
 * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
 * if there is a mapcount, we can dereference the anon_vma after observing
 * those.

^--- this seems particularly pertinent...

 *
 * NOTE: the caller should normally hold folio lock when calling this.  If
 * not, the caller needs to double check the anon_vma didn't change after
 * taking the anon_vma lock for either read or write (UFFDIO_MOVE can modify it
 * concurrently without folio lock protection). See folio_lock_anon_vma_read()
 * which has already covered that, and comment above remap_pages().
 */

Suspicious...


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 13:48     ` Lorenzo Stoakes
  2025-07-25 14:10       ` Lorenzo Stoakes
@ 2025-07-25 14:48       ` Jann Horn
  2025-07-25 15:07         ` Lorenzo Stoakes
  1 sibling, 1 reply; 20+ messages in thread
From: Jann Horn @ 2025-07-25 14:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > An anon folio may outlive the VMAs it comes from, so it may also
> > outlive its associated anon_vma.
>
> Yes, I will share some diagrams I did a while ago to outline this. They're
> ASCII and make you want to cry! :)
>
> Hmm, if non-root, I wonder if we

(looks like you stopped typing mid-sentence)

> > 2. Removing an anon folio mapping reduces the anon folio's mapcount
> > before the VMA can go away.
>
> the anon folio's mapcount? You mean the VMA's? :P

I mean folio_mapcount(folio), which reads folio->_mapcount and
folio->_large_mapcount.

> > 4. If the anon-exclusive bit is set, the folio is only mapped in a
> > single place (otherwise swapout+swapin could erroneously set
> > RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> > wrong anon_vma).
>
> I believe (David?) swapin can cause this not to be the case?
>
> > 5. When a VMA is associated with an anon_vma, it is always also
> > associated with the corresponding root anon_vma (necessary because
> > non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> > root anon_vma).
>
> OK but we know for sure the UAF is not on a root anon_vma so it's not some
> weirdness with trying to access anon_vma->root

Ah, right.

> > 6. If two VMAs in the same process have the same ->anon_vma, their
> > anonvma chains must be the same (otherwise VMA merging can break
> > stuff).
> >
>
> What do you mean the same?
>
> If you mean they both must have AVC's which ponit to the individual VMAs
> and each point to the same anon_vma, yes.

Yeah, that.

> God simple isn't it? ;)

Yeah, I prefer to think of this at the slightly higher abstraction
layer of "which VMAs are tied to which anon_vmas via AVC" and "which
VMAs use which anon_vmas as their primary anon_vma"; to me, AVCs being
separate objects is a minor implementation detail caused by the kernel
only using intrusive lists instead of the kinds of data structures
that you'd use in almost any other environment.
(Like, you wouldn't need AVC objects if the references between VMAs
and anon_vmas were formed with things like maple trees or xarrays, but
I guess they wouldn't give you the interval tree semantics you want.)

> I verified these numbers with drgn, interesting add a new child doesn't
> increment refcount...

Yeah - AFAIU a single reference is shared by all the VMAs that are
tied to an anon_vma via AVC nodes, and a child anon_vma can't be
associated with a VMA without its parent also being associated with
the VMA...

> > > We're sort of relying on this
> > >
> > > a. being a UAF
> > >
> > > b. the thing we're UAF-ing not either corrupting this field or (if that
> > >     memory is actually reused as an anon_vma - I'm not familiar with slab
> > >     caches - so maybe it's quite likely - getting its refcount incremented.
> >
> > KASAN sees the memory read I'm doing with this atomic_read(), so in
> > KASAN builds, if this is a UAF, it should trigger a KASAN splat
> > (modulo KASAN limitations around when UAF can be detected). Basically,
> > in KASAN builds, the actual explicit check I'm doing here is only
> > relevant if the object has not yet been freed. That's why I wrote the
> > comment "Part of the purpose of the atomic_read() is to make KASAN
> > check that the anon_vma is still alive.".
>
> Hm, I'm confused, how can you detect a UAF if the object cannot yet be
> freed? :P
>
> or would that be the case were it not an atomic_read()?
>
> I guess this permits this to be detected in a timely manner.

If the anon_vma hasn't yet been freed, but its refcount is 0, then
that's still a bug because we rely on the anon_vma to have a nonzero
refcount as long as there are folios with a nonzero mapcount that are
tied to it, and it is likely to allow UAF at a later point.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 14:10       ` Lorenzo Stoakes
@ 2025-07-25 14:50         ` Jann Horn
  0 siblings, 0 replies; 20+ messages in thread
From: Jann Horn @ 2025-07-25 14:50 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 4:11 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> I went checking for this but was blind obviously since didn't find, Vlasta
> asked off-list about why anon_vma's are allocated SLAB_TYPESAFE_BY_RCU.
>
> I notice as well folio_get_anon_vma() is _tricky_:
>
> /*
>  * Getting a lock on a stable anon_vma from a page off the LRU is tricky!
>  *
>  * Since there is no serialization what so ever against folio_remove_rmap_*()
>  * the best this function can do is return a refcount increased anon_vma
>  * that might have been relevant to this page.
>  *
>  * The page might have been remapped to a different anon_vma or the anon_vma
>  * returned may already be freed (and even reused).
>  *
>  * In case it was remapped to a different anon_vma, the new anon_vma will be a
>  * child of the old anon_vma, and the anon_vma lifetime rules will therefore
>  * ensure that any anon_vma obtained from the page will still be valid for as
>  * long as we observe page_mapped() [ hence all those page_mapped() tests ].
>  *
>  * All users of this function must be very careful when walking the anon_vma
>  * chain and verify that the page in question is indeed mapped in it
>  * [ something equivalent to page_mapped_in_vma() ].
>  *
>  * Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
>  * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
>  * if there is a mapcount, we can dereference the anon_vma after observing
>  * those.
>
> ^--- this seems particularly pertinent...

Yep! This was also how that anon_vma bug a few years ago
(https://project-zero.issues.chromium.org/42451486) turned into UAF -
it allowed you to have a folio still mapped into userspace while its
anon_vma was already gone.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 11:32 ` Lorenzo Stoakes
  2025-07-25 12:00   ` Jann Horn
@ 2025-07-25 15:05   ` Jann Horn
  1 sibling, 0 replies; 20+ messages in thread
From: Jann Horn @ 2025-07-25 15:05 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 1:32 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Thu, Jul 24, 2025 at 09:13:50PM +0200, Jann Horn wrote:
> > There have been syzkaller reports a few months ago[1][2] of UAF in rmap
>
> Will try to take a look when I get a chance.
>
> > walks that seems to indicate that there can be pages with elevated mapcount
> > whose anon_vma has already been freed, but I think we never figured out
> > what the cause is; and syzkaller only hit these UAFs when memory pressure
> > randomly caused reclaim to rmap-walk the affected pages, so it of course
> > didn't manage to create a reproducer.
>
> Fun.
>
> Please hook me in (I mean you're going to anyway right :P) on this stuff,
> as I'm looking to rework the anon_vma stuff so am naturally interested in
> any and all rmap anon stuff.
>
> For my sins ;)
>
> Maybe I"ll dig into these syzkallers.

For what it's worth, the point of this change is that hopefully we
won't have to dig more into them manually, because hopefully a few
days after this patch hits linux-next, syzkaller will present us with
a beautiful reproducer that shows exactly what went wrong... or maybe
it won't, I'm being very optimistic here.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 14:48       ` Jann Horn
@ 2025-07-25 15:07         ` Lorenzo Stoakes
  2025-07-25 15:15           ` Jann Horn
  2025-07-25 15:58           ` Lorenzo Stoakes
  0 siblings, 2 replies; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 15:07 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 04:48:09PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > > An anon folio may outlive the VMAs it comes from, so it may also
> > > outlive its associated anon_vma.
> >
> > Yes, I will share some diagrams I did a while ago to outline this. They're
> > ASCII and make you want to cry! :)
> >
> > Hmm, if non-root, I wonder if we
>
> (looks like you stopped typing mid-sentence)

Yup,

>
> > > 2. Removing an anon folio mapping reduces the anon folio's mapcount
> > > before the VMA can go away.
> >
> > the anon folio's mapcount? You mean the VMA's? :P
>
> I mean folio_mapcount(folio), which reads folio->_mapcount and
> folio->_large_mapcount.

Yup sorry, brain fart.

>
> > > 4. If the anon-exclusive bit is set, the folio is only mapped in a
> > > single place (otherwise swapout+swapin could erroneously set
> > > RMAP_EXCLUSIVE, causing the swapped-in folio to be associated with the
> > > wrong anon_vma).
> >
> > I believe (David?) swapin can cause this not to be the case?
> >
> > > 5. When a VMA is associated with an anon_vma, it is always also
> > > associated with the corresponding root anon_vma (necessary because
> > > non-RMAP_EXCLUSIVE swapin falls back to associating the folio with the
> > > root anon_vma).
> >
> > OK but we know for sure the UAF is not on a root anon_vma so it's not some
> > weirdness with trying to access anon_vma->root
>
> Ah, right.
>
> > > 6. If two VMAs in the same process have the same ->anon_vma, their
> > > anonvma chains must be the same (otherwise VMA merging can break
> > > stuff).
> > >
> >
> > What do you mean the same?
> >
> > If you mean they both must have AVC's which ponit to the individual VMAs
> > and each point to the same anon_vma, yes.
>
> Yeah, that.
>
> > God simple isn't it? ;)
>
> Yeah, I prefer to think of this at the slightly higher abstraction
> layer of "which VMAs are tied to which anon_vmas via AVC" and "which
> VMAs use which anon_vmas as their primary anon_vma"; to me, AVCs being
> separate objects is a minor implementation detail caused by the kernel
> only using intrusive lists instead of the kinds of data structures
> that you'd use in almost any other environment.

To me (a simple man :) there's a many to many relationship so it makes
sense _on one level_ to have connecting AVC's.

I however feel like we can probably do better on the data structures here.

I alos think:

- The reuse logic can be improved
- The semantics or at least logic around logic can be improved


> (Like, you wouldn't need AVC objects if the references between VMAs
> and anon_vmas were formed with things like maple trees or xarrays, but
> I guess they wouldn't give you the interval tree semantics you want.)

Well, we'll see about that :)

>
> > I verified these numbers with drgn, interesting add a new child doesn't
> > increment refcount...
>
> Yeah - AFAIU a single reference is shared by all the VMAs that are
> tied to an anon_vma via AVC nodes, and a child anon_vma can't be
> associated with a VMA without its parent also being associated with
> the VMA...

You see this is also some of the complexity I dislike.

I think when you constrain yourself to the design currently existing, you
sort of necessarily need to add these things.

But a better thought out design might avoid them.

>
> > > > We're sort of relying on this
> > > >
> > > > a. being a UAF
> > > >
> > > > b. the thing we're UAF-ing not either corrupting this field or (if that
> > > >     memory is actually reused as an anon_vma - I'm not familiar with slab
> > > >     caches - so maybe it's quite likely - getting its refcount incremented.
> > >
> > > KASAN sees the memory read I'm doing with this atomic_read(), so in
> > > KASAN builds, if this is a UAF, it should trigger a KASAN splat
> > > (modulo KASAN limitations around when UAF can be detected). Basically,
> > > in KASAN builds, the actual explicit check I'm doing here is only
> > > relevant if the object has not yet been freed. That's why I wrote the
> > > comment "Part of the purpose of the atomic_read() is to make KASAN
> > > check that the anon_vma is still alive.".
> >
> > Hm, I'm confused, how can you detect a UAF if the object cannot yet be
> > freed? :P
> >
> > or would that be the case were it not an atomic_read()?
> >
> > I guess this permits this to be detected in a timely manner.
>
> If the anon_vma hasn't yet been freed, but its refcount is 0, then
> that's still a bug because we rely on the anon_vma to have a nonzero
> refcount as long as there are folios with a nonzero mapcount that are
> tied to it, and it is likely to allow UAF at a later point.

But how is this happening?

The only places where we might explicitly manipulate anon_vma->refcount
are:

- anon_vma_ctor() -> set to 0 on construction used by slab.
- folio_lock_anon_vma_read() / put_anon_vma() - both cases call
  __put_anon_vma() when 0 to free the anon_vma.

So how could we get to a refcount of 0 but the anon_vma still be hanging
around, except if it's freshly allocated?

It's surely only UAF?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 15:07         ` Lorenzo Stoakes
@ 2025-07-25 15:15           ` Jann Horn
  2025-07-25 15:22             ` Lorenzo Stoakes
  2025-07-25 15:58           ` Lorenzo Stoakes
  1 sibling, 1 reply; 20+ messages in thread
From: Jann Horn @ 2025-07-25 15:15 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 5:07 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Fri, Jul 25, 2025 at 04:48:09PM +0200, Jann Horn wrote:
> > On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > > > > We're sort of relying on this
> > > > >
> > > > > a. being a UAF
> > > > >
> > > > > b. the thing we're UAF-ing not either corrupting this field or (if that
> > > > >     memory is actually reused as an anon_vma - I'm not familiar with slab
> > > > >     caches - so maybe it's quite likely - getting its refcount incremented.
> > > >
> > > > KASAN sees the memory read I'm doing with this atomic_read(), so in
> > > > KASAN builds, if this is a UAF, it should trigger a KASAN splat
> > > > (modulo KASAN limitations around when UAF can be detected). Basically,
> > > > in KASAN builds, the actual explicit check I'm doing here is only
> > > > relevant if the object has not yet been freed. That's why I wrote the
> > > > comment "Part of the purpose of the atomic_read() is to make KASAN
> > > > check that the anon_vma is still alive.".
> > >
> > > Hm, I'm confused, how can you detect a UAF if the object cannot yet be
> > > freed? :P
> > >
> > > or would that be the case were it not an atomic_read()?
> > >
> > > I guess this permits this to be detected in a timely manner.
> >
> > If the anon_vma hasn't yet been freed, but its refcount is 0, then
> > that's still a bug because we rely on the anon_vma to have a nonzero
> > refcount as long as there are folios with a nonzero mapcount that are
> > tied to it, and it is likely to allow UAF at a later point.
>
> But how is this happening?
>
> The only places where we might explicitly manipulate anon_vma->refcount
> are:
>
> - anon_vma_ctor() -> set to 0 on construction used by slab.
> - folio_lock_anon_vma_read() / put_anon_vma() - both cases call
>   __put_anon_vma() when 0 to free the anon_vma.
>
> So how could we get to a refcount of 0 but the anon_vma still be hanging
> around, except if it's freshly allocated?

Due to SLAB_TYPESAFE_BY_RCU, the anon_vma is guaranteed to still be
accessible (possibly post-recycling) for an RCU grace period after its
refcount drops to zero. Under CONFIG_SLUB_RCU_DEBUG (which you need
for KASAN to catch UAF in such slabs semi-reliably), from KASAN's
perspective, the anon_vma is effectively freed after an RCU grace
period.

Basically CONFIG_SLUB_RCU_DEBUG turns kmem_cache_free() on
SLAB_TYPESAFE_BY_RCU slabs into something like kfree_rcu(), and this
allows KASAN to catch UAF access.

> It's surely only UAF?

I mean, "UAF" is kind of vague when talking about SLAB_TYPESAFE_BY_RCU
slabs. I am only using the term "UAF" when talking about a situation
where accessing the anon_vma object is entirely forbidden because an
RCU grace period has passed after it was "freed" with
kmem_cache_free().


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 15:15           ` Jann Horn
@ 2025-07-25 15:22             ` Lorenzo Stoakes
  2025-07-25 15:27               ` Jann Horn
  0 siblings, 1 reply; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 15:22 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 05:15:45PM +0200, Jann Horn wrote:
> On Fri, Jul 25, 2025 at 5:07 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> > On Fri, Jul 25, 2025 at 04:48:09PM +0200, Jann Horn wrote:
> > > On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
> > > <lorenzo.stoakes@oracle.com> wrote:
> > > > On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > > > > > We're sort of relying on this
> > > > > >
> > > > > > a. being a UAF
> > > > > >
> > > > > > b. the thing we're UAF-ing not either corrupting this field or (if that
> > > > > >     memory is actually reused as an anon_vma - I'm not familiar with slab
> > > > > >     caches - so maybe it's quite likely - getting its refcount incremented.
> > > > >
> > > > > KASAN sees the memory read I'm doing with this atomic_read(), so in
> > > > > KASAN builds, if this is a UAF, it should trigger a KASAN splat
> > > > > (modulo KASAN limitations around when UAF can be detected). Basically,
> > > > > in KASAN builds, the actual explicit check I'm doing here is only
> > > > > relevant if the object has not yet been freed. That's why I wrote the
> > > > > comment "Part of the purpose of the atomic_read() is to make KASAN
> > > > > check that the anon_vma is still alive.".
> > > >
> > > > Hm, I'm confused, how can you detect a UAF if the object cannot yet be
> > > > freed? :P
> > > >
> > > > or would that be the case were it not an atomic_read()?
> > > >
> > > > I guess this permits this to be detected in a timely manner.
> > >
> > > If the anon_vma hasn't yet been freed, but its refcount is 0, then
> > > that's still a bug because we rely on the anon_vma to have a nonzero
> > > refcount as long as there are folios with a nonzero mapcount that are
> > > tied to it, and it is likely to allow UAF at a later point.
> >
> > But how is this happening?
> >
> > The only places where we might explicitly manipulate anon_vma->refcount
> > are:
> >
> > - anon_vma_ctor() -> set to 0 on construction used by slab.
> > - folio_lock_anon_vma_read() / put_anon_vma() - both cases call
> >   __put_anon_vma() when 0 to free the anon_vma.
> >
> > So how could we get to a refcount of 0 but the anon_vma still be hanging
> > around, except if it's freshly allocated?
>
> Due to SLAB_TYPESAFE_BY_RCU, the anon_vma is guaranteed to still be
> accessible (possibly post-recycling) for an RCU grace period after its

Right that makes sense.

> refcount drops to zero. Under CONFIG_SLUB_RCU_DEBUG (which you need
> for KASAN to catch UAF in such slabs semi-reliably), from KASAN's
> perspective, the anon_vma is effectively freed after an RCU grace
> period.

By UAF I mean used after kmem_cache_free(), but I hadn't grokked this point but
y'know kinda makes sense given the name...

>
> Basically CONFIG_SLUB_RCU_DEBUG turns kmem_cache_free() on
> SLAB_TYPESAFE_BY_RCU slabs into something like kfree_rcu(), and this
> allows KASAN to catch UAF access.
>
> > It's surely only UAF?
>
> I mean, "UAF" is kind of vague when talking about SLAB_TYPESAFE_BY_RCU
> slabs. I am only using the term "UAF" when talking about a situation
> where accessing the anon_vma object is entirely forbidden because an
> RCU grace period has passed after it was "freed" with
> kmem_cache_free().

Could it not be either case? Or are we sure it's been accessed within that grace
period?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 15:22             ` Lorenzo Stoakes
@ 2025-07-25 15:27               ` Jann Horn
  0 siblings, 0 replies; 20+ messages in thread
From: Jann Horn @ 2025-07-25 15:27 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 5:22 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
> On Fri, Jul 25, 2025 at 05:15:45PM +0200, Jann Horn wrote:
> > On Fri, Jul 25, 2025 at 5:07 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > Basically CONFIG_SLUB_RCU_DEBUG turns kmem_cache_free() on
> > SLAB_TYPESAFE_BY_RCU slabs into something like kfree_rcu(), and this
> > allows KASAN to catch UAF access.
> >
> > > It's surely only UAF?
> >
> > I mean, "UAF" is kind of vague when talking about SLAB_TYPESAFE_BY_RCU
> > slabs. I am only using the term "UAF" when talking about a situation
> > where accessing the anon_vma object is entirely forbidden because an
> > RCU grace period has passed after it was "freed" with
> > kmem_cache_free().
>
> Could it not be either case? Or are we sure it's been accessed within that grace
> period?

If there is a bug here, it could be either case. The check covers both
cases in CONFIG_SLUB_RCU_DEBUG builds (as long as the object hasn't
already moved all the way through the KASAN quarantine, the usual
KASAN caveat). Note that atomic_read() implicitly causes KASAN to
check that the object is still accessible.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] mm/rmap: Add anon_vma lifetime debug check
  2025-07-25 15:07         ` Lorenzo Stoakes
  2025-07-25 15:15           ` Jann Horn
@ 2025-07-25 15:58           ` Lorenzo Stoakes
  1 sibling, 0 replies; 20+ messages in thread
From: Lorenzo Stoakes @ 2025-07-25 15:58 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrew Morton, David Hildenbrand, Rik van Riel, Liam R. Howlett,
	Vlastimil Babka, Harry Yoo, linux-mm, linux-kernel

On Fri, Jul 25, 2025 at 04:07:00PM +0100, Lorenzo Stoakes wrote:
> On Fri, Jul 25, 2025 at 04:48:09PM +0200, Jann Horn wrote:
> > On Fri, Jul 25, 2025 at 3:49 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > On Fri, Jul 25, 2025 at 02:00:18PM +0200, Jann Horn wrote:
> > > > An anon folio may outlive the VMAs it comes from, so it may also
> > > > outlive its associated anon_vma.
> > >
> > > Yes, I will share some diagrams I did a while ago to outline this. They're
> > > ASCII and make you want to cry! :)
> > >
> > > Hmm, if non-root, I wonder if we
> >
> > (looks like you stopped typing mid-sentence)
>
> Yup,

Lol I did it twice, yeah I'm tired man.

I can't even remember what I was saying... :P


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-07-25 15:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-24 19:13 [PATCH] mm/rmap: Add anon_vma lifetime debug check Jann Horn
2025-07-24 21:52 ` Andrew Morton
2025-07-25 10:59   ` Jann Horn
2025-07-24 21:56 ` David Hildenbrand
2025-07-25 11:08   ` Jann Horn
2025-07-25 11:12     ` David Hildenbrand
2025-07-25 11:24       ` Jann Horn
2025-07-25 11:31         ` David Hildenbrand
2025-07-25 11:32 ` Lorenzo Stoakes
2025-07-25 12:00   ` Jann Horn
2025-07-25 13:48     ` Lorenzo Stoakes
2025-07-25 14:10       ` Lorenzo Stoakes
2025-07-25 14:50         ` Jann Horn
2025-07-25 14:48       ` Jann Horn
2025-07-25 15:07         ` Lorenzo Stoakes
2025-07-25 15:15           ` Jann Horn
2025-07-25 15:22             ` Lorenzo Stoakes
2025-07-25 15:27               ` Jann Horn
2025-07-25 15:58           ` Lorenzo Stoakes
2025-07-25 15:05   ` Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).