linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kasan: skip quarantine if object is still accessible under RCU
@ 2025-07-23 14:59 Jann Horn
  2025-07-24 10:14 ` Vlastimil Babka
  2025-07-26 22:05 ` Andrey Konovalov
  0 siblings, 2 replies; 6+ messages in thread
From: Jann Horn @ 2025-07-23 14:59 UTC (permalink / raw)
  To: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton
  Cc: Vlastimil Babka, kasan-dev, linux-mm, linux-kernel, Jann Horn

Currently, enabling KASAN masks bugs where a lockless lookup path gets a
pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
recycled and is insufficiently careful about handling recycled objects:
KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
queues, even when it can't actually detect UAF in these objects, and the
quarantine prevents fast recycling.

When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
after an RCU grace period and put them on the quarantine, while disabling
CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
but that hasn't actually been working.

I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
could only trigger this bug in a KASAN build by disabling
CONFIG_SLUB_RCU_DEBUG and applying this patch.

Signed-off-by: Jann Horn <jannh@google.com>
---
 mm/kasan/common.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index ed4873e18c75..9142964ab9c9 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object,
 }
 
 static inline void poison_slab_object(struct kmem_cache *cache, void *object,
-				      bool init, bool still_accessible)
+				      bool init)
 {
 	void *tagged_object = object;
 
 	object = kasan_reset_tag(object);
 
-	/* RCU slabs could be legally used after free within the RCU period. */
-	if (unlikely(still_accessible))
-		return;
-
 	kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
 			KASAN_SLAB_FREE, init);
 
@@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init,
 	if (!kasan_arch_is_ready() || is_kfence_address(object))
 		return false;
 
-	poison_slab_object(cache, object, init, still_accessible);
+	/*
+	 * If this point is reached with an object that must still be
+	 * accessible under RCU, we can't poison it; in that case, also skip the
+	 * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG
+	 * has been disabled manually.
+	 *
+	 * Putting the object on the quarantine wouldn't help catch UAFs (since
+	 * we can't poison it here), and it would mask bugs caused by
+	 * SLAB_TYPESAFE_BY_RCU users not being careful enough about object
+	 * reuse; so overall, putting the object into the quarantine here would
+	 * be counterproductive.
+	 */
+	if (still_accessible)
+		return false;
+
+	poison_slab_object(cache, object, init);
 
 	/*
 	 * If the object is put into quarantine, do not let slab put the object
@@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip)
 	if (check_slab_allocation(slab->slab_cache, ptr, ip))
 		return false;
 
-	poison_slab_object(slab->slab_cache, ptr, false, false);
+	poison_slab_object(slab->slab_cache, ptr, false);
 	return true;
 }
 

---
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24

-- 
Jann Horn <jannh@google.com>



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] kasan: skip quarantine if object is still accessible under RCU
  2025-07-23 14:59 [PATCH] kasan: skip quarantine if object is still accessible under RCU Jann Horn
@ 2025-07-24 10:14 ` Vlastimil Babka
  2025-07-24 10:20   ` Alexander Potapenko
  2025-07-24 15:11   ` Jann Horn
  2025-07-26 22:05 ` Andrey Konovalov
  1 sibling, 2 replies; 6+ messages in thread
From: Vlastimil Babka @ 2025-07-24 10:14 UTC (permalink / raw)
  To: Jann Horn, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton
  Cc: kasan-dev, linux-mm, linux-kernel

On 7/23/25 16:59, Jann Horn wrote:
> Currently, enabling KASAN masks bugs where a lockless lookup path gets a
> pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
> recycled and is insufficiently careful about handling recycled objects:
> KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
> queues, even when it can't actually detect UAF in these objects, and the
> quarantine prevents fast recycling.
> 
> When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
> CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
> after an RCU grace period and put them on the quarantine, while disabling
> CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
> but that hasn't actually been working.

Was the "allow reuse immediately" not working also before you introduced
CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a
Fixes: here?

> I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
> could only trigger this bug in a KASAN build by disabling
> CONFIG_SLUB_RCU_DEBUG and applying this patch.
> 
> Signed-off-by: Jann Horn <jannh@google.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/kasan/common.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index ed4873e18c75..9142964ab9c9 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object,
>  }
>  
>  static inline void poison_slab_object(struct kmem_cache *cache, void *object,
> -				      bool init, bool still_accessible)
> +				      bool init)
>  {
>  	void *tagged_object = object;
>  
>  	object = kasan_reset_tag(object);
>  
> -	/* RCU slabs could be legally used after free within the RCU period. */
> -	if (unlikely(still_accessible))
> -		return;
> -
>  	kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
>  			KASAN_SLAB_FREE, init);
>  
> @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init,
>  	if (!kasan_arch_is_ready() || is_kfence_address(object))
>  		return false;
>  
> -	poison_slab_object(cache, object, init, still_accessible);
> +	/*
> +	 * If this point is reached with an object that must still be
> +	 * accessible under RCU, we can't poison it; in that case, also skip the
> +	 * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG
> +	 * has been disabled manually.
> +	 *
> +	 * Putting the object on the quarantine wouldn't help catch UAFs (since
> +	 * we can't poison it here), and it would mask bugs caused by
> +	 * SLAB_TYPESAFE_BY_RCU users not being careful enough about object
> +	 * reuse; so overall, putting the object into the quarantine here would
> +	 * be counterproductive.
> +	 */
> +	if (still_accessible)
> +		return false;
> +
> +	poison_slab_object(cache, object, init);
>  
>  	/*
>  	 * If the object is put into quarantine, do not let slab put the object
> @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip)
>  	if (check_slab_allocation(slab->slab_cache, ptr, ip))
>  		return false;
>  
> -	poison_slab_object(slab->slab_cache, ptr, false, false);
> +	poison_slab_object(slab->slab_cache, ptr, false);
>  	return true;
>  }
>  
> 
> ---
> base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
> change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kasan: skip quarantine if object is still accessible under RCU
  2025-07-24 10:14 ` Vlastimil Babka
@ 2025-07-24 10:20   ` Alexander Potapenko
  2025-07-24 15:11   ` Jann Horn
  1 sibling, 0 replies; 6+ messages in thread
From: Alexander Potapenko @ 2025-07-24 10:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Jann Horn, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, kasan-dev, linux-mm,
	linux-kernel

On Thu, Jul 24, 2025 at 12:14 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 7/23/25 16:59, Jann Horn wrote:
> > Currently, enabling KASAN masks bugs where a lockless lookup path gets a
> > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
> > recycled and is insufficiently careful about handling recycled objects:
> > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
> > queues, even when it can't actually detect UAF in these objects, and the
> > quarantine prevents fast recycling.
> >
> > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
> > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
> > after an RCU grace period and put them on the quarantine, while disabling
> > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
> > but that hasn't actually been working.
>
> Was the "allow reuse immediately" not working also before you introduced
> CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a
> Fixes: here?
>
> > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
> > could only trigger this bug in a KASAN build by disabling
> > CONFIG_SLUB_RCU_DEBUG and applying this patch.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Alexander Potapenko <glider@google.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kasan: skip quarantine if object is still accessible under RCU
  2025-07-24 10:14 ` Vlastimil Babka
  2025-07-24 10:20   ` Alexander Potapenko
@ 2025-07-24 15:11   ` Jann Horn
  1 sibling, 0 replies; 6+ messages in thread
From: Jann Horn @ 2025-07-24 15:11 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, kasan-dev,
	linux-mm, linux-kernel

On Thu, Jul 24, 2025 at 12:14 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 7/23/25 16:59, Jann Horn wrote:
> > Currently, enabling KASAN masks bugs where a lockless lookup path gets a
> > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
> > recycled and is insufficiently careful about handling recycled objects:
> > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
> > queues, even when it can't actually detect UAF in these objects, and the
> > quarantine prevents fast recycling.
> >
> > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
> > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
> > after an RCU grace period and put them on the quarantine, while disabling
> > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
> > but that hasn't actually been working.
>
> Was the "allow reuse immediately" not working also before you introduced
> CONFIG_SLUB_RCU_DEBUG, or is it a side-effect of that? IOW should we add a
> Fixes: here?

This was already an issue before. I think it got broken by refactoring
in commit b556a462eb8df6b6836c318d23f43409c40a7c7e ("kasan: save free
stack traces for slab mempools"), but I don't think it was necessarily
an intentionally supported feature.

> > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
> > could only trigger this bug in a KASAN build by disabling
> > CONFIG_SLUB_RCU_DEBUG and applying this patch.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kasan: skip quarantine if object is still accessible under RCU
  2025-07-23 14:59 [PATCH] kasan: skip quarantine if object is still accessible under RCU Jann Horn
  2025-07-24 10:14 ` Vlastimil Babka
@ 2025-07-26 22:05 ` Andrey Konovalov
  2025-07-28 15:25   ` Jann Horn
  1 sibling, 1 reply; 6+ messages in thread
From: Andrey Konovalov @ 2025-07-26 22:05 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Vlastimil Babka, kasan-dev,
	linux-mm, linux-kernel

On Wed, Jul 23, 2025 at 4:59 PM Jann Horn <jannh@google.com> wrote:
>
> Currently, enabling KASAN masks bugs where a lockless lookup path gets a
> pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
> recycled and is insufficiently careful about handling recycled objects:
> KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
> queues, even when it can't actually detect UAF in these objects, and the
> quarantine prevents fast recycling.
>
> When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
> CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
> after an RCU grace period and put them on the quarantine, while disabling
> CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
> but that hasn't actually been working.
>
> I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
> could only trigger this bug in a KASAN build by disabling
> CONFIG_SLUB_RCU_DEBUG and applying this patch.
>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  mm/kasan/common.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index ed4873e18c75..9142964ab9c9 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object,
>  }
>
>  static inline void poison_slab_object(struct kmem_cache *cache, void *object,
> -                                     bool init, bool still_accessible)
> +                                     bool init)
>  {
>         void *tagged_object = object;
>
>         object = kasan_reset_tag(object);
>
> -       /* RCU slabs could be legally used after free within the RCU period. */
> -       if (unlikely(still_accessible))
> -               return;
> -
>         kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
>                         KASAN_SLAB_FREE, init);
>
> @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init,
>         if (!kasan_arch_is_ready() || is_kfence_address(object))
>                 return false;
>
> -       poison_slab_object(cache, object, init, still_accessible);
> +       /*
> +        * If this point is reached with an object that must still be
> +        * accessible under RCU, we can't poison it; in that case, also skip the
> +        * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG
> +        * has been disabled manually.
> +        *
> +        * Putting the object on the quarantine wouldn't help catch UAFs (since
> +        * we can't poison it here), and it would mask bugs caused by
> +        * SLAB_TYPESAFE_BY_RCU users not being careful enough about object
> +        * reuse; so overall, putting the object into the quarantine here would
> +        * be counterproductive.
> +        */
> +       if (still_accessible)
> +               return false;
> +
> +       poison_slab_object(cache, object, init);
>
>         /*
>          * If the object is put into quarantine, do not let slab put the object
> @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip)
>         if (check_slab_allocation(slab->slab_cache, ptr, ip))
>                 return false;
>
> -       poison_slab_object(slab->slab_cache, ptr, false, false);
> +       poison_slab_object(slab->slab_cache, ptr, false);
>         return true;
>  }
>
>
> ---
> base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
> change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24
>
> --
> Jann Horn <jannh@google.com>
>

Acked-by: Andrey Konovalov <andreyknvl@gmail.com>

Would it be hard to add KUnit test to check that KASAN detects such issues?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] kasan: skip quarantine if object is still accessible under RCU
  2025-07-26 22:05 ` Andrey Konovalov
@ 2025-07-28 15:25   ` Jann Horn
  0 siblings, 0 replies; 6+ messages in thread
From: Jann Horn @ 2025-07-28 15:25 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Vlastimil Babka, kasan-dev,
	linux-mm, linux-kernel

On Sun, Jul 27, 2025 at 12:06 AM Andrey Konovalov <andreyknvl@gmail.com> wrote:
> On Wed, Jul 23, 2025 at 4:59 PM Jann Horn <jannh@google.com> wrote:
> >
> > Currently, enabling KASAN masks bugs where a lockless lookup path gets a
> > pointer to a SLAB_TYPESAFE_BY_RCU object that might concurrently be
> > recycled and is insufficiently careful about handling recycled objects:
> > KASAN puts freed objects in SLAB_TYPESAFE_BY_RCU slabs onto its quarantine
> > queues, even when it can't actually detect UAF in these objects, and the
> > quarantine prevents fast recycling.
> >
> > When I introduced CONFIG_SLUB_RCU_DEBUG, my intention was that enabling
> > CONFIG_SLUB_RCU_DEBUG should cause KASAN to mark such objects as freed
> > after an RCU grace period and put them on the quarantine, while disabling
> > CONFIG_SLUB_RCU_DEBUG should allow such objects to be reused immediately;
> > but that hasn't actually been working.
> >
> > I discovered such a UAF bug involving SLAB_TYPESAFE_BY_RCU yesterday; I
> > could only trigger this bug in a KASAN build by disabling
> > CONFIG_SLUB_RCU_DEBUG and applying this patch.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> >  mm/kasan/common.c | 25 ++++++++++++++++++-------
> >  1 file changed, 18 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> > index ed4873e18c75..9142964ab9c9 100644
> > --- a/mm/kasan/common.c
> > +++ b/mm/kasan/common.c
> > @@ -230,16 +230,12 @@ static bool check_slab_allocation(struct kmem_cache *cache, void *object,
> >  }
> >
> >  static inline void poison_slab_object(struct kmem_cache *cache, void *object,
> > -                                     bool init, bool still_accessible)
> > +                                     bool init)
> >  {
> >         void *tagged_object = object;
> >
> >         object = kasan_reset_tag(object);
> >
> > -       /* RCU slabs could be legally used after free within the RCU period. */
> > -       if (unlikely(still_accessible))
> > -               return;
> > -
> >         kasan_poison(object, round_up(cache->object_size, KASAN_GRANULE_SIZE),
> >                         KASAN_SLAB_FREE, init);
> >
> > @@ -261,7 +257,22 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object, bool init,
> >         if (!kasan_arch_is_ready() || is_kfence_address(object))
> >                 return false;
> >
> > -       poison_slab_object(cache, object, init, still_accessible);
> > +       /*
> > +        * If this point is reached with an object that must still be
> > +        * accessible under RCU, we can't poison it; in that case, also skip the
> > +        * quarantine. This should mostly only happen when CONFIG_SLUB_RCU_DEBUG
> > +        * has been disabled manually.
> > +        *
> > +        * Putting the object on the quarantine wouldn't help catch UAFs (since
> > +        * we can't poison it here), and it would mask bugs caused by
> > +        * SLAB_TYPESAFE_BY_RCU users not being careful enough about object
> > +        * reuse; so overall, putting the object into the quarantine here would
> > +        * be counterproductive.
> > +        */
> > +       if (still_accessible)
> > +               return false;
> > +
> > +       poison_slab_object(cache, object, init);
> >
> >         /*
> >          * If the object is put into quarantine, do not let slab put the object
> > @@ -519,7 +530,7 @@ bool __kasan_mempool_poison_object(void *ptr, unsigned long ip)
> >         if (check_slab_allocation(slab->slab_cache, ptr, ip))
> >                 return false;
> >
> > -       poison_slab_object(slab->slab_cache, ptr, false, false);
> > +       poison_slab_object(slab->slab_cache, ptr, false);
> >         return true;
> >  }
> >
> >
> > ---
> > base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
> > change-id: 20250723-kasan-tsbrcu-noquarantine-e207bb990e24
> >
> > --
> > Jann Horn <jannh@google.com>
> >
>
> Acked-by: Andrey Konovalov <andreyknvl@gmail.com>

Thanks!

> Would it be hard to add KUnit test to check that KASAN detects such issues?

Sent a separate patch with a kunit test.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-07-28 15:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23 14:59 [PATCH] kasan: skip quarantine if object is still accessible under RCU Jann Horn
2025-07-24 10:14 ` Vlastimil Babka
2025-07-24 10:20   ` Alexander Potapenko
2025-07-24 15:11   ` Jann Horn
2025-07-26 22:05 ` Andrey Konovalov
2025-07-28 15:25   ` Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).