linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH slab hotfixes v2 0/2] slub kunit tests fixes for 6.12
@ 2024-10-01 16:20 Vlastimil Babka
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test Vlastimil Babka
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in Vlastimil Babka
  0 siblings, 2 replies; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-01 16:20 UTC (permalink / raw)
  To: Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Vlastimil Babka, kernel test robot, Guenter Roeck,
	Paul E. McKenney, Boqun Feng, Uladzislau Rezki, rcu, David Gow,
	Rae Moar, linux-kselftest, kunit-dev, Brendan Higgins

The SLUB changes for 6.12 included new kunit tests that resulted in
noisy warnings, which we normally suppress, and a boot lockup in some
configurations in case the kunit tests are built-in.

The warnings are addressed in Patch 1.

The lockups I couldn't reproduce, but inspecting boot initialization
order makes me suspect the test_kfree_rcu() calling kfree_rcu() which is
too early before RCU finishes initialization.  Moving the exection later
was tried but broke tests marking their code as __init so Patch 2 skips
the test when the slub kunit tests are  built-in.

So these are now fixes for 4e1c44b3db79 ("kunit, slub: add
test_kfree_rcu() and test_leak_destroy()")

The plan is to take the fixes via slab tree for a 6.12 rcX.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Changes in v2:
- patch 2 skips the test when built-in instead of moving kunit execution
  later
- Link to v1: https://lore.kernel.org/r/20240930-b4-slub-kunit-fix-v1-0-32ca9dbbbc11@suse.cz

---
Vlastimil Babka (2):
      mm, slab: suppress warnings in test_leak_destroy kunit test
      slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in

 lib/slub_kunit.c | 18 ++++++++++++------
 mm/slab.h        |  6 ++++++
 mm/slab_common.c |  5 +++--
 mm/slub.c        |  5 +++--
 4 files changed, 24 insertions(+), 10 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240930-b4-slub-kunit-fix-6fba4d1c1742

Best regards,
-- 
Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test
  2024-10-01 16:20 [PATCH slab hotfixes v2 0/2] slub kunit tests fixes for 6.12 Vlastimil Babka
@ 2024-10-01 16:20 ` Vlastimil Babka
  2024-10-02 13:44   ` Guenter Roeck
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in Vlastimil Babka
  1 sibling, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-01 16:20 UTC (permalink / raw)
  To: Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Vlastimil Babka, kernel test robot, Guenter Roeck

The test_leak_destroy kunit test intends to test the detection of stray
objects in kmem_cache_destroy(), which normally produces a warning. The
other slab kunit tests suppress the warnings in the kunit test context,
so suppress warnings and related printk output in this test as well.
Automated test running environments then don't need to learn to filter
the warnings.

Also rename the test's kmem_cache, the name was wrongly copy-pasted from
test_kfree_rcu.

Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408251723.42f3d902-oliver.sang@intel.com
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Closes: https://lore.kernel.org/all/CAB=+i9RHHbfSkmUuLshXGY_ifEZg9vCZi3fqr99+kmmnpDus7Q@mail.gmail.com/
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 lib/slub_kunit.c | 4 ++--
 mm/slab.h        | 6 ++++++
 mm/slab_common.c | 5 +++--
 mm/slub.c        | 5 +++--
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index 6e3a1e5a7142f797fe20a28967657f50a466d4ee..85d51ec09846d4fa219db6bda336c6f0b89e98e4 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -177,13 +177,13 @@ static void test_kfree_rcu(struct kunit *test)
 
 static void test_leak_destroy(struct kunit *test)
 {
-	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
+	struct kmem_cache *s = test_kmem_cache_create("TestSlub_leak_destroy",
 							64, SLAB_NO_MERGE);
 	kmem_cache_alloc(s, GFP_KERNEL);
 
 	kmem_cache_destroy(s);
 
-	KUNIT_EXPECT_EQ(test, 1, slab_errors);
+	KUNIT_EXPECT_EQ(test, 2, slab_errors);
 }
 
 static int test_init(struct kunit *test)
diff --git a/mm/slab.h b/mm/slab.h
index f22fb760b2866124d9d873d28b5a7fa6867aeb90..5cf0fbab0b0554ee70f5371772bc37b290893b8a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -546,6 +546,12 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla
 	return false;
 }
 
+#if IS_ENABLED(CONFIG_SLUB_DEBUG) && IS_ENABLED(CONFIG_KUNIT)
+bool slab_in_kunit_test(void);
+#else
+static inline bool slab_in_kunit_test(void) { return false; }
+#endif
+
 #ifdef CONFIG_SLAB_OBJ_EXT
 
 /*
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7443244656150325fb2a7d0158a71821e1418062..3d26c257ed8b57c336ec5c38d4bbd5f5e51d50ff 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -508,8 +508,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	kasan_cache_shutdown(s);
 
 	err = __kmem_cache_shutdown(s);
-	WARN(err, "%s %s: Slab cache still has objects when called from %pS",
-	     __func__, s->name, (void *)_RET_IP_);
+	if (!slab_in_kunit_test())
+		WARN(err, "%s %s: Slab cache still has objects when called from %pS",
+		     __func__, s->name, (void *)_RET_IP_);
 
 	list_del(&s->list);
 
diff --git a/mm/slub.c b/mm/slub.c
index 21f71cb6cc06d951a657290421f41170bb3c76cf..5b832512044e3ead8ccde2c02308bd8954246db5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -827,7 +827,7 @@ static bool slab_add_kunit_errors(void)
 	return true;
 }
 
-static bool slab_in_kunit_test(void)
+bool slab_in_kunit_test(void)
 {
 	struct kunit_resource *resource;
 
@@ -843,7 +843,6 @@ static bool slab_in_kunit_test(void)
 }
 #else
 static inline bool slab_add_kunit_errors(void) { return false; }
-static inline bool slab_in_kunit_test(void) { return false; }
 #endif
 
 static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -5436,6 +5435,8 @@ static void list_slab_objects(struct kmem_cache *s, struct slab *slab,
 	for_each_object(p, s, addr, slab->objects) {
 
 		if (!test_bit(__obj_to_index(s, addr, p), object_map)) {
+			if (slab_add_kunit_errors())
+				continue;
 			pr_err("Object 0x%p @offset=%tu\n", p, p - addr);
 			print_tracking(s, p);
 		}

-- 
2.46.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  2024-10-01 16:20 [PATCH slab hotfixes v2 0/2] slub kunit tests fixes for 6.12 Vlastimil Babka
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test Vlastimil Babka
@ 2024-10-01 16:20 ` Vlastimil Babka
  2024-10-02 10:26   ` Vlastimil Babka
  2024-10-02 13:46   ` Guenter Roeck
  1 sibling, 2 replies; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-01 16:20 UTC (permalink / raw)
  To: Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Vlastimil Babka, Guenter Roeck, Paul E. McKenney,
	Boqun Feng, Uladzislau Rezki, rcu, David Gow, Rae Moar,
	linux-kselftest, kunit-dev, Brendan Higgins

Guenter Roeck reports that the new slub kunit tests added by commit
4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
test_leak_destroy()") cause a lockup on boot on several architectures
when the kunit tests are configured to be built-in and not modules.

The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
showed the runner for built-in kunit tests kunit_run_all_tests() is
called before setting system_state to SYSTEM_RUNNING and calling
rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
was unable to reproduce the problem myself, skipping the test when the
slub_kunit module is built-in should avoid the issue.

An alternative fix that was moving the call to kunit_run_all_tests() a
bit later in the boot was tried, but has broken tests with functions
marked as __init due to free_initmem() already being done.

Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: rcu@vger.kernel.org
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Gow <davidgow@google.com>
Cc: Rae Moar <rmoar@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 lib/slub_kunit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
index 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 100644
--- a/lib/slub_kunit.c
+++ b/lib/slub_kunit.c
@@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
 
 static void test_kfree_rcu(struct kunit *test)
 {
-	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
-				sizeof(struct test_kfree_rcu_struct),
-				SLAB_NO_MERGE);
-	struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
+	struct kmem_cache *s;
+	struct test_kfree_rcu_struct *p;
+
+	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
+		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
+
+	s = test_kmem_cache_create("TestSlub_kfree_rcu",
+				   sizeof(struct test_kfree_rcu_struct),
+				   SLAB_NO_MERGE);
+	p = kmem_cache_alloc(s, GFP_KERNEL);
 
 	kfree_rcu(p, rcu);
 	kmem_cache_destroy(s);

-- 
2.46.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in Vlastimil Babka
@ 2024-10-02 10:26   ` Vlastimil Babka
  2024-10-02 13:52     ` Guenter Roeck
  2024-10-02 13:46   ` Guenter Roeck
  1 sibling, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-02 10:26 UTC (permalink / raw)
  To: Guenter Roeck, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Paul E. McKenney, Boqun Feng, Uladzislau Rezki, rcu,
	David Gow, Rae Moar, linux-kselftest, kunit-dev, Brendan Higgins

On 10/1/24 18:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
> 
> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
> showed the runner for built-in kunit tests kunit_run_all_tests() is
> called before setting system_state to SYSTEM_RUNNING and calling
> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
> was unable to reproduce the problem myself, skipping the test when the
> slub_kunit module is built-in should avoid the issue.
> 
> An alternative fix that was moving the call to kunit_run_all_tests() a
> bit later in the boot was tried, but has broken tests with functions
> marked as __init due to free_initmem() already being done.
> 
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/

I hope you can confirm it helps, because the commit added two tests and I've
only skipped one of them, as it's the one using kfree_rcu(), which is
suspected. But the other is responsible for the (now suppressed)
kmem_cache_destroy() warning, and maybe I'm missing something and it was
actually that one causing the lockups.

Since you mentioned the boot lockups happened on some x86_64 too, do you
have a .config of the lockup case? I've tried tweaking some rcu options but
still nothing.

Thanks!

> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: rcu@vger.kernel.org
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Cc: David Gow <davidgow@google.com>
> Cc: Rae Moar <rmoar@google.com>
> Cc: linux-kselftest@vger.kernel.org
> Cc: kunit-dev@googlegroups.com
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  lib/slub_kunit.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/slub_kunit.c b/lib/slub_kunit.c
> index 85d51ec09846d4fa219db6bda336c6f0b89e98e4..80e39f003344858722a544ad62ed84e885574054 100644
> --- a/lib/slub_kunit.c
> +++ b/lib/slub_kunit.c
> @@ -164,10 +164,16 @@ struct test_kfree_rcu_struct {
>  
>  static void test_kfree_rcu(struct kunit *test)
>  {
> -	struct kmem_cache *s = test_kmem_cache_create("TestSlub_kfree_rcu",
> -				sizeof(struct test_kfree_rcu_struct),
> -				SLAB_NO_MERGE);
> -	struct test_kfree_rcu_struct *p = kmem_cache_alloc(s, GFP_KERNEL);
> +	struct kmem_cache *s;
> +	struct test_kfree_rcu_struct *p;
> +
> +	if (IS_BUILTIN(CONFIG_SLUB_KUNIT_TEST))
> +		kunit_skip(test, "can't do kfree_rcu() when test is built-in");
> +
> +	s = test_kmem_cache_create("TestSlub_kfree_rcu",
> +				   sizeof(struct test_kfree_rcu_struct),
> +				   SLAB_NO_MERGE);
> +	p = kmem_cache_alloc(s, GFP_KERNEL);
>  
>  	kfree_rcu(p, rcu);
>  	kmem_cache_destroy(s);
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test Vlastimil Babka
@ 2024-10-02 13:44   ` Guenter Roeck
  0 siblings, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2024-10-02 13:44 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, kernel test robot

On 10/1/24 09:20, Vlastimil Babka wrote:
> The test_leak_destroy kunit test intends to test the detection of stray
> objects in kmem_cache_destroy(), which normally produces a warning. The
> other slab kunit tests suppress the warnings in the kunit test context,
> so suppress warnings and related printk output in this test as well.
> Automated test running environments then don't need to learn to filter
> the warnings.
> 
> Also rename the test's kmem_cache, the name was wrongly copy-pasted from
> test_kfree_rcu.
> 
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202408251723.42f3d902-oliver.sang@intel.com
> Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Closes: https://lore.kernel.org/all/CAB=+i9RHHbfSkmUuLshXGY_ifEZg9vCZi3fqr99+kmmnpDus7Q@mail.gmail.com/
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Tested-by: Guenter Roeck <linux@roeck-us.net>

Guenter



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  2024-10-01 16:20 ` [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in Vlastimil Babka
  2024-10-02 10:26   ` Vlastimil Babka
@ 2024-10-02 13:46   ` Guenter Roeck
  1 sibling, 0 replies; 8+ messages in thread
From: Guenter Roeck @ 2024-10-02 13:46 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Paul E. McKenney, Boqun Feng, Uladzislau Rezki, rcu,
	David Gow, Rae Moar, linux-kselftest, kunit-dev, Brendan Higgins

On 10/1/24 09:20, Vlastimil Babka wrote:
> Guenter Roeck reports that the new slub kunit tests added by commit
> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
> test_leak_destroy()") cause a lockup on boot on several architectures
> when the kunit tests are configured to be built-in and not modules.
> 
> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
> showed the runner for built-in kunit tests kunit_run_all_tests() is
> called before setting system_state to SYSTEM_RUNNING and calling
> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
> was unable to reproduce the problem myself, skipping the test when the
> slub_kunit module is built-in should avoid the issue.
> 
> An alternative fix that was moving the call to kunit_run_all_tests() a
> bit later in the boot was tried, but has broken tests with functions
> marked as __init due to free_initmem() already being done.
> 
> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
> Reported-by: Guenter Roeck <linux@roeck-us.net>
> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: rcu@vger.kernel.org
> Cc: Brendan Higgins <brendanhiggins@google.com>
> Cc: David Gow <davidgow@google.com>
> Cc: Rae Moar <rmoar@google.com>
> Cc: linux-kselftest@vger.kernel.org
> Cc: kunit-dev@googlegroups.com
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

This results in:

     KTAP version 1
     # Subtest: slub_test
     # module: slub_kunit
     1..8
     # test_clobber_zone: pass:1 fail:0 skip:0 total:1
     ok 1 test_clobber_zone
     # test_next_pointer: pass:1 fail:0 skip:0 total:1
     ok 2 test_next_pointer
     # test_first_word: pass:1 fail:0 skip:0 total:1
     ok 3 test_first_word
     # test_clobber_50th_byte: pass:1 fail:0 skip:0 total:1
     ok 4 test_clobber_50th_byte
     # test_clobber_redzone_free: pass:1 fail:0 skip:0 total:1
     ok 5 test_clobber_redzone_free
     # test_kmalloc_redzone_access: pass:1 fail:0 skip:0 total:1
     ok 6 test_kmalloc_redzone_access
     # test_kfree_rcu: pass:0 fail:0 skip:1 total:1
     ok 7 test_kfree_rcu # SKIP can't do kfree_rcu() when test is built-in
     # test_leak_destroy: pass:1 fail:0 skip:0 total:1
     ok 8 test_leak_destroy
# slub_test: pass:7 fail:0 skip:1 total:8

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks,
Guenter



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  2024-10-02 10:26   ` Vlastimil Babka
@ 2024-10-02 13:52     ` Guenter Roeck
  2024-10-02 14:44       ` Vlastimil Babka
  0 siblings, 1 reply; 8+ messages in thread
From: Guenter Roeck @ 2024-10-02 13:52 UTC (permalink / raw)
  To: Vlastimil Babka, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Paul E. McKenney, Boqun Feng, Uladzislau Rezki, rcu,
	David Gow, Rae Moar, linux-kselftest, kunit-dev, Brendan Higgins

[-- Attachment #1: Type: text/plain, Size: 1929 bytes --]

On 10/2/24 03:26, Vlastimil Babka wrote:
> On 10/1/24 18:20, Vlastimil Babka wrote:
>> Guenter Roeck reports that the new slub kunit tests added by commit
>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>> test_leak_destroy()") cause a lockup on boot on several architectures
>> when the kunit tests are configured to be built-in and not modules.
>>
>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
>> showed the runner for built-in kunit tests kunit_run_all_tests() is
>> called before setting system_state to SYSTEM_RUNNING and calling
>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
>> was unable to reproduce the problem myself, skipping the test when the
>> slub_kunit module is built-in should avoid the issue.
>>
>> An alternative fix that was moving the call to kunit_run_all_tests() a
>> bit later in the boot was tried, but has broken tests with functions
>> marked as __init due to free_initmem() already being done.
>>
>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
> 
> I hope you can confirm it helps, because the commit added two tests and I've
> only skipped one of them, as it's the one using kfree_rcu(), which is
> suspected. But the other is responsible for the (now suppressed)
> kmem_cache_destroy() warning, and maybe I'm missing something and it was
> actually that one causing the lockups.
> 

Everything works with your patches applied, so we are good.

> Since you mentioned the boot lockups happened on some x86_64 too, do you
> have a .config of the lockup case? I've tried tweaking some rcu options but
> still nothing.
> 

I have a bunch of debug options enabled. Configuration (generated using
"make savedefconfig") for x86_64 is attached.

Thanks,
Guenter

[-- Attachment #2: defconfig.gz --]
[-- Type: application/gzip, Size: 2834 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in
  2024-10-02 13:52     ` Guenter Roeck
@ 2024-10-02 14:44       ` Vlastimil Babka
  0 siblings, 0 replies; 8+ messages in thread
From: Vlastimil Babka @ 2024-10-02 14:44 UTC (permalink / raw)
  To: Guenter Roeck, Christoph Lameter, David Rientjes
  Cc: Roman Gushchin, Andrew Morton, Hyeonggon Yoo, linux-kernel,
	linux-mm, Paul E. McKenney, Boqun Feng, Uladzislau Rezki, rcu,
	David Gow, Rae Moar, linux-kselftest, kunit-dev, Brendan Higgins

On 10/2/24 15:52, Guenter Roeck wrote:
> On 10/2/24 03:26, Vlastimil Babka wrote:
>> On 10/1/24 18:20, Vlastimil Babka wrote:
>>> Guenter Roeck reports that the new slub kunit tests added by commit
>>> 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and
>>> test_leak_destroy()") cause a lockup on boot on several architectures
>>> when the kunit tests are configured to be built-in and not modules.
>>>
>>> The test_kfree_rcu test invokes kfree_rcu() and boot sequence inspection
>>> showed the runner for built-in kunit tests kunit_run_all_tests() is
>>> called before setting system_state to SYSTEM_RUNNING and calling
>>> rcu_end_inkernel_boot(), so this seems like a likely cause. So while I
>>> was unable to reproduce the problem myself, skipping the test when the
>>> slub_kunit module is built-in should avoid the issue.
>>>
>>> An alternative fix that was moving the call to kunit_run_all_tests() a
>>> bit later in the boot was tried, but has broken tests with functions
>>> marked as __init due to free_initmem() already being done.
>>>
>>> Fixes: 4e1c44b3db79 ("kunit, slub: add test_kfree_rcu() and test_leak_destroy()")
>>> Reported-by: Guenter Roeck <linux@roeck-us.net>
>>> Closes: https://lore.kernel.org/all/6fcb1252-7990-4f0d-8027-5e83f0fb9409@roeck-us.net/
>> 
>> I hope you can confirm it helps, because the commit added two tests and I've
>> only skipped one of them, as it's the one using kfree_rcu(), which is
>> suspected. But the other is responsible for the (now suppressed)
>> kmem_cache_destroy() warning, and maybe I'm missing something and it was
>> actually that one causing the lockups.
>> 
> 
> Everything works with your patches applied, so we are good.

Thanks for testing! Queued for -next now and will send to Linus later if
all's good.

>> Since you mentioned the boot lockups happened on some x86_64 too, do you
>> have a .config of the lockup case? I've tried tweaking some rcu options but
>> still nothing.
>> 
> 
> I have a bunch of debug options enabled. Configuration (generated using
> "make savedefconfig") for x86_64 is attached.

Hmm, didn't see the hang with that (using virtme-ng) on v6.12-rc1. Guess
there's something more to it. Oh well.

> Thanks,
> Guenter



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-10-02 14:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-01 16:20 [PATCH slab hotfixes v2 0/2] slub kunit tests fixes for 6.12 Vlastimil Babka
2024-10-01 16:20 ` [PATCH slab hotfixes v2 1/2] mm, slab: suppress warnings in test_leak_destroy kunit test Vlastimil Babka
2024-10-02 13:44   ` Guenter Roeck
2024-10-01 16:20 ` [PATCH slab hotfixes v2 2/2] slub/kunit: skip test_kfree_rcu when the slub kunit test is built-in Vlastimil Babka
2024-10-02 10:26   ` Vlastimil Babka
2024-10-02 13:52     ` Guenter Roeck
2024-10-02 14:44       ` Vlastimil Babka
2024-10-02 13:46   ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).