* [PATCH 0/6] genirq/test: Platform/architecture fixes
@ 2025-08-18 19:27 Brian Norris
  2025-08-18 19:27 ` [PATCH 1/6] genirq/test: Select IRQ_DOMAIN Brian Norris
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

The new kunit tests at kernel/irq/irq_test.c were primarily tested on
x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
architectures that throw complications in the mix, with various CPU
hotplug and IRQ implementation choices.

Guenter has been dutifully noticing and reporting these errors, in
places like:
https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/

I hope I've addressed all the failures, but it's hard to tell when I
don't have cross-compilers and QEMU setups for all of these
architectures.

I've tested what I could on arm, powerpc, x86_64, and um ARCH.

This series is based on David's patch for these tests:

[PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/


Brian Norris (6):
  genirq/test: Select IRQ_DOMAIN
  genirq/test: Factor out fake-virq setup
  genirq/test: Fail early if we can't request an IRQ
  genirq/test: Skip managed-affinity tests with !SPARSE_IRQ
  genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
  genirq/test: Ensure CPU 1 is online for hotplug test

 kernel/irq/Kconfig    |  1 +
 kernel/irq/irq_test.c | 64 ++++++++++++++++++++-----------------------
 2 files changed, 31 insertions(+), 34 deletions(-)

-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 1/6] genirq/test: Select IRQ_DOMAIN
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-18 19:27 ` [PATCH 2/6] genirq/test: Factor out fake-virq setup Brian Norris
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

These tests use irq_domain_alloc_descs() and so require
CONFIG_IRQ_DOMAIN.

Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Closes: https://lore.kernel.org/lkml/ded44edf-eeb7-420c-b8a8-d6543b955e6e@roeck-us.net/
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 1da5e9d9da71..08088b8e95ae 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -148,6 +148,7 @@ config IRQ_KUNIT_TEST
 	bool "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
 	depends on KUNIT=y
 	default KUNIT_ALL_TESTS
+	select IRQ_DOMAIN
 	imply SMP
 	help
 	  This option enables KUnit tests for the IRQ subsystem API. These are
-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 2/6] genirq/test: Factor out fake-virq setup
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
  2025-08-18 19:27 ` [PATCH 1/6] genirq/test: Select IRQ_DOMAIN Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-18 19:27 ` [PATCH 3/6] genirq/test: Fail early if we can't request an IRQ Brian Norris
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

We have to repeat a few things in tests. Factor out the creation of fake
IRQs.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/irq_test.c | 45 +++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
index e220e7b2fc18..f8f4532c2805 100644
--- a/kernel/irq/irq_test.c
+++ b/kernel/irq/irq_test.c
@@ -41,15 +41,15 @@ static struct irq_chip fake_irq_chip = {
 	.flags          = IRQCHIP_SKIP_SET_WAKE,
 };
 
-static void irq_disable_depth_test(struct kunit *test)
+static int irq_test_setup_fake_irq(struct kunit *test, struct irq_affinity_desc *affd)
 {
 	struct irq_desc *desc;
-	int virq, ret;
+	int virq;
 
-	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, affd);
 	KUNIT_ASSERT_GE(test, virq, 0);
 
-	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+	irq_set_chip_and_handler(virq, &fake_irq_chip, handle_simple_irq);
 
 	desc = irq_to_desc(virq);
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
@@ -57,6 +57,19 @@ static void irq_disable_depth_test(struct kunit *test)
 	/* On some architectures, IRQs are NOREQUEST | NOPROBE by default. */
 	irq_settings_clr_norequest(desc);
 
+	return virq;
+}
+
+static void irq_disable_depth_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	int virq, ret;
+
+	virq = irq_test_setup_fake_irq(test, NULL);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
 	KUNIT_EXPECT_EQ(test, ret, 0);
 
@@ -76,17 +89,11 @@ static void irq_free_disabled_test(struct kunit *test)
 	struct irq_desc *desc;
 	int virq, ret;
 
-	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
-	KUNIT_ASSERT_GE(test, virq, 0);
-
-	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+	virq = irq_test_setup_fake_irq(test, NULL);
 
 	desc = irq_to_desc(virq);
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
 
-	/* On some architectures, IRQs are NOREQUEST | NOPROBE by default. */
-	irq_settings_clr_norequest(desc);
-
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
 	KUNIT_EXPECT_EQ(test, ret, 0);
 
@@ -118,17 +125,11 @@ static void irq_shutdown_depth_test(struct kunit *test)
 	if (!IS_ENABLED(CONFIG_SMP))
 		kunit_skip(test, "requires CONFIG_SMP for managed shutdown");
 
-	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
-	KUNIT_ASSERT_GE(test, virq, 0);
-
-	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+	virq = irq_test_setup_fake_irq(test, &affinity);
 
 	desc = irq_to_desc(virq);
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
 
-	/* On some architectures, IRQs are NOREQUEST | NOPROBE by default. */
-	irq_settings_clr_norequest(desc);
-
 	data = irq_desc_get_irq_data(desc);
 	KUNIT_ASSERT_PTR_NE(test, data, NULL);
 
@@ -181,17 +182,11 @@ static void irq_cpuhotplug_test(struct kunit *test)
 
 	cpumask_copy(&affinity.mask, cpumask_of(1));
 
-	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
-	KUNIT_ASSERT_GE(test, virq, 0);
-
-	irq_set_chip_and_handler(virq, &fake_irq_chip, handle_simple_irq);
+	virq = irq_test_setup_fake_irq(test, &affinity);
 
 	desc = irq_to_desc(virq);
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
 
-	/* On some architectures, IRQs are NOREQUEST | NOPROBE by default. */
-	irq_settings_clr_norequest(desc);
-
 	data = irq_desc_get_irq_data(desc);
 	KUNIT_ASSERT_PTR_NE(test, data, NULL);
 
-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 3/6] genirq/test: Fail early if we can't request an IRQ
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
  2025-08-18 19:27 ` [PATCH 1/6] genirq/test: Select IRQ_DOMAIN Brian Norris
  2025-08-18 19:27 ` [PATCH 2/6] genirq/test: Factor out fake-virq setup Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-18 19:27 ` [PATCH 4/6] genirq/test: Skip managed-affinity tests with !SPARSE_IRQ Brian Norris
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

Requesting the IRQ is part of basic setup of the test. If it fails, most
of the subsequent tests are likely to fail, and the output gets noisy.
Use "assert" to fail early.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/irq_test.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
index f8f4532c2805..56baeb5041d6 100644
--- a/kernel/irq/irq_test.c
+++ b/kernel/irq/irq_test.c
@@ -71,7 +71,7 @@ static void irq_disable_depth_test(struct kunit *test)
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
 
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
-	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	KUNIT_EXPECT_EQ(test, desc->depth, 0);
 
@@ -95,7 +95,7 @@ static void irq_free_disabled_test(struct kunit *test)
 	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
 
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
-	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	KUNIT_EXPECT_EQ(test, desc->depth, 0);
 
@@ -106,7 +106,7 @@ static void irq_free_disabled_test(struct kunit *test)
 	KUNIT_EXPECT_GE(test, desc->depth, 1);
 
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
-	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
 	KUNIT_EXPECT_EQ(test, desc->depth, 0);
 
 	free_irq(virq, NULL);
@@ -134,7 +134,7 @@ static void irq_shutdown_depth_test(struct kunit *test)
 	KUNIT_ASSERT_PTR_NE(test, data, NULL);
 
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
-	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
 	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
@@ -191,7 +191,7 @@ static void irq_cpuhotplug_test(struct kunit *test)
 	KUNIT_ASSERT_PTR_NE(test, data, NULL);
 
 	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
-	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
 
 	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
 	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 4/6] genirq/test: Skip managed-affinity tests with !SPARSE_IRQ
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
                   ` (2 preceding siblings ...)
  2025-08-18 19:27 ` [PATCH 3/6] genirq/test: Fail early if we can't request an IRQ Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-18 19:27 ` [PATCH 5/6] genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions Brian Norris
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

Managed-affinity is only supported with CONFIG_SPARSE_IRQ=y, so
irq_shutdown_depth_test() would fail with !irqd_affinity_is_managed().
Skip such tests if they're run without support.

Many architectures 'select SPARSE_IRQ', so this is easy to miss.

Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/irq_test.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
index 56baeb5041d6..ba85e4eb5211 100644
--- a/kernel/irq/irq_test.c
+++ b/kernel/irq/irq_test.c
@@ -46,6 +46,9 @@ static int irq_test_setup_fake_irq(struct kunit *test, struct irq_affinity_desc
 	struct irq_desc *desc;
 	int virq;
 
+	if (affd && !IS_ENABLED(CONFIG_SPARSE_IRQ))
+		kunit_skip(test, "requires CONFIG_SPARSE_IRQ for managed affinity");
+
 	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, affd);
 	KUNIT_ASSERT_GE(test, virq, 0);
 
-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 5/6] genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
                   ` (3 preceding siblings ...)
  2025-08-18 19:27 ` [PATCH 4/6] genirq/test: Skip managed-affinity tests with !SPARSE_IRQ Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-18 19:27 ` [PATCH 6/6] genirq/test: Ensure CPU 1 is online for hotplug test Brian Norris
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

Not all platforms use the generic IRQ migration code, even if they
select GENERIC_IRQ_MIGRATION. (See, for example, powerpc /
pseries_cpu_disable().)

If such platforms don't perform managed shutdown the same way, the IRQ
may not actually shut down, and we'll fail these tests:

[    4.357022][  T101]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:211
[    4.357022][  T101]     Expected irqd_is_activated(data) to be false, but is true
[    4.358128][  T101]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
[    4.358128][  T101]     Expected irqd_is_started(data) to be false, but is true
[    4.375558][  T101]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:216
[    4.375558][  T101]     Expected irqd_is_activated(data) to be false, but is true
[    4.376088][  T101]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:217
[    4.376088][  T101]     Expected irqd_is_started(data) to be false, but is true
[    4.377851][    T1]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
[    4.377901][    T1]     not ok 4 irq_cpuhotplug_test
[    4.378073][    T1] # irq_test_cases: pass:3 fail:1 skip:0 total:4

Rather than test that PowerPC performs migration the same way as the IRQ
core, let's just drop the state checks. The point of the test was to
ensure we kept |depth| balanced, and we can still test for that.

Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/irq_test.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
index ba85e4eb5211..553963136259 100644
--- a/kernel/irq/irq_test.c
+++ b/kernel/irq/irq_test.c
@@ -206,13 +206,9 @@ static void irq_cpuhotplug_test(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, desc->depth, 1);
 
 	KUNIT_EXPECT_EQ(test, remove_cpu(1), 0);
-	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
-	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
 	KUNIT_EXPECT_GE(test, desc->depth, 1);
 	KUNIT_EXPECT_EQ(test, add_cpu(1), 0);
 
-	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
-	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
 	KUNIT_EXPECT_EQ(test, desc->depth, 1);
 
 	enable_irq(virq);
-- 
2.51.0.rc1.167.g924127e9c0-goog



* [PATCH 6/6] genirq/test: Ensure CPU 1 is online for hotplug test
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
                   ` (4 preceding siblings ...)
  2025-08-18 19:27 ` [PATCH 5/6] genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions Brian Norris
@ 2025-08-18 19:27 ` Brian Norris
  2025-08-20  7:00 ` [PATCH 0/6] genirq/test: Platform/architecture fixes David Gow
  2025-08-21 17:02 ` Guenter Roeck
  7 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-18 19:27 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Gow, Guenter Roeck, linux-kernel, kunit-dev, Brian Norris

It's possible to run these tests on platforms that think they have a
hotpluggable CPU1, but for whatever reason, CPU1 is not online and can't
be brought online:

    # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
    Expected remove_cpu(1) == 0, but
        remove_cpu(1) == 1 (0x1)
CPU1: failed to boot: -38
    # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:214
    Expected add_cpu(1) == 0, but
        add_cpu(1) == -38 (0xffffffffffffffda)

Check that CPU1 is actually online before trying to run the test.

Fixes: 66067c3c8a1e ("genirq: Add kunit tests for depth counts")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 kernel/irq/irq_test.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
index 553963136259..2e7adf06fd17 100644
--- a/kernel/irq/irq_test.c
+++ b/kernel/irq/irq_test.c
@@ -182,6 +182,8 @@ static void irq_cpuhotplug_test(struct kunit *test)
 		kunit_skip(test, "requires more than 1 CPU for CPU hotplug");
 	if (!cpu_is_hotpluggable(1))
 		kunit_skip(test, "CPU 1 must be hotpluggable");
+	if (!cpu_online(1))
+		kunit_skip(test, "CPU 1 must be online");
 
 	cpumask_copy(&affinity.mask, cpumask_of(1));
 
-- 
2.51.0.rc1.167.g924127e9c0-goog



* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
                   ` (5 preceding siblings ...)
  2025-08-18 19:27 ` [PATCH 6/6] genirq/test: Ensure CPU 1 is online for hotplug test Brian Norris
@ 2025-08-20  7:00 ` David Gow
  2025-08-20 17:22   ` Brian Norris
  2025-08-21 17:02 ` Guenter Roeck
  7 siblings, 1 reply; 17+ messages in thread
From: David Gow @ 2025-08-20  7:00 UTC (permalink / raw)
  To: Brian Norris; +Cc: Thomas Gleixner, Guenter Roeck, linux-kernel, kunit-dev

On Tue, 19 Aug 2025 at 03:28, Brian Norris <briannorris@chromium.org> wrote:
>
> The new kunit tests at kernel/irq/irq_test.c were primarily tested on
> x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
> architectures that throw complications in the mix, with various CPU
> hotplug and IRQ implementation choices.
>
> Guenter has been dutifully noticing and reporting these errors, in
> places like:
> https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/
>
> I hope I've addressed all the failures, but it's hard to tell when I
> don't have cross-compilers and QEMU setups for all of these
> architectures.
>
> I've tested what I could on arm, powerpc, x86_64, and um ARCH.
>
> This series is based on David's patch for these tests:
>
> [PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
> https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/
>
>

Thanks very much. These patches all look good to me, so the series is:

Reviewed-by: David Gow <davidgow@google.com>

I am, however, still getting test failures on m68k (with CONFIG_VIRT=y):
./tools/testing/kunit/kunit.py run --arch m68k --cross_compile m68k-linux-gnu- irq*
[14:54:23] =============== irq_test_cases (4 subtests) ================
[14:54:23]     # irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
[14:54:23]     Expected virq >= 0, but
[14:54:23]         virq == -12 (0xfffffffffffffff4)
[14:54:23] [FAILED] irq_disable_depth_test
[14:54:23]     # irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
[14:54:23]     Expected virq >= 0, but
[14:54:23]         virq == -12 (0xfffffffffffffff4)
[14:54:23] [FAILED] irq_free_disabled_test
[14:54:23] [SKIPPED] irq_shutdown_depth_test
[14:54:23] [SKIPPED] irq_cpuhotplug_test
[14:54:23]     # module: irq_test
[14:54:23] # irq_test_cases: pass:0 fail:2 skip:2 total:4
[14:54:23] # Totals: pass:0 fail:2 skip:2 total:4
[14:54:23] ================= [FAILED] irq_test_cases ==================
[14:54:23] ============================================================
[14:54:23] Testing complete. Ran 4 tests: failed: 2, skipped: 2

Looks like __irq_alloc_descs() is returning -ENOMEM (as
irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
CONFIG_SPARSE_IRQ=n).

But all of the other architectures I found worked okay, so this is at
least an improvement.

Thanks,
-- David

> Brian Norris (6):
>   genirq/test: Select IRQ_DOMAIN
>   genirq/test: Factor out fake-virq setup
>   genirq/test: Fail early if we can't request an IRQ
>   genirq/test: Skip managed-affinity tests with !SPARSE_IRQ
>   genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
>   genirq/test: Ensure CPU 1 is online for hotplug test
>
>  kernel/irq/Kconfig    |  1 +
>  kernel/irq/irq_test.c | 64 ++++++++++++++++++++-----------------------
>  2 files changed, 31 insertions(+), 34 deletions(-)
>
> --
> 2.51.0.rc1.167.g924127e9c0-goog
>


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-20  7:00 ` [PATCH 0/6] genirq/test: Platform/architecture fixes David Gow
@ 2025-08-20 17:22   ` Brian Norris
  2025-08-20 21:37     ` Guenter Roeck
  2025-08-21  3:45     ` David Gow
  0 siblings, 2 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-20 17:22 UTC (permalink / raw)
  To: David Gow; +Cc: Thomas Gleixner, Guenter Roeck, linux-kernel, kunit-dev

On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> Looks like __irq_alloc_descs() is returning -ENOMEM (as
> irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> CONFIG_SPARSE_IRQ=n).

Thanks for the insight. I bothered compiling my own qemu just so I can
run m68k this time, and I can reproduce.

I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
be a tweak to patch 4.

Or maybe just 'depends on !M68K', since architectures with higher
NR_IRQS headroom may still work even without SPARSE_IRQ.

> But all of the other architectures I found worked okay, so this is at
> least an improvement.

Thanks for the testing.

Brian


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-20 17:22   ` Brian Norris
@ 2025-08-20 21:37     ` Guenter Roeck
  2025-08-21  3:45     ` David Gow
  1 sibling, 0 replies; 17+ messages in thread
From: Guenter Roeck @ 2025-08-20 21:37 UTC (permalink / raw)
  To: Brian Norris, David Gow; +Cc: Thomas Gleixner, linux-kernel, kunit-dev

On 8/20/25 10:22, Brian Norris wrote:
> On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
>> Looks like __irq_alloc_descs() is returning -ENOMEM (as
>> irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
>> CONFIG_SPARSE_IRQ=n).
> 
> Thanks for the insight. I bothered compiling my own qemu just so I can
> run m68k this time, and I can reproduce.
> 
> I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> be a tweak to patch 4.
> 
> Or maybe just 'depends on !M68K', since architectures with higher
> NR_IRQS headroom may still work even without SPARSE_IRQ.
> 
>> But all of the other architectures I found worked okay, so this is at
>> least an improvement.
> 
> Thanks for the testing.
> 
I applied the series to my testing branch. I'll run a full test tonight and
report results tomorrow.

Guenter



* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-20 17:22   ` Brian Norris
  2025-08-20 21:37     ` Guenter Roeck
@ 2025-08-21  3:45     ` David Gow
  2025-08-21  7:05       ` Geert Uytterhoeven
  1 sibling, 1 reply; 17+ messages in thread
From: David Gow @ 2025-08-21  3:45 UTC (permalink / raw)
  To: Brian Norris, Geert Uytterhoeven
  Cc: Thomas Gleixner, Guenter Roeck, linux-kernel, kunit-dev

On Thu, 21 Aug 2025 at 01:22, Brian Norris <briannorris@chromium.org> wrote:
>
> On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> > Looks like __irq_alloc_descs() is returning -ENOMEM (as
> > irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> > CONFIG_SPARSE_IRQ=n).
>
> Thanks for the insight. I bothered compiling my own qemu just so I can
> run m68k this time, and I can reproduce.
>
> I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> be a tweak to patch 4.
>
> Or maybe just 'depends on !M68K', since architectures with higher
> NR_IRQS headroom may still work even without SPARSE_IRQ.
>

I'm not an m68k expert (so I've CCed Geert), but I think different
m68k configs do have different NR_IRQS, so it's possible there are
working m68k setups, too. (It also seems slightly suspicious to me
that exactly 200 IRQs are allocated here, though, so a lack of extra
headroom may be deliberate and/or triggered by something trying to
allocate all IRQs.)

Personally, I don't have any m68k machines lying around, so disabling
the test so my qemu scripts don't report errors is fine by me. Ideally
the dependency would be as narrow as possible, but that may well be
!M68K.

The other option would be to try to skip the test if there aren't free
IRQs, but maybe that'd hide real issues?

Regardless, I'll defer to the IRQ and m68k experts here: as long as
I'm not seeing errors, I'm happy. :-)

Cheers,
-- David


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-21  3:45     ` David Gow
@ 2025-08-21  7:05       ` Geert Uytterhoeven
  2025-08-21 15:32         ` Brian Norris
  0 siblings, 1 reply; 17+ messages in thread
From: Geert Uytterhoeven @ 2025-08-21  7:05 UTC (permalink / raw)
  To: David Gow
  Cc: Brian Norris, Thomas Gleixner, Guenter Roeck, linux-kernel,
	kunit-dev

Hi David,

On Thu, 21 Aug 2025 at 05:45, David Gow <davidgow@google.com> wrote:
> On Thu, 21 Aug 2025 at 01:22, Brian Norris <briannorris@chromium.org> wrote:
> > On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> > > Looks like __irq_alloc_descs() is returning -ENOMEM (as
> > > irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> > > CONFIG_SPARSE_IRQ=n).
> >
> > Thanks for the insight. I bothered compiling my own qemu just so I can
> > run m68k this time, and I can reproduce.
> >
> > I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> > CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> > SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> > be a tweak to patch 4.
> >
> > Or maybe just 'depends on !M68K', since architectures with higher
> > NR_IRQS headroom may still work even without SPARSE_IRQ.
>
> I'm not an m68k expert (so I've CCed Geert), but I think different
> m68k configs do have different NR_IRQS, so it's possible there are
> working m68k setups, too. (It also seems slightly suspicious to me
> that exactly 200 IRQs are allocated here, though, so a lack of extra
> headroom may be deliberate and/or triggered by something trying to
> allocate all IRQs.)
>
> Personally, I don't have any m68k machines lying around, so disabling
> the test so my qemu scripts don't report errors is fine by me. Ideally
> the dependency would be as narrow as possible, but that may well be
> !M68K.

M68k indeed has different values of NR_IRQS, based on the system(s)
support is enabled for.  These values are based on the IRQ hierarchy
of the system(s), which is rather fixed.  Hence this does not take
into account any additional irqchips that are being registered by
e.g. tests...

"git grep -w NR_IRQS -- arch/*/include/" shows m68k is not the only
architecture having that limitation...

> The other option would be to try to skip the test if there aren't free
> IRQs, but maybe that'd hide real issues?
>
> Regardless, I'll defer to the IRQ and m68k experts here: as long as
> I'm not seeing errors, I'm happy. :-)

kernel/irq/irqdesc.c:

    static bool irq_expand_nr_irqs(unsigned int nr)
    {
            if (nr > MAX_SPARSE_IRQS)
                    return false;
            nr_irqs = nr;
            return true;
    }

kernel/irq/internals.h:

    #ifdef CONFIG_SPARSE_IRQ
    # define MAX_SPARSE_IRQS        INT_MAX
    #else
    # define MAX_SPARSE_IRQS        NR_IRQS
    #endif

So probably the test should depend on SPARSE_IRQ?  Increasing NR_IRQS
everywhere when IRQ_KUNIT_TEST is enabled sounds rather invasive to me.
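To illustrate why the allocation fails only without SPARSE_IRQ, here is a minimal userspace sketch of the limit check quoted above. It is a simplified model, not the kernel implementation: every model_* name is hypothetical, MODEL_NR_IRQS is assumed to be 200 to match the m68k report in this thread, and the free-area search is reduced to a single "first free" counter.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed value, taken from the m68k report (nr_irqs == 200). */
#define MODEL_NR_IRQS 200

/* Hypothetical stand-in for the kernel's IRQ number space. */
struct model_irq_space {
	bool sparse;			/* models CONFIG_SPARSE_IRQ=y */
	unsigned int nr_irqs;		/* current upper bound on IRQ numbers */
	unsigned int first_free;	/* first IRQ number not yet allocated */
};

/* Models MAX_SPARSE_IRQS: INT_MAX with SPARSE_IRQ, NR_IRQS without. */
static unsigned int model_max_irqs(const struct model_irq_space *s)
{
	return s->sparse ? INT_MAX : MODEL_NR_IRQS;
}

/* Mirrors the shape of irq_expand_nr_irqs() as quoted above. */
static bool model_expand_nr_irqs(struct model_irq_space *s, unsigned int nr)
{
	if (nr > model_max_irqs(s))
		return false;
	s->nr_irqs = nr;
	return true;
}

/*
 * Allocate cnt descriptors at the first free number. Returns the first
 * allocated IRQ number, or -ENOMEM if the space cannot grow far enough.
 */
int model_alloc_descs(struct model_irq_space *s, unsigned int cnt)
{
	unsigned int start = s->first_free;

	if (start + cnt > s->nr_irqs && !model_expand_nr_irqs(s, start + cnt))
		return -ENOMEM;

	s->first_free = start + cnt;
	return (int)start;
}
```

With sparse == false and first_free == nr_irqs == MODEL_NR_IRQS, model_alloc_descs() returns -ENOMEM, which is -12 on Linux and matches the "virq == -12" assertion failures reported above. With sparse == true, the same call returns 200 and raises nr_irqs to 201.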

BTW, given the test calls irq_domain_alloc_descs(), I think it should
also depend on IRQ_DOMAIN.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-21  7:05       ` Geert Uytterhoeven
@ 2025-08-21 15:32         ` Brian Norris
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-21 15:32 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: David Gow, Thomas Gleixner, Guenter Roeck, linux-kernel,
	kunit-dev

Hi Geert,

On Thu, Aug 21, 2025 at 09:05:03AM +0200, Geert Uytterhoeven wrote:
> So probably the test should depend on SPARSE_IRQ?  Increasing NR_IRQS
> everywhere when IRQ_KUNIT_TEST is enabled sounds rather invasive to me.

Yeah, I was leaning toward 'depends on SPARSE_IRQ'.

> BTW, given the test calls irq_domain_alloc_descs(), I think it should
> also depend on IRQ_DOMAIN.

Right, that's in patch 1.

I'll resend the series with a 'depends on SPARSE_IRQ'.
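As a rough sketch of what that could look like (an assumption about the resend, not the actual v2 patch), the Kconfig entry from patch 1 would gain one more dependency line. The position of the new line is a guess, and the help text is left out here:

```
config IRQ_KUNIT_TEST
	bool "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
	depends on KUNIT=y
	# Assumed addition: needed so that fake IRQs can always be allocated.
	depends on SPARSE_IRQ
	default KUNIT_ALL_TESTS
	select IRQ_DOMAIN
	imply SMP
```

With such a dependency, configurations like the m68k ones above would no longer build the test at all, instead of failing in irq_domain_alloc_descs() at run time.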

Thanks,
Brian


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
                   ` (6 preceding siblings ...)
  2025-08-20  7:00 ` [PATCH 0/6] genirq/test: Platform/architecture fixes David Gow
@ 2025-08-21 17:02 ` Guenter Roeck
  2025-08-21 19:06   ` Brian Norris
  7 siblings, 1 reply; 17+ messages in thread
From: Guenter Roeck @ 2025-08-21 17:02 UTC (permalink / raw)
  To: Brian Norris; +Cc: Thomas Gleixner, David Gow, linux-kernel, kunit-dev

On Mon, Aug 18, 2025 at 12:27:37PM -0700, Brian Norris wrote:
> The new kunit tests at kernel/irq/irq_test.c were primarily tested on
> x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
> architectures that throw complications in the mix, with various CPU
> hotplug and IRQ implementation choices.
> 
> Guenter has been dutifully noticing and reporting these errors, in
> places like:
> https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/
> 
> I hope I've addressed all the failures, but it's hard to tell when I
> don't have cross-compilers and QEMU setups for all of these
> architectures.
> 
> I've tested what I could on arm, powerpc, x86_64, and um ARCH.
> 
> This series is based on David's patch for these tests:
> 
> [PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
> https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/
> 
Looks pretty good.

Build results:
	total: 162 pass: 162 fail: 0
Qemu test results:
	total: 637 pass: 637 fail: 0
Unit test results:
	pass: 640616 fail: 13
Failed unit tests:
	arm64:imx8mp-evk:irq_cpuhotplug_test
	arm64:imx8mp-evk:irq_test_cases
	m68k:q800:irq_test_cases
	m68k:virt:irq_test_cases

Individual failures:

[   32.613761]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
[   32.613761]     Expected remove_cpu(1) == 0, but
[   32.613761]         remove_cpu(1) == -16 (0xfffffffffffffff0)
[   32.621522]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
[   32.621522]     Expected add_cpu(1) == 0, but
[   32.621522]         add_cpu(1) == 1 (0x1)
[   32.630930]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1

    # irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
    Expected virq >= 0, but
        virq == -12 (0xfffffffffffffff4)
    # irq_disable_depth_test: pass:0 fail:1 skip:0 total:1
    not ok 1 irq_disable_depth_test
    # irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
    Expected virq >= 0, but
        virq == -12 (0xfffffffffffffff4)
    # irq_free_disabled_test: pass:0 fail:1 skip:0 total:1

Guenter

* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-21 17:02 ` Guenter Roeck
@ 2025-08-21 19:06   ` Brian Norris
  2025-08-22 18:34     ` Guenter Roeck
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Norris @ 2025-08-21 19:06 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Thomas Gleixner, David Gow, linux-kernel, kunit-dev

On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> Build results:
> 	total: 162 pass: 162 fail: 0
> Qemu test results:
> 	total: 637 pass: 637 fail: 0
> Unit test results:
> 	pass: 640616 fail: 13
> Failed unit tests:
> 	arm64:imx8mp-evk:irq_cpuhotplug_test
> 	arm64:imx8mp-evk:irq_test_cases
> 	m68k:q800:irq_test_cases
> 	m68k:virt:irq_test_cases
> 
> Individual failures:
> 
> [   32.613761]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> [   32.613761]     Expected remove_cpu(1) == 0, but
> [   32.613761]         remove_cpu(1) == -16 (0xfffffffffffffff0)
> [   32.621522]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> [   32.621522]     Expected add_cpu(1) == 0, but
> [   32.621522]         add_cpu(1) == 1 (0x1)
> [   32.630930]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1

I managed to get an imx8mp-evk setup running (both little and big
endian) and couldn't reproduce. But I'm guessing based on the logs that
we're racing with pci_call_probe(), which disables CPU hotplug
(cpu_hotplug_disable()) for its duration.

I'm not sure how to handle that.

1. I could just SKIP the test on EBUSY. But that'd make for flaky test
   coverage.
2. Expose some method to block cpu_hotplug_disable() users temporarily.
3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
   "integration test"; but it's still useful IMO...)
4. Add an EBUSY retry loop? Or some other similar polling (if we had,
   say, a cpu_hotplug_disabled() API).

>     # irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
>     Expected virq >= 0, but
>         virq == -12 (0xfffffffffffffff4)
>     # irq_disable_depth_test: pass:0 fail:1 skip:0 total:1
>     not ok 1 irq_disable_depth_test
>     # irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
>     Expected virq >= 0, but
>         virq == -12 (0xfffffffffffffff4)
>     # irq_free_disabled_test: pass:0 fail:1 skip:0 total:1

We've discussed this one, and I have a fix (depends on SPARSE_IRQ).
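
For anyone reading the logs above: the two return values are negated errno
codes. On Linux, -16 is -EBUSY (the hotplug failure) and -12 is -ENOMEM (the
virq allocation failure). A minimal userspace sketch of that mapping follows;
errname() is a hypothetical helper for illustration only, not a kernel API,
and it handles just the two codes seen in this thread.

```c
#include <errno.h>

/*
 * Hypothetical helper (not a kernel API): map a negative kernel-style
 * return value to its symbolic errno name. Only the two codes that
 * appear in the logs in this thread are handled.
 */
static const char *errname(int ret)
{
	switch (-ret) {
	case EBUSY:
		return "EBUSY";		/* 16 on Linux */
	case ENOMEM:
		return "ENOMEM";	/* 12 on Linux */
	default:
		return "unknown";
	}
}
```

With this, errname(-16) gives "EBUSY" and errname(-12) gives "ENOMEM" on
Linux, matching the remove_cpu(1) and virq values reported above.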

Brian

* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-21 19:06   ` Brian Norris
@ 2025-08-22 18:34     ` Guenter Roeck
  2025-08-22 19:01       ` Brian Norris
  0 siblings, 1 reply; 17+ messages in thread
From: Guenter Roeck @ 2025-08-22 18:34 UTC (permalink / raw)
  To: Brian Norris; +Cc: Thomas Gleixner, David Gow, linux-kernel, kunit-dev

On 8/21/25 12:06, Brian Norris wrote:
> On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
>> Build results:
>> 	total: 162 pass: 162 fail: 0
>> Qemu test results:
>> 	total: 637 pass: 637 fail: 0
>> Unit test results:
>> 	pass: 640616 fail: 13
>> Failed unit tests:
>> 	arm64:imx8mp-evk:irq_cpuhotplug_test
>> 	arm64:imx8mp-evk:irq_test_cases
>> 	m68k:q800:irq_test_cases
>> 	m68k:virt:irq_test_cases
>>
>> Individual failures:
>>
>> [   32.613761]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
>> [   32.613761]     Expected remove_cpu(1) == 0, but
>> [   32.613761]         remove_cpu(1) == -16 (0xfffffffffffffff0)
>> [   32.621522]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
>> [   32.621522]     Expected add_cpu(1) == 0, but
>> [   32.621522]         add_cpu(1) == 1 (0x1)
>> [   32.630930]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
> 
> I managed to get an imx8mp-evk setup running (both little and big
> endian) and couldn't reproduce. But I'm guessing based on the logs that
> we're racing with pci_call_probe(), which disables CPU hotplug
> (cpu_hotplug_disable()) for its duration.
> 
> I'm not sure how to handle that.
> 
> 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
>     coverage.
> 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
>     "integration test"; but it's still useful IMO...)
> 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
>     say, a cpu_hotplug_disabled() API).
> 

Here is an additional data point: It only happens with big endian tests.
This always happens in my setup, and it only happens when booting from
virtio-pci but not when booting from other devices.

I just re-ran the test and it passed this time, so this is apparently
a flake. I'd suggest to ignore it for now. If I see it again and find
a clean way to reproduce it we can have another look. The emulated PCIe
controller for imx8mp-evk isn't exactly stable, so this may just be a side
effect of emulation problems.

Guenter


* Re: [PATCH 0/6] genirq/test: Platform/architecture fixes
  2025-08-22 18:34     ` Guenter Roeck
@ 2025-08-22 19:01       ` Brian Norris
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Norris @ 2025-08-22 19:01 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Thomas Gleixner, David Gow, linux-kernel, kunit-dev

On Fri, Aug 22, 2025 at 11:34:04AM -0700, Guenter Roeck wrote:
> On 8/21/25 12:06, Brian Norris wrote:
> > On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> > > Build results:
> > > 	total: 162 pass: 162 fail: 0
> > > Qemu test results:
> > > 	total: 637 pass: 637 fail: 0
> > > Unit test results:
> > > 	pass: 640616 fail: 13
> > > Failed unit tests:
> > > 	arm64:imx8mp-evk:irq_cpuhotplug_test
> > > 	arm64:imx8mp-evk:irq_test_cases
> > > 	m68k:q800:irq_test_cases
> > > 	m68k:virt:irq_test_cases
> > > 
> > > Individual failures:
> > > 
> > > [   32.613761]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> > > [   32.613761]     Expected remove_cpu(1) == 0, but
> > > [   32.613761]         remove_cpu(1) == -16 (0xfffffffffffffff0)
> > > [   32.621522]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> > > [   32.621522]     Expected add_cpu(1) == 0, but
> > > [   32.621522]         add_cpu(1) == 1 (0x1)
> > > [   32.630930]     # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
> > 
> > I managed to get an imx8mp-evk setup running (both little and big
> > endian) and couldn't reproduce. But I'm guessing based on the logs that
> > we're racing with pci_call_probe(), which disables CPU hotplug
> > (cpu_hotplug_disable()) for its duration.
> > 
> > I'm not sure how to handle that.
> > 
> > 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
> >     coverage.
> > 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> > 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
> >     "integration test"; but it's still useful IMO...)
> > 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
> >     say, a cpu_hotplug_disabled() API).

Ah, I see that add_cpu() (cpu_subsys_online()) already has an -EBUSY
retry loop, but remove_cpu() doesn't. So #4 seems like a good solution.
It might even make sense to retry in cpu_subsys_offline(), rather than
just in the test.

I'll give this some thought for later though.
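
As a rough illustration of option #4, here is a userspace sketch of a bounded
-EBUSY retry loop. It is not the kernel implementation: fake_remove_cpu(),
remove_cpu_retry(), busy_calls_left and max_tries are all hypothetical names,
and the stub simply returns -EBUSY a few times to mimic hotplug being
temporarily disabled. A real in-kernel version would presumably also sleep
between attempts (for example with msleep()) rather than retrying
back-to-back.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for remove_cpu(): reports -EBUSY for the next
 * busy_calls_left calls, then succeeds. This only simulates a transient
 * "hotplug disabled" window; it does not touch any real CPU state.
 */
static int busy_calls_left = 3;

static int fake_remove_cpu(unsigned int cpu)
{
	(void)cpu;
	if (busy_calls_left > 0) {
		busy_calls_left--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Call fake_remove_cpu() up to max_tries times, retrying only while it
 * returns -EBUSY. Any other result (success or a different error) is
 * returned immediately. If every attempt is busy, -EBUSY is returned.
 */
static int remove_cpu_retry(unsigned int cpu, int max_tries)
{
	int ret = -EBUSY;
	int i;

	for (i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = fake_remove_cpu(cpu);

	return ret;
}
```

With the stub busy for three calls, remove_cpu_retry(1, 10) succeeds on the
fourth attempt, while a budget of two attempts still returns -EBUSY. Bounding
the loop keeps a persistent hotplug-disable from hanging the test.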

> Here is an additional data point: It only happens with big endian tests.
> This always happens in my setup, and it only happens when booting from
> virtio-pci but not when booting from other devices.
> 
> I just re-ran the test and it passed this time, so this is apparently
> a flake. I'd suggest to ignore it for now. If I see it again and find
> a clean way to reproduce it we can have another look. The emulated PCIe
> controller for imx8mp-evk isn't exactly stable, so this may just be a side
> effect of emulation problems.

This furthers my suspicion that it's a race with PCIe probing. On the
failure case, the test is running right after some PCI scan logs.

But I'm fine deferring for now, since it's not very reproducible.

Brian

Thread overview: 17+ messages
2025-08-18 19:27 [PATCH 0/6] genirq/test: Platform/architecture fixes Brian Norris
2025-08-18 19:27 ` [PATCH 1/6] genirq/test: Select IRQ_DOMAIN Brian Norris
2025-08-18 19:27 ` [PATCH 2/6] genirq/test: Factor out fake-virq setup Brian Norris
2025-08-18 19:27 ` [PATCH 3/6] genirq/test: Fail early if we can't request an IRQ Brian Norris
2025-08-18 19:27 ` [PATCH 4/6] genirq/test: Skip managed-affinity tests with !SPARSE_IRQ Brian Norris
2025-08-18 19:27 ` [PATCH 5/6] genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions Brian Norris
2025-08-18 19:27 ` [PATCH 6/6] genirq/test: Ensure CPU 1 is online for hotplug test Brian Norris
2025-08-20  7:00 ` [PATCH 0/6] genirq/test: Platform/architecture fixes David Gow
2025-08-20 17:22   ` Brian Norris
2025-08-20 21:37     ` Guenter Roeck
2025-08-21  3:45     ` David Gow
2025-08-21  7:05       ` Geert Uytterhoeven
2025-08-21 15:32         ` Brian Norris
2025-08-21 17:02 ` Guenter Roeck
2025-08-21 19:06   ` Brian Norris
2025-08-22 18:34     ` Guenter Roeck
2025-08-22 19:01       ` Brian Norris
