serious bugs in KUnit framework makes test completely useless

* serious bugs in KUnit framework makes test completely useless
@ 2026-02-27  9:40 Andy Shevchenko
  2026-02-27 10:42 ` David Gow
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Shevchenko @ 2026-02-27  9:40 UTC (permalink / raw)
  To: David Gow, Brendan Higgins, Rae Moar; +Cc: linux-kselftest, kunit-dev

Hi!

I have stumbled over the kunit framework issues that make the respective test
cases useless.

Now to the details.
Consider having today's Linux Next.

Scenario 1 (good):

I run

	./tools/testing/kunit/kunit.py config
	./tools/testing/kunit/kunit.py run printf

Everything works as expected:

  [10:19:36] Testing complete. Ran 28 tests: passed: 28
  [10:19:36] Elapsed time: 15.929s total, 0.001s configuring, 15.761s building, 0.114s running

Scenario 2 (BAD):

I applied the following change:

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -18,6 +18,7 @@
  */

 #include <linux/stdarg.h>
+#include <linux/bitops.h>
 #include <linux/build_bug.h>
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
@@ -2904,12 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)

 		case FORMAT_STATE_NUM: {
 			unsigned long long num;
+			u8 shift = fmt.size * 8 - 1;

 			if (fmt.size > sizeof(int))
 				num = va_arg(args, long long);
 			else
-				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
-			str = number(str, end, num, spec);
+				num = va_arg(args, int);
+			num = sign_extend64(num, shift);
+			if (spec.flags & SIGN)
+				str = number(str, end, num, spec);
+			else
+				str = number(str, end, -(long long)num, spec);
 			continue;
 		}

Tests went into cosmos (I waited a few minutes and had to interrupt that):

  ^CERROR:root:Build interruption occurred. Cleaning console.
  ^CERROR:root:Build interruption occurred. Cleaning console.
  ^CERROR:root:Build interruption occurred. Cleaning console.
  Command '['.kunit/linux', 'kunit.filter_glob=printf', 'kunit.enable=1', 'mem=1G', 'console=tty', 'kunit_shutdown=halt']' timed out after 300 seconds
  [10:29:52] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
  [10:29:52] ============================================================
  [10:29:52] Testing complete. Ran 0 tests: errors: 1
  [10:29:52] Elapsed time: 305.676s total, 0.001s configuring, 5.669s building, 300.006s running

NOTE!
Regardless of how long I waited, the elapsed time is about 5 minutes
(it seems to be the 300-second limit stated in the output).

Scenario 3 (BAD):

Now I took again a clean tree and applied this change:

--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -18,6 +18,7 @@
  */

 #include <linux/stdarg.h>
+#include <linux/bitops.h>
 #include <linux/build_bug.h>
 #include <linux/clk.h>
 #include <linux/clk-provider.h>
@@ -2904,11 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)

 		case FORMAT_STATE_NUM: {
 			unsigned long long num;
+			u8 shift = fmt.size * 8 - 1;

 			if (fmt.size > sizeof(int))
 				num = va_arg(args, long long);
+			else {
+				num = va_arg(args, int);
+			if ((spec.flags & SIGN))
+				num = sign_extend64(num, shift);
 			else
-				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
+				num &= ~(BIT_ULL(shift) - 1);
+			}
 			str = number(str, end, num, spec);
 			continue;
 		}

and run tests again.

  [10:39:16] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
  [10:39:16] ============================================================
  [10:39:16] Testing complete. Ran 0 tests: errors: 1
  [10:39:16] Elapsed time: 5.762s total, 0.001s configuring, 5.694s building, 0.067s running

It runs fast and is completely useless. (There is no build error.)

...

Please, fix this as it is a serious issue and really makes kunit useless.

-- 
With Best Regards,
Andy Shevchenko


* Re: serious bugs in KUnit framework makes test completely useless
  2026-02-27  9:40 serious bugs in KUnit framework makes test completely useless Andy Shevchenko
@ 2026-02-27 10:42 ` David Gow
  2026-02-27 14:43   ` Andy Shevchenko
  0 siblings, 1 reply; 4+ messages in thread
From: David Gow @ 2026-02-27 10:42 UTC (permalink / raw)
  To: Andy Shevchenko, Brendan Higgins, Rae Moar; +Cc: linux-kselftest, kunit-dev

On 27/02/2026 at 5:40 PM, Andy Shevchenko wrote:
> 
> Hi!
> 
> I have stumbled over the kunit framework issues that make the respective test
> cases useless.
> 
> Now to the details.
> Consider having today's Linux Next.


Hi Andy,

Sorry to hear that KUnit is causing trouble. It looks like this is due 
to those patches crashing the kernel before KUnit gets to run: by using 
the --raw_output=full argument to kunit.py run, the corresponding logs 
are shown.

> 
> Scenario 1 (good):
> 
> I run
> 
> 	./tools/testing/kunit/kunit.py config
> 	./tools/testing/kunit/kunit.py run printf
> 
> Everything works as expected:
> 
>    [10:19:36] Testing complete. Ran 28 tests: passed: 28
>    [10:19:36] Elapsed time: 15.929s total, 0.001s configuring, 15.761s building, 0.114s running
> 
> 

This works fine for me, too. :-)


> Scenario 2 (BAD):
> 
> I applied the following change:
> 
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -18,6 +18,7 @@
>    */
>   
>   #include <linux/stdarg.h>
> +#include <linux/bitops.h>
>   #include <linux/build_bug.h>
>   #include <linux/clk.h>
>   #include <linux/clk-provider.h>
> @@ -2904,12 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>   
>   		case FORMAT_STATE_NUM: {
>   			unsigned long long num;
> +			u8 shift = fmt.size * 8 - 1;
>   
>   			if (fmt.size > sizeof(int))
>   				num = va_arg(args, long long);
>   			else
> -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
> -			str = number(str, end, num, spec);
> +				num = va_arg(args, int);
> +			num = sign_extend64(num, shift);
> +			if (spec.flags & SIGN)
> +				str = number(str, end, num, spec);
> +			else
> +				str = number(str, end, -(long long)num, spec);
>   			continue;
>   		}
>   
> 
> Tests went into cosmos (I waited a few minutes and had to interrupt that):
> 
>    ^CERROR:root:Build interruption occurred. Cleaning console.
>    ^CERROR:root:Build interruption occurred. Cleaning console.
>    ^CERROR:root:Build interruption occurred. Cleaning console.
>    Command '['.kunit/linux', 'kunit.filter_glob=printf', 'kunit.enable=1', 'mem=1G', 'console=tty', 'kunit_shutdown=halt']' timed out after 300 seconds
>    [10:29:52] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
>    [10:29:52] ============================================================
>    [10:29:52] Testing complete. Ran 0 tests: errors: 1
>    [10:29:52] Elapsed time: 305.676s total, 0.001s configuring, 5.669s building, 300.006s running
> 
> NOTE!
> Regardless of how long I waited, the elapsed time is about 5 minutes
> (it seems to be the 300-second limit stated in the output).
> 

Interesting: this crashed immediately on my machine. During building, I 
see a (harmless) warning:
../lib/vsprintf.c:2827:27: warning: ‘convert_num_spec’ defined but not used [-Wunused-function]

By running KUnit with the --raw_output=full option, I can see a segfault
(though, as you can see, the numbers throughout the stacktrace are wrong):
<18446744073709551610>Pid: 1, comm: swapper/0 Not tainted 7.0.0-rc1-gff0627514551-dirty
<18446744073709551610>RIP: ffffffffffffffcd:0xffffffff9fac7320
<18446744073709551610>RSP: ffffffff5f7fc098  EFLAGS: fffffffffffefdf9
<18446744073709551610>RAX: fffffffffffc0000 RBX: ffffffff9ff1ad4c RCX: ffffffff9f7b6440
<18446744073709551610>RDX: 0000000000000000 RSI: ffffffff9f7b6388 RDI: ffffffffc6cfc8cc
<18446744073709551610>RBP: 0000000000000000 R08: ffffffffffffffff R09: ffffffffffffffd0
<18446744073709551610>R10: fffffffffffffff8 R11: fffffffffffffdba R12: ffffffff9fac7320
<18446744073709551610>R13: ffffffff9ff1b0d0 R14: 0000000000000000 R15: ffffffff9f963fe8
<0>Kernel panic - not syncing: Segfault with no mm
<18446744073709551612>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-rc1-gff0627514551-dirty #35 VOLUNTARY
<18446744073709551612>Stack:
<18446744073709551612> ffffffff9fbe2fd0 00000000 ffffffff9fffdaec ffffffff5f7fc080
<18446744073709551612> ffffffff5f7fc080 ffffffff9ff95050 ffffffff9fab1be0 ffffffff9ff95050
<18446744073709551612> ffffffff9fbe2fd0 00000000 00000000 00000000
<18446744073709551612>Call Trace:
<18446744073709551612> [<ffffffff9fbe2fd0>] ? kernel_init+0x0/0xfffffffffffffe20
<18446744073709551612> [<ffffffff9fffdaec>] ? kernel_init_freeable+0xfffffffffffffe8b/0xfffffffffffffc82
<18446744073709551612> [<ffffffff9ff95050>] ? uml_curr_cpu+0x0/0xfffffffffffffff0
<18446744073709551612> [<ffffffff9ff95050>] ? uml_curr_cpu+0x0/0xfffffffffffffff0
<18446744073709551612> [<ffffffff9fbe2fd0>] ? kernel_init+0x0/0xfffffffffffffe20
<18446744073709551612> [<ffffffff9fbe2faa>] ? kernel_init+0xffffffffffffffda/0xfffffffffffffe20
<18446744073709551612> [<ffffffff9ff95050>] ? uml_curr_cpu+0x0/0xfffffffffffffff0
<18446744073709551612> [<ffffffff9ffa5c27>] ? new_thread_handler+0xffffffffffffff87/0xffffffffffffff60

(Trying the same thing with --arch x86_64 suggested that some stack 
corruption is occurring.)

> 
> Scenario 3 (BAD):
> 
> Now I took again a clean tree and applied this change:
> 
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -18,6 +18,7 @@
>    */
>   
>   #include <linux/stdarg.h>
> +#include <linux/bitops.h>
>   #include <linux/build_bug.h>
>   #include <linux/clk.h>
>   #include <linux/clk-provider.h>
> @@ -2904,11 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>   
>   		case FORMAT_STATE_NUM: {
>   			unsigned long long num;
> +			u8 shift = fmt.size * 8 - 1;
>   
>   			if (fmt.size > sizeof(int))
>   				num = va_arg(args, long long);
> +			else {
> +				num = va_arg(args, int);
> +			if ((spec.flags & SIGN))
> +				num = sign_extend64(num, shift);
>   			else
> -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
> +				num &= ~(BIT_ULL(shift) - 1);
> +			}
>   			str = number(str, end, num, spec);
>   			continue;
>   		}
> 
> and run tests again.
> 
>    [10:39:16] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
>    [10:39:16] ============================================================
>    [10:39:16] Testing complete. Ran 0 tests: errors: 1
>    [10:39:16] Elapsed time: 5.762s total, 0.001s configuring, 5.694s building, 0.067s running
> 
> It runs fast and is completely useless. (There is no build error.)


This one also kernel panics, and when run with --raw_output=full, we can
see that it's due to all of the character devices' sysfs entries being
duplicates, because the minor/major are being formatted as '/dev/char/0:0':

<0>sysfs: cannot create duplicate filename '/dev/char/0:0'
(...)
<0>Kernel panic - not syncing: Couldn't register pty driver
<0>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W 7.0.0-rc1-gff0627514551-dirty #34 VOLUNTARY


> 
> ...
> 
> Please, fix this as it is a serious issue and really makes kunit useless.
> 

There's not much KUnit can do if the kernel panics before any tests can 
be run -- and unfortunately, vsprintf() seems able to cause lots of 
trouble early in the boot process.

One idea is to support building tests as independent userspace 
executables, which wouldn't depend on all of those parts of the kernel 
which break (and would be easier to debug). I discussed this a bit at 
Plumbers a couple of years ago[1], but haven't had a chance to work on 
it since. Even then, it'd require a little bit of test-specific work to 
get an isolated version of the kernel vsprintf to build and be testable.

In the short term, maybe we can improve the interface of kunit.py in 
cases where the kernel crashes. At the moment, we simply report that no 
tests had run (as you've noticed), but maybe we should check more 
actively for panics and/or make a more explicit difference between "no 
tests were run" and "the KUnit framework never executed". At the very 
least, we should suggest that --raw_output=full is a good way to debug 
this issue if the user wasn't expecting it in the error message. (I'll 
send a patch out to do this now.)

I hope that helps (at least a little bit), and thanks for sticking with 
KUnit despite these issues!

Cheers,
-- David


[1]: https://lpc.events/event/18/contributions/1790/



* Re: serious bugs in KUnit framework makes test completely useless
  2026-02-27 10:42 ` David Gow
@ 2026-02-27 14:43   ` Andy Shevchenko
  2026-02-28 10:11     ` David Gow
  0 siblings, 1 reply; 4+ messages in thread
From: Andy Shevchenko @ 2026-02-27 14:43 UTC (permalink / raw)
  To: David Gow; +Cc: Brendan Higgins, Rae Moar, linux-kselftest, kunit-dev

On Fri, Feb 27, 2026 at 06:42:12PM +0800, David Gow wrote:
> On 27/02/2026 at 5:40 PM, Andy Shevchenko wrote:
> > 
> > I have stumbled over the kunit framework issues that make the respective test
> > cases useless.
> > 
> > Now to the details.
> > Consider having today's Linux Next.
> 
> Sorry to hear that KUnit is causing trouble. It looks like this is due to
> those patches crashing the kernel before KUnit gets to run: by using the
> --raw_output=full argument to kunit.py run, the corresponding logs are
> shown.
> 
> > 
> > Scenario 1 (good):
> > 
> > I run
> > 
> > 	./tools/testing/kunit/kunit.py config
> > 	./tools/testing/kunit/kunit.py run printf
> > 
> > Everything works as expected:
> > 
> >    [10:19:36] Testing complete. Ran 28 tests: passed: 28
> >    [10:19:36] Elapsed time: 15.929s total, 0.001s configuring, 15.761s building, 0.114s running
> 
> This works fine for me, too. :-)
> 
> 
> > Scenario 2 (BAD):
> > 
> > I applied the following change:
> > 
> > --- a/lib/vsprintf.c
> > +++ b/lib/vsprintf.c
> > @@ -18,6 +18,7 @@
> >    */
> >   #include <linux/stdarg.h>
> > +#include <linux/bitops.h>
> >   #include <linux/build_bug.h>
> >   #include <linux/clk.h>
> >   #include <linux/clk-provider.h>
> > @@ -2904,12 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> >   		case FORMAT_STATE_NUM: {
> >   			unsigned long long num;
> > +			u8 shift = fmt.size * 8 - 1;
> >   			if (fmt.size > sizeof(int))
> >   				num = va_arg(args, long long);
> >   			else
> > -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
> > -			str = number(str, end, num, spec);
> > +				num = va_arg(args, int);
> > +			num = sign_extend64(num, shift);
> > +			if (spec.flags & SIGN)
> > +				str = number(str, end, num, spec);
> > +			else
> > +				str = number(str, end, -(long long)num, spec);
> >   			continue;
> >   		}
> > 
> > Tests went into cosmos (I waited a few minutes and had to interrupt that):
> > 
> >    ^CERROR:root:Build interruption occurred. Cleaning console.
> >    ^CERROR:root:Build interruption occurred. Cleaning console.
> >    ^CERROR:root:Build interruption occurred. Cleaning console.
> >    Command '['.kunit/linux', 'kunit.filter_glob=printf', 'kunit.enable=1', 'mem=1G', 'console=tty', 'kunit_shutdown=halt']' timed out after 300 seconds
> >    [10:29:52] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
> >    [10:29:52] ============================================================
> >    [10:29:52] Testing complete. Ran 0 tests: errors: 1
> >    [10:29:52] Elapsed time: 305.676s total, 0.001s configuring, 5.669s building, 300.006s running
> > 
> > NOTE!
> > Regardless of how long I waited, the elapsed time is about 5 minutes
> > (it seems to be the 300-second limit stated in the output).
> 
> Interesting: this crashed immediately on my machine. During building, I see
> a (harmless) warning:
> ../lib/vsprintf.c:2827:27: warning: ‘convert_num_spec’ defined but not used
> [-Wunused-function]

You need to enable binary printf() in the configuration, or comment out that function.
I get no such warning because I dropped the function (I left that out of the change above
for the sake of simplicity).

> By running KUnit with the --raw_output=full option, I can see a segfault
> (though, as you can see, the numbers throughout the stacktrace are wrong):
> <18446744073709551610>Pid: 1, comm: swapper/0 Not tainted
> 7.0.0-rc1-gff0627514551-dirty
> <18446744073709551610>RIP: ffffffffffffffcd:0xffffffff9fac7320
> <18446744073709551610>RSP: ffffffff5f7fc098  EFLAGS: fffffffffffefdf9
> <18446744073709551610>RAX: fffffffffffc0000 RBX: ffffffff9ff1ad4c RCX:
> ffffffff9f7b6440
> <18446744073709551610>RDX: 0000000000000000 RSI: ffffffff9f7b6388 RDI:
> ffffffffc6cfc8cc
> <18446744073709551610>RBP: 0000000000000000 R08: ffffffffffffffff R09:
> ffffffffffffffd0
> <18446744073709551610>R10: fffffffffffffff8 R11: fffffffffffffdba R12:
> ffffffff9fac7320
> <18446744073709551610>R13: ffffffff9ff1b0d0 R14: 0000000000000000 R15:
> ffffffff9f963fe8
> <0>Kernel panic - not syncing: Segfault with no mm
> <18446744073709551612>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
> 7.0.0-rc1-gff0627514551-dirty #35 VOLUNTARY
> <18446744073709551612>Stack:
> <18446744073709551612> ffffffff9fbe2fd0 00000000 ffffffff9fffdaec
> ffffffff5f7fc080
> <18446744073709551612> ffffffff5f7fc080 ffffffff9ff95050 ffffffff9fab1be0
> ffffffff9ff95050
> <18446744073709551612> ffffffff9fbe2fd0 00000000 00000000 00000000
> <18446744073709551612>Call Trace:
> <18446744073709551612> [<ffffffff9fbe2fd0>] ?
> kernel_init+0x0/0xfffffffffffffe20
> <18446744073709551612> [<ffffffff9fffdaec>] ?
> kernel_init_freeable+0xfffffffffffffe8b/0xfffffffffffffc82
> <18446744073709551612> [<ffffffff9ff95050>] ?
> uml_curr_cpu+0x0/0xfffffffffffffff0
> <18446744073709551612> [<ffffffff9ff95050>] ?
> uml_curr_cpu+0x0/0xfffffffffffffff0
> <18446744073709551612> [<ffffffff9fbe2fd0>] ?
> kernel_init+0x0/0xfffffffffffffe20
> <18446744073709551612> [<ffffffff9fbe2faa>] ?
> kernel_init+0xffffffffffffffda/0xfffffffffffffe20
> <18446744073709551612> [<ffffffff9ff95050>] ?
> uml_curr_cpu+0x0/0xfffffffffffffff0
> <18446744073709551612> [<ffffffff9ffa5c27>] ?
> new_thread_handler+0xffffffffffffff87/0xffffffffffffff60
> 
> (Trying the same thing with --arch x86_64 suggested that some stack
> corruption is occurring.)

Got it!

> > Scenario 3 (BAD):
> > 
> > Now I took again a clean tree and applied this change:
> > 
> > --- a/lib/vsprintf.c
> > +++ b/lib/vsprintf.c
> > @@ -18,6 +18,7 @@
> >    */
> >   #include <linux/stdarg.h>
> > +#include <linux/bitops.h>
> >   #include <linux/build_bug.h>
> >   #include <linux/clk.h>
> >   #include <linux/clk-provider.h>
> > @@ -2904,11 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
> >   		case FORMAT_STATE_NUM: {
> >   			unsigned long long num;
> > +			u8 shift = fmt.size * 8 - 1;
> >   			if (fmt.size > sizeof(int))
> >   				num = va_arg(args, long long);
> > +			else {
> > +				num = va_arg(args, int);
> > +			if ((spec.flags & SIGN))
> > +				num = sign_extend64(num, shift);
> >   			else
> > -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
> > +				num &= ~(BIT_ULL(shift) - 1);
> > +			}
> >   			str = number(str, end, num, spec);
> >   			continue;
> >   		}
> > 
> > and run tests again.
> > 
> >    [10:39:16] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
> >    [10:39:16] ============================================================
> >    [10:39:16] Testing complete. Ran 0 tests: errors: 1
> >    [10:39:16] Elapsed time: 5.762s total, 0.001s configuring, 5.694s building, 0.067s running
> > 
> > It runs fast and is completely useless. (There is no build error.)
> 
> This one also kernel panics, and when run with --raw_output=full, we can see
> that it's due to all of the character devices' sysfs entries being
> duplicates, because the minor/major are being formatted as '/dev/char/0:0':
> 
> <0>sysfs: cannot create duplicate filename '/dev/char/0:0'
> (...)
> <0>Kernel panic - not syncing: Couldn't register pty driver
> <0>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W
> 7.0.0-rc1-gff0627514551-dirty #34 VOLUNTARY

Thanks for trying it and explaining to me what's going on. The bottom line is that I missed
--raw_output=full, which should be enough for me.

...

> > Please, fix this as it is a serious issue and really makes kunit useless.
> 
> There's not much KUnit can do if the kernel panics before any tests can be
> run -- and unfortunately, vsprintf() seems able to cause lots of trouble
> early in the boot process.
> 
> One idea is to support building tests as independent userspace executables,
> which wouldn't depend on all of those parts of the kernel which break (and
> would be easier to debug). I discussed this a bit at Plumbers a couple of
> years ago[1], but haven't had a chance to work on it since. Even then, it'd
> require a little bit of test-specific work to get an isolated version of the
> kernel vsprintf to build and be testable.
> 
> In the short term, maybe we can improve the interface of kunit.py in cases
> where the kernel crashes. At the moment, we simply report that no tests had
> run (as you've noticed), but maybe we should check more actively for panics
> and/or make a more explicit difference between "no tests were run" and "the
> KUnit framework never executed". At the very least, we should suggest that
> --raw_output=full is a good way to debug this issue if the user wasn't
> expecting it in the error message. (I'll send a patch out to do this now.)

The most annoying part is that the console hangs, and then, after Ctrl+C is pressed,
a further (unneeded!) 300-second timeout occurs.

> I hope that helps (at least a little bit), and thanks for sticking with
> KUnit despite these issues!
> 
> [1]: https://lpc.events/event/18/contributions/1790/

-- 
With Best Regards,
Andy Shevchenko




* Re: serious bugs in KUnit framework makes test completely useless
  2026-02-27 14:43   ` Andy Shevchenko
@ 2026-02-28 10:11     ` David Gow
  0 siblings, 0 replies; 4+ messages in thread
From: David Gow @ 2026-02-28 10:11 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Brendan Higgins, Rae Moar, linux-kselftest, kunit-dev

On 27/02/2026 at 10:43 PM, Andy Shevchenko wrote:
> On Fri, Feb 27, 2026 at 06:42:12PM +0800, David Gow wrote:
>> On 27/02/2026 at 5:40 PM, Andy Shevchenko wrote:
>>>
>>> I have stumbled over the kunit framework issues that make the respective test
>>> cases useless.
>>>
>>> Now to the details.
>>> Consider having today's Linux Next.
>>
>> Sorry to hear that KUnit is causing trouble. It looks like this is due to
>> those patches crashing the kernel before KUnit gets to run: by using the
>> --raw_output=full argument to kunit.py run, the corresponding logs are
>> shown.
>>
>>>
>>> Scenario 1 (good):
>>>
>>> I run
>>>
>>> 	./tools/testing/kunit/kunit.py config
>>> 	./tools/testing/kunit/kunit.py run printf
>>>
>>> Everything works as expected:
>>>
>>>     [10:19:36] Testing complete. Ran 28 tests: passed: 28
>>>     [10:19:36] Elapsed time: 15.929s total, 0.001s configuring, 15.761s building, 0.114s running
>>
>> This works fine for me, too. :-)
>>
>>
>>> Scenario 2 (BAD):
>>>
>>> I applied the following change:
>>>
>>> --- a/lib/vsprintf.c
>>> +++ b/lib/vsprintf.c
>>> @@ -18,6 +18,7 @@
>>>     */
>>>    #include <linux/stdarg.h>
>>> +#include <linux/bitops.h>
>>>    #include <linux/build_bug.h>
>>>    #include <linux/clk.h>
>>>    #include <linux/clk-provider.h>
>>> @@ -2904,12 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>>>    		case FORMAT_STATE_NUM: {
>>>    			unsigned long long num;
>>> +			u8 shift = fmt.size * 8 - 1;
>>>    			if (fmt.size > sizeof(int))
>>>    				num = va_arg(args, long long);
>>>    			else
>>> -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
>>> -			str = number(str, end, num, spec);
>>> +				num = va_arg(args, int);
>>> +			num = sign_extend64(num, shift);
>>> +			if (spec.flags & SIGN)
>>> +				str = number(str, end, num, spec);
>>> +			else
>>> +				str = number(str, end, -(long long)num, spec);
>>>    			continue;
>>>    		}
>>>
>>> Tests went into cosmos (I waited a few minutes and had to interrupt that):
>>>
>>>     ^CERROR:root:Build interruption occurred. Cleaning console.
>>>     ^CERROR:root:Build interruption occurred. Cleaning console.
>>>     ^CERROR:root:Build interruption occurred. Cleaning console.
>>>     Command '['.kunit/linux', 'kunit.filter_glob=printf', 'kunit.enable=1', 'mem=1G', 'console=tty', 'kunit_shutdown=halt']' timed out after 300 seconds
>>>     [10:29:52] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
>>>     [10:29:52] ============================================================
>>>     [10:29:52] Testing complete. Ran 0 tests: errors: 1
>>>     [10:29:52] Elapsed time: 305.676s total, 0.001s configuring, 5.669s building, 300.006s running
>>>
>>> NOTE!
>>> Regardless of how long I waited, the elapsed time is about 5 minutes
>>> (it seems to be the 300-second limit stated in the output).
>>
>> Interesting: this crashed immediately on my machine. During building, I see
>> a (harmless) warning:
>> ../lib/vsprintf.c:2827:27: warning: ‘convert_num_spec’ defined but not used
>> [-Wunused-function]
> 
> You need to enable binary printf() in the configuration, or comment out that function.
> I get no such warning because I dropped the function (I left that out of the change above
> for the sake of simplicity).
> 

Yeah, I assumed it was just an in-progress patch or something, but it 
did confirm that at least some build output was making it through kunit.py.

>> By running KUnit with the --raw_output=full option, I can see a segfault
>> (though, as you can see, the numbers throughout the stacktrace are wrong):
>> <18446744073709551610>Pid: 1, comm: swapper/0 Not tainted
>> 7.0.0-rc1-gff0627514551-dirty
>> <18446744073709551610>RIP: ffffffffffffffcd:0xffffffff9fac7320
>> <18446744073709551610>RSP: ffffffff5f7fc098  EFLAGS: fffffffffffefdf9
>> <18446744073709551610>RAX: fffffffffffc0000 RBX: ffffffff9ff1ad4c RCX:
>> ffffffff9f7b6440
>> <18446744073709551610>RDX: 0000000000000000 RSI: ffffffff9f7b6388 RDI:
>> ffffffffc6cfc8cc
>> <18446744073709551610>RBP: 0000000000000000 R08: ffffffffffffffff R09:
>> ffffffffffffffd0
>> <18446744073709551610>R10: fffffffffffffff8 R11: fffffffffffffdba R12:
>> ffffffff9fac7320
>> <18446744073709551610>R13: ffffffff9ff1b0d0 R14: 0000000000000000 R15:
>> ffffffff9f963fe8
>> <0>Kernel panic - not syncing: Segfault with no mm
>> <18446744073709551612>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
>> 7.0.0-rc1-gff0627514551-dirty #35 VOLUNTARY
>> <18446744073709551612>Stack:
>> <18446744073709551612> ffffffff9fbe2fd0 00000000 ffffffff9fffdaec
>> ffffffff5f7fc080
>> <18446744073709551612> ffffffff5f7fc080 ffffffff9ff95050 ffffffff9fab1be0
>> ffffffff9ff95050
>> <18446744073709551612> ffffffff9fbe2fd0 00000000 00000000 00000000
>> <18446744073709551612>Call Trace:
>> <18446744073709551612> [<ffffffff9fbe2fd0>] ?
>> kernel_init+0x0/0xfffffffffffffe20
>> <18446744073709551612> [<ffffffff9fffdaec>] ?
>> kernel_init_freeable+0xfffffffffffffe8b/0xfffffffffffffc82
>> <18446744073709551612> [<ffffffff9ff95050>] ?
>> uml_curr_cpu+0x0/0xfffffffffffffff0
>> <18446744073709551612> [<ffffffff9ff95050>] ?
>> uml_curr_cpu+0x0/0xfffffffffffffff0
>> <18446744073709551612> [<ffffffff9fbe2fd0>] ?
>> kernel_init+0x0/0xfffffffffffffe20
>> <18446744073709551612> [<ffffffff9fbe2faa>] ?
>> kernel_init+0xffffffffffffffda/0xfffffffffffffe20
>> <18446744073709551612> [<ffffffff9ff95050>] ?
>> uml_curr_cpu+0x0/0xfffffffffffffff0
>> <18446744073709551612> [<ffffffff9ffa5c27>] ?
>> new_thread_handler+0xffffffffffffff87/0xffffffffffffff60
>>
>> (Trying the same thing with --arch x86_64 suggested that some stack
>> corruption is occurring.)
> 
> Got it!
> 
>>> Scenario 3 (BAD):
>>>
>>> Now I took again a clean tree and applied this change:
>>>
>>> --- a/lib/vsprintf.c
>>> +++ b/lib/vsprintf.c
>>> @@ -18,6 +18,7 @@
>>>     */
>>>    #include <linux/stdarg.h>
>>> +#include <linux/bitops.h>
>>>    #include <linux/build_bug.h>
>>>    #include <linux/clk.h>
>>>    #include <linux/clk-provider.h>
>>> @@ -2904,11 +2889,17 @@ int vsnprintf(char *buf, size_t size, const char *fmt_str, va_list args)
>>>    		case FORMAT_STATE_NUM: {
>>>    			unsigned long long num;
>>> +			u8 shift = fmt.size * 8 - 1;
>>>    			if (fmt.size > sizeof(int))
>>>    				num = va_arg(args, long long);
>>> +			else {
>>> +				num = va_arg(args, int);
>>> +			if ((spec.flags & SIGN))
>>> +				num = sign_extend64(num, shift);
>>>    			else
>>> -				num = convert_num_spec(va_arg(args, int), fmt.size, spec);
>>> +				num &= ~(BIT_ULL(shift) - 1);
>>> +			}
>>>    			str = number(str, end, num, spec);
>>>    			continue;
>>>    		}
>>>
>>> and run tests again.
>>>
>>>     [10:39:16] [ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
>>>     [10:39:16] ============================================================
>>>     [10:39:16] Testing complete. Ran 0 tests: errors: 1
>>>     [10:39:16] Elapsed time: 5.762s total, 0.001s configuring, 5.694s building, 0.067s running
>>>
>>> It runs fast and is completely useless. (There is no build error.)
>>
>> This one also kernel panics, and when run with --raw_output=full, we can see
>> that it's due to all of the character devices' sysfs entries being
>> duplicates, because the minor/major are being formatted as '/dev/char/0:0':
>>
>> <0>sysfs: cannot create duplicate filename '/dev/char/0:0'
>> (...)
>> <0>Kernel panic - not syncing: Couldn't register pty driver
>> <0>CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W
>> 7.0.0-rc1-gff0627514551-dirty #34 VOLUNTARY
> 
> Thanks for trying it and explaining to me what's going on. The bottom line is that I missed
> --raw_output=full, which should be enough for me.
> 
> ...
> 

Yeah, --raw_output=full (or its synonym --raw_output=all) is very 
useful for debugging this sort of thing.

Basically, --raw_output or --raw_output=kunit will give only the kernel 
logs from after the tests have started running (i.e., the KTAP header 
line has been printed), and --raw_output=all will pass through all 
kernel output, which is required if the KUnit tests never get run.

>>> Please, fix this as it is a serious issue and really makes kunit useless.
>>
>> There's not much KUnit can do if the kernel panics before any tests can be
>> run -- and unfortunately, vsprintf() seems able to cause lots of trouble
>> early in the boot process.
>>
>> One idea is to support building tests as independent userspace executables,
>> which wouldn't depend on all of those parts of the kernel which break (and
>> would be easier to debug). I discussed this a bit at Plumbers a couple of
>> years ago[1], but haven't had a chance to work on it since. Even then, it'd
>> require a little bit of test-specific work to get an isolated version of the
>> kernel vsprintf to build and be testable.
>>
>> In the short term, maybe we can improve the interface of kunit.py in cases
>> where the kernel crashes. At the moment, we simply report that no tests had
>> run (as you've noticed), but maybe we should check more actively for panics
>> and/or make a more explicit difference between "no tests were run" and "the
>> KUnit framework never executed". At the very least, we should suggest that
>> --raw_output=full is a good way to debug this issue if the user wasn't
>> expecting it in the error message. (I'll send a patch out to do this now.)
> 
> The most annoying part is that the console hangs, and then, after Ctrl+C is pressed,
> a further (unneeded!) 300-second timeout occurs.
> 

Yes, that's been annoying for a while. I've sent a patch out [1] trying 
to fix it -- at least we won't go out of our way to eat SIGINT any more 
-- but I'm sure there are still some cases where UML or QEMU could hang 
more nastily.

>> I hope that helps (at least a little bit), and thanks for sticking with
>> KUnit despite these issues!
>>
>> [1]: https://lpc.events/event/18/contributions/1790/
> 

Cheers,
-- David

[1]: https://lore.kernel.org/all/20260228100727.208896-1-david@davidgow.net/

