rcutorture’s init segfaults in ppc64le VM

linuxppc-dev.lists.ozlabs.org archive mirror

* rcutorture’s init segfaults in ppc64le VM
@ 2022-02-07 16:44 Paul Menzel
  2022-02-07 17:51 ` Paul E. McKenney
  2022-02-08 10:09 ` Michael Ellerman
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Menzel @ 2022-02-07 16:44 UTC (permalink / raw)
  To: Paul E. McKenney, Michael Ellerman; +Cc: rcu, linuxppc-dev

Dear Linux folks,


On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux 
5.17-rc2+ with rcutorture tests

     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10

the built init

     $ file tools/testing/selftests/rcutorture/initrd/init
     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped

segfaults in QEMU. From one of the log files

 
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

     [    1.119803][    T1] Run /init as init process
     [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
     [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
     [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010

Executing the init, which just seems to be an endless loop, from 
userspace works:

     $ strace ./tools/testing/selftests/rcutorture/initrd/init
     execve("./tools/testing/selftests/rcutorture/initrd/init", ["./tools/testing/selftests/rcutor"...], 0x7ffffdb9e860 /* 31 vars */) = 0
     brk(NULL)                               = 0x1001d940000
     brk(0x1001d940b98)                      = 0x1001d940b98
     set_tid_address(0x1001d9400d0)          = 2890832
     set_robust_list(0x1001d9400e0, 24)      = 0
     uname({sysname="Linux", nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
     prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
     readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096) = 61
     getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
     brk(0x1001d970b98)                      = 0x1001d970b98
     brk(0x1001d980000)                      = 0x1001d980000
     mprotect(0x100e0000, 65536, PROT_READ)  = 0
     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffffb22c8a8) = 0
     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, 0x7ffffb22c8a8) = 0
     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0, tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
     strace: Process 2890832 detached

Any ideas what `mkinitrd.sh` [2] should do differently?

```
cat > init.c << '___EOF___'
#ifndef NOLIBC
#include <unistd.h>
#include <sys/time.h>
#endif

volatile unsigned long delaycount;

int main(int argc, int argv[])
{
	int i;
	struct timeval tv;
	struct timeval tvb;

	for (;;) {
		sleep(1);
		/* Need some userspace time. */
		if (gettimeofday(&tvb, NULL))
			continue;
		do {
			for (i = 0; i < 1000 * 100; i++)
				delaycount = i * i;
			if (gettimeofday(&tv, NULL))
				break;
			tv.tv_sec -= tvb.tv_sec;
			if (tv.tv_sec > 1)
				break;
			tv.tv_usec += tv.tv_sec * 1000 * 1000;
			tv.tv_usec -= tvb.tv_usec;
		} while (tv.tv_usec < 1000);
	}
	return 0;
}
___EOF___

# build using nolibc on supported archs (smaller executable) and fall
# back to regular glibc on other ones.
if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
            "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
    | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
    | grep -q '^yes'; then
	# architecture supported by nolibc
         ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
		-nostdlib -include ../../../../include/nolibc/nolibc.h \
		-s -static -Os -o init init.c -lgcc
else
	${CROSS_COMPILE}gcc -s -static -Os -o init init.c
fi
```
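
For context on the build snippet above: ppc64le is not in the architecture list that the script tests for, so on this machine the `else` branch builds init against the regular C library. The following is a minimal C sketch of the same preprocessor test. It is not part of mkinitrd.sh, and the names `nolibc_supported` and `print_init_flavor` are made up for illustration.

```c
#include <stdio.h>

/*
 * Mirrors the preprocessor test in the quoted mkinitrd.sh snippet:
 * returns 1 when the compiler targets one of the architectures listed
 * there, 0 otherwise.  nolibc_supported() is a hypothetical helper name.
 */
int nolibc_supported(void)
{
#if defined(__x86_64__) || defined(__i386__) || defined(__i486__) || \
    defined(__i586__) || defined(__i686__) || defined(__ARM_EABI__) || \
    defined(__aarch64__)
	return 1;
#else
	return 0;
#endif
}

/* Report which branch of the script the current target would take. */
void print_init_flavor(void)
{
	printf("init would be built with %s\n",
	       nolibc_supported() ? "nolibc" : "the regular C library");
}
```

Compiled with a ppc64le compiler, `nolibc_supported()` returns 0, which is consistent with the C library startup calls (`set_tid_address`, `set_robust_list`, `readlink("/proc/self/exe")`) visible in the strace output.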


Kind regards,

Paul


[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-07 16:44 rcutorture’s init segfaults in ppc64le VM Paul Menzel
@ 2022-02-07 17:51 ` Paul E. McKenney
  2022-02-07 18:09   ` rcutorture's " Willy Tarreau
  2022-02-08  5:46   ` rcutorture’s " Zhouyi Zhou
  2022-02-08 10:09 ` Michael Ellerman
  1 sibling, 2 replies; 15+ messages in thread
From: Paul E. McKenney @ 2022-02-07 17:51 UTC (permalink / raw)
  To: Paul Menzel; +Cc: rcu, linuxppc-dev, w

On Mon, Feb 07, 2022 at 05:44:47PM +0100, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> 5.17-rc2+ with rcutorture tests
> 
>     $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> 
> the built init
> 
>     $ file tools/testing/selftests/rcutorture/initrd/init
>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> GNU/Linux 3.10.0, stripped
> 
> segfaults in QEMU. From one of the log files
> 
> 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> 
>     [    1.119803][    T1] Run /init as init process
>     [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18
> lr 0 code 1 in init[10000000+d0000]
>     [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> 4bffff58 00000000 01000000 00000580 3c40100f
>     [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4
> 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
> 
> Executing the init, which just seems to be an endless loop, from userspace
> works:
> 
>     $ strace ./tools/testing/selftests/rcutorture/initrd/init
>     execve("./tools/testing/selftests/rcutorture/initrd/init",
> ["./tools/testing/selftests/rcutor"...], 0x7ffffdb9e860 /* 31 vars */) = 0
>     brk(NULL)                               = 0x1001d940000
>     brk(0x1001d940b98)                      = 0x1001d940b98
>     set_tid_address(0x1001d9400d0)          = 2890832
>     set_robust_list(0x1001d9400e0, 24)      = 0
>     uname({sysname="Linux",
> nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
>     prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> rlim_max=RLIM64_INFINITY}) = 0
>     readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> = 61
>     getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
>     brk(0x1001d970b98)                      = 0x1001d970b98
>     brk(0x1001d980000)                      = 0x1001d980000
>     mprotect(0x100e0000, 65536, PROT_READ)  = 0
>     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7ffffb22c8a8) = 0
>     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> 0x7ffffb22c8a8) = 0
>     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
>     strace: Process 2890832 detached

Huh.  In PowerPC, is there some difference between system calls
executed in initrd and those same system calls executed in userspace?

And just to make sure, the above strace was from exactly the same
binary "init" file that is included in initrd, correct?
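
One way to answer the same-binary question is to compare the file that was straced with the init extracted from the initrd image, byte by byte. The following is a minimal C sketch, assuming both are regular files on disk; `files_identical` is a hypothetical helper, and read errors are treated like end of file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Compare two files byte by byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 */
int files_identical(const char *path_a, const char *path_b)
{
	FILE *a = fopen(path_a, "rb");
	FILE *b = fopen(path_b, "rb");
	int same = 1;

	if (!a || !b) {
		if (a)
			fclose(a);
		if (b)
			fclose(b);
		return -1;
	}
	for (;;) {
		unsigned char buf_a[4096], buf_b[4096];
		size_t n_a = fread(buf_a, 1, sizeof(buf_a), a);
		size_t n_b = fread(buf_b, 1, sizeof(buf_b), b);

		/* A length or content mismatch in any chunk means "different". */
		if (n_a != n_b || memcmp(buf_a, buf_b, n_a) != 0) {
			same = 0;
			break;
		}
		if (n_a == 0)
			break;	/* both files hit EOF at the same offset */
	}
	fclose(a);
	fclose(b);
	return same;
}
```

A call such as `files_identical("tools/testing/selftests/rcutorture/initrd/init", "/tmp/initrd-extracted/init")` would do the check; the second path is only a placeholder for wherever the initrd contents were unpacked.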

Adding Willy Tarreau for his thoughts.

							Thanx, Paul

> Any ideas what `mkinitrd.sh` [2] should do differently?
> 
> ```
> cat > init.c << '___EOF___'
> #ifndef NOLIBC
> #include <unistd.h>
> #include <sys/time.h>
> #endif
> 
> volatile unsigned long delaycount;
> 
> int main(int argc, int argv[])
> {
> 	int i;
> 	struct timeval tv;
> 	struct timeval tvb;
> 
> 	for (;;) {
> 		sleep(1);
> 		/* Need some userspace time. */
> 		if (gettimeofday(&tvb, NULL))
> 			continue;
> 		do {
> 			for (i = 0; i < 1000 * 100; i++)
> 				delaycount = i * i;
> 			if (gettimeofday(&tv, NULL))
> 				break;
> 			tv.tv_sec -= tvb.tv_sec;
> 			if (tv.tv_sec > 1)
> 				break;
> 			tv.tv_usec += tv.tv_sec * 1000 * 1000;
> 			tv.tv_usec -= tvb.tv_usec;
> 		} while (tv.tv_usec < 1000);
> 	}
> 	return 0;
> }
> ___EOF___
> 
> # build using nolibc on supported archs (smaller executable) and fall
> # back to regular glibc on other ones.
> if echo -e "#if __x86_64__||__i386__||__i486__||__i586__||__i686__" \
>            "||__ARM_EABI__||__aarch64__\nyes\n#endif" \
>    | ${CROSS_COMPILE}gcc -E -nostdlib -xc - \
>    | grep -q '^yes'; then
> 	# architecture supported by nolibc
>         ${CROSS_COMPILE}gcc -fno-asynchronous-unwind-tables -fno-ident \
> 		-nostdlib -include ../../../../include/nolibc/nolibc.h \
> 		-s -static -Os -o init init.c -lgcc
> else
> 	${CROSS_COMPILE}gcc -s -static -Os -o init init.c
> fi
> ```
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/doc/initrd.txt
> [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/rcutorture/bin/mkinitrd.sh


* Re: rcutorture's init segfaults in ppc64le VM
  2022-02-07 17:51 ` Paul E. McKenney
@ 2022-02-07 18:09   ` Willy Tarreau
  2022-02-08  5:46   ` rcutorture’s " Zhouyi Zhou
  1 sibling, 0 replies; 15+ messages in thread
From: Willy Tarreau @ 2022-02-07 18:09 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu, Paul Menzel, linuxppc-dev

Hi Paul,

On Mon, Feb 07, 2022 at 09:51:39AM -0800, Paul E. McKenney wrote:
(...)
> >     $ file tools/testing/selftests/rcutorture/initrd/init
> >     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
> > executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
> > linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for
> > GNU/Linux 3.10.0, stripped
> > 
> > segfaults in QEMU. From one of the log files
> > 
> > 
> > /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
> > 
> >     [    1.119803][    T1] Run /init as init process
> >     [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18
> > lr 0 code 1 in init[10000000+d0000]
> >     [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84
> > 4bffff58 00000000 01000000 00000580 3c40100f
> >     [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4
> > 38000000 <f821ff81> 7c0803a6 f8010000 e9028010

It would be useful to disassemble the executable and spot exactly
the corresponding code locations and instructions.

> > Executing the init, which just seems to be an endless loop, from userspace
> > works:
> > 
> >     $ strace ./tools/testing/selftests/rcutorture/initrd/init
> >     execve("./tools/testing/selftests/rcutorture/initrd/init",
> > ["./tools/testing/selftests/rcutor"...], 0x7ffffdb9e860 /* 31 vars */) = 0
> >     brk(NULL)                               = 0x1001d940000
> >     brk(0x1001d940b98)                      = 0x1001d940b98
> >     set_tid_address(0x1001d9400d0)          = 2890832
> >     set_robust_list(0x1001d9400e0, 24)      = 0
> >     uname({sysname="Linux",
> > nodename="flughafenberlinbrandenburgwillybrandt.molgen.mpg.de", ...}) = 0
> >     prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024,
> > rlim_max=RLIM64_INFINITY}) = 0
> >     readlink("/proc/self/exe", "/dev/shm/linux/tools/testing/sel"..., 4096)
> > = 61

Just guessing, maybe the loader is missing a test for when /proc is not
mounted?

> >     getrandom("\xf1\x30\x4c\x9e\x82\x8d\x26\xd7", 8, GRND_NONBLOCK) = 8
> >     brk(0x1001d970b98)                      = 0x1001d970b98
> >     brk(0x1001d980000)                      = 0x1001d980000
> >     mprotect(0x100e0000, 65536, PROT_READ)  = 0
> >     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7ffffb22c8a8) = 0
> >     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0},
> > 0x7ffffb22c8a8) = 0
> >     clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=1, tv_nsec=0}, ^C{tv_sec=0,
> > tv_nsec=872674044}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
> >     strace: Process 2890832 detached
> 
> Huh.  In PowerPC, is there some difference between system calls
> executed in initrd and those same system calls executed in userspace?

I've faced some issues in the past with certain syscalls not working
exactly the same on pid 1 (I think it was setsid() or setpgrp(), but
I could be wrong, that was ~10 years ago). Maybe here we're seeing
something similar with set_tid_address() or set_robust_list().

> And just to make sure, the above strace was from exactly the same
> binary "init" file that is included in initrd, correct?
> 
> Adding Willy Tarreau for his thoughts.
> 
> 							Thanx, Paul
> 
> > Any ideas what `mkinitrd.sh` [2] should do differently?

I think that we could add a fork() to see if the PID changes anything:

> > #ifndef NOLIBC
> > #include <unistd.h>
> > #include <sys/time.h>
> > #endif
> > 
> > volatile unsigned long delaycount;
> > 
> > int main(int argc, int argv[])
> > {
> > 	int i;
> > 	struct timeval tv;
> > 	struct timeval tvb;

Could you try with this ugly hack here?

+	if (fork() > 0) {
+		wait(NULL);
+		return 0;
+	}

> > 	for (;;) {
> > 		sleep(1);
> > 		/* Need some userspace time. */
> > 		if (gettimeofday(&tvb, NULL))
> > 			continue;
> > 		do {
> > 			for (i = 0; i < 1000 * 100; i++)
> > 				delaycount = i * i;
> > 			if (gettimeofday(&tv, NULL))
> > 				break;
> > 			tv.tv_sec -= tvb.tv_sec;
> > 			if (tv.tv_sec > 1)
> > 				break;
> > 			tv.tv_usec += tv.tv_sec * 1000 * 1000;
> > 			tv.tv_usec -= tvb.tv_usec;
> > 		} while (tv.tv_usec < 1000);
> > 	}
> > 	return 0;
> > }
(...)
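
The fork() experiment above can also be written as a standalone helper with the headers it needs (`wait()` requires `<sys/wait.h>`). This is only a sketch of the idea: `run_in_child` and `demo_body` are hypothetical names, and unlike the hack above the helper returns an error when fork() fails instead of falling through into the loop.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run body() in a forked child, so that it no longer executes as the
 * original PID.  Returns the child's exit status, or -1 on failure.
 */
int run_in_child(int (*body)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;		/* fork failed */
	if (pid == 0)
		_exit(body());		/* child: run the workload */
	if (waitpid(pid, &status, 0) < 0)
		return -1;		/* parent: waiting failed */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical stand-in for init's endless loop; returns at once. */
int demo_body(void)
{
	return 3;
}
```

With the real endless loop as the body, the child never exits, so the parent (PID 1 inside the VM) simply stays blocked in `waitpid()`.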

Regards,
Willy


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-07 17:51 ` Paul E. McKenney
  2022-02-07 18:09   ` rcutorture's " Willy Tarreau
@ 2022-02-08  5:46   ` Zhouyi Zhou
  2022-02-08  6:08     ` Zhouyi Zhou
  1 sibling, 1 reply; 15+ messages in thread
From: Zhouyi Zhou @ 2022-02-08  5:46 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu, Paul Menzel, linuxppc-dev, w

Dear Paul

I am also very interested in the topic.
The Open Source Lab at Oregon State University has lent me an 8-core
ppc64el POWER VM for 3 months. I think I can try reproducing this bug
in that virtual machine by running QEMU without hardware acceleration
(using the -no-kvm argument).
I am currently investigating
https://lore.kernel.org/rcu/20220201175023.GW4285@paulmck-ThinkPad-P17-Gen-1/T/#mc7e5f8ec99e3794bec1e38fbbb130e71172e4759,
and I think I can give a short preliminary report on that earlier topic
tomorrow. I would be glad to start looking into this new topic
the day after tomorrow.

Thank you both for providing me an opportunity to improve myself ;-)

Thanks again
Zhouyi



* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-08  5:46   ` rcutorture’s " Zhouyi Zhou
@ 2022-02-08  6:08     ` Zhouyi Zhou
  0 siblings, 0 replies; 15+ messages in thread
From: Zhouyi Zhou @ 2022-02-08  6:08 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: rcu, Paul Menzel, linuxppc-dev, w

Hi,

The mailing list forwards emails to me in periodic batches, so I am very
sorry that I did not see Willy's email until I visited
https://lore.kernel.org/rcu/20220207180901.GB14608@1wt.eu/T/#u. I am
also very interested in testing Willy's proposal.

Thanks a lot
Zhouyi



* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-07 16:44 rcutorture’s init segfaults in ppc64le VM Paul Menzel
  2022-02-07 17:51 ` Paul E. McKenney
@ 2022-02-08 10:09 ` Michael Ellerman
  2022-02-08 12:12   ` Paul Menzel
  1 sibling, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2022-02-08 10:09 UTC (permalink / raw)
  To: Paul Menzel, Paul E. McKenney; +Cc: rcu, linuxppc-dev

Paul Menzel <pmenzel@molgen.mpg.de> writes:
> Dear Linux folks,

Hi Paul,

> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux 
> 5.17-rc2+ with rcutorture tests

I'm not sure if that's the host kernel version or the version you're
using of rcutorture? Can you tell us the sha1 of your host kernel and of
the tree you're running rcutorture from?

>      $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>
> the built init
>
>      $ file tools/testing/selftests/rcutorture/initrd/init
>      tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB 
> executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically 
> linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for 
> GNU/Linux 3.10.0, stripped

Mine looks pretty much identical:

  $ file tools/testing/selftests/rcutorture/initrd/init
  tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB
  executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically
  linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for
  GNU/Linux 3.10.0, stripped


> segfaults in QEMU. From one of the log files

But mine doesn't segfault, it runs fine and the test completes.

What qemu version are you using?

I tried 4.2.1 and 6.2.0, both worked.


> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
>      [    1.119803][    T1] Run /init as init process
>      [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
>      [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
>      [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010

The disassembly from 3c40100f is:
  lis     r2,4111
  addi    r2,r2,31744
  mr      r9,r1
  rldicr  r1,r1,0,59
  li      r0,0
  stdu    r1,-128(r1)		<- fault
  mtlr    r0
  std     r0,0(r1)
  ld      r8,-32752(r2)


I think you'll find that's the code at the ELF entry point. You can
check with:

 $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
   Entry point address:               0x10000c0c

 $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
    10000c0c:   0e 10 40 3c     lis     r2,4110
    10000c10:   00 7b 42 38     addi    r2,r2,31488
    10000c14:   78 0b 29 7c     mr      r9,r1
    10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
    10000c1c:   00 00 00 38     li      r0,0
    10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
    10000c24:   a6 03 08 7c     mtlr    r0
    10000c28:   00 00 01 f8     std     r0,0(r1)
    10000c2c:   10 80 02 e9     ld      r8,-32752(r2)


The fault you're seeing is the first store using the stack pointer (r1),
which is set up by the kernel.

The fault address f0656d90 is weirdly low; the stack should be up near 128TB.

I'm not sure how we end up with a bad r1.
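
As a rough illustration of how low that address is, the stack pointer used by the faulting `stdu r1,-128(r1)` can be recovered from the reported fault address and compared with the 128TB mark. This is a sketch with hypothetical helper names; taking 128TB as exactly 2^47 bytes is an assumption made for the arithmetic.

```c
#include <stdint.h>

/* 128 TiB = 2^47 bytes; assumed value for the "near 128TB" stack area. */
#define STACK_TOP_128TB (UINT64_C(1) << 47)

/* stdu stores at r1 + disp, so the r1 it used is fault_addr - disp. */
uint64_t sp_at_fault(uint64_t fault_addr, int64_t disp)
{
	return fault_addr - (uint64_t)disp;
}

/* Distance in bytes from addr up to the 128 TiB mark. */
uint64_t bytes_below_128tb(uint64_t addr)
{
	return STACK_TOP_128TB - addr;
}
```

With the values from the log, `sp_at_fault(0xf0656d90, -128)` yields 0xf0656e10, an address just under 4 GiB and more than 127 TiB below the 128TB mark. Because the preceding `rldicr r1,r1,0,59` clears the low four bits, the r1 handed over at the entry point was at most 15 bytes above that.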

Can you dump some info about the kernel that was built, something like:

$ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux

And maybe paste/attach the full log; there might be a clue somewhere.

cheers


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-08 10:09 ` Michael Ellerman
@ 2022-02-08 12:12   ` Paul Menzel
  2022-02-08 12:27     ` Paul Menzel
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Paul Menzel @ 2022-02-08 12:12 UTC (permalink / raw)
  To: Michael Ellerman, Paul E. McKenney
  Cc: rcu, Zhouyi Zhou, linuxppc-dev, Willy Tarreau

Dear Michael,


Thank you for looking into this.

Am 08.02.22 um 11:09 schrieb Michael Ellerman:
> Paul Menzel writes:

[…]

>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>> 5.17-rc2+ with rcutorture tests
> 
> I'm not sure if that's the host kernel version or the version you're
> using of rcutorture? Can you tell us the sha1 of your host kernel and of
> the tree you're running rcutorture from?

The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
I am unable to find the exact sha1.

     $ more /proc/version
     Linux version 5.17.0-rc1+ (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022

The Linux tree from which I run rcutorture is at commit 
dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:

     $ git log --oneline -6
     207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
     8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
     a447541d925f ata: libata-sata: remove debounce delay by default
     afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
     f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
     dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

>>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>
>> the built init
>>
>>       $ file tools/testing/selftests/rcutorture/initrd/init
>>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
> 
> Mine looks pretty much identical:
> 
>    $ file tools/testing/selftests/rcutorture/initrd/init
>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
> 
>> segfaults in QEMU. From one of the log files
> 
> But mine doesn't segfault, it runs fine and the test completes.
> 
> What qemu version are you using?
> 
> I tried 4.2.1 and 6.2.0, both worked.

     $ qemu-system-ppc64le --version
     QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
     Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log

Sorry, that was the wrong path/test. The correct one for the excerpt 
below is:

 
/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log

(For TREE03, QEMU does not start the Linux kernel at all; that is, there is 
no output after:

     Booting Linux via __start() @ 0x0000000000400000 ...
)

>>       [    1.119803][    T1] Run /init as init process
>>       [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
>>       [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
>>       [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
> 
> The disassembly from 3c40100f is:
>    lis     r2,4111
>    addi    r2,r2,31744
>    mr      r9,r1
>    rldicr  r1,r1,0,59
>    li      r0,0
>    stdu    r1,-128(r1)		<- fault
>    mtlr    r0
>    std     r0,0(r1)
>    ld      r8,-32752(r2)
> 
> 
> I think you'll find that's the code at the ELF entry point. You can
> check with:
> 
>   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
>     Entry point address:               0x10000c0c
> 
>   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
>      10000c0c:   0e 10 40 3c     lis     r2,4110
>      10000c10:   00 7b 42 38     addi    r2,r2,31488
>      10000c14:   78 0b 29 7c     mr      r9,r1
>      10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
>      10000c1c:   00 00 00 38     li      r0,0
>      10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
>      10000c24:   a6 03 08 7c     mtlr    r0
>      10000c28:   00 00 01 f8     std     r0,0(r1)
>      10000c2c:   10 80 02 e9     ld      r8,-32752(r2)
> 
> The fault you're seeing is the first store using the stack pointer (r1),
> which is setup by the kernel.
> 
> The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
> 
> I'm not sure how we end up with a bad r1.
> 
> Can you dump some info about the kernel that was built, something like:
> 
> $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
> 
> And maybe paste/attach the full log, maybe there's a clue somewhere.

You can now download the content of 
`/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
[1, 65 MB].

Can you reproduce the segmentation fault with the line below?

     $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
     -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
     -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
     -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
     torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
     locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
     locktorture.shutdown_secs=60 locktorture.verbose=1"


Kind regards,

Paul


[1]: 
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-08 12:12   ` Paul Menzel
@ 2022-02-08 12:27     ` Paul Menzel
  2022-02-11  1:48     ` Michael Ellerman
  2022-03-10  2:37     ` Zhouyi Zhou
  2 siblings, 0 replies; 15+ messages in thread
From: Paul Menzel @ 2022-02-08 12:27 UTC (permalink / raw)
  To: Michael Ellerman, Paul E. McKenney
  Cc: rcu, Zhouyi Zhou, linuxppc-dev, Willy Tarreau

[Correct sha1 for test for 2022.02.01-21.52.37]


Am 08.02.22 um 13:12 schrieb Paul Menzel:
> Dear Michael,
> 
> 
> Thank you for looking into this.
> 
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>> Paul Menzel writes:
> 
> […]
> 
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>>
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
> 
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
> I am unable to find the exact sha1.
> 
>      $ more /proc/version
>      Linux version 5.17.0-rc1+ (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> 
> The Linux tree, from where I run rcutorture from, is at commit 
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
> 
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3

I was able to reproduce this with the above, but the report and the 
attached logs at the end are from:

     $ git log --oneline -6 b37a34a8cf5a
     b37a34a8cf5a Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
     9a78ddead89a ata: libata-sata: improve sata_link_debounce()
     567da2eaf099 ata: libata-sata: remove debounce delay by default
     70ae61851660 ata: libata-sata: introduce struct sata_deb_timing
     9ebb6433d9c3 ata: libata-sata: Simplify sata_link_resume() interface
     26291c54e111 (tag: v5.17-rc2) Linux 5.17-rc2

>>>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>       $ file tools/testing/selftests/rcutorture/initrd/init
>>>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>
>> Mine looks pretty much identical:
>>
>>    $ file tools/testing/selftests/rcutorture/initrd/init
>>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>
>>> segfaults in QEMU. From one of the log files
>>
>> But mine doesn't segfault, it runs fine and the test completes.
>>
>> What qemu version are you using?
>>
>> I tried 4.2.1 and 6.2.0, both worked.
> 
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project 
> developers
> 
>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log 
>>>
> 
> Sorry, that was the wrong path/test. The correct one for the excerpt 
> below is:
> 
> 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log 
> 
> 
> (For TREE03, QEMU does not start the Linux kernel at all, that means no 
> output after:
> 
>      Booting Linux via __start() @ 0x0000000000400000 ...
> )
> 
>>>       [    1.119803][    T1] Run /init as init process
>>>       [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
>>>       [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
>>>       [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
>>
>> The disassembly from 3c40100f is:
>>    lis     r2,4111
>>    addi    r2,r2,31744
>>    mr      r9,r1
>>    rldicr  r1,r1,0,59
>>    li      r0,0
>>    stdu    r1,-128(r1)        <- fault
>>    mtlr    r0
>>    std     r0,0(r1)
>>    ld      r8,-32752(r2)
>>
>>
>> I think you'll find that's the code at the ELF entry point. You can
>> check with:
>>
>>   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep 
>> Entry
>>     Entry point address:               0x10000c0c
>>
>>   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep 
>> -m 1 -A 8 10000c0c
>>      10000c0c:   0e 10 40 3c     lis     r2,4110
>>      10000c10:   00 7b 42 38     addi    r2,r2,31488
>>      10000c14:   78 0b 29 7c     mr      r9,r1
>>      10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
>>      10000c1c:   00 00 00 38     li      r0,0
>>      10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
>>      10000c24:   a6 03 08 7c     mtlr    r0
>>      10000c28:   00 00 01 f8     std     r0,0(r1)
>>      10000c2c:   10 80 02 e9     ld      r8,-32752(r2)
>>
>> The fault you're seeing is the first store using the stack pointer (r1),
>> which is setup by the kernel.
>>
>> The fault address f0656d90 is weirdly low, the stack should be up near 
>> 128TB.
>>
>> I'm not sure how we end up with a bad r1.
>>
>> Can you dump some info about the kernel that was built, something like:
>>
>> $ file 
>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux 
>>
>> And maybe paste/attach the full log, maybe there's a clue somewhere.
> 
> You can now download the content of 
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
> [1, 65 MB].
> 
> Can you reproduce the segmentation fault with the line below?
> 
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial 
> stdio -m 512 -kernel 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux 
> -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> torture.disable_onoff_at_boot locktorture.onoff_interval=3 
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15 
> locktorture.shutdown_secs=60 locktorture.verbose=1"
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-08 12:12   ` Paul Menzel
  2022-02-08 12:27     ` Paul Menzel
@ 2022-02-11  1:48     ` Michael Ellerman
  2022-02-11 14:19       ` Paul Menzel
  2022-03-10  2:37     ` Zhouyi Zhou
  2 siblings, 1 reply; 15+ messages in thread
From: Michael Ellerman @ 2022-02-11  1:48 UTC (permalink / raw)
  To: Paul Menzel, Paul E. McKenney
  Cc: rcu, Zhouyi Zhou, linuxppc-dev, Willy Tarreau

Paul Menzel <pmenzel@molgen.mpg.de> writes:
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>> Paul Menzel writes:
>
> […]
>
>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>> 5.17-rc2+ with rcutorture tests
>> 
>> I'm not sure if that's the host kernel version or the version you're
>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>> the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately, 
> I am unable to find the exact sha1.
>
>      $ more /proc/version
>      Linux version 5.17.0-rc1+ 
> (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu 
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022

OK. In general rc1 kernels can have issues, so it might be worth
rebooting the host into either v5.17-rc3 or a distro or stable kernel.
Just to rule out any issues on the host.

> The Linux tree, from where I run rcutorture from, is at commit 
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems 
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
>>>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>
>>> the built init
>>>
>>>       $ file tools/testing/selftests/rcutorture/initrd/init
>>>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>> 
>> Mine looks pretty much identical:
>> 
>>    $ file tools/testing/selftests/rcutorture/initrd/init
>>    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>> 
>>> segfaults in QEMU. From one of the log files
>> 
>> But mine doesn't segfault, it runs fine and the test completes.
>> 
>> What qemu version are you using?
>> 
>> I tried 4.2.1 and 6.2.0, both worked.
>
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

OK, that's one difference between our setups. I'd be surprised if it
explains this bug, but I guess anything's possible.


>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt 
> below is:
>
>  
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all, that means no 
> output after:
>
>      Booting Linux via __start() @ 0x0000000000400000 ...

OK yeah I see that too.

Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
seems to fix it.

I still see some preempt-related warnings; we clearly have some bugs
with preempt enabled.
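
The "threadirqs" experiment can be tried without editing the tree by generating a filtered copy of the boot-parameter file. The following is a minimal sketch: `strip_boot_param` is a hypothetical helper, and the sample file contents are made up for illustration rather than copied from TREE03.boot.

```sh
#!/bin/sh
# Print a boot-parameter file with one parameter removed.
# strip_boot_param is a hypothetical helper; it writes to stdout and
# leaves the input file untouched.
#   $1: parameter to drop (exact, whole-word match)
#   $2: file to read
strip_boot_param() {
    awk -v drop="$1" '{
        out = ""
        # Rebuild the line from every field except the dropped one.
        for (i = 1; i <= NF; i++)
            if ($i != drop)
                out = out (out == "" ? "" : " ") $i
        # Keep originally empty lines, skip lines emptied by the removal.
        if (out != "" || NF == 0)
            print out
    }' "$2"
}

# Demonstration on a temporary file with made-up contents.
sample=$(mktemp)
printf 'example.a=1 threadirqs example.b=2\nthreadirqs\nexample.c=3\n' > "$sample"
strip_boot_param threadirqs "$sample"
rm -f "$sample"
```

Matching whole words only means a parameter that merely contains the string "threadirqs" would be left alone.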

> You can now download the content of 
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01` 
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial 
> stdio -m 512 -kernel 
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux 
> -append "debug_boot_weak_hash panic=-1 console=ttyS0 
> torture.disable_onoff_at_boot locktorture.onoff_interval=3 
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15 
> locktorture.shutdown_secs=60 locktorture.verbose=1"

That works fine for me, boots and runs the test, then shuts down.

I assume you see the segfault on every boot, not intermittently?

So the differences between our setups are the host kernel and the qemu
version. Can you try a different host kernel easily?

The other thing would be to try a different qemu version. You might need
to build from source, but it's not that hard :)

cheers


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-11  1:48     ` Michael Ellerman
@ 2022-02-11 14:19       ` Paul Menzel
  2022-02-11 15:42         ` Paul Menzel
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Menzel @ 2022-02-11 14:19 UTC (permalink / raw)
  To: Michael Ellerman, Paul E. McKenney
  Cc: rcu, Zhouyi Zhou, linuxppc-dev, Willy Tarreau

Dear Michael,


Am 11.02.22 um 02:48 schrieb Michael Ellerman:
> Paul Menzel writes:
>> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>>> Paul Menzel writes:
>>
>> […]
>>
>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>> 5.17-rc2+ with rcutorture tests
>>>
>>> I'm not sure if that's the host kernel version or the version you're
>>> using of rcutorture? Can you tell us the sha1 of your host kernel and of
>>> the tree you're running rcutorture from?
>>
>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>> I am unable to find the exact sha1.
>>
>>       $ more /proc/version
>>       Linux version 5.17.0-rc1+ (x@eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> 
> OK. In general rc1 kernels can have issues, so it might be worth
> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
> Just to rule out any issues on the host.

Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.

     $ more /proc/version
     Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022

I have to do more tests, but it could be LLVM/clang related.
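
Since the two host kernels differ in toolchain as well as version, it can help to record which compiler built the running kernel. The following is a rough sketch that classifies a /proc/version string; `kernel_toolchain` is a hypothetical helper, and the matching is based only on the two strings quoted in this thread.

```sh
#!/bin/sh
# Classify the compiler named in a /proc/version string.
# kernel_toolchain is a hypothetical helper; the patterns are derived from
# the two version strings quoted in this thread and may miss other formats.
kernel_toolchain() {
    case "$1" in
        *"clang version"*) echo clang ;;
        *gcc*)             echo gcc ;;
        *)                 echo unknown ;;
    esac
}

# Typical use on a running system:
#   kernel_toolchain "$(cat /proc/version)"
kernel_toolchain "Linux version 5.17.0-rc1+ (Ubuntu clang version 13.0.0-2, LLD 13.0.0)"   # prints: clang
```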

>> The Linux tree, from where I run rcutorture from, is at commit
>> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>>
>>       $ git log --oneline -6
>>       207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems with rcutorture on ppc64le: allmodconfig(2) and other failures
>>       8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>>       a447541d925f ata: libata-sata: remove debounce delay by default
>>       afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>>       f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>>       dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>>
>>>>        $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
>>>>
>>>> the built init
>>>>
>>>>        $ file tools/testing/selftests/rcutorture/initrd/init
>>>>        tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
>>>
>>> Mine looks pretty much identical:
>>>
>>>     $ file tools/testing/selftests/rcutorture/initrd/init
>>>     tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
>>>
>>>> segfaults in QEMU. From one of the log files
>>>
>>> But mine doesn't segfault, it runs fine and the test completes.
>>>
>>> What qemu version are you using?
>>>
>>> I tried 4.2.1 and 6.2.0, both worked.
>>
>>       $ qemu-system-ppc64le --version
>>       QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>>       Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
> 
> OK, that's one difference between our setups, but I'd be surprised if it
> explains this bug, but I guess anything's possible.
> 
>>>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>>
>> Sorry, that was the wrong path/test. The correct one for the excerpt
>> below is:
>>
>>   
>> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>>
>> (For TREE03, QEMU does not start the Linux kernel at all, that means no
>> output after:
>>
>>       Booting Linux via __start() @ 0x0000000000400000 ...
> 
> OK yeah I see that too.
> 
> Removing "threadirqs" from tools/testing/selftests/rcutorture/configs/rcu/TREE03.boot
> seems to fix it.

Nice find. I have no idea what that means, though.

> I still see some preempt related warnings, we clearly have some bugs
> with preempt enabled.
> 
>> You can now download the content of
>> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
>> [1, 65 MB].
>>
>> Can you reproduce the segmentation fault with the line below?
>>
>>       $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8 \
>>       -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial stdio -m 512 \
>>       -kernel /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux \
>>       -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
>>       torture.disable_onoff_at_boot locktorture.onoff_interval=3 \
>>       locktorture.onoff_holdoff=30 locktorture.stat_interval=15 \
>>       locktorture.shutdown_secs=60 locktorture.verbose=1"
> 
> That works fine for me, boots and runs the test, then shuts down.
> 
> I assume you see the segfault on every boot, not intermittently?
> 
> So the differences between our setups are the host kernel and the qemu
> version. Can you try a different host kernel easily?
> 
> The other thing would be to try a different qemu version, you might need
> to build from source, but it's not that hard :)

Indeed. I needed to find a current Meson, but a different QEMU version 
didn’t make a difference. As found out above, it’s related to the Linux kernel.


Kind regards,

Paul


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-11 14:19       ` Paul Menzel
@ 2022-02-11 15:42         ` Paul Menzel
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Menzel @ 2022-02-11 15:42 UTC (permalink / raw)
  To: Michael Ellerman, Paul E. McKenney
  Cc: rcu, Zhouyi Zhou, linuxppc-dev, Willy Tarreau

Dear Michael,


Am 11.02.22 um 15:19 schrieb Paul Menzel:

> Am 11.02.22 um 02:48 schrieb Michael Ellerman:
>> Paul Menzel writes:
>>> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
>>>> Paul Menzel writes:
>>>
>>> […]
>>>
>>>>> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
>>>>> 5.17-rc2+ with rcutorture tests
>>>>
>>>> I'm not sure if that's the host kernel version or the version you're
>>>> using of rcutorture? Can you tell us the sha1 of your host kernel 
>>>> and of the tree you're running rcutorture from?
>>>
>>> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
>>> I am unable to find the exact sha1.
>>>
>>>       $ more /proc/version
>>>       Linux version 5.17.0-rc1+ (x@eddb.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
>>
>> OK. In general rc1 kernels can have issues, so it might be worth
>> rebooting the host into either v5.17-rc3 or a distro or stable kernel.
>> Just to rule out any issues on the host.
> 
> Yes, that was a good test. It works with Ubuntu’s 5.13 Linux kernel.
> 
>      $ more /proc/version
>      Linux version 5.13.0-28-generic (buildd@bos02-ppc64el-013) (gcc (Ubuntu 11.2.0-7ubuntu2) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.37) #31-Ubuntu SMP Thu Jan 13 17:40:19 UTC 2022
> 
> I have to do more tests, but it could be LLVM/clang related.

With commit f1baf68e1383 (Merge tag 'net-5.17-rc4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net) plus the ata 
patches on top, built with GCC, I am unable to reproduce the issue. 
Previously, I had built it with

     make -j100 LLVM=1 LLVM_IAS=0 bindeb-pkg

[…]


Kind regards,

Paul


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-02-08 12:12   ` Paul Menzel
  2022-02-08 12:27     ` Paul Menzel
  2022-02-11  1:48     ` Michael Ellerman
@ 2022-03-10  2:37     ` Zhouyi Zhou
  2022-03-10  4:48       ` Paul E. McKenney
  2022-03-10  8:10       ` Paul Menzel
  2 siblings, 2 replies; 15+ messages in thread
From: Zhouyi Zhou @ 2022-03-10  2:37 UTC (permalink / raw)
  To: Paul Menzel; +Cc: rcu, linuxppc-dev, Willy Tarreau, Paul E. McKenney

Dear Paul

I tried to reproduce the bug in a ppc64 VM at Oregon State University,
using the vmlinux extracted from
https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz

The ppc64 VM in which I run QEMU (without hardware acceleration) runs:
Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)


The QEMU command I use to test:
$ cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01
$ qemu-system-ppc64 -nographic -smp cores=2,threads=1 -net none -M pseries \
  -nodefaults -device spapr-vscsi -serial file:/tmp/console.log \
  -m 512 -kernel ./vmlinux \
  -append "debug_boot_weak_hash panic=-1 console=ttyS0 \
  rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30 \
  rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3 \
  rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs \
  tree.use_softirq=0 rcutorture.n_barrier_cbs=4 \
  rcutorture.stat_interval=15 rcutorture.shutdown_secs=1800 \
  rcutorture.test_no_idle_hz=1 rcutorture.verbose=1"

The console.log is uploaded to:
http://154.223.142.244/logs/20220310/console.paul.log
The log shows that an illegal instruction causes the trouble:
[    4.246387][    T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
[    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
[    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438


Meanwhile, a vmlinux that I compiled myself runs smoothly.

Then I modified mkinitrd.sh to make init crash deliberately:
http://154.223.142.244/logs/20220310/mkinitrd.sh
The log shows a segfault (instead of an illegal instruction):
http://154.223.142.244/logs/20220310/console.zhouyi.log

Then I used gdb to debug the init on the host:
ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb tools/testing/selftests/rcutorture/initrd/init
(gdb) run
Starting program: /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init

Program received signal SIGSEGV, Segmentation fault.
0x0000000010000b2c in ?? ()
(gdb) x/10i $pc
=> 0x10000b2c:    stw     r9,0(r9)
   0x10000b30:    trap
   0x10000b34:    .long 0x0
   0x10000b38:    .long 0x0
   0x10000b3c:    .long 0x0
   0x10000b40:    lis     r2,4110
   0x10000b44:    addi    r2,r2,31488
   0x10000b48:    mr      r9,r1
   0x10000b4c:    rldicr  r1,r1,0,59
   0x10000b50:    li      r0,0
(gdb) p $r9
$1 = 0
(gdb) x/30x $pc - 0x30
0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
which matches the hex content of
http://154.223.142.244/logs/20220310/console.zhouyi.log:
[    5.077431][    T1] init[1]: segfault (11) at 0 nip 10000b2c lr 10001024 code 1 in init[10000000+d0000]
[    5.087167][    T1] init[1]: code: 38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
[    5.093987][    T1] init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000
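
The faulting instruction in such "code:" lines is the word enclosed in angle brackets. A minimal sketch for pulling it out and reading its primary opcode (the top 6 bits of the word) is shown below; both helper names are hypothetical. For <91290000> the primary opcode is 36, consistent with the `stw` that gdb shows, and for <f821ff81> from the original report it is 62, consistent with the `stdu` in the earlier disassembly. The opcode of <44000001> is not decoded in this thread and would need to be looked up in the Power ISA.

```sh
#!/bin/sh
# Helpers for reading the kernel's "code:" lines. Both names are
# hypothetical and exist only for this illustration.

# Print the 8-hex-digit word enclosed in <...>, if the line contains one.
faulting_word() {
    printf '%s\n' "$1" | sed -n 's/.*<\([0-9a-f]\{8\}\)>.*/\1/p'
}

# Print the primary opcode: the top 6 bits of the 32-bit instruction word.
primary_opcode() {
    printf '%d\n' $(( 0x$1 >> 26 ))
}

line='init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000'
word=$(faulting_word "$line")
echo "$word"              # prints: 91290000
primary_opcode "$word"    # prints: 36
```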


Conclusion: something may be going wrong when the init is packed into
vmlinux in your environment.

I will continue to investigate this interesting problem with you.

Thanks
Kind Regards
Zhouyi



On Tue, Feb 8, 2022 at 8:12 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Michael,
>
>
> Thank you for looking into this.
>
> Am 08.02.22 um 11:09 schrieb Michael Ellerman:
> > Paul Menzel writes:
>
> […]
>
> >> On the POWER8 server IBM S822LC running Ubuntu 21.10, building Linux
> >> 5.17-rc2+ with rcutorture tests
> >
> > I'm not sure if that's the host kernel version or the version you're
> > using of rcutorture? Can you tell us the sha1 of your host kernel and of
> > the tree you're running rcutorture from?
>
> The host system runs Linux 5.17-rc1+ started with kexec. Unfortunately,
> I am unable to find the exact sha1.
>
>      $ more /proc/version
>      Linux version 5.17.0-rc1+
> (pmenzel@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu
> clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28
> 17:13:04 CET 2022
>
> The Linux tree, from where I run rcutorture from, is at commit
> dfd42facf1e4 (Linux 5.17-rc3) with four patches on top:
>
>      $ git log --oneline -6
>      207cec79e752 (HEAD -> master, origin/master, origin/HEAD) Problems
> with rcutorture on ppc64le: allmodconfig(2) and other failures
>      8c82f96fbe57 ata: libata-sata: improve sata_link_debounce()
>      a447541d925f ata: libata-sata: remove debounce delay by default
>      afd84e1eeafc ata: libata-sata: introduce struct sata_deb_timing
>      f4caf7e48b75 ata: libata-sata: Simplify sata_link_resume() interface
>      dfd42facf1e4 (tag: v5.17-rc3) Linux 5.17-rc3
>
> >>       $ tools/testing/selftests/rcutorture/bin/torture.sh --duration 10
> >>
> >> the built init
> >>
> >>       $ file tools/testing/selftests/rcutorture/initrd/init
> >>       tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=0ded0e45649184a296f30d611f7a03cc51ecb616, for GNU/Linux 3.10.0, stripped
> >
> > Mine looks pretty much identical:
> >
> >    $ file tools/testing/selftests/rcutorture/initrd/init
> >    tools/testing/selftests/rcutorture/initrd/init: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=86078bf6e5d54ab0860d36aa9a65d52818b972c8, for GNU/Linux 3.10.0, stripped
> >
> >> segfaults in QEMU. From one of the log files
> >
> > But mine doesn't segfault, it runs fine and the test completes.
> >
> > What qemu version are you using?
> >
> > I tried 4.2.1 and 6.2.0, both worked.
>
>      $ qemu-system-ppc64le --version
>      QEMU emulator version 6.0.0 (Debian 1:6.0+dfsg-2expubuntu1.1)
>      Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
>
> >> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/console.log
>
> Sorry, that was the wrong path/test. The correct one for the excerpt
> below is:
>
>
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/console.log
>
> (For TREE03, QEMU does not start the Linux kernel at all; that is, there
> is no output after:
>
>      Booting Linux via __start() @ 0x0000000000400000 ...
> )
>
> >>       [    1.119803][    T1] Run /init as init process
> >>       [    1.122011][    T1] init[1]: segfault (11) at f0656d90 nip 10000a18 lr 0 code 1 in init[10000000+d0000]
> >>       [    1.124863][    T1] init[1]: code: 2c2903e7 f9210030 4081ff84 4bffff58 00000000 01000000 00000580 3c40100f
> >>       [    1.128823][    T1] init[1]: code: 38427c00 7c290b78 782106e4 38000000 <f821ff81> 7c0803a6 f8010000 e9028010
> >
> > The disassembly from 3c40100f is:
> >    lis     r2,4111
> >    addi    r2,r2,31744
> >    mr      r9,r1
> >    rldicr  r1,r1,0,59
> >    li      r0,0
> >    stdu    r1,-128(r1)                <- fault
> >    mtlr    r0
> >    std     r0,0(r1)
> >    ld      r8,-32752(r2)
> >
> >
> > I think you'll find that's the code at the ELF entry point. You can
> > check with:
> >
> >   $ readelf -e tools/testing/selftests/rcutorture/initrd/init | grep Entry
> >     Entry point address:               0x10000c0c
> >
> >   $ objdump -d tools/testing/selftests/rcutorture/initrd/init | grep -m 1 -A 8 10000c0c
> >      10000c0c:   0e 10 40 3c     lis     r2,4110
> >      10000c10:   00 7b 42 38     addi    r2,r2,31488
> >      10000c14:   78 0b 29 7c     mr      r9,r1
> >      10000c18:   e4 06 21 78     rldicr  r1,r1,0,59
> >      10000c1c:   00 00 00 38     li      r0,0
> >      10000c20:   81 ff 21 f8     stdu    r1,-128(r1)
> >      10000c24:   a6 03 08 7c     mtlr    r0
> >      10000c28:   00 00 01 f8     std     r0,0(r1)
> >      10000c2c:   10 80 02 e9     ld      r8,-32752(r2)
> >
> > The fault you're seeing is the first store using the stack pointer (r1),
> > which is setup by the kernel.
> >
> > The fault address f0656d90 is weirdly low, the stack should be up near 128TB.
> >
> > I'm not sure how we end up with a bad r1.
> >
> > Can you dump some info about the kernel that was built, something like:
> >
> > $ file /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-rcutorture/TREE03/vmlinux
> >
> > And maybe paste/attach the full log, maybe there's a clue somewhere.
>
> You can now download the content of
> `/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01`
> [1, 65 MB].
>
> Can you reproduce the segmentation fault with the line below?
>
>      $ qemu-system-ppc64 -enable-kvm -nographic -smp cores=1,threads=8
> -net none -enable-kvm -M pseries -nodefaults -device spapr-vscsi -serial
> stdio -m 512 -kernel
> /dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01/vmlinux
> -append "debug_boot_weak_hash panic=-1 console=ttyS0
> torture.disable_onoff_at_boot locktorture.onoff_interval=3
> locktorture.onoff_holdoff=30 locktorture.stat_interval=15
> locktorture.shutdown_secs=60 locktorture.verbose=1"
>
>
> Kind regards,
>
> Paul
>
>
> [1]:
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
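
The analysis quoted above identifies the marked word <f821ff81> as the first
store through the stack pointer. As a rough cross-check, the sketch below
decodes that word as a DS-form store and derives the value r1 must have held
at the fault. The helper name decode_ds_form is hypothetical, and reading
"128TB" as 2**47 bytes is an assumption.

```python
# Values copied from the quoted console log.
FAULT_WORD = 0xF821FF81      # marked <f821ff81> in the "code:" line
FAULT_ADDR = 0xF0656D90      # "segfault (11) at f0656d90"

def decode_ds_form(word):
    """Split a 32-bit PowerPC DS-form instruction into its fields."""
    opcode = word >> 26              # primary opcode, top 6 bits
    rs = (word >> 21) & 0x1F         # source register
    ra = (word >> 16) & 0x1F         # base register
    disp = word & 0xFFFC             # DS field, already scaled to bytes
    if disp & 0x8000:                # sign-extend from 16 bits
        disp -= 0x10000
    xo = word & 0x3                  # extended opcode: 0 = std, 1 = stdu
    return opcode, rs, ra, disp, xo

opcode, rs, ra, disp, xo = decode_ds_form(FAULT_WORD)
mnemonic = {0: "std", 1: "stdu"}.get(xo, "unknown") if opcode == 62 else "unknown"
print(f"{mnemonic} r{rs},{disp}(r{ra})")

# stdu stores to r1 + disp, so r1 at the time of the fault was:
r1 = FAULT_ADDR - disp
print("r1 =", hex(r1), "| 16-byte aligned:", r1 % 16 == 0)

# Assumption: "128TB" in the mail means 2**47 bytes.
print(f"r1 is about {r1 / 2**30:.2f} GiB; the expected region is near {2**47 // 2**40} TiB")
```

This only restates what the log already implies: the store itself is the
ordinary entry-point stack setup, and the unusual part is the value in r1.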


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-03-10  2:37     ` Zhouyi Zhou
@ 2022-03-10  4:48       ` Paul E. McKenney
  2022-03-10  8:10       ` Paul Menzel
  1 sibling, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2022-03-10  4:48 UTC (permalink / raw)
  To: Zhouyi Zhou; +Cc: rcu, Paul Menzel, linuxppc-dev, Willy Tarreau

On Thu, Mar 10, 2022 at 10:37:12AM +0800, Zhouyi Zhou wrote:
> Dear Paul
> 
> I tried to reproduce the bug in a ppc64 VM at Oregon State University
> using the vmlinux extracted from
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> 
> The ppc64 VM in which I ran QEMU without hardware acceleration runs:
> Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc
> version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb
> 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> 
> 
> The qemu command I use to test:
> cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> $qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
> pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> rcutorture.verbose=1"
> 
> The console.log is uploaded to:
> http://154.223.142.244/logs/20220310/console.paul.log
> The log tells us that an illegal instruction causes the trouble:
> [    4.246387][    T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
> [    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
> [    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438
> 
> 
> Meanwhile, the vmlinux that I compiled myself runs smoothly.
> 
> Then I modified mkinitrd.sh to make init crash deliberately:
> http://154.223.142.244/logs/20220310/mkinitrd.sh
> The log tells us it is a segfault (instead of an illegal instruction):
> http://154.223.142.244/logs/20220310/console.zhouyi.log
> 
> Then I used gdb to debug the init on the host:
> ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> tools/testing/selftests/rcutorture/initrd/init
> (gdb) run
> Starting program:
> /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000010000b2c in ?? ()
> (gdb) x/10i $pc
> => 0x10000b2c:    stw     r9,0(r9)
>    0x10000b30:    trap
>    0x10000b34:    .long 0x0
>    0x10000b38:    .long 0x0
>    0x10000b3c:    .long 0x0
>    0x10000b40:    lis     r2,4110
>    0x10000b44:    addi    r2,r2,31488
>    0x10000b48:    mr      r9,r1
>    0x10000b4c:    rldicr  r1,r1,0,59
>    0x10000b50:    li      r0,0
> (gdb) p $r9
> $1 = 0
> (gdb) x/30x $pc - 0x30
> 0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
> 0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
> 0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
> 0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
> which matches the hex content of
> http://154.223.142.244/logs/20220310/console.zhouyi.log:
> [    5.077431][    T1] init[1]: segfault (11) at 0 nip 10000b2c lr 10001024 code 1 in init[10000000+d0000]
> [    5.087167][    T1] init[1]: code: 38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
> [    5.093987][    T1] init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000
> 
> 
> Conclusion: something might be going wrong when the init is packed into
> the vmlinux in your environment.

Quite possibly!  Or the compiler might not be being invoked properly
by the mkinitrd.sh script.

> I will continue to do research on this interesting problem with you.

Please let me know how it goes!

							Thanx, Paul

> Thanks
> Kind Regards
> Zhouyi
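
The claim quoted above, that the gdb hex dump matches the "code:" lines in
console.zhouyi.log, can be checked mechanically. The sketch below is one way
to do that; the helper names gdb_words and console_words are hypothetical, and
the two text blocks are copied from the quoted mail with the log prefixes
removed.

```python
# Words from the gdb "x/30x $pc - 0x30" dump quoted in the mail.
GDB_DUMP = """
0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
"""

# The two "code:" lines from console.zhouyi.log, timestamps stripped.
CONSOLE_CODE = """
38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000
"""

def gdb_words(dump):
    """Return the instruction words of a gdb x/x dump, dropping addresses."""
    words = []
    for line in dump.strip().splitlines():
        _, _, rest = line.partition(":")
        words += [int(w, 16) for w in rest.split()]
    return words

def console_words(text):
    """Return the words of the kernel "code:" lines and the index of the <marked> one."""
    tokens = text.split()
    words = [int(t.strip("<>"), 16) for t in tokens]
    marked = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return words, marked

gdb = gdb_words(GDB_DUMP)
console, marked = console_words(CONSOLE_CODE)

# The gdb dump starts at $pc - 0x30; each word is 4 bytes.
fault_pc = 0x10000AFC + 4 * marked
print("dumps match:", gdb == console)
print("marked word", hex(console[marked]), "at", hex(fault_pc))
```

The address computed for the marked word can then be compared with the nip
value reported in the segfault line.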


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-03-10  2:37     ` Zhouyi Zhou
  2022-03-10  4:48       ` Paul E. McKenney
@ 2022-03-10  8:10       ` Paul Menzel
  2022-03-10 22:13         ` Zhouyi Zhou
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Menzel @ 2022-03-10  8:10 UTC (permalink / raw)
  To: Zhouyi Zhou; +Cc: rcu, linuxppc-dev, Willy Tarreau, Paul E. McKenney

Dear Zhouyi,


Thank you for still looking into this.


On 10.03.22 at 03:37, Zhouyi Zhou wrote:

> I tried to reproduce the bug in a ppc64 VM at Oregon State University
> using the vmlinux extracted from
> https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> 
> The ppc64 VM in which I ran QEMU without hardware acceleration runs:
> Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> 
> 
> The qemu command I use to test:
> cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> $qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
> pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> console=ttyS0 rcutorture.onoff_interval=200
> rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> rcutorture.verbose=1"
> 
> The console.log is uploaded to:
> http://154.223.142.244/logs/20220310/console.paul.log
> The log tells us that an illegal instruction causes the trouble:
> [    4.246387][    T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
> [    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
> [    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438
> 
> 
> Meanwhile, the vmlinux that I compiled myself runs smoothly.

How did you build it? Using GCC or clang? I forgot whether the problem was 
reproducible only if the host Linux kernel was built with clang, or the 
VM kernel.

> Then I modified mkinitrd.sh to make init crash deliberately:
> http://154.223.142.244/logs/20220310/mkinitrd.sh

I only see the change:

     -
     +	int *ptr = 0;
     +	*ptr =  0;

> The log tells us it is a segfault (instead of an illegal instruction):
> http://154.223.142.244/logs/20220310/console.zhouyi.log
> 
> Then I used gdb to debug the init on the host:
> ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> tools/testing/selftests/rcutorture/initrd/init
> (gdb) run
> Starting program:
> /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000010000b2c in ?? ()
> (gdb) x/10i $pc
> => 0x10000b2c:    stw     r9,0(r9)
>     0x10000b30:    trap
>     0x10000b34:    .long 0x0
>     0x10000b38:    .long 0x0
>     0x10000b3c:    .long 0x0
>     0x10000b40:    lis     r2,4110
>     0x10000b44:    addi    r2,r2,31488
>     0x10000b48:    mr      r9,r1
>     0x10000b4c:    rldicr  r1,r1,0,59
>     0x10000b50:    li      r0,0
> (gdb) p $r9
> $1 = 0
> (gdb) x/30x $pc - 0x30
> 0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
> 0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
> 0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
> 0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
> which matches the hex content of
> http://154.223.142.244/logs/20220310/console.zhouyi.log:
> [    5.077431][    T1] init[1]: segfault (11) at 0 nip 10000b2c lr 10001024 code 1 in init[10000000+d0000]
> [    5.087167][    T1] init[1]: code: 38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
> [    5.093987][    T1] init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000
> 
> 
> Conclusion: something might be going wrong when the init is packed into
> the vmlinux in your environment.
> 
> I will continue to do research on this interesting problem with you.

As written, I think it’s a problem with LLVM/clang. Unfortunately, I 
won’t be able to retest before next week.


Kind regards,

Paul
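
For reference, the marked word <44000001> in the illegal-instruction log
quoted above can be decoded by hand. The sketch below does so under the
assumption that it is a system call instruction (primary opcode 17); the
helper name decode_syscall is hypothetical. In the Power ISA encoding, the two
low bits distinguish sc from scv, and scv was added in Power ISA v3.0. Whether
the CPU model used in that run lacks scv is not confirmed anywhere in this
thread.

```python
ILLEGAL_WORD = 0x44000001    # marked <44000001> in console.paul.log

def decode_syscall(word):
    """Classify a 32-bit word as sc or scv, returning assembler-like text."""
    opcode = word >> 26              # primary opcode, top 6 bits
    if opcode != 17:
        return "not an sc-family instruction"
    lev = (word >> 5) & 0x7F         # LEV field
    low = word & 0x3                 # the two low bits select the form
    if low == 0b10:
        return f"sc {lev}"
    if low == 0b01:
        return f"scv {lev}"
    return "reserved encoding"

print(decode_syscall(ILLEGAL_WORD))
# For comparison, the traditional system call instruction:
print(decode_syscall(0x44000002))
```

This is only a decoding aid; it does not by itself explain why the init in
that vmlinux reached this instruction.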


* Re: rcutorture’s init segfaults in ppc64le VM
  2022-03-10  8:10       ` Paul Menzel
@ 2022-03-10 22:13         ` Zhouyi Zhou
  0 siblings, 0 replies; 15+ messages in thread
From: Zhouyi Zhou @ 2022-03-10 22:13 UTC (permalink / raw)
  To: Paul Menzel; +Cc: rcu, linuxppc-dev, Willy Tarreau, Paul E. McKenney

Dear Paul

On Thu, Mar 10, 2022 at 4:10 PM Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Zhouyi,
>
>
> Thank you for still looking into this.
You are very welcome ;-)
>
>
> On 10.03.22 at 03:37, Zhouyi Zhou wrote:
>
> > I tried to reproduce the bug in a ppc64 VM at Oregon State University
> > using the vmlinux extracted from
> > https://owww.molgen.mpg.de/~pmenzel/rcutorture-2022.02.01-21.52.37-torture-locktorture-kasan-lock01.tar.xz
> >
> > The ppc64 VM in which I ran QEMU without hardware acceleration runs:
> > Linux version 5.4.0-100-generic (buildd@bos02-ppc64el-021) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #113-Ubuntu SMP Thu Feb 3 18:43:11 UTC 2022 (Ubuntu 5.4.0-100.113-generic 5.4.166)
> >
> >
> > The qemu command I use to test:
> > cd /tmp/dev/shm/linux/tools/testing/selftests/rcutorture/res/2022.02.01-21.52.37-torture/results-locktorture-kasan/LOCK01$
> > $qemu-system-ppc64   -nographic -smp cores=2,threads=1 -net none -M
> > pseries -nodefaults -device spapr-vscsi -serial file:/tmp/console.log
> > -m 512 -kernel ./vmlinux -append "debug_boot_weak_hash panic=-1
> > console=ttyS0 rcutorture.onoff_interval=200
> > rcutorture.onoff_holdoff=30 rcutree.gp_preinit_delay=12
> > rcutree.gp_init_delay=3 rcutree.gp_cleanup_delay=3
> > rcutree.kthread_prio=2 threadirqs tree.use_softirq=0
> > rcutorture.n_barrier_cbs=4 rcutorture.stat_interval=15
> > rcutorture.shutdown_secs=1800 rcutorture.test_no_idle_hz=1
> > rcutorture.verbose=1"
> >
> > The console.log is uploaded to:
> > http://154.223.142.244/logs/20220310/console.paul.log
> > The log tells us that an illegal instruction causes the trouble:
> > [    4.246387][    T1] init[1]: illegal instruction (4) at 1002c308 nip 1002c308 lr 10001684 code 1 in init[10000000+d0000]
> > [    4.251400][    T1] init[1]: code: f90d88c0 f92a0008 f9480008 7c2004ac 2c2d0000 f9490000 386d88d0 380000e8
> > [    4.253416][    T1] init[1]: code: 41820098 e92d8f98 75290010 4182008c <44000001> 2c2d0000 60000000 8902f438
> >
> >
> > Meanwhile, the vmlinux that I compiled myself runs smoothly.
>
> How did you build it? Using GCC or clang? I forgot whether the problem was
I built vmlinux with both GCC and clang. Both of the resulting kernels
run smoothly.
> reproducible only if the host Linux kernel was built with clang, or the
> VM kernel.
Yes, I remember this too. The dependence on how the host Linux kernel
is built makes things more complex.
>
> > Then I modified mkinitrd.sh to make init crash deliberately:
> > http://154.223.142.244/logs/20220310/mkinitrd.sh
>
> I only see the change:
>
>      -
>      +  int *ptr = 0;
>      +  *ptr =  0;
>
Yes, I made the segfault happen manually.
> > The log tells us it is a segfault (instead of an illegal instruction):
> > http://154.223.142.244/logs/20220310/console.zhouyi.log
> >
> > Then I used gdb to debug the init on the host:
> > ubuntu@zhouzhouyi-1:~/newkernel/linux-next$ gdb
> > tools/testing/selftests/rcutorture/initrd/init
> > (gdb) run
> > Starting program:
> > /home/ubuntu/newkernel/linux-next/tools/testing/selftests/rcutorture/initrd/init
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x0000000010000b2c in ?? ()
> > (gdb) x/10i $pc
> > => 0x10000b2c:    stw     r9,0(r9)
> >     0x10000b30:    trap
> >     0x10000b34:    .long 0x0
> >     0x10000b38:    .long 0x0
> >     0x10000b3c:    .long 0x0
> >     0x10000b40:    lis     r2,4110
> >     0x10000b44:    addi    r2,r2,31488
> >     0x10000b48:    mr      r9,r1
> >     0x10000b4c:    rldicr  r1,r1,0,59
> >     0x10000b50:    li      r0,0
> > (gdb) p $r9
> > $1 = 0
> > (gdb) x/30x $pc - 0x30
> > 0x10000afc:    0x38840040    0x387f0040    0xf8010040    0x48026919
> > 0x10000b0c:    0x60000000    0xe8010040    0x7c0803a6    0x4bffff24
> > 0x10000b1c:    0x00000000    0x01000000    0x00000180    0x39200000
> > 0x10000b2c:    0x91290000    0x7fe00008    0x00000000    0x00000000
> > which matches the hex content of
> > http://154.223.142.244/logs/20220310/console.zhouyi.log:
> > [    5.077431][    T1] init[1]: segfault (11) at 0 nip 10000b2c lr 10001024 code 1 in init[10000000+d0000]
> > [    5.087167][    T1] init[1]: code: 38840040 387f0040 f8010040 48026919 60000000 e8010040 7c0803a6 4bffff24
> > [    5.093987][    T1] init[1]: code: 00000000 01000000 00000180 39200000 <91290000> 7fe00008 00000000 00000000
> >
> >
> > Conclusion: something might be going wrong when the init is packed into
> > the vmlinux in your environment.
> >
> > I will continue to do research on this interesting problem with you.
>
> As written I think it’s a problem with LLVM/clang. Unfortunately, I
> won’t be able to retest before next week.
Roger that, no need to hurry ;-)

Kind regards
Zhouyi
> Kind regards,
>
> Paul


end of thread, other threads:[~2022-03-10 22:14 UTC | newest]

Thread overview: 15+ messages
2022-02-07 16:44 rcutorture’s init segfaults in ppc64le VM Paul Menzel
2022-02-07 17:51 ` Paul E. McKenney
2022-02-07 18:09   ` rcutorture's " Willy Tarreau
2022-02-08  5:46   ` rcutorture’s " Zhouyi Zhou
2022-02-08  6:08     ` Zhouyi Zhou
2022-02-08 10:09 ` Michael Ellerman
2022-02-08 12:12   ` Paul Menzel
2022-02-08 12:27     ` Paul Menzel
2022-02-11  1:48     ` Michael Ellerman
2022-02-11 14:19       ` Paul Menzel
2022-02-11 15:42         ` Paul Menzel
2022-03-10  2:37     ` Zhouyi Zhou
2022-03-10  4:48       ` Paul E. McKenney
2022-03-10  8:10       ` Paul Menzel
2022-03-10 22:13         ` Zhouyi Zhou
