From: "Toralf Förster" <toralf.foerster@gmx.de>
To: Richard Weinberger <richard.weinberger@gmail.com>
Cc: UML devel <user-mode-linux-devel@lists.sourceforge.net>
Subject: Re: [uml-devel] negative pid -516 possible ?
Date: Mon, 13 Jan 2014 20:54:10 +0100
Message-ID: <52D44462.8040808@gmx.de>
In-Reply-To: <CAFLxGvwrbAm5mRkje+TFgpr1Q+UoBEZQxmTaxvwv+6-GwGpMbA@mail.gmail.com>
On 01/13/2014 12:21 AM, Richard Weinberger wrote:
> On Sat, Jan 11, 2014 at 11:47 AM, Toralf Förster <toralf.foerster@gmx.de> wrote:
> I do fuzz testing with trinity (latest git version) on a stable 32-bit Gentoo Linux User Mode Linux image.
> The host runs a stable 32-bit vanilla 3.12.7 kernel; the guest runs the latest git tree plus 2 patches (attached).
>
> The trinity call in the UML guest is :
> $> trinity -q -l off -N 10000 -C 2 -x move_pages -x mremap -v /mnt/ramdisk
>
> After a while there's no more progress visible on the command line on the host system; the trinity process just seems to hang/idle. When this happens I can no longer ssh into the system. The system itself, however, keeps running. In another terminal I still see the output of this command:
>
>> Does it consume 100% CPU?
>
No.
It just doesn't allow new ssh connections. Existing ssh connections still work.
> $> ssh root@trinity "tail -f /var/log/messages"
>
> That's how I know that the system has not hung completely. top on the host gives me the pid of the linux executable, and gdb on that pid gives:
>
> $ date; sudo gdb /home/tfoerste/devel/linux/linux 25224 -n -batch -ex 'bt full'
> Sat Jan 11 11:36:47 CET 2014
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> No symbol table info available.
> #1 0x083d63ff in __nanosleep_nocancel ()
> No symbol table info available.
> #2 0x0807266c in idle_sleep (nsecs=602496380195307520) at arch/um/os-Linux/time.c:183
> ts = {tv_sec = 0, tv_nsec = 8436602}
> #3 0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
> No locals.
> #4 0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
> No locals.
> #5 cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
> No locals.
> #6 0x084215e9 in rest_init () at init/main.c:402
> pid = -516
> __func__ = "rest_init"
> #7 0x080487e1 in start_kernel () at init/main.c:656
> command_line = 0x85b8400 <command_line> "earlyprintk ubda=/home/tfoerste/virtual/uml/trinity ubdb=/mnt/ramdisk/trinity_swap eth0=tuntap,tap0,72:ef:3d:9f:c3:5a mem=1025M con0=fd:0,fd:1 con=pts rootfstype=ext4 root=98:0"
> #8 0x08049e42 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:48
> pid = -516
> __func__ = "start_kernel_proc"
> #9 0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
> fn = 0x0
> #10 0x00000000 in ?? ()
> No symbol table info available.
>
>
>
> Please note that BUG_ON was not triggered. For completeness, here are the gdb traces of all linux processes currently running on the host:
>
>> So let's forget the 516 issue for now.
>> What we know for now is that you manage to trigger a lockup within UML.
>
Agreed, especially because I also added this patch:
$ cat ~/devel/priv/uml/pid516_2.patch
--- init/main.c_orig	2014-01-12 16:43:48.585439158 +0100
+++ init/main.c 2014-01-12 16:44:01.706438453 +0100
@@ -389,6 +389,7 @@
BUG_ON(pid == -516);
rcu_read_lock();
kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns);
+ BUG_ON(pid == -516);
rcu_read_unlock();
complete(&kthreadd_done);
And it wasn't triggered (/me wonders whether the -516 is just garbage).
But I can narrow the problem down. In a still-open ssh session I ran:
$ lsof | grep t3
bash 6129 tfoerste cwd DIR 98,0 4096 734 /home/tfoerste/t3
logger 6135 tfoerste cwd DIR 98,0 4096 734 /home/tfoerste/t3
(t3 is the ~/t3 directory I cd into before I run trinity.)
After killing the logger command, the trinity batch continues:
$ ps xf -eo pid,start_time,command | grep trinity
6412 20:48 | \_ grep --colour=auto trinity
6129 19:17 \_ bash -c cd ~; sudo su -c 'if [[ -d ./t3 ]]; then sudo chmod -R a+rwx ./t3; sudo rm -rf ./t3; fi'; mkdir ./t3; cd ./t3; logger "17#-1, M=/mnt/ramdisk"; if [[ -n /mnt/ramdisk ]]; then if [[ -d /mnt/ramdisk/victims/v1 ]]; then sudo chmod -R a+rwx /mnt/ramdisk/victims/v1; sudo rm -rf /mnt/ramdisk/victims/v1; fi; mkdir -p /mnt/ramdisk/victims/v1/v2; for i in $(seq -w 0 99); do touch /mnt/ramdisk/victims/v1/v2/f$i 2>/dev/null; mkdir /mnt/ramdisk/victims/v1/v2/d$i 2>/dev/null; done; fi; trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6390 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6391 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6392 20:46 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6408 20:47 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
6410 20:48 \_ trinity -q -N 10000 -C 2 -x move_pages -x mremap -V /mnt/ramdisk/victims/v1/v2
FWIW, ssh into the UML guest is still not possible. So I'm pretty sure trinity really damages something there, but I'd expect such damage to show up somewhere in the logs, wouldn't it?
And finally, the batch trinity command now hangs again, and this time not even killing logger helps.
And a shutdown ("sudo halt; exit") hangs too.
>
>
> $ pgrep linux | xargs -n1 -I {} sudo gdb /home/tfoerste/devel/linux/linux {} -n -batch -ex 'bt'
> warning: process 1613 is already traced by process 25224
> ptrace: Operation not permitted.
> /home/tfoerste/1613: No such file or directory.
> No stack.
> warning: process 21849 is already traced by process 25224
> ptrace: Operation not permitted.
> /home/tfoerste/21849: No such file or directory.
> No stack.
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d63ff in __nanosleep_nocancel ()
> #2 0x0807266c in idle_sleep (nsecs=602496380205307520) at arch/um/os-Linux/time.c:183
> #3 0x0805fc0f in arch_cpu_idle () at arch/um/kernel/process.c:208
> #4 0x080a8971 in cpu_idle_loop () at kernel/cpu/idle.c:98
> #5 cpu_startup_entry (state=CPUHP_ONLINE) at kernel/cpu/idle.c:140
> #6 0x084215e9 in rest_init () at init/main.c:402
> #7 0x080487e1 in start_kernel () at init/main.c:656
> #8 0x08049e42 in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:48
> #9 0x0805f7cb in new_thread_handler () at arch/um/kernel/process.c:129
> #10 0x00000000 in ?? ()
>
> warning: process 25231 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083da446 in syscall ()
> #2 0x0806e861 in io_getevents (events=<optimized out>, ctx_id=<optimized out>, min_nr=<optimized out>, nr=<optimized out>, timeout=<optimized out>) at arch/um/os-Linux/aio.c:49
> #3 aio_thread (arg=0x0) at arch/um/os-Linux/aio.c:109
> #4 0x083db56e in clone ()
>
> warning: process 25232 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d82c2 in __read_nocancel ()
> #2 0x0806f3ff in read (__nbytes=<optimized out>, __buf=<optimized out>, __fd=<optimized out>) at /usr/include/bits/unistd.h:44
> #3 os_read_file (fd=-512, buf=0xfffffe00, len=-512) at arch/um/os-Linux/file.c:253
> #4 0x0806bafc in io_thread (arg=0x0) at arch/um/drivers/ubd_kern.c:1482
> #5 0x083db56e in clone ()
>
> warning: process 25233 is a cloned process
>
> warning: Could not load shared library symbols for linux-gate.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> 0xb7800424 in __kernel_vsyscall ()
> #0 0xb7800424 in __kernel_vsyscall ()
> #1 0x083d9132 in __poll_nocancel ()
> #2 0x08071114 in poll (__timeout=<optimized out>, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:46
> #3 write_sigio_thread (unused=0x0) at arch/um/os-Linux/sigio.c:61
> #4 0x083db56e in clone ()
> warning: process 25234 is a zombie - the process has already terminated
> ptrace: Operation not permitted.
> /home/tfoerste/25234: No such file or directory.
> No stack.
> ...
>
>
> Please Cc: me, I'm not subscribed.
>
>> Wouldn't it make sense to subscribe?
>> You post very often on this list. :)
>
done ;)
--
MfG/Sincerely
Toralf Förster
pgp finger print:1A37 6F99 4A9D 026F 13E2 4DCF C4EA CDDE 0076 E94E
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel