From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
kernel test robot <oliver.sang@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
lkp@lists.01.org, lkp@intel.com, zhengjun.xing@linux.intel.com
Subject: Re: [printk] b031a684bf: INFO:rcu_tasks_detected_stalls_on_tasks
Date: Thu, 28 Jan 2021 16:48:29 +0106 [thread overview]
Message-ID: <877dnxm5ju.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <YBLPhkHQ8cXFiY1X@alley>
On 2021-01-28, Petr Mladek <pmladek@suse.com> wrote:
> If you are able to reproduce this, it might be interesting to see
> backtraces from all CPUs when RCU stall gets detected. Or even
> printing all processes, even the sleeping ones.
OK, I now have lkp running on my intel core i5 machine and it reliably
reproduces this. First let me say a word about lkp. The instructions in
the report email are a bit misleading. There are lots of steps that it
doesn't mention.
First I needed to install lkp and its dependencies. I did all of this as
root because lkp seems to get confused when sudo is involved.
# git clone https://github.com/intel/lkp-tests.git
# cd lkp-tests
# make install
# lkp install
Then I built the kernel and modules using a helper script to setup the
environment for me:
----- BEGIN /tmp/mk.sh -----
#!/bin/sh
export INSTALL_MOD_PATH=`pwd`
export HOSTCC=gcc-9
export CC=gcc-9
export ARCH=x86_64
exec make "$@"
----- END /tmp/mk.sh -----
# cd linux
# git checkout -b lkp b031a684bfd01d633c79d281bd0cf11c2f834ada
# cp /tmp/config-5.10.0-rc5-gb031a684bfd0 .config
# /tmp/mk.sh -j `nproc` olddefconfig prepare modules_prepare bzImage modules modules_install
Then I created the modules.cgz:
# find lib/modules | cpio -H newc -o | gzip -9c > modules.cgz
I put kernel and modules in /tmp/:
# cp arch/x86/boot/bzImage modules.cgz /tmp/
And ran lkp inside script(1) so I have a logfile:
# script
# lkp qemu -k /tmp/bzImage -m /tmp/modules.cgz /tmp/job-script
And reliably I see the first stall at about 925 seconds:
[ 926.386441][ T22] INFO: rcu_tasks detected stalls on tasks:
[ 926.387310][ T22] 0000000041eb5240: .. nvcsw: 0/0 holdout: 1 idle_cpu: -1/1
[ 926.388196][ T22] task:dmesg state:R running task stack: 0 pid: 1752 ppid: 511 flags:0x2002000
0
[ 926.389509][ T22] Call Trace:
[ 926.389962][ T22] __schedule+0xa12/0xab5
[ 926.390534][ T22] ? firmware_map_remove+0xd1/0xd1
[ 926.391190][ T22] ? ksys_read+0x116/0x150
[ 926.391773][ T22] schedule+0x16c/0x1df
[ 926.392317][ T22] exit_to_user_mode_loop+0x28/0x84
[ 926.392977][ T22] exit_to_user_mode_prepare+0x1d/0x4a
[ 926.393671][ T22] syscall_exit_to_user_mode+0x41/0x4f
[ 926.394357][ T22] entry_INT80_compat+0x71/0x76
[ 926.394982][ T22] RIP: 0023:0xf7f0fa02
[ 926.395528][ T22] RSP: 002b:00000000ff850364 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 926.396590][ T22] RAX: ffffffffffffffe0 RBX: 0000000000000004 RCX: 00000000565c8234
[ 926.397607][ T22] RDX: 0000000000001fff RSI: 00000000000001e0 RDI: 00000000565c8234
[ 926.398617][ T22] RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
[ 926.399629][ T22] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 926.400643][ T22] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
But actually it is not the rcu stall that interests me so much. I am
more interested in some bizarre console output I am seeing. Here is a
snippet:
[ 903.008486][ T356] [ 778.732883] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.008496][ T356]
[ 903.014733][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.030232][ T356] [ 778.735197] [ 655.203106] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.030242][ T356]
[ 903.040809][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.044327][ T356] [ 778.746898] [ 655.214214] [ 531.584847] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.044337][ T356]
[ 903.048732][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.066039][ T356] [ 778.751162] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.066050][ T356]
[ 903.070123][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.077137][ T356] [ 778.768744] [ 655.227204] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.077147][ T356]
[ 903.092792][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.095313][ T356] [ 778.773836] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.095323][ T356]
[ 903.106788][ T356] [ 778.781780] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.106798][ T356]
[ 903.121986][ T356] [ 778.791183] [ 655.230287] [ 531.589973] [ 407.116952] [ 283.068815] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.121997][ T356]
[ 903.134120][ T356] [ 778.797950] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.134130][ T356]
[ 903.148741][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.151626][ T356] [ 778.805164] [ 655.241173] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.151637][ T356]
[ 903.160126][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.166936][ T356] [ 778.818740] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.166946][ T356]
[ 903.175785][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.185955][ T528] tasks-torture:torture_onoff task: online 0 failed: errno -5
[ 903.189448][ T356] [ 778.825864] [ 655.250559] [ 531.607066] [ 407.120936] tasks-torture:torture_onoff task: online 0 failed: errno -5
These multi-timestamp lines are odd. And they are almost exactly 124
seconds apart. What is going on there? The investigation continues...
John Ogness
next prev parent reply other threads:[~2021-01-28 15:43 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-22 8:13 [printk] b031a684bf: INFO:rcu_tasks_detected_stalls_on_tasks kernel test robot
2021-01-22 16:21 ` Petr Mladek
2021-01-26 4:52 ` Sergey Senozhatsky
2021-01-27 21:22 ` John Ogness
2021-01-27 23:29 ` John Ogness
2021-01-28 8:30 ` Sergey Senozhatsky
2021-01-28 11:26 ` John Ogness
2021-01-28 14:51 ` Petr Mladek
2021-01-28 15:42 ` John Ogness [this message]
2021-01-28 16:36 ` Petr Mladek
2021-01-28 17:36 ` John Ogness
2021-01-28 21:38 ` John Ogness
2021-01-23 2:44 ` Sergey Senozhatsky
2021-02-02 17:13 ` John Ogness
2021-02-04 21:32 ` John Ogness
2021-02-05 9:32 ` Petr Mladek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877dnxm5ju.fsf@jogness.linutronix.de \
--to=john.ogness@linutronix.de \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=lkp@lists.01.org \
--cc=oliver.sang@intel.com \
--cc=pmladek@suse.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=zhengjun.xing@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox