From: Dave Jones <davej@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Andrey Vagin <avagin@openvz.org>
Subject: Re: frequent softlockups with 3.10rc6.
Date: Wed, 26 Jun 2013 01:48:09 -0400 [thread overview]
Message-ID: <20130626054809.GA2862@redhat.com> (raw)
In-Reply-To: <1372177414.18733.211.camel@gandalf.local.home>
On Tue, Jun 25, 2013 at 12:23:34PM -0400, Steven Rostedt wrote:
> On Tue, 2013-06-25 at 11:35 -0400, Dave Jones wrote:
> > Took a lot longer to trigger this time. (13 hours of runtime).
> >
> > This trace may still not be from the first lockup, as a flood of
> > them happened at the same time.
> >
> >
> > # tracer: preemptirqsoff
> > #
> > # preemptirqsoff latency trace v1.1.5 on 3.10.0-rc7+
> > # --------------------------------------------------------------------
> > # latency: 389877255 us, #4/4, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
> > # -----------------
> > # | task: trinity-main-9252 (uid:1000 nice:19 policy:0 rt_prio:0)
> > # -----------------
> > # => started at: vprintk_emit
> > # => ended at: vprintk_emit
> > #
> > #
> > # _------=> CPU#
> > # / _-----=> irqs-off
> > # | / _----=> need-resched
> > # || / _---=> hardirq/softirq
> > # ||| / _--=> preempt-depth
> > # |||| / delay
> > # cmd pid ||||| time | caller
> > # \ / ||||| \ | /
> > trinity--9252 1dNh1 0us!: console_unlock <-vprintk_emit
> > trinity--9252 1dNh1 389877255us : console_unlock <-vprintk_emit
> > trinity--9252 1dNh1 389877255us+: stop_critical_timings <-vprintk_emit
> > trinity--9252 1dNh1 389877261us : <stack trace>
> > => console_unlock
> > => vprintk_emit
> > => printk
>
> This is the same as the last one, with no new info to why it started the
> tracing at console_unlock :-/
>
> Now, what we can try to do as well, is to add a trigger to disable
> tracing, which should (I need to check the code) stop tracing on printk.
> To do so:
>
> # echo printk:traceoff > /sys/kernel/debug/tracing/set_ftrace_filter
>
> This will add a trigger to the printk function that when called, will
> disable tracing. If it is hit before you get your trace, you can just
> re-enable tracing with:
>
> # echo 1 > /sys/kernel/debug/tracing/tracing_on
>
> Hmm, no it needs a fix to make this work. I applied a patch below that
> should do this correctly (and will put this into my 3.11 queue).
>
> If you run the test again with this change and with the above filter, it
> should stop the trace before overwriting the first dump, as it should
> ignore the printk output.
More puzzling. Rebooted the machine, restarted the test, and hit it pretty
quickly.. Though different backtrace this time..
[ 1583.293902] BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:28]
[ 1583.293905] BUG: soft lockup - CPU#0 stuck for 22s! [migration/0:7]
[ 1583.293932] Modules linked in: tun hidp bnep nfnetlink rfcomm scsi_transport_iscsi ipt_ULOG can_raw af_rxrpc netrom nfc can_bcm can appletalk ipx p8023 af_key psnap irda p8022 rose caif_socket caif ax25 crc_ccitt llc2 af_802154 llc rds bluetooth phonet rfkill pppoe pppox atm ppp_generic slhc x25 coretemp hwmon kvm_intel kvm snd_hda_codec_realtek crc32c_intel snd_hda_codec_hdmi ghash_clmulni_intel microcode snd_hda_intel pcspkr snd_hda_codec snd_hwdep snd_seq snd_seq_device e1000e snd_pcm ptp pps_core snd_page_alloc snd_timer snd soundcore xfs libcrc32c
[ 1583.293932] irq event stamp: 108950
[ 1583.293937] hardirqs last enabled at (108949): [<ffffffff816e9320>] restore_args+0x0/0x30
[ 1583.293940] hardirqs last disabled at (108950): [<ffffffff816f1dea>] apic_timer_interrupt+0x6a/0x80
[ 1583.293943] softirqs last enabled at (108948): [<ffffffff810541d4>] __do_softirq+0x194/0x440
[ 1583.293945] softirqs last disabled at (108943): [<ffffffff8105463d>] irq_exit+0xcd/0xe0
[ 1583.293947] CPU: 0 PID: 7 Comm: migration/0 Not tainted 3.10.0-rc7+ #5
[ 1583.293948] task: ffff880244190000 ti: ffff88024418a000 task.ti: ffff88024418a000
[ 1583.293952] RIP: 0010:[<ffffffff810dd856>] [<ffffffff810dd856>] stop_machine_cpu_stop+0x86/0x110
[ 1583.293953] RSP: 0018:ffff88024418bce8 EFLAGS: 00000293
[ 1583.293953] RAX: 0000000000000001 RBX: ffffffff816e9320 RCX: 0000000000000000
[ 1583.293954] RDX: ffff88024418bf00 RSI: ffffffff81084b2c RDI: ffff8801ecd51b88
[ 1583.293954] RBP: ffff88024418bd10 R08: 0000000000000001 R09: 0000000000000000
[ 1583.293955] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88024418bc58
[ 1583.293956] R13: ffff88024418bfd8 R14: ffff88024418a000 R15: 0000000000000046
[ 1583.293956] FS: 0000000000000000(0000) GS:ffff880245600000(0000) knlGS:0000000000000000
[ 1583.293957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1583.293958] CR2: 0000000006ff0158 CR3: 00000001ea0df000 CR4: 00000000001407f0
[ 1583.293958] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1583.293959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1583.293959] Stack:
[ 1583.293961] ffff8802457cd880 ffff8801ecd51ac0 ffff8801ecd51b88 ffffffff810dd7d0
[ 1583.293963] ffff88024418bfd8 ffff88024418bde0 ffffffff810dd6ed ffff88024418bfd8
[ 1583.293965] ffff8802457cd8d0 000000000000017f ffff88024418bd48 00000007810b463e
[ 1583.293965] Call Trace:
[ 1583.293967] [<ffffffff810dd7d0>] ? cpu_stopper_thread+0x180/0x180
[ 1583.293969] [<ffffffff810dd6ed>] cpu_stopper_thread+0x9d/0x180
[ 1583.293971] [<ffffffff816e86d5>] ? _raw_spin_unlock_irqrestore+0x65/0x80
[ 1583.293973] [<ffffffff810b75b5>] ? trace_hardirqs_on_caller+0x115/0x1e0
[ 1583.293976] [<ffffffff81084b2c>] smpboot_thread_fn+0x1ac/0x320
[ 1583.293978] [<ffffffff81084980>] ? lg_global_unlock+0xe0/0xe0
[ 1583.293981] [<ffffffff8107a88d>] kthread+0xed/0x100
[ 1583.293983] [<ffffffff816e597f>] ? wait_for_completion+0xdf/0x110
[ 1583.293985] [<ffffffff8107a7a0>] ? insert_kthread_work+0x80/0x80
[ 1583.293987] [<ffffffff816f10dc>] ret_from_fork+0x7c/0xb0
[ 1583.293989] [<ffffffff8107a7a0>] ? insert_kthread_work+0x80/0x80
[ 1583.294007] Code: f0 ff 4b 24 0f 94 c2 84 d2 44 89 e0 74 12 8b 43 20 8b 53 10 83 c0 01 89 53 24 89 43 20 44 89 e0 83 f8 04 74 20 f3 90 44 8b 63 20 <41> 39 c4 74 f0 41 83 fc 02 75 bf fa e8 49 66 fd ff eb c2 0f 1f
Same trace for all 4 cpus occurred.
'trace' file contained this...
# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 3.10.0-rc7+
# --------------------------------------------------------------------
# latency: 478 us, #4/4, CPU#3 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: sshd-405 (uid:1000 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: fget_light
# => ended at: fget_light
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
sshd-405 3...1 0us!: rcu_lockdep_current_cpu_online <-fget_light
sshd-405 3...1 478us+: rcu_lockdep_current_cpu_online <-fget_light
sshd-405 3...1 481us+: trace_preempt_on <-fget_light
sshd-405 3...1 496us : <stack trace>
=> sub_preempt_count
=> rcu_lockdep_current_cpu_online
=> fget_light
=> do_select
=> core_sys_select
=> SyS_select
=> tracesys
which looks pretty unhelpful to me ?
Dave
next prev parent reply other threads:[~2013-06-26 5:48 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-19 16:45 frequent softlockups with 3.10rc6 Dave Jones
2013-06-19 17:53 ` Dave Jones
2013-06-19 18:13 ` Paul E. McKenney
2013-06-19 18:42 ` Dave Jones
2013-06-20 0:12 ` Dave Jones
2013-06-20 16:16 ` Paul E. McKenney
2013-06-20 16:27 ` Dave Jones
2013-06-21 15:11 ` Dave Jones
2013-06-21 19:59 ` Oleg Nesterov
2013-06-22 1:37 ` Dave Jones
2013-06-22 17:31 ` Oleg Nesterov
2013-06-22 21:59 ` Dave Jones
2013-06-23 5:00 ` Andrew Vagin
2013-06-23 14:36 ` Oleg Nesterov
2013-06-23 15:06 ` Dave Jones
2013-06-23 16:04 ` Oleg Nesterov
2013-06-24 0:21 ` Dave Jones
2013-06-24 2:00 ` Dave Jones
2013-06-24 14:39 ` Oleg Nesterov
2013-06-24 14:52 ` Steven Rostedt
2013-06-24 16:00 ` Dave Jones
2013-06-24 16:24 ` Steven Rostedt
2013-06-24 16:51 ` Dave Jones
2013-06-24 17:04 ` Steven Rostedt
2013-06-25 16:55 ` Dave Jones
2013-06-25 17:21 ` Steven Rostedt
2013-06-25 17:23 ` Steven Rostedt
2013-06-25 17:26 ` Dave Jones
2013-06-25 17:31 ` Steven Rostedt
2013-06-25 17:32 ` Steven Rostedt
2013-06-25 17:29 ` Steven Rostedt
2013-06-25 17:34 ` Dave Jones
2013-06-24 16:37 ` Oleg Nesterov
2013-06-24 16:49 ` Dave Jones
2013-06-24 15:57 ` Dave Jones
2013-06-24 17:35 ` Oleg Nesterov
2013-06-24 17:44 ` Dave Jones
2013-06-24 17:53 ` Steven Rostedt
2013-06-24 18:00 ` Dave Jones
2013-06-25 15:35 ` Dave Jones
2013-06-25 16:23 ` Steven Rostedt
2013-06-26 5:23 ` Dave Jones
2013-06-26 19:52 ` Steven Rostedt
2013-06-26 20:00 ` Dave Jones
2013-06-27 3:01 ` Steven Rostedt
2013-06-26 5:48 ` Dave Jones [this message]
2013-06-26 19:18 ` Oleg Nesterov
2013-06-26 19:40 ` Dave Jones
2013-06-27 0:22 ` Dave Jones
2013-06-27 1:06 ` Eric W. Biederman
2013-06-27 2:32 ` Tejun Heo
2013-06-27 7:55 ` Dave Chinner
2013-06-27 10:06 ` Dave Chinner
2013-06-27 12:52 ` Dave Chinner
2013-06-27 15:21 ` Dave Jones
2013-06-28 1:13 ` Dave Chinner
2013-06-28 3:58 ` Dave Chinner
2013-06-28 10:28 ` Jan Kara
2013-06-29 3:39 ` Dave Chinner
2013-07-01 12:00 ` Jan Kara
2013-07-02 6:29 ` Dave Chinner
2013-07-02 8:19 ` Jan Kara
2013-07-02 12:38 ` Dave Chinner
2013-07-02 14:05 ` Jan Kara
2013-07-02 16:13 ` Linus Torvalds
2013-07-02 16:57 ` Jan Kara
2013-07-02 17:38 ` Linus Torvalds
2013-07-03 3:07 ` Dave Chinner
2013-07-03 3:28 ` Linus Torvalds
2013-07-03 4:49 ` Dave Chinner
2013-07-04 7:19 ` Andrew Morton
2013-06-29 20:13 ` Dave Jones
2013-06-29 22:23 ` Linus Torvalds
2013-06-29 23:44 ` Dave Jones
2013-06-30 0:21 ` Steven Rostedt
2013-07-01 12:49 ` Pavel Machek
2013-06-30 0:17 ` Steven Rostedt
2013-06-30 2:05 ` Dave Chinner
2013-06-30 2:34 ` Dave Chinner
2013-06-27 14:30 ` Dave Jones
2013-06-28 1:18 ` Dave Chinner
2013-06-28 2:54 ` Linus Torvalds
2013-06-28 3:54 ` Dave Chinner
2013-06-28 5:59 ` Linus Torvalds
2013-06-28 7:21 ` Dave Chinner
2013-06-28 8:22 ` Linus Torvalds
2013-06-28 8:32 ` Al Viro
2013-06-28 8:22 ` Al Viro
2013-06-28 9:49 ` Jan Kara
2013-07-01 17:57 ` block layer softlockup Dave Jones
2013-07-02 2:07 ` Dave Chinner
2013-07-02 6:01 ` Dave Jones
2013-07-02 7:30 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130626054809.GA2862@redhat.com \
--to=davej@redhat.com \
--cc=avagin@openvz.org \
--cc=ebiederm@xmission.com \
--cc=linux-kernel@vger.kernel.org \
--cc=oleg@redhat.com \
--cc=paulmck@linux.vnet.ibm.com \
--cc=rostedt@goodmis.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.