From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sumit Gemini <sumit.gemini0@gmail.com>
Cc: ralf.hildebrandt@charite.de, linux-kernel@vger.kernel.org
Subject: Re: Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies)
Date: Mon, 20 Feb 2017 12:58:02 -0800 [thread overview]
Message-ID: <20170220205802.GS30506@linux.vnet.ibm.com> (raw)
In-Reply-To: <CAD6qDH7j1Fr6jFsuaeJm+xGX8Q41m7nDJb7C329jF65yrO5a7g@mail.gmail.com>
On Mon, Feb 20, 2017 at 03:26:32PM +0530, Sumit Gemini wrote:
> Hi Paul,
>
>
> I got you while reading 'http://www.serverphorums.com/read.php?12,468996". I'm
> facing same problem but not able to detect what exectly the cause of
> it.
>
>
> I'm observed weird problem, I got rcu_bh_state on cpu0 but my system
> was in ideal state when i got rcu_bh_state (Cpu stall) problem. After
> 7 second my system recovered itself from stall to active. But i did
> not get what happen at that time.
One thing that you might look into below...
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.520060] INFO: rcu_bh_state detected stall on CPU 0 (t=0 jiffies)
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] sending NMI to all CPUs:
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] NMI backtrace for cpu 0
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] CPU 0
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Modules linked in: nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate zlib_deflate ctr twofish_x86_64 twofish_common camellia serpent blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042]
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] Pid: 0, comm: swapper Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RIP: 0010:[<ffffffff81007f51>] [<ffffffff81007f51>] native_read_tsc+0x2/0xf
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RSP: 0018:ffff88043ee03db0 EFLAGS: 00000803
> Jan 6 16:45:06 ffm-sbc-2a kernel: [3167216.524042] RAX: 0000000037185395 RBX: 00000000000003e9 RCX: 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a osafimmd[5035]: WA IMMND DOWN on active controller f2 detected at standby immd!! f1. Possible failover
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Done Locking applications node id:2020f ret val:0
> Jan 6 16:45:07 ffm-sbc-2a opensaf_recovery: Control interface status:0 Role:STANDBY
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: NO Current role: STANDBY
> Jan 6 16:45:07 ffm-sbc-2a osaffmd[5023]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Received Node Down for peer controller, > OwnNodeId = 131343, SupervisionTime = 60
Is this "WA IMMND DOWN" a normal occurrence? Perhaps it is pointing
at a root cause.
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RDX: 0000000000bf0977 RSI: 0000000000000002 RDI: 0000000000032bd4
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] RBP: 0000000000032bd4 R08: 0000000000000000 R09: ffffffff819232b0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R10: 7fffffffffffffff R11: 7fffffffffffffff R12: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] R13: ffffffff819232b0 R14: 0000000000000001 R15: ffffffff81065c28
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] FS: 0000000000000000(0000) GS:ffff88043ee00000(0000) knlGS:0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] CR2: 000000000069e034 CR3: 0000000001805000 CR4: 00000000000406f0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Process swapper (pid: 0, threadinfo ffffffff81800000, task ffffffff8180d020)
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Stack:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] ffffffff81200eb5 ffffffff81200f44 00000000000003e9 0000000000001000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 ffffffff819232b0 ffffffff81017698 7fffffffffffffff
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] 0000000000000002 0000000000000002 ffffffff81017fdf 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200eb5>] paravirt_read_tsc+0x5/0x8
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81200f44>] delay_tsc+0x1d/0x68
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017698>] native_safe_apic_wait_icr_idle+0x27/0x32
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017fdf>] default_send_IPI_dest_field.constprop.0+0x19/0x4d
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101804b>] default_send_IPI_mask_sequence_phys+0x38/0x6a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8101815e>] arch_trigger_all_cpu_backtrace+0x4d/0x7b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109801b>] check_cpu_stall+0x66/0xdb
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff810980aa>] rcu_pending+0x1a/0x10a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8109852c>] rcu_check_callbacks+0x9d/0xae
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8104c56c>] update_process_times+0x31/0x63
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81065c92>] tick_sched_timer+0x6a/0x90
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105b872>] __run_hrtimer+0xa4/0x148
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8105c08e>] hrtimer_interrupt+0xdb/0x19a
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81017768>] smp_apic_timer_interrupt+0x6e/0x80
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff813efcde>] apic_timer_interrupt+0x6e/0x80
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81239655>] intel_idle+0xdd/0x104
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff81001200>] cpu_idle+0x51/0x95
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193db0f>] start_kernel+0x388/0x393
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] [<ffffffff8193d3af>] x86_64_start_kernel+0xcf/0xdc
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.524042] Code: 74 03 e6 80 c3 e6 ed c3 bf 8e 21 00 00 e9 ba 8f 1f 00 c3 90 90 90 40 88 f8 e6 70 e4 71 c3 40 88 f0 e6 70 40 88 f8 e6 71 c3 0f 31
> Jan 6 16:45:07 ffm-sbc-2a kernel[3167216.524042]: c1 48 89 d0 48 c1 e0 20 48 09 c8 c3 41 56 41 55 41 54 55 53
>
> .
> .
> .
> .
> .
> .
>
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] NMI backtrace for cpu 31
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CPU 31
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Modules linked in: nf_conntrack_netlink af_packet xt_sharedlimit xt_hashlimit ip_set_hash_ipport ip_set_hash_ipportip xt_NOTRACK ip_set_bitmap_port xt_sctp nf_conntrack_ipv6 nf_defrag_ipv6 xt_CT arpt_mangle ip_set_hash_ipnet xt_NFLOG nfnetlink_log ipt_ULOG xt_limit xt_hashcounter ip_set_hash_ipip xt_set ip_set_hash_ip deflate ctr twofish_x86_64 twofish_common camellia serpent > blowfish cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 crypto_null af_key iptable_mangle ip_set nfnetlink arptable_filter arp_tables iptable_raw iptable_nat tipc xt_tcpudp xt_state xt_pkttype bonding binfmt_misc iptable_filter ip6table_filter ip6_tables nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables x_tables mperf edd ipmi_devintf ipmi_si ipmi_msghandler nf_conntrack_proto_sctp nf_conntrack sctp 8021q garp stp llc gb_sys usb_storage ioatdma ixgbe uas sg igb iTCO_wdt wmi i2c_i801 pcspkr mdio iTCO_vendor_support button container dca ipv6 autofs4 usbhid megasr(P) ehci_hcd usbcore processor thermal_sys [last unloaded: ipt_PORTMAP]
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734]
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Pid: 0, comm: kworker/0:1 Tainted: P 3.1.10-gb17-default #1 Intel Corporation S2600CO/S2600CO
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RIP: 0010:[<ffffffff81239624>] [<ffffffff81239624>] intel_idle+0xac/0x104
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RSP: 0018:ffff880425e33eb8 EFLAGS: 00000046
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RAX: 0000000000000030 RBX: 0000000000000010 RCX: 0000000000000001
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RDX: 0000000000000000 RSI: ffff880425e33fd8 RDI: ffffffff81810500
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] RBP: 0000000000000030 R08: 000000000000006d R09: 0000000000034b10
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R10: ffff88083eded830 R11: ffff88083eded830 R12: 149739342cb2ca49
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] R13: 0000000000000004 R14: 000000000000001f R15: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] FS: 0000000000000000(0000) GS:ffff88083ede0000(0000) knlGS:0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] CR2: 00007fe89df67120 CR3: 0000000001805000 CR4: 00000000000406e0
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Process kworker/0:1 (pid: 0, threadinfo ffff880425e32000, task ffff880425e30580)
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167216.830734] Stack:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685484] TIPC: Resetting link <1.1.1:bond0-1.1.2:bond0>, requested by peer
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685487] TIPC: Lost link <1.1.1:bond0-1.1.2:bond0> on network plane A
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.685491] TIPC: Lost contact with <1.1.2>
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 000000000cdd3a47 0000000000000000 000000000cdd3a47
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] ffff880425e33fd8 0000001f3edf8970 ffff88083edf8970 ffff88083edf8b00
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] 0000000000000000 ffffffff81304773 ffffffff819232b0 ffff880425e33fd8
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Code: 28 e0 ff ff 80 e2 08 75 22 31 d2 48 83 c0 10 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 48 89 e8 0f 01 c9 <e8> 3f 6e e2 ff 4c 29 e0 48 89 c7 e8 10 ae e0 ff 48 69 e8 40 42
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] Call Trace:
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81304773>] cpuidle_idle_call+0xda/0x169
> Jan 6 16:45:07 ffm-sbc-2a kernel: [3167225.687214] [<ffffffff81001200>] cpu_idle+0x51/0x95
Since NMIs were sent to all CPUs, it is not surprising that one was found
to be idle. Or is there something that I am missing in CPU 31's
backtrace?
> I'm using suse kernel 3.1.10
That is going back a ways!
There are quite a few patches to RCU CPU stall warnings that could
be applied. It might be easier to move to a newer kernel.
Alternatively, if you are convinced that these stall warnings are
false positives, you could boot with the rcu_cpu_stall_suppress
kernel boot parameter set, or rebuild the kernel with a much larger
value of the RCU_CPU_STALL_TIMEOUT Kconfig option.
Thanx, Paul
next parent reply other threads:[~2017-02-20 20:58 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAD6qDH7j1Fr6jFsuaeJm+xGX8Q41m7nDJb7C329jF65yrO5a7g@mail.gmail.com>
2017-02-20 20:58 ` Paul E. McKenney [this message]
[not found] ` <CAD6qDH7GY6xYDtZAF3LWFFYZZuusj7UuQg43WhEc44jXBfeNpA@mail.gmail.com>
2017-02-21 3:01 ` Need your help on rcu_bh_state stall on CPU 0 (t=0 jiffies) Paul E. McKenney
[not found] ` <CAD6qDH6iQzf29ZzgFHxUCwRtbdvRCW2C5a4nvuYzhhwHgnOnPQ@mail.gmail.com>
2017-02-21 8:05 ` Paul E. McKenney
[not found] ` <CAD6qDH6piF-tyTUjbGj679EBg_SneOUnC6jzxB24r_qy79dYvQ@mail.gmail.com>
[not found] ` <20170221195753.GF30506@linux.vnet.ibm.com>
[not found] ` <CAD6qDH5OrQdKJNKb1EENvWMyZOZK7kQ-pHJQL5q9DaKKJbrOpw@mail.gmail.com>
2017-02-22 0:39 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170220205802.GS30506@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=ralf.hildebrandt@charite.de \
--cc=sumit.gemini0@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.