From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
To: sparclinux@vger.kernel.org
Subject: Re: Recent spontaneous reboots on multiple machines
Date: Mon, 22 Feb 2016 01:02:01 +0000
Message-ID: <20160222010201.GB23053@oracle.com>
In-Reply-To: <alpine.LRH.2.20.1601061716510.27880@math.ut.ee>

On (02/15/16 07:54), Meelis Roos wrote (on sparclinux):
> > > It's getting more strange. I ran 4.4-rc8-00005 for 2-3 weeks nonstop, 
> > > doing git clone and make -j4 in a loop, on both V240 and V440. Worked 
> > > 100% stable.
> > > 
> > > Then I did a git pull from kernel.org, tried to compile 4.5-rc1 (or was it 
> > > rc2 already), on the same running 4.4.0-rc8-00005 and it rebooted, on 
> > > both V240 and V440.

Hmm. My experience was a little different from yours, but maybe we
are seeing the same thing.

I get a panic that matches the description in commit d188ba86dd07a ("xfrm:
add rcu protection to sk->sk_policy[]"), but the panic persists
even after applying that patch, so either the patch missed some
race window or I'm missing some additional patches.

To reproduce the panic on my V440 (SPARC Sun Fire), I set up my transparent
proxy environment and ran 'git pull' on the test machine (running 4.4.0-rc3+).
The reboot on panic was quite noisy on the serial console, though I
didn't find anything recorded in /var/log/*, and with
kernel.panic = 1 and kernel.panic_on_oops = 1, the ssh session terminates quietly.

Here's what I pulled out of the console noise:

[3816414.196028] Unable to handle kernel paging request at virtual address 77e0000000000000
[3816414.302455] tsk->{mm,active_mm}->context = 0000000000001f95
[3816414.378057] tsk->{mm,active_mm}->pgd = fff000123c040000
   :
[3816414.651546] git(7768): Oops [#1]
[3816414.696158] CPU: 0 PID: 7768 Comm: git Not tainted 4.4.0-rc3-roos-00790-g264a4ac-dirty #29
[3816414.807133] task: fff000123e2a31e0 ti: fff000123e3dc000 task.ti: fff000123e3dc000
[3816414.907887] TSTATE: 0000009911001601 TPC: 00000000007ed400 TNPC: 00000000007ed404 Y: 00000276    Not tainted
[3816415.039484] TPC: <xfrm_selector_match+0x20/0x3a0>
                      :
                      :

It looks like pol is the bad virtual address (it matches the faulting
address 77e0000000000000 in the Oops). When I insert printk()s, I see
the following in xfrm_sk_policy_lookup():

   dir XFRM_POLICY_OUT  sk fff000123e1aa000 pol 77e0000000000000

The relevant parts of the stack trace from the console messages are shown below:

 xfrm_sk_policy_lookup+0x30/0xc0
 xfrm_lookup+0x20/0x340
 nf_xfrm_me_harder+0x54/0x120 [nf_nat]
 nf_nat_ipv4_out+0xe0/0x140 [nf_nat_ipv4]
 nf_iterate+0x8c/0xc0
 nf_hook_slow+0x1c/0xe0
 ip_output+0xd4/0x100
 ip_local_out+0x30/0x60
 tcp_v4_send_synack+0x4c/0xa0
 tcp_conn_request+0x934/0x960
 tcp_rcv_state_process+0x1dc/0xee0
 tcp_v4_do_rcv+0x68/0x220
 tcp_v4_rcv+0xb04/0xbc0
 ip_local_deliver_finish+0x114/0x2a0
 ip_local_deliver+0x38/0xe0
 ip_rcv_finish+0x14c/0x380
 ip_rcv+0x26c/0x3e0
 __netif_receive_skb_core+0x7c4/0xb60
 process_backlog+0x70/0x120
 net_rx_action+0x204/0x300
 __do_softirq+0xc4/0x200
 do_softirq_own_stack+0x2c/0x4
  etc.

Unfortunately I cannot get a crash dump on the Sun Fire, so there is no way
to tell which other kernel threads might be racing with this.

Still looking...

--Sowmini
