From: Philipp Hahn <hahn@univention.de>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Erik Damrose <Damrose@univention.de>,
	Ian Campbell <ian.campbell@citrix.com>,
	Zoltan Kiss <zoltan.kiss@citrix.com>
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Sat, 07 Jun 2014 00:12:32 +0200
Message-ID: <53923CD0.7010001@univention.de>
In-Reply-To: <20140606105804.GD11959@zion.uk.xensource.com>

Hello,

On 06.06.2014 12:58, Wei Liu wrote:
> On Fri, Jun 06, 2014 at 12:26:55PM +0200, Philipp Hahn wrote:
>> on one of our hosts (Xen-4.1.3 with Linux-3.10.26 + Debian patches)
>> running 16 Linux VMs (linux-3.2.39 and others) netback crashes during
>> the night when one of the VMs is rebooted by a cron-job:
>>> [38551.549615] Oops: 0000 [#1] SMP
> 
> Is there any more output above this line? Is it a NULL pointer
> dereference or something else?

Sorry, those lines got lost somehow during copy&paste:

[38551.547728] XXXlan0: port 9(vif26.0) entered disabled state
[38551.549365] BUG: unable to handle kernel paging request at ffffc900108641d8
[38551.549461] IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
[38551.549551] PGD 57e20067 PUD 57e21067 PMD 571a7067 PTE 0
[38551.549615] Oops: 0000 [#1] SMP
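
For reference, the "Oops: 0000" value is the x86 page-fault error code. A minimal sketch of decoding it follows; the helper name decode_pf_error_code is made up for this illustration, and only the bit meanings come from the x86 architecture definition. Together with "PTE 0" it suggests a kernel-mode read from an unmapped page rather than an access near address 0.

```python
# Decode the x86 page-fault error code printed after "Oops:".
# decode_pf_error_code is a hypothetical helper, not a kernel interface.
PF_BITS = [
    # (mask, meaning if set, meaning if clear)
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable flags for a page-fault error code."""
    flags = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


error_code = int("0000", 16)      # from "Oops: 0000 [#1] SMP"
fault_addr = 0xffffc900108641d8   # from the "BUG: unable to handle" line

print(decode_pf_error_code(error_code))
# A NULL dereference would normally fault within the first page.
print("within first page:", fault_addr < 0x1000)
```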

>>> [38551.550865] RIP: e030:[<ffffffffa04147dc>]  [<ffffffffa04147dc>]
>>> xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> 
> Try addr2line?

Good to know, but since that host has already been rebooted, I no longer
know the module load address, which seems to render addr2line useless.
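
The symbol-relative part of the RIP line does not depend on the load address, so it should still be usable as long as the same xen-netback.ko binary is at hand. A minimal sketch of splitting that part into its fields; parse_location and the regular expression are hypothetical helpers, written only against the line format shown above:

```python
import re

# Matches "symbol+0xOFFSET/0xSIZE [module]" as printed in the oops.
# The pattern is an assumption based on the lines quoted in this mail.
LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_location(line):
    """Return symbol, offset, size and module from an oops line, or None."""
    m = LOCATION_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
    }


line = "IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]"
loc = parse_location(line)
print(loc["symbol"], hex(loc["offset"]), hex(loc["size"]), loc["module"])
```

The offset is relative to the start of the function, so it can be added to the function's address in the objdump listing of the module.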

>> The host itself is still alive and reachable by network, but all VMs are
>> no longer reachable.
>> The crash does not happen on every reboot: The VM was running fine for
>> 1½ week after a dom0 kernel update, but now crashed the following past
>> two nights.
>>
> 
> What's the Dom0 kernel version before upgrading? That would help us
> narrow down the range of changesets.

The previous kernel was 3.10.15. The update was performed to get another
bug fixed, which went into the Debian update between .11 and .26:

commit 0ff773f59ff375c42af2238457bda98ed4ddcd25
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Wed Sep 11 14:52:48 2013 +0100
    xen-netback: count number required slots for an skb more carefully
    [ Upstream commit 6e43fc04a6bc357d260583b8440882f28069207f ]

> The oops happens in guest receive path. Unfortunately that's a very
> complex function, it's hard to identify the problem by looking at the
> code.
> 
> And as you seem to be using a distro kernel, have you reported to
> Debian yet? I don't quite understand which Debian release has 3.10
> kernel though.

Actually this is "Univention Corporate Server" (UCS), which is
Debian-Squeeze based but with a newer Xen-4.1.3 and newer Linux-3.10 kernel.

> 3.7.0 is too old. There has been lots of changes since then.

Probably so, but thanks for the confirmation.

>> Running "objdump -Sl xen-netback.ko" shows the OOPs to happen here:
>>> /root/linux-3.10.11/drivers/net/xen-netback/netback.c:606
...
> You mentioned 3.10.26 at the beginning but now it's 3.10.11? I'm
> confused.

This has something to do with Debian patch policy: The first 3.10 kernel
was 3.10.11, so that number stays, even when it is patched up to 3.10.26.

> If it's dereferencing NULL pointer, skb_shinfo(skb) == NULL?

If I got the math right, it looks like it's crashing here:
>>> /root/linux-3.10.11/drivers/net/xen-netback/netback.c:611
>>>         meta->id = req->id;
>>>      7d8:       48 83 c2 08             add    $0x8,%rdx
>>>      7dc:       0f b7 34 d1             movzwl (%rcx,%rdx,8),%esi
>> 0x651 + 0x18B = 0x7DC

0x651 is the start of xen_netbk_rx_action() from objdump.
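
The arithmetic can be double-checked with a few lines of Python. This is only a sketch: the constants are the ones quoted above, while effective_address and its sample register values are made up to show how the operand (%rcx,%rdx,8) is formed, since the real register contents are not repeated in this mail.

```python
func_start = 0x651    # start of xen_netbk_rx_action() in the objdump listing
oops_offset = 0x18b   # from "xen_netbk_rx_action+0x18b/0x6f0"
func_size = 0x6f0     # function length reported in the same line

# The faulting instruction must lie inside the function.
assert oops_offset < func_size

fault_insn = func_start + oops_offset
print(hex(fault_insn))  # offset of the faulting instruction in the listing


def effective_address(base, index, scale=8, disp=0):
    """AT&T operand disp(base,index,scale): base + index*scale + disp."""
    return (base + index * scale + disp) & 0xFFFFFFFFFFFFFFFF


# Toy values only, NOT the registers from the crash:
print(hex(effective_address(base=0x1000, index=0x3)))
```

The first value matches the address of the movzwl instruction in the listing, which reads 16 bits from %rcx + %rdx*8.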

...
> There's one more patch that you can pick up from 3.10.y tree. I doubt it
> will make much difference though.
> 
> I think the first thing to do is to identify which line of code is
> causing the problem. If it is actually the line you're referring to in
> your analysis then we need to figure out why skb_shinfo(skb) is NULL...

I'll try to add some debug output to yell if skb_shinfo() is NULL, but
it might take some time until the bug manifests again.

> Wei.

Thank you for your feedback.

Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
