xen-devel.lists.xenproject.org archive mirror
From: Philipp Hahn <hahn@univention.de>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Erik Damrose <Damrose@univention.de>,
	Ian Campbell <ian.campbell@citrix.com>,
	Zoltan Kiss <zoltan.kiss@citrix.com>
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Sat, 07 Jun 2014 00:12:32 +0200
Message-ID: <53923CD0.7010001@univention.de>
In-Reply-To: <20140606105804.GD11959@zion.uk.xensource.com>

Hello,

On 06.06.2014 12:58, Wei Liu wrote:
> On Fri, Jun 06, 2014 at 12:26:55PM +0200, Philipp Hahn wrote:
>> on one of our hosts (Xen-4.1.3 with Linux-3.10.26 + Debian patches)
>> running 16 Linux VMs (linux-3.2.39 and others) netback crashes during
>> the night when one of the VMs is rebooted by a cron-job:
>>> [38551.549615] Oops: 0000 [#1] SMP
> 
> Is there any more output above this line? Is it a NULL pointer
> dereference or something else?

Sorry, those lines got lost somehow during copy&paste:

[38551.547728] XXXlan0: port 9(vif26.0) entered disabled state
[38551.549365] BUG: unable to handle kernel paging request at ffffc900108641d8
[38551.549461] IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
[38551.549551] PGD 57e20067 PUD 57e21067 PMD 571a7067 PTE 0
[38551.549615] Oops: 0000 [#1] SMP

>>> [38551.550865] RIP: e030:[<ffffffffa04147dc>]  [<ffffffffa04147dc>]
>>> xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> 
> Try addr2line?

Good to know, but since that host has already been rebooted, I no longer
know the module load address, which seems to render addr2line useless.

>> The host itself is still alive and reachable by network, but all VMs are
>> no longer reachable.
>> The crash does not happen on every reboot: the VM ran fine for
>> 1½ weeks after a dom0 kernel update, but it has now crashed during each
>> of the past two nights.
>>
> 
> What's the Dom0 kernel version before upgrading? That would help us
> narrow down the range of changesets.

The previous kernel was 3.10.15. The update was performed to get another
bug fixed; that fix went into the Debian update between .11 and .26:

commit 0ff773f59ff375c42af2238457bda98ed4ddcd25
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Wed Sep 11 14:52:48 2013 +0100
    xen-netback: count number required slots for an skb more carefully
    [ Upstream commit 6e43fc04a6bc357d260583b8440882f28069207f ]

> The oops happens in guest receive path. Unfortunately that's a very
> complex function, it's hard to identify the problem by looking at the
> code.
> 
> And as you seem to be using a distro kernel, have you reported it to
> Debian yet? I don't quite understand which Debian release has 3.10
> kernel though.

Actually this is "Univention Corporate Server" (UCS), which is based on
Debian Squeeze but ships a newer Xen (4.1.3) and a newer Linux (3.10) kernel.

> 3.7.0 is too old. There has been lots of changes since then.

Probably so, but thanks for the confirmation.

>> Running "objdump -Sl xen-netback.ko" shows the Oops to happen here:
>>> /root/linux-3.10.11/drivers/net/xen-netback/netback.c:606
...
> You mentioned 3.10.26 at the beginning but now it's 3.10.11? I'm
> confused.

This is due to Debian's patch policy: the first 3.10 kernel was 3.10.11,
so that version number stays in the source directory name, even after it
has been patched up to 3.10.26.

> If it's dereferencing NULL pointer, skb_shinfo(skb) == NULL?

If I got the math right, it looks like it's crashing here:
>>> /root/linux-3.10.11/drivers/net/xen-netback/netback.c:611
>>>         meta->id = req->id;
>>>      7d8:       48 83 c2 08             add    $0x8,%rdx
>>>      7dc:       0f b7 34 d1             movzwl (%rcx,%rdx,8),%esi
>> 0x651 + 0x18B = 0x7DC

0x651 is the start address of xen_netbk_rx_action() according to objdump.

...
> There's one more patch that you can pick up from 3.10.y tree. I doubt it
> will make much difference though.
> 
> I think the first thing to do is to identify which line of code is
> causing the problem. If it is actually the line you're referring to in
> your analysis then we need to figure out why skb_shinfo(skb) is NULL...

I'll try to add some debug output to yell if skb_shinfo() is NULL, but
it might take some time until the bug manifests again.

> Wei.

Thank you for your feedback.

Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876


Thread overview: 18+ messages
2014-06-06 10:26 RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb Philipp Hahn
2014-06-06 10:58 ` Wei Liu
2014-06-06 22:12   ` Philipp Hahn [this message]
2014-06-18 16:48     ` Philipp Hahn
2014-06-19 14:12       ` Wei Liu
2014-06-19 14:35         ` David Vrabel
2014-06-19 14:41           ` Wei Liu
2014-06-23 14:56         ` Philipp Hahn
2014-06-27  8:42           ` Philipp Hahn
2014-06-27 17:48             ` Philipp Hahn
2014-06-27 18:24               ` Philipp Hahn
2014-07-02  7:45                 ` [PATCH] " Philipp Hahn
2014-07-10 12:41                   ` Wei Liu
2014-07-11  9:41                     ` Philipp Hahn
2014-07-11  9:53                       ` Wei Liu
2014-07-11 10:32                       ` Wei Liu
2014-07-11 11:02                         ` Philipp Hahn
2014-07-11 11:16                           ` Wei Liu
