Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb

From: Philipp Hahn <hahn@univention.de>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Erik Damrose <Damrose@univention.de>,
	Ian Campbell <ian.campbell@citrix.com>,
	Zoltan Kiss <zoltan.kiss@citrix.com>
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Fri, 27 Jun 2014 10:42:24 +0200
Message-ID: <53AD2E70.5060002@univention.de>
In-Reply-To: <53A84002.5050402@univention.de>

Hello Wei Liu,

On 23.06.2014 16:56, Philipp Hahn wrote:
> On 19.06.2014 16:12, Wei Liu wrote:
>> On Wed, Jun 18, 2014 at 06:48:31PM +0200, Philipp Hahn wrote:
> ...
>>> 5. then xen-netback continues processing the pending requests and tries
> ...
>> I think your analysis makes sense. Netback does have its internal queue
>> and kthread can certainly be scheduled away. There doesn't seem to be a
>> synchronisation point between a vif getting disconnected and the internal
>> queue getting processed. I attach a quick hack. If it does work to a degree then
>> we can try to work out a proper fix.
> 
> Your quick hack seems to have solved the problem: The network survived
> the week-end, but we had to change the VMs as one of them was required
> last weekend. We're currently re-checking that the bug still occurs with
> the old kernel but the ne

We added some debug output (UniDEBUG) because we observed another OOPS in
one test run. I think that run used a mis-compiled kernel: the size of the
function in the trace below (0x6f0) was the same as before, whereas it
should be 0x712 according to "objdump -S":

> [ 6196.712232] BUG: unable to handle kernel paging request at ffffc90010d94678
> [ 6196.712322] IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [ 6196.712410] PGD 95822067 PUD 95823067 PMD 94721067 PTE 0
> [ 6196.712473] Oops: 0000 [#1] SMP
...
> [ 6196.713434] CPU: 0 PID: 11743 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1
> Univention 3.10.11-1.58.201405060908a~xenXXX
...
> [ 6196.713618] task: ffff8800917f7840 ti: ffff880004bde000 task.ti: ffff880004bde000
> [ 6196.713701] RIP: e030:[<ffffffffa04147dc>]  [<ffffffffa04147dc>]
> xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
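
The size check described above can be reproduced mechanically. The
following is only a minimal sketch: parse_oops_loc() and size_matches() are
hypothetical helper names, not part of any kernel tool. It splits the
"symbol+offset/size" text from the IP line and compares the size against the
value read from "objdump -S".

```c
#include <stdio.h>

/* Hypothetical helper type: the pieces of "symbol+0xOFF/0xSIZE". */
struct oops_loc {
    char symbol[64];
    unsigned long offset;  /* offset of the faulting instruction */
    unsigned long size;    /* function size as seen by the running kernel */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_oops_loc(const char *text, struct oops_loc *out)
{
    /* %lx accepts an optional "0x" prefix, as printed in oops output. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->size) != 3)
        return -1;
    return 0;
}

/* Nonzero if the running function has the size objdump reports. */
int size_matches(const struct oops_loc *loc, unsigned long objdump_size)
{
    return loc->size == objdump_size;
}
```

Feeding it "xen_netbk_rx_action+0x18b/0x6f0" and comparing against 0x712
reports a mismatch, which is what suggested that the module in that run was
not the one built from the patched source.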

With the modified patch we now get the following hang:

> [   84.833333] device eth2 entered promiscuous mode
> [  248.191165] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [  248.442727] device vif1.0 entered promiscuous mode
> [  250.721054] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [  250.721099] XXXlan0: port 2(vif1.0) entered forwarding state
> [  250.721103] XXXlan0: port 2(vif1.0) entered forwarding state
> [  253.473859] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [  253.737812] device vif2.0 entered promiscuous mode
> [  255.639021] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [  255.639067] XXXlan0: port 3(vif2.0) entered forwarding state
> [  255.639072] XXXlan0: port 3(vif2.0) entered forwarding state
> [  592.867375] UniDEBUG vif->mapped is set to false(xen_netbk_unmap_frontend_rings)
> [  592.868147] XXXlan0: port 3(vif2.0) entered disabled state
> [  593.499258] XXXlan0: port 3(vif2.0) entered disabled state
> [  593.499293] device vif2.0 left promiscuous mode
> [  593.499295] XXXlan0: port 3(vif2.0) entered disabled state
> [  595.386548] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [  595.633665] device vif3.0 entered promiscuous mode
> [  597.390410] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [  597.390458] XXXlan0: port 3(vif3.0) entered forwarding state
> [  597.390462] XXXlan0: port 3(vif3.0) entered forwarding state
> [  936.549840] UniDEBUG vif->mapped is set to false(xen_netbk_unmap_frontend_rings)
> [  936.549869] XXXlan0: port 3(vif3.0) entered disabled state
> [  936.553024] UniDEBUG vif->mapped is false
Here it would have oopsed previously.
> [  937.459565] device vif3.0 left promiscuous mode
> [  937.459570] XXXlan0: port 3(vif3.0) entered disabled state
> [ 1115.250900] INFO: task xenwatch:14 blocked for more than 120 seconds.
> [ 1115.250902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1115.250904] xenwatch        D ffff8800952d9080     0    14      2 0x00000000
> [ 1115.250907]  ffff8800952d9080 0000000000000246 ffff880094510880 0000000000013ec0
> [ 1115.250909]  ffff88009530ffd8 0000000000013ec0 ffff88009530ffd8 0000000000013ec0
> [ 1115.250911]  ffff8800952d9080 0000000000013ec0 0000000000013ec0 ffff88009530e010
> [ 1115.250913] Call Trace:
> [ 1115.250921]  [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
> [ 1115.250925]  [<ffffffffa040a396>] ? xenvif_free+0x7a/0xb6 [xen_netback]
> [ 1115.250930]  [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [ 1115.250934]  [<ffffffff812697c2>] ? xenbus_rm+0x44/0x4f
> [ 1115.250937]  [<ffffffffa0409ead>] ? netback_remove+0x5d/0x7e [xen_netback]
> [ 1115.250940]  [<ffffffff8126a72f>] ? xenbus_dev_remove+0x29/0x4e
> [ 1115.250943]  [<ffffffff812a4ff0>] ? __device_release_driver+0x7f/0xd5
> [ 1115.250946]  [<ffffffff812a50fc>] ? device_release_driver+0x1d/0x29
> [ 1115.250948]  [<ffffffff812a4519>] ? bus_remove_device+0xee/0x103
> [ 1115.250950]  [<ffffffff812a2b49>] ? device_del+0x112/0x182
> [ 1115.250952]  [<ffffffff812a2bc2>] ? device_unregister+0x9/0x12
> [ 1115.250955]  [<ffffffff81268e20>] ? xenwatch_thread+0x122/0x15f
> [ 1115.250957]  [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [ 1115.250959]  [<ffffffff81268cfe>] ? xs_watch+0x57/0x57
> [ 1115.250962]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [ 1115.250964]  [<ffffffff81268cfe>] ? xs_watch+0x57/0x57
> [ 1115.250966]  [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
> [ 1115.250969]  [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
> [ 1115.250972]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [ 1115.250975]  [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
> [ 1115.250977]  [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
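
One possible reading of this trace, offered purely as an assumption, is that
xenvif_free waits until nothing references the vif any more, and that the
path which now skips packets for an unmapped vif never drops the reference
those packets hold. Nothing in this message confirms that. The user-space
model below only illustrates the pattern. All names (model_vif, model_pkt,
model_rx_action, put_on_skip) are hypothetical and do not match the real
struct xenvif or the quick hack.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space model only; not the real struct xenvif. */
struct model_vif {
    bool mapped;  /* false once the frontend rings are unmapped */
    int refcnt;   /* one reference per packet still queued for this vif */
};

struct model_pkt {
    struct model_vif *vif;
};

/* Queue a packet: it pins the vif until the rx path has dealt with it. */
void model_queue(struct model_vif *vif, struct model_pkt *pkt)
{
    pkt->vif = vif;
    vif->refcnt++;
}

/*
 * Drain the queue. Packets for an unmapped vif are skipped rather than
 * copied to a ring that no longer exists. With put_on_skip == false the
 * skip path forgets to drop the packet's reference (the assumed bug).
 * Returns the number of packets delivered.
 */
int model_rx_action(struct model_pkt *pkts, size_t n, bool put_on_skip)
{
    int delivered = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_vif *vif = pkts[i].vif;

        if (!vif->mapped) {
            if (put_on_skip)
                vif->refcnt--;  /* release the reference on the skip path */
            continue;
        }
        delivered++;
        vif->refcnt--;          /* normal path: packet done, drop reference */
    }
    return delivered;
}

/* Teardown can only finish once no queued packet references the vif. */
bool model_can_free(const struct model_vif *vif)
{
    return vif->refcnt == 0;
}
```

In this model, queueing two packets, clearing mapped, and draining with
put_on_skip == false leaves refcnt at 2, so model_can_free() never becomes
true. That would correspond to a waiter blocked forever. Draining with
put_on_skip == true brings refcnt back to 0.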

Any idea?

Sincerely
Philipp
