From: Philipp Hahn <hahn@univention.de>
To: Wei Liu <wei.liu2@citrix.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
Erik Damrose <Damrose@univention.de>,
Ian Campbell <ian.campbell@citrix.com>,
Zoltan Kiss <zoltan.kiss@citrix.com>
Subject: Re: RFH: Kernel OOPS in xen_netbk_rx_action / xenvif_gop_skb
Date: Fri, 27 Jun 2014 10:42:24 +0200
Message-ID: <53AD2E70.5060002@univention.de>
In-Reply-To: <53A84002.5050402@univention.de>
Hello Wei Liu,
On 23.06.2014 16:56, Philipp Hahn wrote:
> On 19.06.2014 16:12, Wei Liu wrote:
>> On Wed, Jun 18, 2014 at 06:48:31PM +0200, Philipp Hahn wrote:
> ...
>>> 5. then xen-netback continues processing the pending requests and tries
> ...
>> I think your analysis makes sense. Netback does have its internal queue
>> and kthread can certainly be scheduled away. There doesn't seem to be a
>> synchronisation point between a vif getting disconnected and the internal
>> queue getting processed. I attach a quick hack. If it does work to a degree then
>> we can try to work out a proper fix.
>
> Your quick hack seems to have solved the problem: The network survived
> the week-end, but we had to change the VMs as one of them was required
> last weekend. We're currently re-checking that the bug still occurs with
> the old kernel but the ne
We added some debug output (UniDEBUG), as we observed another OOPS in one
test run. But I think that was a mis-compiled kernel: the size of the
function in the trace was the same as in the previous build (0x6f0), but it
should be 0x712 according to "objdump -S":
> [ 6196.712232] BUG: unable to handle kernel paging request at ffffc90010d94678
> [ 6196.712322] IP: [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
> [ 6196.712410] PGD 95822067 PUD 95823067 PMD 94721067 PTE 0
> [ 6196.712473] Oops: 0000 [#1] SMP
...
> [ 6196.713434] CPU: 0 PID: 11743 Comm: netback/0 Not tainted 3.10.0-ucs58-amd64 #1 Univention 3.10.11-1.58.201405060908a~xenXXX
...
> [ 6196.713618] task: ffff8800917f7840 ti: ffff880004bde000 task.ti: ffff880004bde000
> [ 6196.713701] RIP: e030:[<ffffffffa04147dc>] [<ffffffffa04147dc>] xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]
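In the RIP line the kernel prints the faulting address as symbol+offset/size, so "xen_netbk_rx_action+0x18b/0x6f0" means offset 0x18b into a function that is 0x6f0 bytes long. Comparing that size with what "objdump -S" reports for the freshly built module is how a stale or mis-compiled binary shows up. A minimal userspace sketch of that comparison follows; the helper names parse_sym_offset() and oops_matches_build() are hypothetical, and the expected size 0x712 is the objdump value quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split the "+offset/size" suffix printed after a
 * kernel symbol (e.g. "xen_netbk_rx_action+0x18b/0x6f0 [xen_netback]")
 * into its two numbers. Returns 1 on success, 0 if no suffix is found. */
static int parse_sym_offset(const char *sym, unsigned long *offset,
                            unsigned long *size)
{
    const char *plus = strrchr(sym, '+');
    if (plus == NULL)
        return 0;
    /* %lx follows strtoul(..., 16), so the "0x" prefix is accepted;
     * anything after the size (such as " [xen_netback]") is ignored. */
    return sscanf(plus + 1, "%lx/%lx", offset, size) == 2;
}

/* Hypothetical helper: returns 1 if the function size reported in the
 * oops equals the size objdump reports for the newly built module. */
static int oops_matches_build(const char *sym, unsigned long objdump_size)
{
    unsigned long offset, size;
    if (!parse_sym_offset(sym, &offset, &size))
        return 0;
    assert(offset < size); /* the faulting RIP lies inside the function */
    return size == objdump_size;
}
```

Fed the RIP line above with an expected size of 0x712, oops_matches_build() returns 0, i.e. the oops did not come from the freshly compiled function.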
With the modified patch we now get the following hang:
> [ 84.833333] device eth2 entered promiscuous mode
> [ 248.191165] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [ 248.442727] device vif1.0 entered promiscuous mode
> [ 250.721054] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [ 250.721099] XXXlan0: port 2(vif1.0) entered forwarding state
> [ 250.721103] XXXlan0: port 2(vif1.0) entered forwarding state
> [ 253.473859] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [ 253.737812] device vif2.0 entered promiscuous mode
> [ 255.639021] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [ 255.639067] XXXlan0: port 3(vif2.0) entered forwarding state
> [ 255.639072] XXXlan0: port 3(vif2.0) entered forwarding state
> [ 592.867375] UniDEBUG vif->mapped is set to false(xen_netbk_unmap_frontend_rings)
> [ 592.868147] XXXlan0: port 3(vif2.0) entered disabled state
> [ 593.499258] XXXlan0: port 3(vif2.0) entered disabled state
> [ 593.499293] device vif2.0 left promiscuous mode
> [ 593.499295] XXXlan0: port 3(vif2.0) entered disabled state
> [ 595.386548] UniDEBUG vif->mapped is set to false (xenvif_alloc)
> [ 595.633665] device vif3.0 entered promiscuous mode
> [ 597.390410] UniDEBUG vif->mapped is true (xen_netbk_map_frontend_rings)
> [ 597.390458] XXXlan0: port 3(vif3.0) entered forwarding state
> [ 597.390462] XXXlan0: port 3(vif3.0) entered forwarding state
> [ 936.549840] UniDEBUG vif->mapped is set to false(xen_netbk_unmap_frontend_rings)
> [ 936.549869] XXXlan0: port 3(vif3.0) entered disabled state
> [ 936.553024] UniDEBUG vif->mapped is false
Previously, it would oops at this point.
> [ 937.459565] device vif3.0 left promiscuous mode
> [ 937.459570] XXXlan0: port 3(vif3.0) entered disabled state
> [ 1115.250900] INFO: task xenwatch:14 blocked for more than 120 seconds.
> [ 1115.250902] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1115.250904] xenwatch D ffff8800952d9080 0 14 2 0x00000000
> [ 1115.250907] ffff8800952d9080 0000000000000246 ffff880094510880 0000000000013ec0
> [ 1115.250909] ffff88009530ffd8 0000000000013ec0 ffff88009530ffd8 0000000000013ec0
> [ 1115.250911] ffff8800952d9080 0000000000013ec0 0000000000013ec0 ffff88009530e010
> [ 1115.250913] Call Trace:
> [ 1115.250921] [<ffffffff813ca32d>] ? _raw_spin_lock_irqsave+0x11/0x2f
> [ 1115.250925] [<ffffffffa040a396>] ? xenvif_free+0x7a/0xb6 [xen_netback]
> [ 1115.250930] [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [ 1115.250934] [<ffffffff812697c2>] ? xenbus_rm+0x44/0x4f
> [ 1115.250937] [<ffffffffa0409ead>] ? netback_remove+0x5d/0x7e [xen_netback]
> [ 1115.250940] [<ffffffff8126a72f>] ? xenbus_dev_remove+0x29/0x4e
> [ 1115.250943] [<ffffffff812a4ff0>] ? __device_release_driver+0x7f/0xd5
> [ 1115.250946] [<ffffffff812a50fc>] ? device_release_driver+0x1d/0x29
> [ 1115.250948] [<ffffffff812a4519>] ? bus_remove_device+0xee/0x103
> [ 1115.250950] [<ffffffff812a2b49>] ? device_del+0x112/0x182
> [ 1115.250952] [<ffffffff812a2bc2>] ? device_unregister+0x9/0x12
> [ 1115.250955] [<ffffffff81268e20>] ? xenwatch_thread+0x122/0x15f
> [ 1115.250957] [<ffffffff8105d373>] ? wake_up_bit+0x20/0x20
> [ 1115.250959] [<ffffffff81268cfe>] ? xs_watch+0x57/0x57
> [ 1115.250962] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [ 1115.250964] [<ffffffff81268cfe>] ? xs_watch+0x57/0x57
> [ 1115.250966] [<ffffffff8105ce1e>] ? kthread+0xab/0xb3
> [ 1115.250969] [<ffffffff81003638>] ? xen_end_context_switch+0xe/0x1c
> [ 1115.250972] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
> [ 1115.250975] [<ffffffff813cfbfc>] ? ret_from_fork+0x7c/0xb0
> [ 1115.250977] [<ffffffff8105cd73>] ? kthread_freezable_should_stop+0x56/0x56
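The debug messages above show the lifecycle of vif->mapped: false after xenvif_alloc, true after xen_netbk_map_frontend_rings, false again after xen_netbk_unmap_frontend_rings, and the rx path now skips the vif when the flag is false instead of oopsing. The hung xenwatch task sits in xenvif_free. Assuming (not confirmed in this thread) that xenvif_free() in this kernel waits for the vif's reference count to reach zero, one possible way to get such a hang is a guard that skips a queued packet but never drops the reference that packet holds. The following is only a userspace model of that idea; struct vif_model and all function names are hypothetical and are not netback code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vif; NOT the real struct xenvif. */
struct vif_model {
    bool mapped;   /* true while the frontend rings are mapped */
    int refcnt;    /* base reference plus one per queued packet */
    int delivered; /* packets actually copied to the ring */
};

void vif_alloc(struct vif_model *vif)
{
    vif->mapped = false; /* cf. "set to false (xenvif_alloc)" */
    vif->refcnt = 1;     /* base reference, dropped at teardown */
    vif->delivered = 0;
}

void vif_map_rings(struct vif_model *vif)   { vif->mapped = true; }
void vif_unmap_rings(struct vif_model *vif) { vif->mapped = false; }

/* Assumption: a packet waiting in the backend's internal queue pins the vif. */
void vif_queue_packet(struct vif_model *vif) { vif->refcnt++; }

/* One step of the modelled rx action for a single queued packet. */
void rx_process_one(struct vif_model *vif)
{
    if (vif->mapped)
        vif->delivered++; /* rings still mapped: safe to touch them */
    /* Delivered or dropped, the packet's reference is released here.
     * Skipping this on the !mapped path would leave refcnt > 0 forever,
     * so a teardown waiting for zero would block indefinitely. */
    assert(vif->refcnt > 0);
    vif->refcnt--;
}

/* Modelled teardown: drop the base reference, then wait for zero. */
void vif_drop_base_ref(struct vif_model *vif) { vif->refcnt--; }
bool vif_can_free(const struct vif_model *vif) { return vif->refcnt == 0; }
```

In this model, unmapping the rings while one packet is still queued keeps vif_can_free() false until rx_process_one() has run for that packet, even though the packet is dropped rather than delivered.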
Any idea?
Sincerely
Philipp