From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: KY Srinivasan <kys@microsoft.com>,
leann.ogasawara@canonical.com,
Stephen Hemminger <sthemmin@microsoft.com>,
"apw\@canonical.com" <apw@canonical.com>,
"olaf\@aepfle.de" <olaf@aepfle.de>,
"marcelo.cerri\@canonical.com" <marcelo.cerri@canonical.com>,
"gregkh\@linuxfoundation.org" <gregkh@linuxfoundation.org>,
Haiyang Zhang <haiyangz@microsoft.com>,
"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"jasowang\@redhat.com" <jasowang@redhat.com>,
"devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>
Subject: Re: [PATCH 1/1] Drivers: hv: vmbus: Fix rescind handling issues
Date: Mon, 18 Sep 2017 10:31:21 +0200 [thread overview]
Message-ID: <87vakgtnl2.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <CAOaVG15SSUemaWuuCXz3O=3S2jCW+3=1Jdp1Ff8P+za9TFerRA@mail.gmail.com> (Stephen Hemminger's message of "Fri, 15 Sep 2017 14:54:30 -0700")
Stephen Hemminger <stephen@networkplumber.org> writes:
> On Sep 15, 2017 11:01 AM, "KY Srinivasan" <kys@microsoft.com> wrote:
>
> > Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> >
> > >
> > > I'm seeing the same issue, reverting the offending
> > >
> > > commit 6f3d791f300618caf82a2be0c27456edd76d5164
> > > Author: K. Y. Srinivasan <kys@microsoft.com>
> > > Date: Fri Aug 11 10:03:59 2017 -0700
> > >
> > > Drivers: hv: vmbus: Fix rescind handling issues
> > >
> > > which made it upstream helps. Did you guys do some investigation here?
> > > In case not I can take a look next week.
> >
> > Sorry, I have to resend - K.Y.'s kys@exchange.microsoft.com doesn't
> > accept mail from me :-(
>
> I think this turned out to be a bug in netvsc - Stephen can elaborate. I think the fix
> has been submitted upstream as well.
>
> K. Y
> >
> > --
> > Vitaly
>
> It turned out that subchannel call back is run from primary channel call back, and was deadlocking waiting for vmbus open.
>
> Original code had broken wait for sub channels.
> The first for that induced this bug.
>
> The resolution is in latest network driver was to bring up sub channels from work queue
This is a bit confusing, in case you mean
commit 8195b1396ec86dddbba443c74b2188b423556c74
Author: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed Sep 6 13:53:05 2017 -0700
hv_netvsc: fix deadlock on hotplug
is supposed to fix the issue. I'm testing the latest net-next which has
it:
$ git log --oneline drivers/net/hyperv/
5023a6db7319 netvsc: increase default receive buffer size
8f2bb1de7334 hv_netvsc: avoid unnecessary wakeups on subchannel creation
8195b1396ec8 hv_netvsc: fix deadlock on hotplug
db3cd7af9d0f hv_netvsc: Fix the channel limit in netvsc_set_rxfh()
06be580ac7b6 hv_netvsc: Simplify the limit check in netvsc_set_channels()
...
and when I do
# ip link set dev eth0 mtu 1400
# ip link set dev eth0 mtu 1500
(actually, you can do mtu change just once the deadlock has already
happened, doing it twice just reveals the issue faster - it will hang
permanently trying to get rtnl lock which is already taken)
The log is:
[ 248.800089] INFO: task kworker/u480:0:5 blocked for more than 120 seconds.
[ 248.804065] Not tainted 4.14.0-rc1 #63
[ 248.806486] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 248.810879] kworker/u480:0 D 0 5 2 0x80000000
[ 248.814225] Workqueue: netns cleanup_net
[ 248.816745] Call Trace:
[ 248.818286] __schedule+0x1d5/0x790
[ 248.820398] ? sched_clock+0x9/0x10
[ 248.822297] ? select_idle_sibling+0x24/0x420
[ 248.824883] schedule+0x31/0x80
[ 248.826835] schedule_preempt_disabled+0x9/0x10
[ 248.829444] __mutex_lock.isra.1+0x1a3/0x4f0
[ 248.831973] ? sched_clock_cpu+0x5d/0xb0
[ 248.834300] __mutex_lock_slowpath+0xe/0x10
[ 248.836869] ? __mutex_lock_slowpath+0xe/0x10
[ 248.839488] mutex_lock+0x2a/0x30
[ 248.841378] rtnl_lock+0x10/0x20
[ 248.843431] cleanup_net+0x94/0x2e0
[ 248.845479] process_one_work+0x193/0x390
[ 248.847838] worker_thread+0x48/0x3c0
[ 248.850121] kthread+0x120/0x140
[ 248.851927] ? process_one_work+0x390/0x390
[ 248.854425] ? kthread_create_on_node+0x60/0x60
[ 248.856884] ? kthread_create_on_node+0x60/0x60
[ 248.859585] ret_from_fork+0x25/0x30
[ 248.861581] INFO: task kworker/9:2:374 blocked for more than 120 seconds.
[ 248.865286] Not tainted 4.14.0-rc1 #63
[ 248.867655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 248.872135] kworker/9:2 D 0 374 2 0x80000000
[ 248.875175] Workqueue: events rndis_set_subchannel [hv_netvsc]
[ 248.878473] Call Trace:
[ 248.879825] __schedule+0x1d5/0x790
[ 248.882122] schedule+0x31/0x80
[ 248.884160] rndis_set_subchannel+0x186/0x1e0 [hv_netvsc]
[ 248.887741] ? finish_wait+0x80/0x80
[ 248.889960] process_one_work+0x193/0x390
[ 248.892349] worker_thread+0x48/0x3c0
[ 248.894484] kthread+0x120/0x140
[ 248.896364] ? process_one_work+0x390/0x390
[ 248.898756] ? kthread_create_on_node+0x60/0x60
[ 248.901229] ret_from_fork+0x25/0x30
[ 248.903180] INFO: task hypervkvpd:656 blocked for more than 120 seconds.
[ 248.907120] Not tainted 4.14.0-rc1 #63
[ 248.909966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 248.914387] hypervkvpd D 0 656 1 0x00000000
[ 248.917618] Call Trace:
[ 248.919119] __schedule+0x1d5/0x790
[ 248.921343] schedule+0x31/0x80
[ 248.924454] schedule_preempt_disabled+0x9/0x10
[ 248.927792] __mutex_lock.isra.1+0x1a3/0x4f0
[ 248.930951] __mutex_lock_slowpath+0xe/0x10
[ 248.934078] ? __mutex_lock_slowpath+0xe/0x10
[ 248.937466] mutex_lock+0x2a/0x30
[ 248.940228] __netlink_dump_start+0x44/0x190
[ 248.943286] rtnetlink_rcv_msg+0x1a2/0x280
[ 248.946237] ? vsnprintf+0xea/0x4d0
[ 248.948888] ? rtnl_getlink+0x1c0/0x1c0
[ 248.951713] ? rtnl_getlink+0x1c0/0x1c0
[ 248.954644] ? rtnl_calcit.isra.26+0x110/0x110
[ 248.958010] netlink_rcv_skb+0x89/0x130
[ 248.960945] rtnetlink_rcv+0x10/0x20
[ 248.963566] netlink_unicast+0x186/0x220
[ 248.966442] netlink_sendmsg+0x2a8/0x3a0
[ 248.969369] sock_sendmsg+0x33/0x40
[ 248.971955] SYSC_sendto+0x13f/0x180
[ 248.974552] ? SYSC_getsockname+0xc7/0xe0
[ 248.977393] ? fd_install+0x20/0x30
[ 248.980044] ? sock_map_fd+0x3f/0x60
[ 248.982995] SyS_sendto+0x9/0x10
[ 248.986106] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 248.989785] RIP: 0033:0x7f544421f0d3
[ 248.992576] RSP: 002b:00007ffcb8a81dc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 248.997458] RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007f544421f0d3
[ 249.002140] RDX: 0000000000000014 RSI: 00007ffcb8a82e20 RDI: 000000000000000a
[ 249.006863] RBP: 00007ffcb8a84e50 R08: 00007ffcb8a82e00 R09: 000000000000000c
[ 249.011590] R10: 0000000000000000 R11: 0000000000000246 R12: 00005598e3c4d963
[ 249.016558] R13: 00005598e3bb1110 R14: 00005598e3c4d8c0 R15: 0000000000000011
[ 249.021481] INFO: task ip:1004 blocked for more than 120 seconds.
[ 249.025703] Not tainted 4.14.0-rc1 #63
[ 249.028713] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 249.039417] ip D 0 1004 938 0x00000000
[ 249.043418] Call Trace:
[ 249.045633] __schedule+0x1d5/0x790
[ 249.048335] schedule+0x31/0x80
[ 249.050795] schedule_preempt_disabled+0x9/0x10
[ 249.054109] __mutex_lock.isra.1+0x1a3/0x4f0
[ 249.057251] ? security_capable+0x43/0x60
[ 249.060188] __mutex_lock_slowpath+0xe/0x10
[ 249.063299] ? __netlink_ns_capable+0x36/0x40
[ 249.066416] ? __mutex_lock_slowpath+0xe/0x10
[ 249.069520] mutex_lock+0x2a/0x30
[ 249.072032] rtnetlink_rcv_msg+0x1cb/0x280
[ 249.075189] ? rtnl_calcit.isra.26+0x110/0x110
[ 249.079047] netlink_rcv_skb+0x89/0x130
[ 249.082209] rtnetlink_rcv+0x10/0x20
[ 249.085369] netlink_unicast+0x186/0x220
[ 249.088767] netlink_sendmsg+0x2a8/0x3a0
[ 249.091777] sock_sendmsg+0x33/0x40
[ 249.094353] SYSC_sendto+0x13f/0x180
[ 249.096938] ? handle_mm_fault+0x65/0xf0
[ 249.099824] ? __do_page_fault+0x250/0x4a0
[ 249.102817] SyS_sendto+0x9/0x10
[ 249.105240] entry_SYSCALL_64_fastpath+0x1a/0xa5
[ 249.108489] RIP: 0033:0x7f972cc28f7d
[ 249.111176] RSP: 002b:00007ffed31097e8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 249.115958] RAX: ffffffffffffffda RBX: 000000000066fc90 RCX: 00007f972cc28f7d
[ 249.120548] RDX: 0000000000000020 RSI: 00007ffed3109800 RDI: 0000000000000003
[ 249.125439] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
[ 249.130397] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[ 249.135432] R13: 00007ffed3109ed0 R14: 00007ffed3109ed8 R15: 00007ffed310ae1f
Reverting 6f3d791f300618caf82a2be0c27456edd76d5164 still helps. Or do
you mean some other fix which is not yet in net-next?
--
Vitaly
next prev parent reply other threads:[~2017-09-18 8:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-11 17:03 [PATCH 1/1] Drivers: hv: vmbus: Fix rescind handling issues kys
2017-08-24 22:41 ` Stephen Hemminger
2017-09-15 13:42 ` Vitaly Kuznetsov
2017-09-15 13:48 ` Vitaly Kuznetsov
2017-09-15 18:01 ` KY Srinivasan
[not found] ` <CAOaVG15SSUemaWuuCXz3O=3S2jCW+3=1Jdp1Ff8P+za9TFerRA@mail.gmail.com>
2017-09-18 8:31 ` Vitaly Kuznetsov [this message]
2017-09-18 12:55 ` Vitaly Kuznetsov
2017-09-18 23:18 ` KY Srinivasan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87vakgtnl2.fsf@vitty.brq.redhat.com \
--to=vkuznets@redhat.com \
--cc=apw@canonical.com \
--cc=devel@linuxdriverproject.org \
--cc=gregkh@linuxfoundation.org \
--cc=haiyangz@microsoft.com \
--cc=jasowang@redhat.com \
--cc=kys@microsoft.com \
--cc=leann.ogasawara@canonical.com \
--cc=linux-kernel@vger.kernel.org \
--cc=marcelo.cerri@canonical.com \
--cc=olaf@aepfle.de \
--cc=stephen@networkplumber.org \
--cc=sthemmin@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.