From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: devel@linuxdriverproject.org, haiyangz@microsoft.com,
sthemmin@microsoft.com, netdev@vger.kernel.org
Subject: Re: [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf
Date: Mon, 07 Aug 2017 17:17:19 +0200
Message-ID: <87a83bcsqo.fsf@vitty.brq.redhat.com>
In-Reply-To: <87y3qvcxci.fsf@vitty.brq.redhat.com> (Vitaly Kuznetsov's message of "Mon, 07 Aug 2017 15:37:49 +0200")
Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> Stephen Hemminger <stephen@networkplumber.org> writes:
>>
>>> With new transparent VF support, it is possible to get a deadlock
>>> when some of the deferred work is running and the unregister_vf
>>> is trying to cancel the work element. The solution is to use
>>> trylock and reschedule (similar to bonding and team device).
>>>
>>> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
>>> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>>> ---
>>> drivers/net/hyperv/netvsc_drv.c | 12 ++++++++++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
>>> index c71728d82049..e75c0f852a63 100644
>>> --- a/drivers/net/hyperv/netvsc_drv.c
>>> +++ b/drivers/net/hyperv/netvsc_drv.c
>>> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
>>>  	struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
>>>  	struct net_device *vf_netdev;
>>>
>>> -	rtnl_lock();
>>> +	if (!rtnl_trylock()) {
>>> +		schedule_work(w);
>>> +		return;
>>> +	}
>>> +
>>>  	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
>>>  	if (vf_netdev)
>>>  		__netvsc_vf_setup(ndev, vf_netdev);
>>> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
>>>  	struct net_device *vf_netdev;
>>>  	bool vf_is_up;
>>>
>>> -	rtnl_lock();
>>> +	if (!rtnl_trylock()) {
>>> +		schedule_work(w);
>>> +		return;
>>> +	}
>>> +
>>
>> So in the situation when we're currently in netvsc_unregister_vf() and
>> trying to do
>>
>> 	cancel_work_sync(&net_device_ctx->vf_takeover);
>> 	cancel_work_sync(&net_device_ctx->vf_notify);
>>
>> we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
>> create an issue as nobody is switching the datapath back to netvsc?
>>
>
> Actually, looking more at this I think we have additional issues:
>
> netvsc_unregister_vf() may get executed _before_ netvsc_vf_update() gets
> a chance and we just cancel it so the data path is never switched
> back. I actually have a VM where I suppose it happens ...
>
> [ 7.235566] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF up: enP2p0s2
> [ 7.235569] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Datapath switched to VF: enP2p0s2
>
> On VF removal:
>
> [ 17.675885] mlx4_en: enP2p0s2: Close port called
> [ 17.727005] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF unregistering: enP2p0s2
> <and nothing after - so the data path is not switched>
>
> We need to make sure netvsc_vf_update() is always processed on removal.

So the question I have is: why do we need to call netvsc_vf_update()
from a work item at all? I tried calling it directly from
netvsc_netdev_event() (with the rtnl_lock()/rtnl_unlock() calls dropped
from it, of course, as the notifier already runs with RTNL held) and
everything seems to work for me.

Shall I send a patch removing the work?
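
To illustrate the ordering problem and the direct-call idea, here is a
rough userspace model, not driver code and not the patch itself. All
model_* names are made up for this sketch: a bool stands in for RTNL, a
'pending' flag stands in for the queued vf_notify work, and clearing
that flag on unregister stands in for cancel_work_sync(). It assumes
the VF goes down first and is unregistered before the work gets to run,
which is what the log above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTNL; the kernel uses rtnl_lock()/rtnl_unlock(). */
static bool model_rtnl_held;

struct model_ctx {
	bool vf_registered;
	bool vf_up;
	bool datapath_on_vf;	/* where traffic currently goes */
	bool update_pending;	/* models a queued vf_notify work item */
};

enum model_event { MODEL_VF_UP, MODEL_VF_DOWN, MODEL_VF_UNREGISTER };

/* Models netvsc_vf_update() without its own locking: caller holds RTNL. */
static void model_vf_update(struct model_ctx *c)
{
	assert(model_rtnl_held);	/* plays the role of ASSERT_RTNL() */
	c->datapath_on_vf = c->vf_registered && c->vf_up;
}

/* Models the workqueue finally running the deferred update, if any. */
static void model_run_pending_work(struct model_ctx *c)
{
	if (!c->update_pending)
		return;
	c->update_pending = false;
	model_rtnl_held = true;
	model_vf_update(c);
	model_rtnl_held = false;
}

/*
 * Models the netdev notifier. use_work == true defers the update to a
 * work item; use_work == false calls the update directly.
 */
static void model_netdev_event(struct model_ctx *c, enum model_event ev,
			       bool use_work)
{
	assert(model_rtnl_held);	/* notifiers are called under RTNL */

	switch (ev) {
	case MODEL_VF_UP:
		c->vf_registered = true;
		c->vf_up = true;
		break;
	case MODEL_VF_DOWN:
		c->vf_up = false;
		break;
	case MODEL_VF_UNREGISTER:
		c->vf_up = false;
		c->vf_registered = false;
		if (use_work) {
			/* cancel_work_sync() analogue: the queued update is dropped */
			c->update_pending = false;
			return;
		}
		break;
	}

	if (use_work)
		c->update_pending = true;
	else
		model_vf_update(c);
}

/* Delivers one event the way the core would: with the lock held. */
static void model_notify(struct model_ctx *c, enum model_event ev,
			 bool use_work)
{
	model_rtnl_held = true;
	model_netdev_event(c, ev, use_work);
	model_rtnl_held = false;
}
```

Feeding UP, then DOWN immediately followed by UNREGISTER through the
deferred variant leaves datapath_on_vf set, because the queued update
is dropped before it runs. The direct variant clears it already on
DOWN, with no work item left to cancel.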
--
Vitaly
Thread overview: 8+ messages
2017-08-04 19:13 [PATCH net-next 0/1] netvsc: fix deadlock in VF unregister Stephen Hemminger
2017-08-04 19:14 ` [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf Stephen Hemminger
2017-08-07 4:29 ` David Miller
2017-08-07 13:08 ` Vitaly Kuznetsov
2017-08-07 13:37 ` Vitaly Kuznetsov
2017-08-07 15:17 ` Vitaly Kuznetsov [this message]
2017-08-07 15:21 ` Stephen Hemminger
2017-08-07 15:17 ` Stephen Hemminger