netdev.vger.kernel.org archive mirror
From: Stephen Hemminger <stephen@networkplumber.org>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: devel@linuxdriverproject.org, haiyangz@microsoft.com,
	sthemmin@microsoft.com, netdev@vger.kernel.org
Subject: Re: [PATCH net-next 1/1] netvsc: fix rtnl deadlock on unregister of vf
Date: Mon, 7 Aug 2017 08:17:56 -0700
Message-ID: <20170807081756.1cf95326@xeon-e3>
In-Reply-To: <87y3qvcxci.fsf@vitty.brq.redhat.com>

On Mon, 07 Aug 2017 15:37:49 +0200
Vitaly Kuznetsov <vkuznets@redhat.com> wrote:

> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
> 
> > Stephen Hemminger <stephen@networkplumber.org> writes:
> >  
> >> With new transparent VF support, it is possible to get a deadlock
> >> when some of the deferred work is running and the unregister_vf
> >> is trying to cancel the work element. The solution is to use
> >> trylock and reschedule (similar to bonding and team device).
> >>
> >> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> Fixes: 0c195567a8f6 ("netvsc: transparent VF management")
> >> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >> ---
> >>  drivers/net/hyperv/netvsc_drv.c | 12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> >> index c71728d82049..e75c0f852a63 100644
> >> --- a/drivers/net/hyperv/netvsc_drv.c
> >> +++ b/drivers/net/hyperv/netvsc_drv.c
> >> @@ -1601,7 +1601,11 @@ static void netvsc_vf_setup(struct work_struct *w)
> >>  	struct net_device *ndev = hv_get_drvdata(ndev_ctx->device_ctx);
> >>  	struct net_device *vf_netdev;
> >>
> >> -	rtnl_lock();
> >> +	if (!rtnl_trylock()) {
> >> +		schedule_work(w);
> >> +		return;
> >> +	}
> >> +
> >>  	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
> >>  	if (vf_netdev)
> >>  		__netvsc_vf_setup(ndev, vf_netdev);
> >> @@ -1655,7 +1659,11 @@ static void netvsc_vf_update(struct work_struct *w)
> >>  	struct net_device *vf_netdev;
> >>  	bool vf_is_up;
> >>
> >> -	rtnl_lock();
> >> +	if (!rtnl_trylock()) {
> >> +		schedule_work(w);
> >> +		return;
> >> +	}
> >> +  
> >
> > So in the situation when we're currently in netvsc_unregister_vf() and
> > trying to do
> > 	cancel_work_sync(&net_device_ctx->vf_takeover);
> > 	cancel_work_sync(&net_device_ctx->vf_notify);
> >
> > we'll end up not executing netvsc_vf_update() at all, right? Wouldn't it
> > create an issue as nobody is switching the datapath back to netvsc?

It worked in testing, but most likely only because the host is switching
the datapath back for us. That is not a good thing to rely on.
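
The problem with trylock-and-reschedule can be shown with a small userspace
model. This is only a sketch: every type and function name here (struct model,
model_vf_update, model_cancel_work, model_unregister_vf) is hypothetical and
stands in for the kernel behavior described above. It is not the driver code.
It assumes the unregister path holds RTNL while it cancels the pending work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
enum datapath { DATAPATH_SYNTHETIC, DATAPATH_VF };

struct model {
	bool rtnl_held;      /* stands in for the RTNL mutex being held */
	bool update_pending; /* stands in for the queued vf_notify work */
	enum datapath path;  /* which device currently carries traffic */
};

/* Models the patched netvsc_vf_update(): trylock, else requeue. */
static void model_vf_update(struct model *m)
{
	assert(m != NULL);
	m->update_pending = false;        /* work is dequeued before it runs */
	if (m->rtnl_held) {               /* rtnl_trylock() would fail here */
		m->update_pending = true; /* schedule_work(w) */
		return;
	}
	m->path = DATAPATH_SYNTHETIC;     /* switch back to netvsc */
}

/* Models cancel_work_sync(): pending work is dropped, never run. */
static void model_cancel_work(struct model *m)
{
	assert(m != NULL);
	m->update_pending = false;
}

/*
 * Models VF removal: RTNL is held by the unregister path, the worker
 * gets one attempt, then the work is cancelled before the lock drops.
 */
static enum datapath model_unregister_vf(void)
{
	struct model m = {
		.rtnl_held = true,
		.update_pending = true,
		.path = DATAPATH_VF,
	};

	if (m.update_pending)
		model_vf_update(&m); /* trylock fails, work is requeued */
	model_cancel_work(&m);       /* requeued work is thrown away */
	m.rtnl_held = false;
	return m.path;
}
```

In this model, model_unregister_vf() leaves the datapath on the VF: nothing
deadlocks, but the switch back to the synthetic path never happens either.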

> 
> Actually, looking more at this I think we have additional issues:
> 
> netvsc_unregister_vf() may get executed _before_ netvsc_vf_update() gets
> a chance and we just cancel it so the data path is never switched
> back. I actually have a VM where I suppose it happens ...
> 
> [    7.235566] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF up: enP2p0s2
> [    7.235569] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: Datapath switched to VF: enP2p0s2
> 
> On VF removal:
> 
> [   17.675885] mlx4_en: enP2p0s2: Close port called
> [   17.727005] hv_netvsc 33b7a6f9-6736-451f-8fce-b382eaa50bee eth1: VF unregistering: enP2p0s2
> <and nothing after - so the data path is not switched>
> 
> We need to make sure netvsc_vf_update() is always processed on removal.

vf_update was converted to a work queue because there were some cases
where the old code could sleep. It is probably best to go back to doing it
directly in the notifier and handle the special cases there.
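
A minimal userspace sketch of the direct approach, under the assumption that
the notifier callback is entered with RTNL already held, so no lock needs to
be taken and no work needs to be deferred. The names (struct dp_state,
notifier_vf_down, remove_vf_direct) are hypothetical, and the special cases
where the old code could sleep are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
enum dp_target { DP_SYNTHETIC, DP_VF };

struct dp_state {
	bool rtnl_held;        /* stands in for the RTNL mutex being held */
	enum dp_target active; /* which device currently carries traffic */
};

/*
 * Models doing the datapath switch directly in the notifier.
 * The caller is assumed to hold RTNL, so there is nothing to
 * acquire and nothing queued that a later cancel could drop.
 */
static void notifier_vf_down(struct dp_state *s)
{
	assert(s != NULL);
	assert(s->rtnl_held);      /* analogous to an ASSERT_RTNL() check */
	s->active = DP_SYNTHETIC;  /* switch back to netvsc immediately */
}

/* Models VF removal when the switch is done synchronously. */
static enum dp_target remove_vf_direct(void)
{
	struct dp_state s = { .rtnl_held = true, .active = DP_VF };

	notifier_vf_down(&s);
	s.rtnl_held = false;
	return s.active;
}
```

Because the switch completes before the unregister path returns, there is no
pending work left to cancel, and remove_vf_direct() always ends on the
synthetic datapath in this model.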
