From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: "devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>,
KY Srinivasan <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown
Date: Tue, 14 Jul 2015 18:02:54 +0200 [thread overview]
Message-ID: <87lheiybf5.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <19f503e369b04d01b79a1bde866a39f8@SIXPR30MB031.064d.mgd.msft.net> (Dexuan Cui's message of "Tue, 14 Jul 2015 11:41:44 +0000")
Dexuan Cui <decui@microsoft.com> writes:
>> -----Original Message-----
>> From: Vitaly Kuznetsov
>> Sent: Monday, July 13, 2015 20:19
>> Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device
>> shutdown
>>
>> When a new subchannel offer from host comes during device shutdown (e.g.
>> when a netvsc/storvsc module is unloadedshortly after it was loaded) a
>> crash can happen as vmbus_process_offer() is not anyhow serialized with
>> vmbus_remove().
>
> How about vmbus_onoffer_rescind()?
> It's not serialized with vmbus_remove() either, so I think there is an issue too?
>
> I remember when 'rmmod hv_netvsc', we get a rescind-offer message for
> each subchannel.
>
True, I think we have a race with rescind messages as well, we just
never saw crashes for some reason. I'll think how we can make the fix
more general.
>> As an example we can try calling subchannel create
>> callback when the module is already unloaded.
>> The following crash was observed while keeping loading/unloading hv_netvsc
>> module on 64-CPU guest:
>>
>> hv_netvsc vmbus_14: net device safe to remove
>> BUG: unable to handle kernel paging request at 000000000000a118
>> IP: [<ffffffffa01c1b21>] netvsc_sc_open+0x31/0xb0 [hv_netvsc]
>> PGD 1f3946a067 PUD 1f38a5f067 PMD 0
>> Oops: 0000 [#1] SMP
>> ...
>> Call Trace:
>> [<ffffffffa0055cd7>] vmbus_onoffer+0x477/0x530 [hv_vmbus]
>> [<ffffffff81092e1f>] ? move_linked_works+0x5f/0x80
>> [<ffffffffa0055fd3>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
>> [<ffffffffa0052f81>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
>> [<ffffffff81095cde>] process_one_work+0x18e/0x4e0
>> ...
>>
>> The issue cannot be solved by just resetting sc_creation_callback on
>> driver removal as while we search for the parent channel with channel_lock
>> held we release it after the channel was found and it can disapper beneath
>> our feet while we're still in vmbus_process_offer();
>>
>> Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure
>> no new subchannels are created after we started the removal procedure.
>> Check its state with mutex_trylock in vmbus_process_offer().
>
> In my 8-CPU VM, I can very easily reproduce the panic by
> 1. running
> while ((1)); do modprobe -r hv_netvsc; modprobe hv_netvsc; sleep 10; done.
>
> and
> 2. in vmbus_onoffer_rescind(), we sleep 3s after a subchannel is added into
> the primary channel's sc_list (and before the sc_creation_callback is invoked):
> (I added line 275)
>
> 262 if (!fnew) {
> 263 /*
> 264 * Check to see if this is a sub-channel.
> 265 */
> 266 if (newchannel->offermsg.offer.sub_channel_index != 0) {
> 267 /*
> 268 * Process the sub-channel.
> 269 */
> 270 newchannel->primary_channel = channel;
> 271 spin_lock_irqsave(&channel->lock, flags);
> 272 list_add_tail(&newchannel->sc_list, &channel->sc_list);
> 273 channel->num_sc++;
> 274 spin_unlock_irqrestore(&channel->lock, flags);
> 275 ssleep(3);
> 276 } else
> 277 goto err_free_chan;
> 278 }
It is possible to see crashes even without such intrumentation, move
CPUs and slower host will do the job.
Thanks,
--
Vitaly
next prev parent reply other threads:[~2015-07-14 16:02 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-13 12:18 [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown Vitaly Kuznetsov
2015-07-13 15:10 ` KY Srinivasan
2015-07-13 15:28 ` Vitaly Kuznetsov
2015-07-14 11:41 ` Dexuan Cui
2015-07-14 16:02 ` Vitaly Kuznetsov [this message]
2015-07-16 0:31 ` KY Srinivasan
2015-07-16 13:42 ` KY Srinivasan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lheiybf5.fsf@vitty.brq.redhat.com \
--to=vkuznets@redhat.com \
--cc=decui@microsoft.com \
--cc=devel@linuxdriverproject.org \
--cc=haiyangz@microsoft.com \
--cc=kys@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.