All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: "devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown
Date: Tue, 14 Jul 2015 18:02:54 +0200	[thread overview]
Message-ID: <87lheiybf5.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <19f503e369b04d01b79a1bde866a39f8@SIXPR30MB031.064d.mgd.msft.net> (Dexuan Cui's message of "Tue, 14 Jul 2015 11:41:44 +0000")

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov
>> Sent: Monday, July 13, 2015 20:19
>> Subject: [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device
>> shutdown
>> 
>> When a new subchannel offer from host comes during device shutdown (e.g.
>> when a netvsc/storvsc module is unloadedshortly after it was loaded) a
>> crash can happen as vmbus_process_offer() is not anyhow serialized with
>> vmbus_remove().
>
> How about vmbus_onoffer_rescind()? 
> It's not serialized with vmbus_remove() either, so I think there is an issue too?
>
> I remember when 'rmmod hv_netvsc', we get a rescind-offer message for
> each subchannel.
>

True, I think we have a race with rescind messages as well, we just
never saw crashes for some reason. I'll think how we can make the fix
more general.

>> As an example we can try calling subchannel create
>> callback when the module is already unloaded.
>> The following crash was observed while keeping loading/unloading hv_netvsc
>> module on 64-CPU guest:
>> 
>>   hv_netvsc vmbus_14: net device safe to remove
>>   BUG: unable to handle kernel paging request at 000000000000a118
>>   IP: [<ffffffffa01c1b21>] netvsc_sc_open+0x31/0xb0 [hv_netvsc]
>>   PGD 1f3946a067 PUD 1f38a5f067 PMD 0
>>   Oops: 0000 [#1] SMP
>>   ...
>>   Call Trace:
>>   [<ffffffffa0055cd7>] vmbus_onoffer+0x477/0x530 [hv_vmbus]
>>   [<ffffffff81092e1f>] ? move_linked_works+0x5f/0x80
>>   [<ffffffffa0055fd3>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
>>   [<ffffffffa0052f81>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
>>   [<ffffffff81095cde>] process_one_work+0x18e/0x4e0
>>   ...
>> 
>> The issue cannot be solved by just resetting sc_creation_callback on
>> driver removal as while we search for the parent channel with channel_lock
>> held we release it after the channel was found and it can disapper beneath
>> our feet while we're still in vmbus_process_offer();
>> 
>> Introduce new sc_create_lock mutex and take it in vmbus_remove() to ensure
>> no new subchannels are created after we started the removal procedure.
>> Check its state with mutex_trylock in vmbus_process_offer().
>
> In my 8-CPU VM, I can very easily reproduce the panic by
> 1. running
>   while ((1)); do modprobe -r  hv_netvsc; modprobe hv_netvsc; sleep 10; done.
>
> and
> 2. in vmbus_onoffer_rescind(), we sleep 3s after a subchannel is added into
> the primary channel's sc_list (and before the sc_creation_callback is invoked):
> (I added line 275)
>
> 262         if (!fnew) {
> 263                 /*
> 264                  * Check to see if this is a sub-channel.
> 265                  */
> 266                 if (newchannel->offermsg.offer.sub_channel_index != 0) {
> 267                         /*
> 268                          * Process the sub-channel.
> 269                          */
> 270                         newchannel->primary_channel = channel;
> 271                         spin_lock_irqsave(&channel->lock, flags);
> 272                         list_add_tail(&newchannel->sc_list, &channel->sc_list);
> 273                         channel->num_sc++;
> 274                         spin_unlock_irqrestore(&channel->lock, flags);
> 275                         ssleep(3);
> 276                 } else
> 277                         goto err_free_chan;
> 278         }

It is possible to see crashes even without such intrumentation, move
CPUs and slower host will do the job.

Thanks,

-- 
  Vitaly

  reply	other threads:[~2015-07-14 16:02 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-13 12:18 [PATCH] Drivers: hv: vmbus: prevent new subchannel creation on device shutdown Vitaly Kuznetsov
2015-07-13 15:10 ` KY Srinivasan
2015-07-13 15:28   ` Vitaly Kuznetsov
2015-07-14 11:41 ` Dexuan Cui
2015-07-14 16:02   ` Vitaly Kuznetsov [this message]
2015-07-16  0:31     ` KY Srinivasan
2015-07-16 13:42     ` KY Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lheiybf5.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=decui@microsoft.com \
    --cc=devel@linuxdriverproject.org \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.