All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Dexuan Cui <decui@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>,
	"devel\@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels
Date: Wed, 04 Feb 2015 10:32:26 +0100	[thread overview]
Message-ID: <877fvyrphx.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <F792CF86EFE20D4AB8064279AFBA51C61F33A6AF@HKNPRD3002MB017.064d.mgd.msft.net> (Dexuan Cui's message of "Wed, 4 Feb 2015 08:18:17 +0000")

Dexuan Cui <decui@microsoft.com> writes:

>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
>> Sent: Wednesday, February 4, 2015 1:01 AM
>> To: KY Srinivasan; devel@linuxdriverproject.org
>> Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui; Jason Wang
>> Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for
>> vmbus channels
>> 
>> free_channel() function frees the channel unconditionally so we need to make
>> sure nobody has any link to it. This is not trivial and there are several
>> examples of races we have:
>> 
>> 1) In vmbus_onoffer_rescind() we check for channel existence with
>>    relid2channel() and then use it. This can go wrong if we're in the middle
>>    of channel removal (free_channel() was already called).
>> 
>> 2) In process_chn_event() we check for channel existence with
>>    pcpu_relid2channel() and then use it. This can also go wrong.
>> 
>> 3) vmbus_free_channels() just frees all channels, in case we're in the middle
>>    of vmbus_process_rescind_offer() crash is possible.
>> 
>> The issue can be solved by holding vmbus_connection.channel_lock everywhere,
>> however, it looks like a way to deadlocks and performance degradation. Get/put
>> workflow fits here the best.
>> 
>> Implement vmbus_get_channel()/vmbus_put_channel() pair instead of
>> free_channel().
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/channel_mgmt.c | 45
>> ++++++++++++++++++++++++++++++++++++++-------
>>  drivers/hv/connection.c   |  7 +++++--
>>  drivers/hv/hyperv_vmbus.h |  4 ++++
>>  include/linux/hyperv.h    | 13 +++++++++++++
>>  4 files changed, 60 insertions(+), 9 deletions(-)
>> 
>> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
>> index 36bacc7..eb9ce94 100644
>> --- a/drivers/hv/channel_mgmt.c
>> +++ b/drivers/hv/channel_mgmt.c
>> @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void)
>>  		return NULL;
>> 
>>  	channel->id = atomic_inc_return(&chan_num);
>> +	atomic_set(&channel->count, 1);
>> +
>>  	spin_lock_init(&channel->inbound_lock);
>>  	spin_lock_init(&channel->lock);
>> 
>> @@ -178,19 +180,47 @@ static void release_channel(struct work_struct *work)
>>  }
>> 
>>  /*
>> - * free_channel - Release the resources used by the vmbus channel object
>> + * vmbus_put_channel - Decrease the channel usage counter and release the
>> + * resources when this counter reaches zero.
>>   */
>> -static void free_channel(struct vmbus_channel *channel)
>> +void vmbus_put_channel(struct vmbus_channel *channel)
>>  {
>> +	unsigned long flags;
>> 
>>  	/*
>>  	 * We have to release the channel's workqueue/thread in the vmbus's
>>  	 * workqueue/thread context
>>  	 * ie we can't destroy ourselves.
>>  	 */
>> -	INIT_WORK(&channel->work, release_channel);
>> -	queue_work(vmbus_connection.work_queue, &channel->work);
>> +	spin_lock_irqsave(&channel->lock, flags);
>> +	if (atomic_dec_and_test(&channel->count)) {
>> +		channel->dying = true;
>> +		INIT_WORK(&channel->work, release_channel);
>> +		spin_unlock_irqrestore(&channel->lock, flags);
>> +		queue_work(vmbus_connection.work_queue, &channel->work);
>> +	} else
>> +		spin_unlock_irqrestore(&channel->lock, flags);
>> +}
>> +EXPORT_SYMBOL_GPL(vmbus_put_channel);
>> +
>> +/* vmbus_get_channel - Get additional reference to the channel */
>> +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel)
>> +{
>> +	unsigned long flags;
>> +	struct vmbus_channel *ret = NULL;
>> +
>> +	if (!channel)
>> +		return NULL;
>> +
>> +	spin_lock_irqsave(&channel->lock, flags);
>> +	if (!channel->dying) {
>> +		atomic_inc(&channel->count);
>> +		ret = channel;
>> +	}
>> +	spin_unlock_irqrestore(&channel->lock, flags);
>> +	return ret;
>>  }
>> +EXPORT_SYMBOL_GPL(vmbus_get_channel);
>> 
>>  static void percpu_channel_enq(void *arg)
>>  {
>> @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct
>> work_struct *work)
>>  		list_del(&channel->sc_list);
>>  		spin_unlock_irqrestore(&primary_channel->lock, flags);
>>  	}
>> -	free_channel(channel);
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  void vmbus_free_channels(void)
>> @@ -262,7 +292,7 @@ void vmbus_free_channels(void)
>> 
>>  	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>>  		vmbus_device_unregister(channel->device_obj);
>> -		free_channel(channel);
>> +		vmbus_put_channel(channel);
>>  	}
>>  }
>> 
>> @@ -391,7 +421,7 @@ done_init_rescind:
>>  	spin_unlock_irqrestore(&newchannel->lock, flags);
>>  	return;
>>  err_free_chan:
>> -	free_channel(newchannel);
>> +	vmbus_put_channel(newchannel);
>>  }
>> 
>>  enum {
>> @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct
>> vmbus_channel_message_header *hdr)
>>  		queue_work(channel->controlwq, &channel->work);
>> 
>>  	spin_unlock_irqrestore(&channel->lock, flags);
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  /*
>> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> index c4acd1c..d1ce134 100644
>> --- a/drivers/hv/connection.c
>> +++ b/drivers/hv/connection.c
>> @@ -247,7 +247,8 @@ void vmbus_disconnect(void)
>>   * Map the given relid to the corresponding channel based on the
>>   * per-cpu list of channels that have been affinitized to this CPU.
>>   * This will be used in the channel callback path as we can do this
>> - * mapping in a lock-free fashion.
>> + * mapping in a lock-free fashion. Takes additional reference to the
>> + * channel, all users are supposed to do vmbus_put_channel().
>>   */
>>  static struct vmbus_channel *pcpu_relid2channel(u32 relid)
>>  {
>> @@ -263,7 +264,7 @@ static struct vmbus_channel *pcpu_relid2channel(u32
>> relid)
>>  		}
>>  	}
>> 
>> -	return found_channel;
>> +	return vmbus_get_channel(found_channel);
>>  }
>> 
>>  /*
>> @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid)
>>  			}
>>  		}
>>  	}
>> +	found_channel = vmbus_get_channel(found_channel);
>>  	spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
>> 
>>  	return found_channel;
>> @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid)
>>  		pr_err("no channel callback for relid - %u\n", relid);
>>  	}
>> 
>> +	vmbus_put_channel(channel);
>>  }
>> 
>>  /*
>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>> index b055e53..40d70f0 100644
>> --- a/drivers/hv/hyperv_vmbus.h
>> +++ b/drivers/hv/hyperv_vmbus.h
>> @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device
>> *device_obj);
>>  /* VmbusChildDeviceDestroy( */
>>  /* struct hv_device *); */
>> 
>> +/*
>> + * Get the channel by its relid. Takes additional reference to the channel so
>> + * all users are supposed to do vmbus_put_channel() when they're done.
>> + */
>>  struct vmbus_channel *relid2channel(u32 relid);
>> 
>>  void vmbus_free_channels(void);
>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>> index e73cfeb..c576d2d 100644
>> --- a/include/linux/hyperv.h
>> +++ b/include/linux/hyperv.h
>> @@ -649,6 +649,9 @@ struct vmbus_channel {
>>  	/* Unique channel id */
>>  	int id;
>> 
>> +	/* Active reference count */
>> +	atomic_t count;
>> +
>>  	struct list_head listentry;
>> 
>>  	struct hv_device *device_obj;
>> @@ -666,6 +669,7 @@ struct vmbus_channel {
>>  	u8 monitor_bit;
>> 
>>  	bool rescind; /* got rescind msg */
>> +	bool dying; /* channel is dying */
>> 
>>  	u32 ringbuffer_gpadlhandle;
>> 
>> @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct
>> vmbus_channel *channel,
>> 
>>  extern void vmbus_ontimer(unsigned long data);
>> 
>> +/*
>> + * Decrease reference count for the channel. Frees the channel when its usage
>> + * count reaches zero.
>> + */
>> +extern void vmbus_put_channel(struct vmbus_channel *channel);
>> +
>> +/* Get additional reference to the channel */
>> +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel
>> *channel);
>> +
>>  /* Base driver object */
>>  struct hv_driver {
>>  	const char *name;
>> --
>
> Hi Vitaly,
> Thanks for the patchset! 
> I once tried to do the same work, but just didn't find enough time. :-)
>
> Please see my below questions:

Thanks for your detailed review! I should have added 'RFC' to my series :-)

>
> Since we always get channel->lock when accessing channel->count, I don't
> think the counter has to be of atomic_t?

Agreed

>
> I suggest we add vmbus_get/put_channel() in vmbus_open/close().
> In this way, IMO patch 3/4 and 4/4 would be unnecessary?
>

I was thinking about that but I wasn't sure it is going to be enough for
sub-channels. I'll look again and report.

> In vmbus_exit(), I suspect we should swap the order of the 2 lines:
>         vmbus_free_channels();
>         bus_unregister(&hv_bus);
> I suppose in bus_unregister(), the .remove callbacks of all the drivers are
> invoked, and vmbus_close() is  invoked for every channel.

Makes sense!

>
> BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,):
> due to max_active==1 here, a channel's process_offer() and
> process_rescind_offer() is already serialized.
> I'm not sure if this fact can be used to simplify the logic?

Workers are serialized but e.g. vmbus_onoffer_rescind() is not
serialized with them, e.g. there is no protection against channel
dessapearance after we found it with relid2channel() (e.g. we're on
failure path in vmbus_process_offer()).

Anyway, we have another examples of (theoretically) possible issues and
get/put workflow should simplify the syncronization.

> Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer"
> necessary?

It is as not checking work function is bad anyway as (in theory) we can
run vmbus_process_offer() twice if vmbus_onoffer_rescind() runs before
vmbus_process_offer() was started.

>
> BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue,
> not related to this patch of yours):
>
> vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel -> 
> 	list_for_each_entry(channel, pcpu_head, percpu_list) 
>
> and
>
> vmbus_process_rescind_offer() -> percpu_channel_deq() ->
> 	list_del(&channel->percpu_list)
>
> because the former is running in tasklet and hence can preempt the latter (running
> in process context).  percpu_channel_enq() should have the same
> issue(?)

As I said in my cover letter I don't believe my series solves all issues :-)

>
> We should add local_bh_disable()/enable() accordingly?
> Can you please help to fix this?

Yes, sure, I'll take a look. 

>
> vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have
> a race with vmbus_process_rescind_offer()?
> After vmbus_process_rescind_offer() ->  vmbus_device_unregister(), the count
> can already be the initial 1, later vmbus_free_channels() -> vmbus_put_channel() can
> kfree the channel before vmbus_process_rescind_offer() completely
> finishes?

I was going to rewrite vmbus_free_channels() not only because of
that. E.g. vmbus_get_outgoing_channel() presumes all lists (in both
channel.sc_list and vmbus_connection.chn_list) are valid and we don't
take care of that. Not that I believe vmbus_get_outgoing_channel()
may happen when vmbus_free_channels() is active (and we call
vmbus_free_channels() only if we unload the module)


Thanks for your thorough review, it gave me further inspiration :-)

-- 
  Vitaly

  reply	other threads:[~2015-02-04  9:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-03 17:00 [PATCH 0/4] Drivers: hv: Further protection for the rescind path Vitaly Kuznetsov
2015-02-03 17:00 ` [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels Vitaly Kuznetsov
2015-02-04  8:18   ` Dexuan Cui
2015-02-04  9:32     ` Vitaly Kuznetsov [this message]
2015-02-04  9:54       ` Dexuan Cui
2015-02-04  9:14   ` Jason Wang
2015-02-04  9:33     ` Vitaly Kuznetsov
2015-02-03 17:00 ` [PATCH 2/4] Drivers: hv: vmbus: do not lose rescind offer on failure in vmbus_process_offer() Vitaly Kuznetsov
2015-02-04  7:42   ` Dexuan Cui
2015-02-03 17:00 ` [PATCH 3/4] Drivers: hv: vmbus: protect vmbus_get_outgoing_channel() against channel removal Vitaly Kuznetsov
2015-02-04  7:27   ` Dexuan Cui
2015-02-03 17:00 ` [PATCH 4/4] hyperv: netvsc: improve protection against rescind offer Vitaly Kuznetsov
2015-02-04  7:29   ` Dexuan Cui
2015-02-04 18:26 ` [PATCH 0/4] Drivers: hv: Further protection for the rescind path KY Srinivasan
2015-02-05 10:10   ` Vitaly Kuznetsov
2015-02-05 12:44     ` Dexuan Cui
2015-02-05 22:47       ` KY Srinivasan
2015-02-06 14:53         ` Dexuan Cui
2015-02-07 16:27           ` KY Srinivasan
2015-02-05 15:52     ` KY Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877fvyrphx.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=decui@microsoft.com \
    --cc=devel@linuxdriverproject.org \
    --cc=haiyangz@microsoft.com \
    --cc=jasowang@redhat.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.