All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Parri <parri.andrea@gmail.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: linux-kernel@vger.kernel.org,
	"K . Y . Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>,
	linux-hyperv@vger.kernel.org,
	Michael Kelley <mikelley@microsoft.com>,
	Dexuan Cui <decui@microsoft.com>,
	Boqun Feng <boqun.feng@gmail.com>
Subject: Re: [RFC PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels
Date: Thu, 26 Mar 2020 18:05:18 +0100	[thread overview]
Message-ID: <20200326170518.GA14314@andrea> (raw)
In-Reply-To: <87y2rn4287.fsf@vitty.brq.redhat.com>

On Thu, Mar 26, 2020 at 03:31:20PM +0100, Vitaly Kuznetsov wrote:
> "Andrea Parri (Microsoft)" <parri.andrea@gmail.com> writes:
> 
> > When Hyper-V sends an interrupt to the guest, the guest has to figure
> > out which channel the interrupt is associated with.  Hyper-V sets a bit
> > in a memory page that is shared with the guest, indicating a particular
> > "relid" that the interrupt is associated with.  The current Linux code
> > then uses a set of per-CPU linked lists to map a given "relid" to a
> > pointer to a channel structure.
> >
> > This design introduces a synchronization problem if the CPU that Hyper-V
> > will interrupt for a certain channel is changed.  If the interrupt comes
> > on the "old CPU" and the channel was already moved to the per-CPU list
> > of the "new CPU", then the relid -> channel mapping will fail and the
> > interrupt is dropped.  Similarly, if the interrupt comes on the new CPU
> > but the channel was not moved to the per-CPU list of the new CPU, then
> > the mapping will fail and the interrupt is dropped.
> >
> > Relids are integers ranging from 0 to 2047.  The mapping from relids to
> > channel structures can be done by setting up an array with 2048 entries,
> > each entry being a pointer to a channel structure (hence total size ~16K
> > bytes, which is not a problem).  The array is global, so there are no
> > per-CPU linked lists to update.   The array can be searched and updated
> > by simply loading and storing the array at the specified index.  With no
> > per-CPU data structures, the above mentioned synchronization problem is
> > avoided and the relid2channel() function gets simpler.
> >
> > Suggested-by: Michael Kelley <mikelley@microsoft.com>
> > Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
> > ---
> >  drivers/hv/channel_mgmt.c | 158 ++++++++++++++++++++++----------------
> >  drivers/hv/connection.c   |  38 +++------
> >  drivers/hv/hv.c           |   2 -
> >  drivers/hv/hyperv_vmbus.h |  14 ++--
> >  drivers/hv/vmbus_drv.c    |  48 +++++++-----
> >  include/linux/hyperv.h    |   5 --
> >  6 files changed, 139 insertions(+), 126 deletions(-)
> >
> > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> > index 1191f3d76d111..9b1449c839575 100644
> > --- a/drivers/hv/channel_mgmt.c
> > +++ b/drivers/hv/channel_mgmt.c
> > @@ -319,7 +319,6 @@ static struct vmbus_channel *alloc_channel(void)
> >  	init_completion(&channel->rescind_event);
> >  
> >  	INIT_LIST_HEAD(&channel->sc_list);
> > -	INIT_LIST_HEAD(&channel->percpu_list);
> >  
> >  	tasklet_init(&channel->callback_event,
> >  		     vmbus_on_event, (unsigned long)channel);
> > @@ -340,23 +339,28 @@ static void free_channel(struct vmbus_channel *channel)
> >  	kobject_put(&channel->kobj);
> >  }
> >  
> > -static void percpu_channel_enq(void *arg)
> > +void vmbus_channel_map_relid(struct vmbus_channel *channel)
> >  {
> > -	struct vmbus_channel *channel = arg;
> > -	struct hv_per_cpu_context *hv_cpu
> > -		= this_cpu_ptr(hv_context.cpu_context);
> > -
> > -	list_add_tail_rcu(&channel->percpu_list, &hv_cpu->chan_list);
> > +	if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
> > +		return;
> > +	/*
> > +	 * Pairs with the READ_ONCE() in vmbus_chan_sched().  Guarantees
> > +	 * that vmbus_chan_sched() will find up-to-date data.
> > +	 */
> > +	smp_store_release(
> > +		&vmbus_connection.channels[channel->offermsg.child_relid],
> > +		channel);
> >  }
> >  
> > -static void percpu_channel_deq(void *arg)
> > +void vmbus_channel_unmap_relid(struct vmbus_channel *channel)
> >  {
> > -	struct vmbus_channel *channel = arg;
> > -
> > -	list_del_rcu(&channel->percpu_list);
> > +	if (WARN_ON(channel->offermsg.child_relid >= MAX_CHANNEL_RELIDS))
> > +		return;
> > +	WRITE_ONCE(
> > +		vmbus_connection.channels[channel->offermsg.child_relid],
> > +		NULL);
> 
> I don't think this smp_store_release()/WRITE_ONCE() fanciness gives you
> anything. Basically, without proper synchronization with a lock there is
> no such constructions which will give you any additional guarantee on
> top of just doing X=1. E.g. smp_store_release() is just 
>   barrier();
>   *p = v;
> if I'm not mistaken. Nobody tells you when *some other CPU* will see the
> update - 'eventually' is your best guess. Here, you're only setting one
> pointer.
> 
> Percpu structures have an advantage: we (almost) never access them from
> different CPUs so just doing updates atomically (and writing 64bit
> pointer on x86_64 is atomic) is OK.
> 
> I haven't looked at all possible scenarios but I'd suggest protecting
> this array with a spinlock (in case we can have simultaneous accesses
> from different CPUs and care about the result, of course).

The smp_store_release()+READ_ONCE() pair should guarantee that any store
to the channel fields performed before (in program order) the "mapping"
of the channel are visible to the CPU which observes that mapping; this
guarantee is expected to hold for all architectures.

Notice that this apporach follows the current/upstream code, cf. the
rcu_assign_pointer() in list_add_tail_rcu() and notice that (both before
and after this series) vmbus_chan_sched() accesses the channel array
without any mutex/lock held.

I'd be inclined to stick to the current code (unless more turns out to
be required).  Thoughts?

Thanks,
  Andrea

  reply	other threads:[~2020-03-26 17:05 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25 22:54 [RFC PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
2020-03-25 22:54 ` [RFC PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0 Andrea Parri (Microsoft)
2020-03-26 14:05   ` Vitaly Kuznetsov
2020-03-28 18:50     ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU Andrea Parri (Microsoft)
2020-03-26 14:16   ` Vitaly Kuznetsov
2020-03-26 15:47     ` Andrea Parri
2020-03-26 17:26       ` Vitaly Kuznetsov
2020-03-28 17:08         ` Andrea Parri
2020-03-29  3:43           ` Michael Kelley
2020-03-30 12:24             ` Vitaly Kuznetsov
2020-04-03 12:04               ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels Andrea Parri (Microsoft)
2020-03-26 14:31   ` Vitaly Kuznetsov
2020-03-26 17:05     ` Andrea Parri [this message]
2020-03-26 17:43       ` Vitaly Kuznetsov
2020-03-28 18:21         ` Andrea Parri
2020-03-29  3:49           ` Michael Kelley
2020-03-30 12:45           ` Vitaly Kuznetsov
2020-04-03 13:38             ` Andrea Parri
2020-04-03 14:56               ` Vitaly Kuznetsov
2020-03-25 22:54 ` [RFC PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel Andrea Parri (Microsoft)
2020-03-26 15:26   ` Stephen Hemminger
2020-03-26 17:55     ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
2020-03-26 14:46   ` Vitaly Kuznetsov
2020-03-28 18:48     ` Andrea Parri
2020-04-03 14:55       ` Andrea Parri
2020-03-25 22:55 ` [RFC PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
2020-03-30 16:42   ` Michael Kelley
2020-03-30 18:55     ` Andrea Parri
2020-03-30 19:49       ` Michael Kelley
2020-04-03 13:41         ` Andrea Parri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200326170518.GA14314@andrea \
    --to=parri.andrea@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=sthemmin@microsoft.com \
    --cc=vkuznets@redhat.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.