All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Parri <parri.andrea@gmail.com>
To: Michael Kelley <mikelley@microsoft.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Wei Liu <wei.liu@kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	Dexuan Cui <decui@microsoft.com>,
	Boqun Feng <boqun.feng@gmail.com>, vkuznets <vkuznets@redhat.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>
Subject: Re: [RFC PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
Date: Fri, 3 Apr 2020 15:41:07 +0200	[thread overview]
Message-ID: <20200403134107.GA25800@andrea> (raw)
In-Reply-To: <DM5PR2101MB104720061795F26D892D5373D7CB0@DM5PR2101MB1047.namprd21.prod.outlook.com>

On Mon, Mar 30, 2020 at 07:49:00PM +0000, Michael Kelley wrote:
> From: Andrea Parri <parri.andrea@gmail.com> Sent: Monday, March 30, 2020 11:55 AM
> > 
> > > > @@ -1721,6 +1721,10 @@ static ssize_t target_cpu_store(struct vmbus_channel
> > *channel,
> > > >  	 * in on a CPU that is different from the channel target_cpu value.
> > > >  	 */
> > > >
> > > > +	if (channel->change_target_cpu_callback)
> > > > +		(*channel->change_target_cpu_callback)(channel,
> > > > +				channel->target_cpu, target_cpu);
> > > > +
> > > >  	channel->target_cpu = target_cpu;
> > > >  	channel->target_vp = hv_cpu_number_to_vp_number(target_cpu);
> > > >  	channel->numa_node = cpu_to_node(target_cpu);
> > >
> > > I think there's an ordering problem here.  The change_target_cpu_callback
> > > will allow storvsc to flush the cache that it is keeping, but there's a window
> > > after the storvsc callback releases the spin lock and before this function
> > > changes channel->target_cpu to the new value.  In that window, the cache
> > > could get refilled based on the old value of channel->target_cpu, which is
> > > exactly what we don't want.  Generally with caches, you have to set the new
> > > value first, then flush the cache, and I think that works in this case.  The
> > > callback function doesn't depend on the value of channel->target_cpu,
> > > and any cache filling that might happen after channel->target_cpu is set
> > > to the new value but before the callback function runs is OK.   But please
> > > double-check my thinking. :-)
> > 
> > Sorry, I don't see the problem.  AFAICT, the "cache" gets refilled based
> > on the values of alloced_cpus and on the current state of the cache but
> > not based on the value of channel->target_cpu.  The callback invocation
> > uses the value of the "old" target_cpu; I think I ended up placing the
> > callback call where it is for not having to introduce a local variable
> > "old_cpu".  ;-)
> >
> 
> You are right.   My comment is bogus.
> 
> > 
> > > > @@ -621,6 +621,63 @@ static inline struct storvsc_device *get_in_stor_device(
> > > >
> > > >  }
> > > >
> > > > +void storvsc_change_target_cpu(struct vmbus_channel *channel, u32 old, u32 new)
> > > > +{
> > > > +	struct storvsc_device *stor_device;
> > > > +	struct vmbus_channel *cur_chn;
> > > > +	bool old_is_alloced = false;
> > > > +	struct hv_device *device;
> > > > +	unsigned long flags;
> > > > +	int cpu;
> > > > +
> > > > +	device = channel->primary_channel ?
> > > > +			channel->primary_channel->device_obj
> > > > +				: channel->device_obj;
> > > > +	stor_device = get_out_stor_device(device);
> > > > +	if (!stor_device)
> > > > +		return;
> > > > +
> > > > +	/* See storvsc_do_io() -> get_og_chn(). */
> > > > +	spin_lock_irqsave(&device->channel->lock, flags);
> > > > +
> > > > +	/*
> > > > +	 * Determines if the storvsc device has other channels assigned to
> > > > +	 * the "old" CPU to update the alloced_cpus mask and the stor_chns
> > > > +	 * array.
> > > > +	 */
> > > > +	if (device->channel != channel && device->channel->target_cpu == old) {
> > > > +		cur_chn = device->channel;
> > > > +		old_is_alloced = true;
> > > > +		goto old_is_alloced;
> > > > +	}
> > > > +	list_for_each_entry(cur_chn, &device->channel->sc_list, sc_list) {
> > > > +		if (cur_chn == channel)
> > > > +			continue;
> > > > +		if (cur_chn->target_cpu == old) {
> > > > +			old_is_alloced = true;
> > > > +			goto old_is_alloced;
> > > > +		}
> > > > +	}
> > > > +
> > > > +old_is_alloced:
> > > > +	if (old_is_alloced)
> > > > +		WRITE_ONCE(stor_device->stor_chns[old], cur_chn);
> > > > +	else
> > > > +		cpumask_clear_cpu(old, &stor_device->alloced_cpus);
> > >
> > > I think target_cpu_store() can get called in parallel on multiple CPUs for different
> > > channels on the same storvsc device, but multiple changes to a single channel are
> > > serialized by higher levels of sysfs.  So this function could run after multiple
> > > channels have been changed, in which case there's not just a single "old" value,
> > > and the above algorithm might not work, especially if channel->target_cpu is
> > > updated before calling this function per my earlier comment.   I can see a
> > > couple of possible ways to deal with this.  One is to put the update of
> > > channel->target_cpu in this function, within the spin lock boundaries so
> > > that the cache flush and target_cpu update are atomic.  Another idea is to
> > > process multiple changes in this function, by building a temp copy of
> > > alloced_cpus by walking the channel list, use XOR to create a cpumask
> > > with changes, and then process all the changes in a loop instead of
> > > just handling a single change as with the current code at the old_is_alloced
> > > label.  But I haven't completely thought through this idea.
> > 
> > Same here: the invocations of target_cpu_store() are serialized on the
> > per-connection channel_mutex...
> 
> Agreed.  My comment is not valid.
> 
> > 
> > 
> > > > @@ -1268,8 +1330,10 @@ static struct vmbus_channel *get_og_chn(struct
> > storvsc_device
> > > > *stor_device,
> > > >  		if (cpumask_test_cpu(tgt_cpu, node_mask))
> > > >  			num_channels++;
> > > >  	}
> > > > -	if (num_channels == 0)
> > > > +	if (num_channels == 0) {
> > > > +		stor_device->stor_chns[q_num] = stor_device->device->channel;
> > >
> > > Is the above added line just fixing a bug in the existing code?  I'm not seeing how
> > > it would derive from the other changes in this patch.
> > 
> > It was rather intended as an optimization:  Each time I/O for a device
> > is initiated on a CPU that have "num_channels == 0" channel, the current
> > code ends up calling get_og_chn() (in the attempt to fill the cache) and
> > returns the device's primary channel.  In the current code, the cost of
> > this operations is basically the cost of parsing alloced_cpus, but with
> > the changes introduced here this also involves acquiring (and releasing)
> > the primary channel's lock.  I should probably put my hands forward and
> > say that I haven't observed any measurable effects due this addition in
> > my experiments; OTOH, caching the returned/"found" value made sense...
> 
> OK.  That's what I thought.  The existing code does not produce an incorrect
> result, but the cache isn't working as intended.  This fixes it.
> 
> > 
> > 
> > > > @@ -1324,7 +1390,10 @@ static int storvsc_do_io(struct hv_device *device,
> > > >  					continue;
> > > >  				if (tgt_cpu == q_num)
> > > >  					continue;
> > > > -				channel = stor_device->stor_chns[tgt_cpu];
> > > > +				channel = READ_ONCE(
> > > > +					stor_device->stor_chns[tgt_cpu]);
> > > > +				if (channel == NULL)
> > > > +					continue;
> > >
> > > The channel == NULL case is new because a cache flush could be happening
> > > in parallel on another CPU.  I'm wondering about the tradeoffs of
> > > continuing in the loop (as you have coded in this patch) vs. a "goto" back to
> > > the top level "if" statement.   With the "continue" you might finish the
> > > loop without finding any matches, and fall through to the next approach.
> > > But it's only a single I/O operation, and if it comes up with a less than
> > > optimal channel choice, it's no big deal.  So I guess it's really a wash.
> > 
> > Yes, I considered both approaches; they both "worked" here.  I was a
> > bit concerned about the number of "possible" gotos (again, mainly a
> > theoretical issue, since I can imagine that the cash flushes will be
> > relatively "rare" events in most cases and, in any case, they happen
> > to be serialized); the "continue" looked like a suitable and simpler
> > approach/compromise, at least for the time being.
> 
> Yes, I'm OK with your patch "as is".  I was just thinking about the
> alternative, and evidently you did too.

Thank you for the confirmation.  I'm wondering: could take this as a
Reviewed-by: for this patch?  ;-)

Thanks,
  Andrea


> 
> > 
> > 
> > >
> > > >  				if (hv_get_avail_to_write_percent(
> > > >  							&channel->outbound)
> > > >  						> ring_avail_percent_lowater) {
> > > > @@ -1350,7 +1419,10 @@ static int storvsc_do_io(struct hv_device *device,
> > > >  			for_each_cpu(tgt_cpu, &stor_device->alloced_cpus) {
> > > >  				if (cpumask_test_cpu(tgt_cpu, node_mask))
> > > >  					continue;
> > > > -				channel = stor_device->stor_chns[tgt_cpu];
> > > > +				channel = READ_ONCE(
> > > > +					stor_device->stor_chns[tgt_cpu]);
> > > > +				if (channel == NULL)
> > > > +					continue;
> > >
> > > Same comment here.
> > 
> > Similarly here.
> 
> Agreed.
> 
> > 
> > Thoughts?
> > 
> > Thanks,
> >   Andrea

      reply	other threads:[~2020-04-03 13:41 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25 22:54 [RFC PATCH 00/11] VMBus channel interrupt reassignment Andrea Parri (Microsoft)
2020-03-25 22:54 ` [RFC PATCH 01/11] Drivers: hv: vmbus: Always handle the VMBus messages on CPU0 Andrea Parri (Microsoft)
2020-03-26 14:05   ` Vitaly Kuznetsov
2020-03-28 18:50     ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 02/11] Drivers: hv: vmbus: Don't bind the offer&rescind works to a specific CPU Andrea Parri (Microsoft)
2020-03-26 14:16   ` Vitaly Kuznetsov
2020-03-26 15:47     ` Andrea Parri
2020-03-26 17:26       ` Vitaly Kuznetsov
2020-03-28 17:08         ` Andrea Parri
2020-03-29  3:43           ` Michael Kelley
2020-03-30 12:24             ` Vitaly Kuznetsov
2020-04-03 12:04               ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 03/11] Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels Andrea Parri (Microsoft)
2020-03-26 14:31   ` Vitaly Kuznetsov
2020-03-26 17:05     ` Andrea Parri
2020-03-26 17:43       ` Vitaly Kuznetsov
2020-03-28 18:21         ` Andrea Parri
2020-03-29  3:49           ` Michael Kelley
2020-03-30 12:45           ` Vitaly Kuznetsov
2020-04-03 13:38             ` Andrea Parri
2020-04-03 14:56               ` Vitaly Kuznetsov
2020-03-25 22:54 ` [RFC PATCH 04/11] hv_netvsc: Disable NAPI before closing the VMBus channel Andrea Parri (Microsoft)
2020-03-26 15:26   ` Stephen Hemminger
2020-03-26 17:55     ` Andrea Parri
2020-03-25 22:54 ` [RFC PATCH 05/11] hv_utils: Always execute the fcopy and vss callbacks in a tasklet Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 06/11] Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 07/11] PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 08/11] Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 09/11] Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug Andrea Parri (Microsoft)
2020-03-25 22:55 ` [RFC PATCH 10/11] Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type Andrea Parri (Microsoft)
2020-03-26 14:46   ` Vitaly Kuznetsov
2020-03-28 18:48     ` Andrea Parri
2020-04-03 14:55       ` Andrea Parri
2020-03-25 22:55 ` [RFC PATCH 11/11] scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned Andrea Parri (Microsoft)
2020-03-30 16:42   ` Michael Kelley
2020-03-30 18:55     ` Andrea Parri
2020-03-30 19:49       ` Michael Kelley
2020-04-03 13:41         ` Andrea Parri [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200403134107.GA25800@andrea \
    --to=parri.andrea@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=jejb@linux.ibm.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=mikelley@microsoft.com \
    --cc=sthemmin@microsoft.com \
    --cc=vkuznets@redhat.com \
    --cc=wei.liu@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.