From mboxrd@z Thu Jan 1 00:00:00 1970
From: Chao Peng
Subject: Re: [PATCH v15 01/11] multicall: add no preemption ability between two calls
Date: Thu, 18 Sep 2014 21:45:42 +0800
Message-ID: <20140918134542.GA6631@pengc-linux>
References: <540F19AF0200007800032AD1@mail.emea.novell.com>
 <20140910013232.GG15872@pengc-linux>
 <54101D29.8010606@citrix.com>
 <54103F01020000780003328F@mail.emea.novell.com>
 <541024AF.8050404@citrix.com>
 <541043330200007800033304@mail.emea.novell.com>
 <54103207.5080004@citrix.com>
 <20140912025543.GI15872@pengc-linux>
 <20140917092237.GA15318@pengc-linux>
 <5419740C02000078000359A9@mail.emea.novell.com>
Reply-To: Chao Peng
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <5419740C02000078000359A9@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich, Andrew Cooper
Cc: keir@xen.org, Ian.Campbell@citrix.com, stefano.stabellini@eu.citrix.com,
 George.Dunlap@eu.citrix.com, Ian.Jackson@eu.citrix.com,
 xen-devel@lists.xen.org, dgdegra@tycho.nsa.gov
List-Id: xen-devel@lists.xenproject.org

On Wed, Sep 17, 2014 at 10:44:12AM +0100, Jan Beulich wrote:
> >>> On 17.09.14 at 11:22, wrote:
> > On Fri, Sep 12, 2014 at 10:55:43AM +0800, Chao Peng wrote:
> >> On Wed, Sep 10, 2014 at 12:12:07PM +0100, Andrew Cooper wrote:
> >> > On 10/09/14 11:25, Jan Beulich wrote:
> >> > >>>> On 10.09.14 at 12:15, wrote:
> >> > >> On 10/09/14 11:07, Jan Beulich wrote:
> >> > >>>>>> On 10.09.14 at 11:43, wrote:
> >> > >>>> Actually, on further thought, using multicalls like this cannot possibly
> >> > >>>> be correct from a functional point of view.
> >> > >>>>
> >> > >>>> Even with the no preempt flag between a wrmsr/rdmsr hypercall pair,
> >> > >>>> there is no guarantee that accesses to remote cpus msrs won't interleave
> >> > >>>> with a different natural access, clobbering the results of the wrmsr.
> >> > >>>>
> >> > >>>> However this is solved, the wrmsr/rdmsr pair *must* be part of the same
> >> > >>>> synchronous thread of execution on the appropriate cpu. You can trust
> >> > >>>> that interrupts won't play with these msrs, but you absolutely can't
> >> > >>>> guarantee that IPI/wrmsr/IPI/rdmsr will work.
> >> > >>> Not sure I follow, particularly in the context of the white listing of
> >> > >>> MSRs permitted here (which ought to not include anything the
> >> > >>> hypervisor needs control over).
> >> > >> Consider two dom0 vcpus both using this new multicall mechanism to read
> >> > >> QoS information for different domains, which end up both targeting the
> >> > >> same remote cpu. They will both end up using IPI/wrmsr/IPI/rdmsr, which
> >> > >> may interleave and clobber the first wrmsr.
> >> > > But that situation doesn't result from the multicall use here - it would
> >> > > equally be the case for an inherently batchable hypercall.
> >> >
> >> > Indeed - I called out multicall because of the current implementation,
> >> > but I should have been more clear.
> >> >
> >> > > To deal with
> >> > > that we'd need a wrmsr-then-rdmsr operation, or move the entire
> >> > > execution of the batch onto the target CPU. Since the former would
> >> > > quickly become unwieldy for more complex operations, I think this
> >> > > gets us back to aiming at using continue_hypercall_on_cpu() here.
> >> >
> >> > Which gets us back to the problem that you cannot use
> >> > copy_{to,from}_guest() after continue_hypercall_on_cpu(), due to being
> >> > in the wrong context.
> >> >
> >> >
> >> > I think this requires a step back and rethink.
> >> > I can't offhand think of
> >> > any combination of existing bits of infrastructure which will allow this
> >> > to work correctly, which means something new needs designing.
> >> >
> >> How about this:
> >>
> >> 1) Still do the batch in do_platform_op() but add a iteration field in
> >> the interface structure.
> >>
> >> 2) Still use on_selected_cpus() but group the adjacent resource_ops
> >> which have a same cpu and NO_PREEMPT set into one and do it as a whole
> >> in the new cpu context.
> >>
> > Any suggestion for this?
>
> 1 is ugly (contradicting everything we do elsewhere), but would be a
> last resort option.
>
> 2 would be perhaps an option if small, non-preemptible batches
> would be handled in do_platform_op() while preemptible larger
> groups then ought to use the multicall interface.
>
> Option 3 would be to fiddle with the current vCPU's affinity before
> invoking a continuation (perhaps already on the first iteration to
> get onto the needed pCPU).
>

Thanks Jan.

On further thought, I think we may be over-designing this. Why not make
it simple and also scalable?

The answer is also simple: do_platform_op() is always non-preemptible.
It can accept one operation or a small batch of operations, but it
guarantees that all of them run without preemption (e.g. it never calls
hypercall_create_continuation()). It is the minimum unit of
non-preemptible operation.

If the caller (a userspace tool) wants to make preemptible batch calls,
then the multicall mechanism can be employed. We don't need to add a
NO_PREEMPT ability to multicall; just keep it preemptible.

This is almost option 2 above.

Chao
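
To make the proposal concrete, here is a rough, self-contained sketch in
plain C. It is not Xen code and not the actual interface: struct res_op,
run_batch(), RES_BATCH_MAX and the fake_msr[] array are all hypothetical
names made up for illustration, and the "target CPU" is only simulated by
an ordinary function call. The point it tries to show is that a small batch
is validated and then executed as one unit with no preemption point in
between, while anything larger is refused so the caller can fall back to a
preemptible path such as multicall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; this is not the Xen interface. */
enum res_op_kind { RES_MSR_WRITE, RES_MSR_READ };

struct res_op {
    enum res_op_kind kind;
    uint32_t idx;   /* index into the simulated register file */
    uint64_t val;   /* value to write, or slot for the value read */
};

#define RES_BATCH_MAX  8    /* assumed cap that keeps a batch "small" */
#define FAKE_MSR_COUNT 16

/* Stand-in for the MSRs of the target CPU. */
static uint64_t fake_msr[FAKE_MSR_COUNT];

/*
 * Run the whole batch as one unit. In the design discussed above this
 * body would execute on the target CPU; here it is an ordinary call.
 * There is no preemption point inside the loops, so in this simulation
 * a write followed by a read is never split by another batch.
 */
int run_batch(struct res_op *ops, size_t nr)
{
    assert(ops != NULL);

    /* Oversized batches are refused; the caller would split them up
     * and submit the pieces through a preemptible path instead. */
    if (nr > RES_BATCH_MAX)
        return -1;

    /* Validate everything first so that a bad entry cannot leave the
     * batch half-applied. */
    for (size_t i = 0; i < nr; i++)
        if (ops[i].idx >= FAKE_MSR_COUNT)
            return -1;

    for (size_t i = 0; i < nr; i++) {
        if (ops[i].kind == RES_MSR_WRITE)
            fake_msr[ops[i].idx] = ops[i].val;
        else
            ops[i].val = fake_msr[ops[i].idx];
    }
    return 0;
}
```

Calling run_batch() with a write to index 3 followed by a read of index 3
returns 0 and leaves the written value in the read entry's val field. A
batch longer than RES_BATCH_MAX returns -1 without touching any register.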