From: Chong Li
Subject: Re: [PATCH v1 1/4] xen: enabling XL to set per-VCPU parameters of a domain for RTDS scheduler
Date: Thu, 14 May 2015 17:27:22 -0500
In-Reply-To: <55506EF90200007800078B4B@mail.emea.novell.com>
To: Jan Beulich
Cc: Chong Li, Wei Liu, Sisu Xi, "george.dunlap", "dario.faggioli", xen-devel, Meng Xu, Dagaen Golomb
List-Id: xen-devel@lists.xenproject.org

On Mon, May 11, 2015 at 1:57 AM, Jan Beulich wrote:
> >>> On 11.05.15 at 00:04, wrote:
> > On Fri, May 8, 2015 at 2:49 AM, Jan Beulich wrote:
> >> >>> On 07.05.15 at 19:05, wrote:
> >> > @@ -1110,6 +1113,67 @@ rt_dom_cntl(
> >> >          }
> >> >          spin_unlock_irqrestore(&prv->lock, flags);
> >> >          break;
> >> > +    case XEN_DOMCTL_SCHEDOP_getvcpuinfo:
> >> > +        op->u.rtds.nr_vcpus = 0;
> >> > +        spin_lock_irqsave(&prv->lock, flags);
> >> > +        list_for_each( iter, &sdom->vcpu )
> >> > +            vcpu_index++;
> >> > +        spin_unlock_irqrestore(&prv->lock, flags);
> >> > +        op->u.rtds.nr_vcpus = vcpu_index;
> >>
> >> Does dropping of the lock here and re-acquiring it below really work
> >> race free?
> >>
> >
> > Here, the lock is used in the same way as the ones in the two cases
> > above (XEN_DOMCTL_SCHEDOP_get/putinfo).
> > So I think if race freedom
> > is guaranteed in those two cases, the lock in this case works race free
> > as well.
>
> No - the difference is that in the {get,put}info cases it is being
> acquired just once each.
>

I see. I changed it based on Dario's suggestions.

> >> > +        vcpu_index = 0;
> >> > +        spin_lock_irqsave(&prv->lock, flags);
> >> > +        list_for_each( iter, &sdom->vcpu )
> >> > +        {
> >> > +            struct rt_vcpu *svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> >> > +
> >> > +            local_sched[vcpu_index].budget = svc->budget / MICROSECS(1);
> >> > +            local_sched[vcpu_index].period = svc->period / MICROSECS(1);
> >> > +            local_sched[vcpu_index].index = vcpu_index;
> >>
> >> What use is this index to the caller? I think you rather want to tell it
> >> the vCPU number. That's especially also taking the use case of a
> >> get/set pair into account - unless you tell me that these indexes can
> >> never change, the indexes passed back into the set operation would
> >> risk to have become stale by the time the hypervisor processes the
> >> request.
> >>
> >
> > I don't quite understand what "stale" means here. The array here
> > (local_sched[]) and the array in libxc that local_sched[] is copied to
> > are both used for this get operation only. When users set per-vcpu
> > parameters, there are also dedicated arrays for that set operation.
>
> Just clarify this for me (and maybe yourself): Is the vCPU number
> <-> vcpu_index mapping invariable for the lifetime of a domain?
> If it isn't, the vCPU for a particular vcpu_index during a "get"
> may be different from that for the same vcpu_index during a
> subsequent "set".
>

Here the vcpu_index means the vcpu_id. I'll use svc->vcpu.vcpu_id instead of
the vcpu_index in the next version.
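To make the two issues above concrete (the count/copy race, and positional indexes going stale), here is a small user-space sketch, not the actual Xen code: a mock vcpu list is snapshotted under a single lock acquisition, so the count and the copy are guaranteed to describe the same list, and each entry is tagged with its stable vcpu_id rather than its list position. All names (vcpu_model, snapshot_vcpus, and so on) are hypothetical, and a pthread mutex stands in for the hypervisor's spinlock.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Simplified stand-ins for the hypervisor structures (hypothetical names). */
struct vcpu_model {
    int vcpu_id;             /* stable identifier, unlike a list position */
    long period_us;
    long budget_us;
    struct vcpu_model *next;
};

struct vcpu_info {
    int vcpu_id;
    long period_us;
    long budget_us;
};

/*
 * Snapshot all vcpus under a single lock acquisition, so the count and
 * the copy cannot disagree about the list contents.  Each entry carries
 * the stable vcpu_id instead of a positional index.
 * Returns the number of entries written, or -1 if the buffer is too small.
 */
static int snapshot_vcpus(pthread_mutex_t *lock, struct vcpu_model *head,
                          struct vcpu_info *out, int out_len)
{
    int n = 0;

    pthread_mutex_lock(lock);
    for ( struct vcpu_model *v = head; v != NULL; v = v->next )
    {
        if ( n >= out_len )
        {
            pthread_mutex_unlock(lock);
            return -1;
        }
        out[n].vcpu_id   = v->vcpu_id;
        out[n].period_us = v->period_us;
        out[n].budget_us = v->budget_us;
        n++;
    }
    pthread_mutex_unlock(lock);

    return n;
}
```

A caller would size the output buffer from nr_vcpus, call snapshot_vcpus() once, and hand the vcpu_id-tagged entries back to the toolstack, where they remain meaningful even if the list order later changes.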
> >> > +        if( local_sched == NULL )
> >> > +        {
> >> > +            return -ENOMEM;
> >> > +        }
> >> > +        copy_from_guest(local_sched, op->u.rtds.vcpus, op->u.rtds.nr_vcpus);
> >> > +
> >> > +        for( i = 0; i < op->u.rtds.nr_vcpus; i++ )
> >> > +        {
> >> > +            vcpu_index = 0;
> >> > +            spin_lock_irqsave(&prv->lock, flags);
> >> > +            list_for_each( iter, &sdom->vcpu )
> >> > +            {
> >> > +                struct rt_vcpu *svc = list_entry(iter, struct rt_vcpu, sdom_elem);
> >> > +                if ( local_sched[i].index == vcpu_index )
> >> > +                {
> >> > +                    if ( local_sched[i].period <= 0 || local_sched[i].budget <= 0 )
> >> > +                        return -EINVAL;
> >> > +
> >> > +                    svc->period = MICROSECS(local_sched[i].period);
> >> > +                    svc->budget = MICROSECS(local_sched[i].budget);
> >> > +                    break;
> >> > +                }
> >> > +                vcpu_index++;
> >> > +            }
> >> > +            spin_unlock_irqrestore(&prv->lock, flags);
> >> > +        }
> >>
> >> Considering a maximum size guest, these two nested loops could
> >> require a couple of million iterations. That's too much without any
> >> preemption checks in the middle.
> >>
> >
> > The section protected by the lock is only the "list_for_each" loop, whose
> > running time is limited by the number of vcpus of a domain (32 at most).
>
> Since when is 32 the limit on the number of vCPU-s in a domain?
>

Based on Dario's suggestion, I'll use vcpu_id to locate the vcpu to set, which
costs much less time.

> > If this does cause problems, I think adding a "hypercall_preempt_check()"
> > in the outer "for" loop may help. Is that right?
>
> Yes.
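As a sketch of that suggested fix: if vcpu_ids densely index an array (as d->vcpu[] does in Xen), each request can be applied in O(1), eliminating the nested list walk, and a preemption hook can be polled once per outer iteration. This is a hypothetical user-space model, not the real rt_dom_cntl() code; the names and the hook are stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the "set" path; not the Xen code. */
struct vcpu_params { long period_us; long budget_us; };

struct domain_model {
    int nr_vcpus;
    struct vcpu_params *vcpus;   /* indexed directly by vcpu_id */
};

struct vcpu_request { int vcpu_id; long period_us; long budget_us; };

/*
 * Apply each request by indexing the vcpu array with the stable vcpu_id:
 * O(1) per request instead of walking the whole vcpu list, so the nested
 * loop (and its multi-million-iteration worst case) disappears.  A real
 * hypervisor version would call hypercall_preempt_check() in this outer
 * loop; here a caller-supplied hook stands in for it.
 */
static int set_vcpu_params(struct domain_model *d,
                           const struct vcpu_request *req, int nr_req,
                           int (*preempt_check)(void))
{
    for ( int i = 0; i < nr_req; i++ )
    {
        if ( preempt_check && preempt_check() )
            return -2;   /* caller would arrange a hypercall continuation */
        if ( req[i].vcpu_id < 0 || req[i].vcpu_id >= d->nr_vcpus )
            return -1;   /* -EINVAL in the real code */
        if ( req[i].period_us <= 0 || req[i].budget_us <= 0 )
            return -1;
        d->vcpus[req[i].vcpu_id].period_us = req[i].period_us;
        d->vcpus[req[i].vcpu_id].budget_us = req[i].budget_us;
    }
    return 0;
}
```

Because only one array slot is touched per request, the lock (omitted here) would also only need to cover each O(1) update rather than a full list traversal.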
> >> > --- a/xen/common/schedule.c
> >> > +++ b/xen/common/schedule.c
> >> > @@ -1093,7 +1093,9 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op)
> >> >
> >> >      if ( (op->sched_id != DOM2OP(d)->sched_id) ||
> >> >          ((op->cmd != XEN_DOMCTL_SCHEDOP_putinfo) &&
> >> > -         (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo)) )
> >> > +         (op->cmd != XEN_DOMCTL_SCHEDOP_getinfo) &&
> >> > +         (op->cmd != XEN_DOMCTL_SCHEDOP_putvcpuinfo) &&
> >> > +         (op->cmd != XEN_DOMCTL_SCHEDOP_getvcpuinfo)) )
> >>
> >> Imo this should become a switch now.
> >>
> >
> > Do you mean "switch ( op->cmd )"? I'm afraid that would make it look more
> > complicated.
>
> This may be a matter of taste to a certain degree, but I personally
> don't think a series of four almost identical comparisons reads any
> better than its switch() replacement. But it being a style issue, the
> ultimate decision is with George as the maintainer anyway.
>
> Jan

--
Chong Li
Department of Computer Science and Engineering
Washington University in St. Louis
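For illustration, the switch() form Jan suggests might look like the following sketch: one case label per accepted command, everything else rejected. The enum values here are local stand-ins, not the real XEN_DOMCTL_SCHEDOP_* constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local copies of the DOMCTL scheduler op codes. */
enum sched_op {
    SCHEDOP_putinfo,
    SCHEDOP_getinfo,
    SCHEDOP_putvcpuinfo,
    SCHEDOP_getvcpuinfo,
    SCHEDOP_other,
};

/*
 * The four-way if-chain rewritten as a switch: accepted commands fall
 * through to a single "true", and any other value hits the default.
 * Adding a fifth command later means adding one case label, not a
 * fourth "&& (op->cmd != ...)" clause.
 */
static bool sched_op_valid(enum sched_op cmd)
{
    switch ( cmd )
    {
    case SCHEDOP_putinfo:
    case SCHEDOP_getinfo:
    case SCHEDOP_putvcpuinfo:
    case SCHEDOP_getvcpuinfo:
        return true;
    default:
        return false;
    }
}
```

In sched_adjust() the sched_id check would stay as-is; only the command validation would move into the switch (or an equivalent helper like this one).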