From: Dario Faggioli
Subject: Re: [PATCH 1/3] xen: sched: introduce the 'null' semi-static scheduler
Date: Tue, 21 Mar 2017 09:26:23 +0100
Message-ID: <1490084783.15340.10.camel@citrix.com>
To: Stefano Stabellini
Cc: Jonathan Davies, Julien Grall, George Dunlap, Marcus Granado, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On Mon, 2017-03-20 at 16:21 -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2017, Dario Faggioli wrote:
> > 
> > --- /dev/null
> > +++ b/xen/common/sched_null.c
> > +/*
> > + * Virtual CPU
> > + */
> > +struct null_vcpu {
> > +    struct list_head waitq_elem;
> > +    struct null_dom *ndom;
> 
> This field is redundant, given that from struct vcpu you can get
> struct domain, and from struct domain you can get struct null_dom.
> I would remove it.
> 
It's kind of a paradigm, followed by pretty much all the schedulers, so
it's good to stay uniform with that, and it's often quite useful (or it
may be in future)...
I'll have a look, though, at whether it is really important to have it
in this simple scheduler too, and will remove it if it's not worth
keeping.

> > +    struct vcpu *vcpu;
> > +    int pcpu;          /* To what pCPU the vCPU is assigned (-1 if none) */
> 
> Isn't this always the same as struct vcpu->processor?
> If it's only different when the vcpu is waiting in the waitq, then you
> can just remove this field and replace the pcpu == -1 test with
> list_empty(waitq_elem).
> 
I'll think about it. Right now, it's useful for ASSERTing and
consistency checking, which is something I want, at least in this
phase.

It is also useful for reporting to what pCPU a vCPU is currently
assigned, for which you can't trust v->processor. That only happens in
`xl debug-key r' for now, but we may want a less tricky way of
exporting such information in future.

Anyway, as I said, I'll ponder whether I can get rid of it.
> > +static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
> > +{
> > +    struct null_private *prv = null_priv(ops);
> > +    struct null_vcpu *wvc, *nvc = null_vcpu(v);
> > +    unsigned int cpu;
> > +    spinlock_t *lock;
> > +
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    lock = vcpu_schedule_lock_irq(v);
> > +
> > +    cpu = v->processor;
> > +
> > +    /* If v is in waitqueue, just get it out of there and bail */
> > +    if ( unlikely(nvc->pcpu == -1) )
> > +    {
> > +        spin_lock(&prv->waitq_lock);
> > +
> > +        ASSERT(!list_empty(&null_vcpu(v)->waitq_elem));
> > +        list_del_init(&nvc->waitq_elem);
> > +
> > +        spin_unlock(&prv->waitq_lock);
> > +
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * If v is assigned to a pCPU, let's see if there is someone waiting.
> > +     * If yes, we assign it to cpu, in spite of v. If no, we just set
> > +     * cpu free.
> > +     */
> > +
> > +    ASSERT(per_cpu(npc, cpu).vcpu == v);
> > +    ASSERT(!cpumask_test_cpu(cpu, &prv->cpus_free));
> > +
> > +    spin_lock(&prv->waitq_lock);
> > +    wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> > +    if ( wvc )
> > +    {
> > +        vcpu_assign(prv, wvc->vcpu, cpu);
> 
> The vcpu_assign in null_vcpu_insert is protected by the pcpu runq lock,
> while this call is protected by the waitq_lock lock. Is that safe?
> 
vCPU assignment is always protected by the runqueue lock. Both in
null_vcpu_insert() and in here, we take it with:

  lock = vcpu_schedule_lock_irq(v);

at the beginning of the function (left in context, see above).

Taking the waitq_lock here serializes access to the waitqueue
(prv->waitq), i.e., the list_first_entry_or_null() call above, and the
list_del_init() call below.
> > +        list_del_init(&wvc->waitq_elem);
> > +        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> > +    }
> > +    else
> > +    {
> > +        vcpu_deassign(prv, v, cpu);
> > +    }
> > +    spin_unlock(&prv->waitq_lock);
> > +
> > + out:
> > +    vcpu_schedule_unlock_irq(lock, v);
> > +
> > +    SCHED_STAT_CRANK(vcpu_remove);
> > +}
> > +
> > +static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
> > +{
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    if ( unlikely(curr_on_cpu(v->processor) == v) )
> > +    {
> > +        SCHED_STAT_CRANK(vcpu_wake_running);
> > +        return;
> > +    }
> > +
> > +    if ( null_vcpu(v)->pcpu == -1 )
> > +    {
> > + /* Not exactly "on runq", but close enough for reusing the counter */
> > +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
> > + return;
> 
> coding style
> 
Yeah... I need to double check the style of the whole file (whole patch
series?). I mostly wrote this while travelling, with an editor set up
differently from the one I use when at home. I thought I had fixed
everything, but I clearly missed a couple of spots. Sorry.
> > +static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
> > +                              unsigned int new_cpu)
> > +{
> > +    struct null_private *prv = null_priv(ops);
> > +    struct null_vcpu *nvc = null_vcpu(v);
> > +    unsigned int cpu = v->processor;
> > +
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    if ( v->processor == new_cpu )
> > +        return;
> > +
> > +    /*
> > +     * v is either in the waitqueue, or assigned to a pCPU.
> > +     *
> > +     * In the former case, there is nothing to do.
> > +     *
> > +     * In the latter, the pCPU to which it was assigned would become free,
> > +     * and we, therefore, should check whether there is anyone in the
> > +     * waitqueue that can be assigned to it.
> > +     */
> > +    if ( likely(nvc->pcpu != -1) )
> > +    {
> > +        struct null_vcpu *wvc;
> > +
> > +        spin_lock(&prv->waitq_lock);
> > +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> > +        if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(v->domain)) )
> > +        {
> > +            vcpu_assign(prv, wvc->vcpu, cpu);
> > +            list_del_init(&wvc->waitq_elem);
> > +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> > +        }
> > +        else
> > +        {
> > +            vcpu_deassign(prv, v, cpu);
> > +        }
> > +        spin_unlock(&prv->waitq_lock);
> 
> This looks very similar to null_vcpu_remove, maybe you want to refactor
> the code and call a single shared service function.
> 
Yes, maybe I can.

> > +        SCHED_STAT_CRANK(migrate_running);
> > +    }
> > +    else
> > +        SCHED_STAT_CRANK(migrate_on_runq);
> > +
> > +    SCHED_STAT_CRANK(migrated);
> > +
> > +    /*
> > +     * Let's now consider new_cpu, which is where v is being sent.
> > +     * It can be either free, or have a vCPU already assigned to it.
> > +     *
> > +     * In the former case, we should assign v to it, and try to get
> > +     * it to run.
> > +     *
> > +     * In the latter, all we can do is to park v in the waitqueue.
> > +     */
> > +    if ( per_cpu(npc, new_cpu).vcpu == NULL )
> > +    {
> > +        /* We don't know whether v was in the waitqueue. If yes, remove it */
> > +        spin_lock(&prv->waitq_lock);
> > +        list_del_init(&nvc->waitq_elem);
> > +        spin_unlock(&prv->waitq_lock);
> > +
> > +        vcpu_assign(prv, v, new_cpu);
> 
> This vcpu_assign call seems to be unprotected. Should it be within a
> spin_lock'ed area?
> 
The lock is taken by the caller. Check the calls to
SCHED_OP(..., migrate) in xen/common/schedule.c.

That's by design, and it's like that for most functions (with the sole
exceptions of debug dump and insert/remove, IIRC), for all the
schedulers.

> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 223a120..b482037 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -1785,6 +1785,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
> >  
> >   out:
> >      per_cpu(cpupool, cpu) = c;
> > +    /* Trigger a reschedule so the CPU can pick up some work ASAP. */
> > +    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> >  
> >      return 0;
> > }
> 
> Why? This change is not explained in the commit message.
> 
You're right. This may well go into its own patch, actually. I'll see
how I like it better.

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)