From: Dario Faggioli
Subject: Re: [PATCH 1/3] xen: sched: introduce the 'null' semi-static scheduler
Date: Tue, 21 Mar 2017 09:26:23 +0100
Message-ID: <1490084783.15340.10.camel@citrix.com>
To: Stefano Stabellini
Cc: Jonathan Davies, Julien Grall, George Dunlap, Marcus Granado, xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On Mon, 2017-03-20 at 16:21 -0700, Stefano Stabellini wrote:
> On Fri, 17 Mar 2017, Dario Faggioli wrote:
> > 
> > --- /dev/null
> > +++ b/xen/common/sched_null.c
> > +/*
> > + * Virtual CPU
> > + */
> > +struct null_vcpu {
> > +    struct list_head waitq_elem;
> > +    struct null_dom *ndom;
> 
> This field is redundant, given that from struct vcpu you can get
> struct domain, and from struct domain you can get struct null_dom.
> I would remove it.
> 
It's kind of a paradigm, followed by pretty much all the schedulers, so
it's good to stay uniform with that, and it's often quite useful (or it
may be in future)...
I'll have a look, though, at whether it is really important to have it
in this simple scheduler too, and will remove it if it's not worth
keeping.

> > +    struct vcpu *vcpu;
> > +    int pcpu;          /* To what pCPU the vCPU is assigned (-1 if none) */
> 
> Isn't this always the same as struct vcpu->processor?
> If it's only different when the vcpu is waiting in the waitq, then you
> can just remove this field and replace the pcpu == -1 test with
> list_empty(waitq_elem).
> 
I'll think about it. Right now, it's useful for ASSERTing and
consistency checking, which is something I want, at least in this
phase.

It is also useful for reporting to what pCPU a vCPU is currently
assigned, for which you can't trust v->processor. That only happens in
`xl debug-key r' for now, but we may want a less tricky way of
exporting such information in future.

Anyway, as I said, I'll ponder whether I can get rid of it.
> > +static void null_vcpu_remove(const struct scheduler *ops, struct vcpu *v)
> > +{
> > +    struct null_private *prv = null_priv(ops);
> > +    struct null_vcpu *wvc, *nvc = null_vcpu(v);
> > +    unsigned int cpu;
> > +    spinlock_t *lock;
> > +
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    lock = vcpu_schedule_lock_irq(v);
> > +
> > +    cpu = v->processor;
> > +
> > +    /* If v is in waitqueue, just get it out of there and bail */
> > +    if ( unlikely(nvc->pcpu == -1) )
> > +    {
> > +        spin_lock(&prv->waitq_lock);
> > +
> > +        ASSERT(!list_empty(&null_vcpu(v)->waitq_elem));
> > +        list_del_init(&nvc->waitq_elem);
> > +
> > +        spin_unlock(&prv->waitq_lock);
> > +
> > +        goto out;
> > +    }
> > +
> > +    /*
> > +     * If v is assigned to a pCPU, let's see if there is someone waiting.
> > +     * If yes, we assign it to cpu, in spite of v. If no, we just set
> > +     * cpu free.
> > +     */
> > +
> > +    ASSERT(per_cpu(npc, cpu).vcpu == v);
> > +    ASSERT(!cpumask_test_cpu(cpu, &prv->cpus_free));
> > +
> > +    spin_lock(&prv->waitq_lock);
> > +    wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> > +    if ( wvc )
> > +    {
> > +        vcpu_assign(prv, wvc->vcpu, cpu);
> 
> The vcpu_assign in null_vcpu_insert is protected by the pcpu runq lock,
> while this call is protected by the waitq_lock lock. Is that safe?
> 
vCPU assignment is always protected by the runqueue lock. Both in
null_vcpu_insert() and in here, we take it with:

  lock = vcpu_schedule_lock_irq(v);

at the beginning of the function (left in context, see above).

Taking the waitq_lock here serializes access to the waitqueue
(prv->waitq), i.e., the list_first_entry_or_null() call above, and the
list_del_init() call below.
> > +        list_del_init(&wvc->waitq_elem);
> > +        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> > +    }
> > +    else
> > +    {
> > +        vcpu_deassign(prv, v, cpu);
> > +    }
> > +    spin_unlock(&prv->waitq_lock);
> > +
> > + out:
> > +    vcpu_schedule_unlock_irq(lock, v);
> > +
> > +    SCHED_STAT_CRANK(vcpu_remove);
> > +}
> > +
> > +static void null_vcpu_wake(const struct scheduler *ops, struct vcpu *v)
> > +{
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    if ( unlikely(curr_on_cpu(v->processor) == v) )
> > +    {
> > +        SCHED_STAT_CRANK(vcpu_wake_running);
> > +        return;
> > +    }
> > +
> > +    if ( null_vcpu(v)->pcpu == -1 )
> > +    {
> > + /* Not exactly "on runq", but close enough for reusing the counter */
> > +        SCHED_STAT_CRANK(vcpu_wake_onrunq);
> > + return;
> 
> coding style
> 
Yeah... I need to double check the style of the whole file (whole patch
series?). I mostly wrote this while travelling, with an editor set up
differently from the one I use when at home. I thought I had fixed
everything, but I clearly missed a couple of spots. Sorry.
> > +static void null_vcpu_migrate(const struct scheduler *ops, struct vcpu *v,
> > +                              unsigned int new_cpu)
> > +{
> > +    struct null_private *prv = null_priv(ops);
> > +    struct null_vcpu *nvc = null_vcpu(v);
> > +    unsigned int cpu = v->processor;
> > +
> > +    ASSERT(!is_idle_vcpu(v));
> > +
> > +    if ( v->processor == new_cpu )
> > +        return;
> > +
> > +    /*
> > +     * v is either in the waitqueue, or assigned to a pCPU.
> > +     *
> > +     * In the former case, there is nothing to do.
> > +     *
> > +     * In the latter, the pCPU to which it was assigned would become free,
> > +     * and we, therefore, should check whether there is anyone in the
> > +     * waitqueue that can be assigned to it.
> > +     */
> > +    if ( likely(nvc->pcpu != -1) )
> > +    {
> > +        struct null_vcpu *wvc;
> > +
> > +        spin_lock(&prv->waitq_lock);
> > +        wvc = list_first_entry_or_null(&prv->waitq, struct null_vcpu, waitq_elem);
> > +        if ( wvc && cpumask_test_cpu(cpu, cpupool_domain_cpumask(v->domain)) )
> > +        {
> > +            vcpu_assign(prv, wvc->vcpu, cpu);
> > +            list_del_init(&wvc->waitq_elem);
> > +            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> > +        }
> > +        else
> > +        {
> > +            vcpu_deassign(prv, v, cpu);
> > +        }
> > +        spin_unlock(&prv->waitq_lock);
> 
> This looks very similar to null_vcpu_remove, maybe you want to refactor
> the code and call a single shared service function.
> 
Yes, maybe I can.

> > +        SCHED_STAT_CRANK(migrate_running);
> > +    }
> > +    else
> > +        SCHED_STAT_CRANK(migrate_on_runq);
> > +
> > +    SCHED_STAT_CRANK(migrated);
> > +
> > +    /*
> > +     * Let's now consider new_cpu, which is where v is being sent.
> > +     * It can be either free, or have a vCPU already assigned to it.
> > +     *
> > +     * In the former case, we should assign v to it, and try to get
> > +     * it to run.
> > +     *
> > +     * In the latter, all we can do is to park v in the waitqueue.
> > +     */
> > +    if ( per_cpu(npc, new_cpu).vcpu == NULL )
> > +    {
> > +        /* We don't know whether v was in the waitqueue. If yes, remove it */
> > +        spin_lock(&prv->waitq_lock);
> > +        list_del_init(&nvc->waitq_elem);
> > +        spin_unlock(&prv->waitq_lock);
> > +
> > +        vcpu_assign(prv, v, new_cpu);
> 
> This vcpu_assign call seems to be unprotected. Should it be within a
> spin_lock'ed area?
> 
The lock is taken by the caller. Check the calls to
SCHED_OP(..., migrate) in xen/common/schedule.c.

That's by design, and it's like that for most functions (with the sole
exceptions of debug dump and insert/remove, IIRC), for all the
schedulers.

> > diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> > index 223a120..b482037 100644
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> > @@ -1785,6 +1785,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
> >  
> >   out:
> >      per_cpu(cpupool, cpu) = c;
> > +    /* Trigger a reschedule so the CPU can pick up some work ASAP. */
> > +    cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> >  
> >      return 0;
> > }
> 
> Why? This change is not explained in the commit message.
> 
You're right. This may well go into its own patch, actually. I'll see
how I like it better.

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)