From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [PATCH 2/4] xen: x86 / cpupool: clear the proper
 cpu_valid bit on pCPU teardown
Date: Thu, 25 Jun 2015 17:04:37 +0200
Message-ID: <1435244677.25170.169.camel@citrix.com>
References: <20150625103457.3353.39292.stgit@Solace.station>
	<20150625121520.3353.30808.stgit@Solace.station>
	<558C0E11.6050009@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3651496323423851855=="
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta14.messagelabs.com ([193.109.254.103])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=61120bd37=dario.faggioli@citrix.com>)
	id 1Z88j9-0003bL-L7
	for xen-devel@lists.xenproject.org; Thu, 25 Jun 2015 15:06:07 +0000
In-Reply-To: <558C0E11.6050009@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <jgross@suse.com>, xen-devel@lists.xenproject.org, Jan Beulich <JBeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org

--===============3651496323423851855==
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="=-w+C9CCMdA96yjhVeiaAa"

--=-w+C9CCMdA96yjhVeiaAa
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Thu, 2015-06-25 at 15:20 +0100, Andrew Cooper wrote:
> On 25/06/15 13:15, Dario Faggioli wrote:
> > In fact, if a pCPU belonging to some other pool than
> > cpupool0 goes down, we want to clear the relevant bit
> > from its actual pool, rather than always from cpupool0.
>
> This sentence is a little hard to parse.
>
> I presume you mean "use the correct cpupool's valid mask, rather than
> cpupool0's".
>
Yes, that's a better way to say what I meant.

> > # xl cpupool-cpu-remove Pool-0 8-15
> > # xl cpupool-create name=\"Pool-1\"
> > # xl cpupool-cpu-add Pool-1 8-15
> > --> suspend
> > --> resume
> > (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Tainted:    C ]----
> > (XEN) CPU:    8
> > (XEN) RIP:    e008:[<ffff82d080123078>] csched_schedule+0x4be/0xb97
> > (XEN) RFLAGS: 0000000000010087   CONTEXT: hypervisor
> > (XEN) rax: 80007d2f7fccb780   rbx: 0000000000000009   rcx: 0000000000000000
> > (XEN) rdx: ffff82d08031ed40   rsi: ffff82d080334980   rdi: 0000000000000000
> > (XEN) rbp: ffff83010000fe20   rsp: ffff83010000fd40   r8:  0000000000000004
> > (XEN) r9:  0000ffff0000ffff   r10: 00ff00ff00ff00ff   r11: 0f0f0f0f0f0f0f0f
> > (XEN) r12: ffff8303191ea870   r13: ffff8303226aadf0   r14: 0000000000000009
> > (XEN) r15: 0000000000000008   cr0: 000000008005003b   cr4: 00000000000026f0
> > (XEN) cr3: 00000000dba9d000   cr2: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) ... ... ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d080123078>] csched_schedule+0x4be/0xb97
> > (XEN)    [<ffff82d08012c732>] schedule+0x12a/0x63c
> > (XEN)    [<ffff82d08012f8c8>] __do_softirq+0x82/0x8d
> > (XEN)    [<ffff82d08012f920>] do_softirq+0x13/0x15
> > (XEN)    [<ffff82d080164791>] idle_loop+0x5b/0x6b
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 8:
> > (XEN) GENERAL PROTECTION FAULT
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
>
> What is the actual cause of the #GP fault?  There are no obviously
> poisoned registers.
>
IIRC, CPU 8 has just been brought up and is scheduling. No other CPU
from Pool-1 is online yet. We are on CPU 8, in csched_load_balance(),
more specifically here:

    ...
    BUG_ON( cpu != snext->vcpu->processor );
    online = cpupool_scheduler_cpumask(per_cpu(cpupool, cpu));
    ...
    for_each_csched_balance_step( bstep )
    {
        /*
         * We peek at the non-idling CPUs in a node-wise fashion. In fact,
         * it is more likely that we find some affine work on our same
         * node, not to mention that migrating vcpus within the same node
         * could well be expected to be cheaper than across-nodes (memory
         * stays local, there might be some node-wide cache[s], etc.).
         */
        peer_node = node;
        do
        {
            /* Find out what the !idle are in this node */
            cpumask_andnot(&workers, online, prv->idlers);
            cpumask_and(&workers, &workers, &node_to_cpumask(peer_node));
            __cpumask_clear_cpu(cpu, &workers);

            peer_cpu = cpumask_first(&workers);
            if ( peer_cpu >= nr_cpu_ids )
                goto next_node;
            do
            {
                /*
                 * Get ahold of the scheduler lock for this peer CPU.
                 *
                 * Note: We don't spin on this lock but simply try it. Spinning
                 * could cause a deadlock if the peer CPU is also load
                 * balancing and trying to lock this CPU.
                 */
                spinlock_t *lock = pcpu_schedule_trylock(peer_cpu);

Because we did not clear the relevant bits in Pool-1->cpu_valid, online
is still 8-15. Also, since we _did_ clear bits 8-15 in prv->idlers when
tearing those CPUs down during suspend, they all (or all but 8) look
like workers, as far as the code above can tell.

We therefore enter the inner do{}while with, for instance, peer_cpu=9
(that's what I've seen in my debugging). However, we have not yet done
cpu_schedule_up()-->alloc_pdata()-->etc. for that CPU, so we die at (or
shortly after) the end of the code snippet shown above.

> Is it something we should modify to be a BUG or ASSERT?
>
Not sure how/where. Note that other patches in the series fix more
situations similar to this one, and that includes adding ASSERT-s
(although, no, they probably won't cover this case).

I can try to think about it and come up with something, if you think
it's important...

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--=-w+C9CCMdA96yjhVeiaAa
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEABECAAYFAlWMGIUACgkQk4XaBE3IOsS7qQCeNE/G2PJGTC6/3su3jMeNLbCW
YFwAn2zrWLRntOX3Dpq1WhT1e0eyF9wR
=JaKW
-----END PGP SIGNATURE-----

--=-w+C9CCMdA96yjhVeiaAa--


--===============3651496323423851855==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


--===============3651496323423851855==--