From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Date: Wed, 18 Mar 2015 16:49:28 +0000 Message-ID: <1426697366.2560.65.camel@citrix.com> References: <20150313181109.GA3179@gmail.com> <55032C85.6090805@citrix.com> <550336EE.60909@eu.citrix.com> <5506DF40020000780006A407@mail.emea.novell.com> <5506D1E1.8050404@eu.citrix.com> <5506E11F020000780006A426@mail.emea.novell.com> <1426616308.32500.98.camel@citrix.com> <55093DD1020000780006B1F2@mail.emea.novell.com> <1426668836.2560.13.camel@citrix.com> <55099940.7080007@eu.citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7876641840621347569==" Return-path: In-Reply-To: <55099940.7080007@eu.citrix.com> Content-Language: en-US List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org Cc: Andrew Cooper , "xen-devel@lists.xen.org" , George Dunlap , "JBeulich@suse.com" , "uma.sharma523@gmail.com" List-Id: xen-devel@lists.xenproject.org --===============7876641840621347569== Content-Language: en-US Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-cPCsNkLIhlDEAmMq1MH/" --=-cPCsNkLIhlDEAmMq1MH/ Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2015-03-18 at 15:26 +0000, George Dunlap wrote: > On 03/18/2015 08:53 AM, Dario Faggioli wrote: > >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *= ops, int cpu) > >>> static void * > >>> csched2_alloc_pdata(const struct scheduler *ops, int cpu) > >>> { > >>> - /* Check to see if the cpu is online yet */ > >>> - /* Note: cpu 0 doesn't get a STARTING callback */ > >>> - if ( cpu =3D=3D 0 || cpu_to_socket(cpu) >=3D 0 ) > >>> + /* > >>> + * Actual initialization is deferred to when the pCPU will be > >>> + * online, via a STARTING callback. The only exception is > >>> + * the boot cpu, which does not get such a notification, and > >>> + * hence needs to be taken care of here. > >>> + */ > >>> + if ( system_state =3D=3D SYS_STATE_boot ) > >>> init_pcpu(ops, cpu); >=20 > I think this will break adding a cpu to a cpupool after boot, won't it? >=20 Oh, yeah, that. Yes, I knew it, and probably should have mentioned it... I assumed it was clear enough that I was asking an opinion about the shown use of system_state, as opposed to using a particular init value of cpu_data.phys_proc_id to the same effect, during the boot process. > Don't we want effectively, "if ( is_boot_cpu() || > is_cpu_topology_set_up() )"? >=20 > is_cpu_topology_set_up() could be "system_state >=3D SYS_STATE_smp_boot" = I > suppose? >=20 That is what I was thinking. Alternatively, I thought we can keep the '<=3D SYS_STATE_boot' part for recognizing that the call comes from within the boot process, and actually do tweak the initialization/assignment logic of phys_proc_id, as it was also mentioned previously in the thread (and as you also mention below, IIUIC), to make it better represent whether topology info are valid or not. > Is "cpu =3D=3D 0" the best test for "is_boot_cpu()", or is there another = test? >=20 It is a valid test for that, right now, AFAICT, but Jan would like for it to stop being one. :-D > So let's step back for a minute, and let me see if I can describe the > situation. >=20 Yeah... thanks for this, it's been extremely helpful, IMO! > Unlike the other schedulers, credit2 needs to know the topology of a > cpus when setting it up, so that we can place it in the proper runqueue. >=20 > Setting up the per-cpu structures would normally happen in alloc_pdata. > This is called in three different ways: >=20 > * During boot, on the boot processer, alloc_pdata is called from > scheduler_init(). >=20 > * During boot, on the secondary processors, alloc_pdata is called from > a CPU_UP_PREPARE callback, which happens just before the cpu is actually > brought up. >=20 > * When switching a cpu to a new cpupool after boot, alloc_pdata is also > called from cpu_schedule_switch(). >=20 > The "normal" place the cpu topology information can be found is in > global the cpu_data[] array, typically accessed by the cpu_to_socket() > or cpu_to_core() macros. >=20 > This topology information is written to cpu_data[] by > smp_store_cpu_info(). For the boot cpu, it happens in > smp_prepare_cpus(); For secondary cpus, it's happens in > start_secondary(), which is the code run on the cpu itself as it's being > brought up. >=20 > Unfortunately, at the moment, both of these places are after the > respective places where the alloc pdata is called for those cpus. > Flattening the entire x86 setup at the moment you'd see: > alloc_pdata for boot cpu > smp_store_cpu_info for boot cpu > for each secondary cpu > alloc_pdata for secondary cpu > smp_store_cpu_info for secondary cpu >=20 > scheduler_init() is called before smp_prepare_cpus() in part because of > a dependency chain elsewhere: we cannot set up the idle domain until > scheduler_init() is called; and other places further on in the > initialization but before setting up cpu topology information assume > that the idle domain has been set up. >=20 However, as you say yourself below, there is boot_cpu_data, which stashes the info we need for the boot cpu. It gets filled after init_idle_domain()->scheduler_init() right now, but I don't see a reason why it can't be pulled up a bit (and from my tests, it seems to work). That happens during system_state=3D=3DSYS_STATE_boot, which also means system_state I haven't actually tracked down this dependency yet; nor have I explored > the possibility of populating cpu_data[] for the boot cpu earier than > it's done now. >=20 That looks like what boot_cpu_data is for, AFAICT. > I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather > than CPU_STARTING -- whether that's just what seemed most sensible when > cpupools were created, or whether there's a dependency somewhere between > those two points. >=20 Me neither, this may be worth a try. > At the moment we deal with the fact that the topology information for a > CPU is not available on CPU_UP_PREPARE by setting a callback that will > call init_pcpu() on the CPU_STARTING action. >=20 > The boot cpu will not get such a callback; but information about the > topology of the boot cpu *is* available in boot_cpu_data[]. >=20 Exactly. > We can use system_state to detect whether we've been called from > scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've > been called after the whole thing has been done (system_state >=3D > SYS_STATE_active). =20 > Exactly again. :-) > But while system_state =3D=3D SYS_STATE_smp_boot (which > is where both of the normal schedule callback and the csched2-specific > callback happen at the moment), we can't determine whether a given cpu's > cpu_data[] has been initialized yet. >=20 But we know this from the fact that, if system_state is SYS_STATE_smp_boot and not SYS_STATE_boot, we are being called from the callback, don't we? > So one thing we could do is this: >=20 > * In csched2_alloc_pdata(), call init_pcpu() if system_state <=3D > SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=3D > SYS_STATE_active (to catch cpupool callbacks). >=20 > * Keep the credit2-specific callback on CPU_STARTING. In > csched2_cpu_starting(), call init_pcpu(). (Not sure if this handles cpu > offlining / onlining properly or not.) >=20 > * In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that > we've been called from scheduler_init and use boot_cpu_data[]. > Otherwise, assume that the calls from the CPU_UP_PREPARE callback have > been filtered out, and that we're being called either from CPU_STARTING > or from cpu_schedlue_switch(), and use cpu_data[]. >=20 Oh, right, so after all, we at least sort of are on the same page: this is exactly what I was suggesting/wanted to do. :-) > Another thing we could explore is this: >=20 > * Change scheduler.c to call alloc_pdata on CPU_STARTING rather than > CPU_UP_PREPARE, and get rid of the csched2-specific callback. >=20 > * In csched2_alloc_pdata, always call init_pcpu(). (Or perhaps just > rename init_pcpu() to csched2_alloc_pdata().) >=20 > * In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've > been called from scheduler_init and use boot_cpu_data[]; otherwise, > assume that we're being called either from CPU_STARTING, or from > cpu_schedule_switch(), and use cpu_data[]. >=20 Right too, I think. > If this would work, I think it would be a lot better. It would remove > the credit2-specific callback, and it would mean we wouldn't need an > additional test to filter out >=20 I see. Ok, I guess I can give this a try. If it does not explode, because some weird dependency we can't see right now, we can either try harder to track it, or use the other solution outlined above as a fallback. > In both cases there's a slight risk in using system_state to determine > whether to rely on cpu_data[] or not, because there's actually a window > for each processor after system_state =3D=3D SYS_STATE_smp_boot where > cpu_data[] is *not* initialized, but it's not obvious from looking at > the data itself. If the callback mechanisms ever change order with the > cpu_data[] initializations in the future, we risk a situation where > credit2 silently regresses to using a single massive runqueue. >=20 > So I think that at least for debug builds, we should make some way to > determine whether a given cpu_data[] has been initialized, so that at > least we an ASSERT() that it has. > Yep, agreed too. :-) Thanks again for this and Regards, Dario --=-cPCsNkLIhlDEAmMq1MH/ Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlUJrJcACgkQk4XaBE3IOsQ9mgCeOY2uolhK86/F2rVj0+WjPGGs 38AAnjWdop5/meZiKUu9FMDarZa9CKjw =doGT -----END PGP SIGNATURE----- --=-cPCsNkLIhlDEAmMq1MH/-- --===============7876641840621347569== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel --===============7876641840621347569==--