All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	George Dunlap <George.Dunlap@citrix.com>,
	"JBeulich@suse.com" <JBeulich@suse.com>,
	"uma.sharma523@gmail.com" <uma.sharma523@gmail.com>
Subject: Re: [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
Date: Wed, 18 Mar 2015 16:49:28 +0000	[thread overview]
Message-ID: <1426697366.2560.65.camel@citrix.com> (raw)
In-Reply-To: <55099940.7080007@eu.citrix.com>


[-- Attachment #1.1: Type: text/plain, Size: 8955 bytes --]

On Wed, 2015-03-18 at 15:26 +0000, George Dunlap wrote:
> On 03/18/2015 08:53 AM, Dario Faggioli wrote:
> >>> @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu)
> >>>  static void *
> >>>  csched2_alloc_pdata(const struct scheduler *ops, int cpu)
> >>>  {
> >>> -    /* Check to see if the cpu is online yet */
> >>> -    /* Note: cpu 0 doesn't get a STARTING callback */
> >>> -    if ( cpu == 0 || cpu_to_socket(cpu) >= 0 )
> >>> +    /*
> >>> +     * Actual initialization is deferred to when the pCPU will be
> >>> +     * online, via a STARTING callback. The only exception is
> >>> +     * the boot cpu, which does not get such a notification, and
> >>> +     * hence needs to be taken care of here.
> >>> +     */
> >>> +    if ( system_state == SYS_STATE_boot )
> >>>          init_pcpu(ops, cpu);
> 
> I think this will break adding a cpu to a cpupool after boot, won't it?
> 
Oh, yeah, that. Yes, I knew it, and probably should have mentioned it...
I assumed it was clear enough that I was asking an opinion about the
shown use of system_state, as opposed to using a particular init value
of cpu_data.phys_proc_id to the same effect, during the boot process.

> Don't we want effectively, "if ( is_boot_cpu() ||
> is_cpu_topology_set_up() )"?
> 
> is_cpu_topology_set_up() could be "system_state >= SYS_STATE_smp_boot" I
> suppose?
> 
That is what I was thinking. Alternatively, I thought we can keep the
'<= SYS_STATE_boot' part for recognizing that the call comes from within
the boot process, and actually do tweak the initialization/assignment
logic of phys_proc_id, as it was also mentioned previously in the thread
(and as you also mention below, IIUIC), to make it better represent
whether topology info are valid or not.

> Is "cpu == 0" the best test for "is_boot_cpu()", or is there another test?
> 
It is a valid test for that, right now, AFAICT, but Jan would like for
it to stop being one. :-D

> So let's step back for a minute, and let me see if I can describe the
> situation.
> 
Yeah... thanks for this, it's been extremely helpful, IMO!

> Unlike the other schedulers, credit2 needs to know the topology of a
> cpus when setting it up, so that we can place it in the proper runqueue.
> 
> Setting up the per-cpu structures would normally happen in alloc_pdata.
>  This is called in three different ways:
> 
> * During boot, on the boot processer, alloc_pdata is called from
> scheduler_init().
> 
> * During boot, on the secondary processors, alloc_pdata is called from
> a CPU_UP_PREPARE callback, which happens just before the cpu is actually
> brought up.
> 
> * When switching a cpu to a new cpupool after boot, alloc_pdata is also
> called from cpu_schedule_switch().
> 
> The "normal" place the cpu topology information can be found is in
> global the cpu_data[] array, typically accessed by the cpu_to_socket()
> or cpu_to_core() macros.
> 
> This topology information is written to cpu_data[] by
> smp_store_cpu_info().  For the boot cpu, it happens in
> smp_prepare_cpus();  For secondary cpus, it's happens in
> start_secondary(), which is the code run on the cpu itself as it's being
> brought up.
> 
> Unfortunately, at the moment, both of these places are after the
> respective places where the alloc pdata is called for those cpus.
> Flattening the entire x86 setup at the moment you'd see:
>   alloc_pdata for boot cpu
>   smp_store_cpu_info for boot cpu
>   for each secondary cpu
>     alloc_pdata for secondary cpu
>     smp_store_cpu_info for secondary cpu
> 
> scheduler_init() is called before smp_prepare_cpus() in part because of
> a dependency chain elsewhere: we cannot set up the idle domain until
> scheduler_init() is called; and other places further on in the
> initialization but before setting up cpu topology information assume
> that the idle domain has been set up.
> 
However, as you say yourself below, there is boot_cpu_data, which
stashes the info we need for the boot cpu. It gets filled after
init_idle_domain()->scheduler_init() right now, but I don't see a reason
why it can't be pulled up a bit (and from my tests, it seems to work).

That happens during system_state==SYS_STATE_boot, which also means
system_state<SYS_STATE_smp_boot. This means that, as far as scheduling
initialization is concerned, if (system_state < SYS_STATE_smp_boot) we
can and should use boot_cpu_data from the scheduler code, because it's
for that cpu that we're being called.

I do agree that this may look fragile, as it depends on the relative
positioning of identify_cpu() wrt scheduler_init() wrt system_state
values, but I think we can find something to ASSERT() about (from
Credit2's code).

> I haven't actually tracked down this dependency yet; nor have I explored
> the possibility of populating cpu_data[] for the boot cpu earier than
> it's done now.
> 
That looks like what boot_cpu_data is for, AFAICT.

> I'm not sure exactly why we call alloc_pdata on CPU_UP_PREPARE rather
> than CPU_STARTING -- whether that's just what seemed most sensible when
> cpupools were created, or whether there's a dependency somewhere between
> those two points.
> 
Me neither, this may be worth a try.

> At the moment we deal with the fact that the topology information for a
> CPU is not available on CPU_UP_PREPARE by setting a callback that will
> call init_pcpu() on the CPU_STARTING action.
> 
> The boot cpu will not get such a callback; but information about the
> topology of the boot cpu *is* available in boot_cpu_data[].
> 
Exactly.

> We can use system_state to detect whether we've been called from
> scheduler_init (system_state < SYS_STATE_smp_boot), or whether we've
> been called after the whole thing has been done (system_state >=
> SYS_STATE_active).  
>
Exactly again. :-)

> But while system_state == SYS_STATE_smp_boot (which
> is where both of the normal schedule callback and the csched2-specific
> callback happen at the moment), we can't determine whether a given cpu's
> cpu_data[] has been initialized yet.
> 
But we know this from the fact that, if system_state is
SYS_STATE_smp_boot and not SYS_STATE_boot, we are being called from the
callback, don't we?

> So one thing we could do is this:
> 
> * In csched2_alloc_pdata(), call init_pcpu() if system_state <=
> SYS_STATE_smp_boot (to catch scheduler_init) or system_state >=
> SYS_STATE_active (to catch cpupool callbacks).
> 
> * Keep the credit2-specific callback on CPU_STARTING.  In
> csched2_cpu_starting(), call init_pcpu().  (Not sure if this handles cpu
> offlining / onlining properly or not.)
> 
> * In init_pcpu(), if system_state < SYS_STATE_smp_boot, assume that
> we've been called from scheduler_init and use boot_cpu_data[].
> Otherwise, assume that the calls from the CPU_UP_PREPARE callback have
> been filtered out, and that we're being called either from CPU_STARTING
> or from cpu_schedlue_switch(), and use cpu_data[].
> 
Oh, right, so after all, we at least sort of are on the same page: this
is exactly what I was suggesting/wanted to do. :-)

> Another thing we could explore is this:
> 
> * Change scheduler.c to call alloc_pdata on CPU_STARTING rather than
> CPU_UP_PREPARE, and get rid of the csched2-specific callback.
> 
> * In csched2_alloc_pdata, always call init_pcpu().  (Or perhaps just
> rename init_pcpu() to csched2_alloc_pdata().)
> 
> * In init_pcpu, if system_state < SYS_STATE_smp_boot, assume that we've
> been called from scheduler_init and use boot_cpu_data[]; otherwise,
> assume that we're being called either from CPU_STARTING, or from
> cpu_schedule_switch(), and use cpu_data[].
> 
Right too, I think.

> If this would work, I think it would be a lot better.  It would remove
> the credit2-specific callback, and it would mean we wouldn't need an
> additional test to filter out
> 
I see. Ok, I guess I can give this a try. If it does not explode,
because some weird dependency we can't see right now, we can either try
harder to track it, or use the other solution outlined above as a
fallback.

> In both cases there's a slight risk in using system_state to determine
> whether to rely on cpu_data[] or not, because there's actually a window
> for each processor after system_state == SYS_STATE_smp_boot where
> cpu_data[] is *not* initialized, but it's not obvious from looking at
> the data itself.  If the callback mechanisms ever change order with the
> cpu_data[] initializations in the future, we risk a situation where
> credit2 silently regresses to using a single massive runqueue.
> 
> So I think that at least for debug builds, we should make some way to
> determine whether a given cpu_data[] has been initialized, so that at
> least we an ASSERT() that it has.
>
Yep, agreed too. :-)

Thanks again for this and Regards,
Dario

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

  parent reply	other threads:[~2015-03-18 16:49 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-13 18:11 [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code Uma Sharma
2015-03-13 18:29 ` Andrew Cooper
2015-03-13 19:13   ` George Dunlap
2015-03-16 12:48     ` Jan Beulich
2015-03-16 12:51       ` George Dunlap
2015-03-16 12:56         ` Jan Beulich
2015-03-16 13:26           ` Dario Faggioli
2015-03-17 18:18           ` Dario Faggioli
2015-03-18  7:56             ` Jan Beulich
2015-03-18  8:53               ` Dario Faggioli
2015-03-18 15:26                 ` George Dunlap
2015-03-18 15:59                   ` Jan Beulich
2015-03-18 16:08                     ` George Dunlap
2015-03-18 16:30                       ` Dario Faggioli
2015-03-18 16:49                   ` Dario Faggioli [this message]
2015-03-18 17:05                     ` George Dunlap
2015-03-19 10:03                       ` Dario Faggioli
2015-03-19 10:50                         ` Jan Beulich
2015-03-19 11:23                           ` Dario Faggioli
2015-03-19 11:40                           ` George Dunlap
2015-03-19 12:29                             ` Dario Faggioli
2015-03-19 12:35                               ` George Dunlap
2015-03-19 13:00                                 ` Dario Faggioli
2015-03-16 12:45 ` Jan Beulich
2015-03-16 12:49 ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1426697366.2560.65.camel@citrix.com \
    --to=dario.faggioli@citrix.com \
    --cc=Andrew.Cooper3@citrix.com \
    --cc=George.Dunlap@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=uma.sharma523@gmail.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.