Re: [PATCH v3 1/4] sched: credit2: respect per-vcpu hard affinity

All of lore.kernel.org
 help / color / mirror / Atom feed

From: George Dunlap <george.dunlap@eu.citrix.com>
To: "Justin T. Weaver" <jtweaver@hawaii.edu>, xen-devel@lists.xen.org
Cc: dario.faggioli@citrix.com, henric@hawaii.edu
Subject: Re: [PATCH v3 1/4] sched: credit2: respect per-vcpu hard affinity
Date: Tue, 31 Mar 2015 15:37:28 +0100	[thread overview]
Message-ID: <551AB128.8090905@eu.citrix.com> (raw)
In-Reply-To: <1427363314-25430-2-git-send-email-jtweaver@hawaii.edu>

On 03/26/2015 09:48 AM, Justin T. Weaver wrote:
> by making sure that vcpus only run on the pcpu(s) they are allowed to
> run on based on their hard affinity cpu masks.
> 
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Hey Justin!  Getting close.  A couple of comments:

> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 7581731..af716e4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -176,6 +176,7 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
>  #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
>  /* CPU to runqueue struct macro */
>  #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops, _cpu)])
> +#define VCPU2ONLINE(_v)     cpupool_online_cpumask((_v)->domain->cpupool)
>  
>  /*
>   * Shifts for load average.
> @@ -194,6 +195,12 @@ int opt_overload_balance_tolerance=-3;
>  integer_param("credit2_balance_over", opt_overload_balance_tolerance);
>  
>  /*
> + * Use this to avoid having too many cpumask_t structs on the stack
> + */
> +static cpumask_t **scratch_mask = NULL;
> +#define csched2_cpumask scratch_mask[smp_processor_id()]

Since I'm asking for changes below:  It's not clear to me when I'm
scanning the code that csched2_cpumask is a scratch variable.  What
about calling the actual variable _scratch_cpumask and havind the
#define be scratch_cpumask?

>      /* Find the runqueue with the lowest instantaneous load */
>      for_each_cpu(i, &prv->active_queues)
>      {
>          struct csched2_runqueue_data *rqd;
> -        s_time_t rqd_avgload;
> +        s_time_t rqd_avgload = MAX_LOAD;
>  
>          rqd = prv->rqd + i;
>  
>          /* If checking a different runqueue, grab the lock,
> -         * read the avg, and then release the lock.
> +         * check hard affinity, read the avg, and then release the lock.
>           *
>           * If on our own runqueue, don't grab or release the lock;
>           * but subtract our own load from the runqueue load to simulate
> -         * impartiality */
> +         * impartiality.
> +         *
> +         * svc's hard affinity may have changed; this function is the
> +         * credit 2 scheduler's first opportunity to react to the change,
> +         * so it is possible here that svc does not have hard affinity
> +         * with any of the pcpus of svc's currently assigned run queue.
> +         */
>          if ( rqd == svc->rqd )
>          {
> -            rqd_avgload = rqd->b_avgload - svc->avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload - svc->avgload;
>          }
>          else if ( spin_trylock(&rqd->lock) )
>          {
> -            rqd_avgload = rqd->b_avgload;
> +            if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
> +                rqd_avgload = rqd->b_avgload;
> +
>              spin_unlock(&rqd->lock);
>          }
>          else

Since we're already potentially falling through and doing the comparison
below with an unchanged rqd_avgload (which has now been set to MAX_LOAD
above), I wonder if it makes more sense to remove this "else continue"
here, just to avoid confusing people.

> @@ -2024,6 +2096,13 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
>          printk("%s: cpu %d not online yet, deferring initializatgion\n",
>                 __func__, cpu);
>  
> +    /*
> +     * For each new pcpu, allocate a cpumask_t for use throughout the
> +     * scheduler to avoid putting any cpumask_t structs on the stack.
> +     */
> +    if ( !zalloc_cpumask_var(&scratch_mask[cpu]) )

Any reason not to use "scratch_mask + cpu" here rather than
"&scratch_mask[cpu]"?

It might not be a bad idea to ad ASSERT(scratch_mask[cpu] == NULL)
before this, just to be paranoid...

> +        return NULL;
> +
>      return (void *)1;
>  }
>  
> @@ -2072,6 +2151,8 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
>  
>      spin_unlock_irqrestore(&prv->lock, flags);
>  
> +    free_cpumask_var(scratch_mask[cpu]);

We should definitely set this to NULL after freeing it (see below)

> +
>      return;
>  }
>  
> @@ -2159,6 +2240,10 @@ csched2_init(struct scheduler *ops)
>  
>      prv->load_window_shift = opt_load_window_shift;
>  
> +    scratch_mask = xmalloc_array(cpumask_t *, nr_cpu_ids);

I realize Dario recommended using xmalloc_array() instead of
xzalloc_array(), but I don't understand why he thinks that's OK.  His
mail says "(see below about why I actually don't
think we need)", but I don't actually see that addressed in that e-mail.

I think it's just dangerous to leave uninitialized pointers around.  The
invariant should be that if the array entry is invalid it's NULL, and if
it's non-null then it's valid.

(* marc.info/?i=<1426266674.32500.25.camel@citrix.com>)

Also -- I think this allocation wants to happen in global_init(), right?
 Otherwise if you make a second cpupool with the credit2 scheduler this
will be clobbered.  (I think nr_cpu_ids should be defined at that point.)

 -George

next prev parent reply	other threads:[~2015-03-31 14:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-26  9:48 [PATCH v3 0/4] sched: credit2: introduce per-vcpu hard and soft affinity Justin T. Weaver
2015-03-26  9:48 ` [PATCH v3 1/4] sched: credit2: respect per-vcpu hard affinity Justin T. Weaver
2015-03-31 14:37   ` George Dunlap [this message]
2015-03-31 17:14     ` Dario Faggioli
2015-03-31 17:32       ` George Dunlap
2015-04-23 16:00     ` Dario Faggioli
2015-05-06 12:39   ` Dario Faggioli
2015-03-26  9:48 ` [PATCH v3 2/4] sched: factor out per-vcpu affinity related code to common header file Justin T. Weaver
2015-04-23 15:22   ` Dario Faggioli
2015-03-26  9:48 ` [PATCH v3 3/4] sched: credit2: indent code sections to make review of patch 4/4 easier Justin T. Weaver
2015-04-23 15:35   ` Dario Faggioli
2015-03-26  9:48 ` [PATCH v3 4/4] sched: credit2: consider per-vcpu soft affinity Justin T. Weaver
2015-03-31 17:38   ` George Dunlap
2015-04-20 15:38   ` George Dunlap
2015-04-22 16:16   ` George Dunlap
2015-09-17 14:27 ` [PATCH v3 0/4] sched: credit2: introduce per-vcpu hard and " Dario Faggioli
2015-09-17 15:15   ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=551AB128.8090905@eu.citrix.com \
    --to=george.dunlap@eu.citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=henric@hawaii.edu \
    --cc=jtweaver@hawaii.edu \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.