xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: anshul makkar <anshul.makkar@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH 05/24] xen: credit2: make tickling more deterministic
Date: Wed, 31 Aug 2016 18:10:24 +0100	[thread overview]
Message-ID: <57C70F80.80802@citrix.com> (raw)
In-Reply-To: <147145428827.25877.8612560340138019986.stgit@Solace.fritz.box>

On 17/08/16 18:18, Dario Faggioli wrote:
> Right now, the following scenario can occurr:
>   - upon vcpu v wakeup, v itself is put in the runqueue,
>     and pcpu X is tickled;
>   - pcpu Y schedules (for whatever reason), sees v in
>     the runqueue and picks it up.
>
> This may seem ok (or even a good thing), but it's not.
> In fact, if runq_tickle() decided X is where v should
> run, it did it for a reason (load distribution, SMT
> support, cache hotness, affinity, etc), and we really
> should try as hard as possible to stick to that.
>
> Of course, we can't be too strict, or we risk leaving
> vcpus in the runqueue while there is available CPU
> capacity. So, we only leave v in runqueue --for X to
> pick it up-- if we see that X has been tickled and
> has not scheduled yet, i.e., it will have a real chance
> of actually select and schedule v.
>
> If that is not the case, we schedule it on Y (or, at
> least, we consider that), as running somewhere non-ideal
> is better than not running at all.
>
> The commit also adds performance counters for each of
> the possible situations.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>   xen/common/sched_credit2.c   |   65 +++++++++++++++++++++++++++++++++++++++---
>   xen/include/xen/perfc_defn.h |    3 ++
>   2 files changed, 64 insertions(+), 4 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 12dfd20..a3d7beb 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -54,6 +54,7 @@
>   #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, 16)
>   #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
>   #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
> +#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
>
>   /*
>    * WARNING: This is still in an experimental phase.  Status and work can be found at the
> @@ -398,6 +399,7 @@ struct csched2_vcpu {
>       int credit;
>       s_time_t start_time; /* When we were scheduled (used for credit) */
>       unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
> +    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
>
>       /* Individual contribution to load */
>       s_time_t load_last_update;  /* Last time average was updated */
> @@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>       __cpumask_set_cpu(ipid, &rqd->tickled);
>       smt_idle_mask_clear(ipid, &rqd->smt_idle);
>       cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
> +
> +    if ( unlikely(new->tickled_cpu != -1) )
> +        SCHED_STAT_CRANK(tickled_cpu_overwritten);
> +    new->tickled_cpu = ipid;
>   }
>
>   /*
> @@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>           ASSERT(svc->sdom != NULL);
>           svc->credit = CSCHED2_CREDIT_INIT;
>           svc->weight = svc->sdom->weight;
> +        svc->tickled_cpu = -1;
>           /* Starting load of 50% */
>           svc->avgload = 1ULL << (CSCHED2_PRIV(ops)->load_precision_shift - 1);
>           svc->load_last_update = NOW() >> LOADAVG_GRANULARITY_SHIFT;
> @@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>       else
>       {
>           ASSERT(svc->sdom == NULL);
> +        svc->tickled_cpu = svc->vcpu->vcpu_id;
If I understood correctly, tickled_cpu refers to pcpu and not a vcpu. 
Saving vcpu_id in tickled_cpu looks wrong.

>           svc->credit = CSCHED2_IDLE_CREDIT;
>           svc->weight = 0;
>       }
> @@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
>   static struct csched2_vcpu *
>   runq_candidate(struct csched2_runqueue_data *rqd,
>                  struct csched2_vcpu *scurr,
> -               int cpu, s_time_t now)
> +               int cpu, s_time_t now,
> +               unsigned int *pos)
>   {
>       struct list_head *iter;
>       struct csched2_vcpu *snext = NULL;
> @@ -2262,13 +2271,29 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>
>           /* Only consider vcpus that are allowed to run on this processor. */
>           if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
> +        {
> +            (*pos)++;
>               continue;
> +        }
> +
> +        /*
> +         * If a vcpu is meant to be picked up by another processor, and such
> +         * processor has not scheduled yet, leave it in the runqueue for him.
> +         */
> +        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
> +             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
> +        {
> +            (*pos)++;
> +            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
> +            continue;
> +        }
>
>           /* If this is on a different processor, don't pull it unless
>            * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
>           if ( svc->vcpu->processor != cpu
>                && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
>           {
> +            (*pos)++;
>               SCHED_STAT_CRANK(migrate_resisted);
>               continue;
>           }
> @@ -2280,9 +2305,26 @@ runq_candidate(struct csched2_runqueue_data *rqd,
>
>           /* In any case, if we got this far, break. */
>           break;
> +    }
>
> +    if ( unlikely(tb_init_done) )
> +    {
> +        struct {
> +            unsigned vcpu:16, dom:16;
> +            unsigned tickled_cpu, position;
> +        } d;
> +        d.dom = snext->vcpu->domain->domain_id;
> +        d.vcpu = snext->vcpu->vcpu_id;
> +        d.tickled_cpu = snext->tickled_cpu;
> +        d.position = *pos;
> +        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
> +                    sizeof(d),
> +                    (unsigned char *)&d);
>       }
>
> +    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
> +        SCHED_STAT_CRANK(tickled_cpu_overridden);
> +
>       return snext;
>   }
>
> @@ -2298,6 +2340,7 @@ csched2_schedule(
>       struct csched2_runqueue_data *rqd;
>       struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
>       struct csched2_vcpu *snext = NULL;
> +    unsigned int snext_pos = 0;
>       struct task_slice ret;
>
>       SCHED_STAT_CRANK(schedule);
> @@ -2347,7 +2390,7 @@ csched2_schedule(
>           snext = CSCHED2_VCPU(idle_vcpu[cpu]);
>       }
>       else
> -        snext=runq_candidate(rqd, scurr, cpu, now);
> +        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
>
>       /* If switching from a non-idle runnable vcpu, put it
>        * back on the runqueue. */
> @@ -2371,8 +2414,21 @@ csched2_schedule(
>               __set_bit(__CSFLAG_scheduled, &snext->flags);
>           }
>
> -        /* Check for the reset condition */
> -        if ( snext->credit <= CSCHED2_CREDIT_RESET )
> +        /*
> +         * The reset condition is "has a scheduler epoch come to an end?".
> +         * The way this is enforced is checking whether the vcpu at the top
> +         * of the runqueue has negative credits. This means the epochs have
> +         * variable lenght, as in one epoch expores when:
> +         *  1) the vcpu at the top of the runqueue has executed for
> +         *     around 10 ms (with default parameters);
> +         *  2) no other vcpu with higher credits wants to run.
> +         *
> +         * Here, where we want to check for reset, we need to make sure the
> +         * proper vcpu is being used. In fact, runqueue_candidate() may have
> +         * not returned the first vcpu in the runqueue, for various reasons
> +         * (e.g., affinity). Only trigger a reset when it does.
> +         */
> +        if ( snext_pos == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
>           {
>               reset_credit(ops, cpu, now, snext);
>               balance_load(ops, cpu, now);
> @@ -2386,6 +2442,7 @@ csched2_schedule(
>           }
>
>           snext->start_time = now;
> +        snext->tickled_cpu = -1;
>
>           /* Safe because lock for old processor is held */
>           if ( snext->vcpu->processor != cpu )
> diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
> index a336c71..4a835b8 100644
> --- a/xen/include/xen/perfc_defn.h
> +++ b/xen/include/xen/perfc_defn.h
> @@ -66,6 +66,9 @@ PERFCOUNTER(runtime_max_timer,      "csched2: runtime_max_timer")
>   PERFCOUNTER(migrated,               "csched2: migrated")
>   PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
>   PERFCOUNTER(credit_reset,           "csched2: credit_reset")
> +PERFCOUNTER(deferred_to_tickled_cpu,"csched2: deferred_to_tickled_cpu")
> +PERFCOUNTER(tickled_cpu_overwritten,"csched2: tickled_cpu_overwritten")
> +PERFCOUNTER(tickled_cpu_overridden, "csched2: tickled_cpu_overridden")
>
>   PERFCOUNTER(need_flush_tlb_flush,   "PG_need_flush tlb flushes")
>
>Anshul


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2016-08-31 17:10 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-17 17:17 [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
2016-08-17 17:17 ` [PATCH 01/24] xen: credit1: small optimization in Credit1's tickling logic Dario Faggioli
2016-09-12 15:01   ` George Dunlap
2016-08-17 17:17 ` [PATCH 02/24] xen: credit1: fix mask to be used for tickling in Credit1 Dario Faggioli
2016-08-17 23:42   ` Dario Faggioli
2016-09-12 15:04     ` George Dunlap
2016-08-17 17:17 ` [PATCH 03/24] xen: credit1: return the 'time remaining to the limit' as next timeslice Dario Faggioli
2016-09-12 15:14   ` George Dunlap
2016-09-12 17:00     ` Dario Faggioli
2016-09-14  9:34       ` George Dunlap
2016-09-14 13:54         ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 04/24] xen: credit2: properly schedule migration of a running vcpu Dario Faggioli
2016-09-12 17:11   ` George Dunlap
2016-08-17 17:18 ` [PATCH 05/24] xen: credit2: make tickling more deterministic Dario Faggioli
2016-08-31 17:10   ` anshul makkar [this message]
2016-09-05 13:47     ` Dario Faggioli
2016-09-07 12:25       ` anshul makkar
2016-09-13 11:13       ` George Dunlap
2016-09-29 15:24         ` Dario Faggioli
2016-09-13 11:28   ` George Dunlap
2016-09-30  2:22     ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 06/24] xen: credit2: implement yield() Dario Faggioli
2016-09-13 13:33   ` George Dunlap
2016-09-29 16:05     ` Dario Faggioli
2016-09-20 13:25   ` George Dunlap
2016-09-20 13:37     ` George Dunlap
2016-08-17 17:18 ` [PATCH 07/24] xen: sched: don't rate limit context switches in case of yields Dario Faggioli
2016-09-20 13:32   ` George Dunlap
2016-09-29 16:46     ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 08/24] xen: tracing: add trace records for schedule and rate-limiting Dario Faggioli
2016-08-18  0:57   ` Meng Xu
2016-08-18  9:41     ` Dario Faggioli
2016-09-20 13:50   ` George Dunlap
2016-08-17 17:18 ` [PATCH 09/24] xen/tools: tracing: improve tracing of context switches Dario Faggioli
2016-09-20 14:08   ` George Dunlap
2016-08-17 17:18 ` [PATCH 10/24] xen: tracing: improve Credit2's tickle_check and burn_credits records Dario Faggioli
2016-09-20 14:35   ` George Dunlap
2016-09-29 17:23     ` Dario Faggioli
2016-09-29 17:28       ` George Dunlap
2016-09-29 20:53         ` Dario Faggioli
2016-08-17 17:18 ` [PATCH 11/24] tools: tracing: handle more scheduling related events Dario Faggioli
2016-09-20 14:37   ` George Dunlap
2016-08-17 17:18 ` [PATCH 12/24] xen: libxc: allow to set the ratelimit value online Dario Faggioli
2016-09-20 14:43   ` George Dunlap
2016-09-20 14:45     ` Wei Liu
2016-09-28 15:44   ` George Dunlap
2016-08-17 17:19 ` [PATCH 13/24] libxc: improve error handling of xc Credit1 and Credit2 helpers Dario Faggioli
2016-09-20 15:10   ` Wei Liu
2016-08-17 17:19 ` [PATCH 14/24] libxl: allow to set the ratelimit value online for Credit2 Dario Faggioli
2016-08-22  9:21   ` Ian Jackson
2016-09-05 14:02     ` Dario Faggioli
2016-08-22  9:28   ` Ian Jackson
2016-09-28 15:37     ` George Dunlap
2016-09-30  1:03     ` Dario Faggioli
2016-09-28 15:39   ` George Dunlap
2016-08-17 17:19 ` [PATCH 15/24] xl: " Dario Faggioli
2016-09-28 15:46   ` George Dunlap
2016-08-17 17:19 ` [PATCH 16/24] xen: sched: factor affinity helpers out of sched_credit.c Dario Faggioli
2016-09-28 15:49   ` George Dunlap
2016-08-17 17:19 ` [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle() Dario Faggioli
2016-09-01 10:52   ` anshul makkar
2016-09-05 14:55     ` Dario Faggioli
2016-09-07 13:24       ` anshul makkar
2016-09-07 13:31         ` Dario Faggioli
2016-09-28 20:44   ` George Dunlap
2016-08-17 17:19 ` [PATCH 18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick() Dario Faggioli
2016-09-01 11:08   ` anshul makkar
2016-09-05 13:26     ` Dario Faggioli
2016-09-07 12:52       ` anshul makkar
2016-09-29 11:11   ` George Dunlap
2016-08-17 17:19 ` [PATCH 19/24] xen: credit2: soft-affinity awareness in load balancing Dario Faggioli
2016-09-02 11:46   ` anshul makkar
2016-09-05 12:49     ` Dario Faggioli
2016-08-17 17:19 ` [PATCH 20/24] xen: credit2: kick away vcpus not running within their soft-affinity Dario Faggioli
2016-08-17 17:20 ` [PATCH 21/24] xen: credit2: optimize runq_candidate() a little bit Dario Faggioli
2016-08-17 17:20 ` [PATCH 22/24] xen: credit2: "relax" CSCHED2_MAX_TIMER Dario Faggioli
2016-09-30 15:30   ` George Dunlap
2016-08-17 17:20 ` [PATCH 23/24] xen: credit2: optimize runq_tickle() a little bit Dario Faggioli
2016-09-02 12:38   ` anshul makkar
2016-09-05 12:52     ` Dario Faggioli
2016-08-17 17:20 ` [PATCH 24/24] xen: credit2: try to avoid tickling cpus subject to ratelimiting Dario Faggioli
2016-08-18  0:11 ` [PATCH 00/24] sched: Credit1 and Credit2 improvements... and soft-affinity for Credit2! Dario Faggioli
2016-08-18 11:49 ` Dario Faggioli
2016-08-18 11:53 ` Dario Faggioli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57C70F80.80802@citrix.com \
    --to=anshul.makkar@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dario.faggioli@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).