From: anshul makkar <anshul.makkar@citrix.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	xen-devel@lists.xenproject.org
Cc: "Justin T. Weaver" <jtweaver@hawaii.edu>,
	George Dunlap <george.dunlap@citrix.com>
Subject: Re: [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()
Date: Thu, 1 Sep 2016 11:52:50 +0100
Message-ID: <57C80882.90102@citrix.com>
In-Reply-To: <147145437291.25877.11396888641547651914.stgit@Solace.fritz.box>
On 17/08/16 18:19, Dario Faggioli wrote:
> This is done by means of the "usual" two steps loop:
>   - soft affinity balance step;
>   - hard affinity balance step.
>
> The entire logic implemented in runq_tickle() is
> applied, during the first step, considering only the
> CPUs in the vcpu's soft affinity. In the second step,
> we fall back to use all the CPUs from its hard
> affinity (as it is doing now, without this patch).
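To make the two-step flow described above concrete, here is a minimal standalone sketch. It is not the Xen code: cpumasks are modelled as plain 64-bit words, and every name in it (vcpu_model, model_has_soft_affinity, model_step_mask, model_pick_cpu) is hypothetical. It also simply picks the lowest-numbered candidate, whereas runq_tickle() prefers the cpu the vcpu last ran on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per pcpu, at most 64 pcpus. */
typedef uint64_t cpumask_model_t;

enum balance_step { STEP_SOFT = 0, STEP_HARD = 1 };

struct vcpu_model {
    cpumask_model_t hard; /* pcpus the vcpu is allowed to run on */
    cpumask_model_t soft; /* pcpus the vcpu prefers to run on */
};

/*
 * Assumption: a soft affinity only matters if, once restricted to the
 * hard affinity, it is non-empty and actually narrows the choice.
 */
static bool model_has_soft_affinity(const struct vcpu_model *v)
{
    cpumask_model_t eff = v->soft & v->hard;

    return eff != 0 && eff != v->hard;
}

/* Mask of pcpus to consider during a given balance step. */
static cpumask_model_t model_step_mask(const struct vcpu_model *v,
                                       enum balance_step bs)
{
    if (bs == STEP_SOFT && model_has_soft_affinity(v))
        return v->soft & v->hard;
    return v->hard;
}

/*
 * Try the soft-affinity step first, then fall back to hard affinity.
 * Returns the lowest-numbered suitable pcpu from candidates, or -1.
 */
static int model_pick_cpu(const struct vcpu_model *v,
                          cpumask_model_t candidates)
{
    for (int bs = STEP_SOFT; bs <= STEP_HARD; bs++) {
        /* Skip the soft step if there is no useful soft affinity. */
        if (bs == STEP_SOFT && !model_has_soft_affinity(v))
            continue;

        cpumask_model_t m = candidates & model_step_mask(v, (enum balance_step)bs);

        for (int cpu = 0; m != 0 && cpu < 64; cpu++)
            if (m & (UINT64_C(1) << cpu))
                return cpu;
    }
    return -1;
}
```

With hard = 0x0F and soft = 0x0C, candidates on pcpus 1 and 2 yield pcpu 2 in the soft step, while candidates only on pcpus 0 and 1 are found in the hard step.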
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |  243 ++++++++++++++++++++++++++++----------------
>   1 file changed, 157 insertions(+), 86 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 0d83bd7..3aef1b4 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -902,6 +902,42 @@ __runq_remove(struct csched2_vcpu *svc)
>       list_del_init(&svc->runq_elem);
>   }
>
> +/*
> + * During the soft-affinity step, only actually preempt someone if
> + * he does not have soft-affinity with cpu (while we have).
> + *
> + * BEWARE that this uses cpumask_scratch, trowing away what's in there!
Typo: "trowing" should be "throwing".
> + */
> +static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
> +{
> +    struct csched2_vcpu * cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> +
> +    /*
> +     * If we're doing hard-affinity, always check whether to preempt cur.
> +     * If we're doing soft-affinity, but cur doesn't have one, check as well.
> +     */
> +    if ( bs == BALANCE_HARD_AFFINITY ||
> +         !has_soft_affinity(cur->vcpu, cur->vcpu->cpu_hard_affinity) )
> +        return 1;
> +
> +    /*
> +     * We're doing soft-affinity, and we know that the current vcpu on cpu
> +     * has a soft affinity. We now want to know whether cpu itself is in
Could you please explain the above statement? If the vcpu has a soft
affinity and is currently executing, doesn't that always mean it is
running on one of the pcpus in its soft affinity or hard affinity?
> +     * such affinity. In fact, since we now that new (in runq_tickle()) is
Typo: "since we now that" should be "since now we know that".
> +     *  - if cpu is not in cur's soft-affinity, we should indeed check to
> +     *    see whether new should preempt cur. If that will be the case, that
> +     *    would be an improvement wrt respecting soft affinity;
> +     *  - if cpu is in cur's soft-affinity, we leave it alone and (in
> +     *    runq_tickle()) move on to another cpu. In fact, we don't want to
> +     *    be too harsh with someone which is running within its soft-affinity.
> +     *    This is safe because later, if we don't fine anyone else during the
> +     *    soft-affinity step, we will check cpu for preemption anyway, when
> +     *    doing hard-affinity.
> +     */
> +    affinity_balance_cpumask(cur->vcpu, BALANCE_SOFT_AFFINITY, cpumask_scratch);
> +    return !cpumask_test_cpu(cpu, cpumask_scratch);
> +}
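As a reading aid for the helper above, a minimal standalone sketch of the decision it appears to implement. This is a model under stated assumptions, not the patch itself: masks are 64-bit words, cpu is assumed to be below 64, and all names (pm_vcpu, pm_has_soft_affinity, pm_check_preempt) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum pm_step { PM_SOFT_STEP, PM_HARD_STEP };

/* Hypothetical model of a vcpu: one bit per pcpu in each mask. */
struct pm_vcpu {
    uint64_t hard; /* pcpus the vcpu may run on */
    uint64_t soft; /* pcpus the vcpu prefers */
};

/* Assumption: soft affinity counts only if it narrows the hard one. */
static bool pm_has_soft_affinity(const struct pm_vcpu *v)
{
    uint64_t eff = v->soft & v->hard;

    return eff != 0 && eff != v->hard;
}

/*
 * Should the vcpu currently running on cpu (cur) be considered for
 * preemption during balance step bs?
 */
static bool pm_check_preempt(enum pm_step bs, const struct pm_vcpu *cur,
                             unsigned int cpu)
{
    /* Hard step, or cur has no soft affinity: always check. */
    if (bs == PM_HARD_STEP || !pm_has_soft_affinity(cur))
        return true;

    /* Soft step: only check if cur is running outside its soft affinity. */
    uint64_t soft_mask = cur->soft & cur->hard;

    return (soft_mask & (UINT64_C(1) << cpu)) == 0;
}
```

In this model a vcpu with hard = 0x0F and soft = 0x0C that is running on pcpu 1 is inside its hard affinity but outside its soft affinity, so it is a preemption candidate during the soft step; the same vcpu on pcpu 2 is left alone until the hard step.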
> +
>   void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
>
>   /*
> @@ -925,7 +961,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>   {
>       int i, ipid = -1;
>       s_time_t lowest = (1<<30);
> -    unsigned int cpu = new->vcpu->processor;
> +    unsigned int bs, cpu = new->vcpu->processor;
>       struct csched2_runqueue_data *rqd = RQD(ops, cpu);
>       cpumask_t mask;
>       struct csched2_vcpu * cur;
> @@ -947,109 +983,144 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
>                       (unsigned char *)&d);
>       }
>
> -    /*
> -     * First of all, consider idle cpus, checking if we can just
> -     * re-use the pcpu where we were running before.
> -     *
> -     * If there are cores where all the siblings are idle, consider
> -     * them first, honoring whatever the spreading-vs-consolidation
> -     * SMT policy wants us to do.
> -     */
> -    if ( unlikely(sched_smt_power_savings) )
> -        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
> -    else
> -        cpumask_copy(&mask, &rqd->smt_idle);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    i = cpumask_test_or_cycle(cpu, &mask);
> -    if ( i < nr_cpu_ids )
> +    for_each_affinity_balance_step( bs )
>       {
> -        SCHED_STAT_CRANK(tickled_idle_cpu);
> -        ipid = i;
> -        goto tickle;
> -    }
> +        /*
> +         * First things first: if we are at the first (soft affinity) step,
> +         * but new doesn't have a soft affinity, skip this step.
> +         */
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
> +            continue;
>
> -    /*
> -     * If there are no fully idle cores, check all idlers, after
> -     * having filtered out pcpus that have been tickled but haven't
> -     * gone through the scheduler yet.
> -     */
> -    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    i = cpumask_test_or_cycle(cpu, &mask);
> -    if ( i < nr_cpu_ids )
> -    {
> -        SCHED_STAT_CRANK(tickled_idle_cpu);
> -        ipid = i;
> -        goto tickle;
> -    }
> +        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch);
>
> -    /*
> -     * Otherwise, look for the non-idle (and non-tickled) processors with
> -     * the lowest credit, among the ones new is allowed to run on. Again,
> -     * the cpu were it was running on would be the best candidate.
> -     */
> -    cpumask_andnot(&mask, &rqd->active, &rqd->idle);
> -    cpumask_andnot(&mask, &mask, &rqd->tickled);
> -    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
> -    if ( cpumask_test_cpu(cpu, &mask) )
> -    {
> -        cur = CSCHED2_VCPU(curr_on_cpu(cpu));
> -        burn_credits(rqd, cur, now);
> +        /*
> +         * First of all, consider idle cpus, checking if we can just
> +         * re-use the pcpu where we were running before.
> +         *
> +         * If there are cores where all the siblings are idle, consider
> +         * them first, honoring whatever the spreading-vs-consolidation
> +         * SMT policy wants us to do.
> +         */
> +        if ( unlikely(sched_smt_power_savings) )
> +            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
> +        else
> +            cpumask_copy(&mask, &rqd->smt_idle);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        i = cpumask_test_or_cycle(cpu, &mask);
> +        if ( i < nr_cpu_ids )
> +        {
> +            SCHED_STAT_CRANK(tickled_idle_cpu);
> +            ipid = i;
> +            goto tickle;
> +        }
>
> -        if ( cur->credit < new->credit )
> +        /*
> +         * If there are no fully idle cores, check all idlers, after
> +         * having filtered out pcpus that have been tickled but haven't
> +         * gone through the scheduler yet.
> +         */
> +        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        i = cpumask_test_or_cycle(cpu, &mask);
> +        if ( i < nr_cpu_ids )
>           {
> -            SCHED_STAT_CRANK(tickled_busy_cpu);
> -            ipid = cpu;
> +            SCHED_STAT_CRANK(tickled_idle_cpu);
> +            ipid = i;
>               goto tickle;
>           }
> -    }
>
> -    for_each_cpu(i, &mask)
> -    {
> -        /* Already looked at this one above */
> -        if ( i == cpu )
> -            continue;
> +        /*
> +         * Otherwise, look for the non-idle (and non-tickled) processors with
> +         * the lowest credit, among the ones new is allowed to run on. Again,
> +         * the cpu were it was running on would be the best candidate.
> +         */
> +        cpumask_andnot(&mask, &rqd->active, &rqd->idle);
> +        cpumask_andnot(&mask, &mask, &rqd->tickled);
> +        cpumask_and(&mask, &mask, cpumask_scratch);
> +        if ( cpumask_test_cpu(cpu, &mask) )
> +        {
> +            cur = CSCHED2_VCPU(curr_on_cpu(cpu));
>
> -        cur = CSCHED2_VCPU(curr_on_cpu(i));
> +            if ( soft_aff_check_preempt(bs, cpu) )
> +            {
> +                burn_credits(rqd, cur, now);
> +
> +                if ( unlikely(tb_init_done) )
> +                {
> +                    struct {
> +                        unsigned vcpu:16, dom:16;
> +                        unsigned cpu, credit;
> +                    } d;
> +                    d.dom = cur->vcpu->domain->domain_id;
> +                    d.vcpu = cur->vcpu->vcpu_id;
> +                    d.credit = cur->credit;
> +                    d.cpu = cpu;
> +                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> +                                sizeof(d),
> +                                (unsigned char *)&d);
> +                }
> +
> +                if ( cur->credit < new->credit )
> +                {
> +                    SCHED_STAT_CRANK(tickled_busy_cpu);
> +                    ipid = cpu;
> +                    goto tickle;
> +                }
> +            }
> +        }
>
> -        ASSERT(!is_idle_vcpu(cur->vcpu));
> +        for_each_cpu(i, &mask)
> +        {
> +            /* Already looked at this one above */
> +            if ( i == cpu )
> +                continue;
>
> -        /* Update credits for current to see if we want to preempt. */
> -        burn_credits(rqd, cur, now);
> +            cur = CSCHED2_VCPU(curr_on_cpu(i));
> +            ASSERT(!is_idle_vcpu(cur->vcpu));
>
> -        if ( cur->credit < lowest )
> -        {
> -            ipid = i;
> -            lowest = cur->credit;
> +            if ( soft_aff_check_preempt(bs, i) )
> +            {
> +                /* Update credits for current to see if we want to preempt. */
> +                burn_credits(rqd, cur, now);
> +
> +                if ( unlikely(tb_init_done) )
> +                {
> +                    struct {
> +                        unsigned vcpu:16, dom:16;
> +                        unsigned cpu, credit;
> +                    } d;
> +                    d.dom = cur->vcpu->domain->domain_id;
> +                    d.vcpu = cur->vcpu->vcpu_id;
> +                    d.credit = cur->credit;
> +                    d.cpu = i;
> +                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> +                                sizeof(d),
> +                                (unsigned char *)&d);
> +                }
> +
> +                if ( cur->credit < lowest )
> +                {
> +                    ipid = i;
> +                    lowest = cur->credit;
> +                }
> +            }
>           }
>
> -        if ( unlikely(tb_init_done) )
> +        /*
> +         * Only switch to another processor if the credit difference is
> +         * greater than the migrate resistance.
> +         */
> +        if ( ipid != -1 && lowest + CSCHED2_MIGRATE_RESIST <= new->credit )
>           {
> -            struct {
> -                unsigned vcpu:16, dom:16;
> -                unsigned cpu, credit;
> -            } d;
> -            d.dom = cur->vcpu->domain->domain_id;
> -            d.vcpu = cur->vcpu->vcpu_id;
> -            d.credit = cur->credit;
> -            d.cpu = i;
> -            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
> -                        sizeof(d),
> -                        (unsigned char *)&d);
> +            SCHED_STAT_CRANK(tickled_busy_cpu);
> +            goto tickle;
>           }
>       }
>
> -    /*
> -     * Only switch to another processor if the credit difference is
> -     * greater than the migrate resistance.
> -     */
> -    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
> -    {
> -        SCHED_STAT_CRANK(tickled_no_cpu);
> -        return;
> -    }
> -
> -    SCHED_STAT_CRANK(tickled_busy_cpu);
> +    SCHED_STAT_CRANK(tickled_no_cpu);
> +    return;
>    tickle:
>       BUG_ON(ipid == -1);
>
>