From: George Dunlap
Subject: Re: [PATCH v2 10/16] xen: sched: use soft-affinity instead of domain's node-affinity
Date: Thu, 14 Nov 2013 15:30:33 +0000
Message-ID: <5284EC99.3070607@eu.citrix.com>
In-Reply-To: <20131113191233.18086.60472.stgit@Solace>
References: <20131113190852.18086.5437.stgit@Solace> <20131113191233.18086.60472.stgit@Solace>
To: Dario Faggioli, xen-devel@lists.xen.org
Cc: Marcus Granado, Keir Fraser, Ian Campbell, Li Yechen, Andrew Cooper, Juergen Gross, Ian Jackson, Jan Beulich, Justin Weaver, Matt Wilson, Elena Ufimtseva
List-Id: xen-devel@lists.xenproject.org

On 13/11/13 19:12, Dario Faggioli wrote:
> Now that we have it, use soft affinity for scheduling, and replace
> the indirect use of the domain's NUMA node-affinity. This is more
> general, as soft affinity does not have to be related to NUMA. At
> the same time, it makes it possible to achieve the same results as
> NUMA-aware scheduling, just by making soft affinity equal to the
> domain's node affinity, for all the vCPUs (e.g., from the toolstack).
>
> This also means renaming most of the NUMA-aware scheduling-related
> functions in credit1 to something more generic, hinting at the
> concept of soft affinity rather than directly at NUMA awareness.
>
> As a side effect, this simplifies the code quite a bit. In fact,
> prior to this change, we needed to cache the translation of
> d->node_affinity (which is a nodemask_t) to a cpumask_t, since that
> is what scheduling decisions require (we used to keep it in
> node_affinity_cpumask). This, and all the complicated logic
> required to keep it updated, is not necessary any longer.
>
> The high level description of NUMA placement and scheduling in
> docs/misc/xl-numa-placement.markdown is being updated too, to match
> the new architecture.
>
> Signed-off-by: Dario Faggioli

Reviewed-by: George Dunlap

Just a few things to note below...

> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 4b8fca8..b599223 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -411,8 +411,6 @@ void domain_update_node_affinity(struct domain *d)
>                  node_set(node, d->node_affinity);
>      }
>
> -    sched_set_node_affinity(d, &d->node_affinity);
> -
>      spin_unlock(&d->node_affinity_lock);

At this point, the only thing inside the spinlock is contingent on
d->auto_node_affinity.

> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> index 398b095..0790ebb 100644
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c

...

> -static inline int __vcpu_has_node_affinity(const struct vcpu *vc,
> +static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
>                                             const cpumask_t *mask)
>  {
> -    const struct domain *d = vc->domain;
> -    const struct csched_dom *sdom = CSCHED_DOM(d);
> -
> -    if ( d->auto_node_affinity
> -         || cpumask_full(sdom->node_affinity_cpumask)
> -         || !cpumask_intersects(sdom->node_affinity_cpumask, mask) )
> +    if ( cpumask_full(vc->cpu_soft_affinity)
> +         || !cpumask_intersects(vc->cpu_soft_affinity, mask) )
>          return 0;

At this point we've lost a way to make this check potentially much
faster (being able to check auto_node_affinity).
This isn't a super-hot path, but it does happen fairly frequently -- will the "cpumask_full()" check take a significant amount of time on, say, a 4096-core system? If so, we might think about "caching" the result of cpumask_full() at some point.
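
Just to illustrate what I mean by caching -- this is only a rough sketch,
the flag name (soft_aff_effective) and where it lives are made up, and it
assumes the usual Xen cpumask API plus the v->cpu_soft_affinity field this
series introduces:

    /* Hypothetical: a cached flag kept next to cpu_soft_affinity in
     * struct vcpu, true only if the mask actually constrains anything. */
    bool soft_aff_effective;

    /* Recompute the cached flag whenever the mask is written (i.e.
     * wherever the toolstack/domctl path updates soft affinity): */
    static void vcpu_set_soft_affinity(struct vcpu *v, const cpumask_t *mask)
    {
        cpumask_copy(v->cpu_soft_affinity, mask);
        v->soft_aff_effective = !cpumask_full(v->cpu_soft_affinity);
    }

    /* The scheduler-side check then reduces to a flag test in the
     * common "no soft affinity configured" case: */
    static inline int __vcpu_has_soft_affinity(const struct vcpu *vc,
                                               const cpumask_t *mask)
    {
        return vc->soft_aff_effective &&
               cpumask_intersects(vc->cpu_soft_affinity, mask);
    }

That way the O(nr_cpus) scan of the mask happens once per affinity update
rather than on every balancing step.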