From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [PATCH 2 of 5 v3] xen: sched_credit: improve picking up the idle CPU for a VCPU Date: Tue, 18 Dec 2012 12:12:10 +0000 Message-ID: <50D05D9A.3010601@eu.citrix.com> References: <7e9837f96c0d6afc2f48.1355783339@Solace> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <7e9837f96c0d6afc2f48.1355783339@Solace> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Dario Faggioli Cc: "xen-devel@lists.xensource.com" , "Keir (Xen.org)" , Ian Campbell List-Id: xen-devel@lists.xenproject.org On 17/12/12 22:28, Dario Faggioli wrote: > In _csched_cpu_pick() we try to select the best possible CPU for > running a VCPU, considering the characteristics of the underlying > hardware (i.e., how many threads, core, sockets, and how busy they > are). What we want is "the idle execution vehicle with the most > idling neighbours in its grouping". > > In order to achieve it, we select a CPU from the VCPU's affinity, > giving preference to its current processor if possible, as the basis > for the comparison with all the other CPUs. Problem is, to discount > the VCPU itself when computing this "idleness" (in an attempt to be > fair wrt its current processor), we arbitrarily and unconditionally > consider that selected CPU as idle, even when it is not the case, > for instance: > 1. If the CPU is not the one where the VCPU is running (perhaps due > to the affinity being changed); > 2. The CPU is where the VCPU is running, but it has other VCPUs in > its runq, so it won't go idle even if the VCPU in question goes. > > This is exemplified in the trace below: > > ] 3.466115364 x|------|------| d10v1 22005(2:2:5) 3 [ a 1 8 ] > ... ... ... > 3.466122856 x|------|------| d10v1 runstate_change d10v1 running->offline > 3.466123046 x|------|------| d?v? runstate_change d32767v0 runnable->running > ... ... ... > ] 3.466126887 x|------|------| d32767v0 28004(2:8:4) 3 [ a 1 8 ] > > 22005(...) line (the first line) means _csched_cpu_pick() was called on > VCPU 1 of domain 10, while it is running on CPU 0, and it choose CPU 8, > which is busy ('|'), even if there are plenty of idle CPUs. That is > because, as a consequence of changing the VCPU affinity, CPU 8 was > chosen as the basis for the comparison, and therefore considered idle > (its bit gets unconditionally set in the bitmask representing the idle > CPUs). 28004(...) line means the VCPU is woken up and queued on CPU 8's > runq, where it waits for a context switch or a migration, in order to > be able to execute. > > This change fixes things by only considering the "guessed" CPU idle if > the VCPU in question is both running there and is its only runnable > VCPU. > > Signed-off-by: Dario Faggioli Acked-by: George Dunlap > --- > Changes from v2: > * Use `vc->processor' instead of curr_on_cpu() for determining whether > or not vc is current on cpu, as suggested during review. > * Fixed IS_IDLE_RUNQ() macro in case runq is empty. > * Ditched the variable renaming, as requested during review. > > diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c > --- a/xen/common/sched_credit.c > +++ b/xen/common/sched_credit.c > @@ -59,6 +59,9 @@ > #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv) > #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv) > #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq)) > +/* Is the first element of _cpu's runq its idle vcpu? */ > +#define IS_RUNQ_IDLE(_cpu) (list_empty(RUNQ(_cpu)) || \ > + is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu)) > > > /* > @@ -478,9 +481,14 @@ static int > * distinct cores first and guarantees we don't do something stupid > * like run two VCPUs on co-hyperthreads while there are idle cores > * or sockets. > + * > + * Notice that, when computing the "idleness" of cpu, we may want to > + * discount vc. That is, iff vc is the currently running and the only > + * runnable vcpu on cpu, we add cpu to the idlers. > */ > cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers); > - cpumask_set_cpu(cpu, &idlers); > + if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) ) > + cpumask_set_cpu(cpu, &idlers); > cpumask_and(&cpus, &cpus, &idlers); > cpumask_clear_cpu(cpu, &cpus); > >