[PATCH] SEDF: avoid gathering vCPU-s on pCPU0

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
@ 2013-03-01 15:35 Jan Beulich
  2013-03-01 15:54 ` Keir Fraser
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jan Beulich @ 2013-03-01 15:35 UTC (permalink / raw)
  To: xen-devel

[-- Attachment #1: Type: text/plain, Size: 2233 bytes --]

The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc

     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }

 /*
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     vcpu_schedule_lock_irq(v);

     cpumask_copy(v->cpu_affinity, affinity);
-    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
+    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
+         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
         set_bit(_VPF_migrating, &v->pause_flags);

     vcpu_schedule_unlock_irq(v);

[-- Attachment #2: sedf-spread-load.patch --]
[-- Type: text/plain, Size: 2268 bytes --]

SEDF: avoid gathering vCPU-s on pCPU0

The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.

So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.

However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.

Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/common/sched_sedf.c
+++ b/xen/common/sched_sedf.c
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc

     online = cpupool_scheduler_cpumask(v->domain->cpupool);
     cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
 }

 /*
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
     vcpu_schedule_lock_irq(v);

     cpumask_copy(v->cpu_affinity, affinity);
-    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
+    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
+         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
         set_bit(_VPF_migrating, &v->pause_flags);

     vcpu_schedule_unlock_irq(v);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-01 15:35 [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Jan Beulich
@ 2013-03-01 15:54 ` Keir Fraser
  2013-03-02  3:44 ` Ian Campbell
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Keir Fraser @ 2013-03-01 15:54 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 01/03/2013 15:35, "Jan Beulich" <JBeulich@suse.com> wrote:

> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: Any vCPU using
> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
> guests) ends up on pCPU0 after that call. Obviously, running all PV
> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
> guests rather sooner than later.
> 
> So the main thing that was clearly wrong (and bogus from the beginning)
> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
> by a construct that prefers to put back the vCPU on the pCPU that it
> got launched on.
> 
> However, there's one more glitch: When reducing the affinity of a vCPU
> temporarily, and then widening it again to a set that includes the pCPU
> that the vCPU was last running on, the generic scheduler code would not
> force a migration of that vCPU, and hence it would forever stay on the
> pCPU it last ran on. Since that can again create a load imbalance, the
> SEDF scheduler wants a migration to happen regardless of it being
> apparently unnecessary.
> 
> Of course, an alternative to checking for SEDF explicitly in
> vcpu_set_affinity() would be to introduce a flags field in struct
> scheduler, and have SEDF set a "always-migrate-on-affinity-change"
> flag.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Keir Fraser <keir@xen.org>

> --- a/xen/common/sched_sedf.c
> +++ b/xen/common/sched_sedf.c
> @@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc
>  
>      online = cpupool_scheduler_cpumask(v->domain->cpupool);
>      cpumask_and(&online_affinity, v->cpu_affinity, online);
> -    return cpumask_first(&online_affinity);
> +    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
> +                         &online_affinity);
>  }
>  
>  /*
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -605,7 +605,8 @@ int vcpu_set_affinity(struct vcpu *v, co
>      vcpu_schedule_lock_irq(v);
>  
>      cpumask_copy(v->cpu_affinity, affinity);
> -    if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
> +    if ( VCPU2OP(v)->sched_id == XEN_SCHEDULER_SEDF ||
> +         !cpumask_test_cpu(v->processor, v->cpu_affinity) )
>          set_bit(_VPF_migrating, &v->pause_flags);
>  
>      vcpu_schedule_unlock_irq(v);
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-01 15:35 [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Jan Beulich
  2013-03-01 15:54 ` Keir Fraser
@ 2013-03-02  3:44 ` Ian Campbell
  2013-03-04  7:57   ` Jan Beulich
  2013-03-04  6:48 ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Juergen Gross
  2013-03-04 11:22 ` George Dunlap
  3 siblings, 1 reply; 13+ messages in thread
From: Ian Campbell @ 2013-03-02  3:44 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 2013-03-01 at 15:35 +0000, Jan Beulich wrote:
> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: Any vCPU using
> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
> guests) ends up on pCPU0 after that call. Obviously, running all PV
> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
> guests rather sooner than later.
> 
> So the main thing that was clearly wrong (and bogus from the beginning)
> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
> by a construct that prefers to put back the vCPU on the pCPU that it
> got launched on.
> 

Is this change at all related to the long standing failure to boot with
sedf in the test harnesss?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-02  3:44 ` Ian Campbell
@ 2013-03-04  7:57   ` Jan Beulich
  2013-03-05 15:54     ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages] Ian Jackson
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2013-03-04  7:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

>>> On 02.03.13 at 04:44, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Fri, 2013-03-01 at 15:35 +0000, Jan Beulich wrote:
>> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
>> incompatible with the SEDF scheduler: Any vCPU using
>> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
>> guests) ends up on pCPU0 after that call. Obviously, running all PV
>> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
>> guests rather sooner than later.
>> 
>> So the main thing that was clearly wrong (and bogus from the beginning)
>> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
>> by a construct that prefers to put back the vCPU on the pCPU that it
>> got launched on.
>> 
> 
> Is this change at all related to the long standing failure to boot with
> sedf in the test harnesss?

Yes, very likely. It was looking at the logs (where all Dom0 vCPU-s
were similarly running on pCPU0) plus immediately seeing a hang
when trying it out that prompted me to finally look into the issue.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages]
  2013-03-04  7:57   ` Jan Beulich
@ 2013-03-05 15:54     ` Ian Jackson
  2013-03-05 16:11       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Jackson @ 2013-03-05 15:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Ian Campbell, xen-devel

Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
> On 02.03.13 at 04:44, Ian Campbell <ian.campbell@citrix.com> wrote:
> > Is this change at all related to the long standing failure to boot with
> > sedf in the test harnesss?
> 
> Yes, very likely. It was looking at the logs (where all Dom0 vCPU-s
> were similarly running on pCPU0) plus immediately seeing a hang
> when trying it out that prompted me to finally look into the issue.

Good, but...

Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
...
> So when a vCPU narrows its affinity (in the extreme to a mask
> with just one bit set) and then widens it again, it would
> nevertheless remain on the pCPU it was formerly restricted to.
> When (perhaps much later) a second and then a third vCPU
> do the same, they may all end up running on the same pCPU,
> i.e. we'd get back to the problem addressed by the first half of
> the fix here.

I don't understand why having every VCPU in the whole system running
on a single PCPU would result in a complete hang.  Surely unless we
have many VCPUs all piling on spinlocks there will not be that many
runnable VCPUs ?

Or is this symptom due to the lack of PV spinlocks ?

Ian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages]
  2013-03-05 15:54     ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages] Ian Jackson
@ 2013-03-05 16:11       ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2013-03-05 16:11 UTC (permalink / raw)
  To: Ian Jackson; +Cc: George Dunlap, Ian Campbell, xen-devel

>>> On 05.03.13 at 16:54, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Jan Beulich writes ("Re: [Xen-devel] [PATCH] SEDF: avoid gathering vCPU-s on pCPU0"):
>> So when a vCPU narrows its affinity (in the extreme to a mask
>> with just one bit set) and then widens it again, it would
>> nevertheless remain on the pCPU it was formerly restricted to.
>> When (perhaps much later) a second and then a third vCPU
>> do the same, they may all end up running on the same pCPU,
>> i.e. we'd get back to the problem addressed by the first half of
>> the fix here.
> 
> I don't understand why having every VCPU in the whole system running
> on a single PCPU would result in a complete hang.  Surely unless we
> have many VCPUs all piling on spinlocks there will not be that many
> runnable VCPUs ?
> 
> Or is this symptom due to the lack of PV spinlocks ?

The case that I observed here was all CPUs piling up in
stop_machine logic - that's effectively an open coded
spinning operation. But clearly non-ticket, non-pv spin locks
can result in similar live lock effects (as soon as the locking
rate is higher than what the single CPU can handle in terms
of executing locked region code).

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-01 15:35 [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Jan Beulich
  2013-03-01 15:54 ` Keir Fraser
  2013-03-02  3:44 ` Ian Campbell
@ 2013-03-04  6:48 ` Juergen Gross
  2013-03-04  7:59   ` Jan Beulich
  2013-03-04 11:22 ` George Dunlap
  3 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2013-03-04  6:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 01.03.2013 16:35, Jan Beulich wrote:
> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: Any vCPU using
> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
> guests) ends up on pCPU0 after that call. Obviously, running all PV
> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
> guests rather sooner than later.
>
> So the main thing that was clearly wrong (and bogus from the beginning)
> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
> by a construct that prefers to put back the vCPU on the pCPU that it
> got launched on.
>
> However, there's one more glitch: When reducing the affinity of a vCPU
> temporarily, and then widening it again to a set that includes the pCPU
> that the vCPU was last running on, the generic scheduler code would not
> force a migration of that vCPU, and hence it would forever stay on the
> pCPU it last ran on. Since that can again create a load imbalance, the
> SEDF scheduler wants a migration to happen regardless of it being
> apparently unnecessary.
>
> Of course, an alternative to checking for SEDF explicitly in
> vcpu_set_affinity() would be to introduce a flags field in struct
> scheduler, and have SEDF set a "always-migrate-on-affinity-change"
> flag.

Or something like this? I don't like the test for sedf in schedule.c


diff -r 65105a4a8c7a xen/common/sched_sedf.c
--- a/xen/common/sched_sedf.c   Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/common/sched_sedf.c   Mon Mar 04 07:35:53 2013 +0100
@@ -397,7 +397,8 @@ static int sedf_pick_cpu(const struct sc

      online = cpupool_scheduler_cpumask(v->domain->cpupool);
      cpumask_and(&online_affinity, v->cpu_affinity, online);
-    return cpumask_first(&online_affinity);
+    return cpumask_cycle(v->vcpu_id % cpumask_weight(&online_affinity) - 1,
+                         &online_affinity);
  }

  /*
@@ -1503,6 +1504,11 @@ out:
      return rc;
  }

+static void sedf_set_affinity(const struct scheduler *ops, struct vcpu *v)
+{
+    set_bit(_VPF_migrating, &v->pause_flags);
+}
+
  static struct sedf_priv_info _sedf_priv;

  const struct scheduler sched_sedf_def = {
@@ -1532,6 +1538,7 @@ const struct scheduler sched_sedf_def =
      .sleep          = sedf_sleep,
      .wake           = sedf_wake,
      .adjust         = sedf_adjust,
+    .set_affinity   = sedf_set_affinity,
  };

  /*
diff -r 65105a4a8c7a xen/common/schedule.c
--- a/xen/common/schedule.c     Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/common/schedule.c     Mon Mar 04 07:35:53 2013 +0100
@@ -615,6 +615,8 @@ int vcpu_set_affinity(struct vcpu *v, co
      cpumask_copy(v->cpu_affinity, affinity);
      if ( !cpumask_test_cpu(v->processor, v->cpu_affinity) )
          set_bit(_VPF_migrating, &v->pause_flags);
+    if ( VCPU2OP(v)->set_affinity )
+        SCHED_OP(VCPU2OP(v), set_affinity, v);

      vcpu_schedule_unlock_irq(v);

diff -r 65105a4a8c7a xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h        Fri Mar 01 16:59:49 2013 +0100
+++ b/xen/include/xen/sched-if.h        Mon Mar 04 07:35:53 2013 +0100
@@ -180,6 +180,7 @@ struct scheduler {
      int          (*pick_cpu)       (const struct scheduler *, struct vcpu *);
      void         (*migrate)        (const struct scheduler *, struct vcpu *,
                                      unsigned int);
+    void         (*set_affinity)   (const struct scheduler *, struct vcpu *);
      int          (*adjust)         (const struct scheduler *, struct domain *,
                                      struct xen_domctl_scheduler_op *);
      int          (*adjust_global)  (const struct scheduler *,



-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                   Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-04  6:48 ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Juergen Gross
@ 2013-03-04  7:59   ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2013-03-04  7:59 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel

>>> On 04.03.13 at 07:48, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 01.03.2013 16:35, Jan Beulich wrote:
>> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
>> incompatible with the SEDF scheduler: Any vCPU using
>> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
>> guests) ends up on pCPU0 after that call. Obviously, running all PV
>> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
>> guests rather sooner than later.
>>
>> So the main thing that was clearly wrong (and bogus from the beginning)
>> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
>> by a construct that prefers to put back the vCPU on the pCPU that it
>> got launched on.
>>
>> However, there's one more glitch: When reducing the affinity of a vCPU
>> temporarily, and then widening it again to a set that includes the pCPU
>> that the vCPU was last running on, the generic scheduler code would not
>> force a migration of that vCPU, and hence it would forever stay on the
>> pCPU it last ran on. Since that can again create a load imbalance, the
>> SEDF scheduler wants a migration to happen regardless of it being
>> apparently unnecessary.
>>
>> Of course, an alternative to checking for SEDF explicitly in
>> vcpu_set_affinity() would be to introduce a flags field in struct
>> scheduler, and have SEDF set a "always-migrate-on-affinity-change"
>> flag.
> 
> Or something like this? I don't like the test for sedf in schedule.c

As said - to me a flag would seem more appropriate than another
indirect call. But for now I'll commit as is - feel free to submit an
incremental patch.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-01 15:35 [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Jan Beulich
                   ` (2 preceding siblings ...)
  2013-03-04  6:48 ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Juergen Gross
@ 2013-03-04 11:22 ` George Dunlap
  2013-03-04 11:37   ` Jan Beulich
  3 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2013-03-04 11:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, Mar 1, 2013 at 3:35 PM, Jan Beulich <JBeulich@suse.com> wrote:
> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
> incompatible with the SEDF scheduler: Any vCPU using
> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
> guests) ends up on pCPU0 after that call. Obviously, running all PV
> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
> guests rather sooner than later.
>
> So the main thing that was clearly wrong (and bogus from the beginning)
> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
> by a construct that prefers to put back the vCPU on the pCPU that it
> got launched on.
>
> However, there's one more glitch: When reducing the affinity of a vCPU
> temporarily, and then widening it again to a set that includes the pCPU
> that the vCPU was last running on, the generic scheduler code would not
> force a migration of that vCPU, and hence it would forever stay on the
> pCPU it last ran on. Since that can again create a load imbalance, the
> SEDF scheduler wants a migration to happen regardless of it being
> apparently unnecessary.

I'm not quite understanding what this is about -- why is this
necessary for sedf but not for the other schedulers?

 -George

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-04 11:22 ` George Dunlap
@ 2013-03-04 11:37   ` Jan Beulich
  2013-03-04 12:11     ` George Dunlap
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2013-03-04 11:37 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

>>> On 04.03.13 at 12:22, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Fri, Mar 1, 2013 at 3:35 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
>> incompatible with the SEDF scheduler: Any vCPU using
>> VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
>> guests) ends up on pCPU0 after that call. Obviously, running all PV
>> guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
>> guests rather sooner than later.
>>
>> So the main thing that was clearly wrong (and bogus from the beginning)
>> was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
>> by a construct that prefers to put back the vCPU on the pCPU that it
>> got launched on.
>>
>> However, there's one more glitch: When reducing the affinity of a vCPU
>> temporarily, and then widening it again to a set that includes the pCPU
>> that the vCPU was last running on, the generic scheduler code would not
>> force a migration of that vCPU, and hence it would forever stay on the
>> pCPU it last ran on. Since that can again create a load imbalance, the
>> SEDF scheduler wants a migration to happen regardless of it being
>> apparently unnecessary.
> 
> I'm not quite understanding what this is about -- why is this
> necessary for sedf but not for the other schedulers?

The problem with sedf is that it doesn't do any balancing, and
never moves a vCPU to a different pCPU unless the affinity
mask changes in a way that makes this necessary (in which
case it's the generic scheduler code that invokes to relocation).

So when a vCPU narrows its affinity (in the extreme to a mask
with just one bit set) and then widens it again, it would
nevertheless remain on the pCPU it was formerly restricted to.
When (perhaps much later) a second and then a third vCPU
do the same, they may all end up running on the same pCPU,
i.e. we'd get back to the problem addressed by the first half of
the fix here.

While you might argue that affinity changes ought to be done
in a "sensible" way, I think you still realize that there is behavior
here that one wouldn't be able to control without explicitly
adding an intermediate step when doing this for DomU-s. And
you should also keep in mind that there are certain things that
Dom0 needs to change its vCPU affinities for (pv-ops doesn't
do that, which is while certain things there just can't work);
this is how I noticed in the first place that the second half of
the fix is necessary.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-04 11:37   ` Jan Beulich
@ 2013-03-04 12:11     ` George Dunlap
  2013-03-04 12:23       ` Jan Beulich
  0 siblings, 1 reply; 13+ messages in thread
From: George Dunlap @ 2013-03-04 12:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 04/03/13 11:37, Jan Beulich wrote:
> While you might argue that affinity changes ought to be done
> in a "sensible" way, I think you still realize that there is behavior
> here that one wouldn't be able to control without explicitly
> adding an intermediate step when doing this for DomU-s. And
> you should also keep in mind that there are certain things that
> Dom0 needs to change its vCPU affinities for (pv-ops doesn't
> do that, which is while certain things there just can't work);
> this is how I noticed in the first place that the second half of
> the fix is necessary.

Oh right -- so the problem is that when you *expand* the mask, you want 
the scheduler to take another look at where might be a good place to run 
the vcpu given the changed constraints.

It's probably not a bad idea to just extend that idea to all the 
schedulers really.  It's not like we expect people to be changing the 
affinity masks hundreds of times a second.

  -George

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-04 12:11     ` George Dunlap
@ 2013-03-04 12:23       ` Jan Beulich
  2013-03-04 12:15         ` George Dunlap
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2013-03-04 12:23 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

>>> On 04.03.13 at 13:11, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> It's probably not a bad idea to just extend that idea to all the 
> schedulers really.  It's not like we expect people to be changing the 
> affinity masks hundreds of times a second.

Yeah, why not - would (minimally) reduce code size and remove the
SEDF-specific check from the generic scheduler code again.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] SEDF: avoid gathering vCPU-s on pCPU0
  2013-03-04 12:23       ` Jan Beulich
@ 2013-03-04 12:15         ` George Dunlap
  0 siblings, 0 replies; 13+ messages in thread
From: George Dunlap @ 2013-03-04 12:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On 04/03/13 12:23, Jan Beulich wrote:
>>>> On 04.03.13 at 13:11, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> It's probably not a bad idea to just extend that idea to all the
>> schedulers really.  It's not like we expect people to be changing the
>> affinity masks hundreds of times a second.
> Yeah, why not - would (minimally) reduce code size and remove the
> SEDF-specific check from the generic scheduler code again.

I'll whip up a patch...
  -G

>
> Jan
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-03-05 16:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-01 15:35 [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Jan Beulich
2013-03-01 15:54 ` Keir Fraser
2013-03-02  3:44 ` Ian Campbell
2013-03-04  7:57   ` Jan Beulich
2013-03-05 15:54     ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 [and 1 more messages] Ian Jackson
2013-03-05 16:11       ` Jan Beulich
2013-03-04  6:48 ` [PATCH] SEDF: avoid gathering vCPU-s on pCPU0 Juergen Gross
2013-03-04  7:59   ` Jan Beulich
2013-03-04 11:22 ` George Dunlap
2013-03-04 11:37   ` Jan Beulich
2013-03-04 12:11     ` George Dunlap
2013-03-04 12:23       ` Jan Beulich
2013-03-04 12:15         ` George Dunlap

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.