All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Keir Fraser <keir@xen.org>,
	Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 02 Feb 2011 11:05:48 +0100	[thread overview]
Message-ID: <4D492C7C.5040005@ts.fujitsu.com> (raw)
In-Reply-To: <4D491AB7.40204@ts.fujitsu.com>

[-- Attachment #1: Type: text/plain, Size: 4658 bytes --]

Hi Andre,

could you try the attached patch?
It should verify if your problems are due to the master ticker
kicking in at a time when the cpu is already gone from the cpupool.

I'm not sure if the patch is complete - Disabling the master ticker
in csched_tick_suspend might lead to problems with cstates. The
functionality is different, at least.

George, do you think this is correct?


Juergen

On 02/02/11 09:49, Juergen Gross wrote:
> On 02/02/11 07:27, Juergen Gross wrote:
>> On 02/01/11 17:32, Andre Przywara wrote:
>>> Hi folks,
>>>
>>> I asked Stephan Diestelhorst for help and after I convinced him that
>>> removing credit and making SEDF the default again is not an option he
>>> worked together with me on that ;-) Many thanks for that!
>>> We haven't come to a final solution but could gather some debug data.
>>> I will simply dump some data here, maybe somebody has got a clue. We
>>> will work further on this tomorrow.
>>>
>>> First I replaced the BUG_ON with some printks to get some insight:
>>> (XEN) sdom->active_vcpu_count: 18
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 4096, weight_total: 4096
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) Xen BUG at sched_credit.c:591
>>> (XEN) ----[ Xen-4.1.0-rc2-pre x86_64 debug=y Not tainted ]----
>>>
>>> So that one shows that the number of VCPUs is not up-to-date with the
>>> computed weight sum, we have seen a difference of one or two VCPUs (in
>>> this case here the weight has been computed from 16 VCPUs). Also it
>>> shows that the assertion kicks in in the first iteration of the loop,
>>> where weight_left and weight_total are still equal.
>>>
>>> So I additionally instrumented alloc_pdata and free_pdata, the
>>> unprefixed lines come from a shell script mimicking the functionality of
>>> cpupool-numa-split.
>>> ------------
>>> Removing CPUs from Pool 0
>>> Creating new pool
>>> Using config file "cpupool.test"
>>> cpupool name: Pool-node6
>>> scheduler: credit
>>> number of cpus: 1
>>> (XEN) adding CPU 36, now 1 CPUs
>>> (XEN) removing CPU 36, remaining: 17
>>> Populating new pool
>>> (XEN) sdom->active_vcpu_count: 9
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 2048, weight_total: 2048
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) adding CPU 37, now 2 CPUs
>>> (XEN) removing CPU 37, remaining: 16
>>> (XEN) adding CPU 38, now 3 CPUs
>>> (XEN) removing CPU 38, remaining: 15
>>> (XEN) adding CPU 39, now 4 CPUs
>>> (XEN) removing CPU 39, remaining: 14
>>> (XEN) adding CPU 40, now 5 CPUs
>>> (XEN) removing CPU 40, remaining: 13
>>> (XEN) sdom->active_vcpu_count: 17
>>> (XEN) sdom->weight: 256
>>> (XEN) weight_left: 4096, weight_total: 4096
>>> (XEN) credit_balance: 0, credit_xtra: 0, credit_cap: 0
>>> (XEN) adding CPU 41, now 6 CPUs
>>> (XEN) removing CPU 41, remaining: 12
>>> ...
>>> Two thing startled me:
>>> 1) There is quite some between the "Removing CPUs" message from the
>>> script and the actual HV printk showing it's done, why is that not
>>> synchronous?
>>
>> Removing cpus from Pool-0 requires no switching of the scheduler, so you
>> see no calls of alloc/free_pdata here.
>>
>> > Looking at the code it shows that
>>> __csched_vcpu_acct_start() is eventually triggered by a timer, shouldn't
>>> that be triggered synchronously by add/removal events?
>>
>> The vcpus are not moved explicitly, they are migrated by the normal
>> scheduler mechanisms, same as for vcpu-pin.
>>
>>> 2) It clearly shows that each CPU gets added to the new pool _before_ it
>>> gets removed from the old one (Pool-0), isn't that violating the "only
>>> one pool per CPU" rule? Even it that is fine for a short period of time,
>>> maybe the timer kicks in in this very moment resulting in violated
>>> invariants?
>>
>> The sequence you are seeing seems to be okay. The alloc_pdata for the
>> new pool
>> is called before the free_pdata for the old pool.
>>
>> And the timer is not relevant, as only the idle vcpu should be running
>> on the
>> moving cpu and the accounting stuff is never called during idle.
>
> Uhh, this could be wrong!
> The normal ticker doesn't call accounting in idle and it is stopped during
> cpu move. The master_ticker is handled wrong, perhaps. I'll check this and
> prepare a patch if necessary.
>
>
> Juergen
>


-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 2720 bytes --]

diff -r 76e1f7018b01 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Mon Jan 31 08:10:00 2011 +0100
+++ b/xen/common/sched_credit.c Wed Feb 02 10:59:44 2011 +0100
@@ -50,6 +50,8 @@
     (CSCHED_CREDITS_PER_MSEC * CSCHED_MSECS_PER_TSLICE)
 #define CSCHED_CREDITS_PER_ACCT     \
     (CSCHED_CREDITS_PER_MSEC * CSCHED_MSECS_PER_TICK * CSCHED_TICKS_PER_ACCT)
+#define CSCHED_ACCT_TSLICE          \
+    (MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT)
 
 
 /*
@@ -320,6 +322,7 @@ csched_free_pdata(const struct scheduler
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_pcpu *spc = pcpu;
     unsigned long flags;
+    uint64_t now = NOW();
 
     if ( spc == NULL )
         return;
@@ -334,6 +337,8 @@ csched_free_pdata(const struct scheduler
     {
         prv->master = first_cpu(prv->cpus);
         migrate_timer(&prv->master_ticker, prv->master);
+        set_timer(&prv->master_ticker, now + CSCHED_ACCT_TSLICE
+            - now % CSCHED_ACCT_TSLICE);
     }
     kill_timer(&spc->ticker);
     if ( prv->ncpus == 0 )
@@ -367,8 +372,7 @@ csched_alloc_pdata(const struct schedule
     {
         prv->master = cpu;
         init_timer(&prv->master_ticker, csched_acct, prv, cpu);
-        set_timer(&prv->master_ticker, NOW() +
-                  MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT);
+        set_timer(&prv->master_ticker, NOW() + CSCHED_ACCT_TSLICE);
     }
 
     init_timer(&spc->ticker, csched_tick, (void *)(unsigned long)cpu, cpu);
@@ -1138,8 +1142,7 @@ csched_acct(void* dummy)
     prv->runq_sort++;
 
 out:
-    set_timer( &prv->master_ticker, NOW() +
-            MILLISECS(CSCHED_MSECS_PER_TICK) * CSCHED_TICKS_PER_ACCT );
+    set_timer( &prv->master_ticker, NOW() + CSCHED_ACCT_TSLICE );
 }
 
 static void
@@ -1531,22 +1534,31 @@ csched_deinit(const struct scheduler *op
 
 static void csched_tick_suspend(const struct scheduler *ops, unsigned int cpu)
 {
+    struct csched_private *prv;
     struct csched_pcpu *spc;
 
+    prv = CSCHED_PRIV(ops);
     spc = CSCHED_PCPU(cpu);
 
     stop_timer(&spc->ticker);
+    if ( prv->master == cpu )
+        stop_timer(&prv->master_ticker);
 }
 
 static void csched_tick_resume(const struct scheduler *ops, unsigned int cpu)
 {
+    struct csched_private *prv;
     struct csched_pcpu *spc;
     uint64_t now = NOW();
 
+    prv = CSCHED_PRIV(ops);
     spc = CSCHED_PCPU(cpu);
 
     set_timer(&spc->ticker, now + MILLISECS(CSCHED_MSECS_PER_TICK)
             - now % MILLISECS(CSCHED_MSECS_PER_TICK) );
+    if ( prv->master == cpu )
+        set_timer(&prv->master_ticker, now + CSCHED_ACCT_TSLICE
+            - now % CSCHED_ACCT_TSLICE);
 }
 
 static struct csched_private _csched_priv;

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

  reply	other threads:[~2011-02-02 10:05 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-27 23:18 Hypervisor crash(!) on xl cpupool-numa-split Andre Przywara
2011-01-28  6:47 ` Juergen Gross
2011-01-28 11:07   ` Andre Przywara
2011-01-28 11:44     ` Juergen Gross
2011-01-28 13:14       ` Andre Przywara
2011-01-31  7:04         ` Juergen Gross
2011-01-31 14:59           ` Andre Przywara
2011-01-31 15:28             ` George Dunlap
2011-02-01 16:32               ` Andre Przywara
2011-02-02  6:27                 ` Juergen Gross
2011-02-02  8:49                   ` Juergen Gross
2011-02-02 10:05                     ` Juergen Gross [this message]
2011-02-02 10:59                       ` Andre Przywara
2011-02-02 14:39                 ` Stephan Diestelhorst
2011-02-02 15:14                   ` Juergen Gross
2011-02-02 16:01                     ` Stephan Diestelhorst
2011-02-03  5:57                       ` Juergen Gross
2011-02-03  9:18                         ` Juergen Gross
2011-02-04 14:09                           ` Andre Przywara
2011-02-07 12:38                             ` Andre Przywara
2011-02-07 13:32                               ` Juergen Gross
2011-02-07 15:55                                 ` George Dunlap
2011-02-08  5:43                                   ` Juergen Gross
2011-02-08 12:08                                     ` George Dunlap
2011-02-08 12:14                                       ` George Dunlap
2011-02-08 16:33                                         ` Andre Przywara
2011-02-09 12:27                                           ` George Dunlap
2011-02-09 12:27                                             ` George Dunlap
2011-02-09 13:04                                               ` Juergen Gross
2011-02-09 13:39                                                 ` Andre Przywara
2011-02-09 13:51                                               ` Andre Przywara
2011-02-09 14:21                                                 ` Juergen Gross
2011-02-10  6:42                                                   ` Juergen Gross
2011-02-10  9:25                                                     ` Andre Przywara
2011-02-10 14:18                                                       ` Andre Przywara
2011-02-11  6:17                                                         ` Juergen Gross
2011-02-11  7:39                                                           ` Andre Przywara
2011-02-14 17:57                                                             ` George Dunlap
2011-02-15  7:22                                                               ` Juergen Gross
2011-02-16  9:47                                                                 ` Juergen Gross
2011-02-16 13:54                                                                   ` George Dunlap
     [not found]                                                                     ` <4D6237C6.1050206@amd.c om>
2011-02-16 14:11                                                                     ` Juergen Gross
2011-02-16 14:28                                                                       ` Juergen Gross
2011-02-17  0:05                                                                       ` André Przywara
2011-02-17  7:05                                                                     ` Juergen Gross
2011-02-17  9:11                                                                       ` Juergen Gross
2011-02-21 10:00                                                                     ` Andre Przywara
2011-02-21 13:19                                                                       ` Juergen Gross
2011-02-21 14:45                                                                         ` Andre Przywara
2011-02-21 14:50                                                                           ` Juergen Gross
2011-02-08 12:23                                       ` Juergen Gross
2011-01-28 11:13   ` George Dunlap
2011-01-28 13:05     ` Andre Przywara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D492C7C.5040005@ts.fujitsu.com \
    --to=juergen.gross@ts.fujitsu.com \
    --cc=George.Dunlap@eu.citrix.com \
    --cc=Ian.Jackson@eu.citrix.com \
    --cc=andre.przywara@amd.com \
    --cc=keir@xen.org \
    --cc=stephan.diestelhorst@amd.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.