From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 09 Feb 2011 15:21:01 +0100
Message-ID: <4D52A2CD.9090507@ts.fujitsu.com>
In-Reply-To: <4D529BD9.5050200@amd.com>
Andre, George,
One interesting observation: I think the problem always occurred when
a new cpupool was created and the first cpu was moved to it.
I think my previous assumption regarding the master_ticker was not too bad.
Somehow the master_ticker of the new cpupool seems to become active
before the scheduler is properly initialized. This could happen if
enough time passes between alloc_pdata for the cpu being moved and the
critical section in schedule_cpu_switch().
The solution should be to activate the timers only once the scheduler is
ready for them.
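To illustrate the idea, here is a minimal sketch in plain C. It is not Xen
code: struct pcpu_sketch, ticker_start(), ticker_stop(), ticker_fire() and
cpu_switch_sketch() are hypothetical names made up for this example, and the
real alloc_pdata / schedule_cpu_switch() paths are only hinted at in comments.
It merely shows the ordering I have in mind: the ticker refuses to arm until
the scheduler data is marked ready, and the cpu switch stops it before
touching that data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cpu state for illustration; not Xen's real structures. */
struct pcpu_sketch {
    bool sched_ready;     /* scheduler per-cpu data is fully set up */
    bool ticker_active;   /* periodic ticker is armed */
    unsigned int ticks;   /* number of ticker callbacks actually handled */
};

/* Arm the ticker only if the scheduler is ready. Returns true if armed. */
static bool ticker_start(struct pcpu_sketch *pc)
{
    if (!pc->sched_ready)
        return false;
    pc->ticker_active = true;
    return true;
}

/* Disarm the ticker; safe to call when it is already stopped. */
static void ticker_stop(struct pcpu_sketch *pc)
{
    pc->ticker_active = false;
}

/* Timer callback: does nothing unless the ticker is armed. */
static void ticker_fire(struct pcpu_sketch *pc)
{
    if (!pc->ticker_active)
        return;
    /* Invariant we want: an armed ticker never sees an unready scheduler. */
    assert(pc->sched_ready);
    pc->ticks++;
}

/* Move a cpu to another pool: stop the ticker, swap data, then re-arm. */
static void cpu_switch_sketch(struct pcpu_sketch *pc)
{
    ticker_stop(pc);
    pc->sched_ready = false;  /* old scheduler data is going away */
    /* ... the new scheduler's per-cpu data would be installed here ... */
    pc->sched_ready = true;
    ticker_start(pc);         /* only now may the ticker run again */
}
```

With this ordering a ticker_fire() that arrives between teardown and setup
is simply ignored instead of running into half-initialized scheduler data.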
George, do you think the master_ticker should be stopped in suspend_ticker
as well? I still see potential problems for entering deep C-States. I think
I'll prepare a patch which will keep the master_ticker active for the
C-State case and migrate it for the schedule_cpu_switch() case.
Juergen
On 02/09/11 14:51, Andre Przywara wrote:
> George Dunlap wrote:
>> <George.Dunlap@eu.citrix.com> wrote:
>>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara
>>> <andre.przywara@amd.com> wrote:
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29
>>> Interesting -- what seems to happen here is that as cpus are disabled,
>>> vcpus are "shovelled" in an accumulative fashion from one cpu to the
>>> next:
>>> * v18,34,42 start on cpu 24.
>>> * When 24 is brought down, they're all migrated to 25; then when 25 is
>>> brought down, to 26, then to 27
>>> * v24 is running on cpu 27, so when 27 is brought down, v24 is added
>>> to the mix
>>> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto
>>> cpu 29.
>>>
>>> While that behavior may not be ideal, it should certainly be bug-free.
>>>
>>> Another interesting thing to note is that the bug happened on pcpu 32,
>>> but there were no advertised migrations from that cpu.
>>>
>>> Andre, can you fold the attached patch into your testing?
> Sorry, but that bug (and its output) didn't trigger on two tries.
> Instead I now saw two occurrences of the "migration failed, must retry
> later" message. Interestingly enough, it does not seem to be fatal. The
> first time it triggers, the numa-split even completes; then, after I roll
> it back and repeat it, the message shows again, but it crashes later on
> that old BUG_ON().
>
> See the attached log for more details.
>
> Thanks for the try, anyway.
>
> Regards,
> Andre.
>
>
>>>
>>> Thanks for all your work on this.
> I am glad for all your help. I am only starting to really understand the
> scheduler, so your support is much appreciated.
>
>>>
>>> -George
>>>
>
>
--
Juergen Gross Principal Developer Operating Systems
TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html