From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: Hypervisor crash(!) on xl cpupool-numa-split
Date: Wed, 09 Feb 2011 15:21:01 +0100
Message-ID: <4D52A2CD.9090507@ts.fujitsu.com>
In-Reply-To: <4D529BD9.5050200@amd.com>
Andre, George,
What seems interesting: I think the problem always occurred when
a new cpupool was created and the first cpu was moved to it.
I think my previous assumption regarding the master_ticker was not too bad.
Somehow the master_ticker of the new cpupool seems to become active
before the scheduler is properly initialized. This could happen if
enough time passes between alloc_pdata for the cpu to be moved and the
critical section in schedule_cpu_switch().
The solution should be to activate the timers only if the scheduler is
ready for them.
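
To illustrate the idea of arming a timer only once the scheduler is ready, here is a minimal sketch in plain C. It is not Xen code: `toy_sched`, `toy_ticker` and all function names are hypothetical stand-ins invented for this example, and the real master_ticker handling in the credit scheduler may look quite different. The assert in the callback plays the role of a BUG_ON that would fire if a tick arrived before initialization had finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT the Xen structures. */
struct toy_sched {
    bool ready;          /* set once the scheduler data is fully set up */
    int  ticks_handled;  /* number of ticks processed so far */
};

struct toy_ticker {
    bool armed;              /* true once the timer has been activated */
    struct toy_sched *sched; /* scheduler this ticker belongs to */
};

/* Arm the ticker only if its scheduler is ready.
 * Returns true if the ticker was armed, false if it was too early. */
static bool toy_ticker_start(struct toy_ticker *t, struct toy_sched *s)
{
    if (s == NULL || !s->ready)
        return false;        /* too early: leave the timer inactive */
    t->sched = s;
    t->armed = true;
    return true;
}

/* Timer callback: does nothing unless the ticker has been armed. */
static void toy_ticker_fire(struct toy_ticker *t)
{
    if (!t->armed)
        return;
    /* Analogue of a BUG_ON: an armed ticker implies a ready scheduler. */
    assert(t->sched != NULL && t->sched->ready);
    t->sched->ticks_handled++;
}

/* End of the (hypothetical) switch: mark the scheduler as initialized
 * first, and only then activate the timer. */
static void toy_sched_switch_done(struct toy_ticker *t, struct toy_sched *s)
{
    s->ready = true;
    (void)toy_ticker_start(t, s);
}
```

In this sketch a tick that arrives between allocation and the end of the switch is simply ignored, because the ticker is not armed until `toy_sched_switch_done()` has run.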
George, do you think the master_ticker should be stopped in suspend_ticker
as well? I still see potential problems for entering deep C-States. I think
I'll prepare a patch which will keep the master_ticker active for the
C-State case and migrate it for the schedule_cpu_switch() case.
Juergen
On 02/09/11 14:51, Andre Przywara wrote:
> George Dunlap wrote:
>> <George.Dunlap@eu.citrix.com> wrote:
>>> On Tue, Feb 8, 2011 at 4:33 PM, Andre Przywara
>>> <andre.przywara@amd.com> wrote:
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 24
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v34 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 25
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 26
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v24 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v42 from cpu 27
>>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v18 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v25 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v32 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v39 from cpu 28
>>>> (XEN) cpu_disable_scheduler: Migrating d0v3 from cpu 29
>>> Interesting -- what seems to happen here is that as cpus are disabled,
>>> vcpus are "shovelled" in an accumulative fashion from one cpu to the
>>> next:
>>> * v18,34,42 start on cpu 24.
>>> * When 24 is brought down, they're all migrated to 25; then when 25 is
>>> brought down, to 26, then to 27
>>> * v24 is running on cpu 27, so when 27 is brought down, v24 is added
>>> to the mix
>>> * v3 is running on cpu 28, so all of them plus v3 are shoveled onto
>>> cpu 29.
>>>
>>> While that behavior may not be ideal, it should certainly be bug-free.
>>>
>>> Another interesting thing to note is that the bug happened on pcpu 32,
>>> but there were no advertised migrations from that cpu.
>>>
>>> Andre, can you fold the attached patch into your testing?
> Sorry, but that bug (and its output) didn't trigger on two tries.
> Instead I now saw two occasions of the "migration failed, must retry
> later" message. Interestingly enough, it does not seem to be fatal. The
> first time it triggers, the numa-split even completes; then after I roll
> it back and repeat it, it shows again, but crashes later on that old
> BUG_ON().
>
> See the attached log for more details.
>
> Thanks for the try, anyway.
>
> Regards,
> Andre.
>
>
>>>
>>> Thanks for all your work on this.
> I am glad for all your help. I am only starting to really understand the
> scheduler, so your support is much appreciated.
>
>>>
>>> -George
>>>
>
>
--
Juergen Gross Principal Developer Operating Systems
TSP ES&S SWE OS6 Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html