From: Andre Przywara <andre.przywara@amd.com>
To: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Keir Fraser <keir@xen.org>,
"Diestelhorst, Stephan" <Stephan.Diestelhorst@amd.com>
Subject: Re: [PATCH] Avoid race when moving cpu between cpupools
Date: Fri, 25 Feb 2011 15:25:30 +0100
Message-ID: <4D67BBDA.5070603@amd.com>
In-Reply-To: <AANLkTikSiJKLH=ginoEgO4Tx0-Z1AC2bwP4qBDjVSfAg@mail.gmail.com>
George Dunlap wrote:
> Looks good -- thanks Juergen.
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> -George
>
> On Thu, Feb 24, 2011 at 2:08 PM, Andre Przywara <andre.przywara@amd.com> wrote:
>> Juergen Gross wrote:
>>> Moving cpus between cpupools is done under the schedule lock of the moved
>>> cpu.
>>> When checking a cpu being member of a cpupool this must be done with the
>>> lock
>>> of that cpu being held.
>> I have reviewed and tested the patch. It fixes my problem. My script has
>> been running for several hundred iterations without any Xen crash, whereas
>> without the patch the hypervisor crashed mostly at the second iteration.
Juergen,
can you rule out that this code will be triggered on two CPUs trying to
switch to each other? As Stephan pointed out, the code looks as if it
could trigger a dead-lock condition, where:
1) CPU A grabs lock (a) while CPU B grabs lock (b)
2) CPU A tries to grab (b) and CPU B tries to grab (a)
3) both fail and loop to 1)
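
A minimal sketch of the pattern in steps 1) to 3), assuming each CPU
takes its own lock and then only *tries* the other one. This is an
illustration with pthread mutexes, not the code from the patch;
lock_then_try() and its parameters are hypothetical names. Because
nothing ever blocks, two CPUs running it with swapped arguments can keep
failing at step 2) in lockstep, which is strictly a livelock rather than
a classic dead-lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical illustration, not Xen code: take our own lock, try the
 * other one, and start over if the try fails. Returns the number of
 * retries that were needed. On return both locks are held.
 */
static unsigned long lock_then_try(pthread_mutex_t *mine,
                                   pthread_mutex_t *other)
{
    unsigned long retries = 0;

    for (;;) {
        int rc = pthread_mutex_lock(mine);      /* step 1 */
        assert(rc == 0);
        (void)rc;

        if (pthread_mutex_trylock(other) == 0)  /* step 2 */
            return retries;

        rc = pthread_mutex_unlock(mine);        /* step 3: back off */
        assert(rc == 0);
        retries++;
        sched_yield();
    }
}
```

With CPU A calling lock_then_try(&a, &b) and CPU B calling
lock_then_try(&b, &a), nothing in this loop guarantees that either side
ever wins.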
A possible fix would be to introduce some ordering for the locks (for
example by comparing the lock pointers' addresses) and let the "bigger"
pointer yield to the "smaller" one. I am not sure if this is really
necessary, but I now see strange hangs after running the script for a
while (30 minutes to 1 hour).
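
The ordering idea could be sketched roughly as follows. This is only an
illustration with pthread mutexes, not Xen's schedule locks, and all
function names are hypothetical. Note that it uses the simpler variant
of always acquiring the lower-addressed lock first, rather than yielding
inside a retry loop; the ordering rule is the same.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: the lock that must be taken first, by address. */
static pthread_mutex_t *first_by_address(pthread_mutex_t *a,
                                         pthread_mutex_t *b)
{
    /* Compare as integers; "<" on unrelated pointers is not defined. */
    return ((uintptr_t)a <= (uintptr_t)b) ? a : b;
}

/* Hypothetical helper: take both locks, lower address first. */
static void lock_pair_ordered(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_t *first = first_by_address(a, b);
    pthread_mutex_t *second = (first == a) ? b : a;
    int rc = pthread_mutex_lock(first);

    assert(rc == 0);
    if (second != first) {      /* same lock passed twice: lock once */
        rc = pthread_mutex_lock(second);
        assert(rc == 0);
    }
    (void)rc;
}

/* Release order does not matter for correctness. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    int rc = pthread_mutex_unlock(a);

    assert(rc == 0);
    if (a != b) {
        rc = pthread_mutex_unlock(b);
        assert(rc == 0);
    }
    (void)rc;
}
```

Since every caller agrees on which of the two locks comes first, two
CPUs can no longer each hold one lock while waiting for the other.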
Sometimes Dom0 hangs for a while, losing interrupts (sda or eth0) or
getting spurious ones; on two occasions the machine locked up totally.
I am not 100% sure whether this is CPUpools related, but I put some load
on Dom0 (without messing with CPUpools) for the whole night and it ran fine.
Sorry for this :-(
I will try to further isolate this.
Anyway, it works much better with the fix than without and I will try to
trigger this with the "reduce number of Dom0 vCPUs" patch.
Regards,
Andre.
>>
>> Thanks Juergen and George for the persistent work!
>>
>>> Hot-unplugging of physical cpus might encounter the same problems, but
>>> this
>>> should happen only very rarely.
>>>
>>> Signed-off-by: juergen.gross@ts.fujitsu.com
>> Acked-by: Andre Przywara <andre.przywara@amd.com>
>>
>> Keir, please apply for 4.1.0.
>>
--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712