From: Julien Grall <julien.grall@linaro.org>
To: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Andre Przywara <andre.przywara@calxeda.com>,
Ian Campbell <Ian.Campbell@citrix.com>,
Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>,
xen-devel <xen-devel@lists.xen.org>
Subject: Re: [ARM] Bash often segfaults in Dom0 with the latest Xen
Date: Wed, 05 Jun 2013 17:12:04 +0100
Message-ID: <51AF6354.4090701@linaro.org>
In-Reply-To: <CAMJs5B9CsjdF1v=c1OdGq1Edkx-9LTDq0m+Vv5EP_p_K+W8Czw@mail.gmail.com>

On 06/05/2013 03:30 PM, Christoffer Dall wrote:
> On 5 June 2013 04:48, Julien Grall <julien.grall@linaro.org> wrote:
>> On 06/05/2013 02:38 AM, Christoffer Dall wrote:
>>
>>> On 4 June 2013 15:45, Julien Grall <julien.grall@linaro.org> wrote:
>>>> Hi all,
>>>>
>>>> For a couple of weeks, I have been tracking an issue with Xen on ARM, with no luck.
>>>>
>>>> I've run out of ideas, so I'm sending this email to ask the community for advice.
>>>>
>>>> Most of the time bash aborts with random errors in dom0:
>>>> - page fault (data and prefetch abort)
>>>> - memory corruption (malloc corruption and invalid pointer)
>>>>
>>>> It's easy to reproduce by running ./configure in the xen tree.
>>>>
>>>> My environment is an arndale board:
>>>> - linux linaro 13.05 (using arndale_xen_dom0_defconfig and exynos5250_arndale.dts)
>>>> - opensuse 12.03 (http://en.opensuse.org/HCL:Arndale)
>>>> - xen upstream
>>>>
>>>> The linux tree can be retrieved from git://xenbits.xen.org/people/julieng/linux-arm.git
>>>> using the branch linaro-3.10.
>>>> This branch is based on the linaro tree with some patches for the dts and xen.
>>>>
>>>> The issue also occurs on the Versatile Express, but it's harder to reproduce there.
>>>> Here the environment is:
>>>> - linux linaro 13.05 (using vexpress_xen_dom0_defconfig and vexpress_v2p_ca15_a7.dtb)
>>>> - ubuntu linaro 13.05
>>>> - xen upstream
>>>>
>>>> I have tried different distributions and Linux versions; the issue was the same.
>>>> I did some testing to narrow down the bug and came up with the following test case:
>>>>
>>>> Only dom0 is running and each VCPU is pinned to a specific cpu
>>>> (vcpu0 -> cpu0 and vcpu1 -> cpu1).
>>>>
>>>> The patch below removes the WFI trap and consequently prevents a VCPU from
>>>> moving to another physical CPU.
>>>> =========================================
>>>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>>>> index 6cfba1a..e89ca15 100644
>>>> --- a/xen/arch/arm/traps.c
>>>> +++ b/xen/arch/arm/traps.c
>>>> @@ -62,7 +62,7 @@ void __cpuinit init_traps(void)
>>>>      WRITE_SYSREG((vaddr_t)hyp_traps_vector, VBAR_EL2);
>>>>
>>>>      /* Setup hypervisor traps */
>>>> -    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TWI|HCR_TSC, HCR_EL2);
>>>> +    WRITE_SYSREG(HCR_PTW|HCR_BSU_OUTER|HCR_AMO|HCR_IMO|HCR_VM|HCR_TSC, HCR_EL2);
>>>>      isb();
>>>>  }
>>>>
>>>> =========================================
>>>>
>>>> If a bash process is assigned to a specific cpu with taskset, the process seems
>>>> to always run without any issue.
>>>>
>>>> taskset -c 0 ./configure
>>>>
>>>> I guess it's a caching issue, but each time I've tried to play with the caching
>>>> policy Linux was not booting.
>>>>
>>>> Thanks in advance for any advice.
>>>
>>> Some thoughts:
>>>
>>> - Does dom0 run with Stage-2 translation? If so, you should be able
>>> to disable caches in both Hyp mode and for dom0 by manipulating the
>>> hyp registers to try and exclude caches. If Linux doesn't boot under
>>> such configuration, something else is completely broken, as it must be
>>> transparent to your dom0.
>>>
>>> - Are you doing any swapping and/or page reclaiming? I wouldn't
>>> assume so for dom0, but if you are, you need to maintain the icache
>>> properly, since it can be aliasing, see
>>> http://lxr.linux.no/linux+v3.9.4/arch/arm/kvm/mmu.c#L495 (I doubt this
>>> is the case though)
>>>
>>> - All other cache accesses should be coherent across cores and are
>>> physically indexed/physically tagged so I don't see how this could be
>>> your issue.
>>
>> It was only an idea because I have noticed the memory was often corrupted.
>>
>>> - Do you always see the crash in user space or kernel space in dom0 or
>>> is it all over the map?
>>
>>
>> Only in user space in dom0.
>>
> Hmm, which kernel version is dom0 based on? Can you bisect the dom0
> source to make sure it's not something introduced during development.

I'm using Linaro's branch ll_20130528.0, with only a few patches on top:
some for the dts and some that are not yet in the Linaro tree.
I see the same issue with Linux 3.9-rc4 with multiple CPUs, and I can't
really go back any further without carrying many Xen patches.

I have tried different combinations of the number of CPUs in Xen
(pCPU) and Linux (vCPU):
- 2 pCPU 2 vCPU : segfaulting
- 2 pCPU 1 vCPU : working
- 1 pCPU 1 vCPU : working
- 1 pCPU 2 vCPU : very slow but working
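
For reference, the WFI patch quoted above changes a single bit in the value
written to HCR. Here is a minimal standalone sketch of that value. The macro
names match the patch, but the bit positions are my reading of the ARMv7-A HCR
layout and are assumptions, not copied from the Xen headers; hcr_value() is a
hypothetical helper used only for illustration.

```c
#include <stdint.h>

/* Assumed bit positions (ARMv7-A HCR layout); check them against the
 * architecture manual and the Xen headers before relying on them. */
#define HCR_VM        (UINT32_C(1) << 0)   /* stage-2 translation enable */
#define HCR_PTW       (UINT32_C(1) << 2)   /* protected table walk */
#define HCR_IMO       (UINT32_C(1) << 4)   /* route IRQs to Hyp mode */
#define HCR_AMO       (UINT32_C(1) << 5)   /* route async aborts to Hyp mode */
#define HCR_BSU_OUTER (UINT32_C(2) << 10)  /* barrier upgrade: outer shareable */
#define HCR_TWI       (UINT32_C(1) << 13)  /* trap WFI */
#define HCR_TSC       (UINT32_C(1) << 19)  /* trap SMC */

/* Build the HCR value used in init_traps(), with or without the WFI trap. */
uint32_t hcr_value(int trap_wfi)
{
    uint32_t hcr = HCR_PTW | HCR_BSU_OUTER | HCR_AMO | HCR_IMO | HCR_VM | HCR_TSC;

    if (trap_wfi)
        hcr |= HCR_TWI;

    return hcr;
}
```

With these assumed positions, hcr_value(1) and hcr_value(0) differ only in
HCR_TWI, so the patch does not touch stage-2 translation or the barrier
settings.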
> You have this in your tree right: "9d1f5c ARM: 7641/1: memory: fix
> broken mmap..." ?
Yes.
--
Julien