From: Manoj Iyer <manoj.iyer-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
To: Shanker Donthineni
<shankerd-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>,
James Morse <james.morse-5wv7dgnIgG8@public.gmane.org>
Cc: Manoj Iyer <manoj.iyer-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>,
Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>,
Marc Zyngier <marc.zyngier-5wv7dgnIgG8@public.gmane.org>,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>,
Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
Matt Fleming
<matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>,
Christoffer Dall
<christoffer.dall-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
kvmarm-FPEHb7Xf0XXUo1n7N8X6UoWGPAHP3yOg@public.gmane.org
Subject: Re: [3/3] arm64: Add software workaround for Falkor erratum 1041
Date: Wed, 15 Nov 2017 09:12:33 -0600 (CST)
Message-ID: <alpine.DEB.2.20.1711150905001.7346@hungry>
In-Reply-To: <alpine.DEB.2.20.1711101146400.4353@lazy>
On Fri, 10 Nov 2017, Manoj Iyer wrote:
> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>
>>
>> James,
>>
>> Looks like my VM test raised a false alarm. I retested stock Artful 4.13
>> kernel (No erratum 1041 patches applied).
>>
>
> James, an update on the crash (false alarm). We suspect this is a firmware
> crash due to a possible fw bug. Once this is addressed I will be able to send
> you the test results you requested on VM start/stop with the erratum 1041
> patches applied.
>
James/Shanker,

I can report that the VM start/stop/restart tests passed with the patches
applied to the Ubuntu 4.13 (Artful) kernel on QDF2400 hardware.

Host: Ubuntu 4.13 with erratum 1041 patches applied
Guest: Stock Ubuntu 4.13 kernel

- Create 20 VMs one at a time

10 iterations of:
- Stop (virsh destroy) 20 VMs one at a time
- Start (virsh start) 20 VMs one at a time
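
The stop/start loop above can be sketched roughly as follows. This is a
minimal illustration, not the script used for the test: the domain names
(vm01..vm20) and the helper names build_cycle_commands/run_cycle are
assumptions, and VM creation is left out because the thread does not say how
the guests were created. Only the `virsh destroy` and `virsh start`
subcommands come from the description above.

```python
import subprocess
from typing import Callable, List, Sequence


def build_cycle_commands(vm_names: Sequence[str], iterations: int) -> List[List[str]]:
    """Return the virsh commands for `iterations` rounds of stop-all, then start-all."""
    commands: List[List[str]] = []
    for _ in range(iterations):
        # Stop every VM first, one at a time, so the running-VM count reaches zero.
        for name in vm_names:
            commands.append(["virsh", "destroy", name])
        # Then bring every VM back up, one at a time.
        for name in vm_names:
            commands.append(["virsh", "start", name])
    return commands


def run_cycle(vm_names: Sequence[str], iterations: int = 10,
              runner: Callable = subprocess.run) -> None:
    """Execute the stop/start rounds; `runner` can be swapped out for a dry run."""
    for cmd in build_cycle_commands(vm_names, iterations):
        # check=True makes subprocess.run raise if a virsh call fails.
        runner(cmd, check=True)


# Hypothetical domain names; substitute the names of the actual guests.
VM_NAMES = [f"vm{i:02d}" for i in range(1, 21)]
```

Calling run_cycle(VM_NAMES) on the host would run the 10 iterations. Because
every VM is stopped before any is started, the number of running VMs drops to
zero once per iteration, which per James's note quoted below is the point
where KVM should unload via 'kvm_arch_hardware_disable()'.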
Tested-by: Manoj Iyer <manoj.iyer-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
>
>> Host: Ubuntu Artful 4.13 kernel with *no* erratum 1041 patches applied.
>> Guest: Ubuntu Zesty (4.10) kernel.
>>
>> - Created 20 VMs one at a time
>>
>> In a loop:
>> - Stop (virsh destroy) 20 VMs one at a time
>> - Start (virsh start) 20 VMs one at a time.
>>
>> And, I am able to reproduce the system reset issue I previously reported. I
>> think the problem I reported with VMs might have nothing to do with the
>> erratum 1041 patches, and probably needs to be root caused separately.
>>
>> With stock 4.13 kernel (no erratum 1041 patches applied):
>>
>> awrep6 login: [ 461.881379] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.051194] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.223137] ACPI CPPC: PCC check channel failed. Status=0
>> [ 462.633790] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.231971] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.403163] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.822936] ACPI CPPC: PCC check channel failed. Status=0
>> [ 463.995222] ACPI CPPC: PCC check channel failed. Status=0
>> [ 464.130962] ACPI CPPC: PCC check channel failed. Status=0
>> [ 464.258973] ACPI CPPC: PCC check channel failed. Status=0
>> [ 465.283028] ACPI CPPC: PCC check channel failed. Status=0
>>
>>
>> SYS_DBG: Running SDI image (immediate mode)
>> SYS_DBG: Ram Dump Init
>> SYS_DBG: Failed to init SD card
>> SYS_DBG: Resetting system!
>>
>>
>> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>>
>>>
>>>
>>>
>>> On Thu, 9 Nov 2017, Manoj Iyer wrote:
>>>
>>>>
>>>> James,
>>>>
>>>> (sorry for top-posting)
>>>>
>>>> Applied patch 3 patches to Ubuntu Artful Kernel ( 4.13.0-16-generic )
>>>>
>>>> - Start 20 VMs one at a time
>>>>
>>>> In a loop:
>>>> - Stop (virsh destroy) 20 VMs one at a time
>>>> - Start (virsh start) 20 VMs one at a time.
>>>
>>> Fixing some confusion I might have introduced in my prev email.
>>>
>>> - Applied all 3 patches to Ubuntu Artful Kernel ( 4.13.0-16-generic )
>>>
>>> - Created 20 VMs one at a time
>>>
>>> In a loop:
>>> - Stop (virsh destroy) 20 VMs one at a time
>>> - Start (virsh start) 20 VMs one at a time.
>>>
>>>>
>>>> The system resets itself after starting the last VM on the 1st loop
>>>> displaying the following:
>>>>
>>>> awrep6 login: [ 603.349141] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 603.765101] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 603.937389] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 608.285495] ACPI CPPC: PCC check channel failed. Status=0
>>>> [ 608.289481] ACPI CPPC: PCC check channel failed. Status=0
>>>>
>>>> SYS_DBG: Running SDI image (immediate mode)
>>>> SYS_DBG: Ram Dump Init
>>>> SYS_DBG: Failed to init SD card
>>>> SYS_DBG: Resetting system!
>>>>
>>>> Followed by the following messages on system reboot:
>>>> [ 6.616891] BERT: Error records from previous boot:
>>>> [ 6.621655] [Hardware Error]: event severity: fatal
>>>> [ 6.626516] [Hardware Error]: imprecise tstamp: 0000-00-00 00:00:00
>>>> [ 6.632851] [Hardware Error]: Error 0, type: fatal
>>>> [ 6.637713] [Hardware Error]: section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
>>>> [ 6.646045] [Hardware Error]: section length: 0x238
>>>> [ 6.651082] [Hardware Error]: 00000000: 72724502 5220726f 6f736165 6e55206e .Error Reason Un
>>>> [ 6.659761] [Hardware Error]: 00000010: 776f6e6b 0000006e 00000000 00000000 known...........
>>>> [ 6.668442] [Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
>>>> [ 6.677122] [Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
>>>>
>>>>
>>>> On Thu, 9 Nov 2017, James Morse wrote:
>>>>
>>>>> Hi Manoj,
>>>>>
>>>>> On 08/11/17 19:05, Manoj Iyer wrote:
>>>>>> On Thu, 2 Nov 2017, Shanker Donthineni wrote:
>>>>>>> The ARM architecture defines the memory locations that are permitted
>>>>>>> to be accessed as the result of a speculative instruction fetch from
>>>>>>> an exception level for which all stages of translation are disabled.
>>>>>>> Specifically, the core is permitted to speculatively fetch from the
>>>>>>> 4KB region containing the current program counter and next 4KB.
>>>>>>>
>>>>>>> When translation is changed from enabled to disabled for the running
>>>>>>> exception level (SCTLR_ELn[M] changed from a value of 1 to 0), the
>>>>>>> Falkor core may errantly speculatively access memory locations outside
>>>>>>> of the 4KB region permitted by the architecture. The errant memory
>>>>>>> access may lead to one of the following unexpected behaviors.
>>>>>
>>>>>> I applied the 3 patches to Ubuntu 4.13.0-16-generic (Artful) kernel and
>>>>>> ran stress-ng cpu tests on QDF2400 server
>>>>>
>>>>> [...]
>>>>>
>>>>>> Where stress-ng would spawn N workers and test cpu offline/online, perform
>>>>>> matrix operations, do rapid context switches, and anonymous mmaps. Although
>>>>>> I was not able to reproduce the erratum on the stock 4.13 kernel using the
>>>>>> same test case, the patched kernel did not seem to introduce any
>>>>>> regressions either. I ran the stress-ng tests for over 8hrs and found the
>>>>>> system to be stable.
>>>>>
>>>>>
>>>>> Could you throw kexec and KVM into the mix? This issue only shows up when we
>>>>> disable the MMU, which we almost never do.
>>>>>
>>>>> For CPU offline/online we make the PSCI 'offline' call with the MMU enabled.
>>>>> When the CPU comes back firmware has reset the EL2/EL1 SCTLR from a higher
>>>>> exception level, so it won't hit this issue.
>>>>>
>>>>> One place we do this is kexec, where we drop into purgatory with the MMU
>>>>> disabled.
>>>>>
>>>>> The other is KVM unloading itself to return to the hyp stub. You can stress this
>>>>> by starting and stopping a VM. When the number of VMs reaches 0 KVM should
>>>>> unload via 'kvm_arch_hardware_disable()'.
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> James
>>>>>
>>>>>
>>>>
>>>> --
>>>> ============================
>>>> Manoj Iyer
>>>> Ubuntu/Canonical
>>>> ARM Servers - Cloud
>>>> ============================
>>>>
>>>>