From: Michael Ellerman <mpe@ellerman.id.au>
To: Daniel Axtens <dja@axtens.net>, linuxppc-dev@lists.ozlabs.org
Cc: aneesh.kumar@linux.ibm.com
Subject: Re: [PATCH 5/6] powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
Date: Fri, 19 Mar 2021 22:56:47 +1100
Message-ID: <871rcb8igw.fsf@mpe.ellerman.id.au>
In-Reply-To: <87h7m8pyk5.fsf@dja-thinkpad.axtens.net>

Daniel Axtens <dja@axtens.net> writes:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>
>> When we enabled STRICT_KERNEL_RWX we received some reports of boot
>> failures when using the Hash MMU and running under phyp. The crashes
>> are intermittent, and often exhibit as a completely unresponsive
>> system, or possibly an oops.
>>
>> One example, which was caught in xmon:
>>
>> [ 14.068327][ T1] devtmpfs: mounted
>> [ 14.069302][ T1] Freeing unused kernel memory: 5568K
>> [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch
>> [ 14.142063][ T1] Run /sbin/init as init process
>> [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400
>> cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
>> pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
>> lr: c0000000001862d4: update_rq_clock+0x44/0x110
>> sp: c00000000c747880
>> msr: 8000000040001031
>> current = 0xc00000000c60d380
>> paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01
>> pid = 347, comm = kworker/2:1
>> ...
>> enter ? for help
>> [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
>> [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
>> [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
>> [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
>> [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
>> [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
>> [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
>> [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
>> [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
>> [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
>> [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c
>>
>> This shows that CPU 2, which was idle, woke up and then appears to
>> randomly take an instruction fault on a completely valid area of
>> kernel text.
>>
>> The cause turns out to be the call to hash__mark_rodata_ro(), late in
>> boot. Due to the way we lay out text and rodata, that function actually
>> changes the permissions for all of text and rodata to read-only plus
>> execute.
>>
>> To do the permission change we use a hypervisor call, H_PROTECT. On
>> phyp that appears to be implemented by briefly removing the mapping of
>> the kernel text, before putting it back with the updated permissions.
>> If any other CPU is executing during that window, it will see spurious
>> faults on the kernel text and/or data, leading to crashes.
>
> Jordan asked why we saw this on phyp but not under KVM? We had a look at
> book3s_hv_rm_mmu.c but the code is a bit too obtuse for me to reason
> about!
>
> Nick suggests that the KVM hypervisor is invalidating the HPTE, but
> because we run guests in VPM mode, the hypervisor would catch the page
> fault and not reflect it down to the guest. It looks like Linux-as-a-HV
> will take HPTE_V_HVLOCK, and then because it's running in VPM mode, the
> hypervisor will catch the fault and not pass it to the guest.

Yep.

> But if phyp runs with VPM mode off, the guest will see the fault
> before the hypervisor. (we think this is what's going on anyway.)

Yeah. I assumed phyp always ran with VPM=1, but apparently it can run
with it off or on, depending on various configuration settings.

So I'm fairly sure what we're hitting here is VPM=0, where the faults go
straight to the guest.
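
The working theory above can be written down as a toy model. This is
only a sketch of the reasoning in this thread, not hypervisor code: the
names fault_target() and guest_sees_spurious_fault() are made up for
illustration, and it assumes the HPTE is briefly invalid while
H_PROTECT runs, which is what we suspect rather than something we have
confirmed.

```python
from enum import Enum


class FaultTarget(Enum):
    HYPERVISOR = "hypervisor"
    GUEST = "guest"


def fault_target(vpm_enabled: bool) -> FaultTarget:
    """Who takes a fault on an invalid HPTE first (hypothetical model).

    With VPM=1 the hypervisor catches the fault; with VPM=0 it goes
    straight to the guest.
    """
    return FaultTarget.HYPERVISOR if vpm_enabled else FaultTarget.GUEST


def guest_sees_spurious_fault(vpm_enabled: bool, hpte_valid: bool) -> bool:
    """True if a guest CPU touching the mapping would see a fault.

    Assumption: the only time the HPTE is invalid is the short window
    while the permission change is in progress.
    """
    if hpte_valid:
        # Mapping is in place, so there is no fault for anyone to take.
        return False
    return fault_target(vpm_enabled) is FaultTarget.GUEST


if __name__ == "__main__":
    for vpm in (True, False):
        for valid in (True, False):
            print(f"VPM={int(vpm)} hpte_valid={valid}: "
                  f"guest fault={guest_sees_spurious_fault(vpm, valid)}")
```

In this model only the VPM=0 case with the HPTE invalid produces a
fault in the guest, which matches the crashes being seen on phyp but
not under KVM.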

cheers