From mboxrd@z Thu Jan  1 00:00:00 1970
From: labbott@redhat.com (Laura Abbott)
Date: Tue, 5 Jan 2016 10:36:48 -0800
Subject: [PATCHv2 00/18] arm64: mm: rework page table creation
In-Reply-To: <20160105115414.GC24664@leverpostej>
References: <1451930211-22460-1-git-send-email-mark.rutland@arm.com>
 <568B17AA.1050002@redhat.com> <20160105115414.GC24664@leverpostej>
Message-ID: <568C0D40.4040204@redhat.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 01/05/2016 03:54 AM, Mark Rutland wrote:
> On Mon, Jan 04, 2016 at 05:08:58PM -0800, Laura Abbott wrote:
>> On 01/04/2016 09:56 AM, Mark Rutland wrote:
>>> Hi all,
>>>
>>> This series reworks the arm64 early page table code, in order to:
>>>
>>> (a) Avoid issues with potentially-conflicting TTBR1 TLB entries (as raised in
>>>      Jeremy's thread [1]). This can happen when splitting/merging sections or
>>>      contiguous ranges, and per a pessimistic reading of the ARM ARM may happen
>>>      for changes to other fields in translation table entries.
>>>
>>> (b) Allow for more complex page table creation early on, with tables created
>>>      with fine-grained permissions as early as possible. In the cases where we
>>>      currently use fine-grained permissions (e.g. DEBUG_RODATA and marking .init
>>>      as non-executable), this is required for the same reasons as (a), as we
>>>      must ensure that changes to page tables do not split/merge sections or
>>>      contiguous regions for memory in active use.
>
> [...]
>
>>> There are still opportunities for improvement:
>>>
>>> * BUG() when splitting sections or creating overlapping entries in
>>>    create_mapping, as these both indicate serious bugs in kernel page table
>>>    creation.
>>>
>>>    This will require rework to the EFI runtime services pagetable creation, as
>>>    for >4K page kernels EFI memory descriptors may share pages (and currently
>>>    such overlap is assumed to be benign).
>>
>> Given the split_{pmd,pud} were added for DEBUG_RODATA, is there any reason
>> those can't be dropped now since it sounds like the EFI problem is for overlapping
>> entries and not splitting?
>
> Good point. I think they can be removed.
>
> I'll take a look into that.
>
>> This series points out that my attempt to allow set_memory_* to
>> work on regular kernel memory[1] is broken right now because it breaks down
>> the larger block sizes.
>
> What's the rationale for set_memory_* on kernel mappings? I see
> "security", but I couldn't figure out a concrete use-case. Is there any
> example of a subsystem that wants to use this?

Going by the description, it sounded like this was possibly new work, but
the eBPF interpreter already marks its images read-only via
set_memory_ro (see 60a3b2253c413cf601783b070507d7dd6620c954
"net: bpf: make eBPF interpreter images read-only"), so it's not
unheard of.
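
To make the pattern concrete, here is a rough userspace analogy of
"populate, then seal read-only". It is only a sketch: it uses POSIX
mmap()/mprotect() rather than the kernel's set_memory_ro() path, and the
helper names image_alloc() and image_seal() are made up for illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate one anonymous, writable page.
 * Returns NULL on failure; on success *len holds the mapping size.
 */
static void *image_alloc(size_t *len)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	if (page <= 0)
		return NULL;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	*len = (size_t)page;
	return p;
}

/*
 * Hypothetical helper: drop write permission once the image has been
 * populated. Reads keep working; later writes fault (SIGSEGV).
 * Returns 0 on success, -1 on failure, like mprotect().
 */
static int image_seal(void *image, size_t len)
{
	assert(image != NULL && len > 0);
	return mprotect(image, len, PROT_READ);
}
```

A caller would fill the buffer returned by image_alloc() and then call
image_seal() on it. The analogy stops at the permission change itself:
a userspace page is always mapped at page granularity, so it does not
show the block-splitting problem discussed in this thread.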

>
> For statically-allocated data, an alternative approach would be for such
> memory to be mapped with minimal permissions from the outset (e.g. being
> placed in .rodata), and when elevated permissions are required a
> (temporary) memremap'd alias could be used, like what patch_map does to
> modify ROX kernel/module text.
>
> For dynamically-allocated data, we could create (minimal permission)
> mappings in the vmalloc region and pass those around. The linear map
> alias would still be writeable, but as the offset between the two isn't
> linear (and the owner of that allocation doesn't have to know/care about
> the linear map address), it would be much harder to find the linear map
> address to attack. An alias with elevated permissions could be used as
> required, or if it's a one-time RW->RO switch, the mapping could be
> modified in-place as the granularity wouldn't change.

This would work for new features but probably not for existing features
such as the eBPF interpreter.
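
For anyone following along, the aliasing idea can be sketched in
userspace, assuming two MAP_SHARED mappings of one temporary file stand
in for the minimal-permission mapping and the elevated alias. This is an
analogy only, not the vmalloc/linear map code, and struct alias_pair and
its helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical pair of views onto the same backing page. */
struct alias_pair {
	FILE *backing;	/* temporary file providing the shared page */
	void *ro;	/* long-lived, read-only view */
	void *rw;	/* temporary, writable alias */
	size_t len;
};

/* Tear down whatever parts of the pair currently exist. */
static void alias_pair_destroy(struct alias_pair *ap)
{
	assert(ap != NULL);
	if (ap->ro != MAP_FAILED)
		munmap(ap->ro, ap->len);
	if (ap->rw != MAP_FAILED)
		munmap(ap->rw, ap->len);
	if (ap->backing)
		fclose(ap->backing);
	ap->ro = ap->rw = MAP_FAILED;
	ap->backing = NULL;
}

/* Map one page twice: once read-only, once read-write. 0 on success. */
static int alias_pair_create(struct alias_pair *ap)
{
	long page = sysconf(_SC_PAGESIZE);
	int fd;

	assert(ap != NULL);
	ap->backing = NULL;
	ap->ro = ap->rw = MAP_FAILED;
	ap->len = page > 0 ? (size_t)page : 0;
	if (page <= 0)
		return -1;

	ap->backing = tmpfile();
	if (!ap->backing)
		return -1;
	fd = fileno(ap->backing);
	if (ftruncate(fd, (off_t)page) != 0)
		goto fail;

	ap->ro = mmap(NULL, ap->len, PROT_READ, MAP_SHARED, fd, 0);
	ap->rw = mmap(NULL, ap->len, PROT_READ | PROT_WRITE,
		      MAP_SHARED, fd, 0);
	if (ap->ro == MAP_FAILED || ap->rw == MAP_FAILED)
		goto fail;
	return 0;
fail:
	alias_pair_destroy(ap);
	return -1;
}

/* Drop the writable alias once the update is done. 0 on success. */
static int alias_pair_drop_rw(struct alias_pair *ap)
{
	int ret;

	assert(ap != NULL && ap->rw != MAP_FAILED);
	ret = munmap(ap->rw, ap->len);
	ap->rw = MAP_FAILED;
	return ret;
}
```

Writes made through the rw view are visible through the ro view, and
the ro view never changes permissions or granularity, which is the
property the proposal relies on.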

>
>> Do you have any suggestions for a cleaner approach
>> short of requiring all memory mapped with 4K pages? The only solution I see
>> right now is having a separate copy of page tables to switch to. Any
>> other idea I come up with would have problems if we tried to invalidate an
>> entry before breaking it down.
>
> The other option I looked into was to have a completely independent
> TTBR0 mapping (like the idmap or efi runtime tables), and have that map
> code for modifying page tables. That way you could modify the tables
> in-place (with TTBR1 disabled for the duration of the modification).
>
> That ended up having its own set of problems, as you could only rely on
> self-contained position independent code, which ruled out most kernel
> APIs (including locking/atomic primitives due to debug paths). That gets
> worse when secondaries are online and you have to synchronise those
> disabling/invalidating/enabling the TTBR1 mapping.
>
> Other than that I haven't managed to come up with other functional
> ideas. The RCU-like approach is the cleanest I've found so far.
>

Yeah, I suspect this is going to remain open for a while. Thanks for
your thoughts.

Laura