From: Mark Rutland <mark.rutland@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Will Deacon <will@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Jisheng Zhang <jszhang@kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Linux-Arch <linux-arch@vger.kernel.org>,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] riscv: uaccess: optimizations
Date: Mon, 8 Jul 2024 16:21:00 +0100 [thread overview]
Message-ID: <ZowD3LQT_KTz2g4X@J2N7QTR9R3> (raw)
In-Reply-To: <CAHk-=wgRgDy8_=uZPZr4LRyF7BiN1nDNzUx7iRzrD0g8O+bh3A@mail.gmail.com>
On Fri, Jul 05, 2024 at 10:58:29AM -0700, Linus Torvalds wrote:
> On Fri, 5 Jul 2024 at 04:25, Will Deacon <will@kernel.org> wrote:
> >
> > we'd probably want to use an address that lives between the two TTBRs
> > (i.e. in the "guard region" you mentioned above), just in case somebody
> > has fscked around with /proc/sys/vm/mmap_min_addr.
>
> Yes, I don't want to use a NULL pointer and rely on mmap_min_addr.
>
> For x86-64, we have two "guard regions" that can be used to generate
> an address that is guaranteed to fault:
>
> - the kernel always lives in the "top bit set" part of the address
> space (and any address tagging bits don't touch that part), and does
> not map the highest virtual address because that's used for error
> pointers, so the "all bits set" address always faults
The same should be true on arm64, though I'm not immediately sure if we
explicitly reserve that VA region -- if we don't, then we should.
> - the region between valid user addresses and kernel addresses is
> also always going to fault, and we don't have them adjacent to each
> other (unlike, for example, 32-bit i386, where the kernel address
> space is directly adjacent to the top of user addresses)
Today we have a gap between the TTBR0 and TTBR1 VA ranges in all
configurations, but in future (with the new FEAT_D128 page table format)
we will have configurations where there's no gap between the two ranges.
> So on x86-64, the simple solution is to just say "we know if the top
> bit is clear, it cannot ever touch kernel code, and if the top bit is
> set we have to make the address fault". So just duplicating the top
> bit (with an arithmetic shift) and or'ing it with the low bits, we get
> exactly what we want.
>
> But my knowledge of arm64 is weak enough that while I am reading
> assembly language and I know that instead of the top bit, it's bit55,
> I don't know what the actual rules for the translation table registers
> are.
>
> If the all-bits-set address is guaranteed to always trap, then arm64
> could just use the same thing x86 does (just duplicating bit 55
> instead of the sign bit)?
I think something of that shape can work (see below). There are a couple
of things that make using all-ones unsafe:
1) Non-faulting parts of a misaligned load/store can occur *before* the
fault is raised. If you have two pages where one of which is writable
and the other of which is not writeable (in either order), a store
which straddles those pages can write to the writeable page before
raising a fault on the non-writeable page.
I've seen this behaviour on real HW, and IIUC this is fairly common.
2) Loads/stores which wrap past 0xFFFF_FFFF_FFFF_FFFF access bytes at
UNKNOWN addresses. An N-byte store at 0xFFFF_FFFF_FFFF_FFFF may write
to N-1 bytes at an arbitrary address which is not
0x0000_0000_0000_0000.
In the latest ARM ARM (K.a), this is described tersely in section
K1.2.9 "Out of range virtual address".
That can be found at:
https://developer.arm.com/documentation/ddi0487/ka/?lang=en
I'm aware of implementation styles where that address is not zero and
can be a TTBR1 (kernel) address.
Given that, we'd need to avoid all-ones, but provided we know that the
first access using the pointer will be limited to PAGE_SIZE bytes past
the pointer, we could round down the bad pointer to be somewhere within
the error pointer page, e.g.
SBFX <mask>, <ptr>, #55, #1
ORR <ptr>, <ptr>, <mask>
BIC <ptr>, <ptr>, <mask>, lsr #(64 - PAGE_SHIFT)
That last `BIC` instructions is "BIt Clear" AKA "AND NOT". When bit 55
is one that will clear the lower bits to round down to a page boundary,
and when bit 55 is zero it will have no effect (as it'll be an AND with
all-ones).
Thanks,
Mark.
prev parent reply other threads:[~2024-07-08 15:21 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240625040500.1788-1-jszhang@kernel.org>
2024-06-25 7:21 ` [PATCH 0/4] riscv: uaccess: optimizations Arnd Bergmann
2024-06-25 18:12 ` Linus Torvalds
2024-06-26 13:04 ` Jisheng Zhang
2024-06-30 16:59 ` Linus Torvalds
2024-07-05 11:25 ` Will Deacon
2024-07-05 17:58 ` Linus Torvalds
2024-07-08 13:52 ` Will Deacon
2024-07-08 15:30 ` Mark Rutland
2024-07-23 14:16 ` Will Deacon
2024-07-08 15:21 ` Mark Rutland [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZowD3LQT_KTz2g4X@J2N7QTR9R3 \
--to=mark.rutland@arm.com \
--cc=aou@eecs.berkeley.edu \
--cc=arnd@arndb.de \
--cc=catalin.marinas@arm.com \
--cc=jszhang@kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox