From: Al Viro <viro@zeniv.linux.org.uk>
To: Peter Xu <peterx@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
linux-ia64@vger.kernel.org, linux-hexagon@vger.kernel.org,
linux-m68k@lists.linux-m68k.org, Michal Simek <monstr@monstr.eu>,
Dinh Nguyen <dinguyen@kernel.org>,
openrisc@lists.librecores.org, linux-parisc@vger.kernel.org,
linux-riscv@lists.infradead.org, sparclinux@vger.kernel.org
Subject: Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
Date: Wed, 1 Feb 2023 22:18:11 +0000 [thread overview]
Message-ID: <Y9rlI6d5J2Y/YNQ+@ZenIV> (raw)
In-Reply-To: <Y9rCBqwbLlLf1fHe@x1n>
On Wed, Feb 01, 2023 at 02:48:22PM -0500, Peter Xu wrote:
> I do also see a common pattern of the possibility to have a generic fault
> handler like generic_page_fault().
>
> It probably should start with taking the mmap_sem until providing some
> retval that is much easier to digest further by the arch-dependent code, so
> it can directly do something rather than parsing the bitmask in a
> duplicated way (hence the new retval should hopefully not a bitmask anymore
> but a "what to do").
>
> Maybe it can be something like:
>
> /**
> * enum page_fault_retval - Higher level fault retval, generalized from
> * vm_fault_reason above that is only used by hardware page fault handlers.
> * It generalizes the bitmask-versioned retval into something that the arch
> * dependent code should react upon.
> *
> * @PF_RET_COMPLETED: The page fault is completed successfully
> * @PF_RET_BAD_AREA: The page fault address falls in a bad area
> * (e.g., vma not found, expand_stack() fails..)
FWIW, there's a fun discrepancy - VM_FAULT_SIGSEGV may yield SEGV_MAPERR
or SEGV_ACCERR; depends upon the architecture. Not that there'd been
many places that return VM_FAULT_SIGSEGV these days... Good thing, too,
since otherwise e.g. csky would oops...
> * @PF_RET_ACCESS_ERR: The page fault has access errors
> * (e.g., write fault on !VM_WRITE vmas)
> * @PF_RET_KERN_FIXUP: The page fault requires kernel fixups
> * (e.g., during copy_to_user() but fault failed?)
> * @PF_RET_HWPOISON: The page fault encountered poisoned pages
> * @PF_RET_SIGNAL: The page fault encountered poisoned pages
??
> * ...
> */
> enum page_fault_retval {
> PF_RET_DONE = 0,
> PF_RET_BAD_AREA,
> PF_RET_ACCESS_ERR,
> PF_RET_KERN_FIXUP,
> PF_RET_HWPOISON,
> PF_RET_SIGNAL,
> ...
> };
>
> As a start we may still want to return some more information (perhaps still
> the vm_fault_t alongside? Or another union that will provide different
> information based on different PF_RET_*). One major thing is I see how we
> handle VM_FAULT_HWPOISON and also the fact that we encode something more
> into the bitmask on page sizes (VM_FAULT_HINDEX_MASK).
>
> So the generic helper could, hopefully, hide the complexity of:
>
> - Taking and releasing of mmap lock
> - find_vma(), and also relevant checks on access or stack handling
Umm... arm is a bit special here:
if (addr < FIRST_USER_ADDRESS)
return VM_FAULT_BADMAP;
with no counterparts elsewhere.
> - handle_mm_fault() itself (of course...)
> - detect signals
> - handle page fault retries (so, in the new layer of retval there should
> have nothing telling it to retry; it should always be the ultimate result)
agreed.
- unlock mmap; don't leave that to caller.
> - parse different errors into "what the arch code should do", and
> generalize the common ones, e.g.
> - OOM, do pagefault_out_of_memory() for user-mode
> - VM_FAULT_SIGSEGV, which should be able to merge into PF_RET_BAD_AREA?
> - ...
AFAICS, all errors in kernel mode => no_context.
> It'll simplify things if we can unify some small details like whether the
> -EFAULT above should contain a sigbus.
>
> A trivial detail I found when I was looking at this is, x86_64 passes in
> different signals to kernelmode_fixup_or_oops() - in do_user_addr_fault()
> there're three call sites and each of them pass over a differerent signal.
> IIUC that will only make a difference if there's a nested page fault during
> the vsyscall emulation (but I may be wrong too because I'm new to this
> code), and I have no idea when it'll happen and whether that needs to be
> strictly followed.
From my (very incomplete so far) dig through that pile:
Q: do we still have the cases when handle_mm_fault() does
not return any of VM_FAULT_COMPLETED | VM_FAULT_RETRY | VM_FAULT_ERROR?
That gets treated as unlock + VM_FAULT_COMPLETED, but do we still need
that?
Q: can VM_FAULT_RETRY be mixed with anything in VM_FAULT_ERROR?
What locking, if that happens?
* details of storing the fault details (for ptrace, mostly)
vary a lot; no chance to unify, AFAICS.
* requirements for vma flags also differ; e.g. read fault on
alpha is explicitly OK with absence of VM_READ if VM_WRITE is there.
Probably should go by way of arm and pass the mask that must
have non-empty intersection with vma->vm_flags? Because *that*
is very likely to be a part of ABI - mmap(2) callers that rely
upon the flags being OK for given architecture are quite possible.
* mmap lock is also quite variable in how it's taken;
x86 and arm have fun dance with trylock/search for exception handler/etc.
Other architectures do not; OTOH, there's a prefetch stuck in itanic
variant, with comment about mmap_sem being performance-critical...
* logics for stack expansion includes this twist:
if (!(vma->vm_flags & VM_GROWSDOWN))
goto map_err;
if (user_mode(regs)) {
/* Accessing the stack below usp is always a bug. The
"+ 256" is there due to some instructions doing
pre-decrement on the stack and that doesn't show up
until later. */
if (address + 256 < rdusp())
goto map_err;
}
if (expand_stack(vma, address))
goto map_err;
That's m68k; ISTR similar considerations elsewhere, but I could be
wrong.
next prev parent reply other threads:[~2023-02-01 22:18 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-31 20:02 [RFC][PATCHSET] VM_FAULT_RETRY fixes Al Viro
2023-01-31 20:03 ` [PATCH 01/10] alpha: fix livelock in uaccess Al Viro
2023-03-07 0:48 ` patchwork-bot+linux-riscv
2023-01-31 20:03 ` [PATCH 02/10] hexagon: " Al Viro
2023-02-10 2:59 ` Brian Cain
2023-01-31 20:04 ` [PATCH 03/10] ia64: " Al Viro
2023-01-31 20:04 ` [PATCH 04/10] m68k: " Al Viro
2023-02-05 6:18 ` Finn Thain
2023-02-05 18:51 ` Linus Torvalds
2023-02-07 3:07 ` Finn Thain
2023-02-05 20:39 ` Al Viro
2023-02-05 20:41 ` Linus Torvalds
2023-02-06 12:08 ` Geert Uytterhoeven
2023-01-31 20:05 ` [PATCH 05/10] microblaze: " Al Viro
2023-01-31 20:05 ` [PATCH 06/10] nios2: " Al Viro
2023-01-31 20:06 ` [PATCH 07/10] openrisc: " Al Viro
2023-01-31 20:06 ` [PATCH 08/10] parisc: " Al Viro
2023-02-06 16:58 ` Helge Deller
2023-02-28 17:34 ` Al Viro
2023-02-28 18:26 ` Helge Deller
2023-02-28 19:14 ` Al Viro
2023-02-28 19:32 ` Helge Deller
2023-02-28 20:00 ` Helge Deller
2023-02-28 20:22 ` Helge Deller
2023-02-28 22:57 ` Al Viro
2023-03-01 4:00 ` Helge Deller
2023-03-02 17:53 ` Al Viro
2023-02-28 15:22 ` Guenter Roeck
2023-02-28 19:18 ` Michael Schmitz
2023-01-31 20:06 ` [PATCH 09/10] riscv: " Al Viro
2023-02-06 20:06 ` Björn Töpel
2023-02-07 16:11 ` Geert Uytterhoeven
2023-01-31 20:07 ` [PATCH 10/10] sparc: " Al Viro
2023-01-31 20:24 ` [RFC][PATCHSET] VM_FAULT_RETRY fixes Linus Torvalds
2023-01-31 21:10 ` Al Viro
2023-01-31 21:19 ` Linus Torvalds
2023-01-31 21:49 ` Al Viro
2023-02-01 0:00 ` Linus Torvalds
2023-02-01 19:48 ` Peter Xu
2023-02-01 22:18 ` Al Viro [this message]
2023-02-02 0:57 ` Al Viro
2023-02-02 22:56 ` Peter Xu
2023-02-04 0:26 ` Al Viro
2023-02-05 5:10 ` Al Viro
2023-02-04 0:47 ` [loongarch oddities] " Al Viro
2023-02-01 8:21 ` Helge Deller
2023-02-01 19:51 ` Linus Torvalds
2023-02-02 6:58 ` Al Viro
2023-02-02 8:54 ` Michael Cree
2023-02-02 9:56 ` John Paul Adrian Glaubitz
2023-02-02 15:20 ` Al Viro
2023-02-02 20:20 ` Al Viro
2023-02-02 20:34 ` Linus Torvalds
2023-02-01 10:50 ` Mark Rutland
2023-02-06 12:08 ` Geert Uytterhoeven
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y9rlI6d5J2Y/YNQ+@ZenIV \
--to=viro@zeniv.linux.org.uk \
--cc=dinguyen@kernel.org \
--cc=linux-alpha@vger.kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-hexagon@vger.kernel.org \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=linux-parisc@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=monstr@monstr.eu \
--cc=openrisc@lists.librecores.org \
--cc=peterx@redhat.com \
--cc=sparclinux@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).