From: Catalin Marinas <catalin.marinas@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
Paul Mackerras <paulus@ozlabs.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@infradead.org>,
"Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>,
cluster-devel <cluster-devel@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
ocfs2-devel@oss.oracle.com, kvm-ppc@vger.kernel.org,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Tony Luck <tony.luck@intel.com>,
Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks
Date: Fri, 29 Oct 2021 18:50:35 +0100 [thread overview]
Message-ID: <YXw0a9n+/PLAcObB@arm.com> (raw)
In-Reply-To: <CAHk-=wgzEKEYKRoR_abQRDO=R8xJX_FK+XC3gNhKfu=KLdxt3g@mail.gmail.com>
On Thu, Oct 28, 2021 at 03:32:23PM -0700, Linus Torvalds wrote:
> The pointer color fault (or whatever some other architecture may do to
> generate sub-page faults) is not only not recoverable in the sense
> that we can't fix it up, it also ends up being a forced SIGSEGV (ie it
> can't be blocked - it has to either be caught or cause the process to
> be killed).
>
> And the thing is, I think we could just make the rule be that kernel
> code that has this kind of retry loop with fault_in_pages() would
> force an EFAULT on a pending SIGSEGV.
>
> IOW, the pending SIGSEGV could effectively be exactly that "thread flag".
>
> And that means that fault_in_xyz() wouldn't need to worry about this
> situation at all: the regular copy_from_user() (or whatever flavor it
> is - to/from/iter/whatever) would take the fault. And if it's a
> regular page fault,. it would act exactly like it does now, so no
> changes.
>
> If it's a sub-page fault, we'd just make the rule be that we send a
> SIGSEGV even if the instruction in question has a user exception
> fixup.
>
> Then we just need to add the logic somewhere that does "if active
> pending SIGSEGV, return -EFAULT".
>
> Of course, that logic might be in fault_in_xyz(), but it migth also be
> a separate function entirely.
>
> So this does effectively end up being a thread flag, but it's also
> slightly more than that - it's that a sub-page fault from kernel mode
> has semantics that a regular page fault does not.
>
> The whole "kernel access doesn't cause SIGSEGV, but returns -EFAULT
> instead" has always been an odd and somewhat wrong-headed thing. Of
> course it should cause a SIGSEGV, but that's not how Unix traditionall
> worked. We would just say "color faults always raise a signal, even if
> the color fault was triggered in a system call".
It's doable and, at least for MTE, people have asked for a signal even
when the fault was caused by a kernel uaccess. But there are some
potentially confusing aspects to sort out:
First of all, a uaccess in interrupt should not force such signal as it
had nothing to do with the interrupted context. I guess we can do an
in_task() check in the fault handler.
Second, is there a chance that we enter the fault-in loop with a SIGSEGV
already pending? Maybe it's not a problem, we just bail out of the loop
early and deliver the signal, though unrelated to the actual uaccess in
the loop.
Third is the sigcontext.pc presented to the signal handler. Normally for
SIGSEGV it points to the address of a load/store instruction and a
handler could disable MTE and restart from that point. With a syscall we
don't want it to point to the syscall place as it shouldn't be restarted
in case it copied something. Pointing it to the next instruction after
syscall is backwards-compatible but it may confuse the handler (if it
does some reporting). I think we need add a new si_code that describes a
fault in kernel mode to differentiate from the genuine user access.
There was a discussion back in August on infinite loops with hwpoison
and Tony said that Andy convinced him that the kernel should not send a
SIGBUS for uaccess:
https://lore.kernel.org/linux-edac/20210823152437.GA1637466@agluck-desk2.amr.corp.intel.com/
I personally like the approach of a SIG{SEGV,BUS} on uaccess and I don't
think the ABI change is significant but ideally we should have a unified
approach that's not just for MTE.
Adding Andy and Tony (the background is potentially infinite loops with
faults at sub-page granularity: arm64 MTE, hwpoison, sparc ADI).
Thanks.
--
Catalin
next prev parent reply other threads:[~2021-10-29 17:50 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-19 13:41 [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 01/17] iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 02/17] powerpc/kvm: Fix kvm_use_magic_page Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 03/17] gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 04/17] iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 05/17] iov_iter: Introduce fault_in_iov_iter_writeable Andreas Gruenbacher
2021-10-20 16:25 ` Catalin Marinas
2021-10-19 13:41 ` [PATCH v8 06/17] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 07/17] gfs2: Clean up function may_grant Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 08/17] gfs2: Introduce flag for glock holder auto-demotion Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 09/17] gfs2: Move the inode glock locking to gfs2_file_buffered_write Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 10/17] gfs2: Eliminate ip->i_gh Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 11/17] gfs2: Fix mmap + page fault deadlocks for buffered I/O Andreas Gruenbacher
2021-10-19 13:41 ` [PATCH v8 12/17] iomap: Fix iomap_dio_rw return value for user copies Andreas Gruenbacher
2021-10-19 13:42 ` [PATCH v8 13/17] iomap: Support partial direct I/O on user copy failures Andreas Gruenbacher
2021-10-19 13:42 ` [PATCH v8 14/17] iomap: Add done_before argument to iomap_dio_rw Andreas Gruenbacher
2021-10-19 15:51 ` Darrick J. Wong
2021-10-19 19:30 ` Andreas Gruenbacher
2021-10-20 1:57 ` Darrick J. Wong
2021-10-19 13:42 ` [PATCH v8 15/17] gup: Introduce FOLL_NOFAULT flag to disable page faults Andreas Gruenbacher
2021-10-19 13:42 ` [PATCH v8 16/17] iov_iter: Introduce nofault " Andreas Gruenbacher
2021-10-19 13:42 ` [PATCH v8 17/17] gfs2: Fix mmap + page fault deadlocks for direct I/O Andreas Gruenbacher
2021-10-19 15:40 ` [PATCH v8 00/17] gfs2: Fix mmap + page fault deadlocks Linus Torvalds
2021-10-19 16:00 ` Bob Peterson
2021-10-20 16:36 ` Catalin Marinas
2021-10-20 20:11 ` Linus Torvalds
2021-10-20 22:44 ` Catalin Marinas
2021-10-21 6:19 ` Linus Torvalds
2021-10-22 18:06 ` Catalin Marinas
2021-10-22 19:22 ` Linus Torvalds
2021-10-25 19:00 ` Andreas Gruenbacher
2021-10-26 18:24 ` Catalin Marinas
2021-10-26 18:50 ` Linus Torvalds
2021-10-26 19:18 ` Linus Torvalds
2021-10-27 19:13 ` Catalin Marinas
2021-10-27 21:14 ` Linus Torvalds
2021-10-28 21:20 ` Catalin Marinas
2021-10-28 21:40 ` Catalin Marinas
2021-10-28 22:15 ` Andreas Grünbacher
2021-10-29 12:50 ` Catalin Marinas
2021-10-28 22:32 ` Linus Torvalds
2021-10-29 17:50 ` Catalin Marinas [this message]
2021-10-29 18:47 ` Linus Torvalds
2021-10-25 18:24 ` Andreas Gruenbacher
2021-10-26 5:12 ` Theodore Ts'o
2021-10-26 9:44 ` Andreas Gruenbacher
2021-10-27 21:21 ` Andreas Gruenbacher
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YXw0a9n+/PLAcObB@arm.com \
--to=catalin.marinas@arm.com \
--cc=agruenba@redhat.com \
--cc=cluster-devel@redhat.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=kvm-ppc@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=ocfs2-devel@oss.oracle.com \
--cc=paulus@ozlabs.org \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox