From: Catalin Marinas <catalin.marinas@arm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: peterx@redhat.com, will@kernel.org, scott@os.amperecomputing.com,
cl@gentwo.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Fri, 17 May 2024 18:25:42 +0100
Message-ID: <ZkeTFiF_OOy80stO@arm.com>
In-Reply-To: <570c686c-6aa1-43f0-ba31-3597a329e037@os.amperecomputing.com>
On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
> On 5/14/24 3:39 AM, Catalin Marinas wrote:
> > It would be good to understand why openjdk is doing this instead of a
> > plain write. Is it because it may be racing with some other threads
> > already using the heap? That would be a valid pattern.
>
> Yes, you are right. I think I quoted the JVM justification in earlier email,
> anyway they said "permit use of memory concurrently with pretouch".
Ah, sorry, I missed that. This seems like a valid reason.
> > A point Will raised was on potential ABI changes introduced by this
> > patch. The ESR_EL1 reported to user remains the same as per the hardware
> > spec (read-only), so from a SIGSEGV we may have some slight behaviour
> > changes:
> >
> > 1. PTE invalid:
> >
> > a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
> > ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
> > patch, the PTE is mapped as PTE_RDONLY first and a subsequent
> > fault will report SIGSEGV with ESR_EL1.WnR == 1.
>
> I think I can do something like the below conceptually:
>
> if is_el0_atomic_instr && !is_write_abort
> force_write = true
>
> if VM_READ && !VM_WRITE && force_write == true
Nit: write implies read, so you only need to check !write.
> vm_flags = VM_READ
> mm_flags &= ~FAULT_FLAG_WRITE
>
> Then we just fallback to read fault. The following write fault will trigger
> SIGSEGV with consistent ABI.
I think this should work. So instead of reporting the write fault
directly in the case of a read-only vma, we let the core code handle the
read fault first and then retry the atomic instruction.
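As a rough illustration of the fallback discussed above, here is a minimal, self-contained C sketch. It is not the actual patch: the SK_* constants, struct sk_fault_plan and sk_plan_fault() are hypothetical stand-ins, and the kernel's real VM_* and FAULT_FLAG_* values differ. It only models the decision: an EL0 atomic RMW that the hardware reported as a read is promoted to a write fault, except on a vma without write permission, where it drops back to a plain read fault.

```c
#include <stdbool.h>

/* Stand-in values for this sketch only, not the kernel's definitions. */
#define SK_VM_READ          (1UL << 0)
#define SK_VM_WRITE         (1UL << 1)
#define SK_FAULT_FLAG_WRITE (1U << 0)

/* What the fault handler would ask of the core mm code (hypothetical type). */
struct sk_fault_plan {
	unsigned long vm_flags;   /* permission the vma must grant */
	unsigned int mm_flags;    /* flags passed on to the core fault handler */
};

static inline struct sk_fault_plan
sk_plan_fault(bool is_el0_atomic_instr, bool is_write_abort,
	      unsigned long vma_flags)
{
	/* Default: treat the abort as a read fault. */
	struct sk_fault_plan plan = { SK_VM_READ, 0 };
	bool force_write = is_el0_atomic_instr && !is_write_abort;

	if (is_write_abort || force_write) {
		plan.vm_flags = SK_VM_WRITE;
		plan.mm_flags |= SK_FAULT_FLAG_WRITE;
	}

	/*
	 * Forced write on a vma without write permission: fall back to a
	 * read fault. Checking !write is enough, as noted in the nit above.
	 */
	if (force_write && !(vma_flags & SK_VM_WRITE)) {
		plan.vm_flags = SK_VM_READ;
		plan.mm_flags &= ~SK_FAULT_FLAG_WRITE;
	}

	return plan;
}
```

With this shape, an atomic on a read-only vma takes the read fault first, and the retried instruction then raises a genuine permission fault with WnR == 1, matching the behaviour without the patch.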
> > b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
> > ESR_EL1.WnR == 0, so no change from current behaviour, unless we
> > fix the patch for (1.a) to fake the WnR bit which would change the
> > current expectations.
> >
> > 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
> > hardware, no need to fix ESR_EL1 up.
> >
> > The patch would have to address (1) above but faking the ESR_EL1.WnR bit
> > based on the vma flags looks a bit fragile.
>
> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.
I agree, with your approach above we don't need to fake WnR.
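For reference, a small sketch of how a consumer of the reported syndrome might test the WnR bit. It assumes the ESR value has already been obtained (for example from the signal frame) and that the exception is a data abort, since WnR is only meaningful there. The helper name sk_esr_is_write() and the SK_ESR_WNR_SHIFT macro are made up for this example; the bit position (6) follows the architecture's ISS encoding for data aborts.

```c
#include <stdbool.h>
#include <stdint.h>

/* WnR is bit 6 of the ISS field for data aborts (macro name is hypothetical). */
#define SK_ESR_WNR_SHIFT 6

/* Returns true if a data abort syndrome reports a write (WnR == 1). */
static inline bool sk_esr_is_write(uint64_t esr)
{
	return (esr >> SK_ESR_WNR_SHIFT) & 1;
}
```

Because the fallback leaves the hardware-reported value untouched, a check like this sees the same WnR as it would without the patch.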
> > Similarly, we have userfaultfd that reports the fault to user. I think
> > in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
> > your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
> > indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
> > and UFFD_PAGEFAULT_FLAG_WRITE set.
>
> I don't quite get what the problem is. IIUC, uffd just needs a signal from
> kernel to tell this area will be written. It seems not break the semantic.
> Added Peter Xu in this loop, who is the uffd developer. He may shed some
> light.
Not really familiar with uffd but just looking at the code, if a handler
is registered for both MODE_MISSING and MODE_WP, currently the atomic
instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
do_anonymous_page() path). If the page is mapped by the uffd handler as
the zero page, a restart of the instruction would signal
UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
path).

With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
the first attempt, just like having an STR instruction instead of a
separate LDR + STR (which is how the atomics currently behave from a
fault perspective). However, I don't think that's a problem: the uffd
handler should cope with an STR anyway, so it's not some unexpected
combination of flags.
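The flag combinations described above can be modelled with a short C sketch. This is only a model of what the thread describes, not kernel code: the SK_UFFD_* constants stand in for UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP, and the enum and function names are invented for the example.

```c
#include <stdbool.h>

/* Stand-ins for UFFD_PAGEFAULT_FLAG_WRITE / UFFD_PAGEFAULT_FLAG_WP. */
#define SK_UFFD_FLAG_WRITE (1U << 0)
#define SK_UFFD_FLAG_WP    (1U << 1)

enum sk_access { SK_LDR, SK_STR, SK_ATOMIC_RMW };

/*
 * Flags reported for the first fault on a missing page (the
 * do_anonymous_page() path in the discussion above).
 */
static inline unsigned int
sk_missing_fault_flags(enum sk_access access, bool force_write_patch)
{
	if (access == SK_STR)
		return SK_UFFD_FLAG_WRITE;
	/* With the patch, an atomic RMW looks like an STR on first touch. */
	if (access == SK_ATOMIC_RMW && force_write_patch)
		return SK_UFFD_FLAG_WRITE;
	/* A load, or an atomic without the patch, is reported as a read. */
	return 0;
}

/*
 * Flags reported when a write hits a present but write-protected page
 * such as the zero page (the do_wp_page() path).
 */
static inline unsigned int sk_wp_fault_flags(void)
{
	return SK_UFFD_FLAG_WRITE | SK_UFFD_FLAG_WP;
}
```

In this model the patched atomic case returns exactly what an STR returns, which is why a handler that already copes with STR sees nothing new.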
--
Catalin