linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: "Christoph Lameter (Ampere)" <cl@gentwo.org>,
	will@kernel.org, anshuman.khandual@arm.com, david@redhat.com,
	scott@os.amperecomputing.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Thu, 11 Jul 2024 18:43:59 +0100	[thread overview]
Message-ID: <ZpAZ39VQhxfNepWT@arm.com> (raw)
In-Reply-To: <6167c4ce-fef0-4af4-a6a1-9fe7b2eb023d@os.amperecomputing.com>

On Wed, Jul 10, 2024 at 11:43:18AM -0700, Yang Shi wrote:
> On 7/10/24 2:22 AM, Catalin Marinas wrote:
> > > diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
> > > index 642bdf908b22..d30265d424e4 100644
> > > --- a/arch/arm64/mm/mmap.c
> > > +++ b/arch/arm64/mm/mmap.c
> > > @@ -19,7 +19,7 @@ static pgprot_t protection_map[16] __ro_after_init = {
> > >          [VM_WRITE]                                      = PAGE_READONLY,
> > >          [VM_WRITE | VM_READ]                            = PAGE_READONLY,
> > >          /* PAGE_EXECONLY if Enhanced PAN */
> > > -        [VM_EXEC]                                       = PAGE_READONLY_EXEC,
> > > +        [VM_EXEC]                                       = PAGE_EXECONLY,
> > >          [VM_EXEC | VM_READ]                             = PAGE_READONLY_EXEC,
> > >          [VM_EXEC | VM_WRITE]                            = PAGE_READONLY_EXEC,
> > >          [VM_EXEC | VM_WRITE | VM_READ]                  = PAGE_READONLY_EXEC,
> > 
> > In theory you'd need to change the VM_SHARED | VM_EXEC entry as well.
> > Otherwise it looks fine.
> 
> Thanks. I just ran the same benchmark. Ran the modified page_fault1_thread
> (trigger read fault) in 100 iterations with 160 threads on 160 cores. This
> should be the worst contention case and collected the max data (worst
> latency). It shows the patch may incur ~30% overhead for exec-only case. The
> overhead should just come from the permission fault.
> 
>     N           Min           Max        Median           Avg Stddev
> x 100        163840        219083        184471        183262 12593.229
> + 100        211198        285947        233608     238819.98 15253.967
> Difference at 95.0% confidence
>     55558 +/- 3877
>     30.3161% +/- 2.11555%
> 
> This is a very extreme benchmark, I don't think any real life workload will
> spend that much time (sys vs user) in page fault, particularly read fault.
> 
> With my atomic fault benchmark (populate 1G memory with atomic instruction
> then manipulate the value stored in the memory in 100 iterations so the user
> time is much longer than sys time), I saw around 13% overhead on sys time
> due to the permission fault, but no noticeable change for user and real
> time.

Thanks for running these tests.

> So the permission fault does incur noticeable overhead for read fault on
> exec-only, but it may be not that bad for real life workloads.

So you are saying 3 faults is not that bad for real life workloads but
the 2 faults behaviour we have currently is problematic for OpenJDK. For
the OpenJDK case, I don't think the faulting is the main issue affecting
run-time performance but, over a longer run (or with more iterations in
this benchmark after the faulting in), you'd notice the breaking up of
the initial THP pages.

Do you have any OpenJDK benchmarks that show the problem? It might be
worth running them with this patch:

https://lore.kernel.org/r/rjudrmg7nkkwfgviqeqluuww6wu6fdrgdsfimtmpjee7lkz2ej@iosd2f6pk4f7

Or, if not, do you see any difference in the user time in your benchmark
with the above mm patch? In subsequent iterations, linear accesses are
not ideal for testing. Better to have some random accesses this 1GB of
memory (after the initial touching). That would mimic heap accesses a
bit better.

Anyway, as it stands, I don't think we should merge this patch since it
does incur an additional penalty with exec-only mappings and it would
make things even worse for OpenJDK if distros change their default
toolchain flags at some point to generate exec-only ELF text sections.
While we could work around this by allowing the kernel to read the
exec-only user mapping with a new flavour of get_user() (toggling PAN as
well), I don't think it's worth it. Especially with the above mm change,
I think the benefits of this patch aren't justified. Longer term, I hope
that customers upgrade to OpenJDK v22 or, for proprietary tools, they
either follow the madvise() approach or wait for the Arm architects to
change the hardware behaviour.

-- 
Catalin


  reply	other threads:[~2024-07-11 17:44 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-26 19:18 [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions Yang Shi
2024-06-28 16:54 ` Catalin Marinas
2024-06-28 16:57   ` Christoph Lameter (Ampere)
2024-06-28 17:24     ` Catalin Marinas
2024-06-28 18:20       ` Yang Shi
2024-07-01 19:43         ` Catalin Marinas
2024-07-02 10:26           ` Ryan Roberts
2024-07-02 11:22             ` David Hildenbrand
2024-07-02 12:36               ` Ryan Roberts
2024-07-02 12:58                 ` David Hildenbrand
2024-07-02 13:26                   ` Ryan Roberts
2024-07-02 13:50                     ` David Hildenbrand
2024-07-02 14:51                       ` Ryan Roberts
2024-07-15 13:09             ` Ryan Roberts
2024-07-02 22:21           ` Yang Shi
2024-07-04 10:03             ` Catalin Marinas
2024-07-05 17:05               ` Christoph Lameter (Ampere)
2024-07-05 18:24                 ` Catalin Marinas
2024-07-05 18:51                   ` Christoph Lameter (Ampere)
2024-07-06  9:47                     ` Catalin Marinas
2024-07-09 17:56               ` Yang Shi
2024-07-09 18:35                 ` Catalin Marinas
2024-07-09 22:29                   ` Yang Shi
2024-07-10  9:22                     ` Catalin Marinas
2024-07-10 18:43                       ` Yang Shi
2024-07-11 17:43                         ` Catalin Marinas [this message]
2024-07-11 18:17                           ` Yang Shi
2024-08-13 17:09                             ` Yang Shi
2024-08-21 10:18                             ` Catalin Marinas
2024-08-21 11:32                               ` Dev Jain
2024-08-23  9:59                               ` Will Deacon
2024-06-28 18:26       ` Christoph Lameter (Ampere)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZpAZ39VQhxfNepWT@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=cl@gentwo.org \
    --cc=david@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=scott@os.amperecomputing.com \
    --cc=will@kernel.org \
    --cc=yang@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).