Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions


From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
	Yang Shi <yang@os.amperecomputing.com>
Cc: "Christoph Lameter (Ampere)" <cl@gentwo.org>,
	will@kernel.org, anshuman.khandual@arm.com, david@redhat.com,
	scott@os.amperecomputing.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Dev Jain <dev.jain@arm.com>
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Mon, 15 Jul 2024 14:09:15 +0100
Message-ID: <4595f655-aaee-4b5c-8988-0804cedda14c@arm.com>
In-Reply-To: <3743d7e1-0b79-4eaf-82d5-d1ca29fe347d@arm.com>

On 02/07/2024 11:26, Ryan Roberts wrote:
> On 01/07/2024 20:43, Catalin Marinas wrote:
>> On Fri, Jun 28, 2024 at 11:20:43AM -0700, Yang Shi wrote:
>>> On 6/28/24 10:24 AM, Catalin Marinas wrote:
>>>> This patch does feel a bit like working around a non-optimal user choice
>>>> in kernel space. Who knows, madvise() may even be quicker if you do a
>>>> single call for a larger VA vs touching each page.
>>>
>>> IMHO, I don't think so. I view this patch as solving or working around some
>>> ISA inefficiency in the kernel. Two faults are not necessary if we know we are
>>> definitely going to write the memory very soon, right?
>>
>> I agree the Arm architecture behaviour is not ideal here and any
>> timelines for fixing it in hardware, if they do happen, are far into the
>> future. Purely from a kernel perspective, what I want though is to make
>> sure that longer term (a) we don't create additional maintenance burden
>> and (b) we don't keep dead code around.
>>
>> Point (a) could be mitigated if the architecture is changed so that any
>> new atomic instructions added to this range would also come with
>> additional syndrome information so that we don't have to update the
>> decoding patterns.
>>
>> Point (b), however, depends on the OpenJDK and the kernel versions in
>> distros. Nick Gasson kindly provided some information on the OpenJDK
>> changes. The atomic_add(0) change happened in early 2022, about 5-6
>> months after MADV_POPULATE_WRITE support was added to the kernel. What's
>> interesting is Ampere already contributed MADV_POPULATE_WRITE support to
>> OpenJDK a few months ago:
>>
>> https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a
>>
>> The OpenJDK commit lacks explanation but what I gathered from the diff
>> is that this option is the preferred one in the presence of THP (which
>> most/all distros enable by default). If we merge your proposed kernel
>> patch, it will take time before it makes its way into distros. I'm
>> hoping that by that time, distros would have picked a new OpenJDK
>> version already that doesn't need the atomic_add(0) pattern. If that's
>> the case, we end up with some dead code in the kernel that's almost
>> never exercised.
>>
>> I don't follow OpenJDK development but I heard that updates are dragging
>> quite a lot. I can't tell whether people have picked up the
>> atomic_add(0) feature and whether, by the time a kernel patch would make
>> it into distros, they'd also move to the MADV_POPULATE_WRITE pattern.
>>
>> There's a point (c) as well on the overhead of reading the faulting
>> instruction. I hope that's negligible but I haven't measured it.
>>
> 
> Just to add to this, I note the existing kernel behaviour is that if a write
> fault happens in a region that has a (RO) huge zero page mapped at PMD level,
> then the PMD is shattered, the PTE of the fault address is populated with a
> writable page and the remaining PTEs are populated with order-0 zero pages
> (read-only).
> 
> This seems like odd behaviour to me. Surely it would be less effort and more
> aligned with the app's expectations to notice the huge zero page in the PMD,
> remove it, and install a THP, as would have been done if pmd_none() was true? I
> don't think there is a memory bloat argument here because, IIUC, with the
> current behaviour, khugepaged would eventually upgrade it to a THP anyway?
> 
> Changing to this new behaviour would only be a partial solution for your use
> case, since you would still have 2 faults. But it would remove the cost of the
> shattering and ensure you have a THP immediately after the write fault. But I
> can't think of a reason why this wouldn't be a generally useful change
> regardless? Any thoughts?
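
For readers following the quoted discussion of MADV_POPULATE_WRITE versus the
atomic_add(0) touch-every-page pattern, here is a minimal userspace sketch of
the two approaches. It is not taken from OpenJDK or from the patch under
review: the helper name prefault_writable() and its return convention are
made up for illustration, and the fallback assumes the GCC/Clang
__atomic_fetch_add builtin. MADV_POPULATE_WRITE itself is a real madvise()
advice value on Linux 5.14 and later.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* value from the Linux UAPI headers (5.14+) */
#endif

/*
 * Hypothetical helper: make [addr, addr + len) resident and writable.
 * Returns 0 if a single madvise() call did the work, 1 if the per-page
 * fallback was used instead.
 */
static int prefault_writable(void *addr, size_t len)
{
    /* One call for the whole range; the kernel takes write faults for us. */
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;

    /*
     * Fallback (for example EINVAL on kernels without MADV_POPULATE_WRITE):
     * an atomic add of 0 to one byte per page is a write access that leaves
     * the data unchanged. This is the pattern the thread calls atomic_add(0).
     */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < len; off += page)
        __atomic_fetch_add((volatile char *)addr + off, 0, __ATOMIC_RELAXED);
    return 1;
}
```

A caller would mmap() an anonymous private region and pass it to
prefault_writable() before handing the memory to latency-sensitive code. Which
of the two paths is faster is exactly the open question in the quoted text, so
the sketch makes no claim about that.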

Hi All,

FYI, I had some more conversation on this at [1] and the conclusion was that the
current kernel behaviour is undesirable and we should change it so that if a
write fault occurs in a region mapped by a huge zero page, then the huge zero
page is uninstalled and a (private) THP is installed in its place. Shattering of
the huge zero page to PTEs should *not* occur.
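
To make the difference between the two behaviours concrete, here is a toy
model in plain C. It is not kernel code and does not reflect how the fault
path is actually implemented: the types, the function names, and the figure of
512 PTEs per PMD (which assumes 4K base pages and a 2M PMD mapping) are all
assumptions chosen for illustration.

```c
#define PTES_PER_PMD 512 /* assumption: 4K base pages, 2M PMD mapping */

/* Hypothetical, simplified view of what one PMD slot can hold. */
enum pmd_kind { PMD_NONE, PMD_HUGE_ZERO, PMD_PTE_TABLE, PMD_THP };

struct pmd_model {
    enum pmd_kind kind;
    int writable_ptes; /* PTEs mapping a private writable order-0 page */
    int zero_ptes;     /* PTEs mapping the read-only order-0 zero page */
};

/*
 * Behaviour described as current: a write fault on a huge zero page PMD
 * shatters it into a PTE table with one writable page at the fault address
 * and read-only zero pages everywhere else. Other states are left untouched
 * because this model only covers the huge zero page case.
 */
static struct pmd_model write_fault_current(struct pmd_model pmd)
{
    if (pmd.kind == PMD_HUGE_ZERO) {
        pmd.kind = PMD_PTE_TABLE;
        pmd.writable_ptes = 1;
        pmd.zero_ptes = PTES_PER_PMD - 1;
    }
    return pmd;
}

/*
 * Behaviour proposed above: the huge zero page is uninstalled and a private
 * THP is installed in its place, so no PTE table is created.
 */
static struct pmd_model write_fault_proposed(struct pmd_model pmd)
{
    if (pmd.kind == PMD_HUGE_ZERO) {
        pmd.kind = PMD_THP;
        pmd.writable_ptes = 0;
        pmd.zero_ptes = 0;
    }
    return pmd;
}
```

In this model the current behaviour leaves 511 read-only zero-page PTEs behind
after the first write, each of which can take its own write fault later, while
the proposed behaviour ends with a single PMD-level THP mapping.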

So we will make this change in due course and submit it to the list.

[1] https://lore.kernel.org/linux-mm/1cfae0c0-96a2-4308-9c62-f7a640520242@arm.com/

Thanks,
Ryan

> 
> Thanks,
> Ryan
>
