From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>,
Yang Shi <yang@os.amperecomputing.com>
Cc: "Christoph Lameter (Ampere)" <cl@gentwo.org>,
will@kernel.org, anshuman.khandual@arm.com, david@redhat.com,
scott@os.amperecomputing.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, Dev Jain <dev.jain@arm.com>
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Mon, 15 Jul 2024 14:09:15 +0100 [thread overview]
Message-ID: <4595f655-aaee-4b5c-8988-0804cedda14c@arm.com> (raw)
In-Reply-To: <3743d7e1-0b79-4eaf-82d5-d1ca29fe347d@arm.com>
On 02/07/2024 11:26, Ryan Roberts wrote:
> On 01/07/2024 20:43, Catalin Marinas wrote:
>> On Fri, Jun 28, 2024 at 11:20:43AM -0700, Yang Shi wrote:
>>> On 6/28/24 10:24 AM, Catalin Marinas wrote:
>>>> This patch does feel a bit like working around a non-optimal user choice
>>>> in kernel space. Who knows, madvise() may even be quicker if you do a
>>>> single call for a larger VA vs touching each page.
>>>
>>> IMHO, I don't think so. I viewed this patch to solve or workaround some ISA
>>> inefficiency in kernel. Two faults are not necessary if we know we are
>>> definitely going to write the memory very soon, right?
>>
>> I agree the Arm architecture behaviour is not ideal here and any
>> timelines for fixing it in hardware, if they do happen, are far into the
>> future. Purely from a kernel perspective, what I want though is make
>> sure that longer term (a) we don't create additional maintenance burden
>> and (b) we don't keep dead code around.
>>
>> Point (a) could be mitigated if the architecture is changed so that any
>> new atomic instructions added to this range would also come with
>> additional syndrome information so that we don't have to update the
>> decoding patterns.
>>
>> Point (b), however, depends on the OpenJDK and the kernel versions in
>> distros. Nick Gasson kindly provided some information on the OpenJDK
>> changes. The atomic_add(0) change happened in early 2022, about 5-6
>> months after MADV_POPULATE_WRITE support was added to the kernel. What's
>> interesting is Ampere already contributed MADV_POPULATE_WRITE support to
>> OpenJDK a few months ago:
>>
>> https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a
>>
>> The OpenJDK commit lacks explanation but what I gathered from the diff
>> is that this option is the preferred one in the presence of THP (which
>> most/all distros enable by default). If we merge your proposed kernel
>> patch, it will take time before it makes its way into distros. I'm
>> hoping that by that time, distros would have picked a new OpenJDK
>> version already that doesn't need the atomic_add(0) pattern. If that's
>> the case, we end up with some dead code in the kernel that's almost
>> never exercised.
>>
>> I don't follow OpenJDK development but I heard that updates are dragging
>> quite a lot. I can't tell whether people have picked up the
>> atomic_add(0) feature and whether, by the time a kernel patch would make
>> it into distros, they'd also move to the MADV_POPULATE_WRITE pattern.
>>
>> There's a point (c) as well on the overhead of reading the faulting
>> instruction. I hope that's negligible but I haven't measured it.
>>
>
> Just to add to this, I note the existing kernel behaviour is that if a write
> fault happens in a region that has a (RO) huge zero page mapped at PMD level,
> then the PMD is shattered, the PTE of the fault address is populated with a
> writable page and the remaining PTEs are populated with order-0 zero pages
> (read-only).
>
> This seems like odd behaviour to me. Surely it would be less effort and more
> aligned with the app's expectations to notice the huge zero page in the PMD,
> remove it, and install a THP, as would have been done if pmd_none() was true? I
> don't think there is a memory bloat argument here because, IIUC, with the
> current behaviour, khugepaged would eventually upgrade it to a THP anyway?
>
> Changing to this new behaviour would only be a partial solution for your use
> case, since you would still have 2 faults. But it would remove the cost of the
> shattering and ensure you have a THP immediately after the write fault. But I
> can't think of a reason why this wouldn't be a generally useful change
> regardless? Any thoughts?
Hi All,
FYI, I had some more conversation on this at [1] and the conclusion was that the
current kernel behaviour is undesirable and we should change it so that if a
write fault occurs in a ragion mapped by a huge zero page, then the huge zero
page is uninstalled and a (private) THP is installed in its place. Shattering of
the huge zero page to PTEs should *not* occur.
So we will make this change in due course and submit to the list.
[1] https://lore.kernel.org/linux-mm/1cfae0c0-96a2-4308-9c62-f7a640520242@arm.com/
Thanks,
Ryan
>
> Thanks,
> Ryan
>
next prev parent reply other threads:[~2024-07-15 13:09 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-26 19:18 [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions Yang Shi
2024-06-28 16:54 ` Catalin Marinas
2024-06-28 16:57 ` Christoph Lameter (Ampere)
2024-06-28 17:24 ` Catalin Marinas
2024-06-28 18:20 ` Yang Shi
2024-07-01 19:43 ` Catalin Marinas
2024-07-02 10:26 ` Ryan Roberts
2024-07-02 11:22 ` David Hildenbrand
2024-07-02 12:36 ` Ryan Roberts
2024-07-02 12:58 ` David Hildenbrand
2024-07-02 13:26 ` Ryan Roberts
2024-07-02 13:50 ` David Hildenbrand
2024-07-02 14:51 ` Ryan Roberts
2024-07-15 13:09 ` Ryan Roberts [this message]
2024-07-02 22:21 ` Yang Shi
2024-07-04 10:03 ` Catalin Marinas
2024-07-05 17:05 ` Christoph Lameter (Ampere)
2024-07-05 18:24 ` Catalin Marinas
2024-07-05 18:51 ` Christoph Lameter (Ampere)
2024-07-06 9:47 ` Catalin Marinas
2024-07-09 17:56 ` Yang Shi
2024-07-09 18:35 ` Catalin Marinas
2024-07-09 22:29 ` Yang Shi
2024-07-10 9:22 ` Catalin Marinas
2024-07-10 18:43 ` Yang Shi
2024-07-11 17:43 ` Catalin Marinas
2024-07-11 18:17 ` Yang Shi
2024-08-13 17:09 ` Yang Shi
2024-08-21 10:18 ` Catalin Marinas
2024-08-21 11:32 ` Dev Jain
2024-08-23 9:59 ` Will Deacon
2024-06-28 18:26 ` Christoph Lameter (Ampere)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4595f655-aaee-4b5c-8988-0804cedda14c@arm.com \
--to=ryan.roberts@arm.com \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=cl@gentwo.org \
--cc=david@redhat.com \
--cc=dev.jain@arm.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=scott@os.amperecomputing.com \
--cc=will@kernel.org \
--cc=yang@os.amperecomputing.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).