All of lore.kernel.org
 help / color / mirror / Atom feed
From: Muchun Song <muchun.song@linux.dev>
To: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Mark Brown <broonie@kernel.org>
Subject: Re: [PATCH V2] arm64/mm: Intercept pfn changes in set_pte_at()
Date: Wed, 8 Feb 2023 11:13:46 +0800	[thread overview]
Message-ID: <FB3AE2F1-2EF8-4960-884E-DABA7E64CE59@linux.dev> (raw)
In-Reply-To: <20230207143131.GA12475@willie-the-truck>



> On Feb 7, 2023, at 22:31, Will Deacon <will@kernel.org> wrote:
> 
> On Mon, Feb 06, 2023 at 11:28:12AM +0800, Muchun Song wrote:
>> 
>> 
>>> On Feb 3, 2023, at 18:10, Will Deacon <will@kernel.org> wrote:
>>> 
>>> On Fri, Feb 03, 2023 at 10:40:18AM +0800, Muchun Song wrote:
>>>> 
>>>> 
>>>>> On Feb 2, 2023, at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>> 
>>>>> On Thu, Feb 02, 2023 at 05:51:39PM +0800, Muchun Song wrote:
>>>>>>> On Feb 1, 2023, at 20:20, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>>>>> Bah, sorry! Catalin reckons it may have been him talking about the vmemmap.
>>>>>>> 
>>>>>>> Indeed. The discussion with Anshuman started from this thread:
>>>>>>> 
>>>>>>> https://lore.kernel.org/all/20221025014215.3466904-1-mawupeng1@huawei.com/
>>>>>>> 
>>>>>>> We already trip over the existing checks even without Anshuman's patch,
>>>>>>> though only by chance. We are not setting the software PTE_DIRTY on the
>>>>>>> new pte (we don't bother with this bit for kernel mappings).
>>>>>>> 
>>>>>>> Given that the vmemmap ptes are still live when such change happens and
>>>>>>> no-one came with a solution to the break-before-make problem, I propose
>>>>>>> we revert the arm64 part of commit 47010c040dec ("mm: hugetlb_vmemmap:
>>>>>>> cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*"). We just need this hunk:
>>>>>>> 
>>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>> index 27b2592698b0..5263454a5794 100644
>>>>>>> --- a/arch/arm64/Kconfig
>>>>>>> +++ b/arch/arm64/Kconfig
>>>>>>> @@ -100,7 +100,6 @@ config ARM64
>>>>>>> select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>>>>>> select ARCH_WANT_FRAME_POINTERS
>>>>>>> select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
>>>>>>> - select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>>>>>> 
>>>>>> Maybe it is a little overkill for HVO as it can significantly minimize the
>>>>>> overhead of vmemmap on ARM64 servers for some workloads (like qemu, DPDK).
>>>>>> So I don't think disabling it is a good approach. Indeed, HVO broke BBM,
>>>>>> but the waring does not affect anything since the tail vmemmap pages are
>>>>>> supposed to be read-only. So, I suggest skipping warnings if it is the
>>>>>> vmemmap address in set_pte_at(). What do you think of?
>>>>> 
>>>>> IIUC, vmemmap_remap_pte() not only makes the pte read-only but also
>>>>> changes the output address. Architecturally, this needs a BBM sequence.
>>>>> We can avoid going through an invalid pte if we first make the pte
>>>>> read-only, TLBI but keeping the same pfn, followed by a change of the
>>>>> pfn while keeping the pte readonly. This also assumes that the content
>>>>> of the page pointed at by the pte is the same at both old and new pfn.
>>>> 
>>>> Right. I think using BBM is to avoid possibly creating multiple TLB entries
>>>> for the same address for a extremely short period. But accessing either the
>>>> old page or the new page is fine in this case. Is it acceptable for this
>>>> special case without using BBM?
>>> 
>>> Sadly, the architecture allows the CPU to conjure up a mapping based on a
>>> combination of the old and the new descriptor (a process known as
>>> "amalgamation") so we _really_ need the BBM sequence.
>> 
>> I am not familiar with ARM64, what's the user-visible effect if this
>> "amalgamation" occurs?
> 
> The user-visible effects would probably be data corruption and instability,
> since the amalgamated TLB entry could result in a bogus physical address and
> bogus permissions.

You mean the output address of amalgamated TLB entry is neither the old
address (before updated) nor the new address (after updated)? So it is
a bogus physical address? Is there any specifications to describe the
rules of how to create a amalgamated TLB entry? Thanks.

Muchun

> 
> Will


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Muchun Song <muchun.song@linux.dev>
To: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Mark Rutland <mark.rutland@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, Mark Brown <broonie@kernel.org>
Subject: Re: [PATCH V2] arm64/mm: Intercept pfn changes in set_pte_at()
Date: Wed, 8 Feb 2023 11:13:46 +0800	[thread overview]
Message-ID: <FB3AE2F1-2EF8-4960-884E-DABA7E64CE59@linux.dev> (raw)
In-Reply-To: <20230207143131.GA12475@willie-the-truck>



> On Feb 7, 2023, at 22:31, Will Deacon <will@kernel.org> wrote:
> 
> On Mon, Feb 06, 2023 at 11:28:12AM +0800, Muchun Song wrote:
>> 
>> 
>>> On Feb 3, 2023, at 18:10, Will Deacon <will@kernel.org> wrote:
>>> 
>>> On Fri, Feb 03, 2023 at 10:40:18AM +0800, Muchun Song wrote:
>>>> 
>>>> 
>>>>> On Feb 2, 2023, at 18:45, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>> 
>>>>> On Thu, Feb 02, 2023 at 05:51:39PM +0800, Muchun Song wrote:
>>>>>>> On Feb 1, 2023, at 20:20, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>>>>>> Bah, sorry! Catalin reckons it may have been him talking about the vmemmap.
>>>>>>> 
>>>>>>> Indeed. The discussion with Anshuman started from this thread:
>>>>>>> 
>>>>>>> https://lore.kernel.org/all/20221025014215.3466904-1-mawupeng1@huawei.com/
>>>>>>> 
>>>>>>> We already trip over the existing checks even without Anshuman's patch,
>>>>>>> though only by chance. We are not setting the software PTE_DIRTY on the
>>>>>>> new pte (we don't bother with this bit for kernel mappings).
>>>>>>> 
>>>>>>> Given that the vmemmap ptes are still live when such change happens and
>>>>>>> no-one came with a solution to the break-before-make problem, I propose
>>>>>>> we revert the arm64 part of commit 47010c040dec ("mm: hugetlb_vmemmap:
>>>>>>> cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*"). We just need this hunk:
>>>>>>> 
>>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>> index 27b2592698b0..5263454a5794 100644
>>>>>>> --- a/arch/arm64/Kconfig
>>>>>>> +++ b/arch/arm64/Kconfig
>>>>>>> @@ -100,7 +100,6 @@ config ARM64
>>>>>>> select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
>>>>>>> select ARCH_WANT_FRAME_POINTERS
>>>>>>> select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
>>>>>>> - select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>>>>>> 
>>>>>> Maybe it is a little overkill for HVO as it can significantly minimize the
>>>>>> overhead of vmemmap on ARM64 servers for some workloads (like qemu, DPDK).
>>>>>> So I don't think disabling it is a good approach. Indeed, HVO broke BBM,
>>>>>> but the waring does not affect anything since the tail vmemmap pages are
>>>>>> supposed to be read-only. So, I suggest skipping warnings if it is the
>>>>>> vmemmap address in set_pte_at(). What do you think of?
>>>>> 
>>>>> IIUC, vmemmap_remap_pte() not only makes the pte read-only but also
>>>>> changes the output address. Architecturally, this needs a BBM sequence.
>>>>> We can avoid going through an invalid pte if we first make the pte
>>>>> read-only, TLBI but keeping the same pfn, followed by a change of the
>>>>> pfn while keeping the pte readonly. This also assumes that the content
>>>>> of the page pointed at by the pte is the same at both old and new pfn.
>>>> 
>>>> Right. I think using BBM is to avoid possibly creating multiple TLB entries
>>>> for the same address for a extremely short period. But accessing either the
>>>> old page or the new page is fine in this case. Is it acceptable for this
>>>> special case without using BBM?
>>> 
>>> Sadly, the architecture allows the CPU to conjure up a mapping based on a
>>> combination of the old and the new descriptor (a process known as
>>> "amalgamation") so we _really_ need the BBM sequence.
>> 
>> I am not familiar with ARM64, what's the user-visible effect if this
>> "amalgamation" occurs?
> 
> The user-visible effects would probably be data corruption and instability,
> since the amalgamated TLB entry could result in a bogus physical address and
> bogus permissions.

You mean the output address of amalgamated TLB entry is neither the old
address (before updated) nor the new address (after updated)? So it is
a bogus physical address? Is there any specifications to describe the
rules of how to create a amalgamated TLB entry? Thanks.

Muchun

> 
> Will


  reply	other threads:[~2023-02-08  3:16 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-09  5:28 [PATCH V2] arm64/mm: Intercept pfn changes in set_pte_at() Anshuman Khandual
2023-01-09  5:28 ` Anshuman Khandual
2023-01-24  5:41 ` Anshuman Khandual
2023-01-24  5:41   ` Anshuman Khandual
2023-01-26 13:33   ` Will Deacon
2023-01-26 13:33     ` Will Deacon
2023-01-27 12:43     ` Robin Murphy
2023-01-27 12:43       ` Robin Murphy
2023-01-31 15:49       ` Will Deacon
2023-01-31 15:49         ` Will Deacon
2023-02-01 12:20         ` Catalin Marinas
2023-02-01 12:20           ` Catalin Marinas
2023-02-02  9:51           ` Muchun Song
2023-02-02  9:51             ` Muchun Song
2023-02-02 10:45             ` Catalin Marinas
2023-02-02 10:45               ` Catalin Marinas
2023-02-03  2:40               ` Muchun Song
2023-02-03  2:40                 ` Muchun Song
2023-02-03 10:10                 ` Will Deacon
2023-02-03 10:10                   ` Will Deacon
2023-02-06  3:28                   ` Muchun Song
2023-02-06  3:28                     ` Muchun Song
2023-02-07 14:31                     ` Will Deacon
2023-02-07 14:31                       ` Will Deacon
2023-02-08  3:13                       ` Muchun Song [this message]
2023-02-08  3:13                         ` Muchun Song
2023-02-08 17:27                         ` Mark Rutland
2023-02-08 17:27                           ` Mark Rutland
2023-02-10  6:50                           ` Muchun Song
2023-02-10  6:50                             ` Muchun Song
2023-01-27 15:16     ` Mark Rutland
2023-01-27 15:16       ` Mark Rutland
2023-01-30  8:16       ` Anshuman Khandual
2023-01-30  8:16         ` Anshuman Khandual
2023-01-30 10:08       ` Mark Rutland
2023-01-30 10:08         ` Mark Rutland
2023-01-27 15:14 ` Mark Rutland
2023-01-27 15:14   ` Mark Rutland
2023-01-31  2:57   ` Anshuman Khandual
2023-01-31  2:57     ` Anshuman Khandual

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FB3AE2F1-2EF8-4960-884E-DABA7E64CE59@linux.dev \
    --to=muchun.song@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.