From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-arch@vger.kernel.org, Vineet Gupta <vgupta@synopsys.com>,
linux-snps-arc@lists.infradead.org,
Russell King <linux@armlinux.org.uk>,
linux-arm-kernel@lists.infradead.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
linux-mips@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
sparclinux@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Flushing transparent hugepages
Date: Fri, 28 Aug 2020 18:06:08 +0100 [thread overview]
Message-ID: <20200828170608.GJ3169@gaia> (raw)
In-Reply-To: <20200818160815.GA16191@willie-the-truck>
On Tue, Aug 18, 2020 at 05:08:16PM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> > For example, arm64 seems confused in this scenario:
> >
> > void flush_dcache_page(struct page *page)
> > {
> > if (test_bit(PG_dcache_clean, &page->flags))
> > clear_bit(PG_dcache_clean, &page->flags);
> > }
> >
> > ...
> >
> > void __sync_icache_dcache(pte_t pte)
> > {
> > struct page *page = pte_page(pte);
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> > sync_icache_aliases(page_address(page), page_size(page));
> > }
> >
> > So arm64 keeps track on a per-page basis which ones have been flushed.
> > page_size() will return PAGE_SIZE if called on a tail page or regular
> > page, but will return PAGE_SIZE << compound_order if called on a head
> > page. So this will either over-flush, or it's missing the opportunity
> > to clear the bits on all the subpages which have now been flushed.
>
> Hmm, that seems to go all the way back to 2014 as the result of a bug fix
> in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
> aware") which has a Reported-by Mark and a CC stable, suggesting something
> _was_ going wrong at the time :/ Was there a point where the tail pages
> could end up with PG_arch_1 uncleared on allocation?
In my experience, it's the other way around: you can end up with
PG_arch_1 cleared in a tail page when the head one was set (splitting
THP).
> > What would you _like_ to see? Would you rather flush_dcache_page()
> > were called once for each subpage, or would you rather maintain
> > the page-needs-flushing state once per compound page? We could also
> > introduce flush_dcache_thp() if some architectures would prefer it one
> > way and one the other, although that brings into question what to do
> > for hugetlbfs pages.
>
> For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
> [1], but there was a worry that it might break x86 and s390. It's also not
> clear to me that we can change __sync_icache_dcache() as it's called when
> we're installing the entry in the page-table, so why would it be called
> again for the tail pages?
Indeed, __sync_icache_dcache() is called from set_pte_at() on the head
page, though it could always iterate and flush the tail pages
individually (I think we could have done this in commit 923b8f5044da).
Currently I suspect it does some over-flushing if you use THP on
executable pages (it's a no-op on non-exec pages).
With MTE (arm64 memory tagging) I'm introducing a PG_arch_2 flag and
losing this is more problematic as it can lead to clearing valid tags.
In the subsequent patch [2], mte_sync_tags() (also called from
set_pte_at()) checks the PG_arch_2 in each page of a compound one.
My preference would be to treat both PG_arch_1 and _2 similarly.
> [1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.marinas@arm.com/
[2] https://lore.kernel.org/linux-arch/20200703153718.16973-9-catalin.marinas@arm.com/
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: linux-arch@vger.kernel.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Vineet Gupta <vgupta@synopsys.com>,
Russell King <linux@armlinux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
linux-mips@vger.kernel.org, linux-mm@kvack.org,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
sparclinux@vger.kernel.org, linux-snps-arc@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Flushing transparent hugepages
Date: Fri, 28 Aug 2020 18:06:08 +0100 [thread overview]
Message-ID: <20200828170608.GJ3169@gaia> (raw)
In-Reply-To: <20200818160815.GA16191@willie-the-truck>
On Tue, Aug 18, 2020 at 05:08:16PM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> > For example, arm64 seems confused in this scenario:
> >
> > void flush_dcache_page(struct page *page)
> > {
> > if (test_bit(PG_dcache_clean, &page->flags))
> > clear_bit(PG_dcache_clean, &page->flags);
> > }
> >
> > ...
> >
> > void __sync_icache_dcache(pte_t pte)
> > {
> > struct page *page = pte_page(pte);
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> > sync_icache_aliases(page_address(page), page_size(page));
> > }
> >
> > So arm64 keeps track on a per-page basis which ones have been flushed.
> > page_size() will return PAGE_SIZE if called on a tail page or regular
> > page, but will return PAGE_SIZE << compound_order if called on a head
> > page. So this will either over-flush, or it's missing the opportunity
> > to clear the bits on all the subpages which have now been flushed.
>
> Hmm, that seems to go all the way back to 2014 as the result of a bug fix
> in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
> aware") which has a Reported-by Mark and a CC stable, suggesting something
> _was_ going wrong at the time :/ Was there a point where the tail pages
> could end up with PG_arch_1 uncleared on allocation?
In my experience, it's the other way around: you can end up with
PG_arch_1 cleared in a tail page when the head one was set (splitting
THP).
> > What would you _like_ to see? Would you rather flush_dcache_page()
> > were called once for each subpage, or would you rather maintain
> > the page-needs-flushing state once per compound page? We could also
> > introduce flush_dcache_thp() if some architectures would prefer it one
> > way and one the other, although that brings into question what to do
> > for hugetlbfs pages.
>
> For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
> [1], but there was a worry that it might break x86 and s390. It's also not
> clear to me that we can change __sync_icache_dcache() as it's called when
> we're installing the entry in the page-table, so why would it be called
> again for the tail pages?
Indeed, __sync_icache_dcache() is called from set_pte_at() on the head
page, though it could always iterate and flush the tail pages
individually (I think we could have done this in commit 923b8f5044da).
Currently I suspect it does some over-flushing if you use THP on
executable pages (it's a no-op on non-exec pages).
With MTE (arm64 memory tagging) I'm introducing a PG_arch_2 flag and
losing this is more problematic as it can lead to clearing valid tags.
In the subsequent patch [2], mte_sync_tags() (also called from
set_pte_at()) checks the PG_arch_2 in each page of a compound one.
My preference would be to treat both PG_arch_1 and _2 similarly.
> [1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.marinas@arm.com/
[2] https://lore.kernel.org/linux-arch/20200703153718.16973-9-catalin.marinas@arm.com/
--
Catalin
_______________________________________________
linux-snps-arc mailing list
linux-snps-arc@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-snps-arc
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: linux-arch@vger.kernel.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Vineet Gupta <vgupta@synopsys.com>,
Russell King <linux@armlinux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
linux-mips@vger.kernel.org, linux-mm@kvack.org,
Paul Mackerras <paulus@samba.org>,
sparclinux@vger.kernel.org, linux-snps-arc@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Flushing transparent hugepages
Date: Fri, 28 Aug 2020 18:06:08 +0100 [thread overview]
Message-ID: <20200828170608.GJ3169@gaia> (raw)
In-Reply-To: <20200818160815.GA16191@willie-the-truck>
On Tue, Aug 18, 2020 at 05:08:16PM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> > For example, arm64 seems confused in this scenario:
> >
> > void flush_dcache_page(struct page *page)
> > {
> > if (test_bit(PG_dcache_clean, &page->flags))
> > clear_bit(PG_dcache_clean, &page->flags);
> > }
> >
> > ...
> >
> > void __sync_icache_dcache(pte_t pte)
> > {
> > struct page *page = pte_page(pte);
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> > sync_icache_aliases(page_address(page), page_size(page));
> > }
> >
> > So arm64 keeps track on a per-page basis which ones have been flushed.
> > page_size() will return PAGE_SIZE if called on a tail page or regular
> > page, but will return PAGE_SIZE << compound_order if called on a head
> > page. So this will either over-flush, or it's missing the opportunity
> > to clear the bits on all the subpages which have now been flushed.
>
> Hmm, that seems to go all the way back to 2014 as the result of a bug fix
> in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
> aware") which has a Reported-by Mark and a CC stable, suggesting something
> _was_ going wrong at the time :/ Was there a point where the tail pages
> could end up with PG_arch_1 uncleared on allocation?
In my experience, it's the other way around: you can end up with
PG_arch_1 cleared in a tail page when the head one was set (splitting
THP).
> > What would you _like_ to see? Would you rather flush_dcache_page()
> > were called once for each subpage, or would you rather maintain
> > the page-needs-flushing state once per compound page? We could also
> > introduce flush_dcache_thp() if some architectures would prefer it one
> > way and one the other, although that brings into question what to do
> > for hugetlbfs pages.
>
> For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
> [1], but there was a worry that it might break x86 and s390. It's also not
> clear to me that we can change __sync_icache_dcache() as it's called when
> we're installing the entry in the page-table, so why would it be called
> again for the tail pages?
Indeed, __sync_icache_dcache() is called from set_pte_at() on the head
page, though it could always iterate and flush the tail pages
individually (I think we could have done this in commit 923b8f5044da).
Currently I suspect it does some over-flushing if you use THP on
executable pages (it's a no-op on non-exec pages).
With MTE (arm64 memory tagging) I'm introducing a PG_arch_2 flag and
losing this is more problematic as it can lead to clearing valid tags.
In the subsequent patch [2], mte_sync_tags() (also called from
set_pte_at()) checks the PG_arch_2 in each page of a compound one.
My preference would be to treat both PG_arch_1 and _2 similarly.
> [1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.marinas@arm.com/
[2] https://lore.kernel.org/linux-arch/20200703153718.16973-9-catalin.marinas@arm.com/
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-arch@vger.kernel.org, Vineet Gupta <vgupta@synopsys.com>,
linux-snps-arc@lists.infradead.org,
Russell King <linux@armlinux.org.uk>,
linux-arm-kernel@lists.infradead.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
linux-mips@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Paul Mackerras <paulus@samba.org>,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
sparclinux@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Flushing transparent hugepages
Date: Fri, 28 Aug 2020 17:06:08 +0000 [thread overview]
Message-ID: <20200828170608.GJ3169@gaia> (raw)
In-Reply-To: <20200818160815.GA16191@willie-the-truck>
On Tue, Aug 18, 2020 at 05:08:16PM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> > For example, arm64 seems confused in this scenario:
> >
> > void flush_dcache_page(struct page *page)
> > {
> > if (test_bit(PG_dcache_clean, &page->flags))
> > clear_bit(PG_dcache_clean, &page->flags);
> > }
> >
> > ...
> >
> > void __sync_icache_dcache(pte_t pte)
> > {
> > struct page *page = pte_page(pte);
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> > sync_icache_aliases(page_address(page), page_size(page));
> > }
> >
> > So arm64 keeps track on a per-page basis which ones have been flushed.
> > page_size() will return PAGE_SIZE if called on a tail page or regular
> > page, but will return PAGE_SIZE << compound_order if called on a head
> > page. So this will either over-flush, or it's missing the opportunity
> > to clear the bits on all the subpages which have now been flushed.
>
> Hmm, that seems to go all the way back to 2014 as the result of a bug fix
> in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
> aware") which has a Reported-by Mark and a CC stable, suggesting something
> _was_ going wrong at the time :/ Was there a point where the tail pages
> could end up with PG_arch_1 uncleared on allocation?
In my experience, it's the other way around: you can end up with
PG_arch_1 cleared in a tail page when the head one was set (splitting
THP).
> > What would you _like_ to see? Would you rather flush_dcache_page()
> > were called once for each subpage, or would you rather maintain
> > the page-needs-flushing state once per compound page? We could also
> > introduce flush_dcache_thp() if some architectures would prefer it one
> > way and one the other, although that brings into question what to do
> > for hugetlbfs pages.
>
> For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
> [1], but there was a worry that it might break x86 and s390. It's also not
> clear to me that we can change __sync_icache_dcache() as it's called when
> we're installing the entry in the page-table, so why would it be called
> again for the tail pages?
Indeed, __sync_icache_dcache() is called from set_pte_at() on the head
page, though it could always iterate and flush the tail pages
individually (I think we could have done this in commit 923b8f5044da).
Currently I suspect it does some over-flushing if you use THP on
executable pages (it's a no-op on non-exec pages).
With MTE (arm64 memory tagging) I'm introducing a PG_arch_2 flag and
losing this is more problematic as it can lead to clearing valid tags.
In the subsequent patch [2], mte_sync_tags() (also called from
set_pte_at()) checks the PG_arch_2 in each page of a compound one.
My preference would be to treat both PG_arch_1 and _2 similarly.
> [1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.marinas@arm.com/
[2] https://lore.kernel.org/linux-arch/20200703153718.16973-9-catalin.marinas@arm.com/
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: linux-arch@vger.kernel.org,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Vineet Gupta <vgupta@synopsys.com>,
Russell King <linux@armlinux.org.uk>,
Matthew Wilcox <willy@infradead.org>,
linux-mips@vger.kernel.org, linux-mm@kvack.org,
Paul Mackerras <paulus@samba.org>,
Michael Ellerman <mpe@ellerman.id.au>,
sparclinux@vger.kernel.org, linux-snps-arc@lists.infradead.org,
linuxppc-dev@lists.ozlabs.org,
"David S. Miller" <davem@davemloft.net>,
linux-arm-kernel@lists.infradead.org
Subject: Re: Flushing transparent hugepages
Date: Fri, 28 Aug 2020 18:06:08 +0100 [thread overview]
Message-ID: <20200828170608.GJ3169@gaia> (raw)
In-Reply-To: <20200818160815.GA16191@willie-the-truck>
On Tue, Aug 18, 2020 at 05:08:16PM +0100, Will Deacon wrote:
> On Tue, Aug 18, 2020 at 04:07:36PM +0100, Matthew Wilcox wrote:
> > For example, arm64 seems confused in this scenario:
> >
> > void flush_dcache_page(struct page *page)
> > {
> > if (test_bit(PG_dcache_clean, &page->flags))
> > clear_bit(PG_dcache_clean, &page->flags);
> > }
> >
> > ...
> >
> > void __sync_icache_dcache(pte_t pte)
> > {
> > struct page *page = pte_page(pte);
> >
> > if (!test_and_set_bit(PG_dcache_clean, &page->flags))
> > sync_icache_aliases(page_address(page), page_size(page));
> > }
> >
> > So arm64 keeps track on a per-page basis which ones have been flushed.
> > page_size() will return PAGE_SIZE if called on a tail page or regular
> > page, but will return PAGE_SIZE << compound_order if called on a head
> > page. So this will either over-flush, or it's missing the opportunity
> > to clear the bits on all the subpages which have now been flushed.
>
> Hmm, that seems to go all the way back to 2014 as the result of a bug fix
> in 923b8f5044da ("arm64: mm: Make icache synchronisation logic huge page
> aware") which has a Reported-by Mark and a CC stable, suggesting something
> _was_ going wrong at the time :/ Was there a point where the tail pages
> could end up with PG_arch_1 uncleared on allocation?
In my experience, it's the other way around: you can end up with
PG_arch_1 cleared in a tail page when the head one was set (splitting
THP).
> > What would you _like_ to see? Would you rather flush_dcache_page()
> > were called once for each subpage, or would you rather maintain
> > the page-needs-flushing state once per compound page? We could also
> > introduce flush_dcache_thp() if some architectures would prefer it one
> > way and one the other, although that brings into question what to do
> > for hugetlbfs pages.
>
> For arm64, we'd like to see PG_arch_1 preserved during huge page splitting
> [1], but there was a worry that it might break x86 and s390. It's also not
> clear to me that we can change __sync_icache_dcache() as it's called when
> we're installing the entry in the page-table, so why would it be called
> again for the tail pages?
Indeed, __sync_icache_dcache() is called from set_pte_at() on the head
page, though it could always iterate and flush the tail pages
individually (I think we could have done this in commit 923b8f5044da).
Currently I suspect it does some over-flushing if you use THP on
executable pages (it's a no-op on non-exec pages).
With MTE (arm64 memory tagging) I'm introducing a PG_arch_2 flag and
losing this is more problematic as it can lead to clearing valid tags.
In the subsequent patch [2], mte_sync_tags() (also called from
set_pte_at()) checks the PG_arch_2 in each page of a compound one.
My preference would be to treat both PG_arch_1 and _2 similarly.
> [1] https://lore.kernel.org/linux-arch/20200703153718.16973-8-catalin.marinas@arm.com/
[2] https://lore.kernel.org/linux-arch/20200703153718.16973-9-catalin.marinas@arm.com/
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-08-28 17:06 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-18 15:07 Flushing transparent hugepages Matthew Wilcox
2020-08-18 15:07 ` Matthew Wilcox
2020-08-18 15:07 ` Matthew Wilcox
2020-08-18 15:07 ` Matthew Wilcox
2020-08-18 15:07 ` Matthew Wilcox
2020-08-18 16:08 ` Will Deacon
2020-08-18 16:08 ` Will Deacon
2020-08-18 16:08 ` Will Deacon
2020-08-18 16:08 ` Will Deacon
2020-08-18 16:08 ` Will Deacon
2020-08-28 17:06 ` Catalin Marinas [this message]
2020-08-28 17:06 ` Catalin Marinas
2020-08-28 17:06 ` Catalin Marinas
2020-08-28 17:06 ` Catalin Marinas
2020-08-28 17:06 ` Catalin Marinas
2020-09-09 10:26 ` Aneesh Kumar K.V
2020-09-09 10:38 ` Aneesh Kumar K.V
2020-09-09 10:26 ` Aneesh Kumar K.V
2020-09-09 10:26 ` Aneesh Kumar K.V
2020-09-09 10:26 ` Aneesh Kumar K.V
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200828170608.GJ3169@gaia \
--to=catalin.marinas@arm.com \
--cc=benh@kernel.crashing.org \
--cc=davem@davemloft.net \
--cc=linux-arch@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-mips@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-snps-arc@lists.infradead.org \
--cc=linux@armlinux.org.uk \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=mpe@ellerman.id.au \
--cc=paulus@samba.org \
--cc=sparclinux@vger.kernel.org \
--cc=tsbogend@alpha.franken.de \
--cc=vgupta@synopsys.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.