* [patch]x86: clearing access bit don't flush tlb
@ 2014-04-03 0:42 Shaohua Li
2014-04-03 11:35 ` [patch] x86: " Ingo Molnar
0 siblings, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2014-04-03 0:42 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-mm, akpm, mingo, riel, hughd, mgorman, torvalds
Add a few acks and resend this patch.

We use the accessed bit to age a page at page reclaim. When clearing the pte
accessed bit, we can skip the TLB flush on x86. The side effect is that if the
pte is still in the TLB while the accessed bit is clear in the page table, the
CPU will not set the accessed bit in the page table when it accesses the page
again. Next time, page reclaim will think this hot page is old and reclaim it
wrongly, but this doesn't corrupt data.

According to the Intel manual, the TLB has fewer than 1k entries, which covers
< 4M of memory. In today's systems, several gigabytes of memory is normal.
After page reclaim clears the pte accessed bit and before the CPU accesses the
page again, it's quite unlikely that this page's pte is still in the TLB. A
context switch will flush the TLB too. The chance that skipping the TLB flush
impacts page reclaim should be very low.

Originally (in the 2.5 kernel, maybe), we didn't do a TLB flush after clearing
the accessed bit. Hugh added it to fix some ARM and sparc issues. Since I only
change this for x86, there should be no risk.

In some workloads, TLB flush overhead is very heavy. In my simple
multithreaded app with a lot of swap to several PCIe SSDs, removing the TLB
flush gives about a 20% ~ 30% swapout speedup.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
---
arch/x86/mm/pgtable.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
Index: linux/arch/x86/mm/pgtable.c
===================================================================
--- linux.orig/arch/x86/mm/pgtable.c 2014-03-27 05:22:08.572100549 +0800
+++ linux/arch/x86/mm/pgtable.c 2014-03-27 05:46:12.456131121 +0800
@@ -399,13 +399,12 @@ int pmdp_test_and_clear_young(struct vm_
int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- int young;
-
- young = ptep_test_and_clear_young(vma, address, ptep);
- if (young)
- flush_tlb_page(vma, address);
-
- return young;
+ /*
+ * In X86, clearing access bit without TLB flush doesn't cause data
+ * corruption. Doing this could cause wrong page aging and so hot pages
+ * are reclaimed, but the chance should be very rare.
+ */
+ return ptep_test_and_clear_young(vma, address, ptep);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
^ permalink raw reply [flat|nested] 8+ messages in thread
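As a rough check of the TLB coverage estimate in the message above, the
arithmetic can be sketched in plain user-space C. This is only an
illustration, not kernel code: the helper names are hypothetical, and the
4 KiB page size and the 8 GiB of RAM used in the note below are assumptions
chosen to match "1k entries" and "several gigabytes".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes of memory a fully populated TLB can map. */
static uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size)
{
	assert(page_size > 0);
	return entries * page_size;
}

/*
 * Hypothetical helper: share of RAM, in parts per million, that the TLB
 * can map at any one time. Integer division truncates the result.
 */
static uint64_t tlb_coverage_ppm(uint64_t entries, uint64_t page_size,
				 uint64_t ram_bytes)
{
	assert(ram_bytes > 0);
	return tlb_reach_bytes(entries, page_size) * 1000000ULL / ram_bytes;
}
```

With 1024 entries and 4096-byte pages, tlb_reach_bytes() gives 4194304
bytes (4 MiB). Against an assumed 8 GiB of RAM that is roughly 488 parts
per million, so only a tiny share of pages can have a cached translation at
any moment.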
* Re: [patch] x86: clearing access bit don't flush tlb
2014-04-03 0:42 [patch]x86: clearing access bit don't flush tlb Shaohua Li
@ 2014-04-03 11:35 ` Ingo Molnar
2014-04-03 13:45 ` Shaohua Li
2014-04-08 7:58 ` Shaohua Li
0 siblings, 2 replies; 8+ messages in thread
From: Ingo Molnar @ 2014-04-03 11:35 UTC (permalink / raw)
To: Shaohua Li
Cc: linux-kernel, linux-mm, akpm, riel, hughd, mgorman, torvalds,
Peter Zijlstra, Thomas Gleixner
* Shaohua Li <shli@kernel.org> wrote:
> @@ -399,13 +399,12 @@ int pmdp_test_and_clear_young(struct vm_
> int ptep_clear_flush_young(struct vm_area_struct *vma,
> unsigned long address, pte_t *ptep)
> {
> - int young;
> -
> - young = ptep_test_and_clear_young(vma, address, ptep);
> - if (young)
> - flush_tlb_page(vma, address);
> -
> - return young;
> + /*
> + * In X86, clearing access bit without TLB flush doesn't cause data
> + * corruption. Doing this could cause wrong page aging and so hot pages
> + * are reclaimed, but the chance should be very rare.
So, beyond the spelling mistakes, I guess this explanation should also
be a bit more explanatory - how about something like:
/*
* On x86 CPUs, clearing the accessed bit without a TLB flush
* doesn't cause data corruption. [ It could cause incorrect
* page aging and the (mistaken) reclaim of hot pages, but the
* chance of that should be relatively low. ]
*
* So as a performance optimization don't flush the TLB when
* clearing the accessed bit, it will eventually be flushed by
* a context switch or a VM operation anyway. [ In the rare
* event of it not getting flushed for a long time the delay
* shouldn't really matter because there's no real memory
* pressure for swapout to react to. ]
*/
Agreed?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
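The stale-entry effect discussed in this thread can be modelled with a toy
user-space simulation. This is a sketch under simplifying assumptions, not
kernel code: all sim_* names are hypothetical, a single page and a single
TLB entry are modelled, and the only hardware behaviour assumed is the one
stated in the patch description (on a TLB hit the CPU does not set the
accessed bit in the page table again).

```c
#include <assert.h>
#include <stdbool.h>

struct sim_pte { bool accessed; };	/* accessed bit in the page table */
struct sim_tlb { bool valid; };		/* cached translation for the page */

/* Model one CPU access to the page. */
static void sim_cpu_access(struct sim_pte *pte, struct sim_tlb *tlb)
{
	if (tlb->valid)
		return;			/* TLB hit: page table not touched */
	pte->accessed = true;		/* TLB miss: walk sets accessed bit */
	tlb->valid = true;
}

/* Model of test-and-clear on the accessed bit, as page reclaim does. */
static bool sim_test_and_clear_young(struct sim_pte *pte)
{
	bool young = pte->accessed;

	pte->accessed = false;
	return young;
}

/*
 * Access the page, let reclaim clear the bit (optionally flushing the
 * TLB), access the page again, then report what a second scan sees.
 */
static bool sim_second_scan_sees_young(bool flush_after_clear)
{
	struct sim_pte pte = { .accessed = false };
	struct sim_tlb tlb = { .valid = false };
	bool young;

	sim_cpu_access(&pte, &tlb);
	young = sim_test_and_clear_young(&pte);
	assert(young);			/* first scan always sees the access */
	if (flush_after_clear)
		tlb.valid = false;	/* models the removed TLB flush */
	sim_cpu_access(&pte, &tlb);
	return sim_test_and_clear_young(&pte);
}
```

In this model sim_second_scan_sees_young(true) reports the page as young,
while sim_second_scan_sees_young(false) reports it as old even though it
was just accessed. That is the mis-aging case; nothing in the model touches
page contents, which matches the claim that no data is corrupted. Any event
that invalidates the cached entry (a context switch, for instance) restores
correct aging on the next access.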
* Re: [patch] x86: clearing access bit don't flush tlb
2014-04-03 11:35 ` [patch] x86: " Ingo Molnar
@ 2014-04-03 13:45 ` Shaohua Li
2014-04-04 15:01 ` Johannes Weiner
2014-04-08 7:58 ` Shaohua Li
1 sibling, 1 reply; 8+ messages in thread
From: Shaohua Li @ 2014-04-03 13:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-mm, akpm, riel, hughd, mgorman, torvalds,
Peter Zijlstra, Thomas Gleixner
On Thu, Apr 03, 2014 at 01:35:37PM +0200, Ingo Molnar wrote:
>
> * Shaohua Li <shli@kernel.org> wrote:
>
> > + /*
> > + * In X86, clearing access bit without TLB flush doesn't cause data
> > + * corruption. Doing this could cause wrong page aging and so hot pages
> > + * are reclaimed, but the chance should be very rare.
>
> So, beyond the spelling mistakes, I guess this explanation should also
> be a bit more explanatory - how about something like:
>
> /*
> * On x86 CPUs, clearing the accessed bit without a TLB flush
> * doesn't cause data corruption. [ It could cause incorrect
> * page aging and the (mistaken) reclaim of hot pages, but the
> * chance of that should be relatively low. ]
> *
> * So as a performance optimization don't flush the TLB when
> * clearing the accessed bit, it will eventually be flushed by
> * a context switch or a VM operation anyway. [ In the rare
> * event of it not getting flushed for a long time the delay
> * shouldn't really matter because there's no real memory
> * pressure for swapout to react to. ]
> */
>
> Agreed?
Sure, that's better, thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] x86: clearing access bit don't flush tlb
2014-04-03 13:45 ` Shaohua Li
@ 2014-04-04 15:01 ` Johannes Weiner
0 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2014-04-04 15:01 UTC (permalink / raw)
To: Shaohua Li
Cc: Ingo Molnar, linux-kernel, linux-mm, akpm, riel, hughd, mgorman,
torvalds, Peter Zijlstra, Thomas Gleixner
On Thu, Apr 03, 2014 at 09:45:42PM +0800, Shaohua Li wrote:
> On Thu, Apr 03, 2014 at 01:35:37PM +0200, Ingo Molnar wrote:
> > Agreed?
>
> Sure, that's better, thanks!
With Ingo's updated comment:
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] x86: clearing access bit don't flush tlb
2014-04-03 11:35 ` [patch] x86: " Ingo Molnar
2014-04-03 13:45 ` Shaohua Li
@ 2014-04-08 7:58 ` Shaohua Li
2014-04-14 11:36 ` Ingo Molnar
` (2 more replies)
1 sibling, 3 replies; 8+ messages in thread
From: Shaohua Li @ 2014-04-08 7:58 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-kernel, linux-mm, akpm, riel, hughd, mgorman, torvalds,
Peter Zijlstra, Thomas Gleixner
On Thu, Apr 03, 2014 at 01:35:37PM +0200, Ingo Molnar wrote:
>
> * Shaohua Li <shli@kernel.org> wrote:
>
> > + /*
> > + * In X86, clearing access bit without TLB flush doesn't cause data
> > + * corruption. Doing this could cause wrong page aging and so hot pages
> > + * are reclaimed, but the chance should be very rare.
>
> So, beyond the spelling mistakes, I guess this explanation should also
> be a bit more explanatory - how about something like:
>
> /*
> * On x86 CPUs, clearing the accessed bit without a TLB flush
> * doesn't cause data corruption. [ It could cause incorrect
> * page aging and the (mistaken) reclaim of hot pages, but the
> * chance of that should be relatively low. ]
> *
> * So as a performance optimization don't flush the TLB when
> * clearing the accessed bit, it will eventually be flushed by
> * a context switch or a VM operation anyway. [ In the rare
> * event of it not getting flushed for a long time the delay
> * shouldn't really matter because there's no real memory
> * pressure for swapout to react to. ]
> */
>
> Agreed?
Changed the comments and added ACK of Johannes, so you can pick up directly.
Subject: x86: clearing access bit don't flush tlb
We use the accessed bit to age a page at page reclaim. When clearing the pte
accessed bit, we can skip the TLB flush on x86. The side effect is that if the
pte is still in the TLB while the accessed bit is clear in the page table, the
CPU will not set the accessed bit in the page table when it accesses the page
again. Next time, page reclaim will think this hot page is old and reclaim it
wrongly, but this doesn't corrupt data.

According to the Intel manual, the TLB has fewer than 1k entries, which covers
< 4M of memory. In today's systems, several gigabytes of memory is normal.
After page reclaim clears the pte accessed bit and before the CPU accesses the
page again, it's quite unlikely that this page's pte is still in the TLB. A
context switch will flush the TLB too. The chance that skipping the TLB flush
impacts page reclaim should be very low.

Originally (in the 2.5 kernel, maybe), we didn't do a TLB flush after clearing
the accessed bit. Hugh added it to fix some ARM and sparc issues. Since I only
change this for x86, there should be no risk.

In some workloads, TLB flush overhead is very heavy. In my simple
multithreaded app with a lot of swap to several PCIe SSDs, removing the TLB
flush gives about a 20% ~ 30% swapout speedup.

Update comments by Ingo.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
arch/x86/mm/pgtable.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
Index: linux/arch/x86/mm/pgtable.c
===================================================================
--- linux.orig/arch/x86/mm/pgtable.c 2014-04-07 08:36:02.843221074 +0800
+++ linux/arch/x86/mm/pgtable.c 2014-04-07 08:37:26.438170140 +0800
@@ -399,13 +399,20 @@ int pmdp_test_and_clear_young(struct vm_
int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- int young;
-
- young = ptep_test_and_clear_young(vma, address, ptep);
- if (young)
- flush_tlb_page(vma, address);
-
- return young;
+ /*
+ * On x86 CPUs, clearing the accessed bit without a TLB flush
+ * doesn't cause data corruption. [ It could cause incorrect
+ * page aging and the (mistaken) reclaim of hot pages, but the
+ * chance of that should be relatively low. ]
+ *
+ * So as a performance optimization don't flush the TLB when
+ * clearing the accessed bit, it will eventually be flushed by
+ * a context switch or a VM operation anyway. [ In the rare
+ * event of it not getting flushed for a long time the delay
+ * shouldn't really matter because there's no real memory
+ * pressure for swapout to react to. ]
+ */
+ return ptep_test_and_clear_young(vma, address, ptep);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [patch] x86: clearing access bit don't flush tlb
2014-04-08 7:58 ` Shaohua Li
@ 2014-04-14 11:36 ` Ingo Molnar
2014-04-15 8:24 ` [tip:x86/urgent] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB tip-bot for Shaohua Li
2014-04-16 7:40 ` tip-bot for Shaohua Li
2 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2014-04-14 11:36 UTC (permalink / raw)
To: Shaohua Li
Cc: linux-kernel, linux-mm, akpm, riel, hughd, mgorman, torvalds,
Peter Zijlstra, Thomas Gleixner
* Shaohua Li <shli@kernel.org> wrote:
> Changed the comments and added ACK of Johannes, so you can pick up directly.
>
> Subject: x86: clearing access bit don't flush tlb
>
> We use access bit to age a page at page reclaim. When clearing pte access bit,
> we could skip tlb flush in X86. The side effect is if the pte is in tlb and pte
> access bit is unset in page table, when cpu access the page again, cpu will not
> set page table pte's access bit. Next time page reclaim will think this hot
> page is yong and reclaim it wrongly, but this doesn't corrupt data.
>
> And according to intel manual, tlb has less than 1k entries, which covers < 4M
> memory. In today's system, several giga byte memory is normal. After page
> reclaim clears pte access bit and before cpu access the page again, it's quite
> unlikely this page's pte is still in TLB. And context swich will flush tlb too.
> The chance skiping tlb flush to impact page reclaim should be very rare.
>
> Originally (in 2.5 kernel maybe), we didn't do tlb flush after clear access bit.
> Hugh added it to fix some ARM and sparc issues. Since I only change this for
> x86, there should be no risk.
>
> And in some workloads, TLB flush overhead is very heavy. In my simple
> multithread app with a lot of swap to several pcie SSD, removing the tlb flush
> gives about 20% ~ 30% swapout speedup.
>
> Update comments by Ingo.
I fixed this changelog as well.
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Mel Gorman <mgorman@suse.de>
> Acked-by: Hugh Dickins <hughd@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> arch/x86/mm/pgtable.c | 21 ++++++++++++++-------
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> Index: linux/arch/x86/mm/pgtable.c
> ===================================================================
> --- linux.orig/arch/x86/mm/pgtable.c 2014-04-07 08:36:02.843221074 +0800
> +++ linux/arch/x86/mm/pgtable.c 2014-04-07 08:37:26.438170140 +0800
> @@ -399,13 +399,20 @@ int pmdp_test_and_clear_young(struct vm_
> int ptep_clear_flush_young(struct vm_area_struct *vma,
> unsigned long address, pte_t *ptep)
> {
> - int young;
> -
> - young = ptep_test_and_clear_young(vma, address, ptep);
> - if (young)
> - flush_tlb_page(vma, address);
> -
> - return young;
> + /*
> + * On x86 CPUs, clearing the accessed bit without a TLB flush
> + * doesn't cause data corruption. [ It could cause incorrect
> + * page aging and the (mistaken) reclaim of hot pages, but the
> + * chance of that should be relatively low. ]
> + *
> + * So as a performance optimization don't flush the TLB when
> + * clearing the accessed bit, it will eventually be flushed by
> + * a context switch or a VM operation anyway. [ In the rare
> + * event of it not getting flushed for a long time the delay
> + * shouldn't really matter because there's no real memory
> + * pressure for swapout to react to. ]
> + */
There's whitespace damage here - I fixed that up as well.
Please use scripts/checkpatch.pl before submitting patches, to make
sure there are no fixable problems in it.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 8+ messages in thread
* [tip:x86/urgent] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
2014-04-08 7:58 ` Shaohua Li
2014-04-14 11:36 ` Ingo Molnar
@ 2014-04-15 8:24 ` tip-bot for Shaohua Li
2014-04-16 7:40 ` tip-bot for Shaohua Li
2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Shaohua Li @ 2014-04-15 8:24 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, torvalds, hannes, hughd,
riel, shli, mgorman, tglx, shli
Commit-ID: ef28faf837aba5b80d08a3d957e365be972f222b
Gitweb: http://git.kernel.org/tip/ef28faf837aba5b80d08a3d957e365be972f222b
Author: Shaohua Li <shli@kernel.org>
AuthorDate: Tue, 8 Apr 2014 15:58:09 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 14 Apr 2014 13:34:50 +0200
x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
We use the accessed bit to age a page at page reclaim time,
and currently we also flush the TLB when doing so.
But in some workloads TLB flush overhead is very heavy. In my
simple multithreaded app with a lot of swap to several pcie
SSDs, removing the tlb flush gives about 20% ~ 30% swapout
speedup.
Fortunately just removing the TLB flush is a valid optimization:
on x86 CPUs, clearing the accessed bit without a TLB flush
doesn't cause data corruption.
It could cause incorrect page aging and the (mistaken) reclaim of
hot pages, but the chance of that should be relatively low.
So as a performance optimization don't flush the TLB when
clearing the accessed bit, it will eventually be flushed by
a context switch or a VM operation anyway. [ In the rare
event of it not getting flushed for a long time the delay
shouldn't really matter because there's no real memory
pressure for swapout to react to. ]
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
[ Rewrote the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/mm/pgtable.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c96314a..0004ac7 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -399,13 +399,20 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- int young;
-
- young = ptep_test_and_clear_young(vma, address, ptep);
- if (young)
- flush_tlb_page(vma, address);
-
- return young;
+ /*
+ * On x86 CPUs, clearing the accessed bit without a TLB flush
+ * doesn't cause data corruption. [ It could cause incorrect
+ * page aging and the (mistaken) reclaim of hot pages, but the
+ * chance of that should be relatively low. ]
+ *
+ * So as a performance optimization don't flush the TLB when
+ * clearing the accessed bit, it will eventually be flushed by
+ * a context switch or a VM operation anyway. [ In the rare
+ * event of it not getting flushed for a long time the delay
+ * shouldn't really matter because there's no real memory
+ * pressure for swapout to react to. ]
+ */
+ return ptep_test_and_clear_young(vma, address, ptep);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [tip:x86/urgent] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
2014-04-08 7:58 ` Shaohua Li
2014-04-14 11:36 ` Ingo Molnar
2014-04-15 8:24 ` [tip:x86/urgent] x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB tip-bot for Shaohua Li
@ 2014-04-16 7:40 ` tip-bot for Shaohua Li
2 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Shaohua Li @ 2014-04-16 7:40 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, torvalds, hannes, hughd,
riel, shli, mgorman, tglx, shli
Commit-ID: b13b1d2d8692b437203de7a404c6b809d2cc4d99
Gitweb: http://git.kernel.org/tip/b13b1d2d8692b437203de7a404c6b809d2cc4d99
Author: Shaohua Li <shli@kernel.org>
AuthorDate: Tue, 8 Apr 2014 15:58:09 +0800
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 16 Apr 2014 08:57:08 +0200
x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB
We use the accessed bit to age a page at page reclaim time,
and currently we also flush the TLB when doing so.
But in some workloads TLB flush overhead is very heavy. In my
simple multithreaded app with a lot of swap to several pcie
SSDs, removing the tlb flush gives about 20% ~ 30% swapout
speedup.
Fortunately just removing the TLB flush is a valid optimization:
on x86 CPUs, clearing the accessed bit without a TLB flush
doesn't cause data corruption.
It could cause incorrect page aging and the (mistaken) reclaim of
hot pages, but the chance of that should be relatively low.
So as a performance optimization don't flush the TLB when
clearing the accessed bit, it will eventually be flushed by
a context switch or a VM operation anyway. [ In the rare
event of it not getting flushed for a long time the delay
shouldn't really matter because there's no real memory
pressure for swapout to react to. ]
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
[ Rewrote the changelog and the code comments. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/mm/pgtable.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index c96314a..0004ac7 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -399,13 +399,20 @@ int pmdp_test_and_clear_young(struct vm_area_struct *vma,
int ptep_clear_flush_young(struct vm_area_struct *vma,
unsigned long address, pte_t *ptep)
{
- int young;
-
- young = ptep_test_and_clear_young(vma, address, ptep);
- if (young)
- flush_tlb_page(vma, address);
-
- return young;
+ /*
+ * On x86 CPUs, clearing the accessed bit without a TLB flush
+ * doesn't cause data corruption. [ It could cause incorrect
+ * page aging and the (mistaken) reclaim of hot pages, but the
+ * chance of that should be relatively low. ]
+ *
+ * So as a performance optimization don't flush the TLB when
+ * clearing the accessed bit, it will eventually be flushed by
+ * a context switch or a VM operation anyway. [ In the rare
+ * event of it not getting flushed for a long time the delay
+ * shouldn't really matter because there's no real memory
+ * pressure for swapout to react to. ]
+ */
+ return ptep_test_and_clear_young(vma, address, ptep);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
^ permalink raw reply related [flat|nested] 8+ messages in thread