From: Heiko Carstens <hca@linux.ibm.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Richard Henderson <rth@twiddle.net>,
David Hildenbrand <david@redhat.com>,
Matt Turner <mattst88@gmail.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Michal Simek <monstr@monstr.eu>,
Russell King <linux@armlinux.org.uk>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
linux-riscv@lists.infradead.org,
Alexander Gordeev <agordeev@linux.ibm.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Jonas Bonn <jonas@southpole.se>, Will Deacon <will@kernel.org>,
"James E . J . Bottomley" <James.Bottomley@hansenpartnership.com>,
"H . Peter Anvin" <hpa@zytor.com>,
Andrea Arcangeli <aarcange@redhat.com>,
openrisc@lists.librecores.org, linux-s390@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
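linux-m68k@lists.linux-m68k.org,
Palmer Dabbelt <palmer@dabbelt.com>,
Chris Zankel <chris@zankel.net>,
Peter Zijlstra <peterz@infradead.org>,
Alistair Popple <apopple@nvidia.com>,
linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
Vlastimil Babka <vbabka@suse.cz>,
Thomas Gleixner <tglx@linutronix.de>,
sparclinux@vger.kernel.org,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Stafford Horne <shorne@gmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
x86@kernel.org, Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
Paul Mackerras <paulus@samba.org>,
linux-arm-kernel@lists.infradead.org,
Sven Schnelle <svens@linux.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
linux-xtensa@linux-xtensa.org,
Nicholas Piggin <npiggin@gmail.com>,
linux-sh@vger.kernel.org, Vasily Gorbik <gor@linux.ibm.com>,
Borislav Petkov <bp@alien8.de>,
linux-mips@vger.kernel.org, Max Filippov <jcmvbkbc@gmail.com>,
Helge Deller <deller@gmx.de>, Vineet Gupta <vgupta@kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Paul Walmsley <paul.walmsley@sifive.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Anton Ivanov <anton.ivanov@cambridgegreys.com>,
Catalin Marinas <catalin.marinas@arm.com>,
linux-um@lists.infradead.org, linux-alpha@vger.kernel.org,
Johannes Berg <johannes@sipsolutions.net>,
linux-ia64@vger.kernel.org,
Geert Uytterhoeven <geert@linux-m68k.org>,
Dinh Nguyen <dinguyen@kernel.org>, Guo Ren <guoren@kernel.org>,
linux-snps-arc@lists.infradead.org,
Hugh Dickins <hughd@google.com>, Rich Felker <dalias@libc.org>,
Andy Lutomirski <luto@kernel.org>,
Richard Weinberger <richard@nod.at>,
linuxppc-dev@lists.ozlabs.org, Brian Cain <bcain@quicinc.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Andrew Morton <akpm@linux-foundation.org>,
Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
linux-parisc@vger.kernel.org,
"David S . Miller" <davem@davemloft.net>,
Janosch Frank <frankja@linux.ibm.com>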
Subject: Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types
Date: Fri, 27 May 2022 14:23:42 +0200
Message-ID: <YpDCzvLER9AYJJc8@osiris>
In-Reply-To: <20220524234531.1949-1-peterx@redhat.com>

On Tue, May 24, 2022 at 07:45:31PM -0400, Peter Xu wrote:
> I observed that for each of the shared file-backed page faults, we're very
> likely to retry one more time for the 1st write fault upon no page. It's
> because we'll need to release the mmap lock for dirty rate limit purpose
> with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
>
> Then after that throttling we return VM_FAULT_RETRY.
>
> We did that probably because VM_FAULT_RETRY is the only way we can return
> to the fault handler at that time telling it we've released the mmap lock.
>
> However that's not ideal because it's very likely the fault does not need
> to be retried at all since the pgtable was well installed before the
> throttling, so the next continuous fault (including taking mmap read lock,
> walk the pgtable, etc.) could be in most cases unnecessary.
>
> It's not only slowing down page faults for shared file-backed, but also add
> more mmap lock contention which is in most cases not needed at all.
>
> To observe this, one could try to write to some shmem page and look at
> "pgfault" value in /proc/vmstat, then we should expect 2 counts for each
> shmem write simply because we retried, and vm event "pgfault" will capture
> that.
>
> To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
> show that we've completed the whole fault and released the lock. It's also
> a hint that we should very possibly not need another fault immediately on
> this page because we've just completed it.
>
> This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> program sequentially dirtying 400MB shmem file being mmap()ed and these are
> the time it needs:
>
> Before: 650.980 ms (+-1.94%)
> After: 569.396 ms (+-1.38%)
>
> I believe it could help more than that.
>
> We need some special care on GUP and the s390 pgfault handler (for gmap
> code before returning from pgfault), the rest changes in the page fault
> handlers should be relatively straightforward.
>
> Another thing to mention is that mm_account_fault() does take this new
> fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
>
> I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
> not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
> them as-is.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
...
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index e173b6187ad5..9503a7cfaf03 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -339,6 +339,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	unsigned long address;
>  	unsigned int flags;
>  	vm_fault_t fault;
> +	bool need_unlock = true;
>  	bool is_write;
>
>  	tsk = current;
> @@ -433,6 +434,13 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  			goto out_up;
>  		goto out;
>  	}
> +
> +	/* The fault is fully completed (including releasing mmap lock) */
> +	if (fault & VM_FAULT_COMPLETED) {
> +		need_unlock = false;
> +		goto out_gmap;
> +	}
> +
>  	if (unlikely(fault & VM_FAULT_ERROR))
>  		goto out_up;
>
> @@ -452,6 +460,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  		mmap_read_lock(mm);
>  		goto retry;
>  	}
> +out_gmap:
>  	if (IS_ENABLED(CONFIG_PGSTE) && gmap) {
>  		address = __gmap_link(gmap, current->thread.gmap_addr,
>  				      address);
> @@ -466,7 +475,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
>  	}
>  	fault = 0;
>  out_up:
> -	mmap_read_unlock(mm);
> +	if (need_unlock)
> +		mmap_read_unlock(mm);
>  out:

This seems to be incorrect. __gmap_link() requires the mmap_lock to be
held. Christian, Janosch, or David, could you please check?
Richard Henderson <rth@twiddle.net>,
Chris Zankel <chris@zankel.net>, Michal Simek <monstr@monstr.eu>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
linux-parisc@vger.kernel.org, Max Filippov <jcmvbkbc@gmail.com>,
linux-kernel@vger.kernel.org, Dinh Nguyen <dinguyen@kernel.org>,
Palmer Dabbelt <palmer@dabbelt.com>,
Sven Schnelle <svens@linux.ibm.com>, Guo Ren <guoren@kernel.org>,
Michael Ellerman <mpe@ellerman.id.au>,
Borislav Petkov <bp@alien8.de>,
Johannes Berg <johannes@sipsolutions.net>,
linuxppc-dev@lists.ozlabs.org,
"David S . Miller" <davem@davemloft.net>
Subject: Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types
Date: Fri, 27 May 2022 14:23:42 +0200 [thread overview]
Message-ID: <YpDCzvLER9AYJJc8@osiris> (raw)
In-Reply-To: <20220524234531.1949-1-peterx@redhat.com>
On Tue, May 24, 2022 at 07:45:31PM -0400, Peter Xu wrote:
> I observed that for each of the shared file-backed page faults, we're very
> likely to retry one more time for the 1st write fault upon no page. It's
> because we'll need to release the mmap lock for dirty rate limit purpose
> with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
>
> Then after that throttling we return VM_FAULT_RETRY.
>
> We did that probably because VM_FAULT_RETRY is the only way we can return
> to the fault handler at that time telling it we've released the mmap lock.
>
> However that's not ideal because it's very likely the fault does not need
> to be retried at all since the pgtable was well installed before the
> throttling, so the next continuous fault (including taking mmap read lock,
> walk the pgtable, etc.) could be in most cases unnecessary.
>
> It's not only slowing down page faults for shared file-backed, but also add
> more mmap lock contention which is in most cases not needed at all.
>
> To observe this, one could try to write to some shmem page and look at
> "pgfault" value in /proc/vmstat, then we should expect 2 counts for each
> shmem write simply because we retried, and vm event "pgfault" will capture
> that.
>
> To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
> show that we've completed the whole fault and released the lock. It's also
> a hint that we should very possibly not need another fault immediately on
> this page because we've just completed it.
>
> This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> program sequentially dirtying 400MB shmem file being mmap()ed and these are
> the time it needs:
>
> Before: 650.980 ms (+-1.94%)
> After: 569.396 ms (+-1.38%)
>
> I believe it could help more than that.
>
> We need some special care on GUP and the s390 pgfault handler (for gmap
> code before returning from pgfault), the rest changes in the page fault
> handlers should be relatively straightforward.
>
> Another thing to mention is that mm_account_fault() does take this new
> fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
>
> I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
> not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
> them as-is.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
...
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index e173b6187ad5..9503a7cfaf03 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -339,6 +339,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> unsigned long address;
> unsigned int flags;
> vm_fault_t fault;
> + bool need_unlock = true;
> bool is_write;
>
> tsk = current;
> @@ -433,6 +434,13 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> goto out_up;
> goto out;
> }
> +
> + /* The fault is fully completed (including releasing mmap lock) */
> + if (fault & VM_FAULT_COMPLETED) {
> + need_unlock = false;
> + goto out_gmap;
> + }
> +
> if (unlikely(fault & VM_FAULT_ERROR))
> goto out_up;
>
> @@ -452,6 +460,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> mmap_read_lock(mm);
> goto retry;
> }
> +out_gmap:
> if (IS_ENABLED(CONFIG_PGSTE) && gmap) {
> address = __gmap_link(gmap, current->thread.gmap_addr,
> address);
> @@ -466,7 +475,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> }
> fault = 0;
> out_up:
> - mmap_read_unlock(mm);
> + if (need_unlock)
> + mmap_read_unlock(mm);
> out:
This seems to be incorrect. __gmap_link() requires the mmap_lock to be
held. Christian, Janosch, or David, could you please check?
WARNING: multiple messages have this Message-ID (diff)
From: Heiko Carstens <hca@linux.ibm.com>
To: Peter Xu <peterx@redhat.com>
Cc: x86@kernel.org, Catalin Marinas <catalin.marinas@arm.com>,
David Hildenbrand <david@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
linux-mips@vger.kernel.org,
"James E . J . Bottomley" <James.Bottomley@hansenpartnership.com>,
linux-mm@kvack.org, Rich Felker <dalias@libc.org>,
Paul Mackerras <paulus@samba.org>,
"H . Peter Anvin" <hpa@zytor.com>,
sparclinux@vger.kernel.org, linux-ia64@vger.kernel.org,
Alexander Gordeev <agordeev@linux.ibm.com>,
Will Deacon <will@kernel.org>,
linux-riscv@lists.infradead.org,
Anton Ivanov <anton.ivanov@cambridgegreys.com>,
Jonas Bonn <jonas@southpole.se>,
linux-s390@vger.kernel.org, linux-snps-arc@lists.infradead.org,
Janosch Frank <frankja@linux.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
linux-sh@vger.kernel.org, linux-hexagon@vger.kernel.org,
Helge Deller <deller@gmx.de>,
Alistair Popple <apopple@nvidia.com>,
Hugh Dickins <hughd@google.com>,
Russell King <linux@armlinux.org.uk>,
linux-csky @vger.kernel.org, linux-alpha@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Geert Uytterhoeven <geert@linux-m68k.org>,
linux-arm-kernel@lists.infradead.org,
Vineet Gupta <vgupta@kernel.org>,
Stafford Horne <shorne@gmail.com>,
Matt Turner <mattst88@gmail.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Vasily Gorbik <gor@linux.ibm.com>, Brian Cain <bcain@quicinc.com>,
linux-xtensa@linux-xtensa.org,
Johannes Weiner <hannes@cmpxchg.org>,
linux-um@lists.infradead.org, Nicholas Piggin <npiggin@gmail.com>,
Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
Richard Weinberger <richard@nod.at>,
linux-m68k@lists.linux-m68k.org, openrisc@lists.librecores.org,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Al Viro <viro@zeniv.linux.org.uk>,
Andy Lutomirski <luto@kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Thomas Gleixner <tglx@linutronix.de>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Richard Henderson <rth@twiddl
Subject: Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types
Date: Fri, 27 May 2022 14:23:42 +0200 [thread overview]
Message-ID: <YpDCzvLER9AYJJc8@osiris> (raw)
In-Reply-To: <20220524234531.1949-1-peterx@redhat.com>
e.net>, Chris Zankel <chris@zankel.net>, Michal Simek <monstr@monstr.eu>, Thomas Bogendoerfer <tsbogend@alpha.franken.de>, linux-parisc@vger.kernel.org, Max Filippov <jcmvbkbc@gmail.com>, linux-kernel@vger.kernel.org, Dinh Nguyen <dinguyen@kernel.org>, Palmer Dabbelt <palmer@dabbelt.com>, Sven Schnelle <svens@linux.ibm.com>, Guo Ren <guoren@kernel.org>, Borislav Petkov <bp@alien8.de>, Johannes Berg <johannes@sipsolutions.net>, linuxppc-dev@lists.ozlabs.org, "David S . Miller" <davem@davemloft.net>
Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org
Sender: "Linuxppc-dev" <linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org>
On Tue, May 24, 2022 at 07:45:31PM -0400, Peter Xu wrote:
> I observed that for each of the shared file-backed page faults, we're very
> likely to retry one more time for the 1st write fault upon no page. It's
> because we'll need to release the mmap lock for dirty rate limit purpose
> with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
>
> Then after that throttling we return VM_FAULT_RETRY.
>
> We did that probably because VM_FAULT_RETRY is the only way we can return
> to the fault handler at that time telling it we've released the mmap lock.
>
> However that's not ideal because it's very likely the fault does not need
> to be retried at all since the pgtable was well installed before the
> throttling, so the next continuous fault (including taking mmap read lock,
> walk the pgtable, etc.) could be in most cases unnecessary.
>
> It's not only slowing down page faults for shared file-backed, but also add
> more mmap lock contention which is in most cases not needed at all.
>
> To observe this, one could try to write to some shmem page and look at
> "pgfault" value in /proc/vmstat, then we should expect 2 counts for each
> shmem write simply because we retried, and vm event "pgfault" will capture
> that.
>
> To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
> show that we've completed the whole fault and released the lock. It's also
> a hint that we should very possibly not need another fault immediately on
> this page because we've just completed it.
>
> This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> program sequentially dirtying 400MB shmem file being mmap()ed and these are
> the time it needs:
>
> Before: 650.980 ms (+-1.94%)
> After: 569.396 ms (+-1.38%)
>
> I believe it could help more than that.
>
> We need some special care on GUP and the s390 pgfault handler (for gmap
> code before returning from pgfault), the rest changes in the page fault
> handlers should be relatively straightforward.
>
> Another thing to mention is that mm_account_fault() does take this new
> fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
>
> I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
> not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
> them as-is.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
...
> diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
> index e173b6187ad5..9503a7cfaf03 100644
> --- a/arch/s390/mm/fault.c
> +++ b/arch/s390/mm/fault.c
> @@ -339,6 +339,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> unsigned long address;
> unsigned int flags;
> vm_fault_t fault;
> + bool need_unlock = true;
> bool is_write;
>
> tsk = current;
> @@ -433,6 +434,13 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> goto out_up;
> goto out;
> }
> +
> + /* The fault is fully completed (including releasing mmap lock) */
> + if (fault & VM_FAULT_COMPLETED) {
> + need_unlock = false;
> + goto out_gmap;
> + }
> +
> if (unlikely(fault & VM_FAULT_ERROR))
> goto out_up;
>
> @@ -452,6 +460,7 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> mmap_read_lock(mm);
> goto retry;
> }
> +out_gmap:
> if (IS_ENABLED(CONFIG_PGSTE) && gmap) {
> address = __gmap_link(gmap, current->thread.gmap_addr,
> address);
> @@ -466,7 +475,8 @@ static inline vm_fault_t do_exception(struct pt_regs *regs, int access)
> }
> fault = 0;
> out_up:
> - mmap_read_unlock(mm);
> + if (need_unlock)
> + mmap_read_unlock(mm);
> out:
This seems to be incorrect. __gmap_link() requires the mmap_lock to be
held. Christian, Janosch, or David, could you please check?
Thread overview:
2022-05-24 23:45 [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types Peter Xu
2022-05-25 8:03 ` Geert Uytterhoeven
2022-05-25 11:10 ` Peter Zijlstra
2022-05-25 12:44 ` Johannes Weiner
2022-05-26 3:40 ` Vineet Gupta
2022-05-27 2:54 ` Guo Ren
2022-05-27 5:39 ` Max Filippov
2022-05-27 8:21 ` Alistair Popple
2022-05-27 10:46 ` Ingo Molnar
2022-05-27 14:53 ` Peter Xu
2022-05-27 12:23 ` Heiko Carstens [this message]
2022-05-27 13:49 ` Peter Xu