From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vineet Gupta
Subject: Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types
Date: Wed, 25 May 2022 20:40:51 -0700
Message-ID: <8f6add25-2e8f-4533-fa42-e43db0e32f2d@kernel.org>
References: <20220524234531.1949-1-peterx@redhat.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1653536455;
	bh=/FC3fRDbTNCC3O7WmAhxUVAkmggWpUFeLv7OpR1lxLg=;
	h=Date:Subject:To:Cc:References:From:In-Reply-To:From;
	b=hLLrqk/dSj3ABZgHj+oG01SXfcPdfcPLbCPW5c/vtlKKjGhr826rFN6lLQogyE6R+
	 V4+rLm9jBH6we4JcRrEMO+7ArvhQ4vUukeFX5qG0umNfJEjOVl7t+dToTe7orX2kZK
	 bSWoDGynO3/vHi04K7UfgRIeRSjHN5GtmmMIX2USGvftGI9pdfeJrjgnAmxrltc5n0
	 8y833O4MCr4JKKzVPSDGbAvYYjpDQY/dQCqYoM1yOIlCVTJtDFNiHcFx0Vs05+lAF0
	 ALTHrfJeOAfPcImvCexeRL9L6Mpy6p/bjPRzg8B1971s8Zq8YHz1AyoKFvxIGL0AWO
	 PHAjKqVSOf5bw==
Content-Language: en-US
In-Reply-To: <20220524234531.1949-1-peterx@redhat.com>
List-ID:
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Peter Xu, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Richard Henderson, David Hildenbrand, Matt Turner, Albert Ou,
	Michal Simek, Russell King, Ivan Kokshaysky,
	linux-riscv@lists.infradead.org, Alexander Gordeev, Dave Hansen,
	Jonas Bonn, Will Deacon, "James E . J . Bottomley",
	"H . Peter Anvin", Andrea Arcangeli, openrisc@lists.librecores.org,
	linux-s390@vger.kernel.org, Ingo Molnar,
	linux-m68k@lists.linux-m68k.org, Palmer Dabbelt, Heiko

On 5/24/22 16:45, Peter Xu wrote:
> I observed that for each of the shared file-backed page faults, we're very
> likely to retry one more time for the 1st write fault upon no page. It's
> because we'll need to release the mmap lock for dirty rate limit purpose
> with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
>
> Then after that throttling we return VM_FAULT_RETRY.
>
> We did that probably because VM_FAULT_RETRY is the only way we can return
> to the fault handler at that time telling it we've released the mmap lock.
>
> However that's not ideal because it's very likely the fault does not need
> to be retried at all since the pgtable was well installed before the
> throttling, so the next continuous fault (including taking mmap read lock,
> walk the pgtable, etc.) could be in most cases unnecessary.
>
> It's not only slowing down page faults for shared file-backed, but also add
> more mmap lock contention which is in most cases not needed at all.
>
> To observe this, one could try to write to some shmem page and look at
> "pgfault" value in /proc/vmstat, then we should expect 2 counts for each
> shmem write simply because we retried, and vm event "pgfault" will capture
> that.
>
> To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
> show that we've completed the whole fault and released the lock. It's also
> a hint that we should very possibly not need another fault immediately on
> this page because we've just completed it.
>
> This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> program sequentially dirtying 400MB shmem file being mmap()ed and these are
> the time it needs:
>
>   Before: 650.980 ms (+-1.94%)
>   After:  569.396 ms (+-1.38%)
>
> I believe it could help more than that.
>
> We need some special care on GUP and the s390 pgfault handler (for gmap
> code before returning from pgfault), the rest changes in the page fault
> handlers should be relatively straightforward.
>
> Another thing to mention is that mm_account_fault() does take this new
> fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
>
> I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
> not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
> them as-is.
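
For anyone wanting to reproduce the "pgfault" observation from the quoted
changelog, a rough user-space sketch follows. It is not the test program
used for the numbers above (that program is not shown in this thread), and
the helper names parse_vmstat_counter(), read_pgfault(), dirty_shared_file()
and faults_per_page() are made up for illustration. It assumes /dev/shm is a
tmpfs mount. Note that "pgfault" in /proc/vmstat is system-wide, so the
interpreter's own faults and other processes add noise; treat the ratio as
an approximation only.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE


def parse_vmstat_counter(text, name="pgfault"):
    """Return the integer value of counter `name` from /proc/vmstat-style text."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == name:
            return int(fields[1])
    raise KeyError(name)


def read_pgfault(path="/proc/vmstat"):
    """Read the system-wide pgfault counter (includes faults from every process)."""
    with open(path) as f:
        return parse_vmstat_counter(f.read())


def dirty_shared_file(size_bytes, directory="/dev/shm"):
    """Map a fresh file MAP_SHARED and write one byte to each page.

    Returns the number of pages written. The default directory assumes
    /dev/shm is tmpfs, so the mapping is shmem-backed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.unlink(path)              # the open fd keeps the file alive
        os.ftruncate(fd, size_bytes)
        pages = 0
        with mmap.mmap(fd, size_bytes, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            for offset in range(0, size_bytes, PAGE_SIZE):
                m[offset] = 1        # first write to the page triggers a write fault
                pages += 1
        return pages
    finally:
        os.close(fd)


def faults_per_page(size_bytes, directory="/dev/shm"):
    """Approximate pgfault events per dirtied page (noisy, system-wide counter)."""
    before = read_pgfault()
    pages = dirty_shared_file(size_bytes, directory)
    after = read_pgfault()
    return (after - before) / pages
```

Calling faults_per_page(400 << 20) on an otherwise idle machine mirrors the
400MB size mentioned in the changelog. Going by the description above, a
value near 2 would be expected without the patch and near 1 with it, but the
exact figure depends on background activity.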
>
> Signed-off-by: Peter Xu
> ---
>
> v3:
> - Rebase to akpm/mm-unstable
> - Copy arch maintainers
> ---
>  arch/arc/mm/fault.c | 4 ++++

Acked-by: Vineet Gupta

Thx,
-Vineet
Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256 header.s=k20201202 header.b=hLLrqk/d; dkim-atps=neutral Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4L7ty352Zvz305v for ; Thu, 26 May 2022 13:40:59 +1000 (AEST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 13040615AC; Thu, 26 May 2022 03:40:56 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5BBA1C385B8; Thu, 26 May 2022 03:40:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1653536455; bh=/FC3fRDbTNCC3O7WmAhxUVAkmggWpUFeLv7OpR1lxLg=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=hLLrqk/dSj3ABZgHj+oG01SXfcPdfcPLbCPW5c/vtlKKjGhr826rFN6lLQogyE6R+ V4+rLm9jBH6we4JcRrEMO+7ArvhQ4vUukeFX5qG0umNfJEjOVl7t+dToTe7orX2kZK bSWoDGynO3/vHi04K7UfgRIeRSjHN5GtmmMIX2USGvftGI9pdfeJrjgnAmxrltc5n0 8y833O4MCr4JKKzVPSDGbAvYYjpDQY/dQCqYoM1yOIlCVTJtDFNiHcFx0Vs05+lAF0 ALTHrfJeOAfPcImvCexeRL9L6Mpy6p/bjPRzg8B1971s8Zq8YHz1AyoKFvxIGL0AWO PHAjKqVSOf5bw== Message-ID: <8f6add25-2e8f-4533-fa42-e43db0e32f2d@kernel.org> Date: Wed, 25 May 2022 20:40:51 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 Subject: Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types Content-Language: en-US To: Peter Xu , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20220524234531.1949-1-peterx@redhat.com> From: Vineet Gupta In-Reply-To: <20220524234531.1949-1-peterx@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed 
Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Thu, 26 May 2022 14:04:55 +1000 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: x86@kernel.org, Catalin Marinas , David Hildenbrand , Peter Zijlstra , Dave Hansen , "James E . J . Bottomley" , Max Filippov , Rich Felker , Paul Mackerras , "H . Peter Anvin" , sparclinux@vger.kernel.org, linux-ia64@vger.kernel.org, Alexander Gordeev , Will Deacon , linux-riscv@lists.infradead.org, Anton Ivanov , Jonas Bonn , linux-s390@vger.kernel.org, linux-snps-arc@lists.infradead.org, Yoshinori Sato , linux-xtensa@linux-xtensa.org, linux-hexagon@vger.kernel.org, Helge Deller , Alistair Popple , Hugh Dickins , Russell King , linux-csky@vger.kernel.org, linux-sh@vger.kernel.org, Ing o Molnar , Geert Uytterhoeven , linux-arm-kernel@lists.infradead.org, Vineet Gupta , Stafford Horne , Matt Turner , Christian Borntraeger , Andrea Arcangeli , Albert Ou , Vasily Gorbik , Brian Cain , Heiko Carstens , Johannes Weiner , linux-um@lists.infradead.org, Nicholas Piggin , Stefan Kristiansson , Richard Weinberger , linux-m68k@lists.linux-m68k.org, openrisc@lists.librecores.org, Ivan Kokshaysky , Al Viro , Andy Lutomirski , Paul Walmsley , Thomas Gleixner , linux-alpha@vger.kernel.org, Andrew Morton , Vlastimil Babka , Richard Henderson , Chris Za nkel , Michal Simek , Thomas Bogendoerfer , linux-parisc@vger.kernel.org, linux-mips@vger.kernel.org, Dinh Nguyen , Palmer Dabbelt , Sven Schnelle , Guo Ren , Borislav Petkov , Johannes Berg , linuxppc-dev@lists.ozlabs.org, "David S . Miller" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 5/24/22 16:45, Peter Xu wrote: > I observed that for each of the shared file-backed page faults, we're very > likely to retry one more time for the 1st write fault upon no page. 
It's > because we'll need to release the mmap lock for dirty rate limit purpose > with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). > > Then after that throttling we return VM_FAULT_RETRY. > > We did that probably because VM_FAULT_RETRY is the only way we can return > to the fault handler at that time telling it we've released the mmap lock. > > However that's not ideal because it's very likely the fault does not need > to be retried at all since the pgtable was well installed before the > throttling, so the next continuous fault (including taking mmap read lock, > walk the pgtable, etc.) could be in most cases unnecessary. > > It's not only slowing down page faults for shared file-backed, but also add > more mmap lock contention which is in most cases not needed at all. > > To observe this, one could try to write to some shmem page and look at > "pgfault" value in /proc/vmstat, then we should expect 2 counts for each > shmem write simply because we retried, and vm event "pgfault" will capture > that. > > To make it more efficient, add a new VM_FAULT_COMPLETED return code just to > show that we've completed the whole fault and released the lock. It's also > a hint that we should very possibly not need another fault immediately on > this page because we've just completed it. > > This patch provides a ~12% perf boost on my aarch64 test VM with a simple > program sequentially dirtying 400MB shmem file being mmap()ed and these are > the time it needs: > > Before: 650.980 ms (+-1.94%) > After: 569.396 ms (+-1.38%) > > I believe it could help more than that. > > We need some special care on GUP and the s390 pgfault handler (for gmap > code before returning from pgfault), the rest changes in the page fault > handlers should be relatively straightforward. > > Another thing to mention is that mm_account_fault() does take this new > fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. 
> > I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do > not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping > them as-is. > > Signed-off-by: Peter Xu > --- > > v3: > - Rebase to akpm/mm-unstable > - Copy arch maintainers > --- > arch/arc/mm/fault.c | 4 ++++ Acked-by: Vineet Gupta Thx, -Vineet
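
For anyone who wants to reproduce the double count described in the quoted changelog, here is a rough Python sketch. It is not taken from the patch, and the helper names read_vmstat_counter() and dirty_shared_mapping() are made up for illustration. It dirties every page of a MAP_SHARED file mapping and compares the system-wide "pgfault" value in /proc/vmstat before and after. It assumes /dev/shm is a tmpfs mount; if that directory is missing it falls back to the default temp directory, where the file may not be shmem-backed.

```python
import mmap
import os
import tempfile


def read_vmstat_counter(text, name="pgfault"):
    """Return the integer value of `name` from /proc/vmstat-style text."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == name:
            return int(fields[1])
    raise KeyError(name)


def dirty_shared_mapping(path, size):
    """Write one byte to each page of a MAP_SHARED mapping of `path`.

    Returns the number of pages touched.
    """
    page = mmap.PAGESIZE
    with open(path, "r+b") as f:
        f.truncate(size)  # make the file large enough to map
        with mmap.mmap(f.fileno(), size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mm:
            for off in range(0, size, page):
                mm[off] = 1  # first write to a page takes a write fault
    return (size + page - 1) // page


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # arbitrary; the changelog used a 400MB file
    shm_dir = "/dev/shm" if os.path.isdir("/dev/shm") else None
    with tempfile.NamedTemporaryFile(dir=shm_dir) as tmp:
        with open("/proc/vmstat") as f:
            before = read_vmstat_counter(f.read())
        pages = dirty_shared_mapping(tmp.name, size)
        with open("/proc/vmstat") as f:
            after = read_vmstat_counter(f.read())
    print("pages dirtied:", pages)
    print("pgfault delta:", after - before)
    print("faults per page: %.2f" % ((after - before) / pages))
```

The counter is system-wide and the interpreter takes faults of its own, so the ratio is only a rough indication. Going by the changelog, it should be about 2 per page without the patch, and presumably closer to 1 with it.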