Date: Fri, 1 Mar 2024 18:47:46 +0000
From: Catalin Marinas
To: Ryan Roberts
Cc: Andrew Morton, Mark Rutland, John Hubbard,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH 2/2] arm64/mm: Improve comment in contpte_ptep_get_lockless()
References: <20240226120321.1055731-1-ryan.roberts@arm.com>
 <20240226120321.1055731-3-ryan.roberts@arm.com>
In-Reply-To: <20240226120321.1055731-3-ryan.roberts@arm.com>

On Mon, Feb 26, 2024 at 12:03:21PM +0000, Ryan Roberts wrote:
> Make clear the atomicity/consistency requirements of the API and how we
> achieve them.
>
> Link: https://lore.kernel.org/linux-mm/Zc-Tqqfksho3BHmU@arm.com/
> Signed-off-by: Ryan Roberts
> ---
>  arch/arm64/mm/contpte.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> index be0a226c4ff9..1b64b4c3f8bf 100644
> --- a/arch/arm64/mm/contpte.c
> +++ b/arch/arm64/mm/contpte.c
> @@ -183,16 +183,20 @@ EXPORT_SYMBOL_GPL(contpte_ptep_get);
>  pte_t contpte_ptep_get_lockless(pte_t *orig_ptep)
>  {
>  	/*
> -	 * Gather access/dirty bits, which may be populated in any of the ptes
> -	 * of the contig range. We may not be holding the PTL, so any contiguous
> -	 * range may be unfolded/modified/refolded under our feet. Therefore we
> -	 * ensure we read a _consistent_ contpte range by checking that all ptes
> -	 * in the range are valid and have CONT_PTE set, that all pfns are
> -	 * contiguous and that all pgprots are the same (ignoring access/dirty).
> -	 * If we find a pte that is not consistent, then we must be racing with
> -	 * an update so start again. If the target pte does not have CONT_PTE
> -	 * set then that is considered consistent on its own because it is not
> -	 * part of a contpte range.
> +	 * The ptep_get_lockless() API requires us to read and return *orig_ptep
> +	 * so that it is self-consistent, without the PTL held, so we may be
> +	 * racing with other threads modifying the pte. Usually a READ_ONCE()
> +	 * would suffice, but for the contpte case, we also need to gather the
> +	 * access and dirty bits from across all ptes in the contiguous block,
> +	 * and we can't read all of those neighbouring ptes atomically, so any
> +	 * contiguous range may be unfolded/modified/refolded under our feet.
> +	 * Therefore we ensure we read a _consistent_ contpte range by checking
> +	 * that all ptes in the range are valid and have CONT_PTE set, that all
> +	 * pfns are contiguous and that all pgprots are the same (ignoring
> +	 * access/dirty). If we find a pte that is not consistent, then we must
> +	 * be racing with an update so start again. If the target pte does not
> +	 * have CONT_PTE set then that is considered consistent on its own
> +	 * because it is not part of a contpte range.
>  	 */

I haven't had the time to properly think about this function but,
depending on what its semantics are, we might not guarantee that, at the
time of reading a pte, we have the correct dirty state from the other
ptes in the range.

Theoretical: let's say we read the first pte in the contig range and
it's clean but further down there's a dirty one.
Another (v)CPU breaks the contig range, sets the dirty bit everywhere,
there's some pte_mkclean for all of them and they are collapsed into a
contig range again. The function above on the first (v)CPU returns a
clean pte when it should have actually been dirty at the time of read.

Throughout the callers of this function, I couldn't find one where it
matters. So I concluded that they don't need the dirty state. Normally
the dirty state is passed to the page flags, so not lost after the pte
has been cleaned.

-- 
Catalin
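
For readers following along, the check-and-retry scheme that the quoted
comment describes can be modelled in userspace C roughly as below. This
is only a sketch under stated assumptions, not the code in
arch/arm64/mm/contpte.c: every sk_* name, the bit layout and the block
size of 16 are invented for illustration, and a relaxed atomic load
stands in for READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this model; NOT the arm64 descriptor format. */
#define SK_VALID	(1ull << 0)
#define SK_CONT		(1ull << 1)
#define SK_YOUNG	(1ull << 2)	/* access flag */
#define SK_DIRTY	(1ull << 3)
#define SK_LOW_BITS	0xfffull	/* all non-pfn bits */
#define SK_PFN_SHIFT	12
#define SK_CONT_PTES	16		/* assumed entries per contiguous block */

typedef _Atomic uint64_t sk_pte_t;

/* Build an entry from a pfn and flag bits. */
static inline uint64_t sk_make(uint64_t pfn, uint64_t flags)
{
	return (pfn << SK_PFN_SHIFT) | flags;
}

/* Single-copy read of one entry; stand-in for READ_ONCE(). */
static inline uint64_t sk_read(sk_pte_t *p)
{
	return atomic_load_explicit(p, memory_order_relaxed);
}

static inline bool sk_valid_cont(uint64_t pte)
{
	return (pte & (SK_VALID | SK_CONT)) == (SK_VALID | SK_CONT);
}

static inline uint64_t sk_pfn(uint64_t pte)
{
	return pte >> SK_PFN_SHIFT;
}

/* Attribute bits with access/dirty masked out. */
static inline uint64_t sk_prot(uint64_t pte)
{
	return pte & SK_LOW_BITS & ~(SK_YOUNG | SK_DIRTY);
}

/*
 * block points at the first entry of an aligned group of SK_CONT_PTES
 * entries; idx is the target entry's index within that group.
 */
static inline uint64_t sk_get_lockless(sk_pte_t *block, unsigned int idx)
{
	for (;;) {
		uint64_t orig = sk_read(&block[idx]);
		uint64_t first_pfn, result;
		bool consistent = true;

		/* Not part of a contiguous range: consistent on its own. */
		if (!sk_valid_cont(orig))
			return orig;

		first_pfn = sk_pfn(orig) - idx;
		result = orig;

		for (unsigned int i = 0; i < SK_CONT_PTES; i++) {
			uint64_t pte = sk_read(&block[i]);

			/* Any mismatch means we raced with an update. */
			if (!sk_valid_cont(pte) ||
			    sk_pfn(pte) != first_pfn + i ||
			    sk_prot(pte) != sk_prot(orig)) {
				consistent = false;
				break;
			}

			/* Gather access/dirty from every entry. */
			result |= pte & (SK_YOUNG | SK_DIRTY);
		}

		if (consistent)
			return result;
		/* Otherwise start again from the target entry. */
	}
}
```

Note that the loop only verifies that the entries look alike at the
moments they are individually read. A full unfold/dirty/clean/refold
cycle that completes between two of those reads would pass the check in
this model, which is the window described in the reply above.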