Date: Thu, 11 May 2023 15:02:55 +0100
From: Matthew Wilcox
To: Hugh Dickins
Cc: Michel Lespinasse, linux-ia64@vger.kernel.org, David Hildenbrand,
 Peter Zijlstra, Catalin Marinas, Qi Zheng, linux-kernel@vger.kernel.org,
 Max Filippov, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
 Claudio Imbrenda, Will Deacon, Greg Ungerer, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, Helge Deller, x86@kernel.org, Russell King,
 Geert Uytterhoeven, Christian Borntraeger, Alexandre Ghiti, Heiko Carstens,
 linux-m68k@lists.linux-m68k.org, John Paul Adrian Glaubitz,
 John David Anglin, Suren Baghdasaryan,
 linux-arm-kernel@lists.infradead.org, Chris Zankel, Michal Simek,
 Thomas Bogendoerfer, linux-parisc@vger.kernel.org, linux-mm@kvack.org,
 linux-mips@vger.kernel.org, Palmer Dabbelt, "Kirill A. Shutemov",
 "Aneesh Kumar K.V", Andrew Morton, linuxppc-dev@lists.ozlabs.org,
 "David S. Miller", Mike Rapoport, Mike Kravetz
Subject: Re: [PATCH 00/23] arch: allow pte_offset_map[_lock]() to fail
References: <77a5d8c-406b-7068-4f17-23b7ac53bc83@google.com>

On Wed, May 10, 2023 at 09:35:44PM -0700, Hugh Dickins wrote:
> On Wed, 10 May 2023, Matthew Wilcox wrote:
> > On Tue, May 09, 2023 at 09:39:13PM -0700, Hugh Dickins wrote:
> > > Two: pte_offset_map() will need to do an rcu_read_lock(), with the
> > > corresponding rcu_read_unlock() in pte_unmap().  But most architectures
> > > never supported CONFIG_HIGHPTE, so some don't always call pte_unmap()
> > > after pte_offset_map(), or have used userspace pte_offset_map() where
> > > pte_offset_kernel() is more correct.  No problem in the current tree,
> > > but a problem once an rcu_read_unlock() will be needed to keep balance.
> >
> > Hi Hugh,
> >
> > I shall have to spend some time looking at these patches, but at LSFMM
> > just a few hours ago, I proposed and nobody objected to removing
> > CONFIG_HIGHPTE.  I don't intend to take action on that consensus
> > immediately, so I can certainly wait until your patches are applied, but
> > if this information simplifies what you're doing, feel free to act on it.
>
> Thanks a lot, Matthew: very considerate, as usual.
>
> Yes, I did see your "Whither Highmem?" (wither highmem!) proposal on the

I'm glad somebody noticed the pun ;-)

> list, and it did make me think, better get these patches and preview out
> soon, before you get to vanish pte_unmap() altogether.  HIGHMEM or not,
> HIGHPTE or not, I think pte_offset_map() and pte_unmap() still have an
> important role to play.
>
> I don't really understand why you're going down a remove-CONFIG_HIGHPTE
> route: I thought you were motivated by the awkwardness of kmap on large
> folios; but I don't see how removing HIGHPTE helps with that at all
> (unless you have a "large page tables" effort in mind, but I doubt it).

Quite right, my primary concern is filesystem metadata; primarily
directories as I don't think anybody has ever supported symlinks or
superblocks larger than 4kB.

I was thinking that removing CONFIG_HIGHPTE might simplify the page
fault handling path a little, but now I've looked at it some more, and
I'm not sure there's any simplification to be had.  It should probably
use kmap_local instead of kmap_atomic(), though.

> But I've no investment in CONFIG_HIGHPTE if people think now is the
> time to remove it: I disagree, but wouldn't miss it myself - so long
> as you leave pte_offset_map() and pte_unmap() (under whatever names).
>
> I don't think removing CONFIG_HIGHPTE will simplify what I'm doing.
> For a moment it looked like it would: the PAE case is nasty (and our
> data centres have not been on PAE for a long time, so it wasn't a
> problem I had to face before); and knowing pmd_high must be 0 for a
> page table looked like it would help, but now I'm not so sure of that
> (hmm, I'm changing my mind again as I write).
>
> Peter's pmdp_get_lockless() does rely for complete correctness on
> interrupts being disabled, and I suspect that I may be forced in the
> PAE case to do so briefly; but detest that notion.  For now I'm just
> deferring it, hoping for a better idea before third series finalized.
>
> I mention this (and Cc Peter) in passing: don't want this arch thread
> to go down into that rabbit hole: we can start a fresh thread on it if
> you wish, but right now my priority is commit messages for the second
> series, rather than solving (or even detailing) the PAE problem.

I infer that what you need is a pte_access_start() and a
pte_access_end() which look like they can be plausibly rcu_read_lock()
and rcu_read_unlock(), but might need to be local_irq_save() and
local_irq_restore() in some configurations?

We also talked about moving x86 to always RCU-free page tables in
order to make accessing /proc/$pid/smaps lockless.
I believe Michel is going to take a swing at this project.
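
To make the pte_access_start()/pte_access_end() inference above a bit
more concrete, here is a rough userspace model of the pairing.  It is
only a sketch under stated assumptions, not kernel code: the model_*
helpers, struct pte_access and the need_irqs_off switch are hypothetical
names invented for illustration, and they merely count nesting so that
the start/end balance can be checked.

```c
/* Userspace model only: kernel primitives are replaced by simple counters. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(). */
static int rcu_depth;
static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore(). */
static bool irqs_enabled = true;
static unsigned long model_local_irq_save(void)
{
	unsigned long flags = irqs_enabled;	/* remember the previous state */

	irqs_enabled = false;
	return flags;
}
static void model_local_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Assumption: some configurations (the PAE case, say) need interrupts off. */
static bool need_irqs_off;

/* Hypothetical cookie carried from start to end, so the two always match. */
struct pte_access {
	bool irqs_off;
	unsigned long flags;
};

static void pte_access_start(struct pte_access *a)
{
	a->irqs_off = need_irqs_off;
	a->flags = 0;
	if (a->irqs_off)
		a->flags = model_local_irq_save();
	else
		model_rcu_read_lock();
}

static void pte_access_end(struct pte_access *a)
{
	/* Undo whichever protection pte_access_start() actually took. */
	if (a->irqs_off)
		model_local_irq_restore(a->flags);
	else
		model_rcu_read_unlock();
}

/* True when every pte_access_start() has had its pte_access_end(). */
static bool pte_access_balanced(void)
{
	return rcu_depth == 0 && irqs_enabled;
}
```

A caller that forgets pte_access_end(), like the architectures that skip
pte_unmap() today, would leave pte_access_balanced() false in this
model, which is the imbalance Hugh describes.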