Date: Mon, 2 Jan 2023 14:37:02 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, liam.howlett@oracle.com, surenb@google.com,
	ldufour@linux.ibm.com, michel@lespinasse.org, vbabka@suse.cz,
	linux-kernel@vger.kernel.org
Subject: Re: [QUESTION] about the maple tree and current status of mmap_lock scalability

On Mon, Jan 02, 2023 at 09:04:12PM +0900, Hyeonggon Yoo wrote:
> > https://www.infradead.org/~willy/linux/store-free-page-faults.html
> > outlines how I intend to proceed from Suren's current scheme (where
> > RCU is only used to protect the tree walk) to using RCU for the
> > entire page fault.
>
> Thank you for sharing your outline.
> Okay, so the planned scheme is:
>
> 1. Try to process the entire page fault under RCU protection
>    - if failed, goto 2; if succeeded, goto 4.
>
> 2. Fall back to Suren's scheme (try to take the VMA rwsem)
>    - if failed, goto 3; if succeeded, goto 4.

Right.  The question is whether to restart the page fault under
Suren's scheme, or just to grab the VMA rwsem and continue.
Experimentation needed.

It's also worth noting that Michel has an alternative proposal, which
is to drop out of RCU protection before trying to allocate memory,
then re-enter RCU mode and check that the sequence count on the
entire MM hasn't changed.  His proposal has the advantage of not
trying to allocate memory while holding the RCU read lock, but the
disadvantage of having to retry the page fault if anyone has called
mmap() or munmap() in the meantime.

Which alternative is better is going to depend on the workload: do we
see more calls to mmap()/munmap(), or do we need to enter page
reclaim more often?  I think they're largely equivalent
performance-wise in the fast path.  Another metric to consider is
code complexity; he thinks his method is easier to understand, and I
think mine is.  To be expected, I suppose ;-)
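As a rough sketch of how Michel's variant might look (all names here
are illustrative, not taken from any posted patch; in particular,
mm_seq stands in for a hypothetical per-MM seqcount bumped by
mmap()/munmap()):

/*
 * Sketch only.  Drop RCU before a potentially sleeping allocation,
 * then re-enter RCU and revalidate the whole MM via a seqcount.
 */
vm_fault_t sketch_rcu_fault(struct mm_struct *mm, unsigned long addr)
{
	struct page *page;
	unsigned int seq;

retry:
	rcu_read_lock();
	seq = read_seqcount_begin(&mm->mm_seq);	/* hypothetical field */
	/* ... walk the maple tree and find the VMA ... */

	/* Leave the RCU read-side section before we might sleep. */
	rcu_read_unlock();
	page = alloc_page(GFP_KERNEL);
	if (!page)
		return VM_FAULT_OOM;
	rcu_read_lock();

	if (read_seqcount_retry(&mm->mm_seq, seq)) {
		/* Someone called mmap()/munmap(); start over. */
		rcu_read_unlock();
		__free_page(page);
		goto retry;
	}

	/* ... seqcount unchanged, so the VMA is still valid; install
	 * the PTE ... */
	rcu_read_unlock();
	return 0;
}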
> 3. Fall back to mmap_lock
>    - goto 4.
>
> 4. Finish page fault.
>
> To implement 1, __p*d_alloc() needs to take gfp flags
> so as not to sleep in an RCU read-side critical section.
>
> What about introducing a PF_MEMALLOC_NOWAIT process flag forcing
> GFP_NOWAIT | __GFP_NOWARN,
> similar to PF_MEMALLOC_NO{FS,IO}, looking like this?
>
> Will be less churn.

Certainly less churn, but also far more risky.  All of a sudden,
codepaths which used to always succeed will now start failing, and
either there are no checks for memory allocation failure, or those
failure paths have never been tested before.
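For reference, the PF_MEMALLOC_NO{FS,IO} pattern referred to above is
the scoped-allocation API in <linux/sched/mm.h>.  A NOWAIT analogue
would presumably mirror the NOFS helpers; the sketch below is purely
illustrative (PF_MEMALLOC_NOWAIT and the memalloc_nowait_*() helpers
are not mainline APIs, and a real patch would need to find a free
PF_* bit):

/* The existing, real API from <linux/sched/mm.h>: */
unsigned int flags = memalloc_nofs_save();
/* ... any allocation here implicitly behaves as if GFP_NOFS ... */
memalloc_nofs_restore(flags);

/* A hypothetical NOWAIT analogue, mirroring the NOFS helpers: */
static inline unsigned int memalloc_nowait_save(void)
{
	unsigned int flags = current->flags & PF_MEMALLOC_NOWAIT;

	current->flags |= PF_MEMALLOC_NOWAIT;
	return flags;
}

static inline void memalloc_nowait_restore(unsigned int flags)
{
	current->flags = (current->flags & ~PF_MEMALLOC_NOWAIT) | flags;
}

/*
 * current_gfp_context() would then strip __GFP_DIRECT_RECLAIM (and
 * could add __GFP_NOWARN) while PF_MEMALLOC_NOWAIT is set, so that
 * __p*d_alloc() callers under rcu_read_lock() never sleep.
 */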