Date: Tue, 9 Feb 2021 13:38:22 -0400
From: Jason Gunthorpe
To: Laurent Dufour
Cc: Matthew Wilcox, linux-mm@kvack.org, "Liam R. Howlett", Paul McKenney
Subject: Re: synchronize_rcu in munmap?
Message-ID: <20210209173822.GH4718@ziepe.ca>
References: <20210208132643.GP308988@casper.infradead.org> <20210209142941.GY308988@casper.infradead.org> <17e3b4d0-8a16-75ba-e1c7-b678e4cf2089@linux.ibm.com>
In-Reply-To: <17e3b4d0-8a16-75ba-e1c7-b678e4cf2089@linux.ibm.com>

On Tue, Feb 09, 2021 at 06:19:35PM +0100, Laurent Dufour wrote:
> On 09/02/2021 at 15:29, Matthew Wilcox wrote:
> > On Mon, Feb 08, 2021 at 01:26:43PM +0000, Matthew Wilcox wrote:
> > > Next problem: /proc/$pid/smaps calls walk_page_vma() which starts out by
> > > saying:
> > >         mmap_assert_locked(walk.mm);
> > > which made me realise that smaps is also going to walk the page tables.
> > > So the page tables have to be pinned by the existence of the VMA.
> > > Which means the page tables must be freed by the same RCU callback that
> > > frees the VMA.  But doing that means that a task which calls mmap();
> > > munmap(); mmap(); must avoid allocating the same address for the second
> > > mmap (until the RCU grace period has elapsed), otherwise threads on
> > > other CPUs may see the stale PTEs instead of the new ones.
> > >
> > > Solution 1: Move the page table freeing into the RCU callback, call
> > > synchronize_rcu() in munmap().
> > >
> > > Solution 2: Refcount the VMA and free the page tables on refcount
> > > dropping to zero.  This doesn't actually work because the stale PTE
> > > problem still exists.
> > >
> > > Solution 3: When unmapping a VMA, instead of erasing the VMA from the
> > > maple tree, put a "dead" entry in its place.  Once the RCU freeing and the
> > > TLB shootdown has happened, erase the entry and it can then be allocated.
> > > If we do that MAP_FIXED will have to synchronize_rcu() if it overlaps
> > > a dead entry.
> >
> > Solution 4: RCU free the page table pages and teach pagewalk.c to
> > be RCU-safe.  That means that it will have to use rcu_dereference()
> > or READ_ONCE to dereference (eg) pmdp, but also allows GUP-fast to run
> > under the rcu read lock instead of disabling interrupts.
>
> I might be wrong but my understanding is that the RCU window could not be
> closed on a CPU where IRQs are disabled. So in a first step GUP-fast might
> continue to disable interrupts to get safe walking the page directories.

Yes, this is right. PPC already uses RCU for the TLB flush and the
GUP-fast trick is safe against that.

The comments for PPC say the downside of RCU is having to do an
allocation in paths that really don't want to fail on memory
exhaustion

The pagewalk.c needs to call its ops in a sleepable context, otherwise
it could just use the normal page table locks.. Not sure RCU could be
fit into here?

Jason
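
For readers following the "Solution 4" part of the thread, here is a rough
userspace sketch of what "use rcu_dereference() or READ_ONCE to dereference
pmdp" amounts to: the walker loads the upper-level entry exactly once into a
local, then tests and follows only that local copy. This is not kernel code.
The types leaf_table and upper_table and the helpers make_entry() and walk()
are hypothetical names made up for the illustration, a C11 acquire load stands
in for READ_ONCE()/rcu_dereference(), and the deferred freeing of the leaf
table after a grace period is not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define ENTRY_PRESENT ((uintptr_t)0x1)

/* Hypothetical two-level table: each upper entry holds a leaf pointer
 * with the low bit used as a "present" flag. */
typedef struct { uint64_t pte[4]; } leaf_table;
typedef struct { _Atomic uintptr_t entry[4]; } upper_table;

/* Encode a leaf pointer as a present upper-level entry. */
static uintptr_t make_entry(leaf_table *leaf)
{
    /* The flag bit must be free, i.e. the leaf is at least 2-byte aligned. */
    assert(((uintptr_t)leaf & ENTRY_PRESENT) == 0);
    return (uintptr_t)leaf | ENTRY_PRESENT;
}

/* Lockless lookup: returns 1 and stores the leaf value in *out when the
 * upper entry is present, 0 otherwise. */
static int walk(upper_table *up, unsigned hi, unsigned lo, uint64_t *out)
{
    /* Single load of the entry (userspace stand-in for READ_ONCE(*pmdp)).
     * Every later decision uses this snapshot, so a concurrent clear of
     * the entry cannot change it between the check and the dereference. */
    uintptr_t e = atomic_load_explicit(&up->entry[hi], memory_order_acquire);

    if (!(e & ENTRY_PRESENT))
        return 0;

    leaf_table *leaf = (leaf_table *)(e & ~ENTRY_PRESENT);
    *out = leaf->pte[lo];
    return 1;
}
```

A writer would publish a leaf with a release store of make_entry(&leaf) and
unmap by storing 0. What keeps the snapshot safe to follow after such a clear
is exactly the open question in the thread: the leaf must not be freed until
every walker that could have loaded the old entry is done, whether that is
guaranteed by an RCU grace period or by the IRQ-disable trick GUP-fast uses.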