Date: Mon, 5 Jun 2023 23:18:50 -0700 (PDT)
From: Hugh Dickins
To: Jann Horn
Cc: Hugh Dickins, Andrew Morton, Mike Kravetz, Mike Rapoport,
    "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Xu,
    Peter Zijlstra, Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell,
    Ira Weiny, Steven Price, SeongJae Park, Naoya Horiguchi,
    Christophe Leroy, Zack Rusin, Jason Gunthorpe, Axel Rasmussen,
    Anshuman Khandual, Pasha Tatashin, Miaohe Lin, Minchan Kim,
    Christoph Hellwig, Song Liu, Thomas Hellstrom, Russell King,
    "David S. Miller", Michael Ellerman, "Aneesh Kumar K.V",
    Heiko Carstens, Christian Borntraeger, Claudio Imbrenda,
    Alexander Gordeev, linux-arm-kernel@lists.infradead.org,
    sparclinux@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
Subject: Re: [PATCH 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock
References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>
    <2e9996fa-d238-e7c-1194-834a2bd1f60@google.com>

On Wed, 31 May 2023, Jann Horn wrote:
> On Mon, May 29, 2023 at 8:25 AM Hugh Dickins wrote:
> > +static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
...
> > + * Note that vma->anon_vma check is racy: it can be set after
> > + * the check, but page locks (with XA_RETRY_ENTRYs in holes)
> > + * prevented establishing new ptes of the page. So we are safe
> > + * to remove page table below, without even checking it's empty.
>
> This "we are safe to remove page table below, without even checking
> it's empty" assumes that the only way to create new anonymous PTEs is
> to use existing file PTEs, right? What about private shmem VMAs that
> are registered with userfaultfd as VM_UFFD_MISSING? I think for those,
> the UFFDIO_COPY ioctl lets you directly insert anonymous PTEs without
> looking at the mapping and its pages (except for checking that the
> insertion point is before end-of-file), protected only by mmap_lock
> (shared) and pte_offset_map_lock().

Right, from your comments and Peter's, thank you both, I can see that
userfaultfd breaks the usual assumptions here: so I'm putting an
		if (unlikely(vma->anon_vma || userfaultfd_wp(vma)))
check in once we've got the ptlock; with a comment above it to point
the blame at uffd, though I gave up on describing all the detail.
And deleted this earlier "we are safe" paragraph.

You did suggest, in another mail, that perhaps there should be a scan
checking all pte_none() when we get the ptlock. I wasn't keen on yet
another debug scan for bugs and didn't add that, thinking I was going
to add a patch on the end to do so in page_table_check_pte_clear_range().
But when I came to write that patch, found that I'd been misled by its
name: it's about checking or adjusting some accounting, not really a
suitable place to check for pte_none() at all; so just scrapped it.

...
> > -	collapse_and_free_pmd(mm, vma, addr, pmd);
>
> The old code called collapse_and_free_pmd(), which involves MMU
> notifier invocation...

...
> > +	pml = pmd_lock(mm, pmd);
> > +	ptl = pte_lockptr(mm, pmd);
> > +	if (ptl != pml)
> > +		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> > +	pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
>
> ... while the new code only does pmdp_collapse_flush(), which clears
> the pmd entry and does a TLB flush, but AFAICS doesn't use MMU
> notifiers. My understanding is that that's problematic - maybe (?) it
> is sort of okay with regards to classic MMU notifier users like KVM,
> but it's probably wrong for IOMMUv2 users, where an IOMMU directly
> consumes the normal page tables?

Right, I intentionally left out the MMU notifier invocation, knowing
that we have already done an MMU notifier invocation when unmapping
any PTEs which were mapped: it was necessary for collapse_and_free_pmd()
in the collapse_pte_mapped_thp() case, but there was no notifier in this
case for many years, and I was glad to be rid of it.

However, I now see that you were adding it intentionally even for this
case in your f268f6cf875f; and from later comments in this thread, it
looks like there is still uncertainty about whether it is needed here,
but safer to assume that it is needed: I'll add it back.

>
> (FWIW, last I looked, there also seemed to be some other issues with
> MMU notifier usage wrt IOMMUv2, see the thread
> .)
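
For readers following the thread, here is a minimal userspace sketch of
the pattern discussed above: take the pmd lock, take the pte lock as
well if it is a different lock, and only then recheck the racy
condition before detaching the page table. It is a toy model, not the
kernel code. The types toy_vma and toy_pmd and the function toy_retract()
are hypothetical stand-ins, pthread mutexes replace spinlocks, and the
TLB flush and MMU notifier calls are left out. Skipping the page table
when the recheck fires is an assumption about what the check does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_vma {
	bool has_anon_vma;	/* models vma->anon_vma != NULL */
	bool uffd_wp;		/* models userfaultfd_wp(vma) */
};

struct toy_pmd {
	pthread_mutex_t *pml;	/* models the lock taken by pmd_lock() */
	pthread_mutex_t *ptl;	/* models pte_lockptr(); may equal pml */
	void *page_table;	/* non-NULL while a page table is attached */
};

/*
 * Try to detach the page table from the pmd. Returns the detached
 * table, or NULL if the recheck under the locks says to leave it alone.
 */
static void *toy_retract(struct toy_vma *vma, struct toy_pmd *pmd)
{
	void *table = NULL;

	assert(vma != NULL && pmd != NULL);

	pthread_mutex_lock(pmd->pml);
	if (pmd->ptl != pmd->pml)	/* avoid locking the same lock twice */
		pthread_mutex_lock(pmd->ptl);

	/* Recheck with both locks held: any earlier check was racy. */
	if (!(vma->has_anon_vma || vma->uffd_wp)) {
		table = pmd->page_table;
		pmd->page_table = NULL;
	}

	if (pmd->ptl != pmd->pml)
		pthread_mutex_unlock(pmd->ptl);
	pthread_mutex_unlock(pmd->pml);
	return table;
}
```

The point of the ordering is that the condition is tested only after
both locks are held, so a concurrent change made under either lock
cannot slip in between the check and the removal.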