Date: Tue, 6 Dec 2022 12:39:53 -0500
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton,
    Jann Horn, Andrew Morton, Andrea Arcangeli, Rik van Riel, Nadav Amit,
    Miaohe Lin, Muchun Song, David Hildenbrand
Subject: Re: [PATCH 09/10] mm/hugetlb: Make page_vma_mapped_walk() safe to pmd unshare
References: <20221129193526.3588187-1-peterx@redhat.com> <20221129193526.3588187-10-peterx@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline

On Tue, Dec 06, 2022 at 09:10:00AM -0800, Mike Kravetz wrote:
> On 12/05/22 15:52, Mike Kravetz wrote:
> > On 11/29/22 14:35, Peter Xu wrote:
> > > Since page_vma_mapped_walk() walks the pgtable, it needs the vma lock
> > > to make sure the pgtable page will not be freed concurrently.
> > > 
> > > Signed-off-by: Peter Xu
> > > ---
> > >  include/linux/rmap.h | 4 ++++
> > >  mm/page_vma_mapped.c | 5 ++++-
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > > index bd3504d11b15..a50d18bb86aa 100644
> > > --- a/include/linux/rmap.h
> > > +++ b/include/linux/rmap.h
> > > @@ -13,6 +13,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  /*
> > >   * The anon_vma heads a list of private "related" vmas, to scan if
> > > @@ -408,6 +409,9 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
> > >  		pte_unmap(pvmw->pte);
> > >  	if (pvmw->ptl)
> > >  		spin_unlock(pvmw->ptl);
> > > +	/* This needs to be after unlock of the spinlock */
> > > +	if (is_vm_hugetlb_page(pvmw->vma))
> > > +		hugetlb_vma_unlock_read(pvmw->vma);
> > >  }
> > >  
> > >  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);
> > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > index 93e13fc17d3c..f94ec78b54ff 100644
> > > --- a/mm/page_vma_mapped.c
> > > +++ b/mm/page_vma_mapped.c
> > > @@ -169,10 +169,13 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > >  		if (pvmw->pte)
> > >  			return not_found(pvmw);
> > >  
> > > +		hugetlb_vma_lock_read(vma);
> > >  		/* when pud is not present, pte will be NULL */
> > >  		pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
> > > -		if (!pvmw->pte)
> > > +		if (!pvmw->pte) {
> > > +			hugetlb_vma_unlock_read(vma);
> > >  			return false;
> > > +		}
> > >  
> > >  		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
> > >  		if (!check_pte(pvmw))
> > 
> > I think this is going to cause try_to_unmap() to always fail for hugetlb
> > shared pages.  See try_to_unmap_one:
> > 
> > 	while (page_vma_mapped_walk(&pvmw)) {
> > 	...
> > 		if (folio_test_hugetlb(folio)) {
> > 		...
> > 			/*
> > 			 * To call huge_pmd_unshare, i_mmap_rwsem must be
> > 			 * held in write mode.  Caller needs to explicitly
> > 			 * do this outside rmap routines.
> > 			 *
> > 			 * We also must hold hugetlb vma_lock in write mode.
> > 			 * Lock order dictates acquiring vma_lock BEFORE
> > 			 * i_mmap_rwsem.  We can only try lock here and fail
> > 			 * if unsuccessful.
> > 			 */
> > 			if (!anon) {
> > 				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> > 				if (!hugetlb_vma_trylock_write(vma)) {
> > 					page_vma_mapped_walk_done(&pvmw);
> > 					ret = false;
> > 				}
> > 
> > Can not think of a great solution right now.
> 
> Thought of this last night ...
> 
> Perhaps we do not need vma_lock in this code path (not sure about all
> page_vma_mapped_walk calls).  Why?  We already hold i_mmap_rwsem.

Exactly.  The only concern is when it's not called from an rmap walk.

I'm actually preparing something that adds a new flag to PVMW, like (a
rough sketch follows further below):

	#define PVMW_HUGETLB_NEEDS_LOCK	(1 << 2)

But maybe we don't need that at all: after a closer look, the only
outliers that do not come from an rmap walk are:

	__replace_page
	write_protect_page

I'm pretty sure KSM doesn't have hugetlb involved, and the other one is
uprobe (uprobe_write_opcode), which I think is the same.  If that's true,
we can simply drop this patch.

Then we also have hugetlb_walk(), and the lock checks there guarantee that
we're safe anyway.  We could document this fact, so I also attached a
comment patch for it, to be appended to the end of the patchset.

Mike, let me know what you think.
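For completeness, here is a rough sketch of how that PVMW flag could be
wired up, in case we do end up needing it.  This is not a tested patch:
the flag value and the exact placement of the lock call are assumptions
layered on top of the hugetlb branch quoted above; only the existing
PVMW_SYNC/PVMW_MIGRATION flags and the hugetlb vma lock helpers are taken
from the current code.

	/* include/linux/rmap.h (sketch only) */
	#define PVMW_SYNC			(1 << 0)
	#define PVMW_MIGRATION			(1 << 1)
	/*
	 * Hypothetical flag: set by callers that are not part of an rmap
	 * walk (e.g. uprobes' __replace_page), so they do not hold
	 * i_mmap_rwsem and the walk must take the hugetlb vma lock itself.
	 */
	#define PVMW_HUGETLB_NEEDS_LOCK		(1 << 2)

	/* mm/page_vma_mapped.c, hugetlb branch of page_vma_mapped_walk() */
	if (unlikely(is_vm_hugetlb_page(vma))) {
		struct hstate *hstate = hstate_vma(vma);
		unsigned long size = huge_page_size(hstate);

		/* The only possible mapping was handled on last iteration */
		if (pvmw->pte)
			return not_found(pvmw);

		/*
		 * Rmap callers already hold i_mmap_rwsem here (see the
		 * discussion above); only take the vma lock for the few
		 * callers that told us they cannot guarantee that.
		 */
		if (pvmw->flags & PVMW_HUGETLB_NEEDS_LOCK)
			hugetlb_vma_lock_read(vma);

		/* when pud is not present, pte will be NULL */
		pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
		...
	}

page_vma_mapped_walk_done() would then only call hugetlb_vma_unlock_read()
under the same flag check, and the two non-rmap callers listed above would
be the only ones passing the flag.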
Andrew, if this patch is to be dropped then the last patch may not apply
cleanly.  Let me know if you want a full repost of the series.

Thanks,

-- 
Peter Xu