Date: Fri, 9 Aug 2024 13:15:21 -0400
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sean Christopherson, Oscar Salvador, Jason Gunthorpe,
	Axel Rasmussen, linux-arm-kernel@lists.infradead.org,
	x86@kernel.org, Will Deacon, Gavin Shan, Paolo Bonzini, Zi Yan,
	Andrew Morton, Catalin Marinas, Ingo Molnar, Alistair Popple,
	Borislav Petkov, Thomas Gleixner, kvm@vger.kernel.org,
	Dave Hansen, Alex Williamson, Yan Zhao
Subject: Re: [PATCH 07/19] mm/fork: Accept huge pfnmap entries
References: <20240809160909.1023470-1-peterx@redhat.com>
	<20240809160909.1023470-8-peterx@redhat.com>

On Fri, Aug 09, 2024 at 06:32:44PM +0200, David Hildenbrand wrote:
> On 09.08.24 18:08, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels.
> > Pud is much easier, the write bit needs to be persisted though for
> > writable and shared pud mappings like PFNMAP ones, otherwise a follow
> > up write in either parent or child process will trigger a write fault.
> >
> > Do the same for pmd level.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  mm/huge_memory.c | 27 ++++++++++++++++++++++++---
> >  1 file changed, 24 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 6568586b21ab..015c9468eed5 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1375,6 +1375,22 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pgtable_t pgtable = NULL;
> >  	int ret = -ENOMEM;
> > +	pmd = pmdp_get_lockless(src_pmd);
> > +	if (unlikely(pmd_special(pmd))) {
> > +		dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > +		src_ptl = pmd_lockptr(src_mm, src_pmd);
> > +		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > +		/*
> > +		 * No need to recheck the pmd, it can't change with write
> > +		 * mmap lock held here.
> > +		 */
> > +		if (is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd)) {
> > +			pmdp_set_wrprotect(src_mm, addr, src_pmd);
> > +			pmd = pmd_wrprotect(pmd);
> > +		}
> > +		goto set_pmd;
> > +	}
> > +
>
> I strongly assume we should be using vm_normal_page_pmd() instead of
> pmd_page() further below. pmd_special() should be mostly limited to GUP-fast
> and vm_normal_page_pmd().

One thing to mention is that it has this:

	if (!vma_is_anonymous(dst_vma))
		return 0;

So everything below that is only about anonymous memory.  In that case I
feel like the pmd_page() is benign, and actually good.

Though what you're saying here made me notice that my check above doesn't
seem to be necessary.  I mean, "(is_cow_mapping(src_vma->vm_flags) &&
pmd_write(pmd))" can't be true when the special bit is set, aka, for
pfnmaps: if the entry is writable in a CoW mapping, it means it's already
anon.

I think I can probably drop that line there, perhaps with a
VM_WARN_ON_ONCE() making sure it won't happen.
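To make the argument above a bit more concrete, here is a small userspace
sketch (not kernel code) of the condition in question.  The predicate has the
same shape as the kernel's is_cow_mapping(), and the flag values are the ones
I believe include/linux/mm.h uses.  struct model_pmd and
special_pmd_unexpected() are hypothetical names made up for illustration only;
they model "a VM_WARN_ON_ONCE() would fire here" and say nothing about how the
final patch looks.

```c
#include <stdbool.h>

/* Assumed to match include/linux/mm.h; this is a userspace model only. */
#define VM_WRITE	0x00000002UL
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL
#define VM_PFNMAP	0x00000400UL

/* A mapping is CoW when it is private and may become writable. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical stand-in for a huge pmd: only the two bits discussed. */
struct model_pmd {
	bool special;	/* pmd_special(): pfnmap entry, no struct page */
	bool write;	/* pmd_write() */
};

/*
 * Hypothetical helper: true for the combination argued to be impossible,
 * i.e. a writable special entry found in a CoW mapping at fork time.
 */
static bool special_pmd_unexpected(unsigned long vm_flags, struct model_pmd pmd)
{
	return pmd.special && is_cow_mapping(vm_flags) && pmd.write;
}
```

For a shared writable pfnmap (VM_SHARED | VM_MAYWRITE | VM_WRITE | VM_PFNMAP)
is_cow_mapping() is false, so the write bit is simply carried over to the
child.  The helper only returns true for a private mapping that holds a
writable special entry, which is the case expected never to occur.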
>
> Again, we should be doing this similar to how we handle PTEs.
>
> I'm a bit confused about the "unlikely(!pmd_trans_huge(pmd))" check, below:
> what else should we have here if it's not a migration entry but a present
> entry?

I had a feeling that it was just a safety belt, there since the first day of
THP when Andrea worked that out, so that it'll work with e.g. file truncation
races.

But with the current code it looks like this path is indeed anonymous only,
so that doesn't look possible, at least from that point of view.

Thanks,

>
> Likely this function needs a bit of rework.
>
> --
> Cheers,
>
> David / dhildenb
>

-- 
Peter Xu