Date: Tue, 3 Sep 2024 17:23:38 -0400
From: Peter Xu
To: Yan Zhao
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan,
 Catalin Marinas, x86@kernel.org, Ingo Molnar, Andrew Morton,
 Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Sean Christopherson, Oscar Salvador, Jason Gunthorpe, Borislav Petkov,
 Zi Yan, Axel Rasmussen, David Hildenbrand, Will Deacon, Kefeng Wang,
 Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
References: <20240826204353.2228736-1-peterx@redhat.com>
 <20240826204353.2228736-8-peterx@redhat.com>

On Mon, Sep 02, 2024 at 03:58:38PM +0800, Yan Zhao wrote:
> On Mon, Aug 26, 2024 at 04:43:41PM -0400, Peter Xu wrote:
> > Teach the fork code to properly copy pfnmaps for pmd/pud levels.
> > Pud is much easier, the write bit needs to be persisted though for
> > writable and shared pud mappings like PFNMAP ones, otherwise a follow up
> > write in either parent or child process will trigger a write fault.
> >
> > Do the same for pmd level.
> >
> > Signed-off-by: Peter Xu
> > ---
> >  mm/huge_memory.c | 29 ++++++++++++++++++++++++++---
> >  1 file changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e2c314f631f3..15418ffdd377 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1559,6 +1559,24 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pgtable_t pgtable = NULL;
> >  	int ret = -ENOMEM;
> >
> > +	pmd = pmdp_get_lockless(src_pmd);
> > +	if (unlikely(pmd_special(pmd))) {
> > +		dst_ptl = pmd_lock(dst_mm, dst_pmd);
> > +		src_ptl = pmd_lockptr(src_mm, src_pmd);
> > +		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
> > +		/*
> > +		 * No need to recheck the pmd, it can't change with write
> > +		 * mmap lock held here.
> > +		 *
> > +		 * Meanwhile, making sure it's not a CoW VMA with writable
> > +		 * mapping, otherwise it means either the anon page wrongly
> > +		 * applied special bit, or we made the PRIVATE mapping be
> > +		 * able to wrongly write to the backend MMIO.
> > +		 */
> > +		VM_WARN_ON_ONCE(is_cow_mapping(src_vma->vm_flags) && pmd_write(pmd));
> > +		goto set_pmd;
> > +	}
> > +
> >  	/* Skip if can be re-fill on fault */
> >  	if (!vma_is_anonymous(dst_vma))
> >  		return 0;
> > @@ -1640,7 +1658,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	pmdp_set_wrprotect(src_mm, addr, src_pmd);
> >  	if (!userfaultfd_wp(dst_vma))
> >  		pmd = pmd_clear_uffd_wp(pmd);
> > -	pmd = pmd_mkold(pmd_wrprotect(pmd));
> > +	pmd = pmd_wrprotect(pmd);
> > +set_pmd:
> > +	pmd = pmd_mkold(pmd);
> >  	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
> >
> >  	ret = 0;
> > @@ -1686,8 +1706,11 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	 * TODO: once we support anonymous pages, use
> >  	 * folio_try_dup_anon_rmap_*() and split if duplicating fails.
> >  	 */
> > -	pudp_set_wrprotect(src_mm, addr, src_pud);
> > -	pud = pud_mkold(pud_wrprotect(pud));
> > +	if (is_cow_mapping(vma->vm_flags) && pud_write(pud)) {
> > +		pudp_set_wrprotect(src_mm, addr, src_pud);
> > +		pud = pud_wrprotect(pud);
> > +	}
> Do we need the logic to clear dirty bit in the child as that in
> __copy_present_ptes()?  (and also for the pmd's case).
>
> e.g.
> if (vma->vm_flags & VM_SHARED)
> 	pud = pud_mkclean(pud);

Yeah, good question.  I remember I thought about that when initially
working on these lines, but I forgot the details, or maybe I simply tried
to stick with the current code base, as the dirty bit used to be kept even
in the child here.

I'd expect there's only performance differences, but still sounds like I'd
better leave that to whoever knows the best on the implications, then draft
it as a separate patch but only when needed.

Thanks,

-- 
Peter Xu
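
For readers following the thread, here is a minimal user-space sketch of the pud-side decision discussed above: write-protect only when the mapping is CoW and the entry is writable, and optionally clear the dirty bit in the child for shared mappings as the review suggests. It is a toy model, not kernel code. The ENT_* bits, copy_huge_entry() and the clean_shared switch are hypothetical names made up for illustration; only the is_cow_mapping() predicate and the two VM_* values are meant to mirror the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the kernel's VM_* bits, but this is a user-space toy. */
#define VM_SHARED   0x00000008u
#define VM_MAYWRITE 0x00000020u

/* Hypothetical entry bits, not any architecture's real page-table layout. */
#define ENT_WRITE 0x1u
#define ENT_DIRTY 0x2u
#define ENT_YOUNG 0x4u

/* Same predicate as the kernel helper: a private mapping that may be written. */
static bool is_cow_mapping(uint32_t vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Model of copying one huge entry at fork time.
 * *src is the parent's entry and may be write-protected in place; the
 * return value is the entry that would be installed in the child.
 * clean_shared models the behaviour raised in the review (assumption:
 * it is NOT part of the quoted patch).
 */
static uint32_t copy_huge_entry(uint32_t vm_flags, uint32_t *src,
				bool clean_shared)
{
	uint32_t child = *src;

	if (is_cow_mapping(vm_flags) && (child & ENT_WRITE)) {
		*src &= ~ENT_WRITE;	/* parent faults on its next write */
		child &= ~ENT_WRITE;	/* and so does the child */
	}

	if (clean_shared && (vm_flags & VM_SHARED))
		child &= ~ENT_DIRTY;	/* suggested: child starts clean */

	return child & ~ENT_YOUNG;	/* child entry always starts old */
}
```

With clean_shared set to false, a shared writable entry keeps both its write and dirty bits in the child, which matches the "dirty bit used to be kept even in the child" behaviour described in the reply; setting it to true shows what the suggested pud_mkclean() variant would change.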