Date: Tue, 10 Sep 2024 08:16:10 -0400
From: Peter Xu
To: Yan Zhao
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan, Catalin Marinas, x86@kernel.org, Ingo Molnar,
	Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson, Oscar Salvador, Jason Gunthorpe,
	Borislav Petkov, Zi Yan, Axel Rasmussen, David Hildenbrand,
	Will Deacon, Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
References: <20240826204353.2228736-1-peterx@redhat.com>
	<20240826204353.2228736-8-peterx@redhat.com>
	<20240909152546.4ef47308e560ce120156bc35@linux-foundation.org>
	<20240909161539.aa685e3eb44cdc786b8c05d2@linux-foundation.org>

On Tue, Sep 10, 2024 at 10:52:01AM +0800, Yan Zhao wrote:
> Hi Peter,

Hi, Yan,

>
> Not sure if I missed anything.
>
> It looks that before this patch, pmd/pud are always write protected without
> checking "is_cow_mapping(vma->vm_flags) && pud_write(pud)". pud_wrprotect()
> clears dirty bit by moving the dirty value to the software bit.
>
> And I have a question that why previously pmd/pud are always write protected.

IIUC this is a separate question - the move of the dirty bit in
pud_wrprotect() is to avoid wrongly creating shadow stack mappings.  In our
discussion I think that's extra complexity that can be put aside; the dirty
bit will get recovered in pud_clear_saveddirty() later, so it's not the same
as pud_mkclean().

AFAIU the pmd/pud paths don't consider is_cow_mapping() because normally we
will not duplicate pgtables in fork() for most shared file mappings (!CoW).
Please refer to vma_needs_copy(), and the comment before the final "return
false" there.  I think it's not strictly is_cow_mapping(), as we're checking
anon_vma there, however it's mostly that, just to also cover MAP_PRIVATE
file mappings where no CoW has happened yet (if CoW had happened, then
anon_vma would already be there).

There are some outliers, e.g. userfault protected vmas, or
pfnmaps/mixedmaps.  Userfault & mixedmap are not involved in this series at
all, so let's discuss pfnmaps.

It means fork() can still copy pgtables for pfnmap vmas, and that's relevant
to this series, because before this series pfnmaps only existed at the pte
level.  Hence IMO the is_cow_mapping() check must exist at the pte level, as
you described, because it needs to properly take care of those.  Note that
the pte processing also checks pte_write() to make sure it's a COWed page,
not a RO page cache / pfnmap / ..., for example.

Meanwhile, since pfnmaps couldn't appear in pmd/pud, I think it's fair that
the pmd/pud paths assume that a huge mapping they see must be MAP_PRIVATE,
otherwise the whole copy_page_range() would already have been skipped.  IOW
I think they only needed to process COWed pages here, and those pages
require the write bit removed in both parent and child on fork().
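
To make the two behaviours above a bit more concrete, here is a minimal
userspace sketch.  It is a toy model and not kernel code: struct toy_entry,
the toy_copy_*() helpers and the flag values are all made up for
illustration, and only is_cow_mapping() follows the shape of the real
helper (private mapping that may become writable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag values for this model only; the kernel's real values differ. */
#define VM_SHARED   0x1u
#define VM_MAYWRITE 0x2u

/* Hypothetical page-table entry: only the write bit is modelled. */
struct toy_entry {
	bool write;
};

/* A mapping is CoW when it is not shared but may become writable. */
static bool is_cow_mapping(unsigned int vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/*
 * Old pmd/pud-style copy as described above: the entry is assumed to be
 * a COWed page, so the write bit is always removed in parent and child.
 */
static void toy_copy_always_wrprotect(struct toy_entry *parent,
				      struct toy_entry *child)
{
	assert(parent != NULL && child != NULL);
	parent->write = false;
	*child = *parent;
}

/*
 * pte-style copy: only a writable entry in a CoW mapping is write
 * protected; anything else is copied to the child unchanged.
 */
static void toy_copy_checked(unsigned int vm_flags,
			     struct toy_entry *parent,
			     struct toy_entry *child)
{
	assert(parent != NULL && child != NULL);
	if (is_cow_mapping(vm_flags) && parent->write)
		parent->write = false;
	*child = *parent;
}
```

In this model, calling toy_copy_checked() with VM_SHARED | VM_MAYWRITE
leaves a writable entry writable in both parent and child, while
toy_copy_always_wrprotect() clears the write bit no matter what kind of
mapping the entry belongs to.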

After this series, pfnmaps can appear in the form of pmd/pud, so the
previous assumption stops holding true, as we'll still always copy pfnmaps
during fork().  My guess at the reason is that most drivers map pfnmap vmas
only during mmap(), which means there is normally no fault() handler at all
for those pfns.

In this case, we'll need to also identify whether the page is COWed, using
the newly added "is_cow_mapping() && pxx_write()" check in this series
(added to the pud path, while for the pmd path I used a WARN_ON_ONCE
instead).

If we don't do that, it means e.g. for a VM_SHARED pfnmap vma, after fork()
we'll wrongly observe write protected entries.  Here the change will make
sure VM_SHARED can properly persist the write bits on pmds/puds.

Hope that explains.

Thanks,

-- 
Peter Xu