Date: Mon, 9 Sep 2024 20:08:16 -0400
From: Peter Xu
To: Andrew Morton
Cc: Yan Zhao, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Gavin Shan, Catalin Marinas, x86@kernel.org, Ingo Molnar, Paolo Bonzini, Dave Hansen, Thomas Gleixner, Alistair Popple, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Sean Christopherson, Oscar Salvador, Jason Gunthorpe, Borislav Petkov, Zi Yan, Axel Rasmussen, David Hildenbrand, Will Deacon, Kefeng Wang, Alex Williamson
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries

On Mon, Sep 09, 2024 at 04:15:39PM -0700, Andrew Morton wrote:
> On Mon, 9 Sep 2024 18:43:22 -0400 Peter Xu wrote:
>
> > > > > Do we need the logic to clear dirty bit in the child as that in
> > > > > __copy_present_ptes()? (and also for the pmd's case).
> > > > >
> > > > > e.g.
> > > > > if (vma->vm_flags & VM_SHARED)
> > > > >         pud = pud_mkclean(pud);
> > > >
> > > > Yeah, good question. I remember I thought about that when initially
> > > > working on these lines, but I forgot the details, or maybe I simply tried
> > > > to stick with the current code base, as the dirty bit used to be kept even
> > > > in the child here.
> > > >
> > > > I'd expect there's only performance differences, but still sounds like I'd
> > > > better leave that to whoever knows the best on the implications, then draft
> > > > it as a separate patch but only when needed.
> > >
> > > Sorry, but this vagueness simply leaves me with nowhere to go.
> > >
> > > I'll drop the series - let's revisit after -rc1 please.
> >
> > Andrew, would you please explain why it needs to be dropped?
> >
> > I meant in the reply that I think we should leave that as is, and I think
> > so far nobody in real life should care much on this bit, so I think it's
> > fine to leave the dirty bit as-is.
> >
> > I still think whoever has a better use of the dirty bit and would like to
> > change the behavior should find the use case and work on top, but only if
> > necessary.
>
> Well. "I'd expect there's only performance differences" means to me
> "there might be correctness issues, I don't know". Is it or is it not
> merely a performance thing?

There should be no pending correctness issue. It can only be about
performance, and AFAIU this patch shouldn't change performance either: it
doesn't change how the dirty bit is processed (it behaves exactly as before
this patch), let alone correctness (with regard to dirty bits).

I can provide some more details. The question we're discussing here is
"whether we should clear the dirty bit in the child for a pgtable entry
when the mapping is VM_SHARED".
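A toy model of the policy in question may help. This is not kernel code: `struct model_entry`, `enum model_level`, `model_copy_for_child()` and `MODEL_VM_SHARED` are hypothetical stand-ins, and the only behavior modeled is what this thread describes for the dirty bit at fork (cleaned in the child for VM_SHARED at the pte level, as `__copy_present_ptes()` does via `pte_mkclean()`, and kept unchanged at the pmd/pud level). Write protection, the young bit and everything else done during the copy are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a present page-table entry; only the dirty
 * bit matters here. This is NOT the kernel's pte_t/pmd_t/pud_t. */
struct model_entry {
	bool present;
	bool dirty;
};

enum model_level { MODEL_PTE, MODEL_PMD, MODEL_PUD };

#define MODEL_VM_SHARED 0x1UL /* hypothetical stand-in for VM_SHARED */

/*
 * Dirty-bit policy applied to the child's copy at fork, as described in
 * this thread for the code before this patch:
 *   - pte:     cleaned for VM_SHARED (pte_mkclean() in __copy_present_ptes())
 *   - pmd/pud: copied unchanged
 * The question raised is whether pmd/pud should also do e.g.:
 *   if (vma->vm_flags & VM_SHARED)
 *           pud = pud_mkclean(pud);
 */
static struct model_entry model_copy_for_child(struct model_entry parent,
					       enum model_level level,
					       unsigned long vm_flags)
{
	struct model_entry child = parent;

	assert(parent.present); /* only present entries are modeled */

	if (level == MODEL_PTE && (vm_flags & MODEL_VM_SHARED))
		child.dirty = false;

	/* The parent's entry is untouched, so it keeps its dirty bit. */
	return child;
}
```

In this model, the suggestion to clean pmd/pud entries as well would amount to dropping the `level == MODEL_PTE` condition; either way only the child's copy changes, never the parent's.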
Yan observed that we don't do the same thing for pte/pmd/pud, which is
true. Before this patch:

  - For pte: we clear the dirty bit in the child on copy if VM_SHARED
  - For pmd/pud: we never clear the dirty bit in the child on copy

Clearing the dirty bit in the child for VM_SHARED at the pte level dates
back to the first commit in git history, so we have been doing it for 19
years.

That makes sense to me, because clearing the dirty bit of a pte normally
requires a SetDirty on the folio first, so that the dirty state isn't lost,
e.g. in the unmap path:

        if (pte_dirty(pteval))
                folio_mark_dirty(folio);

Hence a cleared dirty bit in the child should avoid some extra overhead
when the pte maps a file page cache folio: a clean pte at least helps us
avoid calls into e.g. the mapping's dirty_folio() function (which normally
checks folio_test_set_dirty() again anyway, and the parent pte still has
the dirty bit set, so we won't miss setting the folio dirty):

folio_mark_dirty():
        if (folio_test_reclaim(folio))
                folio_clear_reclaim(folio);
        return mapping->a_ops->dirty_folio(mapping, folio);

However, there's the other side of it: when the dirty bit is missing, I
_think_ it also means that when the child writes to the cleaned pte, the
MMU has to set the dirty bit (e.g. on archs with hardware-managed dirty
bits), which is slower than if we hadn't cleared it... and with
software-emulated dirty bits it could even require a page fault, IIUC.

In short, I personally don't know whether it's best to keep or remove the
dirty bit, even though it's safe either way: each choice has pros and cons.

That's why I said I'm not sure which way is best. I have a feeling that
most people never even noticed this, and this code has run just fine for
the past 19 years.

OTOH, we don't do the same for pmds/puds (there the child always keeps the
dirty bit), and I didn't check whether that's intended, or why. The
reasoning is probably similar to the pte discussion above, possibly with
more considerations that I overlooked.
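The pte-level trade-off can be illustrated with another toy model, again not kernel code. All `model_*` names are hypothetical; `model_mark_dirty()` only mimics the fact that folio_mark_dirty() ends up in the mapping's dirty_folio(), which does a test-and-set on the folio, and the cost counters merely count events rather than measure anything real. The model forks a dirty parent entry, optionally cleans the child's copy, lets the child write a few times, then unmaps both entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
struct model_folio {
	bool dirty;               /* the folio's dirty flag */
	unsigned int dirty_calls; /* calls into the mapping's dirty_folio() */
};

enum dirty_tracking {
	DIRTY_HW, /* MMU sets the dirty bit itself on write */
	DIRTY_SW, /* emulated dirty bit: writing a clean entry faults */
};

struct model_costs {
	unsigned int hw_dirty_sets; /* MMU had to set the bit in the child */
	unsigned int write_faults;  /* faults taken to emulate the dirty bit */
};

/* Mimics folio_mark_dirty() -> dirty_folio(): a test-and-set, so an extra
 * call on an already-dirty folio is redundant work but harmless. */
static void model_mark_dirty(struct model_folio *f)
{
	f->dirty_calls++;
	f->dirty = true;
}

/* Unmap path: transfer a dirty entry's state to the folio first. */
static void model_unmap(bool *entry_dirty, struct model_folio *f)
{
	if (*entry_dirty)
		model_mark_dirty(f);
	*entry_dirty = false;
}

/* Child write: only a clean entry needs its dirty bit set. */
static void model_child_write(bool *entry_dirty, enum dirty_tracking t,
			      struct model_costs *c)
{
	if (!*entry_dirty) {
		if (t == DIRTY_HW)
			c->hw_dirty_sets++;
		else
			c->write_faults++;
		*entry_dirty = true;
	}
}

/* Fork a dirty parent entry, optionally clean the child's copy, let the
 * child write `writes` times, then unmap both entries. */
static void model_run(bool clean_child, enum dirty_tracking t, int writes,
		      struct model_folio *f, struct model_costs *c)
{
	bool parent_dirty = true;
	bool child_dirty = !clean_child;

	for (int i = 0; i < writes; i++)
		model_child_write(&child_dirty, t, c);

	model_unmap(&child_dirty, f);
	model_unmap(&parent_dirty, f);

	/* The parent's dirty bit guarantees the folio ends up dirty. */
	assert(f->dirty);
}
```

With a clean child that never writes, only the parent's unmap calls into dirty_folio(). With a clean child that does write, the child's unmap calls it anyway, and the write additionally costs an MMU dirty-bit update or, with software-emulated dirty bits, a fault. In every case the folio ends up dirty, which matches the point that either choice is safe and only performance differs.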
So again, the safest approach here, in terms of the dirty bit, is to keep
doing what we did before. And that's what this patch does as of now.

IOW, if I need to repost, I'll repost exactly the same thing (with the
fixup I sent later, which is already in mm-unstable).

Thanks,

-- 
Peter Xu