linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Yan Zhao <yan.y.zhao@intel.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Gavin Shan <gshan@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alistair Popple <apopple@nvidia.com>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
	Zi Yan <ziy@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries
Date: Mon, 9 Sep 2024 20:08:16 -0400	[thread overview]
Message-ID: <Zt-N8MB93XSqFZO_@x1n> (raw)
In-Reply-To: <20240909161539.aa685e3eb44cdc786b8c05d2@linux-foundation.org>

On Mon, Sep 09, 2024 at 04:15:39PM -0700, Andrew Morton wrote:
> On Mon, 9 Sep 2024 18:43:22 -0400 Peter Xu <peterx@redhat.com> wrote:
> 
> > > > > Do we need the logic to clear dirty bit in the child as that in
> > > > > __copy_present_ptes()?  (and also for the pmd's case).
> > > > > 
> > > > > e.g.
> > > > > if (vma->vm_flags & VM_SHARED)
> > > > > 	pud = pud_mkclean(pud);
> > > > 
> > > > Yeah, good question.  I remember I thought about that when initially
> > > > working on these lines, but I forgot the details, or maybe I simply tried
> > > > to stick with the current code base, as the dirty bit used to be kept even
> > > > in the child here.
> > > > 
> > > > I'd expect there's only performance differences, but still sounds like I'd
> > > > better leave that to whoever knows the best on the implications, then draft
> > > > it as a separate patch but only when needed.
> > > 
> > > Sorry, but this vaguensss simply leaves me with nowhere to go.
> > > 
> > > I'll drop the series - let's revisit after -rc1 please.
> > 
> > Andrew, would you please explain why it needs to be dropped?
> > 
> > I meant in the reply that I think we should leave that as is, and I think
> > so far nobody in real life should care much on this bit, so I think it's
> > fine to leave the dirty bit as-is.
> > 
> > I still think whoever has a better use of the dirty bit and would like to
> > change the behavior should find the use case and work on top, but only if
> > necessary.
> 
> Well.  "I'd expect there's only performance differences" means to me
> "there might be correctness issues, I don't know".  Is it or is it not
> merely a performance thing?

There should have no correctness issue pending.  It can only be about
performance, and AFAIU what this patch does is exactly the way where it
shouldn't ever change performance either, as it didn't change how dirty bit
was processed (just like before this patch), not to mention correctness (in
regards to dirty bits).

I can provide some more details.

Here the question we're discussing is "whether we should clear the dirty
bit in the child for a pgtable entry when it's VM_SHARED".  Yan observed
that we don't do the same thing for pte/pmd/pud, which is true.

Before this patch:

  - For pte:     we clear dirty bit if VM_SHARED in child when copy
  - For pmd/pud: we never clear dirty bit in the child when copy

The behavior of clearing dirty bit for VM_SHARED in child for pte level
originates to the 1st commit that git history starts.  So we always do so
for 19 years.

That makes sense to me, because clearing dirty bit in pte normally requires
a SetDirty on the folio, e.g. in unmap path:

        if (pte_dirty(pteval))
                folio_mark_dirty(folio);

Hence cleared dirty bit in the child should avoid some extra overheads when
the pte maps a file cache, so clean pte can at least help us to avoid calls
into e.g. mapping's dirty_folio() functions (in which it should normally
check folio_test_set_dirty() again anyway, and parent pte still have the
dirty bit set so we won't miss setting folio dirty):

folio_mark_dirty():
        if (folio_test_reclaim(folio))
                folio_clear_reclaim(folio);
        return mapping->a_ops->dirty_folio(mapping, folio);

However there's the other side of thing where when the dirty bit is missing
I _think_ it also means when the child writes to the cleaned pte, it'll
require (e.g. on hardware accelerated archs) MMU setting dirty bit which is
slower than if we don't clear the dirty bit... and on software emulated
dirty bits it could even require a page fault, IIUC.

In short, personally I don't know what's the best to do, on keep / remove
the dirty bit even if it's safe either way: there are pros and cons on
different decisions.

That's why I said I'm not sure which is the best way.  I had a feeling that
most of the people didn't even notice this, and we kept running this code
for the past 19 years just all fine..

OTOH, we don't do the same for pmds/puds (in which case we persist dirty
bits always in child), and I didn't check whether it's intended, or why.
It'll have similar reasoning as above discussion on pte, or even more I
overlooked.

So again, the safest approach here is in terms of dirty bit we keep what we
do as before.  And that's what this patch does as of now.

IOW, if I'll need a repost, I'll repost exactly the same thing (with the
fixup I sent later, which is already in mm-unstable).

Thanks,

-- 
Peter Xu



  reply	other threads:[~2024-09-10  0:08 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-26 20:43 [PATCH v2 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-26 20:43 ` [PATCH v2 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-26 20:43 ` [PATCH v2 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-26 20:43 ` [PATCH v2 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-26 20:43 ` [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Peter Xu
2024-08-28  7:44   ` David Hildenbrand
2024-08-28 14:24     ` Peter Xu
2024-08-28 15:30       ` David Hildenbrand
2024-08-28 19:45         ` Peter Xu
2024-08-28 23:46           ` Jason Gunthorpe
2024-08-29  6:35             ` David Hildenbrand
2024-08-29 18:45               ` Peter Xu
2024-08-29 15:10           ` David Hildenbrand
2024-08-29 18:49             ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-29 15:10   ` David Hildenbrand
2024-08-29 18:26     ` Peter Xu
2024-08-29 19:44       ` David Hildenbrand
2024-08-29 20:01         ` Peter Xu
2024-09-02  7:58   ` Yan Zhao
2024-09-03 21:23     ` Peter Xu
2024-09-09 22:25       ` Andrew Morton
2024-09-09 22:43         ` Peter Xu
2024-09-09 23:15           ` Andrew Morton
2024-09-10  0:08             ` Peter Xu [this message]
2024-09-10  2:52               ` Yan Zhao
2024-09-10 12:16                 ` Peter Xu
2024-09-11  2:16                   ` Yan Zhao
2024-09-11 14:34                     ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-26 20:43 ` [PATCH v2 09/19] mm: New follow_pfnmap API Peter Xu
2024-08-26 20:43 ` [PATCH v2 10/19] KVM: Use " Peter Xu
2024-08-26 20:43 ` [PATCH v2 11/19] s390/pci_mmio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-26 20:43 ` [PATCH v2 13/19] vfio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 14/19] acrn: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 15/19] mm/access_process_vm: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 16/19] mm: Remove follow_pte() Peter Xu
2024-09-01  4:33   ` Yu Zhao
2024-09-01 13:39     ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-26 20:43 ` [PATCH v2 18/19] mm/arm64: " Peter Xu
2025-03-19 22:22   ` Keith Busch
2025-03-19 22:46     ` Peter Xu
2025-03-19 22:53       ` Keith Busch
2024-08-26 20:43 ` [PATCH v2 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-27 22:36 ` [PATCH v2 00/19] mm: Support huge pfnmaps Jiaqi Yan
2024-08-27 22:57   ` Peter Xu
2024-08-28  0:42     ` Jiaqi Yan
2024-08-28  0:46       ` Jiaqi Yan
2024-08-28 14:24       ` Jason Gunthorpe
2024-08-28 16:10         ` Jiaqi Yan
2024-08-28 23:49           ` Jason Gunthorpe
2024-08-29 19:21             ` Jiaqi Yan
2024-09-04 15:52               ` Jason Gunthorpe
2024-09-04 16:38                 ` Jiaqi Yan
2024-09-04 16:43                   ` Jason Gunthorpe
2024-09-04 16:58                     ` Jiaqi Yan
2024-09-04 17:00                       ` Jason Gunthorpe
2024-09-04 17:07                         ` Jiaqi Yan
2024-09-09  3:56                           ` Ankit Agrawal
2024-08-28 14:41       ` Peter Xu
2024-08-28 16:23         ` Jiaqi Yan
2024-09-09  4:03 ` Ankit Agrawal
2024-09-09 15:03   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zt-N8MB93XSqFZO_@x1n \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.williamson@redhat.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=gshan@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=osalvador@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=wangkefeng.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yan.y.zhao@intel.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).