From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: David Matlack <dmatlack@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	David Stevens <stevensd@chromium.org>,
	 Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed
Date: Thu, 4 Apr 2024 15:02:50 -0700
Message-ID: <Zg8jip0QIBbOCgpz@google.com>
In-Reply-To: <b3ea925f-bd47-4f54-bede-3f0d7471e3d7@redhat.com>

On Thu, Apr 04, 2024, David Hildenbrand wrote:
> On 04.04.24 19:31, Sean Christopherson wrote:
> > On Thu, Apr 04, 2024, David Hildenbrand wrote:
> > > On 04.04.24 00:19, Sean Christopherson wrote:
> > > > Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
> > > > to be invalidated before consuming dirty status.  Isn't the end result essentially
> > > > a sane FOLL_TOUCH?
> > > 
> > > Likely. As stated in my first mail, FOLL_TOUCH is a bit of a mess right now.
> > > 
> > > Having something that makes sure the writable PTE/PMD is dirty (or
> > > alternatively sets it dirty), paired with MMU notifiers notifying on any
> > > mkclean would be one option that would leave the handling of folio dirtying
> > > completely to the core. It would behave just like a CPU writing to
> > > the page table, which would set the pte dirty.
> > > 
> > > Of course, if frequent clearing of the dirty PTE/PMD bit would be a problem
> > > (like we discussed for the accessed bit), that would not be an option. But
> > > from what I recall, only clearing the PTE/PMD dirty bit is rather rare.
> > 
> > And AFAICT, all cases already invalidate secondary MMUs anyways, so if anything
> > it would probably be a net positive, e.g. the notification could more precisely
> > say that SPTEs need to be read-only, not blasted away completely.
> 
> As discussed, I think at least madvise_free_pte_range() wouldn't do that.

I'm getting a bit turned around.  Are you talking about what madvise_free_pte_range()
would do in this future world, or what madvise_free_pte_range() does today?  Because
today, unless I'm really misreading the code, secondary MMUs are invalidated before
the dirty bit is cleared.

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
				range.start, range.end);

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	update_hiwater_rss(mm);

	mmu_notifier_invalidate_range_start(&range);
	tlb_start_vma(&tlb, vma);
	walk_page_range(vma->vm_mm, range.start, range.end,
			&madvise_free_walk_ops, &tlb);
	tlb_end_vma(&tlb, vma);
	mmu_notifier_invalidate_range_end(&range);

KVM (or any other secondary MMU) can re-establish the mapping with W=1,D=0 in the
PTE, but the costly invalidation (zap+flush+fault) still happens.
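
To make the ordering concrete, here is a toy userspace model of the sequence
described above. It is a sketch only, not kernel code: every toy_* name is
hypothetical, one primary PTE is shadowed by one SPTE, and the counters merely
tally the zap+flush+fault cost. It deliberately does not model how or when the
folio is dirtied on the re-fault, since that is the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one primary PTE and the single SPTE that shadows it. */
struct toy_pte  { bool writable, dirty; };
struct toy_spte { bool present, writable; };

struct toy_mm {
	struct toy_pte pte;
	struct toy_spte spte;
	unsigned int zaps, flushes, faults;	/* cost counters */
};

/* Stands in for invalidate_range_start(): drop the SPTE and flush. */
static void toy_invalidate_start(struct toy_mm *mm)
{
	if (mm->spte.present) {
		mm->spte.present = false;
		mm->spte.writable = false;
		mm->zaps++;
		mm->flushes++;
	}
}

/* Stands in for the MADV_FREE walk: clear only D, leave W=1. */
static void toy_madvise_free(struct toy_mm *mm)
{
	toy_invalidate_start(mm);
	/* Ordering invariant: no writable SPTE exists while D is cleared. */
	assert(!(mm->spte.present && mm->spte.writable));
	mm->pte.dirty = false;
}

/* Stands in for a guest access after the invalidation: fault and re-map. */
static void toy_guest_access(struct toy_mm *mm)
{
	if (!mm->spte.present) {
		mm->faults++;
		mm->spte.present = true;
		mm->spte.writable = mm->pte.writable;
		/* pte.dirty is intentionally left alone; see the note above. */
	}
}
```

Starting from a writable, dirty PTE with a writable SPTE, calling
toy_madvise_free() and then toy_guest_access() leaves the PTE at W=1,D=0 and the
SPTE mapped again, with one zap, one flush and one fault charged along the way.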

> Notifiers would only get called later when actually zapping the folio.

And in case we're talking about a hypothetical future, I was thinking the above
could do MMU_NOTIFY_WRITE_PROTECT instead of MMU_NOTIFY_CLEAR.
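
As a rough illustration of that idea, the toy model below contrasts how a
secondary MMU might react to the two event types. It is a userspace sketch under
stated assumptions: TOY_NOTIFY_WRITE_PROTECT stands in for the hypothetical
MMU_NOTIFY_WRITE_PROTECT event mentioned above, and all toy_* names are
invented for the example rather than taken from KVM.

```c
#include <stdbool.h>

/*
 * TOY_NOTIFY_WRITE_PROTECT is hypothetical; it mirrors the event floated in
 * this mail and is not claimed to be an existing kernel event.
 */
enum toy_event { TOY_NOTIFY_CLEAR, TOY_NOTIFY_WRITE_PROTECT };

struct toy_spte  { bool present, writable; };
struct toy_stats { unsigned int zaps, wrprots; };

/* Secondary-MMU reaction to an invalidation of the given type. */
static void toy_invalidate(struct toy_spte *spte, enum toy_event ev,
			   struct toy_stats *st)
{
	if (!spte->present)
		return;

	switch (ev) {
	case TOY_NOTIFY_WRITE_PROTECT:
		/* Precise: keep the mapping, only remove write access. */
		if (spte->writable) {
			spte->writable = false;
			st->wrprots++;
		}
		break;
	case TOY_NOTIFY_CLEAR:
	default:
		/* Conservative: blast the SPTE away completely. */
		spte->present = false;
		spte->writable = false;
		st->zaps++;
		break;
	}
}

/* Returns true if the next access of the given kind would fault. */
static bool toy_access_faults(const struct toy_spte *spte, bool is_write)
{
	if (!spte->present)
		return true;
	return is_write && !spte->writable;
}
```

In this model a read after TOY_NOTIFY_WRITE_PROTECT still hits the SPTE and only
a write faults, whereas after TOY_NOTIFY_CLEAR every access faults.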

> So at least for some time, you would have the PTE not dirty, but the SPTE
> writable or even dirty. So you'd have to set the page dirty when zapping the
> SPTE ...  and IMHO that is what we should maybe try to avoid :)

