From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: David Matlack <dmatlack@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
David Stevens <stevensd@chromium.org>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed
Date: Thu, 4 Apr 2024 15:02:50 -0700
Message-ID: <Zg8jip0QIBbOCgpz@google.com>
In-Reply-To: <b3ea925f-bd47-4f54-bede-3f0d7471e3d7@redhat.com>

On Thu, Apr 04, 2024, David Hildenbrand wrote:
> On 04.04.24 19:31, Sean Christopherson wrote:
> > On Thu, Apr 04, 2024, David Hildenbrand wrote:
> > > On 04.04.24 00:19, Sean Christopherson wrote:
> > > > Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
> > > > to be invalidated before consuming dirty status. Isn't the end result essentially
> > > > a sane FOLL_TOUCH?
> > >
> > > Likely. As stated in my first mail, FOLL_TOUCH is a bit of a mess right now.
> > >
> > > Having something that makes sure the writable PTE/PMD is dirty (or
> > > alternatively sets it dirty), paired with MMU notifiers notifying on any
> > > mkclean would be one option that would leave handling how to handle dirtying
> > > of folios completely to the core. It would behave just like a CPU writing to
> > > the page table, which would set the pte dirty.
> > >
> > > Of course, if frequent clearing of the dirty PTE/PMD bit would be a problem
> > > (like we discussed for the accessed bit), that would not be an option. But
> > > from what I recall, only clearing the PTE/PMD dirty bit is rather rare.
> >
> > And AFAICT, all cases already invalidate secondary MMUs anyways, so if anything
> > it would probably be a net positive, e.g. the notification could more precisely
> > say that SPTEs need to be read-only, not blasted away completely.
>
> As discussed, I think at least madvise_free_pte_range() wouldn't do that.

I'm getting a bit turned around.  Are you talking about what madvise_free_pte_range()
would do in this future world, or what madvise_free_pte_range() does today?  Because
today, unless I'm really misreading the code, secondary MMUs are invalidated before
the dirty bit is cleared.

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
				range.start, range.end);

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	update_hiwater_rss(mm);

	mmu_notifier_invalidate_range_start(&range);
	tlb_start_vma(&tlb, vma);
	walk_page_range(vma->vm_mm, range.start, range.end,
			&madvise_free_walk_ops, &tlb);
	tlb_end_vma(&tlb, vma);
	mmu_notifier_invalidate_range_end(&range);

KVM (or any other secondary MMU) can re-establish the mapping with W=1,D=0 in the
PTE, but the costly invalidation (zap+flush+fault) still happens.
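
To make the zap+flush+fault cost concrete, here is a toy userspace model.  It is
not kernel code: every type and function name below is made up for illustration,
and it only assumes that a CLEAR-style invalidation drops the secondary mapping
and that the next access has to fault it back in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy secondary-MMU entry; hypothetical, not KVM's real SPTE layout. */
struct toy_spte {
	bool present;
	bool writable;
	bool dirty;
};

/* Counters for the three costs named above. */
struct toy_stats {
	unsigned int zaps;
	unsigned int flushes;
	unsigned int faults;
};

/* Models a CLEAR-style invalidation: the mapping is dropped and flushed. */
static void toy_invalidate_clear(struct toy_spte *spte, struct toy_stats *st)
{
	if (!spte->present)
		return;

	spte->present = false;
	spte->writable = false;
	spte->dirty = false;
	st->zaps++;
	st->flushes++;
}

/* Models an access from the secondary MMU, faulting the mapping back in. */
static void toy_access(struct toy_spte *spte, struct toy_stats *st, bool write)
{
	if (!spte->present) {
		st->faults++;
		spte->present = true;
		spte->writable = true;	/* re-established with W=1 ... */
		spte->dirty = false;	/* ... and D=0 */
	}

	if (write) {
		assert(spte->writable);
		spte->dirty = true;
	}
}
```

Calling toy_invalidate_clear() and then toy_access() with write=false on a
present entry bumps all three counters once, even though the entry ends up
writable again.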
> Notifiers would only get called later when actually zapping the folio.

And in case we're talking about a hypothetical future, I was thinking the above
could do MMU_NOTIFY_WRITE_PROTECT instead of MMU_NOTIFY_CLEAR.

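
A rough userspace sketch of the difference, again as a toy model rather than
kernel code.  The event names and helpers are hypothetical, and the
write-protect behavior shown is only an assumption about how such an event
could be handled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event types for this toy model only. */
enum wp_event {
	WP_EVENT_CLEAR,
	WP_EVENT_WRITE_PROTECT,
};

/* Toy secondary-MMU entry; hypothetical, not KVM's real SPTE layout. */
struct wp_spte {
	bool present;
	bool writable;
};

/* Returns true if the entry was zapped, false if it was kept. */
static bool wp_handle_event(struct wp_spte *spte, enum wp_event ev)
{
	assert(spte);

	if (!spte->present)
		return false;

	if (ev == WP_EVENT_WRITE_PROTECT) {
		/* Keep the translation, just make it read-only. */
		spte->writable = false;
		return false;
	}

	/* CLEAR: blast the mapping away completely. */
	spte->present = false;
	spte->writable = false;
	return true;
}

/* Returns true if the given access would fault on this entry. */
static bool wp_access_faults(const struct wp_spte *spte, bool write)
{
	if (!spte->present)
		return true;

	return write && !spte->writable;
}
```

In this model, reads keep hitting a write-protected entry without faulting and
only the next write faults, whereas after a CLEAR every access faults.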
> So at least for some time, you would have the PTE not dirty, but the SPTE
> writable or even dirty. So you'd have to set the page dirty when zapping the
> SPTE ... and IMHO that is what we should maybe try to avoid :)