From: Sean Christopherson <seanjc@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: David Matlack <dmatlack@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
David Stevens <stevensd@chromium.org>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed
Date: Wed, 3 Apr 2024 15:19:36 -0700 [thread overview]
Message-ID: <Zg3V-M3iospVUEDU@google.com> (raw)
In-Reply-To: <ca1f320b-dc06-48e0-b4f5-ce860a72f0e2@redhat.com>
On Wed, Apr 03, 2024, David Hildenbrand wrote:
> On 03.04.24 02:17, Sean Christopherson wrote:
> > On Tue, Apr 02, 2024, David Hildenbrand wrote:
> > Aha! But try_to_unmap_one() also checks that refcount==mapcount+1, i.e. will
> > also keep the folio if it has been GUP'd. And __remove_mapping() explicitly states
> > that it needs to play nice with a GUP'd page being marked dirty before the
> > reference is dropped.
>
> >
> > * Must be careful with the order of the tests. When someone has
> > * a ref to the folio, it may be possible that they dirty it then
> > * drop the reference. So if the dirty flag is tested before the
> > * refcount here, then the following race may occur:
> >
> > So while it's totally possible for KVM to get a W=1,D=0 PTE, if I'm reading the
> > code correctly it's safe/legal so long as KVM either (a) marks the folio dirty
> > while holding a reference or (b) marks the folio dirty before returning from its
> > mmu_notifier_invalidate_range_start() hook, *AND* obviously if KVM drops its
> > mappings in response to mmu_notifier_invalidate_range_start().
> >
>
> Yes, I agree that it should work in the context of vmscan. But (b) is
> certainly a bit harder to swallow than "ordinary" (a) :)
Heh, all the more reason to switch KVM x86 from (b) => (a).
> As raised, if having a writable SPTE would imply having a writable+dirty
> PTE, then KVM MMU code wouldn't have to worry about syncing any dirty bits
> ever back to core-mm, so patch #2 would not be required. ... well, it would
> be replaces by an MMU notifier that notifies about clearing the PTE dirty
> bit :)
Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
to be invalidated before consuming dirty status. Isn't the end result essentially
a sane FOLL_TOUCH?
> ... because, then, there is also a subtle difference between
> folio_set_dirty() and folio_mark_dirty(), and I am still confused about the
> difference and not competent enough to explain the difference ... and KVM
> always does the former, while zapping code of pagecache folios does the
> latter ... hm
Ugh, just when I thought I finally had my head wrapped around this.
> Related note: IIRC, we usually expect most anon folios to be dirty.
>
> kvm_set_pfn_dirty()->kvm_set_page_dirty() does an unconditional
> SetPageDirty()->folio_set_dirty(). Doing a test-before-set might frequently
> avoid atomic ops.
Noted, definitely worth poking at.
next prev parent reply other threads:[~2024-04-03 22:19 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-20 0:50 [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed Sean Christopherson
2024-03-20 0:50 ` [RFC PATCH 1/4] KVM: x86/mmu: Skip the "try unsync" path iff the old SPTE was a leaf SPTE Sean Christopherson
2024-03-20 0:50 ` [RFC PATCH 2/4] KVM: x86/mmu: Mark folio dirty when creating SPTE, not when zapping/modifying Sean Christopherson
2024-03-20 0:50 ` [RFC PATCH 3/4] KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs Sean Christopherson
2024-03-20 0:50 ` [RFC PATCH 4/4] KVM: x86/mmu: Don't force flush if SPTE update clears Accessed bit Sean Christopherson
2024-03-20 12:56 ` [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed David Hildenbrand
2024-04-02 17:38 ` David Matlack
2024-04-02 18:31 ` David Hildenbrand
2024-04-03 0:17 ` Sean Christopherson
2024-04-03 21:43 ` David Hildenbrand
2024-04-03 22:19 ` Sean Christopherson [this message]
2024-04-04 15:44 ` David Hildenbrand
2024-04-04 17:31 ` Sean Christopherson
2024-04-04 18:23 ` David Hildenbrand
2024-04-04 22:02 ` Sean Christopherson
2024-04-05 6:53 ` David Hildenbrand
2024-04-05 9:37 ` Paolo Bonzini
2024-04-05 10:14 ` David Hildenbrand
2024-04-05 13:59 ` Sean Christopherson
2024-04-05 14:06 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zg3V-M3iospVUEDU@google.com \
--to=seanjc@google.com \
--cc=david@redhat.com \
--cc=dmatlack@google.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=stevensd@chromium.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.