From: Catalin Marinas <catalin.marinas@arm.com>
To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Cc: Oliver Upton <oliver.upton@linux.dev>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"maz@kernel.org" <maz@kernel.org>,
"will@kernel.org" <will@kernel.org>,
"james.morse@arm.com" <james.morse@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
yuzenghui <yuzenghui@huawei.com>,
zhukeqian <zhukeqian1@huawei.com>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Linuxarm <linuxarm@huawei.com>
Subject: Re: [RFC PATCH v2 3/8] KVM: arm64: Add some HW_DBM related pgtable interfaces
Date: Tue, 26 Sep 2023 16:20:03 +0100
Message-ID: <ZRL2owYDvKF6gnlb@arm.com>
In-Reply-To: <c4e12638b4874dc4809d24ce131d7b07@huawei.com>
On Mon, Sep 25, 2023 at 08:04:39AM +0000, Shameerali Kolothum Thodi wrote:
> From: Oliver Upton [mailto:oliver.upton@linux.dev]
> > On Fri, Sep 22, 2023 at 04:24:11PM +0100, Catalin Marinas wrote:
> > > I was wondering if this interferes with the OS dirty tracking (not the
> > > KVM one) but I think that's ok, at least at this point, since the PTE is
> > > already writeable and a fault would have marked the underlying page as
> > > dirty (user_mem_abort() -> kvm_set_pfn_dirty()).
> > >
> > > I'm not particularly fond of relying on this but I need to see how it
> > > fits with the rest of the series. IIRC KVM doesn't go around and make
> > > Stage 2 PTEs read-only but rather unmaps them when it changes the
> > > permission of the corresponding Stage 1 VMM mapping.
> > >
> > > My personal preference would be to track dirty/clean properly as we do
> > > for stage 1 (e.g. DBM means writeable PTE) but it has some downsides
> > > like the try_to_unmap() code having to retrieve the dirty state via
> > > notifiers.
> >
> > KVM's usage of DBM is complicated by the fact that the dirty log
> > interface w/ userspace is at PTE granularity. We only want the page
> > table walker to relax PTEs, but take faults on hugepages so we can do
> > page splitting.
Thanks for the clarification.
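A minimal sketch of the stage-2 DBM behaviour Oliver describes, assuming a simplified model of the descriptor: S2AP[1] (write permission) at bit 7 and DBM at bit 51, as in the Arm stage-2 descriptor layout. The macro and function names are hypothetical, not KVM's. Only last-level page entries (level 3, i.e. KVM_PGTABLE_MAX_LEVELS - 1 with 4-level tables) are relaxed to "writeable-clean" (DBM set, S2AP[1] clear), so a guest write is absorbed by the hardware setting S2AP[1]. Block entries are only write-protected, so the first write faults and KVM can split the hugepage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; names are hypothetical. */
#define S2_VALID   (UINT64_C(1) << 0)
#define S2AP_W     (UINT64_C(1) << 7)   /* S2AP[1]: write permission */
#define S2_DBM     (UINT64_C(1) << 51)  /* Dirty Bit Modifier */

#define LAST_LEVEL 3                    /* page-level entries, 4-level tables */

/*
 * Prepare a leaf (page or block) entry for dirty logging. Page entries get
 * DBM so hardware can track writes; block entries are plainly
 * write-protected so the first write faults and the block can be split.
 */
static uint64_t relax_for_dirty_log(uint64_t pte, int level)
{
	if (level != LAST_LEVEL)
		return pte & ~S2AP_W;            /* write-protect only */
	return (pte & ~S2AP_W) | S2_DBM;     /* writeable-clean */
}

/*
 * Model of a guest write, assuming hardware dirty-state management is
 * enabled for stage 2. Returns true if the write completes without a fault.
 */
static bool hw_write(uint64_t *pte)
{
	if (!(*pte & S2_VALID))
		return false;                    /* translation fault */
	if (*pte & S2AP_W)
		return true;                     /* already writeable */
	if (*pte & S2_DBM) {
		*pte |= S2AP_W;                  /* hardware marks the entry dirty */
		return true;
	}
	return false;                        /* permission fault */
}

/* Hardware-tracked dirty state: DBM set and write permission granted. */
static bool pte_dirty(uint64_t pte)
{
	return (pte & S2_DBM) && (pte & S2AP_W);
}
```

In this model, a writeable page entry relaxed at LAST_LEVEL completes a later hw_write() without faulting and then reads back as dirty, while a block entry relaxed at level 2 takes a permission fault instead.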
> > > > @@ -952,6 +990,11 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> > > > stage2_pte_executable(new))
> > > > mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> > > >
> > > > + /* Save the possible hardware dirty info */
> > > > + if ((ctx->level == KVM_PGTABLE_MAX_LEVELS - 1) &&
> > > > + stage2_pte_writeable(ctx->old))
> > > > + mark_page_dirty(kvm_s2_mmu_to_kvm(pgt->mmu), ctx->addr >> PAGE_SHIFT);
> > > > +
> > > > stage2_make_pte(ctx, new);
> > >
> > > Isn't this racy and potentially losing the dirty state? Or is the 'new'
> > > value guaranteed to have the S2AP[1] bit? For stage 1 we normally make
> > > the page genuinely read-only (clearing DBM) in a cmpxchg loop to
> > > preserve the dirty state (see ptep_set_wrprotect()).
> >
> > stage2_try_break_pte() a few lines up does a cmpxchg() and full
> > break-before-make, so at this point there shouldn't be a race with
> > either software or hardware table walkers.
Ah, I missed this. Also it was unrelated to this patch (or rather not
introduced by this patch).
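The two ways of avoiding a lost hardware dirty update discussed above can be contrasted in a small C11 model. It assumes the same simplified bit layout (S2AP[1] at bit 7, DBM at bit 51), and all helper names are hypothetical. wrprotect_preserve_dirty() follows the stage-1 pattern Catalin cites from ptep_set_wrprotect(): a compare-and-swap loop that retries if hardware sets the write bit between the load and the store. break_pte() is loosely modelled on Oliver's description of stage2_try_break_pte(): once the entry has been swapped to invalid, no software or hardware walker can update it, so the returned old value is final. TLB maintenance is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; names are hypothetical. */
#define S2_VALID (UINT64_C(1) << 0)
#define S2AP_W   (UINT64_C(1) << 7)   /* S2AP[1]: write permission */
#define S2_DBM   (UINT64_C(1) << 51)  /* Dirty Bit Modifier */

/*
 * Make the entry genuinely read-only (clear DBM and S2AP[1]) in one atomic
 * update. If hardware sets S2AP[1] after our load, the CAS fails, 'old' is
 * reloaded with the new value and we retry, so the dirty state is observed.
 * Returns true if the entry was dirty when it became read-only.
 */
static bool wrprotect_preserve_dirty(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);
	uint64_t new;

	do {
		new = old & ~(S2_DBM | S2AP_W);
	} while (!atomic_compare_exchange_weak(ptep, &old, new));

	return (old & S2_DBM) && (old & S2AP_W);
}

/*
 * "Break" step of break-before-make: replace the expected entry with an
 * invalid one. On success, '*old_out' holds the final old value, including
 * any hardware dirty update, because an invalid entry cannot be updated.
 * A real implementation would follow this with TLB invalidation.
 */
static bool break_pte(_Atomic uint64_t *ptep, uint64_t expected,
		      uint64_t *old_out)
{
	if (!atomic_compare_exchange_strong(ptep, &expected, 0))
		return false;    /* entry changed under us; caller retries */
	*old_out = expected;
	return true;
}

/* "Make" step: install the new entry once the old one is gone. */
static void make_pte(_Atomic uint64_t *ptep, uint64_t new)
{
	atomic_store(ptep, new);
}
```

The CAS loop suits in-place permission changes on live entries. Break-before-make costs an invalid window and TLB maintenance, but it rules out concurrent hardware updates entirely, which is why the mark_page_dirty() in the quoted hunk does not race.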
> > In both cases the 'old' translation should have DBM cleared. Even if the
> > PTE were dirty, this is wasted work since we need to do a final scan of
> > the stage-2 when userspace collects the dirty log.
> >
> > Am I missing something?
>
> I think we can get rid of the above mark_page_dirty(). I will test it to confirm
> we are not missing anything here.
Is this the case for the other places of mark_page_dirty() in your
patches? If stage2_pte_writeable() is true, it must have been made
writeable earlier by a fault and the underlying page marked as dirty.
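Oliver's point that the stage 2 is scanned again when userspace collects the dirty log, combined with Catalin's point that an entry made writeable by a fault was already marked dirty, can be illustrated with a hypothetical harvesting loop. Names and bit positions (S2AP[1] at bit 7, DBM at bit 51) are assumptions, and the entries are modelled as a flat array indexed by guest page number. An entry counts as hardware-dirty only when both DBM and S2AP[1] are set. It is recorded in a bitmap and re-armed as writeable-clean with a compare-and-swap. Entries that are writeable without DBM are skipped, on the assumption that user_mem_abort() already marked them dirty when the fault made them writeable. A real implementation would also need TLB invalidation after clearing S2AP[1].

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; names are hypothetical. */
#define S2AP_W        (UINT64_C(1) << 7)   /* S2AP[1]: write permission */
#define S2_DBM        (UINT64_C(1) << 51)  /* Dirty Bit Modifier */
#define BITS_PER_WORD 64

/*
 * Harvest hardware dirty state from 'n' last-level entries, where
 * ptes[gfn] maps guest page 'gfn'. Dirty entries are reset to
 * writeable-clean (S2AP[1] cleared, DBM kept) and recorded in 'bitmap'.
 * Returns the number of pages found dirty.
 */
static size_t harvest_dirty(_Atomic uint64_t *ptes, size_t n, uint64_t *bitmap)
{
	size_t count = 0;

	for (size_t gfn = 0; gfn < n; gfn++) {
		uint64_t old = atomic_load(&ptes[gfn]);

		/* Retry if hardware (or another walker) changes the entry. */
		do {
			if (!(old & S2_DBM) || !(old & S2AP_W))
				break;    /* not hardware-managed, or still clean */
		} while (!atomic_compare_exchange_weak(&ptes[gfn], &old,
						       old & ~S2AP_W));

		/* Only a successful CAS leaves a dirty value in 'old'. */
		if ((old & S2_DBM) && (old & S2AP_W)) {
			bitmap[gfn / BITS_PER_WORD] |=
				UINT64_C(1) << (gfn % BITS_PER_WORD);
			count++;
		}
	}
	return count;
}
```

For example, given three entries {DBM, DBM | S2AP[1], S2AP[1]}, only the second is reported, and it is reset to DBM alone. The third stays writeable and is not reported, which is only correct if its page was marked dirty at fault time.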
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Thread overview:
2023-08-25 9:35 [RFC PATCH v2 0/8] KVM: arm64: Implement SW/HW combined dirty log Shameer Kolothum
2023-08-25 9:35 ` [RFC PATCH v2 1/8] arm64: cpufeature: Add API to report system support of HWDBM Shameer Kolothum
2023-08-25 9:35 ` [RFC PATCH v2 2/8] KVM: arm64: Add KVM_PGTABLE_WALK_HW_DBM for HW DBM support Shameer Kolothum
2023-09-15 22:05 ` Oliver Upton
2023-09-18 9:52 ` Shameerali Kolothum Thodi
2023-08-25 9:35 ` [RFC PATCH v2 3/8] KVM: arm64: Add some HW_DBM related pgtable interfaces Shameer Kolothum
2023-09-15 22:22 ` Oliver Upton
2023-09-18 9:53 ` Shameerali Kolothum Thodi
2023-09-22 15:24 ` Catalin Marinas
2023-09-22 17:49 ` Oliver Upton
2023-09-25 8:04 ` Shameerali Kolothum Thodi
2023-09-26 15:20 ` Catalin Marinas [this message]
2023-09-26 15:52 ` Shameerali Kolothum Thodi
2023-09-26 16:37 ` Catalin Marinas
2023-08-25 9:35 ` [RFC PATCH v2 4/8] KVM: arm64: Set DBM for previously writeable pages Shameer Kolothum
2023-09-15 22:54 ` Oliver Upton
2023-09-18 9:54 ` Shameerali Kolothum Thodi
2023-09-22 15:40 ` Catalin Marinas
2023-09-25 8:04 ` Shameerali Kolothum Thodi
2023-08-25 9:35 ` [RFC PATCH v2 5/8] KVM: arm64: Add some HW_DBM related mmu interfaces Shameer Kolothum
2023-08-25 9:35 ` [RFC PATCH v2 6/8] KVM: arm64: Only write protect selected PTE Shameer Kolothum
2023-09-22 16:00 ` Catalin Marinas
2023-09-22 16:59 ` Oliver Upton
2023-09-26 15:58 ` Catalin Marinas
2023-09-26 16:10 ` Catalin Marinas
2023-08-25 9:35 ` [RFC PATCH v2 7/8] KVM: arm64: Add KVM_CAP_ARM_HW_DBM Shameer Kolothum
2023-08-25 9:35 ` [RFC PATCH v2 8/8] KVM: arm64: Start up SW/HW combined dirty log Shameer Kolothum
2023-09-13 17:30 ` [RFC PATCH v2 0/8] KVM: arm64: Implement " Oliver Upton
2023-09-14 9:47 ` Shameerali Kolothum Thodi
2023-09-15 0:36 ` Oliver Upton
2023-09-18 9:55 ` Shameerali Kolothum Thodi
2023-09-20 21:12 ` Oliver Upton
2023-10-12 7:51 ` Shameerali Kolothum Thodi