All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Will Deacon <will@kernel.org>
Cc: jiangkunkun@huawei.com, Gavin Shan <gshan@redhat.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	wangjingyi11@huawei.com, Quentin Perret <qperret@google.com>,
	lushenming@huawei.com, linux-kernel@vger.kernel.org,
	"wangyanan \(Y\)" <wangyanan55@huawei.com>,
	yezengruan@huawei.com, James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org, yuzenghui@huawei.com,
	wanghaibin.wang@huawei.com, zhukeqian1@huawei.com,
	Julien Thierry <julien.thierry.kdev@gmail.com>
Subject: Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
Date: Tue, 01 Dec 2020 14:05:03 +0000	[thread overview]
Message-ID: <03ab1bdd8459831ad05993807e39e5bd@kernel.org> (raw)
In-Reply-To: <20201201134606.GB26973@willie-the-truck>

On 2020-12-01 13:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>> > On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>> > > On 2020/11/30 21:34, Will Deacon wrote:
>> > > > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>> > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > index 696b6aa83faf..fec8dc9f2baa 100644
>> > > > > --- a/arch/arm64/kvm/hyp/pgtable.c
>> > > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>> > > > >    	return 0;
>> > > > >    }
>> > > > > +static void stage2_flush_dcache(void *addr, u64 size);
>> > > > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
>> > > > > +
>> > > > >    static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    				struct stage2_map_data *data)
>> > > > >    {
>> > > > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    	struct page *page = virt_to_page(ptep);
>> > > > >    	if (data->anchor) {
>> > > > > -		if (kvm_pte_valid(pte))
>> > > > > +		if (kvm_pte_valid(pte)) {
>> > > > > +			kvm_set_invalid_pte(ptep);
>> > > > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>> > > > > +				     addr, level);
>> > > > >    			put_page(page);
>> > > > This doesn't make sense to me: the page-table pages we're walking when the
>> > > > anchor is set are not accessible to the hardware walker because we unhooked
>> > > > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>> > > > TLB invalidation.
>> > > >
>> > > > Are you seeing a problem in practice here?
>> > > Yes, I indeed find a problem in practice.
>> > >
>> > > When the migration was cancelled, a TLB conflic abort  was found in guest.
>> > >
>> > > This problem is fixed before rework of the page table code, you can have a
>> > > look in the following two links:
>> > >
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>> > >
>> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>> > Ok, let's go through this, because I still don't see the bug. Please correct
>> > me if you spot any mistakes:
>> >
>> >    1. We have a block mapping for X => Y
>> >    2. Dirty logging is enabled, so the block mapping is write-protected and
>> >       ends up being split into page mappings
>> >    3. Dirty logging is disabled due to a failed migration.
>> >
>> > --- At this point, I think we agree that the state of the MMU is alright ---
>> >
>> >    4. We take a stage-2 fault and want to reinstall the block mapping:
>> >
>> >       a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>> >       b. stage2_map_walk_table_pre() finds a table where we would like to
>> >          install the block:
>> >
>> > 	i.   The anchor is set to point at this entry
>> > 	ii.  The entry is made invalid
>> > 	iii. We invalidate the TLB for the input address. This is
>> > 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>> >
>> > 	*** At this point, the page-table pointed to by the old table entry
>> > 	    is not reachable to the hardware walker ***
>> >
>> >       c. stage2_map_walk_leaf() is called for each leaf entry in the
>> >          now-unreachable subtree, dropping page-references for each valid
>> > 	entry it finds.
>> >       d. stage2_map_walk_table_post() is eventually called for the entry
>> >          which we cleared back in b.ii, so we install the new block mapping.
>> >
>> > You are proposing to add additional TLB invalidation to (c), but I don't
>> > think that is necessary, thanks to the invalidation already performed in
>> > b.iii. What am I missing here?
>> 
>> The point is at b.iii where the TLBI is not enough. There are many 
>> page
>> mappings that we need to merge into a block mapping.
>> 
>> We invalidate the TLB for the input address without level hint at 
>> b.iii, but
>> this operation just flush TLB for one page mapping, there
>> 
>> are still some TLB entries for the other page mappings in the cache, 
>> the MMU
>> hardware walker can still hit these entries next time.
> 
> Ah, yes, I see. Thanks. I hadn't considered the case where there are 
> table
> entries beneath the anchor. So how about the diff below?
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c 
> b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64
> end, u32 level,
>  		return 0;
> 
>  	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>  	data->anchor = ptep;
>  	return 0;
>  }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr,
> u64 end, u32 level,
>  				      struct stage2_map_data *data)
>  {
>  	int ret = 0;
> +	kvm_pte_t pte = *ptep;
> 
>  	if (!data->anchor)
>  		return 0;
> 
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);

That's a big hammer, and we so far have been pretty careful not to
over-invalidate. Is the block-replacing-table *without* an unmap
in between the only case where this triggers?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Will Deacon <will@kernel.org>
Cc: "wangyanan (Y)" <wangyanan55@huawei.com>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Gavin Shan <gshan@redhat.com>,
	Quentin Perret <qperret@google.com>,
	wanghaibin.wang@huawei.com, yezengruan@huawei.com,
	zhukeqian1@huawei.com, yuzenghui@huawei.com,
	jiangkunkun@huawei.com, wangjingyi11@huawei.com,
	lushenming@huawei.com
Subject: Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry
Date: Tue, 01 Dec 2020 14:05:03 +0000	[thread overview]
Message-ID: <03ab1bdd8459831ad05993807e39e5bd@kernel.org> (raw)
In-Reply-To: <20201201134606.GB26973@willie-the-truck>

On 2020-12-01 13:46, Will Deacon wrote:
> On Tue, Dec 01, 2020 at 10:30:41AM +0800, wangyanan (Y) wrote:
>> On 2020/12/1 0:01, Will Deacon wrote:
>> > On Mon, Nov 30, 2020 at 11:24:19PM +0800, wangyanan (Y) wrote:
>> > > On 2020/11/30 21:34, Will Deacon wrote:
>> > > > On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>> > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > index 696b6aa83faf..fec8dc9f2baa 100644
>> > > > > --- a/arch/arm64/kvm/hyp/pgtable.c
>> > > > > +++ b/arch/arm64/kvm/hyp/pgtable.c
>> > > > > @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>> > > > >    	return 0;
>> > > > >    }
>> > > > > +static void stage2_flush_dcache(void *addr, u64 size);
>> > > > > +static bool stage2_pte_cacheable(kvm_pte_t pte);
>> > > > > +
>> > > > >    static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    				struct stage2_map_data *data)
>> > > > >    {
>> > > > > @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> > > > >    	struct page *page = virt_to_page(ptep);
>> > > > >    	if (data->anchor) {
>> > > > > -		if (kvm_pte_valid(pte))
>> > > > > +		if (kvm_pte_valid(pte)) {
>> > > > > +			kvm_set_invalid_pte(ptep);
>> > > > > +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>> > > > > +				     addr, level);
>> > > > >    			put_page(page);
>> > > > This doesn't make sense to me: the page-table pages we're walking when the
>> > > > anchor is set are not accessible to the hardware walker because we unhooked
>> > > > the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
>> > > > TLB invalidation.
>> > > >
>> > > > Are you seeing a problem in practice here?
>> > > Yes, I indeed find a problem in practice.
>> > >
>> > > When the migration was cancelled, a TLB conflic abort  was found in guest.
>> > >
>> > > This problem is fixed before rework of the page table code, you can have a
>> > > look in the following two links:
>> > >
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277
>> > >
>> > > https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html
>> > Ok, let's go through this, because I still don't see the bug. Please correct
>> > me if you spot any mistakes:
>> >
>> >    1. We have a block mapping for X => Y
>> >    2. Dirty logging is enabled, so the block mapping is write-protected and
>> >       ends up being split into page mappings
>> >    3. Dirty logging is disabled due to a failed migration.
>> >
>> > --- At this point, I think we agree that the state of the MMU is alright ---
>> >
>> >    4. We take a stage-2 fault and want to reinstall the block mapping:
>> >
>> >       a. kvm_pgtable_stage2_map() is invoked to install the block mapping
>> >       b. stage2_map_walk_table_pre() finds a table where we would like to
>> >          install the block:
>> >
>> > 	i.   The anchor is set to point at this entry
>> > 	ii.  The entry is made invalid
>> > 	iii. We invalidate the TLB for the input address. This is
>> > 	     TLBI IPAS2SE1IS without level hint and then TLBI VMALLE1IS.
>> >
>> > 	*** At this point, the page-table pointed to by the old table entry
>> > 	    is not reachable to the hardware walker ***
>> >
>> >       c. stage2_map_walk_leaf() is called for each leaf entry in the
>> >          now-unreachable subtree, dropping page-references for each valid
>> > 	entry it finds.
>> >       d. stage2_map_walk_table_post() is eventually called for the entry
>> >          which we cleared back in b.ii, so we install the new block mapping.
>> >
>> > You are proposing to add additional TLB invalidation to (c), but I don't
>> > think that is necessary, thanks to the invalidation already performed in
>> > b.iii. What am I missing here?
>> 
>> The point is at b.iii where the TLBI is not enough. There are many 
>> page
>> mappings that we need to merge into a block mapping.
>> 
>> We invalidate the TLB for the input address without level hint at 
>> b.iii, but
>> this operation just flush TLB for one page mapping, there
>> 
>> are still some TLB entries for the other page mappings in the cache, 
>> the MMU
>> hardware walker can still hit these entries next time.
> 
> Ah, yes, I see. Thanks. I hadn't considered the case where there are 
> table
> entries beneath the anchor. So how about the diff below?
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c 
> b/arch/arm64/kvm/hyp/pgtable.c
> index 0271b4a3b9fe..12526d8c7ae4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -493,7 +493,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64
> end, u32 level,
>  		return 0;
> 
>  	kvm_set_invalid_pte(ptep);
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	/* TLB invalidation is deferred until the _post handler */
>  	data->anchor = ptep;
>  	return 0;
>  }
> @@ -547,11 +547,21 @@ static int stage2_map_walk_table_post(u64 addr,
> u64 end, u32 level,
>  				      struct stage2_map_data *data)
>  {
>  	int ret = 0;
> +	kvm_pte_t pte = *ptep;
> 
>  	if (!data->anchor)
>  		return 0;
> 
> -	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	kvm_set_invalid_pte(ptep);
> +
> +	/*
> +	 * Invalidate the whole stage-2, as we may have numerous leaf
> +	 * entries below us which would otherwise need invalidating
> +	 * individually.
> +	 */
> +	kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);

That's a big hammer, and we so far have been pretty careful not to
over-invalidate. Is the block-replacing-table *without* an unmap
in between the only case where this triggers?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

  reply	other threads:[~2020-12-01 14:06 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-30 12:18 [RFC PATCH 0/3] Fix several bugs in KVM stage 2 translation Yanan Wang
2020-11-30 12:18 ` Yanan Wang
2020-11-30 12:18 ` [RFC PATCH 1/3] KVM: arm64: Fix possible memory leak in kvm stage2 Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:21   ` Will Deacon
2020-11-30 13:21     ` Will Deacon
2020-12-01  7:21     ` wangyanan (Y)
2020-12-01  7:21       ` wangyanan (Y)
2020-12-01 14:16       ` Will Deacon
2020-12-01 14:16         ` Will Deacon
2020-12-01 17:19         ` wangyanan (Y)
2020-12-01 17:19           ` wangyanan (Y)
2020-12-01 18:15           ` Will Deacon
2020-12-01 18:15             ` Will Deacon
2020-12-01 20:08             ` wangyanan (Y)
2020-12-01 20:08               ` wangyanan (Y)
2020-11-30 12:18 ` [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:34   ` Will Deacon
2020-11-30 13:34     ` Will Deacon
2020-11-30 15:24     ` wangyanan (Y)
2020-11-30 15:24       ` wangyanan (Y)
2020-11-30 16:01       ` Will Deacon
2020-11-30 16:01         ` Will Deacon
2020-12-01  2:30         ` wangyanan (Y)
2020-12-01  2:30           ` wangyanan (Y)
2020-12-01 13:46           ` Will Deacon
2020-12-01 13:46             ` Will Deacon
2020-12-01 14:05             ` Marc Zyngier [this message]
2020-12-01 14:05               ` Marc Zyngier
2020-12-01 14:23               ` Will Deacon
2020-12-01 14:23                 ` Will Deacon
2020-12-01 14:32                 ` Marc Zyngier
2020-12-01 14:32                   ` Marc Zyngier
2020-12-01 14:11             ` wangyanan (Y)
2020-12-01 14:11               ` wangyanan (Y)
2020-12-01 14:35               ` Marc Zyngier
2020-12-01 14:35                 ` Marc Zyngier
2020-12-01 17:20                 ` wangyanan (Y)
2020-12-01 17:20                   ` wangyanan (Y)
2020-12-01 18:17                   ` Will Deacon
2020-12-01 18:17                     ` Will Deacon
2020-11-30 12:18 ` [RFC PATCH 3/3] KVM: arm64: Add usage of stage 2 fault lookup level in user_mem_abort() Yanan Wang
2020-11-30 12:18   ` Yanan Wang
2020-11-30 13:49   ` Will Deacon
2020-11-30 13:49     ` Will Deacon
2020-12-01  6:04     ` wangyanan (Y)
2020-12-01  6:04       ` wangyanan (Y)
2020-11-30 19:40   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03ab1bdd8459831ad05993807e39e5bd@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=gshan@redhat.com \
    --cc=james.morse@arm.com \
    --cc=jiangkunkun@huawei.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lushenming@huawei.com \
    --cc=qperret@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=wanghaibin.wang@huawei.com \
    --cc=wangjingyi11@huawei.com \
    --cc=wangyanan55@huawei.com \
    --cc=will@kernel.org \
    --cc=yezengruan@huawei.com \
    --cc=yuzenghui@huawei.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.