All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvm@vger.kernel.org, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	Marek Szyprowski <m.szyprowski@samsung.com>
Subject: Re: [PATCH v2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks
Date: Wed, 16 Nov 2022 03:08:49 +0000	[thread overview]
Message-ID: <868rkbppdq.wl-maz@kernel.org> (raw)
In-Reply-To: <20221115225502.2240227-1-oliver.upton@linux.dev>

On Tue, 15 Nov 2022 22:55:02 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Marek reported a BUG resulting from the recent parallel faults changes,
> as the hyp stage-1 map walker attempted to allocate table memory while
> holding the RCU read lock:
> 
>   BUG: sleeping function called from invalid context at
>   include/linux/sched/mm.h:274
>   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>   preempt_count: 0, expected: 0
>   RCU nest depth: 1, expected: 0
>   2 locks held by swapper/0/1:
>     #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
>   __create_hyp_mappings+0x80/0xc4
>     #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at:
>   kvm_pgtable_walk+0x0/0x1f4
>   CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
>   Hardware name: Raspberry Pi 3 Model B (DT)
>   Call trace:
>     dump_backtrace.part.0+0xe4/0xf0
>     show_stack+0x18/0x40
>     dump_stack_lvl+0x8c/0xb8
>     dump_stack+0x18/0x34
>     __might_resched+0x178/0x220
>     __might_sleep+0x48/0xa0
>     prepare_alloc_pages+0x178/0x1a0
>     __alloc_pages+0x9c/0x109c
>     alloc_page_interleave+0x1c/0xc4
>     alloc_pages+0xec/0x160
>     get_zeroed_page+0x1c/0x44
>     kvm_hyp_zalloc_page+0x14/0x20
>     hyp_map_walker+0xd4/0x134
>     kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>     __kvm_pgtable_walk+0x1a4/0x220
>     kvm_pgtable_walk+0x104/0x1f4
>     kvm_pgtable_hyp_map+0x80/0xc4
>     __create_hyp_mappings+0x9c/0xc4
>     kvm_mmu_init+0x144/0x1cc
>     kvm_arch_init+0xe4/0xef4
>     kvm_init+0x3c/0x3d0
>     arm_init+0x20/0x30
>     do_one_initcall+0x74/0x400
>     kernel_init_freeable+0x2e0/0x350
>     kernel_init+0x24/0x130
>     ret_from_fork+0x10/0x20
> 
> Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
> RCU protection really doesn't add anything. Don't acquire the RCU read
> lock for an exclusive walk. While at it, add a warning which codifies
> the lack of support for shared walks in the hypervisor code.
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> 
> Applies on top of the parallel faults series that was picked up last
> week. Tested with kvm-arm.mode={nvhe,protected} on an Ampere Altra
> system.
> 
> v1 -> v2:
>  - Took Will's suggestion of conditioning RCU on a flag, small tweak to
>    use existing bit instead (Thanks!)
> 
>  arch/arm64/include/asm/kvm_pgtable.h | 22 ++++++++++++++++------
>  arch/arm64/kvm/hyp/pgtable.c         |  5 +++--
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index a874ce0ce7b5..d4c7321fa652 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -51,8 +51,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return pteref;
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void) {}
> -static inline void kvm_pgtable_walk_end(void) {}
> +static inline void kvm_pgtable_walk_begin(bool shared)
> +{
> +	/*
> +	 * Due to the lack of RCU (or a similar protection scheme), only
> +	 * non-shared table walkers are allowed in the hypervisor.
> +	 */
> +	WARN_ON(shared);
> +}
> +
> +static inline void kvm_pgtable_walk_end(bool shared) {}
>  
>  static inline bool kvm_pgtable_walk_lock_held(void)
>  {
> @@ -68,14 +76,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return rcu_dereference_check(pteref, !shared);
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void)
> +static inline void kvm_pgtable_walk_begin(bool shared)

I'm not crazy about this sort of parameters. I think it would make a
lot more sense to pass a pointer to the walker structure and do the
flag check inside the helper.

That way, we avoid extra churn if/when we need extra state or
bookkeeping around the walk.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks
Date: Wed, 16 Nov 2022 03:08:49 +0000	[thread overview]
Message-ID: <868rkbppdq.wl-maz@kernel.org> (raw)
Message-ID: <20221116030849.fMobY4MQYNgTvP0DYnk_PkmwwyclfBg6Cu-EzGpNi1o@z> (raw)
In-Reply-To: <20221115225502.2240227-1-oliver.upton@linux.dev>

On Tue, 15 Nov 2022 22:55:02 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Marek reported a BUG resulting from the recent parallel faults changes,
> as the hyp stage-1 map walker attempted to allocate table memory while
> holding the RCU read lock:
> 
>   BUG: sleeping function called from invalid context at
>   include/linux/sched/mm.h:274
>   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>   preempt_count: 0, expected: 0
>   RCU nest depth: 1, expected: 0
>   2 locks held by swapper/0/1:
>     #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
>   __create_hyp_mappings+0x80/0xc4
>     #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at:
>   kvm_pgtable_walk+0x0/0x1f4
>   CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
>   Hardware name: Raspberry Pi 3 Model B (DT)
>   Call trace:
>     dump_backtrace.part.0+0xe4/0xf0
>     show_stack+0x18/0x40
>     dump_stack_lvl+0x8c/0xb8
>     dump_stack+0x18/0x34
>     __might_resched+0x178/0x220
>     __might_sleep+0x48/0xa0
>     prepare_alloc_pages+0x178/0x1a0
>     __alloc_pages+0x9c/0x109c
>     alloc_page_interleave+0x1c/0xc4
>     alloc_pages+0xec/0x160
>     get_zeroed_page+0x1c/0x44
>     kvm_hyp_zalloc_page+0x14/0x20
>     hyp_map_walker+0xd4/0x134
>     kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>     __kvm_pgtable_walk+0x1a4/0x220
>     kvm_pgtable_walk+0x104/0x1f4
>     kvm_pgtable_hyp_map+0x80/0xc4
>     __create_hyp_mappings+0x9c/0xc4
>     kvm_mmu_init+0x144/0x1cc
>     kvm_arch_init+0xe4/0xef4
>     kvm_init+0x3c/0x3d0
>     arm_init+0x20/0x30
>     do_one_initcall+0x74/0x400
>     kernel_init_freeable+0x2e0/0x350
>     kernel_init+0x24/0x130
>     ret_from_fork+0x10/0x20
> 
> Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
> RCU protection really doesn't add anything. Don't acquire the RCU read
> lock for an exclusive walk. While at it, add a warning which codifies
> the lack of support for shared walks in the hypervisor code.
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> 
> Applies on top of the parallel faults series that was picked up last
> week. Tested with kvm-arm.mode={nvhe,protected} on an Ampere Altra
> system.
> 
> v1 -> v2:
>  - Took Will's suggestion of conditioning RCU on a flag, small tweak to
>    use existing bit instead (Thanks!)
> 
>  arch/arm64/include/asm/kvm_pgtable.h | 22 ++++++++++++++++------
>  arch/arm64/kvm/hyp/pgtable.c         |  5 +++--
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index a874ce0ce7b5..d4c7321fa652 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -51,8 +51,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return pteref;
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void) {}
> -static inline void kvm_pgtable_walk_end(void) {}
> +static inline void kvm_pgtable_walk_begin(bool shared)
> +{
> +	/*
> +	 * Due to the lack of RCU (or a similar protection scheme), only
> +	 * non-shared table walkers are allowed in the hypervisor.
> +	 */
> +	WARN_ON(shared);
> +}
> +
> +static inline void kvm_pgtable_walk_end(bool shared) {}
>  
>  static inline bool kvm_pgtable_walk_lock_held(void)
>  {
> @@ -68,14 +76,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return rcu_dereference_check(pteref, !shared);
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void)
> +static inline void kvm_pgtable_walk_begin(bool shared)

I'm not crazy about this sort of parameters. I think it would make a
lot more sense to pass a pointer to the walker structure and do the
flag check inside the helper.

That way, we avoid extra churn if/when we need extra state or
bookkeeping around the walk.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

WARNING: multiple messages have this Message-ID (diff)
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: James Morse <james.morse@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	kvmarm@lists.linux.dev,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks
Date: Wed, 16 Nov 2022 03:08:49 +0000	[thread overview]
Message-ID: <868rkbppdq.wl-maz@kernel.org> (raw)
In-Reply-To: <20221115225502.2240227-1-oliver.upton@linux.dev>

On Tue, 15 Nov 2022 22:55:02 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Marek reported a BUG resulting from the recent parallel faults changes,
> as the hyp stage-1 map walker attempted to allocate table memory while
> holding the RCU read lock:
> 
>   BUG: sleeping function called from invalid context at
>   include/linux/sched/mm.h:274
>   in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
>   preempt_count: 0, expected: 0
>   RCU nest depth: 1, expected: 0
>   2 locks held by swapper/0/1:
>     #0: ffff80000a8a44d0 (kvm_hyp_pgd_mutex){+.+.}-{3:3}, at:
>   __create_hyp_mappings+0x80/0xc4
>     #1: ffff80000a927720 (rcu_read_lock){....}-{1:2}, at:
>   kvm_pgtable_walk+0x0/0x1f4
>   CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc3+ #5918
>   Hardware name: Raspberry Pi 3 Model B (DT)
>   Call trace:
>     dump_backtrace.part.0+0xe4/0xf0
>     show_stack+0x18/0x40
>     dump_stack_lvl+0x8c/0xb8
>     dump_stack+0x18/0x34
>     __might_resched+0x178/0x220
>     __might_sleep+0x48/0xa0
>     prepare_alloc_pages+0x178/0x1a0
>     __alloc_pages+0x9c/0x109c
>     alloc_page_interleave+0x1c/0xc4
>     alloc_pages+0xec/0x160
>     get_zeroed_page+0x1c/0x44
>     kvm_hyp_zalloc_page+0x14/0x20
>     hyp_map_walker+0xd4/0x134
>     kvm_pgtable_visitor_cb.isra.0+0x38/0x5c
>     __kvm_pgtable_walk+0x1a4/0x220
>     kvm_pgtable_walk+0x104/0x1f4
>     kvm_pgtable_hyp_map+0x80/0xc4
>     __create_hyp_mappings+0x9c/0xc4
>     kvm_mmu_init+0x144/0x1cc
>     kvm_arch_init+0xe4/0xef4
>     kvm_init+0x3c/0x3d0
>     arm_init+0x20/0x30
>     do_one_initcall+0x74/0x400
>     kernel_init_freeable+0x2e0/0x350
>     kernel_init+0x24/0x130
>     ret_from_fork+0x10/0x20
> 
> Since the hyp stage-1 table walkers are serialized by kvm_hyp_pgd_mutex,
> RCU protection really doesn't add anything. Don't acquire the RCU read
> lock for an exclusive walk. While at it, add a warning which codifies
> the lack of support for shared walks in the hypervisor code.
> 
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> 
> Applies on top of the parallel faults series that was picked up last
> week. Tested with kvm-arm.mode={nvhe,protected} on an Ampere Altra
> system.
> 
> v1 -> v2:
>  - Took Will's suggestion of conditioning RCU on a flag, small tweak to
>    use existing bit instead (Thanks!)
> 
>  arch/arm64/include/asm/kvm_pgtable.h | 22 ++++++++++++++++------
>  arch/arm64/kvm/hyp/pgtable.c         |  5 +++--
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index a874ce0ce7b5..d4c7321fa652 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -51,8 +51,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return pteref;
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void) {}
> -static inline void kvm_pgtable_walk_end(void) {}
> +static inline void kvm_pgtable_walk_begin(bool shared)
> +{
> +	/*
> +	 * Due to the lack of RCU (or a similar protection scheme), only
> +	 * non-shared table walkers are allowed in the hypervisor.
> +	 */
> +	WARN_ON(shared);
> +}
> +
> +static inline void kvm_pgtable_walk_end(bool shared) {}
>  
>  static inline bool kvm_pgtable_walk_lock_held(void)
>  {
> @@ -68,14 +76,16 @@ static inline kvm_pte_t *kvm_dereference_pteref(kvm_pteref_t pteref, bool shared
>  	return rcu_dereference_check(pteref, !shared);
>  }
>  
> -static inline void kvm_pgtable_walk_begin(void)
> +static inline void kvm_pgtable_walk_begin(bool shared)

I'm not crazy about this sort of parameters. I think it would make a
lot more sense to pass a pointer to the walker structure and do the
flag check inside the helper.

That way, we avoid extra churn if/when we need extra state or
bookkeeping around the walk.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-11-16  3:09 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-15 22:55 [PATCH v2] KVM: arm64: Don't acquire RCU read lock for exclusive table walks Oliver Upton
2022-11-15 22:55 ` Oliver Upton
2022-11-15 22:55 ` Oliver Upton
2022-11-16  3:08 ` Marc Zyngier [this message]
2022-11-16  3:08   ` Marc Zyngier
2022-11-16  3:08   ` Marc Zyngier
2022-11-16  7:27   ` Oliver Upton
2022-11-16  7:27     ` Oliver Upton
2022-11-16  7:27     ` Oliver Upton
2022-11-16  8:14     ` Oliver Upton
2022-11-16  8:14       ` Oliver Upton
2022-11-16  8:14       ` Oliver Upton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=868rkbppdq.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=oliver.upton@linux.dev \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.