* [PATCH AUTOSEL 5.17 01/49] KVM: PPC: Book3S HV P9: Fix "lost kick" race
@ 2022-04-12 0:43 Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 27/49] static_call: Properly initialise DEFINE_STATIC_CALL_RET0() Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit Sasha Levin
0 siblings, 2 replies; 4+ messages in thread
From: Sasha Levin @ 2022-04-12 0:43 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, farosas, aik, Nicholas Piggin, pbonzini,
linuxppc-dev
From: Nicholas Piggin <npiggin@gmail.com>
[ Upstream commit c7fa848ff01dad9ed3146a6b1a7d3622131bcedd ]
When new work is created that requires attention from the hypervisor
(e.g., to inject an interrupt into the guest), fast_vcpu_kick is used to
pull the target vcpu out of the guest if it may have been running.
Therefore the work creation side looks like this:
vcpu->arch.doorbell_request = 1;
kvmppc_fast_vcpu_kick_hv(vcpu) {
smp_mb();
cpu = vcpu->cpu;
if (cpu != -1)
send_ipi(cpu);
}
And the guest entry side *should* look like this:
vcpu->cpu = smp_processor_id();
smp_mb();
if (vcpu->arch.doorbell_request) {
// do something (abort entry or inject doorbell etc)
}
But currently the store and load are flipped, so it is possible for the
entry to see no doorbell pending, and the doorbell creation misses the
store to set cpu, resulting lost work (or at least delayed until the
next guest exit).
Fix this by reordering the entry operations and adding a smp_mb
between them. The P8 path appears to have a similar race which is
commented but not addressed yet.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-2-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/kvm/book3s_hv.c | 41 +++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 791db769080d..316f61a4cb59 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -225,6 +225,13 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
int cpu;
struct rcuwait *waitp;
+ /*
+ * rcuwait_wake_up contains smp_mb() which orders prior stores that
+ * create pending work vs below loads of cpu fields. The other side
+ * is the barrier in vcpu run that orders setting the cpu fields vs
+ * testing for pending work.
+ */
+
waitp = kvm_arch_vcpu_get_wait(vcpu);
if (rcuwait_wake_up(waitp))
++vcpu->stat.generic.halt_wakeup;
@@ -1089,7 +1096,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
break;
}
tvcpu->arch.prodded = 1;
- smp_mb();
+ smp_mb(); /* This orders prodded store vs ceded load */
if (tvcpu->arch.ceded)
kvmppc_fast_vcpu_kick_hv(tvcpu);
break;
@@ -3771,6 +3778,14 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
pvc = core_info.vc[sub];
pvc->pcpu = pcpu + thr;
for_each_runnable_thread(i, vcpu, pvc) {
+ /*
+ * XXX: is kvmppc_start_thread called too late here?
+ * It updates vcpu->cpu and vcpu->arch.thread_cpu
+ * which are used by kvmppc_fast_vcpu_kick_hv(), but
+ * kick is called after new exceptions become available
+ * and exceptions are checked earlier than here, by
+ * kvmppc_core_prepare_to_enter.
+ */
kvmppc_start_thread(vcpu, pvc);
kvmppc_create_dtl_entry(vcpu, pvc);
trace_kvm_guest_enter(vcpu);
@@ -4492,6 +4507,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
if (need_resched() || !kvm->arch.mmu_ready)
goto out;
+ vcpu->cpu = pcpu;
+ vcpu->arch.thread_cpu = pcpu;
+ vc->pcpu = pcpu;
+ local_paca->kvm_hstate.kvm_vcpu = vcpu;
+ local_paca->kvm_hstate.ptid = 0;
+ local_paca->kvm_hstate.fake_suspend = 0;
+
+ /*
+ * Orders set cpu/thread_cpu vs testing for pending interrupts and
+ * doorbells below. The other side is when these fields are set vs
+ * kvmppc_fast_vcpu_kick_hv reading the cpu/thread_cpu fields to
+ * kick a vCPU to notice the pending interrupt.
+ */
+ smp_mb();
+
if (!nested) {
kvmppc_core_prepare_to_enter(vcpu);
if (test_bit(BOOK3S_IRQPRIO_EXTERNAL,
@@ -4511,13 +4541,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
tb = mftb();
- vcpu->cpu = pcpu;
- vcpu->arch.thread_cpu = pcpu;
- vc->pcpu = pcpu;
- local_paca->kvm_hstate.kvm_vcpu = vcpu;
- local_paca->kvm_hstate.ptid = 0;
- local_paca->kvm_hstate.fake_suspend = 0;
-
__kvmppc_create_dtl_entry(vcpu, pcpu, tb + vc->tb_offset, 0);
trace_kvm_guest_enter(vcpu);
@@ -4619,6 +4642,8 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
run->exit_reason = KVM_EXIT_INTR;
vcpu->arch.ret = -EINTR;
out:
+ vcpu->cpu = -1;
+ vcpu->arch.thread_cpu = -1;
powerpc_local_irq_pmu_restore(flags);
preempt_enable();
goto done;
--
2.35.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 5.17 27/49] static_call: Properly initialise DEFINE_STATIC_CALL_RET0()
2022-04-12 0:43 [PATCH AUTOSEL 5.17 01/49] KVM: PPC: Book3S HV P9: Fix "lost kick" race Sasha Levin
@ 2022-04-12 0:43 ` Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit Sasha Levin
1 sibling, 0 replies; 4+ messages in thread
From: Sasha Levin @ 2022-04-12 0:43 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Sasha Levin, x86, Peter Zijlstra, dave.hansen, jbaron, mingo, bp,
Josh Poimboeuf, tglx, linuxppc-dev
From: Christophe Leroy <christophe.leroy@csgroup.eu>
[ Upstream commit 5517d500829c683a358a8de04ecb2e28af629ae5 ]
When a static call is updated with __static_call_return0() as target,
arch_static_call_transform() set it to use an optimised set of
instructions which are meant to lay in the same cacheline.
But when initialising a static call with DEFINE_STATIC_CALL_RET0(),
we get a branch to the real __static_call_return0() function instead
of getting the optimised setup:
c00d8120 <__SCT__perf_snapshot_branch_stack>:
c00d8120: 4b ff ff f4 b c00d8114 <__static_call_return0>
c00d8124: 3d 80 c0 0e lis r12,-16370
c00d8128: 81 8c 81 3c lwz r12,-32452(r12)
c00d812c: 7d 89 03 a6 mtctr r12
c00d8130: 4e 80 04 20 bctr
c00d8134: 38 60 00 00 li r3,0
c00d8138: 4e 80 00 20 blr
c00d813c: 00 00 00 00 .long 0x0
Add ARCH_DEFINE_STATIC_CALL_RET0_TRAMP() defined by each architecture
to setup the optimised configuration, and rework
DEFINE_STATIC_CALL_RET0() to call it:
c00d8120 <__SCT__perf_snapshot_branch_stack>:
c00d8120: 48 00 00 14 b c00d8134 <__SCT__perf_snapshot_branch_stack+0x14>
c00d8124: 3d 80 c0 0e lis r12,-16370
c00d8128: 81 8c 81 3c lwz r12,-32452(r12)
c00d812c: 7d 89 03 a6 mtctr r12
c00d8130: 4e 80 04 20 bctr
c00d8134: 38 60 00 00 li r3,0
c00d8138: 4e 80 00 20 blr
c00d813c: 00 00 00 00 .long 0x0
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/1e0a61a88f52a460f62a58ffc2a5f847d1f7d9d8.1647253456.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/static_call.h | 1 +
arch/x86/include/asm/static_call.h | 2 ++
include/linux/static_call.h | 20 +++++++++++++++++---
3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/static_call.h b/arch/powerpc/include/asm/static_call.h
index 0a0bc79bd1fa..de1018cc522b 100644
--- a/arch/powerpc/include/asm/static_call.h
+++ b/arch/powerpc/include/asm/static_call.h
@@ -24,5 +24,6 @@
#define ARCH_DEFINE_STATIC_CALL_TRAMP(name, func) __PPC_SCT(name, "b " #func)
#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) __PPC_SCT(name, "blr")
+#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) __PPC_SCT(name, "b .+20")
#endif /* _ASM_POWERPC_STATIC_CALL_H */
diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index ed4f8bb6c2d9..2455d721503e 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -38,6 +38,8 @@
#define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) \
__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret; int3; nop; nop; nop")
+#define ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) \
+ ARCH_DEFINE_STATIC_CALL_TRAMP(name, __static_call_return0)
#define ARCH_ADD_TRAMP_KEY(name) \
asm(".pushsection .static_call_tramp_key, \"a\" \n" \
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 3e56a9751c06..e2d70435988c 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -196,6 +196,14 @@ extern long __static_call_return0(void);
}; \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
+#define DEFINE_STATIC_CALL_RET0(name, _func) \
+ DECLARE_STATIC_CALL(name, _func); \
+ struct static_call_key STATIC_CALL_KEY(name) = { \
+ .func = __static_call_return0, \
+ .type = 1, \
+ }; \
+ ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)
+
#define static_call_cond(name) (void)__static_call(name)
#define EXPORT_STATIC_CALL(name) \
@@ -231,6 +239,12 @@ static inline int static_call_init(void) { return 0; }
}; \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
+#define DEFINE_STATIC_CALL_RET0(name, _func) \
+ DECLARE_STATIC_CALL(name, _func); \
+ struct static_call_key STATIC_CALL_KEY(name) = { \
+ .func = __static_call_return0, \
+ }; \
+ ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name)
#define static_call_cond(name) (void)__static_call(name)
@@ -287,6 +301,9 @@ static inline long __static_call_return0(void)
.func = NULL, \
}
+#define DEFINE_STATIC_CALL_RET0(name, _func) \
+ __DEFINE_STATIC_CALL(name, _func, __static_call_return0)
+
static inline void __static_call_nop(void) { }
/*
@@ -330,7 +347,4 @@ static inline int static_call_text_reserved(void *start, void *end)
#define DEFINE_STATIC_CALL(name, _func) \
__DEFINE_STATIC_CALL(name, _func, _func)
-#define DEFINE_STATIC_CALL_RET0(name, _func) \
- __DEFINE_STATIC_CALL(name, _func, __static_call_return0)
-
#endif /* _LINUX_STATIC_CALL_H */
--
2.35.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
2022-04-12 0:43 [PATCH AUTOSEL 5.17 01/49] KVM: PPC: Book3S HV P9: Fix "lost kick" race Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 27/49] static_call: Properly initialise DEFINE_STATIC_CALL_RET0() Sasha Levin
@ 2022-04-12 0:43 ` Sasha Levin
2022-04-12 6:35 ` Michael Ellerman
1 sibling, 1 reply; 4+ messages in thread
From: Sasha Levin @ 2022-04-12 0:43 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Sasha Levin, Kefeng Wang, linuxppc-dev
From: Kefeng Wang <wangkefeng.wang@huawei.com>
[ Upstream commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 ]
mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
Because of the way __pa() works we have:
__pa(0x8000000000000000) == 0, and therefore
virt_to_pfn(0x8000000000000000) == 0, and therefore
virt_addr_valid(0x8000000000000000) == true
Which is wrong, virt_addr_valid() should be false for vmalloc space.
In fact all vmalloc addresses that alias with a valid PFN will return
true from virt_addr_valid(). That can cause bugs with hardened usercopy
as described below by Kefeng Wang:
When running ethtool eth0 on 64-bit Book3E, a BUG occurred:
usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)!
kernel BUG at mm/usercopy.c:99
...
usercopy_abort+0x64/0xa0 (unreliable)
__check_heap_object+0x168/0x190
__check_object_size+0x1a0/0x200
dev_ethtool+0x2494/0x2b20
dev_ioctl+0x5d0/0x770
sock_do_ioctl+0xf0/0x1d0
sock_ioctl+0x3ec/0x5a0
__se_sys_ioctl+0xf0/0x160
system_call_exception+0xfc/0x1f0
system_call_common+0xf8/0x200
The code shows below,
data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN))
The data is alloced by vmalloc(), virt_addr_valid(ptr) will return true
on 64-bit Book3E, which leads to the panic.
As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
and __pa addresses") does, make sure the virt addr above PAGE_OFFSET in
the virt_addr_valid() for 64-bit, also add upper limit check to make
sure the virt is below high_memory.
Meanwhile, for 32-bit PAGE_OFFSET is the virtual address of the start
of lowmem, high_memory is the upper low virtual address, the check is
suitable for 32-bit, this will fix the issue mentioned in commit
602946ec2f90 ("powerpc: Set max_mapnr correctly") too.
On 32-bit there is a similar problem with high memory, that was fixed in
commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that
commit breaks highmem and needs to be reverted.
We can't easily fix __pa(), we have code that relies on its current
behaviour. So for now add extra checks to virt_addr_valid().
For 64-bit Book3S the extra checks are not necessary, the combination of
virt_to_pfn() and pfn_valid() should yield the correct result, but they
are harmless.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add additional change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
arch/powerpc/include/asm/page.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 254687258f42..f2c5c26869f1 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -132,7 +132,11 @@ static inline bool pfn_valid(unsigned long pfn)
#define virt_to_page(kaddr) pfn_to_page(virt_to_pfn(kaddr))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
-#define virt_addr_valid(kaddr) pfn_valid(virt_to_pfn(kaddr))
+#define virt_addr_valid(vaddr) ({ \
+ unsigned long _addr = (unsigned long)vaddr; \
+ _addr >= PAGE_OFFSET && _addr < (unsigned long)high_memory && \
+ pfn_valid(virt_to_pfn(_addr)); \
+})
/*
* On Book-E parts we need __va to parse the device tree and we can't
--
2.35.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit Sasha Levin
@ 2022-04-12 6:35 ` Michael Ellerman
0 siblings, 0 replies; 4+ messages in thread
From: Michael Ellerman @ 2022-04-12 6:35 UTC (permalink / raw)
To: Sasha Levin, linux-kernel, stable; +Cc: Sasha Levin, Kefeng Wang, linuxppc-dev
Sasha Levin <sashal@kernel.org> writes:
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> [ Upstream commit ffa0b64e3be58519ae472ea29a1a1ad681e32f48 ]
>
> mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.
This cherry-pick is good, but can you also pick up the immediately
following commit:
1ff5c8e8c835 ("Revert "powerpc: Set max_mapnr correctly"")
For v5.16 and v5.17. Thanks.
cheers
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2022-04-12 6:36 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-12 0:43 [PATCH AUTOSEL 5.17 01/49] KVM: PPC: Book3S HV P9: Fix "lost kick" race Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 27/49] static_call: Properly initialise DEFINE_STATIC_CALL_RET0() Sasha Levin
2022-04-12 0:43 ` [PATCH AUTOSEL 5.17 40/49] powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit Sasha Levin
2022-04-12 6:35 ` Michael Ellerman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).