LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 5/5] KVM: selftests: Remove __NR_userfaultfd syscall fallback
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Revert the __NR_userfaultfd syscall fallback added for KVM selftests now
that x86's unistd_{32,63}.h overrides are under uapi/ and thus not in
KVM sefltests' search path, i.e. now that KVM gets x86 syscall numbers
from the installed kernel headers.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/uapi/asm/unistd_64.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
index 4205ed4158bf..cb52a3a8b8fc 100644
--- a/tools/arch/x86/include/uapi/asm/unistd_64.h
+++ b/tools/arch/x86/include/uapi/asm/unistd_64.h
@@ -1,7 +1,4 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __NR_userfaultfd
-#define __NR_userfaultfd 282
-#endif
 #ifndef __NR_perf_event_open
 # define __NR_perf_event_open 298
 #endif
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Add a test to verify an rseq's CPU ID is updated correctly if the task is
migrated while the kernel is handling KVM_RUN.  This is a regression test
for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
without updating rseq, leading to a stale CPU ID and other badness.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/.gitignore  |   1 +
 tools/testing/selftests/kvm/Makefile    |   3 +
 tools/testing/selftests/kvm/rseq_test.c | 236 ++++++++++++++++++++++++
 3 files changed, 240 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 0709af0144c8..6d031ff6b68e 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -47,6 +47,7 @@
 /kvm_page_table_test
 /memslot_modification_stress_test
 /memslot_perf_test
+/rseq_test
 /set_memory_region_test
 /steal_time
 /kvm_binary_stats_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 5832f510a16c..0756e79cb513 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += memslot_perf_test
+TEST_GEN_PROGS_x86_64 += rseq_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
@@ -92,6 +93,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_aarch64 += kvm_page_table_test
+TEST_GEN_PROGS_aarch64 += rseq_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
 TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
@@ -103,6 +105,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
 TEST_GEN_PROGS_s390x += dirty_log_test
 TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
 TEST_GEN_PROGS_s390x += kvm_page_table_test
+TEST_GEN_PROGS_s390x += rseq_test
 TEST_GEN_PROGS_s390x += set_memory_region_test
 TEST_GEN_PROGS_s390x += kvm_binary_stats_test
 
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
new file mode 100644
index 000000000000..060538bd405a
--- /dev/null
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <errno.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <syscall.h>
+#include <sys/ioctl.h>
+#include <asm/barrier.h>
+#include <linux/atomic.h>
+#include <linux/rseq.h>
+#include <linux/unistd.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static __thread volatile struct rseq __rseq = {
+	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+/*
+ * Use an arbitrary, bogus signature for configuring rseq, this test does not
+ * actually enter an rseq critical section.
+ */
+#define RSEQ_SIG 0xdeadbeef
+
+/*
+ * Any bug related to task migration is likely to be timing-dependent; perform
+ * a large number of migrations to reduce the odds of a false negative.
+ */
+#define NR_TASK_MIGRATIONS 100000
+
+static pthread_t migration_thread;
+static cpu_set_t possible_mask;
+static bool done;
+
+static atomic_t seq_cnt;
+
+static void guest_code(void)
+{
+	for (;;)
+		GUEST_SYNC(0);
+}
+
+static void sys_rseq(int flags)
+{
+	int r;
+
+	r = syscall(__NR_rseq, &__rseq, sizeof(__rseq), flags, RSEQ_SIG);
+	TEST_ASSERT(!r, "rseq failed, errno = %d (%s)", errno, strerror(errno));
+}
+
+static void *migration_worker(void *ign)
+{
+	cpu_set_t allowed_mask;
+	int r, i, nr_cpus, cpu;
+
+	CPU_ZERO(&allowed_mask);
+
+	nr_cpus = CPU_COUNT(&possible_mask);
+
+	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
+		cpu = i % nr_cpus;
+		if (!CPU_ISSET(cpu, &possible_mask))
+			continue;
+
+		CPU_SET(cpu, &allowed_mask);
+
+		/*
+		 * Bump the sequence count twice to allow the reader to detect
+		 * that a migration may have occurred in between rseq and sched
+		 * CPU ID reads.  An odd sequence count indicates a migration
+		 * is in-progress, while a completely different count indicates
+		 * a migration occurred since the count was last read.
+		 */
+		atomic_inc(&seq_cnt);
+
+		/*
+		 * Ensure the odd count is visible while sched_getcpu() isn't
+		 * stable, i.e. while changing affinity is in-progress.
+		 */
+		smp_wmb();
+		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
+			    errno, strerror(errno));
+		smp_wmb();
+		atomic_inc(&seq_cnt);
+
+		CPU_CLR(cpu, &allowed_mask);
+
+		/*
+		 * Wait 1-10us before proceeding to the next iteration and more
+		 * specifically, before bumping seq_cnt again.  A delay is
+		 * needed on three fronts:
+		 *
+		 *  1. To allow sched_setaffinity() to prompt migration before
+		 *     ioctl(KVM_RUN) enters the guest so that TIF_NOTIFY_RESUME
+		 *     (or TIF_NEED_RESCHED, which indirectly leads to handling
+		 *     NOTIFY_RESUME) is handled in KVM context.
+		 *
+		 *     If NOTIFY_RESUME/NEED_RESCHED is set after KVM enters
+		 *     the guest, the guest will trigger a IO/MMIO exit all the
+		 *     way to userspace and the TIF flags will be handled by
+		 *     the generic "exit to userspace" logic, not by KVM.  The
+		 *     exit to userspace is necessary to give the test a chance
+		 *     to check the rseq CPU ID (see #2).
+		 *
+		 *     Alternatively, guest_code() could include an instruction
+		 *     to trigger an exit that is handled by KVM, but any such
+		 *     exit requires architecture specific code.
+		 *
+		 *  2. To let ioctl(KVM_RUN) make its way back to the test
+		 *     before the next round of migration.  The test's check on
+		 *     the rseq CPU ID must wait for migration to complete in
+		 *     order to avoid false positive, thus any kernel rseq bug
+		 *     will be missed if the next migration starts before the
+		 *     check completes.
+		 *
+		 *  3. To ensure the read-side makes efficient forward progress,
+		 *     e.g. if sched_getcpu() involves a syscall.  Stalling the
+		 *     read-side means the test will spend more time waiting for
+		 *     sched_getcpu() to stabilize and less time trying to hit
+		 *     the timing-dependent bug.
+		 *
+		 * Because any bug in this area is likely to be timing-dependent,
+		 * run with a range of delays at 1us intervals from 1us to 10us
+		 * as a best effort to avoid tuning the test to the point where
+		 * it can hit _only_ the original bug and not detect future
+		 * regressions.
+		 *
+		 * The original bug can reproduce with a delay up to ~500us on
+		 * x86-64, but starts to require more iterations to reproduce
+		 * as the delay creeps above ~10us, and the average runtime of
+		 * each iteration obviously increases as well.  Cap the delay
+		 * at 10us to keep test runtime reasonable while minimizing
+		 * potential coverage loss.
+		 *
+		 * The lower bound for reproducing the bug is likely below 1us,
+		 * e.g. failures occur on x86-64 with nanosleep(0), but at that
+		 * point the overhead of the syscall likely dominates the delay.
+		 * Use usleep() for simplicity and to avoid unnecessary kernel
+		 * dependencies.
+		 */
+		usleep((i % 10) + 1);
+	}
+	done = true;
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	int r, i, snapshot;
+	struct kvm_vm *vm;
+	u32 cpu, rseq_cpu;
+
+	/* Tell stdout not to buffer its content */
+	setbuf(stdout, NULL);
+
+	r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
+	TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
+		    strerror(errno));
+
+	if (CPU_COUNT(&possible_mask) < 2) {
+		print_skip("Only one CPU, task migration not possible\n");
+		exit(KSFT_SKIP);
+	}
+
+	sys_rseq(0);
+
+	/*
+	 * Create and run a dummy VM that immediately exits to userspace via
+	 * GUEST_SYNC, while concurrently migrating the process by setting its
+	 * CPU affinity.
+	 */
+	vm = vm_create_default(VCPU_ID, 0, guest_code);
+
+	pthread_create(&migration_thread, NULL, migration_worker, 0);
+
+	for (i = 0; !done; i++) {
+		vcpu_run(vm, VCPU_ID);
+		TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
+			    "Guest failed?");
+
+		/*
+		 * Verify rseq's CPU matches sched's CPU.  Ensure migration
+		 * doesn't occur between sched_getcpu() and reading the rseq
+		 * cpu_id by rereading both if the sequence count changes, or
+		 * if the count is odd (migration in-progress).
+		 */
+		do {
+			/*
+			 * Drop bit 0 to force a mismatch if the count is odd,
+			 * i.e. if a migration is in-progress.
+			 */
+			snapshot = atomic_read(&seq_cnt) & ~1;
+
+			/*
+			 * Ensure reading sched_getcpu() and rseq.cpu_id
+			 * complete in a single "no migration" window, i.e. are
+			 * not reordered across the seq_cnt reads.
+			 */
+			smp_rmb();
+			cpu = sched_getcpu();
+			rseq_cpu = READ_ONCE(__rseq.cpu_id);
+			smp_rmb();
+		} while (snapshot != atomic_read(&seq_cnt));
+
+		TEST_ASSERT(rseq_cpu == cpu,
+			    "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
+	}
+
+	/*
+	 * Sanity check that the test was able to enter the guest a reasonable
+	 * number of times, e.g. didn't get stalled too often/long waiting for
+	 * sched_getcpu() to stabilize.  A 2:1 migration:KVM_RUN ratio is a
+	 * fairly conservative ratio on x86-64, which can do _more_ KVM_RUNs
+	 * than migrations given the 1us+ delay in the migration task.
+	 */
+	TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
+		    "Only performed %d KVM_RUNs, task stalled too much?\n", i);
+
+	pthread_join(migration_thread, NULL);
+
+	kvm_vm_free(vm);
+
+	sys_rseq(RSEQ_FLAG_UNREGISTER);
+
+	return 0;
+}
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 3/5] tools: Move x86 syscall number fallbacks to .../uapi/
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Move unistd_{32,64}.h from x86/include/asm to x86/include/uapi/asm so
that tools/selftests that install kernel headers, e.g. KVM selftests, can
include non-uapi tools headers, e.g. to get 'struct list_head', without
effectively overriding the installed non-tool uapi headers.

Swapping KVM's search order, e.g. to search the kernel headers before
tool headers, is not a viable option as doing results in linux/type.h and
other core headers getting pulled from the kernel headers, which do not
have the kernel-internal typedefs that are used through tools, including
many files outside of selftests/kvm's control.

Prior to commit cec07f53c398 ("perf tools: Move syscall number fallbacks
from perf-sys.h to tools/arch/x86/include/asm/"), the handcoded numbers
were actual fallbacks, i.e. overriding unistd_{32,64}.h from the kernel
headers was unintentional.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/arch/x86/include/{ => uapi}/asm/unistd_32.h | 0
 tools/arch/x86/include/{ => uapi}/asm/unistd_64.h | 0
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (100%)

diff --git a/tools/arch/x86/include/asm/unistd_32.h b/tools/arch/x86/include/uapi/asm/unistd_32.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_32.h
rename to tools/arch/x86/include/uapi/asm/unistd_32.h
diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/uapi/asm/unistd_64.h
similarity index 100%
rename from tools/arch/x86/include/asm/unistd_64.h
rename to tools/arch/x86/include/uapi/asm/unistd_64.h
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply

* [PATCH v3 2/5] entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume()
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
that the two function are always called back-to-back by architectures
that have rseq.  The rseq helper is stubbed out for architectures that
don't support rseq, i.e. this is a nop across the board.

Note, tracehook_notify_resume() is horribly named and arguably does not
belong in tracehook.h as literally every line of code in it has nothing
to do with tracing.  But, that's been true since commit a42c6ded827d
("move key_repace_session_keyring() into tracehook_notify_resume()")
first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
mess up to future patches.

No functional change intended.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/arm/kernel/signal.c     | 1 -
 arch/arm64/kernel/signal.c   | 1 -
 arch/csky/kernel/signal.c    | 4 +---
 arch/mips/kernel/signal.c    | 4 +---
 arch/powerpc/kernel/signal.c | 4 +---
 include/linux/tracehook.h    | 2 ++
 kernel/entry/common.c        | 4 +---
 kernel/entry/kvm.c           | 4 +---
 8 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index a3a38d0a4c85..9df68d139965 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -670,7 +670,6 @@ do_work_pending(struct pt_regs *regs, unsigned int thread_flags, int syscall)
 				uprobe_notify_resume(regs);
 			} else {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 			}
 		}
 		local_irq_disable();
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 23036334f4dc..22b55db13da6 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -951,7 +951,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 
 			if (thread_flags & _TIF_NOTIFY_RESUME) {
 				tracehook_notify_resume(regs);
-				rseq_handle_notify_resume(NULL, regs);
 
 				/*
 				 * If we reschedule after checking the affinity
diff --git a/arch/csky/kernel/signal.c b/arch/csky/kernel/signal.c
index 312f046d452d..bc4238b9f709 100644
--- a/arch/csky/kernel/signal.c
+++ b/arch/csky/kernel/signal.c
@@ -260,8 +260,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
diff --git a/arch/mips/kernel/signal.c b/arch/mips/kernel/signal.c
index f1e985109da0..c9b2a75563e1 100644
--- a/arch/mips/kernel/signal.c
+++ b/arch/mips/kernel/signal.c
@@ -906,10 +906,8 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
 	if (thread_info_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 		do_signal(regs);
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 
 	user_enter();
 }
diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
index e600764a926c..b93b87df499d 100644
--- a/arch/powerpc/kernel/signal.c
+++ b/arch/powerpc/kernel/signal.c
@@ -293,10 +293,8 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
 		do_signal(current);
 	}
 
-	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
+	if (thread_info_flags & _TIF_NOTIFY_RESUME)
 		tracehook_notify_resume(regs);
-		rseq_handle_notify_resume(NULL, regs);
-	}
 }
 
 static unsigned long get_tm_stackpointer(struct task_struct *tsk)
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 3e80c4bc66f7..2564b7434b4d 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -197,6 +197,8 @@ static inline void tracehook_notify_resume(struct pt_regs *regs)
 
 	mem_cgroup_handle_over_high();
 	blkcg_maybe_throttle_current();
+
+	rseq_handle_notify_resume(NULL, regs);
 }
 
 /*
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bf16395b9e13..d5a61d565ad5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -171,10 +171,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL))
 			handle_signal_work(regs, ti_work);
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(regs);
-			rseq_handle_notify_resume(NULL, regs);
-		}
 
 		/* Architecture specific TIF work */
 		arch_exit_to_user_mode_work(regs, ti_work);
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 049fd06b4c3d..49972ee99aff 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,10 +19,8 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME) {
+		if (ti_work & _TIF_NOTIFY_RESUME)
 			tracehook_notify_resume(NULL);
-			rseq_handle_notify_resume(NULL, NULL);
-		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20210901203030.1292304-1-seanjc@google.com>

Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
transferring to a KVM guest, which is roughly equivalent to an exit to
userspace and processes many of the same pending actions.  While the task
cannot be in an rseq critical section as the KVM path is reachable only
by via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
critical section still apply, e.g. the current CPU needs to be updated if
the task is migrated.

Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
and other badness in userspace VMMs that use rseq in combination with KVM,
e.g. due to the CPU ID being stale after task migration.

Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
Reported-by: Peter Foley <pefoley@google.com>
Bisected-by: Doug Evans <dje@google.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 kernel/entry/kvm.c |  4 +++-
 kernel/rseq.c      | 14 +++++++++++---
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
index 49972ee99aff..049fd06b4c3d 100644
--- a/kernel/entry/kvm.c
+++ b/kernel/entry/kvm.c
@@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work)
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
-		if (ti_work & _TIF_NOTIFY_RESUME)
+		if (ti_work & _TIF_NOTIFY_RESUME) {
 			tracehook_notify_resume(NULL);
+			rseq_handle_notify_resume(NULL, NULL);
+		}
 
 		ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
 		if (ret)
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 35f7bd0fced0..6d45ac3dae7f 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -282,9 +282,17 @@ void __rseq_handle_notify_resume(struct ksignal *ksig, struct pt_regs *regs)
 
 	if (unlikely(t->flags & PF_EXITING))
 		return;
-	ret = rseq_ip_fixup(regs);
-	if (unlikely(ret < 0))
-		goto error;
+
+	/*
+	 * regs is NULL if and only if the caller is in a syscall path.  Skip
+	 * fixup and leave rseq_cs as is so that rseq_sycall() will detect and
+	 * kill a misbehaving userspace on debug kernels.
+	 */
+	if (regs) {
+		ret = rseq_ip_fixup(regs);
+		if (unlikely(ret < 0))
+			goto error;
+	}
 	if (unlikely(rseq_update_cpu_id(t)))
 		goto error;
 	return;
-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply related

* [PATCH v3 0/5] KVM: rseq: Fix and a test for a KVM+rseq bug
From: Sean Christopherson @ 2021-09-01 20:30 UTC (permalink / raw)
  To: Russell King, Catalin Marinas, Will Deacon, Guo Ren,
	Thomas Bogendoerfer, Michael Ellerman, Steven Rostedt,
	Ingo Molnar, Oleg Nesterov, Thomas Gleixner, Peter Zijlstra,
	Andy Lutomirski, Mathieu Desnoyers, Paul E. McKenney, Boqun Feng,
	Paolo Bonzini, Shuah Khan
  Cc: kvm, Ben Gardon, linux-kernel, linux-csky, linux-mips,
	Peter Foley, Paul Mackerras, linux-kselftest, Sean Christopherson,
	Shakeel Butt, linuxppc-dev, linux-arm-kernel

Patch 1 fixes a KVM+rseq bug where KVM's handling of TIF_NOTIFY_RESUME,
e.g. for task migration, clears the flag without informing rseq and leads
to stale data in userspace's rseq struct.

Patch 2 is a cleanup to try and make future bugs less likely.  It's also
a baby step towards moving and renaming tracehook_notify_resume() since
it has nothing to do with tracing.

Patch 3 is a fix/cleanup to stop overriding x86's unistd_{32,64}.h when
the include path (intentionally) omits tools' uapi headers.  KVM's
selftests do exactly that so that they can pick up the uapi headers from
the installed kernel headers, and still use various tools/ headers that
mirror kernel code, e.g. linux/types.h.  This allows the new test in
patch 4 to reference __NR_rseq without having to manually define it.

Patch 4 is a regression test for the KVM+rseq bug.

Patch 5 is a cleanup made possible by patch 3.

Based on commit 835d31d319d9 ("Merge tag 'media/v5.15-1' of ...").

v3:
  - Collect Ack/Review. [Mathieu, Ben]
  - Add explicit smp_wmb() instead of relying on atomic_inc() to do a full
    barrier. [Mathieu]
  - Add lots and lots of comments in the selftest, especially around why
    the migration thread needs a udelay(). [Mathieu]
  - Delay between 1us and 10us to reduce the odds of having a hard
    dependency on arch/kernel behavior.  [Mathieu]
  - Dropped an s390 change in patch 2 after a rebase to upstream master.

v2:
  - https://lkml.kernel.org/r/20210820225002.310652-1-seanjc@google.com
  - Don't touch rseq_cs when handling KVM case so that rseq_syscall() will
    still detect a naughty userspace. [Mathieu]
  - Use a sequence counter + retry in the test to ensure the process isn't
    migrated between sched_getcpu() and reading rseq.cpu_id, i.e. to
    avoid a flaky test. [Mathieu]
  - Add Mathieu's ack for patch 2.
  - Add more comments in the test.

v1: https://lkml.kernel.org/r/20210818001210.4073390-1-seanjc@google.com

Sean Christopherson (5):
  KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM
    guest
  entry: rseq: Call rseq_handle_notify_resume() in
    tracehook_notify_resume()
  tools: Move x86 syscall number fallbacks to .../uapi/
  KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration
    bugs
  KVM: selftests: Remove __NR_userfaultfd syscall fallback

 arch/arm/kernel/signal.c                      |   1 -
 arch/arm64/kernel/signal.c                    |   1 -
 arch/csky/kernel/signal.c                     |   4 +-
 arch/mips/kernel/signal.c                     |   4 +-
 arch/powerpc/kernel/signal.c                  |   4 +-
 include/linux/tracehook.h                     |   2 +
 kernel/entry/common.c                         |   4 +-
 kernel/rseq.c                                 |  14 +-
 .../x86/include/{ => uapi}/asm/unistd_32.h    |   0
 .../x86/include/{ => uapi}/asm/unistd_64.h    |   3 -
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 tools/testing/selftests/kvm/rseq_test.c       | 236 ++++++++++++++++++
 13 files changed, 257 insertions(+), 20 deletions(-)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_32.h (100%)
 rename tools/arch/x86/include/{ => uapi}/asm/unistd_64.h (83%)
 create mode 100644 tools/testing/selftests/kvm/rseq_test.c

-- 
2.33.0.153.gba50c8fa24-goog


^ permalink raw reply

* Re: [GIT PULL] retire legacy WR sbc8548 and sbc8641 platforms
From: Paul Gortmaker @ 2021-09-01 19:51 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Scott Wood, Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <87pmu09rg1.fsf@mpe.ellerman.id.au>

[Re: [GIT PULL] retire legacy WR sbc8548 and sbc8641 platforms] On 27/08/2021 (Fri 01:05) Michael Ellerman wrote:

> Paul Gortmaker <paul.gortmaker@windriver.com> writes:
> > This is unchanged from the original wr_sbc-delete branch sent in January,
> > other than to add the Acks from Scott in July, and update the baseline.
> >
> > Built with ppc64 defconfig and mpc85xx_cds_defconfig and mpc86xx_defconfig
> > just to make sure I didn't fat finger anything in the baseline update.
> 
> Thanks for following up on this.
> 
> I ended up cherry-picking the patches into my branch. I like to keep my
> next based on rc2, and merging this would have pulled in everything up
> to rc7 into my branch.
> 
> I don't think you were planning to merge this branch anywhere else, so
> it shouldn't make any difference, but let me know if it's a problem.

That is 100% fine - as you guessed, it is a dead end branch, and there is
no real underlying value in preserving the baseline for removals like this.

Thanks!
Paul.
--

> 
> It should appear here soon:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=next
> 
> 
> cheers
>  
> 
> > Original v1 text follows below, from:
> >
> > https://lore.kernel.org/lkml/20210111082823.99562-1-paul.gortmaker@windriver.com
> >
> > It would be nice to get this in and off our collective to-do list.
> >
> > Thanks,
> > Paul.
> >
> >   ---
> >
> > In v2.6.27 (2008, 917f0af9e5a9) the sbc8260 support was implicitly
> > retired by not being carried forward through the ppc --> powerpc
> > device tree transition.
> >
> > Then, in v3.6 (2012, b048b4e17cbb) we retired the support for the
> > sbc8560 boards.
> >
> > Next, in v4.18 (2017, 3bc6cf5a86e5) we retired the support for the
> > 2006 vintage sbc834x boards.
> >
> > The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the
> > sbc834x boards, but it is also 3+ years later, so it makes sense to
> > now retire them as well - which is what is done here.
> >
> > These two remaining WR boards were based on the Freescale MPC8548-CDS
> > and the MPC8641D-HPCN reference board implementations.  Having had the
> > chance to use these and many other Fsl ref boards, I know this:  The
> > Freescale reference boards were typically produced in limited quantity
> > and primarily available to BSP developers and hardware designers, and
> > not likely to have found a 2nd life with hobbyists and/or collectors.
> >
> > It was good to have that BSP code subjected to mainline review and
> > hence also widely available back in the day. But given the above, we
> > should probably also be giving serious consideration to retiring
> > additional similar age/type reference board platforms as well.
> >
> > I've always felt it is important for us to be proactive in retiring
> > old code, since it has a genuine non-zero carrying cost, as described
> > in the 930d52c012b8 merge log.  But for the here and now, we just
> > clean up the remaining BSP code that I had added for SBC platforms.
> >
> > --- 
> >
> > The following changes since commit e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93:
> >
> >   Linux 5.14-rc7 (2021-08-22 14:24:56 -0700)
> >
> > are available in the Git repository at:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git wr_sbc-delete-v2
> >
> > for you to fetch changes up to d44e2dc12ea2112e74cdd25090eeda2727ed09cc:
> >
> >   MAINTAINERS: update for Paul Gortmaker (2021-08-24 08:19:01 -0400)
> >
> > ----------------------------------------------------------------
> > Paul Gortmaker (3):
> >       powerpc: retire sbc8548 board support
> >       powerpc: retire sbc8641d board support
> >       MAINTAINERS: update for Paul Gortmaker
> >
> >  MAINTAINERS                                 |   1 -
> >  arch/powerpc/boot/Makefile                  |   1 -
> >  arch/powerpc/boot/dts/fsl/sbc8641d.dts      | 176 -----------------
> >  arch/powerpc/boot/dts/sbc8548-altflash.dts  | 111 -----------
> >  arch/powerpc/boot/dts/sbc8548-post.dtsi     | 289 ----------------------------
> >  arch/powerpc/boot/dts/sbc8548-pre.dtsi      |  48 -----
> >  arch/powerpc/boot/dts/sbc8548.dts           | 106 ----------
> >  arch/powerpc/boot/wrapper                   |   2 +-
> >  arch/powerpc/configs/85xx/sbc8548_defconfig |  50 -----
> >  arch/powerpc/configs/mpc85xx_base.config    |   1 -
> >  arch/powerpc/configs/mpc86xx_base.config    |   1 -
> >  arch/powerpc/configs/ppc6xx_defconfig       |   1 -
> >  arch/powerpc/platforms/85xx/Kconfig         |   6 -
> >  arch/powerpc/platforms/85xx/Makefile        |   1 -
> >  arch/powerpc/platforms/85xx/sbc8548.c       | 134 -------------
> >  arch/powerpc/platforms/86xx/Kconfig         |   8 +-
> >  arch/powerpc/platforms/86xx/Makefile        |   1 -
> >  arch/powerpc/platforms/86xx/sbc8641d.c      |  87 ---------
> >  18 files changed, 2 insertions(+), 1022 deletions(-)
> >  delete mode 100644 arch/powerpc/boot/dts/fsl/sbc8641d.dts
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-altflash.dts
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-post.dtsi
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548-pre.dtsi
> >  delete mode 100644 arch/powerpc/boot/dts/sbc8548.dts
> >  delete mode 100644 arch/powerpc/configs/85xx/sbc8548_defconfig
> >  delete mode 100644 arch/powerpc/platforms/85xx/sbc8548.c
> >  delete mode 100644 arch/powerpc/platforms/86xx/sbc8641d.c

^ permalink raw reply

* [PATCH 2/5] KVM: PPC: Book3S HV: Delay setting of kvm ops
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Delay the setting of kvm_hv_ops until after all init code has
completed. This avoids leaving the ops still accessible if the init
fails.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4cdf2e6e1cf5..fe20074faa61 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5985,9 +5985,6 @@ static int kvmppc_book3s_init_hv(void)
 	}
 #endif
 
-	kvm_ops_hv.owner = THIS_MODULE;
-	kvmppc_hv_ops = &kvm_ops_hv;
-
 	init_default_hcalls();
 
 	init_vcore_lists();
@@ -6003,10 +6000,15 @@ static int kvmppc_book3s_init_hv(void)
 	}
 
 	r = kvmppc_uvmem_init();
-	if (r < 0)
+	if (r < 0) {
 		pr_err("KVM-HV: kvmppc_uvmem_init failed %d\n", r);
+		return r;
+	}
 
-	return r;
+	kvm_ops_hv.owner = THIS_MODULE;
+	kvmppc_hv_ops = &kvm_ops_hv;
+
+	return 0;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 1/5] KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

The return of the function is being shadowed by the call to
kvmppc_uvmem_init.

Fixes: ca9f4942670c ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..4cdf2e6e1cf5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5996,8 +5996,11 @@ static int kvmppc_book3s_init_hv(void)
 	if (r)
 		return r;
 
-	if (kvmppc_radix_possible())
+	if (kvmppc_radix_possible()) {
 		r = kvmppc_radix_init();
+		if (r)
+			return r;
+	}
 
 	r = kvmppc_uvmem_init();
 	if (r < 0)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 5/5] KVM: PPC: Book3S: Stop exporting non-builtin symbols
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Now that we have only one kvm module, we can stop exporting some
symbols.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s.c              | 15 ---------------
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  3 ---
 arch/powerpc/kvm/book3s_64_vio.c       |  3 ---
 arch/powerpc/kvm/book3s_rtas.c         |  1 -
 arch/powerpc/kvm/book3s_xics.c         |  4 ----
 arch/powerpc/kvm/book3s_xive.c         |  6 ------
 arch/powerpc/kvm/emulate.c             |  1 -
 arch/powerpc/kvm/powerpc.c             | 14 --------------
 8 files changed, 47 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c381bb17b0f9..120a68c76501 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -191,27 +191,23 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 	printk(KERN_INFO "Queueing interrupt %x\n", vec);
 #endif
 }
-EXPORT_SYMBOL_GPL(kvmppc_book3s_queue_irqprio);
 
 void kvmppc_core_queue_machine_check(struct kvm_vcpu *vcpu, ulong flags)
 {
 	/* might as well deliver this straight away */
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_MACHINE_CHECK, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_machine_check);
 
 void kvmppc_core_queue_syscall(struct kvm_vcpu *vcpu)
 {
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_SYSCALL, 0);
 }
-EXPORT_SYMBOL(kvmppc_core_queue_syscall);
 
 void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
 	/* might as well deliver this straight away */
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_PROGRAM, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_program);
 
 void kvmppc_core_queue_fpunavail(struct kvm_vcpu *vcpu)
 {
@@ -235,19 +231,16 @@ void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
 {
 	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_dec);
 
 int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
 {
 	return test_bit(BOOK3S_IRQPRIO_DECREMENTER, &vcpu->arch.pending_exceptions);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_pending_dec);
 
 void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu)
 {
 	kvmppc_book3s_dequeue_irqprio(vcpu, BOOK3S_INTERRUPT_DECREMENTER);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_dequeue_dec);
 
 void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
                                 struct kvm_interrupt *irq)
@@ -290,13 +283,11 @@ void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
 	kvmppc_set_dsisr(vcpu, flags);
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
 void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu, ulong flags)
 {
 	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_INST_STORAGE, flags);
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_queue_inst_storage);
 
 static int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu,
 					 unsigned int priority)
@@ -428,7 +419,6 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_core_prepare_to_enter);
 
 kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 			bool *writable)
@@ -454,7 +444,6 @@ kvm_pfn_t kvmppc_gpa_to_pfn(struct kvm_vcpu *vcpu, gpa_t gpa, bool writing,
 
 	return gfn_to_pfn_prot(vcpu->kvm, gfn, writing, writable);
 }
-EXPORT_SYMBOL_GPL(kvmppc_gpa_to_pfn);
 
 int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid,
 		 enum xlate_readwrite xlrw, struct kvmppc_pte *pte)
@@ -501,7 +490,6 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 	else
 		return EMULATE_AGAIN;
 }
-EXPORT_SYMBOL_GPL(kvmppc_load_last_inst);
 
 int kvmppc_subarch_vcpu_init(struct kvm_vcpu *vcpu)
 {
@@ -788,7 +776,6 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 {
 	vcpu->kvm->arch.kvm_ops->set_msr(vcpu, msr);
 }
-EXPORT_SYMBOL_GPL(kvmppc_set_msr);
 
 int kvmppc_vcpu_run(struct kvm_vcpu *vcpu)
 {
@@ -964,7 +951,6 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_load);
 
 int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
 {
@@ -1004,7 +990,6 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_logical_ci_store);
 
 int kvmppc_core_check_processor_compat(void)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..f354ccfed56d 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -81,7 +81,6 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(__kvmhv_copy_tofrom_guest_radix);
 
 static long kvmhv_copy_tofrom_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr,
 					  void *to, void *from, unsigned long n)
@@ -117,14 +116,12 @@ long kvmhv_copy_from_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *to,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_from_guest_radix);
 
 long kvmhv_copy_to_guest_radix(struct kvm_vcpu *vcpu, gva_t eaddr, void *from,
 			       unsigned long n)
 {
 	return kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, NULL, from, n);
 }
-EXPORT_SYMBOL_GPL(kvmhv_copy_to_guest_radix);
 
 int kvmppc_mmu_walk_radix_tree(struct kvm_vcpu *vcpu, gva_t eaddr,
 			       struct kvmppc_pte *gpte, u64 root,
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 8da93fdfa59e..5e717512d9c4 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -606,7 +606,6 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce);
 
 long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
@@ -703,7 +702,6 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
 
 long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 		unsigned long liobn, unsigned long ioba,
@@ -752,4 +750,3 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce);
diff --git a/arch/powerpc/kvm/book3s_rtas.c b/arch/powerpc/kvm/book3s_rtas.c
index 0f847f1e5ddd..91094dfb9bc4 100644
--- a/arch/powerpc/kvm/book3s_rtas.c
+++ b/arch/powerpc/kvm/book3s_rtas.c
@@ -294,7 +294,6 @@ int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu)
 	 */
 	return rc;
 }
-EXPORT_SYMBOL_GPL(kvmppc_rtas_hcall);
 
 void kvmppc_rtas_tokens_free(struct kvm *kvm)
 {
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 303e3cb096db..234a80c958f1 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -870,7 +870,6 @@ int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 
 	return H_SUCCESS;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_rm_complete);
 
 int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 {
@@ -917,7 +916,6 @@ int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 
 	return rc;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_hcall);
 
 
 /* -- Initialisation code etc. -- */
@@ -1498,7 +1496,6 @@ void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long irq,
 	ics->irq_state[idx].host_irq = host_irq;
 	ics->irq_state[idx].intr_cpu = -1;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_set_mapped);
 
 void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq,
 			    unsigned long host_irq)
@@ -1513,4 +1510,3 @@ void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq,
 
 	ics->irq_state[idx].host_irq = 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xics_clr_mapped);
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 8cfab3547494..77b350805649 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -126,7 +126,6 @@ void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu)
 			vcpu->arch.xive_esc_on = 0;
 	}
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_push_vcpu);
 
 /*
  * Pull a vcpu's context from the XIVE on guest exit.
@@ -157,7 +156,6 @@ void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu)
 	vcpu->arch.xive_pushed = 0;
 	eieio();
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_pull_vcpu);
 
 void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
 {
@@ -191,7 +189,6 @@ void kvmppc_xive_rearm_escalation(struct kvm_vcpu *vcpu)
 	}
 	mb();
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_rearm_escalation);
 
 /*
  * This is a simple trigger for a generic XIVE IRQ. This must
@@ -1016,7 +1013,6 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_set_mapped);
 
 int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 			   struct irq_desc *host_desc)
@@ -1097,7 +1093,6 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_clr_mapped);
 
 void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu)
 {
@@ -2169,7 +2164,6 @@ int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
 
 	return H_UNSUPPORTED;
 }
-EXPORT_SYMBOL_GPL(kvmppc_xive_xics_hcall);
 
 int kvmppc_xive_debug_show_queues(struct seq_file *m, struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index ee1147c98cd8..468efa502d56 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -304,4 +304,3 @@ int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 
 	return emulated;
 }
-EXPORT_SYMBOL_GPL(kvmppc_emulate_instruction);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b9..b3a8853bf6ba 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -41,9 +41,7 @@
 #include "trace.h"
 
 struct kvmppc_ops *kvmppc_hv_ops;
-EXPORT_SYMBOL_GPL(kvmppc_hv_ops);
 struct kvmppc_ops *kvmppc_pr_ops;
-EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
 
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
@@ -135,7 +133,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_prepare_to_enter);
 
 #if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_PR_POSSIBLE)
 static void kvmppc_swab_shared(struct kvm_vcpu *vcpu)
@@ -248,7 +245,6 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_kvm_pv);
 
 int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
 {
@@ -277,7 +273,6 @@ int kvmppc_sanity_check(struct kvm_vcpu *vcpu)
 	vcpu->arch.sane = r;
 	return r ? 0 : -EINVAL;
 }
-EXPORT_SYMBOL_GPL(kvmppc_sanity_check);
 
 int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 {
@@ -319,7 +314,6 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvmppc_emulate_mmio);
 
 int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 	      bool data)
@@ -362,7 +356,6 @@ int kvmppc_st(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 
 	return EMULATE_DONE;
 }
-EXPORT_SYMBOL_GPL(kvmppc_st);
 
 int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 		      bool data)
@@ -411,7 +404,6 @@ int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 
 	return EMULATE_DONE;
 }
-EXPORT_SYMBOL_GPL(kvmppc_ld);
 
 int kvm_arch_hardware_enable(void)
 {
@@ -1281,7 +1273,6 @@ int kvmppc_handle_load(struct kvm_vcpu *vcpu,
 {
 	return __kvmppc_handle_load(vcpu, rt, bytes, is_default_endian, 0);
 }
-EXPORT_SYMBOL_GPL(kvmppc_handle_load);
 
 /* Same as above, but sign extends */
 int kvmppc_handle_loads(struct kvm_vcpu *vcpu,
@@ -1378,7 +1369,6 @@ int kvmppc_handle_store(struct kvm_vcpu *vcpu,
 
 	return EMULATE_DO_MMIO;
 }
-EXPORT_SYMBOL_GPL(kvmppc_handle_store);
 
 #ifdef CONFIG_VSX
 static inline int kvmppc_get_vsr_data(struct kvm_vcpu *vcpu, int rs, u64 *val)
@@ -2478,26 +2468,22 @@ long kvmppc_alloc_lpid(void)
 
 	return lpid;
 }
-EXPORT_SYMBOL_GPL(kvmppc_alloc_lpid);
 
 void kvmppc_claim_lpid(long lpid)
 {
 	set_bit(lpid, lpid_inuse);
 }
-EXPORT_SYMBOL_GPL(kvmppc_claim_lpid);
 
 void kvmppc_free_lpid(long lpid)
 {
 	clear_bit(lpid, lpid_inuse);
 }
-EXPORT_SYMBOL_GPL(kvmppc_free_lpid);
 
 void kvmppc_init_lpid(unsigned long nr_lpids_param)
 {
 	nr_lpids = min_t(unsigned long, KVMPPC_NR_LPIDS, nr_lpids_param);
 	memset(lpid_inuse, 0, sizeof(lpid_inuse));
 }
-EXPORT_SYMBOL_GPL(kvmppc_init_lpid);
 
 int kvm_arch_init(void *opaque)
 {
-- 
2.29.2


^ permalink raw reply related

* [PATCH 4/5] KVM: PPC: Book3S: Unify kvm-hv and kvm-pr modules
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

Our three virtualization modules (kvm, kvm-hv, kvm-pr) can be
loaded/unloaded in such a way that could leave kvm.ko loaded without
kvm-hv.ko or kvm-pr.ko. That means the userspace could continue to
issue ioctls to KVM while there is no code on our side to service
them.

We currently select at config time which module will be built, either
kvm-hv, kvm-pr or both, with the kvm module being always present. For
32 bits the kvm module is the only one present and contains the code
from kvm-pr in it. We can do the same for 64 bit and keep hv and pr
inside kvm.ko.

With this, we do not lose the ability of running two guests at the
same time, each using a different implementation (although only POWER
bare-metal with Hash MMU supports HV and PR at the same time).

We lose the ability of loading and unloading the code, which means any
Linux installation that wants to support *both* KVM-HV guests on POWER
bare-metal and KVM-PR nested guests on top of PowerVM would have a
binary with a larger memory footprint (assuming PR is only used when
HV is not present).

This patch alters the build to output only one module at all times and
relies on the Kconfig to select which implementation will be present.

The module init code was rearranged to initialize and cleanup both
implementations or only one of them.

The Kconfig was altered to provide one boolean for each implementation
(KVM_BOOK3S_PR_POSSIBLE, KVM_BOOK3S_HV_POSSIBLE), while keeping the
old tristate for the kvm.ko module (KVM_BOOK3S_64). The old
KVM_BOOK3S_32 already selects KVM_BOOK3S_PR_POSSIBLE, so nothing
changes there.

Two module aliases were created to facilitate the introduction of the
new scheme so `modprobe kvm-hv`and `modprobe kvm-pr` are the same as
`modprobe kvm`.

The license macro was removed because it is already included by
virt/kvm/kvm_main.c.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>

---
Here's a summary of the environments that can run KVM:

* 64 bit powernv w/Radix              KVM-HV
* 64 bit powernv w/Hash               KVM-HV        KVM-PR
  32 bit powernv w/Hash                             KVM-PR
  32 bit powernv w/Radix              N/A
* 64 bit pseries w/Radix on powernv   KVM-HV nested
* 64 bit pseries w/Hash  on powernv                 KVM-PR nested
  32 bit pseries w/Hash  on powernv                 KVM-PR nested
* 64 bit pseries w/Radix on PowerVM   N/A
* 64 bit pseries w/Hash  on PowerVM                 KVM-PR nested

Lines with an asterisk were tested with all combinations for the
module and the HV/PR configs.
---
 arch/powerpc/configs/powernv_defconfig |  2 +-
 arch/powerpc/configs/ppc64_defconfig   |  2 +-
 arch/powerpc/configs/pseries_defconfig |  2 +-
 arch/powerpc/kvm/Kconfig               | 72 ++++++++++++--------------
 arch/powerpc/kvm/Makefile              | 11 ++--
 arch/powerpc/kvm/book3s.c              | 46 +++++++++++++---
 arch/powerpc/kvm/book3s.h              | 19 +++++++
 arch/powerpc/kvm/book3s_hv.c           | 10 +---
 arch/powerpc/kvm/book3s_pr.c           | 13 -----
 kernel/irq/irqdesc.c                   |  2 +-
 10 files changed, 105 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index 8bfeea6c7de7..853b95685d9f 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -341,7 +341,7 @@ CONFIG_CRYPTO_DEV_NX=y
 CONFIG_CRYPTO_DEV_VMX=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index 0ad2291337a7..46ace457def6 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -61,7 +61,7 @@ CONFIG_PCCARD=y
 CONFIG_ELECTRA_CF=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_KPROBES=y
 CONFIG_JUMP_LABEL=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index b183629f1bcf..6804b1a6bef1 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -321,7 +321,7 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_CRYPTO_DEV_VMX=y
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
-CONFIG_KVM_BOOK3S_64_HV=m
+CONFIG_KVM_BOOK3S_HV_POSSIBLE=y
 CONFIG_VHOST_NET=m
 CONFIG_PRINTK_TIME=y
 CONFIG_PRINTK_CALLER=y
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index e45644657d49..2d080f57ce3b 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -41,12 +41,43 @@ config KVM_BOOK3S_64_HANDLER
 	select PPC_DAWR_FORCE_ENABLE
 
 config KVM_BOOK3S_PR_POSSIBLE
-	bool
+	bool "KVM support without using hypervisor mode in host"
+	depends on KVM_BOOK3S_64 || KVM_BOOK3S_32
 	select KVM_MMIO
 	select MMU_NOTIFIER
+	help
+	  Support running guest kernels in virtual machines on processors
+	  without using hypervisor mode in the host, by running the
+	  guest in user mode (problem state) and emulating all
+	  privileged instructions and registers.
+
+	  This is not as fast as using hypervisor mode, but works on
+	  where hypervisor mode is not available or not usable,
+	  and can emulate processors that are different from the host
+	  processor, including emulating 32-bit processors on a 64-bit
+	  host.
+
 
 config KVM_BOOK3S_HV_POSSIBLE
-	bool
+	bool "KVM for POWER7 and later using hypervisor mode in host"
+	depends on KVM_BOOK3S_64
+	select MMU_NOTIFIER
+	select CMA
+	help
+	  Support running unmodified book3s_64 guest kernels in
+	  virtual machines on POWER7 and newer processors that have
+	  hypervisor mode available to the host.
+
+	  If you say Y here, KVM will use the hardware virtualization
+	  facilities of POWER7 (and later) processors, meaning that
+	  guest operating systems will run at full hardware speed
+	  using supervisor and user modes.  However, this also means
+	  that KVM is not usable under PowerVM (pHyp), is only usable
+	  on POWER7 or later processors, and cannot emulate a
+	  different processor from the host processor.
+
+	  If unsure, say N.
+
 
 config KVM_BOOK3S_32
 	tristate "KVM support for PowerPC book3s_32 processors"
@@ -80,43 +111,6 @@ config KVM_BOOK3S_64
 
 	  If unsure, say N.
 
-config KVM_BOOK3S_64_HV
-	tristate "KVM for POWER7 and later using hypervisor mode in host"
-	depends on KVM_BOOK3S_64 && PPC_POWERNV
-	select KVM_BOOK3S_HV_POSSIBLE
-	select MMU_NOTIFIER
-	select CMA
-	help
-	  Support running unmodified book3s_64 guest kernels in
-	  virtual machines on POWER7 and newer processors that have
-	  hypervisor mode available to the host.
-
-	  If you say Y here, KVM will use the hardware virtualization
-	  facilities of POWER7 (and later) processors, meaning that
-	  guest operating systems will run at full hardware speed
-	  using supervisor and user modes.  However, this also means
-	  that KVM is not usable under PowerVM (pHyp), is only usable
-	  on POWER7 or later processors, and cannot emulate a
-	  different processor from the host processor.
-
-	  If unsure, say N.
-
-config KVM_BOOK3S_64_PR
-	tristate "KVM support without using hypervisor mode in host"
-	depends on KVM_BOOK3S_64
-	select KVM_BOOK3S_PR_POSSIBLE
-	help
-	  Support running guest kernels in virtual machines on processors
-	  without using hypervisor mode in the host, by running the
-	  guest in user mode (problem state) and emulating all
-	  privileged instructions and registers.
-
-	  This is not as fast as using hypervisor mode, but works on
-	  machines where hypervisor mode is not available or not usable,
-	  and can emulate processors that are different from the host
-	  processor, including emulating 32-bit processors on a 64-bit
-	  host.
-
 config KVM_BOOK3S_HV_EXIT_TIMING
 	bool "Detailed timing for hypervisor real-mode code"
 	depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 583c14ef596e..704380065df3 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -108,6 +108,14 @@ kvm-book3s_64-module-objs := \
 	book3s_rtas.o \
 	$(kvm-book3s_64-objs-y)
 
+ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+kvm-book3s_64-module-objs += $(kvm-hv-y)
+endif
+
+ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
+kvm-book3s_64-module-objs += $(kvm-pr-y)
+endif
+
 kvm-objs-$(CONFIG_KVM_BOOK3S_64) := $(kvm-book3s_64-module-objs)
 
 kvm-book3s_32-objs := \
@@ -134,7 +142,4 @@ obj-$(CONFIG_KVM_E500MC) += kvm.o
 obj-$(CONFIG_KVM_BOOK3S_64) += kvm.o
 obj-$(CONFIG_KVM_BOOK3S_32) += kvm.o
 
-obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
-obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
-
 obj-y += $(kvm-book3s_64-builtin-objs-y)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..c381bb17b0f9 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -1065,6 +1065,24 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
 
 #endif /* CONFIG_KVM_XICS */
 
+static int kvmppc_init(void)
+{
+	int r;
+
+	r = kvmppc_book3s_init_hv();
+	if (r)
+		pr_err("KVM-HV: could not load (%d)\n", r);
+
+	r = kvmppc_book3s_init_pr();
+	if (r)
+		pr_err("KVM-PR: could not load (%d)\n", r);
+
+	if (!kvmppc_hv_ops && !kvmppc_pr_ops)
+		return -EINVAL;
+
+	return 0;
+}
+
 static int kvmppc_book3s_init(void)
 {
 	int r;
@@ -1072,9 +1090,10 @@ static int kvmppc_book3s_init(void)
 	r = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
 	if (r)
 		return r;
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
-	r = kvmppc_book3s_init_pr();
-#endif
+
+	r = kvmppc_init();
+	if (r)
+		goto exit;
 
 #ifdef CONFIG_KVM_XICS
 #ifdef CONFIG_KVM_XIVE
@@ -1087,22 +1106,35 @@ static int kvmppc_book3s_init(void)
 #endif
 		kvm_register_device_ops(&kvm_xics_ops, KVM_DEV_TYPE_XICS);
 #endif
+	return 0;
+
+exit:
+	kvm_exit();
 	return r;
 }
 
 static void kvmppc_book3s_exit(void)
 {
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 	kvmppc_book3s_exit_pr();
-#endif
+	kvmppc_book3s_exit_hv();
 	kvm_exit();
 }
 
 module_init(kvmppc_book3s_init);
 module_exit(kvmppc_book3s_exit);
 
-/* On 32bit this is our one and only kernel module */
-#ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 MODULE_ALIAS_MISCDEV(KVM_MINOR);
 MODULE_ALIAS("devname:kvm");
+
+/*
+ * Whether we use KVM-HV or KVM-PR is dependent exclusively on the
+ * config options. These aliases are here only to ease the transition
+ * to the one module model we have now.
+ */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+MODULE_ALIAS("kvm-hv");
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
+MODULE_ALIAS("kvm-pr");
 #endif
diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h
index 740e51def5a5..d21772904971 100644
--- a/arch/powerpc/kvm/book3s.h
+++ b/arch/powerpc/kvm/book3s.h
@@ -22,8 +22,27 @@ extern int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu,
 					int sprn, ulong spr_val);
 extern int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu,
 					int sprn, ulong *spr_val);
+#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 extern int kvmppc_book3s_init_pr(void);
 extern void kvmppc_book3s_exit_pr(void);
+#else
+static inline int kvmppc_book3s_init_pr(void)
+{
+	return 0;
+}
+static inline void kvmppc_book3s_exit_pr(void) {}
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+extern int kvmppc_book3s_init_hv(void);
+extern void kvmppc_book3s_exit_hv(void);
+#else
+static inline int kvmppc_book3s_init_hv(void)
+{
+	return 0;
+}
+static inline void kvmppc_book3s_exit_hv(void) {}
+#endif
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 extern void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a7b82eb438f5..6ba9545ef58e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5941,7 +5941,7 @@ static int kvmppc_radix_possible(void)
 	return cpu_has_feature(CPU_FTR_ARCH_300) && radix_enabled();
 }
 
-static int kvmppc_book3s_init_hv(void)
+int kvmppc_book3s_init_hv(void)
 {
 	int r;
 
@@ -6018,7 +6018,7 @@ static int kvmppc_book3s_init_hv(void)
 	return r;
 }
 
-static void kvmppc_book3s_exit_hv(void)
+void kvmppc_book3s_exit_hv(void)
 {
 	kvmppc_uvmem_free();
 	kvmppc_free_host_rm_ops();
@@ -6027,9 +6027,3 @@ static void kvmppc_book3s_exit_hv(void)
 	kvmppc_hv_ops = NULL;
 	kvmhv_nested_exit();
 }
-
-module_init(kvmppc_book3s_init_hv);
-module_exit(kvmppc_book3s_exit_hv);
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_MISCDEV(KVM_MINOR);
-MODULE_ALIAS("devname:kvm");
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6bc9425acb32..1d017c4b3eb4 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -2100,16 +2100,3 @@ void kvmppc_book3s_exit_pr(void)
 	kvmppc_pr_ops = NULL;
 	kvmppc_mmu_hpte_sysexit();
 }
-
-/*
- * We only support separate modules for book3s 64
- */
-#ifdef CONFIG_PPC_BOOK3S_64
-
-module_init(kvmppc_book3s_init_pr);
-module_exit(kvmppc_book3s_exit_pr);
-
-MODULE_LICENSE("GPL");
-MODULE_ALIAS_MISCDEV(KVM_MINOR);
-MODULE_ALIAS("devname:kvm");
-#endif
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 4e3c29bb603c..7c9c9e794c01 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -352,7 +352,7 @@ struct irq_desc *irq_to_desc(unsigned int irq)
 {
 	return radix_tree_lookup(&irq_desc_tree, irq);
 }
-#ifdef CONFIG_KVM_BOOK3S_64_HV_MODULE
+#ifdef CONFIG_KVM_BOOK3S_64_MODULE
 EXPORT_SYMBOL_GPL(irq_to_desc);
 #endif
 
-- 
2.29.2


^ permalink raw reply related

* [PATCH 3/5] KVM: PPC: Book3S HV: Free allocated memory if module init fails
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david
In-Reply-To: <20210901173357.3183658-1-farosas@linux.ibm.com>

The module's exit function is not called when the init fails, we need
to do cleanup before returning.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fe20074faa61..a7b82eb438f5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5963,7 +5963,7 @@ static int kvmppc_book3s_init_hv(void)
 
 	r = kvm_init_subcore_bitmap();
 	if (r)
-		return r;
+		goto err;
 
 	/*
 	 * We need a way of accessing the XICS interrupt controller,
@@ -5978,7 +5978,8 @@ static int kvmppc_book3s_init_hv(void)
 		np = of_find_compatible_node(NULL, NULL, "ibm,opal-intc");
 		if (!np) {
 			pr_err("KVM-HV: Cannot determine method for accessing XICS\n");
-			return -ENODEV;
+			r = -ENODEV;
+			goto err;
 		}
 		/* presence of intc confirmed - node can be dropped again */
 		of_node_put(np);
@@ -5991,12 +5992,12 @@ static int kvmppc_book3s_init_hv(void)
 
 	r = kvmppc_mmu_hv_init();
 	if (r)
-		return r;
+		goto err;
 
 	if (kvmppc_radix_possible()) {
 		r = kvmppc_radix_init();
 		if (r)
-			return r;
+			goto err;
 	}
 
 	r = kvmppc_uvmem_init();
@@ -6009,6 +6010,12 @@ static int kvmppc_book3s_init_hv(void)
 	kvmppc_hv_ops = &kvm_ops_hv;
 
 	return 0;
+
+err:
+	kvmhv_nested_exit();
+	kvmppc_radix_exit();
+
+	return r;
 }
 
 static void kvmppc_book3s_exit_hv(void)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 0/5] KVM: PPC: Book3S: Modules cleanup and unification
From: Fabiano Rosas @ 2021-09-01 17:33 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, npiggin, david

This series merges our three kvm modules kvm.ko, kvm-hv.ko and
kvm-pr.ko into one kvm.ko module.

The main reason for this is to deal with the issue that kvm.ko can be
loaded on its own without any of the other modules present. This can
happen if one or both of the modules fail to init or if the user loads
kvm.ko only.

With only kvm.ko loaded, the userspace can call any of the KVM ioctls
which will fail more or less gracefully depending on what kind of
verification we do in powerpc.c.

Instead of adding a check to every entry point or finding a hack to
link the modules so that when one fails (hv/pr), the other (kvm)
exits, I think it is cleaner to just make them all a single module.

The two KVM implementations are already selected by Kconfig options,
so the only thing that changes is that they are now in the same
module. I also kept kvm-hv and kvm-pr as aliases to kvm, so that
people don't get too surprised with the change.

There is a possible issue with the larger module size for kernel
builds that should support both HV-only and PR-only environments, but
PR is usually not used in production so I'm not sure if that is a real
issue.

Patches 1,2,3 are standalone cleanups.
Patches 4,5 are the unification work.

Fabiano Rosas (5):
  KVM: PPC: Book3S HV: Check return value of kvmppc_radix_init
  KVM: PPC: Book3S HV: Delay setting of kvm ops
  KVM: PPC: Book3S HV: Free allocated memory if module init fails
  KVM: PPC: Book3S: Unify kvm-hv and kvm-pr modules
  KVM: PPC: Book3S: Stop exporting non-builtin symbols

 arch/powerpc/configs/powernv_defconfig |  2 +-
 arch/powerpc/configs/ppc64_defconfig   |  2 +-
 arch/powerpc/configs/pseries_defconfig |  2 +-
 arch/powerpc/kvm/Kconfig               | 72 ++++++++++++--------------
 arch/powerpc/kvm/Makefile              | 11 ++--
 arch/powerpc/kvm/book3s.c              | 61 ++++++++++++++--------
 arch/powerpc/kvm/book3s.h              | 19 +++++++
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  3 --
 arch/powerpc/kvm/book3s_64_vio.c       |  3 --
 arch/powerpc/kvm/book3s_hv.c           | 38 ++++++++------
 arch/powerpc/kvm/book3s_pr.c           | 13 -----
 arch/powerpc/kvm/book3s_rtas.c         |  1 -
 arch/powerpc/kvm/book3s_xics.c         |  4 --
 arch/powerpc/kvm/book3s_xive.c         |  6 ---
 arch/powerpc/kvm/emulate.c             |  1 -
 arch/powerpc/kvm/powerpc.c             | 14 -----
 kernel/irq/irqdesc.c                   |  2 +-
 17 files changed, 125 insertions(+), 129 deletions(-)

-- 
2.29.2


^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Christophe Leroy @ 2021-09-01 17:21 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Eirik Fuller
In-Reply-To: <20210901165418.1412891-1-npiggin@gmail.com>



Le 01/09/2021 à 18:54, Nicholas Piggin a écrit :
> If a system call is made with a transaction active, the kernel
> immediately aborts it and returns. scv system calls disable irqs even
> earlier in their interrupt handler, and tabort_syscall does not fix this
> up.
> 
> This can result in irq soft-mask state being messed up on the next
> kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
> the kernel exit handlers, or possibly worse.
> 
> Fix this by having tabort_syscall setting irq soft-mask back to enabled
> (which requires MSR[EE] be disabled first).
> 
> Reported-by: Eirik Fuller <efuller@redhat.com>
> Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> 
> Tested the wrong kernel before sending v1 and missed a bug, sorry.
> 
>   arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
> index d4212d2ff0b5..9c31d65b4851 100644
> --- a/arch/powerpc/kernel/interrupt_64.S
> +++ b/arch/powerpc/kernel/interrupt_64.S
> @@ -428,16 +428,22 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   tabort_syscall:
>   _ASM_NOKPROBE_SYMBOL(tabort_syscall)
> -	/* Firstly we need to enable TM in the kernel */
> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>   	mfmsr	r10
>   	li	r9, 1
>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> +	andc	r10, r10, r9

Why not use 'rlwinm' to mask out MSR_EE ?

Something like

	rlwinm	r10, r10, 0, ~MSR_EE

>   	mtmsrd	r10, 0
>   
>   	/* tabort, this dooms the transaction, nothing else */
>   	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
>   	TABORT(R9)
>   
> +	/* scv has disabled irqs so must re-enable. sc just remains enabled */
> +	li	r9,IRQS_ENABLED
> +	stb	r9,PACAIRQSOFTMASK(r13)
> +
>   	/*
>   	 * Return directly to userspace. We have corrupted user register state,
>   	 * but userspace will never see that register state. Execution will
> 

^ permalink raw reply

* Re: [PATCH v2 2/2] selftests/powerpc: Add scv versions of the basic TM syscall tests
From: Christophe Leroy @ 2021-09-01 17:15 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Eirik Fuller
In-Reply-To: <20210901165418.1412891-2-npiggin@gmail.com>



Le 01/09/2021 à 18:54, Nicholas Piggin a écrit :
> The basic TM vs syscall test code hard codes an sc instruction for the
> system call, which fails to cover scv even when the userspace libc has
> support for it.
> 
> Duplicate the tests with hard coded scv variants so both are tested
> when possible.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   .../selftests/powerpc/tm/tm-syscall-asm.S     | 46 +++++++++++++++++++
>   .../testing/selftests/powerpc/tm/tm-syscall.c | 36 ++++++++++++---
>   2 files changed, 75 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
> index bd1ca25febe4..849316831e6a 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
> +++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
> @@ -2,6 +2,10 @@
>   #include <ppc-asm.h>
>   #include <asm/unistd.h>
>   
> +/* ppc-asm.h does not define r0 or r1 */
> +#define r0 0
> +#define r1 1
> +

See https://github.com/gcc-mirror/gcc/blob/master/gcc/config/rs6000/ppc-asm.h

It doesn't not define r1 but it defines r0.

And it defines 'sp' as register 1.

>   	.text
>   FUNC_START(getppid_tm_active)
>   	tbegin.
> @@ -26,3 +30,45 @@ FUNC_START(getppid_tm_suspended)
>   1:
>   	li	r3, -1
>   	blr
> +
> +FUNC_START(getppid_scv_tm_active)
> +	mflr	r0
> +	std	r0,16(r1)
> +	stdu	r1,-32(r1)
> +	tbegin.
> +	beq 1f
> +	li	r0, __NR_getppid
> +	scv	0
> +	tend.
> +	addi	r1,r1,32
> +	ld	r0,16(r1)
> +	mtlr	r0
> +	blr
> +1:
> +	li	r3, -1
> +	addi	r1,r1,32
> +	ld	r0,16(r1)
> +	mtlr	r0
> +	blr
> +
> +FUNC_START(getppid_scv_tm_suspended)
> +	mflr	r0
> +	std	r0,16(r1)
> +	stdu	r1,-32(r1)
> +	tbegin.
> +	beq 1f
> +	li	r0, __NR_getppid
> +	tsuspend.
> +	scv	0
> +	tresume.
> +	tend.
> +	addi	r1,r1,32
> +	ld	r0,16(r1)
> +	mtlr	r0
> +	blr
> +1:
> +	li	r3, -1
> +	addi	r1,r1,32
> +	ld	r0,16(r1)
> +	mtlr	r0
> +	blr
> diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c
> index becb8207b432..9a822208680e 100644
> --- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
> +++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
> @@ -19,24 +19,37 @@
>   #include "utils.h"
>   #include "tm.h"
>   
> +#ifndef PPC_FEATURE2_SCV
> +#define PPC_FEATURE2_SCV               0x00100000 /* scv syscall */
> +#endif
> +
>   extern int getppid_tm_active(void);
>   extern int getppid_tm_suspended(void);
> +extern int getppid_scv_tm_active(void);
> +extern int getppid_scv_tm_suspended(void);
>   
>   unsigned retries = 0;
>   
>   #define TEST_DURATION 10 /* seconds */
>   #define TM_RETRIES 100
>   
> -pid_t getppid_tm(bool suspend)
> +pid_t getppid_tm(bool scv, bool suspend)
>   {
>   	int i;
>   	pid_t pid;
>   
>   	for (i = 0; i < TM_RETRIES; i++) {
> -		if (suspend)
> -			pid = getppid_tm_suspended();
> -		else
> -			pid = getppid_tm_active();
> +		if (suspend) {
> +			if (scv)
> +				pid = getppid_scv_tm_suspended();
> +			else
> +				pid = getppid_tm_suspended();
> +		} else {
> +			if (scv)
> +				pid = getppid_scv_tm_active();
> +			else
> +				pid = getppid_tm_active();
> +		}
>   
>   		if (pid >= 0)
>   			return pid;
> @@ -82,15 +95,24 @@ int tm_syscall(void)
>   		 * Test a syscall within a suspended transaction and verify
>   		 * that it succeeds.
>   		 */
> -		FAIL_IF(getppid_tm(true) == -1); /* Should succeed. */
> +		FAIL_IF(getppid_tm(false, true) == -1); /* Should succeed. */
>   
>   		/*
>   		 * Test a syscall within an active transaction and verify that
>   		 * it fails with the correct failure code.
>   		 */
> -		FAIL_IF(getppid_tm(false) != -1);  /* Should fail... */
> +		FAIL_IF(getppid_tm(false, false) != -1);  /* Should fail... */
>   		FAIL_IF(!failure_is_persistent()); /* ...persistently... */
>   		FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
> +
> +		/* Now do it all again with scv if it is available. */
> +		if (have_hwcap2(PPC_FEATURE2_SCV)) {
> +			FAIL_IF(getppid_tm(true, true) == -1); /* Should succeed. */
> +			FAIL_IF(getppid_tm(true, false) != -1);  /* Should fail... */
> +			FAIL_IF(!failure_is_persistent()); /* ...persistently... */
> +			FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
> +		}
> +
>   		gettimeofday(&now, 0);
>   	}
>   
> 

^ permalink raw reply

* [PATCH v2 2/2] selftests/powerpc: Add scv versions of the basic TM syscall tests
From: Nicholas Piggin @ 2021-09-01 16:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Eirik Fuller, Nicholas Piggin
In-Reply-To: <20210901165418.1412891-1-npiggin@gmail.com>

The basic TM vs syscall test code hard codes an sc instruction for the
system call, which fails to cover scv even when the userspace libc has
support for it.

Duplicate the tests with hard coded scv variants so both are tested
when possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 .../selftests/powerpc/tm/tm-syscall-asm.S     | 46 +++++++++++++++++++
 .../testing/selftests/powerpc/tm/tm-syscall.c | 36 ++++++++++++---
 2 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
index bd1ca25febe4..849316831e6a 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
@@ -2,6 +2,10 @@
 #include <ppc-asm.h>
 #include <asm/unistd.h>
 
+/* ppc-asm.h does not define r0 or r1 */
+#define r0 0
+#define r1 1
+
 	.text
 FUNC_START(getppid_tm_active)
 	tbegin.
@@ -26,3 +30,45 @@ FUNC_START(getppid_tm_suspended)
 1:
 	li	r3, -1
 	blr
+
+FUNC_START(getppid_scv_tm_active)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,-32(r1)
+	tbegin.
+	beq 1f
+	li	r0, __NR_getppid
+	scv	0
+	tend.
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+1:
+	li	r3, -1
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+
+FUNC_START(getppid_scv_tm_suspended)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,-32(r1)
+	tbegin.
+	beq 1f
+	li	r0, __NR_getppid
+	tsuspend.
+	scv	0
+	tresume.
+	tend.
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+1:
+	li	r3, -1
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index becb8207b432..9a822208680e 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -19,24 +19,37 @@
 #include "utils.h"
 #include "tm.h"
 
+#ifndef PPC_FEATURE2_SCV
+#define PPC_FEATURE2_SCV               0x00100000 /* scv syscall */
+#endif
+
 extern int getppid_tm_active(void);
 extern int getppid_tm_suspended(void);
+extern int getppid_scv_tm_active(void);
+extern int getppid_scv_tm_suspended(void);
 
 unsigned retries = 0;
 
 #define TEST_DURATION 10 /* seconds */
 #define TM_RETRIES 100
 
-pid_t getppid_tm(bool suspend)
+pid_t getppid_tm(bool scv, bool suspend)
 {
 	int i;
 	pid_t pid;
 
 	for (i = 0; i < TM_RETRIES; i++) {
-		if (suspend)
-			pid = getppid_tm_suspended();
-		else
-			pid = getppid_tm_active();
+		if (suspend) {
+			if (scv)
+				pid = getppid_scv_tm_suspended();
+			else
+				pid = getppid_tm_suspended();
+		} else {
+			if (scv)
+				pid = getppid_scv_tm_active();
+			else
+				pid = getppid_tm_active();
+		}
 
 		if (pid >= 0)
 			return pid;
@@ -82,15 +95,24 @@ int tm_syscall(void)
 		 * Test a syscall within a suspended transaction and verify
 		 * that it succeeds.
 		 */
-		FAIL_IF(getppid_tm(true) == -1); /* Should succeed. */
+		FAIL_IF(getppid_tm(false, true) == -1); /* Should succeed. */
 
 		/*
 		 * Test a syscall within an active transaction and verify that
 		 * it fails with the correct failure code.
 		 */
-		FAIL_IF(getppid_tm(false) != -1);  /* Should fail... */
+		FAIL_IF(getppid_tm(false, false) != -1);  /* Should fail... */
 		FAIL_IF(!failure_is_persistent()); /* ...persistently... */
 		FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
+
+		/* Now do it all again with scv if it is available. */
+		if (have_hwcap2(PPC_FEATURE2_SCV)) {
+			FAIL_IF(getppid_tm(true, true) == -1); /* Should succeed. */
+			FAIL_IF(getppid_tm(true, false) != -1);  /* Should fail... */
+			FAIL_IF(!failure_is_persistent()); /* ...persistently... */
+			FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
+		}
+
 		gettimeofday(&now, 0);
 	}
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Nicholas Piggin @ 2021-09-01 16:54 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Eirik Fuller, Nicholas Piggin

If a system call is made with a transaction active, the kernel
immediately aborts it and returns. scv system calls disable irqs even
earlier in their interrupt handler, and tabort_syscall does not fix this
up.

This can result in irq soft-mask state being messed up on the next
kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
the kernel exit handlers, or possibly worse.

Fix this by having tabort_syscall setting irq soft-mask back to enabled
(which requires MSR[EE] be disabled first).

Reported-by: Eirik Fuller <efuller@redhat.com>
Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---

Tested the wrong kernel before sending v1 and missed a bug, sorry.

 arch/powerpc/kernel/interrupt_64.S | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index d4212d2ff0b5..9c31d65b4851 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -428,16 +428,22 @@ RESTART_TABLE(.Lsyscall_rst_start, .Lsyscall_rst_end, syscall_restart)
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
 tabort_syscall:
 _ASM_NOKPROBE_SYMBOL(tabort_syscall)
-	/* Firstly we need to enable TM in the kernel */
+	/* We need to enable TM in the kernel, and disable EE (for scv) */
 	mfmsr	r10
 	li	r9, 1
 	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
+	LOAD_REG_IMMEDIATE(r9, MSR_EE)
+	andc	r10, r10, r9
 	mtmsrd	r10, 0
 
 	/* tabort, this dooms the transaction, nothing else */
 	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
 	TABORT(R9)
 
+	/* scv has disabled irqs so must re-enable. sc just remains enabled */
+	li	r9,IRQS_ENABLED
+	stb	r9,PACAIRQSOFTMASK(r13)
+
 	/*
 	 * Return directly to userspace. We have corrupted user register state,
 	 * but userspace will never see that register state. Execution will
-- 
2.23.0


^ permalink raw reply related

* [PATCH v1 2/2] selftests/powerpc: Add scv versions of the basic TM syscall tests
From: Nicholas Piggin @ 2021-09-01 16:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Eirik Fuller, Nicholas Piggin
In-Reply-To: <20210901161810.1411015-1-npiggin@gmail.com>

The basic TM vs syscall test code hard codes an sc instruction for the
system call, which fails to cover scv even when the userspace libc has
support for it.

Duplicate the tests with hard coded scv variants so both are tested
when possible.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 .../selftests/powerpc/tm/tm-syscall-asm.S     | 46 +++++++++++++++++++
 .../testing/selftests/powerpc/tm/tm-syscall.c | 36 ++++++++++++---
 2 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
index bd1ca25febe4..849316831e6a 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall-asm.S
@@ -2,6 +2,10 @@
 #include <ppc-asm.h>
 #include <asm/unistd.h>
 
+/* ppc-asm.h does not define r0 or r1 */
+#define r0 0
+#define r1 1
+
 	.text
 FUNC_START(getppid_tm_active)
 	tbegin.
@@ -26,3 +30,45 @@ FUNC_START(getppid_tm_suspended)
 1:
 	li	r3, -1
 	blr
+
+FUNC_START(getppid_scv_tm_active)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,-32(r1)
+	tbegin.
+	beq 1f
+	li	r0, __NR_getppid
+	scv	0
+	tend.
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+1:
+	li	r3, -1
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+
+FUNC_START(getppid_scv_tm_suspended)
+	mflr	r0
+	std	r0,16(r1)
+	stdu	r1,-32(r1)
+	tbegin.
+	beq 1f
+	li	r0, __NR_getppid
+	tsuspend.
+	scv	0
+	tresume.
+	tend.
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
+1:
+	li	r3, -1
+	addi	r1,r1,32
+	ld	r0,16(r1)
+	mtlr	r0
+	blr
diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index becb8207b432..9a822208680e 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -19,24 +19,37 @@
 #include "utils.h"
 #include "tm.h"
 
+#ifndef PPC_FEATURE2_SCV
+#define PPC_FEATURE2_SCV               0x00100000 /* scv syscall */
+#endif
+
 extern int getppid_tm_active(void);
 extern int getppid_tm_suspended(void);
+extern int getppid_scv_tm_active(void);
+extern int getppid_scv_tm_suspended(void);
 
 unsigned retries = 0;
 
 #define TEST_DURATION 10 /* seconds */
 #define TM_RETRIES 100
 
-pid_t getppid_tm(bool suspend)
+pid_t getppid_tm(bool scv, bool suspend)
 {
 	int i;
 	pid_t pid;
 
 	for (i = 0; i < TM_RETRIES; i++) {
-		if (suspend)
-			pid = getppid_tm_suspended();
-		else
-			pid = getppid_tm_active();
+		if (suspend) {
+			if (scv)
+				pid = getppid_scv_tm_suspended();
+			else
+				pid = getppid_tm_suspended();
+		} else {
+			if (scv)
+				pid = getppid_scv_tm_active();
+			else
+				pid = getppid_tm_active();
+		}
 
 		if (pid >= 0)
 			return pid;
@@ -82,15 +95,24 @@ int tm_syscall(void)
 		 * Test a syscall within a suspended transaction and verify
 		 * that it succeeds.
 		 */
-		FAIL_IF(getppid_tm(true) == -1); /* Should succeed. */
+		FAIL_IF(getppid_tm(false, true) == -1); /* Should succeed. */
 
 		/*
 		 * Test a syscall within an active transaction and verify that
 		 * it fails with the correct failure code.
 		 */
-		FAIL_IF(getppid_tm(false) != -1);  /* Should fail... */
+		FAIL_IF(getppid_tm(false, false) != -1);  /* Should fail... */
 		FAIL_IF(!failure_is_persistent()); /* ...persistently... */
 		FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
+
+		/* Now do it all again with scv if it is available. */
+		if (have_hwcap2(PPC_FEATURE2_SCV)) {
+			FAIL_IF(getppid_tm(true, true) == -1); /* Should succeed. */
+			FAIL_IF(getppid_tm(true, false) != -1);  /* Should fail... */
+			FAIL_IF(!failure_is_persistent()); /* ...persistently... */
+			FAIL_IF(!failure_is_syscall());    /* ...with code syscall. */
+		}
+
 		gettimeofday(&now, 0);
 	}
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH v1 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Nicholas Piggin @ 2021-09-01 16:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Eirik Fuller, Nicholas Piggin

If a system call is made with a transaction active, the kernel
immediately aborts it and returns. scv system calls disable irqs even
earlier in their interrupt handler, and tabort_syscall does not fix this
up.

This can result in irq soft-mask state being messed up on the next
kernel entry, and crashing at BUG_ON(arch_irq_disabled_regs(regs)) in
the kernel exit handlers, or possibly worse.

Fix this by having tabort_syscall setting irq soft-mask back to enabled.

Reported-by: Eirik Fuller <efuller@redhat.com>
Fixes: 7fa95f9adaee7 ("powerpc/64s: system call support for scv/rfscv instructions")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/interrupt_64.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index d4212d2ff0b5..44f99df36fb2 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -438,6 +438,10 @@ _ASM_NOKPROBE_SYMBOL(tabort_syscall)
 	li	r9, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
 	TABORT(R9)
 
+	/* scv has disabled irqs so must re-enable. sc just remains enabled */
+	li	r9,IRQS_ENABLED
+	stb	r9,PACAIRQSOFTMASK(r13)
+
 	/*
 	 * Return directly to userspace. We have corrupted user register state,
 	 * but userspace will never see that register state. Execution will
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH kernel] KVM: PPC: Book3S HV: Make unique debugfs nodename
From: Fabiano Rosas @ 2021-09-01 15:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <87lf4gv0hf.fsf@linux.ibm.com>

Fabiano Rosas <farosas@linux.ibm.com> writes:

> That is why I mentioned creating a hook similar to
> kvm_create_vcpu_debugfs in the common KVM code. kvm_create_vm_debugfs or
> something.

s/kvm/kvm_arch/


^ permalink raw reply

* Re: [PATCH kernel] KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
From: Fabiano Rosas @ 2021-09-01 14:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc
In-Reply-To: <20210901084512.1658628-1-aik@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> The userspace can trigger "vmalloc size %lu allocation failure: exceeds
> total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.
>
> This silences the warning by checking the limit before calling vzalloc()
> and returns ENOMEM if failed.
>
> This does not call underlying valloc helpers as __vmalloc_node() is only
> exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is not
> exported at all.
>
> Spotted by syzkaller.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 474c0cfde384..a59f1cccbcf9 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4830,8 +4830,12 @@ static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
>  	unsigned long npages = mem->memory_size >> PAGE_SHIFT;
>
>  	if (change == KVM_MR_CREATE) {
> -		slot->arch.rmap = vzalloc(array_size(npages,
> -					  sizeof(*slot->arch.rmap)));
> +		unsigned long cb = array_size(npages, sizeof(*slot->arch.rmap));

What does cb mean?

> +
> +		if ((cb >> PAGE_SHIFT) > totalram_pages())
> +			return -ENOMEM;
> +
> +		slot->arch.rmap = vzalloc(cb);
>  		if (!slot->arch.rmap)
>  			return -ENOMEM;
>  	}

^ permalink raw reply

* Re: [PATCH] powerpc/bug: Cast to unsigned long before passing to inline asm
From: Segher Boessenkool @ 2021-09-01 14:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: nathan, linuxppc-dev
In-Reply-To: <87tuj43gu1.fsf@mpe.ellerman.id.au>

On Wed, Sep 01, 2021 at 05:17:26PM +1000, Michael Ellerman wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Tue, Aug 31, 2021 at 11:27:20PM +1000, Michael Ellerman wrote:
> >> Nathan filed an LLVM bug [2], in which Eli Friedman explained that "if
> >> you pass a value of a type that's narrower than a register to an inline
> >> asm, the high bits are undefined". In this case we are passing a bool
> >> to the inline asm, which is a single bit value, and so the compiler
> >> thinks it can leave the high bits of r30 unmasked, because only the
> >> value of bit 0 matters.
> >> 
> >> Because the inline asm compares the full width of the register (32 or
> >> 64-bit) we need to ensure the value passed to the inline asm has all
> >> bits defined. So fix it by casting to long.
> >> 
> >> We also already cast to long for the similar case in BUG_ENTRY(), which
> >> it turns out was added to fix a similar bug in 2005 in commit
> >> 32818c2eb6b8 ("[PATCH] ppc64: Fix issue with gcc 4.0 compiled kernels").
> >
> > That points to <https://gcc.gnu.org/PR23422>, which shows the correct
> > explanation.
> 
> That's a pity because I don't understand that explanation ^_^
> 
> Why does sign extension matter when we're comparing against zero?

The "td" insn wants a 64-bit quantity.  You have to provide one, the
compiler will not do an extend itself, it does not try to understand the
asm template in any way.

> > The code as it was did **not** pass a bool.  It perhaps passed an int
> > (so many macros in play, it is hard to tell for sure, but it is int or
> > long int, perhaps unsigned (which does not change anything here).
> 
> I don't understand that. It definitely is passing a bool at the source
> level. Are you saying it's getting promoted somehow?
> 
> It expands to:

<snip>

> And knode_dead() returns bool:
> 
>   static bool knode_dead(struct klist_node *knode)
>   {
>   	return (unsigned long)knode->n_klist & KNODE_DEAD;
>   }
> 
> So in my book that means the type there is bool. But I'm not a compiler
> guy so I guessing I'm missing something.

I was confused by all the macros and stuff.  And "bool" in the kernel
means "_Bool" now (so it is a character type, with GCC).

> > If this is not the correct explanation for LLVM, it needs to solve some
> > other bug.
> 
> OK. I just need to get this fixed in the kernel, so I'll do a new
> version with a changelog that is ~= "shrug not sure what's going on" and
> merge that. Then we can argue about what is really going on later :)

One thing you should probably do is never pass expressions as asm
operands that are "r".  Instead, make a temporary var and assign to that,
so it will have the type you want, without being able to forget to add
a cast :-)


Segher

^ permalink raw reply

* Re: [PATCH kernel] KVM: PPC: Book3S: Suppress failed alloc warning in H_COPY_TOFROM_GUEST
From: Fabiano Rosas @ 2021-09-01 14:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: Alexey Kardashevskiy, kvm-ppc
In-Reply-To: <20210901084550.1658699-1-aik@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> H_COPY_TOFROM_GUEST is an hcall for an upper level VM to access its nested
> VMs memory. The userspace can trigger WARN_ON_ONCE(!(gfp & __GFP_NOWARN))
> in __alloc_pages() by constructing a tiny VM which only does
> H_COPY_TOFROM_GUEST with a too big GPR9 (number of bytes to copy).
>
> This silences the warning by adding __GFP_NOWARN.
>
> Spotted by syzkaller.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>

>  arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
> index e57c08b968c0..a2e34efb8d31 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -580,7 +580,7 @@ long kvmhv_copy_tofrom_guest_nested(struct kvm_vcpu *vcpu)
>  	if (eaddr & (0xFFFUL << 52))
>  		return H_PARAMETER;
>
> -	buf = kzalloc(n, GFP_KERNEL);
> +	buf = kzalloc(n, GFP_KERNEL | __GFP_NOWARN);
>  	if (!buf)
>  		return H_NO_MEM;

^ permalink raw reply

* Re: [PATCH kernel] KVM: PPC: Book3S HV: Make unique debugfs nodename
From: Fabiano Rosas @ 2021-09-01 14:23 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <2fe01488-5a9b-785e-7c05-1d527dead18d@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 24/08/2021 18:37, Alexey Kardashevskiy wrote:
>> 
>> 
>> On 18/08/2021 08:20, Fabiano Rosas wrote:
>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>
>>>> On 07/07/2021 14:13, Alexey Kardashevskiy wrote:
>>>
>>>> alternatively move this debugfs stuff under the platform-independent
>>>> directory, how about that?
>>>
>>> That's a good idea. I only now realized we have two separate directories
>>> for the same guest:
>>>
>>> $ ls /sys/kernel/debug/kvm/ | grep $pid
>>> 19062-11
>>> vm19062
>>>
>>> Looks like we would have to implement kvm_arch_create_vcpu_debugfs for
>>> the vcpu information and add a similar hook for the vm.
>> 
>> Something like that. From the git history, it looks like the ppc folder 
>> was added first and then the generic kvm folder was added but apparently 
>> they did not notice the ppc one due to natural reasons :)
>> 
>> If you are not too busy, can you please merge the ppc one into the 
>> generic one and post the patch, so we won't need to fix these 
>> duplication warnings again? Thanks,
>
>
>
> Turns out it is not that straight forward as I thought as the common KVM 
> debugfs entry is created after PPC HV KVM created its own and there is 
> no obvious way to change the order (no "post init" hook in
> kvmppc_ops).

That is why I mentioned creating a hook similar to
kvm_create_vcpu_debugfs in the common KVM code. kvm_create_vm_debugfs or
something.

Alternatively, maybe kvm_create_vm_debugfs could be moved earlier into
kvm_create_vm, before kvm_arch_post_init_vm and we could move our code
into kvm_arch_post_init_vm.

>
> Also, unlike the common KVM debugfs setup, we do not allocate structures 
> to support debugfs nodes so we do not leak anything to bother with a 
> mutex like 85cd39af14f4 did.
>
> So I'd stick to the original patch to reduce the noise in the dmesg, and 
> it also exposes lpid which I find rather useful for finding the right 
> partition scope tree in partition_tb.
>
> Michael?
>
>
>> 
>> 
>> 
>>>>> ---
>>>>>    arch/powerpc/kvm/book3s_hv.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c 
>>>>> b/arch/powerpc/kvm/book3s_hv.c
>>>>> index 1d1fcc290fca..0223ddc0eed0 100644
>>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>>>> @@ -5227,7 +5227,7 @@ static int kvmppc_core_init_vm_hv(struct kvm 
>>>>> *kvm)
>>>>>        /*
>>>>>         * Create a debugfs directory for the VM
>>>>>         */
>>>>> -    snprintf(buf, sizeof(buf), "vm%d", current->pid);
>>>>> +    snprintf(buf, sizeof(buf), "vm%d-lp%ld", current->pid, lpid);
>>>>>        kvm->arch.debugfs_dir = debugfs_create_dir(buf, 
>>>>> kvm_debugfs_dir);
>>>>>        kvmppc_mmu_debugfs_init(kvm);
>>>>>        if (radix_enabled())
>>>>>
>> 

^ permalink raw reply

* Re: [next-20210827][ppc][multipathd] INFO: task hung in dm_table_add_target
From: Christoph Hellwig @ 2021-09-01 13:36 UTC (permalink / raw)
  To: Abdul Haleem
  Cc: axboe, sachinp, jack, linux-scsi, linux-kernel, dm-devel,
	linux-next, dougmill, Brian King, linuxppc-dev, hch
In-Reply-To: <68dde454-965a-0c44-374a-a0ca277150ee@linux.vnet.ibm.com>

On Wed, Sep 01, 2021 at 04:47:26PM +0530, Abdul Haleem wrote:
> Greeting's
>
> multiple task hung while adding the vfc disk back to the multipath on my 
> powerpc box running linux-next kernel

Can you retry to reproduce this with lockdep enabled to see if there
is anything interesting holding this lock?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox