LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH kernel] KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
From: Fabiano Rosas @ 2021-09-02 13:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: kvm-ppc
In-Reply-To: <a72edcd2-a990-a549-2f31-dab134bef6a6@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 02/09/2021 00:59, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> The userspace can trigger "vmalloc size %lu allocation failure: exceeds
>>> total pages" via the KVM_SET_USER_MEMORY_REGION ioctl.
>>>
>>> This silences the warning by checking the limit before calling vzalloc()
>>> and returns ENOMEM if failed.
>>>
>>> This does not call underlying valloc helpers as __vmalloc_node() is only
>>> exported when CONFIG_TEST_VMALLOC_MODULE and __vmalloc_node_range() is not
>>> exported at all.
>>>
>>> Spotted by syzkaller.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>   arch/powerpc/kvm/book3s_hv.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 474c0cfde384..a59f1cccbcf9 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -4830,8 +4830,12 @@ static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
>>>   	unsigned long npages = mem->memory_size >> PAGE_SHIFT;
>>>
>>>   	if (change == KVM_MR_CREATE) {
>>> -		slot->arch.rmap = vzalloc(array_size(npages,
>>> -					  sizeof(*slot->arch.rmap)));
>>> +		unsigned long cb = array_size(npages, sizeof(*slot->arch.rmap));
>> 
>> What does cb mean?
>
> "count of bytes"
>
> This is from my deep Windows past :)
>
> https://docs.microsoft.com/en-us/windows/win32/stg/coding-style-conventions

=D How interesting! And according to that link 'sz' means "Zero terminated
String". Imagine the confusion.. haha

>> 
>>> +
>>> +		if ((cb >> PAGE_SHIFT) > totalram_pages())
>>> +			return -ENOMEM;
>>> +
>>> +		slot->arch.rmap = vzalloc(cb);
>>>   		if (!slot->arch.rmap)
>>>   			return -ENOMEM;
>>>   	}

^ permalink raw reply

* [powerpc:next] BUILD SUCCESS e432fe97f3e5de325b40021e505cce53877586c5
From: kernel test robot @ 2021-09-02 13:07 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: e432fe97f3e5de325b40021e505cce53877586c5  powerpc/bug: Cast to unsigned long before passing to inline asm

elapsed time: 1390m

configs tested: 152
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm64                               defconfig
arm                                 defconfig
arm64                            allyesconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210831
powerpc                 mpc8540_ads_defconfig
sh                          sdk7786_defconfig
arm                            xcep_defconfig
arm                          ep93xx_defconfig
arm                         cm_x300_defconfig
powerpc                     sequoia_defconfig
mips                            e55_defconfig
sh                  sh7785lcr_32bit_defconfig
sh                   rts7751r2dplus_defconfig
powerpc                 linkstation_defconfig
sparc64                             defconfig
arm                           corgi_defconfig
mips                           jazz_defconfig
powerpc                      acadia_defconfig
arm                          moxart_defconfig
sh                        edosk7760_defconfig
riscv                            alldefconfig
arm                          pxa168_defconfig
sh                        edosk7705_defconfig
mips                   sb1250_swarm_defconfig
powerpc                      tqm8xx_defconfig
xtensa                generic_kc705_defconfig
x86_64                           alldefconfig
riscv             nommu_k210_sdcard_defconfig
sh                        sh7785lcr_defconfig
powerpc                 mpc836x_rdk_defconfig
sh                      rts7751r2d1_defconfig
powerpc                     kmeter1_defconfig
mips                         cobalt_defconfig
arm                        trizeps4_defconfig
sh                          rsk7269_defconfig
mips                     decstation_defconfig
arm                           spitz_defconfig
powerpc                     tqm8540_defconfig
arm                            mps2_defconfig
sh                           se7750_defconfig
arm                       aspeed_g5_defconfig
sparc64                          alldefconfig
sh                           sh2007_defconfig
arm                     davinci_all_defconfig
powerpc                     tqm5200_defconfig
riscv                    nommu_virt_defconfig
arm                           h5000_defconfig
arc                              alldefconfig
sh                              ul2_defconfig
powerpc64                           defconfig
sh                           se7712_defconfig
powerpc                      bamboo_defconfig
ia64                             alldefconfig
powerpc                 mpc837x_rdb_defconfig
alpha                               defconfig
x86_64                            allnoconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
x86_64               randconfig-a005-20210831
x86_64               randconfig-a001-20210831
x86_64               randconfig-a003-20210831
x86_64               randconfig-a002-20210831
x86_64               randconfig-a004-20210831
x86_64               randconfig-a006-20210831
i386                 randconfig-a005-20210831
i386                 randconfig-a002-20210831
i386                 randconfig-a003-20210831
i386                 randconfig-a006-20210831
i386                 randconfig-a001-20210831
i386                 randconfig-a004-20210831
x86_64               randconfig-a016-20210901
x86_64               randconfig-a011-20210901
x86_64               randconfig-a012-20210901
x86_64               randconfig-a015-20210901
x86_64               randconfig-a014-20210901
x86_64               randconfig-a013-20210901
arc                  randconfig-r043-20210831
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                    rhel-8.3-kselftests
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
i386                 randconfig-c001-20210831
s390                 randconfig-c005-20210831
riscv                randconfig-c006-20210831
powerpc              randconfig-c003-20210831
mips                 randconfig-c004-20210831
arm                  randconfig-c002-20210831
x86_64               randconfig-c007-20210831
s390                 randconfig-c005-20210901
mips                 randconfig-c004-20210901
x86_64               randconfig-c007-20210901
powerpc              randconfig-c003-20210901
i386                 randconfig-c001-20210901
arm                  randconfig-c002-20210901
riscv                randconfig-c006-20210901
x86_64               randconfig-a014-20210831
x86_64               randconfig-a015-20210831
x86_64               randconfig-a013-20210831
x86_64               randconfig-a016-20210831
x86_64               randconfig-a012-20210831
x86_64               randconfig-a011-20210831
i386                 randconfig-a016-20210831
i386                 randconfig-a011-20210831
i386                 randconfig-a015-20210831
i386                 randconfig-a014-20210831
i386                 randconfig-a012-20210831
i386                 randconfig-a013-20210831
s390                 randconfig-r044-20210831
hexagon              randconfig-r041-20210831
hexagon              randconfig-r045-20210831
riscv                randconfig-r042-20210831
hexagon              randconfig-r045-20210901
hexagon              randconfig-r041-20210901

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* RE: [PATCH kernel] KVM: PPC: Book3S: Suppress warnings when allocating too big memory slots
From: David Laight @ 2021-09-02 13:23 UTC (permalink / raw)
  To: 'Fabiano Rosas', Alexey Kardashevskiy,
	linuxppc-dev@lists.ozlabs.org
  Cc: kvm-ppc@vger.kernel.org
In-Reply-To: <878s0funuy.fsf@linux.ibm.com>

...
> > This is from my deep Windows past :)
> >
> > https://docs.microsoft.com/en-us/windows/win32/stg/coding-style-conventions
> 
> =D How interesting! And according to that link 'sz' means "Zero terminated
> String". Imagine the confusion.. haha

Is that document responsible for some of the general unreadability
of windows code?
(I'm not going to addle by brain by trying to read it.)

Types like DWORD_PTR really shouldn't exist.
You won't guess what it is...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH 0/5] KVM: PPC: Book3S: Modules cleanup and unification
From: Fabiano Rosas @ 2021-09-02 14:32 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, npiggin, kvm-ppc
In-Reply-To: <YTAownlTy46X4jGV@yekko>

David Gibson <david@gibson.dropbear.id.au> writes:

> On Wed, Sep 01, 2021 at 02:33:52PM -0300, Fabiano Rosas wrote:
>> This series merges our three kvm modules kvm.ko, kvm-hv.ko and
>> kvm-pr.ko into one kvm.ko module.
>
> That doesn't sound like a good idea to me.  People who aren't on BookS
> servers don't want - and can't use - kvm-hv.  Almost nobody wants
> kvm-pr.  It's also kind of inconsistent with x86, which has the
> separate AMD and Intel modules.

But this is not altering the ability of having only kvm-hv or only
kvm-pr. I'm taking the Kconfig options that used to produce separate
modules and using them to select which code gets built into the one
kvm.ko module.

Currently:

CONFIG_KVM_BOOK3S_64=m     <-- produces kvm.ko
CONFIG_KVM_BOOK3S_64_HV=m  <-- produces kvm-hv.ko
CONFIG_KVM_BOOK3S_64_PR=m  <-- produces kvm-pr.ko

I'm making it so we now have one kvm.ko everywhere, but there is still:

CONFIG_KVM_BOOK3S_64=m           <-- produces kvm.ko
CONFIG_KVM_BOOK3S_HV_POSSIBLE=y  <-- includes HV in kvm.ko
CONFIG_KVM_BOOK3S_PR_POSSIBLE=y  <-- includes PR in kvm.ko

In other words, if you are going to have at least two modules loaded at
all times (kvm + kvm-hv or kvm + kvm-pr), why not put all that into one
module? No one needs to build code they are not going to use, this is
not changing.


About consistency with x86, this situation is not analogous because we
need to be able to load both modules at the same time, which means
kvm.ko needs to stick around when one module goes away in case we want
to load the other module. The KVM common code states that it expects to
have at most one implementation:

        /*
         * kvm_arch_init makes sure there's at most one caller
         * for architectures that support multiple implementations,
         * like intel and amd on x86.
         (...)

which is not true in our case due to this requirement of having two
separate modules loading independently.

(tangent) We are already quite different from other architectures since
we're not making use of kvm_arch_init and some other KVM hooks, such as
kvm_arch_check_processor_compat. So while other archs have their init
dispatched by kvm common code, our init and cleanup happens
independently in the ppc-specific modules, which obviously works but is
needlessly different and has subtleties in the ordering of operations
wrt. the kvm common code. (tangent)

^ permalink raw reply

* Re: [PATCH v3 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Mathieu Desnoyers @ 2021-09-02 15:28 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: KVM list, Peter Zijlstra, linux-kernel, Will Deacon, Guo Ren,
	linux-kselftest, Ben Gardon, shuah, Paul Mackerras,
	Russell King, ARM Linux, linux-csky, Ingo Molnar, Catalin Marinas,
	paulmck, Boqun Feng, rostedt, Shakeel Butt, Andy Lutomirski,
	Oleg Nesterov, Thomas Gleixner, Peter Foley, linux-arm-kernel,
	Thomas Bogendoerfer, linux-mips, Paolo Bonzini, linuxppc-dev
In-Reply-To: <20210901203030.1292304-5-seanjc@google.com>

----- On Sep 1, 2021, at 4:30 PM, Sean Christopherson seanjc@google.com wrote:

> Add a test to verify an rseq's CPU ID is updated correctly if the task is
> migrated while the kernel is handling KVM_RUN.  This is a regression test
> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
> without updating rseq, leading to a stale CPU ID and other badness.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Thanks!

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> ---
> tools/testing/selftests/kvm/.gitignore  |   1 +
> tools/testing/selftests/kvm/Makefile    |   3 +
> tools/testing/selftests/kvm/rseq_test.c | 236 ++++++++++++++++++++++++
> 3 files changed, 240 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/rseq_test.c
> 
> diff --git a/tools/testing/selftests/kvm/.gitignore
> b/tools/testing/selftests/kvm/.gitignore
> index 0709af0144c8..6d031ff6b68e 100644
> --- a/tools/testing/selftests/kvm/.gitignore
> +++ b/tools/testing/selftests/kvm/.gitignore
> @@ -47,6 +47,7 @@
> /kvm_page_table_test
> /memslot_modification_stress_test
> /memslot_perf_test
> +/rseq_test
> /set_memory_region_test
> /steal_time
> /kvm_binary_stats_test
> diff --git a/tools/testing/selftests/kvm/Makefile
> b/tools/testing/selftests/kvm/Makefile
> index 5832f510a16c..0756e79cb513 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
> TEST_GEN_PROGS_x86_64 += kvm_page_table_test
> TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> TEST_GEN_PROGS_x86_64 += memslot_perf_test
> +TEST_GEN_PROGS_x86_64 += rseq_test
> TEST_GEN_PROGS_x86_64 += set_memory_region_test
> TEST_GEN_PROGS_x86_64 += steal_time
> TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> @@ -92,6 +93,7 @@ TEST_GEN_PROGS_aarch64 += dirty_log_test
> TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
> TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
> TEST_GEN_PROGS_aarch64 += kvm_page_table_test
> +TEST_GEN_PROGS_aarch64 += rseq_test
> TEST_GEN_PROGS_aarch64 += set_memory_region_test
> TEST_GEN_PROGS_aarch64 += steal_time
> TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
> @@ -103,6 +105,7 @@ TEST_GEN_PROGS_s390x += demand_paging_test
> TEST_GEN_PROGS_s390x += dirty_log_test
> TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
> TEST_GEN_PROGS_s390x += kvm_page_table_test
> +TEST_GEN_PROGS_s390x += rseq_test
> TEST_GEN_PROGS_s390x += set_memory_region_test
> TEST_GEN_PROGS_s390x += kvm_binary_stats_test
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c
> b/tools/testing/selftests/kvm/rseq_test.c
> new file mode 100644
> index 000000000000..060538bd405a
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <signal.h>
> +#include <syscall.h>
> +#include <sys/ioctl.h>
> +#include <asm/barrier.h>
> +#include <linux/atomic.h>
> +#include <linux/rseq.h>
> +#include <linux/unistd.h>
> +
> +#include "kvm_util.h"
> +#include "processor.h"
> +#include "test_util.h"
> +
> +#define VCPU_ID 0
> +
> +static __thread volatile struct rseq __rseq = {
> +	.cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
> +};
> +
> +/*
> + * Use an arbitrary, bogus signature for configuring rseq, this test does not
> + * actually enter an rseq critical section.
> + */
> +#define RSEQ_SIG 0xdeadbeef
> +
> +/*
> + * Any bug related to task migration is likely to be timing-dependent; perform
> + * a large number of migrations to reduce the odds of a false negative.
> + */
> +#define NR_TASK_MIGRATIONS 100000
> +
> +static pthread_t migration_thread;
> +static cpu_set_t possible_mask;
> +static bool done;
> +
> +static atomic_t seq_cnt;
> +
> +static void guest_code(void)
> +{
> +	for (;;)
> +		GUEST_SYNC(0);
> +}
> +
> +static void sys_rseq(int flags)
> +{
> +	int r;
> +
> +	r = syscall(__NR_rseq, &__rseq, sizeof(__rseq), flags, RSEQ_SIG);
> +	TEST_ASSERT(!r, "rseq failed, errno = %d (%s)", errno, strerror(errno));
> +}
> +
> +static void *migration_worker(void *ign)
> +{
> +	cpu_set_t allowed_mask;
> +	int r, i, nr_cpus, cpu;
> +
> +	CPU_ZERO(&allowed_mask);
> +
> +	nr_cpus = CPU_COUNT(&possible_mask);
> +
> +	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
> +		cpu = i % nr_cpus;
> +		if (!CPU_ISSET(cpu, &possible_mask))
> +			continue;
> +
> +		CPU_SET(cpu, &allowed_mask);
> +
> +		/*
> +		 * Bump the sequence count twice to allow the reader to detect
> +		 * that a migration may have occurred in between rseq and sched
> +		 * CPU ID reads.  An odd sequence count indicates a migration
> +		 * is in-progress, while a completely different count indicates
> +		 * a migration occurred since the count was last read.
> +		 */
> +		atomic_inc(&seq_cnt);
> +
> +		/*
> +		 * Ensure the odd count is visible while sched_getcpu() isn't
> +		 * stable, i.e. while changing affinity is in-progress.
> +		 */
> +		smp_wmb();
> +		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
> +		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
> +			    errno, strerror(errno));
> +		smp_wmb();
> +		atomic_inc(&seq_cnt);
> +
> +		CPU_CLR(cpu, &allowed_mask);
> +
> +		/*
> +		 * Wait 1-10us before proceeding to the next iteration and more
> +		 * specifically, before bumping seq_cnt again.  A delay is
> +		 * needed on three fronts:
> +		 *
> +		 *  1. To allow sched_setaffinity() to prompt migration before
> +		 *     ioctl(KVM_RUN) enters the guest so that TIF_NOTIFY_RESUME
> +		 *     (or TIF_NEED_RESCHED, which indirectly leads to handling
> +		 *     NOTIFY_RESUME) is handled in KVM context.
> +		 *
> +		 *     If NOTIFY_RESUME/NEED_RESCHED is set after KVM enters
> +		 *     the guest, the guest will trigger a IO/MMIO exit all the
> +		 *     way to userspace and the TIF flags will be handled by
> +		 *     the generic "exit to userspace" logic, not by KVM.  The
> +		 *     exit to userspace is necessary to give the test a chance
> +		 *     to check the rseq CPU ID (see #2).
> +		 *
> +		 *     Alternatively, guest_code() could include an instruction
> +		 *     to trigger an exit that is handled by KVM, but any such
> +		 *     exit requires architecture specific code.
> +		 *
> +		 *  2. To let ioctl(KVM_RUN) make its way back to the test
> +		 *     before the next round of migration.  The test's check on
> +		 *     the rseq CPU ID must wait for migration to complete in
> +		 *     order to avoid false positive, thus any kernel rseq bug
> +		 *     will be missed if the next migration starts before the
> +		 *     check completes.
> +		 *
> +		 *  3. To ensure the read-side makes efficient forward progress,
> +		 *     e.g. if sched_getcpu() involves a syscall.  Stalling the
> +		 *     read-side means the test will spend more time waiting for
> +		 *     sched_getcpu() to stabilize and less time trying to hit
> +		 *     the timing-dependent bug.
> +		 *
> +		 * Because any bug in this area is likely to be timing-dependent,
> +		 * run with a range of delays at 1us intervals from 1us to 10us
> +		 * as a best effort to avoid tuning the test to the point where
> +		 * it can hit _only_ the original bug and not detect future
> +		 * regressions.
> +		 *
> +		 * The original bug can reproduce with a delay up to ~500us on
> +		 * x86-64, but starts to require more iterations to reproduce
> +		 * as the delay creeps above ~10us, and the average runtime of
> +		 * each iteration obviously increases as well.  Cap the delay
> +		 * at 10us to keep test runtime reasonable while minimizing
> +		 * potential coverage loss.
> +		 *
> +		 * The lower bound for reproducing the bug is likely below 1us,
> +		 * e.g. failures occur on x86-64 with nanosleep(0), but at that
> +		 * point the overhead of the syscall likely dominates the delay.
> +		 * Use usleep() for simplicity and to avoid unnecessary kernel
> +		 * dependencies.
> +		 */
> +		usleep((i % 10) + 1);
> +	}
> +	done = true;
> +	return NULL;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int r, i, snapshot;
> +	struct kvm_vm *vm;
> +	u32 cpu, rseq_cpu;
> +
> +	/* Tell stdout not to buffer its content */
> +	setbuf(stdout, NULL);
> +
> +	r = sched_getaffinity(0, sizeof(possible_mask), &possible_mask);
> +	TEST_ASSERT(!r, "sched_getaffinity failed, errno = %d (%s)", errno,
> +		    strerror(errno));
> +
> +	if (CPU_COUNT(&possible_mask) < 2) {
> +		print_skip("Only one CPU, task migration not possible\n");
> +		exit(KSFT_SKIP);
> +	}
> +
> +	sys_rseq(0);
> +
> +	/*
> +	 * Create and run a dummy VM that immediately exits to userspace via
> +	 * GUEST_SYNC, while concurrently migrating the process by setting its
> +	 * CPU affinity.
> +	 */
> +	vm = vm_create_default(VCPU_ID, 0, guest_code);
> +
> +	pthread_create(&migration_thread, NULL, migration_worker, 0);
> +
> +	for (i = 0; !done; i++) {
> +		vcpu_run(vm, VCPU_ID);
> +		TEST_ASSERT(get_ucall(vm, VCPU_ID, NULL) == UCALL_SYNC,
> +			    "Guest failed?");
> +
> +		/*
> +		 * Verify rseq's CPU matches sched's CPU.  Ensure migration
> +		 * doesn't occur between sched_getcpu() and reading the rseq
> +		 * cpu_id by rereading both if the sequence count changes, or
> +		 * if the count is odd (migration in-progress).
> +		 */
> +		do {
> +			/*
> +			 * Drop bit 0 to force a mismatch if the count is odd,
> +			 * i.e. if a migration is in-progress.
> +			 */
> +			snapshot = atomic_read(&seq_cnt) & ~1;
> +
> +			/*
> +			 * Ensure reading sched_getcpu() and rseq.cpu_id
> +			 * complete in a single "no migration" window, i.e. are
> +			 * not reordered across the seq_cnt reads.
> +			 */
> +			smp_rmb();
> +			cpu = sched_getcpu();
> +			rseq_cpu = READ_ONCE(__rseq.cpu_id);
> +			smp_rmb();
> +		} while (snapshot != atomic_read(&seq_cnt));
> +
> +		TEST_ASSERT(rseq_cpu == cpu,
> +			    "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
> +	}
> +
> +	/*
> +	 * Sanity check that the test was able to enter the guest a reasonable
> +	 * number of times, e.g. didn't get stalled too often/long waiting for
> +	 * sched_getcpu() to stabilize.  A 2:1 migration:KVM_RUN ratio is a
> +	 * fairly conservative ratio on x86-64, which can do _more_ KVM_RUNs
> +	 * than migrations given the 1us+ delay in the migration task.
> +	 */
> +	TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
> +		    "Only performed %d KVM_RUNs, task stalled too much?\n", i);
> +
> +	pthread_join(migration_thread, NULL);
> +
> +	kvm_vm_free(vm);
> +
> +	sys_rseq(RSEQ_FLAG_UNREGISTER);
> +
> +	return 0;
> +}
> --
> 2.33.0.153.gba50c8fa24-goog

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [PATCH v2 3/5] signal: Add unsafe_copy_siginfo_to_user()
From: Linus Torvalds @ 2021-09-02 16:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Peter Zijlstra, Josh Poimboeuf, Linux Kernel Mailing List,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <YTB1F7o15FrxmmP1@infradead.org>

On Wed, Sep 1, 2021 at 11:55 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> I'm a little worried about all these unsafe helper in powerpc and the
> ever increasing scope of the unsafe sections.  Can you at least at
> powerpc support to objtool to verify them?  objtool verifications has
> helped to find quite a few bugs in unsafe sections on x86.

.. yeah, objdump was particularly useful for the really subtle ones
where there are random function calls due to things like KASAN etc.

No human would ever have noticed "oh, we're walking the kernel stack
with user mode accesses enabled because the compiler inserted magical
debug code here". Objdump sees those things - assuming you teach it
about that special user space access enable/disable sequence.

            Linus

^ permalink raw reply

* Re: [PATCH printk v4 3/6] printk: remove safe buffers
From: Geert Uytterhoeven @ 2021-09-02 16:48 UTC (permalink / raw)
  To: John Ogness
  Cc: Petr Mladek, Kees Cook, Paul E. McKenney, Alexey Kardashevskiy,
	Nicholas Piggin, Linux Kernel Mailing List, Steven Rostedt, kexec,
	Sergey Senozhatsky, Yue Hu, Paul Mackerras, Eric Biederman,
	Thomas Gleixner, linuxppc-dev, Andrew Morton, Tiezhu Yang,
	Cédric Le Goater
In-Reply-To: <20210715193359.25946-4-john.ogness@linutronix.de>

Hi John,

On Thu, Jul 15, 2021 at 9:53 PM John Ogness <john.ogness@linutronix.de> wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
>
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
>
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console and console_owner locks is left
> in place. This is because the console and console_owner locks are
> needed for the actual printing.
>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>

Thank you very much for reducing kernel size by ca. 8 KiB!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v2 3/5] signal: Add unsafe_copy_siginfo_to_user()
From: Eric W. Biederman @ 2021-09-02 18:43 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <a94be61f008ab29c231b805e1a97e9dab35cb0cc.1629732940.git.christophe.leroy@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> In the same spirit as commit fb05121fd6a2 ("signal: Add
> unsafe_get_compat_sigset()"), implement an 'unsafe' version of
> copy_siginfo_to_user() in order to use it within user access blocks.
>
> For that, also add an 'unsafe' version of clear_user().

Looking at your use cases you need the 32bit compat version of this
as well.

The 32bit compat version is too complicated to become a macro, so I
don't think you can make this work correctly for the 32bit compat case.

Probably-Not-by: "Eric W. Biederman" <ebiederm@xmission.com>

Eric

> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  include/linux/signal.h  | 15 +++++++++++++++
>  include/linux/uaccess.h |  1 +
>  kernel/signal.c         |  5 -----
>  3 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 3454c7ff0778..659bd43daf10 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -35,6 +35,21 @@ static inline void copy_siginfo_to_external(siginfo_t *to,
>  int copy_siginfo_to_user(siginfo_t __user *to, const kernel_siginfo_t *from);
>  int copy_siginfo_from_user(kernel_siginfo_t *to, const siginfo_t __user *from);
>  
> +static __always_inline char __user *si_expansion(const siginfo_t __user *info)
> +{
> +	return ((char __user *)info) + sizeof(struct kernel_siginfo);
> +}
> +
> +#define unsafe_copy_siginfo_to_user(to, from, label) do {		\
> +	siginfo_t __user *__ucs_to = to;				\
> +	const kernel_siginfo_t *__ucs_from = from;			\
> +	char __user *__ucs_expansion = si_expansion(__ucs_to);		\
> +									\
> +	unsafe_copy_to_user(__ucs_to, __ucs_from,			\
> +			    sizeof(struct kernel_siginfo), label);	\
> +	unsafe_clear_user(__ucs_expansion, SI_EXPANSION_SIZE, label);	\
> +} while (0)
> +
>  enum siginfo_layout {
>  	SIL_KILL,
>  	SIL_TIMER,
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index c05e903cef02..37073caac474 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -398,6 +398,7 @@ long strnlen_user_nofault(const void __user *unsafe_addr, long count);
>  #define unsafe_put_user(x,p,e) unsafe_op_wrap(__put_user(x,p),e)
>  #define unsafe_copy_to_user(d,s,l,e) unsafe_op_wrap(__copy_to_user(d,s,l),e)
>  #define unsafe_copy_from_user(d,s,l,e) unsafe_op_wrap(__copy_from_user(d,s,l),e)
> +#define unsafe_clear_user(d, l, e) unsafe_op_wrap(__clear_user(d, l), e)
>  static inline unsigned long user_access_save(void) { return 0UL; }
>  static inline void user_access_restore(unsigned long flags) { }
>  #endif
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a3229add4455..83b5971e4304 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -3261,11 +3261,6 @@ enum siginfo_layout siginfo_layout(unsigned sig, int si_code)
>  	return layout;
>  }
>  
> -static inline char __user *si_expansion(const siginfo_t __user *info)
> -{
> -	return ((char __user *)info) + sizeof(struct kernel_siginfo);
> -}
> -
>  int copy_siginfo_to_user(siginfo_t __user *to, const kernel_siginfo_t *from)
>  {
>  	char __user *expansion = si_expansion(to);

^ permalink raw reply

* Re: [PATCH v2 5/5] powerpc/signal: Use unsafe_copy_siginfo_to_user()
From: Eric W. Biederman @ 2021-09-02 18:38 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <7d391d915d2bd1c0f601f55f61f8dd4c70066229.1629732940.git.christophe.leroy@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Use unsafe_copy_siginfo_to_user() in order to do the copy
> within the user access block.
>
> On an mpc 8321 (book3s/32) the improvment is about 5% on a process
> sending a signal to itself.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

copy_siginfo_to_user is not the same as copy_siginfo_to_user32.

As in this patch breaks 32bit userspace on powerpc.


> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/kernel/signal_32.c | 13 ++++++-------
>  arch/powerpc/kernel/signal_64.c |  5 +----
>  2 files changed, 7 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
> index ff101e2b3bab..f9e16d108bc8 100644
> --- a/arch/powerpc/kernel/signal_32.c
> +++ b/arch/powerpc/kernel/signal_32.c
> @@ -710,12 +710,6 @@ static long restore_tm_user_regs(struct pt_regs *regs, struct mcontext __user *s
>  }
>  #endif
>  
> -#ifdef CONFIG_PPC64
> -
> -#define copy_siginfo_to_user	copy_siginfo_to_user32
> -
> -#endif /* CONFIG_PPC64 */
> -
>  /*
>   * Set up a signal frame for a "real-time" signal handler
>   * (one which gets siginfo).
> @@ -779,14 +773,19 @@ int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
>  		asm("dcbst %y0; sync; icbi %y0; sync" :: "Z" (mctx->mc_pad[0]));
>  	}
>  	unsafe_put_sigset_t(&frame->uc.uc_sigmask, oldset, failed);
> +#ifndef CONFIG_COMPAT
> +	unsafe_copy_siginfo_to_user(&frame->info, &ksig->info, failed);
> +#endif
>  
>  	/* create a stack frame for the caller of the handler */
>  	unsafe_put_user(regs->gpr[1], newsp, failed);
>  
>  	user_access_end();
>  
> -	if (copy_siginfo_to_user(&frame->info, &ksig->info))
> +#ifdef CONFIG_COMPAT
> +	if (copy_siginfo_to_user32(&frame->info, &ksig->info))
>  		goto badframe;
> +#endif
>  
>  	regs->link = tramp;
>  
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 2cca6c8febe1..82b73fbd937d 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -901,15 +901,12 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
>  	}
>  
>  	unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block);
> +	unsafe_copy_siginfo_to_user(&frame->info, &ksig->info, badframe_block);
>  	/* Allocate a dummy caller frame for the signal handler. */
>  	unsafe_put_user(regs->gpr[1], newsp, badframe_block);
>  
>  	user_write_access_end();
>  
> -	/* Save the siginfo outside of the unsafe block. */
> -	if (copy_siginfo_to_user(&frame->info, &ksig->info))
> -		goto badframe;
> -
>  	/* Make sure signal handler doesn't get spurious FP exceptions */
>  	tsk->thread.fp_state.fpscr = 0;

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Segher Boessenkool @ 2021-09-02 21:52 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Eirik Fuller, linuxppc-dev
In-Reply-To: <1630553233.5hjr91skvz.astroid@bobo.none>

On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> >> -	/* Firstly we need to enable TM in the kernel */
> >> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
> >>   	mfmsr	r10
> >>   	li	r9, 1
> >>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> >> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> >> +	andc	r10, r10, r9
> > 
> > Why not use 'rlwinm' to mask out MSR_EE ?
> > 
> > Something like
> > 
> > 	rlwinm	r10, r10, 0, ~MSR_EE
> 
> Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
> to change as much as possible to C?

The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
with rlwinm (only the low 32 bits can).  There are many ways to do it
using two insns of course.

> Actually there should really be no need for mfmsr either, I wanted to
> rewrite the thing entirely as
> 
> 	ld      r10,PACAKMSR(r13)
> 	LOAD_REG_IMMEDIATE(r9, MSR_TM)
> 	or	r10,r10,r9
> 	mtmsrd	r10
> 
> But I thought that's not a minimal bug fix.

That (LOAD_REG_IMMEDIATE+or) can be done with just two insns, not three
as written:
  li r0,1
  rldimi r10,r0,32,31
(you can write that last insns as
  rldimi r10,r0,MSR_TM_LG,MSR_TM_LG-1
if you insist :-) )

It isn't like a few integer computational insns will kill you here of
course, there are much more important cycles to shave off :-)


Segher

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Segher Boessenkool @ 2021-09-02 22:20 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Eirik Fuller, linuxppc-dev
In-Reply-To: <20210902215203.GM1583@gate.crashing.org>

On Thu, Sep 02, 2021 at 04:52:03PM -0500, Segher Boessenkool wrote:
> On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
> > Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
> > >> -	/* Firstly we need to enable TM in the kernel */
> > >> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
> > >>   	mfmsr	r10
> > >>   	li	r9, 1
> > >>   	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
> > >> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
> > >> +	andc	r10, r10, r9
> > > 
> > > Why not use 'rlwinm' to mask out MSR_EE ?
> > > 
> > > Something like
> > > 
> > > 	rlwinm	r10, r10, 0, ~MSR_EE
> > 
> > Mainly because I'm bad at powerpc assembly. Why do you think I'm trying 
> > to change as much as possible to C?
> 
> The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
> with rlwinm (only the low 32 bits can).  There are many ways to do it
> using two insns of course.

Wow I misread that, you want to clear MSR[EE] really, not MSR[TM].

You cannot use rlwinm and keep the high 32 bits of the target register
intact.  You either clear all to 0 or set them to a copy of the rotated
value in the low 32 bits.


Segher

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64s: system call scv tabort fix for corrupt irq soft-mask state
From: Christophe Leroy @ 2021-09-03  5:04 UTC (permalink / raw)
  To: Segher Boessenkool, Nicholas Piggin; +Cc: Eirik Fuller, linuxppc-dev
In-Reply-To: <20210902222002.GN1583@gate.crashing.org>



Le 03/09/2021 à 00:20, Segher Boessenkool a écrit :
> On Thu, Sep 02, 2021 at 04:52:03PM -0500, Segher Boessenkool wrote:
>> On Thu, Sep 02, 2021 at 01:33:10PM +1000, Nicholas Piggin wrote:
>>> Excerpts from Christophe Leroy's message of September 2, 2021 3:21 am:
>>>>> -	/* Firstly we need to enable TM in the kernel */
>>>>> +	/* We need to enable TM in the kernel, and disable EE (for scv) */
>>>>>    	mfmsr	r10
>>>>>    	li	r9, 1
>>>>>    	rldimi	r10, r9, MSR_TM_LG, 63-MSR_TM_LG
>>>>> +	LOAD_REG_IMMEDIATE(r9, MSR_EE)
>>>>> +	andc	r10, r10, r9
>>>>
>>>> Why not use 'rlwinm' to mask out MSR_EE ?
>>>>
>>>> Something like
>>>>
>>>> 	rlwinm	r10, r10, 0, ~MSR_EE
>>>
>>> Mainly because I'm bad at powerpc assembly. Why do you think I'm trying
>>> to change as much as possible to C?
>>
>> The actual bit (bit 31, i.e. with value 1UL << 32) cannot be cleared
>> with rlwinm (only the low 32 bits can).  There are many ways to do it
>> using two insns of course.
> 
> Wow I misread that, you want to clear MSR[EE] really, not MSR[TM].
> 
> You cannot use rlwinm and keep the high 32 bits of the target register
> intact.  You either clear all to 0 or set them to a copy of the rotated
> value in the low 32 bits.
> 

Oops, my mistake. When I tested it in C to see what was generated by GCC I forgot the ~ so I got 
rlwinm r3,r3,0,16,16 and didn't realise it was different from rlwinm r3,r3,0,~(1<<15)

By the way it would be more explicit if objdump could display the mask instead of the mask 
boundaries. Is there a way to do that ?

Christophe

^ permalink raw reply

* [RESEND PATCH v4 0/4]  Add perf interface to expose nvdimm
From: Kajol Jain @ 2021-09-03  5:09 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx

Patchset adds performance stats reporting support for nvdimm.
Added interface includes support for pmu register/unregister
functions. A structure is added called nvdimm_pmu to be used for
adding arch/platform specific data such as supported events, cpumask
pmu event functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Added implementation to expose IBM pseries platform nmem*
device performance stats using this interface.

Result from power9 pseries lpar with 2 nvdimm device:
command:# perf list nmem
  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Patch1:
        Introduces the nvdimm_pmu structure
Patch2:
	Adds common interface to add arch/platform specific data
	includes supported events, pmu event functions. It also
	adds code for cpu hotplug support.
Patch3:
        Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
        nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
        cpumask andevent functions and then registers the pmu by adding
        callbacks to register_nvdimm_pmu.
Patch4:
        Sysfs documentation patch

Changelog
---
v3 -> v4
- Rebase code on top of current papr_scm code without any logical
  changes.

- Added Acked-by tag from Peter Zijlstra and Reviewed by tag
  from Madhavan Srinivasan.

- Link to the patchset v3: https://lkml.org/lkml/2021/6/17/605

v2 -> v3
- Added Tested-by tag.

- Fix nvdimm mailing list in the ABI Documentation.

- Link to the patchset v2: https://lkml.org/lkml/2021/6/14/25

v1 -> v2
- Fix hotplug code by adding pmu migration call
  incase current designated cpu got offline. As
  pointed by Peter Zijlstra.

- Removed the retun -1 part from cpu hotplug offline
  function.

- Link to the patchset v1: https://lkml.org/lkml/2021/6/8/500

Kajol Jain (4):
  drivers/nvdimm: Add nvdimm pmu structure
  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
  powerpc/papr_scm: Add perf interface support
  powerpc/papr_scm: Document papr_scm sysfs event format entries

 Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
 arch/powerpc/include/asm/device.h             |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c     | 365 ++++++++++++++++++
 drivers/nvdimm/Makefile                       |   1 +
 drivers/nvdimm/nd_perf.c                      | 230 +++++++++++
 include/linux/nd.h                            |  46 +++
 6 files changed, 678 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

-- 
2.26.2


^ permalink raw reply

* [RESEND PATCH v4 1/4] drivers/nvdimm: Add nvdimm pmu structure
From: Kajol Jain @ 2021-09-03  5:09 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx
In-Reply-To: <20210903050914.273525-1-kjain@linux.ibm.com>

A structure is added, called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
nvdimm pmu data such as supported events and pmu event functions
like event_init/add/read/del with cpu hotplug support.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/linux/nd.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..712499cf7335 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,8 @@
 #include <linux/ndctl.h>
 #include <linux/device.h>
 #include <linux/badblocks.h>
+#include <linux/platform_device.h>
+#include <linux/perf_event.h>
 
 enum nvdimm_event {
 	NVDIMM_REVALIDATE_POISON,
@@ -23,6 +25,47 @@ enum nvdimm_claim_class {
 	NVDIMM_CCLASS_UNKNOWN,
 };
 
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR		0
+#define NVDIMM_PMU_EVENT_ATTR		1
+#define NVDIMM_PMU_CPUMASK_ATTR		2
+#define NVDIMM_PMU_NULL_ATTR		3
+
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ *
+ * @name: name of the nvdimm pmu device.
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @functions(event_init/add/del/read): platform specific pmu functions.
+ * @attr_groups: data structure for events, formats and cpumask
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+	const char *name;
+	struct pmu pmu;
+	struct device *dev;
+	int (*event_init)(struct perf_event *event);
+	int  (*add)(struct perf_event *event, int flags);
+	void (*del)(struct perf_event *event, int flags);
+	void (*read)(struct perf_event *event);
+	/*
+	 * Attribute groups for the nvdimm pmu. Index 0 used for
+	 * format attribute, index 1 used for event attribute,
+	 * index 2 used for cpusmask attribute and index 3 kept as NULL.
+	 */
+	const struct attribute_group *attr_groups[4];
+	int cpu;
+	struct hlist_node node;
+	enum cpuhp_state cpuhp_state;
+
+	/* cpumask provided by arch/platform specific code */
+	struct cpumask arch_cpumask;
+};
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.26.2


^ permalink raw reply related

* [RESEND PATCH v4 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
From: Kajol Jain @ 2021-09-03  5:09 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx
In-Reply-To: <20210903050914.273525-1-kjain@linux.ibm.com>

A common interface is added to get performance stats reporting
support for nvdimm devices. Added interface includes support for
pmu register/unregister functions, cpu hotplug and pmu event
functions like event_init/add/read/del.
User could use the standard perf tool to access perf
events exposed via pmu.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_perf.c | 230 +++++++++++++++++++++++++++++++++++++++
 include/linux/nd.h       |   3 +
 3 files changed, 234 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..4c49d1bc2a3c
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+	u32 target;
+	int nodeid;
+	const struct cpumask *cpumask;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
+	cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+	/*
+	 * If given cpu is not same as current designated cpu for
+	 * counter access, just return.
+	 */
+	if (cpu != nd_pmu->cpu)
+		return 0;
+
+	/* Check for any active cpu in nd_pmu->arch_cpumask */
+	target = cpumask_any(&nd_pmu->arch_cpumask);
+
+	/*
+	 * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
+	 * check in given cpu's numa node list.
+	 */
+	if (target >= nr_cpu_ids) {
+		nodeid = cpu_to_node(cpu);
+		cpumask = cpumask_of_node(nodeid);
+		target = cpumask_any_but(cpumask, cpu);
+	}
+	nd_pmu->cpu = target;
+
+	/* Migrate nvdimm pmu events to the new target cpu if valid */
+	if (target >= 0 && target < nr_cpu_ids)
+		perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
+
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	if (nd_pmu->cpu >= nr_cpu_ids)
+		nd_pmu->cpu = cpu;
+
+	return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+	struct perf_pmu_events_attr *attr;
+	struct attribute **attrs;
+	struct attribute_group *nvdimm_pmu_cpumask_group;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return -ENOMEM;
+
+	attrs = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+	if (!attrs) {
+		kfree(attr);
+		return -ENOMEM;
+	}
+
+	/* Allocate memory for cpumask attribute group */
+	nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+	if (!nvdimm_pmu_cpumask_group) {
+		kfree(attr);
+		kfree(attrs);
+		return -ENOMEM;
+	}
+
+	sysfs_attr_init(&attr->attr.attr);
+	attr->attr.attr.name = "cpumask";
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = nvdimm_pmu_cpumask_show;
+	attrs[0] = &attr->attr.attr;
+	attrs[1] = NULL;
+
+	nvdimm_pmu_cpumask_group->attrs = attrs;
+	nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+	int nodeid, rc;
+	const struct cpumask *cpumask;
+
+	/*
+	 * Incase cpu hotplug is not handled by arch specific code
+	 * they can still provide required cpumask which can be used
+	 * to get designatd cpu for counter access.
+	 * Check for any active cpu in nd_pmu->arch_cpumask.
+	 */
+	if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+		nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+	} else {
+		/* pick active cpu from the cpumask of device numa node. */
+		nodeid = dev_to_node(nd_pmu->dev);
+		cpumask = cpumask_of_node(nodeid);
+		nd_pmu->cpu = cpumask_any(cpumask);
+	}
+
+	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+				     nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+	if (rc < 0)
+		return rc;
+
+	nd_pmu->cpuhp_state = rc;
+
+	/* Register the pmu instance for cpu hotplug */
+	rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	if (rc) {
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	/* Create cpumask attribute group */
+	rc = create_cpumask_attr_group(nd_pmu);
+	if (rc) {
+		cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	return 0;
+}
+
+void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+	if (nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+		kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+	kfree(nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+	int rc;
+
+	if (!nd_pmu || !pdev)
+		return -EINVAL;
+
+	/* event functions like add/del/read/event_init should not be NULL */
+	if (WARN_ON_ONCE(!(nd_pmu->event_init && nd_pmu->add && nd_pmu->del && nd_pmu->read)))
+		return -EINVAL;
+
+	nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+	nd_pmu->pmu.name = nd_pmu->name;
+	nd_pmu->pmu.event_init = nd_pmu->event_init;
+	nd_pmu->pmu.add = nd_pmu->add;
+	nd_pmu->pmu.del = nd_pmu->del;
+	nd_pmu->pmu.read = nd_pmu->read;
+
+	nd_pmu->pmu.attr_groups = nd_pmu->attr_groups;
+	nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+				PERF_PMU_CAP_NO_EXCLUDE;
+
+	/*
+	 * Add platform_device->dev pointer to nvdimm_pmu to access
+	 * device data in events functions.
+	 */
+	nd_pmu->dev = &pdev->dev;
+
+	/*
+	 * Incase cpumask attribute is set it means cpu
+	 * hotplug is handled by the arch specific code and
+	 * we can skip calling hotplug_init.
+	 */
+	if (!nd_pmu->attr_groups[NVDIMM_PMU_CPUMASK_ATTR]) {
+		/* init cpuhotplug */
+		rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+		if (rc) {
+			pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->name);
+			return rc;
+		}
+	}
+
+	rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->name, -1);
+	if (rc) {
+		nvdimm_pmu_free_hotplug_memory(nd_pmu);
+		return rc;
+	}
+
+	pr_info("%s NVDIMM performance monitor support registered\n",
+		nd_pmu->name);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+	/* handle freeing of memory nd_pmu in arch specific code */
+	perf_pmu_unregister(&nd_pmu->pmu);
+	nvdimm_pmu_free_hotplug_memory(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 712499cf7335..7d8b4f7d277d 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -66,6 +66,9 @@ struct nvdimm_pmu {
 	struct cpumask arch_cpumask;
 };
 
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.26.2


^ permalink raw reply related

* [RESEND PATCH v4 3/4] powerpc/papr_scm: Add perf interface support
From: Kajol Jain @ 2021-09-03  5:09 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx
In-Reply-To: <20210903050914.273525-1-kjain@linux.ibm.com>

Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.

A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.

papr_scm_pmu_register function populates the nvdimm_pmu structure
with events, cpumask, attribute groups along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device.
Event handling functions internally uses hcall to get events and
counter data.

Result in power9 machine with 2 nvdimm device:

Ex: List all event by perf list

command:# perf list nmem

  nmem0/cchrhcnt/                                    [Kernel PMU event]
  nmem0/cchwhcnt/                                    [Kernel PMU event]
  nmem0/critrscu/                                    [Kernel PMU event]
  nmem0/ctlresct/                                    [Kernel PMU event]
  nmem0/ctlrestm/                                    [Kernel PMU event]
  nmem0/fastwcnt/                                    [Kernel PMU event]
  nmem0/hostlcnt/                                    [Kernel PMU event]
  nmem0/hostldur/                                    [Kernel PMU event]
  nmem0/hostscnt/                                    [Kernel PMU event]
  nmem0/hostsdur/                                    [Kernel PMU event]
  nmem0/medrcnt/                                     [Kernel PMU event]
  nmem0/medrdur/                                     [Kernel PMU event]
  nmem0/medwcnt/                                     [Kernel PMU event]
  nmem0/medwdur/                                     [Kernel PMU event]
  nmem0/memlife/                                     [Kernel PMU event]
  nmem0/noopstat/                                    [Kernel PMU event]
  nmem0/ponsecs/                                     [Kernel PMU event]
  nmem1/cchrhcnt/                                    [Kernel PMU event]
  nmem1/cchwhcnt/                                    [Kernel PMU event]
  nmem1/critrscu/                                    [Kernel PMU event]
  ...
  nmem1/noopstat/                                    [Kernel PMU event]
  nmem1/ponsecs/                                     [Kernel PMU event]

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/include/asm/device.h         |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 365 ++++++++++++++++++++++
 2 files changed, 370 insertions(+)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
 
 struct pdev_archdata {
 	u64 dma_mask;
+	/*
+	 * Pointer to nvdimm_pmu structure, to handle the unregistering
+	 * of pmu device
+	 */
+	void *priv;
 };
 
 #endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..26900101e638 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,8 @@
 #include <asm/papr_pdsm.h>
 #include <asm/mce.h>
 #include <asm/unaligned.h>
+#include <linux/perf_event.h>
+#include <linux/ctype.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -68,6 +70,8 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
+#define to_nvdimm_pmu(_pmu)	container_of(_pmu, struct nvdimm_pmu, pmu)
+
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
 	u8 stat_id[8];
@@ -120,6 +124,12 @@ struct papr_scm_priv {
 
 	/* length of the stat buffer as expected by phyp */
 	size_t stat_buffer_len;
+
+	 /* array to have event_code and stat_id mappings */
+	char **nvdimm_events_map;
+
+	/* count of supported events */
+	u32 total_events;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +350,354 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+	.name = "format",
+	.attrs = nvdimm_pmu_format_attr,
+};
+
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+	int rc, size;
+
+	/* Allocate request buffer enough to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) +
+		sizeof(struct papr_scm_perf_stat);
+
+	if (!p || !p->nvdimm_events_map)
+		return -EINVAL;
+
+	stats = kzalloc(size, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	stat = &stats->scm_statistic[0];
+	memcpy(&stat->stat_id,
+	       p->nvdimm_events_map[event->attr.config - 1],
+		sizeof(stat->stat_id));
+	stat->stat_val = 0;
+
+	rc = drc_pmem_query_stats(p, stats, 1);
+	if (rc < 0) {
+		kfree(stats);
+		return rc;
+	}
+
+	*count = be64_to_cpu(stat->stat_val);
+	kfree(stats);
+	return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+	struct papr_scm_priv *p;
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	/* test the event attr type for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* it does not support event sampling mode */
+	if (is_sampling_event(event))
+		return -EOPNOTSUPP;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+	if (!p)
+		return -EINVAL;
+
+	/* Invalid eventcode */
+	if (event->attr.config == 0 || event->attr.config > p->total_events)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+	u64 count;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	if (flags & PERF_EF_START) {
+		rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+		if (rc)
+			return rc;
+
+		local64_set(&event->hw.prev_count, count);
+	}
+
+	return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+	u64 prev, now;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return;
+
+	rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+	if (rc)
+		return;
+
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+	papr_scm_pmu_read(event);
+}
+
+static ssize_t device_show_string(struct device *dev, struct device_attribute *attr,
+				  char *buf)
+{
+	struct perf_pmu_events_attr *d;
+
+	d = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sysfs_emit(buf, "%s\n", (char *)d->event_str);
+}
+
+static char *strtolower(char *updated_name)
+{
+	int i = 0;
+
+	while (updated_name[i]) {
+		if (isupper(updated_name[i]))
+			updated_name[i] = tolower(updated_name[i]);
+		i++;
+	}
+	updated_name[i] = '\0';
+	return strim(updated_name);
+}
+
+/* device_str_attr_create : Populate event "name" and string "str" in attribute */
+static struct attribute *device_str_attr_create_(char *name, char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+
+	if (!attr)
+		return NULL;
+
+	sysfs_attr_init(&attr->attr.attr);
+	attr->event_str = str;
+	attr->attr.attr.name = strtolower(name);
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = device_show_string;
+
+	return &attr->attr.attr;
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats, *single_stats;
+	int index, size, rc, count;
+	u32 available_events;
+	struct attribute **events;
+	char *eventcode, *eventname, *statid;
+	struct attribute_group *nvdimm_pmu_events_group;
+
+	if (!p->stat_buffer_len)
+		return -ENOENT;
+
+	available_events = (p->stat_buffer_len  - sizeof(struct papr_scm_perf_stats))
+			/ sizeof(struct papr_scm_perf_stat);
+
+	/* Allocate memory for events attribute group */
+	nvdimm_pmu_events_group = kzalloc(sizeof(*nvdimm_pmu_events_group), GFP_KERNEL);
+	if (!nvdimm_pmu_events_group)
+		return -ENOMEM;
+
+	/* Allocate the buffer for phyp where stats are written */
+	stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+	if (!stats) {
+		rc = -ENOMEM;
+		goto out_nvdimm_pmu_events_group;
+	}
+
+	/* Allocate memory to nvdimm_event_map */
+	p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+	if (!p->nvdimm_events_map) {
+		rc = -ENOMEM;
+		goto out_stats;
+	}
+
+	/* Called to get list of events supported */
+	rc = drc_pmem_query_stats(p, stats, 0);
+	if (rc)
+		goto out_nvdimm_events_map;
+
+	/* Allocate buffer to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) + sizeof(struct papr_scm_perf_stat);
+
+	single_stats = kzalloc(size, GFP_KERNEL);
+	if (!single_stats) {
+		rc = -ENOMEM;
+		goto out_nvdimm_events_map;
+	}
+
+	events = kzalloc(available_events * sizeof(struct attribute *), GFP_KERNEL);
+	if (!events) {
+		rc = -ENOMEM;
+		goto out_single_stats;
+	}
+
+	for (index = 0, stat = stats->scm_statistic, count = 0;
+		     index < available_events; index++, ++stat) {
+
+		single_stats->scm_statistic[0] = *stat;
+		rc = drc_pmem_query_stats(p, single_stats, 1);
+
+		if (rc < 0) {
+			pr_info("Event not supported %s for device %s\n",
+				stat->stat_id, nvdimm_name(p->nvdimm));
+		} else {
+			eventcode = kasprintf(GFP_KERNEL, "event=0x%x", count + 1);
+			eventname = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+			statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+
+			if (!eventname || !statid || !eventcode)
+				goto out;
+
+			strcpy(eventname, stat->stat_id);
+			events[count] = device_str_attr_create_(eventname,
+								eventcode);
+			if (!events[count])
+				goto out;
+
+			strcpy(statid, stat->stat_id);
+			p->nvdimm_events_map[count] = statid;
+			count++;
+			continue;
+out:
+			kfree(eventcode);
+			kfree(eventname);
+			kfree(statid);
+		}
+	}
+
+	if (!count)
+		goto out_events;
+
+	events[count] = NULL;
+	p->nvdimm_events_map[count] = NULL;
+	p->total_events = count;
+
+	nvdimm_pmu_events_group->name = "events";
+	nvdimm_pmu_events_group->attrs = events;
+
+	/* Fill attribute groups for the nvdimm pmu device */
+	nd_pmu->attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+	nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR] = nvdimm_pmu_events_group;
+	nd_pmu->attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+	kfree(single_stats);
+	kfree(stats);
+	return 0;
+
+out_events:
+	kfree(events);
+out_single_stats:
+	kfree(single_stats);
+out_nvdimm_events_map:
+	kfree(p->nvdimm_events_map);
+out_stats:
+	kfree(stats);
+out_nvdimm_pmu_events_group:
+	kfree(nvdimm_pmu_events_group);
+	return rc;
+}
+
+/* Function to free the attr_groups which are dynamically allocated */
+static void papr_scm_pmu_mem_free(struct nvdimm_pmu *nd_pmu)
+{
+	if (nd_pmu) {
+		if (nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR])
+			kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]->attrs);
+		kfree(nd_pmu->attr_groups[NVDIMM_PMU_EVENT_ATTR]);
+	}
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+	struct nvdimm_pmu *nd_pmu;
+	int rc, nodeid;
+
+	nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+	if (!nd_pmu) {
+		rc = -ENOMEM;
+		goto pmu_err_print;
+	}
+
+	rc = papr_scm_pmu_check_events(p, nd_pmu);
+	if (rc)
+		goto pmu_check_events_err;
+
+	nd_pmu->name = nvdimm_name(p->nvdimm);
+	nd_pmu->event_init = papr_scm_pmu_event_init;
+	nd_pmu->read = papr_scm_pmu_read;
+	nd_pmu->add = papr_scm_pmu_add;
+	nd_pmu->del = papr_scm_pmu_del;
+
+	/*updating the cpumask variable */
+	nodeid = dev_to_node(&p->pdev->dev);
+	nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+	/* cpumask should not be NULL */
+	WARN_ON_ONCE(cpumask_empty(&nd_pmu->arch_cpumask));
+
+	rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+	if (rc)
+		goto pmu_register_err;
+
+	/*
+	 * Set archdata.priv value to nvdimm_pmu structure, to handle the
+	 * unregistering of pmu device.
+	 */
+	p->pdev->archdata.priv = nd_pmu;
+	return;
+
+pmu_register_err:
+	papr_scm_pmu_mem_free(nd_pmu);
+	kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+	kfree(nd_pmu);
+pmu_err_print:
+	dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
+static void papr_scm_pmu_uninit(struct nvdimm_pmu *nd_pmu)
+{
+	unregister_nvdimm_pmu(nd_pmu);
+	papr_scm_pmu_mem_free(nd_pmu);
+	kfree(nd_pmu);
+}
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
@@ -1236,6 +1594,7 @@ static int papr_scm_probe(struct platform_device *pdev)
 		goto err2;
 
 	platform_set_drvdata(pdev, p);
+	papr_scm_pmu_register(p);
 
 	return 0;
 
@@ -1254,6 +1613,12 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 	nvdimm_bus_unregister(p->bus);
 	drc_pmem_unbind(p);
+
+	if (pdev->archdata.priv)
+		papr_scm_pmu_uninit(pdev->archdata.priv);
+
+	pdev->archdata.priv = NULL;
+	kfree(p->nvdimm_events_map);
 	kfree(p->bus_desc.provider_name);
 	kfree(p);
 
-- 
2.26.2


^ permalink raw reply related

* [RESEND PATCH v4 4/4] powerpc/papr_scm: Document papr_scm sysfs event format entries
From: Kajol Jain @ 2021-09-03  5:09 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx
In-Reply-To: <20210903050914.273525-1-kjain@linux.ibm.com>

Details is added for the event, cpumask and format attributes
in the ABI documentation.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-papr-pmem | 31 +++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-papr-pmem b/Documentation/ABI/testing/sysfs-bus-papr-pmem
index 95254cec92bf..4d86252448f8 100644
--- a/Documentation/ABI/testing/sysfs-bus-papr-pmem
+++ b/Documentation/ABI/testing/sysfs-bus-papr-pmem
@@ -61,3 +61,34 @@ Description:
 		* "CchRHCnt" : Cache Read Hit Count
 		* "CchWHCnt" : Cache Write Hit Count
 		* "FastWCnt" : Fast Write Count
+
+What:		/sys/devices/nmemX/format
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description:	(RO) Attribute group to describe the magic bits
+                that go into perf_event_attr.config for a particular pmu.
+                (See ABI/testing/sysfs-bus-event_source-devices-format).
+
+                Each attribute under this group defines a bit range of the
+                perf_event_attr.config. Supported attribute is listed
+                below::
+
+		    event  = "config:0-4"  - event ID
+
+		For example::
+		    noopstat = "event=0x1"
+
+What:		/sys/devices/nmemX/events
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description:    (RO) Attribute group to describe performance monitoring
+                events specific to papr-scm. Each attribute in this group describes
+                a single performance monitoring event supported by this nvdimm pmu.
+                The name of the file is the name of the event.
+                (See ABI/testing/sysfs-bus-event_source-devices-events).
+
+What:		/sys/devices/nmemX/cpumask
+Date:		June 2021
+Contact:	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, nvdimm@lists.linux.dev,
+Description:	(RO) This sysfs file exposes the cpumask which is designated to make
+                HCALLs to retrieve nvdimm pmu event counter data.
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH 0/5] KVM: PPC: Book3S: Modules cleanup and unification
From: David Gibson @ 2021-09-03  5:13 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: linuxppc-dev, npiggin, kvm-ppc
In-Reply-To: <875yvjujxy.fsf@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3208 bytes --]

On Thu, Sep 02, 2021 at 11:32:41AM -0300, Fabiano Rosas wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Wed, Sep 01, 2021 at 02:33:52PM -0300, Fabiano Rosas wrote:
> >> This series merges our three kvm modules kvm.ko, kvm-hv.ko and
> >> kvm-pr.ko into one kvm.ko module.
> >
> > That doesn't sound like a good idea to me.  People who aren't on BookS
> > servers don't want - and can't use - kvm-hv.  Almost nobody wants
> > kvm-pr.  It's also kind of inconsistent with x86, which has the
> > separate AMD and Intel modules.
> 
> But this is not altering the ability of having only kvm-hv or only
> kvm-pr. I'm taking the Kconfig options that used to produce separate
> modules and using them to select which code gets built into the one
> kvm.ko module.

> 
> Currently:
> 
> CONFIG_KVM_BOOK3S_64=m     <-- produces kvm.ko
> CONFIG_KVM_BOOK3S_64_HV=m  <-- produces kvm-hv.ko
> CONFIG_KVM_BOOK3S_64_PR=m  <-- produces kvm-pr.ko
> 
> I'm making it so we now have one kvm.ko everywhere, but there is still:
> 
> CONFIG_KVM_BOOK3S_64=m           <-- produces kvm.ko
> CONFIG_KVM_BOOK3S_HV_POSSIBLE=y  <-- includes HV in kvm.ko
> CONFIG_KVM_BOOK3S_PR_POSSIBLE=y  <-- includes PR in kvm.ko
> 
> In other words, if you are going to have at least two modules loaded at
> all times (kvm + kvm-hv or kvm + kvm-pr), why not put all that into one
> module? No one needs to build code they are not going to use, this is
> not changing.

Ah.. I see, you're removing the runtime switch from one to the other
at the same time as having just a single one loaded, but leaving the
ability to compile time switch.  And compile time is arguably good
enough for the cases I've described.

Ok, I see your point.

I still think it's conceptually not ideal, but the practical benefit
is more important.  Objection withdrawn.


> About consistency with x86, this situation is not analogous because we
> need to be able to load both modules at the same time, which means
> kvm.ko needs to stick around when one module goes away in case we want
> to load the other module. The KVM common code states that it expects to
> have at most one implementation:
> 
>         /*
>          * kvm_arch_init makes sure there's at most one caller
>          * for architectures that support multiple implementations,
>          * like intel and amd on x86.
>          (...)
> 
> which is not true in our case due to this requirement of having two
> separate modules loading independently.
> 
> (tangent) We are already quite different from other architectures since
> we're not making use of kvm_arch_init and some other KVM hooks, such as
> kvm_arch_check_processor_compat. So while other archs have their init
> dispatched by kvm common code, our init and cleanup happens
> independently in the ppc-specific modules, which obviously works but is
> needlessly different and has subtleties in the ordering of operations
> wrt. the kvm common code. (tangent)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH kernel] KVM: PPC: Book3S: Merge powerpc's debugfs entry content into generic entry
From: Alexey Kardashevskiy @ 2021-09-03  5:22 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: kvm, Fabiano Rosas, Alexey Kardashevskiy, kvm-ppc, Paolo Bonzini

At the moment the generic KVM code creates an "%pid-%fd" entry per a KVM
instance; and the PPC HV KVM creates its own at "vm%pid".

The rproblems with the PPC entries are:
1. they do not allow multiple VMs in the same process (which is extremely
rare case mostly used by syzkaller fuzzer);
2. prone to race bugs like the generic KVM code had fixed in
commit 85cd39af14f4 ("KVM: Do not leak memory for duplicate debugfs
directories").

This defines kvm_arch_create_kvm_debugfs() similar to one for vcpus.

This defines 2 hooks in kvmppc_ops for allowing specific KVM
implementations to add necessary entries.

This makes use of already existing kvm_arch_create_vcpu_debugfs.

This removes no more used debugfs_dir pointers from PPC kvm_arch structs.

Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/include/asm/kvm_host.h    |  6 +++---
 arch/powerpc/include/asm/kvm_ppc.h     |  2 ++
 include/linux/kvm_host.h               |  3 +++
 arch/powerpc/kvm/book3s_64_mmu_hv.c    |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/powerpc/kvm/book3s_hv.c           | 30 +++++++++-----------------
 arch/powerpc/kvm/powerpc.c             | 12 +++++++++++
 virt/kvm/kvm_main.c                    |  3 +++
 8 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 2bcac6da0a4b..e4f2feb67b53 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -296,7 +296,6 @@ struct kvm_arch {
 	bool dawr1_enabled;
 	pgd_t *pgtable;
 	u64 process_table;
-	struct dentry *debugfs_dir;
 	struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
@@ -828,8 +827,6 @@ struct kvm_vcpu_arch {
 	struct kvmhv_tb_accumulator rm_exit;	/* real-mode exit code */
 	struct kvmhv_tb_accumulator guest_time;	/* guest execution */
 	struct kvmhv_tb_accumulator cede_time;	/* time napping inside guest */
-
-	struct dentry *debugfs_dir;
 #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
 };
 
@@ -868,4 +865,7 @@ static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+#define __KVM_HAVE_ARCH_KVM_DEBUGFS
+
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 6355a6980ccf..8b3f7f6e3f12 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -316,6 +316,8 @@ struct kvmppc_ops {
 	int (*svm_off)(struct kvm *kvm);
 	int (*enable_dawr1)(struct kvm *kvm);
 	bool (*hash_v3_possible)(void);
+	void (*create_kvm_debugfs)(struct kvm *kvm);
+	void (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
 };
 
 extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae7735b490b4..74d2c1c3df1b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1021,6 +1021,9 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state);
 #ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS
 void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
 #endif
+#ifdef __KVM_HAVE_ARCH_KVM_DEBUGFS
+void kvm_arch_create_kvm_debugfs(struct kvm *kvm);
+#endif
 
 int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index c63e263312a4..33dae253a0ac 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -2112,7 +2112,7 @@ static const struct file_operations debugfs_htab_fops = {
 
 void kvmppc_mmu_debugfs_init(struct kvm *kvm)
 {
-	debugfs_create_file("htab", 0400, kvm->arch.debugfs_dir, kvm,
+	debugfs_create_file("htab", 0400, kvm->debugfs_dentry, kvm,
 			    &debugfs_htab_fops);
 }
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index c5508744e14c..f4e083c20872 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -1452,7 +1452,7 @@ static const struct file_operations debugfs_radix_fops = {
 
 void kvmhv_radix_debugfs_init(struct kvm *kvm)
 {
-	debugfs_create_file("radix", 0400, kvm->arch.debugfs_dir, kvm,
+	debugfs_create_file("radix", 0400, kvm->debugfs_dentry, kvm,
 			    &debugfs_radix_fops);
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c8f12b056968..325b388c725a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2771,19 +2771,14 @@ static const struct file_operations debugfs_timings_ops = {
 };
 
 /* Create a debugfs directory for the vcpu */
-static void debugfs_vcpu_init(struct kvm_vcpu *vcpu, unsigned int id)
+static void kvmppc_arch_create_vcpu_debugfs_hv(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
 {
-	char buf[16];
-	struct kvm *kvm = vcpu->kvm;
-
-	snprintf(buf, sizeof(buf), "vcpu%u", id);
-	vcpu->arch.debugfs_dir = debugfs_create_dir(buf, kvm->arch.debugfs_dir);
-	debugfs_create_file("timings", 0444, vcpu->arch.debugfs_dir, vcpu,
+	debugfs_create_file("timings", 0444, debugfs_dentry, vcpu,
 			    &debugfs_timings_ops);
 }
 
 #else /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
-static void debugfs_vcpu_init(struct kvm_vcpu *vcpu, unsigned int id)
+static void kvmppc_arch_create_vcpu_debugfs_hv(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
 {
 }
 #endif /* CONFIG_KVM_BOOK3S_HV_EXIT_TIMING */
@@ -2907,8 +2902,6 @@ static int kvmppc_core_vcpu_create_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.cpu_type = KVM_CPU_3S_64;
 	kvmppc_sanity_check(vcpu);
 
-	debugfs_vcpu_init(vcpu, id);
-
 	return 0;
 }
 
@@ -5186,7 +5179,6 @@ void kvmppc_free_host_rm_ops(void)
 static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 {
 	unsigned long lpcr, lpid;
-	char buf[32];
 	int ret;
 
 	mutex_init(&kvm->arch.uvmem_lock);
@@ -5319,16 +5311,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 		kvm->arch.smt_mode = 1;
 	kvm->arch.emul_smt_mode = 1;
 
-	/*
-	 * Create a debugfs directory for the VM
-	 */
-	snprintf(buf, sizeof(buf), "vm%d", current->pid);
-	kvm->arch.debugfs_dir = debugfs_create_dir(buf, kvm_debugfs_dir);
+	return 0;
+}
+
+static void kvmppc_arch_create_kvm_debugfs_hv(struct kvm *kvm)
+{
 	kvmppc_mmu_debugfs_init(kvm);
 	if (radix_enabled())
 		kvmhv_radix_debugfs_init(kvm);
-
-	return 0;
 }
 
 static void kvmppc_free_vcores(struct kvm *kvm)
@@ -5342,8 +5332,6 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
-	debugfs_remove_recursive(kvm->arch.debugfs_dir);
-
 	if (!cpu_has_feature(CPU_FTR_ARCH_300))
 		kvm_hv_vm_deactivated();
 
@@ -5996,6 +5984,8 @@ static struct kvmppc_ops kvm_ops_hv = {
 	.svm_off = kvmhv_svm_off,
 	.enable_dawr1 = kvmhv_enable_dawr1,
 	.hash_v3_possible = kvmppc_hash_v3_possible,
+	.create_vcpu_debugfs = kvmppc_arch_create_vcpu_debugfs_hv,
+	.create_kvm_debugfs = kvmppc_arch_create_kvm_debugfs_hv,
 };
 
 static int kvm_init_subcore_bitmap(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b9..9c18d599171b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -2505,3 +2505,15 @@ int kvm_arch_init(void *opaque)
 }
 
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ppc_instr);
+
+void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry)
+{
+	if (vcpu->kvm->arch.kvm_ops->create_vcpu_debugfs)
+		vcpu->kvm->arch.kvm_ops->create_vcpu_debugfs(vcpu, debugfs_dentry);
+}
+
+void kvm_arch_create_kvm_debugfs(struct kvm *kvm)
+{
+	if (kvm->arch.kvm_ops->create_kvm_debugfs)
+		kvm->arch.kvm_ops->create_kvm_debugfs(kvm);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b50dbe269f4b..f9d423535f00 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -954,6 +954,9 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 				    kvm->debugfs_dentry, stat_data,
 				    &stat_fops_per_vm);
 	}
+#ifdef __KVM_HAVE_ARCH_KVM_DEBUGFS
+	kvm_arch_create_kvm_debugfs(kvm);
+#endif
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related

* [PATCH] powerpc: Remove unused prototype for of_show_percpuinfo
From: Daniel Axtens @ 2021-09-03  6:32 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Daniel Axtens

commit 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
removed of_show_percpuinfo but didn't remove the prototype.

Remove it.

Fixes: 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/platforms/powermac/pmac.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/powermac/pmac.h b/arch/powerpc/platforms/powermac/pmac.h
index 0d715db434dc..29d2036dcc9d 100644
--- a/arch/powerpc/platforms/powermac/pmac.h
+++ b/arch/powerpc/platforms/powermac/pmac.h
@@ -27,7 +27,6 @@ extern void pmac_nvram_update(void);
 extern unsigned char pmac_nvram_read_byte(int addr);
 extern void pmac_nvram_write_byte(int addr, unsigned char val);
 extern void pmac_pcibios_after_init(void);
-extern int of_show_percpuinfo(struct seq_file *m, int i);
 
 extern void pmac_setup_pci_dma(void);
 extern void pmac_check_ht_link(void);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] powerpc: Remove unused prototype for of_show_percpuinfo
From: Andrew Donnellan @ 2021-09-03  6:36 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev
In-Reply-To: <20210903063246.70691-1-dja@axtens.net>

On 3/9/21 4:32 pm, Daniel Axtens wrote:
> commit 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
> removed of_show_percpuinfo but didn't remove the prototype.
> 
> Remove it.
> 
> Fixes: 6d7f58b04d82 ("[PATCH] powerpc: Some minor cleanups to setup_32.c")
> Signed-off-by: Daniel Axtens <dja@axtens.net>

I grepped through, confirmed there's no other references to 
of_show_percpuinfo().

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

(First linuxppc patch written and sent live on a twitch stream?)

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply

* Re: [PATCH] powerpc/machdep: Remove stale functions from ppc_md structure
From: Daniel Axtens @ 2021-09-03  6:50 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu>

Hi Christophe,

> ppc_md.iommu_save() is not set anymore by any platform after
> commit c40785ad305b ("powerpc/dart: Use a cachable DART").
> So iommu_save() has become a nop and can be removed.

I wonder if it makes sense to have an iommu_restore() without an
iommu_save. Only dart_iommu.c defines an iommu_restore(), but I couldn't
figure out if it was safe to remove and it seems like it still did
something...

> ppc_md.show_percpuinfo() is not set anymore by any platform after
> commit 4350147a816b ("[PATCH] ppc64: SMU based macs cpufreq support").
>
> Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were
> removed by commit 0f03a43b8f0f ("[POWERPC] Remove todc code from
> ARCH=powerpc")
>
> Last user of kgdb_map_scc() was removed by commit 17ce452f7ea3 ("kgdb,
> powerpc: arch specific powerpc kgdb support").
>
> ppc.machine_kexec_prepare() has not been used since
> commit 8ee3e0d69623 ("powerpc: Remove the main legacy iSerie platform
> code"). This allows the removal of machine_kexec_prepare() and the
> rename of default_machine_kexec_prepare() into machine_kexec_prepare()

I think you should also remove the prototype from
arch/powerpc/include/asm/kexec.h

Apart from that:
Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel Axtens

^ permalink raw reply

* Re: [PATCH] powerpc/time: Remove generic_suspend_{dis/en}able_irqs()
From: Daniel Axtens @ 2021-09-03  7:02 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Commit d75d68cfef49 ("powerpc: Clean up obsolete code relating to
> decrementer and timebase") made generic_suspend_enable_irqs() and
> generic_suspend_disable_irqs() static.
>
> Fold them into their only caller.

This does what it says, and simplifies the code.
Reviewed-by: Daniel Axtens <dja@axtens.net>

Kind regards,
Daniel

^ permalink raw reply

* [PATCH] powerpc/code-patching: Return error on patch_branch() out-of-range failure
From: Christophe Leroy @ 2021-09-03  8:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Do not silentely ignore a failure of create_branch() in
patch_branch(). Return -ERANGE.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/lib/code-patching.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index f9a3019e37b4..0bc9cc0416b8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -202,7 +202,9 @@ int patch_branch(u32 *addr, unsigned long target, int flags)
 {
 	struct ppc_inst instr;
 
-	create_branch(&instr, addr, target, flags);
+	if (create_branch(&instr, addr, target, flags))
+		return -ERANGE;
+
 	return patch_instruction(addr, instr);
 }
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] ftrace: Cleanup ftrace_dyn_arch_init()
From: Heiko Carstens @ 2021-09-03  8:21 UTC (permalink / raw)
  To: Weizhao Ouyang
  Cc: dalias, linux-ia64, linux-sh, linux-mips, James.Bottomley, guoren,
	hpa, sparclinux, linux-riscv, deanbo422, will, linux-s390, ysato,
	deller, x86, linux, linux-csky, borntraeger, mingo,
	catalin.marinas, aou, gor, rostedt, bp, green.hu, paul.walmsley,
	tglx, linux-arm-kernel, monstr, tsbogend, linux-parisc, nickhu,
	linux-kernel, palmer, paulus, linuxppc-dev, davem
In-Reply-To: <20210903071817.1162938-1-o451686892@gmail.com>

On Fri, Sep 03, 2021 at 03:18:17PM +0800, Weizhao Ouyang wrote:
> Most ARCHs use empty ftrace_dyn_arch_init(), introduce a weak common
> ftrace_dyn_arch_init() to cleanup them.
> 
> Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
> ---
>  arch/arm/kernel/ftrace.c          | 5 -----
>  arch/arm64/kernel/ftrace.c        | 5 -----
>  arch/csky/kernel/ftrace.c         | 5 -----
>  arch/ia64/kernel/ftrace.c         | 6 ------
>  arch/microblaze/kernel/ftrace.c   | 5 -----
>  arch/mips/include/asm/ftrace.h    | 2 ++
>  arch/nds32/kernel/ftrace.c        | 5 -----
>  arch/parisc/kernel/ftrace.c       | 5 -----
>  arch/powerpc/include/asm/ftrace.h | 4 ++++
>  arch/riscv/kernel/ftrace.c        | 5 -----
>  arch/s390/kernel/ftrace.c         | 5 -----
>  arch/sh/kernel/ftrace.c           | 5 -----
>  arch/sparc/kernel/ftrace.c        | 5 -----
>  arch/x86/kernel/ftrace.c          | 5 -----
>  include/linux/ftrace.h            | 1 -
>  kernel/trace/ftrace.c             | 5 +++++
>  16 files changed, 11 insertions(+), 62 deletions(-)

For s390:
Acked-by: Heiko Carstens <hca@linux.ibm.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox