LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] Implement generic cc_platform_has() helper function
From: Borislav Petkov @ 2021-09-28 19:10 UTC (permalink / raw)
  To: LKML
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, kvm, David Airlie,
	dri-devel, platform-driver-x86, Paul Mackerras, Will Deacon,
	Ard Biesheuvel, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, Christoph Hellwig, Christian Borntraeger,
	VMware Graphics, Dave Young, Tom Lendacky, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Andy Lutomirski, Kirill A. Shutemov, kexec, iommu, Daniel Vetter,
	linuxppc-dev

From: Borislav Petkov <bp@suse.de>

Hi all,

here's v4 of the cc_platform_has() patchset with feedback incorporated.

I'm going to route this through tip if there are no objections.

Thx.

Tom Lendacky (8):
  x86/ioremap: Selectively build arch override encryption functions
  arch/cc: Introduce a function to check for confidential computing
    features
  x86/sev: Add an x86 version of cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  treewide: Replace the use of mem_encrypt_active() with
    cc_platform_has()

 arch/Kconfig                                 |  3 +
 arch/powerpc/include/asm/mem_encrypt.h       |  5 --
 arch/powerpc/platforms/pseries/Kconfig       |  1 +
 arch/powerpc/platforms/pseries/Makefile      |  2 +
 arch/powerpc/platforms/pseries/cc_platform.c | 26 ++++++
 arch/powerpc/platforms/pseries/svm.c         |  5 +-
 arch/s390/include/asm/mem_encrypt.h          |  2 -
 arch/x86/Kconfig                             |  1 +
 arch/x86/include/asm/io.h                    |  8 ++
 arch/x86/include/asm/kexec.h                 |  2 +-
 arch/x86/include/asm/mem_encrypt.h           | 12 +--
 arch/x86/kernel/Makefile                     |  6 ++
 arch/x86/kernel/cc_platform.c                | 69 +++++++++++++++
 arch/x86/kernel/crash_dump_64.c              |  4 +-
 arch/x86/kernel/head64.c                     |  9 +-
 arch/x86/kernel/kvm.c                        |  3 +-
 arch/x86/kernel/kvmclock.c                   |  4 +-
 arch/x86/kernel/machine_kexec_64.c           | 19 +++--
 arch/x86/kernel/pci-swiotlb.c                |  9 +-
 arch/x86/kernel/relocate_kernel_64.S         |  2 +-
 arch/x86/kernel/sev.c                        |  6 +-
 arch/x86/kvm/svm/svm.c                       |  3 +-
 arch/x86/mm/ioremap.c                        | 18 ++--
 arch/x86/mm/mem_encrypt.c                    | 55 ++++--------
 arch/x86/mm/mem_encrypt_identity.c           |  9 +-
 arch/x86/mm/pat/set_memory.c                 |  3 +-
 arch/x86/platform/efi/efi_64.c               |  9 +-
 arch/x86/realmode/init.c                     |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      |  4 +-
 drivers/gpu/drm/drm_cache.c                  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c          |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c          |  6 +-
 drivers/iommu/amd/init.c                     |  7 +-
 drivers/iommu/amd/iommu.c                    |  3 +-
 drivers/iommu/amd/iommu_v2.c                 |  3 +-
 drivers/iommu/iommu.c                        |  3 +-
 fs/proc/vmcore.c                             |  6 +-
 include/linux/cc_platform.h                  | 88 ++++++++++++++++++++
 include/linux/mem_encrypt.h                  |  4 -
 kernel/dma/swiotlb.c                         |  4 +-
 40 files changed, 310 insertions(+), 129 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
 create mode 100644 arch/x86/kernel/cc_platform.c
 create mode 100644 include/linux/cc_platform.h

-- 
2.29.2


^ permalink raw reply

* [PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()
From: Borislav Petkov @ 2021-09-28 19:10 UTC (permalink / raw)
  To: LKML
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, kvm, David Airlie,
	dri-devel, platform-driver-x86, Paul Mackerras, Will Deacon,
	Ard Biesheuvel, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, Christoph Hellwig, Christian Borntraeger,
	VMware Graphics, Dave Young, Tom Lendacky, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Andy Lutomirski, Kirill A. Shutemov, kexec, iommu, Daniel Vetter,
	linuxppc-dev
In-Reply-To: <20210928191009.32551-1-bp@alien8.de>

From: Tom Lendacky <thomas.lendacky@amd.com>

Introduce an x86 version of the cc_platform_has() function. This will be
used to replace vendor specific calls like sme_active(), sev_active(),
etc.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/Kconfig                   |  1 +
 arch/x86/include/asm/mem_encrypt.h |  1 +
 arch/x86/kernel/Makefile           |  6 +++
 arch/x86/kernel/cc_platform.c      | 69 ++++++++++++++++++++++++++++++
 arch/x86/mm/mem_encrypt.c          |  1 +
 5 files changed, 78 insertions(+)
 create mode 100644 arch/x86/kernel/cc_platform.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab83c22d274e..9f190ec4f953 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
 	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
 	select INSTRUCTION_DECODER
 	select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
+	select ARCH_HAS_CC_PLATFORM
 	help
 	  Say yes to enable support for the encryption of system memory.
 	  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..3fb9f5ebefa4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/init.h>
+#include <linux/cc_platform.h>
 
 #include <asm/bootparam.h>
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8f4e8fa6ed75..2ff3e600f426 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
+CFLAGS_REMOVE_cc_platform.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o				:= n
@@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o			:= n
 KASAN_SANITIZE_stacktrace.o				:= n
 KASAN_SANITIZE_paravirt.o				:= n
 KASAN_SANITIZE_sev.o					:= n
+KASAN_SANITIZE_cc_platform.o				:= n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -47,6 +49,7 @@ endif
 KCOV_INSTRUMENT		:= n
 
 CFLAGS_head$(BITS).o	+= -fno-stack-protector
+CFLAGS_cc_platform.o	+= -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
@@ -147,6 +150,9 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)		+= unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)		+= sev.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
new file mode 100644
index 000000000000..03bb2f343ddb
--- /dev/null
+++ b/arch/x86/kernel/cc_platform.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ */
+
+#include <linux/export.h>
+#include <linux/cc_platform.h>
+#include <linux/mem_encrypt.h>
+
+#include <asm/processor.h>
+
+static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_INTEL_TDX_GUEST
+	return false;
+#else
+	return false;
+#endif
+}
+
+/*
+ * SME and SEV are very similar but they are not the same, so there are
+ * times that the kernel will need to distinguish between SME and SEV. The
+ * cc_platform_has() function is used for this.  When a distinction isn't
+ * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
+ *
+ * The trampoline code is a good example for this requirement.  Before
+ * paging is activated, SME will access all memory as decrypted, but SEV
+ * will access all memory as encrypted.  So, when APs are being brought
+ * up under SME the trampoline area cannot be encrypted, whereas under SEV
+ * the trampoline area must be encrypted.
+ */
+static bool amd_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	switch (attr) {
+	case CC_ATTR_MEM_ENCRYPT:
+		return sme_me_mask;
+
+	case CC_ATTR_HOST_MEM_ENCRYPT:
+		return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+	case CC_ATTR_GUEST_MEM_ENCRYPT:
+		return sev_status & MSR_AMD64_SEV_ENABLED;
+
+	case CC_ATTR_GUEST_STATE_ENCRYPT:
+		return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+
+	default:
+		return false;
+	}
+#else
+	return false;
+#endif
+}
+
+
+bool cc_platform_has(enum cc_attr attr)
+{
+	if (sme_me_mask)
+		return amd_cc_platform_has(attr);
+
+	return false;
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..e29b1418d00c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -20,6 +20,7 @@
 #include <linux/bitops.h>
 #include <linux/dma-mapping.h>
 #include <linux/virtio_config.h>
+#include <linux/cc_platform.h>
 
 #include <asm/tlbflush.h>
 #include <asm/fixmap.h>
-- 
2.29.2


^ permalink raw reply related

* [PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions
From: Borislav Petkov @ 2021-09-28 19:10 UTC (permalink / raw)
  To: LKML
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, kvm, David Airlie,
	dri-devel, platform-driver-x86, Paul Mackerras, Will Deacon,
	Ard Biesheuvel, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, Christoph Hellwig, Christian Borntraeger,
	VMware Graphics, Dave Young, Tom Lendacky, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Andy Lutomirski, Kirill A. Shutemov, kexec, iommu, Daniel Vetter,
	linuxppc-dev
In-Reply-To: <20210928191009.32551-1-bp@alien8.de>

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation for other uses of the cc_platform_has() function
besides AMD's memory encryption support, selectively build the
AMD memory encryption architecture override functions only when
CONFIG_AMD_MEM_ENCRYPT=y. These functions are:

- early_memremap_pgprot_adjust()
- arch_memremap_can_ram_remap()

Additionally, routines that are only invoked by these architecture
override functions can also be conditionally built. These functions are:

- memremap_should_map_decrypted()
- memremap_is_efi_data()
- memremap_is_setup_data()
- early_memremap_is_setup_data()

And finally, phys_mem_access_encrypted() is conditionally built as well,
but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
not set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/io.h | 8 ++++++++
 arch/x86/mm/ioremap.c     | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 841a5d104afa..5c6a4af0b911 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -391,6 +391,7 @@ extern void arch_io_free_memtype_wc(resource_size_t start, resource_size_t size)
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 extern bool arch_memremap_can_ram_remap(resource_size_t offset,
 					unsigned long size,
 					unsigned long flags);
@@ -398,6 +399,13 @@ extern bool arch_memremap_can_ram_remap(resource_size_t offset,
 
 extern bool phys_mem_access_encrypted(unsigned long phys_addr,
 				      unsigned long size);
+#else
+static inline bool phys_mem_access_encrypted(unsigned long phys_addr,
+					     unsigned long size)
+{
+	return true;
+}
+#endif
 
 /**
  * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 60ade7dd71bd..ccff76cedd8f 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -508,6 +508,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 	memunmap((void *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
@@ -746,7 +747,6 @@ bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size)
 	return arch_memremap_can_ram_remap(phys_addr, size, 0);
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
 				      unsigned long size)
-- 
2.29.2


^ permalink raw reply related

* [PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features
From: Borislav Petkov @ 2021-09-28 19:10 UTC (permalink / raw)
  To: LKML
  Cc: Kuppuswamy Sathyanarayanan, linux-efi, kvm, David Airlie,
	dri-devel, platform-driver-x86, Paul Mackerras, Will Deacon,
	Ard Biesheuvel, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, Christoph Hellwig, Christian Borntraeger,
	VMware Graphics, Dave Young, Tom Lendacky, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Andy Lutomirski, Kirill A. Shutemov, kexec, iommu, Daniel Vetter,
	linuxppc-dev
In-Reply-To: <20210928191009.32551-1-bp@alien8.de>

From: Tom Lendacky <thomas.lendacky@amd.com>

In preparation for other confidential computing technologies, introduce
a generic helper function, cc_platform_has(), that can be used to
check for specific active confidential computing attributes, like
memory encryption. This is intended to eliminate having to add multiple
technology-specific checks to the code (e.g. if (sev_active() ||
tdx_active() || ... ).

 [ bp: s/_CC_PLATFORM_H/_LINUX_CC_PLATFORM_H/g ]

Co-developed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/Kconfig                |  3 ++
 include/linux/cc_platform.h | 88 +++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)
 create mode 100644 include/linux/cc_platform.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..d1e69d6e8498 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1234,6 +1234,9 @@ config RELR
 config ARCH_HAS_MEM_ENCRYPT
 	bool
 
+config ARCH_HAS_CC_PLATFORM
+	bool
+
 config HAVE_SPARSE_SYSCALL_NR
        bool
        help
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
new file mode 100644
index 000000000000..a075b70b9a70
--- /dev/null
+++ b/include/linux/cc_platform.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ */
+
+#ifndef _LINUX_CC_PLATFORM_H
+#define _LINUX_CC_PLATFORM_H
+
+#include <linux/types.h>
+#include <linux/stddef.h>
+
+/**
+ * enum cc_attr - Confidential computing attributes
+ *
+ * These attributes represent confidential computing features that are
+ * currently active.
+ */
+enum cc_attr {
+	/**
+	 * @CC_ATTR_MEM_ENCRYPT: Memory encryption is active
+	 *
+	 * The platform/OS is running with active memory encryption. This
+	 * includes running either as a bare-metal system or a hypervisor
+	 * and actively using memory encryption or as a guest/virtual machine
+	 * and actively using memory encryption.
+	 *
+	 * Examples include SME, SEV and SEV-ES.
+	 */
+	CC_ATTR_MEM_ENCRYPT,
+
+	/**
+	 * @CC_ATTR_HOST_MEM_ENCRYPT: Host memory encryption is active
+	 *
+	 * The platform/OS is running as a bare-metal system or a hypervisor
+	 * and actively using memory encryption.
+	 *
+	 * Examples include SME.
+	 */
+	CC_ATTR_HOST_MEM_ENCRYPT,
+
+	/**
+	 * @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
+	 *
+	 * The platform/OS is running as a guest/virtual machine and actively
+	 * using memory encryption.
+	 *
+	 * Examples include SEV and SEV-ES.
+	 */
+	CC_ATTR_GUEST_MEM_ENCRYPT,
+
+	/**
+	 * @CC_ATTR_GUEST_STATE_ENCRYPT: Guest state encryption is active
+	 *
+	 * The platform/OS is running as a guest/virtual machine and actively
+	 * using memory encryption and register state encryption.
+	 *
+	 * Examples include SEV-ES.
+	 */
+	CC_ATTR_GUEST_STATE_ENCRYPT,
+};
+
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+
+/**
+ * cc_platform_has() - Checks if the specified cc_attr attribute is active
+ * @attr: Confidential computing attribute to check
+ *
+ * The cc_platform_has() function will return an indicator as to whether the
+ * specified Confidential Computing attribute is currently active.
+ *
+ * Context: Any context
+ * Return:
+ * * TRUE  - Specified Confidential Computing attribute is active
+ * * FALSE - Specified Confidential Computing attribute is not active
+ */
+bool cc_platform_has(enum cc_attr attr);
+
+#else	/* !CONFIG_ARCH_HAS_CC_PLATFORM */
+
+static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+
+#endif	/* CONFIG_ARCH_HAS_CC_PLATFORM */
+
+#endif	/* _LINUX_CC_PLATFORM_H */
-- 
2.29.2


^ permalink raw reply related

* [PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
From: Borislav Petkov @ 2021-09-28 19:10 UTC (permalink / raw)
  To: LKML
  Cc: Kuppuswamy, Sathyanarayanan, linux-efi, kvm, David Airlie,
	dri-devel, platform-driver-x86, Paul Mackerras, Will Deacon,
	Ard Biesheuvel, linux-s390, Andi Kleen, Baoquan He, Joerg Roedel,
	x86, amd-gfx, Christoph Hellwig, Christian Borntraeger,
	VMware Graphics, Dave Young, Tom Lendacky, Thomas Zimmermann,
	Vasily Gorbik, Heiko Carstens, Maarten Lankhorst, Maxime Ripard,
	Andy Lutomirski, Kirill A. Shutemov, kexec, iommu, Daniel Vetter,
	linuxppc-dev
In-Reply-To: <20210928191009.32551-1-bp@alien8.de>

From: Tom Lendacky <thomas.lendacky@amd.com>

Introduce a powerpc version of the cc_platform_has() function. This will
be used to replace the powerpc mem_encrypt_active() implementation, so
the implementation will initially only support the CC_ATTR_MEM_ENCRYPT
attribute.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/Kconfig       |  1 +
 arch/powerpc/platforms/pseries/Makefile      |  2 ++
 arch/powerpc/platforms/pseries/cc_platform.c | 26 ++++++++++++++++++++
 3 files changed, 29 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c

diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 5e037df2a3a1..2e57391e0778 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -159,6 +159,7 @@ config PPC_SVM
 	select SWIOTLB
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+	select ARCH_HAS_CC_PLATFORM
 	help
 	 There are certain POWER platforms which support secure guests using
 	 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..41d8aee98da4 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -31,3 +31,5 @@ obj-$(CONFIG_FA_DUMP)		+= rtas-fadump.o
 
 obj-$(CONFIG_SUSPEND)		+= suspend.o
 obj-$(CONFIG_PPC_VAS)		+= vas.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM)	+= cc_platform.o
diff --git a/arch/powerpc/platforms/pseries/cc_platform.c b/arch/powerpc/platforms/pseries/cc_platform.c
new file mode 100644
index 000000000000..e8021af83a19
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/cc_platform.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <thomas.lendacky@amd.com>
+ */
+
+#include <linux/export.h>
+#include <linux/cc_platform.h>
+
+#include <asm/machdep.h>
+#include <asm/svm.h>
+
+bool cc_platform_has(enum cc_attr attr)
+{
+	switch (attr) {
+	case CC_ATTR_MEM_ENCRYPT:
+		return is_secure_guest();
+
+	default:
+		return false;
+	}
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
-- 
2.29.2


^ permalink raw reply related

* Re: [PATCH] powerpc: fix unbalanced node refcount in check_kvm_guest()
From: Tyrel Datwyler @ 2021-09-28 18:57 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: srikar, npiggin
In-Reply-To: <20210928124550.132020-1-nathanl@linux.ibm.com>

On 9/28/21 5:45 AM, Nathan Lynch wrote:
> When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
> returns without performing a matching put for the lookup, leaving the
> node's reference count elevated.
> 
> Add the necessary call to of_node_put(), rearranging the code slightly to
> avoid repetition or goto.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> Fixes: 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")

Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>

^ permalink raw reply

* Re: [PATCH v2 9/9] iommu/vt-d: Use pci core's DVSEC functionality
From: Dan Williams @ 2021-09-28 17:54 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Kan Liang, David Woodhouse, Lu Baolu
In-Reply-To: <20210923172647.72738-10-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Reduce maintenance burden of DVSEC query implementation by using the
> centralized PCI core implementation.
>
> Cc: iommu@lists.linux-foundation.org
> Cc: David Woodhouse <dwmw2@infradead.org>
> Cc: Lu Baolu <baolu.lu@linux.intel.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..30c97181f0ae 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5398,20 +5398,7 @@ static int intel_iommu_disable_sva(struct device *dev)
>   */
>  static int siov_find_pci_dvsec(struct pci_dev *pdev)
>  {
> -       int pos;
> -       u16 vendor, id;
> -
> -       pos = pci_find_next_ext_capability(pdev, 0, 0x23);
> -       while (pos) {
> -               pci_read_config_word(pdev, pos + 4, &vendor);
> -               pci_read_config_word(pdev, pos + 8, &id);
> -               if (vendor == PCI_VENDOR_ID_INTEL && id == 5)
> -                       return pos;
> -
> -               pos = pci_find_next_ext_capability(pdev, pos, 0x23);
> -       }
> -
> -       return 0;
> +       return pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_INTEL, 5);
>  }

Same comments as the CXL patch, siov_find_pci_dvsec() doesn't seem to
have a reason to exist anymore. What is 5?

^ permalink raw reply

* Re: [PATCH v3 9/9] powerpc/mm: Use is_kernel_text() and is_kernel_inittext() helper
From: Christophe Leroy @ 2021-09-28 17:51 UTC (permalink / raw)
  To: Kefeng Wang, arnd, linux-arch, linux-kernel, linuxppc-dev,
	rostedt, mingo, davem, ast, ryabinin.a.a, akpm
  Cc: bpf, paulus
In-Reply-To: <20210926072048.190336-10-wangkefeng.wang@huawei.com>



Le 26/09/2021 à 09:20, Kefeng Wang a écrit :
> Use is_kernel_text() and is_kernel_inittext() helper to simplify code,
> also drop etext, _stext, _sinittext, _einittext declaration which
> already declared in section.h.
> 
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   arch/powerpc/mm/pgtable_32.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index dcf5ecca19d9..13c798308c2e 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -33,8 +33,6 @@
>   
>   #include <mm/mmu_decl.h>
>   
> -extern char etext[], _stext[], _sinittext[], _einittext[];
> -
>   static u8 early_fixmap_pagetable[FIXMAP_PTE_SIZE] __page_aligned_data;
>   
>   notrace void __init early_ioremap_init(void)
> @@ -104,14 +102,13 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
>   {
>   	unsigned long v, s;
>   	phys_addr_t p;
> -	int ktext;
> +	bool ktext;
>   
>   	s = offset;
>   	v = PAGE_OFFSET + s;
>   	p = memstart_addr + s;
>   	for (; s < top; s += PAGE_SIZE) {
> -		ktext = ((char *)v >= _stext && (char *)v < etext) ||
> -			((char *)v >= _sinittext && (char *)v < _einittext);
> +		ktext = (is_kernel_text(v) || is_kernel_inittext(v));

I think we could use core_kernel_next() instead.


>   		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
>   		v += PAGE_SIZE;
>   		p += PAGE_SIZE;
> 




Build failure on mpc885_ads_defconfig

arch/powerpc/mm/pgtable_32.c: In function '__mapin_ram_chunk':
arch/powerpc/mm/pgtable_32.c:111:26: error: implicit declaration of 
function 'is_kernel_text'; did you mean 'is_kernel_inittext'? 
[-Werror=implicit-function-declaration]
   111 |                 ktext = (is_kernel_text(v) || 
is_kernel_inittext(v));
       |                          ^~~~~~~~~~~~~~
       |                          is_kernel_inittext
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:277: arch/powerpc/mm/pgtable_32.o] 
Error 1
make[1]: *** [scripts/Makefile.build:540: arch/powerpc/mm] Error 2
make: *** [Makefile:1868: arch/powerpc] Error 2



^ permalink raw reply

* Re: [PATCH v2 7/9] cxl/pci: Use pci core's DVSEC functionality
From: Dan Williams @ 2021-09-28 17:43 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-8-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Reduce maintenance burden of DVSEC query implementation by using the
> centralized PCI core implementation.
>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  drivers/cxl/pci.c | 20 +-------------------
>  1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 5eaf2736f779..79d4d9b16d83 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -340,25 +340,7 @@ static void cxl_pci_unmap_regblock(struct cxl_mem *cxlm, struct cxl_register_map
>
>  static int cxl_pci_dvsec(struct pci_dev *pdev, int dvsec)

cxl_pci_dvsec() has no reason to exist anymore. Let's just have the
caller use pci_find_dvsec_capability() directly.

^ permalink raw reply

* Re: [PATCH v2 5/9] cxl/pci: Make more use of cxl_register_map
From: Dan Williams @ 2021-09-28 17:41 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-6-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> The structure exists to pass around information about register mapping.
> Using it more extensively cleans up many existing functions.

I would have liked to have seen "add @base to cxl_register_map" and
"use @map for @bar and @offset arguments" somewhere in this changelog
to set expectations for what changes are included. That would have
also highlighted that adding a @base to cxl_register_map deserves its
own patch vs the conversion of @bar and @offset to instead use
@map->bar and @map->offset. Can you resend with that split and those
mentions?

^ permalink raw reply

* Re: [PATCH v2 4/9] cxl/pci: Refactor cxl_pci_setup_regs
From: Dan Williams @ 2021-09-28 17:35 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-5-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> In preparation for moving parts of register mapping to cxl_core, the
> cxl_pci driver is refactored to utilize a new helper to find register
> blocks by type.
>
> cxl_pci scanned through all register blocks and mapping the ones that
> the driver will use. This logic is inverted so that the driver
> specifically requests the register blocks from a new helper. Under the
> hood, the same implementation of scanning through all register locator
> DVSEC entries exists.
>
> There are 2 behavioral changes (#2 is arguable):
> 1. A dev_err is introduced if cxl_map_regs fails.
> 2. The previous logic would try to map component registers and device
>    registers multiple times if there were present and keep the mapping
>    of the last one found (furthest offset in the register locator).
>    While this is disallowed in the spec, CXL 2.0 8.1.9: "Each register
>    block identifier shall only occur once in the Register Locator DVSEC
>    structure" it was how the driver would respond to the spec violation.
>    The new logic will take the first found register block by type and
>    move on.
>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
>
> ---
> Changes since v1:

No changes? Luckily git am strips this section...

Overall I think this refactor can be broken down further for
readability and cleanup the long standing problem that the driver maps
component registers for no reason. The main contributing factor to
readability is that cxl_setup_pci_regs() still exists after the
refactor, which also contributes to the component register problem. If
the register mapping is up leveled to the caller of
cxl_setup_pci_regs() (and drops mapping component registers) then a
follow-on patch to rename cxl_setup_pci_regs to find_register_block
becomes easier to read. Moving the cxl_register_map array out of
cxl_setup_pci_regs() also makes a later patch to pass component
register enumeration details to the endpoint-port that much cleaner.

^ permalink raw reply

* Re: [PATCH kernel] powerps/pseries/dma: Add support for 2M IOMMU page size
From: Leonardo Brás @ 2021-09-28 17:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Frederic Barrat, Murilo Vicentini, Travis Pizel,
	Leonardo Augusto Guimaraes Garcia, Brian J King
In-Reply-To: <20210928101521.3956331-1-aik@ozlabs.ru>

Hello Alexey,

On Tue, 2021-09-28 at 20:15 +1000, Alexey Kardashevskiy wrote:
> The upcoming PAPR spec adds a 2M page size, bit 23 right after the 16G
> page
> size in the "ibm,query-pe-dma-window" call.
> 
> This adds support for the new page size. Since the new page size is out
> of sorted order, this changes the loop to not assume that shift[] is
> sorted.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> This might not work if PHYP keeps rejecting new window requests for
> less
> than 32768 TCEs. This is needed:
> https://github.com/aik/linux/commit/8cc8fa5ba5b3b4a18efbc9d81d9e5b85ca7c8a95
> 
> 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/iommu.c
> b/arch/powerpc/platforms/pseries/iommu.c
> index c741689a5165..237bf405b0cb 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1159,14 +1159,15 @@ static void reset_dma_window(struct pci_dev
> *dev, struct device_node *par_dn)
>  /* Return largest page shift based on "IO Page Sizes" output of
> ibm,query-pe-dma-window. */
>  static int iommu_get_page_shift(u32 query_page_size)
>  {
> -       /* Supported IO page-sizes according to LoPAR */
> +       /* Supported IO page-sizes according to LoPAR, note that 2M is
> out of order */
>         const int shift[] = {
>                 __builtin_ctzll(SZ_4K),   __builtin_ctzll(SZ_64K),
> __builtin_ctzll(SZ_16M),
>                 __builtin_ctzll(SZ_32M),  __builtin_ctzll(SZ_64M),
> __builtin_ctzll(SZ_128M),
> -               __builtin_ctzll(SZ_256M), __builtin_ctzll(SZ_16G)
> +               __builtin_ctzll(SZ_256M), __builtin_ctzll(SZ_16G),
> __builtin_ctzll(SZ_2M)
>         };
>  
>         int i = ARRAY_SIZE(shift) - 1;
> +       int ret = 0;
>  
>         /*
>          * On LoPAR, ibm,query-pe-dma-window outputs "IO Page Sizes"
> using a bit field:
> @@ -1176,11 +1177,10 @@ static int iommu_get_page_shift(u32
> query_page_size)
>          */
>         for (; i >= 0 ; i--) {
>                 if (query_page_size & (1 << i))
> -                       return shift[i];
> +                       ret = max(ret, shift[i]);
>         }
>  
> -       /* No valid page size found. */
> -       return 0;
> +       return ret;
>  }
>  
>  static struct property *ddw_property_create(const char *propname, u32
> liobn, u64 dma_addr,

Looks great to me.

FWIW:
Reviewed-by: Leonardo Bras <leobras.c@gmail.com>

Best regards,
Leonardo



^ permalink raw reply

* Re: [PATCH v4 4/8] PCI: replace pci_dev::driver usage that gets the driver name
From: Bjorn Helgaas @ 2021-09-28 17:17 UTC (permalink / raw)
  To: Uwe Kleine-König, Oliver O'Halloran, Russell Currey
  Cc: linux-pci, Alexander Duyck, oss-drivers, Paul Mackerras,
	Herbert Xu, Rafał Miłecki, Jesse Brandeburg,
	Ido Schimmel, Jakub Kicinski, Yisen Zhuang, Vadym Kochan,
	Uwe Kleine-König, Michael Buesch, Jiri Pirko, Salil Mehta,
	netdev, linux-wireless, linux-kernel, Taras Chornyi, Zhou Wang,
	linux-crypto, kernel, Simon Horman, Oliver O'Halloran,
	linuxppc-dev, David S. Miller
In-Reply-To: <20210927204326.612555-5-uwe@kleine-koenig.org>

[+to Oliver, Russell for eeh_driver_name() question below]

On Mon, Sep 27, 2021 at 10:43:22PM +0200, Uwe Kleine-König wrote:
> From: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> 
> struct pci_dev::driver holds (apart from a constant offset) the same
> data as struct pci_dev::dev->driver. With the goal to remove struct
> pci_dev::driver to get rid of data duplication replace getting the
> driver name by dev_driver_string() which implicitly makes use of struct
> pci_dev::dev->driver.

When you repost to fix the build issue, can you capitalize the subject
line to match the other?

Also, would you mind using "pci_dev.driver" instead of
"pci_dev::driver"?  AFAIK, the "::" operator is not actually part of
C, so I think it's more confusing than useful.

> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
>  arch/powerpc/include/asm/ppc-pci.h                   | 9 ++++++++-
>  drivers/bcma/host_pci.c                              | 7 ++++---
>  drivers/crypto/hisilicon/qm.c                        | 2 +-
>  drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c   | 2 +-
>  drivers/net/ethernet/marvell/prestera/prestera_pci.c | 2 +-
>  drivers/net/ethernet/mellanox/mlxsw/pci.c            | 2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +-
>  drivers/ssb/pcihost_wrapper.c                        | 8 +++++---
>  8 files changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ppc-pci.h b/arch/powerpc/include/asm/ppc-pci.h
> index 2b9edbf6e929..e8f1795a2acf 100644
> --- a/arch/powerpc/include/asm/ppc-pci.h
> +++ b/arch/powerpc/include/asm/ppc-pci.h
> @@ -57,7 +57,14 @@ void eeh_sysfs_remove_device(struct pci_dev *pdev);
>  
>  static inline const char *eeh_driver_name(struct pci_dev *pdev)
>  {
> -	return (pdev && pdev->driver) ? pdev->driver->name : "<null>";
> +	if (pdev) {
> +		const char *drvstr = dev_driver_string(&pdev->dev);
> +
> +		if (strcmp(drvstr, ""))
> +			return drvstr;
> +	}
> +
> +	return "<null>";

Can we just do this?

  if (pdev)
    return dev_driver_string(&pdev->dev);

  return "<null>";

I think it's more complicated than it's worth to include a strcmp().
It's possible this will change those error messages about "Might be
infinite loop in %s driver", but that doesn't seem like a huge deal.

I moved Oliver to "to:" and added Russell in case they object.

>  }
>  
>  #endif /* CONFIG_EEH */
> diff --git a/drivers/bcma/host_pci.c b/drivers/bcma/host_pci.c
> index 69c10a7b7c61..0973022d4b13 100644
> --- a/drivers/bcma/host_pci.c
> +++ b/drivers/bcma/host_pci.c
> @@ -175,9 +175,10 @@ static int bcma_host_pci_probe(struct pci_dev *dev,
>  	if (err)
>  		goto err_kfree_bus;
>  
> -	name = dev_name(&dev->dev);
> -	if (dev->driver && dev->driver->name)
> -		name = dev->driver->name;
> +	name = dev_driver_string(&dev->dev);
> +	if (!strcmp(name, ""))
> +		name = dev_name(&dev->dev);
>  	err = pci_request_regions(dev, name);

Again seems more complicated than it's worth to me.  This is in the
driver's .probe() method, so really_probe() has already set
"dev->driver = drv", which means dev->driver is always set to
&bcma_pci_bridge_driver here, and bcma_pci_bridge_driver.name is
always "bcma-pci-bridge".

Almost all callers of pci_request_regions() just hardcode the driver
name or use a DRV_NAME #define

So I think we should just do:

  err = pci_request_regions(dev, "bcma-pci-bridge");

>  	if (err)
>  		goto err_pci_disable;
> diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
> index 369562d34d66..8f361e54e524 100644
> --- a/drivers/crypto/hisilicon/qm.c
> +++ b/drivers/crypto/hisilicon/qm.c
> @@ -3085,7 +3085,7 @@ static int qm_alloc_uacce(struct hisi_qm *qm)
>  	};
>  	int ret;
>  
> -	ret = strscpy(interface.name, pdev->driver->name,
> +	ret = strscpy(interface.name, dev_driver_string(&pdev->dev),
>  		      sizeof(interface.name));
>  	if (ret < 0)
>  		return -ENAMETOOLONG;
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
> index 7ea511d59e91..f279edfce3f1 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
> @@ -606,7 +606,7 @@ static void hns3_get_drvinfo(struct net_device *netdev,
>  		return;
>  	}
>  
> -	strncpy(drvinfo->driver, h->pdev->driver->name,
> +	strncpy(drvinfo->driver, dev_driver_string(&h->pdev->dev),
>  		sizeof(drvinfo->driver));
>  	drvinfo->driver[sizeof(drvinfo->driver) - 1] = '\0';
>  
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_pci.c b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
> index a250d394da38..a8f007f6dad2 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_pci.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
> @@ -720,7 +720,7 @@ static int prestera_fw_load(struct prestera_fw *fw)
>  static int prestera_pci_probe(struct pci_dev *pdev,
>  			      const struct pci_device_id *id)
>  {
> -	const char *driver_name = pdev->driver->name;
> +	const char *driver_name = dev_driver_string(&pdev->dev);
>  	struct prestera_fw *fw;
>  	int err;
>  
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> index 13b0259f7ea6..8f306364f7bf 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c
> @@ -1876,7 +1876,7 @@ static void mlxsw_pci_cmd_fini(struct mlxsw_pci *mlxsw_pci)
>  
>  static int mlxsw_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
> -	const char *driver_name = pdev->driver->name;
> +	const char *driver_name = dev_driver_string(&pdev->dev);
>  	struct mlxsw_pci *mlxsw_pci;
>  	int err;
>  
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> index 0685ece1f155..23dfb599c828 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
> @@ -202,7 +202,7 @@ nfp_get_drvinfo(struct nfp_app *app, struct pci_dev *pdev,
>  {
>  	char nsp_version[ETHTOOL_FWVERS_LEN] = {};
>  
> -	strlcpy(drvinfo->driver, pdev->driver->name, sizeof(drvinfo->driver));
> +	strlcpy(drvinfo->driver, dev_driver_string(&pdev->dev), sizeof(drvinfo->driver));
>  	nfp_net_get_nspinfo(app, nsp_version);
>  	snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version),
>  		 "%s %s %s %s", vnic_version, nsp_version,
> diff --git a/drivers/ssb/pcihost_wrapper.c b/drivers/ssb/pcihost_wrapper.c
> index 410215c16920..4938ed5cfae5 100644
> --- a/drivers/ssb/pcihost_wrapper.c
> +++ b/drivers/ssb/pcihost_wrapper.c
> @@ -78,9 +78,11 @@ static int ssb_pcihost_probe(struct pci_dev *dev,
>  	err = pci_enable_device(dev);
>  	if (err)
>  		goto err_kfree_ssb;
> -	name = dev_name(&dev->dev);
> -	if (dev->driver && dev->driver->name)
> -		name = dev->driver->name;
> +
> +	name = dev_driver_string(&dev->dev);
> +	if (*name == '\0')
> +		name = dev_name(&dev->dev);
> +
>  	err = pci_request_regions(dev, name);

Also seems like more trouble than it's worth.  This one is a little
strange but is always called for either b43_pci_bridge_driver or
b44_pci_driver, both of which have .name set, so I think we should
simply do:

  err = pci_request_regions(dev, dev_driver_string(&dev->dev));

>  	if (err)
>  		goto err_pci_disable;
> -- 
> 2.30.2
> 

^ permalink raw reply

* Re: [PATCH 02/10] i2c: pasemi: Use io{read,write}32
From: Sven Peter @ 2021-09-28 15:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hector Martin, Linux Kernel Mailing List, Linux I2C,
	Paul Mackerras, Linux ARM, Olof Johansson, Mohamed Mediouni,
	Mark Kettenis, linuxppc-dev, Alyssa Rosenzweig, Stan Skowronek
In-Reply-To: <CAK8P3a2_6rcQa8TCgw=6uH26UfjShrVVu-zfLf2=pi6Z8cGOPg@mail.gmail.com>

On Mon, Sep 27, 2021, at 09:39, Arnd Bergmann wrote:
> On Sun, Sep 26, 2021 at 12:00 PM Sven Peter <sven@svenpeter.dev> wrote:
>>
>> In preparation for splitting this driver up into a platform_driver
>> and a pci_driver, replace outl/inl usage with ioport_map and
>> ioread32/iowrite32.
>>
>> Signed-off-by: Sven Peter <sven@svenpeter.dev>
>>
>> +       smbus->ioaddr = ioport_map(smbus->base, smbus->size);
>> +       if (!smbus->ioaddr) {
>> +               error = -EBUSY;
>> +               goto out_release_region;
>> +       }
>
> While this works, I would suggest using the more regular pci_iomap()
> or pcim_iomap() helper to turn the port number into an __iomem token.

Thanks a lot for the review!

I'll replace it with pci_iomap here and then later in this series with
pcim_iomap when also switching the rest to devres.


Thanks,

Sven


^ permalink raw reply

* Re: [PATCH v2 3/9] cxl/pci: Remove pci request/release regions
From: Dan Williams @ 2021-09-28 14:42 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-4-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:26 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> Quoting Dan, "... the request + release regions should probably just be
> dropped. It's not like any of the register enumeration would collide
> with someone else who already has the registers mapped. The collision
> only comes when the registers are mapped for their final usage, and that
> will have more precision in the request."

Looks good to me:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

>
> Recommended-by: Dan Williams <dan.j.williams@intel.com>

This isn't one of the canonical tags:
Documentation/process/submitting-patches.rst

I'll change this to Suggested-by:

> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> ---
>  drivers/cxl/pci.c | 5 -----
>  1 file changed, 5 deletions(-)
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index ccc7c2573ddc..7256c236fdb3 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -453,9 +453,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
>                 return -ENXIO;
>         }
>
> -       if (pci_request_mem_regions(pdev, pci_name(pdev)))
> -               return -ENODEV;
> -
>         /* Get the size of the Register Locator DVSEC */
>         pci_read_config_dword(pdev, regloc + PCI_DVSEC_HEADER1, &regloc_size);
>         regloc_size = FIELD_GET(PCI_DVSEC_HEADER1_LENGTH_MASK, regloc_size);
> @@ -499,8 +496,6 @@ static int cxl_pci_setup_regs(struct cxl_mem *cxlm)
>                 n_maps++;
>         }
>
> -       pci_release_mem_regions(pdev);
> -
>         for (i = 0; i < n_maps; i++) {
>                 ret = cxl_map_regs(cxlm, &maps[i]);
>                 if (ret)
> --
> 2.33.0
>

^ permalink raw reply

* Re: [PATCH v2 2/9] cxl/pci: Remove dev_dbg for unknown register blocks
From: Dan Williams @ 2021-09-28 14:37 UTC (permalink / raw)
  To: Ben Widawsky
  Cc: Andrew Donnellan, Linux PCI, linuxppc-dev, linux-cxl,
	open list:DMA MAPPING HELPERS, Bjorn Helgaas, David E. Box,
	Frederic Barrat, Lu Baolu, David Woodhouse, Kan Liang
In-Reply-To: <20210923172647.72738-3-ben.widawsky@intel.com>

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky <ben.widawsky@intel.com> wrote:
>
> While interesting to driver developers, the dev_dbg message doesn't do
> much except clutter up logs. This information should be attainable
> through sysfs, and someday lspci like utilities. This change
> additionally helps reduce the LOC in a subsequent patch to refactor some
> of cxl_pci register mapping.

Looks good to me:

Reviewed-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Pratik Sampat @ 2021-09-28 14:11 UTC (permalink / raw)
  To: Greg KH
  Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
	paulus, linux-kselftest, kjain, shuah
In-Reply-To: <YVMfonwjmbgL/ZCX@kroah.com>



On 28/09/21 7:28 pm, Greg KH wrote:
> On Tue, Sep 28, 2021 at 06:13:18PM +0530, Pratik Sampat wrote:
>> Hello Greg,
>>
>> Thank you for your review.
>>
>> On 28/09/21 5:38 pm, Greg KH wrote:
>>> On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
>>>> Adds a generic interface to represent the energy and frequency related
>>>> PAPR attributes on the system using the new H_CALL
>>>> "H_GET_ENERGY_SCALE_INFO".
>>>>
>>>> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
>>>> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
>>>> will be deprecated P10 onwards.
>>>>
>>>> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
>>>> hcall(
>>>>     uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
>>>>     uint64 flags,           // Per the flag request
>>>>     uint64 firstAttributeId,// The attribute id
>>>>     uint64 bufferAddress,   // Guest physical address of the output buffer
>>>>     uint64 bufferSize       // The size in bytes of the output buffer
>>>> );
>>>>
>>>> This H_CALL can query either all the attributes at once with
>>>> firstAttributeId = 0, flags = 0 as well as query only one attribute
>>>> at a time with firstAttributeId = id, flags = 1.
>>>>
>>>> The output buffer consists of the following
>>>> 1. number of attributes              - 8 bytes
>>>> 2. array offset to the data location - 8 bytes
>>>> 3. version info                      - 1 byte
>>>> 4. A data array of size num attributes, which contains the following:
>>>>     a. attribute ID              - 8 bytes
>>>>     b. attribute value in number - 8 bytes
>>>>     c. attribute name in string  - 64 bytes
>>>>     d. attribute value in string - 64 bytes
>>>>
>>>> The new H_CALL exports information in direct string value format, hence
>>>> a new interface has been introduced in
>>>> /sys/firmware/papr/energy_scale_info to export this information to
>>>> userspace in an extensible pass-through format.
>>>>
>>>> The H_CALL returns the name, numeric value and string value (if exists)
>>>>
>>>> The format of exposing the sysfs information is as follows:
>>>> /sys/firmware/papr/energy_scale_info/
>>>>      |-- <id>/
>>>>        |-- desc
>>>>        |-- value
>>>>        |-- value_desc (if exists)
>>>>      |-- <id>/
>>>>        |-- desc
>>>>        |-- value
>>>>        |-- value_desc (if exists)
>>>> ...
>>>>
>>>> The energy information that is exported is useful for userspace tools
>>>> such as powerpc-utils. Currently these tools infer the
>>>> "power_mode_data" value in the lparcfg, which in turn is obtained from
>>>> the to be deprecated H_GET_EM_PARMS H_CALL.
>>>> On future platforms, such userspace utilities will have to look at the
>>>> data returned from the new H_CALL being populated in this new sysfs
>>>> interface and report this information directly without the need of
>>>> interpretation.
>>>>
>>>> Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
>>>> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
>>>> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
>>>> Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
>>>> ---
>>>>    .../sysfs-firmware-papr-energy-scale-info     |  26 ++
>>>>    arch/powerpc/include/asm/hvcall.h             |  24 +-
>>>>    arch/powerpc/kvm/trace_hv.h                   |   1 +
>>>>    arch/powerpc/platforms/pseries/Makefile       |   3 +-
>>>>    .../pseries/papr_platform_attributes.c        | 312 ++++++++++++++++++
>>>>    5 files changed, 364 insertions(+), 2 deletions(-)
>>>>    create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>>    create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>>>>
>>>> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>> new file mode 100644
>>>> index 000000000000..139a576c7c9d
>>>> --- /dev/null
>>>> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>>>> @@ -0,0 +1,26 @@
>>>> +What:		/sys/firmware/papr/energy_scale_info
>>>> +Date:		June 2021
>>>> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>>>> +Description:	Directory hosting a set of platform attributes like
>>>> +		energy/frequency on Linux running as a PAPR guest.
>>>> +
>>>> +		Each file in a directory contains a platform
>>>> +		attribute hierarchy pertaining to performance/
>>>> +		energy-savings mode and processor frequency.
>>>> +
>>>> +What:		/sys/firmware/papr/energy_scale_info/<id>
>>>> +		/sys/firmware/papr/energy_scale_info/<id>/desc
>>>> +		/sys/firmware/papr/energy_scale_info/<id>/value
>>>> +		/sys/firmware/papr/energy_scale_info/<id>/value_desc
>>>> +Date:		June 2021
>>>> +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
>>>> +Description:	Energy, frequency attributes directory for POWERVM servers
>>>> +
>>>> +		This directory provides energy, frequency, folding information. It
>>>> +		contains below sysfs attributes:
>>>> +
>>>> +		- desc: String description of the attribute <id>
>>>> +
>>>> +		- value: Numeric value of attribute <id>
>>>> +
>>>> +		- value_desc: String value of attribute <id>
>>> Can you just make 4 different entries in this file, making it easier to
>>> parse and extend over time?
>> Do you mean I only create one file per attribute and populate it with 4
>> different entries as follows?
>>
>> # cat /sys/firmware/papr/energy_scale_info/<id>
>> id:
>> desc:
>> value:
>> value_desc:
> No, I mean in this documentation file, have 4 different "What:" entries,
> don't lump 4 of them together into one larger Description for no reason
> like you did here.
>
> The sysfs files themselves are fine.

Ah okay, I understand what you're saying. I just need to make 4 different
entries in the documentation.
Thanks for that clarification.

>>>> +struct papr_attr {
>>>> +	u64 id;
>>>> +	struct kobj_attribute kobj_attr;
>>> Why does an attribute have to be part of this structure?
>> I bundled both an attribute as well as its ID in a structure because each
>> attributes value could only be queried from the firmware with the corresponding
>> ID.
>> It seemed to be logically connected and that's why I had them in the structure.
>> Are you suggesting we maintain them separately and don't need the coupling?
> The id is connected to the kobject, not the attribute, right?
> Attributes do not have uniqueness like this normally.
>
>
>>>> +static struct papr_ops_info {
>>>> +	const char *attr_name;
>>>> +	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
>>>> +			char *buf);
>>>> +} ops_info[MAX_ATTRS] = {
>>>> +	{ "desc", papr_show_desc },
>>>> +	{ "value", papr_show_value },
>>>> +	{ "value_desc", papr_show_value_desc },
>>> What is wrong with just using the __ATTR_RO() macro and then having an
>>> array of attributes in a single group?  That should be a lot simpler
>>> overall, right?
>> If I understand this correctly, you mean I can have a array of attributes in a
>> flat single group?
> Yes.
>
>> I suppose that would be a simpler, given your earlier suggestion to wrap
>> attribute values up in a single file per attribute.
>>
>> However, the intent of grouping and keeping files separate was that each sysfs
>> file has only one value to display.
> That is correct, and not a problem here at all.
>
>> I can change it to using an array of attributes in a single group too if you
>> believe that is right way to go instead.
> You have 3 variables for your attributes:
>
> static struct kobj_attribute papr_desc = __ATTR_RO(desc);
> static struct kobj_attribute papr_value = __ATTR_RO(value);
> static struct kobj_attribute papr_value_desc = __ATTR_RO(value_desc);
>
> and then your attribute group:
> static struct attribute papr_attrs[] = {
> 	&papr_desc.attr,
> 	&papr_value.attr,
> 	&papr_value_desc.attr,
> 	NULL,
> };
>
> ATTRIBUTE_GROUPS(papr);
>
> Then take that papr_groups and register that with the kobject when
> needed.
>
> But, you seem to only be having a whole kobject for a subdirectory,
> right?  No need for that, just name your attribute group, so instead of
>
> ATTRIBUTE_GROUPS(papr);
>
> do:
> static const struct attribute_group papr_group = {
> 	.name = "Your Subdirectory Name here",
> 	.attrs = papr_attrs,
> };
>
> Hope this helps,

Yes, this does!
I understand now that a whole kobject for a sub-directory is futile.
The approach you suggested for having papr_groups register with the kobject
whenever needed is more cleaner.

Thanks for the help, I'll rework my current logic according to that.

Pratik

> greg k-h


^ permalink raw reply

* Re: [PATCH v8 1/2] powerpc/pseries: Interface to represent PAPR firmware attributes
From: Greg KH @ 2021-09-28 13:58 UTC (permalink / raw)
  To: Pratik Sampat
  Cc: farosas, pratik.r.sampat, linuxppc-dev, kvm-ppc, linux-kernel,
	paulus, linux-kselftest, kjain, shuah
In-Reply-To: <289d2081-7ae8-f76a-5180-49bc6061a05c@linux.ibm.com>

On Tue, Sep 28, 2021 at 06:13:18PM +0530, Pratik Sampat wrote:
> Hello Greg,
> 
> Thank you for your review.
> 
> On 28/09/21 5:38 pm, Greg KH wrote:
> > On Tue, Sep 28, 2021 at 05:21:01PM +0530, Pratik R. Sampat wrote:
> > > Adds a generic interface to represent the energy and frequency related
> > > PAPR attributes on the system using the new H_CALL
> > > "H_GET_ENERGY_SCALE_INFO".
> > > 
> > > H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> > > information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> > > will be deprecated P10 onwards.
> > > 
> > > The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> > > hcall(
> > >    uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
> > >    uint64 flags,           // Per the flag request
> > >    uint64 firstAttributeId,// The attribute id
> > >    uint64 bufferAddress,   // Guest physical address of the output buffer
> > >    uint64 bufferSize       // The size in bytes of the output buffer
> > > );
> > > 
> > > This H_CALL can query either all the attributes at once with
> > > firstAttributeId = 0, flags = 0 as well as query only one attribute
> > > at a time with firstAttributeId = id, flags = 1.
> > > 
> > > The output buffer consists of the following
> > > 1. number of attributes              - 8 bytes
> > > 2. array offset to the data location - 8 bytes
> > > 3. version info                      - 1 byte
> > > 4. A data array of size num attributes, which contains the following:
> > >    a. attribute ID              - 8 bytes
> > >    b. attribute value in number - 8 bytes
> > >    c. attribute name in string  - 64 bytes
> > >    d. attribute value in string - 64 bytes
> > > 
> > > The new H_CALL exports information in direct string value format, hence
> > > a new interface has been introduced in
> > > /sys/firmware/papr/energy_scale_info to export this information to
> > > userspace in an extensible pass-through format.
> > > 
> > > The H_CALL returns the name, numeric value and string value (if exists)
> > > 
> > > The format of exposing the sysfs information is as follows:
> > > /sys/firmware/papr/energy_scale_info/
> > >     |-- <id>/
> > >       |-- desc
> > >       |-- value
> > >       |-- value_desc (if exists)
> > >     |-- <id>/
> > >       |-- desc
> > >       |-- value
> > >       |-- value_desc (if exists)
> > > ...
> > > 
> > > The energy information that is exported is useful for userspace tools
> > > such as powerpc-utils. Currently these tools infer the
> > > "power_mode_data" value in the lparcfg, which in turn is obtained from
> > > the to be deprecated H_GET_EM_PARMS H_CALL.
> > > On future platforms, such userspace utilities will have to look at the
> > > data returned from the new H_CALL being populated in this new sysfs
> > > interface and report this information directly without the need of
> > > interpretation.
> > > 
> > > Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
> > > Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> > > Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
> > > Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
> > > ---
> > >   .../sysfs-firmware-papr-energy-scale-info     |  26 ++
> > >   arch/powerpc/include/asm/hvcall.h             |  24 +-
> > >   arch/powerpc/kvm/trace_hv.h                   |   1 +
> > >   arch/powerpc/platforms/pseries/Makefile       |   3 +-
> > >   .../pseries/papr_platform_attributes.c        | 312 ++++++++++++++++++
> > >   5 files changed, 364 insertions(+), 2 deletions(-)
> > >   create mode 100644 Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > >   create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
> > > 
> > > diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > > new file mode 100644
> > > index 000000000000..139a576c7c9d
> > > --- /dev/null
> > > +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> > > @@ -0,0 +1,26 @@
> > > +What:		/sys/firmware/papr/energy_scale_info
> > > +Date:		June 2021
> > > +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> > > +Description:	Directory hosting a set of platform attributes like
> > > +		energy/frequency on Linux running as a PAPR guest.
> > > +
> > > +		Each file in a directory contains a platform
> > > +		attribute hierarchy pertaining to performance/
> > > +		energy-savings mode and processor frequency.
> > > +
> > > +What:		/sys/firmware/papr/energy_scale_info/<id>
> > > +		/sys/firmware/papr/energy_scale_info/<id>/desc
> > > +		/sys/firmware/papr/energy_scale_info/<id>/value
> > > +		/sys/firmware/papr/energy_scale_info/<id>/value_desc
> > > +Date:		June 2021
> > > +Contact:	Linux for PowerPC mailing list <linuxppc-dev@ozlabs.org>
> > > +Description:	Energy, frequency attributes directory for POWERVM servers
> > > +
> > > +		This directory provides energy, frequency, folding information. It
> > > +		contains below sysfs attributes:
> > > +
> > > +		- desc: String description of the attribute <id>
> > > +
> > > +		- value: Numeric value of attribute <id>
> > > +
> > > +		- value_desc: String value of attribute <id>
> > Can you just make 4 different entries in this file, making it easier to
> > parse and extend over time?
> 
> Do you mean I only create one file per attribute and populate it with 4
> different entries as follows?
> 
> # cat /sys/firmware/papr/energy_scale_info/<id>
> id:
> desc:
> value:
> value_desc:

No, I mean in this documentation file, have 4 different "What:" entries,
don't lump 4 of them together into one larger Description for no reason
like you did here.

The sysfs files themselves are fine.

> > > +struct papr_attr {
> > > +	u64 id;
> > > +	struct kobj_attribute kobj_attr;
> > Why does an attribute have to be part of this structure?
> 
> I bundled both an attribute as well as its ID in a structure because each
> attributes value could only be queried from the firmware with the corresponding
> ID.
> It seemed to be logically connected and that's why I had them in the structure.
> Are you suggesting we maintain them separately and don't need the coupling?

The id is connected to the kobject, not the attribute, right?
Attributes do not have uniqueness like this normally.


> > > +static struct papr_ops_info {
> > > +	const char *attr_name;
> > > +	ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *kobj_attr,
> > > +			char *buf);
> > > +} ops_info[MAX_ATTRS] = {
> > > +	{ "desc", papr_show_desc },
> > > +	{ "value", papr_show_value },
> > > +	{ "value_desc", papr_show_value_desc },
> > What is wrong with just using the __ATTR_RO() macro and then having an
> > array of attributes in a single group?  That should be a lot simpler
> > overall, right?
> 
> If I understand this correctly, you mean I can have a array of attributes in a
> flat single group?

Yes.

> I suppose that would be a simpler, given your earlier suggestion to wrap
> attribute values up in a single file per attribute.
> 
> However, the intent of grouping and keeping files separate was that each sysfs
> file has only one value to display.

That is correct, and not a problem here at all.

> I can change it to using an array of attributes in a single group too if you
> believe that is right way to go instead.

You have 3 variables for your attributes:

static struct kobj_attribute papr_desc = __ATTR_RO(desc);
static struct kobj_attribute papr_value = __ATTR_RO(value);
static struct kobj_attribute papr_value_desc = __ATTR_RO(value_desc);

and then your attribute group:
static struct attribute papr_attrs[] = {
	&papr_desc.attr,
	&papr_value.attr,
	&papr_value_desc.attr,
	NULL,
};

ATTRIBUTE_GROUPS(papr);

Then take that papr_groups and register that with the kobject when
needed.

But, you seem to only be having a whole kobject for a subdirectory,
right?  No need for that, just name your attribute group, so instead of

ATTRIBUTE_GROUPS(papr);

do:
static const struct attribute_group papr_group = {
	.name = "Your Subdirectory Name here",
	.attrs = papr_attrs,
};

Hope this helps,

greg k-h

^ permalink raw reply

* Re: [RFC PATCH 3/8] s390: add CPU field to struct thread_info
From: Ard Biesheuvel @ 2021-09-28 13:52 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Heiko Carstens, Christian Borntraeger,
	Vasily Gorbik
  Cc: Peter Zijlstra, Catalin Marinas, Paul Mackerras, linux-riscv,
	Will Deacon, open list:S390, Russell King, Ingo Molnar, Albert Ou,
	Kees Cook, Arnd Bergmann, Keith Packard, Borislav Petkov,
	Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
	Linus Torvalds, Palmer Dabbelt,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20210914121036.3975026-4-ardb@kernel.org>

On Tue, 14 Sept 2021 at 14:11, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> The CPU field will be moved back into thread_info even when
> THREAD_INFO_IN_TASK is enabled, so add it back to s390's definition of
> struct thread_info.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/s390/include/asm/thread_info.h | 1 +
>  1 file changed, 1 insertion(+)
>

Heiko, Christian, Vasily,

Do you have any objections to this change? If you don't, could you
please ack it so it can be taken through another tree (or if that is
problematic for you, could you please propose another way of merging
these changes?)

Thanks,
Ard.

> diff --git a/arch/s390/include/asm/thread_info.h b/arch/s390/include/asm/thread_info.h
> index e6674796aa6f..b2ffcb4fe000 100644
> --- a/arch/s390/include/asm/thread_info.h
> +++ b/arch/s390/include/asm/thread_info.h
> @@ -37,6 +37,7 @@
>  struct thread_info {
>         unsigned long           flags;          /* low level flags */
>         unsigned long           syscall_work;   /* SYSCALL_WORK_ flags */
> +       unsigned int            cpu;            /* current CPU */
>  };
>
>  /*
> --
> 2.30.2
>

^ permalink raw reply

* Re: [RFC PATCH 4/8] powerpc: add CPU field to struct thread_info
From: Ard Biesheuvel @ 2021-09-28 13:18 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Peter Zijlstra, Catalin Marinas, Paul Mackerras, linux-riscv,
	Will Deacon, open list:S390, Arnd Bergmann, Russell King,
	Christian Borntraeger, Ingo Molnar, Albert Ou, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Keith Packard, Borislav Petkov,
	Andy Lutomirski, Paul Walmsley, Thomas Gleixner, Linux ARM,
	open list:LINUX FOR POWERPC (32-BIT AND 64-BIT),
	Linux Kernel Mailing List, Palmer Dabbelt, Linus Torvalds
In-Reply-To: <87pmst1rn9.fsf@mpe.ellerman.id.au>

On Tue, 28 Sept 2021 at 02:16, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > Ard Biesheuvel <ardb@kernel.org> writes:
> >> On Tue, 14 Sept 2021 at 14:11, Ard Biesheuvel <ardb@kernel.org> wrote:
> >>>
> >>> The CPU field will be moved back into thread_info even when
> >>> THREAD_INFO_IN_TASK is enabled, so add it back to powerpc's definition
> >>> of struct thread_info.
> >>>
> >>> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >>
> >> Michael,
> >>
> >> Do you have any objections or issues with this patch or the subsequent
> >> ones cleaning up the task CPU kludge for ppc32? Christophe indicated
> >> that he was happy with it.
> >
> > No objections, it looks good to me, thanks for cleaning up that horror :)
> >
> > It didn't apply cleanly to master so I haven't tested it at all, if you can point me at a
> > git tree with the dependencies I'd be happy to run some tests over it.
>
> Actually I realised I can just drop the last patch.
>
> So that looks fine, passes my standard quick build & boot on qemu tests,
> and builds with/without stack protector enabled.
>

Thanks.

Do you have any opinion on how this series should be merged? Kees Cook
is willing to take them via his cross-arch tree, or you could carry
them if you prefer. Taking it via multiple trees at the same time is
going to be tricky, or take two cycles, with I'd prefer to avoid.

-- 
Ard.

^ permalink raw reply

* [PATCH v5 4/4] docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu
From: Kajol Jain @ 2021-09-28 12:48 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx

Details are added for the event, cpumask and format attributes
in the ABI documentation.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 Documentation/ABI/testing/sysfs-bus-nvdimm | 35 ++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-nvdimm b/Documentation/ABI/testing/sysfs-bus-nvdimm
index bff84a16812a..64004d5e4840 100644
--- a/Documentation/ABI/testing/sysfs-bus-nvdimm
+++ b/Documentation/ABI/testing/sysfs-bus-nvdimm
@@ -6,3 +6,38 @@ Description:
 
 The libnvdimm sub-system implements a common sysfs interface for
 platform nvdimm resources. See Documentation/driver-api/nvdimm/.
+
+What:           /sys/bus/event_source/devices/nmemX/format
+Date:           September 2021
+KernelVersion:  5.16
+Contact:        Kajol Jain <kjain@linux.ibm.com>
+Description:	(RO) Attribute group to describe the magic bits
+		that go into perf_event_attr.config for a particular pmu.
+		(See ABI/testing/sysfs-bus-event_source-devices-format).
+
+		Each attribute under this group defines a bit range of the
+		perf_event_attr.config. Supported attribute is listed
+		below::
+		  event  = "config:0-4"  - event ID
+
+		For example::
+			ctl_res_cnt = "event=0x1"
+
+What:           /sys/bus/event_source/devices/nmemX/events
+Date:           September 2021
+KernelVersion:  5.16
+Contact:        Kajol Jain <kjain@linux.ibm.com>
+Description:	(RO) Attribute group to describe performance monitoring events
+                for the nvdimm memory device. Each attribute in this group
+                describes a single performance monitoring event supported by
+                this nvdimm pmu.  The name of the file is the name of the event.
+                (See ABI/testing/sysfs-bus-event_source-devices-events). A
+                listing of the events supported by a given nvdimm provider type
+                can be found in Documentation/driver-api/nvdimm/$provider.
+
+What:          /sys/bus/event_source/devices/nmemX/cpumask
+Date:          September 2021
+KernelVersion:  5.16
+Contact:        Kajol Jain <kjain@linux.ibm.com>
+Description:	(RO) This sysfs file exposes the cpumask which is designated to
+		to retrieve nvdimm pmu event counter data.
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 3/4] powerpc/papr_scm: Add perf interface support
From: Kajol Jain @ 2021-09-28 12:48 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx

Performance monitoring support for papr-scm nvdimm devices
via perf interface is added which includes addition of pmu
functions like add/del/read/event_init for nvdimm_pmu struture.

A new parameter 'priv' in added to the pdev_archdata structure to save
nvdimm_pmu device pointer, to handle the unregistering of pmu device.

papr_scm_pmu_register function populates the nvdimm_pmu structure
with name, capabilities, cpumask along with event handling
functions. Finally the populated nvdimm_pmu structure is passed to
register the pmu device. Event handling functions internally uses
hcall to get events and counter data.

Result in power9 machine with 2 nvdimm device:

Ex: List all event by perf list

command:# perf list nmem

  nmem0/cache_rh_cnt/                                [Kernel PMU event]
  nmem0/cache_wh_cnt/                                [Kernel PMU event]
  nmem0/cri_res_util/                                [Kernel PMU event]
  nmem0/ctl_res_cnt/                                 [Kernel PMU event]
  nmem0/ctl_res_tm/                                  [Kernel PMU event]
  nmem0/fast_w_cnt/                                  [Kernel PMU event]
  nmem0/host_l_cnt/                                  [Kernel PMU event]
  nmem0/host_l_dur/                                  [Kernel PMU event]
  nmem0/host_s_cnt/                                  [Kernel PMU event]
  nmem0/host_s_dur/                                  [Kernel PMU event]
  nmem0/med_r_cnt/                                   [Kernel PMU event]
  nmem0/med_r_dur/                                   [Kernel PMU event]
  nmem0/med_w_cnt/                                   [Kernel PMU event]
  nmem0/med_w_dur/                                   [Kernel PMU event]
  nmem0/mem_life/                                    [Kernel PMU event]
  nmem0/poweron_secs/                                [Kernel PMU event]
  ...
  nmem1/mem_life/                                    [Kernel PMU event]
  nmem1/poweron_secs/                                [Kernel PMU event]

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 arch/powerpc/include/asm/device.h         |   5 +
 arch/powerpc/platforms/pseries/papr_scm.c | 225 ++++++++++++++++++++++
 2 files changed, 230 insertions(+)

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 219559d65864..47ed639f3b8f 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -48,6 +48,11 @@ struct dev_archdata {
 
 struct pdev_archdata {
 	u64 dma_mask;
+	/*
+	 * Pointer to nvdimm_pmu structure, to handle the unregistering
+	 * of pmu device
+	 */
+	void *priv;
 };
 
 #endif /* _ASM_POWERPC_DEVICE_H */
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f48e87ac89c9..bdf2620db461 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -19,6 +19,7 @@
 #include <asm/papr_pdsm.h>
 #include <asm/mce.h>
 #include <asm/unaligned.h>
+#include <linux/perf_event.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -68,6 +69,8 @@
 #define PAPR_SCM_PERF_STATS_EYECATCHER __stringify(SCMSTATS)
 #define PAPR_SCM_PERF_STATS_VERSION 0x1
 
+#define to_nvdimm_pmu(_pmu)	container_of(_pmu, struct nvdimm_pmu, pmu)
+
 /* Struct holding a single performance metric */
 struct papr_scm_perf_stat {
 	u8 stat_id[8];
@@ -120,6 +123,9 @@ struct papr_scm_priv {
 
 	/* length of the stat buffer as expected by phyp */
 	size_t stat_buffer_len;
+
+	 /* array to have event_code and stat_id mappings */
+	char **nvdimm_events_map;
 };
 
 static int papr_scm_pmem_flush(struct nd_region *nd_region,
@@ -340,6 +346,218 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv *p,
 	return 0;
 }
 
+static int papr_scm_pmu_get_value(struct perf_event *event, struct device *dev, u64 *count)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	struct papr_scm_priv *p = (struct papr_scm_priv *)dev->driver_data;
+	int rc, size;
+
+	/* Allocate request buffer enough to hold single performance stat */
+	size = sizeof(struct papr_scm_perf_stats) +
+		sizeof(struct papr_scm_perf_stat);
+
+	if (!p || !p->nvdimm_events_map)
+		return -EINVAL;
+
+	stats = kzalloc(size, GFP_KERNEL);
+	if (!stats)
+		return -ENOMEM;
+
+	stat = &stats->scm_statistic[0];
+	memcpy(&stat->stat_id,
+	       p->nvdimm_events_map[event->attr.config],
+		sizeof(stat->stat_id));
+	stat->stat_val = 0;
+
+	rc = drc_pmem_query_stats(p, stats, 1);
+	if (rc < 0) {
+		kfree(stats);
+		return rc;
+	}
+
+	*count = be64_to_cpu(stat->stat_val);
+	kfree(stats);
+	return 0;
+}
+
+static int papr_scm_pmu_event_init(struct perf_event *event)
+{
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+	struct papr_scm_priv *p;
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	/* test the event attr type for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* it does not support event sampling mode */
+	if (is_sampling_event(event))
+		return -EOPNOTSUPP;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	p = (struct papr_scm_priv *)nd_pmu->dev->driver_data;
+	if (!p)
+		return -EINVAL;
+
+	/* Invalid eventcode */
+	if (event->attr.config == 0 || event->attr.config > 16)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int papr_scm_pmu_add(struct perf_event *event, int flags)
+{
+	u64 count;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return -EINVAL;
+
+	if (flags & PERF_EF_START) {
+		rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &count);
+		if (rc)
+			return rc;
+
+		local64_set(&event->hw.prev_count, count);
+	}
+
+	return 0;
+}
+
+static void papr_scm_pmu_read(struct perf_event *event)
+{
+	u64 prev, now;
+	int rc;
+	struct nvdimm_pmu *nd_pmu = to_nvdimm_pmu(event->pmu);
+
+	if (!nd_pmu)
+		return;
+
+	rc = papr_scm_pmu_get_value(event, nd_pmu->dev, &now);
+	if (rc)
+		return;
+
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void papr_scm_pmu_del(struct perf_event *event, int flags)
+{
+	papr_scm_pmu_read(event);
+}
+
+static int papr_scm_pmu_check_events(struct papr_scm_priv *p, struct nvdimm_pmu *nd_pmu)
+{
+	struct papr_scm_perf_stat *stat;
+	struct papr_scm_perf_stats *stats;
+	char *statid;
+	int index, rc, count;
+	u32 available_events;
+
+	if (!p->stat_buffer_len)
+		return -ENOENT;
+
+	available_events = (p->stat_buffer_len  - sizeof(struct papr_scm_perf_stats))
+			/ sizeof(struct papr_scm_perf_stat);
+
+	/* Allocate the buffer for phyp where stats are written */
+	stats = kzalloc(p->stat_buffer_len, GFP_KERNEL);
+	if (!stats) {
+		rc = -ENOMEM;
+		return rc;
+	}
+
+	/* Allocate memory to nvdimm_event_map */
+	p->nvdimm_events_map = kcalloc(available_events, sizeof(char *), GFP_KERNEL);
+	if (!p->nvdimm_events_map) {
+		rc = -ENOMEM;
+		goto out_stats;
+	}
+
+	/* Called to get list of events supported */
+	rc = drc_pmem_query_stats(p, stats, 0);
+	if (rc)
+		goto out_nvdimm_events_map;
+
+	for (index = 0, stat = stats->scm_statistic, count = 0;
+		     index < available_events; index++, ++stat) {
+		statid = kzalloc(strlen(stat->stat_id) + 1, GFP_KERNEL);
+		if (!statid) {
+			rc = -ENOMEM;
+			goto out_nvdimm_events_map;
+		}
+
+		strcpy(statid, stat->stat_id);
+		p->nvdimm_events_map[count] = statid;
+		count++;
+	}
+	p->nvdimm_events_map[count] = NULL;
+	kfree(stats);
+	return 0;
+
+out_nvdimm_events_map:
+	kfree(p->nvdimm_events_map);
+out_stats:
+	kfree(stats);
+	return rc;
+}
+
+static void papr_scm_pmu_register(struct papr_scm_priv *p)
+{
+	struct nvdimm_pmu *nd_pmu;
+	int rc, nodeid;
+
+	nd_pmu = kzalloc(sizeof(*nd_pmu), GFP_KERNEL);
+	if (!nd_pmu) {
+		rc = -ENOMEM;
+		goto pmu_err_print;
+	}
+
+	rc = papr_scm_pmu_check_events(p, nd_pmu);
+	if (rc)
+		goto pmu_check_events_err;
+
+	nd_pmu->pmu.task_ctx_nr = perf_invalid_context;
+	nd_pmu->pmu.name = nvdimm_name(p->nvdimm);
+	nd_pmu->pmu.event_init = papr_scm_pmu_event_init;
+	nd_pmu->pmu.read = papr_scm_pmu_read;
+	nd_pmu->pmu.add = papr_scm_pmu_add;
+	nd_pmu->pmu.del = papr_scm_pmu_del;
+
+	nd_pmu->pmu.capabilities = PERF_PMU_CAP_NO_INTERRUPT |
+				PERF_PMU_CAP_NO_EXCLUDE;
+
+	/*updating the cpumask variable */
+	nodeid = dev_to_node(&p->pdev->dev);
+	nd_pmu->arch_cpumask = *cpumask_of_node(nodeid);
+
+	rc = register_nvdimm_pmu(nd_pmu, p->pdev);
+	if (rc)
+		goto pmu_register_err;
+
+	/*
+	 * Set archdata.priv value to nvdimm_pmu structure, to handle the
+	 * unregistering of pmu device.
+	 */
+	p->pdev->archdata.priv = nd_pmu;
+	return;
+
+pmu_register_err:
+	kfree(p->nvdimm_events_map);
+pmu_check_events_err:
+	kfree(nd_pmu);
+pmu_err_print:
+	dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc);
+}
+
 /*
  * Issue hcall to retrieve dimm health info and populate papr_scm_priv with the
  * health information.
@@ -1236,6 +1454,7 @@ static int papr_scm_probe(struct platform_device *pdev)
 		goto err2;
 
 	platform_set_drvdata(pdev, p);
+	papr_scm_pmu_register(p);
 
 	return 0;
 
@@ -1254,6 +1473,12 @@ static int papr_scm_remove(struct platform_device *pdev)
 
 	nvdimm_bus_unregister(p->bus);
 	drc_pmem_unbind(p);
+
+	if (pdev->archdata.priv)
+		unregister_nvdimm_pmu(pdev->archdata.priv);
+
+	pdev->archdata.priv = NULL;
+	kfree(p->nvdimm_events_map);
 	kfree(p->bus_desc.provider_name);
 	kfree(p);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 2/4] drivers/nvdimm: Add perf interface to expose nvdimm performance stats
From: Kajol Jain @ 2021-09-28 12:47 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, kernel test robot, rnsastry, aneesh.kumar,
	atrajeev, kjain, vaibhav, tglx

A common interface is added to get performance stats reporting
support for nvdimm devices. Added interface defines supported
event list, config fields for the event attributes and their
corresponding bit values which are exported via sysfs.

Interface also added support for pmu register/unregister functions,
cpu hotplug feature along with macros for handling events addition
via sysfs. It adds attribute groups for format, cpumask and events
to the pmu structure.

User could use the standard perf tool to access perf events exposed
via nvdimm pmu.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
[Make hotplug function static as reported by kernel test rorbot]
Reported-by: kernel test robot <lkp@intel.com> 
---
 drivers/nvdimm/Makefile  |   1 +
 drivers/nvdimm/nd_perf.c | 328 +++++++++++++++++++++++++++++++++++++++
 include/linux/nd.h       |  21 +++
 3 files changed, 350 insertions(+)
 create mode 100644 drivers/nvdimm/nd_perf.c

diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 29203f3d3069..25dba6095612 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -18,6 +18,7 @@ nd_e820-y := e820.o
 libnvdimm-y := core.o
 libnvdimm-y += bus.o
 libnvdimm-y += dimm_devs.o
+libnvdimm-y += nd_perf.o
 libnvdimm-y += dimm.o
 libnvdimm-y += region_devs.o
 libnvdimm-y += region.o
diff --git a/drivers/nvdimm/nd_perf.c b/drivers/nvdimm/nd_perf.c
new file mode 100644
index 000000000000..314415894acf
--- /dev/null
+++ b/drivers/nvdimm/nd_perf.c
@@ -0,0 +1,328 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * nd_perf.c: NVDIMM Device Performance Monitoring Unit support
+ *
+ * Perf interface to expose nvdimm performance stats.
+ *
+ * Copyright (C) 2021 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "nvdimm_pmu: " fmt
+
+#include <linux/nd.h>
+
+#define EVENT(_name, _code)     enum{_name = _code}
+
+/*
+ * NVDIMM Events codes.
+ */
+
+/* Controller Reset Count */
+EVENT(CTL_RES_CNT,		0x1);
+/* Controller Reset Elapsed Time */
+EVENT(CTL_RES_TM,		0x2);
+/* Power-on Seconds */
+EVENT(POWERON_SECS,		0x3);
+/* Life Remaining */
+EVENT(MEM_LIFE,		0x4);
+/* Critical Resource Utilization */
+EVENT(CRI_RES_UTIL,		0x5);
+/* Host Load Count */
+EVENT(HOST_L_CNT,		0x6);
+/* Host Store Count */
+EVENT(HOST_S_CNT,		0x7);
+/* Host Store Duration */
+EVENT(HOST_S_DUR,		0x8);
+/* Host Load Duration */
+EVENT(HOST_L_DUR,		0x9);
+/* Media Read Count */
+EVENT(MED_R_CNT,		0xa);
+/* Media Write Count */
+EVENT(MED_W_CNT,		0xb);
+/* Media Read Duration */
+EVENT(MED_R_DUR,		0xc);
+/* Media Write Duration */
+EVENT(MED_W_DUR,		0xd);
+/* Cache Read Hit Count */
+EVENT(CACHE_RH_CNT,		0xe);
+/* Cache Write Hit Count */
+EVENT(CACHE_WH_CNT,		0xf);
+/* Fast Write Count */
+EVENT(FAST_W_CNT,		0x10);
+
+NVDIMM_EVENT_ATTR(ctl_res_cnt,		CTL_RES_CNT);
+NVDIMM_EVENT_ATTR(ctl_res_tm,		CTL_RES_TM);
+NVDIMM_EVENT_ATTR(poweron_secs,		POWERON_SECS);
+NVDIMM_EVENT_ATTR(mem_life,		MEM_LIFE);
+NVDIMM_EVENT_ATTR(cri_res_util,		CRI_RES_UTIL);
+NVDIMM_EVENT_ATTR(host_l_cnt,		HOST_L_CNT);
+NVDIMM_EVENT_ATTR(host_s_cnt,		HOST_S_CNT);
+NVDIMM_EVENT_ATTR(host_s_dur,		HOST_S_DUR);
+NVDIMM_EVENT_ATTR(host_l_dur,		HOST_L_DUR);
+NVDIMM_EVENT_ATTR(med_r_cnt,		MED_R_CNT);
+NVDIMM_EVENT_ATTR(med_w_cnt,		MED_W_CNT);
+NVDIMM_EVENT_ATTR(med_r_dur,		MED_R_DUR);
+NVDIMM_EVENT_ATTR(med_w_dur,		MED_W_DUR);
+NVDIMM_EVENT_ATTR(cache_rh_cnt,		CACHE_RH_CNT);
+NVDIMM_EVENT_ATTR(cache_wh_cnt,		CACHE_WH_CNT);
+NVDIMM_EVENT_ATTR(fast_w_cnt,		FAST_W_CNT);
+
+static struct attribute *nvdimm_events_attr[] = {
+	NVDIMM_EVENT_PTR(CTL_RES_CNT),
+	NVDIMM_EVENT_PTR(CTL_RES_TM),
+	NVDIMM_EVENT_PTR(POWERON_SECS),
+	NVDIMM_EVENT_PTR(MEM_LIFE),
+	NVDIMM_EVENT_PTR(CRI_RES_UTIL),
+	NVDIMM_EVENT_PTR(HOST_L_CNT),
+	NVDIMM_EVENT_PTR(HOST_S_CNT),
+	NVDIMM_EVENT_PTR(HOST_S_DUR),
+	NVDIMM_EVENT_PTR(HOST_L_DUR),
+	NVDIMM_EVENT_PTR(MED_R_CNT),
+	NVDIMM_EVENT_PTR(MED_W_CNT),
+	NVDIMM_EVENT_PTR(MED_R_DUR),
+	NVDIMM_EVENT_PTR(MED_W_DUR),
+	NVDIMM_EVENT_PTR(CACHE_RH_CNT),
+	NVDIMM_EVENT_PTR(CACHE_WH_CNT),
+	NVDIMM_EVENT_PTR(FAST_W_CNT),
+	NULL
+};
+
+static struct attribute_group nvdimm_pmu_events_group = {
+	.name = "events",
+	.attrs = nvdimm_events_attr,
+};
+
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *nvdimm_pmu_format_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group nvdimm_pmu_format_group = {
+	.name = "format",
+	.attrs = nvdimm_pmu_format_attr,
+};
+
+ssize_t nvdimm_events_sysfs_show(struct device *dev,
+				 struct device_attribute *attr, char *page)
+{
+	struct perf_pmu_events_attr *pmu_attr;
+
+	pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
+static ssize_t nvdimm_pmu_cpumask_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = container_of(pmu, struct nvdimm_pmu, pmu);
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(nd_pmu->cpu));
+}
+
+static int nvdimm_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+	u32 target;
+	int nodeid;
+	const struct cpumask *cpumask;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	/* Clear it, incase given cpu is set in nd_pmu->arch_cpumask */
+	cpumask_test_and_clear_cpu(cpu, &nd_pmu->arch_cpumask);
+
+	/*
+	 * If given cpu is not same as current designated cpu for
+	 * counter access, just return.
+	 */
+	if (cpu != nd_pmu->cpu)
+		return 0;
+
+	/* Check for any active cpu in nd_pmu->arch_cpumask */
+	target = cpumask_any(&nd_pmu->arch_cpumask);
+
+	/*
+	 * Incase we don't have any active cpu in nd_pmu->arch_cpumask,
+	 * check in given cpu's numa node list.
+	 */
+	if (target >= nr_cpu_ids) {
+		nodeid = cpu_to_node(cpu);
+		cpumask = cpumask_of_node(nodeid);
+		target = cpumask_any_but(cpumask, cpu);
+	}
+	nd_pmu->cpu = target;
+
+	/* Migrate nvdimm pmu events to the new target cpu if valid */
+	if (target >= 0 && target < nr_cpu_ids)
+		perf_pmu_migrate_context(&nd_pmu->pmu, cpu, target);
+
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct nvdimm_pmu *nd_pmu;
+
+	nd_pmu = hlist_entry_safe(node, struct nvdimm_pmu, node);
+
+	if (nd_pmu->cpu >= nr_cpu_ids)
+		nd_pmu->cpu = cpu;
+
+	return 0;
+}
+
+static int create_cpumask_attr_group(struct nvdimm_pmu *nd_pmu)
+{
+	struct perf_pmu_events_attr *pmu_events_attr;
+	struct attribute **attrs_group;
+	struct attribute_group *nvdimm_pmu_cpumask_group;
+
+	pmu_events_attr = kzalloc(sizeof(*pmu_events_attr), GFP_KERNEL);
+	if (!pmu_events_attr)
+		return -ENOMEM;
+
+	attrs_group = kzalloc(2 * sizeof(struct attribute *), GFP_KERNEL);
+	if (!attrs_group) {
+		kfree(pmu_events_attr);
+		return -ENOMEM;
+	}
+
+	/* Allocate memory for cpumask attribute group */
+	nvdimm_pmu_cpumask_group = kzalloc(sizeof(*nvdimm_pmu_cpumask_group), GFP_KERNEL);
+	if (!nvdimm_pmu_cpumask_group) {
+		kfree(pmu_events_attr);
+		kfree(attrs_group);
+		return -ENOMEM;
+	}
+
+	sysfs_attr_init(&pmu_events_attr->attr.attr);
+	pmu_events_attr->attr.attr.name = "cpumask";
+	pmu_events_attr->attr.attr.mode = 0444;
+	pmu_events_attr->attr.show = nvdimm_pmu_cpumask_show;
+	attrs_group[0] = &pmu_events_attr->attr.attr;
+	attrs_group[1] = NULL;
+
+	nvdimm_pmu_cpumask_group->attrs = attrs_group;
+	nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR] = nvdimm_pmu_cpumask_group;
+	return 0;
+}
+
+static int nvdimm_pmu_cpu_hotplug_init(struct nvdimm_pmu *nd_pmu)
+{
+	int nodeid, rc;
+	const struct cpumask *cpumask;
+
+	/*
+	 * Incase of cpu hotplug feature, arch specific code
+	 * can provide required cpumask which can be used
+	 * to get designatd cpu for counter access.
+	 * Check for any active cpu in nd_pmu->arch_cpumask.
+	 */
+	if (!cpumask_empty(&nd_pmu->arch_cpumask)) {
+		nd_pmu->cpu = cpumask_any(&nd_pmu->arch_cpumask);
+	} else {
+		/* pick active cpu from the cpumask of device numa node. */
+		nodeid = dev_to_node(nd_pmu->dev);
+		cpumask = cpumask_of_node(nodeid);
+		nd_pmu->cpu = cpumask_any(cpumask);
+	}
+
+	rc = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/nvdimm:online",
+				     nvdimm_pmu_cpu_online, nvdimm_pmu_cpu_offline);
+
+	if (rc < 0)
+		return rc;
+
+	nd_pmu->cpuhp_state = rc;
+
+	/* Register the pmu instance for cpu hotplug */
+	rc = cpuhp_state_add_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	if (rc) {
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	/* Create cpumask attribute group */
+	rc = create_cpumask_attr_group(nd_pmu);
+	if (rc) {
+		cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+		cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+		return rc;
+	}
+
+	return 0;
+}
+
+static void nvdimm_pmu_free_hotplug_memory(struct nvdimm_pmu *nd_pmu)
+{
+	cpuhp_state_remove_instance_nocalls(nd_pmu->cpuhp_state, &nd_pmu->node);
+	cpuhp_remove_multi_state(nd_pmu->cpuhp_state);
+
+	if (nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR])
+		kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]->attrs);
+	kfree(nd_pmu->pmu.attr_groups[NVDIMM_PMU_CPUMASK_ATTR]);
+}
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev)
+{
+	int rc;
+
+	if (!nd_pmu || !pdev)
+		return -EINVAL;
+
+	/* event functions like add/del/read/event_init and pmu name should not be NULL */
+	if (WARN_ON_ONCE(!(nd_pmu->pmu.event_init && nd_pmu->pmu.add &&
+			   nd_pmu->pmu.del && nd_pmu->pmu.read && nd_pmu->pmu.name)))
+		return -EINVAL;
+
+	nd_pmu->pmu.attr_groups = kzalloc((NVDIMM_PMU_NULL_ATTR + 1) *
+					  sizeof(struct attribute_group *), GFP_KERNEL);
+	if (!nd_pmu->pmu.attr_groups)
+		return -ENOMEM;
+
+	/*
+	 * Add platform_device->dev pointer to nvdimm_pmu to access
+	 * device data in events functions.
+	 */
+	nd_pmu->dev = &pdev->dev;
+
+	/* Fill attribute groups for the nvdimm pmu device */
+	nd_pmu->pmu.attr_groups[NVDIMM_PMU_FORMAT_ATTR] = &nvdimm_pmu_format_group;
+	nd_pmu->pmu.attr_groups[NVDIMM_PMU_EVENT_ATTR] = &nvdimm_pmu_events_group;
+	nd_pmu->pmu.attr_groups[NVDIMM_PMU_NULL_ATTR] = NULL;
+
+	/* Fill attribute group for cpumask */
+	rc = nvdimm_pmu_cpu_hotplug_init(nd_pmu);
+	if (rc) {
+		pr_info("cpu hotplug feature failed for device: %s\n", nd_pmu->pmu.name);
+		kfree(nd_pmu->pmu.attr_groups);
+		return rc;
+	}
+
+	rc = perf_pmu_register(&nd_pmu->pmu, nd_pmu->pmu.name, -1);
+	if (rc) {
+		kfree(nd_pmu->pmu.attr_groups);
+		nvdimm_pmu_free_hotplug_memory(nd_pmu);
+		return rc;
+	}
+
+	pr_info("%s NVDIMM performance monitor support registered\n",
+		nd_pmu->pmu.name);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_nvdimm_pmu);
+
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu)
+{
+	perf_pmu_unregister(&nd_pmu->pmu);
+	nvdimm_pmu_free_hotplug_memory(nd_pmu);
+	kfree(nd_pmu);
+}
+EXPORT_SYMBOL_GPL(unregister_nvdimm_pmu);
diff --git a/include/linux/nd.h b/include/linux/nd.h
index f5ed4db2d859..fa4370607bdb 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -9,6 +9,7 @@
 #include <linux/device.h>
 #include <linux/badblocks.h>
 #include <linux/perf_event.h>
+#include <linux/platform_device.h>
 
 enum nvdimm_event {
 	NVDIMM_REVALIDATE_POISON,
@@ -24,6 +25,19 @@ enum nvdimm_claim_class {
 	NVDIMM_CCLASS_UNKNOWN,
 };
 
+#define NVDIMM_EVENT_VAR(_id)  event_attr_##_id
+#define NVDIMM_EVENT_PTR(_id)  (&event_attr_##_id.attr.attr)
+
+#define NVDIMM_EVENT_ATTR(_name, _id)				\
+	PMU_EVENT_ATTR(_name, NVDIMM_EVENT_VAR(_id), _id,	\
+			nvdimm_events_sysfs_show)
+
+/* Event attribute array index */
+#define NVDIMM_PMU_FORMAT_ATTR	0
+#define NVDIMM_PMU_EVENT_ATTR	1
+#define NVDIMM_PMU_CPUMASK_ATTR	2
+#define NVDIMM_PMU_NULL_ATTR	3
+
 /**
  * struct nvdimm_pmu - data structure for nvdimm perf driver
  * @pmu: pmu data structure for nvdimm performance stats.
@@ -43,6 +57,13 @@ struct nvdimm_pmu {
 	struct cpumask arch_cpumask;
 };
 
+extern ssize_t nvdimm_events_sysfs_show(struct device *dev,
+					struct device_attribute *attr,
+					char *page);
+
+int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev);
+void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v5 1/4] drivers/nvdimm: Add nvdimm pmu structure
From: Kajol Jain @ 2021-09-28 12:47 UTC (permalink / raw)
  To: mpe, linuxppc-dev, nvdimm, linux-kernel, peterz, dan.j.williams,
	ira.weiny, vishal.l.verma
  Cc: santosh, maddy, rnsastry, aneesh.kumar, atrajeev, kjain, vaibhav,
	tglx

A structure is added called nvdimm_pmu, for performance
stats reporting support of nvdimm devices. It can be used to add
device pmu data such as pmu data structure for performance
stats, nvdimm device pointer along with cpumask attributes.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
 include/linux/nd.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/nd.h b/include/linux/nd.h
index ee9ad76afbba..f5ed4db2d859 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -8,6 +8,7 @@
 #include <linux/ndctl.h>
 #include <linux/device.h>
 #include <linux/badblocks.h>
+#include <linux/perf_event.h>
 
 enum nvdimm_event {
 	NVDIMM_REVALIDATE_POISON,
@@ -23,6 +24,25 @@ enum nvdimm_claim_class {
 	NVDIMM_CCLASS_UNKNOWN,
 };
 
+/**
+ * struct nvdimm_pmu - data structure for nvdimm perf driver
+ * @pmu: pmu data structure for nvdimm performance stats.
+ * @dev: nvdimm device pointer.
+ * @cpu: designated cpu for counter access.
+ * @node: node for cpu hotplug notifier link.
+ * @cpuhp_state: state for cpu hotplug notification.
+ * @arch_cpumask: cpumask to get designated cpu for counter access.
+ */
+struct nvdimm_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	int cpu;
+	struct hlist_node node;
+	enum cpuhp_state cpuhp_state;
+	/* cpumask provided by arch/platform specific code */
+	struct cpumask arch_cpumask;
+};
+
 struct nd_device_driver {
 	struct device_driver drv;
 	unsigned long type;
-- 
2.26.2


^ permalink raw reply related

* [PATCH] powerpc: fix unbalanced node refcount in check_kvm_guest()
From: Nathan Lynch @ 2021-09-28 12:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: srikar, npiggin

When check_kvm_guest() succeeds in looking up a /hypervisor OF node, it
returns without performing a matching put for the lookup, leaving the
node's reference count elevated.

Add the necessary call to of_node_put(), rearranging the code slightly to
avoid repetition or goto.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Fixes: 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")
---
 arch/powerpc/kernel/firmware.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c
index c7022c41cc31..20328f72f9f2 100644
--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -31,11 +31,10 @@ int __init check_kvm_guest(void)
 	if (!hyper_node)
 		return 0;
 
-	if (!of_device_is_compatible(hyper_node, "linux,kvm"))
-		return 0;
-
-	static_branch_enable(&kvm_guest);
+	if (of_device_is_compatible(hyper_node, "linux,kvm"))
+		static_branch_enable(&kvm_guest);
 
+	of_node_put(hyper_node);
 	return 0;
 }
 core_initcall(check_kvm_guest); // before kvm_guest_init()
-- 
2.31.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox