LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Deepthi Dharwar @ 2014-02-25  7:59 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Madhavan Srinivasan, Wang Dongsheng, linux-kernel, Paul Gortmaker,
	Paul Mackerras, Olof Johansson, linuxppc-dev
In-Reply-To: <1393028074-26797-1-git-send-email-cody@linux.vnet.ibm.com>

On 02/22/2014 05:44 AM, Cody P Schafer wrote:
> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
> nothing. Add a pr_warn() to convince any users that they should stop
> using it.
> 
> The commit message from the removing commit notes that this
> functionality should move into the cpuidle driver, essentially by
> adjusting target_residency to the specified value. At the moment,
> target_residency is not exposed by cpuidle's sysfs, so there isn't a
> drop in replacement for this.
> 
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>

smt-snooze-delay was used to delay entry into the NAP state or to
disable the NAP state completely. That was before we adopted the cpuidle
framework for idle state management on powerpc. It is a per-cpu
tunable, so cores could have different target residencies
and idle states.

We have now moved to the cpuidle framework, which provides a better way
of managing idle states, and this framework expects a single
target residency for all the cpus. We can no longer honour the
smt-snooze-delay functionality of providing a per-cpu target residency.
This was badly broken in the kernel before the patch to clean it up.
By removing it we honour the cpuidle framework, through which we
carry out idle state management.

Also, the generic cpuidle framework does not provide the flexibility to
change target residency on the go, as multiple idle states are supported
and changing the target residency of one state (incorrectly) may
result in undefined behavior.

Also, the second functionality, disabling/enabling states, can be done
using the cpuidle sysfs files. So this functionality is preserved.
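
For illustration only, a minimal userspace sketch of toggling a state
through cpuidle sysfs. It assumes the usual cpuidle layout
(/sys/devices/system/cpu/cpuN/cpuidle/stateM/disable); the helper names
are hypothetical, and which state index corresponds to NAP depends on the
platform, so check the state's "name" file first.

```c
#include <stdio.h>

/*
 * Build the path of the cpuidle "disable" attribute for one idle state.
 * Returns 0 on success, -1 if the buffer is too small.
 * (Hypothetical helper, not part of any existing tool.)
 */
int cpuidle_disable_path(char *buf, size_t len, int cpu, int state)
{
	int n = snprintf(buf, len,
			 "/sys/devices/system/cpu/cpu%d/cpuidle/state%d/disable",
			 cpu, state);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "1" to disable the state or "0" to re-enable it.
 * Needs root. Returns 0 on success, -1 on any failure.
 */
int cpuidle_set_disabled(int cpu, int state, int disabled)
{
	char path[128];
	FILE *f;
	int ok;

	if (cpuidle_disable_path(path, sizeof(path), cpu, state))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(disabled ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling cpuidle_set_disabled(cpu, state, 1) for each cpu would cover the
"disable NAP completely" use of smt-snooze-delay; there is no equivalent
here for the delay use.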

We currently do not use smt-snooze-delay in the kernel.
The sysfs entries need to be retained until we clean up the ppc64_cpu
util, which uses these entries to determine SMT;
a clean-up patch for this has already been posted by Prerna.
Once we have the ppc64_cpu changes in, we can look at cleaning up these
parts of the kernel.

Regards,
Deepthi

> ---
>  arch/powerpc/kernel/sysfs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 97e1dc9..84097b4 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -50,6 +50,9 @@ static ssize_t store_smt_snooze_delay(struct device *dev,
>  	if (ret != 1)
>  		return -EINVAL;
>  
> +	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +		  current->comm, task_pid_nr(current), cpu->dev.id);
> +
>  	per_cpu(smt_snooze_delay, cpu->dev.id) = snooze;
>  	return count;
>  }
> @@ -60,6 +63,9 @@ static ssize_t show_smt_snooze_delay(struct device *dev,
>  {
>  	struct cpu *cpu = container_of(dev, struct cpu, dev);
>  
> +	pr_warn_ratelimited("%s (%d): /sys/devices/system/cpu/cpu%d/smt-snooze-delay is deprecated and is a NOP\n",
> +		  current->comm, task_pid_nr(current), cpu->dev.id);
> +
>  	return sprintf(buf, "%ld\n", per_cpu(smt_snooze_delay, cpu->dev.id));
>  }
>  
> 


* [PATCH v2 0/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: AKASHI Takahiro @ 2014-02-25  9:16 UTC (permalink / raw)
  To: viro, eparis, rgb, arndb
  Cc: linux-s390, linaro-kernel, linux-ia64, user-mode-linux-devel,
	linux-parisc, linux-sh, catalin.marinas, x86, will.deacon,
	linux-kernel, AKASHI Takahiro, linux-alpha, dsaxena,
	user-mode-linux-user, linux-audit, sparclinux, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <1391407232-4623-1-git-send-email-takahiro.akashi@linaro.org>

Currently AUDITSYSCALL has a long list of architecture dependencies:
       depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
                SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
for simplicity.
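
As a sanity check on the refactoring, here is a small userspace sketch
that models both forms of the dependency as boolean functions and
compares them over every combination of the symbols involved. It is only
a model: the bit names mirror the Kconfig symbols, the function names are
made up for this example, and it assumes each "select" line contributes
one OR term to HAVE_ARCH_AUDITSYSCALL.

```c
#include <stdbool.h>

/* One bit per Kconfig symbol that appears in the old expression. */
enum {
	AUDIT = 1 << 0, X86 = 1 << 1, PARISC = 1 << 2, PPC = 1 << 3,
	S390 = 1 << 4, IA64 = 1 << 5, UML = 1 << 6, SPARC64 = 1 << 7,
	SUPERH = 1 << 8, ARM = 1 << 9, AEABI = 1 << 10,
	OABI_COMPAT = 1 << 11, ALPHA = 1 << 12,
	NSYMS = 13
};

static bool on(unsigned cfg, unsigned sym)
{
	return (cfg & sym) != 0;
}

/* The "depends on" line removed from init/Kconfig. */
bool old_depends(unsigned c)
{
	return on(c, AUDIT) &&
	       (on(c, X86) || on(c, PARISC) || on(c, PPC) || on(c, S390) ||
		on(c, IA64) || on(c, UML) || on(c, SPARC64) || on(c, SUPERH) ||
		(on(c, ARM) && on(c, AEABI) && !on(c, OABI_COMPAT)) ||
		on(c, ALPHA));
}

/* One term per "select HAVE_ARCH_AUDITSYSCALL" added under arch/. */
bool have_arch_auditsyscall(unsigned c)
{
	return on(c, ALPHA) ||
	       (on(c, ARM) && on(c, AEABI) && !on(c, OABI_COMPAT)) ||
	       on(c, IA64) || on(c, PARISC) || on(c, PPC) || on(c, S390) ||
	       on(c, SUPERH) || on(c, SPARC64) || on(c, UML) || on(c, X86);
}

/* The new "depends on AUDIT && HAVE_ARCH_AUDITSYSCALL" line. */
bool new_depends(unsigned c)
{
	return on(c, AUDIT) && have_arch_auditsyscall(c);
}

/* Count configurations where the old and new forms disagree. */
unsigned count_mismatches(void)
{
	unsigned c, bad = 0;

	for (c = 0; c < (1u << NSYMS); c++)
		if (old_depends(c) != new_depends(c))
			bad++;
	return bad;
}
```

The ARM term is the only one that carries a condition, which is why its
select line needs the "if (AEABI && !OABI_COMPAT)" suffix.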

Changes v1 -> v2:
* rebased to 3.14-rcX, and so added a change on ALPHA

AKASHI Takahiro (1):
  audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL

 arch/alpha/Kconfig     |    1 +
 arch/arm/Kconfig       |    1 +
 arch/ia64/Kconfig      |    1 +
 arch/parisc/Kconfig    |    1 +
 arch/powerpc/Kconfig   |    1 +
 arch/s390/Kconfig      |    1 +
 arch/sh/Kconfig        |    1 +
 arch/sparc/Kconfig     |    1 +
 arch/um/Kconfig.common |    1 +
 arch/x86/Kconfig       |    1 +
 init/Kconfig           |    5 ++++-
 11 files changed, 14 insertions(+), 1 deletion(-)

-- 
1.7.9.5


* [PATCH v2 1/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: AKASHI Takahiro @ 2014-02-25  9:16 UTC (permalink / raw)
  To: viro, eparis, rgb, arndb
  Cc: linux-s390, linaro-kernel, linux-ia64, user-mode-linux-devel,
	linux-parisc, linux-sh, catalin.marinas, x86, will.deacon,
	linux-kernel, AKASHI Takahiro, linux-alpha, dsaxena,
	user-mode-linux-user, linux-audit, sparclinux, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <1393319784-2758-1-git-send-email-takahiro.akashi@linaro.org>

Currently AUDITSYSCALL has a long list of architecture dependencies:
       depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
		SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
for simplicity.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/alpha/Kconfig     |    1 +
 arch/arm/Kconfig       |    1 +
 arch/ia64/Kconfig      |    1 +
 arch/parisc/Kconfig    |    1 +
 arch/powerpc/Kconfig   |    1 +
 arch/s390/Kconfig      |    1 +
 arch/sh/Kconfig        |    1 +
 arch/sparc/Kconfig     |    1 +
 arch/um/Kconfig.common |    1 +
 arch/x86/Kconfig       |    1 +
 init/Kconfig           |    5 ++++-
 11 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index f6c6b34..b7ff9a3 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -22,6 +22,7 @@ config ALPHA
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
+	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select ODD_RT_SIGACTION
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e254198..ca79340 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -24,6 +24,7 @@ config ARM
 	select GENERIC_STRNCPY_FROM_USER
 	select GENERIC_STRNLEN_USER
 	select HARDIRQS_SW_RESEND
+	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 0c8e553..5409bf4 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -44,6 +44,7 @@ config IA64
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select HAVE_ARCH_AUDITSYSCALL
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index bb2a8ec..1faefed 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -28,6 +28,7 @@ config PARISC
 	select CLONE_BACKWARDS
 	select TTY # Needed for pdc_cons.c
 	select HAVE_DEBUG_STACKOVERFLOW
+	select HAVE_ARCH_AUDITSYSCALL
 
 	help
 	  The PA-RISC microprocessor is designed by Hewlett-Packard and used
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..7b3b8fe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -141,6 +141,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK
 	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
+	select HAVE_ARCH_AUDITSYSCALL
 
 config GENERIC_CSUM
 	def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 65a0775..1b58568 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -103,6 +103,7 @@ config S390
 	select GENERIC_SMP_IDLE_THREAD
 	select GENERIC_TIME_VSYSCALL
 	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
+	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL if !MARCH_G5
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 6357710..4addd87 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -42,6 +42,7 @@ config SUPERH
 	select MODULES_USE_ELF_RELA
 	select OLD_SIGSUSPEND
 	select OLD_SIGACTION
+	select HAVE_ARCH_AUDITSYSCALL
 	help
 	  The SuperH is a RISC processor targeted for use in embedded systems
 	  and consumer electronics; it was also used in the Sega Dreamcast
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index c51efdc..9c74d6b 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -77,6 +77,7 @@ config SPARC64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select HAVE_C_RECORDMCOUNT
 	select NO_BOOTMEM
+	select HAVE_ARCH_AUDITSYSCALL
 
 config ARCH_DEFCONFIG
 	string
diff --git a/arch/um/Kconfig.common b/arch/um/Kconfig.common
index 21ca44c..6915d28 100644
--- a/arch/um/Kconfig.common
+++ b/arch/um/Kconfig.common
@@ -1,6 +1,7 @@
 config UML
 	bool
 	default y
+	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_UID16
 	select GENERIC_IRQ_SHOW
 	select GENERIC_CPU_DEVICES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..2938365 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
 	select HAVE_CC_STACKPROTECTOR
+	select HAVE_ARCH_AUDITSYSCALL
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/init/Kconfig b/init/Kconfig
index 009a797..d4ec53d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -282,9 +282,12 @@ config AUDIT
 	  logging of avc messages output).  Does not do system-call
 	  auditing without CONFIG_AUDITSYSCALL.
 
+config HAVE_ARCH_AUDITSYSCALL
+	bool
+
 config AUDITSYSCALL
 	bool "Enable system-call auditing support"
-	depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML || SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
+	depends on AUDIT && HAVE_ARCH_AUDITSYSCALL
 	default y if SECURITY_SELINUX
 	help
 	  Enable low-overhead system-call auditing infrastructure that
-- 
1.7.9.5


* Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
From: Peter Zijlstra @ 2014-02-25 10:20 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Cody P Schafer, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Linux PPC
In-Reply-To: <20140225033326.7BB942C0228@ozlabs.org>

On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
> > Export the swevent hrtimer helpers currently only used in events/core.c
> > to allow the addition of architecture specific sw-like pmus.
> 
> Peter, Ingo, can we get your ACK on this please?

How are they used? I saw some usage in patch 9 or so, but it's not
explained anywhere. All patches have non-existent Changelogs and the few
comments that are there are pretty hardware specific.

So please do tell; what do you need this for?


* Re: [PATCH] powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_ranges
From: Benjamin Herrenschmidt @ 2014-02-25 13:25 UTC (permalink / raw)
  To: Andrew Murray; +Cc: linux-pci, bhelgass, linuxppc-dev
In-Reply-To: <1393309931-20405-1-git-send-email-amurray@embedded-bits.co.uk>

On Tue, 2014-02-25 at 06:32 +0000, Andrew Murray wrote:
> This patch updates the implementation of pci_process_bridge_OF_ranges to use
> the of_pci_range_parser helpers.
> 
> Signed-off-by: Andrew Murray <amurray@embedded-bits.co.uk>
> ---
> I've verified that this builds, however I have no hardware to test this.
> ---

Thanks. A cursory review looks good but I need to spend a bit more time
making sure our various special cases are handled properly.

It's tracked on patchwork so unless you have an update to the patch,
it won't be lost, but it might take a little while before I get to
actually merge it.

Cheers,
Ben.

>  arch/powerpc/kernel/pci-common.c | 88 +++++++++++++---------------------------
>  1 file changed, 29 insertions(+), 59 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index d9476c1..a05fe18 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -666,60 +666,36 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
>  void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>  				  struct device_node *dev, int primary)
>  {
> -	const __be32 *ranges;
> -	int rlen;
> -	int pna = of_n_addr_cells(dev);
> -	int np = pna + 5;
>  	int memno = 0;
> -	u32 pci_space;
> -	unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
>  	struct resource *res;
> +	struct of_pci_range range;
> +	struct of_pci_range_parser parser;
>  
>  	printk(KERN_INFO "PCI host bridge %s %s ranges:\n",
>  	       dev->full_name, primary ? "(primary)" : "");
>  
> -	/* Get ranges property */
> -	ranges = of_get_property(dev, "ranges", &rlen);
> -	if (ranges == NULL)
> +	/* Check for ranges property */
> +	if (of_pci_range_parser_init(&parser, dev))
>  		return;
>  
>  	/* Parse it */
> -	while ((rlen -= np * 4) >= 0) {
> -		/* Read next ranges element */
> -		pci_space = of_read_number(ranges, 1);
> -		pci_addr = of_read_number(ranges + 1, 2);
> -		cpu_addr = of_translate_address(dev, ranges + 3);
> -		size = of_read_number(ranges + pna + 3, 2);
> -		ranges += np;
> -
> +	for_each_of_pci_range(&parser, &range) {
>  		/* If we failed translation or got a zero-sized region
>  		 * (some FW try to feed us with non sensical zero sized regions
>  		 * such as power3 which look like some kind of attempt at exposing
>  		 * the VGA memory hole)
>  		 */
> -		if (cpu_addr == OF_BAD_ADDR || size == 0)
> +		if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>  			continue;
>  
> -		/* Now consume following elements while they are contiguous */
> -		for (; rlen >= np * sizeof(u32);
> -		     ranges += np, rlen -= np * 4) {
> -			if (of_read_number(ranges, 1) != pci_space)
> -				break;
> -			pci_next = of_read_number(ranges + 1, 2);
> -			cpu_next = of_translate_address(dev, ranges + 3);
> -			if (pci_next != pci_addr + size ||
> -			    cpu_next != cpu_addr + size)
> -				break;
> -			size += of_read_number(ranges + pna + 3, 2);
> -		}
> -
>  		/* Act based on address space type */
>  		res = NULL;
> -		switch ((pci_space >> 24) & 0x3) {
> -		case 1:		/* PCI IO space */
> +		switch (range.flags & IORESOURCE_TYPE_BITS) {
> +		case IORESOURCE_IO:
>  			printk(KERN_INFO
>  			       "  IO 0x%016llx..0x%016llx -> 0x%016llx\n",
> -			       cpu_addr, cpu_addr + size - 1, pci_addr);
> +			       range.cpu_addr, range.cpu_addr + range.size - 1,
> +			       range.pci_addr);
>  
>  			/* We support only one IO range */
>  			if (hose->pci_io_size) {
> @@ -729,11 +705,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>  			}
>  #ifdef CONFIG_PPC32
>  			/* On 32 bits, limit I/O space to 16MB */
> -			if (size > 0x01000000)
> -				size = 0x01000000;
> +			if (range.size > 0x01000000)
> +				range.size = 0x01000000;
>  
>  			/* 32 bits needs to map IOs here */
> -			hose->io_base_virt = ioremap(cpu_addr, size);
> +			hose->io_base_virt = ioremap(range.cpu_addr,
> +						range.size);
>  
>  			/* Expect trouble if pci_addr is not 0 */
>  			if (primary)
> @@ -743,20 +720,20 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>  			/* pci_io_size and io_base_phys always represent IO
>  			 * space starting at 0 so we factor in pci_addr
>  			 */
> -			hose->pci_io_size = pci_addr + size;
> -			hose->io_base_phys = cpu_addr - pci_addr;
> +			hose->pci_io_size = range.pci_addr + range.size;
> +			hose->io_base_phys = range.cpu_addr - range.pci_addr;
>  
>  			/* Build resource */
>  			res = &hose->io_resource;
> -			res->flags = IORESOURCE_IO;
> -			res->start = pci_addr;
> +			range.cpu_addr = range.pci_addr;
>  			break;
> -		case 2:		/* PCI Memory space */
> -		case 3:		/* PCI 64 bits Memory space */
> +		case IORESOURCE_MEM:
>  			printk(KERN_INFO
>  			       " MEM 0x%016llx..0x%016llx -> 0x%016llx %s\n",
> -			       cpu_addr, cpu_addr + size - 1, pci_addr,
> -			       (pci_space & 0x40000000) ? "Prefetch" : "");
> +			       range.cpu_addr, range.cpu_addr + range.size - 1,
> +			       range.pci_addr,
> +			       (range.pci_space & 0x40000000) ?
> +			       "Prefetch" : "");
>  
>  			/* We support only 3 memory ranges */
>  			if (memno >= 3) {
> @@ -765,28 +742,21 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>  				continue;
>  			}
>  			/* Handles ISA memory hole space here */
> -			if (pci_addr == 0) {
> +			if (range.pci_addr == 0) {
>  				if (primary || isa_mem_base == 0)
> -					isa_mem_base = cpu_addr;
> -				hose->isa_mem_phys = cpu_addr;
> -				hose->isa_mem_size = size;
> +					isa_mem_base = range.cpu_addr;
> +				hose->isa_mem_phys = range.cpu_addr;
> +				hose->isa_mem_size = range.size;
>  			}
>  
>  			/* Build resource */
> -			hose->mem_offset[memno] = cpu_addr - pci_addr;
> +			hose->mem_offset[memno] = range.cpu_addr -
> +							range.pci_addr;
>  			res = &hose->mem_resources[memno++];
> -			res->flags = IORESOURCE_MEM;
> -			if (pci_space & 0x40000000)
> -				res->flags |= IORESOURCE_PREFETCH;
> -			res->start = cpu_addr;
>  			break;
>  		}
>  		if (res != NULL) {
> -			res->name = dev->full_name;
> -			res->end = res->start + size - 1;
> -			res->parent = NULL;
> -			res->sibling = NULL;
> -			res->child = NULL;
> +			of_pci_range_to_resource(&range, dev, res);
>  		}
>  	}
>  }


* Re: [PATCH] powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_ranges
From: Andrew Murray @ 2014-02-25 14:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-pci@vger.kernel.org, bhelgass, linuxppc-dev
In-Reply-To: <1393334743.1282.8.camel@pasglop>

On 25 February 2014 13:25, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Tue, 2014-02-25 at 06:32 +0000, Andrew Murray wrote:
>> This patch updates the implementation of pci_process_bridge_OF_ranges to use
>> the of_pci_range_parser helpers.
>>
>> Signed-off-by: Andrew Murray <amurray@embedded-bits.co.uk>
>> ---
>> I've verified that this builds, however I have no hardware to test this.
>> ---
>
> Thanks. A cursory review looks good but I need to spend a bit more time
> making sure our various special cases are handled properly.
>
> It's tracked on patchwork so unless you have an update to the patch,
> it won't be lost, but it might take a little while before I get to
> actually merge it.

Thanks for the response - Yes it's easy to screw this stuff up.

Please note that some of the special cases are handled by the parser
helper e.g. consumption of contiguous ranges, assignment to 'struct
resource', etc.
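
To make the contiguous-range case concrete, here is a rough userspace
model of the test the removed open-coded loop performed: a following
entry is folded into the current one when it has the same space code and
both its PCI and CPU addresses continue exactly where the current range
ends. The struct and function names are invented for this sketch; it is
not the parser helper's code, only the condition from the old loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one decoded "ranges" entry. */
struct pci_range {
	uint32_t pci_space;	/* space code cell (IO/MEM, prefetch bit) */
	uint64_t pci_addr;	/* address on the PCI side */
	uint64_t cpu_addr;	/* translated CPU physical address */
	uint64_t size;
};

/*
 * Merge adjacent entries in place when they are contiguous on both the
 * PCI and the CPU side and share the same space code.
 * Returns the number of entries left after merging.
 */
size_t merge_contiguous(struct pci_range *r, size_t n)
{
	size_t out = 0, i;

	for (i = 0; i < n; i++) {
		if (out > 0) {
			struct pci_range *prev = &r[out - 1];

			if (r[i].pci_space == prev->pci_space &&
			    r[i].pci_addr == prev->pci_addr + prev->size &&
			    r[i].cpu_addr == prev->cpu_addr + prev->size) {
				/* Extends the previous range: grow it. */
				prev->size += r[i].size;
				continue;
			}
		}
		/* Starts a new range: keep it as a separate entry. */
		r[out++] = r[i];
	}
	return out;
}
```

With two back-to-back memory entries followed by an IO entry, this
leaves two ranges: one memory range with the summed size, and the IO
range untouched.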

It should also be pointed out that this is now once again very similar
to the Microblaze implementation.

Thanks,

Andrew Murray

>
> Cheers,
> Ben.
>
>>  arch/powerpc/kernel/pci-common.c | 88 +++++++++++++---------------------------
>>  1 file changed, 29 insertions(+), 59 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index d9476c1..a05fe18 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -666,60 +666,36 @@ void pci_resource_to_user(const struct pci_dev *dev, int bar,
>>  void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>>                                 struct device_node *dev, int primary)
>>  {
>> -     const __be32 *ranges;
>> -     int rlen;
>> -     int pna = of_n_addr_cells(dev);
>> -     int np = pna + 5;
>>       int memno = 0;
>> -     u32 pci_space;
>> -     unsigned long long pci_addr, cpu_addr, pci_next, cpu_next, size;
>>       struct resource *res;
>> +     struct of_pci_range range;
>> +     struct of_pci_range_parser parser;
>>
>>       printk(KERN_INFO "PCI host bridge %s %s ranges:\n",
>>              dev->full_name, primary ? "(primary)" : "");
>>
>> -     /* Get ranges property */
>> -     ranges = of_get_property(dev, "ranges", &rlen);
>> -     if (ranges == NULL)
>> +     /* Check for ranges property */
>> +     if (of_pci_range_parser_init(&parser, dev))
>>               return;
>>
>>       /* Parse it */
>> -     while ((rlen -= np * 4) >= 0) {
>> -             /* Read next ranges element */
>> -             pci_space = of_read_number(ranges, 1);
>> -             pci_addr = of_read_number(ranges + 1, 2);
>> -             cpu_addr = of_translate_address(dev, ranges + 3);
>> -             size = of_read_number(ranges + pna + 3, 2);
>> -             ranges += np;
>> -
>> +     for_each_of_pci_range(&parser, &range) {
>>               /* If we failed translation or got a zero-sized region
>>                * (some FW try to feed us with non sensical zero sized regions
>>                * such as power3 which look like some kind of attempt at exposing
>>                * the VGA memory hole)
>>                */
>> -             if (cpu_addr == OF_BAD_ADDR || size == 0)
>> +             if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
>>                       continue;
>>
>> -             /* Now consume following elements while they are contiguous */
>> -             for (; rlen >= np * sizeof(u32);
>> -                  ranges += np, rlen -= np * 4) {
>> -                     if (of_read_number(ranges, 1) != pci_space)
>> -                             break;
>> -                     pci_next = of_read_number(ranges + 1, 2);
>> -                     cpu_next = of_translate_address(dev, ranges + 3);
>> -                     if (pci_next != pci_addr + size ||
>> -                         cpu_next != cpu_addr + size)
>> -                             break;
>> -                     size += of_read_number(ranges + pna + 3, 2);
>> -             }
>> -
>>               /* Act based on address space type */
>>               res = NULL;
>> -             switch ((pci_space >> 24) & 0x3) {
>> -             case 1:         /* PCI IO space */
>> +             switch (range.flags & IORESOURCE_TYPE_BITS) {
>> +             case IORESOURCE_IO:
>>                       printk(KERN_INFO
>>                              "  IO 0x%016llx..0x%016llx -> 0x%016llx\n",
>> -                            cpu_addr, cpu_addr + size - 1, pci_addr);
>> +                            range.cpu_addr, range.cpu_addr + range.size - 1,
>> +                            range.pci_addr);
>>
>>                       /* We support only one IO range */
>>                       if (hose->pci_io_size) {
>> @@ -729,11 +705,12 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>>                       }
>>  #ifdef CONFIG_PPC32
>>                       /* On 32 bits, limit I/O space to 16MB */
>> -                     if (size > 0x01000000)
>> -                             size = 0x01000000;
>> +                     if (range.size > 0x01000000)
>> +                             range.size = 0x01000000;
>>
>>                       /* 32 bits needs to map IOs here */
>> -                     hose->io_base_virt = ioremap(cpu_addr, size);
>> +                     hose->io_base_virt = ioremap(range.cpu_addr,
>> +                                             range.size);
>>
>>                       /* Expect trouble if pci_addr is not 0 */
>>                       if (primary)
>> @@ -743,20 +720,20 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>>                       /* pci_io_size and io_base_phys always represent IO
>>                        * space starting at 0 so we factor in pci_addr
>>                        */
>> -                     hose->pci_io_size = pci_addr + size;
>> -                     hose->io_base_phys = cpu_addr - pci_addr;
>> +                     hose->pci_io_size = range.pci_addr + range.size;
>> +                     hose->io_base_phys = range.cpu_addr - range.pci_addr;
>>
>>                       /* Build resource */
>>                       res = &hose->io_resource;
>> -                     res->flags = IORESOURCE_IO;
>> -                     res->start = pci_addr;
>> +                     range.cpu_addr = range.pci_addr;
>>                       break;
>> -             case 2:         /* PCI Memory space */
>> -             case 3:         /* PCI 64 bits Memory space */
>> +             case IORESOURCE_MEM:
>>                       printk(KERN_INFO
>>                              " MEM 0x%016llx..0x%016llx -> 0x%016llx %s\n",
>> -                            cpu_addr, cpu_addr + size - 1, pci_addr,
>> -                            (pci_space & 0x40000000) ? "Prefetch" : "");
>> +                            range.cpu_addr, range.cpu_addr + range.size - 1,
>> +                            range.pci_addr,
>> +                            (range.pci_space & 0x40000000) ?
>> +                            "Prefetch" : "");
>>
>>                       /* We support only 3 memory ranges */
>>                       if (memno >= 3) {
>> @@ -765,28 +742,21 @@ void pci_process_bridge_OF_ranges(struct pci_controller *hose,
>>                               continue;
>>                       }
>>                       /* Handles ISA memory hole space here */
>> -                     if (pci_addr == 0) {
>> +                     if (range.pci_addr == 0) {
>>                               if (primary || isa_mem_base == 0)
>> -                                     isa_mem_base = cpu_addr;
>> -                             hose->isa_mem_phys = cpu_addr;
>> -                             hose->isa_mem_size = size;
>> +                                     isa_mem_base = range.cpu_addr;
>> +                             hose->isa_mem_phys = range.cpu_addr;
>> +                             hose->isa_mem_size = range.size;
>>                       }
>>
>>                       /* Build resource */
>> -                     hose->mem_offset[memno] = cpu_addr - pci_addr;
>> +                     hose->mem_offset[memno] = range.cpu_addr -
>> +                                                     range.pci_addr;
>>                       res = &hose->mem_resources[memno++];
>> -                     res->flags = IORESOURCE_MEM;
>> -                     if (pci_space & 0x40000000)
>> -                             res->flags |= IORESOURCE_PREFETCH;
>> -                     res->start = cpu_addr;
>>                       break;
>>               }
>>               if (res != NULL) {
>> -                     res->name = dev->full_name;
>> -                     res->end = res->start + size - 1;
>> -                     res->parent = NULL;
>> -                     res->sibling = NULL;
>> -                     res->child = NULL;
>> +                     of_pci_range_to_resource(&range, dev, res);
>>               }
>>       }
>>  }
>
>



-- 
Andrew Murray, Director
Embedded Bits Limited
www.embedded-bits.co.uk

Embedded Bits Limited is a company registered in England and Wales
with company number 08178608 and VAT number 140658911. Registered
office: Embedded Bits Limited c/o InTouch Accounting Ltd. Bristol and
West House Post Office Road Bournemouth Dorset BH1 1BL


* Re: [PATCH v2 1/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: Will Deacon @ 2014-02-25 14:53 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: linux-s390@vger.kernel.org, linaro-kernel@lists.linaro.org,
	linux-ia64@vger.kernel.org,
	user-mode-linux-devel@lists.sourceforge.net,
	linux-parisc@vger.kernel.org, linux-sh@vger.kernel.org,
	rgb@redhat.com, Catalin Marinas, x86@kernel.org, arndb@arndb.de,
	eparis@redhat.com, linux-kernel@vger.kernel.org,
	linux-alpha@vger.kernel.org, dsaxena@linaro.org,
	viro@zeniv.linux.org.uk,
	user-mode-linux-user@lists.sourceforge.net,
	linux-audit@redhat.com, sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <1393319784-2758-2-git-send-email-takahiro.akashi@linaro.org>

On Tue, Feb 25, 2014 at 09:16:24AM +0000, AKASHI Takahiro wrote:
> Currently AUDITSYSCALL has a long list of architecture dependencies:
>        depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
> 		SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
> The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
> for simplicity.

Looks sensible to me:

  Acked-by: Will Deacon <will.deacon@arm.com>

Will


* Re: [PATCH v2 1/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: Richard Guy Briggs @ 2014-02-25 15:25 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: linux-s390, linaro-kernel, linux-ia64, user-mode-linux-devel,
	linux-parisc, linux-sh, catalin.marinas, x86, will.deacon,
	linux-kernel, eparis, linux-audit, user-mode-linux-user,
	linux-alpha, sparclinux, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1393319784-2758-2-git-send-email-takahiro.akashi@linaro.org>

On 14/02/25, AKASHI Takahiro wrote:
> Currently AUDITSYSCALL has a long list of architecture dependencies:
>        depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
> 		SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
> The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
> for simplicity.
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>

Acked-by: Richard Guy Briggs <rgb@redhat.com>

> ---
>  arch/alpha/Kconfig     |    1 +
>  arch/arm/Kconfig       |    1 +
>  arch/ia64/Kconfig      |    1 +
>  arch/parisc/Kconfig    |    1 +
>  arch/powerpc/Kconfig   |    1 +
>  arch/s390/Kconfig      |    1 +
>  arch/sh/Kconfig        |    1 +
>  arch/sparc/Kconfig     |    1 +
>  arch/um/Kconfig.common |    1 +
>  arch/x86/Kconfig       |    1 +
>  init/Kconfig           |    5 ++++-
>  11 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index f6c6b34..b7ff9a3 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -22,6 +22,7 @@ config ALPHA
>  	select GENERIC_SMP_IDLE_THREAD
>  	select GENERIC_STRNCPY_FROM_USER
>  	select GENERIC_STRNLEN_USER
> +	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_MOD_ARCH_SPECIFIC
>  	select MODULES_USE_ELF_RELA
>  	select ODD_RT_SIGACTION
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index e254198..ca79340 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -24,6 +24,7 @@ config ARM
>  	select GENERIC_STRNCPY_FROM_USER
>  	select GENERIC_STRNLEN_USER
>  	select HARDIRQS_SW_RESEND
> +	select HAVE_ARCH_AUDITSYSCALL if (AEABI && !OABI_COMPAT)
>  	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL
>  	select HAVE_ARCH_KGDB
>  	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
> diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
> index 0c8e553..5409bf4 100644
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -44,6 +44,7 @@ config IA64
>  	select HAVE_MOD_ARCH_SPECIFIC
>  	select MODULES_USE_ELF_RELA
>  	select ARCH_USE_CMPXCHG_LOCKREF
> +	select HAVE_ARCH_AUDITSYSCALL
>  	default y
>  	help
>  	  The Itanium Processor Family is Intel's 64-bit successor to
> diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
> index bb2a8ec..1faefed 100644
> --- a/arch/parisc/Kconfig
> +++ b/arch/parisc/Kconfig
> @@ -28,6 +28,7 @@ config PARISC
>  	select CLONE_BACKWARDS
>  	select TTY # Needed for pdc_cons.c
>  	select HAVE_DEBUG_STACKOVERFLOW
> +	select HAVE_ARCH_AUDITSYSCALL
>  
>  	help
>  	  The PA-RISC microprocessor is designed by Hewlett-Packard and used
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 957bf34..7b3b8fe 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -141,6 +141,7 @@ config PPC
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK
>  	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
> +	select HAVE_ARCH_AUDITSYSCALL
>  
>  config GENERIC_CSUM
>  	def_bool CPU_LITTLE_ENDIAN
> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
> index 65a0775..1b58568 100644
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -103,6 +103,7 @@ config S390
>  	select GENERIC_SMP_IDLE_THREAD
>  	select GENERIC_TIME_VSYSCALL
>  	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
> +	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_ARCH_JUMP_LABEL if !MARCH_G5
>  	select HAVE_ARCH_SECCOMP_FILTER
>  	select HAVE_ARCH_TRACEHOOK
> diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
> index 6357710..4addd87 100644
> --- a/arch/sh/Kconfig
> +++ b/arch/sh/Kconfig
> @@ -42,6 +42,7 @@ config SUPERH
>  	select MODULES_USE_ELF_RELA
>  	select OLD_SIGSUSPEND
>  	select OLD_SIGACTION
> +	select HAVE_ARCH_AUDITSYSCALL
>  	help
>  	  The SuperH is a RISC processor targeted for use in embedded systems
>  	  and consumer electronics; it was also used in the Sega Dreamcast
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index c51efdc..9c74d6b 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -77,6 +77,7 @@ config SPARC64
>  	select ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	select HAVE_C_RECORDMCOUNT
>  	select NO_BOOTMEM
> +	select HAVE_ARCH_AUDITSYSCALL
>  
>  config ARCH_DEFCONFIG
>  	string
> diff --git a/arch/um/Kconfig.common b/arch/um/Kconfig.common
> index 21ca44c..6915d28 100644
> --- a/arch/um/Kconfig.common
> +++ b/arch/um/Kconfig.common
> @@ -1,6 +1,7 @@
>  config UML
>  	bool
>  	default y
> +	select HAVE_ARCH_AUDITSYSCALL
>  	select HAVE_UID16
>  	select GENERIC_IRQ_SHOW
>  	select GENERIC_CPU_DEVICES
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0af5250..2938365 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -127,6 +127,7 @@ config X86
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
>  	select HAVE_CC_STACKPROTECTOR
> +	select HAVE_ARCH_AUDITSYSCALL
>  
>  config INSTRUCTION_DECODER
>  	def_bool y
> diff --git a/init/Kconfig b/init/Kconfig
> index 009a797..d4ec53d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -282,9 +282,12 @@ config AUDIT
>  	  logging of avc messages output).  Does not do system-call
>  	  auditing without CONFIG_AUDITSYSCALL.
>  
> +config HAVE_ARCH_AUDITSYSCALL
> +	bool
> +
>  config AUDITSYSCALL
>  	bool "Enable system-call auditing support"
> -	depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML || SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
> +	depends on AUDIT && HAVE_ARCH_AUDITSYSCALL
>  	default y if SECURITY_SELINUX
>  	help
>  	  Enable low-overhead system-call auditing infrastructure that
> -- 
> 1.7.9.5
> 

- RGB

--
Richard Guy Briggs <rbriggs@redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

* Re: [PATCH] PPC: KVM: Introduce hypervisor call H_GET_TCE
From: Laurent Dufour @ 2014-02-25 16:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org mailing list, Gleb Natapov, kvm-ppc,
	Paul Mackerras, Paolo Bonzini, linuxppc-dev
In-Reply-To: <75FB1EEB-910A-49A9-A4CC-0A2E5403C54C@suse.de>

On 21/02/2014 16:57, Alexander Graf wrote:
> 
> On 21.02.2014, at 16:31, Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> 
>> This fix introduces the H_GET_TCE hypervisor call which is basically the
>> reverse of H_PUT_TCE, as defined in the Power Architecture Platform
>> Requirements (PAPR).
>>
>> The hcall H_GET_TCE is required by the kdump kernel which is calling it to
>> retrieve the TCE set up by the panicking kernel.
>>
>> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
> 
> Thanks, applied to kvm-ppc-queue. Btw, why exactly are we using struct page pointers and alloc_page rather than __get_free_page() and simple page start pointers?

FWIW, I'm not so familiar with that part of the code, but it seems that this
is due to the page fault handler (kvm_spapr_tce_fault), which is one of the
mmap file operation handlers associated with the fd returned by
kvm_vm_ioctl_create_spapr_tce. The underlying vma operations require the
page fault handler to return a struct page value in the vm_fault structure.

Cheers,
Laurent.

* Re: [PATCH v2 1/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: Matt Turner @ 2014-02-25 17:40 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: linux-s390, linaro-kernel, linux-ia64, user-mode-linux-devel,
	linux-parisc@vger.kernel.org, linux-sh, rgb, catalin.marinas, x86,
	Will Deacon, arndb, eparis, LKML, linux-alpha, dsaxena, Al Viro,
	user-mode-linux-user, linux-audit, sparclinux, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <1393319784-2758-2-git-send-email-takahiro.akashi@linaro.org>

On Tue, Feb 25, 2014 at 1:16 AM, AKASHI Takahiro
<takahiro.akashi@linaro.org> wrote:
> Currently AUDITSYSCALL has a long list of architecture dependencies:
>        depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
>                 SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
> The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
> for simplicity.
>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/alpha/Kconfig     |    1 +
>  arch/arm/Kconfig       |    1 +
>  arch/ia64/Kconfig      |    1 +
>  arch/parisc/Kconfig    |    1 +
>  arch/powerpc/Kconfig   |    1 +
>  arch/s390/Kconfig      |    1 +
>  arch/sh/Kconfig        |    1 +
>  arch/sparc/Kconfig     |    1 +
>  arch/um/Kconfig.common |    1 +
>  arch/x86/Kconfig       |    1 +
>  init/Kconfig           |    5 ++++-
>  11 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
> index f6c6b34..b7ff9a3 100644
> --- a/arch/alpha/Kconfig
> +++ b/arch/alpha/Kconfig
> @@ -22,6 +22,7 @@ config ALPHA
>         select GENERIC_SMP_IDLE_THREAD
>         select GENERIC_STRNCPY_FROM_USER
>         select GENERIC_STRNLEN_USER
> +       select HAVE_ARCH_AUDITSYSCALL
>         select HAVE_MOD_ARCH_SPECIFIC
>         select MODULES_USE_ELF_RELA
>         select ODD_RT_SIGACTION

Thanks.

Acked-by: Matt Turner <mattst88@gmail.com>

* Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
From: Cody P Schafer @ 2014-02-25 20:33 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC, Arnaldo Carvalho de Melo,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra
  Cc: LKML
In-Reply-To: <20140225033326.135BB2C0227@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
>> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
>> generate functions to extract the relevant bits from
>> event->attr.config{,1,2} for use by sw-like pmus where the
>> 'config{,1,2}' values don't map directly to hardware registers.
>>
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   include/linux/perf_event.h | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index e56b07f..2702e91 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -871,4 +871,21 @@ _name##_show(struct device *dev,					\
>>   									\
>>   static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
>>
>> +#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
>> +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
>> +PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
>> +
>> +#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
>> +static u64 event_get_##name##_max(void)					\
>> +{									\
>> +	int bits = (bit_end) - (bit_start) + 1;				\
>> +	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
>> +		(0xFULL << (bits - 4ULL));				\
>> +}									\
>> +static u64 event_get_##name(struct perf_event *event)			\
>> +{									\
>> +	return (event->attr.attr_var >> (bit_start)) &			\
>> +		event_get_##name##_max();				\
>> +}
>
> I still don't like the names.
>
> EVENT_GETTER_AND_FORMAT()

EVENT_RANGE()

I'd prefer to describe the intended usage rather than what is generated 
both in case we change some of the specifics later, and to provide 
additional information to the developers beyond what a simple code 
reading gives.

> EVENT_RESERVED()

Sure. The PMU_* naming was just based on the PMU_FORMAT_ATTR() naming, 
so I kept it for continuity with the existing API. Maybe 
EVENT_RANGE_RESERVED() would be more appropriate?

> ?
>
> It's not clear to me the max routine is useful in general. Can't we just do:
>
>> +#define EVENT_RESERVED(name, attr_var, bit_start, bit_end)		\
>> +static u64 event_get_##name(struct perf_event *event)		\
>> +{									\
>> +	return (event->attr.attr_var >> (bit_start)) &			\
>> +		((0x1ULL << ((bit_end) - (bit_start) + 1)) - 1ULL);	\
>> +}

I use event_get_*_max() for some checking of parameters in event_init(). 
Having it lets me avoid specifying the maximum explicitly (0xfffff for 
bits 0-19, for example). Specifying it explicitly would mean we'd have the 
bit width of the field in question encoded in two places instead of one, 
and I'd prefer to avoid unneeded duplication.
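
To illustrate deriving the maximum from the bit range instead of writing it out by hand, here is a minimal standalone C sketch. It is not the macro from the patch: field_max() and field_get() are hypothetical names chosen for this example, and the code assumes an inclusive bit range within a 64-bit config word.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in the inclusive bit range [bit_start, bit_end]. */
static inline uint64_t field_max(unsigned int bit_start, unsigned int bit_end)
{
	unsigned int bits;

	assert(bit_end >= bit_start && bit_end < 64);
	bits = bit_end - bit_start + 1;
	/* Shifting a 64-bit value by 64 is undefined, so treat full width separately. */
	return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Extract the field stored in bits [bit_start, bit_end] of config. */
static inline uint64_t field_get(uint64_t config, unsigned int bit_start,
				 unsigned int bit_end)
{
	return (config >> bit_start) & field_max(bit_start, bit_end);
}
```

With helpers like these the bit range is the single source of truth: a range check in an init routine can compare against field_max() for the same range that field_get() uses, so the field width is written down only once.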

* Re: [PATCH v2 05/11] powerpc: add hv_gpci interface header
From: Cody P Schafer @ 2014-02-25 20:35 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <20140225033328.1D5652C030B@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:09 UTC, Cody P Schafer wrote:
>> "H_GetPerformanceCounterInfo" (referred to as hv_gpci or just gpci from
>> here on) is an interface to retrieve specific performance counters and
>> other data from the hypervisor. All outputs have a fixed format (and
>> are represented as structs in this patch).
>
> I still see unused stuff in here, can you strip it back to just what we need.
> Same goes for the next patch.
>

Sure, I can remove the unused structures and enum entries (hadn't 
realized you wanted that in the last review).

* Re: [PATCH v2 09/11] powerpc/perf: add support for the hv 24x7 interface
From: Cody P Schafer @ 2014-02-25 20:55 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <20140225033329.BBB492C033B@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:13 UTC, Cody P Schafer wrote:
>> This provides a basic interface between hv_24x7 and perf. Similar to
>> the one provided for gpci, it lacks transaction support and does not
>> list any events.
>>
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/perf/hv-24x7.c | 491 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 491 insertions(+)
>>   create mode 100644 arch/powerpc/perf/hv-24x7.c
>>
>> diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
>> new file mode 100644
>> index 0000000..13de140
>> --- /dev/null
>> +++ b/arch/powerpc/perf/hv-24x7.c
> ...
>> +
>> +/*
>> + * read_offset_data - copy data from one buffer to another while treating the
>> + *                    source buffer as a small view on the total avaliable
>> + *                    source data.
>> + *
>> + * @dest: buffer to copy into
>> + * @dest_len: length of @dest in bytes
>> + * @requested_offset: the offset within the source data we want. Must be > 0
>> + * @src: buffer to copy data from
>> + * @src_len: length of @src in bytes
>> + * @source_offset: the offset in the sorce data that (src,src_len) refers to.
>> + *                 Must be > 0
>> + *
>> + * returns the number of bytes copied.
>> + *
>> + * '.' areas in d are written to.
>> + *
>> + *                       u
>> + *   x         w	 v  z
>> + * d           |.........|
>> + * s |----------------------|
>> + *
>> + *                      u
>> + *   x         w	z     v
>> + * d           |........------|
>> + * s |------------------|
>> + *
>> + *   x         w        u,z,v
>> + * d           |........|
>> + * s |------------------|
>> + *
>> + *   x,w                u,v,z
>> + * d |------------------|
>> + * s |------------------|
>> + *
>> + *   x        u
>> + *   w        v		z
>> + * d |........|
>> + * s |------------------|
>> + *
>> + *   x      z   w      v
>> + * d            |------|
>> + * s |------|
>> + *
>> + * x = source_offset
>> + * w = requested_offset
>> + * z = source_offset + src_len
>> + * v = requested_offset + dest_len
>> + *
>> + * w_offset_in_s = w - x = requested_offset - source_offset
>> + * z_offset_in_s = z - x = src_len
>> + * v_offset_in_s = v - x = request_offset + dest_len - src_len
>> + * u_offset_in_s = min(z_offset_in_s, v_offset_in_s)
>> + *
>> + * copy_len = u_offset_in_s - w_offset_in_s = min(z_offset_in_s, v_offset_in_s)
>> + *						- w_offset_in_s
>
> Comments are great, especially for complicated code like this. But at a glance
> I don't actually understand what this comment is trying to tell me.

The function was composed via some number line logic. The comment tries 
to explain what that logic is. The ASCII art shows the various overlapping 
buffers that we're copying between (the '+'s from the patch are messing 
with the indentation of some of the labels). The only major omission I'm 
seeing is that I failed to note that d=dest and s=src (though this could be 
inferred from the comment about '.' indicating a write).

Is there anything specific that doesn't make sense in the comment? (It 
may not be a comment that really can be read at a glance.)
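
The number line logic can also be written as a small standalone C sketch. This is not the code from the patch: copy_window() is a hypothetical name, and the offsets are unsigned here so the negative-offset check is not needed. Following the definitions in the comment (x = source_offset, v = requested_offset + dest_len), v - x works out to requested_offset + dest_len - source_offset, and the sketch uses that form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the part of the source window [src_off, src_off + src_len) that
 * overlaps the requested window [req_off, req_off + dest_len) into dest.
 * Returns the number of bytes copied, or 0 if nothing can be copied.
 */
static inline size_t copy_window(void *dest, size_t dest_len, size_t req_off,
				 const void *src, size_t src_len, size_t src_off)
{
	size_t w, z, copy_len;

	assert(dest != NULL && src != NULL);

	/* The request starts before the window: dest[0] has no source byte. */
	if (req_off < src_off)
		return 0;

	w = req_off - src_off;	/* w relative to x */
	z = src_len;		/* z relative to x */
	if (w >= z)
		return 0;	/* the request starts at or past the window's end */

	/* u - w = min(z, v) - w = min(z - w, dest_len), with v = w + dest_len */
	copy_len = (z - w < dest_len) ? z - w : dest_len;
	memcpy(dest, (const char *)src + w, copy_len);
	return copy_len;
}
```

Computing the length as min(z - w, dest_len) rather than min(z, v) - w gives the same result but avoids forming w + dest_len, which could wrap for very large lengths.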

>
>> + */
>> +static ssize_t read_offset_data(void *dest, size_t dest_len,
>> +				loff_t requested_offset, void *src,
>> +				size_t src_len, loff_t source_offset)
>> +{
>> +	size_t w_offset_in_s = requested_offset - source_offset;
>> +	size_t z_offset_in_s = src_len;
>> +	size_t v_offset_in_s = requested_offset + dest_len - src_len;
>> +	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
>> +	size_t copy_len = u_offset_in_s - w_offset_in_s;
>> +
>> +	if (requested_offset < 0 || source_offset < 0)
>> +		return -EINVAL;
>> +
>> +	if (z_offset_in_s <= w_offset_in_s)
>> +		return 0;
>> +
>> +	memcpy(dest, src + w_offset_in_s, copy_len);
>> +	return copy_len;
>> +}
>> +
>> +static unsigned long h_get_24x7_catalog_page(char page[static 4096],
>> +					     u32 version, u32 index)
>> +{
>> +	WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
>> +	return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
>> +			virt_to_phys(page),
>> +			version,
>> +			index);
>> +}
>> +
>> +static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
>> +			    struct bin_attribute *bin_attr, char *buf,
>> +			    loff_t offset, size_t count)
>> +{
>> +	unsigned long hret;
>> +	ssize_t ret = 0;
>> +	size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
>> +	loff_t page_offset = 0;
>> +	uint32_t catalog_version_num = 0;
>> +	void *page = kmalloc(4096, GFP_USER);
>> +	struct hv_24x7_catalog_page_0 *page_0 = page;
>> +	if (!page)
>> +		return -ENOMEM;
>> +
>> +
>> +	hret = h_get_24x7_catalog_page(page, 0, 0);
>> +	if (hret) {
>> +		ret = -EIO;
>> +		goto e_free;
>> +	}
>> +
>> +	catalog_version_num = be32_to_cpu(page_0->version);
>> +	catalog_page_len = be32_to_cpu(page_0->length);
>> +	catalog_len = catalog_page_len * 4096;
>> +
>> +	page_offset = offset / 4096;
>> +	page_count  = count  / 4096;
>> +
>> +	if (page_offset >= catalog_page_len)
>> +		goto e_free;
>> +
>> +	if (page_offset != 0) {
>> +		hret = h_get_24x7_catalog_page(page, catalog_version_num,
>> +					       page_offset);
>> +		if (hret) {
>> +			ret = -EIO;
>> +			goto e_free;
>> +		}
>> +	}
>> +
>> +	ret = read_offset_data(buf, count, offset,
>> +				page, 4096, page_offset * 4096);
>> +e_free:
>> +	if (hret)
>> +		pr_err("h_get_24x7_catalog_page(ver=%d, page=%lld) failed: rc=%ld\n",
>> +				catalog_version_num, page_offset, hret);
>> +	kfree(page);
>> +
>> +	pr_devel("catalog_read: offset=%lld(%lld) count=%zu(%zu) catalog_len=%zu(%zu) => %zd\n",
>> +			offset, page_offset, count, page_count, catalog_len,
>> +			catalog_page_len, ret);
>> +
>> +	return ret;
>> +}
>> +
>> +#define PAGE_0_ATTR(_name, _fmt, _expr)				\
>> +static ssize_t _name##_show(struct device *dev,			\
>> +			    struct device_attribute *dev_attr,	\
>> +			    char *buf)				\
>> +{								\
>> +	unsigned long hret;					\
>> +	ssize_t ret = 0;					\
>> +	void *page = kmalloc(4096, GFP_USER);			\
>> +	struct hv_24x7_catalog_page_0 *page_0 = page;		\
>> +	if (!page)						\
>> +		return -ENOMEM;					\
>> +	hret = h_get_24x7_catalog_page(page, 0, 0);		\
>> +	if (hret) {						\
>> +		ret = -EIO;					\
>> +		goto e_free;					\
>> +	}							\
>> +	ret = sprintf(buf, _fmt, _expr);			\
>> +e_free:								\
>> +	kfree(page);						\
>> +	return ret;						\
>> +}								\
>> +static DEVICE_ATTR_RO(_name)
>> +
>> +PAGE_0_ATTR(catalog_version, "%lld\n",
>> +		(unsigned long long)be32_to_cpu(page_0->version));
>> +PAGE_0_ATTR(catalog_len, "%lld\n",
>> +		(unsigned long long)be32_to_cpu(page_0->length) * 4096);
>> +static BIN_ATTR_RO(catalog, 0/* real length varies */);
>
> So we're dumping the catalog out as a binary blob.

Yep

> Why do we want to do that?

Right now it's the only way to know what events are available. 
Additionally, even when the kernel starts parsing events out (and 
exposing them via sysfs), there is some additional powerpc specific 
structuring ("groups" and "schemas") that some userspace applications may 
want to take advantage of.

> It clearly violates the sysfs rule-of-sorts of ASCII and one value per file.
> Obviously there can be exceptions, but what's our justification?

Actual justification is above, but additionally:
I actually was looking at the acpi code that provides (among other 
binary tables) the dsdt as a binary blob in sysfs when I was putting 
this code together. The 24x7 catalog is, in the same manner, a binary 
blob provided by firmware.

>> +static struct bin_attribute *if_bin_attrs[] = {
>> +	&bin_attr_catalog,
>> +	NULL,
>> +};
>> +
>> +static struct attribute *if_attrs[] = {
>> +	&dev_attr_catalog_len.attr,
>> +	&dev_attr_catalog_version.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group if_group = {
>> +	.name = "interface",
>> +	.bin_attrs = if_bin_attrs,
>> +	.attrs = if_attrs,
>> +};
>
> Both pmus have an "interface" directory, but they don't seem to have anything
> in common? It feels a little ad-hoc.

It is absolutely ad-hoc. The only similarity is that both groups named 
"interface" provide some additional details about the firmware interface 
they're using to provide the perf data. We could easily call them both 
"misc", "details", put all the attributes in the device root, or call 
them some other generic name. I ended up choosing "interface" because 
we're providing details on the firmware interface, and it feels just a 
bit less generic. Having device specific names for the attribute group 
("24x7" and "gpci", for example) doesn't get us anything because the 
devices themselves already have those names ("hv_24x7" and "hv_gpci"). I 
don't see any reason to make them different.

* Re: [PATCH v2 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
From: Cody P Schafer @ 2014-02-25 21:13 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC, Alexander Graf, Anton Blanchard,
	Benjamin Herrenschmidt, Paul Mackerras
  Cc: Ingo Molnar, LKML, Arnaldo Carvalho de Melo, Peter Zijlstra
In-Reply-To: <20140225033327.878F52C0256@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:08 UTC, Cody P Schafer wrote:
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/hvcall.h | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
>> index d8b600b..652f7e4 100644
>> --- a/arch/powerpc/include/asm/hvcall.h
>> +++ b/arch/powerpc/include/asm/hvcall.h
>> @@ -274,6 +274,11 @@
>>   /* Platform specific hcalls, used by KVM */
>>   #define H_RTAS			0xf000
>>
>> +/* "Platform specific hcalls", provided by PHYP */
>> +#define H_GET_24X7_CATALOG_PAGE 0xF078
>> +#define H_GET_24X7_DATA		0xF07C
>> +#define H_GET_PERF_COUNTER_INFO 0xF080
>
> Some tabs some spaces, use tabs.

Ack.

* Re: [PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities
From: Cody P Schafer @ 2014-02-25 21:20 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <20140225033328.C5A9E2C0324@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> [PATCH v2 07/11] powerpc: add a shared interface to get gpci version and capabilities
>
> All the patches that touch perf should be "powerpc/perf: foo"

Ok.

> On Fri, 2014-14-02 at 22:02:11 UTC, Cody P Schafer wrote:
>> ...
>
> I realise this is a fairly small patch but a changelog is still nice. You could
> for example mention that we don't currently use .ga, .expanded or .lab but
> we're adding the logic anyway because ...
>

Well, we do use them to expose some more information to the user (via 
sysfs attributes). Always nice to know what capabilities are enabled.

But sure, I can explain why each bit in that structure is a good idea.

>
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/perf/hv-common.c | 39 +++++++++++++++++++++++++++++++++++++++
>>   arch/powerpc/perf/hv-common.h | 17 +++++++++++++++++
>>   2 files changed, 56 insertions(+)
>>   create mode 100644 arch/powerpc/perf/hv-common.c
>>   create mode 100644 arch/powerpc/perf/hv-common.h
>>
>> diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
>> new file mode 100644
>> index 0000000..47e02b3
>> --- /dev/null
>> +++ b/arch/powerpc/perf/hv-common.c
>> @@ -0,0 +1,39 @@
>> +#include <asm/io.h>
>> +#include <asm/hvcall.h>
>> +
>> +#include "hv-gpci.h"
>> +#include "hv-common.h"
>> +
>> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
>> +{
>> +	unsigned long r;
>> +	struct p {
>> +		struct hv_get_perf_counter_info_params params;
>> +		struct cv_system_performance_capabilities caps;
>> +	} __packed __aligned(sizeof(uint64_t));
>> +
>> +	struct p arg = {
>> +		.params = {
>> +			.counter_request = cpu_to_be32(
>> +					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
>> +			.starting_index = cpu_to_be32(-1),
>> +			.counter_info_version_in = 0,
>> +		}
>> +	};
>> +
>> +	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
>> +			       virt_to_phys(&arg), sizeof(arg));
>> +
>> +	if (r)
>> +		return r;
>> +
>> +	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
>> +
>> +	caps->version = arg.params.counter_info_version_out;
>> +	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
>> +	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
>> +	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
>> +	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
>> +
>> +	return r;
>> +}
>> diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
>> new file mode 100644
>> index 0000000..7e615bd
>> --- /dev/null
>> +++ b/arch/powerpc/perf/hv-common.h
>> @@ -0,0 +1,17 @@
>> +#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
>> +#define LINUX_POWERPC_PERF_HV_COMMON_H_
>> +
>> +#include <linux/types.h>
>> +
>> +struct hv_perf_caps {
>> +	u16 version;
>> +	u16 collect_privileged:1,
>> +	    ga:1,
>> +	    expanded:1,
>> +	    lab:1,
>> +	    unused:12;
>> +};
>> +
>> +unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
>> +
>> +#endif
>> --
>> 1.8.5.4
>>
>>
>

* Re: [PATCH v2 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
From: Cody P Schafer @ 2014-02-25 21:25 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo
In-Reply-To: <20140225033329.400E22C0331@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:12 UTC, Cody P Schafer wrote:
>> This provides a basic link between perf and hv_gpci. Notably, it does
>> not yet support transactions and does not list any events (they can
>> still be manually composed).
>
> Can you explain how the HV_CAPS stuff ends up looking.
>
> I'm not against adding it, but I'd like to understand how we expect it to be
> used a bit better.

It's just a quick mechanism for me to expose some relevant information 
to userspace via sysfs using the hv_perf_caps_get() function's returned 
data. Documentation for this sysfs interface (and the rest) is in a 
later patch.
I don't expect any more uses to show up unless the firmware decides to 
add another capability bit (in which case I'll want to expose it as well).
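
As a rough picture of that mechanism, the following standalone C sketch decodes a capability mask into one-bit flags and formats one flag the way a sysfs show routine would. Everything named here is an assumption made for illustration: the EX_CAP_* bit positions, struct ex_caps, ex_caps_decode() and ex_cap_show() are hypothetical and do not come from the patch or the firmware interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit assignments, for illustration only. */
#define EX_CAP_GA	(UINT32_C(1) << 0)
#define EX_CAP_EXPANDED	(UINT32_C(1) << 1)
#define EX_CAP_LAB	(UINT32_C(1) << 2)

struct ex_caps {
	unsigned int ga:1, expanded:1, lab:1;
};

/* Collapse each mask bit to 0 or 1 so it fits a one-bit field. */
static inline struct ex_caps ex_caps_decode(uint32_t mask)
{
	struct ex_caps caps = {
		.ga       = !!(mask & EX_CAP_GA),
		.expanded = !!(mask & EX_CAP_EXPANDED),
		.lab      = !!(mask & EX_CAP_LAB),
	};
	return caps;
}

/* Format a single flag as ASCII, one value per file; returns the length. */
static inline int ex_cap_show(char *buf, size_t len, unsigned int value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%u\n", value);
}
```

If the firmware grew another capability bit, supporting it in a scheme like this would mean one more mask constant, one more bitfield member and one more attribute.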

>> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
>> new file mode 100644
>> index 0000000..1f5d96d
>> --- /dev/null
>> +++ b/arch/powerpc/perf/hv-gpci.c
>> +
>> +static struct pmu h_gpci_pmu = {
>> +	.task_ctx_nr = perf_invalid_context,
>> +
>> +	.name = "hv_gpci",
>> +	.attr_groups = attr_groups,
>> +	.event_init  = h_gpci_event_init,
>> +	.add         = h_gpci_event_add,
>> +	.del         = h_gpci_event_del,
> 		     = h_gpci_event_stop,
>
>> +	.start       = h_gpci_event_start,
>> +	.stop        = h_gpci_event_stop,
>> +	.read        = h_gpci_event_read,
> 		     = h_gpci_event_update
>
>> +	.event_idx = perf_swevent_event_idx,
>> +};

whoops, thought I had fixed those 2 already.

* Re: [PATCH v2 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
From: Cody P Schafer @ 2014-02-25 21:31 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC, Aneesh Kumar K.V, Anshuman Khandual,
	Anton Blanchard, Benjamin Herrenschmidt, Kumar Gala, Lijun Pan,
	Li Yang, Paul Bolle, Priyanka Jain, Scott Wood, Tang Yuantian
  Cc: Ingo Molnar, Paul Mackerras, LKML, Arnaldo Carvalho de Melo,
	Peter Zijlstra
In-Reply-To: <20140225033330.54F332C02FB@ozlabs.org>

On 02/24/2014 07:33 PM, Michael Ellerman wrote:
> On Fri, 2014-14-02 at 22:02:14 UTC, Cody P Schafer wrote:
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/perf/Makefile             | 2 ++
>>   arch/powerpc/platforms/Kconfig.cputype | 6 ++++++
>>   2 files changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
>> index 60d71ee..f9c083a 100644
>> --- a/arch/powerpc/perf/Makefile
>> +++ b/arch/powerpc/perf/Makefile
>> @@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
>>   obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
>>   obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
>>
>> +obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
>> +
>>   obj-$(CONFIG_PPC64)		+= $(obj64-y)
>>   obj-$(CONFIG_PPC32)		+= $(obj32-y)
>> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
>> index 434fda3..dcc67cd 100644
>> --- a/arch/powerpc/platforms/Kconfig.cputype
>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>> @@ -364,6 +364,12 @@ config PPC_PERF_CTRS
>>          help
>>            This enables the powerpc-specific perf_event back-end.
>>
>> +config HV_PERF_CTRS
>> +       def_bool y
>
> This was bool, why did you change it?

No, it wasn't. v1 also had def_bool. https://lkml.org/lkml/2014/1/16/518
Maybe you're confusing v2.1 and v2 of this patch?

>
>> +       depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
>
> Should be:
>
> 	depends on PERF_EVENTS && PPC_PSERIES
>
>> +       help
>> +         Enable access to perf counters provided by the hypervisor
>> +

Yep, the v2.1 patch (which I bungled and labeled as 9/11) already 
changes both of these.
It'll end up rolled into v3.

* Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
From: Cody P Schafer @ 2014-02-25 21:38 UTC (permalink / raw)
  To: Peter Zijlstra, Michael Ellerman
  Cc: Paul Mackerras, Ingo Molnar, Linux PPC, LKML,
	Arnaldo Carvalho de Melo
In-Reply-To: <20140225102008.GI9987@twins.programming.kicks-ass.net>

On 02/25/2014 02:20 AM, Peter Zijlstra wrote:
> On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
>> On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
>>> Export the swevent hrtimer helpers currently only used in events/core.c
>>> to allow the addition of architecture specific sw-like pmus.
>>
>> Peter, Ingo, can we get your ACK on this please?
>
> How are they used? I saw some usage in patch 9 or so; but it's not
> explained anywhere. All patches have non-existent Changelogs and the few
> comments that are there are pretty hardware specific.
>
> So please do tell; what do you need this for?

From this patch's changelog:

> Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture specific sw-like pmus.

The key part here is "architecture specific sw-like pmus", where the 
announcement explains why these pmus are sw-like:

> The counters supplied by these interfaces are continually counting and never
> need to be (and cannot be) disabled or enabled. They additionally do not
> generate any interrupts. This makes them in some regards similar to software
> counters, and as a result their implementation shares some common code (which
> an initial patch exposes) with the sw counters.

Essentially, these pmus just provide access to a big array of counters 
which don't generate interrupts, and are all 64-bit (and assumed to never 
overflow). Rather than duplicate the code that we already have for 
managing timing when reading from counters that don't have interrupts 
(the functions that are exposed by this patch), I've reused it.

* [PATCH] rapidio: rework device hierarchy and introduce mport class of devices
From: Alexandre Bounine @ 2014-02-25 21:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arno Tiemersma, linux-kernel, Andre van Herk, Jerry Jacobs,
	Alexandre Bounine, Rob Landley, Stef van Os, linuxppc-dev

This patch removes an artificial RapidIO bus root device and establishes the
actual device hierarchy by providing references to real parent devices.
It also introduces a device class for RapidIO controller devices (on-chip or
an external bridge, known as "mport").

The existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller. With the introduction of devices using multiple
RapidIO controllers and PCIe-to-RapidIO bridges, the old scheme is very limiting
or does not work at all. The implemented changes make it possible to properly
reference the platform's local RapidIO mport devices and provide the device
details needed by upper layers.

This change to the RapidIO device hierarchy does not break any known existing
kernel or user space interfaces.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
Cc: Rob Landley <rob@landley.net>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
---
 Documentation/rapidio/sysfs.txt  |   66 +++++++++++++++++++++++++++++++++----
 arch/powerpc/sysdev/fsl_rio.c    |    1 +
 drivers/net/rionet.c             |    1 +
 drivers/rapidio/devices/tsi721.c |    1 +
 drivers/rapidio/rio-driver.c     |   22 ++++++++----
 drivers/rapidio/rio-scan.c       |    1 +
 drivers/rapidio/rio-sysfs.c      |   40 +++++++++++++++++++++++
 drivers/rapidio/rio.c            |   11 ++++++
 drivers/rapidio/rio.h            |    1 +
 include/linux/rio.h              |    5 ++-
 10 files changed, 133 insertions(+), 16 deletions(-)

diff --git a/Documentation/rapidio/sysfs.txt b/Documentation/rapidio/sysfs.txt
index 271438c..47ce9a5 100644
--- a/Documentation/rapidio/sysfs.txt
+++ b/Documentation/rapidio/sysfs.txt
@@ -2,8 +2,8 @@
 
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-1. Device Subdirectories
-------------------------
+1. RapidIO Device Subdirectories
+--------------------------------
 
 For each RapidIO device, the RapidIO subsystem creates files in an individual
 subdirectory with the following name, /sys/bus/rapidio/devices/<device_name>.
@@ -25,8 +25,8 @@ seen by the enumerating host (destID = 1):
 NOTE: An enumerating or discovering endpoint does not create a sysfs entry for
 itself, this is why an endpoint with destID=1 is not shown in the list.
 
-2. Attributes Common for All Devices
-------------------------------------
+2. Attributes Common for All RapidIO Devices
+--------------------------------------------
 
 Each device subdirectory contains the following informational read-only files:
 
@@ -52,16 +52,16 @@ This attribute is similar in behavior to the "config" attribute of PCI devices
 and provides an access to the RapidIO device registers using standard file read
 and write operations.
 
-3. Endpoint Device Attributes
------------------------------
+3. RapidIO Endpoint Device Attributes
+-------------------------------------
 
 Currently Linux RapidIO subsystem does not create any endpoint specific sysfs
 attributes. It is possible that RapidIO master port drivers and endpoint device
 drivers will add their device-specific sysfs attributes but such attributes are
 outside the scope of this document.
 
-4. Switch Device Attributes
----------------------------
+4. RapidIO Switch Device Attributes
+-----------------------------------
 
 RapidIO switches have additional attributes in sysfs. RapidIO subsystem supports
 common and device-specific sysfs attributes for switches. Because switches are
@@ -106,3 +106,53 @@ attribute:
 	 for that controller always will be 0.
 	 To initiate RapidIO enumeration/discovery on all available mports
 	 a user must write '-1' (or RIO_MPORT_ANY) into this attribute file.
+
+
+6. RapidIO Bus Controllers/Ports
+--------------------------------
+
+On-chip RapidIO controllers and PCIe-to-RapidIO bridges (referenced as
+"Master Port" or "mport") are presented in sysfs as the special class of
+devices: "rapidio_port".
+
+The /sys/class/rapidio_port subdirectory contains individual subdirectories
+named as "rapidioN" where N = mport ID registered with RapidIO subsystem.
+
+NOTE: An mport ID is not a RapidIO destination ID assigned to a given local
+mport device.
+
+Each mport device subdirectory in addition to standard entries contains the
+following device-specific attributes:
+
+   port_destid - reports RapidIO destination ID assigned to the given RapidIO
+                 mport device. If value 0xFFFFFFFF is returned this means that
+                 no valid destination ID have been assigned to the mport (yet).
+                 Normally, before enumeration/discovery have been executed only
+                 fabric enumerating mports have a valid destination ID assigned
+                 to them using "hdid=..." rapidio module parameter.
+      sys_size - reports RapidIO common transport system size:
+                   0 = small (8-bit destination ID, max. 256 devices),
+                   1 = large (16-bit destination ID, max. 65536 devices).
+
+After enumeration or discovery was performed for a given mport device,
+the corresponding subdirectory will also contain subdirectories for each
+child RapidIO device connected to the mport. Naming conventions for RapidIO
+devices are described in Section 1 above.
+
+The example below shows mport device subdirectory with several child RapidIO
+devices attached to it.
+
+[rio@rapidio ~]$ ls /sys/class/rapidio_port/rapidio0/ -l
+total 0
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:e:0001
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:e:0004
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:e:0007
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:s:0002
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:s:0003
+drwxr-xr-x 3 root root    0 Feb 11 15:10 00:s:0005
+lrwxrwxrwx 1 root root    0 Feb 11 15:11 device -> ../../../0000:01:00.0
+-r--r--r-- 1 root root 4096 Feb 11 15:11 port_destid
+drwxr-xr-x 2 root root    0 Feb 11 15:11 power
+lrwxrwxrwx 1 root root    0 Feb 11 15:04 subsystem -> ../../../../../../class/rapidio_port
+-r--r--r-- 1 root root 4096 Feb 11 15:11 sys_size
+-rw-r--r-- 1 root root 4096 Feb 11 15:04 uevent
diff --git a/arch/powerpc/sysdev/fsl_rio.c b/arch/powerpc/sysdev/fsl_rio.c
index 95dd892..cf2b084 100644
--- a/arch/powerpc/sysdev/fsl_rio.c
+++ b/arch/powerpc/sysdev/fsl_rio.c
@@ -531,6 +531,7 @@ int fsl_rio_setup(struct platform_device *dev)
 		sprintf(port->name, "RIO mport %d", i);
 
 		priv->dev = &dev->dev;
+		port->dev.parent = &dev->dev;
 		port->ops = ops;
 		port->priv = priv;
 		port->phys_efptr = 0x100;
diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c
index 6d1f6ed..a849718 100644
--- a/drivers/net/rionet.c
+++ b/drivers/net/rionet.c
@@ -493,6 +493,7 @@ static int rionet_setup_netdev(struct rio_mport *mport, struct net_device *ndev)
 	ndev->netdev_ops = &rionet_netdev_ops;
 	ndev->mtu = RIO_MAX_MSG_SIZE - 14;
 	ndev->features = NETIF_F_LLTX;
+	SET_NETDEV_DEV(ndev, &mport->dev);
 	SET_ETHTOOL_OPS(ndev, &rionet_ethtool_ops);
 
 	spin_lock_init(&rnet->lock);
diff --git a/drivers/rapidio/devices/tsi721.c b/drivers/rapidio/devices/tsi721.c
index ff7cbf2..1753dc6 100644
--- a/drivers/rapidio/devices/tsi721.c
+++ b/drivers/rapidio/devices/tsi721.c
@@ -2256,6 +2256,7 @@ static int tsi721_setup_mport(struct tsi721_device *priv)
 	mport->phy_type = RIO_PHY_SERIAL;
 	mport->priv = (void *)priv;
 	mport->phys_efptr = 0x100;
+	mport->dev.parent = &pdev->dev;
 	priv->mport = mport;
 
 	INIT_LIST_HEAD(&mport->dbells);
diff --git a/drivers/rapidio/rio-driver.c b/drivers/rapidio/rio-driver.c
index c9ae692..f301f05 100644
--- a/drivers/rapidio/rio-driver.c
+++ b/drivers/rapidio/rio-driver.c
@@ -167,7 +167,6 @@ void rio_unregister_driver(struct rio_driver *rdrv)
 void rio_attach_device(struct rio_dev *rdev)
 {
 	rdev->dev.bus = &rio_bus_type;
-	rdev->dev.parent = &rio_bus;
 }
 EXPORT_SYMBOL_GPL(rio_attach_device);
 
@@ -216,9 +215,12 @@ static int rio_uevent(struct device *dev, struct kobj_uevent_env *env)
 	return 0;
 }
 
-struct device rio_bus = {
-	.init_name = "rapidio",
+struct class rio_mport_class = {
+	.name		= "rapidio_port",
+	.owner		= THIS_MODULE,
+	.dev_groups	= rio_mport_groups,
 };
+EXPORT_SYMBOL_GPL(rio_mport_class);
 
 struct bus_type rio_bus_type = {
 	.name = "rapidio",
@@ -233,14 +235,20 @@ struct bus_type rio_bus_type = {
 /**
  *  rio_bus_init - Register the RapidIO bus with the device model
  *
- *  Registers the RIO bus device and RIO bus type with the Linux
+ *  Registers the RIO mport device class and RIO bus type with the Linux
  *  device model.
  */
 static int __init rio_bus_init(void)
 {
-	if (device_register(&rio_bus) < 0)
-		printk("RIO: failed to register RIO bus device\n");
-	return bus_register(&rio_bus_type);
+	int ret;
+
+	ret = class_register(&rio_mport_class);
+	if (!ret) {
+		ret = bus_register(&rio_bus_type);
+		if (ret)
+			class_unregister(&rio_mport_class);
+	}
+	return ret;
 }
 
 postcore_initcall(rio_bus_init);
diff --git a/drivers/rapidio/rio-scan.c b/drivers/rapidio/rio-scan.c
index d3a6539..47a1b2e 100644
--- a/drivers/rapidio/rio-scan.c
+++ b/drivers/rapidio/rio-scan.c
@@ -461,6 +461,7 @@ static struct rio_dev *rio_setup_device(struct rio_net *net,
 			     rdev->comp_tag & RIO_CTAG_UDEVID);
 	}
 
+	rdev->dev.parent = &port->dev;
 	rio_attach_device(rdev);
 
 	device_initialize(&rdev->dev);
diff --git a/drivers/rapidio/rio-sysfs.c b/drivers/rapidio/rio-sysfs.c
index e0221c6..cdb005c 100644
--- a/drivers/rapidio/rio-sysfs.c
+++ b/drivers/rapidio/rio-sysfs.c
@@ -341,3 +341,43 @@ const struct attribute_group *rio_bus_groups[] = {
 	&rio_bus_group,
 	NULL,
 };
+
+static ssize_t
+port_destid_show(struct device *dev, struct device_attribute *attr,
+		 char *buf)
+{
+	struct rio_mport *mport = to_rio_mport(dev);
+
+	if (mport)
+		return sprintf(buf, "0x%04x\n", mport->host_deviceid);
+	else
+		return -ENODEV;
+}
+static DEVICE_ATTR_RO(port_destid);
+
+static ssize_t sys_size_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct rio_mport *mport = to_rio_mport(dev);
+
+	if (mport)
+		return sprintf(buf, "%u\n", mport->sys_size);
+	else
+		return -ENODEV;
+}
+static DEVICE_ATTR_RO(sys_size);
+
+static struct attribute *rio_mport_attrs[] = {
+	&dev_attr_port_destid.attr,
+	&dev_attr_sys_size.attr,
+	NULL,
+};
+
+static const struct attribute_group rio_mport_group = {
+	.attrs = rio_mport_attrs,
+};
+
+const struct attribute_group *rio_mport_groups[] = {
+	&rio_mport_group,
+	NULL,
+};
diff --git a/drivers/rapidio/rio.c b/drivers/rapidio/rio.c
index 2e8a20c..a54ba04 100644
--- a/drivers/rapidio/rio.c
+++ b/drivers/rapidio/rio.c
@@ -1884,6 +1884,7 @@ static int rio_get_hdid(int index)
 int rio_register_mport(struct rio_mport *port)
 {
 	struct rio_scan_node *scan = NULL;
+	int res = 0;
 
 	if (next_portid >= RIO_MAX_MPORTS) {
 		pr_err("RIO: reached specified max number of mports\n");
@@ -1894,6 +1895,16 @@ int rio_register_mport(struct rio_mport *port)
 	port->host_deviceid = rio_get_hdid(port->id);
 	port->nscan = NULL;
 
+	dev_set_name(&port->dev, "rapidio%d", port->id);
+	port->dev.class = &rio_mport_class;
+
+	res = device_register(&port->dev);
+	if (res)
+		dev_err(&port->dev, "RIO: mport%d registration failed ERR=%d\n",
+			port->id, res);
+	else
+		dev_dbg(&port->dev, "RIO: mport%d registered\n", port->id);
+
 	mutex_lock(&rio_mport_list_lock);
 	list_add_tail(&port->node, &rio_mports);
 
diff --git a/drivers/rapidio/rio.h b/drivers/rapidio/rio.h
index 5f99d22..2d0550e 100644
--- a/drivers/rapidio/rio.h
+++ b/drivers/rapidio/rio.h
@@ -50,6 +50,7 @@ extern int rio_mport_scan(int mport_id);
 /* Structures internal to the RIO core code */
 extern const struct attribute_group *rio_dev_groups[];
 extern const struct attribute_group *rio_bus_groups[];
+extern const struct attribute_group *rio_mport_groups[];
 
 #define RIO_GET_DID(size, x)	(size ? (x & 0xffff) : ((x & 0x00ff0000) >> 16))
 #define RIO_SET_DID(size, x)	(size ? (x & 0xffff) : ((x & 0x000000ff) << 16))
diff --git a/include/linux/rio.h b/include/linux/rio.h
index b71d573..6bda06f 100644
--- a/include/linux/rio.h
+++ b/include/linux/rio.h
@@ -83,7 +83,7 @@
 #define RIO_CTAG_UDEVID	0x0001ffff /* Unique device identifier */
 
 extern struct bus_type rio_bus_type;
-extern struct device rio_bus;
+extern struct class rio_mport_class;
 
 struct rio_mport;
 struct rio_dev;
@@ -201,6 +201,7 @@ struct rio_dev {
 #define rio_dev_f(n) list_entry(n, struct rio_dev, net_list)
 #define	to_rio_dev(n) container_of(n, struct rio_dev, dev)
 #define sw_to_rio_dev(n) container_of(n, struct rio_dev, rswitch[0])
+#define	to_rio_mport(n) container_of(n, struct rio_mport, dev)
 
 /**
  * struct rio_msg - RIO message event
@@ -248,6 +249,7 @@ enum rio_phy_type {
  * @phy_type: RapidIO phy type
  * @phys_efptr: RIO port extended features pointer
  * @name: Port name string
+ * @dev: device structure associated with an mport
  * @priv: Master port private data
  * @dma: DMA device associated with mport
  * @nscan: RapidIO network enumeration/discovery operations
@@ -272,6 +274,7 @@ struct rio_mport {
 	enum rio_phy_type phy_type;	/* RapidIO phy type */
 	u32 phys_efptr;
 	unsigned char name[RIO_MAX_MPORT_NAME];
+	struct device dev;
 	void *priv;		/* Master port private data */
 #ifdef CONFIG_RAPIDIO_DMA_ENGINE
 	struct dma_device	dma;
-- 
1.7.8.4
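
For readers who want to consume the new attribute from user space, here is a
minimal sketch (not part of the patch) of parsing port_destid. It assumes the
attribute text has the "0x%04x\n" form produced by port_destid_show() above and
that 0xFFFFFFFF means no destination ID has been assigned, as the documentation
hunk states. The helper names and the RIO_DESTID_UNASSIGNED macro are
hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical name for the "no destination ID assigned" value. */
#define RIO_DESTID_UNASSIGNED 0xFFFFFFFFUL

/*
 * Parse the text read from port_destid.
 * Returns 1 and stores the ID when a valid destination ID is reported,
 * 0 when the mport reports 0xFFFFFFFF (nothing assigned yet),
 * and -1 when the text is not a hexadecimal number.
 */
int parse_port_destid(const char *text, unsigned long *destid)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(text, &end, 16);	/* base 16 accepts the "0x" prefix */
	if (errno || end == text)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;
	if (val == RIO_DESTID_UNASSIGNED)
		return 0;

	*destid = val;
	return 1;
}

/* Read one line from the given sysfs path and parse it as above. */
int read_port_destid(const char *path, unsigned long *destid)
{
	char buf[32];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_port_destid(buf, destid);
}
```

A caller would pass a path such as
/sys/class/rapidio_port/rapidio0/port_destid to read_port_destid() and treat a
return value of 0 as "enumeration/discovery has not assigned an ID yet".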


* Re: [rtc-linux] [PATCH] rtc/ds3232: Enable ds3232 to work as wakeup source
From: Andrew Morton @ 2014-02-25 22:07 UTC (permalink / raw)
  To: rtc-linux; +Cc: a.zummo, linuxppc-dev, Dongsheng Wang, chenhui.zhao
In-Reply-To: <1390281891-9632-1-git-send-email-dongsheng.wang@freescale.com>

On Tue, 21 Jan 2014 13:24:51 +0800 Dongsheng Wang <dongsheng.wang@freescale.com> wrote:

> From: Wang Dongsheng <dongsheng.wang@freescale.com>
> 
> Add suspend/resume and device_init_wakeup to enable ds3232 as
> wakeup source, /sys/class/rtc/rtcX/wakealarm for set wakeup alarm.
> 
> ...
> 
> @@ -411,23 +424,21 @@ static int ds3232_probe(struct i2c_client *client,
>  	if (ret)
>  		return ret;
>  
> -	ds3232->rtc = devm_rtc_device_register(&client->dev, client->name,
> -					  &ds3232_rtc_ops, THIS_MODULE);
> -	if (IS_ERR(ds3232->rtc)) {
> -		dev_err(&client->dev, "unable to register the class device\n");
> -		return PTR_ERR(ds3232->rtc);
> -	}
> -
> -	if (client->irq >= 0) {
> +	if (client->irq != NO_IRQ) {

x86_64 allmodconfig:

drivers/rtc/rtc-ds3232.c: In function 'ds3232_probe':
drivers/rtc/rtc-ds3232.c:427: error: 'NO_IRQ' undeclared (first use in this function)
drivers/rtc/rtc-ds3232.c:427: error: (Each undeclared identifier is reported only once
drivers/rtc/rtc-ds3232.c:427: error: for each function it appears in.)

Not all architectures implement NO_IRQ.

I think this should be 

	if (client->irq > 0) {

but I'm not sure - iirc, x86 (at least) treats zero as "not an IRQ". 
But I think some architectures permit IRQ 0.  There was discussion many
years ago but I don't think anything got resolved.


Help!  I think some ppc people will know what to do here?


* Re: [PATCH v2 01/11] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
From: Cody P Schafer @ 2014-02-25 22:19 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC, Arnaldo Carvalho de Melo,
	Ingo Molnar, Paul Mackerras, Peter Zijlstra
  Cc: LKML
In-Reply-To: <530CFE30.70803@linux.vnet.ibm.com>

On 02/25/2014 12:33 PM, Cody P Schafer wrote:
> On 02/24/2014 07:33 PM, Michael Ellerman wrote:
>> On Fri, 2014-14-02 at 22:02:05 UTC, Cody P Schafer wrote:
>>> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
>>> generate functions to extract the relevant bits from
>>> event->attr.config{,1,2} for use by sw-like pmus where the
>>> 'config{,1,2}' values don't map directly to hardware registers.
>>>
>>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>>> ---
>>>   include/linux/perf_event.h | 17 +++++++++++++++++
>>>   1 file changed, 17 insertions(+)
>>>
>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>> index e56b07f..2702e91 100644
>>> --- a/include/linux/perf_event.h
>>> +++ b/include/linux/perf_event.h
>>> @@ -871,4 +871,21 @@ _name##_show(struct device
>>> *dev,                    \
>>>                                       \
>>>   static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
>>>
>>> +#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)        \
>>> +PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);        \
>>> +PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
>>> +
>>> +#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)        \
>>> +static u64 event_get_##name##_max(void)                    \
>>> +{                                    \
>>> +    int bits = (bit_end) - (bit_start) + 1;                \
>>> +    return ((0x1ULL << (bits - 1ULL)) - 1ULL) |            \
>>> +        (0xFULL << (bits - 4ULL));                \
>>> +}                                    \
>>> +static u64 event_get_##name(struct perf_event *event)            \
>>> +{                                    \
>>> +    return (event->attr.attr_var >> (bit_start)) &            \
>>> +        event_get_##name##_max();                \
>>> +}
>>
>> I still don't like the names.
>>
>> EVENT_GETTER_AND_FORMAT()
>
> EVENT_RANGE()
>
> I'd prefer to describe the intended usage rather than what is generated
> both in case we change some of the specifics later, and to provide
> additional information to the developers beyond what a simple code
> reading gives.
>
>> EVENT_RESERVED()
>
> Sure. The PMU_* naming was just based on the PMU_FORMAT_ATTR() naming,
> so I kept it for continuity with the existing API. Maybe
> EVENT_RANGE_RESERVED() would be more appropriate?
>

Thinking about this a bit more, EVENT_RANGE() and EVENT_RANGE_RESERVED()
aren't quite ideal either. The "EVENT" name collides with the files we
put in the event/ dir, whereas these macros generate files for the format/
dir. Maybe:

FORMAT_RANGE() and FORMAT_RANGE_RESERVED()
or
PMU_FORMAT_RANGE(), PMU_FORMAT_RANGE_RESERVED()
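
As an illustration of what the quoted macros compute, whatever they end up
being called, the sketch below extracts an inclusive bit range from a 64-bit
config value. It is not the code from the patch: the function names are
hypothetical, and the mask is built with an explicit full-width check rather
than the two-part expression quoted above, which appears to rely on ranges
being at least four bits wide.

```c
#include <stdint.h>

/* Largest value that fits in the inclusive bit range [bit_start, bit_end]. */
uint64_t format_range_max(unsigned int bit_start, unsigned int bit_end)
{
	unsigned int bits = bit_end - bit_start + 1;

	/* Shifting a 64-bit value by 64 is undefined, so handle it apart. */
	if (bits >= 64)
		return UINT64_MAX;
	return (UINT64_C(1) << bits) - 1;
}

/* Extract the field stored in bits [bit_start, bit_end] of config. */
uint64_t format_range_get(uint64_t config, unsigned int bit_start,
			  unsigned int bit_end)
{
	return (config >> bit_start) & format_range_max(bit_start, bit_end);
}
```

For example, format_range_get(0xABCD, 4, 11) yields 0xBC, the eight bits
starting at bit 4.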


* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Benjamin Herrenschmidt @ 2014-02-25 22:40 UTC (permalink / raw)
  To: Deepthi Dharwar
  Cc: Madhavan Srinivasan, linuxppc-dev, Wang Dongsheng, linux-kernel,
	Paul Gortmaker, Paul Mackerras, Olof Johansson, Cody P Schafer
In-Reply-To: <530C4D57.1030905@linux.vnet.ibm.com>

On Tue, 2014-02-25 at 13:29 +0530, Deepthi Dharwar wrote:
> We currently do not use smt-snooze-delay in the kernel.
> The sysfs entries need to be retained until we clean up the ppc64_cpu
> util that uses these entries to determine SMT; a cleanup patch for
> this has already been posted by Prerna.
> Once we have the ppc64_cpu changes in, we can look to clean up these
> parts from the kernel.

We generally shouldn't change user visible interfaces.

People still have old versions of ppc64_cpu; we must not break them.

Cheers,
Ben.


* Re: [PATCH] powerpc: warn users of smt-snooze-delay that the API isn't there anymore
From: Cody P Schafer @ 2014-02-25 22:47 UTC (permalink / raw)
  To: Madhavan Srinivasan, Benjamin Herrenschmidt, Olof Johansson,
	Paul Gortmaker, Wang Dongsheng
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel
In-Reply-To: <530C21E6.5020106@linux.vnet.ibm.com>

On 02/24/2014 08:53 PM, Madhavan Srinivasan wrote:
> On Saturday 22 February 2014 05:44 AM, Cody P Schafer wrote:
>> /sys/devices/system/cpu/cpu*/smt-snooze-delay was converted into a NOP
>> in commit 3fa8cad82b94d0bed002571bd246f2299ffc876b, and now does
>> nothing. Add a pr_warn() to convince any users that they should stop
>> using it.
>>
>> The commit message from the removing commit notes that this
>> functionality should move into the cpuidle driver, essentially by
>
> Would prefer to cleanup the code since the functionality is moved,
> instead of adding to it.

We'd still want users of the interface to use an attribute wired up 
under the cpuidle/ dir, so a warning (to update their software) is still 
needed. As Deepthi has noted, cpuidle right now doesn't support changing 
this on a per-cpu basis, so a "cleanup" isn't a simple matter.


* Re: [PATCH] powerpc/powernv: Read opal error log and export it through sysfs interface.
From: Stewart Smith @ 2014-02-25 23:19 UTC (permalink / raw)
  To: Mahesh Jagannath Salgaonkar, linuxppc-dev
In-Reply-To: <530C23D3.6020203@linux.vnet.ibm.com>

Mahesh Jagannath Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
>>  I think we could provide a better interface with instead having a file
>>  per log message appear in sysfs. We're never going to have more than 128
>>  of these at any one time on the Linux side, so it's not going to be too
>>  many files.
>
> It is not just about 128 files, we may be adding/removing sysfs node for
> every new log id that gets informed to kernel and ack-ed. In worst case,
> when we have flood of elog errors with user daemon consuming it and
> ack-ing back to get ready for next log in a tight poll, we may
> continuously add/remove the sysfs node for each new <id>.

Do we ever get a storm of hundreds/thousands of them though? If many
come in at once userspace may just be woken up one or two times, as it
would just select() and wait for events.

>>  I've seen some conflicting things on this - is it 2kb or 16kb?
>
> We choose 16kb because we want to pull all the log data and not
> partial.

So the max log size for any one entry is in fact 16kb?

>>  This means we constantly use 128 * sizeof(struct opal_err_log) which
>>  equates to somewhere north of 2MB of memory (due to list overhead).
>> 
>>  I don't think we need to statically allocate this, we can probably just
>>  allocate on-demand as in a typical system you're probably quite
>>  unlikely to have too many of these sitting around (besides, if for
>>  whatever reason we cannot allocate memory at some point, that's okay
>>  because we can read it again later).
>
> The reason we choose to go for static allocation is, we can not afford
> to drop or delay a critical error log due to memory allocation failure.
> OR we can keep static allocations for critical errors and follow dynamic
> allocation for informative error logs.  What do you say?

Userspace is probably going to have to do IO to get the log and ack it,
so it's probably not a huge problem - if we can't allocate a few kb in a
couple of attempts then we likely have bigger problems.

If we were going to have a sustained amount of hundreds/thousands of
these per second then perhaps we'd have other issues, but from what I
understand we're probably only going to have a handful per year on a
typical system? (I am, of course, not talking about our dev systems,
which are rather atypical :)

I'll likely have a patch today that shows kind of what I mean.
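
To make the memory figure above concrete, here is a rough sketch of the
arithmetic. The layout of struct opal_err_log is not shown in this thread, so
the structure below is a hypothetical stand-in (list linkage, an ID, and a
16kb buffer), and the macro names are made up for the example.

```c
#include <stddef.h>

#define EXAMPLE_MAX_LOGS	128		/* entry count from the thread */
#define EXAMPLE_LOG_BYTES	(16 * 1024)	/* "16kb" per log, per the thread */

/* Hypothetical stand-in for a doubly linked list head. */
struct example_list_links {
	struct example_list_links *next, *prev;
};

/* Hypothetical stand-in for struct opal_err_log. */
struct example_err_log {
	struct example_list_links link;
	unsigned int id;
	unsigned char data[EXAMPLE_LOG_BYTES];
};

/* Bytes needed for the log payloads alone. */
size_t example_payload_bytes(void)
{
	return (size_t)EXAMPLE_MAX_LOGS * EXAMPLE_LOG_BYTES;
}

/* Bytes used when every entry is allocated statically. */
size_t example_static_pool_bytes(void)
{
	return (size_t)EXAMPLE_MAX_LOGS * sizeof(struct example_err_log);
}
```

The payloads alone come to 128 * 16384 = 2097152 bytes (2MB), and the per-entry
list linkage pushes the static pool slightly above that, which matches the
"north of 2MB" estimate.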


* [PATCH 0/7] cpuidle/powernv: Enable Fast-Sleep on PowerNV
From: Preeti U Murthy @ 2014-02-26  0:07 UTC (permalink / raw)
  To: linux-pm, geoff, fweisbec, daniel.lezcano, srivatsa.bhat, benh,
	tglx, svaidy, linuxppc-dev, mingo
  Cc: paulmck, rafael.j.wysocki

This series is based on tip/timers/core on top of commit
849401b66d305: tick: Fixup more fallout from hrtimer broadcast mode.

Fast sleep is one of the deep idle states on Power8 in which local timers of
CPUs stop. On PowerPC we do not have an external clock device which can
handle wakeup of such CPUs. Now that support in the tick broadcast framework
for archs that do not sport such a device is about to go upstream, add fast
sleep as one of the idle states on PowerNV along with related arch-specific
support.

The earlier versions of this patchset included support in the tick broadcast
framework for such idle states. Now that the support in the broadcast
framework has been pulled into tip separately, this series is posted
independently and as a new patchset altogether. This series depends in
particular on the following commits in tip/timers/core:

1. da7e6f45c3: time: Change the return type of clockevents_notify() to integer
2. ba8f20c2eb: cpuidle: Handle clockevents_notify(BROADCAST_ENTER) failure
3. 5d1638acb9f62fa: tick: Introduce hrtimer based broadcast
4. f1689bb7abec8e2e6: time: Fixup fallout from recent clockevent/tick changes
5. 849401b66d305f3feb75: tick: Fixup more fallout from hrtimer broadcast mode

---

Preeti U Murthy (3):
      cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
      cpuidle/powernv: Add "Fast-Sleep" CPU idle state
      cpuidle/powernv: Parse device tree to setup idle states

Srivatsa S. Bhat (2):
      powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
      powerpc: Implement tick broadcast IPI as a fixed IPI message

Vaidyanathan Srinivasan (2):
      powernv/cpuidle: Add context management for Fast Sleep
      powermgt: Add OPAL call to resync timebase on wakeup


 arch/powerpc/Kconfig                           |    2 
 arch/powerpc/include/asm/opal.h                |    2 
 arch/powerpc/include/asm/processor.h           |    1 
 arch/powerpc/include/asm/smp.h                 |    2 
 arch/powerpc/include/asm/time.h                |    1 
 arch/powerpc/kernel/exceptions-64s.S           |   10 ++
 arch/powerpc/kernel/idle_power7.S              |   90 +++++++++++++++++----
 arch/powerpc/kernel/smp.c                      |   25 ++++--
 arch/powerpc/kernel/time.c                     |   90 +++++++++++++--------
 arch/powerpc/platforms/cell/interrupt.c        |    2 
 arch/powerpc/platforms/powernv/opal-wrappers.S |    1 
 arch/powerpc/platforms/ps3/smp.c               |    2 
 drivers/cpuidle/cpuidle-powernv.c              |  102 ++++++++++++++++++++++--
 13 files changed, 253 insertions(+), 77 deletions(-)

-- 
Signature
