LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 12/12] ppc64/kexec_file: fix kexec load failure with lack of memory hole
From: Hari Bathini @ 2020-07-20 12:55 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton
  Cc: Pingfan Liu, Kexec-ml, Nayna Jain, Petr Tesarik,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Vivek Goyal, Dave Young, Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <159524918900.20855.17709718993097359220.stgit@hbathini.in.ibm.com>

The kexec purgatory has to run in real mode. Only the first memory
block maybe accessible in real mode. And, unlike the case with panic
kernel, no memory is set aside for regular kexec load. Another thing
to note is, the memory for crashkernel is reserved at an offset of
128MB. So, when crashkernel memory is reserved, the memory ranges to
load kexec segments shrink further as the generic code only looks for
memblock free memory ranges and in all likelihood only a tiny bit of
memory from 0 to 128MB would be available to load kexec segments.

With kdump being used by default in general, kexec file load is likely
to fail almost always. This can be fixed by changing the memory hole
lookup logic for regular kexec to use the same method as kdump. This
would mean that most kexec segments will overlap with crashkernel
memory region. That should still be ok as the pages, whose destination
address isn't available while loading, are placed in an intermediate
location till a flush to the actual destination address happens during
kexec boot sequence.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Tested-by: Pingfan Liu <piliu@redhat.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
---

v3 -> v4:
* Unchanged. Added Reviewed-by tag from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* New patch to fix locating memory hole for kexec_file_load (kexec -s -l)
  when memory is reserved for crashkernel.


 arch/powerpc/kexec/file_load_64.c |   33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kexec/file_load_64.c b/arch/powerpc/kexec/file_load_64.c
index 47642d5..694f305 100644
--- a/arch/powerpc/kexec/file_load_64.c
+++ b/arch/powerpc/kexec/file_load_64.c
@@ -1374,13 +1374,6 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 	u64 buf_min, buf_max;
 	int ret;
 
-	/*
-	 * Use the generic kexec_locate_mem_hole for regular
-	 * kexec_file_load syscall
-	 */
-	if (kbuf->image->type != KEXEC_TYPE_CRASH)
-		return kexec_locate_mem_hole(kbuf);
-
 	/* Look up the exclude ranges list while locating the memory hole */
 	emem = &(kbuf->image->arch.exclude_ranges);
 	if (!(*emem) || ((*emem)->nr_ranges == 0)) {
@@ -1388,11 +1381,15 @@ int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
 		return kexec_locate_mem_hole(kbuf);
 	}
 
+	buf_min = kbuf->buf_min;
+	buf_max = kbuf->buf_max;
 	/* Segments for kdump kernel should be within crashkernel region */
-	buf_min = (kbuf->buf_min < crashk_res.start ?
-		   crashk_res.start : kbuf->buf_min);
-	buf_max = (kbuf->buf_max > crashk_res.end ?
-		   crashk_res.end : kbuf->buf_max);
+	if (kbuf->image->type == KEXEC_TYPE_CRASH) {
+		buf_min = (buf_min < crashk_res.start ?
+			   crashk_res.start : buf_min);
+		buf_max = (buf_max > crashk_res.end ?
+			   crashk_res.end : buf_max);
+	}
 
 	if (buf_min > buf_max) {
 		pr_err("Invalid buffer min and/or max values\n");
@@ -1522,15 +1519,13 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
 int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
 				  unsigned long buf_len)
 {
-	if (image->type == KEXEC_TYPE_CRASH) {
-		int ret;
+	int ret;
 
-		/* Get exclude memory ranges needed for setting up kdump segments */
-		ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
-		if (ret) {
-			pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
-			return ret;
-		}
+	/* Get exclude memory ranges needed for setting up kexec segments */
+	ret = get_exclude_memory_ranges(&(image->arch.exclude_ranges));
+	if (ret) {
+		pr_err("Failed to setup exclude memory ranges for buffer lookup\n");
+		return ret;
 	}
 
 	return kexec_image_probe_default(image, buf, buf_len);


^ permalink raw reply related

* Re: [PATCH v4 03/12] powerpc/kexec_file: add helper functions for getting memory ranges
From: Hari Bathini @ 2020-07-20 12:59 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Morton
  Cc: Pingfan Liu, Kexec-ml, Nayna Jain, Petr Tesarik,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Thiago Jung Bauermann, Dave Young, Vivek Goyal, Eric Biederman
In-Reply-To: <159524946347.20855.15784642736087777919.stgit@hbathini.in.ibm.com>



On 20/07/20 6:21 pm, Hari Bathini wrote:
> In kexec case, the kernel to be loaded uses the same memory layout as
> the running kernel. So, passing on the DT of the running kernel would
> be good enough.
> 
> But in case of kdump, different memory ranges are needed to manage
> loading the kdump kernel, booting into it and exporting the elfcore
> of the crashing kernel. The ranges are exclude memory ranges, usable
> memory ranges, reserved memory ranges and crash memory ranges.
> 
> Exclude memory ranges specify the list of memory ranges to avoid while
> loading kdump segments. Usable memory ranges list the memory ranges
> that could be used for booting kdump kernel. Reserved memory ranges
> list the memory regions for the loading kernel's reserve map. Crash
> memory ranges list the memory ranges to be exported as the crashing
> kernel's elfcore.
> 
> Add helper functions for setting up the above mentioned memory ranges.
> This helpers facilitate in understanding the subsequent changes better
> and make it easy to setup the different memory ranges listed above, as
> and when appropriate.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> Tested-by: Pingfan Liu <piliu@redhat.com>
> ---

> 
> v3 -> v4:
> * Unchanged. Added Reviewed-by tag from Thiago.
> 
> v2 -> v3:
> * Unchanged. Added Acked-by & Tested-by tags from Dave & Pingfan.
> 
> v1 -> v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Dropped __weak identifier for arch overridable functions.
> * Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
>   reported by lkp. lkp report for reference:
>     - https://lore.kernel.org/patchwork/patch/1264418/

Sorry, copy-paste error. The patch version changelog is as follows:

v3 -> v4:
* Updated sort_memory_ranges() function to reuse sort() from lib/sort.c
  and addressed other review comments from Thiago.

v2 -> v3:
* Unchanged. Added Tested-by tag from Pingfan.

v1 -> v2:
* Added an option to merge ranges while sorting to minimize reallocations
  for memory ranges list.
* Dropped within_crashkernel option for add_opal_mem_range() &
  add_rtas_mem_range() as it is not really needed.


Thanks
Hari

^ permalink raw reply

* Re: [PATCH] powerpc/fault: kernel can extend a user process's stack
From: Michael Ellerman @ 2020-07-20 13:52 UTC (permalink / raw)
  To: Michal Suchánek, Daniel Axtens; +Cc: Tom Lane, linuxppc-dev, Daniel Black
In-Reply-To: <20200720105116.GO32107@kitsune.suse.cz>

Michal Suchánek <msuchanek@suse.de> writes:
> Hello,
>
> On Wed, Dec 11, 2019 at 08:37:21PM +1100, Daniel Axtens wrote:
>> > Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in
>> > arch/powerpc.")
>> 
>> Wow, that's pretty ancient! I'm also not sure it's right - in that same
>> patch, arch/ppc64/mm/fault.c contains:
>> 
>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 213)            if (address + 2048 < uregs->gpr[1]
>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 214)                && (!user_mode(regs) || !store_updates_sp(regs)))
>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 215)                    goto bad_area;
>> 
>> Which is the same as the new arch/powerpc/mm/fault.c code:
>> 
>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 234)            if (address + 2048 < uregs->gpr[1]
>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 235)                && (!user_mode(regs) || !store_updates_sp(regs)))
>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 236)                    goto bad_area;
>> 
>> So either they're both right or they're both wrong, either way I'm not
>> sure how this patch is to blame.
>
> Is there any progress on resolving this?
>
> I did not notice any followup patch nor this one being merged/refuted.

It ended up with this:

https://lore.kernel.org/linuxppc-dev/20200703141327.1732550-2-mpe@ellerman.id.au/


Which I was hoping would get some reviews :)

I'll probably merge the whole series into next this week.

cheers

^ permalink raw reply

* Re: [PATCH 07/11] Powerpc/numa: Detect support for coregroup
From: Michael Ellerman @ 2020-07-20 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
	Srikar Dronamraju, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200714043624.5648-8-srikar@linux.vnet.ibm.com>

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> Add support for grouping cores based on the device-tree classification.
> - The last domain in the associativity domains always refers to the
> core.
> - If primary reference domain happens to be the penultimate domain in
> the associativity domains device-tree property, then there are no
> coregroups. However if its not a penultimate domain, then there are
> coregroups. There can be more than one coregroup. For now we would be
> interested in the last or the smallest coregroups.

Should I know what a "coregroup" is? It's not a term I'm familiar with.

When you repost can you expand the Cc list to include lkml and
scheduler/topology folks please.

cheers

^ permalink raw reply

* Re: [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup
From: Michael Ellerman @ 2020-07-20 13:57 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
	Srikar Dronamraju, Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <20200714043624.5648-12-srikar@linux.vnet.ibm.com>

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> If user wants to enable coregroup sched_domain then they can boot with
> kernel parameter "coregroup_support=on"

Why would they want to do that?

Adding options like this increases our test matrix by 2x (though in
reality the non-default case will never be tested).

cheers

^ permalink raw reply

* Re: [FIX PATCH] powerpc/prom: Enable Radix GTSE in cpu pa-features
From: Michael Ellerman @ 2020-07-20 14:14 UTC (permalink / raw)
  To: linuxppc-dev, Bharata B Rao; +Cc: aneesh.kumar, Qian Cai, npiggin
In-Reply-To: <20200720044258.863574-1-bharata@linux.ibm.com>

On Mon, 20 Jul 2020 10:12:58 +0530, Bharata B Rao wrote:
> When '029ab30b4c0a ("powerpc/mm: Enable radix GTSE only if supported.")'
> made GTSE an MMU feature, it was enabled by default in
> powerpc-cpu-features but was missed in pa-features. This causes
> random memory corruption during boot of PowerNV kernels where
> CONFIG_PPC_DT_CPU_FTRS isn't enabled.

Applied to powerpc/next.

[1/1] powerpc/prom: Enable Radix GTSE in cpu pa-features
      https://git.kernel.org/powerpc/c/9a77c4a0a12597c661be374b8d566516c0341570

cheers

^ permalink raw reply

* Re: [PATCH v6] ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
From: Nayna @ 2020-07-20 14:40 UTC (permalink / raw)
  To: Bruno Meneguele, linux-kernel, x86, linuxppc-dev, linux-s390,
	linux-integrity, zohar
  Cc: erichte, nayna, stable
In-Reply-To: <20200713164830.101165-1-bmeneg@redhat.com>


On 7/13/20 12:48 PM, Bruno Meneguele wrote:
> The IMA_APPRAISE_BOOTPARAM config allows enabling different "ima_appraise="
> modes - log, fix, enforce - at run time, but not when IMA architecture
> specific policies are enabled.  This prevents properly labeling the
> filesystem on systems where secure boot is supported, but not enabled on the
> platform.  Only when secure boot is actually enabled should these IMA
> appraise modes be disabled.
>
> This patch removes the compile time dependency and makes it a runtime
> decision, based on the secure boot state of that platform.
>
> Test results as follows:
>
> -> x86-64 with secure boot enabled
>
> [    0.015637] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> [    0.015668] ima: Secure boot enabled: ignoring ima_appraise=fix boot parameter option
>
> -> powerpc with secure boot disabled
>
> [    0.000000] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> [    0.000000] Secure boot mode disabled
>
> -> Running the system without secure boot and with both options set:
>
> CONFIG_IMA_APPRAISE_BOOTPARAM=y
> CONFIG_IMA_ARCH_POLICY=y
>
> Audit prompts "missing-hash" but still allow execution and, consequently,
> filesystem labeling:
>
> type=INTEGRITY_DATA msg=audit(07/09/2020 12:30:27.778:1691) : pid=4976
> uid=root auid=root ses=2
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 op=appraise_data
> cause=missing-hash comm=bash name=/usr/bin/evmctl dev="dm-0" ino=493150
> res=no
>
> Cc: stable@vger.kernel.org
> Fixes: d958083a8f64 ("x86/ima: define arch_get_ima_policy() for x86")
> Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>


Reviewed-by: Nayna Jain<nayna@linux.ibm.com>

Tested-by: Nayna Jain<nayna@linux.ibm.com>


Thanks & Regards,

         - Nayna


^ permalink raw reply

* Re: [PATCH v6] ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
From: Mimi Zohar @ 2020-07-20 14:56 UTC (permalink / raw)
  To: Nayna, Bruno Meneguele, linux-kernel, x86, linuxppc-dev,
	linux-s390, linux-integrity
  Cc: erichte, nayna, stable
In-Reply-To: <d337cbba-e996-e898-1e75-9f142d480e5e@linux.vnet.ibm.com>

On Mon, 2020-07-20 at 10:40 -0400, Nayna wrote:
> On 7/13/20 12:48 PM, Bruno Meneguele wrote:
> > The IMA_APPRAISE_BOOTPARAM config allows enabling different "ima_appraise="
> > modes - log, fix, enforce - at run time, but not when IMA architecture
> > specific policies are enabled.  This prevents properly labeling the
> > filesystem on systems where secure boot is supported, but not enabled on the
> > platform.  Only when secure boot is actually enabled should these IMA
> > appraise modes be disabled.
> >
> > This patch removes the compile time dependency and makes it a runtime
> > decision, based on the secure boot state of that platform.
> >
> > Test results as follows:
> >
> > -> x86-64 with secure boot enabled
> >
> > [    0.015637] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> > [    0.015668] ima: Secure boot enabled: ignoring ima_appraise=fix boot parameter option
> >

Is it common to have two colons in the same line?  Is the colon being
used as a delimiter when parsing the kernel logs?  Should the second
colon be replaced with a hyphen?  (No need to repost.  I'll fix it
up.)
 

> > -> powerpc with secure boot disabled
> >
> > [    0.000000] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> > [    0.000000] Secure boot mode disabled
> >
> > -> Running the system without secure boot and with both options set:
> >
> > CONFIG_IMA_APPRAISE_BOOTPARAM=y
> > CONFIG_IMA_ARCH_POLICY=y
> >
> > Audit prompts "missing-hash" but still allow execution and, consequently,
> > filesystem labeling:
> >
> > type=INTEGRITY_DATA msg=audit(07/09/2020 12:30:27.778:1691) : pid=4976
> > uid=root auid=root ses=2
> > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 op=appraise_data
> > cause=missing-hash comm=bash name=/usr/bin/evmctl dev="dm-0" ino=493150
> > res=no
> >
> > Cc: stable@vger.kernel.org
> > Fixes: d958083a8f64 ("x86/ima: define arch_get_ima_policy() for x86")
> > Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
> 
> 
> Reviewed-by: Nayna Jain<nayna@linux.ibm.com>
> Tested-by: Nayna Jain<nayna@linux.ibm.com>

Thanks, Nayna.

Mimi


^ permalink raw reply

* Re: [PATCH v6] ima: move APPRAISE_BOOTPARAM dependency on ARCH_POLICY to runtime
From: Bruno Meneguele @ 2020-07-20 15:38 UTC (permalink / raw)
  To: Mimi Zohar
  Cc: linux-s390, Nayna, erichte, nayna, x86, linux-kernel, stable,
	linux-integrity, linuxppc-dev
In-Reply-To: <1595257015.5055.8.camel@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 2792 bytes --]

On Mon, Jul 20, 2020 at 10:56:55AM -0400, Mimi Zohar wrote:
> On Mon, 2020-07-20 at 10:40 -0400, Nayna wrote:
> > On 7/13/20 12:48 PM, Bruno Meneguele wrote:
> > > The IMA_APPRAISE_BOOTPARAM config allows enabling different "ima_appraise="
> > > modes - log, fix, enforce - at run time, but not when IMA architecture
> > > specific policies are enabled.  This prevents properly labeling the
> > > filesystem on systems where secure boot is supported, but not enabled on the
> > > platform.  Only when secure boot is actually enabled should these IMA
> > > appraise modes be disabled.
> > >
> > > This patch removes the compile time dependency and makes it a runtime
> > > decision, based on the secure boot state of that platform.
> > >
> > > Test results as follows:
> > >
> > > -> x86-64 with secure boot enabled
> > >
> > > [    0.015637] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> > > [    0.015668] ima: Secure boot enabled: ignoring ima_appraise=fix boot parameter option
> > >
> 
> Is it common to have two colons in the same line?  Is the colon being
> used as a delimiter when parsing the kernel logs?  Should the second
> colon be replaced with a hyphen?  (No need to repost.  I'll fix it
> up.)
>  

AFAICS it has been used without any limitations, e.g:

PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
microcode: CPU0: patch_level=0x08701013
Lockdown: modprobe: unsigned module loading is restricted; see man kernel_lockdown.7
...

I'd say we're fine using it.

> 
> > > -> powerpc with secure boot disabled
> > >
> > > [    0.000000] Kernel command line: <...> ima_policy=appraise_tcb ima_appraise=fix
> > > [    0.000000] Secure boot mode disabled
> > >
> > > -> Running the system without secure boot and with both options set:
> > >
> > > CONFIG_IMA_APPRAISE_BOOTPARAM=y
> > > CONFIG_IMA_ARCH_POLICY=y
> > >
> > > Audit prompts "missing-hash" but still allow execution and, consequently,
> > > filesystem labeling:
> > >
> > > type=INTEGRITY_DATA msg=audit(07/09/2020 12:30:27.778:1691) : pid=4976
> > > uid=root auid=root ses=2
> > > subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 op=appraise_data
> > > cause=missing-hash comm=bash name=/usr/bin/evmctl dev="dm-0" ino=493150
> > > res=no
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: d958083a8f64 ("x86/ima: define arch_get_ima_policy() for x86")
> > > Signed-off-by: Bruno Meneguele <bmeneg@redhat.com>
> > 
> > 
> > Reviewed-by: Nayna Jain<nayna@linux.ibm.com>
> > Tested-by: Nayna Jain<nayna@linux.ibm.com>
> 
> Thanks, Nayna.
> 
> Mimi
> 

-- 
bmeneg 
PGP Key: http://bmeneg.com/pubkey.txt

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [powerpc:next-test 103/106] arch/powerpc/mm/book3s64/radix_pgtable.c:513:21: error: use of undeclared identifier 'SECTION_SIZE_BITS'
From: Christophe Leroy @ 2020-07-20 16:39 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: kbuild-all, kernel test robot, linuxppc-dev, clang-built-linux,
	Bharata B Rao
In-Reply-To: <87zh7w108a.fsf@linux.ibm.com>

"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> a écrit :

> kernel test robot <lkp@intel.com> writes:
>
>> tree:    
>> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  
>> next-test
>> head:   5fed3b3e21db21f9a7002426f456fd3a8a8c0772
>> commit: 21407f39b9d547da527ad5224c4323e1f62bb514 [103/106]  
>> powerpc/mm/radix: Create separate mappings for hot-plugged memory
>> config: powerpc-randconfig-r016-20200719 (attached as .config)
>> compiler: clang version 12.0.0  
>> (https://github.com/llvm/llvm-project  
>> ed6b578040a85977026c93bf4188f996148f3218)
>> reproduce (this is a W=1 build):
>>         wget  
>> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O  
>> ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # install powerpc cross compiling tool for clang build
>>         # apt-get install binutils-powerpc-linux-gnu
>>         git checkout 21407f39b9d547da527ad5224c4323e1f62bb514
>>         # save the attached .config to linux build tree
>>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross  
>> ARCH=powerpc
>>
>> If you fix the issue, kindly add following tag as appropriate
>> Reported-by: kernel test robot <lkp@intel.com>
>>
>> All errors (new ones prefixed by >>):
>>
>>>> arch/powerpc/mm/book3s64/radix_pgtable.c:513:21: error: use of  
>>>> undeclared identifier 'SECTION_SIZE_BITS'
>>                    *mem_block_size = MIN_MEMORY_BLOCK_SIZE;
>>                                      ^
>>    include/linux/memory.h:24:43: note: expanded from macro  
>> 'MIN_MEMORY_BLOCK_SIZE'
>>    #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
>>                                              ^
>>    arch/powerpc/mm/book3s64/radix_pgtable.c:521:33: error: use of  
>> undeclared identifier 'SECTION_SIZE_BITS'
>>            unsigned long mem_block_size = MIN_MEMORY_BLOCK_SIZE;
>>                                           ^
>>    include/linux/memory.h:24:43: note: expanded from macro  
>> 'MIN_MEMORY_BLOCK_SIZE'
>>    #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
>>                                              ^
>>    2 errors generated.
>>
>> vim +/SECTION_SIZE_BITS +513 arch/powerpc/mm/book3s64/radix_pgtable.c
>>
>>    494
>>    495	static int __init probe_memory_block_size(unsigned long  
>> node, const char *uname, int
>>    496						  depth, void *data)
>>    497	{
>>    498		unsigned long *mem_block_size = (unsigned long *)data;
>>    499		const __be64 *prop;
>>    500		int len;
>>    501
>>    502		if (depth != 1)
>>    503			return 0;
>>    504
>>    505		if (strcmp(uname, "ibm,dynamic-reconfiguration-memory"))
>>    506			return 0;
>>    507
>>    508		prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
>>    509		if (!prop || len < sizeof(__be64))
>>    510			/*
>>    511			 * Nothing in the device tree
>>    512			 */
>>  > 513			*mem_block_size = MIN_MEMORY_BLOCK_SIZE;
>>    514		else
>>    515			*mem_block_size = be64_to_cpup(prop);
>>    516		return 1;
>>    517	}
>>    518
>>
>
>  arch/powerpc/mm/book3s64/radix_pgtable.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c  
> b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index bba45fc0b7b2..c5bf2ef73c36 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -492,6 +492,7 @@ static int __init  
> radix_dt_scan_page_sizes(unsigned long node,
>  	return 1;
>  }
>
> +#ifdef CONFIG_MEMORY_HOTPLUG
>  static int __init probe_memory_block_size(unsigned long node, const  
> char *uname, int
>  					  depth, void *data)
>  {
> @@ -532,6 +533,15 @@ static unsigned long radix_memory_block_size(void)
>  	return mem_block_size;
>  }
>
> +#else   /* CONFIG_MEMORY_HOTPLUG */
> +
> +static unsigned long radix_memory_block_size(void)
> +{
> +	return 1UL * 1024 * 1024 * 1024;

Use SZ_1G instead ?

Christophe

> +}
> +
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> +
>
>  void __init radix__early_init_devtree(void)
>  {
> --
> 2.26.2
>
>
> -aneesh



^ permalink raw reply

* Re: [RFC PATCH 4/7] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Mathieu Desnoyers @ 2020-07-20 16:46 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Jens Axboe, Arnd Bergmann, Peter Zijlstra, x86,
	linux-kernel, Andy Lutomirski, linux-mm, Andy Lutomirski,
	linuxppc-dev
In-Reply-To: <1595213677.kxru89dqy2.astroid@bobo.none>

----- On Jul 19, 2020, at 11:03 PM, Nicholas Piggin npiggin@gmail.com wrote:

> Excerpts from Mathieu Desnoyers's message of July 17, 2020 11:42 pm:
>> ----- On Jul 16, 2020, at 7:26 PM, Nicholas Piggin npiggin@gmail.com wrote:
>> [...]
>>> 
>>> membarrier does replace barrier instructions on remote CPUs, which do
>>> order accesses performed by the kernel on the user address space. So
>>> membarrier should too I guess.
>>> 
>>> Normal process context accesses like read(2) will do so because they
>>> don't get filtered out from IPIs, but kernel threads using the mm may
>>> not.
>> 
>> But it should not be an issue, because membarrier's ordering is only with
>> respect
>> to submit and completion of io_uring requests, which are performed through
>> system calls from the context of user-space threads, which are called from the
>> right mm.
> 
> Is that true? Can io completions be written into an address space via a
> kernel thread? I don't know the io_uring code well but it looks like
> that's asynchonously using the user mm context.

Indeed, the io completion appears to be signaled asynchronously between kernel
and user-space. Therefore, both kernel and userspace code need to have proper
memory barriers in place to signal completion, otherwise user-space could read
garbage after it notices completion of a read.

I did not review the entire io_uring implementation, but the publish side
for completion appears to be:

static void __io_commit_cqring(struct io_ring_ctx *ctx)
{
        struct io_rings *rings = ctx->rings;

        /* order cqe stores with ring update */
        smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);

        if (wq_has_sleeper(&ctx->cq_wait)) {
                wake_up_interruptible(&ctx->cq_wait);
                kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
        }
}

The store-release on tail should be paired with a load_acquire on the
reader-side (it's called "read_barrier()" in the code):

tools/io_uring/queue.c:

static int __io_uring_get_cqe(struct io_uring *ring,
                              struct io_uring_cqe **cqe_ptr, int wait)
{
        struct io_uring_cq *cq = &ring->cq;
        const unsigned mask = *cq->kring_mask;
        unsigned head;
        int ret;

        *cqe_ptr = NULL;
        head = *cq->khead;
        do {
                /*
                 * It's necessary to use a read_barrier() before reading
                 * the CQ tail, since the kernel updates it locklessly. The
                 * kernel has the matching store barrier for the update. The
                 * kernel also ensures that previous stores to CQEs are ordered
                 * with the tail update.
                 */
                read_barrier();
                if (head != *cq->ktail) {
                        *cqe_ptr = &cq->cqes[head & mask];
                        break;
                }
                if (!wait)
                        break;
                ret = io_uring_enter(ring->ring_fd, 0, 1,
                                        IORING_ENTER_GETEVENTS, NULL);
                if (ret < 0)
                        return -errno;
        } while (1);

        return 0;
}

So as far as membarrier memory ordering dependencies are concerned, it relies
on the store-release/load-acquire dependency chain in the completion queue to
order against anything that was done prior to the completed requests.

What is in-flight while the requests are being serviced provides no memory
ordering guarantee whatsoever.

> How about other memory accesses via kthread_use_mm? Presumably there is
> still ordering requirement there for membarrier,

Please provide an example case with memory accesses via kthread_use_mm where
ordering matters to support your concern.

> so I really think
> it's a fragile interface with no real way for the user to know how
> kernel threads may use its mm for any particular reason, so membarrier
> should synchronize all possible kernel users as well.

I strongly doubt so, but perhaps something should be clarified in the documentation
if you have that feeling.

Thanks,

Mathieu

> 
> Thanks,
> Nick

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
From: Segher Boessenkool @ 2020-07-20 20:10 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: aik, Ram Pai, kvm-ppc, bharata, sathnaga, sukadev, linuxppc-dev,
	bauerman, david
In-Reply-To: <18e3bcee-8a3a-bd13-c995-8e4168471f74@linux.ibm.com>

On Mon, Jul 20, 2020 at 11:39:56AM +0200, Laurent Dufour wrote:
> Le 16/07/2020 à 10:32, Ram Pai a écrit :
> >+	if (is_secure_guest()) {					\
> >+		__asm__ __volatile__("mfsprg0 %3;"			\
> >+				"lnia %2;"				\
> >+				"ld %2,12(%2);"				\
> >+				"mtsprg0 %2;"				\
> >+				"sync;"					\
> >+				#insn" %0,%y1;"				\
> >+				"twi 0,%0,0;"				\
> >+				"isync;"				\
> >+				"mtsprg0 %3"				\
> >+			: "=r" (ret)					\
> >+			: "Z" (*addr), "r" (0), "r" (0)			\
> 
> I'm wondering if SPRG0 is restored to its original value.
> You're using the same register (r0) for parameters 2 and 3, so when doing 
> lnia %2, you're overwriting the SPRG0 value you saved in r0 just earlier.

It is putting the value 0 in the registers the compiler chooses for
operands 2 and 3.  But operand 3 is written, while the asm says it is an
input.  It needs an earlyclobber as well.

> It may be clearer to use explicit registers for %2 and %3 and to mark them 
> as modified for the compiler.

That is not a good idea, imnsho.


Segher

^ permalink raw reply

* Re: [RFC PATCH] powerpc/pseries/svm: capture instruction faulting on MMIO access, in sprg0 register
From: Segher Boessenkool @ 2020-07-20 20:24 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: aik, Ram Pai, kvm-ppc, bharata, sathnaga, sukadev, linuxppc-dev,
	bauerman, david
In-Reply-To: <20200720201041.GM30544@gate.crashing.org>

On Mon, Jul 20, 2020 at 03:10:41PM -0500, Segher Boessenkool wrote:
> On Mon, Jul 20, 2020 at 11:39:56AM +0200, Laurent Dufour wrote:
> > Le 16/07/2020 à 10:32, Ram Pai a écrit :
> > >+	if (is_secure_guest()) {					\
> > >+		__asm__ __volatile__("mfsprg0 %3;"			\
> > >+				"lnia %2;"				\
> > >+				"ld %2,12(%2);"				\
> > >+				"mtsprg0 %2;"				\
> > >+				"sync;"					\
> > >+				#insn" %0,%y1;"				\
> > >+				"twi 0,%0,0;"				\
> > >+				"isync;"				\
> > >+				"mtsprg0 %3"				\
> > >+			: "=r" (ret)					\
> > >+			: "Z" (*addr), "r" (0), "r" (0)			\
> > 
> > I'm wondering if SPRG0 is restored to its original value.
> > You're using the same register (r0) for parameters 2 and 3, so when doing 
> > lnia %2, you're overwriting the SPRG0 value you saved in r0 just earlier.
> 
> It is putting the value 0 in the registers the compiler chooses for
> operands 2 and 3.  But operand 3 is written, while the asm says it is an
> input.  It needs an earlyclobber as well.
> 
> > It may be clearer to use explicit registers for %2 and %3 and to mark them 
> > as modified for the compiler.
> 
> That is not a good idea, imnsho.

(The explicit register number part, I mean; operand 2 should be an
output as well, yes.)


Segher

^ permalink raw reply

* Re: [PATCH] powerpc/boot: Use address-of operator on section symbols
From: Segher Boessenkool @ 2020-07-20 21:02 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Arnd Bergmann, Geoff Levand, Linux Kernel Mailing List,
	clang-built-linux, Paul Mackerras, Joel Stanley,
	Nathan Chancellor, linuxppc-dev
In-Reply-To: <CAMuHMdU_KfQ-RT_nev5LgN=Vj_P97Fn=nwRoC6ZREFLa3Ysj7w@mail.gmail.com>

Hi!

On Sat, Jul 18, 2020 at 09:50:50AM +0200, Geert Uytterhoeven wrote:
> On Wed, Jun 24, 2020 at 6:02 AM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >         /* If we have an image attached to us, it overrides anything
> >          * supplied by the loader. */
> > -       if (_initrd_end > _initrd_start) {
> > +       if (&_initrd_end > &_initrd_start) {
> 
> Are you sure that fix is correct?
> 
>     extern char _initrd_start[];
>     extern char _initrd_end[];
>     extern char _esm_blob_start[];
>     extern char _esm_blob_end[];
> 
> Of course the result of their comparison is a constant, as the addresses
> are constant.  If clangs warns about it, perhaps that warning should be moved
> to W=1?
> 
> But adding "&" is not correct, according to C.

Why not?

6.5.3.2/3
The unary & operator yields the address of its operand.  [...]
Otherwise, the result is a pointer to the object or function designated
by its operand.

This is the same as using the name of an array without anything else,
yes.  It is a bit clearer if it would not be declared as array, perhaps,
but it is correct just fine like this.


Segher

^ permalink raw reply

* Re: [PATCH net-next] net: fs_enet: remove redundant null check
From: David Miller @ 2020-07-21  0:42 UTC (permalink / raw)
  To: zhangchangzhong; +Cc: kuba, netdev, linuxppc-dev, linux-kernel
In-Reply-To: <1595243553-12325-1-git-send-email-zhangchangzhong@huawei.com>

From: Zhang Changzhong <zhangchangzhong@huawei.com>
Date: Mon, 20 Jul 2020 19:12:33 +0800

> Because clk_prepare_enable and clk_disable_unprepare already
> checked NULL clock parameter, so the additional checks are
> unnecessary, just remove them.
> 
> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH] powerpc/fault: kernel can extend a user process's stack
From: Daniel Axtens @ 2020-07-21  0:57 UTC (permalink / raw)
  To: Michael Ellerman, Michal Suchánek
  Cc: Tom Lane, linuxppc-dev, Daniel Black
In-Reply-To: <878sfethsj.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:

> Michal Suchánek <msuchanek@suse.de> writes:
>> Hello,
>>
>> On Wed, Dec 11, 2019 at 08:37:21PM +1100, Daniel Axtens wrote:
>>> > Fixes: 14cf11af6cf6 ("powerpc: Merge enough to start building in
>>> > arch/powerpc.")
>>> 
>>> Wow, that's pretty ancient! I'm also not sure it's right - in that same
>>> patch, arch/ppc64/mm/fault.c contains:
>>> 
>>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 213)            if (address + 2048 < uregs->gpr[1]
>>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 214)                && (!user_mode(regs) || !store_updates_sp(regs)))
>>> ^1da177e4c3f4 (Linus Torvalds         2005-04-16 15:20:36 -0700 215)                    goto bad_area;
>>> 
>>> Which is the same as the new arch/powerpc/mm/fault.c code:
>>> 
>>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 234)            if (address + 2048 < uregs->gpr[1]
>>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 235)                && (!user_mode(regs) || !store_updates_sp(regs)))
>>> 14cf11af6cf60 (Paul Mackerras 2005-09-26 16:04:21 +1000 236)                    goto bad_area;
>>> 
>>> So either they're both right or they're both wrong, either way I'm not
>>> sure how this patch is to blame.
>>
>> Is there any progress on resolving this?
>>
>> I did not notice any followup patch nor this one being merged/refuted.
>
> It ended up with this:
>
> https://lore.kernel.org/linuxppc-dev/20200703141327.1732550-2-mpe@ellerman.id.au/
>
>
> Which I was hoping would get some reviews :)

Ah, I missed this. I'll give it a look as soon as I can.

Kind regards,
Daniel

>
> I'll probably merge the whole series into next this week.
>
> cheers

^ permalink raw reply

* Re: Question about NUMA distance calculation in powerpc/mm/numa.c
From: Michael Ellerman @ 2020-07-21  1:36 UTC (permalink / raw)
  To: Daniel Henrique Barboza, linuxppc-dev
In-Reply-To: <e5c3b1f1-d6ac-50d5-95f5-3c6e830a078e@gmail.com>

Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> Hello,
>
>
> I didn't find an explanation about the 'double the distance' logic in
> 'git log' or anywhere in the kernel docs:
>
>
> (arch/powerpc/mm/numa.c, __node_distance()):

Adding more context:

  int distance = LOCAL_DISTANCE;
  ...

> for (i = 0; i < distance_ref_points_depth; i++) {
> 	if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
> 		break;
>
> 	/* Double the distance for each NUMA level */
> 	distance *= 2;
> }

And:

#define LOCAL_DISTANCE		10
#define REMOTE_DISTANCE		20


So AFAICS the doubling is just a way to ensure we go from LOCAL_DISTANCE
to REMOTE_DISTANCE at the first level, and then after that it's fairly
arbitrary.

cheers

^ permalink raw reply

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
From: Michael Ellerman @ 2020-07-21  1:45 UTC (permalink / raw)
  To: Nathan Lynch, Aneesh Kumar K.V; +Cc: linuxppc-dev, Bharata B Rao
In-Reply-To: <87r1tb1rw2.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> This is the next version of the fixes for memory unplug on radix.
>> The issues and the fix are described in the actual patches.
>
> I guess this isn't actually causing problems at runtime right now, but I
> notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> arch_remove_memory(), which ought to be mmu-agnostic:
>
> int __ref arch_add_memory(int nid, u64 start, u64 size,
> 			  struct mhp_params *params)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int rc;
>
> 	resize_hpt_for_hotplug(memblock_phys_mem_size());
>
> 	start = (unsigned long)__va(start);
> 	rc = create_section_mapping(start, start + size, nid,
> 				    params->pgprot);
> ...

Hmm well spotted.

That does return early if the ops are not setup:

int resize_hpt_for_hotplug(unsigned long new_mem_size)
{
	unsigned target_hpt_shift;

	if (!mmu_hash_ops.resize_hpt)
		return 0;


And:

void __init hpte_init_pseries(void)
{
	...
	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;

And that comes in via ibm,hypertas-functions:

	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},


But firmware is not necessarily going to add/remove that call based on
whether we're using hash/radix.

So I think a follow-up patch is needed to make this more robust.

Aneesh/Bharata what platform did you test this series on? I'm curious
how this didn't break.

cheers

^ permalink raw reply

* Re: [PATCH 07/11] Powerpc/numa: Detect support for coregroup
From: Srikar Dronamraju @ 2020-07-21  2:57 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran, Michael Neuling,
	Anton Blanchard, linuxppc-dev, Nick Piggin
In-Reply-To: <875zaithmk.fsf@mpe.ellerman.id.au>

* Michael Ellerman <mpe@ellerman.id.au> [2020-07-20 23:56:19]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> 
> > Add support for grouping cores based on the device-tree classification.
> > - The last domain in the associativity domains always refers to the
> > core.
> > - If primary reference domain happens to be the penultimate domain in
> > the associativity domains device-tree property, then there are no
> > coregroups. However if its not a penultimate domain, then there are
> > coregroups. There can be more than one coregroup. For now we would be
> > interested in the last or the smallest coregroups.
> 
> Should I know what a "coregroup" is? It's not a term I'm familiar with.
> 

Coregroup is a group or subset of cores from the same chip/DIE that share
some resources. This is very similar to MC domain aka Multi-Core Cache (MC),
(kernel/sched/topology.c) where cores share the same cache. In the
Multi-Core Cache domain, all the cores of that domain share the last level
of cache.

Will add the description to the commit message.

> When you repost can you expand the Cc list to include lkml and
> scheduler/topology folks please.
> 

Okay, will add them, I shall copy to LKML, Peter Zijlstra and Ingo Molnar.
Do let me know if you want anyone else to added.

> cheers

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [powerpc:next-test 103/106] arch/powerpc/mm/book3s64/radix_pgtable.c:513:21: error: use of undeclared identifier 'SECTION_SIZE_BITS'
From: Michael Ellerman @ 2020-07-21  3:13 UTC (permalink / raw)
  To: Christophe Leroy, Aneesh Kumar K.V
  Cc: linuxppc-dev, Bharata B Rao, kbuild-all, kernel test robot,
	clang-built-linux
In-Reply-To: <20200720183900.Horde.y2dVSL93KA1P6bzz7IKxoA1@messagerie.si.c-s.fr>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> a écrit :
...
>>
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c  
>> b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index bba45fc0b7b2..c5bf2ef73c36 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -492,6 +492,7 @@ static int __init  
>> @@ -532,6 +533,15 @@ static unsigned long radix_memory_block_size(void)
>>  	return mem_block_size;
>>  }
>>
>> +#else   /* CONFIG_MEMORY_HOTPLUG */
>> +
>> +static unsigned long radix_memory_block_size(void)
>> +{
>> +	return 1UL * 1024 * 1024 * 1024;
>
> Use SZ_1G instead ?

I've already squashed that in.

I'd take a patch to convert all cases though, I see at least:

  arch/powerpc/boot/ep8248e.c:    mem_size *= 1024 * 1024;
  arch/powerpc/boot/ep88xc.c:     mem_size *= 1024 * 1024;
  arch/powerpc/include/asm/kexec.h:#define KEXEC_SOURCE_MEMORY_LIMIT      (2 * 1024 * 1024 * 1024UL - 1)
  arch/powerpc/include/asm/kexec.h:#define KEXEC_DESTINATION_MEMORY_LIMIT (2 * 1024 * 1024 * 1024UL - 1)
  arch/powerpc/include/asm/kexec.h:#define KEXEC_CONTROL_MEMORY_LIMIT     (2 * 1024 * 1024 * 1024UL - 1)
  arch/powerpc/kernel/iommu.c:    if ((tbl->it_size << tbl->it_page_shift) >= (1UL * 1024 * 1024 * 1024))
  arch/powerpc/kernel/setup-common.c:                        (unsigned int)(total_memory / (1024 * 1024)));
  arch/powerpc/mm/book3s64/radix_pgtable.c:               mem_block_size = 1UL * 1024 * 1024 * 1024;
  arch/powerpc/mm/book3s64/radix_pgtable.c:       return 1UL * 1024 * 1024 * 1024;
  arch/powerpc/mm/ioremap_32.c:   if (p < 16 * 1024 * 1024)
  arch/powerpc/platforms/powernv/setup.c:         return 256UL * 1024 * 1024;
  arch/powerpc/platforms/pseries/cmm.c:   signed long min_mem_pages = (min_mem_mb * 1024 * 1024) / PAGE_SIZE;

cheers

^ permalink raw reply

* Re: [PATCH v4 10/10] powerpc/watchpoint: Remove 512 byte boundary
From: Ravi Bangoria @ 2020-07-21  3:24 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: Christophe Leroy, apopple, mikey, miltonm, peterz, oleg,
	Nicholas Piggin, linux-kernel, Paul Mackerras, jolsa, fweisbec,
	pedromfc, naveen.n.rao, linuxppc-dev, mingo, Ravi Bangoria
In-Reply-To: <CACzsE9og50tH9jRZjWYDgbFxdTkDXJq3gMuP8uxPWfrrREo=4w@mail.gmail.com>

Hi Jordan,

On 7/20/20 12:24 PM, Jordan Niethe wrote:
> On Fri, Jul 17, 2020 at 2:11 PM Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>>
>> Power10 has removed 512 bytes boundary from match criteria. i.e. The watch
>> range can cross 512 bytes boundary.
> It looks like this change is not mentioned in ISA v3.1 Book III 9.4
> Data Address Watchpoint. It could be useful to mention that in the
> commit message.

Yes, ISA 3.1 Book III 9.4 has a documentation mistake and hopefully it
will be fixed in the next version of ISA. Though, this is mentioned in
ISA 3.1 change log:

   Multiple DEAW:
   Added a second Data Address Watchpoint. [H]DAR is
   set to the first byte of overlap. 512B boundary is
   removed.

I'll mention this in the commit description.

> Also I wonder if could add a test for this to the ptrace-hwbreak selftest?

Yes, I already have a selftest for this in perf-hwbreak. Will send that soon.

Thanks,
Ravi

^ permalink raw reply

* Re: [PATCH v3 0/4] powerpc/mm/radix: Memory unplug fixes
From: Bharata B Rao @ 2020-07-21  3:29 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Nathan Lynch, Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <87tuy1sksv.fsf@mpe.ellerman.id.au>

On Tue, Jul 21, 2020 at 11:45:20AM +1000, Michael Ellerman wrote:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
> > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> >> This is the next version of the fixes for memory unplug on radix.
> >> The issues and the fix are described in the actual patches.
> >
> > I guess this isn't actually causing problems at runtime right now, but I
> > notice calls to resize_hpt_for_hotplug() from arch_add_memory() and
> > arch_remove_memory(), which ought to be mmu-agnostic:
> >
> > int __ref arch_add_memory(int nid, u64 start, u64 size,
> > 			  struct mhp_params *params)
> > {
> > 	unsigned long start_pfn = start >> PAGE_SHIFT;
> > 	unsigned long nr_pages = size >> PAGE_SHIFT;
> > 	int rc;
> >
> > 	resize_hpt_for_hotplug(memblock_phys_mem_size());
> >
> > 	start = (unsigned long)__va(start);
> > 	rc = create_section_mapping(start, start + size, nid,
> > 				    params->pgprot);
> > ...
> 
> Hmm well spotted.
> 
> That does return early if the ops are not setup:
> 
> int resize_hpt_for_hotplug(unsigned long new_mem_size)
> {
> 	unsigned target_hpt_shift;
> 
> 	if (!mmu_hash_ops.resize_hpt)
> 		return 0;
> 
> 
> And:
> 
> void __init hpte_init_pseries(void)
> {
> 	...
> 	if (firmware_has_feature(FW_FEATURE_HPT_RESIZE))
> 		mmu_hash_ops.resize_hpt = pseries_lpar_resize_hpt;
> 
> And that comes in via ibm,hypertas-functions:
> 
> 	{FW_FEATURE_HPT_RESIZE,		"hcall-hpt-resize"},
> 
> 
> But firmware is not necessarily going to add/remove that call based on
> whether we're using hash/radix.

Correct but hpte_init_pseries() will not be called for radix guests.

> 
> So I think a follow-up patch is needed to make this more robust.
> 
> Aneesh/Bharata what platform did you test this series on? I'm curious
> how this didn't break.

I have tested memory hotplug/unplug for radix guest on zz platform and
sanity-tested this for hash guest on P8.

As noted above, mmu_hash_ops.resize_hpt will not be set for radix
guest and hence we won't see any breakage.

However a separate patch to fix this will be good.

Regards,
Bharata.

^ permalink raw reply

* Re: [v3 01/15] powerpc/perf: Update cpu_hw_event to use `struct` for storing MMCR registers
From: Jordan Niethe @ 2020-07-21  3:42 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Gautham R Shenoy, mikey, maddy, kvm, kvm-ppc, svaidyan, acme,
	jolsa, linuxppc-dev
In-Reply-To: <1594996707-3727-2-git-send-email-atrajeev@linux.vnet.ibm.com>

On Sat, Jul 18, 2020 at 12:48 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
> core-book3s currently uses array to store the MMCR registers as part
> of per-cpu `cpu_hw_events`. This patch does a clean up to use `struct`
> to store mmcr regs instead of array. This will make code easier to read
> and reduces chance of any subtle bug that may come in the future, say
> when new registers are added. Patch updates all relevant code that was
> using MMCR array ( cpuhw->mmcr[x]) to use newly introduced `struct`.
> This includes the PMU driver code for supported platforms (power5
> to power9) and ISA macros for counter support functions.
>
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/perf_event_server.h | 10 ++++--
>  arch/powerpc/perf/core-book3s.c              | 53 +++++++++++++---------------
>  arch/powerpc/perf/isa207-common.c            | 20 +++++------
>  arch/powerpc/perf/isa207-common.h            |  4 +--
>  arch/powerpc/perf/mpc7450-pmu.c              | 21 +++++++----
>  arch/powerpc/perf/power5+-pmu.c              | 17 ++++-----
>  arch/powerpc/perf/power5-pmu.c               | 17 ++++-----
>  arch/powerpc/perf/power6-pmu.c               | 16 ++++-----
>  arch/powerpc/perf/power7-pmu.c               | 17 ++++-----
>  arch/powerpc/perf/ppc970-pmu.c               | 24 ++++++-------
>  10 files changed, 105 insertions(+), 94 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
> index 3e9703f..f9a3668 100644
> --- a/arch/powerpc/include/asm/perf_event_server.h
> +++ b/arch/powerpc/include/asm/perf_event_server.h
> @@ -17,6 +17,12 @@
>
>  struct perf_event;
>
> +struct mmcr_regs {
> +       unsigned long mmcr0;
> +       unsigned long mmcr1;
> +       unsigned long mmcr2;
> +       unsigned long mmcra;
> +};
>  /*
>   * This struct provides the constants and functions needed to
>   * describe the PMU on a particular POWER-family CPU.
> @@ -28,7 +34,7 @@ struct power_pmu {
>         unsigned long   add_fields;
>         unsigned long   test_adder;
>         int             (*compute_mmcr)(u64 events[], int n_ev,
> -                               unsigned int hwc[], unsigned long mmcr[],
> +                               unsigned int hwc[], struct mmcr_regs *mmcr,
>                                 struct perf_event *pevents[]);
>         int             (*get_constraint)(u64 event_id, unsigned long *mskp,
>                                 unsigned long *valp);
> @@ -41,7 +47,7 @@ struct power_pmu {
>         unsigned long   group_constraint_val;
>         u64             (*bhrb_filter_map)(u64 branch_sample_type);
>         void            (*config_bhrb)(u64 pmu_bhrb_filter);
> -       void            (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
> +       void            (*disable_pmc)(unsigned int pmc, struct mmcr_regs *mmcr);
>         int             (*limited_pmc_event)(u64 event_id);
>         u32             flags;
>         const struct attribute_group    **attr_groups;
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index cd6a742..18b1b6a 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -37,12 +37,7 @@ struct cpu_hw_events {
>         struct perf_event *event[MAX_HWEVENTS];
>         u64 events[MAX_HWEVENTS];
>         unsigned int flags[MAX_HWEVENTS];
> -       /*
> -        * The order of the MMCR array is:
> -        *  - 64-bit, MMCR0, MMCR1, MMCRA, MMCR2
> -        *  - 32-bit, MMCR0, MMCR1, MMCR2
> -        */
> -       unsigned long mmcr[4];
> +       struct mmcr_regs mmcr;
>         struct perf_event *limited_counter[MAX_LIMITED_HWCOUNTERS];
>         u8  limited_hwidx[MAX_LIMITED_HWCOUNTERS];
>         u64 alternatives[MAX_HWEVENTS][MAX_EVENT_ALTERNATIVES];
> @@ -121,7 +116,7 @@ static void ebb_event_add(struct perf_event *event) { }
>  static void ebb_switch_out(unsigned long mmcr0) { }
>  static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>  {
> -       return cpuhw->mmcr[0];
> +       return cpuhw->mmcr.mmcr0;
>  }
>
>  static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
> @@ -590,7 +585,7 @@ static void ebb_switch_out(unsigned long mmcr0)
>
>  static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>  {
> -       unsigned long mmcr0 = cpuhw->mmcr[0];
> +       unsigned long mmcr0 = cpuhw->mmcr.mmcr0;
>
>         if (!ebb)
>                 goto out;
> @@ -624,7 +619,7 @@ static unsigned long ebb_switch_in(bool ebb, struct cpu_hw_events *cpuhw)
>          * unfreeze counters, it should not set exclude_xxx in its events and
>          * instead manage the MMCR2 entirely by itself.
>          */
> -       mtspr(SPRN_MMCR2, cpuhw->mmcr[3] | current->thread.mmcr2);
> +       mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2 | current->thread.mmcr2);
>  out:
>         return mmcr0;
>  }
> @@ -1232,9 +1227,9 @@ static void power_pmu_disable(struct pmu *pmu)
>                 /*
>                  * Disable instruction sampling if it was enabled
>                  */
> -               if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
> +               if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE) {
>                         mtspr(SPRN_MMCRA,
> -                             cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> +                             cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
>                         mb();
>                         isync();
>                 }
> @@ -1308,18 +1303,18 @@ static void power_pmu_enable(struct pmu *pmu)
>          * (possibly updated for removal of events).
>          */
>         if (!cpuhw->n_added) {
> -               mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> -               mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
> +               mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
> +               mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
>                 goto out_enable;
>         }
>
>         /*
>          * Clear all MMCR settings and recompute them for the new set of events.
>          */
> -       memset(cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));
> +       memset(&cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));
>
>         if (ppmu->compute_mmcr(cpuhw->events, cpuhw->n_events, hwc_index,
> -                              cpuhw->mmcr, cpuhw->event)) {
> +                              &cpuhw->mmcr, cpuhw->event)) {
>                 /* shouldn't ever get here */
>                 printk(KERN_ERR "oops compute_mmcr failed\n");
>                 goto out;
> @@ -1333,11 +1328,11 @@ static void power_pmu_enable(struct pmu *pmu)
>                  */
>                 event = cpuhw->event[0];
>                 if (event->attr.exclude_user)
> -                       cpuhw->mmcr[0] |= MMCR0_FCP;
> +                       cpuhw->mmcr.mmcr0 |= MMCR0_FCP;
>                 if (event->attr.exclude_kernel)
> -                       cpuhw->mmcr[0] |= freeze_events_kernel;
> +                       cpuhw->mmcr.mmcr0 |= freeze_events_kernel;
>                 if (event->attr.exclude_hv)
> -                       cpuhw->mmcr[0] |= MMCR0_FCHV;
> +                       cpuhw->mmcr.mmcr0 |= MMCR0_FCHV;
>         }
>
>         /*
> @@ -1346,12 +1341,12 @@ static void power_pmu_enable(struct pmu *pmu)
>          * Then unfreeze the events.
>          */
>         ppc_set_pmu_inuse(1);
> -       mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> -       mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
> -       mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
> +       mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra & ~MMCRA_SAMPLE_ENABLE);
> +       mtspr(SPRN_MMCR1, cpuhw->mmcr.mmcr1);
> +       mtspr(SPRN_MMCR0, (cpuhw->mmcr.mmcr0 & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
>                                 | MMCR0_FC);
>         if (ppmu->flags & PPMU_ARCH_207S)
> -               mtspr(SPRN_MMCR2, cpuhw->mmcr[3]);
> +               mtspr(SPRN_MMCR2, cpuhw->mmcr.mmcr2);
>
>         /*
>          * Read off any pre-existing events that need to move
> @@ -1402,7 +1397,7 @@ static void power_pmu_enable(struct pmu *pmu)
>                 perf_event_update_userpage(event);
>         }
>         cpuhw->n_limited = n_lim;
> -       cpuhw->mmcr[0] |= MMCR0_PMXE | MMCR0_FCECE;
> +       cpuhw->mmcr.mmcr0 |= MMCR0_PMXE | MMCR0_FCECE;
>
>   out_enable:
>         pmao_restore_workaround(ebb);
> @@ -1418,9 +1413,9 @@ static void power_pmu_enable(struct pmu *pmu)
>         /*
>          * Enable instruction sampling if necessary
>          */
> -       if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
> +       if (cpuhw->mmcr.mmcra & MMCRA_SAMPLE_ENABLE) {
>                 mb();
> -               mtspr(SPRN_MMCRA, cpuhw->mmcr[2]);
> +               mtspr(SPRN_MMCRA, cpuhw->mmcr.mmcra);
>         }
>
>   out:
> @@ -1550,7 +1545,7 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>                                 cpuhw->flags[i-1] = cpuhw->flags[i];
>                         }
>                         --cpuhw->n_events;
> -                       ppmu->disable_pmc(event->hw.idx - 1, cpuhw->mmcr);
> +                       ppmu->disable_pmc(event->hw.idx - 1, &cpuhw->mmcr);
>                         if (event->hw.idx) {
>                                 write_pmc(event->hw.idx, 0);
>                                 event->hw.idx = 0;
> @@ -1571,7 +1566,7 @@ static void power_pmu_del(struct perf_event *event, int ef_flags)
>         }
>         if (cpuhw->n_events == 0) {
>                 /* disable exceptions if no events are running */
> -               cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE);
> +               cpuhw->mmcr.mmcr0 &= ~(MMCR0_PMXE | MMCR0_FCECE);
>         }
>
>         if (has_branch_stack(event))
> @@ -2240,7 +2235,7 @@ static void __perf_event_interrupt(struct pt_regs *regs)
>          * XXX might want to use MSR.PM to keep the events frozen until
>          * we get back out of this interrupt.
>          */
> -       write_mmcr0(cpuhw, cpuhw->mmcr[0]);
> +       write_mmcr0(cpuhw, cpuhw->mmcr.mmcr0);
>
>         if (nmi)
>                 nmi_exit();
> @@ -2262,7 +2257,7 @@ static int power_pmu_prepare_cpu(unsigned int cpu)
>
>         if (ppmu) {
>                 memset(cpuhw, 0, sizeof(*cpuhw));
> -               cpuhw->mmcr[0] = MMCR0_FC;
> +               cpuhw->mmcr.mmcr0 = MMCR0_FC;
>         }
>         return 0;
>  }
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 4c86da5..2fe63f2 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -363,7 +363,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
>  }
>
>  int isa207_compute_mmcr(u64 event[], int n_ev,
> -                              unsigned int hwc[], unsigned long mmcr[],
> +                              unsigned int hwc[], struct mmcr_regs *mmcr,
>                                struct perf_event *pevents[])
>  {
>         unsigned long mmcra, mmcr1, mmcr2, unit, combine, psel, cache, val;
> @@ -464,30 +464,30 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
>         }
>
>         /* Return MMCRx values */
> -       mmcr[0] = 0;
> +       mmcr->mmcr0 = 0;
>
>         /* pmc_inuse is 1-based */
>         if (pmc_inuse & 2)
> -               mmcr[0] = MMCR0_PMC1CE;
> +               mmcr->mmcr0 = MMCR0_PMC1CE;
>
>         if (pmc_inuse & 0x7c)
> -               mmcr[0] |= MMCR0_PMCjCE;
> +               mmcr->mmcr0 |= MMCR0_PMCjCE;
>
>         /* If we're not using PMC 5 or 6, freeze them */
>         if (!(pmc_inuse & 0x60))
> -               mmcr[0] |= MMCR0_FC56;
> +               mmcr->mmcr0 |= MMCR0_FC56;
>
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> -       mmcr[3] = mmcr2;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
> +       mmcr->mmcr2 = mmcr2;
>
>         return 0;
>  }
>
> -void isa207_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +void isa207_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         if (pmc <= 3)
> -               mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
> +               mmcr->mmcr1 &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
>  }
>
>  static int find_alternative(u64 event, const unsigned int ev_alt[][MAX_ALT], int size)
> diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
> index 63fd4f3..df968fd 100644
> --- a/arch/powerpc/perf/isa207-common.h
> +++ b/arch/powerpc/perf/isa207-common.h
> @@ -217,9 +217,9 @@
>
>  int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp);
>  int isa207_compute_mmcr(u64 event[], int n_ev,
> -                               unsigned int hwc[], unsigned long mmcr[],
> +                               unsigned int hwc[], struct mmcr_regs *mmcr,
>                                 struct perf_event *pevents[]);
> -void isa207_disable_pmc(unsigned int pmc, unsigned long mmcr[]);
> +void isa207_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr);
>  int isa207_get_alternatives(u64 event, u64 alt[], int size, unsigned int flags,
>                                         const unsigned int ev_alt[][MAX_ALT]);
>  void isa207_get_mem_data_src(union perf_mem_data_src *dsrc, u32 flags,
> diff --git a/arch/powerpc/perf/mpc7450-pmu.c b/arch/powerpc/perf/mpc7450-pmu.c
> index 4d5ef92..826de25 100644
> --- a/arch/powerpc/perf/mpc7450-pmu.c
> +++ b/arch/powerpc/perf/mpc7450-pmu.c
> @@ -257,7 +257,7 @@ static int mpc7450_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>   * Compute MMCR0/1/2 values for a set of events.
>   */
>  static int mpc7450_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[],
> -                               unsigned long mmcr[],
> +                               struct mmcr_regs *mmcr,
>                                 struct perf_event *pevents[])
>  {
>         u8 event_index[N_CLASSES][N_COUNTER];
> @@ -321,9 +321,16 @@ static int mpc7450_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[],
>                 mmcr0 |= MMCR0_PMCnCE;
>
>         /* Return MMCRx values */
> -       mmcr[0] = mmcr0;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcr2;
> +       mmcr->mmcr0 = mmcr0;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcr2 = mmcr2;
Will this mmcr->mmcr2 be used anywere, or will it always use mmcr->mmcra?
> +       /*
> +        * 32-bit doesn't have an MMCRA and uses SPRN_MMCR2 to define
> +        * SPRN_MMCRA. So assign mmcra of cpu_hw_events with `mmcr2`
> +        * value to ensure that any write to this SPRN_MMCRA will
> +        * use mmcr2 value.
> +        */
Something like this comment would probably be useful to near the struct mmcr.
> +       mmcr->mmcra = mmcr2;
>         return 0;
>  }
>
> @@ -331,12 +338,12 @@ static int mpc7450_compute_mmcr(u64 event[], int n_ev, unsigned int hwc[],
>   * Disable counting by a PMC.
>   * Note that the pmc argument is 0-based here, not 1-based.
>   */
> -static void mpc7450_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void mpc7450_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         if (pmc <= 1)
> -               mmcr[0] &= ~(pmcsel_mask[pmc] << pmcsel_shift[pmc]);
> +               mmcr->mmcr0 &= ~(pmcsel_mask[pmc] << pmcsel_shift[pmc]);
>         else
> -               mmcr[1] &= ~(pmcsel_mask[pmc] << pmcsel_shift[pmc]);
> +               mmcr->mmcr1 &= ~(pmcsel_mask[pmc] << pmcsel_shift[pmc]);
>  }
>
>  static int mpc7450_generic_events[] = {
> diff --git a/arch/powerpc/perf/power5+-pmu.c b/arch/powerpc/perf/power5+-pmu.c
> index f857454..5f0821e 100644
> --- a/arch/powerpc/perf/power5+-pmu.c
> +++ b/arch/powerpc/perf/power5+-pmu.c
> @@ -448,7 +448,8 @@ static int power5p_marked_instr_event(u64 event)
>  }
>
>  static int power5p_compute_mmcr(u64 event[], int n_ev,
> -                               unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[])
> +                               unsigned int hwc[], struct mmcr_regs *mmcr,
> +                               struct perf_event *pevents[])
>  {
>         unsigned long mmcr1 = 0;
>         unsigned long mmcra = 0;
> @@ -586,20 +587,20 @@ static int power5p_compute_mmcr(u64 event[], int n_ev,
>         }
>
>         /* Return MMCRx values */
> -       mmcr[0] = 0;
> +       mmcr->mmcr0 = 0;
>         if (pmc_inuse & 1)
> -               mmcr[0] = MMCR0_PMC1CE;
> +               mmcr->mmcr0 = MMCR0_PMC1CE;
>         if (pmc_inuse & 0x3e)
> -               mmcr[0] |= MMCR0_PMCjCE;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> +               mmcr->mmcr0 |= MMCR0_PMCjCE;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
>         return 0;
>  }
>
> -static void power5p_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void power5p_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         if (pmc <= 3)
> -               mmcr[1] &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc));
> +               mmcr->mmcr1 &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc));
>  }
>
>  static int power5p_generic_events[] = {
> diff --git a/arch/powerpc/perf/power5-pmu.c b/arch/powerpc/perf/power5-pmu.c
> index da52eca..426021d 100644
> --- a/arch/powerpc/perf/power5-pmu.c
> +++ b/arch/powerpc/perf/power5-pmu.c
> @@ -379,7 +379,8 @@ static int power5_marked_instr_event(u64 event)
>  }
>
>  static int power5_compute_mmcr(u64 event[], int n_ev,
> -                              unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[])
> +                              unsigned int hwc[], struct mmcr_regs *mmcr,
> +                              struct perf_event *pevents[])
>  {
>         unsigned long mmcr1 = 0;
>         unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
> @@ -528,20 +529,20 @@ static int power5_compute_mmcr(u64 event[], int n_ev,
>         }
>
>         /* Return MMCRx values */
> -       mmcr[0] = 0;
> +       mmcr->mmcr0 = 0;
>         if (pmc_inuse & 1)
> -               mmcr[0] = MMCR0_PMC1CE;
> +               mmcr->mmcr0 = MMCR0_PMC1CE;
>         if (pmc_inuse & 0x3e)
> -               mmcr[0] |= MMCR0_PMCjCE;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> +               mmcr->mmcr0 |= MMCR0_PMCjCE;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
>         return 0;
>  }
>
> -static void power5_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void power5_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         if (pmc <= 3)
> -               mmcr[1] &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc));
> +               mmcr->mmcr1 &= ~(0x7fUL << MMCR1_PMCSEL_SH(pmc));
>  }
>
>  static int power5_generic_events[] = {
> diff --git a/arch/powerpc/perf/power6-pmu.c b/arch/powerpc/perf/power6-pmu.c
> index 3929cac..e343a51 100644
> --- a/arch/powerpc/perf/power6-pmu.c
> +++ b/arch/powerpc/perf/power6-pmu.c
> @@ -171,7 +171,7 @@ static int power6_marked_instr_event(u64 event)
>   * Assign PMC numbers and compute MMCR1 value for a set of events
>   */
>  static int p6_compute_mmcr(u64 event[], int n_ev,
> -                          unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[])
> +                          unsigned int hwc[], struct mmcr_regs *mmcr, struct perf_event *pevents[])
>  {
>         unsigned long mmcr1 = 0;
>         unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
> @@ -243,13 +243,13 @@ static int p6_compute_mmcr(u64 event[], int n_ev,
>                 if (pmc < 4)
>                         mmcr1 |= (unsigned long)psel << MMCR1_PMCSEL_SH(pmc);
>         }
> -       mmcr[0] = 0;
> +       mmcr->mmcr0 = 0;
>         if (pmc_inuse & 1)
> -               mmcr[0] = MMCR0_PMC1CE;
> +               mmcr->mmcr0 = MMCR0_PMC1CE;
>         if (pmc_inuse & 0xe)
> -               mmcr[0] |= MMCR0_PMCjCE;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> +               mmcr->mmcr0 |= MMCR0_PMCjCE;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
>         return 0;
>  }
>
> @@ -457,11 +457,11 @@ static int p6_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>         return nalt;
>  }
>
> -static void p6_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void p6_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         /* Set PMCxSEL to 0 to disable PMCx */
>         if (pmc <= 3)
> -               mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
> +               mmcr->mmcr1 &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
>  }
>
>  static int power6_generic_events[] = {
> diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
> index a137813..3152336 100644
> --- a/arch/powerpc/perf/power7-pmu.c
> +++ b/arch/powerpc/perf/power7-pmu.c
> @@ -242,7 +242,8 @@ static int power7_marked_instr_event(u64 event)
>  }
>
>  static int power7_compute_mmcr(u64 event[], int n_ev,
> -                              unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[])
> +                              unsigned int hwc[], struct mmcr_regs *mmcr,
> +                              struct perf_event *pevents[])
>  {
>         unsigned long mmcr1 = 0;
>         unsigned long mmcra = MMCRA_SDAR_DCACHE_MISS | MMCRA_SDAR_ERAT_MISS;
> @@ -298,20 +299,20 @@ static int power7_compute_mmcr(u64 event[], int n_ev,
>         }
>
>         /* Return MMCRx values */
> -       mmcr[0] = 0;
> +       mmcr->mmcr0 = 0;
>         if (pmc_inuse & 1)
> -               mmcr[0] = MMCR0_PMC1CE;
> +               mmcr->mmcr0 = MMCR0_PMC1CE;
>         if (pmc_inuse & 0x3e)
> -               mmcr[0] |= MMCR0_PMCjCE;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> +               mmcr->mmcr0 |= MMCR0_PMCjCE;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
>         return 0;
>  }
>
> -static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void power7_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
>         if (pmc <= 3)
> -               mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
> +               mmcr->mmcr1 &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
>  }
>
>  static int power7_generic_events[] = {
> diff --git a/arch/powerpc/perf/ppc970-pmu.c b/arch/powerpc/perf/ppc970-pmu.c
> index 4035d93..89a90ab 100644
> --- a/arch/powerpc/perf/ppc970-pmu.c
> +++ b/arch/powerpc/perf/ppc970-pmu.c
> @@ -253,7 +253,8 @@ static int p970_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>  }
>
>  static int p970_compute_mmcr(u64 event[], int n_ev,
> -                            unsigned int hwc[], unsigned long mmcr[], struct perf_event *pevents[])
> +                            unsigned int hwc[], struct mmcr_regs *mmcr,
> +                            struct perf_event *pevents[])
>  {
>         unsigned long mmcr0 = 0, mmcr1 = 0, mmcra = 0;
>         unsigned int pmc, unit, byte, psel;
> @@ -393,27 +394,26 @@ static int p970_compute_mmcr(u64 event[], int n_ev,
>         mmcra |= 0x2000;        /* mark only one IOP per PPC instruction */
>
>         /* Return MMCRx values */
> -       mmcr[0] = mmcr0;
> -       mmcr[1] = mmcr1;
> -       mmcr[2] = mmcra;
> +       mmcr->mmcr0 = mmcr0;
> +       mmcr->mmcr1 = mmcr1;
> +       mmcr->mmcra = mmcra;
>         return 0;
>  }
>
> -static void p970_disable_pmc(unsigned int pmc, unsigned long mmcr[])
> +static void p970_disable_pmc(unsigned int pmc, struct mmcr_regs *mmcr)
>  {
> -       int shift, i;
> +       int shift;
>
> +       /*
> +        * Setting the PMCxSEL field to 0x08 disables PMC x.
> +        */
>         if (pmc <= 1) {
>                 shift = MMCR0_PMC1SEL_SH - 7 * pmc;
> -               i = 0;
> +               mmcr->mmcr0 = (mmcr->mmcr0 & ~(0x1fUL << shift)) | (0x08UL << shift);
>         } else {
>                 shift = MMCR1_PMC3SEL_SH - 5 * (pmc - 2);
> -               i = 1;
> +               mmcr->mmcr1 = (mmcr->mmcr1 & ~(0x1fUL << shift)) | (0x08UL << shift);
>         }
> -       /*
> -        * Setting the PMCxSEL field to 0x08 disables PMC x.
> -        */
> -       mmcr[i] = (mmcr[i] & ~(0x1fUL << shift)) | (0x08UL << shift);
>  }
>
>  static int ppc970_generic_events[] = {
> --
> 1.8.3.1
>

^ permalink raw reply

* [PATCH 2/2] ASoC: bindings: fsl-asoc-card: Support properties for configuring dai fmt
From: Shengjiu Wang @ 2020-07-21  3:41 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
	perex, tiwai, alsa-devel, robh+dt, devicetree
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1595302910-19688-1-git-send-email-shengjiu.wang@nxp.com>

In order to support configuring dai fmt through DT, add some properties.
These properiese are same as the properties in simple card.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 Documentation/devicetree/bindings/sound/fsl-asoc-card.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
index 8a6a3d0fda5e..63ebf52b43e8 100644
--- a/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
+++ b/Documentation/devicetree/bindings/sound/fsl-asoc-card.txt
@@ -71,6 +71,11 @@ Optional properties:
 
   - hp-det-gpio		: The GPIO that detect headphones are plugged in
   - mic-det-gpio	: The GPIO that detect microphones are plugged in
+  - bitclock-master	: Indicates dai-link bit clock master; for details see simple-card.yaml.
+  - frame-master	: Indicates dai-link frame master; for details see simple-card.yaml.
+  - dai-format		: audio format, for details see simple-card.yaml.
+  - frame-inversion	: dai-link uses frame clock inversion, for details see simple-card.yaml.
+  - bitclock-inversion	: dai-link uses bit clock inversion, for details see simple-card.yaml.
 
 Optional unless SSI is selected as a CPU DAI:
 
-- 
2.27.0


^ permalink raw reply related

* [PATCH 1/2] ASoC: fsl-asoc-card: Support configuring dai fmt from DT
From: Shengjiu Wang @ 2020-07-21  3:41 UTC (permalink / raw)
  To: timur, nicoleotsuka, Xiubo.Lee, festevam, lgirdwood, broonie,
	perex, tiwai, alsa-devel, robh+dt, devicetree
  Cc: linuxppc-dev, linux-kernel

Support same propeties as simple card for configuring fmt
from DT.
In order to make this change compatible with old DT, these
properties are optional.

Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
---
 sound/soc/fsl/fsl-asoc-card.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/sound/soc/fsl/fsl-asoc-card.c b/sound/soc/fsl/fsl-asoc-card.c
index ee80d02b56c6..4848ba61d083 100644
--- a/sound/soc/fsl/fsl-asoc-card.c
+++ b/sound/soc/fsl/fsl-asoc-card.c
@@ -531,11 +531,14 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 	struct device_node *cpu_np, *codec_np, *asrc_np;
 	struct device_node *np = pdev->dev.of_node;
 	struct platform_device *asrc_pdev = NULL;
+	struct device_node *bitclkmaster = NULL;
+	struct device_node *framemaster = NULL;
 	struct platform_device *cpu_pdev;
 	struct fsl_asoc_card_priv *priv;
 	struct device *codec_dev = NULL;
 	const char *codec_dai_name;
 	const char *codec_dev_name;
+	unsigned int daifmt;
 	u32 width;
 	int ret;
 
@@ -667,6 +670,31 @@ static int fsl_asoc_card_probe(struct platform_device *pdev)
 		goto asrc_fail;
 	}
 
+	/* Format info from DT is optional. */
+	daifmt = snd_soc_of_parse_daifmt(np, NULL,
+					 &bitclkmaster, &framemaster);
+	daifmt &= ~SND_SOC_DAIFMT_MASTER_MASK;
+	if (bitclkmaster || framemaster) {
+		if (codec_np == bitclkmaster)
+			daifmt |= (codec_np == framemaster) ?
+				SND_SOC_DAIFMT_CBM_CFM : SND_SOC_DAIFMT_CBM_CFS;
+		else
+			daifmt |= (codec_np == framemaster) ?
+				SND_SOC_DAIFMT_CBS_CFM : SND_SOC_DAIFMT_CBS_CFS;
+
+		/* Override dai_fmt with value from DT */
+		priv->dai_fmt = daifmt;
+	}
+
+	/* Change direction according to format */
+	if (priv->dai_fmt & SND_SOC_DAIFMT_CBM_CFM) {
+		priv->cpu_priv.sysclk_dir[TX] = SND_SOC_CLOCK_IN;
+		priv->cpu_priv.sysclk_dir[RX] = SND_SOC_CLOCK_IN;
+	}
+
+	of_node_put(bitclkmaster);
+	of_node_put(framemaster);
+
 	if (!fsl_asoc_card_is_ac97(priv) && !codec_dev) {
 		dev_err(&pdev->dev, "failed to find codec device\n");
 		ret = -EPROBE_DEFER;
-- 
2.27.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox