LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Baoquan He @ 2020-08-05  9:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
	linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
	Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
	linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
	Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
	iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>

On 08/02/20 at 07:36pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> Iteration over memblock.reserved with for_each_reserved_mem_region() used
> __next_reserved_mem_region() that implemented a subset of
> __next_mem_region().
> 
> Use __for_each_mem_range() and, essentially, __next_mem_region() with
> appropriate parameters to reduce code duplication.
> 
> While on it, rename for_each_reserved_mem_region() to
> for_each_reserved_mem_range() for consistency.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  .clang-format                    |  2 +-
>  arch/arm64/kernel/setup.c        |  2 +-
>  drivers/irqchip/irq-gic-v3-its.c |  2 +-
>  include/linux/memblock.h         | 12 +++------
>  mm/memblock.c                    | 46 +++++++-------------------------
>  5 files changed, 17 insertions(+), 47 deletions(-)

Reviewed-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply

* [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05  9:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha

All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.

Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.

This should prevent some soft lockups when starting large guests

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/mm/drmem.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..3819c523c65b 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -397,7 +397,7 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
 
 static void __init init_drmem_v2_lmbs(const __be32 *prop)
 {
-	struct drmem_lmb *lmb;
+	struct drmem_lmb *lmb, *first;
 	struct of_drconf_cell_v2 dr_cell;
 	const __be32 *p;
 	u32 i, j, lmb_sets;
@@ -422,10 +422,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	/* second pass, read in the LMB information */
 	lmb_index = 0;
 	p = prop;
+	first = NULL;
 
 	for (i = 0; i < lmb_sets; i++) {
 		read_drconf_v2_cell(&dr_cell, &p);
 
+		/*
+		 * Fetch the NUMA node id for the fist set or if the
+		 * associativity index is different from the previous set.
+		 */
+		if (first && dr_cell.aa_index != first->aa_index)
+			first = NULL;
+
 		for (j = 0; j < dr_cell.seq_lmbs; j++) {
 			lmb = &drmem_info->lmbs[lmb_index++];
 
@@ -438,7 +446,16 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
 
-			lmb_set_nid(lmb);
+			/*
+			 * All the LMB in the set share the same NUMA
+			 * associativity property. So read that node only once.
+			 */
+			if (!first) {
+				lmb_set_nid(lmb);
+				first = lmb;
+			} else {
+				lmb->nid = first->nid;
+			}
 		}
 	}
 }
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Baoquan He @ 2020-08-05  9:29 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Emil Renner Berthing, linux-sh, Peter Zijlstra, Dave Hansen,
	linux-mips, Max Filippov, Paul Mackerras, sparclinux, linux-riscv,
	Will Deacon, Stafford Horne, Marek Szyprowski, linux-arch,
	linux-s390, linux-c6x-dev, Yoshinori Sato, x86, Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, linux-arm-kernel,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Thomas Gleixner,
	Hari Bathini, Michal Simek, linux-mm, linuxppc-dev, linux-kernel,
	iommu, Palmer Dabbelt, Andrew Morton, Christoph Hellwig
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>

On 08/02/20 at 07:36pm, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> for_each_memblock() is used to iterate over memblock.memory in
> a few places that use data from memblock_region rather than the memory
> ranges.
> 
> Introduce separate for_each_mem_region() and for_each_reserved_mem_region()
> to improve encapsulation of memblock internals from its users.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  .clang-format                  |  3 ++-
>  arch/arm64/kernel/setup.c      |  2 +-
>  arch/arm64/mm/numa.c           |  2 +-
>  arch/mips/netlogic/xlp/setup.c |  2 +-
>  arch/x86/mm/numa.c             |  2 +-
>  include/linux/memblock.h       | 19 ++++++++++++++++---
>  mm/memblock.c                  |  4 ++--
>  mm/page_alloc.c                |  8 ++++----
>  8 files changed, 28 insertions(+), 14 deletions(-)
> 
...

> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 9e51b3fd4134..a6970e058bd7 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -522,9 +522,22 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
>  	return PFN_UP(reg->base + reg->size);
>  }
>  
> -#define for_each_memblock(memblock_type, region)					\
> -	for (region = memblock.memblock_type.regions;					\
> -	     region < (memblock.memblock_type.regions + memblock.memblock_type.cnt);	\
> +/**
> + * for_each_mem_region - itereate over registered memory regions
                                          ~~~~~~~~~~~~~~~~~

Wonder why emphasize 'registered' memory.

Other than this confusion to me, this patch looks good.

Reviewed-by: Baoquan He <bhe@redhat.com>


^ permalink raw reply

* [PATCH] macintosh: windfarm: fix spelling mistake "detatch" -> "detach"
From: Colin King @ 2020-08-05 10:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Wolfram Sang, colin.king, linuxppc-dev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There are spelling mistakes in DBG messages. Fix them.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/macintosh/windfarm_lm75_sensor.c | 2 +-
 drivers/macintosh/windfarm_lm87_sensor.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/windfarm_lm75_sensor.c b/drivers/macintosh/windfarm_lm75_sensor.c
index 1e5fa09845e7..a88f73af4d5e 100644
--- a/drivers/macintosh/windfarm_lm75_sensor.c
+++ b/drivers/macintosh/windfarm_lm75_sensor.c
@@ -152,7 +152,7 @@ static int wf_lm75_remove(struct i2c_client *client)
 {
 	struct wf_lm75_sensor *lm = i2c_get_clientdata(client);
 
-	DBG("wf_lm75: i2c detatch called for %s\n", lm->sens.name);
+	DBG("wf_lm75: i2c detach called for %s\n", lm->sens.name);
 
 	/* Mark client detached */
 	lm->i2c = NULL;
diff --git a/drivers/macintosh/windfarm_lm87_sensor.c b/drivers/macintosh/windfarm_lm87_sensor.c
index d011899c0a8a..de8ef76a0ac8 100644
--- a/drivers/macintosh/windfarm_lm87_sensor.c
+++ b/drivers/macintosh/windfarm_lm87_sensor.c
@@ -149,7 +149,7 @@ static int wf_lm87_remove(struct i2c_client *client)
 {
 	struct wf_lm87_sensor *lm = i2c_get_clientdata(client);
 
-	DBG("wf_lm87: i2c detatch called for %s\n", lm->sens.name);
+	DBG("wf_lm87: i2c detach called for %s\n", lm->sens.name);
 
 	/* Mark client detached */
 	lm->i2c = NULL;
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: kernel test robot @ 2020-08-05 10:43 UTC (permalink / raw)
  To: Laurent Dufour, linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha, kbuild-all
In-Reply-To: <20200805092858.64143-1-ldufour@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3560 bytes --]

Hi Laurent,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linux/master linus/master v5.8 next-20200804]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Laurent-Dufour/powerpc-drmem-Don-t-compute-the-NUMA-node-for-each-LMB/20200805-173213
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-mpc885_ads_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/mm/drmem.c: In function 'init_drmem_v2_lmbs':
>> arch/powerpc/mm/drmem.c:457:8: error: 'struct drmem_lmb' has no member named 'nid'
     457 |     lmb->nid = first->nid;
         |        ^~
   arch/powerpc/mm/drmem.c:457:21: error: 'struct drmem_lmb' has no member named 'nid'
     457 |     lmb->nid = first->nid;
         |                     ^~

vim +457 arch/powerpc/mm/drmem.c

   397	
   398	static void __init init_drmem_v2_lmbs(const __be32 *prop)
   399	{
   400		struct drmem_lmb *lmb, *first;
   401		struct of_drconf_cell_v2 dr_cell;
   402		const __be32 *p;
   403		u32 i, j, lmb_sets;
   404		int lmb_index;
   405	
   406		lmb_sets = of_read_number(prop++, 1);
   407		if (lmb_sets == 0)
   408			return;
   409	
   410		/* first pass, calculate the number of LMBs */
   411		p = prop;
   412		for (i = 0; i < lmb_sets; i++) {
   413			read_drconf_v2_cell(&dr_cell, &p);
   414			drmem_info->n_lmbs += dr_cell.seq_lmbs;
   415		}
   416	
   417		drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
   418					   GFP_KERNEL);
   419		if (!drmem_info->lmbs)
   420			return;
   421	
   422		/* second pass, read in the LMB information */
   423		lmb_index = 0;
   424		p = prop;
   425		first = NULL;
   426	
   427		for (i = 0; i < lmb_sets; i++) {
   428			read_drconf_v2_cell(&dr_cell, &p);
   429	
   430			/*
   431			 * Fetch the NUMA node id for the fist set or if the
   432			 * associativity index is different from the previous set.
   433			 */
   434			if (first && dr_cell.aa_index != first->aa_index)
   435				first = NULL;
   436	
   437			for (j = 0; j < dr_cell.seq_lmbs; j++) {
   438				lmb = &drmem_info->lmbs[lmb_index++];
   439	
   440				lmb->base_addr = dr_cell.base_addr;
   441				dr_cell.base_addr += drmem_info->lmb_size;
   442	
   443				lmb->drc_index = dr_cell.drc_index;
   444				dr_cell.drc_index++;
   445	
   446				lmb->aa_index = dr_cell.aa_index;
   447				lmb->flags = dr_cell.flags;
   448	
   449				/*
   450				 * All the LMB in the set share the same NUMA
   451				 * associativity property. So read that node only once.
   452				 */
   453				if (!first) {
   454					lmb_set_nid(lmb);
   455					first = lmb;
   456				} else {
 > 457					lmb->nid = first->nid;
   458				}
   459			}
   460		}
   461	}
   462	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 10853 bytes --]

^ permalink raw reply

* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Thomas Bogendoerfer @ 2020-08-05 10:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
	Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
	linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
	openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
	linux-mm, linux-kernel, iommu, Palmer Dabbelt, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>

On Sun, Aug 02, 2020 at 07:36:01PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> for_each_memblock() is used to iterate over memblock.memory in
> a few places that use data from memblock_region rather than the memory
> ranges.
> 
> Introduce separate for_each_mem_region() and for_each_reserved_mem_region()
> to improve encapsulation of memblock internals from its users.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/mips/netlogic/xlp/setup.c |  2 +-

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply

* Re: [PATCH v2 12/17] arch, drivers: replace for_each_membock() with for_each_mem_range()
From: Thomas Bogendoerfer @ 2020-08-05 11:00 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He, x86,
	Russell King, Mike Rapoport, clang-built-linux, Ingo Molnar,
	linux-arm-kernel, Catalin Marinas, uclinux-h8-devel, linux-xtensa,
	openrisc, Borislav Petkov, Andy Lutomirski, Paul Walmsley,
	Stafford Horne, Hari Bathini, Michal Simek, Yoshinori Sato,
	linux-mm, linux-kernel, iommu, Palmer Dabbelt, Andrew Morton,
	linuxppc-dev
In-Reply-To: <20200802163601.8189-13-rppt@kernel.org>

On Sun, Aug 02, 2020 at 07:35:56PM +0300, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> There are several occurrences of the following pattern:
> 
> 	for_each_memblock(memory, reg) {
> 		start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
> 		end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));
> 
> 		/* do something with start and end */
> 	}
> 
> Using for_each_mem_range() iterator is more appropriate in such cases and
> allows simpler and cleaner code.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  arch/mips/cavium-octeon/dma-octeon.c     | 12 +++---
>  arch/mips/kernel/setup.c                 | 31 +++++++--------

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply

* Re: [PATCH] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 12:23 UTC (permalink / raw)
  To: kernel test robot, linuxppc-dev, linux-kernel
  Cc: nathanl, cheloha, kbuild-all
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>

Le 05/08/2020 à 12:43, kernel test robot a écrit :
> Hi Laurent,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on linux/master linus/master v5.8 next-20200804]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Laurent-Dufour/powerpc-drmem-Don-t-compute-the-NUMA-node-for-each-LMB/20200805-173213
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-mpc885_ads_defconfig (attached as .config)
> compiler: powerpc-linux-gcc (GCC) 9.3.0
> reproduce (this is a W=1 build):
>          wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>          chmod +x ~/bin/make.cross
>          # save the attached .config to linux build tree
>          COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):
> 
>     arch/powerpc/mm/drmem.c: In function 'init_drmem_v2_lmbs':
>>> arch/powerpc/mm/drmem.c:457:8: error: 'struct drmem_lmb' has no member named 'nid'
>       457 |     lmb->nid = first->nid;
>           |        ^~
>     arch/powerpc/mm/drmem.c:457:21: error: 'struct drmem_lmb' has no member named 'nid'
>       457 |     lmb->nid = first->nid;
>           |                     ^~

My mistake, the nid member is only present when CONFIG_MEMORY_HOTPLUG is set.

I'll send a new version fixing this.


> vim +457 arch/powerpc/mm/drmem.c
> 
>     397	
>     398	static void __init init_drmem_v2_lmbs(const __be32 *prop)
>     399	{
>     400		struct drmem_lmb *lmb, *first;
>     401		struct of_drconf_cell_v2 dr_cell;
>     402		const __be32 *p;
>     403		u32 i, j, lmb_sets;
>     404		int lmb_index;
>     405	
>     406		lmb_sets = of_read_number(prop++, 1);
>     407		if (lmb_sets == 0)
>     408			return;
>     409	
>     410		/* first pass, calculate the number of LMBs */
>     411		p = prop;
>     412		for (i = 0; i < lmb_sets; i++) {
>     413			read_drconf_v2_cell(&dr_cell, &p);
>     414			drmem_info->n_lmbs += dr_cell.seq_lmbs;
>     415		}
>     416	
>     417		drmem_info->lmbs = kcalloc(drmem_info->n_lmbs, sizeof(*lmb),
>     418					   GFP_KERNEL);
>     419		if (!drmem_info->lmbs)
>     420			return;
>     421	
>     422		/* second pass, read in the LMB information */
>     423		lmb_index = 0;
>     424		p = prop;
>     425		first = NULL;
>     426	
>     427		for (i = 0; i < lmb_sets; i++) {
>     428			read_drconf_v2_cell(&dr_cell, &p);
>     429	
>     430			/*
>     431			 * Fetch the NUMA node id for the fist set or if the
>     432			 * associativity index is different from the previous set.
>     433			 */
>     434			if (first && dr_cell.aa_index != first->aa_index)
>     435				first = NULL;
>     436	
>     437			for (j = 0; j < dr_cell.seq_lmbs; j++) {
>     438				lmb = &drmem_info->lmbs[lmb_index++];
>     439	
>     440				lmb->base_addr = dr_cell.base_addr;
>     441				dr_cell.base_addr += drmem_info->lmb_size;
>     442	
>     443				lmb->drc_index = dr_cell.drc_index;
>     444				dr_cell.drc_index++;
>     445	
>     446				lmb->aa_index = dr_cell.aa_index;
>     447				lmb->flags = dr_cell.flags;
>     448	
>     449				/*
>     450				 * All the LMB in the set share the same NUMA
>     451				 * associativity property. So read that node only once.
>     452				 */
>     453				if (!first) {
>     454					lmb_set_nid(lmb);
>     455					first = lmb;
>     456				} else {
>   > 457					lmb->nid = first->nid;
>     458				}
>     459			}
>     460		}
>     461	}
>     462	
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> 


^ permalink raw reply

* Re: [PATCH v3 2/4] powerpc/sstep: support emulation for vsx vector paired storage access instructions
From: Naveen N. Rao @ 2020-08-05 12:53 UTC (permalink / raw)
  To: Balamuruhan S, mpe; +Cc: paulus, linuxppc-dev, sandipan, ravi.bangoria
In-Reply-To: <20200731081637.1837559-3-bala24@linux.ibm.com>

Balamuruhan S wrote:
> add emulate_step() changes to support vsx vector paired storage
> access instructions that provides octword operands loads/stores
> between storage and set of 64 Vector Scalar Registers (VSRs).

This should be squashed in with the previous patch. Otherwise, emulation 
of these instructions won't be complete, which affects bisectability.

- Naveen


^ permalink raw reply

* [PATCH] usb: gadget: fix spelling mistake "Dectected" -> "Detected"
From: Colin King @ 2020-08-05 13:14 UTC (permalink / raw)
  To: Li Yang, Felipe Balbi, Greg Kroah-Hartman, linux-usb,
	linuxppc-dev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There is a spelling mistake in a literal string. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index b2638e83bb49..c79407f2d577 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -2061,7 +2061,7 @@ static int fsl_proc_read(struct seq_file *m, void *v)
 			"Sleep Enable: %d SOF Received Enable: %d "
 			"Reset Enable: %d\n"
 			"System Error Enable: %d "
-			"Port Change Dectected Enable: %d\n"
+			"Port Change Detected Enable: %d\n"
 			"USB Error Intr Enable: %d USB Intr Enable: %d\n\n",
 			(tmp_reg & USB_INTR_DEVICE_SUSPEND) ? 1 : 0,
 			(tmp_reg & USB_INTR_SOF_EN) ? 1 : 0,
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Vinod Koul @ 2020-08-05 13:19 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Song Liu, Alexei Starovoitov, Shawn Guo, Fabio Estevam,
	Daniel Borkmann, John Fastabend, Zhang Wei, NXP Linux Team,
	Yonghong Song, Andrii Nakryiko, Sascha Hauer, KP Singh,
	Dan Williams, linux-arm-kernel, netdev, linux-kernel, Li Yang,
	Pengutronix Kernel Team, dmaengine, bpf, linuxppc-dev,
	Martin KaFai Lau
In-Reply-To: <20200727203413.GA6245@embeddedor>

On 27-07-20, 15:34, Gustavo A. R. Silva wrote:

> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 2c508ee672b9..9b69716172a4 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>  
>  		if (_state(thrd) == PL330_STATE_KILLING)
>  			UNTIL(thrd, PL330_STATE_STOPPED)
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_FAULTING:
>  		_stop(thrd);
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_KILLING:
>  	case PL330_STATE_COMPLETING:
>  		UNTIL(thrd, PL330_STATE_STOPPED)
> -		/* fall through */
> +		fallthrough;
>  
>  	case PL330_STATE_STOPPED:
>  		return _trigger(thrd);
> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>  
>  	switch (direction) {
>  	case DMA_MEM_TO_MEM:
> -		/* fall through */
>  	case DMA_MEM_TO_DEV:
>  		off += _emit_LD(dry_run, &buf[off], cond);
>  		break;
> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>  
>  	switch (direction) {
>  	case DMA_MEM_TO_MEM:
> -		/* fall through */
>  	case DMA_DEV_TO_MEM:
>  		off += _emit_ST(dry_run, &buf[off], cond);
>  		break;
> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>  
>  	switch (pxs->desc->rqtype) {
>  	case DMA_MEM_TO_DEV:
> -		/* fall through */
>  	case DMA_DEV_TO_MEM:
>  		off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>  			cond);
> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>  
>  	switch (pxs->desc->rqtype) {
>  	case DMA_MEM_TO_DEV:
> -		/* fall through */

replacement missed here and above few

-- 
~Vinod

^ permalink raw reply

* [PATCH v2] powerpc/drmem: Don't compute the NUMA node for each LMB
From: Laurent Dufour @ 2020-08-05 13:35 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: nathanl, cheloha
In-Reply-To: <202008051807.Vi8NDJtX%lkp@intel.com>

All the LMB from the same set of ibm,dynamic-memory-v2 property are
sharing the same NUMA node. Don't compute that node for each one.

Tested on a system with 1022 LMBs spread on 4 NUMA nodes, only 4 calls to
lmb_set_nid() have been made instead of 1022.

This should prevent some soft lockups when starting large guests

Code has meaning only if CONFIG_MEMORY_HOTPLUG is set, otherwise the nid
field is not present in the drmem_lmb structure.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
 arch/powerpc/mm/drmem.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index b2eeea39684c..c11b6ec99ea3 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -402,6 +402,9 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	const __be32 *p;
 	u32 i, j, lmb_sets;
 	int lmb_index;
+#ifdef CONFIG_MEMORY_HOTPLUG
+	struct drmem_lmb *first = NULL;
+#endif
 
 	lmb_sets = of_read_number(prop++, 1);
 	if (lmb_sets == 0)
@@ -426,6 +429,15 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 	for (i = 0; i < lmb_sets; i++) {
 		read_drconf_v2_cell(&dr_cell, &p);
 
+#ifdef CONFIG_MEMORY_HOTPLUG
+		/*
+		 * Fetch the NUMA node id for the fist set or if the
+		 * associativity index is different from the previous set.
+		 */
+		if (first && dr_cell.aa_index != first->aa_index)
+			first = NULL;
+#endif
+
 		for (j = 0; j < dr_cell.seq_lmbs; j++) {
 			lmb = &drmem_info->lmbs[lmb_index++];
 
@@ -438,7 +450,18 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
 
-			lmb_set_nid(lmb);
+#ifdef CONFIG_MEMORY_HOTPLUG
+			/*
+			 * All the LMB in the set share the same NUMA
+			 * associativity property. So read that node only once.
+			 */
+			if (!first) {
+				lmb_set_nid(lmb);
+				first = lmb;
+			} else {
+				lmb->nid = first->nid;
+			}
+#endif
 		}
 	}
 }
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 13:35 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Christophe Leroy, nathanl, arnd, linux-kernel,
	Tulio Magno Quites Machado Filho, Paul Mackerras, luto,
	linux-arch, tglx, vincenzo.frascino, linuxppc-dev
In-Reply-To: <87wo2dy5in.fsf@mpe.ellerman.id.au>

Hi!

On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> > frame whenever it has anything to same.
> 
> Yeah OK that would explain it.
> 
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> >     fbb60:	94 21 ff e0 	stwu    r1,-32(r1)

This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).

> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> >     .cfi_startproc
> >   	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
> >   	mflr		r0
> > +	PPC_STLU	r1, -STACK_FRAME_OVERHEAD(r1)
> >     .cfi_register lr, r0
> 
> The cfi_register should come directly after the mflr I think.

That is the idiomatic way to write it, and most obviously correct.  But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding.  Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)

> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if 
> > you think that's needed, just let's do it:
> 
> Hmm, true.
> 
> The compiler will use the toc for globals (and possibly also for large
> constants?)

And anything else it bloody well wants to, yeah :-)

> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.

Yes.

> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.

If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)

> So yeah we should probably do as below.

[ snip ]

Looks good yes.


Segher

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 14:03 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> Provide vdso_shift_ns(), as the generic x >> s gives the following
> bad result:
> 
>   18:	35 25 ff e0 	addic.  r9,r5,-32
>   1c:	41 80 00 10 	blt     2c <shift+0x14>
>   20:	7c 64 4c 30 	srw     r4,r3,r9
>   24:	38 60 00 00 	li      r3,0
> ...
>   2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
>   30:	21 45 00 1f 	subfic  r10,r5,31
>   34:	7c 84 2c 30 	srw     r4,r4,r5
>   38:	7d 29 50 30 	slw     r9,r9,r10
>   3c:	7c 63 2c 30 	srw     r3,r3,r5
>   40:	7d 24 23 78 	or      r4,r9,r4
> 
> In our case the shift is always <= 32. In addition,  the upper 32 bits
> of the result are likely nul. Lets GCC know it, it also optimises the
> following calculations.
> 
> With the patch, we get:
>    0:	21 25 00 20 	subfic  r9,r5,32
>    4:	7c 69 48 30 	slw     r9,r3,r9
>    8:	7c 84 2c 30 	srw     r4,r4,r5
>    c:	7d 24 23 78 	or      r4,r9,r4
>   10:	7c 63 2c 30 	srw     r3,r3,r5

See below.  Such code is valid on PowerPC for all shift < 64, and a
future version of GCC will do that (it is on various TODO lists, it is
bound to happen *some* day ;-), but it won't help you yet of course).


> +/*
> + * The macros sets two stack frames, one for the caller and one for the callee
> + * because there are no requirement for the caller to set a stack frame when
> + * calling VDSO so it may have omitted to set one, especially on PPC64
> + */

If the caller follows the ABI, there always is a stack frame.  So what
is going on?


> +/*
> + * powerpc specific delta calculation.
> + *
> + * This variant removes the masking of the subtraction because the
> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
> + * which would result in a pointless operation. The compiler cannot
> + * optimize it away as the mask comes from the vdso data and is not compile
> + * time constant.
> + */

It cannot optimise it because it does not know shift < 32.  The code
below is incorrect for shift equal to 32, fwiw.

> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
> +{
> +	return (cycles - last) * mult;
> +}
> +#define vdso_calc_delta vdso_calc_delta
> +
> +#ifndef __powerpc64__
> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> +{
> +	u32 hi = ns >> 32;
> +	u32 lo = ns;
> +
> +	lo >>= shift;
> +	lo |= hi << (32 - shift);
> +	hi >>= shift;


> +	if (likely(hi == 0))
> +		return lo;

Removing these two lines shouldn't change generated object code?  Or not
make it worse, at least.

> +	return ((u64)hi << 32) | lo;
> +}


What does the compiler do for just

static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
	return ns >> (shift & 31);
}

?


Segher

^ permalink raw reply

* [PATCH] powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

On BOOK3S_32, when we have modules and strict kernel RWX, modules
are not in vmalloc space but in a dedicated segment that is
below PAGE_OFFSET.

So KASAN_SHADOW_START must take it into account.

MODULES_VADDR can't be used because it is not defined yet
in kasan.h

Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/kasan.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index d635b96c7ea6..7355ed05e65e 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -15,11 +15,18 @@
 #ifndef __ASSEMBLY__
 
 #include <asm/page.h>
+#include <linux/sizes.h>
 
 #define KASAN_SHADOW_SCALE_SHIFT	3
 
+#if defined(CONFIG_PPC_BOOK3S_32) && defined(CONFIG_MODULES) && defined(CONFIG_STRICT_KERNEL_RWX)
+#define KASAN_KERN_START	ALIGN_DOWN(PAGE_OFFSET - SZ_256M, SZ_256M)
+#else
+#define KASAN_KERN_START	PAGE_OFFSET
+#endif
+
 #define KASAN_SHADOW_START	(KASAN_SHADOW_OFFSET + \
-				 (PAGE_OFFSET >> KASAN_SHADOW_SCALE_SHIFT))
+				 (KASAN_KERN_START >> KASAN_SHADOW_SCALE_SHIFT))
 
 #define KASAN_SHADOW_OFFSET	ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
 
-- 
2.25.0


^ permalink raw reply related

* [PATCH] powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
From: Christophe Leroy @ 2020-08-05 15:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

When MODULES_VADDR is defined, is_module_segment() shall check the
address against it instead of checking agains VMALLOC_START.

Fixes: 6ca055322da8 ("powerpc/32s: Use dedicated segment for modules with STRICT_KERNEL_RWX")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/mm/book3s32/mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index c0162911f6cb..82ae9e06a773 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -191,10 +191,17 @@ static bool is_module_segment(unsigned long addr)
 {
 	if (!IS_ENABLED(CONFIG_MODULES))
 		return false;
+#ifdef MODULES_VADDR
+	if (addr < ALIGN_DOWN(MODULES_VADDR, SZ_256M))
+		return false;
+	if (addr >= ALIGN(MODULES_END, SZ_256M))
+		return false;
+#else
 	if (addr < ALIGN_DOWN(VMALLOC_START, SZ_256M))
 		return false;
 	if (addr >= ALIGN(VMALLOC_END, SZ_256M))
 		return false;
+#endif
 	return true;
 }
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH] powerpc/40x: Fix assembler warning about r0
From: Christophe Leroy @ 2020-08-05 15:52 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20200722022422.825197-1-mpe@ellerman.id.au>



Le 22/07/2020 à 04:24, Michael Ellerman a écrit :
> The assembler says:
>    arch/powerpc/kernel/head_40x.S:623: Warning: invalid register expression

I get exactly the same with head_32.S, for the exact same reason.

Christophe

> 
> It's objecting to the use of r0 as the RA argument. That's because
> when RA = 0 the literal value 0 is used, rather than the content of
> r0, making the use of r0 in the source potentially confusing.
> 
> Fix it to use a literal 0, the generated code is identical.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>   arch/powerpc/kernel/head_40x.S | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/head_40x.S b/arch/powerpc/kernel/head_40x.S
> index 926bfa73586a..5b282d9965a5 100644
> --- a/arch/powerpc/kernel/head_40x.S
> +++ b/arch/powerpc/kernel/head_40x.S
> @@ -620,7 +620,7 @@ _ENTRY(saved_ksp_limit)
>   	ori	r6, r6, swapper_pg_dir@l
>   	lis	r5, abatron_pteptrs@h
>   	ori	r5, r5, abatron_pteptrs@l
> -	stw	r5, 0xf0(r0)	/* Must match your Abatron config file */
> +	stw	r5, 0xf0(0)	/* Must match your Abatron config file */
>   	tophys(r5,r5)
>   	stw	r6, 0(r5)
>   
> 

^ permalink raw reply

* Re: [PATCH][next] dmaengine: Use fallthrough pseudo-keyword
From: Tyrel Datwyler @ 2020-08-05 16:14 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20200805131922.GZ12965@vkoul-mobl>

On 8/5/20 6:19 AM, Vinod Koul wrote:
> On 27-07-20, 15:34, Gustavo A. R. Silva wrote:
> 
>> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
>> index 2c508ee672b9..9b69716172a4 100644
>> --- a/drivers/dma/pl330.c
>> +++ b/drivers/dma/pl330.c
>> @@ -1061,16 +1061,16 @@ static bool _start(struct pl330_thread *thrd)
>>  
>>  		if (_state(thrd) == PL330_STATE_KILLING)
>>  			UNTIL(thrd, PL330_STATE_STOPPED)
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_FAULTING:
>>  		_stop(thrd);
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_KILLING:
>>  	case PL330_STATE_COMPLETING:
>>  		UNTIL(thrd, PL330_STATE_STOPPED)
>> -		/* fall through */
>> +		fallthrough;
>>  
>>  	case PL330_STATE_STOPPED:
>>  		return _trigger(thrd);
>> @@ -1121,7 +1121,6 @@ static u32 _emit_load(unsigned int dry_run, u8 buf[],
>>  
>>  	switch (direction) {
>>  	case DMA_MEM_TO_MEM:
>> -		/* fall through */
>>  	case DMA_MEM_TO_DEV:
>>  		off += _emit_LD(dry_run, &buf[off], cond);
>>  		break;
>> @@ -1155,7 +1154,6 @@ static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
>>  
>>  	switch (direction) {
>>  	case DMA_MEM_TO_MEM:
>> -		/* fall through */
>>  	case DMA_DEV_TO_MEM:
>>  		off += _emit_ST(dry_run, &buf[off], cond);
>>  		break;
>> @@ -1216,7 +1214,6 @@ static int _bursts(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[],
>>  
>>  	switch (pxs->desc->rqtype) {
>>  	case DMA_MEM_TO_DEV:
>> -		/* fall through */
>>  	case DMA_DEV_TO_MEM:
>>  		off += _ldst_peripheral(pl330, dry_run, &buf[off], pxs, cyc,
>>  			cond);
>> @@ -1266,7 +1263,6 @@ static int _dregs(struct pl330_dmac *pl330, unsigned int dry_run, u8 buf[],
>>  
>>  	switch (pxs->desc->rqtype) {
>>  	case DMA_MEM_TO_DEV:
>> -		/* fall through */
> 
> replacement missed here and above few
> 

Its not obvious via most of the documentation, but a case label followed
immediately by another case label is the only allowed implicit fall through as
it is obvious to the eye that there is no postceding statement.

For example the following is legal:

    case FOO:
    	/* fallthrough */ (or fallthrough;)
    case BAR:

is converted to:

    case FOO:
    case BAR:

I would assume the justification is that it is common to have a switch statement
where several case statements fall directly through to a single code block and
annotating each case label seems like overkill.

        switch (vhost->state) {
        case IBMVFC_LINK_DEAD:
        case IBMVFC_HOST_OFFLINE:
                result = DID_NO_CONNECT << 16;
                break;
        case IBMVFC_NO_CRQ:
        case IBMVFC_INITIALIZING:
        case IBMVFC_HALTED:
        case IBMVFC_LINK_DOWN:
                result = DID_REQUEUE << 16;
                break;
        case IBMVFC_ACTIVE:
                result = 0;
                break;
        }

Regards,

Tyrel

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>

Hi,

On 08/05/2020 02:03 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * powerpc specific delta calculation.
>> + *
>> + * This variant removes the masking of the subtraction because the
>> + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX
>> + * which would result in a pointless operation. The compiler cannot
>> + * optimize it away as the mask comes from the vdso data and is not compile
>> + * time constant.
>> + */
> 
> It cannot optimise it because it does not know shift < 32.  The code
> below is incorrect for shift equal to 32, fwiw.

Is there a way to tell it ?

> 
>> +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
>> +{
>> +	return (cycles - last) * mult;
>> +}
>> +#define vdso_calc_delta vdso_calc_delta
>> +
>> +#ifndef __powerpc64__
>> +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
>> +{
>> +	u32 hi = ns >> 32;
>> +	u32 lo = ns;
>> +
>> +	lo >>= shift;
>> +	lo |= hi << (32 - shift);
>> +	hi >>= shift;
> 
> 
>> +	if (likely(hi == 0))
>> +		return lo;
> 
> Removing these two lines shouldn't change generated object code?  Or not
> make it worse, at least.

I remember it made noticeable difference allthough I can't remember the 
details. See below with GCC 10.1. At least we see that with those two 
lines, GCC only sets a 16 bytes stack frame. Without those lines it sets 
a 32 bytes stack frame and seems to save some values for no reason.

With the two lines:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 04 	bgt     7b4 <__c_kernel_clock_gettime+0x108>
  6b4:	39 40 00 01 	li      r10,1
  6b8:	7d 4a 18 30 	slw     r10,r10,r3
  6bc:	71 47 08 83 	andi.   r7,r10,2179
  6c0:	41 82 01 2c 	beq     7ec <__c_kernel_clock_gettime+0x140>
  6c4:	94 21 ff f0 	stwu    r1,-16(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 e1 00 0c 	stw     r31,12(r1)
  6d0:	7d 85 1a 14 	add     r12,r5,r3
  6d4:	80 05 00 00 	lwz     r0,0(r5)
  6d8:	70 06 00 01 	andi.   r6,r0,1
  6dc:	40 82 00 d4 	bne     7b0 <__c_kernel_clock_gettime+0x104>
  6e0:	7d 4d 42 e6 	mftbu   r10
  6e4:	7d 6c 42 e6 	mftb    r11
  6e8:	7c ed 42 e6 	mftbu   r7
  6ec:	7c 0a 38 40 	cmplw   r10,r7
  6f0:	40 82 ff f0 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  6f4:	80 e5 00 0c 	lwz     r7,12(r5)
  6f8:	80 65 00 08 	lwz     r3,8(r5)
  6fc:	7c e7 58 10 	subfc   r7,r7,r11
  700:	81 65 00 18 	lwz     r11,24(r5)
  704:	7d 43 51 10 	subfe   r10,r3,r10
  708:	7f e7 58 16 	mulhwu  r31,r7,r11
  70c:	7d 4a 59 d6 	mullw   r10,r10,r11
  710:	7c e7 59 d6 	mullw   r7,r7,r11
  714:	80 6c 00 2c 	lwz     r3,44(r12)
  718:	81 6c 00 28 	lwz     r11,40(r12)
  71c:	7c e7 18 14 	addc    r7,r7,r3
  720:	7d 4a fa 14 	add     r10,r10,r31
  724:	80 65 00 1c 	lwz     r3,28(r5)
  728:	7d 4a 59 14 	adde    r10,r10,r11
  72c:	7c e7 1c 30 	srw     r7,r7,r3
  730:	21 63 00 20 	subfic  r11,r3,32
  734:	7d 43 1c 31 	srw.    r3,r10,r3
  738:	7d 4a 58 30 	slw     r10,r10,r11
  73c:	7d 49 3b 78 	or      r9,r10,r7
  740:	39 00 00 00 	li      r8,0
  744:	40 82 00 84 	bne     7c8 <__c_kernel_clock_gettime+0x11c>
  748:	80 6c 00 24 	lwz     r3,36(r12)
  74c:	81 45 00 00 	lwz     r10,0(r5)
  750:	7c 00 50 40 	cmplw   r0,r10
  754:	40 a2 ff 80 	bne     6d4 <__c_kernel_clock_gettime+0x28>
  758:	2c 08 00 00 	cmpwi   r8,0
  75c:	41 82 00 7c 	beq     7d8 <__c_kernel_clock_gettime+0x12c>
  760:	3c e0 c4 65 	lis     r7,-15259
  764:	3c 00 3b 9a 	lis     r0,15258
  768:	60 e7 36 00 	ori     r7,r7,13824
  76c:	60 00 c9 ff 	ori     r0,r0,51711
  770:	7c a9 38 14 	addc    r5,r9,r7
  774:	7d 48 01 d4 	addme   r10,r8
  778:	2c 0a 00 00 	cmpwi   r10,0
  77c:	7d 48 53 78 	mr      r8,r10
  780:	7c a9 2b 78 	mr      r9,r5
  784:	38 c6 00 01 	addi    r6,r6,1
  788:	40 82 ff e8 	bne     770 <__c_kernel_clock_gettime+0xc4>
  78c:	7c 05 00 40 	cmplw   r5,r0
  790:	41 81 ff e0 	bgt     770 <__c_kernel_clock_gettime+0xc4>
  794:	7c 66 18 14 	addc    r3,r6,r3
  798:	90 64 00 00 	stw     r3,0(r4)
  79c:	91 24 00 04 	stw     r9,4(r4)
  7a0:	38 60 00 00 	li      r3,0
  7a4:	83 e1 00 0c 	lwz     r31,12(r1)
  7a8:	38 21 00 10 	addi    r1,r1,16
  7ac:	4e 80 00 20 	blr
  7b0:	4b ff ff 24 	b       6d4 <__c_kernel_clock_gettime+0x28>
  7b4:	38 00 00 f6 	li      r0,246
  7b8:	44 00 00 02 	sc
  7bc:	40 a3 00 08 	bns     7c4 <__c_kernel_clock_gettime+0x118>
  7c0:	7c 63 00 d0 	neg     r3,r3
  7c4:	4e 80 00 20 	blr
  7c8:	7d 2a 4b 78 	mr      r10,r9
  7cc:	7c 68 1b 78 	mr      r8,r3
  7d0:	7d 49 53 78 	mr      r9,r10
  7d4:	4b ff ff 74 	b       748 <__c_kernel_clock_gettime+0x9c>
  7d8:	3d 40 3b 9a 	lis     r10,15258
  7dc:	61 4a c9 ff 	ori     r10,r10,51711
  7e0:	7c 09 50 40 	cmplw   r9,r10
  7e4:	41 81 ff 7c 	bgt     760 <__c_kernel_clock_gettime+0xb4>
  7e8:	4b ff ff b0 	b       798 <__c_kernel_clock_gettime+0xec>
  7ec:	71 47 00 60 	andi.   r7,r10,96
  7f0:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  7f4:	7d 25 4a 14 	add     r9,r5,r9
  7f8:	40 82 00 14 	bne     80c <__c_kernel_clock_gettime+0x160>
  7fc:	71 4a 00 10 	andi.   r10,r10,16
  800:	41 a2 ff b4 	beq     7b4 <__c_kernel_clock_gettime+0x108>
  804:	38 a5 00 f0 	addi    r5,r5,240
  808:	4b ff fe bc 	b       6c4 <__c_kernel_clock_gettime+0x18>
  80c:	81 05 00 00 	lwz     r8,0(r5)
  810:	71 0a 00 01 	andi.   r10,r8,1
  814:	40 a2 ff f8 	bne     80c <__c_kernel_clock_gettime+0x160>
  818:	80 69 00 24 	lwz     r3,36(r9)
  81c:	81 49 00 2c 	lwz     r10,44(r9)
  820:	80 e5 00 00 	lwz     r7,0(r5)
  824:	7c 08 38 40 	cmplw   r8,r7
  828:	40 a2 ff e4 	bne     80c <__c_kernel_clock_gettime+0x160>
  82c:	90 64 00 00 	stw     r3,0(r4)
  830:	91 44 00 04 	stw     r10,4(r4)
  834:	38 60 00 00 	li      r3,0
  838:	4e 80 00 20 	blr


Without the two lines:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 14 	bgt     7c4 <__c_kernel_clock_gettime+0x118>
  6b4:	39 20 00 01 	li      r9,1
  6b8:	7d 29 18 30 	slw     r9,r9,r3
  6bc:	71 2a 08 83 	andi.   r10,r9,2179
  6c0:	41 82 01 2c 	beq     7ec <__c_kernel_clock_gettime+0x140>
  6c4:	94 21 ff e0 	stwu    r1,-32(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 81 00 10 	stw     r28,16(r1)
  6d0:	93 a1 00 14 	stw     r29,20(r1)
  6d4:	93 c1 00 18 	stw     r30,24(r1)
  6d8:	93 e1 00 1c 	stw     r31,28(r1)
  6dc:	7c 65 1a 14 	add     r3,r5,r3
  6e0:	81 85 00 00 	lwz     r12,0(r5)
  6e4:	71 87 00 01 	andi.   r7,r12,1
  6e8:	40 82 00 d8 	bne     7c0 <__c_kernel_clock_gettime+0x114>
  6ec:	7d 2d 42 e6 	mftbu   r9
  6f0:	7c cc 42 e6 	mftb    r6
  6f4:	7d 4d 42 e6 	mftbu   r10
  6f8:	7c 09 50 40 	cmplw   r9,r10
  6fc:	40 82 ff f0 	bne     6ec <__c_kernel_clock_gettime+0x40>
  700:	83 83 00 28 	lwz     r28,40(r3)
  704:	83 a3 00 2c 	lwz     r29,44(r3)
  708:	81 65 00 08 	lwz     r11,8(r5)
  70c:	81 05 00 0c 	lwz     r8,12(r5)
  710:	83 c5 00 18 	lwz     r30,24(r5)
  714:	83 e5 00 1c 	lwz     r31,28(r5)
  718:	80 03 00 24 	lwz     r0,36(r3)
  71c:	81 45 00 00 	lwz     r10,0(r5)
  720:	7c 0c 50 40 	cmplw   r12,r10
  724:	40 a2 ff bc 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  728:	7d 48 30 10 	subfc   r10,r8,r6
  72c:	7c cb 49 10 	subfe   r6,r11,r9
  730:	7c c6 f1 d6 	mullw   r6,r6,r30
  734:	7d 2a f0 16 	mulhwu  r9,r10,r30
  738:	7d 4a f1 d6 	mullw   r10,r10,r30
  73c:	7c c6 4a 14 	add     r6,r6,r9
  740:	7d 4a e8 14 	addc    r10,r10,r29
  744:	7c c6 e1 14 	adde    r6,r6,r28
  748:	7c c8 fc 30 	srw     r8,r6,r31
  74c:	2c 08 00 00 	cmpwi   r8,0
  750:	20 bf 00 20 	subfic  r5,r31,32
  754:	7d 4a fc 30 	srw     r10,r10,r31
  758:	7c c5 28 30 	slw     r5,r6,r5
  75c:	7c a9 53 78 	or      r9,r5,r10
  760:	41 82 00 78 	beq     7d8 <__c_kernel_clock_gettime+0x12c>
  764:	3c c0 c4 65 	lis     r6,-15259
  768:	3c 60 3b 9a 	lis     r3,15258
  76c:	60 c6 36 00 	ori     r6,r6,13824
  770:	60 63 c9 ff 	ori     r3,r3,51711
  774:	7c a9 30 14 	addc    r5,r9,r6
  778:	7d 48 01 d4 	addme   r10,r8
  77c:	2c 0a 00 00 	cmpwi   r10,0
  780:	7d 48 53 78 	mr      r8,r10
  784:	7c a9 2b 78 	mr      r9,r5
  788:	38 e7 00 01 	addi    r7,r7,1
  78c:	40 82 ff e8 	bne     774 <__c_kernel_clock_gettime+0xc8>
  790:	7c 05 18 40 	cmplw   r5,r3
  794:	41 81 ff e0 	bgt     774 <__c_kernel_clock_gettime+0xc8>
  798:	7c 07 00 14 	addc    r0,r7,r0
  79c:	90 04 00 00 	stw     r0,0(r4)
  7a0:	91 24 00 04 	stw     r9,4(r4)
  7a4:	38 60 00 00 	li      r3,0
  7a8:	83 81 00 10 	lwz     r28,16(r1)
  7ac:	83 a1 00 14 	lwz     r29,20(r1)
  7b0:	83 c1 00 18 	lwz     r30,24(r1)
  7b4:	83 e1 00 1c 	lwz     r31,28(r1)
  7b8:	38 21 00 20 	addi    r1,r1,32
  7bc:	4e 80 00 20 	blr
  7c0:	4b ff ff 20 	b       6e0 <__c_kernel_clock_gettime+0x34>
  7c4:	38 00 00 f6 	li      r0,246
  7c8:	44 00 00 02 	sc
  7cc:	40 a3 00 08 	bns     7d4 <__c_kernel_clock_gettime+0x128>
  7d0:	7c 63 00 d0 	neg     r3,r3
  7d4:	4e 80 00 20 	blr
  7d8:	3d 40 3b 9a 	lis     r10,15258
  7dc:	61 4a c9 ff 	ori     r10,r10,51711
  7e0:	7c 09 50 40 	cmplw   r9,r10
  7e4:	41 81 ff 80 	bgt     764 <__c_kernel_clock_gettime+0xb8>
  7e8:	4b ff ff b4 	b       79c <__c_kernel_clock_gettime+0xf0>
  7ec:	71 2a 00 60 	andi.   r10,r9,96
  7f0:	40 82 00 14 	bne     804 <__c_kernel_clock_gettime+0x158>
  7f4:	71 29 00 10 	andi.   r9,r9,16
  7f8:	41 a2 ff cc 	beq     7c4 <__c_kernel_clock_gettime+0x118>
  7fc:	38 a5 00 f0 	addi    r5,r5,240
  800:	4b ff fe c4 	b       6c4 <__c_kernel_clock_gettime+0x18>
  804:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  808:	7d 25 4a 14 	add     r9,r5,r9
  80c:	81 05 00 00 	lwz     r8,0(r5)
  810:	71 0a 00 01 	andi.   r10,r8,1
  814:	40 82 00 28 	bne     83c <__c_kernel_clock_gettime+0x190>
  818:	80 09 00 24 	lwz     r0,36(r9)
  81c:	81 49 00 2c 	lwz     r10,44(r9)
  820:	80 e5 00 00 	lwz     r7,0(r5)
  824:	7c 08 38 40 	cmplw   r8,r7
  828:	40 a2 ff e4 	bne     80c <__c_kernel_clock_gettime+0x160>
  82c:	90 04 00 00 	stw     r0,0(r4)
  830:	91 44 00 04 	stw     r10,4(r4)
  834:	38 60 00 00 	li      r3,0
  838:	4e 80 00 20 	blr
  83c:	4b ff ff d0 	b       80c <__c_kernel_clock_gettime+0x160>


> 
>> +	return ((u64)hi << 32) | lo;
>> +}
> 
> 
> What does the compiler do for just
> 
> static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> 	return ns >> (shift & 31);
> }
> 

Worse:

000006ac <__c_kernel_clock_gettime>:
  6ac:	28 03 00 0f 	cmplwi  r3,15
  6b0:	41 81 01 30 	bgt     7e0 <__c_kernel_clock_gettime+0x134>
  6b4:	39 20 00 01 	li      r9,1
  6b8:	7d 29 18 30 	slw     r9,r9,r3
  6bc:	71 2a 08 83 	andi.   r10,r9,2179
  6c0:	41 82 01 48 	beq     808 <__c_kernel_clock_gettime+0x15c>
  6c4:	94 21 ff e0 	stwu    r1,-32(r1)
  6c8:	54 63 20 36 	rlwinm  r3,r3,4,0,27
  6cc:	93 81 00 10 	stw     r28,16(r1)
  6d0:	93 a1 00 14 	stw     r29,20(r1)
  6d4:	93 c1 00 18 	stw     r30,24(r1)
  6d8:	93 e1 00 1c 	stw     r31,28(r1)
  6dc:	7c 65 1a 14 	add     r3,r5,r3
  6e0:	80 c5 00 00 	lwz     r6,0(r5)
  6e4:	70 c7 00 01 	andi.   r7,r6,1
  6e8:	40 82 00 f4 	bne     7dc <__c_kernel_clock_gettime+0x130>
  6ec:	7d 2d 42 e6 	mftbu   r9
  6f0:	7d 0c 42 e6 	mftb    r8
  6f4:	7d 4d 42 e6 	mftbu   r10
  6f8:	7c 09 50 40 	cmplw   r9,r10
  6fc:	40 82 ff f0 	bne     6ec <__c_kernel_clock_gettime+0x40>
  700:	83 83 00 28 	lwz     r28,40(r3)
  704:	83 c3 00 2c 	lwz     r30,44(r3)
  708:	81 65 00 08 	lwz     r11,8(r5)
  70c:	81 45 00 0c 	lwz     r10,12(r5)
  710:	83 e5 00 18 	lwz     r31,24(r5)
  714:	81 85 00 1c 	lwz     r12,28(r5)
  718:	80 03 00 24 	lwz     r0,36(r3)
  71c:	83 a5 00 00 	lwz     r29,0(r5)
  720:	7c 06 e8 40 	cmplw   r6,r29
  724:	40 a2 ff bc 	bne     6e0 <__c_kernel_clock_gettime+0x34>
  728:	7d 0a 40 10 	subfc   r8,r10,r8
  72c:	7c cb 49 10 	subfe   r6,r11,r9
  730:	7c c6 f9 d6 	mullw   r6,r6,r31
  734:	7d 28 f8 16 	mulhwu  r9,r8,r31
  738:	7d 08 f9 d6 	mullw   r8,r8,r31
  73c:	55 8c 06 fe 	clrlwi  r12,r12,27
  740:	7f c8 f0 14 	addc    r30,r8,r30
  744:	7c c6 4a 14 	add     r6,r6,r9
  748:	7c c6 e1 14 	adde    r6,r6,r28
  74c:	34 6c ff e0 	addic.  r3,r12,-32
  750:	41 80 00 70 	blt     7c0 <__c_kernel_clock_gettime+0x114>
  754:	7c c9 1c 30 	srw     r9,r6,r3
  758:	39 00 00 00 	li      r8,0
  75c:	2c 08 00 00 	cmpwi   r8,0
  760:	41 82 00 94 	beq     7f4 <__c_kernel_clock_gettime+0x148>
  764:	3c c0 c4 65 	lis     r6,-15259
  768:	3c 60 3b 9a 	lis     r3,15258
  76c:	60 c6 36 00 	ori     r6,r6,13824
  770:	60 63 c9 ff 	ori     r3,r3,51711
  774:	7c a9 30 14 	addc    r5,r9,r6
  778:	7d 48 01 d4 	addme   r10,r8
  77c:	2c 0a 00 00 	cmpwi   r10,0
  780:	7d 48 53 78 	mr      r8,r10
  784:	7c a9 2b 78 	mr      r9,r5
  788:	38 e7 00 01 	addi    r7,r7,1
  78c:	40 82 ff e8 	bne     774 <__c_kernel_clock_gettime+0xc8>
  790:	7c 05 18 40 	cmplw   r5,r3
  794:	41 81 ff e0 	bgt     774 <__c_kernel_clock_gettime+0xc8>
  798:	7c 07 00 14 	addc    r0,r7,r0
  79c:	90 04 00 00 	stw     r0,0(r4)
  7a0:	91 24 00 04 	stw     r9,4(r4)
  7a4:	38 60 00 00 	li      r3,0
  7a8:	83 81 00 10 	lwz     r28,16(r1)
  7ac:	83 a1 00 14 	lwz     r29,20(r1)
  7b0:	83 c1 00 18 	lwz     r30,24(r1)
  7b4:	83 e1 00 1c 	lwz     r31,28(r1)
  7b8:	38 21 00 20 	addi    r1,r1,32
  7bc:	4e 80 00 20 	blr
  7c0:	54 c3 08 3c 	rlwinm  r3,r6,1,0,30
  7c4:	21 6c 00 1f 	subfic  r11,r12,31
  7c8:	7c 63 58 30 	slw     r3,r3,r11
  7cc:	7f c9 64 30 	srw     r9,r30,r12
  7d0:	7c 69 4b 78 	or      r9,r3,r9
  7d4:	7c c8 64 30 	srw     r8,r6,r12
  7d8:	4b ff ff 84 	b       75c <__c_kernel_clock_gettime+0xb0>
  7dc:	4b ff ff 04 	b       6e0 <__c_kernel_clock_gettime+0x34>
  7e0:	38 00 00 f6 	li      r0,246
  7e4:	44 00 00 02 	sc
  7e8:	40 a3 00 08 	bns     7f0 <__c_kernel_clock_gettime+0x144>
  7ec:	7c 63 00 d0 	neg     r3,r3
  7f0:	4e 80 00 20 	blr
  7f4:	3d 40 3b 9a 	lis     r10,15258
  7f8:	61 4a c9 ff 	ori     r10,r10,51711
  7fc:	7c 09 50 40 	cmplw   r9,r10
  800:	41 81 ff 64 	bgt     764 <__c_kernel_clock_gettime+0xb8>
  804:	4b ff ff 98 	b       79c <__c_kernel_clock_gettime+0xf0>
  808:	71 2a 00 60 	andi.   r10,r9,96
  80c:	40 82 00 14 	bne     820 <__c_kernel_clock_gettime+0x174>
  810:	71 29 00 10 	andi.   r9,r9,16
  814:	41 a2 ff cc 	beq     7e0 <__c_kernel_clock_gettime+0x134>
  818:	38 a5 00 f0 	addi    r5,r5,240
  81c:	4b ff fe a8 	b       6c4 <__c_kernel_clock_gettime+0x18>
  820:	54 69 20 36 	rlwinm  r9,r3,4,0,27
  824:	7d 25 4a 14 	add     r9,r5,r9
  828:	81 05 00 00 	lwz     r8,0(r5)
  82c:	71 0a 00 01 	andi.   r10,r8,1
  830:	40 82 00 28 	bne     858 <__c_kernel_clock_gettime+0x1ac>
  834:	80 09 00 24 	lwz     r0,36(r9)
  838:	81 49 00 2c 	lwz     r10,44(r9)
  83c:	80 e5 00 00 	lwz     r7,0(r5)
  840:	7c 08 38 40 	cmplw   r8,r7
  844:	40 a2 ff e4 	bne     828 <__c_kernel_clock_gettime+0x17c>
  848:	90 04 00 00 	stw     r0,0(r4)
  84c:	91 44 00 04 	stw     r10,4(r4)
  850:	38 60 00 00 	li      r3,0
  854:	4e 80 00 20 	blr
  858:	4b ff ff d0 	b       828 <__c_kernel_clock_gettime+0x17c>

Christophe

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Christophe Leroy @ 2020-08-05 16:51 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <20200805140307.GO6753@gate.crashing.org>

Hi Again,

Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> Hi!
> 
> On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
>> +/*
>> + * The macros sets two stack frames, one for the caller and one for the callee
>> + * because there are no requirement for the caller to set a stack frame when
>> + * calling VDSO so it may have omitted to set one, especially on PPC64
>> + */
> 
> If the caller follows the ABI, there always is a stack frame.  So what
> is going on?

Looks like it is not the case. See discussion at 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/

Seems like GCC uses the redzone and doesn't set a stack frame. I guess 
it doesn't know that the inline assembly contains a function call so it 
doesn't set the frame.


Christophe

^ permalink raw reply

* Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions
From: Miguel Ojeda @ 2020-08-05 17:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
	Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
	linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-18-rppt@kernel.org>

On Sun, Aug 2, 2020 at 6:41 PM Mike Rapoport <rppt@kernel.org> wrote:
>
>  .clang-format                  |  3 ++-

The .clang-format bit:

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH v2 16/17] memblock: implement for_each_reserved_mem_region() using __next_mem_region()
From: Miguel Ojeda @ 2020-08-05 17:11 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Thomas Gleixner, Emil Renner Berthing, linux-sh, Peter Zijlstra,
	Dave Hansen, linux-mips, Max Filippov, Paul Mackerras, sparclinux,
	linux-riscv, Will Deacon, Christoph Hellwig, Marek Szyprowski,
	linux-arch, linux-s390, linux-c6x-dev, Baoquan He,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), Russell King,
	Mike Rapoport, clang-built-linux, Ingo Molnar, Linux ARM,
	Catalin Marinas, uclinux-h8-devel, linux-xtensa, openrisc,
	Borislav Petkov, Andy Lutomirski, Paul Walmsley, Stafford Horne,
	Hari Bathini, Michal Simek, Yoshinori Sato, Linux-MM,
	linux-kernel, iommu, Palmer Dabbelt, Andrew Morton, linuxppc-dev
In-Reply-To: <20200802163601.8189-17-rppt@kernel.org>

On Sun, Aug 2, 2020 at 6:40 PM Mike Rapoport <rppt@kernel.org> wrote:
>
>  .clang-format                    |  2 +-

The .clang-format bit:

Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>

Cheers,
Miguel

^ permalink raw reply

* Re: [PATCH v5 0/4] Allow bigger 64bit window by removing default DMA window
From: Leonardo Bras @ 2020-08-05 18:08 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Alexey Kardashevskiy, Thiago Jung Bauermann, Ram Pai, Brian King,
	Murilo Fossa Vicentini, David Dai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200805030455.123024-1-leobras.c@gmail.com>

Travis reported successful compilation with mpe/merge:

https://travis-ci.org/github/LeoBras/linux-ppc/builds/715028857


^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 18:40 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <3db2a590-b842-83db-ed2b-f3ee62595f18@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 04:40:16PM +0000, Christophe Leroy wrote:
> >It cannot optimise it because it does not know shift < 32.  The code
> >below is incorrect for shift equal to 32, fwiw.
> 
> Is there a way to tell it ?

Sure, for example the &31 should work (but it doesn't, with the GCC
version you used -- which version is that?)

> >What does the compiler do for just
> >
> >static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
> >	return ns >> (shift & 31);
> >}
> >
> 
> Worse:

I cannot make heads or tails of all that branch spaghetti, sorry.

>  73c:	55 8c 06 fe 	clrlwi  r12,r12,27
>  740:	7f c8 f0 14 	addc    r30,r8,r30
>  744:	7c c6 4a 14 	add     r6,r6,r9
>  748:	7c c6 e1 14 	adde    r6,r6,r28
>  74c:	34 6c ff e0 	addic.  r3,r12,-32
>  750:	41 80 00 70 	blt     7c0 <__c_kernel_clock_gettime+0x114>

This branch is always true.  Hrm.


Segher

^ permalink raw reply

* Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
From: Segher Boessenkool @ 2020-08-05 20:55 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: nathanl, linux-arch, vincenzo.frascino, arnd, linux-kernel,
	Paul Mackerras, luto, tglx, linuxppc-dev
In-Reply-To: <7d409421-6396-8eba-8250-b6c9ff8232d9@csgroup.eu>

Hi!

On Wed, Aug 05, 2020 at 06:51:44PM +0200, Christophe Leroy wrote:
> Le 05/08/2020 à 16:03, Segher Boessenkool a écrit :
> >On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote:
> >>+/*
> >>+ * The macros sets two stack frames, one for the caller and one for the 
> >>callee
> >>+ * because there are no requirement for the caller to set a stack frame 
> >>when
> >>+ * calling VDSO so it may have omitted to set one, especially on PPC64
> >>+ */
> >
> >If the caller follows the ABI, there always is a stack frame.  So what
> >is going on?
> 
> Looks like it is not the case. See discussion at 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/2a67c333893454868bbfda773ba4b01c20272a5d.1588079622.git.christophe.leroy@c-s.fr/
> 
> Seems like GCC uses the redzone and doesn't set a stack frame. I guess 
> it doesn't know that the inline assembly contains a function call so it 
> doesn't set the frame.

Yes, that is the problem.  See
https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Extended-Asm.html#AssemblerTemplate
where this is (briefly) discussed:
  "Accessing data from C programs without using input/output operands
  (such as by using global symbols directly from the assembler
  template) may not work as expected. Similarly, calling functions
  directly from an assembler template requires a detailed understanding
  of the target assembler and ABI."

I don't know of a good way to tell GCC some function needs a frame (that
is, one that doesn't result in extra code other than to set up the
frame).  I'll think about it.


Segher

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox