LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCHv7 02/12] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
From: Rob Herring @ 2020-09-10 17:58 UTC (permalink / raw)
  To: Zhiqiang Hou
  Cc: devicetree, lorenzo.pieralisi, Xiaowei Bao, roy.zang, linux-pci,
	linux-kernel, leoyang.li, minghuan.Lian, jingoohan1,
	andrew.murray, mingkai.hu, gustavo.pimentel, bhelgaas, shawnguo,
	kishon, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200811095441.7636-3-Zhiqiang.Hou@nxp.com>

On Tue, Aug 11, 2020 at 05:54:31PM +0800, Zhiqiang Hou wrote:
> From: Xiaowei Bao <xiaowei.bao@nxp.com>
> 
> Add the doorbell mode of MSI-X in DWC EP driver.
> 
> Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> Reviewed-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V7:
>  - Rebase the patch without functionality change.
> 
>  drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++++++++++++++
>  drivers/pci/controller/dwc/pcie-designware.h    | 12 ++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index e5bd3a5ef380..e76b504ed465 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -471,6 +471,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	return 0;
>  }
>  
> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,

return void. It never has an error.

It could also just be an inline function.

> +				       u16 interrupt_num)
> +{
> +	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> +	u32 msg_data;
> +
> +	msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> +		   (interrupt_num - 1);
> +
> +	dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> +
> +	return 0;
> +}
> +
>  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>  			      u16 interrupt_num)
>  {
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 89f8271ec5ee..745b4938225a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -97,6 +97,9 @@
>  #define PCIE_MISC_CONTROL_1_OFF		0x8BC
>  #define PCIE_DBI_RO_WR_EN		BIT(0)
>  
> +#define PCIE_MSIX_DOORBELL		0x948
> +#define PCIE_MSIX_DOORBELL_PF_SHIFT	24
> +
>  #define PCIE_PL_CHK_REG_CONTROL_STATUS			0xB20
>  #define PCIE_PL_CHK_REG_CHK_REG_START			BIT(0)
>  #define PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS		BIT(1)
> @@ -434,6 +437,8 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
>  			     u8 interrupt_num);
>  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>  			     u16 interrupt_num);
> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
> +				       u16 interrupt_num);
>  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
>  #else
>  static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
> @@ -475,6 +480,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>  	return 0;
>  }
>  
> +static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,
> +						     u8 func_no,
> +						     u16 interrupt_num)
> +{
> +	return 0;
> +}
> +
>  static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
>  {
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Linus Torvalds @ 2020-09-10 17:35 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Claudio Imbrenda, Will Deacon, linux-arch,
	linux-s390, Vasily Gorbik, Christian Borntraeger,
	Richard Weinberger, linux-x86, Russell King, Jason Gunthorpe,
	Ingo Molnar, Catalin Marinas, Andrey Ryabinin, Gerald Schaefer,
	Heiko Carstens, Arnd Bergmann, John Hubbard, Jeff Dike, linux-um,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, linux-arm,
	linux-mm, LKML, Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <20200910093925.GB29166@oc3871087118.ibm.com>

On Thu, Sep 10, 2020 at 2:40 AM Alexander Gordeev
<agordeev@linux.ibm.com> wrote:
>
> It is only gup_fast case that exposes the issue. It hits because
> pointers to stack copies are passed to gup_pXd_range iterators, not
> pointers to real page tables itself.

Can we possibly change fast-gup to not do the stack copies?

I'd actually rather do something like that, than the "addr_end" thing.

As you say, none of the other page table walking code does what the
GUP code does, and I don't think it's required.

The GUP code is kind of strange, I'm not quite sure why. Some of it
unusually came from the powerpc code that handled their special odd
hugepage model, and that may be why it's so different.

How painful would it be to just pass the pmd (etc) _pointers_ around,
rather than do the odd "take the address of local copies"?

                  Linus

^ permalink raw reply

* [PATCH v1] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
From: Scott Cheloha @ 2020-09-10 17:56 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nathan Lynch, Michal Suchanek, Laurent Dufour, David Hildenbrand,
	Rick Lindsley

During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
to determine which node id (nid) to use when later calling __add_memory().

This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
appropriate nid for a given address by looking up the LMB containing the
address and then passing that LMB to of_drconf_to_nid_single() to get the
nid.  In dlpar_add_lmb() we get this address from the LMB itself.

In short, we have a pointer to an LMB and then we are searching for
that LMB *again* in order to find its nid.

If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
can skip the redundant lookup.  The only error handling we need to
duplicate from memory_add_physaddr_to_nid() is the fallback to the
default nid when drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
an invalid nid.

Skipping the extra lookup makes hot-add operations faster, especially
on machines with many LMBs.

Consider an LPAR with 126976 LMBs.  In one test, hot-adding 126000
LMBs on an upatched kernel took ~3.5 hours while a patched kernel
completed the same operation in ~2 hours:

Unpatched (12450 seconds):
Sep  9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
Sep  9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
[...]
Sep  9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added

Patched (7065 seconds):
Sep  8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
Sep  8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
[...]
Sep  8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added

It should be noted that the speedup grows more substantial when
hot-adding LMBs at the end of the drconf range.  This is because we
are skipping a linear LMB search.

To see the distinction, consider smaller hot-add test on the same
LPAR.  A perf-stat run with 10 iterations showed that hot-adding 4096
LMBs completed less than 1 second faster on a patched kernel:

Unpatched:
 Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):

        104,753.42 msec task-clock                #    0.992 CPUs utilized            ( +-  0.55% )
             4,708      context-switches          #    0.045 K/sec                    ( +-  0.69% )
             2,444      cpu-migrations            #    0.023 K/sec                    ( +-  1.25% )
               394      page-faults               #    0.004 K/sec                    ( +-  0.22% )
   445,902,503,057      cycles                    #    4.257 GHz                      ( +-  0.55% )  (66.67%)
     8,558,376,740      stalled-cycles-frontend   #    1.92% frontend cycles idle     ( +-  0.88% )  (49.99%)
   300,346,181,651      stalled-cycles-backend    #   67.36% backend cycles idle      ( +-  0.76% )  (50.01%)
   258,091,488,691      instructions              #    0.58  insn per cycle
                                                  #    1.16  stalled cycles per insn  ( +-  0.22% )  (66.67%)
    70,568,169,256      branches                  #  673.660 M/sec                    ( +-  0.17% )  (50.01%)
     3,100,725,426      branch-misses             #    4.39% of all branches          ( +-  0.20% )  (49.99%)

           105.583 +- 0.589 seconds time elapsed  ( +-  0.56% )

Patched:
 Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):

        104,055.69 msec task-clock                #    0.993 CPUs utilized            ( +-  0.32% )
             4,606      context-switches          #    0.044 K/sec                    ( +-  0.20% )
             2,463      cpu-migrations            #    0.024 K/sec                    ( +-  0.93% )
               394      page-faults               #    0.004 K/sec                    ( +-  0.25% )
   442,951,129,921      cycles                    #    4.257 GHz                      ( +-  0.32% )  (66.66%)
     8,710,413,329      stalled-cycles-frontend   #    1.97% frontend cycles idle     ( +-  0.47% )  (50.06%)
   299,656,905,836      stalled-cycles-backend    #   67.65% backend cycles idle      ( +-  0.39% )  (50.02%)
   252,731,168,193      instructions              #    0.57  insn per cycle
                                                  #    1.19  stalled cycles per insn  ( +-  0.20% )  (66.66%)
    68,902,851,121      branches                  #  662.173 M/sec                    ( +-  0.13% )  (49.94%)
     3,100,242,882      branch-misses             #    4.50% of all branches          ( +-  0.15% )  (49.98%)

           104.829 +- 0.325 seconds time elapsed  ( +-  0.31% )

This is consistent.  An add-by-count hot-add operation adds LMBs
greedily, so LMBs near the start of the drconf range are considered
first.  On an otherwise idle LPAR with so many LMBs we would expect to
find the LMBs we need near the start of the drconf range, hence the
smaller speedup.

Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
 arch/powerpc/mm/numa.c                          | 2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1f61fa2148b5..63507b47164d 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -430,7 +430,7 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
  * This is like of_node_to_nid_single() for memory represented in the
  * ibm,dynamic-reconfiguration-memory node.
  */
-static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
+int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 {
 	struct assoc_arrays aa = { .arrays = NULL };
 	int default_nid = NUMA_NO_NODE;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac4..9cd572440175 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -595,6 +595,8 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+extern int of_drconf_to_nid_single(struct drmem_lmb *);
+
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
@@ -611,8 +613,10 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	block_sz = memory_block_size_bytes();
 
-	/* Find the node id for this address. */
-	nid = memory_add_physaddr_to_nid(lmb->base_addr);
+	/* Find the node id for this address.  Fake one if necessary. */
+	nid = of_drconf_to_nid_single(lmb);
+	if (nid < 0 || !node_possible(nid))
+		nid = first_online_node;
 
 	/* Add the memory */
 	rc = __add_memory(nid, lmb->base_addr, block_sz);
-- 
2.24.1


^ permalink raw reply related

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 17:19 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
	linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
	linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
	Anshuman Khandual, John Hubbard, Jeff Dike, linux-um,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, linux-arm,
	linux-mm, linux-power, LKML, Andrew Morton, Linus Torvalds,
	Mike Rapoport
In-Reply-To: <20200910190757.153319d4@thinkpad>

On Thu, Sep 10, 2020 at 07:07:57PM +0200, Gerald Schaefer wrote:

> I might have lost track a bit. Are we still talking about possible
> functional impacts of either our current pagetable walking with s390
> (apart from gup_fast), or the proposed generic change (for s390, or
> others?)?

I'm looking for an more understandable explanation what is wrong with
the S390 implementation.

If the page operations require the invariant I described then it is
quite easy to explain the problem and understand the solution.

Jason

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Gerald Schaefer @ 2020-09-10 17:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
	linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
	linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
	Anshuman Khandual, John Hubbard, Jeff Dike, linux-um,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner, linux-arm,
	linux-mm, linux-power, LKML, Andrew Morton, Linus Torvalds,
	Mike Rapoport
In-Reply-To: <20200910151026.GL87483@ziepe.ca>

On Thu, 10 Sep 2020 12:10:26 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Thu, Sep 10, 2020 at 03:28:03PM +0200, Gerald Schaefer wrote:
> > On Thu, 10 Sep 2020 10:02:33 -0300
> > Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >   
> > > On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:
> > >   
> > > > As Gerald mentioned, it is very difficult to explain in a clear way.
> > > > Hopefully, one could make sense ot of it.    
> > > 
> > > I would say the page table API requires this invariant:
> > > 
> > >         pud = pud_offset(p4d, addr);
> > >         do {
> > > 		WARN_ON(pud != pud_offset(p4d, addr);
> > >                 next = pud_addr_end(addr, end);
> > >         } while (pud++, addr = next, addr != end);
> > > 
> > > ie pud++ is supposed to be a shortcut for 
> > >   pud_offset(p4d, next)
> > > 
> > > While S390 does not follow this. Fixing addr_end brings it into
> > > alignment by preventing pud++ from happening.
> > > 
> > > The only currently known side effect is that gup_fast crashes, but it
> > > sure is an unexpected thing.  
> > 
> > It only is unexpected in a "top-level folding" world, see my other reply.
> > Consider it an optimization, which was possible because of how our dynamic
> > folding works, and e.g. because we can determine the correct pagetable
> > level from a pXd value in pXd_offset.  
> 
> No, I disagree. The page walker API the arch presents has to have well
> defined semantics. For instance, there is an effort to define tests
> and invarients for the page table accesses to bring this understanding
> and uniformity:
> 
>  mm/debug_vm_pgtable.c
> 
> If we fix S390 using the pX_addr_end() change then the above should be
> updated with an invariant to check it. I've added Anshuman for some
> thoughts..

We are very aware of those tests, and actually a big supporter of the
idea. Also part of the supported architectures already, and it has
already helped us find / fix some s390 oddities.

However, we did not see any issues wrt to our pagetable walking,
neither with the current version, nor with the new generic approach.
We do currently see other issues, Anshuman will know what I mean :-)

> For better or worse, that invariant does exclude arches from using
> other folding techniques.
> 
> The other solution would be to address the other side of != and adjust
> the pud++
> 
> eg replcae pud++ with something like:
>   pud = pud_next_entry(p4d, pud, next)
> 
> Such that:
>   pud_next_entry(p4d, pud, next) === pud_offset(p4d, next)
> 
> In which case the invarient changes to 'callers can never do pointer
> arithmetic on the result of pXX_offset()' which is a bit harder to
> enforce.

I might have lost track a bit. Are we still talking about possible
functional impacts of either our current pagetable walking with s390
(apart from gup_fast), or the proposed generic change (for s390, or
others?)?

Or is this rather some (other) generic issue / idea that you have,
in order to put "some more structure / enforcement" to generic
pagetable walkers?

^ permalink raw reply

* Re: [PATCHv7 10/12] arm64: dts: layerscape: Add PCIe EP node for ls1088a
From: Rob Herring @ 2020-09-10 16:47 UTC (permalink / raw)
  To: Zhiqiang Hou
  Cc: devicetree, lorenzo.pieralisi, Xiaowei Bao, roy.zang, linux-pci,
	linux-kernel, leoyang.li, minghuan.Lian, jingoohan1,
	andrew.murray, mingkai.hu, gustavo.pimentel, bhelgaas, shawnguo,
	kishon, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200811095441.7636-11-Zhiqiang.Hou@nxp.com>

On Tue, Aug 11, 2020 at 05:54:39PM +0800, Zhiqiang Hou wrote:
> From: Xiaowei Bao <xiaowei.bao@nxp.com>
> 
> Add PCIe EP node for ls1088a to support EP mode.
> 
> Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> Reviewed-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V7:
>  - Rebase the patch without functionality change.
> 
>  .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 31 +++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> index 169f4742ae3b..915592141f1b 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> @@ -499,6 +499,17 @@
>  			status = "disabled";
>  		};
>  
> +		pcie_ep@3400000 {

pci-ep@...

> +			compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
> +			reg = <0x00 0x03400000 0x0 0x00100000
> +			       0x20 0x00000000 0x8 0x00000000>;
> +			reg-names = "regs", "addr_space";
> +			num-ib-windows = <24>;
> +			num-ob-windows = <128>;
> +			max-functions = /bits/ 8 <2>;
> +			status = "disabled";
> +		};
> +
>  		pcie@3500000 {
>  			compatible = "fsl,ls1088a-pcie";
>  			reg = <0x00 0x03500000 0x0 0x00100000   /* controller registers */
> @@ -525,6 +536,16 @@
>  			status = "disabled";
>  		};
>  
> +		pcie_ep@3500000 {
> +			compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
> +			reg = <0x00 0x03500000 0x0 0x00100000
> +			       0x28 0x00000000 0x8 0x00000000>;
> +			reg-names = "regs", "addr_space";
> +			num-ib-windows = <6>;
> +			num-ob-windows = <8>;
> +			status = "disabled";
> +		};
> +
>  		pcie@3600000 {
>  			compatible = "fsl,ls1088a-pcie";
>  			reg = <0x00 0x03600000 0x0 0x00100000   /* controller registers */
> @@ -551,6 +572,16 @@
>  			status = "disabled";
>  		};
>  
> +		pcie_ep@3600000 {
> +			compatible = "fsl,ls1088a-pcie-ep","fsl,ls-pcie-ep";
> +			reg = <0x00 0x03600000 0x0 0x00100000
> +			       0x30 0x00000000 0x8 0x00000000>;
> +			reg-names = "regs", "addr_space";
> +			num-ib-windows = <6>;
> +			num-ob-windows = <8>;
> +			status = "disabled";
> +		};
> +
>  		smmu: iommu@5000000 {
>  			compatible = "arm,mmu-500";
>  			reg = <0 0x5000000 0 0x800000>;
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH] powerpc/papr_scm: Fix warning triggered by perf_stats_show()
From: Ira Weiny @ 2020-09-10 15:55 UTC (permalink / raw)
  To: Vaibhav Jain
  Cc: Santosh Sivaraj, linux-nvdimm, Aneesh Kumar K . V,
	Oliver O'Halloran, Dan Williams, linuxppc-dev
In-Reply-To: <20200910092212.107674-1-vaibhav@linux.ibm.com>

On Thu, Sep 10, 2020 at 02:52:12PM +0530, Vaibhav Jain wrote:
> A warning is reported by the kernel in case perf_stats_show() returns
> an error code. The warning is of the form below:
> 
>  papr_scm ibm,persistent-memory:ibm,pmemory@44100001:
>  	  Failed to query performance stats, Err:-10
>  dev_attr_show: perf_stats_show+0x0/0x1c0 [papr_scm] returned bad count
>  fill_read_buffer: dev_attr_show+0x0/0xb0 returned bad count
> 
> On investigation it looks like that the compiler is silently truncating the
> return value of drc_pmem_query_stats() from 'long' to 'int', since the
> variable used to store the return code 'rc' is an 'int'. This
> truncated value is then returned back as a 'ssize_t' back from
> perf_stats_show() to 'dev_attr_show()' which thinks of it as a large
> unsigned number and triggers this warning..
> 
> To fix this we update the type of variable 'rc' from 'int' to
> 'ssize_t' that prevents the compiler from truncating the return value
> of drc_pmem_query_stats() and returning correct signed value back from
> perf_stats_show().
> 
> Fixes: 2d02bf835e573 ('powerpc/papr_scm: Fetch nvdimm performance
>        stats from PHYP')
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
> index a88a707a608aa..9f00b61676ab9 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -785,7 +785,8 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
>  static ssize_t perf_stats_show(struct device *dev,
>  			       struct device_attribute *attr, char *buf)
>  {
> -	int index, rc;
> +	int index;
> +	ssize_t rc;

I'm not sure this is really fixing everything here.

drc_pmem_query_stats() can return negative errno's.  Why are those not checked
somewhere in perf_stats_show()?

It seems like all this fix is handling is a > 0 return value: 'ret[0]' from
line 289 in papr_scm.c...  Or something?

Worse yet drc_pmem_query_stats() is returning ssize_t which is a signed value.
Therefore, it should not be returning -errno.  I'm surprised the static
checkers did not catch that.

I believe I caught similar errors with a patch series before which did not pay
attention to variable types.

Please audit this code for these types of errors and ensure you are really
doing the correct thing when using the sysfs interface.  I'm pretty sure bad
things will eventually happen (if they are not already) if you return some
really big number to the sysfs core from *_show().

Ira

>  	struct seq_buf s;
>  	struct papr_scm_perf_stat *stat;
>  	struct papr_scm_perf_stats *stats;
> -- 
> 2.26.2
> 

^ permalink raw reply

* Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: Segher Boessenkool @ 2020-09-10 15:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-arch, Kees Cook, the arch/x86 maintainers, Nick Desaulniers,
	Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
	Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <CAHk-=whu19Du_rZ-zBtGsXAB-Qo7NtoJjQjd-Sa9OB5u1Cq_Zw@mail.gmail.com>

On Wed, Sep 09, 2020 at 02:33:36PM -0700, Linus Torvalds wrote:
> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > It will not work like this in GCC, no.  The LLVM people know about that.
> > I do not know why they insist on pushing this, being incompatible and
> > everything.
> 
> Umm. Since they'd be the ones supporting this, *gcc* would be the
> incompatible one, not clang.

This breaks the basic requirements of asm goto.

> So I'd phrase it differently. If gcc is planning on doing some
> different model for asm goto with outputs, that would be the
> incompatible case.

If we will do asm goto with outputs, the asm will still be a jump
instruction!  (It is not in LLVM!)

We probably *can* make asm goto have outputs (jump instructions can have
outputs just fine!  Just output reloads on jump instructions are hard,
because not always they are *possible*; but for asm goto it should be
fine).

Doing as LLVM does, and making the asm a "trapping" instruction, makes
it not a jump insn, and opens up whole new cans of worms (including
inferior code quality).  Since it has very different semantics, and we
might want to keep the semantics of asm goto as well anyway, this should
be called something different ("asm break" or "asm __anything" for
example).

It would be nice if they talked to us about it, too.  LLVM claims it
implements the GCC inline asm extension.  It already only is compatible
for the simplest of cases, but this would be much worse still :-(

> and honestly, (b) is actually inferior for the error cases, even if to
> a compiler person it might feel like the "RightThing(tm)" to do.
> Because when an exception happens, the outputs simply won't be
> initialized.

Sure, that is fine, and quite possible useful, but it is not the same as
asm goto.  asm goto is not some exception handling construct: it is a
jump instruction.

> Anyway, for either of those cases, the kernel won't care either way.
> We'll have to support the non-goto case for many years even if
> everybody were to magically implement it today, so it's not like this
> is a "you have to do it" thing.

Yes.

I'm just annoyed because of all the extra work created by people not
communicating.

Segher

^ permalink raw reply

* RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: David Laight @ 2020-09-10 15:31 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: linux-arch, Kees Cook, linuxppc-dev, the arch/x86 maintainers,
	Nick Desaulniers, Linux Kernel Mailing List, Christoph Hellwig,
	Luis Chamberlain, Al Viro, linux-fsdevel,
	'Linus Torvalds', Alexey Dobriyan
In-Reply-To: <20200910152030.GJ28786@gate.crashing.org>



> -----Original Message-----
> From: Segher Boessenkool <segher@kernel.crashing.org>
> Sent: 10 September 2020 16:21
> To: David Laight <David.Laight@ACULAB.COM>
> Cc: 'Christophe Leroy' <christophe.leroy@csgroup.eu>; 'Linus Torvalds' <torvalds@linux-
> foundation.org>; linux-arch <linux-arch@vger.kernel.org>; Kees Cook <keescook@chromium.org>; the
> arch/x86 maintainers <x86@kernel.org>; Nick Desaulniers <ndesaulniers@google.com>; Linux Kernel
> Mailing List <linux-kernel@vger.kernel.org>; Alexey Dobriyan <adobriyan@gmail.com>; Luis Chamberlain
> <mcgrof@kernel.org>; Al Viro <viro@zeniv.linux.org.uk>; linux-fsdevel <linux-fsdevel@vger.kernel.org>;
> linuxppc-dev <linuxppc-dev@lists.ozlabs.org>; Christoph Hellwig <hch@lst.de>
> Subject: Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
> 
> On Thu, Sep 10, 2020 at 12:26:53PM +0000, David Laight wrote:
> > Actually this is pretty sound:
> > 	__label__ label;
> > 	register int eax asm ("eax");
> > 	// Ensure eax can't be reloaded from anywhere
> > 	// In particular it can't be reloaded after the asm goto line
> > 	asm volatile ("" : "=r" (eax));
> 
> This asm is fine.  It says it writes the "eax" variable, which lives in
> the eax register *in that asm* (so *not* guaranteed after it!).
> 
> > 	// Provided gcc doesn't save eax here...
> > 	asm volatile goto ("xxxxx" ::: "eax" : label);
> 
> So this is incorrect.

From the other email:

> It is neither input nor output operand here!  Only *then* is a local
> register asm guaranteed to be in the given reg: as input or output to an
> inline asm.

Ok, so adding '"r" (eax)' to the input section helps a bit.

> > 	// ... and reload the saved value here.
> > 	// The input value here will be that modified by the 'asm goto'.
> > 	// Since this modifies eax it can't be moved before the 'asm goto'.
> > 	asm volatile ("" : "+r" (eax));
> > 	// So here eax must contain the value set by the "xxxxx" instructions.
> 
> No, the register eax will contain the value of the eax variable.  In the
> asm; it might well be there before or after the asm as well, but none of
> that is guaranteed.

Perhaps not 'guaranteed', but very unlikely to be wrong.
It doesn't give gcc much scope for not generating the desired code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: Segher Boessenkool @ 2020-09-10 15:20 UTC (permalink / raw)
  To: David Laight
  Cc: linux-arch, Kees Cook, linuxppc-dev, the arch/x86 maintainers,
	Nick Desaulniers, Linux Kernel Mailing List, Christoph Hellwig,
	Luis Chamberlain, Al Viro, linux-fsdevel,
	'Linus Torvalds', Alexey Dobriyan
In-Reply-To: <5050b43687c84515a49b345174a98822@AcuMS.aculab.com>

On Thu, Sep 10, 2020 at 12:26:53PM +0000, David Laight wrote:
> Actually this is pretty sound:
> 	__label__ label;
> 	register int eax asm ("eax");
> 	// Ensure eax can't be reloaded from anywhere
> 	// In particular it can't be reloaded after the asm goto line
> 	asm volatile ("" : "=r" (eax));

This asm is fine.  It says it writes the "eax" variable, which lives in
the eax register *in that asm* (so *not* guaranteed after it!).

> 	// Provided gcc doesn't save eax here...
> 	asm volatile goto ("xxxxx" ::: "eax" : label);

So this is incorrect.

> 	// ... and reload the saved value here.
> 	// The input value here will be that modified by the 'asm goto'.
> 	// Since this modifies eax it can't be moved before the 'asm goto'.
> 	asm volatile ("" : "+r" (eax));
> 	// So here eax must contain the value set by the "xxxxx" instructions.

No, the register eax will contain the value of the eax variable.  In the
asm; it might well be there before or after the asm as well, but none of
that is guaranteed.


Segher

^ permalink raw reply

* Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: Segher Boessenkool @ 2020-09-10 15:16 UTC (permalink / raw)
  To: David Laight
  Cc: linux-arch, Kees Cook, linuxppc-dev, the arch/x86 maintainers,
	Nick Desaulniers, Linux Kernel Mailing List, Alexey Dobriyan,
	Luis Chamberlain, Al Viro, linux-fsdevel,
	'Linus Torvalds', Christoph Hellwig
In-Reply-To: <59a64e9a210847b59f70f9bd2d02b5c3@AcuMS.aculab.com>

On Thu, Sep 10, 2020 at 09:26:28AM +0000, David Laight wrote:
> From: Christophe Leroy
> > Sent: 10 September 2020 09:14
> > 
> > Le 10/09/2020 à 10:04, David Laight a écrit :
> > > From: Linus Torvalds
> > >> Sent: 09 September 2020 22:34
> > >> On Wed, Sep 9, 2020 at 11:42 AM Segher Boessenkool
> > >> <segher@kernel.crashing.org> wrote:
> > >>>
> > >>> It will not work like this in GCC, no.  The LLVM people know about that.
> > >>> I do not know why they insist on pushing this, being incompatible and
> > >>> everything.
> > >>
> > >> Umm. Since they'd be the ones supporting this, *gcc* would be the
> > >> incompatible one, not clang.
> > >
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> > 
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> > 
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good....
> 
> int bar(char *z)
> {
>         __label__ label;
>         register int eax asm ("eax") = 6;
>         asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> 
> label:
>         return eax;
> }

It is neither input nor output operand here!  Only *then* is a local
register asm guaranteed to be in the given reg: as input or output to an
inline asm.


Segher

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 15:10 UTC (permalink / raw)
  To: Gerald Schaefer, Anshuman Khandual
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
	linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
	linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
	John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
	linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910152803.1a930afc@thinkpad>

On Thu, Sep 10, 2020 at 03:28:03PM +0200, Gerald Schaefer wrote:
> On Thu, 10 Sep 2020 10:02:33 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> > On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:
> > 
> > > As Gerald mentioned, it is very difficult to explain in a clear way.
> > > Hopefully, one could make sense ot of it.  
> > 
> > I would say the page table API requires this invariant:
> > 
> >         pud = pud_offset(p4d, addr);
> >         do {
> > 		WARN_ON(pud != pud_offset(p4d, addr);
> >                 next = pud_addr_end(addr, end);
> >         } while (pud++, addr = next, addr != end);
> > 
> > ie pud++ is supposed to be a shortcut for 
> >   pud_offset(p4d, next)
> > 
> > While S390 does not follow this. Fixing addr_end brings it into
> > alignment by preventing pud++ from happening.
> > 
> > The only currently known side effect is that gup_fast crashes, but it
> > sure is an unexpected thing.
> 
> It only is unexpected in a "top-level folding" world, see my other reply.
> Consider it an optimization, which was possible because of how our dynamic
> folding works, and e.g. because we can determine the correct pagetable
> level from a pXd value in pXd_offset.

No, I disagree. The page walker API the arch presents has to have well
defined semantics. For instance, there is an effort to define tests
and invarients for the page table accesses to bring this understanding
and uniformity:

 mm/debug_vm_pgtable.c

If we fix S390 using the pX_addr_end() change then the above should be
updated with an invariant to check it. I've added Anshuman for some
thoughts..

For better or worse, that invariant does exclude arches from using
other folding techniques.

The other solution would be to address the other side of != and adjust
the pud++

eg replcae pud++ with something like:
  pud = pud_next_entry(p4d, pud, next)

Such that:
  pud_next_entry(p4d, pud, next) === pud_offset(p4d, next)

In which case the invarient changes to 'callers can never do pointer
arithmetic on the result of pXX_offset()' which is a bit harder to
enforce.

Jason

^ permalink raw reply

* [PATCH] soc: fsl: dpio: remove set but not used 'addr_cena'
From: Jason Yan @ 2020-09-10 14:04 UTC (permalink / raw)
  To: Roy.Pledge, leoyang.li, youri.querry_1, linux-kernel,
	linuxppc-dev, linux-arm-kernel
  Cc: Hulk Robot, Jason Yan

This addresses the following gcc warning with "make W=1":

drivers/soc/fsl/dpio/qbman-portal.c: In function
‘qbman_swp_enqueue_multiple_direct’:
drivers/soc/fsl/dpio/qbman-portal.c:650:11: warning: variable
‘addr_cena’ set but not used [-Wunused-but-set-variable]
  650 |  uint64_t addr_cena;
      |           ^~~~~~~~~

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
---
 drivers/soc/fsl/dpio/qbman-portal.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c b/drivers/soc/fsl/dpio/qbman-portal.c
index 0ab85bfb116f..659b4a570d5b 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -647,7 +647,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
 	const uint32_t *cl = (uint32_t *)d;
 	uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
 	int i, num_enqueued = 0;
-	uint64_t addr_cena;
 
 	spin_lock(&s->access_spinlock);
 	half_mask = (s->eqcr.pi_ci_mask>>1);
@@ -701,7 +700,6 @@ int qbman_swp_enqueue_multiple_direct(struct qbman_swp *s,
 
 	/* Flush all the cacheline without load/store in between */
 	eqcr_pi = s->eqcr.pi;
-	addr_cena = (size_t)s->addr_cena;
 	for (i = 0; i < num_enqueued; i++)
 		eqcr_pi++;
 	s->eqcr.pi = eqcr_pi & full_mask;
-- 
2.25.4


^ permalink raw reply related

* Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;
From: Robin Murphy @ 2020-09-10 14:21 UTC (permalink / raw)
  To: Joe Perches, LKML, Jiri Kosina
  Cc: linux-fbdev, oss-drivers, nouveau, alsa-devel, dri-devel,
	linux-mips, linux-ide, dm-devel, linux-mtd, linux-i2c, sparclinux,
	Will Deacon, linux-afs, linux-rtc, linux-s390, linux-scsi, dccp,
	linux-rdma, linux-atm-general, kvmarm, coreteam, intel-wired-lan,
	linux-serial, linux-input, linux-mmc, Kees Cook, linux-media,
	linux-pm, intel-gfx, linux-mediatek, linux-nvme, storagedev,
	ceph-devel, linux-arm-kernel, Nick Desaulniers, linux-nfs,
	linux-parisc, netdev, linux-usb, linux-wireless, linux-sctp,
	iommu, netfilter-devel, linux-crypto, bpf, linuxppc-dev
In-Reply-To: <e6387578c75736d61b2fe70d9783d91329a97eb4.camel@perches.com>

On 2020-09-09 21:06, Joe Perches wrote:
> fallthrough to a separate case/default label break; isn't very readable.
> 
> Convert pseudo-keyword fallthrough; statements to a simple break; when
> the next label is case or default and the only statement in the next
> label block is break;
> 
> Found using:
> 
> $ grep-2.5.4 -rP --include=*.[ch] -n "fallthrough;(\s*(case\s+\w+|default)\s*:\s*){1,7}break;" *
> 
> Miscellanea:
> 
> o Move or coalesce a couple label blocks above a default: block.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> 
> Compiled allyesconfig x86-64 only.
> A few files for other arches were not compiled.
> 

[...]
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c192544e874b..743db1abec40 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3777,7 +3777,7 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
>   	switch (FIELD_GET(IDR0_TTF, reg)) {
>   	case IDR0_TTF_AARCH32_64:
>   		smmu->ias = 40;
> -		fallthrough;
> +		break;
>   	case IDR0_TTF_AARCH64:
>   		break;
>   	default:

I have to say I don't really agree with the readability argument for 
this one - a fallthrough is semantically correct here, since the first 
case is a superset of the second. It just happens that anything we would 
do for the common subset is implicitly assumed (there are other 
potential cases we simply haven't added support for at the moment), thus 
the second case is currently empty.

This change actively obfuscates that distinction.

Robin.

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Gerald Schaefer @ 2020-09-10 13:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
	linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
	linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
	John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
	linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910130233.GK87483@ziepe.ca>

On Thu, 10 Sep 2020 10:02:33 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:
> 
> > As Gerald mentioned, it is very difficult to explain in a clear way.
> > Hopefully, one could make sense ot of it.  
> 
> I would say the page table API requires this invariant:
> 
>         pud = pud_offset(p4d, addr);
>         do {
> 		WARN_ON(pud != pud_offset(p4d, addr);
>                 next = pud_addr_end(addr, end);
>         } while (pud++, addr = next, addr != end);
> 
> ie pud++ is supposed to be a shortcut for 
>   pud_offset(p4d, next)
> 
> While S390 does not follow this. Fixing addr_end brings it into
> alignment by preventing pud++ from happening.
> 
> The only currently known side effect is that gup_fast crashes, but it
> sure is an unexpected thing.

It only is unexpected in a "top-level folding" world, see my other reply.
Consider it an optimization, which was possible because of how our dynamic
folding works, and e.g. because we can determine the correct pagetable
level from a pXd value in pXd_offset.

> This suggests another fix, which is to say that pud++ is undefined and
> pud_offset() must always be called, but I think that would cause worse
> codegen on all other archs.

There really is nothing to fix for s390 outside of gup_fast, or other
potential future READ_ONCE pagetable walkers. We do take the side-effect
of the generic change on all other pagetable walkers for s390, but it
really is rather a slight degradation than a fix.

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Gerald Schaefer @ 2020-09-10 13:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
	linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
	linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
	Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
	John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
	linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200909180324.GI87483@ziepe.ca>

On Wed, 9 Sep 2020 15:03:24 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Wed, Sep 09, 2020 at 07:25:34PM +0200, Gerald Schaefer wrote:
> > I actually had to draw myself a picture to get some hold of
> > this, or rather a walk-through with a certain pud-crossing
> > range in a folded 3-level scenario. Not sure if I would have
> > understood my explanation above w/o that, but I hope you can
> > make some sense out of it. Or draw yourself a picture :-)
> 
> What I don't understand is how does anything work with S390 today?

That is totally comprehensible :-)

> If the fix is only to change pxx_addr_end() then than generic code
> like mm/pagewalk.c will iterate over a *different list* of page table
> entries. 
> 
> It's choice of entries to look at is entirely driven by pxx_addr_end().
> 
> Which suggest to me that mm/pagewalk.c also doesn't work properly
> today on S390 and this issue is not really about stack variables?

I guess you are confused by the fact that the generic change will indeed
change the logic for _all_ pagetable walkers on s390, not just for
the gup_fast case. But that doesn't mean that they were doing it wrong
before, we simply can do it both ways. However, we probably should
make that (in theory useless) change more explicit.

Let's compare before and after for mm/pagewalk.c on s390, with 3-level
pagetables, range crossing 2 GB pud boundary.

* Before (with pXd_addr_end always using static 5-level PxD_SIZE):

walk_pgd_range()
-> pgd_addr_end() will use static 2^53 PGDIR_SIZE, range is not cropped,
                  no iterations needed, passed over to next level

walk_p4d_range()
-> p4d_addr_end() will use static 2^42 P4D_SIZE, range still not cropped

walk_pud_range()
-> pud_addr_end() now we're cropping, with 2^31 PUD_SIZE, need two
                  iterations for range crossing pud boundary, doing
                  that right here on a pudp which is actually the
                  previously passed-through pgdp/p4dp (pointing to
                  correct pagetable entry)

* After (with dynamic pXd_addr_end using "correct" PxD_SIZE boundaries,
         should be similar to other archs static "top-level folding"):

walk_pgd_range()
-> pgd_addr_end() will now determine "correct" boundary based on pgd
                  value, i.e. 2^31 PUD_SIZE, do cropping now, iteration
                  will now happen here

walk_p4d/pud_range()
->  operate on cropped range, will not iterate, instead return to pgd level,
    which will then use the same pointer for iteration as in the "Before"
    case, but not on the same level.

IMHO, our "Before" logic is more efficient, and also feels more natural.
After all, it is not really necessary to return to pgd level, and it will
surely cost some extra instructions. We are willing to take that cost
for the sake of doing it in a more generic way, hoping that will reduce
future issues. E.g. you already mentioned that you have plans for using
the READ_ONCE logic also in other places, and that would be such a
"future issue".

> Fundamentally if pXX_offset() and pXX_addr_end() must be consistent
> together, if pXX_offset() is folded then pXX_addr_end() must cause a
> single iteration of that level.

well, that sounds correct in theory, but I guess it depends on "how
you fold it". E.g. what does "if pXX_offset() is folded" mean?
Take pgd_offset() for the 3-level case above. From our previous
"middle-level folding/iteration" perspective, I would say that
pgd/p4d are folded into pud, so if you say "if pgd_offset() is folded
then pgd_addr_end() must cause a single iteration of that level",
we were doing it all correctly, i.e only having single iteration
on pgd/p4d level. You could even say that all others are doing /
using it wrong :-)

Now take pgd_offset() from the "top-level folding/iteration".
Here you would say that p4d/pud are folded into pgd, which again
does not sound like the natural / most efficient way to me,
but IIUC this has to be how it works for all other archs with
(static) pagetable folding. Now you'd say "if pud/p4d_offset()
is folded then pud/p4d_addr_end() must cause a single iteration
of that level", and that would sound correct. At least until
you look more closely, because e.g. p4d_addr_end() in
include/asm-generic/pgtable-nop4d.h is simply this:
#define p4d_addr_end(addr, end) (end)

How can that cause a single iteration? It clearly won't, it only
works because the previous pgd_addr_end already cropped the range
so that there will be only single iterations for p4d/pud.

The more I think of it, the more it sounds like s390 "middle-level
folding/iteration" was doing it "the right way", and everybody else
was wrong, or at least not in an optimally efficient way :-) Might
also be that only we could do this because we can determine the
pagetable level from a pagetable entry value.

Anyway, if you are not yet confused enough, I recommend looking
at the other option we had in mind, for fixing the gup_fast issue.
See "Patch 1" from here:
https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schaefer@linux.ibm.com/

That would actually have kept that "middle-level iteration" also
for gup_fast, by additionally passing through the pXd pointers.
However, it also needed a gup-specific version of pXd_offset(),
in order to keep the READ_ONCE semantics. For s390, that would
have actually been the best solution, but a generic version of
that might not have been so easy. And doing it like everybody
else can not be so bad, at least I really hope so.

Of course, at some point in time, we might come up with some fancy
fundamental change that would "do it the right middle-level way
for everybody". At least I think I overheard Vasily and Alexander
discussing some wild ideas, but that is certainly beyond this scope
here...

^ permalink raw reply

* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 13:02 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
	linux-sparc, Claudio Imbrenda, Will Deacon, linux-arch,
	linux-s390, Vasily Gorbik, Richard Weinberger, linux-x86,
	Russell King, Christian Borntraeger, Ingo Molnar, Catalin Marinas,
	Andrey Ryabinin, Gerald Schaefer, Heiko Carstens, Arnd Bergmann,
	John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
	linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910093925.GB29166@oc3871087118.ibm.com>

On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:

> As Gerald mentioned, it is very difficult to explain in a clear way.
> Hopefully, one could make sense ot of it.

I would say the page table API requires this invariant:

        pud = pud_offset(p4d, addr);
        do {
		WARN_ON(pud != pud_offset(p4d, addr);
                next = pud_addr_end(addr, end);
        } while (pud++, addr = next, addr != end);

ie pud++ is supposed to be a shortcut for 
  pud_offset(p4d, next)

While S390 does not follow this. Fixing addr_end brings it into
alignment by preventing pud++ from happening.

The only currently known side effect is that gup_fast crashes, but it
sure is an unexpected thing.

This suggests another fix, which is to say that pud++ is undefined and
pud_offset() must always be called, but I think that would cause worse
codegen on all other archs.

Jason


^ permalink raw reply

* Re: [PATCH v2] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
From: Michael Ellerman @ 2020-09-10 12:55 UTC (permalink / raw)
  To: linux-nvdimm, linuxppc-dev, Vaibhav Jain; +Cc: Aneesh Kumar K . V
In-Reply-To: <20200907110540.21349-1-vaibhav@linux.ibm.com>

On Mon, 7 Sep 2020 16:35:40 +0530, Vaibhav Jain wrote:
> The newly introduced 'perf_stats' attribute uses the default access
> mode of 0444 letting non-root users access performance stats of an
> nvdimm and potentially force the kernel into issuing large number of
> expensive HCALLs. Since the information exposed by this attribute
> cannot be cached hence its better to ward of access to this attribute
> from users who don't need to access these performance statistics.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
      https://git.kernel.org/powerpc/c/0460534b532e5518c657c7d6492b9337d975eaa3

cheers

^ permalink raw reply

* Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask
From: Michael Ellerman @ 2020-09-10 12:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Oliver O'Halloran, Cédric Le Goater, Christoph Hellwig
In-Reply-To: <20200908015106.79661-1-aik@ozlabs.ru>

On Tue, 8 Sep 2020 11:51:06 +1000, Alexey Kardashevskiy wrote:
> There are 2 problems with it:
> 1. "<" vs expected "<<"
> 2. the shift number is an IOMMU page number mask, not an address mask
> as the IOMMU page shift is missing.
> 
> This did not hit us before f1565c24b596 ("powerpc: use the generic
> dma_ops_bypass mode") because we had there additional code to handle
> bypass mask so this chunk (almost?) never executed. However there
> were reports that aacraid does not work with "iommu=nobypass".
> After f1565c24b596, aacraid (and probably others which call
> dma_get_required_mask() before setting the mask) was unable to
> enable 64bit DMA and fall back to using IOMMU which was known not to work,
> one of the problems is double free of an IOMMU page.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/dma: Fix dma_map_ops::get_required_mask
      https://git.kernel.org/powerpc/c/437ef802e0adc9f162a95213a3488e8646e5fc03

cheers

^ permalink raw reply

* Re: [PATCH v2] cpuidle-pseries: Fix CEDE latency conversion from tb to us
From: Michael Ellerman @ 2020-09-10 12:55 UTC (permalink / raw)
  To: Rafael J. Wysocki, Joel Stanley, Gautham R. Shenoy,
	Vaidyanathan Srinivasan, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel, linux-pm
In-Reply-To: <1599125247-28488-1-git-send-email-ego@linux.vnet.ibm.com>

On Thu, 3 Sep 2020 14:57:27 +0530, Gautham R. Shenoy wrote:
> commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for
> CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
> of the Extended CEDE states advertised by the platform. The values
> advertised by the platform are in timebase ticks. However the cpuidle
> framework requires the latency values in microseconds.
> 
> If the tb-ticks value advertised by the platform correspond to a value
> smaller than 1us, during the conversion from tb-ticks to microseconds,
> in the current code, the result becomes zero. This is incorrect as it
> puts a CEDE state on par with the snooze state.
> 
> [...]

Applied to powerpc/fixes.

[1/1] cpuidle: pseries: Fix CEDE latency conversion from tb to us
      https://git.kernel.org/powerpc/c/1d3ee7df009a46440c58508b8819213c09503acd

cheers

^ permalink raw reply

* RE: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: David Laight @ 2020-09-10 12:26 UTC (permalink / raw)
  To: David Laight, 'Christophe Leroy',
	'Linus Torvalds', Segher Boessenkool
  Cc: linux-arch, Kees Cook, the arch/x86 maintainers, Nick Desaulniers,
	Linux Kernel Mailing List, Christoph Hellwig, Luis Chamberlain,
	Al Viro, linux-fsdevel, linuxppc-dev, Alexey Dobriyan
In-Reply-To: <59a64e9a210847b59f70f9bd2d02b5c3@AcuMS.aculab.com>

From: David Laight
> Sent: 10 September 2020 10:26
...
> > > I had an 'interesting' idea.
> > >
> > > Can you use a local asm register variable as an input and output to
> > > an 'asm volatile goto' statement?
> > >
> > > Well you can - but is it guaranteed to work :-)
> > >
> >
> > With gcc at least it should work according to
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html
> >
> > They even explicitely tell: "The only supported use for this feature is
> > to specify registers for input and output operands when calling Extended
> > asm "
> 
> A quick test isn't good....
> 
> int bar(char *z)
> {
>         __label__ label;
>         register int eax asm ("eax") = 6;
>         asm volatile goto (" mov $1, %%eax" ::: "eax" : label);
> label:
>         return eax;
> }
> 
> 0000000000000040 <bar>:
>   40:   b8 01 00 00 00          mov    $0x1,%eax
>   45:   b8 06 00 00 00          mov    $0x6,%eax
>   4a:   c3                      retq
> 
> although adding:
>         asm volatile ("" : "+r" (eax));
> either side of the 'asm volatile goto' does fix it.

Actually this is pretty sound:
	__label__ label;
	register int eax asm ("eax");
	// Ensure eax can't be reloaded from anywhere
	// In particular it can't be reloaded after the asm goto line
	asm volatile ("" : "=r" (eax));
	// Provided gcc doesn't save eax here...
	asm volatile goto ("xxxxx" ::: "eax" : label);
	// ... and reload the saved value here.
	// The input value here will be that modified by the 'asm goto'.
	// Since this modifies eax it can't be moved before the 'asm goto'.
	asm volatile ("" : "+r" (eax));
	// So here eax must contain the value set by the "xxxxx" instructions.

    David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply

* Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;
From: Matthias Brugger @ 2020-09-10 10:16 UTC (permalink / raw)
  To: Joe Perches, LKML, Jiri Kosina
  Cc: linux-fbdev, oss-drivers, nouveau, alsa-devel, dri-devel,
	linux-mips, linux-ide, dm-devel, linux-mtd, linux-i2c, sparclinux,
	linux-afs, linux-rtc, linux-s390, linux-scsi, dccp, linux-rdma,
	linux-atm-general, kvmarm, coreteam, intel-wired-lan,
	linux-serial, linux-input, linux-mmc, Kees Cook, linux-media,
	linux-pm, intel-gfx, linux-mediatek, linux-nvme, storagedev,
	ceph-devel, linux-arm-kernel, Nick Desaulniers, linux-nfs,
	linux-parisc, netdev, linux-usb, linux-wireless, linux-sctp,
	iommu, netfilter-devel, linux-crypto, bpf, linuxppc-dev
In-Reply-To: <e6387578c75736d61b2fe70d9783d91329a97eb4.camel@perches.com>



On 09/09/2020 22:06, Joe Perches wrote:
> diff --git a/drivers/net/wireless/mediatek/mt7601u/dma.c b/drivers/net/wireless/mediatek/mt7601u/dma.c
> index 09f931d4598c..778be26d329f 100644
> --- a/drivers/net/wireless/mediatek/mt7601u/dma.c
> +++ b/drivers/net/wireless/mediatek/mt7601u/dma.c
> @@ -193,11 +193,11 @@ static void mt7601u_complete_rx(struct urb *urb)
>   	case -ESHUTDOWN:
>   	case -ENOENT:
>   		return;
> +	case 0:
> +		break;
>   	default:
>   		dev_err_ratelimited(dev->dev, "rx urb failed: %d\n",
>   				    urb->status);
> -		fallthrough;
> -	case 0:
>   		break;
>   	}
>   
> @@ -238,11 +238,11 @@ static void mt7601u_complete_tx(struct urb *urb)
>   	case -ESHUTDOWN:
>   	case -ENOENT:
>   		return;
> +	case 0:
> +		break;
>   	default:
>   		dev_err_ratelimited(dev->dev, "tx urb failed: %d\n",
>   				    urb->status);
> -		fallthrough;
> -	case 0:
>   		break;
>   	}

Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>

^ permalink raw reply

* Re: [PATCH v3 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends
From: Pankaj Gupta @ 2020-09-10  9:53 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-hyperv, Michal Hocko, Michael S. Tsirkin, Jason Wang,
	Pingfan Liu, virtualization, Linux MM, Paul Mackerras,
	K. Y. Srinivasan, Boris Ostrovsky, linux-s390, Wei Liu,
	Stefano Stabellini, Dave Jiang, Baoquan He, Jason Gunthorpe,
	linux-acpi, xen-devel, Heiko Carstens, Len Brown, Nathan Lynch,
	Vasily Gorbik, Leonardo Bras, Haiyang Zhang, Stephen Hemminger,
	Dan Williams, Christian Borntraeger, Juergen Gross,
	Libor Pechacek, Greg Kroah-Hartman, linux-nvdimm,
	Rafael J. Wysocki, LKML, Wei Yang, Vishal Verma,
	Oliver O'Halloran, Andrew Morton, linuxppc-dev
In-Reply-To: <20200910091340.8654-4-david@redhat.com>

> We soon want to pass flags, e.g., to mark added System RAM resources.
> mergeable. Prepare for that.
>
> This patch is based on a similar patch by Oscar Salvador:
>
> https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de
>
> Acked-by: Wei Liu <wei.liu@kernel.org>
> Reviewed-by: Juergen Gross <jgross@suse.com> # Xen related part
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Wei Yang <richardw.yang@linux.intel.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
> Cc: Len Brown <lenb@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Vishal Verma <vishal.l.verma@intel.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Wei Liu <wei.liu@kernel.org>
> Cc: Heiko Carstens <hca@linux.ibm.com>
> Cc: Vasily Gorbik <gor@linux.ibm.com>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: "Oliver O'Halloran" <oohall@gmail.com>
> Cc: Pingfan Liu <kernelfans@gmail.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Libor Pechacek <lpechacek@suse.cz>
> Cc: Anton Blanchard <anton@ozlabs.org>
> Cc: Leonardo Bras <leobras.c@gmail.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-acpi@vger.kernel.org
> Cc: linux-nvdimm@lists.01.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: linux-s390@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/powerpc/platforms/powernv/memtrace.c       |  2 +-
>  arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
>  drivers/acpi/acpi_memhotplug.c                  |  3 ++-
>  drivers/base/memory.c                           |  3 ++-
>  drivers/dax/kmem.c                              |  2 +-
>  drivers/hv/hv_balloon.c                         |  2 +-
>  drivers/s390/char/sclp_cmd.c                    |  2 +-
>  drivers/virtio/virtio_mem.c                     |  2 +-
>  drivers/xen/balloon.c                           |  2 +-
>  include/linux/memory_hotplug.h                  | 16 ++++++++++++----
>  mm/memory_hotplug.c                             | 14 +++++++-------
>  11 files changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
> index 13b369d2cc454..6828108486f83 100644
> --- a/arch/powerpc/platforms/powernv/memtrace.c
> +++ b/arch/powerpc/platforms/powernv/memtrace.c
> @@ -224,7 +224,7 @@ static int memtrace_online(void)
>                         ent->mem = 0;
>                 }
>
> -               if (add_memory(ent->nid, ent->start, ent->size)) {
> +               if (add_memory(ent->nid, ent->start, ent->size, MHP_NONE)) {
>                         pr_err("Failed to add trace memory to node %d\n",
>                                 ent->nid);
>                         ret += 1;
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 0ea976d1cac47..e1c9fa0d730f5 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -615,7 +615,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
>         nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
>         /* Add the memory */
> -       rc = __add_memory(nid, lmb->base_addr, block_sz);
> +       rc = __add_memory(nid, lmb->base_addr, block_sz, MHP_NONE);
>         if (rc) {
>                 invalidate_lmb_associativity_index(lmb);
>                 return rc;
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index e294f44a78504..2067c3bc55763 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -207,7 +207,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>                 if (node < 0)
>                         node = memory_add_physaddr_to_nid(info->start_addr);
>
> -               result = __add_memory(node, info->start_addr, info->length);
> +               result = __add_memory(node, info->start_addr, info->length,
> +                                     MHP_NONE);
>
>                 /*
>                  * If the memory block has been used by the kernel, add_memory()
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 4db3c660de831..b4c297dd04755 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -432,7 +432,8 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
>
>         nid = memory_add_physaddr_to_nid(phys_addr);
>         ret = __add_memory(nid, phys_addr,
> -                          MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +                          MIN_MEMORY_BLOCK_SIZE * sections_per_block,
> +                          MHP_NONE);
>
>         if (ret)
>                 goto out;
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 7dcb2902e9b1b..896cb9444e727 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -95,7 +95,7 @@ int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>                  * this as RAM automatically.
>                  */
>                 rc = add_memory_driver_managed(numa_node, range.start,
> -                               range_len(&range), kmem_name);
> +                               range_len(&range), kmem_name, MHP_NONE);
>
>                 res->flags |= IORESOURCE_BUSY;
>                 if (rc) {
> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 32e3bc0aa665a..3c0d52e244520 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>
>                 nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>                 ret = add_memory(nid, PFN_PHYS((start_pfn)),
> -                               (HA_CHUNK << PAGE_SHIFT));
> +                               (HA_CHUNK << PAGE_SHIFT), MHP_NONE);
>
>                 if (ret) {
>                         pr_err("hot_add memory failed error is %d\n", ret);
> diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
> index a864b21af602a..f6e97f0830f64 100644
> --- a/drivers/s390/char/sclp_cmd.c
> +++ b/drivers/s390/char/sclp_cmd.c
> @@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
>         if (!size)
>                 goto skip_add;
>         for (addr = start; addr < start + size; addr += block_size)
> -               add_memory(0, addr, block_size);
> +               add_memory(0, addr, block_size, MHP_NONE);
>  skip_add:
>         first_rn = rn;
>         num = 1;
> diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
> index 834b7c13ef3dc..ed99e43354010 100644
> --- a/drivers/virtio/virtio_mem.c
> +++ b/drivers/virtio/virtio_mem.c
> @@ -424,7 +424,7 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
>
>         dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
>         return add_memory_driver_managed(nid, addr, memory_block_size_bytes(),
> -                                        vm->resource_name);
> +                                        vm->resource_name, MHP_NONE);
>  }
>
>  /*
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 51427c752b37b..9f40a294d398d 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -331,7 +331,7 @@ static enum bp_state reserve_additional_memory(void)
>         mutex_unlock(&balloon_mutex);
>         /* add_memory_resource() requires the device_hotplug lock */
>         lock_device_hotplug();
> -       rc = add_memory_resource(nid, resource);
> +       rc = add_memory_resource(nid, resource, MHP_NONE);
>         unlock_device_hotplug();
>         mutex_lock(&balloon_mutex);
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 51a877fec8da8..e53d1058f3443 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -57,6 +57,12 @@ enum {
>         MMOP_ONLINE_MOVABLE,
>  };
>
> +/* Flags for add_memory() and friends to specify memory hotplug details. */
> +typedef int __bitwise mhp_t;
> +
> +/* No special request */
> +#define MHP_NONE               ((__force mhp_t)0)
> +
>  /*
>   * Extended parameters for memory hotplug:
>   * altmap: alternative allocator for memmap array (optional)
> @@ -345,11 +351,13 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>
>  extern void __ref free_area_init_core_hotplug(int nid);
> -extern int __add_memory(int nid, u64 start, u64 size);
> -extern int add_memory(int nid, u64 start, u64 size);
> -extern int add_memory_resource(int nid, struct resource *resource);
> +extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> +extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
> +extern int add_memory_resource(int nid, struct resource *resource,
> +                              mhp_t mhp_flags);
>  extern int add_memory_driver_managed(int nid, u64 start, u64 size,
> -                                    const char *resource_name);
> +                                    const char *resource_name,
> +                                    mhp_t mhp_flags);
>  extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>                                    unsigned long nr_pages,
>                                    struct vmem_altmap *altmap, int migratetype);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 8e1cd18b5cf14..8f0bd7c9a63a5 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1039,7 +1039,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
>   *
>   * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
>   */
> -int __ref add_memory_resource(int nid, struct resource *res)
> +int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
>  {
>         struct mhp_params params = { .pgprot = PAGE_KERNEL };
>         u64 start, size;
> @@ -1118,7 +1118,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
>  }
>
>  /* requires device_hotplug_lock, see add_memory_resource() */
> -int __ref __add_memory(int nid, u64 start, u64 size)
> +int __ref __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
>  {
>         struct resource *res;
>         int ret;
> @@ -1127,18 +1127,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
>         if (IS_ERR(res))
>                 return PTR_ERR(res);
>
> -       ret = add_memory_resource(nid, res);
> +       ret = add_memory_resource(nid, res, mhp_flags);
>         if (ret < 0)
>                 release_memory_resource(res);
>         return ret;
>  }
>
> -int add_memory(int nid, u64 start, u64 size)
> +int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags)
>  {
>         int rc;
>
>         lock_device_hotplug();
> -       rc = __add_memory(nid, start, size);
> +       rc = __add_memory(nid, start, size, mhp_flags);
>         unlock_device_hotplug();
>
>         return rc;
> @@ -1167,7 +1167,7 @@ EXPORT_SYMBOL_GPL(add_memory);
>   * "System RAM ($DRIVER)".
>   */
>  int add_memory_driver_managed(int nid, u64 start, u64 size,
> -                             const char *resource_name)
> +                             const char *resource_name, mhp_t mhp_flags)
>  {
>         struct resource *res;
>         int rc;
> @@ -1185,7 +1185,7 @@ int add_memory_driver_managed(int nid, u64 start, u64 size,
>                 goto out_unlock;
>         }
>
> -       rc = add_memory_resource(nid, res);
> +       rc = add_memory_resource(nid, res, mhp_flags);
>         if (rc < 0)
>                 release_memory_resource(res);
>
> --

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>

> 2.26.2
>

^ permalink raw reply

* Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;
From: Ilya Dryomov @ 2020-09-10  9:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-wireless, linux-fbdev, oss-drivers, nouveau, alsa-devel,
	dri-devel, linux-mips, linux-ide, dm-devel, linux-mtd, linux-i2c,
	sparclinux, kvmarm, linux-rtc, linux-s390, linux-scsi, dccp,
	linux-rdma, linux-atm-general, linux-afs, coreteam,
	intel-wired-lan, linux-serial, linux-input, linux-mmc, Kees Cook,
	linux-media, linux-pm, intel-gfx, linux-sctp, linux-mediatek,
	linux-nvme, storagedev, Ceph Development, linux-arm-kernel,
	linux-nfs, Jiri Kosina, linux-parisc, netdev, linux-usb,
	Nick Desaulniers, LKML, iommu, netfilter-devel,
	Linux Crypto Mailing List, bpf, linuxppc-dev
In-Reply-To: <e6387578c75736d61b2fe70d9783d91329a97eb4.camel@perches.com>

On Wed, Sep 9, 2020 at 10:10 PM Joe Perches <joe@perches.com> wrote:
>
> fallthrough to a separate case/default label break; isn't very readable.
>
> Convert pseudo-keyword fallthrough; statements to a simple break; when
> the next label is case or default and the only statement in the next
> label block is break;
>
> Found using:
>
> $ grep-2.5.4 -rP --include=*.[ch] -n "fallthrough;(\s*(case\s+\w+|default)\s*:\s*){1,7}break;" *
>
> Miscellanea:
>
> o Move or coalesce a couple label blocks above a default: block.
>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
>
> Compiled allyesconfig x86-64 only.
> A few files for other arches were not compiled.
>
>  arch/arm/mach-mmp/pm-pxa910.c                             |  2 +-
>  arch/arm64/kvm/handle_exit.c                              |  2 +-
>  arch/mips/kernel/cpu-probe.c                              |  2 +-
>  arch/mips/math-emu/cp1emu.c                               |  2 +-
>  arch/s390/pci/pci.c                                       |  2 +-
>  crypto/tcrypt.c                                           |  4 ++--
>  drivers/ata/sata_mv.c                                     |  2 +-
>  drivers/atm/lanai.c                                       |  2 +-
>  drivers/gpu/drm/i915/display/intel_sprite.c               |  2 +-
>  drivers/gpu/drm/nouveau/nvkm/engine/disp/hdmi.c           |  2 +-
>  drivers/hid/wacom_wac.c                                   |  2 +-
>  drivers/i2c/busses/i2c-i801.c                             |  2 +-
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c                    | 14 +++++++-------
>  drivers/infiniband/ulp/rtrs/rtrs-srv.c                    |  6 +++---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c               |  2 +-
>  drivers/irqchip/irq-vic.c                                 |  4 ++--
>  drivers/md/dm.c                                           |  2 +-
>  drivers/media/dvb-frontends/drxd_hard.c                   |  2 +-
>  drivers/media/i2c/ov5640.c                                |  2 +-
>  drivers/media/i2c/ov6650.c                                |  5 ++---
>  drivers/media/i2c/smiapp/smiapp-core.c                    |  2 +-
>  drivers/media/i2c/tvp5150.c                               |  2 +-
>  drivers/media/pci/ddbridge/ddbridge-core.c                |  2 +-
>  drivers/media/usb/cpia2/cpia2_core.c                      |  2 +-
>  drivers/mfd/iqs62x.c                                      |  3 +--
>  drivers/mmc/host/atmel-mci.c                              |  2 +-
>  drivers/mtd/nand/raw/nandsim.c                            |  2 +-
>  drivers/net/ethernet/intel/e1000e/phy.c                   |  2 +-
>  drivers/net/ethernet/intel/fm10k/fm10k_pf.c               |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_adminq.c             |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c               |  2 +-
>  drivers/net/ethernet/intel/iavf/iavf_txrx.c               |  2 +-
>  drivers/net/ethernet/intel/igb/e1000_phy.c                |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_82599.c            |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c             |  2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c            |  2 +-
>  drivers/net/ethernet/intel/ixgbevf/vf.c                   |  2 +-
>  drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_mcp.c                 |  2 +-
>  drivers/net/ethernet/sfc/falcon/farch.c                   |  2 +-
>  drivers/net/ethernet/sfc/farch.c                          |  2 +-
>  drivers/net/phy/adin.c                                    |  3 +--
>  drivers/net/usb/pegasus.c                                 |  4 ++--
>  drivers/net/usb/usbnet.c                                  |  2 +-
>  drivers/net/wireless/ath/ath5k/eeprom.c                   |  2 +-
>  drivers/net/wireless/mediatek/mt7601u/dma.c               |  8 ++++----
>  drivers/nvme/host/core.c                                  | 12 ++++++------
>  drivers/pcmcia/db1xxx_ss.c                                |  4 ++--
>  drivers/power/supply/abx500_chargalg.c                    |  2 +-
>  drivers/power/supply/charger-manager.c                    |  2 +-
>  drivers/rtc/rtc-pcf85063.c                                |  2 +-
>  drivers/s390/scsi/zfcp_fsf.c                              |  2 +-
>  drivers/scsi/aic7xxx/aic79xx_core.c                       |  4 ++--
>  drivers/scsi/aic94xx/aic94xx_tmf.c                        |  2 +-
>  drivers/scsi/lpfc/lpfc_sli.c                              |  2 +-
>  drivers/scsi/smartpqi/smartpqi_init.c                     |  2 +-
>  drivers/scsi/sr.c                                         |  2 +-
>  drivers/tty/serial/sunsu.c                                |  2 +-
>  drivers/tty/serial/sunzilog.c                             |  2 +-
>  drivers/tty/vt/vt_ioctl.c                                 |  2 +-
>  drivers/usb/dwc3/core.c                                   |  2 +-
>  drivers/usb/gadget/legacy/inode.c                         |  2 +-
>  drivers/usb/gadget/udc/pxa25x_udc.c                       |  4 ++--
>  drivers/usb/host/ohci-hcd.c                               |  2 +-
>  drivers/usb/isp1760/isp1760-hcd.c                         |  2 +-
>  drivers/usb/musb/cppi_dma.c                               |  2 +-
>  drivers/usb/phy/phy-fsl-usb.c                             |  2 +-
>  drivers/video/fbdev/stifb.c                               |  2 +-
>  fs/afs/yfsclient.c                                        |  8 ++++----
>  fs/ceph/dir.c                                             |  2 +-

For ceph:

Acked-by: Ilya Dryomov <idryomov@gmail.com>

Thanks,

                Ilya

^ permalink raw reply

* Re: [trivial PATCH] treewide: Convert switch/case fallthrough; to break;
From: Steffen Maier @ 2020-09-10  9:18 UTC (permalink / raw)
  To: Joe Perches, LKML, Jiri Kosina, Benjamin Block
  Cc: linux-wireless, linux-fbdev, oss-drivers, nouveau, alsa-devel,
	dri-devel, linux-ide, dm-devel, linux-mtd, linux-i2c, sparclinux,
	kvmarm, linux-rtc, linux-s390, linux-scsi, dccp, linux-rdma,
	linux-atm-general, linux-afs, coreteam, intel-wired-lan,
	linux-serial, linux-input, linux-mmc, Kees Cook, linux-media,
	linux-pm, intel-gfx, linux-sctp, linux-mediatek, linux-nvme,
	storagedev, ceph-devel, linux-arm-kernel, linux-nfs, linux-parisc,
	netdev, linux-usb, Nick Desaulniers, linux-mips, iommu,
	netfilter-devel, linux-crypto, bpf, linuxppc-dev
In-Reply-To: <e6387578c75736d61b2fe70d9783d91329a97eb4.camel@perches.com>

On 9/9/20 10:06 PM, Joe Perches wrote:
> fallthrough to a separate case/default label break; isn't very readable.
> 
> Convert pseudo-keyword fallthrough; statements to a simple break; when
> the next label is case or default and the only statement in the next
> label block is break;
> 
> Found using:
> 
> $ grep-2.5.4 -rP --include=*.[ch] -n "fallthrough;(\s*(case\s+\w+|default)\s*:\s*){1,7}break;" *
> 
> Miscellanea:
> 
> o Move or coalesce a couple label blocks above a default: block.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> 
> Compiled allyesconfig x86-64 only.
> A few files for other arches were not compiled.

>   drivers/s390/scsi/zfcp_fsf.c                              |  2 +-

>   82 files changed, 109 insertions(+), 112 deletions(-)

> diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
> index 140186fe1d1e..2741a07df692 100644
> --- a/drivers/s390/scsi/zfcp_fsf.c
> +++ b/drivers/s390/scsi/zfcp_fsf.c
> @@ -2105,7 +2105,7 @@ static void zfcp_fsf_open_lun_handler(struct zfcp_fsf_req *req)
>   
>   	case FSF_PORT_HANDLE_NOT_VALID:
>   		zfcp_erp_adapter_reopen(adapter, 0, "fsouh_1");
> -		fallthrough;
> +		break;
>   	case FSF_LUN_ALREADY_OPEN:
>   		break;
>   	case FSF_PORT_BOXED:

Acked-by: Steffen Maier <maier@linux.ibm.com> # for zfcp


-- 
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z Development

https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Matthias Hartmann
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox