LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc/papr_scm: Make access mode of 'perf_stats' attribute file to '0400'
From: Michael Ellerman @ 2020-08-10 13:12 UTC (permalink / raw)
  To: Vaibhav Jain, linuxppc-dev, linux-nvdimm
  Cc: Santosh Sivaraj, Aneesh Kumar K . V, Oliver O'Halloran,
	Vaibhav Jain, Dan Williams, Ira Weiny
In-Reply-To: <20200807123146.11037-1-vaibhav@linux.ibm.com>

Vaibhav Jain <vaibhav@linux.ibm.com> writes:
> The newly introduced 'perf_stats' attribute uses the default access
> mode of 0444 letting non-root users access performance stats of an
> nvdimm and potentially force the kernel into issuing large number of
> expensive HCALLs. Since the information exposed by this attribute
> cannot be cached hence its better to ward of access to this attribute
> from non-root users.
>
> Hence this patch updates the access-mode of 'perf_stats' sysfs
> attribute file to 0400 to make it only readable to root-users.

Or should we ratelimit it?

Fixes: ??

> Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>

cheers


^ permalink raw reply

* Re: [PATCH] recordmcount: Fix build failure on non arm64
From: Steven Rostedt @ 2020-08-10 13:27 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, Gregory Herrero, linuxppc-dev, linux-kernel
In-Reply-To: <20200810121855.GD9480@gaia>

On Mon, 10 Aug 2020 13:18:55 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> > Oops, thanks for fixing this.
> > 
> > Acked-by: Gregory Herrero <gregory.herrero@oracle.com>  
> 
> Thanks. I'll queue it via the arm64 tree (as I did with the previous
> fix) but I'll wait a bit for Steve to ack it.

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

^ permalink raw reply

* Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
From: Herbert Xu @ 2020-08-10 13:45 UTC (permalink / raw)
  To: Van Leeuwen, Pascal
  Cc: Andrei Botila, Andrei Botila, linux-s390@vger.kernel.org,
	Antoine Tenart, x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@axis.com, linux-crypto@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <CY4PR0401MB36528610C3ABF802F8CBF35FC3440@CY4PR0401MB3652.namprd04.prod.outlook.com>

On Mon, Aug 10, 2020 at 10:20:20AM +0000, Van Leeuwen, Pascal wrote:
>
> With all due respect, but this makes no sense.

I agree.  This is a lot of churn for no gain.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] recordmcount: Fix build failure on non arm64
From: Catalin Marinas @ 2020-08-10 14:26 UTC (permalink / raw)
  To: Gregory Herrero, Steven Rostedt (VMware), Christophe Leroy
  Cc: linuxppc-dev, Will Deacon, linux-kernel, linux-arm-kernel
In-Reply-To: <5ca1be21fa6ebf73203b45fd9aadd2bafb5e6b15.1597049145.git.christophe.leroy@csgroup.eu>

On Mon, 10 Aug 2020 08:48:22 +0000 (UTC), Christophe Leroy wrote:
> Commit ea0eada45632 leads to the following build failure on powerpc:
> 
>   HOSTCC  scripts/recordmcount
> scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
> scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use in this function)
> scripts/recordmcount.c:440: error: (Each undeclared identifier is reported only once
> scripts/recordmcount.c:440: error: for each function it appears in.)
> make[2]: *** [scripts/recordmcount] Error 1
> 
> [...]

Applied to arm64 (for-next/core), thanks!

[1/1] recordmcount: Fix build failure on non arm64
      https://git.kernel.org/arm64/c/3df14264ad99

-- 
Catalin


^ permalink raw reply

* [PATCH] Documentation/features: refresh powerpc arch support files
From: Tobias Klauser @ 2020-08-10 10:09 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linuxppc-dev, Nicholas Piggin, linux-doc

Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
Implement queued spinlocks and rwlocks").

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
---
 Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
 .../features/locking/queued-spinlocks/arch-support.txt          | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt b/Documentation/features/locking/queued-rwlocks/arch-support.txt
index 5c6bcfcf8e1f..4dd5e554873f 100644
--- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
@@ -22,7 +22,7 @@
     |       nios2: | TODO |
     |    openrisc: |  ok  |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |       riscv: | TODO |
     |        s390: | TODO |
     |          sh: | TODO |
diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
index b55e420a34ea..b16d4f71e5ce 100644
--- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
+++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
@@ -22,7 +22,7 @@
     |       nios2: | TODO |
     |    openrisc: |  ok  |
     |      parisc: | TODO |
-    |     powerpc: | TODO |
+    |     powerpc: |  ok  |
     |       riscv: | TODO |
     |        s390: | TODO |
     |          sh: | TODO |
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH] Documentation/features: refresh powerpc arch support files
From: Christophe Leroy @ 2020-08-10 15:09 UTC (permalink / raw)
  To: Tobias Klauser, Jonathan Corbet; +Cc: linuxppc-dev, Nicholas Piggin, linux-doc
In-Reply-To: <20200810100906.3805-1-tklauser@distanz.ch>



Le 10/08/2020 à 12:09, Tobias Klauser a écrit :
> Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
> Implement queued spinlocks and rwlocks").
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> ---
>   Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
>   .../features/locking/queued-spinlocks/arch-support.txt          | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> index 5c6bcfcf8e1f..4dd5e554873f 100644
> --- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
> +++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> @@ -22,7 +22,7 @@
>       |       nios2: | TODO |
>       |    openrisc: |  ok  |
>       |      parisc: | TODO |
> -    |     powerpc: | TODO |
> +    |     powerpc: |  ok  |

In your commit log you are refering to a commit titled "powerpc/64s:"

Are you sure it is now OK for all powerpc, not only for book3s/64 as 
suggested by yout text ?

Christophe

>       |       riscv: | TODO |
>       |        s390: | TODO |
>       |          sh: | TODO |
> diff --git a/Documentation/features/locking/queued-spinlocks/arch-support.txt b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> index b55e420a34ea..b16d4f71e5ce 100644
> --- a/Documentation/features/locking/queued-spinlocks/arch-support.txt
> +++ b/Documentation/features/locking/queued-spinlocks/arch-support.txt
> @@ -22,7 +22,7 @@
>       |       nios2: | TODO |
>       |    openrisc: |  ok  |
>       |      parisc: | TODO |
> -    |     powerpc: | TODO |
> +    |     powerpc: |  ok  |
>       |       riscv: | TODO |
>       |        s390: | TODO |
>       |          sh: | TODO |
> 

^ permalink raw reply

* Re: [PATCH] Documentation/features: refresh powerpc arch support files
From: Tobias Klauser @ 2020-08-10 15:36 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-doc, linuxppc-dev, Nicholas Piggin, Jonathan Corbet
In-Reply-To: <4b6b65e8-ec79-ebf0-0ab5-7b48182584f1@csgroup.eu>

On 2020-08-10 at 17:09:51 +0200, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
> 
> Le 10/08/2020 à 12:09, Tobias Klauser a écrit :
> > Support for these was added by commit aa65ff6b18e0 ("powerpc/64s:
> > Implement queued spinlocks and rwlocks").
> > 
> > Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> > ---
> >   Documentation/features/locking/queued-rwlocks/arch-support.txt  | 2 +-
> >   .../features/locking/queued-spinlocks/arch-support.txt          | 2 +-
> >   2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/features/locking/queued-rwlocks/arch-support.txt b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > index 5c6bcfcf8e1f..4dd5e554873f 100644
> > --- a/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > +++ b/Documentation/features/locking/queued-rwlocks/arch-support.txt
> > @@ -22,7 +22,7 @@
> >       |       nios2: | TODO |
> >       |    openrisc: |  ok  |
> >       |      parisc: | TODO |
> > -    |     powerpc: | TODO |
> > +    |     powerpc: |  ok  |
> 
> In your commit log you are refering to a commit titled "powerpc/64s:"
> 
> Are you sure it is now OK for all powerpc, not only for book3s/64 as
> suggested by yout text ?

The change was generated by running
Documentation/features/scripts/features-refresh.sh
Sorry, I should have mentioned this in the commit message. I noticed the
updated features for powerpc after updating the RISC-V supported
features [1].

[1] https://lore.kernel.org/linux-riscv/20200810095000.32092-1-tklauser@distanz.ch/T/#u

AFAIK, the features-refresh.sh script has no way of distinguishing
between different types of an architecture. It just checks for the
respective Kconfig symbols listed in the
Documentation/features/**/arch-support.txt files in all arch/**/Kconfig
files and updates the feature to "ok" if it finds the Kconfig symbol.

^ permalink raw reply

* Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
From: Eric Biggers @ 2020-08-10 17:03 UTC (permalink / raw)
  To: Horia Geantă
  Cc: Andrei Botila (OSS), Andrei Botila, Herbert Xu,
	Van Leeuwen, Pascal, Antoine Tenart, linux-s390@vger.kernel.org,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@axis.com, linux-crypto@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <fd3e5862-3357-7dfc-6c75-30086ab19f82@nxp.com>

On Mon, Aug 10, 2020 at 05:33:39PM +0300, Horia Geantă wrote:
> On 8/10/2020 4:45 PM, Herbert Xu wrote:
> > On Mon, Aug 10, 2020 at 10:20:20AM +0000, Van Leeuwen, Pascal wrote:
> >>
> >> With all due respect, but this makes no sense.
> > 
> > I agree.  This is a lot of churn for no gain.
> > 
> I would say the gain is that all skcipher algorithms would behave the same
> when input length equals zero - i.e. treat the request as a no-op.
> 
> We can't say "no input" has any meaning to the other skcipher algorithms,
> but the convention is to accept this case and just return 0.
> I don't see why XTS has to be handled differently.
> 

CTS also rejects empty inputs.

The rule it follows is just that all input lengths >= blocksize are allowed.
Input lengths < blocksize aren't allowed.

- Eric

^ permalink raw reply

* Re: [PATCH] powerpc/pseries: explicitly reschedule during drmem_lmb list traversal
From: Nathan Lynch @ 2020-08-10 20:03 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: tyreld, cheloha, Laurent Dufour, linuxppc-dev
In-Reply-To: <87tuxl1ant.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> One thought, which I possibly should not put in writing, is that we
> could use the alignment of the pointer as a poor man's substitute for a
> counter, eg:
>
> +static inline struct drmem_lmb *drmem_lmb_next(struct drmem_lmb *lmb)
> +{
> +	if (lmb % PAGE_SIZE == 0)
> +		cond_resched();
> +
> +	return ++lmb;
> +}
>
> I think the lmbs are allocated in a block, so I think that will work.
> Maybe PAGE_SIZE is not the right size to use, but you get the idea.
>
> Gross I know, but might be OK as short term solution?

OK, looking into this.

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_sai: Add -EPROBE_DEFER check for regmap init
From: Nicolin Chen @ 2020-08-10 23:11 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, lgirdwood,
	perex, broonie, festevam, linux-kernel
In-Reply-To: <1596791682-4311-1-git-send-email-shengjiu.wang@nxp.com>

On Fri, Aug 07, 2020 at 05:14:42PM +0800, Shengjiu Wang wrote:
> Regmap initialization may return -EPROBE_DEFER for clock
> may not be ready, so check -EPROBE_DEFER error type before
> start another Regmap initialization.
> 
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>

Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

^ permalink raw reply

* Re: [PATCH v2] ASoC: fsl-asoc-card: Get "extal" clock rate by clk_get_rate
From: Nicolin Chen @ 2020-08-10 23:13 UTC (permalink / raw)
  To: Shengjiu Wang
  Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, lgirdwood,
	perex, broonie, festevam, linux-kernel
In-Reply-To: <1597047103-6863-1-git-send-email-shengjiu.wang@nxp.com>

On Mon, Aug 10, 2020 at 04:11:43PM +0800, Shengjiu Wang wrote:
> On some platform(.e.g. i.MX8QM MEK), the "extal" clock is different
> with the mclk of codec, then the clock rate is also different.
> So it is better to get clock rate of "extal" rate by clk_get_rate,
> don't reuse the clock rate of mclk.
> 
> Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>

Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

^ permalink raw reply

* Re: [PATCH] recordmcount: Fix build failure on non arm64
From: Gregory Herrero @ 2020-08-10  9:17 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arm-kernel, Catalin Marinas, linuxppc-dev, linux-kernel,
	Steven Rostedt
In-Reply-To: <5ca1be21fa6ebf73203b45fd9aadd2bafb5e6b15.1597049145.git.christophe.leroy@csgroup.eu>

Hi Christophe,

On Mon, Aug 10, 2020 at 08:48:22AM +0000, Christophe Leroy wrote:
> Commit ea0eada45632 leads to the following build failure on powerpc:
> 
>   HOSTCC  scripts/recordmcount
> scripts/recordmcount.c: In function 'arm64_is_fake_mcount':
> scripts/recordmcount.c:440: error: 'R_AARCH64_CALL26' undeclared (first use in this function)
> scripts/recordmcount.c:440: error: (Each undeclared identifier is reported only once
> scripts/recordmcount.c:440: error: for each function it appears in.)
> make[2]: *** [scripts/recordmcount] Error 1
> 
> Make sure R_AARCH64_CALL26 is always defined.
> 
Oops, thanks for fixing this.

Acked-by: Gregory Herrero <gregory.herrero@oracle.com>

Greg

> Fixes: ea0eada45632 ("recordmcount: only record relocation of type R_AARCH64_CALL26 on arm64.")
> Cc: Gregory Herrero <gregory.herrero@oracle.com>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  scripts/recordmcount.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
> index e59022b3f125..b9c2ee7ab43f 100644
> --- a/scripts/recordmcount.c
> +++ b/scripts/recordmcount.c
> @@ -42,6 +42,8 @@
>  #define R_ARM_THM_CALL		10
>  #define R_ARM_CALL		28
>  
> +#define R_AARCH64_CALL26	283
> +
>  static int fd_map;	/* File descriptor for file being modified. */
>  static int mmap_failed; /* Boolean flag. */
>  static char gpfx;	/* prefix for global symbol name (sometimes '_') */
> -- 
> 2.25.0
> 

^ permalink raw reply

* RE: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
From: Van Leeuwen, Pascal @ 2020-08-10 10:20 UTC (permalink / raw)
  To: Andrei Botila, Herbert Xu, David S. Miller
  Cc: linux-s390@vger.kernel.org, Andrei Botila, Antoine Tenart,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@axis.com, linux-crypto@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200807162010.18979-20-andrei.botila@oss.nxp.com>

> -----Original Message-----
> From: linux-crypto-owner@vger.kernel.org <linux-crypto-owner@vger.kernel.org> On Behalf Of Andrei Botila
> Sent: Friday, August 7, 2020 6:20 PM
> To: Herbert Xu <herbert@gondor.apana.org.au>; David S. Miller <davem@davemloft.net>
> Cc: linux-crypto@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-s390@vger.kernel.org; x86@kernel.org; linux-arm-kernel@axis.com; Andrei Botila <andrei.botila@nxp.com>; Antoine Tenart
> <antoine.tenart@bootlin.com>
> Subject: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
>
> <<< External Email >>>
> From: Andrei Botila <andrei.botila@nxp.com>
>
> Standardize the way input lengths equal to 0 are handled in all skcipher
> algorithms. All the algorithms return 0 for input lengths equal to zero.
>
> Cc: Antoine Tenart <antoine.tenart@bootlin.com>
> Signed-off-by: Andrei Botila <andrei.botila@nxp.com>
> ---
>  drivers/crypto/inside-secure/safexcel_cipher.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/crypto/inside-secure/safexcel_cipher.c b/drivers/crypto/inside-secure/safexcel_cipher.c
> index 1ac3253b7903..03d06556ea98 100644
> --- a/drivers/crypto/inside-secure/safexcel_cipher.c
> +++ b/drivers/crypto/inside-secure/safexcel_cipher.c
> @@ -2533,6 +2533,9 @@ static int safexcel_skcipher_aes_xts_cra_init(struct crypto_tfm *tfm)
>
>  static int safexcel_encrypt_xts(struct skcipher_request *req)
>  {
> +if (!req->cryptlen)
> +return 0;
> +
>  if (req->cryptlen < XTS_BLOCK_SIZE)
>  return -EINVAL;
>  return safexcel_queue_req(&req->base, skcipher_request_ctx(req),
> @@ -2541,6 +2544,9 @@ static int safexcel_encrypt_xts(struct skcipher_request *req)
>
>  static int safexcel_decrypt_xts(struct skcipher_request *req)
>  {
> +if (!req->cryptlen)
> +return 0;
> +
>  if (req->cryptlen < XTS_BLOCK_SIZE)
>  return -EINVAL;
>  return safexcel_queue_req(&req->base, skcipher_request_ctx(req),
> --
> 2.17.1

With all due respect, but this makes no sense.

For XTS, any length below 16 is illegal, as applying CTS in order to handle non-cipher
block multiples (16 bytes in case of AES) requires _more_ data than 1 cipher block.

There is no benefit to explicitly check for zero length if there is already a check for
less-than-16. That's just wasting CPU cycles and  a branch predictor entry, for no
benefit whatsoever. (except for academic "alignment with other ciphers").

XTS has very specific use cases. No one in their right mind would call it for a
situation where it can't be applied in the first place, e.g. anything < 16 bytes.

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>


^ permalink raw reply

* Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
From: Horia Geantă @ 2020-08-10 14:33 UTC (permalink / raw)
  To: Herbert Xu, Van Leeuwen, Pascal
  Cc: Andrei Botila (OSS), Andrei Botila, linux-s390@vger.kernel.org,
	Antoine Tenart, x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@axis.com, linux-crypto@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200810134500.GA22914@gondor.apana.org.au>

On 8/10/2020 4:45 PM, Herbert Xu wrote:
> On Mon, Aug 10, 2020 at 10:20:20AM +0000, Van Leeuwen, Pascal wrote:
>>
>> With all due respect, but this makes no sense.
> 
> I agree.  This is a lot of churn for no gain.
> 
I would say the gain is that all skcipher algorithms would behave the same
when input length equals zero - i.e. treat the request as a no-op.

We can't say "no input" has any meaning to the other skcipher algorithms,
but the convention is to accept this case and just return 0.
I don't see why XTS has to be handled differently.

Thanks,
Horia

^ permalink raw reply

* RE: [PATCH 2/2 v2] powerpc/powernv: Enable and setup PCI P2P
From: Aneela Devarasetty @ 2020-08-10 20:24 UTC (permalink / raw)
  To: Oliver O'Halloran, Max Gurtovoy
  Cc: Zhi-wei Dai, Vladimir Koushnir, Carol Soto, linux-pci,
	Shlomi Nimrodi, Israel Rukshin, Frederic Barrat, Idan Werpoler,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <CAOSf1CGv=0bwShzzK5zP3dtKg=RxeTFvq52j-Vi4GDfZ4UpBJA@mail.gmail.com>

+ David from IBM.

-----Original Message-----
From: Oliver O'Halloran <oohall@gmail.com> 
Sent: Monday, August 3, 2020 2:35 AM
To: Max Gurtovoy <maxg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>; linux-pci <linux-pci@vger.kernel.org>; linuxppc-dev <linuxppc-dev@lists.ozlabs.org>; Israel Rukshin <israelr@mellanox.com>; Idan Werpoler <Idanw@mellanox.com>; Vladimir Koushnir <vladimirk@mellanox.com>; Shlomi Nimrodi <shlomin@mellanox.com>; Frederic Barrat <fbarrat@linux.ibm.com>; Carol Soto <clsoto@us.ibm.com>; Aneela Devarasetty <aneela@mellanox.com>
Subject: Re: [PATCH 2/2 v2] powerpc/powernv: Enable and setup PCI P2P

On Thu, Apr 30, 2020 at 11:15 PM Max Gurtovoy <maxg@mellanox.com> wrote:
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 57d3a6a..9ecc576 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3706,18 +3706,208 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
>         }
>  }
>
> +#ifdef CONFIG_PCI_P2PDMA
> +static DEFINE_MUTEX(p2p_mutex);
> +
> +static bool pnv_pci_controller_owns_addr(struct pci_controller *hose,
> +                                        phys_addr_t addr, size_t 
> +size) {
> +       int i;
> +
> +       /*
> +        * It seems safe to assume the full range is under the same PHB, so we
> +        * can ignore the size.
> +        */
> +       for (i = 0; i < ARRAY_SIZE(hose->mem_resources); i++) {
> +               struct resource *res = &hose->mem_resources[i];
> +
> +               if (res->flags && addr >= res->start && addr < res->end)
> +                       return true;
> +       }
> +       return false;
> +}
> +
> +/*
> + * find the phb owning a mmio address if not owned locally  */ static 
> +struct pnv_phb *pnv_pci_find_owning_phb(struct pci_dev *pdev,
> +                                              phys_addr_t addr, 
> +size_t size) {
> +       struct pci_controller *hose;
> +
> +       /* fast path */
> +       if (pnv_pci_controller_owns_addr(pdev->bus->sysdata, addr, size))
> +               return NULL;

Do we actually need this fast path? It's going to be slow either way.
Also if a device is doing p2p to another device under the same PHB then it should not be happening via the root complex. Is this a case you've tested?

> +       list_for_each_entry(hose, &hose_list, list_node) {
> +               struct pnv_phb *phb = hose->private_data;
> +
> +               if (phb->type != PNV_PHB_NPU_NVLINK &&
> +                   phb->type != PNV_PHB_NPU_OCAPI) {
> +                       if (pnv_pci_controller_owns_addr(hose, addr, size))
> +                               return phb;
> +               }
> +       }
> +       return NULL;
> +}
> +
> +static u64 pnv_pci_dma_dir_to_opal_p2p(enum dma_data_direction dir) {
> +       if (dir == DMA_TO_DEVICE)
> +               return OPAL_PCI_P2P_STORE;
> +       else if (dir == DMA_FROM_DEVICE)
> +               return OPAL_PCI_P2P_LOAD;
> +       else if (dir == DMA_BIDIRECTIONAL)
> +               return OPAL_PCI_P2P_LOAD | OPAL_PCI_P2P_STORE;
> +       else
> +               return 0;
> +}
> +
> +static int pnv_pci_ioda_enable_p2p(struct pci_dev *initiator,
> +                                  struct pnv_phb *phb_target,
> +                                  enum dma_data_direction dir) {
> +       struct pci_controller *hose;
> +       struct pnv_phb *phb_init;
> +       struct pnv_ioda_pe *pe_init;
> +       u64 desc;
> +       int rc;
> +
> +       if (!opal_check_token(OPAL_PCI_SET_P2P))
> +               return -ENXIO;
> +

> +       hose = pci_bus_to_host(initiator->bus);
> +       phb_init = hose->private_data;

You can use the pci_bus_to_pnvhb() helper

> +
> +       pe_init = pnv_ioda_get_pe(initiator);
> +       if (!pe_init)
> +               return -ENODEV;
> +
> +       if (!pe_init->tce_bypass_enabled)
> +               return -EINVAL;
> +
> +       /*
> +        * Configuring the initiator's PHB requires to adjust its TVE#1
> +        * setting. Since the same device can be an initiator several times for
> +        * different target devices, we need to keep a reference count to know
> +        * when we can restore the default bypass setting on its TVE#1 when
> +        * disabling. Opal is not tracking PE states, so we add a reference
> +        * count on the PE in linux.
> +        *
> +        * For the target, the configuration is per PHB, so we keep a
> +        * target reference count on the PHB.
> +        */

This irks me a bit because configuring the DMA address limits for the TVE is the kernel's job. What we really should be doing is using
opal_pci_map_pe_dma_window_real() to set the bypass-mode address limit for the TVE to something large enough to hit the MMIO ranges rather than having set_p2p do it as a side effect. Unfortunately, for some reason skiboot doesn't implement support for enabling 56bit addressing using opal_pci_map_pe_dma_window_real() and we do need to support older kernel's which used this stuff so I guess we're stuck with it for now. It'd be nice if we could fix this in the longer term though...

> +       mutex_lock(&p2p_mutex);
> +
> +       desc = OPAL_PCI_P2P_ENABLE | pnv_pci_dma_dir_to_opal_p2p(dir);
> +       /* always go to opal to validate the configuration */
> +       rc = opal_pci_set_p2p(phb_init->opal_id, phb_target->opal_id, desc,
> +                             pe_init->pe_number);
> +       if (rc != OPAL_SUCCESS) {
> +               rc = -EIO;
> +               goto out;
> +       }
> +
> +       pe_init->p2p_initiator_count++;
> +       phb_target->p2p_target_count++;
> +
> +       rc = 0;
> +out:
> +       mutex_unlock(&p2p_mutex);
> +       return rc;
> +}
> +
> +static int pnv_pci_dma_map_resource(struct pci_dev *pdev,
> +                                   phys_addr_t phys_addr, size_t size,
> +                                   enum dma_data_direction dir) {
> +       struct pnv_phb *target_phb;
> +
> +       target_phb = pnv_pci_find_owning_phb(pdev, phys_addr, size);
> +       if (!target_phb)
> +               return 0;
> +
> +       return pnv_pci_ioda_enable_p2p(pdev, target_phb, dir); }
> +
> +static int pnv_pci_ioda_disable_p2p(struct pci_dev *initiator,
> +               struct pnv_phb *phb_target) {
> +       struct pci_controller *hose;
> +       struct pnv_phb *phb_init;
> +       struct pnv_ioda_pe *pe_init;
> +       int rc;
> +
> +       if (!opal_check_token(OPAL_PCI_SET_P2P))
> +               return -ENXIO;

This should probably have a WARN_ON() since we can't hit this path unless the initial map succeeds.

> +       hose = pci_bus_to_host(initiator->bus);
> +       phb_init = hose->private_data;

pci_bus_to_pnvhb()

> +       pe_init = pnv_ioda_get_pe(initiator);
> +       if (!pe_init)
> +               return -ENODEV;
> +
> +       mutex_lock(&p2p_mutex);
> +
> +       if (!pe_init->p2p_initiator_count || !phb_target->p2p_target_count) {
> +               rc = -EINVAL;
> +               goto out;
> +       }
> +
> +       if (--pe_init->p2p_initiator_count == 0)
> +               pnv_pci_ioda2_set_bypass(pe_init, true);
> +
> +       if (--phb_target->p2p_target_count == 0) {
> +               rc = opal_pci_set_p2p(phb_init->opal_id, phb_target->opal_id,
> +                                     0, pe_init->pe_number);
> +               if (rc != OPAL_SUCCESS) {
> +                       rc = -EIO;
> +                       goto out;
> +               }
> +       }
> +
> +       rc = 0;
> +out:
> +       mutex_unlock(&p2p_mutex);
> +       return rc;
> +}
> +
> +static void pnv_pci_dma_unmap_resource(struct pci_dev *pdev,
> +                                      dma_addr_t addr, size_t size,
> +                                      enum dma_data_direction dir) {
> +       struct pnv_phb *target_phb;
> +       int rc;
> +
> +       target_phb = pnv_pci_find_owning_phb(pdev, addr, size);
> +       if (!target_phb)
> +               return;
> +
> +       rc = pnv_pci_ioda_disable_p2p(pdev, target_phb);
> +       if (rc)
> +               dev_err(&pdev->dev, "Failed to undo PCI peer-to-peer setup for address %llx: %d\n",
> +                       addr, rc);

Use pci_err() or pe_err().

^ permalink raw reply

* RE: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
From: Van Leeuwen, Pascal @ 2020-08-10 21:37 UTC (permalink / raw)
  To: Horia Geantă, Herbert Xu
  Cc: Andrei Botila (OSS), Andrei Botila, linux-s390@vger.kernel.org,
	Antoine Tenart, x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@axis.com, linux-crypto@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, David S. Miller,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <fd3e5862-3357-7dfc-6c75-30086ab19f82@nxp.com>

> -----Original Message-----
> From: Horia Geantă <horia.geanta@nxp.com>
> Sent: Monday, August 10, 2020 4:34 PM
> To: Herbert Xu <herbert@gondor.apana.org.au>; Van Leeuwen, Pascal <pvanleeuwen@rambus.com>
> Cc: Andrei Botila (OSS) <andrei.botila@oss.nxp.com>; David S. Miller <davem@davemloft.net>; linux-crypto@vger.kernel.org; linux-
> arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-s390@vger.kernel.org;
> x86@kernel.org; linux-arm-kernel@axis.com; Andrei Botila <andrei.botila@nxp.com>; Antoine Tenart <antoine.tenart@bootlin.com>
> Subject: Re: [PATCH 19/22] crypto: inside-secure - add check for xts input length equal to zero
>
> <<< External Email >>>
> On 8/10/2020 4:45 PM, Herbert Xu wrote:
> > On Mon, Aug 10, 2020 at 10:20:20AM +0000, Van Leeuwen, Pascal wrote:
> >>
> >> With all due respect, but this makes no sense.
> >
> > I agree.  This is a lot of churn for no gain.
> >
> I would say the gain is that all skcipher algorithms would behave the same
> when input length equals zero - i.e. treat the request as a no-op.
>
XTS already behaves differently because it can accept any byte amount as long
as it is not in the range 0 -16. So far, you got an EINVAL error for lengths < 16.
The special exception on top of that for length 0 does not improve anything.

Treating a request of length 0 as a no-op is not a useful feature here, as there
is no use case where that would make sense. XTS encrypts blocks (usually disk
sectors), and cannot be chained. So an attempt to encrypt a zero length block
is most certainly some kind of error (e.g. trying to use XTS for something it
was not designed to do - big security mistake!).

> We can't say "no input" has any meaning to the other skcipher algorithms,
> but the convention is to accept this case and just return 0.
> I don't see why XTS has to be handled differently.
>
I don't see why you would blindly follow some historical convention ...
unless maybe there was some existing real use case that would benefit?

BTW: for generic ciphers I could think of some use cases where the zero
length request being a no-op makes sense if the application does not
bother to check how much data it has gathered to process (which may be
nothing), but I can't see how this could apply to XTS, being block-based.

> Thanks,
> Horia

Regards,
Pascal van Leeuwen
Silicon IP Architect Multi-Protocol Engines, Rambus Security
Rambus ROTW Holding BV
+31-73 6581953

Note: The Inside Secure/Verimatrix Silicon IP team was recently acquired by Rambus.
Please be so kind to update your e-mail address book with my new e-mail address.


** This message and any attachments are for the sole use of the intended recipient(s). It may contain information that is confidential and privileged. If you are not the intended recipient of this message, you are prohibited from printing, copying, forwarding or saving it. Please delete the message and attachments and notify the sender immediately. **

Rambus Inc.<http://www.rambus.com>

^ permalink raw reply

* [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct
From: Scott Cheloha @ 2020-08-11  1:51 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nathan Lynch, Laurent Dufour, David Hildenbrand, Michal Suchanek,
	Rick Lindsley

At memory hot-remove time we can retrieve an LMB's nid from its
corresponding memory_block.  There is no need to store the nid
in multiple locations.

Note that lmb_to_memblock() uses find_memory_block() to get the
corresponding memory_block.  As find_memory_block() runs in sub-linear
time this approach is negligibly slower than what we do at present.

In exchange for this lookup at hot-remove time we no longer need to
call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
spares us an O(n^2) initialization during boot.

On systems with many LMBs that initialization overhead is palpable and
disruptive.  For example, on a box with 249854 LMBs we're seeing
drmem_init() take upwards of 30 seconds to complete:

[   53.721639] drmem: initializing drmem v2
[   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
[   80.604377] Modules linked in:
[   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
[   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR: 0000000000000000
[   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
[   80.604412] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d
[   80.604431] CFAR: c0000000000a4a38 IRQMASK: 0
[   80.604431] GPR00: c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30
[   80.604431] GPR04: 0000000000000000 c000000000f4095a 000000000000002f 0000000010000000
[   80.604431] GPR08: c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001
[   80.604431] GPR12: 0000000000000000 c00000001e8ec200
[   80.604477] NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0
[   80.604486] LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0
[   80.604492] Call Trace:
[   80.604498] [c0002dbff8493ac0] [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
[   80.604509] [c0002dbff8493b20] [c000000000087c10] memory_add_physaddr_to_nid+0x20/0x60
[   80.604521] [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0
[   80.604530] [c0002dbff8493c10] [c000000000010154] do_one_initcall+0x64/0x2c0
[   80.604540] [c0002dbff8493ce0] [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0
[   80.604550] [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148
[   80.604560] [c0002dbff8493e20] [c00000000000b648] ret_from_kernel_thread+0x5c/0x74
[   80.604567] Instruction dump:
[   80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018 3908ffe8 7d094214
[   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e9490000 7fbe5040
[   89.047390] drmem: 249854 LMB(s)

With a patched kernel on the same machine we're no longer seeing the
soft lockup.  drmem_init() now completes in negligible time, even when
the LMB count is large.

Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
---
v1:
 - RFC

v2:
 - Adjusted commit message.
 - Miscellaneous cleanup.

v3:
 - Correct issue found by Laurent Dufour <ldufour@linux.vnet.ibm.com>:
   - Add missing put_device() call in dlpar_remove_lmb() for the
     lmb's associated mem_block.

 arch/powerpc/include/asm/drmem.h              | 21 ----------------
 arch/powerpc/mm/drmem.c                       |  6 +----
 .../platforms/pseries/hotplug-memory.c        | 24 ++++++++++++-------
 3 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 414d209f45bb..34e4e9b257f5 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -13,9 +13,6 @@ struct drmem_lmb {
 	u32     drc_index;
 	u32     aa_index;
 	u32     flags;
-#ifdef CONFIG_MEMORY_HOTPLUG
-	int	nid;
-#endif
 };
 
 struct drmem_lmb_info {
@@ -104,22 +101,4 @@ static inline void invalidate_lmb_associativity_index(struct drmem_lmb *lmb)
 	lmb->aa_index = 0xffffffff;
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-	lmb->nid = memory_add_physaddr_to_nid(lmb->base_addr);
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-	lmb->nid = -1;
-}
-#else
-static inline void lmb_set_nid(struct drmem_lmb *lmb)
-{
-}
-static inline void lmb_clear_nid(struct drmem_lmb *lmb)
-{
-}
-#endif
-
 #endif /* _ASM_POWERPC_LMB_H */
diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
index 59327cefbc6a..873fcfc7b875 100644
--- a/arch/powerpc/mm/drmem.c
+++ b/arch/powerpc/mm/drmem.c
@@ -362,10 +362,8 @@ static void __init init_drmem_v1_lmbs(const __be32 *prop)
 	if (!drmem_info->lmbs)
 		return;
 
-	for_each_drmem_lmb(lmb) {
+	for_each_drmem_lmb(lmb)
 		read_drconf_v1_cell(lmb, &prop);
-		lmb_set_nid(lmb);
-	}
 }
 
 static void __init init_drmem_v2_lmbs(const __be32 *prop)
@@ -410,8 +408,6 @@ static void __init init_drmem_v2_lmbs(const __be32 *prop)
 
 			lmb->aa_index = dr_cell.aa_index;
 			lmb->flags = dr_cell.flags;
-
-			lmb_set_nid(lmb);
 		}
 	}
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5ace2f9a277e..e34326d22400 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -356,25 +356,32 @@ static int dlpar_add_lmb(struct drmem_lmb *);
 
 static int dlpar_remove_lmb(struct drmem_lmb *lmb)
 {
+	struct memory_block *mem_block;
 	unsigned long block_sz;
 	int rc;
 
 	if (!lmb_is_removable(lmb))
 		return -EINVAL;
 
+	mem_block = lmb_to_memblock(lmb);
+	if (mem_block == NULL)
+		return -EINVAL;
+
 	rc = dlpar_offline_lmb(lmb);
-	if (rc)
+	if (rc) {
+		put_device(&mem_block->dev);
 		return rc;
+	}
 
 	block_sz = pseries_memory_block_size();
 
-	__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+	__remove_memory(mem_block->nid, lmb->base_addr, block_sz);
+	put_device(&mem_block->dev);
 
 	/* Update memory regions for memory remove */
 	memblock_remove(lmb->base_addr, block_sz);
 
 	invalidate_lmb_associativity_index(lmb);
-	lmb_clear_nid(lmb);
 	lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 
 	return 0;
@@ -631,7 +638,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int rc;
+	int nid, rc;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -642,11 +649,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		return rc;
 	}
 
-	lmb_set_nid(lmb);
 	block_sz = memory_block_size_bytes();
 
+	/* Find the node id for this address. */
+	nid = memory_add_physaddr_to_nid(lmb->base_addr);
+
 	/* Add the memory */
-	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
@@ -654,9 +663,8 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
-		__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+		__remove_memory(nid, lmb->base_addr, block_sz);
 		invalidate_lmb_associativity_index(lmb);
-		lmb_clear_nid(lmb);
 	} else {
 		lmb->flags |= DRCONF_MEM_ASSIGNED;
 	}
-- 
2.24.1


^ permalink raw reply related

* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-08-11  3:47 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-205183-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #6 from Michael Ellerman (michael@ellerman.id.au) ---
Fixed in 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack expansion for the
signal frame")

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [PATCH] powerpc: kvm: Increase HDEC threshold to enter guest
From: David Gibson @ 2020-08-11  4:08 UTC (permalink / raw)
  To: paulus, mpe; +Cc: kvm-ppc, linuxppc-dev, linux-kernel, kvm, David Gibson

Before entering a guest, we need to set the HDEC to pull us out again
when the guest's time is up.  This needs some care, though, because the
HDEC is edge triggered, which means that if it expires before entering the
guest, the interrupt will be lost, meaning we stay in the guest
indefinitely (in practice, until the the hard lockup detector pulls us out
with an NMI).

For the POWER9, independent threads mode specific path, we attempt to
prevent that, by testing time has already expired before setting the HDEC
in kvmhv_load_regs_and_go().  However, that doesn't account for the case
where the timer expires between that test and the actual guest entry.
Preliminary instrumentation suggests that can take as long as 1.5µs under
certain load conditions, and simply checking the HDEC value we're going to
load is positive isn't enough to guarantee that leeway.

That test here is sometimes masked by a test in kvmhv_p9_guest_entry(), its
caller.  That checks that the remaining time is at 1µs.  However as noted
above that doesn't appear to be sufficient in all circumstances even
from the point HDEC is set, let alone this earlier point.

Therefore, increase the threshold we check for in both locations to 4µs
(2048 timebase ticks).  This is a pretty crude approach, but it addresses
a real problem where guest load can trigger a host hard lockup.

We're hoping to refine this in future by gathering more data on exactly
how long these paths can take, and possibly by moving the check closer to
the actual guest entry point to reduce the variance.  Getting the details
for that might take some time however.

NOTE: For reasons I haven't yet tracked down yet, I haven't actually
managed to reproduce this on current upstream.  I have reproduced it on
RHEL kernels without obvious differences in this area.  I'm still trying
to determine what the cause of that difference is, but I think it's worth
applying this change as a precaution in the interim.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 arch/powerpc/kvm/book3s_hv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 0f83f39a2bd2..65a92dd890cb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3435,7 +3435,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	unsigned long host_pidr = mfspr(SPRN_PID);
 
 	hdec = time_limit - mftb();
-	if (hdec < 0)
+	if (hdec < 2048)
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	mtspr(SPRN_HDEC, hdec);
 
@@ -3564,7 +3564,7 @@ int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
 
 	dec = mfspr(SPRN_DEC);
 	tb = mftb();
-	if (dec < 512)
+	if (dec < 2048)
 		return BOOK3S_INTERRUPT_HV_DECREMENTER;
 	local_paca->kvm_hstate.dec_expires = dec + tb;
 	if (local_paca->kvm_hstate.dec_expires < time_limit)
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH] powerpc/pseries/hotplug-cpu: increase wait time for vCPU death
From: Michael Roth @ 2020-08-11  5:39 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Lynch
  Cc: linuxppc-dev, Greg Kurz, Thiago Jung Bauermann, Cedric Le Goater
In-Reply-To: <87mu37ylzu.fsf@linux.ibm.com>

Quoting Nathan Lynch (2020-08-07 02:05:09)
> Hi everyone,
> 
> Michael Ellerman <mpe@ellerman.id.au> writes:
> > Greg Kurz <groug@kaod.org> writes:
> >> On Tue, 04 Aug 2020 23:35:10 +1000
> >> Michael Ellerman <mpe@ellerman.id.au> wrote:
> >>> Spinning forever seems like a bad idea, but as has been demonstrated at
> >>> least twice now, continuing when we don't know the state of the other
> >>> CPU can lead to straight up crashes.
> >>> 
> >>> So I think I'm persuaded that it's preferable to have the kernel stuck
> >>> spinning rather than oopsing.
> >>> 
> >>
> >> +1
> >>
> >>> I'm 50/50 on whether we should have a cond_resched() in the loop. My
> >>> first instinct is no, if we're stuck here for 20s a stack trace would be
> >>> good. But then we will probably hit that on some big and/or heavily
> >>> loaded machine.
> >>> 
> >>> So possibly we should call cond_resched() but have some custom logic in
> >>> the loop to print a warning if we are stuck for more than some
> >>> sufficiently long amount of time.
> >>
> >> How long should that be ?
> >
> > Yeah good question.
> >
> > I guess step one would be seeing how long it can take on the 384 vcpu
> > machine. And we can probably test on some other big machines.
> >
> > Hopefully Nathan can give us some idea of how long he's seen it take on
> > large systems? I know he was concerned about the 20s timeout of the
> > softlockup detector.
> 
> Maybe I'm not quite clear what this is referring to, but I don't think
> stop-self/query-stopped-state latency increases with processor count, at
> least not on PowerVM. And IIRC I was concerned with the earlier patch's
> potential for causing the softlockup watchdog to rightly complain by
> polling the stopped state without ever scheduling away.
> 
> The fact that smp_query_cpu_stopped() kind of collapses the two distinct
> results from the query-cpu-stopped-state RTAS call into one return value
> may make it harder than necessary to reason about the questions around
> cond_resched() and whether to warn.
> 
> Sorry to pull this stunt but I have had some code sitting in a neglected
> branch that I think gets the logic around this right.
> 
> What we should have is a simple C wrapper for the RTAS call that reflects the
> architected inputs and outputs:
> 
> ================================================================
> (-- rtas.c --)
> 
> /**
>  * rtas_query_cpu_stopped_state() - Call RTAS query-cpu-stopped-state.
>  * @hwcpu: Identifies the processor thread to be queried.
>  * @status: Pointer to status, valid only on success.
>  *
>  * Determine whether the given processor thread is in the stopped
>  * state.  If successful and @status is non-NULL, the thread's status
>  * is stored to @status.
>  *
>  * Return:
>  * * 0   - Success
>  * * -1  - Hardware error
>  * * -2  - Busy, try again later
>  */
> int rtas_query_cpu_stopped_state(unsigned int hwcpu, unsigned int *status)
> {
>        unsigned int cpu_status;
>        int token;
>        int fwrc;
> 
>        token = rtas_token("query-cpu-stopped-state");
> 
>        fwrc = rtas_call(token, 1, 2, &cpu_status, hwcpu);
>        if (fwrc != 0)
>                goto out;
> 
>        if (status != NULL)
>                *status = cpu_status;
> out:
>        return fwrc;
> }
> ================================================================
> 
> 
> And then a utility function that waits for the remote thread to enter
> stopped state, with higher-level logic for rescheduling and warning. The
> fact that smp_query_cpu_stopped() currently does not handle a -2/busy
> status is a bug, fixed below by using rtas_busy_delay(). Note the
> justification for the explicit cond_resched() in the outer loop:
> 
> ================================================================
> (-- rtas.h --)
> 
> /* query-cpu-stopped-state CPU_status */
> #define RTAS_QCSS_STATUS_STOPPED     0
> #define RTAS_QCSS_STATUS_IN_PROGRESS 1
> #define RTAS_QCSS_STATUS_NOT_STOPPED 2
> 
> (-- pseries/hotplug-cpu.c --)
> 
> /**
>  * wait_for_cpu_stopped() - Wait for a cpu to enter RTAS stopped state.
>  */
> static void wait_for_cpu_stopped(unsigned int cpu)
> {
>        unsigned int status;
>        unsigned int hwcpu;
> 
>        hwcpu = get_hard_smp_processor_id(cpu);
> 
>        do {
>                int fwrc;
> 
>                /*
>                 * rtas_busy_delay() will yield only if RTAS returns a
>                 * busy status. Since query-cpu-stopped-state can
>                 * yield RTAS_QCSS_STATUS_IN_PROGRESS or
>                 * RTAS_QCSS_STATUS_NOT_STOPPED for an unbounded
>                 * period before the target thread stops, we must take
>                 * care to explicitly reschedule while polling.
>                 */
>                cond_resched();
> 
>                do {
>                        fwrc = rtas_query_cpu_stopped_state(hwcpu, &status);
>                } while (rtas_busy_delay(fwrc));
> 
>                if (fwrc == 0)
>                        continue;
> 
>                pr_err_ratelimited("query-cpu-stopped-state for "
>                                   "thread 0x%x returned %d\n",
>                                   hwcpu, fwrc);
>                goto out;
> 
>        } while (status == RTAS_QCSS_STATUS_NOT_STOPPED ||
>                 status == RTAS_QCSS_STATUS_IN_PROGRESS);
> 
>        if (status != RTAS_QCSS_STATUS_STOPPED) {
>                pr_err_ratelimited("query-cpu-stopped-state yielded unknown "
>                                   "status %d for thread 0x%x\n",
>                                   status, hwcpu);
>        }
> out:
>        return;
> }
> 
> [...]
> 
> static void pseries_cpu_die(unsigned int cpu)
> {
>        wait_for_cpu_stopped(cpu);
>        paca_ptrs[cpu]->cpu_start = 0;
> }
> ================================================================
> 
> wait_for_cpu_stopped() should be able to accommodate a time-based
> warning if necessary, but speaking as a likely recipient of any bug
> reports that would arise here, I'm not convinced of the need and I
> don't know what a good value would be. It's relatively easy to sample
> the stack of a task that's apparently failing to make progress, plus I
> probably would use 'perf probe' or similar to report the inputs and
> outputs for the RTAS call.

I think if we make the timeout sufficiently high like 2 minutes or so
it wouldn't hurt and if we did seem them it would probably point to an
actual bug. But I don't have a strong feeling either way.

> 
> I'm happy to make this a proper submission after I can clean it up and
> retest it, or Michael R. is welcome to appropriate it, assuming it's
> acceptable.
> 

I've given it a shot with this patch and it seems to be holding up in
testing. If we don't think the ~2 minutes warning message is needed I
can clean it up to post:

https://github.com/mdroth/linux/commit/354b8c97bf0dc1146e36aa72273f5b33fe90d09e

I'd likely break the refactoring patches out to a separate patch under
Nathan's name since it fixes a separate bug potentially.

^ permalink raw reply

* Re: [PATCH] powerpc/rtas: Restrict RTAS requests from userspace
From: Andrew Donnellan @ 2020-08-11  8:04 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: nathanl, leobras.c, Daniel Axtens
In-Reply-To: <87bljjxau2.fsf@mpe.ellerman.id.au>

On 10/8/20 4:40 pm, Michael Ellerman wrote:
> Hi ajd,
> 
> Thanks for taking care of this.
> 
> I was going to merge this as-is, but given it's fixing a long standing
> issue there's not really a big rush. So a few comments below.

Thanks for the review.

>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index a09eba03f180..ec1cae52d8bd 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -324,6 +324,23 @@ int rtas_token(const char *service)
>>   }
>>   EXPORT_SYMBOL(rtas_token);
>>   
>> +#ifdef CONFIG_PPC_RTAS_FILTER
>> +
> 
> I think this could be combined with the #ifdef block below?

I put it here to keep it next to rtas_token() as it seemed thematically 
appropriate. Anyway per below I'll probably get rid of this function 
altogether.

> 
>> +static char *rtas_token_name(int token)
>> +{
>> +	struct property *prop;
>> +
>> +	for_each_property_of_node(rtas.dev, prop) {
>> +		const __be32 *tokp = prop->value;
>> +
>> +		if (tokp && be32_to_cpu(*tokp) == token)
>> +			return prop->name;
>> +	}
>> +	return NULL;
>> +}
>> +
>> +#endif /* CONFIG_PPC_RTAS_FILTER */
>> +
>>   int rtas_service_present(const char *service)
>>   {
>>   	return rtas_token(service) != RTAS_UNKNOWN_SERVICE;
>> @@ -1110,6 +1127,184 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
>>   	return NULL;
>>   }
>>   
>> +#ifdef CONFIG_PPC_RTAS_FILTER
>> +
>> +/*
>> + * The sys_rtas syscall, as originally designed, allows root to pass
>> + * arbitrary physical addresses to RTAS calls. A number of RTAS calls
>> + * can be abused to write to arbitrary memory and do other things that
>> + * are potentially harmful to system integrity, and thus should only
>> + * be used inside the kernel and not exposed to userspace.
>> + *
>> + * All known legitimate users of the sys_rtas syscall will only ever
>> + * pass addresses that fall within the RMO buffer, and use a known
>> + * subset of RTAS calls.
>> + *
>> + * Accordingly, we filter RTAS requests to check that the call is
>> + * permitted, and that provided pointers fall within the RMO buffer.
>> + * The rtas_filters list contains an entry for each permitted call,
>> + * with the indexes of the parameters which are expected to contain
>> + * addresses and sizes of buffers allocated inside the RMO buffer.
>> + */
>> +struct rtas_filter {
>> +	const char name[32];
> 
> Using a const char * for the name would be more typical, meaning the
> strings would end up in .rodata, and could be merged with other uses of
> the same strings.

Will fix

> 
>> +
>> +	/* Indexes into the args buffer, -1 if not used */
>> +	int rmo_buf_idx1;
>> +	int rmo_size_idx1;
>> +	int rmo_buf_idx2;
>> +	int rmo_size_idx2;
> 
> The "rmo" prefix is probably unnecessary?
> 

Ack

>> +};
>> +
>> +struct rtas_filter rtas_filters[] = {
> 
> Should be static, and __ro_after_init ?
> 
>> +	{ "ibm,activate-firmware", -1, -1, -1, -1 },
> 
> Would it be worth making the indices 1-based, allowing 0 to be the
> unused value, meaning you only have to initialise the used fields?

1-based array indices are morally reprehensible. It would make it 
cleaner but I kind of prefer an obvious and clear constant for unused, idk

> It would require adjusting them before use, but there's only 4 places
> they're used, and you could probably use a macro to do the - 1.
> 
>> +	{ "ibm,configure-connector", 0, -1, 1, -1 },	/* Special cased, size 4096 */
> 
> Does it make sense to put the hard coded sizes in the structure as well?
> 
> eg. fixed_size1 = 4096,
> 
> I think that would avoid the need for any strcmps in the code.

Not quite - we still have a special case for ibm,configure-connector 
passing a base address of 0.

But yes that's a good idea.

> 
>> +	{ "display-character", -1, -1, -1, -1 },
>> +	{ "ibm,display-message", 0, -1, -1, -1 },
>> +	{ "ibm,errinjct", 2, -1, -1, -1 },		/* Fixed size of 1024 */
>> +	{ "ibm,close-errinjct", -1, -1, -1, -1 },
>> +	{ "ibm,open-errinct", -1, -1, -1, -1 },
>> +	{ "ibm,get-config-addr-info2", -1, -1, -1, -1 },
>> +	{ "ibm,get-dynamic-sensor-state", 1, -1, -1, -1 },
>> +	{ "ibm,get-indices", 2, 3, -1, -1 },
>> +	{ "get-power-level", -1, -1, -1, -1 },
>> +	{ "get-sensor-state", -1, -1, -1, -1 },
>> +	{ "ibm,get-system-parameter", 1, 2, -1, -1 },
>> +	{ "get-time-of-day", -1, -1, -1, -1 },
>> +	{ "ibm,get-vpd", 0, -1, 1, 2 },
>> +	{ "ibm,lpar-perftools", 2, 3, -1, -1 },
>> +	{ "ibm,platform-dump", 4, 5, -1, -1 },
>> +	{ "ibm,read-slot-reset-state", -1, -1, -1, -1 },
>> +	{ "ibm,scan-log-dump", 0, 1, -1, -1 },
>> +	{ "ibm,set-dynamic-indicator", 2, -1, -1, -1 },
>> +	{ "ibm,set-eeh-option", -1, -1, -1, -1 },
>> +	{ "set-indicator", -1, -1, -1, -1 },
>> +	{ "set-power-level", -1, -1, -1, -1 },
>> +	{ "set-time-for-power-on", -1, -1, -1, -1 },
>> +	{ "ibm,set-system-parameter", 1, -1, -1, -1 },
>> +	{ "set-time-of-day", -1, -1, -1, -1 },
>> +	{ "ibm,suspend-me", -1, -1, -1, -1 },
>> +	{ "ibm,update-nodes", 0, -1, -1, -1 },		/* Fixed size of 4096 */
>> +	{ "ibm,update-properties", 0, -1, -1, -1 },	/* Fixed size of 4096 */
>> +	{ "ibm,physical-attestation", 0, 1, -1, -1 },
>> +};
>> +
>> +static void dump_rtas_params(int token, int nargs, int nret,
>> +			     struct rtas_args *args)
>> +{
>> +	int i;
>> +	char *token_name = rtas_token_name(token);
>> +
>> +	pr_err_ratelimited("sys_rtas: token=0x%x (%s), nargs=%d, nret=%d (called by %s)\n",
>> +			   token, token_name ? token_name : "unknown", nargs,
>> +			   nret, current->comm);
>> +	pr_err_ratelimited("sys_rtas: args: ");
>> +
>> +	for (i = 0; i < nargs; i++) {
>> +		u32 arg = be32_to_cpu(args->args[i]);
>> +
>> +		pr_cont("%08x ", arg);
>> +		if (arg >= rtas_rmo_buf &&
>> +		    arg < (rtas_rmo_buf + RTAS_RMOBUF_MAX))
>> +			pr_cont("(buf+0x%lx) ", arg - rtas_rmo_buf);
>> +	}
> 
> This can leak the location of the RMO buf into dmesg. I know it's
> visible via /proc, but the /proc file is 0400.
> 
> So I think it's probably safer if we just don't dump the args, or their
> relation to the RMO buf.

Good point, removed.

> 
>> +
>> +	pr_cont("\n");
>> +}
>> +
>> +static bool in_rmo_buf(u32 base, u32 end)
>> +{
>> +	return base >= rtas_rmo_buf &&
>> +		base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
>> +		base <= end &&
>> +		end >= rtas_rmo_buf &&
>> +		end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
>> +}
>> +
>> +static bool block_rtas_call(int token, int nargs,
>> +			    struct rtas_args *args)
>> +{
>> +	int i;
>> +	const char *reason;
>> +	char *token_name = rtas_token_name(token);
> 
> This code isn't particularly performance critical, but I think it would
> be cleaner to do the token lookup once at init time, and store the token
> in the filter array?
> 
> Then this code would only be doing token comparisons.

Yeah that would be cleaner, can get rid of rtas_token_name().

> 
>> +
>> +	if (!token_name)
>> +		goto err_notpermitted;
>> +
>> +	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
>> +		struct rtas_filter *f = &rtas_filters[i];
>> +		u32 base, size, end;
>> +
>> +		if (strcmp(token_name, f->name))
>> +			continue;
>> +
>> +		if (f->rmo_buf_idx1 != -1) {
>> +			base = be32_to_cpu(args->args[f->rmo_buf_idx1]);
>> +			if (f->rmo_size_idx1 != -1)
>> +				size = be32_to_cpu(args->args[f->rmo_size_idx1]);
>> +			else if (!strcmp(token_name, "ibm,errinjct"))
>> +				size = 1024;
>> +			else if (!strcmp(token_name, "ibm,update-nodes") ||
>> +				 !strcmp(token_name, "ibm,update-properties") ||
>> +				 !strcmp(token_name, "ibm,configure-connector"))
>> +				size = 4096;
>> +			else
>> +				size = 1;
>> +
>> +			end = base + size - 1;
>> +			if (!in_rmo_buf(base, end)) {
>> +				reason = "address pair 1 out of range";
> 
> I don't think we need to give the user this much detail about what they
> did wrong, all cases can just print "call not permitted" IMO.

Ack

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply

* [powerpc:next-test 33/34] arch/powerpc/platforms/powermac/smp.c:933:2: error: implicit declaration of function 'low_cpu_offline_self'
From: kernel test robot @ 2020-08-11  8:27 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: clang-built-linux, kbuild-all, linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2890 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
head:   0a2900256840208c4a4248ff5900ae57990d55dc
commit: 7f24f76bc606cbae1b56a8a445a5353594c3cf18 [33/34] powerpc/smp: Move ppc_md.cpu_die() to smp_ops.cpu_offline_self()
config: powerpc-randconfig-r024-20200811 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 4f2ad15db535873dda9bfe248a2771023b64a43c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        git checkout 7f24f76bc606cbae1b56a8a445a5353594c3cf18
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/powermac/smp.c:419:13: warning: no previous prototype for function 'smp_psurge_take_timebase' [-Wmissing-prototypes]
   void __init smp_psurge_take_timebase(void)
               ^
   arch/powerpc/platforms/powermac/smp.c:419:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __init smp_psurge_take_timebase(void)
   ^
   static 
   arch/powerpc/platforms/powermac/smp.c:435:13: warning: no previous prototype for function 'smp_psurge_give_timebase' [-Wmissing-prototypes]
   void __init smp_psurge_give_timebase(void)
               ^
   arch/powerpc/platforms/powermac/smp.c:435:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void __init smp_psurge_give_timebase(void)
   ^
   static 
>> arch/powerpc/platforms/powermac/smp.c:933:2: error: implicit declaration of function 'low_cpu_offline_self' [-Werror,-Wimplicit-function-declaration]
           low_cpu_offline_self();
           ^
   arch/powerpc/platforms/powermac/smp.c:933:2: note: did you mean 'pmac_cpu_offline_self'?
   arch/powerpc/platforms/powermac/smp.c:923:13: note: 'pmac_cpu_offline_self' declared here
   static void pmac_cpu_offline_self(void)
               ^
   2 warnings and 1 error generated.

vim +/low_cpu_offline_self +933 arch/powerpc/platforms/powermac/smp.c

   922	
   923	static void pmac_cpu_offline_self(void)
   924	{
   925		int cpu = smp_processor_id();
   926	
   927		local_irq_disable();
   928		idle_task_exit();
   929		pr_debug("CPU%d offline\n", cpu);
   930		generic_set_cpu_dead(cpu);
   931		smp_wmb();
   932		mb();
 > 933		low_cpu_offline_self();
   934	}
   935	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33676 bytes --]

^ permalink raw reply

* [PATCHv7 00/12]PCI: dwc: Add the multiple PF support for DWC and Layerscape
From: Zhiqiang Hou @ 2020-08-11  9:54 UTC (permalink / raw)
  To: linux-pci, devicetree, linux-kernel, linux-arm-kernel,
	linuxppc-dev, robh+dt, bhelgaas, lorenzo.pieralisi, shawnguo,
	leoyang.li, kishon, gustavo.pimentel, roy.zang, jingoohan1,
	andrew.murray
  Cc: minghuan.Lian, Hou Zhiqiang, mingkai.hu

From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>

Add the PCIe EP multiple PF support for DWC and Layerscape, and use
a list to manage the PFs of each PCIe controller; add the doorbell
MSIX function for DWC; and refactor the Layerscape EP driver due to
some difference in Layercape platforms PCIe integration.

Hou Zhiqiang (1):
  misc: pci_endpoint_test: Add driver data for Layerscape PCIe
    controllers

Xiaowei Bao (11):
  PCI: designware-ep: Add multiple PFs support for DWC
  PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
  PCI: designware-ep: Move the function of getting MSI capability
    forward
  PCI: designware-ep: Modify MSI and MSIX CAP way of finding
  dt-bindings: pci: layerscape-pci: Add compatible strings for ls1088a
    and ls2088a
  PCI: layerscape: Fix some format issue of the code
  PCI: layerscape: Modify the way of getting capability with different
    PEX
  PCI: layerscape: Modify the MSIX to the doorbell mode
  PCI: layerscape: Add EP mode support for ls1088a and ls2088a
  arm64: dts: layerscape: Add PCIe EP node for ls1088a
  misc: pci_endpoint_test: Add LS1088a in pci_device_id table

 .../bindings/pci/layerscape-pci.txt           |   2 +
 .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi |  31 +++
 drivers/misc/pci_endpoint_test.c              |   8 +-
 .../pci/controller/dwc/pci-layerscape-ep.c    | 100 +++++--
 .../pci/controller/dwc/pcie-designware-ep.c   | 258 ++++++++++++++----
 drivers/pci/controller/dwc/pcie-designware.c  |  59 ++--
 drivers/pci/controller/dwc/pcie-designware.h  |  48 +++-
 7 files changed, 410 insertions(+), 96 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCHv7 01/12] PCI: designware-ep: Add multiple PFs support for DWC
From: Zhiqiang Hou @ 2020-08-11  9:54 UTC (permalink / raw)
  To: linux-pci, devicetree, linux-kernel, linux-arm-kernel,
	linuxppc-dev, robh+dt, bhelgaas, lorenzo.pieralisi, shawnguo,
	leoyang.li, kishon, gustavo.pimentel, roy.zang, jingoohan1,
	andrew.murray
  Cc: minghuan.Lian, Hou Zhiqiang, Xiaowei Bao, mingkai.hu
In-Reply-To: <20200811095441.7636-1-Zhiqiang.Hou@nxp.com>

From: Xiaowei Bao <xiaowei.bao@nxp.com>

Add multiple PFs support for DWC, due to different PF have different
config space, we use func_conf_select callback function to access
the different PF's config space, the different chip company need to
implement this callback function when use the DWC IP core and intend
to support multiple PFs feature.

Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
---
V7:
 - Rebase the patch without functionality change.

 .../pci/controller/dwc/pcie-designware-ep.c   | 125 ++++++++++++------
 drivers/pci/controller/dwc/pcie-designware.c  |  59 ++++++---
 drivers/pci/controller/dwc/pcie-designware.h  |  18 ++-
 3 files changed, 143 insertions(+), 59 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index 305bfec2424d..e5bd3a5ef380 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -28,12 +28,26 @@ void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
 
-static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar,
-				   int flags)
+static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no)
+{
+	unsigned int func_offset = 0;
+
+	if (ep->ops->func_conf_select)
+		func_offset = ep->ops->func_conf_select(ep, func_no);
+
+	return func_offset;
+}
+
+static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
+				   enum pci_barno bar, int flags)
 {
 	u32 reg;
+	unsigned int func_offset = 0;
+	struct dw_pcie_ep *ep = &pci->ep;
+
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
 
-	reg = PCI_BASE_ADDRESS_0 + (4 * bar);
+	reg = func_offset + PCI_BASE_ADDRESS_0 + (4 * bar);
 	dw_pcie_dbi_ro_wr_en(pci);
 	dw_pcie_writel_dbi2(pci, reg, 0x0);
 	dw_pcie_writel_dbi(pci, reg, 0x0);
@@ -46,7 +60,12 @@ static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar,
 
 void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
 {
-	__dw_pcie_ep_reset_bar(pci, bar, 0);
+	u8 func_no, funcs;
+
+	funcs = pci->ep.epc->max_functions;
+
+	for (func_no = 0; func_no < funcs; func_no++)
+		__dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
 }
 
 static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
@@ -54,28 +73,31 @@ static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
 {
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+	unsigned int func_offset = 0;
+
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
 
 	dw_pcie_dbi_ro_wr_en(pci);
-	dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
-	dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid);
-	dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid);
-	dw_pcie_writeb_dbi(pci, PCI_CLASS_PROG, hdr->progif_code);
-	dw_pcie_writew_dbi(pci, PCI_CLASS_DEVICE,
+	dw_pcie_writew_dbi(pci, func_offset + PCI_VENDOR_ID, hdr->vendorid);
+	dw_pcie_writew_dbi(pci, func_offset + PCI_DEVICE_ID, hdr->deviceid);
+	dw_pcie_writeb_dbi(pci, func_offset + PCI_REVISION_ID, hdr->revid);
+	dw_pcie_writeb_dbi(pci, func_offset + PCI_CLASS_PROG, hdr->progif_code);
+	dw_pcie_writew_dbi(pci, func_offset + PCI_CLASS_DEVICE,
 			   hdr->subclass_code | hdr->baseclass_code << 8);
-	dw_pcie_writeb_dbi(pci, PCI_CACHE_LINE_SIZE,
+	dw_pcie_writeb_dbi(pci, func_offset + PCI_CACHE_LINE_SIZE,
 			   hdr->cache_line_size);
-	dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_VENDOR_ID,
+	dw_pcie_writew_dbi(pci, func_offset + PCI_SUBSYSTEM_VENDOR_ID,
 			   hdr->subsys_vendor_id);
-	dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_ID, hdr->subsys_id);
-	dw_pcie_writeb_dbi(pci, PCI_INTERRUPT_PIN,
+	dw_pcie_writew_dbi(pci, func_offset + PCI_SUBSYSTEM_ID, hdr->subsys_id);
+	dw_pcie_writeb_dbi(pci, func_offset + PCI_INTERRUPT_PIN,
 			   hdr->interrupt_pin);
 	dw_pcie_dbi_ro_wr_dis(pci);
 
 	return 0;
 }
 
-static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, enum pci_barno bar,
-				  dma_addr_t cpu_addr,
+static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, u8 func_no,
+				  enum pci_barno bar, dma_addr_t cpu_addr,
 				  enum dw_pcie_as_type as_type)
 {
 	int ret;
@@ -88,7 +110,7 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, enum pci_barno bar,
 		return -EINVAL;
 	}
 
-	ret = dw_pcie_prog_inbound_atu(pci, free_win, bar, cpu_addr,
+	ret = dw_pcie_prog_inbound_atu(pci, func_no, free_win, bar, cpu_addr,
 				       as_type);
 	if (ret < 0) {
 		dev_err(pci->dev, "Failed to program IB window\n");
@@ -101,7 +123,8 @@ static int dw_pcie_ep_inbound_atu(struct dw_pcie_ep *ep, enum pci_barno bar,
 	return 0;
 }
 
-static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t phys_addr,
+static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, u8 func_no,
+				   phys_addr_t phys_addr,
 				   u64 pci_addr, size_t size)
 {
 	u32 free_win;
@@ -113,8 +136,8 @@ static int dw_pcie_ep_outbound_atu(struct dw_pcie_ep *ep, phys_addr_t phys_addr,
 		return -EINVAL;
 	}
 
-	dw_pcie_prog_outbound_atu(pci, free_win, PCIE_ATU_TYPE_MEM,
-				  phys_addr, pci_addr, size);
+	dw_pcie_prog_ep_outbound_atu(pci, func_no, free_win, PCIE_ATU_TYPE_MEM,
+				     phys_addr, pci_addr, size);
 
 	set_bit(free_win, ep->ob_window_map);
 	ep->outbound_addr[free_win] = phys_addr;
@@ -130,7 +153,7 @@ static void dw_pcie_ep_clear_bar(struct pci_epc *epc, u8 func_no,
 	enum pci_barno bar = epf_bar->barno;
 	u32 atu_index = ep->bar_to_atu[bar];
 
-	__dw_pcie_ep_reset_bar(pci, bar, epf_bar->flags);
+	__dw_pcie_ep_reset_bar(pci, func_no, bar, epf_bar->flags);
 
 	dw_pcie_disable_atu(pci, atu_index, DW_PCIE_REGION_INBOUND);
 	clear_bit(atu_index, ep->ib_window_map);
@@ -147,14 +170,20 @@ static int dw_pcie_ep_set_bar(struct pci_epc *epc, u8 func_no,
 	size_t size = epf_bar->size;
 	int flags = epf_bar->flags;
 	enum dw_pcie_as_type as_type;
-	u32 reg = PCI_BASE_ADDRESS_0 + (4 * bar);
+	u32 reg;
+	unsigned int func_offset = 0;
+
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = PCI_BASE_ADDRESS_0 + (4 * bar) + func_offset;
 
 	if (!(flags & PCI_BASE_ADDRESS_SPACE))
 		as_type = DW_PCIE_AS_MEM;
 	else
 		as_type = DW_PCIE_AS_IO;
 
-	ret = dw_pcie_ep_inbound_atu(ep, bar, epf_bar->phys_addr, as_type);
+	ret = dw_pcie_ep_inbound_atu(ep, func_no, bar,
+				     epf_bar->phys_addr, as_type);
 	if (ret)
 		return ret;
 
@@ -213,7 +242,7 @@ static int dw_pcie_ep_map_addr(struct pci_epc *epc, u8 func_no,
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 
-	ret = dw_pcie_ep_outbound_atu(ep, addr, pci_addr, size);
+	ret = dw_pcie_ep_outbound_atu(ep, func_no, addr, pci_addr, size);
 	if (ret) {
 		dev_err(pci->dev, "Failed to enable address\n");
 		return ret;
@@ -227,11 +256,14 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 func_no)
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	u32 val, reg;
+	unsigned int func_offset = 0;
 
 	if (!ep->msi_cap)
 		return -EINVAL;
 
-	reg = ep->msi_cap + PCI_MSI_FLAGS;
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
 	val = dw_pcie_readw_dbi(pci, reg);
 	if (!(val & PCI_MSI_FLAGS_ENABLE))
 		return -EINVAL;
@@ -246,11 +278,14 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts)
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	u32 val, reg;
+	unsigned int func_offset = 0;
 
 	if (!ep->msi_cap)
 		return -EINVAL;
 
-	reg = ep->msi_cap + PCI_MSI_FLAGS;
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
 	val = dw_pcie_readw_dbi(pci, reg);
 	val &= ~PCI_MSI_FLAGS_QMASK;
 	val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK;
@@ -266,11 +301,14 @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	u32 val, reg;
+	unsigned int func_offset = 0;
 
 	if (!ep->msix_cap)
 		return -EINVAL;
 
-	reg = ep->msix_cap + PCI_MSIX_FLAGS;
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
 	val = dw_pcie_readw_dbi(pci, reg);
 	if (!(val & PCI_MSIX_FLAGS_ENABLE))
 		return -EINVAL;
@@ -286,23 +324,26 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts,
 	struct dw_pcie_ep *ep = epc_get_drvdata(epc);
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	u32 val, reg;
+	unsigned int func_offset = 0;
 
 	if (!ep->msix_cap)
 		return -EINVAL;
 
 	dw_pcie_dbi_ro_wr_en(pci);
 
-	reg = ep->msix_cap + PCI_MSIX_FLAGS;
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
 	val = dw_pcie_readw_dbi(pci, reg);
 	val &= ~PCI_MSIX_FLAGS_QSIZE;
 	val |= interrupts;
 	dw_pcie_writew_dbi(pci, reg, val);
 
-	reg = ep->msix_cap + PCI_MSIX_TABLE;
+	reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
 	val = offset | bir;
 	dw_pcie_writel_dbi(pci, reg, val);
 
-	reg = ep->msix_cap + PCI_MSIX_PBA;
+	reg = ep->msix_cap + func_offset + PCI_MSIX_PBA;
 	val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
 	dw_pcie_writel_dbi(pci, reg, val);
 
@@ -387,6 +428,7 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	struct pci_epc *epc = ep->epc;
 	unsigned int aligned_offset;
+	unsigned int func_offset = 0;
 	u16 msg_ctrl, msg_data;
 	u32 msg_addr_lower, msg_addr_upper, reg;
 	u64 msg_addr;
@@ -396,20 +438,22 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 	if (!ep->msi_cap)
 		return -EINVAL;
 
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
 	/* Raise MSI per the PCI Local Bus Specification Revision 3.0, 6.8.1. */
-	reg = ep->msi_cap + PCI_MSI_FLAGS;
+	reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
 	msg_ctrl = dw_pcie_readw_dbi(pci, reg);
 	has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT);
-	reg = ep->msi_cap + PCI_MSI_ADDRESS_LO;
+	reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_LO;
 	msg_addr_lower = dw_pcie_readl_dbi(pci, reg);
 	if (has_upper) {
-		reg = ep->msi_cap + PCI_MSI_ADDRESS_HI;
+		reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_HI;
 		msg_addr_upper = dw_pcie_readl_dbi(pci, reg);
-		reg = ep->msi_cap + PCI_MSI_DATA_64;
+		reg = ep->msi_cap + func_offset + PCI_MSI_DATA_64;
 		msg_data = dw_pcie_readw_dbi(pci, reg);
 	} else {
 		msg_addr_upper = 0;
-		reg = ep->msi_cap + PCI_MSI_DATA_32;
+		reg = ep->msi_cap + func_offset + PCI_MSI_DATA_32;
 		msg_data = dw_pcie_readw_dbi(pci, reg);
 	}
 	aligned_offset = msg_addr_lower & (epc->mem->window.page_size - 1);
@@ -428,11 +472,12 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 }
 
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
-			     u16 interrupt_num)
+			      u16 interrupt_num)
 {
 	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
 	struct pci_epf_msix_tbl *msix_tbl;
 	struct pci_epc *epc = ep->epc;
+	unsigned int func_offset = 0;
 	u32 reg, msg_data, vec_ctrl;
 	unsigned int aligned_offset;
 	u32 tbl_offset;
@@ -440,7 +485,9 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
 	int ret;
 	u8 bir;
 
-	reg = ep->msix_cap + PCI_MSIX_TABLE;
+	func_offset = dw_pcie_ep_func_select(ep, func_no);
+
+	reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
 	tbl_offset = dw_pcie_readl_dbi(pci, reg);
 	bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
 	tbl_offset &= PCI_MSIX_TABLE_OFFSET;
@@ -599,13 +646,13 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
 	ep->epc = epc;
 	epc_set_drvdata(epc, ep);
 
-	if (ep->ops->ep_init)
-		ep->ops->ep_init(ep);
-
 	ret = of_property_read_u8(np, "max-functions", &epc->max_functions);
 	if (ret < 0)
 		epc->max_functions = 1;
 
+	if (ep->ops->ep_init)
+		ep->ops->ep_init(ep);
+
 	ret = pci_epc_mem_init(epc, ep->phys_base, ep->addr_size,
 			       ep->page_size);
 	if (ret < 0) {
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index b723e0cc41fb..8094e8a72859 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -239,9 +239,10 @@ static void dw_pcie_writel_ob_unroll(struct dw_pcie *pci, u32 index, u32 reg,
 	dw_pcie_writel_atu(pci, offset + reg, val);
 }
 
-static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index,
-					     int type, u64 cpu_addr,
-					     u64 pci_addr, u32 size)
+static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
+					     int index, int type,
+					     u64 cpu_addr, u64 pci_addr,
+					     u32 size)
 {
 	u32 retries, val;
 	u64 limit_addr = cpu_addr + size - 1;
@@ -259,7 +260,7 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index,
 	dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_UPPER_TARGET,
 				 upper_32_bits(pci_addr));
 	dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1,
-				 type);
+				 type | PCIE_ATU_FUNC_NUM(func_no));
 	dw_pcie_writel_ob_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
 				 PCIE_ATU_ENABLE);
 
@@ -278,8 +279,9 @@ static void dw_pcie_prog_outbound_atu_unroll(struct dw_pcie *pci, int index,
 	dev_err(pci->dev, "Outbound iATU is not being enabled\n");
 }
 
-void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
-			       u64 cpu_addr, u64 pci_addr, u32 size)
+static void __dw_pcie_prog_outbound_atu(struct dw_pcie *pci, u8 func_no,
+					int index, int type, u64 cpu_addr,
+					u64 pci_addr, u32 size)
 {
 	u32 retries, val;
 
@@ -287,8 +289,8 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
 		cpu_addr = pci->ops->cpu_addr_fixup(pci, cpu_addr);
 
 	if (pci->iatu_unroll_enabled) {
-		dw_pcie_prog_outbound_atu_unroll(pci, index, type, cpu_addr,
-						 pci_addr, size);
+		dw_pcie_prog_outbound_atu_unroll(pci, func_no, index, type,
+						 cpu_addr, pci_addr, size);
 		return;
 	}
 
@@ -304,7 +306,8 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
 			   lower_32_bits(pci_addr));
 	dw_pcie_writel_dbi(pci, PCIE_ATU_UPPER_TARGET,
 			   upper_32_bits(pci_addr));
-	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type);
+	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
+			   PCIE_ATU_FUNC_NUM(func_no));
 	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE);
 
 	/*
@@ -321,6 +324,21 @@ void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
 	dev_err(pci->dev, "Outbound iATU is not being enabled\n");
 }
 
+void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index, int type,
+			       u64 cpu_addr, u64 pci_addr, u32 size)
+{
+	__dw_pcie_prog_outbound_atu(pci, 0, index, type,
+				    cpu_addr, pci_addr, size);
+}
+
+void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8 func_no, int index,
+				  int type, u64 cpu_addr, u64 pci_addr,
+				  u32 size)
+{
+	__dw_pcie_prog_outbound_atu(pci, func_no, index, type,
+				    cpu_addr, pci_addr, size);
+}
+
 static u32 dw_pcie_readl_ib_unroll(struct dw_pcie *pci, u32 index, u32 reg)
 {
 	u32 offset = PCIE_GET_ATU_INB_UNR_REG_OFFSET(index);
@@ -336,8 +354,8 @@ static void dw_pcie_writel_ib_unroll(struct dw_pcie *pci, u32 index, u32 reg,
 	dw_pcie_writel_atu(pci, offset + reg, val);
 }
 
-static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
-					   int bar, u64 cpu_addr,
+static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, u8 func_no,
+					   int index, int bar, u64 cpu_addr,
 					   enum dw_pcie_as_type as_type)
 {
 	int type;
@@ -359,8 +377,10 @@ static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
 		return -EINVAL;
 	}
 
-	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, type);
+	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL1, type |
+				 PCIE_ATU_FUNC_NUM(func_no));
 	dw_pcie_writel_ib_unroll(pci, index, PCIE_ATU_UNR_REGION_CTRL2,
+				 PCIE_ATU_FUNC_NUM_MATCH_EN |
 				 PCIE_ATU_ENABLE |
 				 PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
 
@@ -381,14 +401,15 @@ static int dw_pcie_prog_inbound_atu_unroll(struct dw_pcie *pci, int index,
 	return -EBUSY;
 }
 
-int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
-			     u64 cpu_addr, enum dw_pcie_as_type as_type)
+int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, u8 func_no, int index,
+			     int bar, u64 cpu_addr,
+			     enum dw_pcie_as_type as_type)
 {
 	int type;
 	u32 retries, val;
 
 	if (pci->iatu_unroll_enabled)
-		return dw_pcie_prog_inbound_atu_unroll(pci, index, bar,
+		return dw_pcie_prog_inbound_atu_unroll(pci, func_no, index, bar,
 						       cpu_addr, as_type);
 
 	dw_pcie_writel_dbi(pci, PCIE_ATU_VIEWPORT, PCIE_ATU_REGION_INBOUND |
@@ -407,9 +428,11 @@ int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
 		return -EINVAL;
 	}
 
-	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type);
-	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE
-			   | PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
+	dw_pcie_writel_dbi(pci, PCIE_ATU_CR1, type |
+			   PCIE_ATU_FUNC_NUM(func_no));
+	dw_pcie_writel_dbi(pci, PCIE_ATU_CR2, PCIE_ATU_ENABLE |
+			   PCIE_ATU_FUNC_NUM_MATCH_EN |
+			   PCIE_ATU_BAR_MODE_ENABLE | (bar << 8));
 
 	/*
 	 * Make sure ATU enable takes effect before any subsequent config
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index f911760dcc69..89f8271ec5ee 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -80,9 +80,11 @@
 #define PCIE_ATU_TYPE_IO		0x2
 #define PCIE_ATU_TYPE_CFG0		0x4
 #define PCIE_ATU_TYPE_CFG1		0x5
+#define PCIE_ATU_FUNC_NUM(pf)           ((pf) << 20)
 #define PCIE_ATU_CR2			0x908
 #define PCIE_ATU_ENABLE			BIT(31)
 #define PCIE_ATU_BAR_MODE_ENABLE	BIT(30)
+#define PCIE_ATU_FUNC_NUM_MATCH_EN      BIT(19)
 #define PCIE_ATU_LOWER_BASE		0x90C
 #define PCIE_ATU_UPPER_BASE		0x910
 #define PCIE_ATU_LIMIT			0x914
@@ -215,6 +217,14 @@ struct dw_pcie_ep_ops {
 	int	(*raise_irq)(struct dw_pcie_ep *ep, u8 func_no,
 			     enum pci_epc_irq_type type, u16 interrupt_num);
 	const struct pci_epc_features* (*get_features)(struct dw_pcie_ep *ep);
+	/*
+	 * Provide a method to implement the different func config space
+	 * access for different platform, if different func have different
+	 * offset, return the offset of func. if use write a register way
+	 * return a 0, and implement code in callback function of platform
+	 * driver.
+	 */
+	unsigned int (*func_conf_select)(struct dw_pcie_ep *ep, u8 func_no);
 };
 
 struct dw_pcie_ep {
@@ -290,8 +300,12 @@ int dw_pcie_wait_for_link(struct dw_pcie *pci);
 void dw_pcie_prog_outbound_atu(struct dw_pcie *pci, int index,
 			       int type, u64 cpu_addr, u64 pci_addr,
 			       u32 size);
-int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, int index, int bar,
-			     u64 cpu_addr, enum dw_pcie_as_type as_type);
+void dw_pcie_prog_ep_outbound_atu(struct dw_pcie *pci, u8 func_no, int index,
+				  int type, u64 cpu_addr, u64 pci_addr,
+				  u32 size);
+int dw_pcie_prog_inbound_atu(struct dw_pcie *pci, u8 func_no, int index,
+			     int bar, u64 cpu_addr,
+			     enum dw_pcie_as_type as_type);
 void dw_pcie_disable_atu(struct dw_pcie *pci, int index,
 			 enum dw_pcie_region_type type);
 void dw_pcie_setup(struct dw_pcie *pci);
-- 
2.17.1


^ permalink raw reply related

* [PATCHv7 02/12] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
From: Zhiqiang Hou @ 2020-08-11  9:54 UTC (permalink / raw)
  To: linux-pci, devicetree, linux-kernel, linux-arm-kernel,
	linuxppc-dev, robh+dt, bhelgaas, lorenzo.pieralisi, shawnguo,
	leoyang.li, kishon, gustavo.pimentel, roy.zang, jingoohan1,
	andrew.murray
  Cc: minghuan.Lian, Hou Zhiqiang, Xiaowei Bao, mingkai.hu
In-Reply-To: <20200811095441.7636-1-Zhiqiang.Hou@nxp.com>

From: Xiaowei Bao <xiaowei.bao@nxp.com>

Add the doorbell mode of MSI-X in DWC EP driver.

Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
---
V7:
 - Rebase the patch without functionality change.

 drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++++++++++++++
 drivers/pci/controller/dwc/pcie-designware.h    | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
index e5bd3a5ef380..e76b504ed465 100644
--- a/drivers/pci/controller/dwc/pcie-designware-ep.c
+++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
@@ -471,6 +471,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 	return 0;
 }
 
+int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
+				       u16 interrupt_num)
+{
+	struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+	u32 msg_data;
+
+	msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
+		   (interrupt_num - 1);
+
+	dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
+
+	return 0;
+}
+
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
 			      u16 interrupt_num)
 {
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 89f8271ec5ee..745b4938225a 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -97,6 +97,9 @@
 #define PCIE_MISC_CONTROL_1_OFF		0x8BC
 #define PCIE_DBI_RO_WR_EN		BIT(0)
 
+#define PCIE_MSIX_DOORBELL		0x948
+#define PCIE_MSIX_DOORBELL_PF_SHIFT	24
+
 #define PCIE_PL_CHK_REG_CONTROL_STATUS			0xB20
 #define PCIE_PL_CHK_REG_CHK_REG_START			BIT(0)
 #define PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS		BIT(1)
@@ -434,6 +437,8 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
 			     u8 interrupt_num);
 int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
 			     u16 interrupt_num);
+int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
+				       u16 interrupt_num);
 void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
 #else
 static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
@@ -475,6 +480,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
 	return 0;
 }
 
+static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,
+						     u8 func_no,
+						     u16 interrupt_num)
+{
+	return 0;
+}
+
 static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
 {
 }
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox