LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/3] powerpc/configs/dpaa: enable the Cortina PHY driver
From: Camelia Groza @ 2018-07-18 11:46 UTC (permalink / raw)
  To: robh+dt, mark.rutland, benh
  Cc: paulus, mpe, devicetree, linuxppc-dev, linux-kernel,
	Camelia Groza
In-Reply-To: <cover.1531903211.git.camelia.groza@nxp.com>

Cortina PHYs are present on T4240RDB and T2080RDB. Enable the driver
by default.

Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
---
 arch/powerpc/configs/dpaa.config | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/dpaa.config b/arch/powerpc/configs/dpaa.config
index 2fe76f5..4ffacaf 100644
--- a/arch/powerpc/configs/dpaa.config
+++ b/arch/powerpc/configs/dpaa.config
@@ -2,3 +2,4 @@ CONFIG_FSL_DPAA=y
 CONFIG_FSL_PAMU=y
 CONFIG_FSL_FMAN=y
 CONFIG_FSL_DPAA_ETH=y
+CONFIG_CORTINA_PHY=y
-- 
1.9.1

^ permalink raw reply related

* [PATCH 0/3] powerpc/dpaa: use the Cortina PHY driver
From: Camelia Groza @ 2018-07-18 11:46 UTC (permalink / raw)
  To: robh+dt, mark.rutland, benh
  Cc: paulus, mpe, devicetree, linuxppc-dev, linux-kernel,
	Camelia Groza

The Cortina PHY on T2080RDB and T4240RDB requires the use of the
dedicated Cortina PHY driver. This patch series enables it for
DPAA platforms and uses it on T4240RDB and T2080RDB.

Camelia Groza (3):
  powerpc/configs/dpaa: enable the Cortina PHY driver
  powerpc/dts/fsl: t4240rdb: use the Cortina PHY driver compatible
  powerpc/dts/fsl: t2080rdb: use the Cortina PHY driver compatible

 arch/powerpc/boot/dts/fsl/t2080rdb.dts | 4 ++--
 arch/powerpc/boot/dts/fsl/t4240rdb.dts | 8 ++++----
 arch/powerpc/configs/dpaa.config       | 1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

--
1.9.1

^ permalink raw reply

* Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling
From: Finn Thain @ 2018-07-18 12:02 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Arnd Bergmann, Paul Mackerras, Michael Ellerman, Joshua Thompson,
	Mathieu Malaterre, Benjamin Herrenschmidt, Greg Ungerer,
	linux-m68k, linuxppc-dev, Linux Kernel Mailing List,
	y2038 Mailman List, Meelis Roos, Andreas Schwab
In-Reply-To: <CAMuHMdXHUwbD4ZBuQz-xYQO=QdYdPS=j=oLWBet6B7PrGEkT9Q@mail.gmail.com>

On Wed, 18 Jul 2018, Geert Uytterhoeven wrote:

> 
> Thanks for your patch!
> 
> Applied and queued for v4.19, with the WARN_ON() dropped.
> 

The patch you've committed to your for-v4.19 branch has this hunk:

@@ -269,8 +275,12 @@ static long via_read_time(void)
                via_pram_command(0x89, &result.cdata[1]);
                via_pram_command(0x8D, &result.cdata[0]);
 
-               if (result.idata == last_result.idata)
+               if (result.idata == last_result.idata) {
+                       if (result.idata < RTC_OFFSET)
+                               result.idata += 0x100000000ull;
+
                        return result.idata - RTC_OFFSET;
+               }
 
                if (++count > 10)
                        break;

That looks bogus to me, since result.idata is a long.

Also, the following hunk seems a bit pointless (?)

@@ -291,11 +301,11 @@ static long via_read_time(void)
  * is basically any machine with Mac II-style ADB.
  */
 
-static void via_write_time(long time)
+static void via_write_time(time64_t time)
 {
        union {
                __u8 cdata[4];
-               long idata;
+               __u32 idata;
        } data;
        __u8 temp;
 

But if data.idata needs to be changed to __u32 here, why not change the 
same struct member in via_read_time() also?

-- 

^ permalink raw reply

* Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling
From: Arnd Bergmann @ 2018-07-18 12:20 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, Paul Mackerras, Michael Ellerman,
	Joshua Thompson, Mathieu Malaterre, Benjamin Herrenschmidt,
	Greg Ungerer, linux-m68k, linuxppc-dev, Linux Kernel Mailing List,
	y2038 Mailman List, Meelis Roos, Andreas Schwab
In-Reply-To: <alpine.LNX.2.21.1807182201560.41@nippy.intranet>

On Wed, Jul 18, 2018 at 2:02 PM, Finn Thain <fthain@telegraphics.com.au> wrote:
> On Wed, 18 Jul 2018, Geert Uytterhoeven wrote:
>
>>
>> Thanks for your patch!
>>
>> Applied and queued for v4.19, with the WARN_ON() dropped.
>>
>
> The patch you've committed to your for-v4.19 branch has this hunk:
>
> @@ -269,8 +275,12 @@ static long via_read_time(void)
>                 via_pram_command(0x89, &result.cdata[1]);
>                 via_pram_command(0x8D, &result.cdata[0]);
>
> -               if (result.idata == last_result.idata)
> +               if (result.idata == last_result.idata) {
> +                       if (result.idata < RTC_OFFSET)
> +                               result.idata += 0x100000000ull;
> +
>                         return result.idata - RTC_OFFSET;
> +               }
>
>                 if (++count > 10)
>                         break;
>
> That looks bogus to me, since result.idata is a long.



> Also, the following hunk seems a bit pointless (?)
>
> @@ -291,11 +301,11 @@ static long via_read_time(void)
>   * is basically any machine with Mac II-style ADB.
>   */
>
> -static void via_write_time(long time)
> +static void via_write_time(time64_t time)
>  {
>         union {
>                 __u8 cdata[4];
> -               long idata;
> +               __u32 idata;
>         } data;
>         __u8 temp;
>
>
> But if data.idata needs to be changed to __u32 here, why not change the
> same struct member in via_read_time() also?

Hmm, apparently I forgot to update via_read_time(), that one
is indeed bogus and now inconsistent with the other functions.

The change in via_write_time() seems at least consistent wtih
what we do elsewhere, and using __u32 makes this code
more portable. (yes, I realize that 64-bit powermac doesn't
use the VIA RTC, but it feels better to write code portably
anyway).

I'd suggest we do it like below to make it consistent with the
rest again, using the 1904..2040 range of dates and no warning
for invalid dates.

If you agree, I'll send that as a proper patch.

       Arnd

diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
index bf8df47a6d09..8335509969f1 100644
--- a/arch/m68k/mac/misc.c
+++ b/arch/m68k/mac/misc.c
@@ -255,12 +255,13 @@ static void via_write_pram(int offset, __u8 data)
  * is basically any machine with Mac II-style ADB.
  */

-static long via_read_time(void)
+static time64_t via_read_time(void)
 {
        union {
                __u8 cdata[4];
-               long idata;
+               __u32 idata;
        } result, last_result;
+       time64_t ret;
        int count = 1;

        via_pram_command(0x81, &last_result.cdata[3]);
@@ -279,12 +280,8 @@ static long via_read_time(void)
                via_pram_command(0x89, &result.cdata[1]);
                via_pram_command(0x8D, &result.cdata[0]);

-               if (result.idata == last_result.idata) {
-                       if (result.idata < RTC_OFFSET)
-                               result.idata += 0x100000000ull;
-
-                       return result.idata - RTC_OFFSET;
-               }
+               if (result.idata == last_result.idata)
+                       return (time64_t(result.idata) - RTC_OFFSET);

                if (++count > 10)
                        break;

^ permalink raw reply related

* Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling
From: Finn Thain @ 2018-07-18 13:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Geert Uytterhoeven, Paul Mackerras, Michael Ellerman,
	Joshua Thompson, Mathieu Malaterre, Benjamin Herrenschmidt,
	Greg Ungerer, linux-m68k, linuxppc-dev, Linux Kernel Mailing List,
	y2038 Mailman List, Meelis Roos, Andreas Schwab
In-Reply-To: <CAK8P3a0OBVu-o6kFCS=gV83EcH9ba9QYGZBBhB9D8cJrf=8akA@mail.gmail.com>

On Wed, 18 Jul 2018, Arnd Bergmann wrote:

> Hmm, apparently I forgot to update via_read_time(), that one
> is indeed bogus and now inconsistent with the other functions.
> 
> The change in via_write_time() seems at least consistent wtih what we do 
> elsewhere, and using __u32 makes this code more portable. (yes, I 
> realize that 64-bit powermac doesn't use the VIA RTC, but it feels 
> better to write code portably anyway).
> 

As for portability, I think you just contradicted yourself. But I take 
your point about consistency. So I won't object to adopting __u32.

> I'd suggest we do it like below to make it consistent with the
> rest again, using the 1904..2040 range of dates and no warning
> for invalid dates.
> 
> If you agree, I'll send that as a proper patch.
> 

Geert may instead wish to fixup or revert the patch he has committed 
already...

>        Arnd
> 
> diff --git a/arch/m68k/mac/misc.c b/arch/m68k/mac/misc.c
> index bf8df47a6d09..8335509969f1 100644
> --- a/arch/m68k/mac/misc.c
> +++ b/arch/m68k/mac/misc.c
> @@ -255,12 +255,13 @@ static void via_write_pram(int offset, __u8 data)
>   * is basically any machine with Mac II-style ADB.
>   */
> 
> -static long via_read_time(void)
> +static time64_t via_read_time(void)
>  {
>         union {
>                 __u8 cdata[4];
> -               long idata;
> +               __u32 idata;
>         } result, last_result;
> +       time64_t ret;

ret isn't used.

>         int count = 1;
> 
>         via_pram_command(0x81, &last_result.cdata[3]);
> @@ -279,12 +280,8 @@ static long via_read_time(void)
>                 via_pram_command(0x89, &result.cdata[1]);
>                 via_pram_command(0x8D, &result.cdata[0]);
> 
> -               if (result.idata == last_result.idata) {
> -                       if (result.idata < RTC_OFFSET)
> -                               result.idata += 0x100000000ull;
> -
> -                       return result.idata - RTC_OFFSET;
> -               }
> +               if (result.idata == last_result.idata)
> +                       return (time64_t(result.idata) - RTC_OFFSET);
> 

Did you mean to write,

			return (time64_t)result.idata - RTC_OFFSET;

?

-- 

^ permalink raw reply

* Re: [PATCH 2/3] [v2] m68k: mac: use time64_t in RTC handling
From: Arnd Bergmann @ 2018-07-18 14:26 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, Paul Mackerras, Michael Ellerman,
	Joshua Thompson, Mathieu Malaterre, Benjamin Herrenschmidt,
	Greg Ungerer, linux-m68k, linuxppc-dev, Linux Kernel Mailing List,
	y2038 Mailman List, Meelis Roos, Andreas Schwab
In-Reply-To: <alpine.LNX.2.21.1807182333150.41@nippy.intranet>

On Wed, Jul 18, 2018 at 3:49 PM, Finn Thain <fthain@telegraphics.com.au> wrote:
> On Wed, 18 Jul 2018, Arnd Bergmann wrote:

>>
>> -static long via_read_time(void)
>> +static time64_t via_read_time(void)
>>  {
>>         union {
>>                 __u8 cdata[4];
>> -               long idata;
>> +               __u32 idata;
>>         } result, last_result;
>> +       time64_t ret;
>
> ret isn't used.
>
>>         int count = 1;
>>
>>         via_pram_command(0x81, &last_result.cdata[3]);
>> @@ -279,12 +280,8 @@ static long via_read_time(void)
>>                 via_pram_command(0x89, &result.cdata[1]);
>>                 via_pram_command(0x8D, &result.cdata[0]);
>>
>> -               if (result.idata == last_result.idata) {
>> -                       if (result.idata < RTC_OFFSET)
>> -                               result.idata += 0x100000000ull;
>> -
>> -                       return result.idata - RTC_OFFSET;
>> -               }
>> +               if (result.idata == last_result.idata)
>> +                       return (time64_t(result.idata) - RTC_OFFSET);
>>
>
> Did you mean to write,
>
>                         return (time64_t)result.idata - RTC_OFFSET;
>
> ?

Right, I should have at least tried to build it again.

      Arnd

^ permalink raw reply

* Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E
From: Diana Madalina Craciun @ 2018-07-18 14:29 UTC (permalink / raw)
  To: LEROY Christophe
  Cc: Leo Li, Bharat Bhushan, oss@buserror.net,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20180717184654.Horde.vpgbrK2KJGaXTbvLIr1Bgg1@messagerie.si.c-s.fr>

On 7/17/2018 7:47 PM, LEROY Christophe wrote:=0A=
> Diana Craciun <diana.craciun@nxp.com> a =E9crit :=0A=
>=0A=
>> The NXP PPC Book3E platforms are not vulnerable to meltdown and=0A=
>> Spectre v4, so make them PPC_BOOK3S_64 specific.=0A=
>>=0A=
>> Signed-off-by: Diana Craciun <diana.craciun@nxp.com>=0A=
>> ---=0A=
>> History:=0A=
>>=0A=
>> v2-->v3=0A=
>> - used the existing functions for spectre v1/v2=0A=
>>=0A=
>>  arch/powerpc/Kconfig           | 7 ++++++-=0A=
>>  arch/powerpc/kernel/security.c | 2 ++=0A=
>>  2 files changed, 8 insertions(+), 1 deletion(-)=0A=
>>=0A=
>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig=0A=
>> index 9f2b75f..116c953 100644=0A=
>> --- a/arch/powerpc/Kconfig=0A=
>> +++ b/arch/powerpc/Kconfig=0A=
>> @@ -165,7 +165,7 @@ config PPC=0A=
>>  	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP=0A=
>>  	select GENERIC_CMOS_UPDATE=0A=
>>  	select GENERIC_CPU_AUTOPROBE=0A=
>> -	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64=0A=
>> +	select GENERIC_CPU_VULNERABILITIES	if PPC_NOSPEC=0A=
> I don't understand.  You say this patch is to make something specific  =
=0A=
> to book3s64 specific, and you are creating a new config param that  =0A=
> make things less specific=0A=
>=0A=
> Christophe=0A=
=0A=
In order to enable the vulnerabilities reporting on NXP socs I need to=0A=
enable them for PPC_FSL_BOOK3E. So they will be enabled for both=0A=
PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the=0A=
Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs=0A=
are not vulnerable to meltdown, so I made the meltdown reporting=0A=
PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in=0A=
a separate patch to be more clear.=0A=
=0A=
Diana=0A=
=0A=
>=0A=
>>  	select GENERIC_IRQ_SHOW=0A=
>>  	select GENERIC_IRQ_SHOW_LEVEL=0A=
>>  	select GENERIC_SMP_IDLE_THREAD=0A=
>> @@ -240,6 +240,11 @@ config PPC=0A=
>>  	# Please keep this list sorted alphabetically.=0A=
>>  	#=0A=
>>=0A=
>> +config PPC_NOSPEC=0A=
>> +    bool=0A=
>> +    default y=0A=
>> +    depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E=0A=
>> +=0A=
>>  config GENERIC_CSUM=0A=
>>  	def_bool n=0A=
>>=0A=
>> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/securi=
ty.c=0A=
>> index 3a4e5c3..539c744 100644=0A=
>> --- a/arch/powerpc/kernel/security.c=0A=
>> +++ b/arch/powerpc/kernel/security.c=0A=
>> @@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)=0A=
>>  device_initcall(barrier_nospec_debugfs_init);=0A=
>>  #endif /* CONFIG_DEBUG_FS */=0A=
>>=0A=
>> +#ifdef CONFIG_PPC_BOOK3S_64=0A=
>>  ssize_t cpu_show_meltdown(struct device *dev, struct  =0A=
>> device_attribute *attr, char *buf)=0A=
>>  {=0A=
>>  	bool thread_priv;=0A=
>> @@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev,  =0A=
>> struct device_attribute *attr, cha=0A=
>>=0A=
>>  	return sprintf(buf, "Vulnerable\n");=0A=
>>  }=0A=
>> +#endif=0A=
>>=0A=
>>  ssize_t cpu_show_spectre_v1(struct device *dev, struct  =0A=
>> device_attribute *attr, char *buf)=0A=
>>  {=0A=
>> --=0A=
>> 2.5.5=0A=
>=0A=
>=0A=
=0A=

^ permalink raw reply

* Re: [PATCH v14 01/22] selftests/x86: Move protecton key selftest to arch neutral directory
From: Dave Hansen @ 2018-07-18 15:25 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-2-git-send-email-linuxram@us.ibm.com>

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply

* Re: [PATCH v14 03/22] selftests/vm: move generic definitions to header file
From: Dave Hansen @ 2018-07-18 15:26 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-4-git-send-email-linuxram@us.ibm.com>

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply

* Re: [PATCH v14 04/22] selftests/vm: move arch-specific definitions to arch-specific header
From: Dave Hansen @ 2018-07-18 15:27 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-5-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> In preparation for multi-arch support, move definitions which have
> arch-specific values to x86-specific header.

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply

* Re: [PATCH v14 06/22] selftests/vm: typecast the pkey register
From: Dave Hansen @ 2018-07-18 15:32 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-7-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> This is in preparation to accomadate a differing size register
> across architectures.

This is pretty fugly, and reading it again, I wonder whether we should
have just made it 64-bit, at least in all the printk's.  Or even

	prink("pk reg: %*llx\n", PKEY_FMT_LEN, pkey_reg);

But, I don't _really_ care in the end.

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply

* Re: [PATCH v14 07/22] selftests/vm: generic function to handle shadow key register
From: Dave Hansen @ 2018-07-18 15:34 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-8-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> -	shifted_pkey_reg = (pkey_reg >> (pkey * PKEY_BITS_PER_PKEY));
> +	shifted_pkey_reg = right_shift_bits(pkey, pkey_reg);
>  	dprintf2("%s() shifted_pkey_reg: "PKEY_REG_FMT"\n", __func__,
>  			shifted_pkey_reg);
>  	masked_pkey_reg = shifted_pkey_reg & mask;

I'm not a fan of how this looks.  This is almost certainly going to get
the argument order mixed up at some point.

Why do we need this?  The description doesn't say.

^ permalink raw reply

* Re: [PATCH v14 08/22] selftests/vm: fix the wrong assert in pkey_disable_set()
From: Dave Hansen @ 2018-07-18 15:36 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-9-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> If the flag is 0, no bits will be set. Hence we cant expect
> the resulting bitmap to have a higher value than what it
> was earlier.
...
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -415,7 +415,7 @@ void pkey_disable_set(int pkey, int flags)
>  	dprintf1("%s(%d) pkey_reg: 0x"PKEY_REG_FMT"\n",
>  		__func__, pkey, read_pkey_reg());
>  	if (flags)
> -		pkey_assert(read_pkey_reg() > orig_pkey_reg);
> +		pkey_assert(read_pkey_reg() >= orig_pkey_reg);
>  	dprintf1("END<---%s(%d, 0x%x)\n", __func__,
>  		pkey, flags);
>  }

I know these are just selftests, but this change makes zero sense
without the context from how powerpc works.  It's also totally
non-obvious from the patch itself what is going on, even though I
specifically called this out in a previous review.

Please add a comment here that either specifically calls out powerpc or
talks about "an architecture that does this ..."

^ permalink raw reply

* Re: [PATCH v14 09/22] selftests/vm: fixed bugs in pkey_disable_clear()
From: Dave Hansen @ 2018-07-18 15:43 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-10-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> instead of clearing the bits, pkey_disable_clear() was setting
> the bits. Fixed it.

Again, I know these are just selftests, but I'm seeing a real lack of
attention to detail with these.  Please at least go to the trouble of
writing actual changelogs with full sentences and actual capitalization.
 I think it's the least you can do if I'm going to spend time reviewing
these.

I'll go one better.  Can you just use this as a changelog?

	pkey_disable_clear() does not work.  Instead of clearing bits,
	it sets them, probably because it originated as a copy-and-paste
	of pkey_disable_set().

	This has been dead code up to now because the only callers are
	pkey_access/write_allow() and _those_ are currently unused.

> Also fixed a wrong assertion in that function. When bits are
> cleared, the resulting bit value will be less than the original.
> 
> This hasn't been a problem so far because this code isn't currently
> used.
> 
> cc: Dave Hansen <dave.hansen@intel.com>
> cc: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  tools/testing/selftests/vm/protection_keys.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index 2dd94c3..8fa4f74 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -433,7 +433,7 @@ void pkey_disable_clear(int pkey, int flags)
>  			pkey, pkey, pkey_rights);
>  	pkey_assert(pkey_rights >= 0);
>  
> -	pkey_rights |= flags;
> +	pkey_rights &= ~flags;
>
>  	ret = hw_pkey_set(pkey, pkey_rights, 0);
>  	shadow_pkey_reg &= clear_pkey_flags(pkey, flags);
> @@ -446,7 +446,7 @@ void pkey_disable_clear(int pkey, int flags)
>  	dprintf1("%s(%d) pkey_reg: 0x"PKEY_REG_FMT"\n", __func__,
>  			pkey, read_pkey_reg());

If you add a better changelog:

Acked-by: Dave Hansen <dave.hansen@intel.com>

>  	if (flags)
> -		assert(read_pkey_reg() > orig_pkey_reg);
> +		assert(read_pkey_reg() <= orig_pkey_reg);
>  }

This really is an orthogonal change that would have been better placed
with the other patch that did the exact same thing.  But I'd be OK with
it here, as long as it has an actual comment.

^ permalink raw reply

* Re: [PATCH v14 10/22] selftests/vm: fix alloc_random_pkey() to make it really random
From: Dave Hansen @ 2018-07-18 15:45 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-11-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> alloc_random_pkey() was allocating the same pkey every time.
> Not all pkeys were geting tested. fixed it.

This fixes a real issue but also unnecessarily munges whitespace.  If
you rev these again, please fix the munging.  Otherwise:

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply

* Re: [PATCH v14 11/22] selftests/vm: introduce two arch independent abstraction
From: Dave Hansen @ 2018-07-18 15:52 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-12-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> open_hugepage_file() <- opens the huge page file

Folks, a sentence here would be nice:

	Different architectures have different huge page sizes and thus
	have different sysfs filees to manipulate when allocating huge
	pages.

> get_start_key() <--  provides the first non-reserved key.

Does powerpc not start on key 0?  Why do you need this?

> cc: Dave Hansen <dave.hansen@intel.com>
> cc: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> ---
>  tools/testing/selftests/vm/pkey-helpers.h    |   10 ++++++++++
>  tools/testing/selftests/vm/pkey-x86.h        |    1 +
>  tools/testing/selftests/vm/protection_keys.c |    6 +++---
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/pkey-helpers.h b/tools/testing/selftests/vm/pkey-helpers.h
> index ada0146..52a1152 100644
> --- a/tools/testing/selftests/vm/pkey-helpers.h
> +++ b/tools/testing/selftests/vm/pkey-helpers.h
> @@ -179,4 +179,14 @@ static inline void __pkey_write_allow(int pkey, int do_allow_write)
>  #define __stringify_1(x...)     #x
>  #define __stringify(x...)       __stringify_1(x)
>  
> +static inline int open_hugepage_file(int flag)
> +{
> +	return open(HUGEPAGE_FILE, flag);
> +}

open_nr_hugepages_file() if you revise this, please
> +
> +static inline int get_start_key(void)
> +{
> +	return 1;
> +}

get_first_user_pkey(), please.

>  #endif /* _PKEYS_HELPER_H */
> diff --git a/tools/testing/selftests/vm/pkey-x86.h b/tools/testing/selftests/vm/pkey-x86.h
> index 2b3780d..d5fa299 100644
> --- a/tools/testing/selftests/vm/pkey-x86.h
> +++ b/tools/testing/selftests/vm/pkey-x86.h
> @@ -48,6 +48,7 @@
>  #define MB			(1<<20)
>  #define pkey_reg_t		u32
>  #define PKEY_REG_FMT		"%016x"
> +#define HUGEPAGE_FILE		"/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"
>  
>  static inline u32 pkey_bit_position(int pkey)
>  {
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index 2565b4c..2e448e0 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -788,7 +788,7 @@ void setup_hugetlbfs(void)
>  	 * Now go make sure that we got the pages and that they
>  	 * are 2M pages.  Someone might have made 1G the default.
>  	 */
> -	fd = open("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", O_RDONLY);
> +	fd = open_hugepage_file(O_RDONLY);
>  	if (fd < 0) {
>  		perror("opening sysfs 2M hugetlb config");
>  		return;

This is fine, and obviously necessary.

> @@ -1075,10 +1075,10 @@ void test_kernel_gup_write_to_write_disabled_region(int *ptr, u16 pkey)
>  void test_pkey_syscalls_on_non_allocated_pkey(int *ptr, u16 pkey)
>  {
>  	int err;
> -	int i;
> +	int i = get_start_key();
>  
>  	/* Note: 0 is the default pkey, so don't mess with it */
> -	for (i = 1; i < NR_PKEYS; i++) {
> +	for (; i < NR_PKEYS; i++) {
>  		if (pkey == i)
>  			continue;

Grumble, grumble, you moved the code away from the comment connected to
it.

^ permalink raw reply

* Re: [PATCH v14 12/22] selftests/vm: pkey register should match shadow pkey
From: Dave Hansen @ 2018-07-18 16:00 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-13-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> expected_pkey_fault() is comparing the contents of pkey
> register with 0. This may not be true all the time. There
> could be bits set by default by the architecture
> which can never be changed. Hence compare the value against
> shadow pkey register, which is supposed to track the bits
> accurately all throughout

This is getting dangerously close to full sentences that actually
describe the patch.  You forgot a period, but much this is a substantial
improvement over earlier parts of the series.  Thanks for writing this,
seriously.

> cc: Dave Hansen <dave.hansen@intel.com>
> cc: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  tools/testing/selftests/vm/protection_keys.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index 2e448e0..f50cce8 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -913,10 +913,10 @@ void expected_pkey_fault(int pkey)
>  		pkey_assert(last_si_pkey == pkey);
>  
>  	/*
> -	 * The signal handler shold have cleared out PKEY register to let the
> +	 * The signal handler should have cleared out pkey-register to let the
>  	 * test program continue.  We now have to restore it.
>  	 */

... while I appreciate the spelling corrections, and I would totally ack
a patch that fixed them in one fell swoop, could we please segregate the
random spelling corrections from code fixes unless you touch those lines
otherwise?

> -	if (__read_pkey_reg() != 0)
> +	if (__read_pkey_reg() != shadow_pkey_reg)
>  		pkey_assert(0);
>  
>  	__write_pkey_reg(shadow_pkey_reg);

I know this is a one-line change, but I don't fully understand it.

On x86, if we take a pkey fault, we clear PKRU entirely (via the
on-stack XSAVE state that is restored at sigreturn) which allows the
faulting instruction to resume and execute normally.  That's what this
check is looking for: Did the signal handler clear PKRU?

Now, you're saying that powerpc might not clear it.  That makes sense.

While PKRU's state here is obvious, it isn't patently obvious to me what
shadow_pkey_reg's state is.  In fact, looking at it, I don't see the
signal handler manipulating the shadow.  So, how can this patch work?

^ permalink raw reply

* Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E
From: LEROY Christophe @ 2018-07-18 16:02 UTC (permalink / raw)
  To: Diana Madalina Craciun; +Cc: linuxppc-dev, oss, Bharat Bhushan, Leo Li
In-Reply-To: <VI1PR0401MB246392F396B74D6153CF12F7FF530@VI1PR0401MB2463.eurprd04.prod.outlook.com>

Diana Madalina Craciun <diana.craciun@nxp.com> a =C3=A9crit=C2=A0:

> On 7/17/2018 7:47 PM, LEROY Christophe wrote:
>> Diana Craciun <diana.craciun@nxp.com> a =C3=A9crit :
>>
>>> The NXP PPC Book3E platforms are not vulnerable to meltdown and
>>> Spectre v4, so make them PPC_BOOK3S_64 specific.
>>>
>>> Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
>>> ---
>>> History:
>>>
>>> v2-->v3
>>> - used the existing functions for spectre v1/v2
>>>
>>>  arch/powerpc/Kconfig           | 7 ++++++-
>>>  arch/powerpc/kernel/security.c | 2 ++
>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 9f2b75f..116c953 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -165,7 +165,7 @@ config PPC
>>>  	select GENERIC_CLOCKEVENTS_BROADCAST	if SMP
>>>  	select GENERIC_CMOS_UPDATE
>>>  	select GENERIC_CPU_AUTOPROBE
>>> -	select GENERIC_CPU_VULNERABILITIES	if PPC_BOOK3S_64
>>> +	select GENERIC_CPU_VULNERABILITIES	if PPC_NOSPEC
>> I don't understand.  You say this patch is to make something specific
>> to book3s64 specific, and you are creating a new config param that
>> make things less specific
>>
>> Christophe
>
> In order to enable the vulnerabilities reporting on NXP socs I need to
> enable them for PPC_FSL_BOOK3E. So they will be enabled for both
> PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the
> Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs
> are not vulnerable to meltdown, so I made the meltdown reporting
> PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in
> a separate patch to be more clear.

Yes you can. Or keep it as a single patch and add the details you gave=20=
=20
me=20in the patch description.

Christophe

>
> Diana
>
>>
>>>  	select GENERIC_IRQ_SHOW
>>>  	select GENERIC_IRQ_SHOW_LEVEL
>>>  	select GENERIC_SMP_IDLE_THREAD
>>> @@ -240,6 +240,11 @@ config PPC
>>>  	# Please keep this list sorted alphabetically.
>>>  	#
>>>
>>> +config PPC_NOSPEC
>>> +    bool
>>> +    default y
>>> +    depends on PPC_BOOK3S_64 || PPC_FSL_BOOK3E
>>> +
>>>  config GENERIC_CSUM
>>>  	def_bool n
>>>
>>> diff --git a/arch/powerpc/kernel/security.c=20=20
>>>=20b/arch/powerpc/kernel/security.c
>>> index 3a4e5c3..539c744 100644
>>> --- a/arch/powerpc/kernel/security.c
>>> +++ b/arch/powerpc/kernel/security.c
>>> @@ -92,6 +92,7 @@ static __init int barrier_nospec_debugfs_init(void)
>>>  device_initcall(barrier_nospec_debugfs_init);
>>>  #endif /* CONFIG_DEBUG_FS */
>>>
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>>  ssize_t cpu_show_meltdown(struct device *dev, struct
>>> device_attribute *attr, char *buf)
>>>  {
>>>  	bool thread_priv;
>>> @@ -124,6 +125,7 @@ ssize_t cpu_show_meltdown(struct device *dev,
>>> struct device_attribute *attr, cha
>>>
>>>  	return sprintf(buf, "Vulnerable\n");
>>>  }
>>> +#endif
>>>
>>>  ssize_t cpu_show_spectre_v1(struct device *dev, struct
>>> device_attribute *attr, char *buf)
>>>  {
>>> --
>>> 2.5.5
>>
>>

^ permalink raw reply

* Re: [PATCH v14 13/22] selftests/vm: generic cleanup
From: Dave Hansen @ 2018-07-18 16:06 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-14-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> cleanup the code to satisfy coding styles.
> 
> cc: Dave Hansen <dave.hansen@intel.com>
> cc: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  tools/testing/selftests/vm/protection_keys.c |   64 +++++++++++++++++--------
>  1 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index f50cce8..304f74f 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -4,7 +4,7 @@
>   *
>   * There are examples in here of:
>   *  * how to set protection keys on memory
> - *  * how to set/clear bits in pkey registers (the rights register)
> + *  * how to set/clear bits in Protection Key registers (the rights register)

Huh?  Which coding style says that we can't say "pkey"?

>   *  * how to handle SEGV_PKUERR signals and extract pkey-relevant
>   *    information from the siginfo
>   *
> @@ -13,13 +13,18 @@
>   *	prefault pages in at malloc, or not
>   *	protect MPX bounds tables with protection keys?
>   *	make sure VMA splitting/merging is working correctly
> - *	OOMs can destroy mm->mmap (see exit_mmap()), so make sure it is immune to pkeys
> - *	look for pkey "leaks" where it is still set on a VMA but "freed" back to the kernel
> - *	do a plain mprotect() to a mprotect_pkey() area and make sure the pkey sticks
> + *	OOMs can destroy mm->mmap (see exit_mmap()),
> + *			so make sure it is immune to pkeys
> + *	look for pkey "leaks" where it is still set on a VMA
> + *			 but "freed" back to the kernel
> + *	do a plain mprotect() to a mprotect_pkey() area and make
> + *			 sure the pkey sticks

This makes it work substantially worse.  That's not acceptable, even if
you did move it under 80 columns.

>   * Compile like this:
> - *	gcc      -o protection_keys    -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> - *	gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99 -pthread -Wall protection_keys.c -lrt -ldl -lm
> + *	gcc      -o protection_keys    -O2 -g -std=gnu99
> + *			 -pthread -Wall protection_keys.c -lrt -ldl -lm
> + *	gcc -m32 -o protection_keys_32 -O2 -g -std=gnu99
> + *			 -pthread -Wall protection_keys.c -lrt -ldl -lm
>   */

Why was this on one line?  Because it was easier to copy and paste.
Please leave it on one line, CodingStyle be damned.

>  #define _GNU_SOURCE
>  #include <errno.h>
> @@ -263,10 +268,12 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  			__read_pkey_reg());
>  	dprintf1("pkey from siginfo: %jx\n", siginfo_pkey);
>  	*(u64 *)pkey_reg_ptr = 0x00000000;
> -	dprintf1("WARNING: set PRKU=0 to allow faulting instruction to continue\n");
> +	dprintf1("WARNING: set PKEY_REG=0 to allow faulting instruction "
> +			"to continue\n");

It's actually totally OK to let printk strings go over 80 columns.

>  	pkey_faults++;
>  	dprintf1("<<<<==================================================\n");
>  	dprint_in_signal = 0;
> +	return;
>  }

Now we're just being silly.

>  
>  int wait_all_children(void)
> @@ -384,7 +391,7 @@ void pkey_disable_set(int pkey, int flags)
>  {
>  	unsigned long syscall_flags = 0;
>  	int ret;
> -	int pkey_rights;
> +	u32 pkey_rights;

This is not CodingStyle.  Shouldn't this be the pkey_reg_t that you
introduced earlier in the series?

> -int sys_pkey_alloc(unsigned long flags, unsigned long init_val)
> +int sys_pkey_alloc(unsigned long flags, u64 init_val)
>  {

Um, this is actually a 'unsigned long' in the ABI.

Can you go back through this and actually make sure that these are real
coding style cleanups?

^ permalink raw reply

* [PATCH v2] powerpc/prom_init: remove linux,stdout-package property
From: Murilo Opsfelder Araujo @ 2018-07-18 16:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: Murilo Opsfelder Araujo, Aneesh Kumar K . V, Kees Cook,
	Mathieu Malaterre, Bharata B Rao, Nicholas Piggin,
	Alexey Kardashevskiy, Michael Bringmann, Paul Mackerras,
	Cédric Le Goater, Nathan Fontenot, linuxppc-dev

This property was added in 2004 and the only use of it, which was already inside
`#if 0`, was removed a month later.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
 arch/powerpc/kernel/prom_init.c | 2 --
 1 file changed, 2 deletions(-)

v1..v2:
- Move github.com links to this section so they can still end up in the mail
  archive.
- Remove Fixes: tag from commit message.
- Add Link: tag here.
- Link to v1: https://lkml.kernel.org/r/20180712171904.18971-1-muriloo@linux.ibm.com

This property was added in 2004 by

    https://github.com/mpe/linux-fullhistory/commit/689fe5072fe9a0dec914bfa4fa60aed1e54563e6

and the only use of it, which was already inside `#if 0`, was removed a month
later by

    https://github.com/mpe/linux-fullhistory/commit/1fbe5a6d90f6cd4ea610737ef488719d1a875de7

Link: https://github.com/linuxppc/linux/issues/125

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 5425dd3d6a9f..c45fb463c9e5 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2102,8 +2102,6 @@ static void __init prom_init_stdout(void)
 	stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
 	if (stdout_node != PROM_ERROR) {
 		val = cpu_to_be32(stdout_node);
-		prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
-			     &val, sizeof(val));
 
 		/* If it's a display, note it */
 		memset(type, 0, sizeof(type));
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] powerpc/prom_init: remove linux,stdout-package property
From: Murilo Opsfelder Araujo @ 2018-07-18 16:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-kernel, Aneesh Kumar K . V, Kees Cook, Mathieu Malaterre,
	Bharata B Rao, Nicholas Piggin, Alexey Kardashevskiy,
	Michael Bringmann, Paul Mackerras, Cédric Le Goater,
	Nathan Fontenot, linuxppc-dev
In-Reply-To: <87a7qoq3z2.fsf@concordia.ellerman.id.au>

On Wed, Jul 18, 2018 at 07:37:37PM +1000, Michael Ellerman wrote:
> Hi Murilo,
> 
> Thanks for the patch.
> 
> Murilo Opsfelder Araujo <muriloo@linux.ibm.com> writes:
> > This property was added in 2004 by
> >
> >     https://github.com/mpe/linux-fullhistory/commit/689fe5072fe9a0dec914bfa4fa60aed1e54563e6
> >
> > and the only use of it, which was already inside `#if 0`, was removed a month
> > later by
> >
> >     https://github.com/mpe/linux-fullhistory/commit/1fbe5a6d90f6cd4ea610737ef488719d1a875de7
> >
> > Fixes: https://github.com/linuxppc/linux/issues/125
> 
> That is going to confuse some scripts that are expecting that to be a
> "Fixes: <some commit>" tag :)
> 
> The proper tag to use there would be "Link:".
> 
> But, I'd prefer we didn't add github issue links to the history, as I'm
> not sure they won't bit-rot eventually. Not because I'm a anti-Microsoft
> conspiracy person but just because it's a repo I set up and manage and
> there's no long term plan for it necessarily.
> 
> > ---
> >  arch/powerpc/kernel/prom_init.c | 2 --
> >  1 file changed, 2 deletions(-)
> 
> Including the link here would be ideal, as it means it doesn't end up in
> the commit history, but it does end up in the mail archive. So if we
> ever really need to find it, it should be there.
> 
> cheers

Hi, Michael.

Thanks for reviewing it.  I've sent v2 with your suggestions:

    https://lkml.kernel.org/r/20180718161544.12134-1-muriloo@linux.ibm.com

Cheers

--
Murilo

^ permalink raw reply

* Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
From: Mahesh Jagannath Salgaonkar @ 2018-07-18 16:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linuxppc-dev, Linux Kernel, Hari Bathini,
	Ananth N Mavinakayanahalli, Srikar Dronamraju, Aneesh Kumar K.V,
	Anshuman Khandual, Andrew Morton, Joonsoo Kim, Ananth Narayan,
	kernelfans
In-Reply-To: <20180717115232.GF7193@dhcp22.suse.cz>

On 07/17/2018 05:22 PM, Michal Hocko wrote:
> On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote:
>> On 07/16/2018 01:56 PM, Michal Hocko wrote:
>>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
>>>> One of the primary issues with Firmware Assisted Dump (fadump) on Power
>>>> is that it needs a large amount of memory to be reserved. This reserved
>>>> memory is used for saving the contents of old crashed kernel's memory before
>>>> fadump capture kernel uses old kernel's memory area to boot. However, This
>>>> reserved memory area stays unused until system crash and isn't available
>>>> for production kernel to use.
>>>
>>> How much memory are we talking about. Regular kernel dump process needs
>>> some reserved memory as well. Why that is not a big problem?
>>
>> We reserve around 5% of total system RAM. On large systems with
>> TeraBytes of memory, this reservation can be quite significant.
>>
>> The regular kernel dump uses the kexec method to boot into capture
>> kernel and it can control the parameters that are being passed to
>> capture kernel. This allows a capability to strip down the parameters
>> that can help lowering down the memory requirement for capture kernel to
>> boot. This allows regular kdump to reserve less memory to start with.
>>
>> Where as fadump depends on power firmware (pHyp) to load the capture
>> kernel after full reset and boots like a regular kernel. It needs same
>> amount of memory to boot as the production kernel. On large systems
>> production kernel needs significant amount of memory to boot. Hence
>> fadump needs to reserve enough memory for capture kernel to boot
>> successfully and execute dump capturing operations. By default fadump
>> reserves 5% of total system RAM and in most cases this has worked
>> flawlessly on variety of system configurations. Optionally,
>> 'crashkernel=X' can also be used to specify more fine-tuned memory size
>> for reservation.
> 
> So why do we even care about fadump when regular kexec provides
> (presumably) same functionality with a smaller memory footprint? Or is
> there any reason why kexec doesn't work well on ppc?

Kexec based kdump is loaded by crashing kernel. When OS crashes, the
system is in an inconsistent state, especially the devices. In some
cases, a rogue DMA or ill-behaving device drivers can cause the kdump
capture to fail.

On power platform, fadump solves these issues by taking help from power
firmware, to fully-reset the system, load the fresh copy of same kernel
to capture the dump with PCI and I/O devices reinitialized, making it
more reliable.

Fadump does full system reset, booting system through the regular boot
options i.e the dump capture kernel is booted in the same fashion and
doesn't have specialized kernel command line option. This implies, we
need to give more memory for the system boot. Since the new kernel boots
from the same memory location as crashed kernel, we reserve 5% of memory
where power firmware moves the crashed kernel's memory content. This
reserved memory is completely removed from the available memory. For
large memory systems like 64TB systems, this account to ~ 3TB, which is
a significant chunk of memory production kernel is deprived of. Hence,
this patch adds an improvement to exiting fadump feature to make the
reserved memory available to system for use, using zone movable.

Thanks,
-Mahesh.

> 
>>>> Instead of setting aside a significant chunk of memory that nobody can use,
>>>> take advantage ZONE_MOVABLE to mark a significant chunk of reserved memory
>>>> as ZONE_MOVABLE, so that the kernel is prevented from using, but
>>>> applications are free to use it.
>>>
>>> Why kernel cannot use that memory while userspace can?
>>
>> fadump needs to reserve memory to be able to save crashing kernel's
>> memory, with help from power firmware, before the capture kernel loads
>> into crashing kernel's memory area. Any contents present in this
>> reserved memory will be over-written. If kernel is allowed to use this
>> memory, then we loose that kernel data and won't be part of captured
>> dump, which could be critical to debug root cause of system crash.
> 
> But then you simply screw user memory sitting there. This might be not
> so critical as the kernel memory but still it sounds like you are
> reducing the usefulness of the dump just because of inherent limitations
> of fadump.
> 
>> Kdump and fadump both uses same infrastructure/tool (makedumpfile) to
>> capture the memory dump. While the tool provides flexibility to
>> determine what needs to be part of the dump and what memory to filter
>> out, all supported distributions defaults to "Capture only kernel data
>> and nothing else". Taking advantage of this default we can at least make
>> the reserved memory available for userspace to use.
>>
>> If someone wants to capture userspace data as well then
>> 'fadump=nonmovable' option can be used where reserved pages won't be
>> marked zone movable.
> 
> Ohh, so you have an unclutter thing to support the case above.
> 
>> Advantage of movable method is the reserved memory chunk is also
>> available for use.
>>
>>> [...]
>>>>  Documentation/powerpc/firmware-assisted-dump.txt |   18 +++
>>>>  arch/powerpc/include/asm/fadump.h                |    7 +
>>>>  arch/powerpc/kernel/fadump.c                     |  123 +++++++++++++++++--
>>>>  arch/powerpc/platforms/pseries/hotplug-memory.c  |    7 +
>>>>  include/linux/mmzone.h                           |    2 
>>>>  mm/page_alloc.c                                  |  146 ++++++++++++++++++++++
>>>>  6 files changed, 290 insertions(+), 13 deletions(-)
>>>
>>> This is quite a large change and you didn't seem to explain why we need
>>> it.
>>>
>>
>> In fadump case, the reserved memory stays unused until system is
>> crashed. fadump uses very small portion of this reserved memory, few
>> KBs, for storing fadump metadata. Otherwise, the significant chunk of
>> memory is completely unused. Hence, instead of blocking a memory that is
>> un-utilized through out the lifetime of system, it's better to give it
>> back to production kernel to use. But at the same time we don't want
>> kernel to use that memory. While exploring we found 1) Linux kernel's
>> Contiguous Memory Allocator (CMA) feature and 2) ZONE_MOVABLE, that
>> suites the requirement. Initial 5 revisions of this patchset () was
>> using CMA feature. However, fadump does not do any cma allocations,
>> hence it will be more appropriate to use zone movable to achieve the same.
>>
>> But unlike CMA, there is no interface available to mark a custom
>> reserved memory area as ZONE_MOVABLE. Hence patch 1/4 proposes the same.
> 
> Well, you are adding a significant amount of code so you should be much
> better in explaining why does the generic code care about a ppc specific
> kdump method.
> 

^ permalink raw reply

* Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-07-18 16:36 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, Michal Simek,
	David S. Miller, Chris Zankel, Max Filippov, Gustavo Padovan,
	Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <20180718024944.577-2-bhe@redhat.com>

On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He <bhe@redhat.com> wrote:
> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
> so that it's shared.

Some minor stuff.

> +/**
> + * reparent_resources - reparent resource children of parent that res covers
> + * @parent: parent resource descriptor
> + * @res: resource descriptor desired by caller
> + *
> + * Returns 0 on success, -ENOTSUPP if child resource is not completely
> + * contained by 'res', -ECANCELED if no any conflicting entry found.

'res' -> @res

> + *
> + * Reparent resource children of 'parent' that conflict with 'res'

Ditto + 'parent' -> @parent

> + * under 'res', and make 'res' replace those children.

Ditto.

> + */
> +int reparent_resources(struct resource *parent, struct resource *res)
> +{
> +       struct resource *p, **pp;
> +       struct resource **firstpp = NULL;
> +
> +       for (pp = &parent->child; (p = *pp) != NULL; pp = &p->sibling) {
> +               if (p->end < res->start)
> +                       continue;
> +               if (res->end < p->start)
> +                       break;
> +               if (p->start < res->start || p->end > res->end)
> +                       return -ENOTSUPP;       /* not completely contained */
> +               if (firstpp == NULL)
> +                       firstpp = pp;
> +       }
> +       if (firstpp == NULL)
> +               return -ECANCELED; /* didn't find any conflicting entries? */
> +       res->parent = parent;
> +       res->child = *firstpp;
> +       res->sibling = *pp;
> +       *firstpp = res;
> +       *pp = NULL;
> +       for (p = res->child; p != NULL; p = p->sibling) {
> +               p->parent = res;

> +               pr_debug("PCI: Reparented %s %pR under %s\n",
> +                        p->name, p, res->name);

Now, PCI is a bit confusing here.

> +       }
> +       return 0;
> +}
> +EXPORT_SYMBOL(reparent_resources);
> +
>  static void __init __reserve_region_with_split(struct resource *root,
>                 resource_size_t start, resource_size_t end,
>                 const char *name)
> --
> 2.13.6
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v7 1/4] resource: Move reparent_resources() to kernel/resource.c and make it public
From: Andy Shevchenko @ 2018-07-18 16:37 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linux Kernel Mailing List, Andrew Morton, Rob Herring,
	Dan Williams, Nicolas Pitre, Josh Triplett, kbuild test robot,
	Borislav Petkov, Patrik Jakobsson, David Airlie, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Dmitry Torokhov, Frank Rowand,
	Keith Busch, Jon Derrick, Lorenzo Pieralisi, Bjorn Helgaas,
	Thomas Gleixner, brijesh.singh, Jérôme Glisse,
	Tom Lendacky, Greg Kroah-Hartman, baiyaowei, richard.weiyang,
	devel, linux-input, linux-nvdimm, devicetree, linux-pci,
	Eric Biederman, Vivek Goyal, Dave Young, Yinghai Lu, Michal Simek,
	David S. Miller, Chris Zankel, Max Filippov, Gustavo Padovan,
	Maarten Lankhorst, Sean Paul, linux-parisc,
	open list:LINUX FOR POWERPC PA SEMI PWRFICIENT,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
In-Reply-To: <CAHp75VdO88ydJQ9GHdaDUmAmzL6QHR=US6JiXZ1R_EEA-xWR1Q@mail.gmail.com>

On Wed, Jul 18, 2018 at 7:36 PM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> On Wed, Jul 18, 2018 at 5:49 AM, Baoquan He <bhe@redhat.com> wrote:
>> reparent_resources() is duplicated in arch/microblaze/pci/pci-common.c
>> and arch/powerpc/kernel/pci-common.c, so move it to kernel/resource.c
>> so that it's shared.

>> + * Returns 0 on success, -ENOTSUPP if child resource is not completely
>> + * contained by 'res', -ECANCELED if no any conflicting entry found.

You also can refer to constants by prefixing them with %, e.g. %-ENOTSUPP.
But this is up to you completely.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v14 14/22] selftests/vm: Introduce generic abstractions
From: Dave Hansen @ 2018-07-18 16:38 UTC (permalink / raw)
  To: Ram Pai, shuahkh, linux-kselftest
  Cc: mpe, linuxppc-dev, linux-mm, x86, linux-arch, mingo, mhocko,
	bauerman, fweimer, msuchanek, aneesh.kumar
In-Reply-To: <1531835365-32387-15-git-send-email-linuxram@us.ibm.com>

On 07/17/2018 06:49 AM, Ram Pai wrote:
> Introduce generic abstractions and provide architecture
> specific implementation for the abstractions.

I really wanted to see these two things separated:
1. introduce abstractions
2. introduce ppc implementation

But, I guess most of it is done except for the siginfo stuff.

>  #if defined(__i386__) || defined(__x86_64__) /* arch */
>  #include "pkey-x86.h"
> +#elif defined(__powerpc64__) /* arch */
> +#include "pkey-powerpc.h"
>  #else /* arch */
>  #error Architecture not supported
>  #endif /* arch */
> @@ -186,7 +191,16 @@ static inline int open_hugepage_file(int flag)
>  
>  static inline int get_start_key(void)
>  {
> -	return 1;
> +	return 0;
> +}

How does this not now break x86?
>  #endif /* _PKEYS_X86_H */
> diff --git a/tools/testing/selftests/vm/protection_keys.c b/tools/testing/selftests/vm/protection_keys.c
> index 304f74f..18e1bb7 100644
> --- a/tools/testing/selftests/vm/protection_keys.c
> +++ b/tools/testing/selftests/vm/protection_keys.c
> @@ -197,17 +197,18 @@ void dump_mem(void *dumpme, int len_bytes)
>  
>  int pkey_faults;
>  int last_si_pkey = -1;
> +void pkey_access_allow(int pkey);

Please just move the function.

>  void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  {
>  	ucontext_t *uctxt = vucontext;
>  	int trapno;
>  	unsigned long ip;
>  	char *fpregs;
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
>  	pkey_reg_t *pkey_reg_ptr;
> -	u64 siginfo_pkey;
> +#endif /* defined(__i386__) || defined(__x86_64__) */
> +	u32 siginfo_pkey;
>  	u32 *si_pkey_ptr;
> -	int pkey_reg_offset;
> -	fpregset_t fpregset;
>  
>  	dprint_in_signal = 1;
>  	dprintf1(">>>>===============SIGSEGV============================\n");
> @@ -217,12 +218,14 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  
>  	trapno = uctxt->uc_mcontext.gregs[REG_TRAPNO];
>  	ip = uctxt->uc_mcontext.gregs[REG_IP_IDX];
> -	fpregset = uctxt->uc_mcontext.fpregs;
> -	fpregs = (void *)fpregset;
> +	fpregs = (char *) uctxt->uc_mcontext.fpregs;
>  
>  	dprintf2("%s() trapno: %d ip: 0x%016lx info->si_code: %s/%d\n",
>  			__func__, trapno, ip, si_code_str(si->si_code),
>  			si->si_code);
> +
> +#if defined(__i386__) || defined(__x86_64__) /* arch */
> +
>  #ifdef __i386__
>  	/*
>  	 * 32-bit has some extra padding so that userspace can tell whether
> @@ -230,20 +233,21 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  	 * state.  We just assume that it is here.
>  	 */
>  	fpregs += 0x70;
> -#endif
> -	pkey_reg_offset = pkey_reg_xstate_offset();
> -	pkey_reg_ptr = (void *)(&fpregs[pkey_reg_offset]);
> +#endif /* __i386__ */
>  
> -	dprintf1("siginfo: %p\n", si);
> -	dprintf1(" fpregs: %p\n", fpregs);
> +	pkey_reg_ptr = (void *)(&fpregs[pkey_reg_xstate_offset()]);

There are unnecessary parenthesis here.

Also, why are you bothering to mess with this?  This is inside the x86
#ifdef, right?

>  	/*
> -	 * If we got a PKEY fault, we *HAVE* to have at least one bit set in
> +	 * If we got a key fault, we *HAVE* to have at least one bit set in
>  	 * here.
>  	 */
>  	dprintf1("pkey_reg_xstate_offset: %d\n", pkey_reg_xstate_offset());
>  	if (DEBUG_LEVEL > 4)
>  		dump_mem(pkey_reg_ptr - 128, 256);
>  	pkey_assert(*pkey_reg_ptr);
> +#endif /* defined(__i386__) || defined(__x86_64__) */
> +
> +	dprintf1("siginfo: %p\n", si);
> +	dprintf1(" fpregs: %p\n", fpregs);
>  
>  	if ((si->si_code == SEGV_MAPERR) ||
>  	    (si->si_code == SEGV_ACCERR) ||
> @@ -252,22 +256,28 @@ void signal_handler(int signum, siginfo_t *si, void *vucontext)
>  		exit(4);
>  	}
>  
> -	si_pkey_ptr = (u32 *)(((u8 *)si) + si_pkey_offset);
> +	si_pkey_ptr = siginfo_get_pkey_ptr(si);
>  	dprintf1("si_pkey_ptr: %p\n", si_pkey_ptr);
> -	dump_mem((u8 *)si_pkey_ptr - 8, 24);
> +	dump_mem(si_pkey_ptr - 8, 24);

You removed the cast here, why?  That changes the pointer math.

Can we merge this as-is.  No, I do not think we can.  If it were _just_
the #ifdefs, we could let it pass, but there are a bunch of rough spots,
not just the #ifdefs.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox