LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 2/3] powerpc/book3e: Fix set_memory_x() and set_memory_nx()
From: Michael Ellerman @ 2021-10-28 11:33 UTC (permalink / raw)
  To: Christophe Leroy, Nicholas Piggin, Benjamin Herrenschmidt,
	Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <063e72e1-fc05-7783-9f42-f681dd08a4b2@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 27/10/2021 à 06:44, Nicholas Piggin a écrit :
>> Excerpts from Christophe Leroy's message of October 26, 2021 3:39 pm:
>>> set_memory_x() calls pte_mkexec() which sets _PAGE_EXEC.
>>> set_memory_nx() calls pte_exprotec() which clears _PAGE_EXEC.
>>>
>>> Book3e has 2 bits, UX and SX, which defines the exec rights
>>> resp. for user (PR=1) and for kernel (PR=0).
>>>
>>> _PAGE_EXEC is defined as UX only.
>>>
>>> An executable kernel page is set with either _PAGE_KERNEL_RWX
>>> or _PAGE_KERNEL_ROX, which both have SX set and UX cleared.
>>>
>>> So set_memory_nx() call for an executable kernel page does
>>> nothing because UX is already cleared.
>>>
>>> And set_memory_x() on a non-executable kernel page makes it
>>> executable for the user and keeps it non-executable for kernel.
>>>
>>> Also, pte_exec() always returns 'false' on kernel pages, because
>>> it checks _PAGE_EXEC which doesn't include SX, so for instance
>>> the W+X check doesn't work.
>>>
>>> To fix this:
>>> - change tlb_low_64e.S to use _PAGE_BAP_UX instead of _PAGE_USER
>>> - sets both UX and SX in _PAGE_EXEC so that pte_user() returns
>>> true whenever one of the two bits is set
>> 
>> I don't understand this change. Which pte_user() returns true after
>> this change? Or do you mean pte_exec()?
>
> Oops, yes, I mean pte_exec()
>
> Unless I have to re-spin, can Michael eventually fix that typo while 
> applying ?

I did.

cheers

^ permalink raw reply

* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
From: John Paul Adrian Glaubitz @ 2021-10-28 11:20 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: debian-powerpc@lists.debian.org, linuxppc-dev
In-Reply-To: <878rydac0d.fsf@mpe.ellerman.id.au>

Hi Michael!

On 10/28/21 08:39, Michael Ellerman wrote:
>>> No, I will try that now.
> 
> That completed fine on my BE VM here.
> 
> I ran these in two tmux windows:
>   $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
>   $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc
> 
> 
> The VM has 32 CPUs, with 4 threads per core:
> 
>   $ ppc64_cpu --info
>   Core   0:    0*    1*    2*    3*
>   Core   1:    4*    5*    6*    7*
>   Core   2:    8*    9*   10*   11*
>   Core   3:   12*   13*   14*   15*
>   Core   4:   16*   17*   18*   19*
>   Core   5:   20*   21*   22*   23*
>   Core   6:   24*   25*   26*   27*
>   Core   7:   28*   29*   30*   31*

It seems I also can no longer reproduce the issue, even when building the most problematic
packages and I think we should consider it fixed for now. I will keep monitoring the server,
of course, and will let you know in case the problem shows again.

Thanks a lot again for fixing this issue!

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


^ permalink raw reply

* Re: [PATCH v2 03/45] notifier: Add atomic/blocking_notifier_has_unique_priority()
From: Andy Shevchenko @ 2021-10-28 11:00 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Ulf Hansson, Rich Felker, linux-ia64, Tomer Maimon,
	Santosh Shilimkar, Rafael J . Wysocki, Boris Ostrovsky,
	Catalin Marinas, Linus Walleij, Dave Hansen, x86, Tali Perry,
	James E.J. Bottomley, Thierry Reding, Guo Ren, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, linux-sh, Lee Jones, Helge Deller,
	Daniel Lezcano, Russell King, linux-csky, Jonathan Hunter,
	Tony Lindgren, Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven,
	xen-devel, linux-mips, Guenter Roeck, Len Brown, Albert Ou,
	linux-pm, Jonathan Neuschäfer, Vladimir Zapolskiy,
	linux-acpi, linux-m68k, Mark Brown, Borislav Petkov, Greentime Hu,
	Paul Walmsley, linux-tegra, Thomas Gleixner, linux-omap,
	Nancy Yuen, linux-arm-kernel, Juergen Gross, Thomas Bogendoerfer,
	linux-parisc, Nick Hu, Avi Fishman, Patrick Venture,
	Liam Girdwood, linux-kernel, Palmer Dabbelt, Philipp Zabel,
	Paul Mackerras, Andrew Morton, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-4-digetx@gmail.com>

On Thu, Oct 28, 2021 at 12:16:33AM +0300, Dmitry Osipenko wrote:
> Add atomic/blocking_notifier_has_unique_priority() helpers which return
> true if given handler has unique priority.

...

> +/**
> + *	atomic_notifier_has_unique_priority - Checks whether notifier's priority is unique
> + *	@nh: Pointer to head of the atomic notifier chain
> + *	@n: Entry in notifier chain to check
> + *
> + *	Checks whether there is another notifier in the chain with the same priority.
> + *	Must be called in process context.
> + *
> + *	Returns true if priority is unique, false otherwise.

Why this indentation?

> + */
> +bool atomic_notifier_has_unique_priority(struct atomic_notifier_head *nh,
> +		struct notifier_block *n)
> +{
> +	struct notifier_block **nl = &nh->head;
> +	unsigned long flags;
> +	bool ret = true;
> +
> +	spin_lock_irqsave(&nh->lock, flags);
> +
> +	while ((*nl) != NULL && (*nl)->priority >= n->priority) {

' != NULL' is redundant.

> +		if ((*nl)->priority == n->priority && (*nl) != n) {
> +			ret = false;
> +			break;
> +		}
> +
> +		nl = &((*nl)->next);
> +	}
> +
> +	spin_unlock_irqrestore(&nh->lock, flags);
> +
> +	return ret;
> +}

...

> +	/*
> +	 * This code gets used during boot-up, when task switching is
> +	 * not yet working and interrupts must remain disabled.  At

One space is enough.

> +	 * such times we must not call down_write().
> +	 */

> +	while ((*nl) != NULL && (*nl)->priority >= n->priority) {

' != NULL' is not needed.

> +		if ((*nl)->priority == n->priority && (*nl) != n) {
> +			ret = false;
> +			break;
> +		}
> +
> +		nl = &((*nl)->next);
> +	}

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* Re: [PATCH v2 08/45] kernel: Add combined power-off+restart handler call chain API
From: Rafael J. Wysocki @ 2021-10-28  9:59 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Ulf Hansson, Rich Felker, linux-ia64, Tomer Maimon,
	Santosh Shilimkar, Rafael J . Wysocki, Boris Ostrovsky,
	Catalin Marinas, Linus Walleij, Dave Hansen,
	the arch/x86 maintainers, Tali Perry, James E.J. Bottomley,
	Thierry Reding, Guo Ren, Pavel Machek, H. Peter Anvin,
	linux-riscv, Vincent Chen, Will Deacon, Greg Ungerer,
	Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Linux-sh list, Lee Jones, Helge Deller,
	Daniel Lezcano, Russell King, linux-csky, Jonathan Hunter,
	Tony Lindgren, Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven,
	xen-devel, linux-mips, Guenter Roeck, Len Brown, Albert Ou,
	Linux OMAP Mailing List, Jonathan Neuschäfer,
	Vladimir Zapolskiy, ACPI Devel Maling List, linux-m68k,
	Mark Brown, Borislav Petkov, Greentime Hu, Paul Walmsley,
	linux-tegra, Thomas Gleixner, Andy Shevchenko, Nancy Yuen,
	Linux ARM, Juergen Gross, Thomas Bogendoerfer, linux-parisc,
	Avi Fishman, Patrick Venture, Linux PM, Liam Girdwood,
	Linux Kernel Mailing List, Palmer Dabbelt, Philipp Zabel,
	Paul Mackerras, Andrew Morton, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-9-digetx@gmail.com>

On Wed, Oct 27, 2021 at 11:18 PM Dmitry Osipenko <digetx@gmail.com> wrote:
>
> SoC platforms often have multiple options of how to perform system's
> power-off and restart operations. Meanwhile today's kernel is limited to
> a single option. Add combined power-off+restart handler call chain API,
> which is inspired by the restart API. The new API provides both power-off
> and restart functionality.
>
> The old pm_power_off method will be kept around till all users are
> converted to the new API.
>
> Current restart API will be replaced by the new unified API since
> new API is its superset. The restart functionality of the power-handler
> API is built upon the existing restart-notifier APIs.
>
> In order to ease conversion to the new API, convenient helpers are added
> for the common use-cases. They will reduce amount of boilerplate code and
> remove global variables. These helpers preserve old behaviour for cases
> where only one power-off handler is executed, this is what existing
> drivers want, and thus, they could be easily converted to the new API.
> Users of the new API should explicitly enable power-off chaining by
> setting corresponding flag of the power_handler structure.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  include/linux/reboot.h   | 176 +++++++++++-
>  kernel/power/hibernate.c |   2 +-
>  kernel/reboot.c          | 601 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 768 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index b7fa25726323..0ec835338c27 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -8,10 +8,16 @@
>
>  struct device;
>
> -#define SYS_DOWN       0x0001  /* Notify of system down */
> -#define SYS_RESTART    SYS_DOWN
> -#define SYS_HALT       0x0002  /* Notify of system halt */
> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> +enum reboot_prepare_mode {
> +       SYS_DOWN = 1,           /* Notify of system down */
> +       SYS_RESTART = SYS_DOWN,
> +       SYS_HALT,               /* Notify of system halt */
> +       SYS_POWER_OFF,          /* Notify of system power off */
> +};
> +
> +#define RESTART_PRIO_RESERVED          0
> +#define RESTART_PRIO_DEFAULT           128
> +#define RESTART_PRIO_HIGH              192
>
>  enum reboot_mode {
>         REBOOT_UNDEFINED = -1,
> @@ -49,6 +55,167 @@ int register_restart_handler(struct notifier_block *);
>  int unregister_restart_handler(struct notifier_block *);
>  void do_kernel_restart(char *cmd);
>
> +/*
> + * Unified poweroff + restart API.
> + */
> +
> +#define POWEROFF_PRIO_RESERVED         0
> +#define POWEROFF_PRIO_PLATFORM         1
> +#define POWEROFF_PRIO_DEFAULT          128
> +#define POWEROFF_PRIO_HIGH             192
> +#define POWEROFF_PRIO_FIRMWARE         224

Also I'm wondering why these particular numbers were chosen, here and above?

> +
> +enum poweroff_mode {
> +       POWEROFF_NORMAL = 0,
> +       POWEROFF_PREPARE,
> +};
> +
> +struct power_off_data {
> +       void *cb_data;
> +};
> +
> +struct power_off_prep_data {
> +       void *cb_data;
> +};
> +
> +struct restart_data {
> +       void *cb_data;
> +       const char *cmd;
> +       enum reboot_mode mode;
> +};
> +
> +struct reboot_prep_data {
> +       void *cb_data;
> +       const char *cmd;
> +       enum reboot_prepare_mode mode;
> +};
> +
> +struct power_handler_private_data {
> +       struct notifier_block reboot_prep_nb;
> +       struct notifier_block power_off_nb;
> +       struct notifier_block restart_nb;
> +       void (*trivial_power_off_cb)(void);
> +       void (*simple_power_off_cb)(void *data);
> +       void *simple_power_off_cb_data;
> +       bool registered;
> +};
> +
> +/**
> + * struct power_handler - Machine power-off + restart handler
> + *
> + * Describes power-off and restart handlers which are invoked by kernel
> + * to power off or restart this machine.  Supports prioritized chaining for
> + * both restart and power-off handlers.  Callback's priority must be unique.
> + * Intended to be used by device drivers that are responsible for restarting
> + * and powering off hardware which kernel is running on.
> + *
> + * Struct power_handler can be static.  Members of this structure must not be
> + * altered while handler is registered.
> + *
> + * Fill the structure members and pass it to register_power_handler().
> + */
> +struct power_handler {
> +       /**
> +        * @cb_data:
> +        *
> +        * User data included in callback's argument.
> +        */

And here I would document the structure fields in the main kerneldoc
comment above.

As is, it is a bit hard to grasp the whole definition.

> +       void *cb_data;
> +
> +       /**
> +        * @power_off_cb:
> +        *
> +        * Callback that should turn off machine.  Inactive if NULL.
> +        */
> +       void (*power_off_cb)(struct power_off_data *data);
> +
> +       /**
> +        * @power_off_prepare_cb:
> +        *
> +        * Power-off preparation callback.  All power-off preparation callbacks
> +        * are invoked before @restart_cb.  Inactive if NULL.
> +        */
> +       void (*power_off_prepare_cb)(struct power_off_prep_data *data);
> +
> +       /**
> +        * @power_off_priority:
> +        *
> +        * Power-off callback priority, must be unique.  Zero value is
> +        * reassigned to default priority.  Inactive if @power_off_cb is NULL.
> +        */
> +       int power_off_priority;
> +
> +       /**
> +        * @power_off_chaining_allowed:
> +        *
> +        * False if callbacks execution should stop when @power_off_cb fails
> +        * to power off machine.  True if further lower priority power-off
> +        * callback should be executed.
> +        */
> +       bool power_off_chaining_allowed;
> +
> +       /**
> +        * @restart_cb:
> +        *
> +        * Callback that should reboot machine.  Inactive if NULL.
> +        */
> +       void (*restart_cb)(struct restart_data *data);
> +
> +       /**
> +        * @restart_priority:
> +        *
> +        * Restart callback priority, must be unique.  Zero value is reassigned
> +        * to default priority.  Inactive if @restart_cb is NULL.
> +        */
> +       int restart_priority;
> +
> +       /**
> +        * @reboot_prepare_cb:
> +        *
> +        * Reboot preparation callback.  All reboot preparation callbacks are
> +        * invoked before @restart_cb.  Inactive if NULL.
> +        */
> +       void (*reboot_prepare_cb)(struct reboot_prep_data *data);
> +
> +       /**
> +        * @priv:
> +        *
> +        * Internal data.  Shouldn't be touched.
> +        */
> +       const struct power_handler_private_data priv;
> +};

^ permalink raw reply

* Re: [PATCH v2 08/45] kernel: Add combined power-off+restart handler call chain API
From: Rafael J. Wysocki @ 2021-10-28  9:53 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Ulf Hansson, Rich Felker, linux-ia64, Tomer Maimon,
	Santosh Shilimkar, Rafael J . Wysocki, Boris Ostrovsky,
	Catalin Marinas, Linus Walleij, Dave Hansen,
	the arch/x86 maintainers, Tali Perry, James E.J. Bottomley,
	Thierry Reding, Guo Ren, Pavel Machek, H. Peter Anvin,
	linux-riscv, Vincent Chen, Will Deacon, Greg Ungerer,
	Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Linux-sh list, Lee Jones, Helge Deller,
	Daniel Lezcano, Russell King, linux-csky, Jonathan Hunter,
	Tony Lindgren, Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven,
	xen-devel, linux-mips, Guenter Roeck, Len Brown, Albert Ou,
	Linux OMAP Mailing List, Jonathan Neuschäfer,
	Vladimir Zapolskiy, ACPI Devel Maling List, linux-m68k,
	Mark Brown, Borislav Petkov, Greentime Hu, Paul Walmsley,
	linux-tegra, Thomas Gleixner, Andy Shevchenko, Nancy Yuen,
	Linux ARM, Juergen Gross, Thomas Bogendoerfer, linux-parisc,
	Nick Hu, Avi Fishman, Patrick Venture, Linux PM, Liam Girdwood,
	Linux Kernel Mailing List, Palmer Dabbelt, Philipp Zabel,
	Paul Mackerras, Andrew Morton, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-9-digetx@gmail.com>

On Wed, Oct 27, 2021 at 11:18 PM Dmitry Osipenko <digetx@gmail.com> wrote:
>
> SoC platforms often have multiple options of how to perform system's
> power-off and restart operations. Meanwhile today's kernel is limited to
> a single option. Add combined power-off+restart handler call chain API,
> which is inspired by the restart API. The new API provides both power-off
> and restart functionality.
>
> The old pm_power_off method will be kept around till all users are
> converted to the new API.
>
> Current restart API will be replaced by the new unified API since
> new API is its superset. The restart functionality of the power-handler
> API is built upon the existing restart-notifier APIs.
>
> In order to ease conversion to the new API, convenient helpers are added
> for the common use-cases. They will reduce amount of boilerplate code and
> remove global variables. These helpers preserve old behaviour for cases
> where only one power-off handler is executed, this is what existing
> drivers want, and thus, they could be easily converted to the new API.
> Users of the new API should explicitly enable power-off chaining by
> setting corresponding flag of the power_handler structure.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  include/linux/reboot.h   | 176 +++++++++++-
>  kernel/power/hibernate.c |   2 +-
>  kernel/reboot.c          | 601 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 768 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/reboot.h b/include/linux/reboot.h
> index b7fa25726323..0ec835338c27 100644
> --- a/include/linux/reboot.h
> +++ b/include/linux/reboot.h
> @@ -8,10 +8,16 @@
>
>  struct device;
>
> -#define SYS_DOWN       0x0001  /* Notify of system down */
> -#define SYS_RESTART    SYS_DOWN
> -#define SYS_HALT       0x0002  /* Notify of system halt */
> -#define SYS_POWER_OFF  0x0003  /* Notify of system power off */
> +enum reboot_prepare_mode {
> +       SYS_DOWN = 1,           /* Notify of system down */
> +       SYS_RESTART = SYS_DOWN,
> +       SYS_HALT,               /* Notify of system halt */
> +       SYS_POWER_OFF,          /* Notify of system power off */
> +};
> +
> +#define RESTART_PRIO_RESERVED          0
> +#define RESTART_PRIO_DEFAULT           128
> +#define RESTART_PRIO_HIGH              192
>
>  enum reboot_mode {
>         REBOOT_UNDEFINED = -1,
> @@ -49,6 +55,167 @@ int register_restart_handler(struct notifier_block *);
>  int unregister_restart_handler(struct notifier_block *);
>  void do_kernel_restart(char *cmd);
>
> +/*
> + * Unified poweroff + restart API.
> + */
> +
> +#define POWEROFF_PRIO_RESERVED         0
> +#define POWEROFF_PRIO_PLATFORM         1
> +#define POWEROFF_PRIO_DEFAULT          128
> +#define POWEROFF_PRIO_HIGH             192
> +#define POWEROFF_PRIO_FIRMWARE         224
> +
> +enum poweroff_mode {
> +       POWEROFF_NORMAL = 0,
> +       POWEROFF_PREPARE,
> +};
> +
> +struct power_off_data {
> +       void *cb_data;
> +};
> +
> +struct power_off_prep_data {
> +       void *cb_data;
> +};
> +
> +struct restart_data {
> +       void *cb_data;
> +       const char *cmd;
> +       enum reboot_mode mode;
> +};
> +
> +struct reboot_prep_data {
> +       void *cb_data;
> +       const char *cmd;
> +       enum reboot_prepare_mode mode;
> +};
> +
> +struct power_handler_private_data {
> +       struct notifier_block reboot_prep_nb;
> +       struct notifier_block power_off_nb;
> +       struct notifier_block restart_nb;
> +       void (*trivial_power_off_cb)(void);
> +       void (*simple_power_off_cb)(void *data);
> +       void *simple_power_off_cb_data;
> +       bool registered;
> +};
> +
> +/**
> + * struct power_handler - Machine power-off + restart handler
> + *
> + * Describes power-off and restart handlers which are invoked by kernel
> + * to power off or restart this machine.  Supports prioritized chaining for
> + * both restart and power-off handlers.  Callback's priority must be unique.
> + * Intended to be used by device drivers that are responsible for restarting
> + * and powering off hardware which kernel is running on.
> + *
> + * Struct power_handler can be static.  Members of this structure must not be
> + * altered while handler is registered.
> + *
> + * Fill the structure members and pass it to register_power_handler().
> + */
> +struct power_handler {

The name of this structure is too generic IMV.  There are many things
that it might apply to in principle.

What about calling power_off_handler or sys_off_handler as it need not
be about power at all?

^ permalink raw reply

* [PATCH] drivers: scsi: replace snprintf in show functions with sysfs_emit
From: cgel.zte @ 2021-10-28 10:18 UTC (permalink / raw)
  To: tyreld
  Cc: Jing Yao, martin.petersen, linux-scsi, jejb, linux-kernel, paulus,
	linuxppc-dev, Zeal Robot

From: Jing Yao <yao.jing2@zte.com.cn>

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Jing Yao <yao.jing2@zte.com.cn>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index d0eab5700dc5..9f20fefc2c02 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3403,7 +3403,7 @@ static ssize_t ibmvfc_show_host_partition_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
+	return sysfs_emit(buf, PAGE_SIZE, "%s\n",
 			vhost->login_buf->resp.partition_name);
 }
 
@@ -3413,7 +3413,7 @@ static ssize_t ibmvfc_show_host_device_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
+	return sysfs_emit(buf, PAGE_SIZE, "%s\n",
 			vhost->login_buf->resp.device_name);
 }
 
@@ -3423,7 +3423,7 @@ static ssize_t ibmvfc_show_host_loc_code(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
+	return sysfs_emit(buf, PAGE_SIZE, "%s\n",
 			vhost->login_buf->resp.port_loc_code);
 }
 
@@ -3433,7 +3433,7 @@ static ssize_t ibmvfc_show_host_drc_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
+	return sysfs_emit(buf, PAGE_SIZE, "%s\n",
 			vhost->login_buf->resp.drc_name);
 }
 
@@ -3442,7 +3442,7 @@ static ssize_t ibmvfc_show_host_npiv_version(struct device *dev,
 {
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
-	return snprintf(buf, PAGE_SIZE, "%d\n", be32_to_cpu(vhost->login_buf->resp.version));
+	return sysfs_emit(buf, PAGE_SIZE, "%d\n", be32_to_cpu(vhost->login_buf->resp.version));
 }
 
 static ssize_t ibmvfc_show_host_capabilities(struct device *dev,
@@ -3450,7 +3450,7 @@ static ssize_t ibmvfc_show_host_capabilities(struct device *dev,
 {
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
-	return snprintf(buf, PAGE_SIZE, "%llx\n", be64_to_cpu(vhost->login_buf->resp.capabilities));
+	return sysfs_emit(buf, PAGE_SIZE, "%llx\n", be64_to_cpu(vhost->login_buf->resp.capabilities));
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related

* Re: instruction storage exception handling
From: Nicholas Piggin @ 2021-10-28  9:35 UTC (permalink / raw)
  To: Jacques de Laval; +Cc: Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <VFoVC_F_r-pD62fs3PMz16KfdtzX-4Sa6QajxxabOAnilDnV_olPSFbVVmYsWUeH4BUxcI7YaJ4RnKKhdqhqOLEopCeFZrqEcjJLyJlyX_I=@protonmail.com>

Excerpts from Jacques de Laval's message of October 28, 2021 7:08 pm:
> On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>>
>> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
>> >
>> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>> > >
>> > > > You're right. In that case it shouldn't change anything unless there
>> > > >
>> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
>> > > >
>> > > > on the NIP and instructions, it looks like it's probably got the correct
>> > > >
>> > > > user address that it's storing into vmf on the stack, so it has got past
>> > > >
>> > > > the access checks so my theory would be wrong anyway.
>> > > >
>> > > > Must be something simple but I can't see it yet.
>> > >
>> > > Anyway, I think it is still worth doing the check with setting 0 in
>> > >
>> > > _ESR(r11), maybe the Reference Manual is wrong.
>> > >
>> > > So Jacques, please do the test anyway if you can.
>> > >
>> > > Thanks
>> > >
>> > > Christophe
>> >
>> > I tested with the last patch from Nicholas, and with that I can not
>> >
>> > reproduce the issue, so it seems like that solves it for my case and setup
>> >
>> > at least.
>> >
>> > Big thanks Christophe and Nicholas for looking in to this!
>>
>> Thanks for reporting and testing. We can certainly send this patch
>>
>> upstream to fix the regression, but I'm still not exactly sure what is
>>
>> going on. If it is an errata or part of specification we missed that
>>
>> could explain it but it would be good to understand and comment it.
>>
>> If you have time to test again with only the following patch applied,
>>
>> it might give a better clue. This patch should keep running but it
>>
>> would print some dmesg output.
>>
>> Thanks,
>>
>> Nick
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>
>> index a8d0ce85d39a..cf56f23ff90a 100644
>>
>> --- a/arch/powerpc/mm/fault.c
>>
>> +++ b/arch/powerpc/mm/fault.c
>>
>> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>>
>> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>>
>> {
>>
>> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>>
>> -         if (regs->dsisr != 0) {
>>
>>
>> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>>
>>
>> -         	regs->dsisr = 0; // fix?
>>
>>
>> -         }
>>
>>
>> -   }
>>
>>     __do_page_fault(regs);
>>
>>     }
>>
> 
> As expected it keeps running. The output, and number of prints is naturally
> a bit different from time to time, but dsisr/ESR is always 0x800000.
> 
> Here's a representative output from one run:
> 
> 	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
> 	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
> 	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000

Great, thanks for testing that is interesting.

Looking a bit more,

https://www.nxp.com/files-static/32bit/doc/ref_manual/EREF_RM.pdf

This is the manual for e500 family including e5500.

Table 8-4. Interrupt Summary by IVOR shows ISI interrupt as affecting 
ESR register with [PT], this means the PT bit may be set. There is no
BO bit specified here.

The architecture (and this manual) says that if an interrupt type 
affects one of the ESR bits then all others are cleared. However if
we look at the 4.8.7 Exception Syndrome Register (ESR) definition,
the PT bit is specified with <E.PT>. This means it is implemented if
the processor supports the E.PT extension.

According to this table, the e5500 does not support E.PT.
https://www.linux-kvm.org/page/E500_virtual_CPU_specification

So it seems possible that a processor which does not support E.PT and 
therefore ISI will never set any bits in ESR, will not zero the ESR
when it takes an ISI intrrupt, without violating the specification.

It looks like that is what is happening here, ESR is being left from
a previous store DSI.

Thanks,
Nick

^ permalink raw reply

* Re: instruction storage exception handling
From: Jacques de Laval @ 2021-10-28  9:08 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1635389034.knz9p2g41k.astroid@bobo.none>

On Thursday, October 28th, 2021 at 5:01 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
>
> > On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
> >
> > > Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
> > >
> > > > You're right. In that case it shouldn't change anything unless there
> > > >
> > > > was a BO fault. I'm not sure what the problem is then. Guessing based
> > > >
> > > > on the NIP and instructions, it looks like it's probably got the correct
> > > >
> > > > user address that it's storing into vmf on the stack, so it has got past
> > > >
> > > > the access checks so my theory would be wrong anyway.
> > > >
> > > > Must be something simple but I can't see it yet.
> > >
> > > Anyway, I think it is still worth doing the check with setting 0 in
> > >
> > > _ESR(r11), maybe the Reference Manual is wrong.
> > >
> > > So Jacques, please do the test anyway if you can.
> > >
> > > Thanks
> > >
> > > Christophe
> >
> > I tested with the last patch from Nicholas, and with that I can not
> >
> > reproduce the issue, so it seems like that solves it for my case and setup
> >
> > at least.
> >
> > Big thanks Christophe and Nicholas for looking in to this!
>
> Thanks for reporting and testing. We can certainly send this patch
>
> upstream to fix the regression, but I'm still not exactly sure what is
>
> going on. If it is an errata or part of specification we missed that
>
> could explain it but it would be good to understand and comment it.
>
> If you have time to test again with only the following patch applied,
>
> it might give a better clue. This patch should keep running but it
>
> would print some dmesg output.
>
> Thanks,
>
> Nick
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>
> index a8d0ce85d39a..cf56f23ff90a 100644
>
> --- a/arch/powerpc/mm/fault.c
>
> +++ b/arch/powerpc/mm/fault.c
>
> @@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
>
> DEFINE_INTERRUPT_HANDLER(do_page_fault)
>
> {
>
> -   if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
>
> -         if (regs->dsisr != 0) {
>
>
> -         	printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
>
>
> -         	regs->dsisr = 0; // fix?
>
>
> -         }
>
>
> -   }
>
>     __do_page_fault(regs);
>
>     }
>

As expected it keeps running. The output, and number of prints is naturally
a bit different from time to time, but dsisr/ESR is always 0x800000.

Here's a representative output from one run:

	ISI pc:b789e6c0 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:b7884220 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:b78c18a4 msr:2d002 dsisr:800000 ESR:800000
	ISI pc:55a238 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:412380 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:3aabe0 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:47a0e0 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:443290 msr:2f902 dsisr:800000 ESR:800000
	ISI pc:43b350 msr:2d002 dsisr:800000 ESR:800000

Thanks,
Jacques

^ permalink raw reply

* Re: [PATCH v2] perf vendor events power10: Add metric events json file for power10 platform
From: kajoljain @ 2021-10-28  9:04 UTC (permalink / raw)
  To: Paul A. Clarke
  Cc: maddy, rnsastry, linuxppc-dev, linux-kernel, acme,
	linux-perf-users, atrajeev, jolsa
In-Reply-To: <31576d89-583a-dd56-056f-58784184c8b7@linux.ibm.com>



On 10/26/21 3:28 PM, kajoljain wrote:
> 
> 
> On 10/22/21 8:19 PM, Paul A. Clarke wrote:
>> Thanks for the changes!
>> More nits below (many left over from prior review)...
>>
>> On Fri, Oct 22, 2021 at 11:55:05AM +0530, Kajol Jain wrote:
>>> Add pmu metric json file for power10 platform.
>>>
>>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>>> ---
>>> Changelog v1 -> v2:
>>> - Did some nit changes in BriefDescription field
>>>   as suggested by Paul A. Clarke
>>>
>>> - Link to the v1 patch: https://lkml.org/lkml/2021/10/6/131
>>>
>>>  .../arch/powerpc/power10/metrics.json         | 676 ++++++++++++++++++
>>>  1 file changed, 676 insertions(+)
>>>  create mode 100644 tools/perf/pmu-events/arch/powerpc/power10/metrics.json
>>>
>>> diff --git a/tools/perf/pmu-events/arch/powerpc/power10/metrics.json b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
>>> new file mode 100644
>>> index 000000000000..8adab5cd9934
>>> --- /dev/null
>>> +++ b/tools/perf/pmu-events/arch/powerpc/power10/metrics.json
>>> @@ -0,0 +1,676 @@
>>> +[
>>> +    {
>>> +        "BriefDescription": "Percentage of cycles that are run cycles",
>>> +        "MetricExpr": "PM_RUN_CYC / PM_CYC * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "RUN_CYCLES_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per completed instruction",
>>> +        "MetricExpr": "PM_CYC / PM_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "CYCLES_PER_INSTRUCTION"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled for any reason",
>>> +        "MetricExpr": "PM_DISP_STALL_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled because there was a flush",
>>> +        "MetricExpr": "PM_DISP_STALL_FLUSH / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_FLUSH_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled because the MMU was handling a translation miss",
>>> +        "MetricExpr": "PM_DISP_STALL_TRANSLATION / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_TRANSLATION_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled waiting to resolve an instruction ERAT miss",
>>> +        "MetricExpr": "PM_DISP_STALL_IERAT_ONLY_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_IERAT_ONLY_MISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled waiting to resolve an instruction TLB miss",
>>> +        "MetricExpr": "PM_DISP_STALL_ITLB_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_ITLB_MISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled due to an icache miss",
>>> +        "MetricExpr": "PM_DISP_STALL_IC_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_IC_MISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while the instruction was fetched from the local L2",
>>> +        "MetricExpr": "PM_DISP_STALL_IC_L2 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_IC_L2_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while the instruction was fetched from the local L3",
>>> +        "MetricExpr": "PM_DISP_STALL_IC_L3 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_IC_L3_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while the instruction was fetched from any source beyond the local L3",
>>> +        "MetricExpr": "PM_DISP_STALL_IC_L3MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_IC_L3MISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled due to an icache miss after a branch mispredict",
>>> +        "MetricExpr": "PM_DISP_STALL_BR_MPRED_ICMISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_BR_MPRED_ICMISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while instruction was fetched from the local L2 after suffering a branch mispredict",
>>> +        "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L2 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_BR_MPRED_IC_L2_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while instruction was fetched from the local L3 after suffering a branch mispredict",
>>> +        "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L3 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_BR_MPRED_IC_L3_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled while instruction was fetched from any source beyond the local L3 after suffering a branch mispredict",
>>> +        "MetricExpr": "PM_DISP_STALL_BR_MPRED_IC_L3MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_BR_MPRED_IC_L3MISS_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled due to a branch mispredict",
>>> +        "MetricExpr": "PM_DISP_STALL_BR_MPRED / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_BR_MPRED_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch for any reason",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch because of a synchronizing instruction that requires the ICT to be empty before dispatch",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_SYNC_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISP_HELD_STALL_SYNC_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch while waiting on the scoreboard",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_SCOREBOARD_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISP_HELD_STALL_SCOREBOARD_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch due to issue queue full",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_ISSQ_FULL_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISP_HELD_STALL_ISSQ_FULL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch because the mapper/SRB was full",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_RENAME_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_RENAME_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch because the STF mapper/SRB was full",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_STF_MAPPER_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_STF_MAPPER_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch because the XVFC mapper/SRB was full",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_XVFC_MAPPER_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_XVFC_MAPPER_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch for any other reason",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_OTHER_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_OTHER_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction has been dispatched but not issued for any reason",
>>> +        "MetricExpr": "PM_ISSUE_STALL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "ISSUE_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting to be finished in one of the execution units",
>>> +        "MetricExpr": "PM_EXEC_STALL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "EXECUTION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction spent executing an NTC instruction that gets flushed some time after dispatch",
>>> +        "MetricExpr": "PM_EXEC_STALL_NTC_FLUSH / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "NTC_FLUSH_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTF instruction finishes at dispatch",
>>> +        "MetricExpr": "PM_EXEC_STALL_FIN_AT_DISP / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "FIN_AT_DISP_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is executing in the branch unit",
>>> +        "MetricExpr": "PM_EXEC_STALL_BRU / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "BRU_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a simple fixed point instruction that is executing in the LSU",
>>> +        "MetricExpr": "PM_EXEC_STALL_SIMPLE_FX / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "SIMPLE_FX_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is executing in the VSU",
>>> +        "MetricExpr": "PM_EXEC_STALL_VSU / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "VSU_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting to be finished in one of the execution units",
>>> +        "MetricExpr": "PM_EXEC_STALL_TRANSLATION / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "TRANSLATION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a load or store that suffered a translation miss",
>>> +        "MetricExpr": "PM_EXEC_STALL_DERAT_ONLY_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DERAT_ONLY_MISS_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is recovering from a TLB miss",
>>> +        "MetricExpr": "PM_EXEC_STALL_DERAT_DTLB_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DERAT_DTLB_MISS_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is executing in the LSU",
>>> +        "MetricExpr": "PM_EXEC_STALL_LSU / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "LSU_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a load that is executing in the LSU",
>>> +        "MetricExpr": "PM_EXEC_STALL_LOAD / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "LOAD_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from either the local L2 or local L3",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_L2L3 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_L2L3_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from either the local L2 or local L3, with an RC dispatch conflict",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_L2L3_CONFLICT / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_L2L3_CONFLICT_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from either the local L2 or local L3, without an RC dispatch conflict",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_L2L3_NOCONFLICT / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_L2L3_NOCONFLICT_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from a source beyond the local L2 and local L3",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_L3MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_L3MISS_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from a neighbor chiplet's L2 or L3 in the same chip",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_L21_L31 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_L21_L31_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from L4, local memory or OpenCapp chip",
>>
>> What is "OpenCapp"?  Is is different from OpenCAPI?
> 
> Hi Paul,
>     Yes, OpenCapp is same as OpenCAPI. But as these descriptions
> are provided by hardware team and same is followed in the PMU workbook.
> We need to use OpenCapp.

Hi Paul,
      I further checked with the team, using OpenCAPI make more sense.
Thanks for pointing it. Will correct it in the next version.

Thanks,
Kajol Jain

> 
>>
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_LMEM / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_LMEM_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from a remote chip (cache, L4, memory or OpenCapp) in the same group",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_OFF_CHIP / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_OFF_CHIP_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is waiting for a load miss to resolve from a distant chip (cache, L4, memory or OpenCapp chip)",
>>> +        "MetricExpr": "PM_EXEC_STALL_DMISS_OFF_NODE / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DMISS_OFF_NODE_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is executing a TLBIEL instruction",
>>> +        "MetricExpr": "PM_EXEC_STALL_TLBIEL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "TLBIEL_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is finishing a load after its data has been reloaded from a data source beyond the local L1, OR when the LSU is processing an L1-hit, OR when the NTF instruction merged with another load in the LMQ",
>>> +        "MetricExpr": "PM_EXEC_STALL_LOAD_FINISH / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "LOAD_FINISH_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a store that is executing in the LSU",
>>> +        "MetricExpr": "PM_EXEC_STALL_STORE / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "STORE_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is in the store unit outside of handling store misses or other special store operations",
>>
>> Is "store unit" not the same as "LSU" ?  Use "LSU" uniformly if appropriate:
>> s/store unit/LSU/
> 
> Here using store unit is more appropriate as we are counting
> instructions executed in the store unit of LSU.
> 
>>
>>> +        "MetricExpr": "PM_EXEC_STALL_STORE_PIPE / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "STORE_PIPE_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a store whose cache line was not resident in the L1 and had to wait for allocation of the missing line into the L1",
>>> +        "MetricExpr": "PM_EXEC_STALL_STORE_MISS / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "STORE_MISS_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a TLBIE instruction waiting for a response from the L2",
>>> +        "MetricExpr": "PM_EXEC_STALL_TLBIE / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "TLBIE_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is executing a PTESYNC instruction",
>>> +        "MetricExpr": "PM_EXEC_STALL_PTESYNC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "PTESYNC_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction cannot complete because the thread was blocked",
>>> +        "MetricExpr": "PM_CMPL_STALL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction cannot complete because it was interrupted by ANY exception",
>>> +        "MetricExpr": "PM_CMPL_STALL_EXCEPTION / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "EXCEPTION_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is stuck at finish waiting for the non-speculative finish of either a STCX instruction waiting for its result or a load waiting for non-critical sectors of data and ECC",
>>> +        "MetricExpr": "PM_CMPL_STALL_MEM_ECC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "MEM_ECC_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction cannot complete the instruction is a STCX instruction waiting for resolution from the nest",
>>
>> Need to reword this, I think.  I propose "Average cycles per instruction
>> when the NTC instruction is a STCX instruction waiting for resolution
>> from the nest", which follows the form used by HWSYNC_COMPLETION_STALL_CPI,
>> below.
> 
> Yes make sense. Will update this description.
> 
>>
>>> +        "MetricExpr": "PM_CMPL_STALL_STCX / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "STCX_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a LWSYNC instruction waiting to complete",
>>> +        "MetricExpr": "PM_CMPL_STALL_LWSYNC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "LWSYNC_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction is a HWSYNC instruction stuck at finish waiting for a response from the L2",
>>> +        "MetricExpr": "PM_CMPL_STALL_HWSYNC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "HWSYNC_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction required special handling before completion",
>>> +        "MetricExpr": "PM_CMPL_STALL_SPECIAL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "SPECIAL_COMPLETION_STALL_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when dispatch was stalled because fetch was being held, so there was nothing in the pipeline for this thread",
>>> +        "MetricExpr": "PM_DISP_STALL_FETCH / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_FETCH_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTC instruction was held at dispatch because of power management",
>>> +        "MetricExpr": "PM_DISP_STALL_HELD_HALT_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "CPI",
>>> +        "MetricName": "DISPATCHED_HELD_HALT_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of flushes per completed run instruction",
>>
>> s/per completed run instruction/per instruction/
> 
> As discussed I will update it to completed insruction in all the below
> descriptions.
> 
>>
>>> +        "MetricExpr": "PM_FLUSH / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "FLUSH_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of flushes due to a branch mispredict per instruction",
>>> +        "MetricExpr": "PM_FLUSH_MPRED / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "BR_MPRED_FLUSH_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of branch mispredictions per completed run instruction",
>>
>> s/per completed run instruction/per instruction/
>>
>>> +        "MetricExpr": "PM_BR_MPRED_CMPL / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "BRANCH_MISPREDICTION_RATE"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of finished loads that missed in the L1",
>>> +        "MetricExpr": "PM_LD_MISS_L1 / PM_LD_REF_L1 * 100",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "L1_LD_MISS_RATIO",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of completed run instructions that were loads that missed the L1",
>>
>> s/completed run instructions/instructions/
>>
>>> +        "MetricExpr": "PM_LD_MISS_L1 / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "L1_LD_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of instructions when the DPTEG required for the load/store instruction in execution was missing from the TLB",
>>> +        "MetricExpr": "PM_DTLB_MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Others",
>>> +        "MetricName": "DTLB_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of instructions dispatched per instruction completed",
>>> +        "MetricExpr": "PM_INST_DISP / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "DISPATCH_PER_INST_CMPL"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of completed run instructions that were a demand load that did not hit in the L1 or L2",
>>
>> s/completed run instructions/instructions/
>>
>>> +        "MetricExpr": "PM_DATA_FROM_L2MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "L2_LD_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of completed run instructions that were demand fetches that missed the L1 instruction cache",
>>
>> s/completed run instructions/instructions/
>> s/instruction cache/icache/ to be consistent with the rest of the file
>>
>>> +        "MetricExpr": "PM_L1_ICACHE_MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Instruction_Misses",
>>> +        "MetricName": "L1_INST_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of completed run instructions that were demand fetches that reloaded from beyond the L3 instruction cache",
>>
>> s/completed run instructions/instructions/
>> s/instruction cache/icache/ to be consistent with the rest of the file
> 
> Sure, I will make it icache.
> 
> Thanks,
> Kajol Jain
> 
>>
>>> +        "MetricExpr": "PM_INST_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "L3_INST_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of completed instructions per cycle",
>>> +        "MetricExpr": "PM_INST_CMPL / PM_CYC",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "IPC"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of cycles per completed instruction group",
>>> +        "MetricExpr": "PM_CYC / PM_1PLUS_PPC_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "CYCLES_PER_COMPLETED_INSTRUCTIONS_SET"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of cycles when at least 1 instruction dispatched",
>>> +        "MetricExpr": "PM_1PLUS_PPC_DISP / PM_RUN_CYC * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "CYCLES_ATLEAST_ONE_INST_DISPATCHED",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of finished loads per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_LD_REF_L1 / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "LOADS_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of finished stores per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_ST_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "STORES_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of demand loads that reloaded from beyond the L2 per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DATA_FROM_L2MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "dL1_Reloads",
>>> +        "MetricName": "DL1_RELOAD_FROM_L2_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of demand loads that reloaded from beyond the L3 per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DATA_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "dL1_Reloads",
>>> +        "MetricName": "DL1_RELOAD_FROM_L3_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses with 4k page size per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DERAT_MISS_4K / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_4K_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses with 64k page size per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DERAT_MISS_64K / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_64K_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of run cycles per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_RUN_CYC / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "RUN_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DERAT_MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of completed run instructions per run cycle",
>>
>> s/completed run instructions/instructions/
>>
>>> +        "MetricExpr": "PM_RUN_INST_CMPL / PM_RUN_CYC",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "RUN_IPC"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of instructions completed per instruction group",
>>
>> s/completed//
>>
>>> +        "MetricExpr": "PM_RUN_INST_CMPL / PM_1PLUS_PPC_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "AVERAGE_COMPLETED_INSTRUCTION_SET_SIZE"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of finished instructions per completed run instructions",
>>
>> s/completed run instructions/instruction/
>>
>>> +        "MetricExpr": "PM_INST_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "INST_FIN_PER_CMPL"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average cycles per instruction when the NTF instruction is completing and the finish was overlooked",
>>> +        "MetricExpr": "PM_EXEC_STALL_UNKNOWN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "EXEC_STALL_UNKOWN_CPI"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of finished branches that were taken",
>>> +        "MetricExpr": "PM_BR_TAKEN_CMPL / PM_BR_FIN * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "TAKEN_BRANCHES",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of completed run instructions that were a demand load that did not hit in the L1, L2, or the L3",
>>
>> s/completed run instructions/instructions/
>>
>>> +        "MetricExpr": "PM_DATA_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "L3_LD_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of finished branches per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_BR_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "BRANCHES_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of instructions finished in the LSU per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_LSU_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "LSU_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of instructions finished in the VSU per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_VSU_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "VSU_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of TLBIE instructions finished in the LSU per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_TLBIE_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "TLBIE_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of STCX instructions finshed per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_STCX_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "STXC_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of LARX instructions finshed per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_LARX_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "LARX_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of PTESYNC instructions finshed per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_PTESYNC_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "PTESYNC_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Average number of simple fixed-point instructions finshed in the store unit per completed run instruction",
>>
>> s/completed run instruction/instruction/
>> s/store unit/LSU/
>>
>>> +        "MetricExpr": "PM_FX_LSU_FIN / PM_RUN_INST_CMPL",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "FX_PER_INST"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of demand load misses that reloaded the L1 cache",
>>> +        "MetricExpr": "PM_LD_DEMAND_MISS_L1 / PM_LD_MISS_L1 * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "DL1_MISS_RELOADS",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of demand load misses that reloaded from beyond the local L2",
>>> +        "MetricExpr": "PM_DATA_FROM_L2MISS / PM_LD_DEMAND_MISS_L1 * 100",
>>> +        "MetricGroup": "dL1_Reloads",
>>> +        "MetricName": "DL1_RELOAD_FROM_L2_MISS",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of demand load misses that reloaded from beyond the local L3",
>>> +        "MetricExpr": "PM_DATA_FROM_L3MISS / PM_LD_DEMAND_MISS_L1 * 100",
>>> +        "MetricGroup": "dL1_Reloads",
>>> +        "MetricName": "DL1_RELOAD_FROM_L3_MISS",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of cycles stalled due to the NTC instruction waiting for a load miss to resolve from a source beyond the local L2 and local L3",
>>> +        "MetricExpr": "DMISS_L3MISS_STALL_CPI / RUN_CPI * 100",
>>> +        "MetricGroup": "General",
>>> +        "MetricName": "DCACHE_MISS_CPI",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses with 2M page size per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DERAT_MISS_2M / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_2M_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses with 16M page size per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_DERAT_MISS_16M / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_16M_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "DERAT miss ratio for 4K page size",
>>> +        "MetricExpr": "PM_DERAT_MISS_4K / PM_DERAT_MISS",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_4K_MISS_RATIO"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "DERAT miss ratio for 2M page size",
>>> +        "MetricExpr": "PM_DERAT_MISS_2M / PM_DERAT_MISS",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_2M_MISS_RATIO"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "DERAT miss ratio for 16M page size",
>>> +        "MetricExpr": "PM_DERAT_MISS_16M / PM_DERAT_MISS",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_16M_MISS_RATIO"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "DERAT miss ratio for 64K page size",
>>> +        "MetricExpr": "PM_DERAT_MISS_64K / PM_DERAT_MISS",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_64K_MISS_RATIO"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of DERAT misses that resulted in TLB reloads",
>>> +        "MetricExpr": "PM_DTLB_MISS / PM_DERAT_MISS * 100",
>>> +        "MetricGroup": "Translation",
>>> +        "MetricName": "DERAT_MISS_RELOAD",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of icache misses that were reloaded from beyond the local L3",
>>> +        "MetricExpr": "PM_INST_FROM_L3MISS / PM_L1_ICACHE_MISS * 100",
>>> +        "MetricGroup": "Instruction_Misses",
>>> +        "MetricName": "INST_FROM_L3_MISS",
>>> +        "ScaleUnit": "1%"
>>> +    },
>>> +    {
>>> +        "BriefDescription": "Percentage of icache reloads from the beyond the L3 per completed run instruction",
>>
>> s/completed run instruction/instruction/
>>
>>> +        "MetricExpr": "PM_INST_FROM_L3MISS / PM_RUN_INST_CMPL * 100",
>>> +        "MetricGroup": "Instruction_Misses",
>>> +        "MetricName": "INST_FROM_L3_MISS_RATE",
>>> +        "ScaleUnit": "1%"
>>> +    }
>>> +]
>>> -- 
>>
>> PC
>>

^ permalink raw reply

* Re: [PATCH v2 09/45] xen/x86: Use do_kernel_power_off()
From: Juergen Gross @ 2021-10-28  7:35 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Lee Jones,
	Rafael J . Wysocki, Mark Brown, Andrew Morton, Guenter Roeck,
	Russell King, Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Helge Deller, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, Nick Hu, Avi Fishman,
	Patrick Venture, Liam Girdwood, linux-kernel, Palmer Dabbelt,
	Philipp Zabel, Guo Ren, linuxppc-dev, openbmc, Joshua Thompson
In-Reply-To: <20211027211715.12671-10-digetx@gmail.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 417 bytes --]

On 27.10.21 23:16, Dmitry Osipenko wrote:
> Kernel now supports chained power-off handlers. Use do_kernel_power_off()
> that invokes chained power-off handlers. It also invokes legacy
> pm_power_off() for now, which will be removed once all drivers will
> be converted to the new power-off API.
> 
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

Acked-by: Juergen Gross <jgross@suse.com>


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* [PATCH -next] powerpc/44x/fsp2: add missing of_node_put
From: Bixuan Cui @ 2021-10-28  7:28 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel; +Cc: cuibixuan, ivan, paulus

Early exits from for_each_compatible_node() should decrement the
node reference counter.  Reported by Coccinelle:

./arch/powerpc/platforms/44x/fsp2.c:206:1-25: WARNING: Function
"for_each_compatible_node" should have of_node_put() before return
around line 218.

Fixes: 7813043e1bbc ("powerpc/44x/fsp2: Add irq error handlers")
Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
---
 arch/powerpc/platforms/44x/fsp2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/44x/fsp2.c b/arch/powerpc/platforms/44x/fsp2.c
index b299e43..823397c 100644
--- a/arch/powerpc/platforms/44x/fsp2.c
+++ b/arch/powerpc/platforms/44x/fsp2.c
@@ -208,6 +208,7 @@ static void node_irq_request(const char *compat, irq_handler_t errirq_handler)
 		if (irq == NO_IRQ) {
 			pr_err("device tree node %pOFn is missing a interrupt",
 			      np);
+			of_node_put(np);
 			return;
 		}
 
@@ -215,6 +216,7 @@ static void node_irq_request(const char *compat, irq_handler_t errirq_handler)
 		if (rc) {
 			pr_err("fsp_of_probe: request_irq failed: np=%pOF rc=%d",
 			      np, rc);
+			of_node_put(np);
 			return;
 		}
 	}
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v2 16/45] parisc: Use do_kernel_power_off()
From: Helge Deller @ 2021-10-28  6:38 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Jonathan Hunter, Lee Jones,
	Rafael J . Wysocki, Mark Brown, Andrew Morton, Guenter Roeck,
	Russell King, Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, openbmc, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Juergen Gross, Thomas Bogendoerfer, linux-parisc, Nick Hu,
	Avi Fishman, Patrick Venture, Liam Girdwood, linux-kernel,
	Palmer Dabbelt, Philipp Zabel, Guo Ren, linuxppc-dev,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-17-digetx@gmail.com>

On 10/27/21 23:16, Dmitry Osipenko wrote:
> Kernel now supports chained power-off handlers. Use do_kernel_power_off()
> that invokes chained power-off handlers. It also invokes legacy
> pm_power_off() for now, which will be removed once all drivers will
> be converted to the new power-off API.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>

Acked-by: Helge Deller <deller@gmx.de> # parisc



> ---
>  arch/parisc/kernel/process.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index ea3d83b6fb62..928201b1f58f 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -26,6 +26,7 @@
>  #include <linux/module.h>
>  #include <linux/personality.h>
>  #include <linux/ptrace.h>
> +#include <linux/reboot.h>
>  #include <linux/sched.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched/task.h>
> @@ -114,8 +115,7 @@ void machine_power_off(void)
>  	pdc_chassis_send_status(PDC_CHASSIS_DIRECT_SHUTDOWN);
>
>  	/* ipmi_poweroff may have been installed. */
> -	if (pm_power_off)
> -		pm_power_off();
> +	do_kernel_power_off();
>
>  	/* It seems we have no way to power the system off via
>  	 * software. The user has to press the button himself. */
>


^ permalink raw reply

* [powerpc:next-test 29/62] page_poison.c:(.text+0x350): multiple definition of `__kernel_map_pages'; arch/powerpc/mm/pgtable_32.o:pgtable_32.c:(.text+0xa0): first defined here
From: kernel test robot @ 2021-10-28  6:47 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1551 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
head:   5d354dc35ebb0b224d627264c60f60dbda3a1bc3
commit: 68b44f94d6370e2c6c790fedd28e637fa9964a93 [29/62] powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE
config: powerpc-randconfig-r016-20211027 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=68b44f94d6370e2c6c790fedd28e637fa9964a93
        git remote add powerpc https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
        git fetch --no-tags powerpc next-test
        git checkout 68b44f94d6370e2c6c790fedd28e637fa9964a93
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   powerpc-linux-ld: mm/page_poison.o: in function `__kernel_map_pages':
>> page_poison.c:(.text+0x350): multiple definition of `__kernel_map_pages'; arch/powerpc/mm/pgtable_32.o:pgtable_32.c:(.text+0xa0): first defined here

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 48273 bytes --]

^ permalink raw reply

* Re: Linux kernel: powerpc: KVM guest can trigger host crash on Power8
From: Michael Ellerman @ 2021-10-28  6:39 UTC (permalink / raw)
  To: John Paul Adrian Glaubitz; +Cc: debian-powerpc@lists.debian.org, linuxppc-dev
In-Reply-To: <c21c7a0e-95f1-e6d2-a04c-fb99d801e8da@physik.fu-berlin.de>

[ Dropping oss-security from Cc]

John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
> On 10/27/21 13:06, Michael Ellerman wrote:
>> John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> writes:
>>> On 10/27/21 07:30, Michael Ellerman wrote:
>>>> I did test the repro case you gave me before (in the bugzilla), which
>>>> was building glibc, that passes for me with a patched host.
>>>
>>> Did you manage to crash the unpatched host?
>> 
>> Yes, the parallel builds of glibc you described crashed the unpatched
>> host 100% reliably for me.
>
> OK, that is very good news!
>
>> I also have a standalone reproducer I'll send you.
>
> Thanks, that would be helpful!
>
>>> Also, I'll try a kernel from git with Debian's config.
>>>
>>>> I guess we have yet another bug.
>>>>
>>>> I tried the following in a debian BE VM and it completed fine:
>>>>
>>>>  $ dget -u http://ftp.debian.org/debian/pool/main/g/git/git_2.33.1-1.dsc
>>>>  $ sbuild -d sid --arch=powerpc --no-arch-all git_2.33.1-1.dsc
>>>>
>>>> Same for ppc64.
>>>>
>>>> And I also tried both at once, repeatedly in a loop.
>>>
>>> Did you try building gcc-11 for powerpc and ppc64 both at once?
>> 
>> No, I will try that now.

That completed fine on my BE VM here.

I ran these in two tmux windows:
  $ sbuild -d sid --arch=powerpc --no-arch-all gcc-11_11.2.0-10.dsc
  $ sbuild -d sid --arch=ppc64 --no-arch-all gcc-11_11.2.0-10.dsc


The VM has 32 CPUs, with 4 threads per core:

  $ ppc64_cpu --info
  Core   0:    0*    1*    2*    3*
  Core   1:    4*    5*    6*    7*
  Core   2:    8*    9*   10*   11*
  Core   3:   12*   13*   14*   15*
  Core   4:   16*   17*   18*   19*
  Core   5:   20*   21*   22*   23*
  Core   6:   24*   25*   26*   27*
  Core   7:   28*   29*   30*   31*


cheers

^ permalink raw reply

* Re: [oss-security] Linux kernel: powerpc: KVM guest can trigger host crash on Power8
From: Salvatore Bonaccorso @ 2021-10-28  3:58 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: oss-security, linuxppc-dev, John Paul Adrian Glaubitz
In-Reply-To: <87pmrtbbdt.fsf@mpe.ellerman.id.au>

Hi,

On Mon, Oct 25, 2021 at 10:18:54PM +1100, Michael Ellerman wrote:
> The Linux kernel for powerpc since v5.2 has a bug which allows a
> malicious KVM guest to crash the host, when the host is running on
> Power8.
> 
> Only machines using Linux as the hypervisor, aka. KVM, powernv or bare
> metal, are affected by the bug. Machines running PowerVM are not
> affected.
> 
> The bug was introduced in:
> 
>     10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
> 
> Which was first released in v5.2.
> 
> The upstream fix is:
> 
>   cdeb5d7d890e ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
> 
> Which will be included in the v5.16 release.
> 
> Note to backporters, the following commits are required:
> 
>   73287caa9210ded6066833195f4335f7f688a46b
>   ("powerpc64/idle: Fix SP offsets when saving GPRs")
> 
>   9b4416c5095c20e110c82ae602c254099b83b72f
>   ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
> 
>   cdeb5d7d890e14f3b70e8087e745c4a6a7d9f337
>   ("KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest")
> 
>   496c5fe25c377ddb7815c4ce8ecfb676f051e9b6
>   ("powerpc/idle: Don't corrupt back chain when going idle")
> 
> 
> I have a test case to trigger the bug, which I can share privately with
> anyone who would like to test the fix.

The issue has been assigned CVE-2021-43056.

Regards,
Salvatore

^ permalink raw reply

* Re: [PATCH v2 18/45] riscv: Use do_kernel_power_off()
From: Palmer Dabbelt @ 2021-10-27 23:09 UTC (permalink / raw)
  To: digetx
  Cc: ulf.hansson, dalias, linux-ia64, tmaimon77, ssantosh, rafael,
	boris.ostrovsky, catalin.marinas, linus.walleij, dave.hansen, x86,
	tali.perry1, James.Bottomley, thierry.reding, guoren, pavel, hpa,
	linux-riscv, deanbo422, will, gerg, sstabellini, benjaminfair,
	ysato, krzysztof.kozlowski, linux-sh, lee.jones, deller,
	daniel.lezcano, linux, linux-csky, jonathanh, tony, wens, mingo,
	geert, xen-devel, linux-mips, linux, lenb, aou, linux-omap,
	j.neuschaefer, vz, linux-acpi, linux-m68k, broonie, bp, green.hu,
	Paul Walmsley, linux-tegra, tglx, andriy.shevchenko, yuenn,
	linux-arm-kernel, jgross, tsbogend, linux-parisc, nickhu,
	avifishman70, venture, linux-pm, lgirdwood, linux-kernel, p.zabel,
	paulus, akpm, linuxppc-dev, openbmc, funaho
In-Reply-To: <20211027211715.12671-19-digetx@gmail.com>

On Wed, 27 Oct 2021 14:16:48 PDT (-0700), digetx@gmail.com wrote:
> Kernel now supports chained power-off handlers. Use do_kernel_power_off()
> that invokes chained power-off handlers. It also invokes legacy
> pm_power_off() for now, which will be removed once all drivers will
> be converted to the new power-off API.
>
> Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  arch/riscv/kernel/reset.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/arch/riscv/kernel/reset.c b/arch/riscv/kernel/reset.c
> index 9c842c41684a..912288572226 100644
> --- a/arch/riscv/kernel/reset.c
> +++ b/arch/riscv/kernel/reset.c
> @@ -23,16 +23,12 @@ void machine_restart(char *cmd)
>
>  void machine_halt(void)
>  {
> -	if (pm_power_off != NULL)
> -		pm_power_off();
> -	else
> -		default_power_off();
> +	do_kernel_power_off();
> +	default_power_off();
>  }
>
>  void machine_power_off(void)
>  {
> -	if (pm_power_off != NULL)
> -		pm_power_off();
> -	else
> -		default_power_off();
> +	do_kernel_power_off();
> +	default_power_off();
>  }

Acked-by: Palmer Dabbelt <palmer@dabbelt.com>

^ permalink raw reply

* Re: [PATCH] powerpc/security: Use a mutex for interrupt exit code patching
From: Nicholas Piggin @ 2021-10-28  3:03 UTC (permalink / raw)
  To: linuxppc-dev, Russell Currey
In-Reply-To: <20211027072410.40950-1-ruscur@russell.cc>

Excerpts from Russell Currey's message of October 27, 2021 5:24 pm:
> The mitigation-patching.sh script in the powerpc selftests toggles
> all mitigations on and off simultaneously, revealing that rfi_flush
> and stf_barrier cannot safely operate at the same time due to races
> in updating the static key.
> 
> On some systems, the static key code throws a warning and the kernel
> remains functional.  On others, the kernel will hang or crash.
> 
> Fix this by slapping on a mutex.
> 
> Fixes: 13799748b957 ("powerpc/64: use interrupt restart table to speed up return from interrupt")
> Signed-off-by: Russell Currey <ruscur@russell.cc>

Doh,

Acked-by: Nicholas Piggin <npiggin@gmail.com>

> ---
>  arch/powerpc/lib/feature-fixups.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
> index cda17bee5afe..9956277fbd88 100644
> --- a/arch/powerpc/lib/feature-fixups.c
> +++ b/arch/powerpc/lib/feature-fixups.c
> @@ -228,6 +228,7 @@ static void do_stf_exit_barrier_fixups(enum stf_barrier_type types)
>  
>  static bool stf_exit_reentrant = false;
>  static bool rfi_exit_reentrant = false;
> +static DEFINE_MUTEX(exit_flush_lock);
>  
>  static int __do_stf_barrier_fixups(void *data)
>  {
> @@ -253,6 +254,9 @@ void do_stf_barrier_fixups(enum stf_barrier_type types)
>  	 * low level interrupt exit code before patching. After the patching,
>  	 * if allowed, then flip the branch to allow fast exits.
>  	 */
> +
> +	// Prevent static key update races with do_rfi_flush_fixups()
> +	mutex_lock(&exit_flush_lock);
>  	static_branch_enable(&interrupt_exit_not_reentrant);
>  
>  	stop_machine(__do_stf_barrier_fixups, &types, NULL);
> @@ -264,6 +268,7 @@ void do_stf_barrier_fixups(enum stf_barrier_type types)
>  
>  	if (stf_exit_reentrant && rfi_exit_reentrant)
>  		static_branch_disable(&interrupt_exit_not_reentrant);
> +	mutex_unlock(&exit_flush_lock);
>  }
>  
>  void do_uaccess_flush_fixups(enum l1d_flush_type types)
> @@ -486,6 +491,9 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
>  	 * without stop_machine, so this could be achieved with a broadcast
>  	 * IPI instead, but this matches the stf sequence.
>  	 */
> +
> +	// Prevent static key update races with do_stf_barrier_fixups()
> +	mutex_lock(&exit_flush_lock);
>  	static_branch_enable(&interrupt_exit_not_reentrant);
>  
>  	stop_machine(__do_rfi_flush_fixups, &types, NULL);
> @@ -497,6 +505,7 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
>  
>  	if (stf_exit_reentrant && rfi_exit_reentrant)
>  		static_branch_disable(&interrupt_exit_not_reentrant);
> +	mutex_unlock(&exit_flush_lock);
>  }
>  
>  void do_barrier_nospec_fixups_range(bool enable, void *fixup_start, void *fixup_end)
> -- 
> 2.33.1
> 
> 

^ permalink raw reply

* Re: instruction storage exception handling
From: Nicholas Piggin @ 2021-10-28  3:01 UTC (permalink / raw)
  To: Christophe Leroy, Jacques de Laval
  Cc: Scott Wood, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <V0kJsLgxvO-1SWRhS-9Nbx1E6oXO6IAJDAYBUA_kieAbf5J8MOnrRzdAiCSl2KoRjztnI3LitFLsJstAOVnWZ-4PBzWmrpTiqIYnU7TRXyo=@protonmail.com>

Excerpts from Jacques de Laval's message of October 27, 2021 10:03 pm:
> On Wednesday, October 27th, 2021 at 9:52 AM, Christophe Leroy <christophe.leroy@csgroup.eu> wrote:
> 
>> Le 27/10/2021 à 09:47, Nicholas Piggin a écrit :
>>
>> > You're right. In that case it shouldn't change anything unless there
>> >
>> > was a BO fault. I'm not sure what the problem is then. Guessing based
>> >
>> > on the NIP and instructions, it looks like it's probably got the correct
>> >
>> > user address that it's storing into vmf on the stack, so it has got past
>> >
>> > the access checks so my theory would be wrong anyway.
>> >
>> > Must be something simple but I can't see it yet.
>>
>> Anyway, I think it is still worth doing the check with setting 0 in
>>
>> _ESR(r11), maybe the Reference Manual is wrong.
>>
>> So Jacques, please do the test anyway if you can.
>>
>> Thanks
>>
>> Christophe
> 
> I tested with the last patch from Nicholas, and with that I can not
> reproduce the issue, so it seems like that solves it for my case and setup
> at least.
> 
> Big thanks Christophe and Nicholas for looking in to this!

Thanks for reporting and testing. We can certainly send this patch 
upstream to fix the regression, but I'm still not exactly sure what is 
going on. If it is an errata or part of specification we missed that 
could explain it but it would be good to understand and comment it.

If you have time to test again with only the following patch applied,
it might give a better clue. This patch should keep running but it
would print some dmesg output.

Thanks,
Nick

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..cf56f23ff90a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -548,6 +548,12 @@ static __always_inline void __do_page_fault(struct pt_regs *regs)
 
 DEFINE_INTERRUPT_HANDLER(do_page_fault)
 {
+	if (TRAP(regs) == INTERRUPT_INST_STORAGE) {
+		if (regs->dsisr != 0) {
+			printk("ISI pc:%lx msr:%lx dsisr:%lx ESR:%lx\n", regs->nip, regs->msr, regs->dsisr, mfspr(SPRN_ESR));
+			regs->dsisr = 0; // fix?
+		}
+	}
 	__do_page_fault(regs);
 }
 

^ permalink raw reply related

* [PATCH] scsi: ibmvfc: replace scnprintf in show functions with sysfs_emit
From: cgel.zte @ 2021-10-28  2:24 UTC (permalink / raw)
  To: tyreld
  Cc: martin.petersen, linux-scsi, jejb, linux-kernel, Changcheng Deng,
	paulus, linuxppc-dev, Zeal Robot

From: Changcheng Deng <deng.changcheng@zte.com.cn>

Fix the coccicheck warning:
WARNING: use scnprintf or sprintf

Use sysfs_emit instead of scnprintf or sprintf makes more sense.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index a4b0a12f8a97..7f39a965677b 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -3402,8 +3402,7 @@ static ssize_t ibmvfc_show_host_partition_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
-			vhost->login_buf->resp.partition_name);
+	return sysfs_emit(buf, "%s\n", vhost->login_buf->resp.partition_name);
 }
 
 static ssize_t ibmvfc_show_host_device_name(struct device *dev,
@@ -3412,8 +3411,7 @@ static ssize_t ibmvfc_show_host_device_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
-			vhost->login_buf->resp.device_name);
+	return sysfs_emit(buf, "%s\n", vhost->login_buf->resp.device_name);
 }
 
 static ssize_t ibmvfc_show_host_loc_code(struct device *dev,
@@ -3422,8 +3420,7 @@ static ssize_t ibmvfc_show_host_loc_code(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
-			vhost->login_buf->resp.port_loc_code);
+	return sysfs_emit(buf, "%s\n", vhost->login_buf->resp.port_loc_code);
 }
 
 static ssize_t ibmvfc_show_host_drc_name(struct device *dev,
@@ -3432,8 +3429,7 @@ static ssize_t ibmvfc_show_host_drc_name(struct device *dev,
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
-			vhost->login_buf->resp.drc_name);
+	return sysfs_emit(buf, "%s\n", vhost->login_buf->resp.drc_name);
 }
 
 static ssize_t ibmvfc_show_host_npiv_version(struct device *dev,
@@ -3441,7 +3437,7 @@ static ssize_t ibmvfc_show_host_npiv_version(struct device *dev,
 {
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
-	return snprintf(buf, PAGE_SIZE, "%d\n", be32_to_cpu(vhost->login_buf->resp.version));
+	return sysfs_emit(buf, "%d\n", be32_to_cpu(vhost->login_buf->resp.version));
 }
 
 static ssize_t ibmvfc_show_host_capabilities(struct device *dev,
@@ -3449,7 +3445,7 @@ static ssize_t ibmvfc_show_host_capabilities(struct device *dev,
 {
 	struct Scsi_Host *shost = class_to_shost(dev);
 	struct ibmvfc_host *vhost = shost_priv(shost);
-	return snprintf(buf, PAGE_SIZE, "%llx\n", be64_to_cpu(vhost->login_buf->resp.capabilities));
+	return sysfs_emit(buf, "%llx\n", be64_to_cpu(vhost->login_buf->resp.capabilities));
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
From: Alistair Popple @ 2021-10-28  1:42 UTC (permalink / raw)
  To: akpm, Felix Kuehling
  Cc: rcampbell, amd-gfx, nouveau, linux-kernel, dri-devel, linux-mm,
	jglisse, kvm-ppc, ziy, jhubbard, alexander.deucher, linuxppc-dev,
	hch, bskeggs
In-Reply-To: <14d807ba-04f8-49cb-8094-bde1032f1eaf@amd.com>

On Wednesday, 27 October 2021 3:09:57 AM AEDT Felix Kuehling wrote:
> Am 2021-10-25 um 12:16 a.m. schrieb Alistair Popple:
> > MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
> > source page was already locked during migrate_vma_collect(). If it
> > wasn't then the a second attempt is made to lock the page. However if
> > the first attempt failed it's unlikely a second attempt will succeed,
> > and the retry adds complexity. So clean this up by removing the retry
> > and MIGRATE_PFN_LOCKED flag.
> >
> > Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
> > set, but nothing actually checks that.
> >
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> 
> It makes sense to me. Do you have any empirical data on how much more
> likely migrations are going to fail with this change due to contested
> page locks?

Thanks Felix. I do not have any empirical data on this but I've mostly seen
migrations fail due to the reference count check failing rather than failure to
lock the page. Even then it's mostly been due to thrashing on the same page, so
I would be surprised if this change made any noticeable difference.

> Either way, the patch is
> 
> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> 
> 
> > ---
> >  Documentation/vm/hmm.rst                 |   2 +-
> >  arch/powerpc/kvm/book3s_hv_uvmem.c       |   4 +-
> >  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |   2 -
> >  drivers/gpu/drm/nouveau/nouveau_dmem.c   |   4 +-
> >  include/linux/migrate.h                  |   1 -
> >  lib/test_hmm.c                           |   5 +-
> >  mm/migrate.c                             | 145 +++++------------------
> >  7 files changed, 35 insertions(+), 128 deletions(-)
> >
> > diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> > index a14c2938e7af..f2a59ed82ed3 100644
> > --- a/Documentation/vm/hmm.rst
> > +++ b/Documentation/vm/hmm.rst
> > @@ -360,7 +360,7 @@ between device driver specific code and shared common code:
> >     system memory page, locks the page with ``lock_page()``, and fills in the
> >     ``dst`` array entry with::
> >  
> > -     dst[i] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +     dst[i] = migrate_pfn(page_to_pfn(dpage));
> >  
> >     Now that the driver knows that this page is being migrated, it can
> >     invalidate device private MMU mappings and copy device private memory
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index a7061ee3b157..28c436df9935 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -560,7 +560,7 @@ static int __kvmppc_svm_page_out(struct vm_area_struct *vma,
> >  				  gpa, 0, page_shift);
> >  
> >  	if (ret == U_SUCCESS)
> > -		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
> > +		*mig.dst = migrate_pfn(pfn);
> >  	else {
> >  		unlock_page(dpage);
> >  		__free_page(dpage);
> > @@ -774,7 +774,7 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
> >  		}
> >  	}
> >  
> > -	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +	*mig.dst = migrate_pfn(page_to_pfn(dpage));
> >  	migrate_vma_pages(&mig);
> >  out_finalize:
> >  	migrate_vma_finalize(&mig);
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > index 4a16e3c257b9..41d9417f182b 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> > @@ -300,7 +300,6 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
> >  			migrate->dst[i] = svm_migrate_addr_to_pfn(adev, dst[i]);
> >  			svm_migrate_get_vram_page(prange, migrate->dst[i]);
> >  			migrate->dst[i] = migrate_pfn(migrate->dst[i]);
> > -			migrate->dst[i] |= MIGRATE_PFN_LOCKED;
> >  			src[i] = dma_map_page(dev, spage, 0, PAGE_SIZE,
> >  					      DMA_TO_DEVICE);
> >  			r = dma_mapping_error(dev, src[i]);
> > @@ -580,7 +579,6 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
> >  			      dst[i] >> PAGE_SHIFT, page_to_pfn(dpage));
> >  
> >  		migrate->dst[i] = migrate_pfn(page_to_pfn(dpage));
> > -		migrate->dst[i] |= MIGRATE_PFN_LOCKED;
> >  		j++;
> >  	}
> >  
> > diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > index 92987daa5e17..3828aafd3ac4 100644
> > --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
> > @@ -166,7 +166,7 @@ static vm_fault_t nouveau_dmem_fault_copy_one(struct nouveau_drm *drm,
> >  		goto error_dma_unmap;
> >  	mutex_unlock(&svmm->mutex);
> >  
> > -	args->dst[0] = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +	args->dst[0] = migrate_pfn(page_to_pfn(dpage));
> >  	return 0;
> >  
> >  error_dma_unmap:
> > @@ -602,7 +602,7 @@ static unsigned long nouveau_dmem_migrate_copy_one(struct nouveau_drm *drm,
> >  		((paddr >> PAGE_SHIFT) << NVIF_VMM_PFNMAP_V0_ADDR_SHIFT);
> >  	if (src & MIGRATE_PFN_WRITE)
> >  		*pfn |= NVIF_VMM_PFNMAP_V0_W;
> > -	return migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +	return migrate_pfn(page_to_pfn(dpage));
> >  
> >  out_dma_unmap:
> >  	dma_unmap_page(dev, *dma_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> > diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> > index c8077e936691..479b861ae490 100644
> > --- a/include/linux/migrate.h
> > +++ b/include/linux/migrate.h
> > @@ -119,7 +119,6 @@ static inline int migrate_misplaced_page(struct page *page,
> >   */
> >  #define MIGRATE_PFN_VALID	(1UL << 0)
> >  #define MIGRATE_PFN_MIGRATE	(1UL << 1)
> > -#define MIGRATE_PFN_LOCKED	(1UL << 2)
> >  #define MIGRATE_PFN_WRITE	(1UL << 3)
> >  #define MIGRATE_PFN_SHIFT	6
> >  
> > diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> > index c259842f6d44..e2ce8f9b7605 100644
> > --- a/lib/test_hmm.c
> > +++ b/lib/test_hmm.c
> > @@ -613,8 +613,7 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
> >  		 */
> >  		rpage->zone_device_data = dmirror;
> >  
> > -		*dst = migrate_pfn(page_to_pfn(dpage)) |
> > -			    MIGRATE_PFN_LOCKED;
> > +		*dst = migrate_pfn(page_to_pfn(dpage));
> >  		if ((*src & MIGRATE_PFN_WRITE) ||
> >  		    (!spage && args->vma->vm_flags & VM_WRITE))
> >  			*dst |= MIGRATE_PFN_WRITE;
> > @@ -1137,7 +1136,7 @@ static vm_fault_t dmirror_devmem_fault_alloc_and_copy(struct migrate_vma *args,
> >  		lock_page(dpage);
> >  		xa_erase(&dmirror->pt, addr >> PAGE_SHIFT);
> >  		copy_highpage(dpage, spage);
> > -		*dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
> > +		*dst = migrate_pfn(page_to_pfn(dpage));
> >  		if (*src & MIGRATE_PFN_WRITE)
> >  			*dst |= MIGRATE_PFN_WRITE;
> >  	}
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index a6a7743ee98f..915e969811d0 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2369,7 +2369,6 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >  		 * can't be dropped from it).
> >  		 */
> >  		get_page(page);
> > -		migrate->cpages++;
> >  
> >  		/*
> >  		 * Optimize for the common case where page is only mapped once
> > @@ -2379,7 +2378,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >  		if (trylock_page(page)) {
> >  			pte_t swp_pte;
> >  
> > -			mpfn |= MIGRATE_PFN_LOCKED;
> > +			migrate->cpages++;
> >  			ptep_get_and_clear(mm, addr, ptep);
> >  
> >  			/* Setup special migration page table entry */
> > @@ -2413,6 +2412,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >  
> >  			if (pte_present(pte))
> >  				unmapped++;
> > +		} else {
> > +			put_page(page);
> > +			mpfn = 0;
> >  		}
> >  
> >  next:
> > @@ -2517,15 +2519,17 @@ static bool migrate_vma_check_page(struct page *page)
> >  }
> >  
> >  /*
> > - * migrate_vma_prepare() - lock pages and isolate them from the lru
> > + * migrate_vma_unmap() - replace page mapping with special migration pte entry
> >   * @migrate: migrate struct containing all migration information
> >   *
> > - * This locks pages that have been collected by migrate_vma_collect(). Once each
> > - * page is locked it is isolated from the lru (for non-device pages). Finally,
> > - * the ref taken by migrate_vma_collect() is dropped, as locked pages cannot be
> > - * migrated by concurrent kernel threads.
> > + * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
> > + * special migration pte entry and check if it has been pinned. Pinned pages are
> > + * restored because we cannot migrate them.
> > + *
> > + * This is the last step before we call the device driver callback to allocate
> > + * destination memory and copy contents of original page over to new page.
> >   */
> > -static void migrate_vma_prepare(struct migrate_vma *migrate)
> > +static void migrate_vma_unmap(struct migrate_vma *migrate)
> >  {
> >  	const unsigned long npages = migrate->npages;
> >  	const unsigned long start = migrate->start;
> > @@ -2534,32 +2538,12 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >  
> >  	lru_add_drain();
> >  
> > -	for (i = 0; (i < npages) && migrate->cpages; i++) {
> > +	for (i = 0; i < npages; i++) {
> >  		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -		bool remap = true;
> >  
> >  		if (!page)
> >  			continue;
> >  
> > -		if (!(migrate->src[i] & MIGRATE_PFN_LOCKED)) {
> > -			/*
> > -			 * Because we are migrating several pages there can be
> > -			 * a deadlock between 2 concurrent migration where each
> > -			 * are waiting on each other page lock.
> > -			 *
> > -			 * Make migrate_vma() a best effort thing and backoff
> > -			 * for any page we can not lock right away.
> > -			 */
> > -			if (!trylock_page(page)) {
> > -				migrate->src[i] = 0;
> > -				migrate->cpages--;
> > -				put_page(page);
> > -				continue;
> > -			}
> > -			remap = false;
> > -			migrate->src[i] |= MIGRATE_PFN_LOCKED;
> > -		}
> > -
> >  		/* ZONE_DEVICE pages are not on LRU */
> >  		if (!is_zone_device_page(page)) {
> >  			if (!PageLRU(page) && allow_drain) {
> > @@ -2569,16 +2553,9 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >  			}
> >  
> >  			if (isolate_lru_page(page)) {
> > -				if (remap) {
> > -					migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -					migrate->cpages--;
> > -					restore++;
> > -				} else {
> > -					migrate->src[i] = 0;
> > -					unlock_page(page);
> > -					migrate->cpages--;
> > -					put_page(page);
> > -				}
> > +				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > +				migrate->cpages--;
> > +				restore++;
> >  				continue;
> >  			}
> >  
> > @@ -2586,80 +2563,20 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >  			put_page(page);
> >  		}
> >  
> > -		if (!migrate_vma_check_page(page)) {
> > -			if (remap) {
> > -				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -				migrate->cpages--;
> > -				restore++;
> > -
> > -				if (!is_zone_device_page(page)) {
> > -					get_page(page);
> > -					putback_lru_page(page);
> > -				}
> > -			} else {
> > -				migrate->src[i] = 0;
> > -				unlock_page(page);
> > -				migrate->cpages--;
> > +		if (page_mapped(page))
> > +			try_to_migrate(page, 0);
> >  
> > -				if (!is_zone_device_page(page))
> > -					putback_lru_page(page);
> > -				else
> > -					put_page(page);
> > +		if (page_mapped(page) || !migrate_vma_check_page(page)) {
> > +			if (!is_zone_device_page(page)) {
> > +				get_page(page);
> > +				putback_lru_page(page);
> >  			}
> > -		}
> > -	}
> > -
> > -	for (i = 0, addr = start; i < npages && restore; i++, addr += PAGE_SIZE) {
> > -		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -
> > -		if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
> > -			continue;
> >  
> > -		remove_migration_pte(page, migrate->vma, addr, page);
> > -
> > -		migrate->src[i] = 0;
> > -		unlock_page(page);
> > -		put_page(page);
> > -		restore--;
> > -	}
> > -}
> > -
> > -/*
> > - * migrate_vma_unmap() - replace page mapping with special migration pte entry
> > - * @migrate: migrate struct containing all migration information
> > - *
> > - * Replace page mapping (CPU page table pte) with a special migration pte entry
> > - * and check again if it has been pinned. Pinned pages are restored because we
> > - * cannot migrate them.
> > - *
> > - * This is the last step before we call the device driver callback to allocate
> > - * destination memory and copy contents of original page over to new page.
> > - */
> > -static void migrate_vma_unmap(struct migrate_vma *migrate)
> > -{
> > -	const unsigned long npages = migrate->npages;
> > -	const unsigned long start = migrate->start;
> > -	unsigned long addr, i, restore = 0;
> > -
> > -	for (i = 0; i < npages; i++) {
> > -		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -
> > -		if (!page || !(migrate->src[i] & MIGRATE_PFN_MIGRATE))
> > +			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > +			migrate->cpages--;
> > +			restore++;
> >  			continue;
> > -
> > -		if (page_mapped(page)) {
> > -			try_to_migrate(page, 0);
> > -			if (page_mapped(page))
> > -				goto restore;
> >  		}
> > -
> > -		if (migrate_vma_check_page(page))
> > -			continue;
> > -
> > -restore:
> > -		migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -		migrate->cpages--;
> > -		restore++;
> >  	}
> >  
> >  	for (addr = start, i = 0; i < npages && restore; addr += PAGE_SIZE, i++) {
> > @@ -2672,12 +2589,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
> >  
> >  		migrate->src[i] = 0;
> >  		unlock_page(page);
> > +		put_page(page);
> >  		restore--;
> > -
> > -		if (is_zone_device_page(page))
> > -			put_page(page);
> > -		else
> > -			putback_lru_page(page);
> >  	}
> >  }
> >  
> > @@ -2700,8 +2613,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
> >   * it for all those entries (ie with MIGRATE_PFN_VALID and MIGRATE_PFN_MIGRATE
> >   * flag set).  Once these are allocated and copied, the caller must update each
> >   * corresponding entry in the dst array with the pfn value of the destination
> > - * page and with the MIGRATE_PFN_VALID and MIGRATE_PFN_LOCKED flags set
> > - * (destination pages must have their struct pages locked, via lock_page()).
> > + * page and with MIGRATE_PFN_VALID. Destination pages must be locked via
> > + * lock_page().
> >   *
> >   * Note that the caller does not have to migrate all the pages that are marked
> >   * with MIGRATE_PFN_MIGRATE flag in src array unless this is a migration from
> > @@ -2770,8 +2683,6 @@ int migrate_vma_setup(struct migrate_vma *args)
> >  
> >  	migrate_vma_collect(args);
> >  
> > -	if (args->cpages)
> > -		migrate_vma_prepare(args);
> >  	if (args->cpages)
> >  		migrate_vma_unmap(args);
> >  
> 





^ permalink raw reply

* Re: [PATCH] mm/migrate.c: Remove MIGRATE_PFN_LOCKED
From: Alistair Popple @ 2021-10-28  1:27 UTC (permalink / raw)
  To: akpm, Ralph Campbell
  Cc: amd-gfx, nouveau, Felix.Kuehling, linux-kernel, kvm-ppc, linux-mm,
	jglisse, dri-devel, ziy, jhubbard, alexander.deucher,
	linuxppc-dev, hch, bskeggs
In-Reply-To: <f92e2dfe-f033-9b09-e83c-203052b491e1@nvidia.com>

On Tuesday, 26 October 2021 11:57:06 AM AEDT Ralph Campbell wrote:
> 
> On 10/24/21 21:16, Alistair Popple wrote:
> > MIGRATE_PFN_LOCKED is used to indicate to migrate_vma_prepare() that a
> > source page was already locked during migrate_vma_collect(). If it
> > wasn't then the a second attempt is made to lock the page. However if
> > the first attempt failed it's unlikely a second attempt will succeed,
> > and the retry adds complexity. So clean this up by removing the retry
> > and MIGRATE_PFN_LOCKED flag.
> >
> > Destination pages are also meant to have the MIGRATE_PFN_LOCKED flag
> > set, but nothing actually checks that.
> >
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> 
> You can add:
> Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>

Thanks!

> >   
> >   		/*
> >   		 * Optimize for the common case where page is only mapped once
> > @@ -2379,7 +2378,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >   		if (trylock_page(page)) {
> >   			pte_t swp_pte;
> >   
> > -			mpfn |= MIGRATE_PFN_LOCKED;
> > +			migrate->cpages++;
> >   			ptep_get_and_clear(mm, addr, ptep);
> 
> I was looking at try_to_migrate_one() and looking at the differences with
> the code here to insert the migration PTE and noticed that instead of
> ptet_get_and_clear() it has:
> 	pteval = ptep_clear_flush(vma, address, pvmw.pte);
> 
> 	/* Move the dirty bit to the page. Now the pte is gone. */
> 	if (pte_dirty(pteval))
> 		set_page_dirty(page);
> 	update_hiwater_rss(mm);
> 
> I know that is pre-existing, probably a separate patch if it is an issue.

I don't think it is an issue today because migrate_vma only supports private,
non-shared pages. At some point though it may be extended to support
file-backed pages and this would be easy to miss so will put a patch together.

> >   
> >   			/* Setup special migration page table entry */
> > @@ -2413,6 +2412,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
> >   
> >   			if (pte_present(pte))
> >   				unmapped++;
> > +		} else {
> > +			put_page(page);
> > +			mpfn = 0;
> >   		}
> >   
> >   next:
> > @@ -2517,15 +2519,17 @@ static bool migrate_vma_check_page(struct page *page)
> >   }
> >   
> >   /*
> > - * migrate_vma_prepare() - lock pages and isolate them from the lru
> > + * migrate_vma_unmap() - replace page mapping with special migration pte entry
> >    * @migrate: migrate struct containing all migration information
> >    *
> > - * This locks pages that have been collected by migrate_vma_collect(). Once each
> > - * page is locked it is isolated from the lru (for non-device pages). Finally,
> > - * the ref taken by migrate_vma_collect() is dropped, as locked pages cannot be
> > - * migrated by concurrent kernel threads.
> > + * Isolate pages from the LRU and replace mappings (CPU page table pte) with a
> > + * special migration pte entry and check if it has been pinned. Pinned pages are
> > + * restored because we cannot migrate them.
> > + *
> > + * This is the last step before we call the device driver callback to allocate
> > + * destination memory and copy contents of original page over to new page.
> >    */
> > -static void migrate_vma_prepare(struct migrate_vma *migrate)
> > +static void migrate_vma_unmap(struct migrate_vma *migrate)
> >   {
> >   	const unsigned long npages = migrate->npages;
> >   	const unsigned long start = migrate->start;
> > @@ -2534,32 +2538,12 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >   
> >   	lru_add_drain();
> >   
> > -	for (i = 0; (i < npages) && migrate->cpages; i++) {
> > +	for (i = 0; i < npages; i++) {
> >   		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -		bool remap = true;
> >   
> >   		if (!page)
> >   			continue;
> >   
> > -		if (!(migrate->src[i] & MIGRATE_PFN_LOCKED)) {
> > -			/*
> > -			 * Because we are migrating several pages there can be
> > -			 * a deadlock between 2 concurrent migration where each
> > -			 * are waiting on each other page lock.
> > -			 *
> > -			 * Make migrate_vma() a best effort thing and backoff
> > -			 * for any page we can not lock right away.
> > -			 */
> > -			if (!trylock_page(page)) {
> > -				migrate->src[i] = 0;
> > -				migrate->cpages--;
> > -				put_page(page);
> > -				continue;
> > -			}
> > -			remap = false;
> > -			migrate->src[i] |= MIGRATE_PFN_LOCKED;
> > -		}
> > -
> >   		/* ZONE_DEVICE pages are not on LRU */
> >   		if (!is_zone_device_page(page)) {
> >   			if (!PageLRU(page) && allow_drain) {
> > @@ -2569,16 +2553,9 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >   			}
> >   
> >   			if (isolate_lru_page(page)) {
> > -				if (remap) {
> > -					migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -					migrate->cpages--;
> > -					restore++;
> > -				} else {
> > -					migrate->src[i] = 0;
> > -					unlock_page(page);
> > -					migrate->cpages--;
> > -					put_page(page);
> > -				}
> > +				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > +				migrate->cpages--;
> > +				restore++;
> >   				continue;
> >   			}
> >   
> > @@ -2586,80 +2563,20 @@ static void migrate_vma_prepare(struct migrate_vma *migrate)
> >   			put_page(page);
> >   		}
> >   
> > -		if (!migrate_vma_check_page(page)) {
> > -			if (remap) {
> > -				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -				migrate->cpages--;
> > -				restore++;
> > -
> > -				if (!is_zone_device_page(page)) {
> > -					get_page(page);
> > -					putback_lru_page(page);
> > -				}
> > -			} else {
> > -				migrate->src[i] = 0;
> > -				unlock_page(page);
> > -				migrate->cpages--;
> > +		if (page_mapped(page))
> > +			try_to_migrate(page, 0);
> >   
> > -				if (!is_zone_device_page(page))
> > -					putback_lru_page(page);
> > -				else
> > -					put_page(page);
> > +		if (page_mapped(page) || !migrate_vma_check_page(page)) {
> > +			if (!is_zone_device_page(page)) {
> > +				get_page(page);
> > +				putback_lru_page(page);
> >   			}
> > -		}
> > -	}
> > -
> > -	for (i = 0, addr = start; i < npages && restore; i++, addr += PAGE_SIZE) {
> > -		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -
> > -		if (!page || (migrate->src[i] & MIGRATE_PFN_MIGRATE))
> > -			continue;
> >   
> > -		remove_migration_pte(page, migrate->vma, addr, page);
> > -
> > -		migrate->src[i] = 0;
> > -		unlock_page(page);
> > -		put_page(page);
> > -		restore--;
> > -	}
> > -}
> > -
> > -/*
> > - * migrate_vma_unmap() - replace page mapping with special migration pte entry
> > - * @migrate: migrate struct containing all migration information
> > - *
> > - * Replace page mapping (CPU page table pte) with a special migration pte entry
> > - * and check again if it has been pinned. Pinned pages are restored because we
> > - * cannot migrate them.
> > - *
> > - * This is the last step before we call the device driver callback to allocate
> > - * destination memory and copy contents of original page over to new page.
> > - */
> > -static void migrate_vma_unmap(struct migrate_vma *migrate)
> > -{
> > -	const unsigned long npages = migrate->npages;
> > -	const unsigned long start = migrate->start;
> > -	unsigned long addr, i, restore = 0;
> > -
> > -	for (i = 0; i < npages; i++) {
> > -		struct page *page = migrate_pfn_to_page(migrate->src[i]);
> > -
> > -		if (!page || !(migrate->src[i] & MIGRATE_PFN_MIGRATE))
> > +			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > +			migrate->cpages--;
> > +			restore++;
> >   			continue;
> > -
> > -		if (page_mapped(page)) {
> > -			try_to_migrate(page, 0);
> > -			if (page_mapped(page))
> > -				goto restore;
> >   		}
> > -
> > -		if (migrate_vma_check_page(page))
> > -			continue;
> > -
> > -restore:
> > -		migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
> > -		migrate->cpages--;
> > -		restore++;
> >   	}
> >   
> >   	for (addr = start, i = 0; i < npages && restore; addr += PAGE_SIZE, i++) {
> > @@ -2672,12 +2589,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
> >   
> >   		migrate->src[i] = 0;
> >   		unlock_page(page);
> > +		put_page(page);
> >   		restore--;
> > -
> > -		if (is_zone_device_page(page))
> > -			put_page(page);
> > -		else
> > -			putback_lru_page(page);
> >   	}
> >   }
> >   
> > @@ -2700,8 +2613,8 @@ static void migrate_vma_unmap(struct migrate_vma *migrate)
> >    * it for all those entries (ie with MIGRATE_PFN_VALID and MIGRATE_PFN_MIGRATE
> >    * flag set).  Once these are allocated and copied, the caller must update each
> >    * corresponding entry in the dst array with the pfn value of the destination
> > - * page and with the MIGRATE_PFN_VALID and MIGRATE_PFN_LOCKED flags set
> > - * (destination pages must have their struct pages locked, via lock_page()).
> > + * page and with MIGRATE_PFN_VALID. Destination pages must be locked via
> > + * lock_page().
> >    *
> >    * Note that the caller does not have to migrate all the pages that are marked
> >    * with MIGRATE_PFN_MIGRATE flag in src array unless this is a migration from
> > @@ -2770,8 +2683,6 @@ int migrate_vma_setup(struct migrate_vma *args)
> >   
> >   	migrate_vma_collect(args);
> >   
> > -	if (args->cpages)
> > -		migrate_vma_prepare(args);
> >   	if (args->cpages)
> >   		migrate_vma_unmap(args);
> >   
> 





^ permalink raw reply

* [PATCH v2 40/45] mfd: twl4030: Use devm_register_trivial_power_off_handler()
From: Dmitry Osipenko @ 2021-10-27 21:17 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Lee Jones, Rafael J . Wysocki,
	Mark Brown, Andrew Morton, Guenter Roeck, Russell King,
	Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Helge Deller, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Juergen Gross, Thomas Bogendoerfer, linux-parisc, Nick Hu,
	Avi Fishman, Patrick Venture, Liam Girdwood, linux-kernel,
	Palmer Dabbelt, Philipp Zabel, Guo Ren, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-1-digetx@gmail.com>

Use devm_register_trivial_power_off_handler() that replaces global
pm_power_off variable and allows to register multiple power-off handlers.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/mfd/twl4030-power.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/mfd/twl4030-power.c b/drivers/mfd/twl4030-power.c
index 6b36932263ba..72df4735d628 100644
--- a/drivers/mfd/twl4030-power.c
+++ b/drivers/mfd/twl4030-power.c
@@ -29,6 +29,7 @@
 #include <linux/platform_device.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/reboot.h>
 
 #include <asm/mach-types.h>
 
@@ -923,7 +924,7 @@ static int twl4030_power_probe(struct platform_device *pdev)
 	}
 
 	/* Board has to be wired properly to use this feature */
-	if (twl4030_power_use_poweroff(pdata, node) && !pm_power_off) {
+	if (twl4030_power_use_poweroff(pdata, node)) {
 		/* Default for SEQ_OFFSYNC is set, lets ensure this */
 		err = twl_i2c_read_u8(TWL_MODULE_PM_MASTER, &val,
 				      TWL4030_PM_MASTER_CFG_P123_TRANSITION);
@@ -939,7 +940,12 @@ static int twl4030_power_probe(struct platform_device *pdev)
 			}
 		}
 
-		pm_power_off = twl4030_power_off;
+		err = devm_register_trivial_power_off_handler(&pdev->dev,
+							      twl4030_power_off);
+		if (err) {
+			dev_err(&pdev->dev, "Failed to register power-off handler\n");
+			goto relock;
+		}
 	}
 
 relock:
-- 
2.33.1


^ permalink raw reply related

* [PATCH v2 39/45] mfd: dm355evm_msp: Use devm_register_trivial_power_off_handler()
From: Dmitry Osipenko @ 2021-10-27 21:17 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Lee Jones, Rafael J . Wysocki,
	Mark Brown, Andrew Morton, Guenter Roeck, Russell King,
	Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Helge Deller, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Juergen Gross, Thomas Bogendoerfer, linux-parisc, Nick Hu,
	Avi Fishman, Patrick Venture, Liam Girdwood, linux-kernel,
	Palmer Dabbelt, Philipp Zabel, Guo Ren, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-1-digetx@gmail.com>

Use devm_register_trivial_power_off_handler() that replaces global
pm_power_off variable and allows to register multiple power-off handlers.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/mfd/dm355evm_msp.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/mfd/dm355evm_msp.c b/drivers/mfd/dm355evm_msp.c
index 54fb6cbd2aa0..5ee830f65589 100644
--- a/drivers/mfd/dm355evm_msp.c
+++ b/drivers/mfd/dm355evm_msp.c
@@ -8,6 +8,7 @@
 #include <linux/init.h>
 #include <linux/mutex.h>
 #include <linux/platform_device.h>
+#include <linux/reboot.h>
 #include <linux/clk.h>
 #include <linux/module.h>
 #include <linux/err.h>
@@ -375,11 +376,10 @@ static void dm355evm_power_off(void)
 	dm355evm_command(MSP_COMMAND_POWEROFF);
 }
 
-static int dm355evm_msp_remove(struct i2c_client *client)
+static void dm355evm_msp_remove(void *data)
 {
-	pm_power_off = NULL;
+	/* FIXME remove children ... */
 	msp430 = NULL;
-	return 0;
 }
 
 static int
@@ -392,6 +392,11 @@ dm355evm_msp_probe(struct i2c_client *client, const struct i2c_device_id *id)
 		return -EBUSY;
 	msp430 = client;
 
+	status = devm_add_action_or_reset(&client->dev, dm355evm_msp_remove,
+					  NULL);
+	if (status < 0)
+		goto fail;
+
 	/* display revision status; doubles as sanity check */
 	status = dm355evm_msp_read(DM355EVM_MSP_FIRMREV);
 	if (status < 0)
@@ -416,13 +421,15 @@ dm355evm_msp_probe(struct i2c_client *client, const struct i2c_device_id *id)
 		goto fail;
 
 	/* PM hookup */
-	pm_power_off = dm355evm_power_off;
+	status = devm_register_trivial_power_off_handler(&client->dev,
+							 dm355evm_power_off);
+	if (status)
+		dev_err(&client->dev, "failed to register power-off handler: %d",
+			status);
 
 	return 0;
 
 fail:
-	/* FIXME remove children ... */
-	dm355evm_msp_remove(client);
 	return status;
 }
 
@@ -436,7 +443,6 @@ static struct i2c_driver dm355evm_msp_driver = {
 	.driver.name	= "dm355evm_msp",
 	.id_table	= dm355evm_msp_ids,
 	.probe		= dm355evm_msp_probe,
-	.remove		= dm355evm_msp_remove,
 };
 
 static int __init dm355evm_msp_init(void)
-- 
2.33.1


^ permalink raw reply related

* [PATCH v2 38/45] mfd: max77620: Use devm_register_simple_power_off_handler()
From: Dmitry Osipenko @ 2021-10-27 21:17 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Lee Jones, Rafael J . Wysocki,
	Mark Brown, Andrew Morton, Guenter Roeck, Russell King,
	Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Helge Deller, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Juergen Gross, Thomas Bogendoerfer, linux-parisc, Nick Hu,
	Avi Fishman, Patrick Venture, Liam Girdwood, linux-kernel,
	Palmer Dabbelt, Philipp Zabel, Guo Ren, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-1-digetx@gmail.com>

Use devm_register_simple_power_off_handler() that replaces global
pm_power_off variable and allows to register multiple power-off handlers.

Nexus 7 Android tablet can be powered off using MAX77663 PMIC and using
a special bootloader command. At first the bootloader option should be
tried, it will have a higher priority than of PMIC that uses default
priority.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/mfd/max77620.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/mfd/max77620.c b/drivers/mfd/max77620.c
index fec2096474ad..29487ccc191a 100644
--- a/drivers/mfd/max77620.c
+++ b/drivers/mfd/max77620.c
@@ -31,11 +31,10 @@
 #include <linux/init.h>
 #include <linux/of.h>
 #include <linux/of_device.h>
+#include <linux/reboot.h>
 #include <linux/regmap.h>
 #include <linux/slab.h>
 
-static struct max77620_chip *max77620_scratch;
-
 static const struct resource gpio_resources[] = {
 	DEFINE_RES_IRQ(MAX77620_IRQ_TOP_GPIO),
 };
@@ -483,13 +482,13 @@ static int max77620_read_es_version(struct max77620_chip *chip)
 	return ret;
 }
 
-static void max77620_pm_power_off(void)
+static void max77620_pm_power_off(void *data)
 {
-	struct max77620_chip *chip = max77620_scratch;
+	struct max77620_chip *chip = data;
 
 	regmap_update_bits(chip->rmap, MAX77620_REG_ONOFFCNFG1,
-			   MAX77620_ONOFFCNFG1_SFT_RST,
-			   MAX77620_ONOFFCNFG1_SFT_RST);
+				   MAX77620_ONOFFCNFG1_SFT_RST,
+				   MAX77620_ONOFFCNFG1_SFT_RST);
 }
 
 static int max77620_probe(struct i2c_client *client,
@@ -566,9 +565,13 @@ static int max77620_probe(struct i2c_client *client,
 	}
 
 	pm_off = of_device_is_system_power_controller(client->dev.of_node);
-	if (pm_off && !pm_power_off) {
-		max77620_scratch = chip;
-		pm_power_off = max77620_pm_power_off;
+	if (pm_off) {
+		ret = devm_register_simple_power_off_handler(chip->dev,
+							     max77620_pm_power_off,
+							     chip);
+		if (ret < 0)
+			dev_err(chip->dev,
+				"Failed to register power-off handler: %d\n", ret);
 	}
 
 	return 0;
-- 
2.33.1


^ permalink raw reply related

* [PATCH v2 37/45] mfd: tps65910: Use devm_register_simple_power_off_handler()
From: Dmitry Osipenko @ 2021-10-27 21:17 UTC (permalink / raw)
  To: Thierry Reding, Jonathan Hunter, Lee Jones, Rafael J . Wysocki,
	Mark Brown, Andrew Morton, Guenter Roeck, Russell King,
	Daniel Lezcano, Andy Shevchenko, Ulf Hansson
  Cc: Rich Felker, linux-ia64, Tomer Maimon, Santosh Shilimkar,
	linux-sh, Boris Ostrovsky, Linus Walleij, Dave Hansen, linux-acpi,
	Tali Perry, James E.J. Bottomley, Paul Mackerras, Pavel Machek,
	H. Peter Anvin, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, Stefano Stabellini, Benjamin Fair, Yoshinori Sato,
	Krzysztof Kozlowski, Helge Deller, x86, linux-csky, Tony Lindgren,
	Chen-Yu Tsai, Ingo Molnar, Geert Uytterhoeven, Catalin Marinas,
	xen-devel, linux-mips, Len Brown, Albert Ou, linux-pm,
	Jonathan Neuschäfer, Vladimir Zapolskiy, linux-m68k,
	Borislav Petkov, Greentime Hu, Paul Walmsley, linux-tegra,
	Thomas Gleixner, linux-omap, Nancy Yuen, linux-arm-kernel,
	Juergen Gross, Thomas Bogendoerfer, linux-parisc, Nick Hu,
	Avi Fishman, Patrick Venture, Liam Girdwood, linux-kernel,
	Palmer Dabbelt, Philipp Zabel, Guo Ren, linuxppc-dev, openbmc,
	Joshua Thompson
In-Reply-To: <20211027211715.12671-1-digetx@gmail.com>

Use devm_register_simple_power_off_handler() that replaces global
pm_power_off variable and allows to register multiple power-off handlers.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/mfd/tps65910.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/mfd/tps65910.c b/drivers/mfd/tps65910.c
index 6e105cca27d4..8fab30dc84e5 100644
--- a/drivers/mfd/tps65910.c
+++ b/drivers/mfd/tps65910.c
@@ -16,6 +16,7 @@
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
 #include <linux/mfd/core.h>
+#include <linux/reboot.h>
 #include <linux/regmap.h>
 #include <linux/mfd/tps65910.h>
 #include <linux/of.h>
@@ -429,9 +430,9 @@ struct tps65910_board *tps65910_parse_dt(struct i2c_client *client,
 }
 #endif
 
-static struct i2c_client *tps65910_i2c_client;
-static void tps65910_power_off(void)
+static void tps65910_power_off(void *data)
 {
+	struct i2c_client *tps65910_i2c_client = data;
 	struct tps65910 *tps65910;
 
 	tps65910 = dev_get_drvdata(&tps65910_i2c_client->dev);
@@ -503,9 +504,15 @@ static int tps65910_i2c_probe(struct i2c_client *i2c,
 	tps65910_ck32k_init(tps65910, pmic_plat_data);
 	tps65910_sleepinit(tps65910, pmic_plat_data);
 
-	if (pmic_plat_data->pm_off && !pm_power_off) {
-		tps65910_i2c_client = i2c;
-		pm_power_off = tps65910_power_off;
+	if (pmic_plat_data->pm_off) {
+		ret = devm_register_simple_power_off_handler(&i2c->dev,
+							     tps65910_power_off,
+							     i2c);
+		if (ret) {
+			dev_err(&i2c->dev,
+				"failed to register power-off handler: %d\n", ret);
+			return ret;
+		}
 	}
 
 	ret = devm_mfd_add_devices(tps65910->dev, -1,
-- 
2.33.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox