LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: fsl_espi errors on v5.7.15
From: Chris Packham @ 2020-08-19 21:08 UTC (permalink / raw)
  To: Heiner Kallweit, broonie@kernel.org, mpe@ellerman.id.au,
	benh@kernel.crashing.org, paulus@samba.org,
	tiago.brusamarello@datacom.ind.br
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-spi@vger.kernel.org
In-Reply-To: <61bb9800-9f90-9cd4-3b17-c14a7f83d792@gmail.com>


On 19/08/20 6:15 pm, Heiner Kallweit wrote:
> On 19.08.2020 00:44, Chris Packham wrote:
>> Hi Again,
>>
>> On 17/08/20 9:09 am, Chris Packham wrote:
>>
>>> On 14/08/20 6:19 pm, Heiner Kallweit wrote:
>>>> On 14.08.2020 04:48, Chris Packham wrote:
>>>>> Hi,
>>>>>
>>>>> I'm seeing a problem with accessing spi-nor after upgrading a T2081
>>>>> based system to linux v5.7.15
>>>>>
>>>>> For this board u-boot and the u-boot environment live on spi-nor.
>>>>>
>>>>> When I use fw_setenv from userspace I get the following kernel logs
>>>>>
>>>>> # fw_setenv foo=1
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>>>> ...
>>>>>
>>>> This error reporting doesn't exist yet in 4.4. So you may have an issue
>>>> under 4.4 too, it's just not reported.
>>>> Did you verify that under 4.4 fw_setenv actually has an effect?
>>> Just double checked and yes under 4.4 the setting does get saved.
>>>>> If I run fw_printenv (before getting it into a bad state) it is able to
>>>>> display the content of the boards u-boot environment.
>>>>>
>>>> This might indicate an issue with spi being locked. I've seen related
>>>> questions, just use the search engine of your choice and check for
>>>> fw_setenv and locked.
>>> I'm running a version of fw_setenv which includes
>>> https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it shouldn't
>>> be locking things unnecessarily.
>>>>> If been unsuccessful in producing a setup for bisecting the issue. I do
>>>>> know the issue doesn't occur on the old 4.4.x based kernel but that's
>>>>> probably not much help.
>>>>>
>>>>> Any pointers on what the issue (and/or solution) might be.
>> I finally managed to get our board running with a vanilla kernel. With
>> corenet64_smp_defconfig I occasionally see
>>
>>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>
>> other than the message things seem to be working.
>>
>> With a custom defconfig I see
>>
>>     fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>>     fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>>     fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>     ...
>>
>> and access to the spi-nor does not work until the board is reset.
>>
>> I'll try and pick apart the differences between the two defconfigs.

I now think my earlier testing is invalid. I have seen the problem with 
either defconfig if I try hard enough. I had convinced myself that the 
problem was CONFIG_PREEMPT but that was before I found boot-to-boot 
differences with the same kernel.

It's possible that I'm chasing multiple issues with the same symptom.

The error I'm most concerned with is in the sequence
1. boot with old image
2. write environment
3. boot with new image
4. write environment
5. write fails and environment is corrupted

After I recover the system things sometimes seem fine. Until I repeat 
the sequence above.

> Also relevant may be:
> - Which dts are you using?
Custom but based heavily on the t2080rdb.
> - What's the spi-nor type, and at which frequency are you operating it?
The board has several alternate parts for the spi-nor so the dts just 
specifies compatible = "jedec,spi-nor" the actual chip detected on the 
board I have is "n25q032a (4096 Kbytes)". The dts sets spi-max-frequency 
= <10000000> I haven't measured the actual frequency on the bus.
> - Does the issue still happen if you lower the frequency?
I did play around with the frequency initially but I should probably 
give that another go now that I have a better reproduction method.

^ permalink raw reply

* [PATCH net-next v2 0/4] refactoring of ibmvnic code
From: Lijun Pan @ 2020-08-19 22:52 UTC (permalink / raw)
  To: netdev; +Cc: Lijun Pan, linuxppc-dev

This patch series refactor reset_init and init functions,
and make some other cosmetic changes to make the code
easier to read and debug. v2 removes __func__ and v1's 1/5.

Lijun Pan (4):
  ibmvnic: compare adapter->init_done_rc with more readable
    ibmvnic_rc_codes
  ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
  ibmvnic: remove never executed if statement
  ibmvnic: merge ibmvnic_reset_init and ibmvnic_init

 drivers/net/ethernet/ibm/ibmvnic.c | 84 ++++++++----------------------
 1 file changed, 21 insertions(+), 63 deletions(-)

-- 
2.23.0


^ permalink raw reply

* [PATCH net-next v2 1/4] ibmvnic: compare adapter->init_done_rc with more readable ibmvnic_rc_codes
From: Lijun Pan @ 2020-08-19 22:52 UTC (permalink / raw)
  To: netdev; +Cc: Lijun Pan, linuxppc-dev
In-Reply-To: <20200819225226.10152-1-ljp@linux.ibm.com>

Instead of comparing (adapter->init_done_rc == 1), let it
be (adapter->init_done_rc == PARTIALSUCCESS).

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5afb3c9c52d2..65f5d99f97dc 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -974,7 +974,7 @@ static int set_link_state(struct ibmvnic_adapter *adapter, u8 link_state)
 			return -1;
 		}
 
-		if (adapter->init_done_rc == 1) {
+		if (adapter->init_done_rc == PARTIALSUCCESS) {
 			/* Partuial success, delay and re-send */
 			mdelay(1000);
 			resend = true;
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 2/4] ibmvnic: improve ibmvnic_init and ibmvnic_reset_init
From: Lijun Pan @ 2020-08-19 22:52 UTC (permalink / raw)
  To: netdev; +Cc: Lijun Pan, linuxppc-dev
In-Reply-To: <20200819225226.10152-1-ljp@linux.ibm.com>

When H_SEND_CRQ command returns with H_CLOSED, it means the
server's CRQ is not ready yet. Instead of resetting immediately,
we wait for the server to launch passive init.
ibmvnic_init() and ibmvnic_reset_init() should also return the
error code from ibmvnic_send_crq_init() call.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
v2: removes __func__ in error messages.

 drivers/net/ethernet/ibm/ibmvnic.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 65f5d99f97dc..644352e5056d 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3568,8 +3568,7 @@ static int ibmvnic_send_crq(struct ibmvnic_adapter *adapter,
 	if (rc) {
 		if (rc == H_CLOSED) {
 			dev_warn(dev, "CRQ Queue closed\n");
-			if (test_bit(0, &adapter->resetting))
-				ibmvnic_reset(adapter, VNIC_RESET_FATAL);
+			/* do not reset, report the fail, wait for passive init from server */
 		}
 
 		dev_warn(dev, "Send error (rc=%d)\n", rc);
@@ -4985,7 +4984,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 
 	reinit_completion(&adapter->init_done);
 	adapter->init_done_rc = 0;
-	ibmvnic_send_crq_init(adapter);
+	rc = ibmvnic_send_crq_init(adapter);
+	if (rc) {
+		dev_err(dev, "Send crq init failed with error %d\n", rc);
+		return rc;
+	}
+
 	if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
 		dev_err(dev, "Initialization sequence timed out\n");
 		return -1;
@@ -5039,7 +5043,12 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
 	adapter->from_passive_init = false;
 
 	adapter->init_done_rc = 0;
-	ibmvnic_send_crq_init(adapter);
+	rc = ibmvnic_send_crq_init(adapter);
+	if (rc) {
+		dev_err(dev, "Send crq init failed with error %d\n", rc);
+		return rc;
+	}
+
 	if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
 		dev_err(dev, "Initialization sequence timed out\n");
 		return -1;
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 3/4] ibmvnic: remove never executed if statement
From: Lijun Pan @ 2020-08-19 22:52 UTC (permalink / raw)
  To: netdev; +Cc: Lijun Pan, linuxppc-dev
In-Reply-To: <20200819225226.10152-1-ljp@linux.ibm.com>

At the beginning of the function, from_passive_init is set false by
"adapter->from_passive_init = false;",
hence the if statement will never run.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 644352e5056d..4ca4647db72a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5000,12 +5000,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 		return adapter->init_done_rc;
 	}
 
-	if (adapter->from_passive_init) {
-		adapter->state = VNIC_OPEN;
-		adapter->from_passive_init = false;
-		return -1;
-	}
-
 	if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
 	    adapter->reset_reason != VNIC_RESET_MOBILITY) {
 		if (adapter->req_rx_queues != old_num_rx_queues ||
@@ -5059,12 +5053,6 @@ static int ibmvnic_init(struct ibmvnic_adapter *adapter)
 		return adapter->init_done_rc;
 	}
 
-	if (adapter->from_passive_init) {
-		adapter->state = VNIC_OPEN;
-		adapter->from_passive_init = false;
-		return -1;
-	}
-
 	rc = init_sub_crqs(adapter);
 	if (rc) {
 		dev_err(dev, "Initialization of sub crqs failed\n");
-- 
2.23.0


^ permalink raw reply related

* [PATCH net-next v2 4/4] ibmvnic: merge ibmvnic_reset_init and ibmvnic_init
From: Lijun Pan @ 2020-08-19 22:52 UTC (permalink / raw)
  To: netdev; +Cc: Lijun Pan, linuxppc-dev
In-Reply-To: <20200819225226.10152-1-ljp@linux.ibm.com>

These two functions share the majority of the code, hence merge
them together. In the meanwhile, add a reset pass-in parameter
to differentiate them. Thus, the code is easier to read and to tell
the difference between reset_init and regular init.

Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 65 ++++++------------------------
 1 file changed, 13 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 4ca4647db72a..47fbe0553570 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -104,8 +104,7 @@ static int send_login(struct ibmvnic_adapter *adapter);
 static void send_cap_queries(struct ibmvnic_adapter *adapter);
 static int init_sub_crqs(struct ibmvnic_adapter *);
 static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter);
-static int ibmvnic_init(struct ibmvnic_adapter *);
-static int ibmvnic_reset_init(struct ibmvnic_adapter *);
+static int ibmvnic_reset_init(struct ibmvnic_adapter *, bool reset);
 static void release_crq_queue(struct ibmvnic_adapter *);
 static int __ibmvnic_set_mac(struct net_device *, u8 *);
 static int init_crq_queue(struct ibmvnic_adapter *adapter);
@@ -1868,7 +1867,7 @@ static int do_change_param_reset(struct ibmvnic_adapter *adapter,
 		return rc;
 	}
 
-	rc = ibmvnic_reset_init(adapter);
+	rc = ibmvnic_reset_init(adapter, true);
 	if (rc)
 		return IBMVNIC_INIT_FAILED;
 
@@ -1986,7 +1985,7 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			goto out;
 		}
 
-		rc = ibmvnic_reset_init(adapter);
+		rc = ibmvnic_reset_init(adapter, true);
 		if (rc) {
 			rc = IBMVNIC_INIT_FAILED;
 			goto out;
@@ -2093,7 +2092,7 @@ static int do_hard_reset(struct ibmvnic_adapter *adapter,
 		return rc;
 	}
 
-	rc = ibmvnic_init(adapter);
+	rc = ibmvnic_reset_init(adapter, false);
 	if (rc)
 		return rc;
 
@@ -4970,7 +4969,7 @@ static int init_crq_queue(struct ibmvnic_adapter *adapter)
 	return retrc;
 }
 
-static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
+static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter, bool reset)
 {
 	struct device *dev = &adapter->vdev->dev;
 	unsigned long timeout = msecs_to_jiffies(30000);
@@ -4979,10 +4978,12 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 
 	adapter->from_passive_init = false;
 
-	old_num_rx_queues = adapter->req_rx_queues;
-	old_num_tx_queues = adapter->req_tx_queues;
+	if (reset) {
+		old_num_rx_queues = adapter->req_rx_queues;
+		old_num_tx_queues = adapter->req_tx_queues;
+		reinit_completion(&adapter->init_done);
+	}
 
-	reinit_completion(&adapter->init_done);
 	adapter->init_done_rc = 0;
 	rc = ibmvnic_send_crq_init(adapter);
 	if (rc) {
@@ -5000,7 +5001,8 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 		return adapter->init_done_rc;
 	}
 
-	if (test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
+	if (reset &&
+	    test_bit(0, &adapter->resetting) && !adapter->wait_for_reset &&
 	    adapter->reset_reason != VNIC_RESET_MOBILITY) {
 		if (adapter->req_rx_queues != old_num_rx_queues ||
 		    adapter->req_tx_queues != old_num_tx_queues) {
@@ -5028,47 +5030,6 @@ static int ibmvnic_reset_init(struct ibmvnic_adapter *adapter)
 	return rc;
 }
 
-static int ibmvnic_init(struct ibmvnic_adapter *adapter)
-{
-	struct device *dev = &adapter->vdev->dev;
-	unsigned long timeout = msecs_to_jiffies(30000);
-	int rc;
-
-	adapter->from_passive_init = false;
-
-	adapter->init_done_rc = 0;
-	rc = ibmvnic_send_crq_init(adapter);
-	if (rc) {
-		dev_err(dev, "Send crq init failed with error %d\n", rc);
-		return rc;
-	}
-
-	if (!wait_for_completion_timeout(&adapter->init_done, timeout)) {
-		dev_err(dev, "Initialization sequence timed out\n");
-		return -1;
-	}
-
-	if (adapter->init_done_rc) {
-		release_crq_queue(adapter);
-		return adapter->init_done_rc;
-	}
-
-	rc = init_sub_crqs(adapter);
-	if (rc) {
-		dev_err(dev, "Initialization of sub crqs failed\n");
-		release_crq_queue(adapter);
-		return rc;
-	}
-
-	rc = init_sub_crq_irqs(adapter);
-	if (rc) {
-		dev_err(dev, "Failed to initialize sub crq irqs\n");
-		release_crq_queue(adapter);
-	}
-
-	return rc;
-}
-
 static struct device_attribute dev_attr_failover;
 
 static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
@@ -5131,7 +5092,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id)
 			goto ibmvnic_init_fail;
 		}
 
-		rc = ibmvnic_init(adapter);
+		rc = ibmvnic_reset_init(adapter, false);
 		if (rc && rc != EAGAIN)
 			goto ibmvnic_init_fail;
 	} while (rc == EAGAIN);
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v2 3/4] powerpc/memhotplug: Make lmb size 64bit
From: Sasha Levin @ 2020-08-19 23:56 UTC (permalink / raw)
  To: Sasha Levin, Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: Nathan Lynch, stable
In-Reply-To: <20200806162329.276534-3-aneesh.kumar@linux.ibm.com>

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!
v4.19.139: Failed to apply! Possible dependencies:
    Unable to calculate

v4.14.193: Failed to apply! Possible dependencies:
    Unable to calculate

v4.9.232: Failed to apply! Possible dependencies:
    1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
    25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
    333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
    753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
    943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
    c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
    e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
    f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")

v4.4.232: Failed to apply! Possible dependencies:
    183deeea5871 ("powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.c")
    1a367063ca0c ("powerpc/pseries: Check memory device state before onlining/offlining")
    1dc759566636 ("powerpc/pseries: Use kernel hotplug queue for PowerVM hotplug events")
    1f859adb9253 ("powerpc/pseries: Verify CPU doesn't exist before adding")
    25b587fba9a4 ("powerpc/pseries: Correct possible read beyond dlpar sysfs buffer")
    333f7b76865b ("powerpc/pseries: Implement indexed-count hotplug memory add")
    4a4bdfea7cb7 ("powerpc/pseries: Refactor dlpar_add_lmb() code")
    753843471cbb ("powerpc/pseries: Implement indexed-count hotplug memory remove")
    9054619ef54a ("powerpc/pseries: Add pseries hotplug workqueue")
    943db62c316c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
    9dc512819e4b ("powerpc: Fix unused function warning 'lmb_to_memblock'")
    bdf5fc633804 ("powerpc/pseries: Update LMB associativity index during DLPAR add/remove")
    c21f515c7436 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
    e70d59700fc3 ("powerpc/pseries: Introduce memory hotplug READD operation")
    e9d764f80396 ("powerpc/pseries: Enable kernel CPU dlpar from sysfs")
    ec999072442a ("powerpc/pseries: Auto-online hotplugged memory")
    f84775c2d5d9 ("powerpc/pseries: Fix build break when MEMORY_HOTREMOVE=n")
    fdb4f6e99ffa ("powerpc/pseries: Remove call to memblock_add()")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply

* Re: [PATCH v2 1/4] powerpc/drmem: Make lmb_size 64 bit
From: Sasha Levin @ 2020-08-19 23:56 UTC (permalink / raw)
  To: Sasha Levin, Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: Nathan Lynch, stable
In-Reply-To: <20200806162329.276534-1-aneesh.kumar@linux.ibm.com>

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.1, v5.7.15, v5.4.58, v4.19.139, v4.14.193, v4.9.232, v4.4.232.

v5.8.1: Build OK!
v5.7.15: Build OK!
v5.4.58: Build OK!
v4.19.139: Build OK!
v4.14.193: Failed to apply! Possible dependencies:
    22508f3dc985 ("powerpc/numa: Look up device node in of_get_usable_memory()")
    2c77721552e5 ("powerpc: Move of_drconf_cell struct to asm/drmem.h")
    35f80debaef0 ("powerpc/numa: Look up device node in of_get_assoc_arrays()")
    514a9cb3316a ("powerpc/numa: Update numa code use walk_drmem_lmbs")
    6195a5001f1d ("powerpc/pseries: Update memory hotplug code to use drmem LMB array")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    b6eca183e23e ("powerpc/kernel: Enables memory hot-remove after reboot on pseries guests")
    b88fc309d6ad ("powerpc/numa: Look up associativity array in of_drconf_to_nid_single")

v4.9.232: Failed to apply! Possible dependencies:
    3a2df3798d4d ("powerpc/mm: Make switch_mm_irqs_off() out of line")
    43ed84a891b7 ("powerpc/mm: Move pgdir setting into a helper")
    5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    5d451a87e5eb ("powerpc/64: Retrieve number of L1 cache sets from device-tree")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    70cd4c10b290 ("KVM: PPC: Book3S HV: Fix software walk of guest process page tables")
    9b081e10805c ("powerpc: port 64 bits pgtable_cache to 32 bits")
    a25bd72badfa ("powerpc/mm/radix: Workaround prefetch issue with KVM")
    bd067f83b084 ("powerpc/64: Fix naming of cache block vs. cache line")
    dbcbfee0c81c ("powerpc/64: More definitions for POWER9")
    e2827fe5c156 ("powerpc/64: Clean up ppc64_caches using a struct per cache")
    f4329f2ecb14 ("powerpc/64s: Reduce exception alignment")

v4.4.232: Failed to apply! Possible dependencies:
    11a6f6abd74a ("powerpc/mm: Move radix/hash common data structures to book3s64 headers")
    26b6a3d9bb48 ("powerpc/mm: move pte headers to book3s directory")
    3808a88985b4 ("powerpc: Move FW feature probing out of pseries probe()")
    3dfcb315d81e ("powerpc/mm: make a separate copy for book3s")
    5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
    5d31a96e6c01 ("powerpc/livepatch: Add livepatch stack to struct thread_info")
    6574ba950bbe ("powerpc/kernel: Convert cpu_has_feature() to returning bool")
    6c6ea53725b3 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
    a141cca3892b ("powerpc/mm: Add early_[cpu|mmu]_has_feature()")
    a8ed87c92adf ("powerpc/mm/radix: Add MMU_FTR_RADIX")
    b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file")
    da6a97bf12d5 ("powerpc: Move epapr_paravirt_early_init() to early_init_devtree()")
    f63e6d898760 ("powerpc/livepatch: Add livepatch header")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s: remove PROT_SAO support
From: Nicholas Piggin @ 2020-08-20  1:05 UTC (permalink / raw)
  To: linuxppc-dev, Shawn Anastasio
In-Reply-To: <3c053cc9-751c-9899-79ba-1013af140148@anastas.io>

Excerpts from Shawn Anastasio's message of August 19, 2020 6:59 am:
> On 8/18/20 2:11 AM, Nicholas Piggin wrote> Very reasonable point.
>> 
>> The problem we're trying to get a handle on is live partition migration
>> where a running guest might be using SAO then get migrated to a P10. I
>> don't think we have a good way to handle this case. Potentially the
>> hypervisor could revoke the page tables if the guest is running in hash
>> mode and the guest kernel could be taught about that and sigbus the
>> process, but in radix the guest controls those page tables and the SAO
>> state and I don't think there's a way to cause it to take a fault.
>> 
>> I also don't know what the proprietary hypervisor does here.
>> 
>> We could add it back, default to n, or make it bare metal only, or
>> somehow try to block live migration to a later CPU without the faciliy.
>> I wouldn't be against that.
> 
> 
> Admittedly I'm not too familiar with the specifics of live migration
> or guest memory management, but restoring the functionality and adding
> a way to prevent migration of SAO-using guests seems like a reasonable
> choice to me. Would this be done with help from the guest using some
> sort of infrastructure to signal to the hypervisor that SAO is in use,
> or entirely on the hypervisor by e.g. scanning the through the process
> table for SAO pages?

The first step might be to just re-add the functionality but disable
it by default if firmware_has_feature(FW_FEATURE_LPAR). You could have
a config or boot option to allow guests to use it at the cost of
migration compatibility.

That would probably be good enough for experimenting with the feature.
I think modifying the hypervisor and/or guest to deal with migration
is probably too much work to be justified at the moment.

>> It would be very interesting to know how it performs in such a "real"
>> situation. I don't know how well POWER9 has optimised it -- it's
>> possible that it's not much better than putting lwsync after every load
>> or store.
> 
> 
> This is definitely worth investigating in depth. That said, even if the
> performance on P9 isn't super great, I think the feature could still be
> useful, since it would offer more granularity than the sledgehammer
> approach of emitting lwsync everywhere.

Sure, we'd be interested to hear of results.

> I'd be happy to put in some of the work required to get this to a point
> where it can be reintroduced without breaking guest migration - I'd just
> need some pointers on getting started with whatever approach is decided on.

I think re-adding it as I said above would be okay. The code itself is 
not complex so that was not the reason for removal.

Thanks,
Nick


^ permalink raw reply

* Re: [PATCH v3] powerpc/pseries/svm: Allocate SWIOTLB buffer anywhere in memory
From: Michael Ellerman @ 2020-08-20  2:12 UTC (permalink / raw)
  To: Christoph Hellwig, Thiago Jung Bauermann
  Cc: Konrad Rzeszutek Wilk, Robin Murphy, Ram Pai, linux-kernel, iommu,
	Satheesh Rajendran, linuxppc-dev, Christoph Hellwig,
	Marek Szyprowski
In-Reply-To: <20200819044351.GA19391@lst.de>

Christoph Hellwig <hch@lst.de> writes:
> On Tue, Aug 18, 2020 at 07:11:26PM -0300, Thiago Jung Bauermann wrote:
>> POWER secure guests (i.e., guests which use the Protection Execution
>> Facility) need to use SWIOTLB to be able to do I/O with the hypervisor, but
>> they don't need the SWIOTLB memory to be in low addresses since the
>> hypervisor doesn't have any addressing limitation.
>> 
>> This solves a SWIOTLB initialization problem we are seeing in secure guests
>> with 128 GB of RAM: they are configured with 4 GB of crashkernel reserved
>> memory, which leaves no space for SWIOTLB in low addresses.
>> 
>> To do this, we use mostly the same code as swiotlb_init(), but allocate the
>> buffer using memblock_alloc() instead of memblock_alloc_low().
>> 
>> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
>
> Looks fine to me (except for the pointlessly long comment lines, but I've
> been told that's the powerpc way).

They're 80 columns AFAICS?

cheers

^ permalink raw reply

* [RFC PATCH 1/2] KVM: PPC: Use the ppc_inst type
From: Jordan Niethe @ 2020-08-20  3:39 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: Jordan Niethe

The ppc_inst type was added to help cope with the addition of prefixed
instructions to the ISA. Convert KVM to use this new type for dealing
wiht instructions. For now do not try to add further support for
prefixed instructions.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
 arch/powerpc/include/asm/disassemble.h    | 80 +++++++++---------
 arch/powerpc/include/asm/kvm_book3s.h     |  4 +-
 arch/powerpc/include/asm/kvm_book3s_asm.h |  3 +-
 arch/powerpc/include/asm/kvm_host.h       |  5 +-
 arch/powerpc/include/asm/kvm_ppc.h        | 14 ++--
 arch/powerpc/kernel/asm-offsets.c         |  2 +
 arch/powerpc/kernel/kvm.c                 | 99 +++++++++++------------
 arch/powerpc/kvm/book3s.c                 |  6 +-
 arch/powerpc/kvm/book3s.h                 |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c       |  8 +-
 arch/powerpc/kvm/book3s_emulate.c         | 30 +++----
 arch/powerpc/kvm/book3s_hv.c              | 19 ++---
 arch/powerpc/kvm/book3s_hv_nested.c       |  4 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  5 ++
 arch/powerpc/kvm/book3s_hv_tm.c           | 17 ++--
 arch/powerpc/kvm/book3s_hv_tm_builtin.c   | 12 +--
 arch/powerpc/kvm/book3s_paired_singles.c  | 15 ++--
 arch/powerpc/kvm/book3s_pr.c              | 20 ++---
 arch/powerpc/kvm/booke.c                  | 18 ++---
 arch/powerpc/kvm/booke.h                  |  4 +-
 arch/powerpc/kvm/booke_emulate.c          |  4 +-
 arch/powerpc/kvm/e500_emulate.c           |  6 +-
 arch/powerpc/kvm/e500_mmu_host.c          |  6 +-
 arch/powerpc/kvm/emulate.c                | 15 ++--
 arch/powerpc/kvm/emulate_loadstore.c      |  8 +-
 arch/powerpc/kvm/powerpc.c                |  4 +-
 arch/powerpc/kvm/trace.h                  |  9 ++-
 arch/powerpc/kvm/trace_booke.h            |  8 +-
 arch/powerpc/kvm/trace_pr.h               |  8 +-
 arch/powerpc/lib/inst.c                   |  4 +-
 arch/powerpc/lib/sstep.c                  |  4 +-
 arch/powerpc/sysdev/fsl_pci.c             |  4 +-
 32 files changed, 237 insertions(+), 210 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 8d2ebc36d5e3..91dbe8e5cd13 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -10,75 +10,82 @@
 #define __ASM_PPC_DISASSEMBLE_H__
 
 #include <linux/types.h>
+#include <asm/inst.h>
 
-static inline unsigned int get_op(u32 inst)
+static inline unsigned int get_op(struct ppc_inst inst)
 {
-	return inst >> 26;
+	return ppc_inst_val(inst) >> 26;
 }
 
-static inline unsigned int get_xop(u32 inst)
+static inline unsigned int get_xop(struct ppc_inst inst)
 {
-	return (inst >> 1) & 0x3ff;
+	return (ppc_inst_val(inst) >> 1) & 0x3ff;
 }
 
-static inline unsigned int get_sprn(u32 inst)
+static inline unsigned int get_sprn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_dcrn(u32 inst)
+static inline unsigned int get_dcrn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_tmrn(u32 inst)
+static inline unsigned int get_tmrn(struct ppc_inst inst)
 {
-	return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+	u32 word = ppc_inst_val(inst);
+
+	return ((word >> 16) & 0x1f) | ((word >> 6) & 0x3e0);
 }
 
-static inline unsigned int get_rt(u32 inst)
+static inline unsigned int get_rt(struct ppc_inst inst)
 {
-	return (inst >> 21) & 0x1f;
+	return (ppc_inst_val(inst) >> 21) & 0x1f;
 }
 
-static inline unsigned int get_rs(u32 inst)
+static inline unsigned int get_rs(struct ppc_inst inst)
 {
-	return (inst >> 21) & 0x1f;
+	return (ppc_inst_val(inst) >> 21) & 0x1f;
 }
 
-static inline unsigned int get_ra(u32 inst)
+static inline unsigned int get_ra(struct ppc_inst inst)
 {
-	return (inst >> 16) & 0x1f;
+	return (ppc_inst_val(inst) >> 16) & 0x1f;
 }
 
-static inline unsigned int get_rb(u32 inst)
+static inline unsigned int get_rb(struct ppc_inst inst)
 {
-	return (inst >> 11) & 0x1f;
+	return (ppc_inst_val(inst) >> 11) & 0x1f;
 }
 
-static inline unsigned int get_rc(u32 inst)
+static inline unsigned int get_rc(struct ppc_inst inst)
 {
-	return inst & 0x1;
+	return ppc_inst_val(inst) & 0x1;
 }
 
-static inline unsigned int get_ws(u32 inst)
+static inline unsigned int get_ws(struct ppc_inst inst)
 {
-	return (inst >> 11) & 0x1f;
+	return (ppc_inst_val(inst) >> 11) & 0x1f;
 }
 
-static inline unsigned int get_d(u32 inst)
+static inline unsigned int get_d(struct ppc_inst inst)
 {
-	return inst & 0xffff;
+	return ppc_inst_val(inst) & 0xffff;
 }
 
-static inline unsigned int get_oc(u32 inst)
+static inline unsigned int get_oc(struct ppc_inst inst)
 {
-	return (inst >> 11) & 0x7fff;
+	return (ppc_inst_val(inst) >> 11) & 0x7fff;
 }
 
-static inline unsigned int get_tx_or_sx(u32 inst)
+static inline unsigned int get_tx_or_sx(struct ppc_inst inst)
 {
-	return (inst) & 0x1;
+	return (ppc_inst_val(inst)) & 0x1;
 }
 
 #define IS_XFORM(inst)	(get_op(inst)  == 31)
@@ -87,29 +94,30 @@ static inline unsigned int get_tx_or_sx(u32 inst)
 /*
  * Create a DSISR value from the instruction
  */
-static inline unsigned make_dsisr(unsigned instr)
+static inline unsigned make_dsisr(struct ppc_inst instr)
 {
 	unsigned dsisr;
+	u32 word = ppc_inst_val(instr);
 
 
 	/* bits  6:15 --> 22:31 */
-	dsisr = (instr & 0x03ff0000) >> 16;
+	dsisr = (word & 0x03ff0000) >> 16;
 
 	if (IS_XFORM(instr)) {
 		/* bits 29:30 --> 15:16 */
-		dsisr |= (instr & 0x00000006) << 14;
+		dsisr |= (word & 0x00000006) << 14;
 		/* bit     25 -->    17 */
-		dsisr |= (instr & 0x00000040) << 8;
+		dsisr |= (word & 0x00000040) << 8;
 		/* bits 21:24 --> 18:21 */
-		dsisr |= (instr & 0x00000780) << 3;
+		dsisr |= (word & 0x00000780) << 3;
 	} else {
 		/* bit      5 -->    17 */
-		dsisr |= (instr & 0x04000000) >> 12;
+		dsisr |= (word & 0x04000000) >> 12;
 		/* bits  1: 4 --> 18:21 */
-		dsisr |= (instr & 0x78000000) >> 17;
+		dsisr |= (word & 0x78000000) >> 17;
 		/* bits 30:31 --> 12:13 */
 		if (IS_DSFORM(instr))
-			dsisr |= (instr & 0x00000003) << 18;
+			dsisr |= (word & 0x00000003) << 18;
 	}
 
 	return dsisr;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd..688c8ff78c7a 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -268,8 +268,8 @@ extern void kvmhv_emulate_tm_rollback(struct kvm_vcpu *vcpu);
 
 extern void kvmppc_entry_trampoline(void);
 extern void kvmppc_hv_entry_trampoline(void);
-extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
-extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
+extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, struct ppc_inst inst);
+extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, struct ppc_inst inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
 extern void kvmppc_pr_init_default_hcalls(struct kvm *kvm);
 extern int kvmppc_hcall_impl_pr(unsigned long cmd);
diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
index 078f4648ea27..da1ff6f23041 100644
--- a/arch/powerpc/include/asm/kvm_book3s_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
@@ -134,6 +134,7 @@ struct kvmppc_host_state {
 #endif
 };
 
+#include <asm/inst.h>
 struct kvmppc_book3s_shadow_vcpu {
 	bool in_use;
 	ulong gpr[14];
@@ -146,7 +147,7 @@ struct kvmppc_book3s_shadow_vcpu {
 	ulong shadow_srr1;
 	ulong fault_dar;
 	u32 fault_dsisr;
-	u32 last_inst;
+	struct ppc_inst last_inst;
 
 #ifdef CONFIG_PPC_BOOK3S_32
 	u32     sr[16];			/* Guest SRs */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e020d269416d..42937a600e6f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/inst.h>
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
@@ -751,7 +752,7 @@ struct kvm_vcpu_arch {
 	u8 prodded;
 	u8 doorbell_request;
 	u8 irq_pending; /* Used by XIVE to signal pending guest irqs */
-	u32 last_inst;
+	struct ppc_inst last_inst;
 
 	struct rcuwait *waitp;
 	struct kvmppc_vcore *vcore;
@@ -810,7 +811,7 @@ struct kvm_vcpu_arch {
 	u64 busy_stolen;
 	u64 busy_preempt;
 
-	u32 emul_inst;
+	struct ppc_inst emul_inst;
 
 	u32 online;
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..88dce8d1dfc7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -29,6 +29,8 @@
 #include <asm/cpu_has_feature.h>
 #endif
 
+#include <asm/inst.h>
+
 /*
  * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
  * for supporting software breakpoint.
@@ -84,7 +86,7 @@ extern int kvmppc_handle_vsx_store(struct kvm_vcpu *vcpu,
 				int is_default_endian);
 
 extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
-				 enum instruction_fetch_type type, u32 *inst);
+				 enum instruction_fetch_type type, struct ppc_inst *inst);
 
 extern int kvmppc_ld(struct kvm_vcpu *vcpu, ulong *eaddr, int size, void *ptr,
 		     bool data);
@@ -291,7 +293,7 @@ struct kvmppc_ops {
 	void (*destroy_vm)(struct kvm *kvm);
 	int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info);
 	int (*emulate_op)(struct kvm_vcpu *vcpu,
-			  unsigned int inst, int *advance);
+			  struct ppc_inst inst, int *advance);
 	int (*emulate_mtspr)(struct kvm_vcpu *vcpu, int sprn, ulong spr_val);
 	int (*emulate_mfspr)(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val);
 	void (*fast_vcpu_kick)(struct kvm_vcpu *vcpu);
@@ -320,20 +322,20 @@ extern struct kvmppc_ops *kvmppc_hv_ops;
 extern struct kvmppc_ops *kvmppc_pr_ops;
 
 static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
-				enum instruction_fetch_type type, u32 *inst)
+				enum instruction_fetch_type type, struct ppc_inst *inst)
 {
 	int ret = EMULATE_DONE;
-	u32 fetched_inst;
+	struct ppc_inst fetched_inst;
 
 	/* Load the instruction manually if it failed to do so in the
 	 * exit path */
-	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+	if (ppc_inst_equal(vcpu->arch.last_inst, ppc_inst(KVM_INST_FETCH_FAILED)))
 		ret = kvmppc_load_last_inst(vcpu, type, &vcpu->arch.last_inst);
 
 	/*  Write fetch_failed unswapped if the fetch failed */
 	if (ret == EMULATE_DONE)
 		fetched_inst = kvmppc_need_byteswap(vcpu) ?
-				swab32(vcpu->arch.last_inst) :
+				ppc_inst_swab(vcpu->arch.last_inst) :
 				vcpu->arch.last_inst;
 	else
 		fetched_inst = vcpu->arch.last_inst;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 8711c2164b45..96b77588bf71 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -73,6 +73,8 @@
 #include "../xmon/xmon_bpts.h"
 #endif
 
+#include <asm/inst.h>
+
 #define STACK_PT_REGS_OFFSET(sym, val)	\
 	DEFINE(sym, STACK_FRAME_OVERHEAD + offsetof(struct pt_regs, val))
 
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 617eba82531c..7672847d4035 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -22,6 +22,7 @@
 #include <asm/disassemble.h>
 #include <asm/ppc-opcode.h>
 #include <asm/epapr_hcalls.h>
+#include <asm/code-patching.h>
 
 #define KVM_MAGIC_PAGE		(-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
@@ -68,55 +69,49 @@ extern char kvm_tmp[];
 extern char kvm_tmp_end[];
 static int kvm_tmp_index;
 
-static void __init kvm_patch_ins(u32 *inst, u32 new_inst)
-{
-	*inst = new_inst;
-	flush_icache_range((ulong)inst, (ulong)inst + 4);
-}
-
-static void __init kvm_patch_ins_ll(u32 *inst, long addr, u32 rt)
+static void __init kvm_patch_ins_ll(struct ppc_inst *inst, long addr, u32 rt)
 {
 #ifdef CONFIG_64BIT
-	kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_LD | rt | (addr & 0x0000fffc)));
 #else
-	kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_LWZ | rt | (addr & 0x0000fffc)));
 #endif
 }
 
-static void __init kvm_patch_ins_ld(u32 *inst, long addr, u32 rt)
+static void __init kvm_patch_ins_ld(struct ppc_inst *inst, long addr, u32 rt)
 {
 #ifdef CONFIG_64BIT
-	kvm_patch_ins(inst, KVM_INST_LD | rt | (addr & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_LD | rt | (addr & 0x0000fffc)));
 #else
-	kvm_patch_ins(inst, KVM_INST_LWZ | rt | ((addr + 4) & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_LWZ | rt | ((addr + 4) & 0x0000fffc)));
 #endif
 }
 
-static void __init kvm_patch_ins_lwz(u32 *inst, long addr, u32 rt)
+static void __init kvm_patch_ins_lwz(struct ppc_inst *inst, long addr, u32 rt)
 {
-	kvm_patch_ins(inst, KVM_INST_LWZ | rt | (addr & 0x0000ffff));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_LWZ | rt | (addr & 0x0000ffff)));
 }
 
-static void __init kvm_patch_ins_std(u32 *inst, long addr, u32 rt)
+static void __init kvm_patch_ins_std(struct ppc_inst *inst, long addr, u32 rt)
 {
 #ifdef CONFIG_64BIT
-	kvm_patch_ins(inst, KVM_INST_STD | rt | (addr & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_STD | rt | (addr & 0x0000fffc)));
 #else
-	kvm_patch_ins(inst, KVM_INST_STW | rt | ((addr + 4) & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_STW | rt | ((addr + 4) & 0x0000fffc)));
 #endif
 }
 
-static void __init kvm_patch_ins_stw(u32 *inst, long addr, u32 rt)
+static void __init kvm_patch_ins_stw(struct ppc_inst *inst, long addr, u32 rt)
 {
-	kvm_patch_ins(inst, KVM_INST_STW | rt | (addr & 0x0000fffc));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_STW | rt | (addr & 0x0000fffc)));
 }
 
-static void __init kvm_patch_ins_nop(u32 *inst)
+static void __init kvm_patch_ins_nop(struct ppc_inst *inst)
 {
-	kvm_patch_ins(inst, KVM_INST_NOP);
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_NOP));
 }
 
-static void __init kvm_patch_ins_b(u32 *inst, int addr)
+static void __init kvm_patch_ins_b(struct ppc_inst *inst, int addr)
 {
 #if defined(CONFIG_RELOCATABLE) && defined(CONFIG_PPC_BOOK3S)
 	/* On relocatable kernels interrupts handlers and our code
@@ -126,7 +121,7 @@ static void __init kvm_patch_ins_b(u32 *inst, int addr)
 		return;
 #endif
 
-	kvm_patch_ins(inst, KVM_INST_B | (addr & KVM_INST_B_MASK));
+	raw_patch_instruction(inst, ppc_inst(KVM_INST_B | (addr & KVM_INST_B_MASK)));
 }
 
 static u32 * __init kvm_alloc(int len)
@@ -152,7 +147,7 @@ extern u32 kvm_emulate_mtmsrd_orig_ins_offs;
 extern u32 kvm_emulate_mtmsrd_len;
 extern u32 kvm_emulate_mtmsrd[];
 
-static void __init kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
+static void __init kvm_patch_ins_mtmsrd(struct ppc_inst *inst, u32 rt)
 {
 	u32 *p;
 	int distance_start;
@@ -177,13 +172,13 @@ static void __init kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 	/* Modify the chunk to fit the invocation */
 	memcpy(p, kvm_emulate_mtmsrd, kvm_emulate_mtmsrd_len * 4);
 	p[kvm_emulate_mtmsrd_branch_offs] |= distance_end & KVM_INST_B_MASK;
-	switch (get_rt(rt)) {
+	switch (get_rt(ppc_inst(rt))) {
 	case 30:
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsrd_reg_offs],
 				 magic_var(scratch2), KVM_RT_30);
 		break;
 	case 31:
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsrd_reg_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsrd_reg_offs],
 				 magic_var(scratch1), KVM_RT_30);
 		break;
 	default:
@@ -191,7 +186,7 @@ static void __init kvm_patch_ins_mtmsrd(u32 *inst, u32 rt)
 		break;
 	}
 
-	p[kvm_emulate_mtmsrd_orig_ins_offs] = *inst;
+	p[kvm_emulate_mtmsrd_orig_ins_offs] = ppc_inst_val(ppc_inst_read(inst));
 	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsrd_len * 4);
 
 	/* Patch the invocation */
@@ -205,7 +200,7 @@ extern u32 kvm_emulate_mtmsr_orig_ins_offs;
 extern u32 kvm_emulate_mtmsr_len;
 extern u32 kvm_emulate_mtmsr[];
 
-static void __init kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
+static void __init kvm_patch_ins_mtmsr(struct ppc_inst *inst, u32 rt)
 {
 	u32 *p;
 	int distance_start;
@@ -232,17 +227,17 @@ static void __init kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
 	p[kvm_emulate_mtmsr_branch_offs] |= distance_end & KVM_INST_B_MASK;
 
 	/* Make clobbered registers work too */
-	switch (get_rt(rt)) {
+	switch (get_rt(ppc_inst(rt))) {
 	case 30:
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsr_reg1_offs],
 				 magic_var(scratch2), KVM_RT_30);
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsr_reg2_offs],
 				 magic_var(scratch2), KVM_RT_30);
 		break;
 	case 31:
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg1_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsr_reg1_offs],
 				 magic_var(scratch1), KVM_RT_30);
-		kvm_patch_ins_ll(&p[kvm_emulate_mtmsr_reg2_offs],
+		kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_mtmsr_reg2_offs],
 				 magic_var(scratch1), KVM_RT_30);
 		break;
 	default:
@@ -251,7 +246,7 @@ static void __init kvm_patch_ins_mtmsr(u32 *inst, u32 rt)
 		break;
 	}
 
-	p[kvm_emulate_mtmsr_orig_ins_offs] = *inst;
+	p[kvm_emulate_mtmsr_orig_ins_offs] = ppc_inst_val(ppc_inst_read(inst));
 	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_mtmsr_len * 4);
 
 	/* Patch the invocation */
@@ -266,7 +261,7 @@ extern u32 kvm_emulate_wrtee_orig_ins_offs;
 extern u32 kvm_emulate_wrtee_len;
 extern u32 kvm_emulate_wrtee[];
 
-static void __init kvm_patch_ins_wrtee(u32 *inst, u32 rt, int imm_one)
+static void __init kvm_patch_ins_wrtee(struct ppc_inst *inst, u32 rt, int imm_one)
 {
 	u32 *p;
 	int distance_start;
@@ -297,13 +292,13 @@ static void __init kvm_patch_ins_wrtee(u32 *inst, u32 rt, int imm_one)
 			KVM_INST_LI | __PPC_RT(R30) | MSR_EE;
 	} else {
 		/* Make clobbered registers work too */
-		switch (get_rt(rt)) {
+		switch (get_rt(ppc_inst(rt))) {
 		case 30:
-			kvm_patch_ins_ll(&p[kvm_emulate_wrtee_reg_offs],
+			kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_wrtee_reg_offs],
 					 magic_var(scratch2), KVM_RT_30);
 			break;
 		case 31:
-			kvm_patch_ins_ll(&p[kvm_emulate_wrtee_reg_offs],
+			kvm_patch_ins_ll((struct ppc_inst *)&p[kvm_emulate_wrtee_reg_offs],
 					 magic_var(scratch1), KVM_RT_30);
 			break;
 		default:
@@ -312,7 +307,7 @@ static void __init kvm_patch_ins_wrtee(u32 *inst, u32 rt, int imm_one)
 		}
 	}
 
-	p[kvm_emulate_wrtee_orig_ins_offs] = *inst;
+	p[kvm_emulate_wrtee_orig_ins_offs] = ppc_inst_val(ppc_inst_read(inst));
 	flush_icache_range((ulong)p, (ulong)p + kvm_emulate_wrtee_len * 4);
 
 	/* Patch the invocation */
@@ -323,7 +318,7 @@ extern u32 kvm_emulate_wrteei_0_branch_offs;
 extern u32 kvm_emulate_wrteei_0_len;
 extern u32 kvm_emulate_wrteei_0[];
 
-static void __init kvm_patch_ins_wrteei_0(u32 *inst)
+static void __init kvm_patch_ins_wrteei_0(struct ppc_inst *inst)
 {
 	u32 *p;
 	int distance_start;
@@ -415,11 +410,11 @@ static void __init kvm_map_magic_page(void *data)
 	*features = out[0];
 }
 
-static void __init kvm_check_ins(u32 *inst, u32 features)
+static void __init kvm_check_ins(struct ppc_inst *inst, u32 features)
 {
-	u32 _inst = *inst;
-	u32 inst_no_rt = _inst & ~KVM_MASK_RT;
-	u32 inst_rt = _inst & KVM_MASK_RT;
+	struct ppc_inst _inst = ppc_inst_read(inst);
+	u32 inst_no_rt = ppc_inst_val(_inst) & ~KVM_MASK_RT;
+	u32 inst_rt = ppc_inst_val(_inst) & KVM_MASK_RT;
 
 	switch (inst_no_rt) {
 	/* Loads */
@@ -643,7 +638,7 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
 #endif
 	}
 
-	switch (_inst) {
+	switch (ppc_inst_val(_inst)) {
 #ifdef CONFIG_BOOKE
 	case KVM_INST_WRTEEI_0:
 		kvm_patch_ins_wrteei_0(inst);
@@ -656,13 +651,13 @@ static void __init kvm_check_ins(u32 *inst, u32 features)
 	}
 }
 
-extern u32 kvm_template_start[];
-extern u32 kvm_template_end[];
+extern struct ppc_inst kvm_template_start[];
+extern struct ppc_inst kvm_template_end[];
 
 static void __init kvm_use_magic_page(void)
 {
-	u32 *p;
-	u32 *start, *end;
+	struct ppc_inst *p;
+	struct ppc_inst *start, *end;
 	u32 features;
 
 	/* Tell the host to map the magic page to -4096 on all CPUs */
@@ -685,10 +680,10 @@ static void __init kvm_use_magic_page(void)
 	 */
 	local_irq_disable();
 
-	for (p = start; p < end; p++) {
+	for (p = start; p < end; p = ppc_inst_next(p, p)) {
 		/* Avoid patching the template code */
 		if (p >= kvm_template_start && p < kvm_template_end) {
-			p = kvm_template_end - 1;
+			p = (void *)(kvm_template_end - 4);
 			continue;
 		}
 		kvm_check_ins(p, features);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 41fedec69ac3..70d8967acc9b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -452,15 +452,17 @@ int kvmppc_xlate(struct kvm_vcpu *vcpu, ulong eaddr, enum xlate_instdata xlid,
 }
 
 int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
-		enum instruction_fetch_type type, u32 *inst)
+		enum instruction_fetch_type type, struct ppc_inst *inst)
 {
 	ulong pc = kvmppc_get_pc(vcpu);
+	u32 word;
 	int r;
 
 	if (type == INST_SC)
 		pc -= 4;
 
-	r = kvmppc_ld(vcpu, &pc, sizeof(u32), inst, false);
+	r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
+	*inst = ppc_inst(word);
 	if (r == EMULATE_DONE)
 		return r;
 	else
diff --git a/arch/powerpc/kvm/book3s.h b/arch/powerpc/kvm/book3s.h
index 9b6323ec8e60..c0f9fbdc11c7 100644
--- a/arch/powerpc/kvm/book3s.h
+++ b/arch/powerpc/kvm/book3s.h
@@ -19,7 +19,7 @@ extern void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte);
 extern int kvmppc_mmu_init_pr(struct kvm_vcpu *vcpu);
 extern void kvmppc_mmu_destroy_pr(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
-				     unsigned int inst, int *advance);
+				     struct ppc_inst inst, int *advance);
 extern int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu,
 					int sprn, ulong spr_val);
 extern int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 38ea396a23d6..775ce41738ce 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -406,20 +406,20 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
  * embodied here.)  If the instruction isn't a load or store, then
  * this doesn't return anything useful.
  */
-static int instruction_is_store(unsigned int instr)
+static int instruction_is_store(struct ppc_inst instr)
 {
 	unsigned int mask;
 
 	mask = 0x10000000;
-	if ((instr & 0xfc000000) == 0x7c000000)
+	if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
 		mask = 0x100;		/* major opcode 31 */
-	return (instr & mask) != 0;
+	return (ppc_inst_val(instr) & mask) != 0;
 }
 
 int kvmppc_hv_emulate_mmio(struct kvm_vcpu *vcpu,
 			   unsigned long gpa, gva_t ea, int is_store)
 {
-	u32 last_inst;
+	struct ppc_inst last_inst;
 
 	/*
 	 * Fast path - check if the guest physical address corresponds to a
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 0effd48c8f4d..9c299a742702 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -236,20 +236,21 @@ void kvmppc_emulate_tabort(struct kvm_vcpu *vcpu, int ra_val)
 #endif
 
 int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
-			      unsigned int inst, int *advance)
+			      struct ppc_inst inst, int *advance)
 {
 	int emulated = EMULATE_DONE;
 	int rt = get_rt(inst);
 	int rs = get_rs(inst);
 	int ra = get_ra(inst);
 	int rb = get_rb(inst);
-	u32 inst_sc = 0x44000002;
+	struct ppc_inst inst_sc = ppc_inst(0x44000002);
+	u32 word = ppc_inst_val(inst);
 
-	switch (get_op(inst)) {
+	switch (ppc_inst_primary_opcode(inst)) {
 	case 0:
 		emulated = EMULATE_FAIL;
 		if ((kvmppc_get_msr(vcpu) & MSR_LE) &&
-		    (inst == swab32(inst_sc))) {
+		    (ppc_inst_equal(inst, ppc_inst_swab(inst_sc)))) {
 			/*
 			 * This is the byte reversed syscall instruction of our
 			 * hypercall handler. Early versions of LE Linux didn't
@@ -301,7 +302,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 		case OP_31_XOP_MTMSRD:
 		{
 			ulong rs_val = kvmppc_get_gpr(vcpu, rs);
-			if (inst & 0x10000) {
+			if (word & 0x10000) {
 				ulong new_msr = kvmppc_get_msr(vcpu);
 				new_msr &= ~(MSR_RI | MSR_EE);
 				new_msr |= rs_val & (MSR_RI | MSR_EE);
@@ -317,7 +318,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 		{
 			int srnum;
 
-			srnum = kvmppc_get_field(inst, 12 + 32, 15 + 32);
+			srnum = kvmppc_get_field(word, 12 + 32, 15 + 32);
 			if (vcpu->arch.mmu.mfsrin) {
 				u32 sr;
 				sr = vcpu->arch.mmu.mfsrin(vcpu, srnum);
@@ -339,7 +340,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 		}
 		case OP_31_XOP_MTSR:
 			vcpu->arch.mmu.mtsrin(vcpu,
-				(inst >> 16) & 0xf,
+				(word >> 16) & 0xf,
 				kvmppc_get_gpr(vcpu, rs));
 			break;
 		case OP_31_XOP_MTSRIN:
@@ -350,7 +351,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 		case OP_31_XOP_TLBIE:
 		case OP_31_XOP_TLBIEL:
 		{
-			bool large = (inst & 0x00200000) ? true : false;
+			bool large = (word & 0x00200000) ? true : false;
 			ulong addr = kvmppc_get_gpr(vcpu, rb);
 			vcpu->arch.mmu.tlbie(vcpu, addr, large);
 			break;
@@ -407,7 +408,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 			vcpu->arch.mmu.slbia(vcpu);
 			break;
 		case OP_31_XOP_SLBFEE:
-			if (!(inst & 1) || !vcpu->arch.mmu.slbfee) {
+			if (!(word & 1) || !vcpu->arch.mmu.slbfee) {
 				return EMULATE_FAIL;
 			} else {
 				ulong b, t;
@@ -507,7 +508,7 @@ int kvmppc_core_emulate_op_pr(struct kvm_vcpu *vcpu,
 					(((u64)(TM_CAUSE_EMULATE | TM_CAUSE_PERSISTENT))
 						 << TEXASR_FC_LG));
 
-				if ((inst >> 21) & 0x1)
+				if ((word >> 21) & 0x1)
 					vcpu->arch.texasr |= TEXASR_ROT;
 
 				if (kvmppc_get_msr(vcpu) & MSR_HV)
@@ -1029,12 +1030,12 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
 	return emulated;
 }
 
-u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
+u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, struct ppc_inst inst)
 {
 	return make_dsisr(inst);
 }
 
-ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
+ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, struct ppc_inst inst)
 {
 #ifdef CONFIG_PPC_BOOK3S_64
 	/*
@@ -1053,7 +1054,7 @@ ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
 	case OP_STFS:
 		if (ra)
 			dar = kvmppc_get_gpr(vcpu, ra);
-		dar += (s32)((s16)inst);
+		dar += (s32)((s16)ppc_inst_val(inst));
 		break;
 	case 31:
 		if (ra)
@@ -1061,7 +1062,8 @@ ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst)
 		dar += kvmppc_get_gpr(vcpu, rb);
 		break;
 	default:
-		printk(KERN_INFO "KVM: Unaligned instruction 0x%x\n", inst);
+		printk(KERN_INFO "KVM: Unaligned instruction %s\n",
+		       ppc_inst_as_str(inst));
 		break;
 	}
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4ba06a2a306c..1041631c6f0d 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -432,9 +432,9 @@ static void kvmppc_dump_regs(struct kvm_vcpu *vcpu)
 	for (r = 0; r < vcpu->arch.slb_max; ++r)
 		pr_err("  ESID = %.16llx VSID = %.16llx\n",
 		       vcpu->arch.slb[r].orige, vcpu->arch.slb[r].origv);
-	pr_err("lpcr = %.16lx sdr1 = %.16lx last_inst = %.8x\n",
+	pr_err("lpcr = %.16lx sdr1 = %.16lx last_inst = %s\n",
 	       vcpu->arch.vcore->lpcr, vcpu->kvm->arch.sdr1,
-	       vcpu->arch.last_inst);
+	       ppc_inst_as_str(vcpu->arch.last_inst));
 }
 
 static struct kvm_vcpu *kvmppc_find_vcpu(struct kvm *kvm, int id)
@@ -1167,7 +1167,7 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
 
 static int kvmppc_emulate_debug_inst(struct kvm_vcpu *vcpu)
 {
-	u32 last_inst;
+	struct ppc_inst last_inst;
 
 	if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
 					EMULATE_DONE) {
@@ -1178,7 +1178,7 @@ static int kvmppc_emulate_debug_inst(struct kvm_vcpu *vcpu)
 		return RESUME_GUEST;
 	}
 
-	if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
+	if (ppc_inst_equal(last_inst, ppc_inst(KVMPPC_INST_SW_BREAKPOINT))) {
 		vcpu->run->exit_reason = KVM_EXIT_DEBUG;
 		vcpu->run->debug.arch.address = kvmppc_get_pc(vcpu);
 		return RESUME_HOST;
@@ -1227,7 +1227,8 @@ static unsigned long kvmppc_read_dpdes(struct kvm_vcpu *vcpu)
  */
 static int kvmppc_emulate_doorbell_instr(struct kvm_vcpu *vcpu)
 {
-	u32 inst, rb, thr;
+	struct ppc_inst inst;
+	u32 rb, thr;
 	unsigned long arg;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_vcpu *tvcpu;
@@ -1414,9 +1415,9 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 	 * Accordingly return to Guest or Host.
 	 */
 	case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-		if (vcpu->arch.emul_inst != KVM_INST_FETCH_FAILED)
+		if (!ppc_inst_equal(vcpu->arch.emul_inst, ppc_inst(KVM_INST_FETCH_FAILED)))
 			vcpu->arch.last_inst = kvmppc_need_byteswap(vcpu) ?
-				swab32(vcpu->arch.emul_inst) :
+				ppc_inst_swab(vcpu->arch.emul_inst) :
 				vcpu->arch.emul_inst;
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
 			r = kvmppc_emulate_debug_inst(vcpu);
@@ -4096,7 +4097,7 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 	vcpu->arch.stolen_logged = vcore_stolen_time(vc, mftb());
 	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
 	vcpu->arch.busy_preempt = TB_NIL;
-	vcpu->arch.last_inst = KVM_INST_FETCH_FAILED;
+	vcpu->arch.last_inst = ppc_inst(KVM_INST_FETCH_FAILED);
 	vc->runnable_threads[0] = vcpu;
 	vc->n_runnable = 1;
 	vc->runner = vcpu;
@@ -5024,7 +5025,7 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 
 /* We don't need to emulate any privileged instructions or dcbz */
 static int kvmppc_core_emulate_op_hv(struct kvm_vcpu *vcpu,
-				     unsigned int inst, int *advance)
+				     struct ppc_inst inst, int *advance)
 {
 	return EMULATE_FAIL;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 6822d23a2da4..9993d8337e3a 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -123,7 +123,7 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
 		hr->asdr = vcpu->arch.fault_gpa;
 		break;
 	case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-		hr->heir = vcpu->arch.emul_inst;
+		hr->heir = ppc_inst_val(vcpu->arch.emul_inst);
 		break;
 	}
 }
@@ -183,7 +183,7 @@ void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
 	vcpu->arch.fault_dar = hr->hdar;
 	vcpu->arch.fault_dsisr = hr->hdsisr;
 	vcpu->arch.fault_gpa = hr->asdr;
-	vcpu->arch.emul_inst = hr->heir;
+	vcpu->arch.emul_inst = ppc_inst_read((struct ppc_inst *)&hr->heir);
 	vcpu->arch.shregs.srr0 = hr->srr0;
 	vcpu->arch.shregs.srr1 = hr->srr1;
 	vcpu->arch.shregs.sprg0 = hr->sprg[0];
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 799d6d0f4ead..4853b3444c5f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1360,10 +1360,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	   if this is an HEI (HV emulation interrupt, e40) */
 	li	r3,KVM_INST_FETCH_FAILED
 	stw	r3,VCPU_LAST_INST(r9)
+	li	r4,0xff
+	stw	r4,VCPU_LAST_INST+4(r9)
 	cmpwi	r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
 	bne	11f
 	mfspr	r3,SPRN_HEIR
 11:	stw	r3,VCPU_HEIR(r9)
+	stw	r4,VCPU_HEIR+4(r9)
 
 	/* these are volatile across C function calls */
 	mfctr	r3
@@ -2160,6 +2163,7 @@ fast_interrupt_c_return:
 
 	/* If this is for emulated MMIO, load the instruction word */
 2:	li	r8, KVM_INST_FETCH_FAILED	/* In case lwz faults */
+	li	r5, 0xff
 
 	/* Set guest mode to 'jump over instruction' so if lwz faults
 	 * we'll just continue at the next IP. */
@@ -2175,6 +2179,7 @@ fast_interrupt_c_return:
 
 	/* Store the result */
 	stw	r8, VCPU_LAST_INST(r9)
+	stw	r5, VCPU_LAST_INST+4(r9)
 
 	/* Unset guest mode. */
 	li	r0, KVM_GUEST_MODE_HOST_HV
diff --git a/arch/powerpc/kvm/book3s_hv_tm.c b/arch/powerpc/kvm/book3s_hv_tm.c
index cc90b8b82329..8cf5d4c9a842 100644
--- a/arch/powerpc/kvm/book3s_hv_tm.c
+++ b/arch/powerpc/kvm/book3s_hv_tm.c
@@ -41,11 +41,13 @@ static void emulate_tx_failure(struct kvm_vcpu *vcpu, u64 failure_cause)
  */
 int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 {
-	u32 instr = vcpu->arch.emul_inst;
+	struct ppc_inst instr = vcpu->arch.emul_inst;
 	u64 msr = vcpu->arch.shregs.msr;
 	u64 newmsr, bescr;
+	u32 word;
 	int ra, rs;
 
+	word = ppc_inst_val(instr);
 	/*
 	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
 	 * in these instructions, so masking bit 31 out doesn't change these
@@ -57,7 +59,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 	 * bit 31 set) can generate a softpatch interrupt. Hence both forms
 	 * are handled below for these instructions so they behave the same way.
 	 */
-	switch (instr & PO_XOP_OPCODE_MASK) {
+	switch (word & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -95,7 +97,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
 			       ((bescr >> 30) & 3) == 2));
 		bescr &= ~BESCR_GE;
-		if (instr & (1 << 11))
+		if (word & (1 << 11))
 			bescr |= BESCR_GE;
 		vcpu->arch.bescr = bescr;
 		msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
@@ -106,7 +108,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 
 	case PPC_INST_MTMSRD:
 		/* XXX do we need to check for PR=0 here? */
-		rs = (instr >> 21) & 0x1f;
+		rs = (word >> 21) & 0x1f;
 		newmsr = kvmppc_get_gpr(vcpu, rs);
 		/* check this is a Sx -> T1 transition */
 		WARN_ON_ONCE(!(MSR_TM_SUSPENDED(msr) &&
@@ -144,7 +146,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		vcpu->arch.regs.ccr = (vcpu->arch.regs.ccr & 0x0fffffff) |
 			(((msr & MSR_TS_MASK) >> MSR_TS_S_LG) << 29);
 		/* L=1 => tresume, L=0 => tsuspend */
-		if (instr & (1 << 21)) {
+		if (word & (1 << 21)) {
 			if (MSR_TM_SUSPENDED(msr))
 				msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
 		} else {
@@ -177,7 +179,7 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 		}
 		/* If failure was not previously recorded, recompute TEXASR */
 		if (!(vcpu->arch.orig_texasr & TEXASR_FS)) {
-			ra = (instr >> 16) & 0x1f;
+			ra = (word >> 16) & 0x1f;
 			if (ra)
 				ra = kvmppc_get_gpr(vcpu, ra) & 0xff;
 			emulate_tx_failure(vcpu, ra);
@@ -225,7 +227,8 @@ int kvmhv_p9_tm_emulation(struct kvm_vcpu *vcpu)
 
 	/* What should we do here? We didn't recognize the instruction */
 	kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-	pr_warn_ratelimited("Unrecognized TM-related instruction %#x for emulation", instr);
+	pr_warn_ratelimited("Unrecognized TM-related instruction %s for emulation",
+			    ppc_inst_as_str(instr));
 
 	return RESUME_GUEST;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_tm_builtin.c b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
index fad931f224ef..82e6e190c211 100644
--- a/arch/powerpc/kvm/book3s_hv_tm_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_tm_builtin.c
@@ -19,10 +19,12 @@
  */
 int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 {
-	u32 instr = vcpu->arch.emul_inst;
+	struct ppc_inst instr = vcpu->arch.emul_inst;
 	u64 newmsr, msr, bescr;
+	u32 word;
 	int rs;
 
+	word = ppc_inst_val(instr);
 	/*
 	 * rfid, rfebb, and mtmsrd encode bit 31 = 0 since it's a reserved bit
 	 * in these instructions, so masking bit 31 out doesn't change these
@@ -34,7 +36,7 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 	 * generate a softpatch interrupt. Hence both forms are handled below
 	 * for tsr. to make them behave the same way.
 	 */
-	switch (instr & PO_XOP_OPCODE_MASK) {
+	switch (word & PO_XOP_OPCODE_MASK) {
 	case PPC_INST_RFID:
 		/* XXX do we need to check for PR=0 here? */
 		newmsr = vcpu->arch.shregs.srr1;
@@ -61,7 +63,7 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 		if (((bescr >> 30) & 3) != 2)
 			return 0;
 		bescr &= ~BESCR_GE;
-		if (instr & (1 << 11))
+		if (word & (1 << 11))
 			bescr |= BESCR_GE;
 		mtspr(SPRN_BESCR, bescr);
 		msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
@@ -72,7 +74,7 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 
 	case PPC_INST_MTMSRD:
 		/* XXX do we need to check for PR=0 here? */
-		rs = (instr >> 21) & 0x1f;
+		rs = (word >> 21) & 0x1f;
 		newmsr = kvmppc_get_gpr(vcpu, rs);
 		msr = vcpu->arch.shregs.msr;
 		/* check this is a Sx -> T1 transition */
@@ -95,7 +97,7 @@ int kvmhv_p9_tm_emulation_early(struct kvm_vcpu *vcpu)
 		if (!(vcpu->arch.hfscr & HFSCR_TM) || !(msr & MSR_TM))
 			return 0;
 		/* L=1 => tresume => set TS to T (0b10) */
-		if (instr & (1 << 21))
+		if (word & (1 << 21))
 			vcpu->arch.shregs.msr = (msr & ~MSR_TS_MASK) | MSR_TS_T;
 		/* Set CR0 to 0b0010 */
 		vcpu->arch.regs.ccr = (vcpu->arch.regs.ccr & 0x0fffffff) |
diff --git a/arch/powerpc/kvm/book3s_paired_singles.c b/arch/powerpc/kvm/book3s_paired_singles.c
index a11436720a8c..7645e2c6ef1d 100644
--- a/arch/powerpc/kvm/book3s_paired_singles.c
+++ b/arch/powerpc/kvm/book3s_paired_singles.c
@@ -336,12 +336,12 @@ static int kvmppc_emulate_psq_store(struct kvm_vcpu *vcpu,
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
  */
-static inline u32 inst_get_field(u32 inst, int msb, int lsb)
+static inline u32 inst_get_field(struct ppc_inst inst, int msb, int lsb)
 {
-	return kvmppc_get_field(inst, msb + 32, lsb + 32);
+	return kvmppc_get_field(ppc_inst_val(inst), msb + 32, lsb + 32);
 }
 
-static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
+static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, struct ppc_inst inst)
 {
 	if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
 		return false;
@@ -477,9 +477,10 @@ static bool kvmppc_inst_is_paired_single(struct kvm_vcpu *vcpu, u32 inst)
 	return false;
 }
 
-static int get_d_signext(u32 inst)
+static int get_d_signext(struct ppc_inst inst)
 {
-	int d = inst & 0x8ff;
+	u32 word = ppc_inst_val(inst);
+	int d = word & 0x8ff;
 
 	if (d & 0x800)
 		return -(d & 0x7ff);
@@ -620,7 +621,7 @@ static int kvmppc_ps_one_in(struct kvm_vcpu *vcpu, bool rc,
 
 int kvmppc_emulate_paired_single(struct kvm_vcpu *vcpu)
 {
-	u32 inst;
+	struct ppc_inst inst;
 	enum emulation_result emulated = EMULATE_DONE;
 	int ax_rd, ax_ra, ax_rb, ax_rc;
 	short full_d;
@@ -647,7 +648,7 @@ int kvmppc_emulate_paired_single(struct kvm_vcpu *vcpu)
 	fpr_b = &VCPU_FPR(vcpu, ax_rb);
 	fpr_c = &VCPU_FPR(vcpu, ax_rc);
 
-	rcomp = (inst & 1) ? true : false;
+	rcomp = (ppc_inst_val(inst) & 1) ? true : false;
 	cr = kvmppc_get_cr(vcpu);
 
 	if (!kvmppc_inst_is_paired_single(vcpu, inst))
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 88fac22fbf09..3a07f520b385 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1093,7 +1093,7 @@ static int kvmppc_exit_pr_progint(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 {
 	enum emulation_result er;
 	ulong flags;
-	u32 last_inst;
+	struct ppc_inst last_inst;
 	int emul, r;
 
 	/*
@@ -1113,10 +1113,10 @@ static int kvmppc_exit_pr_progint(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 
 	if (kvmppc_get_msr(vcpu) & MSR_PR) {
 #ifdef EXIT_DEBUG
-		pr_info("Userspace triggered 0x700 exception at\n 0x%lx (0x%x)\n",
-			kvmppc_get_pc(vcpu), last_inst);
+		pr_info("Userspace triggered 0x700 exception at\n 0x%lx (%s)\n",
+			kvmppc_get_pc(vcpu), ppc_inst_as_str(last_inst));
 #endif
-		if ((last_inst & 0xff0007ff) != (INS_DCBZ & 0xfffffff7)) {
+		if ((ppc_inst_val(last_inst) & 0xff0007ff) != (INS_DCBZ & 0xfffffff7)) {
 			kvmppc_core_queue_program(vcpu, flags);
 			return RESUME_GUEST;
 		}
@@ -1132,8 +1132,8 @@ static int kvmppc_exit_pr_progint(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 		r = RESUME_GUEST;
 		break;
 	case EMULATE_FAIL:
-		pr_crit("%s: emulation at %lx failed (%08x)\n",
-			__func__, kvmppc_get_pc(vcpu), last_inst);
+		pr_crit("%s: emulation at %lx failed (%s)\n",
+			__func__, kvmppc_get_pc(vcpu), ppc_inst_as_str(last_inst));
 		kvmppc_core_queue_program(vcpu, flags);
 		r = RESUME_GUEST;
 		break;
@@ -1295,7 +1295,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 		break;
 	case BOOK3S_INTERRUPT_SYSCALL:
 	{
-		u32 last_sc;
+		struct ppc_inst last_sc;
 		int emul;
 
 		/* Get last sc for papr */
@@ -1310,7 +1310,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 		}
 
 		if (vcpu->arch.papr_enabled &&
-		    (last_sc == 0x44000022) &&
+		    (ppc_inst_equal(last_sc, ppc_inst(0x44000022))) &&
 		    !(kvmppc_get_msr(vcpu) & MSR_PR)) {
 			/* SC 1 papr hypercalls */
 			ulong cmd = kvmppc_get_gpr(vcpu, 3);
@@ -1362,7 +1362,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	{
 		int ext_msr = 0;
 		int emul;
-		u32 last_inst;
+		struct ppc_inst last_inst;
 
 		if (vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE) {
 			/* Do paired single instruction emulation */
@@ -1396,7 +1396,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	}
 	case BOOK3S_INTERRUPT_ALIGNMENT:
 	{
-		u32 last_inst;
+		struct ppc_inst last_inst;
 		int emul = kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 
 		if (emul == EMULATE_DONE) {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3e1c9f08e302..e6b83d7b3a62 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -816,12 +816,12 @@ static int emulation_exit(struct kvm_vcpu *vcpu)
 		return RESUME_GUEST;
 
 	case EMULATE_FAIL:
-		printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
-		       __func__, vcpu->arch.regs.nip, vcpu->arch.last_inst);
+		printk(KERN_CRIT "%s: emulation at %lx failed (%s)\n",
+		       __func__, vcpu->arch.regs.nip, ppc_inst_as_str(vcpu->arch.last_inst));
 		/* For debugging, encode the failing instruction and
 		 * report it to userspace. */
-		vcpu->run->hw.hardware_exit_reason = ~0ULL << 32;
-		vcpu->run->hw.hardware_exit_reason |= vcpu->arch.last_inst;
+		run->hw.hardware_exit_reason = ~0ULL << 32;
+		run->hw.hardware_exit_reason |= ppc_inst_val(vcpu->arch.last_inst);
 		kvmppc_core_queue_program(vcpu, ESR_PIL);
 		return RESUME_HOST;
 
@@ -955,7 +955,7 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
 }
 
 static int kvmppc_resume_inst_load(struct kvm_vcpu *vcpu,
-				  enum emulation_result emulated, u32 last_inst)
+				  enum emulation_result emulated, struct ppc_inst last_inst)
 {
 	switch (emulated) {
 	case EMULATE_AGAIN:
@@ -966,8 +966,8 @@ static int kvmppc_resume_inst_load(struct kvm_vcpu *vcpu,
 		       __func__, vcpu->arch.regs.nip);
 		/* For debugging, encode the failing instruction and
 		 * report it to userspace. */
-		vcpu->run->hw.hardware_exit_reason = ~0ULL << 32;
-		vcpu->run->hw.hardware_exit_reason |= last_inst;
+		run->hw.hardware_exit_reason = ~0ULL << 32;
+		run->hw.hardware_exit_reason |= ppc_inst_val(last_inst);
 		kvmppc_core_queue_program(vcpu, ESR_PIL);
 		return RESUME_HOST;
 
@@ -987,7 +987,7 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 	int r = RESUME_HOST;
 	int s;
 	int idx;
-	u32 last_inst = KVM_INST_FETCH_FAILED;
+	struct ppc_inst last_inst = ppc_inst(KVM_INST_FETCH_FAILED);
 	enum emulation_result emulated = EMULATE_DONE;
 
 	/* update before a new last_exit_type is rewritten */
@@ -1089,7 +1089,7 @@ int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 
 	case BOOKE_INTERRUPT_PROGRAM:
 		if ((vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) &&
-			(last_inst == KVMPPC_INST_SW_BREAKPOINT)) {
+			(ppc_inst_equal(last_inst, ppc_inst(KVMPPC_INST_SW_BREAKPOINT)))) {
 			/*
 			 * We are here because of an SW breakpoint instr,
 			 * so lets return to host to handle.
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index be9da96d9f06..66bfdd750ebf 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -71,7 +71,7 @@ void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
 void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits);
 
 int kvmppc_booke_emulate_op(struct kvm_vcpu *vcpu,
-                            unsigned int inst, int *advance);
+			    struct ppc_inst inst, int *advance);
 int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val);
 int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val);
 
@@ -95,7 +95,7 @@ enum int_class {
 void kvmppc_set_pending_interrupt(struct kvm_vcpu *vcpu, enum int_class type);
 
 extern int kvmppc_core_emulate_op_e500(struct kvm_vcpu *vcpu,
-				       unsigned int inst, int *advance);
+				       struct ppc_inst inst, int *advance);
 extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn,
 					  ulong spr_val);
 extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn,
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index d8d38aca71bd..ce3323c1e3e5 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -40,7 +40,7 @@ static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)
 }
 
 int kvmppc_booke_emulate_op(struct kvm_vcpu *vcpu,
-                            unsigned int inst, int *advance)
+			    struct ppc_inst inst, int *advance)
 {
 	int emulated = EMULATE_DONE;
 	int rs = get_rs(inst);
@@ -94,7 +94,7 @@ int kvmppc_booke_emulate_op(struct kvm_vcpu *vcpu,
 
 		case OP_31_XOP_WRTEEI:
 			vcpu->arch.shared->msr = (vcpu->arch.shared->msr & ~MSR_EE)
-							 | (inst & MSR_EE);
+							 | (ppc_inst_val(inst) & MSR_EE);
 			kvmppc_set_exit_type(vcpu, EMULATED_WRTEE_EXITS);
 			break;
 
diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 64eb833e9f02..3274b659cb55 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -84,7 +84,7 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb)
 #endif
 
 static int kvmppc_e500_emul_ehpriv(struct kvm_vcpu *vcpu,
-				   unsigned int inst, int *advance)
+				   struct ppc_inst inst, int *advance)
 {
 	int emulated = EMULATE_DONE;
 
@@ -112,7 +112,7 @@ static int kvmppc_e500_emul_dcbtls(struct kvm_vcpu *vcpu)
 	return EMULATE_DONE;
 }
 
-static int kvmppc_e500_emul_mftmr(struct kvm_vcpu *vcpu, unsigned int inst,
+static int kvmppc_e500_emul_mftmr(struct kvm_vcpu *vcpu, struct ppc_inst inst,
 				  int rt)
 {
 	/* Expose one thread per vcpu */
@@ -126,7 +126,7 @@ static int kvmppc_e500_emul_mftmr(struct kvm_vcpu *vcpu, unsigned int inst,
 }
 
 int kvmppc_core_emulate_op_e500(struct kvm_vcpu *vcpu,
-				unsigned int inst, int *advance)
+				struct ppc_inst inst, int *advance)
 {
 	int emulated = EMULATE_DONE;
 	int ra = get_ra(inst);
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index d6c1069e9954..5710ad853dc8 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -623,7 +623,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 eaddr, gpa_t gpaddr,
 
 #ifdef CONFIG_KVM_BOOKE_HV
 int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
-		enum instruction_fetch_type type, u32 *instr)
+		enum instruction_fetch_type type, struct ppc_inst *instr)
 {
 	gva_t geaddr;
 	hpa_t addr;
@@ -706,14 +706,14 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 	/* Map a page and get guest's instruction */
 	page = pfn_to_page(pfn);
 	eaddr = (unsigned long)kmap_atomic(page);
-	*instr = *(u32 *)(eaddr | (unsigned long)(addr & ~PAGE_MASK));
+	*instr = ppc_inst_read((struct ppc_inst *)(eaddr | (unsigned long)(addr & ~PAGE_MASK)));
 	kunmap_atomic((u32 *)eaddr);
 
 	return EMULATE_DONE;
 }
 #else
 int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
-		enum instruction_fetch_type type, u32 *instr)
+		enum instruction_fetch_type type, struct ppc_inst *instr)
 {
 	return EMULATE_AGAIN;
 }
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index ee1147c98cd8..e439012e91f1 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -193,7 +193,7 @@ static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
  * from opcode tables in the future. */
 int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 {
-	u32 inst;
+	struct ppc_inst inst;
 	int rs, rt, sprn;
 	enum emulation_result emulated;
 	int advance = 1;
@@ -211,7 +211,7 @@ int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 	rt = get_rt(inst);
 	sprn = get_sprn(inst);
 
-	switch (get_op(inst)) {
+	switch (ppc_inst_primary_opcode(inst)) {
 	case OP_TRAP:
 #ifdef CONFIG_PPC_BOOK3S
 	case OP_TRAP_64:
@@ -269,7 +269,7 @@ int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 		 * Instruction with primary opcode 0. Based on PowerISA
 		 * these are illegal instructions.
 		 */
-		if (inst == KVMPPC_INST_SW_BREAKPOINT) {
+		if (ppc_inst_equal(inst, ppc_inst(KVMPPC_INST_SW_BREAKPOINT))) {
 			vcpu->run->exit_reason = KVM_EXIT_DEBUG;
 			vcpu->run->debug.arch.status = 0;
 			vcpu->run->debug.arch.address = kvmppc_get_pc(vcpu);
@@ -291,16 +291,17 @@ int kvmppc_emulate_instruction(struct kvm_vcpu *vcpu)
 			advance = 0;
 		} else if (emulated == EMULATE_FAIL) {
 			advance = 0;
-			printk(KERN_ERR "Couldn't emulate instruction 0x%08x "
-			       "(op %d xop %d)\n", inst, get_op(inst), get_xop(inst));
+			printk(KERN_ERR "Couldn't emulate instruction %s "
+			       "(op %d xop %d)\n", ppc_inst_as_str(inst),
+			       ppc_inst_primary_opcode(inst), get_xop(inst));
 		}
 	}
 
-	trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
+	trace_kvm_ppc_instr(&inst, kvmppc_get_pc(vcpu), emulated);
 
 	/* Advance past emulated instruction. */
 	if (advance)
-		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
+		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + ppc_inst_len(inst));
 
 	return emulated;
 }
diff --git a/arch/powerpc/kvm/emulate_loadstore.c b/arch/powerpc/kvm/emulate_loadstore.c
index 48272a9b9c30..62050c830753 100644
--- a/arch/powerpc/kvm/emulate_loadstore.c
+++ b/arch/powerpc/kvm/emulate_loadstore.c
@@ -71,7 +71,7 @@ static bool kvmppc_check_altivec_disabled(struct kvm_vcpu *vcpu)
  */
 int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 {
-	u32 inst;
+	struct ppc_inst inst;
 	enum emulation_result emulated = EMULATE_FAIL;
 	int advance = 1;
 	struct instruction_op op;
@@ -94,7 +94,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 
 	emulated = EMULATE_FAIL;
 	vcpu->arch.regs.msr = vcpu->arch.shared->msr;
-	if (analyse_instr(&op, &vcpu->arch.regs, ppc_inst(inst)) == 0) {
+	if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
 		int type = op.type & INSTR_TYPE_MASK;
 		int size = GETSIZE(op.type);
 
@@ -360,11 +360,11 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
 		kvmppc_core_queue_program(vcpu, 0);
 	}
 
-	trace_kvm_ppc_instr(inst, kvmppc_get_pc(vcpu), emulated);
+	trace_kvm_ppc_instr(&inst, kvmppc_get_pc(vcpu), emulated);
 
 	/* Advance past emulated instruction. */
 	if (advance)
-		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + 4);
+		kvmppc_set_pc(vcpu, kvmppc_get_pc(vcpu) + ppc_inst_len(inst));
 
 	return emulated;
 }
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..2406e92c10bd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -304,11 +304,11 @@ int kvmppc_emulate_mmio(struct kvm_vcpu *vcpu)
 		break;
 	case EMULATE_FAIL:
 	{
-		u32 last_inst;
+		struct ppc_inst last_inst;
 
 		kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst);
 		/* XXX Deliver Program interrupt to guest. */
-		pr_emerg("%s: emulation failed (%08x)\n", __func__, last_inst);
+		pr_emerg("%s: emulation failed (%s)\n", __func__, ppc_inst_as_str(last_inst));
 		r = RESUME_HOST;
 		break;
 	}
diff --git a/arch/powerpc/kvm/trace.h b/arch/powerpc/kvm/trace.h
index ea1d7c808319..1bfccb9df7b1 100644
--- a/arch/powerpc/kvm/trace.h
+++ b/arch/powerpc/kvm/trace.h
@@ -3,6 +3,7 @@
 #define _TRACE_KVM_H
 
 #include <linux/tracepoint.h>
+#include <asm/inst.h>
 
 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM kvm
@@ -11,11 +12,11 @@
  * Tracepoint for guest mode entry.
  */
 TRACE_EVENT(kvm_ppc_instr,
-	TP_PROTO(unsigned int inst, unsigned long _pc, unsigned int emulate),
+	TP_PROTO(struct ppc_inst *inst, unsigned long _pc, unsigned int emulate),
 	TP_ARGS(inst, _pc, emulate),
 
 	TP_STRUCT__entry(
-		__field(	unsigned int,	inst		)
+		__field(	struct ppc_inst *,	inst	)
 		__field(	unsigned long,	pc		)
 		__field(	unsigned int,	emulate		)
 	),
@@ -26,8 +27,8 @@ TRACE_EVENT(kvm_ppc_instr,
 		__entry->emulate	= emulate;
 	),
 
-	TP_printk("inst %u pc 0x%lx emulate %u\n",
-		  __entry->inst, __entry->pc, __entry->emulate)
+	TP_printk("inst %s pc 0x%lx emulate %u\n",
+		  ppc_inst_as_str(*(__entry->inst)), __entry->pc, __entry->emulate)
 );
 
 TRACE_EVENT(kvm_stlb_inval,
diff --git a/arch/powerpc/kvm/trace_booke.h b/arch/powerpc/kvm/trace_booke.h
index 3837842986aa..a59475f2d4b1 100644
--- a/arch/powerpc/kvm/trace_booke.h
+++ b/arch/powerpc/kvm/trace_booke.h
@@ -44,7 +44,7 @@ TRACE_EVENT(kvm_exit,
 		__field(	unsigned long,	pc		)
 		__field(	unsigned long,	msr		)
 		__field(	unsigned long,	dar		)
-		__field(	unsigned long,	last_inst	)
+		__field(	struct ppc_inst *,	last_inst	)
 	),
 
 	TP_fast_assign(
@@ -52,20 +52,20 @@ TRACE_EVENT(kvm_exit,
 		__entry->pc		= kvmppc_get_pc(vcpu);
 		__entry->dar		= kvmppc_get_fault_dar(vcpu);
 		__entry->msr		= vcpu->arch.shared->msr;
-		__entry->last_inst	= vcpu->arch.last_inst;
+		__entry->last_inst	= &(vcpu->arch.last_inst);
 	),
 
 	TP_printk("exit=%s"
 		" | pc=0x%lx"
 		" | msr=0x%lx"
 		" | dar=0x%lx"
-		" | last_inst=0x%lx"
+		" | last_inst=%s"
 		,
 		__print_symbolic(__entry->exit_nr, kvm_trace_symbol_exit),
 		__entry->pc,
 		__entry->msr,
 		__entry->dar,
-		__entry->last_inst
+		ppc_inst_as_str(*(__entry->last_inst))
 		)
 );
 
diff --git a/arch/powerpc/kvm/trace_pr.h b/arch/powerpc/kvm/trace_pr.h
index 46a46d328fbf..d49807afd5fd 100644
--- a/arch/powerpc/kvm/trace_pr.h
+++ b/arch/powerpc/kvm/trace_pr.h
@@ -224,7 +224,7 @@ TRACE_EVENT(kvm_exit,
 		__field(	unsigned long,	msr		)
 		__field(	unsigned long,	dar		)
 		__field(	unsigned long,	srr1		)
-		__field(	unsigned long,	last_inst	)
+		__field(	struct ppc_inst *,	last_inst	)
 	),
 
 	TP_fast_assign(
@@ -233,7 +233,7 @@ TRACE_EVENT(kvm_exit,
 		__entry->dar		= kvmppc_get_fault_dar(vcpu);
 		__entry->msr		= kvmppc_get_msr(vcpu);
 		__entry->srr1		= vcpu->arch.shadow_srr1;
-		__entry->last_inst	= vcpu->arch.last_inst;
+		__entry->last_inst	= &(vcpu->arch.last_inst);
 	),
 
 	TP_printk("exit=%s"
@@ -241,14 +241,14 @@ TRACE_EVENT(kvm_exit,
 		" | msr=0x%lx"
 		" | dar=0x%lx"
 		" | srr1=0x%lx"
-		" | last_inst=0x%lx"
+		" | last_inst=%s"
 		,
 		__print_symbolic(__entry->exit_nr, kvm_trace_symbol_exit),
 		__entry->pc,
 		__entry->msr,
 		__entry->dar,
 		__entry->srr1,
-		__entry->last_inst
+		ppc_inst_as_str(*(__entry->last_inst))
 		)
 );
 
diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
index 9cc17eb62462..7bcf3b300f42 100644
--- a/arch/powerpc/lib/inst.c
+++ b/arch/powerpc/lib/inst.c
@@ -18,7 +18,7 @@ int probe_user_read_inst(struct ppc_inst *inst,
 	err = copy_from_user_nofault(&val, nip, sizeof(val));
 	if (err)
 		return err;
-	if (get_op(val) == OP_PREFIX) {
+	if ((val >> 26) == OP_PREFIX) {
 		err = copy_from_user_nofault(&suffix, (void __user *)nip + 4, 4);
 		*inst = ppc_inst_prefix(val, suffix);
 	} else {
@@ -36,7 +36,7 @@ int probe_kernel_read_inst(struct ppc_inst *inst,
 	err = copy_from_kernel_nofault(&val, src, sizeof(val));
 	if (err)
 		return err;
-	if (get_op(val) == OP_PREFIX) {
+	if ((val >> 26) == OP_PREFIX) {
 		err = copy_from_kernel_nofault(&suffix, (void *)src + 4, 4);
 		*inst = ppc_inst_prefix(val, suffix);
 	} else {
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index caee8cc77e19..885bcd5170af 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1348,7 +1348,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		rd = (suffix >> 21) & 0x1f;
 		op->reg = rd;
 		op->val = regs->gpr[rd];
-		suffixopcode = get_op(suffix);
+		suffixopcode = suffix >> 26;
 		prefixtype = (word >> 24) & 0x3;
 		switch (prefixtype) {
 		case 2:
@@ -2737,7 +2737,7 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		op->reg = rd;
 		op->val = regs->gpr[rd];
 
-		suffixopcode = get_op(suffix);
+		suffixopcode = suffix >> 26;
 		prefixtype = (word >> 24) & 0x3;
 		switch (prefixtype) {
 		case 0: /* Type 00  Eight-Byte Load/Store */
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index 040b9d01c079..488f8c3d4d0e 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -930,7 +930,7 @@ u64 fsl_pci_immrbar_base(struct pci_controller *hose)
 }
 
 #ifdef CONFIG_E500
-static int mcheck_handle_load(struct pt_regs *regs, u32 inst)
+static int mcheck_handle_load(struct pt_regs *regs, struct ppc_inst inst)
 {
 	unsigned int rd, ra, rb, d;
 
@@ -1050,7 +1050,7 @@ static int is_in_pci_mem_space(phys_addr_t addr)
 
 int fsl_pci_mcheck_exception(struct pt_regs *regs)
 {
-	u32 inst;
+	struct ppc_inst inst;
 	int ret;
 	phys_addr_t addr = 0;
 
-- 
2.17.1


^ permalink raw reply related

* [RFC PATCH 2/2] KVM: PPC: Book3S HV: Support prefixed instructions
From: Jordan Niethe @ 2020-08-20  3:39 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: Jordan Niethe
In-Reply-To: <20200820033922.32311-1-jniethe5@gmail.com>

There are two main places where instructions are loaded from the guest:
    * Emulate loadstore - such as when performing MMIO emulation
      triggered by an HDSI
    * After an HV emulation assistance interrupt (e40)

If it is a prefixed instruction that triggers these cases, its suffix
must be loaded. Use the SRR1_PREFIX bit to decide if a suffix needs to
be loaded. Make sure if this bit is set inject_interrupt() also sets it
when giving an interrupt to the guest.

ISA v3.10 extends the Hypervisor Emulation Instruction Register (HEIR)
to 64 bits long to accommodate prefixed instructions. For interrupts
caused by a word instruction the instruction is loaded into bits 32:63
and bits 0:31 are zeroed. When caused by a prefixed instruction the
prefix and suffix are loaded into bits 0:63.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
 arch/powerpc/kvm/book3s.c               | 15 +++++++++++++--
 arch/powerpc/kvm/book3s_64_mmu_hv.c     | 10 +++++++---
 arch/powerpc/kvm/book3s_hv_builtin.c    |  3 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 14 ++++++++++++++
 4 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 70d8967acc9b..18b1928a571b 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -456,13 +456,24 @@ int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
 {
 	ulong pc = kvmppc_get_pc(vcpu);
 	u32 word;
+	u64 doubleword;
 	int r;
 
 	if (type == INST_SC)
 		pc -= 4;
 
-	r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
-	*inst = ppc_inst(word);
+	if ((kvmppc_get_msr(vcpu) & SRR1_PREFIXED)) {
+		r = kvmppc_ld(vcpu, &pc, sizeof(u64), &doubleword, false);
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+		*inst = ppc_inst_prefix(doubleword & 0xffffffff, doubleword >> 32);
+#else
+		*inst = ppc_inst_prefix(doubleword >> 32, doubleword & 0xffffffff);
+#endif
+	} else {
+		r = kvmppc_ld(vcpu, &pc, sizeof(u32), &word, false);
+		*inst = ppc_inst(word);
+	}
+
 	if (r == EMULATE_DONE)
 		return r;
 	else
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 775ce41738ce..0802471f4856 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -411,9 +411,13 @@ static int instruction_is_store(struct ppc_inst instr)
 	unsigned int mask;
 
 	mask = 0x10000000;
-	if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
-		mask = 0x100;		/* major opcode 31 */
-	return (ppc_inst_val(instr) & mask) != 0;
+	if (ppc_inst_prefixed(instr)) {
+		return (ppc_inst_suffix(instr) & mask) != 0;
+	} else {
+		if ((ppc_inst_val(instr) & 0xfc000000) == 0x7c000000)
+			mask = 0x100;		/* major opcode 31 */
+		return (ppc_inst_val(instr) & mask) != 0;
+	}
 }
 
 int kvmppc_hv_emulate_mmio(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..41e07e63104b 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -807,6 +807,9 @@ static void inject_interrupt(struct kvm_vcpu *vcpu, int vec, u64 srr1_flags)
 		new_pc += 0xC000000000004000ULL;
 	}
 
+	if (msr & SRR1_PREFIXED)
+		srr1_flags |= SRR1_PREFIXED;
+
 	kvmppc_set_srr0(vcpu, pc);
 	kvmppc_set_srr1(vcpu, (msr & SRR1_MSR_BITS) | srr1_flags);
 	kvmppc_set_pc(vcpu, new_pc);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 4853b3444c5f..f2a609413621 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1365,6 +1365,16 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
 	bne	11f
 	mfspr	r3,SPRN_HEIR
+	andis.  r0,r11,SRR1_PREFIXED@h
+	cmpwi   r0,0
+	beq	12f
+	rldicl	r4,r3,0,32	/* Suffix */
+	srdi	r3,r3,32	/* Prefix */
+	b	11f
+12:
+BEGIN_FTR_SECTION
+	rldicl	r3,r3,0,32	/* Word */
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_31)
 11:	stw	r3,VCPU_HEIR(r9)
 	stw	r4,VCPU_HEIR+4(r9)
 
@@ -2175,6 +2185,10 @@ fast_interrupt_c_return:
 	ori	r4, r3, MSR_DR		/* Enable paging for data */
 	mtmsrd	r4
 	lwz	r8, 0(r10)
+	andis.  r7, r11, SRR1_PREFIXED@h
+	cmpwi   r7,0
+	beq	+4
+	lwz	r5, 4(r10)
 	mtmsrd	r3
 
 	/* Store the result */
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
From: Michael Ellerman @ 2020-08-20  4:18 UTC (permalink / raw)
  To: Frederic Barrat, linuxppc-dev, oohall
In-Reply-To: <20200819130741.16769-1-fbarrat@linux.ibm.com>

Frederic Barrat <fbarrat@linux.ibm.com> writes:
> Fix typo introduced during recent code cleanup, which could lead to
> silently not freeing resources or oops message (on PCI hotplug or CAPI
> reset).
> Only impacts ioda2, the code path for ioda1 is correct.
>
> Fixes: 01e12629af4e ("powerpc/powernv/pci: Add explicit tracking of the DMA setup state")
> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>

I changed the subject to:

    powerpc/powernv/pci: Fix possible crash when releasing DMA resources


To hopefully better convey that it's a moderately serious bug, even if
the root cause is just a typo. Otherwise folks scanning the commit log
might think it's just a harmless typo.

cheers

> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index c9c25fb0783c..023a4f987bb2 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2705,7 +2705,7 @@ void pnv_pci_ioda2_release_pe_dma(struct pnv_ioda_pe *pe)
>  	struct iommu_table *tbl = pe->table_group.tables[0];
>  	int64_t rc;
>  
> -	if (pe->dma_setup_done)
> +	if (!pe->dma_setup_done)
>  		return;
>  
>  	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> -- 
> 2.26.2

^ permalink raw reply

* [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
From: Andrew Donnellan @ 2020-08-20  4:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nathanl, leobras.c, stable, dja

A number of userspace utilities depend on making calls to RTAS to retrieve
information and update various things.

The existing API through which we expose RTAS to userspace exposes more
RTAS functionality than we actually need, through the sys_rtas syscall,
which allows root (or anyone with CAP_SYS_ADMIN) to make any RTAS call they
want with arbitrary arguments.

Many RTAS calls take the address of a buffer as an argument, and it's up to
the caller to specify the physical address of the buffer as an argument. We
allocate a buffer (the "RMO buffer") in the Real Memory Area that RTAS can
access, and then expose the physical address and size of this buffer in
/proc/powerpc/rtas/rmo_buffer. Userspace is expected to read this address,
poke at the buffer using /dev/mem, and pass an address in the RMO buffer to
the RTAS call.

However, there's nothing stopping the caller from specifying whatever
address they want in the RTAS call, and it's easy to construct a series of
RTAS calls that can overwrite arbitrary bytes (even without /dev/mem
access).

Additionally, there are some RTAS calls that do potentially dangerous
things and for which there are no legitimate userspace use cases.

In the past, this would not have been a particularly big deal as it was
assumed that root could modify all system state freely, but with Secure
Boot and lockdown we need to care about this.

We can't fundamentally change the ABI at this point, however we can address
this by implementing a filter that checks RTAS calls against a list
of permitted calls and forces the caller to use addresses within the RMO
buffer.

The list is based off the list of calls that are used by the librtas
userspace library, and has been tested with a number of existing userspace
RTAS utilities. For compatibility with any applications we are not aware of
that require other calls, the filter can be turned off at build time.

Reported-by: Daniel Axtens <dja@axtens.net>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>

---
v1->v2:
- address comments from mpe
- shorten the names of some struct members
- make the filter array static/ro_after_init, use const char *
- genericise the fixed buffer size cases
- simplify/get rid of some of the error printing
- get rid of rtas_token_name()
---
 arch/powerpc/Kconfig       |  13 ++++
 arch/powerpc/kernel/rtas.c | 153 +++++++++++++++++++++++++++++++++++++
 2 files changed, 166 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1f48bbfb3ce9..8dd42b82379b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -989,6 +989,19 @@ config PPC_SECVAR_SYSFS
 	  read/write operations on these variables. Say Y if you have
 	  secure boot enabled and want to expose variables to userspace.
 
+config PPC_RTAS_FILTER
+	bool "Enable filtering of RTAS syscalls"
+	default y
+	depends on PPC_RTAS
+	help
+	  The RTAS syscall API has security issues that could be used to
+	  compromise system integrity. This option enforces restrictions on the
+	  RTAS calls and arguments passed by userspace programs to mitigate
+	  these issues.
+
+	  Say Y unless you know what you are doing and the filter is causing
+	  problems for you.
+
 endmenu
 
 config ISA_DMA_API
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 806d554ce357..954f41676f69 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -992,6 +992,147 @@ struct pseries_errorlog *get_pseries_errorlog(struct rtas_error_log *log,
 	return NULL;
 }
 
+#ifdef CONFIG_PPC_RTAS_FILTER
+
+/*
+ * The sys_rtas syscall, as originally designed, allows root to pass
+ * arbitrary physical addresses to RTAS calls. A number of RTAS calls
+ * can be abused to write to arbitrary memory and do other things that
+ * are potentially harmful to system integrity, and thus should only
+ * be used inside the kernel and not exposed to userspace.
+ *
+ * All known legitimate users of the sys_rtas syscall will only ever
+ * pass addresses that fall within the RMO buffer, and use a known
+ * subset of RTAS calls.
+ *
+ * Accordingly, we filter RTAS requests to check that the call is
+ * permitted, and that provided pointers fall within the RMO buffer.
+ * The rtas_filters list contains an entry for each permitted call,
+ * with the indexes of the parameters which are expected to contain
+ * addresses and sizes of buffers allocated inside the RMO buffer.
+ */
+struct rtas_filter {
+	const char *name;
+	int token;
+	/* Indexes into the args buffer, -1 if not used */
+	int buf_idx1;
+	int size_idx1;
+	int buf_idx2;
+	int size_idx2;
+
+	int fixed_size;
+};
+
+static struct rtas_filter rtas_filters[] __ro_after_init = {
+	{ "ibm,activate-firmware", -1, -1, -1, -1, -1 },
+	{ "ibm,configure-connector", -1, 0, -1, 1, -1, 4096 },	/* Special cased */
+	{ "display-character", -1, -1, -1, -1, -1 },
+	{ "ibm,display-message", -1, 0, -1, -1, -1 },
+	{ "ibm,errinjct", -1, 2, -1, -1, -1, 1024 },
+	{ "ibm,close-errinjct", -1, -1, -1, -1, -1 },
+	{ "ibm,open-errinct", -1, -1, -1, -1, -1 },
+	{ "ibm,get-config-addr-info2", -1, -1, -1, -1, -1 },
+	{ "ibm,get-dynamic-sensor-state", -1, 1, -1, -1, -1 },
+	{ "ibm,get-indices", -1, 2, 3, -1, -1 },
+	{ "get-power-level", -1, -1, -1, -1, -1 },
+	{ "get-sensor-state", -1, -1, -1, -1, -1 },
+	{ "ibm,get-system-parameter", -1, 1, 2, -1, -1 },
+	{ "get-time-of-day", -1, -1, -1, -1, -1 },
+	{ "ibm,get-vpd", -1, 0, -1, 1, 2 },
+	{ "ibm,lpar-perftools", -1, 2, 3, -1, -1 },
+	{ "ibm,platform-dump", -1, 4, 5, -1, -1 },
+	{ "ibm,read-slot-reset-state", -1, -1, -1, -1, -1 },
+	{ "ibm,scan-log-dump", -1, 0, 1, -1, -1 },
+	{ "ibm,set-dynamic-indicator", -1, 2, -1, -1, -1 },
+	{ "ibm,set-eeh-option", -1, -1, -1, -1, -1 },
+	{ "set-indicator", -1, -1, -1, -1, -1 },
+	{ "set-power-level", -1, -1, -1, -1, -1 },
+	{ "set-time-for-power-on", -1, -1, -1, -1, -1 },
+	{ "ibm,set-system-parameter", -1, 1, -1, -1, -1 },
+	{ "set-time-of-day", -1, -1, -1, -1, -1 },
+	{ "ibm,suspend-me", -1, -1, -1, -1, -1 },
+	{ "ibm,update-nodes", -1, 0, -1, -1, -1, 4096 },
+	{ "ibm,update-properties", -1, 0, -1, -1, -1, 4096 },
+	{ "ibm,physical-attestation", -1, 0, 1, -1, -1 },
+};
+
+static bool in_rmo_buf(u32 base, u32 end)
+{
+	return base >= rtas_rmo_buf &&
+		base < (rtas_rmo_buf + RTAS_RMOBUF_MAX) &&
+		base <= end &&
+		end >= rtas_rmo_buf &&
+		end < (rtas_rmo_buf + RTAS_RMOBUF_MAX);
+}
+
+static bool block_rtas_call(int token, int nargs,
+			    struct rtas_args *args)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
+		struct rtas_filter *f = &rtas_filters[i];
+		u32 base, size, end;
+
+		if (token != f->token)
+			continue;
+
+		if (f->buf_idx1 != -1) {
+			base = be32_to_cpu(args->args[f->buf_idx1]);
+			if (f->size_idx1 != -1)
+				size = be32_to_cpu(args->args[f->size_idx1]);
+			else if (f->fixed_size)
+				size = f->fixed_size;
+			else
+				size = 1;
+
+			end = base + size - 1;
+			if (!in_rmo_buf(base, end))
+				goto err;
+		}
+
+		if (f->buf_idx2 != -1) {
+			base = be32_to_cpu(args->args[f->buf_idx2]);
+			if (f->size_idx2 != -1)
+				size = be32_to_cpu(args->args[f->size_idx2]);
+			else if (f->fixed_size)
+				size = f->fixed_size;
+			else
+				size = 1;
+			end = base + size - 1;
+
+			/*
+			 * Special case for ibm,configure-connector where the
+			 * address can be 0
+			 */
+			if (!strcmp(f->name, "ibm,configure-connector") &&
+			    base == 0)
+				return false;
+
+			if (!in_rmo_buf(base, end))
+				goto err;
+		}
+
+		return false;
+	}
+
+err:
+	pr_err_ratelimited("sys_rtas: RTAS call blocked - exploit attempt?\n");
+	pr_err_ratelimited("sys_rtas: token=0x%x, nargs=%d (called by %s)\n",
+			   token, nargs, current->comm);
+	return true;
+}
+
+#else
+
+static bool block_rtas_call(int token, int nargs,
+			    struct rtas_args *args)
+{
+	return false;
+}
+
+#endif /* CONFIG_PPC_RTAS_FILTER */
+
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
@@ -1029,6 +1170,9 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	args.rets = &args.args[nargs];
 	memset(args.rets, 0, nret * sizeof(rtas_arg_t));
 
+	if (block_rtas_call(token, nargs, &args))
+		return -EINVAL;
+
 	/* Need to handle ibm,suspend_me call specially */
 	if (token == ibm_suspend_me_token) {
 
@@ -1090,6 +1234,9 @@ void __init rtas_initialize(void)
 	unsigned long rtas_region = RTAS_INSTANTIATE_MAX;
 	u32 base, size, entry;
 	int no_base, no_size, no_entry;
+#ifdef CONFIG_PPC_RTAS_FILTER
+	int i;
+#endif
 
 	/* Get RTAS dev node and fill up our "rtas" structure with infos
 	 * about it.
@@ -1129,6 +1276,12 @@ void __init rtas_initialize(void)
 #ifdef CONFIG_RTAS_ERROR_LOGGING
 	rtas_last_error_token = rtas_token("rtas-last-error");
 #endif
+
+#ifdef CONFIG_PPC_RTAS_FILTER
+	for (i = 0; i < ARRAY_SIZE(rtas_filters); i++) {
+		rtas_filters[i].token = rtas_token(rtas_filters[i].name);
+	}
+#endif
 }
 
 int __init early_init_dt_scan_rtas(unsigned long node,
-- 
2.20.1


^ permalink raw reply related

* [PATCH v2 2/2] selftests/powerpc: Add a rtas_filter selftest
From: Andrew Donnellan @ 2020-08-20  4:45 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: nathanl, leobras.c, dja
In-Reply-To: <20200820044512.7543-1-ajd@linux.ibm.com>

Add a selftest to test the basic functionality of CONFIG_RTAS_FILTER.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
v1->v2:
- new patch

---
 .../selftests/powerpc/syscalls/Makefile       |   2 +-
 .../selftests/powerpc/syscalls/rtas_filter.c  | 285 ++++++++++++++++++
 2 files changed, 286 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/syscalls/rtas_filter.c

diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile b/tools/testing/selftests/powerpc/syscalls/Makefile
index 01b22775ca87..b63f8459c704 100644
--- a/tools/testing/selftests/powerpc/syscalls/Makefile
+++ b/tools/testing/selftests/powerpc/syscalls/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-TEST_GEN_PROGS := ipc_unmuxed
+TEST_GEN_PROGS := ipc_unmuxed rtas_filter
 
 CFLAGS += -I../../../../../usr/include
 
diff --git a/tools/testing/selftests/powerpc/syscalls/rtas_filter.c b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
new file mode 100644
index 000000000000..3b0ac4ff64e5
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/rtas_filter.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2005-2020 IBM Corporation.
+ *
+ * Includes code from librtas (https://github.com/ibm-power-utilities/librtas/)
+ */
+
+#include <byteswap.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <errno.h>
+#include "utils.h"
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define cpu_to_be32(x)		bswap_32(x)
+#define be32_to_cpu(x)		bswap_32(x)
+#else
+#define cpu_to_be32(x)		(x)
+#define be32_to_cpu(x)		(x)
+#endif
+
+#define RTAS_IO_ASSERT	-1098	/* Unexpected I/O Error */
+#define RTAS_UNKNOWN_OP -1099	/* No Firmware Implementation of Function */
+#define BLOCK_SIZE 4096
+#define PAGE_SIZE 4096
+#define MAX_PAGES 64
+
+static const char *ofdt_rtas_path = "/proc/device-tree/rtas";
+
+typedef __be32 uint32_t;
+struct rtas_args {
+	__be32 token;
+	__be32 nargs;
+	__be32 nret;
+	__be32 args[16];
+	__be32 *rets;	  /* Pointer to return values in args[]. */
+};
+
+struct region {
+	uint64_t addr;
+	uint32_t size;
+	struct region *next;
+};
+
+int read_entire_file(int fd, char **buf, size_t *len)
+{
+	size_t buf_size = 0;
+	size_t off = 0;
+	int rc;
+
+	*buf = NULL;
+	do {
+		buf_size += BLOCK_SIZE;
+		if (*buf == NULL)
+			*buf = malloc(buf_size);
+		else
+			*buf = realloc(*buf, buf_size);
+
+		if (*buf == NULL)
+			return -ENOMEM;
+
+		rc = read(fd, *buf + off, BLOCK_SIZE);
+		if (rc < 0)
+			return -EIO;
+
+		off += rc;
+	} while (rc == BLOCK_SIZE);
+
+	if (len)
+		*len = off;
+
+	return 0;
+}
+
+static int open_prop_file(const char *prop_path, const char *prop_name, int *fd)
+{
+	char *path;
+	int len;
+
+	/* allocate enough for two string, a slash and trailing NULL */
+	len = strlen(prop_path) + strlen(prop_name) + 1 + 1;
+	path = malloc(len);
+	if (path == NULL)
+		return -ENOMEM;
+
+	snprintf(path, len, "%s/%s", prop_path, prop_name);
+
+	*fd = open(path, O_RDONLY);
+	free(path);
+	if (*fd < 0)
+		return -errno;
+
+	return 0;
+}
+
+static int get_property(const char *prop_path, const char *prop_name,
+			char **prop_val, size_t *prop_len)
+{
+	int rc, fd;
+
+	rc = open_prop_file(prop_path, prop_name, &fd);
+	if (rc)
+		return rc;
+
+	rc = read_entire_file(fd, prop_val, prop_len);
+	close(fd);
+
+	return rc;
+}
+
+int rtas_token(const char *call_name)
+{
+	char *prop_buf = NULL;
+	size_t len;
+	int rc;
+
+	rc = get_property(ofdt_rtas_path, call_name, &prop_buf, &len);
+	if (rc < 0) {
+		rc = RTAS_UNKNOWN_OP;
+		goto err;
+	}
+
+	rc = be32_to_cpu(*(int *)prop_buf);
+
+err:
+	free(prop_buf);
+	return rc;
+}
+
+static int read_kregion_bounds(struct region *kregion)
+{
+	char *buf;
+	int fd;
+	int rc;
+
+	fd = open("/proc/ppc64/rtas/rmo_buffer", O_RDONLY);
+	if (fd < 0) {
+		printf("Could not open rmo_buffer file\n");
+		return RTAS_IO_ASSERT;
+	}
+
+	rc = read_entire_file(fd, &buf, NULL);
+	close(fd);
+	if (rc) {
+		free(buf);
+		return rc;
+	}
+
+	sscanf(buf, "%" SCNx64 " %x", &kregion->addr, &kregion->size);
+	free(buf);
+
+	if (!(kregion->size && kregion->addr) ||
+	    (kregion->size > (PAGE_SIZE * MAX_PAGES))) {
+		printf("Unexpected kregion bounds\n");
+		return RTAS_IO_ASSERT;
+	}
+
+	return 0;
+}
+
+static int rtas_call(const char *name, int nargs,
+		     int nrets, ...)
+{
+	struct rtas_args args;
+	__be32 *rets[16];
+	int i, rc, token;
+	va_list ap;
+
+	va_start(ap, nrets);
+
+	token = rtas_token(name);
+	if (token == RTAS_UNKNOWN_OP) {
+		// We don't care if the call doesn't exist
+		printf("call '%s' not available, skipping...", name);
+		rc = RTAS_UNKNOWN_OP;
+		goto err;
+	}
+
+	args.token = cpu_to_be32(token);
+	args.nargs = cpu_to_be32(nargs);
+	args.nret = cpu_to_be32(nrets);
+
+	for (i = 0; i < nargs; i++)
+		args.args[i] = (__be32) va_arg(ap, unsigned long);
+
+	for (i = 0; i < nrets; i++)
+		rets[i] = (__be32 *) va_arg(ap, unsigned long);
+
+	rc = syscall(__NR_rtas, &args);
+	if (rc) {
+		rc = -errno;
+		goto err;
+	}
+
+	if (nrets) {
+		*(rets[0]) = be32_to_cpu(args.args[nargs]);
+
+		for (i = 1; i < nrets; i++) {
+			*(rets[i]) = args.args[nargs + i];
+		}
+	}
+
+err:
+	va_end(ap);
+	return rc;
+}
+
+static int test(void)
+{
+	struct region rmo_region;
+	uint64_t rmo_start;
+	uint64_t rmo_end;
+	__be32 rets[1];
+	int rc;
+
+	// Test a legitimate harmless call
+	// Expected: call succeeds
+	printf("Test a permitted call, no parameters... ");
+	rc = rtas_call("get-time-of-day", 0, 1, rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != 0 && rc != RTAS_UNKNOWN_OP);
+
+	// Test a prohibited call
+	// Expected: call returns -EINVAL
+	printf("Test a prohibited call... ");
+	rc = rtas_call("nvram-fetch", 0, 1, rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != -EINVAL && rc != RTAS_UNKNOWN_OP);
+
+	// Get RMO
+	rc = read_kregion_bounds(&rmo_region);
+	if (rc) {
+		printf("Couldn't read RMO region bounds, skipping remaining cases\n");
+		return 0;
+	}
+	rmo_start = rmo_region.addr;
+	rmo_end = rmo_start + rmo_region.size - 1;
+	printf("RMO range: %08lx - %08lx\n", rmo_start, rmo_end);
+
+	// Test a permitted call, user-supplied size, buffer inside RMO
+	// Expected: call succeeds
+	printf("Test a permitted call, user-supplied size, buffer inside RMO... ");
+	rc = rtas_call("ibm,get-system-parameter", 3, 1, 0, cpu_to_be32(rmo_start),
+		       cpu_to_be32(rmo_end - rmo_start + 1), rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != 0 && rc != RTAS_UNKNOWN_OP);
+
+	// Test a permitted call, user-supplied size, buffer start outside RMO
+	// Expected: call returns -EINVAL
+	printf("Test a permitted call, user-supplied size, buffer start outside RMO... ");
+	rc = rtas_call("ibm,get-system-parameter", 3, 1, 0, cpu_to_be32(rmo_end + 1),
+		       cpu_to_be32(4000), rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != -EINVAL && rc != RTAS_UNKNOWN_OP);
+
+	// Test a permitted call, user-supplied size, buffer end outside RMO
+	// Expected: call returns -EINVAL
+	printf("Test a permitted call, user-supplied size, buffer end outside RMO... ");
+	rc = rtas_call("ibm,get-system-parameter", 3, 1, 0, cpu_to_be32(rmo_start),
+		       cpu_to_be32(rmo_end - rmo_start + 2), rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != -EINVAL && rc != RTAS_UNKNOWN_OP);
+
+	// Test a permitted call, fixed size, buffer end outside RMO
+	// Expected: call returns -EINVAL
+	printf("Test a permitted call, fixed size, buffer end outside RMO... ");
+	rc = rtas_call("ibm,configure-connector", 2, 1, cpu_to_be32(rmo_end - 4000), 0, rets);
+	printf("rc: %d\n", rc);
+	FAIL_IF(rc != -EINVAL && rc != RTAS_UNKNOWN_OP);
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test, "rtas_filter");
+}
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 04/13] mm/debug_vm_pgtables/hugevmap: Use the arch helper to identify huge vmap support.
From: Aneesh Kumar K.V @ 2020-08-20  5:31 UTC (permalink / raw)
  To: kernel test robot, linux-mm, akpm
  Cc: kbuild-all, linuxppc-dev, Anshuman Khandual
In-Reply-To: <202008200034.CGkDyB26%lkp@intel.com>

kernel test robot <lkp@intel.com> writes:

> Hi "Aneesh,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on hnaz-linux-mm/master]
> [also build test ERROR on powerpc/next linus/master v5.9-rc1 next-20200819]
> [cannot apply to mmotm/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
>
> url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/mm-debug_vm_pgtable-fixes/20200819-213446
> base:   https://github.com/hnaz/linux-mm master
> config: i386-randconfig-s002-20200818 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
> reproduce:
>         # apt-get install sparse
>         # sparse version: v0.6.2-183-gaa6ede3b-dirty
>         # save the attached .config to linux build tree
>         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>>> arch/x86/mm/ioremap.c:484:12: error: redefinition of 'arch_ioremap_p4d_supported'
>      484 | int __init arch_ioremap_p4d_supported(void)
>          |            ^~~~~~~~~~~~~~~~~~~~~~~~~~
>    In file included from arch/x86/mm/ioremap.c:12:
>    include/linux/io.h:41:19: note: previous definition of 'arch_ioremap_p4d_supported' was here
>       41 | static inline int arch_ioremap_p4d_supported(void)
>          |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~

I guess trying to work this out without using #ifdef is complex. I ended
up with the below

1 file changed, 12 insertions(+), 2 deletions(-)
mm/debug_vm_pgtable.c | 14 ++++++++++++--

modified   mm/debug_vm_pgtable.c
@@ -202,11 +202,12 @@ static void __init pmd_leaf_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(!pmd_leaf(pmd));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+	if (!arch_ioremap_pmd_supported())
 		return;
 
 	pr_debug("Validating PMD huge\n");
@@ -220,6 +221,10 @@ static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot)
 	pmd = READ_ONCE(*pmdp);
 	WARN_ON(!pmd_none(pmd));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pmd_huge_tests(pmd_t *pmdp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 
 static void __init pmd_savedwrite_tests(unsigned long pfn, pgprot_t prot)
 {
@@ -316,11 +321,12 @@ static void __init pud_leaf_tests(unsigned long pfn, pgprot_t prot)
 	WARN_ON(!pud_leaf(pud));
 }
 
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMAP
 static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
 {
 	pud_t pud;
 
-	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMAP))
+	if (!arch_ioremap_pud_supported())
 		return;
 
 	pr_debug("Validating PUD huge\n");
@@ -334,6 +340,10 @@ static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot)
 	pud = READ_ONCE(*pudp);
 	WARN_ON(!pud_none(pud));
 }
+#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+static void __init pud_huge_tests(pud_t *pudp, unsigned long pfn, pgprot_t prot) { }
+#endif /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
+
 #else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
 static void __init pud_advanced_tests(struct mm_struct *mm,

^ permalink raw reply

* [PATCH v2] powerpc/pseries: Do not initiate shutdown when system is running on UPS
From: Vasant Hegde @ 2020-08-20  6:18 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Vasant Hegde, Tyrel Datwyler, stable

As per PAPR we have to look for both EPOW sensor value and event modifier to
identify type of event and take appropriate action.

Sensor value = 3 (EPOW_SYSTEM_SHUTDOWN) schedule system to be shutdown after
                  OS defined delay (default 10 mins).

EPOW Event Modifier for sensor value = 3:
   We have to initiate immediate shutdown for most of the event modifier except
   value = 2 (system running on UPS).

Checking with firmware document its clear that we have to wait for predefined
time before initiating shutdown. If power is restored within time we should
cancel the shutdown process. I think commit 79872e35 accidently enabled
immediate poweroff for EPOW_SHUTDOWN_ON_UPS event.

We have user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS.
Once it gets event it initiates shutdown after predefined time. Also starts
monitoring for any new EPOW events. If it receives "Power restored" event
before predefined time it will cancel the shutdown. Otherwise after
predefined time it will shutdown the system.

Fixes: 79872e35 (powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown)
Cc: stable@vger.kernel.org # v4.0+
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
Changes in v2:
  - Updated patch description based on mpe, Tyrel comment.

-Vasant
 arch/powerpc/platforms/pseries/ras.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index f3736fcd98fc..13c86a292c6d 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -184,7 +184,6 @@ static void handle_system_shutdown(char event_modifier)
 	case EPOW_SHUTDOWN_ON_UPS:
 		pr_emerg("Loss of system power detected. System is running on"
 			 " UPS/battery. Check RTAS error log for details\n");
-		orderly_poweroff(true);
 		break;

 	case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
-- 
2.26.2

^ permalink raw reply related

* Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
From: Frederic Barrat @ 2020-08-20  7:39 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, oohall
In-Reply-To: <874koyhrwm.fsf@mpe.ellerman.id.au>



Le 20/08/2020 à 06:18, Michael Ellerman a écrit :
> I changed the subject to:
> 
>      powerpc/powernv/pci: Fix possible crash when releasing DMA resources


Much better, thanks!

   Fred

^ permalink raw reply

* Re: [PATCH] kernel/watchdog: fix warning -Wunused-variable for watchdog_allowed_mask in ppc64
From: Petr Mladek @ 2020-08-20 11:30 UTC (permalink / raw)
  To: Balamuruhan S
  Cc: ravi.bangoria, peterz, gpiccoli, linux-kernel, rdna,
	Jisheng.Zhang, viro, tglx, linuxppc-dev, naveen.n.rao, sandipan
In-Reply-To: <20200814133330.210093-1-bala24@linux.ibm.com>

On Fri 2020-08-14 19:03:30, Balamuruhan S wrote:
> In ppc64 config if `CONFIG_SOFTLOCKUP_DETECTOR` is not set then it
> warns for unused declaration of `watchdog_allowed_mask` while building,
> move the declaration inside ifdef later in the code.
> 
> ```
> kernel/watchdog.c:47:23: warning: ‘watchdog_allowed_mask’ defined but not used [-Wunused-variable]
>  static struct cpumask watchdog_allowed_mask __read_mostly;
> ```
> 
> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
> ---
>  kernel/watchdog.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5abb5b22ad13..33c9b8a3d51b 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -44,7 +44,6 @@ int __read_mostly soft_watchdog_user_enabled = 1;
>  int __read_mostly watchdog_thresh = 10;
>  static int __read_mostly nmi_watchdog_available;
>  
> -static struct cpumask watchdog_allowed_mask __read_mostly;
>  
>  struct cpumask watchdog_cpumask __read_mostly;
>  unsigned long *watchdog_cpumask_bits = cpumask_bits(&watchdog_cpumask);
> @@ -166,6 +165,7 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace;
>  unsigned int __read_mostly softlockup_panic =
>  			CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;
>  
> +static struct cpumask watchdog_allowed_mask __read_mostly;

I could confirm that the variable is used only in code that is built
when CONFIG_SOFTLOCKUP_DETECTOR is enabled.

Note that the problem can't be seen on x86. There the softlockup
detector is enforced together with hardloclup detector via
via HARDLOCKUP_DETECTOR_PERF.

Reviewed-by: Petr Mladek <pmladek@suse.com>

Best Regards,
Petr

^ permalink raw reply

* Re: [PATCH v2] powerpc/pseries: Do not initiate shutdown when system is running on UPS
From: Michael Ellerman @ 2020-08-20 12:57 UTC (permalink / raw)
  To: Vasant Hegde, linuxppc-dev; +Cc: Vasant Hegde, Tyrel Datwyler, stable
In-Reply-To: <20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com>

Vasant Hegde <hegdevasant@linux.vnet.ibm.com> writes:
> As per PAPR we have to look for both EPOW sensor value and event modifier to
> identify type of event and take appropriate action.
>
> Sensor value = 3 (EPOW_SYSTEM_SHUTDOWN) schedule system to be shutdown after
>                   OS defined delay (default 10 mins).
>
> EPOW Event Modifier for sensor value = 3:
>    We have to initiate immediate shutdown for most of the event modifier except
>    value = 2 (system running on UPS).
>
> Checking with firmware document its clear that we have to wait for predefined
> time before initiating shutdown. If power is restored within time we should
> cancel the shutdown process. I think commit 79872e35 accidently enabled
> immediate poweroff for EPOW_SHUTDOWN_ON_UPS event.

It's not that clear to me :)

LoPAPR v1.1 section 10.2.2 includes table 136 "EPOW Action Codes":

  SYSTEM_SHUTDOWN 3

  The system must be shut down. An EPOW-aware OS logs the EPOW error
  log information, then schedules the system to be shut down to begin
  after an OS defined delay internal (default is 10 minutes.)

And then in section 10.3.2.2.8 there is table 146 "Platform Event Log
Format, Version 6, EPOW Section", which includes the "EPOW Event
Modifier":

  For EPOW sensor value = 3
  0x01 = Normal system shutdown with no additional delay
  0x02 = Loss of utility power, system is running on UPS/Battery
  0x03 = Loss of system critical functions, system should be shutdown
  0x04 = Ambient temperature too high
  All other values = reserved

There is also section 7.3.6.4 which includes a note saying:

  2. The report that a system needs to be shutdown due to running under
  a UPS would be given by the platform as an EPOW event with EPOW event
  modifier being given as, 0x02 = Loss of utility power, system is
  running on UPS/Battery, as described in section Section 10.3.2.2.8‚
  “Platform Event Log Format, EPOW Section‚” on page 308.

So the only mention of the 10 minutes is in relation to all
SYSTEM_SHUTDOWN events. ie. according to that we should not be doing an
immediate shutdown for any of the events.

> We have user space tool (rtas_errd) on LPAR to monitor for EPOW_SHUTDOWN_ON_UPS.
> Once it gets event it initiates shutdown after predefined time. Also starts
> monitoring for any new EPOW events. If it receives "Power restored" event
> before predefined time it will cancel the shutdown. Otherwise after
> predefined time it will shutdown the system.

What event are you referring to as the "Power restored" event? AFAICS
PAPR just says we "may" receive an EPOW_RESET.

I can't see anything else about what we're supposed to do if power is
restored.

Anyway I'm not opposed to the change, but I don't think it's correct to
say that PAPR defines the behaviour.

Rather we used to implement a certain behaviour, and we have at least
one customer who relies on that old behaviour and dislikes the new
behaviour. It's also generally good to defer decisions like this to
userspace, so that administrators can customise the behaviour.

Anyway I'll massage the change log a bit to incorporate some of the
above and apply it.

cheers

> Fixes: 79872e35 (powerpc/pseries: All events of EPOW_SYSTEM_SHUTDOWN must initiate shutdown)
> Cc: stable@vger.kernel.org # v4.0+
> Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
> ---
> Changes in v2:
>   - Updated patch description based on mpe, Tyrel comment.
>
> -Vasant
>  arch/powerpc/platforms/pseries/ras.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index f3736fcd98fc..13c86a292c6d 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -184,7 +184,6 @@ static void handle_system_shutdown(char event_modifier)
>  	case EPOW_SHUTDOWN_ON_UPS:
>  		pr_emerg("Loss of system power detected. System is running on"
>  			 " UPS/battery. Check RTAS error log for details\n");
> -		orderly_poweroff(true);
>  		break;
>  
>  	case EPOW_SHUTDOWN_LOSS_OF_CRITICAL_FUNCTIONS:
> -- 
> 2.26.2

^ permalink raw reply

* Re: [PATCH] powerpc/powernv/pci: Fix typo when releasing DMA resources
From: Michael Ellerman @ 2020-08-20 13:31 UTC (permalink / raw)
  To: oohall, linuxppc-dev, Frederic Barrat
In-Reply-To: <20200819130741.16769-1-fbarrat@linux.ibm.com>

On Wed, 19 Aug 2020 15:07:41 +0200, Frederic Barrat wrote:
> Fix typo introduced during recent code cleanup, which could lead to
> silently not freeing resources or oops message (on PCI hotplug or CAPI
> reset).
> Only impacts ioda2, the code path for ioda1 is correct.

Applied to powerpc/fixes.

[1/1] powerpc/powernv/pci: Fix possible crash when releasing DMA resources
      https://git.kernel.org/powerpc/c/e17a7c0e0aebb956719ce2a8465f649859c2da7d

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/perf: Account for interrupts during PMC overflow for an invalid SIAR check
From: Michael Ellerman @ 2020-08-20 13:31 UTC (permalink / raw)
  To: mpe, Athira Rajeev; +Cc: aik, maddy, linuxppc-dev
In-Reply-To: <1596717992-7321-1-git-send-email-atrajeev@linux.vnet.ibm.com>

On Thu, 6 Aug 2020 08:46:32 -0400, Athira Rajeev wrote:
> Performance monitor interrupt handler checks if any counter has overflown
> and calls `record_and_restart` in core-book3s which invokes
> `perf_event_overflow` to record the sample information.
> Apart from creating sample, perf_event_overflow also does the interrupt
> and period checks via perf_event_account_interrupt.
> 
> Currently we record information only if the SIAR valid bit is set
> ( using `siar_valid` check ) and hence the interrupt check.
> But it is possible that we do sampling for some events that are not
> generating valid SIAR and hence there is no chance to disable the event
> if interrupts is more than max_samples_per_tick. This leads to soft lockup.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/perf: Fix soft lockups due to missed interrupt accounting
      https://git.kernel.org/powerpc/c/17899eaf88d689529b866371344c8f269ba79b5f

cheers

^ permalink raw reply

* Re: [PATCH v2] powerpc/pseries: Do not initiate shutdown when system is running on UPS
From: Michael Ellerman @ 2020-08-20 13:31 UTC (permalink / raw)
  To: linuxppc-dev, Vasant Hegde; +Cc: Tyrel Datwyler, stable
In-Reply-To: <20200820061844.306460-1-hegdevasant@linux.vnet.ibm.com>

On Thu, 20 Aug 2020 11:48:44 +0530, Vasant Hegde wrote:
> As per PAPR we have to look for both EPOW sensor value and event modifier to
> identify type of event and take appropriate action.
> 
> Sensor value = 3 (EPOW_SYSTEM_SHUTDOWN) schedule system to be shutdown after
>                   OS defined delay (default 10 mins).
> 
> EPOW Event Modifier for sensor value = 3:
>    We have to initiate immediate shutdown for most of the event modifier except
>    value = 2 (system running on UPS).
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/pseries: Do not initiate shutdown when system is running on UPS
      https://git.kernel.org/powerpc/c/90a9b102eddf6a3f987d15f4454e26a2532c1c98

cheers

^ permalink raw reply

* Re: [PATCH v2 07/13] mm/debug_vm_pgtable/set_pte/pmd/pud: Don't use set_*_at to update an existing pte entry
From: Christophe Leroy @ 2020-08-20 14:32 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linux-mm, akpm; +Cc: linuxppc-dev, Anshuman Khandual
In-Reply-To: <20200819130107.478414-8-aneesh.kumar@linux.ibm.com>



Le 19/08/2020 à 15:01, Aneesh Kumar K.V a écrit :
> set_pte_at() should not be used to set a pte entry at locations that
> already holds a valid pte entry. Architectures like ppc64 don't do TLB
> invalidate in set_pte_at() and hence expect it to be used to set locations
> that are not a valid PTE.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   mm/debug_vm_pgtable.c | 35 +++++++++++++++--------------------
>   1 file changed, 15 insertions(+), 20 deletions(-)
> 
> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
> index 76f4c713e5a3..9c7e2c9cfc76 100644
> --- a/mm/debug_vm_pgtable.c
> +++ b/mm/debug_vm_pgtable.c
> @@ -74,15 +74,18 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
>   {
>   	pte_t pte = pfn_pte(pfn, prot);
>   
> +	/*
> +	 * Architectures optimize set_pte_at by avoiding TLB flush.
> +	 * This requires set_pte_at to be not used to update an
> +	 * existing pte entry. Clear pte before we do set_pte_at
> +	 */
> +
>   	pr_debug("Validating PTE advanced\n");
>   	pte = pfn_pte(pfn, prot);
>   	set_pte_at(mm, vaddr, ptep, pte);
>   	ptep_set_wrprotect(mm, vaddr, ptep);
>   	pte = ptep_get(ptep);
>   	WARN_ON(pte_write(pte));
> -
> -	pte = pfn_pte(pfn, prot);
> -	set_pte_at(mm, vaddr, ptep, pte);
>   	ptep_get_and_clear(mm, vaddr, ptep);
>   	pte = ptep_get(ptep);
>   	WARN_ON(!pte_none(pte));
> @@ -96,13 +99,11 @@ static void __init pte_advanced_tests(struct mm_struct *mm,
>   	ptep_set_access_flags(vma, vaddr, ptep, pte, 1);
>   	pte = ptep_get(ptep);
>   	WARN_ON(!(pte_write(pte) && pte_dirty(pte)));
> -
> -	pte = pfn_pte(pfn, prot);
> -	set_pte_at(mm, vaddr, ptep, pte);
>   	ptep_get_and_clear_full(mm, vaddr, ptep, 1);
>   	pte = ptep_get(ptep);
>   	WARN_ON(!pte_none(pte));
>   
> +	pte = pfn_pte(pfn, prot);
>   	pte = pte_mkyoung(pte);
>   	set_pte_at(mm, vaddr, ptep, pte);
>   	ptep_test_and_clear_young(vma, vaddr, ptep);
> @@ -164,9 +165,6 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>   	pmdp_set_wrprotect(mm, vaddr, pmdp);
>   	pmd = READ_ONCE(*pmdp);
>   	WARN_ON(pmd_write(pmd));
> -
> -	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
> -	set_pmd_at(mm, vaddr, pmdp, pmd);
>   	pmdp_huge_get_and_clear(mm, vaddr, pmdp);
>   	pmd = READ_ONCE(*pmdp);
>   	WARN_ON(!pmd_none(pmd));
> @@ -180,13 +178,11 @@ static void __init pmd_advanced_tests(struct mm_struct *mm,
>   	pmdp_set_access_flags(vma, vaddr, pmdp, pmd, 1);
>   	pmd = READ_ONCE(*pmdp);
>   	WARN_ON(!(pmd_write(pmd) && pmd_dirty(pmd)));
> -
> -	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
> -	set_pmd_at(mm, vaddr, pmdp, pmd);
>   	pmdp_huge_get_and_clear_full(vma, vaddr, pmdp, 1);
>   	pmd = READ_ONCE(*pmdp);
>   	WARN_ON(!pmd_none(pmd));
>   
> +	pmd = pmd_mkhuge(pfn_pmd(pfn, prot));
>   	pmd = pmd_mkyoung(pmd);
>   	set_pmd_at(mm, vaddr, pmdp, pmd);
>   	pmdp_test_and_clear_young(vma, vaddr, pmdp);
> @@ -283,18 +279,10 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
>   	WARN_ON(pud_write(pud));
>   
>   #ifndef __PAGETABLE_PMD_FOLDED

Same as below, once set_put_at() is gone, I don't think this #ifndef 
__PAGETABLE_PMD_FOLDED is still need, should be possible to replace by 
'if (mm_pmd_folded())'

> -
> -	pud = pud_mkhuge(pfn_pud(pfn, prot));
> -	set_pud_at(mm, vaddr, pudp, pud);
>   	pudp_huge_get_and_clear(mm, vaddr, pudp);
>   	pud = READ_ONCE(*pudp);
>   	WARN_ON(!pud_none(pud));
>   
> -	pud = pud_mkhuge(pfn_pud(pfn, prot));
> -	set_pud_at(mm, vaddr, pudp, pud);
> -	pudp_huge_get_and_clear_full(mm, vaddr, pudp, 1);
> -	pud = READ_ONCE(*pudp);
> -	WARN_ON(!pud_none(pud));
>   #endif /* __PAGETABLE_PMD_FOLDED */
>   
>   	pud = pud_mkhuge(pfn_pud(pfn, prot));
> @@ -307,6 +295,13 @@ static void __init pud_advanced_tests(struct mm_struct *mm,
>   	pud = READ_ONCE(*pudp);
>   	WARN_ON(!(pud_write(pud) && pud_dirty(pud)));
>   
> +#ifndef __PAGETABLE_PMD_FOLDED
> +	pudp_huge_get_and_clear_full(vma, vaddr, pudp, 1);
> +	pud = READ_ONCE(*pudp);
> +	WARN_ON(!pud_none(pud));
> +#endif /* __PAGETABLE_PMD_FOLDED */

pudp_huge_get_and_clear_full() and pud_none() are always defined, I 
think this #ifndef can be replaced by an 'if (mm_pmd_folded())'

> +
> +	pud = pud_mkhuge(pfn_pud(pfn, prot));
>   	pud = pud_mkyoung(pud);
>   	set_pud_at(mm, vaddr, pudp, pud);
>   	pudp_test_and_clear_young(vma, vaddr, pudp);
> 

Christophe

^ permalink raw reply

* [PATCH] powerpc: Fix a bug in __div64_32 if divisor is zero
From: Guohua Zhong @ 2020-08-20 13:10 UTC (permalink / raw)
  To: paulus, mpe, benh, gregkh
  Cc: nixiaoming, wangle6, linuxppc-dev, linux-kernel, stable

When cat /proc/pid/stat, do_task_stat will call into cputime_adjust,
which call stack is like this:

[17179954.674326]BookE Watchdog detected hard LOCKUP on cpu 0
[17179954.674331]dCPU: 0 PID: 1262 Comm: TICK Tainted: P        W  O    4.4.176 #1
[17179954.674339]dtask: dc9d7040 task.stack: d3cb4000
[17179954.674344]NIP: c001b1a8 LR: c006a7ac CTR: 00000000
[17179954.674349]REGS: e6fe1f10 TRAP: 3202   Tainted: P        W  O     (4.4.176)
[17179954.674355]MSR: 00021002 <CE,ME>  CR: 28002224  XER: 00000000
[17179954.674364]
GPR00: 00000016 d3cb5cb0 dc9d7040 d3cb5cc0 00000000 0000025d ffe15b24 ffffffff
GPR08: de86aead 00000000 000003ff ffffffff 28002222 0084d1c0 00000000 ffffffff
GPR16: b5929ca0 b4bb7a48 c0863c08 0000048d 00000062 00000062 00000000 0000000f
GPR24: 00000000 d3cb5d08 d3cb5d60 d3cb5d64 00029002 d3e9c214 fffff30e d3e9c20c
[17179954.674410]NIP [c001b1a8] __div64_32+0x60/0xa0
[17179954.674422]LR [c006a7ac] cputime_adjust+0x124/0x138
[17179954.674434]Call Trace:
[17179961.832693]Call Trace:
[17179961.832695][d3cb5cb0] [c006a6dc] cputime_adjust+0x54/0x138 (unreliable)
[17179961.832705][d3cb5cf0] [c006a818] task_cputime_adjusted+0x58/0x80
[17179961.832713][d3cb5d20] [c01dab44] do_task_stat+0x298/0x870
[17179961.832720][d3cb5de0] [c01d4948] proc_single_show+0x60/0xa4
[17179961.832728][d3cb5e10] [c01963d8] seq_read+0x2d8/0x52c
[17179961.832736][d3cb5e80] [c01702fc] __vfs_read+0x40/0x114
[17179961.832744][d3cb5ef0] [c0170b1c] vfs_read+0x9c/0x10c
[17179961.832751][d3cb5f10] [c0171440] SyS_read+0x68/0xc4
[17179961.832759][d3cb5f40] [c0010a40] ret_from_syscall+0x0/0x3c

do_task_stat->task_cputime_adjusted->cputime_adjust->scale_stime->div_u64
->div_u64_rem->do_div->__div64_32

In some corner case, stime + utime = 0 if overflow. Even in v5.8.2  kernel
the cputime has changed from unsigned long to u64 data type. About 200
days, the lowwer 32 bit will be 0x00000000. Because divisor for __div64_32
is unsigned long data type,which is 32 bit for powepc 32, the bug still
exists.

So it is also a bug in the cputime_adjust which does not check if
stime + utime = 0

time = scale_stime((__force u64)stime, (__force u64)rtime,
                (__force u64)(stime + utime));

The commit 3dc167ba5729 ("sched/cputime: Improve cputime_adjust()") in
mainline kernel may has fixed this case. But it is also better to check
if divisor is 0 in __div64_32 for other situation.

Signed-off-by: Guohua Zhong <zhongguohua1@huawei.com>
Fixes:14cf11af6cf6 "( powerpc: Merge enough to start building in arch/powerpc.)"
Fixes:94b212c29f68 "( powerpc: Move ppc64 boot wrapper code over to arch/powerpc)"
Cc: stable@vger.kernel.org # v2.6.15+
---
 arch/powerpc/boot/div64.S | 4 ++++
 arch/powerpc/lib/div64.S  | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/boot/div64.S b/arch/powerpc/boot/div64.S
index 4354928ed62e..39a25b9712d1 100644
--- a/arch/powerpc/boot/div64.S
+++ b/arch/powerpc/boot/div64.S
@@ -13,6 +13,9 @@
 
 	.globl __div64_32
 __div64_32:
+	li	r9,0
+	cmplw	r4,r9	# check if divisor r4 is zero
+	beq	5f			# jump to label 5 if r4(divisor) is zero
 	lwz	r5,0(r3)	# get the dividend into r5/r6
 	lwz	r6,4(r3)
 	cmplw	r5,r4
@@ -52,6 +55,7 @@ __div64_32:
 4:	stw	r7,0(r3)	# return the quotient in *r3
 	stw	r8,4(r3)
 	mr	r3,r6		# return the remainder in r3
+5:					# return if divisor r4 is zero
 	blr
 
 /*
diff --git a/arch/powerpc/lib/div64.S b/arch/powerpc/lib/div64.S
index 3d5426e7dcc4..1cc9bcabf678 100644
--- a/arch/powerpc/lib/div64.S
+++ b/arch/powerpc/lib/div64.S
@@ -13,6 +13,9 @@
 #include <asm/processor.h>
 
 _GLOBAL(__div64_32)
+	li	r9,0
+	cmplw	r4,r9	# check if divisor r4 is zero
+	beq	5f			# jump to label 5 if r4(divisor) is zero
 	lwz	r5,0(r3)	# get the dividend into r5/r6
 	lwz	r6,4(r3)
 	cmplw	r5,r4
@@ -52,4 +55,5 @@ _GLOBAL(__div64_32)
 4:	stw	r7,0(r3)	# return the quotient in *r3
 	stw	r8,4(r3)
 	mr	r3,r6		# return the remainder in r3
+5:					# return if divisor r4 is zero
 	blr
-- 
2.12.3



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox