* Re: [PATCH] powerpc: Add cpu family documentation
From: Michael Ellerman @ 2014-02-01 4:28 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev
In-Reply-To: <669D726F-25DF-4703-AD30-CAE7CA142970@kernel.crashing.org>
On Fri, 2014-01-31 at 07:32 -0600, Kumar Gala wrote:
> On Jan 29, 2014, at 8:38 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> > +Freescale BookE
> > +---------------
> > +
> > + - Software loaded TLB.
> > + - e6500 adds HW loaded indirect TLB entries.
> > + - Mix of 32 & 64 bit
> > +
> > + e200 --- e500 --- e500v2 --- e500mc --- e5500 --- e6500
> > + (Book3E) (HW TLB)
> > + (64bit)
> > +
>
> e200 is its own core family that doesn’t have any relation to e500 line other than being book-e
>
> might want to add multithreaded to e6500.
Thanks Kumar.
cheers
* [PATCH v3 3/3] powerpc/pseries: Report in kernel device tree update to drmgr
From: Tyrel Datwyler @ 2014-01-31 23:58 UTC (permalink / raw)
To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1391212692-16217-1-git-send-email-tyreld@linux.vnet.ibm.com>
Traditionally it has been drmgr's responsibility to update the device tree
through the /proc/ppc64/ofdt interface after a suspend/resume operation.
This patchset however has modified suspend/resume ops to perform that update
entirely in the kernel during the resume. Therefore, a mechanism is required
for drmgr to determine who is responsible for the update. This patch adds a
show function to the "hibernate" attribute that returns 1 if the kernel
updates the device tree after the resume and 0 if drmgr is responsible.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/suspend.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
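For illustration only, here is a minimal userspace sketch of how a tool such
as drmgr might consume the new attribute. The sysfs path and the sketch_*
helper names are assumptions and are not taken from this patch or from drmgr;
only the meaning of 1 and 0 comes from the commit message above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed sysfs location of the attribute; not stated in the patch. */
#define SKETCH_HIBERNATE_ATTR "/sys/devices/system/power/hibernate"

/*
 * Parse the attribute text.
 * Returns 1 if the kernel updates the device tree, 0 if drmgr must,
 * and -1 if the text is not a plain 0 or 1.
 */
int sketch_parse_dt_owner(const char *text)
{
	char *end;
	long val;

	if (!text)
		return -1;
	val = strtol(text, &end, 10);
	if (end == text)
		return -1;	/* no digits found */
	if (val != 0 && val != 1)
		return -1;
	return (int)val;
}

/* Read the attribute from 'path'; returns -1 if it cannot be read. */
int sketch_read_dt_owner(const char *path)
{
	char buf[16];
	FILE *f = fopen(path, "r");
	int owner = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		owner = sketch_parse_dt_owner(buf);
	fclose(f);
	return owner;
}
```

A caller could invoke sketch_read_dt_owner(SKETCH_HIBERNATE_ATTR) and fall
back to the userspace update when the result is not 1. On a kernel without
this patch the attribute has no show function, so the read is expected to
fail; treating that as "drmgr is responsible" is one plausible policy, again
an assumption.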
diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 1d9c580..b87b978 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -192,7 +192,30 @@ out:
return rc;
}
-static DEVICE_ATTR(hibernate, S_IWUSR, NULL, store_hibernate);
+#define USER_DT_UPDATE 0
+#define KERN_DT_UPDATE 1
+
+/**
+ * show_hibernate - Report device tree update responsibility
+ * @dev: subsys root device
+ * @attr: device attribute struct
+ * @buf: buffer
+ *
+ * Report whether a device tree update is performed by the kernel after a
+ * resume, or if drmgr must coordinate the update from user space.
+ *
+ * Return value:
+ * 0 if drmgr is to initiate update, and 1 otherwise
+ **/
+static ssize_t show_hibernate(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return sprintf(buf, "%d\n", KERN_DT_UPDATE);
+}
+
+static DEVICE_ATTR(hibernate, S_IWUSR | S_IRUGO,
+ show_hibernate, store_hibernate);
static struct bus_type suspend_subsys = {
.name = "power",
--
1.7.12.2
* [PATCH v3 2/3] powerpc/pseries: Update dynamic cache nodes for suspend/resume operation
From: Tyrel Datwyler @ 2014-01-31 23:58 UTC (permalink / raw)
To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1391212692-16217-1-git-send-email-tyreld@linux.vnet.ibm.com>
From: Haren Myneni <hbabu@us.ibm.com>
pHyp can change cache nodes for suspend/resume operation. The current code
updates the device tree after all non boot CPUs are enabled. Hence, we do not
modify the cache list based on the latest cache nodes. Also we do not remove
cache entries for the primary CPU.
This patch removes the cache list for the boot CPU, updates the device tree
before enabling nonboot CPUs and adds cache list for the boot cpu.
Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/rtas.h | 1 +
arch/powerpc/platforms/pseries/suspend.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 9bd52c6..a0e1add 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -283,6 +283,7 @@ extern void pSeries_log_error(char *buf, unsigned int err_type, int fatal);
#ifdef CONFIG_PPC_PSERIES
extern int pseries_devicetree_update(s32 scope);
+extern void post_mobility_fixup(void);
#endif
#ifdef CONFIG_PPC_RTAS_DAEMON
diff --git a/arch/powerpc/platforms/pseries/suspend.c b/arch/powerpc/platforms/pseries/suspend.c
index 16a2552..1d9c580 100644
--- a/arch/powerpc/platforms/pseries/suspend.c
+++ b/arch/powerpc/platforms/pseries/suspend.c
@@ -26,6 +26,7 @@
#include <asm/mmu.h>
#include <asm/rtas.h>
#include <asm/topology.h>
+#include "../../kernel/cacheinfo.h"
static u64 stream_id;
static struct device suspend_dev;
@@ -79,6 +80,23 @@ static int pseries_suspend_cpu(void)
}
/**
+ * pseries_suspend_enable_irqs
+ *
+ * Post suspend configuration updates
+ *
+ **/
+static void pseries_suspend_enable_irqs(void)
+{
+ /*
+ * Update configuration which can be modified based on device tree
+ * changes during resume.
+ */
+ cacheinfo_cpu_offline(smp_processor_id());
+ post_mobility_fixup();
+ cacheinfo_cpu_online(smp_processor_id());
+}
+
+/**
* pseries_suspend_enter - Final phase of hibernation
*
* Return value:
@@ -235,6 +253,7 @@ static int __init pseries_suspend_init(void)
return rc;
ppc_md.suspend_disable_cpu = pseries_suspend_cpu;
+ ppc_md.suspend_enable_irqs = pseries_suspend_enable_irqs;
suspend_set_ops(&pseries_suspend_ops);
return 0;
}
--
1.7.12.2
* [PATCH v3 1/3] powerpc/pseries: Device tree should only be updated once after suspend/migrate
From: Tyrel Datwyler @ 2014-01-31 23:58 UTC (permalink / raw)
To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
In-Reply-To: <1391212692-16217-1-git-send-email-tyreld@linux.vnet.ibm.com>
From: Haren Myneni <hbabu@us.ibm.com>
The current code makes rtas calls for update-nodes, activate-firmware and then
update-nodes again. The FW provides the same data for both update-nodes calls.
As a result a proc entry exists error is reported for the second update while
adding device nodes.
This patch makes a single rtas call for update-nodes after activating the FW.
It also adds an rtas_busy_delay() retry loop around the activate-firmware rtas call.
Signed-off-by: Haren Myneni <hbabu@us.ibm.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/mobility.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
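As a rough standalone illustration of the retry pattern this patch introduces,
the sketch below repeats a firmware-style call while it reports a busy status.
It is not kernel code: the sketch_* names, the status values and the retry
bound are assumptions made for the example (the loop in the patch is
unbounded and relies on rtas_busy_delay() to sleep between attempts).

```c
/* Status values assumed for this sketch, modelled on RTAS busy conventions. */
#define SKETCH_BUSY		(-2)
#define SKETCH_EXT_DELAY_MIN	9900
#define SKETCH_EXT_DELAY_MAX	9905

typedef int (*sketch_fw_call)(void *ctx);

/* Nonzero if 'rc' means "try again later" rather than success or error. */
static int sketch_is_busy(int rc)
{
	return rc == SKETCH_BUSY ||
	       (rc >= SKETCH_EXT_DELAY_MIN && rc <= SKETCH_EXT_DELAY_MAX);
}

/* Repeat 'call' until it stops reporting busy, at most 'max_tries' times. */
int sketch_call_until_done(sketch_fw_call call, void *ctx, int max_tries)
{
	int rc = SKETCH_BUSY;
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = call(ctx);
		if (!sketch_is_busy(rc))
			break;
		/* A real caller would sleep here before retrying. */
	}
	return rc;
}

/* Fake firmware call: busy while *ctx > 0, then succeeds with 0. */
int sketch_fake_call(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return SKETCH_BUSY;
	}
	return 0;
}
```

The final status is returned to the caller either way, which mirrors how the
patch logs a failure from activate-firmware but still goes on to request the
device tree update.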
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index cde4e0a..bde7eba 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -290,13 +290,6 @@ void post_mobility_fixup(void)
int rc;
int activate_fw_token;
- rc = pseries_devicetree_update(MIGRATION_SCOPE);
- if (rc) {
- printk(KERN_ERR "Initial post-mobility device tree update "
- "failed: %d\n", rc);
- return;
- }
-
activate_fw_token = rtas_token("ibm,activate-firmware");
if (activate_fw_token == RTAS_UNKNOWN_SERVICE) {
printk(KERN_ERR "Could not make post-mobility "
@@ -304,16 +297,17 @@ void post_mobility_fixup(void)
return;
}
- rc = rtas_call(activate_fw_token, 0, 1, NULL);
- if (!rc) {
- rc = pseries_devicetree_update(MIGRATION_SCOPE);
- if (rc)
- printk(KERN_ERR "Secondary post-mobility device tree "
- "update failed: %d\n", rc);
- } else {
+ do {
+ rc = rtas_call(activate_fw_token, 0, 1, NULL);
+ } while (rtas_busy_delay(rc));
+
+ if (rc)
printk(KERN_ERR "Post-mobility activate-fw failed: %d\n", rc);
- return;
- }
+
+ rc = pseries_devicetree_update(MIGRATION_SCOPE);
+ if (rc)
+ printk(KERN_ERR "Post-mobility device tree update "
+ "failed: %d\n", rc);
return;
}
--
1.7.12.2
* [PATCH v3 0/3] powerpc/pseries: fix issues in suspend/resume code
From: Tyrel Datwyler @ 2014-01-31 23:58 UTC (permalink / raw)
To: linuxppc-dev; +Cc: nfont, Tyrel Datwyler
This patchset fixes a couple of issues encountered in the suspend/resume code
base. First, when using the kernel device tree update code, update-nodes is
unnecessarily called more than once. Second, the cpu cache lists are not
updated after a suspend/resume, which under certain conditions may cause a
panic. Finally, since the cache list fix utilizes the in-kernel device tree
update code, a means for telling drmgr not to perform a device tree update
from userspace is required.
Changes from v2:
- Moved dynamic configuration update code into pseries specific routine
per Nathan's suggestion.
Changes from v1:
- Fixed several commit message typos
- Fixed authorship of first two patches
Haren Myneni (2):
powerpc/pseries: Device tree should only be updated once after
suspend/migrate
powerpc/pseries: Update dynamic cache nodes for suspend/resume
operation
Tyrel Datwyler (1):
powerpc/pseries: Report in kernel device tree update to drmgr
arch/powerpc/include/asm/rtas.h | 1 +
arch/powerpc/platforms/pseries/mobility.c | 26 +++++++-----------
arch/powerpc/platforms/pseries/suspend.c | 44 ++++++++++++++++++++++++++++++-
3 files changed, 54 insertions(+), 17 deletions(-)
--
1.7.12.2
* Re: PCIe Access - achieve bursts without DMA
From: David Hawkins @ 2014-01-31 23:18 UTC (permalink / raw)
To: Moese, Michael; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1391208815.27142.38.camel@pasglop>
Hi Michael,
>> I'm currently trying to benchmark access speeds to our PCIe-connected IP-cores
>> located inside our FPGA. On x86-based systems I was able to achieve bursts for
>> both read and write access. On PPC32, using an e500v2, I had no success at all
>> so far.
Whenever I want to benchmark PCI/PCIe performance I do the
following tests:
1. Peripheral board DMA (board-to-board)
Use two of your FPGA boards in a chassis and DMA between them.
In a PCI system, you can put the cards on the same bus segment and
then between a bridge and see how that affects things. In your case,
the PCIe traffic will all be via the root-complex/switch, so
you should get the same performance regardless of which PCIe slot
you use.
This is likely the "best you can do" as far as bursts go.
2. Peripheral board DMA to host memory.
In this case I typically insmod a simple driver on the host that
gives me a page of memory, and then DMA into and out of that
memory, using the DMA controller on the peripheral.
3. Host (root complex) DMA.
If your host has a DMA controller, then program it per (2).
As far as "verification" of your custom peripheral board FPGA IP is
concerned, if I was a customer, and you had data for (1) and (2),
I'd be pretty happy (and could care less about (2), since it's so
system dependent).
Since it's an FPGA-based IP, I'd also expect to see a PCIe simulation
with Bus Functional Models showing what the optimal performance of
your IP was, and then how it nicely matches with the measurements
in (1). If you do not have a PCIe logic analyzer, both Xilinx and
Altera have Chipscope/SignalTap logic analyzers that can be used
for tracing traffic at the TLP layer inside the FPGA.
Just some thoughts ...
Cheers,
Dave
* Re: PCIe Access - achieve bursts without DMA
From: Benjamin Herrenschmidt @ 2014-01-31 22:53 UTC (permalink / raw)
To: Moese, Michael; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>
On Thu, 2014-01-30 at 12:20 +0000, Moese, Michael wrote:
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-cores
> located inside our FPGA. On x86-based systems I was able to achieve bursts for
> both read and write access. On PPC32, using an e500v2, I had no success at all
> so far.
> I tried using ioremap_wc(), like I did on x86, for writing, and it only results in my
> writes just being single requests, one after another.
Hrm, ioremap_wc will give you a mapping without the G (guard) bit.
Whether that results in some store gathering or not on IOs depends on a
specific HW implementation, you'll have to check with the FSP folks on
that one, there could also be a chicken switch (HID bit or similar)
needed to enable that (there was on some earlier ppc32 chips).
Another thing you can try is to use FP register load/stores.
> For reads, I noticed I could not ioremap_cache() on PPC, so I used simple ioremap()
> here.
> I used several ways to read from the device, from simple readl(),memcpy_from_io(),
> memcpy() to cacheable_memcpy() - with no improvements. Even when just issuing
> a batch of prefetch()-calls for all the memory to read did not result in read bursts.
>
> I only get really poor results, writing is possible with around 40 MiByte/s, whereas I
> can read at about only 3 MiByte/s.
> After hours of studying the reference manual from freescale, looking into other code
> and searching the web, I'm close to resignation.
>
> Maybe someone of you has some more directions for me, I'd appreciate every hint
> that leads me to my problem's solution - maybe I just missed something or lack
> knowledge about this architecture in general.
>
> Thanks for your reading.
>
>
> Michael
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Paul Mackerras @ 2014-01-31 22:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm-devel
In-Reply-To: <5C99D2BA-7E11-4012-B3BD-9B01F4F865ED@suse.de>
On Fri, Jan 31, 2014 at 11:47:44AM +0100, Alexander Graf wrote:
>
> On 31.01.2014, at 11:38, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > Alexander Graf <agraf@suse.de> writes:
> >
> >> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
> >>> We definitely don't need to emulate mtspr, because both the registers
> >>> are hypervisor resource.
> >>
> >> This patch description doesn't cover what the patch actually does. It
> >> changes the implementation from "always tell the guest it uses 100%" to
> >> "give the guest an accurate amount of cpu time spent inside guest
> >> context".
> >
> > Will fix that
> >
> >>
> >> Also, I think we either go with full hyp semantics which means we also
> >> emulate the offset or we go with no hyp awareness in the guest at all
> >> which means we also don't emulate SPURR which is a hyp privileged
> >> register.
> >
> > Can you clarify this ?
>
> In the 2.06 ISA SPURR is hypervisor privileged. That changed for 2.07 where it became supervisor privileged. So I suppose your patch is ok. When reviewing those patches I only had 2.06 around because power.org was broken.
It's always been supervisor privilege for reading and hypervisor
privilege for writing, ever since it was introduced in 2.05, and that
hasn't changed. So I think what Aneesh is doing is correct.
Regards,
Paul.
* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Cody P Schafer @ 2014-01-31 20:59 UTC (permalink / raw)
To: Michael Ellerman
Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
Arnaldo Carvalho de Melo, Linux PPC
In-Reply-To: <52E05E49.3010903@linux.vnet.ibm.com>
On 01/22/2014 04:11 PM, Cody P Schafer wrote:
> On 01/21/2014 05:32 PM, Michael Ellerman wrote:
>> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>>> performance counters: gpci ("get performance counter info") and 24x7.
Any comments on/things that need fixing for this patch set to be merged?
* Re: [PATCH 2/2] Fix coding style errors
From: Brian W Hart @ 2014-01-31 19:34 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <1390878454-4329-1-git-send-email-stewartb2@gmail.com>
On Mon, Jan 27, 2014 at 09:07:34PM -0600, Brandon Stewart wrote:
> I corrected several coding errors.
>
> Signed-off-by: Brandon Stewart <stewartb2@gmail.com>
> ---
> drivers/macintosh/adb.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
> index 53611de..dd3f49a 100644
> --- a/drivers/macintosh/adb.c
> +++ b/drivers/macintosh/adb.c
> @@ -623,7 +623,7 @@ do_adb_query(struct adb_request *req)
> {
> int ret = -EINVAL;
>
> - switch(req->data[1]) {
> + switch (req->data[1]) {
> case ADB_QUERY_GETDEVINFO:
> if (req->nbytes < 3)
> break;
> @@ -792,8 +792,9 @@ static ssize_t adb_write(struct file *file, const char __user *buf,
> }
> /* Special case for ADB_BUSRESET request, all others are sent to
> the controller */
> - else if ((req->data[0] == ADB_PACKET) && (count > 1)
> - && (req->data[1] == ADB_BUSRESET)) {
> + else if (req->data[0] == ADB_PACKET &&
> + req->data[1] == ADB_BUSRESET &&
> + count > 1) {
Is this re-ordering safe? Isn't 'count > 1' notionally indicating whether
req->data[1] exists to be tested in the first place?
On the other hand there's a check at the top of the routine that returns
if count < 2, so maybe the check here should be removed altogether (along
with one a few lines above)?
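To spell out the general concern, here is a small standalone sketch (not the
adb.c code; the constants and the helper name are hypothetical). Because &&
evaluates left to right and stops at the first false operand, putting the
length test first guarantees that data[1] is only read when it exists:

```c
#include <stddef.h>

/* Hypothetical values, chosen only for this sketch. */
#define SKETCH_PACKET	1
#define SKETCH_BUSRESET	2

/*
 * Returns 1 if the buffer holds a bus-reset packet, 0 otherwise.
 * 'count > 1' is tested first so that data[1] is never read from a
 * buffer that is too short.
 */
int sketch_is_busreset(const unsigned char *data, size_t count)
{
	return count > 1 &&
	       data[0] == SKETCH_PACKET &&
	       data[1] == SKETCH_BUSRESET;
}
```

With the length test moved to the end, a one-byte buffer starting with
SKETCH_PACKET would have data[1] read before the length is checked. Whether
that matters in adb_write() depends on the earlier count check mentioned
above.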
* Re: [PATCH] powerpc/eeh: drop taken reference to driver on eeh_rmv_device
From: Thadeu Lima de Souza Cascardo @ 2014-01-31 17:24 UTC (permalink / raw)
To: Gavin Shan; +Cc: linuxppc-dev, paulus
In-Reply-To: <20140131004611.GA6790@shangw.(null)>
On Fri, Jan 31, 2014 at 08:46:11AM +0800, Gavin Shan wrote:
> On Thu, Jan 30, 2014 at 11:00:48AM -0200, Thadeu Lima de Souza Cascardo wrote:
> >Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use
> >partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
> >which may grab a reference to a driver, but not release it.
> >
> >That prevents a driver from being removed after it has gone through EEH
> >recovery.
> >
> >This patch drops the reference in either exit path if it was taken.
> >
> >Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> >---
> > arch/powerpc/kernel/eeh_driver.c | 5 ++++-
> > 1 files changed, 4 insertions(+), 1 deletions(-)
> >
> >diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> >index 7bb30dc..afe7337 100644
> >--- a/arch/powerpc/kernel/eeh_driver.c
> >+++ b/arch/powerpc/kernel/eeh_driver.c
> >@@ -364,7 +364,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
> > return NULL;
> > driver = eeh_pcid_get(dev);
> > if (driver && driver->err_handler)
> >- return NULL;
> >+ goto out;
> >
> > /* Remove it from PCI subsystem */
> > pr_debug("EEH: Removing %s without EEH sensitive driver\n",
> >@@ -377,6 +377,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
>
> For normal case (driver without EEH support), we probably release the reference
> to the driver before pci_stop_and_remove_bus_device().
You are right, we need to call it before we call
pci_stop_and_remove_bus_device, otherwise dev->driver will be NULL, and
eeh_pcid_put will not do module_put. On the other hand, we could change
the call to eeh_pcid_put to accept struct pci_driver instead.
>
> > pci_stop_and_remove_bus_device(dev);
> > pci_unlock_rescan_remove();
> >
> >+out:
> >+ if (driver)
> >+ eeh_pcid_put(dev);
> > return NULL;
>
> We needn't "if (driver)" here as eeh_pcid_put() already had the check.
>
What if try_module_get returned false on eeh_pcid_get?
How about something like the patch below?
> > }
> >
>
> Thanks,
> Gavin
---
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7bb30dc..3a397fa 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -352,6 +352,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
struct eeh_dev *edev = (struct eeh_dev *)data;
struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
int *removed = (int *)userdata;
+ bool has_err_handler;
/*
* Actually, we should remove the PCI bridges as well.
@@ -362,8 +363,12 @@ static void *eeh_rmv_device(void *data, void *userdata)
*/
if (!dev || (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
return NULL;
+
driver = eeh_pcid_get(dev);
- if (driver && driver->err_handler)
+ has_err_handler = driver && driver->err_handler;
+ if (driver)
+ eeh_pcid_put(dev);
+ if (has_err_handler)
return NULL;
/* Remove it from PCI subsystem */
---
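The idea in the patch above (take the reference, record what is needed, drop
the reference, then decide) can be sketched outside the kernel as follows.
The struct and the sketch_* helpers are hypothetical stand-ins for
eeh_pcid_get()/eeh_pcid_put() and do not model module refcounting exactly.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver object with a plain reference counter. */
struct sketch_driver {
	int refs;
	bool has_err_handler;
};

/* Take a reference; returns NULL when there is no driver. */
struct sketch_driver *sketch_get(struct sketch_driver *drv)
{
	if (drv)
		drv->refs++;
	return drv;
}

/* Drop a reference previously taken with sketch_get(). */
void sketch_put(struct sketch_driver *drv)
{
	if (drv)
		drv->refs--;
}

/*
 * Returns true if the device should be left alone because its driver
 * has an error handler. The reference is always dropped before
 * returning, so no exit path can leak it.
 */
bool sketch_keep_device(struct sketch_driver *drv)
{
	struct sketch_driver *held = sketch_get(drv);
	bool has_err_handler = held && held->has_err_handler;

	sketch_put(held);
	return has_err_handler;
}
```

Copying the flag into a local before the put means the later decision no
longer touches the driver object, which is why the put can happen before the
device is removed.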
* Re: [PATCH] powerpc: Add cpu family documentation
From: Kumar Gala @ 2014-01-31 13:32 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1391049480-29346-1-git-send-email-mpe@ellerman.id.au>
On Jan 29, 2014, at 8:38 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> This patch adds some documentation on the different cpu families
> supported by arch/powerpc.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> Documentation/powerpc/cpu_families.txt | 76 ++++++++++++++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
> create mode 100644 Documentation/powerpc/cpu_families.txt
>
> diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
> new file mode 100644
> index 0000000..df72657
> --- /dev/null
> +++ b/Documentation/powerpc/cpu_families.txt
> @@ -0,0 +1,76 @@
> +CPU Families
> +============
> +
> +This doco tries to summarise some of the different cpu families that exist and
> +are supported by arch/powerpc.
> +
> +Book3S (aka sPAPR)
> +------------------
> +
> + - Hash MMU
> + - Mix of 32 & 64 bit
> +
> + Old
> + POWER --- 601 --- 603
> + | | |
> + | | *----- 740
> + | | |
> + | | *----- 750 (G3) --- 750CX --- 750CL --- 750FX
> + | | |
> + | | |
> + | 604 *--- 7400 --- 7410 --- 7450 --- 7455 --- 7447 --- 7448
> + | |
> + | |
> + | *---- [620] --- POWER3/630 --- POWER3+ --- POWER4 --- POWER4+ --- POWER5 --- POWER5+ --- POWER5++ --- POWER6 --- POWER7 --- POWER7+ --- POWER8
> + | (64bit) | .
> + | | .
> + | | *--- Cell
> + | |
> + | *--- 970 --- 970FX --- 970MP
> + |
> + *--- RS64 (threads)
> +
> +
> + PA6T (64bit) ...
> +
> +
> +IBM BookE
> +---------
> +
> + - Software loaded TLB.
> + - All 32 bit
> +
> + 401 --- 403 --- 405 --- 440 --- 450 --- 460 --- 476
> + |
> + *--- BG/P
> +
> +
> +Motorola/Freescale 8xx
> +----------------------
> +
> + - Software loaded with hardware assist.
> + - All 32 bit
> +
> + 8xx --- 850
> +
> +
> +Freescale BookE
> +---------------
> +
> + - Software loaded TLB.
> + - e6500 adds HW loaded indirect TLB entries.
> + - Mix of 32 & 64 bit
> +
> + e200 --- e500 --- e500v2 --- e500mc --- e5500 --- e6500
> + (Book3E) (HW TLB)
> + (64bit)
> +
e200 is its own core family that doesn’t have any relation to e500 line other than being book-e
might want to add multithreaded to e6500.
> +IBM A2 core
> +-----------
> +
> + - Book3E, software loaded TLB + HW loaded indirect TLB entries.
> + - 64 bit
> +
> + A2 core --- BG/Q
> + |
> + *------- WSP
> --
> 1.8.3.2
>
* Re: PCIe Access - achieve bursts without DMA
From: Gabriel Paubert @ 2014-01-31 12:31 UTC (permalink / raw)
To: Moese, Michael; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>
On Thu, Jan 30, 2014 at 12:20:21PM +0000, Moese, Michael wrote:
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-cores
> located inside our FPGA. On x86-based systems I was able to achieve bursts for
> both read and write access. On PPC32, using an e500v2, I had no success at all
> so far.
> I tried using ioremap_wc(), like I did on x86, for writing, and it only results in my
> writes just being single requests, one after another.
I believe that on PPC, write-combine is directly mapped to nocache. I can't remember
if there is a writethrough option for ioremap (but adding it would probably be
relatively easy).
> For reads, I noticed I could not ioremap_cache() on PPC, so I used simple ioremap()
> here.
You might be able to use ioremap_cache and using direct cache control instruction
(dcbf/dcbi) to achieve your goals. This becomes similar to handling machines with
no hardware cache coherency. You have to know the hardware cache line size to make
this work.
This said, it might be better to mark the memory as guarded and non-coherent
(WIMG=0000), I don't know what ioremap_cache does for the MG bits and don't
have the time to look it up right now.
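To make the point about cache line size concrete, here is a minimal sketch of
walking a buffer one cache line at a time. The helper name and the callback
are hypothetical; in real code the callback would issue the cache control
instruction (for example dcbf) on each line, and the line size would have to
come from the CPU documentation.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_line_op)(uintptr_t line_addr, void *ctx);

/*
 * Call 'op' once for every cache line that overlaps [start, start + len).
 * 'line_size' must be a power of two. Returns the number of lines visited,
 * or 0 for an empty range or an invalid line size.
 */
size_t sketch_for_each_line(uintptr_t start, size_t len, size_t line_size,
			    sketch_line_op op, void *ctx)
{
	uintptr_t addr, end;
	size_t lines = 0;

	if (len == 0 || line_size == 0 || (line_size & (line_size - 1)))
		return 0;

	/* Round the start down to the beginning of its cache line. */
	addr = start & ~(uintptr_t)(line_size - 1);
	end = start + len;

	for (; addr < end; addr += line_size) {
		if (op)
			op(addr, ctx);
		lines++;
	}
	return lines;
}
```

Note that a range which is not aligned to the line size touches one more line
than its length alone suggests, which is why the line size has to be known
exactly.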
> I used several ways to read from the device, from simple readl(),memcpy_from_io(),
> memcpy() to cacheable_memcpy() - with no improvements. Even when just issuing
> a batch of prefetch()-calls for all the memory to read did not result in read bursts.
If the device data you want to read is supposed to be cacheable (which means basically
that the data does not change unexpectedly under you, i.e., is not as volatile as
a typical device I/O register), you don't want to use readl() which adds some
synchronization to the read.
Prefetch only works on writeback memory, maybe writethrough, expecting it to work on
cache-inhibited memory is contradictory.
Regards,
Gabriel
* Re: [PATCH 0/2] Fixes for PCI-E link speed
From: Benjamin Herrenschmidt @ 2014-01-31 12:29 UTC (permalink / raw)
To: Kleber Sacilotto de Souza; +Cc: Brian King, Paul Mackerras, linuxppc-dev
In-Reply-To: <52EB94F6.6000800@linux.vnet.ibm.com>
On Fri, 2014-01-31 at 10:20 -0200, Kleber Sacilotto de Souza wrote:
> On 01/17/2014 11:56 AM, Kleber Sacilotto de Souza wrote:
> > These two patches fix problems on the PCI-E link speed detection.
> > The first one fixes a regression and adds some improvements on the
> > code, and the second one adds definitions for Gen3 speeds.
> >
> > Kleber Sacilotto de Souza (2):
> > powerpc/pseries: fix regression on PCI link speed
> > powerpc/pseries: add Gen3 definitions for PCIE link speed
> >
> > arch/powerpc/platforms/pseries/pci.c | 22 +++++++++++++++-------
> > 1 files changed, 15 insertions(+), 7 deletions(-)
> >
>
> Hi,
>
> Any feedback on this patch series?
Patches on this list are tracked in patchwork so are generally not
"lost". Plus I was on vacation last week. So there's no need for such
pings unless much more time has elapsed. I'll probably put it in after
-rc1.
Ben.
* Re: [PATCH 0/2] Fixes for PCI-E link speed
From: Kleber Sacilotto de Souza @ 2014-01-31 12:20 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Brian King, Paul Mackerras
In-Reply-To: <1389967012-7774-1-git-send-email-klebers@linux.vnet.ibm.com>
On 01/17/2014 11:56 AM, Kleber Sacilotto de Souza wrote:
> These two patches fix problems on the PCI-E link speed detection.
> The first one fixes a regression and adds some improvements on the
> code, and the second one adds definitions for Gen3 speeds.
>
> Kleber Sacilotto de Souza (2):
> powerpc/pseries: fix regression on PCI link speed
> powerpc/pseries: add Gen3 definitions for PCIE link speed
>
> arch/powerpc/platforms/pseries/pci.c | 22 +++++++++++++++-------
> 1 files changed, 15 insertions(+), 7 deletions(-)
>
Hi,
Any feedback on this patch series?
Thanks,
--
Kleber Sacilotto de Souza
IBM Linux Technology Center
* Re: [RFC PATCH 08/10] KVM: PPC: BOOK3S: PR: Add support for facility unavailable interrupt
From: Alexander Graf @ 2014-01-31 12:02 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87lhxwjs60.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 12:40, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> At this point we allow all the supported facilities except EBB. So
>>> forward the interrupt to guest as illegal instruction.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/include/asm/kvm_asm.h | 4 +++-
>>> arch/powerpc/kvm/book3s.c | 4 ++++
>>> arch/powerpc/kvm/book3s_emulate.c | 18 ++++++++++++++++++
>>> arch/powerpc/kvm/book3s_pr.c | 17 +++++++++++++++++
>>> 4 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>>> index 1bd92fd43cfb..799244face51 100644
>>> --- a/arch/powerpc/include/asm/kvm_asm.h
>>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>>> @@ -99,6 +99,7 @@
>>> #define BOOK3S_INTERRUPT_PERFMON 0xf00
>>> #define BOOK3S_INTERRUPT_ALTIVEC 0xf20
>>> #define BOOK3S_INTERRUPT_VSX 0xf40
>>> +#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
>>>
>>> #define BOOK3S_IRQPRIO_SYSTEM_RESET 0
>>> #define BOOK3S_IRQPRIO_DATA_SEGMENT 1
>>> @@ -117,7 +118,8 @@
>>> #define BOOK3S_IRQPRIO_DECREMENTER 14
>>> #define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15
>>> #define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16
>>> -#define BOOK3S_IRQPRIO_MAX 17
>>> +#define BOOK3S_IRQPRIO_FAC_UNAVAIL 17
>>> +#define BOOK3S_IRQPRIO_MAX 18
>>>
>>> #define BOOK3S_HFLAG_DCBZ32 0x1
>>> #define BOOK3S_HFLAG_SLB 0x2
>>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>>> index 8912608b7e1b..a9aea28c2677 100644
>>> --- a/arch/powerpc/kvm/book3s.c
>>> +++ b/arch/powerpc/kvm/book3s.c
>>> @@ -143,6 +143,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
>>> case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break;
>>> case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break;
>>> case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break;
>>> + case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break;
>>> default: prio = BOOK3S_IRQPRIO_MAX; break;
>>> }
>>>
>>> @@ -273,6 +274,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>>> case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
>>> vec = BOOK3S_INTERRUPT_PERFMON;
>>> break;
>>> + case BOOK3S_IRQPRIO_FAC_UNAVAIL:
>>> + vec = BOOK3S_INTERRUPT_FAC_UNAVAIL;
>>> + break;
>>> default:
>>> deliver = 0;
>>> printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
>>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>>> index 60d0b6b745e7..bf6b11021250 100644
>>> --- a/arch/powerpc/kvm/book3s_emulate.c
>>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>>> @@ -481,6 +481,15 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>>> vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr;
>>> break;
>>> }
>>> + case SPRN_EBBHR:
>>> + vcpu->arch.ebbhr = spr_val;
>>> + break;
>>> + case SPRN_EBBRR:
>>> + vcpu->arch.ebbrr = spr_val;
>>> + break;
>>> + case SPRN_BESCR:
>>> + vcpu->arch.bescr = spr_val;
>>> + break;
>>> unprivileged:
>>> default:
>>> printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
>>> @@ -607,6 +616,15 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>>> case SPRN_FSCR:
>>> *spr_val = vcpu->arch.fscr;
>>> break;
>>> + case SPRN_EBBHR:
>>> + *spr_val = vcpu->arch.ebbhr;
>>> + break;
>>> + case SPRN_EBBRR:
>>> + *spr_val = vcpu->arch.ebbrr;
>>> + break;
>>> + case SPRN_BESCR:
>>> + *spr_val = vcpu->arch.bescr;
>>> + break;
>>> default:
>>> unprivileged:
>>> printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
>>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>>> index 51d469f8c9fd..828056ec208f 100644
>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>> @@ -900,6 +900,23 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>> case BOOK3S_INTERRUPT_PERFMON:
>>> r = RESUME_GUEST;
>>> break;
>>> + case BOOK3S_INTERRUPT_FAC_UNAVAIL:
>>> + {
>>> + /*
>>> + * Check for the facility that need to be emulated
>>> + */
>>> + ulong fscr_ic = vcpu->arch.shadow_fscr >> 56;
>>> + if (fscr_ic != FSCR_EBB_LG) {
>>> + /*
>>> + * We only disable EBB facility.
>>> + * So only emulate that.
>>
>> I don't understand the comment. We emulate nothing at all here. We either
>> - hit an EBB unavailable in which case we send the guest an illegal
>> instruction interrupt or we
>> - hit another facility interrupt in which case we forward the
>> interrupt to the guest, but not the interrupt cause (fscr_ic).
>>
>
> What i wanted to achive was, enable both TAR and DSCR and disable
> EBB. The reason to disable EBB was, we are still not clear how to =
handle
> PMU details in PR. Now with FSCR carrying that value, we would get
> facility unavailable interrupt when we try to mfspr/mtspr few EBB
> related registers. The PR guest kernel do that on context switch
> (_switch). Now what we do here is to fallthrough and handle that via
> emulate mtspr/mfspr.
>=20
> If we get facility unavailable interrupt due to any other reason, that
> means PR guest has explicitly disabled that facility. Hence we forward
> that as facility unavailable interrupt to guest allowing PR guest to
> handle that.=20
Please adjust the comment accordingly. From the code flow that is very
unclear. "Disable" means we don't allow the guest to access EBB. You do
want to allow the guest to use a fake version of EBB by emulating the
facility unavailable interrupt.
    if (fscr_ic == FSCR_EBB_LG) {
        /*
         * We filtered EBB out of FSCR so that we get traps whenever the
         * guest is trying to access EBB registers. Thanks to that we can
         * now emulate these instructions and expose a virtual (no-op)
         * ebb facility to the guest
         */
        <call instruction emulation>
    } else {
        /* forward interrupt to the guest */
    }
Alex
>
>
>> I think the EBB case should be explicit:
>>
>> /* We don't allow EBB inside the guest, so something must have gone
>> terribly wrong */
>> if (fscr_ic == FSCR_EBB_LG)
>> BUG();
>>
>
> Instead of BUG, we do handle few mfspr/mtspr via emulate which we are
> mostly ignoring. For event based branch instruction, the emulation will
> fail and we will send 0x700(interrupt program) to PR guest right ?
>
>
>> vcpu->arch.fscr &= ~FSCR_IC_MASK;
>> vcpu->arch.fscr |= vcpu->arch.shadow_fscr & FSCR_IC_MASK;
>> kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
>> r = RESUME_GUEST;
>> break;
>>
>
> -aneesh
^ permalink raw reply
* Re: [RFC PATCH 08/10] KVM: PPC: BOOK3S: PR: Add support for facility unavailable interrupt
From: Aneesh Kumar K.V @ 2014-01-31 11:40 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E93BF2.9010500@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> At this point we allow all the supported facilities except EBB. So
>> forward the interrupt to guest as illegal instruction.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/kvm_asm.h | 4 +++-
>> arch/powerpc/kvm/book3s.c | 4 ++++
>> arch/powerpc/kvm/book3s_emulate.c | 18 ++++++++++++++++++
>> arch/powerpc/kvm/book3s_pr.c | 17 +++++++++++++++++
>> 4 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>> index 1bd92fd43cfb..799244face51 100644
>> --- a/arch/powerpc/include/asm/kvm_asm.h
>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>> @@ -99,6 +99,7 @@
>> #define BOOK3S_INTERRUPT_PERFMON 0xf00
>> #define BOOK3S_INTERRUPT_ALTIVEC 0xf20
>> #define BOOK3S_INTERRUPT_VSX 0xf40
>> +#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
>>
>> #define BOOK3S_IRQPRIO_SYSTEM_RESET 0
>> #define BOOK3S_IRQPRIO_DATA_SEGMENT 1
>> @@ -117,7 +118,8 @@
>> #define BOOK3S_IRQPRIO_DECREMENTER 14
>> #define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15
>> #define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16
>> -#define BOOK3S_IRQPRIO_MAX 17
>> +#define BOOK3S_IRQPRIO_FAC_UNAVAIL 17
>> +#define BOOK3S_IRQPRIO_MAX 18
>>
>> #define BOOK3S_HFLAG_DCBZ32 0x1
>> #define BOOK3S_HFLAG_SLB 0x2
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index 8912608b7e1b..a9aea28c2677 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -143,6 +143,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
>> case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break;
>> case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break;
>> case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break;
>> + case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break;
>> default: prio = BOOK3S_IRQPRIO_MAX; break;
>> }
>>
>> @@ -273,6 +274,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>> case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
>> vec = BOOK3S_INTERRUPT_PERFMON;
>> break;
>> + case BOOK3S_IRQPRIO_FAC_UNAVAIL:
>> + vec = BOOK3S_INTERRUPT_FAC_UNAVAIL;
>> + break;
>> default:
>> deliver = 0;
>> printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>> index 60d0b6b745e7..bf6b11021250 100644
>> --- a/arch/powerpc/kvm/book3s_emulate.c
>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>> @@ -481,6 +481,15 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>> vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr;
>> break;
>> }
>> + case SPRN_EBBHR:
>> + vcpu->arch.ebbhr = spr_val;
>> + break;
>> + case SPRN_EBBRR:
>> + vcpu->arch.ebbrr = spr_val;
>> + break;
>> + case SPRN_BESCR:
>> + vcpu->arch.bescr = spr_val;
>> + break;
>> unprivileged:
>> default:
>> printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
>> @@ -607,6 +616,15 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>> case SPRN_FSCR:
>> *spr_val = vcpu->arch.fscr;
>> break;
>> + case SPRN_EBBHR:
>> + *spr_val = vcpu->arch.ebbhr;
>> + break;
>> + case SPRN_EBBRR:
>> + *spr_val = vcpu->arch.ebbrr;
>> + break;
>> + case SPRN_BESCR:
>> + *spr_val = vcpu->arch.bescr;
>> + break;
>> default:
>> unprivileged:
>> printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index 51d469f8c9fd..828056ec208f 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -900,6 +900,23 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>> case BOOK3S_INTERRUPT_PERFMON:
>> r = RESUME_GUEST;
>> break;
>> + case BOOK3S_INTERRUPT_FAC_UNAVAIL:
>> + {
>> + /*
>> + * Check for the facility that need to be emulated
>> + */
>> + ulong fscr_ic = vcpu->arch.shadow_fscr >> 56;
>> + if (fscr_ic != FSCR_EBB_LG) {
>> + /*
>> + * We only disable EBB facility.
>> + * So only emulate that.
>
> I don't understand the comment. We emulate nothing at all here. We either
> - hit an EBB unavailable in which case we send the guest an illegal
> instruction interrupt or we
> - hit another facility interrupt in which case we forward the
> interrupt to the guest, but not the interrupt cause (fscr_ic).
>
What I wanted to achieve was to enable both TAR and DSCR and disable
EBB. The reason to disable EBB is that we are still not clear how to
handle PMU details in PR. Now, with FSCR carrying that value, we get a
facility unavailable interrupt when the guest tries to mfspr/mtspr a few
EBB related registers. The PR guest kernel does that on context switch
(_switch). What we do here is fall through and handle that via the
mtspr/mfspr emulation.
If we get a facility unavailable interrupt for any other reason, that
means the PR guest has explicitly disabled that facility. Hence we forward
it as a facility unavailable interrupt to the guest, allowing the PR guest
to handle it.
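The flow described above can be sketched in plain C. This is only an illustration outside the kernel: the demo_* names are hypothetical, the interrupt cause is assumed to live in the top byte of FSCR (as the ">> 56" in the patch suggests), and the value 7 for the EBB cause is an assumption rather than something stated in this thread.

```c
#include <stdint.h>

/* Assumptions for illustration only; not taken from kernel headers. */
#define DEMO_FSCR_IC_SHIFT 56
#define DEMO_FSCR_IC_MASK  (0xFFULL << DEMO_FSCR_IC_SHIFT)
#define DEMO_FSCR_EBB_LG   7    /* assumed cause code for EBB */

enum demo_action {
    DEMO_EMULATE,           /* hand the instruction to the emulator */
    DEMO_FORWARD_TO_GUEST   /* queue a facility unavailable interrupt */
};

/* Extract the interrupt cause from the FSCR value seen on exit. */
static unsigned int demo_fscr_cause(uint64_t shadow_fscr)
{
    return (unsigned int)(shadow_fscr >> DEMO_FSCR_IC_SHIFT);
}

/*
 * EBB was filtered out by the host, so an EBB cause means "emulate".
 * Any other cause means the guest itself disabled the facility, so the
 * cause is copied into the guest-visible FSCR and the interrupt is
 * forwarded.
 */
static enum demo_action demo_handle_fac_unavail(uint64_t *guest_fscr,
                                                uint64_t shadow_fscr)
{
    if (demo_fscr_cause(shadow_fscr) == DEMO_FSCR_EBB_LG)
        return DEMO_EMULATE;

    *guest_fscr &= ~DEMO_FSCR_IC_MASK;
    *guest_fscr |= shadow_fscr & DEMO_FSCR_IC_MASK;
    return DEMO_FORWARD_TO_GUEST;
}
```

In the real handler the DEMO_EMULATE branch would presumably fall through to the existing instruction emulation path, and the other branch would queue the interrupt for the guest.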
> I think the EBB case should be explicit:
>
> /* We don't allow EBB inside the guest, so something must have gone
> terribly wrong */
> if (fscr_ic == FSCR_EBB_LG)
> BUG();
>
Instead of BUG, we handle a few mfspr/mtspr accesses via emulation, which
we mostly ignore. For an event-based branch instruction, the emulation will
fail and we will send a 0x700 (program interrupt) to the PR guest, right?
> vcpu->arch.fscr &= ~FSCR_IC_MASK;
> vcpu->arch.fscr |= vcpu->arch.shadow_fscr & FSCR_IC_MASK;
> kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> r = RESUME_GUEST;
> break;
>
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Aneesh Kumar K.V @ 2014-01-31 11:28 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140130060000.GB10611@iris.ozlabs.ibm.com>
Paul Mackerras <paulus@samba.org> writes:
> On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
>> We allow priv-mode update of this. The guest value is saved in fscr,
>> and the value actually used is saved in shadow_fscr. shadow_fscr
>> only contains values that are allowed by the host. On
>> facility unavailable interrupt, if the facility is allowed by fscr
>> but disabled in shadow_fscr we need to emulate the support. Currently
>> all but EBB is disabled. We still don't support performance monitoring
>> in PR guest.
>
> ...
>
>> + /*
>> + * Save the current fscr in shadow fscr
>> + */
>> + mfspr r3,SPRN_FSCR
>> + PPC_STL r3, VCPU_SHADOW_FSCR(r7)
>
> I don't think you need to do this. What could possibly have changed
> FSCR since we loaded it on the way into the guest?
The reason for the facility unavailable interrupt is encoded in FSCR, right?
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 03/10] KVM: PPC: BOOK3S: PR: Emulate instruction counter
From: Alexander Graf @ 2014-01-31 11:28 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87r47ojsu6.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 12:25, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> Writing to IC is not allowed in the privileged mode.
>>
>> This is not a patch description.
>>
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/include/asm/kvm_host.h | 1 +
>>> arch/powerpc/kvm/book3s_emulate.c | 3 +++
>>> arch/powerpc/kvm/book3s_pr.c | 2 ++
>>> 3 files changed, 6 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>> index 9ebdd12e50a9..e0b13aca98e6 100644
>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>> @@ -509,6 +509,7 @@ struct kvm_vcpu_arch {
>>> /* Time base value when we entered the guest */
>>> u64 entry_tb;
>>> u64 entry_vtb;
>>> + u64 entry_ic;
>>> u32 tcr;
>>> ulong tsr; /* we need to perform set/clr_bits() which requires ulong */
>>> u32 ivor[64];
>>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>>> index 4b58d8a90cb5..abe6f3057e5b 100644
>>> --- a/arch/powerpc/kvm/book3s_emulate.c
>>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>>> @@ -531,6 +531,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>>> case SPRN_VTB:
>>> *spr_val = vcpu->arch.vtb;
>>> break;
>>> + case SPRN_IC:
>>> + *spr_val = vcpu->arch.ic;
>>> + break;
>>> case SPRN_GQR0:
>>> case SPRN_GQR1:
>>> case SPRN_GQR2:
>>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>>> index b5598e9cdd09..51d469f8c9fd 100644
>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>> @@ -121,6 +121,7 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
>>> */
>>> vcpu->arch.entry_tb = get_tb();
>>> vcpu->arch.entry_vtb = get_vtb();
>>> + vcpu->arch.entry_ic = mfspr(SPRN_IC);
>>
>> Is this implemented on all systems?
>>
>>>
>>> }
>>>
>>> @@ -174,6 +175,7 @@ out:
>>> vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb;
>>> vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb;
>>> vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb;
>>> + vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic;
>>
>> This is getting quite convoluted. How about we act slightly more fuzzy
>> and put all of this into vcpu_load/put?
>>
>
> I am not sure whether vcpu_load/put is too early/late to save these
> context ?
It'd mean we treat instruction emulation as part of guest overhead and
time, but we'd make the entry/exit path faster. Unlike with HV KVM,
guest entry/exit is pretty hot due to the massive amounts of instruction
emulation we need to do.
Alex
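The accounting pattern under discussion (snapshot a free-running host counter on the way in, add the delta on the way out) can be sketched as follows. This is a standalone illustration, not kernel code: struct demo_vcpu and the demo_* helpers are hypothetical, and the host counter is passed in as a plain argument instead of being read with mfspr.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant vcpu fields. */
struct demo_vcpu {
    uint64_t ic;        /* guest-visible accumulated count */
    uint64_t entry_ic;  /* host counter snapshot taken on entry */
};

/* Record the host counter when the guest starts running. */
static void demo_guest_enter(struct demo_vcpu *v, uint64_t host_ic)
{
    v->entry_ic = host_ic;
}

/*
 * Credit the guest with whatever the host counter advanced by.
 * Unsigned subtraction keeps the delta correct across a wraparound
 * of the 64-bit counter.
 */
static void demo_guest_exit(struct demo_vcpu *v, uint64_t host_ic)
{
    v->ic += host_ic - v->entry_ic;
}
```

Where the two hooks are called decides what is counted: calling them around every guest entry/exit excludes emulation time, while calling them from vcpu_load/put would include it, which is the trade-off described above.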
^ permalink raw reply
* Re: [RFC PATCH 03/10] KVM: PPC: BOOK3S: PR: Emulate instruction counter
From: Aneesh Kumar K.V @ 2014-01-31 11:25 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E92F08.6020803@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> Writing to IC is not allowed in the privileged mode.
>
> This is not a patch description.
>
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/kvm_host.h | 1 +
>> arch/powerpc/kvm/book3s_emulate.c | 3 +++
>> arch/powerpc/kvm/book3s_pr.c | 2 ++
>> 3 files changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index 9ebdd12e50a9..e0b13aca98e6 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -509,6 +509,7 @@ struct kvm_vcpu_arch {
>> /* Time base value when we entered the guest */
>> u64 entry_tb;
>> u64 entry_vtb;
>> + u64 entry_ic;
>> u32 tcr;
>> ulong tsr; /* we need to perform set/clr_bits() which requires ulong */
>> u32 ivor[64];
>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>> index 4b58d8a90cb5..abe6f3057e5b 100644
>> --- a/arch/powerpc/kvm/book3s_emulate.c
>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>> @@ -531,6 +531,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>> case SPRN_VTB:
>> *spr_val = vcpu->arch.vtb;
>> break;
>> + case SPRN_IC:
>> + *spr_val = vcpu->arch.ic;
>> + break;
>> case SPRN_GQR0:
>> case SPRN_GQR1:
>> case SPRN_GQR2:
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index b5598e9cdd09..51d469f8c9fd 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -121,6 +121,7 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
>> */
>> vcpu->arch.entry_tb = get_tb();
>> vcpu->arch.entry_vtb = get_vtb();
>> + vcpu->arch.entry_ic = mfspr(SPRN_IC);
>
> Is this implemented on all systems?
>
>>
>> }
>>
>> @@ -174,6 +175,7 @@ out:
>> vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb;
>> vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb;
>> vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb;
>> + vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic;
>
> This is getting quite convoluted. How about we act slightly more fuzzy
> and put all of this into vcpu_load/put?
>
I am not sure whether vcpu_load/put is too early or too late to save
this context?
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Aneesh Kumar K.V @ 2014-01-31 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140130054913.GA10611@iris.ozlabs.ibm.com>
Paul Mackerras <paulus@samba.org> writes:
> On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
>> virtual time base register is a per vm register and need to saved
>> and restored on vm exit and entry. Writing to VTB is not allowed
>> in the privileged mode.
> ...
>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +#define mfvtb() ({unsigned long rval; \
>> + asm volatile("mfspr %0, %1" : \
>> + "=r" (rval) : "i" (SPRN_VTB)); rval;})
>
> The mfspr will be a no-op on anything before POWER8, meaning the
> result will be whatever value was in the destination GPR before the
> mfspr. I suppose that may not matter if the result is only ever used
> when we're running on a POWER8 host, but I would feel more comfortable
> if we had explicit feature tests to make sure of that, rather than
> possibly doing computations with unpredictable values.
>
> With your patch, a guest on a POWER7 or a PPC970 could do a read from
> VTB and get garbage -- first, there is nothing to stop userspace from
> requesting POWER8 emulation on an older machine, and secondly, even if
> the virtual machine is a PPC970 (say) you don't implement
> unimplemented SPR semantics for VTB (no-op if PR=0, illegal
> instruction interrupt if PR=1).
OK, that means we need to do something like this?
    struct cpu_spec *s = find_cpuspec(vcpu->arch.pvr);
    if (s->cpu_features & CPU_FTR_ARCH_207S) {
    }
>
> On the whole I think it is reasonable to reject an attempt to set the
> virtual PVR to a POWER8 PVR value if we are not running on a POWER8
> host, because emulating all the new POWER8 features in software
> (particularly transactional memory) would not be feasible. Alex may
> disagree. :)
That would make it much simpler.
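A minimal sketch of that simpler approach, rejecting a POWER8 virtual PVR when the host is not POWER8. Everything here is illustrative: the demo_* names are hypothetical, the host capability is reduced to a boolean, and the PVR version numbers are assumptions that should be checked against the kernel's reg.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* The processor version is assumed to be the upper 16 bits of the PVR. */
#define DEMO_PVR_VER(pvr)   (((pvr) >> 16) & 0xFFFFu)

/* Assumed version numbers, for illustration only. */
#define DEMO_PVR_POWER8E    0x004Bu
#define DEMO_PVR_POWER8     0x004Du

static bool demo_pvr_is_power8(uint32_t pvr)
{
    uint32_t ver = DEMO_PVR_VER(pvr);

    return ver == DEMO_PVR_POWER8 || ver == DEMO_PVR_POWER8E;
}

/*
 * Refuse a POWER8 guest PVR unless the host implements the 2.07
 * features. Returns 0 on success and -EINVAL if the request is
 * rejected, leaving the current guest PVR untouched.
 */
static int demo_set_guest_pvr(uint32_t *guest_pvr, uint32_t new_pvr,
                              bool host_is_arch_207s)
{
    if (demo_pvr_is_power8(new_pvr) && !host_is_arch_207s)
        return -EINVAL;

    *guest_pvr = new_pvr;
    return 0;
}
```

With a check like this in place, the SPR emulation code would only ever see a POWER8 guest on a POWER8 host, so the mfspr of VTB would not return an unpredictable value.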
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Alexander Graf @ 2014-01-31 10:47 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87y51wjv0w.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 11:38, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> We definitely don't need to emulate mtspr, because both the registers
>>> are hypervisor resource.
>>
>> This patch description doesn't cover what the patch actually does. It
>> changes the implementation from "always tell the guest it uses 100%" to
>> "give the guest an accurate amount of cpu time spent inside guest
>> context".
>
> Will fix that
>
>>
>> Also, I think we either go with full hyp semantics which means we also
>> emulate the offset or we go with no hyp awareness in the guest at all
>> which means we also don't emulate SPURR which is a hyp privileged
>> register.
>
> Can you clarify this ?
In the 2.06 ISA SPURR is hypervisor privileged. That changed for 2.07
where it became supervisor privileged. So I suppose your patch is ok.
When reviewing those patches I only had 2.06 around because power.org
was broken.
Alex
^ permalink raw reply
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Aneesh Kumar K.V @ 2014-01-31 10:38 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E92D15.8000901@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> We definitely don't need to emulate mtspr, because both the registers
>> are hypervisor resource.
>
> This patch description doesn't cover what the patch actually does. It
> changes the implementation from "always tell the guest it uses 100%" to
> "give the guest an accurate amount of cpu time spent inside guest
> context".
Will fix that
>
> Also, I think we either go with full hyp semantics which means we also
> emulate the offset or we go with no hyp awareness in the guest at all
> which means we also don't emulate SPURR which is a hyp privileged
> register.
Can you clarify this ?
>
> Otherwise I like the patch :).
>
-aneesh
^ permalink raw reply
* [PATCH V2 2/2] powerpc/mm: Fix compile error of pgtable-ppc64.h
From: Aneesh Kumar K.V @ 2014-01-31 10:29 UTC (permalink / raw)
To: benh, paulus, stable; +Cc: linuxppc-dev, Aneesh Kumar K.V, Li Zhong
In-Reply-To: <1391164141-14073-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Li Zhong <zhong@linux.vnet.ibm.com>
It seems that a forward declaration does not work well with a typedef; use
struct spinlock directly to avoid the following build errors:
In file included from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:17,
from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
upstream sha1:fd120dc2e205d2318a8b47d6d8098b789e3af67d
for 3.13 stable series
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..bc141c950b1e 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
#define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
- spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+ struct spinlock *old_pmd_ptl)
{
/*
* Archs like ppc64 use pgtable to store per pmd
--
1.8.3.2
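The difference the fix relies on can be shown in a small standalone sketch. The names (demo_lock, demo_must_withdraw) are hypothetical stand-ins. The point is that an incomplete struct declaration may be repeated any number of times, whereas repeating a typedef in the same scope is an error before C11, which is what the older compiler reported above.

```c
/*
 * Header A: only needs a pointer to the lock, so an incomplete
 * struct declaration is enough. Declaring it more than once is fine.
 */
struct demo_lock;
struct demo_lock;

static inline int demo_must_withdraw(struct demo_lock *new_ptl,
                                     struct demo_lock *old_ptl)
{
    (void)new_ptl;
    (void)old_ptl;
    return 1;
}

/*
 * Header B, included later: supplies the full definition and the one
 * and only typedef. Had header A also contained
 * "typedef struct demo_lock demo_lock_t;", a pre-C11 compiler would
 * reject this line as a redefinition.
 */
struct demo_lock { int raw; };
typedef struct demo_lock demo_lock_t;
```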
^ permalink raw reply related
* [PATCH V2 1/2] powerpc/thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-01-31 10:29 UTC (permalink / raw)
To: benh, paulus, stable; +Cc: linuxppc-dev, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This patch fixes the crash below:
NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
On ppc64 we use the pgtable for storing the hpte slot information and
store the address of the pgtable at a constant offset (PTRS_PER_PMD) from
the pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at the PTRS_PER_PMD offset
from the new pmd.
We also want to move the withdraw and deposit before the set_pmd so
that, when a page fault finds the pmd to be trans huge, we can be sure that
the pgtable can be located at that offset.
upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
for 3.13 stable series
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++++++++++++++
include/asm-generic/pgtable.h | 12 ++++++++++++
mm/huge_memory.c | 14 +++++---------
3 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 4a191c472867..d27960c89a71 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -558,5 +558,19 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#define __HAVE_ARCH_PMDP_INVALIDATE
extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
+
+#define pmd_move_must_withdraw pmd_move_must_withdraw
+typedef struct spinlock spinlock_t;
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+ spinlock_t *old_pmd_ptl)
+{
+ /*
+ * Archs like ppc64 use pgtable to store per pmd
+ * specific information. So when we switch the pmd,
+ * we should also withdraw and deposit the pgtable
+ */
+ return true;
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index db0923458940..8e4f41d9af4d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
}
#endif
+#ifndef pmd_move_must_withdraw
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+ spinlock_t *old_pmd_ptl)
+{
+ /*
+ * With split pmd lock we also need to move preallocated
+ * PTE page table if new_pmd is on different PMD page table.
+ */
+ return new_pmd_ptl != old_pmd_ptl;
+}
+#endif
+
/*
* This function is meant to be used by sites walking pagetables with
* the mmap_sem hold in read mode to protect against MADV_DONTNEED and
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95d1acb0f3d2..5d80c53b87cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1502,19 +1502,15 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
- set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
- if (new_ptl != old_ptl) {
- pgtable_t pgtable;
- /*
- * Move preallocated PTE page table if new_pmd is on
- * different PMD page table.
- */
+ if (pmd_move_must_withdraw(new_ptl, old_ptl)) {
+ pgtable_t pgtable;
pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
-
- spin_unlock(new_ptl);
}
+ set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+ if (new_ptl != old_ptl)
+ spin_unlock(new_ptl);
spin_unlock(old_ptl);
}
out:
--
1.8.3.2
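The two ideas in this patch, an arch override detected with #ifndef and the "withdraw/deposit before set_pmd" ordering, can be modelled in a toy program. This is a sketch under simplifying assumptions: the demo_* types and functions are hypothetical, locking is omitted, and the deposited pgtable is modelled as a pointer stored next to the pmd value.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_ptl { int id; };
struct demo_pmd {
    unsigned long val;  /* 0 means "none" */
    void *deposited;    /* pgtable deposited alongside this pmd */
};

/*
 * "Arch header": defining a macro with the function's own name lets
 * generic code detect the override with #ifndef.
 */
#define demo_move_must_withdraw demo_move_must_withdraw
static inline bool demo_move_must_withdraw(struct demo_ptl *new_ptl,
                                           struct demo_ptl *old_ptl)
{
    (void)new_ptl;
    (void)old_ptl;
    return true;    /* per-pmd data lives in the pgtable: always move */
}

/* "Generic header": fallback used only when no arch override exists. */
#ifndef demo_move_must_withdraw
static inline bool demo_move_must_withdraw(struct demo_ptl *new_ptl,
                                           struct demo_ptl *old_ptl)
{
    return new_ptl != old_ptl;
}
#endif

static void demo_move_huge_pmd(struct demo_pmd *old_pmd,
                               struct demo_pmd *new_pmd,
                               struct demo_ptl *old_ptl,
                               struct demo_ptl *new_ptl)
{
    unsigned long val = old_pmd->val;

    old_pmd->val = 0;                       /* get-and-clear the old pmd */
    if (demo_move_must_withdraw(new_ptl, old_ptl)) {
        void *pgtable = old_pmd->deposited; /* withdraw from old */
        old_pmd->deposited = NULL;
        new_pmd->deposited = pgtable;       /* deposit at new */
    }
    new_pmd->val = val;                     /* publish the new pmd last */
}
```

Because the new pmd value is written last, anything that observes a non-zero new_pmd->val in this model also finds the deposited pgtable already in place, which mirrors the guarantee the commit message asks for.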
^ permalink raw reply related