* Re: PCIe Access - achieve bursts without DMA
From: Benjamin Herrenschmidt @ 2014-01-31 22:53 UTC (permalink / raw)
To: Moese, Michael; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>
On Thu, 2014-01-30 at 12:20 +0000, Moese, Michael wrote:
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-cores
> located inside our FPGA. On x86-based systems I was able to achieve bursts for
> both read and write access. On PPC32, using an e500v2, I had no success at all
> so far.
> I tried using ioremap_wc(), like I did on x86, for writing, and it only results in my
> writes just being single requests, one after another.
Hrm, ioremap_wc will give you a mapping without the G (guarded) bit.
Whether that results in any store gathering on IOs depends on the
specific HW implementation; you'll have to check with the FSL folks on
that one. There could also be a chicken switch (HID bit or similar)
needed to enable that (there was on some earlier ppc32 chips).
Another thing you can try is to use FP register load/stores.
> For reads, I noticed I could not ioremap_cache() on PPC, so I used simple ioremap()
> here.
> I used several ways to read from the device, from simple readl(), memcpy_fromio(),
> memcpy() to cacheable_memcpy() - with no improvements. Even when just issuing
> a batch of prefetch()-calls for all the memory to read did not result in read bursts.
>
> I only get really poor results, writing is possible with around 40 MiByte/s, whereas I
> can read at about only 3 MiByte/s.
> After hours of studying the reference manual from freescale, looking into other code
> and searching the web, I'm close to resignation.
>
> Maybe someone of you has some more directions for me, I'd appreciate every hint
> that leads me to my problem's solution - maybe I just missed something or lack
> knowledge about this architecture in general.
>
> Thanks for your reading.
>
>
> Michael
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
^ permalink raw reply
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Paul Mackerras @ 2014-01-31 22:17 UTC (permalink / raw)
To: Alexander Graf; +Cc: linuxppc-dev, Aneesh Kumar K.V, kvm-ppc, kvm-devel
In-Reply-To: <5C99D2BA-7E11-4012-B3BD-9B01F4F865ED@suse.de>
On Fri, Jan 31, 2014 at 11:47:44AM +0100, Alexander Graf wrote:
>
> On 31.01.2014, at 11:38, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
> > Alexander Graf <agraf@suse.de> writes:
> >
> >> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
> >>> We definitely don't need to emulate mtspr, because both the registers
> >>> are hypervisor resource.
> >>
> >> This patch description doesn't cover what the patch actually does. It
> >> changes the implementation from "always tell the guest it uses 100%" to
> >> "give the guest an accurate amount of cpu time spent inside guest
> >> context".
> >
> > Will fix that
> >
> >>
> >> Also, I think we either go with full hyp semantics which means we also
> >> emulate the offset or we go with no hyp awareness in the guest at all
> >> which means we also don't emulate SPURR which is a hyp privileged
> >> register.
> >
> > Can you clarify this ?
>
> In the 2.06 ISA SPURR is hypervisor privileged. That changed for 2.07 where it became supervisor privileged. So I suppose your patch is ok. When reviewing those patches I only had 2.06 around because power.org was broken.
It's always been supervisor privilege for reading and hypervisor
privilege for writing, ever since it was introduced in 2.05, and that
hasn't changed. So I think what Aneesh is doing is correct.
Regards,
Paul.
^ permalink raw reply
* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Cody P Schafer @ 2014-01-31 20:59 UTC (permalink / raw)
To: Michael Ellerman
Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
Arnaldo Carvalho de Melo, Linux PPC
In-Reply-To: <52E05E49.3010903@linux.vnet.ibm.com>
On 01/22/2014 04:11 PM, Cody P Schafer wrote:
> On 01/21/2014 05:32 PM, Michael Ellerman wrote:
>> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>>> performance counters: gpci ("get performance counter info") and 24x7.
Any comments on/things that need fixing for this patch set to be merged?
^ permalink raw reply
* Re: [PATCH 2/2] Fix coding style errors
From: Brian W Hart @ 2014-01-31 19:34 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <1390878454-4329-1-git-send-email-stewartb2@gmail.com>
On Mon, Jan 27, 2014 at 09:07:34PM -0600, Brandon Stewart wrote:
> I corrected several coding errors.
>
> Signed-off-by: Brandon Stewart <stewartb2@gmail.com>
> ---
> drivers/macintosh/adb.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/macintosh/adb.c b/drivers/macintosh/adb.c
> index 53611de..dd3f49a 100644
> --- a/drivers/macintosh/adb.c
> +++ b/drivers/macintosh/adb.c
> @@ -623,7 +623,7 @@ do_adb_query(struct adb_request *req)
> {
> int ret = -EINVAL;
>
> - switch(req->data[1]) {
> + switch (req->data[1]) {
> case ADB_QUERY_GETDEVINFO:
> if (req->nbytes < 3)
> break;
> @@ -792,8 +792,9 @@ static ssize_t adb_write(struct file *file, const char __user *buf,
> }
> /* Special case for ADB_BUSRESET request, all others are sent to
> the controller */
> - else if ((req->data[0] == ADB_PACKET) && (count > 1)
> - && (req->data[1] == ADB_BUSRESET)) {
> + else if (req->data[0] == ADB_PACKET &&
> + req->data[1] == ADB_BUSRESET &&
> + count > 1) {
Is this re-ordering safe? Isn't 'count > 1' notionally indicating whether
req->data[1] exists to be tested in the first place?
On the other hand there's a check at the top of the routine that returns
if count < 2, so maybe the check here should be removed altogether (along
with one a few lines above)?
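To illustrate the ordering concern, here is a minimal user-space sketch. It is
not the adb.c code: demo_is_busreset() and the DEMO_* values are made up for
illustration. Since && evaluates left to right and stops at the first false
operand, testing the length first guarantees buf[1] is only read when it exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ADB constants; the values are illustrative. */
#define DEMO_PACKET   0x10
#define DEMO_BUSRESET 0x20

/*
 * Returns 1 if buf holds a bus-reset packet, 0 otherwise.
 * The length check comes first, so buf[1] is never read when count < 2.
 */
static int demo_is_busreset(const unsigned char *buf, size_t count)
{
	assert(buf != NULL);

	return count > 1 &&
	       buf[0] == DEMO_PACKET &&
	       buf[1] == DEMO_BUSRESET;
}
```

With the length test moved to the end, the comparison against buf[1] would
run before anything has established that a second byte is present.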
^ permalink raw reply
* Re: [PATCH] powerpc/eeh: drop taken reference to driver on eeh_rmv_device
From: Thadeu Lima de Souza Cascardo @ 2014-01-31 17:24 UTC (permalink / raw)
To: Gavin Shan; +Cc: linuxppc-dev, paulus
In-Reply-To: <20140131004611.GA6790@shangw.(null)>
On Fri, Jan 31, 2014 at 08:46:11AM +0800, Gavin Shan wrote:
> On Thu, Jan 30, 2014 at 11:00:48AM -0200, Thadeu Lima de Souza Cascardo wrote:
> >Commit f5c57710dd62dd06f176934a8b4b8accbf00f9f8 ("powerpc/eeh: Use
> >partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
> >which may grab a reference to a driver, but not release it.
> >
> >That prevents a driver from being removed after it has gone through EEH
> >recovery.
> >
> >This patch drops the reference in either exit path if it was taken.
> >
> >Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> >---
> > arch/powerpc/kernel/eeh_driver.c | 5 ++++-
> > 1 files changed, 4 insertions(+), 1 deletions(-)
> >
> >diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> >index 7bb30dc..afe7337 100644
> >--- a/arch/powerpc/kernel/eeh_driver.c
> >+++ b/arch/powerpc/kernel/eeh_driver.c
> >@@ -364,7 +364,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
> > return NULL;
> > driver = eeh_pcid_get(dev);
> > if (driver && driver->err_handler)
> >- return NULL;
> >+ goto out;
> >
> > /* Remove it from PCI subsystem */
> > pr_debug("EEH: Removing %s without EEH sensitive driver\n",
> >@@ -377,6 +377,9 @@ static void *eeh_rmv_device(void *data, void *userdata)
>
> For normal case (driver without EEH support), we probably release the reference
> to the driver before pci_stop_and_remove_bus_device().
You are right, we need to call it before we call
pci_stop_and_remove_bus_device, otherwise dev->driver will be NULL, and
eeh_pcid_put will not do module_put. On the other hand, we could change
the call to eeh_pcid_put to accept struct pci_driver instead.
>
> > pci_stop_and_remove_bus_device(dev);
> > pci_unlock_rescan_remove();
> >
> >+out:
> >+ if (driver)
> >+ eeh_pcid_put(dev);
> > return NULL;
>
> We needn't "if (driver)" here as eeh_pcid_put() already had the check.
>
What if try_module_get returned false on eeh_pcid_get?
How about something like the patch below?
> > }
> >
>
> Thanks,
> Gavin
---
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 7bb30dc..3a397fa 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -352,6 +352,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
struct eeh_dev *edev = (struct eeh_dev *)data;
struct pci_dev *dev = eeh_dev_to_pci_dev(edev);
int *removed = (int *)userdata;
+ bool has_err_handler;
/*
* Actually, we should remove the PCI bridges as well.
@@ -362,8 +363,12 @@ static void *eeh_rmv_device(void *data, void *userdata)
*/
if (!dev || (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE))
return NULL;
+
driver = eeh_pcid_get(dev);
- if (driver && driver->err_handler)
+ has_err_handler = driver && driver->err_handler;
+ if (driver)
+ eeh_pcid_put(dev);
+ if (has_err_handler)
return NULL;
/* Remove it from PCI subsystem */
---
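For reference, the get/put pairing in the patch above can be modelled in user
space roughly as follows. This is only a sketch under assumptions: struct
demo_driver, demo_get(), demo_put() and demo_should_remove() are hypothetical
names, not kernel APIs, and the "going_away" flag stands in for try_module_get
failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct demo_driver {
	int  refcount;         /* stands in for the module reference count */
	bool going_away;       /* when true, taking a reference fails */
	bool has_err_handler;  /* driver is EEH aware */
};

/* Models the "get" side: returns NULL if no reference could be taken. */
static struct demo_driver *demo_get(struct demo_driver *drv)
{
	if (!drv || drv->going_away)
		return NULL;
	drv->refcount++;
	return drv;
}

/* Models the "put" side: drops a reference that demo_get() handed out. */
static void demo_put(struct demo_driver *drv)
{
	assert(drv && drv->refcount > 0);
	drv->refcount--;
}

/*
 * Returns true if the device should be removed, i.e. its driver has no
 * error handler. The reference is dropped on every path, and only if it
 * was actually taken.
 */
static bool demo_should_remove(struct demo_driver *bound)
{
	struct demo_driver *drv = demo_get(bound);
	bool has_err_handler = drv && drv->has_err_handler;

	if (drv)
		demo_put(drv);
	return !has_err_handler;
}
```

The point of caching has_err_handler is that the reference can be released
before the removal step, while the decision still uses what was read under it.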
^ permalink raw reply related
* Re: [PATCH] powerpc: Add cpu family documentation
From: Kumar Gala @ 2014-01-31 13:32 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1391049480-29346-1-git-send-email-mpe@ellerman.id.au>
On Jan 29, 2014, at 8:38 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> This patch adds some documentation on the different cpu families
> supported by arch/powerpc.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> Documentation/powerpc/cpu_families.txt | 76 ++++++++++++++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
> create mode 100644 Documentation/powerpc/cpu_families.txt
>
> diff --git a/Documentation/powerpc/cpu_families.txt b/Documentation/powerpc/cpu_families.txt
> new file mode 100644
> index 0000000..df72657
> --- /dev/null
> +++ b/Documentation/powerpc/cpu_families.txt
> @@ -0,0 +1,76 @@
> +CPU Families
> +============
> +
> +This doco tries to summarise some of the different cpu families that exist and
> +are supported by arch/powerpc.
> +
> +Book3S (aka sPAPR)
> +------------------
> +
> + - Hash MMU
> + - Mix of 32 & 64 bit
> +
> + Old
> + POWER --- 601 --- 603
> + | | |
> + | | *----- 740
> + | | |
> + | | *----- 750 (G3) --- 750CX --- 750CL --- 750FX
> + | | |
> + | | |
> + | 604 *--- 7400 --- 7410 --- 7450 --- 7455 --- 7447 --- 7448
> + | |
> + | |
> + | *---- [620] --- POWER3/630 --- POWER3+ --- POWER4 --- POWER4+ --- POWER5 --- POWER5+ --- POWER5++ --- POWER6 --- POWER7 --- POWER7+ --- POWER8
> + | (64bit) | .
> + | | .
> + | | *--- Cell
> + | |
> + | *--- 970 --- 970FX --- 970MP
> + |
> + *--- RS64 (threads)
> +
> +
> + PA6T (64bit) ...
> +
> +
> +IBM BookE
> +---------
> +
> + - Software loaded TLB.
> + - All 32 bit
> +
> + 401 --- 403 --- 405 --- 440 --- 450 --- 460 --- 476
> + |
> + *--- BG/P
> +
> +
> +Motorola/Freescale 8xx
> +----------------------
> +
> + - Software loaded with hardware assist.
> + - All 32 bit
> +
> + 8xx --- 850
> +
> +
> +Freescale BookE
> +---------------
> +
> + - Software loaded TLB.
> + - e6500 adds HW loaded indirect TLB entries.
> + - Mix of 32 & 64 bit
> +
> + e200 --- e500 --- e500v2 --- e500mc --- e5500 --- e6500
> + (Book3E) (HW TLB)
> + (64bit)
> +
e200 is its own core family that doesn't have any relation to the e500
line other than being Book-E.
You might want to add multithreaded to e6500.
> +IBM A2 core
> +-----------
> +
> + - Book3E, software loaded TLB + HW loaded indirect TLB entries.
> + - 64 bit
> +
> + A2 core --- BG/Q
> + |
> + *------- WSP
> --
> 1.8.3.2
^ permalink raw reply
* Re: PCIe Access - achieve bursts without DMA
From: Gabriel Paubert @ 2014-01-31 12:31 UTC (permalink / raw)
To: Moese, Michael; +Cc: linuxppc-dev@lists.ozlabs.org
In-Reply-To: <2DF74D4E746FF14C8697D5041AAE72D56A2B1420@MEN-EX2.intra.men.de>
On Thu, Jan 30, 2014 at 12:20:21PM +0000, Moese, Michael wrote:
> Hello PPC-developers,
> I'm currently trying to benchmark access speeds to our PCIe-connected IP-cores
> located inside our FPGA. On x86-based systems I was able to achieve bursts for
> both read and write access. On PPC32, using an e500v2, I had no success at all
> so far.
> I tried using ioremap_wc(), like I did on x86, for writing, and it only results in my
> writes just being single requests, one after another.
I believe that on PPC, write-combine is directly mapped to nocache. I can't remember
if there is a writethrough option for ioremap (but adding it would probably be
relatively easy).
> For reads, I noticed I could not ioremap_cache() on PPC, so I used simple ioremap()
> here.
You might be able to use ioremap_cache and direct cache control instructions
(dcbf/dcbi) to achieve your goals. This becomes similar to handling machines with
no hardware cache coherency. You have to know the hardware cache line size to make
this work.
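As a rough illustration of why the line size matters, the sketch below walks
every cache line that overlaps a buffer. It is plain user-space C with made-up
names (demo_for_each_cache_line and its callback are assumptions); in a real
driver the callback would issue dcbf or dcbi on each line address, and the line
size would come from the core's documentation rather than a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Calls op() once per cache line overlapping [start, start + len) and
 * returns the number of lines visited. line_size must be a power of two.
 * The sketch assumes start + len does not wrap around the address space.
 */
static size_t demo_for_each_cache_line(uintptr_t start, size_t len,
				       size_t line_size,
				       void (*op)(uintptr_t line))
{
	uintptr_t mask, addr, end;
	size_t n = 0;

	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	if (len == 0)
		return 0;

	mask = (uintptr_t)line_size - 1;
	addr = start & ~mask;   /* round down to a line boundary */
	end = start + len;      /* one past the last byte */

	for (; addr < end; addr += line_size) {
		if (op)
			op(addr);
		n++;
	}
	return n;
}
```

Note that an unaligned buffer can touch one more line than len / line_size
suggests, which is why the start address is rounded down first.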
That said, it might be better to mark the memory as non-guarded and
non-coherent (WIMG=0000). I don't know what ioremap_cache does for the
MG bits and don't have the time to look it up right now.
> I used several ways to read from the device, from simple readl(), memcpy_fromio(),
> memcpy() to cacheable_memcpy() - with no improvements. Even when just issuing
> a batch of prefetch()-calls for all the memory to read did not result in read bursts.
If the device data you want to read is supposed to be cacheable (which basically means
that the data does not change unexpectedly under you, i.e., it is not as volatile as
a typical device I/O register), you don't want to use readl(), which adds some
synchronization to the read.
Prefetch only works on writeback memory, maybe writethrough; expecting it to work on
cache-inhibited memory is contradictory.
Regards,
Gabriel
^ permalink raw reply
* Re: [PATCH 0/2] Fixes for PCI-E link speed
From: Benjamin Herrenschmidt @ 2014-01-31 12:29 UTC (permalink / raw)
To: Kleber Sacilotto de Souza; +Cc: Brian King, Paul Mackerras, linuxppc-dev
In-Reply-To: <52EB94F6.6000800@linux.vnet.ibm.com>
On Fri, 2014-01-31 at 10:20 -0200, Kleber Sacilotto de Souza wrote:
> On 01/17/2014 11:56 AM, Kleber Sacilotto de Souza wrote:
> > These two patches fix problems on the PCI-E link speed detection.
> > The first one fixes a regression and adds some improvements on the
> > code, and the second one adds definitions for Gen3 speeds.
> >
> > Kleber Sacilotto de Souza (2):
> > powerpc/pseries: fix regression on PCI link speed
> > powerpc/pseries: add Gen3 definitions for PCIE link speed
> >
> > arch/powerpc/platforms/pseries/pci.c | 22 +++++++++++++++-------
> > 1 files changed, 15 insertions(+), 7 deletions(-)
> >
>
> Hi,
>
> Any feedback on this patch series?
Patches on this list are tracked in patchwork so are generally not
"lost". Plus I was on vacation last week. So there's no need for such
pings unless much more time has elapsed. I'll probably put it in after
-rc1.
Ben.
^ permalink raw reply
* Re: [PATCH 0/2] Fixes for PCI-E link speed
From: Kleber Sacilotto de Souza @ 2014-01-31 12:20 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Brian King, Paul Mackerras
In-Reply-To: <1389967012-7774-1-git-send-email-klebers@linux.vnet.ibm.com>
On 01/17/2014 11:56 AM, Kleber Sacilotto de Souza wrote:
> These two patches fix problems on the PCI-E link speed detection.
> The first one fixes a regression and adds some improvements on the
> code, and the second one adds definitions for Gen3 speeds.
>
> Kleber Sacilotto de Souza (2):
> powerpc/pseries: fix regression on PCI link speed
> powerpc/pseries: add Gen3 definitions for PCIE link speed
>
> arch/powerpc/platforms/pseries/pci.c | 22 +++++++++++++++-------
> 1 files changed, 15 insertions(+), 7 deletions(-)
>
Hi,
Any feedback on this patch series?
Thanks,
--
Kleber Sacilotto de Souza
IBM Linux Technology Center
^ permalink raw reply
* Re: [RFC PATCH 08/10] KVM: PPC: BOOK3S: PR: Add support for facility unavailable interrupt
From: Alexander Graf @ 2014-01-31 12:02 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87lhxwjs60.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 12:40, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> At this point we allow all the supported facilities except EBB. So
>>> forward the interrupt to guest as illegal instruction.
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/include/asm/kvm_asm.h | 4 +++-
>>> arch/powerpc/kvm/book3s.c | 4 ++++
>>> arch/powerpc/kvm/book3s_emulate.c | 18 ++++++++++++++++++
>>> arch/powerpc/kvm/book3s_pr.c | 17 +++++++++++++++++
>>> 4 files changed, 42 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>>> index 1bd92fd43cfb..799244face51 100644
>>> --- a/arch/powerpc/include/asm/kvm_asm.h
>>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>>> @@ -99,6 +99,7 @@
>>> #define BOOK3S_INTERRUPT_PERFMON 0xf00
>>> #define BOOK3S_INTERRUPT_ALTIVEC 0xf20
>>> #define BOOK3S_INTERRUPT_VSX 0xf40
>>> +#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
>>>
>>> #define BOOK3S_IRQPRIO_SYSTEM_RESET 0
>>> #define BOOK3S_IRQPRIO_DATA_SEGMENT 1
>>> @@ -117,7 +118,8 @@
>>> #define BOOK3S_IRQPRIO_DECREMENTER 14
>>> #define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15
>>> #define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16
>>> -#define BOOK3S_IRQPRIO_MAX 17
>>> +#define BOOK3S_IRQPRIO_FAC_UNAVAIL 17
>>> +#define BOOK3S_IRQPRIO_MAX 18
>>>
>>> #define BOOK3S_HFLAG_DCBZ32 0x1
>>> #define BOOK3S_HFLAG_SLB 0x2
>>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>>> index 8912608b7e1b..a9aea28c2677 100644
>>> --- a/arch/powerpc/kvm/book3s.c
>>> +++ b/arch/powerpc/kvm/book3s.c
>>> @@ -143,6 +143,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
>>> case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break;
>>> case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break;
>>> case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break;
>>> + case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break;
>>> default: prio = BOOK3S_IRQPRIO_MAX; break;
>>> }
>>>
>>> @@ -273,6 +274,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>>> case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
>>> vec = BOOK3S_INTERRUPT_PERFMON;
>>> break;
>>> + case BOOK3S_IRQPRIO_FAC_UNAVAIL:
>>> + vec = BOOK3S_INTERRUPT_FAC_UNAVAIL;
>>> + break;
>>> default:
>>> deliver = 0;
>>> printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
>>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>>> index 60d0b6b745e7..bf6b11021250 100644
>>> --- a/arch/powerpc/kvm/book3s_emulate.c
>>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>>> @@ -481,6 +481,15 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>>> vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr;
>>> break;
>>> }
>>> + case SPRN_EBBHR:
>>> + vcpu->arch.ebbhr = spr_val;
>>> + break;
>>> + case SPRN_EBBRR:
>>> + vcpu->arch.ebbrr = spr_val;
>>> + break;
>>> + case SPRN_BESCR:
>>> + vcpu->arch.bescr = spr_val;
>>> + break;
>>> unprivileged:
>>> default:
>>> printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
>>> @@ -607,6 +616,15 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>>> case SPRN_FSCR:
>>> *spr_val = vcpu->arch.fscr;
>>> break;
>>> + case SPRN_EBBHR:
>>> + *spr_val = vcpu->arch.ebbhr;
>>> + break;
>>> + case SPRN_EBBRR:
>>> + *spr_val = vcpu->arch.ebbrr;
>>> + break;
>>> + case SPRN_BESCR:
>>> + *spr_val = vcpu->arch.bescr;
>>> + break;
>>> default:
>>> unprivileged:
>>> printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
>>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>>> index 51d469f8c9fd..828056ec208f 100644
>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>> @@ -900,6 +900,23 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>>> case BOOK3S_INTERRUPT_PERFMON:
>>> r = RESUME_GUEST;
>>> break;
>>> + case BOOK3S_INTERRUPT_FAC_UNAVAIL:
>>> + {
>>> + /*
>>> + * Check for the facility that need to be emulated
>>> + */
>>> + ulong fscr_ic = vcpu->arch.shadow_fscr >> 56;
>>> + if (fscr_ic != FSCR_EBB_LG) {
>>> + /*
>>> + * We only disable EBB facility.
>>> + * So only emulate that.
>>
>> I don't understand the comment. We emulate nothing at all here. We either
>> - hit an EBB unavailable in which case we send the guest an illegal
>> instruction interrupt or we
>> - hit another facility interrupt in which case we forward the
>> interrupt to the guest, but not the interrupt cause (fscr_ic).
>>
>
> What I wanted to achieve was to enable both TAR and DSCR and disable
> EBB. The reason to disable EBB is that we are still not clear on how to
> handle PMU details in PR. Now, with FSCR carrying that value, we get a
> facility unavailable interrupt when we try to mfspr/mtspr a few EBB
> related registers. The PR guest kernel does that on context switch
> (_switch). What we do here is fall through and handle that via
> mtspr/mfspr emulation.
>
> If we get a facility unavailable interrupt for any other reason, that
> means the PR guest has explicitly disabled that facility. Hence we
> forward that as a facility unavailable interrupt to the guest, allowing
> the PR guest to handle it.
Please adjust the comment accordingly. From the code flow that is very
unclear. "Disable" means we don't allow the guest to access EBB. You do
want to allow the guest to use a fake version of EBB by emulating the
facility unavailable interrupt.
if (fscr_ic == FSCR_EBB_LG) {
	/*
	 * We filtered EBB out of FSCR so that we get traps whenever the
	 * guest is trying to access EBB registers. Thanks to that we can
	 * now emulate these instructions and expose a virtual (no-op)
	 * EBB facility to the guest.
	 */
	<call instruction emulation>
} else {
	/* forward interrupt to the guest */
}
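The flow sketched above can be modelled in user space along these lines. This
is a hedged illustration, not the KVM code: the demo_* names are invented, the
shift of 56 is taken from the quoted patch, and the value 7 for the EBB cause
is an assumption that should be checked against FSCR_EBB_LG in your tree.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror FSCR_EBB_LG in the kernel headers; verify before use. */
#define DEMO_FSCR_EBB_LG   7
/* The quoted patch reads the interrupt cause with a shift of 56. */
#define DEMO_FSCR_IC_SHIFT 56

enum demo_action {
	DEMO_EMULATE,  /* guest touched EBB state: run instruction emulation */
	DEMO_FORWARD   /* any other facility: reflect the interrupt to the guest */
};

/* Extracts the interrupt cause from the top byte of a 64-bit FSCR value. */
static unsigned int demo_fscr_cause(uint64_t fscr)
{
	return (unsigned int)(fscr >> DEMO_FSCR_IC_SHIFT);
}

/* Decides what to do with a facility unavailable interrupt. */
static enum demo_action demo_fac_unavail(uint64_t shadow_fscr)
{
	if (demo_fscr_cause(shadow_fscr) == DEMO_FSCR_EBB_LG)
		return DEMO_EMULATE;
	return DEMO_FORWARD;
}
```

Only the cause field decides the path; the low enable bits of the register do
not affect the dispatch.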
Alex
>
>
>> I think the EBB case should be explicit:
>>
>> /* We don't allow EBB inside the guest, so something must have gone
>> terribly wrong */
>> if (fscr_ic == FSCR_EBB_LG)
>> BUG();
>>
>
> Instead of BUG(), we handle a few mfspr/mtspr via emulation, where we
> mostly ignore them. For event-based branch instructions, the emulation
> will fail and we will send 0x700 (program interrupt) to the PR guest,
> right?
>
>
>> vcpu->arch.fscr &= ~FSCR_IC_MASK;
>> vcpu->arch.fscr |= vcpu->arch.shadow_fscr & FSCR_IC_MASK;
>> kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
>> r = RESUME_GUEST;
>> break;
>>
>
> -aneesh
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC PATCH 08/10] KVM: PPC: BOOK3S: PR: Add support for facility unavailable interrupt
From: Aneesh Kumar K.V @ 2014-01-31 11:40 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E93BF2.9010500@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> At this point we allow all the supported facilities except EBB. So
>> forward the interrupt to guest as illegal instruction.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/kvm_asm.h | 4 +++-
>> arch/powerpc/kvm/book3s.c | 4 ++++
>> arch/powerpc/kvm/book3s_emulate.c | 18 ++++++++++++++++++
>> arch/powerpc/kvm/book3s_pr.c | 17 +++++++++++++++++
>> 4 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
>> index 1bd92fd43cfb..799244face51 100644
>> --- a/arch/powerpc/include/asm/kvm_asm.h
>> +++ b/arch/powerpc/include/asm/kvm_asm.h
>> @@ -99,6 +99,7 @@
>> #define BOOK3S_INTERRUPT_PERFMON 0xf00
>> #define BOOK3S_INTERRUPT_ALTIVEC 0xf20
>> #define BOOK3S_INTERRUPT_VSX 0xf40
>> +#define BOOK3S_INTERRUPT_FAC_UNAVAIL 0xf60
>>
>> #define BOOK3S_IRQPRIO_SYSTEM_RESET 0
>> #define BOOK3S_IRQPRIO_DATA_SEGMENT 1
>> @@ -117,7 +118,8 @@
>> #define BOOK3S_IRQPRIO_DECREMENTER 14
>> #define BOOK3S_IRQPRIO_PERFORMANCE_MONITOR 15
>> #define BOOK3S_IRQPRIO_EXTERNAL_LEVEL 16
>> -#define BOOK3S_IRQPRIO_MAX 17
>> +#define BOOK3S_IRQPRIO_FAC_UNAVAIL 17
>> +#define BOOK3S_IRQPRIO_MAX 18
>>
>> #define BOOK3S_HFLAG_DCBZ32 0x1
>> #define BOOK3S_HFLAG_SLB 0x2
>> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
>> index 8912608b7e1b..a9aea28c2677 100644
>> --- a/arch/powerpc/kvm/book3s.c
>> +++ b/arch/powerpc/kvm/book3s.c
>> @@ -143,6 +143,7 @@ static int kvmppc_book3s_vec2irqprio(unsigned int vec)
>> case 0xd00: prio = BOOK3S_IRQPRIO_DEBUG; break;
>> case 0xf20: prio = BOOK3S_IRQPRIO_ALTIVEC; break;
>> case 0xf40: prio = BOOK3S_IRQPRIO_VSX; break;
>> + case 0xf60: prio = BOOK3S_IRQPRIO_FAC_UNAVAIL; break;
>> default: prio = BOOK3S_IRQPRIO_MAX; break;
>> }
>>
>> @@ -273,6 +274,9 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
>> case BOOK3S_IRQPRIO_PERFORMANCE_MONITOR:
>> vec = BOOK3S_INTERRUPT_PERFMON;
>> break;
>> + case BOOK3S_IRQPRIO_FAC_UNAVAIL:
>> + vec = BOOK3S_INTERRUPT_FAC_UNAVAIL;
>> + break;
>> default:
>> deliver = 0;
>> printk(KERN_ERR "KVM: Unknown interrupt: 0x%x\n", priority);
>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>> index 60d0b6b745e7..bf6b11021250 100644
>> --- a/arch/powerpc/kvm/book3s_emulate.c
>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>> @@ -481,6 +481,15 @@ int kvmppc_core_emulate_mtspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val)
>> vcpu->arch.shadow_fscr = vcpu->arch.fscr & host_fscr;
>> break;
>> }
>> + case SPRN_EBBHR:
>> + vcpu->arch.ebbhr = spr_val;
>> + break;
>> + case SPRN_EBBRR:
>> + vcpu->arch.ebbrr = spr_val;
>> + break;
>> + case SPRN_BESCR:
>> + vcpu->arch.bescr = spr_val;
>> + break;
>> unprivileged:
>> default:
>> printk(KERN_INFO "KVM: invalid SPR write: %d\n", sprn);
>> @@ -607,6 +616,15 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>> case SPRN_FSCR:
>> *spr_val = vcpu->arch.fscr;
>> break;
>> + case SPRN_EBBHR:
>> + *spr_val = vcpu->arch.ebbhr;
>> + break;
>> + case SPRN_EBBRR:
>> + *spr_val = vcpu->arch.ebbrr;
>> + break;
>> + case SPRN_BESCR:
>> + *spr_val = vcpu->arch.bescr;
>> + break;
>> default:
>> unprivileged:
>> printk(KERN_INFO "KVM: invalid SPR read: %d\n", sprn);
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index 51d469f8c9fd..828056ec208f 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -900,6 +900,23 @@ int kvmppc_handle_exit_pr(struct kvm_run *run, struct kvm_vcpu *vcpu,
>> case BOOK3S_INTERRUPT_PERFMON:
>> r = RESUME_GUEST;
>> break;
>> + case BOOK3S_INTERRUPT_FAC_UNAVAIL:
>> + {
>> + /*
>> + * Check for the facility that need to be emulated
>> + */
>> + ulong fscr_ic = vcpu->arch.shadow_fscr >> 56;
>> + if (fscr_ic != FSCR_EBB_LG) {
>> + /*
>> + * We only disable EBB facility.
>> + * So only emulate that.
>
> I don't understand the comment. We emulate nothing at all here. We either
> - hit an EBB unavailable in which case we send the guest an illegal
> instruction interrupt or we
> - hit another facility interrupt in which case we forward the
> interrupt to the guest, but not the interrupt cause (fscr_ic).
>
What I wanted to achieve was to enable both TAR and DSCR and disable
EBB. The reason to disable EBB is that we are still not clear on how to
handle PMU details in PR. Now, with FSCR carrying that value, we get a
facility unavailable interrupt when we try to mfspr/mtspr a few EBB
related registers. The PR guest kernel does that on context switch
(_switch). What we do here is fall through and handle that via
mtspr/mfspr emulation.
If we get a facility unavailable interrupt for any other reason, that
means the PR guest has explicitly disabled that facility. Hence we
forward that as a facility unavailable interrupt to the guest, allowing
the PR guest to handle it.
> I think the EBB case should be explicit:
>
> /* We don't allow EBB inside the guest, so something must have gone
> terribly wrong */
> if (fscr_ic == FSCR_EBB_LG)
> BUG();
>
Instead of BUG(), we handle a few mfspr/mtspr via emulation, where we
mostly ignore them. For event-based branch instructions, the emulation
will fail and we will send 0x700 (program interrupt) to the PR guest,
right?
> vcpu->arch.fscr &= ~FSCR_IC_MASK;
> vcpu->arch.fscr |= vcpu->arch.shadow_fscr & FSCR_IC_MASK;
> kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
> r = RESUME_GUEST;
> break;
>
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 07/10] KVM: PPC: BOOK3S: PR: Emulate facility status and control register
From: Aneesh Kumar K.V @ 2014-01-31 11:28 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140130060000.GB10611@iris.ozlabs.ibm.com>
Paul Mackerras <paulus@samba.org> writes:
> On Tue, Jan 28, 2014 at 10:14:12PM +0530, Aneesh Kumar K.V wrote:
>> We allow priv-mode update of this. The guest value is saved in fscr,
>> and the value actually used is saved in shadow_fscr. shadow_fscr
>> only contains values that are allowed by the host. On
>> facility unavailable interrupt, if the facility is allowed by fscr
>> but disabled in shadow_fscr we need to emulate the support. Currently
>> all but EBB is disabled. We still don't support performance monitoring
>> in PR guest.
>
> ...
>
>> + /*
>> + * Save the current fscr in shadow fscr
>> + */
>> + mfspr r3,SPRN_FSCR
>> + PPC_STL r3, VCPU_SHADOW_FSCR(r7)
>
> I don't think you need to do this. What could possibly have changed
> FSCR since we loaded it on the way into the guest?
The reason for the facility unavailable interrupt is encoded in FSCR, right?
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 03/10] KVM: PPC: BOOK3S: PR: Emulate instruction counter
From: Alexander Graf @ 2014-01-31 11:28 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87r47ojsu6.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 12:25, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> Writing to IC is not allowed in the privileged mode.
>>
>> This is not a patch description.
>>
>>>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> ---
>>> arch/powerpc/include/asm/kvm_host.h | 1 +
>>> arch/powerpc/kvm/book3s_emulate.c | 3 +++
>>> arch/powerpc/kvm/book3s_pr.c | 2 ++
>>> 3 files changed, 6 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>> index 9ebdd12e50a9..e0b13aca98e6 100644
>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>> @@ -509,6 +509,7 @@ struct kvm_vcpu_arch {
>>> /* Time base value when we entered the guest */
>>> u64 entry_tb;
>>> u64 entry_vtb;
>>> + u64 entry_ic;
>>> u32 tcr;
>>> ulong tsr; /* we need to perform set/clr_bits() which requires ulong */
>>> u32 ivor[64];
>>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>>> index 4b58d8a90cb5..abe6f3057e5b 100644
>>> --- a/arch/powerpc/kvm/book3s_emulate.c
>>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>>> @@ -531,6 +531,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>>> case SPRN_VTB:
>>> *spr_val = vcpu->arch.vtb;
>>> break;
>>> + case SPRN_IC:
>>> + *spr_val = vcpu->arch.ic;
>>> + break;
>>> case SPRN_GQR0:
>>> case SPRN_GQR1:
>>> case SPRN_GQR2:
>>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>>> index b5598e9cdd09..51d469f8c9fd 100644
>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>> @@ -121,6 +121,7 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
>>> */
>>> vcpu->arch.entry_tb = get_tb();
>>> vcpu->arch.entry_vtb = get_vtb();
>>> + vcpu->arch.entry_ic = mfspr(SPRN_IC);
>>
>> Is this implemented on all systems?
>>
>>>
>>> }
>>>
>>> @@ -174,6 +175,7 @@ out:
>>> vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb;
>>> vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb;
>>> vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb;
>>> + vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic;
>>
>> This is getting quite convoluted. How about we act slightly more fuzzy
>> and put all of this into vcpu_load/put?
>>
>
> I am not sure whether vcpu_load/put is too early/late to save these
> context ?
It'd mean we treat instruction emulation as part of guest overhead and
time, but we'd make the entry/exit path faster. Unlike with HV KVM,
guest entry/exit is pretty hot due to the massive amounts of instruction
emulation we need to do.
Alex
^ permalink raw reply
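The entry/exit accounting debated in the message above can be modelled outside the kernel. The following is only a minimal user-space sketch, not the KVM code: `struct vcpu_counters`, `guest_enter()` and `guest_exit()` are hypothetical names, and the host counter value is passed in as a plain argument instead of being read with `mfspr(SPRN_IC)`.

```c
#include <stdint.h>

/* Hypothetical bookkeeping, loosely modelled on entry_ic / ic in the patch. */
struct vcpu_counters {
	uint64_t entry_ic;	/* host counter sampled when the guest is entered */
	uint64_t ic;		/* accumulated guest-visible count */
};

/* Sample the host counter on the way into the guest. */
static inline void guest_enter(struct vcpu_counters *c, uint64_t host_ic)
{
	c->entry_ic = host_ic;
}

/*
 * On the way out, credit the guest only with what elapsed while it ran.
 * Unsigned subtraction keeps the delta correct across counter wrap-around.
 */
static inline void guest_exit(struct vcpu_counters *c, uint64_t host_ic)
{
	c->ic += host_ic - c->entry_ic;
}
```

Moving the two calls from the low-level entry/exit path to vcpu_load/put, as suggested in the thread, would not change this arithmetic. It would only widen the window being measured, so time spent emulating instructions in the host would be counted as guest time.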
* Re: [RFC PATCH 03/10] KVM: PPC: BOOK3S: PR: Emulate instruction counter
From: Aneesh Kumar K.V @ 2014-01-31 11:25 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E92F08.6020803@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> Writing to IC is not allowed in the privileged mode.
>
> This is not a patch description.
>
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> ---
>> arch/powerpc/include/asm/kvm_host.h | 1 +
>> arch/powerpc/kvm/book3s_emulate.c | 3 +++
>> arch/powerpc/kvm/book3s_pr.c | 2 ++
>> 3 files changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index 9ebdd12e50a9..e0b13aca98e6 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -509,6 +509,7 @@ struct kvm_vcpu_arch {
>> /* Time base value when we entered the guest */
>> u64 entry_tb;
>> u64 entry_vtb;
>> + u64 entry_ic;
>> u32 tcr;
>> ulong tsr; /* we need to perform set/clr_bits() which requires ulong */
>> u32 ivor[64];
>> diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
>> index 4b58d8a90cb5..abe6f3057e5b 100644
>> --- a/arch/powerpc/kvm/book3s_emulate.c
>> +++ b/arch/powerpc/kvm/book3s_emulate.c
>> @@ -531,6 +531,9 @@ int kvmppc_core_emulate_mfspr_pr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val
>> case SPRN_VTB:
>> *spr_val = vcpu->arch.vtb;
>> break;
>> + case SPRN_IC:
>> + *spr_val = vcpu->arch.ic;
>> + break;
>> case SPRN_GQR0:
>> case SPRN_GQR1:
>> case SPRN_GQR2:
>> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
>> index b5598e9cdd09..51d469f8c9fd 100644
>> --- a/arch/powerpc/kvm/book3s_pr.c
>> +++ b/arch/powerpc/kvm/book3s_pr.c
>> @@ -121,6 +121,7 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu,
>> */
>> vcpu->arch.entry_tb = get_tb();
>> vcpu->arch.entry_vtb = get_vtb();
>> + vcpu->arch.entry_ic = mfspr(SPRN_IC);
>
> Is this implemented on all systems?
>
>>
>> }
>>
>> @@ -174,6 +175,7 @@ out:
>> vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb;
>> vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb;
>> vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb;
>> + vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic;
>
> This is getting quite convoluted. How about we act slightly more fuzzy
> and put all of this into vcpu_load/put?
>
I am not sure whether vcpu_load/put is too early or too late to save
this context?
-aneesh
^ permalink raw reply
* Re: [RFC PATCH 02/10] KVM: PPC: BOOK3S: PR: Emulate virtual timebase register
From: Aneesh Kumar K.V @ 2014-01-31 10:57 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, agraf, kvm-ppc, kvm
In-Reply-To: <20140130054913.GA10611@iris.ozlabs.ibm.com>
Paul Mackerras <paulus@samba.org> writes:
> On Tue, Jan 28, 2014 at 10:14:07PM +0530, Aneesh Kumar K.V wrote:
>> virtual time base register is a per vm register and need to saved
>> and restored on vm exit and entry. Writing to VTB is not allowed
>> in the privileged mode.
> ...
>
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +#define mfvtb() ({unsigned long rval; \
>> + asm volatile("mfspr %0, %1" : \
>> + "=r" (rval) : "i" (SPRN_VTB)); rval;})
>
> The mfspr will be a no-op on anything before POWER8, meaning the
> result will be whatever value was in the destination GPR before the
> mfspr. I suppose that may not matter if the result is only ever used
> when we're running on a POWER8 host, but I would feel more comfortable
> if we had explicit feature tests to make sure of that, rather than
> possibly doing computations with unpredictable values.
>
> With your patch, a guest on a POWER7 or a PPC970 could do a read from
> VTB and get garbage -- first, there is nothing to stop userspace from
> requesting POWER8 emulation on an older machine, and secondly, even if
> the virtual machine is a PPC970 (say) you don't implement
> unimplemented SPR semantics for VTB (no-op if PR=0, illegal
> instruction interrupt if PR=1).
OK, so that means we need to do something like this?
struct cpu_spec *s = find_cpuspec(vcpu->arch.pvr);
if (s->cpu_features & CPU_FTR_ARCH_207S) {
}
>
> On the whole I think it is reasonable to reject an attempt to set the
> virtual PVR to a POWER8 PVR value if we are not running on a POWER8
> host, because emulating all the new POWER8 features in software
> (particularly transactional memory) would not be feasible. Alex may
> disagree. :)
That would make it much simpler.
-aneesh
^ permalink raw reply
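The explicit feature test asked for in the review above could look roughly like the sketch below. It is a user-space model under stated assumptions: `FTR_ARCH_207S` is a made-up bit standing in for CPU_FTR_ARCH_207S, and `struct vcpu_model`, `enum emu_result` and `emulate_mfspr_vtb()` are hypothetical names, not KVM interfaces.

```c
#include <stdint.h>

/* Hypothetical feature bit, standing in for CPU_FTR_ARCH_207S. */
#define FTR_ARCH_207S	(1u << 0)

enum emu_result { EMU_DONE, EMU_FAIL };

/* Hypothetical, stripped-down view of the virtual CPU. */
struct vcpu_model {
	uint32_t features;	/* features of the CPU being emulated */
	uint64_t vtb;		/* virtual time base maintained by the host */
};

/*
 * Return the VTB only when the emulated CPU implements it. Otherwise
 * report failure and leave *val untouched, so the caller can apply the
 * unimplemented-SPR semantics instead of computing with a stale value.
 */
static inline enum emu_result emulate_mfspr_vtb(const struct vcpu_model *v,
						uint64_t *val)
{
	if (!(v->features & FTR_ARCH_207S))
		return EMU_FAIL;
	*val = v->vtb;
	return EMU_DONE;
}
```

Rejecting a POWER8 PVR on an older host, as proposed in the thread, would make this check a one-time decision when the virtual PVR is set instead of a per-access test.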
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Alexander Graf @ 2014-01-31 10:47 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Paul Mackerras, linuxppc-dev, kvm-ppc, kvm-devel
In-Reply-To: <87y51wjv0w.fsf@linux.vnet.ibm.com>
On 31.01.2014, at 11:38, Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> wrote:
> Alexander Graf <agraf@suse.de> writes:
>
>> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>>> We definitely don't need to emulate mtspr, because both the registers
>>> are hypervisor resource.
>>
>> This patch description doesn't cover what the patch actually does. It
>> changes the implementation from "always tell the guest it uses 100%" to
>> "give the guest an accurate amount of cpu time spent inside guest
>> context".
>
> Will fix that
>
>>
>> Also, I think we either go with full hyp semantics which means we also
>> emulate the offset or we go with no hyp awareness in the guest at all
>> which means we also don't emulate SPURR which is a hyp privileged
>> register.
>
> Can you clarify this ?
In the 2.06 ISA SPURR is hypervisor privileged. That changed for 2.07
where it became supervisor privileged. So I suppose your patch is ok.
When reviewing those patches I only had 2.06 around because power.org
was broken.
Alex
^ permalink raw reply
* Re: [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation
From: Aneesh Kumar K.V @ 2014-01-31 10:38 UTC (permalink / raw)
To: Alexander Graf; +Cc: paulus, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <52E92D15.8000901@suse.de>
Alexander Graf <agraf@suse.de> writes:
> On 01/28/2014 05:44 PM, Aneesh Kumar K.V wrote:
>> We definitely don't need to emulate mtspr, because both the registers
>> are hypervisor resource.
>
> This patch description doesn't cover what the patch actually does. It
> changes the implementation from "always tell the guest it uses 100%" to
> "give the guest an accurate amount of cpu time spent inside guest
> context".
Will fix that
>
> Also, I think we either go with full hyp semantics which means we also
> emulate the offset or we go with no hyp awareness in the guest at all
> which means we also don't emulate SPURR which is a hyp privileged
> register.
Can you clarify this ?
>
> Otherwise I like the patch :).
>
-aneesh
^ permalink raw reply
* [PATCH V2 2/2] powerpc/mm: Fix compile error of pgtable-ppc64.h
From: Aneesh Kumar K.V @ 2014-01-31 10:29 UTC (permalink / raw)
To: benh, paulus, stable; +Cc: linuxppc-dev, Aneesh Kumar K.V, Li Zhong
In-Reply-To: <1391164141-14073-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
From: Li Zhong <zhong@linux.vnet.ibm.com>
It seems that a forward declaration does not work well with a typedef, so
use struct spinlock directly to avoid the following build errors:
In file included from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:17,
from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
upstream sha1:fd120dc2e205d2318a8b47d6d8098b789e3af67d
for 3.13 stable series
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c89a71..bc141c950b1e 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
#define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
- spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+ struct spinlock *old_pmd_ptl)
{
/*
* Archs like ppc64 use pgtable to store per pmd
--
1.8.3.2
^ permalink raw reply related
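The fix above relies on the difference between repeating a typedef and forward-declaring a struct tag. The sketch below imitates the two headers in a single file, assuming a compiler mode that rejects a repeated typedef (as the gcc in the build log did; C11 permits an identical redeclaration). The `int raw` member is a placeholder, not the real spinlock layout.

```c
/*
 * "Early header": declare only the struct tag. A tag may be declared any
 * number of times and does not clash with a typedef that appears later.
 */
struct spinlock;

static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
					 struct spinlock *old_pmd_ptl)
{
	/* Pointers to an incomplete type are fine as long as they are
	 * not dereferenced. */
	(void)new_pmd_ptl;
	(void)old_pmd_ptl;
	return 1;
}

/*
 * "Later header": the full definition and the typedef, which now appears
 * exactly once in the translation unit. The member is a placeholder.
 */
typedef struct spinlock {
	int raw;
} spinlock_t;
```

Had the early header carried its own `typedef struct spinlock spinlock_t;`, the later header would have declared the same typedef name a second time, which is what the error in the build log reports.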
* [PATCH V2 1/2] powerpc/thp: Fix crash on mremap
From: Aneesh Kumar K.V @ 2014-01-31 10:29 UTC (permalink / raw)
To: benh, paulus, stable; +Cc: linuxppc-dev, Aneesh Kumar K.V
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
This patch fixes the crash below:
NIP [c00000000004cee4] .__hash_page_thp+0x2a4/0x440
LR [c0000000000439ac] .hash_page+0x18c/0x5e0
...
Call Trace:
[c000000736103c40] [00001ffffb000000] 0x1ffffb000000(unreliable)
[437908.479693] [c000000736103d50] [c0000000000439ac] .hash_page+0x18c/0x5e0
[437908.479699] [c000000736103e30] [c00000000000924c] .do_hash_page+0x4c/0x58
On ppc64 we use the pgtable for storing the hpte slot information and
store the address of the pgtable at a constant offset (PTRS_PER_PMD) from
the pmd. On mremap, when we switch the pmd, we need to withdraw and deposit
the pgtable again, so that we find the pgtable at the PTRS_PER_PMD offset
from the new pmd.
We also want to move the withdraw and deposit before the set_pmd so
that, when a page fault finds the pmd to be trans huge, we can be sure
that the pgtable can be located at the offset.
upstream SHA1: b3084f4db3aeb991c507ca774337c7e7893ed04f
for 3.13 stable series
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 14 ++++++++++++++
include/asm-generic/pgtable.h | 12 ++++++++++++
mm/huge_memory.c | 14 +++++---------
3 files changed, 31 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 4a191c472867..d27960c89a71 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -558,5 +558,19 @@ extern pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
#define __HAVE_ARCH_PMDP_INVALIDATE
extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp);
+
+#define pmd_move_must_withdraw pmd_move_must_withdraw
+typedef struct spinlock spinlock_t;
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+ spinlock_t *old_pmd_ptl)
+{
+ /*
+ * Archs like ppc64 use pgtable to store per pmd
+ * specific information. So when we switch the pmd,
+ * we should also withdraw and deposit the pgtable
+ */
+ return true;
+}
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index db0923458940..8e4f41d9af4d 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -558,6 +558,18 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
}
#endif
+#ifndef pmd_move_must_withdraw
+static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
+ spinlock_t *old_pmd_ptl)
+{
+ /*
+ * With split pmd lock we also need to move preallocated
+ * PTE page table if new_pmd is on different PMD page table.
+ */
+ return new_pmd_ptl != old_pmd_ptl;
+}
+#endif
+
/*
* This function is meant to be used by sites walking pagetables with
* the mmap_sem hold in read mode to protect against MADV_DONTNEED and
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 95d1acb0f3d2..5d80c53b87cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1502,19 +1502,15 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
VM_BUG_ON(!pmd_none(*new_pmd));
- set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
- if (new_ptl != old_ptl) {
- pgtable_t pgtable;
- /*
- * Move preallocated PTE page table if new_pmd is on
- * different PMD page table.
- */
+ if (pmd_move_must_withdraw(new_ptl, old_ptl)) {
+ pgtable_t pgtable;
pgtable = pgtable_trans_huge_withdraw(mm, old_pmd);
pgtable_trans_huge_deposit(mm, new_pmd, pgtable);
-
- spin_unlock(new_ptl);
}
+ set_pmd_at(mm, new_addr, new_pmd, pmd_mksoft_dirty(pmd));
+ if (new_ptl != old_ptl)
+ spin_unlock(new_ptl);
spin_unlock(old_ptl);
}
out:
--
1.8.3.2
^ permalink raw reply related
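The patch above uses a common kernel idiom: an architecture header defines a hook together with a macro of the same name, and the generic header supplies a fallback only if that macro is absent. The sketch below reproduces the idiom in one file as an illustration only. `MODEL_ARCH_PPC64` is a made-up switch that is not a kernel config symbol, and `struct spinlock` is a placeholder type.

```c
#include <stdbool.h>

struct spinlock { int placeholder; };

/* "Arch header": compiled in only when the hypothetical switch is set. */
#ifdef MODEL_ARCH_PPC64
#define pmd_move_must_withdraw pmd_move_must_withdraw
static inline bool pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
					  struct spinlock *old_pmd_ptl)
{
	/* The pgtable carries per-pmd data, so always move it. */
	(void)new_pmd_ptl;
	(void)old_pmd_ptl;
	return true;
}
#endif

/* "Generic header": fallback used when no arch override was defined. */
#ifndef pmd_move_must_withdraw
static inline bool pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
					  struct spinlock *old_pmd_ptl)
{
	/* Move the pgtable only if the pmd lands under a different lock. */
	return new_pmd_ptl != old_pmd_ptl;
}
#endif
```

Built as is, the generic fallback is selected. Building with `-DMODEL_ARCH_PPC64` selects the always-true variant, which is the behaviour the ppc64 header asks for.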
* [PATCH 2/2][v8] powerpc/config: Enable memory driver
From: Prabhakar Kushwaha @ 2014-01-31 9:40 UTC (permalink / raw)
To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha
The Freescale IFC controller driver has been moved to drivers/memory,
so enable the memory driver in the powerpc configs.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next
Changes for v2: Sending as it is
Changes for v3: Sending as it is
Changes for v4: Rebased to
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
changes for v5:
- Rebased to branch next of
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
Changes for v6: Sending as it is
Changes for v7: Sending as it is
Changes for v8: Sending as it is
arch/powerpc/configs/corenet32_smp_defconfig | 1 +
arch/powerpc/configs/corenet64_smp_defconfig | 1 +
arch/powerpc/configs/mpc85xx_defconfig | 1 +
arch/powerpc/configs/mpc85xx_smp_defconfig | 1 +
4 files changed, 4 insertions(+)
diff --git a/arch/powerpc/configs/corenet32_smp_defconfig b/arch/powerpc/configs/corenet32_smp_defconfig
index bbd794d..087d437 100644
--- a/arch/powerpc/configs/corenet32_smp_defconfig
+++ b/arch/powerpc/configs/corenet32_smp_defconfig
@@ -142,6 +142,7 @@ CONFIG_RTC_DRV_DS3232=y
CONFIG_RTC_DRV_CMOS=y
CONFIG_UIO=y
CONFIG_STAGING=y
+CONFIG_MEMORY=y
CONFIG_VIRT_DRIVERS=y
CONFIG_FSL_HV_MANAGER=y
CONFIG_EXT2_FS=y
diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index 63508dd..25b03f8 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -129,6 +129,7 @@ CONFIG_EDAC=y
CONFIG_EDAC_MM_EDAC=y
CONFIG_DMADEVICES=y
CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
CONFIG_ISO9660_FS=m
diff --git a/arch/powerpc/configs/mpc85xx_defconfig b/arch/powerpc/configs/mpc85xx_defconfig
index 83d3550..cba638c 100644
--- a/arch/powerpc/configs/mpc85xx_defconfig
+++ b/arch/powerpc/configs/mpc85xx_defconfig
@@ -216,6 +216,7 @@ CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1307=y
CONFIG_DMADEVICES=y
CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
# CONFIG_NET_DMA is not set
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
diff --git a/arch/powerpc/configs/mpc85xx_smp_defconfig b/arch/powerpc/configs/mpc85xx_smp_defconfig
index 4b68629..e315b8a 100644
--- a/arch/powerpc/configs/mpc85xx_smp_defconfig
+++ b/arch/powerpc/configs/mpc85xx_smp_defconfig
@@ -217,6 +217,7 @@ CONFIG_RTC_DRV_CMOS=y
CONFIG_RTC_DRV_DS1307=y
CONFIG_DMADEVICES=y
CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
# CONFIG_NET_DMA is not set
CONFIG_EXT2_FS=y
CONFIG_EXT3_FS=y
--
1.7.9.5
^ permalink raw reply related
* [PATCH 1/2][v8] driver/memory:Move Freescale IFC driver to a common driver
From: Prabhakar Kushwaha @ 2014-01-31 9:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha
The Freescale IFC controller has been used on mpc8xxx. It will be used
on ARM-based SoCs as well. This patch moves the driver to drivers/memory
and fixes the header file includes.
Also remove module_platform_driver() and instead call
platform_driver_register() from subsys_initcall() to make sure this module
has been loaded before MTD partition parsing starts.
Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
Based upon git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git
Branch next
Changes for v2:
- Move fsl_ifc in driver/memory
Changes for v3:
- move device tree bindings to memory
Changes for v4: Rebased to
git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
Changes for v5:
- Moved powerpc/Kconfig option to driver/memory
Changes for v6:
- Update Kconfig details
Changes for v7:
- Update Kconfig
Changes for v8:
- Update Kconfig help
.../{powerpc => memory-controllers}/fsl/ifc.txt | 0
arch/powerpc/Kconfig | 4 ----
arch/powerpc/sysdev/Makefile | 1 -
drivers/memory/Kconfig | 8 ++++++++
drivers/memory/Makefile | 1 +
{arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c | 8 ++++++--
drivers/mtd/nand/fsl_ifc_nand.c | 2 +-
.../include/asm => include/linux}/fsl_ifc.h | 0
8 files changed, 16 insertions(+), 8 deletions(-)
rename Documentation/devicetree/bindings/{powerpc => memory-controllers}/fsl/ifc.txt (100%)
rename {arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c (98%)
rename {arch/powerpc/include/asm => include/linux}/fsl_ifc.h (100%)
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/ifc.txt
rename to Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index a5e5d2e..00edd29 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -734,10 +734,6 @@ config FSL_LBC
controller. Also contains some common code used by
drivers for specific local bus peripherals.
-config FSL_IFC
- bool
- depends on FSL_SOC
-
config FSL_GTM
bool
depends on PPC_83xx || QUICC_ENGINE || CPM2
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index f67ac90..afbcc37 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -21,7 +21,6 @@ obj-$(CONFIG_FSL_SOC) += fsl_soc.o fsl_mpic_err.o
obj-$(CONFIG_FSL_PCI) += fsl_pci.o $(fsl-msi-obj-y)
obj-$(CONFIG_FSL_PMC) += fsl_pmc.o
obj-$(CONFIG_FSL_LBC) += fsl_lbc.o
-obj-$(CONFIG_FSL_IFC) += fsl_ifc.o
obj-$(CONFIG_FSL_GTM) += fsl_gtm.o
obj-$(CONFIG_FSL_85XX_CACHE_SRAM) += fsl_85xx_l2ctlr.o fsl_85xx_cache_sram.o
obj-$(CONFIG_SIMPLE_GPIO) += simple_gpio.o
diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index 29a11db..57721ed 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -50,4 +50,12 @@ config TEGRA30_MC
analysis, especially for IOMMU/SMMU(System Memory Management
Unit) module.
+config FSL_IFC
+ bool "Freescale Integrated Flash Controller"
+ depends on FSL_SOC
+ help
+ This driver is for the Integrated Flash Controller(IFC) module
+ available in Freescale SoCs. This controller allows to handle
+ devices such as NOR, NAND, FPGA and ASIC etc.
+
endif
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index 969d923..f2bf25c 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -6,6 +6,7 @@ ifeq ($(CONFIG_DDR),y)
obj-$(CONFIG_OF) += of_memory.o
endif
obj-$(CONFIG_TI_EMIF) += emif.o
+obj-$(CONFIG_FSL_IFC) += fsl_ifc.o
obj-$(CONFIG_MVEBU_DEVBUS) += mvebu-devbus.o
obj-$(CONFIG_TEGRA20_MC) += tegra20-mc.o
obj-$(CONFIG_TEGRA30_MC) += tegra30-mc.o
diff --git a/arch/powerpc/sysdev/fsl_ifc.c b/drivers/memory/fsl_ifc.c
similarity index 98%
rename from arch/powerpc/sysdev/fsl_ifc.c
rename to drivers/memory/fsl_ifc.c
index fbc885b..3d5d792 100644
--- a/arch/powerpc/sysdev/fsl_ifc.c
+++ b/drivers/memory/fsl_ifc.c
@@ -29,8 +29,8 @@
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/platform_device.h>
+#include <linux/fsl_ifc.h>
#include <asm/prom.h>
-#include <asm/fsl_ifc.h>
struct fsl_ifc_ctrl *fsl_ifc_ctrl_dev;
EXPORT_SYMBOL(fsl_ifc_ctrl_dev);
@@ -298,7 +298,11 @@ static struct platform_driver fsl_ifc_ctrl_driver = {
.remove = fsl_ifc_ctrl_remove,
};
-module_platform_driver(fsl_ifc_ctrl_driver);
+static int __init fsl_ifc_init(void)
+{
+ return platform_driver_register(&fsl_ifc_ctrl_driver);
+}
+subsys_initcall(fsl_ifc_init);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Freescale Semiconductor");
diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index 4335577..865b323 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -30,7 +30,7 @@
#include <linux/mtd/nand.h>
#include <linux/mtd/partitions.h>
#include <linux/mtd/nand_ecc.h>
-#include <asm/fsl_ifc.h>
+#include <linux/fsl_ifc.h>
#define FSL_IFC_V1_1_0 0x01010000
#define ERR_BYTE 0xFF /* Value returned for read
diff --git a/arch/powerpc/include/asm/fsl_ifc.h b/include/linux/fsl_ifc.h
similarity index 100%
rename from arch/powerpc/include/asm/fsl_ifc.h
rename to include/linux/fsl_ifc.h
--
1.7.9.5
^ permalink raw reply related
* Re: [PATCH v2] kexec/ppc64 fix device tree endianess issues for memory attributes
From: Simon Horman @ 2014-01-31 5:21 UTC (permalink / raw)
To: Laurent Dufour; +Cc: Mahesh Salgaonkar, kexec, linuxppc-dev, Anton Blanchard
In-Reply-To: <20140130150622.11156.39497.stgit@nimbus>
On Thu, Jan 30, 2014 at 04:06:22PM +0100, Laurent Dufour wrote:
> All the attributes exposed in the device tree are in Big Endian format.
>
> This patch add the byte swap operation for some entries which were not yet
> processed, including those fixed by the following kernel's patch :
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-January/114720.html
>
> To work on PPC64 Little Endian mode, kexec now requires that the kernel's
> patch mentioned above is applied on the kexecing kernel.
>
> Tested on ppc64 LPAR (kexec/dump) and ppc64le in a Qemu/KVM guest (kexec)
>
> Changes from v1 :
> * add processing of the following entries :
> - ibm,dynamic-reconfiguration-memory
> - chosen/linux,kernel-end
> - chosen/linux,crashkernel-base & size
> - chosen/linux,memory-limit
> - chosen/linux,htab-base & size
> - linux,tce-base & size
> - memory@/reg
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Thanks, applied.
^ permalink raw reply
* [PATCH 3/3] cpuidle/ppc: Split timer_interrupt() into timer handling and interrupt handling routines
From: Preeti U Murthy @ 2014-01-31 4:10 UTC (permalink / raw)
To: deepthi, svaidy, toshi.kani, arnd, geoff, mpe, rusty,
linux-kernel, paul.gortmaker, afleming, anton, srivatsa.bhat,
benh, paulus, ady8radu, linuxppc-dev
In-Reply-To: <20140131040631.13071.19603.stgit@preeti.in.ibm.com>
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Split timer_interrupt(), which is the local timer interrupt handler on ppc
into routines called during regular interrupt handling and __timer_interrupt(),
which takes care of running local timers and collecting time related stats.
This will enable callers interested only in running expired local timers to
directly call into __timer_interrupt(). One of the use cases of this is the
tick broadcast IPI handling in which the sleeping CPUs need to handle the local
timers that have expired.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
arch/powerpc/kernel/time.c | 81 +++++++++++++++++++++++++-------------------
1 file changed, 46 insertions(+), 35 deletions(-)
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 3ff97db..df2989b 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -478,6 +478,47 @@ void arch_irq_work_raise(void)
#endif /* CONFIG_IRQ_WORK */
+void __timer_interrupt(void)
+{
+ struct pt_regs *regs = get_irq_regs();
+ u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+ struct clock_event_device *evt = &__get_cpu_var(decrementers);
+ u64 now;
+
+ trace_timer_interrupt_entry(regs);
+
+ if (test_irq_work_pending()) {
+ clear_irq_work_pending();
+ irq_work_run();
+ }
+
+ now = get_tb_or_rtc();
+ if (now >= *next_tb) {
+ *next_tb = ~(u64)0;
+ if (evt->event_handler)
+ evt->event_handler(evt);
+ __get_cpu_var(irq_stat).timer_irqs_event++;
+ } else {
+ now = *next_tb - now;
+ if (now <= DECREMENTER_MAX)
+ set_dec((int)now);
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
+ __get_cpu_var(irq_stat).timer_irqs_others++;
+ }
+
+#ifdef CONFIG_PPC64
+ /* collect purr register values often, for accurate calculations */
+ if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
+ struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
+ cu->current_tb = mfspr(SPRN_PURR);
+ }
+#endif
+
+ trace_timer_interrupt_exit(regs);
+}
+
/*
* timer_interrupt - gets called when the decrementer overflows,
* with interrupts disabled.
@@ -486,8 +527,6 @@ void timer_interrupt(struct pt_regs * regs)
{
struct pt_regs *old_regs;
u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
- struct clock_event_device *evt = &__get_cpu_var(decrementers);
- u64 now;
/* Ensure a positive value is written to the decrementer, or else
* some CPUs will continue to take decrementer exceptions.
@@ -519,39 +558,7 @@ void timer_interrupt(struct pt_regs * regs)
old_regs = set_irq_regs(regs);
irq_enter();
- trace_timer_interrupt_entry(regs);
-
- if (test_irq_work_pending()) {
- clear_irq_work_pending();
- irq_work_run();
- }
-
- now = get_tb_or_rtc();
- if (now >= *next_tb) {
- *next_tb = ~(u64)0;
- if (evt->event_handler)
- evt->event_handler(evt);
- __get_cpu_var(irq_stat).timer_irqs_event++;
- } else {
- now = *next_tb - now;
- if (now <= DECREMENTER_MAX)
- set_dec((int)now);
- /* We may have raced with new irq work */
- if (test_irq_work_pending())
- set_dec(1);
- __get_cpu_var(irq_stat).timer_irqs_others++;
- }
-
-#ifdef CONFIG_PPC64
- /* collect purr register values often, for accurate calculations */
- if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
- struct cpu_usage *cu = &__get_cpu_var(cpu_usage_array);
- cu->current_tb = mfspr(SPRN_PURR);
- }
-#endif
-
- trace_timer_interrupt_exit(regs);
-
+ __timer_interrupt();
irq_exit();
set_irq_regs(old_regs);
}
@@ -828,6 +835,10 @@ static void decrementer_set_mode(enum clock_event_mode mode,
/* Interrupt handler for the timer broadcast IPI */
void tick_broadcast_ipi_handler(void)
{
+ u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
+
+ *next_tb = get_tb_or_rtc();
+ __timer_interrupt();
}
static void register_decrementer_clockevent(int cpu)
^ permalink raw reply related
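The control flow that the patch above factors out can be followed in a small user-space model. This is a sketch only: `struct timer_model`, `model_timer_core()` and `model_broadcast_ipi()` are hypothetical names, the current time base is passed in as an argument, and irq work, tracing and PURR sampling are left out.

```c
#include <stdint.h>

#define DECREMENTER_MAX	0x7fffffff

/* Hypothetical per-CPU timer state. */
struct timer_model {
	uint64_t next_tb;	/* time base value of the next event */
	int handler_runs;	/* how often the event handler was invoked */
	int64_t dec;		/* last value programmed into the decrementer */
};

/* Models the shared core: run the event if it is due, else re-arm. */
static inline void model_timer_core(struct timer_model *t, uint64_t now)
{
	if (now >= t->next_tb) {
		t->next_tb = ~(uint64_t)0;	/* no event pending any more */
		t->handler_runs++;
	} else {
		uint64_t delta = t->next_tb - now;

		if (delta <= DECREMENTER_MAX)
			t->dec = (int64_t)delta;
	}
}

/* Models the broadcast IPI path: force expiry, then reuse the core. */
static inline void model_broadcast_ipi(struct timer_model *t, uint64_t now)
{
	t->next_tb = now;
	model_timer_core(t, now);
}
```

Setting `next_tb` to the current time before calling the core mirrors what tick_broadcast_ipi_handler() does in the patch: a CPU woken by the IPI always takes the "event is due" branch, whatever deadline it had programmed before going to sleep.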
* [PATCH 2/3] powerpc: Implement tick broadcast IPI as a fixed IPI message
From: Preeti U Murthy @ 2014-01-31 4:10 UTC (permalink / raw)
To: deepthi, svaidy, toshi.kani, arnd, geoff, mpe, rusty,
linux-kernel, paul.gortmaker, afleming, anton, srivatsa.bhat,
benh, paulus, ady8radu, linuxppc-dev
In-Reply-To: <20140131040631.13071.19603.stgit@preeti.in.ibm.com>
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
For scalability and performance reasons, we want the tick broadcast IPIs
to be handled as efficiently as possible. Fixed IPI messages
are one of the most efficient mechanisms available - they are faster than
the smp_call_function mechanism because the IPI handlers are fixed and hence
they don't involve costly operations such as adding IPI handlers to the target
CPU's function queue, acquiring locks for synchronization etc.
Luckily we have an unused IPI message slot, so use that to implement
tick broadcast IPIs efficiently.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[Functions renamed to tick_broadcast* and Changelog modified by
Preeti U. Murthy<preeti@linux.vnet.ibm.com>]
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/include/asm/time.h | 1 +
arch/powerpc/kernel/smp.c | 19 +++++++++++++++----
arch/powerpc/kernel/time.c | 5 +++++
arch/powerpc/platforms/cell/interrupt.c | 2 +-
arch/powerpc/platforms/ps3/smp.c | 2 +-
6 files changed, 24 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9f7356b..ff51046 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
* in /proc/interrupts will be wrong!!! --Troy */
#define PPC_MSG_CALL_FUNCTION 0
#define PPC_MSG_RESCHEDULE 1
-#define PPC_MSG_UNUSED 2
+#define PPC_MSG_TICK_BROADCAST 2
#define PPC_MSG_DEBUGGER_BREAK 3
/* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index c1f2676..1d428e6 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -28,6 +28,7 @@ extern struct clock_event_device decrementer_clockevent;
struct rtc_time;
extern void to_tm(int tim, struct rtc_time * tm);
extern void GregorianDay(struct rtc_time *tm);
+extern void tick_broadcast_ipi_handler(void);
extern void generic_calibrate_decr(void);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ee7d76b..6f06f05 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -35,6 +35,7 @@
#include <asm/ptrace.h>
#include <linux/atomic.h>
#include <asm/irq.h>
+#include <asm/hw_irq.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/prom.h>
@@ -145,9 +146,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
}
-static irqreturn_t unused_action(int irq, void *data)
+static irqreturn_t tick_broadcast_ipi_action(int irq, void *data)
{
- /* This slot is unused and hence available for use, if needed */
+ tick_broadcast_ipi_handler();
return IRQ_HANDLED;
}
@@ -168,14 +169,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] = call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
- [PPC_MSG_UNUSED] = unused_action,
+ [PPC_MSG_TICK_BROADCAST] = tick_broadcast_ipi_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
};
const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] = "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
- [PPC_MSG_UNUSED] = "ipi unused",
+ [PPC_MSG_TICK_BROADCAST] = "ipi tick-broadcast",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
};
@@ -251,6 +252,8 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
+ if (all & IPI_MESSAGE(PPC_MSG_TICK_BROADCAST))
+ tick_broadcast_ipi_handler();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -289,6 +292,14 @@ void arch_send_call_function_ipi_mask(const struct cpumask *mask)
do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
}
+void tick_broadcast(const struct cpumask *mask)
+{
+ unsigned int cpu;
+
+ for_each_cpu(cpu, mask)
+ do_message_pass(cpu, PPC_MSG_TICK_BROADCAST);
+}
+
#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
void smp_send_debugger_break(void)
{
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b3dab20..3ff97db 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -825,6 +825,11 @@ static void decrementer_set_mode(enum clock_event_mode mode,
decrementer_set_next_event(DECREMENTER_MAX, dev);
}
+/* Interrupt handler for the timer broadcast IPI */
+void tick_broadcast_ipi_handler(void)
+{
+}
+
static void register_decrementer_clockevent(int cpu)
{
struct clock_event_device *dec = &per_cpu(decrementers, cpu);
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index adf3726..8a106b4 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
{
iic_request_ipi(PPC_MSG_CALL_FUNCTION);
iic_request_ipi(PPC_MSG_RESCHEDULE);
- iic_request_ipi(PPC_MSG_UNUSED);
+ iic_request_ipi(PPC_MSG_TICK_BROADCAST);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
}
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 00d1a7c..b358bec 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION != 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE != 1);
- BUILD_BUG_ON(PPC_MSG_UNUSED != 2);
+ BUILD_BUG_ON(PPC_MSG_TICK_BROADCAST != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK != 3);
for (i = 0; i < MSG_COUNT; i++) {
^ permalink raw reply related
* [PATCH 1/3] powerpc: Free up the slot of PPC_MSG_CALL_FUNC_SINGLE IPI message
From: Preeti U Murthy @ 2014-01-31 4:10 UTC (permalink / raw)
To: deepthi, svaidy, toshi.kani, arnd, geoff, mpe, rusty,
linux-kernel, paul.gortmaker, afleming, anton, srivatsa.bhat,
benh, paulus, ady8radu, linuxppc-dev
In-Reply-To: <20140131040631.13071.19603.stgit@preeti.in.ibm.com>
From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
The IPI handlers for both PPC_MSG_CALL_FUNCTION and PPC_MSG_CALL_FUNC_SINGLE map
to a common implementation, generic_smp_call_function_single_interrupt(). So
we can consolidate them and save one of the IPI message slots, which are
precious on powerpc since only 4 of those slots are available.
So, implement the functionality of PPC_MSG_CALL_FUNC_SINGLE using
PPC_MSG_CALL_FUNCTION itself and release its IPI message slot, so that it can be
used for something else in the future, if desired.
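To illustrate the idea only, here is a minimal user-space sketch, not kernel
code. It assumes a per-CPU bitmask with one bit per message slot; struct
cpu_mailbox, send_message(), demux() and the MSG_* names are hypothetical and
merely mirror the PPC_MSG_* slots touched by this patch. Both "call function"
senders raise the same slot, so slot 2 never fires and stays free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slot numbers, mirroring the four PPC_MSG_* values. */
enum {
	MSG_CALL_FUNCTION  = 0,
	MSG_RESCHEDULE     = 1,
	MSG_UNUSED         = 2,	/* freed slot, available for reuse */
	MSG_DEBUGGER_BREAK = 3,
	MSG_COUNT          = 4,
};

/* Hypothetical per-CPU state: one pending bit per message slot. */
struct cpu_mailbox {
	uint32_t pending;
	unsigned int handled[MSG_COUNT];	/* handler invocations per slot */
};

/* Mark a message slot as pending for the target CPU. */
static void send_message(struct cpu_mailbox *mb, int msg)
{
	assert(msg >= 0 && msg < MSG_COUNT);
	mb->pending |= UINT32_C(1) << msg;
}

/* Single-CPU and multi-CPU function calls share one slot. */
static void send_call_function_single(struct cpu_mailbox *mb)
{
	send_message(mb, MSG_CALL_FUNCTION);
}

static void send_call_function_many(struct cpu_mailbox *mb)
{
	send_message(mb, MSG_CALL_FUNCTION);
}

/* Drain all pending bits and run one handler per slot that was set. */
static void demux(struct cpu_mailbox *mb)
{
	uint32_t all = mb->pending;
	int msg;

	mb->pending = 0;
	for (msg = 0; msg < MSG_COUNT; msg++)
		if (all & (UINT32_C(1) << msg))
			mb->handled[msg]++;
}
```

In this model, two sends that arrive before the next demux() coalesce into a
single handler run on the shared slot, which is only acceptable when that
handler processes every queued request, as the common implementation named
above is assumed to do.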
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Geoff Levand <geoff@infradead.org> [For the PS3 part]
---
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/kernel/smp.c | 12 +++++-------
arch/powerpc/platforms/cell/interrupt.c | 2 +-
arch/powerpc/platforms/ps3/smp.c | 2 +-
4 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 084e080..9f7356b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -120,7 +120,7 @@ extern int cpu_to_core_id(int cpu);
* in /proc/interrupts will be wrong!!! --Troy */
#define PPC_MSG_CALL_FUNCTION 0
#define PPC_MSG_RESCHEDULE 1
-#define PPC_MSG_CALL_FUNC_SINGLE 2
+#define PPC_MSG_UNUSED 2
#define PPC_MSG_DEBUGGER_BREAK 3
/* for irq controllers that have dedicated ipis per message (4) */
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..ee7d76b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -145,9 +145,9 @@ static irqreturn_t reschedule_action(int irq, void *data)
return IRQ_HANDLED;
}
-static irqreturn_t call_function_single_action(int irq, void *data)
+static irqreturn_t unused_action(int irq, void *data)
{
- generic_smp_call_function_single_interrupt();
+ /* This slot is unused and hence available for use, if needed */
return IRQ_HANDLED;
}
@@ -168,14 +168,14 @@ static irqreturn_t debug_ipi_action(int irq, void *data)
static irq_handler_t smp_ipi_action[] = {
[PPC_MSG_CALL_FUNCTION] = call_function_action,
[PPC_MSG_RESCHEDULE] = reschedule_action,
- [PPC_MSG_CALL_FUNC_SINGLE] = call_function_single_action,
+ [PPC_MSG_UNUSED] = unused_action,
[PPC_MSG_DEBUGGER_BREAK] = debug_ipi_action,
};
const char *smp_ipi_name[] = {
[PPC_MSG_CALL_FUNCTION] = "ipi call function",
[PPC_MSG_RESCHEDULE] = "ipi reschedule",
- [PPC_MSG_CALL_FUNC_SINGLE] = "ipi call function single",
+ [PPC_MSG_UNUSED] = "ipi unused",
[PPC_MSG_DEBUGGER_BREAK] = "ipi debugger",
};
@@ -251,8 +251,6 @@ irqreturn_t smp_ipi_demux(void)
generic_smp_call_function_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
scheduler_ipi();
- if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNC_SINGLE))
- generic_smp_call_function_single_interrupt();
if (all & IPI_MESSAGE(PPC_MSG_DEBUGGER_BREAK))
debug_ipi_action(0, NULL);
} while (info->messages);
@@ -280,7 +278,7 @@ EXPORT_SYMBOL_GPL(smp_send_reschedule);
void arch_send_call_function_single_ipi(int cpu)
{
- do_message_pass(cpu, PPC_MSG_CALL_FUNC_SINGLE);
+ do_message_pass(cpu, PPC_MSG_CALL_FUNCTION);
}
void arch_send_call_function_ipi_mask(const struct cpumask *mask)
diff --git a/arch/powerpc/platforms/cell/interrupt.c b/arch/powerpc/platforms/cell/interrupt.c
index 2d42f3b..adf3726 100644
--- a/arch/powerpc/platforms/cell/interrupt.c
+++ b/arch/powerpc/platforms/cell/interrupt.c
@@ -215,7 +215,7 @@ void iic_request_IPIs(void)
{
iic_request_ipi(PPC_MSG_CALL_FUNCTION);
iic_request_ipi(PPC_MSG_RESCHEDULE);
- iic_request_ipi(PPC_MSG_CALL_FUNC_SINGLE);
+ iic_request_ipi(PPC_MSG_UNUSED);
iic_request_ipi(PPC_MSG_DEBUGGER_BREAK);
}
diff --git a/arch/powerpc/platforms/ps3/smp.c b/arch/powerpc/platforms/ps3/smp.c
index 4b35166..00d1a7c 100644
--- a/arch/powerpc/platforms/ps3/smp.c
+++ b/arch/powerpc/platforms/ps3/smp.c
@@ -76,7 +76,7 @@ static int __init ps3_smp_probe(void)
BUILD_BUG_ON(PPC_MSG_CALL_FUNCTION != 0);
BUILD_BUG_ON(PPC_MSG_RESCHEDULE != 1);
- BUILD_BUG_ON(PPC_MSG_CALL_FUNC_SINGLE != 2);
+ BUILD_BUG_ON(PPC_MSG_UNUSED != 2);
BUILD_BUG_ON(PPC_MSG_DEBUGGER_BREAK != 3);
for (i = 0; i < MSG_COUNT; i++) {
^ permalink raw reply related