* Re: [PATCH 01/17] KVM: PPC: Fix machine checks on 32-bit Book3S
From: Alexander Graf @ 2011-07-01 10:08 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <20110629101642.GB25406@bloggs.ozlabs.ibm.com>
On 29.06.2011, at 12:16, Paul Mackerras wrote:
> Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
> function exports") resulted in vcpu->arch.trampoline_lowmem and
> vcpu->arch.trampoline_enter ending up with kernel virtual addresses
> rather than physical addresses. This is OK on 64-bit Book3S machines,
> which ignore the top 4 bits of the effective address in real mode,
> but on 32-bit Book3S machines, accessing these addresses in real mode
> causes machine check interrupts, as the hardware uses the whole
> effective address as the physical address in real mode.
>=20
> This fixes the problem by using __pa() to convert these addresses
> to physical addresses.
Ouch. Thanks for the catch! I really need to include book3s_32 in my =
automated testing :(.
Alex
^ permalink raw reply
* Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
From: Paul Mackerras @ 2011-07-01 10:09 UTC (permalink / raw)
To: Alexander Graf; +Cc: Scott Wood, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <4E0C9077.2060608@suse.de>
On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
> On 06/29/2011 12:41 PM, Paul Mackerras wrote:
> >+struct kvm_ppc_set_platform {
> >+ __u16 platform; /* defines the OS/hypervisor ABI */
> >+ __u16 guest_arch; /* e.g. decimal 206 for v2.06 */
> >+ __u32 flags;
>
> Please add some padding so we can extend it later if necessary.
>
> >+};
> >+
> >+/* Values for platform */
> >+#define KVM_PPC_PV_NONE 0 /* bare-metal, non-paravirtualized */
> >+#define KVM_PPC_PV_KVM 1 /* as defined in kvm_para.h */
> >+#define KVM_PPC_PV_SPAPR 2 /* IBM Server PAPR (a la PowerVM) */
>
> We also support BookE which would be useful to also include in the list.
> Furthermore, KVM is more of a feature flag than a platform. We can
> easily support KVM extensions on an SPAPR platform, no?
Yes, I guess so. The hypercall sequence will have to be different,
since ordinary system call interrupts go straight to the guest. But I
guess you've allowed for that with the hypercall sequence property in
the device tree.
> This whole interface also could deprecate the PVR setting one, so we
> can simply include PVR as well and not require kernel space to jump
> through hoops to figure out its capabilities.
I debated about whether to include a PVR value in this structure.
The thing is that POWER7 has the "Processor Compatibility Register"
(PCR), which has a bit which makes the processor behave in user mode
as if it were a POWER6. So, we could run a book3s_hv guest in POWER6
mode by setting this bit (which we might want to do to run older
distros). However, this bit doesn't affect the PVR value that the
guest sees. That's why I went for an architecture level rather than a
specific PVR value.
We could go with a PVR value and use the "logical" PVR values defined
in PAPR to represent architecture levels, e.g. 0x0f000002 for
architecture v2.05 (POWER6).
> And we need to identify 32-bit BookS processors, so we can go into
> 32-bit mode when necessary. That should also be a different
> guest_arch, right?
Right. If we go with a PVR value then we just use the PVR value for a
suitable 32-bit processor.
> >+
> >+/* Values for flags */
> >+#define KVM_PPC_CROSS_ARCH 1 /* guest architecture != host */
>
> User space shouldn't have to worry about this one. It's up to the
> kernel to decide that it's cross.
I put that in because we might want to force the use of book3s_pr, for
example if we know we're going to want to do emulated MMIO or
something else that isn't implemented in book3s_hv just yet.
Ultimately, yes, the kernel should be able to decide whether it's
cross or not. However, I don't think we should make it completely
opaque to userspace as to whether the kernel is using _pr or _hv.
If nothing else, userspace should be able to find out and tell the
user so that performance expectations can be set correctly.
Paul.
^ permalink raw reply
* Re: [RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate
From: Alexander Graf @ 2011-07-01 10:23 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Scott Wood, linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <20110701100948.GA14080@bloggs.ozlabs.ibm.com>
On 01.07.2011, at 12:09, Paul Mackerras wrote:
> On Thu, Jun 30, 2011 at 05:04:23PM +0200, Alexander Graf wrote:
>> On 06/29/2011 12:41 PM, Paul Mackerras wrote:
>>> +struct kvm_ppc_set_platform {
>>> + __u16 platform; /* defines the OS/hypervisor ABI */
>>> + __u16 guest_arch; /* e.g. decimal 206 for v2.06 */
>>> + __u32 flags;
>>=20
>> Please add some padding so we can extend it later if necessary.
>>=20
>>> +};
>>> +
>>> +/* Values for platform */
>>> +#define KVM_PPC_PV_NONE 0 /* bare-metal, =
non-paravirtualized */
>>> +#define KVM_PPC_PV_KVM 1 /* as defined in =
kvm_para.h */
>>> +#define KVM_PPC_PV_SPAPR 2 /* IBM Server PAPR (a la =
PowerVM) */
>>=20
>> We also support BookE which would be useful to also include in the =
list.
>> Furthermore, KVM is more of a feature flag than a platform. We can
>> easily support KVM extensions on an SPAPR platform, no?
>=20
> Yes, I guess so. The hypercall sequence will have to be different,
> since ordinary system call interrupts go straight to the guest. But I
> guess you've allowed for that with the hypercall sequence property in
> the device tree.
>=20
>> This whole interface also could deprecate the PVR setting one, so we
>> can simply include PVR as well and not require kernel space to jump
>> through hoops to figure out its capabilities.
>=20
> I debated about whether to include a PVR value in this structure.
>=20
> The thing is that POWER7 has the "Processor Compatibility Register"
> (PCR), which has a bit which makes the processor behave in user mode
> as if it were a POWER6. So, we could run a book3s_hv guest in POWER6
> mode by setting this bit (which we might want to do to run older
> distros). However, this bit doesn't affect the PVR value that the
> guest sees. That's why I went for an architecture level rather than a
> specific PVR value.
>=20
> We could go with a PVR value and use the "logical" PVR values defined
> in PAPR to represent architecture levels, e.g. 0x0f000002 for
> architecture v2.05 (POWER6).
IIUC the PVR values are somewhat standardized to contain major and minor =
revision numbers. Can't we just mask out the minor ones and match for =
known good systems?
>=20
>> And we need to identify 32-bit BookS processors, so we can go into
>> 32-bit mode when necessary. That should also be a different
>> guest_arch, right?
>=20
> Right. If we go with a PVR value then we just use the PVR value for a
> suitable 32-bit processor.
Well, we need to have some way of mapping PVR to arch then. KVM easily =
supports -cpu G3 and G4. We might also want to have some information on =
feature flags, such as Altivec or SPE mode available. Or paired singles =
:). I'm not sure I want to have all that mapping information inside the =
kernel.
So what we could do is we just provide as much information as we can =
from user space, including PVR, architecture (2.01 for example), =
features (32/64-bit, booke/books, fpu, altivec, spe, ...).
>=20
>>> +
>>> +/* Values for flags */
>>> +#define KVM_PPC_CROSS_ARCH 1 /* guest architecture !=3D host =
*/
>>=20
>> User space shouldn't have to worry about this one. It's up to the
>> kernel to decide that it's cross.
>=20
> I put that in because we might want to force the use of book3s_pr, for
> example if we know we're going to want to do emulated MMIO or
> something else that isn't implemented in book3s_hv just yet.
Ah, I see. Well, we could just add a flag to the feature list saying =
MMIO. If that's impossible to satisfy (HV only), fail the call. =
Otherwise switch to _pr mode. Later when _hv might be able to support =
MMIO, we can use it without changing user space.
> Ultimately, yes, the kernel should be able to decide whether it's
> cross or not. However, I don't think we should make it completely
> opaque to userspace as to whether the kernel is using _pr or _hv.
> If nothing else, userspace should be able to find out and tell the
> user so that performance expectations can be set correctly.
Hrm. Sure, but the decision should be done in kernel land based on all =
information required to actually make it. And the kernel has more =
information regarding the system it's running on, so that's the place to =
actually do the decision. Bubbling it up to user space again is =
certainly fine by me :).
Alex
^ permalink raw reply
* Re: [BUG?]3.0-rc4+ftrace+kprobe: set kprobe at instruction 'stwu' lead to system crash/freeze
From: tiejun.chen @ 2011-07-01 10:03 UTC (permalink / raw)
To: Yong Zhang
Cc: Jim Keniston, linux-kernel, Steven Rostedt, paulus,
yrl.pp-manager.tt, Masami Hiramatsu, linuxppc-dev
In-Reply-To: <BANLkTimYej4_dmBqvPBCLej=JA5atLrZVA@mail.gmail.com>
Yong Zhang wrote:
> On Mon, Jun 27, 2011 at 6:01 PM, Ananth N Mavinakayanahalli
> <ananth@in.ibm.com> wrote:
>> On Sun, Jun 26, 2011 at 11:47:13PM +0900, Masami Hiramatsu wrote:
>>> (2011/06/24 19:29), Steven Rostedt wrote:
>>>> On Fri, 2011-06-24 at 17:21 +0800, Yong Zhang wrote:
>>>>> Hi,
>>>>>
>>>>> When I use kprobe to do something, I found some wired thing.
>>>>>
>>>>> When CONFIG_FUNCTION_TRACER is disabled:
>>>>> (gdb) disassemble do_fork
>>>>> Dump of assembler code for function do_fork:
>>>>> 0xc0037390 <+0>: mflr r0
>>>>> 0xc0037394 <+4>: stwu r1,-64(r1)
>>>>> 0xc0037398 <+8>: mfcr r12
>>>>> 0xc003739c <+12>: stmw r27,44(r1)
>>>>>
>>>>> Then I:
>>>>> modprobe kprobe_example func=do_fork offset=4
>>>>> ls
>>>>> Things works well.
>>>>>
>>>>> But when CONFIG_FUNCTION_TRACER is enabled:
>>>>> (gdb) disassemble do_fork
>>>>> Dump of assembler code for function do_fork:
>>>>> 0xc0040334 <+0>: mflr r0
>>>>> 0xc0040338 <+4>: stw r0,4(r1)
>>>>> 0xc004033c <+8>: bl 0xc00109d4 <mcount>
>>>>> 0xc0040340 <+12>: stwu r1,-80(r1)
>>>>> 0xc0040344 <+16>: mflr r0
>>>>> 0xc0040348 <+20>: stw r0,84(r1)
>>>>> 0xc004034c <+24>: mfcr r12
>>>>> Then I:
>>>>> modprobe kprobe_example func=do_fork offset=12
>>>>> ls
>>>>> 'ls' will never retrun. system freeze.
>>>> I'm not sure if x86 had a similar issue.
>>>>
>>>> Masami, have any ideas to why this happened?
>>> No, I don't familiar with ppc implementation. I guess
>>> that single-step resume code failed to emulate the
>>> instruction, but it strongly depends on ppc arch.
>>> Maybe IBM people may know what happened.
>>>
>>> Ananth, Jim, would you have any ideas?
>> On powerpc, we emulate sstep whenever possible. Only recently support to
>> emulate loads and stores got added. I don't have access to a powerpc box
>> today... but will try to recreate the problem ASAP and see what could be
>> happening in the presence of mcount.
>
> After taking more testing on it, it looks like the issue doesn't
> depend on mcount
> (AKA. CONFIG_FUNCTION_TRACER)
>
> As I said in the first email, with eldk-5.0 CONFIG_FUNCTION_TRACER=n
> will work well.
>
> But when I'm using eldk-4.2[1], both will fail. But the funny thing is when I
> set kprobe at several functions some works fine but some will fail. For example,
> at this time do_fork() works well, but show_interrupt() will crash.
>
> root@unknown:/root> insmod kprobe_example.ko func=show_interrupts
> Planted kprobe at c009be18
> root@unknown:/root> cat /proc/interrupts
> pre_handler: p->addr = 0xc009be18, nip = 0xc009be18, msr = 0x29000
> post_handler: p->addr = 0xc009be18, msr = 0x29000,boostable = 1
> Oops: Exception in kernel mode, sig: 11 [#1]
> PREEMPT MPC8536 DS
> Modules linked in: kprobe_example
> NIP: df159e74 LR: c0106f40 CTR: c009be18
> REGS: df159d90 TRAP: 0700 Not tainted (3.0.0-rc4-00001-ge8ffcca-dirty)
> MSR: 00029000 <EE,ME,CE> CR: 20202688 XER: 00000000
> TASK = dfaa5340[613] 'cat' THREAD: df158000
> GPR00: fffff000 df159e40 dfaa5340 df024a00 df159e78 00000000 df159f20 00000001
> GPR08: c10060d0 c009be18 00029000 df159e70 00000000 1001ca74 1ffb5f00 100a01cc
> GPR16: 00000000 00000000 00000000 00000000 df024a28 df159f20 00000000 dfbff080
> GPR24: 10016000 00001000 df159f20 df159e78 dfbff080 df159e78 df024a00 df159e70
> NIP [df159e74] 0xdf159e74
> LR [c0106f40] seq_read+0x2a4/0x568
> Call Trace:
> [df159e40] [00029000] 0x29000 (unreliable)
> [df159e74] [00000000] (null)
> Instruction dump:
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
> ---[ end trace 60026bfc1fe79aed ]---
> Segmentation fault
Maybe I can understand this problem.
When we kprobe these operations such as store-and-update-word for SP(r1),
stwu r1, -A(r1)
The program exception is triggered then PPC always allocate an exception frame
as shown as the follows:
old r1 --------
...
nip
gpr[2]~gpr[31]
gpr[1] <--------- old r1 is stored here.
gpr[0]
-------- <-- pr_regs @offset 16 bytes
padding
STACK_FRAME_REGS_MARKER
LR
back chain
new r1 --------
Here emulate_step() is called to emulate 'stwu'. Actually this is equivalent to
1> update pr_regs->gpr[1] = mem(old r1 + (-A))
2> 'stw <old r1>, mem<(old r1 + (-A)) >
You should notice the stack based on new r1 would be covered with mem<old r1
+(-A)>. So after this, the kernel exit from post_krpobe, something would be
broken. This should depend on sizeof(-A).
For kprobe show_interrupts, you can see pregs->nip is re-written violently so
kernel issued.
But sometimes we may only re-write some violate registers the kernel still
alive. And so this is just why the kernel works well for some kprobed point
after you change some kernel options/toolchains.
If I'm correct its difficult to kprobe these stwu sp operation since the
sizeof(-A) is undermined for the kernel. So we have to implement in-depend
interrupt stack like PPC64.
Tiejun
>
> Thanks,
> Yong
>
> [1]: http://ftp.denx.de/pub/eldk/4.2/
>
^ permalink raw reply
* Re: [PATCH 05/17] KVM: PPC: Deliver program interrupts right away instead of queueing them
From: Alexander Graf @ 2011-07-01 11:47 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <20110629101852.GF25406@bloggs.ozlabs.ibm.com>
On 29.06.2011, at 12:18, Paul Mackerras wrote:
> Doing so means that we don't have to save the flags anywhere and gets
> rid of the last reference to to_book3s(vcpu) in =
arch/powerpc/kvm/book3s.c.
>=20
> Doing so is OK because a program interrupt won't be generated at the
> same time as any other synchronous interrupt. If a program interrupt
> and an asynchronous interrupt (external or decrementer) are generated
> at the same time, the program interrupt will be delivered, which is
> correct because it has a higher priority, and then the asynchronous
> interrupt will be masked.
>=20
> We don't ever generate system reset or machine check interrupts to the
> guest, but if we did, then we would need to make sure they got =
delivered
> rather than the program interrupt. The current code would be wrong in
> this situation anyway since it would deliver the program interrupt as
> well as the reset/machine check interrupt.
>=20
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/kvm/book3s.c | 8 +++-----
> 1 files changed, 3 insertions(+), 5 deletions(-)
>=20
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 163e3e1..f68a34d 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -129,8 +129,8 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu =
*vcpu, unsigned int vec)
>=20
> void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
> {
> - to_book3s(vcpu)->prog_flags =3D flags;
Now that prog_flags is unused, please remove it from the headers.
Alex
^ permalink raw reply
* Re: [PATCH 1/2] mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
From: Ingo Molnar @ 2011-07-01 12:14 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <1308013070.2874.784.camel@pasglop>
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
> files, and I need it in a third one to fix a powerpc bug, so let's
> first move it into a header
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> Ingo, Thomas: Who needs to ack the x86 bit ? I'd like to send that
> to Linus asap with the powerpc fix.
Acked-by: Ingo Molnar <mingo@elte.hu>
(btw., you can consider obvious cleanups as being implicitly acked by
me and don't need to block fixes on me.)
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Fix memory_block_size_bytes() for non-pseries
From: Ingo Molnar @ 2011-07-01 12:15 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <1308013071.2874.785.camel@pasglop>
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> Just compiling pseries in the kernel causes it to override
> memory_block_size_bytes() regardless of what is the runtime
> platform.
>
> This cleans up the implementation of that function, fixing
> a bug or two while at it, so that it's harmless (and potentially
> useful) for other platforms. Without this, bugs in that code
> would trigger a WARN_ON() in drivers/base/memory.c when
> booting some different platforms.
>
> If/when we have another platform supporting memory hotplug we
> might want to either move that out to a generic place or
> make it a ppc_md. callback.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
> index 33867ec..9d6a8ef 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
> @@ -12,6 +12,8 @@
> #include <linux/of.h>
> #include <linux/memblock.h>
> #include <linux/vmalloc.h>
> +#include <linux/memory.h>
> +
> #include <asm/firmware.h>
> #include <asm/machdep.h>
> #include <asm/pSeries_reconfig.h>
> @@ -20,24 +22,25 @@
> static unsigned long get_memblock_size(void)
> {
> struct device_node *np;
> - unsigned int memblock_size = 0;
> + unsigned int memblock_size = MIN_MEMORY_BLOCK_SIZE;
> + struct resource r;
>
> np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> if (np) {
> - const unsigned long *size;
> + const __be64 *size;
>
> size = of_get_property(np, "ibm,lmb-size", NULL);
> - memblock_size = size ? *size : 0;
> -
> + if (size)
> + memblock_size = be64_to_cpup(size);
> of_node_put(np);
> - } else {
> + } else if (machine_is(pseries)) {
> + /* This fallback really only applies to pseries */
> unsigned int memzero_size = 0;
> - const unsigned int *regs;
>
> np = of_find_node_by_path("/memory@0");
> if (np) {
> - regs = of_get_property(np, "reg", NULL);
> - memzero_size = regs ? regs[3] : 0;
> + if (!of_address_to_resource(np, 0, &r))
> + memzero_size = resource_size(&r);
> of_node_put(np);
> }
>
> @@ -50,16 +53,21 @@ static unsigned long get_memblock_size(void)
> sprintf(buf, "/memory@%x", memzero_size);
> np = of_find_node_by_path(buf);
> if (np) {
> - regs = of_get_property(np, "reg", NULL);
> - memblock_size = regs ? regs[3] : 0;
> + if (!of_address_to_resource(np, 0, &r))
> + memblock_size = resource_size(&r);
> of_node_put(np);
> }
> }
> }
> -
> return memblock_size;
> }
>
> +/* WARNING: This is going to override the generic definition whenever
> + * pseries is built-in regardless of what platform is active at boot
> + * time. This is fine for now as this is the only "option" and it
> + * should work everywhere. If not, we'll have to turn this into a
> + * ppc_md. callback
> + */
Just a small nit, please use the customary (multi-line) comment
style:
/*
* Comment .....
* ...... goes here.
*/
specified in Documentation/CodingStyle.
Thanks,
Ingo
^ permalink raw reply
* Serial messages on boot
From: Guillaume Dargaud @ 2011-07-01 14:06 UTC (permalink / raw)
To: linuxppc-dev
Hello all,
in the most recent versions of the [xilinx powerpc] kernel, there are no boot messages printed, which make it hard to
debug early problems. What has changed and what can I do in order to get all the habitual boot messages on the serial
port ?
--
Guillaume Dargaud
http://www.gdargaud.net/
^ permalink raw reply
* Re: Serial messages on boot
From: Guillaume Dargaud @ 2011-07-01 15:18 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <201107011606.50425.dargaud@lpsc.in2p3.fr>
> in the most recent versions of the [xilinx powerpc] kernel, there are no
> boot messages printed, which make it hard to debug early problems. What
> has changed and what can I do in order to get all the habitual boot
> messages on the serial port ?
My bad, as usual: wrong console in the dts file.
--
Guillaume Dargaud
http://www.gdargaud.net/
^ permalink raw reply
* pci_pcie_cap invalid on AER/EEH enabled PPC?
From: Jon Mason @ 2011-07-01 15:24 UTC (permalink / raw)
To: linuxppc-dev, linux-pci; +Cc: Richard Lary, James Smart
I recently sent out a number of patches to migrate drivers calling
`pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This
function takes uses a PCI-E capability offset that was determined by
calling pci_find_capability during the PCI bus walking. In response
to one of the patches, James Smart posted:
"The reason is due to an issue on PPC platforms whereby use of
"pdev->is_pcie" and pci_is_pcie() will erroneously fail under some
conditions, but explicit search for the capability struct via
pci_find_capability() is always successful. I expect this to be due
a shadowing of pci config space in the hal/platform that isn't
sufficiently built up. We detected this issue while testing AER/EEH,
and are functional only if the pci_find_capability() option is used."
See http://marc.info/?l=linux-scsi&m=130946649427828&w=2 for the whole post.
Based on his description above pci_pcie_cap
andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally
equivalent. If this is not safe, then the PCI bus walking code is
most likely busted on EEH enabled PPC systems (and that is a BIG
problem). Can anyone confirm this is still an issue?
Thanks,
Jon
^ permalink raw reply
* Re: [PATCH 1/2] mtd/nand : don't free the global data fsl_lbc_ctrl_dev->nand in fsl_elbc_chip_remove()
From: Scott Wood @ 2011-07-01 16:14 UTC (permalink / raw)
To: dedekind1; +Cc: linuxppc-dev, b35362, dwmw2, linux-mtd
In-Reply-To: <1309498826.23597.200.camel@sauron>
On Fri, 1 Jul 2011 08:40:21 +0300
Artem Bityutskiy <dedekind1@gmail.com> wrote:
> On Thu, 2011-06-30 at 11:26 -0500, Scott Wood wrote:
> > If the NULL assignment is dropped, consider what happens if the
> > fsl_elbc_nand module is removed then reinserted. On reinsertion, it
> > will
> > see a non-NULL fsl_lbc_ctrl_dev->nand, and will skip allocating a new
> > one.
> > Then you're referencing freed memory.
>
> Oh, then this sounds like a separate bug. Removing the module should
> kill everything, and re-inserging the module should have zero
> dependencies on the previous states...
fsl_lbc_ctrl_dev (and thus the fsl_lbc_ctrl_dev->nand pointer) is not part
of the module, it is part of arch/powerpc/sysdev/fsl_lbc.c. NAND isn't the
only thing that elbc does. Since there can be multiple NAND chips, which
are separately probed, the first chip (under a lock) creates the NAND state
that is shared among the chips, and the last one removed destroys it.
> Anyway, if you think the original patch is OK, I can put it to my tree.
I think it's OK. The loop also needs to be removed, so the remove callback
operates only on the particular chip it's called on, but that's a separate
bug.
-Scott
^ permalink raw reply
* Re: [PATCH] powerpc: enable access to HT Host-Bridge on Maple
From: Segher Boessenkool @ 2011-07-01 16:44 UTC (permalink / raw)
To: Dmitry Eremin-Solenikov; +Cc: linuxppc-dev
In-Reply-To: <1309357060-20872-1-git-send-email-dbaryshkov@gmail.com>
> CPC925/CPC945 use special window to access host bridge
> functionality of
> u3-ht. Provide a way to access this device.
Why? Is anything going to use it?
> +static int u3_ht_root_read_config(struct pci_controller *hose, u8
> offset,
> + int len, u32 *val)
> +{
> + volatile void __iomem *addr;
> +
> + addr = hose->cfg_addr;
> + addr += ((offset & ~3) << 2) + (4 - len - (offset & 3));
This will only work for len 1,2,4 with offset a multiple of len, is that
guaranteed here?
> +static int u3_ht_root_write_config(struct pci_controller *hose, u8
> offset,
> + int len, u32 val)
> +{
> + volatile void __iomem *addr;
> +
> + addr = hose->cfg_addr + ((offset & ~3) << 2) + (4 - len - (offset
> & 3));
> +
> + if (offset >= PCI_BASE_ADDRESS_0 && offset < PCI_CAPABILITY_LIST)
> + return PCIBIOS_SUCCESSFUL;
This is a workaround for something; at the very least it needs a
comment,
but probably it shouldn't be here at all.
> static int u3_ht_read_config(struct pci_bus *bus, unsigned int devfn,
> int offset, int len, u32 *val)
> {
> @@ -217,6 +265,9 @@ static int u3_ht_read_config(struct pci_bus
> *bus, unsigned int devfn,
> if (hose == NULL)
> return PCIBIOS_DEVICE_NOT_FOUND;
>
> + if (bus->number == hose->first_busno && devfn == PCI_DEVFN(0, 0))
> + return u3_ht_root_read_config(hose, offset, len, val);
> +
> if (offset > 0xff)
> return PCIBIOS_BAD_REGISTER_NUMBER;
u3_ht_root_read_config() can get an offset out of range this way.
> hose->cfg_data = ioremap(0xf2000000, 0x02000000);
> + hose->cfg_addr = ioremap(0xf8070000, 0x1000);
Eww. You could just make a global instead of abusing existing fields,
there can be only one CPC9x5 in a system anyway.
Segher
^ permalink raw reply
* Re: [PATCH 7/7] [v6] drivers/virt: introduce Freescale hypervisor management driver
From: Arnd Bergmann @ 2011-07-01 17:12 UTC (permalink / raw)
To: Tabi Timur-B04825
Cc: konrad.wilk@oracle.com, linux-kernel@vger.kernel.org,
cmetcalf@tilera.com, akpm@kernel.org, dsaxena@linaro.org,
linux-console@vger.kernel.org, greg@kroah.com, Gala Kumar-B11780,
virtualization@lists.linux-foundation.org,
linuxppc-dev@lists.ozlabs.org, alan@lxorguk.ukuu.org.uk
In-Reply-To: <BANLkTi=3r=UQYJVqRLRe-eu7rdkXQS-ESQ@mail.gmail.com>
On Friday 01 July 2011, Tabi Timur-B04825 wrote:
> On Thu, Jun 9, 2011 at 4:04 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 09 June 2011 22:52:06 Timur Tabi wrote:
> >> Add the drivers/virt directory, which houses drivers that support
> >> virtualization environments, and add the Freescale hypervisor management
> >> driver.
> ....
> >>
> >> Signed-off-by: Timur Tabi <timur@freescale.com>
> >
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
>
> So I've made the changes that people have asked for, and Arnd has
> acked this driver. Who's going to pick it up? Who will be the
> maintainer for drivers/virt? I'd really like to see this driver in
> 3.1, and there's not much time left.
>
I'd say merge it through the powerpc tree, and list yourself for
the driver, but not the entire directory.
Arnd
^ permalink raw reply
* Re: [PATCH 0/17] Hypervisor-mode KVM on POWER7 and PPC970
From: Alexander Graf @ 2011-07-01 18:24 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, kvm-ppc, kvm
In-Reply-To: <20110629101552.GA25406@bloggs.ozlabs.ibm.com>
On 29.06.2011, at 12:15, Paul Mackerras wrote:
> The first patch of the following series is a pure bug-fix for 32-bit
> kernels.
>=20
> The remainder of the following series of patches enable KVM to exploit
> the hardware hypervisor mode on 64-bit Power ISA Book3S machines. At
> present, POWER7 and PPC970 processors are supported. (Note that the
> PPC970 processors in Apple G5 machines don't have a usable hypervisor
> mode and are not supported by these patches.)
>=20
> Running the KVM host in hypervisor mode means that the guest can use
> both supervisor mode and user mode. That means that the guest can
> execute supervisor-privilege instructions and access supervisor-
> privilege registers. In addition the hardware directs most exceptions
> to the guest. Thus we don't need to emulate any instructions in the
> host. Generally, the only times we need to exit the guest are when it
> does a hypercall or when an external interrupt or host timer
> (decrementer) interrupt occurs.
>=20
> The focus of this KVM implementation is to run guests that use the
> PAPR (Power Architecture Platform Requirements) paravirtualization
> interface, which is the interface supplied by PowerVM on IBM pSeries
> machines. Currently the "pseries" machine type in qemu is only
> supported by book3s_hv KVM, and book3s_hv KVM only supports the
> "pseries" machine type. That will hopefully change in future.
>=20
> These patches are against the master branch of the kvm tree.
Something seems to be broken with signals. When running without =
io-thread, I can't even do ctrl-c on -nographic while the guest is in =
sleep mode. But that might not be related to your patches.
I've applied 01-16 now. Sending them through some more testing and if =
they're good, sending a pull request.
Alex
^ permalink raw reply
* Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
From: Richard A Lary @ 2011-07-01 18:30 UTC (permalink / raw)
To: Jon Mason; +Cc: linux-pci, linuxppc-dev, Richard Lary, James Smart
In-Reply-To: <BANLkTinZSeq5H=K4K4s_cuC2tOwsTZEDJg@mail.gmail.com>
On 7/1/2011 8:24 AM, Jon Mason wrote:
> I recently sent out a number of patches to migrate drivers calling
> `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This
> function takes uses a PCI-E capability offset that was determined by
> calling pci_find_capability during the PCI bus walking. In response
> to one of the patches, James Smart posted:
>
> "The reason is due to an issue on PPC platforms whereby use of
> "pdev->is_pcie" and pci_is_pcie() will erroneously fail under some
> conditions, but explicit search for the capability struct via
> pci_find_capability() is always successful. I expect this to be due
> a shadowing of pci config space in the hal/platform that isn't
> sufficiently built up. We detected this issue while testing AER/EEH,
> and are functional only if the pci_find_capability() option is used."
>
> See http://marc.info/?l=linux-scsi&m=130946649427828&w=2 for the whole post.
>
> Based on his description above pci_pcie_cap
> andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally
> equivalent. If this is not safe, then the PCI bus walking code is
> most likely busted on EEH enabled PPC systems (and that is a BIG
> problem). Can anyone confirm this is still an issue?
Jon,
I applied the following debug patch to lpfc driver in a 2.6.32 distro
kernel ( I had this one handy, I can try with mainline later today )
---
drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 !
1 file changed, 10 insertions(+)
Index: b/drivers/scsi/lpfc/lpfc_init.c
===================================================================
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb
pci_try_set_mwi(pdev);
pci_save_state(pdev);
+ printk(KERN_WARNING "pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n",
+ pdev->is_pcie,
+ pdev->pcie_cap,
+ pdev->pcie_type);
+
+ if (pci_is_pcie(pdev))
+ printk(KERN_WARNING "pcicap: true\n");
+ else
+ printk(KERN_WARNING "pcicap: false\n");
+
/* PCIe EEH recovery on powerpc platforms needs fundamental reset */
if (pci_find_capability(pdev, PCI_CAP_ID_EXP))
pdev->needs_freset = 1;
This is output upon driver load on an IBM Power 7 model 8233-E8B server.
dmesg | grep pcicap
Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4
[gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27 PDT 2011
pcicap: is_pcie=0 pci_cap=0 pcie_type=0
pcicap: false
pcicap: is_pcie=0 pci_cap=0 pcie_type=0
pcicap: false
pcicap: is_pcie=0 pci_cap=0 pcie_type=0
pcicap: false
pcicap: is_pcie=0 pci_cap=0 pcie_type=0
pcicap: false
It would appear that the pcie information is not set in pci_dev structure for
this device at the time the driver is being initialized during boot.
-rich
^ permalink raw reply
* Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode
From: Dave Hansen @ 2011-07-01 18:37 UTC (permalink / raw)
To: Paul Mackerras; +Cc: linuxppc-dev, kvm-ppc, kvm, Alexander Graf
In-Reply-To: <20110629102134.GK25406@bloggs.ozlabs.ibm.com>
On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote:
> +struct kvmppc_pginfo {
> + unsigned long pfn;
> + atomic_t refcnt;
> +};
I only see this refcnt inc'd in one spot and never decremented or read.
Is the refcnt just the number of hptes we have for this particular page
at the moment?
> +long kvmppc_alloc_hpt(struct kvm *kvm)
> +{
> + unsigned long hpt;
> + unsigned long lpid;
> +
> + hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
> + HPT_ORDER - PAGE_SHIFT);
> + if (!hpt) {
> + pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
> + return -ENOMEM;
> + }
> + kvm->arch.hpt_virt = hpt;
> +
> + do {
> + lpid = find_first_zero_bit(lpid_inuse, NR_LPIDS);
> + if (lpid >= NR_LPIDS) {
> + pr_err("kvm_alloc_hpt: No LPIDs free\n");
> + free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
> + return -ENOMEM;
> + }
> + } while (test_and_set_bit(lpid, lpid_inuse));
> +
> + kvm->arch.sdr1 = __pa(hpt) | (HPT_ORDER - 18);
> + kvm->arch.lpid = lpid;
> + kvm->arch.host_sdr1 = mfspr(SPRN_SDR1);
> + kvm->arch.host_lpid = mfspr(SPRN_LPID);
> + kvm->arch.host_lpcr = mfspr(SPRN_LPCR);
> +
> + pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
> + return 0;
> +}
> +static unsigned long user_page_size(unsigned long addr)
> +{
> + struct vm_area_struct *vma;
> + unsigned long size = PAGE_SIZE;
> +
> + down_read(¤t->mm->mmap_sem);
> + vma = find_vma(current->mm, addr);
> + if (vma)
> + size = vma_kernel_pagesize(vma);
> + up_read(¤t->mm->mmap_sem);
> + return size;
> +}
That one looks pretty arch-independent and like it could use some
consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size()
> +void kvmppc_map_vrma(struct kvm *kvm, struct kvm_userspace_memory_region *mem)
> +{
> + unsigned long i;
> + unsigned long npages = kvm->arch.ram_npages;
> + unsigned long pfn;
> + unsigned long *hpte;
> + unsigned long hash;
> + struct kvmppc_pginfo *pginfo = kvm->arch.ram_pginfo;
> +
> + if (!pginfo)
> + return;
> +
> + /* VRMA can't be > 1TB */
> + if (npages > 1ul << (40 - kvm->arch.ram_porder))
> + npages = 1ul << (40 - kvm->arch.ram_porder);
Is that because it can only be a single segment? Does that mean that we
can't ever have guests larger than 1TB? Or just that they have to live
with 1TB until they get their own page tables up?
> + /* Can't use more than 1 HPTE per HPTEG */
> + if (npages > HPT_NPTEG)
> + npages = HPT_NPTEG;
> +
> + for (i = 0; i < npages; ++i) {
> + pfn = pginfo[i].pfn;
> + /* can't use hpt_hash since va > 64 bits */
> + hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & HPT_HASH_MASK;
Is that because 'i' could potentially have a very large pfn? Nish
thought it might have something to do with the hpte entries being larger
than 64-bits themselves with the vsid included, but we got thoroughly
confused. :)
-- Dave
^ permalink raw reply
* Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
From: Jon Mason @ 2011-07-01 19:02 UTC (permalink / raw)
To: Richard A Lary; +Cc: linux-pci, linuxppc-dev, Richard Lary, James Smart
In-Reply-To: <4E0E1253.1000909@linux.vnet.ibm.com>
On Fri, Jul 1, 2011 at 1:30 PM, Richard A Lary <rlary@linux.vnet.ibm.com> w=
rote:
> On 7/1/2011 8:24 AM, Jon Mason wrote:
>>
>> I recently sent out a number of patches to migrate drivers calling
>> `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. =A0This
>> function takes uses a PCI-E capability offset that was determined by
>> calling pci_find_capability during the PCI bus walking. =A0In response
>> to one of the patches, James Smart posted:
>>
>> "The reason is due to an issue on PPC platforms whereby use of
>> "pdev->is_pcie" and pci_is_pcie() will erroneously fail under some
>> conditions, but explicit search for the capability struct via
>> pci_find_capability() is always successful. =A0 I expect this to be due
>> a shadowing of pci config space in the hal/platform that isn't
>> sufficiently built up. =A0We detected this issue while testing AER/EEH,
>> and are functional only if the pci_find_capability() option is used."
>>
>> See http://marc.info/?l=3Dlinux-scsi&m=3D130946649427828&w=3D2 for the w=
hole
>> post.
>>
>> Based on his description above pci_pcie_cap
>> andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally
>> equivalent. =A0If this is not safe, then the PCI bus walking code is
>> most likely busted on EEH enabled PPC systems (and that is a BIG
>> problem). =A0Can anyone confirm this is still an issue?
>
> Jon,
>
> I applied the following debug patch to lpfc driver in a 2.6.32 distro
> kernel ( I had this one handy, I can try with mainline later today )
>
> ---
> =A0drivers/scsi/lpfc/lpfc_init.c | =A0 10 =A0 10 + =A0 =A00 - =A0 =A0 0 !
> =A01 file changed, 10 insertions(+)
>
> Index: b/drivers/scsi/lpfc/lpfc_init.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb
> =A0 =A0 =A0 =A0pci_try_set_mwi(pdev);
> =A0 =A0 =A0 =A0pci_save_state(pdev);
>
> + =A0 =A0 =A0 printk(KERN_WARNING "pcicap: is_pcie=3D%x pci_cap=3D%x pcie=
_type=3D%x\n",
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 pdev->is_pcie,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 pdev->pcie_cap,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 pdev->pcie_type);
> +
> + =A0 =A0 =A0 if (pci_is_pcie(pdev))
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 printk(KERN_WARNING "pcicap: true\n");
> + =A0 =A0 =A0 else
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 printk(KERN_WARNING "pcicap: false\n");
> +
> =A0 =A0 =A0 =A0/* PCIe EEH recovery on powerpc platforms needs fundamenta=
l reset */
> =A0 =A0 =A0 =A0if (pci_find_capability(pdev, PCI_CAP_ID_EXP))
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0pdev->needs_freset =3D 1;
>
> This is output upon driver load on an IBM Power 7 model 8233-E8B server.
>
> dmesg | grep pcicap
> Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4
> [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27
> PDT 2011
> pcicap: is_pcie=3D0 pci_cap=3D0 pcie_type=3D0
> pcicap: false
> pcicap: is_pcie=3D0 pci_cap=3D0 pcie_type=3D0
> pcicap: false
> pcicap: is_pcie=3D0 pci_cap=3D0 pcie_type=3D0
> pcicap: false
> pcicap: is_pcie=3D0 pci_cap=3D0 pcie_type=3D0
> pcicap: false
>
> It would appear that the pcie information is not set in pci_dev structure
> for
> this device at the time the driver is being initialized during boot.
Thanks for trying this. Can you confirm that the other devices in the
system have this issue as well (or show that it is isolated to the lpr
device)? You can add printks in set_pcie_port_type() to verify what
is being set on bus walking and to see when it is being called with
respect to when it is being populated by firmware.
Thanks,
Jon
>
> -rich
>
>
>
^ permalink raw reply
* Re: [PATCH] powerpc: enable access to HT Host-Bridge on Maple
From: Dmitry Eremin-Solenikov @ 2011-07-01 19:11 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: linuxppc-dev
In-Reply-To: <F4E7EABC-B63F-4776-A8EB-8C93F64C2A3C@kernel.crashing.org>
Hello,
Thanks for the review.
On 7/1/11, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> CPC925/CPC945 use special window to access host bridge
>> functionality of
>> u3-ht. Provide a way to access this device.
>
> Why? Is anything going to use it?
Hmmm. Why not? Initially I stumbled upon the fact that powermac provides
such acces. Then I discovered that it provides configuration for memory windows
and other parts. (I needed it for my hack to add AGP bridge on U3 in Linux,
as I didn't want to change firmware).
>> +static int u3_ht_root_read_config(struct pci_controller *hose, u8
>> offset,
>> + int len, u32 *val)
>> +{
>> + volatile void __iomem *addr;
>> +
>> + addr = hose->cfg_addr;
>> + addr += ((offset & ~3) << 2) + (4 - len - (offset & 3));
>
> This will only work for len 1,2,4 with offset a multiple of len, is that
> guaranteed here?
I have to check. I think there is no guarantee, but standard accessors
are created exactly for 1, 2 and 4-byte access. And IIRC according
to PCI specs, offset should be len-aligned.
>> +static int u3_ht_root_write_config(struct pci_controller *hose, u8
>> offset,
>> + int len, u32 val)
>> +{
>> + volatile void __iomem *addr;
>> +
>> + addr = hose->cfg_addr + ((offset & ~3) << 2) + (4 - len - (offset
>> & 3));
>> +
>> + if (offset >= PCI_BASE_ADDRESS_0 && offset < PCI_CAPABILITY_LIST)
>> + return PCIBIOS_SUCCESSFUL;
>
> This is a workaround for something; at the very least it needs a
> comment,
> but probably it shouldn't be here at all.
I'll add a comment. Basically these registers are unimplemented on u3/u4
bridges and at least some kinds of access to them cause system freeze.
I'll try to check what triggers what this night.
>> static int u3_ht_read_config(struct pci_bus *bus, unsigned int devfn,
>> int offset, int len, u32 *val)
>> {
>> @@ -217,6 +265,9 @@ static int u3_ht_read_config(struct pci_bus
>> *bus, unsigned int devfn,
>> if (hose == NULL)
>> return PCIBIOS_DEVICE_NOT_FOUND;
>>
>> + if (bus->number == hose->first_busno && devfn == PCI_DEVFN(0, 0))
>> + return u3_ht_root_read_config(hose, offset, len, val);
>> +
>> if (offset > 0xff)
>> return PCIBIOS_BAD_REGISTER_NUMBER;
>
> u3_ht_root_read_config() can get an offset out of range this way.
Yes, as offsets for this host bridge can be larger then 0xff. U3/U4 have
some HT configuration there, and I didn't want to impose a limit there.
If you are against this, I can change the order of calls.
>
>> hose->cfg_data = ioremap(0xf2000000, 0x02000000);
>> + hose->cfg_addr = ioremap(0xf8070000, 0x1000);
>
> Eww. You could just make a global instead of abusing existing fields,
> there can be only one CPC9x5 in a system anyway.
This was a copy-paste of corresponding PowerMac code. Do you really want
for me to change this to global variable?
BTW: I see a lot of code duplication between PowerMac and Maple pci.c
files. Would you like a patch that merges those files to something
like arch/powerpc/sysdev/u3-pci.c? I can try merging them...
--
With best wishes
Dmitry
^ permalink raw reply
* Re: [PATCH 10/17] KVM: PPC: Add support for Book3S processors in hypervisor mode
From: Alexander Graf @ 2011-07-01 19:12 UTC (permalink / raw)
To: Dave Hansen; +Cc: linuxppc-dev, Paul Mackerras, kvm-ppc, kvm
In-Reply-To: <1309545462.16582.12.camel@nimitz>
On 01.07.2011, at 20:37, Dave Hansen wrote:
> On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote:=20
>> +struct kvmppc_pginfo {
>> + unsigned long pfn;
>> + atomic_t refcnt;
>> +};
>=20
> I only see this refcnt inc'd in one spot and never decremented or =
read.
> Is the refcnt just the number of hptes we have for this particular =
page
> at the moment? =20
>=20
>> +long kvmppc_alloc_hpt(struct kvm *kvm)
>> +{
>> + unsigned long hpt;
>> + unsigned long lpid;
>> +
>> + hpt =3D =
__get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|__GFP_NOWARN,
>> + HPT_ORDER - PAGE_SHIFT);
>> + if (!hpt) {
>> + pr_err("kvm_alloc_hpt: Couldn't alloc HPT\n");
>> + return -ENOMEM;
>> + }
>> + kvm->arch.hpt_virt =3D hpt;
>> +
>> + do {
>> + lpid =3D find_first_zero_bit(lpid_inuse, NR_LPIDS);
>> + if (lpid >=3D NR_LPIDS) {
>> + pr_err("kvm_alloc_hpt: No LPIDs free\n");
>> + free_pages(hpt, HPT_ORDER - PAGE_SHIFT);
>> + return -ENOMEM;
>> + }
>> + } while (test_and_set_bit(lpid, lpid_inuse));
>> +
>> + kvm->arch.sdr1 =3D __pa(hpt) | (HPT_ORDER - 18);
>> + kvm->arch.lpid =3D lpid;
>> + kvm->arch.host_sdr1 =3D mfspr(SPRN_SDR1);
>> + kvm->arch.host_lpid =3D mfspr(SPRN_LPID);
>> + kvm->arch.host_lpcr =3D mfspr(SPRN_LPCR);
>> +
>> + pr_info("KVM guest htab at %lx, LPID %lx\n", hpt, lpid);
>> + return 0;
>> +}
>=20
>> +static unsigned long user_page_size(unsigned long addr)
>> +{
>> + struct vm_area_struct *vma;
>> + unsigned long size =3D PAGE_SIZE;
>> +
>> + down_read(¤t->mm->mmap_sem);
>> + vma =3D find_vma(current->mm, addr);
>> + if (vma)
>> + size =3D vma_kernel_pagesize(vma);
>> + up_read(¤t->mm->mmap_sem);
>> + return size;
>> +}
>=20
> That one looks pretty arch-independent and like it could use some
> consolidation with: virt/kvm/kvm_main.c::kvm_host_page_size()
Yep, I'd deem that a cleanup for later though. Good point however! We =
have similar code in e500 kvm today.
>=20
>> +void kvmppc_map_vrma(struct kvm *kvm, struct =
kvm_userspace_memory_region *mem)
>> +{
>> + unsigned long i;
>> + unsigned long npages =3D kvm->arch.ram_npages;
>> + unsigned long pfn;
>> + unsigned long *hpte;
>> + unsigned long hash;
>> + struct kvmppc_pginfo *pginfo =3D kvm->arch.ram_pginfo;
>> +
>> + if (!pginfo)
>> + return;
>> +
>> + /* VRMA can't be > 1TB */
>> + if (npages > 1ul << (40 - kvm->arch.ram_porder))
>> + npages =3D 1ul << (40 - kvm->arch.ram_porder);
>=20
> Is that because it can only be a single segment? Does that mean that =
we
> can't ever have guests larger than 1TB? Or just that they have to =
live
> with 1TB until they get their own page tables up?
The VRMA is only important in real mode, so this part looks good. The =
RMA is usually a lot smaller than 1TB ;).
Alex
^ permalink raw reply
* Re: pci_pcie_cap invalid on AER/EEH enabled PPC?
From: Richard A Lary @ 2011-07-01 20:00 UTC (permalink / raw)
To: Jon Mason; +Cc: linux-pci, linuxppc-dev, Richard Lary, James Smart
In-Reply-To: <BANLkTinTEk2SgBwVUHc49cDLEwH_a-XsJg@mail.gmail.com>
On 7/1/2011 12:02 PM, Jon Mason wrote:
> On Fri, Jul 1, 2011 at 1:30 PM, Richard A Lary<rlary@linux.vnet.ibm.com> wrote:
>> On 7/1/2011 8:24 AM, Jon Mason wrote:
>>>
>>> I recently sent out a number of patches to migrate drivers calling
>>> `pci_find_capability(pdef, PCI_CAP_ID_EXP)` to pci_pcie_cap. This
>>> function takes uses a PCI-E capability offset that was determined by
>>> calling pci_find_capability during the PCI bus walking. In response
>>> to one of the patches, James Smart posted:
>>>
>>> "The reason is due to an issue on PPC platforms whereby use of
>>> "pdev->is_pcie" and pci_is_pcie() will erroneously fail under some
>>> conditions, but explicit search for the capability struct via
>>> pci_find_capability() is always successful. I expect this to be due
>>> a shadowing of pci config space in the hal/platform that isn't
>>> sufficiently built up. We detected this issue while testing AER/EEH,
>>> and are functional only if the pci_find_capability() option is used."
>>>
>>> See http://marc.info/?l=linux-scsi&m=130946649427828&w=2 for the whole
>>> post.
>>>
>>> Based on his description above pci_pcie_cap
>>> andpci_find_capability(pdef, PCI_CAP_ID_EXP) should be functionally
>>> equivalent. If this is not safe, then the PCI bus walking code is
>>> most likely busted on EEH enabled PPC systems (and that is a BIG
>>> problem). Can anyone confirm this is still an issue?
>>
>> Jon,
>>
>> I applied the following debug patch to lpfc driver in a 2.6.32 distro
>> kernel ( I had this one handy, I can try with mainline later today )
>>
>> ---
>> drivers/scsi/lpfc/lpfc_init.c | 10 10 + 0 - 0 !
>> 1 file changed, 10 insertions(+)
>>
>> Index: b/drivers/scsi/lpfc/lpfc_init.c
>> ===================================================================
>> --- a/drivers/scsi/lpfc/lpfc_init.c
>> +++ b/drivers/scsi/lpfc/lpfc_init.c
>> @@ -3958,6 +3958,16 @@ lpfc_enable_pci_dev(struct lpfc_hba *phb
>> pci_try_set_mwi(pdev);
>> pci_save_state(pdev);
>>
>> + printk(KERN_WARNING "pcicap: is_pcie=%x pci_cap=%x pcie_type=%x\n",
>> + pdev->is_pcie,
>> + pdev->pcie_cap,
>> + pdev->pcie_type);
>> +
>> + if (pci_is_pcie(pdev))
>> + printk(KERN_WARNING "pcicap: true\n");
>> + else
>> + printk(KERN_WARNING "pcicap: false\n");
>> +
>> /* PCIe EEH recovery on powerpc platforms needs fundamental reset */
>> if (pci_find_capability(pdev, PCI_CAP_ID_EXP))
>> pdev->needs_freset = 1;
>>
>> This is output upon driver load on an IBM Power 7 model 8233-E8B server.
>>
>> dmesg | grep pcicap
>> Linux version 2.6.32.42-pcicap-ppc64 (geeko@buildhost) (gcc version 4.3.4
>> [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Fri Jul 1 09:31:27
>> PDT 2011
>> pcicap: is_pcie=0 pci_cap=0 pcie_type=0
>> pcicap: false
>> pcicap: is_pcie=0 pci_cap=0 pcie_type=0
>> pcicap: false
>> pcicap: is_pcie=0 pci_cap=0 pcie_type=0
>> pcicap: false
>> pcicap: is_pcie=0 pci_cap=0 pcie_type=0
>> pcicap: false
>>
>> It would appear that the pcie information is not set in pci_dev structure
>> for
>> this device at the time the driver is being initialized during boot.
>
> Thanks for trying this. Can you confirm that the other devices in the
> system have this issue as well (or show that it is isolated to the lpr
> device)? You can add printks in set_pcie_port_type() to verify what
> is being set on bus walking and to see when it is being called with
> respect to when it is being populated by firmware.
Jon,
I will give this suggestion a try and post results
-rich
^ permalink raw reply
* Re: [PATCH v3 1/3] driver core: Add ability for arch code to setup pdev_archdata
From: Greg KH @ 2011-07-01 22:09 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1308828839-27349-1-git-send-email-galak@kernel.crashing.org>
On Thu, Jun 23, 2011 at 06:33:57AM -0500, Kumar Gala wrote:
> On some architectures we need to setup pdev_archdata before we add the
> device. Waiting til a bus_notifier is too late since we might need the
> pdev_archdata in the bus notifier. One example is setting up of dma_mask
> pointers such that it can be used in a bus_notifier.
>
> We add noop version of arch_setup_pdev_archdata() in
> <asm-generic/platform_device.h> and allow the arch code to override with
> access the full definitions of struct device, struct platform_device, and
> struct pdev_archdata.
Isn't there some way to use "weak" symbols to keep us from having to
create this .h file in every single arch and then if the arch wants to
define it, it does so?
That should make this patch simpler, right?
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH 1/2] mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
From: Benjamin Herrenschmidt @ 2011-07-01 23:11 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <20110701121408.GC28008@elte.hu>
On Fri, 2011-07-01 at 14:14 +0200, Ingo Molnar wrote:
> * Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
> > files, and I need it in a third one to fix a powerpc bug, so let's
> > first move it into a header
> >
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >
> > Ingo, Thomas: Who needs to ack the x86 bit ? I'd like to send that
> > to Linus asap with the powerpc fix.
>
> Acked-by: Ingo Molnar <mingo@elte.hu>
>
> (btw., you can consider obvious cleanups as being implicitly acked by
> me and don't need to block fixes on me.)
Ok thanks !
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Fix memory_block_size_bytes() for non-pseries
From: Benjamin Herrenschmidt @ 2011-07-01 23:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <20110701121516.GD28008@elte.hu>
On Fri, 2011-07-01 at 14:15 +0200, Ingo Molnar wrote:
> > +/* WARNING: This is going to override the generic definition whenever
> > + * pseries is built-in regardless of what platform is active at boot
> > + * time. This is fine for now as this is the only "option" and it
> > + * should work everywhere. If not, we'll have to turn this into a
> > + * ppc_md. callback
> > + */
>
> Just a small nit, please use the customary (multi-line) comment
> style:
>
> /*
> * Comment .....
> * ...... goes here.
> */
>
> specified in Documentation/CodingStyle.
Ah ! Here goes my sneak attempts at violating coding style while nobody
notices :-)
No seriously, that sort of stuff shouldn't be such a hard rule... In
some cases the "official" way looks nicer, on some cases it's just a
waste of space, and I've grown to prefer my slightly more compact form,
at least depending on how the surrounding code looks like.
Since that's all powerpc arch code, I believe I'm entitled to that
little bit of flexibility in how the code looks like :-) It's not
like I'm GoingToPlayWithCaps() or switching to 3-char tabs :-)
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Fix memory_block_size_bytes() for non-pseries
From: Ingo Molnar @ 2011-07-02 10:23 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <1309562112.14501.257.camel@pasglop>
* Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Fri, 2011-07-01 at 14:15 +0200, Ingo Molnar wrote:
>
> > > +/* WARNING: This is going to override the generic definition whenever
> > > + * pseries is built-in regardless of what platform is active at boot
> > > + * time. This is fine for now as this is the only "option" and it
> > > + * should work everywhere. If not, we'll have to turn this into a
> > > + * ppc_md. callback
> > > + */
> >
> > Just a small nit, please use the customary (multi-line) comment
> > style:
> >
> > /*
> > * Comment .....
> > * ...... goes here.
> > */
> >
> > specified in Documentation/CodingStyle.
>
> Ah ! Here goes my sneak attempts at violating coding style while
> nobody notices :-)
>
> No seriously, that sort of stuff shouldn't be such a hard rule...
> In some cases the "official" way looks nicer, on some cases it's
> just a waste of space, and I've grown to prefer my slightly more
> compact form, at least depending on how the surrounding code looks
> like.
>
> Since that's all powerpc arch code, I believe I'm entitled to that
> little bit of flexibility in how the code looks like :-) It's not
> like I'm GoingToPlayWithCaps() or switching to 3-char tabs :-)
It's certainly not a hard rule - but note that the file in question
(arch/powerpc/platforms/pseries/hotplug-memory.c) has a rather
inconsistent comment style, sometimes even within the same function:
/*
* Remove htab bolted mappings for this section of memory
*/
...
/* Ensure all vmalloc mappings are flushed in case they also
* hit that section of memory
*/
That kind of inconsistency within the same .c file and within the
same function is not defensible with a "style is a matter of taste"
argument.
As i said, it's just a small nit.
Thanks,
Ingo
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Fix memory_block_size_bytes() for non-pseries
From: Benjamin Herrenschmidt @ 2011-07-02 14:15 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-mm@kvack.org, Thomas Gleixner, linuxppc-dev,
linux-kernel@vger.kernel.org
In-Reply-To: <20110702102333.GC17482@elte.hu>
On Sat, 2011-07-02 at 12:23 +0200, Ingo Molnar wrote:
> It's certainly not a hard rule - but note that the file in question
> (arch/powerpc/platforms/pseries/hotplug-memory.c) has a rather
> inconsistent comment style, sometimes even within the same function:
>
> /*
> * Remove htab bolted mappings for this section of memory
> */
> ...
>
> /* Ensure all vmalloc mappings are flushed in case they also
> * hit that section of memory
> */
>
> That kind of inconsistency within the same .c file and within the
> same function is not defensible with a "style is a matter of taste"
> argument.
Right, that's a matter of different people with different taste mucking
around with the same file I suppose.
Most of this probably predates my involvement as a maintainer but even
if not (and I really can't be bothered digging into the history), it
might very well be something I didn't pay attention to while reviewing.
Seriously, it's so low on the scale of what matters ... I'm sure we both
have more valuable stuff to spend our time and energy on :-)
Cheers,
Ben.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox