LinuxPPC-Dev Archive on lore.kernel.org
* Re: machine check in kernel for a mpc870 board
From: Scott Wood @ 2010-07-02 19:41 UTC (permalink / raw)
  To: Shawn Jin; +Cc: ppcdev
In-Reply-To: <AANLkTimX7G59glJ9CQZpT-Zlb3EUsJ1Lil4_80BOi1Nk@mail.gmail.com>

On Fri, 2 Jul 2010 12:16:11 -0700
Shawn Jin <shawnxjin@gmail.com> wrote:

> >> The chipselect? Isn't it just the child-bus-addr? BTW, do we have
> >> to define the #address-cells to 2? 1 is not enough?
> >
> > The first cell of the child bus address is the chip select, the
> > second cell is the offset into the chip select.
>
> I see. So the #address-cells of 2 doesn't necessarily indicate the
> address is 64 bits?

Well, there's 64 bits of data, but it doesn't mean that it's one 64-bit
integer.

> Different processors can interpret it differently?

Different device tree bus types can -- though in this case it translates
to an ordinary CPU address using the standard ranges property.
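
As a rough illustration of that translation (a sketch only, not kernel
code), the snippet below walks a list of ranges entries for a bus with
#address-cells = <2> and #size-cells = <1>. The helper name `translate`,
the tuple layout, and the assumption that the parent address is a single
cell are all hypothetical; the one entry is the ranges line posted earlier
in this thread.

```python
# Hypothetical helper: translate a localbus child address, given as
# (chip select, offset), into a CPU address using "ranges" entries.
# Each entry is (chip_select, child_offset, parent_addr, size).

def translate(ranges, chip_select, offset):
    for cs, child_off, parent, size in ranges:
        # The first cell must match exactly; it selects the chip select.
        # The second cell is an offset that must fall inside the window.
        if cs == chip_select and child_off <= offset < child_off + size:
            return parent + (offset - child_off)
    return None  # not covered by any ranges entry


# ranges = <0 0 0xfe000000 0x01000000> from the localbus node in this thread
RANGES = [(0, 0x0, 0xFE000000, 0x01000000)]

print(hex(translate(RANGES, 0, 0x1000)))
```

The two child cells are never combined into one 64-bit number here; the
first is only compared and the second is only used as an offset.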

> Where can I find such info? Is there any doc on this?

Documentation/powerpc/dts-bindings/fsl/lbc.txt

> I have a question on the serial settings. Why is it located at 0xa80?
> According to MPC885RM.pdf, the SMC1's registers start from 0xa82.

I suppose the interpretation was that the register block starts at
0xa80, and the first register within that block is at 0xa82 -- though
the manual seems to actually lump those two reserved bytes in with the
previous section.

> What does the reg property specify here for SMC1, the first set of <0xa80
> 0x10> and the 2nd <0x3e80 0x40>?

From Documentation/powerpc/dts-bindings/fsl/cpm.txt:
> - reg : Unless otherwise specified, the first resource represents the
>         scc/fcc/ucc registers, and the second represents the device's
>         parameter RAM region (if it has one).
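
To make the two resources concrete, here is a small sketch that splits the
SMC1 reg cells into (address, size) pairs. It assumes the parent node uses
one address cell and one size cell; the helper name `split_reg` is made up
for this example.

```python
# Hypothetical helper: split a flat list of reg cells into (address, size)
# pairs, assuming #address-cells = <1> and #size-cells = <1> in the parent.

def split_reg(cells):
    if len(cells) % 2:
        raise ValueError("reg must contain whole (address, size) pairs")
    return [(cells[i], cells[i + 1]) for i in range(0, len(cells), 2)]


# reg = <0xa80 0x10 0x3e80 0x40> from the serial@a80 node in this thread
registers, param_ram = split_reg([0xA80, 0x10, 0x3E80, 0x40])

# First resource: SMC register block. Second: parameter RAM region.
print(hex(registers[0]), hex(registers[1]))
print(hex(param_ram[0]), hex(param_ram[1]))
```

With these values the first register at 0xa82 falls inside the first
resource, which spans 0xa80 up to (but not including) 0xa90.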

-Scott


* Re: machine check in kernel for a mpc870 board
From: Shawn Jin @ 2010-07-02 19:16 UTC (permalink / raw)
  To: Scott Wood; +Cc: ppcdev
In-Reply-To: <20100702124713.2e2d300c@schlenkerla.am.freescale.net>

>> The chipselect? Isn't it just the child-bus-addr? BTW, do we have to
>> define the #address-cells to 2? 1 is not enough?
>
> The first cell of the child bus address is the chip select, the second
> cell is the offset into the chip select.

I see. So the #address-cells of 2 doesn't necessarily indicate the
address is 64 bits? Different processors can interpret it differently?
Where can I find such info? Is there any doc on this?

I have a question on the serial settings. Why is it located at 0xa80?
According to MPC885RM.pdf, the SMC1's registers start from 0xa82. What
does the reg property specify here for SMC1, the first set of <0xa80
0x10> and the 2nd <0x3e80 0x40>?

                        console: serial@a80 {
                                device_type = "serial";
                                compatible = "fsl,mpc875-smc-uart",
                                             "fsl,cpm1-smc-uart";
                                reg = <0xa80 0x10 0x3e80 0x40>;
                                interrupts = <4>;
                                interrupt-parent = <&CPM_PIC>;
                                fsl,cpm-brg = <1>;
                                fsl,cpm-command = <0x0090>;
                                current-speed = <115200>;

Thanks a lot,
-Shawn.


* Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
From: Scott Wood @ 2010-07-02 19:10 UTC (permalink / raw)
  To: Alexander Graf
  Cc: KVM list, kvm-ppc, Dan Hettena, linuxppc-dev, Hollis Blanchard
In-Reply-To: <3085B58A-01A1-4B5C-A0E7-024DCFDFD4B2@suse.de>

On Fri, 2 Jul 2010 20:47:44 +0200
Alexander Graf <agraf@suse.de> wrote:

> 
> On 02.07.2010, at 19:59, Hollis Blanchard wrote:
> 
> > [Resending...]
> > 
> > Please reconcile this with
> > http://www.linux-kvm.org/page/PowerPC_Hypercall_ABI, which has been
> > discussed in the (admittedly closed) Power.org embedded hypervisor
> > working group. Bear in mind that other hypervisors are already
> > implementing the documented ABI, so if you have concerns, you should
> > probably raise them with that audience...
> 
> We cannot use sc with LV=1 because that would break the
> KVM-in-something-else case, which is KVM's strong point on PPC.

The current proposal involves the hypervisor specifying the hcall opcode
sequence in the device tree -- to allow either "sc 1" or "sc 0 plus
magic GPR" depending on whether you've got the hardware hypervisor
feature (hereafter HHV).

With HHV, "sc 0 plus magic GPR" just doesn't work, since it won't trap
to the hypervisor.  "sc 1 plus magic GPR" might be problematic on some
non-HHV implementations, especially if you *do* have HHV but the
non-HHV hypervisor is running as an HHV guest.

-Scott


* Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-07-02 18:47 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: Scott Wood, linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <AANLkTiksUkrO8ryEiX3Yv-_2KGVE6r5RIT4YDvrmoDPL@mail.gmail.com>


On 02.07.2010, at 19:59, Hollis Blanchard wrote:

> [Resending...]
>
> Please reconcile this with
> http://www.linux-kvm.org/page/PowerPC_Hypercall_ABI, which has been
> discussed in the (admittedly closed) Power.org embedded hypervisor
> working group. Bear in mind that other hypervisors are already
> implementing the documented ABI, so if you have concerns, you should
> probably raise them with that audience...

We cannot use sc with LV=1 because that would break the
KVM-in-something-else case, which is KVM's strong point on PPC.

Alex


* Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
From: Alexander Graf @ 2010-07-02 18:41 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <EB2E1C5B-118B-481B-83D6-44CFAA2E55D3@kernel.crashing.org>


On 02.07.2010, at 18:27, Segher Boessenkool wrote:

>> +To find out if we're running on KVM or not, we overlay the PVR register. Usually
>> +the PVR register contains an id that identifies your CPU type. If, however, you
>> +pass KVM_PVR_PARA in the register that you want the PVR result in, the register
>> +still contains KVM_PVR_PARA after the mfpvr call.
>> +
>> +	LOAD_REG_IMM(r5, KVM_PVR_PARA)
>> +	mfpvr	r5
>> +	[r5 still contains KVM_PVR_PARA]
>
> I love this part :-)

:)

>
>> +	__u64 scratch3;
>> +	__u64 critical;		/* Guest may not get interrupts if == r1 */
>> +	__u64 sprg0;
>> +	__u64 sprg1;
>> +	__u64 sprg2;
>> +	__u64 sprg3;
>> +	__u64 srr0;
>> +	__u64 srr1;
>> +	__u64 dar;
>> +	__u64 msr;
>> +	__u32 dsisr;
>> +	__u32 int_pending;	/* Tells the guest if we have an interrupt */
>> +};
>> +
>> +Additions to the page must only occur at the end. Struct fields are always 32
>> +bit aligned.
>
> The u64s are 64-bit aligned, should they always be?

That's obvious, isn't it? And the ABI only specifies u64s to be 32 bit
aligned, no? At least that's what ld and std specify.

>
>> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
>> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
>> +endianness.
>
> Will this add never overflow?  Is there anything that checks for it?

It basically means that to access dar, we either do

ld  rX, DAR(0)

or

lwz rX, DAR+4(0)
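
The +4 can be checked with a small model of the page. This is only a
sketch: the magic page is modelled as a plain byte buffer, the offset of
dar is derived from the struct kvm_vcpu_arch_shared layout in the patch
(ten __u64 fields precede it), and the helper names are invented for the
example.

```python
import struct

# Offset of 'dar' in the quoted layout: scratch1-3, critical, sprg0-3,
# srr0 and srr1 come first, each a __u64.
DAR = 10 * 8


def read_dar_64(page):
    # Models "ld rX, DAR(0)": a 64-bit big-endian load.
    return struct.unpack_from(">Q", page, DAR)[0]


def read_dar_32(page):
    # Models "lwz rX, DAR+4(0)": a 32-bit big-endian load of the low word.
    return struct.unpack_from(">I", page, DAR + 4)[0]


page = bytearray(4096)  # stand-in for the shared page
struct.pack_into(">Q", page, DAR, 0xC0DE1234)  # value fits in 32 bits

print(hex(read_dar_64(page)), hex(read_dar_32(page)))
```

On a big-endian layout the high word sits at the lower address, so a
32-bit guest that wants the meaningful half has to skip the first four
bytes of the field.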


>
>> +mtmsrd	rX, 0		b	<special mtmsr section>
>> +mtmsr			b	<special mtmsr section>
>=20
> mtmsr rX

Nod.


Alex


* Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
From: Hollis Blanchard @ 2010-07-02 17:59 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Scott Wood, linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277980982-12433-28-git-send-email-agraf@suse.de>

[Resending...]

Please reconcile this with
http://www.linux-kvm.org/page/PowerPC_Hypercall_ABI, which has been
discussed in the (admittedly closed) Power.org embedded hypervisor
working group. Bear in mind that other hypervisors are already
implementing the documented ABI, so if you have concerns, you should
probably raise them with that audience...

-Hollis

On Thu, Jul 1, 2010 at 3:43 AM, Alexander Graf <agraf@suse.de> wrote:
>
> We just introduced a new PV interface that screams for documentation. So here
> it is - a shiny new and awesome text file describing the internal works of
> the PPC KVM paravirtual interface.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
>
> ---
>
> v1 -> v2:
>
>  - clarify guest implementation
>  - clarify that privileged instructions still work
>  - explain safe MSR bits
>  - Fix dsisr patch description
>  - change hypervisor calls to use new register values
> ---
>  Documentation/kvm/ppc-pv.txt |  185 ++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 185 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/kvm/ppc-pv.txt
>
> diff --git a/Documentation/kvm/ppc-pv.txt b/Documentation/kvm/ppc-pv.txt
> new file mode 100644
> index 0000000..82de6c6
> --- /dev/null
> +++ b/Documentation/kvm/ppc-pv.txt
> @@ -0,0 +1,185 @@
> +The PPC KVM paravirtual interface
> +=================================
> +
> +The basic execution principle by which KVM on PowerPC works is to run all kernel
> +space code in PR=1 which is user space. This way we trap all privileged
> +instructions and can emulate them accordingly.
> +
> +Unfortunately that is also the downfall. There are quite some privileged
> +instructions that needlessly return us to the hypervisor even though they
> +could be handled differently.
> +
> +This is what the PPC PV interface helps with. It takes privileged instructions
> +and transforms them into unprivileged ones with some help from the hypervisor.
> +This cuts down virtualization costs by about 50% on some of my benchmarks.
> +
> +The code for that interface can be found in arch/powerpc/kernel/kvm*
> +
> +Querying for existence
> +======================
> +
> +To find out if we're running on KVM or not, we overlay the PVR register. Usually
> +the PVR register contains an id that identifies your CPU type. If, however, you
> +pass KVM_PVR_PARA in the register that you want the PVR result in, the register
> +still contains KVM_PVR_PARA after the mfpvr call.
> +
> +       LOAD_REG_IMM(r5, KVM_PVR_PARA)
> +       mfpvr   r5
> +       [r5 still contains KVM_PVR_PARA]
> +
> +Once determined to run under a PV capable KVM, you can now use hypercalls as
> +described below.
> +
> +PPC hypercalls
> +==============
> +
> +The only viable ways to reliably get from guest context to host context are:
> +
> +       1) Call an invalid instruction
> +       2) Call the "sc" instruction with a parameter to "sc"
> +       3) Call the "sc" instruction with parameters in GPRs
> +
> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
> +by valid instructions, rendering the interface broken.
> +
> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
> +rather unclear if the sc is targeted directly for the hypervisor or the
> +supervisor. It would also require that we read the syscall issuing instruction
> +every time a syscall is issued, slowing down guest syscalls.
> +
> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R0 and
> +KVM_SC_MAGIC_R3) in r0 and r3 respectively. If a syscall instruction with these
> +magic values arrives from the guest's kernel mode, we take the syscall as a
> +hypercall.
> +
> +The parameters are as follows:
> +
> +       r0              KVM_SC_MAGIC_R0
> +       r3              KVM_SC_MAGIC_R3         Return code
> +       r4              Hypercall number
> +       r5              First parameter
> +       r6              Second parameter
> +       r7              Third parameter
> +       r8              Fourth parameter
> +
> +Hypercall definitions are shared in generic code, so the same hypercall numbers
> +apply for x86 and powerpc alike.
> +
> +The magic page
> +==============
> +
> +To enable communication between the hypervisor and guest there is a new shared
> +page that contains parts of supervisor visible register state. The guest can
> +map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.
> +
> +With this hypercall issued the guest always gets the magic page mapped at the
> +desired location in effective and physical address space. For now, we always
> +map the page to -4096. This way we can access it using absolute load and store
> +functions. The following instruction reads the first field of the magic page:
> +
> +       ld      rX, -4096(0)
> +
> +The interface is designed to be extensible should there be need later to add
> +additional registers to the magic page. If you add fields to the magic page,
> +also define a new hypercall feature to indicate that the host can give you more
> +registers. Only if the host supports the additional features, make use of them.
> +
> +The magic page has the following layout as described in
> +arch/powerpc/include/asm/kvm_para.h:
> +
> +struct kvm_vcpu_arch_shared {
> +       __u64 scratch1;
> +       __u64 scratch2;
> +       __u64 scratch3;
> +       __u64 critical;         /* Guest may not get interrupts if == r1 */
> +       __u64 sprg0;
> +       __u64 sprg1;
> +       __u64 sprg2;
> +       __u64 sprg3;
> +       __u64 srr0;
> +       __u64 srr1;
> +       __u64 dar;
> +       __u64 msr;
> +       __u32 dsisr;
> +       __u32 int_pending;      /* Tells the guest if we have an interrupt */
> +};
> +
> +Additions to the page must only occur at the end. Struct fields are always 32
> +bit aligned.
> +
> +MSR bits
> +========
> +
> +The MSR contains bits that require hypervisor intervention and bits that do
> +not require direct hypervisor intervention because they only get interpreted
> +when entering the guest or don't have any impact on the hypervisor's behavior.
> +
> +The following bits are safe to be set inside the guest:
> +
> +  MSR_EE
> +  MSR_RI
> +  MSR_CR
> +  MSR_ME
> +
> +If any other bit changes in the MSR, please still use mtmsr(d).
> +
> +Patched instructions
> +====================
> +
> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
> +endianness.
> +
> +The following is a list of mapping the Linux kernel performs when running as
> +guest. Implementing any of those mappings is optional, as the instruction traps
> +also act on the shared page. So calling privileged instructions still works as
> +before.
> +
> +From                   To
> +====                   ==
> +
> +mfmsr  rX              ld      rX, magic_page->msr
> +mfsprg rX, 0           ld      rX, magic_page->sprg0
> +mfsprg rX, 1           ld      rX, magic_page->sprg1
> +mfsprg rX, 2           ld      rX, magic_page->sprg2
> +mfsprg rX, 3           ld      rX, magic_page->sprg3
> +mfsrr0 rX              ld      rX, magic_page->srr0
> +mfsrr1 rX              ld      rX, magic_page->srr1
> +mfdar  rX              ld      rX, magic_page->dar
> +mfdsisr rX             lwz     rX, magic_page->dsisr
> +
> +mtmsr  rX              std     rX, magic_page->msr
> +mtsprg 0, rX           std     rX, magic_page->sprg0
> +mtsprg 1, rX           std     rX, magic_page->sprg1
> +mtsprg 2, rX           std     rX, magic_page->sprg2
> +mtsprg 3, rX           std     rX, magic_page->sprg3
> +mtsrr0 rX              std     rX, magic_page->srr0
> +mtsrr1 rX              std     rX, magic_page->srr1
> +mtdar  rX              std     rX, magic_page->dar
> +mtdsisr rX             stw     rX, magic_page->dsisr
> +
> +tlbsync                nop
> +
> +mtmsrd rX, 0           b       <special mtmsr section>
> +mtmsr                  b       <special mtmsr section>
> +
> +mtmsrd rX, 1           b       <special mtmsrd section>
> +
> +[BookE only]
> +wrteei [0|1]           b       <special wrteei section>
> +
> +
> +Some instructions require more logic to determine what's going on than a load
> +or store instruction can deliver. To enable patching of those, we keep some
> +RAM around where we can live translate instructions to. What happens is the
> +following:
> +
> +       1) copy emulation code to memory
> +       2) patch that code to fit the emulated instruction
> +       3) patch that code to return to the original pc + 4
> +       4) patch the original instruction to branch to the new code
> +
> +That way we can inject an arbitrary amount of code as replacement for a single
> +instruction. This allows us to check for pending interrupts when setting EE=1
> +for example.
> +
> --
> 1.6.0.2


* Re: machine check in kernel for a mpc870 board
From: Scott Wood @ 2010-07-02 17:47 UTC (permalink / raw)
  To: Shawn Jin; +Cc: ppcdev
In-Reply-To: <AANLkTikELusybzh2uTh3KJ6QwelgSHW6HWWPLh-4Zc0b@mail.gmail.com>

On Fri, 2 Jul 2010 10:06:47 -0700
Shawn Jin <shawnxjin@gmail.com> wrote:

> > Or more generally update this section to hold whatever is connected
> > to the localbus on your board.  The first cell is the chipselect.
>
> The chipselect? Isn't it just the child-bus-addr? BTW, do we have to
> define the #address-cells to 2? 1 is not enough?

The first cell of the child bus address is the chip select, the second
cell is the offset into the chip select.

> SDRAM uses CS0/6, each 64MB. BDI2000 configuration is as follows.
> ; init memory controller
> WM32    0xFA200104      0xfe000ff6      ;;OR0: Flash 32MB
> WM32    0xFA200100      0xfc000001      ;;BR0: Flash at 0xFC000000, 32bit, R/W, no parity, use GPCM
> WM32    0xFA20010C      0xfc000e00      ;;OR1: SDRAM 64MB, all accesses
> WM32    0xFA200108      0x00000081      ;;BR1: SDRAM at 0x00000000, 32bit, R/W, no parity, use UPMA
> WM32    0xFA200134      0xfc000e00      ;;OR6: SDRAM 64MB, all accesses
> WM32    0xFA200130      0x04000081      ;;BR6: SDRAM at 0x04000000, 32bit, R/W, no parity, use UPMA

That looks like SDRAM is on CS1/6, not CS0/6.

We haven't been putting ordinary RAM under the localbus node, even
though it's connected through the localbus on these chips.

> When defining memory's reg property, can a single pair <0 0x08000000>
> be enough? Or must it be <0 0x04000000 0x04000000 0x04000000>?

A single pair is fine.
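
The reason a single pair works is that the two banks are contiguous. A
minimal sketch of that reasoning, using the bases and sizes from the
BDI2000 setup quoted above (the helper name `merge_banks` is hypothetical):

```python
# Hypothetical helper: merge physically contiguous (base, size) banks so
# they can be described by as few memory "reg" pairs as possible.

def merge_banks(banks):
    merged = []
    for base, size in sorted(banks):
        if merged and merged[-1][0] + merged[-1][1] == base:
            # This bank starts exactly where the previous one ends.
            prev_base, prev_size = merged[-1]
            merged[-1] = (prev_base, prev_size + size)
        else:
            merged.append((base, size))
    return merged


# BR1 maps 64MB at 0x00000000, BR6 maps 64MB at 0x04000000.
banks = [(0x00000000, 0x04000000), (0x04000000, 0x04000000)]

for base, size in merge_banks(banks):
    print(f"reg = <{base:#x} {size:#x}>")
```

If there were a hole between the banks, the helper would keep two pairs,
and the memory node would need both.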

-Scott


* Re: machine check in kernel for a mpc870 board
From: Shawn Jin @ 2010-07-02 17:06 UTC (permalink / raw)
  To: Scott Wood; +Cc: ppcdev
In-Reply-To: <4C2CF9A7.7010801@freescale.com>

>>        localbus@fa200100 {
>>                compatible = "fsl,mpc885-localbus", "fsl,pq1-localbus",
>>                             "simple-bus";
>>                #address-cells =<2>;
>>                #size-cells =<1>;
>>                reg =<0xfa200100 0x40>;
>>
>>                ranges =<
>>                        0 0 0xfe000000 0x01000000    // I'm not sure about this?
>>                >;
>>        };
>
> Change 0xfe000000 to wherever u-boot maps your flash, and 0x01000000 to
> whatever the size of the flash localbus mapping is.
>
> Or more generally update this section to hold whatever is connected to the
> localbus on your board.  The first cell is the chipselect.

The chipselect? Isn't it just the child-bus-addr? BTW, do we have to
define the #address-cells to 2? 1 is not enough?

SDRAM uses CS0/6, each 64MB. BDI2000 configuration is as follows.
; init memory controller
WM32    0xFA200104      0xfe000ff6      ;;OR0: Flash 32MB
WM32    0xFA200100      0xfc000001      ;;BR0: Flash at 0xFC000000, 32bit, R/W, no parity, use GPCM
WM32    0xFA20010C      0xfc000e00      ;;OR1: SDRAM 64MB, all accesses
WM32    0xFA200108      0x00000081      ;;BR1: SDRAM at 0x00000000, 32bit, R/W, no parity, use UPMA
WM32    0xFA200134      0xfc000e00      ;;OR6: SDRAM 64MB, all accesses
WM32    0xFA200130      0x04000081      ;;BR6: SDRAM at 0x04000000, 32bit, R/W, no parity, use UPMA

When defining memory's reg property, can a single pair <0 0x08000000>
be enough? Or must it be <0 0x04000000 0x04000000 0x04000000>?

Thanks,
-Shawn.


* Re: [PATCH 00/27] KVM PPC PV framework
From: Alexander Graf @ 2010-07-02 16:59 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <82C19122-91B6-4F91-9EF0-BEA2759A349D@kernel.crashing.org>


On 02.07.2010, at 18:22, Segher Boessenkool wrote:

>> [without]
>>
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>>
>> real    0m14.659s
>> user    0m8.967s
>> sys     0m5.688s
>>
>> [with]
>>
>> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>>
>> real    0m7.557s
>> user    0m4.121s
>> sys     0m3.426s
>>
>>
>> So this is a significant performance improvement! I'm quite happy how fast this
>> whole thing becomes :)
>
> Yeah :-)  Do you have timings for the native system as well?

Sure, same machine with openSUSE 11.1 instead of Debian that I use as
guest OS usually:

agraf@lychee:~> time for i in {1..1000}; do /bin/echo hello > /dev/null; done

real	0m2.088s
user	0m0.704s
sys	0m1.460s
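
For a quick comparison of the three "real" figures posted in this thread,
here is a small arithmetic sketch. The helper `seconds` is made up for the
example, and the numbers are only indicative, since the native run uses a
different distribution than the guest.

```python
# Hypothetical helper: convert a time(1) stamp such as "0m14.659s" to seconds.

def seconds(stamp):
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


without_pv = seconds("0m14.659s")  # guest, no PV interface
with_pv = seconds("0m7.557s")      # guest, PV interface enabled
native = seconds("0m2.088s")       # same machine, no virtualization

saving = 1 - with_pv / without_pv  # fraction of wall time saved by PV
slowdown = with_pv / native        # remaining gap to native

print(f"PV saves {saving:.0%} of guest wall time")
print(f"PV guest is still {slowdown:.1f}x slower than native")
```

The saving works out to a little under half of the wall time, which is in
line with the "about 50%" figure quoted in the documentation patch.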


Alex


* Re: [PATCH 27/27] KVM: PPC: Add Documentation about PV interface
From: Segher Boessenkool @ 2010-07-02 16:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277980982-12433-28-git-send-email-agraf@suse.de>

> +To find out if we're running on KVM or not, we overlay the PVR register. Usually
> +the PVR register contains an id that identifies your CPU type. If, however, you
> +pass KVM_PVR_PARA in the register that you want the PVR result in, the register
> +still contains KVM_PVR_PARA after the mfpvr call.
> +
> +	LOAD_REG_IMM(r5, KVM_PVR_PARA)
> +	mfpvr	r5
> +	[r5 still contains KVM_PVR_PARA]

I love this part :-)

> +	__u64 scratch3;
> +	__u64 critical;		/* Guest may not get interrupts if == r1 */
> +	__u64 sprg0;
> +	__u64 sprg1;
> +	__u64 sprg2;
> +	__u64 sprg3;
> +	__u64 srr0;
> +	__u64 srr1;
> +	__u64 dar;
> +	__u64 msr;
> +	__u32 dsisr;
> +	__u32 int_pending;	/* Tells the guest if we have an interrupt */
> +};
> +
> +Additions to the page must only occur at the end. Struct fields are always 32
> +bit aligned.

The u64s are 64-bit aligned, should they always be?

> +The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
> +respectively on 32 bit systems with an added offset of 4 to accommodate for big
> +endianness.

Will this add never overflow?  Is there anything that checks for it?

> +mtmsrd	rX, 0		b	<special mtmsr section>
> +mtmsr			b	<special mtmsr section>

mtmsr rX


Segher


* Re: [PATCH 11/27] KVM: PPC: Make RMO a define
From: Segher Boessenkool @ 2010-07-02 16:23 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277980982-12433-12-git-send-email-agraf@suse.de>

> v1 -> v2:
>
>   - RMO -> PAM

Except you forgot the subject line.


Segher


* Re: [PATCH 00/27] KVM PPC PV framework
From: Segher Boessenkool @ 2010-07-02 16:22 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277980982-12433-1-git-send-email-agraf@suse.de>

> [without]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m14.659s
> user    0m8.967s
> sys     0m5.688s
>
> [with]
>
> debian-powerpc:~# time for i in {1..1000}; do /bin/echo hello > /dev/null; done
>
> real    0m7.557s
> user    0m4.121s
> sys     0m3.426s
>
>
> So this is a significant performance improvement! I'm quite happy how fast this
> whole thing becomes :)

Yeah :-)  Do you have timings for the native system as well?


Segher


* Re: [PATCH 13/27] KVM: PPC: Magic Page Book3s support
From: Alexander Graf @ 2010-07-02 15:37 UTC (permalink / raw)
  To: kvm-ppc; +Cc: linuxppc-dev, KVM list
In-Reply-To: <1277980982-12433-14-git-send-email-agraf@suse.de>

Alexander Graf wrote:
> We need to override EA as well as PA lookups for the magic page. When the guest
> tells us to project it, the magic page overrides any guest mappings.
>
> In order to reflect that, we need to hook into all the MMU layers of KVM to
> force map the magic page if necessary.
>
> Signed-off-by: Alexander Graf <agraf@suse.de>
>
> v1 -> v2:
>
>   - RMO -> PAM
> ---
>  arch/powerpc/kvm/book3s.c             |    7 +++++++
>  arch/powerpc/kvm/book3s_32_mmu.c      |   16 ++++++++++++++++
>  arch/powerpc/kvm/book3s_32_mmu_host.c |   12 ++++++++++++
>  arch/powerpc/kvm/book3s_64_mmu.c      |   30 +++++++++++++++++++++++++++++-
>  arch/powerpc/kvm/book3s_64_mmu_host.c |   12 ++++++++++++
>  5 files changed, 76 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 14db032..b22e608 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -554,6 +554,13 @@ mmio:
>  
>  static int kvmppc_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>  {
> +	ulong mp_pa = vcpu->arch.magic_page_pa;
> +
> +	if (unlikely(mp_pa) &&
> +	    unlikely((mp_pa & KVM_RMO) >> PAGE_SHIFT == gfn)) {
>   

This should be KVM_PAM :(. Should I respin the whole thing or could
whoever commits this just make that trivial change?


Alex


* [PATCH 2/2] edac: mpc85xx: Add support for MPC8569 EDAC controllers
From: Anton Vorontsov @ 2010-07-02 12:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Tyser, linux-kernel, Dave Jiang, linuxppc-dev,
	Doug Thompson

Simply add a proper ID into the device table.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/edac/mpc85xx_edac.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 52ca09b..f39b00a 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -1120,6 +1120,7 @@ static struct of_device_id mpc85xx_mc_err_of_match[] = {
 	{ .compatible = "fsl,mpc8555-memory-controller", },
 	{ .compatible = "fsl,mpc8560-memory-controller", },
 	{ .compatible = "fsl,mpc8568-memory-controller", },
+	{ .compatible = "fsl,mpc8569-memory-controller", },
 	{ .compatible = "fsl,mpc8572-memory-controller", },
 	{ .compatible = "fsl,mpc8349-memory-controller", },
 	{ .compatible = "fsl,p2020-memory-controller", },
-- 
1.7.0.5


* [PATCH 1/2] edac: mpc85xx: Fix MPC85xx dependency
From: Anton Vorontsov @ 2010-07-02 12:41 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Tyser, linux-kernel, Dave Jiang, linuxppc-dev,
	Doug Thompson

Since commit 5753c082f66eca5be81f6bda85c1718c5eea6ada ("powerpc/85xx:
Kconfig cleanup"), there is no MPC85xx Kconfig symbol anymore, so the
driver became non-selectable.

This patch fixes the issue by switching to PPC_85xx symbol.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
---
 drivers/edac/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index aedef79..0d2f9db 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -209,7 +209,7 @@ config EDAC_I5100
 
 config EDAC_MPC85XX
 	tristate "Freescale MPC83xx / MPC85xx"
-	depends on EDAC_MM_EDAC && FSL_SOC && (PPC_83xx || MPC85xx)
+	depends on EDAC_MM_EDAC && FSL_SOC && (PPC_83xx || PPC_85xx)
 	help
 	  Support for error detection and correction on the Freescale
 	  MPC8349, MPC8560, MPC8540, MPC8548
-- 
1.7.0.5


* Re: [PATCH 16/27] KVM: Move kvm_guest_init out of generic code
From: Geert Uytterhoeven @ 2010-07-02  7:41 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <1277980982-12433-17-git-send-email-agraf@suse.de>

On Thu, 1 Jul 2010, Alexander Graf wrote:
> Currently x86 is the only architecture that uses kvm_guest_init(). With
> PowerPC we're getting a second user, but the signature is different there
> and we don't need to export it, as it uses the normal kernel init framework.

Making the signatures match (i.e. always return `int') wouldn't hurt,
since kvm_guest_init() apparently can fail on x86, too.

> So let's move the x86 specific definition of that function over to the x86
> specific header file.

With kind regards,

Geert Uytterhoeven
Software Architect
Techsoft Centre

Technology and Software Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010



* Re: [PATCH 16/27] KVM: Move kvm_guest_init out of generic code
From: Alexander Graf @ 2010-07-02  7:44 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: linuxppc-dev, KVM list, kvm-ppc
In-Reply-To: <alpine.LRH.2.00.1007020937160.17095@mink.sonytel.be>


On 02.07.2010, at 09:41, Geert Uytterhoeven wrote:

> On Thu, 1 Jul 2010, Alexander Graf wrote:
>> Currently x86 is the only architecture that uses kvm_guest_init(). With
>> PowerPC we're getting a second user, but the signature is different there
>> and we don't need to export it, as it uses the normal kernel init framework.
>
> Making the signatures match (i.e. always return `int') wouldn't hurt,
> since kvm_guest_init() apparently can fail on x86, too.

I'm reasonably indifferent here. The fact is that the x86 hook is done
completely differently from how we do it on ppc. So whatever we do, the
signature doesn't belong in generic code.

If you like, feel free to send a follow-up patch making the x86 signature
return failures :). I personally don't think it makes sense to expose
failures for PV speedups - they should never be mandatory and thus failure
is no problem for the system, so the caller doesn't need to know.

Alex
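
The argument above can be sketched in C. This is only an illustration under stated assumptions: `pv_probe_speedups()`, `arch_pv_guest_init()` and `pv_speedups_enabled` are hypothetical names, not the actual x86 or PowerPC hooks. The idea is that an optional paravirtual speedup can be set up by an arch-local init routine that deals with its own failure, so no return value has to reach generic code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state: whether the optional PV fast path is in use. */
static bool pv_speedups_enabled;

/*
 * Hypothetical probe: returns 0 on success, a negative value on failure.
 * The hypervisor_present flag stands in for real feature detection.
 */
static int pv_probe_speedups(bool hypervisor_present)
{
	return hypervisor_present ? 0 : -1;
}

/*
 * Arch-local init: returns void on purpose. A failed probe only means
 * the slower, unmodified code paths stay in place.
 */
static void arch_pv_guest_init(bool hypervisor_present)
{
	int ret = pv_probe_speedups(hypervisor_present);

	assert(ret <= 0);	/* the probe reports 0 or a negative error */
	if (ret < 0) {
		pv_speedups_enabled = false;
		fprintf(stderr, "pv: speedups unavailable, continuing\n");
		return;
	}
	pv_speedups_enabled = true;
}
```

Calling `arch_pv_guest_init(false)` leaves `pv_speedups_enabled` cleared and simply returns; the caller never has to check for an error.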

^ permalink raw reply

* Re: Oops while running fs_racer test on a POWER6 box against latest git
From: divya @ 2010-07-02  6:46 UTC (permalink / raw)
  To: maciej.rutecki
  Cc: Latchesar Ionkov, jaxboe, LKML, linuxppc-dev, Ron Minnich, hch
In-Reply-To: <201007012025.30452.maciej.rutecki@gmail.com>

On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
> On Wednesday, 30 June 2010 at 13:22:27 divya wrote:
>    
>> While running fs_racer test from LTP on a POWER6 box against latest
>> git(2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the following
>> warning followed by multiple oops.
>>
>>      
> I created a Bugzilla entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=16324
> for your bug report, please add your address to the CC list in there, thanks!
>
>
>    
Here I find a cleaner back trace while running fs_racer test from LTP on a POWER6
box against the latest git(2.6.35-rc3-git5 - commitid 980019d74e4b242)

Badness at kernel/mutex-debug.c:64
BUG: key (null) not in .data!
NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
REGS: c00000010bb176f0 TRAP: 0700   Not tainted  (2.6.35-rc3-git5-autotest)
BUG: key 00000000000001d8 not in .data!
BUG: key 00000000000001e0 not in .data!
BUG: key 00000000000001e8 not in .data!
MSR: 8000000000029032
Unable to handle kernel paging request for data at address 0x00000028
Faulting instruction address: 0xc0000000003ad0ec
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
Modules linked in:
NIP = fff9e98fc40  MSR = 800000004001d032
  ipv6 fuse loop
Unable to handle kernel paging request for unknown fault
  dm_mod
Faulting instruction address: 0xc00000000008d0f4
  sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
REGS: c000000109b4f610 TRAP: 0300   Not tainted  (2.6.35-rc3-git5-autotest)
MSR: 8000000000009032<EE,ME,IR,DR>   CR: 88004484  XER: 00000001
DAR: 0000000000000028, DSISR: 0000000040010000
TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798 0000000000000028
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR08: 0000000000000000 0000000000000028 c000000000189f2c c000000109a98600
GPR12: 0000000024004424 c00000000f602f80 00000000000041ff 0000000000000001
GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44 0000000000000000
GPR20: c00000010df77908 fffffffffffff000 0000000000010000 00000000000041ff
GPR24: c00000010df77758 c000000109fa1800 c00000010df77908 c0000000ff236600
GPR28: 0000000000000028 0000000000000040 c000000000ca7b38 c000000000189f2c
NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
Call Trace:
[c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4 (unreliable)
[c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
[c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
[c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
[c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
[c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
[c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
[c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
Instruction dump:
eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
38000000 7c691b78 980d0214 800d0008<7d601829>  2c0b0000 40c20010 7c00192d
Oops: Weird page fault, sig: 11 [#2]

Please let me know whether this back trace helps with further analysis.
Meanwhile I will run a git bisect and send the results.

Thanks
Divya

^ permalink raw reply

* RE: CONFIG_NO_HZ causing poor console responsiveness
From: Li Yang-R58472 @ 2010-07-02  6:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Tabi Timur-B04825; +Cc: Linuxppc-dev Development
In-Reply-To: <1278049644.4200.377.camel@pasglop>


>-----Original Message-----
>From: linuxppc-dev-bounces+leoli=freescale.com@lists.ozlabs.org
>[mailto:linuxppc-dev-bounces+leoli=freescale.com@lists.ozlabs.org] On
>Behalf Of Benjamin Herrenschmidt
>Sent: Friday, July 02, 2010 1:47 PM
>To: Tabi Timur-B04825
>Cc: Linuxppc-dev Development
>Subject: Re: CONFIG_NO_HZ causing poor console responsiveness
>
>On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
>> I'm adding support for a new e500-based board (the P1022DS), and in
>> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
>> System / Dynamic Ticks) causes significant responsiveness problems on
>> the serial console.  When I type on the console, I see delays of up to
>> a half-second for almost every character.  It acts as if there's a
>> background process eating all the CPU.
>>
>> I don't have time to debug this thoroughly at the moment.  The problem
>> occurs in the latest kernel, but it appears not to occur in 2.6.32.
>>
>> Has anyone else seen anything like this?
>
>I noticed that on the bimini with 2.6.35-rc* though I didn't get to track
>it down yet.


The patch at the following location fixed this problem.

http://www.spinics.net/lists/linux-tip-commits/msg08279.html

I hope it has already been merged.

- Leo

^ permalink raw reply

* Re: CONFIG_NO_HZ causing poor console responsiveness
From: Benjamin Herrenschmidt @ 2010-07-02  5:47 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Linuxppc-dev Development
In-Reply-To: <AANLkTilMzfwgYvoFhxhcVQVGV-EkMLVHI2TeQ29SYFCH@mail.gmail.com>

On Tue, 2010-06-29 at 14:54 -0500, Timur Tabi wrote:
> I'm adding support for a new e500-based board (the P1022DS), and in
> the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> System / Dynamic Ticks) causes significant responsiveness problems on
> the serial console.  When I type on the console, I see delays of up to
> a half-second for almost every character.  It acts as if there's a
> background process eating all the CPU.
> 
> I don't have time to debug this thoroughly at the moment.  The problem
> occurs in the latest kernel, but it appears not to occur in 2.6.32.
> 
> Has anyone else seen anything like this?

I noticed that on the bimini with 2.6.35-rc* though I didn't get to
track it down yet.

Cheers,
Ben.

^ permalink raw reply

* Re: CONFIG_NO_HZ causing poor console responsiveness
From: Tabi Timur-B04825 @ 2010-07-02  3:54 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Linuxppc-dev Development
In-Reply-To: <1278042390.19236.5.camel@marge.simson.net>

On Jul 1, 2010, at 10:46 PM, "Mike Galbraith" <efault@gmx.de> wrote:
> 
> Hi Timur,
> 
> This has already been fixed.  Below is the final fix from tip.

Thanks, Mike.  I thought I was using the latest code, but I guess not.

^ permalink raw reply

* Re: CONFIG_NO_HZ causing poor console responsiveness
From: Mike Galbraith @ 2010-07-02  3:46 UTC (permalink / raw)
  To: Timur Tabi; +Cc: Linuxppc-dev Development
In-Reply-To: <AANLkTikLivGEFl_DJsvWArbvE1yYUBFn9yiQxqsZgTma@mail.gmail.com>

On Thu, 2010-07-01 at 16:55 -0500, Timur Tabi wrote:
> On Tue, Jun 29, 2010 at 2:54 PM, Timur Tabi <timur@freescale.com> wrote:
> > I'm adding support for a new e500-based board (the P1022DS), and in
> > the process I've discovered that enabling CONFIG_NO_HZ (Tickless
> > System / Dynamic Ticks) causes significant responsiveness problems on
> > the serial console.  When I type on the console, I see delays of up to
> > a half-second for almost every character.  It acts as if there's a
> > background process eating all the CPU.
> 
> I finally finished my git-bisect, and it wasn't that helpful.  I had
> to skip several commits because the kernel just wouldn't boot:
> 
> There are only 'skip'ped commits left to test.
> The first bad commit could be any of:
> 6bc6cf2b61336ed0c55a615eb4c0c8ed5daf3f08
> 8b911acdf08477c059d1c36c21113ab1696c612b
> 21406928afe43f1db6acab4931bb8c886f4d04ce
> 5ca9880c6f4ba4c84b517bc2fed5366adf63d191
> a64692a3afd85fe048551ab89142fd5ca99a0dbd
> f2e74eeac03ffb779d64b66a643c5e598145a28b
> c6ee36c423c3ed1fb86bb3eabba9fc256a300d16
> e12f31d3e5d36328c7fbd0fce40a95e70b59152c
> 13814d42e45dfbe845a0bbe5184565d9236896ae
> b42e0c41a422a212ddea0666d5a3a0e3c35206db
> 39c0cbe2150cbd848a25ba6cdb271d1ad46818ad <== the crime scene
> beac4c7e4a1cc6d57801f690e5e82fa2c9c245c8
> 41acab8851a0408c1d5ad6c21a07456f88b54d40
> 6427462bfa50f50dc6c088c07037264fcc73eca1
> c9494727cf293ae2ec66af57547a3e79c724fec2
> We cannot bisect more!
> 
> These correspond to a batch of scheduler patches, most from Mike Galbraith.
> 
> I don't know what to do now.  I can't test any of these commits.  Even
> if I could, they look like they're all part of one set, so I doubt I
> could narrow it down to one commit anyway.

Hi Timur,

This has already been fixed.  Below is the final fix from tip.

commit 3310d4d38fbc514e7b18bd3b1eea8effdd63b5aa
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Jun 17 18:02:37 2010 +0200

    nohz: Fix nohz ratelimit
    
    Chris Wedgwood reports that 39c0cbe (sched: Rate-limit nohz) causes a
    serial console regression, unresponsiveness, and indeed it does. The
    reason is that the nohz code is skipped even when the tick was already
    stopped before the nohz_ratelimit(cpu) condition changed.
    
    Move the nohz_ratelimit() check to the other conditions which prevent
    long idle sleeps.
    
    Reported-by: Chris Wedgwood <cw@f00f.org>
    Tested-by: Brian Bloniarz <bmb@athenacr.com>
    Signed-off-by: Mike Galbraith <efault@gmx.de>
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Jiri Kosina <jkosina@suse.cz>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Greg KH <gregkh@suse.de>
    Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
    Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Cc: Jef Driesen <jefdriesen@telenet.be>
    LKML-Reference: <1276790557.27822.516.camel@twins>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..783fbad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -315,9 +315,6 @@ void tick_nohz_stop_sched_tick(int inidle)
 		goto end;
 	}
 
-	if (nohz_ratelimit(cpu))
-		goto end;
-
 	ts->idle_calls++;
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -328,7 +325,7 @@ void tick_nohz_stop_sched_tick(int inidle)
 	} while (read_seqretry(&xtime_lock, seq));
 
 	if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu) ||
-	    arch_needs_cpu(cpu)) {
+	    arch_needs_cpu(cpu) || nohz_ratelimit(cpu)) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
 	} else {
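
The effect of moving the check can be illustrated with a toy model in C. This is a rough sketch, not the kernel code: `struct nohz_result` and both function names are made up for illustration, `other_need_cpu` stands in for the rcu/printk/arch checks, and `next_timer_delta` is an assumed stand-in for whatever the `else` branch (not shown in the hunk) computes. Before the change, a rate-limited CPU left the function early and never reached the tick bookkeeping; after it, the rate limit is just one more reason to keep the next sleep at one jiffy:

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of one pass through the (heavily simplified) idle path. */
struct nohz_result {
	bool reached_tick_logic;	/* false: bailed out via the early exit */
	unsigned long delta_jiffies;	/* only meaningful if reached_tick_logic */
};

/* Old placement: the rate limit bails out before any tick bookkeeping. */
static struct nohz_result stop_tick_before_fix(bool ratelimited,
					       bool other_need_cpu,
					       unsigned long next_timer_delta)
{
	struct nohz_result r = { false, 0 };

	assert(next_timer_delta >= 1);	/* assumed: next timer is in the future */
	if (ratelimited)
		return r;		/* models the removed "goto end" */

	r.reached_tick_logic = true;
	r.delta_jiffies = other_need_cpu ? 1 : next_timer_delta;
	return r;
}

/* New placement: the rate limit is one more reason to keep sleeps short. */
static struct nohz_result stop_tick_after_fix(bool ratelimited,
					      bool other_need_cpu,
					      unsigned long next_timer_delta)
{
	struct nohz_result r = { true, 0 };

	assert(next_timer_delta >= 1);	/* assumed: next timer is in the future */
	r.delta_jiffies = (other_need_cpu || ratelimited) ? 1 : next_timer_delta;
	return r;
}
```

In this model a rate-limited call to `stop_tick_before_fix()` never reaches the tick logic at all, which matches the commit message's description of the nohz code being skipped; the same call to `stop_tick_after_fix()` reaches it with a one-jiffy delta.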

^ permalink raw reply related

* Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
From: Benjamin Herrenschmidt @ 2010-07-02  2:54 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm-ppc, linuxppc-dev, Alexander Graf, KVM list
In-Reply-To: <4C2C9B36.8000002@redhat.com>

On Thu, 2010-07-01 at 16:42 +0300, Avi Kivity wrote:
> > So I think the only reasonable way to implement page ageing is to unmap
> > pages. And that's slow, because it means we have to map them again on
> > access. Bleks. Or we could look for the HTAB entry and only unmap them
> > if the entry is moot.
> >
>
> I think it works out if you update struct page when you clear out an
> HTAB.

Hrm... going to struct page without going through the PTE might work out
indeed. We can get to the struct page from the RPN.

However, that means -reading- the hash entry we want to evict, and
that's a fairly expensive H-Call, especially if we ask phyp to
back-translate the real address into a logical (partition) address so we
can get to the struct page.... Alternatively, we might be able to
reconstitute the virtual address from the hash content + bucket address.
However, getting from the vsid back to the page table might be tricky as
well.

I.e., either way, it's not a simple process.

Now, eviction is rare, our MMU hash is generally big, so maybe the read
back with back translate to hit struct page might be the way to go here.

As for other kinds of invalidations, we do have the PTE around when they
happen so we can go fetch the HW ref bit and update the PTE I suppose.

Ben.

^ permalink raw reply

* Re: [PATCH 0/2] Faster MMU lookups for Book3s v3
From: Benjamin Herrenschmidt @ 2010-07-02  2:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, linuxppc-dev, Avi Kivity, KVM list
In-Reply-To: <4C2C8FA8.1030702@suse.de>

On Thu, 2010-07-01 at 14:52 +0200, Alexander Graf wrote:
> Page ageing is difficult. The HTAB has a hardware set referenced bit,
> but we don't have a guarantee that the entry is still there when we look
> for it. Something else could have overwritten it by then, but the entry
> could still be lingering around in the TLB.
> 
> So I think the only reasonable way to implement page ageing is to unmap
> pages. And that's slow, because it means we have to map them again on
> access. Bleks. Or we could look for the HTAB entry and only unmap them
> if the entry is moot.

Well, not quite.

We -could- use the HW reference bit. However, that means that whenever
we flush the hash PTE we get a snapshot of the HW bit and copy it over
to the PTE.

That's not -that- bad for normal invalidations. However, it's
potentially a problem for eviction, i.e. when a hash bucket is full, we
pseudo-randomly evict a slot. If we were to use the HW ref bit, we would
need a way to go back to the PTE from the hash bucket to perform that
update (or something really tricky like sticking it in a list somewhere,
and have the young test walk that list when non-empty, etc...)

Cheers,
Ben.
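
The normal-invalidation case described above can be sketched as follows. This is a simplified illustration under loud assumptions: `struct hash_slot`, `struct linux_pte` and `flush_hash_pte()` are hypothetical and do not reflect the real HPTE or PTE layouts. The point is only that a flush starts from the PTE, so the hardware reference bit can be copied over before the hash entry goes away:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hash-table slot (not the real HPTE layout). */
struct hash_slot {
	bool valid;
	bool hw_referenced;	/* set by hardware on access */
};

/* Hypothetical, simplified Linux PTE. */
struct linux_pte {
	bool accessed;		/* software view of "young" */
	struct hash_slot *slot;	/* NULL if not currently hashed */
};

/*
 * Normal invalidation: we start from the PTE, so the slot is at hand
 * and its reference bit can be folded into the PTE before the slot dies.
 */
static void flush_hash_pte(struct linux_pte *pte)
{
	assert(pte != NULL);
	if (pte->slot == NULL)
		return;			/* nothing hashed, nothing to copy */
	if (pte->slot->valid && pte->slot->hw_referenced)
		pte->accessed = true;	/* snapshot the HW bit */
	pte->slot->valid = false;
	pte->slot->hw_referenced = false;
	pte->slot = NULL;
}
```

Eviction is the harder case because it starts from the slot: in this sketch `struct hash_slot` has no pointer back to its `struct linux_pte`, which is the missing link discussed above.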

^ permalink raw reply

* Re: Oops while running fs_racer test on a POWER6 box against latest git
From: Michael Neuling @ 2010-07-02  1:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Latchesar Ionkov, Jens Axboe, LKML, linuxppc-dev, Ron Minnich,
	Christoph Hellwig, divya
In-Reply-To: <20100701105907.GK22976@laptop>

In message <20100701105907.GK22976@laptop> you wrote:
> On Thu, Jul 01, 2010 at 03:04:54PM +1000, Michael Neuling wrote:
> > > While running fs_racer test from LTP on a POWER6 box against latest git(2.6.35-rc3-git4 - commitid 984bc9601f64fd)
> > > came across the following warning followed by multiple oops.
> > >
> > > ------------[ cut here ]------------
> > >
> > > Badness at kernel/mutex-debug.c:64
> > > NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> > > REGS: c00000010be8f6f0 TRAP: 0700   Not tainted  (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000029032<EE,ME,CE,IR,DR>    CR: 24224422  XER: 00000012
> > > TASK = c00000010727cf00[8211] 'fs_racer_file_c' THREAD: c00000010be8bb50 CPU: 2
> > > GPR00: 0000000000000000 c00000010be8f970 c000000000d3d798 0000000000000001
> > > GPR04: c00000010be8fa70 c00000010be8c000 c00000010727d9f8 0000000000000000
> > > GPR08: c0000000043042f0 c0000000016534e8 000000000000017a c000000000c29a1c
> > > GPR12: 0000000028228424 c00000000f600500 c00000010be8fc40 0000000020000000
> > > GPR16: fffffffffffff000 c000000109c73000 c00000010be8fc30 0000000000010442
> > > GPR20: 0000000000000000 0000000000000000 00000000000001b6 c00000010dd12250
> > > GPR24: c00000000017c08c c00000010727cf00 c00000010dd12278 c00000010dd12210
> > > GPR28: 0000000000000001 c00000010be8c000 c000000000ca2008 c00000010be8fa70
> > > NIP [c0000000000be9e8] .mutex_remove_waiter+0xa4/0x130
> > > LR [c0000000000be9cc] .mutex_remove_waiter+0x88/0x130
> > > Call Trace:
> > > [c00000010be8f970] [c00000010be8fa00] 0xc00000010be8fa00 (unreliable)
> > > [c00000010be8fa00] [c00000000064a9f0] .mutex_lock_nested+0x384/0x430
> > > Instruction dump:
> > > e81f0010 e93d0000 7fa04800 41fe0028 482e96e5 60000000 2fa30000 419e0018
> > > e93e8008 80090000 2f800000 409e0008<0fe00000>   e93e8000 80090000 2f800000
> > > Unable to handle kernel paging request for unknown fault
> > > Faulting instruction address: 0xc00000000008d0f4
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > SMP NR_CPUS=1024 NUMA
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > pSeries
> > > last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> > > Modules linked in: ipv6 fuse loop dm_mod sr_mod cdrom ibmveth sg
> > > sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt scsi_mod
> > > NIP: c00000000008d0f4 LR: c00000000008d0d0 CTR: 0000000000000000
> > > REGS: c00000010978f900 TRAP: 0600   Tainted: G        W    (2.6.35-rc3-git4-autotest)
> > > MSR: 8000000000009032
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > Unrecoverable FP Unavailable Exception 800 at c000000000648ed4
> > > EE,ME,IR,DR>    CR: 24022442  XER: 00000012
> > > DAR: c000000000648f54, DSISR: 0000000040010000
> > > TASK = c0000001096e4900[7353] 'fs_racer_file_s' THREAD: c00000010978c000 CPU: 10
> > > GPR00: 0000000000004000 c00000010978fb80 c000000000d3d798 0000000000000001
> > > GPR04: c00000000083539e c000000001610228 0000000000000000 c0000000054c6880
> > > GPR08: 00000000000006a5 c000000000648f54 0000000000000007 00000000049b0000
> > > GPR12: 0000000000000000 c00000000f601900 00000000ffffffff ffffffffffffffff
> > > GPR16: 000000004b7dc520 0000000000000000 0000000000000000 c00000010978fea0
> > > GPR20: 00000fffcca7e7a0 00000fffcca7e7a0 00000fffabf7dfd0 00000fffabf7dfd0
> > > GPR24: 0000000000000000 0000000001200011 c000000000e1c0a8 c000000000648ed4
> > > GPR28: 0000000000000000 c0000001096e4900 c000000000ca0458 c00000010725d400
> > > NIP [c00000000008d0f4] .copy_process+0x310/0xf40
> > > LR [c00000000008d0d0] .copy_process+0x2ec/0xf40
> > > Call Trace:
> > > [c00000010978fb80] [c00000000008d0d0] .copy_process+0x2ec/0xf40 (unreliable)
> > > [c00000010978fc80] [c00000000008deb4] .do_fork+0x190/0x3cc
> > > [c00000010978fdc0] [c000000000011ef4] .sys_clone+0x58/0x70
> > > [c00000010978fe30] [c0000000000087f0] .ppc_clone+0x8/0xc
> > > Instruction dump:
> > > 419e0010 7fe3fb78 480774cd 60000000 801f0014 e93f0008 7800b842 39290080
> > > 78004800 60000042 901f0014 38004000<7d6048a8>   7d6b0078 7d6049ad 40c2fff4
> > > 
> > > Kernel version 2.6.34-rc3-git3 works fine.
> > 
> > Should this read 2.6.35-rc3-git3?
> > 
> > If so, there's only about 20 commits in:
> > 5904b3b81d2516..984bc9601f64fd
> > 
> > The likely fs related candidates are from Christoph and Nick Piggin
> > (added to CC)
> > 
> > No commits relating to POWER6 or PPC.
> 
> Not sure what's happening here. The first warning looks like some mutex
> corruption, but it doesn't have a stack trace (these are 2 separate
> dumps, right? ie. the copy_process stack doesn't relate to the mutex
> warning?) So I don't have much idea.
> 
> If it is reproducible, can you try getting a better stack trace, or
> better yet, even bisecting if there is just a small window?

I can't reproduce the bug here on POWER6 or POWER7.

Divya, can you bisect this?

Mikey

^ permalink raw reply
