* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-11 20:41 UTC (permalink / raw)
To: Wood Scott-B07421
Cc: Tabi Timur-B04825, Alexander Graf, linuxppc-dev@ozlabs.org,
Gala Kumar-B11780
In-Reply-To: <20110711130430.4b3036f6@schlenkerla.am.freescale.net>
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, July 11, 2011 1:05 PM
> To: Yoder Stuart-B08248
> Cc: Wood Scott-B07421; Tabi Timur-B04825; Grant Likely; Benjamin Herrensc=
hmidt; Gala Kumar-
> B11780; Alexander Graf; linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>=20
> On Mon, 11 Jul 2011 12:41:20 -0500
> Yoder Stuart-B08248 <B08248@freescale.com> wrote:
>=20
> >
> >
> > > -----Original Message-----
> > > From: Wood Scott-B07421
> > > Sent: Monday, July 11, 2011 11:24 AM
> > > To: Tabi Timur-B04825
> > > Cc: Yoder Stuart-B08248; Grant Likely; Benjamin Herrenschmidt; Gala
> > > Kumar-B11780; Wood Scott- B07421; Alexander Graf;
> > > linuxppc-dev@ozlabs.org
> > > Subject: Re: RFC: top level compatibles for virtual platforms
> > >
> > > On Mon, 11 Jul 2011 10:45:47 -0500
> > > Timur Tabi <timur@freescale.com> wrote:
> > >
> > > > >> Also, if these are KVM creations, shouldn't there be a "kvm" in
> > > > >> the compatible string somewhere?
> > > > >
> > > > > There is nothing KVM specific about these platforms. Any
> > > > > hypervisor could create a similar virtual machine.
> > > >
> > > > True, but I think we're on a slippery slope, here. Virtualization
> > > > allows us to create "virtual platforms" that are not well defined.
> > > > Linux requires a unique compatible string for each platform.
> > >
> > > The device tree is supposed to describe the hardware (virtual or
> > > otherwise), not just supply what Linux wants. Perhaps there simply
> > > shouldn't be a toplevel compatible if there's nothing appropriate to =
describe there -- and
> fix whatever issues Linux has with that.
> >
> > But there is a concept in Linux of a platform 'machine':
>=20
> So have a Linux "machine" that is used when no other one matches. That d=
oesn't justify making
> something up in the device tree.
>=20
> > define_machine(p4080_ds) {
> > .name =3D "P4080 DS",
> > .probe =3D p4080_ds_probe,
> > .setup_arch =3D corenet_ds_setup_arch,
> > .init_IRQ =3D corenet_ds_pic_init,
> > #ifdef CONFIG_PCI
> > .pcibios_fixup_bus =3D fsl_pcibios_fixup_bus,
> > #endif
> > .get_irq =3D mpic_get_coreint_irq,
> > .restart =3D fsl_rstcr_restart,
> > .calibrate_decr =3D generic_calibrate_decr,
> > .progress =3D udbg_progress,
> > };
> >
> > Right now p4080_ds_probe needs something to match on to determine
> > whether this is the machine type. How would it work if
> > there was no top level compatible to match on? Some
> > platforms (e.g. e500v2-type) need mpc85xx_ds_pic_init(), others need
> > corenet_ds_pic_init().
>=20
> Just because Linux does it that way now doesn't mean it needs to. The in=
terrupt controller
> has a compatible property. Match on it like any other device. You can f=
ind which one is the
> root interrupt controller by looking for nodes with the interrupt-control=
ler property that
> doesn't have an explicit interrupt-parent (or an interrupts property? se=
ems to be a conflict
> between ePAPR and the original interrupt mapping document).
This may be the right long term thing to do, but restructuring
how Linux powerpc platforms work is a bigger effort. I was looking
for an incremental improvement over what we do now, which is pass
a compatible of MPC8544DS and P4080DS for these virtual platforms.
However, they _are_ compatible with MPC8544DS and P4080DS so maybe
leaving the compatible string alone is ok for now.
Stuart
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Scott Wood @ 2011-07-11 20:06 UTC (permalink / raw)
To: Grant Likely
Cc: Wood Scott-B07421, linuxppc-dev@ozlabs.org, Tabi Timur-B04825,
Alexander Graf, Gala Kumar-B11780
In-Reply-To: <CACxGe6sRjGzk2iVDXCMZS=AZk8mc_CtGp7CSqnO2WY-yih1Q8A@mail.gmail.com>
On Tue, 12 Jul 2011 04:59:33 +0900
Grant Likely <grant.likely@secretlab.ca> wrote:
> However, compatible values are cheap and while theoretically any
> hypervisor could create a similar machine, the reality is probably
> subtle difference between the implementations. I'd rather see the
> compatible reflect the specific implementation.
That's what the hypervisor node is for. We have a tree, let's use it. :-)
-Scott
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Grant Likely @ 2011-07-11 19:59 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, Tabi Timur-B04825, Alexander Graf,
linuxppc-dev@ozlabs.org, Gala Kumar-B11780
In-Reply-To: <9F6FE96B71CF29479FF1CDC8046E150316F97F@039-SN1MPN1-003.039d.mgd.msft.net>
On Mon, Jul 11, 2011 at 11:34 PM, Yoder Stuart-B08248
<B08248@freescale.com> wrote:
>
>
>> -----Original Message-----
>> From: Tabi Timur-B04825
>> Sent: Friday, July 08, 2011 8:39 PM
>> To: Yoder Stuart-B08248
>> Cc: Grant Likely; Benjamin Herrenschmidt; Gala Kumar-B11780; Wood Scott-=
B07421; Alexander
>> Graf; linuxppc-dev@ozlabs.org
>> Subject: Re: RFC: top level compatibles for virtual platforms
>>
>> On Fri, Jul 8, 2011 at 1:43 PM, Yoder Stuart-B08248 <B08248@freescale.co=
m> wrote:
>>
>> > =A0 "MPC85xxDS" - for a virtual machine for the e500v2 type platforms
>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 and would support 85xx targets, plus P=
2020, P1022,etc
>> >
>> > =A0 "corenet-32-ds" - for a virtual machine similar to the 32-bit P408=
0
>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platforms
>> >
>> > =A0 "corenet-64-ds" - for a virtual machine based on a 64-bit corenet
>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platform
>>
>> I think we should drop the "DS" because that's a name applied to certain=
Freescale reference
>> boards.
>>
>> Is being a CoreNet board really something meaningful with respect to KVM=
? =A0I don't see the
>> connection.
>
> We're talking about what would be meaningful to Linux as a guest on
> this platform here-- =A0Corenet-based SoCs are similar
> in various ways, like using msgsnd for IPIs, having external proxy
> support, etc.
>
> A corenet platform created by a QEMU/KVM looks similar
> to other corenet SoCs. =A0 So, I'm trying to find some generic
> compatible string that describes this platform.
>
>> Also, if these are KVM creations, shouldn't there be a "kvm" in the comp=
atible string
>> somewhere?
>
> There is nothing KVM specific about these platforms. =A0Any hypervisor
> could create a similar virtual machine.
>
> A guest OS can determine specific info about the hypervisor it is
> running on by looking at the /hypervisor node on the device
> tree.
>
> We could put a generic -hv extension to indicate that this is
> a virtual platform.
>
> =A0"mpc85xx-hv"
> =A0"corenet-32-hv"
> =A0"corenet-64-hv"
However, compatible values are cheap and while theoretically any
hypervisor could create a similar machine, the reality is probably
subtle difference between the implementations. I'd rather see the
compatible reflect the specific implementation.
g.
^ permalink raw reply
* Re: [PATCH] powerpc/85xx: fix mpic configuration in CAMP mode
From: Scott Wood @ 2011-07-11 19:38 UTC (permalink / raw)
To: Fabio Baltieri; +Cc: Poonam Aggrwal, linuxppc-dev, linux-kernel
In-Reply-To: <CABkP77e6vLmLxk3wCP39GJ7+yKnJZFc7BPUZ1OkdA8d7KkBCiw@mail.gmail.com>
On Sun, 10 Jul 2011 20:55:32 +0200
Fabio Baltieri <fabio.baltieri@gmail.com> wrote:
> Change the string to check for CAMP mode boot on MPC85xx (eg. P2020) to match
> the one in the corresponding dts files (p2020rdb_camp_core{0,1}.dts).
>
> Without this fix the mpic is configured as in the SMP boot mode, which causes
> the first core to report a protected source interrupt error for devices
> of the other core and lock up.
>
> Signed-off-by: Fabio Baltieri <fabio.baltieri@gmail.com>
> ---
> arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> index 088f30b..a1e5e70 100644
> --- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> +++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
> @@ -58,7 +58,7 @@ void __init mpc85xx_rdb_pic_init(void)
> return;
> }
>
> - if (of_flat_dt_is_compatible(root, "fsl,85XXRDB-CAMP")) {
> + if (of_flat_dt_is_compatible(root, "fsl,MPC85XXRDB-CAMP")) {
> mpic = mpic_alloc(np, r.start,
> MPIC_PRIMARY |
> MPIC_BIG_ENDIAN | MPIC_BROKEN_FRR_NIRQS,
Shouldn't we be setting MPIC_SINGLE_DEST_CPU in this case (as we do for
the other case)?
Or just drop this and specify pic-no-reset in the mpic node.
-Scott
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Scott Wood @ 2011-07-11 18:04 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, Tabi Timur-B04825, Alexander Graf,
linuxppc-dev@ozlabs.org, Gala Kumar-B11780
In-Reply-To: <9F6FE96B71CF29479FF1CDC8046E150316FBA5@039-SN1MPN1-003.039d.mgd.msft.net>
On Mon, 11 Jul 2011 12:41:20 -0500
Yoder Stuart-B08248 <B08248@freescale.com> wrote:
>
>
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Monday, July 11, 2011 11:24 AM
> > To: Tabi Timur-B04825
> > Cc: Yoder Stuart-B08248; Grant Likely; Benjamin Herrenschmidt; Gala Kumar-B11780; Wood Scott-
> > B07421; Alexander Graf; linuxppc-dev@ozlabs.org
> > Subject: Re: RFC: top level compatibles for virtual platforms
> >
> > On Mon, 11 Jul 2011 10:45:47 -0500
> > Timur Tabi <timur@freescale.com> wrote:
> >
> > > >> Also, if these are KVM creations, shouldn't there be a "kvm" in the
> > > >> compatible string somewhere?
> > > >
> > > > There is nothing KVM specific about these platforms. Any hypervisor
> > > > could create a similar virtual machine.
> > >
> > > True, but I think we're on a slippery slope, here. Virtualization
> > > allows us to create "virtual platforms" that are not well defined.
> > > Linux requires a unique compatible string for each platform.
> >
> > The device tree is supposed to describe the hardware (virtual or otherwise), not just supply
> > what Linux wants. Perhaps there simply shouldn't be a toplevel compatible if there's nothing
> > appropriate to describe there -- and fix whatever issues Linux has with that.
>
> But there is a concept in Linux of a platform 'machine':
So have a Linux "machine" that is used when no other one matches. That
doesn't justify making something up in the device tree.
> define_machine(p4080_ds) {
> .name = "P4080 DS",
> .probe = p4080_ds_probe,
> .setup_arch = corenet_ds_setup_arch,
> .init_IRQ = corenet_ds_pic_init,
> #ifdef CONFIG_PCI
> .pcibios_fixup_bus = fsl_pcibios_fixup_bus,
> #endif
> .get_irq = mpic_get_coreint_irq,
> .restart = fsl_rstcr_restart,
> .calibrate_decr = generic_calibrate_decr,
> .progress = udbg_progress,
> };
>
> Right now p4080_ds_probe needs something to match on to determine
> whether this is the machine type. How would it work if
> there was no top level compatible to match on? Some
> platforms (e.g. e500v2-type) need mpc85xx_ds_pic_init(),
> others need corenet_ds_pic_init().
Just because Linux does it that way now doesn't mean it needs to. The
interrupt controller has a compatible property. Match on it like any other
device. You can find which one is the root interrupt controller by looking
for nodes with the interrupt-controller property that doesn't have an
explicit interrupt-parent (or an interrupts property? seems to be a
conflict between ePAPR and the original interrupt mapping document).
-Scott
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Timur Tabi @ 2011-07-11 17:54 UTC (permalink / raw)
To: Scott Wood
Cc: Wood Scott-B07421, linuxppc-dev@ozlabs.org, Alexander Graf,
Gala Kumar-B11780
In-Reply-To: <20110711112418.4db9f41e@schlenkerla.am.freescale.net>
Scott Wood wrote:
> The device tree is supposed to describe the hardware (virtual or
> otherwise), not just supply what Linux wants. Perhaps there simply
> shouldn't be a toplevel compatible if there's nothing appropriate to
> describe there -- and fix whatever issues Linux has with that.
That might be the way to go. I wonder if we can get rid of the platform file
altogether, at least in some situations.
> But what about this is specific to kvm (the actual hypervisor info is
> already described in /hypervisor)? Then we'll have to add a platform match
> for every other hypervisor out there that does the same thing.
I don't know enough about KVM to answer that question.
Frankly, I like the approach that Topaz takes -- add a "-hv" to the real
hardware platform. The only drawback is that each platform needs to add support
for virtualization, but we already have this problem with Topaz today.
--
Timur Tabi
Linux kernel developer at Freescale
^ permalink raw reply
* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-11 17:41 UTC (permalink / raw)
To: Wood Scott-B07421, Tabi Timur-B04825
Cc: Gala Kumar-B11780, Alexander Graf, linuxppc-dev@ozlabs.org
In-Reply-To: <20110711112418.4db9f41e@schlenkerla.am.freescale.net>
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Monday, July 11, 2011 11:24 AM
> To: Tabi Timur-B04825
> Cc: Yoder Stuart-B08248; Grant Likely; Benjamin Herrenschmidt; Gala Kumar=
-B11780; Wood Scott-
> B07421; Alexander Graf; linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>=20
> On Mon, 11 Jul 2011 10:45:47 -0500
> Timur Tabi <timur@freescale.com> wrote:
>=20
> > >> Also, if these are KVM creations, shouldn't there be a "kvm" in the
> > >> compatible string somewhere?
> > >
> > > There is nothing KVM specific about these platforms. Any hypervisor
> > > could create a similar virtual machine.
> >
> > True, but I think we're on a slippery slope, here. Virtualization
> > allows us to create "virtual platforms" that are not well defined.
> > Linux requires a unique compatible string for each platform.
>=20
> The device tree is supposed to describe the hardware (virtual or otherwis=
e), not just supply
> what Linux wants. Perhaps there simply shouldn't be a toplevel compatibl=
e if there's nothing
> appropriate to describe there -- and fix whatever issues Linux has with t=
hat.
But there is a concept in Linux of a platform 'machine':
define_machine(p4080_ds) {
.name =3D "P4080 DS",
.probe =3D p4080_ds_probe,
.setup_arch =3D corenet_ds_setup_arch,
.init_IRQ =3D corenet_ds_pic_init,
#ifdef CONFIG_PCI
.pcibios_fixup_bus =3D fsl_pcibios_fixup_bus,
#endif
.get_irq =3D mpic_get_coreint_irq,
.restart =3D fsl_rstcr_restart,
.calibrate_decr =3D generic_calibrate_decr,
.progress =3D udbg_progress,
};
Right now p4080_ds_probe needs something to match on to determine
whether this is the machine type. How would it work if=20
there was no top level compatible to match on? Some=20
platforms (e.g. e500v2-type) need mpc85xx_ds_pic_init(),
others need corenet_ds_pic_init(). We need a way to
select the machine.
Stuart
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Scott Wood @ 2011-07-11 16:24 UTC (permalink / raw)
To: Timur Tabi
Cc: Wood Scott-B07421, linuxppc-dev@ozlabs.org, Alexander Graf,
Gala Kumar-B11780
In-Reply-To: <4E1B1AAB.8010301@freescale.com>
On Mon, 11 Jul 2011 10:45:47 -0500
Timur Tabi <timur@freescale.com> wrote:
> >> Also, if these are KVM creations, shouldn't there be a "kvm" in the compatible string
> >> somewhere?
> >
> > There is nothing KVM specific about these platforms. Any hypervisor
> > could create a similar virtual machine.
>
> True, but I think we're on a slippery slope, here. Virtualization allows us to
> create "virtual platforms" that are not well defined. Linux requires a unique
> compatible string for each platform.
The device tree is supposed to describe the hardware (virtual or
otherwise), not just supply what Linux wants. Perhaps there simply
shouldn't be a toplevel compatible if there's nothing appropriate to
describe there -- and fix whatever issues Linux has with that.
> I guess my point is back to the name "corenet". That just doesn't mean anything
> to me, and I don't think it means much to anyone else, either. That's why I
> think that maybe "kvm" should be in the string, to at least indicate that it's a
> virtualized environment.
But what about this is specific to kvm (the actual hypervisor info is
already described in /hypervisor)? Then we'll have to add a platform match
for every other hypervisor out there that does the same thing.
-Scott
^ permalink raw reply
* Re: RFC: top level compatibles for virtual platforms
From: Timur Tabi @ 2011-07-11 15:45 UTC (permalink / raw)
To: Yoder Stuart-B08248
Cc: Wood Scott-B07421, Alexander Graf, linuxppc-dev@ozlabs.org,
Gala Kumar-B11780
In-Reply-To: <9F6FE96B71CF29479FF1CDC8046E150316F97F@039-SN1MPN1-003.039d.mgd.msft.net>
Yoder Stuart-B08248 wrote:
> We're talking about what would be meaningful to Linux as a guest on
> this platform here-- Corenet-based SoCs are similar
> in various ways, like using msgsnd for IPIs, having external proxy
> support, etc.
>
> A corenet platform created by a QEMU/KVM looks similar
> to other corenet SoCs. So, I'm trying to find some generic
> compatible string that describes this platform.
Is there a list of these features that are 100% guaranteed to belong to a
corenet platform?
I'm just not comfortable using "corenet" as a basis for a feature set that has
nothing to do with coherency.
>> Also, if these are KVM creations, shouldn't there be a "kvm" in the compatible string
>> somewhere?
>
> There is nothing KVM specific about these platforms. Any hypervisor
> could create a similar virtual machine.
True, but I think we're on a slippery slope, here. Virtualization allows us to
create "virtual platforms" that are not well defined. Linux requires a unique
compatible string for each platform. That's easy when we ship a reference board
that has a unique name and a fixed, well-defined set of features. But with
these virtual platforms, what does the name mean?
I guess my point is back to the name "corenet". That just doesn't mean anything
to me, and I don't think it means much to anyone else, either. That's why I
think that maybe "kvm" should be in the string, to at least indicate that it's a
virtualized environment.
> A guest OS can determine specific info about the hypervisor it is
> running on by looking at the /hypervisor node on the device
> tree.
>
> We could put a generic -hv extension to indicate that this is
> a virtual platform.
>
> "mpc85xx-hv"
> "corenet-32-hv"
> "corenet-64-hv"
That's an improvement, but I wonder if we should just keep doing what we do with
Topaz: take the actual hardware platform and add -hv to it. Of course, that
conflicts with Topaz at the moment.
--
Timur Tabi
Linux kernel developer at Freescale
--
Timur Tabi
Linux kernel developer at Freescale
^ permalink raw reply
* Re: NAND BBT corruption on MPC83xx
From: Matthew L. Creech @ 2011-07-11 15:30 UTC (permalink / raw)
To: Scott Wood; +Cc: linux-mtd, linuxppc-dev, rick22, Mike Hench
In-Reply-To: <CAPBSBxqRywJjCfk_iuDLxn1HbuqPjJRJS=SkRQ3-Ge7SinbSBg@mail.gmail.com>
On Tue, Jul 5, 2011 at 3:58 PM, Matthew L. Creech <mlcreech@gmail.com> wrot=
e:
>
> Separately, I set up 2 test devices to run while I was away last week.
> =A0One of them contained 2 patches:
>
> - Mike Hench's patch which eliminates this block of code in fsl_elbc_nand=
.c
> - Adam Thomson's patch
> (http://lists.infradead.org/pipermail/linux-mtd/2011-June/036427.html)
> which initializes oob_poi correctly
>
> Upon my return, the device with these patches saw no problems at all,
> and had no additional bad blocks. =A0The device without these patches
> had some 200+ blocks which had been newly marked as bad in the BBT
> over the course of 10 days. =A0After rebooting, this latter device then
> failed to boot, as shown here:
>
> http://mcreech.com/work/bbt-ecc-error4.txt
>
> I'm currently running another test to verify which of the two patches
> actually fixed this problem (which might take a few days), but it
> seems like removing that block of code in fsl_elbc_nand.c is a good
> idea.
>
Just an update: my tests confirmed that the patch to fsl_elbc_nand.c
(http://lists.infradead.org/pipermail/linux-mtd/2011-July/036893.html)
seems to have fixed these BBT corruption problems.
I ran a torture test on 2 devices for several days: the one which had
only that patch had no further issues, while the one which didn't have
it (but did have the other oob_poi patch from Adam) experienced BBT
corruption.
Thanks everyone
--=20
Matthew L. Creech
^ permalink raw reply
* Re: Analysing a kernel panic
From: Guillaume Dargaud @ 2011-07-11 14:38 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <1310253395.8267.7.camel@pasglop>
> Ok, I'm not familiar with that PIC. You need to check what's going on
> between the PIC, your interrupt source and the kernel.
>
> Normally, if it's an edge interrupt, it's a single event that gets
> latched by the PIC. The kernel will then call ack() on that PIC driver
> (irq_chip) which should clear that latch -before- getting into your
> device driver for processing.
>
> Also, the interrupt shall either be masked while processing or if it
> re-enters, the PIC code shall try to mask it (lazy masking) until the
> original handler completes at which point it gets unmasked. That shall
> be handled by the standard flow handlers, so it really depends on how
> you hookup your PIC in SW.
This should be all this:
static int xad_driver_probe(struct of_device* dev, const struct of_device_id *match) {
struct device_node *dn = dev->node;
Xad.irq = irq_of_parse_and_map(dn, 0);
rc=request_irq(Xad.irq, XadIsr, IRQF_TRIGGER_RISING | IRQF_DISABLED | IRQF_SHARED /*| IRQF_SAMPLE_RANDOM*/,
"XadIsr", &Xad);
IIRC IRQF_DISABLED is obsolete (I've tried without).
What mystifies me is that:
- my same code on slightly different hardware works perfectly (the differences are not relevant to the driver but to the
user application).
- a simplified standalone code works (so, non-linux).
--
Guillaume Dargaud
http://www.gdargaud.net/
^ permalink raw reply
* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-11 14:36 UTC (permalink / raw)
To: Grant Likely, Tabi Timur-B04825
Cc: linuxppc-dev@ozlabs.org, Gala Kumar-B11780, Alexander Graf,
Wood Scott-B07421
In-Reply-To: <CACxGe6v_xr=1ro+g=CaAqZNd5sF0+V55gtOjVJa1dw=W6WUYMg@mail.gmail.com>
> -----Original Message-----
> From: glikely@secretlab.ca [mailto:glikely@secretlab.ca] On Behalf Of Gra=
nt Likely
> Sent: Friday, July 08, 2011 9:42 PM
> To: Tabi Timur-B04825
> Cc: Yoder Stuart-B08248; Grant Likely; Benjamin Herrenschmidt; Gala Kumar=
-B11780; Wood Scott-
> B07421; Alexander Graf; linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>=20
> On Friday, July 8, 2011, Tabi Timur-B04825 <B04825@freescale.com> wrote:
> > On Fri, Jul 8, 2011 at 1:43 PM, Yoder Stuart-B08248
> > <B08248@freescale.com> wrote:
> >
> >> =A0 "MPC85xxDS" - for a virtual machine for the e500v2 type platforms
> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 and would support 85xx targets, plus P=
2020, P1022,etc
> >>
> >> =A0 "corenet-32-ds" - for a virtual machine similar to the 32-bit P408=
0
> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platforms
> >>
> >> =A0 "corenet-64-ds" - for a virtual machine based on a 64-bit corenet
> >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platform
> >
> > I think we should drop the "DS" because that's a name applied to
> > certain Freescale reference boards.
> >
> > Is being a CoreNet board really something meaningful with respect to
> > KVM? =A0I don't see the connection.
> >
> > Also, if these are KVM creations, shouldn't there be a "kvm" in the
> > compatible string somewhere?
>=20
> I would say so. That would accurately describe the execution environment.
As I mentioned to Timur, there is nothing KVM specific about
the execution environment. The /hypervisor node (as per=20
ePAPR 1.1) describes hypervisor specific info.
Stuart
^ permalink raw reply
* RE: RFC: top level compatibles for virtual platforms
From: Yoder Stuart-B08248 @ 2011-07-11 14:34 UTC (permalink / raw)
To: Tabi Timur-B04825
Cc: Wood Scott-B07421, Alexander Graf, linuxppc-dev@ozlabs.org,
Gala Kumar-B11780
In-Reply-To: <CAOZdJXVp5LHqYEMoqpnKEkuWyuE6kwU+Z9VwUYLPcz-+h_gqfw@mail.gmail.com>
> -----Original Message-----
> From: Tabi Timur-B04825
> Sent: Friday, July 08, 2011 8:39 PM
> To: Yoder Stuart-B08248
> Cc: Grant Likely; Benjamin Herrenschmidt; Gala Kumar-B11780; Wood Scott-B=
07421; Alexander
> Graf; linuxppc-dev@ozlabs.org
> Subject: Re: RFC: top level compatibles for virtual platforms
>=20
> On Fri, Jul 8, 2011 at 1:43 PM, Yoder Stuart-B08248 <B08248@freescale.com=
> wrote:
>=20
> > =A0 "MPC85xxDS" - for a virtual machine for the e500v2 type platforms
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 and would support 85xx targets, plus P2=
020, P1022,etc
> >
> > =A0 "corenet-32-ds" - for a virtual machine similar to the 32-bit P4080
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platforms
> >
> > =A0 "corenet-64-ds" - for a virtual machine based on a 64-bit corenet
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 platform
>=20
> I think we should drop the "DS" because that's a name applied to certain =
Freescale reference
> boards.
>=20
> Is being a CoreNet board really something meaningful with respect to KVM?=
I don't see the
> connection.
We're talking about what would be meaningful to Linux as a guest on
this platform here-- Corenet-based SoCs are similar=20
in various ways, like using msgsnd for IPIs, having external proxy
support, etc.
A corenet platform created by a QEMU/KVM looks similar
to other corenet SoCs. So, I'm trying to find some generic
compatible string that describes this platform.
> Also, if these are KVM creations, shouldn't there be a "kvm" in the compa=
tible string
> somewhere?
There is nothing KVM specific about these platforms. Any hypervisor
could create a similar virtual machine.
A guest OS can determine specific info about the hypervisor it is
running on by looking at the /hypervisor node on the device
tree.
We could put a generic -hv extension to indicate that this is
a virtual platform.
"mpc85xx-hv"
"corenet-32-hv"
"corenet-64-hv"
Stuart
^ permalink raw reply
* [v3 PATCH 1/1] booke/kprobe: make program exception to use one dedicated exception stack
From: Tiejun Chen @ 2011-07-11 11:31 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
When kprobe these operations such as store-and-update-word for SP(r1),
stwu r1, -A(r1)
The program exception is triggered, and PPC always allocate an exception frame
as shown as the follows:
old r1 ----------
...
nip
gpr[2] ~ gpr[31]
gpr[1] <--------- old r1 is stored.
gpr[0]
-------- <--------- pr_regs @offset 16 bytes
padding
STACK_FRAME_REGS_MARKER
LR
back chain
new r1 ----------
Then emulate_step() will emulate this instruction, 'stwu'. Actually its
equivalent to:
1> Update pr_regs->gpr[1] = mem[old r1 + (-A)]
2> stw [old r1], mem[old r1 + (-A)]
Please notice the stack based on new r1 may be covered with mem[old r1
+(-A)] when addr[old r1 + (-A)] < addr[old r1 + sizeof(an exception frame0].
So the above 2# operation will overwirte something to break this exception
frame then unexpected kernel problem will be issued.
So looks we have to implement independed interrupt stack for PPC program
exception when CONFIG_BOOKE is enabled. Here we can use
EXC_LEVEL_EXCEPTION_PROLOG to replace original NORMAL_EXCEPTION_PROLOG
for program exception if CONFIG_BOOKE. Then its always safe for kprobe
with independed exc stack from one pre-allocated and dedicated thread_info.
Actually this is just waht we did for critical/machine check exceptions
on PPC.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
arch/powerpc/include/asm/irq.h | 3 +++
arch/powerpc/include/asm/reg.h | 4 ++++
arch/powerpc/kernel/entry_32.S | 15 +++++++++++++++
arch/powerpc/kernel/head_booke.h | 15 +++++++++++++--
arch/powerpc/kernel/irq.c | 11 +++++++++++
arch/powerpc/kernel/setup_32.c | 4 ++++
6 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/irq.h b/arch/powerpc/include/asm/irq.h
index 1bff591..6d12169 100644
--- a/arch/powerpc/include/asm/irq.h
+++ b/arch/powerpc/include/asm/irq.h
@@ -313,6 +313,9 @@ struct pt_regs;
extern struct thread_info *critirq_ctx[NR_CPUS];
extern struct thread_info *dbgirq_ctx[NR_CPUS];
extern struct thread_info *mcheckirq_ctx[NR_CPUS];
+#if defined(CONFIG_KPROBES) && defined(CONFIG_BOOKE)
+extern struct thread_info *pgirq_ctx[NR_CPUS];
+#endif
extern void exc_lvl_ctx_init(void);
#else
#define exc_lvl_ctx_init()
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index c5cae0d..34d6178 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -885,6 +885,10 @@
#endif
#define SPRN_SPRG_RVCPU SPRN_SPRG1
#define SPRN_SPRG_WVCPU SPRN_SPRG1
+#ifdef CONFIG_KPROBES
+#define SPRN_SPRG_RSCRATCH_PG SPRN_SPRG0
+#define SPRN_SPRG_WSCRATCH_PG SPRN_SPRG0
+#endif
#endif
#ifdef CONFIG_8xx
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 56212bc..a99e209 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -1122,6 +1122,21 @@ ret_from_mcheck_exc:
RESTORE_xSRR(DSRR0,DSRR1);
RESTORE_MMU_REGS;
RET_FROM_EXC_LEVEL(SPRN_MCSRR0, SPRN_MCSRR1, PPC_RFMCI)
+
+ .globl ret_from_prog_exc
+ret_from_prog_exc:
+ mfspr r9,SPRN_SPRG_THREAD
+ lwz r10,SAVED_KSP_LIMIT(r1)
+ stw r10,KSP_LIMIT(r9)
+ lwz r9,THREAD_INFO-THREAD(r9)
+ rlwinm r10,r1,0,0,(31-THREAD_SHIFT)
+ lwz r10,TI_PREEMPT(r10)
+ stw r10,TI_PREEMPT(r9)
+ RESTORE_xSRR(SRR0,SRR1);
+ RESTORE_xSRR(CSRR0,CSRR1);
+ RESTORE_xSRR(DSRR0,DSRR1);
+ RESTORE_MMU_REGS;
+ RET_FROM_EXC_LEVEL(SPRN_SRR0, SPRN_SRR1, rfi)
#endif /* CONFIG_BOOKE */
/*
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index a0bf158..941be40 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -79,6 +79,10 @@
/* only on e500mc/e200 */
#define DBG_STACK_BASE dbgirq_ctx
+#if defined(CONFIG_KPROBES)
+#define PG_STACK_BASE pgirq_ctx
+#endif
+
#define EXC_LVL_FRAME_OVERHEAD (THREAD_SIZE - INT_FRAME_SIZE - EXC_LVL_SIZE)
#ifdef CONFIG_SMP
@@ -158,6 +162,12 @@
EXC_LEVEL_EXCEPTION_PROLOG(DBG, SPRN_DSRR0, SPRN_DSRR1)
#define MCHECK_EXCEPTION_PROLOG \
EXC_LEVEL_EXCEPTION_PROLOG(MC, SPRN_MCSRR0, SPRN_MCSRR1)
+#if defined(CONFIG_KPROBES)
+#define PROGRAM_EXCEPTION_PROLOG \
+ EXC_LEVEL_EXCEPTION_PROLOG(PG, SPRN_SRR0, SPRN_SRR1)
+#else
+#define PROGRAM_EXCEPTION_PROLOG NORMAL_EXCEPTION_PROLOG
+#endif
/*
* Exception vectors.
@@ -370,11 +380,12 @@ label:
#define PROGRAM_EXCEPTION \
START_EXCEPTION(Program) \
- NORMAL_EXCEPTION_PROLOG; \
+ PROGRAM_EXCEPTION_PROLOG; \
mfspr r4,SPRN_ESR; /* Grab the ESR and save it */ \
stw r4,_ESR(r11); \
addi r3,r1,STACK_FRAME_OVERHEAD; \
- EXC_XFER_STD(0x0700, program_check_exception)
+ EXC_XFER_TEMPLATE(program_check_exception, 0x0700, MSR_KERNEL, NOCOPY,\
+ transfer_to_handler_full, ret_from_prog_exc)
#define DECREMENTER_EXCEPTION \
START_EXCEPTION(Decrementer) \
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 5b428e3..ff5b8dd 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -397,6 +397,10 @@ struct thread_info *critirq_ctx[NR_CPUS] __read_mostly;
struct thread_info *dbgirq_ctx[NR_CPUS] __read_mostly;
struct thread_info *mcheckirq_ctx[NR_CPUS] __read_mostly;
+#if defined(CONFIG_KPROBES) && defined(CONFIG_BOOKE)
+struct thread_info *pgirq_ctx[NR_CPUS] __read_mostly;
+#endif
+
void exc_lvl_ctx_init(void)
{
struct thread_info *tp;
@@ -423,6 +427,13 @@ void exc_lvl_ctx_init(void)
tp = mcheckirq_ctx[cpu_nr];
tp->cpu = cpu_nr;
tp->preempt_count = HARDIRQ_OFFSET;
+
+#if defined(CONFIG_KPROBES)
+ memset((void *)pgirq_ctx[i], 0, THREAD_SIZE);
+ tp = pgirq_ctx[i];
+ tp->cpu = i;
+ tp->preempt_count = 0;
+#endif
#endif
}
}
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index 620d792..b872564 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -272,6 +272,10 @@ static void __init exc_lvl_early_init(void)
__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
mcheckirq_ctx[hw_cpu] = (struct thread_info *)
__va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
+#ifdef CONFIG_KPROBES
+ pgirq_ctx[hw_cpu] = (struct thread_info *)
+ __va(memblock_alloc(THREAD_SIZE, THREAD_SIZE));
+#endif
#endif
}
}
--
1.5.6
^ permalink raw reply related
* [v3] booke/kprobe: Fix stack corrupt issue when kprobe 'stwu'
From: Tiejun Chen @ 2011-07-11 11:31 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
In-Reply-To: <1310383915-30543-1-git-send-email-tiejun.chen@windriver.com>
v1 -> v2: when allocate pgirq_ctx, use 'hw_cpu' to identify cpu ID in exc_lvl_early_init().
v2 -> v3: add that specific return-from-program-exc to restore necessary thread info then
we can withdraw the original patch #2.
Tiejun
^ permalink raw reply
* [PATCH v1] kexec-tools: ppc32: Fixup the ThreadPointer for purgatory code.
From: Suzuki K. Poulose @ 2011-07-11 11:29 UTC (permalink / raw)
To: Simon Horman
Cc: Suzuki K. Poulose, kexec, Paul Mackerras, linux ppc dev,
Vivek Goyal
PPC32 ELF ABI expects r2 to be loaded with Thread Pointer, which is 0x7000
bytes past the end of TCB. Though the purgatory is single threaded, it uses
TCB scratch space in vsnprintf(). This patch allocates a 1024byte TCB
and populates the TP with the address accordingly.
Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc: Ryan S. Arnold <rsa@us.ibm.com>
---
kexec/arch/ppc/kexec-elf-ppc.c | 9 +++++++++
kexec/arch/ppc/kexec-uImage-ppc.c | 8 ++++++++
purgatory/arch/ppc/purgatory-ppc.c | 2 +-
purgatory/arch/ppc/v2wrap_32.S | 4 ++++
4 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/kexec/arch/ppc/kexec-elf-ppc.c b/kexec/arch/ppc/kexec-elf-ppc.c
index f4443b4..3a4b59b 100644
--- a/kexec/arch/ppc/kexec-elf-ppc.c
+++ b/kexec/arch/ppc/kexec-elf-ppc.c
@@ -414,6 +414,15 @@ int elf_ppc_load(int argc, char **argv, const char *buf, off_t len,
elf_rel_set_symbol(&info->rhdr, "stack", &addr, sizeof(addr));
#undef PUL_STACK_SIZE
+#define TCB_SIZE 1024
+#define TCB_TP_OFFSET 0x7000 /* PPC32 ELF ABI */
+
+ addr = locate_hole(info, TCB_SIZE, 0, 0, elf_max_addr(&ehdr), 1);
+ addr += TCB_SIZE + TCB_TP_OFFSET;
+ elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+#undef TCB_SIZE
+#undef TCB_TP_OFFSET
+
addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
info->entry = (void *)addr;
#endif
diff --git a/kexec/arch/ppc/kexec-uImage-ppc.c b/kexec/arch/ppc/kexec-uImage-ppc.c
index 1d71374..4c0adf6 100644
--- a/kexec/arch/ppc/kexec-uImage-ppc.c
+++ b/kexec/arch/ppc/kexec-uImage-ppc.c
@@ -228,6 +228,14 @@ static int ppc_load_bare_bits(int argc, char **argv, const char *buf,
/* No allocation past here in order not to overwrite the stack */
#undef PUL_STACK_SIZE
+#define TCB_SIZE 1024
+#define TCB_TP_OFFSET 0x7000
+ addr = locate_hole(info, TCB_SIZE, 0, 0, -1, 1);
+ addr += TCB_TP_OFFSET;
+ elf_rel_set_symbol(&info->rhdr, "my_thread_ptr", &addr, sizeof(addr));
+#undef TCB_TP_OFFSET
+#undef TCB_SIZE
+
addr = elf_rel_get_addr(&info->rhdr, "purgatory_start");
info->entry = (void *)addr;
diff --git a/purgatory/arch/ppc/purgatory-ppc.c b/purgatory/arch/ppc/purgatory-ppc.c
index 349e750..3e6b354 100644
--- a/purgatory/arch/ppc/purgatory-ppc.c
+++ b/purgatory/arch/ppc/purgatory-ppc.c
@@ -26,7 +26,7 @@ unsigned int panic_kernel = 0;
unsigned long backup_start = 0;
unsigned long stack = 0;
unsigned long dt_offset = 0;
-unsigned long my_toc = 0;
+unsigned long my_thread_ptr = 0;
unsigned long kernel = 0;
void setup_arch(void)
diff --git a/purgatory/arch/ppc/v2wrap_32.S b/purgatory/arch/ppc/v2wrap_32.S
index 8442d16..8b60677 100644
--- a/purgatory/arch/ppc/v2wrap_32.S
+++ b/purgatory/arch/ppc/v2wrap_32.S
@@ -56,6 +56,10 @@ master:
mr 17,3 # save cpu id to r17
mr 15,4 # save physical address in reg15
+ lis 6,my_thread_ptr@h
+ ori 6,6,my_thread_ptr@l
+ lwz 2,0(6) # setup ThreadPointer(TP)
+
lis 6,stack@h
ori 6,6,stack@l
lwz 1,0(6) #setup stack
^ permalink raw reply related
* Re: [v2 PATCH 2/2] booke/kprobe: remove unnecessary preempt_enable_no_resched
From: tiejun.chen @ 2011-07-11 11:28 UTC (permalink / raw)
To: ananth; +Cc: tiejun.chen, linuxppc-dev
In-Reply-To: <4E1AB59A.6050002@windriver.com>
tiejun.chen wrote:
> Ananth N Mavinakayanahalli wrote:
>> On Mon, Jul 11, 2011 at 10:39:35AM +0800, Tiejun Chen wrote:
>>> When enable CONFIG_PREEMPT we will trigger the following call trace:
>>>
>>> BUG: scheduling while atomic: swapper/1/0x10000000
>>> ...
>>>
>>> krpobe always goes through the following path:
>>>
>>> program_check_exception()
>>> |
>>> + notify_die(DIE_BPT, "breakpoint",...)
>>> |
>>> + kprobe_handler()
>>> |
>>> + preempt_disable();
>>> + break_handler() <- preempt_enable_no_resched()
>>> + emulate_step()
>>> + preempt_enable_no_resched()
>>> ...
>>> exit
>>>
>>> We should remove unnecessary preempt_enable_no_resched() inside of break_handler()
>>> since looks longjmp_break_handler() always go the above path.
>> The current code is correct. Reasoning follows...
>>
>> setjmp_pre_handler() and longjmp_break_handler() are used only for
>> jprobes. In the case of a jprobe, the code flow would be:
>>
>> bp hit -> kprobe_handler() -> preempt_disable() -> setjmp_pre_handler()
>> (not that since this routine returns 1, we skip sstep here) -> jp->entry()
>> -> jprobe_return() -> bp hit -> kprobe_handler() -> preempt_disable() again
>> -> longjmp_break_handler() -> preempt_enable() -> sstep -> preempt_enable()
>> (for the second kprobe_handler() entry).
>>
>> You could verify this with a preempt_count() printk with a
>> CONFIG_PREEMPT=y kernel.
>>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>> Nack, sorry :-)
>
> You're right.
>
> When use EXC_LEVEL_EXCEPTION_PROLOG for Critical/Machine check, if the exception
> came from kernel mode, we copy thread_info flags, *preempt*, and task pointer
> from the process thread_info. So here I steal EXC_LEVEL_EXCEPTION_PROLOG for
> Program Exception, preempt count would be corrupted incorrectly.
Looks I miss the specific return-from-program-exc to restore those necessary
thread information like we did for debug exception with ret_from_debug_exc when
use EXC_LEVEL_EXCEPTION_PROLOG for debug exception.
Will update this on v3.
Tiejun
>
> Thanks
> Tiejun
>
>> Ananth
>>
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
^ permalink raw reply
* Re: [v2 PATCH 2/2] booke/kprobe: remove unnecessary preempt_enable_no_resched
From: tiejun.chen @ 2011-07-11 8:34 UTC (permalink / raw)
To: ananth; +Cc: linuxppc-dev
In-Reply-To: <20110711055516.GA4659@in.ibm.com>
Ananth N Mavinakayanahalli wrote:
> On Mon, Jul 11, 2011 at 10:39:35AM +0800, Tiejun Chen wrote:
>> When enable CONFIG_PREEMPT we will trigger the following call trace:
>>
>> BUG: scheduling while atomic: swapper/1/0x10000000
>> ...
>>
>> krpobe always goes through the following path:
>>
>> program_check_exception()
>> |
>> + notify_die(DIE_BPT, "breakpoint",...)
>> |
>> + kprobe_handler()
>> |
>> + preempt_disable();
>> + break_handler() <- preempt_enable_no_resched()
>> + emulate_step()
>> + preempt_enable_no_resched()
>> ...
>> exit
>>
>> We should remove unnecessary preempt_enable_no_resched() inside of break_handler()
>> since looks longjmp_break_handler() always go the above path.
>
> The current code is correct. Reasoning follows...
>
> setjmp_pre_handler() and longjmp_break_handler() are used only for
> jprobes. In the case of a jprobe, the code flow would be:
>
> bp hit -> kprobe_handler() -> preempt_disable() -> setjmp_pre_handler()
> (not that since this routine returns 1, we skip sstep here) -> jp->entry()
> -> jprobe_return() -> bp hit -> kprobe_handler() -> preempt_disable() again
> -> longjmp_break_handler() -> preempt_enable() -> sstep -> preempt_enable()
> (for the second kprobe_handler() entry).
>
> You could verify this with a preempt_count() printk with a
> CONFIG_PREEMPT=y kernel.
>
>> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
>
> Nack, sorry :-)
You're right.
When use EXC_LEVEL_EXCEPTION_PROLOG for Critical/Machine check, if the exception
came from kernel mode, we copy thread_info flags, *preempt*, and task pointer
from the process thread_info. So here I steal EXC_LEVEL_EXCEPTION_PROLOG for
Program Exception, preempt count would be corrupted incorrectly.
Thanks
Tiejun
>
> Ananth
>
^ permalink raw reply
* Re: [RFC PATCH 1/1] BPF JIT for PPC64
From: Matt Evans @ 2011-07-11 7:04 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, linuxppc-dev
In-Reply-To: <1308988180.2532.27.camel@edumazet-laptop>
Hi Eric,
On 25/06/11 17:49, Eric Dumazet wrote:
> Le samedi 25 juin 2011 à 09:33 +0200, Andreas Schwab a écrit :
>> Matt Evans <matt@ozlabs.org> writes:
>>
>>> + stdu r1, -128(r1); \
>>
>>> + addi r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8); \
>>
>>> + addi r1, r1, 128; \
>>
>>> + PPC_STD(r_M + i, 1, -128 + (8*i));
>>
>>> + PPC_LD(r_M + i, 1, -128 + (8*i));
>>
>> s/128/BPF_PPC_STACK_SAVE/?
>>
>
> I am not sure using registers to hold MEM[] is a win if MEM[idx] is used
> once in the filter
>
> # tcpdump "tcp[20]+tcp[21]=0" -d
> (000) ldh [12]
> (001) jeq #0x800 jt 2 jf 15
> (002) ldb [23]
> (003) jeq #0x6 jt 4 jf 15
> (004) ldh [20]
> (005) jset #0x1fff jt 15 jf 6
> (006) ldxb 4*([14]&0xf)
> (007) ldb [x + 34]
> (008) st M[1]
> (009) ldb [x + 35]
> (010) tax
> (011) ld M[1]
> (012) add x
> (013) jeq #0x0 jt 14 jf 15
> (014) ret #65535
> (015) ret #0
>
> In this sample, we use M[1] once ( one store, one load)
>
> So saving previous register content on stack in prologue, and restoring
> it in epilogue actually slow down the code, and adds two instructions in
> filter asm code.
The x86 version generates all accesses of M[x] as a load or store to the
stackframe, so your example would result in one store + one load to/from the
stack. More than one access would result in N stores/loads. By having M[] live
in r16-31 on PPC, there is a one-off cost of saving/restoring the (non-volatile)
register to the stack plus a per-use cost of a register-register move (MR, which
is pretty cheap compared to loads/stores!).
You are correct in that, for a single store/load of M[1] above, I'll generate
a STD, MR, MR, LD, but this extra cost of the two MRs is pretty small. With the
current implementation the gains seen when accessing M[x] /more/ than once will
IMHO more than justify this. (For several M[x] accesses, x86 would have several
mem ops, PPC would have several reg-reg moves and one load+store.)
An obvious alternative would be to use some of the three free volatile registers
first (in the hope that "most" filters won't use >3 M[] locations), with a
simple register allocator. This would save the (non-volatile reg) spill/fill to
stack, too. In the interest of simplicity I didn't want to do that on a first
pass.
> This also makes epilogue code not easy (not possible as a matter of fact)
> to unwind in helper function
>
> In x86_64 implementation, I chose bpf_error be able to force
> an exception, not returning to JIT code but directly to bpf_func() caller
>
> bpf_error:
> # force a return 0 from jit handler
> xor %eax,%eax
> mov -8(%rbp),%rbx
> leaveq
> ret
Yep, if I use non-volatile regs a return isn't just a simple stack "pop".
Currently, I've an extra branch in the return path to hit the common epilogue.
This could be optimised such that the out of line error path jumps directly to
the common epilogue to return (rather than back to the callsite, checking a flag
and /then/ to the epilogue) to speed up the non-error case. However, it's just
a question of getting to the (existing) epilogue code to clean up; it doesn't
need to be unwound in the helper function. I don't think this issue is a
strong argument against having M[] exist in registers, though.
>From the current position, I think going in the direction of using volatile regs
(without backup/restore cost) is better than going in the direction of making
all M[] references stack accesses. Do you think it's bearable to continue as it
is and then perform that optimisation later?
Also, thanks for reading/commenting on the patch!
Cheers,
Matt
^ permalink raw reply
* Re: [RFC PATCH 1/1] BPF JIT for PPC64
From: Matt Evans @ 2011-07-11 6:27 UTC (permalink / raw)
To: Andreas Schwab; +Cc: netdev, linuxppc-dev
In-Reply-To: <m2boxmxt45.fsf@igel.home>
On 25/06/11 17:33, Andreas Schwab wrote:
> Matt Evans <matt@ozlabs.org> writes:
>
>> + stdu r1, -128(r1); \
>
>> + addi r5, r1, 128+BPF_PPC_STACK_BASIC+(2*8); \
>
>> + addi r1, r1, 128; \
>
>> + PPC_STD(r_M + i, 1, -128 + (8*i));
>
>> + PPC_LD(r_M + i, 1, -128 + (8*i));
>
> s/128/BPF_PPC_STACK_SAVE/?
Actually, that's a different 128, but that nicely illustrates that I should've
#defined something more recognisable :-) The second set, with -128, is actually
in the save area for non-volatile regs, whereas the first is just a stackframe
size.
Cheers,
Matt
^ permalink raw reply
* Re: [RFC PATCH 1/1] BPF JIT for PPC64
From: Matt Evans @ 2011-07-11 6:21 UTC (permalink / raw)
To: Ben Hutchings; +Cc: netdev, linuxppc-dev
In-Reply-To: <1308967114.3093.1379.camel@localhost>
On 25/06/11 11:58, Ben Hutchings wrote:
> On Fri, 2011-06-24 at 16:02 +1000, Matt Evans wrote:
> [...]
>> + case BPF_S_ALU_ADD_K: /* A += K; */
>> + if (!K)
>> + break;
>> + if (K < 32768)
>> + PPC_ADDI(r_A, r_A, K);
>> + else
>> + PPC_ADDI(r_A, r_A, IMM_L(K));
>> + PPC_ADDIS(r_A, r_A, IMM_HA(K));
>> + break;
>
> Missing braces.
>
>> + case BPF_S_ALU_SUB_X: /* A -= X; */
>> + ctx->seen |= SEEN_XREG;
>> + PPC_SUB(r_A, r_A, r_X);
>> + break;
>> + case BPF_S_ALU_SUB_K: /* A -= K */
>> + if (!K)
>> + break;
>> + if (K < 32768)
>> + PPC_ADDI(r_A, r_A, -K);
>> + else
>> + PPC_ADDI(r_A, r_A, IMM_L(-K));
>> + PPC_ADDIS(r_A, r_A, IMM_HA(-K));
>> + break;
> [...]
>
> Here as well.
Thanks, Ben -- oops! :) Really, just the ADDISes need to be conditional, too.
Cheers,
Matt
^ permalink raw reply
* Re: [v2 PATCH 2/2] booke/kprobe: remove unnecessary preempt_enable_no_resched
From: Ananth N Mavinakayanahalli @ 2011-07-11 5:55 UTC (permalink / raw)
To: Tiejun Chen; +Cc: linuxppc-dev
In-Reply-To: <1310351976-24078-2-git-send-email-tiejun.chen@windriver.com>
On Mon, Jul 11, 2011 at 10:39:35AM +0800, Tiejun Chen wrote:
> When enable CONFIG_PREEMPT we will trigger the following call trace:
>
> BUG: scheduling while atomic: swapper/1/0x10000000
> ...
>
> krpobe always goes through the following path:
>
> program_check_exception()
> |
> + notify_die(DIE_BPT, "breakpoint",...)
> |
> + kprobe_handler()
> |
> + preempt_disable();
> + break_handler() <- preempt_enable_no_resched()
> + emulate_step()
> + preempt_enable_no_resched()
> ...
> exit
>
> We should remove unnecessary preempt_enable_no_resched() inside of break_handler()
> since looks longjmp_break_handler() always go the above path.
The current code is correct. Reasoning follows...
setjmp_pre_handler() and longjmp_break_handler() are used only for
jprobes. In the case of a jprobe, the code flow would be:
bp hit -> kprobe_handler() -> preempt_disable() -> setjmp_pre_handler()
(not that since this routine returns 1, we skip sstep here) -> jp->entry()
-> jprobe_return() -> bp hit -> kprobe_handler() -> preempt_disable() again
-> longjmp_break_handler() -> preempt_enable() -> sstep -> preempt_enable()
(for the second kprobe_handler() entry).
You could verify this with a preempt_count() printk with a
CONFIG_PREEMPT=y kernel.
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Nack, sorry :-)
Ananth
^ permalink raw reply
* [PATCH] powerpc: add denormalisation exception handling for POWER6/7
From: Michael Neuling @ 2011-07-11 5:52 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev, Paul Mackerras, miltonm
On POWER6 and POWER7 if the input operand to an instruction is a
denormalised single precision binary floating we can take a
denormalisation exception where it's expected that the hypervisor (HV=1)
will fix up the inputs before the instruction is run.
This adds code to handle this denormalisation exception for POWER6 and
POWER7.
It also add a CONFIG_PPC_DENORMALISATION option and sets it in
pseries/ppc64_defconfig.
This is useful on bare metal systems only. Based on patch from Milton
Miller.
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
arch/powerpc/Kconfig | 7 +
arch/powerpc/configs/ppc64_defconfig | 1
arch/powerpc/configs/pseries_defconfig | 1
arch/powerpc/include/asm/ppc-opcode.h | 2
arch/powerpc/include/asm/reg.h | 1
arch/powerpc/kernel/exceptions-64s.S | 125 +++++++++++++++++++++++++++++++++
6 files changed, 137 insertions(+)
Index: linux-ozlabs/arch/powerpc/Kconfig
===================================================================
--- linux-ozlabs.orig/arch/powerpc/Kconfig
+++ linux-ozlabs/arch/powerpc/Kconfig
@@ -556,6 +556,13 @@ config SCHED_SMT
when dealing with POWER5 cpus at a cost of slightly increased
overhead in some places. If unsure say N here.
+config PPC_DENORMALISATION
+ bool "PowerPC denormalisation exception handling"
+ default "n"
+ ---help---
+ Add support for handling denormalisation of single precision
+ values. Useful for bare metal only. If unsure say Y here.
+
config CMDLINE_BOOL
bool "Default bootloader kernel arguments"
Index: linux-ozlabs/arch/powerpc/configs/ppc64_defconfig
===================================================================
--- linux-ozlabs.orig/arch/powerpc/configs/ppc64_defconfig
+++ linux-ozlabs/arch/powerpc/configs/ppc64_defconfig
@@ -54,6 +54,7 @@ CONFIG_KEXEC=y
CONFIG_IRQ_ALL_CPUS=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_SCHED_SMT=y
+CONFIG_PPC_DENORMALISATION=y
CONFIG_PCCARD=y
CONFIG_ELECTRA_CF=y
CONFIG_HOTPLUG_PCI=m
Index: linux-ozlabs/arch/powerpc/configs/pseries_defconfig
===================================================================
--- linux-ozlabs.orig/arch/powerpc/configs/pseries_defconfig
+++ linux-ozlabs/arch/powerpc/configs/pseries_defconfig
@@ -47,6 +47,7 @@ CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PPC_64K_PAGES=y
CONFIG_PPC_SUBPAGE_PROT=y
CONFIG_SCHED_SMT=y
+CONFIG_PPC_DENORMALISATION=y
CONFIG_HOTPLUG_PCI=m
CONFIG_HOTPLUG_PCI_RPA=m
CONFIG_HOTPLUG_PCI_RPA_DLPAR=m
Index: linux-ozlabs/arch/powerpc/include/asm/ppc-opcode.h
===================================================================
--- linux-ozlabs.orig/arch/powerpc/include/asm/ppc-opcode.h
+++ linux-ozlabs/arch/powerpc/include/asm/ppc-opcode.h
@@ -157,6 +157,8 @@
VSX_XX1((s), (a), (b)))
#define XXLOR(t, a, b) stringify_in_c(.long PPC_INST_XXLOR | \
VSX_XX3((t), (a), (b)))
+#define XVCPSGNDP(t, a, b) stringify_in_c(.long (0xf0000780 | \
+ VSX_XX3((t), (a), (b))))
#define PPC_NAP stringify_in_c(.long PPC_INST_NAP)
#define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP)
Index: linux-ozlabs/arch/powerpc/include/asm/reg.h
===================================================================
--- linux-ozlabs.orig/arch/powerpc/include/asm/reg.h
+++ linux-ozlabs/arch/powerpc/include/asm/reg.h
@@ -496,6 +496,7 @@
#define SPRN_HSRR0 0x13A /* Save/Restore Register 0 */
#define SPRN_HSRR1 0x13B /* Save/Restore Register 1 */
+#define HSRR1_DENORM 0x00100000 /* Denorm exception */
#define SPRN_TBCTL 0x35f /* PA6T Timebase control register */
#define TBCTL_FREEZE 0x0000000000000000ull /* Freeze all tbs */
Index: linux-ozlabs/arch/powerpc/kernel/exceptions-64s.S
===================================================================
--- linux-ozlabs.orig/arch/powerpc/kernel/exceptions-64s.S
+++ linux-ozlabs/arch/powerpc/kernel/exceptions-64s.S
@@ -264,6 +264,31 @@ vsx_unavailable_pSeries_1:
STD_EXCEPTION_HV(0x1200, 0x1202, cbe_system_error)
#endif /* CONFIG_CBE_RAS */
STD_EXCEPTION_PSERIES(0x1300, 0x1300, instruction_breakpoint)
+
+#ifdef CONFIG_PPC_DENORMALISATION
+ . = 0x1500
+ .global denorm_Hypervisor
+denorm_Hypervisor:
+ HMT_MEDIUM
+ mtspr SPRN_SPRG_HSCRATCH0,r13
+ mfspr r13,SPRN_SPRG_HPACA
+ std r9,PACA_EXGEN+EX_R9(r13)
+ std r10,PACA_EXGEN+EX_R10(r13)
+ std r11,PACA_EXGEN+EX_R11(r13)
+ std r12,PACA_EXGEN+EX_R12(r13)
+ mfspr r9,SPRN_SPRG_HSCRATCH0
+ std r9,PACA_EXGEN+EX_R13(r13)
+ mfcr r9
+
+ mfspr r10,SPRN_HSRR1
+ mfspr r11,SPRN_HSRR0 /* save HSRR0 */
+ andis. r10,r10,(HSRR1_DENORM)@h /* denorm? */
+ addi r11,r11,-4 /* HSRR0 is next instruction */
+ bne+ denorm_assist
+
+ EXCEPTION_PROLOG_PSERIES_1(denorm_common, EXC_HV)
+#endif
+
#ifdef CONFIG_CBE_RAS
STD_EXCEPTION_HV(0x1600, 0x1602, cbe_maintenance)
#endif /* CONFIG_CBE_RAS */
@@ -317,6 +342,103 @@ masked_Hinterrupt:
hrfid
b .
+#ifdef CONFIG_PPC_DENORMALISATION
+denorm_assist:
+BEGIN_FTR_SECTION
+/*
+ * To denormalise we need to move a copy of the register to itself.
+ * For POWER6 do that here for all FP regs.
+ */
+ mfmsr r10
+ ori r10,r10,(MSR_FP|MSR_FE0|MSR_FE1)
+ xori r10,r10,(MSR_FE0|MSR_FE1)
+ mtmsrd r10
+ sync
+ fmr 0,0
+ fmr 1,1
+ fmr 2,2
+ fmr 3,3
+ fmr 4,4
+ fmr 5,5
+ fmr 6,6
+ fmr 7,7
+ fmr 8,8
+ fmr 9,9
+ fmr 10,10
+ fmr 11,11
+ fmr 12,12
+ fmr 13,13
+ fmr 14,14
+ fmr 15,15
+ fmr 16,16
+ fmr 17,17
+ fmr 18,18
+ fmr 19,19
+ fmr 20,20
+ fmr 21,21
+ fmr 22,22
+ fmr 23,23
+ fmr 24,24
+ fmr 25,25
+ fmr 26,26
+ fmr 27,27
+ fmr 28,28
+ fmr 29,29
+ fmr 30,30
+ fmr 31,31
+FTR_SECTION_ELSE
+/*
+ * To denormalise we need to move a copy of the register to itself.
+ * For POWER7 do that here for the first 32 VSX registers only.
+ */
+ mfmsr r10
+ oris r10,r10,MSR_VSX@h
+ mtmsrd r10
+ sync
+ XVCPSGNDP(0,0,0)
+ XVCPSGNDP(1,1,1)
+ XVCPSGNDP(2,2,2)
+ XVCPSGNDP(3,3,3)
+ XVCPSGNDP(4,4,4)
+ XVCPSGNDP(5,5,5)
+ XVCPSGNDP(6,6,6)
+ XVCPSGNDP(7,7,7)
+ XVCPSGNDP(8,8,8)
+ XVCPSGNDP(9,9,9)
+ XVCPSGNDP(10,10,10)
+ XVCPSGNDP(11,11,11)
+ XVCPSGNDP(12,12,12)
+ XVCPSGNDP(13,13,13)
+ XVCPSGNDP(14,14,14)
+ XVCPSGNDP(15,15,15)
+ XVCPSGNDP(16,16,16)
+ XVCPSGNDP(17,17,17)
+ XVCPSGNDP(18,18,18)
+ XVCPSGNDP(19,19,19)
+ XVCPSGNDP(20,20,20)
+ XVCPSGNDP(21,21,21)
+ XVCPSGNDP(22,22,22)
+ XVCPSGNDP(23,23,23)
+ XVCPSGNDP(24,24,24)
+ XVCPSGNDP(25,25,25)
+ XVCPSGNDP(26,26,26)
+ XVCPSGNDP(27,27,27)
+ XVCPSGNDP(28,28,28)
+ XVCPSGNDP(29,29,29)
+ XVCPSGNDP(30,30,30)
+ XVCPSGNDP(31,31,31)
+ALT_FTR_SECTION_END_IFCLR(CPU_FTR_HVMODE_206)
+ mtspr SPRN_HSRR0,r11
+ mtcrf 0x80,r9
+ ld r9,PACA_EXGEN+EX_R9(r13)
+ ld r10,PACA_EXGEN+EX_R10(r13)
+ ld r11,PACA_EXGEN+EX_R11(r13)
+ ld r12,PACA_EXGEN+EX_R12(r13)
+ ld r13,PACA_EXGEN+EX_R13(r13)
+ HRFID
+ b .
+#endif
+
.align 7
do_stab_bolted_pSeries:
std r11,PACA_EXSLB+EX_R11(r13)
@@ -419,6 +541,9 @@ machine_check_common:
STD_EXCEPTION_COMMON(0xe60, hmi_exception, .unknown_exception)
STD_EXCEPTION_COMMON_IDLE(0xf00, performance_monitor, .performance_monitor_exception)
STD_EXCEPTION_COMMON(0x1300, instruction_breakpoint, .instruction_breakpoint_exception)
+#ifdef CONFIG_PPC_DENORMALISATION
+ STD_EXCEPTION_COMMON(0x1502, denorm, .unknown_exception)
+#endif
#ifdef CONFIG_ALTIVEC
STD_EXCEPTION_COMMON(0x1700, altivec_assist, .altivec_assist_exception)
#else
^ permalink raw reply
* v2 booke/kprobe: Fix stack corrupt issue when kprobe 'stwu'
From: Tiejun Chen @ 2011-07-11 2:39 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
In-Reply-To: <1310351976-24078-2-git-send-email-tiejun.chen@windriver.com>
v1 -> v2: when allocate pgirq_ctx, use 'hw_cpu' to identify cpu ID in exc_lvl_early_init().
BTW, I already validated these patches on fsl mpc8536 by kprobe do_fork() and show_interrupts().
Note this bug I fixed is from another email thread, "[BUG?]3.0-rc4+ftrace+kprobe:
set kprobe at instruction 'stwu' lead to system crash/freeze", Yong Zhang, <yong.zhang0@gmail.com>
previously reported.
Tiejun
^ permalink raw reply
* [v2 PATCH 2/2] booke/kprobe: remove unnecessary preempt_enable_no_resched
From: Tiejun Chen @ 2011-07-11 2:39 UTC (permalink / raw)
To: benh; +Cc: linuxppc-dev
In-Reply-To: <1310351976-24078-1-git-send-email-tiejun.chen@windriver.com>
When enable CONFIG_PREEMPT we will trigger the following call trace:
BUG: scheduling while atomic: swapper/1/0x10000000
...
krpobe always goes through the following path:
program_check_exception()
|
+ notify_die(DIE_BPT, "breakpoint",...)
|
+ kprobe_handler()
|
+ preempt_disable();
+ break_handler() <- preempt_enable_no_resched()
+ emulate_step()
+ preempt_enable_no_resched()
...
exit
We should remove unnecessary preempt_enable_no_resched() inside of break_handler()
since looks longjmp_break_handler() always go the above path.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
---
arch/powerpc/kernel/kprobes.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index bc47352..a8a2a4d 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -552,7 +552,6 @@ int __kprobes longjmp_break_handler(struct kprobe *p, struct pt_regs *regs)
* saved regs...
*/
memcpy(regs, &kcb->jprobe_saved_regs, sizeof(struct pt_regs));
- preempt_enable_no_resched();
return 1;
}
--
1.5.6
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox