LinuxPPC-Dev Archive on lore.kernel.org


* Re: [PATCH v5 1/5] powerpc/85xx: implement hardware timebase sync
From: Zhao Chenhui @ 2012-06-07  4:07 UTC (permalink / raw)
  To: Scott Wood; +Cc: Matthew McClintock, linuxppc-dev, linux-kernel
In-Reply-To: <4FCFA0C8.9090800@freescale.com>

On Wed, Jun 06, 2012 at 01:26:16PM -0500, Scott Wood wrote:
> On 06/06/2012 04:31 AM, Zhao Chenhui wrote:
> > On Tue, Jun 05, 2012 at 11:07:41AM -0500, Scott Wood wrote:
> >> On 06/05/2012 04:08 AM, Zhao Chenhui wrote:
> >>> On Fri, Jun 01, 2012 at 10:40:00AM -0500, Scott Wood wrote:
> >>>> I know you say this is for dual-core chips only, but it would be nice if
> >>>> you'd write this in a way that doesn't assume that (even if the
> >>>> corenet-specific timebase freezing comes later).
> >>>
> >>> At this point, I have not thought about how to implement the corenet-specific timebase freezing.
> >>
> >> I wasn't asking you to.  I was asking you to not have logic that breaks
> >> with more than 2 CPUs.
> > 
> > These routines are only called in the dual-core case.
> 
> Come on, you know we have chips with more than two cores.  Why design
> such a limitation into it, just because you're not personally interested
> in supporting anything but e500v2?
> 
> Is it so hard to make it work for an arbitrary number of cores?
> 
> >>> If we do not set them, it may make KEXEC fail on other platforms.
> >>
> >> What platforms?
> > 
> > Such as P4080, P3041, etc.
> 
> So we need to wait for corenet timebase sync before we stop causing
> problems in virtualization, simulators, etc. if a kernel has kexec or
> cpu hotplug enabled (whether used or not)?
> 
> Can you at least make sure we're actually in a kexec/hotplug scenario at
> runtime?
> 
> Or just implement corenet timebase sync -- it's not that different.
> 
> -Scott

We are also working on the corenet timebase sync. Our plan is to handle the
dual-core case first, then the case of more than 2 cores. We will submit the
corenet timebase sync patch soon.

-Chenhui


* Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user
From: Paul Mackerras @ 2012-06-07  3:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: mikey, michael, Anton Blanchard, olof, linuxppc-dev
In-Reply-To: <178E3BC0-C6E2-4E33-BA66-8144F192A151@kernel.crashing.org>

On Wed, Jun 06, 2012 at 06:40:54PM +0200, Segher Boessenkool wrote:
> >+err1;	dcbz	r0,r3
> 
> There is no such instruction, you probably meant "dcbz 0,r3"?

There certainly is such an instruction, though it doesn't do exactly
what a naive reader might expect.  Using 0 rather than r0 or %r0
improves readability but makes no difference to the assembler or the
cpu.

Paul.
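
Note for readers following the dcbz discussion: in instructions of the
form "dcbz RA,RB", a 0 in the RA field selects the constant zero rather
than the contents of GPR0, which is why "dcbz r0,r3" and "dcbz 0,r3" end
up as the same instruction when r0 is defined as 0. The following is only
a minimal C model of that address rule; the function name and the register
array are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the (RA|0) operand rule: when the RA field is 0,
 * the base is the constant 0, not the contents of GPR0. */
static uint64_t ea_ra_or_zero(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
	assert(ra < 32 && rb < 32);
	uint64_t base = (ra == 0) ? 0 : gpr[ra];

	return base + gpr[rb];
}
```

With GPR0 holding an arbitrary value and GPR3 holding 0x1000, the model
returns 0x1000 for RA = 0, so whatever sits in GPR0 is ignored.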


* [PATCH] mpc85xx_edac: fix error: too few arguments to function 'edac_mc_alloc'
From: Kim Phillips @ 2012-06-07  0:49 UTC (permalink / raw)
  To: linux-edac; +Cc: linuxppc-dev, Mauro Carvalho Chehab

Commit ca0907b "edac: Remove the legacy EDAC ABI" broke mpc85xx_edac
in the following manner:

mpc85xx_edac.c:983:35: error: too few arguments to function 'edac_mc_alloc'

This patch puts back the missing 'layers' argument.

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
---
 drivers/edac/mpc85xx_edac.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/mpc85xx_edac.c b/drivers/edac/mpc85xx_edac.c
index 4c40235..0e37462 100644
--- a/drivers/edac/mpc85xx_edac.c
+++ b/drivers/edac/mpc85xx_edac.c
@@ -980,7 +980,8 @@ static int __devinit mpc85xx_mc_err_probe(struct platform_device *op)
 	layers[1].type = EDAC_MC_LAYER_CHANNEL;
 	layers[1].size = 1;
 	layers[1].is_virt_csrow = false;
-	mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), sizeof(*pdata));
+	mci = edac_mc_alloc(edac_mc_idx, ARRAY_SIZE(layers), layers,
+			    sizeof(*pdata));
 	if (!mci) {
 		devres_release_group(&op->dev, mpc85xx_mc_err_probe);
 		return -ENOMEM;
-- 
1.7.10.2


* Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs
From: Scott Wood @ 2012-06-07  0:37 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Ben Collins
In-Reply-To: <1339021310.7150.181.camel@pasglop>

On 06/06/2012 05:21 PM, Benjamin Herrenschmidt wrote:
> Now that means that we can end up with funky arithmetic in a couple of
> cases:
> 
>  - If the bus address of the IO space is larger than the virtual address
> returned by ioremap (it's a bit silly to use large IO addresses but it's
> technically possible, normally IO windows start at 0 bus-side though).
> In fact I wouldn't be surprised if we have various other bugs if IO
> windows don't start at 0 (you may want to double check your dts setup
> here).

The dts does show the I/O beginning at bus address zero:

                 ranges = <0x2000000 0x0 0xc0000000 0xc0000000 0x0 0x20000000
                           0x1000000 0x0 0x0 0xe1000000 0x0 0x10000>;

>  - If the ioremap'ed address of the IO space of another domain is lower
> than the ioremap'ed address of the first domain, in which case the
> calculation:
> 
> 	hose->io_base_virt - _IO_BASE
> 
> results in a negative offset.

There should have been only one PCI domain in the QEMU case.

-Scott


* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
From: Alex Shi @ 2012-06-07  0:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-mips, linux-ia64, linux-sh, dhowells, paulus, hpa,
	sparclinux, mingo, sivanich, x86, greg.pearson, chris.mason,
	arjan.van.de.ven, mattst88, pjt, fenghua.yu, seto.hidetoshi,
	cmetcalf, ak, ink, anton, tglx, kamezawa.hiroyu, rth, tony.luck,
	torvalds, linux-kernel, ralf, lethal, linux-alpha, bob.picco,
	akpm, linuxppc-dev, davem
In-Reply-To: <1338973295.2749.81.camel@twins>

On 06/06/2012 05:01 PM, Peter Zijlstra wrote:

> On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
>> -       if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
>> +       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 
> 
> I actually considered this.. I just felt a little uneasy re-purposing
> the RECLAIM_DISTANCE for this, but I guess it's all the same anyway. Both
> mean expensive-away-distance.
> 


I understand. The BIOS vendors are not well aligned with us on this.

> So I've taken this.
> 
> thanks!


* Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user
From: Olof Johansson @ 2012-06-07  0:30 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: mikey, michael, paulus, linuxppc-dev
In-Reply-To: <20120605120222.6722a3e3@kryten>

On Mon, Jun 4, 2012 at 7:02 PM, Anton Blanchard <anton@samba.org> wrote:
>
> I blame Mikey for this. He elevated my slightly dubious testcase:
>
> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>
> to benchmark status. And naturally we need to be number 1 at creating
> zeros. So let's improve __clear_user some more.
>
> As Paul suggests we can use dcbz for large lengths. This patch gets
> the destination cacheline aligned then uses dcbz on whole cachelines.
>
> Before:
> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>
> After:
> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>
> 39 GB/s, a new record.
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Besides the comments from Segher, feel free to add:

Tested-by: Olof Johansson <olof@lixom.net>
Acked-by: Olof Johansson <olof@lixom.net>

Didn't help performance all that much on pa6t, but it didn't go down.
Too low on cycles to actually analyze why at this time.

-Olof
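
Note for readers: the approach described in the quoted patch text (align
the destination to a cacheline, then clear whole cachelines) can be
sketched in plain C as below. This is not the assembly from the patch.
memset() stands in for dcbz, the 128-byte line size is an assumption, and
all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128u	/* assumed cacheline size; must be a power of two */

/* Stand-in for dcbz: zero one whole, aligned cacheline. */
static void zero_line(unsigned char *line)
{
	assert(((uintptr_t)line & (LINE_SIZE - 1)) == 0);
	memset(line, 0, LINE_SIZE);
}

/* Zero n bytes at p: byte stores up to the first line boundary,
 * whole lines in the middle, byte stores for the tail. */
static void clear_buf(unsigned char *p, size_t n)
{
	while (n && ((uintptr_t)p & (LINE_SIZE - 1))) {
		*p++ = 0;
		n--;
	}
	while (n >= LINE_SIZE) {
		zero_line(p);
		p += LINE_SIZE;
		n -= LINE_SIZE;
	}
	while (n) {
		*p++ = 0;
		n--;
	}
}
```

The head and tail loops matter because a whole-line clear must not touch
bytes outside the requested range.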


* Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs
From: Ben Collins @ 2012-06-06 23:35 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <4FCFC870.40004@freescale.com>


On Jun 6, 2012, at 5:15 PM, Scott Wood wrote:

> On 06/05/2012 10:50 PM, Ben Collins wrote:
>> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
>> 64-bit sign extension, which is the case on ppc32 with 64-bit resource
>> addresses. This only seems to have shown up while running under QEMU for
>> e500mc target. It may or may not be suboptimal that QEMU has an IO base
>> address > 32-bits for the e500-pci implementation, but 1) it's still a
>> regression and 2) it's more correct to handle things this way.
>
> Where do you see addresses over 32 bits in QEMU's e500-pci, at least
> with current mainline QEMU and the mpc8544ds model?
>
> I/O space should be at 0xe1000000.

The problem is this:

pci_bus 0000:00: root bus resource [io  0xffbed000-0xffbfcfff] (bus address [0x100000000-0x10000ffff])

Without the fix that I sent, it ends up looking like:

pci_bus 0000:00: root bus resource [io  0xffbed000-0xffbfcfff] (bus address [0x0000-0xffff])

And that's when some devices fail to be assigned valid bar 0's and the
kernel complains because of it.

> I'm also not sure what this has to do with the virtual address returned
> by ioremap().
>
>> Signed-off-by: Ben Collins <bcollins@ubuntu.com>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> ---
>> arch/powerpc/kernel/pci-common.c |    8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
>> index 8e78e93..be9ced7 100644
>> --- a/arch/powerpc/kernel/pci-common.c
>> +++ b/arch/powerpc/kernel/pci-common.c
>> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>> 	return pci_enable_resources(dev, mask);
>> }
>>
>> +/* Before assuming too much here, take care to realize that we need sign
>> + * extension from 32-bit pointers to 64-bit resource addresses to work.
>> + */
>> resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
>> {
>> -	return (unsigned long) hose->io_base_virt - _IO_BASE;
>> +	long vbase = (long)hose->io_base_virt;
>> +	long io_base = _IO_BASE;
>> +
>> +	return (resource_size_t)(vbase - io_base);
>
> Why do we want sign extension here?
>
> If we do want it, there are a lot of other places in this file where the
> same calculation is done.
>
> -Scott
>

--
Ben Collins
Servergy, Inc.
(757) 243-7557



* Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs
From: Benjamin Herrenschmidt @ 2012-06-06 22:21 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, Ben Collins
In-Reply-To: <4FCFC870.40004@freescale.com>

On Wed, 2012-06-06 at 16:15 -0500, Scott Wood wrote:
> On 06/05/2012 10:50 PM, Ben Collins wrote:
> > The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
> > 64-bit sign extension, which is the case on ppc32 with 64-bit resource
> > addresses. This only seems to have shown up while running under QEMU for
> > e500mc target. It may or may not be suboptimal that QEMU has an IO base
> > address > 32-bits for the e500-pci implementation, but 1) it's still a
> > regression and 2) it's more correct to handle things this way.
> 
> Where do you see addresses over 32 bits in QEMU's e500-pci, at least
> with current mainline QEMU and the mpc8544ds model?
> 
> I/O space should be at 0xe1000000.
> 
> I'm also not sure what this has to do with the virtual address returned
> by ioremap().

This is due to how we calculate IO offsets on ppc32, see below

> > Signed-off-by: Ben Collins <bcollins@ubuntu.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >  arch/powerpc/kernel/pci-common.c |    8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> > index 8e78e93..be9ced7 100644
> > --- a/arch/powerpc/kernel/pci-common.c
> > +++ b/arch/powerpc/kernel/pci-common.c
> > @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
> >  	return pci_enable_resources(dev, mask);
> >  }
> >  
> > +/* Before assuming too much here, take care to realize that we need sign
> > + * extension from 32-bit pointers to 64-bit resource addresses to work.
> > + */
> >  resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
> >  {
> > -	return (unsigned long) hose->io_base_virt - _IO_BASE;
> > +	long vbase = (long)hose->io_base_virt;
> > +	long io_base = _IO_BASE;
> > +
> > +	return (resource_size_t)(vbase - io_base);
> 
> Why do we want sign extension here?
> 
> If we do want it, there are a lot of other places in this file where the
> same calculation is done.

We should probably as much as possible factor it, but basically what
happens is that to access IO space, we turn:

	 outb(port)

into
	 out_8(_IO_BASE + port)

With _IO_BASE being a global.

Now what happens when you have several PHBs ? Well, we make _IO_BASE be
the result of ioremap'ing the IO space window of the first one, minus
the bus address corresponding to the beginning of that window. Then for
each device, we offset devices with the offset calculated above.

Now that means that we can end up with funky arithmetic in a couple of
cases:

 - If the bus address of the IO space is larger than the virtual address
returned by ioremap (it's a bit silly to use large IO addresses but it's
technically possible, normally IO windows start at 0 bus-side though).
In fact I wouldn't be surprised if we have various other bugs if IO
windows don't start at 0 (you may want to double check your dts setup
here).

 - If the ioremap'ed address of the IO space of another domain is lower
than the ioremap'ed address of the first domain, in which case the
calculation:

	hose->io_base_virt - _IO_BASE

results in a negative offset.

Thus we need to make sure that this offset is fully sign extended so
that things work properly when applied to a resource_size_t which can be
64-bit.

On ppc64 we do things differently, we have a single linear region that
has all IO spaces and _IO_BASE is the beginning of it so offsets are
never negative, we can do that because we don't care wasting address
space there.

Cheers,
Ben.
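
Note for readers: the difference between zero extension and sign
extension described above can be shown with a small standalone C model.
This is only a sketch. uint32_t stands in for a ppc32 unsigned long,
uint64_t for a 64-bit resource_size_t, and the addresses and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the old code: the 32-bit difference is zero-extended
 * when it is widened to a 64-bit resource_size_t. */
static uint64_t offset_zero_ext(uint32_t io_base_virt, uint32_t io_base)
{
	return (uint64_t)(uint32_t)(io_base_virt - io_base);
}

/* Model of the patched code: the 32-bit difference is treated as
 * signed, so a negative offset keeps its upper 32 bits set. */
static uint64_t offset_sign_ext(uint32_t io_base_virt, uint32_t io_base)
{
	uint32_t diff = io_base_virt - io_base;
	uint64_t wide = diff;

	if (diff & 0x80000000u)
		wide |= 0xffffffff00000000ull;
	return wide;
}

/* Worked example with made-up addresses: a second PHB mapped 0x3000
 * below the first one, and a port at 0x4000 combined with the offset. */
static void sign_ext_example(void)
{
	uint32_t io_base = 0xfe000000u, virt = 0xfdffd000u;

	assert(0x4000 + offset_zero_ext(virt, io_base) == 0x100001000ull);
	assert(0x4000 + offset_sign_ext(virt, io_base) == 0x1000ull);
}
```

In the zero-extended case the negative offset turns into a large positive
value and the sum spills past 32 bits. In the sign-extended case the
64-bit addition wraps and gives the intended small result.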


* Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user
From: Benjamin Herrenschmidt @ 2012-06-06 21:20 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: mikey, michael, paulus, Anton Blanchard, olof, linuxppc-dev
In-Reply-To: <178E3BC0-C6E2-4E33-BA66-8144F192A151@kernel.crashing.org>

On Wed, 2012-06-06 at 18:40 +0200, Segher Boessenkool wrote:
> > +err1;	dcbz	r0,r3
> 
> There is no such instruction, you probably meant "dcbz 0,r3"?

This reminds me... what would happen if we changed all our

#define	r0	0
#define	r1	1

etc... to:

#define r0	%r0
#define r1	%r1

?

I'm thinking it might help catch that sort of nasties (and some of them
can be really nasty, such as inverting mfspr/mtspr arguments, or vs ori,
etc... ). I'm sure we'd have a problem with a few macros & inline
constructs but nothing we can't fix..

(Haven't tested ... still home, officially sick :-)

Cheers,
Ben.


* Re: [PATCH] PPC: PCI: Fix pcibios_io_space_offset() so it works for 32-bit ptr/64-bit rsrcs
From: Scott Wood @ 2012-06-06 21:15 UTC (permalink / raw)
  To: Ben Collins; +Cc: linuxppc-dev
In-Reply-To: <4DC27253-67FC-4A55-8C78-7782D9D0CF53@servergy.com>

On 06/05/2012 10:50 PM, Ben Collins wrote:
> The commit introducing pcibios_io_space_offset() was ignoring 32-bit to
> 64-bit sign extension, which is the case on ppc32 with 64-bit resource
> addresses. This only seems to have shown up while running under QEMU for
> e500mc target. It may or may not be suboptimal that QEMU has an IO base
> address > 32-bits for the e500-pci implementation, but 1) it's still a
> regression and 2) it's more correct to handle things this way.

Where do you see addresses over 32 bits in QEMU's e500-pci, at least
with current mainline QEMU and the mpc8544ds model?

I/O space should be at 0xe1000000.

I'm also not sure what this has to do with the virtual address returned
by ioremap().

> Signed-off-by: Ben Collins <bcollins@ubuntu.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/kernel/pci-common.c |    8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index 8e78e93..be9ced7 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1477,9 +1477,15 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
>  	return pci_enable_resources(dev, mask);
>  }
>  
> +/* Before assuming too much here, take care to realize that we need sign
> + * extension from 32-bit pointers to 64-bit resource addresses to work.
> + */
>  resource_size_t pcibios_io_space_offset(struct pci_controller *hose)
>  {
> -	return (unsigned long) hose->io_base_virt - _IO_BASE;
> +	long vbase = (long)hose->io_base_virt;
> +	long io_base = _IO_BASE;
> +
> +	return (resource_size_t)(vbase - io_base);

Why do we want sign extension here?

If we do want it, there are a lot of other places in this file where the
same calculation is done.

-Scott


* Re: [PATCH v5 4/5] fsl_pmc: Add API to enable device as wakeup event source
From: Scott Wood @ 2012-06-06 18:29 UTC (permalink / raw)
  To: Li Yang
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472,
	linux-kernel@vger.kernel.org, Zhao Chenhui-B35336
In-Reply-To: <CADRPPNSzTNPkNExBnNu1iD2hqyRG=Cw+y+PnjTRWxjP_ib1Kww@mail.gmail.com>

On 06/05/2012 11:06 PM, Li Yang wrote:
> On Wed, Jun 6, 2012 at 2:05 AM, Scott Wood <scottwood@freescale.com> wrote:
>> You ignored "what about devices other than ethernet".
> 
> No, I haven't.  Other devices are so at least for now.

I don't understand that last sentence.  Other devices are what?

-Scott


* Re: [PATCH v5 1/5] powerpc/85xx: implement hardware timebase sync
From: Scott Wood @ 2012-06-06 18:26 UTC (permalink / raw)
  To: Zhao Chenhui; +Cc: Matthew McClintock, linuxppc-dev, linux-kernel
In-Reply-To: <20120606093142.GA23505@localhost.localdomain>

On 06/06/2012 04:31 AM, Zhao Chenhui wrote:
> On Tue, Jun 05, 2012 at 11:07:41AM -0500, Scott Wood wrote:
>> On 06/05/2012 04:08 AM, Zhao Chenhui wrote:
>>> On Fri, Jun 01, 2012 at 10:40:00AM -0500, Scott Wood wrote:
>>>> I know you say this is for dual-core chips only, but it would be nice if
>>>> you'd write this in a way that doesn't assume that (even if the
>>>> corenet-specific timebase freezing comes later).
>>>
>>> At this point, I have not thought about how to implement the corenet-specific timebase freezing.
>>
>> I wasn't asking you to.  I was asking you to not have logic that breaks
>> with more than 2 CPUs.
> 
> These routines are only called in the dual-core case.

Come on, you know we have chips with more than two cores.  Why design
such a limitation into it, just because you're not personally interested
in supporting anything but e500v2?

Is it so hard to make it work for an arbitrary number of cores?

>>> If we do not set them, it may make KEXEC fail on other platforms.
>>
>> What platforms?
> 
> Such as P4080, P3041, etc.

So we need to wait for corenet timebase sync before we stop causing
problems in virtualization, simulators, etc. if a kernel has kexec or
cpu hotplug enabled (whether used or not)?

Can you at least make sure we're actually in a kexec/hotplug scenario at
runtime?

Or just implement corenet timebase sync -- it's not that different.

-Scott


* Re: [PATCH v5 2/5] powerpc/85xx: add HOTPLUG_CPU support
From: Scott Wood @ 2012-06-06 18:19 UTC (permalink / raw)
  To: Zhao Chenhui; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20120606095910.GB23505@localhost.localdomain>

On 06/06/2012 04:59 AM, Zhao Chenhui wrote:
> On Tue, Jun 05, 2012 at 11:15:52AM -0500, Scott Wood wrote:
>> On 06/05/2012 06:18 AM, Zhao Chenhui wrote:
>>> If the user does not enable kexec or hotplug, this code is redundant.
>>> So use CONFIG_KEXEC and CONFIG_HOTPLUG_CPU to guard it.
>>
>> My point is that these lists tend to grow and be a maintenance pain.
>> For small things it's often better to not worry about saving a few
>> bytes.  For larger things that need to be conditional, define a new
>> symbol rather than growing ORed lists like this.
>>
>> -Scott
> 
> I agree with you in principle. But there are only two config options
> in this patch, and it is unlikely to grow. 

That's what everybody says when these things start. :-)

-Scott


* Re: [PATCH 2/2] [POWERPC] uprobes: powerpc port
From: Jim Keniston @ 2012-06-06 18:08 UTC (permalink / raw)
  To: ananth
  Cc: Srikar Dronamraju, Peter Zijlstra, oleg, lkml, Paul Mackerras,
	Anton Blanchard, Ingo Molnar, linuxppc-dev
In-Reply-To: <20120606093541.GA29580@in.ibm.com>

On Wed, 2012-06-06 at 15:05 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Jun 06, 2012 at 11:27:02AM +0200, Peter Zijlstra wrote:
> > On Wed, 2012-06-06 at 14:51 +0530, Ananth N Mavinakayanahalli wrote:
> > > One TODO in this port compared to x86 is the uprobe abort_xol() logic.
> > > x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
> > > if a signal was caused when the uprobed instruction was single-stepped/
> > > emulated, in which case, we reset the instruction pointer to the probed
> > > address and retry the probe again. 
> > 
> > Another curious difference is that x86 uses an instruction decoder and
> > contains massive tables to validate we can probe a particular
> > instruction.

Part of that difference is because the x86 instruction set is a lot more
complex.  Another part is due to the lack, back when the x86 code was
created, of robust handling by uprobes of traps by probed instructions.
So we refused to probe instructions that we knew (or strongly suspected)
would generate traps in user mode -- e.g., privileged instructions,
illegal instructions.  A couple of times we had to "legalize"
instructions or prefixes that we didn't originally expect to encounter.

> > 
> > Can we probe all possible PPC instructions?
> 
> For the kernel, the only ones that are off limits are rfi (return from
> interrupt), mtmsr (move to msr). All other instructions can be probed.
> 
> Both those instructions are supervisor level, so we won't see them in
> userspace at all; so we should be able to probe all user level
> instructions.

Presumably rfi or mtmsr could show up in the instruction stream via an
erroneous or mischievous asm statement.  It'd be good to verify that you
handle that gracefully.

> 
> I am not aware of specific caveats for vector/altivec instructions;
> maybe Paul or Ben are more suitable to comment on that.
> 
> Ananth
> 

Jim


* Re: [PATCH] powerpc: Optimise the 64bit optimised __clear_user
From: Segher Boessenkool @ 2012-06-06 16:40 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: mikey, michael, paulus, olof, linuxppc-dev
In-Reply-To: <20120605120222.6722a3e3@kryten>

> +err1;	dcbz	r0,r3

There is no such instruction, you probably meant "dcbz 0,r3"?


Segher


* Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
From: Srikar Dronamraju @ 2012-06-06 16:30 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: peterz, lkml, Paul Mackerras, Anton Blanchard, Ingo Molnar,
	linuxppc-dev
In-Reply-To: <20120606150848.GA24641@redhat.com>

* Oleg Nesterov <oleg@redhat.com> [2012-06-06 17:08:48]:

> On 06/06, Ananth N Mavinakayanahalli wrote:
> >
> > From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
> >
> > On RISC architectures like powerpc, instructions are fixed size.
> > Instruction analysis on such platforms is just a matter of (insn % 4).
> > Pass the vaddr at which the uprobe is to be inserted so that
> > arch_uprobe_analyze_insn() can flag misaligned registration requests.
> 
> And the next patch checks "vaddr & 0x03".
> 
> But why do you need this new arg? arch_uprobe_analyze_insn() could
> check "container_of(auprobe, struct uprobe, arch)->offset & 0x3" with
> the same effect, no? vm_start/vm_pgoff are obviously page-aligned.
> 

We can't use container_of because we moved the definition of struct
uprobe to kernel/events/uprobe.c. This was possible before, when the
struct uprobe definition was in include/uprobes.h.

-- 
Thanks and Regards
Srikar


* Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address
From: David Miller @ 2012-06-06 16:03 UTC (permalink / raw)
  To: benh; +Cc: aarcange, linux-kernel, linux-mm, R65777, linuxppc-dev
In-Reply-To: <1338960617.7150.163.camel@pasglop>

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Wed, 06 Jun 2012 15:30:17 +1000

> On Wed, 2012-06-06 at 00:46 +0000, Bhushan Bharat-R65777 wrote:
> 
>> > >> memblock_end_of_DRAM() returns end_address + 1, not end address.
>> > >> While some code assumes that it returns end address.
>> > >
>> > > Shouldn't we instead fix it the other way around ? IE, make
>> > > memblock_end_of_DRAM() does what the name implies, which is to
>> return
>> > > the last byte of DRAM, and fix the -other- callers not to make bad
>> > > assumptions ?
>> > 
>> > That was my impression too when I saw this patch.
>> 
>> Initially I also intended to do so. I initiated a email on linux-mm@
>> subject "memblock_end_of_DRAM()  return end address + 1" and the only
>> response I received from Andrea was:
>> 
>> "
>> It's normal that "end" means "first byte offset out of the range". End
>> = not ok.
>> end = start+size.
>> This is true for vm_end too. So it's better to keep it that way.
>> My suggestion is to just fix point 1 below and audit the rest :)
>> "
> 
> Oh well, I don't care enough to fight this battle in my current state so
> unless Dave has more stamina than I have today, I'm ok with the patch.

I'm definitely without the stamina to fight something like this right now :)


* Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
From: Oleg Nesterov @ 2012-06-06 15:08 UTC (permalink / raw)
  To: Ananth N Mavinakayanahalli
  Cc: Srikar Dronamraju, peterz, lkml, Paul Mackerras, Anton Blanchard,
	Ingo Molnar, linuxppc-dev
In-Reply-To: <20120606091950.GB6745@in.ibm.com>

On 06/06, Ananth N Mavinakayanahalli wrote:
>
> From: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
>
> On RISC architectures like powerpc, instructions are fixed size.
> Instruction analysis on such platforms is just a matter of (insn % 4).
> Pass the vaddr at which the uprobe is to be inserted so that
> arch_uprobe_analyze_insn() can flag misaligned registration requests.

And the next patch checks "vaddr & 0x03".

But why do you need this new arg? arch_uprobe_analyze_insn() could
check "container_of(auprobe, struct uprobe, arch)->offset & 0x3" with
the same effect, no? vm_start/vm_pgoff are obviously page-aligned.

Oleg.
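
Note for readers: the container_of() idea suggested above can be
illustrated with a userspace sketch. The structs below are hypothetical
stand-ins and do not match the real struct uprobe or struct arch_uprobe.
Only the pattern (recover the outer struct from a pointer to an embedded
member, then test the low two bits of the offset) is the point.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's container_of() idea. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; the real structs have different fields. */
struct arch_uprobe_demo {
	unsigned int insn;
};

struct uprobe_demo {
	long long offset;		/* file offset of the probe */
	struct arch_uprobe_demo arch;
};

/* Fixed-size 4-byte instructions must start on a 4-byte boundary. */
static int probe_is_aligned(const struct arch_uprobe_demo *auprobe)
{
	const struct uprobe_demo *u;

	assert(auprobe != NULL);
	u = container_of(auprobe, struct uprobe_demo, arch);
	return (u->offset & 0x3) == 0;
}
```

This pattern only compiles where the full definition of the outer struct
is visible, because offsetof() needs the complete type.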


* [PATCH] kernel panic during kernel module load (powerpc specific part)
From: Steffen Rumler @ 2012-06-06 14:37 UTC (permalink / raw)
  To: ext Benjamin Herrenschmidt, paulus
  Cc: Wrobel Heinz-R39252, Michael Ellerman,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1338982323.7150.165.camel@pasglop>

Hi,

The patch below is intended to fix the following problem.

According to the PowerPC EABI specification, the GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), the GPR r11 is also used
to generate trampoline code.

This combination crashes the kernel, in the following case:

   + The compiler has generated the prologue and epilogue,
     which are part of the .text section.
   + The compiler has generated the code for the module init entry point,
     which is part of the .init.text section (if it is marked with __init).
   + On return from the module init entry point, the epilogue is reached via
     a branch instruction.
   + If the epilogue is too far away, a relative branch instruction cannot be used.
     Instead, trampoline code is generated in do_plt_call() in order to jump via a register.
     Unfortunately, the code generated by do_plt_call() destroys the content of GPR r11.
   + Because GPR r11 no longer holds the correct stack frame pointer,
     the kernel crashes right after the epilogue.

The fix simply uses GPR r12 instead of GPR r11 for generating the trampoline code.
According to statements from Freescale, this is also safe from an EABI perspective.

I've tested the fix for kernel 2.6.33 on MPC8541.

Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com>
---

--- orig/arch/powerpc/kernel/module_32.c	2012-06-06 16:04:28.956446788 +0200
+++ new/arch/powerpc/kernel/module_32.c		2012-06-06 16:04:17.746290683 +0200
@@ -187,8 +187,8 @@

  static inline int entry_matches(struct ppc_plt_entry *entry, Elf32_Addr val)
  {
-	if (entry->jump[0] == 0x3d600000 + ((val + 0x8000) >> 16)
-	    && entry->jump[1] == 0x396b0000 + (val & 0xffff))
+	if (entry->jump[0] == 0x3d800000 + ((val + 0x8000) >> 16)
+	    && entry->jump[1] == 0x398c0000 + (val & 0xffff))
  		return 1;
  	return 0;
  }
@@ -215,10 +215,9 @@
  		entry++;
  	}

-	/* Stolen from Paul Mackerras as well... */
-	entry->jump[0] = 0x3d600000+((val+0x8000)>>16);	/* lis r11,sym@ha */
-	entry->jump[1] = 0x396b0000 + (val&0xffff);	/* addi r11,r11,sym@l*/
-	entry->jump[2] = 0x7d6903a6;			/* mtctr r11 */
+	entry->jump[0] = 0x3d800000+((val+0x8000)>>16); /* lis r12,sym@ha */
+	entry->jump[1] = 0x398c0000 + (val&0xffff);     /* addi r12,r12,sym@l*/
+	entry->jump[2] = 0x7d8903a6;                    /* mtctr r12 */
  	entry->jump[3] = 0x4e800420;			/* bctr */

  	DEBUGP("Initialized plt for 0x%x at %p\n", val, entry);
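
Note for readers: the constants in the patch above follow from the
PowerPC instruction encodings, with the register number placed in the
RT/RS field (shifted left by 21) and, for addi, also in the RA field
(shifted left by 16). The helper below is a hypothetical standalone sketch
that rebuilds the four words for any scratch register. It is not the
kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Build the four-word "load address, jump via CTR" sequence using
 * scratch register reg. Hypothetical helper, not kernel code. */
static void build_trampoline(uint32_t jump[4], unsigned reg, uint32_t val)
{
	assert(reg > 0 && reg < 32);	/* RA = 0 would mean literal 0 in addi */
	/* lis reg,val@ha    (addis reg,0,ha) */
	jump[0] = 0x3c000000u | (reg << 21) | ((val + 0x8000u) >> 16);
	/* addi reg,reg,val@l */
	jump[1] = 0x38000000u | (reg << 21) | (reg << 16) | (val & 0xffffu);
	/* mtctr reg         (mtspr 9,reg) */
	jump[2] = 0x7c0903a6u | (reg << 21);
	/* bctr */
	jump[3] = 0x4e800420u;
}

/* What lis + addi leave in the register: the @ha half shifted up,
 * plus the sign-extended @l half. */
static uint32_t lis_addi_result(uint32_t val)
{
	uint32_t ha = (val + 0x8000u) >> 16;
	uint32_t lo = val & 0xffffu;
	uint32_t lo_sext = (lo & 0x8000u) ? (lo | 0xffff0000u) : lo;

	return (ha << 16) + lo_sext;
}
```

The "+ 0x8000" in the @ha half compensates for addi sign-extending its
16-bit immediate, so the two instructions together reproduce val exactly.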


* Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address
From: Andrea Arcangeli @ 2012-06-06 13:14 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Bhushan Bharat-R65777, linuxppc-dev@lists.ozlabs.org,
	David Miller
In-Reply-To: <1338960617.7150.163.camel@pasglop>

Hi,

On Wed, Jun 06, 2012 at 03:30:17PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-06-06 at 00:46 +0000, Bhushan Bharat-R65777 wrote:
> 
> > > >> memblock_end_of_DRAM() returns end_address + 1, not end address.
> > > >> While some code assumes that it returns end address.
> > > >
> > > > Shouldn't we instead fix it the other way around ? IE, make
> > > > memblock_end_of_DRAM() does what the name implies, which is to
> > return
> > > > the last byte of DRAM, and fix the -other- callers not to make bad
> > > > assumptions ?
> > > 
> > > That was my impression too when I saw this patch.
> > 
> > Initially I also intended to do so. I initiated a email on linux-mm@
> > subject "memblock_end_of_DRAM()  return end address + 1" and the only
> > response I received from Andrea was:
> > 
> > "
> > It's normal that "end" means "first byte offset out of the range". End
> > = not ok.
> > end = start+size.
> > This is true for vm_end too. So it's better to keep it that way.
> > My suggestion is to just fix point 1 below and audit the rest :)
> > "
> 
> Oh well, I don't care enough to fight this battle in my current state so

I wish you to get well soon Ben!

> unless Dave has more stamina than I have today, I'm ok with the patch.

Well it doesn't really matter in the end what is decided as long as
something is decided :). I was asked through a forward so I only
expressed my preference...
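
The "end = start+size" convention quoted above can be illustrated with a small sketch. The struct and the sketch_* helpers are hypothetical and are not the memblock API; they only show that an exclusive end is the first byte outside the range, so a caller that wants the last valid byte has to subtract one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open range [start, start + size). */
struct sketch_range {
	uint64_t start;
	uint64_t size;
};

/* Exclusive end: first byte offset outside the range. */
static uint64_t sketch_range_end(const struct sketch_range *r)
{
	return r->start + r->size;
}

/* Last valid byte: only defined for a non-empty range. */
static uint64_t sketch_range_last(const struct sketch_range *r)
{
	assert(r->size > 0);
	return sketch_range_end(r) - 1;
}

/* Membership test written against the exclusive end. */
static bool sketch_range_contains(const struct sketch_range *r, uint64_t addr)
{
	return addr >= r->start && addr < sketch_range_end(r);
}
```

With 256 MiB of DRAM starting at 0, the exclusive end is 0x10000000 and the last byte is 0x0fffffff; code that treats the former as an addressable byte is off by one.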

* Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
From: Srikar Dronamraju @ 2012-06-06 11:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, lkml, oleg, Paul Mackerras, Anton Blanchard,
	Ingo Molnar, linuxppc-dev
In-Reply-To: <20120606094014.GD9495@gmail.com>

* Ingo Molnar <mingo@kernel.org> [2012-06-06 11:40:15]:

> 
> * Ananth N Mavinakayanahalli <ananth@in.ibm.com> wrote:
> 
> > On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> > > 
> > > Don't we traditionally use unsigned long to pass vaddrs?
> > 
> > Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
> > I guess I should've made that clear in the patch description.
> 
> Why not fix struct vma_info's vaddr type?
> 

Calculating and comparing vaddr values involves variables of type loff_t.
To avoid typecasting and overflow at each of these places, we used
loff_t.

Ananth, install_breakpoint() already has a variable addr of type
unsigned long.  Why don't you use addr instead of vaddr?

-- 
Thanks and regards
Srikar
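
The overflow concern can be sketched as follows. This is only an illustration under stated assumptions: the formula start + offset - (pgoff << shift), the page shift of 12, and every sketch_* name are assumptions, not the uprobes code. The arithmetic is done in a signed 64-bit type (as loff_t is), and the result is narrowed to unsigned long only after it has been checked against the mapping bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/*
 * Translate a file offset into a virtual address inside a mapping
 * [vm_start, vm_end) whose first page corresponds to file page vm_pgoff.
 * Returns false when the offset does not fall inside the mapping.
 */
static bool sketch_offset_to_vaddr(uint64_t vm_start, uint64_t vm_end,
				   uint64_t vm_pgoff, int64_t offset,
				   unsigned long *addr)
{
	int64_t vaddr;

	assert(addr != 0);

	/* Wide signed arithmetic, so an out-of-range result stays visible. */
	vaddr = (int64_t)vm_start + offset
		- (int64_t)(vm_pgoff << SKETCH_PAGE_SHIFT);

	if (vaddr < (int64_t)vm_start || vaddr >= (int64_t)vm_end)
		return false;

	/* Safe to narrow: the value lies inside the mapping. */
	*addr = (unsigned long)vaddr;
	return true;
}
```

Once the range check has passed, the caller can carry the address as unsigned long, which is the conventional type for virtual addresses.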

* Re: kernel panic during kernel module load (powerpc specific part)
From: Benjamin Herrenschmidt @ 2012-06-06 11:32 UTC (permalink / raw)
  To: Steffen Rumler
  Cc: Wrobel Heinz-R39252, Michael Ellerman,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <4FCF0897.2060405@nsn.com>

On Wed, 2012-06-06 at 09:36 +0200, Steffen Rumler wrote:
> 
> how should we continue here ?
> There is the kernel panic, I've described.
> 
> Technically, there is a conflict between the code generated by the
> compiler and the loader in module_32.c, at least by using -Os.
> Because the prologue/epilogue is part of the .text and init_module()
> is part of .init.text (in the case __init is applied, as usual),
> a directly reachable call is not always possible. 

As we discussed earlier, if you could submit a patch to use r12 instead,
we should merge that.

Cheers,
Ben.

* Re: [RFC PATCH] sched/numa: do load balance between remote nodes
From: Sergei Shtylyov @ 2012-06-06 10:53 UTC (permalink / raw)
  To: Alex Shi
  Cc: linux-mips, linux-ia64, linux-sh, dhowells, paulus, hpa,
	sparclinux, mingo, sivanich, x86, greg.pearson, chris.mason,
	arjan.van.de.ven, mattst88, pjt, fenghua.yu, seto.hidetoshi,
	a.p.zijlstra, cmetcalf, ak, ink, anton, tglx, kamezawa.hiroyu,
	rth, tony.luck, torvalds, linux-kernel, ralf, lethal, linux-alpha,
	bob.picco, akpm, linuxppc-dev, davem
In-Reply-To: <1338965571-9812-1-git-send-email-alex.shi@intel.com>

Hello.

On 06-06-2012 10:52, Alex Shi wrote:

> commit cb83b629b

    Please also specify that commit's summary in parens.

> remove the NODE sched domain and check if the node
> distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will
> lose the load balance chance at exec/fork/wake_affine points.

> But actually, even the node distance is farther than REMOTE_DISTANCE,
> Modern CPUs also has QPI like connections, that make memory access is

    "Is" not needed here.

> not too slow between nodes.  So above losing on NUMA machine make a
> huge performance regression on benchmark: hackbench, tbench, netperf
> and oltp etc.

> This patch will recover the scheduler behavior to old mode on all my
> Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so remove the
> performance regressions. (all of them just has 2 kinds distance, 10 21)

> Signed-off-by: Alex Shi<alex.shi@intel.com>

WBR, Sergei
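
The distance check described in the quoted changelog can be sketched roughly as below. This is not the scheduler code: the function name is hypothetical, and the values 10 and 20 are assumed to match the generic LOCAL_DISTANCE and REMOTE_DISTANCE defaults, which an architecture may override.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_LOCAL_DISTANCE  10 /* assumed generic default */
#define SKETCH_REMOTE_DISTANCE 20 /* assumed generic default */

/*
 * Per the quoted description: a node pair whose SLIT distance is farther
 * than REMOTE_DISTANCE loses load balancing at exec/fork/wake_affine.
 */
static bool sketch_balances_at_exec_fork_wake(int distance)
{
	assert(distance >= SKETCH_LOCAL_DISTANCE);
	return distance <= SKETCH_REMOTE_DISTANCE;
}
```

Under these assumptions, the platforms mentioned above report only the distances 10 and 21, so every remote node pair ends up on the wrong side of the threshold.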

* Re: [PATCH 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()
From: Ananth N Mavinakayanahalli @ 2012-06-06 10:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srikar Dronamraju, Peter Zijlstra, lkml, oleg, Paul Mackerras,
	Anton Blanchard, Ingo Molnar, linuxppc-dev
In-Reply-To: <20120606094014.GD9495@gmail.com>

On Wed, Jun 06, 2012 at 11:40:15AM +0200, Ingo Molnar wrote:
> 
> * Ananth N Mavinakayanahalli <ananth@in.ibm.com> wrote:
> 
> > On Wed, Jun 06, 2012 at 11:23:52AM +0200, Peter Zijlstra wrote:
> > > On Wed, 2012-06-06 at 14:49 +0530, Ananth N Mavinakayanahalli wrote:
> > > > +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, loff_t vaddr)
> > > 
> > > Don't we traditionally use unsigned long to pass vaddrs?
> > 
> > Right. But the vaddr we pass here is vma_info->vaddr which is loff_t.
> > I guess I should've made that clear in the patch description.
> 
> Why not fix struct vma_info's vaddr type?

Agreed. Will fix and send v2.

Ananth

* Re: [PATCH v5 5/5] powerpc/85xx: add support to JOG feature using cpufreq interface
From: Zhao Chenhui @ 2012-06-06 10:19 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <4FCE2CB1.5090308@freescale.com>

On Tue, Jun 05, 2012 at 10:58:41AM -0500, Scott Wood wrote:
> On 06/05/2012 05:59 AM, Zhao Chenhui wrote:
> > On Fri, Jun 01, 2012 at 06:30:55PM -0500, Scott Wood wrote:
> >> On 05/11/2012 06:53 AM, Zhao Chenhui wrote:
> >>> The jog mode frequency transition process on the MPC8536 is similar to
> >>> the deep sleep process. The driver need save the CPU state and restore
> >>> it after CPU warm reset.
> >>>
> >>> Note:
> >>>  * The I/O peripherals such as PCIe and eTSEC may lose packets during
> >>>    the jog mode frequency transition.
> >>
> >> That might be acceptable for eTSEC, but it is not acceptable to lose
> >> anything on PCIe.  Especially not if you're going to make this "default y".
> > 
> > It is a hardware limitation.
> 
> Then make sure jog isn't used if PCIe is used.
> 
> Maybe you could do something with the suspend infrastructure, but this
> is sufficiently heavyweight that transitions should be manually
> requested, not triggered by the automatic cpufreq governor.
> 
> Does this apply to p1022, or just mpc8536?

Both of them.

> 
> > Peripherals in the platform will not be operating
> > during the jog mode frequency transition process.
> 
> What ensures this?
> 
> -Scott

Hardware ensures it without software intervention.

-Chenhui
