* Re: [PATCH RFCv7 0/2] CARMA Board Support
From: Benjamin Herrenschmidt @ 2011-05-19 4:13 UTC (permalink / raw)
To: Ira W. Snyder; +Cc: dmitry.torokhov, linuxppc-dev, linux-kernel
In-Reply-To: <1297467270-29576-1-git-send-email-iws@ovro.caltech.edu>
On Fri, 2011-02-11 at 15:34 -0800, Ira W. Snyder wrote:
> Hello everyone,
>
> This is the seventh posting of these drivers, taking into account comments
> from earlier postings. I've made sure that the drivers both pass checkpatch
> without any errors or warnings. I would appreciate as much review as you
> can offer, so that these can get into the next merge cycle. They've been
> sitting outside mainline for far too long.
This has been bitrotting for way too long indeed. I'm sticking this into
powerpc -next today.
Cheers,
Ben.
> RFCv6 -> RFCv7:
> - reference count private data structure (to support unbind)
> - use #defines instead of hex values for registers
> - keep lines <=80 characters
>
> RFCv5 -> RFCv6:
> - change locking in several functions
> - use list_move_tail() to simplify code
> - remove unused helper functions
>
> RFCv4 -> RFCv5:
> - remove unnecessary locking per review comments
> - do not clobber return values from *_interruptible()
> - explicitly track buffer DMA mapping
> - use #defines instead of raw hex addresses
> - change enable sysfs attribute to root-writeable only
>
> RFCv3 -> RFCv4:
> - updates for DATA-FPGA version 2
>
> RFCv2 -> RFCv3:
> - use miscdevice framework (removing the carma class)
> - add bitfile readback capability to the programmer
>
> RFCv1 -> RFCv2:
> - change comments to kerneldoc format
> - Kconfig improvements
> - use the videobuf_dma_sg API in the programmer
> - updates for Freescale DMAEngine DMA_SLAVE API changes
>
> KNOWN ISSUES:
> - untested with a setup that can generate interrupts (will get access soon)
> - does not handle runtime "unbind"
>
> Information about the CARMA board:
>
> The CARMA board is essentially an MPC8349EA MDS reference design with a
> 1GHz ADC and 4 high powered data processing FPGAs connected to the local
> bus. It is all packed into a compact PCI form factor. It is used at the
> Owens Valley Radio Observatory as the main component in the correlator
> system.
>
> For board information, see:
> http://www.mmarray.org/~dwh/carma_board/index.html
>
> For DATA-FPGA register layout, see:
> http://www.mmarray.org/memos/carma_memo46.pdf
>
> These drivers are the necessary pieces to get the data processing FPGAs
> working and producing data. Despite the fact that the hardware is custom
> and we are the only users, I'd still like to get the drivers upstream.
> Several people have suggested that this is possible.
>
> Some further patches will be forthcoming. I have a driver for the LED
> subsystem and the PPS subsystem. The LED register layout is expected to
> change soon, so I won't post the driver until that is finished. The PPS
> driver will be posted separately from this patch series; it is very
> generic.
>
> Thanks to everyone who has provided comments on earlier versions!
>
> Ira W. Snyder (2):
> misc: add CARMA DATA-FPGA Access Driver
> misc: add CARMA DATA-FPGA Programmer support
>
> drivers/misc/Kconfig | 1 +
> drivers/misc/Makefile | 1 +
> drivers/misc/carma/Kconfig | 18 +
> drivers/misc/carma/Makefile | 2 +
> drivers/misc/carma/carma-fpga-program.c | 1141 ++++++++++++++++++++++++
> drivers/misc/carma/carma-fpga.c | 1433 +++++++++++++++++++++++++++++++
> 6 files changed, 2596 insertions(+), 0 deletions(-)
> create mode 100644 drivers/misc/carma/Kconfig
> create mode 100644 drivers/misc/carma/Makefile
> create mode 100644 drivers/misc/carma/carma-fpga-program.c
> create mode 100644 drivers/misc/carma/carma-fpga.c
>
* Re: [git pull] Please pull powerpc.git merge branch
From: Linus Torvalds @ 2011-05-19 4:11 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list
In-Reply-To: <1305777978.7481.37.camel@pasglop>
On Wed, May 18, 2011 at 9:06 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> Dunno if it's too late or not yet but here's 3 fixes for powerpc that
> would be welcome to have in before the release. If not I'll send them
> first thing next (one of them is already in -next in fact).
Gah. I just cut 2.6.39.
Linus
* Re: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Hitoshi Mitake @ 2011-05-19 4:08 UTC (permalink / raw)
To: Moore, Eric
Cc: linux-arch, Prakash, Sathya, Desai, Kashyap, linux scsi dev,
Matthew Wilcox, linux powerpc dev, Milton Miller, linux kernel,
James Bottomley, Ingo Molnar, paulus@samba.org, linux pci,
Ingo Molnar, Sam Ravnborg
In-Reply-To: <4565AEA676113A449269C2F3A549520F80BE7F37@cosmail03.lsi.com>
On Thu, May 19, 2011 at 04:11, Moore, Eric <Eric.Moore@lsi.com> wrote:
> On Wednesday, May 18, 2011 12:31 PM Milton Miller wrote:
>> Ingo I would propose the following commits added in 2.6.29 be reverted.
>> I think the current consensus is drivers must know if the writeq is
>> not atomic so they can provide their own locking or other workaround.
>>
>
>
> Exactly.
>
The original motivation for providing common readq/writeq was that
letting each driver have its own readq/writeq is bad for source code
maintenance.
But if you really dislike them, there might be two solutions:
1. Rename readq/writeq to readq_nonatomic/writeq_nonatomic.
2. Add a new C file somewhere that defines a spinlock for them.
   With spin_lock_irqsave() and spin_unlock_irqrestore() on that spinlock,
   readq/writeq can be atomic.
What do you think about them? If you cannot agree with either of the above
solutions, I'll agree with reverting them.
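The two options above can be sketched in userspace C. This is only a model under stated assumptions: struct fake_reg64, the *_model function names and the atomic_flag lock are all hypothetical stand-ins, not the kernel's readq/writeq or spinlock API. A real driver would use spin_lock_irqsave()/spin_unlock_irqrestore() around the two 32-bit MMIO accesses.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Userspace stand-in for a 64-bit device register made of two 32-bit halves. */
struct fake_reg64 {
	volatile uint32_t lo;
	volatile uint32_t hi;
};

/* Stand-in for the shared kernel spinlock proposed in option 2. */
static atomic_flag fake_writeq_lock = ATOMIC_FLAG_INIT;

static void fake_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&fake_writeq_lock,
						 memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void fake_unlock(void)
{
	atomic_flag_clear_explicit(&fake_writeq_lock, memory_order_release);
}

/* Option 1: the name admits that the two halves are stored separately. */
static void writeq_nonatomic_model(uint64_t val, struct fake_reg64 *reg)
{
	reg->lo = (uint32_t)val;
	reg->hi = (uint32_t)(val >> 32);
}

/* Option 2: the same two stores, but no other locked accessor can interleave. */
static void writeq_locked_model(uint64_t val, struct fake_reg64 *reg)
{
	fake_lock();
	writeq_nonatomic_model(val, reg);
	fake_unlock();
}

/* Locked read: both halves are sampled while writers are excluded. */
static uint64_t readq_locked_model(struct fake_reg64 *reg)
{
	uint64_t val;

	fake_lock();
	val = reg->lo;
	val |= (uint64_t)reg->hi << 32;
	fake_unlock();
	return val;
}
```

Note that even in option 2 the device would still see two separate 32-bit accesses; the lock only keeps other CPUs that use the same helpers from slipping an access in between the two halves.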
--
Hitoshi Mitake
h.mitake@gmail.com
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Will Drewry @ 2011-05-19 4:07 UTC (permalink / raw)
To: Ingo Molnar
Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
Russell King, x86, James Morris, Linus Torvalds, Ingo Molnar,
linux-arm-kernel, kees.cook, Serge E. Hallyn, Steven Rostedt,
Martin Schwidefsky, Thomas Gleixner, Roland McGrath, Michal Marek,
Michal Simek, linuxppc-dev, linux-kernel, Ralf Baechle,
Paul Mundt, Tejun Heo, linux390, Andrew Morton, agl,
David S. Miller
In-Reply-To: <20110517131902.GF21441@elte.hu>
On Tue, May 17, 2011 at 6:19 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Tue, 2011-05-17 at 14:42 +0200, Ingo Molnar wrote:
>> > * Steven Rostedt <rostedt@goodmis.org> wrote:
>> >
>> > > On Mon, 2011-05-16 at 18:52 +0200, Ingo Molnar wrote:
>> > > > * Steven Rostedt <rostedt@goodmis.org> wrote:
>> > > >
>> > > > > I'm a bit nervous about the 'active' role of (trace_)events, because of the
>> > > > > way multiple callbacks can be registered. How would:
>> > > > >
>> > > > >        err = event_x();
>> > > > >        if (err == -EACCESS) {
>> > > > >
>> > > > > be handled? [...]
>> > > >
>> > > > The default behavior would be something obvious: to trigger all callbacks and
>> > > > use the first non-zero return value.
>> > >
>> > > But how do we know which callback that was from? There's no ordering of what
>> > > callbacks are called first.
>> >
>> > We do not have to know that - nor do the calling sites care in general. Do you
>> > have some specific usecase in mind where the identity of the callback that
>> > generates a match matters?
>>
>> Maybe I'm confused. I was thinking that these event_*() are what we
>> currently call trace_*(), but the event_*(), I assume, can return a
>> value if a call back returns one.
>
> Yeah - and the call site can treat it as:
>
>  - Ugh, if i get an error i need to abort whatever i was about to do
>
> or (more advanced future use):
>
>  - If i get a positive value i need to re-evaluate the parameters that were
>    passed in, they were changed
Do event_* that return non-void exist in the tree at all now? I've
looked at the various tracepoint macros as well as some of the other
handlers (trace_function, perf_tp_event, etc) and I'm not seeing any
places where a return value is honored nor could be. At best, the
perf_tp_event can be short-circuited in the hlist_for_each, but
it'd still need a way to bubble up a failure and result in not calling
the trace/event that the hook precedes.
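For reference, the behaviour proposed earlier in the thread ("trigger all callbacks and use the first non-zero return value") can be modelled in a few lines of plain C. This is a hypothetical sketch, not code from the tree: struct event_model, event_register(), event_fire() and the two sample callbacks are invented names, and no existing tracepoint or perf interface works this way today.

```c
#include <stddef.h>

/* Hypothetical callback type: 0 means "no objection", non-zero is a verdict. */
typedef int (*event_cb_t)(void *data);

struct event_model {
	event_cb_t cbs[4];	/* registered callbacks, in no guaranteed order */
	size_t nr_cbs;
};

/* Returns 0 on success, -1 if the callback table is full. */
static int event_register(struct event_model *ev, event_cb_t cb)
{
	if (ev->nr_cbs >= sizeof(ev->cbs) / sizeof(ev->cbs[0]))
		return -1;
	ev->cbs[ev->nr_cbs++] = cb;
	return 0;
}

/*
 * Run every callback and report the first non-zero return value.
 * Later callbacks still run, so passive tracers keep seeing the event.
 */
static int event_fire(const struct event_model *ev, void *data)
{
	int ret = 0;

	for (size_t i = 0; i < ev->nr_cbs; i++) {
		int r = ev->cbs[i](data);

		if (r && !ret)
			ret = r;
	}
	return ret;
}

/* Sample callbacks: a passive tracer and a filter that denies. */
static int count_only(void *data)
{
	(*(int *)data)++;
	return 0;
}

static int deny(void *data)
{
	(*(int *)data)++;
	return -1;	/* stand-in for an errno such as -EPERM */
}

/* Call site: the caller, not the callback, decides what an error means. */
static int guarded_operation(const struct event_model *ev, int *calls)
{
	if (event_fire(ev, calls))
		return -1;	/* abort before doing the real work */
	return 0;
}
```

The model makes the open question concrete: the call site has to be written to consume the return value, which is exactly what the existing void tracepoint hooks do not offer.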
Am I missing something really obvious? I don't feel I've gotten a
good handle on exactly how all the tracing code gets triggered, so
perhaps I'm still a level (or three) too shallow. (I can see the asm
hooks for trace functions and I can see where that translates to
registered calls - like trace_function - but I don't see how the
hooked calls can be trivially aborted).
As is, I'm not sure how the perf and ftrace infrastructure could be
reused cleanly without a fair number of hacks to the interface and a
good bit of reworking. I can already see a number of challenges
around reusing the sys_perf_event_open interface and the fact that
reimplementing something even as simple as seccomp mode=1 seems to
require a fair amount of tweaking to avoid being leaky. (E.g.,
enabling all TRACE_EVENT()s for syscalls will miss unhooked syscalls
so either acceptance matching needs to be propagated up the stack
along with some seccomp-like task modality or seccomp-on-perf would
have to depend on sys_enter events with syscall number predicate
matching and fail when a filter discard applies to all active events.)
At present, I'm leaning back towards the v2 series (plus the requested
minor changes) for the benefit of code clarity and its fail-secure
behavior. Even just considering the reduced case of seccomp mode 1
being implemented on the shared infrastructure, I feel like I'm missing
something that makes it viable. Any clues?
If not, I don't think a seccomp mode 2 interface via prctl would be
intractable if the long term movement is to a ftrace/perf backend - it
just means that the in-kernel code would change to wrap whatever the
final design ended up being.
Thanks and sorry if I'm being dense!
>> Thus, we now have the ability to dynamically attach function calls to
>> arbitrary points in the kernel that can have an effect on the code that
>> called it. Right now, we only have the ability to attach function calls to
>> these locations that have passive effects (tracing/profiling).
>
> Well, they can only have the effect that the calling site accepts and handles.
> So the 'effect' is not arbitrary and not defined by the callbacks, it is
> controlled and handled by the calling code.
>
> We do not want invisible side-effects, opaque hooks, etc.
>
> Instead of that we want (this is the getname() example i cited in the thread)
> explicit effects, like:
>
>  if (event_vfs_getname(result))
>         return ERR_PTR(-EPERM);
>
>> But you say, "nor do the calling sites care in general". Then what do
>> these calling sites do with the return code? Are we limiting these
>> actions to security only? Or can we have some other feature. [...]
>
> Yeah, not just security. One other example that came up recently is whether to
> panic the box on certain (bad) events such as NMI errors. This too could be
> made flexible via the event filter code: we already capture many events, so
> places that might conceivably do some policy could do so based on a filter
> condition.
This sounds great - I just wish I could figure out how it'd work :)
>> [...] I can envision that we can make the Linux kernel quite dynamic here
>> with "self modifying code". That is, anywhere we have "hooks", perhaps we
>> could replace them with dynamic switches (jump labels). Maybe events would
>> not be the best use, but they could be a generic one.
>
> events and explicit function calls and explicit side-effects are pretty much
> the only thing that are acceptable. We do not want opaque hooks and arbitrary
> side-effects.
>
>> Knowing what callback returned the result would be beneficial. Right now, you
>> are saying if the call back return anything, just abort the call, not knowing
>> what callback was called.
>
> Yeah, and that's a feature: that way a number of conditions can be attached.
> Multiple security frameworks may have effect on a task or multiple tools might
> set policy action on a given event.
>
> Thanks,
>
>        Ingo
>
* [git pull] Please pull powerpc.git merge branch
From: Benjamin Herrenschmidt @ 2011-05-19 4:06 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linuxppc-dev list, Andrew Morton, Linux Kernel list
Hi Linus
Dunno if it's too late or not yet but here's 3 fixes for powerpc that
would be welcome to have in before the release. If not I'll send them
first thing next (one of them is already in -next in fact).
Those are regression fixes and a build breakage.
Cheers,
Ben.
The following changes since commit fce519588acfac249e8fdc1f5016c73d617de315:
Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 (2011-05-18 13:25:57 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc.git merge
Ben Hutchings (1):
powerpc/kexec: Fix build failure on 32-bit SMP
Benjamin Herrenschmidt (1):
powerpc/smp: Make start_secondary_resume available to all CPU variants
kerstin jonsson (1):
powerpc/4xx: Fix regression in SMP on 476
arch/powerpc/kernel/crash.c | 59 +++++++++++++++++++++--------------------
arch/powerpc/kernel/head_32.S | 9 ------
arch/powerpc/kernel/misc_32.S | 11 +++++++
arch/powerpc/kernel/smp.c | 4 +-
4 files changed, 43 insertions(+), 40 deletions(-)
* Re: [PATCH] lib: Consolidate DEBUG_STACK_USAGE option
From: Benjamin Herrenschmidt @ 2011-05-19 3:32 UTC (permalink / raw)
To: Stephen Boyd
Cc: linux-arch, linux-mips, linux-m32r, user-mode-linux-devel,
linux-sh, x86, linux-kernel, Chris Metcalf, sparclinux,
uclinux-dist-devel, Andrew Morton, linuxppc-dev, linux-arm-kernel
In-Reply-To: <1304747831-2098-1-git-send-email-sboyd@codeaurora.org>
On Fri, 2011-05-06 at 22:57 -0700, Stephen Boyd wrote:
> Most arches define CONFIG_DEBUG_STACK_USAGE exactly the same way.
> Move it to lib/Kconfig.debug so each arch doesn't have to define
> it. This obviously makes the option generic, but that's fine
> because the config is already used in generic code.
>
> It's not obvious to me that sysrq-P actually does anything
> different with this option enabled, but I erred on the side of
> caution by keeping the most inclusive wording.
Sorry for the delay...
For powerpc:
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cheers,
Ben.
* linux-next: build warning after merge of the powerpc tree
From: Stephen Rothwell @ 2011-05-19 3:19 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev
Cc: linux-next, linux-kernel
Hi all,
After merging the powerpc tree, today's linux-next build (powerpc
allyesconfig) produced this warning:
WARNING: arch/powerpc/sysdev/built-in.o(.text+0x10eb8): Section mismatch in reference from the function .ics_rtas_init() to the function .init.text:.xics_register_ics()
The function .ics_rtas_init() references
the function __init .xics_register_ics().
This is often because .ics_rtas_init lacks a __init
annotation or the annotation of .xics_register_ics is wrong.
Introduced by commit 0b05ac6e2480 ("powerpc/xics: Rewrite XICS driver").
ics_rtas_init() is only called from xics_init() which is marked __init,
so ics_rtas_init() should be as well.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
* Re: [PATCH] PPC_47x SMP fix
From: Benjamin Herrenschmidt @ 2011-05-19 3:09 UTC (permalink / raw)
To: Kerstin Jonsson
Cc: Paul Mackerras, Michael Neuling, linuxppc-dev, linux-kernel,
Will Schmidt
In-Reply-To: <1305712631-21690-1-git-send-email-kerstin.jonsson@ericsson.com>
On Wed, 2011-05-18 at 11:57 +0200, Kerstin Jonsson wrote:
> commit c56e58537d504706954a06570b4034c04e5b7500 breaks SMP support in PPC_47x chip.
> secondary_ti must be set to current thread info before calling kick_cpu or else
> start_secondary_47x will jump into void when trying to return to c-code.
> In the current setup secondary_ti is initialized before the CPU idle task is started
> and only the boot core will start. I am not sure this is the correct solution, but it
> makes SMP possible in my chip.
> Note! The HOTPLUG support probably needs some fixing too. There is no trampoline code
> available in head_44x.S - start_secondary_resume?
Sending to Linus now. I've also committed a fix for the latter, moving
the 32-bit definition of start_secondary_resume to misc_32.S
Thanks !
Cheers,
Ben.
>
> Signed-off-by: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Neuling <mikey@neuling.org>
> Cc: Will Schmidt <will_schmidt@vnet.ibm.com>
> ---
> arch/powerpc/kernel/smp.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index cbdbb14..f2dcab7 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -410,8 +410,6 @@ int __cpuinit __cpu_up(unsigned int cpu)
> {
> int rc, c;
>
> - secondary_ti = current_set[cpu];
> -
> if (smp_ops == NULL ||
> (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
> return -EINVAL;
> @@ -421,6 +419,8 @@ int __cpuinit __cpu_up(unsigned int cpu)
> if (rc)
> return rc;
>
> + secondary_ti = current_set[cpu];
> +
> /* Make sure callin-map entry is 0 (can be leftover a CPU
> * hotplug
> */
* Re: mmotm threatens ppc preemption again
From: Jeremy Fitzhardinge @ 2011-05-18 23:29 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: linuxppc-dev, Andrew Morton, Hugh Dickins, Peter Zijlstra
In-Reply-To: <1301603918.2407.70.camel@pasglop>
On 03/31/2011 01:38 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-03-31 at 10:21 -0700, Jeremy Fitzhardinge wrote:
>> No, it's the same accessors for both, since the need to distinguish them
>> hasn't really come up. Could you put an "if (preemptable()) return;"
>> guard in your implementations?
> That would be a band-aid but would probably do the trick for now
> for !-rt, tho it wouldn't do the right thing for -rt...
Hi Ben,
Have you had a chance to look at doing a workaround/fix for these power
problems? I believe that's the only holdup to putting in the batching
changes. I'd like to get them in for the next window if possible, since
they're a pretty significant performance win for us.
Thanks,
J
* Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization
From: Scott Wood @ 2011-05-18 22:27 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1305755688.7481.12.camel@pasglop>
On Thu, 19 May 2011 07:54:48 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:51 -0500, Scott Wood wrote:
> > On Thu, 19 May 2011 07:37:47 +1000
> > Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> >
> > > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > > A little more speed up measured on e5500.
> > > >
> > > > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > > > see.
> > >
> > > Please keep them for now. If your core doesn't have them, make them an
> > > MMU feature.
> >
> > We have them, it was just an attempt to clean out unused things to speed up
> > the miss handler. I'll drop that part if you think we'll use it in the
> > future.
>
> I never know for sure ... damn research people ... :-)
>
> I'd rather keep them for now, does it make a significant difference ?
It was minor but measurable (wouldn't have been worthwhile except as part
of a series of small things that add up), but upon trying again I was able
to reorder slightly and fit it in without seeing an impact.
-Scott
* RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Moore, Eric @ 2011-05-18 22:05 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Prakash, Sathya, Desai, Kashyap, linux-scsi@vger.kernel.org,
Matthew Wilcox, Milton Miller, James Bottomley, paulus@samba.org,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1305754218.7481.0.camel@pasglop>
On Wednesday, May 18, 2011 3:30 PM, Benjamin Herrenschmidt wrote:
>
> You may also want to look at Milton's comments, it looks like the way
> you do init_completion followed immediately by wait_completion is racy.
>
> You should init the completion before you do the IO that will eventually
> trigger complete() to be called.
>
I agree.  The init_completion needs to be done prior to posting the smid.  I'm not sure why I did it that way.
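The ordering problem with initializing a completion after the request has already been posted can be shown with a small single-threaded model. Everything here is an assumption made for illustration: struct completion_model and the *_model helpers are hypothetical stand-ins for the kernel completion API, and post_request_fast_irq() simply pretends the interrupt handler has run before the submitter continues.

```c
#include <stdbool.h>

/* Minimal stand-in for the kernel's struct completion. */
struct completion_model {
	unsigned int done;
};

static void init_completion_model(struct completion_model *c)
{
	c->done = 0;	/* forgets any completion signalled earlier */
}

static void complete_model(struct completion_model *c)
{
	c->done++;
}

/* True if a wait would return immediately, false if it would sleep. */
static bool wait_would_return(const struct completion_model *c)
{
	return c->done > 0;
}

/* Stand-in for posting a request whose interrupt fires right away. */
static void post_request_fast_irq(struct completion_model *c)
{
	complete_model(c);
}

/* Racy order: post the request, then init. The early complete() is lost. */
static bool racy_order(struct completion_model *c)
{
	post_request_fast_irq(c);
	init_completion_model(c);
	return wait_would_return(c);
}

/* Fixed order: init first, so a fast complete() can never be wiped out. */
static bool fixed_order(struct completion_model *c)
{
	init_completion_model(c);
	post_request_fast_irq(c);
	return wait_would_return(c);
}
```

In the racy order the waiter would sleep until its timeout even though the hardware has already answered; in the fixed order the wait returns at once.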
* Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS
From: Benjamin Herrenschmidt @ 2011-05-18 21:58 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518165200.0d3e0188@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:52 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:38:19 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > > ---
> > > Is there any 64-bit book3e chip that doesn't support this? It
> > > doesn't appear to be optional in the ISA.
> >
> > Not afaik.
>
> Any objection to just removing the feature bit?
Nope. Wasn't it added by Kumar in the first place ?
Cheers,
Ben.
* Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization
From: Benjamin Herrenschmidt @ 2011-05-18 21:54 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518165134.1023dccc@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:51 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:37:47 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > A little more speed up measured on e5500.
> > >
> > > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > > see.
> >
> > Please keep them for now. If your core doesn't have them, make them an
> > MMU feature.
>
> We have them, it was just an attempt to clean out unused things to speed up
> the miss handler. I'll drop that part if you think we'll use it in the
> future.
I never know for sure ... damn research people ... :-)
I'd rather keep them for now, does it make a significant difference ?
Cheers,
Ben.
* Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes
From: Benjamin Herrenschmidt @ 2011-05-18 21:54 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, David Gibson
In-Reply-To: <20110518165025.1deddf00@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:50 -0500, Scott Wood wrote:
> On Thu, 19 May 2011 07:36:04 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > > I don't see where any non-standard page size will be set in the
> > > kernel page tables, so don't waste time checking for it. It wouldn't
> > > work with TLB0 on an FSL MMU anyway, so if there's something I missed
> > > (or which is out-of-tree), it's relying on implementation-specific
> > > behavior. If there's an out-of-tree need for occasional 4K mappings
> > > with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> > > that is defined.
> > >
> > > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > > ---
> >
> > Do you use that in the hugetlbfs code ? Can you publish that code ? It's
> > long overdue...
>
> hugetlbfs entries don't get loaded by this code. It branches to a slow
> path based on seeing a positive value in a pgd/pud/pmd entry.
BTW. The long overdue was aimed at David to get A2 hugetlbfs out :-)
Cheers,
Ben.
* Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS
From: Scott Wood @ 2011-05-18 21:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1305754699.7481.6.camel@pasglop>
On Thu, 19 May 2011 07:38:19 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > ---
> > Is there any 64-bit book3e chip that doesn't support this? It
> > doesn't appear to be optional in the ISA.
>
> Not afaik.
Any objection to just removing the feature bit?
-Scott
* Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs
From: Benjamin Herrenschmidt @ 2011-05-18 21:52 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518164645.0ae83ad5@schlenkerla.am.freescale.net>
> > Why do you want to create a virtual page table at the PMD level ? Also,
> > you are changing the geometry of the page tables which I think we don't
> > want. We chose that geometry so that the levels match the segment sizes
> > on server, I think it may have an impact with the hugetlbfs code (check
> > with David), it also was meant as a way to implement shared page tables
> > on hash64 tho we never published that.
>
> The number of virtual page table misses were very high on certain loads.
> Cutting back to a virtual PMD eliminates most of that for the benchmark I
> tested, though it could still be painful for access patterns that are
> extremely spread out through the 64-bit address space. I'll try a full
> 4-level walk and see what the performance is like; I was aiming for a
> compromise between random access and linear/localized access.
Let's get more numbers first then :-)
> Why does it need to match segment sizes on server?
I'm not sure whether we have a dependency with hugetlbfs there, I need
to check (remember we have one page size per segment there). For sharing
page tables that came from us using the PMD pointer as a base to
calculate the VSIDs. But I don't think we have plans to revive those
patches in the immediate future.
Cheers,
Ben.
> As for hugetlbfs, it merged easily enough with Becky's patches (you'll have
> to ask her when they'll be published).
>
> -Scott
* Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization
From: Scott Wood @ 2011-05-18 21:51 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1305754667.7481.5.camel@pasglop>
On Thu, 19 May 2011 07:37:47 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > A little more speed up measured on e5500.
> >
> > Setting of U0-3 is dropped as it is not used by Linux as far as I can
> > see.
>
> Please keep them for now. If your core doesn't have them, make them an
> MMU feature.
We have them, it was just an attempt to clean out unused things to speed up
the miss handler. I'll drop that part if you think we'll use it in the
future.
-Scott
* Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes
From: Scott Wood @ 2011-05-18 21:50 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, David Gibson
In-Reply-To: <1305754564.7481.4.camel@pasglop>
On Thu, 19 May 2011 07:36:04 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> > I don't see where any non-standard page size will be set in the
> > kernel page tables, so don't waste time checking for it. It wouldn't
> > work with TLB0 on an FSL MMU anyway, so if there's something I missed
> > (or which is out-of-tree), it's relying on implementation-specific
> > behavior. If there's an out-of-tree need for occasional 4K mappings
> > with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> > that is defined.
> >
> > Signed-off-by: Scott Wood <scottwood@freescale.com>
> > ---
>
> Do you use that in the hugetlbfs code ? Can you publish that code ? It's
> long overdue...
hugetlbfs entries don't get loaded by this code. It branches to a slow
path based on seeing a positive value in a pgd/pud/pmd entry.
-Scott
* Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame
From: Benjamin Herrenschmidt @ 2011-05-18 21:48 UTC (permalink / raw)
To: Richard Cochran; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20110518120320.GA3025@riccoc20.at.omicron.at>
> (I get the feeling that I am the only one testing recent kernels with
> the mpc85xx.)
>
> Anyhow, I see that this commit was one of a series. For my own use,
> can I simply revert this one commit independently?
For your own use sure :-) But I'd still like to get to the bottom of
this !
Cheers,
Ben.
* Re: powerpc: mpc85xx regression since 2.6.39-rc2, one cpu core lame
From: Benjamin Herrenschmidt @ 2011-05-18 21:48 UTC (permalink / raw)
To: Milton Miller
Cc: Kerstin Jonsson, Richard Cochran, linuxppc-dev, linux-kernel
In-Reply-To: <1305739191_6593@mail4.comsite.net>
On Wed, 2011-05-18 at 12:19 -0500, Milton Miller wrote:
> Does this patch help? If so please reply to that thread so patchwork
> will see it in addition to here.
>
> http://patchwork.ozlabs.org/patch/96146/
Interesting. I'll have a closer look today. Unfortunately, I don't have
any 32-bit BookE SMP at hand at the moment so I couldn't test those
configs.
Cheers,
Ben.
* Re: [PATCH 1/7] powerpc/mm: 64-bit 4k: use page-sized PMDs
From: Scott Wood @ 2011-05-18 21:46 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1305754361.7481.2.camel@pasglop>
On Thu, 19 May 2011 07:32:41 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:04 -0500, Scott Wood wrote:
> > This allows a virtual page table to be used at the PMD rather than
> > the PTE level.
> >
> > Rather than adjust the constant in pgd_index() (or ignore it, as
> > too-large values don't hurt as long as overly large addresses aren't
> > passed in), go back to using PTRS_PER_PGD. The overflow comment seems to
> > apply to a very old implementation of free_pgtables that used pgd_index()
> > (unfortunately the commit message, if you seek it out in the historic
> > tree, doesn't mention any details about the overflow). The existing
> > value was numerically identical to the old 4K-page PTRS_PER_PGD, so
> > using it shouldn't produce an overflow where it's not otherwise possible.
> >
> > Also get rid of the incorrect comment at the top of pgtable-ppc64-4k.h.
>
> Why do you want to create a virtual page table at the PMD level ? Also,
> you are changing the geometry of the page tables which I think we don't
> want. We chose that geometry so that the levels match the segment sizes
> on server, I think it may have an impact with the hugetlbfs code (check
> with David), it also was meant as a way to implement shared page tables
> on hash64 tho we never published that.
The number of virtual page table misses were very high on certain loads.
Cutting back to a virtual PMD eliminates most of that for the benchmark I
tested, though it could still be painful for access patterns that are
extremely spread out through the 64-bit address space. I'll try a full
4-level walk and see what the performance is like; I was aiming for a
compromise between random access and linear/localized access.
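To make the "full 4-level walk" concrete, here is a sketch of how an effective address splits into one index per level. The geometry is an assumption chosen for readability (4K pages, 9 index bits per level) and is NOT the actual ppc64 4K layout under discussion; all *_MODEL constants, struct walk_index and split_address() are hypothetical names.

```c
#include <stdint.h>

/* Illustrative geometry only; these are not the real ppc64 values. */
#define PAGE_SHIFT_MODEL	12	/* 4K pages */
#define PTE_BITS_MODEL		9
#define PMD_BITS_MODEL		9
#define PUD_BITS_MODEL		9
#define PGD_BITS_MODEL		9

#define PMD_SHIFT_MODEL	(PAGE_SHIFT_MODEL + PTE_BITS_MODEL)
#define PUD_SHIFT_MODEL	(PMD_SHIFT_MODEL + PMD_BITS_MODEL)
#define PGD_SHIFT_MODEL	(PUD_SHIFT_MODEL + PUD_BITS_MODEL)

/* One table index per level of the walk. */
struct walk_index {
	unsigned int pgd;
	unsigned int pud;
	unsigned int pmd;
	unsigned int pte;
};

/* Extract the 'bits'-wide field of 'ea' that starts at bit 'shift'. */
static unsigned int level_index(uint64_t ea, unsigned int shift,
				unsigned int bits)
{
	return (unsigned int)((ea >> shift) & ((1ULL << bits) - 1));
}

/* Split an effective address into the four indices a full walk needs. */
static struct walk_index split_address(uint64_t ea)
{
	struct walk_index idx = {
		.pgd = level_index(ea, PGD_SHIFT_MODEL, PGD_BITS_MODEL),
		.pud = level_index(ea, PUD_SHIFT_MODEL, PUD_BITS_MODEL),
		.pmd = level_index(ea, PMD_SHIFT_MODEL, PMD_BITS_MODEL),
		.pte = level_index(ea, PAGE_SHIFT_MODEL, PTE_BITS_MODEL),
	};

	return idx;
}
```

A full walk costs one dependent load per level using these indices, whereas a virtual page table replaces some of those loads with a single load that can itself miss; that is the trade-off being benchmarked here.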
Why does it need to match segment sizes on server?
As for hugetlbfs, it merged easily enough with Becky's patches (you'll have
to ask her when they'll be published).
-Scott
* Re: [PATCH 7/7] powerpc/e5500: set MMU_FTR_USE_PAIRED_MAS
From: Benjamin Herrenschmidt @ 2011-05-18 21:38 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518210538.GF29524@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> Is there any 64-bit book3e chip that doesn't support this? It
> doesn't appear to be optional in the ISA.
Not afaik.
Cheers,
Ben.
> arch/powerpc/kernel/cputable.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
> index 34d2722..a3b8eeb 100644
> --- a/arch/powerpc/kernel/cputable.c
> +++ b/arch/powerpc/kernel/cputable.c
> @@ -1981,7 +1981,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
> .cpu_features = CPU_FTRS_E5500,
> .cpu_user_features = COMMON_USER_BOOKE,
> .mmu_features = MMU_FTR_TYPE_FSL_E | MMU_FTR_BIG_PHYS |
> - MMU_FTR_USE_TLBILX,
> + MMU_FTR_USE_TLBILX | MMU_FTR_USE_PAIRED_MAS,
> .icache_bsize = 64,
> .dcache_bsize = 64,
> .num_pmcs = 4,
^ permalink raw reply
* Re: [PATCH 6/7] powerpc/mm: 64-bit: tlb handler micro-optimization
From: Benjamin Herrenschmidt @ 2011-05-18 21:37 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518210536.GE29524@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> A little more speed up measured on e5500.
>
> Setting of U0-3 is dropped as it is not used by Linux as far as I can
> see.
Please keep them for now. If your core doesn't have them, make them an
MMU feature.
Cheers,
Ben.
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> arch/powerpc/mm/tlb_low_64e.S | 21 ++++++++-------------
> 1 files changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index e782023..a94c87b 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -47,10 +47,10 @@
> * We could probably also optimize by not saving SRR0/1 in the
> * linear mapping case but I'll leave that for later
> */
> - mfspr r14,SPRN_ESR
> mfspr r16,SPRN_DEAR /* get faulting address */
> srdi r15,r16,60 /* get region */
> cmpldi cr0,r15,0xc /* linear mapping ? */
> + mfspr r14,SPRN_ESR
> TLB_MISS_STATS_SAVE_INFO
> beq tlb_load_linear /* yes -> go to linear map load */
>
> @@ -62,11 +62,11 @@
> andi. r10,r15,0x1
> bne- virt_page_table_tlb_miss
>
> - std r14,EX_TLB_ESR(r12); /* save ESR */
> - std r16,EX_TLB_DEAR(r12); /* save DEAR */
> + /* We need _PAGE_PRESENT and _PAGE_ACCESSED set */
>
> - /* We need _PAGE_PRESENT and _PAGE_ACCESSED set */
> + std r14,EX_TLB_ESR(r12); /* save ESR */
> li r11,_PAGE_PRESENT
> + std r16,EX_TLB_DEAR(r12); /* save DEAR */
> oris r11,r11,_PAGE_ACCESSED@h
>
> /* We do the user/kernel test for the PID here along with the RW test
> @@ -225,21 +225,16 @@ finish_normal_tlb_miss:
> * yet implemented for now
> * MAS 2 : Defaults not useful, need to be redone
> * MAS 3+7 : Needs to be done
> - *
> - * TODO: mix up code below for better scheduling
> */
> clrrdi r11,r16,12 /* Clear low crap in EA */
> + rldicr r15,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
> rlwimi r11,r14,32-19,27,31 /* Insert WIMGE */
> + clrldi r15,r15,12 /* Clear crap at the top */
> mtspr SPRN_MAS2,r11
> -
> - /* Move RPN in position */
> - rldicr r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
> - clrldi r15,r11,12 /* Clear crap at the top */
> - rlwimi r15,r14,32-8,22,25 /* Move in U bits */
> + andi. r11,r14,_PAGE_DIRTY
> rlwimi r15,r14,32-2,26,31 /* Move in BAP bits */
>
> /* Mask out SW and UW if !DIRTY (XXX optimize this !) */
> - andi. r11,r14,_PAGE_DIRTY
> bne 1f
> li r11,MAS3_SW|MAS3_UW
> andc r15,r15,r11
> @@ -483,10 +478,10 @@ virt_page_table_tlb_miss_whacko_fault:
> * We could probably also optimize by not saving SRR0/1 in the
> * linear mapping case but I'll leave that for later
> */
> - mfspr r14,SPRN_ESR
> mfspr r16,SPRN_DEAR /* get faulting address */
> srdi r11,r16,60 /* get region */
> cmpldi cr0,r11,0xc /* linear mapping ? */
> + mfspr r14,SPRN_ESR
> TLB_MISS_STATS_SAVE_INFO
> beq tlb_load_linear /* yes -> go to linear map load */
>
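
As a reading aid for the "Mask out SW and UW if !DIRTY" hunk above, the logic can be sketched in C as follows. The constants are placeholder values assumed for this sketch, not copied from the kernel headers, and the function name is hypothetical. The patch only moves the dirty test earlier for scheduling; the result is meant to be unchanged.

```c
#include <stdint.h>

/* Placeholder bit values, assumed for illustration only. */
#define SKETCH_PAGE_DIRTY 0x1000ULL
#define SKETCH_MAS3_SW    0x04ULL
#define SKETCH_MAS3_UW    0x08ULL

/*
 * If the PTE is not dirty, strip both write permissions so the first
 * store faults and the dirty bit can be set by software.
 */
static uint64_t sketch_mask_write_perms(uint64_t pte, uint64_t mas3)
{
	if (!(pte & SKETCH_PAGE_DIRTY))
		mas3 &= ~(SKETCH_MAS3_SW | SKETCH_MAS3_UW);
	return mas3;
}
```

With these placeholder values, a clean PTE with all six low permission bits set (0x3f) comes back as 0x33, while a dirty PTE is returned unchanged.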
^ permalink raw reply
* Re: [PATCH 5/7] powerpc/mm: 64-bit: don't handle non-standard page sizes
From: Benjamin Herrenschmidt @ 2011-05-18 21:36 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev, David Gibson
In-Reply-To: <20110518210535.GD29524@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> I don't see where any non-standard page size will be set in the
> kernel page tables, so don't waste time checking for it. It wouldn't
> work with TLB0 on an FSL MMU anyway, so if there's something I missed
> (or which is out-of-tree), it's relying on implementation-specific
> behavior. If there's an out-of-tree need for occasional 4K mappings
> with CONFIG_PPC_64K_PAGES, perhaps this check could only be done when
> that is defined.
>
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
Do you use that in the hugetlbfs code ? Can you publish that code ? It's
long overdue...
Cheers,
Ben.
> arch/powerpc/mm/tlb_low_64e.S | 13 -------------
> 1 files changed, 0 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index 922fece..e782023 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -232,19 +232,6 @@ finish_normal_tlb_miss:
> rlwimi r11,r14,32-19,27,31 /* Insert WIMGE */
> mtspr SPRN_MAS2,r11
>
> - /* Check page size, if not standard, update MAS1 */
> - rldicl r11,r14,64-8,64-8
> -#ifdef CONFIG_PPC_64K_PAGES
> - cmpldi cr0,r11,BOOK3E_PAGESZ_64K
> -#else
> - cmpldi cr0,r11,BOOK3E_PAGESZ_4K
> -#endif
> - beq- 1f
> - mfspr r11,SPRN_MAS1
> - rlwimi r11,r14,31,21,24
> - rlwinm r11,r11,0,21,19
> - mtspr SPRN_MAS1,r11
> -1:
> /* Move RPN in position */
> rldicr r11,r14,64-(PTE_RPN_SHIFT-PAGE_SHIFT),63-PAGE_SHIFT
> clrldi r15,r11,12 /* Clear crap at the top */
^ permalink raw reply
* Re: [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table
From: Benjamin Herrenschmidt @ 2011-05-18 21:33 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110518210528.GA29524@schlenkerla.am.freescale.net>
On Wed, 2011-05-18 at 16:05 -0500, Scott Wood wrote:
> Loads with non-linear access patterns were producing a very high
> ratio of recursive pt faults to regular tlb misses. Rather than
> choose between a 4-level table walk or a 1-level virtual page table
> lookup, use a hybrid scheme with a virtual linear pmd, followed by a
> 2-level lookup in the normal handler.
>
> This adds about 5 cycles (assuming no cache misses, and e5500 timing)
> to a normal TLB miss, but greatly reduces the recursive fault rate
> for loads which don't have locality within 2 MiB regions but do have
> significant locality within 1 GiB regions. Improvements of close to 50%
> were seen on such benchmarks.
Can you publish benchmarks that compare these two with no virtual at all
(4 full loads) ?
Cheers,
Ben.
> Signed-off-by: Scott Wood <scottwood@freescale.com>
> ---
> arch/powerpc/mm/tlb_low_64e.S | 23 +++++++++++++++--------
> 1 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index af08922..17726d3 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -24,7 +24,7 @@
> #ifdef CONFIG_PPC_64K_PAGES
> #define VPTE_PMD_SHIFT (PTE_INDEX_SIZE+1)
> #else
> -#define VPTE_PMD_SHIFT (PTE_INDEX_SIZE)
> +#define VPTE_PMD_SHIFT 0
> #endif
> #define VPTE_PUD_SHIFT (VPTE_PMD_SHIFT + PMD_INDEX_SIZE)
> #define VPTE_PGD_SHIFT (VPTE_PUD_SHIFT + PUD_INDEX_SIZE)
> @@ -185,7 +185,7 @@ normal_tlb_miss:
> /* Insert the bottom bits in */
> rlwimi r14,r15,0,16,31
> #else
> - rldicl r14,r16,64-(PAGE_SHIFT-3),PAGE_SHIFT-3+4
> + rldicl r14,r16,64-(PMD_SHIFT-3),PMD_SHIFT-3+4
> #endif
> sldi r15,r10,60
> clrrdi r14,r14,3
> @@ -202,6 +202,16 @@ MMU_FTR_SECTION_ELSE
> ld r14,0(r10)
> ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_USE_TLBRSRV)
>
> +#ifndef CONFIG_PPC_64K_PAGES
> + rldicl r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
> + clrrdi r15,r15,3
> +
> + cmpldi cr0,r14,0
> + beq normal_tlb_miss_access_fault
> +
> + ldx r14,r14,r15
> +#endif
> +
> finish_normal_tlb_miss:
> /* Check if required permissions are met */
> andc. r15,r11,r14
> @@ -353,14 +363,11 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_TLBRSRV)
> #ifndef CONFIG_PPC_64K_PAGES
> /* Get to PUD entry */
> rldicl r11,r16,64-VPTE_PUD_SHIFT,64-PUD_INDEX_SIZE-3
> - clrrdi r10,r11,3
> - ldx r15,r10,r15
> - cmpldi cr0,r15,0
> - beq virt_page_table_tlb_miss_fault
> -#endif /* CONFIG_PPC_64K_PAGES */
> -
> +#else
> /* Get to PMD entry */
> rldicl r11,r16,64-VPTE_PMD_SHIFT,64-PMD_INDEX_SIZE-3
> +#endif
> +
> clrrdi r10,r11,3
> ldx r15,r10,r15
> cmpldi cr0,r15,0
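
The rldicl/clrrdi pair added before the final ldx in the hunk at normal_tlb_miss turns the faulting address into a byte offset within the PTE page. A C sketch of the same arithmetic is below, assuming PAGE_SHIFT is 12 and PTE_INDEX_SIZE is 9 (both are assumptions here; the kernel headers are authoritative) and using a made-up function name.

```c
#include <stdint.h>

/* Assumed values for the 4K-page case; check the kernel headers. */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_PTE_INDEX_SIZE 9

/*
 * Equivalent of:
 *   rldicl r15,r16,64-PAGE_SHIFT+3,64-PTE_INDEX_SIZE-3
 *   clrrdi r15,r15,3
 * Shift the address right by PAGE_SHIFT-3, keep PTE_INDEX_SIZE+3 low
 * bits, then clear the low 3 bits. The result is pte_index * 8.
 */
static uint64_t sketch_pte_byte_offset(uint64_t ea)
{
	uint64_t v = ea >> (SKETCH_PAGE_SHIFT - 3);

	v &= (1ULL << (SKETCH_PTE_INDEX_SIZE + 3)) - 1;
	return v & ~7ULL;
}
```

For example, an address of 0x12345678 has PTE index 0x145 under these assumptions, so the offset is 0xa28. The handler then loads the PTE from the PMD entry's target plus this offset.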
^ permalink raw reply