* Re: Kernel cannot see PCI device
From: Benjamin Herrenschmidt @ 2011-05-24 21:43 UTC
To: Prashant Bhole
Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Stefan Roese,
linuxppc-dev, Tirumala Marri
In-Reply-To: <BANLkTi=J5EFvs3nHDQXNQfyd0EejWJvzcQ@mail.gmail.com>
On Tue, 2011-05-24 at 10:25 +0530, Prashant Bhole wrote:
> Fixed the problem by soft resetting the PCIe port in the function
> ppc460ex_pciex_port_init_hw().
> Is it the right thing to do?
> Following is the patch for kernel 2.6.38.4:
> --------------------------------------------------------------------------------------
> --- linux-2.6.38.4/arch/powerpc/sysdev/ppc4xx_pci.c.orig 2011-05-24 10:02:38.000000000 +0530
> +++ linux-2.6.38.4/arch/powerpc/sysdev/ppc4xx_pci.c 2011-05-24 10:07:17.000000000 +0530
> @@ -876,6 +876,20 @@
> u32 val;
> u32 utlset1;
>
> + switch (port->index)
> + {
> + case 0:
> + mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x0);
> + mdelay(10);
> + break;
> + case 1:
> + mtdcri(SDR0, PESDR1_460EX_PHY_CTL_RST, 0x0);
> + mdelay(10);
> + break;
> + default:
> + break;
> + }
> +
> if (port->endpoint)
> val = PTYPE_LEGACY_ENDPOINT << 20;
> else
> --------------------------------------------------------------------------------------
Well, it's odd that you'd have to do that; maybe it's something the
bootloader is doing?
I personally don't mind, but I'd like Stefan's and/or Tirumala's opinion on
this.
Cheers,
Ben.
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Steven Rostedt @ 2011-05-24 20:14 UTC
To: Ingo Molnar
Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
Ralf Baechle, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
Russell King, x86, James Morris, Linus Torvalds, Ingo Molnar,
kees.cook, Serge E. Hallyn, Tejun Heo, Thomas Gleixner,
linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
linuxppc-dev, linux-kernel, Eric Paris, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110524200815.GD27634@elte.hu>
On Tue, 2011-05-24 at 22:08 +0200, Ingo Molnar wrote:
> * Will Drewry <wad@chromium.org> wrote:
> But there could be a perf_tp_event_ret() or perf_tp_event_check() entry that
> code like seccomp which wants to use event results can use.
We should name it something else. The "perf_tp.." is a misnomer as it
has nothing to do with performance monitoring. "dynamic_event_.." maybe,
as it is dynamic to the effect that we can use jump labels to enable or
disable it.
-- Steve
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-24 20:10 UTC
To: Peter Zijlstra
Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110524195435.GC27634@elte.hu>
* Ingo Molnar <mingo@elte.hu> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
> > On Tue, 2011-05-24 at 10:59 -0500, Will Drewry wrote:
> > > include/linux/ftrace_event.h | 4 +-
> > > include/linux/perf_event.h | 10 +++++---
> > > kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
> > > kernel/seccomp.c | 8 ++++++
> > > kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
> > > 5 files changed, 82 insertions(+), 16 deletions(-)
> >
> > I strongly oppose to the perf core being mixed with any sekurity voodoo
> > (or any other active role for that matter).
>
> I'd object to invisible side-effects as well, and vehemently so. But note how
> intelligently it's used here: it's explicit in the code, it's used explicitly
> in kernel/seccomp.c and the event generation place in
> kernel/trace/trace_syscalls.c.
>
> So this is a really flexible solution IMO and does not extend events with
> some invisible 'active' role. It extends the *call site* with an open-coded
> active role - which active role btw. already pre-existed.
Also see my other mail - i think this seccomp code is too tied in to the perf
core and ABI - but this is fixable IMO.
The fundamental notion that a generator subsystem of events can use filter
results as well (such as kernel/trace/trace_syscalls.c.) for its own purposes
is pretty robust though.
Thanks,
Ingo
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-24 20:08 UTC
To: Will Drewry
Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
Russell King, x86, James Morris, Linus Torvalds, Ingo Molnar,
kees.cook, Serge E. Hallyn, Steven Rostedt, Tejun Heo,
Thomas Gleixner, linux-arm-kernel, Michal Marek, Michal Simek,
linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <BANLkTim9UyYAGhg06vCFLxkYPX18cPymEQ@mail.gmail.com>
* Will Drewry <wad@chromium.org> wrote:
> The change avoids defining a new trace call type or a huge number of internal
> changes and hides seccomp.mode=2 from ABI-exposure in prctl, but the attack
> surface is non-trivial to verify, and I'm not sure if this ABI change makes
> sense. It amounts to:
>
> include/linux/ftrace_event.h | 4 +-
> include/linux/perf_event.h | 10 +++++---
> kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
> kernel/seccomp.c | 8 ++++++
> kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
> 5 files changed, 82 insertions(+), 16 deletions(-)
>
> And can be found here: http://static.dataspill.org/perf_secure/v1/
Wow, i'm very impressed how few changes you needed to do to support this!
So, firstly, i don't think we should change perf_tp_event() at all - the
'observer' codepaths should be unaffected.
But there could be a perf_tp_event_ret() or perf_tp_event_check() entry that
code like seccomp which wants to use event results can use.
Also, i'm not sure about the seccomp details and assumptions that were moved
into the perf core. How about passing in a helper function to
perf_tp_event_check(), where seccomp would define its seccomp specific helper
function?
That looks sufficiently flexible. That helper function could be an 'extra
filter' kind of thing, right?
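To make the helper-function idea concrete, here is a minimal userspace sketch, not kernel code. Everything in it is hypothetical: tp_event_mock, extra_filter_fn, tp_event_check_mock() and seccomp_helper_mock() are illustrative names only, and perf_tp_event_check() itself does not exist yet. The generic check just counts filter matches and leaves the verdict to the caller's helper, so the seccomp-specific policy stays out of the generic code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one event attached to a tracepoint. */
struct tp_event_mock {
	bool filter_matched;	/* result of this event's filter predicate */
};

/*
 * Caller-supplied "extra filter": receives the number of matching events
 * and returns a verdict (0 = allow, negative = deny).
 */
typedef int (*extra_filter_fn)(size_t matches, void *ctx);

/*
 * Hypothetical "check" variant of the event call: count the matches,
 * then defer the decision to the helper. Without a helper it behaves
 * like a pure observer and always allows.
 */
static int tp_event_check_mock(const struct tp_event_mock *events, size_t n,
			       extra_filter_fn helper, void *ctx)
{
	size_t matches = 0;

	for (size_t i = 0; i < n; i++)
		if (events[i].filter_matched)
			matches++;

	return helper ? helper(matches, ctx) : 0;
}

/* seccomp-style helper (assumed whitelist model): deny if nothing matched. */
static int seccomp_helper_mock(size_t matches, void *ctx)
{
	(void)ctx;
	return matches ? 0 : -1;
}
```

A caller such as seccomp would pass its own helper, while existing observers would keep using the unchanged entry point and never see a verdict.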
Also, regarding the ABI and the attr.err_on_discard and attr.require_secure
bits, they look a bit too specific as well.
attr.err_on_discard: with the filter helper function passed in this is probably
not needed anymore, right?
attr.require_secure: this is basically used to *force* the creation of
security-controlling filters, right? It seems to me that this could be done via
a seccomp ABI extension as well, without adding this to the perf ABI. That
seccomp call could check whether the right events are created and move the task
to mode 2 only if that prereq is met - or something like that.
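As a rough illustration of that prerequisite check, here is a small userspace sketch. The names task_mock and seccomp_enter_mode2_mock() are made up for this example, and the "at least one filter event exists" test is only an assumed stand-in for whatever "the right events are created" would mean in practice.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-task state, reduced to what the check needs. */
struct task_mock {
	int seccomp_mode;		/* 0 = off, 2 = filter mode */
	size_t nr_filter_events;	/* security-controlling events already created */
};

/*
 * Move the task to mode 2 only if the prerequisite is met.
 * Returns 0 on success, -EINVAL if no filter events exist yet.
 */
static int seccomp_enter_mode2_mock(struct task_mock *t)
{
	if (t->nr_filter_events == 0)
		return -EINVAL;	/* prerequisite not met: leave the mode untouched */

	t->seccomp_mode = 2;
	return 0;
}
```

The point of the sketch is only that the enforcement decision lives in the seccomp call, so nothing like attr.require_secure has to be added to the perf ABI.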
> If there is any interest at all, I can post it properly to this giant
> CC list. [...]
I'd suggest to trim the Cc: list aggressively - anyone interested in the
discussion can pick it up on lkml - and i strongly suspect that most of the Cc:
participants would want to be off the Cc: :-)
Thanks,
Ingo
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-24 19:54 UTC
To: Peter Zijlstra
Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
x86, James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1306254027.18455.47.camel@twins>
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-05-24 at 10:59 -0500, Will Drewry wrote:
> > include/linux/ftrace_event.h | 4 +-
> > include/linux/perf_event.h | 10 +++++---
> > kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
> > kernel/seccomp.c | 8 ++++++
> > kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
> > 5 files changed, 82 insertions(+), 16 deletions(-)
>
> I strongly oppose to the perf core being mixed with any sekurity voodoo
> (or any other active role for that matter).
I'd object to invisible side-effects as well, and vehemently so. But note how
intelligently it's used here: it's explicit in the code, it's used explicitly
in kernel/seccomp.c and the event generation place in
kernel/trace/trace_syscalls.c.
So this is a really flexible solution IMO and does not extend events with some
invisible 'active' role. It extends the *call site* with an open-coded active
role - which active role btw. already pre-existed.
Thanks,
Ingo
* Re: [PATCH v2 5/7] mmc: sdhci: consolidate sdhci-of-esdhc and sdhci-esdhc-imx
From: Wolfram Sang @ 2011-05-24 19:40 UTC
To: Shawn Guo
Cc: Chris Ball, sameo, Arnd Bergmann, patches, devicetree-discuss,
linux-mmc, Saeed Bishara, Xiaobo Xie, kernel, Mike Rapoport,
Olof Johansson, Anton Vorontsov, linuxppc-dev, Albert Herranz,
linux-arm-kernel
In-Reply-To: <1304601778-13837-6-git-send-email-shawn.guo@linaro.org>
On Thu, May 05, 2011 at 09:22:56PM +0800, Shawn Guo wrote:
> This patch is to consolidate SDHCI driver for Freescale eSDHC
> controller found on both MPCxxx and i.MX platforms. It merges
> sdhci-of-esdhc.c into sdhci-esdhc.c, so that the same pair of
> .probe/.remove hook works with eSDHC for two platforms.
>
> As a result, sdhci-of-esdhc.c and sdhci-esdhc.h are removed, and
> header esdhc.h containing the definition of esdhc_platform_data is
> put into the public folder.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
I agree with Anton about not merging the two...
> +#ifndef CONFIG_MMC_SDHCI_ESDHC_IMX
> +#define cpu_is_mx25() (0)
> +#define cpu_is_mx35() (0)
> +#define cpu_is_mx51() (0)
> +#define cpu_is_imx() (0)
> +#else
> +#define cpu_is_imx() (1)
> +#endif
... e.g. that looks a bit frightening.
--
Pengutronix e.K. | Wolfram Sang |
Industrial Linux Solutions | http://www.pengutronix.de/ |
* Re: [PATCH v2 3/7] mmc: sdhci: make sdhci-of device drivers self registered
From: Wolfram Sang @ 2011-05-24 19:32 UTC
To: Shawn Guo
Cc: Chris Ball, sameo, Arnd Bergmann, patches, devicetree-discuss,
linux-mmc, Saeed Bishara, Xiaobo Xie, kernel, Mike Rapoport,
Olof Johansson, Anton Vorontsov, linuxppc-dev, Albert Herranz,
linux-arm-kernel
In-Reply-To: <1304601778-13837-4-git-send-email-shawn.guo@linaro.org>
On Thu, May 05, 2011 at 09:22:54PM +0800, Shawn Guo wrote:
> The patch turns the sdhci-of-core common stuff into helper functions
> added into sdhci-pltfm.c, and makes sdhci-of device drivers self
> registered using the same pair of .probe and .remove used by
> sdhci-pltfm device drivers.
>
> As a result, sdhci-of-core.c and sdhci-of.h can be eliminated with
> those common things merged into sdhci-pltfm.c and sdhci-pltfm.h
> respectively.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
After taking care of Anton's comment and fixing this minor thingie...
> +#ifdef CONFIG_OF
> +static bool sdhci_of_wp_inverted(struct device_node *np)
> +{
> + if (of_get_property(np, "sdhci,wp-inverted", NULL))
> + return true;
> +
> + /* Old device trees don't have the wp-inverted property. */
> +#ifdef CONFIG_PPC
> + return machine_is(mpc837x_rdb) || machine_is(mpc837x_mds);
> +#else
> + return false;
> +#endif
/* CONFIG_PPC */ after #endif
> +}
> +
> +void sdhci_get_of_property(struct platform_device *pdev)
> +{
> + struct device_node *np = pdev->dev.of_node;
> + struct sdhci_host *host = platform_get_drvdata(pdev);
> + struct sdhci_pltfm_host *pltfm_host = sdhci_priv(host);
> + const __be32 *clk;
> + int size;
> +
> + if (of_device_is_available(np)) {
> + if (of_get_property(np, "sdhci,auto-cmd12", NULL))
> + host->quirks |= SDHCI_QUIRK_MULTIBLOCK_READ_ACMD12;
> +
> + if (of_get_property(np, "sdhci,1-bit-only", NULL))
> + host->quirks |= SDHCI_QUIRK_FORCE_1_BIT_DATA;
> +
> + if (sdhci_of_wp_inverted(np))
> + host->quirks |= SDHCI_QUIRK_INVERTED_WRITE_PROTECT;
> +
> + clk = of_get_property(np, "clock-frequency", &size);
> + if (clk && size == sizeof(*clk) && *clk)
> + pltfm_host->clock = be32_to_cpup(clk);
> + }
> +}
> +#else
> +void sdhci_get_of_property(struct platform_device *pdev) {}
> +#endif
/* CONFIG_OF */
you can add
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
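For illustration, a standalone sketch of the requested annotation style. The function name wp_inverted_fallback_mock() is made up here and the body is a placeholder for the quoted machine_is() checks; only the comment after #endif matters.

```c
#include <stdbool.h>

/* Hypothetical helper, standing in for the quoted sdhci_of_wp_inverted(). */
static bool wp_inverted_fallback_mock(void)
{
#ifdef CONFIG_PPC
	return true;	/* placeholder for the machine_is() checks */
#else
	return false;
#endif /* CONFIG_PPC */
}
```

With the closing #endif labelled, it stays obvious which conditional ends there once the block grows or gets nested inside another #ifdef such as CONFIG_OF.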
--
Pengutronix e.K. | Wolfram Sang |
Industrial Linux Solutions | http://www.pengutronix.de/ |
* Re: [PATCH v2 2/7] mmc: sdhci: eliminate sdhci_of_host and sdhci_of_data
From: Wolfram Sang @ 2011-05-24 19:26 UTC
To: Shawn Guo
Cc: Chris Ball, sameo, Arnd Bergmann, patches, devicetree-discuss,
linux-mmc, Saeed Bishara, Xiaobo Xie, kernel, Mike Rapoport,
Olof Johansson, Anton Vorontsov, linuxppc-dev, Albert Herranz,
linux-arm-kernel
In-Reply-To: <1304601778-13837-3-git-send-email-shawn.guo@linaro.org>
On Thu, May 05, 2011 at 09:22:53PM +0800, Shawn Guo wrote:
> The patch migrates the use of sdhci_of_host and sdhci_of_data to
> sdhci_pltfm_host and sdhci_pltfm_data, so that the former pair can
> be eliminated.
>
> Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
> Reviewed-by: Grant Likely <grant.likely@secretlab.ca>
We might be able to get rid of the migrated entries in sdhci-pltfm.h, but this
is a task for later IMO.
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
--
Pengutronix e.K. | Wolfram Sang |
Industrial Linux Solutions | http://www.pengutronix.de/ |
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Will Drewry @ 2011-05-24 19:00 UTC
To: Thomas Gleixner, Peter Zijlstra, Steven Rostedt, Ingo Molnar,
Frederic Weisbecker
Cc: linux-mips, linux-sh, Heiko Carstens, Oleg Nesterov,
David Howells, Paul Mackerras, Ralf Baechle, H. Peter Anvin,
sparclinux, Jiri Slaby, linux-s390, Russell King, x86,
James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
Serge E. Hallyn, Tejun Heo, linux-arm-kernel, Michal Marek,
Michal Simek, linuxppc-dev, linux-kernel, Eric Paris, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <alpine.LFD.2.02.1105241823540.3078@ionos>
On Tue, May 24, 2011 at 11:25 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 24 May 2011, Peter Zijlstra wrote:
>
>> On Tue, 2011-05-24 at 10:59 -0500, Will Drewry wrote:
>> > include/linux/ftrace_event.h | 4 +-
>> > include/linux/perf_event.h | 10 +++++---
>> > kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
>> > kernel/seccomp.c | 8 ++++++
>> > kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
>> > 5 files changed, 82 insertions(+), 16 deletions(-)
>>
>> I strongly oppose to the perf core being mixed with any sekurity voodoo
>> (or any other active role for that matter).
>
> Amen. We have enough crap to cleanup in perf/ftrace already, so we
> really do not need security magic added to it.
Thanks for the quick responses!
I agree, but I'm left a little bit lost now w.r.t. the comments around
reusing the ABI. If perf doesn't make sense (which certainly seems
wrong from a security interface perspective), then the preexisting
ABIs I know of for this case are as follows:
- /sys/kernel/debug/tracing/*
- prctl(PR_SET_SECCOMP* (or /proc/...)
Both would require expansion. The latter was reused by the original
patch series. The former doesn't expose much in the way of per-task
event filtering -- ftrace_pids doesn't translate well to
ftrace_syscall_enter-based enforcement. I'd expect we'd need
ftrace_event_call->task_events (like ->perf_events), and either
explore them in ftrace_syscall_enter or add a new tracepoint handler,
ftrace_task_syscall_enter, via something like TRACE_REG_TASK_REGISTER.
It could then do whatever it wanted with the successful or
unsuccessful matching against predicates, stacking or not, which could
be used for a seccomp-like mechanism. However, bubbling that change
up to the non-existent interfaces in debug/tracing could be a
challenge too (Registration would require an alternate flow like perf
to call TRACE_REG_*? Do they become
tracing/events/subsystem/event/task/<tid>/<filter_string_N>? ...?).
This is all just a matter of programming... but at this point, I'm not
seeing the clear shared path forward. Even with per-task ftrace
access in debug/tracing, that would introduce a reasonably large
change to the system and add a new ABI, albeit in debug/tracing. If
the above (or whatever the right approach is) comes into existence,
then any prctl(PR_SET_SECCOMP) ABI could have the backend
implementation to modify the same data. I'm not putting it like this
to say that I'm designing to be obsolete, but to show that the defined
interface wouldn't conflict if ftrace does overlap more in the future.
Given the importance of a clearly defined interface for security
functionality, I'd be surprised to see all the pieces come together in
the near future in such a way that a transition would be immediately
possible -- I'm not even sure what the ftrace roadmap really is!
Would it be more desirable to put a system call filtering interface on
a miscdev (like /dev/syscall_filter) instead of in /proc or prctl (and
not reuse seccomp at all)? I'm not clear what the onus is to justify
a change in the different ABI areas, but I see system call filtering
as an important piece of system security and would like to determine
if there is a viable path forward, or if this will need to be
revisited in another 2 years.
thanks again!
will
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Thomas Gleixner @ 2011-05-24 16:25 UTC
To: Peter Zijlstra
Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
x86, James Morris, Linus Torvalds, Ingo Molnar, Ingo Molnar,
Serge E. Hallyn, Steven Rostedt, Martin Schwidefsky, kees.cook,
linux-arm-kernel, Michal Marek, Michal Simek, Will Drewry,
linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <1306254027.18455.47.camel@twins>
On Tue, 24 May 2011, Peter Zijlstra wrote:
> On Tue, 2011-05-24 at 10:59 -0500, Will Drewry wrote:
> > include/linux/ftrace_event.h | 4 +-
> > include/linux/perf_event.h | 10 +++++---
> > kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
> > kernel/seccomp.c | 8 ++++++
> > kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
> > 5 files changed, 82 insertions(+), 16 deletions(-)
>
> I strongly oppose to the perf core being mixed with any sekurity voodoo
> (or any other active role for that matter).
Amen. We have enough crap to cleanup in perf/ftrace already, so we
really do not need security magic added to it.
Thanks,
tglx
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Peter Zijlstra @ 2011-05-24 16:20 UTC
To: Will Drewry
Cc: linux-mips, linux-sh, Frederic Weisbecker, Heiko Carstens,
Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
x86, James Morris, Linus Torvalds, Ingo Molnar, Ingo Molnar,
Serge E. Hallyn, Steven Rostedt, Tejun Heo, Thomas Gleixner,
kees.cook, linux-arm-kernel, Michal Marek, Michal Simek,
linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
Martin Schwidefsky, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <BANLkTim9UyYAGhg06vCFLxkYPX18cPymEQ@mail.gmail.com>
On Tue, 2011-05-24 at 10:59 -0500, Will Drewry wrote:
> include/linux/ftrace_event.h | 4 +-
> include/linux/perf_event.h | 10 +++++---
> kernel/perf_event.c | 49 +++++++++++++++++++++++++++++++++++++---
> kernel/seccomp.c | 8 ++++++
> kernel/trace/trace_syscalls.c | 27 +++++++++++++++++-----
> 5 files changed, 82 insertions(+), 16 deletions(-)
I strongly oppose to the perf core being mixed with any sekurity voodoo
(or any other active role for that matter).
* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Will Drewry @ 2011-05-24 15:59 UTC
To: Steven Rostedt, Ingo Molnar, Peter Zijlstra, Frederic Weisbecker
Cc: linux-mips, linux-sh, Heiko Carstens, Oleg Nesterov,
David Howells, Paul Mackerras, Ralf Baechle, H. Peter Anvin,
sparclinux, Jiri Slaby, linux-s390, Russell King, x86,
James Morris, Linus Torvalds, Ingo Molnar, kees.cook,
Serge E. Hallyn, Tejun Heo, Thomas Gleixner, linux-arm-kernel,
Michal Marek, Michal Simek, linuxppc-dev, linux-kernel,
Eric Paris, Paul Mundt, Martin Schwidefsky, linux390,
Andrew Morton, agl, David S. Miller
In-Reply-To: <BANLkTiki8aQJbFkKOFC+s6xAEiuVyMM5MQ@mail.gmail.com>
On Thu, May 19, 2011 at 4:05 PM, Will Drewry <wad@chromium.org> wrote:
> On Thu, May 19, 2011 at 7:22 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>> On Wed, 2011-05-18 at 21:07 -0700, Will Drewry wrote:
>>
>>> Do event_* that return non-void exist in the tree at all now? I've
>>> looked at the various tracepoint macros as well as some of the other
>>> handlers (trace_function, perf_tp_event, etc) and I'm not seeing any
>>> places where a return value is honored nor could be. At best, the
>>> perf_tp_event can be short-circuited in the hlist_for_each, but
>>> it'd still need a way to bubble up a failure and result in not calling
>>> the trace/event that the hook precedes.
>>
>> No, none of the current trace hooks have return values. That was what I
>> was talking about how to implement in my previous emails.
>
> Led on by complete ignorance, I think I'm finally beginning to untwist
> the different pieces of the tracing infrastructure. Unfortunately,
> that means I took a few wrong turns along the way...
>
> I think function tracing looks something like this:
>
> ftrace_call has been injected at a specific callsite. Upon hit:
> 1. ftrace_call triggers
> 2. does some checks then calls ftrace_trace_function (via mcount instrs)
> 3. ftrace_trace_function may be a single func or a list. For a list it
> will be: ftrace_list_func
> 4. ftrace_list_func calls each registered hook for that function in a
> while loop ignoring return values
> 5. registered hook funcs may then track the call, farm it out to
> specific sub-handlers, etc.
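Step 4 of that flow can be pictured with a tiny userspace mock. This is only a sketch under assumptions: hook_ops_mock, hook_hits_mock, count_hook_mock() and list_func_mock() are hypothetical names, not the kernel's ftrace structures. The hooks return void, which is exactly why nothing can be propagated back to the call site.

```c
#include <stddef.h>

/* Hypothetical hook list entry; the hook returns void. */
struct hook_ops_mock {
	void (*func)(unsigned long ip, unsigned long parent_ip);
	struct hook_ops_mock *next;
};

static int hook_hits_mock;	/* counts how many hooks have run */

/* Example hook: only records that it was called. */
static void count_hook_mock(unsigned long ip, unsigned long parent_ip)
{
	(void)ip;
	(void)parent_ip;
	hook_hits_mock++;
}

/* Walk the list in a while loop and call every registered hook. */
static void list_func_mock(struct hook_ops_mock *list,
			   unsigned long ip, unsigned long parent_ip)
{
	struct hook_ops_mock *op = list;

	while (op) {
		op->func(ip, parent_ip);	/* no return value to inspect */
		op = op->next;
	}
}
```

Every hook always runs and none of them can veto the traced call, so this path offers no natural place to hang a policy decision.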
>
> This seems to be a red herring for my use case :/ though this helped
> me understand your back and forth (Ingo & Steve) regarding dynamic
> versus explicit events.
>
>
> System call tracing is done via kernel/tracepoint.c events fed in via
> arch/[arch]/kernel/ptrace.c where it calls trace_sys_enter. This
> yields direct sys_enter and sys_exit event sources (and an event class
> to hook up per-system call events). This means that
> ftrace_syscall_enter() does the event prep prior to doing a filter
> check comparing the ftrace_event object for the given syscall_nr to
> the event data. perf_sysenter_enter() is similar but it pushes the
> info over to perf_tp_event to be matched not against the global
> syscall event entry, but against any sub-event in the linked list on
> that syscall's event.
>
> Is that roughly an accurate representation of the two? I wish I
> hadn't gotten distracted along the function path, but at least I
> learned something (and it is relevant to the larger scope of this
> thread's discussion).
>
>
> After doing that digging, it looks like providing hook call
> pre-emption and return value propagation will be a unique and fun task
> for each tracer and event subsystem. If I look solely at tracepoints,
> a generic change would be to make the trace_##subsys function return
> an int (which I think was the event_vfs_getname proposal?). The other
> option is to change the trace_sys_enter proto to include a 'int
> *retcode'.
>
> That change would allow the propagation of some sort of policy
> information. To put it to use, seccomp mode 1 could be implemented on
> top of trace_syscalls. The following changes would need to happen:
> 1. dummy metadata should be inserted for all unhooked system calls
> 2. perf_trace_buf_submit would need to return an int or a new
> TRACE_REG_SECCOMP_REGISTER handler would need to be setup in
> syscall_enter_register.
> 3. If perf is abused, a "kill/fail_on_discard" bit would be added to
> event->attrs.
> 4. perf_trace_buf_submit/perf_tp_event will return 0 for no matches,
> 'n' for the number of event matches, and -EACCES/? if a
> 'fail_on_discard' event is seen.
> 5. perf_syscall_enter would set *retcode = perf_trace_buf_submit()'s retcode
> 6. trace_sys_enter() would need to be moved to be the first entry in
> arch/../kernel/ptrace.c for incoming syscalls
> 7. if trace_sys_enter() yields a negative return code, then
> do_exit(SIGKILL) the process and return.
>
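The return-code convention in steps 4, 5 and 7 can be sketched in plain userspace C. This is an illustration under stated assumptions, not the eventual implementation: filter_event_mock, buf_submit_mock() and syscall_should_kill_mock() are hypothetical names, and -EACCES is used because the outline leaves the exact error value open.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-event state for one syscall. */
struct filter_event_mock {
	bool filter_matched;	/* did this event's filter accept the call? */
	bool fail_on_discard;	/* proposed attr bit from step 3 */
};

/*
 * Step 4: return 0 for no matches, n for n matches, and -EACCES as soon
 * as an event with fail_on_discard set has discarded the call.
 */
static int buf_submit_mock(const struct filter_event_mock *ev, size_t n)
{
	int matches = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].filter_matched)
			matches++;
		else if (ev[i].fail_on_discard)
			return -EACCES;
	}
	return matches;
}

/*
 * Steps 5 and 7: the syscall-entry hook stores the result in *retcode;
 * a negative value tells the caller to kill the task.
 */
static bool syscall_should_kill_mock(const struct filter_event_mock *ev,
				     size_t n, int *retcode)
{
	*retcode = buf_submit_mock(ev, n);
	return *retcode < 0;
}
```

In this mock the outcome does not depend on event order, but how many events get evaluated before the early return does, which is the first-come-first-serve behaviour the outline worries about.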
> Entering into seccomp mode 1 would require adding a "0" filter for
> every system call number (which is why we need a dummy event call for
> them since failing to check the bitmask can't be flagged
> fail_on_discard) with the fail_on_discard bit. For the three calls
> that are allowed, a '1' filter would be set.
>
> That would roughly implement seccomp mode 1. It's pretty ugly and the
> fact that every system call that's disallowed has to be blacklisted is
> not ideal. An alternate model would be to just use the seccomp mode
> as we do today and let secure_computing() handle the return code of "#
> of matches". If the # of matches is 0, it terminates. A
> 'fail_on_discard' bit then would only be good to stop further
> tracepoint callback evaluation. This approach would also *almost* nix
> the need to provide dummy syscall hooks. (Since sigreturn isn't
> hooked on x86 because it uses ptregs fixup, a dummy would still be
> needed to apply a "1" filter to.)
>
> Even with that tweak to move to a whitelist model, the perf event
> evaluation and tracepoint callback ordering is still not guaranteed.
> Without changing tracepoint itself, all other TPs will still execute.
> And for perf events, it'll be first-come-first-serve until a
> fail_on_discard is hit.
>
> After using seccomp mode=1 as the sample case to reduce scope, it's
> possible to ignore it for now :) and look at the seccomp-filter/mode=2
> case. The same mechanism could be used to inject more expressive
> filters. This would be done over the sys_perf_event_open() interface
> assuming the new attr is added to stop perf event list enumeration.
> Assuming a whitelist model, a call to prctl(PR_SET_SECCOMP, 2) would
> be needed (maybe with the ON_EXEC flag option too to mirror the
> event->attr on-exec bit). That would yield the ability to register
> perf events for system calls then use ioctl() to SET_FILTER on them.
>
> Reducing the privileges of the filters after installation could be
> done with another attribute bit like 'set_filter_ands'. If that is
> also set on the event, and a filter is installed to ensure that
> sys_perf_event_open is blocked, then it should be reasonably sane.
>
> I'd need to add a PERF_EVENT_IOC_GET_FILTER handler to allow
> extracting the settings.
>
> Clearly, I haven't written the code for that yet, though making the
> change for a single platform shouldn't be too much code.
>
> So that leaves me with some questions:
> - Is this the type of reuse that was being encouraged?
> - Does it really make sense to cram this through the perf interface
> and events? While the changed attributes are innocuous and
> potentially reusable, it seems that a large amount of the perf
> facilities are exposed that could have weird side effects, but I'm
> sure I still don't fully grok the perf infrastructure.
> - Does extending one tracepoint to provide return values via a pointer
> make sense? I'd hesitate to make all tracepoint hooks return an int by
> default.
> - If all tracepoints returned an int, what would the standard value
> look like - or would it need to be per tracepoint impl?
> - How is ambiguity resolved if multiple perf_events are registered for
> a syscall with different filters? Maybe a 'stop_on_match'? though
> ordering is still a problem then.
> - Is there a way to affect a task-wide change without a seccomp flag
> (or new task_struct entry) via the existing sys_perf_event_open
> syscall? I considered suggesting an attr bit called 'secure_computing'
> such that, when an event with the bit is enabled, it automagically forces
> the task into seccomp enforcement mode, but that, again, seemed
> hackish.
>
> While I like the idea of sharing the tracepoints infrastructure and
> the trace_syscalls hooks as well as using a pre-existing interface
> with very minor changes, I'm worried that the complexity of the
> interface and of the infrastructure might undermine the ability to
> continue meeting the desired security goals. I had originally stayed
> very close to the seccomp world because of how trivial it is to review
> the code and verify its accuracy/effectiveness. This approach leaves
> a lot of gaps for kernel code to seep through and a fair amount of
> ambiguity in what locked down syscall filters might look like.
>
> To me, the best part of the above is that it shows that even if we go
> with a prctl SET_SECCOMP_FILTER-style interface, it is completely
> certain that if a perf/ftrace-based security infrastructure is in our
> future, it will be entirely compatible -- even if the prctl()
> interface is just the "simpler" interface at that point somewhere down
> the road.
>
>
> Regardless, I'll hack up a proof of concept based on the outline
> above. Perhaps something more elegant will emerge once I start
> tweaking the source, but I'm still seeing too many gaps to be
> comfortable so far.
>
>
> [[There is a whole other approach to this too. We could continue with
> the prctl() interface and mirror the trace_sys_enter model for
> secure_computing(). Instead of keeping a seccomp_t-based hlist of
> events, they could be stored in a new hlist for seccomp_events in
> struct ftrace_event_call. The ftrace filters would be installed there
> and the seccomp_syscall_enter() function could do the checks and pass
> up some state data on the task's seccomp_t that indicates it needs to
> do_exit(). That would likely reduce the amount of code in
> seccomp_filter.c pretty seriously (though not entirely
> beneficially).]]
>
>
> Thanks again for all the feedback and insights! I really hope we can
> come to an agreeable approach for implementing kernel attack surface
> reduction.
Just a quick follow up. I have a PoC of the perf-based system call
filtering code in place which uses seccomp.mode as a task_context
flag, adds require_secure and err_on_discard perf_event_attrs, and
enforces that these new events terminate the process in perf_syscall_enter
via a return value on perf_buf_submit. This lets a call like:
LD_LIBRARY_PATH=/host/home/wad/kernel.uml/tests/ \
/host/home/wad/kernel.uml/tests/perf record \
-S \
-e 'syscalls:sys_enter_access' \
-e 'syscalls:sys_enter_brk' \
-e 'syscalls:sys_enter_close' \
-e 'syscalls:sys_enter_exit_group' \
-e 'syscalls:sys_enter_fcntl64' \
-e 'syscalls:sys_enter_fstat64' \
-e 'syscalls:sys_enter_getdents64' \
-e 'syscalls:sys_enter_getpid' \
-e 'syscalls:sys_enter_getuid' \
-e 'syscalls:sys_enter_ioctl' \
-e 'syscalls:sys_enter_lstat64' \
-e 'syscalls:sys_enter_mmap_pgoff' \
-e 'syscalls:sys_enter_mprotect' \
-e 'syscalls:sys_enter_munmap' \
-e 'syscalls:sys_enter_open' \
-e 'syscalls:sys_enter_read' \
-e 'syscalls:sys_enter_stat64' \
-e 'syscalls:sys_enter_time' \
-e 'syscalls:sys_enter_newuname' \
-e 'syscalls:sys_enter_write' --filter "fd == 1" \
/bin/ls
run successfully while omitting a syscall or passing in --filter
"fd == 0" properly terminates it (ignoring the need for execve and
set_thread_area for PoC purposes).
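The policy that the perf invocation above expresses can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: struct rule, call_allowed() and demo_rules are hypothetical names for illustration, not part of the PoC, and the real enforcement happens in the kernel's perf syscall-enter path rather than in a lookup loop like this one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of the allow-list; not the PoC's code. */
struct rule {
    const char *syscall;   /* event name without the sys_enter_ prefix */
    bool has_fd_filter;    /* true if an "fd == N" filter is attached */
    long fd;               /* N, meaningful only when has_fd_filter is set */
};

/* Returns true if the call is allowed, false if the task would be killed. */
static bool call_allowed(const struct rule *rules, size_t n,
                         const char *syscall, long fd)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(rules[i].syscall, syscall) != 0)
            continue;
        /* Event is enabled; a filter, if present, must also match. */
        return !rules[i].has_fd_filter || rules[i].fd == fd;
    }
    return false; /* no enabled event for this syscall */
}

/* A three-entry excerpt of the list passed to perf record above. */
static const struct rule demo_rules[] = {
    { "read",  false, 0 },
    { "close", false, 0 },
    { "write", true,  1 },   /* --filter "fd == 1" */
};
```

With these rules a write to fd 1 passes, a write to any other fd is rejected, and a syscall that has no enabled event at all (execve, for instance) is rejected as well.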
The change avoids defining a new trace call type or a huge number of
internal changes and hides seccomp.mode=2 from ABI-exposure in prctl,
but the attack surface is non-trivial to verify, and I'm not sure if
this ABI change makes sense. It amounts to:
 include/linux/ftrace_event.h  |    4 +-
 include/linux/perf_event.h    |   10 +++++---
 kernel/perf_event.c           |   49 +++++++++++++++++++++++++++++++++++++---
 kernel/seccomp.c              |    8 ++++++
 kernel/trace/trace_syscalls.c |   27 +++++++++++++++++-----
 5 files changed, 82 insertions(+), 16 deletions(-)
And can be found here: http://static.dataspill.org/perf_secure/v1/
If there is any interest at all, I can post it properly to this giant
CC list. However, it's unclear to me where this thread stands. I
also see a completely independent path for implementing system call
filtering now, but I'll hold off posting that patch until I see where
this goes.
Any guidance will be appreciated - thanks again!
will
^ permalink raw reply
* [PATCH 1/1] powerpc: Update MAX_HCALL_OPCODE to reflect page coalescing
From: Brian King @ 2011-05-24 13:40 UTC (permalink / raw)
To: benh; +Cc: brking, linuxppc-dev
When page coalescing support was added recently, the MAX_HCALL_OPCODE
define was not updated for the newly added H_GET_MPP_X hcall.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/hvcall.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN arch/powerpc/include/asm/hvcall.h~powerpc_max_hcall_opcode arch/powerpc/include/asm/hvcall.h
--- linux-2.6/arch/powerpc/include/asm/hvcall.h~powerpc_max_hcall_opcode 2011-05-20 09:33:45.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/include/asm/hvcall.h 2011-05-20 09:33:58.000000000 -0500
@@ -236,7 +236,7 @@
#define H_HOME_NODE_ASSOCIATIVITY 0x2EC
#define H_BEST_ENERGY 0x2F4
#define H_GET_MPP_X 0x314
-#define MAX_HCALL_OPCODE H_BEST_ENERGY
+#define MAX_HCALL_OPCODE H_GET_MPP_X
#ifndef __ASSEMBLY__
_
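To illustrate why a stale MAX_HCALL_OPCODE matters, here is a small sketch. It assumes that some per-hcall table is sized from MAX_HCALL_OPCODE with one slot per opcode and opcodes spaced 4 apart; table_size() and opcode_fits() are hypothetical helpers, and only the two opcode values are taken from the patch above.

```c
#include <stdbool.h>

#define H_BEST_ENERGY 0x2F4   /* old value of MAX_HCALL_OPCODE */
#define H_GET_MPP_X   0x314   /* new value of MAX_HCALL_OPCODE */

/* Assumption: one table slot per opcode, opcodes are multiples of 4. */
static unsigned int table_size(unsigned int max_opcode)
{
    return (max_opcode >> 2) + 1;
}

/* True if the slot for this opcode lies inside a table sized for max_opcode. */
static bool opcode_fits(unsigned int opcode, unsigned int max_opcode)
{
    return (opcode >> 2) < table_size(max_opcode);
}
```

Under that assumption, H_GET_MPP_X would index past the end of a table sized for H_BEST_ENERGY, and fits once the define is updated.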
^ permalink raw reply
* Re: PCI DMA to user mem on mpc83xx
From: Andre Schwarz @ 2011-05-24 10:02 UTC (permalink / raw)
To: David Laight; +Cc: LinuxPPC List, Ira W. Snyder
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6D8AD4B@saturn3.aculab.com>
David,
>> we have a pretty old PCI device driver here that needs some
>> basic rework running on 2.6.27 on several MPC83xx.
>> It's a simple char-device with "give me some data" implemented
>> using read() resulting in zero-copy DMA to user mem.
>>
>> There's get_user_pages() working under the hood along with
>> SetPageDirty() and page_cache_release().
> Does that dma use the userspace virtual address, or the
> physical address - or are you remapping the user memory into
> kernel address space?
no mapping at all AFAIK.
I'm using get_user_pages() followed by allocation of a struct
scatterlist being filled with sg_set_page().
After the transfer the pages are marked dirty using SetPageDirty().
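The per-page bookkeeping behind that scheme can be sketched in user space. This is not the driver's code: struct seg and build_segs() are hypothetical, a 4 KiB page size is assumed, and in the kernel each computed (page, length, offset) triple would be what gets passed to sg_set_page() for the corresponding pinned page.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumption: 4 KiB pages */

struct seg {
    size_t page_idx;   /* index into the array of pinned pages */
    size_t offset;     /* byte offset within that page */
    size_t length;     /* bytes covered in that page */
};

/* Fills segs[] (capacity max) and returns the number of segments,
 * or 0 if the buffer is empty or does not fit into max entries. */
static size_t build_segs(uintptr_t uaddr, size_t len,
                         struct seg *segs, size_t max)
{
    size_t n = 0;
    size_t off = uaddr % PAGE_SZ;

    while (len > 0) {
        size_t chunk = PAGE_SZ - off;

        if (chunk > len)
            chunk = len;
        if (n == max)
            return 0;
        segs[n].page_idx = n;
        segs[n].offset = off;
        segs[n].length = chunk;
        n++;
        len -= chunk;
        off = 0;  /* only the first page can start mid-page */
    }
    return n;
}
```

A buffer that starts 100 bytes into a page and spans 8192 bytes therefore needs three entries: a partial first page, one full page, and a 100-byte tail.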
> If the memory is remapped into the kernel address space, the
> cost of the mmu and tlb operations (especially on MP systems)
> is such that a dma to kernel memory followed by copyout/copytouser
> may well be faster!
no mapping.
> That may even be the case even if the dma is writing to the
> user virtual (or physical) addresses when it is only
> necessary to ensure the memory page is resident and that
> the caches are coherent.
All I need is physical addresses of user mem.
Since the allocating user driver is using mlock() and there's no swap I
expect to be safe ... is this a stupid assumption ?
> In any case the second copy is probably far faster than the
> PCI one!
huh - I observed memcpy() to be very expensive (at least on 83xx PowerPC).
> I've recently written a driver that supports a pread/pwrite interface
> to the memory windows on a PCIe card. It was important to use
> dma for the PCIe transfers (to get a sensible transfer size).
> I overlapped the copyin/copyout with the next dma transfer.
> The dma's are fast enough that it is worth spinning waiting
> for completion - but slow enough to make the overlapped
> operation worthwhile (same speed as a single word pio transfer).
Thanks for your feedback.
Cheers,
André
MATRIX VISION GmbH, Talstrasse 16, DE-71570 Oppenweiler
Registergericht: Amtsgericht Stuttgart, HRB 271090
Geschaeftsfuehrer: Gerhard Thullner, Werner Armingeon, Uwe Furtner
^ permalink raw reply
* Re: PCI DMA to user mem on mpc83xx
From: Andre Schwarz @ 2011-05-24 9:47 UTC (permalink / raw)
To: Ira W. Snyder; +Cc: LinuxPPC List
In-Reply-To: <20110523172727.GA21717@ovro.caltech.edu>
Ira,
> On Mon, May 23, 2011 at 11:12:41AM +0200, Andre Schwarz wrote:
>> Ira,
>>
>> we have a pretty old PCI device driver here that needs some basic rework
>> running on 2.6.27 on several MPC83xx.
>> It's a simple char-device with "give me some data" implemented using
>> read() resulting in zero-copy DMA to user mem.
>>
>> There's get_user_pages() working under the hood along with
>> SetPageDirty() and page_cache_release().
>>
>> Main goal is to prepare a sg-list that gets fed into a DMA controller.
>>
>> I wonder if there's a more up-to-date/efficient and future proof scheme
>> of creating the mapping.
>>
>>
>> Could you provide some pointers or would you stick to the current scheme ?
>>
> This scheme is the best you'll come up with for zero-copy IO. I used
> get_user_pages_fast(), but otherwise my implementation was the same.
> These interfaces should be fairly future proof.
excellent - thanks.
Will stick to it then ...
> In the end, I realized that most of my transfers were 4 bytes in length,
> and zero copy IO was a waste of effort. I decided to use mmap instead.
>
I'm using 98% page sized (4KiB) scatter gather transfers summing up to
~80MiB/sec sustained throughput.
Cheers,
André
^ permalink raw reply
* Re: [PATCH] oprofile, powerpc: Handle events that raise an exception without overflowing
From: Robert Richter @ 2011-05-24 9:27 UTC (permalink / raw)
To: Eric B Munson
Cc: oprofile-list@lists.sf.net, paulus@samba.org,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <1306160560-5309-1-git-send-email-emunson@mgebm.net>
On 23.05.11 10:22:40, Eric B Munson wrote:
> Commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 fixes a situation on POWER7
> where events can roll back if a speculative event doesn't actually complete.
> This can raise a performance monitor exception. We need to catch this to ensure
> that we reset the PMC. In all cases the PMC will be less than 256 cycles from
> overflow.
>
> This patch lifts Anton's fix for the problem in perf and applies it to oprofile
> as well.
>
> Signed-off-by: Eric B Munson <emunson@mgebm.net>
> Cc: <stable@kernel.org> # as far back as it applies cleanly
> ---
> arch/powerpc/oprofile/op_model_power4.c | 24 +++++++++++++++++++++++-
> 1 files changed, 23 insertions(+), 1 deletions(-)
I applied the fix to oprofile/urgent.
Thanks Eric,
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center
^ permalink raw reply
* RE: PCI DMA to user mem on mpc83xx
From: David Laight @ 2011-05-24 8:15 UTC (permalink / raw)
To: Andre Schwarz, Ira W. Snyder; +Cc: LinuxPPC List
In-Reply-To: <4DDA2509.6070702@matrix-vision.de>
> we have a pretty old PCI device driver here that needs some
> basic rework running on 2.6.27 on several MPC83xx.
> It's a simple char-device with "give me some data" implemented
> using read() resulting in zero-copy DMA to user mem.
>
> There's get_user_pages() working under the hood along with
> SetPageDirty() and page_cache_release().
Does that dma use the userspace virtual address, or the
physical address - or are you remapping the user memory into
kernel address space?
If the memory is remapped into the kernel address space, the
cost of the mmu and tlb operations (especially on MP systems)
is such that a dma to kernel memory followed by copyout/copytouser
may well be faster!
That may even be the case even if the dma is writing to the
user virtual (or physical) addresses when it is only
necessary to ensure the memory page is resident and that
the caches are coherent.
In any case the second copy is probably far faster than the
PCI one!
I've recently written a driver that supports a pread/pwrite interface
to the memory windows on a PCIe card. It was important to use
dma for the PCIe transfers (to get a sensible transfer size).
I overlapped the copyin/copyout with the next dma transfer.
The dma's are fast enough that it is worth spinning waiting
for completion - but slow enough to make the overlapped
operation worthwhile (same speed as a single word pio transfer).
David
^ permalink raw reply
* Re: Kernel cannot see PCI device
From: Prashant Bhole @ 2011-05-24 4:55 UTC (permalink / raw)
To: Bjorn Helgaas, Benjamin Herrenschmidt
Cc: linux-pci@vger.kernel.org, linuxppc-dev
In-Reply-To: <BANLkTi=bs31L=C+XWGeXG49xuFVzt+omgA@mail.gmail.com>
Hi,
On Fri, May 20, 2011 at 4:49 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Thu, May 19, 2011 at 5:12 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
>> On Thu, 2011-05-19 at 11:58 -0600, Bjorn Helgaas wrote:
>>> The scan below PCIX0 (bus 0001:00) doesn't find anything. You really
>>> need a powerpc expert to help here, but in their absence, my guess
>>> would be something's wrong with config space access, so I would start
>>> by just adding some printks to ppc4xx_probe_pcix_bridge() to see if
>>> the rsrc_cfg address looks reasonable. You might need a chip spec or
>>> maybe you can compare it to the device tree (I have no idea what the
>>> relation between the device tree and OF is).
>>>
>>> You mentioned the u-boot "pci 2" command earlier. It found a device
>>> on bus 2, which means there must be at least one P2P bridge to get you
>>> from bus 0 to bus 2. So the output of "pci 0", "pci 1", "pci 80", and
>>> "pci 81" (to compare with what Linux found) would be interesting.
>>
>> Well, if it's PCIe, there's the "virtual" P2P bridge of the root
>> complex.
>>
>> The question is on what PCIe is his device connected, the one that we
>> see or the one that's disabled in the device-tree.
>
> I *think* the device Prashant is looking for ("02.00.00   0x1000
> 0x0072     Mass storage controller 0x00") is below the PCI-X bridge;
> at least the canyonlands.dts he posted says that PCIX0 leads to buses
> 0-3f.
>
Fixed the problem by soft resetting the PCIe port in the function
ppc460ex_pciex_port_init_hw().
Is it a right thing to do?
Following is the patch for kernel 2.6.38.4:
--------------------------------------------------------------------------------------
--- linux-2.6.38.4/arch/powerpc/sysdev/ppc4xx_pci.c.orig 2011-05-24 10:02:38.000000000 +0530
+++ linux-2.6.38.4/arch/powerpc/sysdev/ppc4xx_pci.c 2011-05-24 10:07:17.000000000 +0530
@@ -876,6 +876,20 @@
u32 val;
u32 utlset1;
+ switch (port->index)
+ {
+ case 0:
+ mtdcri(SDR0, PESDR0_460EX_PHY_CTL_RST, 0x0);
+ mdelay(10);
+ break;
+ case 1:
+ mtdcri(SDR0, PESDR1_460EX_PHY_CTL_RST, 0x0);
+ mdelay(10);
+ break;
+ default:
+ break;
+ }
+
if (port->endpoint)
val = PTYPE_LEGACY_ENDPOINT << 20;
else
--------------------------------------------------------------------------------------
- Prashant
^ permalink raw reply
* RE: [PATCH 0/1] ppc4xx: Fix PCIe scanning for the 460SX
From: Benjamin Herrenschmidt @ 2011-05-24 3:40 UTC (permalink / raw)
To: Tirumala Marri; +Cc: Paul Mackerras, Ayman El-Khashab, linuxppc-dev
In-Reply-To: <6a09a426f126fb315b773fc207d68eb9@mail.gmail.com>
On Thu, 2011-05-12 at 11:16 -0700, Tirumala Marri wrote:
> So what is the best way to handle this? It appears (based
> on the comments of others and my own experience) that there
> is no DCR that exists and behaves the way that previous SOCs
> behaved to give us the link status?
> The register above
> PECFGn_DLLSTA is actually in the PCIe configuration space so
> we would have to map that in to be able to read that
> register during the link check. Is that correct or ok?
> [marri] yes, you need to program a DCR register to access these local
> PCIE_CFG registers.
We do I think, tho we might have to re-order some stuff. I'm facing a
similar issue with some internal design here, I'm thinking of adding a
new hook to the ppc4xx_pciex_hwops for link checking to replace the SDR
business.
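Roughly what I have in mind, as a stand-alone sketch rather than actual ppc4xx_pci.c code. Everything here is hypothetical: struct port, struct hwops, the check_link member and the two check functions are made-up stand-ins, and the booleans merely simulate what the SDR-based and config-space-based link checks would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCIe port; the flags simulate hardware state. */
struct port {
    int index;
    bool sdr_link_up;   /* what the SDR-based check would see */
    bool cfg_link_up;   /* what a config-space based check would see */
};

/* Hypothetical per-SoC ops table with an optional link-check hook. */
struct hwops {
    bool (*check_link)(const struct port *p);   /* NULL: use generic check */
};

static bool sdr_check_link(const struct port *p) { return p->sdr_link_up; }
static bool cfg_check_link(const struct port *p) { return p->cfg_link_up; }

/* Prefer the SoC-specific hook, fall back to the generic SDR check. */
static bool port_link_up(const struct hwops *ops, const struct port *p)
{
    if (ops->check_link)
        return ops->check_link(p);
    return sdr_check_link(p);
}
```

SoCs that leave the hook unset keep today's behaviour, while a part whose SDR status is unusable can supply its own check.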
The interesting question of course is whether that 460SX stuff is the
same as what we're using internally :-)
> I've communicated with some people over email and they had
> tried the (PESDRn_HSSLySTS) register. Recognizing that
> there exists one of these for each port/lane, is there a way
> to use this one? It is in the indirect DCR space. I'd
> tried this myself and never did get it to do anything but I
> could have been looking at the wrong lane or something.
> [marri] This is at the SERDES level. This link being up doesn't necessarily
> mean the overall stack is up. This is mostly used for BIST and diagnostics.
>
> Lastly, what was the reason for forcing the original code to
> be GEN-1 speeds?
> [marri] Gen-2 needs some extra checks compared to Gen-1.
> There were not many Gen-2 devices at the time of submission
> to test them.
Can we fix that ?
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table
From: Benjamin Herrenschmidt @ 2011-05-24 2:52 UTC (permalink / raw)
To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <20110523183100.36c54904@schlenkerla.am.freescale.net>
On Mon, 2011-05-23 at 18:31 -0500, Scott Wood wrote:
> On Tue, 24 May 2011 06:51:01 +1000
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> > Is your linear mapping bolted ? If it is you may be able to cut out most
> > of the save/restore stuff (SRR0,1, ...) since with a normal walk you
> > won't take nested misses.
>
> It is bolted -- we ignore anything we can't map with 16 entries. The only
> semi-realistic case I can think of where we might bump into that (and thus
> want non-bolted memory), especially with more than negligible loss compared
> to the size of memory, is AMP with a non-zero start address where we have
> to stick with smaller pages due to alignment. Even so, 16 times the
> alignment of the start of RAM doesn't seem that unreasonable a limit. The
> 32-bit limit of 3 entries for lowmem is a bit more troublesome there.
Ok so in this case, it might be worth doing a separate set of TLB miss
handlers without all the context save/restore business... would also
make re-entrant CRITs and MCs easier to deal with.
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 2/7] powerpc/mm: 64-bit 4k: use a PMD-based virtual page table
From: Scott Wood @ 2011-05-23 23:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1306183861.7481.208.camel@pasglop>
On Tue, 24 May 2011 06:51:01 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> Is your linear mapping bolted ? If it is you may be able to cut out most
> of the save/restore stuff (SRR0,1, ...) since with a normal walk you
> won't take nested misses.
It is bolted -- we ignore anything we can't map with 16 entries. The only
semi-realistic case I can think of where we might bump into that (and thus
want non-bolted memory), especially with more than negligible loss compared
to the size of memory, is AMP with a non-zero start address where we have
to stick with smaller pages due to alignment. Even so, 16 times the
alignment of the start of RAM doesn't seem that unreasonable a limit. The
32-bit limit of 3 entries for lowmem is a bit more troublesome there.
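As a back-of-the-envelope sketch of that limit: the helpers below are hypothetical and assume every bolted entry uses the same page size, taken as the largest power of two the start of RAM is aligned to (capped at a maximum). Real TLB page-size restrictions are ignored.

```c
#include <stdint.h>

#define BOLTED_ENTRIES 16u

/* Largest power-of-two size that the start address is aligned to.
 * A start of 0 is aligned to everything, so fall back to max_page. */
static uint64_t start_alignment(uint64_t start, uint64_t max_page)
{
    uint64_t a = start & (~start + 1);   /* isolate the lowest set bit */

    if (a == 0 || a > max_page)
        return max_page;
    return a;
}

/* Memory reachable with BOLTED_ENTRIES entries of that size. */
static uint64_t bolted_coverage(uint64_t start, uint64_t max_page)
{
    return BOLTED_ENTRIES * start_alignment(start, max_page);
}
```

For example, RAM starting at 256 MiB would give 16 x 256 MiB = 4 GiB of bolted coverage, while a 16 MiB start alignment would cap it at 256 MiB.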
-Scott
^ permalink raw reply
* [PATCH v5] powerpc: Force page alignment for initrd reserved memory
From: Dave Carroll @ 2011-05-23 22:54 UTC (permalink / raw)
To: 'Milton Miller'; +Cc: Paul Mackerras, LPPC, LKML
In-Reply-To: <initrd-align-withdrawn@mdm.bga.com>
When using 64K pages with a separate cpio rootfs, U-Boot will align
the rootfs on a 4K page boundary. When the memory is reserved, and
subsequent early memblock_alloc is called, it will allocate memory
between the 64K page alignment and reserved memory. When the reserved
memory is subsequently freed, it is done so by pages, causing the
early memblock_alloc requests to be re-used, which in my case, caused
the device-tree to be clobbered.
This patch forces the reserved memory for initrd to be kernel page
aligned, and adds the same range extension when freeing initrd. It
will also move the device tree if it overlaps with the reserved memory
for initrd.
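The effect of the alignment can be checked with a small user-space sketch. The helper names are hypothetical, the addresses in the note below are made up for illustration, and the overlap test uses a half-open interval, which is slightly stricter than the `<=` comparison in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Power-of-two alignment helpers, same effect as the kernel's
 * _ALIGN_DOWN/_ALIGN_UP when size is a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
    return addr & ~(size - 1);
}

static uint64_t align_up(uint64_t addr, uint64_t size)
{
    return (addr + size - 1) & ~(size - 1);
}

/* True if [start, start + len) intersects the page-aligned initrd range. */
static bool overlaps_aligned_initrd(uint64_t start, uint64_t len,
                                    uint64_t initrd_start, uint64_t initrd_end,
                                    uint64_t page_size)
{
    uint64_t lo = align_down(initrd_start, page_size);
    uint64_t hi = align_up(initrd_end, page_size);

    return start < hi && start + len > lo;
}
```

With 64K pages, an initrd loaded at 0x1F01000-0x1F45000 expands to 0x1F00000-0x1F50000, so a 4K device tree at 0x1F00000 overlaps the aligned range even though it sits just below the unaligned initrd start. With 4K pages the same layout does not overlap.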
Many thanks to Milton Miller for his input on this patch.
Signed-off-by: Dave Carroll <dcarroll@astekcorp.com>
---
* This patch is based on Linus' current tree
arch/powerpc/kernel/prom.c | 11 ++++++++---
arch/powerpc/mm/init_32.c | 5 ++++-
arch/powerpc/mm/init_64.c | 5 ++++-
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 48aeb55..58871df 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -86,7 +86,8 @@ early_param("mem", early_parse_mem);
* move_device_tree - move tree to an unused area, if needed.
*
* The device tree may be allocated beyond our memory limit, or inside the
- * crash kernel region for kdump. If so, move it out of the way.
+ * crash kernel region for kdump, or within the page aligned range of initrd.
+ * If so, move it out of the way.
*/
static void __init move_device_tree(void)
{
@@ -99,7 +100,9 @@ static void __init move_device_tree(void)
size = be32_to_cpu(initial_boot_params->totalsize);
if ((memory_limit && (start + size) > PHYSICAL_START + memory_limit) ||
- overlaps_crashkernel(start, size)) {
+ overlaps_crashkernel(start, size) ||
+ ((start + size) > _ALIGN_DOWN(initrd_start, PAGE_SIZE)
+ && start <= _ALIGN_UP(initrd_end, PAGE_SIZE))) {
p = __va(memblock_alloc(size, PAGE_SIZE));
memcpy(p, initial_boot_params, size);
initial_boot_params = (struct boot_param_header *)p;
@@ -555,7 +558,9 @@ static void __init early_reserve_mem(void)
#ifdef CONFIG_BLK_DEV_INITRD
/* then reserve the initrd, if any */
if (initrd_start && (initrd_end > initrd_start))
- memblock_reserve(__pa(initrd_start), initrd_end - initrd_start);
+ memblock_reserve(_ALIGN_DOWN(__pa(initrd_start), PAGE_SIZE),
+ _ALIGN_UP(initrd_end, PAGE_SIZE) -
+ _ALIGN_DOWN(initrd_start, PAGE_SIZE));
#endif /* CONFIG_BLK_DEV_INITRD */
#ifdef CONFIG_PPC32
diff --git a/arch/powerpc/mm/init_32.c b/arch/powerpc/mm/init_32.c
index d65b591..4835c4f 100644
--- a/arch/powerpc/mm/init_32.c
+++ b/arch/powerpc/mm/init_32.c
@@ -226,8 +226,11 @@ void free_initmem(void)
#ifdef CONFIG_BLK_DEV_INITRD
void free_initrd_mem(unsigned long start, unsigned long end)
{
- if (start < end)
+ if (start < end) {
+ start = _ALIGN_DOWN(start, PAGE_SIZE);
+ end = _ALIGN_UP(end, PAGE_SIZE);
printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+ }
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index 6374b21..060c952 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -102,8 +102,11 @@ void free_initmem(void)
#ifdef CONFIG_BLK_DEV_INITRD
void free_initrd_mem(unsigned long start, unsigned long end)
{
- if (start < end)
+ if (start < end) {
+ start = _ALIGN_DOWN(start, PAGE_SIZE);
+ end = _ALIGN_UP(end, PAGE_SIZE);
printk ("Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
+ }
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
init_page_count(virt_to_page(start));
--
1.7.4
^ permalink raw reply related
* Re: [PATCH][v3] powerpc/85xx: add host-pci(e) bridge only for RC
From: Kumar Gala @ 2011-05-23 22:41 UTC (permalink / raw)
To: Prabhakar Kushwaha; +Cc: meet2prabhu, linuxppc-dev, Vivek Mahajan
In-Reply-To: <1306146205-22772-1-git-send-email-prabhakar@freescale.com>
On May 23, 2011, at 5:23 AM, Prabhakar Kushwaha wrote:
> FSL PCIe controller can act as agent(EP) or host(RC).
> Under Agent(EP) mode they are configured via Host. So it is not required to add
> with the PCI(e) sub-system.
>
> Add and configure PCIe controller only for RC mode.
>
> Signed-off-by: Vivek Mahajan <vivek.mahajan@freescale.com>
> Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
> ---
> Based upon git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git(branch master)
>
> Changes for v2: Incorporated Kumar's comment
> - Use PCI_CLASS_PROG instead of PCI_HEADER_TYPE
> Changes for v3:
> - updated if check condition
> - removed checkpatch warning
>
> arch/powerpc/sysdev/fsl_pci.c | 14 ++++++++++++++
> 1 files changed, 14 insertions(+), 0 deletions(-)
applied to next.
- k
^ permalink raw reply
* Re: [PATCH 0/7] This patchset adds support for running Linux under the Freescale hypervisor,
From: Kumar Gala @ 2011-05-23 22:27 UTC (permalink / raw)
To: Tabi Timur-B04825
Cc: greg@kroah.com, linux-kernel@vger.kernel.org, akpm@kernel.org,
linux-console@vger.kernel.org, Gala Kumar-B11780,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <BANLkTinqOb1UH=keRtZ9=454NDrzcjYV7g@mail.gmail.com>
On May 23, 2011, at 4:09 PM, Tabi Timur-B04825 wrote:
> On Fri, May 20, 2011 at 3:29 PM, Kumar Gala <kumar.gala@freescale.com> wrote:
>
>> Applied to 'test' branch. (grabbed 'v2' of tty patch). Fixed merged conflicts.
>
> I don't think you pushed this branch to git.kernel.org
>
> http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git;a=shortlog;h=refs/heads/test
>
Tree pushed.
- k
^ permalink raw reply
* Re: [PATCH 0/7] This patchset adds support for running Linux under the Freescale hypervisor,
From: Tabi Timur-B04825 @ 2011-05-23 21:09 UTC (permalink / raw)
To: Gala Kumar-B11780
Cc: Tabi Timur-B04825, linux-kernel@vger.kernel.org, akpm@kernel.org,
linux-console@vger.kernel.org, greg@kroah.com,
linuxppc-dev@lists.ozlabs.org
In-Reply-To: <A221B293-A4DD-4223-AEEA-F3E5243D2C0A@freescale.com>
On Fri, May 20, 2011 at 3:29 PM, Kumar Gala <kumar.gala@freescale.com> wrote:
> Applied to 'test' branch. (grabbed 'v2' of tty patch). Fixed merged conflicts.
I don't think you pushed this branch to git.kernel.org
http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git;a=shortlog;h=refs/heads/test
--
Timur Tabi
Linux kernel developer at Freescale