LinuxPPC-Dev Archive on lore.kernel.org

* [PATCH] seqlock: don't smp_rmb in seqlock reader spin loop
From: Milton Miller @ 2011-05-12  9:13 UTC (permalink / raw)
  To: Andrew Morton, Nick Piggin, Benjamin Herrenschmidt,
	Anton Blanchard, Thomas Gleixner, Eric Dumazet
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, linuxppc-dev,
	linux-kernel
In-Reply-To: <alpine.LFD.2.02.1105091036000.2895@ionos>

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.

Cc: <stable@vger.kernel.org>
Signed-off-by: Milton Miller <miltonm@bga.com>
---

To the readers of [RFC] time: xtime_lock is held too long:

I initially thought x86 would not see this because rmb would
be a nop, but upon closer inspection X86_PPRO_FENCE will add
a lfence for rmb.

milton

Index: common/include/linux/seqlock.h
===================================================================
--- common.orig/include/linux/seqlock.h	2011-04-06 03:27:02.000000000 -0500
+++ common/include/linux/seqlock.h	2011-04-06 03:35:02.000000000 -0500
@@ -88,12 +88,12 @@ static __always_inline unsigned read_seq
 	unsigned ret;

 repeat:
-	ret = sl->sequence;
-	smp_rmb();
+	ret = ACCESS_ONCE(sl->sequence);
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto repeat;
 	}
+	smp_rmb();

 	return ret;
 }
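
As an illustration only (this is not the kernel code), the reordering in
the patch above can be sketched in userspace C11. The names seq_sketch,
read_begin_sketch, read_retry_sketch and relax_sketch are hypothetical,
and an acquire fence is assumed here as a stand-in for smp_rmb():

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the seqlock sequence counter. */
struct seq_sketch {
	atomic_uint sequence;	/* odd while a writer is in progress */
};

/* Placeholder for cpu_relax(); a real port would use an arch hint. */
static inline void relax_sketch(void) { }

/* Spin without any barrier, then issue one barrier after the loop. */
static unsigned read_begin_sketch(struct seq_sketch *sl)
{
	unsigned ret;

	for (;;) {
		/* One load per pass: the value tested is the value returned. */
		ret = atomic_load_explicit(&sl->sequence, memory_order_relaxed);
		if (!(ret & 1))
			break;
		relax_sketch();
	}
	/* Assumed equivalent of the single smp_rmb() after the loop. */
	atomic_thread_fence(memory_order_acquire);
	return ret;
}

/* Nonzero if a writer ran since read_begin_sketch() returned 'start'. */
static int read_retry_sketch(struct seq_sketch *sl, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&sl->sequence, memory_order_relaxed) != start;
}
```

A reader would call read_begin_sketch(), copy the protected data, and
repeat the whole sequence while read_retry_sketch() returns nonzero.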

^ permalink raw reply

* Re: [PATCH 03/13] kvm/powerpc: Fix the build for 32-bit Book 3S (classic) processors
From: Alexander Graf @ 2011-05-12  9:33 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev@ozlabs.org, kvm@vger.kernel.org
In-Reply-To: <20110511103950.GD2837@brick.ozlabs.ibm.com>


On 11.05.2011 at 12:39, Paul Mackerras <paulus@samba.org> wrote:

> Commits a5d4f3ad3a ("powerpc: Base support for exceptions using
> HSRR0/1") and 673b189a2e ("powerpc: Always use SPRN_SPRG_HSCRATCH0
> when running in HV mode") cause compile and link errors for 32-bit
> classic Book 3S processors when KVM is enabled.  This fixes these
> errors.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/reg.h       |    5 +++++
> arch/powerpc/kvm/book3s_rmhandlers.S |    2 ++
> 2 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> index 47e3416..05658b7 100644
> --- a/arch/powerpc/include/asm/reg.h
> +++ b/arch/powerpc/include/asm/reg.h
> @@ -823,6 +823,11 @@
>    FTR_SECTION_ELSE_NESTED(66);            \
>    mtspr    SPRN_SPRG_HSCRATCH0,rX;            \
>    ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
> +
> +#else /* CONFIG_PPC_BOOK3S_64 */
> +#define GET_SCRATCH0(rX)    mfspr    rX,SPRN_SPRG_SCRATCH0
> +#define SET_SCRATCH0(rX)    mtspr    SPRN_SPRG_SCRATCH0,rX
> +
> #endif
>
> #ifdef CONFIG_PPC_BOOK3E_64
> diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
> index ae99af6..1a1b344 100644
> --- a/arch/powerpc/kvm/book3s_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
> @@ -112,7 +112,9 @@ INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_MACHINE_CHECK
> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_DATA_STORAGE
> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_INST_STORAGE
> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL
> +#ifdef CONFIG_PPC_BOOK3S_64
> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL_HV

Hrm - I don't remember putting this one here. When did it get into the tree
and why wasn't I CC'ed?

Alex

^ permalink raw reply

* Re: [PATCH] seqlock: don't smp_rmb in seqlock reader spin loop
From: Eric Dumazet @ 2011-05-12  9:35 UTC (permalink / raw)
  To: Milton Miller
  Cc: Nick Piggin, Ingo Molnar, linux-kernel, Linus Torvalds,
	Andi Kleen, Anton Blanchard, Andrew Morton, linuxppc-dev,
	Thomas Gleixner
In-Reply-To: <seqlock-rmb@mdm.bga.com>

On Thursday, 12 May 2011 at 04:13 -0500, Milton Miller wrote:
> Move the smp_rmb after cpu_relax loop in read_seqlock and add
> ACCESS_ONCE to make sure the test and return are consistent.
> 
> A multi-threaded core in the lab didn't like the update
> from 2.6.35 to 2.6.36, to the point it would hang during
> boot when multiple threads were active.  Bisection showed
> af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
> Remove the per cpu tick skew) as the culprit and it is
> supported with stack traces showing xtime_lock waits including
> tick_do_update_jiffies64 and/or update_vsyscall.
> 
> Experimentation showed the combination of cpu_relax and smp_rmb
> was significantly slowing the progress of other threads sharing
> the core, and this patch is effective in avoiding the hang.
> 
> A theory is the rmb is affecting the whole core while the
> cpu_relax is causing a resource rebalance flush, together they
> cause an interference cadence that is unbroken when the seqlock
> reader has interrupts disabled.
> 
> At first I was confused why the refactor in
> 3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
> seqlock) didn't affect this patch application, but after some
> study that affected seqcount not seqlock. The new seqcount was
> not factored back into the seqlock.  I defer that to the future.
> 
> While the removal of the timer interrupt offset created
> contention for the xtime lock while a cpu does the
> additional work to update the system clock, the seqlock
> implementation with the tight rmb spin loop goes back much
> further, and is just waiting for the right trigger.
> 
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Milton Miller <miltonm@bga.com>
> ---
> 
> To the readers of [RFC] time: xtime_lock is held too long:
> 
> I initially thought x86 would not see this because rmb would
> be a nop, but upon closer inspection X86_PPRO_FENCE will add
> a lfence for rmb.
> 
> milton
> 
> Index: common/include/linux/seqlock.h
> ===================================================================
> --- common.orig/include/linux/seqlock.h	2011-04-06 03:27:02.000000000 -0500
> +++ common/include/linux/seqlock.h	2011-04-06 03:35:02.000000000 -0500
> @@ -88,12 +88,12 @@ static __always_inline unsigned read_seq
>  	unsigned ret;
>  
>  repeat:
> -	ret = sl->sequence;
> -	smp_rmb();
> +	ret = ACCESS_ONCE(sl->sequence);
>  	if (unlikely(ret & 1)) {
>  		cpu_relax();
>  		goto repeat;
>  	}
> +	smp_rmb();
>  
>  	return ret;
>  }

I fully agree with your analysis. This is a call to make the change I
suggested earlier [1]. (Use a seqcount object in seqlock_t)

typedef struct {
	seqcount_t seq;
	spinlock_t lock;
} seqlock_t;

I'll submit a patch for 2.6.40

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks

[1] Ref: https://lkml.org/lkml/2011/5/6/351
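
For illustration, the layering suggested above (a seqcount object embedded
in seqlock_t, with the lock serializing writers) might look roughly like
this in userspace C11. Everything with a _sketch suffix is hypothetical,
an atomic_flag is assumed as a stand-in for spinlock_t, and the C11 fences
only approximate smp_rmb()/smp_wmb(); it is not the patch that was later
submitted:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for seqcount_t. */
typedef struct {
	atomic_uint sequence;
} seqcount_sketch_t;

/* Hypothetical seqlock built on top of the seqcount. */
typedef struct {
	seqcount_sketch_t seq;
	atomic_flag lock;	/* stands in for spinlock_t */
} seqlock_sketch_t;

#define SEQLOCK_SKETCH_INIT { { 0 }, ATOMIC_FLAG_INIT }

/* Reader entry for the bare counter: spin while a write is in progress. */
static unsigned seqcount_begin_sketch(seqcount_sketch_t *s)
{
	unsigned ret;

	do {
		ret = atomic_load_explicit(&s->sequence, memory_order_relaxed);
	} while (ret & 1);
	atomic_thread_fence(memory_order_acquire);
	return ret;
}

/* The seqlock reader simply reuses the seqcount reader. */
static unsigned seqlock_read_begin_sketch(seqlock_sketch_t *sl)
{
	return seqcount_begin_sketch(&sl->seq);
}

/* Writers take the lock, then make the sequence odd. */
static void seqlock_write_lock_sketch(seqlock_sketch_t *sl)
{
	while (atomic_flag_test_and_set_explicit(&sl->lock, memory_order_acquire))
		;
	atomic_fetch_add_explicit(&sl->seq.sequence, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Make the sequence even again, then drop the lock. */
static void seqlock_write_unlock_sketch(seqlock_sketch_t *sl)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&sl->seq.sequence, 1, memory_order_relaxed);
	atomic_flag_clear_explicit(&sl->lock, memory_order_release);
}
```

With this split, a fix to the reader spin loop would only need to be made
once, in the seqcount reader.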

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Kees Cook @ 2011-05-12  9:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, jmorris, Linus Torvalds, Ingo Molnar,
	linux-arm-kernel, Serge E. Hallyn, Peter Zijlstra,
	microblaze-uclinux, Steven Rostedt, Martin Schwidefsky,
	Thomas Gleixner, Roland McGrath, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Tejun Heo, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512074850.GA9937@elte.hu>

Hi,

On Thu, May 12, 2011 at 09:48:50AM +0200, Ingo Molnar wrote:
> 1) We already have a specific ABI for this: you can set filters for events via 
>    an event fd.
> 
>    Why not extend that mechanism instead and improve *both* your sandboxing
>    bits and the events code? This new seccomp code has a lot more
>    to do with trace event filters than the minimal old seccomp code ...

Would this require privileges to get the event fd to start with? If so,
I would prefer to avoid that, since using prctl() as shown in the patch
set won't require any privs.

-Kees

-- 
Kees Cook
Ubuntu Security Team

^ permalink raw reply

* [tip:core/locking] seqlock: Don't smp_rmb in seqlock reader spin loop
From: tip-bot for Milton Miller @ 2011-05-12 10:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: andi, npiggin, eric.dumazet, linuxppc-dev, miltonm, mingo, anton,
	hpa, stable, tglx, paulmck, torvalds

Commit-ID:  5db1256a5131d3b133946fa02ac9770a784e6eb2
Gitweb:     http://git.kernel.org/tip/5db1256a5131d3b133946fa02ac9770a784e6eb2
Author:     Milton Miller <miltonm@bga.com>
AuthorDate: Thu, 12 May 2011 04:13:54 -0500
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 12 May 2011 12:13:43 +0200

seqlock: Don't smp_rmb in seqlock reader spin loop

Move the smp_rmb after cpu_relax loop in read_seqlock and add
ACCESS_ONCE to make sure the test and return are consistent.

A multi-threaded core in the lab didn't like the update
from 2.6.35 to 2.6.36, to the point it would hang during
boot when multiple threads were active.  Bisection showed
af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867 (clockevents:
Remove the per cpu tick skew) as the culprit and it is
supported with stack traces showing xtime_lock waits including
tick_do_update_jiffies64 and/or update_vsyscall.

Experimentation showed the combination of cpu_relax and smp_rmb
was significantly slowing the progress of other threads sharing
the core, and this patch is effective in avoiding the hang.

A theory is the rmb is affecting the whole core while the
cpu_relax is causing a resource rebalance flush, together they
cause an interference cadence that is unbroken when the seqlock
reader has interrupts disabled.

At first I was confused why the refactor in
3c22cd5709e8143444a6d08682a87f4c57902df3 (kernel: optimise
seqlock) didn't affect this patch application, but after some
study that affected seqcount not seqlock. The new seqcount was
not factored back into the seqlock.  I defer that to the future.

While the removal of the timer interrupt offset created
contention for the xtime lock while a cpu does the
additional work to update the system clock, the seqlock
implementation with the tight rmb spin loop goes back much
further, and is just waiting for the right trigger.

Cc: <stable@vger.kernel.org>
Signed-off-by: Milton Miller <miltonm@bga.com>
Cc: <linuxppc-dev@lists.ozlabs.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Link: http://lkml.kernel.org/r/%3Cseqlock-rmb%40mdm.bga.com%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/seqlock.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e98cd2e..06d6964 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -88,12 +88,12 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
 	unsigned ret;

 repeat:
-	ret = sl->sequence;
-	smp_rmb();
+	ret = ACCESS_ONCE(sl->sequence);
 	if (unlikely(ret & 1)) {
 		cpu_relax();
 		goto repeat;
 	}
+	smp_rmb();

 	return ret;
 }

^ permalink raw reply related

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-12 10:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, jmorris, Linus Torvalds, Ingo Molnar,
	linux-arm-kernel, Serge E. Hallyn, Peter Zijlstra,
	microblaze-uclinux, Steven Rostedt, Martin Schwidefsky,
	Thomas Gleixner, Roland McGrath, Michal Marek, Michal Simek,
	Will Drewry, linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt,
	Tejun Heo, linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512092424.GO28888@outflux.net>


* Kees Cook <kees.cook@canonical.com> wrote:

> Hi,
> 
> On Thu, May 12, 2011 at 09:48:50AM +0200, Ingo Molnar wrote:
> > 1) We already have a specific ABI for this: you can set filters for events via 
> >    an event fd.
> > 
> >    Why not extend that mechanism instead and improve *both* your sandboxing
> >    bits and the events code? This new seccomp code has a lot more
> >    to do with trace event filters than the minimal old seccomp code ...
> 
> Would this require privileges to get the event fd to start with? [...]

No special privileges with the default perf_events_paranoid value.

> [...] If so, I would prefer to avoid that, since using prctl() as shown in 
> the patch set won't require any privs.

and we could also explicitly allow syscall events without any privileges, 
regardless of the setting of 'perf_events_paranoid' config value.

Obviously a sandboxing host process wants to run with as low privileges as it 
can.

Thanks,

	Ingo

^ permalink raw reply

* Re: powerpc: Make early memory scan more resilient to out of order nodes
From: Benjamin Herrenschmidt @ 2011-05-12 11:08 UTC (permalink / raw)
  To: Milton Miller; +Cc: linuxppc-dev
In-Reply-To: <benh-inital-memory@mdm.bga.com>

On Thu, 2011-05-12 at 03:09 -0500, Milton Miller wrote:
> On Wed, 11 May 2011 about 20:58:18 -0000, Benjamin Herrenschmidt wrote:
> > We keep track of the size of the lowest block of memory and call
> > setup_initial_memory_limit() only after we've parsed them all
> > 
> 
> Good, we lose our sensitivity to device node ordering.

Yup, a platform we'll release soon has all of them backward :-)

> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> 
> > index 584b398..27475c6 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -70,6 +70,7 @@ int __initdata iommu_force_on;
> >  unsigned long tce_alloc_start, tce_alloc_end;
> >  u64 ppc64_rma_size;
> >  #endif
> > +static phys_addr_t first_memblock_size;
> 
> __initdata 
> 
> (it's only referenced by 2 __init functions)
> 
> Acked-by: Milton Miller <miltonm@bga.com>

Ah good, I'll fold that in, thanks.

Cheers,
Ben.

> 
> >  static int __init early_parse_mem(char *p)
> >  {
> ..
> > @@ -507,11 +508,14 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
> ..
> > @@ -708,6 +712,7 @@ void __init early_init_devtree(void *params)
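
As a rough sketch of the idea discussed in this message (remember the size
of the lowest memory block while scanning, and apply the limit only after
every node has been parsed), assuming hypothetical names throughout
(mem_scan_sketch and its helpers are not from the patch, which uses a
static first_memblock_size variable):

```c
#include <stdint.h>

/* Hypothetical scan state, kept in a struct here for easy testing. */
struct mem_scan_sketch {
	uint64_t lowest_base;		/* lowest base address seen so far */
	uint64_t first_block_size;	/* size of the block at lowest_base */
};

static void mem_scan_init_sketch(struct mem_scan_sketch *st)
{
	st->lowest_base = UINT64_MAX;
	st->first_block_size = 0;
}

/* Called once per memory node, in whatever order the nodes appear. */
static void mem_scan_add_sketch(struct mem_scan_sketch *st,
				uint64_t base, uint64_t size)
{
	if (base < st->lowest_base) {
		st->lowest_base = base;
		st->first_block_size = size;
	}
}

/*
 * Called only after all nodes are parsed; the returned size is what
 * would be handed to something like setup_initial_memory_limit().
 */
static uint64_t mem_scan_finish_sketch(const struct mem_scan_sketch *st)
{
	return st->first_block_size;
}
```

Because the decision is deferred to the finish step, the result does not
depend on the order in which the nodes were visited.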

^ permalink raw reply

* Re: [PATCH 03/13] kvm/powerpc: Fix the build for 32-bit Book 3S (classic) processors
From: Paul Mackerras @ 2011-05-12 11:15 UTC (permalink / raw)
  To: Alexander Graf; +Cc: linuxppc-dev@ozlabs.org, kvm@vger.kernel.org
In-Reply-To: <C5B8A119-56B6-467C-B3BD-01D5D7FA0773@suse.de>

On Thu, May 12, 2011 at 11:33:00AM +0200, Alexander Graf wrote:
> 
> On 11.05.2011 at 12:39, Paul Mackerras <paulus@samba.org> wrote:
> 
> > diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
> > index ae99af6..1a1b344 100644
> > --- a/arch/powerpc/kvm/book3s_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
> > @@ -112,7 +112,9 @@ INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_MACHINE_CHECK
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_DATA_STORAGE
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_INST_STORAGE
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL_HV
> 
> Hrm - I don't remember putting this one here. When did it get into
> the tree and why wasn't I CC'ed?

It comes from commit a5d4f3ad3a ("powerpc: Base support for exceptions
using HSRR0/1", author Ben H.) in Ben's next branch.  He committed it
on April 20.

Paul.

^ permalink raw reply

* Re: [PATCH 03/13] kvm/powerpc: Fix the build for 32-bit Book 3S (classic) processors
From: Benjamin Herrenschmidt @ 2011-05-12 11:16 UTC (permalink / raw)
  To: Alexander Graf
  Cc: linuxppc-dev@ozlabs.org, Paul Mackerras, kvm@vger.kernel.org
In-Reply-To: <C5B8A119-56B6-467C-B3BD-01D5D7FA0773@suse.de>

On Thu, 2011-05-12 at 11:33 +0200, Alexander Graf wrote:
> On 11.05.2011 at 12:39, Paul Mackerras <paulus@samba.org> wrote:
> 
> > Commits a5d4f3ad3a ("powerpc: Base support for exceptions using
> > HSRR0/1") and 673b189a2e ("powerpc: Always use SPRN_SPRG_HSCRATCH0
> > when running in HV mode") cause compile and link errors for 32-bit
> > classic Book 3S processors when KVM is enabled.  This fixes these
> > errors.
> > 
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> > ---
> > arch/powerpc/include/asm/reg.h       |    5 +++++
> > arch/powerpc/kvm/book3s_rmhandlers.S |    2 ++
> > 2 files changed, 7 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> > index 47e3416..05658b7 100644
> > --- a/arch/powerpc/include/asm/reg.h
> > +++ b/arch/powerpc/include/asm/reg.h
> > @@ -823,6 +823,11 @@
> >    FTR_SECTION_ELSE_NESTED(66);            \
> >    mtspr    SPRN_SPRG_HSCRATCH0,rX;            \
> >    ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
> > +
> > +#else /* CONFIG_PPC_BOOK3S_64 */
> > +#define GET_SCRATCH0(rX)    mfspr    rX,SPRN_SPRG_SCRATCH0
> > +#define SET_SCRATCH0(rX)    mtspr    SPRN_SPRG_SCRATCH0,rX
> > +
> > #endif
> > 
> > #ifdef CONFIG_PPC_BOOK3E_64
> > diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
> > index ae99af6..1a1b344 100644
> > --- a/arch/powerpc/kvm/book3s_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
> > @@ -112,7 +112,9 @@ INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_MACHINE_CHECK
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_DATA_STORAGE
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_INST_STORAGE
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL
> > +#ifdef CONFIG_PPC_BOOK3S_64
> > INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL_HV
> 
> Hrm - I don't remember putting this one here. When did it get into the tree and why wasn't I CC'ed?

Because I did and I forgot :-)

The patch in question only marginally touched kvm, it's one in a series
that reworks the ppc64 exception vectors to better operate on modern
CPUs running in HV mode (deal with HSRR's vs SRR's etc...) and it needed
a small fixup to the KVM code due to 0x500 becoming "H" interrupts
(using HSRR's) on these.

Unfortunately, it looks like I didn't have KVM enabled in any of my
32-bit test configs and missed that little breakage.

I should have CCed you I suppose, I simply forgot as it wasn't primarily
a KVM related patch.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: James Morris @ 2011-05-12 11:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512074850.GA9937@elte.hu>

On Thu, 12 May 2011, Ingo Molnar wrote:

> 
> 2) Why should this concept not be made available wider, to allow the 
>    restriction of not just system calls but other security relevant components 
>    of the kernel as well?

Because the aim of this is to reduce the attack surface of the syscall 
interface.

LSM is the correct level of abstraction for general security mediation, 
because it allows you to take into account all relevant security 
information in a race-free context.


>    This too, if you approach the problem via the events code, will be a natural 
>    end result, while if you approach it from the seccomp prctl angle it will be
>    a limited hack only.

I'd say it's a well-defined and readily understandable feature.


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [PATCH 03/13] kvm/powerpc: Fix the build for 32-bit Book 3S (classic) processors
From: Alexander Graf @ 2011-05-12 11:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: linuxppc-dev@ozlabs.org, Paul Mackerras, kvm@vger.kernel.org
In-Reply-To: <1305198979.29820.102.camel@pasglop>


On 12.05.2011 at 13:16, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2011-05-12 at 11:33 +0200, Alexander Graf wrote:
>> On 11.05.2011 at 12:39, Paul Mackerras <paulus@samba.org> wrote:
>>
>>> Commits a5d4f3ad3a ("powerpc: Base support for exceptions using
>>> HSRR0/1") and 673b189a2e ("powerpc: Always use SPRN_SPRG_HSCRATCH0
>>> when running in HV mode") cause compile and link errors for 32-bit
>>> classic Book 3S processors when KVM is enabled.  This fixes these
>>> errors.
>>>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>> ---
>>> arch/powerpc/include/asm/reg.h       |    5 +++++
>>> arch/powerpc/kvm/book3s_rmhandlers.S |    2 ++
>>> 2 files changed, 7 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
>>> index 47e3416..05658b7 100644
>>> --- a/arch/powerpc/include/asm/reg.h
>>> +++ b/arch/powerpc/include/asm/reg.h
>>> @@ -823,6 +823,11 @@
>>>   FTR_SECTION_ELSE_NESTED(66);            \
>>>   mtspr    SPRN_SPRG_HSCRATCH0,rX;            \
>>>   ALT_FTR_SECTION_END_NESTED_IFCLR(CPU_FTR_HVMODE_206, 66)
>>> +
>>> +#else /* CONFIG_PPC_BOOK3S_64 */
>>> +#define GET_SCRATCH0(rX)    mfspr    rX,SPRN_SPRG_SCRATCH0
>>> +#define SET_SCRATCH0(rX)    mtspr    SPRN_SPRG_SCRATCH0,rX
>>> +
>>> #endif
>>>
>>> #ifdef CONFIG_PPC_BOOK3E_64
>>> diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S
>>> index ae99af6..1a1b344 100644
>>> --- a/arch/powerpc/kvm/book3s_rmhandlers.S
>>> +++ b/arch/powerpc/kvm/book3s_rmhandlers.S
>>> @@ -112,7 +112,9 @@ INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_MACHINE_CHECK
>>> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_DATA_STORAGE
>>> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_INST_STORAGE
>>> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL
>>> +#ifdef CONFIG_PPC_BOOK3S_64
>>> INTERRUPT_TRAMPOLINE    BOOK3S_INTERRUPT_EXTERNAL_HV
>>
>> Hrm - I don't remember putting this one here. When did it get into the tree
>> and why wasn't I CC'ed?
>
> Because I did and I forgot :-)
>
> The patch in question only marginally touched kvm, it's one in a series
> that reworks the ppc64 exception vectors to better operate on modern
> CPUs running in HV mode (deal with HSRR's vs SRR's etc...) and it needed
> a small fixup to the KVM code due to 0x500 becoming "H" interrupts
> (using HSRR's) on these.
>
> Unfortunately, it looks like I didn't have KVM enabled in any of my
> 32-bit test configs and missed that little breakage.
>
> I should have CCed you I suppose, I simply forgot as it wasn't primarily
> a KVM related patch.

Alright :). Don't worry too much about it - just wanted to make sure you'll
remember next time ;)

Alex

>

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Frederic Weisbecker @ 2011-05-12 12:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Heiko Carstens,
	Oleg Nesterov, David Howells, Paul Mackerras, Eric Paris,
	H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390, Russell King,
	x86, jmorris, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512074850.GA9937@elte.hu>

On Thu, May 12, 2011 at 09:48:50AM +0200, Ingo Molnar wrote:
> To restrict execution to system calls.
> 
> Two observations:
> 
> 1) We already have a specific ABI for this: you can set filters for events via 
>    an event fd.
> 
>    Why not extend that mechanism instead and improve *both* your sandboxing
>    bits and the events code? This new seccomp code has a lot more
>    to do with trace event filters than the minimal old seccomp code ...
> 
>    kernel/trace/trace_event_filter.c is 2000 lines of tricky code that
>    interprets the ASCII filter expressions. kernel/seccomp.c is 86 lines of
>    mostly trivial code.
> 
> 2) Why should this concept not be made available wider, to allow the 
>    restriction of not just system calls but other security relevant components 
>    of the kernel as well?
> 
>    This too, if you approach the problem via the events code, will be a natural 
>    end result, while if you approach it from the seccomp prctl angle it will be
>    a limited hack only.
> 
> Note, the end result will be the same - just using a different ABI.
> 
> So i really think the ABI itself should be closer related to the event code. 
> What this "seccomp" code does is that it uses specific syscall events to 
> restrict execution of certain event generating codepaths, such as system calls.
> 
> Thanks,
> 
> 	Ingo

What's positive with that approach is that the code is all there already.
Create a perf event for a given trace event, attach a filter to it.

What needs to be added is an override of the effect of the filter. By default
it's dropping the event, but there may be different flavours, including sending
a signal. All in one, extending the current code to allow that looks trivial.

The negative points are that

* trace events are supposed to stay passive and not act on the system, except
doing some endpoint things like writing to a buffer. We can't call do_exit()
from a tracepoint for example, preemption is disabled there.

* Also, is it actually relevant to extend that seccomp filtering to events
other than syscalls? Exposing kernel events to filtering actually sounds to
me like it brings a new potential security issue. But with fine restrictions
this can probably be dealt with, especially if by default only syscalls can
be filtered.

* I think Peter did not want to give such "active" role to perf in the system.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: James Morris @ 2011-05-12 11:33 UTC (permalink / raw)
  To: Will Drewry
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Ingo Molnar, linux-arm-kernel, kees.cook,
	Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner, Ingo Molnar,
	Roland McGrath, Michal Marek, Michal Simek, linuxppc-dev,
	linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo, linux390,
	Andrew Morton, agl, David S. Miller
In-Reply-To: <1305169376-2363-1-git-send-email-wad@chromium.org>

On Wed, 11 May 2011, Will Drewry wrote:

> +void seccomp_filter_log_failure(int syscall)
> +{
> +	printk(KERN_INFO
> +		"%s[%d]: system call %d (%s) blocked at ip:%lx\n",
> +		current->comm, task_pid_nr(current), syscall,
> +		syscall_nr_to_name(syscall), KSTK_EIP(current));
> +}

I think it'd be a good idea to utilize the audit facility here.


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* RE: [PATCH] RapidIO: Fix default routing initialization
From: Bounine, Alexandre @ 2011-05-12 12:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Thomas Moll, linuxppc-dev
In-Reply-To: <20110511130024.955f1981.akpm@linux-foundation.org>

Andrew Morton <akpm@linux-foundation.org> wrote:

> The changelog doesn't permit me to determine the importance of this fix,
> so I don't know whether to schedule it for 2.6.39 or for -stable.

Sorry, my fault. This patch is applicable to kernel versions starting
from 2.6.37.

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Ingo Molnar @ 2011-05-12 13:01 UTC (permalink / raw)
  To: James Morris
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <alpine.LRH.2.00.1105122133500.31507@tundra.namei.org>

* James Morris <jmorris@namei.org> wrote:

> On Thu, 12 May 2011, Ingo Molnar wrote:
> 
> > 2) Why should this concept not be made available wider, to allow the 
> >    restriction of not just system calls but other security relevant components 
> >    of the kernel as well?
> 
> Because the aim of this is to reduce the attack surface of the syscall 
> interface.

What i suggest achieves the same, my argument is that we could aim it to be 
even more flexible and even more useful.

> LSM is the correct level of abstraction for general security mediation, 
> because it allows you to take into account all relevant security information 
> in a race-free context.

I don't care about LSM though, i find it poorly designed.

The approach implemented here, the ability for *unprivileged code* to define 
(the seeds of ...) flexible security policies, in a proper Linuxish way, which 
is inherited along the task parent/child hierarchy and which allows nesting 
etc. is a *lot* more flexible.

What Will implemented here is pretty huge in my opinion: it turns security from 
a root-only kind of weird hack into an essential component of its APIs, 
available to *any* app not just the select security policy/mechanism chosen by 
the distributor ...

If implemented properly this could replace LSM in the long run.

As a prctl() hack bound to seccomp (which, by all means, is a natural extension 
to the current seccomp ABI, so perfectly fine if we only want that scope), that 
is much less likely to happen.

And if we merge the seccomp interface prematurely then interest towards a more 
flexible approach will disappear, so either we do it properly now or it will 
take some time for someone to come around and do it ...

Also note that i do not consider the perf events ABI itself cast into stone - 
and we could very well add a new system call for this, independent of perf 
events. I just think that the seccomp scope itself is exciting but looks 
limited to what the real potential of this could be.

> >    This too, if you approach the problem via the events code, will be a natural 
> >    end result, while if you approach it from the seccomp prctl angle it will be
> >    a limited hack only.
> 
> I'd say it's a well-defined and readily understandable feature.

Note, it was me who suggested this very event-filter-engine design a year ago, 
when the first submission still used a crude bitmap of allowed seccomp 
syscalls:

  http://lwn.net/Articles/332974/

Funnily enough, back then you wrote this:

  " I'm concerned that we're seeing yet another security scheme being designed on 
    the fly, without a well-formed threat model, and without taking into account 
    lessons learned from the seemingly endless parade of similar, failed schemes. "

so when and how did your opinion of this scheme turn from it being an "endless 
parade of failed schemes" to it being a "well-defined and readily 
understandable feature"? :-)

The idea itself has not changed since last year, what happened is that the 
filter engine got a couple of new features and Will has separated it out and 
has implemented a working prototype for sandboxing.

What i do here is to suggest *further* steps down the same road, now that we 
see that this scheme can indeed be used to implement sandboxing ... I think 
it's a valid line of inquiry.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH] seqlock: don't smp_rmb in seqlock reader spin loop
From: Andi Kleen @ 2011-05-12 14:08 UTC (permalink / raw)
  To: Milton Miller
  Cc: Nick Piggin, Eric Dumazet, Ingo Molnar, linux-kernel,
	Linus Torvalds, Andi Kleen, Anton Blanchard, Andrew Morton,
	linuxppc-dev, Thomas Gleixner
In-Reply-To: <seqlock-rmb@mdm.bga.com>

On Thu, May 12, 2011 at 04:13:54AM -0500, Milton Miller wrote:
> 
> Move the smp_rmb after cpu_relax loop in read_seqlock and add
> ACCESS_ONCE to make sure the test and return are consistent.
> 
> A multi-threaded core in the lab didn't like the update

Which core was that?

-Andi

^ permalink raw reply

* Re: fsl_udc_core: BUG: scheduling while atomic
From: Matthew L. Creech @ 2011-05-12 15:30 UTC (permalink / raw)
  To: Sergej.Stepanov; +Cc: linuxppc-dev
In-Reply-To: <4206182445660643B9AEB8D4E55BBD0A1533BC2780@HERMES2>

On Thu, May 12, 2011 at 4:37 AM,  <Sergej.Stepanov@ids.de> wrote:
> Hi Matthew,
>
> You can also get such an oops with SPI.
> For this problem it helps to compile your kernel with another preemption
> model:
>  - preempt
>  - standard
>  - !!! but not voluntary preemption !!!

Thanks Sergej, indeed I'm currently using CONFIG_PREEMPT_VOLUNTARY on
this board.  I'll change it to fix this problem for now.

Do you happen to know whether the Freescale folks intend to fix this?
If not, it seems like at least some sort of warning is in order.

-- 
Matthew L. Creech

^ permalink raw reply

* Re: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to defconfig
From: Scott Wood @ 2011-05-12 15:31 UTC (permalink / raw)
  To: Li Yang-R58472; +Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <3F607A5180246847A760FD34122A1E0514EC65@039-SN1MPN1-002.039d.mgd.msft.net>

On Thu, 12 May 2011 01:11:03 -0500
Li Yang-R58472 <R58472@freescale.com> wrote:

> >Subject: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to
> >defconfig
> >
> >Even though support for the p5020's on-chip ethernet is not yet upstream,
> >it is not appropriate to disable all networking support (including
> >loopback, unix domain sockets, external ethernet devices, etc) in the
> >defconfig.  The networking settings are taken from mpc85xx_smp_defconfig,
> >minus the drivers for ethernet devices not found on any current e5500 chip.
> >
> >The other changes are the result of running "make savedefconfig".
> >
> >Signed-off-by: Scott Wood <scottwood@freescale.com>
> >---
> > arch/powerpc/configs/e55xx_smp_defconfig |   38 ++++++++++++++++++++++---
> >----
> > 1 files changed, 29 insertions(+), 9 deletions(-)
> >
> >diff --git a/arch/powerpc/configs/e55xx_smp_defconfig
> >b/arch/powerpc/configs/e55xx_smp_defconfig
> >index 9fa1613..f4c5780 100644
> >--- a/arch/powerpc/configs/e55xx_smp_defconfig
> >+++ b/arch/powerpc/configs/e55xx_smp_defconfig
> >@@ -6,10 +6,10 @@ CONFIG_NR_CPUS=2
> > CONFIG_EXPERIMENTAL=y
> > CONFIG_SYSVIPC=y
> > CONFIG_BSD_PROCESS_ACCT=y
> >+CONFIG_SPARSE_IRQ=y
> 
> Hi Scott,
> 
> I remember in previous testing that this option has a negative effect on performance.  Do we really need it to be enabled?

I didn't change this setting, it just moved due to running it through
savedefconfig.

-Scott

^ permalink raw reply

* Re: [linuxppc-release] [PATCH 1/2] powerpc, e5500: add networking to defconfig
From: Scott Wood @ 2011-05-12 15:31 UTC (permalink / raw)
  To: Scott Wood
  Cc: Wood Scott-B07421, linuxppc-dev@lists.ozlabs.org, Li Yang-R58472
In-Reply-To: <20110512103108.6dd3ca2a@schlenkerla.am.freescale.net>

On Thu, 12 May 2011 10:31:08 -0500
Scott Wood <scottwood@freescale.com> wrote:

> On Thu, 12 May 2011 01:11:03 -0500
> Li Yang-R58472 <R58472@freescale.com> wrote:
> 
> > >diff --git a/arch/powerpc/configs/e55xx_smp_defconfig
> > >b/arch/powerpc/configs/e55xx_smp_defconfig
> > >index 9fa1613..f4c5780 100644
> > >--- a/arch/powerpc/configs/e55xx_smp_defconfig
> > >+++ b/arch/powerpc/configs/e55xx_smp_defconfig
> > >@@ -6,10 +6,10 @@ CONFIG_NR_CPUS=2
> > > CONFIG_EXPERIMENTAL=y
> > > CONFIG_SYSVIPC=y
> > > CONFIG_BSD_PROCESS_ACCT=y
> > >+CONFIG_SPARSE_IRQ=y
> > 
> > Hi Scott,
> > 
> > I remember in previous testing that this option has a negative effect on performance.  Do we really need it to be enabled?
> 
> I didn't change this setting, it just moved due to running it through
> savedefconfig.

What was the performance impact?

-Scott

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: Will Drewry @ 2011-05-12 16:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, James Morris, Linus Torvalds, Ingo Molnar,
	linux-arm-kernel, kees.cook, Serge E. Hallyn, Peter Zijlstra,
	microblaze-uclinux, Steven Rostedt, Martin Schwidefsky,
	Thomas Gleixner, Roland McGrath, Michal Marek, Michal Simek,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512130104.GA2912@elte.hu>

[Thanks to everyone for the continued feedback and insights - I appreciate it!]

On Thu, May 12, 2011 at 8:01 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * James Morris <jmorris@namei.org> wrote:
>
>> On Thu, 12 May 2011, Ingo Molnar wrote:
>>
>> > 2) Why should this concept not be made available wider, to allow the
>> >    restriction of not just system calls but other security relevant components
>> >    of the kernel as well?
>>
>> Because the aim of this is to reduce the attack surface of the syscall
>> interface.
>
> What i suggest achieves the same, my argument is that we could aim it to be
> even more flexible and even more useful.
>
>> LSM is the correct level of abstraction for general security mediation,
>> because it allows you to take into account all relevant security information
>> in a race-free context.
>
> I don't care about LSM though, i find it poorly designed.
>
> The approach implemented here, the ability for *unprivileged code* to define
> (the seeds of ...) flexible security policies, in a proper Linuxish way, which
> is inherited along the task parent/child hierarchy and which allows nesting
> etc. is a *lot* more flexible.
>
> What Will implemented here is pretty huge in my opinion: it turns security from
> a root-only kind of weird hack into an essential component of its APIs,
> available to *any* app not just the select security policy/mechanism chosen by
> the distributor ...
>
> If implemented properly this could replace LSM in the long run.
>
> As a prctl() hack bound to seccomp (which, by all means, is a natural extension
> to the current seccomp ABI, so perfectly fine if we only want that scope), that
> is much less likely to happen.
>
> And if we merge the seccomp interface prematurely then interest towards a more
> flexible approach will disappear, so either we do it properly now or it will
> take some time for someone to come around and do it ...
>
> Also note that i do not consider the perf events ABI itself cast into stone -
> and we could very well add a new system call for this, independent of perf
> events. I just think that the seccomp scope itself is exciting but looks
> limited to what the real potential of this could be.

I agree with you on many of these points!  However, I don't think that
the views around LSMs, perf/ftrace infrastructure, or the current
seccomp filtering implementation are necessarily in conflict.  Here is
my understanding of how the different worlds fit together and where I
see this patchset living, along with where I could see future work
going.  Perhaps I'm being a trifle naive, but here goes anyway:

1. LSMs provide a global mechanism for hooking "security relevant"
events at a point where all the incoming user-sourced data has been
preprocessed and moved into kernel space.  The hooks are called every
time one of those boundaries is crossed.
2. Perf and the ftrace infrastructure provide global function tracing
and system call hooks with direct access to the caller's registers
(and memory).
3. seccomp (as it exists today) provides a global system call entry
hook point with a binary per-process decision about whether to provide
"secure computing" behavior.

When I boil that down to abstractions, I see:
A. Globally scoped: LSMs, ftrace/perf
B. Locally/process scoped: seccomp

The result of that logical equivalence is that I see room for:
I. A per-process, locally scoped security event hooking interface (the
proposed changes in this patchset)
II. A globally scoped security event hooking interface _prior_ to
argument processing
III. A globally scoped security event hooking interface _post_
argument processing

II and III could be reduced further if I assume that ftrace/perf
provides (II) and a simple intermediary layer (hook entry/exit)
provides the argument processing steps that then call out a global
security policy system.

The driving motivation for this patchset is kernel attack surface
reduction, but that need arises because we lack a process-scoped
mechanism for making security decisions -- everything is global:
creds/DAC, containers, LSM, etc.   Adding ftrace filtering to agl's
original bitmask-seccomp proposal opens up the process-local security
world.  At present, it can limit the attack surface with simple binary
filters or apply limited security policy through the use of filter
strings.
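
As a rough illustration of the "simple binary filters" case, the sketch
below models a per-process allow-list as a bitmap indexed by system call
number, in the spirit of the original bitmask proposal.  It is plain
userspace C and not code from this patchset; MAX_SYSCALLS, struct
syscall_bitmap and the helper names are assumptions made for the example.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical upper bound on syscall numbers for this sketch. */
#define MAX_SYSCALLS  512u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((MAX_SYSCALLS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per syscall number: set means allowed, clear means denied. */
struct syscall_bitmap {
	unsigned long bits[BITMAP_WORDS];
};

/* Start from "deny everything". */
static void bitmap_deny_all(struct syscall_bitmap *m)
{
	memset(m, 0, sizeof(*m));
}

/* Allow one syscall; returns false if nr is out of range. */
static bool bitmap_allow(struct syscall_bitmap *m, unsigned int nr)
{
	if (nr >= MAX_SYSCALLS)
		return false;
	m->bits[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
	return true;
}

/* The binary decision: out-of-range numbers are always denied. */
static bool bitmap_is_allowed(const struct syscall_bitmap *m, unsigned int nr)
{
	if (nr >= MAX_SYSCALLS)
		return false;
	return (m->bits[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL;
}
```

A bitmap like this can only answer yes or no per system call; anything
that depends on argument values needs the filter strings instead.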

Based on your mails, I see two main deficiencies in my proposed patchset:
a. Deep argument analysis: Any arguments that live in user memory
need to be copied into the kernel, then checked, and substituted for
the actual system call, then have the original pointers restored (when
applicable) on system call exit.  There is a large overhead here and
the LSM hooks provide much of this support on a global level.
b. Lack of support for non-system call events.

For (a), if the long term view of ftrace/perf & LSMs is that LSM-like
functionality will live on top of the ftrace/perf infrastructure, then
adding support for the intermediary layer to analyze arguments will
come with time.  It's also likely that for process-local stuff, e.g.,
a new predicate could be added to call back to a userspace supervisor,
or even a more generic ability for modules to register new
predicates/functions in the filtering engine itself -- like "fd == 1
&& check_path(path) == '/etc/safe.conf'" or "check_xattr(path,
expected)".  Of course, I'm just making stuff up right now :)

For (b), we could just add a field we don't use right now in the prctl
interface:
  prctl(PR_SET_SECCOMP_FILTER, int event_type,
        int event_or_syscall_nr, char *filter)
[or something similar]

Then we can add process-local/scoped supported event types somewhere
down the road without an ABI change.
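
To make the "field we don't use right now" idea concrete, here is a
minimal userspace sketch, assuming a per-process table of filters keyed
by (event_type, number).  None of these names exist in the patchset:
enum filter_event_type, struct filter_table and set_filter() are
hypothetical, and the filter string is stored without being parsed.
Only the syscall event type is accepted; anything else is rejected with
-EINVAL, which is what leaves room to add types later.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical event classes; only syscalls are supported for now. */
enum filter_event_type {
	FILTER_EVENT_SYSCALL = 0,
	/* future event types would be appended here */
};

#define MAX_FILTERS 16

struct filter_entry {
	int event_type;
	int event_or_syscall_nr;
	const char *filter;	/* filter string, not parsed in this sketch */
};

struct filter_table {
	struct filter_entry entries[MAX_FILTERS];
	size_t count;
};

/*
 * Record a filter for one event.  Returns 0 on success, -EINVAL for an
 * unsupported event type or a NULL filter, -ENOSPC if the table is full.
 */
static int set_filter(struct filter_table *t, int event_type,
		      int event_or_syscall_nr, const char *filter)
{
	if (event_type != FILTER_EVENT_SYSCALL || filter == NULL)
		return -EINVAL;
	if (t->count >= MAX_FILTERS)
		return -ENOSPC;

	t->entries[t->count].event_type = event_type;
	t->entries[t->count].event_or_syscall_nr = event_or_syscall_nr;
	t->entries[t->count].filter = filter;
	t->count++;
	return 0;
}
```

Callers that pass the syscall type today keep working unchanged when a
new enum value is added, since the argument layout stays the same.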

Tying it all together, it'd look like:
* Now -- add process-scoped security support: seccomp filter with
support for "future" event types
* Soon -- expand ftrace syscall hooks to hook more system calls
* Later -- expand ftrace filter language to support either deep
argument analysis and/or custom registered predicates
* Later, later -- implement a LSM-like hooking layer for "interesting"
event types on top of the ftrace hooks

That would yield process-scoped security controls and global security
controls and the ability to continue to create new and interesting
security modules.

All that said, I'm in over my head.  I've focused primarily on the
process-scoped security.  I think James, some of the LSM authors, and
out-of-tree security system maintainers would be good to help guide
direction toward the security view you have in mind to ensure the
flexibility desired exists.  And that's even assuming this sketch is
even vaguely interesting...

[snip]

> What i do here is to suggest *further* steps down the same road, now that we
> see that this scheme can indeed be used to implement sandboxing ... I think
> it's a valid line of inquiry.

I certainly agree that it's a valid line of inquiry, but I worry about
the massive scope expansion.  I know it hurts my head, but I'm hoping
the brain-dump above frames up how I think about this patch and your
line of inquiry.  ftrace hooking and the perf code certainly look a
lot like LSMs if I squint hard :)  But there is a substantial amount
of work to merge the worlds, and (thankfully) I don't think that
future directly impacts process-scoped security mechanisms even if
they can interact nicely.

thanks!
will

^ permalink raw reply

* RE: [PATCH 0/1] ppc4xx: Fix PCIe scanning for the 460SX
From: Tirumala Marri @ 2011-05-12 18:16 UTC (permalink / raw)
  To: Ayman El-Khashab; +Cc: linuxppc-dev, Paul Mackerras
In-Reply-To: <20110509160935.GA14965@crust.elkhashab.com>

So what is the best way to handle this?  It appears (based
on the comments of others and my own experience) that there
is no DCR that exists and behaves the way that previous SOCs
behaved to give us the link status?  The register above
PECFGn_DLLSTA is actually in the PCIe configuration space so
we would have to map that in to be able to read that
register during the link check.  Is that correct or ok?
[marri] Yes, you need to program a DCR register to access these local
PCIE_CFG registers.

I've communicated with some people over email and they had
tried the (PESDRn_HSSLySTS) register.  Recognizing that
there exists one of these for each port/lane, is there a way
to use this one?  It is in the indirect DCR space.  I'd
tried this myself and never did get it to do anything but I
could have been looking at the wrong lane or something.
[marri] This is at the SERDES level. This link being up doesn't necessarily
mean the overall stack is up. It is mostly used for BIST and diagnostics.

Lastly, what was the reason for forcing the original code to
be GEN-1 speeds?
[marri] Gen-2 needs some extra checks compared to Gen-1.
There were not many Gen-2 devices available to test with at the time of
submission.

^ permalink raw reply

* mpc8xx IDE hard drive enumeration
From: Burton Samograd @ 2011-05-12 21:27 UTC (permalink / raw)
  To: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2863 bytes --]

Hello list,

I'm currently attempting to get Linux running on a custom board and have
gotten to the point of trying to get our IDE FlashCard working.  I have
ported u-boot and have the flash card working as expected (as in being
able to read and write sectors) so it looks like it is possible to get
it working in Linux with the parameters I have setup (this is using
CONFIG_IDE_8XX_DIRECT and the various CFG_PCMCIA_* parameters).

I have built the kernel with IDE Block Device support and have selected
the MPC8xx IDE support as well, using 8xx_DIRECT as the interface.  I
have ensured that my parameters for PCMCIA match those of u-boot.  I get
the following on bootup with regards to the IDE subsystem (having
defined DEBUG and added some additional debugging info in
drivers/ide/ppc/mpc8xx.c):

...

Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
win->br: 0xfe100000, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100080
win->br: 0xfe100080, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100100
win->br: 0xfe100100, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100180
win->br: 0xfe100c00, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100c80
win->br: 0xfe100c80, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100d00
win->br: 0xfe100d00, PCMCIA_MEM_SIZE: 0x80, pcmcia_phy_end: 0xfe100d80
PCMCIA slot A: phys mem fe100000...fe100d80 (size 00000d80)
PCMCIA virt base: c1000000
base: c1000000 + 00000000 = c1000000
port[0]: c1000000 + 00000000 = c1000000
port[1]: c1000000 + 00000081 = c1000081
port[2]: c1000000 + 00000082 = c1000082
port[3]: c1000000 + 00000083 = c1000083
port[4]: c1000000 + 00000084 = c1000084
port[5]: c1000000 + 00000085 = c1000085
port[6]: c1000000 + 00000086 = c1000086
port[7]: c1000000 + 00000087 = c1000087
port[8]: c1000000 + 00000106 = c1000106
port[9]: c1000000 + 0000000a = c100000a

...

The IDE driver is calling m8xx_ide_init_hwif_ports successfully (not so
if I define 8xx_PCCARD) and from the source it just seems to be getting
the PCMCIA parameters that are setup by u-boot.  Everything seems fine
up to this point.
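
The window lines in the log can be cross-checked with a little
arithmetic: each window ends at win->br + PCMCIA_MEM_SIZE, and the
"phys mem" range is the lowest base to the highest end.  The sketch
below is a standalone illustration of that calculation, not code from
drivers/ide/ppc/mpc8xx.c; struct phys_range and span_of_windows() are
names made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end). */
struct phys_range {
	uint32_t start;
	uint32_t end;
};

/*
 * Compute the range covering n windows of win_size bytes each.
 * With n == 0 the result is the empty marker { UINT32_MAX, 0 }.
 */
static struct phys_range span_of_windows(const uint32_t *bases, size_t n,
					 uint32_t win_size)
{
	struct phys_range r = { UINT32_MAX, 0 };

	for (size_t i = 0; i < n; i++) {
		uint32_t end = bases[i] + win_size;

		if (bases[i] < r.start)
			r.start = bases[i];
		if (end > r.end)
			r.end = end;
	}
	return r;
}
```

Feeding in the six win->br values from the log with a window size of
0x80 gives fe100000...fe100d80, a span of 0xd80 bytes, which agrees
with the "PCMCIA slot A" line.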

Now, I'm wondering, why is it not enumerating the drives on the found
hwif ports?  Would I need to specify the drives on the command line, or
some other step that I might be missing?  Can anyone see anything
drastically wrong with the PCMCIA parameters (even though they seem to
work fine with u-boot)?

We are using a rather old kernel (2.6.12 I think, with customizations
for our hardware), but this is a rather old custom board with very
limited ram so we are attempting to get the older kernel working first
before looking into upgrading.

Any help appreciated.

--

Burton Samograd

[-- Attachment #2: Type: text/html, Size: 8867 bytes --]

^ permalink raw reply

* Re: [PATCH] fix build warnings on defconfigs
From: Ralf Baechle @ 2011-05-12 21:57 UTC (permalink / raw)
  To: wanlong.gao
  Cc: linux-mips, david.woodhouse, tony, nicolas.ferre, paulus, eric,
	sam, sfr, linux, khilman, manuel.lauss, u.kleine-koenig, mingo,
	rientjes, anton, ben-linux, linux-arm-kernel, linux-kernel,
	santosh.shilimkar, akpm, linuxppc-dev, hans-christian.egtvedt
In-Reply-To: <1302375858-11253-1-git-send-email-wanlong.gao@gmail.com>

On Sun, Apr 10, 2011 at 03:04:18AM +0800, wanlong.gao@gmail.com wrote:

> Subject: [PATCH] fix build warnings on defconfigs
> 
> From: Wanlong Gao <wanlong.gao@gmail.com>
> 
> Change the BT_L2CAP and BT_SCO defconfigs from 'm' to 'y',
> since BT_L2CAP and BT_SCO had changed to bool configs.
> 
> Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>

I've queued the MIPS bits only for 2.6.40.  Thanks.

  Ralf

^ permalink raw reply

* [PATCH] powerpc/pseries: Enable iSCSI support for a number of cards
From: Anton Blanchard @ 2011-05-12 22:23 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev
In-Reply-To: <20110509091930.2e784e54@kryten>


Enable iSCSI support for a number of cards. We had the base
networking devices enabled but forgot to enable iSCSI.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

v2: I added the bnx2 iscsi twice.

Index: junk/arch/powerpc/configs/pseries_defconfig
===================================================================
--- junk.orig/arch/powerpc/configs/pseries_defconfig	2011-05-13 08:03:17.122419568 +1000
+++ junk/arch/powerpc/configs/pseries_defconfig	2011-05-13 08:03:43.962869252 +1000
@@ -146,12 +146,17 @@ CONFIG_SCSI_MULTI_LUN=y
 CONFIG_SCSI_CONSTANTS=y
 CONFIG_SCSI_FC_ATTRS=y
 CONFIG_SCSI_SAS_ATTRS=m
+CONFIG_SCSI_CXGB3_ISCSI=m
+CONFIG_SCSI_CXGB4_ISCSI=m
+CONFIG_SCSI_BNX2_ISCSI=m
+CONFIG_BE2ISCSI=m
 CONFIG_SCSI_IBMVSCSI=y
 CONFIG_SCSI_IBMVFC=m
 CONFIG_SCSI_SYM53C8XX_2=y
 CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=0
 CONFIG_SCSI_IPR=y
 CONFIG_SCSI_QLA_FC=m
+CONFIG_SCSI_QLA_ISCSI=m
 CONFIG_SCSI_LPFC=m
 CONFIG_ATA=y
 # CONFIG_ATA_SFF is not set

^ permalink raw reply

* Re: [PATCH 3/5] v2 seccomp_filters: Enable ftrace-based system call filtering
From: James Morris @ 2011-05-13  0:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-mips, linux-sh, Peter Zijlstra, Frederic Weisbecker,
	Heiko Carstens, Oleg Nesterov, David Howells, Paul Mackerras,
	Eric Paris, H. Peter Anvin, sparclinux, Jiri Slaby, linux-s390,
	Russell King, x86, Linus Torvalds, Ingo Molnar, linux-arm-kernel,
	kees.cook, Serge E. Hallyn, Peter Zijlstra, microblaze-uclinux,
	Steven Rostedt, Martin Schwidefsky, Thomas Gleixner,
	Roland McGrath, Michal Marek, Michal Simek, Will Drewry,
	linuxppc-dev, linux-kernel, Ralf Baechle, Paul Mundt, Tejun Heo,
	linux390, Andrew Morton, agl, David S. Miller
In-Reply-To: <20110512130104.GA2912@elte.hu>

On Thu, 12 May 2011, Ingo Molnar wrote:
> Funnily enough, back then you wrote this:
> 
>   " I'm concerned that we're seeing yet another security scheme being designed on 
>     the fly, without a well-formed threat model, and without taking into account 
>     lessons learned from the seemingly endless parade of similar, failed schemes. "
> 
> so when and how did your opinion of this scheme turn from it being an "endless 
> parade of failed schemes" to it being a "well-defined and readily 
> understandable feature"? :-)

When it was defined in a way which limited its purpose to reducing the 
attack surface of the syscall interface.


- James
-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply
