All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"Paul E. McKenney" <paulmck@us.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: per-cpu operation madness vs validation
Date: Wed, 27 Jul 2011 14:11:33 +0200	[thread overview]
Message-ID: <1311768693.24752.488.camel@twins> (raw)
In-Reply-To: <alpine.DEB.2.00.1107261624210.10273@router.home>

On Tue, 2011-07-26 at 16:32 -0500, Christoph Lameter wrote:

> Those all fold to the same operation on x86. The x86 cpu operations with
> segment prefix are interrupt and preempt safe. The context issues come
> into play in fallback scenarios where other architectures do not have
> instructions that can perform the same things as x86.

Right, but that doesn't help us much if anything.

> > Now the reason is of course that -rt changes some things, even when
> > using migrate_disable() it might be multiple processes might try and
> > access the per-cpu variable, needing serialization.
> 
> On x86 these operations are rt safe by the pure fact that all these
> operations are the same interrupt safe instruction.

The number of instructions has nothing what so ever to do with things.
And while the individual ops might be preempt/irq/nmi safe, that doesn't
say anything at all about the piece of code they're used in.

>  The difficulties come in for other
> architectures. In those cases preemption or interrupts need to be
> disabled.

Irrelevant.

> > The point of course is, how are we going to go about doing this, I'm
> > sure adding a proper conditional to each and every per-cpu op is going
> > to be a herculean task, and most of it utterly boring.
> 
> Basically we need to track the contexts in which references to a certain
> per cpu variable are established. Then constraints can be enforced. For
> example:
> 
> 1. A variable used in an interupt context with __this_cpu ops requires
> irqsafe_cpu_xxx when used in context where interrupts are enabled.
> 
> 2. A variable used in a non-preemptible context with __this_cpu ops
> requires this_cpu ops in a preemptible context.

And doesn't solve the larger issue of multiple per-cpu variables forming
a consistent piece of data.

Suppose there's two per-cpu variables, a and b, and we have an invariant
that says that a-b := 5. Then the following code:

preempt_disable();
__this_cpu_inc(a);
__this_cpu_inc(b);
preempt_enable();

is only correct when used from task context, an IRQ user can easily
observe our invariant failing and none of your above validations will
catch this.

Now move to -rt where the above will likely end up looking something
like:

migrate_disable();
__this_cpu_inc(a);
__this_cpu_inc(b);
migrate_enable();

and everybody can see the invariant failing.

Now clearly my example is very artificial but not far fetched at all,
there's lots of code like this, and -rt gets to try and unravel all
this. Basically preempt_disable()/local_bh_disable()/local_irq_disable()
are the next BKL, there is no context what so ever to reconstruct what
invariants certain pieces of code expect.

Hence my suggestion to do something like:

struct foo {
	percpu_lock_t lock;
	int a;
	int b;
}

DEFINE_PER_CPU(struct foo, foo);

percpu_lock(&foo.lock);
__this_cpu_inc(foo.a);
__this_cpu_inc(foo.b);
percpu_unlock(&foo.lock);

That would get us (aside from a shitload of work to make it so):

 - clear boundaries of where the data structure atomicy lie
 - validation, for if the above piece of code was also ran from IRQ
   context we could get lockdep complaining about IRQ unsafe locks used
   from IRQ context.

Now for !-rt percpu_lock will not emit more than
preempt_disable/local_bh_disable/local_irq_disable, depending on what
variant is used, and the data type percpu_lock_t would be empty (except
when enabling lockdep of course).

Possibly we could reduce all this percpu madness back to one form
(__this_cpu_*) and require that when used a lock of the percpu_lock_t is
taken.



  reply	other threads:[~2011-07-27 12:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-07-26 21:06 per-cpu operation madness vs validation Peter Zijlstra
2011-07-26 21:32 ` Christoph Lameter
2011-07-27 12:11   ` Peter Zijlstra [this message]
2011-07-27 13:01     ` Peter Zijlstra
2011-07-27 14:20       ` Christoph Lameter
2011-07-27 16:20         ` Peter Zijlstra
2011-07-27 16:29           ` Peter Zijlstra
2011-07-27 17:04             ` Christoph Lameter
2011-07-27 16:53           ` Christoph Lameter
2011-07-27 14:16     ` Christoph Lameter
2011-07-27 16:20       ` Peter Zijlstra
2011-07-27 17:13         ` Christoph Lameter
2011-07-27 17:33           ` Thomas Gleixner
2011-07-27 20:48             ` Christoph Lameter
2011-07-27 22:17               ` Thomas Gleixner
2011-07-28 13:50                 ` Christoph Lameter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1311768693.24752.488.camel@twins \
    --to=peterz@infradead.org \
    --cc=cl@linux.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulmck@us.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.