* [patch] generic IPI: simplify barriers and locking
@ 2009-02-25 5:22 Nick Piggin
2009-02-25 5:22 ` Nick Piggin
2009-02-25 11:28 ` Ingo Molnar
0 siblings, 2 replies; 5+ messages in thread
From: Nick Piggin @ 2009-02-25 5:22 UTC (permalink / raw)
To: Ingo Molnar, Andrew Morton, Peter Zijlstra, Linus Torvalds,
	Jens Axboe, Oleg Nesterov, Suresh Siddha, linux-arch
OK, shall we go ahead with this? The only objection from the arch side came
from x86, which has since fixed up the ordering of IPIs versus memory
operations. It simplifies things quite a bit, and I think Peter is using it
as a base for his remove-kmalloc patchset.
--
Simplify the barriers in generic remote function call interrupt code.

Firstly, just unconditionally take the lock and check the list in the
generic_smp_call_function_single_interrupt IPI handler. As we've just taken
an IPI here, the chances are fairly high that there will be work on the list
for us, so do the locking unconditionally. This removes the tricky lockless
list_empty check and its dubious barriers. The change looks bigger than it
is because it is mostly just removing an outer loop.
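To make the shape of that change concrete, here is a distilled
before/after sketch of the handler (illustrative only -- the real code is
in the diff below):

        /* Before: lockless empty-check, needs subtle barriers */
        smp_read_barrier_depends();
        while (!list_empty(&q->list)) {
                spin_lock(&q->lock);
                list_replace_init(&q->list, &list);
                spin_unlock(&q->lock);
                /* ... process the entries now on 'list' ... */
                smp_read_barrier_depends();
        }

        /* After: we were sent an IPI, so just take the lock */
        spin_lock(&q->lock);
        list_replace_init(&q->list, &list);
        spin_unlock(&q->lock);
        /* ... process the entries now on 'list' ... */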
Secondly, clarify the architecture-specific IPI ordering rules. Generic code
has no tools to impose any sane ordering on IPIs if they go outside normal
cache coherency; ergo the arch code must make them appear to obey cache
coherency, as a "memory operation" to initiate an IPI and a "memory
operation" to receive one. This way at least they can be reasoned about in
generic code, and smp_mb() used to provide ordering.
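As a sketch of what that contract would mean for an architecture whose IPIs
are not naturally ordered by the coherency protocol (the helper names below
are made up, purely for illustration):

void arch_send_call_function_single_ipi(int cpu)
{
        /*
         * Hypothetical: if a raw IPI could overtake earlier stores on
         * this architecture, fence here so the IPI appears to be an
         * ordinary memory operation and generic code can rely on
         * spinlocks and smp_mb() alone.
         */
        arch_ipi_fence();               /* made-up name */
        __raw_send_ipi(cpu);            /* made-up name */
}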
The combination of these two changes means that explicit barriers can be
taken out of queue handling for the single case -- shared data is explicitly
locked, and IPI ordering must conform to that, so no barriers are needed. An
extra barrier is needed in the many handler, so as to ensure we load the
list element after the IPI is received.
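Concretely, the many handler gains a barrier on entry, pairing with the
initiator's smp_mb() after the list addition (abridged from the diff below):

void generic_smp_call_function_interrupt(void)
{
        int cpu = get_cpu();

        /*
         * Pairs with the initiator's smp_mb(): ensure the new entry is
         * visible on call_function_queue before we walk it, else we
         * could miss the entry and never get another IPI for it.
         */
        smp_mb();

        /* ... walk call_function_queue and run the callbacks ... */
}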
Does any architecture actually *need* these barriers? For the initiator I
could see it, but for the handler I would be surprised. So the other thing
we could do for simplicity is, rather than just matching with cache
coherency, to require a full barrier before generating an IPI and another
after receiving one, in which case the smp_mb()s can go away. But just for
now, we'll be on the safe side and use the barriers (they're in the slow
path anyway).
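Under that stricter alternative, the ordering responsibility would move
entirely into arch code, something like this (a sketch of the option not
taken here; the names are illustrative):

/* arch side, initiating CPU */
static inline void arch_send_ipi(int cpu)
{
        smp_mb();               /* full barrier before generating the IPI */
        __raw_send_ipi(cpu);    /* made-up name */
}

/* arch side, IPI entry path on the receiving CPU */
void arch_ipi_entry(void)
{
        smp_mb();               /* full barrier after receiving the IPI */
        generic_smp_call_function_interrupt();  /* generic smp_mb()s gone */
}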
Cc: linux-arch@vger.kernel.org
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
kernel/smp.c | 83 +++++++++++++++++++++++++++++++----------------------------
1 file changed, 44 insertions(+), 39 deletions(-)
Index: linux-2.6/kernel/smp.c
===================================================================
--- linux-2.6.orig/kernel/smp.c
+++ linux-2.6/kernel/smp.c
@@ -74,9 +74,16 @@ static void generic_exec_single(int cpu,
         spin_unlock_irqrestore(&dst->lock, flags);
 
         /*
-         * Make the list addition visible before sending the ipi.
+         * The list addition should be visible before sending the IPI
+         * handler locks the list to pull the entry off it because of
+         * normal cache coherency rules implied by spinlocks.
+         *
+         * If IPIs can go out of order to the cache coherency protocol
+         * in an architecture, sufficient synchronisation should be added
+         * to arch code to make it appear to obey cache coherency WRT
+         * locking and barrier primitives. Generic code isn't really equipped
+         * to do the right thing...
          */
-        smp_mb();
         if (ipi)
                 arch_send_call_function_single_ipi(cpu);
@@ -104,6 +111,14 @@ void generic_smp_call_function_interrupt
         int cpu = get_cpu();
 
         /*
+         * Ensure entry is visible on call_function_queue after we have
+         * entered the IPI. See comment in smp_call_function_many.
+         * If we don't have this, then we may miss an entry on the list
+         * and never get another IPI to process it.
+         */
+        smp_mb();
+
+        /*
          * It's ok to use list_for_each_rcu() here even though we may delete
          * 'pos', since list_del_rcu() doesn't clear ->next
          */
@@ -154,49 +169,37 @@ void generic_smp_call_function_single_in
 {
         struct call_single_queue *q = &__get_cpu_var(call_single_queue);
         LIST_HEAD(list);
+        unsigned int data_flags;
 
-        /*
-         * Need to see other stores to list head for checking whether
-         * list is empty without holding q->lock
-         */
-        smp_read_barrier_depends();
-        while (!list_empty(&q->list)) {
-                unsigned int data_flags;
-
-                spin_lock(&q->lock);
-                list_replace_init(&q->list, &list);
-                spin_unlock(&q->lock);
-
-                while (!list_empty(&list)) {
-                        struct call_single_data *data;
-
-                        data = list_entry(list.next, struct call_single_data,
-                                          list);
-                        list_del(&data->list);
+        spin_lock(&q->lock);
+        list_replace_init(&q->list, &list);
+        spin_unlock(&q->lock);
 
-                        /*
-                         * 'data' can be invalid after this call if
-                         * flags == 0 (when called through
-                         * generic_exec_single(), so save them away before
-                         * making the call.
-                         */
-                        data_flags = data->flags;
+        while (!list_empty(&list)) {
+                struct call_single_data *data;
 
-                        data->func(data->info);
+                data = list_entry(list.next, struct call_single_data,
+                                  list);
+                list_del(&data->list);
 
-                        if (data_flags & CSD_FLAG_WAIT) {
-                                smp_wmb();
-                                data->flags &= ~CSD_FLAG_WAIT;
-                        } else if (data_flags & CSD_FLAG_LOCK) {
-                                smp_wmb();
-                                data->flags &= ~CSD_FLAG_LOCK;
-                        } else if (data_flags & CSD_FLAG_ALLOC)
-                                kfree(data);
-                }
                 /*
-                 * See comment on outer loop
+                 * 'data' can be invalid after this call if
+                 * flags == 0 (when called through
+                 * generic_exec_single(), so save them away before
+                 * making the call.
                  */
-                smp_read_barrier_depends();
+                data_flags = data->flags;
+
+                data->func(data->info);
+
+                if (data_flags & CSD_FLAG_WAIT) {
+                        smp_wmb();
+                        data->flags &= ~CSD_FLAG_WAIT;
+                } else if (data_flags & CSD_FLAG_LOCK) {
+                        smp_wmb();
+                        data->flags &= ~CSD_FLAG_LOCK;
+                } else if (data_flags & CSD_FLAG_ALLOC)
+                        kfree(data);
         }
 }
@@ -375,6 +378,8 @@ void smp_call_function_many(const struct
 
         /*
          * Make the list addition visible before sending the ipi.
+         * (IPIs must obey or appear to obey normal Linux cache coherency
+         * rules -- see comment in generic_exec_single).
          */
         smp_mb();
* Re: [patch] generic IPI: simplify barriers and locking
2009-02-25 5:22 [patch] generic IPI: simplify barriers and locking Nick Piggin
2009-02-25 5:22 ` Nick Piggin
@ 2009-02-25 11:28 ` Ingo Molnar
2009-02-25 11:32 ` Peter Zijlstra
1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2009-02-25 11:28 UTC (permalink / raw)
To: Nick Piggin
Cc: Andrew Morton, Peter Zijlstra, Linus Torvalds, Jens Axboe,
Oleg Nesterov, Suresh Siddha, linux-arch
* Nick Piggin <npiggin@suse.de> wrote:
> OK, shall we go ahead with this? Only objection from arch guys
> came from x86, which has since fixed up ordering of IPIs
> versus memory operations.
>
> It simplifies things quite a bit and I think Peter is using it
> as a base under his remove kmalloc patchset..
Sure. I had Peter's patches in tip/core/ipi but they had test
failures, so that's still being worked out.
I've applied your patch to tip:core/ipi and will keep it there
unless there are objections.
Thanks,
Ingo
* Re: [patch] generic IPI: simplify barriers and locking
2009-02-25 11:28 ` Ingo Molnar
@ 2009-02-25 11:32 ` Peter Zijlstra
2009-02-25 12:23 ` Ingo Molnar
0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2009-02-25 11:32 UTC (permalink / raw)
To: Ingo Molnar
Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Jens Axboe,
Oleg Nesterov, Suresh Siddha, linux-arch
On Wed, 2009-02-25 at 12:28 +0100, Ingo Molnar wrote:
> * Nick Piggin <npiggin@suse.de> wrote:
>
> > OK, shall we go ahead with this? Only objection from arch guys
> > came from x86, which has since fixed up ordering of IPIs
> > versus memory operations.
> >
> > It simplifies things quite a bit and I think Peter is using it
> > as a base under his remove kmalloc patchset..
>
> Sure. I had Peter's patches in tip/core/ipi but they had test
> failures so it's still being worked out.
You never had these latest few, I think. Let me repost them.
* Re: [patch] generic IPI: simplify barriers and locking
2009-02-25 11:32 ` Peter Zijlstra
@ 2009-02-25 12:23 ` Ingo Molnar
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Molnar @ 2009-02-25 12:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Nick Piggin, Andrew Morton, Linus Torvalds, Jens Axboe,
Oleg Nesterov, Suresh Siddha, linux-arch
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Wed, 2009-02-25 at 12:28 +0100, Ingo Molnar wrote:
> > * Nick Piggin <npiggin@suse.de> wrote:
> >
> > > OK, shall we go ahead with this? Only objection from arch guys
> > > came from x86, which has since fixed up ordering of IPIs
> > > versus memory operations.
> > >
> > > It simplifies things quite a bit and I think Peter is using it
> > > as a base under his remove kmalloc patchset..
> >
> > Sure. I had Peter's patches in tip/core/ipi but they had test
> > failures so it's still being worked out.
>
> You never had these latest few I think. Let me repost them.
yeah, it was the initial version from a week ago or so.
Ingo